Cryptography Basics
1. What is cryptography?
Cryptography is the practice and study of techniques for securing communication and data so that only authorised parties can read or modify it. It transforms readable information (plaintext) into an unreadable form (ciphertext) using mathematical algorithms and keys, and reverses the process for legitimate recipients. Modern cryptography underpins everything from HTTPS web traffic and online banking to messaging apps and password storage.
2. What are the core goals of cryptography?
The four foundational goals are confidentiality (keeping data secret from unauthorised parties), integrity (ensuring data has not been altered), authentication (verifying the identity of a party or the origin of a message), and non-repudiation (preventing a sender from denying they sent a message). These are sometimes summarised alongside the CIA triad of Confidentiality, Integrity, and Availability. Different cryptographic primitives address different goals — encryption provides confidentiality, hashing supports integrity, and digital signatures provide authentication and non-repudiation.
3. What is the difference between plaintext and ciphertext?
Plaintext is the original, unencrypted message or data before any cryptographic transformation is applied; it is often human-readable, but it can be any data, including binary files. Ciphertext is the scrambled, unintelligible output produced after the plaintext is encrypted with an algorithm and a key. The entire purpose of an encryption scheme is that ciphertext reveals nothing useful about the plaintext without access to the correct decryption key.
4. What is the difference between encoding, encryption, and hashing?
Encoding (such as Base64) transforms data into another format for safe transport or storage and is fully reversible without any key — it offers no security. Encryption transforms data into ciphertext using a key and is reversible only with the correct key, providing confidentiality. Hashing produces a fixed-length, one-way digest that cannot be reversed back to the original input, and is used for integrity verification and password storage rather than secrecy of recoverable data.
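The contrast between encoding and hashing can be seen with a few lines of Python's standard library. This is a minimal sketch; the sample message is an arbitrary assumption, and encryption is left out because the standard library ships no cipher.

```python
import base64
import hashlib

message = b"attack at dawn"  # arbitrary sample input

# Encoding: anyone can reverse it, no key is involved.
encoded = base64.b64encode(message)
decoded = base64.b64decode(encoded)

# Hashing: the digest has a fixed length regardless of input size,
# and there is no function that turns it back into the input.
digest_short = hashlib.sha256(message).hexdigest()
digest_long = hashlib.sha256(message * 1000).hexdigest()

print(encoded)                              # b'YXR0YWNrIGF0IGRhd24='
print(decoded == message)                   # True
print(len(digest_short), len(digest_long))  # 64 64
```

Base64 output is readable by anyone who recognises the format, which is why it should never be mistaken for protection.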
5. What is a cryptographic key?
A cryptographic key is a piece of information, usually a string of bits, that controls the output of a cryptographic algorithm; symmetric keys and private keys must be kept secret, while public keys are meant to be shared. The same algorithm with different keys produces completely different ciphertext, so the security of a well-designed system depends on the secrecy and strength of the key rather than the secrecy of the algorithm. Key length is measured in bits, and for a given algorithm longer keys generally provide stronger resistance to brute-force attacks.
6. What is Kerckhoffs's principle?
Kerckhoffs's principle states that a cryptographic system should remain secure even if everything about the system, except the key, is public knowledge. In other words, security must rest on the secrecy of the key alone, not on the obscurity of the algorithm. This principle encourages open, peer-reviewed algorithms such as AES and RSA, because publicly scrutinised designs are far more trustworthy than secret, unaudited ones.
7. What is the difference between a cipher and a code?
A cipher operates at the level of individual characters or bits, transforming data through systematic mathematical or algorithmic substitution and transposition regardless of meaning. A code replaces entire words, phrases, or concepts with other symbols using a codebook, operating at the level of meaning. Modern cryptography is built almost entirely on ciphers because they are general-purpose and can secure any binary data, whereas codes are largely historical.
8. What is the difference between a block cipher and a stream cipher?
A block cipher encrypts data in fixed-size blocks (for example, AES uses 128-bit blocks), applying the same transformation to each block under the control of a mode of operation. A stream cipher encrypts data one bit or byte at a time by combining the plaintext with a pseudorandom keystream, typically using XOR. Block ciphers are common for files and storage, while stream ciphers like ChaCha20 are well suited to real-time or streaming data where padding a block would be inefficient.
9. What is a nonce and why is it important?
A nonce ("number used once") is a value that should never be repeated for a given key in a cryptographic operation. It ensures that encrypting the same plaintext twice produces different ciphertext and that operations remain unique, which prevents replay attacks and frequency analysis. Reusing a nonce with the same key in modes like CTR or GCM can catastrophically break the security of the scheme, potentially exposing the plaintext or the authentication key.
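The danger of nonce reuse can be sketched with a toy stream cipher. The keystream below is built from SHA-256 purely for illustration and is an assumption of this example, not a real cipher such as ChaCha20 or AES-CTR; the names toy_keystream and toy_stream_encrypt are hypothetical.

```python
import hashlib

def toy_keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key, nonce and a block counter (illustration only)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))

def toy_stream_encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    """Encrypt by XORing the plaintext with the keystream."""
    return xor_bytes(plaintext, toy_keystream(key, nonce, len(plaintext)))

key = b"k" * 32
nonce = b"n" * 12
p1 = b"transfer 100 to alice"
p2 = b"transfer 999 to mallo"

c1 = toy_stream_encrypt(key, nonce, p1)
c2 = toy_stream_encrypt(key, nonce, p2)  # the mistake: same key AND same nonce

# The identical keystream cancels out, leaving the XOR of the two plaintexts.
leak = xor_bytes(c1, c2)
print(leak == xor_bytes(p1, p2))  # True

# An attacker who knows or guesses p1 recovers p2 without the key.
print(xor_bytes(leak, p1))  # b'transfer 999 to mallo'
```

With a fresh nonce for the second message the two keystreams would differ and the XOR of the ciphertexts would reveal nothing of the kind.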
10. What is an initialization vector (IV)?
An initialization vector is a random or unpredictable value used to seed an encryption operation so that identical plaintexts encrypt to different ciphertexts each time. In block cipher modes such as CBC, the IV is combined with the first block before encryption and must be unpredictable to attackers. The IV does not need to be secret and is usually transmitted alongside the ciphertext, but reusing an IV or choosing it predictably can leak information about the data.
Symmetric Encryption
11. What is symmetric encryption?
Symmetric encryption uses a single shared secret key for both encrypting and decrypting data, meaning the sender and receiver must both possess the same key. It is very fast and efficient, making it ideal for encrypting large volumes of data such as files, disks, and network streams. Its main challenge is key distribution: securely sharing the secret key between parties without it being intercepted.
12. What is AES and why is it widely used?
AES (Advanced Encryption Standard) is a symmetric block cipher standardised by NIST in 2001 that operates on 128-bit blocks with key sizes of 128, 192, or 256 bits. It is widely used because it is fast in both hardware and software, has withstood decades of intense cryptanalysis without practical breaks, and is supported by dedicated CPU instructions (AES-NI) for high performance. It secures everything from TLS connections and VPN tunnels to full-disk encryption like BitLocker and FileVault.
13. What is the difference between DES, 3DES, and AES?
DES (Data Encryption Standard) is an older 56-bit-key block cipher now considered insecure because its short key can be brute-forced in hours. 3DES (Triple DES) applies the DES algorithm three times to strengthen it, but it is slow and being deprecated because its 64-bit block size exposes it to attacks such as Sweet32. AES superseded both with larger keys, a larger 128-bit block size, a more efficient design, and strong security, and is the recommended standard for new systems.
14. What are block cipher modes of operation?
Modes of operation define how a block cipher processes data larger than a single block, and they significantly affect security. ECB (Electronic Codebook) encrypts each block independently and is insecure because identical plaintext blocks produce identical ciphertext, leaking patterns. CBC (Cipher Block Chaining) chains blocks using an IV, while CTR (Counter) turns a block cipher into a stream cipher, and GCM (Galois/Counter Mode) adds built-in authentication.
15. Why is ECB mode considered insecure?
ECB mode encrypts each block of plaintext independently with the same key, so any two identical plaintext blocks always produce identical ciphertext blocks. This means structural patterns in the data remain visible in the ciphertext — the classic example is an encrypted bitmap image where the original outline is still recognisable. Because it leaks information about the plaintext, ECB should never be used for anything beyond a single block of random data.
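The pattern leak can be reproduced without a real cipher. The sketch below assumes a toy 4-round Feistel network built from HMAC-SHA256 as a stand-in block cipher (it is not AES and must not be used for real data); toy_encrypt_block and ecb_encrypt are hypothetical names.

```python
import hashlib
import hmac

BLOCK_SIZE = 16          # bytes, the same block size as AES
HALF = BLOCK_SIZE // 2

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def _round_fn(key: bytes, round_no: int, half: bytes) -> bytes:
    """Keyed round function for the toy Feistel network."""
    return hmac.new(key, bytes([round_no]) + half, hashlib.sha256).digest()[:HALF]

def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    """Toy block cipher: a 4-round Feistel network (illustration only)."""
    left, right = block[:HALF], block[HALF:]
    for round_no in range(4):
        left, right = right, _xor(left, _round_fn(key, round_no, right))
    return left + right

def ecb_encrypt(key: bytes, plaintext: bytes) -> list:
    """ECB: every block is encrypted independently under the same key."""
    if len(plaintext) % BLOCK_SIZE != 0:
        raise ValueError("plaintext must be a multiple of the block size")
    return [
        toy_encrypt_block(key, plaintext[i:i + BLOCK_SIZE])
        for i in range(0, len(plaintext), BLOCK_SIZE)
    ]

key = b"demo key"
plaintext = b"SAME BLOCK 16B!!" * 2 + b"another block..."
blocks = ecb_encrypt(key, plaintext)

print(blocks[0] == blocks[1])  # True: repeated plaintext is visible in the ciphertext
print(blocks[0] == blocks[2])  # False
```

An observer who sees only the ciphertext learns that the first two blocks of the message are identical, which is exactly the structure that survives in the encrypted bitmap example.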
16. What is authenticated encryption (AEAD)?
Authenticated Encryption with Associated Data (AEAD) provides confidentiality, integrity, and authenticity in a single operation, so it both encrypts the data and produces an authentication tag that detects tampering. Modes like AES-GCM and ChaCha20-Poly1305 are AEAD schemes that prevent an attacker from modifying ciphertext without detection. AEAD is strongly preferred today because using encryption without integrity protection leaves systems vulnerable to padding-oracle and bit-flipping attacks.
17. What is the main challenge of symmetric encryption?
The primary challenge is secure key distribution: because the same secret key encrypts and decrypts, both parties must obtain that key without it being exposed to attackers. In a system with many participants, the number of required keys grows quadratically, since each pair of communicating parties needs its own shared key: n parties need n(n-1)/2 pairwise keys, so 100 participants already require 4,950. This key-management problem is precisely what asymmetric encryption and key-exchange protocols were designed to solve.
18. What is padding and why is it needed in block ciphers?
Padding adds extra bytes to plaintext so that its length becomes an exact multiple of the cipher's block size, which block modes like CBC require. A common scheme is PKCS#7, which appends bytes whose value equals the number of padding bytes added. Padding must be handled carefully because mishandled padding validation can expose a padding-oracle vulnerability that lets attackers decrypt data without the key.
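PKCS#7 padding as described above is short enough to sketch directly. The function names and the default 16-byte block size are assumptions chosen for this example.

```python
def pkcs7_pad(data: bytes, block_size: int = 16) -> bytes:
    """Append N bytes of value N so the length becomes a multiple of block_size."""
    if not 1 <= block_size <= 255:
        raise ValueError("block size must be between 1 and 255")
    pad_len = block_size - (len(data) % block_size)  # always 1..block_size
    return data + bytes([pad_len]) * pad_len

def pkcs7_unpad(padded: bytes, block_size: int = 16) -> bytes:
    """Validate and strip PKCS#7 padding."""
    if not padded or len(padded) % block_size != 0:
        raise ValueError("invalid padded length")
    pad_len = padded[-1]
    if pad_len < 1 or pad_len > block_size:
        raise ValueError("invalid padding")
    if padded[-pad_len:] != bytes([pad_len]) * pad_len:
        raise ValueError("invalid padding")
    return padded[:-pad_len]

print(pkcs7_pad(b"hello"))                        # b'hello' followed by eleven 0x0b bytes
print(len(pkcs7_pad(b"A" * 16)))                  # 32: a full block of padding is added
print(pkcs7_unpad(pkcs7_pad(b"hello")))           # b'hello'
```

Input that is already block-aligned still receives a full block of padding, otherwise the receiver could not tell padding from data. In a real system the ciphertext should be authenticated before any padding check runs, so that an attacker cannot use padding errors as an oracle.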
19. What is ChaCha20 and when is it preferred over AES?
ChaCha20 is a modern stream cipher, often paired with the Poly1305 authenticator to form an AEAD scheme. It is preferred over AES on devices that lack hardware AES acceleration, such as many mobile and embedded systems, because it runs fast and securely in pure software and is resistant to timing side-channel attacks. Google adopted ChaCha20-Poly1305 in TLS for exactly these performance and security reasons on mobile.
20. What is a key derivation function (KDF)?
A key derivation function transforms secret input material, such as a low-entropy password or a high-entropy but non-uniform shared Diffie-Hellman value, into one or more cryptographically strong keys of the required length. Password-based KDFs like PBKDF2, bcrypt, scrypt, and Argon2 deliberately add computational cost and a salt to slow down brute-force attacks. KDFs ensure that keys are well-distributed and that the same input material can safely produce multiple distinct keys for different purposes.
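Deriving several independent keys from one shared secret can be sketched with HKDF (RFC 5869), a KDF the text above does not name but which is commonly used for this purpose. The implementation below is a compact sketch on top of the standard hmac module; the salt, the labels, and the stand-in shared secret are assumptions for illustration.

```python
import hashlib
import hmac

HASH_LEN = hashlib.sha256().digest_size  # 32 bytes

def hkdf_extract(salt: bytes, input_key_material: bytes) -> bytes:
    """Step 1: concentrate the input's entropy into a fixed-size pseudorandom key."""
    if not salt:
        salt = bytes(HASH_LEN)  # RFC 5869 default: a string of zero bytes
    return hmac.new(salt, input_key_material, hashlib.sha256).digest()

def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """Step 2: stretch the pseudorandom key into `length` bytes bound to `info`."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested output is too long")
    okm = b""
    block = b""
    counter = 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

shared_secret = b"\x42" * 32  # stand-in for a Diffie-Hellman output (assumption)
prk = hkdf_extract(b"example-salt", shared_secret)

# Different `info` labels give independent keys from the same secret.
encryption_key = hkdf_expand(prk, b"encryption", 32)
mac_key = hkdf_expand(prk, b"authentication", 32)

print(len(encryption_key), len(mac_key))  # 32 32
print(encryption_key != mac_key)          # True
```

HKDF is fast by design, so it suits high-entropy inputs; for passwords a deliberately slow function such as PBKDF2, scrypt, or Argon2 remains the appropriate choice.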
Asymmetric Encryption
21. What is asymmetric encryption?
Asymmetric encryption, also called public-key cryptography, uses a mathematically linked pair of keys: a public key that can be shared openly and a private key kept secret. Data encrypted with the public key can only be decrypted with the corresponding private key, which solves the key-distribution problem since no shared secret needs to be transmitted in advance. It is slower than symmetric encryption, so in practice it is typically used to exchange a symmetric session key rather than to encrypt bulk data.
22. How does RSA work at a high level?
RSA relies on the mathematical difficulty of factoring the product of two large prime numbers. A user generates two large primes, multiplies them to form a public modulus, and derives a public and private key pair from them; the public key encrypts data or verifies signatures, while the private key decrypts or signs. Its security rests on the fact that while multiplying primes is easy, factoring the resulting large number back into those primes is computationally infeasible with current technology for sufficiently large keys.
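The arithmetic behind RSA can be followed with deliberately tiny numbers. This is textbook RSA with toy primes chosen for readability; real deployments use primes of a thousand bits or more plus a padding scheme such as OAEP, neither of which is shown here. The three-argument pow with a negative exponent needs Python 3.8 or later.

```python
# Key generation with toy primes (far too small for real use).
p, q = 61, 53
n = p * q                  # public modulus: 3233
phi = (p - 1) * (q - 1)    # 3120, kept secret
e = 17                     # public exponent, coprime with phi
d = pow(e, -1, phi)        # private exponent: modular inverse of e, here 2753

message = 65                            # a message encoded as a number smaller than n
ciphertext = pow(message, e, n)         # encrypt with the public key (e, n)
recovered = pow(ciphertext, d, n)       # decrypt with the private key (d, n)

print(n, d)          # 3233 2753
print(ciphertext)    # 2790
print(recovered)     # 65
```

Anyone who can factor 3233 back into 61 and 53 can recompute phi and d, which is why the security of real RSA depends on the modulus being too large to factor.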
23. What is the difference between symmetric and asymmetric encryption?
Symmetric encryption uses one shared key for both encryption and decryption, is very fast, but requires a secure way to distribute that key. Asymmetric encryption uses a public/private key pair, eliminates the need to pre-share a secret, and enables digital signatures, but is much slower and limited in the size of data it can directly encrypt. Real systems combine the two in a hybrid model: asymmetric cryptography securely establishes a key, and symmetric cryptography then encrypts the actual data.
24. What is hybrid encryption?
Hybrid encryption combines the strengths of both worlds by using asymmetric cryptography to securely exchange or encrypt a randomly generated symmetric session key, then using that fast symmetric key to encrypt the bulk data. This approach gives the convenient key distribution of public-key systems while retaining the speed of symmetric ciphers for large payloads. Protocols like TLS and tools like PGP rely on exactly this pattern.
25. What is the Diffie-Hellman key exchange?
Diffie-Hellman is a protocol that lets two parties establish a shared secret key over an insecure channel without ever transmitting the key itself. Each party combines its own private value with a shared public base and exchanges intermediate results; through the properties of modular exponentiation, both independently compute the same shared secret that an eavesdropper cannot derive. It is fundamental to establishing session keys in TLS and many VPN protocols.
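The exchange can be traced with small numbers. The prime 23 and base 5 below are toy parameters assumed for illustration; real deployments use standardised groups of 2048 bits or more, or elliptic curves. The function names are hypothetical.

```python
import secrets

P = 23  # public prime modulus (toy size)
G = 5   # public base

def dh_public(private: int) -> int:
    """Value each party sends over the insecure channel."""
    return pow(G, private, P)

def dh_shared(their_public: int, my_private: int) -> int:
    """Shared secret computed locally from the other party's public value."""
    return pow(their_public, my_private, P)

# Each side picks a private value in the range 1..P-2 and never transmits it.
alice_private = secrets.randbelow(P - 2) + 1
bob_private = secrets.randbelow(P - 2) + 1

alice_public = dh_public(alice_private)
bob_public = dh_public(bob_private)

alice_secret = dh_shared(bob_public, alice_private)
bob_secret = dh_shared(alice_public, bob_private)
print(alice_secret == bob_secret)  # True

# Worked example with fixed private values 6 and 15:
print(dh_public(6), dh_public(15))            # 8 19
print(dh_shared(19, 6), dh_shared(8, 15))     # 2 2
```

An eavesdropper sees only P, G, and the two public values; recovering a private value from them is the discrete logarithm problem, which is hard at realistic sizes. Plain Diffie-Hellman does not authenticate either party, so protocols such as TLS combine it with certificates and signatures.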
26. What is Elliptic Curve Cryptography (ECC)?
Elliptic Curve Cryptography is a family of public-key techniques based on the algebraic structure of elliptic curves over finite fields. Its key advantage is that it provides equivalent security to RSA with much smaller keys — a 256-bit ECC key is roughly comparable to a 3072-bit RSA key. The smaller key sizes mean faster computation, lower power use, and reduced bandwidth, making ECC especially attractive for mobile, IoT, and high-volume TLS servers.
27. What is the difference between ECDH and ECDSA?
ECDH (Elliptic Curve Diffie-Hellman) is a key-agreement protocol used to establish a shared secret between two parties over an insecure channel. ECDSA (Elliptic Curve Digital Signature Algorithm) is used to create and verify digital signatures for authentication and integrity. Both are built on elliptic curves, but ECDH solves key exchange while ECDSA solves signing, and they are often used together within the same secure protocol.
28. What is forward secrecy?
Forward secrecy (also called perfect forward secrecy) ensures that the compromise of a server's long-term private key does not allow an attacker to decrypt previously recorded sessions. It is achieved by generating a fresh, ephemeral key pair for each session — for example using ECDHE — so that each session's encryption key is independent and is discarded afterward. Without forward secrecy, an attacker who records traffic today and later steals the private key could decrypt all of that historical traffic.
29. Why is asymmetric encryption slower than symmetric encryption?
Asymmetric algorithms rely on expensive mathematical operations such as large-number modular exponentiation or elliptic-curve point multiplication, which are far more computationally intensive than the simple substitution and permutation operations used in symmetric ciphers. As a result, asymmetric operations can be hundreds or thousands of times slower than symmetric ones. This performance gap is the main reason public-key cryptography is reserved for small tasks like key exchange and signing rather than encrypting large amounts of data.
30. What is a key pair and how should the private key be protected?
A key pair consists of a public key, which is distributed freely, and a private key, which must remain known only to its owner because it can decrypt data and create signatures. If the private key is exposed, an attacker can impersonate the owner and read confidential communications, so it must be protected with strong access controls, encryption at rest, and ideally hardware-backed storage such as an HSM or a TPM. Compromised private keys must be revoked immediately to limit damage.
Hashing & Integrity
31. What is a cryptographic hash function?
A cryptographic hash function takes an input of any size and produces a fixed-length output called a digest or hash that serves as a practically unique fingerprint of the input. It is deterministic (the same input always yields the same hash), fast to compute, and infeasible to reverse. Hash functions are used for integrity checks, password storage, digital signatures, and data deduplication, with common examples being SHA-256 and SHA-3.
32. What properties should a secure hash function have?
A secure hash function must have preimage resistance (given a hash, you cannot find an input that produces it), second-preimage resistance (given an input, you cannot find a different input with the same hash), and collision resistance (you cannot find any two different inputs that hash to the same value). It should also exhibit the avalanche effect, where changing a single bit of input drastically changes the output. These properties ensure the hash reliably represents data integrity and resists forgery.
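The avalanche effect is easy to observe with SHA-256. In this sketch the two inputs differ in exactly one bit, because the letters d (0x64) and e (0x65) differ only in their lowest bit; the helper name is hypothetical.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

input_a = b"hello world"
input_b = b"hello worle"  # differs from input_a in a single bit

hash_a = hashlib.sha256(input_a).digest()
hash_b = hashlib.sha256(input_b).digest()

print(bit_difference(input_a, input_b))  # 1
print(bit_difference(hash_a, hash_b))    # typically close to 128 of the 256 output bits
```

A well-behaved hash flips each output bit with probability about one half, so roughly half of the 256 bits change even for the smallest possible change to the input.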
33. What is a hash collision?
A hash collision occurs when two different inputs produce the same hash output. Because hash functions map an infinite input space to a finite output space, collisions are mathematically inevitable, but a strong function makes finding them computationally infeasible. When collisions become practical, as happened with MD5 and SHA-1, attackers can forge documents or certificates, which is why those algorithms are now considered broken for security purposes.
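Collisions become tangible once the output space is made small. The sketch below truncates SHA-256 to 16 bits, an artificial weakening assumed purely for demonstration, and searches for two inputs with the same truncated digest; the function names are hypothetical.

```python
import hashlib

def truncated_hash(data: bytes, n_bytes: int = 2) -> bytes:
    """SHA-256 cut down to n_bytes, simulating a hash with a tiny output space."""
    return hashlib.sha256(data).digest()[:n_bytes]

def find_collision(n_bytes: int = 2):
    """Hash successive counter strings until two of them share a truncated digest."""
    seen = {}
    counter = 0
    while True:
        candidate = str(counter).encode()
        tag = truncated_hash(candidate, n_bytes)
        if tag in seen:
            return seen[tag], candidate
        seen[tag] = candidate
        counter += 1

first, second = find_collision()
print(first != second)                                    # True
print(truncated_hash(first) == truncated_hash(second))    # True
```

With only 65,536 possible outputs a collision is guaranteed within 65,537 attempts and, by the birthday bound, usually appears after a few hundred. The same search against the full 256-bit output would need on the order of 2^128 attempts, which is what makes it infeasible.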
34. Why are MD5 and SHA-1 no longer considered secure?
MD5 and SHA-1 are older hash functions that have been shown to be vulnerable to practical collision attacks, meaning attackers can deliberately craft two different inputs that produce the same hash. This breaks their use in digital signatures and certificates: an attacker can prepare a harmless file and a malicious file with the same hash, have the harmless one signed, and then substitute the malicious one, which the signature will still match. They should be replaced with the SHA-2 family (such as SHA-256) or SHA-3 for any security-sensitive purpose.
35. What is salting and why is it used in password hashing?
A salt is a unique random value added to each password before hashing, so that even identical passwords produce different hashes. Salting defeats precomputed attacks using rainbow tables, because an attacker cannot reuse a single precomputed table across all users and must attack each salted hash individually. The salt is stored alongside the hash and does not need to be secret; its purpose is to ensure uniqueness, not confidentiality.
36. What is the difference between hashing and encryption?
Hashing is a one-way transformation that produces a fixed-length digest and cannot be reversed to recover the original input, making it suitable for integrity checks and password storage. Encryption is a two-way, reversible transformation that uses a key to convert plaintext to ciphertext and back, providing confidentiality for recoverable data. In short, you hash data you never need to read back, and you encrypt data you must later decrypt.
37. What is an HMAC?
HMAC (Hash-based Message Authentication Code) is a construction that combines a cryptographic hash function with a secret key to verify both the integrity and the authenticity of a message. Unlike a plain hash, an HMAC requires knowledge of the shared secret key to produce or verify the code, so an attacker cannot forge a valid tag without the key. It is widely used in APIs, TLS, and token signing to confirm that a message came from a trusted party and was not altered in transit.
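Python's standard hmac module is enough to sketch tagging and verification. The key and messages below are placeholders assumed for the example, and make_tag and verify_tag are hypothetical helper names.

```python
import hashlib
import hmac

def make_tag(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the message."""
    return hmac.new(key, message, hashlib.sha256).digest()

def verify_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)

key = b"shared-secret-key"            # placeholder; use a random key in practice
message = b"amount=100&to=alice"
tag = make_tag(key, message)

print(verify_tag(key, message, tag))                  # True
print(verify_tag(key, b"amount=999&to=alice", tag))   # False: message was altered
print(verify_tag(b"wrong-key", message, tag))         # False: verifier lacks the key
```

The comparison uses hmac.compare_digest rather than ==, so that the time taken does not reveal how many leading bytes of a forged tag were correct.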
38. What is the difference between a MAC and a digital signature?
A MAC (Message Authentication Code) uses a shared symmetric secret key, so both parties can create and verify it, providing integrity and authentication but not non-repudiation. A digital signature uses asymmetric keys, where only the holder of the private key can sign while anyone with the public key can verify, which additionally provides non-repudiation. MACs are faster and simpler, while signatures are necessary when the verifier must prove to a third party who originated the message.
39. Why should you never store passwords in plaintext?
Storing passwords in plaintext means that any breach of the database immediately exposes every user's actual password, which is often reused across other sites and leads to widespread account compromise. Instead, passwords should be transformed with a slow, salted password-hashing function such as bcrypt, scrypt, or Argon2, so that even a stolen database does not reveal the original passwords. The system should compare hashes at login rather than ever recovering the plaintext.
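Storing and checking a salted, slow password hash might look like the sketch below. It uses PBKDF2 only because that function ships in Python's standard library; bcrypt, scrypt, or Argon2 would fill the same role. The iteration count is an assumed cost that should be tuned to the deployment's hardware, and the function names are hypothetical.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # assumed cost factor; raise it as hardware gets faster

def hash_password(password: str, iterations: int = ITERATIONS):
    """Return (salt, iterations, digest), everything that must be stored per user."""
    salt = secrets.token_bytes(16)  # unique random salt for each password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, iterations, digest

def verify_password(password: str, salt: bytes, iterations: int, expected: bytes) -> bool:
    """Re-derive the digest from the login attempt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, expected)

# A low iteration count keeps this demonstration fast; do not use it in production.
salt, iterations, digest = hash_password("correct horse", iterations=1000)
print(verify_password("correct horse", salt, iterations, digest))  # True
print(verify_password("wrong guess", salt, iterations, digest))    # False
```

Because the salt is random, hashing the same password twice yields two different records, and the plaintext is never stored or recovered at any point.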
40. Why are general-purpose hashes like SHA-256 not ideal for storing passwords?
General-purpose hashes such as SHA-256 are designed to be extremely fast, which is a disadvantage for password storage because attackers can compute billions of guesses per second on modern hardware. Dedicated password-hashing functions like bcrypt, scrypt, and Argon2 are deliberately slow, and scrypt and Argon2 are additionally memory-hard, drastically reducing the rate at which an attacker can test candidate passwords. They also incorporate salting and tunable cost factors so defenders can increase difficulty over time as hardware improves.
Digital Signatures & Certificates
41. What is a digital signature?
A digital signature is a cryptographic mechanism that proves the authenticity and integrity of a message and provides non-repudiation. The signer computes a hash of the message and applies a signing operation to that hash with their private key; anyone can then verify the signature using the signer's public key and a freshly computed hash of the message. With textbook RSA this is often described as encrypting the hash with the private key and decrypting it with the public key, but other schemes such as ECDSA do not work by encryption at all. If verification succeeds, it confirms the message came from the holder of the private key and has not been altered.
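The hash-then-sign flow can be sketched with textbook RSA. Everything here is a toy under stated assumptions: the two primes are well-known Mersenne primes picked only so the example is self-contained, the modulus is far too small, and no padding scheme such as PSS is applied. The names sign and verify are hypothetical, and Python 3.8 or later is assumed for the modular inverse.

```python
import hashlib

# Toy key pair (never use primes like these outside a demonstration).
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                              # public modulus
E = 65537                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent

def _digest_as_int(message: bytes) -> int:
    """Hash the message and reduce it into the range of the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    """Signer: apply the private exponent to the message hash."""
    return pow(_digest_as_int(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    """Verifier: apply the public exponent and compare with a fresh hash."""
    return pow(signature, E, N) == _digest_as_int(message)

contract = b"I agree to pay 100 units."
signature = sign(contract)

print(verify(contract, signature))                        # True
print(verify(b"I agree to pay 900 units.", signature))    # False
```

Only the holder of D can produce a value that verifies under E and N, while verification needs nothing secret, which is what separates a signature from a MAC.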
42. How does a digital signature provide non-repudiation?
Non-repudiation means the signer cannot later deny having signed a message, and digital signatures achieve this because only the holder of the private key could have produced a signature that verifies against the corresponding public key. Since the private key is meant to be uniquely controlled by its owner, a valid signature is strong evidence of the owner's involvement. This is why digital signatures are legally recognised in many jurisdictions for contracts and electronic transactions.
43. What is a digital certificate?
A digital certificate is an electronic document that binds a public key to the identity of its owner, such as a domain name or organisation, and is issued and signed by a trusted Certificate Authority. It typically follows the X.509 standard and includes the subject's identity, the public key, the issuer, a validity period, and the CA's signature. Certificates let users verify that a public key genuinely belongs to the claimed entity, which is the foundation of trust in HTTPS.
44. What is a Certificate Authority (CA)?
A Certificate Authority is a trusted third party that verifies the identity of certificate applicants and issues digitally signed certificates vouching for the binding between an identity and a public key. Operating systems and browsers ship with a set of trusted root CA certificates, so any certificate that chains up to one of those roots is trusted by default, provided it is otherwise valid (unexpired, not revoked, and matching the requested name). If a CA is compromised or behaves maliciously, it can issue fraudulent certificates, which is why CA security and oversight are critical to the entire trust model.
45. What is the chain of trust in PKI?
The chain of trust is the sequence of certificates that links an end-entity certificate (such as a website's) back to a trusted root certificate. A root CA signs intermediate CA certificates, which in turn sign end-entity certificates, so a verifier can validate each signature step by step up to a root it already trusts. This hierarchy allows trust to scale, since a small number of well-protected roots can underpin trust for millions of certificates.
46. How is a certificate revoked before it expires?
A certificate may need to be revoked early if its private key is compromised or if the certificate was issued in error, and there are two main mechanisms for this. A Certificate Revocation List (CRL) is a periodically published list of revoked certificate serial numbers, while the Online Certificate Status Protocol (OCSP) lets a client query a responder in real time about a specific certificate's status. OCSP stapling improves performance and privacy by having the server present a recent, signed status response during the handshake.
TLS, PKI & Applied Cryptography
47. What happens during a TLS handshake?
During a TLS handshake, the client and server agree on a protocol version and cipher suite, the server presents its certificate to prove its identity, and the two parties perform a key exchange (typically ECDHE) to establish a shared symmetric session key. Once the session key is derived, both sides switch to fast symmetric encryption for the rest of the conversation, with integrity protected by an AEAD cipher or MAC. This combines asymmetric cryptography for authentication and key establishment with symmetric cryptography for bulk data.
48. What is the difference between TLS and SSL?
SSL (Secure Sockets Layer) is the original protocol for encrypting web traffic, but all its versions are now deprecated due to serious vulnerabilities such as POODLE. TLS (Transport Layer Security) is its successor and modern replacement, with TLS 1.2 and TLS 1.3 being the versions in current use; TLS 1.3 removes outdated algorithms and streamlines the handshake for better speed and security. Although people still say "SSL" colloquially, secure connections today actually use TLS.
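In practice, refusing the deprecated protocol versions is a configuration choice. The sketch below uses Python's standard ssl module to build a client context that verifies certificates and rejects anything older than TLS 1.2; make_client_context is a hypothetical helper name, and the choice of TLS 1.2 as the floor is an assumption that some deployments tighten to TLS 1.3.

```python
import ssl

def make_client_context() -> ssl.SSLContext:
    """Client-side TLS settings: certificate checks on, legacy versions off."""
    # The default context verifies the server certificate chain and hostname.
    context = ssl.create_default_context()
    # Refuse SSL 3.0, TLS 1.0 and TLS 1.1 during version negotiation.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context

context = make_client_context()
print(context.verify_mode == ssl.CERT_REQUIRED)  # True
print(context.check_hostname)                    # True
print(context.minimum_version.name)              # TLSv1_2
```

Such a context would then be passed to wrap_socket together with the server's hostname when opening a connection, at which point the handshake negotiates the highest version both sides support.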
49. What is a Public Key Infrastructure (PKI)?
A Public Key Infrastructure is the combination of hardware, software, policies, certificate authorities, and procedures needed to create, manage, distribute, store, and revoke digital certificates. It establishes trust by binding public keys to verified identities through a hierarchy of certificate authorities and supports services like encryption, digital signatures, and authentication at scale. PKI is what allows strangers on the internet to communicate securely without having pre-shared any secret.
50. What is quantum computing's threat to cryptography and what is post-quantum cryptography?
Large-scale quantum computers could run Shor's algorithm to efficiently factor large numbers and solve discrete logarithms, which would break widely used asymmetric algorithms like RSA and ECC. Grover's algorithm would give a quadratic speed-up to brute-force key search, roughly halving the effective strength of symmetric keys (AES-128 would offer about 64 bits of security), which is countered by moving to larger keys such as AES-256. Post-quantum cryptography refers to new algorithms, such as lattice-based schemes like ML-KEM (Kyber), designed to remain secure against both classical and quantum attacks. NIST published its first post-quantum standards in 2024, and organisations are beginning to adopt them, with "harvest now, decrypt later" concerns driving early migration to quantum-resistant cryptography.