Introduction
This exercise demonstrates the encryption and decryption of text using the Caesar cipher. The process involves:
- Analyzing the frequency of letters in a given text.
- Encrypting the text with a Caesar cipher using a random shift.
- Decrypting the text using frequency analysis to find the shift value.
Step 1: Compile a Large Text Sample
To perform frequency analysis, we need a sufficiently large piece of text. For this example, we use "Pride and Prejudice" by Jane Austen, a public domain work.
Step 2: Analyze Letter Frequency
The frequency of each letter in the text is computed as a proportion of the total number of letters. The following Python code performs this analysis:
from collections import Counter
# Load the text and filter for letters only
filtered_text = ''.join(filter(str.isalpha, text.lower()))
letter_counts = Counter(filtered_text)
total_letters = sum(letter_counts.values())
letter_frequency = {letter: count / total_letters for letter, count in letter_counts.items()}
Step 3: Encrypt the Text
The Caesar cipher shifts each letter by a fixed value. For example, with a shift of 5, 'a' becomes 'f', 'b' becomes 'g', and so on. Wrap-around is handled for 'z'.
def caesar_cipher_encrypt(text, shift):
encrypted_text = []
for char in text:
if char.isalpha():
shifted = ord(char) + shift
if char.islower() and shifted > ord('z'):
shifted -= 26
elif char.isupper() and shifted > ord('Z'):
shifted -= 26
encrypted_text.append(chr(shifted))
else:
encrypted_text.append(char)
return ''.join(encrypted_text)
Step 4: Decrypt the Ciphertext
Frequency analysis can determine the shift value. We compare the letter frequency of the ciphertext with known English letter frequencies using the chi-squared statistic.
def caesar_cipher_decrypt(text, shift):
return caesar_cipher_encrypt(text, -shift)
# Find the shift with smallest chi-squared value
chi_squared_statistics = []
for shift in range(26):
decrypted_text = caesar_cipher_decrypt(encrypted_text, shift)
# Compute chi-squared
chi_squared = sum(
(observed - expected) ** 2 / expected
for letter, observed in decrypted_letter_frequency.items()
if (expected := english_letter_frequency.get(letter, 0))
)
chi_squared_statistics.append((shift, chi_squared))
best_shift = min(chi_squared_statistics, key=lambda x: x[1])[0]
decrypted_text = caesar_cipher_decrypt(encrypted_text, best_shift)
Results
After identifying the correct shift using frequency analysis, the original text is successfully decrypted.
Conclusion
The Caesar cipher provides a simple yet effective demonstration of encryption and decryption. While it is vulnerable to frequency analysis, it serves as an excellent introduction to cryptography.