Been reading The Code Book: The Science of Secrecy from Ancient Egypt to Quantum Cryptography by Simon Singh. It is a very fascinating read between history, cryptography, and linguistics. In this post I compile some deciphering techniques. Some are simple while others are pure genius. But first, some terminology.
plaintext: original message (notated in lowercase)
ciphertext: enciphered message (notated in capitals)
algorithm: the method of enciphering a message
key: the premise of an enciphering method
A person needs to know both the algorithm and the key in order to decipher a ciphertext. But in many cases, the algorithm is obvious and the key can be traced from it.
Alphabetic Substitution
This is the simplest of algorithms where the alphabet is scrambled up to make a cipheralphabet. A good knowledge of English (or whatever the plaintext is written in) is enough to crack the ciphertext.
If a lone alphabet appears commonly throughout a ciphertext, one can deduce that is either "a" or "i". Similarly, a recurrence of a three letter cipherword is probably "and" or "the". If there is no vowel in a four letter cluster, one of the letters is probably a "y". The letter after the "q" must be a "u". If the spaces are eliminated, one can still guess common suffixes for a start. Lingual rules provide many handholds to decipherment.
Know your spelling rules, substitute what you can, and play a little hangman until you get the whole plaintext. That was how I cracked the Gnommish Alphabet in the Artemis Fowl series back in seventh grade.
Caesar Shift
Actually I lied. The Caesar Shift is even simpler. It shifts the alphabet several places, then uses it as the cipheralphabet. The number of shifts is agreed with the recipient beforehand.
This encipherment was used for extremely short messages, such as one phrase. There are not enough clues to reason with, but this is still a weak cipher considering that one only needs to test twenty six cipheralphabets at most to reach the plaintext. If that sounds like a lot of work to you, read on. You will much rather confront a Caesar Shift.
Vigenère Cipher
"The Indecipherable Cipher" utilizes the Vigenère Square, which is essentially all possible Caesar Shifts lined systematically to make a square:
What happens is that the sender and receiver agree on a keyword, such as "BLUE". To encipher a message, the first letter would be enciphered with the Caesar Shift starting with "B", the second letter with the shift starting with "L", the third with the shift starting with "U", the fourth with "E", and the fifth with "B" again. So the message "pig is hungry" enciphered with the keyword "BLUE" will be "QTAMTSORHCS".
B L U E B L U E B L U
p i g i s h u n g r y
Q T A MT S O RH C S
This enciphering technique is a polyalphabetic cipher, which alternates between more than one cipheralphabet. This makes it harder to pick out letters by frequency as opposed to a monoalphabetic cipher, where you can almost guess correctly that the most common cipherletter probably represents the plainletter "e".
Charles Babbage figured that frequency analysis plays a big role concerning the nature of Caesar Shifts in the Vigenère Cipher. Arabs first came up with frequency analysis, the association of cipherletters with plainletters by occurrence. What happens is you get an graph describing the frequency distribution of alphabets in a language..
..then compare it to the frequency distribution of cipherletters in your ciphertext (similarly can be done with Zipf's Law for whole words). Match corresponding frequencies of letters and cipherletters, substitute, do some tweaking, and you should have the plaintext. This technique is not particularly significant for general Alphabetic Substitution since logical reasoning is enough to crack the cipher, but it gives a handhold in Vigenère decipherment.
Homophonic Cipher
The previous ciphers were especially vulnerable to letter frequencies. To make up for that, the homophonic cipher uses numbers as the cipheralphabet, and adds more cipherletters to even out the frequencies. Each cipherletter should appear just as often as another.
It takes much more thought to crack this cipher, but it is still possible. One can consider spelling rules, estimate the amount of extra cipherletters for each plainletter, and take both into account. There is much more trial and error, but it is not such a horror compared to the next cipher..
The Enigma
This is where encryption escalates quickly. Why read my words when you can see for yourself? This video on the Enigma Machine tells what you need to know.
Hooooo, what is this monster? From left to right, the machine components are lamp letters, keyboard, plugboard, first scrambler, second scrambler, third scrambler, and reflector. If you trace this diagram carefully, hitting the "C" key gives the output "F".
The Germans with their Enigma Machines changed their agreed scrambler setting everyday in order to securely encipher the scrambler setting of their actual messages. So a person receiving a message would set their Enigma Machine to the agreed day setting, decipher the new scrambler setting, set to the new setting, and then proceed to decipher the actual message. The plugboard setting stays the same.
The Machine is the algorithm and the scrambler setting is the key. The key is six letters long, where the first three are the starting letters of the scramblers and the last three is a repetition. The key in plaintext "pigpig" may be enciphered as "GXWLDN".
To obtain the key, Marian Rejewski cleverly mapped out the chains of letter relations. He analyzed numerous keys of one day setting and paired up the first and fourth letters of each six letter key, since they are repetitions of each other. A relation for A, B, C, D, E, and F can be:
A B C D E F
D A F E B C
Then he organized this relation into chains. In this example there are two separate chains:
four links: A --> D --> E --> B --> A
two links: C --> F --> C
The significance of this organization is that the plugboard cannot interfere with the amount of chains or links, so that it is useful for cracking the scrambler setting. Instead of finding one key among ten thousand million million keys, he had only 105,456 possible chain-link characteristics to consider. And then there was the manual labour of recording the number of chains and links for each scrambler setting, but they did it. It took a year.
When the Germans found out about their flaw they stopped repeating their keys. To find the key, Alan Turing used Rejewski's idea on cribs. A crib is a ciphertext in which you know its plaintext as well, which in their case had to be guessed. A common crib that the Germans provided was "weather" (or "wetter" in German) in their weather reports.
Then he had to figure some plugboard settings as well. The trial and error went something like this:
He had some data based on a crib.
Given that ciphertext "A" is plaintext "b",
assume that "A" is connected to "S" on plugboard.
A --> plugboard --> S --> scramblers --> ? --> plugboard --> b
A --> plugboard --> S --> scramblers --> F --> plugboard --> b
So he joined "A" to "S" on the plugboard, then saw which letter lighted up. If "b" lighted up, it shows that "b" is not connected to any letter on the plugboard. If "F" lighted up, which he knew should be "b", he then assumed that "F" is connected to "b".
All that is good, but what happens when there is a contradiction? Say, three letters seem to be plugged to each other. Yes, those would be incorrect deductions from an incorrect assumption. To turn mistakes into an advantage, Turing realized that all these incorrect deductions are definitely incorrect, and do not need to be tested further. I am still trying to get my head around this one.
And he threw these testings into a bombe.
For more historical context alongside the logic, you ought to read The Code Book. It is pleasant for leisure as well as for study. If this article makes you insecure about your internet privacy, I have another post coming up that will calm your nerves. Wait for it~
No comments:
Post a Comment