Crack the code

A sequence of letters appears on a screen, divided into blocks but unreadable at first glance. It is the encoded (enciphered) form of an initially unknown text, the “plaintext”. What can be seen is therefore not the plaintext but the “ciphertext”. The task of decoding (deciphering) is to translate the ciphertext back into the corresponding plaintext. To do this, the code with which the plaintext was encrypted must be “cracked”.

The ciphertexts that appear on the screen in MATHEMATICS ADVENTURE LAND have been encoded with a monoalphabetic code. Such a code assigns exactly one letter of the ciphertext alphabet to each letter of the plaintext alphabet. The best-known code of this kind, famous for over 2000 years, is the so-called Caesar cipher. It bears the name of the Roman general and statesman Gaius Julius Caesar (100–44 BC), who “encrypted” correspondence with his troops in this way. The ciphertext alphabet is created simply by shifting the plaintext alphabet by a fixed number of positions (a translation). With a shift of four positions, the plaintext alphabet (top row) becomes the following ciphertext alphabet (bottom row):

ABCDEFGHIJKLMNOPQRSTUVWXYZ
EFGHIJKLMNOPQRSTUVWXYZABCD
Table 1: The Caesar cipher

For example, if you want to arrange a “secret” meeting in MATHEMATICS ADVENTURE LAND with your girlfriend or boyfriend, the meeting place, written with its German name ERLEBNISLAND MATHEMATIK, would read in the secret code:

IVPIFRMWPERH QEXLIQEXMO.
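The shift described above can be sketched in a few lines of Python. This is only a minimal illustration, not the exhibit's software: the function name `caesar_shift` is an assumption, and the plaintext is taken to be the German name ERLEBNISLAND MATHEMATIK, which is what the ciphertext above decodes to with a shift of four.

```python
import string

ALPHABET = string.ascii_uppercase  # "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def caesar_shift(text: str, shift: int) -> str:
    """Shift every letter A-Z by `shift` positions; other characters stay as they are."""
    shift %= 26  # a negative shift (decoding) wraps around the alphabet
    shifted = ALPHABET[shift:] + ALPHABET[:shift]
    return text.upper().translate(str.maketrans(ALPHABET, shifted))

ciphertext = caesar_shift("ERLEBNISLAND MATHEMATIK.", 4)
print(ciphertext)                    # IVPIFRMWPERH QEXLIQEXMO.
print(caesar_shift(ciphertext, -4))  # ERLEBNISLAND MATHEMATIK.
```

Decoding is simply the same shift in the opposite direction, which is why a Caesar cipher has only 25 useful keys.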

As a rule, however, in a monoalphabetic code the letters of the plaintext alphabet are not “shifted” evenly, but permuted, that is, jumbled up. An example of this is the following encoding:

ABCDEFGHIJKLMNOPQRSTUVWXYZ
DFGKIJHLMEOPRQSBUVNXYAZTCW
Table 2: Another monoalphabetic encoding

In this cipher, MATHEMATICS ADVENTURE LAND (in German: ERLEBNISLAND MATHEMATIK) would read

IVPIFQMNPDQK RDXLIRDXMO.
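A general monoalphabetic code like the one in Table 2 can be sketched as a pair of translation tables. The names `KEY`, `encode` and `decode` are assumptions made for this illustration; the key string itself is the bottom row of Table 2.

```python
import string

PLAIN = string.ascii_uppercase
KEY = "DFGKIJHLMEOPRQSBUVNXYAZTCW"  # ciphertext alphabet from Table 2

# A valid key uses every letter exactly once, i.e. it is a permutation.
assert sorted(KEY) == list(PLAIN)

ENCODE = str.maketrans(PLAIN, KEY)
DECODE = str.maketrans(KEY, PLAIN)  # the inverse permutation

def encode(text: str) -> str:
    """Replace each plaintext letter by its partner in the key."""
    return text.upper().translate(ENCODE)

def decode(text: str) -> str:
    """Undo encode() by applying the inverse assignment."""
    return text.upper().translate(DECODE)

print(encode("MATHEMATIK"))  # RDXLIRDXMO
print(decode("RDXLIRDXMO"))  # MATHEMATIK
```

Because the assignment is a permutation, swapping the two alphabets is all that is needed to obtain the decoding table.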

Such a code is a one-to-one (and therefore reversible) assignment of the letters of the plaintext alphabet to the letters of the ciphertext alphabet. For this there are exactly

    \[26! = 1\cdot 2\cdot 3\cdots 26 = 403\,291\,461\,126\,605\,635\,584\,000\,000 \approx 4\cdot 10^{26}\]

possibilities!
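To get a feeling for this number, it can be computed directly and turned into a rough search time. The rate of one billion keys per second is purely a hypothetical assumption for illustration, not a figure from the exhibit.

```python
import math

keys = math.factorial(26)  # number of ways to permute 26 letters
print(keys)                # 403291461126605635584000000

# Hypothetical attacker speed, chosen only for illustration.
KEYS_PER_SECOND = 10**9
SECONDS_PER_YEAR = 365 * 24 * 3600

years = keys / KEYS_PER_SECOND / SECONDS_PER_YEAR
print(f"{years:.1e} years to try every key")  # 1.3e+10 years to try every key
```

Under that assumption, trying all keys one after another would take on the order of ten billion years, so simply testing every key is not a practical attack.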

Despite this dizzying number, there is a chance of cracking such a code in a manageable time. The method used is so-called frequency analysis. First, the frequency of each letter in the ciphertext is determined and compared with the general letter frequencies of the language of the (unknown) plaintext. Then each ciphertext letter is replaced by the letter that has a similar frequency in that language, starting with the most frequent letters. In German plaintexts these are “E” and “N”. This method, which can be tried out at our exhibit (by first trying to find the “E” of the plaintext, then the “N”, etc.), is of course the more reliable the longer the text to be deciphered. The following table shows the relative frequencies of the individual letters in German-language texts:

Letter  Rank  Relative frequency
E        1    17.40%
N        2     9.78%
I        3     7.55%
S        4     7.27%
R        5     7.00%
A        6     6.51%
T        7     6.15%
D        8     5.08%
H        9     4.76%
U       10     4.35%
L       11     3.44%
C       12     3.06%
G       13     3.01%
M       14     2.53%
O       15     2.51%
B       16     1.89%
W       17     1.89%
F       18     1.66%
K       19     1.21%
Z       20     1.13%
P       21     0.79%
V       22     0.67%
ß       23     0.31%
J       24     0.27%
Y       25     0.04%
X       26     0.03%
Q       27     0.02%
Table 3: Relative frequencies of the letters in German-language texts

For comparison: if all 27 letters (including “ß”) were equally frequent, each would have a relative frequency of about 3.704%.
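The first step of a frequency analysis can be sketched as follows. This is a simplified illustration under stated assumptions: the ciphertext uses only the letters A to Z (so “ß” is left out of the ranking), and the names `GERMAN_ORDER`, `letter_frequencies` and `first_guess` are hypothetical. The ranking string follows the order of the frequency table above.

```python
from collections import Counter

# German letters from most to least frequent, following the table above.
# Assumption: "ß" is omitted because the ciphertext uses only A-Z.
GERMAN_ORDER = "ENISRATDHULCGMOBWFKZPVJYXQ"

def letter_frequencies(ciphertext: str) -> list[tuple[str, float]]:
    """Return (letter, relative frequency) pairs, most frequent first."""
    letters = [c for c in ciphertext.upper() if "A" <= c <= "Z"]
    total = len(letters)
    return [(letter, n / total) for letter, n in Counter(letters).most_common()]

def first_guess(ciphertext: str) -> dict[str, str]:
    """Pair the ciphertext letters, ranked by frequency, with GERMAN_ORDER."""
    ranked = [letter for letter, _ in letter_frequencies(ciphertext)]
    return dict(zip(ranked, GERMAN_ORDER))

# Toy input: X is the most frequent letter, so it is guessed to stand for E.
print(first_guess("XXX YY Z"))  # {'X': 'E', 'Y': 'N', 'Z': 'I'}
```

Such a ranking is only a starting point. For short texts the observed frequencies deviate strongly from the table, so the guesses for the rarer letters usually have to be corrected by hand, beginning with the most frequent letters.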

