In World War II the German military used an encryption machine called the Wehrmacht Enigma to send secret messages. The Enigma machine implemented a polyalphabetic cipher. Here is my implementation of a polyalphabetic cipher inspired by the Enigma machine. It is not a simulation of the Enigma machine, but it is based on a similar idea and should give you a sense of how hard it was to break the German code during the war. And yet the Allied code breakers did!

Try entering a message in the text area below and an encryption key (or a password) in the space for it below the text area. Click the Encrypt button. Your message will become unreadable. Click the Decrypt button and your message will be restored.

Enter a message below:
Encryption key:  

How Does it Work?

A polyalphabetic cipher uses a different character substitution rule for each character encoded. So if a certain letter (for instance the letter ‘b’) appears several times in the cipher text, you cannot infer that it represents the same character (for instance the letter ‘a’) in the plain text every time. Looking at it the other way, if the plain text were the same letter repeated a dozen times (e.g. “aaaaaaaaaaaa”), the cipher text would almost certainly not be a single letter repeated a dozen times (e.g. “bbbbbbbbbbbb”). Go ahead, try it in the text box above. Type in the letter ‘a’ a bunch of times (just press the ‘A’ key and hold it down for a while). Now enter an encryption key (or password), click the ‘Encrypt’ button and see what happens.
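To make the idea concrete, here is a minimal sketch of a polyalphabetic cipher. It is NOT the algorithm used on this page: it is a Vigenere-style toy over the 26 lowercase letters, and the names `toy_encrypt` and `toy_decrypt` are made up for illustration. It assumes the key contains only lowercase letters.

```python
import string

# Toy alphabet (assumption); the cipher on this page handles 88 characters.
ALPHABET = string.ascii_lowercase


def toy_encrypt(plain: str, key: str) -> str:
    """Shift each letter by an amount that depends on its position."""
    out = []
    for i, ch in enumerate(plain):
        if ch not in ALPHABET:
            out.append(ch)  # characters outside the alphabet pass through
            continue
        shift = ALPHABET.index(key[i % len(key)])
        out.append(ALPHABET[(ALPHABET.index(ch) + shift) % len(ALPHABET)])
    return "".join(out)


def toy_decrypt(cipher: str, key: str) -> str:
    """Undo toy_encrypt by applying the opposite shift at each position."""
    out = []
    for i, ch in enumerate(cipher):
        if ch not in ALPHABET:
            out.append(ch)
            continue
        shift = ALPHABET.index(key[i % len(key)])
        out.append(ALPHABET[(ALPHABET.index(ch) - shift) % len(ALPHABET)])
    return "".join(out)


cipher = toy_encrypt("a" * 12, "enigma")
print(cipher)                         # enigmaenigma
print(toy_decrypt(cipher, "enigma"))  # aaaaaaaaaaaa
```

Twelve identical plain text letters come out as several different cipher letters. In this toy the output simply spells out the key, which is one reason a plain repeating-key scheme is weak; it is shown only to illustrate position-dependent substitution.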

If you counted carefully when you tried this, you will have found that the cipher text has 2 more characters than the plain text. This is because the first 2 characters of the cipher text are the seed used by the encryption process. The seed is chosen at random by the encryption process to further randomize the encrypted output. You can see what this means by encrypting the same plain text with the same encryption key several times. You should get a totally different cipher text each time. If you don’t, you either got “lucky” or chose a weak encryption key (for instance a single letter repeated numerous times).

Now maybe you are wondering how many character substitution rules there are, and what happens when all of them have been used. There are 7,744 different character substitution rules, so if the plain text is longer than that, the cycle of character substitution rules begins again, but with a twist. Each time a new cycle begins, the order in which the rules are selected changes. Only after 7,744 cycles through the rules does the original rule selection order recur. This results in a rule selection sequence with a period of 7,744 × 7,744 = 59,969,536.
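A scaled-down model shows how changing the selection order on each cycle stretches the period from N to N × N. The reordering used here (rotate the order by one place per cycle) is an assumption chosen for simplicity, not necessarily what the real program does, and `rule_sequence` and `minimal_period` are hypothetical helper names.

```python
def rule_sequence(n_rules: int, length: int) -> list[int]:
    """Rule index used at each position; cycle c visits the rules rotated by c places."""
    seq = []
    for t in range(length):
        cycle, pos = divmod(t, n_rules)
        seq.append((pos + cycle) % n_rules)
    return seq


def minimal_period(seq: list[int]) -> int:
    """Smallest p with seq[t] == seq[t + p] wherever both positions exist."""
    for p in range(1, len(seq)):
        if all(seq[t] == seq[t + p] for t in range(len(seq) - p)):
            return p
    return len(seq)


N = 4                                # stand-in for the 7,744 rules
seq = rule_sequence(N, 2 * N * N)    # two full periods, so the check is conclusive
print(seq[:N * N])                   # [0, 1, 2, 3, 1, 2, 3, 0, 2, 3, 0, 1, 3, 0, 1, 2]
print(minimal_period(seq))           # 16
```

With 4 rules the sequence repeats only after 16 selections, just as 7,744 rules give a period of 59,969,536 in the description above.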

What that means in practical terms is that if you created a message composed of the same letter repeated 120,000,000 times (for instance the message might start out “aaaaaa…”) and encrypted it, you could split the encrypted message into pieces of 59,969,536 characters (excluding the 2 seed characters at the beginning) and you would have 3 parts: the first 2 parts would be identical, and the third part (a fragment of 60,928 characters) would match the first 60,928 characters of the other two. You could not break the encrypted message into any smaller parts and still have identical parts. Don’t try this here; it’s not practical, and probably not even possible. I have tried it, and you can too on your own PC with the programs below. I have verified that my assertions are true, at least for one particular choice of data and encryption key. I haven’t tried (at least not very hard) to produce a formal proof of these assertions, but I think it would rely on the fact that my order selection algorithm is analogous to the way counting numbers are generated, and there is no limit to the number of unique counting numbers that can be generated. This suggests that I could extend the rule selection period to any arbitrary length with just a minor change to my program.
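The arithmetic behind those figures is easy to check. The variable names below are only for illustration.

```python
RULES = 7_744
PERIOD = RULES * RULES               # length of the rule selection sequence
MESSAGE_LENGTH = 120_000_000         # characters, not counting the 2 seed characters

full_parts, fragment = divmod(MESSAGE_LENGTH, PERIOD)
print(PERIOD)      # 59969536
print(full_parts)  # 2
print(fragment)    # 60928
```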

A polyalphabetic cipher with a very long period like this one defeats the common code breaker technique of character frequency analysis. You have probably used that technique yourself in puzzles you have seen in newspapers or magazines. The idea is that since ‘e’ is the most commonly occurring letter in the English language, the letter occurring most frequently in the cipher text probably corresponds to ‘e’. Professional code breakers use much more sophisticated math to do this, but that is the idea in a nutshell.
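Here is a minimal sketch of character frequency analysis against a cipher that uses a single substitution rule throughout (a Caesar shift), which is exactly the situation a polyalphabetic cipher avoids. The sample sentence and the helper names `caesar` and `guess_shift` are made up for the example, and the guess only works when ‘e’ really is the most common letter of the plain text.

```python
import string
from collections import Counter

ALPHABET = string.ascii_lowercase


def caesar(text: str, shift: int) -> str:
    """Apply the same shift to every lowercase letter; pass other characters through."""
    out = []
    for ch in text:
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            out.append(ch)
    return "".join(out)


def guess_shift(cipher: str) -> int:
    """Assume the most frequent cipher letter stands for 'e' and derive the shift."""
    counts = Counter(ch for ch in cipher if ch in ALPHABET)
    most_common_letter = counts.most_common(1)[0][0]
    return (ALPHABET.index(most_common_letter) - ALPHABET.index("e")) % 26


plain = "meet me here when the evening bell tolls three times"
cipher = caesar(plain, 3)
shift = guess_shift(cipher)
print(shift)                    # 3
print(caesar(cipher, -shift))   # recovers the plain text
```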

Probably the main weakness of this cipher is that it does not encrypt all the 8-bit bytes, nor does it even encrypt all the printable ASCII characters. It encrypts 88 of the printable ASCII characters and, most notably, leaves whitespace (space, tab and linefeed) alone. Characters that are not encrypted are passed through unchanged. This means that all spaces, tabs and linefeeds are reproduced in the cipher text, so someone trying to break the code can tell how long the lines and the words were in the original message. This allows code breakers to use a word frequency analysis on the cipher text to break it. The idea is much the same as the character frequency analysis technique described in the previous paragraph, but with words instead of letters. The most helpful information comes from 1, 2 and 3 letter words, so if you avoid using short words in your message it will be more secure.
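The leak is easy to demonstrate with any cipher that passes whitespace through. The sketch below uses a made-up position-dependent shift over lowercase letters (again, not the cipher on this page) and compares word lengths before and after encryption.

```python
import string

ALPHABET = string.ascii_lowercase  # toy alphabet (assumption)


def toy_encrypt(plain: str) -> str:
    """Shift letters by a position-dependent amount; leave everything else untouched."""
    out = []
    for i, ch in enumerate(plain):
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + 7 * i + 3) % 26])
        else:
            out.append(ch)  # spaces, tabs and linefeeds survive as-is
    return "".join(out)


def word_lengths(text: str) -> list[int]:
    """Lengths of the whitespace-separated words in the text."""
    return [len(word) for word in text.split()]


plain = "we attack at dawn on the east ridge"
cipher = toy_encrypt(plain)
print(word_lengths(plain))    # [2, 6, 2, 4, 2, 3, 4, 5]
print(word_lengths(cipher))   # [2, 6, 2, 4, 2, 3, 4, 5]
```

An eavesdropper sees exactly where the 2 and 3 letter words fall, even though every letter has been changed.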

NT Encryption Programs

If you are comfortable using the NT command line interface, here are 4 NT command line programs that implement the encryption method used here and 3 other very similar methods (differing mainly in the character sets they support).

NT Encryption Libraries

If you are an NT developer using Visual C or C++ and would like to use these encryption methods in a program you are writing, here are NT libraries containing encryption and decryption routines that you can link with your code.

Last updated: Friday, June 12, 2015 08:48:00 AM