

The index of coincidence is a measure of how close a frequency distribution is to the uniform distribution. If you choose two letters at random from a random piece of text, the probability that they are the same is about 0.0385 (that is, \(1/26\)), whereas if you choose two letters at random from a piece of English text, the probability that they are the same is about 0.0667. This probability does not change if the text is enciphered with a substitution cipher. We can therefore exploit this phenomenon to decide if a given piece of ciphertext has been enciphered by a substitution cipher such as a Caesar shift.

Given a piece of text of length \(N\) in which the letters ‘A’ to ‘Z’ appear with frequency \(f_i\) (\(i=A\) to \(Z\)), so that letter \(i\) occurs \(f_i\) times, the index of coincidence of the text is defined as

\[ 26 \times \frac{\sum_{i=A}^{Z} f_i (f_i - 1)}{N(N-1)}. \]

Note that some definitions omit the factor 26 from the above definition. If the text is essentially ‘random’ then the index of coincidence will be close to \(1\). If it is text written in English (and possibly enciphered using a substitution cipher) then it will be closer to \(1.734\) (\(= 26\times 0.0667\)).

We can exploit this when analysing a piece of ciphertext that we suspect has been enciphered using a Vigenère cipher. If we have a guess that the key length is \(n\), then we can write the ciphertext using \(n\) columns and extract all \(n\) columns from the ciphertext. Given that each of these has been enciphered using a Caesar shift (with a different value for the shift), we can calculate the index of coincidence for each column, and if these are (or better, their average is) close to \(1.734\), then there is a good chance that \(n\) really is the key length. If not, then we can try another value of \(n\).

For example, the following text has been enciphered using a Vigenère cipher with the keyword CIPHERS.
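The index of coincidence and the column test for a guessed key length can be sketched in a few lines of Python. This is only an illustration: the function names `letters_only`, `index_of_coincidence` and `average_column_ic` are made up for this sketch, it assumes the normalised definition (with the factor 26), and it assumes that case is ignored and that anything other than the letters A to Z is discarded before counting.

```python
from collections import Counter
import string


def letters_only(text):
    """Keep only the letters A-Z, after converting to upper case (an assumption)."""
    return [c for c in text.upper() if c in string.ascii_uppercase]


def index_of_coincidence(text):
    """Normalised index of coincidence: 26 * sum f_i (f_i - 1) / (N (N - 1))."""
    letters = letters_only(text)
    n = len(letters)
    if n < 2:
        raise ValueError("need at least two letters")
    counts = Counter(letters)
    return 26 * sum(f * (f - 1) for f in counts.values()) / (n * (n - 1))


def average_column_ic(ciphertext, key_length):
    """Average index of coincidence over the columns for a guessed key length.

    Column i holds every key_length-th letter starting at position i, so each
    column has been enciphered with a single Caesar shift if the guess is right.
    Raises ValueError if any column has fewer than two letters.
    """
    letters = letters_only(ciphertext)
    columns = [letters[i::key_length] for i in range(key_length)]
    return sum(index_of_coincidence("".join(col)) for col in columns) / key_length
```

One possible way to use it is to print the average for a range of guesses and look for values close to 1.734; here `ciphertext` stands for whatever text is being analysed:

```python
ciphertext = "..."  # placeholder: paste the text to analyse here
for n in range(1, 11):
    try:
        print(n, round(average_column_ic(ciphertext, n), 3))
    except ValueError:
        print(n, "text too short for this guess")
```

Multiples of the true key length also tend to score well, so the smallest guess with a high average is usually the one to try first.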
