[JAVA] Cryptanalysis - Frequency Analysis and Index of Coincidence

Mac Dre · August 25, 2014

Credits go to Deque of EZ.

_________________________

Given an encrypted message without any knowledge about the used cipher you nevertheless might be able to crack it (given that the cipher used has some vulnerabilities).

This post provides codesnippets for analysing encrypted messages for letter frequencies and coincidences.

Index of Coincidence (IC)

What you can find out by the IC, is whether a message is encoded by a monoalphabetic (like Caesar cipher) or a polyalphabetic cipher (like Vigenère).

The index of coincidence is at 1.73 for plaintext in English. If you get an index close to 1 (which would be the value for a randomly created text) you have probably a polyalphabetic cipher and need to do further examinations like Friedman Test. If the index is higher, you have a monoalphabetic one and are able to crack it just by counting the letters (frequency analysis).

A sample output looks like this (the longer the message the better the result):

You see, IC is pretty high, so you have for sure a plaintext or a monoalphabetic cipher. It is the same with kappa text and kappa random, just that they are not normalized. The IC is the same as (kappa text * letters). kappa text for English and with 26 letters is about 0.067. Of course you get other values for kappa if you use more characters than just the letters of the alphabet whereas the IC will stay the same.

import java.util.Map;
 
public class IndexOfCoincidence {
 
    private Language language;
    private FrequencyAnalysis ana;
    private final String newline = System.getProperty("line.separator");
 
    public String analyse(String code) {
        ana = new FrequencyAnalysis();
        ana.analyse(code);
 
        double kappaText = computeKappaText();
        double kappaRand = computeKappaRandom();
        double ic = kappaText * amountOfCharacters();
        return "IC: " + ic + newline + newline
                + "kappa text: " + kappaText + newline
                + "kappa random: " + kappaRand + newline + newline
                + "If kappa text is close to kappa random, you probably have a "
                + "polyalphabethic cipher, otherwise a monoalphabetic one";
    }
 
    public double computeKappaRandom() {
        return 1 / (double) ana.getFrequency().size();
    }
    
    public int amountOfCharacters(){
        return ana.getFrequency().size();
    }
 
    public double computeKappaText() {
        Map<Character, Integer> frequency = ana.getFrequency();
 
        long n = ana.getSumCount();
        int sumNi = 0;
        for (Integer i : frequency.values()) {
            sumNi += (i * (i - 1));
        }
        return (sumNi / (double) (n * (n - 1)));
    }
 
}

Frequency Analysis

As you can see above frequency analysis is used to determine the IC. But that is not the only purpose. Every language has typical letter frequencies, e.g. the letter 'e' is the most common letter in the english language, so you assume that the most counted letter in an encrypted message will be 'e'. But this works only with monoalphabethic ciphers. It wouldn't work with Vigenère, because this cipher uses more alphabeths than one, which means the letter 'e' is not always translated with the same character.

But given a shift cipher (Caesar) it is pretty easy to crack with frequency analysis. You just need to determine the most common character in there. If this is 'h' you know the cipher has shifted every letter of the alphabeth three times.

import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
 
public class FrequencyAnalysis {
 
    private Map<Character, Integer> frequency; 
    private Language language;
 
    public String analyse(String code) {
        frequency = new HashMap<Character, Integer>();
        count(code);
        return getAnalysisString();
    }
    
    public Map<Character, Integer> getFrequency(){
        return new HashMap<Character, Integer>(frequency);
    }
 
    private String getAnalysisString() {
        StringBuilder b = new StringBuilder();
        long sumcount = getSumCount();
        b.append("Char\tAbs\tRel\n");
        for (Character c : getSortedKeyList()) {
            b.append(c + "\t" + frequency.get(c) + "\t"
                    + (frequency.get(c) / (double) sumcount) + "\n");
        }
        b.append("\nletters: " + sumcount);
        b.append("\ncharacters: " + frequency.size());
        return b.toString();
    }
 
        //insertion sort is used here
    private List<Character> getSortedKeyList() {
        List<Character> list = new LinkedList<Character>();
        for(Entry<Character, Integer> entry : frequency.entrySet()){
            for(int i = 1; i < list.size(); i++){
                if(frequency.get(list.get(i)) > entry.getValue()){
                    list.add(i, entry.getKey());
                    break;
                }
            } 
            if(!list.contains(entry.getKey())){
                list.add(entry.getKey());
            }
        }
        return list;
    }
 
    public long getSumCount() {
        long sum = 0;
        for (Integer i : frequency.values()) {
            sum += i;
        }
        return sum;
    }
 
    private void count(String code) {
        char[] text = code.toCharArray();
        for (char c : text) {
            if (frequency.containsKey(c)) {
                frequency.put(c, frequency.get(c) + 1);
            } else {
                frequency.put(c, 1);
            }
        }
    }
 
}

A sample output for a german plaintext looks like this:

Char    Abs    Rel
 
    2    0.0018450184501845018
2    1    9.225092250922509E-4
5    1    9.225092250922509E-4
;    1    9.225092250922509E-4
N    1    9.225092250922509E-4
H    1    9.225092250922509E-4
I    1    9.225092250922509E-4
K    1    9.225092250922509E-4
U    1    9.225092250922509E-4
—    1    9.225092250922509E-4
„    1    9.225092250922509E-4
Z    1    9.225092250922509E-4
“    1    9.225092250922509E-4
€    1    9.225092250922509E-4
y    1    9.225092250922509E-4
x    1    9.225092250922509E-4
(    2    0.0018450184501845018
)    2    0.0018450184501845018
F    2    0.0018450184501845018
B    2    0.0018450184501845018
M    2    0.0018450184501845018
V    2    0.0018450184501845018
j    2    0.0018450184501845018
G    3    0.0027675276752767526
A    3    0.0027675276752767526
W    3    0.0027675276752767526
ß    3    0.0027675276752767526
P    3    0.0027675276752767526
ö    3    0.0027675276752767526
E    4    0.0036900369003690036
L    4    0.0036900369003690036
T    4    0.0036900369003690036
S    4    0.0036900369003690036
ü    4    0.0036900369003690036
p    5    0.004612546125461255
D    6    0.005535055350553505
k    6    0.005535055350553505
,    7    0.006457564575645757
ä    9    0.008302583025830259
.    10    0.00922509225092251
v    11    0.01014760147601476
b    13    0.011992619926199263
w    14    0.012915129151291513
f    15    0.013837638376383764
m    16    0.014760147601476014
o    17    0.015682656826568265
z    18    0.016605166051660517
g    23    0.021217712177121772
c    26    0.023985239852398525
l    28    0.025830258302583026
u    36    0.033210332103321034
s    39    0.035977859778597784
d    40    0.03690036900369004
h    41    0.03782287822878229
a    48    0.04428044280442804
t    51    0.0470479704797048
i    65    0.05996309963099631
n    85    0.07841328413284133
r    87    0.08025830258302583
e    143    0.13191881918819187
     156    0.14391143911439114
 
letters: 1084
characters: 61

In case you want to try this code, just add this method:

   
public static void main(String[] args) {
    Scanner scan = new Scanner(System.in);
    System.out.println("Your encoded message:\n");
    String code = scan.nextLine();
    IndexOfCoincidence ic = new IndexOfCoincidence();
    System.out.println(ic.analyse(code));
}

Sign In

[JAVA] Cryptanalysis - Frequency Analysis and Index of Coincidence

Recommended Posts

Mac Dre

Link to comment

Share on other sites

Recently Browsing 0 members

Browse

Store

Search