Huffman Data Compression using javascript

Huffman data compression this idea was developed by David Huffman in 1952 when he was a graduate student at MIT.The algorithm squeezes the “fluff” out of data but in a way that the original can be reproduced exactly. You see this done in programs like Winzip, stuffit, or for Unix folk, gzip. This is called “lossless” compression and is different from “lossy” compression used in audio and video files where the exact original cannot be reproduced.in this going to compress a text files using huffman data compression this can be achieved by using JavaScript. in this we create seven functions to do the task .now lets start...

Here we going to make it step by step first we are going to write the function to examine the text file. frequency:

    function frequency(str){
    var freqs = {};
    for (var i = 0; i<=str.length;i++){
        if (str[i] in freqs){
            freqs[str[i]] ++;
        }
        else{
            freqs[str[i]] = 1;
        }
}

in this function frequency the input string is str we create a object to produce the element and its corresponding frequency.that is the frequency of a letter is how many times the letter appear in that string.and now we created a dict named freqs to store the frequency of letters.and now lets go to create the input of huffman compression.

sortfreq:

function sortfreq(freqs){
    var letters = [];
    for (var ch in freqs){
        letters.push([freqs[ch],ch]);
    }
    return letters.sort();

}
console.log(sortfreq({'a': 3, 'c': 2, 'b': 1, 'e': 5, 'd': 1, 'g': 1, 'f': 2}));

in this function the the freq object is given to this function and it takes the key and value and add to the array named as letters as arrays.and to get the order its sort by frequency so we get an sorted arrays of array.

now we moving to the core part of huffman compression in order to compress the text file we need to assign the characters to the codes of “0?s and “1?s this is how we get that build a tree structure of the array letters in this we build tree in the reverse order. and when we put codes into characters we get something like this.

  a   '00'
  b   '1010'
  c   '011'
  d   '1011'
  e   '11'
  f   '100'
  g   '010'

and to make this happen we draw a tree structure like this

and now going to the coding part…

enter image description here

buildtree

function buildtree(letters){
    while(letters.length>1){
        var leasttwo = letters.slice(0,2);
        var therest = letters.slice(2,letters.length);
        var combfreq = letters[0][0] + letters[1][0];
        letters = therest;
        //console.log(letters);
        var two = [combfreq,leasttwo];
        letters.push(two);
        //console.log(letters);
        letters.sort();
}
    return letters[0];
}

making the function to build the tree and returning its fisrt element…so we get a tree from the function build tree..and now we have to trim the tree extracting the characters only for that we use trim tree function..

trimtree

    function trimtree(tree)
    {
        var p = tree[1];
        if (typeof p === 'string'){
            return p;
    }
        else
    {
            return (Array(trimtree(p[0]),trimtree(p[1])));
    }
    }

node = (trimtree(tree));
console.log(node);

once we have trimtree function it means that the node array contain only the characters of that tree and now we assign code to the characters in order that left side of the tree gets 0 and right side will be 1.

assigncodes

var codes = {}; 
function assigncodes(node,pat){
    pat = pat || "";
    if(typeof(node) == typeof("")){
        codes[node] = pat;
    }
    else{
        assigncodes(node[0],pat+"0");
        assigncodes(node[1],pat+"1");
    }
    return codes;
}
console.log(assigncodes(node));

to make an complete code here is our huffman compression is over now to gather the codes of each character we use encode function which bind them together.This turns on our original string “aaabccdeeeeeffg” into 44 bits that we represent with “00000010010100101010111111111101101110111000?. The original text at one byte per character requires 120 bits. encode

function encode(str){
    var output = "";
    for (var i=0;i<str.length;i++){
        output = output+codes[str[i]];
    }
    return output;
}

and the last part to decode this compressed output we have to decode it to the original string and to do that we make a function called decode

function decode(tree,str){
    var output="";
    var p=tree;
    for (var i=0;i<str.length;i++){
        if(str[i] == '0'){
            p=p[0];
            }
        else{
            p=p[1];
        }
        if (typeof p === 'string'){
            output = output +p;
            p=tree;
            }
    }
    return  output;
    }
console.log(decode(trimtree(tree),l));

Decoding the bit stream back into text is a little harder but still not bad. Each “zero” in the bit stream means a left turn up the tree, each “one” a right turn. When a leaf is reached, its character is sent to the output and we restart at the base of the tree for the next character.