huffman file compression

A simple file encoder for arbitrary files that uses a huffman tree. Implemented as an example solution for this lecture. Download it here.

Installation

Use the makefile. Execute

make all 

to generate the binary hf. The library QuickCheck is needed for sanity tests. If you have a recent Haskell distribution (or installed Haskell by using the Haskell Platform) you can simply add it by

cabal update
cabal install quickcheck

The other makefile targets are

help          show an overview of make-targets
clean         removes all intermediate files
check         calls hlint on all files
test          starts a sanity test using QuickCheck

Starting

Call the program with hf. Without a parameter it shows the help. hf allows the following parameters:

-c <file>     compress a file and stores the compressed one in <file>.hf
-d <file.hf>  uncompress a file and store the original in <file>
-t <n>        starts a sanity tests for <n> randomly generated strings
-g <string>   prints out a dot-file that represents the huffman-tree for
              the particular <string>. Useful for debugging and education.

              Example: With

                hf -g "hello, world!"|dot -Tpng -ohw.png

              a file hw.png is generated. It shows the huffman tree for the
              string "hello, world!"

Remarks

The program is neither very fast nor does it achieve amazingly compression rates. Its primary purpose is to serve as an educational example of a bigger Haskell program with modules distributed over many files, the usage of many higher order functions (even self-defined ones), own types etc...

In addition (and again, to simplify the code) error correction for corrupted data files has been omitted.

Files

The source files serve different purposes:

Main.hs       parses command line parameters and calls the functions of
              the other modules.
File.hs       handles the serialization of trees and streams into chars that
              can be written into a file.
Huffman.hs    implements the actual algorithm to generate trees and encode
              and decode strings into streams and vice versa.
Graphviz.hs   implements the conversion from a Tree to a graphviz file.
Tests.hs      shows how to use QuickCheck to automatically generate test
              cases to check the sanity of the Huffman-algorithm.

In addition, the darcs repository has been left, such that the curious reader can take a look at the different development steps and iteration of the source code.

last modified: 2011-04-05 11:13 | | twitter | design idea copied from leobabauta.com