A simple file encoder for arbitrary files that uses a huffman tree. Implemented as an example solution for this lecture. Download it here.
Use the makefile. Execute
make all
to generate the binary hf. The library QuickCheck is needed for sanity tests. If you have a recent Haskell distribution (or installed Haskell by using the Haskell Platform) you can simply add it by
cabal update
cabal install quickcheck
The other makefile targets are
help show an overview of make-targets
clean removes all intermediate files
check calls hlint on all files
test starts a sanity test using QuickCheck
Call the program with hf. Without a parameter it shows the help. hf allows the following parameters:
-c <file> compress a file and stores the compressed one in <file>.hf
-d <file.hf> uncompress a file and store the original in <file>
-t <n> starts a sanity tests for <n> randomly generated strings
-g <string> prints out a dot-file that represents the huffman-tree for
the particular <string>. Useful for debugging and education.
Example: With
hf -g "hello, world!"|dot -Tpng -ohw.png
a file hw.png is generated. It shows the huffman tree for the
string "hello, world!"
The program is neither very fast nor does it achieve amazingly compression rates. Its primary purpose is to serve as an educational example of a bigger Haskell program with modules distributed over many files, the usage of many higher order functions (even self-defined ones), own types etc...
In addition (and again, to simplify the code) error correction for corrupted data files has been omitted.
The source files serve different purposes:
Main.hs parses command line parameters and calls the functions of
the other modules.
File.hs handles the serialization of trees and streams into chars that
can be written into a file.
Huffman.hs implements the actual algorithm to generate trees and encode
and decode strings into streams and vice versa.
Graphviz.hs implements the conversion from a Tree to a graphviz file.
Tests.hs shows how to use QuickCheck to automatically generate test
cases to check the sanity of the Huffman-algorithm.
In addition, the darcs repository has been left, such that the curious reader can take a look at the different development steps and iteration of the source code.