Arch Linux
by Michael on June 14, 2009
in miscellaneous
Due to its better haskell support (i.e. very recent ghc-releases as native packages, haskell libraries, …) and generally more minimalistic approach I switched distributions from Ubuntu (been there since a few years, and before on SuSE, Redhat, Fedore, Mandriva, Gentoo, Debian, …) to Arch Linux. Despite some minor problems the switch was quite smooth and currently I’m back at my research about parallelizing stencil based algorithms (where the biolib mentioned earlier in this blog has an example application about string alignments).
Installed it on both machines I currently use (a Samsung NC 10 and a Dell Latitude E4300). If you have any questions about configuring these systems write me a mail, comment this article or visit the Arch Linux Wiki.
Updated speed results for sequence scoring and alignment
by Michael on May 19, 2009
in haskell, miscellaneous, research
All tests done with two randomly generated strings of length 1000 on my Dualcore with +RTS -N2 -K256M
Sequential Simple Global Alignment......CPU 3.028189s WALL 2.770991s Parallel Simple Global Alignment........CPU 0.684042s WALL 0.4524s => Ok. Sequential Simple Local Alignment.......CPU 91.301706s WALL 92.07114s Parallel Simple Local Alignment.........CPU 4.608287s WALL 3.734519s => Ok. Sequential Affine Global Scoring........CPU 3.524219s WALL 3.304805s Parallel Affine Global Scoring..........CPU 1.464092s WALL 1.461364s => Ok. Sequential Affine Local Scoring.........CPU 62.715919s WALL 63.269058s Parallel Affine Local Scoring...........CPU 10.06863s WALL 7.295414s => Ok. |
The “=> Ok.” string means that the result of the sequential reference implementation are the same as the parallel results, i.e. my implementation works and is fast.
We could probably improve the speed of the local alignments by playing with the heap RTS parameters to reduce GC time.
Webcams and Haskell
by Michael on May 16, 2009
in miscellaneous
Side project idea: Develop a library for accessing a webcam through Video4Linux. Steps:
- Understand Video4Linux
- Understand the FFI Interface of GHC
- Develop a rudimentary wrapper
- Cabalize it
- Upload
- Profit
Looks easy *sigh*
Finished side project: Pong
by Michael on May 11, 2009
in miscellaneous
Ok, this was quite fast. OpenGL and especially the Haskell wrapper HOpenGL are quite easy to use for simple things, e.g. a Pong-game
Despite some bugs which I’m too lazy to fix this is finished and I’ll tackle my next project. The low framerate is due to the recording, the game itself runs very smooth of course
New side project
by Michael on May 10, 2009
in miscellaneous
Working on a simple Pong-Game with OpenGL and Haskell.
Did some (now) obvious optimizations…
by Michael on May 7, 2009
in miscellaneous
…and now my parallel version both scales very well and is even faster when using only one core.
Parallel biolib improving…
by Michael on May 6, 2009
in miscellaneous
I’m (still) working on the parallel implementation of string alignment algorithms; currently I have a speedup between one, two and four processors (anyone having an eight-core machine for testing purposes?) but it’s still only as fast as the sequential test application with four cores. Guess I have to improve the non-parallel part of the code. This is quite difficult without a good profiler. /me is waiting for Satnam Singh’s Threadscope.
(Graph of runtime with 1, 2, and 4 processors using different row-sizes)
My new summer toy
by Michael on May 4, 2009
in miscellaneous
Going to have a lot of fun with this. More information can be found on the Giant homepage.
First win game against a 3k
Ok, I’m receiving a 6 stone handicap but nevertheless it’s a big win for me…
What a game…!
Had an intensive game this evening (despite valuing a ko-threat wrong) and saw a (at least for a 9k) great combination (me playing black), starting with White 281; I’m not sure there were many options for white to prevent this except giving me a few points.
Just a flower for my visitors…
by Michael on March 26, 2009
in miscellaneous
Haskell has a new logo
by Michael on March 24, 2009
in miscellaneous
Solving Pentominoes
by Michael on March 16, 2009
in miscellaneous
I finally finished my Pentominosolver in Haskell. More information and the code will follow soon, but until then I’ll just explain what it’s doing. You can give a description of empty spaces on a board and a list of tiles which should fit onto it, the file format is quite simple, e.g. for a board and four pieces to fit you write
xxxx xxxxx xxxxx xxxx xxxx xx x xx x x x x xx xx xxx x xx xx x |
and the result is displayed as a coloured HTML-Table, e.g. for the above problem you get the result
Moth, women and more…
by Michael on March 9, 2009
in miscellaneous
Today I’ve read about
- http://en.wikipedia.org/wiki/Christine_Cushing
- http://en.wikipedia.org/wiki/European_route_E74
- http://en.wikipedia.org/wiki/Atasca
- http://en.wikipedia.org/wiki/Boggy_Bayou_Mullet_Festival
- http://en.wikipedia.org/wiki/Charity_Adams
Read today
by Michael on March 7, 2009
in miscellaneous
Today I’ve read
- http://en.wikipedia.org/wiki/The_End_(Red_Dwarf)
- http://en.wikipedia.org/wiki/Ojo
- http://en.wikipedia.org/wiki/Duyan
- http://en.wikipedia.org/wiki/Book_(law_school)
- http://en.wikipedia.org/wiki/Volvo_B12BLE
which were (at least some of them) strange. Additionally I’ve read the paper “A Note on Two Problems in Connexion with Graphs” from Dijkstra, which was interesting, since I’ve never read the original paper (Thanks to Albert again).
Improving my random knowlegde
by Michael on March 6, 2009
in miscellaneous
Starting from tomorrow, I’ll try to read every day
- five random wikipedia articles
- one famous or otherwise important paper of computer science, not necessarily about parallel programming or functional languages, but more general ones (Leave suggestions in the comments)
Working again with the original biolib code
by Michael on February 24, 2009
in miscellaneous
After hacking / playing with the standard algorithms to get a better feeling for their complexity etc… I’m back at working directly with the code in biolib. Hence, preliminary results or at least thoughts should be expected in the near future (hopefully
).
String Similarity finished
Today I implemented string similarity scoring using a (Char, Char) -> Int matrix. Using a very simple matrix you can solve the longest common subsequence problem with this approach, too.
We’ll just show this simple lcs example, since providing a whole example with the substitution matrix etc.. becomes quite long very fast.
*Similarity> lcs "aaabdbdbabbahh" "ggaabah" "aabah" |
Now I’ll start on the local alignments. When I’ve implemented sequential versions, the both will be embedded in the biolib and parallelization starts.
Faster global, sequential edit and alignment generation
The approach described in the last post was quite elegant but both very, very slow and consuming a lot (e.g., nn GB) of memory.
I’ve then written a more optimized version which uses IOArrays and ByteStrings. This lead to (obviously) incredible speedups: before I was not able to calculate strings with more than ca. 10 – 15 characters, now I’m able to calculate the minimal edit distance for two strings of 1000 characters in 0.7 seconds (Core2Duo 2.4 GHz) in the sequential version.
I still have some problems with the garbage collector (i.e. it should not be called as often as it is when using solely IOArrays…) but I’m pretty sure they’ll be fixed, too.
(Obligatory screenshot of edit transcripts and alignment generation)
Brute force minimal edit distances
by Michael on February 12, 2009
in miscellaneous
Here is a Haskell implementation for the generation of edit transcripts between arbitrary strings:
Things to do:
- Testing the speed with huge strings
- Adding table based lookup
- More benchmarking
- Thinking about parallelization
And the obligatory source code.
-- Hacking example to calculate the edit distance and its transcript of two -- strings using the straightforward recurrence relation. -- -- This is the easy way, i.e. no table-based lookup but pure brute-force -- powered recursion. -- -- Source: Algorithms on Strings, Trees and Sequences: Computer Science and -- Computational Biology, Dan Gusfield, ISBN 978-0521585194, p. 211ff -- -- Remarks: -- The distance function returns the edit result in reverse order. Use the -- edit distance function. -- Describes an edit step data Step = Insert Char | Delete Char | Match | Replace Char Char deriving (Show, Eq) -- If the first string is not empty but the second is, remove all remaining -- elements distance (s1, i) (_, 0) = let inserts = take i (map Delete s1) in (i, reverse inserts) -- If the first string is empty, we have to add the remaining j characters distance (_, 0) (s2, j) = let deletes = take j (map Insert s2) in (j, reverse deletes) -- Default case: Find the operation with minimal costs. distance (s1, i) (s2, j) = let -- Generate the four different possibilities. (a, a') = distance (s1, i-1) (s2, j) -- Delete (b, b') = distance (s1, i) (s2, j-1) -- Insert -- Matching means no extra cost ch = s1 !! (i-1) ch' = s2 !! (j-1) t = if (ch == ch') then 0 else 1 (c, c') = distance (s1, i-1) (s2, j-1) -- Match and Replace -- Find (one of) the optimal modifications cost = minimum [a+1, b+1, c+t] -- And choose the appropiate edit step operation | cost == a+1 = (Delete ch) : a' | cost == b+1 = (Insert ch') : b' | cost == c = Match : c' | otherwise = (Replace ch ch') : c' in (cost, operation) -- Calculates the minimal edit distance and one of the optimal transcripts editDistance s1 s2 = let (n, t) = distance (s1, length s1) (s2, length s2) in (n, reverse t) -- For convenience: a better output displayEdit s1 s2 = do let ed = editDistance s1 s2 n = fst ed ts = snd ed putStrLn $ "Distance = " ++ (show n) ++ "\nSteps = " mapM_ (putStrLn . (" " ++ ) . show) ts |






