Algorithms and their Applications in Biology

During the second half of the last century, the development of computers and computer simulation made many advances in our society possible.  Fields such as medicine use computers to maintain patient databases and to simulate research problems.  Over the past centuries, our understanding of human anatomy and medicine has grown rapidly, from the development of the pacemaker to the mapping of the human genome.

The combination of computer programming and biological research is a fast-growing field at universities nationwide and among professionals around the globe.  These researchers use algorithms to simulate everything from the division of bacteria to the growth patterns of viruses.  An algorithm is a step-by-step procedure for accomplishing a task, and it can be implemented in whatever programming language suits the job.

The Link

The link between biology and computer science can be clearly seen in a simple Google search of the phrase "algorithms in biology."  Of the 3.26 million results returned, the majority list courses offered at universities and colleges nationwide.  CPS 296.4 - Algorithms in Structural Molecular Biology at Duke University, Algorithms for Computational Biology at the Massachusetts Institute of Technology (MIT), and CS 374 Algorithms in Biology at Stanford are just some of the hundreds of courses offered throughout the nation.  This surge of courses aimed specifically at teaching essential computer algorithms for use in the biological sciences suggests the importance of the field.  What is the relationship between algorithms and biology that leads universities to put such emphasis on the subject?

The human body has often been compared to a machine, one that operates in both mechanical and digital ways.  Our joints and muscles coordinate with each other much as mechanical devices do.  Similarly, our brains and nervous systems behave somewhat like a digital system: electrical signals travel rapidly through our neurons to produce the required action or reaction.

Recent advances in biology have created a need for computer-based simulation.  One such advance is the sequencing of the human genome.  As more is discovered about the building blocks of human life, new approaches to treating diseases and other ailments are surfacing.  Biologists are now looking for ways to represent and simulate the human genome digitally.  This can only be done through complex algorithms and computer systems.

Folding@home

Proteins are essential nutrients and basic building blocks of all living things.  Each protein is a chain of amino acids, and different proteins carry out widely varying tasks in the body.  Whether a protein does its job or causes harm depends on how the molecule is shaped, or as biologists put it, how the protein is folded.  Folding@home is a volunteer program: users download software from the project's website and lend their idle processor time to research on protein folding.

The software, coordinated by servers at Stanford University, uses the idle processing power of volunteers' computers to carry out the complex, computationally demanding calculations that simulate how proteins fold.  These simulations give researchers a better understanding of diseases linked to the way proteins fold and of how to counteract or prevent them.  The Stanford team writes on their website, "Computer simulation is particularly well suited to address these questions [how molecules assemble themselves], as it naturally lends itself to thermodynamics, kinetic, and atomic level structure detail" (Pande).

Ordinary personal computer users can take part simply by downloading the software to their desktop.  More recently, Folding@home has been made available on the PlayStation 3, Sony's newest and most powerful video game console.  With its 3.2 GHz Cell processor, developed together with IBM, the PlayStation 3 has plenty of powerful hardware that can be put to good use when not being played.  The owner has the option to turn Folding@home on and donate the console's idle time to the research being done at Stanford.

Strings, Trees, and Sequences

The majority of biological research done using algorithms is at the molecular level.  Among the most important algorithms used by biologists at this level are pattern recognition algorithms.  Software built on them scans known DNA sequences for patterns that may be associated with cancer or other human diseases.
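As a minimal sketch of the simplest kind of pattern search, the Python example below looks for every exact occurrence of a short pattern inside a DNA string.  The function name find_motif and the example sequence are illustrative assumptions, not taken from any of the tools discussed here, and real pattern recognition software also has to handle inexact matches.

```python
def find_motif(sequence: str, motif: str) -> list[int]:
    """Return the 0-based start index of every match, including overlapping ones."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        # Resume one character later so overlapping matches are not skipped.
        start = sequence.find(motif, start + 1)
    return positions


# Made-up example sequence, for illustration only.
dna = "GATATATGCATATACTT"
print(find_motif(dna, "ATAT"))  # prints [1, 3, 9]
```

The search restarts one position after each hit rather than after the end of the match, which is what lets it report the overlapping occurrences at positions 1 and 3.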

In the introduction to his book Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, Dan Gusfield states:

The digital information that underlies biochemistry ... can be represented by a simple string of G's, A's, T's and C's.  This string is the root data structure of an organism's biology. (Gusfield xiii)

The author continues with a similarly important statement about biology and computer algorithms, writing that "biology is all about sequences."  Given the author's emphasis on sequences and structures, any computer scientist, or anyone familiar with programming languages like C, C#, or Java, immediately sees the need for sorting and comparison algorithms in the field of biology.

Comparison algorithms are run on computer-stored strings derived from real DNA to find patterns associated with illness.  Gusfield lists a variety of the most prominently used algorithms, which are discussed in more depth in the chapters of his book.  They include: "storing, retrieving, and comparing DNA strings; ... determining physical and genetic maps from probe data under various experimental protocol; ... looking for new or ill-defined patterns occurring frequently in DNA; ... looking for structural patterns in DNA and protein" (xiii - xiv).
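One of the tasks in that list, looking for patterns that occur frequently in DNA, can be illustrated by counting every substring of a fixed length k (often called a k-mer).  The sketch below is an assumption-laden toy rather than a method from Gusfield's book: the name count_kmers and the sample sequence are made up, and it assumes k is at least 1.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every length-k substring of the sequence (assumes k >= 1)."""
    # A sequence shorter than k yields an empty range and an empty Counter.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


# Made-up example sequence, for illustration only.
counts = count_kmers("ACGTACGTGACG", 3)
print(counts["ACG"])          # prints 3
print(counts.most_common(2))  # the two most frequent 3-mers with their counts
```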

By using complex string-based algorithms, scientists can model many properties of real DNA on a computer.  This allows analysis without having to handle delicate and degradable DNA samples.  The algorithms accept strings that represent real, recorded DNA sequences and compare them to others in current databases.  The strings are labeled by disease, sex, or many other categories, and the algorithms can then search and compare them.  As a result, the similarities and differences between strings with varying classifications can be measured, which may lead to breakthroughs in curing or stabilizing disease.
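The idea of comparing a recorded string against labeled entries in a database can be sketched as follows.  This is a deliberately simplified illustration: it assumes all sequences have the same length and measures difference with the Hamming distance (the number of positions at which two strings differ), whereas real databases rely on alignment algorithms.  The labels, sequences, and function names are hypothetical.

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def rank_by_similarity(query: str, records: dict[str, str]) -> list[tuple[int, str]]:
    """Return (distance, label) pairs, most similar record first."""
    return sorted((hamming_distance(query, seq), label)
                  for label, seq in records.items())


# Hypothetical labeled records; the labels and sequences are made up.
database = {
    "sample_A": "GAGCCTACTAACGGGAT",
    "sample_B": "CATCGTAATGACGGCCT",
    "sample_C": "GAGCCTACTAACGGGAA",
}

for distance, label in rank_by_similarity("GAGCCTACTAACGGGAT", database):
    print(label, distance)
```

With this query, sample_A is an exact match (distance 0), sample_C differs in one position, and sample_B differs in seven.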

What Algorithms?

The Needleman-Wunsch algorithm is one of the most widely used algorithms in genome research.  It was developed in 1970 by Saul Needleman and Christian Wunsch.  The algorithm aligns two sequences along their entire length so that matching characters line up.  For example, when applied to two strings of DNA, it lines up the A, G, C, and T bases that make up DNA into the best-scoring alignment, inserting gaps where needed.
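A minimal sketch of the Needleman-Wunsch idea in Python follows.  The scoring values (+1 for a match, -1 for a mismatch, -1 for a gap) are assumptions chosen for illustration, since research software uses more elaborate scoring schemes, and the function name is likewise illustrative.  The function fills a table of best scores for every pair of prefixes, then walks back from the bottom-right corner to recover one optimal alignment.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align a and b; return (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, cols):
        score[0][j] = j * gap  # b[:j] aligned against gaps only
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + step,  # pair two characters
                              score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap)       # gap in a

    # Trace back from the bottom-right corner to rebuild one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            step = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + step:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[len(a)][len(b)], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))  # prints (2, 'ACGT', 'A-GT')
```

In the example, the algorithm inserts a gap opposite the C so that the remaining three bases match, giving three matches and one gap for a score of 2.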

The Smith-Waterman algorithm is also used in biological research, specifically for DNA sequence alignment.  Unlike the Needleman-Wunsch algorithm, which aligns the sequences in their entirety, the Smith-Waterman algorithm compares segments of all possible lengths and finds the pair of segments that are most similar.
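The difference from a whole-sequence alignment can be seen in a short Python sketch of the Smith-Waterman idea.  The scoring values (+2 for a match, -1 for a mismatch, -1 for a gap) and the function name are assumptions made for illustration.  The key change is that no table cell is allowed to drop below zero, so an alignment can start and stop anywhere inside the two sequences.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Locally align a and b; return (best_score, segment_of_a, segment_of_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            # Clamping at 0 lets a new local alignment start at any cell.
            score[i][j] = max(0,
                              score[i - 1][j - 1] + step,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best-scoring cell until the score falls to 0.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        step = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + step:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("TTTACGTTT", "GGACGGG"))  # prints (6, 'ACG', 'ACG')
```

The two example strings have little in common overall, but both contain ACG, and the local alignment reports just that shared segment.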

The Viterbi algorithm finds the most likely sequence of hidden states behind a series of observations.  It works through the observations one step at a time, in order, and it can be used when the underlying states cannot be observed directly.  In biology it is typically applied with hidden Markov models, for example to label regions of a DNA sequence.
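To make this concrete, here is a minimal Python sketch of the Viterbi algorithm applied to a toy hidden Markov model.  Everything about the model is an assumption made up for illustration: two hidden states, "H" for a region rich in G and C and "L" for a region poor in them, with invented probabilities.  The sketch also assumes a non-empty observation sequence and that every probability is greater than zero, because it works with logarithms.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for the observations."""
    # best[t][s] = log-probability of the likeliest path that ends in state s at step t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]  # back[t][s] = predecessor of s on that likeliest path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda p: best[t - 1][p] + math.log(trans_p[p][s]))
            best[t][s] = (best[t - 1][prev] + math.log(trans_p[prev][s])
                          + math.log(emit_p[s][observations[t]]))
            back[t][s] = prev
    # Start from the best final state and follow the back-pointers.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Hypothetical toy model; all probabilities are invented for illustration.
states = ["H", "L"]
start_p = {"H": 0.5, "L": 0.5}
trans_p = {"H": {"H": 0.5, "L": 0.5}, "L": {"H": 0.4, "L": 0.6}}
emit_p = {"H": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
          "L": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}}

print("".join(viterbi("GGCACTGAA", states, start_p, trans_p, emit_p)))  # prints HHHLLLLLL
```

Under this toy model, the opening run of G and C bases is labeled as the GC-rich state and the rest of the sequence as the other state.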

These sequence algorithms can be used to map, organize, and compare strings of DNA that are too long and complex to handle with pen and paper.  Computer algorithms can also create three-dimensional visualizations based on these sequences, giving scientists and researchers a new view into the world of molecular biology.

Simulation

Object-oriented programming, and algorithms written in object-oriented languages, allow users (scientists and biologists) to create extensive three-dimensional images and varied simulations of the human genome and other molecular structures.  Analyzing tens of thousands of gene variations requires massive computing power for simulation and modeling.  These models can give scientists a clear view of molecular structures, including the specific proteins encoded by individual genes in the human genome.

The use of computers in the life sciences has been booming for the past few years.  Now that the human genome has been sequenced, the door has opened for computer scientists to work in biology developing simulation algorithms.  These algorithms will help scientists predict the behavior of certain proteins and, it is hoped, lead to the discovery of new cures and treatments for diseases worldwide.

Computers can aid people in more ways than entertainment and office productivity.  Through advanced algorithms, computers can be used to map proteins and human DNA and to sort through databases of these items to find similarities and differences.  This may be the key to answering some of the most challenging and interesting questions about evolution and molecular development.

References
Gusfield, Dan.  Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology.  Cambridge University Press.  1997.

Pande, Vijay.  Folding@home.  Web: http://folding.stanford.edu/Pande/Main.  Available: November 19, 2007.  2000 - 2007. 

Pattern Recognition Algorithms for Biology. Web: http://www1.uea.ac.uk/polopoly_fs/1.3621!algorithmsbiology.pdf. Available: November 19, 2007.  2000 - 2007.

Stanford University.  Folding@home. Web: http://folding.stanford.edu/English/Main. Available: November 19, 2007.  2000 - 2007.

Tucker, Allen B.  Computer Science Handbook.  CRC Press. 2004.


About this Article

This is an article by John Bekisz Jr. from the February 2008 issue.
