文件名称:Introduction to Computational molecular biology - Carlos Setubal, Joao Meidanis
文件大小:8.44MB
文件格式:PDF
更新时间:2010-10-06 18:35:41
biology Computational molecular
Chapter 1 presents fundamental concepts from molecular biology. We describe the basic
structure and function of proteins and nucleic acids, the mechanisms of molecular genetics,
the most important laboratory techniques for studying the genome of organisms, and
an overview of existing sequence databases.
Chapter 2 describes strings and graphs, two of the most important mathematical objects
used in the book. A brief exposition of general concepts of algorithms and their
analysis is also given, covering definitions from the theory of NP-completeness.
The following chapters are based on specific problems in molecular biology. Chapter
3 deals with sequence comparison. The basic two-sequence problem is studied and
the classic dynamic programming algorithm is given. We then study extensions of this
algorithm, which are used to deal with more general cases of the problem. A section is
devoted
to the multiple-sequence comparison problem. Other sections deal with programs
used in database searches, and with some other miscellaneous issues.
Chapter 4 covers the fragment assembly problem. This problem arises when a DNA
sequence is broken into small fragments, which must then be assembled to reconstitute
the original molecule. This is a technique widely used in large-scale sequencing projects,
such as the Human Genome Project. We show how various complications make this
problem quite hard to solve. We then present some models for simplified versions of the
problem. Later sections deal with algorithms and heuristics based on these models.
Chapter 5 covers the physical mapping problem. This can be considered as fragment
assembly on a larger scale. Fragments are much longer, and for this reason assembly
techniques are completely different. The aim is to obtain the location of some markers
along the original DNA molecule. A brief survey of techniques and models is given.
We then describe an algorithm for the consecutive ones problem; this abstract problem
plays an important role in physical mapping. The chapter finishes with sections devoted
to algorithmic approximations and heuristics for one version of physical mapping.
Proteins and nucleic acids also evolve through the ages, and an important tool in
understanding how this evolution has taken place is the phylogenetic tree. These trees
also help shed light in the understanding of protein function. Chapter 6 describes some
of the mathematical problems related to phylogenetic tree reconstruction and the simple
algorithms that have been developed for certain special cases.
An important new field of study that has recently emerged in computational biology
is genome rearrangements. It has been discovered that some organisms are genetically
different, not so much at the sequence level, but in the order in which large similar chunks
of their DNA appear in their respective genomes. Interesting mathematical models have
been developed to study such differences, and Chapter 7 is devoted to them.
The understanding of the biological function of molecules is actually at the heart of
most problems in computational biology. Because molecules fold in three dimensions
and because their function depends on the way they fold, a primary concern of scientists
in the past several decades has been the discovery of their three-dimensional structure,
in particular for RNA and proteins. This has given rise to methods that try to predict a
molecule's structure based on its primary sequence. In Chapter 8 we describe dynamic
programming algorithms for RNA structure prediction, give an overview of the difficulties
of protein structure prediction, and present one important recent development in the
xii Preface
field called protein threading, which attempts to align a a protein sequence with a known
structure.
Chapter 9 ends the book presenting a description of the exciting new field of DNA
computing. We present there the basic experiment that showed how we can use DNA
molecules to solve one hard algorithmic problem, and a theoretical extension that applies
to another hard problem.