Bioinformatics:
Application of Bioinformatics
Definition:
Bioinformatics deals with the creation and maintenance of databases of biological information, such as nucleic acid (gene) sequences and protein sequences. It has applications in gene therapy, diagnostics, drug design, crop improvement, biochemical processes, etc. It involves the analysis of data and the creation of electronic databases on genomes and protein molecules.
History of Bioinformatics
Since the beginning of the post-Mendelian period, genetic principles propounded by various geneticists have revealed the functional behaviour of discrete hereditary particles, called genes, in the expression of various morphological (phenotypic) and biochemical traits of organisms. During the last three decades, advances in molecular biology, the invention of computers, rapid developments in scientific methodology and the introduction of nano-level instrumentation have paved the way for the origin of bioinformatics.
Early discoveries such as the amino acid sequence of bovine insulin (1950s), the nucleic acid sequence of yeast alanine tRNA with 77 bases (1960s) and the X-ray crystallographic structures of proteins formed the basis and the original databases for data entry and file making. With further advances in computational methods, employing rapid search algorithms (BLAST) with numerous command options and input formats, bioinformatics was born as a science.
Applications
Bioinformatics is a synergistic study of both biotechnology and information technology. In biotechnology, living organisms at the micro and macro levels of organization are employed and manipulated to harvest products beneficial to humans. In recent years, biotechnology has been turning into an industrial science through genetic engineering.
Genetic engineering helps scientists to incorporate a single gene into an organism and synthesize the desired product without affecting other genes and their functions. In this way, biological or microbial systems are manipulated.
Scope of Genetic Engineering
i. To manufacture drugs and other life-saving bioproducts such as insulin, growth hormones, interferons, cytokines and monoclonal antibodies.
ii. For environmental management, to reduce or abate the pollution load in soil or water.
iii. In waste recycling, to increase productivity.
iv. In plant breeding, by incorporating useful genes (e.g., nif genes, the nitrogen-fixing genes).
v. In bringing pest resistance to agricultural crops.
vi. In the treatment of diseases by way of gene therapy, etc.
Such genetic engineering and biotechnological processes involve knowledge of an enormous number of genes, their cloning and their protein sequences. Computers and newly evolved software packages are used for these purposes. Thus biological studies are supported by electronic computers. This new integrated field constitutes bioinformatics.
Scope of Bioinformatics
1. Bioinformatics helps to create electronic databases on genomes and protein sequences, from single-celled organisms to multicellular organisms.
2. It provides techniques by which three-dimensional models of biomolecules can be understood along with their structure and function.
3. It integrates mathematical, statistical and computational methods to analyse biological, biochemical and biophysical data.
4. Bioinformatics deals with methods for storing, retrieving and analysing biological data such as nucleic acid (DNA/RNA) and protein sequences, structures, functions, pathways and genetic interactions.
5. The computational methods in bioinformatics allow probing not only at the genome or protein level but up to the whole-organism or ecosystem level of organization.
6. It provides genome-level data for understanding normal biological processes and for explaining the malfunctioning of genes, which aids in diagnosing diseases and designing new drugs.
Definition of Database:
A database is a coherent collection of data with inherent meaning, kept for future use. It is a general repository of voluminous information or records to be processed by a program.
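Sequence databases commonly exchange records in the plain-text FASTA format. In this format, a header line begins with '>' and is followed by one or more sequence lines. The minimal Python sketch below shows a program processing such records into a dictionary keyed by record ID. It makes three assumptions: the ID is the first word of the header, blank lines carry no data, and any text before the first header can be ignored. Real database files may add their own header conventions.

```python
from typing import Dict, List, Optional


def parse_fasta(text: str) -> Dict[str, str]:
    """Read FASTA-formatted text into a {record_id: sequence} dictionary."""
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # blank lines carry no data
        if line.startswith(">"):
            # Save the previous record before starting a new one.
            if current_id is not None:
                records[current_id] = "".join(chunks)
            words = line[1:].split()
            # Assumption: the record ID is the first word of the header line.
            current_id = words[0] if words else ""
            chunks = []
        elif current_id is not None:
            # Sequence lines are joined and upper-cased; lines before
            # the first header are skipped.
            chunks.append(line.upper())
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


if __name__ == "__main__":
    # Hypothetical example records, not taken from any real database.
    example = """>seq1 hypothetical DNA fragment
ATGGCC
TTAG
>seq2 hypothetical peptide
MKT
"""
    print(parse_fasta(example))  # {'seq1': 'ATGGCCTTAG', 'seq2': 'MKT'}
```

A dictionary is used here only for simplicity. It assumes record IDs are unique, so a later record with a repeated ID would overwrite an earlier one.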
Databases are broadly classified as
1. Generalized databases
2. Specialized databases
Databases on the structural organisation of DNA, proteins and carbohydrates are included under generalized databases. Databases of Expressed Sequence Tags (ESTs), Genome Survey Sequences (GSS), Single Nucleotide Polymorphisms (SNPs) and Sequence Tagged Sites (STSs), as well as RNA databases, are included under specialized databases.
Generalized databases contain
a. sequence databases and
b. structure databases.
Sequence databases are the sequence records of either nucleotides or amino acids; the former are the nucleic acid databases and the latter are the protein sequence databases. Structure databases are the individual records of macromolecular structures.
Nucleic acid databases are further classified as
a. primary databases and
b. secondary databases.
Primary databases contain data in their original form, taken as such from the source, e.g., GenBank (NCBI, USA), SWISS-PROT (protein sequences, Switzerland) and protein 3D structure databases.
Secondary databases, also called value-added databases, contain annotated data and information, e.g., OMIM (Online Mendelian Inheritance in Man) and GDB (the human Genome Database).
Nucleic acid sequence databases
The European Molecular Biology Laboratory (EMBL), the National Center for Biotechnology Information (NCBI) and the DNA Data Bank of Japan (DDBJ) are the three premier institutes recognised as the authorities on nucleotide sequence databases. They can be reached at:
www.ebi.ac.uk/embl (EMBL)
www.ncbi.nlm.nih.gov/genbank (NCBI)
www.ddbj.nig.ac.jp (DDBJ)
Protein sequence databases:
Protein sequence databases provide high-level annotations such as descriptions of protein function, domain structure (configuration), amino acid sequence, post-translational modifications, variants, etc. The SWISS-PROT groups at SIB (Swiss Institute of Bioinformatics) and EBI (European Bioinformatics Institute) have developed protein sequence databases. SWISS-PROT is available at http://www.expasy.ch/sprot-top.html.
Fig. Home page of DDBJ
Genome sequencing:
DNA from an organism's genome, once cut into fragments, can be separated by size using a technique called electrophoresis. When DNA is subjected to electrophoresis, it migrates towards the positive electrode because DNA is a negatively charged molecule. Smaller DNA fragments move faster than longer ones. By comparing the distances the DNA fragments migrate, their sizes (number of bases) can be estimated. The sequence of bases in the DNA fragments can then be identified by chemical or biochemical methods. Nowadays, automated sequencing machines called sequenators have been developed to read hundreds of bases of DNA. The DNA sequence data are then stored in computer-accessible form.
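One common way to turn migration distance into fragment size is to run a ladder alongside the sample. A ladder is a set of marker fragments of known size. The next step is to fit log10(size) against distance as a straight line, and then read the sizes of unknown fragments off that line. The Python sketch below makes two assumptions. First, the log-linear relationship holds across the range covered by the ladder. Second, the ladder has at least two different distances. The ladder values are hypothetical.

```python
import math
from typing import List, Tuple


def fit_standard_curve(ladder: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of log10(size_bp) = slope * distance_mm + intercept.

    `ladder` holds (migration distance in mm, size in bp) pairs for marker
    fragments of known size run on the same gel as the sample.
    """
    xs = [distance for distance, _ in ladder]
    ys = [math.log10(size) for _, size in ladder]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # negative: longer fragments travel a shorter distance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_size(distance_mm: float, slope: float, intercept: float) -> float:
    """Estimate an unknown fragment's size in bp from how far it migrated."""
    return 10 ** (slope * distance_mm + intercept)


if __name__ == "__main__":
    # Hypothetical ladder: (distance migrated in mm, known size in bp).
    ladder = [(10.0, 10000), (30.0, 1000), (50.0, 100)]
    slope, intercept = fit_standard_curve(ladder)
    print(round(estimate_size(40.0, slope, intercept)))  # 316
```

Estimates are only trustworthy between the smallest and largest ladder fragments. Outside that range, the straight-line assumption usually breaks down.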
DNA library:
A DNA library is a collection of DNA fragments that together contain all the sequences of a single organism.
cDNA library (Complementary DNA):
In a cDNA library, DNA copies of messenger RNA are made using the enzyme reverse transcriptase. cDNA libraries are smaller than genomic libraries and contain DNA copies only of the genes that are expressed as mRNA.
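At the level of sequence strings, the copying step above can be illustrated as follows. Reverse transcriptase builds a DNA strand whose bases pair with the mRNA: A with T, U with A, G with C and C with G. The two strands are antiparallel, so the copy reads in the opposite direction. This minimal Python sketch is a string model only. It ignores primers, poly(A) tails, incomplete copying and the enzymology itself, and the example mRNA is hypothetical.

```python
# Base pairing used when reverse transcriptase copies RNA into DNA:
# RNA A pairs with DNA T, RNA U with DNA A, G with C and C with G.
RNA_TO_DNA_PARTNER = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_PARTNER = {"A": "T", "T": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna: str) -> str:
    """Return the cDNA strand complementary to an mRNA, both written 5'->3'."""
    # The strands are antiparallel, so each base is paired and the result is
    # read in reverse to keep the conventional 5'->3' direction.
    return "".join(RNA_TO_DNA_PARTNER[base] for base in reversed(mrna.upper()))


def second_strand_cdna(first_strand: str) -> str:
    """Return the DNA strand complementary to the first strand (5'->3')."""
    return "".join(DNA_PARTNER[base] for base in reversed(first_strand.upper()))


if __name__ == "__main__":
    mrna = "AUGGCU"  # hypothetical short mRNA, written 5'->3'
    first = first_strand_cdna(mrna)
    print(first)                      # AGCCAT
    print(second_strand_cdna(first))  # ATGGCT (the mRNA with T in place of U)
```

The second strand comes out identical to the mRNA with thymine in place of uracil. This is why double-stranded cDNA can stand in for the expressed gene in a library.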
In representing either nucleotides or proteins, the IUB/IUPAC standards are followed. The accepted amino acid codes for proteins include:
T: Threonine
Z: Glutamate/glutamine
X: any amino acid
*: translation stop
- (hyphen): gap of indeterminate length
The nucleic acid codes are as follows (FASTA format):
A: adenosine
B: G, T or C
C: cytidine
D: G, A or T
G: guanosine
R: purines (guanine, adenine)
T: thymidine
Y: pyrimidines (thymidine, cytosine)
U: uridine
H: A, C or T
V: G, C or A
N: A, G, C or T (any base)
- (hyphen): gap of indeterminate length
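The ambiguity codes above let a single pattern stand for several sequences. For example, R matches either A or G. The minimal Python sketch below expands each code to the bases it allows and scans a DNA sequence for a pattern. It assumes plain DNA input, treats U as T, and does not handle the gap code. For completeness it also includes the standard IUPAC codes K, M, S and W, which the list above does not show.

```python
from typing import Dict, List

# Each IUPAC nucleotide code mapped to the DNA bases it can stand for.
# K, M, S and W are standard IUPAC codes that do not appear in the list above.
IUPAC_DNA: Dict[str, str] = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "U": "T",                       # uridine, treated as T when scanning DNA
    "R": "AG", "Y": "CT",           # purines, pyrimidines
    "K": "GT", "M": "AC", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",                    # any base
}


def matches(pattern: str, sequence: str) -> bool:
    """True if a plain DNA sequence fits an IUPAC pattern of the same length."""
    if len(pattern) != len(sequence):
        return False
    # A position matches when the sequence base is one the code allows.
    # An unknown code in the pattern raises KeyError; '-' is not handled.
    return all(base in IUPAC_DNA[code]
               for code, base in zip(pattern.upper(), sequence.upper()))


def find_motif(pattern: str, sequence: str) -> List[int]:
    """Return the 0-based start positions where the IUPAC pattern occurs."""
    k = len(pattern)
    return [i for i in range(len(sequence) - k + 1)
            if matches(pattern, sequence[i:i + k])]


if __name__ == "__main__":
    print(find_motif("GNC", "GACGTCGGC"))  # [0, 3, 6]
```

This brute-force scan checks every position, which is fine for short sequences. Database search tools use much faster indexing strategies.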
To specialize in bioinformatics, knowledge of both biology and information technology is required. A biologist needs to know programming, code optimization and cluster analysis, as these are bioinformatics methods. Biologists should also be familiar with key algorithms (sets of steps for solving a problem). Programming languages that help in bioinformatics include C, C++, Java and FORTRAN, and working knowledge of the Linux and UNIX operating systems is also useful. Besides these, knowledge of database systems such as Oracle and Sybase is essential. On the mathematical side, knowledge of calculus and statistical techniques is needed. Knowledge of CGI (Common Gateway Interface) scripts is also needed. With the above, a bioinformaticist can collect, organize, search and analyze biological data, viz., nucleic acid and protein sequences.
Uses of Bioinformatics
1. It helps to understand gene structure and protein synthesis.
2. It helps us learn more about diseases.
3. It helps to understand more about fundamental biology and the thread of life, DNA.
4. It paves the way for medical and bioengineering applications.
5. It helps to apply biophysical and biotechnological principles to biological studies. In turn, it helps to design new drugs for health care and new chemical compounds for environmental management.
Protein structure
Proteins are linear chain molecules made up of units called amino acids. About twenty different amino acids are used to build protein chains. These chains are called polypeptide chains; they often contain from a few to several hundred amino acids linked to each other by peptide bonds. In a large protein, several polypeptide chains may form subunits. For example, haemoglobin consists of four subunits (two alpha and two beta chains), each harbouring haem, an iron-containing molecule. Although the peptide bond itself is rigid and planar, the chain can rotate about the bonds on either side of it, so the backbone is fairly flexible. As a result, oligopeptide and polypeptide chains fold into convoluted shapes. Every protein folds in a particular way to form a distinctive configuration suited to its specific function.
The configuration of a protein is determined primarily by its amino acid side chains. Some side chains are electrically charged (positive or negative). Others, called polar side chains, are electrically neutral overall but contain atoms that strongly attract electrons. A third group of amino acids have non-polar, or hydrophobic, side chains. Thus proteins fold up in such a way that the non-polar (hydrophobic) groups remain buried inside the molecule, while the polar and charged groups remain on the outside.
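The folding rule above depends on sorting side chains into charged, polar and non-polar groups. The minimal Python sketch below counts how a protein sequence, written in one-letter codes, divides among these three groups. It assumes one common textbook grouping of the 20 standard amino acids. Authors place residues such as G, C, H, W and Y differently, so the sets here are illustrative rather than definitive. The example peptide is hypothetical.

```python
from typing import Dict

# One common textbook grouping of the 20 standard amino acids by side chain.
# Assumption: grouping schemes differ between authors, so these sets are
# illustrative only.
CHARGED = set("DEKRH")        # acidic D, E; basic K, R, H
POLAR = set("STNQCYG")        # polar but uncharged
NONPOLAR = set("AVLIMFWP")    # hydrophobic


def side_chain_class(residue: str) -> str:
    """Classify one amino acid (one-letter code) as charged, polar or nonpolar."""
    r = residue.upper()
    if r in CHARGED:
        return "charged"
    if r in POLAR:
        return "polar"
    if r in NONPOLAR:
        return "nonpolar"
    raise ValueError(f"not a standard one-letter amino acid code: {residue!r}")


def composition(sequence: str) -> Dict[str, float]:
    """Fraction of residues in each side-chain class of a protein sequence."""
    if not sequence:
        raise ValueError("empty sequence")
    counts = {"charged": 0, "polar": 0, "nonpolar": 0}
    for residue in sequence:
        counts[side_chain_class(residue)] += 1
    return {cls: n / len(sequence) for cls, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical short peptide: 2 of 7 residues charged (K, E), 5 of 7 nonpolar.
    print(composition("MKLLVAE"))
```

Composition alone does not predict how a protein folds. It only shows how much hydrophobic material a chain has available to bury inside the molecule.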
The sequential, linear arrangement of amino acids in a polypeptide represents its primary structure. The folding of the protein chain into recognizable modules such as alpha helices and beta sheets represents its secondary structure. The three-dimensional shape of a polypeptide is called its tertiary structure. Alpha helices and beta sheets provide further stability to the protein structure.
Proteins synthesized inside a cell undergo the configurational changes described above to attain stable structures; otherwise, they are digested or destroyed by cellular proteolytic enzymes. Proteins take on different roles, as structural proteins and as functional proteins such as enzymes and hormones.
In proteomics, amino acid sequences are read by automated sequenators and stored in computers as internationally available databases. Information on the three-dimensional structure of proteins is stored in another computerized database, the Protein Data Bank, where protein structure is defined by its three-dimensional form.
Protein Model
In proteomics, models are constructed to delineate information about a protein at the atomic and molecular levels. X-ray crystallography provides atomic details from which a skeleton model of a protein can be built. Using such atomic data, computers nowadays generate graphic images of molecules on high-resolution screens. Computer modeling of proteins began as early as 1970. Computer-generated models not only depict the properties of the amino acids in a protein but also help in understanding the protein's function. One such computer graphic model is the “Glowing coal” model.
Uses:
1. Protein structure helps in understanding the arrangement of biomolecules in tissue and cellular architecture.
2. Protein structures, protein models and computer-aided graphic models help in understanding biological reactions mediated by enzymes (proteins).
3. Graphic models generated by computers are valuable for predicting which fragments of a medically important protein can be used to design drugs and vaccines.
4. Proteomics also helps the chemical industry to manufacture drugs, various chemical compounds and enzymes.