Thursday, April 16, 2020

Application of Bioinformatics


Definition:
Bioinformatics deals with the creation and maintenance of databases of biological information such as nucleic acid sequences, gene sequences and protein sequences. It has applications in gene therapy, diagnostics, drug design, crop improvement, biochemical processes, etc. It involves data analysis and the creation of electronic databases on genomes and protein molecules.
History of Bioinformatics
From the beginning of the post-Mendelian period, the genetic principles put forward by various geneticists have revealed the functional behaviour of discrete hereditary particles, called genes, in the expression of various morphological (phenotypic) and biochemical traits of organisms. During the last three decades, advances in molecular biology, the invention of computers, rapid developments in scientific methodology and the introduction of nano-level instrumentation have paved the way for the origin of bioinformatics.
Preliminary discoveries such as the amino acid sequence of bovine insulin (1950s), the nucleic acid sequence of yeast alanine tRNA with 77 bases (1960s) and the X-ray crystallographic structures of proteins formed the basis of, and the original databases for, data entry and file making. Further advances in computational methods, employing rapid search algorithms (BLAST) with hundreds of command options and input formats, gave birth to the science of bioinformatics.
Applications
Bioinformatics is a synergistic study of both biotechnology and information technology. In biotechnology, living organisms at both the micro and macro levels of organization are employed and manipulated to harvest products beneficial to humans. In recent years biotechnology has been turning into an industrial science through genetic engineering.
Genetic engineering helps scientists to incorporate a single gene into an organism and synthesize the desired product without affecting other genes and their functions. In this way biological or microbial systems are manipulated.
Scope of Genetic Engineering
i. To manufacture drugs and other life saving bioproducts such as insulin, growth hormones, interferons, cytokines and monoclonal antibodies.
ii. For environmental management to reduce or abate the pollution load in soil or water.
iii. In waste recycling to increase productivity.
iv. In plant breeding by the incorporation of useful genes (nif genes = nitrogen fixing genes).
v. In bringing pest resistance to agricultural crops.
vi. In the treatment of diseases by way of gene therapy, etc.
Such genetic engineering and biotechnological processes involve knowledge of an enormous number of genes, their coding and their protein sequences. Computers and newly evolved software packages are utilised for these purposes. Biological studies are thus supported by electronic computers. This new integrated field constitutes bioinformatics.
Scope of Bioinformatics
1.       Bioinformatics helps to create an electronic database on genomes and protein sequences from single celled organisms to multicellular organisms.
2.       It provides techniques by which three-dimensional models of biomolecules could be understood along with their structure and function.
3.       It integrates mathematical, statistical and computational methods to analyse biological, biochemical and biophysical data.
4.       Bioinformatics deals with methods for storing, retrieving and analysing biological data such as nucleic acid (DNA/RNA) and protein sequences, structures, functions, pathways and genetic interactions.
5.       The computational methods in bioinformatics extend information for probing not only at genome level or protein level but up to whole organism level, or ecosystem level of organization.
6.       It provides genome level data for understanding normal biological processes and explains the malfunctioning of genes leading to diagnosing of diseases and designing of new drugs.
Definition of Database :
‘Creating’ a database means assembling a coherent collection of data with inherent meaning, to be used for future applications. A database is a general repository of voluminous information or records to be processed by a program.
Databases are broadly classified as
1        Generalized databases
2        Specialized databases.
Databases on the structural organisation of DNA, proteins and carbohydrates are included under generalized databases.
The following are included under specialized databases:
a.       Databases of Expressed Sequence Tags (ESTs)
b.       Genome Survey Sequences (GSS)
c.       Single Nucleotide Polymorphisms (SNPs)
d.       Sequence Tagged Sites (STSs)
e.       RNA databases.
Generalized databases contain
a.       sequence databases
b.       structure databases.
Sequence databases hold the sequence records of either nucleotides or amino acids. The former are the nucleic acid databases and the latter are the protein sequence databases.
Structure databases hold the individual records of macromolecular structures.
Nucleic acid databases are further classified as
a.       primary databases and
b.       secondary databases.
Primary databases contain data in their original form, taken as such from the source, e.g., GenBank (NCBI, USA), the protein database SWISS-PROT (Switzerland), protein 3D structure databases, etc.
Secondary databases, also called value-added databases, contain annotated data and information, e.g., OMIM (Online Mendelian Inheritance in Man) and GDB (Genome Database, human).
Nucleic acid sequence databases
The European Molecular Biology Laboratory (EMBL), the National Center for Biotechnology Information (NCBI) and the DNA Data Bank of Japan (DDBJ) are the three premier institutes considered the authorities on nucleotide sequence databases. They can be reached at
www.ebi.ac.uk/embl (EMBL)
www.ncbi.nlm.nih.gov/genbank (NCBI)
www.ddbj.nig.ac.jp (DDBJ)

Protein sequence databases :
Protein sequence databases provide high-level annotations such as descriptions of protein function, domain structure (configuration), amino acid sequence, post-translational modifications, variants, etc. The SWISS-PROT groups at SIB (Swiss Institute of Bioinformatics) and EBI (European Bioinformatics Institute) have developed the protein sequence databases. SWISS-PROT is available at http://www.expasy.ch/sprot-top.html.




Fig. Home page of DDBJ

Genome sequencing :
DNA fragments from the genome of an organism can be separated by size using a technique called electrophoresis. When the DNA of an organism is subjected to electrophoresis, the fragments migrate towards the positive electrode because DNA is a negatively charged molecule. Smaller DNA fragments move faster than longer ones. By comparing the distances that the DNA fragments migrate, their lengths in bases can be estimated. The sequence of bases in the DNA fragments can then be identified by chemical or biochemical methods. Automated sequencing machines, called sequenators, have now been developed to read hundreds of bases of DNA. The DNA sequence data are then stored in a computer-accessible form.
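The size comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes that the logarithm of fragment length falls roughly linearly with migration distance (a common approximation over a limited size range), and the marker distances and lengths in LADDER are hypothetical values, not measurements from any real gel.

```python
import math

# Hypothetical size markers: (migration distance in mm, fragment length in bases).
# These numbers are made up for illustration only.
LADDER = [(10.0, 10000), (20.0, 3000), (30.0, 1000), (40.0, 300), (50.0, 100)]


def fit_log_linear(ladder):
    """Least-squares fit of log10(length) = slope * distance + intercept."""
    n = len(ladder)
    xs = [distance for distance, _ in ladder]
    ys = [math.log10(length) for _, length in ladder]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_length(distance_mm, slope, intercept):
    """Estimate fragment length (in bases) from its migration distance."""
    return 10 ** (slope * distance_mm + intercept)


slope, intercept = fit_log_linear(LADDER)
# The slope is negative: fragments that travel further are shorter.
unknown_length = estimate_length(35.0, slope, intercept)
```

Here a fragment of unknown size that migrated 35 mm is placed on the calibration line fitted to the markers. Real gels deviate from this straight line for very large or very small fragments, so the estimate is only a rough guide.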

DNA library :
A DNA library is a collection of DNA fragments, which contains all the sequences of a single organism.

cDNA library (Complementary DNA) :
In a cDNA library, DNA copies of messenger RNA are made using the enzyme reverse transcriptase. cDNA libraries are smaller than genomic libraries and contain only the DNA molecules corresponding to expressed genes.
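As a rough illustration of what reverse transcription produces, the short Python sketch below derives the first-strand cDNA sequence from an mRNA sequence by base pairing. The function name first_strand_cdna and the example sequence are hypothetical, and the sketch ignores everything about the real enzymatic reaction (primers, poly-A tails, second-strand synthesis).

```python
# Base-pairing rules when an RNA template is copied into DNA.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna):
    """Return the first-strand cDNA, written 5'->3', for an mRNA written 5'->3'."""
    mrna = mrna.upper()
    # Pair each RNA base with its complementary DNA base.
    complement = [RNA_TO_DNA_COMPLEMENT[base] for base in mrna]
    # The new strand is antiparallel to the template, so reverse it.
    return "".join(reversed(complement))


example_cdna = first_strand_cdna("AUGGCC")
```

A sequence containing a character outside A, U, G and C raises a KeyError, which is a deliberately simple way of rejecting invalid input in this sketch.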
In the representation of either the nucleotides or the proteins, IUB/ IUPAC standards are followed. The accepted amino acid codes for proteins are given below.
T - Threonine
Z - Glutamate/glutamine
X - any
* - Translation stop
- (hyphen) - gap of indeterminate length

The nucleic acid codes are as follows (FASTA format):
A - adenosine
C - cytidine
G - guanosine
T - thymidine
U - uridine
R - purines (guanine, adenine)
Y - pyrimidines (thymidine, cytosine)
B - G, T or C
D - G, A or T
H - A, C or T
V - G, C or A
N - A, G, C or T
- (hyphen) - gap of indeterminate length
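To show how such ambiguity codes are used in practice, here is a minimal Python sketch that turns a pattern written with the nucleic acid codes listed above into a regular expression and searches a sequence with it. The table covers only the codes given in this post, and the names IUPAC_CODES, pattern_to_regex and find_matches are made up for this example.

```python
import re

# Each code maps to the set of bases it can stand for (codes as listed above).
IUPAC_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "GA",    # purines
    "Y": "TC",    # pyrimidines
    "B": "GTC",
    "D": "GAT",
    "H": "ACT",
    "V": "GCA",
    "N": "AGCT",  # any base
}


def pattern_to_regex(pattern):
    """Translate a pattern of nucleic acid codes into a compiled regex."""
    parts = []
    for code in pattern.upper():
        bases = IUPAC_CODES[code]
        # A single base is matched literally; several bases become a character class.
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return re.compile("".join(parts))


def find_matches(pattern, sequence):
    """Return start positions (0-based) of non-overlapping matches in the sequence."""
    regex = pattern_to_regex(pattern)
    return [match.start() for match in regex.finditer(sequence.upper())]


positions = find_matches("GAN", "GATTGACGAA")
```

Because re.finditer reports non-overlapping matches only, overlapping occurrences of a pattern would be missed; that is acceptable for a sketch but worth knowing before reusing the idea.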
To specialize in bioinformatics, knowledge of both biology and information technology is required. A biologist needs to know programming, code optimization and cluster analysis, as these are bioinformatics methods. Biologists should also be familiar with key algorithms (an algorithm is a defined set of steps). Programming languages that help in bioinformatics include C, C++, Java and FORTRAN, and familiarity with the Linux and UNIX operating systems is also useful. Knowledge of database systems such as Oracle and Sybase is essential as well. On the mathematical side, knowledge of calculus and statistical techniques is needed. Knowledge of CGI (Common Gateway Interface) scripts is also needed. With the above, a bioinformaticist can collect, organize, search and analyze biological data, viz., nucleic acid and protein sequences.

Uses of bioinformatics
1.       It helps to understand gene structure and protein synthesis.
2.       It helps to know more about the diseases.
3.       It helps to understand more about fundamental biology and the thread of life, the DNA.
4.       It paves the way for medical and bioengineering applications.
5.       It helps to apply biophysical and biotechnological principles to biological studies. In turn, this helps to design new drugs and new chemical compounds to be used in health and environmental management respectively.

Protein structure
Proteins are linear chain molecules made up of units called amino acids. Approximately twenty different amino acids make up protein chains. These chains are called polypeptide chains, as they contain from a few to several hundred amino acids linked to each other by peptide bonds. Several polypeptide chains can form the subunits of a large protein. For example, haemoglobin consists of four subunits (two alpha and two beta chains), each harbouring haem, an iron-containing molecule. The polypeptide backbone is fairly flexible, because rotation is possible around the bonds on either side of each peptide bond. As a result, oligopeptide and polypeptide chains fold into convoluted shapes. Every protein folds in a particular way to form a distinctive configuration for its specific function. The protein configuration is determined primarily by the amino acid side chains. Some amino acid side chains are electrically charged (positive or negative). Others are polar but electrically neutral and attract electrons strongly. A third group of amino acids has non-polar or hydrophobic side chains. Proteins thus fold up in such a way that the non-polar, hydrophobic groups remain buried inside the molecule and the polar and charged groups remain outside.
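The three side-chain groups described above can be tallied for any sequence written in one-letter amino acid codes. The Python sketch below is only an illustration: the exact grouping differs between textbooks (histidine, tyrosine, cysteine and glycine in particular are placed differently by different authors), so the three sets used here are an assumption, and the function names are invented for this example.

```python
# One common textbook grouping of the twenty amino acids by side chain.
# Assumption: other sources classify H, Y, C and G differently.
CHARGED = set("DEKRH")       # acidic and basic side chains
POLAR = set("STNQYC")        # polar but uncharged side chains
NONPOLAR = set("AVLIMFWPG")  # non-polar (hydrophobic) side chains


def side_chain_class(residue):
    """Return 'charged', 'polar' or 'nonpolar' for a one-letter residue code."""
    residue = residue.upper()
    if residue in CHARGED:
        return "charged"
    if residue in POLAR:
        return "polar"
    if residue in NONPOLAR:
        return "nonpolar"
    raise ValueError("unknown amino acid code: " + residue)


def class_profile(sequence):
    """Count how many residues of a sequence fall into each side-chain class."""
    counts = {"charged": 0, "polar": 0, "nonpolar": 0}
    for residue in sequence:
        counts[side_chain_class(residue)] += 1
    return counts


profile = class_profile("MKTLLS")
```

A profile like this says nothing about where residues end up in the folded protein; it only summarises the composition that drives the burying of hydrophobic groups.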
The sequential, linear arrangement of amino acids in a polypeptide represents its primary structure. The folding of the protein chain to form recognizable modules such as alpha helices and beta sheets represents its secondary structure. The three-dimensional shape of a polypeptide is called its tertiary structure. Alpha helices and beta sheets provide further stability to the protein structure.
Proteins synthesized inside a cell undergo the above-mentioned configurational changes to attain stable structures. Otherwise, they will be digested or destroyed by the cellular proteolytic enzymes. Proteins take on different roles as structural proteins and as functional proteins such as enzymes and hormones.
In proteomics, amino acid sequences are read by automated sequenators and stored in computers as internationally available databases. Information on the three-dimensional structure of proteins is stored in another computerized database called the Protein Data Bank. Only three-dimensional forms are used to define protein structure.

Protein Model
In proteomics, models are constructed to delineate information about a protein at the atomic and molecular levels. X-ray crystallography can give a skeleton model of a protein from the atomic details it reveals. With atomic data, computers nowadays generate graphic images of the molecules on high-resolution screens. Computer modelling of proteins began as early as 1970. Computer-generated models not only depict the properties of the amino acids in a protein but also help to understand protein function. One of the computer graphic models is the “Glowing coal” model.
Uses :
1.       Protein structure helps in understanding biomolecular arrangement in tissue or cellular architecture.
2.       Protein structures, protein models and computer aided graphic models help to understand biological reactions mediated by enzymes (proteins).
3.       Graphic models provided by computers are valuable to predict which fragments of a medically important protein can be used to design drugs and vaccines.
4.       Proteomics also helps in chemical industries to manufacture drugs, various chemical compounds and enzymes.
