Thursday, March 27, 2008

Job opportunities in Bioinformatics

After a degree in Bioinformatics, most people who think about a job or a research career choose to do a Ph.D. in the US or Europe.

Most European universities prefer students who are genuinely interested in doing research for 3-4 years. Most of them do not require a TOEFL score to apply for a Ph.D.

American universities select students based on GRE and TOEFL scores, and a Ph.D. there takes about 5 years.

Finally, companies prefer students who are strong on the software side.

SHELL Scripts for Simple Bioinformatics Analysis

Simple SHELL script for parsing BLAST output

1. To parse the sequence names from BLAST output.

"grep" is a very powerful Unix command for retrieving the lines that match a particular pattern from a file.

Syntax:
grep "pattern" input_file

Example: grep ">" Blast_output.txt

In the above example, the grep command retrieves the lines that contain the ">" symbol. In a BLAST output file all the sequence names start with ">", so this gives you all the sequence names in the file.
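As a small illustration of the grep step, the sketch below builds a tiny stand-in for a BLAST report (its contents are invented for the example) and then lists and counts the name lines. Anchoring the pattern with "^" is an addition made here so that only lines beginning with ">" are matched.

```shell
# Build a tiny stand-in for a BLAST report (contents invented for illustration).
sample=$(mktemp)
cat > "$sample" <<'EOF'
Query= test_query
>gi|111|ref|NP_000001.1| hypothetical protein A
 Score = 50.1 bits (118)
Sbjct: 1   MKVLAAGIV 9
>gi|222|ref|NP_000002.1| hypothetical protein B
 Score = 42.0 bits (97)
Sbjct: 5   MKILSAGLV 13
EOF

# List the sequence names: every line that begins with ">".
names=$(grep "^>" "$sample")
echo "$names"

# Count the names instead of printing them.
count=$(grep -c "^>" "$sample")
echo "Number of hits: $count"

rm -f "$sample"
```

On a real report, replace the temporary file with the name of your own BLAST output file.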


2. Parsing the Sequence names and the sequences from the BLAST output

"egrep" is a powerful command for retrieving the lines that match any of several patterns from a file.

Syntax:
egrep "pattern1|pattern2|pattern3" filename

Do not put spaces around "|", because they would become part of the patterns.

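Example:
Below is a combination of shell and Perl scripts for parsing the BLAST output.

# Shell: keep the name lines and the Sbjct lines, then strip the "Sbjct:" label
egrep ">|Sbjct" BLAST_output.txt | sed 's/Sbjct://' > output.txt

# Perl: print the name lines as they are and only the sequence from the other lines
open(FH, "<", "output.txt") or die "Cannot open output.txt: $!";
while ($ln = <FH>)
{
if ($ln !~ m/>/)
{
# remaining fields are: start position, sequence, end position
@temp = split(' ', $ln);
print "$temp[1]\n";
}
else
{
print $ln;
}
}
close(FH);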

In the above example, egrep retrieves the lines that match ">" or "Sbjct", sed removes the "Sbjct:" label, and the result is stored in output.txt. The Perl script then parses the sequences.
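The same parsing can also be sketched in the shell alone, with awk in place of the Perl loop. This is one possible variant rather than the script above; the sample report is made up, and it assumes each Sbjct line has the layout label, start position, sequence, end position.

```shell
# Made-up BLAST fragment with two hits.
blast=$(mktemp)
cat > "$blast" <<'EOF'
>gi|111|ref|NP_000001.1| hypothetical protein A
Query: 1   MKVLAAGIV 9
           MKVLAAGIV
Sbjct: 1   MKVLAAGIV 9
>gi|222|ref|NP_000002.1| hypothetical protein B
Query: 1   MKVLAAGIV 9
           MK+L+AG+V
Sbjct: 5   MKILSAGLV 13
EOF

# Name lines are printed unchanged; for Sbjct lines only the third
# whitespace-separated field (the aligned sequence) is printed.
parsed=$(awk '
  /^>/     { print; next }
  /^Sbjct/ { print $3 }
' "$blast")
echo "$parsed"

rm -f "$blast"
```

Each name line is followed by the subject sequence of that hit, which is the same layout the Perl loop produces.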

Wednesday, March 26, 2008

Bioinformatics in India / Bioinformatics institutes and Companies in India.


Visit 123Bioinformatics.com for more updates.



India has produced many world-renowned bioinformaticians. Below is a list of institutes and companies doing Bioinformatics in India.

1. IISc, Bangalore.

2. NCBS, Bangalore.

3. CCMB, Hyderabad.

4. CDFD, Hyderabad.

5. IIIT, Hyderabad.

6. IIT, Delhi.

7. Madras University, Chennai.

8. Madurai Kamaraj University, Madurai.

9. IBAB, Bangalore.

10. IGIB, Delhi.

11. TIFR, Bombay.

12. Biobase, Bangalore.

13. AstraZeneca, Bangalore.

14. Avesthagen, Bangalore.

15. Cell Lines, Bangalore.

16. TCS, Hyderabad.

17. CDRI, Lucknow.

18. LaCONES (Conservation Genetics), Hyderabad.

19. CDAC, Pune.

20. IICT, Hyderabad.

21. IIT, Mumbai.

22. IIT, Chennai.

23. Bharathidasan University, Trichy.

24. Madras University (Biophysics), Chennai.

25. Anna University, Chennai.

26. IISc (Molecular Biophysics Unit), Bangalore.

27. IISc (Department of Physics).

28. IISc (Department of Biochemistry).

29. IISc (MRDG).

30. NEERI, Nagpur.

31. NCL, Pune.

32. NARI, Pune.

33. NCCS, Pune.

34. NIV, Pune.

35. ICRISAT, Hyderabad.

36. IISc (Centre for Ecological Sciences), Bangalore.

37. IMTECH, Chandigarh.

38. ICGEB, Delhi.

39. JNU, Delhi.

40. JNCASR, Bangalore.

41. WII, Dehradun.

42. Monsanto, Bangalore.

43. NBRC, Haryana.

44. National Centre for Plant Genome Research, JNU campus, New Delhi.

45. NII, Delhi.

46. IISER, Pune.

47. BII, Noida.

48. Institute of Cheminformatics Studies, Noida.

49. RGCB, Trivandrum.

50. CMFRI, Cochin.

51. National Institute of Oceanography, Goa.

52. NIMHANS, Bangalore.
And many more ...

Branches of Bioinformatics





These are some of the important fields in bioinformatics


1. Structural Bioinformatics:

Predicting the 3D structure of a protein from its amino acid sequence. Homology modelling, which uses an already solved (crystallized) protein structure as a template, is one of the most reliable methods for predicting protein structures. MODELLER is one of the most widely used software packages for homology modelling. The Protein Data Bank (PDB) is the database of 3D coordinates of proteins.

Recent Studies ..

Crystal structure of Mycobacterium tuberculosis Rv0760c at 1.50 A resolution, a structural homolog of Delta(5)-3-ketosteroid isomerase.

2. Drug Designing:


Drug design is the approach of finding drugs by design, based on their biological targets. Typically a drug target is a key molecule involved in a particular metabolic or signalling pathway that is specific to a disease condition or pathology, or to the infectivity or survival of a microbial pathogen.
Computer-assisted drug design uses computational chemistry to discover, enhance, or study drugs and related biologically active molecules.

3. Phylogenetics:

Predicting the genetic or evolutionary relationships among a set of organisms. Mitochondrial SNPs and microsatellites (DNA repeats) are mostly used in phylogenetics. MEGA and PAUP* are some of the important software packages. Maximum Parsimony and Maximum Likelihood are the most commonly used methods.

4. Computational biology:

Computational biology is an interdisciplinary field that applies the techniques of computer science, applied mathematics, and statistics to address problems inspired by biology.

5. Population Genetics:

Population genetics is the study of the distribution of genotype frequencies and of how they change under the influence of natural selection, genetic drift, mutation and gene flow. Coalescent theory is one of the most widely used theories for inferring the most recent common ancestor. Arlequin is one of the most widely used software packages in population genetics.

6. Genotype Analysis:
Genotype = genetic variation (SNP, mutation, ...).
1. Studying genotype-phenotype association.
2. Studying genotype frequencies. There is no single dedicated software package for genotype analysis, but the field has been called the "Generation Next Market using Bioinformatics". Genotyping is mostly done using Illumina and Affymetrix (Affy) microarray chips.
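Studying genotype frequencies can be illustrated with a few lines of shell. The sketch below assumes a made-up file with one genotype per individual at a single hypothetical SNP (alleles A and G); it counts each genotype, divides by the number of individuals, and also derives the frequency of allele A.

```shell
# One genotype per individual at a hypothetical SNP (invented data).
genotypes=$(mktemp)
printf '%s\n' AA AG AG GG AA AG AA AG GG AA > "$genotypes"

# Genotype frequencies: count each genotype and divide by the sample size.
total=$(wc -l < "$genotypes")
freqs=$(sort "$genotypes" | uniq -c |
        awk -v n="$total" '{ printf "%s %.2f\n", $2, $1 / n }')
echo "$freqs"

# Allele frequency of A: number of A alleles divided by all alleles.
# gsub returns how many A characters it found on the line.
freq_A=$(awk '{ a += gsub(/A/, "A"); all += length($0) }
              END { printf "%.2f", a / all }' "$genotypes")
echo "Frequency of allele A: $freq_A"

rm -f "$genotypes"
```

For these ten individuals the genotype frequencies come out as AA 0.40, AG 0.40 and GG 0.20, and allele A has frequency 0.60 (12 of 20 alleles).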

2008 July - Recent Studies....

Genome-wide association defines more than 30 distinct susceptibility loci for Crohn's disease.

Estimating coverage and power for genetic association studies using near-complete variation data.

Genetic diversity patterns at the human clock gene period 2 are suggestive of population-specific positive selection.


Environment And Genetics in Lung cancer Etiology (EAGLE) study: an integrative population-based case-control study of lung cancer.


7. Splicing Site prediction:

Splice site prediction is an important application of Bioinformatics, particularly in gene expression studies.

2008 July - Recent Studies ..

ASPicDB: a database resource for alternative splicing analysis.

Diagnostics of pathogenic splicing mutations: does bioinformatics cover all bases?



8. MiRNA prediction:


MiRNA = MicroRNA. MiRNAs have emerged as a new class of gene regulatory elements and have gained prominence in research. A miRNA is a 20-23 nucleotide RNA that regulates one or more genes. Many methods and software packages have been developed to predict these tiny RNAs, but they are still not precise. This means that we need more information from experimental labs to make good predictions.

A miRNA binds to the messenger RNA of its target gene and regulates the gene. Most of the time it down-regulates gene expression. Predicting miRNA targets is also a very important problem in Bioinformatics.
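One simple idea behind many target prediction methods is seed matching: look in the target transcript for sites complementary to the first few nucleotides of the miRNA. The sketch below is only an illustration of that idea, not a real predictor. It assumes the seed is positions 2-8 of the miRNA, and both sequences are treated as example data (the transcript fragment is invented).

```shell
# Example miRNA sequence and an invented 3' UTR fragment (RNA alphabet).
mirna="UGAGGUAGUAGGUUGUAUAGUU"
utr="AAGCUACCUCAGGUUUCUACCUCAA"

# Seed: nucleotides 2-8 of the miRNA (an assumption of this sketch).
seed=$(echo "$mirna" | cut -c2-8)

# Target site: reverse complement of the seed (A<->U, G<->C).
site=$(echo "$seed" | rev | tr 'ACGU' 'UGCA')

# Count how many times the site occurs in the transcript fragment.
hits=$(echo "$utr" | grep -o "$site" | wc -l | tr -d ' ')

echo "seed: $seed"
echo "site: $site"
echo "seed matches in UTR: $hits"
```

Real tools add further criteria on top of such matches, which is part of why experimental data is still needed.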

Database:
miRNA Registry from the Sanger Institute.

MiRNA target prediction software:

Many software packages are available for miRNA and target prediction.

Recent Studies..
MicroRNA signatures of tumor-derived exosomes as diagnostic biomarkers of ovarian cancer.

Accelerated sequence divergence of conserved genomic elements in Drosophila melanogaster.

miRNA expression in the failing human heart: Functional correlates.

Computational analysis of miRNA-mediated repression of translation: Implications for models of translation initiation inhibition.



9. RNA Structure prediction:

The functional form of single stranded RNA molecules frequently requires a specific tertiary structure. The scaffold for this structure is provided by secondary structural elements which are hydrogen bonds within the molecule. This leads to several recognizable "domains" of secondary structure like hairpin loops, bulges and internal loops. There has been a significant amount of bioinformatics research directed at the RNA structure prediction problem.

10. Gene Prediction:

Predicting genes in a genomic sequence based on predefined conditions. Comparative genomics is one of the most effective methods for gene prediction.

Some of the software packages:

GeneMark, Genscan


11. Transcription factor binding site prediction:

Predicting the sites where transcription factors bind. The most common method is to use "Comparative genomics", together with finding clusters of motifs in the noncoding part of a gene.

Predicting functional transcription factor binding through alignment-free and affinity-based analysis of orthologous promoter sequences.

12. Genome Annotation:

Predicting the genes and the coding and noncoding sequences of a genome is called genome annotation.
Most people use comparative genomics to annotate newly sequenced genomes.

GOLD is the database for ongoing genome projects.

13. Ancestry Prediction:

Predicting the ancestry of an individual based on his/her genetic signatures or SNPs.
Mitochondrial SNPs are used in predicting maternal ancestry because mitochondria are passed only from the mother to the child.
Y chromosome SNPs are used in predicting paternal ancestry because the Y chromosome is passed from father to son.
Ancestry is one of the successful fields in Bioinformatics. The Genographic Project led by Dr. Spencer Wells is one of the finest examples.

Recent studies..

Mitochondrial DNA haplogroup D4a is a marker for extreme longevity in Japan.

Analysis of Y-chromosomal biallelic polymorphisms in Sichuan Han of Chinese population

14. Mathematical Modelling:


Using mathematics to predict the outcome of complex real-world problems that cannot be studied in the lab or in reality. Ex: population dynamics.

Recent Studies..

Diagnosed and undiagnosed HIV-infected populations in Europe.

15. Ethnicity Prediction:


Predicting the ethnicity of an individual by using genetic variations. Each ethnicity is defined by a set of genetic variations.

16. Functional Domains prediction:


Predicting, from the protein sequence, the domains that are functionally important, such as the active sites of a protein.

Recent studies ..

Predicting protein function from domain content.


17. Motif Prediction /Pattern matching:


Predicting the motifs or motif clusters that are functionally important.
Ex: regulatory motifs, binding site motifs, miRNA motifs, repeat motifs. Microsatellites are also a kind of motif.
Recent studies...
Biomolecular network motif counting and discovery by color coding.
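Simple pattern matching for repeat motifs such as microsatellites can be sketched with grep and an extended regular expression. The DNA sequence below is invented, and the thresholds (at least 4 copies of CA, at least 3 copies of GCA) are arbitrary choices for the example.

```shell
# Invented DNA sequence containing a CA repeat and a GCA repeat.
seq="TTGACACACACACAGGCTTAGCAGCAGCAGTT"

# -o prints only the matching part, -E enables {n,} repeat counts.
ca=$(echo "$seq" | grep -o -E '(CA){4,}')     # dinucleotide microsatellite
gca=$(echo "$seq" | grep -o -E '(GCA){3,}')   # trinucleotide microsatellite

echo "CA repeat:  $ca ($(( ${#ca} / 2 )) units)"
echo "GCA repeat: $gca ($(( ${#gca} / 3 )) units)"
```

The same approach works for any motif that can be written as a regular expression, for example a binding site with a few ambiguous positions.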


18. Protein - protein interaction:

19. Protein folding:

One of the most famous and important problems, and still unsolved.

20. Database development:


In some sense Bioinformatics can be called a "Comparative Method", because it depends on databases for all of its analyses. Developing databases is therefore a very important activity, and many companies survive by developing and updating databases.

NCBI, PDB and the UCSC Genome Browser are some of the most important databases.

21. Software development:


Incorporating the use of software in biological analysis is what is called "Bioinformatics".

22. Developing Bioinformatics Methods/Approaches :

23. Primer designing:

24. Modeling genetics History:

25. Ancient DNA:

26. Population Genetics Simulations:

27. Finding SNPs:

28. Genome wide Association Studies:

29. Systems Biology:

30. Homology Search:

31. Computational Genomics:

Personalized Medicine

Human anatomy functions in much the same way from one person to another, but human genomes are not identical. Genes respond differently to environment and lifestyle, and each individual has variations in their genome. Because of these variations, different drugs act at different levels in different people.

Nowadays genome-wide disease association studies are of great interest and keep finding new associations between SNPs and diseases. Recent studies confirm the association of SNPs with cancers.

SNPs can even be used to predict traits such as eye color and hair color.

The cost of genotyping keeps falling. By being genotyped, an individual can find the SNPs in their genome, and in the future doctors may prescribe drugs based on those SNPs.

Bioinformatics and CPAN Modules

Hundreds of Bioinformatics Perl modules are available on CPAN for almost all kinds of Bioinformatics analysis.

Here are some of the tasks these modules cover.

1. To format the HTML output of BLAST

2. Automate the BLAST for number of sequences.

3. Running ClustalW

4. Population Genetics modules.

5. Phylogenetics Modules.

Comparative Genomics

"Compare and Predict" is the basic and whole idea of Bioinformatics.

Comparative Genomics is a powerful method in Bioinformatics.

Applications of Comparative Genomics in Bioinformatics:

1. To predict and solve the Protein Structure based on existing solved structures.

2. To annotate the newly sequenced Genomes.

3. To predict the functionally important non-coding region or Patterns.

Perl Scripts for Bioinformatics

Why do Bioinformatics people prefer Perl scripts?

1. Perl makes string processing very easy, and biological data such as genome sequences and protein sequences are strings.

2. Perl has fewer strict rules for writing scripts than other languages, which makes it easy for biologists to write scripts.

3. File processing is very easy in Perl.

4. Perl scripts can be combined with SHELL scripts for processing.

5. Using Perl CGI we can develop the Web pages by combining with HTML.

6. CPAN contains so many Perl Modules which are Specific for Bioinformatics.

7. Perl is used for System administration purpose also.

8. The Perl Template Toolkit is another Perl product that makes web page development very easy for developers.

9. The DBIx modules build on DBI and make database applications much easier to write.

10. Processing / parsing an HTML file is very easy using CPAN modules.

11. File type conversion is also possible in Perl, e.g. DOC to PDF, HTML to PDF, etc.