Saturday, August 9, 2008

Bioinformatics CPAN Modules




Visit 123Bioinformatics.com for more Updates.



Bioperl is the product of a community effort
to produce Perl code which is useful in biology.

Bioperl Tutorial...

Bio::Tools::Run::PiseApplication::bambe

BAMBE Bayesian Analysis in Molecular
Biology and Evolution.

Bio::Tools::HMM
Perl extension to perform Hidden Markov
Model calculations.

Bio::Grep
Perl extension for searching in DNA and
Protein sequences.

Bio::Emboss
Write EMBOSS programs in Perl. This module
allows Perl programmers to access functions
of the EMBOSS (European Molecular Biology
Open Software Suite) package.

ncbinr2phenyxfasta.pl
Transforms the NCBI nr bank into FASTA with
annotated headers.

BLASTaid
A simple interface for byte indexing a
WU-BLAST multi-part report for
faster access. This module was written
to aid access to specific reports within longer,
multi-part WU-BLAST (http://blast.wustl.edu/)
alignment reports.

Bio::Tools::Run::PiseApplication::fasta
Bioperl class for FASTA Sequence database search.

Peptide::Pubmed
Extract peptide sequences from MEDLINE article abstracts.

ONTO-PERL
A collection of Perl modules for dealing
with the Cell Cycle Ontology (CCO) and, in general,
with OBO ontologies (such as the Gene Ontology).

FASTAParse
A lightweight parsing module for handling
FASTA-formatted sequences within larger Perl applications.

Bio::DB::SwissProt
Database object interface to SwissProt retrieval.

Bio::Tree::DistanceFactory
Construct a phylogenetic tree using distance based
methods.

Bio::Tree::Compatible
Testing compatibility of phylogenetic trees with
nested taxa.

Bio::Seq::Quality
Implementation of sequence with residue quality
and trace values.

Microarray
A Perl module for creating and manipulating DNA
Microarray experiment objects.

uniprotdat2fasta.pl
Converts UniProt native text format (.dat or .seq)
into a FASTA file, reporting varsplic, signal, peptide, PTM and conflict annotations.

fasta-shuffle-notryptic.pl
Reads an input FASTA file and produces a shuffled
databank, shuffling each sequence while avoiding
the production of known tryptic (cleaved) peptides.

Bio::Tools::Blast
Bioperl BLAST sequence analysis object.

Bio::Tools::Run::Alignment::Clustalw
Object for the calculation of a multiple sequence
alignment from a set of unaligned sequences or
alignments using the Clustalw program.

Bio::DB::EUtilities
Interface for handling web queries and data retrieval
from Entrez Utilities at NCBI.

Bio::Tools::Run::Alignment::TCoffee
Object for the calculation of a multiple sequence
alignment from a set of unaligned sequences or
alignments using the TCoffee program.

Bio::Tools::Run::Alignment::Amap
Object for the calculation of an iterative multiple
sequence alignment from a set of unaligned sequences
or alignments using the Amap (2.0) program.

Bio::Tools::Run::Primer3
Create input for and work with the output from the program
primer3.

Bio::Tools::Run::Seg
Object for identifying low complexity regions in a given
protein sequence.

Bio::Tools::Run::RepeatMasker
Wrapper for RepeatMasker Program.

Bio::Tools::Run::Pseudowise
Object for predicting pseudogenes in a given sequence,
given a protein and a cDNA sequence.

Bio::Tools::Run::Promoterwise
Wrapper for aligning two sequences using promoterwise.

Bio::Tools::Run::PiseApplication::stssearch
Searches a DNA database for matches with a set of STS
primers (EMBOSS).

Bio::Tools::Analysis::Protein::GOR4
Wrapper around the GOR4 protein secondary structure
prediction server.

Bio::Tools::Analysis::Protein::HNN
Wrapper around the HNN protein secondary structure prediction
server.

Bio::Tools::Analysis::Protein::Sopma
Wrapper around Sopma protein secondary structure prediction
server.

Bio::Tools::Run::Phylo::Phylip::Neighbor
Wrapper for the PHYLIP program neighbor, which creates a
phylogenetic tree (using either Neighbor-Joining or UPGMA)
from protein distances based on amino acid substitution rates.

Bio::Align::ProteinStatistics
Calculate Protein Alignment statistics (mostly distances).

Bio::Tools::Prepeat
Finding repeats in protein sequences.

Bio::Tools::Analysis::Protein::Scansite
Wrapper around the Scansite server.

Bio::Tools::OddCodes
Object holding alternative alphabet coding for one
protein sequence.

make_mrna_protein
Convert an input mRNA/cDNA sequence into protein.

Chemistry::File::PDB
Protein Data Bank file format reader/writer.

Bio::Tools::Analysis::Protein::ELM
This module is a wrapper around the ELM server (http://elm.eu.org/)
which predicts short functional motifs on amino acid sequences.

Bio::Tools::Analysis::Protein::Mitoprot
Wrapper around Mitoprot server.

Bio::Tools::Run::Genewise
Object for predicting genes in a given sequence given a protein.

Bio::Tools::Run::Tmhmm
Object for identifying transmembrane helices in a given
protein sequence.

Bio::SeqFeature::Gene::GeneStructure
A feature representing an arbitrarily complex structure
of a gene.

GO::AnnotationProvider::AnnotationParser
Parses a gene annotation file.

GO::OntologyProvider::OboParser
Provides an API for retrieving data from a Gene Ontology OBO file.

Bio::SAGE::Comparison
Compares data from serial analysis of gene expression
(SAGE) libraries.
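The make_mrna_protein entry above converts an mRNA/cDNA sequence into protein. The core idea can be sketched in plain Perl without any CPAN module. This is a minimal sketch that assumes the standard genetic code (NCBI translation table 1), translates reading frame 1 only, and writes 'X' for any codon it does not recognise; the subroutine name translate is our own, not the interface of make_mrna_protein.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Standard genetic code (NCBI table 1), codons enumerated with bases ordered T, C, A, G.
my @bases = qw(T C A G);
my $amino = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %codon_table;
my $index = 0;
for my $first (@bases) {
    for my $second (@bases) {
        for my $third (@bases) {
            $codon_table{"$first$second$third"} = substr($amino, $index++, 1);
        }
    }
}

# Translate an mRNA or cDNA sequence in reading frame 1.
# Stop codons appear as '*'; unknown codons (e.g. containing N) become 'X'.
sub translate {
    my ($seq) = @_;
    $seq = uc $seq;
    $seq =~ s/\s+//g;     # remove whitespace and line breaks
    $seq =~ tr/U/T/;      # treat mRNA (U) the same as cDNA (T)
    my $protein = '';
    for (my $pos = 0; $pos + 3 <= length $seq; $pos += 3) {
        my $codon = substr($seq, $pos, 3);
        $protein .= exists $codon_table{$codon} ? $codon_table{$codon} : 'X';
    }
    return $protein;
}

print translate('AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG'), "\n";   # prints MAIVMGR*KGAR*
```

Building the 64-entry table from one string keeps the code short; a real tool would also handle other reading frames, the reverse strand and alternative genetic codes.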

Friday, August 8, 2008

What is Perl ?

Perl:

* Perl is a stable, cross platform programming language.
* Perl stands for Practical Extraction and Report Language.
* It is used for mission critical projects in the public and private
sectors.
* Perl is Open Source software, licensed under its Artistic License or the GNU General Public License (GPL).
* Perl was created by Larry Wall.
* Perl 1.0 was released to usenet's alt.comp.sources in 1987.
* PC Magazine named Perl a finalist for its 1998 Technical
Excellence Award in the Development Tool category.
* Perl is listed in the Oxford English Dictionary.


Supported Operating Systems:

* Unix systems
* Macintosh - (OS 7-9 and X) see The MacPerl Pages.
* Windows - see ActiveState Tools Corp.
* VMS
* And many more...


Best Features Of Perl :

* Perl takes the best features from other languages, such as C, awk,
sed, sh, and BASIC, among others.
* Perl's database integration interface supports third-party databases including Oracle, Sybase, Postgres, MySQL and others.
* Perl works with HTML, XML, and other mark-up languages.
* Perl supports Unicode.
* Perl is Y2K compliant.
* Perl supports both procedural and object-oriented programming.
* Perl interfaces with external C/C++ libraries through XS or SWIG.
* Perl is extensible. There are thousands of third-party modules available
from the Comprehensive Perl Archive Network (CPAN).
* The Perl interpreter can be embedded into
other systems.

PERL and the Web

* Perl has been one of the most popular web programming languages due to its text
manipulation capabilities and rapid development cycle.
* Perl is widely known as "the duct tape of the Internet."
* Perl's CGI.pm module, part of Perl's standard distribution, makes
handling HTML forms simple.
* Perl can handle encrypted Web data, including e-commerce transactions.
* Perl can be embedded into web servers to speed up processing by as
much as 2000%.
* mod_perl allows the Apache web server to embed a Perl interpreter.
* Perl's DBI package makes web-database integration easy.

Reference:
Tutorialspoint

Wednesday, August 6, 2008

Bioinformatics Definition / What is Bioinformatics?

Bioinformatics is a tool for solving biological problems based on existing data.

Bioinformatics is a method for predicting biological outcomes based on existing experimental results.

Bioinformatics = Biology + Informatics + Statistics + (Biochemistry + Biophysics).

Bioinformatics gives biologists a way to store and organize all their data.

Bioinformatics makes some lab experiments easier by predicting their outcome.

Sometimes bioinformatics suggests how to start a lab experiment, based on existing results.

Bioinformatics helps researchers get an idea of a lab experiment's likely results before they start.

Sunday, June 29, 2008

ScalaBLAST - For High End genome analysis

Analyzing a whole genome is still a time-consuming process. A new computational tool developed at the Department of Energy's Pacific Northwest National Laboratory is speeding up our understanding of the machinery of life – bringing us one step closer to curing diseases, finding safer ways to clean the environment and protecting the country against biological threats.

ScalaBLAST is a sophisticated "sequence alignment tool" that can divide the work of analyzing biological data into manageable fragments so large data sets can run on many processors simultaneously. The technology means large-scale problems – such as the analysis of an organism – can be solved in minutes, rather than weeks.

Using ScalaBLAST, researchers can manage the large influx of data resulting from new questions that arise during human genome research. Prior to this new tool, it took researchers 10 days to analyze one organism. Now, researchers can analyze 13 organisms within nine hours, making the time-to-solution hundreds of times faster.

Learn More about ScalaBLAST...

ScalaBLAST

PyroBayes - Analyze 500,000 Sequences in 10 Mins.

Sequencing and annotating the human genome took more than a decade. Boston College biologist Gabor Marth and his research team have developed software that can analyze half a million DNA sequences in 10 minutes.
The Marth laboratory's proprietary PyroBayes software is one of a new breed of computer programs able to accurately process the mountains of genome data flowing from the latest generation of gene decoding machines, which have placed a premium on computational speed and accuracy in data-crunching fields known as bioinformatics and high-throughput biology, said Marth, an associate professor of Biology.

Learn More about Pyrobayes.....

Pyrobayes: an improved base caller for SNP discovery in pyrosequences.

The MarthLab : PyroBayes

Saturday, June 28, 2008

Protein Encyclopedia / Human Protein Database.

Advances in molecular technology have made data generation much easier, but processing the data and interpreting the observations are now the major hurdles in science. The Johns Hopkins Institute of Genetic Medicine has compiled a database of human proteins, Human Proteinpedia, which contains experimental information about human proteins. Human Proteinpedia contains information on when and where specific proteins are or are not expressed, including in cells and tissues from diseases such as cancers. Human Proteinpedia allows any researcher to contribute and edit their data as their research progresses.

Highlights of Proteinpedia ...

Number of contributing labs: 71
Number of experiments: 2,695
Protein entries: 15,231
Peptide identifications: 1,851,124
MS/MS spectra: 4,567,235
Protein expression: 138,487
Post-translational modifications: 17,108
Protein-protein interactions: 31,476
Subcellular localization: 2,906

Visit ProteinPedia

1. http://www.humanproteinpedia.org/

Friday, June 27, 2008

Bioinformatics Software / Tools - Gene Prediction


Gene prediction / gene finding software:

After sequencing an organism's genome, the next and most important step is to predict the genes in it. Homology search (e.g. BLAST) is a very simple and straightforward method for predicting genes.


  • GLIMMER - To identify coding regions in microbial DNA.

  • GENSCAN - To predict complete gene structures, including exons, introns, promoter and polyadenylation signals, in genomic sequences.

  • GeneMark - For finding genes in bacterial DNA sequences.

  • WebGene - Web interface for several coding region recognition programs.
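GLIMMER, GENSCAN and GeneMark rely on statistical models that are far more sophisticated, but the simplest ab initio idea behind finding coding regions, scanning for open reading frames (ORFs), can be sketched in a few lines of Perl. This is a toy illustration only, assuming the forward strand, ATG as the only start codon, TAA/TAG/TGA as stops and a caller-chosen minimum length; the subroutine find_orfs is hypothetical and is not how any of the tools above work.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy ORF scan on the forward strand only: an ORF runs from an ATG to the
# first in-frame stop codon (TAA, TAG or TGA) and must be at least
# $min_codons codons long (stop codon included).
# Returns [start, end] pairs with 1-based, inclusive coordinates.
sub find_orfs {
    my ($seq, $min_codons) = @_;
    $seq = uc $seq;
    my @orfs;
    for my $frame (0 .. 2) {
        my $start;
        for (my $p = $frame; $p + 3 <= length $seq; $p += 3) {
            my $codon = substr($seq, $p, 3);
            if (!defined $start && $codon eq 'ATG') {
                $start = $p;
            }
            elsif (defined $start && $codon =~ /^(?:TAA|TAG|TGA)$/) {
                my $codons = ($p + 3 - $start) / 3;
                push @orfs, [ $start + 1, $p + 3 ] if $codons >= $min_codons;
                undef $start;    # look for the next ATG in this frame
            }
        }
    }
    return sort { $a->[0] <=> $b->[0] } @orfs;
}

my $dna = 'CCATGAAATTTTAGCC';    # hypothetical toy sequence
for my $orf (find_orfs($dna, 2)) {
    printf "ORF from %d to %d\n", @$orf;    # prints: ORF from 3 to 14
}
```

Real genomes also need the reverse strand and a much larger minimum length; otherwise random ATG...stop stretches swamp the real genes.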

Saturday, June 14, 2008

Bioinformatics Market Growth - Year wise

Bioinformatics Opportunities in India

Compared with developed countries, outsourcing to India offers about 30-40% cost savings in overall drug discovery research, and close to 60% cost savings when outsourcing core bioinformatics services. This is due to lower wage costs for skilled manpower and lower infrastructure costs.

The Indian bioinformatics market has grown from $18 million in 2003-04 to $35 million in 2006-07, at a CAGR of 25%. Interestingly, owing to low local demand, $32 million or about 90% of bioinformatics revenues in India are derived from outsourcing activities. "Local demand for bioinformatics services is low due to low investment in new drug discovery. Even though research investments of life sciences companies are increasing, they are still small compared to global standards" says Ashutosh Mundkur, VBU Life Sciences, Service Offerings, Satyam Computer Services Ltd.
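As a quick check of the figures quoted above, and assuming three annual growth periods between 2003-04 and 2006-07, the compound annual growth rate works out as follows:

```latex
\mathrm{CAGR} = \left(\frac{V_{\text{end}}}{V_{\text{start}}}\right)^{1/n} - 1
             = \left(\frac{35}{18}\right)^{1/3} - 1
             \approx 1.248 - 1 \approx 25\%
```

This agrees with the 25% CAGR stated in the report.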

The Indian bioinformatics outsourcing services opportunity is estimated to grow at 25% per annum during 2007-2010 raising its share of the global market from 1.4% in 2007 to 1.7% in 2010. These estimates are made based on the current plans of Indian vendors, as well as considering the impact of scarcity in human resources. Improved availability of skilled workers could help take growth rates higher. Similarly, positive actions by the Indian government to enhance IP rights could also help raise growth.


Indian Bioinformatics Market:


Pharmaceutical companies are under constant pressure to develop new “blockbuster” drugs to replace older ones that are going off patent. With costs to launch a new drug crossing $1 billion and the number of drugs approved for commercial launch decreasing, pharmaceutical companies are increasingly looking at biotechnology to deliver results.

The Indian bioinformatics industry comprises vendors with origins in either the life sciences domain or the IT domain. We estimate that there are about 45-50 companies active in the bioinformatics segment in India. Of these, about 35 are involved in developing software tools and database solutions and in providing bioinformatics services; the rest are largely engaged in marketing third-party products and services.

2005 Report:

The global bioinformatics market is currently estimated at about $1.4 billion (€1.1 billion). It is expected to grow at an average annual growth rate of 15.8 per cent to reach nearly $3 billion by 2010.

Pharmaceutical companies are expected to increase their R&D expenditure in the future. A major portion of this R&D spending is expected to go into bioinformatics. Global drug discovery spending is estimated to increase from $19.6 billion in 2002 to $25.1 billion in 2006.

Scientists are acquiring genomics data through the use of techniques such as amplification, DNA microarray expression, real-time PCR and genotyping. Instrumentation, hardware and software are then required to analyse, integrate and transmit this vast amount of data, which has resulted in significant IT challenges for those in the field.

The segment is estimated to grow at an average annual growth rate (AAGR) of 21.2 per cent from $444.7 million in 2005 to $1.16 billion in 2010. Growth in the genomics-based content will be the key driver for the rise in genomics-based analysis software and services segment.


References:
1. Drug Researcher

2. PharmaAsia

Friday, June 13, 2008

Bioinformatics Book Collection.






Tuesday, May 27, 2008

Bioinformatics openings in Southeast Asia / Bioinformatics Jobs.


Malaysia is one of the fastest-growing economic powers in the Asia-Pacific region. Cyberjaya is the IT hub of Malaysia, with hundreds of software companies. Follow the link below to apply for bioinformatics and IT jobs in Cyberjaya, Malaysia.

http://www.cyberjaya-jobs.com/

123 Bioinformatics Team

Thursday, May 1, 2008

Setting up Standalone BLAST Software in Linux



Installing and executing stand-alone BLAST software in Linux.


Stand-alone BLAST is a local installation of the NCBI BLAST suite of programs. NCBI provides binaries for various platforms. The programs are the same as the NCBI BLAST programs, except that they run on your local machine.

The local version is valuable when you have a large set of sequences to BLAST: it is not affected by Internet speed or traffic, and it can be automated.

Stand-alone BLAST can be downloaded from the NCBI FTP site (the link is in the toolbar at the bottom of the NCBI home page: “FTP Site -> Blast -> executables -> Latest”).

Download the file in binary mode. Filenames are of the form program-version-architecture-os.extension (for example, blast-2.2.18-ia32-linux.tar.gz). Remember to choose the appropriate architecture (32-bit or 64-bit). Download the file and extract the contents of the gzipped tar archive. The ‘.gz’ extension indicates that the file has been compressed with gzip (a standard Unix compression utility), and the ‘.tar’ extension indicates that the file is a tape archive created with tar (a standard Unix archiving tool).

To uncompress the file with gunzip and extract the files from the archive into the current working directory, run the commands below.

jk@jk:~/Desktop/blast-2.2.18/bin$ gunzip blast-2.2.18-ia32-linux.tar.gz #uncompress

jk@jk:~/Desktop/blast-2.2.18/bin$ tar -xpf blast-2.2.18-ia32-linux.tar #extract

For more information on the options, see man tar and man gunzip.

When you enter the extracted directory you will see three subdirectories (bin, data, doc). The doc directory contains the README files for each program, the data directory contains the scoring matrices, and the bin directory contains all the executables for running the various BLAST searches.

How to execute bl2seq (BLAST 2 Sequences):

Bl2seq performs a comparison between two sequences, typically with the blastn or blastp algorithm; with these two programs both sequences must be nucleotides or both must be proteins.

The input files for the BLAST programs should always be in FASTA format.

For example:
>gi|229673|pdb|1ALC| Alpha-Lactalbumin
KQFTKCELSQNLYDIDGYGRIALPELICTMFHTSGYDTQAIVENDESTEYGLFQISNALWCKSSQSPQSR
NICDITCDKFLDDDITDDIMCAKKILDIKGIDYWIAHKALCTEKLEQWLCEKE
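Because every BLAST input starts from FASTA, it helps to see how simple the format is to read. Below is a minimal sketch of a FASTA reader in plain Perl (no BioPerl), assuming each header line starts with '>' and the sequence lines follow until the next header; the subroutine name parse_fasta is our own. It reads the alpha-lactalbumin example above from an in-memory string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse FASTA text into an ordered list of [header, sequence] pairs.
sub parse_fasta {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\r$//;              # tolerate Windows line endings
        next if $line =~ /^\s*$/;      # skip blank lines
        if ($line =~ /^>(.*)/) {
            push @records, [ $1, '' ]; # start a new record
        }
        elsif (@records) {
            $line =~ s/\s+//g;
            $records[-1][1] .= $line;  # append sequence to the current record
        }
    }
    return @records;
}

my $example = <<'END';
>gi|229673|pdb|1ALC| Alpha-Lactalbumin
KQFTKCELSQNLYDIDGYGRIALPELICTMFHTSGYDTQAIVENDESTEYGLFQISNALWCKSSQSPQSR
NICDITCDKFLDDDITDDIMCAKKILDIKGIDYWIAHKALCTEKLEQWLCEKE
END

open my $fh, '<', \$example or die "Cannot open in-memory string: $!";
for my $rec (parse_fasta($fh)) {
    my ($header, $seq) = @$rec;
    printf "%s\t%d residues\n", $header, length $seq;   # 123 residues for this entry
}
close $fh;
```

To read a file instead, open it by name (open my $fh, '<', 'input.fa') and pass the handle to the same subroutine.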

Syntax:

jk@jk:~/Desktop/blast-2.2.18/bin$ ./bl2seq - # Displays all options

You can choose the options you need. The required options are -p, -i and -j. Other options can be set explicitly; otherwise the program uses their default values.

jk@jk:~/Desktop/blast-2.2.18/bin$ ./bl2seq -p blastp -e 0.01 -i file1 -j file2 # blastp compares two protein sequences
-i First sequence [File In]
-j Second sequence [File In]
-p Program name: blastp, blastn, blastx, tblastn, tblastx. For blastx the first sequence should be nucleotide; for tblastn the second sequence should be nucleotide.
-e E-value (optional)

The two input files (file1, file2) should be in the current working directory (/blast-2.2.18/bin) for the above syntax to work. If not, give the appropriate path. If you have multiple FASTA sequences to compare, you can automate the above command using shell scripts.
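The same automation can also be sketched in Perl. This sketch assumes one sequence per FASTA file, hypothetical file names seq1.fa, seq2.fa and seq3.fa, and bl2seq's -o option (alignment output file), which you can confirm with ./bl2seq - on your installation. It always prints the commands and only runs them if ./bl2seq is present and executable; the subroutines stem and pairwise_bl2seq_cmds are our own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Strip the last extension from a file name: "seq1.fa" -> "seq1".
sub stem {
    my ($file) = @_;
    $file =~ s/\.[^.\/]+$//;
    return $file;
}

# Build one bl2seq argument list for every unordered pair of FASTA files.
sub pairwise_bl2seq_cmds {
    my ($program, $evalue, @fasta) = @_;
    my @cmds;
    for my $i (0 .. $#fasta - 1) {
        for my $j ($i + 1 .. $#fasta) {
            my $out = stem($fasta[$i]) . '_vs_' . stem($fasta[$j]) . '.txt';
            push @cmds, [ './bl2seq', '-p', $program, '-e', $evalue,
                          '-i', $fasta[$i], '-j', $fasta[$j], '-o', $out ];
        }
    }
    return @cmds;
}

my @files = ('seq1.fa', 'seq2.fa', 'seq3.fa');   # hypothetical input files

for my $cmd (pairwise_bl2seq_cmds('blastp', 0.01, @files)) {
    print "@$cmd\n";                              # show the command line
    if (-x './bl2seq') {
        system(@$cmd) == 0 or warn "bl2seq failed: @$cmd\n";
    }
}
```

Passing a list to system avoids shell quoting problems with unusual file names; each comparison gets its own report file, e.g. seq1_vs_seq2.txt.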

How to execute Blastall:

Blastall is the most commonly used tool. It can run all the BLAST programs: blastp, blastn, blastx, tblastn and tblastx. Unlike bl2seq, blastall is used when you have multiple FASTA sequences as input queries to search against a protein or nucleotide database.
You can download protein or nucleotide databases from Swiss-Prot or NCBI. For example, to download human chromosome 22,

go to NCBI -> FTP site -> RefSeq -> H_sapiens -> H_sapiens -> chr22.

Note:

FASTA-formatted files cannot be used directly as BLAST databases. You first need to prepare them with formatdb, which indexes the entries in the FASTA file and enables BLAST to run much faster.
Uncompress the database. If it is a protein sequence database it will look like the example below; the multiple-sequence query file for blastall looks similar.

>gi|86438068|gb|AAI12638.1| HGD protein [Bos taurus]
MTELKYISGFGNECASEDPRCPGALPEGQNNPQVCPYNLYAEQLSGSAFTCPRSTNKRSWLYRILPSVSH
KPFEFIDQGHITHNWD
>gi|116283875|gb|AAH44758.1| Hgd protein [Mus musculus]
MSVLQRILAVQVPCPKDSWLYRILPSVSHKPFESIDQGHVTHNWDEVGPDPNQLRWKPFEIPKASEKKVD
FVSGLYTLCGAGDIKSNNGLAVHIFLCNSSMENRCFYNSDGDFLIVPQKGKLLIYTEFGKMSLQPNEICV
>gi|116283724|gb|AAH24369.1| Hgd protein [Mus musculus]
MSVLQRILAVQVPCPKDSWLYRILPSVSHKPFESIDQGHVTHNWDEVGPDPNQLRWKPFEIPKASEKKVD



1. Formatdb:


jk@jk:~/Desktop/blast-2.2.18/bin$ ./formatdb - # displays all options

jk@jk:~/Desktop/blast-2.2.18/bin$ ./formatdb -i <database_file> -o T -p T

-i Input file(s) for formatting (this parameter must be set) [File In]
-p Type of file T - protein F - nucleotide (default = T)
-o Parse options T - True: Parse SeqId and create indexes. F - False: Do not parse SeqId. ( default = F)

The input database should be in the current working directory (/blast-2.2.18/bin) for the above syntax to work. If not, give the appropriate path.

After running formatdb you will see seven index and data files alongside the original input file. All seven files are required for blastall to run, so keep them in the same directory as the original database file. Check formatdb.log for error messages.
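Because blastall fails if any of these files is missing, a small pre-flight check can help in automated pipelines. This is a sketch under the assumption that formatdb was run with -o T and without -n, so the seven files are named after the input file with the suffixes .phr, .pin, .psq, .pnd, .pni, .psd and .psi for proteins (.nhr, .nin, .nsq, .nnd, .nni, .nsd and .nsi for nucleotides); the database name protein_db.fa and the subroutine missing_db_files are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the formatdb output suffixes that are missing for a database.
# Assumes formatdb -o T (seqid indexes) and default naming (no -n option).
sub missing_db_files {
    my ($db, $is_protein) = @_;
    my $t = $is_protein ? 'p' : 'n';                      # p = protein, n = nucleotide
    my @suffixes = map { ".$t$_" } qw(hr in sq nd ni sd si);
    return grep { !-e "$db$_" } @suffixes;
}

my $db = 'protein_db.fa';    # hypothetical file given to formatdb -i
my @missing = missing_db_files($db, 1);
if (@missing) {
    print "Run formatdb first; missing: ", join(' ', map { "$db$_" } @missing), "\n";
}
else {
    print "All seven index files for $db are present\n";
}
```

Running this before blastall turns a confusing BLAST error into a clear message about which index file is absent.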

2. Executing Blastall:

jk@jk:~/Desktop/blast-2.2.18/bin$ ./blastall -i <query_file> -p blastp -d <database_file> -o <output_file>

-p Program Name [String] Input should be one of "blastp", "blastn", "blastx", "tblastn", or "tblastx".
-d Database [String] default = nr The database specified must first be formatted with formatdb.
-i Query File [File In]
-o BLAST report Output File [File Out]

The query file and the formatted database should be in the current working directory (/blast-2.2.18/bin) for the above syntax to work. If not, give the appropriate path.

The output file will contain the BLAST output for all the input query sequences.

Monday, April 14, 2008

Perl for Bioinformatics.








CPAN offers two useful command-line utility modules: Perl::Tidy, which beautifies, indents and reformats a messy Perl script, and Perl::Critic, which tests and analyzes Perl scripts.

a. Perl::Tidy
When a Perl script is given as input to perltidy, it creates an indented, structured Perl script and saves it as a separate file with the same name plus a .tdy extension. Perltidy does not change the input script.

Steps to follow,
1. Install Perl::Tidy. It can be run on any system with perl 5.004 or later and used on Unix, Windows, VMS and MacPerl.
2. To execute perltidy,
$ perltidy -[option] test_perl_script.pl
This creates the file test_perl_script.pl.tdy, which contains the well-structured Perl script. There are many options, for example to set the indentation (-i=4) or to modify the file in place while keeping a backup (-b). For more information on installation and execution see http://perltidy.sourceforge.net/tutorial.html


b. Perl::Critic


Perl::Critic analyzes the input Perl script and encourages the user to follow various coding guidelines (policies). Most of the guidelines are based on Damian Conway's book Perl Best Practices. Through the Perl::Critic interface, the user can enable or disable policies, or create and customize new ones.
The user can set the severity level. There are 5 severity levels: severity 5 is the least restrictive level, i.e. Perl::Critic applies only the most basic policies/guidelines. The five levels are gentle (5), stern (4), harsh (3), cruel (2) and brutal (1).
Perl::Critic requires a few modules to be pre-installed for it to execute. See http://search.cpan.org/~elliotjs/Perl-Critic-1.082/lib/Perl/Critic.pm


Steps to follow,

1. Install Perl::Critic.
2. Run perlcritic, for example at the strictest (brutal) level:
$ perlcritic -1 test_perl_script.pl

For more information see, http://search.cpan.org/dist/Perl-Critic/

Tuesday, April 8, 2008

Why PERL makes life easy for Bioinformaticians ?


1. Perl makes string processing easy, which is exactly what biological data such as genome and protein sequences require.

2. File handling is easy in Perl.

3. Perl regular expressions are very flexible and make it easy to match similar patterns rather than only identical ones, for instance when matching a motif or a repeat in a sequence.

4. Perl has fewer strict rules than many other languages, which makes it easy for biologists to learn Perl in a short period.

5. Perl scripts can be combined with SHELL scripts for text processing.

6. Using Perl CGI and HTML, one can develop web pages; Perl CGI programs are written much like ordinary Perl scripts.

7. CPAN contains hundreds of Perl modules specific to sequence analysis, e.g. FASTAParse and Peptide::Pubmed.

8. Perl can be used for system administration purposes too.

9. The Perl Template Toolkit is another Perl product, which can be used for developing advanced web pages.

10. Using Perl's DBI and DBIx modules, it is easy to pass MySQL data (back end) to the web page (front end).

11. Processing or parsing an HTML file is easy with CPAN modules.

12. File type conversion is possible in Perl using CPAN modules, e.g. DOC to PDF or HTML to PDF.

13. With the PerlMagick module (Image::Magick), we can do image processing.

14. The Perl::Critic module helps you write better Perl code by critiquing your code structure.
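Point 3 above can be illustrated with a short sketch. The patterns are examples chosen for illustration: the N-glycosylation consensus N-{P}-[ST]-{P} (PROSITE PS00001) written as a Perl regex, and a CA dinucleotide microsatellite of at least five copies. The subroutine names and the toy sequences are our own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# 1-based start positions of every (possibly overlapping) match of $regex in $seq.
sub motif_positions {
    my ($seq, $regex) = @_;
    my @pos;
    # A zero-width lookahead lets overlapping motifs each be reported.
    while ($seq =~ /(?=$regex)/g) {
        push @pos, pos($seq) + 1;
    }
    return @pos;
}

# Tandem repeats of $unit occurring at least $min_copies times in a row.
# Returns [1-based start, length in bases] for each run.
sub find_repeats {
    my ($dna, $unit, $min_copies) = @_;
    my $pattern = '((?:' . quotemeta($unit) . "){$min_copies,})";
    my @runs;
    while ($dna =~ /$pattern/g) {
        push @runs, [ $-[1] + 1, length $1 ];
    }
    return @runs;
}

my $glyco = qr/N[^P][ST][^P]/;    # N-glycosylation consensus N-{P}-[ST]-{P}
print "Motif at: @{[ motif_positions('MNVSANPTGNITK', $glyco) ]}\n";   # prints: Motif at: 2 10

for my $run (find_repeats('GGCACACACACAGT', 'CA', 5)) {
    printf "CA repeat at %d, %d bp\n", @$run;
}
```

The character classes [^P] and [ST] are what let one pattern match many similar, non-identical sequences, which is the flexibility point 3 describes.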

Tuesday, April 1, 2008

What is Bioinformatics

What is bioinformatics ?

It's a method to predict biological outcomes before anyone embarks on full-fledged research. It's a method to compare biological data, e.g. sequence analysis. It's a way to predict or solve protein structures.
It's the only way to achieve PERSONALIZED MEDICINE in this post-genomic era. It's the method for doing comparative genomics and predicting homologs of human genes in other species.
It's the method used to annotate newly sequenced genomes.

How can biological outcomes be predicted?

We are living in the world of computers. By analyzing existing biological data using information technology, we can predict biological outcomes.

What is the HOTTEST branch of Bioinformatics in this post-genomic era?

Personalized medicine is the hottest and fastest-growing field. Personalized medicine can be achieved only through bioinformatics.

Monday, March 31, 2008

Genetic Genealogy or Ancestry

Genetic genealogy is one of the most interesting and commercially successful fields in bioinformatics and genetics. Using an individual's genetic markers, researchers can predict the genealogy or ancestry of that individual.

There are two types of ancestry prediction: maternal and paternal. Maternal ancestry prediction is based on mitochondrial genetic markers (SNPs), which tell the story of your maternal ancestors and the migration path they took out of Africa. Paternal ancestry prediction is based on Y-chromosome STRs.

About 100,000 years ago, the woman known as "Mitochondrial Eve", the most recent common maternal-line ancestor of all living humans (not the first human), lived in Africa near Ethiopia. Her descendants form the L haplogroups, of which L0 is the oldest branch. People carrying haplogroup L3 were the first to leave Africa, about 60,000 years ago, and went on to explore the rest of the world. They first reached the Middle East; from there, some moved on to Europe and Asia. Some people belonging to haplogroup A crossed Siberia and reached the Americas 10,000-20,000 years ago. The Genographic Project has defined the markers for each haplogroup, and many companies now offer ancestry prediction.

Learn More
1. Haplogroups.
2. Maternal Ancestry.
3. Paternal Ancestry.

Thursday, March 27, 2008

Job opportunities in Bioinformatics

When thinking about a job or research after a degree in bioinformatics, most people choose to do a Ph.D. in the US or Europe.

Most European universities prefer students who are really interested in doing research for 3-4 years. Most of them do not require TOEFL scores to apply for a Ph.D.

American universities select students for a Ph.D., which usually takes 5 years, based on GRE and TOEFL scores.

Finally, companies prefer students who are strong on the software side.

SHELL Scripts for Simple Bioinformatics Analysis

Simple SHELL script for parsing BLAST output

1. To parse the sequence names from BLAST output.

"grep" is a very powerful Unix command for retrieving the lines of a file that match a particular pattern.

Syntax:
grep "pattern" input_file

Example: grep ">" Blast_output.txt

In the above example, grep retrieves the lines that contain the ">" symbol. In a BLAST output file every sequence (hit) name starts with ">", so this gives you all the sequence names in the BLAST output file.

Learn More


2. Parsing the sequence names and the sequences from the BLAST output

"egrep" is a powerful command for retrieving the lines that match any of several patterns.

Syntax:
egrep "pattern1|pattern2|pattern3" filename

Example:
Below is the combination of SHELL and Perl script for parsing the BLAST Output.

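egrep ">|Sbjct" BLAST_output.txt | sed 's/Sbjct://' > output.txt

use strict;
use warnings;

# output.txt holds the hit names (">" lines) and the Sbjct lines without the "Sbjct:" label
open(my $fh, '<', 'output.txt') or die "Cannot open output.txt: $!";
while (my $ln = <$fh>)
{
    if ($ln !~ m/>/)
    {
        # Sbjct line: "start  sequence  end"; print only the sequence column
        my @temp = split(' ', $ln);
        print "$temp[1]\n" if defined $temp[1];
    }
    else
    {
        print $ln;    # hit name, printed unchanged
    }
}
close($fh);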

In the above example, egrep retrieves the lines that match ">" or "Sbjct", sed removes the "Sbjct:" label, and the result is stored in output.txt. The Perl script then prints each hit name followed by its subject sequence fragments.
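The egrep/sed step can also be folded into Perl itself. The sketch below assumes the plain-text pairwise BLAST report layout, where each hit starts with a line beginning with ">" and subject lines look like "Sbjct: 1   MSVLQ... 60". It joins all Sbjct fragments of a hit, removes gap characters, and ignores definition lines that wrap onto a second line. Note that several HSPs of the same hit are simply concatenated. The script name blast_hits2fasta.pl and the subroutine hits_to_fasta are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect each hit name (">" line) and the subject sequence from its "Sbjct" lines.
# Returns a list of [name, sequence] pairs in report order.
sub hits_to_fasta {
    my ($fh) = @_;
    my (@order, %seq, $current);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>\s*(.+)/) {
            $current = $1;
            unless (exists $seq{$current}) {
                push @order, $current;
                $seq{$current} = '';
            }
        }
        elsif (defined $current && $line =~ /^Sbjct:?\s+\d+\s+(\S+)\s+\d+/) {
            (my $chunk = $1) =~ s/-//g;    # drop alignment gap characters
            $seq{$current} .= $chunk;      # multiple HSPs of one hit are concatenated
        }
    }
    return map { [ $_, $seq{$_} ] } @order;
}

# Usage: perl blast_hits2fasta.pl BLAST_output.txt > hits.fa
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    for my $hit (hits_to_fasta($fh)) {
        print ">$hit->[0]\n$hit->[1]\n";
    }
    close $fh;
}
```

Doing the filtering in Perl keeps the hit name and its sequence together, so the output is already valid FASTA.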

Wednesday, March 26, 2008

Bioinformatics in India / Bioinformatics institutes and Companies in India.





India has produced many world renowned bioinformaticians. Below is the list of institutes and companies doing Bioinformatics in India.

1. IISc, Bangalore.

2. NCBS, Bangalore.

3. CCMB, Hyderabad.

4. CDFD, Hyderabad.

5. IIIT, Hyderabad.

6. IIT, Delhi.

7. Madras University, Chennai.

8. Madurai Kamaraj University, Madurai.

9. IBAB, Bangalore.

10. IGIB, Delhi.

11. TIFR, Bombay.

12. Biobase, Bangalore.

13. AstraZeneca, Bangalore.

14. Avesthagen, Bangalore.

15. Cell Lines, Bangalore.

16. TCS, Hyderabad.

17. CDRI, Lucknow.

18. LaCONES (Conservation Genetics), Hyderabad.

19. CDAC, Pune.

20. IICT, Hyderabad.

21. IIT, Mumbai.

22. IIT, Chennai.

23. Bharathidasan University, Trichy.

24. Madras University ( Biophysics), Chennai.

25. Anna University, Chennai.

26. IISc, ( Molecular Biophysics Unit), Bangalore.

27. IISc, ( Department of Physics).

28. IISc, ( Department of Biochemistry).

29. IISc, ( MRDG ).

30. NEERI, Nagpur.

31. NCL, Pune.

32. NARI, Pune.

33. NCCS, Pune.

34. NIV, Pune.

35. ICRISAT, Hyderabad.

36. IISc (Centre for Ecological Sciences), Bangalore.

37. IMTECH, Chandigarh.

38. ICGEB, Delhi.

39. JNU, Delhi.

40. JNCASR, Bangalore.

41. WII, Dehradun.

42. MONSANTO, Bangalore.

43. NBRC, Haryana.

44. National Centre for Plant Genome Research,JNU campus, New Delhi.

45. NII, Delhi.

46. IISER, Pune.

47. BII, Noida.

48. Institute of Cheminformatics Studies, Noida.

49. RGCB, Trivandrum.

50. CMFRI, Cochin.

51. National Institute of Oceanography, Goa.

52. NIMHANS, Bangalore.
And many more...

Branches of Bioinformatics








These are some of the important fields in bioinformatics


1. Structural Bioinformatics:

Predicting the 3D structure of a protein from its amino acid sequence. Homology modelling is one of the most reliable methods for predicting protein structure when a related protein whose structure has already been solved (e.g. by crystallography) is available as a template. MODELLER is one of the best-known programs for homology modelling. The Protein Data Bank (PDB) is the database of 3D coordinates of proteins.

Recent Studies ..

Crystal structure of Mycobacterium tuberculosis Rv0760c at 1.50 A resolution, a structural homolog of Delta(5)-3-ketosteroid isomerase.

2. Drug Designing:


Drug design is the approach of finding drugs by design, based on their biological targets. Typically a drug target is a key molecule involved in a particular metabolic or signalling pathway that is specific to a disease condition or pathology, or to the infectivity or survival of a microbial pathogen.
Computer-assisted drug design uses computational chemistry to discover, enhance, or study drugs and related biologically active molecules. Click to see the drug discovery softwares.

3. Phylogenetics:

Predicting the genetic or evolutionary relationships among a set of organisms. Mitochondrial SNPs and microsatellites (DNA repeats) are widely used in phylogenetics. MEGA and PAUP* are some of the important programs. Maximum parsimony and maximum likelihood are the most widely used methods.
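Distance-based methods (such as the Neighbor and UPGMA options of the PHYLIP wrapper in the CPAN module list) start from a matrix of pairwise distances. Below is a minimal sketch of the simplest such distance, the uncorrected p-distance (the fraction of differing aligned sites). The toy alignment is hypothetical, gap columns are simply skipped, and real analyses usually apply a substitution-model correction on top of this.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Uncorrected p-distance: proportion of differing sites between two aligned
# sequences. Columns where either sequence has a gap ('-') are skipped.
sub p_distance {
    my ($s1, $s2) = @_;
    die "Sequences must be aligned to the same length\n"
        unless length($s1) == length($s2);
    my ($diff, $sites) = (0, 0);
    for my $i (0 .. length($s1) - 1) {
        my $x = uc substr($s1, $i, 1);
        my $y = uc substr($s2, $i, 1);
        next if $x eq '-' || $y eq '-';
        $sites++;
        $diff++ if $x ne $y;
    }
    return $sites ? $diff / $sites : 0;
}

# Hypothetical toy alignment
my %aln = (
    seqA => 'ACGTACGTAC',
    seqB => 'ACGTACGTTC',
    seqC => 'ACGAACGTTC',
);

# Print the pairwise distance matrix (tab-separated)
my @names = sort keys %aln;
print join("\t", '', @names), "\n";
for my $row (@names) {
    print join("\t", $row,
        map { sprintf('%.2f', p_distance($aln{$row}, $aln{$_})) } @names), "\n";
}
```

A matrix like this is the input that distance-based tree-building programs expect.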

4. Computational biology:

Computational biology is an interdisciplinary field that applies the techniques of computer science, applied mathematics, and statistics to address problems inspired by biology.

5. Population Genetics:

Population genetics is the study of genotype frequency distributions and of how those frequencies change under natural selection, genetic drift, mutation and gene flow. Coalescent theory is one of the most widely used frameworks for tracing sampled gene copies back to their most recent common ancestor. Arlequin is one of the best and most widely used programs in population genetics.
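Genetic drift is easy to see in a simulation. The Python sketch below uses the Wright-Fisher model. This is a standard idealized model chosen here as an assumption; the text above does not name it. In this model, each generation's 2N gene copies are sampled at random from the previous generation. Selection, mutation and gene flow are deliberately left out, and the parameter values are hypothetical.

```python
import random
from typing import List


def wright_fisher(p0: float, pop_size: int, generations: int, seed: int = 1) -> List[float]:
    """Allele-frequency trajectory under pure genetic drift (diploid Wright-Fisher model).

    No selection, mutation or gene flow: each generation, 2N gene copies are
    drawn at random from the previous generation's allele frequency.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    copies = 2 * pop_size
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Binomial sampling of 2N copies, written as 2N independent Bernoulli draws.
        count = sum(1 for _ in range(copies) if rng.random() < p)
        p = count / copies
        trajectory.append(p)
    return trajectory


# In a small population the frequency wanders quickly; 0.0 (loss) and 1.0 (fixation) are absorbing.
small = wright_fisher(p0=0.5, pop_size=10, generations=50)
print(small[-1])
```

If you rerun the sketch with a larger `pop_size`, the frequency should change more slowly from generation to generation. That is the expected effect of drift being weaker in large populations.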

6. Genotype Analysis:
Genotype = genetic variation (SNPs, mutations, ...).
1. Studying genotype-phenotype association.
2. Studying genotype frequencies.
There is no single dedicated software for genotype analysis, but it is being called the "Generation Next Market using Bioinformatics". Genotyping is mostly done using Illumina and Affymetrix microarray chips.
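Genotype frequencies are commonly checked against Hardy-Weinberg equilibrium. This model predicts genotype frequencies p², 2pq and q² from the allele frequencies p and q. Below is a minimal Python sketch of the chi-square test for one biallelic SNP. The genotype coding ('AA', 'AG', 'GG') and the counts are hypothetical. With 1 degree of freedom, a statistic above about 3.84 indicates a departure from equilibrium at the 5% level.

```python
from collections import Counter
from typing import Dict, List


def genotype_frequencies(calls: List[str]) -> Dict[str, float]:
    """Frequency of each genotype call in a sample, e.g. 'AA', 'AG', 'GG'."""
    counts = Counter(calls)
    return {genotype: n / len(calls) for genotype, n in counts.items()}


def hwe_chi_square(n_hom1: int, n_het: int, n_hom2: int) -> float:
    """Chi-square statistic (1 degree of freedom) comparing observed genotype
    counts of a biallelic SNP with Hardy-Weinberg expectations p^2, 2pq, q^2."""
    n = n_hom1 + n_het + n_hom2
    p = (2 * n_hom1 + n_het) / (2 * n)  # frequency of allele 1
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("monomorphic SNP: Hardy-Weinberg test is undefined")
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_hom1, n_het, n_hom2)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


calls = ["AA", "AG", "AG", "GG", "AA", "AG"]
print(genotype_frequencies(calls))
print(hwe_chi_square(50, 0, 50))  # 100.0: no heterozygotes at p = 0.5 is a strong departure
```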

2008 July - Recent Studies....

Genome-wide association defines more than 30 distinct susceptibility loci for Crohn's disease.

Estimating coverage and power for genetic association studies using near-complete variation data.

Genetic diversity patterns at the human clock gene period 2 are suggestive of population-specific positive selection.


Environment And Genetics in Lung cancer Etiology (EAGLE) study: an integrative population-based case-control study of lung cancer.


7. Splice Site Prediction:

Splice site prediction is an important application of bioinformatics, especially in gene expression studies. See also the Alternative Splicing Site Predictor.
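Most introns follow the GT-AG rule: they start with GT at the 5' (donor) splice site and end with AG at the 3' (acceptor) splice site. The Python sketch below only lists candidate introns that obey this rule within a length range. The length limits are assumptions. Real splice site predictors also score the surrounding sequence, for example with weight matrices or machine learning, because GT and AG dinucleotides occur everywhere.

```python
import re
from typing import List, Tuple


def candidate_introns(seq: str, min_len: int = 60, max_len: int = 10000) -> List[Tuple[int, int]]:
    """Return (start, end) pairs (0-based, end-exclusive) of segments that begin
    with GT and end with AG, i.e. candidate introns under the GT-AG rule."""
    seq = seq.upper()
    # Lookaheads find overlapping occurrences as well.
    donors = [m.start() for m in re.finditer("(?=GT)", seq)]
    acceptors = [m.start() + 2 for m in re.finditer("(?=AG)", seq)]  # end just after AG
    introns = []
    for d in donors:
        for a in acceptors:
            if min_len <= a - d <= max_len:
                introns.append((d, a))
    return introns


seq = "AAGT" + "C" * 10 + "AGTT"
print(candidate_introns(seq, min_len=4))  # [(2, 16)]
```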

2008 July - Recent Studies ..

ASPicDB: a database resource for alternative splicing analysis.

Diagnostics of pathogenic splicing mutations: does bioinformatics cover all bases?



8. MiRNA prediction:


miRNA = microRNA. miRNAs have emerged as a new class of gene regulatory elements and are a growing area of research. They are small RNAs of about 20-23 nucleotides that regulate one or more genes. Many methods and programs have been developed to predict these tiny RNAs, but their predictions are still not precise, so they need more supporting information from experimental labs.

An miRNA binds to the messenger RNA (mRNA) of its target gene, usually in the 3' untranslated region (3' UTR). Most of the time, this binding down-regulates the gene's expression. Predicting miRNA targets is also a very important problem in bioinformatics.
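Many target prediction methods start from the miRNA "seed", a short region near its 5' end (taken here as nucleotides 2 to 8). They then look for perfectly complementary sites in the target's 3' UTR. The Python sketch below performs only that seed-matching step. It ignores conservation, site context and G:U pairing, which real predictors take into account. The miRNA is the human let-7a mature sequence, and the UTR is a made-up string.

```python
from typing import List

RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence (A-U, G-C pairing only)."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(seq))


def seed_sites(mirna: str, utr: str) -> List[int]:
    """0-based positions in a 3' UTR that pair perfectly with miRNA nucleotides 2-8."""
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")  # accept DNA input as well
    seed = mirna[1:8]  # nucleotides 2-8, counted from the miRNA's 5' end
    site = reverse_complement_rna(seed)  # the matching site as it appears in the mRNA
    hits = []
    pos = utr.find(site)
    while pos != -1:
        hits.append(pos)
        pos = utr.find(site, pos + 1)
    return hits


let7a = "UGAGGUAGUAGGUUGUAUAGUU"  # human let-7a mature sequence
print(reverse_complement_rna(let7a[1:8]))  # CUACCUC
print(seed_sites(let7a, "AAACUACCUCAAACUACCUCA"))  # [3, 13]
```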

Database..
miRNA Registry from Sanger Institute.

MiRNA target prediction software


There are many programs for miRNA and target prediction.

Recent Studies..
MicroRNA signatures of tumor-derived exosomes as diagnostic biomarkers of ovarian cancer.

Accelerated sequence divergence of conserved genomic elements in Drosophila melanogaster.

miRNA expression in the failing human heart: Functional correlates.

Computational analysis of miRNA-mediated repression of translation: Implications for models of translation initiation inhibition.



9. RNA Structure prediction:

The functional form of a single-stranded RNA molecule frequently requires a specific tertiary structure. The scaffold for this structure is provided by secondary structural elements, which are formed by hydrogen-bonded base pairs within the molecule. This leads to several recognizable "domains" of secondary structure, such as hairpin loops, bulges and internal loops. A significant amount of bioinformatics research has been directed at the RNA structure prediction problem.
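One classic way to attack this problem is the Nussinov algorithm. It is a dynamic program that finds the nested secondary structure with the largest number of base pairs. The Python sketch below makes simplifying assumptions. Watson-Crick and G-U wobble pairs all count equally, and a hairpin loop must contain at least three unpaired bases. No free energies are used; practical tools minimize free energy instead.

```python
from typing import List

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs, with hairpin loops of at least min_loop bases."""
    seq = seq.upper()
    n = len(seq)
    # dp[i][j] = best number of pairs within seq[i..j]; spans too short to pair stay 0.
    dp: List[List[int]] = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])  # i or j left unpaired
            if (seq[i], seq[j]) in PAIRS:
                best = max(best, dp[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i + 1, j):  # split into two independent substructures
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best
    return dp[0][n - 1] if n else 0


print(nussinov("GGGAAAUCC"))  # 3: a hairpin with G-C, G-C, G-U pairs around an AAA loop
```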

10. Gene Prediction:

Identifying the locations of genes in a genomic sequence using known signals, such as start and stop codons and splice sites. Comparative genomics is considered one of the best methods for gene prediction.

Some of the software:

GeneMark, Genscan
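The simplest form of gene prediction is finding open reading frames (ORFs). An ORF is a stretch that starts with ATG and runs to an in-frame stop codon. The Python sketch below scans only the three frames of the forward strand. The 100-codon minimum is an arbitrary assumption. Tools like GeneMark and Genscan use statistical models of coding sequence and gene structure rather than this simple rule.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 100) -> List[Tuple[int, int, int]]:
    """Return (frame, start, end) for ATG...stop ORFs on the forward strand.

    Coordinates are 0-based and end-exclusive; the codon count includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame ATG opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None  # look for the next ATG in this frame
    return orfs


seq = "CC" + "ATG" + "AAA" * 3 + "TAA" + "GG"
print(find_orfs(seq, min_codons=1))  # [(2, 2, 17)]
```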


11. Transcription factor binding site prediction:

Predicting the sites in DNA where transcription factors bind. The most common approaches are comparative genomics and finding clusters of motifs in the noncoding regions of genes.
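The motif-finding part is often done with a position weight matrix (PWM). A PWM is built from known binding sites and then slid along a sequence. The Python sketch below assumes a uniform 25% background, a pseudocount of 0.5 and a made-up set of TATA-box-like sites. It scans only the forward strand and does not include the comparative genomics step.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"


def build_pwm(sites: List[str], pseudocount: float = 0.5,
              background: float = 0.25) -> List[Dict[str, float]]:
    """Log-odds (bits) position weight matrix from equal-length aligned binding sites."""
    pwm = []
    total = len(sites) + 4 * pseudocount
    for pos in range(len(sites[0])):
        column = [site[pos].upper() for site in sites]
        pwm.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in BASES
        })
    return pwm


def scan(seq: str, pwm: List[Dict[str, float]], threshold: float) -> List[Tuple[int, float]]:
    """Forward-strand windows whose summed log-odds score reaches `threshold` bits."""
    seq = seq.upper()
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[j][base] for j, base in enumerate(window))
        if score >= threshold:
            hits.append((i, score))
    return hits


# Hypothetical TATA-box-like training sites.
pwm = build_pwm(["TATAAA", "TATAAA", "TATATA", "TATAAG"])
print(scan("GGGTATAAAGGG", pwm, threshold=5.0))
```

The threshold is a tuning choice. A lower value finds more (and weaker) candidate sites.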

Predicting functional transcription factor binding through alignment-free and affinity-based analysis of orthologous promoter sequences.

12. Genome Annotation:

Predicting the genes and the coding and noncoding sequences of a genome is called genome annotation.
Most researchers use comparative genomics to annotate newly sequenced genomes.

GOLD (Genomes OnLine Database) is the database for ongoing genome projects.

13. Ancestry Prediction:

Predicting the ancestry of an individual based on his or her genetic signatures or SNPs.
Mitochondrial SNPs are used to predict maternal ancestry because mitochondria are passed ONLY from mother to child.
Y chromosome SNPs are used to predict paternal ancestry because the Y chromosome is passed from father to son.
Ancestry prediction is one of the successful fields in bioinformatics. The Genographic Project led by Dr. Spencer Wells is one of the finest examples.

Recent studies..

Mitochondrial DNA haplogroup D4a is a marker for extreme longevity in Japan.

Analysis of Y-chromosomal biallelic polymorphisms in Sichuan Han of Chinese population

14. Mathematical Modelling:


Using mathematics to predict the outcome of complex real-world problems that cannot be studied directly in the lab or in reality. Example: population dynamics.
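Population dynamics gives a simple example. The logistic model dN/dt = rN(1 - N/K) describes a population of size N that grows at rate r until it approaches the carrying capacity K. The Python sketch below compares the model's closed-form solution with a step-by-step numerical (forward Euler) simulation. The choice of the logistic model and the parameter values are illustrative assumptions.

```python
import math
from typing import List


def logistic_exact(n0: float, r: float, k: float, t: float) -> float:
    """Closed-form solution of dN/dt = r N (1 - N/K) with N(0) = n0."""
    return k / (1.0 + (k - n0) / n0 * math.exp(-r * t))


def logistic_euler(n0: float, r: float, k: float, t_end: float, dt: float = 0.01) -> List[float]:
    """Integrate the same equation numerically with forward Euler steps of size dt."""
    n = n0
    trajectory = [n]
    for _ in range(int(round(t_end / dt))):
        n += r * n * (1.0 - n / k) * dt
        trajectory.append(n)
    return trajectory


# Hypothetical parameters: 10 founders, growth rate 0.5 per unit time, capacity 1000.
numeric = logistic_euler(10, 0.5, 1000, t_end=20)
print(numeric[-1], logistic_exact(10, 0.5, 1000, 20))
```

The two printed values should agree closely. The numerical version is the approach that still works when a model has no closed-form solution.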

Recent Studies..

Diagnosed and undiagnosed HIV-infected populations in Europe.

15. Ethnicity Prediction:


Predicting the ethnicity of an individual by using genetic variations. Each ethnic group is characterized by a particular set of genetic variations.

16. Functional Domains prediction:


Predicting functionally important protein domains and sites, such as active sites, from a protein's sequence.
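One long-standing way to detect functional sites is to match the sequence against curated patterns, such as those in the PROSITE database. The Python sketch below translates the basic PROSITE pattern syntax into a regular expression. It handles x, [..], {..}, repeat counts, and the < and > anchors. It ignores rarer constructs such as > inside brackets. The zinc-finger-like pattern and the test sequence are only illustrations.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE pattern, e.g. 'C-x(2,4)-C', into a Python regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]  # anchored at the N-terminus
        if element.endswith(">"):
            suffix, element = "$", element[:-1]  # anchored at the C-terminus
        repeat = ""
        m = re.fullmatch(r"(.+?)\((\d+(?:,\d+)?)\)", element)
        if m:
            element, repeat = m.group(1), "{" + m.group(2) + "}"
        if element == "x":
            core = "."  # any amino acid
        elif element.startswith("{"):
            core = "[^" + element[1:-1] + "]"  # any residue except those listed
        else:
            core = element  # single residue or [ABC] class: same syntax in regex
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


zinc_finger_like = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H")
print(zinc_finger_like)  # C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H
protein = "MKC" + "AA" + "C" + "AAA" + "L" + "A" * 8 + "H" + "AAA" + "HK"
print(re.search(zinc_finger_like, protein) is not None)  # True
```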

Recent studies ..

Predicting protein function from domain content.


17. Motif Prediction /Pattern matching:


Predicting functionally important motifs or motif clusters.
Examples: regulatory motifs, binding site motifs, miRNA motifs and repeat motifs. Microsatellites are also a kind of motif.
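Microsatellites are tandem repeats of a short unit, so the perfect ones can be found with a regular expression that uses a back-reference. The Python sketch below assumes units of 1 to 6 bases and at least 4 copies; both thresholds are arbitrary. Imperfect repeats need dedicated tools such as Tandem Repeats Finder.

```python
import re
from typing import List, Tuple


def find_microsatellites(seq: str, min_unit: int = 1, max_unit: int = 6,
                         min_copies: int = 4) -> List[Tuple[int, str, int]]:
    """Return (start, repeat unit, copy number) for perfect tandem repeats in DNA."""
    seq = seq.upper()
    # The lazy group captures the shortest possible unit; \1{n,} then requires at
    # least min_copies - 1 further back-to-back copies of exactly that unit.
    pattern = re.compile(r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1))
    hits = []
    for m in pattern.finditer(seq):
        unit = m.group(1)
        hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits


print(find_microsatellites("GGCACACACACAGG"))  # [(2, 'CA', 5)]
```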
Recent studies...
Biomolecular network motif counting and discovery by color coding.


18. Protein-protein interaction:

19. Protein folding:

One of the most famous and important problems in bioinformatics, and it is still unsolved.

20. Database development:


In some sense, bioinformatics can be called a "comparative method", because it depends on databases for almost all of its analyses. Developing databases is therefore a very important task, and many companies survive by developing and updating databases.

NCBI, the PDB and the UCSC Genome Browser are some of the most important databases.

21. Software development:


Incorporating the use of software into biological analysis is called "bioinformatics".

22. Developing Bioinformatics Methods/Approaches :

23. Primer designing:

24. Modeling genetics History:

25. Ancient DNA:

26. Population Genetics Simulations:

27. Finding SNPs:

28. Genome wide Association Studies:

29. Systems Biology:

30. Homology Search:

31. Computational Genomics:

Personalized Medicine

Human anatomy works in much the same way from one person to another, but human genomes are not identical. Each individual has variations in his or her genome, and genes respond differently to environment and lifestyle. Because of these variations, different people respond to the same drug at different levels.

Nowadays, genome-wide association studies are very active and keep finding new associations between SNPs and diseases. Recent studies confirm associations between SNPs and cancers.

Based on SNPs, we can even predict traits such as eye color and hair color.

The cost of genotyping is falling every month, so an individual can be genotyped to find the SNPs in his or her genome. In the future, doctors may prescribe drugs based on those SNPs.

Bioinformatics and CPAN Modules

Hundreds of bioinformatics Perl modules are available on CPAN, covering almost every kind of bioinformatics analysis.

Here are some of the tasks these modules handle:

1. Formatting the HTML output of BLAST.

2. Automating BLAST for a number of sequences.

3. Running ClustalW.

4. Population genetics.

5. Phylogenetics.

Comparative Genomics

"Compare and Predict" is the basic and whole idea of Bioinformatics.

Comparative genomics is a powerful method in bioinformatics.

Applications of comparative genomics in bioinformatics:

1. To predict and solve protein structures based on existing solved structures.

2. To annotate newly sequenced genomes.

3. To predict functionally important noncoding regions or patterns.
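All three applications start from comparing sequences, usually by aligning them. The Python sketch below shows Needleman-Wunsch global alignment, which is one classic way to do this. The scoring values (match +1, mismatch -1, gap -2) are assumptions. Genome-scale comparisons rely on faster heuristic tools such as BLAST.

```python
from typing import Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> Tuple[int, str, str]:
    """Global alignment with a linear gap penalty; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def pair_score(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))
```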

Perl Scripts for Bioinformatics

Why do bioinformatics people prefer Perl scripts?

1. Perl makes string processing very easy, and biological data such as genome and protein sequences are essentially strings.

2. Perl does not impose strict rules the way some other languages do, which makes it easy for biologists to write scripts.

3. File processing is very easy in Perl.

4. Perl scripts can be combined with shell scripts for processing.

5. Using Perl CGI, we can develop web pages by combining Perl with HTML.

6. CPAN contains many Perl modules that are specific to bioinformatics.

7. Perl is also used for system administration.

8. The Perl Template Toolkit makes web page development very easy for developers.

9. The DBIx modules build on DBI and make database applications easy to write.

10. Processing or parsing an HTML file is very easy using CPAN modules.

11. File type conversion is also possible in Perl, e.g. DOC to PDF and HTML to PDF.

Tuesday, March 18, 2008

What is 123Bioinformatics

Hi Friends,

Welcome to the 123Bioinformatics page, where you can learn about bioinformatics at the research and commercial level. The main objective of this blog is to help young bioinformatics students learn more about bioinformatics. Students can leave their queries here, and we will try to help them with both bioinformatics and programming.

Happy Blogging !!

With Passion for Bioinformatics,
Bioinformatician

123 Bioinformatics

What is bioinformatics?

It's a method to predict biological outcomes before anyone goes for full-fledged research. It's a method to compare biological data, e.g. sequence analysis. It's a way to predict or solve protein structures.
It's the only way to achieve PERSONALIZED MEDICINE in this post-genomic era. It's the method to do comparative genomics and predict homologs of human genes in other species.
It's the method to annotate newly sequenced genomes.

How can biological problems be predicted?

We are living in the world of computers. By analyzing existing biological data using information technology, we can predict biological outcomes.

What is the HOTTEST branch of bioinformatics in this post-genomic era?

Personalized medicine is the hottest and fastest-growing field, and it can be achieved only through bioinformatics.