Thursday, March 27, 2008

SHELL Scripts for Simple Bioinformatics Analysis

Simple SHELL script for parsing BLAST output

1. Parsing the sequence names from BLAST output

"grep" is a powerful Unix command for retrieving the lines of a file that match a particular pattern.

Syntax:
grep "pattern" input_file

Example: grep ">" Blast_output.txt

In the example above, grep retrieves every line that contains the ">" symbol. In a BLAST output file, each sequence name starts with ">", so this command lists all the sequence names in the file.
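As a minimal sketch of the command above, the following builds a small BLAST-style file and extracts the name lines from it. The file name sample_blast.txt and the sequence names are invented for illustration. Anchoring the pattern with "^" restricts the match to lines that begin with ">", and the -c option counts the matching lines instead of printing them.

```shell
# Build a tiny, invented BLAST-style report (illustration only)
cat > sample_blast.txt <<'EOF'
Query= test_query
>gi|111|ref|NP_000001.1| hypothetical protein A
Query: 1 MKVLA 5
Sbjct: 10 MKVLA 14
>gi|222|ref|NP_000002.1| hypothetical protein B
Query: 1 MKVLA 5
Sbjct: 3 MKILA 7
EOF

# Lines that begin with ">" are the sequence names
names=$(grep "^>" sample_blast.txt)
printf '%s\n' "$names"

# -c reports how many lines matched, i.e. how many names were listed
count=$(grep -c "^>" sample_blast.txt)
echo "Number of sequence names: $count"
```

Redirecting the first grep to a file (for example, > names.txt) keeps the list for later steps.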


2. Parsing the Sequence names and the sequences from the BLAST output

"egrep" is a powerful command for retrieving the lines of a file that match any one of several patterns. It is equivalent to "grep -E".

Syntax:
egrep "pattern1|pattern2|pattern3" filename
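One detail worth checking by hand is that any space typed inside the quotes becomes part of the pattern. The sketch below, which uses an invented file demo.txt, counts matches for the alternation written without and with spaces around "|". It uses grep -E, the equivalent of egrep.

```shell
# Invented sample lines (illustration only)
printf '%s\n' '>seq1 example' 'Sbjct: 1 ACGT 4' 'Query: 1 ACGT 4' > demo.txt

# No spaces around "|": matches lines containing ">" or "Sbjct"
tight=$(grep -Ec ">|Sbjct" demo.txt)

# With spaces, the patterns are "> " and " Sbjct", which match nothing here;
# "|| true" keeps the script going, because grep exits non-zero on no match
loose=$(grep -Ec "> | Sbjct" demo.txt || true)

echo "without spaces: $tight, with spaces: $loose"
```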

Example:
Below is a combination of a shell command and a Perl script for parsing the BLAST output.

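# Shell: keep the sequence-name lines (">") and the Sbjct lines, then strip the "Sbjct:" label
egrep ">|Sbjct" Blast_output.txt | sed 's/Sbjct://' > output.txt

# Perl: print name lines unchanged and only the sequence from the other lines
open(my $fh, '<', 'output.txt') or die "Cannot open output.txt: $!";
while (my $ln = <$fh>)
{
    if ($ln !~ m/>/)
    {
        # Fields are separated by whitespace: start position, sequence, end position
        my @temp = split(' ', $ln);
        print "$temp[1]\n";
    }
    else
    {
        print $ln;
    }
}
close($fh);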

In the above example, egrep retrieves the lines that match ">" or "Sbjct", sed removes the "Sbjct:" label, and the result is stored in output.txt. The Perl script then prints the sequence names unchanged and extracts the sequences from the remaining lines.
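The same result can probably be had without leaving the shell. The sketch below swaps awk in for the Perl step. It is an alternative for illustration, not the script described above, and it assumes the Sbjct lines have the layout "Sbjct: start sequence end" separated by whitespace. The file names and sequences are invented.

```shell
# Invented BLAST-style input (illustration only)
cat > sample_hits.txt <<'EOF'
>gi|111|ref|NP_000001.1| hypothetical protein A
Query: 1 MKVLA 5
Sbjct: 10 MKVLA 14
>gi|222|ref|NP_000002.1| hypothetical protein B
Query: 1 MKVLA 5
Sbjct: 3 MKILA 7
EOF

# Keep name lines and Sbjct lines, drop the "Sbjct:" label, then print
# name lines whole and only the second field (the sequence) otherwise
grep -E "^>|^Sbjct" sample_hits.txt \
  | sed 's/^Sbjct://' \
  | awk '/^>/ { print; next } { print $2 }' > parsed.txt

cat parsed.txt
```

When a hit's alignment spans several Sbjct lines, each segment is printed on its own line under the sequence name.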

Comments:

Anonymous said...

JANTA VEDIC COLLEGE, Baraut (U.P.), under CCS University, Meerut, has been offering M.Sc. Bioinformatics since 2007.
For further details:

Dr.Rajeshwari Sharma
09837739099
Co-ordinator,
Dept. of Bioinformatics

Manoj Kumar Sharma
09756276078
Lecturer-Bioinformatics
Dept. of Bioinformatics

Kp tomar said...

About Bioinformatics

The 21st century is considered the information technology era, which has had an immense impact on every sphere of life and has led to knowledge transfer, i.e. outsourcing of information. Information is knowledge, and today’s economy is a knowledge economy. Bioinformatics is an interdisciplinary subject between biological information and IT. Recent developments in the sciences have produced a wealth of experimental data on the sequences and three-dimensional structures of biological macromolecules. With the advances in computer and information science, these data are available to the public from a variety of databases on the Internet. Bioinformatics has been defined as the science of examining the structure and function of genes and proteins through the use of computational analysis, statistics, and pattern recognition. A number of recent workforce studies have shown that there is a high and unmet demand for people trained to various levels of expertise in bioinformatics, from technicians and technical librarians to developers of new and improved methodologies and applications. Bioinformatics is a rapidly evolving field, both in the breadth of its useful applications and in the depth of what can be accomplished. National estimates of needed positions in the field in the next four to five years are about 20,000. Bioinformatics is the study of the inherent structure of biological information and biological systems. It brings together the avalanche of systematic biological data (e.g. genomes) with the analytic theory and practical tools of mathematics and computer science.