Thursday, March 27, 2008

SHELL Scripts for Simple Bioinformatics Analysis

Simple SHELL script for parsing BLAST output

1. Parsing the sequence names from BLAST output

"grep" is one of the most powerful Unix commands for retrieving the lines that match a particular pattern from a file.

Syntax:
grep "pattern" input_file

Example: grep ">" Blast_output.txt

In the above example, grep retrieves the lines that contain the ">" symbol. In a BLAST output file, every sequence name starts with ">", so this command lists all the sequence names in the file.
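As a minimal sketch of the command above, the snippet below writes a tiny BLAST-style report and pulls the sequence names out of it. The file names sample_blast.txt and names.txt and the report contents are made up for illustration; a real BLAST report has more lines, but the name lines look the same.

```shell
# Build a tiny, made-up BLAST-style report so the commands have something to read.
cat > sample_blast.txt <<'EOF'
Query= test_query

>gi|0001|ref|SEQ_A| hypothetical protein A
 Score = 50.1 bits (118), Expect = 2e-10
Query: 1  MKVLA 5
Sbjct: 3  MKVLA 7

>gi|0002|ref|SEQ_B| hypothetical protein B
 Score = 40.0 bits (92), Expect = 1e-05
Query: 1  MKVLA 5
Sbjct: 9  MKILA 13
EOF

# Anchoring the pattern with ^ keeps only the lines that start with ">".
grep "^>" sample_blast.txt > names.txt

# -c counts the matching lines, which here is the number of hits in the report.
hit_count=$(grep -c "^>" sample_blast.txt)
echo "$hit_count"
cat names.txt
```

Using "^>" instead of ">" is a small safety measure: it skips any line that merely contains a ">" somewhere in the middle.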


2. Parsing the sequence names and the sequences from the BLAST output

"egrep" is a powerful command for retrieving the lines that match any one of several patterns from a file.

Syntax:
egrep "pattern1|pattern2|pattern3" filename

Example:
Below is a combination of a shell command and a Perl script for parsing the BLAST output.

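# Shell: keep the sequence-name and Sbjct lines, then strip the "Sbjct:" label
egrep ">|Sbjct" Blast_output.txt | sed 's/Sbjct://' > output.txt

# Perl: print name lines unchanged and only the sequence from Sbjct lines
open(FH, "<", "output.txt") or die "Cannot open output.txt: $!";
while ($ln = <FH>)
{
if ($ln !~ m/>/)
{
# Fields are assumed to be: start position, sequence, end position
@temp = split(' ', $ln);
print "$temp[1]\n";
}
else
{
print $ln;
}
}
close(FH);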

In the above example, egrep retrieves the lines that match ">" or "Sbjct", sed removes the "Sbjct:" label, and the result is stored in output.txt. The Perl script then parses the sequences.
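The same parsing can also be sketched in shell alone, with awk doing the job of the Perl loop. This is only an illustration under a few assumptions: the file names sample_blast2.txt and parsed.txt and the report contents are made up, and each Sbjct line is assumed to have four whitespace-separated fields (label, start position, sequence, end position).

```shell
# A made-up BLAST-style fragment, kept short for the example.
cat > sample_blast2.txt <<'EOF'
>gi|0001|ref|SEQ_A| hypothetical protein A
Query: 1  MKVLA 5
Sbjct: 3  MKVLA 7
>gi|0002|ref|SEQ_B| hypothetical protein B
Query: 1  MKVLA 5
Sbjct: 9  MKILA 13
EOF

# grep -E is the modern spelling of egrep; alternatives are joined by | with no spaces.
grep -E "^>|^Sbjct" sample_blast2.txt |
  awk '/^>/ { print; next }   # name lines pass through unchanged
       { print $3 }           # Sbjct lines: label, start, sequence, end
      ' > parsed.txt
cat parsed.txt
```

Reading the third field directly means the separate sed step is not needed, because the "Sbjct:" label is simply the first field.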
