Saturday, June 14, 2008

Bioinformatics Market Growth - Year wise

Bioinformatics Opportunities in India

Compared with developed countries, outsourcing to India offers about 30-40% cost savings in overall drug discovery research, and close to 60% cost savings when outsourcing core bioinformatics services. This is due to lower wage costs for skilled manpower and lower infrastructure costs.

The Indian bioinformatics market has grown from $18 million in 2003-04 to $35 million in 2006-07, at a CAGR of 25%. Interestingly, owing to low local demand, $32 million or about 90% of bioinformatics revenues in India are derived from outsourcing activities. "Local demand for bioinformatics services is low due to low investment in new drug discovery. Even though research investments of life sciences companies are increasing, they are still small compared to global standards," says Ashutosh Mundkur, VBU Life Sciences, Service Offerings, Satyam Computer Services Ltd.
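The 25% figure can be checked with a few lines of shell. This is only a sketch: the helper name cagr_percent is made up for illustration, and it assumes the two revenue figures are three yearly steps apart (2003-04 to 2006-07).

```shell
# cagr_percent FIRST LAST YEARS
# Prints the compound annual growth rate, in percent, that turns FIRST into LAST
# over YEARS yearly steps. (Hypothetical helper, not part of any package.)
cagr_percent() {
    awk -v first="$1" -v last="$2" -v n="$3" \
        'BEGIN { printf "%.1f\n", ((last / first) ^ (1 / n) - 1) * 100 }'
}

# Figures from the post: $18 million growing to $35 million in three steps.
cagr_percent 18 35 3    # prints 24.8, which rounds to the quoted 25%
```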

The Indian bioinformatics outsourcing services opportunity is estimated to grow at 25% per annum during 2007-2010, raising its share of the global market from 1.4% in 2007 to 1.7% in 2010. These estimates are based on the current plans of Indian vendors and take into account the scarcity of human resources. Improved availability of skilled workers could push growth rates higher. Similarly, positive action by the Indian government to strengthen IP rights could also help raise growth.


Indian Bioinformatics Market:


Pharmaceutical companies are under constant pressure to develop new “blockbuster” drugs to replace older ones that are going off patent. With costs to launch a new drug crossing $1 billion and the number of drugs approved for commercial launch decreasing, pharmaceutical companies are increasingly looking at biotechnology to deliver results.

The Indian bioinformatics industry comprises vendors with origins in either the life sciences domain or the IT domain. We estimate that there are about 45-50 companies active in the bioinformatics segment in India. Of these, about 35 are involved in developing software tools and database solutions and in providing bioinformatics services; the remainder are largely engaged in marketing third-party products and services.

2005 Report:

The global bioinformatics market is currently estimated at about $1.4 billion (€1.1 billion). It is expected to grow at an average annual growth rate of 15.8 per cent to reach nearly $3 billion by 2010.

Pharmaceutical companies are expected to increase their R&D expenditure in the future. A major portion of this R&D spending is expected to go into bioinformatics. Global drug discovery spending is estimated to increase from $19.6 billion in 2002 to $25.1 billion in 2006.

Scientists are acquiring genomics data through the use of techniques such as amplification, DNA microarray expression, real-time PCR and genotyping. Instrumentation, hardware and software are then required to analyse, integrate and transmit this vast amount of data, which has resulted in significant IT challenges for those in the field.

The genomics-based analysis software and services segment is estimated to grow at an average annual growth rate (AAGR) of 21.2 per cent, from $444.7 million in 2005 to $1.16 billion in 2010. Growth in genomics-based content will be the key driver for the rise in this segment.
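The projections in this report are plain compound growth, so they are easy to reproduce. The sketch below assumes five yearly steps from 2005 to 2010 and uses a made-up helper name, project_value; the inputs are the figures quoted above.

```shell
# project_value START RATE_PERCENT YEARS
# Prints START compounded at RATE_PERCENT per year for YEARS years.
# (Hypothetical helper for illustration only.)
project_value() {
    awk -v v="$1" -v r="$2" -v n="$3" \
        'BEGIN { printf "%.2f\n", v * (1 + r / 100) ^ n }'
}

# Genomics-based segment: $444.7 million at 21.2% for 5 years (result in $ million).
project_value 444.7 21.2 5    # prints 1163.00, i.e. about $1.16 billion

# Global market: $1.4 billion at 15.8% for 5 years (result in $ billion).
project_value 1.4 15.8 5      # prints 2.92, i.e. "nearly $3 billion"
```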


References:
1. Drug Researcher

2. PharmaAsia

Friday, June 13, 2008

Bioinformatics Books Collection / Bioinformatics Book Collection.


Visit 123Bioinformatics.com for more Updates




Tuesday, May 27, 2008

Bioinformatics openings in Southeast Asia / Bioinformatics Jobs.


Malaysia is one of the fastest-growing economies in the Asia-Pacific region. Cyberjaya is the IT hub of Malaysia, with hundreds of software companies. Follow the link below to apply for bioinformatics and IT jobs in Cyberjaya, Malaysia.

http://www.cyberjaya-jobs.com/

123 Bioinformatics Team

Thursday, May 1, 2008

Setting up Standalone BLAST Software in Linux



Installing and running the standalone BLAST software on Linux.


Standalone BLAST is a local installation of the NCBI BLAST suite of programs. NCBI provides binaries for various platforms. The programs are the same as the ones on the NCBI website, except that they run on the local machine.

The local version is useful when you have a large set of sequences to BLAST: it is not affected by Internet speed or traffic, and it can be automated.

Standalone BLAST can be downloaded from the NCBI FTP site (the link is in the toolbar at the bottom of the NCBI main page: “FTP Site -> Blast -> executables -> Latest”).

The file should be transferred in binary mode. Filenames are of the following form:

program-version-architecture-os.extension

Remember to choose the appropriate architecture (32-bit or 64-bit). Download the file and extract the contents of the gzip'ed tar archive. The ‘.gz’ file extension indicates that the file has been compressed with gzip (a standard Unix compression utility). The ‘.tar’ extension indicates that the file is a tape archive created with tar (a standard Unix archiving tool).

To uncompress the file with gunzip and extract the files from the archive into the current working directory, run the commands below.

jk@jk:~/Desktop$ gunzip blast-2.2.18-ia32-linux.tar.gz #uncompress

jk@jk:~/Desktop$ tar -xpf blast-2.2.18-ia32-linux.tar #extract

For more information on the options, see the manual pages (man tar, man gunzip).
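The two steps can also be done in one go, because tar can call gzip itself. The following is a minimal sketch, assuming a tar that supports the -z and -C options (GNU tar and BSD tar both do); the function name extract_tarball is hypothetical.

```shell
# extract_tarball ARCHIVE [DEST]
# Unpacks a gzip'ed tar archive into DEST (default: the current directory).
extract_tarball() {
    archive=$1
    dest=${2:-.}
    mkdir -p "$dest" &&
    # -z decompresses on the fly, so the .tar.gz file itself is left untouched;
    # -p preserves file permissions, as in the two-step version.
    tar -xzpf "$archive" -C "$dest"
}

# Example with the archive name used in this post:
# extract_tarball blast-2.2.18-ia32-linux.tar.gz "$HOME/Desktop"
```

Unlike gunzip, this keeps the original .tar.gz file, which is handy if you want to unpack it again later.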

Inside the extracted directory you will find three subdirectories (bin, data, doc). The doc directory contains the README files for each program. The data directory contains the scoring matrices. The bin directory contains all the executables for running the various BLAST searches.

How to execute bl2seq (BLAST two sequence):

Bl2seq compares two sequences using the blastn or blastp algorithm. In these modes both sequences must be of the same type: both nucleotide or both protein.

The input files to any BLAST program should always be in FASTA format.

e.g.
>gi|229673|pdb|1ALC| Alpha-Lactalbumin
KQFTKCELSQNLYDIDGYGRIALPELICTMFHTSGYDTQAIVENDESTEYGLFQISNALWCKSSQSPQSR
NICDITCDKFLDDDITDDIMCAKKILDIKGIDYWIAHKALCTEKLEQWLCEKE

Syntax:

jk@jk:~/Desktop/blast-2.2.18/bin$ ./bl2seq - # displays all options

You can choose the options you need. The mandatory options are -p, -i and -j. The other options can be set explicitly, or else the program uses their default values.

jk@jk:~/Desktop/blast-2.2.18/bin$ ./bl2seq -p blastp -e 0.01 -i file1 -j file2 # blastp compares two protein sequences
-i First sequence [File In]
-j Second sequence [File In]
-p Program name: blastp, blastn, blastx, tblastn or tblastx. For blastx the first sequence must be a nucleotide sequence; for tblastn the second sequence must be a nucleotide sequence.
-e E-value # (optional)

The two input files (file1, file2) should be in the current working directory (blast-2.2.18/bin) for the above syntax to work. If they are elsewhere, give the appropriate path. If you have multiple FASTA sequences to compare, you can automate the above command with a shell script.
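As a rough illustration of such a script, the loop below compares one query file against several other FASTA files, using only the bl2seq options listed above. The function name compare_to_all, the BL2SEQ variable and the output file naming are all assumptions made for this sketch.

```shell
# Path to the bl2seq binary; change it if you are not in blast-2.2.18/bin.
BL2SEQ=${BL2SEQ:-./bl2seq}

# compare_to_all QUERY FILE...
# Runs a blastp comparison of QUERY against each FILE and saves every report
# as <query>_vs_<file>.txt in the current directory (hypothetical naming).
compare_to_all() {
    query=$1
    shift
    for subject in "$@"; do
        out="$(basename "$query")_vs_$(basename "$subject").txt"
        # bl2seq writes its report to standard output, so redirect it to a file.
        "$BL2SEQ" -p blastp -e 0.01 -i "$query" -j "$subject" > "$out"
    done
}

# Example: compare_to_all file1 file2 file3
```

For nucleotide sequences, replace blastp with blastn in the command.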

How to execute Blastall:

Blastall is the most commonly used tool. It can run any of the BLAST programs: blastp, blastn, blastx, tblastn and tblastx. Unlike bl2seq, blastall is used when you have multiple FASTA sequences as input queries to search against a protein or nucleotide database.
You can download a protein or nucleotide database from Swiss-Prot or NCBI. For example, to download human chr22,

go to NCBI-> FTP site-> RefSeq-> H_sapiens-> H_sapiens ->chr22.

Note:

A plain FASTA file cannot be used directly as a BLAST database. You need to prepare it with formatdb, which indexes the entries in the FASTA file and enables BLAST to run much faster.
Uncompress the database first. A protein sequence database will look like the one below. A multiple-sequence query file for blastall looks similar.

>gi|86438068|gb|AAI12638.1| HGD protein [Bos taurus]
MTELKYISGFGNECASEDPRCPGALPEGQNNPQVCPYNLYAEQLSGSAFTCPRSTNKRSWLYRILPSVSH
KPFEFIDQGHITHNWD
>gi|116283875|gb|AAH44758.1| Hgd protein [Mus musculus]
MSVLQRILAVQVPCPKDSWLYRILPSVSHKPFESIDQGHVTHNWDEVGPDPNQLRWKPFEIPKASEKKVD
FVSGLYTLCGAGDIKSNNGLAVHIFLCNSSMENRCFYNSDGDFLIVPQKGKLLIYTEFGKMSLQPNEICV
>gi|116283724|gb|AAH24369.1| Hgd protein [Mus musculus]
MSVLQRILAVQVPCPKDSWLYRILPSVSHKPFESIDQGHVTHNWDEVGPDPNQLRWKPFEIPKASEKKVD
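
Before formatting a database or submitting a query file, it helps to confirm how many records the file holds. In FASTA every record starts with a ">" header line, so the standard grep and sed tools are enough. The two function names below are made up for this sketch.

```shell
# count_fasta_records FILE
# Prints the number of sequences, i.e. the number of lines starting with ">".
count_fasta_records() {
    grep -c '^>' "$1"
}

# list_fasta_ids FILE
# Prints the identifier of each record: the header text up to the first space.
list_fasta_ids() {
    sed -n 's/^>\([^ ]*\).*/\1/p' "$1"
}

# Example: count_fasta_records database_file
```

For the three-record example above, count_fasta_records would report 3.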



1. Formatdb:


jk@jk:~/Desktop/blast-2.2.18/bin$ ./formatdb - # displays all options

jk@jk:~/Desktop/blast-2.2.18/bin$ ./formatdb -i database_file -o T -p T

-i Input file(s) for formatting (this parameter must be set) [File In]
-p Type of file T - protein F - nucleotide (default = T)
-o Parse options T - True: Parse SeqId and create indexes. F - False: Do not parse SeqId. ( default = F)

The input database should be in the current working directory (blast-2.2.18/bin) for the above syntax to work. If it is elsewhere, give the appropriate path.

After running formatdb you will see seven index and data files alongside the original input file. All seven files are required for blastall to run. Make sure the original database file and the generated files are kept in the same directory. Check the contents of formatdb.log for error messages.

2. Executing Blastall:

jk@jk:~/Desktop/blast-2.2.18/bin$ ./blastall -i query_file -p blastp -d database_file -o output_file

-p Program Name [String] Input should be one of "blastp", "blastn", "blastx", "tblastn", or "tblastx".
-d Database [String] (default = nr). The database specified must first be formatted with formatdb.
-i Query File [File In]
-o BLAST report Output File [File Out]

The query file and the formatted database should be in the current working directory (blast-2.2.18/bin) for the above syntax to work. If they are elsewhere, give the appropriate path.

The output file will contain the BLAST output for all the input query sequences.
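
The formatdb and blastall steps can be wrapped in one small shell function. This is a sketch for a protein search only, using just the options described above; the function name run_blastp and the BLAST_BIN variable are assumptions, and the file names in the example are placeholders.

```shell
# Directory holding the formatdb and blastall binaries (default: current directory).
BLAST_BIN=${BLAST_BIN:-.}

# run_blastp QUERY DATABASE OUTPUT
# Indexes DATABASE with formatdb, then searches it with every sequence in QUERY.
run_blastp() {
    query=$1
    db=$2
    out=$3
    # Step 1: index the protein database (-p T) and parse sequence IDs (-o T).
    "$BLAST_BIN/formatdb" -i "$db" -p T -o T || return 1
    # Step 2: run blastp and write the report for all queries to OUTPUT.
    "$BLAST_BIN/blastall" -p blastp -i "$query" -d "$db" -o "$out"
}

# Example: run_blastp query_file database_file output_file
```

The formatdb step only needs to be repeated when the database file changes, so for repeated searches against the same database you may prefer to run it once by hand.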

Monday, April 14, 2008

Perl for Bioinformatics.








CPAN offers two useful command-line utility modules: Perl::Tidy, which beautifies, indents and reformats a messy Perl script, and Perl::Critic, which analyzes Perl scripts against coding guidelines.

a. Perl::Tidy
When a Perl script is given as input to perltidy, it creates an indented, structured Perl script and saves it as a separate file with the same name plus a .tdy extension. Perltidy does not change the input script.

Steps to follow:
1. Install Perl::Tidy. It runs on any system with Perl 5.004 or later, including Unix, Windows, VMS and MacPerl.
2. To execute perltidy,
$ perltidy -[option] test_perl_script.pl
This will create a new file, test_perl_script.pl.tdy, which contains the well-structured Perl script. Many options are available, for example to control indentation or to keep a backup. For more information on installation and execution see http://perltidy.sourceforge.net/tutorial.html


b. Perl::Critic


Perl::Critic analyzes the input Perl script and reports where it departs from various coding guidelines (or policies). The coding guidelines are based on Damian Conway's book Perl Best Practices. The user can enable, disable, create and customize policies through the Perl::Critic interface.
The user can set the severity level. There are five levels: severity 5 is the least restrictive, i.e. Perl::Critic applies only the most basic policies, and severity 1 is the most restrictive. The five levels are gentle (equivalent to 5), stern (equivalent to 4), harsh (equivalent to 3), cruel (equivalent to 2) and brutal (equivalent to 1).
Perl::Critic requires a few modules to be pre-installed for it to execute. See http://search.cpan.org/~elliotjs/Perl-Critic-1.082/lib/Perl/Critic.pm


Steps to follow,

1. Install Perl::Critic.
2. Execute Perl::Critic
$ perlcritic -1 test_perl_script.pl

For more information see, http://search.cpan.org/dist/Perl-Critic/
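
The two tools work well together: tidy a copy of the script first, then review the original at a chosen severity. The wrapper below is only a sketch; the function name tidy_then_critic and the PERLTIDY and PERLCRITIC variables are assumptions, and it expects both commands to be installed and on your PATH.

```shell
# Commands to run; override these if the tools are installed elsewhere.
PERLTIDY=${PERLTIDY:-perltidy}
PERLCRITIC=${PERLCRITIC:-perlcritic}

# tidy_then_critic SCRIPT [SEVERITY]
# Writes a reformatted copy of SCRIPT, then reports policy violations in SCRIPT.
# SEVERITY defaults to 5 (gentle); use 1 (brutal) for the strictest review.
tidy_then_critic() {
    script=$1
    severity=${2:-5}
    # perltidy leaves the input untouched and writes its result to a separate file.
    "$PERLTIDY" "$script" || return 1
    "$PERLCRITIC" --severity "$severity" "$script"
}

# Example: tidy_then_critic test_perl_script.pl 3
```

Starting at severity 5 and working down towards 1 keeps the list of reported violations manageable.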