Scientific Application – BLAST

Overview

BLAST (Basic Local Alignment Search Tool) is a set of similarity search programs for exploring the available sequence databases regardless of whether the query is protein or DNA. These well-known applications are written and maintained by the National Center for Biotechnology Information (NCBI).

A reworked version of BLAST based on the NCBI C++ Toolkit has recently become available. This version, sometimes referred to as BLAST+, is the version of BLAST that NCBI supports going forward. This is the version that is installed on kure.unc.edu.

Research Computing Server(s): Kure, Killdevil
Default Version: 2.2.29
Installed Version(s): 2.2.27, 2.2.28, 2.2.29 (Kure, Killdevil)

Setting up your Environment

On Kure and Killdevil the user environment is managed using the module command. To add blast to just your current session do:

   module add blast

To have blast added automatically each time you log in (which won’t affect your current session) use:

   module initadd blast
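
After loading the module, it can help to confirm that the BLAST+ programs are actually on your PATH. The following is a minimal sketch, assuming a bash-compatible shell; check_blast is a hypothetical helper name and is not something provided by the cluster.

```shell
# check_blast: hypothetical helper that reports whether blastn is available.
check_blast() {
    if command -v blastn >/dev/null 2>&1; then
        # -version is a standard BLAST+ option that prints the program version.
        blastn -version
    else
        echo "blastn not found; try 'module add blast' first" >&2
        return 1
    fi
}
```

Running check_blast after "module add blast" should print the blastn version; if it reports that blastn was not found, the module has not been added to the current session.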

Executing your program

Here is an example of running a BLAST job. All jobs are run by submitting them to the LSF job scheduler, which runs them on the compute nodes. This example submits a job from the command line to the queue named week and searches with a query sequence named NM_005016 against the refseq_rna database. Note that all bsub options precede the BLAST command to be run (here blastn), and all the BLAST command options come after the command itself. (This particular example runs in under a minute but generates a 10,000-line output file, my_blast_out.)

bsub -q week -o /netscr/[onyen]/my_blast_out blastn \
-db /nas02/data/blast/refseq_rna \
-query /nas02/apps/blast-2.2.23/examples/NM_005016
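
The ordering rule described above (bsub options first, then the program name, then the BLAST options) can be sketched as a small dry-run helper that only prints the command line for review. This is an illustration under stated assumptions, not the site's recommended procedure: blast_submit_cmd and the file names my_blast_log and my_blast_hits.tsv are hypothetical, and the -outfmt 6 (tabular output) and -out (result file) options are standard BLAST+ options that do not appear in the example above.

```shell
# blast_submit_cmd: hypothetical helper that prints (does not run) a bsub line.
# Arguments: queue, LSF log file, BLAST result file, database, query file.
blast_submit_cmd() {
    local queue="$1" log="$2" result="$3" db="$4" query="$5"
    # bsub options (-q, -o) go before the program name;
    # blastn options (-db, -query, -outfmt, -out) go after it.
    printf 'bsub -q %s -o %s blastn -db %s -query %s -outfmt 6 -out %s\n' \
        "$queue" "$log" "$db" "$query" "$result"
}

# Dry run: print the command for review. Replace [onyen] with your own Onyen.
blast_submit_cmd week '/netscr/[onyen]/my_blast_log' '/netscr/[onyen]/my_blast_hits.tsv' \
    /nas02/data/blast/refseq_rna /nas02/apps/blast-2.2.23/examples/NM_005016
```

Writing the hits with -out keeps them separate from the LSF job report that bsub -o produces, which may be convenient when the result is large.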

The databases are installed in “/nas02/data/blast”.
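
Before submitting a job, it may be useful to check that a database name exists in that directory and whether it holds nucleotide or protein sequences. The sketch below relies on the usual BLAST database file suffixes (.nin or .nal for nucleotide, .pin or .pal for protein); blast_db_type is a hypothetical helper, and the exact files present on the cluster are an assumption.

```shell
# blast_db_type: hypothetical helper; prints "nucleotide", "protein", or "missing"
# for database NAME under directory DIR, based on BLAST index/alias file suffixes.
blast_db_type() {
    local dir="$1" name="$2" f
    # Nucleotide: NAME.nin, multi-volume NAME.NN.nin, or an alias file NAME.nal.
    for f in "$dir/$name".nin "$dir/$name".*.nin "$dir/$name".nal; do
        [ -e "$f" ] && { echo nucleotide; return 0; }
    done
    # Protein: NAME.pin, multi-volume NAME.NN.pin, or an alias file NAME.pal.
    for f in "$dir/$name".pin "$dir/$name".*.pin "$dir/$name".pal; do
        [ -e "$f" ] && { echo protein; return 0; }
    done
    echo missing
    return 1
}
```

For example, blast_db_type /nas02/data/blast refseq_rna would be expected to report a nucleotide database, which is the kind that blastn searches.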

  • The BLAST search results are written to the file you specify with the bsub ‘-o’ option. In this example, the output file on Kure is “/netscr/[onyen]/my_blast_out”. On KillDevil, use “/lustre/scr/o/n/[onyen]/my_blast_out” instead.

Notes

  • Running BLAST on a desktop is supported on Linux 2.6 only.
  • Here are the databases installed in “/nas02/data/blast” on the Research Cluster:
    • env_nr
    • env_nt
    • est_human
    • est_mouse
    • est_others
    • est_gss
    • htgs
    • human_genomic
    • human_genomic_transcript
    • mouse_genomic_transcript
    • nr
    • nt
    • other_genomic
    • pataa
    • patnt
    • pdbaa
    • pdbnt
    • refseq_genomic
    • refseq_protein
    • refseq_rna
    • sts
    • swissprot
    • taxdb
    • wgs
    • FASTA is also available. If you would like to have access to other databases, please contact the Research Computing Group at research@unc.edu.
  • It is suggested that you use scratch space for all work files. Since this storage is shared with many other users, please remove any files there that are not associated with currently running jobs. If this space fills completely, any job that attempts to write there, no matter which user runs it, will hang or die. To avoid that, the system administrators run a program that detects when free scratch space is low. When that level is reached, old files not associated with running jobs are deleted. This process is automatic and those files cannot be retrieved. Therefore, all result files should be moved to a more secure location, such as your home directory or Mass Storage space.
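
Moving a finished result out of scratch can be sketched as follows. This is a minimal example, assuming a bash-compatible shell; save_result is a hypothetical helper, and the destination directory is whatever safer location you choose.

```shell
# save_result: hypothetical helper that copies one scratch file to a safer
# directory and deletes the scratch copy only after the copy is verified.
save_result() {
    local src="$1" dest_dir="$2" dest
    dest="$dest_dir/$(basename "$src")"
    mkdir -p "$dest_dir" || return 1
    cp -p "$src" "$dest" || return 1
    # cmp -s exits 0 only when the two files are byte-for-byte identical.
    if cmp -s "$src" "$dest"; then
        rm -f "$src"
    else
        echo "copy of $src could not be verified; scratch file kept" >&2
        return 1
    fi
}
```

For instance, save_result /netscr/[onyen]/my_blast_out "$HOME/blast_results" would copy the output file into a blast_results directory (a hypothetical name) under your home directory and then remove the scratch copy.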

Links

Additional Help

Research Computing home page