NCBI BLAST databases

The NCBI BLAST databases are downloaded every Sunday into a directory named for that date. Once the download has been verified, the file /datasets/bio/ncbi-db/.ncbirc is updated to point to the new copy. This lets running jobs keep a consistent database throughout the run.
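To see which dated copy the configuration file currently points to, the BLASTDB entry can be read from it directly. The following is a minimal sketch, assuming the file uses the standard BLAST+ configuration layout (a [BLAST] section containing a BLASTDB= line). The function name current_blastdb_dir is hypothetical and is not part of BLAST+ or Unity.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the BLASTDB value from an .ncbirc-style file.
# Assumes the standard BLAST+ layout, for example:
#   [BLAST]
#   BLASTDB=/some/dated/directory
current_blastdb_dir() {
    # Default to the Unity file named in the text; accept another path as $1.
    local rc="${1:-/datasets/bio/ncbi-db/.ncbirc}"
    # On the first BLASTDB line: drop "BLASTDB =", trim trailing blanks, print, stop.
    sed -n '/^[[:space:]]*BLASTDB[[:space:]]*=/{s/^[^=]*=[[:space:]]*//;s/[[:space:]]*$//;p;q;}' "$rc"
}
```

Called with no argument, current_blastdb_dir reads the file named above; passing a different path reads that file instead.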

Tools that can use the NCBI databases but do not read this configuration file can use the output of blastdb_path to find the current copy, as shown in the following example:

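# Load BLAST+ (provides blastdb_path) and DIAMOND.
module load blast-plus/2.13.0+py3.8.12 diamond/2.0.15+py2.7.18
# Resolve the current copy of the nr protein database.
NR=$(blastdb_path -db nr -dbtype prot)
# Search query.fasta against that copy and write the hits to matches.tsv.
diamond blastp --db "$NR" -q query.fasta -o matches.tsv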
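
In a batch job it can help to stop early if the path lookup returns nothing, rather than letting the search tool fail later with a less obvious message. The sketch below shows one way to do that. The function name require_db_path is hypothetical, and the assumption that an unresolved database yields an empty string has not been checked against every BLAST+ version.

```shell
#!/usr/bin/env bash
# Hypothetical guard: echo the path back if it is non-empty, otherwise
# report an error on stderr and return a non-zero status.
require_db_path() {
    local path="$1" label="${2:-database}"
    if [[ -z "$path" ]]; then
        echo "error: could not resolve a path for $label" >&2
        return 1
    fi
    printf '%s\n' "$path"
}

# Example use inside a job script (commented out because it needs BLAST+):
# NR=$(require_db_path "$(blastdb_path -db nr -dbtype prot)" nr) || exit 1
```

Resolving the path once and reusing the variable also means every later step in the job reads the same copy.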

Path: /datasets/bio/ncbi-db/
URL: https://ftp.ncbi.nlm.nih.gov/blast/db/
Downloaded: weekly
Cite: https://support.nlm.nih.gov/knowledgebase/article/KA-03391/en-us
Last modified: Thursday, September 5, 2024 at 4:00 PM.
University of Massachusetts Amherst, University of Rhode Island, University of Massachusetts Dartmouth, University of Massachusetts Lowell, University of Massachusetts Boston, Mount Holyoke College