Hi,

I would like to create a personal BLAST database of arbitrary sequences and use all the features of BLAST+, such as creating database subsets based on identifiers or filtering by taxonomy.

It looks like the formatting of the definition line in the input FASTA files is crucial for assigning proper sequence identifiers.

With the general database identifier format gnl|database|identifier or the local identifier format lcl|identifier, I wasn't able to use blastdb_aliastool to create database subsets, because it expects a GI list as input. I also didn't have any luck assigning taxonomy identifiers with the -taxid_map option of makeblastdb.

What is the recommended way to format FASTA definition lines so that all the filtering features of the BLAST+ tools can be used?

I was thinking of creating pseudo GenBank definitions for all my sequences: gi|<gi-number>|gb|<AccessionVersion>|<Accession>, where <gi-number> is a generated numeric value and <Accession/Version> is my identifier. This works for the GI-based filtering, but it seems like an ugly hack and I would prefer something more straightforward.
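To make the workaround concrete, here is a minimal Python sketch of how such pseudo GenBank definition lines could be generated. The function name `pseudo_genbank_deflines`, the starting GI of 1, and the choice to put the same identifier in both the accession-version and accession slots are assumptions for illustration, not anything prescribed by BLAST+. The first whitespace-delimited token of each original header is taken as the identifier.

```python
def pseudo_genbank_deflines(fasta_lines, first_gi=1):
    """Rewrite FASTA headers as gi|<n>|gb|<id>|<id> and return (lines, gi_by_id).

    Hypothetical helper: GI numbers are simply counted up from first_gi.
    """
    out = []
    gi_by_id = {}
    next_gi = first_gi
    for raw in fasta_lines:
        line = raw.rstrip("\n")
        if line.startswith(">"):
            parts = line[1:].split(None, 1)
            # Fall back to a generated name if the header is empty (assumption).
            identifier = parts[0] if parts else f"seq{next_gi}"
            description = parts[1] if len(parts) > 1 else ""
            defline = f">gi|{next_gi}|gb|{identifier}|{identifier}"
            if description:
                defline += " " + description
            out.append(defline)
            gi_by_id[identifier] = next_gi
            next_gi += 1
        else:
            # Sequence lines are passed through unchanged.
            out.append(line)
    return out, gi_by_id


def gi_list_text(gis):
    """Format GI numbers one per line, as a plain-text GI list."""
    return "".join(f"{gi}\n" for gi in gis)


fasta = [">seqA first sequence", "ACGT", ">seqB", "GGCC"]
new_lines, gi_by_id = pseudo_genbank_deflines(fasta)
print("\n".join(new_lines))
print(gi_list_text([gi_by_id["seqB"]]), end="")
```

The returned mapping from identifier to generated GI is what makes the subset step possible: picking identifiers, looking up their GIs, and writing them one per line gives a GI list of the kind blastdb_aliastool expects.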

How is the taxid_map file formatted? I've tried <gi-number>, <gb|<AccessionVersion>>, and <gb|Accession> as the sequence identifier, but they don't seem to be assigned properly, and blastdbcmd with -outfmt %T gives me zero for all entries.
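For reference, the makeblastdb help text describes the -taxid_map file as plain text with one sequence identifier and one taxonomy ID per line, separated by whitespace. The sketch below only writes a file in that layout. The function name `format_taxid_map` and the example identifiers are hypothetical, and which identifier form makeblastdb actually matches against is exactly the open question here, so treat this as an illustration of the layout rather than a confirmed fix.

```python
def format_taxid_map(seq_to_taxid):
    """Return taxid_map text: '<sequence-id> <taxid>' on each line.

    Hypothetical helper; the two-column layout follows the makeblastdb
    help text, the identifier form to use is an assumption.
    """
    lines = []
    for seq_id, taxid in seq_to_taxid.items():
        # int() rejects non-numeric taxonomy IDs early.
        lines.append(f"{seq_id} {int(taxid)}")
    return "\n".join(lines) + "\n"


# Example identifiers are made up; 9606 and 10090 are the NCBI taxonomy
# IDs for human and mouse.
text = format_taxid_map({"my_seq_1": 9606, "my_seq_2": 10090})
print(text, end="")
```

The resulting text can be saved to a file and passed to makeblastdb with -taxid_map; checking the result with blastdbcmd and -outfmt %T, as above, shows whether the taxonomy IDs were picked up.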

Thanks for any help,

Deniz

asked 06 Oct '10, 00:17


Deniz

edited 06 Oct '10, 00:18


I've heard back from the NCBI BLAST support team: the taxid_map option isn't functional at the moment, and they are working on a bug fix for an upcoming release. Filtering isn't supported for custom databases either; they plan to add it in a future version.


answered 12 Oct '10, 07:10


Deniz

I discovered your post right after posting a similar question on BioStar. You might get some answers there.


answered 12 Oct '10, 03:34


Jelle

Asked: 06 Oct '10, 00:17

Seen: 1,042 times

Last updated: 12 Oct '10, 07:10
