Wednesday, October 29, 2008

OpenURL for GenBank records

Following on from adding specimens to my OpenURL resolver, I've added support for GenBank records. Either an OpenURL request such as http://bioguid.info/openurl?id=genbank:DQ502033, or the short URL http://bioguid.info/genbank/DQ502033 will resolve the GenBank record for accession number DQ502033.

The HTML isn't much to look at; the real goodness is the JSON, obtained by appending "&display=json" to the OpenURL request, or ".json" to the short form (e.g. http://bioguid.info/genbank/DQ502033.json).
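For anyone scripting against the resolver, here is a minimal Python sketch of the two URL forms described above. The function names openurl_form and short_form are my own invention for illustration, and the code only builds the URL strings; it doesn't fetch anything or assume anything about the shape of the JSON that comes back.

```python
from urllib.parse import urlencode

# Base address of the resolver, as given in the post.
BASE = "http://bioguid.info"


def openurl_form(accession, as_json=False):
    """Build the OpenURL-style request for a GenBank accession.

    Hypothetical helper: the name and signature are illustrative only.
    """
    params = [("id", f"genbank:{accession}")]
    if as_json:
        # The post says JSON is obtained by appending "&display=json".
        params.append(("display", "json"))
    # Keep the colon unescaped so the URL matches the form shown in the post.
    return f"{BASE}/openurl?" + urlencode(params, safe=":")


def short_form(accession, as_json=False):
    """Build the short URL; ".json" is appended for the JSON view."""
    suffix = ".json" if as_json else ""
    return f"{BASE}/genbank/{accession}{suffix}"


if __name__ == "__main__":
    print(openurl_form("DQ502033"))
    print(openurl_form("DQ502033", as_json=True))
    print(short_form("DQ502033"))
    print(short_form("DQ502033", as_json=True))
```

Either form could then be handed to whatever HTTP client you prefer.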

The resolver gets the sequence from NCBI, does a little post-processing, then displays the result. Post-processing includes parsing the latitude and longitude coordinates (something of a mess in GenBank, see my earlier metacrap rant), extracting specimen codes, adding bibliographic GUIDs (such as DOIs, Handles, or URLs), finding uBio namebankIDs for hosts, etc. Note that some records have a key called "taxonomic_group". This is to provide clues for resolving museum specimens -- often the DiGIR provider needs to know what kind of taxon you are searching for.
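To give a flavour of the coordinate parsing step, here is a rough Python sketch that handles just one tidy case: decimal degrees followed by a hemisphere letter, such as "12.5 S 45.25 E". This is not the resolver's actual code, the function name parse_lat_lon is hypothetical, and real GenBank records contain plenty of variants that this pattern would reject.

```python
import re

# Matches strings like "12.5 S 45.25 E": a decimal number and N/S,
# then a decimal number and E/W. Assumed format, for illustration only.
LAT_LON = re.compile(
    r"^\s*(\d+(?:\.\d+)?)\s*([NS])\s+(\d+(?:\.\d+)?)\s*([EW])\s*$",
    re.IGNORECASE,
)


def parse_lat_lon(value):
    """Return (latitude, longitude) as signed decimal degrees, or None.

    Hypothetical helper: south and west become negative values.
    """
    match = LAT_LON.match(value)
    if match is None:
        return None  # not in the one format this sketch understands

    lat = float(match.group(1))
    lon = float(match.group(3))
    if lat > 90 or lon > 180:
        return None  # out of range, so treat as unparseable

    if match.group(2).upper() == "S":
        lat = -lat
    if match.group(4).upper() == "W":
        lon = -lon
    return (lat, lon)


if __name__ == "__main__":
    print(parse_lat_lon("12.5 S 45.25 E"))
    print(parse_lat_lon("somewhere near the river"))
```

Returning None for anything unrecognised keeps the messy cases visible instead of silently guessing at them.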

The aim is to have a simple service that returns somewhat cleaned up GenBank records that I (and others) can play with.
