GGBN Data Portal Explanations

From GGBN Wiki
Revision as of 20:28, 16 December 2015 by WikiSysop (talk | contribs) (Record detail)

Background

Three search options to choose from

We use the Berlin Harvesting and Indexing Toolkit (B-HIT) to harvest GGBN provider data. Records (also called units) can be harvested from providers running either a BioCASe or an IPT installation. For BioCASe providers, the schemata ABCD 2.06, ABCD 2.1, ABCDDNA, ABCDGGBN and ABCDEFG are supported (as single records or ABCD Archives). For IPT providers, Darwin Core Archives are supported, including the GGBN extensions. The indexed elements are listed at http://wiki.bgbm.org/bhit/index.php/Indexed_fields.

The statistics feature can be found under the freezer icon

The search features can be found under the magnifying glass icon. There are three ways to search within the GGBN Data Portal:

  • Search by fields
  • Browse the tree of life
  • Browse collections

You can also view statistics on the GGBN online collections.

Data Quality and Data Cleaning

During harvesting, GGBN provider data are checked and, if necessary, cleaned. We keep the original provider data in addition to the cleaned versions. Data quality tests are run with B-HIT: country names are translated into English, ISO codes are compared to the country names, and coordinates are validated and checked against both the ISO code and the country name. If the data are incomplete, the tool looks into the named areas and localities and tries to extract information about the country or the water body.
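The checks described above can be sketched roughly as follows. This is a minimal illustration, not B-HIT's actual implementation: the lookup tables, function names and sample values are all hypothetical, and the spatial check of coordinates against a country is left out because it needs country geometries.

```python
# Tiny illustrative lookup (assumption); a real check would use the full ISO 3166-1 list.
ISO_TO_COUNTRY = {"DE": "Germany", "US": "United States", "BR": "Brazil"}

# Hypothetical translations of non-English country names into English.
TRANSLATIONS = {"Deutschland": "Germany", "Brasil": "Brazil"}


def normalize_country(name: str) -> str:
    """Return the English country name, translating known variants."""
    cleaned = name.strip()
    return TRANSLATIONS.get(cleaned, cleaned)


def iso_matches_country(iso_code: str, country: str) -> bool:
    """Check that an ISO code and a (possibly untranslated) country name agree."""
    expected = ISO_TO_COUNTRY.get(iso_code.strip().upper())
    return expected is not None and expected == normalize_country(country)


def coordinates_valid(latitude: float, longitude: float) -> bool:
    """Check that decimal coordinates fall within the valid ranges."""
    return -90.0 <= latitude <= 90.0 and -180.0 <= longitude <= 180.0
```

For instance, `iso_matches_country("de", "Deutschland")` agrees after translation, while `coordinates_valid(91.0, 0.0)` is rejected because the latitude is out of range.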

Scientific names are parsed using the GBIF Name Parser (http://www.gbif.org/developer/species#parser) and customized regular expressions.
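To give an idea of what a regular-expression step might look like, here is a small sketch that splits a simple binomial into genus, epithet and authorship. The pattern and the function name `parse_binomial` are hypothetical and are not the expressions used by GGBN. The sketch deliberately ignores infraspecific ranks, hybrids and other cases that the GBIF Name Parser handles.

```python
import re
from typing import Optional

# Hypothetical pattern: capitalized genus, lowercase epithet, optional authorship.
BINOMIAL = re.compile(
    r"^(?P<genus>[A-Z][a-z]+)\s+"
    r"(?P<epithet>[a-z][a-z-]+)"
    r"(?:\s+(?P<authorship>.+))?$"
)


def parse_binomial(name: str) -> Optional[dict]:
    """Split a simple binomial name into its parts, or return None if it does not fit."""
    match = BINOMIAL.match(name.strip())
    if match is None:
        return None
    return match.groupdict()
```

A name such as "Accipiter gentilis (Linnaeus, 1758)" is split into the three parts, whereas a bare genus such as "Accipiter" does not fit the pattern and yields `None`.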

Taxonomic Backbone

After harvesting, the scientific names are matched against selected checklists in the GBIF Checklist Bank. Higher taxa, synonyms and accepted taxa are retrieved through the GBIF Checklist Bank web service (http://api.gbif.org/v1/species). The checklists used include the Catalogue of Life, NCBI and the GBIF backbone itself. In addition, we match the names against the Prokaryotic Nomenclature up-to-date (PNU) web service provided by the DSMZ.
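As an illustration of such a lookup, the sketch below queries the public GBIF species match endpoint, which matches a name against the GBIF backbone only. Whether B-HIT uses this exact call is an assumption, and the function names are made up for this example. Matching against other checklists such as the Catalogue of Life or NCBI is not shown.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public GBIF endpoint for fuzzy matching against the GBIF backbone.
GBIF_MATCH_ENDPOINT = "https://api.gbif.org/v1/species/match"


def build_match_url(name: str) -> str:
    """Build the request URL for matching one scientific name."""
    return f"{GBIF_MATCH_ENDPOINT}?{urlencode({'name': name})}"


def match_name(name: str, timeout: float = 10.0) -> dict:
    """Query GBIF and return the decoded JSON response (requires network access)."""
    with urlopen(build_match_url(name), timeout=timeout) as response:
        return json.load(response)
```

Calling `match_name("Accipiter gentilis")` should return a JSON object describing the best match in the backbone, including its higher classification if a match is found.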

Search by fields

http://data.ggbn.org/ggbn_portal/search/index

The portal performs an exact search for every parameter. To run a wildcard ("like") search, use the asterisk '*': for example, Accipiter* returns all records beginning with Accipiter. The wildcard search works for all fields except dropdown lists, such as country or ocean.
GGBN data portal search form.jpg
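The difference between an exact search and an asterisk search can be sketched as follows. This is only an illustration of the matching semantics described above, not the portal's code: the function name `matches` is hypothetical, and the case-sensitive comparison is an assumption.

```python
import re


def matches(term: str, value: str) -> bool:
    """Exact match by default; each '*' in the term matches any run of characters."""
    # Escape the literal pieces and join them with ".*" where asterisks were.
    pattern = ".*".join(re.escape(part) for part in term.split("*"))
    return re.fullmatch(pattern, value) is not None
```

Here `matches("Accipiter", "Accipiter gentilis")` is false because the search is exact, while `matches("Accipiter*", "Accipiter gentilis")` is true.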

Here you can choose different parameters to filter your results. The upper part contains parameters often used by researchers and curators. You can add further parameters by clicking on "add search field". We distinguish between GGBN repositories (DNA and tissue banks) and voucher collections. The latter can also be non-GGBN institutions.

Select additional search parameters
The field (here Ocean) appears immediately. It can be deleted by clicking on the red cross

Most of the fields are dropdown lists or offer suggestion lists to help you. For example, when you type a name, the portal searches for all synonyms and accepted names matching your search term and shows a suggestion list with detailed information about each name found in the GGBN backbone.

GGBN data portal suggestion list.jpg
You can search for any scientific name using "Scientific Name", including higher taxa.

Edit your search

Your results are displayed in a hitlist. You can change the filters at any time: select further parameters from "add search field" or delete some using the red cross. To see the new results, click on "Refine search". You can also change the sort order of the columns by clicking on the little arrows. To see the details of a record, click on the blue scientific name.

GGBN data portal hitlist.jpg

Record detail

The record details page aggregates data from multiple sources. Here you see an example with a DNA sample, a tissue sample and a specimen. These data come from up to three different data sources, depending on where the samples and data are deposited. At the top you find information about loan availability and conditions. The portal also checks whether the taxon is listed by CITES. To the left of the map you find collecting information and determination details. In the lower part you see several blue tabs with information about the physical samples and where to find them.

At the top right, information about the taxon is retrieved live from external sources such as GBIF, NCBI, BOLD and EOL. You can also see how many further samples of this taxon are available at GGBN.

On the left you find information on samples at GGBN that come from the same population or the same individual as this one.

If sequences, publications or multimedia items are provided, further tabs appear.

GGBN data portal add record detail.jpg

Order samples/Login feature

login feature

To order samples or subscribe to searches you must register as a user. To do so, click on "log in" or the person icon in the menu. Please check our Data Privacy Statement.