Difference between revisions of "How to search for material"

From GGBN Wiki
(Shopping Cart)
(Preorder samples/Login feature)
 
(81 intermediate revisions by 3 users not shown)
Line 1: Line 1:
In general you can use three options to search our data portal:
+
=Background=
 +
We use the [http://wiki.bgbm.org/bhit/ Berlin Harvesting and Indexing Toolkit (B-HIT)] to harvest GGBN provider data. The records (or units) can be harvested from providers having either a BioCASe or an IPT installation. For BioCASe providers, the schemata ABCD 2.06, ABCD 2.1, ABCDDNA, ABCDGGBN and ABCDEFG are supported (single records or ABCD Archives). For IPT providers, DarwinCore Archives are supported, including the GGBN extensions. Elements that are indexed are listed at http://wiki.bgbm.org/bhit/index.php/Indexed_fields.
  
1. Browse Taxonomy
+
You can either use the full-text search on the landing page or select "Search" from the menu.
2. Search by Fields
 
3. Search by Citation
 
  
=Browse Taxonomy=  
+
==Data Quality and Data Cleaning==
[[File:Navi-Browse Taxonomy.JPG|thumb|100px]]
+
During harvesting, GGBN provider data are checked and, if necessary, cleaned. We keep the original provider data in addition to the cleaned versions. Data quality tests are done using B-HIT. Country names are translated into English, ISO codes are compared to the country names, and coordinates are validated and checked against both ISO code and country name. If data are incomplete, the tool looks into the named areas and localities and tries to extract information about the country or the water body.
The GGBN/DNA Bank Network's data portal makes currently use of the Catalogue of Life (CoL) as its main taxonomic backbone. We match family and genus of original records provided by our partners with the CoL annual checklist (version 2009). Selecting [http://www.dnabank-network.org/CoL.php "Browse Taxonomy"] leads you to our Taxonomic Backbone site (figure below).  
 
  
This site gives you an overview on how many samples and taxa are online available for certain higher taxa and genera (taxa and sample counts bracketed). By clicking on "Show DNA samples" you will be redirected to a query that gives you all records for selected taxon.
+
Scientific names are parsed using the GBIF Name Parser (http://www.gbif.org/developer/species#parser) and customized regular expressions.
  
Furthermore you can click on the little plus icon to see the next level of taxa.
+
==Taxonomic Backbone==
 +
After harvesting, the scientific names are matched against selected checklists of the GBIF checklist bank. Higher taxa, synonyms and accepted taxa are also retrieved using the GBIF checklist bank web service (http://api.gbif.org/v1/species). These checklists include Prokaryotic Nomenclature Up-to-Date (PNU), Catalogue of Life, NCBI and the GBIF backbone itself.
  
 +
=Search=
 +
<div id="wikinote" align="center">http://www.ggbn.org/ggbn_portal/search/index
  
[[File:CoL-Start1.JPG|500px]]
+
The Data Portal is based on SOLR, which provides powerful full-text search. We have implemented this feature in all search fields, apart from select lists, checkboxes and radio buttons. The search is case-insensitive, so e.g. “black sea”, “Black Sea” or “Black sea” will all work.</div>
  
=Search by Fields=
+
==Search by fields==  
[[File:Navi-Data Portal.JPG|thumb|100px]]
+
[[File:GGBN data portal search form.jpg|center|700px]]
The main query form for the data portal can be found at "Search & Preorder". Here you can choose many criteria to filter your results. The upper part contains parameters related to the underlying voucher and collection event, and the lower part contains facts about the DNA samples. If you select, for example, a certain DNA bank, you can browse its whole DNA and tissue collection.
 
  
The NCBI Taxonomy ID is often used by the microorganismic community. Right now search does not include synonyms, but will do in the future. Many records are provided with multiple determinations (their determination history). We index all determinations and you can search for all of them. Suggestion lists will help you when entering Family name or Species name.
+
Here you can choose different parameters to filter your results using the facets on the left. Different filters/facets are combined with the AND operator, e.g. materialType=DNA&country=Belgium will search for DNA samples collected in Belgium. Within each facet the OR operator is used, e.g. materialType=DNA&materialType=tissue&country=Belgium will search for DNA OR tissue samples collected in Belgium. This can be extended to as many facets/filters as you like.
  
Furthermore we filter collection year as well as Seas and Oceans out of the raw data. Seas/Oceans and Countries are matched with an existing list, so that also countries like "England" will be recognized as United Kingdom for instance.
+
[[File:Search expand sort.PNG|center|700px]]
 +
In addition, you can sort the data and preview what kind of material is available by expanding the rows.
  
We also index related Sequence Accession Numbers (BOLD and GenBank/EMBL/DDBJ) to be available for search.
 
  
==Dependent parameters==
+
Most of the fields are drop-down lists or include suggestion lists to help you. For example, when you type a name, the portal searches for all synonyms and accepted names matching your search term and provides a suggestion list with detailed information about the names found in the GGBN backbone.
Some parameters are related to each other, e.g. if you select a Continent or an Ocean, the list of Countries or Seas respectively will be reduced to those belonging to the selected Continent/Ocean.
+
[[File:GGBN data portal suggestion list.jpg|center]]
The same happens for Family name and Species name/Taxonomy ID. If you select a Family name, the list of suggested Species names and Taxonomy IDs is reduced to the relevant ones.
 
  
[[File:Queryform.JPG|600px]]
+
<div id="wikinote">You can search for any scientific name using "Scientific Name", including higher taxa.</div>
  
==Getting results==
+
==Record detail==
After clicking on "Search" you will receive a hitlist with 50 records per page. The column headings contain small arrows. Clicking on such an arrow will sort the results by that column. The green arrow marks the current order (in the figure, ordered by Species name from A to Z). The hitlist contains the species name, the country where the specimen/sample was collected, the DNA number and the specimen/voucher number. Clicking on the small magnifier or the species name will give you the record details.
+
[[File:Material_entities.jpg|thumb|400px|This is an example of 6 material entities that are aggregated as three GGBN records (green [1], red [2] and blue [3]) through their relationships. Together they represent one occurrence. In this example there would be three record pages. Please check out our [[Definition of GGBN Terms]] for further explanation]]
 +
The record details page aggregates data from multiple sources. Please have a look at our [[Definition_of_GGBN_Terms | definitions]] of the differences between GGBN records, material entities and occurrences. Here you see an example with DNA sample, Tissue sample and Specimen. This is indicated both in the top blue bar (individual tabs for each material entity) and above the title. The data come from up to three different data sources, depending on where the samples and data are deposited. Below the map you find information about related records (e.g. another tissue derived from the same specimen) as well as information about loan availability and conditions. Furthermore, it is checked whether the taxon is listed on CITES. To the left of the map you find collecting information and determination details. In the lower part you'll find information about the physical samples and where to find them. If you click on the Institution Full Name you'll see the GGBN members page for this institution.
  
Some samples don't have a DNA number. This means that DNA has not been extracted yet but can be ordered on demand.
+
If sequences or multimedia items are associated with these materials, further tabs will appear. At the very bottom you'll find information about the dataset.
 
+
[[File:GGBN data portal add record detail.jpg|center|700px]]
[[File:Hitlist-portal.JPG|600px]]
 
 
 
==Pre-order samples==
 
Login is required for pre-ordering samples. When you are logged in you can select the required samples (checkboxes on the right) and click on "Add selected DNA/Tissue samples to shopping cart" on top right.
 
Some samples are blocked for the ordering process and marked with a big red X.
 
[[File:Hitlist-portal-shop.JPG|600px]]
 
===Shopping Cart===
 
After you put samples into the shopping cart, the message "Your shopping cart contains xy samples" appears at the top of the website. Clicking on "Show details" opens your shopping cart and guides you through the pre-ordering process.
 
[[File:ShoppingCart1.JPG|600px]]
 
 
 
The shopping cart gives you an overview of your selected DNA/tissue samples. Here you can delete samples from your cart or, by clicking on the taxon name, view the record details again. Click on "Continue Pre-Order (->Step 2/3)" to continue.
 
[[File:ShoppingCart2.JPG|600px]]
 
 
 
The next step groups the samples by institution/DNA bank. In the example below you see that the selected samples are deposited at BGBM and DSMZ. You can check your invoice and delivery address and add some notes if you want. When you click on "Finish Pre-Order (->Step 3/3)", your pre-order will be forwarded to the DNA bank(s) responsible for the requested samples. Every DNA bank receives only the order information relevant to it. Subsequently, a confirmation email will be sent to you by the DNA bank(s) in question. An offer including binding prices will then be made in a separate email.
 
 
 
[[File:ShoppingCart3.jpg|600px]]
 

Latest revision as of 14:00, 29 January 2025

Background

We use the Berlin Harvesting and Indexing Toolkit (B-HIT) to harvest GGBN provider data. The records (or units) can be harvested from providers having either a BioCASe or an IPT installation. For BioCASe providers, the schemata ABCD 2.06, ABCD 2.1, ABCDDNA, ABCDGGBN and ABCDEFG are supported (single records or ABCD Archives). For IPT providers, DarwinCore Archives are supported, including the GGBN extensions. Elements that are indexed are listed at http://wiki.bgbm.org/bhit/index.php/Indexed_fields.

You can either use the full-text search on the landing page or select "Search" from the menu.

Data Quality and Data Cleaning

During harvesting, GGBN provider data are checked and, if necessary, cleaned. We keep the original provider data in addition to the cleaned versions. Data quality tests are done using B-HIT. Country names are translated into English, ISO codes are compared to the country names, and coordinates are validated and checked against both ISO code and country name. If data are incomplete, the tool looks into the named areas and localities and tries to extract information about the country or the water body.
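The kind of consistency check described above can be sketched roughly as follows. This is a minimal illustration only, not B-HIT's actual implementation: the lookup tables, function names and issue messages are hypothetical, and the coordinate test only covers the valid value range rather than a check against country boundaries.

```python
# Hypothetical lookup tables for illustration; a real check would use a
# complete ISO 3166-1 list and a much larger set of name variants.
ISO_TO_COUNTRY = {"BE": "Belgium", "DE": "Germany", "GB": "United Kingdom"}
COUNTRY_SYNONYMS = {
    "belgique": "Belgium",
    "deutschland": "Germany",
    "england": "United Kingdom",
}


def normalize_country(name):
    """Return the English country name, or None if the name is unknown."""
    key = name.strip().casefold()
    if key in COUNTRY_SYNONYMS:
        return COUNTRY_SYNONYMS[key]
    for english in ISO_TO_COUNTRY.values():
        if english.casefold() == key:
            return english
    return None


def coordinates_in_range(lat, lon):
    """Check that latitude and longitude are within their valid ranges."""
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


def check_record(country_name, iso_code, lat, lon):
    """Return a list of data quality issues found for one record."""
    issues = []
    english = normalize_country(country_name)
    if english is None:
        issues.append("unknown country name")
    expected = ISO_TO_COUNTRY.get(iso_code.strip().upper())
    if expected is None:
        issues.append("unknown ISO code")
    elif english is not None and english != expected:
        issues.append("ISO code does not match country name")
    if not coordinates_in_range(lat, lon):
        issues.append("coordinates out of range")
    return issues
```

An empty list means that no issue was found; the original values would still be kept alongside any cleaned ones.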

Scientific names are parsed using the GBIF Name Parser (http://www.gbif.org/developer/species#parser) and customized regular expressions.
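To give an idea of what a customized regular expression for names might look like, here is a deliberately simple sketch. The pattern and the function name are assumptions made for illustration; they are neither the GBIF Name Parser nor the expressions used in B-HIT, and they only handle plain binomials with optional authorship (no infraspecific ranks, hybrids or qualifiers).

```python
import re

# Illustrative pattern: capitalised genus, lower-case epithet, optional
# trailing authorship string.
BINOMIAL = re.compile(
    r"^(?P<genus>[A-Z][a-z]+)\s+"
    r"(?P<epithet>[a-z][a-z-]+)"
    r"(?:\s+(?P<authorship>.+))?$"
)


def parse_binomial(name):
    """Split a simple binomial into genus, epithet and authorship.

    Returns None if the name does not look like a plain binomial.
    """
    match = BINOMIAL.match(name.strip())
    if match is None:
        return None
    return match.groupdict()
```

For "Quercus robur L." this yields the genus "Quercus", the epithet "robur" and the authorship "L."; a bare genus name such as "Quercus" is not matched and returns None.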

Taxonomic Backbone

After harvesting, the scientific names are matched against selected checklists of the GBIF checklist bank. Higher taxa, synonyms and accepted taxa are also retrieved using the GBIF checklist bank web service (http://api.gbif.org/v1/species). These checklists include Prokaryotic Nomenclature Up-to-Date (PNU), Catalogue of Life, NCBI and the GBIF backbone itself.
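As a rough illustration of a lookup against the GBIF web service, the sketch below queries the public `/species/match` endpoint, which matches a name against the GBIF backbone only. It is not the matching procedure used by B-HIT (which also covers other checklists); the helper names and the selection of response fields are assumptions made for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GBIF_MATCH_ENDPOINT = "https://api.gbif.org/v1/species/match"

# Fields picked for illustration; the service returns more than these.
SUMMARY_FIELDS = ("usageKey", "scientificName", "status", "matchType", "family", "genus")


def build_match_url(name, strict=False):
    """Build the request URL for matching one scientific name."""
    params = {"name": name}
    if strict:
        params["strict"] = "true"
    return f"{GBIF_MATCH_ENDPOINT}?{urlencode(params)}"


def summarize_match(payload):
    """Reduce a decoded JSON response to a few fields (None if missing)."""
    return {field: payload.get(field) for field in SUMMARY_FIELDS}


def match_name(name):
    """Query the service for one name (requires network access)."""
    with urlopen(build_match_url(name), timeout=10) as response:
        return summarize_match(json.load(response))
```

Calling `match_name("Quercus robur")` sends a single request; the `matchType` field in the response indicates how closely the name matched.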

Search

http://www.ggbn.org/ggbn_portal/search/index

The Data Portal is based on SOLR, which provides powerful full-text search. We have implemented this feature in all search fields, apart from select lists, checkboxes and radio buttons. The search is case-insensitive, so e.g. “black sea”, “Black Sea” or “Black sea” will all work.

Search by fields

GGBN data portal search form.jpg

Here you can choose different parameters to filter your results using the facets on the left. Different filters/facets are combined with the AND operator, e.g. materialType=DNA&country=Belgium will search for DNA samples collected in Belgium. Within each facet the OR operator is used, e.g. materialType=DNA&materialType=tissue&country=Belgium will search for DNA OR tissue samples collected in Belgium. This can be extended to as many facets/filters as you like.
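The AND/OR behaviour described above can be sketched as follows. The parameter names `materialType` and `country` are taken from the examples above; the function names and the record structure are hypothetical, and the second function only mimics the filter logic locally rather than calling the portal.

```python
from urllib.parse import urlencode


def build_facet_query(facets):
    """Build a query string from a mapping of facet name -> list of values.

    Repeating a parameter expresses OR within a facet; different
    parameters are combined with AND.
    """
    return urlencode(facets, doseq=True)


def record_matches(record, facets):
    """Mimic the filter logic: every facet must match (AND), and within
    a facet any one of the listed values is enough (OR)."""
    return all(record.get(field) in allowed for field, allowed in facets.items())


facets = {"materialType": ["DNA", "tissue"], "country": ["Belgium"]}
query = build_facet_query(facets)
```

Here `query` holds `materialType=DNA&materialType=tissue&country=Belgium`, i.e. DNA OR tissue samples collected in Belgium.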

Search expand sort.PNG

In addition, you can sort the data and preview what kind of material is available by expanding the rows.


Most of the fields are drop-down lists or include suggestion lists to help you. For example, when you type a name, the portal searches for all synonyms and accepted names matching your search term and provides a suggestion list with detailed information about the names found in the GGBN backbone.

GGBN data portal suggestion list.jpg
You can search for any scientific name using "Scientific Name", including higher taxa.

Record detail

This is an example of 6 material entities that are aggregated as three GGBN records (green [1], red [2] and blue [3]) through their relationships. Together they represent one occurrence. In this example there would be three record pages. Please check out our Definition of GGBN Terms for further explanation.

The record details page aggregates data from multiple sources. Please have a look at our definitions of the differences between GGBN records, material entities and occurrences. Here you see an example with DNA sample, Tissue sample and Specimen. This is indicated both in the top blue bar (individual tabs for each material entity) and above the title. The data come from up to three different data sources, depending on where the samples and data are deposited. Below the map you find information about related records (e.g. another tissue derived from the same specimen) as well as information about loan availability and conditions. Furthermore, it is checked whether the taxon is listed on CITES. To the left of the map you find collecting information and determination details. In the lower part you'll find information about the physical samples and where to find them. If you click on the Institution Full Name you'll see the GGBN members page for this institution.

If sequences or multimedia items are associated with these materials, further tabs will appear. At the very bottom you'll find information about the dataset.

GGBN data portal add record detail.jpg