Difference between revisions of "How to search for material"

From GGBN Wiki
(Preorder samples/Login feature)
(Preorder samples/Login feature)
 
(29 intermediate revisions by 2 users not shown)
Line 1: Line 1:
 
=Background=
 
=Background=
[[File:GGBN data portal menu search.jpg|thumb|right|200px|three search options to choose]]
+
We use the [http://wiki.bgbm.org/bhit/ Berlin Harvesting and Indexing Toolkit (B-HIT)] to harvest GGBN provider data. The records (or units) can be harvested from providers having either a BioCASe or an IPT installation. For BioCASe providers, the schemata ABCD 2.06, ABCD 2.1, ABCDDNA, ABCDGGBN and ABCDEFG are supported (single records or ABCD Archives). For IPT providers, DarwinCore Archives are supported, including the GGBN extensions. Elements that are indexed are listed at http://wiki.bgbm.org/bhit/index.php/Indexed_fields.
We use the [http://wiki.bgbm.org/bhit/ Berlin Harvesting and Indexing Toolkit (B-HIT)] to harvest GGBN provider data. The records (or units) can be harvested from providers having either a BioCASe or an IPT installation. For BioCASe providers, the schemata ABCD 2.06, ABCD 2.1, ABCDDNA, ABCDGGBN and ABCDEFG are supported (single records or ABCD Archives). For IPT providers, DarwinCore Archives are supported, including the GGBN extensions. Elements that are indexed are listed under http://wiki.bgbm.org/bhit/index.php/Indexed_fields.
 
  
[[File:GGBN data portal menu stats.jpg|thumb|200px|statistic feature can be found under the freezer icon]]
+
You can either use the full text search on the landing page or select "Search" from the menu.
The search features can be found under magnifying glass icon. You can use three options to search within the GGBN Data Portal:
 
*Search by fields
 
*Browse the tree of life
 
*Browse collections
 
 
 
Furthermore you can check out statistics of GGBN online collections.
 
  
 
==Data Quality and Data Cleaning==
 
==Data Quality and Data Cleaning==
During harvesting GGBN provider data are checked and cleaned if necessary. We keep the original provider data in addition to cleaned versions. Data quality tests are done using B-HIT. Country names are translated in English, ISO codes are compared to the country names, coordinates are validated and checked again both ISO code and country name. In case of incomplete data, the tool is looking into the namedareas and localities and tries to extract some information regarding the country or the water body.
+
During harvesting, GGBN provider data are checked and cleaned if necessary. We keep the original provider data in addition to the cleaned versions. Data quality tests are done using B-HIT. Country names are translated into English, ISO codes are compared to the country names, and coordinates are validated and checked against both ISO code and country name. In case of incomplete data, the tool looks into the named areas and localities and tries to extract information about the country or the water body.
  
 
Scientific names are parsed using the GBIF Name Parser (http://www.gbif.org/developer/species#parser) and customized regular expressions.
 
Scientific names are parsed using the GBIF Name Parser (http://www.gbif.org/developer/species#parser) and customized regular expressions.
  
 
==Taxonomic Backbone==
 
==Taxonomic Backbone==
After harvesting the scientific names are matched against certain checklists of the GBIF checklist bank. Higher taxa, synonyms and accepted taxa are retrieved, also using the GBIF Checklistbank webservice (http://api.gbif.org/v1/species). These checklists include: Catalogue of Life, NCBI and the GBIF backbone itself. In addition we match the names against the Prokaryotic Nomenclature up-to-date (PNU) web service, provided by the [http://bacdive.dsmz.de/api/pnu/ DSMZ].
+
After harvesting, the scientific names are matched against certain checklists of the GBIF checklist bank. Higher taxa, synonyms and accepted taxa are also retrieved using the GBIF checklist bank web service (http://api.gbif.org/v1/species). These checklists include Prokaryotic Nomenclature Up-to-Date (PNU), Catalogue of Life, NCBI and the GBIF backbone itself.
 +
 
 +
=Search=
 +
<div id="wikinote" align="center">http://www.ggbn.org/ggbn_portal/search/index
  
=Search by fields=
+
The Data Portal is based on SOLR, which provides powerful full text search. We have implemented this feature in all search fields, apart from select lists, checkboxes and radio buttons. The search is case-insensitive, so e.g. “black sea”, “Black Sea” or “Black sea” will all work.</div>
<div id="wikinote" align="center">http://data.ggbn.org/ggbn_portal/search/index
 
  
The portal will perform an exact search for every parameter. If you want to do a "like" search, use the asterisk '*', e.g. Accipiter* will give you all records beginning with Accipiter. This wildcard search works for all fields except drop-down lists, such as country or ocean.</div>
+
==Search by fields==
 
[[File:GGBN data portal search form.jpg|center|700px]]
 
[[File:GGBN data portal search form.jpg|center|700px]]
  
Here you can choose different parameters to filter your results. The upper part contains parameters often used by researchers and curators. In addition, you can add further parameters (click on "add search field"). We distinguish between GGBN repositories (DNA and tissue banks) and voucher collections. The latter can also be non-GGBN institutions.
+
Here you can choose different parameters to filter your results using the facets on the left. Different filters/facets are combined with the AND operator, e.g. materialType=DNA&country=Belgium will search for DNA samples collected in Belgium. Within each facet the OR operator is used, e.g. materialType=DNA&materialType=tissue&country=Belgium will search for DNA OR tissue samples collected in Belgium. This can be extended to as many facets/filters as you like.
{|
+
 
|[[File:GGBN data portal add search fields.jpg|thumb|200px|select additional search parameters]]
+
[[File:Search expand sort.PNG|center|700px]]
|[[File:GGBN data portal add search fields step2.jpg|thumb|400px|the field will appear (here Ocean) immediately. It can be deleted by clicking on the red cross]]
+
In addition, you can sort the data and preview what kind of material is available by expanding the rows.
|}
+
 
  
 
Most of the fields are drop down lists or include suggestion lists to help you. E.g. when typing a name the portal searches for all synonyms and accepted names matching your search term and provides a suggestion list with detailed information about the name found in the GGBN backbone.   
 
Most of the fields are drop down lists or include suggestion lists to help you. E.g. when typing a name the portal searches for all synonyms and accepted names matching your search term and provides a suggestion list with detailed information about the name found in the GGBN backbone.   
[[File:GGBN data portal suggestion list.jpg|center|500px]]
+
[[File:GGBN data portal suggestion list.jpg|center]]
  
 
<div id="wikinote">You can search for any scientific name using "Scientific Name", including higher taxa.</div>
 
<div id="wikinote">You can search for any scientific name using "Scientific Name", including higher taxa.</div>
 
==Edit your search==
 
Your results are displayed in a hitlist. You can change the filters at any time. Just select further parameters from "add search field" or delete some using the red cross. To see the new results click on "Refine search". You can also change the order of the columns by clicking on the little arrows. To see the details of a record click on the blue scientific name.
 
[[File:GGBN data portal hitlist.jpg|center|500px]]
 
  
 
==Record detail==
 
==Record detail==
The record details page aggregates data from multiple sources. Here you see an example with DNA sample, Tissue sample and Specimen. These data are coming from up to three different datasources, depending on where the samples and data are deposited. On top you find information about loaning availabilities and conditions. Furthermore it is checked whether the taxon is listed on CITES. Left to the map you find collecting information and determination details. In the lower part you see different blue tabs with information about the physical samples and where to find them.
+
[[File:Material_entities.jpg|thumb|400px|This is an example of 6 material entities that are aggregated as three GGBN records (green [1], red [2] and blue [3]) through their relationships. Together they represent one occurrence. In this example there would be three record pages. Please check out our [[Definition of GGBN Terms]] for further explanation]]
 
+
The record details page aggregates data from multiple sources. Please have a look at our [[Definition_of_GGBN_Terms | definitions]] of the differences between GGBN records, material entities and occurrences. Here you see an example with a DNA sample, a tissue sample and a specimen. This is indicated both in the top blue bar (individual tabs for each material entity) and above the title. The data come from up to three different data sources, depending on where the samples and data are deposited. Below the map you find information about related records (e.g. another tissue derived from the same specimen) as well as information about loan availability and conditions. Furthermore, the portal checks whether the taxon is listed on CITES. To the left of the map you find collecting information and determination details. In the lower part you'll find information about the physical samples and where to find them. If you click on the Institution Full Name you'll see the GGBN members page for this institution.
On top right information about the taxon are retrieved live from external sources, such as GBIF, NCBI, BOLD and EOL. In addition you see how many further samples for this taxon can be found at GGBN.
 
 
 
On the left you find information on samples at GGBN that are from same population or same individual as this one.
 
  
In case sequences, publications or multimedia items are provided, further tabs will appear.
+
In case sequences or multimedia items are associated with these materials, further tabs will appear. At the very bottom you'll find information about the dataset.
 
[[File:GGBN data portal add record detail.jpg|center|700px]]
 
[[File:GGBN data portal add record detail.jpg|center|700px]]
 
==Preorder samples/Login feature==
 
[[File:GGBN data portal login.jpg|thumb|200px|login feature]]
 
To preorder samples or subscribe to searches you must register as a user. To do so click on "log in" or the little human in the menu. We appreciate if you fill out the complete contact information, since these data can then be forwarded to the sample holding institution, but this is not mandatory.
 
 
<div id="wikinote">Your orders will be forwarded to the respective institution holding the requested samples. Please check our [[Data_Privacy | Data Privacy Statement]] for more information about storage of user data. If you don't want to register as a user you can also send us an email at info@ggbn.org.</div>
 
[[File:GGBN data portal menu login.jpg|thumb|right|100px]]
 
After login a menu will appear under the human icon.
 
 
'''Profile''' Change your personal information here.
 
 
'''Settings''' Personal settings for the hitlist can be defined here.
 
 
'''Subscription, Save Searches''' When logged in, the hitlist shows an additional column to add samples to the cart. If a sample is not available for loan for some reason, an 'x' is shown instead. At the top right, buttons appear to subscribe to this search (and be informed via email when new records are available), to save this search, or to add selected samples to the cart. You can also add a sample to the cart via the details page.
 
[[File:GGBN data portal hitlist logged in.jpg|center|700px]]
 
 
'''Shopping Cart''' If you have added samples to your cart you can go to "View cart" or "Shopping Cart" via the menu or the buttons on the right. In step 1 you will see an overview of the requested samples and, for CITES taxa, the CITES note again. Please make sure you belong to an institution registered with CITES, otherwise you cannot borrow such samples. Go to "Checkout" to proceed.
 
[[File:GGBN data portal shopping cart step1.jpg|center|700px]]
 
 
In step 2 the samples are grouped by holding institution. In this example we preorder at two different institutions. You can add a comment to them if you want. When clicking "Order now" your preorder is placed. GGBN forwards your request to the holding institutions. We do not forward your complete order, but only sample information relevant for the sample holding collection.
 
[[File:GGBN data portal shopping cart step2.jpg|center|700px]]
 
<div id="wikinote">'''Note:''' You can only preorder samples. It might be that the samples cannot be loaned to you for some reason. The curator will contact you and provide details about the further procedure. Every GGBN partner is responsible for its samples and procedures. Some partners may require a service charge. In any case you have to sign a Material Transfer Agreement before samples can be loaned. The curator will provide you with more details about it.</div>
 

Latest revision as of 14:00, 29 January 2025

Background

We use the Berlin Harvesting and Indexing Toolkit (B-HIT) to harvest GGBN provider data. The records (or units) can be harvested from providers having either a BioCASe or an IPT installation. For BioCASe providers, the schemata ABCD 2.06, ABCD 2.1, ABCDDNA, ABCDGGBN and ABCDEFG are supported (single records or ABCD Archives). For IPT providers, DarwinCore Archives are supported, including the GGBN extensions. Elements that are indexed are listed at http://wiki.bgbm.org/bhit/index.php/Indexed_fields.

You can either use the full text search on the landing page or select "Search" from the menu.

Data Quality and Data Cleaning

During harvesting, GGBN provider data are checked and cleaned if necessary. We keep the original provider data in addition to the cleaned versions. Data quality tests are done using B-HIT. Country names are translated into English, ISO codes are compared to the country names, and coordinates are validated and checked against both ISO code and country name. In case of incomplete data, the tool looks into the named areas and localities and tries to extract information about the country or the water body.
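The checks described above can be illustrated with a minimal Python sketch. It is not the B-HIT implementation: the lookup tables, the rough bounding boxes and the names `check_record` and `Check` are assumptions made only for this example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical lookup tables; a real harvester would use a complete gazetteer.
COUNTRY_TO_ISO = {
    "belgium": "BE", "belgien": "BE", "belgique": "BE",
    "germany": "DE", "deutschland": "DE",
}
ISO_TO_ENGLISH = {"BE": "Belgium", "DE": "Germany"}
# Rough bounding boxes (min_lon, min_lat, max_lon, max_lat), illustrative only.
ISO_TO_BBOX = {"BE": (2.5, 49.4, 6.5, 51.6), "DE": (5.8, 47.2, 15.1, 55.1)}


@dataclass
class Check:
    country_en: Optional[str]          # country name translated into English
    iso_matches_name: Optional[bool]   # None if either value is missing/unknown
    coords_valid: bool                 # latitude/longitude within legal ranges
    coords_in_country: Optional[bool]  # None if it cannot be checked


def _inside(bbox, lat, lon):
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def check_record(country, iso, lat, lon) -> Check:
    # Translate the provider's country name via its ISO code.
    name_iso = COUNTRY_TO_ISO.get(country.strip().lower()) if country else None
    country_en = ISO_TO_ENGLISH.get(name_iso) if name_iso else None

    # Compare the provided ISO code with the one derived from the name.
    iso_norm = iso.strip().upper() if iso else None
    iso_matches = (name_iso == iso_norm) if name_iso and iso_norm else None

    # Validate the coordinate ranges.
    coords_valid = (
        lat is not None and lon is not None
        and -90 <= lat <= 90 and -180 <= lon <= 180
    )

    # Check the coordinates against both the ISO code and the country name.
    codes = [c for c in (iso_norm, name_iso) if c in ISO_TO_BBOX]
    in_country = None
    if coords_valid and codes:
        in_country = all(_inside(ISO_TO_BBOX[c], lat, lon) for c in codes)
    return Check(country_en, iso_matches, coords_valid, in_country)
```

For example, `check_record("Belgien", "be", 50.85, 4.35)` translates the name to "Belgium" and passes all checks, whereas the same coordinates with ISO code "DE" are flagged as inconsistent.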

Scientific names are parsed using the GBIF Name Parser (http://www.gbif.org/developer/species#parser) and customized regular expressions.
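As a rough illustration of what a customized regular expression for scientific names might look like, here is a minimal Python sketch. The pattern and the function `parse_name` are assumptions for this example only; they are neither the expressions used in B-HIT nor a substitute for the GBIF Name Parser, and they cover only simple uninomials, binomials and a few infraspecific ranks.

```python
import re
from typing import Dict, Optional

# Hypothetical pattern: genus, optional species epithet, optional
# infraspecific epithet (subsp./var./f.), optional authorship.
NAME_RE = re.compile(
    r"^(?P<genus>[A-Z][a-z-]+)"
    r"(?:\s+(?P<species>[a-z][a-z-]+))?"
    r"(?:\s+(?:subsp\.|var\.|f\.)\s+(?P<infraspecies>[a-z][a-z-]+))?"
    r"(?:\s+(?P<authorship>.+))?$"
)


def parse_name(name: str) -> Optional[Dict[str, Optional[str]]]:
    """Split a scientific name into its parts, or return None if it does not fit."""
    normalized = " ".join(name.split())  # collapse repeated whitespace
    match = NAME_RE.match(normalized)
    return match.groupdict() if match else None
```

With this sketch, `parse_name("Accipiter nisus (Linnaeus, 1758)")` yields the genus "Accipiter", the species epithet "nisus" and the authorship "(Linnaeus, 1758)". Names that do not start with a capitalized genus are rejected.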

Taxonomic Backbone

After harvesting, the scientific names are matched against certain checklists of the GBIF checklist bank. Higher taxa, synonyms and accepted taxa are also retrieved using the GBIF checklist bank web service (http://api.gbif.org/v1/species). These checklists include Prokaryotic Nomenclature Up-to-Date (PNU), Catalogue of Life, NCBI and the GBIF backbone itself.
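A name lookup against the GBIF species web service could look roughly like the following Python sketch. It uses the service's `match` resource, which matches against the GBIF backbone only; how B-HIT queries the other checklists is not described here, so this is an illustration rather than the actual harvesting code. The function names are made up for this example, and the HTTPS form of the URL is an assumption.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of the GBIF species web service (HTTPS variant assumed).
GBIF_SPECIES_API = "https://api.gbif.org/v1/species"


def match_url(name: str, strict: bool = False) -> str:
    """Build the URL for a fuzzy name match against the GBIF backbone."""
    query = urlencode({"name": name, "strict": str(strict).lower()})
    return f"{GBIF_SPECIES_API}/match?{query}"


def match_name(name: str, timeout: float = 10.0) -> dict:
    """Send the request and return the decoded JSON response (needs network access)."""
    with urlopen(match_url(name), timeout=timeout) as response:
        return json.load(response)
```

Calling `match_name("Accipiter nisus")` should return a JSON object describing the matched backbone taxon, typically including its status and higher classification.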

Search

http://www.ggbn.org/ggbn_portal/search/index

The Data Portal is based on SOLR, which provides powerful full text search. We have implemented this feature in all search fields, apart from select lists, checkboxes and radio buttons. The search is case-insensitive, so e.g. “black sea”, “Black Sea” or “Black sea” will all work.

Search by fields

GGBN data portal search form.jpg

Here you can choose different parameters to filter your results using the facets on the left. Different filters/facets are combined with the AND operator, e.g. materialType=DNA&country=Belgium will search for DNA samples collected in Belgium. Within each facet the OR operator is used, e.g. materialType=DNA&materialType=tissue&country=Belgium will search for DNA OR tissue samples collected in Belgium. This can be extended to as many facets/filters as you like.
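The AND/OR logic of the facets can be sketched in a few lines of Python. This only illustrates the semantics on an in-memory list; the portal itself evaluates the filters in SOLR, and the sample records and the function names `matches` and `filter_records` are invented for this example.

```python
from urllib.parse import parse_qs

# Invented sample records, for illustration only.
RECORDS = [
    {"id": 1, "materialType": "DNA", "country": "Belgium"},
    {"id": 2, "materialType": "tissue", "country": "Belgium"},
    {"id": 3, "materialType": "DNA", "country": "Germany"},
    {"id": 4, "materialType": "specimen", "country": "Belgium"},
]


def matches(record: dict, facets: dict) -> bool:
    # AND across facets: every facet must be satisfied.
    # OR within a facet: any one of its values is enough.
    return all(record.get(field) in allowed for field, allowed in facets.items())


def filter_records(records: list, query: str) -> list:
    # parse_qs groups repeated keys, e.g.
    # {"materialType": ["DNA", "tissue"], "country": ["Belgium"]}
    facets = parse_qs(query)
    return [record for record in records if matches(record, facets)]
```

With the sample data, the query `materialType=DNA&country=Belgium` keeps only record 1, while `materialType=DNA&materialType=tissue&country=Belgium` keeps records 1 and 2.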

Search expand sort.PNG

In addition, you can sort the data and preview what kind of material is available by expanding the rows.


Most of the fields are drop-down lists or include suggestion lists to help you. For example, when you type a name, the portal searches for all synonyms and accepted names matching your search term and provides a suggestion list with detailed information about the name found in the GGBN backbone.

GGBN data portal suggestion list.jpg
You can search for any scientific name using "Scientific Name", including higher taxa.

Record detail

This is an example of 6 material entities that are aggregated as three GGBN records (green [1], red [2] and blue [3]) through their relationships. Together they represent one occurrence. In this example there would be three record pages. Please check out our Definition of GGBN Terms for further explanation.

The record details page aggregates data from multiple sources. Please have a look at our definitions of the differences between GGBN records, material entities and occurrences. Here you see an example with a DNA sample, a tissue sample and a specimen. This is indicated both in the top blue bar (individual tabs for each material entity) and above the title. The data come from up to three different data sources, depending on where the samples and data are deposited. Below the map you find information about related records (e.g. another tissue derived from the same specimen) as well as information about loan availability and conditions. Furthermore, the portal checks whether the taxon is listed on CITES. To the left of the map you find collecting information and determination details. In the lower part you'll find information about the physical samples and where to find them. If you click on the Institution Full Name you'll see the GGBN members page for this institution.

In case sequences or multimedia items are associated with these materials, further tabs will appear. At the very bottom you'll find information about the dataset.

GGBN data portal add record detail.jpg