Latest revision as of 13:05, 30 January 2025

Overview

The data architecture of the GGBN Data Portal is based on the GBIF infrastructure (http://www.gbif.org/). The basic principle of both GBIF and GGBN is to record each dataset only once: stored in a single place, it can then serve as a linked reference for different applications. The GGBN Data Portal bridges the gap between sequence portals and GBIF (see Figure). More information about the GGBN Data Portal can be found in Droege et al. 2014, Nucl. Acids Res. (http://nar.oxfordjournals.org/content/42/D1/D607).
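
The record-once principle can be illustrated with a minimal Python sketch: a DNA sample stores only a key pointing at its voucher specimen, so the specimen data exist in exactly one place and every application follows the link to it. The record types, field names and identifiers below are hypothetical and do not reflect GGBN's actual data model.

```python
from dataclasses import dataclass


# Hypothetical record types for illustration; not GGBN's real schema.
@dataclass(frozen=True)
class Specimen:
    institution_code: str
    catalog_number: str
    scientific_name: str


@dataclass(frozen=True)
class DnaSample:
    sample_id: str
    # Only a key to the voucher specimen is stored, never a copy of its data.
    voucher_key: tuple[str, str]


def specimen_key(specimen: Specimen) -> tuple[str, str]:
    """Identify a specimen by institution code and catalogue number."""
    return (specimen.institution_code, specimen.catalog_number)


def resolve_voucher(sample: DnaSample, registry: dict[tuple[str, str], Specimen]) -> Specimen:
    """Follow the link from a DNA sample to the single stored specimen record."""
    return registry[sample.voucher_key]


specimen = Specimen("EXAMPLE-INST", "CAT-0001", "Exemplaria alpha")
registry = {specimen_key(specimen): specimen}
sample = DnaSample("DNA-0001", specimen_key(specimen))
print(resolve_voucher(sample, registry).scientific_name)  # Exemplaria alpha
```

If the specimen record is corrected, every sample linked to it sees the change, because nothing was duplicated.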

Data flow

Within GGBN, specimen data are harvested through the same data pipelines that GBIF uses.

Because many institutions joined GBIF, each with its own database structure, installing wrappers has become the standard way to combine different sources and integrate their data into networks. Two main wrapper software packages are available: BioCASe and IPT. GGBN has developed the GGBN Data Standard for sharing DNA and tissue data via GGBN; this standard is meant to be used together with ABCD or Darwin Core.
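
The core idea of a wrapper is a per-database mapping from local field names to the terms of a shared standard, so that differently structured sources produce comparable records. The Python sketch below shows only that idea; it is not how BioCASe or IPT are actually configured. The local column names are hypothetical, while the target keys (`catalogNumber`, `scientificName`, `institutionCode`) are Darwin Core term names.

```python
# Hypothetical local column names for two partner databases, each mapped
# onto the same Darwin Core terms.
MAPPINGS: dict[str, dict[str, str]] = {
    "db_a": {"InvNo": "catalogNumber", "Taxon": "scientificName", "Inst": "institutionCode"},
    "db_b": {"specimen_number": "catalogNumber", "species": "scientificName", "museum": "institutionCode"},
}


def to_darwin_core(source: str, row: dict[str, str]) -> dict[str, str]:
    """Rename a local row's fields to Darwin Core terms, dropping unmapped columns."""
    mapping = MAPPINGS[source]
    return {mapping[col]: value for col, value in row.items() if col in mapping}


a = to_darwin_core("db_a", {"InvNo": "123", "Taxon": "Exemplaria alpha", "Inst": "X", "internal_note": "skip"})
b = to_darwin_core("db_b", {"specimen_number": "456", "species": "Exemplaria beta", "museum": "Y"})
print(sorted(a) == sorted(b))  # True: both sources now share one vocabulary
```

Unmapped local columns (here `internal_note`) are simply not published, which mirrors the choice a data provider makes when configuring a wrapper.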

General data architecture of the GGBN Data Portal. Specimen and DNA sample databases (top left) are operated by the network partners. Their data content is structured and provided using BioCASe or IPT and the GGBN Data Standard extensions. The Berlin Harvesting and Indexing Toolkit (documentation: http://wiki.bgbm.org/bhit/index.php/Main_Page) is used to harvest GGBN data and store them in a MariaDB database. After harvesting, the data are cleaned and enriched, e.g. by matching them against selected datasets of the GBIF checklist bank. We also update the CITES status, as this is important for the sample request system. In addition, a SOLR instance is used to speed up queries and to prepare the data (e.g. aggregating associations, backbone core). Finally, the data from multiple sources are aggregated in the Data Portal for display. GGBN offers a web service for basic statistics; its documentation will follow soon.
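
The post-harvest steps in the caption (clean, enrich with a checklist match and a CITES flag, then aggregate records from several sources) can be sketched as a small Python pipeline. This is a hypothetical illustration only: the lookup tables stand in for GBIF checklist bank datasets and real CITES listings, the taxon names are invented, and field names such as `unitID` and `tissueType` are assumptions. The real portal does this with the Berlin Harvesting and Indexing Toolkit, MariaDB and SOLR, not with this code.

```python
# Hypothetical stand-ins for GBIF checklist bank data and CITES listings;
# the taxon names are invented for illustration.
CHECKLIST = {"exemplaria alpha": "Exemplaria alpha L.", "exemplaria beta": "Exemplaria beta Sm."}
CITES_LISTED = {"Exemplaria beta Sm."}


def clean(record: dict) -> dict:
    """Strip surrounding whitespace from every string value."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}


def enrich(record: dict) -> dict:
    """Attach the matched checklist name (None if unmatched) and a CITES flag."""
    accepted = CHECKLIST.get(record["scientificName"].lower())
    return {**record, "acceptedName": accepted, "cites": accepted in CITES_LISTED}


def aggregate(records: list[dict]) -> dict[str, dict]:
    """Merge records sharing a unit identifier, remembering every source."""
    merged: dict[str, dict] = {}
    for rec in records:
        entry = merged.setdefault(rec["unitID"], {"sources": []})
        entry["sources"].append(rec["source"])
        for key, value in rec.items():
            if key != "source":
                entry.setdefault(key, value)  # the first value seen wins
    return merged


harvested = [
    {"source": "partner_A", "unitID": "U1", "scientificName": " Exemplaria beta "},
    {"source": "partner_B", "unitID": "U1", "scientificName": "Exemplaria beta", "tissueType": "leaf"},
    {"source": "partner_A", "unitID": "U2", "scientificName": "Unknownia gamma"},
]
portal = aggregate([enrich(clean(r)) for r in harvested])
print(portal["U1"]["sources"], portal["U1"]["cites"])  # ['partner_A', 'partner_B'] True
```

A name that cannot be matched (here `Unknownia gamma`) keeps `acceptedName` as `None` and is never flagged for CITES, so unmatched records remain visible for curation rather than being silently dropped.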

DNA and Tissue Bank databases

You can use any database system to manage your DNA or tissue bank; IPT and BioCASe can handle most of them. Please check the page "Mandatory and recommended fields for sharing data with GGBN" to see which parameters are required to share data via GGBN.
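
Before publishing, it can help to check each record for missing required fields. The Python sketch below assumes a hypothetical set of three required fields purely for illustration; the authoritative list is the one on the "Mandatory and recommended fields for sharing data with GGBN" page.

```python
# Hypothetical required fields; replace with the official GGBN list.
REQUIRED_FIELDS: frozenset[str] = frozenset({"institutionCode", "catalogNumber", "scientificName"})


def missing_fields(record: dict[str, str], required: frozenset[str] = REQUIRED_FIELDS) -> list[str]:
    """Return required fields that are absent or blank, sorted for stable output."""
    return sorted(f for f in required if not str(record.get(f, "")).strip())


record = {"institutionCode": "X", "catalogNumber": " ", "scientificName": "Exemplaria alpha"}
print(missing_fields(record))  # ['catalogNumber']
```

Treating whitespace-only values as missing catches a common export problem in which empty database cells are written out as blanks.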

The DNA Module has been developed as an open-source solution for administering a DNA and tissue bank, and a new version is currently planned. Some of our partners already use it, but any suitable software can be used.

Data Portal Architecture: Difference between revisions