Use Cases HTS library samples
The development and use of high-throughput next-generation sequencing (HTS) have outstripped the current plans of SYNTHESYS and GGBN to link natural history (NH) collection data with DNA and tissue collection data. An HTS library can be considered a preparation of the genetic material of one or more organisms: the actual physical molecular representation of a collection or specimen. These libraries carry specific adaptors that limit their transferability to other sequencing systems. They are prepared at great expense but are usually used for only a single project, even though a great deal of additional useful information may be available within them. To increase the potential for HTS libraries to be used in multiple projects, they need to be discoverable and their metadata must be made available. Searching for HTS libraries requires metadata that can be queried by specific standardized keywords (e.g. organism, HTS method). The HTS sequence results themselves are available in public repositories (e.g. INSDC’s Sequence Read Archive (SRA) or DRYAD), because researchers must make their sequence results publicly available as a prerequisite for journal publication. However, the accompanying HTS library metadata in these repositories are very limited or, more usually, not available at all. In particular, information on post-sequencing analytical pipeline processes is limited, which prevents accurate and meaningful comparison between studies and hinders their repetition. The range of HTS techniques and the continued development of new ones make it challenging to set up an overarching standard. The aim of this project is to provide the best standards and practices for storing and accessing HTS library metadata. HTS library parameters already in the GGBN Data Standard have been reviewed and augmented to accommodate library metadata of existing as well as future techniques. This is being realized by preparing use cases that cover a range of HTS techniques (e.g. whole genome shotgun sequencing, RADseq, single-molecule MinION sequencing). While some parameters (e.g. permits, sample type) already exist, others are missing. This project is only a first step; a public discussion is needed in the near future. All use cases will be made available through the GGBN portal sandbox (http://sandbox.ggbn.org/ggbn_sandbox) and will be kept stable for the next three years. A prototype of new search functionalities will also be available in the sandbox by the end of June 2017. This project is funded by SYNTHESYS.