Description and annotation of biomedical experimental data sets: work in progress
Ferguson, Jen
Abstract
OBJECTIVE: To collaborate with researchers and curators from the Harvard School of Public Health Bioinformatics Core (HSPH/HBC) in annotating experimental descriptions and data sets.
METHODS: ISATab is an open-source software suite that can be used to annotate and apply metadata to experimental data. HSPH/HBC curators create ISATab records that tie together information from PubMed papers and their associated data sets (GEO files). Curators annotate and describe both raw and derived data files for each investigation, and supply metadata for the investigation as a whole. Once annotated, the records are validated and sent to an internal data management system.
RESULTS: As of January 2011, HBC has collected over 50 annotated public studies comprising more than 900 assays in its internal data management system. The ultimate goal is to make curated, metadata-enriched data sets openly available in public repositories, allowing for further data analysis and integration.
CONCLUSIONS: Researchers and curators in this group grapple with many of the same data curation and discovery issues that librarians do. For example, how much metadata is adequate to ensure discovery, and where is the sweet spot between too much and too little? Where are ontologies necessary? Do all experiments comprising a published work need to be described, or just a selection? My experience working as a curator with HSPH/HBC has given me useful insights into how librarians can be involved in e‐science in ways that benefit all concerned.