Experimental Reproducibility, Standardization and FAIR Scientific Data

Speaker:  Syed Ahmad Chan Bukhari – CT, United States
Topic(s):  Information Systems, Search, Information Retrieval, Database Systems, Data Mining, Data Science


B and T cells form the two pillars of the adaptive immune system, and both express antigen-specific receptors at their surface: B cell receptors (BCRs) and T cell receptors (TCRs), respectively. The use of high-throughput sequencing (HTS) to profile immunoglobulin (B-cell receptor, antibody) and T-cell receptor repertoires has increased dramatically since the technique was introduced in 2009. Adaptive immune receptor repertoire sequencing (AIRR-seq) holds enormous promise for understanding the dynamics of the immune repertoire in vaccinology, infectious disease, autoimmunity, and cancer biology, but it also poses substantial challenges. AIRR-seq studies are associated with complex metadata, such as donor phenotypes, cell types, and the nucleic acid material used, to name a few. These metadata are crucial for ensuring reproducibility and for facilitating secondary analyses and meta-analyses. My talk will be divided into three parts: i) experimental reproducibility, ii) scientific standardization, and iii) FAIR metadata authoring and submission. In the first part, I will present previous studies and surveys on experimental reproducibility. As ever more data are generated, the lack of experimental reproducibility has become a crisis in the physical and biological sciences, with a variety of underlying causes. In the second part, I will give an overview of existing efforts to combat the reproducibility problem and will concentrate on our community-led effort to develop standards for AIRR-seq data. The AIRR community has proposed minimal information about adaptive immune receptor repertoire (MiAIRR), a standard for reporting AIRR-seq studies. The MiAIRR standard has been operationalized using the National Center for Biotechnology Information (NCBI) repositories.
The third part of my talk will address post-standardization issues. Submissions of data to public repositories typically use a combination of web-based and flat-file templates and include only a minimal amount of terminology validation. As a result, studies in public repositories are often described using inconsistent terminology, limiting scientists' ability to find, access, interoperate with, and reuse the data sets. To improve metadata quality, ease the submission of scientific data to public repositories, and improve experimental reproducibility, we have developed an array of tools for FAIR (findable, accessible, interoperable, and reusable) scientific metadata authoring, validation, and submission. The third part of my talk will conclude with short demos of our popular software tools, namely CEDAR Workbench, CAIRR, CEDAR-to-NCBI, and CEDAR OnDemand.
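
The kind of terminology validation that submission templates often lack can be illustrated with a minimal Python sketch. This is not the CEDAR or CAIRR implementation; the field names and allowed terms below are hypothetical placeholders, not actual MiAIRR fields or ontology terms, and serve only to show how free-text values can be checked against a controlled vocabulary before submission.

```python
# Hypothetical controlled vocabulary (an assumption for illustration only);
# real tools would draw allowed terms from curated ontologies.
ALLOWED_TERMS = {
    "cell_type": {"B cell", "T cell"},
    "nucleic_acid": {"DNA", "RNA"},
}

# Hypothetical list of fields that every record must fill in.
REQUIRED_FIELDS = ("donor_id", "cell_type", "nucleic_acid")


def validate_record(record):
    """Return a list of human-readable problems found in one metadata record."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = record.get(field, "")
        # Treat absent or whitespace-only values as missing.
        if not str(value).strip():
            problems.append(f"missing required field: {field}")
            continue
        # Only fields that have a controlled vocabulary are term-checked.
        allowed = ALLOWED_TERMS.get(field)
        if allowed is not None and value not in allowed:
            problems.append(f"{field}: '{value}' is not a controlled term")
    return problems


if __name__ == "__main__":
    record = {"donor_id": "D1", "cell_type": "b-cells", "nucleic_acid": "RNA"}
    for problem in validate_record(record):
        print(problem)
```

A record that uses an inconsistent label such as "b-cells" is flagged rather than silently accepted, which is the gap that flat-file templates with minimal validation leave open.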

About this Lecture

Number of Slides:  45
Duration:  60 minutes
Languages Available:  English
Last Updated: 

Request this Lecture

To request this particular lecture, please complete this online form.

Request a Tour

To request a tour with this speaker, please complete this online form.

All requests will be sent to ACM headquarters for review.