
Jan 13, 2017 - ACS AuthorChoice - This is an open access article published under a Creative Commons Attribution (CC-BY) License, which permits ...
Article pubs.acs.org/crt

Crowd-Sourced Verification of Computational Methods and Data in Systems Toxicology: A Case Study with a Heat-Not-Burn Candidate Modified Risk Tobacco Product

Carine Poussin,*,† Vincenzo Belcastro,† Florian Martin, Stéphanie Boué, Manuel C. Peitsch, and Julia Hoeng

PMI R&D, Philip Morris Products S.A., Quai Jeanrenaud 5, 2000 Neuchâtel, Switzerland (Part of Philip Morris International group of companies)

Supporting Information

ABSTRACT: Systems toxicology aims to quantify the effects of toxic molecules on biological systems and to unravel their mechanisms of toxicity. Advanced computational methods are required to analyze and integrate the high-throughput data generated for this purpose and to extrapolate predictive toxicological outcomes and risk estimates. To ensure the performance and reliability of these methods and to verify the conclusions drawn from systems toxicology data analysis, it is important to conduct unbiased evaluations by independent third parties. As a case study, we report here the results of an independent verification of methods and data in systems toxicology by crowdsourcing. The sbv IMPROVER systems toxicology computational challenge aimed to evaluate computational methods for developing blood-based gene expression signature classification models able to predict smoking exposure status. Participants created and trained models on blood gene expression data sets that included smokers/mice exposed to 3R4F (a reference cigarette) or noncurrent smokers/Sham (mice exposed to air). Participants then applied their models to unseen data to predict whether subjects classified closer to the smoke-exposed or the nonsmoke-exposed group. The data sets also included data from subjects who had been exposed to potential modified risk tobacco products (MRTPs) or who had switched to an MRTP after exposure to conventional cigarette smoke. The anonymized participants' predictions were scored using predefined metrics. The methods of the top 3 performers predicted class labels with area under the precision-recall curve (AUPR) scores above 0.9. Furthermore, although various computational approaches were used, the crowd's results confirmed our own data analysis outcomes with regard to the classification of MRTP-related samples. Mice exposed directly to an MRTP were classified closer to the Sham group. After switching to an MRTP, the confidence that subjects belonged to the smoke-exposed group decreased significantly. The smoking exposure gene signatures that contributed to the group separation included a core set of genes, such as AHRR, LRRN3, SASH1, and P2RY6, that was highly consistent across teams. In conclusion, crowdsourcing constitutes a pertinent approach, complementary to the classical peer review process, for verifying computational methods and data for risk assessment using systems toxicology independently and without bias.
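The classification task summarized above (train on labeled blood profiles, then give each unseen sample a confidence of belonging to the smoke-exposed group) can be illustrated with a toy nearest-centroid model. This is a hypothetical sketch, not the method of any participating team or of the authors: the function names, the Euclidean distance, the logistic mapping from distances to a confidence, and the two-gene toy profiles are all assumptions made for illustration.

```python
import math
from statistics import mean


def centroid(profiles):
    """Per-gene mean expression across equal-length profiles (one list per sample)."""
    return [mean(values) for values in zip(*profiles)]


def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def logistic(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train(exposed, nonexposed):
    """Hypothetical nearest-centroid model: one centroid per class."""
    return centroid(exposed), centroid(nonexposed)


def exposure_confidence(model, profile):
    """Confidence in [0, 1] that `profile` lies closer to the exposed centroid.

    Equals 0.5 when the profile is equidistant from both centroids.
    """
    exposed_centroid, nonexposed_centroid = model
    d_exposed = euclidean(profile, exposed_centroid)
    d_nonexposed = euclidean(profile, nonexposed_centroid)
    return logistic(d_nonexposed - d_exposed)


# Toy training data: two genes per sample (values are illustrative only).
model = train(exposed=[[8.0, 2.0], [9.0, 3.0]], nonexposed=[[4.0, 6.0], [5.0, 7.0]])
print(round(exposure_confidence(model, [8.2, 2.4]), 3))
```

A confidence above 0.5 means the sample lies closer to the exposed centroid than to the nonexposed one; the challenge models worked on genome-wide expression data rather than two toy genes.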



© 2017 American Chemical Society
Special Issue: Systems Toxicology II. Received: September 26, 2016. Published: January 13, 2017.
DOI: 10.1021/acs.chemrestox.6b00345 Chem. Res. Toxicol. 2017, 30, 934−945

INTRODUCTION

Systems biology aims to create a detailed understanding of the mechanisms by which biological systems respond or adapt (e.g., signaling pathways, transcriptional changes, and genetic modifications) to external stimuli (e.g., drugs, nutrition, chemicals, and temperature). New mechanistic insights are gained through the analysis and integration of large amounts of molecular and functional data generated using cutting-edge technologies such as omics or high-content screening. When applied in the field of toxicology, the overall approach, termed systems toxicology, enables one to quantify biological system perturbations triggered by xenobiotics (e.g., pesticides and chemicals), elucidate toxicity modes of action, and evaluate associated risks.1 Systems toxicology has the potential to extrapolate short-term observations to long-term outcomes and to translate the potential risks identified in experimental systems to humans, suggesting that its application could become a new standard for risk assessment and decision making.1−3 The analysis of systems toxicology data, as well as extrapolation and translation for predictive toxicological outcomes and risk estimates, requires the development of advanced computational methodologies. To demonstrate the improved performance and reliability of new computational approaches, researchers usually benchmark their own algorithms against state-of-the-art methods but often fall into what is called the "self-assessment trap", resulting in biased evaluations.4 Furthermore, the deluge of data generated and analyzed in systems biology/toxicology makes reviewing published results and conclusions tedious for referees. Although reviewers can in principle access raw data stored in public repositories, they cannot easily reproduce an entire analysis themselves. Therefore, there is a clear need for independent and objective evaluation or verification of methods and data. By leveraging the wisdom of the crowd, the systems biology verification combined with Industrial Methodology for PROcess VERification in Research (sbv IMPROVER, http://sbvimprover.com) project aims to fulfill this mission and complement the classical peer review process.5,6

One way to proceed with the verification of computational methods and data is to design a challenge and open it to the scientific community. To ensure broad participation, the challenge should address scientific problems of common interest. The challenge is posed by defining clear objectives and rules and by providing data to the participants. Part of the data to be predicted, called the "Gold Standard" (true values), is kept hidden and is used, in combination with predefined scoring metrics, to assess the performance of the anonymized participants' prediction submissions. Scoring results and team rankings are submitted to an external and independent Scoring Review Panel of experts for review and final approval. The results, conclusions, and lessons learned from the challenge are shared with participants and with the scientific community through conference and symposium presentations and in peer-reviewed publications (Figure 1). Over the past six years, the sbv IMPROVER project has proposed challenges covering various scientific questions in systems biology.7−10
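Scoring anonymized submissions against the hidden Gold Standard can be sketched as below. The abstract reports area under the precision-recall curve (AUPR) scores; here AUPR is estimated as average precision, which is one common estimator and an assumption on our part, since the challenge's exact scoring implementation (including tie handling) is not described in this section. The team identifiers, labels, and confidences are hypothetical.

```python
def aupr(gold, confidences):
    """Estimate the area under the precision-recall curve as average precision.

    gold: hidden Gold Standard labels (1 = exposed class, 0 = other class).
    confidences: a submission's confidence values for the exposed class.
    Samples with equal confidence keep their input order (a simplification).
    """
    if len(gold) != len(confidences):
        raise ValueError("gold standard and submission differ in length")
    n_positive = sum(gold)
    if n_positive == 0:
        raise ValueError("gold standard contains no positive samples")
    ranked = sorted(zip(confidences, gold), key=lambda pair: pair[0], reverse=True)
    true_positives = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            precision_sum += true_positives / rank  # precision at this recall step
    return precision_sum / n_positive


def rank_teams(gold, submissions):
    """Score each anonymized submission and return (team, score) pairs, best first."""
    scores = {team: aupr(gold, conf) for team, conf in submissions.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


gold_standard = [1, 1, 0, 0]
submissions = {
    "team_A": [0.9, 0.8, 0.3, 0.1],  # ranks both exposed samples first
    "team_B": [0.9, 0.2, 0.8, 0.1],  # ranks one nonexposed sample second
}
for team, score in rank_teams(gold_standard, submissions):
    print(team, round(score, 3))
```

Average precision rewards submissions that rank exposed samples ahead of nonexposed ones; a random ranking scores roughly the fraction of exposed samples in the Gold Standard.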

The latest sbv IMPROVER computational challenge, titled "Systems Toxicology challenge", was conducted in 2015/2016 and aimed to identify markers of chemical exposure response in blood. During their lifetimes, humans are exposed to a multitude of xenobiotic chemicals that can trigger molecular changes and, when the chemicals are toxic, increase the risk of disease development. Therefore, identifying biological markers whose molecular-level changes reflect the response to a specific chemical exposure is important for applications in toxicological diagnosis and risk assessment. For example, Labreche et al. identified dose-specific peripheral blood-based gene signatures for lead exposure.11 Robust marker-based classification models applied to a new blood sample could predict whether or not a subject had been exposed to a chemical substance, as well as monitor the magnitude of the exposure response over time during product testing or withdrawal. Blood is a complex tissue to analyze because of the many different cell subpopulations it contains. However, it is a highly relevant tissue for marker identification because it circulates through all the organs that are more directly exposed to toxicants, and it is easily accessible. Most preclinical toxicological in vivo studies are conducted in rodents, which adds a degree of complexity when the results are applied to humans. For reliable translational biology/toxicology, it is important to determine the biological mechanisms and markers that are systematically perturbed by nontoxic/toxic compounds across species. In addition to the Species Translation challenge, which covered some aspects of this problem,7,8 the current challenge also addressed this question in the context of identifying blood response markers across human and rodent species (Figure 2A).

Cigarette smoking is a leading cause of preventable death.
It causes serious diseases, including lung cancer, chronic obstructive pulmonary disease, and cardiovascular diseases.12 Cigarette smoke (CS) constituents that pass the lung barrier into the bloodstream can elicit, for example, changes in gene expression in circulating peripheral blood cells that are associated with systemic immune and inflammatory disorders.13,14 Smoking cessation has been shown to revert some CS-induced functional and molecular changes back to normal or intermediate levels, depending on the subject's smoking history and cessation period.15−17 Providing reduced-risk alternatives to adult smokers who would otherwise continue to smoke cigarettes is the basis of "Tobacco Harm Reduction".18 Philip Morris International (PMI) is developing a number of noncombustible tobacco or nicotine-containing products, including heated tobacco products and e-cigarettes, that have the potential to reduce individual risk and population harm compared with smoking cigarettes. This new generation of products is designated as reduced risk products or, following US FDA terminology, modified risk tobacco products (MRTPs).18 Rigorous scientific assessment is necessary to evaluate their impact on biology. Heating rather than burning tobacco has been shown to markedly decrease the levels of harmful constituents in the aerosol, which may lead to reduced exposure and toxicity.19,20 The systems toxicology strategy may allow us to determine whether reduced exposure leads to reduced toxicity in laboratory models and, eventually, to reduced risk in those models.1,21 The results provide an early assessment of the potential of MRTPs to reduce the risk of smoking-related diseases compared with continued smoking. Recent investigations of the effects of heat-not-burn technology-based MRTPs have shown a significantly reduced impact on disease-relevant biological processes/mechanisms (e.g., inflammation, oxidative stress, xenobiotic metabolism, monocyte-endothelial cell adhesion and transmigration, and plaque growth) in various biological systems (e.g., lung, 3D bronchial/nasal/gingival organotypic cultures, and vascular systems) compared with the effects of a reference cigarette.22−25

The systems toxicology challenge aims to benchmark computational methods for the identification of human and species-independent blood exposure response markers and of models predictive of smoking and cessation status. Previous work published by us and others reported the development of whole blood-based gene expression signature classification models that could predict smoking exposure status.26,27 For the challenge, the whole-genome Affymetrix array gene expression data used in Martin et al.27 were provided to participants. These data, obtained from smokers or rodents exposed to a reference cigarette, from never smokers or rodents exposed to fresh air (Sham), and from former smokers or rodents exposed to a reference cigarette for a period of time and then to fresh air only, as well as additional data sets, were provided to participants (Figure 2). For model development and algorithm training, participants used gene expression data and class labels, while for testing, they received gene expression data only and were asked to predict confidence values that a sample belonged to one class (e.g., exposed to smoke) or the other (not exposed to smoke, which for the human data set included former and never smokers, i.e., noncurrent smokers). Of note, the data provided for training and testing originated from independent studies with different inclusion criteria. In addition to the conventional cigarette or cessation groups, the experimental study designs included exposure groups corresponding to (i) switching to an MRTP (subjects exposed to CS and then switched to an MRTP) and (ii), for mouse in vivo studies only, continuous exposure to an MRTP, to assess its biological impact. For data verification purposes, challenge participants were provided with gene expression data generated from these experimental groups and were informed of this on the challenge description website (https://sbvimprover.com/challenge-4/the-computational-challenge). Participants were asked to apply their blood-based smoking exposure gene signature classification model(s) to provide confidence values that a sample belonged to the smoking or the noncurrent smoking class. These samples, which constitute the verification data set, were randomized together with samples from the test data set and released to the participants without any distinctive label for class prediction (Figure 3). The crowd's results were compared with our own results, obtained by applying our blood-based smoking exposure gene signature classification model to the same verification data sets.28

The current work summarizes the results and lessons learned from this case study of independent and objective verification of computational methods and data in systems toxicology. It also provides perspectives on how such an approach could be used more systematically for the independent verification of risk assessment data and to support regulatory decision making.

Figure 1. sbv IMPROVER for systems toxicology verification. This initiative provides a framework for an independent and objective verification of computational methods, data, and scientific conclusions in systems biology/toxicology and aims to complement classical peer review processes. Organizing scientific challenges open to the scientific community is one way to proceed. In the context of toxicology, such an approach could provide an independent verification of risk assessment data and support regulators in decision making.

Figure 2. Overview of the systems toxicology computational challenge. (A) The challenge aims to identify chemical exposure response markers from human and mouse whole blood gene expression data and to leverage these markers as a signature in computational models that classify new blood samples into the exposed or nonexposed groups. (B) Data were obtained from blood samples collected in independent clinical and in vivo studies related to cigarette smoke (CS) exposure and cessation in humans and rodents. The experimental groups also included individuals who were exposed to RRPs, also termed modified risk tobacco products (MRTPs), or who switched to an MRTP after being exposed to CS for a period of time. Participants were asked to develop models to predict the smoking exposure (smoker (S) vs noncurrent smoker (NCS)) and cessation (former smoker (FS) vs never smoker (NS)) status of a subject based on his/her gene expression profile generated from a blood sample. #See the disclaimer related to RRPs in the Notes section.

Figure 3. Release of the training, test, and verification blood gene expression data sets. After blood sample processing and gene expression data generation, the data from independent studies were divided into training, test, and verification sets. Data and class labels from the training data set were provided for the development and training of the blood-based gene signature classification models. Trained models were applied blindly to the randomized test and verification gene expression data sets for class prediction of the blood samples.

MATERIALS AND METHODS

Study Population and Designs. Whole blood samples were collected during clinical and in vivo studies or purchased from a biobank repository. The studies are described in detail in the Supporting Information. The sample groups/classes, sizes, and characteristics for the different studies are summarized in Table S1. Briefly, human blood samples were obtained from (i) a clinical case-control study conducted at the Queen Ann Street Medical Center (QASMC), London, UK, and registered at ClinicalTrials.gov with the identifier NCT01780298;27,29 (ii) a biobank repository (BioServe Biotechnologies Ltd., Beltsville, MD, USA) (data set BLD-SMK-01),27 with the samples from both of these sources including smokers (S), former smokers (FS), and never smokers (NS) selected on well-defined inclusion criteria; and (iii) the clinical ZRHR-reduced exposure (REX) C-03-EU and -04-JP studies, which were randomized, controlled, open-label, 3-arm parallel group, single-center studies.30 The REX studies aimed to demonstrate reductions in exposure to selected smoke constituents in healthy smokers switching to the tobacco heating system (THS) 2.2 (Switch) or to smoking abstinence/cessation (Cess), compared with those continuing to smoke cigarettes (smokers), over 5 days in confinement. The studies were conducted in Europe and Japan and registered at ClinicalTrials.gov with the identifiers NCT01959932 and NCT01970982, respectively. Mouse blood samples were obtained from two independent inhalation studies conducted with female C57BL/6 and Apoe−/− mice for 7 and 8 months, respectively.17,23 The studies included mice randomized into five groups: Sham (exposed to air), 3R4F (exposed to CS from the reference cigarette 3R4F), MRTPs (exposed to mainstream aerosol from THS2.2 or a pMRTP (prototypic product) at nicotine levels matched to those of 3R4F), smoking cessation (Cess), and switching to an MRTP (THS2.2 or a pMRTP) after a 2-month exposure to 3R4F (Switch). Blood samples were collected at different time points. Both potential MRTPs are based on heat-not-burn tobacco technologies: the tobacco heating system (THS) 2.2 uses an electrical heating system to heat tobacco,31 whereas the pMRTP uses a fast-lighting carbon tip as the heat source.17

Blood Transcriptomics Data Sets. Transcriptomics data sets were generated from whole blood samples. Data generation and processing are summarized below (more details are in the Supporting Information).

Data Generation from Human and Mouse Blood Samples. Briefly, total RNA was isolated from the samples in the PAXgene tubes according to the manufacturer's instructions (Qiagen). The quality of the extracted RNA, and of the cDNA after target preparation with the Ovation Whole Blood Reagent and Ovation RNA Amplification System V2 (NuGEN, AC Leek, The Netherlands) and fragmentation, was checked using an Agilent 2100 Bioanalyzer (Santa Clara, CA, USA). After fragmentation and labeling, the cDNA fragments were hybridized on a GeneChip Human Genome U133 Plus 2.0 Array or a GeneChip Mouse Genome 430 2.0 Array


Figure 4. PMI human blood sample class predictions for the test and verification data sets. Trained blood-based smoking exposure gene signature models that discriminate between smokers (S) and noncurrent smokers (NCS), including former smokers (FS) and never smokers (NS), were applied to the test (H1) and verification (H3 and H4) gene expression data sets for class prediction. In the clinical ZRHR-reduced exposure (REX) C-03-EU (H3) and -04-JP (H4) studies, smokers continued smoking cigarettes (S), abstained from smoking (Cess), or switched to THS2.2 (Switch) for 5 days in confinement. The probability P that a sample belonged to the smoker group and the probability 1 − P that it belonged to the NCS group were computed and transformed to log odds, log[P/(1 − P)]. (A) Log odds distribution per class/group, displayed as boxplots; Welch's t test p-values are shown.
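The log-odds transform and group comparison in the Figure 4 caption can be sketched as follows. The natural logarithm, the clipping of P away from 0 and 1 (to keep the log odds finite), and the toy probabilities are assumptions, since the caption does not specify them. The sketch computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom with the standard library only; a p-value additionally requires the t distribution, for example via scipy.stats.ttest_ind(a, b, equal_var=False).

```python
import math
from statistics import mean, variance


def log_odds(p, eps=1e-6):
    """Log odds log(P / (1 - P)), with P clipped to [eps, 1 - eps] (an assumption)."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for two groups."""
    se2_a = variance(a) / len(a)  # squared standard error of the mean, group a
    se2_b = variance(b) / len(b)  # squared standard error of the mean, group b
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df


# Hypothetical smoker-class probabilities for two groups (illustrative values only).
smokers = [log_odds(p) for p in (0.95, 0.90, 0.97, 0.88)]
switchers = [log_odds(p) for p in (0.40, 0.55, 0.30, 0.60)]
t, df = welch_t(smokers, switchers)
print(f"t = {t:.2f}, df = {df:.1f}")
```

Positive log odds favor the smoker class and negative values favor the NCS class; Welch's test is used because it does not assume equal variances in the two groups.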