Sensor Issues Cite This: ACS Sens. XXXX, XXX, XXX-XXX
pubs.acs.org/acssensors
Medical Sensors for the Diagnosis and Management of Disease: The Physician Perspective
Bradford D. Pendley* and Erno Lindner
Department of Biomedical Engineering, 330 Engineering Technology, University of Memphis, Memphis, Tennessee 38152, United States
ABSTRACT: The objective of this paper is to assist developers of medical sensors to better formulate the clinically relevant design criteria and required performance characteristics of their novel sensor based on an understanding of how these devices will be used by physicians. Sensor technologies play a central role in medicine, and the most critical aspect of the sensor's clinical utility relates to these design decisions. Clinically, sensors are used by health care providers to make both diagnostic and management decisions, and the sensors that aid in these decisions are evaluated by certain clinical, as well as analytical, criteria. Failure to adequately address these end-user requirements can lead to the development of sensors without clinical utility.
KEYWORDS: medical sensors, clinical, diagnostic, management, design
Received: September 1, 2017; Accepted: October 30, 2017
© XXXX American Chemical Society
DOI: 10.1021/acssensors.7b00642 ACS Sens. XXXX, XXX, XXX−XXX
■
THE NATURE OF CLINICAL DECISIONS AND HOW THEY ARE GUIDED BY TEST RESULTS
Like scientists and engineers, physicians are problem solvers. The problems we address relate to a patient's health, and physicians view these problems as falling into one of two categories: diagnostic and management. The diagnosis of a disease is the labeling of the patient's symptoms, physical exam findings, and test results with the name of an established disease. For example, a person who has had a cough productive of sputum for 2 days, a fever of 101 °F (38.3 °C), an elevated white blood cell count in a peripheral blood sample, and an opacity in the left lower lobe of her lung on a chest X-ray could be given a diagnosis of pneumonia. If the diagnosis is correct, it allows the physician to predict and anticipate what might happen next, based on an understanding of the natural history of the disease process. For example, the prognosis for a patient with pneumonia is very different from that for a patient whose cough is due to postnasal drip.
A diagnosis is a working hypothesis of the most probable disease based upon the available evidence. To arrive at a diagnosis, the physician groups a patient's pertinent symptoms with more objective physical findings and test results. These "data" are weighted based on their reliability and their predictive value for that disease. Because a diagnosis is a hypothesis, physicians actually construct a list, called a differential diagnosis, of several diseases that could explain the "data" and then rank them from most to least likely. The most likely disease is then the diagnosis. Once a diagnosis is made, the natural history of the disease is known, which allows predictions to be made about what may occur.
Management decisions are those made by a physician to mitigate the risk of the anticipated future adverse effects of the disease. These decisions are guided by the results of clinical studies that have tested various treatments or interventions to alter the natural progression of the disease. For example, a diagnosis of pneumonia imparts a risk of death to the patient. However, with appropriate treatment and care, the management of a patient with pneumonia can reduce the chance of death and lead to the patient's full recovery from the disease. As the disease progresses, a physician may also collect additional information from the patient, physical exam findings, and ongoing test results to guide treatment. The decisions as to the specific treatment and care are all management decisions.
Both diagnostic and management decisions are guided by information provided by the patient, physical exam findings, and the results of tests. For the purposes of this article, it is assumed that the test results are provided by a sensor designed to measure some analyte of interest that is linked in some pathophysiological manner to the disease. For example, the amperometric measurement of glucose in whole blood is a test that can be used for both the diagnosis and the management of diabetes mellitus, a disease in which the body's ability to regulate blood glucose is impaired. Thus, chemical sensor technologies can play a central role in medicine.
For those of us involved in the design of sensor technologies used in medicine, a few important ideas are central. First, one must understand the analytical requirements for the sensor (analyte, analytical selectivity, dynamic range, etc.) based upon its intended use (e.g., as a sensor used in the diagnosis or management of a disease). Second, and perhaps more critical, one must understand when and how the sensor is used by the physician: a test result is useful only when there is a question as to how a physician should proceed (with a diagnostic or management decision) and the result of the test will increase the likelihood of making a good clinical decision. The subsequent discussion will focus on these two ideas.
■
ASSESSING THE UTILITY OF A DIAGNOSTIC TEST FROM THE END-USER (PHYSICIAN) PERSPECTIVE
One way physicians assess the diagnostic value of a test is by using the test's positive and negative predictive values, both of which are related to the sensor's medical sensitivity and specificity. It is important to note that, in contrast to analytical sensitivity and specificity, which describe a sensor's performance under well-defined conditions, medical sensitivity and specificity measure the performance of a sensor in the diagnosis of a disease. They depend not only on the sensor's performance characteristics but also on how the analyte is distributed among people with and without the disease. The predictive values derived from them additionally depend on the prevalence of the disease (i.e., how common the disease is).1,2 The values of a sensor's medical sensitivity and specificity are determined experimentally from the sensor's ability to correctly classify patients with and without the disease. The positive and negative predictive values can be calculated as shown in Table 1 and eqs 1−4.
Table 1. 2 × 2 Table for the Evaluation of a Diagnostic Sensor (a)
| | test result positive | test result negative |
|---|---|---|
| disease present | true positive | false negative |
| disease absent | false positive | true negative |
(a) The interpretation of each result (i.e., true or false positive or negative) is influenced by the selection of a specific sensor cutoff value for this binary choice. These cutoffs are selected from receiver-operator characteristic (ROC) curves for the sensor, which are used to optimize the medical sensitivity and specificity.3
$$\text{medical sensitivity} = \frac{\text{true positive results}}{\text{true positive} + \text{false negative results}} \tag{1}$$
$$\text{medical specificity} = \frac{\text{true negative results}}{\text{true negative} + \text{false positive results}} \tag{2}$$
$$\text{positive predictive value} = \frac{\text{true positive results}}{\text{total positive test results}} \tag{3}$$
$$\text{negative predictive value} = \frac{\text{true negative results}}{\text{total negative test results}} \tag{4}$$
From a sensor design viewpoint, the sensor detects an analyte that ideally is uniquely related to the disease of interest (i.e., the analyte is present only when the disease is present), and therefore the sensor's analytical sensitivity and selectivity are related to the ability of the sensor to detect the disease. In contrast, the sensor's medical sensitivity and specificity are related both to the sensor's ability to detect the analyte and to the degree to which the analyte is present in individuals with and without the disease. This is a critical point: the utility of a sensor is determined not only by its analytical characteristics but also by how prevalent the disease is (i.e., how often the disease occurs in the people being tested). To illustrate this point, consider a hypothetical sensor for influenza A (i.e., the flu virus) with a medical sensitivity of 95% and a medical specificity of 98%. During the peak of flu season, the prevalence of influenza among people with "flu-like" symptoms (e.g., cough, fever, body aches) might be 40%. So, if you consider 100,000 people with these symptoms, 40,000 will have influenza while 60,000 will not. Table 2 shows the resulting 2 × 2 table.
Table 2. 2 × 2 Table for the Evaluation of a Hypothetical Influenza Sensor during Peak Flu Season (a)
| | test result positive | test result negative |
|---|---|---|
| disease present | 38,000 | 2,000 |
| disease absent | 1,200 | 58,800 |
(a) Values in the table are calculated as follows: test result positive with disease = sensitivity × 40,000; test result negative with disease = 40,000 − test result positive with disease; test result negative without disease = specificity × 60,000; test result positive without disease = 60,000 − test result negative without disease.
Therefore, if a patient with flu-like symptoms seeks medical attention during the peak of flu season, there is a 40% likelihood that the patient has influenza. However, if the physician orders an influenza test and the result is positive, the positive predictive value of the result is 96.9%, i.e., there is a 96.9% probability that the patient has influenza (using eq 3 and Table 2). The use of this test is therefore warranted, because a positive result makes the physician much more confident in assigning a diagnosis of influenza. Furthermore, the negative predictive value is 96.7%, which would argue against the diagnosis of influenza should the result be negative. Thus, this hypothetical sensor would be highly useful to help diagnose influenza in this clinical situation. Now consider a patient with flu-like symptoms who seeks medical attention outside of flu season, when the prevalence of influenza is 0.1%. If you consider 100,000 people with these "flu-like" symptoms, only 100 will have influenza, while 99,900 will not. Table 3 shows the resulting 2 × 2 table.
Table 3. 2 × 2 Table for the Evaluation of a Hypothetical Influenza Sensor off Flu Season
| | test result positive | test result negative |
|---|---|---|
| disease present | 95 | 5 |
| disease absent | 1,998 | 97,902 |
Now, if a patient with "flu-like" symptoms seeks medical attention when the prevalence of flu is low, there is only a 0.1% likelihood that the patient has influenza. If the physician orders an influenza test and the result is positive, the positive predictive value of the result is only 4.5% (using eq 3 and Table 3), so the patient is still unlikely to have influenza. The test therefore adds little to the clinical picture, and the same sensor is not very useful in this clinical scenario. Understanding the preceding example is critical for those who develop sensors, because you can "look up" the prevalence of the disease for which you wish to develop a sensor.
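The arithmetic behind eqs 1−4 and Tables 2 and 3 can be sketched in a few lines of Python. This is a minimal illustration rather than a validated clinical tool. `TwoByTwo`, `build_table`, and `required_accuracy` are hypothetical names. The table is filled exactly as described in the footnote to Table 2. `required_accuracy` additionally assumes that medical sensitivity and specificity are equal, so that eq 3 can be solved for a single unknown.

```python
"""Predictive values of a binary diagnostic sensor (eqs 1-4), computed from a
2 x 2 table built for a given disease prevalence (illustrative sketch)."""
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    tp: float  # disease present, test result positive
    fn: float  # disease present, test result negative
    fp: float  # disease absent, test result positive
    tn: float  # disease absent, test result negative

    @property
    def sensitivity(self) -> float:  # eq 1
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:  # eq 2
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:  # eq 3: true positives / all positive results
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:  # eq 4: true negatives / all negative results
        return self.tn / (self.tn + self.fn)


def build_table(prevalence: float, sensitivity: float, specificity: float,
                population: int = 100_000) -> TwoByTwo:
    """Fill the 2 x 2 table the way the Table 2 footnote describes."""
    diseased = prevalence * population
    healthy = population - diseased
    tp = sensitivity * diseased
    tn = specificity * healthy
    return TwoByTwo(tp=tp, fn=diseased - tp, fp=healthy - tn, tn=tn)


def required_accuracy(prevalence: float, target_ppv: float) -> float:
    """Sensitivity = specificity = a needed to reach target_ppv.

    Solves a*p / (a*p + (1 - a)*(1 - p)) = T for a (assumes equal values).
    """
    p, t = prevalence, target_ppv
    return t * (1 - p) / (p * (1 - t) + t * (1 - p))


if __name__ == "__main__":
    for label, prev in [("peak season (Table 2)", 0.40),
                        ("off season (Table 3)", 0.001)]:
        table = build_table(prev, sensitivity=0.95, specificity=0.98)
        print(f"{label}: PPV = {table.ppv:.2%}, NPV = {table.npv:.2%}")
    # Screening at a prevalence of 10 per 100,000 with a 91% PPV target
    print(f"required accuracy: {required_accuracy(1e-4, 0.91):.3%}")
```

Run as a script, the sketch reports a positive predictive value of about 96.9% at peak season and about 4.5% off season, with the same sensor. Running the calculation in reverse answers the design question of what accuracy a sensor would need. At a prevalence of 10 per 100,000 and a target positive predictive value of 91%, `required_accuracy` returns roughly 0.99999.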
Using your estimate of the best medical sensitivity and specificity you could reasonably expect (and understanding that 100% is not an option), you can then decide a priori whether the sensor is likely to achieve its goal. Because of this, diagnostic sensors made for very low prevalence diseases (e.g., ovarian cancer) will likely not have clinical utility, and both the Food and Drug Administration and the U.S. Preventive Services Task Force have argued against their use.4,5
One pitfall of the "Bayesian" or probabilistic approach to the practice of medicine involves the changing prevalence of an infectious disease during an outbreak of that disease. Patients often come to medical attention with non-disease-specific symptoms, such as fever, which can have multiple etiologies. Early in what will become an epidemic, the source of fever is misidentified because the prevalence of that disease is thought to be low. For example, leptospirosis is an infectious zoonotic disease with a prevalence (in tropical regions) of 10 per 100,000 people.6 During an outbreak, however, the prevalence would climb higher. If one uses the "normal" (i.e., nonepidemic) prevalence, there is limited utility in designing a sensor to screen people for leptospirosis: to achieve a respectable positive predictive value of 91%, such a sensor would need a medical sensitivity and medical specificity of 99.999%, a tall order. However, in times of outbreak, when the prevalence increases, such a sensor may have clinical utility.
■
ASSESSING THE UTILITY OF A MANAGEMENT TEST FROM THE END-USER (PHYSICIAN) PERSPECTIVE
Ultimately, the value of a management test lies in its prognostic value, that is, the ability of the test result to influence medical decisions related to the ongoing care of a patient. For a sensor to be useful to a physician in making management decisions, two criteria must be met: (1) the presence of, or variation in, the analyte's concentration must be linked to the progression of the disease, and (2) the sensor must be able to measure the analyte concentration accurately within the matrix of interest (e.g., blood, tears, urine, tissue, breath) within a task-specific time frame. Many of us who design sensors are intimately familiar with the second criterion. Considerable effort is expended to create a sensor that can selectively measure the analyte within a complex matrix and in a specified time frame. However, basing a medical decision on a test result raises important related issues. For example, if the sensor is expected to provide insight into whether a disease (e.g., cancer) is local or widespread, attention must be paid to sampling. Consider the case of the deadly skin cancer melanoma. If the disease is localized to the skin, the 5-year survival rate is about 97%, while the survival rate for widespread, "metastatic" melanoma is 15−20%.7 Therefore, efforts to detect cell-free circulating melanoma DNA, the so-called "liquid biopsy", give such a sensor an opportunity to supply information crucial to the management of this disease by indicating its extent of spread.8 However, free circulating melanoma DNA is not necessarily distributed homogeneously in the bloodstream, and its detection is limited by sampling error.
One implication of the second criterion is often overlooked by those of us who design sensors, because we are focused on the sensor and not on its use. For a physician, a sensor's concentration range and accuracy are critical performance characteristics only within the range of analyte concentrations that informs management decisions. For example, consider a blood glucose sensor. Physiologically, blood glucose in a person without diabetes mellitus is tightly regulated within a range of approximately 70−140 mg/dL (from fasting to after eating). In people with diabetes mellitus, blood glucose can increase substantially, even into the 2000 mg/dL range or more. On the other hand, if the patient is taking medicine that lowers blood glucose, it can fall as low as 10−20 mg/dL before death. A brilliant solution that allowed clinical requirements to be combined with analytical performance for blood glucose measurements was born out of the Diabetes Control and Complications Trial,9 in which clinicians studied the effects of tightly regulating blood glucose in patients with type 1 diabetes mellitus. In this trial, tight blood glucose control with insulin led to fewer long-term medical complications, but also to an increased risk of hypoglycemia and its life-threatening consequences. Keeping a patient's blood glucose as close as possible to the physiological range therefore reduced the long-term complications of diabetes. However, because such "tight" control also increased the risk of hypoglycemia, accurate measurement of blood glucose in certain ranges became critical. The Parkes error grid was developed after surveying 100 physicians who treat patients with diabetes mellitus. Their "expert consensus" allowed Parkes and co-workers10 to construct a scatter plot of blood glucose readings (test device vs standard device) partitioned into five regions (A−E), in which the analytical accuracy requirements were set by the clinical requirements. An updated version of this grid is currently under development.11 The concept of medical "expert consensus" driving analytical performance characteristics is precisely what we advocate.
A consequence of the Parkes error grid is that, as long as the detection limit of a sensor is adequate to detect, with minimal uncertainty, the lowest analyte level needed for a clinical decision, there is no need to optimize the detection limit further. In fact, doing so might sacrifice other sensor performance characteristics that do affect a sensor's clinical utility. On the other hand, it is important to recognize that theoretically attainable (often the published) detection limits may not be achievable in real patient samples. They might have been calculated (i) from results recorded in standard solutions without any potentially interfering compounds or (ii) from the standard deviation of a background signal ($s_B$) instead of the residual mean standard deviation (RMSD) of the data points around a calibration curve with slope $S$ within the physiologically relevant concentration range ($c^{1}_{\mathrm{DL}} = 3 s_B/S$ vs $c^{2}_{\mathrm{DL}} = 3\,\mathrm{RMSD}/S$, respectively). If $c^{1}_{\mathrm{DL}}$ is termed the detection limit, $c^{2}_{\mathrm{DL}}$ is the resolution of the method, i.e., the smallest concentration difference that can be determined with the sensor. Unfortunately, papers on novel sensors often report detection limits without justifying the method and calculation used to determine them.
With respect to a sensor's utility for diagnosis or management, the agreement between the results provided by an established method/sensor and those of the new method/sensor is probably the most important characteristic. The level of agreement is most commonly determined in a head-to-head comparison of the new method/sensor with an established one. While method comparison studies are experimentally straightforward, their statistical analysis can be quite complicated.
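A short Python sketch can illustrate the gap between the two detection-limit conventions discussed above. The calibration points and blank readings are invented for illustration. A linear relationship between signal and concentration is assumed. RMSD is taken as the residual standard deviation about the least-squares line with n − 2 degrees of freedom. That is one common convention; the article does not specify the denominator. The function names are hypothetical.

```python
"""Two detection-limit conventions for a linear calibration (illustrative sketch).

c1 = 3 * s_B / S    (s_B: standard deviation of the background signal)
c2 = 3 * RMSD / S   (RMSD: residual scatter of calibration points about the line)
"""
import math
from statistics import stdev


def fit_line(conc, signal):
    """Ordinary least-squares fit: signal = intercept + slope * conc."""
    n = len(conc)
    mean_x = sum(conc) / n
    mean_y = sum(signal) / n
    sxx = sum((x - mean_x) ** 2 for x in conc)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(conc, signal))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residual_sd(conc, signal, intercept, slope):
    """Residual standard deviation about the fitted line (n - 2 degrees of freedom)."""
    residuals = [y - (intercept + slope * x) for x, y in zip(conc, signal)]
    return math.sqrt(sum(r * r for r in residuals) / (len(conc) - 2))


def detection_limits(conc, signal, blank_signals):
    """Return (slope S, c1 from background noise, c2 from calibration scatter)."""
    intercept, slope = fit_line(conc, signal)
    c1 = 3 * stdev(blank_signals) / slope
    c2 = 3 * residual_sd(conc, signal, intercept, slope) / slope
    return slope, c1, c2


if __name__ == "__main__":
    # Hypothetical calibration within a physiologically relevant range
    conc = [1.0, 2.0, 3.0, 4.0, 5.0]         # concentration (arbitrary units)
    signal = [2.1, 3.9, 6.2, 7.8, 10.0]      # sensor response
    blanks = [0.10, 0.12, 0.09, 0.11, 0.10]  # repeated background readings
    S, c1, c2 = detection_limits(conc, signal, blanks)
    print(f"S = {S:.2f}; c1 = {c1:.3f}; c2 = {c2:.3f}")
```

With these made-up numbers, c2 is roughly 15 times larger than c1. This is the kind of difference the text warns about when a paper reports only the background-based value.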
For method comparison studies, a careful, methodical approach is needed to minimize the possibility of misinterpreting the results.12 Most of us who work on the design and characterization of medical sensors do not pay adequate attention to the largest challenge for management sensors. Most analytes detected with management sensors are actually surrogate markers for the disease and its progression. A surrogate marker is something that is measured (e.g., a molecule or ion) whose presence or concentration is believed to be related to the progression or status of the disease. A surrogate marker is used when no analyte directly related to the disease is available or measurable with existing methods, and when the surrogate has been shown through clinical trials to be linked to the disease. For example, many commercial point-of-care testing devices measure serum creatinine levels electrochemically or optically, and the resulting value is used as an indicator of renal (kidney) function.13 Creatinine is a metabolite of protein catabolism that is normally released into the blood at a constant rate and filtered by the kidneys. The rate of filtration through the kidneys (called the glomerular filtration rate, GFR) is used clinically to assess kidney function; thus, the serum creatinine level is a surrogate marker for renal function. However, the utility of creatinine as a surrogate marker for GFR has limitations. For example, if a person eats a protein-rich meal, exercises vigorously, or takes creatine supplements, the assumption that creatinine is released into the blood at a constant rate is incorrect. Serum creatinine will rise in these circumstances, and the calculated GFR will be lower. Renal function "appears" to be reduced because the assumptions underlying the use of creatinine as a surrogate marker are not valid in these circumstances. Fortunately, in this case, clinicians understand these potential errors and can identify them when present.
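A simple steady-state mass balance can show how a change in creatinine generation masquerades as a change in kidney function. This Python sketch is a simplification, not a clinical GFR equation. It assumes creatinine leaves the blood only by glomerular filtration, ignoring tubular secretion, so that at steady state the generation rate G equals GFR × serum creatinine. It also treats a raised generation rate as if it were sustained. The numbers are illustrative only: a GFR of 100 mL/min, a generation rate of 1.0 mg/min, and a hypothetical 30% rise in generation. The function names are likewise hypothetical.

```python
"""Toy steady-state model: serum creatinine (SCr) as a surrogate for GFR.

Assumes creatinine leaves the blood only by glomerular filtration, so at
steady state the generation rate G (mg/min) equals GFR (mL/min) x SCr (mg/mL).
"""

ML_PER_DL = 100.0  # unit conversion: 1 dL = 100 mL


def steady_state_scr(generation_mg_min: float, gfr_ml_min: float) -> float:
    """Serum creatinine (mg/dL) at which the filtered load balances generation."""
    return generation_mg_min / gfr_ml_min * ML_PER_DL


def inferred_gfr(scr_mg_dl: float, assumed_generation_mg_min: float) -> float:
    """GFR (mL/min) back-calculated from SCr using the 'constant rate' assumption."""
    return assumed_generation_mg_min / scr_mg_dl * ML_PER_DL


if __name__ == "__main__":
    true_gfr = 100.0  # mL/min; the kidneys do not change in this example
    assumed_g = 1.0   # mg/min; the constant generation rate the method assumes
    for label, actual_g in [("usual generation", 1.0),
                            ("generation raised 30%", 1.3)]:
        scr = steady_state_scr(actual_g, true_gfr)
        print(f"{label}: SCr = {scr:.2f} mg/dL, "
              f"inferred GFR = {inferred_gfr(scr, assumed_g):.0f} mL/min")
```

The kidneys are identical in both runs, yet the inferred GFR drops from 100 to about 77 mL/min. That drop is the apparent loss of renal function described above.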
■
CONCLUSION
Both diagnostic and management decisions can be greatly enhanced with information from sensors, but developers should understand the clinical criteria by which such sensors will be judged and used, as well as their associated pitfalls. The development of medical sensor technologies to aid health care workers in making decisions should focus on the clinical requirements and the information content the sensors will provide. It is essential to understand that even the best sensors, with outstanding analytical characteristics, will most likely not have clinical utility for the diagnosis of very low prevalence diseases. On the other hand, the same sensors could be extremely useful in other settings, or at other times, when the prevalence of the disease is high.
■
AUTHOR INFORMATION
Corresponding Author
*E-mail: bpendley@memphis.edu.
ORCID
Bradford D. Pendley: 0000-0002-3379-8376
Erno Lindner: 0000-0002-2561-4784
Notes
The authors declare no competing financial interest.
■
ACKNOWLEDGMENTS
The authors gratefully acknowledge the financial support from the FedEx Institute of Technology through the Sensor Institute of the University of Memphis (Sensorium).
■
REFERENCES
(1) Vessman, J.; Stefan, R. I.; van Staden, J. F.; Danzer, K.; Lindner, W.; Burns, D. T.; Fajgelj, A.; Müller, H. Selectivity in Analytical Chemistry (IUPAC Recommendations 2001). Pure Appl. Chem. 2001, 73 (8), 1381−1386.
(2) Dorkó, Z.; Verbić, T.; Horvai, G. Selectivity in analytical chemistry: Two interpretations for univariate methods. Talanta 2015, 132, 680−684.
(3) Florkowski, C. M. Sensitivity, Specificity, Receiver-Operator Characteristic (ROC) Curves and likelihood ratios: Communicating the Performance of Diagnostic Tests. Clin. Biochem. Rev. 2008, 29, S83−S87.
(4) https://www.uspreventiveservicestaskforce.org/Page/Document/UpdateSummaryFinal/ovarian-cancer-screening (accessed 8−20−17).
(5) Voelker, R. Ovarian Cancer Screening Tests Don't Pass Muster. JAMA 2016, 316, 1538.
(6) www.who.int/zoonoses/diseases/lerg/en/index2.html (accessed 10−9−17).
(7) https://www.cancer.org/cancer/melanoma-skin-cancer/detection-diagnosis-staging/survival-rates-for-melanoma-skin-cancer-by-stage.html (accessed 8−20−17).
(8) Huynh, K.; Hoon, D. S. B. Liquid Biopsies for Assessing Metastatic Melanoma Progression. Crit. Rev. Oncog. 2016, 21 (1−2), 141−154.
(9) Nathan, D. M.; Genuth, S.; Lachin, J.; Cleary, P.; Crofford, O.; Davis, M.; Rand, L.; Siebert, C. The Effect of Intensive Treatment of Diabetes on the Development and Progression of Long-Term Complications in Insulin-Dependent Diabetes Mellitus. N. Engl. J. Med. 1993, 329 (14), 977−986.
(10) Parkes, J. L.; Slatin, S. L.; Pardo, S.; Ginsberg, B. H. A new consensus error grid to evaluate the clinical significance of inaccuracies in the measurement of blood glucose. Diabetes Care 2000, 23 (8), 1143−1148.
(11) Pfützner, A.; Klonoff, D. C.; Pardo, S.; Parkes, J. L. Technical Aspects of the Parkes Error Grid. J. Diabetes Sci. Technol. 2013, 7 (5), 1275−1281.
(12) Lindner, E.; Pendley, B. D. A Tutorial on the application of ion-selective electrode potentiometry: An analytical method with unique qualities, unexplored opportunities and potential pitfalls. Anal. Chim. Acta 2013, 762, 1−13.
(13) Shepard, M. D. S. Point-Of-Care Testing and Creatinine Measurement. Clin. Biochem. Rev. 2011, 32, 109−114.