Article pubs.acs.org/ac
Bootstrap Approach To Compare the Slopes of Two Calibrations When Few Standards Are Available
Graciela Estévez-Pérez,† Jose M. Andrade,*,‡ and Rand R. Wilcox§
†Department of Mathematics and ‡Group of Applied Analytical Chemistry, Department of Analytical Chemistry, University of A Coruña, Campus da Zapateira, 15071 A Coruña, Galicia, Spain
§Department of Psychology, University of Southern California, Los Angeles, California 90089, United States
ABSTRACT: Comparing the slopes of aqueous-based and standard addition calibration procedures is almost a daily task in analytical laboratories. Because the usual protocols involve very few standards, sound statistical inference and conclusions are hard to obtain with the classical tests in current use (e.g., the t-test), which may greatly affect decision-making. Thus, there is a need for robust statistics that are not distorted by the small samples of experimental values obtained in analytical studies. Several promising alternatives based on bootstrapping are studied in this paper under the typical constraints of laboratory work. The impacts of the number of standards, homoscedasticity or heteroscedasticity, three variance patterns, and three error distributions on least-squares fits were considered (in total, 144 simulation scenarios). Student’s t-test is the most valuable procedure when the normality assumption is true and homoscedasticity is present, although it can be highly affected by outliers. A wild bootstrap method leads to average rejection percentages that are closer to the nominal level in almost every situation, and it is recommended for laboratories working with a small number of standards. Finally, it was seen that the Theil−Sen percentile bootstrap statistic is very robust but its rejection percentages depart from the nominal ones ( 0.55) because the p-values and/or percentages of rejections remain quite stable (e.g., compare the 2nd and 8th columns). The other statistics decrease the percentage of rejections and, somewhat surprisingly, the T statistic is the best. Table 2 presents the results for heteroscedasticity associated with a reduction of the variance of the residuals when the explanatory variable departs from the central value (VP3). In general, the p-values clearly exceed the nominal one, both for normal and asymmetric errors, except for the WB statistic. When the fewest standards are considered (n = 3), results are really poor for T, Qt, and MP.
When an outlier is present, the other tests improve their behavior but they do not outperform WB. Therefore, WB should be the choice here. An interesting conclusion from the simulations above is that the behavior of the statistics depends much more on the type of heteroscedasticity than on the distribution of errors (for which the behaviors were quite homogeneous). Thus, it is
DOI: 10.1021/acs.analchem.5b04004 Anal. Chem. XXXX, XXX, XXX−XXX
Analytical Chemistry
sample sizes and β3 values. Indeed, the more standards are used and the larger the value of β3 is, the greater the statistical power should be. Theoretically, when homoscedasticity holds, T should be more powerful than WB, ex ante. In practice, it was found that T and WB have very similar power, although for smaller sample sizes the power of T approaches 1 faster. As expected, both statistics clearly increase their power when the number of standards increases. Hence, WB performed very satisfactorily, even when compared with the optimal approach under these conditions, T. When heteroscedasticity and nonnormal residuals were considered [we selected increasing variances (VP2) and an asymmetric error distribution (g = 0.5, h = 0) because in our experience this seems a common situation in laboratories], the T statistic had very unsatisfactory results, as expected, and only the WB statistic should be considered. However, its statistical power converged to 1 at a slower pace than when homoscedasticity and normal errors occurred: it was relatively low for small values of β3 and/or for very small sample sizes (n1 = n2 = 3). Nevertheless, when the sample size of a calibration is equal to or greater than 5, the statistical power increases to 1 quickly. This is a highly relevant result because, in analytical laboratories, preparing a series of five standards is a de facto standard, and so this practice now has an additional benefit.
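The dependence of power on the number of standards and on β3 can be illustrated with a small Monte Carlo sketch. It is not the authors' simulation code: it assumes that β3 denotes the difference between the two slopes (as the text above suggests), uses hypothetical standards at x = 1, 2, ..., n in both calibrations, normal homoscedastic errors, and the classical pooled-variance t statistic for equality of slopes. The function names and the small table of two-sided 5% critical values are introduced here for illustration only.

```python
import math
import random

# Two-sided 5% critical values of Student's t, keyed by degrees of freedom
# (n1 + n2 - 4); only the cases n1 = n2 = 3, 5, 7, 10 are tabulated here.
T_CRIT_5PCT = {2: 4.303, 6: 2.447, 10: 2.228, 16: 2.120}


def fit_line(x, y):
    """OLS fit of one calibration; returns slope, Sxx and residual sum of squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    sse = sum((yi - my - slope * (xi - mx)) ** 2 for xi, yi in zip(x, y))
    return slope, sxx, sse


def t_slopes(x1, y1, x2, y2):
    """Classical t statistic for H0: equal slopes, with pooled residual variance."""
    b1, sxx1, sse1 = fit_line(x1, y1)
    b2, sxx2, sse2 = fit_line(x2, y2)
    dof = len(x1) + len(x2) - 4
    s2 = (sse1 + sse2) / dof
    return (b2 - b1) / math.sqrt(s2 * (1.0 / sxx1 + 1.0 / sxx2)), dof


def power_t(n, beta3, sigma=1.0, n_sim=2000, seed=0):
    """Fraction of simulated calibration pairs in which H0 is rejected at 5%."""
    rng = random.Random(seed)
    x = [float(i) for i in range(1, n + 1)]  # hypothetical standards
    rejections = 0
    for _ in range(n_sim):
        # Hypothetical true lines: intercept 1, slopes 1 and 1 + beta3.
        y1 = [1.0 + 1.0 * xi + rng.gauss(0.0, sigma) for xi in x]
        y2 = [1.0 + (1.0 + beta3) * xi + rng.gauss(0.0, sigma) for xi in x]
        t, dof = t_slopes(x, y1, x, y2)
        if abs(t) > T_CRIT_5PCT[dof]:
            rejections += 1
    return rejections / n_sim


if __name__ == "__main__":
    for n in (3, 5, 7, 10):
        print(n, [power_t(n, b3) for b3 in (0.0, 0.5, 1.0, 2.0)])
```

With β3 = 0 the returned fraction estimates the type I error, so it should stay near the 5% nominal level; for β3 > 0 it estimates power, which under these assumptions grows with both n and β3.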
average p-values systematically exceed the nominal one (=0.5, or 50%, Figure S2). Figure 2 depicts the average percentage of rejections (average type I error estimated for the three error distributions, nominal value = 5%). A final remark is in order here, because in our studies TS was far less satisfactory than it was for Ng and Wilcox (ref 8). Further simulations with more standards were considered (n = 10, 20, 40, 60) and the statistic yielded average results close to the nominal ones, comparable to those of WB. In some cases TS outperformed WB, but we feel that TS is not suitable for typical laboratory situations where very few standards are used.

Scenario 1, Situation 2: Within-Group Heteroscedasticity and Between-Group Homoscedasticity, Different Calibration Ranges. In the Experimental Section, situation S2 was described as the one in which the explanatory variable has different ranges in the two calibrations. Results were very similar to those above, so they are not presented here for brevity.

Scenario 2: Both Within-Group and Between-Group Heteroscedasticity. Out of the many possibilities that might be simulated, three rather “extreme” situations were considered: normal error in group 1 plus asymmetric error in group 2; normal error in group 1 plus normal error with an outlier in group 2; and asymmetric error in group 1 plus normal error with an outlier in group 2. Figure S3 shows that the most useful statistics were T, WB, and TS. Their behaviors agree largely with those in the two previous scenarios. T is the most valuable statistic when homoscedasticity is present, although it is strongly affected by an outlier. In contrast, WB leads to average rejection percentages that are very close to the nominal ones in almost every situation. Hence this would be the recommended choice for laboratories working with a small number of standards.
Finally, TS is very robust but its rejection percentages depart from the nominal ones (0.55, and so its use is discouraged for a low number of standards). In the most complex situation, heteroscedasticity was induced both within and between calibrations. Once more, the most useful statistics were T, WB, and TS. Their behaviors agree largely with those in the previous scenarios. T is the most valuable statistic when homoscedasticity is present, although it is strongly affected by an outlier. In contrast, WB leads to average rejection percentages that are very close to the nominal ones in almost every situation. TS is very robust, but its rejection percentages depart from the nominal ones. As a final conclusion, the wild bootstrap (WB) statistic seems a very convenient and useful choice to compare regression straight lines in laboratories working with a small number of standards. In addition, its power is comparable to that of Student’s t-test even in the optimal case where homoscedasticity holds, and it does not need previous statistical tests to assess normality or homoscedasticity.
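To make the recommendation concrete, the following is a minimal sketch of one common wild bootstrap scheme for testing the equality of two calibration slopes. It is an illustration under stated assumptions and not necessarily the WB statistic evaluated in this work: residuals are taken from the null model (one common slope, separate intercepts), they are perturbed with random ±1 (Rademacher) weights, and the studentized slope difference is recomputed on each pseudo-sample. All function names are hypothetical.

```python
import math
import random


def fit_line(x, y):
    """OLS fit of one calibration; returns slope, Sxx and residual sum of squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    sse = sum((yi - my - slope * (xi - mx)) ** 2 for xi, yi in zip(x, y))
    return slope, sxx, sse


def t_slopes(x1, y1, x2, y2):
    """Studentized slope difference, using the pooled residual variance."""
    b1, sxx1, sse1 = fit_line(x1, y1)
    b2, sxx2, sse2 = fit_line(x2, y2)
    s2 = (sse1 + sse2) / (len(x1) + len(x2) - 4)
    se = math.sqrt(s2 * (1.0 / sxx1 + 1.0 / sxx2))
    if se == 0.0:  # degenerate case: both lines fit their points exactly
        return 0.0 if b1 == b2 else math.inf
    return (b2 - b1) / se


def null_fit(x1, y1, x2, y2):
    """Fitted values and residuals under H0: common slope, separate intercepts."""
    xs, ys = (x1, x2), (y1, y2)
    means = [(sum(x) / len(x), sum(y) / len(y)) for x, y in zip(xs, ys)]
    sxx = sum((xi - mx) ** 2 for x, (mx, _) in zip(xs, means) for xi in x)
    sxy = sum((xi - mx) * (yi - my)
              for x, y, (mx, my) in zip(xs, ys, means)
              for xi, yi in zip(x, y))
    b = sxy / sxx  # pooled within-group slope
    fits = [[my + b * (xi - mx) for xi in x] for x, (mx, my) in zip(xs, means)]
    resid = [[yi - fi for yi, fi in zip(y, f)] for y, f in zip(ys, fits)]
    return fits, resid


def wild_bootstrap_pvalue(x1, y1, x2, y2, n_boot=1999, seed=0):
    """Wild bootstrap p-value for H0: the two calibration slopes are equal."""
    rng = random.Random(seed)
    t_obs = abs(t_slopes(x1, y1, x2, y2))
    (f1, f2), (r1, r2) = null_fit(x1, y1, x2, y2)
    hits = 0
    for _ in range(n_boot):
        # Each residual keeps its own magnitude (so its variance pattern is
        # preserved) and only its sign is randomized.
        y1s = [f + r * rng.choice((-1.0, 1.0)) for f, r in zip(f1, r1)]
        y2s = [f + r * rng.choice((-1.0, 1.0)) for f, r in zip(f2, r2)]
        if abs(t_slopes(x1, y1s, x2, y2s)) >= t_obs:
            hits += 1
    return (hits + 1) / (n_boot + 1)
```

Because every pseudo-residual retains the magnitude of the residual observed at the same standard, the scheme does not require a common variance along the calibration range, which is the usual motivation for preferring it under heteroscedasticity. Note that with n1 = n2 = 3 only 2^6 = 64 distinct sign patterns exist, so the bootstrap distribution is necessarily coarse for very small calibrations.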
■
when different types of WG heteroscedasticity are considered; and a tutorial on the use of the WB statistic (PDF)
ACKNOWLEDGMENTS
The Galician Government, “Xunta de Galicia”, is acknowledged for its support of the QANAP group (Programa de Consolidación y Estructuración de Unidades de Investigación Competitiva, GRC2013-047). The financial support of the Spanish Government (Ministerio de Economía y Competitividad) and Xunta de Galicia (research projects MTM2014-52876-R and CN2012/130) is also acknowledged.
■
REFERENCES
(1) Ortiz, M. C.; Sánchez, S.; Sarabia, L. Quality of analytical measurements: univariate regression. In Comprehensive Chemometrics: Chemical and Biochemical Data Analysis, Vol. 1; Brown, S. D.; Tauler, R.; Walczak, B., Eds.; Elsevier: Amsterdam, 2009; pp 127−169; DOI: 10.1016/B978-044452701-1.00091-0.
(2) Thompson, M.; Lowthian, P. J. Notes on Statistics and Data Quality for Analytical Chemists; Imperial College Press: London, 2011.
(3) Andrade-Garda, J. M.; Carlosena-Zubieta, V.; Soto-Ferreiro, R.; Terán-Baamonde, J.; Thompson, M. Classical Linear Regression by the Least Squares Method. In Basic Chemometric Techniques in Atomic Spectroscopy; Royal Society of Chemistry: London, 2013; Chapt. 2, pp 52−122; DOI: 10.1039/9781849739344-00052.
(4) Andrade, J. M.; Estévez-Pérez, G. Anal. Chim. Acta 2014, 838, 1−12.
(5) Draper, N. R.; Smith, H. Applied Regression Analysis; John Wiley & Sons: New York, 1998; DOI: 10.1002/9781118625590.
(6) Smithson, M. Front. Psychol. 2012, 3, No. 231.
(7) DeShon, R. P.; Alexander, R. A. Psychol. Methods 1996, 1 (3), 261−277.
(8) Ng, M.; Wilcox, R. R. Br. J. Math. Stat. Psych. 2010, 63, 319−340.
(9) Wilcox, R. R. Br. J. Math. Stat. Psych. 1997, 50, 309−317.
(10) Wilcox, R. Introduction to Robust Estimation and Hypothesis Testing, 3rd ed.; Elsevier: New York, 2012.
(11) Mooney, C. Z.; Duval, R. D. Bootstrapping: A Nonparametric Approach to Statistical Inference; Sage Publications: Newbury Park, CA, 1993.
(12) Wehrens, R.; Putter, H.; Buydens, L. M. C. Chemom. Intell. Lab. Syst. 2000, 54, 35−52.
(13) Hartmann, C.; Smeyers-Verbeke, J.; Penninckx, W.; Vander Heyden, Y.; Vankeerberghen, P.; Massart, D. L. Anal. Chem. 1995, 67, 4491−4499.
(14) Villa, J. L.; Boqué, R.; Ferré, J. Chemom. Intell. Lab. Syst. 2008, 94 (1), 51−59.
(15) Afanador, N. L.; Tran, T. N.; Buydens, L. M. C. Anal. Chim. Acta 2013, 768, 49−56.
(16) Afanador, N. L.; Tran, T. N.; Buydens, L. M. C. Chemom. Intell. Lab. Syst. 2014, 137, 162−172.
(17) Pereira, A. C.; Reis, M. S.; Saraiva, P. M.; Marques, J. C. Chemom. Intell. Lab. Syst. 2011, 105 (1), 43−55.
(18) de Almeida, M. R.; Correa, D. N.; Rocha, W. F. C.; Scafi, F. J. O.; Poppi, R. J. Microchem. J. 2013, 109, 170−177.
ASSOCIATED CONTENT
Supporting Information
The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.analchem.5b04004. Two tables, listing variance patterns associated with residuals and practical examples to compare the behavior of T and WB tests; three figures, showing relative performance of statistics when sample sizes and values of explanatory variable are equal in both calibrations and average p-values and relative performance of statistics