Anal. Chem. 2009, 81, 9841–9848
Perspective

Acceptance Criteria for Method Equivalency Assessments

Marion J. Chatfield* and Phil J. Borman

GlaxoSmithKline Medicines Research Centre, Gunnels Wood Road, Stevenage, Hertfordshire, SG1 2NY, U.K.

*To whom correspondence should be addressed. E-mail: [email protected]. Tel: +44 (0)1438 762228. Fax: +44 (0)1438 764414.

Quality by design (ICH-Topic Q8) requires that process control strategy requirements are met and maintained. The challenging task of setting appropriate acceptance criteria for the assessment of method equivalence is a critical component of satisfying these requirements, and the use of these criteria will support changes made to methods across the product lifecycle. A method equivalence assessment is required when a change is made to a method which may pose a risk to its ability to monitor the quality of the process. Establishing appropriate acceptance criteria is a vital, but not clearly understood, prerequisite to deciding the appropriate design/sample size of the equivalency study. A number of approaches are proposed in the literature for setting acceptance criteria for equivalence which address different purposes. This perspective discusses those purposes and then provides more detail on setting acceptance criteria based on patient and producer risk, e.g., the tolerance interval approach and the consideration of method or process capability. Applying these to a drug substance assay method for batch release illustrates that, for the equivalence assessment to be meaningful, a clear understanding and appraisal of the control requirements of the method is needed. Rather than a single exact algorithm, the analyst's judgment on a number of aspects is required in deciding the appropriate acceptance criteria.

The setting of appropriate acceptance criteria for method equivalence assessments can be challenging but is key to ensuring control strategy requirements are met. Such criteria are a vital prerequisite to deciding the design and sample size of the equivalency study. The two one-sided tests procedure1,2 (TOST) is frequently applied when assessing mean equivalence; however,
the establishment of acceptance criteria is not always clearly understood. For example, Chambers et al.3 report that predefined acceptance criteria of ±2% are typically used for assay methods (no distinction is made as to whether this relates to drug product or drug substance, although the authors have observed it commonly applied to drug substance4). As the typical drug substance specification range for an assay is also ±2% (usually 98.0-102.0%), such acceptance criteria would be highly inappropriate. Even if a compound is 100% pure, an accuracy shift of 2% in the method would result in a large proportion of batches being misclassified outside of the specification, especially as a high-performance liquid chromatography (HPLC) assay typically has a % relative standard deviation (RSD) of no better than 0.5%.5 Borman et al.6 explain that methods should meet performance criteria that are linked to the process monitoring and control (specification) requirements. The authors of this article also suggest such criteria could offer the opportunity for much greater regulatory flexibility in the future, as they could potentially be registered instead of a specific analytical method. These criteria should include equivalency acceptance criteria, especially to support changes made to methods across the product lifecycle. Concepts within this perspective will also be helpful in the initial definition of the method performance criteria. For acceptance criteria to result in a meaningful equivalence assessment, a clear understanding and appraisal of the control requirements of the method is needed. In this perspective, various approaches to setting acceptance criteria, taking into account both risk to the patient and to the producer, have been described and
Figure 1. Mean equivalence testing.
exemplified using an active pharmaceutical ingredient (API) assay method case study. The derivation of acceptance criteria more appropriate than those historically used (±2%) will be discussed; these are based on analytical judgment on a number of aspects rather than a single exact algorithm.

PURPOSES OF EQUIVALENCE TESTING

Equivalence Compared to Validation. Equivalence testing is performed where there is a need to demonstrate the similarity of two methods, whereas validation demonstrates the quality of a single method. Two methods could each be validated as acceptable for their intended purpose, yet a change from one method to the other might make that purpose no longer achievable. This occurs when data from the new method are to be compared against data from the previous method, e.g., in data trending, or where the data are compared against criteria derived from the first method, e.g., impurity specification limits. Chambers et al.3 provide further discussion of validation versus equivalency and of when demonstrating equivalency is of value. True equivalency testing provides sufficient data/evidence to demonstrate equivalence, e.g., a TOST approach in which equivalence criteria are set (designated as ±θ)2 and a confidence interval for the mean difference between methods is required to lie within ±θ for mean equivalence to have been demonstrated (illustrated in Figure 1; a numerical sketch of this check is given at the end of this section).

Equivalence and Comparability. Chambers et al.3 wrote that equivalency is a subset of comparability, the goal of method equivalency being to demonstrate acceptable method performance by comparison of a specific set of results. The authors restrict and clarify this further by using the term equivalence to indicate a formal statistical demonstration of similarity. In some situations, a risk-based approach may indicate that, rather than demonstrating equivalence, all that is required is some assurance that the minimally changed or transferred method is not giving dissimilar results, e.g., in technology transfer, where the knowledge of the method has been transferred from one set of analysts to another. To use the terminology of Hauck et al.,7 these studies set limits to identify "unusual characteristics/values". Ermer4 considers that the most important practical risk in a transfer is not a rather small bias but misinterpretation of, or lack of sufficient detail in, the control test description. Assurance of satisfactory transfer may be achieved via a specific study designed to detect large changes such as might be expected if the method has been misinterpreted. However, if a smaller but practically important bias were to exist, such a study would often fail to detect it. In these low risk situations, ongoing monitoring/verification of the method could provide assurance of equivalence in the longer term, and data collected on transfer could simply be part of the knowledge transfer and training, providing evidence of comparability.

Ermer also describes a comparability approach based on statistical considerations for the transfer of methods.4 The acceptance criteria for the difference between laboratory means are chosen to allow for the maximum acceptable analytical variability and the size of the chosen study. The approach ensures that the difference between laboratory means is consistent with the minimum process capability required given the size and design of the study; it does not confirm that any bias between laboratories meets control requirements. An equivalence approach for method transfer is described by de Fontenay8 which, despite setting criteria on the mean difference (as opposed to the confidence interval used in the TOST approach), aims to ensure that ≤5% of studies pass the criteria if the true bias is unacceptably high (though no method is given for deciding what is unacceptable). The criteria are set according to the method variability, the number of analytical runs, and the number of replicates within a run. The approach aims to provide a simpler tool for analysts than the TOST approach through the use of a reference table instead of calculated confidence intervals. However, the downside is that it assumes the evaluation of method variability during validation is very reliable, and it is restricted to simple, though common, designs.

Equivalence Focusing on Accuracy. The acceptance criteria considered here are primarily those related to the accuracy of the method, i.e., those used when assessing the mean equivalency of methods. Extensions to other criteria, e.g., precision, are mentioned in the Discussion. Choosing acceptance criteria requires establishing the smallest mean difference, or bias, between methods which is practically important. This involves consideration of the purpose(s) of the method and its accuracy requirements, and as Chambers et al.3 indicate, this can be a challenging and daunting task. It is to be emphasized that this involves looking at a mean difference, or overall bias, of a method and not at individual values. Figure 2 shows the effect a bias might have on the overall distribution of measured batch results, even accepting that the magnitude of the bias may not be large compared with the uncertainty of an individual result. Accuracy of an analytical method forms part of the method performance criteria.

Figure 2. Overall distribution of measured batch results (LSL = lower specification limit, USL = upper specification limit).
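As a worked illustration of the TOST confidence interval check of Figure 1, the short Python sketch below computes the 90% two-sided confidence interval for the mean difference between methods and compares it with ±θ. The paired differences and the value of θ are hypothetical illustrations, not case-study data.

```python
# Sketch of the TOST mean-equivalence check (confidence-interval form):
# equivalence is concluded when the 90% two-sided CI for the mean
# difference lies entirely within +/-theta. Data and theta are hypothetical.
import numpy as np
from scipy import stats

def tost_mean_equivalence(diffs, theta, alpha=0.05):
    """diffs: paired differences (new method - old method), in % w/w."""
    diffs = np.asarray(diffs, dtype=float)
    n = diffs.size
    mean = diffs.mean()
    se = diffs.std(ddof=1) / np.sqrt(n)
    # A (1 - 2*alpha) two-sided CI corresponds to two one-sided tests at alpha.
    t_crit = stats.t.ppf(1 - alpha, df=n - 1)
    lower, upper = mean - t_crit * se, mean + t_crit * se
    return lower, upper, (-theta < lower) and (upper < theta)

diffs = [0.12, -0.05, 0.20, 0.08, -0.10, 0.15]  # hypothetical paired results
lo, hi, ok = tost_mean_equivalence(diffs, theta=0.5)
print(f"90% CI for mean difference: ({lo:.2f}, {hi:.2f}); equivalent: {ok}")
```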
Equivalence: "What Must Be Achieved" versus "What Can Be Achieved". Chambers et al.3 discuss that it is important to differentiate between what must be achieved to fulfill method equivalence requirements and what can be achieved. Acceptance criteria based on what must be achieved are the widest limits which can be chosen to demonstrate equivalence. However, the literature suggests that narrower limits may be chosen if it is expected that these should be achievable, typically where a method has undergone minor modifications or a change of operating environment (e.g., technology transfer). Historical information provides knowledge of what is generally achievable for the method, and such limits are used to trigger a re-evaluation of the change if it causes a mean shift larger than expected. Limentani et al.2 suggest a process whereby an initial θ is chosen together with a sample size and an upper confidence limit for an estimated variance, but then, on the basis of the variance of the transferring site data, θ is changed to that likely to be met with the current size of study. The problems with this approach are that (i) the transferring site data are not usually sufficient to establish what typically can be achieved, so if the estimated variance is small, θ will be unrealistically and unnecessarily tightened, resulting in a higher chance of the equivalence study failing; and (ii) if the estimated variance is large, the recalculated θ will be larger than a practically important difference, and the appropriate action is actually a larger study size. One practice used to set acceptance criteria is to take a typical study design/size and, from knowledge of the method variability, establish what acceptance criteria the study has a high probability of meeting. As Schwenke9 indicates, this is neither informative nor satisfactory. While this may establish narrower limits, it is still essential to ensure that these limits do not exceed the method equivalence requirements. Hauck et al.7 propose that acceptance criteria fall into one of two classes: those identifying "unacceptable characteristics/values" (corresponding to "what must be achieved") and those identifying "unusual characteristics/values" (more akin to "what can be achieved"). This perspective focuses on establishing what must be achieved. Even if it is intended to apply a "what can be achieved" approach, it is necessary to know that the resulting limits are acceptable.

APPROACHES TO CHOOSING MEAN EQUIVALENCE ACCEPTANCE CRITERIA ILLUSTRATED THROUGH AN API ASSAY METHOD CASE STUDY

Focusing on Patient and Producer Risk. Focusing on patient and producer risk provides clarity of purpose when setting equivalence acceptance criteria. As Chambers et al.3 mention, the new method needs to provide data that continue to support previously established specifications. Thus, focusing on the acceptance/rejection of batches provides a useful means of evaluating risk. Assessing analytical capability provides a means of assessing patient risk, which is of utmost importance in applying a "what must be achieved" approach, while knowledge of the process capability is useful to establish producer risk. The calculation and evaluation of these risks are illustrated through an API assay method.
Figure 3. Risk of misclassifying a single batch with respect to the lower specification limit (LSL) due to the analytical method.
Figure 4. Evaluation of process data.
Misclassification Risks: Introduction. Analytical variation means that, though a batch may be truly inside or outside the specification, its measured result may fall on the other side and thus the batch will be misclassified. Figure 3 illustrates that, for batches close to specification limits, the risk of misclassification increases (a region of uncertainty).

API Assay Method: Collation of Relevant Data. Data on API produced from both the research and development (R&D) pilot plant and the manufacturing site were collated. These data, which are fairly typical of the limited amount available in development, were used to estimate the variation due to the process and the analytical method, which is required to assess patient and producer risk. The process mean was estimated from five manufacturing batches (see Figure 4), since there was possibly a small decrease in the assay between R&D and the manufacturing site. This mean was considered most likely to be representative of the process going forward and, being closer to a specification limit, gives more conservative acceptance criteria. As there was no reason to expect the process variability to differ between sites, and noting that a good estimate of the standard deviation (SD) requires more data than a good estimate of the mean, the variability within each site was estimated and pooled. This gave an estimated manufacturing process mean of 99.26% with a total (process and measurement) SD of 0.51%. The predicted distribution of future batches and their measured values is shown in Figure 5. An estimate of the analytical variability was obtained from a ruggedness study performed on the analytical method. This gave a standard deviation (σa) of 0.40%, incorporating intermediate precision type sources of variability.

Figure 5. Predicted distribution of future manufacturing batches (assuming normality).

ASSESSING PATIENT RISK

Acceptable PTOL Approach. One possibility for evaluating patient risk is to examine the effect of an analytical bias on the capability of an analytical method. The capability of a method is usually summarized by the 2-sided PTOL (precision to tolerance ratio10), which is defined as (6σa)/(USL − LSL), where σa is the analytical method standard deviation. However, this ignores where the process is running and thus does not account for any analytical bias (noting also that many specifications are 1-sided, so the ratio cannot be applied directly). Analogous to process capability, the authors have defined a 1-sided PTOL that compares the analytical method variation to the gap between the process mean and the nearest specification limit:

PTOL1-sided = (3σa)/[minimum of (USL − process mean) or (process mean − LSL)]

For analytical methods, generally PTOL should be ≤0.3 and ideally ≤0.1. One way of assessing patient risk is to identify the largest analytical bias which still satisfies PTOL1-sided ≤ 0.3. The PTOL1-sided for the case study is (3 × 0.40)/(99.26 − 97.5) = 0.7. Thus 0.3 is already exceeded even with no method bias. Note that the analytical variation (SD = 0.40%) and the PTOL value are not untypical for assay methods and are generally accepted. Hofer et al.5 mention estimates of the analytical % RSD for assay ranging from 0.6 to 1.1%, suggesting that this approach is unlikely to be useful for assay methods.
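These capability calculations are easily scripted. The sketch below reproduces the case-study arithmetic from the quoted inputs; the 2-sided value is included for comparison.

```python
# Sketch of the PTOL calculations using the case-study inputs:
# sigma_a = 0.40%, process mean = 99.26%, specification 97.5-102.0%.
def ptol_two_sided(sigma_a, lsl, usl):
    return 6 * sigma_a / (usl - lsl)

def ptol_one_sided(sigma_a, process_mean, lsl, usl):
    # 3*sigma_a relative to the gap to the nearest specification limit
    return 3 * sigma_a / min(usl - process_mean, process_mean - lsl)

sigma_a, process_mean, lsl, usl = 0.40, 99.26, 97.5, 102.0
print(f"PTOL 2-sided: {ptol_two_sided(sigma_a, lsl, usl):.2f}")                # 0.53
print(f"PTOL 1-sided: {ptol_one_sided(sigma_a, process_mean, lsl, usl):.2f}")  # ~0.7
```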
Region of Uncertainty Approach. An alternative approach examines the effect of analytical bias on the region just outside a specification limit in which a batch may be misclassified as acceptable due to analytical bias or variation. First, a region of uncertainty is established around each specification limit assuming the analytical method is accurate; the region of uncertainty outside the specification limits represents the patient risk. This is illustrated in Figure 6 for the API assay method (assuming no analytical bias).

Figure 6. Batches at risk of misclassification.

The region of uncertainty was defined as ±3σa (assuming a normal distribution and an accurate method, 99.7% of measurements will lie within ±3σa of the true value). The smaller the analytical variability, the smaller the region of uncertainty. With the estimated analytical SD of 0.40%, there will be high confidence that batches with true means ≤96.3% (i.e., more than 3σa = 1.2% below the limit) will be measured outside the LSL (97.5%) with the original method. The second step is to account for a potential analytical bias of a new/changed method and add this to the region of uncertainty calculated for an accurate method (see Figure 7); the true mean of a batch which will be confidently rejected is shown for various upward biases.

Figure 7. Effect of upward analytical bias on region of uncertainty.

The final step is to establish how large the region of uncertainty/misclassification can be and still provide acceptable patient risk. In some situations, definitive specification limits will exist denoting the quality which, at a minimum, the patient should receive. Contained within those specifications may be tighter internal specifications used for release (or perhaps a requirement for more precise analytical measurement). In the assay example, there are no definitive specification limits representing the actual content of API acceptable to the patient, as opposed to specifications for measured results. However, examination of the API content specifications for drug product (DP) provides limits for evaluating the API method. From a patient perspective, batches which have a true content of 95% or even 93.5% would seem acceptable: the typical LSL = 95% in the European Union (EU),4 while a drug product batch will only be confidently rejected if its true mean is ≤93.5% (given that drug product content typically has an analytical % RSD of around 0.5%). Thus from a patient perspective, a bias in the API analytical method of 1.3% would seem likely to be acceptable, and possibly even one as large as 2% (corresponding to batches of ≤95% and ≤94.3%, respectively, being confidently rejected). Note the U.S. LSL is typically lower (90%).

There will always be an uncertain region around a specification limit applied to measured results. If the limit truly important to the patient is unknown, establishing how large this region can be is difficult. While the acceptability of the region of uncertainty should be evaluated against specific patient requirements, an idea of generic regions can be gained by relating them to analytical capability. For 2-sided specifications, the uncertainty region can be defined as a percentage of the specification range, i.e., uncertainty region = 100% × [(3σa)/(USL − LSL)] = 100% × [PTOL2-sided/2] (from the definition of PTOL2-sided). In the API assay example, batches within 1.2% absolute of the LSL were at risk of misclassification; relative to the specification range this is [1.2/(102 − 97.5)] × 100% = 26.7%. Using this relationship and the general requirements for PTOL, and in the absence of more specific requirements, a region of ≤5% (corresponding to a PTOL of 0.1) might be considered very good, while a region ≥15% (corresponding to a PTOL of 0.3) might be considered unacceptable. The effect of an analytical bias is then taken into consideration: if the equivalence acceptance criterion is, say, 10% of the specification width, then a new/changed method could measure values with a bias almost as high as this. Thus, if a method has a PTOL of 0.3, the true value has to be 25% (15% + 10%) outside the specification limit for there to be strong confidence that the batch will be rejected. This comparison to the specification range does not directly translate to 1-sided specifications, though for those with just a USL, the percentage above the USL may be considered.
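The region-of-uncertainty arithmetic above amounts to a few lines of code; the sketch below uses the case-study inputs, with the candidate upward biases taken from the discussion.

```python
# Sketch of the region-of-uncertainty calculation: the +/-3*sigma_a band
# around the LSL, widened by a candidate upward bias, and the region
# expressed as a percentage of the 2-sided specification range.
sigma_a, lsl, usl = 0.40, 97.5, 102.0

region = 3 * sigma_a  # 1.2% absolute
for bias in (0.0, 0.5, 1.3, 2.0):  # candidate upward biases (%)
    threshold = lsl - region - bias
    print(f"bias +{bias:.1f}%: true content <= {threshold:.1f}% is "
          f"confidently measured below the LSL")

pct_of_range = 100 * region / (usl - lsl)
print(f"region of uncertainty = {pct_of_range:.1f}% of the specification range")  # 26.7%
```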
Probability of Erroneous Acceptance Approach. Another approach,11 which uses the analytical variation but provides a different viewpoint, examines a batch value a certain distance from the specification and calculates the probability of it being erroneously accepted; the effect of biases in the analytical method on this probability is then assessed. Deciding which batch values are important to classify correctly is key to assessing patient risk. In the first example discussed by Kringle,11 the release testing procedure for drug product had a specification of 95-105%, and Kringle examined the effect of biases on "marginal batches" of 90 and 92%. In the second example, the drug substance specification was 98-101.5%, and batches of 98%, 97%, and low batches in the range 90-95% were considered. It is not clear how Kringle chose the batch values to be evaluated. In the API assay case study, the effect of a bias on drug substance batches with true means of 95% (representing the tightest drug product specification, see above) and 96.5% (i.e., 1% below specification, akin to Kringle's evaluation of the 97% drug substance batch) was examined (see Figure 8). These results show that for a true batch of 95% a method bias of 1-1.5% can be tolerated, but for a batch of 96.5%, even for a relatively small analytical bias (0.5%), the probability of misclassification is quite high (0.1).

Figure 8. Effect of upward analytical bias on erroneous acceptance.
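The erroneous-acceptance probabilities behind Figure 8 are normal-tail calculations. The sketch below assumes, as the text implies, that a single measured value is compared directly with the LSL.

```python
# Sketch of the erroneous-acceptance calculation: the probability that a
# truly out-of-specification batch is measured at or above the LSL, for a
# given analytical SD and upward method bias.
from scipy.stats import norm

def p_erroneous_accept(true_mean, bias, sigma_a, lsl):
    # measured value ~ Normal(true_mean + bias, sigma_a); accepted if >= LSL
    return norm.sf(lsl, loc=true_mean + bias, scale=sigma_a)

sigma_a, lsl = 0.40, 97.5
for true_mean in (95.0, 96.5):
    for bias in (0.0, 0.5, 1.0, 1.5):
        p = p_erroneous_accept(true_mean, bias, sigma_a, lsl)
        print(f"true mean {true_mean}%, bias +{bias}%: P(accepted) = {p:.3f}")
# e.g., a true mean of 96.5% with a +0.5% bias gives P ~ 0.11, the
# "quite high (0.1)" value noted in the text.
```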
Figure 9. Tolerance interval approach.
ASSESSING PRODUCER'S RISK

While patient risk concentrates on how far a single batch might be outside the specification before it is reliably classified as out of specification, for a producer a single batch is less problematic; of more interest is the proportion of batches which might be erroneously misclassified. This requires knowledge of the process mean and variation. Schwenke9 suggests that production batch data can be used to quantify the expected variation and that limits can be estimated that define the range of acceptable batches, which in turn can be used to define the acceptance criteria; however, specific details are not given.

Tolerance Interval Approach. U.S. Pharmacopeia (USP)12 General Chapter <1010> suggests an approach which calculates a tolerance interval for the process batch data. A maximum equivalence acceptance criterion is given by the gap between the tolerance interval and the nearest specification limit (Figure 9). This approach can only be applied if the tolerance interval is sufficiently contained within the specification limits. A decision is required from the producer as to what proportion of batches (p) the tolerance interval should contain and what confidence (c) is required. The USP mentions using p = 95% and c = 95% but also indicates that some laboratories use p and c = 99%. For the case study, a tolerance interval containing 99% of batches with 95% confidence, based on the estimated process mean and variance and on 16 batches, is 97.75% to 100.76% (note this uses the typical calculation, in which the mean and variance are estimated from the same data, and will be slightly too narrow because the mean is actually estimated from only 5 batches). This would give an equivalence acceptance criterion of 0.25% to ensure that a method with this bias still classifies at least 99% of the production batches as within specification.
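A tolerance interval of this kind can be computed with standard normal tolerance factors. The sketch below uses Howe's approximation for the two-sided k factor; exact factors differ slightly between packages, and the case-study interval also reflects the paper's own handling of the mean and variance estimates, so the numbers will not agree to the last digit.

```python
# Sketch of a two-sided normal tolerance interval (coverage p, confidence c)
# using Howe's approximation for k. Illustrative only; it will not exactly
# reproduce the interval quoted in the case study.
import math
from scipy.stats import norm, chi2

def tolerance_interval(mean, sd, n, p=0.99, c=0.95):
    z = norm.ppf((1 + p) / 2)
    k = z * math.sqrt((n - 1) * (1 + 1 / n) / chi2.ppf(1 - c, n - 1))
    return mean - k * sd, mean + k * sd

low, high = tolerance_interval(mean=99.26, sd=0.51, n=16)
print(f"99%/95% tolerance interval: ({low:.2f}, {high:.2f})")
# The maximum equivalence acceptance criterion is then the gap between the
# interval and the nearest specification limit, e.g., low - 97.5.
```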
Batches Outside or Misclassified Outside of Specification Approach. An alternative approach is required when process batch data are close to or overlap the specification limits (though clearly this is inconsistent with a successful implementation of quality by design13). Here the effect of a method bias on batches misclassified outside, or measured outside, specification is examined. If the proportion of batches measured outside the specification limits with the original method is not minimal (Figure 10a), the proportion misclassified outside specification should be calculated (a calculation involving numerical integration). If minimal, the percentage of batches predicted to be measured outside a specification limit given a certain analytical bias can be used instead (easy to calculate assuming a normal distribution). The effect of various biases can then be examined (Figure 10b).

Figure 10. (a) Batches misclassified outside specifications and (b) effect of bias on batches misclassified/measured outside specifications.

Applying this approach to the case study, the proportion of measured values below the LSL was estimated using the probability calculator in the Statistica software14 (Figure 11). The proportion of future batches predicted to be measured below the LSL is very small, 0.000252 (0.0252%), and thus the effect of biases on measured batch values could be used. Biases of -0.5%, -1%, and -1.5% (the literature suggests values as large as 2%) were examined (Figure 12). From Figure 12 it can be seen that an equivalence acceptance criterion of ±1.5% might result in a method with 30% of batches measured below the LSL. With a tighter equivalence acceptance criterion of ±0.5%, this figure is 0.64% (from 0.025% with no bias).

Figure 11. Probability calculator in Statistica.

Figure 12. Effect of analytical bias on measured batch values.
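The percentages quoted above follow from a one-line normal calculation, sketched below with the case-study parameters; small rounding differences from the quoted figures are to be expected.

```python
# Sketch of the batches-measured-outside-specification calculation: future
# measured values are modelled as Normal(process mean + bias, total SD) and
# the fraction below the LSL is read from the normal CDF.
from scipy.stats import norm

process_mean, total_sd, lsl = 99.26, 0.51, 97.5
for bias in (0.0, -0.5, -1.0, -1.5):
    frac = norm.cdf(lsl, loc=process_mean + bias, scale=total_sd)
    print(f"bias {bias:+.1f}%: {100 * frac:.3f}% of batches measured below the LSL")
```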
SETTING ACCEPTANCE CRITERIA: API ASSAY CASE STUDY CONCLUSION

Values suggested in the literature range from 0.5% to 2%, which in the context of a typical specification range of 98-102% represents rather diverse practice. Chambers et al.3 indicate that many companies use widely held conventions, with typical acceptance criteria being ±2% for assay and content uniformity and ±5% for dissolution (an "orientational" value of ±2% is also given for drug substance equivalence by Ermer4). For a technology transfer study of common design with a simple comparison of means against acceptance criteria, Ermer4 gives acceptance limits for the difference between laboratory means of 1.15%. Kringle11 uses acceptance criteria of -1% to +0.5% in one example. In these examples it is not clear whether they apply solely to a situation considered very low risk, e.g., technology transfer or minimal modification to the method, where a comparability approach might be appropriate, or to a larger method change where formal demonstration of equivalence of accuracy is required.

From the evaluation of patient risk, equivalence acceptance criteria of ±1.3% would seem justifiable for the case study based on the EU requirements for drug product. However, after consideration of producer risk, tighter acceptance criteria of ±0.5% seem more appropriate (note that a bias of 0.6% would result in more than 1% of batches being measured outside the specification limit). Given that this case study is fairly representative of drug substance assay methods, this suggests that the commonly quoted acceptance criterion of ±2% is far too wide and that even the tighter acceptance criteria quoted in the literature, of around ±1%, fail to ensure adequate change control. The authors believe this is because the approaches used often look at what can be achieved given the variability of the method and a typical design/sample size, without discussion of what level of equivalence must be achieved. Alternatively, values may be suggested based on historical precedence or commonly accepted conventions from other applications without an apparent scientific justification.
DISCUSSION

As discussed, there is no exact approach to the setting of equivalency acceptance criteria. A clear understanding and appraisal of the control requirements of the method is first required; the principles outlined in this perspective can then be applied to ensure both patient and producer risks are taken into consideration.

With the advent and evolution of quality by design,13 the purpose of analytical methods is undergoing a change characterized by a reduced focus on controlling quality via release testing and a corresponding move toward a control strategy that ensures process requirements are met based on input and in-process controls. In these circumstances, analytical methods would still be required, for a set period or longer, to verify that the process controls identified are solely responsible for the product quality.

The approaches and case study discussed above address the situation where data are compared against criteria derived from the first method, e.g., assay or impurity specification limits. Assay methods may have other control requirements, such as trending of data, stability assessment, or assessment of content uniformity. While an equivalence assessment should still focus on equivalence of results, the control requirements and the decisions being made on the basis of those results may lead to different acceptance criteria. An example given by Kringle et al.11 evaluated the effect of a bias on the stability shelf life in the case where the method has been transferred from one location to another. The example indicates that biases of around 5% could be accepted for zero true loss over 12 months; an assessment of various amounts of expected degradation would be needed to address patient risk. Note it does not address the situation where a method is changed during the conduct of stability studies. The control requirements for assay methods used to assess content uniformity are likely to differ from those for a release method. If the same method is to be used for API release and content uniformity, it may be that satisfying equivalence for release requirements results in those for content uniformity being satisfied, but both need consideration. Note that, as for release testing, the focus should be on the measured results and not on establishing whether a new and an old method agree on how many tablets are in or out of specification. A "decision equivalent approach"15 would provide a much less powerful assessment of equivalence and should only be used in situations where methods provide only a pass/fail result (e.g., an identity or limit test).

This perspective concentrates on acceptance criteria for mean equivalence, since validation currently treats accuracy and precision separately, and in terms of equivalence, accuracy is usually most important. However, Hoffman et al.16 advocate a total error approach for validation which combines accuracy and precision. The total error approach reflects how large a measurement error can be; for example, a method which has greater precision can tolerate a larger bias in accuracy. There will be challenges in adopting this approach. However, should it be applied for validation, then equivalency based on the risk to batch classification is likely to require a similar approach. Note that for data trending the focus may be on mean levels, in which case accuracy will still remain of prime importance for equivalency assessment.

CONCLUSIONS

The establishment of acceptance criteria is a necessary prerequisite to deciding the design/sample size of an equivalency study and requires careful attention. By taking into account the aspects relating to both the patient risk and the producer risk, an analyst will have a clearer understanding of how to set appropriate acceptance criteria for equivalence, and thus the assessment will be a meaningful evaluation of whether methods are equivalent. In this perspective, an assessment of patient and producer risk has been performed for an API assay method, and acceptance criteria of ±0.5% have been justified. The authors believe this is more appropriate than the generic limit of ±2.0% that has been used in the past.
Once appropriate criteria have been set, they can be used to establish what sample size (in conjunction with the complexity of design) is required to perform the equivalency study.17 Specific method performance criteria, rather than generic validation requirements, will greatly assist in this task.
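As a rough illustration of how the acceptance criterion, the method variability, and an assumed true difference drive the study size, the sketch below applies a standard normal-approximation sample size formula for TOST; it is a planning approximation only, not the design calculation of ref 17.

```python
# Sketch of an approximate TOST sample size for a one-sample (paired)
# design: equivalence margin theta, per-determination SD sigma, assumed
# true difference delta, one-sided alpha, and target power.
import math
from scipy.stats import norm

def tost_sample_size(sigma, theta, delta=0.0, alpha=0.05, power=0.8):
    beta = 1 - power
    # For delta = 0 the usual approximation splits beta across both tests.
    z_beta = norm.ppf(1 - beta / 2) if delta == 0 else norm.ppf(1 - beta)
    n = (sigma * (norm.ppf(1 - alpha) + z_beta) / (theta - abs(delta))) ** 2
    return math.ceil(n)

print(tost_sample_size(sigma=0.40, theta=0.5))             # ~6 determinations
print(tost_sample_size(sigma=0.40, theta=0.5, delta=0.2))  # ~11 determinations
```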
ACKNOWLEDGMENT

The authors would like to thank James Coggins for his assistance in the production of figures.

Received for review August 28, 2009. Accepted October 19, 2009.

REFERENCES

(1) Schuirmann, D. J. J. Pharmacokin. Biopharm. 1987, 15, 657–680.
(2) Limentani, G. B.; Ringo, M. C.; Ye, F.; Bergquist, M. L.; McSorley, E. O. Anal. Chem. 2005, 77, 221A–226A.
(3) Chambers, D.; Kelly, G.; Limentani, G.; Lister, A.; Lung, K. R.; Warner, E. Pharm. Technol. 2005, 29, 64–80.
(4) Ermer, J.; Miller, J. Method Validation in Pharmaceutical Analysis, 1st ed.; Wiley-VCH: Weinheim, Germany, 2005; p 16, p 274, pp 293–296.
(5) Hofer, J. D.; Olsen, B. A.; Rickard, E. C. J. Pharm. Biomed. Anal. 2007, 44, 906–913.
(6) Borman, P.; Chatfield, M.; Nethercote, P.; Thompson, D.; Truman, K. Pharm. Technol. 2007, 31, 142–152.
(7) Hauck, W. W.; Abernethy, D. R.; Williams, R. L. J. Pharm. Biomed. Anal. 2008, 48, 1042–1045.
(8) de Fontenay, G. J. Pharm. Biomed. Anal. 2008, 46, 104–112.
(9) Schwenke, J. R.; O'Connor, D. K. J. Biopharm. Stat. 2008, 18, 1013–1033.
(10) Majeske, K. D.; Hammett, P. C. J. Manuf. Processes 2003, 5, 54–65.
(11) Kringle, R.; Khan-Malek, R.; Snikeris, F.; Munden, P.; Agut, C.; Bauer, M. Drug Inf. J. 2001, 35, 1271–1288.
(12) U.S. Pharmacopeia General Chapter <1010>. Pharmacopeial Forum 2004, 30, 236–263.
(13) ICH. ICH-Topic Q8(R1): Pharmaceutical Development, 2008.
(14) Statistica software, www.statsoft.com.
(15) Hauck, W. W.; DeStefano, A. J.; Cecil, T. L.; Abernethy, D. R.; Koch, W. F.; Williams, R. L. Pharmacopeial Forum 2009, 35, 772–778.
(16) Hoffman, D.; Kringle, R. Pharm. Res. 2007, 24, 1157–1164.
(17) Borman, P. J.; Chatfield, M. J.; Damjanov, I.; Jackson, P. Anal. Chem., DOI: 10.1021/ac901945f.