Anal. Chem. 1985, 57, 1951-1955
Decision Tree for Chemical Detection Applications
William P. Ashman, James H. Lewis, and Edward J. Poziomek*
Research Directorate, Chemical Research and Development Center, Aberdeen Proving Ground, Maryland 21010
A chemometrics analysis is performed that identifies the structural and physicochemical descriptors related to the solid-state fluorescence enhancement of 2-(diphenylacetyl)-1,3-indandione 1-(p-(dimethylamino)benzaldazine), II, in the presence of a variety of compounds (i.e., fungicides, pesticides, amino acids, aliphatics). A marked fluorescence enhancement is associated with the ability of a compound that has a minimum absolute length of 7.2 Å and appropriate π-π type charge transfer or lipophilic groups to form a molecular complex with II. A structure-fluorescence enhancement (decision tree) model for predicting compound/II fluorescence activity is defined. Recommendations are made for the design of indandione derivatives that can detect a specific compound.
Table I. Compound Structure/Physicochemical Descriptors Analyzed

Feature number    Feature descriptor
X1                Type of hydrogen bond *
X2                Potential hydrogen bond
X3                Charge transfer group (-C=C-)
X4                Number of atoms between a hydrogen bond group
X5                Phenyl ring
X6                Length of compound **
X7                Substitution on benzene (I, Cl, Br, F)
X8                Multiple substitution on benzene (I, Cl, Br, F)
X9                N, O, S directly attached to benzene or large planar ring structure ***
X10               Planar ring system geometrically > benzene
X11 to X18        [Assignment to individual feature numbers is not recoverable from the source. Legible descriptors: steric hindrance to OH or (illegible) group of compound; pyridyl ring; phosphorus; heterocyclic ring system having C=C; structural groups -N=N-, -C=C-, -C=C-OR, C=N-; remaining entries illegible.]
There is considerable interest in finding reagents and interactions that are active in the solid state for detecting chemicals at low concentrations. Such reagents could be used in detection devices such as detector tubes, personal dosimeters, or solid-state coatings for various types of microsensors. Poziomek, Crabtree, and co-workers (1-3) reported on a series of 2-(diphenylacetyl)-1,3-indandione 1-amine, I (Figure 1), derivatives that could be used as solid-state reagents in a simple, specific, and direct test (Figure 2) for the detection of various chemicals. When the indandione associates with certain molecules, a brilliant fluorescence is visible under ultraviolet light. That the fluorescence is due to a weak association of molecules rather than to a very fast chemical reaction was substantiated by chromatography of the fluorescent mixture. Screening studies (1-3) indicated that each indandione reagent has a distinctive fluorescence activity profile. Additionally, the screening results indicated that the detection response varied with the indandione derivative and surface used (2). This paper gives the results of a chemometrics analysis (discriminant and regression analysis) of the fluorescence enhancement activity of compounds interacting with 2-(diphenylacetyl)-1,3-indandione 1-(p-(dimethylamino)benzaldazine), II (Figure 1). The objectives are to provide structure-activity relationships for predicting fluorescence enhancement of the indandione derivative and to provide guidelines for future research efforts in indandione solid-state detection technology.
Table I (continued)

Feature number    Feature descriptor
X19               Terminal -C(O)OH, -P(O)OH
X20               Absolute length of compound ****

* 0 = none; 1 = terminal group (OH, NH2, and further groups that are illegible in the source); 2 = nonterminal group (groups illegible in the source).
** Measured by counting the number of atoms that form the longest chain in a three-dimensional structural model of the compound.
*** OCH3 and OH not included.
**** Measured by longest length in angstrom units of a three-dimensional Dreiding model of the compound.

EXPERIMENTAL SECTION
Materials and Assay. Methods of synthesis of the indandione derivatives, the ultraviolet light detection assay procedures, and a listing of some of the compounds tested have been reported (1-5). Approximately 750 compounds were tested for fluorescence enhancement with II. The compound/II fluorescence response was qualitatively categorized into one of eight intensity groups (none, very weak, weak, weak-medium, medium, medium-strong, strong, and absorbed). A subset of 206 compounds was randomly selected and used in the structure-activity analysis. The types of chemicals tested included pesticides, amino acids, alcohols, rodenticides, and hallucinogens.

Descriptor Identification. A literature search (6, 7) and consultations were conducted to identify compound and indandione parameters and energy activation and deactivation processes that could cause either a quenching or an enhancement of fluorescence activity in the compound/II assay. The objective was to identify which, if any, of the parameters or processes correlated with the resultant fluorescence enhancement. The information obtained was grouped into two descriptor categories:

(1) Structural/Physicochemical (Table I). This descriptor category is defined by parameters that identify the presence or absence of a specific structural component or physicochemical functional group in the compound (i.e., Table I: X3, X12, X14) or by a measurement of geometric distances between specific atoms in the compound (i.e., Table I: X6, X20). The rationale for the measurements was to correlate the fluorescence activity due to the geometric position of a compound's functional groups with the geometric position of specific functional groups located on II. The measurements were made either by constructing a Dreiding stereomodel of the compound, configuring the model to obtain a maximum length between atoms, and measuring this length in angstrom units or simply by counting the number of atoms linked between specific groups in the compound's structure.

(2) Compound/II Interaction (Figure 3). The two energy deactivation processes of internal and external conversion (6) may be involved in the resultant fluorescence activity observed. Therefore, in order to mimic the possible compound/II interactions that would affect these processes, Dreiding stereomodels of II and of its enol tautomer (Figure 3) were constructed. First, these models were evaluated to identify functional groups and regions of II where a compound could interact. The group and region types identified included charge transfer areas, areas for hydrogen bonding, and regions that could be defined as lipophilic.
Next, because II has rotatable bonds in its diphenyl and p-aminobenzaldazine areas, different three-dimensional configurations were constructed for study.
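The two geometric descriptors described under Descriptor Identification (Table I: X6, the atom count of the longest chain, and X20, the absolute length in angstroms) can be illustrated with a minimal sketch. It is not the authors' procedure: they measured physical Dreiding stereomodels by hand, whereas this sketch assumes that atomic coordinates for one fixed conformation and a bond list are already available. The function names `absolute_length` and `longest_chain_atoms` are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def absolute_length(coords: Sequence[Point]) -> float:
    """Largest distance between any two atoms, in the units of `coords`.

    Assumption: stands in for the X20 measurement, which the paper took
    from a Dreiding model configured for maximum length.
    """
    if len(coords) < 2:
        return 0.0
    return max(math.dist(a, b) for a, b in combinations(coords, 2))


def longest_chain_atoms(bonds: Dict[int, List[int]]) -> int:
    """Number of atoms on the longest simple path in the bond graph.

    Assumption: stands in for the X6 count. Brute-force search, adequate
    only for small molecules.
    """
    best = 0

    def walk(atom: int, seen: frozenset) -> None:
        nonlocal best
        best = max(best, len(seen))
        for neighbor in bonds.get(atom, []):
            if neighbor not in seen:
                walk(neighbor, seen | {neighbor})

    for start in bonds:
        walk(start, frozenset({start}))
    return best
```

Under these assumptions, the value returned by `absolute_length` is what would be compared with the 7.2 Å minimum length quoted in the abstract.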
This article not subject to U.S. Copyright. Published 1985 by the American Chemical Society
ANALYTICAL CHEMISTRY, VOL. 57, NO. 9, AUGUST 1985
[Figure 1: structural drawings of (A) I and (B) II; not recoverable from the source]
Figure 1. (A) I and (B) II.
[Figure 2 scheme, labels only: molecular association of a detector reagent with a chemical compound at room temperature; irradiation (hν) gives the detection signal]
with several molecules of II or compound to cause the enhanced fluorescence. Therefore, the possible stoichiometry of the interaction was identified and used as a variable. Finally, the above procedure was repeated for the various configurations of II. Each was evaluated to identify the configuration that gave optimum discrimination and to establish the rules for defining compound/II interaction.

Figure 3 illustrates various regions and stoichiometry by which a compound could interact or associate with II. Five regions (A through E) were selected for analysis. The configuration selected for final analysis was an enol tautomer (Figure 3) that was essentially planar, with an equal distance of 7.2 Å from the center of the phenyl ring of region E to the center of the phenyl ring of region A closest to region E and to the center of the six-membered ring of region C. This enol tautomer was selected because it had additional functional groups that could be involved in interactions of the compound with II. The rules that were used to define the regions and the compound/II interaction or association were the following:
1. Region A is the diphenyl group region. In order for a compound to associate in this area, it must have a lipophilic group with a minimum of a three-carbon chain or a -C=C- group in its structure. If a -C(O)O- group is present, there must be at least two methylene groups attached (e.g., -C-C-C(O)O-). The compound is positioned on region A by superimposing its lipophilic or -C=C- group on the phenyl ring of A positioned closest to region D.
2. Region B is defined by the
[Figure 2 output label: color or fluorescence]
Figure 2. Chemical compound/indandione molecular association effect that produces fluorescence.
area. A compound associates with II in this area if it contains a functional group (e.g., -C(O), -P(O), -OH, -S(O)) that has a potential for hydrogen bonding with the -OH or -C(O) of region B.
3. Region C consists of the OH group. The same rules for association are used as for region A.
4. Region D consists of the -C(O)CC=NN=C- area. A compound interacting at this area must be planar (e.g., phenyl or naphthalene ring). Nonplanar groups are considered structurally hindered from interacting at this region. A compound/II charge transfer interaction may result.
5. Region E consists of the -N(CH3)2
[Figure 3 panel labels: detector (dimer); detector-chemical aggregation]
Figure 3. Potential compound/II interactions.
Theoretically, rigidity in the structure of II would decrease the energy loss due to internal conversion and thereby increase the potential for fluorescence enhancement. Therefore, the models of II were positioned so that the phenyl rings, the indandione portion, and the p-aminobenzaldazine region were planar. Second, in order to define the parameters for this descriptor category, a Dreiding stereomodel of each compound used in the analysis was constructed. This stereomodel of the compound was superimposed on the stereomodel of II, and the region or regions of possible interaction or association were identified. Regions of the compound and II having potential for hydrogen bonding were superimposed onto each other. Similarly, the charge transfer or lipophilic regions of the compound were positioned on corresponding areas of II. Third, areas of steric hindrance and/or lipophobic areas that would interfere with the association or interaction of the compound with II were identified during this procedure. Additionally, more than one molecule of II or of the compound could associate
group. The same rules as for region A are used.
6. It is possible for a compound to associate simultaneously with more than one region of II. A compound is defined as interacting with both regions A and D simultaneously if it first is able to associate with A and if, after being positioned on A, its remaining structural components that are not on A can overlap region D. If the components that overlap D contain a -C=C- or a minimum of three connected methylene groups, the compound is defined as associating with A plus D.
7. A compound interacts with region A plus E (region C plus E) if, after associating with region A or C, it has sufficient length to overlap E. (Because regions A and C are equidistant from region E, and because the same rules govern compound/II association with A or C, a compound associating with regions A and E must also be able to associate with regions C and E. Therefore, in the analysis, it does not matter which one is tested. However, since the actual three-dimensional configuration of II was not known, both possible interactions were noted, although they are equivalent.) If the compound components that overlap E consist of a -C=C- or -C=N- that can interact with E by a potential charge transfer or consist of a straight-chain methylene group having a minimum of three carbons, the compound is categorized as
[Figure 5 node labels that appear at this point in the source: maximum length; some FL; no FL; potential FL; remaining labels illegible]

[Figure 4 content recovered from the source]

Number of compounds
Compound structure                        Training (TRNG) set    Test set
Acceptable (ACC) fluorescence (FL)         34                     17
Unacceptable (UNACC) fluorescence (FL)    120                     35
Total                                     154                     52

Classification matrix (jackknife technique)
                            Number of compounds classified into group
Group         % correct     UNACC TRNG     ACC TRNG
UNACC TRNG    90.0          108            12
ACC TRNG      88.2            4            30
UNACC TEST    91.4           32             3
ACC TEST      94.1            1            16
Total         90.3          145            61

Discrimination features identified
1. Potential compound complex to DIPAIN area (A+E) or (C+E)
2. Potential compound complex to DIPAIN area (A+D)
3. Presence of terminal -C(O)OH or -P(O)OH group

Figure 4. Example of compound/II discrimination analysis.
interacting with regions A plus E (C plus E). If the components of the compound that overlap E contain a -C(O)OH, -P(O)OH, -C(O)O-, or -S(O)O-, it is categorized as not interacting with A plus E (C plus E).

Chemometrics Analysis. Each of the 206 chemicals was characterized, and each descriptor type was encoded as a numerical value (1 or 0) for the presence (1) or absence (0) of the descriptor in the compound. If a measurement was defined as the descriptor, the numerical value of the measurement was used. A data matrix was then formed that consisted of the chemical identification, the descriptor values, and the fluorescence intensity. This data matrix was then analyzed by multiple regression techniques. Additionally, the 206-compound data set was separated into two classes defined by the compound's fluorescence intensity. These classes were unacceptable fluorescence (none to medium fluorescence enhancement, and absorbed) and acceptable fluorescence (medium-strong and strong). Computer-aided discriminant analyses (8-10) were made using the BMDP7M program (8), which has the option to classify each compound using the Lachenbruch (9) or "jackknife" procedure.
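The encoding and two-class split described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, descriptors are passed as a dictionary keyed by the Table I feature labels, and the leave-one-out loop uses a simple nearest-centroid rule as a stand-in. It is not the stepwise discriminant analysis of the BMDP program, which is not reproduced here.

```python
from typing import Dict, List, Sequence

# The eight qualitative intensity groups named in the text.
INTENSITY_GROUPS = (
    "none", "very weak", "weak", "weak-medium",
    "medium", "medium-strong", "strong", "absorbed",
)
# Per the text: medium-strong and strong are acceptable; all others are not.
ACCEPTABLE = {"medium-strong", "strong"}


def fluorescence_class(intensity: str) -> str:
    """Map a qualitative intensity group to ACC or UNACC."""
    if intensity not in INTENSITY_GROUPS:
        raise ValueError(f"unknown intensity group: {intensity!r}")
    return "ACC" if intensity in ACCEPTABLE else "UNACC"


def encode_row(descriptors: Dict[str, object],
               feature_order: Sequence[str]) -> List[float]:
    """Presence/absence becomes 1/0; measurements are kept as numbers."""
    row = []
    for name in feature_order:
        value = descriptors.get(name, False)  # assumption: missing = absent
        if isinstance(value, bool):
            row.append(1.0 if value else 0.0)
        else:
            row.append(float(value))
    return row


def _centroid(rows: List[List[float]]) -> List[float]:
    return [sum(column) / len(rows) for column in zip(*rows)]


def leave_one_out_percent_correct(rows: List[List[float]],
                                  labels: List[str]) -> float:
    """Hold out each compound in turn, refit class centroids on the rest,
    and classify the held-out compound by its nearest centroid
    (assumption: a stand-in classifier, not the BMDP discriminant)."""
    correct = 0
    for i, held_out in enumerate(rows):
        by_class: Dict[str, List[List[float]]] = {}
        for j, (row, label) in enumerate(zip(rows, labels)):
            if j != i:
                by_class.setdefault(label, []).append(row)
        centroids = {label: _centroid(rs) for label, rs in by_class.items()}
        predicted = min(
            sorted(centroids),
            key=lambda lab: sum((x - y) ** 2
                                for x, y in zip(held_out, centroids[lab])),
        )
        correct += predicted == labels[i]
    return 100.0 * correct / len(rows)
```

Holding out one compound at a time is what makes the resulting percentage an estimate of prediction ability rather than a measure of fit to the compounds used to build the classifier.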
RESULTS AND DISCUSSION
Fluorescence Enhancement. Multiple regression analysis failed to yield a good correlation for any of the descriptors in predicting the individual fluorescence intensity of the compounds. Because the experimental procedures that defined the intensity levels were "qualitative" (simply a visual comparison), it was decided to separate the compounds into the two fluorescence classes (acceptable and unacceptable) and to analyze the data using discriminant analysis techniques. Figure 4 is an example of a result obtained by a single computer-aided stepwise discriminant analysis. The 206-compound data matrix is divided into two sets: a training set (154 compounds) that is used to evaluate and determine the discrimination values for each descriptor for its ability to separate the two fluorescence classes, and a test set (52 compounds) that is used to determine the prediction ability of the discrimination values established in the analysis of the training set. In this example, the analysis identified three descriptors as being important for separating the two classes: (1) the ability of the chemical to associate with regions A plus E (C plus E) of II simultaneously, (2) the ability of the compound to associate with regions A plus D simultaneously, and (3) the presence of a terminal carboxyl or phosphoryl group in the compound. The prediction ability of the analysis is based on the percentage of compounds that are correctly categorized into the corresponding fluorescence classes. In the test set, 32 of the 35 unacceptable fluorescence (UNACC) compounds and 16 of the 17 acceptable fluorescence (ACC) compounds are correctly classified. This is
[Figure 5 legend, symbol illegible in the source: positional fit of compound to complex with areas (A+E) or (A+D) or (C+E)]
Figure 5. Decision tree: compound/II fluorescence enhancement.
a correct classification rate of 92.3% for the test set. In order to verify the methodology and identify the descriptors most important for predicting resultant fluorescence intensity, five separate analyses using different sets of chemicals in the training and test sets were performed. The correct classification percentage rates were 92.3, 91.1, 85.7, 86.5, and 89.6%. In all cases, the descriptor that was the best in discriminating and in predicting the fluorescence enhancement was the ability of the chemical to associate simultaneously with regions A plus E (regions C plus E) of II.

Decision Tree. Figure 5 is a decision tree (11, 12) developed from the chemometrics analysis of compound/II fluorescence activity. It can be used in classifying the compound/II fluorescence activity of untested compounds. The following guidelines are suggested in using the tree. (An analysis to predict the fluorescence enhancement of DDT, illustrated in Figure 5, is used as an example to define the decision rules.) First, construct a three-dimensional Dreiding stereomodel of the compound. Orient the compound model until minimum steric hindrance is obtained. Measure the maximum absolute length that can be determined. If the length is