Article pubs.acs.org/ac

Classification of Spent Reactor Fuel for Nuclear Forensics

Andrew E. Jones,† Phillip Turner,‡ Colin Zimmerman,§ and John Y. Goulermas*,†

† Department of Electrical Engineering and Electronics, University of Liverpool, Brownlow Hill, Liverpool, L69 3GJ, U.K.
‡ AWE, Aldermaston, Reading, Berkshire RG7 4PR, U.K.
§ The UK National Nuclear Laboratory, The Central Laboratory, Sellafield, Seascale, Cumbria CA20 1PG, U.K.

ABSTRACT: In this paper we demonstrate the use of pattern recognition and machine learning techniques to determine the reactor type in which spent reactor fuel originated. This is done using the isotopic and elemental measurements of the sample, which makes the approach directly useful in nuclear forensics. Nuclear materials contain many variables (impurities and isotopes) that are very difficult to consider individually, so a method that considers all material parameters simultaneously is advantageous. Currently, the field of nuclear forensics focuses on the analysis of key material properties to determine details about a material's processing history; for example, known isotope half-lives can be used to determine when the material was last processed (Stanley, F. E. J. Anal. At. Spectrom. 2012, 27, 1821; Varga, Z.; Wallenius, M.; Mayer, K.; Keegan, E.; Millet, S. Anal. Chem. 2009, 81, 8327−8334). However, it has been demonstrated that multivariate statistical analysis of isotopic concentrations can complement these methods and can make use of a greater level of information through dimensionality reduction techniques (Robel, M.; Kristo, M. J. J. Environ. Radioact. 2008, 99, 1789−1797; Robel, M.; Kristo, M. J.; Heller, M. A. Nuclear Forensic Inferences Using Iterative Multidimensional Statistics. In Proceedings of the Institute of Nuclear Materials Management 50th Annual Meeting, Tucson, AZ, July 2009; 12 pages; Nicolaou, G. J. Environ. Radioact. 2006, 86, 313−318; Pajo, L.; Mayer, K.; Koch, L. Fresenius’ J. Anal. Chem. 2001, 371, 348−352). There has been some success in using such multidimensional statistical methods to determine details about the history of spent reactor fuel (Robel, M.; Kristo, M. J. J. Environ. Radioact. 2008, 99, 1789−1797). Here, we aim to expand on these findings by pursuing more robust dimensionality reduction techniques based on manifold embedding, which are better able to capture the intrinsic information in the data set.
Furthermore, we demonstrate the use of a number of classification algorithms to reliably determine the reactor type in which a spent fuel material has been irradiated. Several of these classification techniques are novel applications in nuclear forensics, and they extend existing knowledge in this field by producing a reliable and robust classification model. The results show that our techniques are very successful, with cross-validated error rates between 0.54% and 4.18% on the simulated spent fuel data, and further confirm the strong potential of these techniques in nuclear forensics, at least with regard to spent reactor fuel.

Nuclear smuggling is a problem that needs to be tackled on a global scale. The International Atomic Energy Agency (IAEA) has reported a sharp increase in the smuggling of nuclear materials in recent years, with 2 331 reported cases of radioactive materials found outside regulatory control as of 2012.7 Nuclear forensics has emerged as a strategy for determining the source and intended use of these materials when they have been intercepted.8 Various material properties can be measured for a particular sample, and these can be used to extract forensic information from radioactive materials and to answer questions about the source and originating location of unknown materials. A number of material parameters can be used to differentiate spent fuel compositions; notably, these include the isotopic signatures of plutonium and uranium, nuclear decay products, anionic and metallic impurities, and the morphology of the material itself.9 Measuring each of these characteristics allows us to infer details about the history of the material and the purpose for which it may have been held.10 This relies on the understanding that different reactor types use different uranium enrichments and have different neutron energy spectra, resulting in significantly different isotopic compositions in the resultant materials. Low-level trace elements in the metal composition may also remain from the initial ore or carry traces of material from the fission process, providing further evidence for source determination. Many kinds of radioactive material may be involved in smuggling, but in this work we focus on materials related to civil nuclear reactors. Within the reactor context, a number of different materials and intermediates can be found at different points in the nuclear fuel cycle.11

© 2014 American Chemical Society

Received: February 3, 2014 Accepted: May 8, 2014 Published: May 8, 2014

dx.doi.org/10.1021/ac5004757 | Anal. Chem. 2014, 86, 5399−5405

Analytical Chemistry

Article

Table 1. Breakdown of the Samples According to Different Reactor Types and the U-235 Enrichment Levels, Cooling Periods, and Burnup Rates for Each of These Groupings

| reactor type | no. of samples | U-235 enrichment (% U-235) | cooling periods | burnup (MWd/te) |
|---|---|---|---|---|
| Advanced Gas Reactor (AGR) | 490 | 1−4ᵃ | 1, 5, 10, 15, 20, 30, 40 years | 1 000−20 000ᵇ |
| Boiling Water Reactor (BWR) | 770 | 2−4ᵃ | 1, 5, 10, 15, 20, 30, 40 years | 5 000−55 000ᵇ |
| Canada Deuterium Uranium (CANDU) | 138 | 0.711, 1.2 | 0, 1, 2, 10, 15, 20, 30, 40 years | 5 000−30 000ᵇ |
| Magnesium Non-Oxidizing (Magnox) | 1360 | 0.711 | 0, 16, 90, 115, 280, 365 days; 2, 3, 4, 5, 10, 15, 20, 30, 40, 50, 60 years | 500, 1 000, 2 000, 3 000, 4 000, 5 000, 6 000, 7 000, 8 000, 9 000, 10 000, 11 000, 12 000, 13 000, 14 000, 15 000 |
| Pressurized Water Reactor (PWR) | 924 | 2−4.5ᵃ | 1, 5, 10, 15, 20, 30, 40 years | 5 000−55 000ᵇ |

ᵃ Enrichment levels rise in increments of 0.5% U-235. ᵇ Burnup levels rise in increments of 5 000 MWd/te.
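As a quick arithmetic check of Table 1, the per-class counts sum to the n = 3 682 samples used throughout, and footnote a's 0.5% steps give the seven AGR enrichment levels discussed in the Visualization Assessment. The sketch below also computes class priors p(ωj) for eq 2 as empirical class frequencies; this is an assumption for illustration, since the paper does not state which priors were used. The helper `enrichment_levels` is a hypothetical name.

```python
# Per-class sample counts copied from Table 1.
SAMPLES = {"AGR": 490, "BWR": 770, "CANDU": 138, "Magnox": 1360, "PWR": 924}


def enrichment_levels(lo, hi, step=0.5):
    """Expand an enrichment range such as 1-4 % (footnote a: 0.5 % steps) into explicit levels."""
    n_steps = round((hi - lo) / step)
    return [lo + i * step for i in range(n_steps + 1)]


n_total = sum(SAMPLES.values())
# Assumption: priors p(w_j) taken as the empirical class frequencies.
priors = {reactor: count / n_total for reactor, count in SAMPLES.items()}

if __name__ == "__main__":
    print(n_total)                        # 3682
    print(enrichment_levels(1.0, 4.0))    # AGR: [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
    for reactor, p in priors.items():
        print(f"{reactor:7s} {p:.3f}")
```

By the same rule, the PWR range of 2−4.5% gives six enrichment levels and the BWR range of 2−4% gives five. Magnox is the largest class at roughly 37% of the data.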



Nuclear materials to be used as fuel for nuclear reactors go through a number of processes throughout their lifetime, commonly referred to as the nuclear fuel cycle, and all reactors typically follow this processing route. Civil reactors typically require enrichment anywhere between the naturally occurring 0.72% and 5% U-235, and the data sets in this work include samples that represent a good distribution within these enrichment levels. Once the fuel has been irradiated in the reactor, the spent fuel can be put into long-term storage with all the necessary cooling and shielding to keep it in a stable state. In some cases it can instead be reprocessed, a technique whereby the small amounts of unused uranium and plutonium that remain in spent fuel are extracted for reuse.11 This work focuses specifically on spent fuel, the byproduct that remains once fuel rods have been utilized in the reactor. There are presently 434 reactors12 in operation worldwide, producing significant amounts of spent fuel.11 A number of different reactor types are in operation, and a number of experimental designs have been used in the past. The most significant designs can be grouped into five main categories:11 Boiling Water Reactors (BWR), Heavy Water Reactors (HWR), Magnesium Non-Oxidizing (Magnox), Advanced Gas Cooled Reactors (AGR), and Pressurized Water Reactors (PWR). Of these, PWR and BWR are by far the most numerous in operation. The Magnox and AGR systems operate only in the U.K. HWR systems are usually CANada Deuterium Uranium (CANDU) reactors. The principal objective of this study is to determine from which of these reactor types a particular material sample originated by analyzing the elemental/isotopic composition of the spent fuel. We make use of two types of analytical techniques from the machine learning and pattern recognition fields.
One is aimed at visualization of the data set and is used for exploratory analysis of the characteristics of the different sample distributions. The other is aimed at automated classification, in order to estimate the reactor type from which a spent fuel sample most likely originated. While visualization work has been carried out in the field of nuclear forensics,3,13,14 this is the first time an in-depth investigation has used pattern recognition techniques for the successful classification and characterization of these materials.

FISPIN DATA SET
The FISsion Product Inventory (FISPIN) depletion code has been used to generate the expected composition of spent reactor fuel for the five reactor types represented in this paper. The FISPIN code was designed to determine the nuclide inventory of irradiated nuclear fuel and can estimate the composition of a fuel package after irradiation in a reactor for different initial fuel compositions, irradiation levels, ratings, cooling periods, and uranium enrichment levels. This is the first time that data generated using the FISPIN code have been studied using multivariate analysis for forensic applications; however, other studies have used the similar ORIGEN-ARP package to generate comparable data sets.3 The FISPIN and ORIGEN codes have been validated15 and have been shown to be largely in agreement with one another. The FISPIN calculations can be seen as representing a good average composition for fuel rods that have been used in these reactors. The FISPIN package uses a WIMS (Winfrith Improved Multigroup Scheme) 2D reactor model with JEF (Joint Evaluated File) 2.2 nuclear data to calculate the representative data set for our investigation. Each reactor design has been calculated using an individual WIMS model designed to capture the characteristics of that particular reactor. The data set does not represent the variance that might be expected within a fuel rod due to, for example, the migration of nuclides within the rod; it does, however, represent an accurate average expected composition. This data set should therefore be a good basis for designing a classifier to assess the discriminability of the samples in terms of reactor types. The scope of our data set covers postirradiation material from civil nuclear reactors.
The material composition that we have considered covers a number of different cooling periods and irradiation levels, as detailed in Table 1. This work covers only postirradiation fuel that has been put into storage, not material that has subsequently undergone additional processing, such as reprocessed fuel. The spent fuel data set produced using the code consists of 3 682 representative samples, each assuming the same input fuel composition, with only two exceptions. First, a selection of U-235 enrichment levels has been used, depending on the enrichment levels that would typically be expected of the reactor being modeled. Second, there is a slight variance in the U-234 isotope due to incidental enrichment


during the U-235 enrichment process. This has been calculated using proprietary information provided by Westinghouse that details the correlation between U-235 and U-234 enrichment. However, the U-234 variance caused by natural deposits16 has not been accounted for. Each of the samples represents a unique combination of five variables/attributes for which the classification performance can be assessed: the reactor type, U-235 enrichment, burnup, rating, and cooling period. The sample grouping across the available reactor types, as well as some properties of the data set, is summarized in Table 1. For each sample in the data set, 34 isotopes are represented: C-14, Mn-54, Co-60, Kr-85, Sr-90, Nb-95, Zr-95, Tc-99, Ru-106, Sb-125, I-129, Cs-134, Cs-137, Ce-144, Pm-147, Eu-152, Eu-154, Eu-155, Np-237, Pu-238, Pu-239, Pu-240, Pu-241, Pu-242, Am-241, Am-243, U-234, U-235, U-236, U-237, U-238, Cm-242, Cm-243, and Cm-244. These values represent the mass per initial tonne of heavy metal. The data set has been transferred into the Matlab environment (Mathworks Inc.) as a 3 682 × 34 matrix covering all samples and all isotopes. It is a complete data set with no missing values.

METHODS
Visualization and Dimensionality Reduction. The data set consists of a complex set of 34 isotopes, and it is necessary to compress these into a reduced set of 2 or 3 features for visualization and exploratory purposes. Such analysis is very helpful for discovering trends and aiding decision making, especially for preassessing which algorithms will be suitable for automated classification. This is not the first time such a technique has been used to represent spent fuel: principal component analysis (PCA) and partial least squares discriminant analysis (PLS-DA) have previously been used for data set visualization.3 In this work, we employ spectral manifold embedding methods17 for the first time in this type of application. Such methods also compress the data to fewer dimensions. However, unlike the previously used methods, which find orthogonal directions that maximize global measures of data variance, they are designed to preserve certain local properties of the original high-dimensional feature space, such as pairwise proximity information between the original samples. Specifically, we employ an unsupervised spectral embedding method called Laplacian Eigenmaps18 (LE). As shown in the Visualization Assessment section, this technique successfully visualizes the data set in terms of the different reactor types. LE is based on graph modeling and operates by first converting the n × d data matrix X (in our case, n = 3 682 and d = 34) into an n × n similarity matrix W = [wij], where wij represents the similarity between the ith and jth rows of X, in this case the cosine similarity. The samples are then projected nonlinearly to a lower dimensional space of k dimensions (in our case, k = 3), where the original d-dimensional samples xi correspond to k-dimensional embeddings zi. This is achieved by minimizing the sum of weighted pairwise distances between all n embeddings, which can be expressed as

$$\min_{Z \in \mathbb{R}^{n \times k}} \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} \, \lVert z_i - z_j \rVert_2^2 \tag{1}$$

where the ith row of Z is the embedding zi. Thus, the final embedding matrix Z is a k-dimensional representation of the original n × d feature matrix X that preserves its pairwise similarity structure: samples with a large wij are placed close together. The above optimization can be solved directly by an eigendecomposition of the graph Laplacian L = D − W, where D is the diagonal matrix composed of the row sums of W. LE optimizes eq 1 under the additional constraint $Z^{\top} D Z = I_{k \times k}$, which fixes the scale of the embedding and keeps the k embedding dimensions separate (D-orthogonal); in practice, the columns of Z are the eigenvectors of the generalized eigenproblem Lz = λDz with the smallest nonzero eigenvalues.

Classification. To assess the discriminability of the data set, we apply machine learning, specifically automated classification. Prior to that, we reduce the data with a simple PCA to remove the dimensions of low variance and retain the directions of potentially useful information (we preserve 99% of the data variance). Because few classification results have been published for these materials, we experimentally tested a number of different classification techniques. The following discussion gives brief details of the algorithms that produced the most successful classification results for our data. The classification function sought for this problem is a mapping from a given input xi (here, the PCA-adjusted elemental and isotopic data) to a class ωj. The classes corresponding to the five reactor types are ω1 for AGR, ω2 for BWR, ω3 for CANDU, ω4 for Magnox, and ω5 for PWR. The classifiers are trained and tested with the available data set D of n = 3 682 samples using cross-validation for objective model assessment. Two of the classifiers19 that we have implemented are the closely related linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA). Both are probabilistic classifiers based on comparing the posterior probabilities

$$p(\omega_j \mid x_i) = \frac{p(x_i \mid \omega_j)\, p(\omega_j)}{p(x_i)} \tag{2}$$

where p(xi|ωj) is the class-conditional likelihood and p(ωj) the prior of each group ωj. Given a measurement xi, the classifier minimizes the average decision risk (under a zero-one loss) by assigning xi to the most probable class. The major difference between LDA and QDA is that all classes are assumed to share the same covariance in LDA, whereas each class has its own covariance in QDA. The Random Forest classification technique is based on bootstrapping the data set into random sample groups and using each of these groups to grow a prediction tree.20 Each decision tree assigns a class to a given sample, and the mode of all tree predictions is taken as the final class predicted by the forest. The Parzen window classifier is essentially a probabilistic neural network (PNN), a kind of normalized radial basis function network used for classification. Relying on eq 2, the posterior probability is estimated from the data using the Parzen window technique.21,22 A window function φ is employed to count the training samples that fall within a region of fixed volume V in ℝ^d around xi. It can be shown that the jth class likelihood is given by

$$p(x_i \mid \omega_j) = \frac{1}{|D_j|} \sum_{y \in D_j} \frac{1}{V}\, \varphi\!\left(\frac{x_i - y}{\sigma}\right)$$

where Dj is the set of samples in D that belong to class ωj and σ is a measure of the region's neighborhood width. For the kernel φ, we


have chosen the Gaussian function, as this provides a smoother density approximation. LDA and QDA are flexible and fast methods that require no complex parameter tuning, but they generate only simple linear and quadratic decision boundaries. Random Forest and Parzen-based classifiers are more complex in terms of implementation and training, but they can generate more powerful decision surfaces that may perform better on nonlinear data sets. We have also employed a feature selection method to determine which of the isotopes in this synthetic spent fuel data set contribute significantly to the determination of the reactor type. This complements the classification analysis by identifying which isotopes would be of particular interest to the practitioner in a real-world scenario. We have used the Relief-F algorithm,23 a filter feature selection method that assigns each feature in the data set a weight of importance according to its ability to distinguish between classes. It is therefore capable of ranking all the available features according to their importance in the classification task.

RESULTS
Visualization Assessment. Figure 1 shows the results from plotting the LE transformation of the complete data set. This shows the 34-isotope data set embedded into a three-dimensional space with arbitrary-unit axes. The results show a clear separation between the spent fuel compositions of the different reactor types. The samples for each reactor type have a consistent form, spreading out from a point into a triangle that curves up at its widest point. The only exceptions are the samples from the Magnox and CANDU reactor types: the curve is still evident, but they do not radiate out to form a triangle as the other sample groups do. The defining feature of these reactors is that they only have samples with low enrichment levels of 0.711% and 1.2%, as would be the case in real-world applications.11 Because the Magnox and CANDU samples have very similar enrichment, we can see that they do not radiate out along the trajectory shown by the arrow labeled “enrichment”. It is for this reason that the samples from these two groups are closely correlated with one another. The other reactor types have a set of different enrichment levels. For instance, the data for the AGR reactor have seven distinct enrichment levels, each clearly visible in Figure 1 as one of seven distinct lines at the top of the curve. The same pattern is evident for the other reactors; therefore, there is a very clear pattern attributable to the initial U-235 enrichment of each sample. As illustrated in Figure 1, the U-235 enrichment levels increase along the arrow. Taking this into consideration, the AGR and Magnox samples are very closely correlated in all attributes, with the exception of their respective enrichment levels. As discussed, the Magnox samples are enriched to 0.711% U-235, whereas the lowest-enriched AGR samples are at 1%, and the separation between these samples is clearly evident in Figure 1. Magnox and AGR are the only gas-cooled reactor types in this data set, which may explain this correlation. Although it is not covered in this work, similar analysis could be undertaken to determine the initial U-235 enrichment of spent fuel using all available variables. Combining the reactor type results with the U-235 enrichment would provide a good forensic basis for narrowing down the subset of reactors from which a particular spent fuel sample may have originated.

Figure 1. Visual representation of the high-dimensional (34-feature) data set compressed into three dimensions using LE. The data have been plotted according to the five reactor type groups. Samples from the five groups are distributed into distinguishable patterns.

Figure 2. Further analysis of the LE plot shown in Figure 1. The three-dimensional plot has been rotated to show the curvature of the samples, which is attributed to the irradiation.

Figure 2 is a rotated version of the same data shown in Figure 1. From this position we can easily see the curve formation that is characteristic of the sample sets and the separation between each of the reactor classes. It has been shown that the curve formation is characteristic of the burnup of the fuel,3 and this is consistent with the results generated in our study. As illustrated in Figure 2, the samples with low irradiation form the curved portion of each sample set, and irradiation increases along the arrow. The more irradiation a sample has undergone, the more distinct the reactor type groups become and the more separated the curves are. This is expected because of the differing neutron energy spectra characteristic of each reactor type. The sample groups are much better resolved as irradiation levels increase, particularly the AGR, BWR, and PWR sample groups.
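The Laplacian Eigenmaps procedure described in Methods (cosine-similarity graph, Laplacian L = D − W, and the constraint ZᵀDZ = I) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' Matlab implementation. It assumes a dense, fully connected similarity graph with no k-nearest-neighbor sparsification, and nonnegative features (true of isotope masses) so that every cosine similarity is nonnegative and every sample has a nonzero degree. The function name `laplacian_eigenmaps` is hypothetical.

```python
import numpy as np


def laplacian_eigenmaps(X, k=3):
    """Embed the rows of X (n x d) into k dimensions with Laplacian Eigenmaps.

    Assumptions: dense cosine-similarity graph and nonnegative features.
    Returns Z (n x k) satisfying Z^T D Z = I.
    """
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xu = X / np.where(norms == 0.0, 1.0, norms)   # unit-length rows
    W = np.clip(Xu @ Xu.T, 0.0, None)             # w_ij = cosine similarity, clipped at 0
    np.fill_diagonal(W, 0.0)                      # no self-loops
    d = W.sum(axis=1)                             # degrees, the diagonal of D
    L = np.diag(d) - W                            # graph Laplacian L = D - W

    # Generalized problem L z = lam D z, solved through the symmetric matrix
    # D^{-1/2} L D^{-1/2} u = lam u with back-substitution z = D^{-1/2} u.
    d_isqrt = 1.0 / np.sqrt(d)
    M = d_isqrt[:, None] * L * d_isqrt[None, :]
    _, U = np.linalg.eigh(M)                      # eigenvalues in ascending order
    # Column 0 is the trivial constant solution (eigenvalue 0); keep the next k.
    return d_isqrt[:, None] * U[:, 1:k + 1]
```

Applied to a 3 682 × 34 matrix, `laplacian_eigenmaps(X, k=3)` would yield three-dimensional coordinates of the kind plotted in Figures 1 and 2, although eigenvector signs are arbitrary, so plots may appear mirrored. For a few thousand samples, a sparse k-nearest-neighbor graph with an iterative eigensolver would be more economical than this dense version.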


However, a number of PWR and BWR samples are tightly grouped toward the bottom of the curve in Figure 2. This is likely due to the similarity of these two reactor designs, as both use light water as a coolant and enriched uranium dioxide fuel. Other studies have also found that the BWR and PWR reactor types are difficult to distinguish.3 In contrast to the well-resolved pattern of the enrichment and irradiation attributes, cooling periods are tightly coupled into groups according to the reactor type, burnup, and enrichment level of the respective samples. For instance, in Figure 1 there are 7 distinct groups of AGR samples running along the length of the “enrichment” arrow. Each of these groups represents the different cooling periods for one of the 7 AGR enrichment levels at 1 000 MWd/te burnup. As Figures 1 and 2 show, the reactor types are well separated for the most part. However, the Magnox and CANDU samples are closely distributed, which may cause some confusion for samples that fall in this region of the feature space, and the PWR and BWR classes are likewise closely distributed. The Magnox and CANDU reactor types are both designed to be used with low-enriched fuel, as can be seen in Table 1, and the PWR and BWR reactor types also have similar enrichment levels. This suggests that the most significant portion of the separability in these visualizations is attributable to the different enrichment levels of the fuel types. This is particularly evident where the Magnox and CANDU samples are very closely grouped together, and the two different CANDU enrichment levels appear as two separate arches in Figure 2. Toward the rightmost portion of Figure 2 they are well separated, but they quickly merge as they approach the curved portion of the class distributions.
What is clear from the visualizations is that the different classes separate well in terms of sample proximities, but they have individual trends that mix them in specific parts of the space.

Classification Assessment. In this section we evaluate the accuracy of the implemented classifiers in the automated classification of the isotopic samples. All classifiers are assessed under the same conditions and on the same data sets to ensure an unbiased comparison. 10-fold cross-validation has been used to test the generalization performance of each algorithm: the complete data set is split into 10 different subsets of samples, and on each pass nine subsets are used to train the classifier model while the remaining one is used to test it. This is repeated so that each fold serves once as the test set, and the final classification performance is the average over all passes.19

To apply the Random Forest classifier, we first need to determine how many decision trees to use. As Figure 3 shows, we have assessed the classifier using a varying number of trees. As expected, the out-of-bag error begins very high and decreases as additional decision trees are added to the model. The error reaches a very good level at 10 trees and subsequently decreases slowly; on the basis of these results, we have used 15 trees for the final model.

Figure 3. Classification error for different numbers of trees in the Random Forest.

Application of the Parzen-based classifier requires fine-tuning of the neighborhood width parameter σ. In Figure 4, a range of values has been assessed and plotted to show the error rate for this particular classification problem. Two points are clear from this plot. First, the Parzen classifier is successful at classifying the reactor types in this data set, as the error rate quickly drops to a very low value. Second, the error rate reaches a suitably low value at around σ = 30 and remains low from there. For our implementation, we have used σ = 70, which provides a very low error rate.

Table 2 shows the classification error rate for each of the classifiers, and these results show that all of the classifiers have been very successful. The best overall classification results were observed with the Parzen window classifier, but a good level of accuracy has been achieved with each of these techniques. For instance, the results from LDA (the worst in this comparison, with an error rate of 0.0418) mean that about 4 of every 100 samples would be expected to be incorrectly classified. On the other hand, the Parzen classifier generated the most reliable results, incorrectly classifying about 5 of every 1000 samples (error rate 0.0054). Although the classifiers do not achieve a perfect classification rate, they would provide a very valuable tool to assist forensic analysis. While these results are strongly encouraging, the corresponding accuracies for real-world samples may be lower, because the current results are based on data generated using the FISPIN depletion code, which may not capture some subtleties of real-world samples, such as differences in reactor feed material arising from different suppliers. However, on the basis of our analytical data from spent fuel, there would be a good chance of correctly identifying a sample in a laboratory scenario. Beyond the classification error, we can examine the performance of the best classifier, the Parzen window classifier, in detail through the confusion matrix of its results in Table 3, in which each row gives the actual class and each column the predicted class. Studying the confusion matrix shows the classes in which misclassification occurs and therefore pinpoints the classes that are more difficult to determine reliably.
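The classification protocol described above (PCA keeping 99% of the variance, a Gaussian Parzen window classifier applied through eq 2, and 10-fold cross-validation producing a confusion matrix with actual classes as rows and predicted classes as columns) can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' Matlab code: the function names are hypothetical, PCA is refitted inside each training fold (the paper does not say whether it was fitted once on all data), folds are random rather than stratified, and the priors p(ωj) are the training-fold class frequencies. Because 1/V and p(xi) are the same for every class, they cancel when the most probable class is chosen and are omitted.

```python
import numpy as np


def pca_fit(X, var_kept=0.99):
    """Return (mean, P), where P holds the fewest principal axes explaining >= var_kept of the variance."""
    mu = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
    explained = np.cumsum(s**2) / np.sum(s**2)
    r = int(np.searchsorted(explained, var_kept)) + 1
    return mu, Vt[:r].T


def parzen_predict(X_train, y_train, X_test, sigma):
    """Gaussian Parzen-window classifier: argmax over classes of log p(x|w_j) + log p(w_j)."""
    classes = np.unique(y_train)
    log_post = np.empty((len(X_test), len(classes)))
    for j, c in enumerate(classes):
        Dj = X_train[y_train == c]
        # Gaussian window exponent -||x - y||^2 / (2 sigma^2) for every test/train pair.
        a = -((X_test[:, None, :] - Dj[None, :, :]) ** 2).sum(axis=2) / (2.0 * sigma**2)
        a_max = a.max(axis=1, keepdims=True)
        # log[(1/|Dj|) * sum_y exp(a)], computed with log-sum-exp to avoid underflow.
        log_lik = a_max[:, 0] + np.log(np.exp(a - a_max).sum(axis=1)) - np.log(len(Dj))
        log_post[:, j] = log_lik + np.log(len(Dj) / len(X_train))  # add log prior
    return classes[np.argmax(log_post, axis=1)]


def cross_validate(X, y, sigma, n_folds=10, seed=0):
    """Random k-fold CV: returns (mean error over folds, confusion matrix rows=actual, cols=predicted)."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    classes = np.unique(y)
    cm = np.zeros((len(classes), len(classes)), dtype=int)
    fold_errors = []
    rng = np.random.default_rng(seed)
    for test_idx in np.array_split(rng.permutation(len(X)), n_folds):
        train_mask = np.ones(len(X), dtype=bool)
        train_mask[test_idx] = False
        mu, P = pca_fit(X[train_mask])            # PCA fitted on the training folds only
        pred = parzen_predict((X[train_mask] - mu) @ P, y[train_mask],
                              (X[test_idx] - mu) @ P, sigma)
        fold_errors.append(np.mean(pred != y[test_idx]))
        np.add.at(cm, (np.searchsorted(classes, y[test_idx]),
                       np.searchsorted(classes, pred)), 1)
    return float(np.mean(fold_errors)), cm
```

Called as `cross_validate(X, y, sigma=...)` on the isotope matrix and reactor labels, this returns an error estimate comparable in form to Table 2 and a 5 × 5 matrix in the layout of Table 3. The appropriate σ depends on how the data are scaled, so it would need to be re-tuned as in Figure 4 rather than copied from the paper.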
Of all 3 682 samples in the complete data set, the classification model correctly determines the reactor type of all samples except 20 (20/3 682 ≈ 0.0054, matching the Parzen error rate in Table 2). Notably, the CANDU class achieved 100% correct classification. However, some samples from the PWR, BWR, Magnox, and AGR classes are closely distributed and cause some incorrect classifications. As can be seen in Table 3, 8 PWR samples have been misclassified as AGR and 7 as BWR. Likewise, the BWR samples have proven difficult to predict, with 2 misclassified as AGR and 3 as Magnox. These results reflect the fact that the spent fuels from some of these reactors are inherently difficult to distinguish. Furthermore, they are consistent with the visualization results shown in the Visualization Assessment section, where the similarities of the respective reactors are evident, particularly in the case of the PWR and BWR types. The misclassified samples remain a very small portion of the entire data set, and the overall classification accuracy is very high.

Table 2. Representative Classification Error Rates for LDA, QDA, Random Forest, and Parzen Window Classifiers

| classification method | error rate |
|---|---|
| LDA | 0.0418 |
| QDA | 0.0060 |
| Random Forest | 0.0098 |
| Parzen Window | 0.0054 |

Table 3. Confusion Matrix for 10-Fold Cross Validation of the Parzen Classifier (rows: actual class; columns: predicted class)

| actual \ predicted | AGR | BWR | CANDU | Magnox | PWR |
|---|---|---|---|---|---|
| AGR | 490 | 0 | 0 | 0 | 0 |
| BWR | 2 | 765 | 0 | 3 | 0 |
| CANDU | 0 | 0 | 138 | 0 | 0 |
| Magnox | 0 | 0 | 0 | 1360 | 0 |
| PWR | 8 | 7 | 0 | 0 | 909 |

Figure 4. Error rate of the Parzen window classifier using different widths.

Figure 5 presents the results from using the Relief-F feature selection filter to assess which of the 34 isotopes are best suited to determining the reactor type of spent fuel. The bar chart represents the Relief-F weight of each isotope, as described in the Classification section, and the isotopes have been ordered by weight with the most significant on the left. The line plot represents the cumulative cross-validation results from classifying samples with an ever-increasing number of isotopes, beginning with the most significant. Interestingly, with the exception of Pu-239, the Pu isotopes are not weighted as highly as expected.3 These results have been generated using all the samples in the data set, including a variety of cooling and burnup values as shown in Table 1. Therefore, some of the isotopes with shorter half-lives may be less important than suggested here when considering samples that would typically have a longer cooling time. It is also notable that the cross-validation error rate decreases dramatically when only the two isotopes Pu-239 and U-238 are considered. The only other significant drop in the error rate results from the addition of the U-235 isotope, presumably due to the enrichment variation of U-235. Further work will be required to determine the extent to which these results are

Figure 5. Relief-F feature weights represented as bars and cross-validation classification accuracy of the reduced data set overlaid as a line plot.

dx.doi.org/10.1021/ac5004757 | Anal. Chem. 2014, 86, 5399−5405

Analytical Chemistry

Article

(3) Robel, M.; Kristo, M. J. J. Environ. Radioact. 2008, 99, 1789− 1797. (4) Robel, M.; Kristo, M. J.; Heller, M. A. Nuclear Forensic Inferences Using Iterative Multidimensional Statistics. In Proceedings of the Institute of Nuclear Materials Management 50th Annual Meeting, Tucson, AZ, July 2009; 12 pages. (5) Nicolaou, G. J. Environ. Radioact. 2006, 86, 313−318. (6) Pajo, L.; Mayer, K.; Koch, L. Fresenius’ J. Anal. Chem. 2001, 371, 348−352. (7) IAEA. IAEA Annual Report 2012, 2012 ed.; GC(57)/3; IAEA: Vienna, Austria, 2013; pp 1−125. (8) Kristo, M. J.; Tumey, S. J. Nucl. Instrum. Methods Phys. 2012, 1− 6. (9) Mayer, K.; Wallenius, M.; Varga, Z. Chem. Rev. 2013, 113, 884− 900. (10) Banas, K.; Banas, A.; Moser, H. O.; Bahou, M.; Li, W.; Yang, P.; Cholewa, M.; Lim, S. K. Anal. Chem. 2010, 82, 3038−3044. (11) Kok, K., Ed. Nuclear Engineering Handbook; CRC Press: Boca Raton, FL, 2009. (12) Country Nuclear Power Profiles, 2012 ed.; IAEA-CNPP/2012/ CD; www-pub.iaea.org:Austria. (13) Sirven, J. B.; Pailloux, A.; M’Baye, Y.; Coulon, N.; Alpettaz, T.; Gossé, S. J. Anal. At. Spectrom. 2009, 24, 451−459. (14) Varga, Z.; Ö ztürk, B.; Meppen, M.; Mayer, K.; Wallenius, M.; Apostolidis, C. Radiochim. Acta 2011, 99, 807−813. (15) Parker, D. R.; Mills, R. W.; Little, P.; Hassall, C. M. BNFL Res. Technol. 2002, 1−115. (16) Brennecka, G. A.; Borg, L. E.; Hutcheon, I. D.; Sharp, M. A.; Anbar, A. D. Earth Planet. Sci. Lett. 2010, 291, 228−233. (17) Mu, T.; Goulermas, J. Y.; Tsujii, J.; Ananiadou, S. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 2216−2232. (18) Belkin, M.; Niyogi, P. Neural Comput. 2003, 15, 1373−1396. (19) Duda, R.; Hart, P.; Stork, D. Pattern Classification; 2nd ed.; John Wiley & Sons: New York, 2001. (20) Breiman, L. Mach. Learn. 2001, 45, 5−32. (21) Parzen, E. Ann. Math. Stat. 1962, 33, 1065−1076. (22) Pan, Z. W.; Xiang, D. H.; Xiao, Q. W.; Zhou, D. X. J. Complex. 2008, 24, 606−618. (23) Kononenko, I.; Šimec, E.; Robnik-Šikonja, M. Appl. Intell. 1997, 7, 39−55. 
(24) George Koperski, J.; Abbondante, S. F. Nuclear Forensics. In Wiley Encyclopedia of Forensic Science; John Wiley & Sons: New York, 2009

dependent on the range of U-235 enrichment, burn-up, and cooling times that are included in this study.



CONCLUSION The results that we have seen from classification of synthetic analytical chemistry data of spent fuel have been very encouraging as a proof of concept for using a classification algorithm to determine the reactor type of spent reactor fuel. The Parzen window classification technique has achieved the most accurate results. This has been the first time this technique has been used in the field of nuclear forensics, and we have shown that it has been very effective, particularly compared to more traditional methods such as LDA and QDA. It should be noted that the data set has been generated by the FISPIN depletion package and may not take into consideration some of the subtle differences we would expect from real-world sample measurements. Further work will need to focus on applying the methods used here for determining the reactor types for real-world spent fuel samples. The study has considered only the postirradiation materials from civil nuclear reactors; however, it should be noted that other materials from the nuclear fuel cycle could also be subjected to similar analysis. It is anticipated that this could work well for reprocessed reactor materials, as these would exhibit similar characteristics to the materials we have studied here. There are other reactor types that are not included in the data set we have used for this investigation. It would be advantageous to add further reactor types to our classification model such that it can be assessed with a more holistic data set. We have demonstrated that the multivariate techniques used here are able to successfully determine the source reactor type of irradiated nuclear materials using the full wealth of information available. Unlike classical methods where key relationships between isotopes are investigated,8,24 we have taken advantage of the full information available in the data set and used 34 isotopes. 
It has been demonstrated that accurate classification results can be achieved using a reduced subset of these isotopes using a feature selection technique to determine the most appropriate isotopes that determine the reactor type. However, our approach does have the disadvantage of being based on analytical methods that are not widely used in the nuclear forensic community and is not based on traditional logic, such as parent/daughter nuclide relationships typically used for this form of investigation. Nevertheless, because the results have been achieved using multivariate analytics, it is envisaged that this approach could form the basis for a toolbox to aid fast evaluation in the field of nuclear forensics. It has also been noted that there is potential to also determine the initial U-235 enrichment of the fuel. On the basis of the visual results shown in Figure 1 there is clearly a correlation between spatial arrangements of the samples and the initial U-235 enrichment values.



AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]. Notes

The authors declare no competing financial interest.



REFERENCES

(1) Stanley, F. E. J. Anal. At. Spectrom. 2012, 27, 1821. (2) Varga, Z.; Wallenius, M.; Mayer, K.; Keegan, E.; Millet, S. Anal. Chem. 2009, 81, 8327−8334. 5405
