Quality Control of Food Products using Image Analysis and

Dec 9, 2008 - (14-18) However, given the nature of the data available and the goal of the quality monitoring device that we are addressing in this pap...
7 downloads 9 Views 2MB Size
988

Ind. Eng. Chem. Res. 2009, 48, 988–998

Quality Control of Food Products using Image Analysis and Multivariate Statistical Tools Ana C. Pereira, Marco S. Reis,* and Pedro M. Saraiva Department of Chemical Engineering, UniVersity of Coimbra, Po´lo II, Rua Sı´lVio Lima, 3030-790 Coimbra, Portugal

Visual appearance is an important feature in the quality assessment of food products, since it plays a key role in the decisions made by consumers. Frequently, its evaluation is carried out by a panel of experts from the quality department who analyze, by visual inspection, samples of product taken from the process. It is wellknown that such methodologies of assessment suffer from several drawbacks, such as subjectivity, limited precision, and lack of stability over time, even for well-trained personnel, although extensive training programs can improve assessment performance. In this context, we present in this paper results regarding the development of an automated methodology for assessing the visual appearance of cereal flakes, in what concerns a particular quality feature, relative to the existence of regions where cereal coating is inadequate. The proposed procedure is able to extract the necessary information from images taken from product samples, leading to an objective, stable, and quantitative quality control measurement system for this property. The developed algorithm essentially consists of implementing a supervised classification methodology, based on the estimated Mahalanobis distance for assessing proximity in the color space, while incorporating the natural variability and color correlations found in cereal flakes. This procedure also integrates fuzzy logic reasoning for samples falling under regions closer to the classes’ boundaries. Results obtained from a real industrial plant indicate that it is indeed possible to classify different samples of flakes according to classes previously defined. They also provide a sound basis for further developments, in particular regarding the generation of metrics for quality assessment and the implementation of a similar procedure online and in situ. Introduction An image is always acquired with the purpose of reproducing some properties of an object or scene depicted.1 Therefore, if image acquisition is properly done, it contains essential information about relevant properties of the system under analysis. In fact, an image conveys a large amount of information, which, if correctly handled, can be used to support tasks that otherwise would require extensive time from well-trained personnel or may be even impossible to carry out. No machine can yet match the versatility and flexibility of the human analysis of image contents. However, human image analysis faces difficulties when quality evaluation requires more precise (perhaps quantitative) and stable assessments of what is seen and may even become inadequate for repetitive tasks and/or the analysis of subtle patterns, some of them not even falling in the visible part of the electromagnetic spectrum. It is under these conditions, in particular, that automatic machine-based vision and image analysis systems can bring considerable added value for quality control applications. This happens to be the case for the problem addressed in this paper, that of assessing coating quality in cereal flakes, for which we have developed an automated image analysis system. Most cereal flakes are typically produced by coating the base product with a colored and flavored coating. Coating plays an important role in the final taste and appearance of the cereal and, consequently, in consumer’s acceptance and valuation. Therefore, a correct monitoring of coating characteristics is essential to ensure consumer satisfaction. Under normal operation conditions, coating shows good adherence to the cereal and no uniformity problems are detected. However, under some process upsets, the desired uniform coating may not be achieved. Such a problem can express itself in just a particular region of * To whom correspondence should be addressed. E-mail: marco@ eq.uc.pt.

the flakes: since the flakes under analysis have a sort of halfegg shape, the coating problem only appears in the inner, concave side and never on the outer, convex face. This happens to be so, because the coating operation is performed in a tumbler by pulverization and also by contact between flakes. Given the flakes’ concave shape, it becomes more difficult to totally cover the inner surface, especially when adherence problems do arise. The monitoring of this quality feature is carried out by welltrained personnel in most manufacturing plants. Such experts visually inspect product samples taken from the process at regular times. Formulated in this way, the above problem seems eligible for the development of an alternative, automated image-analysis system for quality grade evaluation. In fact, we have been witnessing an increasing trend toward a more extensive incorporation of such a type of systems in the context of industrial processes supervision and quality control. This trend is likely to increase even further in the future, given the developments and technical advances made both regarding image acquisition and analysis. Hitherto, the so-called “deterministic approaches” have dominated applications, where the focus is essentially centered on issues such as the inspection and/or classification of objects with fixed shapes or other related supervision tasks, predominantly in automated assembly lines. However, more recently, techniques have also been developed to handle less well-defined image structures, that are not so well characterized by a set of fixed shape parameters, exhibiting instead variability in the way they manifest themselves, like texture patterns, product nonuniformities, or surface flaws. Therefore, “stochastic methodologies” have emerged to adequately describe and incorporate variability in image analysis and, in particular, multivariate statistics have been extensively used in this regard, given the large amount of correlated information provided by images, either across pixels or between the different color

10.1021/ie071610b CCC: $40.75  2009 American Chemical Society Published on Web 12/09/2008

Ind. Eng. Chem. Res., Vol. 48, No. 2, 2009 989

channels extracted. Many of these approaches fall now under the heading of multivariate image analysis (MIA),1,2 a discipline that gathers together methodologies where multivariate statistical techniques, such as principal component analysis (PCA) or partial least-squares (PLS), sometimes integrated along with other data processing techniques (e.g., data transforms, such as wavelets and Fourier), are applied to image analysis problems. MIA techniques were already successfully applied to several industrial applications: MacGregor and co-workers applied them to softwood lumber grading,3 boiler’s flames monitoring,4 and to the supervision of snack food processes.5 These applications involve the analysis of “random textures”, as the objects under analysis present complex patterns of variation throughout their surface, which are also the key elements in their analyses.6 Other applications found in the literature, involving the analysis of random textures, encompass the quality monitoring/classification of artificial stone plates,6 rolled steel sheets,7 pasta sauce,8 and the qualitative (“which components?”), quantitative (“how much of each component is present?”), and spatial distribution characterization (“where are they present?”) of ingredients in medication tablets from the pharmaceutical industry.9 In these applications one can find examples of the use of unsupervised multivariate modeling tools, such as PCA, as well as of supervised learning frameworks, namely those used for regression (PLS) and classification (PLS-DA) problems. Multivariate image regression (MIR) has meanwhile acquired some differentiation from the bulk of MIA approaches, being specially suited for the detailed analysis of the spatial distribution of relevant features, such as component concentrations and defect locations.9-12 Additional applications involving congruent images (i.e., images properly preprocessed, oriented and aligned, for which each pixel, at a given position, keeps the same meaning regarding the element under analysis), were also put forward.13 The main contribution of this paper relates to the development of a new automated methodology for performing quality control of nonuniform coating in cereals manufacturing. However, the results achieved remain valid for other problems involving noncongruent stochastic image analysis. The methodology was conceived to be implemented offline, improving current quality monitoring practices based upon experts panels, being more objective, stable, cheaper, and faster. This paper is organized as follows. After this introductory review of MIA applications, in the next section, we briefly review the fundamentals of digital imaging and provide some essential background on MIA methodologies. Then, we describe the proposed approach for an automated, offline image analysis system for coating uniformity quality assessment, and present the results obtained through the application of this methodology, along with some implementation details. We conclude the paper by summarizing its main contributions as well as discussing further developments to be considered in the future, including possible online implementations. Background on Digital Imaging and Multivariate Image Analysis (MIA) Digital Imaging. The availability of increasingly inexpensive and informative imaging sensors has been allowing new advances in product and process quality monitoring and control strategies, based upon a proper acquisition and treatment of images. These images are frequently colored, meaning that they contain, for each position (x,y) in the image plane (pixel), the corresponding intensities at different wavelength bands in the visible region of the light spectrum, such as those usually known

Figure 1. Multivariate image composed with three wavelength channels. In this case, they are the so-called RGB channels (standing for the red, blue, and green color bands). The ellipse contains the R, G, and B intensity values for a given pixel.

Figure 2. Unfolding a three-way array into a two-way matrix, preserving wavelength (color) channel information. This new structure is adequate for conventional PCA analysis of the color correlated structure of pixels.

as red, green, and blue (RGB). However, RGB images are just one particular type of a larger class of hyperspectral images, which can have, in general, an arbitrary but finite number of wavelength channels, so that it is now possible to measure tens or hundreds of individual channels, with correspondingly higher spectral resolutions.8,11 Therefore, a hyperspectral image consists of a stack of congruent images, each being relative to a particular wavelength band and providing the numerical intensity values (normally integers from 0 to 255) for each channel. Together, they form a three-way matrix, where the first two modes regard the spatial (x,y) pixel positions, while the third mode relates to the several wavelength channels. For instance, an RGB image with 512 × 512 pixels originates a 512 × 512 × 3 three-way matrix (Figure 1). These hyperspectral images are usually referred to as multivariate images, given the correlated nature of the information contained across the different channels. The colors or wavelength channels used should be those that present higher sensitivity to the phenomena under analysis and may fall in the visible or nonvisible (IR, UV) part of the electromagnetic spectrum. Therefore, an important decision concerns the selection of an appropriate set of wavelengths to be used. This can be carried out through the use of a multispectral camera, whose images, taken from samples of different quality grades, can then be analyzed in order to select

990 Ind. Eng. Chem. Res., Vol. 48, No. 2, 2009

data variability, but in a more compact and organized way. This can be achieved through multiway methods,23 such as multiway principal component analysis (MPCA), where the decomposition of an image, X _ , with dimensions (Nx × Ny × Nz) is done as follows: p

X _)

∑ (T X p ) + E_ i

i

(1)

i)1

Figure 3. (a) Example of an image used to evaluate coating quality. Flakes are lying with the concave face in the upward direction (quality grade: bad). (b) Areas occupied by flakes and areas with coating problems identified with different colors (red - flake, yellow - flake areas with bad coating).

Figure 4. Regions selected for flake identification (classification subtask no. 1) and bad coating identification (classification subtask no. 2).

the wavelengths subset that most discriminates among the various product quality grades. The selected wavelengths are finally used to monitor the process using a less expensive measurement apparatus, composed of light sensitive diodes for the chosen wavelengths of interest. In our case, the information to be extracted from images is essentially the one used by operators for visual evaluation of flakes coating quality, therefore falling in the visible part of the light spectrum, where furthermore simple and affordable digital cameras are available. We therefore used a conventional RGB camera for acquiring images under well controlled light conditions. These RGB images, each one with dimensions Nx × Ny × 3, form the multivariate images set to be later on analyzed, supporting the development of an automated quality evaluation procedure for flake’s coating (Figure 1). Conventional image analysis tasks have been essentially focused on the image space, and there is a vast literature regarding different aspects of image processing in that representation space. It is usually carried out offline, quite frequently aimed at the enhancement of digital images, followed by several processing steps, such as filtering, edge detection, image segmentation, histogram equalization, and morphological operations.14-18 However, given the nature of the data available and the goal of the quality monitoring device that we are addressing in this paper, where subtle color information, connected to coating uniformity, must be extracted from highdimensional and correlated data structures, we decided to follow a different approach, adopting a multivariate image analysis (MIA) perspective, which has been tested with success under similar contexts. Multivariate Image Analysis (MIA). Being composed by a large amount of correlated information, multivariate images can be analyzed through multivariate statistical frameworks, such as latent variable methodologies, that have been successfully applied before to other complex data sets collected from industrial processes19,20 and analytical measurements,21,22 usually referred to as MIA. In this context, statistical methods such as PCA,1-4 PLS,5,10-12 and PLS-DA6,13 have been applied to extract and analyze information, as well as to perform prediction and classification activities. For instance, the application of PCA to multivariate images consists on decomposing or breakingdown the three-way data structures onto orthogonal blocks of a lower dimension, that still preserve the essential elements of

where, Ti is the ith (Nx × Ny) score matrix, pi is the ith (Nz × 1) loading vector, E _ is the residual three-way array, and p denotes the pseudorank (p < Nz), i.e., the number of components considered (X represents the Kronecker product). This methodology is in fact equivalent to unfolding the three-way array in a two-way matrix that preserves the wavelength channels mode, having dimensions (Nx × Ny) × Nz (see Figure 2), and then performing conventional PCA over such a matrix. The loading vectors would then be Nz × 1, while the scores have dimensions (Nx × Ny) × 1, encoding pixels information about the color structure of the image. Pixels sharing similar color attributes, in the context of the correlated statistical variability underlying the analyzed image, will tend to cluster together in the scores space. This fact is used to develop masking procedures in such a subspace, in order to identify regions whose pixels in the original image have similar color characteristics. The drawing of these masked regions is usually done manually, by trial and error iteration, until they successfully capture most of the features of interest in the original image. For RGB images, which are usually well explained by the first two principal components, such a trial and error process may be good enough. However, when more than two score subspaces are needed for appropriately describing image variability, namely in hyperspectral images having a larger number of wavelength channels, more refined procedures should be adopted.24 In the problem that we will address here, one is also interested in extracting information regarding color relationships, and therefore, we adopt a similar unfolding strategy. However, the methods used to process such a data structure differ from a PCA based analysis, as explained in detail in the next section. Furthermore, in several stages, we will fuse different portions of one or several images into a single two-dimensional matrix, in order to robustly extract the natural variability regarding the correlated behavior that we want to describe, which will be later on used for classification purposes. Proposed MIA Procedure for Flake Coating Quality Evaluation In this section, we will present the main steps involved in the image-based procedure for assessing coating quality in the concave face of cereal flakes proposed in this paper and developed with a real manufacturing plant. The analysis of results, along with implementation details, will also be presented. Image Acquisition System. We have used a conventional digital RGB camera, capable of acquiring images with a maximum resolution of 1944 × 2592 pixels, inserted as part of an experimental apparatus, composed by an illumination system containing eight halogen lamps (300 W), so that the whole area under analysis is uniformly illuminated. Acquired images are transferred to a PC in the “jpeg” compressed format, over which offline data analysis tasks are carried out. Training samples. In order to model and reproduce the current quality evaluation practices, several samples of cereal flakes were used, representative of three quality grades adopted

Ind. Eng. Chem. Res., Vol. 48, No. 2, 2009 991

Figure 5. Box plot for the mean fraction of bad coating area (MFBCA) in prototype samples that correspond to the following different quality grades: good, medium, and bad.

Figure 6. Normal probability plots for the mean fraction of bad coating area (MFBCA) in the three quality grades (data generated from a Gaussian distribution will correspond to approximately straight lines in such plots).

by the expert panel. They correspond to “standard” or “good” (7), “acceptable” or “medium” (13), and “non acceptable” or “bad” (6) coatings. The composition of the samples was carried out in collaboration with a panel of experts from the manufacturing plant that participated in this project, who based their evaluation essentially on three features regarding the appearance of the cereals: number of flakes in the sample that have an inadequate coating, flake area affected by inadequate coating,

and different color hue on the flake surface. These samples were then used to develop and test our statistical classification procedures. Flake Quality Evaluation. As a particular case of a classification problem, our quality evaluation procedure can be conceptually divided in two main stages. In the first stage, a subset of measurement features, offering discrimination power, is selected. According to the topology of the clusters found in

992 Ind. Eng. Chem. Res., Vol. 48, No. 2, 2009

Figure 7. Boundaries separating regions corresponding to classes “good”/“medium” (vertical line F1) and “medium”/“bad” (vertical line F2), together with auxiliary lines indicating regions inside which classification becomes less certain (F1′, F1′′ and F2′, F2′′).

Figure 8. Membership levels for each quality grade, according to MFBCA values. Table 1. Classification Table with Class Predictions Using the Proposed Methodology, Including the Consideration of “Grey” Regions Where Fuzzy Logic Reasoning Is to Be Applied, As Described in the Text classification of samples by the proposed methodology good medium bad uncertain classification of good samples made medium by the panel bad of experts column totals

row totals

6

0

0

1

7

1

8

1

3

13

0

0

5

1

6

7

8

6

5

total 26

the space spanned by such a subset, an adequate classifier is then chosen and applied. This stage is usually referred to as the training stage. Then, in a second stage, new raw images are acquired and the previously selected image features computed, and fed to the classifier, leading to class predictions. This stage is known as the testing stage. In the next paragraphs we will describe the main steps involved in the development of the classifier and its initial validation (training stage).

A typical image to be analyzed is presented in Figure 3a, where the main characteristic to be extracted for quantification involves the evaluation of bad coating regions within flakes. Therefore, we focused the feature extraction activity on different alternatives for evaluating the area of such regions. In particular, we computed the bad coating fractional area, i.e., the ratio between the sum of the areas (pixels) from all regions with bad coating and the whole image area occupied by flakes (overall number of pixels classified as belonging to flakes), when flakes are oriented with the concave side facing upward, since this was found to be closely linked to the judgements made by the panel of experts. Computing the fractional area with bad coating over all the flakes contained in a given sample involves two classification subtasks: (i) identification of flakes as objects in the image (simultaneously removing the image background) and (ii) identification of pixels, within these objects, that correspond to bad coating regions. Both of these subproblems constitute twoclass classification problems, to be applied at the pixel level (“is”/“is not” a flake, and “is”/“is not” a well-coated region,

Ind. Eng. Chem. Res., Vol. 48, No. 2, 2009 993

respectively). The classification scheme adopted here is based upon the estimated Mahalanobis distance measure, which allows the incorporation of variability and correlations across the different color channels. The Mahalanobis distance of observation xi to the cluster center µX, characterized by a covariance matrix ΣX, is given by: DM(xi ;µX, ΣX) ) [(xi - µX)TΣX-1(xi - µX)]0.5

(2)

where we can notice that the squared estimated Mahalanobis distance, obtained by replacing the population parameters µX and ΣX, by their estimates, corresponds to the well-known T2 statistic. In order to properly identify the flakes (classification subtask no. 1), the mean and covariance matrices characterizing their ˆ flake color distributions must be estimated: µˆ flake X , ΣX . For doing so, several regions of interest (ROI) were interactively selected, over several images, corresponding to regions of flakes that do cover most of their color variability (Figure 4). Then, such parts of images were all unfolded and combined in a “long and lean” two-way matrix, i.e., with three columns (corresponding to the three color channels) and a large number of rows (corresponding to the pixels for the selected regions). From this data matrix, the 3 × 1 mean vector estimate, µˆ flake X , and the 3 × 3 covariance matrix estimate, Σˆ Xflake, were finally obtained. With these estimates available, it was then possible to compute the estimated Mahalanobis distances for all pixels, from one or several images, to the estimated mean vector. From an analysis of the histogram for these distances, a bimodal distribution can be easily identified, regarding the population of pixels belonging to the image background (larger distance values) and the population of pixels corresponding to flakes (smaller estimated Mahalanobis distances). Therefore, it was then possible to set a threshold in the estimated Mahalanobis distance dimension to separate the two classes (flake or image background) from this histogram. This initial threshold was then fine-tuned, by analyzing classification performance over several images, where a mask with areas classified as flakes appears superimposed to the real image, in order to evaluate the accuracy of the flakes identification procedure (Figure 3b). Regarding now classification subtask no. 2, a similar procedure was followed. Several bad coating regions were selected from different samples, in order to characterize bad coating color variability and correlation (Figure 4), and from these image pieces, a two-way matrix was assembled (after unfolding), based upon which a characteristic mean vector and covariance matrix were estimated: µˆ Xbad and Σˆ Xbad. Then, for all pixels falling inside the regions previously identified as being flakes (during subtask no. 1), the corresponding estimated Mahalanobis distance to µˆ bad X was computed, and a threshold, that separates good from bad coating regions inside the flakes, was established. With information gathered about all pixels corresponding to flakes (classification subtask no. 1) and to all bad coating areas falling inside them (classification subtask no. 2), it is now possible to compute several features potentially relevant for image analysis, such as the following: mean fraction of bad coating area in all flakes within a sample, overall fraction of bad coating area, standard deviation of the fraction of bad coating area in all flakes, and the fractions of flakes with an inadequate coating area percentage above 5%, 10%, 20%, and 30%. We have analyzed classification performance using these features, both isolated and in different combinations, and found out that a single image descriptor is good enough to provide an adequate class discrimination. It corresponds to the arithmetic average of the fraction of bad coating areas in all flakes within a sample (MFBCA):

nf

∑f

bca

i

MFBCA )

i)1

nf

(3)

where nf stands for the number of flakes contained in an image, and fibca stands for the fraction of bad coating area in flake i. Given the good separation capability achieved with MFBCA (Figure 5), the task of developing a classifier was rather simplified. However, near the class boundaries some overlapping still occurs. This superposition can either result from limitations inherent to the proposed procedure, including the image acquisition system, or from the natural difficulty of the panel of experts to correctly classify these borderline situations. Whatever the reason, we will present later a complementary procedure employed to decide about samples that fall in such uncertain or “grey” MFBCA regions. In order to set up an adequate classifier, we first studied the distributions for the mean fraction of bad coating area (MFBCA) in the three quality grades. During this analysis, we found out that Gaussian probability distributions describe quite well the behavior of data within each of such classes (Figure 6), even though in the bad grade the fitting is not as good as for the other classes. Relying on such probability distribution assumptions, it was then possible to estimate the statistical parameters associated with each class (i.e., the class mean, µˆ j, and variance, σˆ j2, for grades j ≡ {good, medium, bad}). From such estimates, we could then derive the class-conditional probability density distributions, f (X|class j) ≡ N(Xˆ;µˆ j,σˆ j2), where N(Xˆ;µˆ j,σˆ j2) denotes a Gaussian probability density function with parameters µˆ j and σˆ j2 for variable X, and use them to support the implementation of a Bayes classifier, which is known to have optimal characteristics with respect to the minimization of classification error probability.25,26 Such a Bayes classifier consists of assigning to each new sample the class that most likely (i.e., with higher probability) explains its behavior. In other words, it consists of computing the posterior probabilities for each class, P(class j|xi), according to the Bayes rule, P(class k|xi) )

f (xi|class k)P(class k) 3

∑ f (x |class j)P(class j) i

j)1

)

f (xi|class k)P(class k) P(xi)

(4)

where P(class k) stands for the prior probability of class k (we have assumed equal prior probabilities for each class), and assigning to a given sample the class whose posterior probability score is the highest class ) k, if P(class k|xi) ) max P(class j|xi) j){1,2,3}

(5)

where indices 1, 2, and 3 stand for classes good, medium, and bad, respectively. In practice, this method can be easily implemented after computing the boundaries that separate the different classes, hence defining regions for the values of MFBCA associated with each of the classes of product quality. Such boundaries are computed by equating the posterior probabilities of adjacent classes and solving the resulting equation for xi. Figure 7 illustrates such class boundaries, along with samples for each quality grade. It is possible to see that this classifier performs

994 Ind. Eng. Chem. Res., Vol. 48, No. 2, 2009 Table 2. Main Steps in the Training and Testing Stages of the Proposed MIA Approach for Evaluation of Cereal Flakes’ Coating Quality training stage

testing stage

1. Acquire image prototypes representative of the different quality grades of cereal flakes coating. 2. Select representative regions of interest (ROI) for flake identification (classification subtask no. 1) and bad coating regions selection (classification subtask no. 2). 3. Compose the corresponding 2-way matrices (by combining all regions selected for each problem), and compute/store the column means and covariances estimates, to be used in the two ˆ flake classification subtasks: µˆ flake X ,ΣX , ˆ bad and µˆ bad X , ΣX . 4. Select the thresholds to be used for both classification subtasks (analyzing histograms of the estimated Mahalanobis distances computed from image pixels). 5. Determine image features to be used in the classification of product quality grades (MFBCA), for all images in the training set. 6. Develop a classifier for quality grade estimation. 7. Analyze the classifier performance and develop a procedure for handling uncertainty classification regions near class boundaries, using a fuzzy-logic approach.

1. Acquire image for new sample. 2. Unfold data in two-way matrix. 3. Identify flakes and bad coating areas. 4. Compute MFBCA (image feature adopted for classification of coating quality). 5. Introduce such a value on the classifier, and get estimated product quality grade from it. 6. If the value computed in step 4 falls in one of the uncertain regions, repeat sampling and use the proposed fuzzy logic approach, assigning samples to the class grade for which the sum of membership values is higher.

well in general, but there are some overlapping regions, close to the class boundaries, where classification is more uncertain. In order to accommodate for this superposition problem, we drew additional auxiliary lines, corresponding to error margins of 10% around the corresponding boundary lines. These lines define “grey” regions, where our classification becomes less certain. Therefore, we derived an additional complementary procedure for dealing with samples that present MFBCA values falling within such “grey” regions. Such a methodology is based on fuzzy logic reasoning and consists of assigning a membership degree to each class that surrounds the “grey” regions, ranging between 0 and 1 (minimum and maximum membership, respectively). Outside these regions, it is assumed that clas-

sification uncertainty is not significant, and the membership level is unitary for the class in question, and zero for the other two (Figure 8). When a sample falls within a “grey” region, two additional samples from the same batch are collected, leading to a total of three MFBCA scores. Product quality grade is then assigned to the class for which the sum of membership scores is higher: 3

class ) k, if

∑ i)1

3

Mk(i) ) max

j){1,2,3}

∑ M (i) j

(6)

i)1

where Mk(i) stands for the membership score in class k, for sample i. The classification scores obtained for the images analyzed, appear summarized in the following classification table (Table 1). The mains steps of the proposed classification procedure, described so far, are summarized in Table 2. Discussion In the proposed approach, our preliminary classification subtasks, necessary for identifying flakes and bad coating regions from images, were based on the estimated Mahalanobis distance of pixels, as multivariate observations, to an adequate reference vector of RGB intensity means. A possible alternative procedure, particularly useful when the covariance matrices, for such multivariate observations, are ill conditioned, would be to conduct such classification subtasks in the PCA subspace, where distances can also be computed using PCA scores1,2 and/or residuals.6 However, since the two covariance matrices obtained in this particular application have a small dimension (three) and show no significant illconditioning problems with respect to matrix inversion, we decided rather to perform distance computations in the original, untransformed color space, where the identification of class regions was also found to be quite easy, involving only the selection of a single threshold per classification problem (i.e., the minimum requirement on the number of adjustable parameters, which also holds for a latent variable approach strictly based on the scores space). In order to validate this option, we present in the next paragraphs the results of a comparative study, involving as alternatives the approaches based upon a latent variable description of the pixels color correlation structure. Thus, the following situations were considered here (Figure 9): A. analysis in the original space (RGB color space);

Figure 9. Preliminary training stages required to implement the three approaches under comparison: (a) approach based on the original variable space; (b) latent variable approach using the DM2 statistic; (c) latent variable approach using the DM2 and Q statistics.

Ind. Eng. Chem. Res., Vol. 48, No. 2, 2009 995

Figure 10. MIA in the original variable space (RGB color space): histograms for the estimated Mahalanobis distance from each pixel to the reference center, regarding the task of selecting pixels corresponding to flakes (left) and to bad coating areas (right).

Figure 11. MIA in the latent variable subspace (scores space): histograms for the estimated Mahalanobis distance from each pixel to the reference center, regarding the task of selecting pixels corresponding to flakes (left) and to bad coating areas (right).

Figure 12. Boxplots of MFBCA obtained following the implementation of method A (analysis in the original variable space; on the left), method B (analysis in the latent variable space based on the scores subspace; in the center), and method C (analysis in the latent variable space based on the scores and residuals subspaces; on the right).

B. analysis in the latent variable space, using only the DM2 statistic (defined as the squared-estimated Mahalanobis distance in the scores subspace);

C. analysis in the latent variable space, using both the DM2 and the Q statistics (the Q statistic is also referred to as SPE, the squared prediction error, given by the sum of squares of

996 Ind. Eng. Chem. Res., Vol. 48, No. 2, 2009

Figure 13. Boxplots of JInter/Intra after 1000 bootstrap replications of the MFBCA values obtained after implementing the three methods.

Figure 14. Two samples regarding different quality grades: (a) good and (b) bad.

Figure 15. Identification of bad coating regions in the score space (using software MACCMIA, developed by the McMaster University Advanced Control Consortium and available at http://macc.mcmaster.ca/maccmia/maccmia.htm).

the PCA residuals obtained for each observation, i.e., the squared Euclidean distance of each observation to its projection on the PCA subspace). Each one of these approaches does require the preliminary definition of thresholds in order to select those pixels belonging to flakes, and, within these, those belonging to areas with poor or bad coating, which are the two essential identification subtasks required to compute the mean fraction of bad

coating area (MFBCA). In this regard, approach A requires two such thresholds, one for each task, which are established from an analysis of histograms such as the one presented in Figure 10. For instance, for identification task no. 1 (flake identification), the first and more elongated spike corresponds to flakes’ pixels, whereas the second spike regards pixels belonging to the background and shadows. The same type of pixel distribu-

Ind. Eng. Chem. Res., Vol. 48, No. 2, 2009 997

tions can be found when addressing approach B, the latent variable approach in the scores space (using the D statistic), where the nature of the analysis and procedures for establishing the thresholds are essentially the same (Figure 11). In the implementation of approach C, one further condition to be satisfied is added to each identification task, involving the sum of squared residuals (Q or SPE) for each pixel analyzed, and for both of them, thresholds must be specified. In this approach both thresholding conditions must be simultaneously met, in order to select the pixels in each identification task. Thresholds were always established by analyzing these types of histograms, after which fine-tuning was performed in order to optimize the patterns to be extracted in multiple images. The results obtained for the MFBCA values computed, for each sample image, after applying the three methods mentioned to select pixels relative to flakes and bad coating areas, are summarized in the boxplots presented in Figure 12. The separation ability achieved by applying each method can be also assessed through the computation of class separation metrics, such as the ratio between the interclass variability and the intraclass variability, usually known as the inter/intra distance (JInter/Intra). For a data matrix X (n × m, n being the number of observations available for all classes and m the number of variables), from which we can define its submatrices for each class k denoted by Xk (k ) 1:K; K is the number of classes), with the same number of columns (variables) as X, but with as many rows as the number of samples available for each class, such a distance is computed as follows: JInter⁄Intra ) trace(W-1B)

(7)

where K

W)

∑ (X - 1x¯ ) (X - 1x¯ ) k

T T k

k

T k

(8)

k)1

B ) T - W ) (X - 1x¯T)T(X - 1x¯T) - W

(9)

W is the sample within-group matrix, B the sample betweengroup matrix, and T the total data variation matrix; xjk is the k-class mean vector and xj is the mean vector for all data; 1 is an n × 1 column vector of ones. In order to compare the JInter/Intra distance for the three approaches under analysis, taking into account data variability, 1000 bootstrap samples were generated from the MFBCA values thus obtained (represented in Figure 12), leading to the distributions shown in Figure 13. The essential conclusion to be drawn from the analysis of these results is that no significant difference in the separation ability can be attributed or inferred for any particular method of computing MFBCA. Therefore, the use of the method with the simplest structure, method A, is well-justified for our particular application, without any loss of classification performance. Conclusions In this paper we present a methodology for offline evaluation of cereal flake coating quality. The proposed methodology is objective, stable, and fast, therefore avoiding subjectivity and, sometimes, inconsistency of human evaluation, as well as reducing allocation of highly skilled personnel in routine, timeintensive, activities. Our product quality grading algorithm essentially consists of implementing a supervised classification methodology, based on the estimated Mahalanobis distance, to assess the distance, in the color space, from each image pixel to a reference quality standard, allowing for the incorporation

of the natural variability and color correlations present in cereal flakes. The quality control procedure also incorporates fuzzy logic reasoning for samples falling in regions closer to the quality classes’ boundaries. The classification performance of this quality monitoring procedure was found out to be quite good, leading to results consistent with those obtained by a panel of plant experts. Results obtained so far indicate that it is indeed possible to classify different samples of flakes according to quality classes previously defined and should remain valid for other analogous problems, involving noncongruent, stochastic image analysis. The methods and results presented here also provide a sound basis for further developments, in particular regarding the generation of alternative features for supporting product quality monitoring, and the implementation of similar procedures online, in situ. In order to bring quality assessment closer to the production line and real time continuous product quality monitoring, we did study the ability of a similar system to detect coating problems online, when the product is being carried over conveyor belts (Figure 14). We conducted preliminary tests using two classification procedures for handling this problem. One is based on the estimated Mahalanobis distance in the original variables space, while the other one follows a masking approach in the PCA scores space (Figure 15).1-4 The online classification procedure based on the estimated Mahalanobis distance has now a simpler structure, when compared to its offline counterpart, as we do not need now to isolate the flakes, since the images are now totally filled with flakes (however, some attention was paid to the background removal of small areas, not covered by flakes, or relative to uninformative shadows). Preliminary results thus obtained point toward an improved performance of the estimated Mahalanobis distance classification procedure over the congener PCA-based approach, which may be explained in part by difficulties found regarding the definition of an adequate mask in the scores space. However, neither of these approaches attains the classification performance levels obtained with our offline procedure for coating quality evaluation. One important observation that arises from the analysis of image samples (Figure 14) regards the arbitrary nature of flakes orientation. This implies that a large number of flakes, in each image, is not facing upward, thus hiding their concave face, which is the one where coating problems arise. If we consider that a fraction of bad coating with less than 6% of the area (concave face) is enough to classify a sample as medium, we realize that the problem may pass undetected through the online image analysis, due to the arbitrary orientation of flakes, that easily masks such low percentages of bad coating occurrences. One possibility to overcome this limitation would be to rely on the law of large numbers, increasing the number of sampled images from which the classifying features values are extracted, and their average values computed. Theoretically, for a sufficiently high number of images, even small differences in the features mean values will be detected. On the other hand, the use of alternative image acquisition systems, that acquire images at other wavelength bands, with more sensitivity to coating problems, may also facilitate the classification task at hand for online implementation. As an alternative to the online procedure, we also propose another one to be implemented at-line, by an operator supervising the production line, using an adapted version of the protocol developed for offline product quality control. Following a wellestablished sampling procedure and schedule, the operator

998 Ind. Eng. Chem. Res., Vol. 48, No. 2, 2009

collects product samples, turns all the flakes in a plate so that all of them end up with their concave side facing upward, inserts them into an image acquisition system, which immediately provides an evaluation of the flakes coating quality. The whole procedure takes, at most, a few minutes, is as reliable as the laboratory classification procedure, it is also fast and inexpensive. Besides getting more information about the performance of alternative methodologies for dealing with the difficulties raised by online quality monitoring, described in the previous paragraphs, future work may also involve the analysis of alternative wavelength bands. This task, along with further improvements in the image segmentation methodology, will be critical for the development of a successful system for online image-based product quality monitoring and control applied to cereals manufacturing. Another research line worthwhile considering is the integration of spatial information in the analysis, along with the color information of pixels. This can be properly achieved through several techniques, in particular wavelet texture analysis (WTA), which implies a preliminary stage where the 2D wavelet transform is computed in the regions of interest for the analysis. As such regions have now irregular boundaries (the shapes of flakes), very different from the usual rectangular domains, the application of such state-of-the-art technique is not straightforward, requiring further research. Literature Cited (1) Geladi, P.; Grahn, H. MultiVariate Image Analysis; Wiley: Chichester, 1996. (2) Bharati, M. H.; MacGregor, J. F. Multivariate Image Analysis for Real-Time Process Monitoring and Control. Ind. Eng. Chem. Res. 1998, 37, 4715–4724. (3) Bharati, M. H.; MacGregor, J. F.; Tropper, W. Softwood Lumber Grading through On-line Multivariate Image Analysis Techniques. Ind. Eng. Chem. Res. 2003, 42, 5345–5353. (4) Yu, H.; MacGregor, J. F. Monitoring Flames in an Industrial Boiler Using Multivariate Image Analysis. AIChE J. 2004, 50 (7), 1474–1483. (5) Yu, H.; MacGregor, J. F.; Haarsma, G.; Bourg, W. Digital Imaging for Online Monitoring and Control of Industrial Snack Food Processes. Ind. Eng. Chem. Res. 2003, 42, 3036–3044. (6) Prats-Montalba´n, J. M.; Ferrer, A. Integration of Colour and Textural Information in Multivariate Image Analysis: Defect Detection and Classification Issues. J. Chemom. 2007, (21), 10–23. (7) Bharati, M. H.; Liu, J. J.; MacGregor, J. F. Image Texture Analysis: Methods and Comparisons. Chemom. Intell. Lab. Syst. 2004, 72, 57–71.

(8) Antonelli, A.; Cocchi, M.; Fava, P. Automated Evaluation of Food Colour by Means of Multivariate Image Analysis Coupled to a Waveletbase Classification Algorithm. Anal. Chim. Acta 2004, 515, 3–13. (9) Koehler, F. W., IV; Lee, E.; Kidder, L. H.; Lewis, E. N. Near Infrared Spectroscopy: The Practical Chemical Imaging Solution. Spectros. Eur. 2002, 14 (3), 12–19. (10) Lied, T. T.; Esbensen, K. H. Principles of MIR, Multivariate Image Regression I: Regression Typology and Representative Application Studies. Chemom. Intell. Lab. Syst. 2001, 58, 213–226. (11) Burger, J.; Geladi, P. Hyperspectral NIR Image Regression Part I: Calibration and Correction. J. Chemom. 2005, 19, 355–363. (12) Burger, J.; Geladi, P. Hyperspectral NIR Image Regression Part II: Dataset Preprocessing Diagnostics. J. Chemom. 2006, 20, 106–119. (13) Eriksson, L.; Wold, S.; Trygg, J. Multivariate Analysis of Congruent Images (MACI). J. Chemom. 2005, 19, 393403) . (14) Ja¨hne, B. Digital Image Processing, 2nd ed.; Springer-Verlag: Berlin, 1993. (15) van der Heijden, F. Image Based Measurement Systems - Object Recognition and Parameter Estimation; Wiley: Chichester, 1994. (16) De Anda, J. C.; Wang, X. Z.; Roberts, K. J. Multi-Scale Segmentation Image Analysis for the In-Processing Monitoring of Particle Shape with Batch Crystallisers. Chem. Eng. Sci. 2005, 60, 1053–1065. (17) Gonzalez, R. C.; Woods, R. E. Digital Imaging Processing; Addison-Wesley: Reading, MA, 1992. (18) Loebl, J. Image Analysis: Principles and Practice; Oxford University Press: New York, 1985. (19) MacGregor, J. F.; Kourti, T. Statistical Process Control of Multivariate Processes. Control Eng. Pract. 1995, 3 (3), 403–414. (20) Wise, B. W.; Gallagher, N. B. The Process Chemometrics Approach to Process Monitoring and Fault Detection. J. Process Control 1996, 6 (6), 329–348. (21) Estienne, F.; Pasti, L.; Centner, V.; Walczak, B.; Despagne, F.; Rimbaud, D. J.; de Noord, O. E.; Massart, D. L. A Comparison of Multivariate Calibration Techniques Applied to Experimental NIR Data Sets. Part II Predictive Ability Under Extrapolation Conditions. Chemom. Intell. Lab. Syst. 2001, 58, 195–211. (22) Martens, H.; Naes, T. MultiVariate Calibration; Wiley: Chichester, 1989. (23) Smilde, A. K.; Bro, R.; Geladi, P. Multi-way Analysis with Applications in the Chemical Sciences; Wiley: Chichester, UK, 2004. (24) Liu, J. J.; Bharati, M. H.; Dunn, K. G.; MacGregor, J. F. Automatic Masking in Multivariate Image Analysis using Support Vector Machines. Chemom. Intell. Lab. Syst. 2005, 79, 42–54. (25) Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning; Springer: New York, 2001. (26) Theodoridis, S.; Koutroumbas, K. Pattern Recognition, 2nd ed.; Elsevier: Amsterdam, 2003.

ReceiVed for reView November 26, 2007 ReVised manuscript receiVed October 28, 2008 Accepted October 28, 2008 IE071610B