14. Multiple Regression Modelling of Functionality

MAC R. HOLMES
Downloaded by CORNELL UNIV on August 16, 2016 | http://pubs.acs.org Publication Date: March 6, 1981 | doi: 10.1021/bk-1981-0147.ch014
Department of Agricultural Economics, University of Georgia, Georgia Experiment Station, Experiment, GA 30212
Use of multiple regression techniques in the study of functional properties of food proteins is not new (1-6). Most food scientists have some familiarity with basic statistical concepts and some access to competent statistical advice. At least one good basic text on statistical modelling for biological scientists exists (7), and a number of more advanced texts covering the use of regression in modelling are available (8, 9). The objectives of this paper are to present some potential uses of regression techniques in food protein research, to discuss some desirable steps in the modelling process, to present an example of the rationale underlying development of a model, and to discuss some potential statistical problems which might arise.

An Example of a Regression Model

The basic structure for a regression model having two independent variables is as follows:

$$Y = a + b_1 X_1 + b_2 X_2 + e$$

where
$Y$ = the dependent variable,
$a$ = the intercept, or constant, of the equation,
$X_i$ = the $i$th independent variable,
$b_i$ = the coefficient of $X_i$, and
$e$ = the variation in $Y$ not explained by the preceding variables and coefficients; $e$ is assumed to be normally distributed with a mean of zero.

The model form implies that variations in $X_1$ and $X_2$ cause variations in $Y$, but that some variation in $Y$ is due to a random component ($e$) of $Y$. Since $e$ has an expected value of zero, it is ordinarily not included when the estimated equation is listed.
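The following is a minimal sketch of how such a model can be estimated. The Python code below assumes NumPy is available, and both the helper name `fit_ols` and the synthetic data are hypothetical, not taken from the original study. It fits $Y = a + b_1 X_1 + b_2 X_2 + e$ by ordinary least squares and computes the t-values, overall F statistic and R-square commonly used to judge a fitted equation.

```python
import numpy as np

def fit_ols(X, y):
    """Hypothetical helper: ordinary least squares for Y = a + b1*X1 + ... + bk*Xk + e.

    X is an (n, k) array of independent variables without a constant column;
    y is the length-n dependent variable. Returns the coefficients
    [a, b1, ..., bk], their t-values, the overall F statistic and R-square.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])       # column of 1s estimates a
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef                        # estimates of e
    sse = resid @ resid                              # unexplained sum of squares
    sst = np.sum((y - y.mean()) ** 2)                # total sum of squares
    df_resid = n - k - 1
    mse = sse / df_resid                             # estimated variance of e
    cov = mse * np.linalg.inv(design.T @ design)     # covariance of the coefficients
    t_values = coef / np.sqrt(np.diag(cov))          # t-test of each coefficient = 0
    r_square = 1.0 - sse / sst                       # coefficient of determination
    f_stat = ((sst - sse) / k) / mse                 # F-test of the overall model
    return coef, t_values, f_stat, r_square

# Synthetic data (assumed for illustration): true a = 5, b1 = 2, b2 = -3.
rng = np.random.default_rng(0)
n = 40
x1 = rng.uniform(2.0, 10.0, n)
x2 = rng.uniform(0.0, 5.0, n)
y = 5.0 + 2.0 * x1 - 3.0 * x2 + rng.normal(0.0, 0.5, n)

coef, t_values, f_stat, r_square = fit_ols(np.column_stack([x1, x2]), y)
print("a, b1, b2:", np.round(coef, 3))
print("t-values :", np.round(t_values, 2))
print(f"F = {f_stat:.1f}, R-square = {r_square:.3f}")
```

Because the data are simulated with known coefficients, the printed estimates should lie close to 5, 2 and -3. With real functional-property data, the true values are unknown.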
0097-6156/81/0147-0299$05.00/0 © 1981 American Chemical Society
Cherry; Protein Functionality in Foods ACS Symposium Series; American Chemical Society: Washington, DC, 1981.
The objectives of estimating such an equation can be summarized as follows:

(1) To estimate the coefficients ($a$, $b_1$ and $b_2$, for example) and thus the effects of variations in $X_1$ and $X_2$ (for example) on the level of $Y$, i.e., to estimate the response surface. In food protein research, the dependent variable could be a functional property of the protein material, or some mathematical transformation thereof. The independent variables (the $X$'s) could be controllable conditions hypothesized to affect the functional property, such as pH, heat, or salt concentration, and/or mathematical transformations of these conditions. Throughout the remainder of this paper, the measured conditions which form the bases for the independent variables will be referred to as the factors.

(2) To test the statistical significance of the effects (i.e., the coefficients) of the independent variables on the dependent variable using t-tests, to test the statistical significance of the overall model using the F-test, and to estimate the percentage of variation in the dependent variable explained by the equation using the coefficient of determination ($R^2$).

Development of a model must be based on a theory, or theories, concerning the effects of the factors on the functional property being studied. Such theories, or hypotheses, can be based on prior research results, theories developed by others, collection and preliminary analysis of data and, perhaps, intuition. In sum, the hypotheses are implied by what is already known or hypothesized. A prime requirement for use of regression is that there must be some way of objectively measuring levels of the functional property and of the factors. These measurements provide the data used to estimate the model coefficients.

The Experimental Design
To develop a model for a particular research experiment to study functional properties of a food protein, the researcher must obviously have some knowledge concerning which factors are potentially important determinants of the level of the functional property to be studied. The researcher must select 1) each of the factors for which there is to be some variation in level within the experiment, 2) the levels to be used for each variable factor, 3) the number of combinations of levels of the different factors to be used, and 4) the level at which each nonvariable, but potentially important, factor is to be set throughout the experiment.
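Choices 2) and 3) can be made concrete by enumerating the treatments of a full factorial design. The sketch below is illustrative only. The helper `full_factorial` is hypothetical, and the pH and NaCl levels are placeholder values, not recommendations.

```python
from itertools import product

def full_factorial(levels):
    """Hypothetical helper: return every combination of the given factor levels.

    levels maps each variable factor's name to the list of levels chosen for it;
    each returned dict is one treatment (one combination of levels).
    Nonvariable factors are simply held fixed and are not listed here.
    """
    names = list(levels)
    return [dict(zip(names, combo)) for combo in product(*(levels[n] for n in names))]

# Placeholder levels for two variable factors.
design = full_factorial({
    "pH": [2.0, 4.0, 6.0, 8.0, 10.0],
    "NaCl_M": [0.0, 0.1, 1.0],
})
print(len(design))   # 15 treatments (5 pH levels x 3 salt levels)
print(design[0])     # {'pH': 2.0, 'NaCl_M': 0.0}
```

Dropping treatments from this list is one way to examine the trade-off between experiment size and statistical precision.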
These four selections will be primary determinants of the experimental design, which will determine the statistical precision and reliability of the models estimated. The number of levels of any one factor which should be included in the experimental design will depend largely on the following considerations:

1) the research objectives (Is this factor of primary interest in this research?),
2) the degree of certainty attached to current knowledge of the effects of the factor, and
3) the likelihood that this factor interacts with other factors of definite primary interest in determining the levels of the functional property (or properties) to be studied.

Setting the level of an important causative factor at one or more arbitrary points could seriously bias results of analysis of the experimental data.

In setting the number of levels of any one variable factor to be studied, the type of effects which it is likely to have on the functional properties to be studied is very important. If its effects are known to be linear and that factor is of secondary importance to the researcher, then two levels (one at each end of some practical range of levels) may be sufficient. If, on the other hand, the effects of this factor are known to be curvilinear and/or discontinuous at some point, then at least three levels should be included in the experimental design.
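The design matrix shows why a curvilinear factor needs at least three levels. This sketch assumes NumPy, and the helper `quadratic_design` and the level values are hypothetical. It sets up a quadratic $Y = a + b_1 X + b_2 X^2$ for a factor observed at two levels and at three levels. At two levels ($X$ = 2 and 10), the $X^2$ column equals $12X - 20$ exactly. The matrix therefore loses rank, and $b_2$ cannot be estimated no matter how many replicates are run.

```python
import numpy as np

def quadratic_design(x_levels, reps=2):
    """Hypothetical helper: design matrix [1, X, X^2] for one factor
    observed at x_levels, each level replicated `reps` times."""
    x = np.repeat(np.asarray(x_levels, dtype=float), reps)
    return np.column_stack([np.ones_like(x), x, x ** 2])

two_levels = quadratic_design([2.0, 10.0])
three_levels = quadratic_design([2.0, 6.0, 10.0])

# Rank 3 is needed to estimate a, b1 and b2 separately.
print(np.linalg.matrix_rank(two_levels))    # 2: curvature cannot be estimated
print(np.linalg.matrix_rank(three_levels))  # 3: the quadratic term is estimable
```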
Likewise, if the interaction of a factor with other factors is known to be significant, this could also be sufficient reason to include more than two levels of that factor in the design.

Multiple regression, also called ordinary least squares, can frequently provide reasonably precise, reliable estimates of coefficients even if the data analyzed are unbalanced, i.e., have "missing cells". However, all possible combinations of all levels of all variable factors included must be used if maximum statistical precision and reliability of the estimated coefficients are desired. If the experiment is very large (involving several levels of several factors), then it may be desirable to leave out some combinations. The researcher should weigh the costs of excluding combinations (loss of statistical precision and reliability) against the costs of including them (usually time and money). This may best be done with the advice of a statistician.

One frequent problem in food science research is how to define a single observation for regression analysis. An observation is composed of a measurement of the dependent variable, before any transformation, together with the corresponding levels of the variable factors. An experimental unit is generally defined as the unit of material to which a single treatment (combination of levels of the variable factors) is applied. A sampling unit is that part of the experimental unit on which the effects of a treatment are measured (7, p. 90). Frequently, when chemical analyses are carried out to measure a functional property of a food protein material, multiple determinations will be made on each sampling unit and/or more than one sample will be drawn from each experimental unit. The problem is whether to include the result of each determination (or sample) as an observation or to average determinations (or samples) to obtain a single observation for each sampling (or experimental) unit.

Three rules have been recommended for making such a choice (10). Suppose the number of experimental units is small, the coefficient of variation for each functional property measured within each experimental unit is small, and the determinations (or samples) are relatively homogeneous. Then each determination (or sample) may be included as a separate observation, and the values of the variable factors as set within each experimental unit are repeated for each observation based on that experimental unit. If, on the other hand, the number of experimental units is high and the coefficient of variation for each functional property measured within each experimental unit is high, then all of the values determined (or sampled) of the functional property within each experimental unit should be averaged to obtain one observation per experimental unit.
Either method is recommended if the number of experimental units is large and the coefficients of variation within the experimental units are small. If the number of experimental units is small and the coefficients of variation within the experimental units are large, the number of experimental units should be expanded and the determination (or sample) values averaged to obtain a single observation per experimental unit (10).

Examining the Data

Once an experimental design is selected and the data are collected, preliminary analyses of the data should be carried out. Some simple statistics, such as means, ranges and standard deviations, should be calculated to familiarize the researcher with the data and to serve as bases for comparison with previous research. These statistics can also be useful when graphing equations estimated later.

A second type of analysis which can be extremely useful under some conditions is plotting the dependent variable (functional property) data against the corresponding levels of each variable factor. When only one variable factor is included in the experiment, such a plot will allow the researcher to see whether the data:

- fall on a straight line,
- fall in a curvilinear pattern,
- show any evidence of discontinuity around any level of the variable factor, or
- indicate very little relationship between levels of the functional property and those of the variable factor (shown by an obvious scattering of data points with no discernible pattern).

Figure 1 shows a possible plot of points for an experiment. Figure 2 shows a curve which might fit the data in Figure 1 and also indicates the form of the equation graphed in Figure 2, $Y = a + b_1 X + b_2 X^2$. In this case, two independent variables, $X$ and $X^2$, are formed from the one variable factor, $X$, in order to fit a suitable nonlinear equation. However, it should be noted that some nonlinear equation forms which may be most suitable for some data are nonlinear in the coefficients. These cannot be estimated using ordinary regression techniques; techniques for estimating them have been discussed elsewhere (8, 9).

One problem which may sometimes be most easily detected using plots of the data is that of "outliers", or "bad" data points. These may have resulted from improper application of experimental techniques, incorrect measurements, or other factors not accounted for in the experimental design. Such data may be excluded from the regression analysis.
However, care should be taken not to exclude legitimate data points. Such points may arise from random variation in a functional property, or from variation due to the consistent influence of variable factors which should have been included in the analysis (factors whose influence could not have been excluded).

When more than one variable factor is included in an experiment, however, plotting the functional property studied against each of the variable factors may not be useful and may indeed be misleading. Plotting of data points when there are two or more variable factors is generally useful only when there are several values of a variable factor for each of one or more sets of fixed values of all other factors. Otherwise, separation of the influences of different factors on the functional property will usually be impossible on the basis of plots of the data. Plotting may not be usable in selecting a proper geometric representation (and the corresponding equation form) of the influence of each variable factor being studied. In that case, accepted theory and past research results may aid the researcher in selecting the variable and equation forms to be used. Within limits, the results of t-tests of regression coefficients may be used to select the most appropriate variables and variable transformations for use in the final model. Thorough discussions of plotting, equation forms, variable transformations and selection of independent variables have been published elsewhere (8, Chapters 3-6; Chapters 3-8).

Figure 1. Hypothetical plot of data points

Figure 2. Hypothetical graph of a quadratic equation
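The single-factor case illustrated by Figures 1 and 2 can be sketched as follows. This sketch assumes NumPy, and the helper `ols_t_values` and the simulated data are hypothetical. It forms the two independent variables $X$ and $X^2$ from one factor and reports the t-value of each coefficient. Within limits, this is the kind of evidence that can guide whether the $X^2$ term belongs in the final model.

```python
import numpy as np

def ols_t_values(design, y):
    """Hypothetical helper: OLS coefficients and t-values for a design
    matrix that already contains the intercept column."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    n, p = design.shape
    mse = resid @ resid / (n - p)                          # estimated error variance
    se = np.sqrt(np.diag(mse * np.linalg.inv(design.T @ design)))
    return coef, coef / se

# Simulated curvilinear data for one variable factor X (illustrative only).
rng = np.random.default_rng(1)
x = np.repeat(np.arange(2.0, 11.0), 3)          # 9 levels, 3 observations each
y = 10 + 8 * x - 0.5 * x**2 + rng.normal(0, 1.0, x.size)

# Two independent variables, X and X^2, formed from the one factor X.
design = np.column_stack([np.ones_like(x), x, x**2])
coef, t = ols_t_values(design, y)
print("a, b1, b2 =", np.round(coef, 3))
print("t-values  =", np.round(t, 2))
```

With these simulated data, the $X^2$ coefficient has a large negative t-value. This is consistent with the downward-curving shape used to generate the data.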
An Application of Multiple Regression Techniques

At this point, it seems useful to examine an example of the application of multiple regression techniques to the analysis of experimental data for which results have already been obtained and published. McWatters and Holmes (4) developed multiple regression models of the effects of pH and salt concentration on functional properties of soy flour. Design of the experiment and selection of the factors to be included were based, in part, on earlier findings that the emulsion capacity of defatted peanut meal was inhibited around the isoelectric point (ca. pH 4.0) (2). In the soy flour research, three levels of salt concentration (0.0, 0.1 and 1.0 M NaCl) were used, and nine levels of pH from 2.0 through 10.0 were used with all three levels of salt concentration.

When measurements of emulsion capacity were plotted against pH within each level of salt concentration, a number of conclusions were evident. First, the data plots at the high salt concentration (1.0 M NaCl) were radically different from the plots at the lower salt concentrations (0.0 and 0.1 M NaCl). At the high salt concentration, the plot resembled a smoothly rising curve which flattened out at the high end of the pH range. At the lower salt concentrations, however, emulsion capacity tended to drop off sharply at pH levels around 4.0. Furthermore, the increases in capacity as pH was increased above 4.0 or decreased below 4.0 were curvilinear rather than linear, as in Figure 3.
Much the same effects were observed for emulsion viscosity and percent soluble nitrogen. As noted in the article, all of these effects were due to the combined effects of salt concentration and pH on electrical charges on the particles in solution. Given these plots, regression equations were estimated which modelled the data very well. The equations are presented in Table 1 (Parts 1 and 2).

There are a number of key points to be made about the variables used in the equations. First, the equation forms used for the high salt concentration (1.0 M NaCl) are simple quadratic and cubic forms using pH and the square and cube of pH as independent variables. The high salt concentration negated the emulsion-inhibiting effects of the isoelectric point. The percent of the variation in the functional properties accounted for by these equations ranged from about 80 percent for emulsion viscosity to over 98 percent for emulsion capacity.
Figure 3. Hypothetical graph of a discontinuous function
Table 1 (Part 1). Multiple regression models for prediction of nitrogen solubility and emulsifying properties of soy flour as influenced by pH and salt concentration.

| Dependent variable with salt concentration | Variable description | Regr. coeff. | Beta value | t value (a) |
|---|---|---|---|---|
| % Soluble N with 0.0 & 0.1 M NaCl (R² = 0.903; standard error of estimate = 10.107) | Constant | 15.935 | | |
| | pH-4.0 (b) | 29.715 | 2.165 | 8.63** |
| | 4.0-pH (c) | 72.740 | 1.635 | 5.89** |
| | Salt level | -140.472 | -0.237 | -4.17** |
| | (4.0-pH)² | -16.304 | -0.691 | -2.69** |
| | (pH-4.0)² | -2.597 | -1.075 | -4.71** |
| % Soluble N with 1.0 M NaCl (R² = 0.905; standard error of estimate = 4.659) | Constant | 22.283 | | |
| | pH | 15.658 | 6.830 | 6.83** |
| | pH² | -0.917 | -4.886 | -4.89** |
| Emulsion capacity with 0.0 & 0.1 M NaCl (R² = 0.838; standard error of estimate = 0.081) | Constant | 0.221 | | |
| | pH-4.0 (b) | 0.183 | 2.226 | 6.41** |
| | 4.0-pH (c) | 0.451 | 1.694 | 4.44** |
| | Salt level | -2.419 | -0.682 | -4.59** |
| | (4.0-pH)² | -0.165 | -1.166 | -3.39** |
| | (pH-4.0)² | -0.023 | -1.585 | -5.20** |
| | (Salt level)(pH-4.0) | 0.592 | 0.642 | 4.00** |
| | (Salt level)(4.0-pH) | 1.114 | 0.314 | 2.32* |

a * Significant at the 5% level; ** significant at the 1% level.
b The variable described as "pH-4.0" is given a value of zero unless the pH is greater than 4.0.
c The variable described as "4.0-pH" is given a value of zero unless the pH is less than 4.0.
Source: McWatters, K. H., and Holmes, M. R. (4). Journal of Food Science.
Table 1 (Part 2). Multiple regression models for prediction of nitrogen solubility and emulsifying properties of soy flour as influenced by pH and salt concentration.

| Dependent variable with salt concentration | Variable description | Regr. coeff. | Beta value | t value (a) |
|---|---|---|---|---|
| Emulsion capacity with 1.0 M NaCl (R² = 0.983; standard error of estimate = 0.010) | Constant | 0.155 | | |
| | pH | 0.108 | 21.020 | 21.02** |
| | pH² | -0.007 | -16.837 | -16.84** |
| Emulsion viscosity with 0.0 & 0.1 M NaCl (R² = 0.785; standard error of estimate = 10,210.917) | Constant | 4,701.905 | | |
| | pH-4.0 (b) | 18,031.429 | 1.908 | 5.18** |
| | 4.0-pH (c) | 117,147.143 | 3.825 | 9.40** |
| | (4.0-pH)² | -47,409.048 | -2.919 | -7.74** |
| | (pH-4.0)² | -2,127.619 | -1.279 | -3.82** |
| Emulsion viscosity with 1.0 M NaCl (R² = 0.796; standard error of estimate = 4,441.520) | Constant | 98,207.619 | | |
| | pH | -28,811.400 | -8.588 | -3.47** |
| | pH² | 4,945.714 | 18.003 | 3.28** |
| | pH³ | -277.172 | -10.463 | -3.33** |

a * Significant at the 5% level; ** significant at the 1% level.
b The variable described as "pH-4.0" is given a value of zero unless the pH is greater than 4.0.
c The variable described as "4.0-pH" is given a value of zero unless the pH is less than 4.0.
Source: McWatters, K. H., and Holmes, M. R. (4). Journal of Food Science.
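The hinge-type variables defined in footnotes b and c of Table 1 can be computed directly from pH. The sketch below builds one observation's independent variables for the low-salt models from pH and salt concentration. The function name `low_salt_terms` is hypothetical, and coding salt level as molarity is an assumption. Fitting these variables is then an ordinary least-squares problem.

```python
def low_salt_terms(ph, salt_m):
    """Hypothetical helper: independent variables of the form used for the
    low-salt models in Table 1.

    "pH-4.0" is zero unless pH > 4.0 (footnote b); "4.0-pH" is zero unless
    pH < 4.0 (footnote c). Salt level is assumed to be coded as molarity.
    """
    above = max(ph - 4.0, 0.0)   # pH-4.0
    below = max(4.0 - ph, 0.0)   # 4.0-pH
    return {
        "pH-4.0": above,
        "4.0-pH": below,
        "Salt level": salt_m,
        "(4.0-pH)^2": below ** 2,
        "(pH-4.0)^2": above ** 2,
        "(Salt level)(pH-4.0)": salt_m * above,
        "(Salt level)(4.0-pH)": salt_m * below,
    }

for ph in (2.0, 4.0, 10.0):
    print(ph, low_salt_terms(ph, 0.1))
```

Because `pH-4.0` and `4.0-pH` are separate columns, each side of pH 4.0 gets its own slope and curvature. The fitted curve is therefore not forced to be a mirror image about the isoelectric region.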
The models developed for the low salt concentrations (0.0 and 0.1 M NaCl) were very different from those for the high salt concentration. The two basic independent variables used were salt level and the absolute value of pH minus 4.0. Two variables were formed from the absolute value of pH minus 4.0, to avoid implying that the behavior of each functional property on one side of pH 4.0 is the mirror image of its behavior on the other side:

- the absolute value of pH minus 4.0 for each pH above 4.0 (set equal to zero for observations in which the pH was less than 4.0), and
- the absolute value of pH minus 4.0 for each pH below 4.0 (set equal to zero for observations in which the pH was greater than 4.0).

As shown in the table, these basic variables were used in the estimated models together with their squares, their interactions with salt level (0.0 and 0.1 M NaCl), and salt level itself to form the independent variables in the final equations. Other variables, such as cubic powers of pH minus 4.0, were tried in arriving at the final models and discarded because they lacked statistical significance.

A number of points should be noted concerning the statistics displayed in the table. First, if the researcher wishes to rank the variables in order of their importance within the equation, the absolute values of the beta values are the appropriate indicators of rank (7, p. 284). Second, the t-values of the regression coefficients give estimates of the statistical significance of the independent variables used.
Third, the R-square, or coefficient of determination, is an estimate of the percent of variation in the dependent variable (the functional property) explained by the corresponding regression equation.

Some Potential Problems

A number of reservations concerning use of regression should be expressed. Regression, also known as ordinary least squares, is a suitable technique for analyzing unbalanced data. Even so, it is desirable that the experimental data be as balanced as possible, that is, as close as possible to having all combinations of all levels of all variable factors in the design. Use of extremely unbalanced data may reduce the precision and/or reliability of the regression coefficients estimated, particularly if the effects of the factors are not linear.

A second consideration is that regression estimates are most precise when there is no significant correlation between the independent variables included in the model. Such correlation will exist if, as in the above equations, certain transformations of independent variables included in the model (squares, for example) are also included as variables. This will tend to increase the variances of the coefficients and may, in some cases, affect estimation precision. One way of reducing these problems is to code the data, "standardizing" each variable (11): subtract the mean of each basic factor from the value of the factor in each observation, and divide the result by the standard deviation of the factor. Any transformations are then performed on the standardized data. Regression coefficients estimated using these data must be decoded, however.
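The coding step can be illustrated with the pH factor alone. This sketch assumes NumPy, and the pH levels and response are synthetic and hypothetical. The correlation between pH and pH² is close to 1 (about 0.98 for these levels). After standardizing pH and then squaring, the correlation between $z$ and $z^2$ is essentially zero for this symmetric design. The coefficients estimated on the coded scale are then decoded by expanding $z = (\text{pH} - m)/s$, where $m$ and $s$ are the mean and standard deviation of pH.

```python
import numpy as np

# Illustrative levels: pH 2.0 through 10.0, three observations each.
ph = np.repeat(np.arange(2.0, 11.0), 3)
rng = np.random.default_rng(2)
y = 10 + 8 * ph - 0.5 * ph**2 + rng.normal(0, 1.0, ph.size)  # synthetic response

# Uncoded: the factor and its square are highly correlated.
r_raw = np.corrcoef(ph, ph**2)[0, 1]

# Code (standardize) the basic factor first, then form the square.
m, s = ph.mean(), ph.std(ddof=1)
z = (ph - m) / s
r_coded = np.corrcoef(z, z**2)[0, 1]

# Fit y = c0 + c1*z + c2*z^2 on the coded data.
Z = np.column_stack([np.ones_like(z), z, z**2])
c0, c1, c2 = np.linalg.lstsq(Z, y, rcond=None)[0]

# Decode to y = a + b1*pH + b2*pH^2 by substituting z = (pH - m)/s.
b2 = c2 / s**2
b1 = c1 / s - 2 * c2 * m / s**2
a = c0 - c1 * m / s + c2 * m**2 / s**2

# For comparison: a direct fit on the uncoded variables.
direct = np.linalg.lstsq(np.column_stack([np.ones_like(ph), ph, ph**2]), y, rcond=None)[0]
print(f"corr(pH, pH^2) = {r_raw:.3f}, corr(z, z^2) = {r_coded:.1e}")
print("decoded a, b1, b2:", np.round([a, b1, b2], 4))
print("direct  a, b1, b2:", np.round(direct, 4))
```

Both fits span the same set of columns, so the decoded coefficients agree with the direct fit to rounding error. The benefit of coding lies in the better-conditioned computation, not in a different answer.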
Uses of Estimated Models

A primary use of regression models is usually prediction of the levels of the dependent variable (or functional property) under given conditions. Such predictions are most reliable if the given conditions fall within the ranges of the conditions included in the data used in estimating the model. Yet another use might be appropriate if the regression models are considered true behavioral models of the functional properties (or good approximations thereof) and reliable models (the R-square is generally considered an indicator of the reliability of a model). If no interactions are present, then the first derivative with respect to each factor (or basic independent variable), if it exists for the particular equation form, may be taken as an estimate of the marginal effect of that factor on the functional property, or dependent variable. The marginal effect is the effect of the last unit of the factor, or basic variable, added. If a monetary value can be placed on the dependent variable, then this estimated marginal effect can be multiplied by that monetary value to obtain an estimate of the marginal revenue arising from a one-unit increase in the variable factor, or basic independent variable. If a cost can be attached to the increase in the level of the variable factor, then subtracting the marginal cost from the marginal revenue estimates the profit, or lack of profit, associated with an increase in the level of the factor used.
If interactions exist, then partial derivatives must be used rather than first derivatives. The major complication arising here is that the partial derivative can be used as specified above only by assuming that the other variable factors which appear in the partial derivative are set at fixed levels. Any competent production economist can assist in setting up procedures designed to produce such results. A number of references are available (12, 13, 14).
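The marginal calculations described above are simple once an equation is in hand. The sketch below assumes a hypothetical quadratic response $Y = a + b_1 X + b_2 X^2$. Its coefficients, price and cost are made-up values, none taken from Table 1, and the function names are illustrative. The last function shows how an interaction term $b_3 XZ$ makes the effect of $X$ depend on the level at which the other factor $Z$ is fixed.

```python
def marginal_effect_quadratic(b1, b2, x):
    """dY/dX for Y = a + b1*X + b2*X^2, evaluated at X = x."""
    return b1 + 2 * b2 * x

def marginal_profit(b1, b2, x, price_per_unit_y, cost_per_unit_x):
    """Marginal revenue (price of Y times marginal effect) minus the
    marginal cost of one more unit of the factor X."""
    marginal_revenue = price_per_unit_y * marginal_effect_quadratic(b1, b2, x)
    return marginal_revenue - cost_per_unit_x

def partial_effect_with_interaction(b1, b2, b3, x, z_fixed):
    """Partial derivative dY/dX for Y = a + b1*X + b2*X^2 + b3*X*Z,
    holding the other factor Z fixed at z_fixed."""
    return b1 + 2 * b2 * x + b3 * z_fixed

# Hypothetical numbers for illustration only.
print(marginal_effect_quadratic(8.0, -0.5, 6.0))                  # 2.0
print(marginal_profit(8.0, -0.5, 6.0, 3.0, 4.0))                  # 2.0
print(partial_effect_with_interaction(8.0, -0.5, 1.5, 6.0, 0.1))
```

A positive marginal profit suggests that one more unit of the factor pays for itself at that level. The value changes with $X$, and with $Z$ when an interaction is present.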
Conclusions

Multiple regression techniques can be used in modelling the effects of varying levels of environmental factors on functional properties of plant proteins. Use of multiple regression may allow the researcher not only to test the statistical significance of these effects, but also to estimate the magnitudes of the effects of the environmental factors on the behavior of functional properties. Multiple regression models have been found useful in estimating effects of such factors as moist heat, pH, and salt concentration on solubility and emulsifying properties of plant proteins. Some of these effects have been found to be nonlinear and discontinuous. Use of the technique, however, is no substitute for good experimental design or knowledge of the data.
Literature Cited

1. Cherry, J. P.; McWatters, K. H.; Holmes, M. R. J. Food Sci., 1975, 40, 1199.
2. McWatters, K. H.; Cherry, J. P.; Holmes, M. R. J. Agric. Food Chem., 1976, 24, 517.
3. McWatters, K. H.; Holmes, M. R. J. Food Sci., 1979, 44, 765.
4. McWatters, K. H.; Holmes, M. R. J. Food Sci., 1979, 44, 770.
5. McWatters, K. H.; Holmes, M. R. J. Food Sci., 1979, 44, 774.
6. Sefa-Dedeh, S.; Stanley, D. J. Agric. Food Chem., 1979, 27, 1238.
7. Steel, R. G. D.; Torrie, J. H. Principles and Procedures of Statistics; McGraw-Hill, New York, 1960.
8. Daniel, C.; Wood, F. S. Fitting Equations to Data: Computer Analysis of Multifactor Data for Scientists and Engineers; Wiley, New York, 1971.
9. Draper, N. R.; Smith, H. Applied Regression Analysis; Wiley, New York, 1966.
10. Kubala, J. J.; Gacula, M. C.; Moran, M. J. J. Food Sci., 1974, 39, 209.
11. Snee, R. D. J. Quality Technology, 1973, 5 (2), 67.
12. Heady, E. O. Economics of Agricultural Production and Resource Use; Prentice-Hall, Englewood Cliffs, N.J., 1952.
13. Carlson, S. A Study on the Pure Theory of Production; Kelley and Willman, New York, 1956.
14. Allen, C. L. Elementary Mathematics of Price Theory; Wadsworth Pub. Co., Belmont, CA, 1962.

RECEIVED September 5, 1980.