Abstract
Partial least squares regression was used to relate the UV-visible spectra of 200 red wine samples, taken at various stages of fermentation, to the concentrations of several groups of phenols determined by the Harbertson-Adams assay. Prediction functions for each phenolic class were obtained using multivariate methods and were then used to estimate the corresponding phenolic measures in 200 independent samples. The correlations between predicted and measured values had coefficients of determination (r2) of 0.88 for anthocyanins, 0.86 for tannins, 0.82 for nontannin iron-reactive phenols, 0.88 for total phenols, 0.82 for small polymeric pigments, 0.41 for large polymeric pigments, and 0.76 for total polymeric pigments. The method has potential for the rapid determination of these color and phenol components during the fermentation of red wines.
Phenolic compounds are important contributors to the antioxidant properties and to the color and mouthfeel of red wine (Singleton and Rossi 1965, Singleton and Noble 1976, Thorngate and Noble 1995, Brossaud et al. 2001). Because of their importance to sensory perception, phenolic compounds should be readily quantifiable at all stages of winemaking, yet rapid, comprehensive methods to measure a panel of phenolic compounds in wines do not currently exist. Although a plethora of analytical methods is available, all have limitations that hinder widespread industry adoption. The Folin-Ciocalteu method, by itself, cannot discriminate phenolics by molecular size and is unable to distinguish tannins from polymeric pigments (Singleton and Rossi 1965). The sensitivity of dimethyl-cinnamaldehyde staining is sample-dependent (Nagel and Glories 1991). The Somers color, Boulton copigmentation, and Harbertson-Adams phenolic parameter assays are complex, multistep analyses with compounding errors (Somers and Evans 1977, Boulton 2001, Harbertson et al. 2003). Methods based on high-pressure liquid chromatography (HPLC) circumvent some of these problems but require costly analytical equipment (Wulf and Nagel 1976, 1978, Lamuela-Raventós and Waterhouse 1994, Donovan et al. 1998, Peng et al. 2002). The greatest limitation of all current methods is the time required to generate usable information. Consequently, developing rapid methods capable of real-time monitoring of phenolic composition during fermentation is a high priority for both industry and research.
To be useful and gain widespread industry adoption, an analytical method must require only minimal sample preparation and produce results on multiple parameters rapidly. Spectroscopy-based predictive methods are the most attractive alternative to current methods. These use multivariate statistics to correlate information in sample spectra with results from a reference analytical method. Once a mathematical model is built and validated, the composition of unknown samples can be obtained rapidly from their spectra. To date, such methods for predicting grape and wine composition have focused primarily on infrared (IR) spectroscopy. Predictive methods based on mid-infrared (MIR) and near-infrared (NIR) spectroscopy have been widely adopted in the agricultural, food, pharmaceutical, and beverage industries (Osbourne et al. 1993, Barton and Kays 2001, Anderson et al. 2002). In the wine industry, MIR-based methods are used to measure parameters throughout the winemaking process, including ethanol, total acid, volatile acidity, organic acids, pH, sugars, glycerol, methanol, and CO2 (Patz et al. 1999, Dubernet and Dubernet 2000, Anderson et al. 2002, Kupina and Shrikhande 2003).
NIR spectroscopy has also been employed in a more limited fashion to quantify phenolic compounds in grapes and wine. These include the measurement of anthocyanins in berry homogenates by reflectance visible-NIR (Dambergs et al. 2004) and the quantitative analysis of anthocyanins, polymeric pigments, and tannin in red wine fermentations by transmission visible-NIR (Cozzolino et al. 2004). Additionally, NIR-based predictive methods have been reported for the determination of monomeric phenolics in tea and condensed tannins in forage legumes (Schulz et al. 1999, Smith and Kelman 1997). These results presage the utility of such an approach for widespread use in compositional studies of grapes and wines.
One challenge with IR spectroscopic methods is that the predictive strength of a model is sensitive to changes in the sample matrix. The published literature on applications of visible-NIR spectroscopy to grape and wine analysis suggests that cultivar, soil type, climate, and season all contribute to these matrix effects (Dambergs et al. 2004, Cozzolino et al. 2004). In addition, the need for costly, specialized instrumentation and for ongoing validation across sites, seasons, and cultivars has prevented the adoption of this technology in small and medium-sized wineries, particularly for routine phenolic analysis. Consequently, there remains a need for rapid, comprehensive analytical methods capable of measuring multiple phenolic components simultaneously.
The UV-visible portion of the electromagnetic spectrum contains distinguishing information regarding phenolic compounds. The anthocyanins have absorbance features at 267–275 nm and 475–545 nm, the benzoic acids show a single absorbance in the region 235–305 nm, hydroxycinnamic acids have absorbance maxima at 227–245 nm and 310–332 nm, and flavonols typically have maxima in the regions 250–270 nm and 350–390 nm, while the flavan-3-ols (catechins) have a single absorbance band around 280 nm (Jurd 1962, Harborne 1989).
While UV-visible spectroscopy has been widely applied in traditional colorimetric assays as described above, it has also been used with chemometrics in the development of predictive methods. These methods include determination of individual component concentrations in pharmaceutical mixtures (Bouhsain et al. 1997, Ghasemi and Vosough 2002), simultaneous determination of methylxanthines in coffees and teas (López-Martínez et al. 2003), air and water contaminant measures (Martin and Otto 1995, Cirovic et al. 1996, Dahlén et al. 2000), and real-time monitoring of changes in phenolic concentrations across a membrane (García et al. 2001).
UV-visible chemometric methods have also previously been applied in enological research to predict measures including pH, total and volatile acidities, free SO2, anthocyanins, and tannins in finished red wines (García-Jares and Médina 1995). Spectra were collected from 200–650 nm at 6-nm intervals on a sample set (n = 34) from Bordeaux including several regions, cultivars, and new and aged wines. Despite the limitations imposed by the analytical methods chosen for these analyses and the small sample set, the data demonstrated the potential of UV-visible spectra to predict the concentrations of anthocyanins and tannins in red wine.
The current study sought to investigate the application of multivariate methods to UV-visible spectra of wine samples to develop predictive models for determining a range of phenolic components at all stages of winemaking and to evaluate them over a wide range of grape cultivars and growing regions. The protein precipitation and iron reaction assay, also referred to as the Harbertson-Adams assay (Harbertson et al. 2003), was chosen as the reference analytical method since it has proven to be a robust procedure that provides a comprehensive set of phenolic measurements of interest to winemakers.
Materials and Methods
Sample collection.
A total of 362 samples were collected at various stages of commercial fermentations at the Yalumba and Orlando-Wyndham wineries in South Australia during the 2006 harvest. The samples were collected in 21-mL glass scintillation vials with polypropylene screw-cap lids and stored at room temperature with sodium azide (final concentration 0.1%) to prevent further microbial activity. An additional 87 finished wines were drawn from a collection at the Department of Primary Industries, Victoria. To capture site-related variation in composition, samples were drawn from commercial vineyards in 15 different growing regions in Victoria and South Australia. The majority of the samples came from six regions: Barossa Valley (n = 128), Sunraysia (87), Riverland (52), Langhorne Creek (48), McLaren Vale (30), and Clare (27). Other samples were from Padthaway (22), Coonawarra (15), Adelaide Hills (9), Bordertown (5), Barmera (4), Blanchetown (4), Renmark (4), Loxton (3), and Waikerie (2). The remaining samples (n = 9) had no origin specified. Cultivars represented in this sample set were Shiraz (n = 215), Cabernet Sauvignon (133), Merlot (49), Pinot noir (13), Grenache (12), Tempranillo (5), and Chardonnay (6). The remaining 16 samples had no cultivar information.
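As a consistency check on the sample accounting, the region and cultivar counts listed above can be tallied; both should equal the 449 samples collected (362 fermentation samples plus 87 finished wines). A minimal Python sketch, in which the "Unspecified" key is a label introduced here for the samples with no recorded origin or cultivar:

```python
# Counts transcribed from the text; "Unspecified" (a label introduced here)
# holds samples with no recorded origin or cultivar.
regions = {
    "Barossa Valley": 128, "Sunraysia": 87, "Riverland": 52,
    "Langhorne Creek": 48, "McLaren Vale": 30, "Clare": 27,
    "Padthaway": 22, "Coonawarra": 15, "Adelaide Hills": 9,
    "Bordertown": 5, "Barmera": 4, "Blanchetown": 4, "Renmark": 4,
    "Loxton": 3, "Waikerie": 2, "Unspecified": 9,
}
cultivars = {
    "Shiraz": 215, "Cabernet Sauvignon": 133, "Merlot": 49,
    "Pinot noir": 13, "Grenache": 12, "Tempranillo": 5,
    "Chardonnay": 6, "Unspecified": 16,
}

collected = 362 + 87                # fermentation samples + finished wines
named_regions = len(regions) - 1    # exclude the "Unspecified" bucket

print(collected, sum(regions.values()), sum(cultivars.values()), named_regions)
# -> 449 449 449 15
```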
Sample analysis.
The samples were analyzed by the Harbertson-Adams assay for anthocyanins, pigmented polymers, tannins, total phenols, and nontannin phenols (Harbertson et al. 2003). The complete protocol for analysis of these parameters is available on the University of California, Davis website (http://boulton.ucdavis.edu/uv-vis/index.htm). Fermentation samples were centrifuged at 6,000 rpm for 10 min (model Rotofix 32; Hettich AG, Bach, Switzerland) to remove gross solids before analysis. Finished wine samples were assayed directly.
All spectra were collected within 24 hr of performing the analytical assay. Aliquots (1.0 mL) of both fermentation samples and finished wines were centrifuged in a benchtop centrifuge (model 5415D; Eppendorf, Westbury, NY) for 10 min (13,200 rpm; 16,110 × g) prior to collecting spectra.
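The speed (rpm) and relative centrifugal force (× g) quoted above are linked through the rotor radius by the standard relation RCF = 1.118 × 10⁻⁵ · r · N², with r in centimeters and N in rpm. The sketch below, which assumes that relation, back-calculates the effective radius implied by 16,110 × g at 13,200 rpm; the helper names are hypothetical. The same relation shows why the 6,000 rpm used for the fermentation samples cannot be converted to × g without the Rotofix 32 rotor radius, which the text does not give.

```python
RCF_CONSTANT = 1.118e-5  # standard conversion constant for radius in cm, speed in rpm


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (multiples of g) at a given speed and radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def radius_from_rcf(rcf: float, rpm: float) -> float:
    """Effective rotor radius (cm) implied by a stated RCF at a given speed."""
    return rcf / (RCF_CONSTANT * rpm ** 2)


radius = radius_from_rcf(16_110, 13_200)
print(f"implied radius: {radius:.2f} cm")  # -> implied radius: 8.27 cm
```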
UV-visible spectra were collected with a spectrophotometer (model SP8001; Metertech, Taipei, Taiwan) using 2-mm path-length UVettes (Eppendorf). Samples were scanned from 230–900 nm at ~0.17-nm intervals, with water as the reference. Spectra of all fermentation samples were collected both undiluted and at a 5-fold dilution with water. Finished wines were diluted 3-fold with water, and spectra were likewise collected on both diluted and undiluted samples. The spectral data were corrected for dilution and reduced to 1-nm intervals for the multivariate statistical analysis.
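The dilution correction and reduction to 1-nm intervals can be sketched as follows. The text does not say how either step was done; this sketch assumes absorbance scales linearly with concentration (Beer-Lambert), so the diluted spectrum is multiplied by the dilution factor, and that the ~0.17-nm data were reduced by linear interpolation onto a 230–900 nm grid. The function name and the synthetic spectrum are illustrative only.

```python
import numpy as np


def correct_and_resample(wavelengths_nm, absorbance, dilution_factor,
                         start_nm=230, stop_nm=900):
    """Undo a dilution and place a spectrum on a 1-nm wavelength grid.

    Assumes Beer-Lambert linearity (absorbance proportional to concentration)
    and uses linear interpolation; neither choice is specified in the text.
    """
    wl = np.asarray(wavelengths_nm, dtype=float)
    a = np.asarray(absorbance, dtype=float) * dilution_factor  # dilution correction
    grid = np.arange(start_nm, stop_nm + 1, dtype=float)       # 230, 231, ..., 900 nm
    return grid, np.interp(grid, wl, a)                        # wl must be increasing


# Synthetic example: a single band near 520 nm measured at a 5-fold dilution.
raw_wl = np.linspace(230, 900, 3942)                 # ~0.17-nm steps
raw_abs = 0.2 * np.exp(-((raw_wl - 520) / 40) ** 2)
grid, spectrum = correct_and_resample(raw_wl, raw_abs, dilution_factor=5)
print(grid.size, round(float(spectrum[grid == 520][0]), 2))  # -> 671 1.0
```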
NIR spectra were also collected for all undiluted samples using an InfraXact unit (FOSS, Hillerod, Denmark) equipped with the small sample cup (60-mm diam) and a 0.2-mm gold reflector. Samples were scanned from 570–1098 nm and from 1100–1848 nm, both at 2-nm intervals.
Statistical analysis.
Data analysis was performed using Unscrambler (version 9.0; CAMO AS, Oslo, Norway). Partial least squares (PLS) regression was applied to relate the UV-visible spectra (and, separately, the NIR spectra) to the Harbertson-Adams assay data, using full cross-validation. In this leave-one-out approach, the model is recalculated repeatedly with one sample withheld and that sample is then predicted, until every sample has been withheld once. The Unscrambler software determined the optimal number of principal components, with the maximum limited to 10.
Of the 449 samples collected, 20 could not be used because of inadequate spectrum quality. An additional 29 samples were identified by Unscrambler as spectral outliers in a principal component analysis and were removed prior to the PLS calculations.
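The text does not describe the criterion Unscrambler used to flag the 29 spectral outliers. One common approach, sketched here as an assumption, is to compute each spectrum's Hotelling T² on the first few principal-component scores and flag values above an F-distribution-based limit; the number of components and the significance level are illustrative choices, and the function name is hypothetical. A residual (Q) statistic would additionally be needed to catch spectra that lie off the principal-component model rather than far along it.

```python
import numpy as np
from scipy.stats import f


def hotelling_t2_outliers(X, n_components=3, alpha=0.05):
    """Flag rows of X (spectra) whose Hotelling T^2 exceeds an F-based limit.

    Returns (t2 values, limit, boolean outlier flags).
    """
    X = np.asarray(X, dtype=float)
    n, a = X.shape[0], n_components
    Xc = X - X.mean(axis=0)                          # mean-center each wavelength
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :a] * s[:a]                        # PCA scores
    score_var = s[:a] ** 2 / (n - 1)                 # variance of each score
    t2 = np.sum(scores ** 2 / score_var, axis=1)
    limit = a * (n - 1) / (n - a) * f.ppf(1 - alpha, a, n - a)
    return t2, limit, t2 > limit


# Synthetic data: 50 spectra from 3 latent factors; sample 0 is pushed far out.
rng = np.random.default_rng(1)
latent = rng.normal(size=(50, 3))
latent[0] = [15.0, 15.0, 15.0]
X = latent @ rng.normal(size=(3, 60)) + 0.01 * rng.normal(size=(50, 60))

t2, limit, flags = hotelling_t2_outliers(X)
print(int(np.argmax(t2)), bool(flags[0]))  # -> 0 True
```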
The remaining 400 samples were randomly split into two 200-sample data sets. The first, the calibration data set, was used for model construction; the second, the validation data set, was used to test the model independently. Regression analysis confirmed that no significant correlation existed between the calibration and validation sample sets for any phenolic parameter, with coefficients of determination (r2) ranging from 0.0001 for total phenols to 0.0471 for anthocyanins (data not shown).
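The sample accounting and random split described above can be sketched as follows (Python with NumPy). The seed and variable names are arbitrary choices for illustration; the text does not state how the randomization was performed.

```python
import numpy as np

collected, poor_quality, spectral_outliers = 449, 20, 29
retained = collected - poor_quality - spectral_outliers   # 400 samples

rng = np.random.default_rng(seed=42)                      # arbitrary seed
order = rng.permutation(retained)                         # shuffled sample indices
calibration_idx, validation_idx = order[:200], order[200:]

print(retained, calibration_idx.size, validation_idx.size)  # -> 400 200 200
```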
The statistics used to describe and compare model performance include the root mean square error of cross-validation (RMSECV), the root mean square error of prediction (RMSEP), and the coefficient of determination (r2) between measured and predicted values. The RMSECV is the root-mean-square difference between the values determined by the laboratory method and those predicted by the model during the full cross-validation procedure; it indicates the error to be expected in future predictions on samples not included in the calibration data set. The RMSEP is the corresponding root-mean-square difference between values predicted by the model and values determined by the laboratory method during independent testing (validation) of the model, and can be interpreted as the average prediction error. Both are expressed in the same units as the original laboratory analysis.
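The three statistics can be written as short functions. This sketch assumes RMSECV and RMSEP share the usual root-mean-square formula and differ only in which predictions are supplied (cross-validated predictions for the calibration set versus model predictions for the independent validation set). It computes r2 as the squared Pearson correlation between measured and predicted values, which equals the coefficient of determination of a simple linear fit of one on the other.

```python
import numpy as np


def rmse(measured, predicted):
    """Root mean square error, in the units of the laboratory method.

    Cross-validated predictions give RMSECV; predictions for the
    independent validation set give RMSEP.
    """
    m = np.asarray(measured, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((p - m) ** 2)))


def r_squared(measured, predicted):
    """Squared Pearson correlation between measured and predicted values."""
    r = np.corrcoef(measured, predicted)[0, 1]
    return float(r ** 2)


# Illustrative numbers only (mg/L), not data from this study.
measured = [100, 200, 300, 400]
predicted = [110, 190, 320, 390]
print(round(rmse(measured, predicted), 2))  # -> 13.23
```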
Results
Juice, fermenting must, and finished wine samples were collected, and all 400 samples were assayed for anthocyanins, pigmented polymers, tannins, and total phenols using the Harbertson-Adams assay (Table 1). Anthocyanins ranged from zero in Chardonnay wines to 1100 mg/L malvidin-3-O-glucoside equivalents in a Shiraz wine from the Bordertown region of South Australia. Pigmented polymers, expressed as absorbance at 520 nm (A520), were measured in all samples; values ranged from zero in white wine samples to 6.6 absorbance units (AU) in a Cabernet Sauvignon sample from the Barossa Valley. Large polymeric pigments, defined as those precipitated by protein, ranged from below the limit of detection to 3.1 AU in a Cabernet Sauvignon sample from the Barossa Valley, while the soluble small polymeric pigment fraction ranged from zero to 4.2 AU in a Shiraz sample from the same region. Negative values at the minimum of the range for large polymeric pigments and total tannins are artifacts of the method at very low levels of these compound classes and should be considered essentially zero. Tannin concentrations ranged from essentially zero in white wines to ~800 mg/L in a Merlot sample from the Barossa Valley and a Cabernet Sauvignon sample from Coonawarra. Total phenolics ranged from ~20 mg/L in Chardonnay wines to 2272 mg/L in the same Shiraz sample that had the highest anthocyanin level. This sample also had the highest level of nontannin phenols, ~1500 mg/L, in contrast to 18 mg/L in a Chardonnay wine.
UV-visible spectra were collected on all 400 of these samples (diluted and undiluted) and both the assay and spectral data for the calibration samples were used to build a predictive model for wine phenolics. Spectra of both undiluted and diluted wines were analyzed separately. The coefficients of determination (r2) for the diluted sample set were equivalent to or higher than for the undiluted sample set for all parameters (Table 2). The corresponding RMSECV values were equivalent to or lower. Because of the improved predictive ability of the model based on diluted samples, the remaining work focused on the diluted spectral data set only.
With the exception of large polymeric pigment, the coefficient of determination (r2) for every parameter in the diluted data set used for calibration (Table 2) was 0.82 or greater, ranging from 0.82 for both total and small polymeric pigment to 0.91 for total phenols. The average difference between predicted and measured values (RMSECV) was 77 mg/L malvidin-3-O-glucoside equivalents for anthocyanins, 0.53 AU for total polymeric pigments (small plus large), 56 mg/L for tannins, and 118 mg/L for total phenols. Large polymeric pigments were poorly estimated, with a very large error (0.38 AU) relative to the range (−0.5 to 3.1 AU) and the mean (0.5 AU).
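To make "very large relative to the range and the mean" concrete, the quoted RMSECV values can be expressed as percentages. A minimal arithmetic sketch; the anthocyanin range (0 to 1100 mg/L) is taken from the full 400-sample set reported in the Results rather than from the calibration set alone, so that percentage is approximate.

```python
def percent_of(error, reference):
    """Express an error as a percentage of a reference quantity."""
    return 100.0 * error / reference


# Large polymeric pigments (AU): RMSECV 0.38, range -0.5 to 3.1, mean 0.5.
lpp_vs_range = percent_of(0.38, 3.1 - (-0.5))   # about 10.6 %
lpp_vs_mean = percent_of(0.38, 0.5)             # about 76 %

# Anthocyanins (mg/L malvidin-3-O-glucoside eq.): RMSECV 77, range about 0 to 1100.
anth_vs_range = percent_of(77, 1100 - 0)        # about 7 %

print(f"{lpp_vs_range:.1f} {lpp_vs_mean:.0f} {anth_vs_range:.1f}")  # -> 10.6 76 7.0
```

By this measure the large polymeric pigment error amounts to roughly three-quarters of the mean value of that parameter.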
The predictive model for the Harbertson-Adams assay phenolic parameters developed from the calibration data set was then tested with the 200 independent validation samples. For each parameter, measured values were plotted against the values predicted by the model from the UV-visible spectrum of each sample (Figure 1). There was a strong positive correlation between measured and predicted values for each Harbertson-Adams assay phenolic parameter with the exception of large polymeric pigments, for which the coefficient of determination (r2) was only 0.41. By comparison, the coefficient of determination for the correlation between measured and predicted values was 0.88 for anthocyanins and 0.86 for tannins.
The RMSECV and RMSEP values for the Harbertson-Adams assay parameters were determined for the calibration and validation data sets, respectively (Table 3). The predictive errors were similar for both the model (RMSECV) and the independent test samples (RMSEP).
This research also investigated whether the UV-visible-based predictive models for phenolics are independent of the grape-cultivar matrix effects that have been a concern for IR-based models. Shiraz and Cabernet Sauvignon models for anthocyanins and tannins were developed using the 191 Shiraz samples and the 119 Cabernet Sauvignon samples, respectively. In each case, the remaining samples of the 400-sample set served as the validation data set. Thus, the Shiraz model was tested against 209 independent samples, while the Cabernet Sauvignon model was tested against 281 independent samples. There were insufficient calibration samples to develop models for other cultivars.
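The single-cultivar design (calibrate on one cultivar, validate on every other sample in the 400-sample set) can be sketched with boolean masks. The label vector below is hypothetical: it reproduces only the stated counts (191 Shiraz, 119 Cabernet Sauvignon, and 90 other samples) and lumps the remaining cultivars together under an illustrative "Other" label.

```python
import numpy as np


def cultivar_split(labels, calibration_cultivar):
    """Indices of calibration samples (one cultivar) and validation samples (all others)."""
    labels = np.asarray(labels)
    in_cal = labels == calibration_cultivar
    return np.flatnonzero(in_cal), np.flatnonzero(~in_cal)


# Hypothetical labels matching the stated counts; "Other" lumps the remaining cultivars.
labels = ["Shiraz"] * 191 + ["Cabernet Sauvignon"] * 119 + ["Other"] * 90

for cultivar in ("Shiraz", "Cabernet Sauvignon"):
    cal, val = cultivar_split(labels, cultivar)
    print(cultivar, cal.size, val.size)
# -> Shiraz 191 209
# -> Cabernet Sauvignon 119 281
```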
For the Shiraz and Cabernet Sauvignon calibration data sets, the range, mean, and standard deviation (SD) of anthocyanins and tannins were determined, together with the coefficient of determination (r2) and RMSECV of the Shiraz and Cabernet Sauvignon anthocyanin and tannin models (Table 4). The coefficient of determination (r2) and RMSEP from independent testing of the ability of these models to predict anthocyanin and tannin concentrations in the validation samples were also determined.
The Cabernet Sauvignon anthocyanin model predicted anthocyanins in samples of all other cultivars relatively well; the predictive ability of the Shiraz model was weaker. Both models had coefficients of determination comparable to that of the full model built and tested using all cultivars in the 400-sample data set (calibration r2 = 0.89, Table 2; validation r2 = 0.88, Figure 1). Despite the similarity in these values, the errors for the full model (RMSECV = 77 mg/L and RMSEP = 87 mg/L, Table 3) were lower than those for either single-cultivar model (Table 4), suggesting that the full model has superior predictive ability.
The ability of the Shiraz and Cabernet Sauvignon tannin models to predict tannins in all other cultivars in the complete data set was relatively strong with coefficients of determination (r2) greater than 0.9 in the calibration data set and r2 = 0.8 in both validation data sets (Table 4). In contrast to the anthocyanin case, the predictive ability of the single cultivar models for tannin was similar to that of the full model.
Discussion
UV-visible-based predictive methods have previously been used to determine anthocyanins and tannins in finished wines (García-Jares and Médina 1995). More recent work has focused on MIR and NIR spectroscopy, including the use of MIR to predict the phenolic components measured in the current study (Nakaji 2004) and the use of NIR to predict HPLC-quantified anthocyanins, polymeric pigments, and tannins (Cozzolino et al. 2004). Although all of these studies generated models for a range of phenolic classes, none reported validation data. A comparison of these models with the current work must therefore rely on the RMSECV associated with the calibration data set of each analysis. Although the present study focused on UV-visible spectra, NIR spectra were also collected and analyzed, and they are included in the comparisons below.
The anthocyanin content of red wine fermentations and finished red wines is well-predicted by UV-visible and NIR spectroscopy, in this and previous studies (Table 5). The strongest predictive model for anthocyanins is the NIR-based model (Cozzolino et al. 2004). It is likely that the greater accuracy of HPLC analysis, together with a larger sample set from fewer regions, has contributed to a stronger predictive model for anthocyanins. Previous work based on UV-visible spectroscopy showed a similar correlation to the current UV-visible-based model. All of these models demonstrate an acceptable ability to predict anthocyanins in wine.
Predictive models for determining tannin in fermenting musts and wines have also been reported for UV-visible, NIR, and MIR spectroscopy (Table 6). Average predictive errors range from 56 mg/L for the current UV-visible model to approximately double that value in both NIR models (Cozzolino et al. 2004, current work). The predictive error was almost 4-fold greater in the MIR model (Nakaji 2004) and 5-fold greater in the previously reported UV-visible model (García-Jares and Médina 1995). The coefficient of determination (r2) was highest for the UV-visible model reported here; however, despite their large predictive errors, both the NIR model (Cozzolino et al. 2004) and the MIR model (Nakaji 2004) also showed strong correlations between measured and predicted tannin values. The correlation for the current NIR model was not as high, although its RMSECV was lower. While the NIR model of Cozzolino et al. (2004) used HPLC quantification of tannin as the primary method, the MIR model (Nakaji 2004) used the same analytical method as the current work. The weakness of the IR-based models in predicting tannin suggests that this spectral region is not well suited to this application. These models may rely on collinearity between tannins and other IR-absorbing compounds, a relationship that may not hold consistently across samples. The inability of the earlier UV-visible model to predict tannin (García-Jares and Médina 1995) is likely an artifact of its primary analytical method, which relied on hydrolytic conversion of proanthocyanidin subunits to anthocyanidins. That method may not achieve full conversion of proanthocyanidins and can also convert anthocyanins to anthocyanidins.
The ability of the current UV-visible model to predict tannin was as strong as its ability to predict anthocyanins. One concern is the narrow range of tannin concentrations incorporated into the current model (up to ~800 mg/L catechin equivalents). Values reported elsewhere using the same laboratory analytical method cover twice this range (Kennedy et al. 2006), and anecdotal evidence suggests even higher values in some wines.
Predictive models for polymeric pigments have been developed using UV-visible, MIR, and NIR spectroscopy (Table 7). While the Harbertson-Adams assay parameter for small polymeric pigment was reasonably well predicted by UV-visible and NIR spectra (r2 = 0.82 and 0.80, respectively), the MIR method (Nakaji 2004) was much less robust (r2 = 0.30). In contrast, large polymeric pigment was poorly predicted from both UV-visible and NIR spectra (r2 = 0.63 and 0.58, respectively), while the model using MIR spectra had a relatively strong correlation (r2 = 0.86). To date, only NIR and UV-visible spectroscopy have been used to predict total polymeric pigments (Table 7). The NIR model that used HPLC as the primary method (Cozzolino et al. 2004) had the greatest predictive strength (r2 = 0.87), while the UV-visible and NIR models in the current work were comparable (r2 = 0.82 and 0.85, respectively). Predicting polymeric pigments was the most problematic aspect of the current model, and large polymeric pigment was the weakest parameter, likely because of the low levels of these components in fermentation samples and young red wines.
Total phenolics, defined here as total iron-reactive phenols, include all of the flavonoid compounds, such as flavanols, flavonols, tannins, and pigmented polymers, as well as a range of low-molecular-weight compounds such as hydroxycinnamates and caffeic acid esters of tartrate. This grouping of phenolics was well predicted from both UV-visible (r2 = 0.91) and MIR (r2 = 0.90) (Nakaji 2004) spectra but not by NIR in the current work (r2 = 0.77) (Table 8). Although the coefficient of determination for the MIR model was high, its error (RMSECV) was substantially greater than that of the UV-visible model and similar to that of the current NIR model.
Of the approaches examined here (UV-visible, NIR, and MIR), it is apparent that the UV-visible model is the most effective for determining all of the major classes of phenolic compounds in red and white wines. However, two questions remain: What is the ongoing requirement for calibration of this model? And is the model universally applicable? To date, samples from only one vintage have been used to develop the model, so it is not yet possible to comment on year-to-year variation in its predictive ability. Nor is it currently possible to demonstrate that the model is independent of site-specific or grape-matrix influences, which limit current IR-based models (Dambergs et al. 2004, Cozzolino et al. 2004). However, the Shiraz and Cabernet Sauvignon models for anthocyanins and tannins suggest that the UV-visible model is independent of cultivar-specific effects: for both single-cultivar models, the predictive error (RMSEP) was no lower than that of the full model, which incorporated samples from all cultivars. It is reasonable to anticipate that a fully validated UV-visible model would be widely applicable and would require little ongoing recalibration, unlike current predictive models based on other spectroscopies.
Conclusions
The results presented here clearly demonstrate the potential of rapid predictive methods based on UV-visible spectroscopy to quantify a range of phenolic classes in juices, musts, and finished wines. The predictive ability of the model was strongest for anthocyanins, total phenolics, and tannin. Nontannin phenols were also well predicted, but confident prediction of polymeric pigment remains elusive. The UV-visible approach is general in nature, with similar precision for single-cultivar models (Shiraz and Cabernet Sauvignon) and for the full model. This has been demonstrated for anthocyanins and tannins, the groups of phenols most important to winemakers during fermentation.
Footnotes
Acknowledgments: Financial and in-kind support from J. Lohr, T.J. Rogers, the Orlando-Wyndham Wine Group, and the Yalumba Winery. Scholarship support from the American Society for Enology and Viticulture, the American Wine Society, the Temecula Wine Society, and Wine Spectator Magazine, and a travel grant from the University of California, Davis.
- Received November 2006.
- Revision received March 2007.
- Copyright © 2007 by the American Society for Enology and Viticulture