Abstract
A three-year study was conducted to determine if regression models could be developed to predict yeast assimilable nitrogen (YAN) before harvest, using Riesling in the New York Finger Lakes region as a model. Berry samples were taken from 62 commercial Riesling vineyards around the Finger Lakes at veraison, two weeks before harvest, and harvest. Samples were measured for berry weight, Brix, pH, titratable acidity, ammonia, primary amino nitrogen, and yeast assimilable nitrogen (YAN). The average YAN concentration at harvest was 91.8 mg/L, and there were no significant differences in harvest YAN concentration among years. Linear regression models created using preharvest YAN concentrations (p < 0.05) had a cross-validated R2 (Q2) of 70%. Models using only preharvest ammonia had less predictive power (Q2 = 63%) but may allow winemakers more analytical flexibility than those requiring complete YAN measurements. Models created using multiple linear regression provided better predictive power (Q2 = 73.6%). Finally, a multivariate approach using partial least squares regression was used to create models with the highest predictive power (Q2 = 74.2%). The additional analysis required to obtain values for additional prediction variables may limit the practicality of multiple linear regression and partial least squares approaches. Because many winemakers are not able to perform the analyses required to calculate YAN during the busy time of harvest, the development of these regression models as predictive tools may allow winemakers to use preharvest analysis to calculate accurate supplemental nitrogen additions, allowing targeted supplementation and lowering the risk of excessive prophylactic additions.
When Saccharomyces cerevisiae ferments grape juice into wine, nitrogen is required to produce yeast biomass (Kunkee 1991). Grape must contains a variety of nitrogenous compounds including ammonia, free amino acids, peptides, and proteins, but not all can be assimilated by yeast (Bell et al. 1979). Primary amino nitrogen (PAN) and ammonia (AMM), both used by yeast, are known collectively as yeast assimilable nitrogen (YAN) (Bell and Henschke 2005). Proline, a secondary amine and the only amino acid not assimilable by yeast under anaerobic conditions, is not included in PAN measurements (Salmon and Barre 1998).
YAN concentration in grape must is highly variable (Butzke 1998, Gockowiak and Henschke 1992, Hagen et al. 2008) and is often low in Riesling grapes (Stines et al. 2000). As nitrogen is often the limiting metabolic factor determining fermentation rate (Cantarelli 1957), its deficiency can lead to stuck or sluggish fermentations and the production of volatile sulfur off-aromas (Acree et al. 1972, Ugliano et al. 2009, Vilanova et al. 2007, Vos and Gray 1979). As such, many winemakers supplement their must to ensure healthy fermentations. Bisson and Butzke (2000) found that the maximum nitrogen consumed by yeast during fermentation is 400 mg N/L, regardless of the quantity of excess nitrogen present. Residual nitrogen from excessive supplementation can lead to microbial instability and subsequent spoilage defects (Bell and Henschke 2005). Further, high nitrogen levels can lead to the formation of ethyl carbamate, a known carcinogen, and biogenic amines, which can cause headaches and respiratory or gastrointestinal distress in susceptible individuals (Daudt et al. 1992, Monteiro et al. 1989, Bach et al. 2011, Jansen et al. 2003). Although recommendations vary, it is generally thought that a nitrogen concentration of 140 mg N/L is the minimum needed to avoid fermentation difficulties (Bell and Henschke 2005, Bely et al. 1990). The optimum level of YAN for must at 21 Brix has been identified as 200 mg/L (Bisson and Butzke 2000).
Sound supplementation strategies begin with understanding the initial concentration of YAN in the grape must. Determining YAN concentrations in wine requires specialized reagents and equipment (Gump et al. 2002), which may prevent wineries from performing this analysis in-house. Further, many winemakers do not have time to send samples to external analytical laboratories and wait for results at harvest. Thus, many winemakers make prophylactic nitrogen additions without knowing their initial YAN concentration, which may lead to insufficient or excess YAN.
One strategy to facilitate winery YAN measurement is to develop predictive methods based on preharvest analyses. While previous studies have determined YAN concentrations of grape cultivars grown in various world regions, most of these have focused on concentrations at harvest. The objective of this work was to determine whether preharvest measurements of grape berry chemistry can be used to develop statistically significant models that predict YAN at harvest. The amino acid concentration in grape must has been shown to increase during berry ripening, but then plateau or decrease slightly prior to harvest, depending on cultivar (Hilbert et al. 2003, Hernández-Orte et al. 1999). Postveraison increases in amino acids in Vitis vinifera varieties are caused largely by increased proline concentrations (Stines et al. 2000), while AMM concentration decreases throughout ripening (Bell and Henschke 2005). If these metabolic changes are cultivar dependent, it may be possible to develop models based on YAN concentrations determined two weeks preharvest to estimate nitrogen status at harvest. Such a tool will ease the time constraints for YAN analysis and allow winemakers to develop supplementation strategies based on reliable analytical methods.
Materials and Methods
Experimental design.
This survey comprised 62 commercial Riesling vineyard sites in New York State sampled annually over a 3-year period from 2010 to 2012. The sampling area at each site was defined as 12 vines per row in two adjacent rows for a sample unit of 24 vines, and the same vines were sampled for all three years. Sample sites were selected with input from vineyard managers to capture a range of vine vigor and soil types; sites were designated by the vineyard managers as high vigor, low vigor, or unassigned. Vine management was performed by vineyard managers according to their own best practices.
Sample collection and processing.
Grape berry samples were collected at three time points: veraison, two weeks before harvest, and harvest. Preharvest and harvest dates were determined by consultation with vineyard managers. All samples were collected during a 3-day window, and the interval between preharvest and harvest sampling was 14 ± 2 days. In 2010, 2011, and 2012, there were 6, 21, and 12 sites, respectively, that were harvested before samples were collected, making data unavailable for that site and year. Each 200-berry sample was weighed on a Pioneer PA3102 scale (Ohaus Corp., Pine Brook, NJ), accurate to 0.01 g, to obtain fresh berry weight. Berries were then crushed immediately using a Stomacher 400 paddle blender (Seward Laboratory Systems, Port Saint Lucie, FL) at 120 RPM for 60 seconds, after which 50 mL of must was decanted from macerated berries for analysis.
Cluster counts per vine were recorded during preharvest sampling on two-panel sections (6 to 12 vines). At harvest 25 clusters were collected and weighed on a Sartorius 3807 MP81 scale (Sartorius Corp., Bohemia, NY) accurate to 1 g. Cluster counts per vine and average cluster weights were used to estimate crop yield per vine for each sample site.
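The yield estimate described above is a product of two field measurements. A minimal sketch follows, assuming the 25-cluster sample weight is recorded in grams and yield is wanted in kilograms; the function name and the example numbers are hypothetical and are not taken from the study data.

```python
def yield_per_vine_kg(clusters_per_vine, total_cluster_weight_g, n_clusters=25):
    """Estimate crop yield per vine (kg) from a cluster count and a weighed cluster sample.

    n_clusters defaults to 25, the harvest sample size described in the text.
    """
    mean_cluster_weight_g = total_cluster_weight_g / n_clusters
    return clusters_per_vine * mean_cluster_weight_g / 1000.0


# Hypothetical site: 40 clusters per vine, 25 clusters weighing 3000 g in total.
print(yield_per_vine_kg(clusters_per_vine=40, total_cluster_weight_g=3000))
```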
Grape berry chemistry.
YAN comprises AMM and PAN, which must be analyzed individually. A 2 mL aliquot was drawn from the 50 mL juice sample, placed in a microfuge tube, and centrifuged at 12,000 × g in an Eppendorf 5415C (Brinkmann Instruments, Westbury, NY) for 2 min prior to nitrogen assay. A ChemWell 2910 Multianalyzer (Unitech Scientific, Hawaiian Gardens, CA) was used to rapidly test samples. AMM was determined by the glutamate dehydrogenase (GDH) catalyzed condensation of ammonia and α-ketoglutarate (ak-G) and simultaneous oxidation of nicotinamide adenine dinucleotide (NADH) (Ough 1969). The oxidation of NADH results in a decrease in absorbance at 340 nm, which can be quantified by spectrophotometry (Unitech Scientific, Ammonia Extended Range UniTAB, 2007). PAN was determined by derivatization of primary amino groups by o-phthaldialdehyde and N-acetyl-l-cysteine (OPA/NAC) to form isoindoles, which are detected spectrophotometrically at 340 nm (Dukes and Butzke 1998) (Unitech Scientific, Primary Amino Nitrogen UniTAB, 2007).
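Because YAN is the sum of the two assay results, the calculation can be sketched in a few lines. The example assumes both values are expressed as mg N/L; if an ammonia assay reports mg NH3/L instead, the value would first need scaling by the mass fraction of nitrogen in ammonia. The function name, the flag, and the numbers are hypothetical.

```python
# Mass fraction of nitrogen in NH3 (atomic mass of N / molar mass of NH3).
N_FRACTION_IN_NH3 = 14.007 / 17.031


def yan_mg_n_per_l(pan_mg_n_per_l, amm, amm_reported_as_nh3=False):
    """Return YAN (mg N/L) as PAN plus ammonia nitrogen.

    amm_reported_as_nh3 is an assumed convenience flag: set it when the
    ammonia result is in mg NH3/L rather than mg N/L.
    """
    amm_n = amm * N_FRACTION_IN_NH3 if amm_reported_as_nh3 else amm
    return pan_mg_n_per_l + amm_n


# Hypothetical must: PAN 60 mg N/L and ammonia 40 mg N/L.
print(yan_mg_n_per_l(60.0, 40.0))
```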
Soluble solids (Brix) were measured using a digital refractometer (model 30016; Sper Scientific, Scottsdale, AZ) with temperature correction. Titratable acidity (TA) was measured with an autotitrator (Titrino 798, Metrohm, Riverview, FL) and expressed as tartaric acid equivalents. pH was measured with an Accumet Excel XL 25 pH meter and an ion selective probe (Fisher Scientific, Waltham, MA).
Environmental factors.
Soil samples were collected from each site after harvest in 2010 and analyzed by the Cornell Nutrient Analysis Laboratory for standard fertility measurements and soil health indicators, including % moisture, potassium, magnesium, calcium, iron, aluminum, manganese, zinc, soil pH, buffering capacity, organic matter, active carbon, mineralizable nitrogen, and aggregate stability.
Statistical analysis methods.
Each sample site had 21 measures of fruit chemistry—3 sample points (veraison, preharvest, and harvest) × 7 measurements (berry weight, Brix, pH, TA, AMM, PAN, YAN)—plus measures of clusters per vine, average cluster weight, yield per vine collected at harvest, and a categorical measure of site vigor, totaling 25 potential regression coefficients. Additionally, data from 2010 included 14 measures of soil health. All data analysis was carried out using Minitab 16 (Minitab, State College, PA) statistical analysis software.
One-way analysis of variance (ANOVA) was used to assess differences in berry chemistry values by year. Tukey’s method was used post hoc to separate means at the 5% significance level. A probability plot was used to evaluate the fit of a distribution to the harvest YAN data and estimate percentiles. Suitable distributions were selected using two criteria: a p value > 0.05 (no significant departure from the candidate distribution) and the lowest Anderson-Darling statistic.
Three approaches to regression modeling were used to predict harvest YAN concentrations. Linear models related a single predictor variable to harvest YAN concentration. Multiple linear regression (MLR) models related many predictor variables simultaneously to harvest YAN. Finally, factor analysis and partial least squares regression (PLSR) summarized the covariance structure of the data and used predictor variables to create latent variables to relate to harvest YAN.
Linear regression.
The regression function was used to create linear regression models relating YAN at harvest to preharvest measures of AMM, PAN, and YAN. Individual models were created for each year and for the combination of all three years. Additionally, in 2011 and 2012, YAN concentrations from the previous year were used to predict YAN.
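For readers without Minitab, a single-predictor model of this kind can be fit by ordinary least squares. The sketch below is a generic illustration, not the authors' procedure; the function names are hypothetical and the five data pairs are invented for demonstration.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Fraction of variation in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Invented preharvest and harvest YAN values (mg/L) for five sites.
preharvest_yan = [60.0, 85.0, 100.0, 120.0, 150.0]
harvest_yan = [52.0, 80.0, 88.0, 115.0, 130.0]

b0, b1 = fit_line(preharvest_yan, harvest_yan)
print(f"harvest YAN = {b0:.1f} + {b1:.3f} * preharvest YAN")
print(f"R2 = {r_squared(preharvest_yan, harvest_yan, b0, b1):.3f}")
```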
Multiple linear regression.
More complex models were created using stepwise MLR analysis with 21 potential predictor variables that included measures of fruit chemistry at veraison and preharvest plus harvest values for berry weight, Brix, pH, and TA (14 additional soil health indicators were also used in 2010). At each step, coefficients could be added or removed based on their p value using α = 0.1 as the cut-off to add or remove coefficients. Model selection was determined by the lowest predicted residual sum of squares (PRESS), and leave-one-out cross-validation (LOOCV) was also used to assess predictive power of the model. Models were created for each year individually and for all years combined.
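PRESS and the cross-validated Q2 can be computed by refitting the model with each observation left out in turn. The sketch below uses a single predictor for brevity (the MLR models in this study use several) and assumes Q2 = 1 - PRESS / total sum of squares, a common definition; the function names and data are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def press_and_q2(x, y):
    """Leave-one-out cross-validation: return (PRESS, Q2)."""
    n = len(y)
    press = 0.0
    for i in range(n):
        # Refit without observation i, then predict the held-out value.
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        intercept, slope = fit_line(x_train, y_train)
        press += (y[i] - (intercept + slope * x[i])) ** 2
    mean_y = sum(y) / n
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return press, 1.0 - press / ss_tot


# Invented preharvest and harvest YAN values (mg/L) for five sites.
preharvest = [60.0, 85.0, 100.0, 120.0, 150.0]
harvest = [52.0, 80.0, 88.0, 115.0, 130.0]
press, q2 = press_and_q2(preharvest, harvest)
print(f"PRESS = {press:.1f}, Q2 = {q2:.3f}")
```

Because each held-out residual is at least as large as the corresponding in-sample residual, Q2 cannot exceed R2 for the same least squares model, which is why it serves as the more conservative measure of predictive power.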
Factor analysis and PLSR.
Factor analysis was used to summarize the covariance structure of the data. Principal components were used to extract factors and Varimax rotation was used to orthogonally rotate the initial solution. The first two factors were plotted to visualize the covariance structure.
PLSR analysis was used to model the YAN concentration in grapes at harvest from individual sites. For model building, all potential predictor variables (35 in 2010; 21 in 2011, 2012, and multiyear) were used to create an initial model of harvest YAN from individual sites (55 in 2010, 40 in 2011, 50 in 2012, and 145 in the multiyear model). The number of latent variables in each model was determined by the lowest PRESS. LOOCV was used to calculate Q2 coefficients to assess the predictive power of the model. The predictor variables with the lowest standardized regression coefficients were then removed stepwise in a backward elimination process (Andersen and Bro 2010). This process was repeated until only one predictor variable remained. The final model was the one with the highest Q2 value.
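To make the idea of latent variables concrete, the following is a minimal single-response PLS (PLS1) sketch using the NIPALS deflation scheme. It is not the Minitab implementation used in the study: it only mean-centers the data (no scaling to unit variance), omits cross-validation and variable elimination, and all names and data are hypothetical.

```python
def pls1_fit(X, y, n_components):
    """Fit PLS1 by NIPALS; returns (x_means, y_mean, components)."""
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_means[j] for j in range(p)] for row in X]  # centered X
    f = [yi - y_mean for yi in y]                               # centered y
    components = []
    for _ in range(n_components):
        # Weight vector: direction in X with maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = sum(wj * wj for wj in w) ** 0.5
        if norm == 0.0:
            break
        w = [wj / norm for wj in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(ti * ti for ti in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y so the next latent variable is uncorrelated.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, load, q))
    return x_means, y_mean, components


def pls1_predict(model, x_new):
    """Predict the response for one new observation."""
    x_means, y_mean, components = model
    e = [xj - mj for xj, mj in zip(x_new, x_means)]
    y_hat = y_mean
    for w, load, q in components:
        t = sum(ej * wj for ej, wj in zip(e, w))
        y_hat += q * t
        e = [ej - t * lj for ej, lj in zip(e, load)]
    return y_hat


# Invented data: two predictors, response exactly 1 + 2*x1 + 3*x2.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0]]
y = [9.0, 8.0, 19.0, 18.0, 29.0]
model = pls1_fit(X, y, n_components=2)
print(pls1_predict(model, [6.0, 5.0]))
```

With as many latent variables as predictors, PLS1 reproduces the ordinary least squares fit; the benefit under collinearity comes from retaining fewer latent variables than predictors.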
Results
Juice chemistry.
The mean values of berry chemistry at harvest by year were determined (Table 1). Soluble solids (Brix) and pH differed by season, with 2010 showing the highest accumulation of soluble solids (p < 0.001), highest pH (p < 0.001), and lowest TA (p < 0.001); 2011 showing the lowest concentration of soluble solids; and 2012 showing the lowest pH. Despite these differences in traditional indicators of ripeness, there were no significant differences in AMM (p = 0.336). The F test of YAN by year appeared to be significant (p = 0.033); however, the more conservative post-hoc analysis with Tukey’s method showed no significant differences in means, indicating a possible type I error in the ANOVA F test. PAN was significantly lower in 2012 (p < 0.001). An ANOVA comparing harvest YAN by vigor designations (data not shown) showed no significant differences between vigor designations (p = 0.967).
During the final two weeks of ripening, AMM concentrations decreased (p < 0.001) from a preharvest mean value of 56 mg/L to a mean harvest value of 45 mg/L. Mean PAN concentration increased from 53 to 54 mg/L, but the change was not statistically significant.
Probability distribution.
A histogram of YAN harvest data indicates a Gamma distribution skewed to the right (Figure 1A). The low Anderson-Darling statistic (0.349) and high p value (>0.250) indicate a good fit to the distribution. The probability plot shows the estimated population percentiles (Figure 1B). The distribution predicts ~95% of the population will have a harvest YAN concentration between 30 and 190 mg/L; <1% will contain >200 mg/L YAN at harvest.
YAN prediction using linear regression.
Linear regression models for individual years and combined data, significant at p < 0.05, successfully predicted YAN at harvest using preharvest measurements of YAN, AMM, and PAN. The regression models for harvest YAN predictions using data collected two weeks before harvest are summarized in Table 2. In models combining data from all three years, harvest YAN was best predicted by preharvest YAN values resulting in R2 = 71% (Figure 2), and LOOCV of the data resulted in Q2 = 70%. Preharvest YAN was also the best predictor of harvest YAN in 2010 (Q2 = 75%).
Preharvest AMM concentrations provided the next best measure to predict harvest YAN concentrations. A model using data from all three years resulted in Q2 = 63% (Figure 3). In 2011 and 2012, preharvest AMM was the best predictor of YAN at harvest with Q2 = 59% and 66%, respectively.
Preharvest PAN had the lowest R2 and Q2 of all preharvest nitrogen measures (Figure 4). For the individual year models, 2010 had the best correlation between preharvest nitrogen measurements and harvest YAN, while 2011 had the lowest correlation.
Finally, harvest YAN data from the previous year explained the least variation in observed responses (Figure 5), with R2 = 15% and Q2 = 11% for combined 2011 and 2012 data. Significant regression models (p < 0.05) were also achieved with YAN data collected at veraison to predict YAN at harvest (data not shown). Despite this significance, the models had weak predictive power, with Q2 values for the AMM, PAN, and YAN models of 22%, 7%, and 18%, respectively.
YAN prediction using MLR.
Significant (p < 0.05) MLR models could be constructed for harvest YAN (Table 3); R2 and Q2 represent the amount of variation explained and the predictive power of the model, respectively. The MLR models in 2010 (R2 = 88%, Q2 = 85%), 2012 (R2 = 81%, Q2 = 76%), and all years (R2 = 77%, Q2 = 74%) had higher R2 and Q2 values than the linear regression models. Only in 2011 did the MLR (R2 = 61%, Q2 = 53%) have a slightly lower Q2 value compared to the best linear regression (R2 = 60%, Q2 = 59%). None of the coefficients were used in more than two models. However, each model had either preharvest AMM or preharvest YAN as its most significant prediction variable. In the model combining data from all years, preharvest PAN concentration had a significant (p < 0.05) negative correlation with harvest YAN. Notably, no terms from veraison sampling were included in any of the models. Potassium content was the only soil component (collected in 2010) that was included in the regression model. Preharvest berry weight had marginal significance (p = 0.096) in 2010, and veraison berry weight was included in the 2011 model, but as discussed previously, the 2011 MLR model had low predictive power. No other prediction variables associated with berry weight, or measures of Brix, were included in any of the models.
Multivariate factor analysis of data.
To determine how closely variables were related, factor analysis was conducted to visualize the covariance structure in the data. The loadings for the first two factors suggested that preharvest and harvest measures of nitrogen were strongly correlated along the first factor, while veraison measurements of nitrogen were correlated along the second. This multicollinearity indicated that some predictor variables were not independent, but rather were correlated with other predictors.
Partial least squares regression.
To compensate for suspected collinearity of prediction variables, PLSR models were constructed, and the models (p < 0.05) with the highest Q2 are shown (Table 4). The coefficients for each predictor were used to calculate the fitted value of the response variable, harvest YAN, while the standardized coefficients indicated the relative importance of each predictor in the model. Harvest YAN was best predicted by either preharvest YAN or preharvest AMM. Notably, harvest pH and TA were included in three of the four models, including the multiyear model. The number of latent variables in the models, inferred through principal components of predictor variables, ranged from 1 in 2010 and 2011 to 5 in the 2012 and multiyear models. The response plot from the PLSR for all years provides a graphic representation of model prediction of harvest YAN (Figure 6).
Discussion
Juice chemistry.
The differences observed in soluble solids, pH, and TA at harvest may be a result of the weather patterns during the growing seasons. The growing seasons in 2010 and 2012 were similar, with warm springs leading to early budbreak, ~3000 growing degree days, and timely rainfall. In contrast, the 2011 season had a cool, wet spring with about average budbreak, hot and dry weather from June to August, and almost daily rainfall throughout September and October (H. Walter-Peterson, Finger Lakes Vineyard Notes, 2010, 2011, 2012). The cooler temperatures and heavy rainfall in 2011 resulted in relatively lower soluble solids.
Population distribution.
In the population studied, juice samples from regional Riesling vineyards were generally deficient in YAN, with average concentrations of 92.0 mg/L. Given the average deficiency, winemakers often supplement grape must prophylactically with additions of as much as the maximum legal U.S. addition of 200 mg N/L from DAP, an addition level which has been reported as common practice in the international wine world (Ugliano et al. 2007). The probability distribution (Figure 1) for this population of samples predicts that 95% of sites will have a YAN concentration falling between 29 mg/L and 190 mg/L, with a mean of 86 mg/L. Consequently, an addition of 200 mg N/L would result in postaddition YAN concentrations greater than 285 mg/L in most samples, with ~1% of samples having concentrations greater than 400 mg/L. Winemakers in the Finger Lakes can use the population distribution data to choose a better-calibrated prophylactic addition of nitrogen. A lower dose of 120 mg N/L would ensure that less than 0.5% of samples from the population would have a concentration below 140 mg/L YAN, and less than 0.1% of samples would have a concentration above 400 mg/L YAN. Further, the average concentration of samples would be 206 mg/L YAN, which is very close to the concentration recommended for musts at 21 Brix (Bisson and Butzke 2000). In addition to the lower risk of excess nitrogen, lowering the prophylactic dose of nitrogen can reduce costs of nitrogen supplementation.
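The dose comparison above is simple arithmetic on the distribution mean and the 140 and 400 mg N/L thresholds cited earlier. The sketch below reproduces it: for a given dose it reports the postaddition mean and the range of initial YAN values that would land between the two thresholds. Function and constant names are hypothetical.

```python
MIN_YAN = 140.0  # mg N/L, minimum cited to avoid fermentation difficulties
MAX_YAN = 400.0  # mg N/L, maximum nitrogen consumed by yeast as cited
DISTRIBUTION_MEAN = 86.0  # mg/L, mean of the fitted harvest YAN distribution


def post_addition_yan(initial_yan, dose):
    """YAN after a supplemental nitrogen addition (all values in mg N/L)."""
    return initial_yan + dose


def acceptable_initial_range(dose):
    """Initial YAN values that end up between MIN_YAN and MAX_YAN after dosing."""
    low = max(0.0, MIN_YAN - dose)
    high = MAX_YAN - dose
    return low, high


for dose in (200.0, 120.0):
    low, high = acceptable_initial_range(dose)
    mean_after = post_addition_yan(DISTRIBUTION_MEAN, dose)
    print(f"dose {dose:.0f}: mean after addition {mean_after:.0f}, "
          f"acceptable initial YAN {low:.0f} to {high:.0f}")
```

With a 200 mg N/L dose, any must starting above 200 mg/L exceeds 400 mg/L after addition; with 120 mg N/L, only musts starting below 20 mg/L remain deficient and only those above 280 mg/L exceed the upper threshold.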
Linear regression.
Of the models described, linear regression models using preharvest data provided the simplest method for harvest YAN prediction, requiring the least data collection. Linear regression using preharvest YAN gave the best results in the multiyear model as well as in 2010, while preharvest AMM resulted in models with the best predictive power in 2011 and 2012 models (Table 2). Because AMM is one of two measurements required for assessing total YAN, the usefulness of preharvest AMM as a predictor of harvest YAN is of practical interest. Because AMM represents a single quantification, which can be performed using either spectrophotometric methods or an ion selective probe, it is less costly than total YAN measurement and provides more method flexibility.
Using the linear regression models, a 95% prediction interval for individual sites can be estimated. For example, the model predicts that sites with preharvest YAN concentrations of 99 mg/L will have harvest YAN values that fall within a 95% prediction interval from 48.2 to 136.2 mg/L, a range of 88 mg/L. This range is about half as wide as the range generated from the population distribution alone (160 mg/L). The smaller prediction interval allows winemakers to further reduce the amount of supplemented nitrogen without increasing the risk for nitrogen-deficient must.
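A prediction interval for a new site can be approximated from the fitted line, the residual standard error, and the distance of the new predictor value from the sample mean. The sketch below is an illustration rather than the Minitab calculation: it substitutes a normal quantile for the Student t quantile, an approximation that is reasonable only for large samples (such as the 145 site-years in the multiyear model) and too narrow for small ones. All names and data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def prediction_interval(x, y, x_new, level=0.95):
    """Approximate prediction interval for a new observation at x_new.

    Returns (lower, predicted, upper). Uses a normal quantile in place of
    the t quantile with n - 2 degrees of freedom (large-sample assumption).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = sqrt(ss_res / (n - 2))  # residual standard error
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * s * sqrt(1.0 + 1.0 / n + (x_new - mean_x) ** 2 / sxx)
    predicted = intercept + slope * x_new
    return predicted - half_width, predicted, predicted + half_width


# Invented preharvest and harvest YAN values (mg/L) for eight sites.
preharvest = [55.0, 70.0, 85.0, 100.0, 110.0, 125.0, 140.0, 160.0]
harvest = [50.0, 58.0, 80.0, 84.0, 101.0, 104.0, 128.0, 135.0]
low, pred, high = prediction_interval(preharvest, harvest, x_new=99.0)
print(f"predicted {pred:.1f} mg/L, 95% PI {low:.1f} to {high:.1f} mg/L")
```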
Multiple linear regression.
More complex models using MLR led to more accurate prediction models, as evidenced by the higher Q2 values, and compared favorably to the linear regression models. Despite better predictive power, the MLR models required more analysis to obtain values for required predictor variables and produced only incremental improvement, with Q2 = 73.6% compared to 70% for linear regression. In the models for individual years, none of the measures were used more than once; however, each model did contain at least one measure of preharvest nitrogen. In the MLR model combining data from all three years, preharvest and harvest measurements of pH and TA were included, suggesting a correlation between these values and harvest YAN. Notably, pH and TA are both positively correlated to harvest YAN but are inversely correlated to each other, which may imply buffering effects from YAN components. The fact that Brix measurements were not included in any of the MLR models is likely explained by Bell and Henschke (2005), who describe the conflicting changes that occur during ripening when AMM decreases, but PAN increases with increasing Brix. Similar trends were also observed in the Riesling ripening data (Table 1).
Partial least squares regression.
Factor analysis indicated multicollinearity between predictor variables, necessitating the use of PLSR techniques to select prediction variables from projections of latent variables. Like the MLR, the PLSR contains six coefficients, selected based on the projections of five uncorrelated latent variables, effectively reducing correlation between predictor variables. The PLSR model had the highest predictive power of the three models used (Q2 = 74.2%). Using the model requires additional analysis, compared with linear regression, to obtain values for the six predictor variables. The standardized coefficients indicate the relative magnitude of the effect a predictor variable has on the model (Table 4). In the PLS model, harvest YAN was best predicted by preharvest YAN concentrations, followed by preharvest AMM, harvest pH, harvest TA, preharvest pH, and preharvest TA. pH and TA values at harvest remained important predictors, both positively correlated to harvest YAN despite being inversely correlated to each other.
While variation in amino acid accumulation and decreases in AMM make prediction models cultivar-dependent, it is notable that an efficient predictive model could be developed from data representing a range of sites and climatic variation. These data suggest that regression models can be made to predict harvest YAN in Finger Lakes Riesling using measurements taken two weeks before harvest. Winemakers could use simple linear equations adapted from the regression models to obtain an estimate of harvest YAN concentrations.
Application of the models.
Equations from the models can be applied to predict new observations in the population (i.e., Finger Lakes Riesling sites) within a prediction interval (PI). Harvest YAN values were calculated for new observations using mean values of predictor variables. The predicted value and the 95% PI are shown.
Linear regression (Table 2):
Multiple linear regression (Table 3):
Partial least squares regression (Table 4):
The 95% PI gives the range where new observations are likely to fall; the models with better predictive power result in a narrower PI. With this information, winemakers can decide whether the improvement in the PI justifies the additional analysis required to obtain values for the prediction variables. In many cases, measuring only AMM two weeks before harvest may provide enough accuracy to calculate successful nitrogen additions.
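One way a prediction interval could be turned into a supplementation decision is to dose so that even the lower bound of the interval reaches the minimum target, then check that the upper bound stays under the maximum. This rule is an illustrative assumption, not a recommendation from the study; the thresholds are the 140 and 400 mg N/L values cited earlier, and the interval is the 48.2 to 136.2 mg/L example from the linear regression discussion.

```python
def addition_from_interval(pi_low, pi_high, min_yan=140.0, max_yan=400.0):
    """Suggest a nitrogen addition (mg N/L) from a prediction interval.

    Hypothetical rule: bring the lower PI bound up to min_yan, then verify
    that the upper PI bound plus the dose does not exceed max_yan.
    Returns (dose, worst_case_high, within_limit).
    """
    dose = max(0.0, min_yan - pi_low)
    worst_case_high = pi_high + dose
    return dose, worst_case_high, worst_case_high <= max_yan


dose, worst_high, ok = addition_from_interval(48.2, 136.2)
print(f"dose {dose:.1f} mg N/L, highest expected YAN {worst_high:.1f} mg/L, within limit: {ok}")
```

A narrower interval raises the lower bound and therefore lowers the dose this rule suggests, which is the practical payoff of the better-performing models.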
Conclusions
It is well understood that nitrogen concentrations can affect fermentation parameters, but the difficulty of measurement and long lead times for external analysis cause many winemakers to forgo analysis and rely on prophylactic additions for healthy fermentations. Probability distributions based on harvest nitrogen concentrations collected from 62 commercial Riesling vineyards over three years allow for better estimates of appropriate prophylactic additions to minimize the risk of nitrogen deficiency or excess in Riesling must. Statistically significant linear regression models were developed that further reduce the prediction interval for vineyard sites in the Finger Lakes. In Riesling, preharvest YAN gives the best prediction of harvest YAN; however, preharvest AMM values predict almost as well and may be easier to measure. Finally, more complex MLR and PLSR models result in better predictive power, although they may not be practical because they require additional calculation and measurements. The successful development of prediction models for harvest YAN in Riesling grapes from the Finger Lakes region suggests that this method may be used to develop similar models for specific cultivars and growing regions.
Acknowledgments
This project was supported by Cornell Federal Formula Funds Grant Program and the New York Wine and Grape Foundation. The authors gratefully acknowledge the technical assistance of Bill Wilsey and the cooperation of more than 60 vineyards throughout the Finger Lakes.
- Received March 2013.
- Revision received June 2013.
- Accepted July 2013.
- ©2013 by the American Society for Enology and Viticulture