Abstract
The weights of individual Cabernet Sauvignon, Merlot, and Chardonnay clusters were found to be significantly correlated with the numbers of clusters on the shoots from which each cluster grew. Based on this and other findings, a new sampling strategy was devised that accounted for block-shape geometry and the distributional characteristics of clusters grown on sampled shoots. The vine was chosen randomly, the shoot to be sampled was chosen systematically, and then each selected cluster was chosen randomly. This recommended composite random-with-systematic sampling approach was motivated by the need to minimize biases attributable to subjective judgments made in the field, such as overselection of uncharacteristically large clusters.
During the growing season, the gathering of grape cluster samples is an essential part of viticultural assessment. Sampled clusters are typically assumed to be representative of an entire field or block of vines. It is therefore important that the sampling methodology avoids statistical bias. The implementation of bias-reduction procedures, however, must allow for important practical considerations. Freshly gathered quantitative data have maximum usefulness during a limited time in the growing season, and other vineyard practices tend to constrain the proportion of effort that can be expended on bias-reduction methodology. Yet, to optimize the value of the gathered data, it is vital that sample-selection biases be avoided.
According to Wolpert and Vilas (1992), “Unbiased sampling is not always achievable, but one method is to reach blindly into a vine, contact a shoot and remove all the clusters from the shoot. Repeat this shoot selection from vines selected either casually or systematically, or selected using random number tables until the first sample of 10 or more clusters has been collected. The shoot sampling method should ensure that samples contain a proportionate number of basal and second clusters.” This method may serve well when shoots bear similar numbers of clusters. However, when that is not the case, there is justification for each sampled cluster to be gathered randomly from each selected shoot.
Within the blocks of almost all vineyards, berry clusters grow on vines planted along parallel rows that are equally spaced. In stark contrast to this geometric regularity, clusters and, to a lesser extent, shoots tend to be spaced unevenly on any given vine. For researchers interested in obtaining an informative sample of grape clusters, the geometric configuration of rows of vines presents both challenges and opportunities. There are two general procedures, systematic sampling and random sampling. In contrast to random sampling, the term systematic often denotes that an investigator uses a regular, nonrandom pattern (Cochran 1977). With the notable exceptions of a few data-gathering techniques (Wolpert and Vilas 1992, Tarara et al. 2004) and a recent Gaussian quadrature-based approach (Arndt et al. 2006), little has been published previously in the agricultural literature that pertains to hybrid (mixed random and systematic) processes by which samples can be drawn.
Our goal was to devise a sampling method that would simultaneously minimize bias and address viticulture-specific challenges, such as time constraints, differences in row lengths, and the natural variation in the number of clusters on individual shoots. We developed a three-step random-systematic-random (RSR) approach, first sampling rows and vines at random, then sampling shoots systematically, and finally selecting clusters from each shoot at random. Since it takes advantage of the special characteristics of clusters grown on cordon-pruned vines, the RSR approach can be implemented within most vineyards.
Materials and Methods
2003 pilot study.
In 2003 a pilot study was conducted with the primary goal of testing a new sampling strategy for use on a larger scale in 2004. A secondary goal was to explore vine characteristics, such as the relationship between cluster location on the shoot and cluster weight, in light of their possible effects on sample bias. Vines from two blocks were studied, both located in the Wappo Hill Vineyard in Napa Valley. DmCS (referred to here as CS1) consisted of Cabernet Sauvignon vines in 253 rows oriented northeast/southwest, each row composed of 7 to 125 vines (29,797 vines total). DmME (referred to here as ME1) contained Merlot vines in 263 rows oriented northeast/southwest, each row composed of 18 to 99 vines (22,332 vines total). The vines in both blocks were on 101-14 rootstock, planted in 1997, and were bilateral cordon-pruned on vertical trellises with 1.54 m by 1.83 m spacing. Two clones were present in each of these blocks, but when clone-specific standard deviations were checked, minimal differences were detected. It was therefore not deemed necessary to stratify by clone for within-block sampling in the 2003 or 2004 samples.
Random-systematic-random sampling: 2003.
Six rows were sampled randomly from within each block, and five vines were sampled randomly from within each of the six rows. Random numbers for the 2003 pilot study and for the 2004 study were obtained from www.random.org, which provides random number sequences based on the monitoring of atmospheric noise, and from a book of random digits (Rand Corp. 1955).
In each block one cordon was selected for sampling from each of the 30 vines, chosen by alternating down each row. Shoots were selected systematically from each chosen cordon, using prespecified relative locations along the cordon based on the application of Gaussian quadrature. This technique was also described in a viticultural setting (Tarter and Keuter 2005) and recommended for another sampling application (Arndt et al. 2006).
In the pilot study, five shoots were selected from each sampled cordon. The relative sampling positions (or Gaussian nodes), according to Stroud and Secrest (1966), were −1, −0.655, 0, 0.655, and 1. These positions can be understood in the context of a hypothetical cordon, two units long, on which position 0 designates the center of the cordon, −0.655 and 0.655 designate points 0.655 units to the left and right of the center, and −1 and 1 designate the cordon ends. The five shoots closest to these five Gaussian nodes were those selected for sampling purposes.
This technique was extrapolated to actual cordons by premarking the five Gaussian node positions on varying lengths of elastic. Correct relative position was maintained by first marking the end positions and the location halfway between them, then multiplying half of the total distance between the two end positions by 0.655. Then the two remaining positions were marked as 0.655*(half total length) to the left and right of the center mark. As long as a length of elastic shorter than the sampled cordon was selected, the elastic could be stretched so that the outermost Gaussian node positions aligned with the distal and proximal shoots, and the relative sampling positions could thus be preserved without the need for potentially time-consuming measurement and calculation in the field. (Here proximal and distal respectively denote positions nearest to and farthest from the main trunk from which the cordon extends.) Metal loops were attached to the ends of the elastic lengths for ease of use. The preservation of relative position on the stretched elastic was verified by measurement before use in the field. The five sampling positions were designated by the letters A through E, with A always corresponding to the proximal shoot and E always corresponding to the distal shoot. Cordon lengths were also recorded.
One cluster was chosen randomly from each selected shoot by using a table of random numbers. Its position (1, 2, or 3, with 1 being the basal cluster) and the number of clusters on the shoot prior to sampling were recorded. When the selected shoot bore no clusters, the next nearest cluster-bearing shoot was chosen. Clusters bearing fewer than 15 berries in either samples or counts were not considered, so any cluster mentioned here or later in this discussion was composed of 15 or more berries. In total, 149 clusters were sampled from CS1 and 148 clusters were sampled from ME1. The reduction in the expected sample size of 150 was due to three instances when there was no shoot near the appropriate sampling position.
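The elastic-marking arithmetic can be sketched in a few lines. This is only an illustration of the scaling described above, not the authors' field procedure: the function names and the shoot positions in the example are hypothetical, and the node values are the rounded ones quoted in the text.

```python
# Relative sampling positions quoted in the text (five nodes on [-1, 1]).
NODES_2003 = (-1.0, -0.655, 0.0, 0.655, 1.0)


def node_positions(proximal, distal, nodes=NODES_2003):
    """Map relative nodes on [-1, 1] to distances along a cordon.

    proximal and distal are the positions (for example, cm from the trunk)
    of the outermost shoots with which the elastic ends are aligned.
    """
    center = (proximal + distal) / 2.0
    half_length = (distal - proximal) / 2.0
    # Each mark sits at center + node * (half total length).
    return [center + node * half_length for node in nodes]


def nearest_shoots(shoot_positions, targets):
    """Return, for each target position, the closest shoot position."""
    return [min(shoot_positions, key=lambda s: abs(s - t)) for t in targets]


# Hypothetical cordon: shoot distances (cm) from the trunk, invented for illustration.
shoots = [5, 14, 22, 31, 40, 52, 60, 71, 83, 95]
targets = node_positions(min(shoots), max(shoots))
print([round(t, 1) for t in targets])
print(nearest_shoots(shoots, targets))
```

Because every mark is defined as a fixed fraction of the half length, stretching the elastic changes `half_length` but not the relative positions, which is why no measurement is needed in the field.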
At harvest (3 Oct for CS1 and 23 Sept for ME1), all remaining clusters from each of the 30 previously sampled vines were removed, counted, and weighed as a whole.
Fall 2004 study.
In 2004 clusters were collected from five blocks in the Napa Valley. CS1 and ME1 were again sampled, along with another Merlot block (Nnsouth ME), another Cabernet Sauvignon block (TwestCS), and a Chardonnay block (DF06CH) (referred to subsequently as ME2, CS2, and CH). CS2 and ME2 were located in the To-Kalon Vineyard in Oakville, while CH was in the Huichica Hills Vineyard in Napa-Carneros. All blocks contained cordon-pruned vines on vertical trellises. The vines in CS2 had bilateral cordons, and ME2 and CH had unilateral cordons. ME2 was planted in 1995 with 1.54 m by 2.13 m spacing, with 121 rows oriented northeast/southwest, each with between 12 and 131 vines (9,989 vines total). Vines were on 110R rootstock. CS2 was planted in 2001 with 1.54 m by 1.83 m spacing, and 89 rows oriented northeast/southwest, each with between 87 and 94 vines (8,830 vines total). CH was planted in 1991 with 1.54 m by 2.13 m spacing, and 82 rows oriented east/west, each with between 4 and 162 vines (9,596 vines total). Nonproductive vines were included in these vine counts. The rootstock for both CS2 and CH was 101-14. Clusters were gathered preharvest from CS1 on 21 and 22 July, from CS2 on 1, 7, and 8 Aug, from ME1 on 3 and 5 Aug, from ME2 on 24 and 26 Aug, and from CH on 17 and 19 Aug.
Random-systematic-random sampling: 2004.
During the pilot study, we noticed that nonrectangular block geometry could, potentially, greatly increase the odds of short row overrepresentation if row length were not taken into account during the row and vine selection process. For example, consider hypothetically a block with a very long row that is 10 times longer than a very short row. Were the odds of picking each of the two rows the same, and were the vinespecific cluster numbers the same for all vines, then it would follow that a given cluster from the short row would have 10 times the selection odds of any of its long row counterparts.
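The hypothetical above can be written out explicitly. The notation is introduced here for illustration only: R is the number of rows, n_r the number of vines in row r, and it is assumed that one row is drawn with equal odds and one vine is then drawn uniformly within it.

```latex
P(\text{vine } v \text{ in row } r) = \frac{1}{R} \cdot \frac{1}{n_r},
\qquad
\frac{P(\text{vine in short row})}{P(\text{vine in long row})}
  = \frac{n_{\text{long}}}{n_{\text{short}}} = 10 .
```

With equal cluster numbers per vine, the same ratio carries over from vines to clusters.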
To protect against this potential short-row bias, we implemented a modification of the rejection procedure (Tocher 1963). Candidate row and vine identification numbers were selected randomly with replacement in ordered pairs (row, vine), where the row identifier was selected randomly from a sequence of consecutive integers that ranged in value from one up to the total number of rows. Each vine identification number was chosen randomly from a sequence of integers that ranged from one to the total number of vines in the longest row. Whenever the vine designated in this way was not present within the row selected (because the row was too short), the entire (row, vine) identification number pair was discarded and a new (row, vine) number pair generated. If the identification number pair did not need to be discarded (because the identified vine was present within the selected row), then that row was designated for sampling and three more vines were randomly selected without replacement from within that same row, using a table of random numbers. In this way 12 rows and 48 vines were selected from each block.
In order to minimize the number of clusters removed from any individual vine, vines were sampled in pairs and the vine adjacent to each of the 48 vines was also selected for sampling. Adjacent was defined as the next higher numbered vine in the row. If the adjacent vine was nonproductive or the randomly selected vine was the final vine in a row, then the next lower numbered vine was selected. The inward-facing cordons of each vine pair were sampled. This procedure provided an overall vine/cordon sample size of 96.
Shoots were selected from these cordons using a Gaussian quadrature approach analogous to the approach applied in the pilot study, only using four instead of five sampling positions. Four Gaussian node positions were marked on lengths of elastic, whose relative positions equaled −0.861, −0.340, 0.340, and 0.861. As in the pilot study, 0 again designated the center of the elastic, and the four positions were obtained by making marks 0.861*(half total length) and 0.340*(half total length) both to the left and to the right of the center. In the field, the ends of the elastic were again lined up with the proximal and distal shoots (although these shoots were not sampled), and the four Gaussian node positions were denoted by the letters A through D, with A closest to the proximal shoot. Using a random number table, two sampling positions (A, B, C, or D) were chosen randomly, without replacement, for shoot selection from the first cordon, and the other two positions were used for shoot selection from the adjacent cordon. Both cordon lengths were recorded. One cluster was then selected randomly from each of the cluster-bearing shoots that were nearest these Gaussian nodes, and the cluster position and number of clusters on the shoot prior to sampling were recorded. In this way, two clusters were sampled from each member of the vine pair, which yielded a sample size of 192 clusters from each of four blocks. Since one cluster from CH was damaged before measurement, this reduced the CH sample size to 191 clusters.
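A minimal sketch of this rejection-type selection follows, using Python's standard `random` module in place of the random number sources named in the text. The function name and parameters are hypothetical. Two details are assumptions rather than statements from the text: a row that has already been accepted is redrawn so that the result contains distinct rows, and a row with fewer vines than requested contributes all of its vines.

```python
import random


def select_rows_and_vines(row_lengths, n_rows, vines_per_row=4, rng=random):
    """Rejection-style selection of rows, then vines within each accepted row.

    row_lengths[i] is the number of vines in row i + 1. Returns a list of
    (row, [vine, ...]) tuples using 1-based identification numbers.
    """
    if n_rows > len(row_lengths):
        raise ValueError("cannot select more distinct rows than exist")
    longest = max(row_lengths)
    selected = []
    seen_rows = set()
    while len(selected) < n_rows:
        row = rng.randint(1, len(row_lengths))  # candidate row identifier
        vine = rng.randint(1, longest)          # candidate vine identifier
        if vine > row_lengths[row - 1]:
            continue  # vine not present in this row: discard the whole pair
        if row in seen_rows:
            continue  # assumption: redraw rather than reuse an accepted row
        seen_rows.add(row)
        # Draw the remaining vines from the same row without replacement.
        others = [v for v in range(1, row_lengths[row - 1] + 1) if v != vine]
        n_extra = min(vines_per_row - 1, len(others))
        selected.append((row, [vine] + rng.sample(others, n_extra)))
    return selected


# Hypothetical block with very unequal row lengths.
picks = select_rows_and_vines([10, 100, 50, 4, 75], n_rows=3, rng=random.Random(1))
for row, vines in picks:
    print(row, vines)
```

Because a candidate pair survives only when the vine exists in the row, a row's chance of acceptance on any draw is proportional to its length, so the first vine of each accepted pair is equally likely to be any vine in the block.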
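The random split of the four positions between the two cordons of a vine pair, and the random choice of one cluster per shoot, might be sketched as follows. The function names are hypothetical and Python's `random` module stands in for the random number table; the node values are those quoted above.

```python
import random

# Relative positions of the four nodes quoted in the text, labeled A through D.
NODES_2004 = {"A": -0.861, "B": -0.340, "C": 0.340, "D": 0.861}


def split_positions(rng=random):
    """Randomly assign two of the four positions to each cordon of a vine pair."""
    labels = sorted(NODES_2004)
    first = sorted(rng.sample(labels, 2))           # drawn without replacement
    second = [lab for lab in labels if lab not in first]
    return first, second


def pick_cluster(n_clusters, rng=random):
    """Choose one cluster position at random; 1 denotes the basal cluster."""
    return rng.randint(1, n_clusters)


first_cordon, adjacent_cordon = split_positions(random.Random(0))
print(first_cordon, adjacent_cordon)
print(pick_cluster(2, random.Random(0)))
```

With 48 selected vines, each paired with a neighbor, and two positions per cordon, the count is 48 × 2 × 2 = 192 clusters per block, matching the sample size reported above.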
Harvest sample collection.
All clusters from each previously sampled vine were removed, counted, and weighed as a whole at harvest (5 Oct for CS1, 7 Sept for ME1, 4 Oct for CS2, 9 Sept for ME2, and 31 Aug for CH). Data analyses were conducted using the Statistical Package for the Social Sciences version 11.5 (SPSS, Inc., Chicago, IL).
Results
Cluster differences based on clusters per shoot.
Box plots that depict how the weights of individual clusters varied as a function of the total number of clusters on the sampled shoot are shown (Figure 1). These 2003 ME1 and CS1 box plots suggest that individual cluster weight depends statistically on whether clusters grow on shoots that support only one cluster (singleton cluster shoots) or grow on shoots that support two clusters.
For all six blocks sampled in 2003 and 2004, the mean cluster weight for two-cluster shoots was greater than the mean cluster weight for singleton cluster shoots (Table 1). Four tabled p-values were less than 0.03 and one was less than 0.06, which implies that there is a consistent and significant difference in mean cluster weight. For CS2 and 2003 CS1, p-values were less than 0.0001. CS2 was also the only block with a relatively high number (30) of sampled shoots that bore three clusters; the mean cluster weight from these shoots was 47.5 g, significantly larger than the singleton cluster mean of 32.5 g (p = 0.002).
Vine differences based on clusters per sampled shoot.
Relationships between the mean number of clusters per sampled shoot (two shoots were sampled from each vine) and the overall yield of the vine measured at harvest in 2004 are shown (Figure 2). For CS1, CS2, and CH in particular, box plots indicate a noticeable upward trend in vine yield as the mean number of clusters per sampled shoot increased.
Differences in cluster weight based on cluster position.
Side-by-side box plots that describe cluster weight distributions of 88 basal and 80 second clusters for the 2003 CS1 sample are shown (Figure 3). While the median lines are similar, the whisker lengths are quite different, suggesting that the standard deviations of the two groups of measurements are not the same. Correspondingly, a t test that compared mean cluster weights of basal and second clusters for this block only resulted in a p-value of 0.078, while Levene’s test for equality of variances resulted in a highly significant p-value of 0.003. The 2004 CS1 sample also showed a significant (p = 0.038) difference in variances between 130 basal and 61 second clusters, this time paired with a highly significant (p < 0.0002) difference between the mean weights of these clusters, which were 50.0 g (basal) and 37.0 g (second). The 2004 ME1 sample, with 131 basal and 61 second clusters, showed a borderline (p = 0.079) difference in variances, but a significant (p = 0.02) difference in means of 99.0 g (basal) and 82.4 g (second).
Cluster counts, vine yield, and cordon length.
In three out of the five blocks sampled in 2004, significant correlations were determined between measured vine yield at harvest (g) and sampled cordon length (cm) for CS1 (0.237*), CS2 (0.553**), and CH (0.220*). The data from all five blocks indicated significant correlations between sampled cordon length and number of clusters counted at harvest: CS1 (0.223*), CS2 (0.485**), ME1 (0.212*), ME2 (0.201*), and CH (0.293**).
Uniformity of cluster weights along the cordon.
Box plots of cluster weights (g) as a function of shoot position along the cordon for the five blocks sampled in 2004 are shown (Figure 4). As explained previously, A designates the most proximal sampled shoot and D the most distal sampled shoot relative to the trunk. The side-by-side box plots have similar medians, interquartile ranges, and whisker lengths, suggesting that, regardless of sampled shoot position along the cordon, there is within-block cluster weight distributional uniformity. (Analogous plots constructed in 2003 supported this conclusion.)
Discussion
Results suggest a possible sampling bias (Table 1). Because shoots with more clusters tend on average to bear larger clusters, sampling all clusters from shoots with nonsingleton clusters is likely to overestimate yield. Samples of clusters are typically limited in size; hence, the removal of all clusters from each located sampled shoot is likely to cause an overrepresentation of the more productive shoots. Bias induction is also suggested by the observed relationship between the mean number of clusters per sampled shoot on a given vine and the yield of that vine at harvest. (Since in 2004 only two shoots were sampled from each vine, finding such a relationship in three of the five blocks is particularly striking.) Evidence of differences between basal and second clusters was also found, with basal clusters appearing more variable and sometimes larger, on average, than second clusters. These findings suggest that random sampling of a sole cluster from each selected shoot is the preferable approach whenever biases in sampling are to be avoided.
The observed uniformity of cluster weights along the cordon indicates that systematic sampling, such as our Gaussian quadrature-based method, can help avoid bias. Correlations between cluster counts and cordon length and between vine yield and cordon length suggest that vines should be sampled as randomly as possible in order to avoid sampling bias. While a completely random sample of vines within blocks would be optimal, such an approach is not likely practical in light of the constraints on worker time that are commonplace during the growing season. The modified rejection-type procedure for random row and vine selection tends to reduce visits to multiple rows and also helps avoid the overrepresentation of vines grown on short rows.
Conclusion
Our analysis of blocks of cordon-pruned vines indicates that within-vine and between-vine differences do exist that can influence grape cluster sample validity. We propose that the random-systematic-random sampling technique provides a viable option for improving the quality of grape cluster samples on cordon-pruned vines. The technique is designed not only to reduce sampling bias but also to reduce worker time demands for increased sampling efficiency.
Footnotes

Acknowledgments: Funding for this project was provided by a UC Discovery Grant awarded by the Industry-University Cooperative Research Program (IUCRP), a three-way partnership between UC Berkeley, California industry, and the State of California, and by the University of California Committee on Research.

Special thanks to Daniel Bosch, Director of Viticulture, Icon Estates, who helped guide many stages of this research.
 Received November 2006.
 Revision received June 2007.
 Revision received September 2007.
 Copyright © 2008 by the American Society for Enology and Viticulture