VitisNet: “Omics” Integration through Grapevine Molecular Networks

  • Jérôme Grimplet,

    Affiliation Horticulture, Forestry, Landscape, and Parks Department, South Dakota State University, Brookings, South Dakota, United States of America

  • Grant R. Cramer,

    Affiliation Department of Biochemistry, University of Nevada Reno, Reno, Nevada, United States of America

  • Julie A. Dickerson,

    Affiliation Department of Electrical and Computer Engineering and Bioinformatics and Computational Biology Program, Iowa State University, Ames, Iowa, United States of America

  • Kathy Mathiason,

    Affiliation Horticulture, Forestry, Landscape, and Parks Department, South Dakota State University, Brookings, South Dakota, United States of America

  • John Van Hemert,

    Affiliation Department of Electrical and Computer Engineering and Bioinformatics and Computational Biology Program, Iowa State University, Ames, Iowa, United States of America

  • Anne Y. Fennell

    anne.fennell@sdstate.edu

    Affiliation Horticulture, Forestry, Landscape, and Parks Department, South Dakota State University, Brookings, South Dakota, United States of America

Abstract

Background

Genomic data release for the grapevine has increased exponentially in the last five years. The Vitis vinifera genome has been sequenced and Vitis EST, transcriptomic, proteomic, and metabolomic tools and data sets continue to be developed. The next critical challenge is to provide biological meaning to this tremendous amount of data by annotating genes and integrating them within their biological context. We have developed and validated a system of Grapevine Molecular Networks (VitisNet).

Methodology/Principal Findings

The sequences from the Vitis vinifera (cv. Pinot Noir PN40024) genome sequencing project and ESTs from the Vitis genus have been paired and the 39,424 resulting unique sequences have been manually annotated. Among these, 13,145 genes have been assigned to 219 networks. The pathway sets include 88 “Metabolic”, 15 “Genetic Information Processing”, 12 “Environmental Information Processing”, 3 “Cellular Processes”, 21 “Transport”, and 80 “Transcription Factors”. The quantitative data is loaded onto molecular networks, allowing the simultaneous visualization of changes in the transcriptome, proteome, and metabolome for a given experiment.

Conclusions/Significance

VitisNet uses manually annotated networks in SBML or XML format, enabling the integration of large datasets, streamlining biological functional processing, and improving the understanding of dynamic processes in systems biology experiments. VitisNet is grounded in the Vitis vinifera genome (currently at 8x coverage) and can be readily updated with subsequent updates of the genome or biochemical discoveries. The molecular network files can be dynamically searched by pathway name or individual genes, proteins, or metabolites through the MetNet Pathway database and web-portal at http://metnet3.vrac.iastate.edu/. All VitisNet files including the manual annotation of the grape genome encompassing pathway names, individual genes, their genome identifier, and chromosome location can be accessed and downloaded from the VitisNet tab at http://vitis-dormancy.sdstate.org.

Introduction

During the pre-genomics era, gene function was established through a reductionist approach [1] where organism physiology was understood by breaking components into pieces, studying them, and then putting them back together to see the larger picture. With the emergence of genome sequencing, organisms are now seen as complex interactive systems. Systems biology, adapted from the general system theory [2] and the living system theory [3], intends to explain biological phenomena utilizing a systemic view of the objects' relationships rather than their simple composition [4]. Integrative functional genomics combines the molecular components (transcripts, proteins, and metabolites) of an organism and incorporates them into functional networks or models designed to describe the dynamic activities of that organism. While many of the functions of individual parts are unknown or not well defined, their biological role can sometimes be inferred through association with other known parts, providing a better understanding of the biological system as a whole. On a system-wide scale the description requires three levels of information [5], [6]: (1) identification of the components (structural annotation) and characterization of their identity (functional annotation); (2) identification of molecules that interact with each component, which leads to the reconstruction of a biochemical reaction network; and (3) characterization of the behaviors of the transcripts, proteins, and metabolites under various conditions. Integration of the three levels of information into a coherent framework (or canvas) provides a powerful approach to tackle the difficult problem of extracting systems-wide behavior from the component interactions.

The most developed examples of application of this approach can be found in prokaryotes, because of their small genomes [7], [8]. For example, in E. coli, 92% of the gene product functions have been experimentally verified. Genome-scale models (GEMs) have been used for metabolic engineering to systematically manipulate E. coli strains to overproduce lycopene, lactic acid, ethanol, succinate, amino acids, and many other products including hydrogen and vanillin. New biological discoveries of open reading frames (ORF) can be made by focusing on the gaps in the unknown portions of the Omic maps, using the genomic responses of different genotypes under different conditions to determine the probable gene candidates that fill knowledge gaps. GEMs have been widely used to characterize and understand physiological responses to environmental conditions such as abiotic and biotic stresses. This has been particularly useful in the identification of resistance mechanisms that can be established in new strains.

Such global analyses have become possible with the development of high throughput genomics technologies in the fields of both nucleic acid sequencing and quantitative data acquisition. Over the last 20 years, expressed sequence tag (EST) sequencing [9] has been widely utilized for gene discovery and genome characterization. EST data are stored in comprehensive databases such as UniGene [10] or the DFCI Gene Indices [11]. Recently, cheaper and faster Next-Gen sequencing technologies have emerged such as 454 [12] or Illumina [13]. In parallel, methods have been developed for quantitative data acquisition: microarrays are used to quantitatively assess the transcriptome [14]. Two-dimensional gels have routinely been used for proteome studies [15]. Recently, however, gel-free technologies have emerged such as ICAT [16] or iTRAQ [17]. Metabolome studies are performed with a variety of tools such as gas chromatography or high performance liquid chromatography for separation and mass spectrometry and nuclear magnetic resonance for the identification and quantification of the metabolites [18].

Genomics resources for Vitis vinifera and related species have proliferated rapidly within the last several years, ranging from EST sequencing [19], [20], [21] to whole genome sequencing [22], [23] and integrated genetic maps [24]. These resources have permitted large-scale mRNA expression profiling studies during berry development using cDNA or oligonucleotide microarrays [25], [26]. A high-density Affymetrix GeneChip® Vitis vinifera (Grape) Genome Array containing approximately one-third of the expected gene content of the V. vinifera genome, with some bias towards leaf and berry tissues, was developed, leading to numerous publications [27], [28], [29], [30], [31], [32], [33]. Under the encouragement of the international grape community, the microarray data for several of these experiments have been centralized and can be accessed at PLEXdb (http://www.plexdb.org) [34]. Six additional microarray datasets using cDNA, oligo, or Affymetrix arrays are available through Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/sites/entrez?db=geo) and citations for publications are also linked to these public data sets [33], [35], [36]. Proteomics resources have also emerged recently. Most of these studies use 2-D gel analysis and focus either on berry metabolism [37], [38] or abiotic stress resistance [39], [40], [41] or both [42]. Recently, high resolution techniques such as iTRAQ have also been applied to grape [43]. Metabolomics studies for grape are still rudimentary; however, several works have presented simultaneous analysis of about 50 to 120 compounds [28], [30], [42], [44]. To date, only two studies present transcriptomic, proteomic, and metabolomic analyses on the same material, one in berry tissues [29], [42] and the other on abiotic stress in shoots [40], [28].

Information from structural and functional genomics must be combined with detailed biochemical reaction networks to further our understanding of biological function and incorporate the knowledge into cultural practice. While a considerable amount of effort has been put into resolving the structural information (level 1) and “Omics” characterization of individual groups of transcripts, proteins or metabolites (level 3), relatively few biochemical reaction networks (level 2) have been constructed in grapevines or other plant systems. Pathway databases such as KEGG (http://www.genome.jp/kegg/pathway.html) and AraCyc [45] exist, but they are limited to metabolic pathways. In contrast, MetNet (http://metnet3.vrac.iastate.edu/) stores both metabolic and regulatory interactions for Arabidopsis and soybean [46].

In order to contextualize the molecular structure and a metric representing their behavior, we have developed a model of the molecular networks present in grapevines (VitisNet). This resource allows visualization of the dynamic interactions in the transcriptome, proteome, and metabolome within known molecular networks (for example, metabolic or signaling pathways). Integrating transcripts with protein and metabolite profiles in a comprehensive molecular map enables the researcher to elucidate different biochemical responses of grapevines to developmental and environmental cues.

Results and Discussion

A Set of 39,424 Unique Sequences Defined

The set of unique genes was not restricted to the Pinot Noir genome sequences, as an extensive amount of data has been produced on other V. vinifera cultivars and other Vitis species. The V. vinifera EST database contains only a very small fraction of Pinot Noir sequences (1.8% or 6,385/353,688), whereas Cabernet Sauvignon (half of the EST sequences), Chardonnay, Thompson Seedless, Muscat de Hambourg, and Perlette each have at least two times the number of Pinot Noir sequences. In addition, a significant number of ESTs have been produced for other Vitis species. It is expected that a significant number of transcript sequences are cultivar and species specific and may not be represented within the Pinot Noir PN40024 genome. A set of 39,424 unique sequences was defined after the matching of the genomic sequences and the transcripts (Figure 1). Only 36.4% of these sequences (14,330) were found in both the genomic sequences and the transcripts. In the set of unique sequences, the genomic sequences were retained in preference to the transcript sequences because they should represent the full length gene, whereas there is less certainty for the transcript. In some cases, several supposedly unique transcript sequences matched a single gene, mainly because they matched different regions of the gene. A total of 652 unique sequences corresponded to previously published grapevine sequences (Table S1).

Figure 1. Overview of the unique set assembly and results of the annotation procedure.

Box sizes are relative to respective number of genes inside.

https://doi.org/10.1371/journal.pone.0008365.g001

The set that was found only in the genomic sequences included 40.8% (16,104) of the unique sequences. This means that so far there is no proof that these sequences are actually transcribed. Finally, 22.8% (8,990) of the unique sequences were found only as transcripts. This set could include cultivar or species specific genes absent in the Pinot Noir genome or genes not yet extracted from the genome. However, as 73% (6,553) of these unique sequences were not homologous to sequences from other organisms, it is likely that most of them corresponded to short sequences or contained mostly UTR regions so that a BLAST analysis could not be conducted against the genome sequences encoding for their putative proteins. These sequences were of interest because many of them were placed on the highly popular Affymetrix GeneChip® Vitis vinifera (Grape) Genome Array. There were 3,208 sequences amongst the 11,734 non-redundant sequences in the Affymetrix chip that did not present a match in the genome.
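The partition of the unique set described above can be checked with a few lines of arithmetic. The sketch below uses only counts stated in the text; the variable names are illustrative. It confirms that the three groups add up to 39,424, that the two genome-derived groups add up to the 30,434 genome sequences cited in Materials and Methods, and that each reported percentage agrees with its count to within 0.1 percentage point.

```python
# Counts reported in the text for the unique sequence set.
TOTAL = 39_424
counts = {
    "genome_and_transcript": 14_330,
    "genome_only": 16_104,
    "transcript_only": 8_990,
}
# Percentages as reported in the text.
reported_pct = {
    "genome_and_transcript": 36.4,
    "genome_only": 40.8,
    "transcript_only": 22.8,
}

# The three groups should partition the unique set exactly.
assert sum(counts.values()) == TOTAL

# Sequences derived from the genome, whether or not a transcript matched.
genome_total = counts["genome_and_transcript"] + counts["genome_only"]

for name, n in counts.items():
    pct = 100 * n / TOTAL
    # Allow for rounding in the reported figures.
    assert abs(pct - reported_pct[name]) < 0.1
    print(f"{name}: {n:,} sequences, {pct:.2f}% of the unique set")

print(f"genome-derived sequences: {genome_total:,}")
```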

Half of the Matched Sequences Were Assigned to Molecular Networks

Seventy percent (27,680) of the unique genes matched a previously described Vitis cDNA or protein or a sequence from another organism; this proportion rose to 83% when only genes from the genome sequences were considered. The remaining 11,744 sequences were Vitis-specific and a function could not be assigned. The matched gene set was divided into two groups, a group that could not be assigned to molecular networks and a group that could be assigned. The group that was not assigned to molecular networks consisted of 14,535 genes (52.5%) that covered a wide range of functional descriptions. At one extreme, 1,817 sequences had a completely unknown function. At the other extreme, 1,578 unmapped sequences received an identifier, because an EC or KO number could be attributed to them or an Arabidopsis homolog had an identifier; however, they could not be placed on the networks. Between these two extremes, the description of the function ranged from a poorly described domain, through a general enzymatic activity, to a well-documented gene.

The second subset of the matched genes (13,145 sequences, 47.5%), which were homologous to proteins with a known function, was assigned to the molecular networks. The 13,145 genes present in the networks were classified into 6 main overlapping categories (Tables 1-6): Metabolism (5,442 sequences), Genetic Information Processing (1,249 sequences), Environmental Information Processing (1,305 sequences), Cellular Processes (1,121 sequences), Transport (3,523 sequences), and Transcription Factors (2,423 sequences). The complete annotation of the genes and relevant information for each is presented in Table S1. The references used for annotating genes and for developing pathways not found in KEGG are presented in Text S1.
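Because the six categories overlap, their sizes do not sum to the number of genes in the networks. A short calculation with the figures quoted above (names are illustrative) makes the extent of the overlap explicit: the categories hold 15,063 memberships for 13,145 genes, so 1,918 memberships are second or later assignments of a gene that already belongs to another category.

```python
# Category sizes as listed in the text (a gene may appear in several categories).
category_sizes = {
    "Metabolism": 5_442,
    "Genetic Information Processing": 1_249,
    "Environmental Information Processing": 1_305,
    "Cellular Processes": 1_121,
    "Transport": 3_523,
    "Transcription Factors": 2_423,
}
NETWORK_GENES = 13_145  # distinct genes assigned to the networks

# Total category memberships, counting a gene once per category it belongs to.
memberships = sum(category_sizes.values())

# Memberships beyond one per gene, i.e. repeat assignments caused by overlap.
extra_memberships = memberships - NETWORK_GENES

print(f"memberships: {memberships:,}")
print(f"extra memberships due to overlap: {extra_memberships:,}")
```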

Table 2. List of Genetic Information Processing Networks.

https://doi.org/10.1371/journal.pone.0008365.t002

Table 3. List of Environmental Information Processing Networks.

https://doi.org/10.1371/journal.pone.0008365.t003

Construction of 219 Networks

The networks were constructed with the CellDesigner software. This software has the benefit of being able to save the networks in the SBML (Systems Biology Markup Language) format. This format is highly portable into a variety of software packages, including Cytoscape, which was used here for data visualization of molecular expression. The networks were constructed with four main families of nodes (genes, transcripts, proteins, and metabolites) represented by specific shapes and colors in CellDesigner (Figure 2) and by shape only in Cytoscape (Figure 3; color was used to visualize abundance). In VitisNet, some extra node styles can be used in the networks for additional categories (phenotypes, phylogenetic tree nodes, etc.). Edge styles represented different types of reactions, and they were specified by shape in CellDesigner and color in Cytoscape; Text S2 has a legend that summarizes the node and edge styles used in VitisNet in Cytoscape. Five-digit IDs were assigned to the networks (Table S2). The first digit refers to the network category (metabolic pathway etc.), and the last four digits refer to the KEGG pathway number (if it existed in KEGG).
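The five-digit ID scheme can be illustrated with a small parser. This is a minimal sketch, not part of VitisNet: the function and class names are hypothetical, the category digits follow the numbering used in the section headings (1 to 6), and the example ID 10020 is an illustrative construction (category 1 plus the last four digits of the KEGG citrate cycle map number) rather than a value copied from Table S2.

```python
from typing import NamedTuple

# Category digits as used in the section headings of this article.
CATEGORIES = {
    "1": "Metabolic Pathways",
    "2": "Genetic Information Processing",
    "3": "Environmental Information Processing",
    "4": "Cellular Processes",
    "5": "Transport",
    "6": "Transcription Factors",
}


class NetworkId(NamedTuple):
    category: str
    pathway_number: str  # last four digits, kept as text to preserve leading zeros


def parse_network_id(network_id: str) -> NetworkId:
    """Split a five-digit network ID into its category and four-digit pathway number."""
    if len(network_id) != 5 or not (network_id.isascii() and network_id.isdigit()):
        raise ValueError(f"expected five digits, got {network_id!r}")
    category = CATEGORIES.get(network_id[0])
    if category is None:
        raise ValueError(f"unknown category digit in {network_id!r}")
    return NetworkId(category, network_id[1:])


# Illustrative ID only; see Table S2 for the identifiers actually assigned.
print(parse_network_id("10020"))
```

Keeping the pathway number as a string avoids losing leading zeros, which matter when the number is compared with a KEGG map identifier.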

Figure 2. Citrate cycle pathway visualized using CellDesigner.

Symbols represent different molecules or reactions, i.e. blue rectangle: gene; green parallelogram: transcript; orange round rectangle: protein; and yellow ellipse: metabolite. Edges with a circle at the tip: catalysis (A). Edges with Delta at the tip: metabolic reaction (B). Edges with dash-dot-dot-dash: transcription (C). Edges with dash-dot-dash: translation (D). Insert box at the upper right represents a zoom-in of an area of the network showing the different molecule types.

https://doi.org/10.1371/journal.pone.0008365.g002

Figure 3. Flavonoid biosynthesis pathway and tissue-specific molecule abundance visualized using Cytoscape.

Parallelogram: transcript; rectangle: protein; ellipse: metabolite; triangle: reaction node. Blue edges with circle at the tip: catalysis. Black edges with Delta at the tip: metabolic reaction. Turquoise edges: translation. Transcript node in bold: existence of an Affymetrix (Vitis vinifera (Grape) Genome Array) probeset. Red: over-abundant in seed; magenta: over-abundant in skin; green: over-abundant in pulp; orange: over-abundant in seed and skin. Insert box at the upper right represents a zoom-in of an area of the network showing the different molecule types.

https://doi.org/10.1371/journal.pone.0008365.g003

Metabolic pathways (1).

Metabolic pathways are the most common type of pathway that can be found for plants in several online databases such as KEGG or PlantCyc (http://www.plantcyc.org/). These networks (Table 1) represented metabolic reactions known to occur in grapevines. With the software package KEGG2SBML, it was easy to import the metabolic pathways from KEGG. The imported KEGG pathways had limitations, however: they showed only the metabolites and proteins involved in reactions, and they included reactions that may not occur in plants. Therefore, additional information and symbols representing the missing grape genes and transcripts were added to the networks in VitisNet described in this paper. Reactions in KEGG without a putative grape protein identified and for which no evidence for their presence in plants could be found in the literature were removed. Finally, reactions in grapevines that were absent in KEGG were manually added to the networks. The total number of items in the 88 grape metabolic pathways constructed included: 7,854 genes and transcripts, 1,631 proteins, and 1,998 metabolites. Some of these items were present in more than one network.

Genetic information processing (2).

The category “Genetic Information Processing” (Table 2) corresponds to housekeeping mechanisms that are present and highly conserved in all eukaryotes. These networks were present on the KEGG website but in a different format than the metabolic networks; therefore exportation with KEGG2SBML was not possible. These networks were represented by a picture of a specific modus operandi, with every involved protein listed at the side rather than in a diagram of the enzymatic reactions. In VitisNet, we have tried to represent these pictures interactively. Where this was not possible, the networks were presented as lists of genes, transcripts, and proteins. The total number of items in the 15 “Genetic Information Processing” networks included 1,338 genes and transcripts, 527 proteins, and 71 metabolites.

Environmental information processing (3).

The category “Environmental Information Processing” (Table 3) represents signal processes that occur in the grapevine. The networks belonging to “Signal Transduction” are highly variable amongst species but they are well documented for Arabidopsis in KEGG and were constructed using the Arabidopsis data. The networks for hormone signaling and plant-specific signaling were reconstructed from the literature. To the best of our knowledge, these networks could not be found in any other pathway databases. These networks are particularly valuable for the plant community since hormonal signaling is an important subject in many plant physiology studies. The total number of items in the 12 “Environmental Information Processing” networks included 1,373 genes and transcripts, 563 proteins, and 63 metabolites.

Cellular processes (4).

The networks for the “Cellular Processes” category (Table 4) were named from the KEGG pathways; however, the KEGG pathways were not related to the molecular events occurring in plants. Although a small portion of the pathways were derived from KEGG, most components of the networks were constructed from information collected from the literature. The total number of items in the 3 “Cellular Processes” networks included 1,123 genes and transcripts, 359 proteins, and 12 metabolites.

Transport (5).

The networks for Hormone Transport (5.2) and Transport Systems were constructed from the literature (Table 5). The networks in “Transporters Catalog” present the classification of the putative grape transporters according to the transporter classification (TC) system. This classification was formally adopted by the International Union of Biochemistry and Molecular Biology (IUBMB) in June 2001 and is the international standard for the classification of transporters. In VitisNet, molecules designating a transporter were linked to their corresponding category. The total number of items in the 21 “Transport” networks included 3,622 genes and transcripts, 1,149 proteins, and 1 metabolite.

Transcription factors (6).

These networks presented the classification of the grape putative transcription factors (Table 6). The classification used here was a customized version of two plant transcription factor databases that contained a total of 80 families. The PlantTFDB [47] contained 64 families and the PlnTFDB [48] contained 68 families. Most of the families (58) were present in the two databases, although two families were exclusive to PlantTFDB and eight were exclusive to PlnTFDB. In addition, 12 families were exclusive to the grapevine transcription factors. Representatives of five of these families were present in the PlnTFDB under the family named “orphans” and we chose to break this group into distinct families. The seven other families identified were proteins that contain a domain found in BTF2-like transcription factors, Synapse associated proteins and DOS2-like proteins (BSD, [49]), the Global Transcription Factor group (GTF), and subfamilies of zinc finger proteins. The transcription factor families were presented as a phylogenetic tree, which allowed subfamilies to be grouped together. The total number of items in the 80 “Transcription Factors” networks included 2,423 genes, transcripts, and proteins.

Omics Data Can Be Visualized on the Networks

Annotation of the genes and construction of VitisNet has filled a major gap in precise descriptive and quantitative tools for grapevine systems biology. The next challenge is the integration of the data. The molecular networks were built to allow simultaneous visualization of transcripts, proteins, and metabolites. Their respective abundance under various conditions can be visualized through the Cytoscape software.

Several methods exist to correlate and integrate transcript, protein, and metabolite profiles. For example molecular abundance profiles were linked with Pearson [50], [51] and Spearman [52] correlation coefficients, the BL-SOM method [53], [54] and the O2PLS method [55]. The O2PLS method enables the determination of the effect of each variable, in a multivariable experiment, on the co-expression of molecules. More recently the O2PLS method has been developed further to integrate all three molecular profiles (transcripts, proteins, and metabolites) [56].
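As a concrete illustration of the simplest of these approaches, the sketch below computes Pearson and Spearman correlation coefficients between two abundance profiles using only the Python standard library. The function names and the four-point transcript and protein profiles are hypothetical and are not data from the cited studies; BL-SOM and O2PLS are not covered here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sd_x * sd_y)


def ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical abundance of one transcript and its protein across four samples.
transcript = [1.0, 2.0, 4.0, 8.0]
protein = [1.5, 1.9, 3.0, 9.0]
print(f"Pearson:  {pearson(transcript, protein):.3f}")
print(f"Spearman: {spearman(transcript, protein):.3f}")
```

Spearman depends only on the ordering of the values, so the two profiles above, which rise together but not linearly, give a rank correlation of 1 and a Pearson coefficient below 1.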

In most of these statistical studies, data were visualized by representing molecules by nodes and the correlation by edges. Subsequently, selected pathways were drawn manually for biological phenomena highlighted by the correlations of molecular abundance. In the visualization of “omics” data in VitisNet, edges represented biological processes and nodes represented molecules, as in classical presentations of pathways. Molecular abundance was represented by color changes of the nodes and biological phenomena could be visualized automatically. As an illustration of the methodology used in VitisNet to provide visualization of “omics” data, datasets from a study of the differential transcript, protein, and metabolite abundance measured in three berry tissues [29], [42] were uploaded into the molecular maps. For consistency, proteins and metabolites [42] were clustered with the same methods used for clustering the transcripts [29] and the same color scheme was used (green = molecules over-abundant in pulp, purple = molecules over-abundant in the skin, and orange = molecules over-abundant in seed [29]). The flavonoid biosynthesis pathway (Figure 3) presented here was more complex than previous representations of the pathway in [29] and [42]. Here it was further customized from the total flavonoid biosynthesis pathway in VitisNet by removing the gene nodes for easier visualization. As these studies have illustrated, molecules involved in the flavonoid biosynthesis pathway are slightly more abundant in skin than seed and clearly more abundant in both skin and seed than in the pulp. Transcriptomic results from Affymetrix GeneChip® Vitis vinifera (Grape) Genome Array were used here, but data from any microarray platform can be uploaded onto the networks. For example, Table S1 contains data for mapping the cDNA array used in a grape bud chilling requirement fulfillment study [35]. The integration of the berry tissues “omic” data on all the pathways was divided into higher level pathway categories; the Cytoscape session files, molecular networks and a tutorial (Text S2) can be accessed and downloaded at the VitisNet tab at http://vitis-dormancy.sdstate.org. All molecular network files are also available for browsing or downloading at MetNet (http://metnet3.vrac.iastate.edu/).
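The idea of encoding tissue-specific abundance as node color can be sketched as follows. This is an illustration only: the published analysis assigned molecules to tissues by clustering [29], whereas the sketch uses a simple fold-ratio rule as a stand-in, and the function name, the threshold, the hex color values, and the node abundances are all hypothetical. The green, purple, and orange assignments follow the scheme named in the text.

```python
# Color scheme named in the text; the hex values themselves are assumptions.
TISSUE_COLORS = {"pulp": "#2ca02c", "skin": "#800080", "seed": "#ff7f0e"}
NEUTRAL_COLOR = "#cccccc"  # assumed color for nodes with no dominant tissue


def dominant_tissue(abundance, min_ratio=2.0):
    """Return the tissue whose abundance is at least min_ratio times the runner-up.

    A simplified stand-in for the clustering used in the study; returns None
    when no single tissue dominates.
    """
    if len(abundance) < 2:
        raise ValueError("need abundances for at least two tissues")
    ranked = sorted(abundance.items(), key=lambda item: item[1], reverse=True)
    (top_tissue, top_value), (_, second_value) = ranked[0], ranked[1]
    if top_value <= 0:
        return None
    if second_value <= 0 or top_value / second_value >= min_ratio:
        return top_tissue
    return None


# Hypothetical abundances for two network nodes.
nodes = {
    "CHS_transcript": {"pulp": 1.0, "skin": 8.0, "seed": 3.0},
    "unchanged_protein": {"pulp": 1.0, "skin": 1.1, "seed": 0.9},
}

node_colors = {
    name: TISSUE_COLORS.get(dominant_tissue(values), NEUTRAL_COLOR)
    for name, values in nodes.items()
}
print(node_colors)
```

A table of this kind (node name mapped to a color or to a tissue label) is the sort of attribute that a network viewer can use to style nodes, which keeps the network layout itself unchanged between experiments.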

Conclusion

An exhaustive coverage of the network of grapevine molecules has been developed. It presents an easy, fast, and comprehensive method for simultaneous integration and visualization of “omics” data. These molecular networks provide biological value for both grapevine researchers and the rest of the plant science community. The following attributes are provided: (i) original plant-specific pathways within VitisNet, (ii) the possibility to create a mapping file of genes from other plants, and (iii) the ability to customize the schematics for new or species-specific reactions. In the future, in cooperation with the scientific community's curation of gene annotations, we are planning to release new networks and update existing networks with emerging data (e.g., miRNA) at MetNet (http://metnet3.vrac.iastate.edu/) and VitisNet (http://vitis-dormancy.sdstate.org/pathways.cfm).

Materials and Methods

Definition of a Unique Set of Genes

The 30,434 DNA sequences encoding for putative proteins from the Vitis vinifera (cv. Pinot Noir PN40024) genome [23] were matched to EST sequences from Vitis vinifera and other Vitis species. The V. vinifera sequences originated from the 5.0 release of the DFCI grape index (http://compbio.dfci.harvard.edu/tgi/cgi-bin/tgi/gimain.pl?gudb=grape) which contained 34,134 unique sequences. The set of non-vinifera sequences contained a total of 26,589 redundant ESTs obtained from the NCBI website. This set included sequences from the following species: V. shuttleworthii (10,704 sequences), hybrid cultivars (6,542 sequences), V. arizonica x rupestris (5,421 sequences), V. aestivalis (2,101 sequences), and V. riparia (1,821 sequences). A BLAST analysis of the sequences from the V. vinifera EST set and the non-vinifera EST set (Megablast, p>95, e-value<1e-15) was conducted against the genomic sequences. Sequences not identified in the genome were added to the genomic sequences to constitute the unique sequences set. The 1,395 mRNAs corresponding to grapevine protein sequences registered in UniProt and not belonging to one of the two genome sequencing projects were manually retrieved and BLAST analyzed (blastn, e-value<1e-15) against the unique sequences set.
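The assembly rule described above (keep every genomic sequence, then add each transcript that has no qualifying match in the genome) can be sketched as a filter over BLAST hits. The sketch makes two assumptions that are not stated in the text: that "p>95" denotes a percent identity above 95, and that hits are available as simple records of query, subject, percent identity, and e-value. All names and the toy records are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    query: str           # transcript (EST or tentative contig) identifier
    subject: str         # genomic sequence identifier
    pct_identity: float  # assumed meaning of "p" in the text
    evalue: float


def unmatched_transcripts(
    transcripts: Iterable[str],
    hits: Iterable[Hit],
    min_identity: float = 95.0,
    max_evalue: float = 1e-15,
) -> List[str]:
    """Transcripts with no hit passing both the identity and e-value thresholds."""
    matched = {
        hit.query
        for hit in hits
        if hit.pct_identity > min_identity and hit.evalue < max_evalue
    }
    return [t for t in transcripts if t not in matched]


def build_unique_set(genome_ids, transcripts, hits) -> List[str]:
    """Genomic sequences first, followed by transcripts absent from the genome."""
    return list(genome_ids) + unmatched_transcripts(transcripts, hits)


# Toy example with hypothetical identifiers.
genome = ["gene_A", "gene_B"]
ests = ["est_1", "est_2", "est_3"]
blast_hits = [
    Hit("est_1", "gene_A", 99.2, 1e-80),  # passes both thresholds
    Hit("est_2", "gene_B", 91.0, 1e-40),  # identity too low
    Hit("est_3", "gene_B", 98.0, 1e-5),   # e-value too high
]
print(build_unique_set(genome, ests, blast_hits))
```

Putting the genomic sequences first mirrors the preference stated in the Results for the genomic version of a gene over its transcript.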

Gene Annotation

During the first steps of annotation, a batch BLAST analysis (blastx, e-value<1e-10) of unique sequences was conducted against several relevant databases, including the Arabidopsis and rice genomes and the Viridiplantae protein sequences in NCBI. For each gene, the ten best significant matches in each database were conserved and reviewed for defining the most likely annotation. Particular attention was paid to using identical nomenclature for genes with the same function. A BLAST analysis of the genes that had at least one significant match containing a putative function was conducted against the KEGG database (http://www.genome.jp/kegg/) for defining an enzyme commission (EC) number or a KEGG Orthology (KO) number. For genes not identified in this screen, the EC number of genes suspected to encode for a protein with enzymatic function was identified by browsing enzyme nomenclature databases (such as Expasy (http://www.expasy.org/enzyme/) or BRENDA (http://www.brenda-enzymes.org/)). A BLAST analysis (blastx, e-value<1e-10) of the unique set was conducted against the Transport Classification Database (TCDB) (http://www.tcdb.org/) and the genes matching sequences from that database were again manually reviewed and assigned to a category from the Transport Classification System [57].
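The step of keeping the ten best significant matches per database for manual review can be sketched as a grouping operation. The record layout, the function name, and the choice to rank matches by e-value are assumptions made for illustration; the text does not state how "best" was measured.

```python
from collections import defaultdict


def best_hits(hits, max_evalue=1e-10, keep=10):
    """Keep up to `keep` significant hits per (gene, database), best e-value first.

    `hits` is an iterable of (gene, database, subject, evalue) tuples.
    """
    grouped = defaultdict(list)
    for gene, database, subject, evalue in hits:
        if evalue < max_evalue:  # significance threshold quoted in the text
            grouped[(gene, database)].append((evalue, subject))
    return {
        key: [subject for _, subject in sorted(candidates)[:keep]]
        for key, candidates in grouped.items()
    }


# Hypothetical hits: twelve significant Arabidopsis matches and one weak rice match.
example_hits = [
    ("g1", "arabidopsis", f"s{i}", 10.0 ** -(50 - i)) for i in range(12)
]
example_hits.append(("g1", "rice", "r1", 1e-5))

shortlist = best_hits(example_hits)
print(shortlist[("g1", "arabidopsis")])
```

Genes with no significant match in a database simply have no entry for that database, which makes it easy to list the sequences that still need another annotation route.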

BLAST analysis (blastx, e-value<1e-10) of the unique set was conducted against two plant transcription factor databases, PlantTFDB, (http://planttfdb.cbi.pku.edu.cn/) [47] and PlnTFDB (http://plntfdb.bio.uni-potsdam.de/v2.0/) [48]. InterPro domains obtained for the grape sequences from the UniProt website were also used for the classification of transcription factors. The transcription factors were then grouped into families.

Where molecular interactions were identified in the literature, the gene function was browsed to identify the Vitis gene potentially involved. The genes described in the literature were validated by BLAST against the unique set of Vitis sequences to correctly identify any potential homolog that was previously mislabeled.

A short identifier was defined for genes that were present on the networks but did not have a previously defined EC number or a KO. For most of these, that identifier corresponded to the one commonly used for their Arabidopsis homolog in their Entrez webpage (http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene). For genes without an Arabidopsis homolog with a clear identifier, a unique identifier was created that was consistent with the gene function.

Network Construction

Metabolic pathways (1).

KEGG metabolic pathways were downloaded from the KEGG website and converted into SBML files with the KEGG2SBML software package [58]. Grape genes and transcripts were manually added to the networks and linked to their corresponding proteins with the CellDesigner software package [59]. Plant- or grape-specific reactions that were not present in KEGG but were described in the literature were added manually.
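Because the networks are stored as SBML, which is XML, their species and reactions can be listed with a standard XML parser. The sketch below reads a toy SBML document embedded in the code; it is not a VitisNet file, the identifiers and names in it are hypothetical, and the CellDesigner-specific annotations that real files carry (for example, node shapes and layout) are ignored. Tags are matched by local name so that the SBML level and version declared in the namespace do not matter.

```python
import xml.etree.ElementTree as ET

# Toy SBML document (hypothetical content, one reaction between two species).
TOY_SBML = """<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">
  <model id="toy_network">
    <listOfSpecies>
      <species id="s1" name="citrate" compartment="default"/>
      <species id="s2" name="isocitrate" compartment="default"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="r1">
        <listOfReactants><speciesReference species="s1"/></listOfReactants>
        <listOfProducts><speciesReference species="s2"/></listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def species_refs(reaction, list_name):
    """Species IDs referenced in one child list (reactants or products) of a reaction."""
    return [
        ref.get("species")
        for child in reaction
        if local_name(child.tag) == list_name
        for ref in child
        if local_name(ref.tag) == "speciesReference"
    ]


def read_network(sbml_text):
    """Return ({species_id: name}, [(reaction_id, reactants, products)])."""
    root = ET.fromstring(sbml_text)
    species, reactions = {}, []
    for element in root.iter():
        name = local_name(element.tag)
        if name == "species":
            species[element.get("id")] = element.get("name")
        elif name == "reaction":
            reactions.append((
                element.get("id"),
                species_refs(element, "listOfReactants"),
                species_refs(element, "listOfProducts"),
            ))
    return species, reactions


species, reactions = read_network(TOY_SBML)
print(species)
print(reactions)
```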

Genetic information processing (2), signal transduction (3.1), and ABC transporters (5.1).

KEGG pathways were manually reconstructed with CellDesigner using the SBML format, and then grape genes and transcripts were manually added to the networks and linked to their corresponding proteins. Plant- or grape-specific processes that were not present in KEGG but were described in the literature were manually added.

Hormone signaling (3.2), plant-specific signaling (3.3), cellular processes (4), hormone transport (5.2), and transport system (5.3).

Networks were manually constructed from the literature with CellDesigner using the SBML format, and then grape genes and transcripts were manually added to the networks and linked to their corresponding proteins.

Transport catalog (5.4).

Networks were manually constructed with CellDesigner using the SBML format. Grape genes and transcripts matching transporter proteins from other organisms were manually added to the networks and linked to their corresponding proteins. Proteins were linked to an object class representing a transporter subcategory from the TCDB.

Transcription (6).

Networks were manually constructed with CellDesigner using the SBML format. Grape genes and transcripts matching transcription factors from other species were manually added to the networks and linked to their corresponding proteins. For each transcription factor family, a phylogenetic tree was constructed with the neighbor-joining method from a protein alignment generated using ClustalW. The transcription factors were then grouped according to the phylogenetic tree. Distances between transcription factors on the networks do not reflect their respective phylogenetic distances. All the relevant bibliography for the construction of literature-based pathways is included in Table S2 and Text S1.

Expression Profiling

Affymetrix probesets were matched to the genome using the same process as that used to match the genome sequences and EST sequences. The tentative contigs from the DFCI Grape Gene Index (http://compbio.dfci.harvard.edu/tgi/cgi-bin/tgi/gimain.pl?gudb=grape) that contain the ESTs used as templates for the Affymetrix probesets were compared by BLAST against the genome sequences (Megablast, p>95, e-value<1e-15).
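The probeset-to-gene matching can be sketched as a two-step lookup: each probeset points to a tentative contig (TC), and each TC is assigned to its best genome hit. The sketch below is an illustration under stated assumptions, not the procedure actually used: it reads "p>95" as a percent identity above 95, which the text does not spell out, and the function name and all identifiers are hypothetical.

```python
def map_probesets_to_genes(probeset_to_tc, tc_hits,
                           min_identity=95.0, max_evalue=1e-15):
    """Map probesets to genes through their template TC.

    tc_hits is an iterable of (tc, gene, percent_identity, evalue) tuples.
    """
    best = {}  # tc -> (gene, evalue) of the lowest e-value hit passing filters
    for tc, gene, identity, evalue in tc_hits:
        # Assumption: "p>95" in the text means percent identity above 95.
        if identity > min_identity and evalue < max_evalue:
            if tc not in best or evalue < best[tc][1]:
                best[tc] = (gene, evalue)
    return {
        probeset: best[tc][0]
        for probeset, tc in probeset_to_tc.items()
        if tc in best
    }


# Hypothetical inputs; identifiers and scores are illustrative only.
probeset_to_tc = {"probe_1": "TC_1", "probe_2": "TC_2", "probe_3": "TC_3"}
tc_hits = [
    ("TC_1", "gene_A", 99.0, 1e-80),
    ("TC_1", "gene_B", 96.0, 1e-20),
    ("TC_2", "gene_C", 90.0, 1e-40),  # rejected: identity too low
    ("TC_3", "gene_D", 98.0, 1e-5),   # rejected: e-value too high
]
mapping = map_probesets_to_genes(probeset_to_tc, tc_hits)
```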

Transcriptomic data were retrieved from Grimplet et al. [29]. Proteomics and metabolomics data were retrieved from Grimplet et al. [42]. All molecules with differential abundance were grouped into 12 clusters presented by Grimplet et al. [29] according to their abundance in the three berry tissues. Data were visualized using VitisNet with the Cytoscape software [60] (see Text S2 for a tutorial on the complete procedure).
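Loading quantitative data onto a network amounts to joining each measured molecule to its network identifier. The sketch below builds a tab-delimited attribute table of the kind that can be imported into a network viewer. It is only an illustration: the actual procedure is the one described in Text S2, and the function name, column names, and sample values here are hypothetical.

```python
import csv
import io


def build_attribute_table(annotation, measurements):
    """Join measurements to network identifiers and return tab-delimited text.

    annotation:   {gene: network identifier}
    measurements: {gene: quantitative value, e.g. a log ratio}
    Genes without a network identifier are left out of the table.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene", "network_id", "value"])
    for gene in sorted(measurements):
        if gene in annotation:
            writer.writerow([gene, annotation[gene], measurements[gene]])
    return buffer.getvalue()


# Hypothetical inputs; identifiers and values are illustrative only.
annotation = {"gene_1": "1.1.1.1", "gene_2": "K00001"}
measurements = {"gene_1": 2.5, "gene_3": -1.0}
table = build_attribute_table(annotation, measurements)
```

The same join applies to transcript, protein, and metabolite data, provided each molecule carries the identifier used on the networks.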

Supporting Information

Table S1.

The complete grape gene annotation based on the 8X assembly (Jaillon et al., 2007) and on transcript sequences. Unique Gene: Genoscope ID (Jaillon et al., 2007) is used if a genome sequence has been identified, otherwise the VVGI 5 TC (Tentative Consensus sequence) number or EST GenBank ID is used. Unique transcript: VVGI 5 TC number or EST GenBank ID is used if a transcript has been identified, otherwise the Genoscope ID is used. Function: tentative functional annotation. Network ID: the identifier that is used in the networks. Network or simplified category: list of the networks where the gene appears, otherwise a short description of the biological role. In Network: the gene is present in at least one network. Probeset: probeset ID for the Affymetrix GeneChip® Vitis vinifera (Grape) Genome Array. Best Arabidopsis match: best hit among the Arabidopsis putative proteins. InterPro domain ID: list of the domains detected from InterPro (Hunter et al., 2009). Gene Ontology ID: list of the identified GO terms. Gene Ontology description: description of the GO term (The Gene Ontology Consortium, 2009). Accession UniProt: UniProt ID for the genome sequences (Apweiler et al., 2004). Accession UniProt for published grapevine protein: UniProt ID for grapevine proteins published independently of the genome sequencing. EST probeset: EST from which the probeset was designed. IASMA gene: ID from the heterozygous Vitis genome (Velasco et al., 2007). Chromosome position: position of the gene on the chromosome, retrieved from Gramene.org. Other Vitis: presence in non-vinifera Vitis species. cDNA array: ID used in the cDNA array from Mathiason et al. (2009). Other TC from VVGI5: list of other TCs from the DFCI matching the gene. Other probesets: other Affymetrix probesets matching the gene.

https://doi.org/10.1371/journal.pone.0008365.s001

(10.28 MB XLS)

Table S2.

List of pathways constructed from bibliographic data and the corresponding journal articles used.

https://doi.org/10.1371/journal.pone.0008365.s002

(0.03 MB DOC)

Text S1.

References for supporting material.

https://doi.org/10.1371/journal.pone.0008365.s003

(0.06 MB DOC)

Text S2.

Tutorial for Using VitisNet, a database for the grapevine molecular networks.

https://doi.org/10.1371/journal.pone.0008365.s004

(3.51 MB DOC)

Acknowledgments

The authors wish to thank Wei Ma and Wendy Cradduck for the http://vitis-dormancy.sdstate.org website design, Kim Victor for curation of the SBML networks source code, and Yves Sucaet and Eve S. Wurtele for the MetNet interface API.

Author Contributions

Designed the study, annotated the genes, constructed the networks, set up the “omics” visualization protocol and drafted the manuscript: JG. Participated in the organization of the study and finalization of the manuscript: GRC JAD. Participated in the validation of the “omics” visualization protocol and finalization of the manuscript: KM. Participated in VitisNet incorporation into MetNET: JLV. Organized the study, finalized the “omics” visualization protocol and finalized the written manuscript: AYF.

References

  1. Descartes R (1637) Discours de la méthode. Jan Maire (ed), Leiden, Netherlands.
  2. Von Bertalanffy L (1968) General System Theory: Foundations, Development, Applications. George Braziller (ed), New York, USA.
  3. Miller JG (1978) Living Systems. McGraw-Hill, New York, USA.
  4. Mesarovic MD (1968) Systems theory and biology - view of a theoretician. In Systems Theory and Biology, Mesarovic MD (ed) pp 59–87. Springer-Verlag, New York, USA.
  5. Kitano H (2002) Systems biology: a brief overview. Science 295: 1662–1664.
  6. Albert R (2007) Network inference, analysis, and modeling in systems biology. Plant Cell 19: 3327–3338.
  7. Feist AM, Palsson BO (2008) The growing scope of applications of genome-scale metabolic reconstructions using Escherichia coli. Nat Biotechnol 26: 659–667.
  8. Becker SA, Palsson BO (2005) Genome-scale reconstruction of the metabolic network in Staphylococcus aureus n315: an initial draft to the two-dimensional annotation. BMC Microbiol 5: 8.
  9. Adams MD, Kelley JM, Gocayne JD, Dubnick M, Polymeropoulos MH, et al. (1991) Complementary DNA sequencing: expressed sequence tags and human genome project. Science 252: 1651–1656.
  10. Boguski MS, Schuler GD (1995) ESTablishing a human transcript map. Nat Genet 10: 369–371.
  11. Quackenbush J, Cho J, Lee D, Liang F, Holt I, et al. (2001) The TIGR Gene Indices: analysis of gene transcript sequences in highly sampled eukaryotic species. Nucleic Acids Res 29: 159–164.
  12. Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, et al. (2005) Genome sequencing in microfabricated high-density picolitre reactors. Nature 437: 376–380.
  13. Bennett S (2004) Solexa Ltd. Pharmacogenomics 5: 433–438.
  14. Schena M, Shalon D, Davis RW, Brown PO (1995) Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270: 467–470.
  15. O'Farrell PH (1975) High resolution two-dimensional electrophoresis of proteins. J Biol Chem 250: 4007–4021.
  16. Gygi SP, Rist B, Gerber SA, Turecek F, Gelb MH, et al. (1999) Quantitative analysis of complex protein mixtures using isotope-coded affinity tags. Nat Biotechnol 17: 994–999.
  17. Ross PL, Huang YN, Marchese JN, Williamson B, Parker K, et al. (2004) Multiplexed protein quantitation in Saccharomyces cerevisiae using amine-reactive isobaric tagging reagents. Mol Cell Proteomics 3: 1154–1169.
  18. Fiehn O (2002) Metabolomics–the link between genotypes and phenotypes. Plant Mol Biol 48: 155–171.
  19. Da Silva FG, Iandolino A, Al-Kayal F, Bohlmann MC, Cushman MA, et al. (2005) Characterizing the grape transcriptome. Analysis of expressed sequence tags from multiple Vitis species and development of a compendium of gene expression during berry development. Plant Physiol 139: 574–597.
  20. Moser C, Segala C, Fontana P, Salakhudtinov I, Gatto P, et al. (2005) Comparative analysis of expressed sequence tags from different organs of Vitis vinifera L. Funct Integr Genomics 5: 208–217.
  21. Peng FY, Reid KE, Liao N, Schlosser J, Lijavetzky D, et al. (2007) Generation of ESTs in Vitis vinifera wine grape (Cabernet Sauvignon) and table grape (Muscat Hamburg) and discovery of new candidate genes with potential roles in berry development. Gene 402: 40–50.
  22. Velasco R, Zharkikh A, Troggio M, Cartwright DA, Cestaro A, et al. (2007) A high quality draft consensus sequence of the genome of a heterozygous grapevine variety. PLoS ONE 2: 1326.
  23. Jaillon O, Aury J-M, Noel B, Policriti A, Clepet C, et al. (2007) The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature 449: 463–467.
  24. Vezzulli S, Micheletti D, Riaz S, Pindo M, Viola R, et al. (2008) A SNP transferability survey within the genus Vitis. BMC Plant Biol 8: 128.
  25. Terrier N, Glissant D, Grimplet J, Barrieu F, Abbal P, et al. (2005) Isogene specific oligo arrays reveal multifaceted changes in gene expression during grape berry (Vitis vinifera L.) development. Planta 222: 832–847.
  26. Waters DL, Holton TA, Ablett EM, Lee LS, Henry RJ (2005) cDNA microarray analysis of developing grape (Vitis vinifera cv. Shiraz) berry skin. Funct Integr Genomics 5: 40–58.
  27. Tattersall E, Grimplet J, Deluc L, Wheatley M, Vincent D, et al. (2007) Transcript abundance profiles reveal larger and more complex responses of grapevine to chilling compared to osmotic and salinity stress. Funct Integr Genomics 7: 317–333.
  28. Cramer G, Ergul A, Grimplet J, Tillett R, Tattersall E, et al. (2007) Water and salinity stress in grapevines: early and late changes in transcript and metabolite profiles. Funct Integr Genomics 7: 111–134.
  29. Grimplet J, Deluc LG, Tillett RL, Wheatley MD, Schlauch KA, et al. (2007) Tissue-specific mRNA expression profiling in grape berry tissues. BMC Genomics 8: 187.
  30. Deluc LG, Grimplet J, Wheatley MD, Tillett RL, Quilici DR, et al. (2007) Transcriptomic and metabolite analyses of cabernet sauvignon grape berry development. BMC Genomics 8: 429.
  31. Espinoza C, Vega A, Medina C, Schlauch K, Cramer G, et al. (2007) Gene expression associated with compatible viral diseases in grapevine cultivars. Funct Integr Genomics 7: 95–110.
  32. Pilati S, Perazzolli M, Malossini A, Cestaro A, Dematte L, et al. (2007) Genome-wide transcriptional analysis of grapevine berry ripening reveals a set of genes similarly modulated during three seasons and the occurrence of an oxidative burst at veraison. BMC Genomics 8: 428.
  33. Fung RW, Gonzalo M, Fekete C, Kovacs LG, He Y, et al. (2008) Powdery mildew induces defense-oriented reprogramming of the transcriptome in a susceptible but not in a resistant grapevine. Plant Physiol 146: 236–249.
  34. Wise RP, Caldo RA, Hong L, Shen L, Cannon EK, et al. (2007) BarleyBase/PLEXdb: A Unified Expression Profiling Database for Plants and Plant Pathogens. In Methods in Molecular Biology, Vol. 406, Plant Bioinformatics - Methods and Protocols. Edwards D (ed) pp. 347–363. Humana Press, Totowa, NJ.
  35. Mathiason K, He D, Grimplet J, Venkateswari J, Galbraith D, et al. (2009) Transcript profiling in Vitis riparia during chilling requirement fulfillment reveals coordination of gene expression patterns with optimized bud break. Funct Integr Genomics 9: 81–96.
  36. Lund ST, Peng FY, Nayar T, Reid KE, Schlosser J (2008) Gene expression analyses in individual grape (Vitis vinifera L.) berries during ripening initiation reveal that pigmentation intensity is a valid indicator of developmental staging within the cluster. Plant Mol Biol 68: 301–315.
  37. Sarry J-E, Sommerer N, Sauvage F-X, Bergoin A, Rossignol M, et al. (2004) Grape berry biochemistry revisited upon proteomic analysis of the mesocarp. Proteomics 4: 201–215.
  38. Giribaldi M, Perugini I, Sauvage F-X, Schubert A (2007) Analysis of protein changes during grape berry ripening by 2-DE and MALDI-TOF. Proteomics 7: 3154–3170.
  39. Castro AJ, Carapito C, Zorn N, Magné C, Leize E, et al. (2005) Proteomic analysis of grapevine (Vitis vinifera L.) tissues subjected to herbicide stress. J Exp Bot 56: 2783–2795.
  40. Vincent D, Ergul A, Bohlman MC, Tattersall EA, Tillett RL, et al. (2007) Proteomic analysis reveals differences between Vitis vinifera L. cv. Chardonnay and cv. Cabernet Sauvignon and their responses to water deficit and salinity. J Exp Bot 58: 1873–1892.
  41. Jellouli N, Jouira BH, Skouri H, Ghorbel A, Gourgouri A, et al. (2008) Proteomic analysis of Tunisian grapevine cultivar Razegui under salt stress. J Plant Physiol 165: 471–481.
  42. Grimplet J, Wheatley MD, Ben Jouira H, Deluc LG, Cramer GR, et al. (2009) Proteomic and selected metabolite analysis of grape berry tissues under well watered and water-deficit stress conditions. Proteomics. https://doi.org/10.1002/pmic.200800158.
  43. Lucker J, Laszczak M, Smith D, Lund ST (2009) Generation of a predicted protein database from EST data and application to iTRAQ analyses in grape (Vitis vinifera cv. Cabernet Sauvignon) berries at ripening initiation. BMC Genomics 10: 50.
  44. Figueiredo A, Fortes AM, Ferreira S, Sebastiana M, Choi YH, et al. (2008) Transcriptional and metabolic profiling of grape (Vitis vinifera L.) leaves unravel possible innate resistance against pathogenic fungi. J Exp Bot 59: 3371–3381.
  45. Zhang P, Foerster H, Tissier CP, Mueller L, Paley S, et al. (2005) MetaCyc and AraCyc. Metabolic pathway databases for plant research. Plant Physiol 138: 27–37.
  46. Wurtele ES, Li L, Berleant D, Cook D, Dickerson JA, et al. (2007) MetNet: Systems Biology Software for Arabidopsis. In: Concepts in Plant Metabolomics (B.J. Nikolau, E.S. Wurtele; editors). Springer. pp 145–158.
  47. Guo AY, Chen X, Gao G, Zhang H, Zhu QH, et al. (2008) PlantTFDB: a comprehensive plant transcription factor database. Nucleic Acids Res 36: 966–969.
  48. Riano-Pachon DM, Ruzicic S, Dreyer I, Mueller-Roeber B (2007) PlnTFDB: An integrative plant transcription factor database. BMC Bioinformatics 8: 42.
  49. Doerks T, Huber S, Buchner E, Bork P (2002) BSD: a novel domain in transcription factors and synapse-associated proteins. Trends Biochem Sci 27: 168–170.
  50. Oresic M, Clish CB, Davidov EJ, Verheij E, Vogels J, et al. (2004) Phenotype characterization using integrated gene transcript, protein and metabolite profiling. Appl Bioinformatics 3: 205–217.
  51. Rischer H, Oresic M, Seppänen-Laakso T, Katajamaa M, Lammertyn F, et al. (2006) Gene-to-metabolite networks for terpenoid indole alkaloid biosynthesis in Catharanthus roseus cells. Proc Natl Acad Sci U S A 103: 5614–5619.
  52. Carrari F, Baxter C, Usadel B, Urbanczyk-Wochniak E, Zanor M-I, et al. (2006) Integrated analysis of metabolite and transcript levels reveals the metabolic shifts that underlie tomato fruit development and highlight regulatory aspects of metabolic network behavior. Plant Physiol 142: 1380–1396.
  53. Hirai MY, Yano M, Goodenowe DB, Kanaya S, Kimura T, et al. (2004) Integration of transcriptomics and metabolomics for understanding of global responses to nutritional stresses in Arabidopsis thaliana. Proc Natl Acad Sci U S A 101: 10205–10210.
  54. Hirai MY, Klein M, Fujikawa Y, Yano M, Goodenowe DB, et al. (2005) Elucidation of gene-to-gene and metabolite-to-gene networks in Arabidopsis by integration of metabolomics and transcriptomics. J Biol Chem 280: 25590–25595.
  55. Bylesjo M, Eriksson D, Kusano M, Moritz T, Trygg J (2007) Data integration in plant biology: the O2PLS method for combined modeling of transcript and metabolite data. Plant J 52: 1181–1191.
  56. Bylesjo M, Nilsson R, Srivastava V, Gronlund A, Johansson AI, et al. (2009) Integrated analysis of transcript, protein and metabolite data to study lignin biosynthesis in hybrid aspen. J Proteome Res 8: 199–210.
  57. Saier MH, Tran CV, Barabote RD (2006) TCDB: the transporter classification database for membrane transport protein analyses and information. Nucleic Acids Res 34: 181–186.
  58. Funahashi A, Jouraku A, Kitano H (2004) Converting KEGG pathway database to SBML. In 8th Annual International Conference on Research in Computational Molecular Biology, San Diego.
  59. Funahashi A, Morohashi M, Kitano H, Tanimura N (2003) CellDesigner: a process diagram editor for gene-regulatory and biochemical networks. BIOSILICO 1: 159–162.
  60. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, et al. (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13: 2498–2504.