


AJB Advance Article published on February 6, 2012, as 10.3732/ajb.1100385. The latest version is at http://www.amjbot.org/cgi/doi/10.3732/ajb.1100385

American Journal of Botany 99(2): 000–000. 2012.

GENOMICS OF GENE BANKS: A CASE STUDY IN RICE¹

SUSAN R. MCCOUCH², KENNETH L. MCNALLY³, WEN WANG⁴, AND RUARAIDH SACKVILLE HAMILTON³

²Department of Plant Breeding and Genetics, Cornell University, Ithaca, New York 14853-1901 USA; ³International Rice Research Institute, DAPO Box 7777, Metro Manila 1301, Philippines; and ⁴Kunming Institute of Zoology, the Chinese Academy of Sciences, No. 32 Jiaochang Donglu Kunming, Yunnan 650223 P.R. China

Only a small fraction of the naturally occurring genetic diversity available in the world's germplasm repositories has been explored to date, but this is expected to change with the advent of affordable, high-throughput genotyping and sequencing technology. It is now possible to examine genome-wide patterns of natural variation and link sequence polymorphisms with downstream phenotypic consequences. In this paper, we discuss how dramatic changes in the cost and efficiency of sequencing and genotyping are revolutionizing the way gene bank scientists approach the responsibilities of their job. Sequencing technology provides a set of tools that can be used to enhance the quality, efficiency, and cost-effectiveness of gene bank operations, the depth of scientific knowledge of gene bank holdings, and the level of public interest in natural variation. As a result, gene banks have the chance to take on new life. Previously seen as warehouses where seeds were diligently maintained, but evolutionarily frozen in time, gene banks could transform into vibrant research centers that actively investigate the genetic potential of their holdings.
In this paper, we will discuss how genotyping and sequencing can be integrated into the activities of a modern gene bank to revolutionize the way scientists document the genetic identity of their accessions; track seed lots, varieties, and alleles; identify duplicates; and rationalize active collections, and how the availability of genomics data is likely to motivate innovative collaborations with the larger research and breeding communities to engage in systematic and rigorous phenotyping and multilocation evaluation of the genetic resources in gene banks around the world. The objective is to understand and eventually predict how variation at the DNA level helps determine the phenotypic potential of an individual or population. Leadership and vision are needed to coordinate the characterization of collections and to integrate genotypic and phenotypic information in ways that will illuminate the value of these resources. Genotyping of collections represents a powerful starting point that will enable gene banks to become more effective as stewards of crop biodiversity.

Key words: breeding value; genetic diversity; genotype; natural variation; next generation sequencing; Oryza rufipogon; Oryza sativa; phenotype; resequencing; rice.

¹ Manuscript received 5 August 2011; revision accepted 29 December 2011.
We thank Theo van Hintum and Tom Payne for helpful comments and criticisms that helped to greatly improve this manuscript. We also acknowledge funding support from the U. S. National Science Foundation (Plant Genome Research Program grant #1026555 to S.Mc.) and the U. S. Department of Agriculture (Agriculture and Food Initiative grant #2009-65300-05698 to S.Mc.), the Generation Challenge Program project G4005.01.07 Genotyping of Composite Germplasm Set, Tier 1, Rice to K.M. and R.S.H. and a 973 Program (No. 2007CB815700) of China to W.W. Photographs in Fig. 1 were taken by Renato Reaño and Anthony Telosa of the Resources Center at IRRI, with editing and assembly by K.M. We are also grateful to Cheryl Utter for help formatting.
² Author for correspondence (e-mail: [email protected])

doi:10.3732/ajb.1100385
American Journal of Botany 99(2): 1–17, 2012; http://www.amjbot.org/ © 2012 Botanical Society of America
Copyright 2012 by the Botanical Society of America

Crop gene banks exist to conserve the genetic diversity of cultivated and wild plants that humans depend on for food, fiber, and fuel. That diversity is essential to improving the agricultural productivity, sustainability, and nutritional quality of crops in the face of changing climates, pests, diseases, and consumer preferences. Collectively, there are over 7 million plant germplasm accessions housed in some 1750 national and international gene banks (FAO, 2010). They are part of a worldwide effort to conserve, characterize, and use plant biological diversity to address problems of global importance.

Natural variation, which has evolved over millions of years, provides the essential building blocks for both conscious and unconscious forms of plant breeding. All breeding advances are built on available diversity, and one key role of a gene bank is to help safeguard natural forms of genetic variation so that even if it is lost from natural environments and farmers' fields, it remains readily accessible to plant biologists, breeders, and other key users. Gene banks conserve living samples, mostly in the form of seeds, where each sample, or accession, represents a distinct combination of genes and alleles that confer different attributes and adaptive potential. These ex situ collections are complemented by in situ conservation efforts. In situ conservation involves the creation of national parks and nature preserves to protect stands of wild relatives of crops and efforts to support traditional farmers who maintain the diversity of locally adapted landraces. These materials often harbor unique forms of diversity in the form of rare alleles or unusual allele combinations, which can provide important clues about adaptation. Paradoxically, the success of modern breeding efforts has tended to erode the diversity found in farmers' fields, because a relatively small number of high yielding, genetically uniform varieties of major crop species have often replaced the patchwork of heterogeneous, locally adapted landraces that once characterized the agricultural landscape.

Over the last ~60 years, people have become increasingly aware of the implications of genetic erosion in terms of its impact on environmental and agricultural sustainability (Ford-Lloyd et al., 2008). Reductions in both the number of species and the level of intraspecific variation on which the human food supply depends mean that crops are more vulnerable to unpredictable weather patterns, epidemics of pests and diseases, and

fluctuations in global markets, all of which directly affect the availability of basic foodstuffs for humans. The ability to respond constructively to these situations requires continuing access to a broad range of novel forms of genetic variation.

Several of the world's largest, publicly available germplasm collections were placed under the auspices of the Food and Agriculture Organization (FAO) of the United Nations in 1994, as part of an international network of ex situ collections. In 2007 they were placed within the purview of the International Treaty on Plant Genetic Resources for Food and Agriculture. These international gene banks focus on ca. 20 of the world's most widely consumed staple food crops (table 1 from Hawtin et al., 2011) and, ideally, are complementary to the system of national germplasm collections that exist in most countries.

Over the last several decades, gene banks have collected large numbers of samples, aiming to represent the broad range of diversity that exists within a species or primary gene pool (Fig. 1). Along with the collecting, they strive to obtain reliable passport information about the origin or source of those samples, to conserve viable seed stocks representing each sample, and to distribute healthy seeds to users upon request. These activities have posed enormous challenges, and some gene banks have been able to meet those challenges better than others. Successful gene bank operation requires careful financial and administrative management, coupled with state-of-the-art technical, scientific, and biological understanding of each species to guarantee high-quality conservation of genetic materials.

Fig. 1. Diversity of rice samples from the IRRI gene bank. A montage of photographs illustrating phenotypic variation in plant, leaf, panicle, and grain morphology, and bar-coded, aluminum packets (for seed distribution) and containers (for long-term storage). Rice seeds can retain viability if they are first dried to ca. 6% moisture content (drying at 15% RH, 15°C) and stored in a dark, cold place; under medium-term storage (4°C) seeds last ca. 30–40 years, and under long-term storage (vacuum packed in aluminum containers at −18°C), they are expected to last 50–100 years.

Many gene banks find it difficult to ensure the genetic integrity, identity, and/or viability of their seed stocks, and, though they collect and store the seeds, they are often unable to reliably test and maintain viability, characterize, propagate, or distribute them (FAO, 2010). Even in the best-managed gene banks, management decisions are often based more on intuition than

reason because of inadequate characterization of the germplasm in their collections. Internationally agreed crop-specific standards have been developed (Bioversity International, 2007, 2011), but these focus on phenotypic characterization of a limited number of traits that are highly heritable and simple to assess, since this has been the only approach that makes possible the characterization of whole collections. Characterization based on DNA, which is 100% heritable, may be considered the perfect data for gene bank management, but the technology has simply not been in place either to undertake or to interpret comprehensive genetic characterization data. The more advanced gene banks have been able to undertake limited genetic characterization, typically involving the use of selected markers for selected subsets of accessions. Occasionally, genetic characterization has been undertaken for a whole collection (e.g., the Dutch lettuce collection characterized with three AFLP primer combinations: van Hintum, 2003), or a sample of accessions has been characterized in greater depth (e.g., 3000 rice accessions were characterized at 17 isozyme loci (Glaszmann, 1987), 20,562 rice accessions were characterized at 20 isozyme loci (Khush et al., 2003), 413 accessions characterized at 44 000 single nucleotide polymorphism (SNP) loci (Zhao et al., 2011), 20 rice varieties characterized at 160 000 SNP loci (McNally et al., 2009), and 50 varieties characterized at 6.5 M SNP loci (Xu et al., 2011)). Never, however, has comprehensive molecular characterization been undertaken across the whole genome for a whole collection.

Looking beyond gene bank managers' need to manage their collections, to their need to facilitate use by others, phenotypic evaluation for traits of agronomic significance presents an even more severe constraint. As described below, comprehensive evaluation of whole collections is simply not a viable proposition and has been considered beyond the responsibility of gene bank managers (Ebert et al., 2010). As a result, there is very little information available on the phenotypic evaluation of accessions in gene banks. Trait data in databases such as Genesys (Genesys, 2011) comprise mostly characterization rather than evaluation data, and even the evaluation data are not recorded in a form suitable for thorough assessment of genotype × environment interaction or environment-specific performance. This makes it difficult to respond effectively to requests for information relevant to breeding objectives.

Recent advances in sequencing technology offer new opportunities for gene banks to become more efficient, cost-effective, and informative in their transactions as collectors, conservers, and providers of germplasm and information. Genomic information can be readily generated on a large number of accessions in a cost-effective manner, providing valuable new insights into the identity, ancestry, and genetic and phenotypic potential of individual holdings, and this information can help improve the way gene banks operate. Innovative research in a number of advanced gene banks has paved the way for a larger investment in sequence-based genetic analysis on a global scale (e.g., McGregor et al., 2002; van Hintum and van Treuren 2002; Borner et al., 2005; Spooner et al., 2005; van de Wiel et al., 2010; van Treuren et al., 2010). While the power and cost of the new technologies have reached a level where we can begin to contemplate routine whole-genome genotyping or sequencing of entire collections, there is an urgent need to develop strategies and protocols for managing both sequenced germplasm and sequencing information in a rational and productive way. Some of the issues that need to be addressed include how to evaluate within-accession variation (particularly in cross-pollinating species); how to store and distribute specialized seed stocks, DNA and tissue samples; and what kinds of remote-access databases are needed to ensure the integrity, security, and accessibility of the genotyping and phenotyping information so it can be easily queried and used by users around the world.

The systematic integration of sequence-based information on gene bank holdings, along with many additional layers of genetic, genomic, and phenotypic information, will help scientists explore the biological significance of the sequence diversity. The quest to understand this diversity will help transform the working concept of a gene bank from that of a warehouse, where seeds are diligently maintained but evolutionarily frozen in time, to that of a vibrant research and discovery center where the genetic potential of diverse materials is actively investigated.

Gene banks currently manage both genetic resources and information about those resources. Their activities can be classified into three general areas (FAO, 1996): (1) collection and conservation (including seed multiplication); (2) documentation, characterization, and evaluation; (3) distribution and dissemination. Active genomics-oriented research programs directed by gene bank scientists can provide tools and insights to improve the efficiency and cost-effectiveness of all three domains of activity, leading to greater awareness by the public and the scientific community of the role that gene banks play in developing sustainable, ecologically sound solutions to many of the world's most significant problems. Further, the flourishing of such programs can breathe new life into the scientific culture of gene banks, transforming what had been viewed as a mundane effort to store seeds in a cold room into an intellectually challenging, scientifically rigorous, and internationally competitive area of research that attracts visionary young scientists.

In this paper, we discuss ways in which genomics research can increase the efficiency and cost-effectiveness of traditional gene bank operations and how it can inspire gene banks to take on new activities designed to mobilize greater interest in and knowledge of the value of their holdings. We emphasize exciting opportunities that are taking gene banks into new research domains, aiming to discover and eventually make predictions about the genetic potential and breeding value of the currently underutilized material in gene banks, particularly crop wild relatives and exotic landrace accessions.

Our examples focus largely on rice, illustrating how genomics-based research is being integrated into the International Rice Research Institute (IRRI)'s gene bank activities in collaboration with scientists around the world. IRRI is currently working with partners under the RiceSNP Consortium (http://www.ricesnp.org) to generate high-quality genotypes on several thousand rice accessions using high-density SNP arrays, and has initiated a project in collaboration with BGI-Shenzhen and Chinese Academy of Agricultural Sciences (CAAS) in China for multiplex resequencing strategies for 10 000 rice accessions (http://en.genomics.cn/navigation/show_news.action?newsContent.id=8952). Both projects are organized under Theme 1 of the Global Rice Science Partnership (http://www.grisp.net). One objective is to determine a cost-effective approach for evaluating the extent of variation embodied in the IRRI gene bank. The data will be used to classify the degree of genetic similarity between accessions, as well as the diversity within accessions, to provide insights into population structure and admixture, to classify potential duplicates and to identify genetic novelty. These data will provide the first comprehensive view at the haplotype level of the

genetic diversity housed in the IRRI gene bank, and the data sets will be sufficient to support rational decisions about how to better manage and use the collection.

RICE AS A MODEL FOR OTHER CROPS

Rice (Oryza sativa L.), the first crop to have its genome fully sequenced (IRGSP, 2005), has a full repertoire of genetic, genomic, and germplasm resources and is a critical component of food security for millions of people. For these reasons, and because of the active interest on the part of the international rice research community, rice offers a good model for examining how large volumes of DNA sequence information on thousands of gene bank accessions are likely to impact the ways in which gene banks function, how information and genetic materials are provided to users, and how researchers engage with gene bank scientists to explore the value of their holdings in the future.

To some extent, the principles being developed for rice also apply to other crops. All crop gene banks face similar challenges, which can be addressed more effectively when supported by information on genomic diversity. Given the rapid evolution of faster, cheaper sequencing technology (http://www.genome.gov/sequencingcosts/), genotyping large numbers of samples is now feasible and economical for virtually any species, with or without a reference genome, though polyploids and cross-pollinating species still present special challenges (van Hintum et al., 2007; Metzker, 2010). Given the current trends in cost and efficiency, sequence-based analysis is likely to become routine as a starting point for addressing fundamental biological questions in a wide range of species. Low-coverage genotyping, genotyping-by-sequencing (GBS) (Baird et al., 2008; Elshire et al., 2011), and other more targeted genotyping strategies identify large numbers of polymorphisms and can be very useful for characterizing the diversity found in natural populations of both wild and cultivated species. However, the large amounts of data that are so easily generated by resequencing approaches impose significant burdens for data analysis and may impose too great a burden for under-resourced gene banks. The advent of community based efforts for annotation such as crowdsourcing offers a means to alleviate this burden by drawing on combined expertise for annotation. The costs associated with hiring trained personnel, intensive use of computing resources, data storage capacity, and analysis tools must all be factored into any calculation of the benefits that accrue from access to low-cost sequencing capability.

THE NEW TECHNOLOGIES

Second-generation sequencing platforms produce millions of short-sequence reads, typically 25–400 bp long, and can do so in multiple genomes simultaneously. The massive data capture associated with next-generation sequencing (NGS) has come at some cost in terms of data quality compared to Sanger sequencing, but the loss of quality is generally compensated by the deep coverage. Third-generation platforms use single-molecule sequencing without the requirement for DNA amplification, and they produce relatively long reads (>1 kb) in real time (Eid et al., 2009). NGS is currently the most widely available high-throughput sequencing approach and has been adapted for use with reduced representation libraries in the form of sequencing restriction-site-associated genomic DNA (i.e., RAD tags) (Baird et al., 2008), genotyping-by-sequencing (GBS) (Elshire et al., 2011), and other low-coverage approaches that depend heavily on imputation to fill in the missing data (Huang et al., 2009). In both cases, the randomly generated short sequence reads are generally aligned to a high quality reference genome, and SNPs, indels, and other types of polymorphism (e.g., copy number variation) are called for reads that can be unequivocally aligned to a specific region in the reference. Repetitive sequences are thrown out, and the process of alignment can be challenging, depending on the degree of divergence separating the resequenced genome(s) from the reference genome. Many different algorithms exist for aligning short read sequences to a reference genome, and some are better able to align divergent sequences than others (Horner et al., 2010; M. H. Wright et al., unpublished manuscript). Rapid advances in computational biology take advantage of the abundance of raw sequencing data in public databases to improve the accuracy and efficiency of the alignment process, as well as to make new discoveries about how genomes function and evolve. In cases where no reference genome is available, short read sequences from different genomes can be aligned to each other, and SNPs can be called without knowing their exact location in a species' genome (Elshire et al., 2011).

Alignment to a reference genome allows researchers to take advantage of available genome annotation to predict whether a SNP falls within or near a gene of interest, and whether a genic SNP is expected to cause a functional change in the protein product (synonymous vs. nonsynonymous change) or in the promoter region such that it might affect the expression of the gene (Ondov et al., 2008). This information can be very useful in determining whether a particular SNP is likely to be responsible for a phenotype of interest. Even when SNPs do not fall within genes, the frequency and distribution of polymorphisms can be used to construct haplotypes and determine ancestry or identify regions of the genome associated with traits of interest (Gupta et al., 2001; Rafalski, 2002; McCouch et al., 2010).

Polyploids are particularly challenging because genic regions are usually multiple copy, and it is very difficult to reliably distinguish homeologs, tandemly arrayed genes, paralogs, and other forms of gene duplication within a single genome (Feuillet et al., 2011; Potato Genome Sequencing Consortium, 2011). Thus, there is little emphasis placed on aligning short reads to unique positions in a reference genome, and greater emphasis on simply finding regions of high similarity among different genomes, so those regions can be aligned to each other as the basis for SNP calling. This means that virtually all SNPs detected by resequencing in polyploid genomes are in genic regions. Nonetheless, there are significant gray areas in such alignments because of the internal duplication that makes it virtually impossible to avoid paralogs, homeologs, copy-number variants, and other repetitive sequences. The long reads associated with third-generation sequencing technology (Eid et al., 2009; Munroe and Harris, 2010) may be partially able to relieve this problem and make genome-wide resequencing more feasible for polyploid crops.

The availability of high-resolution genotypic information on a rapidly expanding number of accessions held in gene banks over the next several years will empower both gene bank scientists and the research and breeding communities in numerous ways. We are standing at the threshold of a genomics revolution within our gene banks, and it is exciting to imagine the many ways in which future gene bank scientists will be able to explore the wealth of natural genetic variation in their collections.

IMPLICATIONS FOR GENE BANK MANAGEMENT

Gene bank managers face formidable challenges in determining what to conserve. Given that they do not have unlimited resources, they must limit the size of their collections while conserving, as much as possible, the entire gene pool of the crop. How can they optimize the composition of their collection, assuring proper representation of the gene pool while avoiding redundancy?

Gene bank managers also face challenges keeping track of the many accessions housed in their collections, ensuring that seed identity is maintained and that seed packets are labeled correctly. How can they ensure genetic integrity, especially during seed amplification, avoiding genetic drift, unconscious selection, contamination (unwanted pollen flow, undesired mixing of seed lots), and labeling/handling errors, and how can they track those errors to correct or contain them?

This section considers how genomics data will provide gene bank managers with new solutions to these questions and concerns and will help improve the quality of gene bank conservation practices. Genomic information will be useful in virtually all areas of gene bank management. Ultimately, it will enable more effective classification of collections, help establish the bounds and composition of crop gene pools, identify duplicates, provide a rationale for limiting the size of collections, and facilitate detailed genetic gap analyses to guide future acquisition (Jarvis et al., 2003; Sackville Hamilton et al., 2003; Upadhyaya et al., 2009; Ramírez-Villegas et al., 2010).

Use of genomics to classify samples, reduce redundancy, and rationally limit the size of collections

The size of collections continues to increase, often through the uncontrolled duplication of materials in different gene banks. FAO (1996) estimated that, of 6 million accessions conserved worldwide, only 1–2 million were distinct. Since then, FAO (2010) estimated that 1.4 million new accessions have been added to gene banks, of which only 240 000 represent newly collected materials. IRRI, cross-referencing 223 000 accessions from seven rice collections, found that on average 30% of the accessions in any one collection were duplicated in at least one of the other six collections (unpublished, data available in IRIS, 2011). As their size increases, there is increasing pressure to reduce the size of collections by eliminating duplicates to reduce redundancy. How can duplicates be identified?

Sackville Hamilton et al. (2003) established a logical framework for reducing redundancy. They examined the different concepts and definitions of duplication as applicable to clonal crops, inbreeding species, and outbreeding species, and set out the conditions under which it is appropriate to consider accessions to be duplicates for each breeding system (and also for the converse situation, when to split heterogeneous accessions). In practical terms, accessions of seed-propagated crops should always be considered populations of genotypes where the level of heterozygosity and the degree of genetic similarity among individuals within an accession are dependent on factors such as the mating system of the species, the type of variety being conserved, the way seed was originally collected in the field or at a local market, and the way seed has been maintained in the gene bank. Duplication has to be defined and quantified in terms of variation between accessions relative to variation within accessions.

Until the advent of the latest low-cost technologies for genome-wide molecular characterization, reliable rationalization for declaring duplicates has been impossible, and, except for species with high conservation costs, accurate determination of genetic duplication has been uneconomical. For clonally propagated crops, such as potato, where each accession represents a single genotype and where the most essential conservation costs are already very high, molecular evaluation is likely to be cost-effective (van Treuren et al., 2004). However, it is simply not cost-effective for most seed-propagated crops, including rice, where conservation costs are low and polymorphism within accessions is greater. For example, it is not sufficient to demonstrate that two accessions have identical passport data, such as sharing a common origin in the same sample collected from a farmer's field; such historical duplicates are often genetically distinct and may even, through curatorial errors, be entirely unrepresentative of the genetic diversity present in the farmer's field. Nor is it sufficient to use the phenotypic characterization data conventionally collected by gene banks. These data focus on highly heritable traits scored with low precision in one environment, often ignoring variation within accessions, and they provide weak metrics of variation within or between accessions.

The cost of a phenotypic analysis that is sufficiently rigorous to detect quantitative differences in seed propagated crops is extremely high. Even then, it risks losing diversity for traits not included in the phenotyping, and it risks losing alleles whose expression is hidden by epistasis or other gene × gene interactions or by epigenetic effects. For cross-pollinated crops, such as maize or Brassica species, where populations represent dynamic and complex mixtures of alleles and regeneration can be difficult and expensive, a combination of passport and phenotypic data has been used to reduce redundancy (van Treuren et al., 2010), but this practice would be unacceptable in an inbreeding, easily conserved crop like rice.

High-throughput genotyping and/or sequencing can be used to help quantify levels of genetic similarity and heterozygosity among individuals within an accession as well as between accessions. These data provide a set of objective criteria that can be used as the basis for determining which are considered to be duplicate samples. Different levels of resolution are needed for each species, type of variety, and specific question to be resolved. For example, investment by the Generation Challenge Program to characterize 2800 rice accessions based on 50 SSR markers demonstrated that low-density molecular marker coverage was not sufficient to clearly distinguish between closely related and duplicate rice accessions (K. L. McNally et al., unpublished data). More recently, researchers at IRRI have used 384-SNP assays for classification purposes (Thomson et al., 2011), but neither of these assays interrogated more than 0.000001% of the rice genome. Thus, two accessions that appear to be identical for these markers are almost certainly closely related, but the resolution of the assays was too low to reliably determine whether samples were true duplicates. It remains to be determined what degree of genetic similarity will be required to consider two inbred samples to be duplicates, and what level of marker resolution a gene bank manager will be comfortable using as the basis for a decision about combining or eliminating presumed duplicates from a collection.

Having identified two accessions as duplicates, how should they then be handled? For accessions conserved as seed, since they are not exact duplicates, discarding one is usually not appropriate because of the resulting loss of diversity. Combining them may be a good option. However, in this case it is essential to ensure that an appropriate criterion of duplication is used, to ensure that the resulting increase in within-accession

variability is acceptably small, perhaps even desirable to overcome inbreeding depression in out-crossing species, and does not invalidate previous data on the accessions or add to the difficulty of maintaining its genetic integrity.

An alternative option is to archive one of them. Most of the costs of conserving seed accessions arise from keeping them as active collections (Koo et al., 2002), meaning that they are readily available for use, which requires gene banks to invest in distribution, germination testing, and regeneration. Merely storing a sample untouched in a long-term deep freeze can be done at a relatively small cost (Table 1). Historically, this approach has been considered undesirable because, by definition, no further data will be collected on an accession that has been archived, and therefore it may not be reactivated before the seeds lose viability. If we do not use it, why conserve it? Yet if it is archived with full resequencing data, we will have some information upon which to base a future decision to reactivate it in case new biological questions and/or new technology make it valuable for future studies. In addition, managing duplicates by archiving and reactivating can be safely practiced with a looser definition of duplication than combining them and thus presents opportunities for greater savings. Thus, genomics can help to improve the efficiency of gene bank operations and help managers make rational decisions without sacrificing future opportunities.

TABLE 1. Costs of acquisition, active conservation vs. archiving in long-term storage in a range of seed collections held in CGIAR centers (Hawtin et al., 2011).

                      Annual recurring cost per accession (US$)    Additional
                      Cost for fully       Cost of long-term        one-time cost of
Centre/Crop           active accession     storage                  acquisition (a)
AfricaRice
  Rice                10.06                1.00                     23.51
CIAT
  Beans               19.48                1.53                     87.89
  Tropical forages    26.82                1.91                     109.64
CIMMYT
  Maize               16.96                0.20                     28.64
  Wheat               3.28                 0.08                     10.51
ICARDA
  Barley              5.65                 0.76                     23.63
  Chickpea            6.09                 0.89                     29.52
  Faba beans          6.09                 0.89                     29.52
  Forage and range    6.72                 0.76                     59.72
  Grasspea            6.03                 0.76                     17.77
  Lentil              6.09                 0.89                     29.52
  Pea                 6.03                 0.76                     36.13
  Wheat               7.14                 0.76                     23.73
ICRISAT
  Chickpea            10.74                0.61                     64.17
  Groundnut           12.74                0.61                     69.63
  Pearl millet        12.49                0.60                     54.77
  Pigeonpeas          12.86                0.60                     54.77
  Small millet        15.75                0.72                     111.50
  Sorghum             10.20                0.55                     42.42
IITA
  Cowpea              11.15                1.28                     32.77
  Maize               12.12                1.28                     53.47
  Misc. legumes       11.78                1.10                     33.83
ILRI
  Tropical forages    32.95                5.28                     43.75
IRRI
  Cultivated rice     7.36                 0.30                     49.54
  Wild rice           21.27                0.30                     129.67

(a) Additional = in addition to the first cycle of seed increase, seed preparation, germination testing, etc. required before a sample received becomes an accession.

Monitor regeneration of seed stocks, track samples, and improve operations efficiency. Mislabeling or mixing samples is a risk in all biological experiments. Where many samples are handled, it becomes a near certainty unless very rigorous quality control standards are implemented. In good gene banks, a whole suite of measures is applied during regeneration to minimize the problem. Seed packets are labeled inside and out; germplasm lists are independently double checked against labels at every step in the process from taking parental seed stocks out of the gene bank to storing the packaged progeny back in the gene bank; plots are planted in a way that allows easy identification and roguing out of volunteer plants; parental and progeny seed are compared visually against original samples; phenotypic traits observed in the regeneration plots are compared against previous characterization data. Yet despite all these measures, it is impossible to be sure that the progeny seed are indeed true-to-type.

Low-resolution genotyping fingerprints are in principle sufficient to track samples with full assurance of no mislabeling; even a 48-SNP assay can theoretically distinguish over 10^28 variants. As a minimum level of quality control, parent and offspring should be genotyped as a routine part of regeneration, and the offspring rejected if they do not match the parent. Experimentally, if mislabeling is found to be a problem, seeds and plants could be genotyped at a number of key steps in regeneration to determine the most problematic steps and, hence, to design an improved work flow.

The critical feature is to ensure that the chosen fingerprint enables unambiguous distinction between the accessions being grown in the same field, taking into account the within-accession variation. High-resolution genotyping or sequence data could be used to optimize the design of the fingerprint and to take advantage of future developments in fingerprinting technologies by simply redesigning the fingerprint.

Adding new accessions to collections. As the cost of sequencing drops below the cost of acquiring and conserving new accessions, genomics data will be generated upon receipt of all new samples, and the information will be used to determine whether to incorporate them into the collection as new accessions. This will ensure that a gene bank acquires only materials that bring genuine genetic novelty (in terms of both new alleles and, particularly for clonal accessions, new combinations of alleles) to the collection. As can be seen from the earlier breakdown of costs, we are now at a point where low-coverage resequencing data on an individual can be generated for less than the cost of incorporating a single cultivated rice sample into the collection. As sequencing and genotyping efficiencies continue to increase, pooling strategies are emerging that allow us to evaluate multiple individuals per accession as the basis for evaluating genetic identity and within-accession variation. It remains to be seen what the optimum number of individuals per accession should be for different species and types of accession, but new technologies are opening the door to new insights in this area of research. At the moment, sequencing restriction-site-associated genomic DNA (i.e., RAD tags) (Baird et al., 2008), genotyping-by-sequencing (GBS) (Elshire et al., 2011), and related approaches (Huang et al., 2009) are likely to

become the de facto methods for making decisions about formally incorporating newly received samples as accessions in the IRRI gene bank.

How will genomics impact the type of germplasm maintained in gene banks? Genetic characterization raises new issues for the management of genetic diversity within accessions. Gene bank managers normally seek to conserve accessions with genetic composition unchanged from the original sample. In many cases, particularly for wild relatives and traditional varieties, this involves conserving genetically heterogeneous populations in a form that is difficult to use for gene discovery. In autogamous species, these materials are often purified through one or more rounds of single plant selection prior to sequencing, resulting in the creation of one or more derived genetic stocks per accession. It has been recommended that wherever possible, single plants should be used as the source of DNA for sequencing, and seed derived from these single plants should be set aside as reference seed stocks (Tung et al., 2010). This will ensure that the investment in sequencing is accompanied by an immortal seed stock in the gene bank and that any future resequencing can be done on derived material of known pedigree. It will also serve as the source of material for phenotyping and ensure that phenotypic information can be associated with the sequence information in a meaningful way. Purified reference (core) sets have been made from various rice collections, including those at USDA (Yan et al., 2007), IRRI (K. L. McNally, unpublished data), NIAS (Kojima et al., 2005; Ebana et al., 2008), EMBRAPA (Abadie et al., 2005), and CAS (Huang et al., 2010), among others.

However, creating a new accession for each genotyped accession has significant consequences. The cost of growing every genotyped plant to maturity and creating an accession from the offspring would have to be factored into the time and cost of genotyping; it already multiplies the cost of genotyping several-fold. If every accession were genotyped following the same practice, the size of the collection would double, and so would its maintenance costs, although no new diversity would be conserved. This raises a number of questions about the management of those purified stocks (SGRP, 2011).

An appropriate strategy must be developed and rigorously applied. The cost of genotyping is now small relative to conserving an accession for one year and even smaller in relation to adding a new accession to the collection. This in turn is much cheaper than phenotypic evaluation. One effective strategy would be to maintain progeny of the genotyped sample as an accession only if the material will be used immediately for phenotypic evaluation and genotype–phenotype association analyses. If the genotyping data are being collected to support gene bank management decisions without evaluation data, or if they are being collected as an aid to future strategic selection of subsets of the collection for evaluation at some unknown date, then there would be no economic justification for creating the additional accession.

Moreover, these inbred diversity panels have some obvious limitations. They offer only a limited view of the variation that was present in the original landrace or wild accessions, and they cannot access the cryptic variation hidden away in agronomically unadapted landraces and wild species. That variation can only be unmasked via crossing and population development. This generates a conflict between the needs of conservation (conserving diversity within as well as among accessions) and the opportunity for genetic/association analysis (creating information about a subset of pure lines). The conflict could be resolved by devising new management approaches. Specialized genetic stocks, such as purified lines, mapping populations, and DNA samples, are designed specifically to promote more effective gene discovery. Even if these stocks do not contain unique genetic novelty and would therefore not merit long-term conservation per se, they will need to be created and made available, if only in the short term, while needed for research.

Thus, gene bank managers could establish a two-tier system, distinguishing between accessions that merit long-term conservation and designer lines or discovery populations created to meet short- to medium-term research needs. Discovery populations would include collections of recombinant inbred lines (RILs), backcross introgression lines (BILs), chromosome segment substitution lines (CSSLs), multiparent advanced generation intercross (MAGIC) lines, training populations developed to represent different breeding pools, etc. Development, amplification, distribution, and evaluation of these materials could be undertaken as a collaborative effort involving geneticists, physiologists, breeders, and gene bank managers, all of whom share an interest in exploring the genetic potential of underutilized genetic resources and ultimately making predictions about the breeding value of many novel genes/alleles contained in traditional landraces and wild species.

IMPLICATIONS FOR GENE BANK USE

Perhaps the biggest challenge faced by gene bank managers is to identify the most appropriate set of accessions to meet the needs of users. Each user has a specific need to achieve a specific breeding or research objective, and the gene bank manager aims to tailor the selection of accessions to these needs. To the extent that a gene bank might conduct its own research on gene discovery, diversity analyses, and prebreeding, the same challenge applies to meeting the gene bank's own research needs. How can the most appropriate set of accessions be identified to address a given research or breeding objective? What approaches have been used in the past, and how will the availability of whole genome sequence information change the way recommendations are made or accessions are selected?

Advances in genetics and genomics have dramatically changed the types of questions that are being asked in all realms of science. How will new knowledge about the molecular variation of gene bank accessions impact the kinds of questions that are addressed using gene bank accessions in the future? How will users interact with the progressive accumulation of genomic information about the genetic resources housed in ex situ collections? Will the growing body of genomic information impact the type of germplasm to be maintained in gene banks? Will it ultimately expand the utilization of gene bank holdings? This section considers how genomics data will provide gene bank managers with new opportunities to improve the efficiency, reliability, and utility of their responses to users and how it will fundamentally change the kinds of research questions asked and germplasm used by scientists inside and outside the gene bank.

Choosing appropriate accessions to meet user needs. In the past, phenotypic evaluation of tens of thousands of accessions has been successful in identifying useful sources of variation for traits with simple inheritance, such as resistance to grassy stunt virus or cytoplasmic male sterility in rice (Plucknett et al.,

1987). However, most traits of interest to plant breeders and geneticists are quantitatively inherited, often demonstrating low heritabilities and high genotype × environment (G×E) interactions. Furthermore, many traits are expressed only at particular developmental stages or in response to particular types of stress, and may be affected by epigenetic processes. These traits may not be obvious without the use of specialized experimental designs, equipment, and expertise (Chen, 2007; King et al., 2010).

In addition, much of the natural variation that is of potential interest to plant breeders lies hidden in low-performing wild and unadapted materials whose breeding value cannot be ascertained by phenotyping gene bank accessions per se (Tanksley and McCouch, 1997). In cases where genotype × genotype (G×G) interactions are critical to the expression of a phenotype of interest, phenotyping potential donors is not sufficient to determine their breeding value; progeny testing of intercrossed derivatives is also essential. A great deal of research is needed to unravel the complexities of G×G interaction effects in intra- and interspecific populations and to train models that can reliably predict the spectrum of phenotypic outcomes in the progeny.

Without comprehensive information about the genotype and/or the phenotype of gene bank accessions, it is very difficult to determine which are the most useful for addressing a particular breeding objective or research question. As a result, gene bank managers have to make recommendations based on very sparse data, and many of their recommendations do not hit the mark. Nonetheless, requestors frequently ask gene bank managers to help them identify the best accessions for their particular interest. A skillful crop-specific curator draws on extensive knowledge about his/her crop species and deep familiarity with the user community (Widrlechner, 1997), but some requests leave the manager searching for the proverbial needle in the haystack. As a result, most of the samples ultimately distributed do not, in fact, meet the requestors' needs. Thus, even for gene banks that are heavily used, a significant proportion of usage does not meet the users' needs, in that only a small percentage of the accessions evaluated are ultimately incorporated into breeding or research programs (Table 2). Nonetheless, valuable information can be generated by users who evaluate accessions, even if they turn out not to be what was hoped for. Negative results can contribute to the pool of information about gene bank material and may be very helpful in streamlining selection of material for future users, if that information is reported or shared with the germplasm provider.

TABLE 2. Usage of rice from the gene bank at IRRI.

Metric                                                                        Value
Number of accessions available for distribution under the Treaty (a)
  (as of January 2011)                                                        103 841
10-year mean number of samples distributed per year                           23 500
Percentage of accessions evaluated for at least one trait by IRRI
  scientists since 1980. (The remainder are mostly old accessions
  probably screened before 1980 and new accessions acquired too
  recently to have been screened)                                             76
Percentage used in crosses                                                    11
Percentage in the ancestry of commercially released cultivars                 2

(a) International Treaty on Plant Genetic Resources for Food and Agriculture: http://www.planttreaty.org.

The need, therefore, is not to increase the amount of usage so much as to increase the effectiveness of use (Widrlechner and Burke, 2003; Rubenstein et al., 2006). As more and more accessions are evaluated, the basket of evaluation data gets fuller, providing a more robust starting point for making recommendations to users. However, without full evaluation data, gene bank scientists need appropriate proxy data to predict which accessions are most likely to meet users' needs. It is axiomatic that what is needed is relevant, comparable data for every accession since, without it, the curator's intuition and expertise are the only basis upon which to select or reject it.

Genome-wide sequence data potentially provide an ideal proxy. Today, genotyping is faster and cheaper than phenotyping, genotypic information is highly heritable, and a single genome-wide assessment provides underlying molecular diversity data relevant to virtually any trait that may be of interest now or at any time in the future. Genotyping strategies such as GBS make it cheap enough to genotype every accession in a collection based on individual and/or pooled samples, and improved statistical genomic and computational biology methods make it possible to impute missing data with high levels of accuracy and to identify informative genotype–phenotype associations.

However, sequence data alone are not sufficient. A great deal of work will be needed to obtain relevant information about phenotypic performance for the thousands of accessions held in gene banks. It will be important to integrate the interpretation of genomic data with other kinds of information, including pedigree relationships, genetic similarity, subpopulation associations, ecogeographic origins, morphological characteristics, agronomic performance in diverse environments, crop management specifications, as well as functional annotations of the genomic data providing information about genes and alleles, and relationships to biochemical and regulatory pathways.

Genotyping a collection: Where to begin? We envision that the process of genotyping a gene bank collection will be stepwise and iterative, aiming to provide maximum benefit to the largest number of gene bank users as soon as possible, but also helping to improve the quality and efficiency of gene bank operations. The selection of what to genotype first, and at what level of genome coverage, can be made knowing that improvements in technology will inevitably enhance the quality and decrease the cost of sequencing over time. While this might appear to argue for holding back until the technology matures, it is important to initiate the sequencing effort as soon as possible. The availability of a steady stream of sequencing and genotyping information will help to facilitate changes in the organization of gene bank activities. Generating moderately sized data sets at the beginning will provide an opportunity to bring in scientists with appropriate skill sets to help organize and manage the effort and to develop analysis pipelines and databasing strategies that are in tune with the requirements of gene bank operations, while preparing to expand operations to meet future demands, keeping in mind that subsequent materials will likely be genotyped at higher resolution and for less cost than first selections.

For many years, the concept of a core collection has featured prominently in gene bank discussions about how to select materials for in-depth characterization. A core collection represents only 5–10% of the number of original accessions, but is selected to represent a majority of the genetic variation present in the original collection (Frankel, 1984; van Hintum et al., 2000). This approach was recently advocated by Glaszmann et al. (2010), who recommended using molecular genotyping

data to help identify a core collection (or a mini-core, representing ~1% of the original number of accessions). They suggested purifying a "core crop reference set" of materials that is defined as "a set of genetic stocks that is representative of the genetic resources of the crop and is used by the scientific community as a reference for an integrated characterization of its biological diversity" (Glaszmann et al., 2010, p. 3). This core crop reference set would be targeted for in-depth phenotyping and trait dissection by the community of researchers interested in the species. As a means for entering the broader collection, a well-characterized core collection has many advantages. It offers a starting point for researchers to create smaller, more targeted collections focused on particular traits, gene families, pedigrees, or geographic regions (Staub et al., 2002).

Accelerating the virtuous cycle of discovery. While a core collection is useful for evaluating a broad range of variation, it may not efficiently capture traits or alleles that are rare yet may have high breeding value or adaptive potential. To address this concern, Mackay and Street (2004) proposed a mechanism called the focused identification of germplasm strategy (FIGS) for constructing small subsets of accessions that maximize the likelihood of encountering specific traits of interest. This strategy focused on the selection of variation for one trait at a time by choosing accessions from collection sites that are most likely to impose a selection pressure for the trait being sought (Street et al., 2008). The approach assumes that the traits of interest are, as a result of adaptation to the environment, geographically localized in predictable regions. It will only be effective if this assumption holds. It might, for example, be expected to be ineffective for traits such as atypical seed-oil chemistries where selection is not associated with geographically defined stresses. FIGS has been used successfully to identify subsets of barley and wheat accessions containing resistance to diverse fungal pathogens (Endresen et al., 2011) and Russian wheat aphid (El Bouhssini et al., 2011). These, or a combination of similar approaches, may be used to prioritize samples for allele mining approaches involving resequencing or intensive genotyping.

Plans to sequence up to 10 000 rice accessions from the IRRI gene bank are under way as a collaboration among BGI, CAAS, and IRRI (http://en.genomics.cn/navigation/show_news.action?newsContent.id=8952). The first phase will involve a diverse collection of 3000 accessions that will be sequenced at 6× genome coverage or greater, including 50 accessions sequenced at ~30× depth. The deep sequencing will be done using multiple libraries with varying insert sizes to facilitate de novo assembly by tools such as AllPaths (Butler et al., 2008). Previous projects involving high-quality resequencing of 50–150 diverse rice genomes at ~15–75× genome coverage have established the foundation for this approach in rice and demonstrated that a variety of tools and strategies are already in place to make it feasible (Wright et al., unpublished manuscript; Xu et al., 2011; http://www.ricesnp.org). The availability of these and other high-quality genomes serves as the framework for creating pan-genome reference sequences for the different pools of variation that exist for rice. These reference sequences will form frameworks to facilitate the alignment and build refinement for genomes resequenced at lower depth, with iterative improvements of pan-genome builds as the depth increases. All of the data will be deposited in short read archives in the public domain (such as http://www.gigadb.org and others) at the time of publication, or earlier. Under the GRiSP framework, an international team is being organized to tackle assembly, variant calling, annotation, population genetics, and visualization of the data set. A database designed to host the data, integrate multiple forms of annotation, and support user queries in real time is currently under development. The finalized database will be publicly accessible.

Early investments in genotyping must be complemented by a corresponding investment in phenotyping of the same materials. When genotypic and phenotypic data are available for the same set of germplasm, the data provide the basis for undertaking genome-wide association studies (GWAS) to identify regions of the genome that are associated with quantitative phenotypic variation (Huang et al., 2010; Zhao et al., 2011). GWAS as a strategy for trait dissection is ideally suited to gene bank accessions because, unlike classical QTL mapping, genotyping and phenotyping are performed on a diverse collection of unrelated strains (referred to as a diversity panel) rather than on the progeny of a biparental cross (referred to as a QTL mapping population) (Yu et al., 2005; Zhu et al., 2008). Genetic relationships among individuals in an association mapping panel may vary widely, and thus a FIGS approach could be used in conjunction with GWAS not only to identify new sources of variation for the trait of interest, but also to map the genes or QTLs responsible for the phenotype at the same time. In practice, a diversity panel should be composed of the most diverse set of lines possible, while ensuring that they are adapted to the region or environment in which they will be phenotyped. It can be difficult to evaluate certain traits or phenotypes in the field if the range of diversity is so high that some genotypes exhibit abnormal development such that trait expression is affected. While both GWAS and QTL mapping aim to identify genes or regions of the genome underlying complex phenotypes, GWAS does so in the context of evolutionary biology and population genetics, while QTL mapping does so in the context of inheritance genetics (Bernardo, 2008). In practice, the two approaches are complementary and are often pursued jointly (Yu et al., 2005; Legarra and Fernando, 2009; McMullen et al., 2009; Famoso et al., 2011).

GWAS and QTL mapping studies generate hypotheses about QTLs or genes that require validation to be of practical use. Validation generally requires further mapping and introgression of QTL regions into different genetic backgrounds to determine the reliability and heritability of a QTL for breeding purposes (Venuprasad et al., 2011a, b), which is accomplished by traditional crossing and progeny analysis using an array of biparental and multiparent populations. This phase of work can be augmented by collaboration with plant breeding programs that routinely carry out multilocation field evaluation. The insights gained from these studies can then be used to make predictions about how subsequent sets of materials are likely to perform. After the initial system of prediction has been developed for material in a particular gene pool, future phenotyping can be targeted only at selected subsets of genotyped accessions and crosses among them to confirm or improve predictions. As the prediction system improves, phenotyping can be targeted more and more reliably within a certain set of accessions.

Selecting materials to be genotyped at each stage in this process requires coordination so that hypotheses can be developed and tested, and so that different gene bank user groups will be served. If the genotyping is coordinated with phenotyping, expression analysis, and other forms of genetic dissection, gene banks have the potential to initiate virtuous cycles of discovery that will reverberate productively in the research community.
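The marker-by-marker test at the heart of the GWAS approach described above can be illustrated with a minimal sketch. It assumes inbred lines scored 0/1 at each biallelic marker and fits an ordinary least-squares regression of phenotype on genotype for one marker at a time. It deliberately omits the corrections for population structure and kinship that a real association study on a diversity panel would need, so it is a teaching sketch, not a usable pipeline. The function names, marker names, and all data values are hypothetical and invented purely for illustration.

```python
import math


def marker_association(genotypes, phenotypes):
    """Regress phenotype on a 0/1 marker; return (allele_effect, t_statistic).

    Simplifying assumption: no population-structure or kinship correction.
    """
    n = len(genotypes)
    if n != len(phenotypes) or n < 3:
        raise ValueError("need matching lists covering at least 3 lines")
    mean_g = sum(genotypes) / n
    mean_p = sum(phenotypes) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0:
        return 0.0, 0.0  # monomorphic marker carries no information
    sxy = sum((g - mean_g) * (p - mean_p)
              for g, p in zip(genotypes, phenotypes))
    effect = sxy / sxx                      # estimated allele substitution effect
    intercept = mean_p - effect * mean_g
    rss = sum((p - intercept - effect * g) ** 2
              for g, p in zip(genotypes, phenotypes))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    if std_err == 0:
        t_stat = 0.0 if effect == 0 else math.copysign(math.inf, effect)
    else:
        t_stat = effect / std_err
    return effect, t_stat


def scan_markers(panel, phenotypes):
    """Test every marker and rank by absolute t statistic, strongest first."""
    results = {name: marker_association(calls, phenotypes)
               for name, calls in panel.items()}
    return sorted(results.items(), key=lambda kv: abs(kv[1][1]), reverse=True)


# Hypothetical toy panel: 8 inbred lines, 3 markers, one trait (made-up values).
trait = [10.1, 9.8, 10.3, 9.9, 12.2, 11.8, 12.1, 12.0]
panel = {
    "snp_A": [0, 0, 0, 0, 1, 1, 1, 1],  # tracks the trait
    "snp_B": [0, 1, 0, 1, 0, 1, 0, 1],  # unrelated to the trait
    "snp_C": [0, 0, 0, 0, 0, 0, 0, 0],  # monomorphic
}
ranked = scan_markers(panel, trait)
for name, (effect, t_stat) in ranked:
    print(f"{name}: effect={effect:.2f}, t={t_stat:.2f}")
```

In this toy panel the marker that co-varies with the trait ranks first and the monomorphic marker last. Because the sketch ignores relatedness among lines, a strong signal here could equally reflect subpopulation membership, which is why hypotheses from such scans still require the validation steps described in the text.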

This cycle of discovery can simultaneously improve subsequent predictions about the genetic value of diverse accessions. The positive feedback loop triggered by the availability of resequencing or genotyping information should greatly enhance the value of gene bank materials and simultaneously enhance the ability of gene bank scientists to make predictions about the genetic value of materials that have never been phenotyped and about which little or no passport information may be available.

One of the most important benefits of having high-resolution sequencing or genotypic information for gene bank collections is that it allows plant breeders and other researchers to begin to identify lines or accessions that share alleles or haplotypes and to accumulate information on the phenotypic performance of alleles in diverse backgrounds rather than of lines (Heffner et al., 2009; Lorenz et al., 2011). Through evaluation of the effects of alleles that occur in many lines, researchers can leverage investments made in all phenotyping experiments to improve the predicted performance of any particular accession. This leveraging is an important advance over conventional approaches to germplasm evaluation where a phenotypic measurement provides information only on a particular line or accession itself.

This also has important implications for the way phenotyping experiments are likely to be conducted. As high-resolution genotypic information becomes routinely available on large numbers of germplasm accessions and derived breeding lines, it will be relatively straightforward to identify alleles or haplotypes that are replicated across lines in different germplasm clusters. This will impact the optimal design of experiments aiming to examine phenotype–genotype associations in collections of related germplasm and is particularly relevant for implementing genomic selection strategies in breeding. When evaluating breeding pools or targeted collections of geographically or ecologically adapted lines (e.g., using FIGS) where genetic background is relatively stable, the repeatability of allele rather than line effects is emphasized. This allows for testing of as many different lines as possible with little or no replication of individual lines. As the cost of genotyping drops below that of phenotyping, it will become increasingly possible and cost-effective to use genotypic information to predict phenotypic performance. This has far-reaching consequences for the implementation of field trials in the context of genomics-based germplasm evaluation and crop improvement.

Predicting breeding value for underutilized materials. One objective of a gene bank is to support the broadening of the diversity in commercial crop varieties through providing ready access to novel and useful genetic variation. However, to make the germplasm from the gene bank attractive to the breeder, prebreeding can be required. Examples include the Wheat Genetics and Genomics project in Manhattan, KS (http://www.k-state.edu/wgrc), wheat projects in Germany (Hammer et al., 1996), and the Germplasm Enhancement of Maize project in Ames, IA (http://www.public.iastate.edu/~usda-gem/; Pollak, 2003), which aim to explore the genetic potential of diverse wild and exotic accessions and to enhance the utilization of a broad range of variation. Prebreeding projects in maize and sorghum provide examples of well-coordinated and well-funded efforts between public and private sector breeders in the United States, though they predate the era of whole genome sequencing.

organizations (http://www.public.iastate.edu/~usda-gem/) in which exotic accessions of maize were top crossed to a standard set of tester inbreds, and the resulting hybrids were evaluated for numerous traits in multilocation trials around the United States by a network of public breeders and private seed companies. The introduction of exotic alleles enhanced the performance of maize grown in the United States for many important traits (Pollak, 2003). The set of donor lines used in the GEM project is now part of the Seeds of Discovery project, a collaborative program funded by the government of Mexico, the Gates Foundation, and Centro Internacional de Mejoramiento de Maíz y Trigo (CIMMYT) that aims to genotype accessions of maize and wheat from the CIMMYT gene bank (http://masagro.cimmyt.org/index.php/areas-prioritarias/descubriendo-la-diversidad-genetica-de-las-semillas). Similarly, the Sorghum Conversion project converted many exotic lines to photoperiod insensitivity so they could be evaluated under conditions in the United States. Lines were converted irrespective of per se performance, and many alleles derived from poor-performing lines proved beneficial in the genetic background of elite varieties. Germplasm released from the Conversion program was widely used to improve commercial sorghum hybrids for pest and disease resistance, drought and saline tolerance, stalk strength, grain quality characteristics, and improved yield and yield stability (Miller, 1979). Genotyping and sequencing of the donor lines, prebreeding germplasm, and elite releases, in combination with phenotypic information about these materials, will provide a valuable starting point for identifying the genes and gene combinations responsible for the improved performance of the hybrids.

While breeders do not require knowledge about specific genes, alleles, or pathways that determine performance, gene bank scientists will benefit greatly from this information. Knowing which alleles and allele combinations are associated with desirable characteristics will provide the basis for allele mining activities aimed at identifying accessions carrying favorable alleles (alleles that are favorable per se, as well as those that are only favorable when combined with others in elite backgrounds), or accessions carrying unknown, novel alleles at loci that merit further investigation. Appropriate data mining software will be essential to ensure that this process is as efficient as possible, particularly for large gene banks with abundant sequence information. Further, it will require an ongoing curation process to annotate the sequence in meaningful ways. Some of the annotation will involve identifying functional polymorphisms and indicating their association with phenotypic variation of interest, noting where sequence variants fall with respect to gene models, promoters, or epigenetic marks, and providing information about the frequency of variants in specified subpopulations or germplasm pools. As more is known about how plants respond to internal, developmental signals and to external, environmental cues, and about which specific G×G combinations are desirable, sequence information will be increasingly helpful in searching for materials that meet specific criteria.

In rice, the use of advanced backcross QTL analysis (Tanksley and Nelson, 1995; Tanksley and McCouch, 1997) has been very successful in introgressing yield-enhancing genes from low-yielding wild relatives. Researchers in China, Colombia, India, Indonesia, Korea, and the United States have demonstrated that virtually any wild or unimproved accession, regard-
The less of its phenotype, can serve as an excellent source of alleles for Germplasm Enhancement of Maize (GEM) program is a coop- rice improvement (Xiao et al., 1998; Moncada et al., 2001; erative effort between the USDA-ARS, land-grant universi- Septiningsih et al., 2003; Thomson et al., 2003; Li et al., 2004; ties, private industry, and international and nongovernmental Marri et al., 2005; Sarla and Mallikarjuna Swamy, 2005; Xie

et al., 2006, 2008; McCouch et al., 2007). Molecular markers were used to identify the location of favorable QTL alleles in BC populations derived from crosses between elite, adapted, high-yielding varieties and divergent exotic materials (Eshed and Zamir, 1995; Brar and Khush, 1997; Tanksley and McCouch, 1997; Zamir, 2001; Nguyen et al., 2003; McCouch et al., 2007). Markers were also used to quantify the degree of genetic divergence between different donors and recurrent parent varieties (Garris et al., 2005; Zhao et al., 2010), and these estimates of divergence proved useful as the basis for planning productive crossing schemes.

Large-scale genomics experiments now are getting underway to help characterize wild and landrace resources in our gene banks, and one objective is to develop predictive models about G × G that can help unlock the genetic potential of many other wild and exotic accessions for use in plant improvement (Ammiraju et al., 2008; Huang et al., 2010; Famoso et al., 2011; Zhao et al., 2011). This effort involves identifying combinations of alleles and regions of the genome responsible for heterosis and transgressive variation and learning how to predict which introgressions from diverse wild or exotic donors are likely to enhance performance in elite genetic backgrounds of interest. To approach this problem in a systematic way, several groups have constructed libraries of chromosome segment substitution lines (CSSLs), which greatly facilitate the identification of agronomically valuable genes introduced from wild or unadapted donors (Tian et al., 2006; Ali et al., 2010; Fukuoka et al., 2010; Xu et al., 2010).

Germplasm characterization in an era of high-throughput sequencing

Sequencing technology has become so widely accessible and cost effective that it is being used in a highly distributed fashion to address a broad range of basic and applied questions. Many of these questions are pertinent to gene banks because they contribute directly to the characterization of genetic resources or because they offer models of how to approach germplasm characterization in new ways. Sequencing or genotyping data are used for phylogenetic reconstruction (Ge et al., 1999; Soltis et al., 2004; Zou et al., 2008; de la Torre-Barcena et al., 2009), allele mining and haplotype structure analysis (Bermúdez et al., 2008; Mikami et al., 2008; Bhullar et al., 2009; Kovach et al., 2009; Takahashi et al., 2009), identifying genomic regions that are identical by descent (IBD) in pedigree-related lines (Yang et al., 2007; Yamamoto et al., 2010; Aylor et al., 2011), characterizing regions of maximum or minimum divergence between lines, populations, or species (Ammiraju et al., 2006, 2008; Tang et al., 2007; Sanyal et al., 2010; Tian et al., 2011), selective sweep mapping (Pollinger et al., 2005; Molina et al., 2011), genome-wide association studies (Rostoks et al., 2006; Cockram et al., 2010; Huang et al., 2010; Zhao et al., 2011), introgression analysis (Sweeney et al., 2007; Takano-Kai et al., 2009; Fujino et al., 2010; Zhao et al., 2010; Famoso et al., 2011), examining the genetic basis of heterosis (Li et al., 2008), selection of parents for crossing and population development (Churchill et al., 2004; Cavanagh et al., 2008; McNally et al., 2009), prebreeding activities (Hammer et al., 1996; McCouch et al., 2007), facilitating fine-mapping, gene cloning and functional studies (Fukuoka et al., 2009; Huang et al., 2009; Yamamoto et al., 2009), and providing the basis for pathway and systems analysis (Shinozaki and Yamaguchi-Shinozaki, 2007; Qiu et al., 2008; Ficklin et al., 2010; Gu et al., 2011). These are examples where genome sequence data have been integral to the characterization of genetic resources and an understanding of the genes, alleles, and useful haplotypes they contain.

Emerging genomics technologies also enable the study of epigenetics, which is likely to have an increasingly important role in conservation biology (Bossdorf et al., 2008; Richards et al., 2010). There is evidence that epigenetic processes are important in mediating the responses of individuals and populations to rapid environmental change as well as in stabilizing the products of wide hybridization and allopolyploidization between genetically diverse strains (Salmon et al., 2005; Chinnusamy et al., 2008; Boyko et al., 2010). In addition, epigenetic effects have been implicated as an important source of de novo variation that facilitates the rapid evolution and adaptation of invasive species (Allendorf and Lundquist, 2003). This could partially explain the paradox of invasive species that have lost genetic variation during a bottleneck associated with their introduction but are nonetheless able to adapt to new environmental conditions.

Finally, the emerging concepts of the microbiome and the metagenome are relevant in understanding how communities of microorganisms mediate the interaction between plants and their biotic and abiotic environment. Genomics technologies and high-throughput sequencing are key to understanding these relationships. While the complexity of community interactions presents a formidable challenge, information derived from focused studies is being integrated into useful models about how the genetics of crop plants contribute to community dynamics in agricultural systems, mediated by, for example, microorganisms on the phyllosphere (Andrews and Harris, 2000; van der Heijden et al., 2008; Whipps et al., 2008; Balint-Kurti et al., 2010; Meyer and Leveau, 2011). Thus, genomics is opening up an entirely new dimension of research suggesting that a plant's response to its biotic and abiotic environment is to some extent mediated by particular microbial communities with which it is in contact. This has implications for the way gene bank accessions are characterized and possibly the way they are managed. Extensive new information will be needed to accelerate our understanding of the metagenome and to better understand the implications of this diversity for the management of plant genetic resources.

DOCUMENTATION AND DISSEMINATION OF INFORMATION

Generating and analyzing genomics data require major investments in technology and bioinformatics infrastructure. To the extent that genotyping and sequence analysis employs the same technology platforms and analytical skills regardless of the organism(s) being studied, this favors centralization because it allows gene banks to take advantage of economies of scale and to leverage access to the necessary (but scarce) computational and analytical expertise to make sense of the data. Gene banks with a large number of holdings are in a more favorable position to negotiate terms for the genotyping of their collections, though it will take longer and cost more to completely sequence their collections than those of smaller gene banks. Collectively, smaller gene banks may develop consortia to promote sharing of technology and computational expertise, helping to mobilize resources and create economies of scale that will enable them to take advantage of the opportunity to sequence their collections. In contrast to sequencing or genotyping, phenotyping requires a highly distributed system of
well-coordinated teams that can evaluate plant performance in geographically and ecologically relevant field environments. Complementary, and increasingly automated, high-throughput phenomic approaches are also being developed that can take advantage of the ability to evaluate biochemical, physiological and developmental characters under controlled conditions (Lahner et al., 2003; Shinozaki and Yamaguchi-Shinozaki, 2007; Buescher et al., 2010; Clark et al., 2011; Famoso et al., 2011).

In the area of bioinformatics, modular, well-structured, and user-friendly databases must be developed to track, store, distribute, and analyze germplasm records and genotypic and phenotypic data. Common standards for acquiring, defining, storing, and managing genotypic and phenotypic data across laboratories and disciplines can greatly enhance the efficiency of gene bank operations and utility to gene bank users in many disciplines. Data-management requirements are closely coupled with the rapidly expanding adoption of genotyping-by-sequencing strategies as the front line for characterizing gene bank holdings. To capture value from the deluge of sequencing data, a coordinated, distributed network of statisticians, computational biologists, population geneticists, and computer programmers is needed to help gene banks organize, analyze, and interpret the genomic data that will be generated on their collections. Sequencing the accessions in a gene bank will result in an unprecedented amount of information that is likely to inundate existing databases. Hence, it will be necessary to develop new database schemas and systems that will enable the data to be integrated, visualized, queried by users, and cross-linked to the passport and other information gene banks have curated in the past with the entry point as the genomic data instead of the seed. These database systems must also make use of controlled vocabularies and ontologies, such as the Plant Ontology (Avraham et al., 2008), to ensure cross-database and cross-species compatibility. Development of scalable tools for data storage, retrieval, and analysis using cloud-based solutions and parallel computing will certainly play a major role in mining this treasure trove of information. Initiatives currently underway in the iPlant Collaborative (http://www.iplantcollaborative.org) are exploring approaches to help facilitate this process.

With the creation of massive amounts of new sequencing data related to their accessions, gene banks will be expected to provide summaries of sequence-based information to interested users and to provide it online. Menu-driven, sequence-based query systems are needed urgently. For example, a genomics user might be looking for accessions that are as genetically similar as possible across a specific region(s) of the genome (indicated by genome start and stop positions), while belonging to different subpopulation groups (the groups would be named), or might be looking for new alleles at genes x, y and z (designated by genome start and stop positions, or possibly by Gene_Identifiers). Such queries could be objectively structured so that the user filled in the blanks and a set of accessions that met the criteria was presented. Genomics researchers are accustomed to this type of database query; thus, it is likely that this group of users would be well served by providing this service. Breeders interested in finding similar information would prefer to structure queries around traits or performance records, pedigrees, and environmental information, and this potential will have to be developed with time as knowledge of genotype–phenotype relationships improves. In the short term, breeders might be best served by developing translation tables that can begin to link genomic information with reference varieties or variety groups whose performance is familiar.

Along with the sequence data, gene banks will be urged to develop online seed ordering systems so that in cases where a user determines which accession(s) they want after consulting the available sequence information, they could order seed directly. This model is likely to require a two-pronged approach, with a simplified interface for users who do not want to have to deal with a great deal of detail and a more sophisticated interface for users looking for comprehensive information about specific accessions or seed stocks. Curation of the information presented to users in an online ordering system would require community effort, much like Wikipedia or Wikigenes, whereby specialists would help annotate gene bank holdings, providing as much information as possible about the history, genotype, phenotype, performance, and adaptation of each accession.

In cases where a user requires the attention and assistance of a gene bank curator to find what they are looking for, the curator would also benefit from the availability of an online sequence-search and seed-ordering system to expedite the process. Current online ordering systems have limited effectiveness, as expert knowledge is almost always required to convert a user's statement of research needs into a database query designed to select appropriate accessions. This is, in part, due to the sparseness of the data that is available to query. For example, Willocquet et al. (2011), in selecting accessions to explore the hypothesis that canopy architecture was an important factor in the spread of rice sheath blast, had to determine how existing data on standard morphological traits might relate to canopy architecture and hence the spread of sheath blast. The selection process required the combined specialist expertise of the sheath blast epidemiologist and the gene bank curator; it is difficult to envision an effective online ordering system that could have achieved the same result. For queries that do not benefit from the availability of sequencing information, this will continue to be the case. However, as more and more sequencing information becomes available, queries that leverage that sequence information are likely to be somewhat more effective.

In the future, as knowledge of genotype–phenotype relationships improves, novel querying systems can be envisioned that take, as user input, definitions of target phenotypes, zones of adaptation, or degrees of genetic similarity, and translate these through a genotype–phenotype prediction system into an internal sequence-based database query. As the prediction system improves, users will be able to make increasingly reliable selections of accessions that meet their breeding and research needs.

In addition to an effective information dissemination and seed distribution system, a quality gene bank will always require significant investment in human capital to ensure that there are highly competent curators ready to assist users in finding what they are looking for. There is little point in conserving and characterizing the wealth of genetic variation embodied by a collection unless appropriate accessions can be effectively identified and distributed upon request.

SUMMARY

It is imperative that we initiate the systematic genotyping of gene bank accessions to become more effective as stewards of crop biodiversity. Large-scale genotyping and targeted resequencing has the potential to significantly advance the rational conservation, characterization, and utilization of crop genetic resources. It can provide gene bank scientists with significant new information about their holdings and enable them to more
effectively target future collecting missions, manage seed-multiplication and in-house quality-control procedures, and advise people about the value of germplasm for particular purposes. The availability of genotypic information on a large number of gene bank accessions will serve as an inspiration for scientists to take a more active role in characterizing natural variation in ex situ collections and for breeders to more rapidly and efficiently discover and deploy novel alleles in new crop varieties, empowering them to take a more hypothesis-driven approach to variety development.

To achieve these potential benefits, gene bank managers will need to adopt a suite of changes in the way they work. One change might involve adopting the model of the e-library by hosting an electronic catalogue of high-quality information about gene bank holdings, making it easier and more economical for the public to access both seeds and information. This model would require several layers so that naïve users would not be confused by an overabundance of detail, while more sophisticated users could drill down to discover increasingly comprehensive information about specific accessions or seed stocks. This effort would involve community curation (crowdsourcing) whereby specialists from the research community would be called upon to help annotate gene bank holdings, aiming to provide as much information as possible about each accession. Another change would encourage gene bank scientists to actively engage in collaborations to discover the genetic potential of underutilized resources and to make predictions about the alleles and traits they carry, their combining ability, performance or breeding value. This approach to germplasm exploration would involve a combination of biological experimentation, data-mining activities and novel forms of public engagement. Finally, genomics information would enable gene bank scientists to monitor how their materials (or specific alleles) were being used in research and breeding, providing much-needed feedback as the basis for future inference.

To effectively integrate genomic information into the operational context of a gene bank will require that gene banks begin to operate in a new scientific arena, employing biologists with high-level quantitative, computational, and statistical backgrounds as well as those familiar with plant biology, plant genetics and breeding, ecology, agronomy, and environmental science. Specialists with demonstrated expertise in the development and management of online databases, distributed annotation services, hosting of large, computationally intensive querying resources, and tracking of inventory will be particularly in demand. Many of the people who currently have these skills are not trained in biology; one strategy for filling this gap would be to appeal to policymakers to support new, high-profile, international training programs designed to recruit a new generation of motivated young scientists to become the leaders of tomorrow's gene banks. What is needed are people with an interest in improving the efficiency and long-term, strategic management of the world's germplasm resources, the ethical and moral compass required to stay true to the public mission of the gene bank, expertise in highly efficient forms of quality control and inventory management, and familiarity with database development and design, computational biology, quantitative genetics, statistical genomics, and information technology.

For maximum benefit to both gene banks and users, investments in sequencing and genotyping of gene bank holdings must go hand in hand with significant improvements in the power and efficiency of databases designed to collect, manage, store, integrate, and retrieve information. Due to the large numbers of gene bank holdings and the exponentially increasing volume of sequence information related to these materials, it is obvious that new, more rational forms of data management, compression, indexing, and access are needed. Genomic information, coupled with the accumulation of phenotypic observations, will allow gene bank scientists and users to test hypotheses and answer questions about genotype–phenotype relationships, develop knowledge-based predictions about which accessions are valuable for diverse applications, and identify gaps in knowledge, revealing new directions where much remains to be learned.

Genotyping the reservoirs of natural variation conserved in the world's gene banks and in situ preserves focuses a powerful beam of light into the depths of existing gene pools and illuminates opportunities for future research and development. Existing forms of biological diversity have evolved over millions of years and though we currently know relatively little about natural variation, it continues to provide the world with essential building blocks for improving the productivity, sustainability, and nutritional quality of food, fiber, and fuel resources that are essential for human welfare.

LITERATURE CITED

ABADIE, T., C. M. T. CORDEIRO, J. R. FONSECA, R. B. N. ALVES, M. L. BURLE, C. BRONDANI, AND P. H. N. RANGEL. 2005. Constructing a rice core collection for Brazil. Pesquisa Agropecuaria Brasileira 40: 129–136.
ALI, M. L., P. L. SANCHEZ, S. B. YU, M. M. LORIEUX, AND G. C. EIZENGA. 2010. Chromosome segment substitution lines: A powerful tool for the introgression of valuable genes from Oryza wild species into cultivated rice (O. sativa). Rice 3: 218–234.
ALLENDORF, F. W., AND L. L. LUNDQUIST. 2003. Introduction: Population biology, evolution, and control of invasive species. Conservation Biology 17: 24–30.
AMMIRAJU, J., M. LUO, J. GOICOECHEA, W. WANG, D. KUDRNA, C. MUELLER, J. TALAG, ET AL. 2006. The Oryza bacterial artificial chromosome library resource: Construction and analysis of 12 deep-coverage large-insert BAC libraries that represent the 10 genome types of the genus Oryza. Genome Research 16: 140–147.
AMMIRAJU, J. S. S., F. LU, A. SANYAL, Y. YU, ET AL. 2008. Dynamic evolution of Oryza genomes is revealed by comparative genomic analysis of a genus-wide vertical data set. The Plant Cell 20: 3191–3209.
ANDREWS, J. H., AND R. F. HARRIS. 2000. The ecology and biogeography of microorganisms of plant surfaces. Annual Review of Phytopathology 38: 145–180.
AVRAHAM, S., C. W. TUNG, K. ILIC, P. JAISWAL, E. A. KELLOGG, S. MCCOUCH, A. PUJAR, L. REISER, ET AL. 2008. The Plant Ontology database: A community resource for plant structure and developmental stages, controlled vocabulary and annotations. Nucleic Acids Research 36 (supplement 1): D449–D454.
AYLOR, D. L., W. VALDAR, W. FOULDS-MATHES, R. J. BUUS, R. A. VERDUGO, R. S. BARIC, M. T. FERRIS, ET AL. 2011. Genetic analysis of complex traits in the emerging collaborative cross. Genome Research 21: 1213–1222.
BAIRD, N. A., P. D. ETTER, T. S. ATWOOD, M. C. CURREY, A. L. A. SHIVER, ET AL. 2008. Rapid SNP discovery and genetic mapping using sequenced RAD markers. PLoS ONE 3: e3376.
BALINT-KURTI, P., S. J. SIMMONS, J. E. BLUM, C. L. BALLARE, AND A. E. STAPLETON. 2010. Maize leaf epiphytic bacteria diversity patterns are genetically correlated with resistance to fungal pathogen infection. Molecular Plant-Microbe Interactions 23: 473–484.
BERMÚDEZ, L., U. URIAS, D. MILSTEIN, L. KAMENETZKY, R. ASIS, A. R. FERNIE, M. A. VAN SLUYS, ET AL. 2008. A candidate gene survey of quantitative trait loci affecting chemical composition in tomato fruit. Journal of Experimental Botany 59: 2875–2890.
BERNARDO, R. 2008. Molecular markers and selection for complex traits in plants: Learning from the last 20 years. Crop Science 48: 1649–1664.
BHULLAR, N. K., M. MACKAY, N. YAHIAOUI, AND B. KELLER. 2009. Unlocking wheat genetic resources for the molecular identification of previously undescribed functional alleles at the Pm3 resistance locus. Proceedings of the National Academy of Sciences, USA 106: 9519–9524.
BIOVERSITY INTERNATIONAL. 2007. Guidelines for the development of crop descriptor lists. Bioversity Technical Bulletin series 13. Bioversity International, Rome, Italy.
BIOVERSITY INTERNATIONAL. 2011. Descriptor lists and derived standards. Bioversity International, Rome, Italy. Website http://www.bioversityinternational.org/?id=3737 [accessed 15 November 2011].
BÖRNER, A., M. S. RÖDER, S. CHEBOTAR, R. K. VARSHNEY, AND A. WEIDNER. 2005. Molecular tools for gene bank management and evaluation. Czech Journal of Genetics and Plant Breeding 41 (special issue): 122–127.
BOSSDORF, O., C. L. RICHARDS, AND M. PIGLIUCCI. 2008. Epigenetics for ecologists. Ecology Letters 11: 106–115.
BOYKO, A., T. BLEVINS, Y. YAO, A. GOLUBOV, A. BILICHAK, Y. ILNYTSKYY, ET AL. 2010. Transgenerational adaptation of Arabidopsis to stress requires DNA methylation and the function of dicer-like proteins. PLoS ONE 5: e9514.
BRAR, D. S., AND G. S. KHUSH. 1997. Alien introgression in rice. Plant Molecular Biology 35: 35–47.
BUESCHER, E., T. ACHBERGER, I. AMUSAN, A. GIANNINI, C. OCHSENFELD, A. RUS, B. LAHNER, ET AL. 2010. Natural genetic variation in selected populations of Arabidopsis thaliana is associated with ionomic differences. PLoS ONE 5: e11081.
BUTLER, J., I. MACCALLUM, M. KLEBER, I. A. SHLYACHTER, M. K. BELMONTE, E. S. LANDER, C. NUSBAUM, AND D. B. JAFFE. 2008. ALLPATHS: De novo assembly of whole-genome shotgun microreads. Genome Research 18: 810–820.
CAVANAGH, C., M. MORELL, I. MACKAY, AND W. POWELL. 2008. From mutations to MAGIC: Resources for gene discovery, validation and delivery in crop plants. Current Opinion in Plant Biology 11: 215–221.
CHEN, Z. J. 2007. Genetic and epigenetic mechanisms for gene expression and phenotypic variation in plant polyploids. Annual Review of Plant Biology 58: 377–406.
CHINNUSAMY, V., Z. GONG, AND J. K. ZHU. 2008. Abscisic acid-mediated epigenetic processes in plant development and stress responses. Journal of Integrative Plant Biology 50: 1187–1195.
CHURCHILL, G. A., D. C. AIREY, H. ALLAYEE, J. M. ANGEL, A. D. ATTIE, ET AL. 2004. The Collaborative Cross, a community resource for the genetic analysis of complex traits. Nature Genetics 36: 1133–1137.
CLARK, R., R. MACCURDY, J. JUNG, J. SHAFF, S. R. MCCOUCH, D. ANESHANSLEY, AND L. KOCHIAN. 2011. 3-Dimensional root phenotyping with a novel imaging and software platform. Plant Physiology 156: 455–465.
COCKRAM, J., J. WHITE, D. L. ZULUAGA, D. SMITH, J. COMADRAN, M. MACAULAY, Z. LUO, ET AL. 2010. Genome-wide association mapping to candidate polymorphism resolution in the unsequenced barley genome. Proceedings of the National Academy of Sciences, USA 107: 21611–21616.
DE LA TORRE-BARCENA, J. E., S. O. KOLOKOTRONIS, E. K. LEE, D. W. STEVENSON, E. D. BRENNER, M. S. KATARI, G. M. CORUZZI, AND R. DESALLE. 2009. The impact of outgroup choice and missing data on major seed plant phylogenetics using genome-wide EST data. PLoS ONE 4: e5764.
EBERT, A., P. HANSON, AND P. GNIFFKE. 2010. Seed quality considerations during germplasm maintenance, breeding and varietal development. Asian Seed Congress, 9 November 2010, Shanhua, Taiwan. Website http://www.apsaseed.org/ASC2010/docs/PreCongress/1-Germplasm_Maintenance_and_Breeding.pdf.
EBANA, K., Y. KOJIMA, S. FUKUOKA, T. NAGAMINE, AND M. KAWASE. 2008. Development of mini core collection of Japanese rice landrace. Breeding Science 58: 281–291.
EID, J., A. FEHR, J. GRAY, K. LUONG, J. LYLE, G. OTT, P. PELUSO, ET AL. 2009. Real-time DNA sequencing from single polymerase molecules. Science 323: 133–138.
EL BOUHSSINI, M., K. STREET, A. AMRI, M. MACKAY, F. C. OGBONNAYA, A. OMRAN, O. ABDALLA, ET AL. 2011. Sources of resistance in bread wheat to Russian wheat aphid (Diuraphis noxia) in Syria identified using the focused identification of germplasm strategy (FIGS). Plant Breeding 130: 96–97.
ELSHIRE, R. J., J. C. GLAUBITZ, Q. SUN, J. A. POLAND, K. KAWAMOTO, E. S. BUCKLER, AND S. E. MITCHELL. 2011. A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS ONE 6: e19379.
ENDRESEN, D. T. F., K. STREET, M. MACKAY, AND E. DEPAUW. 2011. Predictive association between biotic stress traits and eco-geographic data for wheat and barley landraces. Crop Science 51: 2036–2055.
ESHED, Y., AND D. ZAMIR. 1995. An introgression line population of Lycopersicon pennellii in the cultivated tomato enables the identification and fine mapping of yield-associated QTL. Genetics 141: 1147–1162.
FAMOSO, A. N., K. ZHAO, R. T. CLARK, C. W. TUNG, C. BUSTAMANTE, L. V. KOCHIAN, AND S. R. MCCOUCH. 2011. Genetic architecture of aluminum tolerance in rice (O. sativa) determined through genome-wide association analysis and QTL mapping. PLoS Genetics 7: e1002221.
FAO [Food and Agriculture Organization]. 1996. The global plan of action for the conservation and sustainable utilization of plant genetic resources for food and agriculture. FAO, Rome, Italy. Website http://www.fao.org/agriculture/crops/core-themes/theme/seeds-pgr/gpa/en/.
FAO. 2010. The second report on the state of the world's plant genetic resources for food and agriculture. FAO, Rome, Italy.
FEUILLET, C., J. E. LEACH, J. ROGERS, P. S. SCHNABLE, AND K. EVERSOLE. 2011. Crop genome sequencing: Lessons and rationales. Trends in Plant Science 16: 77–88.
FICKLIN, S. P., F. LUO, AND F. A. FELTUS. 2010. The association of multiple interacting genes with specific phenotypes in rice using gene coexpression networks. Plant Physiology 154: 13–24.
FORD-LLOYD, B. V., D. BRAR, G. S. KHUSH, M. T. JACKSON, AND P. S. VIRK. 2008. Genetic erosion over time of rice landrace agrobiodiversity. Plant Genetic Resources: Characterization and Utilization 7: 163–168.
FRANKEL, O. 1984. Genetic perspectives of germplasm conservation. In W. Arber, K. Illmensee, W. J. Peacock, and P. Starlinger [eds.], Genetic manipulation: Impact on man and society, 161–170. Cambridge University Press, Cambridge, UK.
FUJINO, K., J. WU, H. SEKIGUCHI, I. ITO, T. IZAWA, AND T. MATSUMOTO. 2010. Multiple introgression events surrounding the Hd1 flowering-time gene in cultivated rice, Oryza sativa L. Molecular Genetics and Genomics 284: 137–146.
FUKUOKA, S., Y. NONOUE, AND M. YANO. 2010. Germplasm enhancement by developing advanced plant materials from diverse rice accessions. Breeding Science 60: 509–517.
FUKUOKA, S., N. SAKA, H. KOGA, K. ONO, T. SHIMIZU, K. EBANA, N. HAYASHI, A. TAKAHASHI, H. HIROCHIKA, K. OKUNO, AND M. YANO. 2009. Loss of function of a proline-containing protein confers durable disease resistance in rice. Science 325: 998–1001.
GARRIS, A. J., T. H. TAI, J. R. COLBURN, S. KRESOVICH, AND S. R. MCCOUCH. 2005. Genetic structure and diversity in Oryza sativa L. Genetics 169: 1631–1638.
GE, S., T. SANG, B. R. LU, AND D. Y. HONG. 1999. Phylogeny of rice genomes with emphasis on origins of allotetraploid species. Proceedings of the National Academy of Sciences, USA 96: 14400–14405.
GENESYS. 2011. Genesys: Gateway to genetic resources. Website http://www.genesys-pgr.org/ [accessed 14 November 2011].
GLASZMANN, J. C. 1987. Isozymes and classification of Asian rice varieties. Theoretical and Applied Genetics 74: 21–30.
GLASZMANN, J. C., B. KILIAN, H. D. UPADHYAYA, AND R. K. VARSHNEY. 2010. Accessing genetic diversity for crop improvement. Current Opinion in Plant Biology 13: 167–173.
15 February 2012] MCCOUCH ET AL.GENOMICS OF GENE BANKS 15 GU, H., P. ZHU, Y. JIAO, Y. MENG, AND M. CHEN. 2011. PRIN: A predate LI, J., J. XIAO, S. GRANDILLO, L. JIANG, Y. WAN, Q. DENG, L. YUAN, AND S. R. rice interactome network. BMC Bioinformatics 12: 161. MCCOUCH. 2004. QTL detection for rice grain quality traits using an GUPTA, P. K., J. K. ROY, AND M. PRASAD. 2001. Single nucleotide poly- interspecific backcross population derived from cultivated Asian morphisms: A new paradigm for molecular marker technology and (O. sativa L.) and African (O. glaberrima S.) rice. Genome 47: DNA polymorphism detection with emphasis on their use in plants. 697704. Current Science 80: 524535. LI, L., K. LU, Z. CHEN, T. MU, Z. HU, AND X. LI. 2008. Dominance, over- HAMMER, K., M. NEUMANN, AND H. U. KISON. 1996. Pre-breeding work dominance and epistasis condition the heterosis in two heterotic rice on einkornCooperation between gene bank and breeders. In S. hybrids. Genetics 180: 17251742. Padulosi, K. Hammer, and J. Heller [eds.], Hulled wheat: Proceedings LORENZ, A. J., S. CHAO, F. ASORO, E. L. HEFFNER, T. HAYASHI, H. IWATA, of the First International Workshop on Hulled Wheats, 200204, 1995, K. P. SMITH, M. E. SORRELLS, AND J. L. JANNICK. 2011. Genomics Castelvecchio Pascoli, Tuscany, Italy. International Plant Genetic selection in plant breeding: Knowledge and prospects. Advances in Resources Institute, Rome, Italy. Agronomy 110: 77123. HAWTIN, G., H. SHANDS, AND G. MACNEIL. 2011. The cost to the CGIAR MACKAY, M. C., AND K. A. STREET. 2004. Focused identification of germ- centres of maintaining and distributing germplasm. In Consortium plasm strategyFIGS. In C. K. Black, J. F. Panozzo, and G. J. Rebetzke Board of Trustees, Proposal to the Fund Council for Financial [eds.], Cereals 2004: Proceedings of the 54th Australian Cereal Chemistry Support to the CGIAR Center Genebanks in 2011. Annex 4, 1591. 
Conference and the 11th Wheat Breeders Assembly, 138141, 2004, Website http://www.cgiarfund.org/cgiarfund/sites/cgiarfund.org/files/ Canberra, Australian Capital Territory. Royal Australian Chemical Documents/PDF/fc4_funding_proposal_CGIAR_Genebanks.pdf [ac- Institute, Melbourne, Australia. cessed 22 November 2011]. MARRI, P. R., N. SARLA, L. V. REDDY, AND E. A. SIDDIQ. 2005. HEFFNER, E. L., M. E. SORRELLS, AND J. L. JANNINK. 2009. Genomic selec- Identification and mapping of yield and yield related QTLs from an tion for crop improvement. Crop Science 49: 112. Indian accession of Oryza rufipogon. BMC Genetics 6: 3347. HORNER, D. S., G. PAVESI, T. CASTRIGNANAO, P. D. DE MEO, S. LIUNI, M. MCCOUCH, S. R., M. SWEENEY, J. LI, M. THOMSON, E. SEPTINGSIH, J. SAMMETH, AND G. PRESOLE. 2010. Bioinformatics approaches for ge- EDWARDS, P. MONCADA, ET AL. 2007. Through the genetic bottle- nomics and post genomics applications of next-generation sequenc- neck: O. rufipogon as a source of trait-enhancing alleles for O. sativa. ing. Briefings in Bioinformatics 11: 181197. Euphytica 154: 317339. HUANG, X., Q. FENG, Q. QIAN, Q. ZHAO, L. WANG, ET AL. 2009. High- MCCOUCH, S. R., K. ZHAO, M. WRIGHT, C. W. TUNG, K. EBANA, M. THOMSON, throughput genotyping by whole-genome resequencing. Genome A. REYNOLDS, ET AL. 2010. Development of genome-wide SNP as- Research 19: 1068. says for rice. Breeding Science 60: 524535. HUANG, X., X. WEI, T. SANG, Q. ZHAO, Q. FENG, ET AL. 2010. Genome- MCGREGOR, C. E., R. VAN TREUREN, R. HOEKSTRA, AND T. J. L. VAN HINTUM. wide association studies of 14 agronomic traits in rice landraces. 2002. Analysis of the wild potato germplasm of the series Acaulia Nature Genetics 42: 961967. with AFLPs: Implications for ex situ conservation. Theoretical and IRGSP. 2005. International Rice Genome Sequencing Project [online]. Applied Genetics 104: 146156. Website http://rgp.dna.affrc.go.jp/IRGSP/nature436_793-800/na- MCMULLEN, M. D., S. KRESOVICH, H. S. VILLEDA, P. 
BRADBURY, H. LI, ture05.html [accessed 15 December 2011]. Q. SUN, S. FLINT-GARCIA, ET AL. 2009. Genetic properties of the IRIS. 2011. The International Rice Information System. Website http:// maize nested association mapping population. Science 325: 737740. irri.org/knowledge/tools/international-rice-information-system. [ac- MCNALLY, K. L., K. L. CHILDS, R. BOHNERT, R. M. DAVIDSON, K. ZHAO, cessed 15 December 2011]. V. J. ULAT, G. ZELLER, ET AL. 2009. Genomewide SNP varia- JARVIS, A., M. E. FERGUSON, D. E. WILLIAMS, L. GUARINO, P. G. JONES, H. tion reveals relationships among landraces and modern varieties of T. STALKER, J. F. M. VALLIS, ET AL. 2003. Biogeogeography of wild rice. Proceedings of the National Academy of Sciences, USA 106: Arachis: Assessing conservation status and setting future priorities. 1227312278. Crop Science 43: 11001108. METZKER, M. L. 2010. Sequencing technologiesThe next generation. KHUSH, G. S., D. S. BRAR, P. S. VIRK, S. X. TANG, S. S. MALIK, G. A. BUSTO, Nature Reviews. Genetics 11: 3146. Y. T. LEE, R. MCNALLY, L. N. TRINH, Y. JIANG, M. A. M. SHATA. 2003. MEYER, K. M., and J. H. J. LEVEAU. 2011. Microbiology of the phyllo- Classifying rice germplasm by isozyme polymorphism and origin of sphere: A playground for testing ecological concepts. Oecologia. cultivated rice. IRRI Discussion Paper Series No. 46. International MIKAMI, I., N. UWATOKO, Y. IKEDA, H. YAMAGUCHI, H. Y. HIRANO, Y. Rice Research Institute, Los Baos, Laguna, Philippines, 282 p. SUZUKI, AND Y. SANO. 2008. Allelic diversification at the ux locus KING, G. J., S. AMOAH, AND S. KURUP. 2010. Exploring and exploiting in landraces of Asian rice. Theoretical and Applied Genetics 116: epigenetic variation in crops. Genome 53: 856868. 979989. KOJIMA, Y., K. EBANA, S. FUKUOKA, T. NAGAMINE, AND M. KAWASE. 2005. MILLER, F. R. 1979. Utilization of introduced germplasm in crop improve- Development of an RFLP-based rice diversity research set of germ- ment programs. 
In Proceedings of American Society of Agronomy plasm. Breeding Science 55: 431440. annual meeting, Fort Collins, Colorado, USA, 1979 [abstract]. KOO, B., P. G. PARDEY, AND B. D. WRIGHT. 2002. Endowing future harvests: MOLINA, J., M. SIKORA, N. GARUD, J. M. FLOWERS, S. RUBINSTEIN, A. The long-term costs of conserving genetic resources at the CGIAR REYNOLDS, P. HUANG, ET AL. 2011. Molecular evidence for a single Centres. A report prepared for the CGIAR System-wide Genetic evolutionary origin of domesticated rice. Proceedings of the National Resources Programme by the International Food Policy Research Academy of Sciences, USA. Institute (IFPRI) in collaboration with the University of California, MONCADA, M. P., C. P. MARTNEZ, J. TOHME, E. GUIMARAES, M. CHATEL, J. Berkeley. Website http://www.sgrp.cgiar.org/?q=node/657 [accessed BORRERO, H. GAUCH, AND S. R. MCCOUCH. 2001. Quantitative trait loci 16 December 2011]. for yield and yield components in an Oryza sativa Oryza rufipogon KOVACH, M. J., M. N. CALINGACION, M. A. FITZGERALD, AND S. R. MCCOUCH. BC2F2 population evaluated in an upland environment. Theoretical 2009. The origin and evolution of fragrance in rice (Oryza sativa and Applied Genetics 102: 4152. L.). Proceedings of the National Academy of Sciences, USA 106: MUNROE, D. J., AND T. J. R. HARRIS. 2010. Third-generation sequencing 1444414449. fireworks at Marco Island. Nature Biotechnology 28: 426427. LAHNER, B., J. GONG, M. MAHMOUDIAN, E. L. SMITH, K. B. ABID, E. NGUYEN, B. D., D. S. BRAR, B. C. BUI, T. V. NGUYEN, L. N. PHAM, AND H. T. E. ROGERS, M. L. GUERINOT, ET AL. 2003. Genomic scale profil- NGUYEN. 2003. Identification and mapping of the QTL for aluminum ing of nutrient and trace elements in Arabidopsis thaliana. Nature tolerance introgressed from the new source, Oryza rufipogon Griff., Biotechnology 21: 12151221. into indica rice (Oryza sativa L.). Theoretical and Applied Genetics LEGARRA, A., AND R. L. FERNANDO. 2009. 
Linear models for joint associa- 106: 583593. tion and linkage QTL mapping. Genetics, Selection, Evolution. 41: ONDOV, B. D., A. VARADARAJAN, K. D. PASSALACQUA, AND N. H. BERGMAN. 4360. 2008. Efficient mapping of Applied Biosystems SOLiD sequence

16 16 AMERICAN JOURNAL OF BOTANY [Vol. 99 data to a reference genome for functional genomic applications. ing phenotypic and molecular marker data. Journal of the American Bioinformatics (Oxford, England) 24: 2776. Society for Horticultural Science 127: 558567. PLUCKNETT, D. L., N. J. H. SMITH, J. T. WILLIAMS, AND N. M. ANISHETTY. STREET, K., M. MACKAY, O. MITROFANOVA, J. KONOPKA, M. EL BOUHSSINI, 1987. Gene banks and the worlds food. Princeton University Press, N. KAUL, AND E. ZUEV. 2008. Swimming in the genepoolA rational Princeton, New Jersey, USA. approach to exploiting large genetic resource collections. R. Appels, POLLAK, L. M. 2003. The history and success of the public-private project R. Eastwood, E. Lagudah, P. Langridge, M. Mackay, L. McIntyre, and on germplasm enhancement of maize (GEM). Advances in Agronomy P. Sharp [eds.], Proceedings of the 11th International Wheat Genetics 78: 4587. Symposium, Brisbane, Australia, 2008, 14. Sydney University Press, POLLINGER, J. P., C. D. BUSTAMANTE, A. FLEDEL-ALON, S. SCHMUTZ, M. M. Sydney, Australia. GRAY, AND R. K. WAYNE. 2005. Selective sweep mapping of genes SWEENEY, M. T., M. J. THOMSON, Y. G. CHO, Y. J. PARK, S. H. WILLIAMSON, with large phenotypic effects. Genome Research 15: 18091819. C. D. BUSTAMANTE, AND S. R. MCCOUCH. 2007. Global dissemination POTATO GENOME SEQUENCING CONSORTIUM. 2011. Genome sequence and of a single mutation conferring white pericarp in rice. PLOS Genetics analysis of the tuber crop potato. Nature 475: 189197. 3: e133. QIU, D., J. XIAO, W. XIE, H. LIU, X. LI, L. XIONG, AND S. WANG. 2008. TAKAHASHI, Y., K. M. TESHIMA, S. YOKOI, H. INNAN, AND K. SHIMANOTO. Rice gene network inferred from expression profiling of plants over- 2009. Variations in HD1 proteins, HD3A promoters, and EHD1 ex- expressing OsWRKY13, a positive regulator of disease resistance. pression levels contribute to diversity of flowering time in cultivated Molecular Plant 1: 538551. rice. 
Proceedings of the National Academy of Sciences, USA 106: RAFALSKI, S. 2002. Applications of single nucleotide polymorphisms in 45554560. crop genetics. Current Opinion in Plant Biology 5: 94100. TAKANO-KAI, N., H. JIANG, T. KUBO, M. SWEENEY, T. MATSUMOTO, H. RAMREZ-VILLEGAS, J., C. KHOURY, A. JARVIS, D. G. DEBOUCK, AND L. KANAMORI, B. PADHUKASAHASTAM, ET AL. 2009. Evolutionary his- GUARINO. 2010. A gap analysis methodology for collecting crop tory of GS3, a gene conferring grain length in rice. Genetics 182: genepools: A case study with Phaseolus beans. PLoS ONE 5: e13497. 13231334. RICHARDS, C. L., O. BOSSDORF, AND M. PIGLIUCCI. 2010. What role does TANG, T., J. LU, J. HUANG, J. HE, S. R. MCCOUCH, Y. SHEN, Z. KAI, M. D. heritable epigenetic variation play in phenotypic evolution? Bioscience PURUGGANAN, S. SHI, AND C. I. WU. 2007. Genomic variation in rice: 60: 232237. genesis of highly polymorphic linkage blocks during domestication. ROSTOKS, N., L. RAMSAY, K. MACKENZIE, L. CARDLE, P. R. BHAR, M. L. PLOS Genetics 2: e199. ROOSE, J. T. SVENSSON, ET AL. 2006. Recent history of artificial out- TANKSLEY, S. D., AND S. R. MCCOUCH. 1997. Seed banks and molecu- crossing facilitates whole-genome association mapping in elite inbred lar maps: Unlocking genetic potential from the wild. Science 277: crop varieties. Proceedings of the National Academy of Sciences, USA 10631066. 103: 1865618661. TANKSLEY, S. D., AND J. C. NELSON. 1995. Advanced backcross QTL RUBENSTEIN, K. D., M. SMALE, AND M. P. WIDRLECHNER. 2006. Demand for analysis: A method for the simultaneous discovery and transfer of genetic resources and the U.S. national plant germplasm system. Crop valuable QTLs from unadapted germplasm into elite breeding lines. Science 46: 10211031. Theoretical and Applied Genetics 92: 191203. SACKVILLE HAMILTON, R., J. ENGELS, AND T. VAN HINTUM. 2003. THOMSON, M. J., T. TAI, A. MCCLUNG, X.-H. XAI, M. HINGA, K. LOBOS, Y. Rationalization of genebank management. In J. M. M. 
Engels and L. XU, ET AL. 2003. Mapping quantitative trait loci for yield, yield com- Visser [eds.], A guide to effective management of germplasm col- ponents and morphological traits in an advanced backcross popula- lections, IPGRI handbooks for genebanks no. 6, 8486. International tion between Oryza rufipogon and the Oryza sativa cultivar Jefferson. Plant Genetic Resources Institute, Rome, Italy. Theoretical and Applied Genetics 107: 479493. SALMON, A., M. L. AINOUCHE, AND J. F. WENDEL. 2005. Genetic and epige- THOMSON, M. J., K. ZHAO, M. WRIGHT, K. L. MCNALLY, J. REY, C. W. TUNG, netic consequences of recent hybridization and polyploidy in Spartina A. REYNOLDS, ET AL. 2011. High-throughput SNP genotyping for (Poaceae). Molecular Ecology 14: 11631175. breeding applications in rice using the BeadXpress platform. Molecular SANYAL, A., J. S. S. AMMIRAJU, F. LU, Y. YU, ET AL. 2010. Orthologous Breeding DOI: 10.1007/s11032-011-9663-x. comparisons of the Hd1 region across genera reveal Hd1 gene lability TIAN, F., D. J. LI, Q. FU, Z. F. ZHU, Y. C. FU, X. K. WANG, AND C. Q. SUN. within diploid Oryza species and disruptions to microsynteny in sor- 2006. Construction of introgression lines carrying wild rice (Oryza ghum. Molecular Biology and Evolution 27: 24872506. rufipogon Griff.) segments in cultivated rice (Oryza sativa L.) back- SARLA, N., AND B. P. MALLIKARJUNA SWAMY. 2005. Oryza glaberrima: ground and characterization of introgressed segments associated with A source for the improvement of Oryza sativa. Current Science 89: yield-related traits. Theoretical and Applied Genetics 112: 570580. 955963. TIAN, Z., T. YU, F. LIN, Y. YU, P. J. SANMIGUEL, R. A. WING, S. R. SEPTININGSIH, E., J. PRASETIYONO, E. LUBIS, T. H. TAI, T. TJUBARYAT, S. MCCOUCH, J. MA, AND S. A. JACKSON. 2011. Exceptional lability of MOELJOPAWIRO, AND S. R. MCCOUCH. 2003. 
Identification of quantitative a genomic complex in rice and its close relatives revealed by inter- trait loci for yield and yield components in an advanced backcross specific and intraspecific comparison and population analysis. BMC population derived from the Oryza sativa variety IR64 and the wild rela- Genomics 12: 142. tive O. rufipogon. Theoretical and Applied Genetics 107: 14191432. TUNG, C. W., K. ZHAO, M. WRIGHT, L. ALI, J. JUNG, J. KIMBALL, W. TYAGI, SGRP [System Wide Genetic Resource Programme]. 2011. Report of the M. THOMSON, ET AL. 2010. Development of a research platform for 21st meeting of the Intr-Centre Working Group on Genetic Resources dissecting phenotye-genotype associations in rice (Oryza spp.). Rice [ICWG-GR], Bali, Indonesia, 2011. Website http://www.sgrp.cgiar. 3: 205217. org/ [accessed 22 December 2011]. UPADHYAYA, H. D., K. N. REDDY, M. IRSHAD AHMED, AND C. L. L. SHINOZAKI, K., AND K. YAMAGUCHI-SHINOZAKI. 2007. Gene networks GOWDA. 2009. Identification of geographical gaps in the pearl involved in drought stress response and tolerance. Journal of millet germplasm conserved at ICRISAT genebank from West and Experimental Botany 58: 221227. Central Africa. Plant Genetic Resources; Characterization and SOLTIS, D. E., V. A. ALBERT, V. SAVOLAINEN, K. HILU, Y. L. QIU, M. W. Utilization 8: 4551. CHASE, J. S. FARRIS, ET AL. 2004. Genome-scale data, angiosperm re- VAN DER HEIJDEN, M. G. A., R. D. BARDGETT, AND N. M. VAN STAALEN. lationships, and ending incongruence: A cautionary tale in phyloge- 2008. The unseen majority: Soil microbes as drivers of plant diver- netics. Trends in Plant Science 9: 477483. sity and productivity in terrestrial ecosystems. Ecology Letters 11: SPOONER, D., R. VAN TREUREN, AND M. C. DE VICENTE. 2005. Molecular 296310. markers for genebank management. IPGRI Technical Bulletin no. 10. VAN DE WIEL, C. C. M., T. SRETENOVI RAJII, R. VAN TREUREN, K. International Plant Genetic Resources Institute, Rome, Italy. J. DEHMER, C. G. 
VAN DER LINDEN, AND T. J. L. VAN HINTUM. 2010. STAUB, J. E., F. DANE, K. REITSMA, G. FAZIO, AND A. LOPEZ-SESE. 2002. Distribution of genetic diversity in wild European populations of The formation of test arrays and a core collection in cucumber us- prickly lettuce (Lactuca serriola): Implications for plant genetic re-

17 February 2012] MCCOUCH ET AL.GENOMICS OF GENE BANKS 17 sources management. Plant Genetic Resources; Characterization and XIE, X., M. H. SONG, F. JIN, S. N. AHN, J. P. SUH, H. G. HWANG, AND S. R. Utilization 8: 171181. MCCOUCH. 2006. Fine mapping of a grain weight quantitative trait VAN HINTUM. T. J. L., A. H. D. Brown, C. Spillane, and T. Hodgkin. 2000. locus on rice chromosome 8 using near-isogenic lines derived from Core collections of plant genetic resources. IPGRI Technical Bulletin a cross between Oryza sativa and Oryza rufipogon. Theoretical and No. 3. International Plant Genetic Resources Institute, Rome, Italy. Applied Genetics 113: 885894. VAN HINTUM, T. J., AND R. VAN TREUREN. 2002. Molecular markers: XIE, X., M. H. SONG, J. P. SUH, H. G. HWANG, Y. G. KIM, S. MCCOUCH, AND tools to improve genebank efficiency. Cellular & Molecular Biology S. N. AHN. 2008. Fine mapping of a yield-enhancing QTL cluster Letters 7: 737744. associated with transgressive variation in an Oryza sativa O. rufi- VAN HINTUM, T. J. L. 2003. Molecular characterization of a lettuce ger- pogon cross. Theoretical and Applied Genetics 116: 613622. mplasm collection. In T. J. L. van Hintum, A. Lebeda, D. Pink, and XU, J., Q. ZHAO, P. DU, C. XU, B. WANG, Q. FENG, Q. LIU, S. TANG, M. GU, B. J. W. Schut [eds.], Eucarpia leafy vegetables. Website http://www. HAN, AND G. LIANG. 2010. Developing high throughput genotyped chro- leafyvegetables.nl/download/17_099-104_Hintum.pdf [accessed 14 mosome segment substitution lines based on population whole-genome November 2011]. re-sequencing in rice (Oryza sativa L.). BMC Genomics 11: 656. VAN HINTUM, T. J. L., C. M. M. VAN DE WIEL, D. L. VISSER, R. VAN TREUTEN, Xu, X., X. Liu, S. Ge, J. D. Jenson, F. Hu, X. Li, Y. Dong, et al. 2011. AND B. VOSMAN. 2007. 
The distribution of genetic diversity in a Resequencing 50 accessions of cultivated and wild rice yields Brassica oleracea gene bank collection related to the effects on di- markers for identifying agronomically important genes. Nature versity of regeneration, as measured with AFLPs. Theoretical and Biotechnology. Applied Genetics 114: 777786. YAMAMOTO, T., H. NAGASAKI, J. YONEMARU, K. EBANA, M. NAKAJIMA, T. VAN TREUREN, R., J. W. BOUKEMA, E. C. DE GROOT, C. C. M. VAN DE WIEL, SHIBAYA, AND M. YANO. 2010. Fine definition of the pedigree hap- AND T. J. L. VAN HINTUM. 2010. Marker-assisted reduction of redun- lotypes of closely related rice cultivars by means of genome-wide dancy in a genebank collection of cultivated lettuce. Plant Genetic discovery of single-nucleotide polymorphisms. BMC Genomics 11: Resources; Characterization and Utilization 8: 95105. 267281. VAN TREUREN, R., A. MAGDA, R. HOEKSTRA, AND T. J. L. VAN HINTUM. 2004. YAMAMOTO, T., J. YONEMARU, AND M. YANO. 2009. Towards the under- Genetic and economic aspects of marker-assisted reduction of redun- standing of complex traits in rice: Substantially or superficially? DNA dancy from a wild potato germplasm collection. Genetic Resources Research 16: 141154. and Crop Evolution 51: 277290. YAN, W., J. N. RUTGER, R. J. BRYANT, H. E. BOCKELMAN, R. G. FJELLSTROM, VENUPRASAD, R., M. E. BOOl, L. QUIATCHOn, M. T. STA CRUZ, M. AMANTE, M. H. CHEN, T. H. TAI, AND A. M. MCCLUNG. 2007. Development and and G. N. ATLIN. 2011a. A large-effect QTL for rice grain yield under evaluation of a core subset of the USDA rice germplasm collection. upland drought stress on chromosome 1. Molecular Breeding . Crop Science 47: 869878. VENUPRASAD, R., S. IMPA, R. P. VEERESH GOWDA, G. N. ATLIN, AND R. SERRAJ. YANG, H., T. A. BELL, G. A. CHURCHILL, AND F. P. M. DE VILLENA. 2007. 2011b. Rice near-isogenic-lines (NILs) contrasting for grain yield On the subspecific origin of the laboratory mouse. 
Nature Genetics under lowland drought stress. Field Crops Research 123: 3846. 39: 11001107. WHIPPS, J. M., P. HAND, D. PINK, and G. D. BENDING. 2008. Phyllosphere mi- YU, J., G. PRESSPOIR, W. H. BRIGGS, I. V. BI, M. YAMASAKI, J. F. DOEBLEY, crobiology with special reference to diversity and plant genotype. Journal M. D. MCMULLEN, ET AL. 2005. A unified mixed-model method for Applied Microbiology 105: 17441755. association mapping that accounts for multiple levels of relatedness. WILLOCQUET, L., M. NOEL, R. SACKVILLE HAMILTON, AND S. SAVARY. 2011. Nature Genetics 38: 203208. Susceptibility of rice to sheath blight: An assessment of the diversity ZAMIR, D. 2001. Improving plant breeding with exotic genetic libraries. of rice germplasm according to genetic groups and morphological Nature Reviews. Genetics 2: 983989. traits. Euphytica Online First, 21 May 2011 doi. Website http://www. ZHAO, K., C. W. TUNG, G. EIZENGA, M. H. WRIGHT, M. L. ALI, A. H. PRICE, springerlink.com/content/0120283n18gn3717/fulltext.pdf [accessed G. J. NORTON, ET AL. 2011. Genome-wide association mapping re- 17 November 2011]. veals rich genetic architecture of complex traits in Oryza sativa. WIDRLECHNER, M. P. 1997. Managerial tools for seed regeneration. Plant Nature Communications 2: 467. Varieties and Seeds 10: 185193. ZHAO, K., M. H. WRIGHT, J. KIMBALL, G. EIZENGA, A. MCCLUNG, M. KOVACH, WIDRLECHNER, M. P., AND L. A. BURKE. 2003. Analysis of germplasm dis- W. TYAGI, ET AL. 2010. Genomic diversity and introgression in O. tribution patterns for collections held at the North Central Regional sativa reveal the impact of domestication and breeding on the rice Plant Introduction Station, Ames, Iowa, USA. Genetic Resources and genome. PLoS ONE 5: e10780. Crop Evolution 50: 329337. ZHU, C., M. GORE, E. S. BUCKLER, AND J. YU. 2008. Status and prospects XIAO, J., J. LI, S. GRANDILLO, S. AHN, L. YUAN, S. TANKSLEY, AND S. R. of association mapping in plants. Plant Genome 1: 520. MCCOUCH. 1998. 
Identification of trait-improving quantitative trait ZOU, X. H., F. M. ZHANG, J. G. ZHANG, L. L. ZANG, L. TANG, J. WANG, T. loci alleles from a wild rice relative, Oryza rufipogon. Genetics 150: SANG, AND S. GE. 2008. Analysis of 142 genes resolves the rapid di- 899909. versification of the rice genus. Genome Biology 9: R49.
