  The Triticeae Toolbox

Selection of an Optimized Training set for use in Genomic Prediction

This analysis is intended for prediction problems where the per-individual cost of observing or analyzing the response variable is high, so that a small number of training examples is sought, or where the candidate set from which the training set must be chosen is not representative of the test data set. The optimized training sets are calculated via a genetic algorithm combined with a reliability measure of genomic estimated breeding values (GEBV) for any given test set. The functions to perform these analyses are available in the STPGA 3.0 R package. The default values are npop = 100 and niterations = 1000. Calculating the training set typically takes anywhere from 5 minutes to 4 hours, depending on the size of the dataset and the parameters selected. You will receive an email notification when your results are available.
Reference: Deniz Akdemir, Julio Sanchez and Jean-Luc Jannink. Optimization of genomic selection training populations with a genetic algorithm. Genetics Selection Evolution 2015, 47:38. DOI: 10.1186/s12711-015-0116-6.
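The search described above can be illustrated with a minimal sketch. It is written in Python with NumPy and is not the STPGA implementation. It assumes a simplified ridge-regression approximation of the mean prediction error variance of the test set as the reliability criterion (lower is better). The function names, the `nelite` and `mut_prob` parameters, and the synthetic data are hypothetical. Only the names `npop` and `niterations` are taken from this page.

```python
import numpy as np

def mean_pev(train_idx, X_cand, X_test, lam=1e-2):
    """Approximate mean prediction error variance of the test set for a ridge
    model fitted on the chosen candidate rows (assumed criterion; lower is better)."""
    Xt = X_cand[train_idx]
    A = Xt.T @ Xt + lam * np.eye(Xt.shape[1])
    sol = np.linalg.solve(A, X_test.T)            # A^{-1} X_test^T
    # Diagonal of X_test A^{-1} X_test^T, averaged over test lines
    return float(np.mean(np.sum(X_test.T * sol, axis=0)))

def select_training_set(X_cand, X_test, ntrain, npop=100, niterations=1000,
                        nelite=10, mut_prob=0.5, seed=0):
    """Toy genetic algorithm: each solution is a set of ntrain candidate indices."""
    rng = np.random.default_rng(seed)
    n = X_cand.shape[0]
    score = lambda s: mean_pev(s, X_cand, X_test)
    pop = [rng.choice(n, size=ntrain, replace=False) for _ in range(npop)]
    for _ in range(niterations):
        pop.sort(key=score)
        elite = pop[:nelite]                      # best solutions survive unchanged
        children = []
        while len(children) < npop - nelite:
            i, j = rng.choice(nelite, size=2, replace=False)
            pool = np.union1d(elite[i], elite[j])  # crossover: draw from both parents
            child = rng.choice(pool, size=ntrain, replace=False)
            if rng.random() < mut_prob:           # mutation: swap in an unused line
                outside = np.setdiff1d(np.arange(n), child)
                if outside.size > 0:
                    child[rng.integers(ntrain)] = rng.choice(outside)
            children.append(child)
        pop = elite + children
    best = min(pop, key=score)
    return np.sort(best), score(best)

# Synthetic stand-ins for marker-derived features (hypothetical data)
data_rng = np.random.default_rng(1)
X_cand = data_rng.normal(size=(60, 5))
X_test = data_rng.normal(size=(20, 5))
train_idx, crit = select_training_set(X_cand, X_test, ntrain=15,
                                      npop=20, niterations=30, nelite=5)
print(train_idx, crit)
```

Because the elite solutions are carried over unchanged, the best criterion value never gets worse from one iteration to the next. This is one reason larger values of npop and niterations trade longer run times for potentially better training sets.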
Missing genotype data can cause inaccurate results. The default filter setting removes markers that are missing more than 10% of their data. If a Test set is used, it should share markers with the Candidate set.
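As a rough illustration of these two checks, the sketch below filters markers by their missing-data fraction and finds the markers shared by two sets. It assumes a lines-by-markers matrix with missing calls coded as NaN. The function names and the example data are hypothetical and do not reflect how this site stores or filters genotypes.

```python
import numpy as np

def filter_markers(geno, max_missing=0.10):
    """Return column indices of markers whose missing fraction is at most max_missing."""
    missing_frac = np.isnan(geno).mean(axis=0)   # fraction of lines missing per marker
    return np.flatnonzero(missing_frac <= max_missing)

def common_markers(cand_names, test_names):
    """Return marker names present in both sets, in Candidate-set order."""
    test_set = set(test_names)
    return [m for m in cand_names if m in test_set]

# Ten lines, three markers: 0%, 10% and 20% missing (hypothetical data)
geno = np.zeros((10, 3))
geno[0, 1] = np.nan
geno[0, 2] = geno[1, 2] = np.nan

kept = filter_markers(geno)                      # marker with 20% missing is dropped
shared = common_markers(["m1", "m2", "m3"], ["m3", "m1", "m9"])
print(kept, shared)
```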
Consensus Genotype data - Select lines, trait, and trials for a Candidate set. To select a Test set, first select "Save Candidates", then use the lines, trait, and trials page to pick any set of lines that shares markers with the Candidate set.
Single Experiment Genotype data - Select lines by genotype experiment for a Candidate set. To select a Test set, first select "Save Candidates", highlight the lines not desired in the Test set, select "Deselect highlighted lines", then select "Analyze".

Log in to receive an email notification when the analysis is finished.

In 2009 the Toronto International Data Release Workshop agreed on a policy statement about prepublication data sharing. Accordingly, the data producers are making many of the datasets in T3 available prior to publication of a global analysis. Guidelines for appropriate sharing of these data are given in the excerpt from the Toronto Statement.

I agree to the Data Usage Policy as specified in the Toronto Statement.