The objective of this tutorial is to simulate a MET dataset using FieldSimR which captures GxE interaction and spatial variation with realistic structure and complexity. Genetic values will be simulated using AlphaSimR based on an additive genetic trait representing grain yield.
Summary
The simulation can be summarised by the following steps:
field_trial_error()
and plot_effects()
functionsmake_phenotypes()
function
Clean the working environment.
rm(list = ls())
Load required packages.
library(FieldSimR)
library(AlphaSimR)
#> Loading required package: R6
library(asreml)
#> Loading required package: Matrix
#> Online License checked out Sat Feb 10 11:14:20 2024
#> Loading ASReml-R version 4.2
library(ggplot2)
library(mbend)
The exact results in this tutorial can be reproduced by initially setting the following seed.
set.seed(123)
Consider a scenario in which 200 inbred wheat genotypes are evaluated for grain yield (t/ha) in field trials across 20 growing environments. The genotypes are simulated with additive genetic relatedness using AlphaSimR. Each trial comprises 20 columns and 20 rows for 400 plots in total, with two complete blocks of all genotypes aligned in the column direction (side-by-side). The overall trait mean is 4 t/ha and the overall trait variance is 0.2 t2/ha2. The average plot-level heritability is 0.3, which reflects the average ratio of genetic variance to total phenotypic variance within environments. The error variance is given by 0.2/0.3-0.2 = 0.47 t2/ha2.
The simulation parameters are:
n_genos <- 200 # no. genotypes
n_envs <- 20 # no. environments
n_reps <- n_blocks <- 2 # no. replicates/blocks per environment
block_dir <- "col" # block direction
n_cols <- 20 # no. columns per environment
n_rows <- 20 # no. rows per environment
n_plots <- n_cols * n_rows # no. plots per environment == n_genos * n_reps
mu <- 4 # overall trait mean
sigma2 <- 0.2 # overall trait variance
H2 <- 0.3 # average plot-level heritability
sigma2e <- sigma2/H2 - sigma2 # error variance
round(sigma2e, digits = 2)
#> [1] 0.47
The phenotypes are generated based on the linear mixed model:
\(\mathbf{y} = \mathbf{Z}\mathbf{gv} + \mathbf{e}\),
where \(\mathbf{gv}\) is a vector of genetic values with design matrix \(\mathbf{Z}\) and \(\mathbf{e}\) is a vector of plot errors. The inclusion of additional non-genetic effects is straightforward.
The simulation involves three steps:
The genetic values are simulated to capture GxE interaction, which can be broadly categorised as either heterogeneity of genetic variance (changes in genotype scale) or lack of genetic correlation (changes in genotype rank).
Simulating the genetic values involves two parts:
FieldSimR provides wrapper functions for compound symmetry and unstructured models for GxE interaction.
Set the number of homogeneous genotypes in the founder population to 20, the number of chromosomes to 20 and the number of segregating sites (biallelic QTN) per chromosome to 300.
n_founders <- 20 # Number of genotypes in the founder population.
n_chr <- 20 # Number of chromosomes.
n_seg_sites <- 300 # Number of QTN per chromosome.
The overall trait mean and variance are:
mu # trait mean
#> [1] 4
sigma2 # trait variance
#> [1] 0.2
Simulate the founder population using the runMacs()
function.
# Simulating a founder population using AlphaSimR's "WHEAT" presets to mimic the species' evolutionary history
founders <- runMacs(nInd = n_founders,
nChr = n_chr,
segSites = n_seg_sites,
inbred = TRUE,
species = "WHEAT",
nThreads = 2)
founders # MapPop object
#> An object of class "MapPop"
#> Ploidy: 2
#> Individuals: 20
#> Chromosomes: 20
#> Loci: 6000
SP <- SimParam$new(founders)
Simulate an additive genetic trait representing grain yield with AlphaSimR, using FieldSimR’s wrapper functions.
The genetic values for the unstructured model are simulated as:
\(\mathbf{gv} \sim \mbox{N}(\mu\mathbf{1}, \mathbf{G_e} \otimes \mathbf{I})\),
where \(\mu\) is the overall trait mean and \(\mathbf{G_e}\) is the between-environment genetic covariance matrix, which is constructed as:
\(\mathbf{G_e} = \mathbf{D}^{1/2}_\mathbf{e}\mathbf{C_e}\mathbf{D}^{1/2}_\mathbf{e}\)
where \(\mathbf{D_e}\) is a diagonal genetic variance matrix and \(\mathbf{C_e}\) is a between-environment genetic correlation matrix.
Sample the genetic variances from an inverse gamma distribution, with shape parameter of 5 and scale parameter of 1. Shift the genetic variances to give an overall trait variance of 0.2.
de <- 1/rgamma(n_envs, shape = 5, rate = 1) # sample
de <- c(scale(de, center = TRUE, scale = FALSE) + sigma2) # shift
round(de, digits = 2)
#> [1] 0.21 0.08 0.11 0.19 0.20 0.46 0.19 0.29 0.05 0.17 0.13 0.13 0.16 0.15 0.27
#> [16] 0.26 0.16 0.30 0.27 0.21
mean(de) # overall trait variance
#> [1] 0.2
De <- diag(de)
Sample the genetic correlations from a continuous uniform
distribution ranging from 0.1 to 0.7, which produces a
mean genetic correlation of 0.4. This is achieved using
FieldSimR’s function rand_cor_mat()
. When
pos_def = TRUE
, the correlation matrix is bent to ensure it
is positive definite.
Ce <- rand_cor_mat(p = n_envs, min_cor = 0.1, max_cor = 0.7, pos_def = TRUE)
#> Unweighted bending
#> max.iter = 10000
#> small.positive = 1e-04
#> method = hj
#> Found a correlation matrix.
#> Convergence met after 7 iterations.
round(Ce, digits = 2)[1:10,1:10]
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#> [1,] 1.00 0.59 0.37 0.53 0.52 0.46 0.50 0.47 0.44 0.22
#> [2,] 0.59 1.00 0.28 0.33 0.61 0.37 0.58 0.47 0.21 0.44
#> [3,] 0.37 0.28 1.00 0.26 0.54 0.52 0.21 0.57 0.44 0.38
#> [4,] 0.53 0.33 0.26 1.00 0.34 0.42 0.27 0.58 0.51 0.36
#> [5,] 0.52 0.61 0.54 0.34 1.00 0.24 0.37 0.58 0.12 0.24
#> [6,] 0.46 0.37 0.52 0.42 0.24 1.00 0.47 0.48 0.58 0.52
#> [7,] 0.50 0.58 0.21 0.27 0.37 0.47 1.00 0.48 0.25 0.43
#> [8,] 0.47 0.47 0.57 0.58 0.58 0.48 0.48 1.00 0.50 0.46
#> [9,] 0.44 0.21 0.44 0.51 0.12 0.58 0.25 0.50 1.00 0.31
#> [10,] 0.22 0.44 0.38 0.36 0.24 0.52 0.43 0.46 0.31 1.00
Populate a list of input parameters for AlphaSimR using FieldSimR’s
wrapper function unstr_asr_input()
. The list contains
vectors for the mean and variance of the genetic values in each
environment, as well as the between-environment genetic correlation
matrix.
input_asr <- unstr_asr_input(n_traits = 1,
n_envs = n_envs,
mean = mu,
var = sigma2,
cor_A = Ce)
input_asr
#> $mean
#> [1] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
#>
#> $var
#> [1] 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2
#> [20] 0.2
#>
#> $cor_A
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#> [1,] 1.0000000 0.5865950 0.3747776 0.5312653 0.5209284 0.4579192 0.5037274
#> [2,] 0.5865950 1.0000000 0.2788552 0.3333472 0.6104026 0.3671630 0.5778049
#> [3,] 0.3747776 0.2788552 1.0000000 0.2573593 0.5368721 0.5163457 0.2145860
#> [4,] 0.5312653 0.3333472 0.2573593 1.0000000 0.3431965 0.4221246 0.2651794
#> [5,] 0.5209284 0.6104026 0.5368721 0.3431965 1.0000000 0.2419720 0.3721250
#> [6,] 0.4579192 0.3671630 0.5163457 0.4221246 0.2419720 1.0000000 0.4679405
#> [7,] 0.5037274 0.5778049 0.2145860 0.2651794 0.3721250 0.4679405 1.0000000
#> [8,] 0.4723535 0.4717764 0.5713872 0.5844716 0.5836117 0.4825200 0.4829291
#> [9,] 0.4374403 0.2137179 0.4440244 0.5050202 0.1248017 0.5790501 0.2481605
#> [10,] 0.2216919 0.4439445 0.3808778 0.3596939 0.2398324 0.5232349 0.4317380
#> [11,] 0.4272026 0.1951003 0.1510230 0.3004538 0.4694223 0.1358066 0.1464872
#> [12,] 0.5631774 0.6078177 0.2515479 0.5105382 0.4450509 0.1783966 0.6117654
#> [13,] 0.5653952 0.5668616 0.4661577 0.2702833 0.6856596 0.5862155 0.2998098
#> [14,] 0.2732173 0.4503076 0.6215478 0.2299637 0.5542515 0.3097469 0.2851664
#> [15,] 0.3556537 0.6190483 0.1626147 0.5634988 0.4944988 0.4315678 0.3724607
#> [16,] 0.5178988 0.4171787 0.3766877 0.4449417 0.4129186 0.4937052 0.3733435
#> [17,] 0.2958459 0.4980034 0.3673450 0.5650577 0.5446077 0.5202848 0.3226012
#> [18,] 0.5427328 0.5216488 0.3949568 0.4891362 0.3295607 0.4975671 0.2115585
#> [19,] 0.5474724 0.3028055 0.1833938 0.5393218 0.6099049 0.2506814 0.4865935
#> [20,] 0.5908551 0.3018725 0.5636187 0.4487172 0.4552799 0.3591105 0.3355753
#> [,8] [,9] [,10] [,11] [,12] [,13] [,14]
#> [1,] 0.4723535 0.4374403 0.2216919 0.4272026 0.5631774 0.5653952 0.2732173
#> [2,] 0.4717764 0.2137179 0.4439445 0.1951003 0.6078177 0.5668616 0.4503076
#> [3,] 0.5713872 0.4440244 0.3808778 0.1510230 0.2515479 0.4661577 0.6215478
#> [4,] 0.5844716 0.5050202 0.3596939 0.3004538 0.5105382 0.2702833 0.2299637
#> [5,] 0.5836117 0.1248017 0.2398324 0.4694223 0.4450509 0.6856596 0.5542515
#> [6,] 0.4825200 0.5790501 0.5232349 0.1358066 0.1783966 0.5862155 0.3097469
#> [7,] 0.4829291 0.2481605 0.4317380 0.1464872 0.6117654 0.2998098 0.2851664
#> [8,] 1.0000000 0.5013918 0.4592016 0.1413567 0.3512898 0.4442442 0.3263808
#> [9,] 0.5013918 1.0000000 0.3110177 0.1989468 0.3305566 0.3681091 0.2194992
#> [10,] 0.4592016 0.3110177 1.0000000 0.1407194 0.2458284 0.4223482 0.2254922
#> [11,] 0.1413567 0.1989468 0.1407194 1.0000000 0.2861474 0.3386404 0.4543756
#> [12,] 0.3512898 0.3305566 0.2458284 0.2861474 1.0000000 0.1708999 0.3021806
#> [13,] 0.4442442 0.3681091 0.4223482 0.3386404 0.1708999 1.0000000 0.4133189
#> [14,] 0.3263808 0.2194992 0.2254922 0.4543756 0.3021806 0.4133189 1.0000000
#> [15,] 0.5402087 0.1464343 0.3897020 0.1231351 0.1723089 0.3590156 0.4012039
#> [16,] 0.5799000 0.4942309 0.3425990 0.5116605 0.2865068 0.3742182 0.2594955
#> [17,] 0.4042298 0.3717029 0.5409229 0.3901456 0.5962697 0.4290102 0.2626102
#> [18,] 0.3560651 0.1800172 0.2850317 0.2702417 0.1846140 0.5022811 0.3309685
#> [19,] 0.4174559 0.3616561 0.3719966 0.5400123 0.4540245 0.6130500 0.2632928
#> [20,] 0.1767090 0.2744708 0.2570862 0.2847362 0.5081165 0.3981526 0.3222016
#> [,15] [,16] [,17] [,18] [,19] [,20]
#> [1,] 0.3556537 0.5178988 0.2958459 0.5427328 0.5474724 0.5908551
#> [2,] 0.6190483 0.4171787 0.4980034 0.5216488 0.3028055 0.3018725
#> [3,] 0.1626147 0.3766877 0.3673450 0.3949568 0.1833938 0.5636187
#> [4,] 0.5634988 0.4449417 0.5650577 0.4891362 0.5393218 0.4487172
#> [5,] 0.4944988 0.4129186 0.5446077 0.3295607 0.6099049 0.4552799
#> [6,] 0.4315678 0.4937052 0.5202848 0.4975671 0.2506814 0.3591105
#> [7,] 0.3724607 0.3733435 0.3226012 0.2115585 0.4865935 0.3355753
#> [8,] 0.5402087 0.5799000 0.4042298 0.3560651 0.4174559 0.1767090
#> [9,] 0.1464343 0.4942309 0.3717029 0.1800172 0.3616561 0.2744708
#> [10,] 0.3897020 0.3425990 0.5409229 0.2850317 0.3719966 0.2570862
#> [11,] 0.1231351 0.5116605 0.3901456 0.2702417 0.5400123 0.2847362
#> [12,] 0.1723089 0.2865068 0.5962697 0.1846140 0.4540245 0.5081165
#> [13,] 0.3590156 0.3742182 0.4290102 0.5022811 0.6130500 0.3981526
#> [14,] 0.4012039 0.2594955 0.2626102 0.3309685 0.2632928 0.3222016
#> [15,] 1.0000000 0.3750602 0.3810373 0.3941801 0.2692220 0.1112151
#> [16,] 0.3750602 1.0000000 0.5465220 0.4274354 0.3959432 0.4482510
#> [17,] 0.3810373 0.5465220 1.0000000 0.2913456 0.4536348 0.5029859
#> [18,] 0.3941801 0.4274354 0.2913456 1.0000000 0.1529263 0.4623088
#> [19,] 0.2692220 0.3959432 0.4536348 0.1529263 1.0000000 0.5143422
#> [20,] 0.1112151 0.4482510 0.5029859 0.4623088 0.5143422 1.0000000
Simulate an additive genetic trait representing grain yield. This produces 20 columns in the founder population. The columns contain the genetic values for each environment.
SP$addTraitA(nQtlPerChr = n_seg_sites,
mean = input_asr$mean,
var = input_asr$var,
corA = input_asr$cor_A)
founders <- newPop(founders) # create pop-class object
head(founders@gv)
#> Trait1 Trait2 Trait3 Trait4 Trait5 Trait6 Trait7 Trait8
#> [1,] 3.662319 3.327789 3.626410 3.060205 3.782182 3.061413 2.601274 3.257708
#> [2,] 3.609349 4.689218 4.006780 4.744841 4.529037 4.320794 4.561244 4.716394
#> [3,] 3.876705 3.977862 3.718663 3.664568 4.181587 4.057004 4.146229 3.933701
#> [4,] 4.428944 4.058932 4.378631 4.441817 4.709719 4.535052 4.097166 4.383028
#> [5,] 3.682564 2.979650 4.258279 4.455997 4.004899 4.284199 4.121226 4.195624
#> [6,] 3.878690 4.026717 4.875523 4.110571 4.786910 4.430079 3.864503 4.704515
#> Trait9 Trait10 Trait11 Trait12 Trait13 Trait14 Trait15 Trait16
#> [1,] 3.294190 2.849119 4.069572 3.042133 3.722130 3.587591 3.106844 3.178407
#> [2,] 4.080286 4.298731 3.471209 4.470825 3.997283 4.318946 5.194595 4.335121
#> [3,] 3.865454 3.675572 4.607009 3.760303 3.724924 3.944731 4.204465 4.287346
#> [4,] 3.878202 4.186585 4.480847 4.121653 4.617747 4.498126 4.374674 4.208419
#> [5,] 4.606708 4.019972 5.156479 4.018475 3.457132 4.327196 3.682245 4.716456
#> [6,] 4.464547 3.924118 3.846259 3.929079 4.108051 4.617361 4.451845 4.119595
#> Trait17 Trait18 Trait19 Trait20
#> [1,] 3.080543 3.807549 3.002744 3.287518
#> [2,] 4.758906 4.290912 4.076275 3.903677
#> [3,] 4.068368 3.796067 3.635399 3.488543
#> [4,] 4.361450 4.457638 4.585593 4.305124
#> [5,] 4.514736 3.199222 4.795258 4.263101
#> [6,] 4.284648 3.924236 3.581546 3.958096
Generate the wheat genotypes by randomly crossing the founders using
the randCross()
function, and then make doubled haploid
(DH) lines using the makeDH()
function.
f1s <- randCross(pop = founders,
nCrosses = 20, # no. crosses
nProgeny = 10) # no. progeny per cross
DHs <- makeDH(pop = f1s, nDH = 1)
DHs
#> An object of class "Pop"
#> Ploidy: 2
#> Individuals: 200
#> Chromosomes: 20
#> Loci: 6000
#> Traits: 20
Obtain the genetic values for the DH population using FieldSimR’s
wrapper function unstr_asr_output()
.
gv_df <- unstr_asr_output(pop = DHs,
n_traits = 1,
n_envs = n_envs,
n_reps = n_reps)
head(gv_df)
#> env rep id gv.Trait1
#> 1 1 1 221 3.202680
#> 2 1 1 222 3.829752
#> 3 1 1 223 4.237403
#> 4 1 1 224 3.416975
#> 5 1 1 225 3.409848
#> 6 1 1 226 3.595984
GV <- matrix(gv_df$gv.Trait1[gv_df$rep == 1], ncol = n_envs)
round(mean(GV), digits = 2) # overall trait mean in the DH population
#> [1] 4
round(cov2cor(var(GV)), digits = 2) # simulated between-environment genetic correlation matrix in the DH population
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
#> [1,] 1.00 0.40 0.24 0.46 0.39 0.41 0.39 0.37 0.48 0.05 0.34 0.53 0.56
#> [2,] 0.40 1.00 0.16 0.32 0.56 0.25 0.54 0.49 0.23 0.46 -0.20 0.66 0.63
#> [3,] 0.24 0.16 1.00 0.23 0.59 0.58 0.07 0.60 0.54 0.35 0.25 0.12 0.31
#> [4,] 0.46 0.32 0.23 1.00 0.29 0.47 0.39 0.60 0.52 0.34 0.16 0.56 0.25
#> [5,] 0.39 0.56 0.59 0.29 1.00 0.34 0.28 0.60 0.22 0.21 0.36 0.40 0.60
#> [6,] 0.41 0.25 0.58 0.47 0.34 1.00 0.45 0.63 0.64 0.52 0.30 0.19 0.56
#> [7,] 0.39 0.54 0.07 0.39 0.28 0.45 1.00 0.58 0.39 0.53 0.10 0.63 0.38
#> [8,] 0.37 0.49 0.60 0.60 0.60 0.63 0.58 1.00 0.65 0.56 0.19 0.38 0.47
#> [9,] 0.48 0.23 0.54 0.52 0.22 0.64 0.39 0.65 1.00 0.39 0.21 0.41 0.38
#> [10,] 0.05 0.46 0.35 0.34 0.21 0.52 0.53 0.56 0.39 1.00 -0.04 0.27 0.45
#> [11,] 0.34 -0.20 0.25 0.16 0.36 0.30 0.10 0.19 0.21 -0.04 1.00 0.02 0.19
#> [12,] 0.53 0.66 0.12 0.56 0.40 0.19 0.63 0.38 0.41 0.27 0.02 1.00 0.31
#> [13,] 0.56 0.63 0.31 0.25 0.60 0.56 0.38 0.47 0.38 0.45 0.19 0.31 1.00
#> [14,] 0.03 0.37 0.59 0.30 0.50 0.44 0.30 0.56 0.32 0.41 0.35 0.19 0.34
#> [15,] 0.14 0.68 0.27 0.56 0.56 0.40 0.40 0.66 0.20 0.47 -0.08 0.30 0.37
#> [16,] 0.46 0.13 0.58 0.39 0.47 0.67 0.32 0.63 0.61 0.28 0.55 0.17 0.35
#> [17,] 0.16 0.41 0.50 0.46 0.64 0.55 0.26 0.45 0.36 0.46 0.22 0.50 0.39
#> [18,] 0.57 0.48 0.29 0.43 0.40 0.36 0.15 0.27 0.11 0.09 0.13 0.28 0.49
#> [19,] 0.49 0.17 0.00 0.52 0.29 0.32 0.60 0.37 0.40 0.37 0.44 0.43 0.54
#> [20,] 0.46 -0.02 0.43 0.27 0.26 0.21 -0.02 -0.06 0.20 -0.02 0.13 0.30 0.16
#> [,14] [,15] [,16] [,17] [,18] [,19] [,20]
#> [1,] 0.03 0.14 0.46 0.16 0.57 0.49 0.46
#> [2,] 0.37 0.68 0.13 0.41 0.48 0.17 -0.02
#> [3,] 0.59 0.27 0.58 0.50 0.29 0.00 0.43
#> [4,] 0.30 0.56 0.39 0.46 0.43 0.52 0.27
#> [5,] 0.50 0.56 0.47 0.64 0.40 0.29 0.26
#> [6,] 0.44 0.40 0.67 0.55 0.36 0.32 0.21
#> [7,] 0.30 0.40 0.32 0.26 0.15 0.60 -0.02
#> [8,] 0.56 0.66 0.63 0.45 0.27 0.37 -0.06
#> [9,] 0.32 0.20 0.61 0.36 0.11 0.40 0.20
#> [10,] 0.41 0.47 0.28 0.46 0.09 0.37 -0.02
#> [11,] 0.35 -0.08 0.55 0.22 0.13 0.44 0.13
#> [12,] 0.19 0.30 0.17 0.50 0.28 0.43 0.30
#> [13,] 0.34 0.37 0.35 0.39 0.49 0.54 0.16
#> [14,] 1.00 0.56 0.30 0.30 0.22 0.14 -0.11
#> [15,] 0.56 1.00 0.26 0.44 0.35 0.12 -0.15
#> [16,] 0.30 0.26 1.00 0.54 0.30 0.35 0.37
#> [17,] 0.30 0.44 0.54 1.00 0.24 0.27 0.41
#> [18,] 0.22 0.35 0.30 0.24 1.00 0.10 0.41
#> [19,] 0.14 0.12 0.35 0.27 0.10 1.00 0.24
#> [20,] -0.11 -0.15 0.37 0.41 0.41 0.24 1.00
round(Ce, digits = 2)[1:20, 1:20] # expected between-environment genetic correlation matrix
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
#> [1,] 1.00 0.59 0.37 0.53 0.52 0.46 0.50 0.47 0.44 0.22 0.43 0.56 0.57
#> [2,] 0.59 1.00 0.28 0.33 0.61 0.37 0.58 0.47 0.21 0.44 0.20 0.61 0.57
#> [3,] 0.37 0.28 1.00 0.26 0.54 0.52 0.21 0.57 0.44 0.38 0.15 0.25 0.47
#> [4,] 0.53 0.33 0.26 1.00 0.34 0.42 0.27 0.58 0.51 0.36 0.30 0.51 0.27
#> [5,] 0.52 0.61 0.54 0.34 1.00 0.24 0.37 0.58 0.12 0.24 0.47 0.45 0.69
#> [6,] 0.46 0.37 0.52 0.42 0.24 1.00 0.47 0.48 0.58 0.52 0.14 0.18 0.59
#> [7,] 0.50 0.58 0.21 0.27 0.37 0.47 1.00 0.48 0.25 0.43 0.15 0.61 0.30
#> [8,] 0.47 0.47 0.57 0.58 0.58 0.48 0.48 1.00 0.50 0.46 0.14 0.35 0.44
#> [9,] 0.44 0.21 0.44 0.51 0.12 0.58 0.25 0.50 1.00 0.31 0.20 0.33 0.37
#> [10,] 0.22 0.44 0.38 0.36 0.24 0.52 0.43 0.46 0.31 1.00 0.14 0.25 0.42
#> [11,] 0.43 0.20 0.15 0.30 0.47 0.14 0.15 0.14 0.20 0.14 1.00 0.29 0.34
#> [12,] 0.56 0.61 0.25 0.51 0.45 0.18 0.61 0.35 0.33 0.25 0.29 1.00 0.17
#> [13,] 0.57 0.57 0.47 0.27 0.69 0.59 0.30 0.44 0.37 0.42 0.34 0.17 1.00
#> [14,] 0.27 0.45 0.62 0.23 0.55 0.31 0.29 0.33 0.22 0.23 0.45 0.30 0.41
#> [15,] 0.36 0.62 0.16 0.56 0.49 0.43 0.37 0.54 0.15 0.39 0.12 0.17 0.36
#> [16,] 0.52 0.42 0.38 0.44 0.41 0.49 0.37 0.58 0.49 0.34 0.51 0.29 0.37
#> [17,] 0.30 0.50 0.37 0.57 0.54 0.52 0.32 0.40 0.37 0.54 0.39 0.60 0.43
#> [18,] 0.54 0.52 0.39 0.49 0.33 0.50 0.21 0.36 0.18 0.29 0.27 0.18 0.50
#> [19,] 0.55 0.30 0.18 0.54 0.61 0.25 0.49 0.42 0.36 0.37 0.54 0.45 0.61
#> [20,] 0.59 0.30 0.56 0.45 0.46 0.36 0.34 0.18 0.27 0.26 0.28 0.51 0.40
#> [,14] [,15] [,16] [,17] [,18] [,19] [,20]
#> [1,] 0.27 0.36 0.52 0.30 0.54 0.55 0.59
#> [2,] 0.45 0.62 0.42 0.50 0.52 0.30 0.30
#> [3,] 0.62 0.16 0.38 0.37 0.39 0.18 0.56
#> [4,] 0.23 0.56 0.44 0.57 0.49 0.54 0.45
#> [5,] 0.55 0.49 0.41 0.54 0.33 0.61 0.46
#> [6,] 0.31 0.43 0.49 0.52 0.50 0.25 0.36
#> [7,] 0.29 0.37 0.37 0.32 0.21 0.49 0.34
#> [8,] 0.33 0.54 0.58 0.40 0.36 0.42 0.18
#> [9,] 0.22 0.15 0.49 0.37 0.18 0.36 0.27
#> [10,] 0.23 0.39 0.34 0.54 0.29 0.37 0.26
#> [11,] 0.45 0.12 0.51 0.39 0.27 0.54 0.28
#> [12,] 0.30 0.17 0.29 0.60 0.18 0.45 0.51
#> [13,] 0.41 0.36 0.37 0.43 0.50 0.61 0.40
#> [14,] 1.00 0.40 0.26 0.26 0.33 0.26 0.32
#> [15,] 0.40 1.00 0.38 0.38 0.39 0.27 0.11
#> [16,] 0.26 0.38 1.00 0.55 0.43 0.40 0.45
#> [17,] 0.26 0.38 0.55 1.00 0.29 0.45 0.50
#> [18,] 0.33 0.39 0.43 0.29 1.00 0.15 0.46
#> [19,] 0.26 0.27 0.40 0.45 0.15 1.00 0.51
#> [20,] 0.32 0.11 0.45 0.50 0.46 0.51 1.00
round(mean(diag(var(GV))), digits = 2) # overall trait variance in the DH population
#> [1] 0.22
Simulate the plot errors as the sum of spatial, random and extraneous error terms. Sample the proportions of spatial and extraneous error variance from a continuous uniform distribution ranging from 0.2 to 0.6 and from 0 to 0.2, respectively. This produces a mean proportion of spatial error variance of 0.5 and a mean proportion of extraneous error variance of 0.1. FieldSimR will then allocate the remaining error variance to the random errors by default.
The column and row autocorrelations are sampled from a continuous uniform distribution ranging from 0.3 to 0.7 and from 0.5 to 0.9, respectively. This produces a mean column autocorrelation of 0.5 and a mean row autocorrelation of 0.7.
The extraneous errors are simulated based on zig-zag ordering between neighbouring rows.
The initial simulation parameters are:
spatial_model <- "AR1:AR1"
round(sigma2e, digits = 2) # error variance
#> [1] 0.47
prop_spatial <- runif(n_envs, min = 0.2, max = 0.6) # proportions of spatial error variance
round(prop_spatial, digits = 2)
#> [1] 0.33 0.56 0.48 0.49 0.27 0.41 0.45 0.29 0.23 0.35 0.22 0.59 0.21 0.37 0.52
#> [16] 0.50 0.43 0.54 0.29 0.48
prop_ext <- runif(n_envs, min = 0, max = 0.2) # proportions of extraneous error variance
round(prop_ext, digits = 2)
#> [1] 0.11 0.19 0.06 0.09 0.08 0.10 0.03 0.10 0.16 0.14 0.15 0.18 0.00 0.05 0.19
#> [16] 0.17 0.20 0.10 0.08 0.13
round(1 - (prop_spatial + prop_ext), digits = 2) # proportions of random error variance
#> [1] 0.56 0.25 0.46 0.42 0.65 0.49 0.52 0.60 0.61 0.50 0.63 0.23 0.78 0.58 0.29
#> [16] 0.33 0.37 0.36 0.63 0.39
col_cor <- runif(n_envs, min = 0.3, max = 0.7) # column autocorrelations
row_cor <- runif(n_envs, min = 0.5, max = 0.9) # row autocorrelations
ext_ord <- "zig-zag" # ordering of extraneous variation
ext_dir <- "row"
Populate the field_trial_error()
function with the terms
above. A data frame is returned containing the environment, block,
column and row numbers as well as the simulated plot error. When
return_effects = TRUE
, a list is returned with an
additional data frame containing the spatial, random and extraneous
error terms.
error_df <- field_trial_error(n_traits = 1,
n_envs = n_envs,
n_blocks = n_blocks,
block_dir = block_dir,
n_cols = n_cols,
n_rows = n_rows,
var_R = sigma2e,
prop_spatial = prop_spatial,
spatial_model = spatial_model,
col_cor = col_cor,
row_cor = row_cor,
prop_ext = prop_ext,
ext_ord = ext_ord,
ext_dir = ext_dir,
return_effects = TRUE)
head(error_df$plot_df) # data frame with simulated plot errors
#> env block col row e.Trait1
#> 1 1 1 1 1 -0.38134595
#> 2 1 1 1 2 -0.51062922
#> 3 1 1 1 3 0.25215211
#> 4 1 1 1 4 0.04483486
#> 5 1 1 1 5 -0.58427151
#> 6 1 1 1 6 1.27038179
head(error_df$Trait1) # data frame with simulated spatial, random and extraneous errors <--
#> env block col row e.spat e.rand e.ext.col e.ext.row
#> 1 1 1 1 1 -0.1855081 -0.13772252 0 -0.05811535
#> 2 1 1 1 2 -0.5272642 -0.07370537 0 0.09034033
#> 3 1 1 1 3 -0.1019815 0.45984631 0 -0.10571273
#> 4 1 1 1 4 0.1973199 -0.21359287 0 0.06110786
#> 5 1 1 1 5 0.7554173 -0.97253173 0 -0.36715705
#> 6 1 1 1 6 0.4808541 0.48525221 0 0.30427547
Display the separate error terms using FieldSimR’s function
plot_effects()
.
plot_effects(error_df$Trait1[error_df$Trait1$env == 1,], effect = "e.spat", labels = TRUE)
The total plot errors for environment 1 are given by:
plot_effects(error_df$plot_df[error_df$Trait1$env == 1,], effect = "e.Trait1", labels = TRUE)
Display the theoretical and sample variograms using FieldSimR’s
functions theoretical_variogram()
and
sample_variogram()
.
theoretical_variogram(n_cols = n_cols, n_rows = n_rows, var_R = sigma2e, prop_spatial = 1, col_cor = 0.5, row_cor = 0.7)
sample_variogram(error_df$plot_df[error_df$Trait1$env == 1,], effect = "e.Trait1")
Generate the phenotypes using FieldSimR’s function
make_phenotypes()
. When randomise = TRUE
,
genotypes are allocated to plots using a randomised complete block
design.
met_df <- make_phenotypes(gv_df = gv_df, error_df = error_df, randomise = TRUE)
head(met_df)
#> env block col row id phe.Trait1
#> 1 1 1 1 1 315 3.679432
#> 2 1 1 1 2 300 3.401802
#> 3 1 1 1 3 318 4.269240
#> 4 1 1 1 4 326 3.882307
#> 5 1 1 1 5 402 3.785814
#> 6 1 1 1 6 286 4.977517
Display the phenotypes.
plot_effects(met_df[met_df$env == 1,], effect = "phe.Trait1", labels = TRUE)
Run a linear mixed model analysis with ASReml-R and compare the prediction accuracies between models.
# main effects only model, and AR1 + ID residual
asr1 <- asreml(phe.Trait1 ~ env, # fixed environment main effects
random = ~ id + diag(env):block + diag(env):units, # random genotype main effects and block effects plus ID error (units)
residual = ~ dsum(~ar1(col):ar1(row)|env), # separable AR1 process for each environment
data = met_df,
workspace = "1gb")
#> Warning in asreml(phe.Trait1 ~ env, random = ~id + diag(env):block +
#> diag(env):units, : Warning : Log-likelihood decreased to -2585.90; trying with
#> reduced updates 0.735258
#> Warning in asreml(phe.Trait1 ~ env, random = ~id + diag(env):block +
#> diag(env):units, : Log-likelihood not converged
#> Warning in asreml(phe.Trait1 ~ env, random = ~id + diag(env):block +
#> diag(env):units, : Some components changed by more than 1% on the last
#> iteration
asr1 <- update(asr1) # update the model
#> Warning in asreml(fixed = phe.Trait1 ~ env, random = ~id + diag(env):block + :
#> Log-likelihood not converged
#> Warning in asreml(fixed = phe.Trait1 ~ env, random = ~id + diag(env):block + :
#> Some components changed by more than 1% on the last iteration
BLUPs1_g <- asr1$coefficients$random[grep("id", rownames(asr1$coefficients$random)),] # predicted genotype main effects
# prediction accuracy = correlation between true and predicted effects
# prediction accuracy of genotype main effects
g_main <- with(gv_df[gv_df$rep == 1,], tapply(gv.Trait1, id, mean)) # true genotype averages across environments
cor(BLUPs1_g, g_main)
#> [1] 0.9522539
# prediction accuracy of genotype effects within environments
BLUPs1_ge <- rep(BLUPs1_g, n_envs)
# Note that this model does not fit separate genotype effects for each environment. It only fits genotype main effects. Thus, these predictions are obtained by repeating the predicted genotype main effects across all environments.
hist(diag(cor(matrix(BLUPs1_ge, ncol = n_envs), GV)), main = "Accuracy per one environment")
abline(v = mean(diag(cor(matrix(BLUPs1_ge, ncol = n_envs), GV))), col="blue", lwd=3, lty=1)
# compound symmetry model, and AR1 + ID residual
asr2 <- asreml(phe.Trait1 ~ env, # fixed environment main effects
random = ~ id + env:id + diag(env):block + diag(env):units, # random genotype main effects and GxE interaction effects, block effects plus ID error (units)
residual = ~ dsum(~ar1(col):ar1(row)|env), # separable AR1 process for each environment
data = met_df,
workspace = "1gb")
#> Warning in asreml(phe.Trait1 ~ env, random = ~id + env:id + diag(env):block + :
#> Warning : Log-likelihood decreased to -2321.83; trying with reduced updates
#> 0.667756
#> Warning in asreml(phe.Trait1 ~ env, random = ~id + env:id + diag(env):block + :
#> Log-likelihood not converged
#> Warning in asreml(phe.Trait1 ~ env, random = ~id + env:id + diag(env):block + :
#> Some components changed by more than 1% on the last iteration
asr2 <- update(asr2) # update the model
BLUPs2_g <- asr2$coefficients$random[grep("^id", rownames(asr2$coefficients$random)),] # predicted genotype main effects
BLUPs2_gxe <- asr2$coefficients$random[grep("env.*id", rownames(asr2$coefficients$random)),] # predicted GxE interaction effects
# prediction accuracy = correlation between true and predicted effects
# prediction accuracy of genotype main effects
cor(BLUPs2_g, g_main)
#> [1] 0.9537427
# prediction accuracy of genotype effects within environments
BLUPs2_ge <- rep(BLUPs2_g) + BLUPs2_gxe # construct genotype effects within environments
hist(diag(cor(matrix(BLUPs2_ge, ncol = n_envs), GV)), main = "Accuracy per one environment")
abline(v = mean(diag(cor(matrix(BLUPs2_ge, ncol = n_envs), GV))), col="blue", lwd=3, lty=1)
# factor analytic model of order 3, and AR1 + ID residual
asr3 <- asreml(phe.Trait1 ~ env, # fixed environment main effects
random = ~ rr(env, 3):id + diag(env):id + diag(env):block + diag(env):units, # random genotype effects within environments, block effects plus ID error (units)
# the factor analytic model is fitted as a reduced rank matrix (rr(env), these are the lambdas)
# and a diagonal matrix (diag(env) these are the psi's)
residual = ~ dsum(~ar1(col):ar1(row)|env), # separable AR1 process for each environment
data = met_df, workspace = "2gb")
#> Warning in asreml(phe.Trait1 ~ env, random = ~rr(env, 3):id + diag(env):id + :
#> Warning : Log-likelihood decreased to -2236.65; trying with reduced updates
#> 0.487305
#> Warning in asreml(phe.Trait1 ~ env, random = ~rr(env, 3):id + diag(env):id + :
#> Log-likelihood not converged
#> Warning in asreml(phe.Trait1 ~ env, random = ~rr(env, 3):id + diag(env):id + :
#> Some components changed by more than 1% on the last iteration
asr3 <- update(asr3) # update the model
#> Warning in asreml(fixed = phe.Trait1 ~ env, random = ~rr(env, 3):id +
#> diag(env):id + : Log-likelihood not converged
#> Warning in asreml(fixed = phe.Trait1 ~ env, random = ~rr(env, 3):id +
#> diag(env):id + : Some components changed by more than 1% on the last iteration
BLUPs3_ge <- asr3$coefficients$random[grep("rr.*env.*id", rownames(asr3$coefficients$random)),][1:(n_genos*n_envs)] +
asr3$coefficients$random[grep("^env.*id", rownames(asr3$coefficients$random)),] # predicted genotype effects within environments
# prediction accuracy = correlation between true and predicted effects
# prediction accuracy of genotype main effects
BLUPs3_g <- rowMeans(matrix(BLUPs3_ge, ncol = n_envs))
# Note that this model does not fit genotype main effects, so these are obtained as simple average across environments
cor(BLUPs3_g, g_main)
#> [1] 0.9532985
# prediction accuracy of genotype effects within environments
hist(diag(cor(matrix(BLUPs3_ge, ncol = n_envs), GV)), main = "Accuracy per one environment")
abline(v = mean(diag(cor(matrix(BLUPs3_ge, ncol = n_envs), GV))), col="blue", lwd=3, lty=1)
What model produced the highest prediction accuracies for the genotype main effects? What about for the genotype effects within environments? Why does model 1 do poorly for the latter?
Compare the top 20 genotypes selected from each model in terms of the genotype main effects. This may represent selection for broad adaptation across all environments.
true_gv <- names(g_main)[rev(order(g_main))][1:20] # sort based on true genotype effects and take top 20 genos
top1_g <- names(g_main)[rev(order(BLUPs1_g))][1:20] # sort based on BLUPs and take top 20 genos
top2_g <- names(g_main)[rev(order(BLUPs2_g))][1:20]
top3_g <- names(g_main)[rev(order(BLUPs3_g))][1:20]
rank_list <- cbind(true_gv, top1_g, top2_g, top3_g) # top 20 genotypes selected from each model
colnames(rank_list) <- c("True rank", "Model 1", "Model 2", "Model 3")
rownames(rank_list) <- paste("Ind", 1:n_envs)
rank_list
#> True rank Model 1 Model 2 Model 3
#> Ind 1 "291" "291" "291" "291"
#> Ind 2 "231" "274" "274" "274"
#> Ind 3 "249" "249" "295" "295"
#> Ind 4 "325" "295" "231" "231"
#> Ind 5 "274" "231" "249" "325"
#> Ind 6 "380" "325" "325" "249"
#> Ind 7 "240" "240" "245" "240"
#> Ind 8 "295" "245" "240" "245"
#> Ind 9 "236" "385" "408" "408"
#> Ind 10 "245" "380" "380" "248"
#> Ind 11 "296" "248" "299" "299"
#> Ind 12 "237" "299" "248" "385"
#> Ind 13 "385" "408" "385" "380"
#> Ind 14 "373" "271" "372" "271"
#> Ind 15 "335" "279" "271" "343"
#> Ind 16 "271" "372" "343" "279"
#> Ind 17 "331" "332" "279" "372"
#> Ind 18 "332" "343" "367" "332"
#> Ind 19 "279" "236" "332" "367"
#> Ind 20 "297" "367" "287" "287"
tt <- data.frame(model = factor(rep(1:4, each = 20)),
id = c(true_gv, top1_g, top2_g, top3_g))
tt <- with(tt, table(id, model))
t(tt) %*% tt # top 20 genotypes in common between pairs of models
#> model
#> model 1 2 3 4
#> 1 20 14 13 13
#> 2 14 20 19 19
#> 3 13 19 20 20
#> 4 13 19 20 20
Reduce(intersect, list(top1_g, top2_g, top3_g)) # top 20 genotypes in common between all three models
#> [1] "291" "274" "249" "295" "231" "325" "240" "245" "385" "380" "248" "299"
#> [13] "408" "271" "279" "372" "332" "343" "367"
How similar are the top 20 genotypes from the three models in terms of the genotype main effects? Why would this be the case?
Compare the mean genetic value (genetic gain) and genetic variance of the top 20 genotypes selected from each model in terms of the genotype main effects.
round(mean(g_main[names(g_main) %in% true_gv]), digits = 2) # obtain genetic values for top 20 genos
#> [1] 4.49
round(mean(g_main[names(g_main) %in% top1_g]), digits = 2)
#> [1] 4.45
round(mean(g_main[names(g_main) %in% top2_g]), digits = 2)
#> [1] 4.44
round(mean(g_main[names(g_main) %in% top3_g]), digits = 2)
#> [1] 4.44
round(var(g_main[names(g_main) %in% true_gv]), digits = 2) # obtain genetic values for top 20 genos
#> [1] 0.02
round(var(g_main[names(g_main) %in% top1_g]), digits = 2)
#> [1] 0.03
round(var(g_main[names(g_main) %in% top2_g]), digits = 2)
#> [1] 0.03
round(var(g_main[names(g_main) %in% top3_g]), digits = 2)
#> [1] 0.03
Which model has produced the highest genetic gain and lowest genetic variance in terms of the genotype main effects? Is this what you would expect? If not, why might this be the case?
Next, compare the top genotype in each environment selected from each model. This may represent selection for specific adaptation within particular environments.
rownames(GV) <- levels(gv_df$id)
top_true <- top1_ge <- top2_ge <- top3_ge <- c()
for(i in 1:n_envs){
top_true[i] <- which(matrix(gv_df[gv_df$rep==1,4], ncol = n_envs)[,i] == max(matrix(gv_df[gv_df$rep==1,4], ncol = n_envs)[,i]))
top1_ge[i] <- which(matrix(BLUPs1_ge, ncol = n_envs)[,i] == max(matrix(BLUPs1_ge, ncol = n_envs)[,i]), arr.ind = TRUE)
top2_ge[i] <- which(matrix(BLUPs2_ge, ncol = n_envs)[,i] == max(matrix(BLUPs2_ge, ncol = n_envs)[,i]), arr.ind = TRUE)
top3_ge[i] <- which(matrix(BLUPs3_ge, ncol = n_envs)[,i] == max(matrix(BLUPs3_ge, ncol = n_envs)[,i]), arr.ind = TRUE)
}
top_list <- cbind(top_true, top1_ge, top2_ge, top3_ge)
colnames(top_list) <- c("True top", "Model 1", "Model 2", "Model 3")
rownames(top_list) <- paste("Env", 1:n_envs)
top_list
#> True top Model 1 Model 2 Model 3
#> Env 1 188 71 188 71
#> Env 2 21 71 184 28
#> Env 3 112 71 71 71
#> Env 4 156 71 71 71
#> Env 5 73 71 17 75
#> Env 6 71 71 11 71
#> Env 7 143 71 123 79
#> Env 8 71 71 71 71
#> Env 9 112 71 20 20
#> Env 10 16 71 152 54
#> Env 11 62 71 54 95
#> Env 12 28 71 25 25
#> Env 13 16 71 71 71
#> Env 14 71 71 17 17
#> Env 15 20 71 16 16
#> Env 16 165 71 79 165
#> Env 17 29 71 29 71
#> Env 18 167 71 105 105
#> Env 19 95 71 147 147
#> Env 20 195 71 104 75
table(top_true == top1_ge)[2] # match model 1 top genotype with true top genotype in each envrionment
#> TRUE
#> 3
table(top_true == top2_ge)[2] # match model 2 top genotype with true top genotype in each envrionment
#> TRUE
#> 3
table(top_true == top3_ge)[2] # match model 3 top genotype with true top genotype in each envrionment
#> TRUE
#> 3
How similar are the top genotypes in each environment from the three models? Why would this be the case?
Extra: Also simulate two different levels of GxE
interaction using the rand_cor_mat()
function with
between-environment genetic correlations ranging from 0.5 to 1 (low GxE)
and -1 to 1 (high GxE). Label these met_df2 and met_df3.
Now go and re-run models 1,2 and 3 for the other datasets (met_df2 and met_df3). What is the effect of changing the complexity of GxE? In particular, what happens to the prediction accuracies, selection and genetic gain.