The objective of this tutorial is to simulate a MET dataset using FieldSimR which captures GxE interaction and spatial variation with realistic structure and complexity. Genetic values will be simulated using AlphaSimR based on an additive genetic trait representing grain yield.


Summary

The simulation can be summarised by the following steps:

  1. Simulating genetic values with AlphaSimR
    1. Simulation parameters
    2. FieldSimR’s wrapper functions for simulating GxE interaction
      • Measures of variance explained

  2. Simulating plot errors with FieldSimR
    1. Simulation parameters
    2. field_trial_error() and plot_effects() functions

  1. Generating phenotypes by combining the plot errors with the genetic values
    1. make_phenotypes() function

  1. Model fitting in ASReml-R


Preliminaries

Clean the working environment.

rm(list = ls())

Load required packages.

library(FieldSimR)
library(AlphaSimR)
#> Loading required package: R6
library(asreml)
#> Loading required package: Matrix
#> Online License checked out Sat Feb 10 11:14:20 2024
#> Loading ASReml-R version 4.2
library(ggplot2)
library(mbend)

The exact results in this tutorial can be reproduced by initially setting the following seed.

set.seed(123)


Simulation details

Consider a scenario in which 200 inbred wheat genotypes are evaluated for grain yield (t/ha) in field trials across 20 growing environments. The genotypes are simulated with additive genetic relatedness using AlphaSimR. Each trial comprises 20 columns and 20 rows for 400 plots in total, with two complete blocks of all genotypes aligned in the column direction (side-by-side). The overall trait mean is 4 t/ha and the overall trait variance is 0.2 t2/ha2. The average plot-level heritability is 0.3, which reflects the average ratio of genetic variance to total phenotypic variance within environments. The error variance is given by 0.2/0.3-0.2 = 0.47 t2/ha2.


The simulation parameters are:

n_genos <- 200 # no. genotypes
n_envs <- 20   # no. environments
n_reps <- n_blocks <- 2 # no. replicates/blocks per environment
block_dir <- "col" # block direction
n_cols <- 20   # no. columns per environment
n_rows <- 20   # no. rows per environment
n_plots <- n_cols * n_rows # no. plots per environment == n_genos * n_reps
mu <- 4        # overall trait mean
sigma2 <- 0.2  # overall trait variance 
H2 <- 0.3      # average plot-level heritability
sigma2e <- sigma2/H2 - sigma2 # error variance
round(sigma2e, digits = 2)
#> [1] 0.47


The phenotypes are generated based on the linear mixed model:

\(\mathbf{y} = \mathbf{Z}\mathbf{gv} + \mathbf{e}\),

where \(\mathbf{gv}\) is a vector of genetic values with design matrix \(\mathbf{Z}\) and \(\mathbf{e}\) is a vector of plot errors. The inclusion of additional non-genetic effects is straightforward.

The simulation involves three steps:

  1. Simulating the genetic values with AlphaSimR, using FieldSimR’s wrapper functions
  2. Simulating the plot errors with FieldSimR
  3. Generating the phenotypes by combining the genetic values with the non-genetic effects and plot errors.


1. Simulating the genetic values with AlphaSimR

The genetic values are simulated to capture GxE interaction, which can be broadly categorised as either heterogeneity of genetic variance (changes in genotype scale) or lack of genetic correlation (changes in genotype rank).

Simulating the genetic values involves two parts:

  1. Setting the simulation parameters for the founder population and trait architecture
  2. Simulating an additive genetic trait using FieldSimR’s wrapper functions for GxE interaction.

FieldSimR provides wrapper functions for compound symmetry and unstructured models for GxE interaction.


a. Setting the simulation parameters

Set the number of homogeneous genotypes in the founder population to 20, the number of chromosomes to 20 and the number of segregating sites (biallelic QTN) per chromosome to 300.

n_founders <- 20 # Number of genotypes in the founder population.
n_chr <- 20 # Number of chromosomes.
n_seg_sites <- 300 # Number of QTN per chromosome.


The overall trait mean and variance are:

mu # trait mean
#> [1] 4
sigma2 # trait variance
#> [1] 0.2


b. Simulating an additive genetic trait

Simulate the founder population using the runMacs() function.

# Simulating a founder population using AlphaSimR's "WHEAT" presets to mimic the species' evolutionary history
founders <- runMacs(nInd = n_founders,
                    nChr = n_chr,
                    segSites = n_seg_sites,
                    inbred = TRUE,
                    species = "WHEAT",
                    nThreads = 2)
founders # MapPop object
#> An object of class "MapPop" 
#> Ploidy: 2 
#> Individuals: 20 
#> Chromosomes: 20 
#> Loci: 6000
SP <- SimParam$new(founders)


Simulate an additive genetic trait representing grain yield with AlphaSimR, using FieldSimR’s wrapper functions.


Unstructured model

The genetic values for the unstructured model are simulated as:

\(\mathbf{gv} \sim \mbox{N}(\mu\mathbf{1}, \mathbf{G_e} \otimes \mathbf{I})\),

where \(\mu\) is the overall trait mean and \(\mathbf{G_e}\) is the between-environment genetic covariance matrix, which is constructed as:

\(\mathbf{G_e} = \mathbf{D}^{1/2}_\mathbf{e}\mathbf{C_e}\mathbf{D}^{1/2}_\mathbf{e}\)

where \(\mathbf{D_e}\) is a diagonal genetic variance matrix and \(\mathbf{C_e}\) is a between-environment genetic correlation matrix.

Sample the genetic variances from an inverse gamma distribution, with shape parameter of 5 and scale parameter of 1. Shift the genetic variances to give an overall trait variance of 0.2.

de <- 1/rgamma(n_envs, shape = 5, rate = 1) # sample
de <- c(scale(de, center = TRUE, scale = FALSE) + sigma2) # shift
round(de, digits = 2)
#>  [1] 0.21 0.08 0.11 0.19 0.20 0.46 0.19 0.29 0.05 0.17 0.13 0.13 0.16 0.15 0.27
#> [16] 0.26 0.16 0.30 0.27 0.21
mean(de) # overall trait variance
#> [1] 0.2
De <- diag(de)


Sample the genetic correlations from a continuous uniform distribution ranging from 0.1 to 0.7, which produces a mean genetic correlation of 0.4. This is achieved using FieldSimR’s function rand_cor_mat(). When pos_def = TRUE, the correlation matrix is bent to ensure it is positive definite.

Ce <- rand_cor_mat(p = n_envs, min_cor = 0.1, max_cor = 0.7, pos_def = TRUE)
#> Unweighted bending
#> max.iter = 10000
#> small.positive = 1e-04
#> method = hj
#> Found a correlation matrix.
#> Convergence met after 7 iterations.
round(Ce, digits = 2)[1:10,1:10]
#>       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#>  [1,] 1.00 0.59 0.37 0.53 0.52 0.46 0.50 0.47 0.44  0.22
#>  [2,] 0.59 1.00 0.28 0.33 0.61 0.37 0.58 0.47 0.21  0.44
#>  [3,] 0.37 0.28 1.00 0.26 0.54 0.52 0.21 0.57 0.44  0.38
#>  [4,] 0.53 0.33 0.26 1.00 0.34 0.42 0.27 0.58 0.51  0.36
#>  [5,] 0.52 0.61 0.54 0.34 1.00 0.24 0.37 0.58 0.12  0.24
#>  [6,] 0.46 0.37 0.52 0.42 0.24 1.00 0.47 0.48 0.58  0.52
#>  [7,] 0.50 0.58 0.21 0.27 0.37 0.47 1.00 0.48 0.25  0.43
#>  [8,] 0.47 0.47 0.57 0.58 0.58 0.48 0.48 1.00 0.50  0.46
#>  [9,] 0.44 0.21 0.44 0.51 0.12 0.58 0.25 0.50 1.00  0.31
#> [10,] 0.22 0.44 0.38 0.36 0.24 0.52 0.43 0.46 0.31  1.00

Populate a list of input parameters for AlphaSimR using FieldSimR’s wrapper function unstr_asr_input(). The list contains vectors for the mean and variance of the genetic values in each environment, as well as the between-environment genetic correlation matrix.

input_asr <- unstr_asr_input(n_traits = 1,
                             n_envs = n_envs,
                             mean = mu,
                             var = sigma2,
                             cor_A = Ce)
input_asr
#> $mean
#>  [1] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
#> 
#> $var
#>  [1] 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2
#> [20] 0.2
#> 
#> $cor_A
#>            [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]
#>  [1,] 1.0000000 0.5865950 0.3747776 0.5312653 0.5209284 0.4579192 0.5037274
#>  [2,] 0.5865950 1.0000000 0.2788552 0.3333472 0.6104026 0.3671630 0.5778049
#>  [3,] 0.3747776 0.2788552 1.0000000 0.2573593 0.5368721 0.5163457 0.2145860
#>  [4,] 0.5312653 0.3333472 0.2573593 1.0000000 0.3431965 0.4221246 0.2651794
#>  [5,] 0.5209284 0.6104026 0.5368721 0.3431965 1.0000000 0.2419720 0.3721250
#>  [6,] 0.4579192 0.3671630 0.5163457 0.4221246 0.2419720 1.0000000 0.4679405
#>  [7,] 0.5037274 0.5778049 0.2145860 0.2651794 0.3721250 0.4679405 1.0000000
#>  [8,] 0.4723535 0.4717764 0.5713872 0.5844716 0.5836117 0.4825200 0.4829291
#>  [9,] 0.4374403 0.2137179 0.4440244 0.5050202 0.1248017 0.5790501 0.2481605
#> [10,] 0.2216919 0.4439445 0.3808778 0.3596939 0.2398324 0.5232349 0.4317380
#> [11,] 0.4272026 0.1951003 0.1510230 0.3004538 0.4694223 0.1358066 0.1464872
#> [12,] 0.5631774 0.6078177 0.2515479 0.5105382 0.4450509 0.1783966 0.6117654
#> [13,] 0.5653952 0.5668616 0.4661577 0.2702833 0.6856596 0.5862155 0.2998098
#> [14,] 0.2732173 0.4503076 0.6215478 0.2299637 0.5542515 0.3097469 0.2851664
#> [15,] 0.3556537 0.6190483 0.1626147 0.5634988 0.4944988 0.4315678 0.3724607
#> [16,] 0.5178988 0.4171787 0.3766877 0.4449417 0.4129186 0.4937052 0.3733435
#> [17,] 0.2958459 0.4980034 0.3673450 0.5650577 0.5446077 0.5202848 0.3226012
#> [18,] 0.5427328 0.5216488 0.3949568 0.4891362 0.3295607 0.4975671 0.2115585
#> [19,] 0.5474724 0.3028055 0.1833938 0.5393218 0.6099049 0.2506814 0.4865935
#> [20,] 0.5908551 0.3018725 0.5636187 0.4487172 0.4552799 0.3591105 0.3355753
#>            [,8]      [,9]     [,10]     [,11]     [,12]     [,13]     [,14]
#>  [1,] 0.4723535 0.4374403 0.2216919 0.4272026 0.5631774 0.5653952 0.2732173
#>  [2,] 0.4717764 0.2137179 0.4439445 0.1951003 0.6078177 0.5668616 0.4503076
#>  [3,] 0.5713872 0.4440244 0.3808778 0.1510230 0.2515479 0.4661577 0.6215478
#>  [4,] 0.5844716 0.5050202 0.3596939 0.3004538 0.5105382 0.2702833 0.2299637
#>  [5,] 0.5836117 0.1248017 0.2398324 0.4694223 0.4450509 0.6856596 0.5542515
#>  [6,] 0.4825200 0.5790501 0.5232349 0.1358066 0.1783966 0.5862155 0.3097469
#>  [7,] 0.4829291 0.2481605 0.4317380 0.1464872 0.6117654 0.2998098 0.2851664
#>  [8,] 1.0000000 0.5013918 0.4592016 0.1413567 0.3512898 0.4442442 0.3263808
#>  [9,] 0.5013918 1.0000000 0.3110177 0.1989468 0.3305566 0.3681091 0.2194992
#> [10,] 0.4592016 0.3110177 1.0000000 0.1407194 0.2458284 0.4223482 0.2254922
#> [11,] 0.1413567 0.1989468 0.1407194 1.0000000 0.2861474 0.3386404 0.4543756
#> [12,] 0.3512898 0.3305566 0.2458284 0.2861474 1.0000000 0.1708999 0.3021806
#> [13,] 0.4442442 0.3681091 0.4223482 0.3386404 0.1708999 1.0000000 0.4133189
#> [14,] 0.3263808 0.2194992 0.2254922 0.4543756 0.3021806 0.4133189 1.0000000
#> [15,] 0.5402087 0.1464343 0.3897020 0.1231351 0.1723089 0.3590156 0.4012039
#> [16,] 0.5799000 0.4942309 0.3425990 0.5116605 0.2865068 0.3742182 0.2594955
#> [17,] 0.4042298 0.3717029 0.5409229 0.3901456 0.5962697 0.4290102 0.2626102
#> [18,] 0.3560651 0.1800172 0.2850317 0.2702417 0.1846140 0.5022811 0.3309685
#> [19,] 0.4174559 0.3616561 0.3719966 0.5400123 0.4540245 0.6130500 0.2632928
#> [20,] 0.1767090 0.2744708 0.2570862 0.2847362 0.5081165 0.3981526 0.3222016
#>           [,15]     [,16]     [,17]     [,18]     [,19]     [,20]
#>  [1,] 0.3556537 0.5178988 0.2958459 0.5427328 0.5474724 0.5908551
#>  [2,] 0.6190483 0.4171787 0.4980034 0.5216488 0.3028055 0.3018725
#>  [3,] 0.1626147 0.3766877 0.3673450 0.3949568 0.1833938 0.5636187
#>  [4,] 0.5634988 0.4449417 0.5650577 0.4891362 0.5393218 0.4487172
#>  [5,] 0.4944988 0.4129186 0.5446077 0.3295607 0.6099049 0.4552799
#>  [6,] 0.4315678 0.4937052 0.5202848 0.4975671 0.2506814 0.3591105
#>  [7,] 0.3724607 0.3733435 0.3226012 0.2115585 0.4865935 0.3355753
#>  [8,] 0.5402087 0.5799000 0.4042298 0.3560651 0.4174559 0.1767090
#>  [9,] 0.1464343 0.4942309 0.3717029 0.1800172 0.3616561 0.2744708
#> [10,] 0.3897020 0.3425990 0.5409229 0.2850317 0.3719966 0.2570862
#> [11,] 0.1231351 0.5116605 0.3901456 0.2702417 0.5400123 0.2847362
#> [12,] 0.1723089 0.2865068 0.5962697 0.1846140 0.4540245 0.5081165
#> [13,] 0.3590156 0.3742182 0.4290102 0.5022811 0.6130500 0.3981526
#> [14,] 0.4012039 0.2594955 0.2626102 0.3309685 0.2632928 0.3222016
#> [15,] 1.0000000 0.3750602 0.3810373 0.3941801 0.2692220 0.1112151
#> [16,] 0.3750602 1.0000000 0.5465220 0.4274354 0.3959432 0.4482510
#> [17,] 0.3810373 0.5465220 1.0000000 0.2913456 0.4536348 0.5029859
#> [18,] 0.3941801 0.4274354 0.2913456 1.0000000 0.1529263 0.4623088
#> [19,] 0.2692220 0.3959432 0.4536348 0.1529263 1.0000000 0.5143422
#> [20,] 0.1112151 0.4482510 0.5029859 0.4623088 0.5143422 1.0000000


Simulate an additive genetic trait representing grain yield. This produces 20 columns in the founder population. The columns contain the genetic values for each environment.

SP$addTraitA(nQtlPerChr = n_seg_sites,
             mean = input_asr$mean,
             var = input_asr$var,
             corA = input_asr$cor_A)
founders <- newPop(founders) # create pop-class object
head(founders@gv)
#>        Trait1   Trait2   Trait3   Trait4   Trait5   Trait6   Trait7   Trait8
#> [1,] 3.662319 3.327789 3.626410 3.060205 3.782182 3.061413 2.601274 3.257708
#> [2,] 3.609349 4.689218 4.006780 4.744841 4.529037 4.320794 4.561244 4.716394
#> [3,] 3.876705 3.977862 3.718663 3.664568 4.181587 4.057004 4.146229 3.933701
#> [4,] 4.428944 4.058932 4.378631 4.441817 4.709719 4.535052 4.097166 4.383028
#> [5,] 3.682564 2.979650 4.258279 4.455997 4.004899 4.284199 4.121226 4.195624
#> [6,] 3.878690 4.026717 4.875523 4.110571 4.786910 4.430079 3.864503 4.704515
#>        Trait9  Trait10  Trait11  Trait12  Trait13  Trait14  Trait15  Trait16
#> [1,] 3.294190 2.849119 4.069572 3.042133 3.722130 3.587591 3.106844 3.178407
#> [2,] 4.080286 4.298731 3.471209 4.470825 3.997283 4.318946 5.194595 4.335121
#> [3,] 3.865454 3.675572 4.607009 3.760303 3.724924 3.944731 4.204465 4.287346
#> [4,] 3.878202 4.186585 4.480847 4.121653 4.617747 4.498126 4.374674 4.208419
#> [5,] 4.606708 4.019972 5.156479 4.018475 3.457132 4.327196 3.682245 4.716456
#> [6,] 4.464547 3.924118 3.846259 3.929079 4.108051 4.617361 4.451845 4.119595
#>       Trait17  Trait18  Trait19  Trait20
#> [1,] 3.080543 3.807549 3.002744 3.287518
#> [2,] 4.758906 4.290912 4.076275 3.903677
#> [3,] 4.068368 3.796067 3.635399 3.488543
#> [4,] 4.361450 4.457638 4.585593 4.305124
#> [5,] 4.514736 3.199222 4.795258 4.263101
#> [6,] 4.284648 3.924236 3.581546 3.958096


Generate the wheat genotypes by randomly crossing the founders using the randCross() function, and then make doubled haploid (DH) lines using the makeDH() function.

f1s <- randCross(pop = founders, 
                 nCrosses = 20, # no. crosses
                 nProgeny = 10) # no. progeny per cross
DHs <- makeDH(pop = f1s, nDH = 1)
DHs
#> An object of class "Pop" 
#> Ploidy: 2 
#> Individuals: 200 
#> Chromosomes: 20 
#> Loci: 6000 
#> Traits: 20


Obtain the genetic values for the DH population using FieldSimR’s wrapper function unstr_asr_output().

gv_df <- unstr_asr_output(pop = DHs,
                          n_traits = 1,
                          n_envs = n_envs,
                          n_reps = n_reps)
head(gv_df)
#>   env rep  id gv.Trait1
#> 1   1   1 221  3.202680
#> 2   1   1 222  3.829752
#> 3   1   1 223  4.237403
#> 4   1   1 224  3.416975
#> 5   1   1 225  3.409848
#> 6   1   1 226  3.595984
GV <- matrix(gv_df$gv.Trait1[gv_df$rep == 1], ncol = n_envs)
round(mean(GV), digits = 2) # overall trait mean in the DH population
#> [1] 4
round(cov2cor(var(GV)), digits = 2) # simulated between-environment genetic correlation matrix in the DH population
#>       [,1]  [,2] [,3] [,4] [,5] [,6]  [,7]  [,8] [,9] [,10] [,11] [,12] [,13]
#>  [1,] 1.00  0.40 0.24 0.46 0.39 0.41  0.39  0.37 0.48  0.05  0.34  0.53  0.56
#>  [2,] 0.40  1.00 0.16 0.32 0.56 0.25  0.54  0.49 0.23  0.46 -0.20  0.66  0.63
#>  [3,] 0.24  0.16 1.00 0.23 0.59 0.58  0.07  0.60 0.54  0.35  0.25  0.12  0.31
#>  [4,] 0.46  0.32 0.23 1.00 0.29 0.47  0.39  0.60 0.52  0.34  0.16  0.56  0.25
#>  [5,] 0.39  0.56 0.59 0.29 1.00 0.34  0.28  0.60 0.22  0.21  0.36  0.40  0.60
#>  [6,] 0.41  0.25 0.58 0.47 0.34 1.00  0.45  0.63 0.64  0.52  0.30  0.19  0.56
#>  [7,] 0.39  0.54 0.07 0.39 0.28 0.45  1.00  0.58 0.39  0.53  0.10  0.63  0.38
#>  [8,] 0.37  0.49 0.60 0.60 0.60 0.63  0.58  1.00 0.65  0.56  0.19  0.38  0.47
#>  [9,] 0.48  0.23 0.54 0.52 0.22 0.64  0.39  0.65 1.00  0.39  0.21  0.41  0.38
#> [10,] 0.05  0.46 0.35 0.34 0.21 0.52  0.53  0.56 0.39  1.00 -0.04  0.27  0.45
#> [11,] 0.34 -0.20 0.25 0.16 0.36 0.30  0.10  0.19 0.21 -0.04  1.00  0.02  0.19
#> [12,] 0.53  0.66 0.12 0.56 0.40 0.19  0.63  0.38 0.41  0.27  0.02  1.00  0.31
#> [13,] 0.56  0.63 0.31 0.25 0.60 0.56  0.38  0.47 0.38  0.45  0.19  0.31  1.00
#> [14,] 0.03  0.37 0.59 0.30 0.50 0.44  0.30  0.56 0.32  0.41  0.35  0.19  0.34
#> [15,] 0.14  0.68 0.27 0.56 0.56 0.40  0.40  0.66 0.20  0.47 -0.08  0.30  0.37
#> [16,] 0.46  0.13 0.58 0.39 0.47 0.67  0.32  0.63 0.61  0.28  0.55  0.17  0.35
#> [17,] 0.16  0.41 0.50 0.46 0.64 0.55  0.26  0.45 0.36  0.46  0.22  0.50  0.39
#> [18,] 0.57  0.48 0.29 0.43 0.40 0.36  0.15  0.27 0.11  0.09  0.13  0.28  0.49
#> [19,] 0.49  0.17 0.00 0.52 0.29 0.32  0.60  0.37 0.40  0.37  0.44  0.43  0.54
#> [20,] 0.46 -0.02 0.43 0.27 0.26 0.21 -0.02 -0.06 0.20 -0.02  0.13  0.30  0.16
#>       [,14] [,15] [,16] [,17] [,18] [,19] [,20]
#>  [1,]  0.03  0.14  0.46  0.16  0.57  0.49  0.46
#>  [2,]  0.37  0.68  0.13  0.41  0.48  0.17 -0.02
#>  [3,]  0.59  0.27  0.58  0.50  0.29  0.00  0.43
#>  [4,]  0.30  0.56  0.39  0.46  0.43  0.52  0.27
#>  [5,]  0.50  0.56  0.47  0.64  0.40  0.29  0.26
#>  [6,]  0.44  0.40  0.67  0.55  0.36  0.32  0.21
#>  [7,]  0.30  0.40  0.32  0.26  0.15  0.60 -0.02
#>  [8,]  0.56  0.66  0.63  0.45  0.27  0.37 -0.06
#>  [9,]  0.32  0.20  0.61  0.36  0.11  0.40  0.20
#> [10,]  0.41  0.47  0.28  0.46  0.09  0.37 -0.02
#> [11,]  0.35 -0.08  0.55  0.22  0.13  0.44  0.13
#> [12,]  0.19  0.30  0.17  0.50  0.28  0.43  0.30
#> [13,]  0.34  0.37  0.35  0.39  0.49  0.54  0.16
#> [14,]  1.00  0.56  0.30  0.30  0.22  0.14 -0.11
#> [15,]  0.56  1.00  0.26  0.44  0.35  0.12 -0.15
#> [16,]  0.30  0.26  1.00  0.54  0.30  0.35  0.37
#> [17,]  0.30  0.44  0.54  1.00  0.24  0.27  0.41
#> [18,]  0.22  0.35  0.30  0.24  1.00  0.10  0.41
#> [19,]  0.14  0.12  0.35  0.27  0.10  1.00  0.24
#> [20,] -0.11 -0.15  0.37  0.41  0.41  0.24  1.00
round(Ce, digits = 2)[1:20, 1:20] # expected between-environment genetic correlation matrix
#>       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
#>  [1,] 1.00 0.59 0.37 0.53 0.52 0.46 0.50 0.47 0.44  0.22  0.43  0.56  0.57
#>  [2,] 0.59 1.00 0.28 0.33 0.61 0.37 0.58 0.47 0.21  0.44  0.20  0.61  0.57
#>  [3,] 0.37 0.28 1.00 0.26 0.54 0.52 0.21 0.57 0.44  0.38  0.15  0.25  0.47
#>  [4,] 0.53 0.33 0.26 1.00 0.34 0.42 0.27 0.58 0.51  0.36  0.30  0.51  0.27
#>  [5,] 0.52 0.61 0.54 0.34 1.00 0.24 0.37 0.58 0.12  0.24  0.47  0.45  0.69
#>  [6,] 0.46 0.37 0.52 0.42 0.24 1.00 0.47 0.48 0.58  0.52  0.14  0.18  0.59
#>  [7,] 0.50 0.58 0.21 0.27 0.37 0.47 1.00 0.48 0.25  0.43  0.15  0.61  0.30
#>  [8,] 0.47 0.47 0.57 0.58 0.58 0.48 0.48 1.00 0.50  0.46  0.14  0.35  0.44
#>  [9,] 0.44 0.21 0.44 0.51 0.12 0.58 0.25 0.50 1.00  0.31  0.20  0.33  0.37
#> [10,] 0.22 0.44 0.38 0.36 0.24 0.52 0.43 0.46 0.31  1.00  0.14  0.25  0.42
#> [11,] 0.43 0.20 0.15 0.30 0.47 0.14 0.15 0.14 0.20  0.14  1.00  0.29  0.34
#> [12,] 0.56 0.61 0.25 0.51 0.45 0.18 0.61 0.35 0.33  0.25  0.29  1.00  0.17
#> [13,] 0.57 0.57 0.47 0.27 0.69 0.59 0.30 0.44 0.37  0.42  0.34  0.17  1.00
#> [14,] 0.27 0.45 0.62 0.23 0.55 0.31 0.29 0.33 0.22  0.23  0.45  0.30  0.41
#> [15,] 0.36 0.62 0.16 0.56 0.49 0.43 0.37 0.54 0.15  0.39  0.12  0.17  0.36
#> [16,] 0.52 0.42 0.38 0.44 0.41 0.49 0.37 0.58 0.49  0.34  0.51  0.29  0.37
#> [17,] 0.30 0.50 0.37 0.57 0.54 0.52 0.32 0.40 0.37  0.54  0.39  0.60  0.43
#> [18,] 0.54 0.52 0.39 0.49 0.33 0.50 0.21 0.36 0.18  0.29  0.27  0.18  0.50
#> [19,] 0.55 0.30 0.18 0.54 0.61 0.25 0.49 0.42 0.36  0.37  0.54  0.45  0.61
#> [20,] 0.59 0.30 0.56 0.45 0.46 0.36 0.34 0.18 0.27  0.26  0.28  0.51  0.40
#>       [,14] [,15] [,16] [,17] [,18] [,19] [,20]
#>  [1,]  0.27  0.36  0.52  0.30  0.54  0.55  0.59
#>  [2,]  0.45  0.62  0.42  0.50  0.52  0.30  0.30
#>  [3,]  0.62  0.16  0.38  0.37  0.39  0.18  0.56
#>  [4,]  0.23  0.56  0.44  0.57  0.49  0.54  0.45
#>  [5,]  0.55  0.49  0.41  0.54  0.33  0.61  0.46
#>  [6,]  0.31  0.43  0.49  0.52  0.50  0.25  0.36
#>  [7,]  0.29  0.37  0.37  0.32  0.21  0.49  0.34
#>  [8,]  0.33  0.54  0.58  0.40  0.36  0.42  0.18
#>  [9,]  0.22  0.15  0.49  0.37  0.18  0.36  0.27
#> [10,]  0.23  0.39  0.34  0.54  0.29  0.37  0.26
#> [11,]  0.45  0.12  0.51  0.39  0.27  0.54  0.28
#> [12,]  0.30  0.17  0.29  0.60  0.18  0.45  0.51
#> [13,]  0.41  0.36  0.37  0.43  0.50  0.61  0.40
#> [14,]  1.00  0.40  0.26  0.26  0.33  0.26  0.32
#> [15,]  0.40  1.00  0.38  0.38  0.39  0.27  0.11
#> [16,]  0.26  0.38  1.00  0.55  0.43  0.40  0.45
#> [17,]  0.26  0.38  0.55  1.00  0.29  0.45  0.50
#> [18,]  0.33  0.39  0.43  0.29  1.00  0.15  0.46
#> [19,]  0.26  0.27  0.40  0.45  0.15  1.00  0.51
#> [20,]  0.32  0.11  0.45  0.50  0.46  0.51  1.00
round(mean(diag(var(GV))), digits = 2) # overall trait variance in the DH population
#> [1] 0.22


2. Simulating the plot errors with FieldSimR

Simulate the plot errors as the sum of spatial, random and extraneous error terms. Sample the proportions of spatial and extraneous error variance from a continuous uniform distribution ranging from 0.2 to 0.6 and from 0 to 0.2, respectively. This produces a mean proportion of spatial error variance of 0.5 and a mean proportion of extraneous error variance of 0.1. FieldSimR will then allocate the remaining error variance to the random errors by default.

The column and row autocorrelations are sampled from a continuous uniform distribution ranging from 0.3 to 0.7 and from 0.5 to 0.9, respectively. This produces a mean column autocorrelation of 0.5 and a mean row autocorrelation of 0.7.

The extraneous errors are simulated based on zig-zag ordering between neighbouring rows.

The initial simulation parameters are:

spatial_model <- "AR1:AR1"
round(sigma2e, digits = 2) # error variance
#> [1] 0.47
prop_spatial <- runif(n_envs, min = 0.2, max = 0.6) # proportions of spatial error variance
round(prop_spatial, digits = 2)
#>  [1] 0.33 0.56 0.48 0.49 0.27 0.41 0.45 0.29 0.23 0.35 0.22 0.59 0.21 0.37 0.52
#> [16] 0.50 0.43 0.54 0.29 0.48
prop_ext <- runif(n_envs, min = 0, max = 0.2) # proportions of extraneous error variance
round(prop_ext, digits = 2)
#>  [1] 0.11 0.19 0.06 0.09 0.08 0.10 0.03 0.10 0.16 0.14 0.15 0.18 0.00 0.05 0.19
#> [16] 0.17 0.20 0.10 0.08 0.13
round(1 - (prop_spatial + prop_ext), digits = 2) # proportions of random error variance
#>  [1] 0.56 0.25 0.46 0.42 0.65 0.49 0.52 0.60 0.61 0.50 0.63 0.23 0.78 0.58 0.29
#> [16] 0.33 0.37 0.36 0.63 0.39
col_cor <- runif(n_envs, min = 0.3, max = 0.7) # column autocorrelations
row_cor <- runif(n_envs, min = 0.5, max = 0.9) # row autocorrelations
ext_ord <- "zig-zag" # ordering of extraneous variation
ext_dir <- "row"

Populate the field_trial_error() function with the terms above. A data frame is returned containing the environment, block, column and row numbers as well as the simulated plot error. When return_effects = TRUE, a list is returned with an additional data frame containing the spatial, random and extraneous error terms.

error_df <- field_trial_error(n_traits = 1,
                              n_envs = n_envs,
                              n_blocks = n_blocks,
                              block_dir = block_dir,
                              n_cols = n_cols,
                              n_rows = n_rows,
                              var_R = sigma2e,
                              prop_spatial = prop_spatial,
                              spatial_model = spatial_model,
                              col_cor = col_cor,
                              row_cor = row_cor,
                              prop_ext = prop_ext,
                              ext_ord = ext_ord,
                              ext_dir = ext_dir,
                              return_effects = TRUE)
head(error_df$plot_df) # data frame with simulated plot errors
#>   env block col row    e.Trait1
#> 1   1     1   1   1 -0.38134595
#> 2   1     1   1   2 -0.51062922
#> 3   1     1   1   3  0.25215211
#> 4   1     1   1   4  0.04483486
#> 5   1     1   1   5 -0.58427151
#> 6   1     1   1   6  1.27038179
head(error_df$Trait1) # data frame with simulated spatial, random and extraneous errors <--
#>   env block col row     e.spat      e.rand e.ext.col   e.ext.row
#> 1   1     1   1   1 -0.1855081 -0.13772252         0 -0.05811535
#> 2   1     1   1   2 -0.5272642 -0.07370537         0  0.09034033
#> 3   1     1   1   3 -0.1019815  0.45984631         0 -0.10571273
#> 4   1     1   1   4  0.1973199 -0.21359287         0  0.06110786
#> 5   1     1   1   5  0.7554173 -0.97253173         0 -0.36715705
#> 6   1     1   1   6  0.4808541  0.48525221         0  0.30427547


Display the separate error terms using FieldSimR’s function plot_effects().

plot_effects(error_df$Trait1[error_df$Trait1$env == 1,], effect = "e.spat", labels = TRUE)


The total plot errors for environment 1 are given by:

plot_effects(error_df$plot_df[error_df$Trait1$env == 1,], effect = "e.Trait1", labels = TRUE)


Display the theoretical and sample variograms using FieldSimR’s functions theoretical_variogram() and sample_variogram().

theoretical_variogram(n_cols = n_cols, n_rows = n_rows, var_R = sigma2e, prop_spatial = 1, col_cor = 0.5, row_cor = 0.7)

sample_variogram(error_df$plot_df[error_df$Trait1$env == 1,], effect = "e.Trait1")


3. Generating the phenotypes

Generate the phenotypes using FieldSimR’s function make_phenotypes(). When randomise = TRUE, genotypes are allocated to plots using a randomised complete block design.

met_df <- make_phenotypes(gv_df = gv_df, error_df = error_df, randomise = TRUE)
head(met_df)
#>   env block col row  id phe.Trait1
#> 1   1     1   1   1 315   3.679432
#> 2   1     1   1   2 300   3.401802
#> 3   1     1   1   3 318   4.269240
#> 4   1     1   1   4 326   3.882307
#> 5   1     1   1   5 402   3.785814
#> 6   1     1   1   6 286   4.977517


Display the phenotypes.

plot_effects(met_df[met_df$env == 1,], effect = "phe.Trait1", labels = TRUE)


4. Model fitting

Run a linear mixed model analysis with ASReml-R and compare the prediction accuracies between models.

# main effects only model, and AR1 + ID residual
asr1 <- asreml(phe.Trait1 ~ env, # fixed environment main effects
               random = ~ id + diag(env):block + diag(env):units, # random genotype main effects and block effects plus ID error (units)
               residual = ~ dsum(~ar1(col):ar1(row)|env), # separable AR1 process for each environment
               data = met_df,
               workspace = "1gb")
#> Warning in asreml(phe.Trait1 ~ env, random = ~id + diag(env):block +
#> diag(env):units, : Warning : Log-likelihood decreased to -2585.90; trying with
#> reduced updates 0.735258
#> Warning in asreml(phe.Trait1 ~ env, random = ~id + diag(env):block +
#> diag(env):units, : Log-likelihood not converged
#> Warning in asreml(phe.Trait1 ~ env, random = ~id + diag(env):block +
#> diag(env):units, : Some components changed by more than 1% on the last
#> iteration
asr1 <- update(asr1) # update the model
#> Warning in asreml(fixed = phe.Trait1 ~ env, random = ~id + diag(env):block + :
#> Log-likelihood not converged
#> Warning in asreml(fixed = phe.Trait1 ~ env, random = ~id + diag(env):block + :
#> Some components changed by more than 1% on the last iteration
BLUPs1_g <- asr1$coefficients$random[grep("id", rownames(asr1$coefficients$random)),] # predicted genotype main effects 
# prediction accuracy = correlation between true and predicted effects
# prediction accuracy of genotype main effects
g_main <- with(gv_df[gv_df$rep == 1,], tapply(gv.Trait1, id, mean)) # true genotype averages across environments
cor(BLUPs1_g, g_main) 
#> [1] 0.9522539
# prediction accuracy of genotype effects within environments
BLUPs1_ge <- rep(BLUPs1_g, n_envs)
# Note that this model does not fit separate genotype effects for each environment. It only fits genotype main effects. Thus, these predictions are obtained by repeating the predicted genotype main effects across all environments.
hist(diag(cor(matrix(BLUPs1_ge, ncol = n_envs), GV)), main = "Accuracy per one environment") 
abline(v = mean(diag(cor(matrix(BLUPs1_ge, ncol = n_envs), GV))), col="blue", lwd=3, lty=1)


# compound symmetry model, and AR1 + ID residual
asr2 <- asreml(phe.Trait1 ~ env, # fixed environment main effects
               random = ~ id + env:id + diag(env):block + diag(env):units, # random genotype main effects and GxE interaction effects, block effects plus ID error (units)
               residual = ~ dsum(~ar1(col):ar1(row)|env), # separable AR1 process for each environment
               data = met_df,
               workspace = "1gb")
#> Warning in asreml(phe.Trait1 ~ env, random = ~id + env:id + diag(env):block + :
#> Warning : Log-likelihood decreased to -2321.83; trying with reduced updates
#> 0.667756
#> Warning in asreml(phe.Trait1 ~ env, random = ~id + env:id + diag(env):block + :
#> Log-likelihood not converged
#> Warning in asreml(phe.Trait1 ~ env, random = ~id + env:id + diag(env):block + :
#> Some components changed by more than 1% on the last iteration
asr2 <- update(asr2) # update the model
BLUPs2_g <- asr2$coefficients$random[grep("^id", rownames(asr2$coefficients$random)),] # predicted genotype main effects
BLUPs2_gxe <- asr2$coefficients$random[grep("env.*id", rownames(asr2$coefficients$random)),] # predicted GxE interaction effects
# prediction accuracy = correlation between true and predicted effects
# prediction accuracy of genotype main effects
cor(BLUPs2_g, g_main) 
#> [1] 0.9537427
# prediction accuracy of genotype effects within environments
BLUPs2_ge <- rep(BLUPs2_g) + BLUPs2_gxe # construct genotype effects within environments
hist(diag(cor(matrix(BLUPs2_ge, ncol = n_envs), GV)), main = "Accuracy per one environment")
abline(v = mean(diag(cor(matrix(BLUPs2_ge, ncol = n_envs), GV))), col="blue", lwd=3, lty=1)


# factor analytic model of order 3, and AR1 + ID residual
asr3 <- asreml(phe.Trait1 ~ env, # fixed environment main effects
               random = ~ rr(env, 3):id + diag(env):id + diag(env):block + diag(env):units, # random genotype effects within environments, block effects plus ID error (units)
               # the factor analytic model is fitted as a reduced rank matrix (rr(env), these are the lambdas)
               # and a diagonal matrix (diag(env) these are the psi's)
               residual = ~ dsum(~ar1(col):ar1(row)|env), # separable AR1 process for each environment
               data = met_df, workspace = "2gb")
#> Warning in asreml(phe.Trait1 ~ env, random = ~rr(env, 3):id + diag(env):id + :
#> Warning : Log-likelihood decreased to -2236.65; trying with reduced updates
#> 0.487305
#> Warning in asreml(phe.Trait1 ~ env, random = ~rr(env, 3):id + diag(env):id + :
#> Log-likelihood not converged
#> Warning in asreml(phe.Trait1 ~ env, random = ~rr(env, 3):id + diag(env):id + :
#> Some components changed by more than 1% on the last iteration
asr3 <- update(asr3) # update the model
#> Warning in asreml(fixed = phe.Trait1 ~ env, random = ~rr(env, 3):id +
#> diag(env):id + : Log-likelihood not converged
#> Warning in asreml(fixed = phe.Trait1 ~ env, random = ~rr(env, 3):id +
#> diag(env):id + : Some components changed by more than 1% on the last iteration
BLUPs3_ge <- asr3$coefficients$random[grep("rr.*env.*id", rownames(asr3$coefficients$random)),][1:(n_genos*n_envs)] + 
                asr3$coefficients$random[grep("^env.*id", rownames(asr3$coefficients$random)),] # predicted genotype effects within environments
# prediction accuracy = correlation between true and predicted effects
# prediction accuracy of genotype main effects
BLUPs3_g <- rowMeans(matrix(BLUPs3_ge, ncol = n_envs))
# Note that this model does not fit genotype main effects, so these are obtained as simple average across environments
cor(BLUPs3_g, g_main) 
#> [1] 0.9532985
# prediction accuracy of genotype effects within environments
hist(diag(cor(matrix(BLUPs3_ge, ncol = n_envs), GV)), main = "Accuracy per one environment")
abline(v = mean(diag(cor(matrix(BLUPs3_ge, ncol = n_envs), GV))), col="blue", lwd=3, lty=1)


What model produced the highest prediction accuracies for the genotype main effects? What about for the genotype effects within environments? Why does model 1 do poorly for the latter?


Compare the top 20 genotypes selected from each model in terms of the genotype main effects. This may represent selection for broad adaptation across all environments.

true_gv <- names(g_main)[rev(order(g_main))][1:20]  # sort based on true genotype effects and take top 20 genos
top1_g <- names(g_main)[rev(order(BLUPs1_g))][1:20] # sort based on BLUPs and take top 20 genos
top2_g <- names(g_main)[rev(order(BLUPs2_g))][1:20]
top3_g <- names(g_main)[rev(order(BLUPs3_g))][1:20]
rank_list <- cbind(true_gv, top1_g, top2_g, top3_g) # top 20 genotypes selected from each model
colnames(rank_list) <- c("True rank", "Model 1", "Model 2", "Model 3")
rownames(rank_list) <- paste("Ind", 1:n_envs)
rank_list
#>        True rank Model 1 Model 2 Model 3
#> Ind 1  "291"     "291"   "291"   "291"  
#> Ind 2  "231"     "274"   "274"   "274"  
#> Ind 3  "249"     "249"   "295"   "295"  
#> Ind 4  "325"     "295"   "231"   "231"  
#> Ind 5  "274"     "231"   "249"   "325"  
#> Ind 6  "380"     "325"   "325"   "249"  
#> Ind 7  "240"     "240"   "245"   "240"  
#> Ind 8  "295"     "245"   "240"   "245"  
#> Ind 9  "236"     "385"   "408"   "408"  
#> Ind 10 "245"     "380"   "380"   "248"  
#> Ind 11 "296"     "248"   "299"   "299"  
#> Ind 12 "237"     "299"   "248"   "385"  
#> Ind 13 "385"     "408"   "385"   "380"  
#> Ind 14 "373"     "271"   "372"   "271"  
#> Ind 15 "335"     "279"   "271"   "343"  
#> Ind 16 "271"     "372"   "343"   "279"  
#> Ind 17 "331"     "332"   "279"   "372"  
#> Ind 18 "332"     "343"   "367"   "332"  
#> Ind 19 "279"     "236"   "332"   "367"  
#> Ind 20 "297"     "367"   "287"   "287"
tt <- data.frame(model = factor(rep(1:4, each = 20)),
                 id = c(true_gv, top1_g, top2_g, top3_g))
tt <- with(tt, table(id, model))
t(tt) %*% tt # top 20 genotypes in common between pairs of models
#>      model
#> model  1  2  3  4
#>     1 20 14 13 13
#>     2 14 20 19 19
#>     3 13 19 20 20
#>     4 13 19 20 20
Reduce(intersect, list(top1_g, top2_g, top3_g)) # top 20 genotypes in common between all three models
#>  [1] "291" "274" "249" "295" "231" "325" "240" "245" "385" "380" "248" "299"
#> [13] "408" "271" "279" "372" "332" "343" "367"


How similar are the top 20 genotypes from the three models in terms of the genotype main effects? Why would this be the case?


Compare the mean genetic value (genetic gain) and genetic variance of the top 20 genotypes selected from each model in terms of the genotype main effects.

round(mean(g_main[names(g_main) %in% true_gv]), digits = 2) # obtain genetic values for top 20 genos
#> [1] 4.49
round(mean(g_main[names(g_main) %in% top1_g]), digits = 2) 
#> [1] 4.45
round(mean(g_main[names(g_main) %in% top2_g]), digits = 2)
#> [1] 4.44
round(mean(g_main[names(g_main) %in% top3_g]), digits = 2)
#> [1] 4.44
round(var(g_main[names(g_main) %in% true_gv]), digits = 2) # obtain genetic values for top 20 genos
#> [1] 0.02
round(var(g_main[names(g_main) %in% top1_g]), digits = 2) 
#> [1] 0.03
round(var(g_main[names(g_main) %in% top2_g]), digits = 2)
#> [1] 0.03
round(var(g_main[names(g_main) %in% top3_g]), digits = 2)
#> [1] 0.03

Which model has produced the highest genetic gain and lowest genetic variance in terms of the genotype main effects? Is this what you would expect? If not, why might this be the case?


Next, compare the top genotype in each environment selected from each model. This may represent selection for specific adaptation within particular environments.

rownames(GV) <- levels(gv_df$id)
top_true <- top1_ge <- top2_ge <- top3_ge <- c()
for(i in 1:n_envs){
  top_true[i] <- which(matrix(gv_df[gv_df$rep==1,4], ncol = n_envs)[,i] == max(matrix(gv_df[gv_df$rep==1,4], ncol = n_envs)[,i]))
  top1_ge[i] <- which(matrix(BLUPs1_ge, ncol = n_envs)[,i] == max(matrix(BLUPs1_ge, ncol = n_envs)[,i]), arr.ind = TRUE)
  top2_ge[i] <- which(matrix(BLUPs2_ge, ncol = n_envs)[,i] == max(matrix(BLUPs2_ge, ncol = n_envs)[,i]), arr.ind = TRUE)
  top3_ge[i] <- which(matrix(BLUPs3_ge, ncol = n_envs)[,i] == max(matrix(BLUPs3_ge, ncol = n_envs)[,i]), arr.ind = TRUE)
}
top_list <- cbind(top_true, top1_ge, top2_ge, top3_ge)
colnames(top_list) <- c("True top", "Model 1", "Model 2", "Model 3")
rownames(top_list) <- paste("Env", 1:n_envs)
top_list
#>        True top Model 1 Model 2 Model 3
#> Env 1       188      71     188      71
#> Env 2        21      71     184      28
#> Env 3       112      71      71      71
#> Env 4       156      71      71      71
#> Env 5        73      71      17      75
#> Env 6        71      71      11      71
#> Env 7       143      71     123      79
#> Env 8        71      71      71      71
#> Env 9       112      71      20      20
#> Env 10       16      71     152      54
#> Env 11       62      71      54      95
#> Env 12       28      71      25      25
#> Env 13       16      71      71      71
#> Env 14       71      71      17      17
#> Env 15       20      71      16      16
#> Env 16      165      71      79     165
#> Env 17       29      71      29      71
#> Env 18      167      71     105     105
#> Env 19       95      71     147     147
#> Env 20      195      71     104      75
table(top_true == top1_ge)[2] # match model 1 top genotype with true top genotype in each envrionment 
#> TRUE 
#>    3
table(top_true == top2_ge)[2] # match model 2 top genotype with true top genotype in each envrionment 
#> TRUE 
#>    3
table(top_true == top3_ge)[2] # match model 3 top genotype with true top genotype in each envrionment 
#> TRUE 
#>    3


How similar are the top genotypes in each environment from the three models? Why would this be the case?


Extra: Also simulate two different levels of GxE interaction using the rand_cor_mat() function with between-environment genetic correlations ranging from 0.5 to 1 (low GxE) and -1 to 1 (high GxE). Label these met_df2 and met_df3.

Now go and re-run models 1,2 and 3 for the other datasets (met_df2 and met_df3). What is the effect of changing the complexity of GxE? In particular, what happens to the prediction accuracies, selection and genetic gain.