Chapter 2. Simulation models
The computer simulation models GeneFlow [Princée, 1988, 1989b, 1995b], ChromoFlow and GsPed [Princée, 1995b] are used in this study. The GeneFlow and ChromoFlow models use Monte Carlo methods to estimate various measures of genetic variation that have been retained in hypothetical or real populations of diploid species for which pedigree data are available. The GsPed model is used to generate hypothetical pedigrees that can be analysed by both GeneFlow and ChromoFlow. The GeneFlow model [Princée, 1988] simulates the transmission of up to 40 independent autosomal loci. The original model was developed in 1985 to study genetic processes in zoo red pandas, Ailurus fulgens. Although the GeneFlow model has no inherent limitations regarding numbers of loci, numbers of alleles per locus and pedigree size, practical limitations have been imposed by the capabilities of the computers that are available for in-house use by zoo managers. For example, the first version of the model only allowed the definition of two alleles per locus and was restricted to the computation of gene diversity. Later versions of this model, which have been used in studies on, for example, red pandas and Przewalski's horses, Equus przewalskii [Princée, 1989b, 1990], allow for more simulation parameters and results. The GeneFlow model is used in this study to evaluate effects of mating structure on genetic loss [Chapter 6; Princée, 1995b]. Specifications of the various GeneFlow versions are listed in Table 2.
The ChromoFlow model has been developed as part of this study to evaluate the risks that are associated with assumptions made in genome models [Chapters 3 and 4]. This model differs substantially from GeneFlow in that it simulates the transmission of autosomal chromosomes on which various linked genes can be located (see also Table 2). As in GeneFlow, numbers of loci, numbers of alleles per locus and allele frequencies can be manipulated. The ChromoFlow model accommodates recombination by crossing over. Therefore, the location of each locus in the genome (i.e. the chromosome on which it lies and its position on that chromosome) needs to be defined. ChromoFlow can simulate the characteristics of GeneFlow by defining one chromosome (with one locus) for each locus. The ChromoFlow model is designed to be independent of computer and/or operating system types. The numbers of chromosomes (c), loci (r) or alleles per locus (n) that can be defined are, therefore, not limited by the program but by the hardware configuration.
The version of GeneFlow used in this study [Chapter 5] has some new features compared with the model as described in Princée [1988]. Since the ChromoFlow model has inherited some basic capabilities of the (extended) GeneFlow model, full descriptions of both models are given in the following sections. Pseudo-source code of the ChromoFlow computer program is listed in Appendix A.
GsPed is the third simulation model that is used in this study. This model generates pedigrees according to specified population characteristics such as number of breeding animals, sex-ratio, mating structure and dispersal patterns. This model is used to study effects of genome models on genetic loss [Chapter 3] and effects of social (mating) structure on genetic loss [Chapter 6]. The GeneFlow, ChromoFlow and GsPed models are described in the following sections.
GeneFlow simulation model
Model description
The computer simulation model GeneFlow estimates genetic variation in populations in which all pedigree relationships are known. This model presupposes an infinitely large source population with r loci (1 ≤ r ≤ 40) and with nj alleles (2 ≤ nj ≤ 5) at locus j (j = 1, ..., r). Numbers of alleles and their frequencies fij (i = 1, ..., nj and j = 1, ..., r) can vary between loci. These data on initial genetic variation can be hypothetical or based on research results, e.g. those of isozyme studies. In each simulation run a pseudo-random number generator, uniformly distributed on (0,1), is used to sample genotypes for each locus in the genome of (potential) founders based on the allele frequency distributions in the source population. In each simulation run, pseudo-random numbers are also used to sample genotypes for descendants of founders based upon Mendelian segregation of the parental alleles. GeneFlow does not presuppose mutation or selection.
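A single simulation run of this kind can be sketched as follows. This is a minimal illustration and not the GeneFlow source code: the function names, the pedigree layout (a list of `(id, sire, dam)` tuples ordered so that parents precede offspring) and the use of `None` to mark founders are all assumptions made for the example.

```python
import random

def sample_allele(freqs, rng):
    """Draw one allele index from source-population frequencies (summing to 1)."""
    u = rng.random()              # pseudo-random number, uniform on [0, 1)
    cumulative = 0.0
    for allele, f in enumerate(freqs):
        cumulative += f
        if u < cumulative:
            return allele
    return len(freqs) - 1         # guard against floating-point round-off

def gene_drop(pedigree, locus_freqs, rng):
    """One simulation run; returns {id: [(allele1, allele2) for each locus]}.

    pedigree    : list of (id, sire, dam), parents listed before offspring;
                  sire and dam are None for (potential) founders (assumed layout).
    locus_freqs : one list of allele frequencies per locus.
    """
    genotypes = {}
    for ind, sire, dam in pedigree:
        if sire is None and dam is None:
            # Founder: both alleles are sampled from the source population.
            genotypes[ind] = [
                (sample_allele(f, rng), sample_allele(f, rng)) for f in locus_freqs
            ]
        else:
            # Descendant: Mendelian segregation, one random allele per parent,
            # drawn independently at every locus (loci are unlinked).
            genotypes[ind] = [
                (rng.choice(genotypes[sire][j]), rng.choice(genotypes[dam][j]))
                for j in range(len(locus_freqs))
            ]
    return genotypes
```

Repeating `gene_drop` with a fresh stream of pseudo-random numbers gives the independent runs over which the measures of genetic variation are averaged.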
Measures of genetic variation
GeneFlow computes observed heterozygosity using equation 3 (page 19), gene diversity using equations 4 and 5 (see page 20), and the total numbers of alleles (n) in the source and founder populations, and in generation groups of descendants of founders and in sub-populations.
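Equations 3 to 5 are not reproduced in this section. The sketch below uses the common textbook definitions (observed heterozygosity as the proportion of heterozygous individuals, gene diversity as one minus the sum of squared allele frequencies) as a stand-in. The thesis equations may differ, for example by including a sample-size correction, and the function names are hypothetical.

```python
from collections import Counter

def observed_heterozygosity(genotypes):
    """Proportion of individuals that are heterozygous at one locus."""
    return sum(a != b for a, b in genotypes) / len(genotypes)

def gene_diversity(genotypes):
    """1 - sum(p_i^2) at one locus (assumed form, without sample-size correction)."""
    counts = Counter(allele for g in genotypes for allele in g)
    total = sum(counts.values())
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

def allele_count(genotypes):
    """Number of distinct alleles present at one locus in the group."""
    return len({allele for g in genotypes for allele in g})
```

Each function takes the list of `(allele1, allele2)` genotypes at one locus for the group of interest (founders, a generation group or a sub-population).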
Output of results
The various measures of genetic variation in the source population are computed from data on numbers of alleles and their frequencies as provided at the start of each simulation experiment. After each simulation run, levels of genetic variation are estimated in the founder population, in generation groups and in any other defined group. These measures are expressed in terms of proportion of the genetic variation assumed to be present in the source population. Arithmetic means of simulation runs and (population) variances between simulation runs are estimated for each measure and for each group at the end of a simulation experiment.
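The summary step described above can be illustrated with a short sketch. The function name and argument layout are assumptions; only the operations (scaling by the source value, arithmetic mean and population variance between runs) are taken from the text.

```python
import statistics

def summarise_runs(run_values, source_value):
    """Mean and population variance of one measure across simulation runs.

    run_values   : value of the measure in one group, one entry per run.
    source_value : value of the same measure in the source population.
    """
    # Express each run as a proportion of the source-population variation.
    proportions = [v / source_value for v in run_values]
    mean = statistics.mean(proportions)
    # pvariance divides by the number of runs (population variance).
    variance = statistics.pvariance(proportions)
    return mean, variance
```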
ChromoFlow simulation model
Model description
The simulation model ChromoFlow presupposes r loci which are distributed over c autosomal chromosomes of a diploid genome. The position of each locus is defined in terms of the chromosome number and the map distance, in centimorgans (cM), from a fixed point (e.g. centromeric or telomeric end) on the chromosome arm. Map distances are relative to the first locus at position 0. Genome maps can be hypothetical or can be based on empirical studies. Each individual in the pedigree is assigned a genome composed of two homologues for each chromosome. Genotypes of each individual are assigned in each simulation run.
ChromoFlow makes the same assumptions with respect to initial genetic variation in the source population and uses the same methods to draw genotypes for (potential) founders as described for GeneFlow. Numbers of loci (r) and alleles per locus (n) are not restricted to a pre-set maximum as in GeneFlow but are determined by limitations of computer hardware. Monte Carlo methods are used to sample homologues for descendants of founders based upon Mendelian segregation of the parental homologues, in each simulation run. The simulation model does not presuppose mutation or selection. Genetic crossing over during meiosis can, optionally, be simulated for each parental chromosome in each cross and in each simulation run. The following recombination model is presupposed in ChromoFlow: neither chiasma interference nor chromatid interference is assumed. Chiasmata occur along the chromosome according to a Poisson distribution (i.e. Haldane's model). The model allows two-strand double, three-strand double and four-strand double types of chiasmata. The recombination frequency (R) between two loci can be computed from the map distance (z) between these loci following equation 15 [Haldane, 1919].
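Equation 15 is not reproduced in this section. In its usual form, Haldane's mapping function reads R = ½(1 − e^(−2z)) with z expressed in Morgans. The sketch below evaluates that form for a distance given in centimorgans; the function name is hypothetical and the thesis notation may differ.

```python
import math

def haldane_recombination(distance_cm):
    """Recombination frequency R for a map distance in cM (Haldane's function)."""
    z = distance_cm / 100.0                    # convert centimorgans to Morgans
    return 0.5 * (1.0 - math.exp(-2.0 * z))
```

For small distances R is close to z (about 0.0906 at 10 cM), and R approaches 0.5 for loci that are far apart, which corresponds to independent segregation.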
A pseudo-random number is used to determine whether a chiasma has occurred between the first locus and the second locus. When a chiasma occurs, a pseudo-random number is used to decide which of the non-sister chromatids are involved (each chromatid of a sister pair has probability 0.5). This process is repeated for loci two and three, three and four, and so on. Once the crossing over process on a parental chromosome has been completed, a pseudo-random number is used to decide which of the four (recombined) homologue strands is transmitted.
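The procedure above can be sketched for one parental chromosome as follows. This is an illustration under stated assumptions and not the ChromoFlow program (whose pseudo-source code is in Appendix A): the function name and data layout are hypothetical, and the probability of a chiasma in an interval of z Morgans is taken to be 1 − e^(−2z), the probability of at least one chiasma under Haldane's model, which yields R = ½(1 − e^(−2z)) between adjacent loci.

```python
import math
import random

def transmit_homologue(hom_a, hom_b, positions_cm, rng):
    """Return the allele list of one strand transmitted after crossing over.

    hom_a, hom_b : alleles on the two parental homologues, in locus order.
    positions_cm : map position of each locus in cM, in ascending order.
    """
    # Four chromatids: strands 0 and 1 are sisters copied from hom_a,
    # strands 2 and 3 are sisters copied from hom_b.
    source = [hom_a, hom_a, hom_b, hom_b]      # homologue each strand copies now
    strands = [[src[0]] for src in source]     # alleles at the first locus
    for j in range(1, len(positions_cm)):
        z = (positions_cm[j] - positions_cm[j - 1]) / 100.0   # interval in Morgans
        p_chiasma = 1.0 - math.exp(-2.0 * z)   # assumed: P(at least one chiasma)
        if rng.random() < p_chiasma:
            # No chromatid interference: pick one chromatid from each
            # sister pair with probability 0.5 and exchange distal segments.
            s1 = rng.choice((0, 1))
            s2 = rng.choice((2, 3))
            source[s1], source[s2] = source[s2], source[s1]
        for k in range(4):
            strands[k].append(source[k][j])
    # One of the four (possibly recombined) strands is transmitted.
    return rng.choice(strands)
```

Because each interval is treated independently and any chromatid of a pair can take part in each exchange, two-, three- and four-strand double exchanges all arise naturally over consecutive intervals.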
Measures of genetic variation
ChromoFlow computes the same genetic measures as described for the GeneFlow model in the source and founder populations, in generation groups of descendants of founders and in any user-defined group (e.g. social groups). Additionally, ChromoFlow computes the proportion of polymorphic loci (P) using equation 1 (page 19). A locus is considered polymorphic in this study if the frequency of the most common allele is less than 0.95.
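The 0.95 criterion can be illustrated with a short sketch. Equation 1 is not reproduced here; the sketch assumes that P is the number of polymorphic loci divided by the total number of loci, and the function name is hypothetical.

```python
from collections import Counter

def proportion_polymorphic(genotypes_by_locus, threshold=0.95):
    """Proportion of loci whose most common allele has frequency < threshold.

    genotypes_by_locus : one list of (allele1, allele2) genotypes per locus.
    """
    polymorphic = 0
    for genotypes in genotypes_by_locus:
        counts = Counter(allele for g in genotypes for allele in g)
        total = sum(counts.values())
        if max(counts.values()) / total < threshold:
            polymorphic += 1
    return polymorphic / len(genotypes_by_locus)
```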
Output of results
The basic results produced by ChromoFlow are similar to those of the GeneFlow program. However, extended reports and the storage of results have been implemented in ChromoFlow to facilitate analyses of individual runs. For each measure of variation, a frequency distribution of the level of genetic variation per simulation run is constructed. These distributions are based on 41 classes of width 0.05. Class 0 ranges from 0.00 to 0.05 and class 40 ranges from 2.00 to 2.05. Values of genetic variation that are larger than 2.05 times the source variation are added to class 40.
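The class assignment can be sketched as below. The function names are hypothetical, and the treatment of values that fall exactly on a class boundary (assigned here to the higher class) is an assumption, since the text does not specify it.

```python
def class_index(value, width=0.05, n_classes=41):
    """Class number (0..40) for a level of variation relative to the source."""
    # The small offset keeps boundary values such as 0.15 from being pushed
    # into the lower class by floating-point round-off.
    index = int(value / width + 1e-9)
    # Values beyond the last class are added to the last class (class 40).
    return min(index, n_classes - 1)

def frequency_distribution(values, width=0.05, n_classes=41):
    """Counts of per-run values in each of the 41 classes."""
    counts = [0] * n_classes
    for v in values:
        counts[class_index(v, width, n_classes)] += 1
    return counts
```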
GsPed simulation model
The computer simulation model GsPed has been developed to generate pedigrees for taxa with different social organizations. These pedigrees can be used for simulation experiments with the models GeneFlow and ChromoFlow. Population parameters that can be manipulated in GsPed are included in the following list [modified from Princée, 1995b]:
- Number of wild-caught animals.
- Number of breeding groups.
- Type of social system: pair, harem, colony.
- Number of females in harem groups or number of males and females in colonies.
- Number of offspring in breeding group.
- Reproductive success of individuals in breeding groups. The distribution of offspring among males and females, and the breeding combinations, can be selected:
  - Random distribution of offspring.
  - Equal distribution of offspring, with as many mating combinations as possible generated.
  - User-defined breeding combinations and distribution of offspring.
- Number of generations.
- Migration of male offspring according to Maximal Avoidance of Inbreeding schemes [Wright, 1921; see Chapter 5] or continuous inbreeding (full-sib matings) within breeding groups can be selected.
Generations do not overlap in GsPed, as parental groups are removed from the breeding group before their offspring can breed. Furthermore, the sex-ratio of offspring is the same as that in the parental groups.
The output of the GsPed model consists of two files, called MASTER and MOVES, respectively. The MASTER file contains general data on sex, parents, location of birth, date of birth and date of death for each individual in the pedigree. A unique serial number (ID) is assigned to each individual. Parents of individuals that are born in the population are identified by their ID number. However, parents of (potential) founders are identified as "WILD". It is assumed that (potential) founders are unrelated to each other. Location of birth either refers to the breeding group (a number) or to "WILD" (indicating that the individual is a potential founder). The MOVES file contains chronological data on each location transfer of each individual in the pedigree since it entered the population (either by birth or by import).
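The content of a MASTER entry can be pictured as a simple record. The sketch below is purely illustrative: the actual file layout, field names, field order and date format used by GsPed are not given in this section, so every name in the code is an assumption. Only the use of "WILD" to mark the parents of (potential) founders is taken from the text.

```python
from dataclasses import dataclass
from typing import Optional

WILD = "WILD"   # marker for parents of (potential) founders, as in the text

@dataclass
class MasterRecord:
    """One individual in a MASTER-style pedigree listing (hypothetical layout)."""
    id: int                           # unique serial number
    sex: str                          # e.g. "M" or "F" (assumed coding)
    sire: str                         # ID of the father, or "WILD"
    dam: str                          # ID of the mother, or "WILD"
    birth_location: str               # breeding group number, or "WILD"
    birth_date: str                   # date format is an assumption
    death_date: Optional[str] = None  # None while the individual is alive

    @property
    def is_potential_founder(self) -> bool:
        """True when both parents are recorded as wild."""
        return self.sire == WILD and self.dam == WILD
```

A pedigree in this form can be passed to a gene-drop routine by treating records with wild parents as (potential) founders and all other records as descendants.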