Main class
The class Simulator manages all parameters and lets the user run coalescence simulations.
- class egglib.coalesce.Simulator(num_pop, migr=0.0, **kwargs)
Manager of the coalescent simulator. The constructor takes arguments controlling the demographic and mutation models used for simulation. Only the number of populations is required at construction time; once set, it can never be modified. All other constructor arguments are optional and can be set or modified later through the params instance attribute, using either its update() method or the [] operator, as in simulator.params['theta'] = 2.85. Keyword arguments are passed as is to update(). List-based parameters can be set if all values are provided in a sequence. Matrix-based parameters cannot be set here.
Parameters: - num_pop – number of populations.
- migr – migration rate.
Other keyword arguments can be any of the parameters defined in the table below.
| Parameter | Definition | Default |
|---|---|---|
| num_pop | Number of populations | None, required |
| num_sites | Number of sites | 0 (ISM) |
| num_mut | Fixed number of mutations | 0 |
| theta | Mutation parameter (θ) | 0.0 |
| recomb | Recombination parameter | 0.0 |
| mut_model | Mutation model (among KAM, IAM, SMM and TPM) | KAM |
| TPM_proba | Probability parameter of TPM | 0.5 |
| TPM_param | Shape parameter of TPM | 0.5 |
| num_alleles | Number of alleles for KAM | 2 |
| rand_start | Pick start allele randomly for KAM (boolean) | False |
| num_chrom | Number of sampled chromosomes (per population) | 0 for all |
| num_indiv | Number of sampled individuals (per population) | 0 for all |
| N | Population size, expressed relative to the reference population size (per population) | 1.0 for all |
| G | Exponential growth/decline rate, negative values mean decline (per population) | 0.0 for all |
| s | Population selfing probability (per population) | 0.0 for all |
| outgroup | Label of samples to place in the outgroup (population index for normal samples, label for delayed) | None |
| site_pos | Site position, as values between 0.0 and 1.0 (per site) | Equally spread |
| site_weight | Site mutation weight, controlling the relative probability of sites (per site) | 1.0 for all |
| migr_matrix | Pairwise migration rate matrix (the diagonal cannot be set) | 0.0 for all |
| trans_matrix | Matrix of transition weights between pairs of alleles | 1.0 for all |
| events | List of events added by the user | Empty |
| seed | Random number generator seed | Based on clock |
| max_iter | Maximum number of iterations | 100,000 |

The following table presents the categories of events that can be added to the events list using add().
| Event code | Description | Parameters |
|---|---|---|
| size | Change population size | T – date (1); N – new size; idx – population index (2) |
| migr | Change all migration rates | T – date (1); M – migration rate (all pairwise migration rates are set to M/(num_pop-1)) |
| pair_migr | Change pairwise migration rate | T – date (1); M – pairwise migration rate; src – source population index; idx – destination population index |
| growth | Change population exponential growth/decline rate | T – date (1); G – new rate; idx – population index (2) |
| selfing | Change population self-fertilization rate | T – date (1); s – new rate; idx – population index (2) |
| recombination | Change recombination rate | T – date (1); R – new rate |
| bottleneck | Apply a bottleneck | T – date (1); S – bottleneck strength (3); idx – population index (2) |
| admixture | Move lineages from one population to another | T – date (1); proba – migration probability (in the [0,1] range); src – source population index; dst – destination population index |
| merge | Merge a population into another (take all lineages from src, move them to dst, and remove src) | T – date (1); src – source population index; dst – destination population index |
| sample | Perform a delayed sample at some point in the past in one of the populations | T – date (1); idx – population index; label – group label (4); num_chrom – number of sampled chromosomes; num_indiv – number of sampled individuals |

(1) Time is expressed in units of generations.
(2) If idx is omitted, the event is applied to all populations at once.
(3) Bottleneck strength is expressed in time units. A bottleneck is implemented as a period of time during which coalescences are the only events allowed to occur.
(4) The label of a delayed sample can be set to the same value as the population index (in this case, delayed samples will have the same label as normal samples from the same population), or to a different label, at the user's option. Based on the value of the outgroup option, this label controls whether delayed samples should go to the outgroup.
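As a worked illustration of the migr event, the sketch below computes the pairwise rates that result from a given M, following the M/(num_pop-1) rule stated in the table. The helper name uniform_migration_matrix is hypothetical and is not part of EggLib; the None diagonal only mirrors the statement that the diagonal of the migration matrix cannot be set.

```python
def uniform_migration_matrix(num_pop, M):
    """Return a num_pop x num_pop matrix of pairwise rates for a rate M.

    Hypothetical helper (not part of EggLib): every off-diagonal entry is
    M / (num_pop - 1); diagonal entries are None because they cannot be set.
    """
    if num_pop < 2:
        raise ValueError("at least two populations are needed for migration")
    rate = M / (num_pop - 1)
    return [[None if i == j else rate for j in range(num_pop)]
            for i in range(num_pop)]


matrix = uniform_migration_matrix(num_pop=3, M=1.0)
for row in matrix:
    print(row)
```

With three populations and M = 1.0, each population sends migrants to two others, so every pairwise rate is 0.5.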
The parameters num_chrom, num_indiv, N, G, s, site_pos, and site_weight are represented by ParamList instances, which behave like lists except that their length cannot be changed. In particular, a ParamList supports subscript indexing and can be initialized by passing a sequence. The parameters migr_matrix and trans_matrix are represented by ParamMatrix instances, which support a double-index subscript system to read or change values (as in params['migr_matrix'][i,j] to access the value at row i and column j). Diagonal values are read as None and cannot be changed. Finally, events is represented by an EventList instance that exhibits limited list functionality and provides methods to add, read, and modify events. See this class for more information about editing the list of events.
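The list and matrix behaviour described above can be pictured with two small stand-in classes. This is only a sketch of the documented semantics (fixed length, subscript access, [i,j] indexing, None on the diagonal); FixedLengthList and PairMatrix are hypothetical names and do not reproduce the actual ParamList or ParamMatrix implementations.

```python
class FixedLengthList:
    """Stand-in for a list-like parameter whose length cannot be changed."""

    def __init__(self, values):
        self._values = list(values)      # initialised from any sequence

    def __len__(self):
        return len(self._values)

    def __getitem__(self, index):
        return self._values[index]

    def __setitem__(self, index, value):
        if not isinstance(index, int):
            raise TypeError("only integer indexes are supported in this sketch")
        self._values[index] = value      # raises IndexError if out of range


class PairMatrix:
    """Stand-in for a square matrix accessed as m[i, j], diagonal excluded."""

    def __init__(self, size, default=0.0):
        self._size = size
        self._values = {(i, j): default
                        for i in range(size) for j in range(size) if i != j}

    def _check(self, i, j):
        if not (0 <= i < self._size and 0 <= j < self._size):
            raise IndexError("matrix index out of range")

    def __getitem__(self, key):
        i, j = key                       # m[i, j] passes the tuple (i, j)
        self._check(i, j)
        return None if i == j else self._values[(i, j)]

    def __setitem__(self, key, value):
        i, j = key
        self._check(i, j)
        if i == j:
            raise ValueError("diagonal values cannot be changed")
        self._values[(i, j)] = value


sizes = FixedLengthList([1.0, 1.0, 1.0])
sizes[1] = 0.5                           # items can be replaced
print(list(sizes))

migration = PairMatrix(3)
migration[0, 1] = 0.2                    # double-index assignment
print(migration[0, 1], migration[1, 1])
```

Neither stand-in defines append() or similar methods, which is how this sketch models the rule that the number of items cannot change.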
- align
An Align instance containing simulated data for the current iteration of iter_simul(). The data in this instance is updated at each iteration round and deleted at the end of the iteration. If the data must be kept, a deep copy is required (typically with Align.create()).
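The need for a deep copy can be illustrated without EggLib. In the sketch below, a hypothetical generator iter_buffers() reuses a single mutable list in the same way the text describes for align: the content is overwritten at each round and cleared at the end. Plain lists and copy.deepcopy() stand in for Align instances and Align.create().

```python
import copy


def iter_buffers(nrepet):
    """Hypothetical generator yielding the same mutable object at each round."""
    buffer = []
    for i in range(nrepet):
        buffer[:] = [i, i * 10]      # overwrite the content in place
        yield buffer
    buffer.clear()                   # content is deleted when iteration ends


kept_refs = []
kept_copies = []
for data in iter_buffers(3):
    kept_refs.append(data)                    # same object every time
    kept_copies.append(copy.deepcopy(data))   # independent snapshot

print(kept_refs)     # all entries point to the same, now empty, list
print(kept_copies)   # each entry holds the data of its own round
```

Only the copies keep the data of each round; the stored references all point to the single shared object, which is empty once the loop has finished.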
- iter_simul(nrepet, cs=None, cs_filter=None, cs_structure=None, dest_trees=None, **feed_params)
Performs simulations based on the current values of the parameters stored in params. Returns an iterator that loops over the requested number of simulations. The simulated alignment is always available at each iteration round as the instance attribute align. By default, each iteration yields a reference to this instance, but if cs is specified, a dictionary of statistics is yielded instead.
Parameters: - nrepet – number of simulations.
- cs – ComputeStats instance properly configured to compute statistics from simulated data. If cs is specified, each iteration round will yield the dictionary of statistics obtained from ComputeStats.process_align(). See also cs_filter and cs_structure.
- cs_filter – a Filter instance to be passed to ComputeStats (ignored if cs is None). By default, use filter_default.
- cs_structure – a Structure instance to be passed to ComputeStats (ignored if cs is None).
- dest_trees – a list to which simulated trees will be appended. Since each simulation can yield several trees (because of recombination), a new sub-list will be appended for each simulation, with one item for each tree. Each tree covers a defined region of the simulated chromosome, so each item of each sub-list will be a (tree, start, stop) tuple with tree as a new Tree instance, and start and stop as the start and stop positions (real numbers between 0 and 1). If recombination occurred, trees will not be sorted within their sub-list. If recombination did not occur, each simulation will be represented by a list with a single tuple. Any data previously contained in dest_trees is left untouched.
- feed_params – other arguments provide sequences of values for any parameter (except num_pop and seed), making it possible to modify any set of parameters between simulations. All changes are permanent and affect any later simulation. All additional options must be in the key=value format, with one of the parameters as key and a sequence of values as value. All sequences must be of length at least equal to nrepet. If longer, additional values are ignored. For list-based parameters, each value must be an (index, value) tuple and, for matrix-based parameters, each value must be an (index1, index2, value) tuple.
Returns: An iterator, yielding a reference to align if cs is None, or otherwise a dictionary of computed statistics.
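The shape of the feed_params sequences can be sketched in plain Python. The example below builds one sequence per parameter in the three formats described above (plain values, (index, value) tuples, and (index1, index2, value) tuples). The parameter names come from the parameter table; the number of populations, the value ranges, and the random draws are assumptions made for illustration, and no Simulator is created here.

```python
import random

nrepet = 5
num_pop = 2                  # assumed number of populations for this sketch
rng = random.Random(1)       # fixed seed so the sketch is reproducible

feed_params = {
    # scalar parameter: one plain value per simulation
    "theta": [rng.uniform(0.5, 5.0) for _ in range(nrepet)],
    # list-based parameter: one (index, value) tuple per simulation
    "N": [(1, rng.uniform(0.1, 1.0)) for _ in range(nrepet)],
    # matrix-based parameter: one (index1, index2, value) tuple per simulation
    "migr_matrix": [(0, 1, rng.uniform(0.0, 2.0)) for _ in range(nrepet)],
}

# every sequence must provide at least nrepet values
assert all(len(values) >= nrepet for values in feed_params.values())
# population indexes must refer to existing populations
assert all(0 <= idx < num_pop for idx, _ in feed_params["N"])

# With a configured Simulator instance (not created here), the sequences
# would be passed as keyword arguments, for example:
#     for aln in simulator.iter_simul(nrepet, **feed_params): ...
```

Because changes made through feed_params are permanent, the values used for the last simulation remain in params after the iteration.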
- params
Simulation parameters of this instance. This object is an instance of ParamDict, a specialized version of dict that does not let users add or remove parameters, but lets them modify parameter values (similarly, the number of items of per-population or per-site parameters cannot be modified).
- simul()
Perform a single simulation based on the current value of parameters stored in params.
Returns: A new Align instance containing the simulated data unless dest is specified (otherwise None).
Note
The method iter_simul() can be much more efficient. For performance-critical applications, its use is recommended.