# Coalescence simulator¶

## Main class¶

The class Simulator manages all parameters and lets the user run coalescence simulations.

class egglib.coalesce.Simulator(num_pop, migr=0.0, **kwargs)

Manager of the coalescent simulator. The constructor takes arguments controlling the demographic and mutation models used for simulation. Only the number of populations is required at the time of construction. Once it is set, it can be never modified. Other constructor arguments are all optional and can be set or modified later using the params instance attribute (either using its update() method or the [] operator, such as in simulator.params['theta'] = 2.85. Keyword arguments are passed as is to update(). List-based parameters can be set if all values are provided in a sequence. Matrix-based parameters cannot be set here.

Parameters: num_pop – number of populations. migr – migration rate.

Other keyword arguments can be any of the parameters defined in the table below.

Parameter Definition Default
num_pop Number of populations None, required
num_sites Number of sites 0 (ISM)
num_mut Fixed number of mutations 0
theta 4N_0\mu parameter 0.0
recomb 4N_0c parameter 0.0
mut_model Mutation model (among KAM, IAM, SMM and TPM) KAM
TPM_proba Probability parameter of TPM 0.5
TPM_param Shape parameter of TPM 0.5
num_alleles Number of alleles for KAM 2
rand_start Pick start allele randomly for KAM (boolean) False
num_chrom Number of sampled chromosomes (per population) 0 for all
num_indiv Number of sampled individuals (per population) 0 for all
N Population size, expressed relatively to N_0 (per population) 1.0 for all
G Exponential growth/decline rate, negative values mean decline (per population) 0.0 for all
s Population selfing probability (per population) 0.0 for all
outgroup Label of samples to place in the outgroup (population index for normal samples, label for delayed) None
site_pos Site position, as values between 0.0 and 1.0 (per site) Equally spread
site_weight Site mutation weight, controlling the relative probability of sites (per site) 1.0 for all
migr_matrix Pairwise migration rate matrix (the diagonal cannot be set) 0.0 for all
trans_matrix Matrix of transition weights between pairs of alleles 1.0 for all
events List of events added by the user Empty
seed Random number generator seed Based on clock
max_iter Maximum number of iterations 100,000

The following table presents the categories of events that can be added to the events list using add().

Event code Description Parameters
size Change population size T – date (1)
N – new size
idx – population index (2)
migr Change all migration rate T – date (1)
M – migration rate (all pairwise migration rates are set to M/(num_pop-1))
pair_migr Change pairwise migration rate T – date (1)
M – pairwise migration rate
src – source population index
idx – destination population index
growth Change population exponential growth/decline rate T – date (1)
G – new rate
idx – population index (2)
selfing Change population self-fertilization rate T – date (1)
s – new rate
idx – population index (2)
recombination Change recombination rate T – date (1)
R – new rate
bottleneck Apply a bottleneck T – date (1)
S – bottleneck strength (3)
idx – population index (2)
admixture Move lineages from one population to another T – date (1)
proba – migration probability (in [0,1] range
src – source population index
dst – destination population index
merge Merge a population to another (take all lineages from src, move them to dst, and remove src T – date (1)
src – source population index
dst – destination population index
sample Perform a delayed a some point in the past in one of the populations T – date (1)
idx – population index
label – group label (4)
num_chrom – number of sampled chromosomes
num_indiv – number of sampled individuals
1. Time is expressed in units of 4N_0 generations.
2. If idx is omitted, the event is applied to all populations at once.
3. Bottleneck strength is expressed in time units. A bottleneck is implemented as a period of time during which coalescences are the only event allowed to occur.
4. The label of delayed sample can be set to the same value than the populations index (in this case, delayed samples will have the same label than normal samples from the same population), or to a different label, as the user’s option. Based on the value of the outgroup option, this label controls whether delayed samples should go to the outgroup.

The parameters num_chrom, num_indiv, N, G, s, site_pos, and site_weight are represented by ParamList instances that behave like lists (except that their length cannot be changed). In particular, ParamList support subscript indexing and it can also be initialized by passing a sequence. The parameters migr_matrix and trans_matrix are represented by ParamMatrix instances that support a double-index subscript system to read/changes values (as in params['migr_matrix'][i,j] to access the value at row i and column j. Diagonal values are read as None and cannot be changed. Finally, events is represented by a EventList instance that exhibit limited list functionality and provides methods to add, read, and modify events. See this class for more information about out editing the list of events.

align

A Align instance containing simulated data for the current iteration of iter_simul(). The data in this instance will be updated at each iteration round, and deleted at the end of the iteration. If it must be copied, a deep copy is required (typically with Align.create()).

iter_simul(nrepet, cs=None, cs_filter=None, cs_structure=None, dest_trees=None, **feed_params)

Performs simulations based on the current value of parameters stored in params. Return an iterator that will loop over the requested number of simulations. The simulated alignments are always available at each iteration loop as the instance attribute align. By default, all iterations return a reference to this instance, but if cs is specified a dictionary of statistics is returned at each simulation.

Parameters: nrepet – number of simulations. If None, iterate for ever (you are required to use break in loop with your own stopping criterion is reach). cs – ComputeStats instance properly configured to compute statistics from simulated data. If cs is specified, each iteration round will yield the dictionary of statistics obtained from ComputeStats.process_align(). See also cs_filter and cs_struct. cs_filter – a Filter instance to be passed to ComputeStats (ignored if cs is None). By default, use filter_default. cs_structure – a Structure instance to be passed to ComputeStats (ignored if cs is None). dest_trees – a list in which simulated trees will be appended. Since each simulation can yield several trees (because of recombination), a new sub-list will be appended for each simulation, with one item for each tree. Each tree covered a defined region of the simulated chromosome, so each item of each sub-list will be (tree, start, stop) tuple with tree as a new Tree instance, and start and stop as the start and stop positions (real numbers comprises between 0 and 1). If recombination occurred, trees will not be sorted within their sub-list. If recombination did not occur, each simulation will be represented by a list with a single tuple. Any previously data contained in dest_tree is left untouched. feed_params – other arguments provide sequences of values for any parameter (except num_pop and seed), allowing to modify any set of parameters between simulations. All changes are permanent and affect any later simulation. All additional options must be in the key=value format, with have one of the parameters as key, and a sequence of values as value. All sequences must be of length at least equal to nrepet. If longer, additional values are ignored. For list-based parameters, each value must be a (index, value) tuple and, for matrix-based parameters, each value must be a (index1, index2, value) tuple. An iterator, yielding a reference to align if cs is None, or otherwise a dictionary of computed statistics.
params

Simulation parameters of this instance. This object is a instance of ParamDict, which is a specialized version of dict that does not let the users add or remove parameters, but lets them modify values of parameters (similarly, the number of items of per-population or per-site parameters cannot be modified).

simul()

Perform a single simulation based on the current value of parameters stored in params.

Returns: A new Align instance containing the simulated data unless dest is specified (otherwise None).

Note

The method iter_simul() can be much more efficient. For performance-critical applications, its use is recommended.

## Dictionary- and list-like classes¶

All parameters are accessible from the class Simulator. They are returned in a dictionary-like type that lets the user modify parameters but not modify the list of parameters. Note that none of the classes described in this section are meant to be built by the user.

### ParamDict¶

class egglib.coalesce.ParamDict(npop)

Emulator of dict for managing parameters. Do most of what a dictionary does except for adding and removing keys. The order of parameters is fixed and consistent.

In addition to the methods documented below, ParamDict instances support the following operations (where params in one instance):

Expression Action
len(params) Number of parameters
params[key] Get the value of parameter key
params[key] = value Assign value to parameter key (see note below)
for key in params Same as for key in params.iterkeys()
reversed(params) Return a reversed iterator
key in params Check if key in a parameter name
str(params) Representation of the instance as a dict

Note that the params[key] = value expression is straightforward only for parameters that have a single value. For parameters represented by a ParamList instance, this expression is only supported if the right-hand operand is a sequence of matching length (in order to set all values at once). For parameters represented by a ParamMatrix instance, this expression is only supported if the right-hand operand is another ParamMatrix instance of matching dimension. For events, this expression is not supported at all. In all cases where params[key] returns ParamList, ParamMatrix or a EventList instance, the returned value can be modified using its own methods.

add_event(cat, T, **params)

Add an event to the list. See the documentation for the class Simulator for more details on what is expected as arguments.

Parameters: cat – category of the event. T – date of the event. params – all needed parameters for this category of events.
copy()

Return a shallow copy of this instance. All modifications of the values of either copy will modify the other one (even modification of integer, float and string parameters).

disable_trans_matrix()

Disable the matrix of weights of transition between alleles (it is disabled by default, and automatically activated if any value is set). This sets all weights to 1.0.

get(key, default=None)

Return the value for key if key is one of the parameters, else return the value passed as default (by default, None). This method therefore never raises a KeyError.

get_values(other)

Update the dictionary with values from another ParamDict instance (with the same numbers of populations and sites).

has_key(key)

Return a boolean indicating if the passed name is one of the parameters.

items()

Return a copy of the list of (key, value) parameter name and value pairs.

iteritems()

Return an iterator over the (key, value) parameter name and value pairs.

iterkeys()

Return an iterator over the parameter names.

itervalues()

Return an iterator over the parameter values.

keys()

Return a copy of the list of parameter names.

mk_structure(skip_indiv=False)

Export a stats.Structure instance containing the structure information corresponding to currently loaded simulation parameters.

Parameters: skip_indiv – this argument determines whether the individuals level should be skipped (if True, alleles from each given individual are treated as belonging to separate haploid individuals. Warning The individual level cannot be processed if the ploidy is not constant (mixing sampled chromosomes and sampled individuals). A new stats.Structure instance.
set_migr(value)

Set all pairwise migration rates to value/(num_pop-1).

summary()

Return a string displaying all current values of parameters at the level of the C++ simulator. Use for debugging only, as the format is not guaranteed to be stable.

update(other=None, **kwargs)

Update the dictionary with the (key, value) parameter name and value pairs from the object passed as other, overwriting existing keys and raising a KeyError exception if an unknown parameter name is met.

Accepts dict instances, or else any iterable of (key, value) pairs. If keyword arguments are specified, the instance is then updated with those, as in param_dict.update(theta=5.0, recomb=2.5).

values()

Return a copy of the list of parameter values.

### ParamList¶

class egglib.coalesce.ParamList(getter, setter, num_values, check_item)

Emulator of list for managing values for a parameter. Do most of what a list does except for adding and removing values. The methods reverse() and sort() are also not provided as they make little sense. Expressions involving the + and * arithmetic operators are supported, but return genuine list instances that are disconnected from the original parameter holder.

In addition to the methods documented below, ParamList instances support the following operations (where items is a ParamList instance):

Expression Action
len(items) Number of items
items[i] Get ith item
items[i:j] Slice from i to j
items[i:j:k] Slice from i to j with step k
items[i] = value Assign value to parameter key
items[i:j] = values Replace a slice of values (with a sequence of the same length)
items[i:j:k] = values Replace a slice of values (with a sequence of the same length)
for item in items Iterates over items
reversed(items) Return a reversed iterator
item in items Check if item is present among items
items + other Same as list(items) + other (returns a list)
items * n Same as list(items) * n (returns a list)
str(items) Representation of the instance as a list

The indexing operator ([]) supports negative indexes (to count from the end) and slices operators, just as for the built-in type list. However, all operators (such as del) or methods (such as append(), extend(), remove()) that would change the length of the list are not available.

count(value)

Return the number of items that are equal to value.

index(value, imin=0, imax=None)

Return the index of the first value matching value. The returned value is >=imin and <imax if they are provided. Raise a ValueError if the value is not found.

### ParamMatrix¶

class egglib.coalesce.ParamMatrix(getter, setter, num_values, check_item, check_active, set_active)

Emulator of list for managing values for a parameter as a matrix. Similar to ParamList except that it holds a matrix. It therefore implements less methods (no support for + and * operators, no slices) and use double indexing system as in matrix[i, j] = x, allowing both setting and getting values. The iterator, such as in for i in matrix, flattens the matrix, returning all values from the first row, then those from the second tow, and so on (replacing the diagonal values by None). len(matrix) returns the dimension (number of rows, which is the same as the number of columns).

In addition to the methods documented below, ParamMatrix instances support the following operations (where matrix is a ParamMatrix instance):

Expression Action
len(matrix) Size of the matrix (as number of rows/colums)
matrix[i,j] Get jth item of the ith row (None for diagonal)
matrix[i,j] = value Assign value to (i,j)th item (no diagonal)
for item in matrix Iterates over items (row-first flat iteration)
reversed(matrix) Return a reversed iterator
item in matrix Check if item is present among items
str(matrix) Representation of the instance as a nested list (including diagonal)

It is not possible to slice-set ranges of values in the matrix, but one can set all values at once from a compatible source with get_values().

count(value)

Return the number of items that are equal to value.

get_values(other)

Get all values from other, which must be another ParamMatrix or a nested sequence (such as a list or list). The dimension of other must be the same as the current instance. Not that in the input object is not a ParamMatrix, any value it can have on the diagonal will be ignored (but they must be present to avoid shifting indexes).

index(value, imin=0, imax=None, jmin=0, jmax=None)

Return the (i, j) tuple of indexes of the first value (iterating first over rows and then over columns) matching value. The returned row index is >=imin and <imax and the returns column index is >=jmin and <jmax if they are provided. Raise a ValueError if the value is not found. The diagonal is not considered.

### EventList¶

class egglib.coalesce.EventList(npop, add, clear)

Class storing the list of demographic events. Even if events appear as unsorted, they are internally sorted so that the user does not have to care about their order. This class has limited functionality: add events (but not removing them), accessing and modifying parameters.

In addition to the methods documented below, EventList instances support the following operations (where events is an EventList instance):

Expression Action
len(events) Number of events currently loaded
events[i] Return a dictionary with parameters of the ith event (see note)
for event in events Same as for i in range(len(events)): event = events[i]
str(events) Representation of the instance, roughly as a list

The string representation is such as a string at the first level, but each event is represented as an angle-bracketed delimited string containing comma-separated key=value pairs with the parameter as key and its value as value (with an additional key, event_index, providing the index of the event in the list).

Note

There is no way to modify content of the instance using the indexing operator events[i]. One must use update().

More information on the list of events and their parameters is available in the documentation for class Simulator.

add(cat, T, **params)

Add an event to the list. See the documentation for the class Simulator for more details on what is expected as arguments.

Parameters: cat – category of the event. T – date of the event. params – all needed parameters for this category of events.
clear()

Clear list of events.

replace(other)

Replace own list of events with the one in the EventList instance passed as other. The current list of events is dropped.

update(event_index, **params)

Modify any parameter from one of the event of the list. If an event’s date is modified, sorting will be updated automatically.

Parameters: event_index – index of the event to modify (based on the order in which events have been specified with add()`, which is the same order as events appear when representing the instance or iterating). params – keyword arguments specifying what parameters to modify. Only parameters that have to be changed should be specified.