Coalescent simulator

Parameters

Params class

class

Coalescence simulation parameters holder.

Holds parameters for a coalescence simulation. Default values are:

  • k: 1
  • L: 0
  • R: 0.
  • theta: 0.
  • fixed: 0
  • mutmodel: KAM
  • TPMproba: 0.5
  • TPMparam: 0.5
  • K: 2
  • random_start_allele: false
  • n1: 0 for any population
  • n2: 0 for any population
  • N: 1. for any population
  • G: 0. for any population
  • s: 0. for any population
  • sitePos: not initialized (beware!)
  • siteW: 1. for any site
  • M: all rates at 0.0
  • transW_matrix: false
  • transW: 1. for any combination
  • changes: none
  • max_iter: 100000
See the corresponding setter methods for more details on the signification of parameters. Note that every population has an independent size, growth/decline rate and selfing rate (in addition to numbers of samples, obviously).

Header: <egglib-cpp/Params.hpp>

Public Types

enum egglib::Params::MutationModel

Specification for mutation models.

The possible values are:

  • KAM: fixed finite number of alleles.
  • IAM: infinite number of alleles.
  • SMM: stepwise mutation model.
  • TPM: two phase mutation model.
All models are constrained by the mutation parameter (theta or, if theta is 0, fixed). TPM is the only model that requires additional parameters (TPMproba and TPMparam see below).

Nucleotide sequences can be simulated using the KAM with K=2 or 4 (the latter should be used with models with a finite number of sites, to control homoplasy). When using an infinite site model, each site cannot experience more than one mutation. Therefore, it makes little sense to use any other mutation model than KAM with K=2.

In the KAM, each mutation cause a transition to one of the other alleles. Transition biases can be controlled using the tranW matrix (if tranW_matrix is true). The allele values are integers in the range 0, 1, …, K-1.

In the IAM, each mutation generate a new allele. Alleles values are integers in the range 0, 1, …, S-1 where S is the number of mutations having occurred for a particular site. The allele value indicates the rank of occurrence of the mutation, and should not be taken as having any biological signification.

In the SMM, each mutation change the allele value by a step of -1 or +1 (with equal probabilities). Alleles values can therefore have any negative or positive values.

In the TPM, each mutation change the allele value by a step of -n or +n (with equal probabilities). n is either 1 or drawn from a geometric distribution. TPMproba gives the probability that n follows a geometric distribution (that is, of not following a SMM) and TPMparam gives the shape parameter of the geometric distribution. The GSM (generalized stepwise model) can be obtained using the TPM by setting TPMproba to 1.0.

Values:

Public Functions

egglib::Params::Params()

Default constructor.

Parameters are initialized to default values. The number of populations is 1.

egglib::Params::Params(unsigned int npop, double migr)

Standard constructor.

Except the number of populations, the parameters are initialized to default values. The number of populations must be at least 1. All pairwise migration rates can be specified later (the optional argument is the per-population migration rate to be applied to all populations.

egglib::Params::~Params()

Destructor.

void egglib::Params::addChange(Event *e)

Add a demographic change.

The argument must be an instance of Event (but not of the none type). Each of the types represents a given type of demographic change (meaning is broad). All changes have a date parameter, specifying at what date the change should occur (in backward, coalescent time). The unit is 4N generations, where N is the number of diploi individuals of a population. Multiple changes can be specified (just call this method repetitively). Changes can be entered in any order (they are automatically sorted). However, if two changes appear to occur to the same date, the first entered will also be the first to be applied. Note that many parameters imply constraints on values: all dates must be >= 0, population indices must refer to a valid population, pairwise migration rate population indices must be different, selfing rate (and other probabilities must be >= 0 and <= 1, migration rates must be > 0. Beware that final coalescence of all sample must be allowed (beware of null migration rates and exponential decline rates). As a rule, the change, Params and Coal classes DO NOT perform sanity checks, so most invalid values will result in program crash, incorrect results or exceptions. The user should carefully check passed values or ensure passed values are valid for all change type used. Note that Params does not take ownership of the Event pointer and will not delete the object at destruction time.

void egglib::Params::autoSitePos()

Automatically set site positions.

Set all site positions such as they use all possible span over the simulated chromosome. If there only one site, it is set at position 0.5. If there are two sites, there are set a positions 0.0 and 1.0. If there are three sites, at 0.0, 0.5, 1.0, and so on.

void egglib::Params::clearChanges()

Clear the event list.

Event *egglib::Params::firstChange()

Get the first change.

Returns the first event of the queue (irrespective of the internal event counter). The value is NULL if there is no event loaded.

void egglib::Params::G_R(unsigned int pop, double value, double t)

Set a population relative growth/decline rate(reversible)

Parameters
  • pop -

    population index.

  • value -

    growth/decline rate.

  • t -

    time of the change.

unsigned int egglib::Params::get_fixed()
const

Get the fixed number of mutations.

This parameter is ignored if theta is different of 0.

double egglib::Params::get_G(unsigned int pop)
const

Get a population growth/decline rate.

unsigned int egglib::Params::get_K()
const

Get the number of alleles.

unsigned int egglib::Params::get_L()
const

Get number of mutable sites.

A value of 0 denotes an infinite number of sites.

unsigned long egglib::Params::get_max_iter()
const

Maximum number of iterations.

Params::MutationModel egglib::Params::get_mutmodel()
const

Get the mutation model value.

Detailed information is provided within the documentation of the enum type MutationModel.

double egglib::Params::get_N(unsigned int pop)
const

Get a population relative size.

unsigned int egglib::Params::get_n1(unsigned int pop)
const

Get the number of single samples in a population.

unsigned int egglib::Params::get_n2(unsigned int pop)
const

Get the number of double samples in a population.

void egglib::Params::get_nsam(unsigned int &ns, unsigned int &no)

Gives the number of ingroup/outgroup samples (including delayed samples)

unsigned int egglib::Params::get_outgroup()
const

Get label identifying outgroup samples.

MISSING for none.

double egglib::Params::get_R()
const

Get the recombination rate.

bool egglib::Params::get_random_start_allele()
const

Get random start allele boolean.

double egglib::Params::get_s(unsigned int pop)
const

Get a population selfing rate.

double egglib::Params::get_sitePos(unsigned int site)
const

Get a site position.

double egglib::Params::get_siteW(unsigned int site)
const

Get a site weight.

double egglib::Params::get_theta()
const

Get the mutation rate.

If theta is 0, the value of fixed is considered.

double egglib::Params::get_TPMparam()
const

Get the shape parameter of the TPM distribution.

This parameter is ignored if the mutation model is any other than TPM. This parameter sets the shape of the geometric distribution from which mutation steps are drawn with probability given by TPMproba.

double egglib::Params::get_TPMproba()
const

Get the TPM probability parameter.

This parameter is ignored if the mutation model is any other than TPM. The probability gives the probability that mutation steps are drawn from a geometric distribution (otherwise, they are set to 1).

bool egglib::Params::get_transW_matrix()
const

Allele transition matrix flag

If true, transition weights between alleles are taken from a matrix (values must be set using method transW).

double egglib::Params::get_transW_pair(unsigned int i, unsigned int j)
const

Get an allele transition matrix entry.

This method should not be used to access to diagonal items. If transW_matrix() is false, this method always return 1.

double egglib::Params::get_transW_row(unsigned int i)
const

Get transition weight sum for an allele.

If transW_matrix() is false, this method always return the number of alleles minus one (the weight is always 1).

unsigned int egglib::Params::k()
const

Get the number of populations.

double egglib::Params::lastChange(unsigned int pop)
const

Date of the last size change for a given pop.

Migration &egglib::Params::M()

Get the migration matrix.

void egglib::Params::N_R(unsigned int pop, double value, double t)

Set a population relative size (reversible)

Parameters
  • pop -

    population index.

  • value -

    population size.

  • t -

    time of the change.

unsigned int egglib::Params::nDSChanges()
const

Number of delayed sample changes left to apply.

double egglib::Params::nextChangeDate()
const

Get the date of the next change.

Returns the date of the next change to be applied. The number will be >= 0 if there is at least one change, and egglib::UNDEF otherwise.

void egglib::Params::nextChangeDo(Coalesce *coal)

Apply next change.

The argument must be the address of the Coalesce instance that should be modified (note that not all change types will require affecting the Coalesce instance directly).

unsigned int egglib::Params::numChanges()
const

Get the number of changes.

void egglib::Params::R_R(double value)

Set the recombination rate (reversible)

void egglib::Params::restore()

Restore object to the initial state.

All variable parameters (that is, those who have a reversible method, and the number of populations) will be reset to the last value set using a normal setter (or to the default value).

void egglib::Params::s_R(unsigned int pop, double value, double t)

Set a population selfing rate (reversible)

Parameters
  • pop -

    population index.

  • value -

    selfing rate.

  • t -

    time of the change.

void egglib::Params::set_fixed(unsigned int value)

Set the fixed number of mutations.

This parameter is ignored if theta is different of 0. The value can be 0.

void egglib::Params::set_G(unsigned int pop, double value)

Set a population relative growth/decline rate.

The population index must be smaller than the last number of population specified. If the rate is > 0, then the population is experiencing an exponential growth (was smaller in the past). If the rate is < 0, then the population is experiencing an exponential decline (was bigger in the past). Caution is required when using exponential population decline. Going back in time, population size can grow to excessive size and cause infinite coalescence time (this will cause a runtime error). To prevent this, always ensure that population decline stops at some point in the past and that coalescence of remaining lineages can occur.

Parameters
  • pop -

    population index.

  • value -

    growth/decline rate.

void egglib::Params::set_K(unsigned int value)

Set the number of alleles.

The value must be at least 2. If the transition matrix flag is on, calling this method will reset all weights to default value (all weights to 1, sum to number of alleles minus one). Also if the new value is smaller than or equal to the previous one.

void egglib::Params::set_L(unsigned int value)

Set number of mutable sites.

A value of 0 denotes an infinite number of sites. The site weights are initialized to 1, but the site positions are not initialized.

Parameters
  • value -

    new number.

void egglib::Params::set_max_iter(unsigned long)

Maximum number of iterations.

void egglib::Params::set_mutmodel(MutationModel value)

Set the mutation model.

Detailed information is provided within the documentation of the enum type MutationModel.

void egglib::Params::set_N(unsigned int pop, double value)

Set a population relative size.

The population index must be smaller than the last number of population specified. The population size must be larger than 0.

Parameters
  • pop -

    population index.

  • value -

    population size.

void egglib::Params::set_n1(unsigned int pop, unsigned int value)

Set the number of single samples in a population.

The population index must be smaller than the last number of populations specified. Single samples are individuals in which only one allele is sampled. If the selfing rate is 0, there is no difference between 2n single samples and n double samples (or any combination summing up to a total of 2n alleles).

Parameters
  • pop -

    population index.

  • value -

    number of single samples.

void egglib::Params::set_n2(unsigned int pop, unsigned int value)

Set the number of double samples in a population.

The population index must be smaller than the last number of populations specified. Double samples are individuals in which both alleles are sampled. They therefore actually represent two samples (which are correlated whenever the selfing rate is larger than 0). If the selfing rate is 0, there is no difference between 2n single samples and n double samples (or any combination summing up to a total of 2n alleles).

Parameters
  • pop -

    population index.

  • value -

    number of double samples.

void egglib::Params::set_outgroup(unsigned int lbl)

Set label identifying outgroup samples.

MISSING for none.

void egglib::Params::set_R(double value)

Set the recombination rate.

The value must be at least 0.

void egglib::Params::set_random_start_allele(bool value)

Set random start allele boolean.

If true, the coalescent simulator will use as start allele a random value in the range [0, K-1]. If false, it will use 0.

void egglib::Params::set_s(unsigned int pop, double value)

Set a population selfing rate.

The population index must be smaller than the last number of population specified. The selfing rate must be >= 0 and <= 1.

Parameters
  • pop -

    population index.

  • value -

    selfing rate.

void egglib::Params::set_sitePos(unsigned int site, double value)

Set a site position.

The site index must be smaller than L. The site position must be >= 0 and <= 1. It is assumed that sites are ranked by increased site position, but this is not enforced.

Parameters
  • site -

    site index.

  • value -

    site position.

void egglib::Params::set_siteW(unsigned int site, double value)

Set a site weight.

Site weights give the relative mutation probability of sites. The higher the site weight, the more likely that a mutation will hit this site. Note that the sum of weights is not required to sum up to one. Weights don’t affect the total mutation rate. Once a mutation occurs, they determine which site is affected.

The site index must be smaller than L. The site weight can be any strictly positive value.

Parameters
  • site -

    site index.

  • value -

    site weight.

void egglib::Params::set_theta(double value)

Set the mutation rate.

If theta is 0, the value of fixed is considered. The value cannot be less than 0.

void egglib::Params::set_TPMparam(double value)

Set the shape parameter of the TPM distribution.

This parameter is ignored if the mutation model is any other than TPM. This parameter sets the shape of the geometric distribution from which mutation steps are drawn with probability given by TPMproba. The passed value should be >= 0 and <= 1.

void egglib::Params::set_TPMproba(double value)

Set the TPM probability parameter.

This parameter is ignored if the mutation model is any other than TPM. The probability gives the probability that mutation steps are drawn from a geometric distribution (otherwise, they are set to 1). The value must be at least 0. and at most 1.

void egglib::Params::set_transW_matrix(bool flag)

Allele transition matrix flag

If true, transition weights between alleles are taken from a matrix (values must be set using method transW). Setting the flag to true will reset all weights to default values: all weights to 1, sum to number of alleles minus one. Also if the new value is smaller than or equal to the previous one (even if the flag was already true).

void egglib::Params::set_transW_pair(unsigned int i, unsigned int j, double value)

Set an allele transition matrix entry.

Set transition weights from allele i to allele j to value. Transition weights are not required to sum up to 1. If the sum is different of 1, it will not affect the overall mutation rate. If a mutation occur and if the current allele is the one given by i, the value will determine the relative probability of mutating to the allele given by j. The value must be strictly larger than 0. Both i and j must be smaller than the number of alleles (K), and they must be different to each other. It is not allowed to call this method if transW_matrix() is false (and it will likely result in a crash).

std::string egglib::Params::summary()
const

String summary of the parameter values.

The string is generated each time this method is called.

double egglib::Params::totalSiteW()
const

Get the sum of site weights.

void egglib::Params::validate()
const

Check consistency of entered parameters.

This methods ensures that the number of populations of the migration matrix is consistent with the one of the current Params instance, and that all changes have valid parameter values (increase date, valid indices, etc.). The method throws an EggArgumentValueError for the first encounterer problem.

Migration class

class

Arbitrary migration matrix type.

This class represents an arbitrary migration model where any pairwise can be specified independently.

Header: <egglib-cpp/Params.hpp>

Public Functions

egglib::Migration::Migration(unsigned int n, double M)

Constructor.

The migration rate argument is per population. By default, all pairwise migration rates are M/(n-1).

Parameters
  • n -

    number of populations.

  • M -

    migration rate.

egglib::Migration::~Migration()

Destructor.

double egglib::Migration::get_pair(unsigned int i, unsigned int j)
const

Get a pairwise migration rate.

Return the pairwise migration from population i to population j. Both arguments should be smaller than n and i should be different of j. It is not guaranteed that all errors will properly generate an exception.

double egglib::Migration::get_row(unsigned int i)
const

Get a population migration rate.

Return the sum of pairwise migration rates from this population. If i>=n1*n2, the behaviour is undefined.

unsigned int egglib::Migration::n()
const

Number of populations.

void egglib::Migration::restore()

Restore to initial state.

void egglib::Migration::set_all(double M)

Set all migration rates.

The migration rate argument is per population. Change all pairwise migration rates to the value M/(n-1). This method is definitive - the previous value is lost.

void egglib::Migration::set_all_R(double M)

Set all migration rates (recursive)

The migration rate argument is per population. Change all pairwise migration rates to the value M/(n-1). This method is reversible by the restore method.

void egglib::Migration::set_pair(unsigned int i, unsigned int j, double m)

Set a given pairwise migration rate.

The migration rate argument is the pairwise value. The diagonal sum is automatically updated. i and j should both be smaller than n, and should not be equal. If any of these conditions is not met, the behaviour is not defined and is likely to be unpleasant. The user is prompted to be particularly vigilant regarding the fact that i and j must be different. This method is definitive - the previous value is lost.

void egglib::Migration::set_pair_R(unsigned int i, unsigned int j, double m)

Set a pairwise migration rate (reversible)

The migration rate argument is the pairwise value. The diagonal sum is automatically updated. i and j should both be smaller than n, and should not be equal. If any of these conditions is not met, the behaviour is not defined and is likely to be unpleasant. The user is prompted to be particularly vigilant regarding the fact that i and j must be different. This method is definitive - the previous value is lost. This method is reversible by the restore method.

void egglib::Migration::set_row(unsigned int i, double M)

Set migration rates of a given source population.

Change all pairwise migration rates originating from the specified population. The migration rate argument is per population. Change all pairwise migration rates to the value M/(n-1). population should be smaller than n. If this condition is not met, the behaviour is not defined. This method is definitive - the previous value is lost.

void egglib::Migration::set_row_R(unsigned int i, double M)

Set migration rates from a pop (recursive)

Change all pairwise migration rates originating from the specified population. The migration rate argument is per population. Change all pairwise migration rates to the value M/(n-1). population should be smaller than n. If this condition is not met, the behaviour is not defined. This method is reversible by the restore method. In M/(n-1).

Event class

class

Generic class for all historical events.

All types of events are implemented by this class. To create an event of a given type, pass the date and the appropriate flag to the constructor and then set the correct parameters. The table below gives the meaning of parameters for all types of events (blank when the parameter is undefined).

* | type       | date | param             | index       | dest     | number1         | number2         |
* | :--------- | :--: | :---------------: | :---------: | :------: | :-------------: | :-------------: |
* | none       | +    |                   |             |          |                 |                 |
* | change_N   | +    | new size          | population* |          |                 |                 |
* | change_M   | +    | new popwise rate  |             |          |                 |                 |
* | change_Mp  | +    | new pairwise rate | population  | dest pop |                 |                 |
* | change_G   | +    | new rate          | population* |          |                 |                 |
* | change_s   | +    | new rate          | population* |          |                 |                 |
* | change_R   | +    | new rate          |             |          |                 |                 |
* | delayed    | +    |                   | population  | label    | haploid samples | diploid samples |
* | admixture  | +    | migr proba        | source pop  | dest pop |                 |                 |
* | bottleneck | +    | strength          | population* |          |                 |                 |
*

(*) for those events (and them only), it is possible to use egglib::MAX as population index to specify “all populations”.

Public Types

enum egglib::Event::Type

Type of the event.

Values:

no event (use only as list head)

change population size

change all migration rates

change a pairwise migration rate

change exponential growth rate/decline

change autofertilization rate

change recombination rate

cause immediate coalescences in a population

move lineages from a population to another

delayed sample

Public Functions

egglib::Event::Event(Type type, double date)

Constructor.

Depending on the object type, it is required to specify at least one other parameter in addition to the date.

void egglib::Event::copy(const Event &src)

Copy all members (except links) from a source.

double egglib::Event::date()
const

Get date.

void egglib::Event::disconnect()

Disconnect all events in chain (this and down)

Event::Type egglib::Event::event_type()
const

Get type.

unsigned int egglib::Event::get_dest()
const

Get destination population index or label for delayed events.

unsigned int egglib::Event::get_index()
const

Get locus or main population index.

unsigned int egglib::Event::get_number1()
const

Get first number.

unsigned int egglib::Event::get_number2()
const

Get second number.

double egglib::Event::get_param()
const

Get parameter.

void egglib::Event::insert(Event *event)

Insert an event in the chain.

The current previous and next links value of the passed instance (if any) will be ignored and overwritten. This method should be called only on the head item of a chain. It is illegal to load a none type.

void egglib::Event::move(double date)

Change the date and update position.

The event is moved either up or down the chain of events until its date is at the right location.

It is possible to call this method on an unconnected instance.

Event *egglib::Event::next()

Next event (NULL if end)

void egglib::Event::perform(Params *param, Coalesce *coal)

Apply the event.

The parameter set is modified and (for some events only) the coalesce instance as well. It is illegal to call this on an event of none type.

Event *egglib::Event::prev()

previous event (NULL if end)

void egglib::Event::set_dest(unsigned int d)

Set the event destination population.

For admixture and pairwise migration rate change events: sets the index of the destination population. For delayed sample events: set the value used as population label for this sample.

void egglib::Event::set_index(unsigned int i)

Set the event index.

Sets the population (for event types that support it) that should be affected by the event. All events except recombination rate changes and global migration rate changes must have a value for this parameter (except the type “none” that must not be used as a genuine event). The value egglib::MAX may be used (but only for some kinds of events; see class description) to specify “all populations”.

void egglib::Event::set_number1(unsigned int n)

Set the number of haploid samples.

Number of samples with one sampled allele (for delayed sample events).

void egglib::Event::set_number2(unsigned int n)

Set the number of diploid samples.

Number of samples with both sampled alleles (for delayed sample events).

void egglib::Event::set_param(double p)

Set the event parameter.

Most types have a value for this, but the meaning varies: it is generally a new rate (migration, recombination, selfing) or a new size (population size) and also the strength for bottleneck and instant emigration for admixture.

Main class

class

Coalescence simulation manager.

The coalesce object must take a Params address. The Params object will not modified. Coalesce objects cannot be created without passing a valid Params address (NULL is out of question) and cannot be copied either. Not that all action must be effected using the simul method, including setting parameters. The other methods such as coalescence, delayedSample and so on are meant to be called by change types and should not be used directly. This does not apply, of course, to DataHolder and tree getters.

Header: <egglib-cpp/Coalesce.hpp>

Public Functions

egglib::Coalesce::Coalesce()

Constructor.

egglib::Coalesce::~Coalesce()

Destructor.

void egglib::Coalesce::admixt(unsigned int source, unsigned int dest, double proba)

Admixture.

Instant migration of lineages from a given population to another.

It is legal to provide a probability equal to 0 or 1, and to use a source population which does not actually have migrants. It is not leage to use equal source and dest indexes.

Parameters
  • source -

    index of the population providing migrants.

  • dest -

    index of the population accepting migrants.

  • proba -

    migration probability.

void egglib::Coalesce::bottleneck(unsigned int pop, double duration)

Bottleneck.

Perform coalescence in the specified population during a given amount of time.

void egglib::Coalesce::coalescence(unsigned int pop, unsigned int i, unsigned int j)

Perform a single coalescence event.

It is not required that i<j. However it is required that i!=j and that all indices are within their respective ranges.

Parameters
  • pop -

    population index.

  • i -

    index of the first lineage to coalesce.

  • j -

    index of the second lineage to coalesce.

DataHolder const *const egglib::Coalesce::data()
const

Get simulated genotypes.

Get the DataHolder (as a matrix object) corresponding to the last simulated data set. Only meaningful if the simul() method has been called with mutate=true. Otherwise, the returned instance will correspond to the previous simulations, or will be empty. In all cases, there will be two levels of structure, the first corresponding to populations (from 0 to k-1) where k is the number of populations, and the second corresponding to individual indices (ranging from 0 to the total number of sampled individuals over all populations, minus one). An individual is represented by either one or two consecutive samples. The allele values are depending on the mutation model. For FSS, they are restricted to the range [0, K-1] where K is the number of allowed alleles, while for other models unrestricted positive and negative values are allowed. In the infinitely many sites model (that is, if the sites parameter is 0), only the variable sites are exported, and all exported sites are variables. If the number of sites is finite, all sites are exported and there can be invariable sites. Beware that the address to the internally stored object is returned, and the address must be considered invalid after the next call to the simul method.

void egglib::Coalesce::delayedSample(double date, unsigned int pop, unsigned int n1, unsigned int n2)

Add a delayed sample to the simulation.

This method ought to be called by the class DelayedSample.

Parameters
  • date -

    date of the event (should be larger than the current date).

  • pop -

    index of the population (should be smaller than the current number of populations).

  • n1 -

    number of single samples.

  • n2 -

    number of double samples.

void egglib::Coalesce::migrate(unsigned int source, unsigned int i, unsigned int dest)

Perform a single migration event.

It is required that all indices are within their repestive ranges.

Parameters
  • source -

    source population.

  • i -

    index of the lineage to migrate.

  • dest -

    destination population.

void egglib::Coalesce::mutate()

Perform mutation only.

The simul() method must have been called previous (with the mutate boolean true or false). The DataHolder will be regenerated.

unsigned int egglib::Coalesce::number_of_trees()
const

Number of trees.

This method provides access to the number of tree, or number of recombination segments (equal to the number of recombination events plus one). If no recombination event has occurred, the value is 1. If no simulation has been performed, the value is 0.

void egglib::Coalesce::simul(Params *params, Random *random, bool mutate)

Perform a simulation.

Parameters
  • params -

    the address of a Params instance to provide parameters. The Params object will be restored after completion of the simulation.

  • random -

    address of a random number generator.

  • mutate -

    if this boolean is false, mutation phase will be skipped and the DataHolder member will keep any previous value (or will be empty otherwise). In all cases the trees will be generated.

double egglib::Coalesce::site_position(unsigned int mut)
const

Get the position of a mutation.

This method must not be called if the number of sites was fixed to a non-zero value in the Params instances used to configure simulations. This method returns the position of a given mutation when the mutation was randomly drawn. It is guaranteed that positions are in increasing order. It is required that the index is smaller than the number of mutations drawn from the last simulation.

Tree const *const egglib::Coalesce::tree(unsigned int i)
const

Get a tree.

This method provides access to a given tree. The passed index must be smaller than the value returned by the method number_of_trees(). Obviously, a simulation must have been performed. The returned pointer refers to an instance which is stored within the Coalesce object and managed by it. The object can not be modified by any means. Out of range values might result in program crash or even worse. Beware that the address to the internally stored object is returned, and the address must be considered invalid after the next call to the simul() method.

double egglib::Coalesce::tree_start(unsigned int i)
const

Get the start position of a tree.

double egglib::Coalesce::tree_stop(unsigned int i)
const

Get the stop position of a tree.

ARG representation

class

This class handles a bifurcating tree.

Each tree corresponds to a given segment of chromosome. The class provides a method for generating allelic data for a given site, and exposes to root node, allowing recursive tree exploration.

Header: <egglib-cpp/Tree.hpp>

Public Functions

egglib::Tree::Tree(unsigned int numberOfLeaves, double start, double stop)

Constructor.

The constructor builds a preliminary tree consisting of only leaves, all available for coalescence. Leaves are labelled from 0 to (numberOfLeaves - 1).

Parameters
  • numberOfLeaves -

    number of leaves (unconnected).

  • start -

    start of the region covered by the tree.

  • stop -

    end of the region covered by the tree

egglib::Tree::Tree(const Tree &tree)

Copy constructor.

Deep copy of the tree structure.

virtual egglib::Tree::~Tree()

Destructor.

unsigned int egglib::Tree::addNode(double t, unsigned int label)

Add a node and return its index.

void egglib::Tree::clear_mutations()

Remove already stored mutations.

unsigned int egglib::Tree::coal(unsigned int nodeIndex1, unsigned int nodeIndex2, double time)

Coalesce two lineages

The two indices must be different and be valid indices for this Tree object.

Return
The index of the parent node.
Parameters
  • nodeIndex1 -

    first node to coalesce.

  • nodeIndex2 -

    second node to coalesce.

  • time -

    absolute time point of the event.

double egglib::Tree::cov()
const

Length of the segment covered by the tree.

The tree length is not taken into account; this is exactly stop() - start().

double egglib::Tree::L()
const

Total tree length.

unsigned int egglib::Tree::mutate(unsigned int site, DataHolder &data, const Params *params, Random *random)

Generate genetic data for a given site.

The DataHolder instance should have the correct dimensions but its values do not need to be initialized.

Return
The number of mutations (>=0) found at this site.
Parameters
  • site -

    index of the site (that is, column of the DataHolder) for which mutations should be generated.

  • data -

    DataHolder object in which mutations should be placed.

  • params -

    parameters holder (where parameters should be read).

  • random -

    a random number generator.

unsigned int egglib::Tree::nnodes()
const

Get the number of nodes.

Node *const egglib::Tree::node(unsigned int i)
const

Get a node address by its index.

The index must be within range.

Tree &egglib::Tree::operator=(Tree tree)

Copy assignment operator.

Deep copy of the tree structure.

void egglib::Tree::recomb(double point, Tree *new_tree)

Generate a new tree by recombination.

It is assumed that the recombination point lies between the start and stop positions of this tree. This tree is copied to the new tree whose address is passed. The current tree’s interval is shrunk to the what is at the left of the recombination point (from start to point), and the complement to the new tee.

void egglib::Tree::reset(unsigned int numberOfLeaves, double start, double stop)

Reset to initial state.

Restore the object as newly created, but retain allocated memory arrays. Leaves are automatically labelled from 0 up to (numberOfLeaves - 1).

Parameters
  • numberOfLeaves -

    number of leaves (unconnected).

  • start -

    start of the region covered by the tree.

  • stop -

    end of the region covered by the tree

Node *egglib::Tree::root()
const

Get the root of the tree.

It is mandatory that the tree is completed (all coalescence has been performed, with only one lineage left) before using this method. In this case, recursing over node descendants allows to explore the tree.

double egglib::Tree::start()
const

Start position of the segment covered by the tree.

double egglib::Tree::stop()
const

End position of the segment covered by the tree.

class

Implements a node of bifurcating tree.

Companion of the class Tree. A given Node instance is connected to either no or two other instances (“sons”). Sons addresses are lost by calling the destructor or the assignment operator. The sons are never destroyed.

The member default values are:

Header: <egglib-cpp/Tree.hpp>

Public Functions

egglib::Node::Node()

Default constructor.

Initialize members to default values.

egglib::Node::Node(const Node &src)

Copy constructor.

Copy members values and sons addresses.

virtual egglib::Node::~Node()

Destructor.

void egglib::Node::addMutation(unsigned int site, double pos)

Add a mutation.

Parameters
  • site -

    site index

  • pos -

    site position (between 0 and 1)

void egglib::Node::clear_mutations()

Remove already stored mutations.

double egglib::Node::date()
const

Get node date.

double egglib::Node::get_L()
const

Get branch length.

bool egglib::Node::is_terminal()
const

Internal or terminal node.

unsigned int egglib::Node::label()
const

Gets label.

double egglib::Node::mutationPos(unsigned int mut)
const

Get the site position of a mutation.

Warning: no out-of-bound check is performed!

TODO check if this method is needed anyway

unsigned int egglib::Node::mutationSite(unsigned int mut)
const

Get the site index of a mutation.

Warning: no out-of-bound check is performed!

unsigned int egglib::Node::nmut()
const

Number of mutations borne by this node.

Node &egglib::Node::operator=(const Node &src)

Assignment operator.

Copy members values and sons addresses.

void egglib::Node::reset()

Restore initial state.

Deliver object as new, but retain reserved memory.

void egglib::Node::set_internal(double t, unsigned int son1, unsigned int son2)

Specify an internal node.

Other members than those specified by argument are not changed (except boolean is_terminal).

Parameters
  • t -

    node time.

  • son1 -

    first descending node.

  • son2 -

    second descending node.

void egglib::Node::set_L(double value)

Set branch length.

The branch is going from the ancestror node to this node. As a result, the root has a branch length of 0.

void egglib::Node::set_terminal(double t, unsigned int label)

Specify a terminal node.

Other members than those specified by argument are not changed (except boolean is_terminal).

Parameters
  • t -

    node time (>=0).

  • label -

    sample label.

unsigned int egglib::Node::son1()
const

Get first son.

If the node is terminal, UNKNOWN is returned.

unsigned int egglib::Node::son2()
const

Get second son.

If the node is terminal, UNKNOWN is returned.

The last one is only for being used during simulations:

class

Implements a single ARG lineage.

This class implements a (recombinable and coalescable) lineage at the ancestral recombination graph (ARG) level. It essentially points to a node in each of the trees used to model the ARG.

Header: <egglib-cpp/Lineage.hpp>

Public Functions

egglib::Lineage::Lineage(unsigned int ntrees)

Constructor (no default constructor)

Initializes the size of the node mapping array (node index in each of the trees of the ARG). The array cells are not initalized.

Parameters
  • ntrees -

    number of Trees of the ARG (if the ARG has more trees, excess ones are considered not to be covered by this lineage.

virtual egglib::Lineage::~Lineage()

Destructor.

void egglib::Lineage::addTree(unsigned int node, double cov)

Add a cell to the node mapping array.

Parameters
  • node -

    node index (with respect to the tree).

  • cov -

    coverage of the tree (increment the total coverage of the lineage).

double egglib::Lineage::cov()
const

Get total coverage.

This is the sum of tree coverage (stop-start) for all trees where this lineage is represented.

unsigned int egglib::Lineage::get_node(unsigned int treeIndex)
const

Access node mapping cell.

This methods returns UNKNOWN if the considered tree is not covered. The index must be in range.

void egglib::Lineage::reset(unsigned int ntrees)

Reset the number of tree.

It allocates the node mapping array (only allocate if no reserved memory available). This method does not initialize the content of the table. It sets the coverage to 0.

void egglib::Lineage::set_node(unsigned int index, unsigned int node, double cov)

Set node mapping cell.

Indices must be valid, except that node must be UNKNOWN for trees that are not covered.

Parameters
  • index -

    tree index.

  • node -

    node index (with respect to the tree).

  • cov -

    coverage of the tree (increment the total coverage of the lineage).