Core components¶
Constants¶
These constants are used to give special meaning to input parameters or return values of functions. They usually mean “data not available” or “irrelevant value”. Different names are given to the same value so that each name is relevant to its context. Read the documentation carefully.

const int
MISSINGDATA
¶ Missing data (large value)

const unsigned int
MAX
¶ Unknown/undefined (large value)

const unsigned int
UNKNOWN
¶ Unknown value (large value)

const unsigned int
MISSING
¶ Missing data (large value)

const unsigned int
OUTGROUP
¶ Outgroup (large value)

const unsigned long int
BEFORE
¶ Value before the first (large value)

const char
MAXCHAR
¶ Unknown value (large value)

const double
UNDEF
¶ Unknown value (small / very negative value)
Basic classes¶
DataHolder¶
 class
Integer data set.
Holds a data set with associated sample names and group information. The data consists of given numbers of ingroup and outgroup samples, which can all have a different number of sites, unless the object is configured to be a matrix. In that case, it is assumed that all samples have the same number of sites as the first loaded sample. There can be any number of group levels (but this number must be the same for all samples), meaning that samples can be described by several group labels in addition to their name. Group labels are not group indices (they do not need to be consecutive). There is a separate data set for samples belonging to the outgroup. There can be any number of outgroup samples. The outgroup always has one level of group labels, but the labels are not initialized. All data are represented by signed integers. Note that none of the accessors performs out-of-bound checking: the user is responsible for providing valid indices. This class follows a memory caching system: allocated memory is never freed, so that the same object can be reused efficiently.
Header: <egglibcpp/DataHolder.hpp>
Public Functions

egglib::DataHolder::
DataHolder
(bool is_matrix)¶ Default constructor.
Create an empty data set. The object becomes usable only once the resizing methods have been called.
 Parameters
is_matrix
determines if the object is configured as a matrix (that is, where all ingroup and outgroup samples have the same number of sites). If so, the user is responsible for ensuring that all loaded samples are consistent. This value can be changed using the method set_is_matrix(bool).

egglib::DataHolder::
DataHolder
(const DataHolder &src)¶ Copy constructor.
The reserved memory of the source is not copied.

virtual
egglib::DataHolder::
~DataHolder
()¶ Destructor.

void
egglib::DataHolder::
clear
(bool is_matrix)¶ Clear object.
Actually clears all memory stored by the object (including cache). All memory vector data are effectively lost and memory is released.

void
egglib::DataHolder::
del_sample_i
(unsigned int sam)¶ Delete a sample.
Delete the specified sample and decrease the index of all subsequent samples by one. If there is only one sequence in the instance, set the number of sites (the maximal number of sites for non-matrix objects) to 0.

void
egglib::DataHolder::
del_sample_o
(unsigned int sam)¶ Delete an outgroup sample.
Delete the specified sample and decrease the index of all subsequent samples by one. If there is only one sequence in the instance, set the number of sites (the maximal number of sites for non-matrix objects) to 0.

void
egglib::DataHolder::
del_sites
(unsigned int start, unsigned int stop)¶ Delete a given range of sites.
The sites are removed for all ingroup and outgroup samples. The index must be valid. This method may be used for both matrix and non-matrix objects.
If the stop argument is larger than the number of sites (the number of sites for this sample in the case of a non-matrix object), then sites are removed until the end of the sequence. If the start argument is larger than or equal to the number of sites (the number of sites for this sample in the case of a non-matrix object), then nothing is done.
 Parameters
start
start position of the range to remove.
stop
stop position of the range to remove (this site IS NOT removed).
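The range semantics described above (half-open range, stop clamped to the sequence length, nothing done when start is past the end) can be sketched on a plain std::vector<int>. This is only an illustration: erase_sites_sketch is a hypothetical helper, not an EggLib function, and it ignores DataHolder's memory caching.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of EggLib): erase the half-open range
// [start, stop) from one sequence. The site at index stop is kept.
void erase_sites_sketch(std::vector<int>& seq, std::size_t start, std::size_t stop) {
    if (start >= seq.size()) return;     // start past the end: nothing is done
    stop = std::min(stop, seq.size());   // stop past the end: erase up to the end
    if (stop <= start) return;           // empty range
    seq.erase(seq.begin() + static_cast<std::ptrdiff_t>(start),
              seq.begin() + static_cast<std::ptrdiff_t>(stop));
}
```

With this convention, calling the helper with start 1 and stop 3 removes the sites at indices 1 and 2 only.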

void
egglib::DataHolder::
del_sites_i
(unsigned int sam, unsigned int start, unsigned int stop)¶ Delete a given range of sites for an ingroup sample.
As del_sites(unsigned int, unsigned int) but for a single sample. This method may not be called on a matrix object.

void
egglib::DataHolder::
del_sites_o
(unsigned int sam, unsigned int start, unsigned int stop)¶ Delete a given range of sites for an outgroup sample.
As del_sites(unsigned int, unsigned int) but for a single sample. This method may not be called on a matrix object.

unsigned int
egglib::DataHolder::
find
(unsigned int sam, bool of_outgroup, VectorInt &motif, unsigned int start, unsigned int stop)¶
const Find the start position of the first match of a motif.
 Return
 The index of the start position of the first exact match for the passed set of values, or egglib::MAX if no match was found (within the specified region).
 Parameters
sam
sample index.
of_outgroup
specifies whether the sample is in the outgroup.
motif
the list of integers representing the motif to be found.
start
position at which to start the search. No returned value will be smaller than this value.
stop
position at which to stop the search (the motif cannot overlap this position). No returned value will be larger than stop - n, where n is the length of the motif.
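The search bounds can be illustrated with a minimal sketch on a std::vector<int>. The names find_motif_sketch and NOT_FOUND_SKETCH are hypothetical; the latter stands in for egglib::MAX, whose actual value is only documented as "large". Returning the sentinel for an empty motif is an assumption of this sketch.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Stand-in for egglib::MAX (assumption: the largest unsigned int).
const unsigned int NOT_FOUND_SKETCH = std::numeric_limits<unsigned int>::max();

// Return the first position in [start, stop - n] where the motif matches
// exactly (n is the motif length), or NOT_FOUND_SKETCH if there is none.
unsigned int find_motif_sketch(const std::vector<int>& seq,
                               const std::vector<int>& motif,
                               unsigned int start, unsigned int stop) {
    const std::size_t n = motif.size();
    const std::size_t end = stop < seq.size() ? stop : seq.size();
    if (n == 0 || n > end) return NOT_FOUND_SKETCH;
    for (std::size_t pos = start; pos + n <= end; ++pos) {
        bool match = true;
        for (std::size_t k = 0; k < n; ++k) {
            if (seq[pos + k] != motif[k]) { match = false; break; }
        }
        if (match) return static_cast<unsigned int>(pos);
    }
    return NOT_FOUND_SKETCH;
}
```

Callers should compare the result with the sentinel before using it as an index.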

unsigned int
egglib::DataHolder::
get_group_i
(unsigned int sam, unsigned int lvl)¶
const Get a group label.
The indices must be valid, otherwise a segmentation fault or aberrant behaviour will occur.
 Parameters
sam
sample index.
lvl
group level index.

unsigned int
egglib::DataHolder::
get_group_o
(unsigned int sam)¶
const Get the group label for an outgroup sample.
The index must be valid, otherwise a segmentation fault or aberrant behaviour will occur. There is necessarily one group level for outgroups, and the default value is 0.
 Parameters
sam
sample index.

int
egglib::DataHolder::
get_i
(unsigned int sam, unsigned int sit)¶
const Get an ingroup data entry.
The indices must be valid, otherwise a segmentation fault or aberrant behaviour will occur.
 Parameters
sam
sample index.
sit
site index.

bool
egglib::DataHolder::
get_is_matrix
()¶
const Check if the object is configured to be a matrix.

const char *
egglib::DataHolder::
get_name_i
(unsigned int sam)¶
const Get an ingroup name.

const char *
egglib::DataHolder::
get_name_o
(unsigned int sam)¶
const Get an outgroup name.

unsigned int
egglib::DataHolder::
get_ngroups
()¶
const Get the number of group levels.

unsigned int
egglib::DataHolder::
get_nsam_i
()¶
const Get the number of ingroup samples.

unsigned int
egglib::DataHolder::
get_nsam_o
()¶
const Get the number of outgroup samples.

unsigned int
egglib::DataHolder::
get_nsit
()¶
const Get the number of sites.
This method may not be called on a non-matrix object.

unsigned int
egglib::DataHolder::
get_nsit_i
(unsigned int sam)¶
const Get the number of sites for an ingroup sample.
This method may not be called on a matrix object.

unsigned int
egglib::DataHolder::
get_nsit_o
(unsigned int sam)¶
const Get the number of sites for an outgroup sample.
This method may not be called on a matrix object.

int
egglib::DataHolder::
get_o
(unsigned int sam, unsigned int sit)¶
const Get an outgroup data entry.
The indices must be valid, otherwise a segmentation fault or aberrant behaviour will occur.
 Parameters
sam
sample index.
sit
site index.

void
egglib::DataHolder::
insert_sites
(unsigned int pos, unsigned int num)¶ Insert sites at a given position.
Increase the number of sites for all samples. This method may be used for matrix or non-matrix objects. Note that the inserted sites are not initialized.
 Parameters
pos
the position at which to insert sites. The new sites are inserted before the specified index. Use 0 to add sites at the beginning of the sequence, and the current number of sites to add sites at the end. If the value is larger than the current number of sites, sites are added at the end of the sequence. Therefore it is possible to use egglib::MAX as the position to specify that new sites must be inserted at the end.
num
number of sites to add.
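The handling of the pos argument (insertion before the given index, any value past the end meaning "append") can be sketched as follows. insert_sites_sketch is a hypothetical helper working on a std::vector<int>; unlike the method documented here, it fills the new sites with a caller-supplied value instead of leaving them uninitialized.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of EggLib): insert num sites before index
// pos. Any pos larger than the current length appends at the end.
void insert_sites_sketch(std::vector<int>& seq, std::size_t pos,
                         std::size_t num, int fill) {
    pos = std::min(pos, seq.size());
    seq.insert(seq.begin() + static_cast<std::ptrdiff_t>(pos), num, fill);
}
```

Passing a very large position, as the documentation suggests with egglib::MAX, therefore appends the new sites.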

void
egglib::DataHolder::
insert_sites_i
(unsigned int sam, unsigned int pos, unsigned int num)¶ Insert sites at a given position for an ingroup sample.
As insert_sites(unsigned int, unsigned int) but for only one sample of the ingroup. Available only for non-matrix objects.

void
egglib::DataHolder::
insert_sites_o
(unsigned int sam, unsigned int pos, unsigned int num)¶ Insert sites at a given position for an outgroup sample.
As insert_sites(unsigned int, unsigned int) but for only one sample of the outgroup. Available only for non-matrix objects.

bool
egglib::DataHolder::
is_equal
()¶
const Test if all sequences have the same length.
Returns true if there are no sequences at all. Only valid for containers.

void
egglib::DataHolder::
name_append_i
(unsigned int sam, const char *ch)¶ Add characters at the end of the specified ingroup name.

void
egglib::DataHolder::
name_append_o
(unsigned int sam, const char *ch)¶ Add characters at the end of the specified outgroup name.

void
egglib::DataHolder::
name_appendch_i
(unsigned int sam, char ch)¶ Add a character to the specified ingroup name.

void
egglib::DataHolder::
name_appendch_o
(unsigned int sam, char ch)¶ Add a character to the specified outgroup name.

DataHolder &
egglib::DataHolder::
operator=
(const DataHolder &src)¶ Assignment operator.
The reserved memory of the source is not copied. The reserved memory of the current object is retained.

void
egglib::DataHolder::
reserve
(unsigned int nsi, unsigned int nso, unsigned int ln, unsigned int ng, unsigned int ls)¶ Reserve memory to speed up data loading.
This method does not change the size of the data set contained in the instance, but reserves memory in order to speed up incremental loading of data. The passed values are not required to be accurate. In case the instance has allocated more memory than what is requested, nothing is done (this applies to both dimensions independently). It is always valid to use 0 for any values (in that case, nothing is done). Note that one character is always preallocated for all names.
 Parameters
nsi
expected number of ingroup samples.
nso
expected number of outgroup samples.
ln
expected length of names.
ng
expected number of groups.
ls
expected number of sites (the same for all ingroup and outgroup samples, whether or not the object is a matrix).

void
egglib::DataHolder::
reset
(bool is_matrix)¶ Restore object to initial state.
This method is designed to allow reusing the object and reusing previously allocated memory. All data contained in the instance is considered to be lost, but allocated memory is actually retained to speed up later resizing operations.
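The difference between reserve(), reset() and clear() resembles the distinction between size and capacity in std::vector. The sketch below is only an analogy under that assumption: ReusableBufferSketch is a hypothetical type and does not reflect how DataHolder manages its memory internally.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical analogy (not EggLib code) for the caching behaviour.
struct ReusableBufferSketch {
    std::vector<int> data;

    // Like reserve(): never shrinks, and does not change the data size.
    void reserve(std::size_t n) {
        if (n > data.capacity()) data.reserve(n);
    }

    // Like reset(): the content is dropped but the allocation is kept.
    void reset() { data.clear(); }

    // Like clear(): the content is dropped and the memory is released.
    void clear() { std::vector<int>().swap(data); }
};
```

Reusing one such object across many data sets avoids repeated allocations, which is the stated purpose of the caching system.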

void
egglib::DataHolder::
set_group_i
(unsigned int sam, unsigned int lvl, unsigned int label)¶ Set a group label.
The indices must be valid, otherwise a segmentation fault or aberrant behaviour will occur.
 Parameters
sam
sample index.
lvl
group level index.
label
group label.

void
egglib::DataHolder::
set_group_o
(unsigned int sam, unsigned int label)¶ Set the group label for an outgroup sample.
The index must be valid, otherwise a segmentation fault or aberrant behaviour will occur. There is necessarily one group level for outgroups.
 Parameters
sam
sample index.
label
group label.

void
egglib::DataHolder::
set_i
(unsigned int sam, unsigned int sit, int value)¶ Set an ingroup data entry.
The indices must be valid, otherwise a segmentation fault or aberrant behaviour will occur.
 Parameters
sam
sample index.
sit
site index.
value
allele value.

void
egglib::DataHolder::
set_is_matrix
(bool flag)¶ Configure the object (not) to be a matrix.
If a non-matrix object is converted to a matrix, the user is responsible for ensuring that all samples (including the outgroup) have the same number of sites. The method will assume that all samples have the same number of sites as the first sample among the ingroup and outgroup. There is no requirement for converting a matrix to a non-matrix.

void
egglib::DataHolder::
set_name_i
(unsigned int sam, const char *name)¶ Set an ingroup name.

void
egglib::DataHolder::
set_name_o
(unsigned int sam, const char *name)¶ Set an outgroup name.

void
egglib::DataHolder::
set_ngroups
(unsigned int ngrp)¶ Set the number of group levels.
Perform memory allocation as needed but does not initialize new values.

void
egglib::DataHolder::
set_nsam_i
(unsigned int nsam)¶ Set the number of ingroup samples.
Performs memory allocation as needed but does not initialize new values (except names). If the object is a matrix, new samples are set to have the current number of sites. Otherwise, new samples have no sites. Setting the number of samples to a smaller value is equivalent to removing the last samples.

void
egglib::DataHolder::
set_nsam_o
(unsigned int nsam)¶ Set the number of outgroup samples.
As set_nsam_i(unsigned int) but for the outgroup data table.

void
egglib::DataHolder::
set_nsit
(unsigned int val)¶ Set the number of sites.
Performs memory allocation as needed but does not initialize new values. It is possible to use this method for both matrix and non-matrix objects. In both cases, the effective result is that all ingroup and outgroup samples are resized to the specified number of sites.

void
egglib::DataHolder::
set_nsit_i
(unsigned int sam, unsigned int val)¶ Set the number of sites for an ingroup sample.
Similar to set_nsit(unsigned int) but for only one sample of the ingroup. Available only for non-matrix objects.

void
egglib::DataHolder::
set_nsit_o
(unsigned int sam, unsigned int val)¶ Set the number of sites for an outgroup sample.
Similar to set_nsit(unsigned int) but for only one sample of the outgroup. Available only for non-matrix objects.

void
egglib::DataHolder::
set_o
(unsigned int sam, unsigned int sit, int value)¶ Set an outgroup data entry.
The indices must be valid, otherwise a segmentation fault or aberrant behaviour will occur.
 Parameters
sam
sample index.
sit
site index.
value
allele value.

void
egglib::DataHolder::
to_outgroup
(unsigned int sam, unsigned int label)¶ Move a sample to the outgroup.
The specified sample is moved to the outgroup and its group labels are discarded. Obviously, this decreases the ingroup size by 1, and increases the outgroup size accordingly.
 Parameters
sam
ingroup sample index.
label
group label to assign to the sample after it is moved to the outgroup (use 0 if not relevant).

bool
egglib::DataHolder::
valid_phyml_aa
()¶
const True if all data are amino acids expected by PhyML (for alignment)

bool
egglib::DataHolder::
valid_phyml_names
()¶
const True if all names are valid for PhyML.

bool
egglib::DataHolder::
valid_phyml_nt
()¶
const True if all data are nucleotides expected by PhyML (for alignment)

GeneticCode¶
 class
Hold genetic code tables.
Handle genetic codes. All genetic codes defined by the National Center for Biotechnology Information are supported.
Header: <egglibcpp/GeneticCode.hpp>
Public Functions

egglib::GeneticCode::
GeneticCode
(unsigned int index)¶ Constructor.
Build an instance of the genetic code.

egglib::GeneticCode::
GeneticCode
()¶ Default constructor.
Like GeneticCode(unsigned int), except that the code used is 1 (standard).

char
egglib::GeneticCode::
aminoacid
(unsigned int codon)¶
const Returns the translation of a codon.
The codon should be passed as an integer code (see codon()). This method returns the one-letter amino acid code for valid codons (represented by integers in the range 0-63), and ‘X’ for any other integer.

unsigned int
egglib::GeneticCode::
get_code
()¶
const Get the current genetic code.

const char *
egglib::GeneticCode::
name
()¶
const Get the name of the genetic code.

double
egglib::GeneticCode::
NSsites
(unsigned int codon, bool ignorestop)¶
const Give the number of nonsynonymous sites of a codon.
The number is in the range 0-3 (3 means that all changes at all three positions would lead to a nonsynonymous change).
 Parameters
codon
codon integer code (see codon()).
ignorestop
if true, potential changes to stop codons are excluded and all stop codons return 0; if false, changes to stop codons are considered to be nonsynonymous.

double
egglib::GeneticCode::
NSsites
(const SiteHolder &site1, const SiteHolder &site2, const SiteHolder &site3, unsigned int &num_samples, bool ignorestop, int A, int C, int G, int T)¶
const Give the number of nonsynonymous sites of a codon site.
The number is in the range 0-3 (3 means that all changes at all three positions would lead to a nonsynonymous change). This is the same as NSsites(unsigned int, bool), but averaged over all samples based on the provided SiteHolder instances.
All three codon positions must have the same number of samples, such that the i-th nucleotides at the three sites give the codon for the i-th sample. (Same ploidy as well.)
 Parameters
site1
first position of the codon site.
site2
second position of the codon site.
site3
third position of the codon site.
num_samples
variable used to provide the number of samples analyzed by the method (that is, number of samples minus number of samples containing at least one missing data). The original value of the variable is ignored and is modified by the instance. If 0, the return value should be ignored.
ignorestop
if true, potential changes to stop codons are excluded and all stop codons are treated as missing data; if false, changes to stop codons are considered to be nonsynonymous.
A
integer value representing bases A.
C
integer value representing bases C.
G
integer value representing bases G.
T
integer value representing bases T.
All nucleotides represented by integer allele values not matching any of the four values passed as the A, C, G and T arguments are considered as missing data.
 Warning
 This method will not accept coding sequences mixing upper and lower case characters. It is however possible to configure how the four nucleotides are represented.

double
egglib::GeneticCode::
NSsites
(const SiteHolder &codons, unsigned int &num_samples, bool ignorestop, int A, int C, int G, int T)¶
const Give the number of nonsynonymous sites of a codon site.
See NSsites(const SiteHolder&, const SiteHolder&, const SiteHolder&, unsigned int&, bool, int, int, int, int). This version does the same, except that it takes a single SiteHolder reference instead of three, and the SiteHolder passed to this function contains integer alleles representing codons, that is, in the range 0-63. Other values are considered to be missing data.

void
egglib::GeneticCode::
set_code
(unsigned int index)¶ Set the genetic code.

double
egglib::GeneticCode::
Ssites
(unsigned int codon, bool ignorestop)¶
const Give the number of synonymous sites of a codon.
The number is in the range 0-3 (3 means that all changes at all three positions would lead to a synonymous change).
 Parameters
codon
codon integer code (see codon()).
ignorestop
if true, potential changes to stop codons are excluded and all stop codons return 0; if false, changes to stop codons are considered to be nonsynonymous.

double
egglib::GeneticCode::
Ssites
(const SiteHolder &site1, const SiteHolder &site2, const SiteHolder &site3, unsigned int &num_samples, bool ignorestop, int A, int C, int G, int T)¶
const Like NSsites, but for synonymous sites.

double
egglib::GeneticCode::
Ssites
(const SiteHolder &codons, unsigned int &num_samples, bool ignorestop, int A, int C, int G, int T)¶
const Like NSsites, but for synonymous sites.

bool
egglib::GeneticCode::
start
(unsigned int codon)¶
const Tells if a codon is an initiation codon.
The codon should be passed as an integer code (see codon()). This method returns true if the codon is an initiation codon (including any of the alternative initiation codons known for the genetic code set for the current object), and false otherwise (including for invalid codon codes).

int
egglib::GeneticCode::
translate
(int first, int second, int third, bool smart)¶ Translate a codon directly.
Codon positions should be ASCII-coded. Return ‘X’ if missing data or invalid nucleotides. For fourfold degenerate positions. Codons including nonambiguity characters always return ‘X’ (even ‘?’ at a fourfold degenerate position), except if the codon is ‘---’ (in that case, ‘-’ is returned).
 Return
 ASCII-coded amino acid.
 Parameters
first
first codon position.
second
second codon position.
third
third codon position.
smart
smart translation.
Public Static Functions

static char
egglib::GeneticCode::
base
(unsigned int codon, unsigned int index)¶ Returns one of the bases of a codon.
This method can be called on the class directly, as in GeneticCode::base(0, 0), and it is not dependent on the specification of a genetic code.
 Return
 The character at the specified position of the codon (as an uppercase character).
 Warning
 The method returns ‘?’ if it cannot perform base extraction, but it is not guaranteed that all invalid arguments will be detected properly.
 Parameters
codon
the codon should be passed as an integer code (see codon()). Only values in the range 0-63 are supported.
index
index of the base to extract (only 0, 1 and 2 are accepted; other values will result in aberrant outcome).

unsigned int
egglib::GeneticCode::
codon
(char first, char second, char third)¶ Return the integer code for a codon.
The first, second and third bases of the codon must be passed as character arguments. The case of characters is ignored. Returns an integer in the range [0, 63] for the 64 codons (see the table below).
The codons are identified by single integers as given by the table below:
 Warning
 The base ‘U’, although biologically relevant, is treated as an invalid base.
 Note
 As a static method, this method can be called as
GeneticCode::codon(base1, base2, base3)
directly (it does not require instantiation of an object) and it is not dependent on any genetic code specification.
For all other triplets, egglib::UNKNOWN is returned.

static bool
egglib::GeneticCode::
diff1
(unsigned int codon1, unsigned int codon2)¶ Check if the first position of two codons is identical.

static bool
egglib::GeneticCode::
diff2
(unsigned int codon1, unsigned int codon2)¶ Check if the second position of two codons is identical.

static bool
egglib::GeneticCode::
diff3
(unsigned int codon1, unsigned int codon2)¶ Check if the third position of two codons is identical.

static unsigned int
egglib::GeneticCode::
int2codon
(unsigned int base1, unsigned int base2, unsigned int base3)¶ Return the integer code for a codon.
The first, second and third bases of the codon must be passed as integer codes, according to the following mapping: 0 for T, 1 for C, 2 for A and 3 for G. This mapping must be followed strictly and no other value may be passed. Returns an integer in the range [0, 63] for the 64 codons (see the documentation of the method codon(char, char, char)).
 Note
 As a static method, this method can be called as
GeneticCode::int2codon(base1, base2, base3)
directly (it does not require instantiation of an object) and it is not dependent on any genetic code specification.
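A codon code in the range [0, 63] is naturally read as a three-digit base-4 number. The sketch below shows one such encoding, together with the reverse operation. It assumes the base order T=0, C=1, A=2, G=3 and a most-significant-first layout; both are assumptions, as are all the *_sketch names. UNKNOWN_SKETCH stands in for egglib::UNKNOWN, whose value is only documented as "large".

```cpp
#include <cctype>
#include <limits>

// Stand-in for egglib::UNKNOWN (assumption: the largest unsigned int).
const unsigned int UNKNOWN_SKETCH = std::numeric_limits<unsigned int>::max();

// Assumed base order: T=0, C=1, A=2, G=3.
const char BASES_SKETCH[] = "TCAG";

// Combine three base codes (each in 0-3) into a codon code in 0-63.
unsigned int int2codon_sketch(unsigned int b1, unsigned int b2, unsigned int b3) {
    return 16 * b1 + 4 * b2 + b3;
}

// Case-insensitive encoding of a triplet; 'U' and any other character
// make the whole codon invalid.
unsigned int codon_sketch(char first, char second, char third) {
    const char triplet[3] = {first, second, third};
    unsigned int code[3];
    for (int i = 0; i < 3; ++i) {
        const char up = static_cast<char>(
            std::toupper(static_cast<unsigned char>(triplet[i])));
        code[i] = 4;  // 4 means "not a valid base"
        for (unsigned int b = 0; b < 4; ++b) {
            if (BASES_SKETCH[b] == up) code[i] = b;
        }
        if (code[i] == 4) return UNKNOWN_SKETCH;
    }
    return int2codon_sketch(code[0], code[1], code[2]);
}

// Extract the base at position index (0, 1 or 2) of a codon code.
char base_sketch(unsigned int codon, unsigned int index) {
    if (codon > 63 || index > 2) return '?';
    const unsigned int weight[3] = {16, 4, 1};
    return BASES_SKETCH[(codon / weight[index]) % 4];
}
```

Under these assumptions TTT maps to 0, GGG to 63 and ATG to 35.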

static unsigned int
egglib::GeneticCode::
ndiff
(unsigned int codon1, unsigned int codon2)¶ Returns the number of nucleotide differences between two codons.
This method can be called on the class directly, as in GeneticCode::ndiff(0, 0), and it is not dependent on the specification of a genetic code. This method is only valid if both arguments are less than
 Returns only 0, 1, 2 or 3.
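If codon codes are three-digit base-4 numbers (an assumption consistent with the [0, 63] range documented for codon()), counting nucleotide differences amounts to comparing the digits one by one. ndiff_sketch is a hypothetical name; the result does not depend on which base each digit represents.

```cpp
// Count the positions at which two codon codes (each in 0-63) differ,
// comparing their base-4 digits from the last position to the first.
unsigned int ndiff_sketch(unsigned int codon1, unsigned int codon2) {
    unsigned int n = 0;
    for (int i = 0; i < 3; ++i) {
        if (codon1 % 4 != codon2 % 4) ++n;
        codon1 /= 4;
        codon2 /= 4;
    }
    return n;
}
```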

unsigned int
egglib::GeneticCode::
num_codes
()¶ Get the number of available codes.

Random¶
 class
Pseudorandom number generator.
This class implements the Mersenne Twister algorithm for pseudorandom number generation. It is based on work by Makoto Matsumoto and Takuji Nishimura (see http://www.math.sci.hiroshima-u.ac.jp/~m-MAT/MT/emt.html) and Jasper Bedaux (see http://www.bedaux.net/mtrand/) for the core generator, and on the Random class of Egglib up to 2.2 for conversion to distributions other than uniform.
Note that different instances of the class have independent chains of pseudorandom numbers. If several instances have the same seed, they will generate the exact same chain of pseudorandom numbers. This also applies if the default constructor is used and the instances are created within the same second.
All non-uniform distribution generators are based either on the rand_int32() or the standard (half-open, 32-bit) uniform() methods.
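The consequence of reusing a seed can be demonstrated with the standard library's own Mersenne Twister, std::mt19937. This is a stand-in chosen so that the example is self-contained; it is not the egglib::Random class, and first_draws_sketch is a hypothetical helper.

```cpp
#include <random>
#include <vector>

// Return the first n draws of a Mersenne Twister seeded with seed.
std::vector<unsigned long> first_draws_sketch(unsigned long seed, int n) {
    std::mt19937 gen(static_cast<std::mt19937::result_type>(seed));
    std::vector<unsigned long> draws;
    for (int i = 0; i < n; ++i) draws.push_back(gen());
    return draws;
}
```

Two generators built with the same seed produce identical chains, so parallel processes or multiple instances should each receive a distinct seed.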
Header: <egglibcpp/Random.hpp>
Public Functions

egglib::Random::
Random
()¶ Constructor with default seed.
Uses the current system clock second as seed.

egglib::Random::
Random
(unsigned long s)¶ Constructor with custom seed.
Favor large, high-complexity seeds. When using different instances of Random in a program, or different processes using Random, ensure they are all seeded using different seeds.

virtual
egglib::Random::
~Random
()¶ Destructor.

unsigned long
egglib::Random::
binomrand
(long n, double p)¶ Draws a number from a binomial law.
 Parameters
n
number of tests (must be >=0).
p
test probability.

bool
egglib::Random::
brand
()¶ Boolean integer.
Return true with probability 0.5.

double
egglib::Random::
erand
(double expectation)¶ Draws a number from an exponential distribution.
Beware, the argument is the distribution’s mean (and is also 1/lambda where lambda is the rate parameter).
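Since the argument is the mean rather than the rate, a draw can be sketched by inverse transform sampling as -mean * ln(1 - u), with u uniform on [0, 1). The example uses std::mt19937 as a stand-in generator and the hypothetical name exp_draw_sketch; whether EggLib uses this exact formula is an assumption.

```cpp
#include <cmath>
#include <random>

// Draw from an exponential distribution parameterized by its mean
// (expectation = 1 / lambda).
double exp_draw_sketch(std::mt19937& gen, double expectation) {
    std::uniform_real_distribution<double> unif(0.0, 1.0);  // [0, 1)
    const double u = unif(gen);
    // 1 - u lies in (0, 1], so the logarithm is always finite.
    return -expectation * std::log(1.0 - u);
}
```

A caller holding a rate lambda would pass 1 / lambda as the argument.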

unsigned long
egglib::Random::
get_seed
()¶
const Get seed value.
Return the value of the seed that was used to initiate the instance. If the generator was reseeded, return the seed value passed at that point.

unsigned int
egglib::Random::
grand
(double param)¶ Draws a number from a geometric law.
The argument is the geometric law parameter.

unsigned int
egglib::Random::
irand
(unsigned int ncards)¶ Draws a uniform integer.
The argument is the number of values that can be generated. Returns an integer in the range [0, ncards-1]. Therefore, ncards is not included in the range.

double
egglib::Random::
nrand
()¶ Draws a number from a normal distribution.
Returns a normal variate with expectation 0 and standard deviation 1. The algorithm used is the polar form of the Box-Muller algorithm. A draw is performed every two calls unless the instance is reseeded.
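The polar form of the Box-Muller algorithm produces two independent variates per successful draw, which is why a new draw is only needed every second call. The sketch below caches the second variate and discards it on reseeding. NormalSketch is a hypothetical class built on std::mt19937; it illustrates the technique and is not EggLib's implementation.

```cpp
#include <cmath>
#include <random>

class NormalSketch {
public:
    explicit NormalSketch(unsigned long seed)
        : gen_(static_cast<std::mt19937::result_type>(seed)),
          unif_(-1.0, 1.0), has_spare_(false), spare_(0.0) {}

    // Standard normal variate (expectation 0, standard deviation 1).
    double draw() {
        if (has_spare_) {            // second call: use the cached variate
            has_spare_ = false;
            return spare_;
        }
        double u, v, s;
        do {                         // pick a point inside the unit disc
            u = unif_(gen_);
            v = unif_(gen_);
            s = u * u + v * v;
        } while (s >= 1.0 || s == 0.0);
        const double f = std::sqrt(-2.0 * std::log(s) / s);
        spare_ = v * f;              // keep the second variate for later
        has_spare_ = true;
        return u * f;
    }

    // Reseeding invalidates the cached variate.
    void reseed(unsigned long seed) {
        gen_.seed(static_cast<std::mt19937::result_type>(seed));
        has_spare_ = false;
    }

private:
    std::mt19937 gen_;
    std::uniform_real_distribution<double> unif_;
    bool has_spare_;
    double spare_;
};
```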

unsigned int
egglib::Random::
prand
(double p)¶ Draws an integer from a Poisson distribution.
The argument is the Poisson distribution parameter.

unsigned long
egglib::Random::
rand_int32
()¶ Generate a 32-bit random integer.
Returns an integer in the range [0, 4294967295] (that is, in the range [0, 2^32-1]).

void
egglib::Random::
set_seed
(unsigned long s)¶ Reseed an instance.
Favor large, high-complexity seeds. When using different instances of Random in a program, or different processes using Random, ensure they are all seeded using different seeds.

double
egglib::Random::
uniform
()¶ Generate a real in the half-open interval [0,1)
0 is included but not 1.

double
egglib::Random::
uniform53
()¶ Generate a 53-bit real.
The value has increased precision: the standard uniform pseudorandom numbers can only take a finite number of values (2^32 of them, that is). This method increases the complexity of return values, at the cost of increased computing time.
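One common way to obtain a 53-bit real from a 32-bit generator is the construction used by the genrand_res53() function of the Mersenne Twister reference code: two draws are combined, keeping 27 and 26 bits respectively. The sketch below applies it to std::mt19937; that EggLib uses exactly this construction is an assumption, and uniform53_sketch is a hypothetical name.

```cpp
#include <cstdint>
#include <random>

// Combine two 32-bit draws into a double in [0, 1) with 53 random bits.
double uniform53_sketch(std::mt19937& gen) {
    const std::uint32_t a = static_cast<std::uint32_t>(gen()) >> 5;  // 27 bits
    const std::uint32_t b = static_cast<std::uint32_t>(gen()) >> 6;  // 26 bits
    // a * 2^26 + b is a 53-bit integer; dividing by 2^53 maps it to [0, 1).
    return (a * 67108864.0 + b) / 9007199254740992.0;
}
```

Each call consumes two 32-bit draws, which accounts for the extra computing time mentioned above.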

double
egglib::Random::
uniformcl
()¶ Generate a real in the closed interval [0,1].
Both 0 and 1 are included.

double
egglib::Random::
uniformop
()¶ Generate a real in the open interval (0,1)
Neither 0 nor 1 is included.

Model fitting utilities¶
ABC¶
 class
Model estimation by Approximate Bayesian Computation.
It is required to set the number of statistics and at least one input file name before performing the analysis. The analysis itself consists of several steps. (1) Computation of the threshold, which requires reading through all files and importing statistics. In the process, the standard deviation of all statistics is calculated and made available. (2) Computation of Euclidean distances and weights, and generation of a second sample file with weights and (non-standardized) statistics, where only samples with non-null weights are exported. This step requires that the observed statistics have been set (between steps 1 and 2). (3) Local-linear regression using the regression() method. While in the previous steps several models can be mixed, in this step only a single model can be processed at a time. The output is a simple file with adjusted parameters only.
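The first two steps can be sketched in memory as follows. This is a simplified illustration, not the ABC class: it works on vectors instead of files, merges the threshold and rejection steps into one function, and assumes Epanechnikov-type weights (1 - (d/threshold)^2), a kernel the source does not name. All identifiers are hypothetical, and the order in which EggLib performs these computations may differ.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// One accepted simulation: its index in the input and its weight.
struct AcceptedSketch {
    std::size_t index;
    double weight;
};

// sims[i][j] is statistic j of simulation i; obs[j] is observed statistic j.
// tolerance is the proportion of simulations falling in the local region.
std::vector<AcceptedSketch> abc_rejection_sketch(
        const std::vector<std::vector<double> >& sims,
        const std::vector<double>& obs, double tolerance) {
    std::vector<AcceptedSketch> accepted;
    const std::size_t n = sims.size();
    const std::size_t ns = obs.size();
    if (n == 0) return accepted;

    // Standard deviation of each statistic over all simulations.
    std::vector<double> sd(ns, 0.0);
    for (std::size_t j = 0; j < ns; ++j) {
        double mean = 0.0;
        for (std::size_t i = 0; i < n; ++i) mean += sims[i][j];
        mean /= n;
        double var = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            var += (sims[i][j] - mean) * (sims[i][j] - mean);
        }
        sd[j] = std::sqrt(var / n);
    }

    // Euclidean distance to the observation, on standardized statistics.
    std::vector<double> dist(n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < ns; ++j) {
            if (sd[j] > 0.0) {
                const double d = (sims[i][j] - obs[j]) / sd[j];
                dist[i] += d * d;
            }
        }
        dist[i] = std::sqrt(dist[i]);
    }

    // Threshold: distance below which a proportion tolerance of points fall.
    std::vector<double> sorted(dist);
    std::sort(sorted.begin(), sorted.end());
    std::size_t k = static_cast<std::size_t>(std::ceil(tolerance * n));
    k = std::max<std::size_t>(1, std::min(k, n));
    const double threshold = sorted[k - 1];

    // Assumed Epanechnikov-type weights; only non-null weights are kept.
    for (std::size_t i = 0; i < n; ++i) {
        if (dist[i] > threshold) continue;
        const double r = threshold > 0.0 ? dist[i] / threshold : 0.0;
        const double w = 1.0 - r * r;
        if (w > 0.0) accepted.push_back({i, w});
    }
    return accepted;
}
```

The accepted points and their weights are what a local-linear regression step would then consume.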
Public Types

enum
egglib::ABC::
TransformMode
¶ Modes for parameter transformation.
Values:
Public Functions

egglib::ABC::
ABC
()¶ Constructor.

egglib::ABC::
~ABC
()¶ Destructor.

void
egglib::ABC::
add_fname
(const char *fname, unsigned int number_of_params)¶ Adds a file name.

void
egglib::ABC::
get_threshold
(double tolerance)¶ Gets the regression threshold.
If several files are loaded, the data will be aggregated (note that they must all contain the same number of statistics). At least one file must have been set, and the number of statistics must have been set as well.
 Parameters
tolerance
rejection threshold (proportion of points in the local region).

unsigned int
egglib::ABC::
number_of_samples
()¶
const Gets number of imported data samples.

unsigned int
egglib::ABC::
number_of_samples_part
(unsigned int i)¶
const Gets number of imported data samples for a given file.

void
egglib::ABC::
number_of_statistics
(unsigned int ns)¶ Sets number of statistics.
If data was already present in the instance, it will all be cleared.

void
egglib::ABC::
obs
(unsigned int index, double value)¶ Sets a summary statistic.
The number of statistics must have been set, and the index must not be out of bound.

unsigned int
egglib::ABC::
regression
(const char *infname, const char *outfname, TransformMode mode, const char *header)¶ Performs regression step.
 Return
 Number of data points processed.
 Parameters
infname
input file name (generated using rejection)
outfname
output file name (final posterior)
mode
transformation mode
header
output file header (names of parameters; if empty string, no header is printed)

unsigned int
egglib::ABC::
rejection
(const char *outfname, bool exportlabels, bool strip)¶ Performs rejection step.
The observed value must have been entered (otherwise the results will be meaningless), and the threshold must have been computed.
 Return
 the number of points in the local region.
 Parameters
outfname
the name of the intermediary file.
exportlabels
if true: exports a tag at the beginning of each line to identify the file of origin of each accepted sample (starting from 1).
strip
if true: remove statistics and weights (only export parameters of accepted points; then the file cannot be used for regression).

double
egglib::ABC::
sd
(unsigned int index)¶
const Gets a standard deviation.
The get_threshold() method must have been called, and the index must not be out of bound.

double
egglib::ABC::
threshold
()¶
const Gets Euclidean threshold.

enum
Neural networks¶
 class
Training and prediction with neural networks.
This class implements the backpropagation algorithm for training neural networks.
The constructor of Network generates a bare, unusable instance. It is required to set up the network with the setup() method and to initialize weights (normally using the init_weights() method, or by manually setting all weights) before calling train(). More information is available in the documentation of the setup() method.
Header: <egglibcpp/Neural.hpp>
Public Functions

egglib::nnet::Network::
Network
()¶ Constructor.

virtual
egglib::nnet::Network::
~Network
()¶ Destructor.

void
egglib::nnet::Network::
init_weights
(double range_input, double range_output)¶ Initialize weights.
It is required to call this method before training, otherwise the weights have undefined values. All weights are initialized to random values from a uniform distribution in the range [-X, X], where X is the range_input argument for all weights connecting input variables to neurons of the hidden layer, and the range_output argument for all weights connecting neurons of the hidden layer to the output neurons.
 Parameters
range_input
bound for firstlevel weights.
range_output
bound for secondlevel weights.
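The initialization rule described above can be sketched as follows. This is only an illustration: `init_uniform` is a hypothetical helper, and `std::mt19937` stands in for the library's own Random class.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Fill `weights` with draws from a uniform distribution bounded by
// -range and +range. Hypothetical helper, not part of the library.
// Note: std::uniform_real_distribution samples the half-open
// interval [-range, +range).
void init_uniform(std::vector<double>& weights, double range,
                  std::mt19937& rng) {
    assert(range > 0.0);
    std::uniform_real_distribution<double> dist(-range, range);
    for (double& w : weights) {
        w = dist(rng);
    }
}
```

In this sketch, the input-to-hidden weights would be filled with `range_input` and the hidden-to-output weights with `range_output`.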

unsigned int
egglib::nnet::Network::
num_iter
()¶
const Get the number of training iterations.
The internal counter is reset by setup().

void
egglib::nnet::Network::
predict
(const Data &data, unsigned int pattern, bool compute_error)¶ Use the neural network to predict output.
The network must have been trained using a Data instance with the same number of input and output variables. This method generates an array of predicted values based on the current values of the weights (after training) that can be accessed using the prediction() method. If the compute_error argument is true, the error is accessed using error().
 Parameters
data
data set with the correct number of input (always) and output variables (unless compute_error is false).
pattern
pattern to process.
compute_error
if false, don’t compute errors, and output variables are not considered.

double
egglib::nnet::Network::
prediction
(unsigned int output_var)¶
const Get a predicted output.
The predict() method must have been called. Get the predicted value for one of the output variables. Warning: training (using the method train()) modifies the predicted values and invalidates the results of the method predict().
 Parameters
output_var
index of the output variable (must be smaller than the number of output variables defined by both the Data instances used for training and for prediction).

void
egglib::nnet::Network::
setup
(const Data &training_data, unsigned int training_start, unsigned int training_stop, const Data &testing_data, unsigned int testing_start, unsigned int testing_stop, unsigned int num_hidden, ActivationFunction fun1, ActivationFunction fun2, double rate1, double rate2, double bound, double momentum, Random *random)¶ Setup the network.
Upon call to this method, the network is set up based on the number of input and output variables of the provided data set, and the specified number of neurons in the hidden layer. Note that all neurons (from both the hidden and output layers) are automatically connected to an additional neuron generating a constant input of 1.0 (the bias).
Loading a training data set (using training_data) is required. The first and last-plus-one indexes must be passed to indicate which patterns must be processed for training. Note that the pattern corresponding to the training_stop argument is not included. It is possible to use a training_stop value larger than the number of patterns in training_data. To use all patterns, use training_start=0 and training_stop=egglib::MAX.
Loading a test data set is not mandatory but advisable. The test data set is not used for training but allows evaluating the predictive ability of the network. To skip this option, pass any data set as testing_data (for example, the same as training_data) and set testing_start >= testing_stop.
 Parameters
training_data
a training data set with at least one input variable, at least one output variable, at least one pattern and all data loaded. The object passed must not be modified until training is finished.
training_start
index of the first pattern to process for training.
training_stop
index of the pattern immediately after the last pattern to process for training.
testing_data
a data set to use for testing the predictive ability of the network (not used for fitting). The data set must have the same number of input and output variables as training_data (preferably it will be the same object, with non-overlapping ranges of patterns to process).
testing_start
index of the first pattern to process for testing.
testing_stop
index of the pattern immediately after the last pattern to process for test.
num_hidden
number of neurons in the hidden layer of the network.
fun1
activation function to use for neurons of the hidden layer.
fun2
activation function to use for output neurons.
rate1
rate for weights from input to hidden layer.
rate2
rate for weights from hidden layer to output.
bound
extreme value (for both signs) for weight values.
momentum
proportion of the previous weight change to apply to all weight changes (use 0.0 to skip momentum).
random
Random object to use for generating random numbers (only used for initial weights).
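The start/stop convention described above (a half-open range whose stop index may exceed the number of patterns) can be sketched as below. `patterns_in_range` and `MAX_SENTINEL` are hypothetical names; the sentinel is assumed to be the largest unsigned int, since the documentation only describes egglib::MAX as a large value.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>

// Hypothetical stand-in for egglib::MAX (assumed to be the largest value).
const unsigned int MAX_SENTINEL = std::numeric_limits<unsigned int>::max();

// Number of patterns selected by the half-open range [start, stop),
// with stop clamped to the number of available patterns.
unsigned int patterns_in_range(unsigned int start, unsigned int stop,
                               unsigned int num_patterns) {
    assert(num_patterns > 0);  // at least one pattern is required
    unsigned int last = std::min(stop, num_patterns);
    return start < last ? last - start : 0;
}
```

With this convention, `start >= stop` selects no pattern, which mirrors the way the testing data set is skipped.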

double
egglib::nnet::Network::
testing_error
()¶
const Get total error for testing data.
As training_error() but for the testing data set. Only valid if testing data has been passed to setup().

void
egglib::nnet::Network::
train
(unsigned int num_iter)¶ Train the neural network.
Train the network for a fixed number of iterations. This method must be called iteratively until a given stop criterion is fulfilled. After each call to train(), it is possible to access the current values of weights, the error for training data and, if testing data have been loaded, the error for testing data.
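One possible stop criterion is to stop as soon as the testing error no longer decreases. The sketch below shows such a loop in a library-independent way: the two callbacks are hypothetical placeholders for calls such as train() and testing_error(), and the criterion itself is an assumption, not something the library prescribes.

```cpp
#include <cassert>
#include <functional>

// Call `train_chunk(iter_per_round)` repeatedly until `test_error()`
// stops improving or `max_rounds` is reached.
// Returns the number of training rounds performed.
unsigned int train_until_no_improvement(
        const std::function<void(unsigned int)>& train_chunk,
        const std::function<double()>& test_error,
        unsigned int iter_per_round, unsigned int max_rounds) {
    assert(iter_per_round > 0);
    double best = test_error();
    unsigned int rounds = 0;
    while (rounds < max_rounds) {
        train_chunk(iter_per_round);
        ++rounds;
        double err = test_error();
        if (err >= best) {
            break;  // no improvement on the testing data: stop
        }
        best = err;
    }
    return rounds;
}
```

In practice the callbacks would wrap a network instance, for example a lambda calling its train() method and another returning its testing error.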

double
egglib::nnet::Network::
training_error
()¶
const Get total error for training data.
The error is computed as sqrt(sum((pred[i] - obs[i])^2)) where sqrt is the square root function, sum is the sum over all output nodes, pred[i] is the predicted value provided by output neuron i and obs[i] is the observed value for output variable i. Computed using the training data only. Requires train() but also modified if predict() is called with compute_error=true.
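For a single pattern, the error formula given above amounts to the Euclidean distance between predicted and observed outputs. A minimal sketch (`euclidean_error` is a hypothetical name, and how the library accumulates the error over patterns is not shown here):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// sqrt(sum((pred[i] - obs[i])^2)) over all output variables.
double euclidean_error(const std::vector<double>& pred,
                       const std::vector<double>& obs) {
    assert(pred.size() == obs.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < pred.size(); ++i) {
        double d = pred[i] - obs[i];
        sum += d * d;
    }
    return std::sqrt(sum);
}
```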
Value of a weight to a hidden neuron.
Access to the current value of the weight of the connexion of a hidden neuron to an input variable or to the bias neuron. The bias weight’s index is equal to the number of input variables.
 Parameters
i
index of the hidden neuron.
j
index of the input variable.
Set the value of a weight to a hidden neuron.
Set the value of the weight of the connexion of a hidden neuron to an input variable or to the bias neuron. The bias weight’s index is equal to the number of input variables.
 Parameters
i
index of the hidden neuron.
j
index of the input variable.
value
weight value.

double
egglib::nnet::Network::
weight_output
(unsigned int i, unsigned int j)¶
const Value of a weight to an output neuron.
Access to the current value of the weight of the connexion of an output neuron to a hidden neuron or to the bias neuron. The bias weight’s index is equal to the number of hidden neurons.
 Parameters
i
index of the output neuron.
j
index of the hidden neuron.

void
egglib::nnet::Network::
weight_output
(unsigned int i, unsigned int j, double value)¶ Set the value of a weight to an output neuron.
Set the value of the weight of the connexion of an output neuron to a hidden neuron or to the bias neuron. The bias weight’s index is equal to the number of hidden neurons.
 Parameters
i
index of the output neuron.
j
index of the hidden neuron.
value
weight value.

 class
Neuron, that is a node of a neural network
Header: <egglibcpp/Neural.hpp>
Public Functions

egglib::nnet::Neuron::
Neuron
()¶ Create a neuron.
The neuron is created naked, empty and unusable. The user must use config() before doing anything with it. To reuse a neuron after it has been used (e.g. for training the same network again), it is required to call config() again.

virtual
egglib::nnet::Neuron::
~Neuron
()¶ Delete a neuron.

void
egglib::nnet::Neuron::
activate
()¶ Activate the neuron.
Process all incoming connexions, apply weights and the activation function to generate the output.
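As a rough illustration of what activation involves, the sketch below computes the weighted sum of the incoming values and passes it through an activation function. The use of tanh and the free function `activate_tanh` are assumptions for illustration; the available ActivationFunction values are not listed in this section.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Weighted sum of the inputs followed by a tanh activation.
// A bias can be modelled as an extra input fixed to 1.0.
double activate_tanh(const std::vector<double>& inputs,
                     const std::vector<double>& weights) {
    assert(inputs.size() == weights.size());
    double net = 0.0;
    for (std::size_t i = 0; i < inputs.size(); ++i) {
        net += inputs[i] * weights[i];
    }
    return std::tanh(net);
}
```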

void
egglib::nnet::Neuron::
config
(ActivationFunction fun, unsigned int num_input)¶ Set up the neuron.
The object is reset, but previous data may not have been reinitialized.
 Parameters
fun
function used for activation of this neuron.
num_input
number of incoming connexions (either neurons of the previous layer or input variables) of this neuron.

double
egglib::nnet::Neuron::
delta
(unsigned int index)¶
const Get the delta value for a given weight.
The propagate() method must have been called, which itself requires that the neuron had been previously loaded with all needed data and activated. This method returns the change of the specified weight value based on the propagated error. The delta value is computed as f’(I) * E[index] where f’ is the derivative of the activation function, I is the input value for this neuron and E[index] is the error associated to the weight in question.

double
egglib::nnet::Neuron::
get_output
()¶
const Collect the output.
Get the output value. Calling this method does not update the output if any input data has changed (make sure to call the activate() method for this).

double
egglib::nnet::Neuron::
get_weight
(unsigned int index)¶
const Get a weight.
Get the current value of the weight applied to a given incoming connexion.

void
egglib::nnet::Neuron::
propagate
(unsigned int index, double val)¶ Propagate error for a given weight.
This method can only be used with neurons that have loaded input and have been activated. After the error has been propagated, the delta value can be obtained using delta(), typically for propagating errors to the previous layer. The delta value is the value passed as val to this method multiplied by the value of the derivative of the activation function at the current input value of this neuron. Note that the neuron’s weights are not modified by this method, but only when update() is called.
 Parameters
index
index of one of the weights of this neuron.
val
propagated error for this weight (for an output neuron: this neuron’s error times the output value of the corresponding neuron of the hidden layer; for a hidden neuron: the sum of delta values for all output neurons, weighted by the corresponding weights).
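The two quantities described above can be sketched as follows, assuming a tanh activation (an assumption; the actual activation function is configurable). All function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Derivative of tanh at the neuron's net input.
double tanh_derivative(double net_input) {
    double t = std::tanh(net_input);
    return 1.0 - t * t;
}

// Delta value: propagated error times f'(I).
double delta_value(double net_input, double propagated_error) {
    return tanh_derivative(net_input) * propagated_error;
}

// Error propagated to a hidden neuron: sum of the output neurons'
// delta values, weighted by the weights connecting it to each of them.
double hidden_error(const std::vector<double>& output_deltas,
                    const std::vector<double>& weights_to_outputs) {
    assert(output_deltas.size() == weights_to_outputs.size());
    double sum = 0.0;
    for (std::size_t k = 0; k < output_deltas.size(); ++k) {
        sum += output_deltas[k] * weights_to_outputs[k];
    }
    return sum;
}
```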

void
egglib::nnet::Neuron::
set_input
(unsigned int index, double value)¶ Set an input value.
Load a value for a given incoming connexion. Until changed, the value will be remembered. Changing the value does not update the output value of this neuron.

void
egglib::nnet::Neuron::
set_weight
(unsigned int index, double value)¶ Set a weight.
Load the value of the weight to be applied to a given incoming connexion. Until changed, the value will be remembered. Changing the value does not update the output value of this neuron.

void
egglib::nnet::Neuron::
update
(double rate, double bound, double momentum)¶ Update all weights.
The propagate() method must have been called for all weights. This method applies all delta values.
 Parameters
rate
learning rate.
bound
absolute limit value: weights are bound to the range [-bound, +bound].
momentum
proportion of the previous weight change to apply to the new change.
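A common way to combine a learning rate, a momentum term and a bound is shown below. The exact update rule and sign convention used by the library are not stated here, so this is a sketch under that assumption; `Weight` and `update_weight` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>

// A weight together with the change applied at the previous update.
struct Weight {
    double value;
    double last_change;
};

// Assumed rule: change = rate * delta + momentum * previous change,
// then the new weight is clamped to [-bound, +bound].
void update_weight(Weight& w, double delta, double rate,
                   double bound, double momentum) {
    assert(bound >= 0.0);
    double change = rate * delta + momentum * w.last_change;
    w.value = std::max(-bound, std::min(bound, w.value + change));
    w.last_change = change;
}
```

Passing a momentum of 0.0 reduces this to a plain gradient step, which matches the documented way of skipping momentum.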

 class
Holds input and output data for training neural networks.
When creating a data set, the user must first set the number of patterns with set_num_patterns(), of input variables with set_num_input() and of output variables with set_num_output(). Only then is it possible to load data using set_input() and set_output(). Make sure to load every declared slot, as data are not initialized.
Header: <egglibcpp/Neural.hpp>
 Note
 If the same object must be reused with smaller values of num_input and/or num_output and a larger num_patterns, it is more efficient to call the methods set_num_input() and set_num_output() before set_num_patterns().
Public Functions

egglib::nnet::Data::
Data
()¶ Constructor.

virtual
egglib::nnet::Data::
~Data
()¶ Destructor.

double
egglib::nnet::Data::
get_input
(unsigned int pattern, unsigned int variable)¶
const Get an input data item.

unsigned int
egglib::nnet::Data::
get_num_input
()¶
const Get number of input variables.

unsigned int
egglib::nnet::Data::
get_num_output
()¶
const Get number of output variables.

unsigned int
egglib::nnet::Data::
get_num_patterns
()¶
const Get number of patterns.

double
egglib::nnet::Data::
get_output
(unsigned int pattern, unsigned int variable)¶
const Get an output data item.

double
egglib::nnet::Data::
mean_input
(unsigned int index)¶
const Get the mean for an input variable.
If data have been normalized using the method normalize(), this method returns the mean of a given input variable in the original data.

double
egglib::nnet::Data::
mean_output
(unsigned int index)¶
const Get the mean for an output variable.
If data have been normalized using the method normalize(), this method returns the mean of a given output variable in the original data.

void
egglib::nnet::Data::
normalize
()¶ Normalize data for all input and output variables.
All data are modified permanently and the average and standard deviation for each input and output variable are saved and remain accessible using the methods mean_input(), mean_output(), std_input() and std_output(), until normalize() is called again or the number of input or output variables or the number of patterns is modified.
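Normalization of one variable can be sketched as below: values are centred and scaled in place, and the original mean and standard deviation are returned so they can be reused later. Whether the library divides by n or by n - 1 is not stated; this sketch assumes the population formula (division by n). `Stats` and `normalize_values` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Mean and standard deviation of the original (unnormalized) data.
struct Stats {
    double mean;
    double sd;
};

// Replace each value x by (x - mean) / sd and return the statistics.
Stats normalize_values(std::vector<double>& values) {
    assert(!values.empty());
    double n = static_cast<double>(values.size());
    double mean = 0.0;
    for (double v : values) mean += v;
    mean /= n;
    double var = 0.0;
    for (double v : values) var += (v - mean) * (v - mean);
    var /= n;  // population variance (assumption)
    double sd = std::sqrt(var);
    assert(sd > 0.0);  // a constant variable cannot be scaled
    for (double& v : values) v = (v - mean) / sd;
    return Stats{mean, sd};
}
```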

void
egglib::nnet::Data::
normalize_input
(unsigned int index, double mean, double std)¶ Normalize data for an input variable.
All data are modified permanently. The passed mean and standard deviation are not saved.

void
egglib::nnet::Data::
normalize_output
(unsigned int index, double mean, double std)¶ Normalize data for an output variable.
All data are modified permanently. The passed mean and standard deviation are not saved.

void
egglib::nnet::Data::
set_input
(unsigned int pattern, unsigned int variable, double value)¶ Load an input data item.

void
egglib::nnet::Data::
set_num_input
(unsigned int num)¶ Set number of input variables.
Invalidate all means and standard deviations if the data was previously normalized.

void
egglib::nnet::Data::
set_num_output
(unsigned int num)¶ Set number of output variables.
Invalidate all means and standard deviations if the data was previously normalized.

void
egglib::nnet::Data::
set_num_patterns
(unsigned int num)¶ Set number of patterns.
Invalidate all means and standard deviations if the data was previously normalized.

void
egglib::nnet::Data::
set_output
(unsigned int pattern, unsigned int variable, double value)¶ Load an output data item.

double
egglib::nnet::Data::
std_input
(unsigned int index)¶
const Get the standard deviation for an input variable.
If data have been normalized using the method normalize(), this method returns the standard deviation of a given input variable in the original data.

double
egglib::nnet::Data::
std_output
(unsigned int index)¶
const Get the standard deviation for an output variable.
If data have been normalized using the method normalize(), this method returns the standard deviation of a given output variable in the original data.

void
egglib::nnet::Data::
unnormalize_output
(unsigned int index, double mean, double std)¶ Unnormalize data for an output variable.
All data are modified permanently. The passed mean and standard deviation are not saved.
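Unnormalizing is the inverse transformation, typically applied to bring predictions back to the original scale. A minimal sketch, assuming normalization was (x - mean) / std; `unnormalize_values` is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Map each normalized value z back to z * sd + mean.
void unnormalize_values(std::vector<double>& values, double mean, double sd) {
    assert(sd > 0.0);
    for (double& v : values) {
        v = v * sd + mean;
    }
}
```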
Utilities¶
IntersperseAlign¶
 class
Insert non-varying sites within alignments.
This class allows adding non-varying sites within an alignment at given positions. The procedure below must be strictly followed:
 Create an instance. The constructor takes no arguments.
 Load a DataHolder instance using load(). It must be an alignment (a matrix). The instance will create an array of positions internally.
 Specify the desired length of the final alignment using set_length().
 Specify the position of all sites of the original alignment. This can be achieved in three ways:
 Specify all positions manually as real numbers using set_position().
 Specify all positions manually as extant indexes (integer values) using set_round_position() for all positions. If this approach is used, the round option of intersperse() must be set to false.
 Pass the reference to the Coalesce instance that generated the alignment (assuming it is a simulation) and let the instance find the site positions itself, with the method get_positions().
 Specify the list of allele values used for non-varying positions using set_num_alleles() and then set_allele() as many times as needed. If there is more than one allele, non-varying alleles will be picked randomly. This step can be skipped (by default, the value corresponding to A will be used).
 Provide the address of a random number generator using set_random() (it is always needed).
 Call intersperse(). This will change the DataHolder originally loaded.
Header: <egglibcpp/DataHolder.hpp>
Public Functions

egglib::IntersperseAlign::
IntersperseAlign
()¶ Constructor.

egglib::IntersperseAlign::
~IntersperseAlign
()¶ Destructor.

void
egglib::IntersperseAlign::
get_positions
(const Coalesce &coalesce)¶ Gets automatically the positions of sites of the original alignment.
A DataHolder reference must have been loaded using load(), and this DataHolder object must be the last one simulated using the Coalesce object whose reference is passed. This method will load the positions of all sites as provided by Coalesce.

void
egglib::IntersperseAlign::
intersperse
(bool round_positions)¶ Insert non-varying sites.
This method modifies the DataHolder instance that has been loaded. It is required to have loaded one, and to have specified the position of each of its sites. It is also logical (but not formally required) to have specified the desired length of the alignment. It is possible to specify more than one allele for inserted positions. It is required to have passed a random number generator.
After a call to this method, the loaded DataHolder instance will have a length equal to the value specified using set_length(), unless the original DataHolder was longer (in that case, it is not changed at all).
 Parameters
round_positions
a boolean indicating if site positions must be rounded. Set it to false if already rounded positions have been provided.
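The effect of interspersing can be illustrated independently of the library with plain strings. This sketch assumes already rounded, strictly increasing positions and a single fill allele; the handling of equal positions and the random choice among several alleles are left out. `intersperse_sketch` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Spread the columns of `aln` over a final alignment of `length` sites.
// positions[k] is the final index of original column k. All other
// columns receive the non-varying allele `fill` in every sample.
std::vector<std::string> intersperse_sketch(
        const std::vector<std::string>& aln,
        const std::vector<std::size_t>& positions,
        std::size_t length, char fill = 'A') {
    // An alignment longer than the requested length is left unchanged.
    if (positions.size() > length) return aln;
    std::vector<std::string> out;
    for (const std::string& row : aln) {
        assert(row.size() == positions.size());
        std::string padded(length, fill);
        for (std::size_t k = 0; k < row.size(); ++k) {
            assert(positions[k] < length);
            assert(k == 0 || positions[k] > positions[k - 1]);
            padded[positions[k]] = row[k];
        }
        out.push_back(padded);
    }
    return out;
}
```

For two samples "CG" and "TG" placed at indexes 1 and 3 of a five-site alignment, the inserted columns are identical across samples, so only the original sites vary.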

void
egglib::IntersperseAlign::
load
(DataHolder &data)¶ Loads a data set.
The loaded data set may contain any number of sites (even zero).
This method does not reset the random number generator, the final alignment length, the position of sites (unless the new alignment has a different number of sites compared with the previous one) or the number and value of non-varying alleles.

void
egglib::IntersperseAlign::
set_allele
(unsigned int index, int allele)¶ Sets an allele for inserted positions.
The number of alleles must have been fixed using set_num_alleles(). The default value for the first allele is A.

void
egglib::IntersperseAlign::
set_length
(unsigned int length)¶ Specifies the desired length of the final alignment.
If this method is skipped, the default value is 0 (the final alignment is identical to the original one), or the previously specified value (if set_length() has been called previously).

void
egglib::IntersperseAlign::
set_num_alleles
(unsigned int num)¶ Specifies the number of possible alleles at inserted positions.
This method may be called at any time. The value must be at least one. If more than one, alleles at inserted positions will be picked randomly (they will be fixed among samples). All alleles must be specified using set_allele(). The default value is one and the default first allele is A. It is possible not to specify the first allele even if the number of alleles is increased (the A value will be retained).

void
egglib::IntersperseAlign::
set_position
(unsigned int index, double position)¶ Sets the position of one of the sites of the original alignment.
A DataHolder reference must have been loaded using load(). This method specifies the position of one site of the passed DataHolder instance. Note that the position of all sites must be specified, that positions must always be increasing (consecutive positions might be equal), and all positions must be at least 0 and at most 1.

void
egglib::IntersperseAlign::
set_random
(Random *random)¶ Provides a random number generator.
It is always required to provide a random number generator, even if the number of alleles set with set_num_alleles() is one.

void
egglib::IntersperseAlign::
set_round_position
(unsigned int index, unsigned int position)¶ Sets the position of one of the sites of the original alignment.
A DataHolder reference must have been loaded using load(). This method specifies the position of one site of the passed DataHolder instance. Note that the position of all sites must be specified, that positions must always be increasing (consecutive positions might be equal), and all positions must be at least 0 and at most ls-1 where ls is the length of the final alignment.
If you use this method, you must use it for all sites and then set the argument of intersperse() to false.
VectorInt¶
 class
Minimal reimplementation of a vector<int>
Header: <egglibcpp/DataHolder.hpp>
Public Functions

egglib::VectorInt::
VectorInt
()¶ Constructor (default: 0 values)

virtual
egglib::VectorInt::
~VectorInt
()¶ Destructor.

void
egglib::VectorInt::
clear
()¶ Release memory.

int
egglib::VectorInt::
get_item
(unsigned int i)¶
const Get a value.

unsigned int
egglib::VectorInt::
get_num_values
()¶
const Get the number of values.

void
egglib::VectorInt::
set_item
(unsigned int i, int value)¶ Set a value.

void
egglib::VectorInt::
set_num_values
(unsigned int n)¶ Set the number of values (values are not initialized)

Exceptions¶
EggException¶
 class
Base exception type for errors occurring in this library.
Header: <egglibcpp/egglib.hpp>
EggArgumentValueError¶
 class
Exception type for argument value errors.
Header: <egglibcpp/egglib.hpp>
EggFormatError¶
 class
Exception type for file/string parsing errors.
Header: <egglibcpp/egglib.hpp>
Public Functions

egglib::EggFormatError::
EggFormatError
(const char *fileName, unsigned int line, const char *expectedFormat, const char *m, char c, const char *paste_end)¶ Creates the exception.

egglib::EggFormatError::
~EggFormatError
()¶ Destructor.

char
egglib::EggFormatError::
character
()¶ Get character.

const char *
egglib::EggFormatError::
info
()¶ Get additional information field.

unsigned int
egglib::EggFormatError::
line
()¶ Get line number.

const char *
egglib::EggFormatError::
m
()¶ Get bare error message (before formatting)

EggInvalidAlleleError¶
 class
Exception type for invalid allele.
Header: <egglibcpp/egglib.hpp>
EggMemoryError¶
 class
Exception type for memory errors.
There is a macro EGGMEM which stands for EggMemoryError(__LINE__, __FILE__).
Header: <egglibcpp/egglib.hpp>
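The idea behind such a macro can be sketched as below, assuming its arguments are the standard predefined macros `__LINE__` and `__FILE__`. `MemoryError`, `MEM_ERROR` and `checked_malloc` are hypothetical stand-ins, not the library's own definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical stand-in for an exception recording where it was raised.
class MemoryError {
public:
    MemoryError(unsigned int line, const char* file)
        : line_(line), file_(file) {}
    unsigned int line() const { return line_; }
    const std::string& file() const { return file_; }
private:
    unsigned int line_;
    std::string file_;
};

// The macro is expanded where it is written, so __LINE__ and __FILE__
// identify the throw site rather than the place of definition.
#define MEM_ERROR MemoryError(__LINE__, __FILE__)

// Example throw site: report the location if an allocation fails.
void* checked_malloc(std::size_t n) {
    assert(n > 0);
    void* p = std::malloc(n);
    if (p == nullptr) throw MEM_ERROR;
    return p;
}
```

Using a macro rather than a function is what allows the line and file of the caller to be captured without passing them explicitly.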
EggOpenFileError¶
 class
Exception type for errors while opening a file.
Header: <egglibcpp/egglib.hpp>
EggPloidyError¶
 class
Exception type for inconsistent ploidy over individuals EggInvalidChromosomeError
Header: <egglibcpp/egglib.hpp>
EggRuntimeError¶
 class
Exception type for runtime errors.
The definition of a runtime error is rather broad: it includes bugs as well as logical errors.
Header: <egglibcpp/egglib.hpp>