Diversity analyses¶
Elementary classes¶
Filter¶
 class egglib::Filter
Holds lists of valid (exploitable and missing) data codes.
This class holds two lists of integer values: one corresponding to data codes that should be treated as valid and exploitable, and one corresponding to data codes that should be treated as valid but missing. If the list of exploitable data is empty, all data are considered to be exploitable (even those listed as missing).
Header: <egglibcpp/Filter.hpp>
Public Functions

egglib::Filter::
Filter
()¶ Constructor.

virtual
egglib::Filter::
~Filter
()¶ Destructor.

void
egglib::Filter::
add_exploitable
(int code)¶ Add an exploitable data code.
By default (that is, if no exploitable data codes have been entered), all data values are considered to be exploitable, even if they appear in missing. If one or more data have been set as exploitable, only those will be considered to be exploitable.

void
egglib::Filter::
add_exploitable_range
(int first, int last)¶ Add a range of exploitable data codes.
Like add_exploitable(int), but adds all values between first and last (both included). It is not required that all exploitable codes have a synonym.

void
egglib::Filter::
add_exploitable_with_alias
(int code, int alias)¶ Add an exploitable data code with synonym character.
Similar to add_exploitable(int), except that a second parameter is passed to specify a synonym for the main code. If a data item matching alias is passed to is_exploitable(), it will be replaced by code and the function will return true. For a large range of contiguous codes, do not call this method within a loop; use the more efficient method add_exploitable_range(int, int) instead.

void
egglib::Filter::
add_missing
(int code)¶ Add a missing data code.

void
egglib::Filter::
add_missing_with_alias
(int code, int alias)¶ Add a missing data code with synonym character.
Similar to add_missing(int), except that a second parameter is passed to specify a synonym for the main code. If a data item matching alias is passed to is_missing(int), it will be replaced by code and the function will return true.

int
egglib::Filter::
check
(int code, bool &flag)¶
const Check data value.
Return: (i) the allelic value if it is exploitable as is; (ii) the main value if the provided value is a synonym for an exploitable value; (iii) MISSINGDATA if it is one of the missing data codes or synonyms; (iv) MISSINGDATA if the value is invalid (and in addition the flag argument is set to true).
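The four outcomes of check() can be illustrated with a small stand-alone sketch. MiniFilter below is a hypothetical stand-in written for this example, not the EggLib implementation, and MISSINGDATA_SKETCH is an arbitrary placeholder whose value is not taken from the library.

```cpp
#include <map>
#include <set>

// Hypothetical placeholder: the real value of MISSINGDATA is not assumed here.
const int MISSINGDATA_SKETCH = -999;

// Stand-in that mimics the documented behaviour of Filter::check().
class MiniFilter {
  public:
    void add_exploitable(int code) { expl_.insert(code); }
    void add_exploitable_with_alias(int code, int alias) {
        expl_.insert(code);
        expl_alias_[alias] = code;
    }
    void add_missing(int code) { missing_.insert(code); }
    void add_missing_with_alias(int code, int alias) {
        missing_.insert(code);
        missing_alias_.insert(alias);
    }

    // Returns the (main) exploitable code, or MISSINGDATA_SKETCH for missing
    // or invalid values. flag is set to true for invalid values only and is
    // left untouched otherwise.
    int check(int code, bool &flag) const {
        if (expl_.empty()) return code;          // empty list: everything is exploitable
        if (expl_.count(code)) return code;      // (i) exploitable as is
        std::map<int, int>::const_iterator it = expl_alias_.find(code);
        if (it != expl_alias_.end()) return it->second;  // (ii) synonym -> main code
        if (missing_.count(code) || missing_alias_.count(code)) {
            return MISSINGDATA_SKETCH;           // (iii) missing code or synonym
        }
        flag = true;                             // (iv) invalid value
        return MISSINGDATA_SKETCH;
    }

  private:
    std::set<int> expl_, missing_, missing_alias_;
    std::map<int, int> expl_alias_;
};
```

With 'A' registered with alias 'a' and 'N' registered as missing, checking 'a' yields 'A', checking 'N' yields the placeholder without touching the flag, and any other value yields the placeholder and raises the flag.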

void
egglib::Filter::
clear
()¶ Reset instance.
The instance is restored to its newly created state, and memory is actually released.

bool
egglib::Filter::
is_exploitable
(int &code)¶
const Check if a data value is exploitable.
If the list of exploitable data is empty, then this method always returns true. Otherwise, it checks whether the value matches one of the values in the exploitable list. If the value does not match any entry of the exploitable list but matches one of the synonyms entered using add_exploitable_with_alias(int, int), then the method returns true and modifies the passed value (so that the synonym is replaced by the reference exploitable code). Otherwise, it never modifies the value.

bool
egglib::Filter::
is_missing
(int &code)¶
const Check if a data value is missing.
Returns true if the code matches one of the missing data codes. Otherwise, returns true and modifies the value if the code matches one of the synonyms. If neither applies, returns false.

void
egglib::Filter::
reserve
(unsigned int num_expl, unsigned int num_expl_ranges, unsigned int num_missing, unsigned int num_synonyms, unsigned int num_missing_synonyms)¶ Reserve memory.
The method preallocates data arrays in order to speed up subsequent loading operations (up to the numbers passed). The instance is not formally changed by this method, and it is not required to call this method prior to setting valid or missing data codes.
 Parameters
num_expl
expected number of exploitable data codes.
num_expl_ranges
expected number of ranges of exploitable data codes.
num_missing
expected number of missing data codes.
num_synonyms
expected number of synonyms (it is not required that all exploitable codes have a synonym).
num_missing_synonyms
expected number of synonyms for missing data (it is not required that all missing codes have a synonym).

Structure¶
 class egglib::StructureHolder
Manage hierarchical group structure.
Public Functions

StructureCluster *
egglib::StructureHolder::
add_cluster
(unsigned int label)¶ Add a cluster with no samples in it.

StructureIndiv *
egglib::StructureHolder::
add_individual_ingroup
(unsigned int label, StructureCluster *cluster, StructurePopulation *population)¶ Add an ingroup individual with no samples in it.

StructureIndiv *
egglib::StructureHolder::
add_individual_outgroup
(unsigned int label)¶ Add an outgroup individual with no samples in it.

void
egglib::StructureHolder::
add_pop_filter
(unsigned int lbl)¶ Add a population label to filter.
If at least one is passed, process only those passed. By default, include all populations. Use reset_filter() to reset to default.

StructurePopulation *
egglib::StructureHolder::
add_population
(unsigned int label, StructureCluster *cluster)¶ Add a population with no samples in it.

void
egglib::StructureHolder::
add_sample_ingroup
(unsigned int sam_idx, StructureCluster *cluster, StructurePopulation *population, StructureIndiv *indiv)¶ Add one ingroup sample.

void
egglib::StructureHolder::
add_sample_outgroup
(unsigned int sam_idx, StructureIndiv *indiv)¶ Add one outgroup sample.

void
egglib::StructureHolder::
check_ploidy
(unsigned int value)¶ Ensure ploidy is consistent and optionally equal to passed value.
Automatically called by get_structure(). Needs to be called if process_ingroup() and/or process_outgroup() is used.
Value must be >0.

void
egglib::StructureHolder::
copy
(const StructureHolder &source)¶ Copy data from source object.

const StructureCluster &
egglib::StructureHolder::
get_cluster
(unsigned int idx)¶
const Get a cluster.

const StructureIndiv &
egglib::StructureHolder::
get_indiv_ingroup
(unsigned int idx)¶
const Get an ingroup individual.

const StructureIndiv &
egglib::StructureHolder::
get_indiv_outgroup
(unsigned int idx)¶
const Get an outgroup individual.

unsigned int
egglib::StructureHolder::
get_no
()¶
const Get number of outgroup samples.

unsigned int
egglib::StructureHolder::
get_no_req
()¶
const Get required number of outgroup samples.

unsigned int
egglib::StructureHolder::
get_ns
()¶
const Get number of ingroup samples.

unsigned int
egglib::StructureHolder::
get_ns_req
()¶
const Get required number of ingroup samples.

unsigned int
egglib::StructureHolder::
get_ploidy
()¶
const Get ploidy.
Default is UNKNOWN.

unsigned int
egglib::StructureHolder::
get_pop_index
(unsigned int)¶
const Index of the population containing this sample (default: MISSING).

const StructurePopulation &
egglib::StructureHolder::
get_population
(unsigned int idx)¶
const Get a population.

void
egglib::StructureHolder::
get_structure
(DataHolder &data, unsigned int lvl_clust, unsigned int lvl_pop, unsigned int lvl_indiv, unsigned int ploidy, bool skip_outgroup)¶ Process labels from a DataHolder.
Use UNKNOWN for any level to skip it (but skipping individuals is not the same as skipping clusters/pops). The population filter must be set separately.

unsigned int
egglib::StructureHolder::
num_clust
()¶
const Number of clusters.

unsigned int
egglib::StructureHolder::
num_indiv_ingroup
()¶
const Number of ingroup individuals (total).

unsigned int
egglib::StructureHolder::
num_indiv_outgroup
()¶
const Number of outgroup individuals.

unsigned int
egglib::StructureHolder::
num_pop
()¶
const Number of populations (total).

void
egglib::StructureHolder::
process_ingroup
(unsigned int idx, unsigned int lbl_clust, unsigned int lbl_pop, unsigned int lbl_indiv)¶ Process one ingroup sample.

void
egglib::StructureHolder::
process_outgroup
(unsigned int idx, unsigned int lbl_indiv)¶ Process one outgroup sample.

void
egglib::StructureHolder::
reserve_filter
(unsigned int howmany)¶ Preallocate the filter table.

void
egglib::StructureHolder::
reset
()¶ Reset to defaults.

void
egglib::StructureHolder::
reset_filter
()¶ Reset pop filter only.

 class egglib::StructureCluster
Manage a cluster.
Public Functions

StructureIndiv *
egglib::StructureCluster::
add_indiv
(StructurePopulation *pop, unsigned int label)¶ Add and create an individual.

StructurePopulation *
egglib::StructureCluster::
add_pop
(unsigned int label)¶ Add and create a population.

void
egglib::StructureCluster::
add_sample
()¶ Add a sample.

const StructureIndiv &
egglib::StructureCluster::
get_indiv
(unsigned int idx)¶
const Get an individual.

unsigned int
egglib::StructureCluster::
get_label
()¶
const Get label.

const StructurePopulation &
egglib::StructureCluster::
get_population
(unsigned int idx)¶
const Get a population.

unsigned int
egglib::StructureCluster::
num_indiv
()¶
const Number of individuals (total for this cluster).

unsigned int
egglib::StructureCluster::
num_pop
()¶
const Number of populations.

void
egglib::StructureCluster::
reset
(StructureHolder *holder, unsigned int label)¶ Restore defaults.

 class egglib::StructurePopulation
Manage a population.
Public Functions

StructureIndiv *
egglib::StructurePopulation::
add_indiv
(unsigned int label)¶ Add and create an individual.

void
egglib::StructurePopulation::
add_sample
()¶ Add a sample.

StructureCluster *
egglib::StructurePopulation::
get_cluster
()¶ Get containing cluster.

const StructureIndiv &
egglib::StructurePopulation::
get_indiv
(unsigned int idx)¶
const Get an individual.

unsigned int
egglib::StructurePopulation::
get_label
()¶
const Get label.

unsigned int
egglib::StructurePopulation::
num_indiv
()¶
const Number of individuals.

void
egglib::StructurePopulation::
reset
(StructureHolder *holder, StructureCluster *cluster, unsigned int label)¶ Restore defaults.

 class egglib::StructureIndiv
Manage an individual.
Public Functions

void
egglib::StructureIndiv::
add_sample
(unsigned int index)¶ Add a sample.

StructureCluster *
egglib::StructureIndiv::
get_cluster
()¶ Get containing cluster (NULL if outgroup).

unsigned int
egglib::StructureIndiv::
get_label
()¶
const Get label.

StructurePopulation *
egglib::StructureIndiv::
get_population
()¶ Get containing population (NULL if outgroup).

unsigned int
egglib::StructureIndiv::
get_sample
(unsigned int idx)¶
const Get a sample.

unsigned int
egglib::StructureIndiv::
num_samples
()¶
const Number of samples.

void
egglib::StructureIndiv::
reset
(StructureHolder *holder, StructureCluster *cluster, StructurePopulation *population, unsigned int label)¶ Restore defaults.

Classes analysing frequencies¶
FreqHolder¶
 class egglib::FreqHolder
Class holding frequencies for all compartments for a site.
Possible uses of this class:
 Process a site with structure stored in a StructureHolder:
   setup_structure(structure, ploidy, flag) and keep structure available
   process_site()
 Process a site without structure or with a manual structure:
   setup_raw(nc, np, no, ploidy, flag)
   setup_pop(i, cluster, relative, ns) for all populations
   process_site() assuming all individuals are consecutive
 Enter frequencies manually:
   setup_raw(nc, np, no, ploidy, flag)
   setup_pop(i, cluster, relative, ns) for all populations
   set_nall(na, ng)
   FreqSet.incr_allele()
   [FreqSet.incr_genotype()]
   [FreqSet.tell_het()]
 Process data from a VCF parser:
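Conceptually, processing a site amounts to counting, for each population, how many samples carry each allele. The stand-alone sketch below illustrates that bookkeeping only. The function count_alleles and its arguments are hypothetical names that are not part of FreqHolder, and using negative values for missing data is an assumption of this example.

```cpp
#include <cstddef>
#include <vector>

// Returns counts[p][a], the number of samples of population p carrying
// allele index a. alleles[i] is the allele index of sample i (negative
// values stand for missing data and are skipped); pops[i] is the population
// index of sample i. Out-of-range entries are ignored.
std::vector<std::vector<unsigned int> > count_alleles(
        const std::vector<int> &alleles,
        const std::vector<unsigned int> &pops,
        unsigned int num_pops,
        unsigned int num_alleles) {
    std::vector<std::vector<unsigned int> > counts(
        num_pops, std::vector<unsigned int>(num_alleles, 0));
    for (std::size_t i = 0; i < alleles.size() && i < pops.size(); ++i) {
        if (alleles[i] < 0) continue;                  // missing data
        if (pops[i] >= num_pops) continue;             // unknown population
        const unsigned int a = static_cast<unsigned int>(alleles[i]);
        if (a >= num_alleles) continue;                // unknown allele
        ++counts[pops[i]][a];
    }
    return counts;
}
```

The resulting table plays the role of the per-population absolute frequencies that the class exposes through frq_population().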
Public Functions

egglib::FreqHolder::
FreqHolder
()¶ Constructor.

egglib::FreqHolder::
~FreqHolder
()¶ Destructor.

int
egglib::FreqHolder::
allele
(unsigned int)¶
const Get an allele value.

unsigned int
egglib::FreqHolder::
cluster_index
(unsigned int)¶
const Get the index of the cluster of a given population.

const FreqSet &
egglib::FreqHolder::
frq_population
(unsigned int)¶
const Get frequencies in a population.

const unsigned int *
egglib::FreqHolder::
genotype
(unsigned int)¶
const Get a genotype (as array of allele indexes) (none if haploid)

bool
egglib::FreqHolder::
genotype_het
(unsigned int)¶
const True if genotype is heterozygote.

unsigned int
egglib::FreqHolder::
genotype_item
(unsigned int, unsigned int)¶
const Get part of a genotype.

unsigned int
egglib::FreqHolder::
num_alleles
()¶
const Number of alleles.

unsigned int
egglib::FreqHolder::
num_clusters
()¶
const Get number of clusters.

unsigned int
egglib::FreqHolder::
num_genotypes
()¶
const Number of genotypes with non-null frequency (0 if haploid)

unsigned int
egglib::FreqHolder::
num_populations
()¶
const Get number of populations.

unsigned int
egglib::FreqHolder::
ploidy
()¶
const Ploidy.

void
egglib::FreqHolder::
process_site
(const SiteHolder &site)¶ Compute frequencies (structure cannot have individual level)

void
egglib::FreqHolder::
set_genotype_item
(unsigned int i, unsigned int j, unsigned int a)¶ Set part of a genotype.

void
egglib::FreqHolder::
set_nall
(unsigned int na, unsigned int ng)¶ Call before loading frequencies manually.

void
egglib::FreqHolder::
setup_pop
(unsigned int i, unsigned int clu_idx, unsigned int rel_idx, unsigned int ns)¶ Call after setup_raw(), once for each population.

void
egglib::FreqHolder::
setup_raw
(unsigned int nc, unsigned int np, unsigned int no, unsigned int ploidy)¶ Setup manual structure.

void
egglib::FreqHolder::
setup_structure
(const StructureHolder *structure, unsigned int ploidy)¶ Set up based on provided structure (no individual level)
FreqSet¶
 class egglib::FreqSet
Class holding frequencies in a given compartment.
Public Functions

egglib::FreqSet::
FreqSet
()¶ Constructor (all empty)

egglib::FreqSet::
~FreqSet
()¶ Destructor.

void
egglib::FreqSet::
add_genotypes
(unsigned int num)¶ Add genotypes.

unsigned int
egglib::FreqSet::
frq_all
(unsigned int)¶
const Get an allele frequency.

unsigned int
egglib::FreqSet::
frq_gen
(unsigned int)¶
const Get a genotype frequency.

unsigned int
egglib::FreqSet::
frq_het
(unsigned int)¶
const Frequency of heterozygotes that have >= 1 copy of the allele.

void
egglib::FreqSet::
incr_allele
(unsigned int all_idx, unsigned int num)¶ Increment frequency of a given allele.

void
egglib::FreqSet::
incr_genotype
(unsigned int gen_idx, unsigned int num)¶ Increment frequency of a given genotype.

unsigned int
egglib::FreqSet::
nieff
()¶
const Total frequency (number of individuals) (0 if haploid)

unsigned int
egglib::FreqSet::
nseff
()¶
const Total frequency (number of samples)

unsigned int
egglib::FreqSet::
num_alleles
()¶
const Number of alleles (equal to user-provided value)

unsigned int
egglib::FreqSet::
num_alleles_eff
()¶
const Number of alleles with non-null frequency.

unsigned int
egglib::FreqSet::
num_genotypes
()¶
const Number of genotypes (user-provided)

unsigned int
egglib::FreqSet::
num_genotypes_eff
()¶
const Number of genotypes with non-null frequency.

void
egglib::FreqSet::
reset
(unsigned int)¶ Set number of alleles (set nsam/ngen to 0)

void
egglib::FreqSet::
setup
()¶ Setup.

void
egglib::FreqSet::
tell_het
(unsigned int i, unsigned int a)¶ Tell the class that genotype i is heterozygous for allele a. Call it several times as needed, and do not change frequencies after that.

unsigned int
egglib::FreqSet::
tot_het
()¶
const Total frequency of heterozygotes.

Site-level operations¶
SiteHolder¶
 class egglib::SiteHolder
Holds data for a site for diversity analysis.
Usage of this class: first set the ploidy. Then either load an alignment, load data from a VCF, or load individuals manually. Before loading individuals manually, the number of individuals must be preset so that the indexes exist. If you do not set all samples manually with load_ing() or load_otg(), you must explicitly set the remaining alleles to the default value. Note: the instance is not reset unless you request it, so data add up across calls.
Public Functions

egglib::SiteHolder::
SiteHolder
()¶ Constructor (ploidy = 1)

egglib::SiteHolder::
SiteHolder
(unsigned int ploidy)¶ Constructor.

virtual
egglib::SiteHolder::
~SiteHolder
()¶ Destructor.

void
egglib::SiteHolder::
add_ing
(unsigned int num)¶ Add ingroup individuals.

void
egglib::SiteHolder::
add_otg
(unsigned int num)¶ Add outgroup individuals.

int
egglib::SiteHolder::
get_allele
(unsigned int)¶
const Get an allele (MISSINGDATA for MISSING)

unsigned int
egglib::SiteHolder::
get_i
(unsigned int idv, unsigned int chrom)¶
const Get allele index for ingroup (MISSING for missing data)

unsigned int
egglib::SiteHolder::
get_missing
()¶
const Number of missing alleles found in the last processed data.

unsigned int
egglib::SiteHolder::
get_missing_ing
()¶
const Total number of missing alleles in ingroup.

unsigned int
egglib::SiteHolder::
get_missing_otg
()¶
const Total number of missing alleles in outgroup.

unsigned int
egglib::SiteHolder::
get_nall
()¶
const Number of alleles.

unsigned int
egglib::SiteHolder::
get_nall_ing
()¶
const Number of alleles in ingroup only.

unsigned int
egglib::SiteHolder::
get_ning
()¶
const Get number of ingroup individuals.

unsigned int
egglib::SiteHolder::
get_nout
()¶
const Get number of outgroup individuals.

unsigned int
egglib::SiteHolder::
get_o
(unsigned int idv, unsigned int chrom)¶
const Get allele index for outgroup (MISSING for missing data)

const unsigned int *
egglib::SiteHolder::
get_pi
(unsigned int idv)¶
const Same as get_i(), but returns a pointer.

unsigned int
egglib::SiteHolder::
get_ploidy
()¶
const Get ploidy.

const unsigned int *
egglib::SiteHolder::
get_po
(unsigned int idv)¶
const Same as get_o(), but returns a pointer.

unsigned int
egglib::SiteHolder::
get_straight_i
(unsigned int sam)¶
const Get allele index for ingroup (MISSING for missing data)

unsigned int
egglib::SiteHolder::
get_straight_o
(unsigned int sam)¶
const Get allele index for outgroup (MISSING for missing data)

unsigned int
egglib::SiteHolder::
get_tot_missing
()¶
const Total number of missing alleles.

void
egglib::SiteHolder::
load_ing
(unsigned int idv, unsigned int chrom, int allele)¶ Analyze an ingroup allele (MISSINGDATA for missing data)

void
egglib::SiteHolder::
load_otg
(unsigned int idv, unsigned int chrom, int allele)¶ Analyze an outgroup allele (MISSINGDATA for missing data)

bool
egglib::SiteHolder::
process_align
(const DataHolder &data, unsigned int idx, const StructureHolder *struc, const Filter &filtr, unsigned int max_missing, bool consider_outgroup_missing)¶ Process an alignment.
Does not reset the instance. Ploidy must be defined beforehand.
 Return
 true if the number of missing data was not exceeded.
 Parameters
data
an alignment.
idx
index of the site to process.
struc
the structure to use (NULL can be passed to process all samples as haploid individuals).
filtr
the allele filter.
max_missing
maximum number of missing alleles. If there are more missing data, stop processing and return false. Then, the instance should absolutely not be used further.
consider_outgroup_missing
if true, missing data in the outgroup also count towards the max_missing argument; otherwise, only missing data of the ingroup are counted.

bool
egglib::SiteHolder::
process_vcf
(VcfParser &data, unsigned int start, unsigned int stop, unsigned int max_missing)¶ Import allelic data and compute frequencies from VCF data.
Beware: this method does not reset the instance.
 Return
 A boolean specifying whether processing was completed.
 Parameters
data
a VcfParser reference containing data and having the GT format field filled.
start
index of the first sample to consider.
stop
index of the last sample to consider.
max_missing
maximum number of missing alleles. If this number is exceeded, processing is stopped and get_missing() returns max_missing + 1. Only missing data in this data set, and in the ingroup, are considered.

void
egglib::SiteHolder::
reset
(unsigned int ploidy)¶ Reset all to defaults.

void
egglib::SiteHolder::
set_allele
(unsigned int, int a)¶
const Set an allele value.

void
egglib::SiteHolder::
set_i
(unsigned int idv, unsigned int chrom, unsigned int all)¶ Set allele index for ingroup (MISSING for missing data)

void
egglib::SiteHolder::
set_nall
(unsigned int all, unsigned int ing)¶ Set number of alleles (it is up to the caller to ensure that everything is consistent)

void
egglib::SiteHolder::
set_o
(unsigned int idv, unsigned int chrom, unsigned int all)¶ Set allele index for outgroup (MISSING for missing data)

SiteGeno¶
 class egglib::SiteGeno
SiteHolder subclass to transform regular data to genotypic.
Public Functions

bool
egglib::SiteGeno::
homoz
(unsigned int genotype)¶
const Tell whether a genotype is homozygous.

void
egglib::SiteGeno::
process
(const SiteHolder &src)¶ Reset and get data from a site.

SiteDiversity¶
 class egglib::SiteDiversity
Diversity analyses at the level of a site
Computes standard diversity indexes for a single site or marker.
process() and average() return a composite flag.
Statistics:
 If fstats_diplo is called:
   npop_eff2 (pops with >= 1 indiv)
 If fstats_haplo is called:
   npop_eff3 (pops with >= 1 sample)
 If fstats_hier is called:
   nclu
   nclu_eff (>= 1 pops each with >= 1 indiv)
   npop_eff2 (same as for fstats_diplo)
 flag&1 (always on for process()):
   ns
 flag&2:
   npop
   Aglob
   Aing
   Stot
   pairdiff
   He
   R
   npop_eff1 (pops with >= 2 samples)
   He[pop] (for pops with >= 2 samples)
   pairdiff_pop[pop1][pop2] (for pops with >= 2 samples and pop2 != pop1)
 flag&4:
   thetaIAM
   thetaSMM
 flag&8:
   Ho
   Hi
 flag&16:
   Sder
   der
 flag&2048:
   Aout
 flag&32:
   n
   d
 flag&64:
   a
   b
   c
 flag&128:
   a0
   b1
   b2
   c0
 flag&256:
   JostD
 flag&2048:
   Hst
 flag&4096:
   Gst
 flag&8192:
   Gste
 flag&512: ns, Aglob, Aing, Aout, Stot, Sder, and der are actually integers
 flag&1024: site is polymorphic / there is at least one polymorphic site
 MAF
 MAF_pop
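Because process() and average() return a composite flag, a caller would typically test individual bits before reading the corresponding statistics. The sketch below uses only bit values quoted in the list above. The constant names and the has_bits() helper are hypothetical and are not defined by the library.

```cpp
// Bit values copied from the list above; the names are illustrative only.
const unsigned int FLAG_NS = 1;             // ns
const unsigned int FLAG_STATS = 2;          // npop, Aglob, Aing, Stot, pairdiff, He, R, ...
const unsigned int FLAG_THETA = 4;          // thetaIAM, thetaSMM
const unsigned int FLAG_HETEROZ = 8;        // Ho, Hi
const unsigned int FLAG_DERIVED = 16;       // Sder, der
const unsigned int FLAG_POLYMORPHIC = 1024; // site is polymorphic

// True if every bit of mask is set in flag.
inline bool has_bits(unsigned int flag, unsigned int mask) {
    return (flag & mask) == mask;
}
```

For instance, a returned flag of 1 | 2 | 1024 would indicate that ns and the flag&2 statistics are available for a polymorphic site, while thetaIAM and thetaSMM should not be read.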
Fit = 1 - c/(a+b+c); Fst = a/(a+b+c); Fis = 1 - c/(b+c)
Fst = n/d
Fit = 1 - c0/(a0+b2+b1+c0); Fst = (a0+b2)/(a0+b2+b1+c0); Fct = a0/(a0+b2+b1+c0); Fis = 1 - c0/(b1+c0)
Hst = 1 - Hs / He; Gst = 1 - Hs / Httilde; Gste = 1 - Hse / Hte
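As a worked illustration of the first set of relations, the ratios can be computed from the variance components a, b and c as Fit = 1 - c/(a+b+c), Fst = a/(a+b+c) and Fis = 1 - c/(b+c). The helper below is a hypothetical sketch, not a library function, and reporting a null denominator as NaN is an assumption made for this example.

```cpp
#include <limits>

struct FStats {
    double Fit;
    double Fst;
    double Fis;
};

// Computes the three ratios from the variance components a, b and c.
// A ratio whose denominator is zero is reported as NaN (assumption).
FStats fstats_from_components(double a, double b, double c) {
    const double nan = std::numeric_limits<double>::quiet_NaN();
    FStats f = {nan, nan, nan};
    const double tot = a + b + c;
    if (tot != 0.0) {
        f.Fit = 1.0 - c / tot;
        f.Fst = a / tot;
    }
    if (b + c != 0.0) {
        f.Fis = 1.0 - c / (b + c);
    }
    return f;
}
```

With a = 4, b = 2 and c = 2, this gives Fit = 0.75, Fst = 0.5 and Fis = 0.5.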
Requires: stats()
Public Functions

egglib::SiteDiversity::
SiteDiversity
()¶ Constructor.

virtual
egglib::SiteDiversity::
~SiteDiversity
()¶ Destructor.

double
egglib::SiteDiversity::
a
()¶
const Computed by fstats_diplo()

double
egglib::SiteDiversity::
a0
()¶
const Computed by fstats_hier()

double
egglib::SiteDiversity::
Aglob
()¶
const Number of alleles (including outgroup-specific alleles) (stats)

double
egglib::SiteDiversity::
Aing
()¶
const Number of alleles excluding outgroup-specific alleles (stats)

double
egglib::SiteDiversity::
Aout
()¶
const Number of different alleles in the outgroup (stats)

unsigned int
egglib::SiteDiversity::
average
()¶ Compute the average of all stats (except those per pop)

double
egglib::SiteDiversity::
b
()¶
const Computed by fstats_diplo()

double
egglib::SiteDiversity::
b1
()¶
const Computed by fstats_hier()

double
egglib::SiteDiversity::
b2
()¶
const Computed by fstats_hier()

double
egglib::SiteDiversity::
c
()¶
const Computed by fstats_diplo()

double
egglib::SiteDiversity::
c0
()¶
const Computed by fstats_hier()

double
egglib::SiteDiversity::
d
()¶
const Computed by fstats_haplo()

double
egglib::SiteDiversity::
D
()¶
const Computed by hstats()

double
egglib::SiteDiversity::
derived
(unsigned int)¶
const Number of derived alleles (stats+outgroup)
Derived allele frequency (stats+outgroup)

unsigned int
egglib::SiteDiversity::
flag
()¶
const Get flag value.

int
egglib::SiteDiversity::
global_allele
(unsigned int)¶
const Get one of the global alleles.

double
egglib::SiteDiversity::
Gst
()¶
const Computed by hstats()

double
egglib::SiteDiversity::
Gste
()¶
const Computed by hstats()

double
egglib::SiteDiversity::
He
()¶
const Unbiased heterozygosity (averaged if relevant) stats()

double
egglib::SiteDiversity::
He_pop
(unsigned int pop)¶
const Unbiased heterozygosity for a population stats()

double
egglib::SiteDiversity::
Hi
()¶
const Average number of differences between individuals (stats)

double
egglib::SiteDiversity::
Ho
()¶
const Frequency of heterozygotes (stats)

double
egglib::SiteDiversity::
Hst
()¶
const Computed by hstats()

unsigned int
egglib::SiteDiversity::
k
()¶
const Number of populations (stats)

double
egglib::SiteDiversity::
MAF
()¶
const Frequency of second most frequent allele.

double
egglib::SiteDiversity::
MAF_pop
(unsigned int)¶
const MAF for a population.

double
egglib::SiteDiversity::
n
()¶
const Computed by fstats_haplo()

unsigned int
egglib::SiteDiversity::
nclu_eff
()¶
const Number of clusters with >= 1 pop with >= 1 indiv (fstats_hier)

unsigned int
egglib::SiteDiversity::
npop_eff1
()¶
const Number of populations with >= 2 samples (stats)

unsigned int
egglib::SiteDiversity::
npop_eff2
()¶
const Number of populations with >= 1 indiv (fstats_diplo + fstats_hier)

unsigned int
egglib::SiteDiversity::
npop_eff3
()¶
const Number of populations with >= 1 sample (fstats_haplo)

double
egglib::SiteDiversity::
ns
()¶
const Number of analyzed samples (stats)

unsigned int
egglib::SiteDiversity::
nsites1
()¶
const For average ns.

unsigned int
egglib::SiteDiversity::
nsites10
()¶
const For average Hst.

unsigned int
egglib::SiteDiversity::
nsites11
()¶
const For average Gst.

unsigned int
egglib::SiteDiversity::
nsites12
()¶
const For average Gste.

unsigned int
egglib::SiteDiversity::
nsites2
()¶
const For average Aglob, Aing, Atot, Stot, pairdiff, He, thetaIAM, and thetaSMM.

unsigned int
egglib::SiteDiversity::
nsites3
()¶
const For average thetaIAM and thetaSMM.

unsigned int
egglib::SiteDiversity::
nsites4
()¶
const For average Ho and Hi.

unsigned int
egglib::SiteDiversity::
nsites5
()¶
const For average derived and Sd.

unsigned int
egglib::SiteDiversity::
nsites6
()¶
const For average n and d.

unsigned int
egglib::SiteDiversity::
nsites7
()¶
const For average a, b, and c.

unsigned int
egglib::SiteDiversity::
nsites8
()¶
const For average c0, b1, b2, a0.

unsigned int
egglib::SiteDiversity::
nsites9
()¶
const For average D.

bool
egglib::SiteDiversity::
orientable
()¶
const True if the site is orientable (stats)

double
egglib::SiteDiversity::
pairdiff
()¶
const Average number of pairwise differences (stats)

double
egglib::SiteDiversity::
pairdiff_inter
(unsigned int pop1, unsigned int pop2)¶
const Average number of differences between a pair of populations (stats)

unsigned int
egglib::SiteDiversity::
process
(const FreqHolder &frq)¶ Compute toggled statistics.

double
egglib::SiteDiversity::
R
()¶
const Allelic richness (stats)

void
egglib::SiteDiversity::
reset
()¶ Reset stats sums to 0 (keep toggled flags)

double
egglib::SiteDiversity::
S
()¶
const Number of alleles at frequency one (singletons) (stats)

double
egglib::SiteDiversity::
Sd
()¶
const Number of derived singletons (stats) (requires outgroup)

double
egglib::SiteDiversity::
thetaIAM
()¶
const Requires stats()

double
egglib::SiteDiversity::
thetaSMM
()¶
const Requires stats()

void
egglib::SiteDiversity::
toggle_fstats_diplo
()¶ Toggle F-statistics.

void
egglib::SiteDiversity::
toggle_fstats_haplo
()¶ Toggle F-statistics.

void
egglib::SiteDiversity::
toggle_fstats_hier
()¶ Toggle F-statistics.

void
egglib::SiteDiversity::
toggle_hstats
()¶ Toggle H-statistics.

void
egglib::SiteDiversity::
toggle_off
()¶ Set all flags to off.
CodingSite¶
 class egglib::CodingSite
Holds data for a coding site (as a triplet of sites)
This class assists the detection of polymorphism at codon sites, although diversity analyses themselves have to be performed using SiteDiversity itself. The class performs analyses through the process() method, which automatically resets all previously stored data.
Header: <egglibcpp/CodingSite.hpp>
Public Functions

egglib::CodingSite::
CodingSite
()¶ Constructor.

egglib::CodingSite::
~CodingSite
()¶ Destructor.

const SiteHolder &
egglib::CodingSite::
aminoacids
()¶
const Get access to amino acid data.
Requires that a codon site has been analyzed using process().
The returned instance contains integer data representing amino acids (‘*’ for stop codons). The GeneticCode instance passed to process() determines the translation.

const SiteHolder &
egglib::CodingSite::
codons
()¶
const Get access to merged codon data.
Requires that a codon site has been analyzed using process().
This instance contains codon alleles. See GeneticCode for information about encoding of codons.

bool
egglib::CodingSite::
mutated
(unsigned int codon_allele1, unsigned int codon_allele2, unsigned int pos)¶
const Check if two given codon alleles differ at a given position.
Tells whether two given codon alleles (see alleles()) differ at the specified position (they may also differ at other positions). Possible values for pos are 0, 1 or 2.
Requires that a codon site has been analyzed using process().
The indexes must both be < nall() but they may be passed in any order.

unsigned int
egglib::CodingSite::
ndiff
(unsigned int codon_allele1, unsigned int codon_allele2)¶
const Number of nucleotide differences between two given codon alleles.
Gives, for two given codon alleles (see alleles()), the number of nucleotide differences between them. Possible values are 1, 2 or 3.
Requires that a codon site has been analyzed using process().
The indexes must both be < nall() but they may be passed in any order.
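The comparisons performed by mutated() and ndiff() can be pictured with codons written as three-letter strings. This is only an illustration: the library works on codon allele indexes (see GeneticCode for the encoding of codons), and the function names below are hypothetical.

```cpp
#include <string>

// True if the two codons differ at position pos (0, 1 or 2).
// at() throws std::out_of_range if pos is not a valid position.
bool codons_differ_at(const std::string &c1, const std::string &c2,
                      unsigned int pos) {
    return c1.at(pos) != c2.at(pos);
}

// Number of positions (out of 3) at which the two codons differ.
unsigned int codon_ndiff(const std::string &c1, const std::string &c2) {
    unsigned int n = 0;
    for (unsigned int pos = 0; pos < 3; ++pos) {
        if (codons_differ_at(c1, c2, pos)) ++n;
    }
    return n;
}
```

For two distinct codon alleles the count is necessarily 1, 2 or 3, which matches the range of values documented for ndiff().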

unsigned int
egglib::CodingSite::
ni
()¶
const Number of ingroup indiv.

unsigned int
egglib::CodingSite::
no
()¶
const Number of outgroup indiv.

bool
egglib::CodingSite::
NS
(unsigned int codon_allele1, unsigned int codon_allele2)¶
const True if the two given codon alleles encode different amino acids.
Requires that a codon site has been analyzed using process().
The indexes must both be < nall() but they may be passed in any order.

unsigned int
egglib::CodingSite::
nseff
()¶
const Number of analyzed samples.
Requires that a codon site has been analyzed using process().
Value bounded by 0 and ns(). Depends on the number of missing data (and stop codons, if skipstop was set to true).

unsigned int
egglib::CodingSite::
nseffo
()¶
const Analyzed samples for outgroup.

double
egglib::CodingSite::
NSsites
()¶
const Estimated number of nonsynonymous sites.
Requires that a codon site has been analyzed using process().

unsigned int
egglib::CodingSite::
nstop
()¶
const Number of stop codons met during processing of codon site.
Requires that a codon site has been analyzed using process().
Not affected by the skipstop option.

unsigned int
egglib::CodingSite::
pl
()¶
const Ploidy.

bool
egglib::CodingSite::
process
(const SiteHolder &site1, const SiteHolder &site2, const SiteHolder &site3, const GeneticCode &code, bool skipstop, unsigned int max_missing)¶ Analyzes a codon site.
The three codon positions must be loaded as Site instances containing nucleotides encoded as integer values. All values other than A, C, G and T (case-independent) are treated as missing data. The three sites must have the same number of samples and the same number of populations (with matching assignment of samples to populations). Upon processing, the class generates and holds a Site instance (available as codons()) containing the merged data from the three sites, and another (available as aminoacids()) containing the same data translated.
 Return
 A boolean indicating whether analysis was completed. If False, data contained in the object should not be used since stored objects will not have been filled. Even ni() and nieff() will be invalid.
 Note
 It is not allowed to use egglib::UNKNOWN for any of A, C, G and T argument.
 Parameters
site1
first nucleotide position of the codon.
site2
second nucleotide position of the codon.
site3
third nucleotide position of the codon.
code
GeneticCode instance representing the code to be used for treating this codon.
skipstop
if true, stop codons are treated as missing data and skipped. If set to true, potential mutations to stop codons are not taken into account when estimating the number of nonsynonymous sites. Warning: this may be counterintuitive. The option assumes that stop codons are not biologically plausible and treats them as missing data. Conversely, if skipstop is false, stop codons are treated as if they were valid amino acids.
max_missing
maximum number of missing data to allow (including stop codons if skipstop is true).

AlleleStatus¶
 class
Classify alleles and sites based on frequencies in several populations.
Statistics:
 Sp number of population-specific alleles
 Spd number of population-specific derived alleles
 ShP number of alleles segregating in at least one pair of populations
 ShA number of alleles at non-null frequency in at least one pair of populations
 FxA number of alleles fixed in at least one population
 FxD number of fixed differences (two different alleles fixed in a pair of populations)
The user must ensure that all passed sites are polymorphic. The user should also probably exclude populations with low sample sizes if they are interested in the number of fixed alleles (populations with no samples are automatically skipped).
The statistics are computed for each site. Sums over multiple sites are available as Sp_T and Sp_T1 (and similarly for the other statistics). In the T1 variants, each site is counted only once for any statistic.
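To make two of these per-site definitions concrete, here is a minimal sketch that counts population-specific alleles (Sp) and alleles fixed in at least one population (FxA) from a table of allele counts. The table layout, the struct and the function name are hypothetical and are not part of the AlleleStatus interface; the other statistics are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result holder for a single site.
struct SiteAlleleCounts {
    unsigned int Sp = 0;   // alleles present in exactly one population
    unsigned int FxA = 0;  // alleles fixed in at least one population
};

// counts[p][a] is the number of copies of allele a in population p.
inline SiteAlleleCounts classify_alleles(
        const std::vector<std::vector<unsigned int>>& counts) {
    SiteAlleleCounts res;
    if (counts.empty()) return res;
    const std::size_t num_alleles = counts[0].size();
    for (std::size_t a = 0; a < num_alleles; ++a) {
        unsigned int pops_with_allele = 0;
        bool fixed_somewhere = false;
        for (const auto& pop : counts) {
            assert(pop.size() == num_alleles);
            unsigned int n = 0;                 // sample size of this population
            for (unsigned int c : pop) n += c;
            if (n == 0) continue;               // populations without samples are skipped
            if (pop[a] > 0) ++pops_with_allele;
            if (pop[a] == n) fixed_somewhere = true;
        }
        if (pops_with_allele == 1) ++res.Sp;
        if (fixed_somewhere) ++res.FxA;
    }
    return res;
}
```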
Header: <egglibcpp/AlleleStatus.hpp>
Public Functions

egglib::AlleleStatus::
AlleleStatus
()¶ Constructor.

egglib::AlleleStatus::
~AlleleStatus
()¶ Destructor.

unsigned int
egglib::AlleleStatus::
FxA
()¶
const Fixed alleles.

unsigned int
egglib::AlleleStatus::
FxA_T1
()¶
const Fixed alleles.

unsigned int
egglib::AlleleStatus::
FxD
()¶
const Fixed differences.

unsigned int
egglib::AlleleStatus::
FxD_T1
()¶
const Fixed differences.

unsigned int
egglib::AlleleStatus::
nsites
()¶
const Number of sites with valid data.

unsigned int
egglib::AlleleStatus::
nsites_o
()¶
const Number of orientable sites with valid data.

void
egglib::AlleleStatus::
process
(const FreqHolder &freqs)¶ Analyze a site.

void
egglib::AlleleStatus::
reset
()¶ Reset sums (but keep toggle flag)

unsigned int
egglib::AlleleStatus::
ShA
()¶
const Shared alleles.

unsigned int
egglib::AlleleStatus::
ShA_T1
()¶
const Shared alleles.

unsigned int
egglib::AlleleStatus::
ShP
()¶
const Shared polymorphisms.

unsigned int
egglib::AlleleStatus::
ShP_T1
()¶
const Shared polymorphisms.

unsigned int
egglib::AlleleStatus::
Sp
()¶
const Pop-specific alleles.

unsigned int
egglib::AlleleStatus::
Sp_T1
()¶
const Pop-specific alleles.

unsigned int
egglib::AlleleStatus::
Spd
()¶
const Pop-specific derived alleles.

unsigned int
egglib::AlleleStatus::
Spd_T1
()¶
const Pop-specific derived alleles.

void
egglib::AlleleStatus::
total
()¶ Copy all sums to direct accessors.

ComputeV¶
 class
Compute allele size variance.
Public Functions

egglib::ComputeV::
ComputeV
()¶ Constructor.

egglib::ComputeV::
~ComputeV
()¶ Destructor.

double
egglib::ComputeV::
average
()¶
const Get average V (UNDEF if no computed values)

unsigned int
egglib::ComputeV::
num_sites
()¶
const Number of sites with computed V.

void
egglib::ComputeV::
reset
()¶ Reset.

void
egglib::ComputeV::
set_allele
(unsigned int, int a)¶ Set an allele value.

void
egglib::ComputeV::
setup_alleles
(unsigned int)¶ Specify number of alleles.

void
egglib::ComputeV::
setup_alleles_from_site
(const SiteHolder &site)¶ Get alleles directly from site.

Generic classes¶
Diversity1¶
 class
Compute population summary statistics from allele frequencies at several sites.
Diversity1 instances cannot be copied. This class is designed to allow reuse of objects without unnecessary memory reallocation.
This class computes statistics that do not require access to a full Site instance and for which only frequencies are needed. The frequencies of all sites to be analyzed should be loaded.
Statistics:
code     requirement                   flag   toggle flag
=========================================================
lt       -                             -      -
ls       -                             -      -
nsmax    ls>0                          1      -
S        ls>0                          1      -
Ss       ls>0                          1      -
eta      ls>0                          1      -
Pi       ls>0                          1      -
lso      -                             4      ori_site
nsmaxo   lso>0                         8      ori_site
So       lso>0                         8      ori_site
Sso      lso>0                         8      ori_site
etao     lso>0                         8      ori_site
lM       lso>0                         8      ori_site
pM       lM>0                          16     ori_site
nseffo   lso>0                         32     ori_div
thetaH   lso>0                         32     ori_div
thetaL   lso>0                         32     ori_div
Hns      lso>0                         32     ori_div
Hsd      So>0 & nseffo>=3 & varZ>0     1024   ori_div
E        So>0 & nseffo>=3 & varE>0     2048   ori_div
Dfl      So>0 & nseffo>=3 & varDfl>0   4096   ori_div
F        So>0 & nseffo>=3 & varF>0     8192   ori_div
nseff    ls>0                          128    basic
thetaW   ls>0                          128    basic
Dxy      ls>0 npop=2                   16384  basic
Da       ls>0 npop=2                   16384  basic
Fstar    S>0 & ns>2                    256    basic
D        S>0 & ns>3                    512    basic
Deta     S>0 & ns>3                    512    basic
Dstar    S>0 & ns>3                    512    basic
sites    i<S                           -      site_lists
sites_o  i<So                          -      site_lists
singl    i<Ss                          -      site_lists
singl_o  i<Sso                         -      site_lists
Header: <egglibcpp/Diversity.hpp>
Public Functions

egglib::Diversity1::
Diversity1
()¶ Constructor.

egglib::Diversity1::
~Diversity1
()¶ Destructor.

unsigned int
egglib::Diversity1::
compute
()¶ Compute statistics, return flag but does not reset.

double
egglib::Diversity1::
D
()¶
const Tajima’s D.
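For reference, the standard formulas behind thetaW and Tajima’s D can be sketched as follows. This is a standalone illustration based on the published definitions (Watterson 1975, Tajima 1989), not a copy of the Diversity1 code; the function names are hypothetical, and NaN stands in for the library’s UNDEF value.

```cpp
#include <cassert>
#include <cmath>

// Watterson's estimator: S / a1, with a1 = sum_{i=1}^{n-1} 1/i.
inline double watterson_theta(unsigned int n, unsigned int S) {
    assert(n >= 2);
    double a1 = 0.0;
    for (unsigned int i = 1; i < n; ++i) a1 += 1.0 / i;
    return S / a1;
}

// Tajima's D from the number of samples n, the number of polymorphic
// sites S and the average number of pairwise differences pi.
// Returns NaN if S is 0 or n is not larger than 3.
inline double tajima_D(unsigned int n, unsigned int S, double pi) {
    if (S == 0 || n < 4) return std::nan("");
    double a1 = 0.0, a2 = 0.0;
    for (unsigned int i = 1; i < n; ++i) {
        a1 += 1.0 / i;
        a2 += 1.0 / (static_cast<double>(i) * i);
    }
    const double nd = n;
    const double b1 = (nd + 1) / (3 * (nd - 1));
    const double b2 = 2 * (nd * nd + nd + 3) / (9 * nd * (nd - 1));
    const double c1 = b1 - 1 / a1;
    const double c2 = b2 - (nd + 2) / (a1 * nd) + a2 / (a1 * a1);
    const double e1 = c1 / a1;
    const double e2 = c2 / (a1 * a1 + a2);
    // Difference between the two theta estimators, scaled by its
    // estimated standard deviation.
    return (pi - S / a1) / std::sqrt(e1 * S + e2 * S * (S - 1.0));
}
```

D is zero when pi equals S / a1, positive when pi is larger and negative when it is smaller.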

double
egglib::Diversity1::
Da
()¶
const Net pairwise distance for 1st pair.

double
egglib::Diversity1::
Deta
()¶
const Tajima’s D using eta instead of S.

double
egglib::Diversity1::
Dfl
()¶
const Fu and Li’s D.

double
egglib::Diversity1::
Dstar
()¶
const Fu and Li’s D*.

double
egglib::Diversity1::
Dxy
()¶
const Pairwise distance for 1st pair.

double
egglib::Diversity1::
E
()¶
const Zeng et al.’s E.

unsigned int
egglib::Diversity1::
eta
()¶
const eta

unsigned int
egglib::Diversity1::
etao
()¶
const eta for orientable sites

double
egglib::Diversity1::
F
()¶
const Fu and Li’s F.

double
egglib::Diversity1::
Fstar
()¶
const Fu and Li’s F*.

double
egglib::Diversity1::
Hns
()¶
const Unstandardized Fay and Wu’s H.

double
egglib::Diversity1::
Hsd
()¶
const Fay and Wu’s H standardized by Zeng et al.

void
egglib::Diversity1::
load
(const FreqHolder &freqs, const SiteDiversity &div, unsigned int position)¶ Analyze a site.

unsigned int
egglib::Diversity1::
ls
()¶
const Number of loaded sites (with >=2 valid data)

unsigned int
egglib::Diversity1::
lso
()¶
const Number of loaded orientable sites (with valid data)

unsigned int
egglib::Diversity1::
lt
()¶
const Number of loaded sites (total)

unsigned int
egglib::Diversity1::
nM
()¶
const Sites available for MFDM test.

double
egglib::Diversity1::
nseff
()¶
const Average number of used samples.

double
egglib::Diversity1::
nseffo
()¶
const Average number of used samples for orientable sites.

unsigned int
egglib::Diversity1::
nsingld
()¶
const Number derived singletons.

unsigned int
egglib::Diversity1::
nsmax
()¶
const Largest number of used samples.

unsigned int
egglib::Diversity1::
nsmaxo
()¶
const Largest number of used samples for orientable sites.

double
egglib::Diversity1::
Pi
()¶
const Sum of He.

double
egglib::Diversity1::
pM
()¶
const Li’s MFDM test p value (large positive value by default)

void
egglib::Diversity1::
reset_stats
()¶ Reset counters to 0.

unsigned int
egglib::Diversity1::
S
()¶
const Number of polymorphic sites.

void
egglib::Diversity1::
set_option_multiple
(bool b)¶ Set multiple option (default: False)

void
egglib::Diversity1::
set_option_ns_set
(unsigned int)¶ Set maximum number of samples, for H and co. (default: UNKNOWN)

unsigned int
egglib::Diversity1::
singl
(unsigned int)¶
const Get position of site with a singleton.

unsigned int
egglib::Diversity1::
singl_o
(unsigned int)¶
const Get position of site with an orientable singleton.

unsigned int
egglib::Diversity1::
site
(unsigned int)¶
const Get position of polymorphic site.

unsigned int
egglib::Diversity1::
site_o
(unsigned int)¶
const Get position of polymorphic orientable site.

unsigned int
egglib::Diversity1::
So
()¶
const Number of polymorphic orientable sites.

unsigned int
egglib::Diversity1::
Ss
()¶
const Number of polymorphic sites with =1 singleton.

unsigned int
egglib::Diversity1::
Sso
()¶
const Number of polymorphic orientable sites with =1 singleton.

double
egglib::Diversity1::
thetaH
()¶
const ThetaH estimator.

double
egglib::Diversity1::
thetaL
()¶
const ThetaL estimator.

double
egglib::Diversity1::
thetaPi
()¶
const thetaPi estimator (using orientable sites)

double
egglib::Diversity1::
thetaW
()¶
const Theta estimator based on S.

void
egglib::Diversity1::
toggle_basic
()¶ Activate basic per-gene.

void
egglib::Diversity1::
toggle_off
()¶ Cancel all flags.

void
egglib::Diversity1::
toggle_ori_div
()¶ Activate per-gene oriented.

void
egglib::Diversity1::
toggle_ori_site
()¶ Activate per-site oriented.

void
egglib::Diversity1::
toggle_site_lists
()¶ Activate lists of site positions.

Diversity2¶
 class
Compute population summary statistics from an array of sites.
Diversity2 instances cannot be copied. This class is designed to allow reuse of objects without unnecessary memory reallocation.
This class computes statistics that require access to the full Site instance (and therefore to the individual allele of each individual). Sites with missing data are ignored when computing Wall’s statistics.
Meaning of flag:
 flag&1 an error occurred (+ one of 2, 4, 8, 16, 32)
 flag&2 error: less than 2 samples (including missing)
 flag&4 error: inconsistent number of samples
 flag&8 error: inconsistent ploidy
 flag&16 error: inconsistent frequency holder (sample size not checked)
 flag&32 error: provided SiteDiversity does not have proper data
 flag&64 at least 1 polymorphic site with at least 2 non-missing samples
 flag&128 at least 1 polymorphic, orientable site with at least 2 non-missing samples
 flag&256 computed R2, R3, R4, and Ch
 flag&512 computed R2E, R3E, R4E, and ChE
 flag&1024 computed B and Q (at least 2 sites with no missing data)
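A returned flag of this kind is typically inspected with bitwise tests. The sketch below shows one way to do that; the bit values are taken from the list above, while the constant names and helper functions are hypothetical and not part of the library.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Bit values from the flag list above; the names are illustrative only.
enum : unsigned int {
    DIV2_ERROR          = 1,
    DIV2_ERR_FEW_SAM    = 2,
    DIV2_ERR_NUM_SAM    = 4,
    DIV2_ERR_PLOIDY     = 8,
    DIV2_ERR_FREQ       = 16,
    DIV2_ERR_SITEDIV    = 32,
    DIV2_HAS_POLYM      = 64,
    DIV2_HAS_POLYM_ORI  = 128,
    DIV2_HAS_SINGLETONS = 256,   // R2, R3, R4 and Ch
    DIV2_HAS_SINGL_EXT  = 512,   // R2E, R3E, R4E and ChE
    DIV2_HAS_PARTITIONS = 1024   // B and Q
};

// List the error conditions encoded in a flag (empty if none).
inline std::vector<std::string> describe_errors(unsigned int flag) {
    std::vector<std::string> msgs;
    if (flag & DIV2_ERR_FEW_SAM) msgs.push_back("less than 2 samples");
    if (flag & DIV2_ERR_NUM_SAM) msgs.push_back("inconsistent number of samples");
    if (flag & DIV2_ERR_PLOIDY)  msgs.push_back("inconsistent ploidy");
    if (flag & DIV2_ERR_FREQ)    msgs.push_back("inconsistent frequency holder");
    if (flag & DIV2_ERR_SITEDIV) msgs.push_back("SiteDiversity without proper data");
    return msgs;
}

// B and Q should only be read if no error occurred and bit 1024 is set.
inline bool can_read_wall_stats(unsigned int flag) {
    return !(flag & DIV2_ERROR) && (flag & DIV2_HAS_PARTITIONS);
}
```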
Header: <egglibcpp/Diversity.hpp>
Public Functions

egglib::Diversity2::
Diversity2
()¶ Constructor.

egglib::Diversity2::
~Diversity2
()¶ Destructor.

double
egglib::Diversity2::
B
()¶
const Wall’s statistic.

double
egglib::Diversity2::
Ch
()¶
const Ramos-Onsins and Rozas’s statistic.

double
egglib::Diversity2::
ChE
()¶
const Ramos-Onsins and Rozas’s statistic.

unsigned int
egglib::Diversity2::
compute
()¶ Compute singletons and/or partitions stats, return flag.

double
egglib::Diversity2::
k
()¶
const Average number of differences.

double
egglib::Diversity2::
ko
()¶
const Average number of differences at orientable sites.

void
egglib::Diversity2::
load
(const SiteHolder &site, const SiteDiversity &div, const FreqHolder &frq)¶ Load site with its matching structure (requires basic stats)

unsigned int
egglib::Diversity2::
num_clear
()¶
const Number of sites with 0 missing data (Wall stats)

unsigned int
egglib::Diversity2::
num_orientable
()¶
const Number of orientable sites.

unsigned int
egglib::Diversity2::
num_sequences
()¶
const Number of sequences.

unsigned int
egglib::Diversity2::
num_sites
()¶
const Number of loaded sites (only polymorphic)

double
egglib::Diversity2::
Q
()¶
const Wall’s statistic.

double
egglib::Diversity2::
R2
()¶
const Ramos-Onsins and Rozas’s statistic.

double
egglib::Diversity2::
R2E
()¶
const Ramos-Onsins and Rozas’s statistic.

double
egglib::Diversity2::
R3
()¶
const Ramos-Onsins and Rozas’s statistic.

double
egglib::Diversity2::
R3E
()¶
const Ramos-Onsins and Rozas’s statistic.

double
egglib::Diversity2::
R4
()¶
const Ramos-Onsins and Rozas’s statistic.

double
egglib::Diversity2::
R4E
()¶
const Ramos-Onsins and Rozas’s statistic.

void
egglib::Diversity2::
reset
()¶ Restore all variables to the default state (except toggled flags)

void
egglib::Diversity2::
set_option_multiple
(bool b)¶ Toggle option for multiple alleles.

void
egglib::Diversity2::
toggle_off
()¶ Cancel flags.

void
egglib::Diversity2::
toggle_partitions
()¶ Activate computation of B and Q stats (must be set before load())

void
egglib::Diversity2::
toggle_singletons
()¶ Activate computation of Rx/Ch RxE/ChE stats.
Other statistics¶
Haplotypes¶
 class
Identifies haplotypes from a set of sites.
How to use this class:
 1) Set up (or reset) the instance with setup(), passing an optional structure.
 2) Load all sites with load(site). Haplotypes are computed and all samples with at least one missing data are marked as missing. While loading sites, you can monitor the values of:
 3) Call cp_haplotypes() to finalize haplotype processing. After that you may use:
 4) If you wish (but you don’t have to), you can try and guess the haplotype of samples with missing data. For this you need first to call prepare_impute(). After that, you may use:
 5) If you impute, load all sites again with resolve(). This may change the value of:
 map()
 6) If you impute, you are required to call impute() (and otherwise you can’t). If you do, the following values will be updated [note: maybe impute and cp_haplotypes do exactly the same]
 ni_hapl()
 ng_hapl()
 freq_i()
 freq_o() and these values will be set:
 num_potential()
 potential()
 7) Whether or not you performed 4-6, you can now call this to make a site:
 8) Whether or not you performed 4-6 and/or 7, you can now call cp_dist(), which gives you access to the distance matrix:
 9) To compute stats, call cp_stats() (you must have set a structure and called 8). Then you may use:
 Fst()
 Kst() The function cp_stats() returns a flag: 0 (no stats computed), 1 (Fst computed), or 2 (both Fst and Kst computed).
 10) You can also call this method (requires a structure and 8 but not 9, and valid if nstot is >1):
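The core of steps 1 to 3 (grouping samples into haplotypes and setting aside samples with missing data) can be illustrated with the standalone sketch below. It does not use the Haplotypes class: the missing-data code, the struct and the function name are assumptions made for the example, and imputation, structure handling and differentiation statistics are not covered.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

const int MISSING_ALLELE = -1;  // hypothetical missing-data code
const std::size_t NO_HAPLOTYPE = static_cast<std::size_t>(-1);

struct HaplotypeResult {
    std::vector<std::vector<int>> haplotypes;  // distinct allele vectors
    std::vector<std::size_t> sample_map;       // haplotype index of each sample
};

// sites[s][i] is the allele of sample i at site s. Samples with at least
// one missing allele are marked with NO_HAPLOTYPE.
inline HaplotypeResult identify_haplotypes(
        const std::vector<std::vector<int>>& sites, std::size_t num_samples) {
    HaplotypeResult res;
    res.sample_map.assign(num_samples, NO_HAPLOTYPE);
    std::map<std::vector<int>, std::size_t> index;  // haplotype -> index
    for (std::size_t i = 0; i < num_samples; ++i) {
        std::vector<int> hap;
        bool missing = false;
        for (const auto& site : sites) {
            assert(site.size() == num_samples);
            if (site[i] == MISSING_ALLELE) { missing = true; break; }
            hap.push_back(site[i]);
        }
        if (missing) continue;
        auto it = index.find(hap);
        if (it == index.end()) {
            // First occurrence of this haplotype: give it the next index.
            it = index.emplace(hap, res.haplotypes.size()).first;
            res.haplotypes.push_back(hap);
        }
        res.sample_map[i] = it->second;
    }
    return res;
}
```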
Header: <egglibcpp/Haplotypes.hpp>
Public Functions

egglib::Haplotypes::
Haplotypes
()¶ Constructor.

egglib::Haplotypes::
~Haplotypes
()¶ Destructor.

void
egglib::Haplotypes::
cp_dist
()¶ Compute distance matrix.

void
egglib::Haplotypes::
cp_haplotypes
()¶ Finalize haplotype estimation.

unsigned int
egglib::Haplotypes::
cp_stats
()¶ Compute differentiation stats.

unsigned int
egglib::Haplotypes::
dist
(unsigned int, unsigned int)¶
const Distance matrix entry (0<=j<i<_nt_hapl)

unsigned int
egglib::Haplotypes::
freq_i
(unsigned int)¶
const Frequency of haplotype in ingroup.

unsigned int
egglib::Haplotypes::
freq_o
(unsigned int)¶
const Frequency of haplotype in outgroup.

double
egglib::Haplotypes::
Fst
()¶
const Fst value.

void
egglib::Haplotypes::
get_site
(SiteHolder &site)¶ Get haplotypic data as a site (recycle passed object)

unsigned int
egglib::Haplotypes::
hapl
(unsigned int, unsigned int)¶
const Get site j of haplotype i.

void
egglib::Haplotypes::
impute
()¶ Try to guess haplotype of samples with missing data.

double
egglib::Haplotypes::
Kst
()¶
const Gst value.

void
egglib::Haplotypes::
load
(const SiteHolder &site)¶ Process a site (implies call to setup())

unsigned int
egglib::Haplotypes::
map_sample
(unsigned int)¶
const Get haplotype index of a sample.

unsigned int
egglib::Haplotypes::
mis_idx
(unsigned int)¶
const Index of one of the samples with missing data.

unsigned int
egglib::Haplotypes::
n_ing
()¶
const Total number of ingroup samples.

unsigned int
egglib::Haplotypes::
n_mis
()¶
const Number of samples with missing data.

unsigned int
egglib::Haplotypes::
n_missing
(unsigned int)¶
const Number of missing data per sample.

unsigned int
egglib::Haplotypes::
n_otg
()¶
const Total number of outgroup samples.

unsigned int
egglib::Haplotypes::
n_pop
()¶
const Number of populations.

unsigned int
egglib::Haplotypes::
n_potential
(unsigned int)¶
const Number of compatible haplotypes for a sample with missing data.

unsigned int
egglib::Haplotypes::
n_sam
()¶
const Total number of samples.

unsigned int
egglib::Haplotypes::
n_sites
()¶
const Number of sites.

unsigned int
egglib::Haplotypes::
ne_ing
()¶
const Non-missing number of ingroup samples.

unsigned int
egglib::Haplotypes::
ne_otg
()¶
const Non-missing number of outgroup samples.

unsigned int
egglib::Haplotypes::
ne_pop
()¶
const Number of populations with.

unsigned int
egglib::Haplotypes::
ng_hapl
()¶
const Number of haplotypes at nonzero frequency overall.

unsigned int
egglib::Haplotypes::
ni_hapl
()¶
const Number of haplotypes at nonzero ingroup frequency.

unsigned int
egglib::Haplotypes::
nstot
()¶
const Number of samples (non-missing and not ignored by structure)

unsigned int
egglib::Haplotypes::
nt_hapl
()¶
const Total number of haplotypes (including truncated ones)

unsigned int
egglib::Haplotypes::
potential
(unsigned int, unsigned int)¶
const Index of a potential haplotype.

void
egglib::Haplotypes::
prepare_impute
(unsigned int)¶ Initialize required tables for imputing.

void
egglib::Haplotypes::
resolve
(const SiteHolder &site)¶ Second pass: try to resolve missing data.

void
egglib::Haplotypes::
setup
(const StructureHolder *struc)¶ Setup/reset instance (with optional structure object)

double
egglib::Haplotypes::
Snn
()¶
const Snn value computed on the fly.
ParalogPi¶
 class
Specifically designed to compute Innan 2003 statistics for a gene family.
This class computes the within- and between-paralog Pi of Innan (2003). The within-paralog Pi is the same as the standard Pi, except that it is not unbiased. The between-paralog Pi is the same as Dxy, taking the paralogs as populations, except that one pair of genes (paralogs from the same sample) is not considered.
Setup provides the two structure objects describing respectively the structure in paralogs and the structure in samples (two different structure objects are required because they are necessarily non-nested). It is required that the structures of interest are loaded as population levels. Cluster levels are ignored. It is required that ploidy is 1 (if you have genotypic data, skip the individual level). Required tests: the ploidy of both structures and of all sites is equal to 1, and the maximum index of the paralog structure is represented in all sites (other disagreements are treated as missing data).
Public Functions

egglib::ParalogPi::
ParalogPi
()¶ Constructor (default: 0 pop)

egglib::ParalogPi::
~ParalogPi
()¶ Destructor.

void
egglib::ParalogPi::
load
(const SiteHolder &site)¶ Load a site.

unsigned int
egglib::ParalogPi::
num_paralogs
()¶
const Number of copies (or: number of pops)

unsigned int
egglib::ParalogPi::
num_samples
()¶
const Number of samples (or: size of each pop)

unsigned int
egglib::ParalogPi::
num_sites_pair
(unsigned int, unsigned int)¶
const Number of analyzed sites for a pair of paralogs.

unsigned int
egglib::ParalogPi::
num_sites_paralog
(unsigned int)¶
const Number of analyzed sites for a paralog.

unsigned int
egglib::ParalogPi::
num_sites_tot
()¶
const Total number of analyzed sites.

double
egglib::ParalogPi::
Pib
(unsigned int, unsigned int)¶
const Between-paralog Pi.

double
egglib::ParalogPi::
Piw
(unsigned int)¶
const Within-paralog Pi.

void
egglib::ParalogPi::
reset
(const StructureHolder &str_prl, const StructureHolder &str_idv)¶ Reset and setup structure.

Rd¶
 class
Compute the \bar{r}_d (or rD) statistic.
Rd instances cannot be copied. The procedure is:
 Load as many sites as needed. They are analysed on the fly. The number of samples and ploidy are expected to match (as well as the order of individuals and alleles). If there is a mismatch in ploidy and/or number of samples, the Rd value will be UNDEF.
 Compute the Rd value (resets the instance).
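The text does not state which formula the class implements. As an illustration only, the sketch below assumes the usual definition of Agapow and Burt (2001) for haploid data without missing values: the observed variance of pairwise distances minus the sum of per-locus variances, divided by twice the sum over locus pairs of the square root of the product of their variances. The function name and data layout are hypothetical, and NaN stands in for UNDEF.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// data[i][j] is the allele of individual i at locus j.
inline double r_bar_d(const std::vector<std::vector<int>>& data) {
    const std::size_t n = data.size();
    if (n < 2) return std::nan("");
    const std::size_t L = data[0].size();
    if (L < 2) return std::nan("");
    const double np = n * (n - 1) / 2.0;         // number of pairs of individuals
    std::vector<double> sum(L, 0.0), sum_sq(L, 0.0);
    double tot = 0.0, tot_sq = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        assert(data[i].size() == L);
        for (std::size_t k = i + 1; k < n; ++k) {
            double D = 0.0;                      // distance summed over loci
            for (std::size_t j = 0; j < L; ++j) {
                const double d = (data[i][j] != data[k][j]) ? 1.0 : 0.0;
                sum[j] += d;
                sum_sq[j] += d * d;
                D += d;
            }
            tot += D;
            tot_sq += D * D;
        }
    }
    std::vector<double> var(L);
    double V_E = 0.0;                            // expected variance: sum of per-locus variances
    for (std::size_t j = 0; j < L; ++j) {
        var[j] = sum_sq[j] / np - (sum[j] / np) * (sum[j] / np);
        V_E += var[j];
    }
    const double V_O = tot_sq / np - (tot / np) * (tot / np);  // observed variance
    double denom = 0.0;
    for (std::size_t j = 0; j < L; ++j)
        for (std::size_t k = j + 1; k < L; ++k)
            denom += std::sqrt(var[j] * var[k]);
    denom *= 2.0;
    if (denom <= 0.0) return std::nan("");       // not computable
    return (V_O - V_E) / denom;
}
```

With two perfectly associated loci, this sketch returns a value of 1.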
Public Functions

egglib::Rd::
Rd
()¶ Constructor.

egglib::Rd::
~Rd
()¶ Destructor.

double
egglib::Rd::
compute
()¶ Compute rD (UNDEF if not computable)

void
egglib::Rd::
load
(const SiteHolder &site)¶ Load a site.

unsigned int
egglib::Rd::
num_indiv
()¶
const Get number of individuals as provided to configure()

unsigned int
egglib::Rd::
num_loci
()¶
const Get number of processed loci (some loci may be skipped)

unsigned int
egglib::Rd::
ploidy
()¶
const Get ploidy as provided to configure()

void
egglib::Rd::
reset
()¶ Reset to initial state.
Fu’s F¶

double
egglib::
Fs
(unsigned int n, unsigned int K, double pi)¶ Compute Fu’s Fs.
This function computes Fu’s Fs statistic using haplotype statistics (that should have been computed using the Haplotypes class) and, as a theta estimator, pi provided by the Diversity1 class. The values must have been computed using the same data set.
Warning: this function is not available for values of n (number of samples) larger than MAX_N_STIRLING. K must be >= 1 and <= n.
The behaviour of the function is not defined if K < 0. The function returns UNDEF if the value cannot be computed, which can happen:
 if n is larger than MAX_N_STIRLING;
 if the sum of probabilities of k values >= K is too close to 0 or 1 (based on the computer’s precision);
 if pi is 0 (no polymorphism);
 if K > n (which is an error).
 Parameters
n
number of exploited samples for determining the number of haplotypes.
K
number of haplotypes obtained with the same data.
pi
average number of pairwise differences (as theta estimator, per gene).
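The computation can be sketched from the published definition of Fs (Fu 1997): the probability of observing at least K haplotypes, given n and theta = pi, is obtained from Ewens’ sampling formula using unsigned Stirling numbers of the first kind, and Fs is the logit of that probability. The code below is an independent illustration with hypothetical function names. It stores Stirling numbers as plain doubles, so it is only suitable for small n, and it returns NaN where the library would return UNDEF.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// p[k] = probability of k haplotypes in n samples given theta:
// |s(n,k)| * theta^k / (theta * (theta+1) * ... * (theta+n-1)).
inline std::vector<double> haplotype_number_probs(unsigned int n, double theta) {
    assert(n >= 1 && theta > 0.0);
    // Unsigned Stirling numbers of the first kind, computed in place with
    // |s(m,k)| = |s(m-1,k-1)| + (m-1) * |s(m-1,k)|.
    std::vector<double> row(n + 1, 0.0);
    row[0] = 1.0;
    for (unsigned int m = 1; m <= n; ++m) {
        for (unsigned int k = m; k >= 1; --k)
            row[k] = row[k - 1] + (m - 1.0) * row[k];
        row[0] = 0.0;
    }
    double rising = 1.0;  // rising factorial of theta
    for (unsigned int i = 0; i < n; ++i) rising *= theta + i;
    std::vector<double> p(n + 1, 0.0);
    double theta_k = 1.0;
    for (unsigned int k = 1; k <= n; ++k) {
        theta_k *= theta;
        p[k] = row[k] * theta_k / rising;
    }
    return p;
}

// Fs = ln(S' / (1 - S')), where S' is the probability of at least K haplotypes.
inline double fu_Fs(unsigned int n, unsigned int K, double pi) {
    if (K < 1 || K > n || pi <= 0.0) return std::nan("");
    const std::vector<double> p = haplotype_number_probs(n, pi);
    double s_prime = 0.0;
    for (unsigned int k = K; k <= n; ++k) s_prime += p[k];
    if (s_prime <= 0.0 || s_prime >= 1.0) return std::nan("");
    return std::log(s_prime / (1.0 - s_prime));
}
```

Because S' decreases as K increases, larger numbers of haplotypes give lower Fs values for the same n and pi.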
Linkage analysis¶
PairwiseLD¶
 class
Analyzes linkage disequilibrium for a pair of polymorphic sites.
This class considers a single pair of polymorphic sites at a time. The first method, process(), detects alleles at both sites under consideration and determines whether the pairwise comparison is fit for analysis (based on the presence of polymorphism, and allele frequencies). Statistics are computed by compute() for a given pair of alleles. This lets the user filter out sites that have more than two alleles and, if necessary, process multiple pairs of alleles.
One should first process a pair of sites with process(). If the return value is false, one should not process data further. Otherwise, one can access data with num_alleles1(), num_alleles2(), index1(), index2(), freq1(), freq2(), freq(), and nsam(), and can also compute LD with compute() (for a given pair of alleles) and then access the LD estimates.
Public Functions

egglib::PairwiseLD::
PairwiseLD
()¶ Default constructor.

egglib::PairwiseLD::
~PairwiseLD
()¶ Destructor.

void
egglib::PairwiseLD::
compute
(unsigned int allele1, unsigned int allele2)¶ Compute D, D’, r and r^2 statistics for a given pair of alleles.
The method process() must have been executed and must have returned true.
Statistics are computed only for a given pair of alleles. If there are only two alleles, all allele pairs result in consistent results. Otherwise, some multiallele summarizing methodology has to be applied.
allele1 and allele2 are the allele indexes at the first and second site, respectively.
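For one pair of alleles, the statistics follow from the four gamete counts. The sketch below uses the standard textbook definitions of D, D’, r and r^2; it is not taken from the PairwiseLD source, and the struct, the function name and the sign convention for D’ (same sign as D) are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct LDStats {  // hypothetical result holder
    double D, Dp, r, rsq;
};

// nAB, nAb, naB, nab: counts of the four gametes, where A is the focal
// allele at the first site and B the focal allele at the second site
// (a and b stand for any other allele). Both sites must be polymorphic.
inline LDStats ld_from_counts(unsigned int nAB, unsigned int nAb,
                              unsigned int naB, unsigned int nab) {
    const double n = static_cast<double>(nAB) + nAb + naB + nab;
    assert(n > 0.0);
    const double pA = (static_cast<double>(nAB) + nAb) / n;
    const double pB = (static_cast<double>(nAB) + naB) / n;
    assert(pA > 0.0 && pA < 1.0 && pB > 0.0 && pB < 1.0);
    LDStats s;
    s.D = nAB / n - pA * pB;
    // Largest magnitude D can reach given the allele frequencies.
    const double Dmax = (s.D >= 0.0)
        ? std::min(pA * (1.0 - pB), (1.0 - pA) * pB)
        : std::min(pA * pB, (1.0 - pA) * (1.0 - pB));
    s.Dp = s.D / Dmax;
    s.r = s.D / std::sqrt(pA * (1.0 - pA) * pB * (1.0 - pB));
    s.rsq = s.r * s.r;
    return s;
}
```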

double
egglib::PairwiseLD::
D
()¶
const Get the D statistic.
This value is reset to 0 upon call to process().
Requires compute().

double
egglib::PairwiseLD::
Dp
()¶
const Get the D’ statistic.
This value is reset to 0 upon call to process().
Requires compute().

unsigned int
egglib::PairwiseLD::
freq
(unsigned int allele1, unsigned int allele2)¶
const Get the frequency of a genotype.
The method process() must have been executed and must have returned true.
The indexes must be smaller than the value returned by num_alleles1() and num_alleles2() respectively.

unsigned int
egglib::PairwiseLD::
freq1
(unsigned int allele)¶
const Get the frequency of an allele for the first site.
The method process() must have been executed and must have returned true.
The index must be smaller than the value returned by num_alleles1().

unsigned int
egglib::PairwiseLD::
freq2
(unsigned int allele)¶
const Get the frequency of an allele for the second site.
The method process() must have been executed and must have returned true.
The index must be smaller than the value returned by num_alleles2().

unsigned int
egglib::PairwiseLD::
index1
(unsigned int allele)¶
const Get the index of an allele for the first site.
The method process() must have been executed and must have returned true.
For a given allele, get its index within the original SiteHolder instance. The indexes can be shifted by process() due to missing data.

unsigned int
egglib::PairwiseLD::
index2
(unsigned int allele)¶
const Get the index of an allele for the second site.
The method process() must have been executed and must have returned true.
For a given allele, get its index within the original SiteHolder instance. The indexes can be shifted by process() due to missing data.

unsigned int
egglib::PairwiseLD::
nsam
()¶
const Get the number of analyzed samples.
The method process() must have been executed and must have returned true.
The returned value might be smaller than the initial number of samples due to missing data.

unsigned int
egglib::PairwiseLD::
num_alleles1
()¶
const Get the actual number of alleles at the first site.
The method process() must have been executed and must have returned true.
Gives the number of different alleles at the first site, considering only samples for which both sites have exploitable data.

unsigned int
egglib::PairwiseLD::
num_alleles2
()¶
const Get the actual number of alleles at the second site.
The method process() must have been executed and must have returned true.
Gives the number of different alleles at the second site, considering only samples for which both sites have exploitable data.

bool
egglib::PairwiseLD::
process
(const SiteHolder &site1, const SiteHolder &site2, unsigned min_n, double max_maj, const StructureHolder *stru)¶ Analyze a pair of sites.
The method takes two sites as arguments. The two sites must be taken from the same data set. In particular, the sample sizes must be identical. Structure and outgroup are ignored. The indexes of samples must match over the two sites. Samples that are missing at either of the sites are skipped. If fewer than min_n samples remain, the whole computation is dropped. Genotypes are ignored (only alleles are considered).
If this method returns true, statistics might be computed for a given pair of alleles using the compute() method. The number of alleles available for analysis is available at either site using num_alleles1() and num_alleles2(). When returning false, this method stops as early as possible, and the state of the object might be inconsistent. In this case, no accessor must be used and compute() must not be called.
 Return
 true if computations have been performed, false if the sites fall in one of the following cases: not enough samples (based on the min_n argument); either site is fixed; the allele frequencies are too unbalanced with at least one allele at a frequency larger than max_maj.
 Note
 Due to missing data, a site that is initially polymorphic might appear to be fixed when considering only samples that are not missing for the other site, causing this method to drop the pairwise comparison. Conversely, a site that has more than two alleles might have only two when considering only samples that are not missing for the other site. For this reason, it is not trivial to filter out sites before calling this method, and sites might not be consistently included or rejected.
 Parameters
site1
first site.
site2
second site.
min_n
minimum number of samples used (this value must always be larger than 1).
max_maj
maximum relative frequency of the majority allele (if any allele at either site has a frequency larger than this value, the pairwise comparison is dropped).
stru
a Structure object (used to process a subset of samples). By default, use all samples.
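The filtering logic described for process() can be summarized by the sketch below. It works on plain integer vectors rather than SiteHolder instances, ignores the structure argument, and uses a hypothetical missing-data code and hypothetical names throughout; it only illustrates the three rejection criteria.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

const int MISSING = -1;  // hypothetical missing-data code

struct PairSummary {  // hypothetical result holder
    unsigned int nsam = 0;                 // samples valid at both sites
    std::map<int, unsigned int> freq1;     // allele counts at the first site
    std::map<int, unsigned int> freq2;     // allele counts at the second site
};

// Returns false if the pair must be dropped: too few samples, a fixed
// site, or an allele whose relative frequency exceeds max_maj.
inline bool prefilter_pair(const std::vector<int>& site1,
                           const std::vector<int>& site2,
                           unsigned int min_n, double max_maj,
                           PairSummary& out) {
    assert(site1.size() == site2.size());
    out = PairSummary{};
    for (std::size_t i = 0; i < site1.size(); ++i) {
        // A sample missing at either site is skipped for both sites.
        if (site1[i] == MISSING || site2[i] == MISSING) continue;
        ++out.nsam;
        ++out.freq1[site1[i]];
        ++out.freq2[site2[i]];
    }
    if (out.nsam < min_n) return false;
    if (out.freq1.size() < 2 || out.freq2.size() < 2) return false;
    for (const auto& kv : out.freq1)
        if (kv.second > max_maj * out.nsam) return false;
    for (const auto& kv : out.freq2)
        if (kv.second > max_maj * out.nsam) return false;
    return true;
}
```

The second case in the usage below corresponds to the situation mentioned in the note: a site that is polymorphic on its own can become fixed once samples missing at the other site are removed.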

double
egglib::PairwiseLD::
r
()¶
const Get the r statistic.
This value is reset to 0 upon call to process().
Requires compute().

MatrixLD¶
 class
Analyzes linkage disequilibrium between pairs of sites.
This class processes a set of SiteHolder instances and computes linkage disequilibrium for all pairs of sites. A PairwiseLD instance is provided for each comparison, skipping all pairs for which LD cannot be computed (there are several criteria). The approach consists in first calling load() to provide a set of SiteHolder instances. The method computeLD() computes the LD for each pair and computeStats() computes the statistics of Kelly (1997) and Rozas et al. (2001). These statistics are based on the average of pairwise linkage disequilibrium statistics. In addition, computeRmin() computes Rm of Hudson and Kaplan (1985). It neither generates nor uses PairwiseLD instances, and it can be used independently.
Public Types

enum
egglib::MatrixLD::
MultiAllelic
¶ Flags for processing multiallelic sites.
This enum is used to specify what should be done with pairs of sites for which at least one site has more than two alleles.
Values:
ignore: Only process pairs of sites with exactly two alleles.
use_main: Use the allele with highest frequency.
use_all: Use all alleles.
Public Functions

egglib::MatrixLD::
MatrixLD
()¶ Constructor.

egglib::MatrixLD::
~MatrixLD
()¶ Destructor.

void
egglib::MatrixLD::
computeLD
(unsigned min_n, double max_maj)¶ Compute LD between all pairs of sites.
Use sites loaded using load() and process all possible pairs. Each pairwise comparison is retained only if all filters are passed (see arguments of this method). After calling this method, the number of pairs can be accessed using num_tot() (it is equal to n(n-1)/2 where n is the number of loaded sites); the number of analyzed pairs can be accessed using num_pairs(); the total number of analyzed allele pairs can be accessed using num_alleles(). Then the method computeStats() can be called to compute Kelly’s statistics.
 Note
 Due to missing data, it is not trivial to predict whether a pairwise comparison will be dropped. See the documentation of PairwiseLD::process().
 Parameters
min_n
minimum number of samples used (this value must always be larger than 1).
max_maj
maximum relative frequency of the majority allele (if any allele at either site has a frequency larger than this value, the pairwise comparison is dropped).

void
egglib::MatrixLD::
computeRmin
(bool oriented)¶ Computes Hudson and Kaplan’s Rmin.
To be used, this method requires that sites have been loaded in increasing position order. Only sites with exactly two alleles and no missing data at all are used. In addition, if the oriented argument is set to true, only orientable sites are considered. Sites with more than two alleles and sites with any missing data, and sites not orientable if oriented is set to true, are ignored.
When the method has finished, a few methods provide access to the results. Rmin_num_sites() gives the number of sites considered for the analysis. If the value is less than two, the statistic itself is not meaningful. Rmin() gives the minimum number of recombination events. Finally, all of the irreducible intervals containing a recombination event can be accessed using the two methods Rmin_left(unsigned int) and Rmin_right(unsigned int). The number of intervals is always Rmin().
 Parameters
oriented
if true, consider only orientable sites and apply the three-gamete (instead of four-gamete) rule. If false, ignore all outgroup data and include orientable and non-orientable sites.
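The unoriented case can be sketched with the four-gamete test followed by a greedy selection of non-overlapping intervals. This is an illustration only: the function names and data layout are hypothetical, the greedy step is assumed to be equivalent to the interval-reduction procedure of Hudson and Kaplan (1985), and the oriented (three-gamete) variant is not covered.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// True if all four gametes (00, 01, 10, 11) are observed for two
// biallelic sites coded as 0/1 with no missing data.
inline bool four_gametes(const std::vector<int>& a, const std::vector<int>& b) {
    assert(a.size() == b.size());
    bool seen[2][2] = {{false, false}, {false, false}};
    for (std::size_t i = 0; i < a.size(); ++i) {
        assert((a[i] == 0 || a[i] == 1) && (b[i] == 0 || b[i] == 1));
        seen[a[i]][b[i]] = true;
    }
    return seen[0][0] && seen[0][1] && seen[1][0] && seen[1][1];
}

// sites must be given in increasing position order. Returns disjoint
// intervals (left site index, right site index), each containing at
// least one recombination event; the number of intervals is the bound.
inline std::vector<std::pair<std::size_t, std::size_t>> rmin_intervals(
        const std::vector<std::vector<int>>& sites) {
    std::vector<std::pair<std::size_t, std::size_t>> all;
    for (std::size_t i = 0; i < sites.size(); ++i)
        for (std::size_t j = i + 1; j < sites.size(); ++j)
            if (four_gametes(sites[i], sites[j])) all.push_back({i, j});
    // Sort by right end; for equal right ends, shortest interval first.
    std::sort(all.begin(), all.end(),
              [](const std::pair<std::size_t, std::size_t>& x,
                 const std::pair<std::size_t, std::size_t>& y) {
                  if (x.second != y.second) return x.second < y.second;
                  return x.first > y.first;
              });
    std::vector<std::pair<std::size_t, std::size_t>> kept;
    for (const auto& iv : all) {
        // Intervals that only share an end point do not overlap.
        if (kept.empty() || iv.first >= kept.back().second) kept.push_back(iv);
    }
    return kept;
}
```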

void
egglib::MatrixLD::
computeStats
(MultiAllelic multiallelic, unsigned int min_freq)¶ Computes Kelly’s and Rozas et al.’s statistics.
Computes ZnS, Z*nS and Z*nS* (Kelly 1997), and Za and ZZ (Rozas et al. 2001) on the basis of analyzed site pairs (requires computeLD()). The number of allele pairs used for computing ZnS, Z*nS and Z*nS* is available as num_allele_pairs(), and the number of allele pairs used for computing Za and ZZ is available as num_allele_pairs_adj(). The statistics must not be used if the corresponding number of allele pairs is 0.
If multiallelic equals MatrixLD::ignore, only pairs of sites for which both sites have exactly two alleles are processed. In this case, the first allele of each site is considered. If multiallelic is MatrixLD::use_main, the alleles with highest frequency are considered (even if one or both sites have only two alleles). In case of equality, the first allele is considered. If multiallelic is MatrixLD::use_all, then all alleles of all sites are used, and the final statistics are averaged over num_alleles() (rather than num_pairs()).
 Parameters
multiallelic
modifies the behaviour of the method (see above).
min_freq
this flag has an effect only if used in conjunction with MatrixLD::use_all (it is ignored otherwise); if larger than 0, rather than using all alleles, use only those that have a frequency equal to or larger than the given value.
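Three of these statistics are simple averages of pairwise r^2 values: ZnS over all analyzed pairs (Kelly 1997), Za over pairs of adjacent polymorphic sites, and ZZ as the difference Za - ZnS (Rozas et al. 2001). The sketch below illustrates this with hypothetical types and names; it treats two sites as adjacent when their indexes differ by one, does not compute Z*nS or Z*nS*, and uses NaN where no pair is available.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct PairR2 {  // hypothetical input: one analyzed pair of sites
    std::size_t site1, site2;  // site indexes, with site1 < site2
    double rsq;                // r^2 for the pair
};

struct LDSummary {  // hypothetical result holder
    double ZnS, Za, ZZ;
    std::size_t num_pairs, num_adjacent;
};

inline LDSummary summarize_ld(const std::vector<PairR2>& pairs) {
    double sum_all = 0.0, sum_adj = 0.0;
    std::size_t n_adj = 0;
    for (const PairR2& p : pairs) {
        assert(p.site1 < p.site2);
        sum_all += p.rsq;
        if (p.site2 == p.site1 + 1) {  // adjacent polymorphic sites
            sum_adj += p.rsq;
            ++n_adj;
        }
    }
    LDSummary res;
    res.num_pairs = pairs.size();
    res.num_adjacent = n_adj;
    res.ZnS = pairs.empty() ? std::nan("") : sum_all / pairs.size();
    res.Za = (n_adj == 0) ? std::nan("") : sum_adj / n_adj;
    res.ZZ = res.Za - res.ZnS;  // NaN if either average is unavailable
    return res;
}
```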

unsigned int
egglib::MatrixLD::
distance
(unsigned int index)¶
const Get the distance between sites for a given pair.
Requires that enough pairs have been loaded using load() and that the requested index is smaller than num_pairs(). The distance is returned as an absolute value. It is not possible to determine to which sites the pair index corresponds. If you need this information, you might want to use PairwiseLD directly.

unsigned int
egglib::MatrixLD::
index1
(unsigned int allele)¶
const Index of first site for a given pair.
See pairLD().

unsigned int
egglib::MatrixLD::
index2
(unsigned int allele)¶
const Index of second site for a given pair.
See pairLD(). Note that index2 is always > index1.

void
egglib::MatrixLD::
load
(const SiteHolder &site, double position)¶ Load a site.
 Parameters
site
One of the sites.
position
All sites must have a valid position. Positions are required to be increasing. For computing Rmin, positions are ignored (they are only fed back if interval limits are required).

unsigned int
egglib::MatrixLD::
num_allele_pairs
()¶
const Number of allele pairs used to compute Kelly’s statistics.
Return the number of allele pairs used by computeStats() to compute Kelly’s ZnS, Z*nS and Z*nS* statistics. If multiallelic was ignore, this value equals the number of pairs of sites with exactly two alleles each (at most num_pairs()); if multiallelic was use_main, this value equals num_pairs(); if multiallelic was use_all, this value equals num_alleles(). If the returned value is 0 (no loaded pairs of sites, or no pairs of diallelic sites if multiallelic was set to MatrixLD::ignore), Kelly’s statistics are reset to 0 and should be considered not computable. If num_allele_pairs() is 0, none of Kelly’s and Rozas et al.’s statistics can be computed.

unsigned int
egglib::MatrixLD::
num_allele_pairs_adj
()¶
const Number of allele pairs used to compute Rozas et al.’s statistics.
Return the number of allele pairs used by computeStats() to compute Rozas et al.’s Za and ZZ statistics. See the documentation of num_allele_pairs() for reference. The meaning of this value is similar, except that it applies only to adjacent polymorphic sites (the value can hence only be smaller, or equal in limit cases). If this value is 0, Rozas et al.’s statistics are reset to 0 and should be considered not computable.

unsigned int
egglib::MatrixLD::
num_alleles
()¶
const Get the total number of allele pairs.
Requires computeLD(). Returns the sum of allele pairs over all analyzed sites (see num_pairs()). The minimum value is twice num_pairs() (since, by definition, there must be at least two alleles at each retained site).

unsigned int
egglib::MatrixLD::
num_pairs
()¶
const Get the number of analyzed pairs of sites.
Requires computeLD(). The returned value excludes all pairwise comparisons with no polymorphism or failing any other criterion (see the computeLD() method).

unsigned int
egglib::MatrixLD::
num_tot
()¶
const Get the total number of pairs of sites.
Requires that site pairs have been processed using computeLD().

const PairwiseLD &
egglib::MatrixLD::
pairLD
(unsigned int index)¶
const Get linkage disequilibrium for a given pair of sites.
Requires that enough pairs have been loaded using load() and that the requested index is smaller than num_pairs(). Use the methods index1() and index2() to obtain the corresponding site indexes.

unsigned int
egglib::MatrixLD::
process
(unsigned min_n, double max_maj, MultiAllelic multiallelic, unsigned int min_freq, bool oriented)¶ Call {computeLD() and computeStats()} and/or computeRmin() based on toggled flags.
Arguments are the same as for the three methods. Any value can be passed for arguments that are not used.
Return value is a flag with the following bits:
 0: ZnS, Z*nS, and Z*nS* are computed.
 1: Za and ZZ are computed.
 2: Rmin was computed.
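The returned flag can be unpacked with bit masks. The sketch below assumes that bit 0 is the least significant bit, which the description above does not state explicitly. ProcessResult and decode_process_flag() are hypothetical helper names, not part of the library.

```cpp
// Which groups of statistics process() reported as computed.
struct ProcessResult {
    bool kelly;  // bit 0: ZnS, Z*nS and Z*nS*
    bool rozas;  // bit 1: Za and ZZ
    bool rmin;   // bit 2: Rmin
};

// Assumption: bits are numbered from the least significant one.
ProcessResult decode_process_flag(unsigned int flag) {
    return {(flag & 1u) != 0, (flag & 2u) != 0, (flag & 4u) != 0};
}
```

Under this assumption, a return value of 5 would mean that Kelly’s statistics and Rmin are available while Za and ZZ are not.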

void
egglib::MatrixLD::
reset
()¶ Reset to defaults.

unsigned int
egglib::MatrixLD::
Rmin
()¶
const Minimal number of recombination events.
Requires computeRmin().

unsigned int
egglib::MatrixLD::
Rmin_left
(unsigned int i)¶
const Left bound of a recombination interval.
Requires computeRmin() and that Rmin_num_sites() is at least 2.

unsigned int
egglib::MatrixLD::
Rmin_num_sites
()¶
const Number of sites used for computing Rmin.
Requires computeRmin().
Fixed to 0 if there are fewer than two sites in total (no computation is performed by computeRmin() in that case). If Rmin_num_sites() is less than 2, Rmin() is not defined and fixed to 0.

unsigned int
egglib::MatrixLD::
Rmin_right
(unsigned int i)¶
const Right bound of a recombination interval.
Requires computeRmin() and that Rmin_num_sites() is at least 2.

void
egglib::MatrixLD::
toggle_off
()¶ Toggle all off.

void
egglib::MatrixLD::
toggle_Rmin
()¶ Toggle Rmin.

void
egglib::MatrixLD::
toggle_stats
()¶ Toggle summary statistics.

double
egglib::MatrixLD::
Za
()¶
const Get Rozas et al.’s Za statistic.
Requires computeStats(). See the documentation of this method to know when this value is defined.

double
egglib::MatrixLD::
ZnS
()¶
const Get Kelly’s ZnS statistic.
Requires computeStats(). See the documentation of this method to know when this value is defined.

double
egglib::MatrixLD::
ZnS_star1
()¶
const Get Kelly’s Z*nS statistic.
Requires computeStats(). See the documentation of this method to know when this value is defined.

double
egglib::MatrixLD::
ZnS_star2
()¶
const Get Kelly’s Z*nS* statistic.
Requires computeStats(). See the documentation of this method to know when this value is defined.

double
egglib::MatrixLD::
ZZ
()¶
const Get Rozas et al.’s ZZ statistic.
Requires computeStats(). See the documentation of this method to know when this value is defined.

enum
See also the Rd class.
Extended haplotype homozygosity¶
 class
Compute Extended Haplotype Homozygosity statistics.
Compute statistics described in Sabeti et al. (Nature 2002), Voight et al. (PLoS Biology 2006), Ramirez-Soriano et al. (Genetics 2008) and Tang et al. (PLoS Biology 2007).
The user must first load the core haplotype or site using the set_core() method, which also allows option values to be specified, and then all needed distant sites using load_distant(). Distant sites must be loaded for one side only and with always increasing distance relative to the core. To load sites of the other side, the user needs to call set_core() again with the same core site in order to reset statistics. Statistics are automatically computed and updated at each loaded distant site. It is required to load at least one valid core site before using accessors.
Header: <egglibcpp/EHH.hpp>
Public Functions

egglib::EHH::
EHH
()¶ Constructor.

virtual
egglib::EHH::
~EHH
()¶ Destructor.

double
egglib::EHH::
dEHHc
(unsigned int haplotype)¶
const Get an EHHc decay value.

double
egglib::EHH::
dEHHG
()¶
const Get an EHHS (genotypes) decay value.

double
egglib::EHH::
dEHHS
()¶
const Get an EHHS decay value.

double
egglib::EHH::
EHHc
(unsigned int haplotype)¶
const Get an EHHc value.

double
egglib::EHH::
EHHG
()¶
const Get an EHHG value.

double
egglib::EHH::
EHHS
()¶
const Get an EHHS value.

bool
egglib::EHH::
flag_EHHG_done
()¶
const Tell if decay has been reached for EHHG.

bool
egglib::EHH::
flag_EHHS_done
()¶
const Tell if decay has been reached for EHHS.

double
egglib::EHH::
iEG
()¶
const Get an iEG value.

double
egglib::EHH::
iES
()¶
const Get an iES value.

double
egglib::EHH::
IHH
(unsigned int haplotype)¶
const Get an IHH value.

double
egglib::EHH::
IHHc
(unsigned int haplotype)¶
const Get an IHHc value.

double
egglib::EHH::
iHS
(unsigned int haplotype)¶
const Get an iHS value.

unsigned int
egglib::EHH::
K_core
()¶
const Number of used haplotypes of the core.

unsigned int
egglib::EHH::
K_cur
()¶
const Current number of haplotypes.

void
egglib::EHH::
load_distant
(const SiteHolder *site, double distance)¶ Load a distant site.
For each core haplotype, compute or update all statistics.
 Parameters
site
the distant site to be loaded. The method will only throw an exception if the number of samples differs.
distance
distance between the core and the distant site. The nature of the distance metric is up to the user but must be consistent over sites. Distances must be >= 0 and must correspond to the positions of distant sites relative to the core region or site. Sites must be loaded such that distances are always increasing.

unsigned int
egglib::EHH::
num_avail_core
(unsigned int)¶
const Current number of nonmissing samples for a core haplotype.

unsigned int
egglib::EHH::
num_avail_cur
(unsigned int)¶
const Current number of nonmissing samples for a current haplotype.

unsigned int
egglib::EHH::
num_avail_tot
()¶
const Current number of nonmissing samples.

unsigned int
egglib::EHH::
num_EHH_done
()¶
const Number of haplotypes for which computation of dEHH and IHH has been completed.

unsigned int
egglib::EHH::
num_EHHc_done
()¶
const Number of haplotypes for which computation of dEHHc and IHHc has been completed.

double
egglib::EHH::
rEHH
(unsigned int haplotype)¶
const Get an rEHH value.

void
egglib::EHH::
set_core
(const SiteHolder *site, bool genotypes, double EHH_thr, double EHHc_thr, double EHHS_thr, double EHHG_thr, unsigned int min_freq, unsigned int min_sam, bool crop)¶ Load the core site or region.
This method automatically resets the instance (it clears all previously computed data and reallocates arrays to proper sizes). The Site instance passed as core is only used by this method. All counters will be incremented until the next call to set_core() or the destruction of the object. All thresholds are understood as either EHH or EHHS values and therefore must lie between 0.0 and 1.0.
 Parameters
site
core site or region. If a region, haplotypes within the core region must have been identified previously and should be loaded as a Site instance. The site may contain missing data. The samples containing missing data at the core site will be ignored for all subsequently loaded distant sites.
genotypes
if true, consider that data are entered as unphased genotypes (the Site instance must have consistent data).
EHH_thr
threshold EHH value.
EHHc_thr
threshold EHHc value.
EHHS_thr
threshold EHHS value.
EHHG_thr
threshold EHHS (genotypes) value.
min_freq
minimal absolute frequency for haplotypes (haplotypes with lower frequencies are ignored). Required to be strictly larger than zero.
min_sam
minimal number of samples to continue computing (applied both within core haplotypes and for the total).
crop
if true, set values of EHHS that are below the threshold to 0 to emulate the behaviour of the R package rehh (also affects iES).

Helpers¶
Constants¶

const unsigned int
MAX_N_STIRLING
¶ Maximal n values for precomputed Stirling numbers.
Header: <egglibcpp/stirling.hpp>

const unsigned int
NUM_STIRLING
¶ Size of the Stirling numbers table.
Header: <egglibcpp/stirling.hpp>

const double
STIRLING_TABLE
[500499]¶ Array of log(S(n,k)) (Stirling numbers of the 1st kind)
The values must be accessed using the stirling_table() function.
Header: <egglibcpp/stirling.hpp>
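For reference, log values of this kind can be generated with the standard recurrence |s(n, k)| = (n - 1) |s(n - 1, k)| + |s(n - 1, k - 1)|, evaluated in log space to avoid overflow. The sketch below is only an illustration: it assumes unsigned Stirling numbers of the first kind (the logarithm requires positive values) and uses its own two-dimensional layout, whereas the layout and indexing of STIRLING_TABLE are not described here. log_add() and log_stirling1() are hypothetical names.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <vector>

// log(exp(a) + exp(b)) without overflow; -infinity stands for log(0).
double log_add(double a, double b) {
    const double ninf = -std::numeric_limits<double>::infinity();
    if (a == ninf) return b;
    if (b == ninf) return a;
    const double m = std::max(a, b);
    return m + std::log(std::exp(a - m) + std::exp(b - m));
}

// Returns t with t[n][k] = log |s(n, k)| for 0 <= k <= n <= max_n.
std::vector<std::vector<double>> log_stirling1(unsigned int max_n) {
    const double ninf = -std::numeric_limits<double>::infinity();
    std::vector<std::vector<double>> t(max_n + 1);
    t[0] = {0.0};  // |s(0, 0)| = 1
    for (unsigned int n = 1; n <= max_n; ++n) {
        t[n].assign(n + 1, ninf);  // |s(n, 0)| = 0 for n > 0
        for (unsigned int k = 1; k <= n; ++k) {
            // term (n - 1) * |s(n - 1, k)|, which vanishes when k == n
            const double stay = (k < n) ? std::log(n - 1.0) + t[n - 1][k] : ninf;
            // term |s(n - 1, k - 1)|
            const double grow = t[n - 1][k - 1];
            t[n][k] = log_add(stay, grow);
        }
    }
    return t;
}
```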