EggLib

Import/export utilities

ms format

egglib.io.to_ms(data, fname=None, positions=None, spacer=None, include_outgroup=False, recode=False)

Export data in the format used by the ms program (Hudson 2002 Bioinformatics 18: 337-338). The original format is designed for binary (0/1) allelic values. This implementation exports any allelic values present in the provided alignment, always as integers: sequence data, if included, are represented by their corresponding integer values. In addition, a space can be inserted between loci to accommodate allelic values outside the range [0,9], since such values make it impossible to tell loci apart in the standard format. See the spacer option.

Parameters:
  • data – Alignments to export, either as a Align instance or as an iterable of Align instances. In the latter case, all instances are exported consecutively.
  • fname – Name of the file to export data to. By default, the file is created (or overwritten if it already exists). If the option append is True, data is appended at the end of the file (and it must exist). If fname is None (default), no file is created and the formatted data is returned as a str. In the alternative case, nothing is returned.
  • positions – The list of site positions, with length matching the alignment length. Positions are required to be in the [0,1] range but the order is not checked. By default (if the argument value is None), sites are assumed to be evenly spread over the [0,1] interval. The structure of this argument must match exactly the structure of the argument data: if data is a single Align, positions should be a single list of positions, and if data is a list of Align instances (even if the length of this list is one), then positions should be a list (of lists of positions) of the same length. If a list is provided, any of its items (each representing a list of positions) can be replaced by None.
  • spacer – Define if a space must be inserted between each allelic value. If None, the space is inserted only if at least one allele at any locus is outside the range [0,9]. If True, the space is always inserted. If False, the space is not inserted. The automatic detection of out-of-range allelic values comes with the cost of increased running time.
  • include_outgroup – A boolean: if True, the outgroup is exported after the ingroup. Otherwise, the outgroup is skipped.
  • recode – If True, all allelic values are recoded so that the first encountered value is 0, the second is 1, and so on. The original alignments are left unmodified.

The format is as follows:

  • One line with two slashes.
  • One line with the number of sites.
  • One line with the positions, or an empty line if the number of sites is zero.
  • The matrix of genotypes (one line per sample), only if the number of sites is larger than zero.
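
The layout above can be sketched in plain Python. This is only an illustration, not EggLib's implementation: format_ms_block is a hypothetical helper, the "segsites:" and "positions:" prefixes follow the output of Hudson's ms program and are assumed here, and the rule used to spread default positions evenly over [0,1] is also an assumption.

```python
def format_ms_block(matrix, positions=None, spacer=None):
    """Format one alignment (a list of rows of integer alleles) as an ms-style block.

    Hypothetical helper for illustration; it does not call EggLib.
    """
    num_sites = len(matrix[0]) if matrix else 0
    if positions is None:
        # Assumption: sites evenly spread over [0, 1] (exact rule not stated in the source).
        if num_sites == 1:
            positions = [0.5]
        else:
            positions = [i / (num_sites - 1) for i in range(num_sites)]
    if spacer is None:
        # Automatic detection: a space is needed as soon as one allele is outside [0, 9].
        spacer = any(not 0 <= allele <= 9 for row in matrix for allele in row)
    separator = " " if spacer else ""
    lines = ["//", "segsites: %d" % num_sites]
    if num_sites == 0:
        lines.append("")  # empty positions line, and no genotype matrix
    else:
        lines.append("positions: " + " ".join("%.4f" % p for p in positions))
        lines.extend(separator.join(str(a) for a in row) for row in matrix)
    return "\n".join(lines) + "\n"


print(format_ms_block([[0, 1, 0], [1, 1, 0]]), end="")
```

With the alleles 12 and 3 in the input, the automatic detection switches the spacer on, which is the situation the spacer option is meant for.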

Fasta format

Here is the description of the fasta format used in EggLib:

  • Each sequence is preceded by a header limited to a single line and starting with a > character.
  • The header length is not limited and all characters are allowed but white spaces and special characters are discouraged. The header is terminated by a newline character.
  • Group labels are specified by a special markup system placed at the end of the header line. The labels are specified by an at sign (@) followed by any integer value (@0, @1, @2 and so on). Several group labels can be defined for any sequence. In that case, the integer values must be entered consecutively after the at sign, separated by commas, as in @1,3,2 for a sequence belonging to groups 1, 3 and 2 in three different grouping levels. Multiple grouping levels can be used to specify a hierarchical structure, but independent grouping structures can also be freely specified. The markup @# (at sign followed by a hash sign) specifies an outgroup sequence. The hash sign may be followed by a single integer to specify a unique group label. Multiple grouping levels are not allowed for the outgroup. The group labels of the ingroup and the outgroup are independent, so the same labels may be used. The at sign can be preceded by a single space. In that case, the parser automatically discards one space before the at sign (both >name@1 and >name @1 are read as name), but if there is more than one space, the additional spaces are considered to be part of the name. By default, no grouping structure is assumed and all sequences are assumed to be part of the ingroup.
  • The sequence itself continues on following lines until the next > character or the end of the file.
  • White spaces, tabs and carriage returns are allowed at any position. They are ignored, except for the newline that terminates the header line. There is no limitation in length and different sequences can have different lengths.
  • Character case is preserved and significant (although polymorphism analysis can be configured to treat characters differing only by case as synonyms).
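
The group label markup can be illustrated with a small header parser. This is a sketch written from the rules listed above, not the parser used by EggLib: parse_header is a hypothetical name, the header is assumed to be passed without its leading > character, and only non-negative integer labels are handled.

```python
import re

# One optional space, an at sign, then either "#" plus an optional integer
# (outgroup) or a comma-separated list of integers (ingroup), at end of line.
_LABEL_RE = re.compile(r" ?@(#(\d*)|\d+(?:,\d+)*)$")


def parse_header(header):
    """Return (name, ingroup_labels, is_outgroup, outgroup_label)."""
    match = _LABEL_RE.search(header)
    if match is None:
        # No markup: no grouping structure, sequence belongs to the ingroup.
        return header, [], False, None
    name = header[:match.start()]
    tag = match.group(1)
    if tag.startswith("#"):
        digits = match.group(2)
        return name, [], True, int(digits) if digits else None
    return name, [int(label) for label in tag.split(",")], False, None


print(parse_header("name @1,3,2"))
```

Only one space before the at sign is consumed by the pattern, so a header such as "name  @1" (two spaces) keeps one trailing space in the name, as described above.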
egglib.io.from_fasta(source, groups=False, string=False, cls=None)

Create a new instance of either Align or Container from data read from the file whose name is provided as argument (or, if string is set to True, from the string passed as first argument).

Parameters:
  • source – name of a fasta-formatted sequence file. If the string argument is True, read source as a fasta-formatted string. If the returned type is Align, the sequences are required to be aligned.
  • groups – boolean indicating whether group labels should be imported. If so, they are not actually required to be present for each (or any) sequence. If not, the labels are not interpreted and are considered to be part of the sequence name.
  • string – boolean indicating whether the first argument is an explicit fasta-formatted string (by default, it is taken as the name of a fasta-formatted file).
  • cls – type that should be generated. Possible values are: Align (then, data must be aligned), Container or None. In the latter case, an Align is returned if data are found to be aligned or if the data set is empty, and otherwise a Container is returned.
Returns:

A new Container or Align instance depending on the value of the cls option.
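
The rule applied when cls is None can be sketched as follows. This is an illustration only: choose_type is a hypothetical function, the type names are returned as strings rather than classes, and "aligned" is assumed to mean that all sequences have the same length.

```python
def choose_type(sequences, cls=None):
    """Return "Align" or "Container" following the rule described for cls."""
    aligned = len({len(seq) for seq in sequences}) <= 1  # an empty data set counts as aligned
    if cls == "Align" and not aligned:
        raise ValueError("Align requested but sequences differ in length")
    if cls is not None:
        return cls
    return "Align" if aligned else "Container"


print(choose_type(["ACGT", "AC-T"]))
```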

class egglib.io.fasta_iter(fname, groups=False)

Iterative sequence-by-sequence fasta parser. Return an object that can be iterated over:

for item in egglib.io.fasta_iter(fname):
    pass  # process item (a SampleView) here

fasta_iter objects support the with statement:

with egglib.io.fasta_iter(fname) as f:
    for item in f:
        pass  # process item (a SampleView) here

Each iteration yields a SampleView instance (which is valid only during the iteration round, see the warning below). It is also possible to iterate manually using next(). The number of groups is defined by the current sample (if the number of defined groups varies among samples, it is reset at each iteration).

Warning

The aim of this iterator is to iterate over large fasta files without storing all data in memory at the same time. The SampleView instances provided at each iteration are proxies to a local Container instance that is recycled at each iteration step. They should be used immediately and never stored as such. To keep data accessible through a SampleView, copy it to another data structure (typically using Container.add_sample()).

Parameters:
  • fname – name of a fasta-formatted file.
  • groups – if True, import group labels from sequence names (by default, they are considered as part of the name).

New in version 3.0.0.

next()

Perform an iteration round. Raise a StopIteration exception if the file is exhausted. Objects of this type are normally used with the for statement.
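
The sequence-by-sequence strategy can be illustrated with a plain Python generator that keeps a single record in memory at a time. This is a sketch of the general technique, not the fasta_iter implementation: iter_fasta is a hypothetical function that yields independent (name, sequence) tuples rather than recycled SampleView proxies, and it does not interpret group labels.

```python
import io


def iter_fasta(handle):
    """Yield (name, sequence) pairs one at a time from an open text handle."""
    name = None
    chunks = []
    for line in handle:
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)  # previous record is complete
            name = line[1:].rstrip("\r\n")
            chunks = []
        elif name is not None:
            # White space inside the sequence is ignored.
            chunks.append("".join(line.split()))
    if name is not None:
        yield name, "".join(chunks)


data = io.StringIO(">seq1\nACGT\nAC GT\n>seq2\n\nTT\n")
for name, sequence in iter_fasta(data):
    print(name, len(sequence))
```

Manual iteration works the same way as described for next(): calling next() on the generator returns one record, and StopIteration is raised once the input is exhausted.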

GFF3 format

class egglib.io.GFF3(source, from_string=False, liberal=False, only_genes=False)

Read General Feature Format (GFF)-formatted (version 3) genome annotation data from a file specified by name or from a provided string.

See the description of the GFF3 format at http://www.sequenceontology.org/gff3.shtml.

This class supports segmented features but only if they are consecutive in the file. All features are loaded into memory and can be processed interactively.

Parameters:
  • source – name of a GFF3-formatted data file, or a GFF3-formatted string if from_string is True.
  • from_string – if True, the first argument is a GFF3-formatted string; if False, the first argument is a file name.
  • liberal – if True, support some violations of the GFF3 format.
  • only_genes – if True, only index top-level features that have gene as type. This does not reduce memory usage, but it reduces access time.

The liberal argument enables support for a few violations of the canonical GFF3 format. The current list of violations is:

  • CDS features may lack a phase.

The list of supported violations may change in the future between minor versions of EggLib. It is recommended to use liberal=True only for exploratory analyses. Otherwise, it is better to fix the original file so that it complies with the format.

New in version 3.0.0.

feature_iter(seqid, start, end, feat_type=None)

Return an iterator over features, returned as GFF3Feature instances. Only top features complying with arguments are considered.

Parameters:
  • seqid – seqid identifier (only features associated with this seqid are yielded).
  • start – start position on the considered seqid (only features whose start position is >= this value are yielded).
  • end – end position on the considered seqid (only features whose end position is <= this value are yielded). One can use None to process features until the end of the seqid.
  • feat_type – process only features whose type is equal to this value. By default, process all features within range.

If seqid is not valid for this data file, raise a ValueError.
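
The selection rule described for feature_iter() can be sketched on plain records. This is an illustration under stated assumptions, not the method's implementation: Feature is a hypothetical namedtuple standing in for GFF3Feature, and iter_features is a hypothetical function working on a list of top-level features.

```python
from collections import namedtuple

# Hypothetical stand-in for a top-level feature.
Feature = namedtuple("Feature", "seqid type start end")


def iter_features(features, seqid, start, end, feat_type=None):
    """Return an iterator over the features matching the arguments."""
    if all(feat.seqid != seqid for feat in features):
        raise ValueError("invalid seqid: %s" % seqid)
    return (
        feat for feat in features
        if feat.seqid == seqid
        and feat.start >= start
        and (end is None or feat.end <= end)          # None: until end of seqid
        and (feat_type is None or feat.type == feat_type)
    )


features = [
    Feature("chr1", "gene", 100, 900),
    Feature("chr1", "gene", 1500, 2500),
    Feature("chr1", "repeat", 1600, 1700),
    Feature("chr2", "gene", 10, 50),
]
for feat in iter_features(features, "chr1", 1000, None, feat_type="gene"):
    print(feat)
```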

metadata

Metadata of the imported file (information present in the file header). Metadata are available as a list of (key, value) tuples. It is not possible to replace the list by another object, but it is allowed to modify the returned object.

num_top_features

Number of features that are directly accessible (that is, those that have no parent).

num_tot_features

Total number of features of the input file, including lower-level features and features that have not been indexed, if any.

seqid

All seqid values present in the imported file, as a frozenset instance.

types

Number of top features of each of the encountered types. This does not include lower-level features (that is, those that have a parent). Value is a dict (there is no good reason to modify it).

class egglib.io.GFF3Feature

Provide information related to a given feature. Currently, instances cannot be created by the user and are read-only. Data are available as read-only properties; some of these are lists and can be modified but it does not make any sense to do so.

ID

Value of the ID attribute, or None if this attribute was not defined.

aliases

List of Alias attributes.

attributes

List of non-predefined attributes, as (key, items) tuples, where items is itself a list.

dbxref

List of Dbxref attributes.

derives_from

Value of the Derives_from attribute, or None if this attribute was not defined.

end

End position.

gap

Value of the Gap attribute, or None if this attribute was not defined.

get_parent(idx)

Get a parent, as a GFF3Feature instance. The index should be within range.

get_part(idx)

Get a part (descending feature), as a GFF3Feature instance. The index should be within range.

is_circular

Value of the Is_circular attribute, as a boolean.

name

Value of the Name attribute, or None if this attribute was not defined.

notes

List of Note attributes.

num_fragments

Number of fragments.

num_parents

Number of parents.

num_parts

Number of parts (descending features).

ontology_terms

List of Ontology_term attributes.

phase

Value of the Phase attribute. Possible values are: 0, 1, 2 and None (if undefined).

positions

List of start/end positions of all fragments.

score

Value of the Score attribute, or None if this attribute was not defined.

seqid

Value of seqid for this feature.

source

Source of this feature.

start

Start position.

target

Value of the Target attribute, or None if this attribute was not defined. The target value is not processed and is provided as a single string.

type

Type of this feature.
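
Most of the properties above correspond to the nine tab-separated columns of a GFF3 line (seqid, source, type, start, end, score, strand, phase, attributes). The sketch below splits one line into such fields. It is an independent illustration based on the GFF3 specification, not EggLib code: parse_gff3_line is a hypothetical function and it returns a plain dict rather than a GFF3Feature.

```python
from urllib.parse import unquote


def parse_gff3_line(line):
    """Split one GFF3 feature line into a dict of typed fields."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 9:
        raise ValueError("a GFF3 feature line must have 9 columns")
    seqid, source, type_, start, end, score, strand, phase, attrs = columns
    attributes = {}
    if attrs != ".":
        for item in attrs.split(";"):
            if item:
                key, _, value = item.partition("=")
                # Multiple values are comma-separated and percent-encoded.
                attributes[key] = [unquote(v) for v in value.split(",")]
    return {
        "seqid": seqid,
        "source": source,
        "type": type_,
        "start": int(start),
        "end": int(end),
        "score": None if score == "." else float(score),
        "strand": strand,
        "phase": None if phase == "." else int(phase),
        "attributes": attributes,
    }


feature = parse_gff3_line("chr1\tsrc\tCDS\t100\t200\t.\t+\t0\tID=cds1;Parent=mrna1,mrna2\n")
print(feature["type"], feature["phase"], feature["attributes"]["Parent"])
```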

VCF format

class egglib.io.VcfParser(fname, allow_X=False, allow_gap=False)

Read Variant Call Format (VCF)-formatted data for genomic polymorphism information from a file specified by name or from strings.

Parameters:
  • fname – name of a properly formatted VCF file. The header section will be processed upon instance creation, and lines will be read later, when the user iterates over the instance (or calls next()).
  • allow_X – if True, the characters X and x can be used instead of a base in alternate alleles. This is not allowed in the VCF specification but some software has actually used it. If X is allowed and one is found, the alternate type will be set to an ad hoc type and the corresponding allele string will be X (regardless of the original case).
  • allow_gap – if True, the gap symbol - is accepted as a valid base for the specification of both reference and alternate alleles. This is not allowed in the VCF specification, which follows a different convention to represent insertions and deletions.

See the description of the VCF format.

There are two ways to process VCF data. The first is to parse a static file iteratively: create the parser with the standard constructor VcfParser(fname), then iterate over lines in a for loop (alternatively, one can call VcfParser.next() directly). The second is to create the parser with the class factory method VcfParser.from_header(string), then feed each line manually as a string using VcfParser.read_line(string).

VcfParser instances are iterable (with support for the for statement and the next() method) only if they are created with a file to process. Otherwise they must be fed line by line with the read_line() method. Every iteration of a for loop, and every call to next() or read_line(), yields a (chromosome, position, num_all) tuple that allows the user to determine whether the variant is of interest. If so, the VcfParser object provides methods to extract all data for this variant (which can be time-consuming and should be restricted to pre-filtered lines to improve efficiency).
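
The pre-filtering idea (look at a cheap summary first, extract everything only for variants of interest) can be sketched without EggLib. The example below is an assumption-laden illustration, not the parser's code: peek_variant is a hypothetical function, it reads only the first five tab-separated VCF columns, and it reports the position exactly as written in the file, which may differ from the convention used by VcfParser.

```python
def peek_variant(line):
    """Return (chromosome, position, num_alleles) from one VCF data line."""
    # Only the first five columns are needed: CHROM, POS, ID, REF, ALT.
    chrom, pos, _id, _ref, alt = line.rstrip("\n").split("\t", 5)[:5]
    num_alt = 0 if alt == "." else len(alt.split(","))
    return chrom, int(pos), 1 + num_alt  # the reference counts as one allele


lines = [
    "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3\n",
    "20\t1110696\t.\tA\tG,T\t67\tPASS\t.\n",
]
# Keep only biallelic variants before doing any expensive processing.
biallelic = [line for line in lines if peek_variant(line)[2] == 2]
print(len(biallelic))
```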

New in version 3.0.0.

file_format

File format present in the read header.

classmethod from_header(string, allow_X=False, allow_gap=False)

Create and return a new VcfParser instance reading the header passed as the string argument.

Parameters:
  • string – single string including system-consistent line endings, the first line being the file format specification and the last line being the header line (starting with #CHROM). This function allows leading and trailing white spaces (spaces, tabs, empty lines).
  • allow_X – see class description.
  • allow_gap – see class description.
get_alt(idx)

Get data for a given ALT field defined in the VCF header. The passed index must be smaller than num_alt. Return a dict containing the following data:

  • id: ID string.
  • description: description string.
  • extra: all extra qualifiers, presented as a list of (key, value) tuples.
get_filter(idx)

Get data for a given FILTER field defined in the VCF header. The passed index must be smaller than num_filter. Return a dict containing the following data:

  • id: ID string.
  • description: description string.
  • extra: all extra qualifiers, presented as a list of (key, value) tuples.
get_format(idx)

Get data for a given FORMAT field defined in the VCF header. The passed index must be smaller than num_format. Return a dict containing the following data:

  • id: ID string.
  • type: one of "Integer", "Float", "Character", and "String".
  • description: description string.
  • number: expected number of items. Special values are None (if undefined), "NUM_GENOTYPES" (number matching the number of genotypes for any particular variant), "NUM_ALTERNATE" (number matching the number of alternate alleles for any particular variant), and "NUM_ALLELES" (number matching the number of alleles–including the reference–for any particular variant).
  • extra: all extra qualifiers, presented as a list of (key, value) tuples.
get_genotypes([parser1, [parser2, ]]..., get_genotypes=False, dest=None)

Process genotype data loaded into one or more VcfParser instances and return them as a single Site instance.

There are two ways to use this method:

  1. As an instance method, to process a single parser, as in: parser.get_genotypes() (if parser is a VcfParser instance).
  2. As a class method, to process several parsers, as in: VcfParser.get_genotypes(parser1, parser2, parser3) (where parser1, parser2 and parser3 are three VcfParser instances).

This method requires that all parsers have been used to process valid data. If the AA field is present in the processed parsers, its value is imported as outgroup. If get_genotypes is True, the ancestral genotype is assumed to be homozygous for the given ancestral allele. If get_genotypes is False, the ancestral allele is loaded only once in the outgroup.

Parameters:
  • parser – a VcfParser instance (only required if used as a class method). This argument can be repeated several times, but can not be passed as a keyword argument.
  • get_genotypes – if True, use genotypic data rather than allelic data.
  • dest – if specified, it must be a Site instance that will be recycled and used to place results.
Returns:

A Site instance by default, or None if dest was specified.

get_info(idx)

Get data for a given INFO field defined in the VCF header. The passed index must be smaller than num_info. Return a dict containing the following data:

  • id: ID string.
  • type: one of "Integer", "Float", "Flag", "Character", and "String".
  • description: description string.
  • number: expected number of items. Special values are None (if undefined), "NUM_GENOTYPES" (number matching the number of genotypes for any particular variant), "NUM_ALTERNATE" (number matching the number of alternate alleles for any particular variant), and "NUM_ALLELES" (number matching the number of alleles–including the reference–for any particular variant).
  • extra: all extra qualifiers, presented as a list of (key, value) tuples.
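
The dict described above can be related to the header line it comes from. The sketch below parses a ##INFO meta-information line into the same keys. It is an illustration, not EggLib's parser: parse_info_header is a hypothetical function, and the mapping of the VCF Number codes ".", "G", "A" and "R" to None, "NUM_GENOTYPES", "NUM_ALTERNATE" and "NUM_ALLELES" is inferred from the VCF specification and the descriptions above.

```python
import re

_NUMBER_CODES = {".": None, "G": "NUM_GENOTYPES", "A": "NUM_ALTERNATE", "R": "NUM_ALLELES"}
# key=value pairs; a value is either a quoted string or runs up to the next comma.
_PAIR_RE = re.compile(r'(\w+)=("[^"]*"|[^,]*)')


def parse_info_header(line):
    """Parse a '##INFO=<...>' header line into a dict."""
    line = line.strip()
    prefix = "##INFO=<"
    if not (line.startswith(prefix) and line.endswith(">")):
        raise ValueError("not an INFO header line")
    result = {"extra": []}
    for key, value in _PAIR_RE.findall(line[len(prefix):-1]):
        value = value.strip('"')
        if key == "ID":
            result["id"] = value
        elif key == "Type":
            result["type"] = value
        elif key == "Description":
            result["description"] = value
        elif key == "Number":
            result["number"] = _NUMBER_CODES[value] if value in _NUMBER_CODES else int(value)
        else:
            result["extra"].append((key, value))  # any other qualifier
    return result


print(parse_info_header('##INFO=<ID=AF,Number=A,Type=Float,Description="Allele frequency, per ALT">'))
```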
get_meta(idx)

Get data for a given META field defined in the VCF header. The passed index must be smaller than num_meta. Return a tuple containing the key and the value of the META field.

get_sample(idx)

Get the name of a sample read from the header. The passed index must be smaller than num_samples.

last_variant()

Return a Variant instance containing all data available for the last variant processed by this instance. It is required that a variant has been effectively processed.

next()

Read one variant. Raise a StopIteration exception if no data is available.

Returns: The same as an iteration loop (see class description).
num_alt

Number of defined ALT fields.

num_filter

Number of defined FILTER fields.

num_format

Number of defined FORMAT fields.

num_info

Number of defined INFO fields.

num_meta

Number of defined META fields.

num_samples

Number of samples read from header.

read_line(string)

Read one variant from a user-provided single line. The string should contain a single line of VCF-formatted data (no header). All field specifications and sample information should be consistent with the information contained in the header that was provided to this instance at creation time (whether it was read from a file or provided as a string).

Returns: The same as an iteration loop (see class description).
class egglib.io.Variant

Represent a single variant (one line from a VCF-formatted data file). Users cannot create instances of this class directly (instances are generated by VcfParser) and instances are not modifiable in principle (however, some attributes provide mutable objects, as mentioned).

Note

The AA (ancestral allele), AN (allele number), AC (allele count), and AF (allele frequency) INFO fields as well as the GT (deduced genotype) FORMAT field are automatically extracted if they are present in the file and if their definition matches the format specification (meaning that they were not re-defined with a different number/type) in the header. If present, they are available through the dedicated attributes AN, AA, AC, AF, GT, GT_ploidy and GT_phased. However, they are still available in the respective info and samples (sub-)dictionaries.

AA

Value of the AA info field (None if missing).

AC

Value of the AC info field, as a tuple (None if missing).

AF

Value of the AF info field, as a tuple (None if missing).

AN

Value of the AN info field (None if missing).

GT

Genotypes from GT fields (only if this format field is available), provided as a tuple of sub-tuples. The number of sub-tuples is equal to the number of samples (num_samples). The number of items within each sub-tuple is equal to the ploidy (GT_ploidy). These items are allele expressions (as found in alleles), or None (for missing values). This attribute is None if GT is not available.

GT_phased

Boolean indicating whether the genotype for each sample is phased (None if GT is not available).

GT_ploidy

Ploidy among genotypes (None if GT is not available).
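
The relation between a raw GT string and the GT and GT_phased attributes can be sketched as follows. This is an illustration based on the VCF convention (allele indexes separated by / for unphased or | for phased genotypes, with . for a missing allele), not EggLib's code: parse_gt is a hypothetical function, and reporting a haploid call as unphased is an arbitrary choice made here.

```python
import re


def parse_gt(gt, alleles):
    """Return (genotype, phased) for one sample's GT string.

    alleles is the tuple of allele expressions, reference first.
    """
    indexes = re.split(r"[/|]", gt)
    genotype = tuple(None if i == "." else alleles[int(i)] for i in indexes)
    phased = "|" in gt and "/" not in gt
    return genotype, phased


alleles = ("G", "A")
print(parse_gt("0|1", alleles))
print(parse_gt("./.", alleles))
```

The length of each returned genotype tuple corresponds to the ploidy.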

ID

Tuple containing all IDs (even if just one or none).

alleles

Variant alleles (the first is the reference and is not guaranteed to be present in samples), as a tuple.

alt_type_breakend = 3

Alternate allele symbolizing a breakend (see VCF description for more details).

alt_type_default = 0

Explicit alternate allele (the string represents the nucleotide sequence of the allele).

alt_type_referred = 2

Alternate allele referring to a pre-defined allele (the string provides the ID of the allele).

alternate_types

Alternate alleles types, as a tuple. One value is provided for each alternate allele. The provided values are integers whose values should always be compared to class attributes alt_type_default, alt_type_referred and alt_type_breakend, as in (for the type of the first alternate allele):

type_ = variant.alternate_types[0]
if type_ == variant.alt_type_default:
    allele = variant.alleles[1]  # alleles[0] is the reference
chromosome

Chromosome name (None if missing).

failed_tests

Names of the filters that this variant failed, as a tuple (None if no filters were applied).

format_fields

IDs of the FORMAT fields available for each sample, as a frozenset (empty if no sample data is available).

info

Dictionary of INFO fields for this variant. Keys are the IDs of the INFO fields available for this variant, and values are always a tuple of items. For flag INFO types, the value is always an empty tuple.

Note

This dict is mutable, which enables the user to modify the data contained in the instance. Note that this will modify the data contained in this Variant instance, although not in the related VcfParser instance.
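
The shape of the info dictionary (tuples in all cases, empty tuple for flags) can be illustrated by parsing the INFO column of a VCF line. This is a sketch only: parse_info_column is a hypothetical function, and it keeps all items as strings, whereas the actual attribute presumably holds values converted according to the type declared in the header.

```python
def parse_info_column(text):
    """Turn 'NS=3;AF=0.5,0.2;DB' into a dict of tuples (flags map to ())."""
    info = {}
    if text == ".":
        return info  # no INFO data for this variant
    for item in text.split(";"):
        key, sep, value = item.partition("=")
        info[key] = tuple(value.split(",")) if sep else ()
    return info


print(parse_info_column("NS=3;DP=14;AF=0.5,0.2;DB"))
```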

num_alleles

Number of alleles (including the reference in all cases).

num_alternate

Number of alternate alleles. Equal to num_alleles minus 1.

num_samples

Number of samples (equivalent to len(Variant.samples)).

position

Position (None if missing).

quality

Variant quality (None if missing).

samples

List of information available for each sample (empty list if no samples are defined). The list contains one dict for each sample: the keys of these dictionaries are FORMAT field IDs (the keys are always the same as the content of format_fields), and their values are tuples in all cases.

Note

This list and the dict instances it contains are all mutable, which enables the user to modify the data contained in the instance. Note that this will modify the data contained in this Variant instance, although not in the related VcfParser instance.

Legacy parsers

egglib.io.from_clustal(string)

Imports a clustal-formatted alignment. The input format is the one generated and used by CLUSTALW (see http://web.mit.edu/meme_v4.9.0/doc/clustalw-format.html).

Parameters: string – input clustal-formatted sequence alignment.
Returns: A new Align instance.

Changed in version 3.0.0: Renamed (previous name was aln2fas()). Input argument is a string rather than a file.

egglib.io.from_staden(string, delete_consensus=True)

Import the output file of the GAP4 program of the Staden package.

The input file should have been generated from a contig alignment by the GAP4 contig editor, using the command “dump contig to file”. The sequence named CONSENSUS, if present, is automatically removed unless the option delete_consensus is False.

Staden’s default convention is followed:

  • - codes for an unknown base and is replaced by N.
  • * codes for an alignment gap and is replaced by -.
  • . represents the same base as the consensus at that position.
  • White space represents missing data and is replaced by ?.
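
The character conventions above can be sketched as a small translation function. This is an illustration of the listed rules, not the from_staden() implementation: translate_staden_row is a hypothetical function, and replacing . by the translated consensus character is an assumption about how the two rules combine.

```python
_CODES = {"-": "N", "*": "-", " ": "?"}


def translate_staden_row(row, consensus):
    """Translate one Staden alignment row, given the consensus row."""
    translated = []
    for base, ref in zip(row, consensus):
        if base == ".":
            base = ref  # same as the consensus at that position
        translated.append(_CODES.get(base, base))
    return "".join(translated)


print(translate_staden_row("A.-* ", "ACGTA"))
```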

New in version 2.0.1: Add argument delete_consensus.

Changed in version 2.1.0: Read from string or fname.

Changed in version 3.0.0: Renamed to from_staden(). Only string input is supported now.

egglib.io.from_genalys(string)

Import a Genalys-formatted sequence alignment. This function imports data generated through the option Save SNPs of Genalys 2.8.

Parameters: string – input data as a Genalys-formatted string.
Returns: An Align instance.

Changed in version 3.0.0: Renamed to from_genalys(). Only string input is supported now.

egglib.io.get_fgenesh(string, locus='locus')

Imports fgenesh output.

Parameters:
  • string – a string containing fgenesh output.
  • locus – locus name.
Returns: A list of gene and CDS features represented by dictionaries. Note that 5’ partial features might not be in the appropriate frame and that it can be necessary to add a codon_start qualifier.

Changed in version 3.0.0: Input as string. Added locus argument.

class egglib.io.GenBank(fname=None, string=None)

This class represents a GenBank-formatted DNA sequence record.

Parameters:
  • fname – input file name.
  • string – GenBank-formatted string.

Only one of the two arguments fname and string can be non-None. If both are None, the constructor generates an empty instance with sequence of length 0. If fname is non-None, a GenBank record is read from the file with this name. If string is non-None, a GenBank record is read directly from this string. The following variables are read from the parsed input if present: accession, definition, title, version, GI, keywords, source, references (which is a list), locus and others. Their default value is None except for references and others for which default is an empty list. source is a (description, species, taxonomy) tuple. Each of references is a (header, raw reference) tuple and each of others is a (key, raw) tuple.

In addition to methods documented below, the following operations are supported for gb if it is a GenBank instance:

Expression       Action
len(gb)          Length of the sequence attached to this record
str(gb)          GenBank representation of the record
for feat in gb   Iterate over GenBankFeature instances of this record
add_feature(feature)

Add a feature to the instance. The argument feature must be a well-formed GenBankFeature instance.

extract(from_pos, to_pos)

Return a new GenBank instance representing a subset of the current instance, from position from_pos to to_pos. All (and only) features that are completely included in the specified range are exported.

number_of_features()

Give the number of features contained in the instance.

rc()

Reverse-complement the instance (in place). All feature positions and the sequence will be reversed and applied to the complementary strand. The features will be sorted by increasing start position (after reversing). This method should be applied only to genuine nucleotide sequences.

sequence

Sequence string (can be modified). Note that changing the record’s string might make the features obsolete (meaning that setting an invalid sequence might cause the features to point to incorrect or out-of-bounds regions of the sequence).

write(fname)

Create a file named fname and write the formatted record to it.

write_stream(stream)

Write the content of the instance as a GenBank-formatted string to the passed file (or file-compatible) stream.

class egglib.io.GenBankFeature(parent)

Instances of this class represent features associated with a GenBank instance. They should not be instantiated or used separately from a GenBank instance. The constructor creates an empty instance (although a GenBank instance must be passed as parent) and either update() or parse() must be used subsequently.

Parameters: parent – a GenBank instance to which the feature should be attached.

In addition to methods documented below, the following operations are supported for feat if it is a GenBankFeature instance:

Expression   Action
str(feat)    GenBank representation of the feature
add_qualifier(key, value)

Add a qualifier to the instance’s qualifiers.

copy(genbank)

Return a copy of the current instance, connected to the GenBank instance genbank.

get_sequence()

Return the string corresponding to this feature. If the positions pass beyond the end of the parent’s sequence, a RuntimeError (and not an IndexError) is raised.

get_start()

First position of the first (or unique) segment, in such a way that get_start() is always smaller than get_stop().

get_stop()

Last position of the last (or unique) segment, in such a way that get_start() is always smaller than get_stop().

get_type()

Return the type string of the instance.

parse(string)

Update feature information from information read in a GenBank-formatted string.

qualifiers()

Return a dictionary with all qualifier values. This method cannot be used to change data within the instance: changes made to the returned dictionary do not affect the data contained in the instance.

Changed in version 2.1.0: Meaning changed.

rc(length=None)

Reverse-complement the feature: apply it to the complement strand and reverse positions counting from the end. The length argument specifies the length of the complete sequence and is usually not required.

shift(offset)

Shift all positions according to the (positive or negative) argument.

update(feat_type, location, **qualifiers)

Update feature information.

Parameters:
  • feat_type – a string identifying the feature type (such as "gene", "CDS", "misc_feature", etc.). All strings are accepted.
  • location – a GenBankFeatureLocation instance giving the feature’s location.
  • qualifiers – other qualifiers must be passed as keyword arguments. It is not allowed to use "type" as a qualifier keyword.
class egglib.io.GenBankFeatureLocation(string=None)

Hold the location of a GenBank feature. Supports various forms of location as defined in the GenBank format specification. The constructor contains a parser working from a GenBank-formatted string. By default, features are on the forward strand and segmented features are ranges (not orders).

In addition to methods documented below, the following operations are supported for loc if it is a GenBankFeatureLocation instance:

Expression                 Action
len(loc)                   Number of segments
loc[index]                 Return the (first, last) tuple for the corresponding segment
for (first, last) in loc   Iterate over segments
str(loc)                   Generate a GenBank representation

GenBankFeatureLocation supports iteration over (first, last) segments regardless of their types: for a single-base segment at a given position, the tuple (position, position) is returned, and similar 2-item tuples are returned for other types of segment as well.
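
The (first, last) view of a location can be illustrated with a minimal parser for GenBank location strings. This is a simplified sketch, not the parser contained in the constructor: parse_location is a hypothetical function, it treats join(...) as a range and order(...) as an order, it ignores partial markers (< and >), and it does not handle nested or remote locations.

```python
import re


def parse_location(text):
    """Return (segments, is_complement, is_range) for a simple location string."""
    text = text.strip()
    complement = text.startswith("complement(") and text.endswith(")")
    if complement:
        text = text[len("complement("):-1]
    is_range = True  # default: segmented features are ranges
    for keyword in ("join", "order"):
        if text.startswith(keyword + "(") and text.endswith(")"):
            is_range = keyword == "join"
            text = text[len(keyword) + 1:-1]
            break
    segments = []
    for item in text.split(","):
        numbers = [int(n) for n in re.findall(r"\d+", item)]
        if not numbers:
            raise ValueError("invalid segment: %r" % item)
        # A single base yields (position, position); other forms yield (first, last).
        segments.append((numbers[0], numbers[-1]))
    return segments, complement, is_range


print(parse_location("complement(join(12..78,134..202))"))
```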

add_base_choice(first, last, left_partial=False, right_partial=False)

Add a segment corresponding to a single base chosen within a base range. If no segments were entered previously, set the unique segment location. first and last must be integers. The feature will be set between the first and last positions, including both limits. If the feature is intended to be placed on the complement strand between positions, say, 1127 and 1482, one must use add_base_choice(1127,1482) in combination with set_complement(). All entered positions must be larger than any positions entered previously and last must be strictly larger than first. left_partial and/or right_partial must be set to True if, respectively, the real start of the segment lies 5’ of first and/or the real end of the segment lies beyond last (relative to the forward strand and consistently with the numbering system).

add_base_range(first, last, left_partial=False, right_partial=False)

Add a base range to the feature. If no segments were entered previously, set the unique segment location. first and last must be integers. The feature will be set between the first and last positions, including both limits. If the feature is intended to be placed on the complement strand between positions, say, 1127 and 1482, one must use add_base_range(1127,1482) in combination with set_complement(). All entered positions must be larger than any positions entered previously and last must be larger than or equal to first. left_partial and/or right_partial must be set to True if, respectively, the real start of the segment lies 5’ of first and/or the real end of the segment lies beyond last (relative to the forward strand and consistently with the numbering system).

add_between_base(position)

Add a segment lying between two consecutive bases. If no segments were entered previously, set the unique segment location. position must be an integer. The feature will be set between position and position + 1. If the feature is intended to be placed on the complement strand between positions, say, 1127 and 1128, one must use add_between_base(1127) in combination with set_complement(). All entered positions must be larger than any positions entered previously.

add_single_base(position)

Add a single-base segment to the feature. If no segments were entered previously, set the unique segment location. position must be an integer. All entered positions must be larger than any positions entered previously.

as_order()

Define the feature as an order instead of a range.

as_range()

Define the feature as a range, which is the default.

copy()

Return a deep copy of the current instance.

is_complement()

True if the feature is on the complement strand.

is_range()

True if the feature is a range (the default), False if it is an order.

rc(length)

Reverse the feature positions: positions are modified to be counted from the end. The length of the complete sequence must be passed.
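
Counting positions from the other end is simple arithmetic, sketched below. This is an illustration under stated assumptions rather than the method's code: reverse_segments is a hypothetical function, positions are assumed to be 1-based and inclusive as in the GenBank format (EggLib's internal convention may differ), and the segments are re-sorted so that they remain in increasing order.

```python
def reverse_segments(segments, length):
    """Recount 1-based inclusive (first, last) segments from the end of the sequence."""
    flipped = [(length - last + 1, length - first + 1) for first, last in segments]
    return sorted(flipped)


# Segments 12..78 and 134..202 on a sequence of length 300.
print(reverse_segments([(12, 78), (134, 202)], 300))
```

Applying the function twice with the same length returns the original segments, which is a quick way to check the arithmetic.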

set_complement()

Place the feature on the complement strand.

set_forward()

Place the feature on the forward (not complement) strand, which is the default.

shift(offset)

Shift all positions according to the (positive or negative) argument.