Basic usage

Note

Ensure that your simulated tree sequence follows the guidelines mentioned in Simulation setup.

Here’s a sample tree sequence simulated with msprime. Note the census time at 100.01:

import msprime

pop_size = 500
sequence_length = 1e7
seed = 98765
rho = 3e-8

# Make the Demography object.
demography = msprime.Demography()
demography.add_population(name="RED", initial_size=pop_size)
demography.add_population(name="BLUE", initial_size=pop_size)
demography.add_population(name="ADMIX", initial_size=pop_size)
demography.add_population(name="ANC", initial_size=pop_size)
demography.add_admixture(
    time=100, derived="ADMIX", ancestral=["RED", "BLUE"], proportions=[0.5, 0.5]
)
demography.add_census(time=100.01) # Census is here!
demography.add_population_split(
    time=1000, derived=["RED", "BLUE"], ancestral="ANC"
)

# Simulate.
ts = msprime.sim_ancestry(
    samples={"RED": 0, "BLUE": 0, "ADMIX" : 2},
    demography=demography,
    random_seed=seed,
    sequence_length=sequence_length,
    recombination_rate=rho
)

Apply tspop.get_pop_ancestry() to get a tspop.PopAncestry object.

import tspop

pa = tspop.get_pop_ancestry(ts, census_time=100.01)

Use print to see a summary of the information held within the object.

print(pa)

> PopAncestry summary
>
> Number of ancestral populations:    2
> Number of sample chromosomes:       4
> Number of ancestors:          118
> Total length of genomes:      40000000.000000
> Ancestral coverage:        40000000.000000

The ancestral information itself is inside two tables. The tspop.PopAncestry.squashed_table shows tracts of ancestry:

st = pa.squashed_table
print(st)

>     sample       left       right  population
> 0        0        0.0    419848.0           0
> 1        0   419848.0    483009.0           1
> 2        0   483009.0   1475765.0           0
> 3        0  1475765.0   2427904.0           1
> 4        0  2427904.0   3635390.0           0
> ..      ...        ...         ...       ...         ...
> 55       3  7369409.0   7596783.0           1
> 56       3  7596783.0   8289015.0           0
> 57       3  8289015.0   8918727.0           1
> 58       3  8918727.0  10000000.0           0

The tspop.PopAncestry.ancestry_table shows a superset of this information: tracts of ancestry, and the ancestor at the census time who contributed each tract. Each row of the squashed table above can be obtained by ‘gluing together’ rows of the ancestry table.

at = pa.ancestry_table
print(at)

>      sample       left       right  ancestor  population
> 0         0        0.0     33027.0        74           0
> 1         0    33027.0    155453.0        33           0
> 2         0   155453.0    290542.0        46           0
> 3         0   290542.0    419848.0        18           0
> 4         0   419848.0    483009.0        83           1
> ..      ...        ...         ...       ...         ...
> 133       3  8672850.0   8849756.0        95           1
> 134       3  8849756.0   8918727.0       131           1
> 135       3  8918727.0   9165035.0        44           0
> 136       3  9165035.0   9176562.0        47           0
> 137       3  9176562.0  10000000.0        58           0

Both the tspop.PopAncestry.squashed_table and the tspop.PopAncestry.ancestry_table are pandas dataframes, so can be analysed using standard operations.

Example: calculating global ancestry

For instance, we could get the sum of all regions inherited from an ancestor in population 0 like this. We’ll first subset the tspop.PopAncestry.squashed_table to only those tracts inherited from an ancestor in population 0:

st0 = st[st.population == 0]
print(st0)

>     sample       left       right  population
> 0        0        0.0    419848.0           0
> 2        0   483009.0   1475765.0           0
> 4        0  2427904.0   3635390.0           0
> 6        0  4606954.0   6277367.0           0
> ..      ...        ...         ...       ...         ...
> 52       3  7043989.0   7134130.0           0
> 54       3  7362300.0   7369409.0           0
> 56       3  7596783.0   8289015.0           0
> 58       3  8918727.0  10000000.0           0

By summing the tract lengths in the rows, we get the length of the tracts from population 0:

pop0_lengths = sum(st0.right - st0.left)
print(pop0_lengths)

> 23278398.0

Dividing this by the sum of the genomic lengths in the tspop.PopAncestry object gives the proportion of the genomes that were inherited from individuals in population 0, with reference to the ancestors present at the census time:

print(pop0_lengths/pa.total_genome_length)

> 0.58195995