bsxplorer.Genome.gene_body

Genome.gene_body(min_length: int = 0, flank_length: int = 2000) → DataFrame

Filter annotation by type == gene and calculate positions of flanking regions.

Warning

This method returns an empty result if type is not specified in the input file.

Parameters:
  • min_length – Minimum region length, used as the filtering threshold.

  • flank_length – Length of the upstream and downstream flanking regions.

Return type:

polars.DataFrame for downstream usage.

Examples

>>> from bsxplorer import Genome
>>> path = "/path/to/genome.gff"
>>> genome = Genome.from_gff(path)
>>> genome.gene_body(min_length=2000, flank_length=2000)
shape: (14_644, 7)
┌─────────────┬────────┬────────┬────────┬──────────┬────────────┬────────────────┐
│ chr         ┆ strand ┆ start  ┆ end    ┆ upstream ┆ downstream ┆ id             │
│ ---         ┆ ---    ┆ ---    ┆ ---    ┆ ---      ┆ ---        ┆ ---            │
│ str         ┆ str    ┆ u64    ┆ u64    ┆ u64      ┆ u64        ┆ str            │
╞═════════════╪════════╪════════╪════════╪══════════╪════════════╪════════════════╡
│ NC_003070.9 ┆ +      ┆ 3631   ┆ 5899   ┆ 1631     ┆ 7899       ┆ gene-AT1G01010 │
│ …           ┆ …      ┆ …      ┆ …      ┆ …        ┆ …          ┆ …              │
│ NC_000932.1 ┆ +      ┆ 104691 ┆ 107500 ┆ 102691   ┆ 109500     ┆ gene-ArthCr087 │
│ NC_000932.1 ┆ +      ┆ 141485 ┆ 143708 ┆ 139485   ┆ 145708     ┆ gene-ArthCp086 │
└─────────────┴────────┴────────┴────────┴──────────┴────────────┴────────────────┘
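
The upstream and downstream columns in the output above are consistent with start - flank_length and end + flank_length. The following is a minimal pure-Python sketch of that logic, not the library's implementation. The function name gene_body_sketch, the dictionary record layout, the inclusive length comparison, and the clamping of upstream at 0 are all assumptions made for illustration. Only the first record is taken from the output above; the other two are made up.

```python
def gene_body_sketch(records, min_length=0, flank_length=2000):
    """Illustrative only; NOT bsxplorer's actual code.

    Keeps records whose type is "gene" and whose length is at least
    min_length, then adds flanking-region coordinates.
    """
    rows = []
    for rec in records:
        # Filter annotation by type == gene.
        if rec["type"] != "gene":
            continue
        # Assumption: the threshold is inclusive (length >= min_length).
        if rec["end"] - rec["start"] < min_length:
            continue
        rows.append({
            "chr": rec["chr"],
            "strand": rec["strand"],
            "start": rec["start"],
            "end": rec["end"],
            # Assumption: clamp at 0 so the coordinate stays non-negative.
            "upstream": max(rec["start"] - flank_length, 0),
            "downstream": rec["end"] + flank_length,
            "id": rec["id"],
        })
    return rows


records = [
    # Taken from the example output above.
    {"chr": "NC_003070.9", "type": "gene", "strand": "+",
     "start": 3631, "end": 5899, "id": "gene-AT1G01010"},
    # Hypothetical records, made up for this sketch.
    {"chr": "NC_003070.9", "type": "mRNA", "strand": "+",
     "start": 3631, "end": 5899, "id": "mrna-example"},
    {"chr": "NC_003070.9", "type": "gene", "strand": "-",
     "start": 100, "end": 600, "id": "gene-short-example"},
]

result = gene_body_sketch(records, min_length=2000, flank_length=2000)
print(len(result))                                     # 1
print(result[0]["upstream"], result[0]["downstream"])  # 1631 7899
```

With min_length=2000, the mRNA record is dropped by the type filter and the 500 bp gene by the length threshold, leaving the single row whose flanks match the first row of the output above.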