Universal I/O

Reference

UniversalReader

Class for batched reading of methylation reports.

UniversalReplicatesReader

Class for reading methylation reports from replicates.

UniversalWriter

Class for writing reports in a specific methylation report format.

UniversalBatch

Class for storing and converting methylation report data.

Reading and writing

To read methylation reports of different types, BSXplorer offers UniversalReader and UniversalReplicatesReader. Both let the user iterate over methylation report data batch by batch in a fast and convenient way. UniversalReplicatesReader merges methylation data from several reports of biological replicates and returns the merged data.

import bsxplorer as bsx

# For single methylation report
reader = bsx.UniversalReader("path/to/file.txt", report_type="bismark", use_threads=True)

for batch in reader:
    # Note that the returned batch is an instance of UniversalBatch
    do_something(batch)

# To read replicates, first initialize a reader for each report
reader1 = bsx.UniversalReader("path/to/file1.txt", report_type="bismark", use_threads=True)
reader2 = bsx.UniversalReader("path/to/file2.txt", report_type="bismark", use_threads=True)

# Then initialize UniversalReplicatesReader with them
for batch in bsx.UniversalReplicatesReader([reader1, reader2]):
    do_something(batch)
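
The source does not describe how replicates are merged. As an illustration only, the sketch below assumes that merging means summing the methylated and total read counts of each cytosine across replicates. The function name merge_replicates and the tuple layout are hypothetical and are not part of the BSXplorer API.

```python
from collections import defaultdict


def merge_replicates(replicates):
    """Sum read counts per (strand, position) across replicates.

    Each replicate is an iterable of (strand, position, count_m, count_total)
    tuples. This layout is an assumption made for the sketch, not BSXplorer's
    internal representation.
    """
    totals = defaultdict(lambda: [0, 0])
    for rows in replicates:
        for strand, position, count_m, count_total in rows:
            totals[(strand, position)][0] += count_m
            totals[(strand, position)][1] += count_total

    # Return rows ordered by position, then strand, for stable output.
    ordered = sorted(totals.items(), key=lambda item: (item[0][1], item[0][0]))
    return [
        (strand, position, count_m, count_total)
        for (strand, position), (count_m, count_total) in ordered
    ]


if __name__ == "__main__":
    rep1 = [("+", 10, 3, 10), ("-", 15, 0, 0)]
    rep2 = [("+", 10, 1, 10), ("+", 20, 2, 4)]
    for row in merge_replicates([rep1, rep2]):
        print(row)
```

Summing counts before computing any ratio weights each replicate by its coverage. Averaging per-replicate densities would instead weight all replicates equally, so the choice matters when coverage differs strongly between replicates.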

BSXplorer's internal methylation data format is UniversalBatch, which stores the maximum available information about cytosine methylation status and context. The data attribute of UniversalBatch stores methylation information in a polars.DataFrame with the following schema:

UniversalBatch.data Schema

Field name     Data type    Description
strand         Utf8         DNA strand
position       UInt64       Chromosome position
context        Utf8         Methylation context
trinuc         Utf8         Cytosine trinucleotide sequence
count_m        UInt32       Count of methylated reads
count_total    UInt32       Total reads
density        Float64      Methylation density (NaN if no reads cover cytosine)
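
To make the relationship between the count fields and density concrete, here is a minimal plain-Python sketch of a single row. It assumes that density is count_m divided by count_total, which the schema suggests but does not state. CytosineRow and make_row are hypothetical names used only for illustration; the real data lives in a polars.DataFrame.

```python
import math
from typing import NamedTuple


class CytosineRow(NamedTuple):
    """One row mirroring the fields listed in the schema above."""
    strand: str
    position: int
    context: str
    trinuc: str
    count_m: int
    count_total: int
    density: float


def make_row(strand, position, context, trinuc, count_m, count_total):
    """Build a row, deriving density from the two read counts."""
    if count_m < 0 or count_total < 0:
        raise ValueError("read counts must be non-negative")
    if count_m > count_total:
        raise ValueError("count_m cannot exceed count_total")
    # Assumption: density is the fraction of methylated reads.
    # An uncovered cytosine (no reads) gets NaN, as the schema describes.
    density = count_m / count_total if count_total > 0 else math.nan
    return CytosineRow(strand, position, context, trinuc,
                       count_m, count_total, density)


if __name__ == "__main__":
    print(make_row("+", 101, "CG", "CGA", 3, 12))
    print(make_row("-", 205, "CHH", "CTA", 0, 0))
```

Because NaN never compares equal to itself, downstream code should test uncovered positions with math.isnan (or the equivalent polars expression) rather than with an equality check.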

To convert one report type into another, BSXplorer offers UniversalWriter, which accepts a UniversalBatch as input and writes it to a file in the specified format.

import bsxplorer as bsx

# Read a Bismark report and write it out in CGmap format
reader = bsx.UniversalReader("path/to/file.txt", report_type="bismark", use_threads=True)

with bsx.UniversalWriter("path/to/out.txt", report_type="cgmap") as writer:
    for batch in reader:
        writer.write(batch)

Note

UniversalWriter.write() accepts only UniversalBatch instances as input.
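
The reading and writing examples above can be combined to write merged replicates into a single report. The sketch below uses only the calls shown on this page and assumes that the batches yielded by UniversalReplicatesReader are UniversalBatch instances that UniversalWriter.write() accepts. The helpers write_all and convert_replicates are hypothetical and not part of BSXplorer.

```python
from typing import Any, Iterable


def write_all(batches: Iterable[Any], writer: Any) -> int:
    """Pass every batch to writer.write() and return the number written."""
    written = 0
    for batch in batches:
        writer.write(batch)
        written += 1
    return written


def convert_replicates(paths, out_path):
    """Merge Bismark replicate reports and write them as one CGmap report."""
    # Imported here so write_all stays usable without BSXplorer installed.
    import bsxplorer as bsx

    readers = [
        bsx.UniversalReader(path, report_type="bismark", use_threads=True)
        for path in paths
    ]
    with bsx.UniversalWriter(out_path, report_type="cgmap") as writer:
        # Assumption: merged batches are valid input for writer.write().
        return write_all(bsx.UniversalReplicatesReader(readers), writer)
```

A call such as convert_replicates(["path/to/file1.txt", "path/to/file2.txt"], "path/to/out.txt") would then return the number of batches written. Keeping the loop in write_all separate from the BSXplorer setup makes it easy to test with a stand-in writer.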