I. EDA of BSSeq data generated from non-model organism

BSXplorer is a tool that is specifically designed for easy EDA and visualization of bisulfite sequencing data. It is particularly useful for non-model organisms such as commercially valuable crops and plants, where often there is no reference database available. Researchers in such cases may only have access to an assembled genome and partially or spuriously assembled CDS from publications and associated genomic projects. To demonstrate its capabilities, we will use Brassica rapa subsp. perviridis WGBS dataset [1] as an example.

[1] Tirnaz S, Miyaji N, Takuno S, Bayer PE, Shimizu M, Akter MstA, et al. Whole-Genome DNA Methylation Analysis in Brassica rapa subsp. perviridis in Response to Albugo candida Infection. Front Plant Sci. 2022;13:849358.

Metagene analysis

Read reports

import bsxplorer

The analysis begins with the initialization of the Genome object, which stores information about the annotation of genomic regions and their coordinates.

genome = bsxplorer.Genome.from_custom(
    "genomic_id_ncbi.tsv",
    chr_col=0, type_col=1, start_col=2, end_col=3, strand_col=4, id_col=5, has_header=True
).gene_body(min_length=0, flank_length=2000)

The Metagene object is initialized with the dedicated constructor as shown below:

metagene = bsxplorer.Metagene.from_bismark(
    "DRR336466.CX_report.txt.gz", genome, 
    up_windows=250, body_windows=500, down_windows=250
)

Visualize

To generate a simple line plot to visualize DNA methylation patterns in gene regions the Metagene’s .line_plot() is applied. Then to render the figure the .draw_mpl() method of is used.

tick_args = dict(
    major_labels=["", ""], 
    minor_labels=["-2000bp", "Gene body", "+2000bp"]
)

metagene.line_plot(stat="mean").draw_mpl(confidence=.99, **tick_args)

EDA1 - LinePlot

The metagene object to can be subsetted by methylation context and strand to make a customized heatmap representation (with a .heat_map() function) of methylation pattern in/around gene regions.

filtered = metagene.filter(context="CG", strand="-")
filtered.heat_map(ncol=100, nrow=100).draw_mpl(
    major_labels=["", ""], 
    minor_labels=["-2000bp", "Gene body", "+2000bp"]
)

EDA1 - HeatMap

In cases where a heatmap is too cluttered, clustering genes can generate a clearer representation of the heatmap.

kmeans = filtered.resize(20).cluster(count_threshold=5, na_rm=0).kmeans(n_clusters=10)
kmeans.draw_mpl()

EDA1 - ClusterMap

To facilitate further exploration of the data via, for instance, functional enrichment analyses the tab-delimited list of regions and the associated module IDs can be exported to a file, as shown below:

kmeans.save_tsv("kmeans_labels.tsv")

Chromosome methylation levels analysis

BSXplorer allows a user to visualize the overall methylation levels of chromosomes using the corresponding ChrLevels object:

levels = bsxplorer.ChrLevels.from_bismark("DRR336466.CX_report.txt.gz", chr_min_length=10**6, window_length=10**6)
levels.draw_mpl(smooth=5)

EDA1 - ChrAll

In a way that is similar to the Metagene method, the methylation data can be subjected to filtering to selectively display a methylation context that is of interest.

levels.filter(context="CG").draw_mpl(smooth=5)

EDA1 - ChrFiltered

Genes to TE comparison

BSXplorer is a tool that provides various capabilities for the analysis, comparison, and visualization of methylation profiles for different samples and genomic elements. For instance, it can compare methylation patterns of genes and transposable elements in Brassica rapa subsp. perviridis.

annotation = bsxplorer.Genome.from_custom(
    "genomic_id_ncbi.tsv",
    chr_col=0, type_col=1, start_col=2, end_col=3, strand_col=4, id_col=5,    has_header=True
)

genes = annotation.gene_body(min_length=0, flank_length=500)
te = bsxplorer.Genome.from_gff("TE.gff").other("match", 0, flank_length=500)

To start the analysis, we first read the annotation and then select genes (.gene_body) and transposable elements (.other(region_type="match")) for further analysis. We then create an object using the MetageneFiles class that stores information about several metagenes.

As shown previously, the object is filtered by DNA strand and methylation context prior to plotting.

filtered = metagenes.filter(context="CG", strand="-")

labels_settings = dict(major_labels=["", ""], minor_labels=["-500bp", "Body", "+500bp"])

filtered.line_plot(smooth=5, confidence=.95).draw_mpl(**labels_settings)
filtered.heat_map(10, 20).draw_mpl(**labels_settings)

EDA1 - Genes_vs_TE LinePlot

EDA1 - Genes_vs_TE HeatMap

The MetageneFiles object has .box_plot() and .violin_plot() functions for box plot and violin plot graphs. The .trim_flank() method analyzes methylation of the body of the region.

filtered.trim_flank().box_plot().draw_mpl()
filtered.trim_flank().box_plot(violin=True).draw_mpl()

EDA1 - Genes_vs_TE ViolinPlot EDA1 - Genes_vs_TE BoxPlot