Home
RefTM (Reference-guided Topic Modeling of single-cell chromatin accessibility data) is a reference-guided topic-modeling approach for analyzing single-cell chromatin accessibility sequencing (scCAS) data. It uses the information in existing bulk chromatin accessibility data and annotated scCAS data while keeping the advantages of topic models for single-cell data analysis. RefTM simultaneously models: 1) the biological variation shared between the reference data and the target scCAS data; 2) the biological variation unique to the scCAS data; 3) other variation from known covariates in the scCAS data, which enables batch-effect correction. RefTM can be extended to more general integrative data analysis by choosing appropriate reference data and single-cell data.
You can install the released version of the RefTM package from GitHub:
devtools::install_github("cuhklinlab/RefTM")
Two count matrices should be provided as input, sc_data and ref_data:
sc_data: Peak-by-Cell count matrix
ref_data: Cell-by-Peak count matrix
The two matrices must contain the same peaks. Features other than peaks, such as motifs and genomic bins, are also acceptable as input.
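The shape requirement above can be checked before calling RefTM. The following is a minimal sketch in base R on toy matrices. The helper `check_shared_peaks` and the toy peak names are hypothetical and not part of the RefTM package, and the sketch assumes that peaks are identified by the row names of sc_data and the column names of ref_data.

```r
# Hypothetical helper (not part of RefTM): check that a peak-by-cell matrix
# and a cell-by-peak reference matrix describe the same peaks, in the same order.
check_shared_peaks <- function(sc_data, ref_data) {
  sc_peaks  <- rownames(sc_data)   # peaks are the rows of sc_data
  ref_peaks <- colnames(ref_data)  # peaks are the columns of ref_data
  if (is.null(sc_peaks) || is.null(ref_peaks)) {
    # Without names, only the number of peaks can be compared.
    return(nrow(sc_data) == ncol(ref_data))
  }
  identical(sc_peaks, ref_peaks)
}

# Toy data: 4 peaks, 3 cells, 2 reference rows (made-up counts).
peaks <- paste0("peak", 1:4)
toy_sc <- matrix(c(0, 1, 0, 2,
                   1, 0, 0, 1,
                   0, 0, 3, 0),
                 nrow = 4,
                 dimnames = list(peaks, paste0("cell", 1:3)))
toy_ref <- matrix(c(5, 0, 2, 1,
                    0, 4, 1, 3),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(paste0("ref", 1:2), peaks))

check_shared_peaks(toy_sc, toy_ref)
```

If the check fails because the peaks are present but ordered differently, reordering the columns of the reference matrix by the row names of the single-cell matrix is one way to align them.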
First, load package RefTM:
library(RefTM)
reference data: pseudo-bulk forebrain_ref_data. MG and OC cells are left out when constructing the pseudo-bulk reference data, in order to investigate the influence of incomplete reference data. scCAS data: forebrain_sc_data
sc_data <- forebrain_sc_data
ref_data <- forebrain_ref_data
cell_label <- forebrain_label_mat
set.seed(2022)
result <- RefTM(sc_data, ref_data)
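# Plain LDA with 10 topics on the cell-by-peak matrix, for comparison.
# LDA() is assumed here to come from the topicmodels package.
result_LDA <- LDA(t(sc_data), k = 10)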
RetTM_tsne(result_LDA@gamma, cell_label)

# Define k1 once so the column indexing in the plotting calls can use it.
k1 <- 5
theta <- RefTM_postprocess(result, k1 = k1)
RetTM_tsne(theta[, 1:k1], cell_label)

RetTM_tsne(theta[, -c(1:k1)], cell_label)

RetTM_tsne(theta, cell_label)
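The three plotting calls above use different column subsets of theta. Assuming, as the indexing suggests, that theta is a cell-by-topic matrix whose first k1 columns hold the reference-guided topics and whose remaining columns hold the topics specific to the scCAS data, the split can be illustrated on a toy matrix. The values below are made up and are not RefTM output.

```r
# Toy cell-by-topic matrix standing in for theta (made-up values).
set.seed(1)
k1 <- 2  # assumed number of reference-guided topics in this toy example
toy_theta <- matrix(runif(5 * 4), nrow = 5,
                    dimnames = list(paste0("cell", 1:5),
                                    paste0("topic", 1:4)))

# drop = FALSE keeps the result a matrix even when one column is selected.
shared_part <- toy_theta[, 1:k1, drop = FALSE]      # first k1 columns
unique_part <- toy_theta[, -c(1:k1), drop = FALSE]  # all remaining columns

dim(shared_part)
dim(unique_part)
```

Binding the two parts back together column-wise reproduces the full matrix, so the three plots correspond to the first part, the second part, and both parts combined.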

First, load package RefTM:
library(RefTM)
reference data: pseudo-bulk CLPLMPPMPP_ref_data. scCAS data: CLPLMPPMPP_sc_data. covariate: CLPLMPPMPP_donor_label.
sc_data <- CLPLMPPMPP_sc_data
ref_data <- CLPLMPPMPP_ref_data
donor_label <- CLPLMPPMPP_donor_label
cell_label <- CLPLMPPMPP_label_mat
set.seed(2022)
result <- RefTM(sc_data, ref_data, workflow = "STM", covariate = as.factor(donor_label))
theta = RefTM_postprocess(result, k1 = 5, erase.BF = FALSE)
RetTM_tsne(theta, cell_label)
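The covariate argument in the RefTM call above is built with as.factor(donor_label). A minimal base-R sketch of that preparation step follows, using made-up donor names. It assumes that the covariate has one entry per cell, in the same order as the columns of sc_data, which is not stated explicitly here.

```r
# Made-up donor labels for 6 cells (not the CLPLMPPMPP data).
toy_donor_label <- c("donorA", "donorB", "donorA",
                     "donorC", "donorB", "donorA")
toy_sc <- matrix(0, nrow = 3, ncol = 6)  # toy peak-by-cell matrix with 6 cells

# Assumption: one covariate entry per cell (column) of the count matrix.
stopifnot(length(toy_donor_label) == ncol(toy_sc))

toy_covariate <- as.factor(toy_donor_label)
levels(toy_covariate)  # distinct donors, in sorted order
table(toy_covariate)   # number of cells per donor
```

Inspecting the levels and per-level counts before fitting is a cheap way to catch misspelled donor names or donors with very few cells.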

theta = RefTM_postprocess(result, k1 = 5)
RetTM_tsne(theta, cell_label, donor_label)

Seurat_louvain <- RA3::RA3_clustering(t(theta), length(unique(cell_label)))