This repository stores the processed data used in the iPoLNG manuscript. It also includes example code, preprocess_example_code, showing how the original data were preprocessed.
Each data folder consists of the following files:
<dataname>_barcodes.csv: the cell barcodes in the data.
<dataname>_DNA20k.mtx: the feature-by-cell matrix for ATAC / histone modification data.
<dataname>_DNA20kbins.csv: the selected features for ATAC / histone modification data.
<dataname>_RNA5k.mtx: the feature-by-cell matrix for RNA data.
<dataname>_RNA5kgenes.csv: the selected features for RNA data.
For datasets with ground-truth labels (Paired-Tag and SHARE-seq), an additional file, <dataname>_celltype.csv, is included.
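The files listed above can be read in Python roughly as follows. This is a minimal sketch, not the code used in the manuscript: the helper name load_modality is hypothetical, SciPy and pandas are assumed to be installed, and the CSV files are assumed to have no header row (pass header=0 if they do). The shape check raises an error if the feature or barcode counts do not match the matrix, which is also a quick way to notice a wrong header setting.

```python
from pathlib import Path

import pandas as pd
from scipy.io import mmread
from scipy.sparse import csr_matrix


def load_modality(folder, dataname, matrix_suffix, feature_suffix, header=None):
    """Load one feature-by-cell matrix with its feature and barcode tables.

    header=None assumes the CSV files have no header row (an assumption;
    pass header=0 if the first row holds column names).
    """
    folder = Path(folder)
    # csr_matrix() accepts both the sparse and the dense result mmread can return.
    matrix = csr_matrix(mmread(str(folder / f"{dataname}_{matrix_suffix}.mtx")))
    features = pd.read_csv(folder / f"{dataname}_{feature_suffix}.csv", header=header)
    barcodes = pd.read_csv(folder / f"{dataname}_barcodes.csv", header=header)
    # Rows are features and columns are cells, per the file descriptions.
    if matrix.shape != (len(features), len(barcodes)):
        raise ValueError(
            f"matrix shape {matrix.shape} does not match "
            f"{len(features)} features x {len(barcodes)} barcodes"
        )
    return matrix, features, barcodes
```

For example, load_modality(folder, dataname, "RNA5k", "RNA5kgenes") would return the RNA matrix, and load_modality(folder, dataname, "DNA20k", "DNA20kbins") the ATAC / histone modification matrix, where folder and dataname stand for the data folder and its file prefix.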
For the 10xPBMC10k data, the ATAC matrix is split into two files because of data storage constraints. 10xPBMC10k_DNA20k_part1.mtx is the feature-by-cell matrix for the first 10,000 features, and 10xPBMC10k_DNA20k_part2.mtx is the feature-by-cell matrix for the last 10,000 features. To obtain the full matrix, stack the two parts along the feature axis, with part 1 first.
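One way to combine the two parts in Python is sketched below. The helper name combine_parts is hypothetical and SciPy is assumed to be installed. Because features are rows, the parts are stacked vertically. The sketch also assumes that the combined row order matches the feature list in 10xPBMC10k_DNA20kbins.csv, which is worth verifying against that file.

```python
from scipy.io import mmread
from scipy.sparse import csr_matrix, vstack


def combine_parts(part1_path, part2_path):
    """Stack two feature-by-cell matrices along the feature (row) axis."""
    part1 = csr_matrix(mmread(str(part1_path)))
    part2 = csr_matrix(mmread(str(part2_path)))
    # Both parts must describe the same cells (columns).
    if part1.shape[1] != part2.shape[1]:
        raise ValueError("the two parts have different numbers of cells")
    # Part 1 holds the first features, so it goes on top.
    return vstack([part1, part2], format="csr")


# Example call, with paths relative to the 10xPBMC10k data folder:
# full = combine_parts("10xPBMC10k_DNA20k_part1.mtx", "10xPBMC10k_DNA20k_part2.mtx")
```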