Software
- SLIDE (Sparse Linear modeling of RNA-Seq data for Isoform Discovery and abundance Estimation)
- Citation: Li, J.J., Jiang, C.-R., Brown, B.J., Huang, H., and Bickel, P.J. (2011). Sparse linear modeling of RNA-seq data for isoform discovery and abundance estimation. Proc Natl Acad Sci. USA 108(50):19867-19872.
- Important note: SLIDE is compatible with RNA-seq .bam files mapped by TopHat and TopHat2.
- (Update on Jan 30th, 2018 -- Several bugs were fixed for RNA-seq .bam files with more than one read length.)
- (Update on May 7th, 2012 -- A feature was added for estimating the abundance of annotated isoforms without performing isoform discovery.)
- (Update on Apr 18th, 2012 -- A feature was added for removing erroneously mapped reads; a multiprocessing bug was fixed.)
- (Update on Apr 5th, 2012 -- A feature was added for handling single-end RNA-seq reads or a mixture of single-end and paired-end reads.)
- NMFP (Non-negative Matrix Factorization based Preselection)
- Please cite the following paper in any research that uses this software package
- Ye, Y. and Li, J.J. (2016). NMFP: a non-negative matrix factorization based preselection method to increase accuracy of identifying mRNA isoforms from RNA-seq data. BMC Genomics 17(Supp 1):11.
- TROM (TRanscriptome Overlap Measure)
- Citation: Li, W.V., Chen, Y., and Li, J.J. (2017). TROM: a testing-based method for finding transcriptomic similarity of biological samples. Statistics in Biosciences 9(1):105-136.
- EPOM (EPigenome Overlap Measure)
- Citation: Li, W.V., Razaee, Z.S., and Li, J.J. (2016). Epigenome overlap measure (EPOM) for comparing tissue/cell types based on chromatin states. BMC Genomics 17(Supp 1):10.
- scImpute (accurate and robust imputation for single-cell RNA-Seq data)
- Citation: Li, W.V. and Li, J.J. (2018). An accurate and robust imputation method scImpute for single-cell RNA-seq data. Nature Communications 9:997.
- NPROC (Neyman-Pearson (NP) classification algorithms and NP receiver operating characteristics (NP-ROC))
- Citation: Tong, X., Feng, Y., and Li, J.J. (2018). Neyman-Pearson (NP) classification algorithms and NP receiver operating characteristics (NP-ROC). Science Advances 4(2):eaao1659.
- MSIQ (joint modeling of Multiple RNA-seq Samples for accurate Isoform Quantification)
- Citation: Li, W.V., Zhao, A., Zhang, S., and Li, J.J. (2018). MSIQ: joint modeling of multiple RNA-seq samples for accurate isoform quantification. Annals of Applied Statistics 12(1):510-539.
- EpiAlign (an alignment-based bioinformatic tool for comparing chromatin state sequences)
- Citation: Ge, X., Zhang, H., Xie, L., Li, W.V., Kwon, S.B., and Li, J.J. (2019). EpiAlign: an alignment-based bioinformatic tool for comparing chromatin state sequences. Nucleic Acids Research gkz287.
- scDesign (a statistical simulator for rational scRNA-seq experimental design)
- Citation: Li, W.V. and Li, J.J. (2019). A statistical simulator scDesign for rational scRNA-seq experimental design. Bioinformatics 35(14):i41–i50.
- gR2 (generalized R square measures)
- Citation: Li, J.J., Tong, X., and Bickel, P.J. (2019). Generalized R² measures for a mixture of bivariate linear dependences. arXiv:1811.09965.
Data
- Estimates of D. melanogaster and C. elegans gene expression in different developmental stages, tissues and cells (in FPKM units)
- D. melanogaster gene expression estimates in 30 fly developmental stages (download)
- D. melanogaster gene expression estimates in 29 fly tissues and 19 fly cell lines (download)
- C. elegans gene expression estimates in 35 worm developmental stages (download)
- C. elegans gene expression estimates in 4 worm tissues and 14 worm dissected cells (download)
- Please cite the following paper in any research that uses the above data
- Li, J.J., Huang, H., Bickel, P.J., and Brenner, S.E. (2014). Comparison of D. melanogaster and C. elegans developmental stages, tissues, and cells by modENCODE RNA-seq data. Genome Research 24(7):1086-1101.
- For more details about the data, please refer to the section "Estimating gene expression in developmental stages and tissues/cells" in the Methods of the above paper ([html] [pdf]).
- Associated promoter and enhancer regions identified based on signals of three histone modification marks (H3K4me1, H3K4me3 and H3K27ac) in 16 human tissue and cell types (download)
- Estimates of gene expression (FPKM) in various cell and tissue types from human, chimpanzee, bonobo, mouse and pig
- Expression estimates of protein-coding genes in human (download)
- Expression estimates of protein-coding genes in chimpanzee (download)
- Expression estimates of protein-coding genes in bonobo (download)
- Expression estimates of protein-coding genes in mouse (download)
- Expression estimates of protein-coding genes in pig (download)
- Expression estimates of long non-coding RNAs in human (download)
- Expression estimates of long non-coding RNAs in chimpanzee (download)
- Expression estimates of long non-coding RNAs in bonobo (download)
- Expression estimates of long non-coding RNAs in mouse (download)
- Please cite the following paper in any research that uses the above data
- Yang et al. (2017). Large-scale mapping of mammalian transcriptomes identifies conserved genes associated with different cell states. Nucleic Acids Research 45(4):1657–1672.
- For more details about the data, please refer to the section "RNA-seq data collection and processing" in the Methods of the above paper ([html] [pdf]).
- Data for the R package Clipper (download)