TABLE 3.
Summary of data and statistical analysis methods used by different genome-wide high-throughput deep-sequencing DNA methylation studies
| Sequencing platform | Aim of the study | Methods for sequencing data analysis and statistical analysis | Reference |
|---|---|---|---|
| Roche/454 | Analysis of global DNA methylation in the tissue and the sera of breast cancer patients | The data and efficacy of bisulfite mutagenesis for this study were analyzed using MethylMapper. The data set was also examined to ensure that each amplicon and each patient had balanced representation. T-tests and discriminant analysis were performed to identify significantly changed amplicons in cancer-free versus cancer samples. All the statistical analyses were performed using SAS (SAS Institute) and R (R Foundation for Statistical Computing) software. | Bormann et al. 2010 |
| Roche/454 | Analysis of global CpG island methylation of sperm and female white blood cells | The data were first preprocessed to remove all sequences that contained adaptor sequence. The remaining sequences were then mapped using VerJInxer software. For mapping of the CpG island–enriched fragments, a CpG island reference sequence was generated by extracting CpG island sequences, as defined by the UCSC browser, from the repeat-masked human genome reference sequence. | Zeschnigk et al. 2009 |
| ABI SOLiD | Evaluating the utility of SOLiD for bisulfite sequencing of large and complex genomes | First, two bisulfite reference genomes were created by replacing all C's with T's in silico on both DNA strands of the DH10B genome. Sequence reads were then aligned to both bisulfite-converted reference genomes and to the unconverted DH10B genome using the SOLiD System Analysis Pipeline Tool, allowing up to five mismatches per read. | Lister et al. 2008 |
| ABI SOLiD | Development of a method (Methyl-MAPS) that can globally detect DNA methylation status for both unique and repetitive DNA sequences | Initial tag mapping was performed using the SOLiD System Analysis Pipeline Tool. Paired-end tags were each individually mapped in color space to the human hg18 sequence obtained from the UCSC Genome Browser, allowing up to two mismatches in each 25-bp tag. A custom Perl script was used to further identify the methylated versus unmethylated regions in the sequences. CpG island, RepeatMasker, and RefSeq gene data were all downloaded from the UCSC Genome Browser. Each CpG island was annotated according to its genomic location. Promoter islands were defined as islands that occur within 1 kb of a gene transcription start site. | Edwards et al. 2010 |
| Illumina/SOLEXA | Genome-wide DNA methylation analysis of pluripotent and differentiated cells | Sequence reads from bisulfite-converted libraries were identified using standard Illumina base-calling software and analyzed using a custom computational pipeline. | Meissner et al. 2008 |
| Illumina/SOLEXA | Single-base resolution DNA methylation map of Arabidopsis | Sequence information was extracted from the image files with the Illumina Firecrest and Bustard applications and mapped to the Arabidopsis (Col-0) reference genome sequence (TAIR 7) with the Illumina ELAND algorithm. Reads were mapped against computationally bisulfite-converted and nonconverted genome sequences. Reads that aligned to multiple positions in the three genomes were aligned to an unconverted genome using the cross_match algorithm. To identify the presence of a methylated cytosine, a significance threshold was determined at each base position using the binomial distribution, read depth, and a precomputed error rate based on the combined bisulfite conversion failure rate and sequencing error. Methylcytosine calls that fell below the minimum required threshold of percent methylation at a site were rejected. This approach ensured that no more than 5% of methylcytosine calls were false positives. The investigators also developed an open source web-based application called Anno-J for visualization of genomic data. | Ruike et al. 2010 |
| Illumina/SOLEXA | Human DNA methylation analysis at single-base resolution | MethylC-seq sequencing data were processed using the Illumina analysis pipeline, and FastQ format reads were aligned to the human reference genome (hg18) using the Bowtie alignment algorithm. The base calls per reference position on each strand were used to identify methylated cytosines at a 1% false discovery rate. | Lister et al. 2009 |
Reproduced, with permission, from Gupta et al. 2010.
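
Several of the pipelines summarized in Table 3 align reads against computationally bisulfite-converted reference genomes. As a minimal sketch of that in silico conversion step, assuming a plain-string genome and the hypothetical helper names `reverse_complement` and `bisulfite_references` (neither is taken from any of the cited studies), the two converted references could be built as follows:

```python
from typing import Tuple


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    complement = str.maketrans("ACGT", "TGCA")
    return seq.translate(complement)[::-1]


def bisulfite_references(genome: str) -> Tuple[str, str]:
    """Build two fully converted references (illustrative sketch only).

    The first has every C on the forward strand replaced by T; the
    second has every C on the reverse-complement strand replaced by T.
    Full conversion is assumed, i.e. every cytosine is treated as
    unmethylated when the reference is built.
    """
    forward = genome.upper()
    forward_converted = forward.replace("C", "T")
    reverse_converted = reverse_complement(forward).replace("C", "T")
    return forward_converted, reverse_converted


if __name__ == "__main__":
    fwd, rev = bisulfite_references("ACGTC")
    print(fwd)
    print(rev)
```

Reads would then be aligned to both converted sequences, and in some of the studies above to the unconverted genome as well, before methylation status is inferred from the C/T mismatches.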

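
The binomial thresholding described for the Arabidopsis study can be illustrated with a simplified per-site calculation. The sketch below is an assumption-laden illustration, not the investigators' implementation: the function names, the per-site significance level `alpha`, and the example error rate are all hypothetical, and the published approach controlled the overall proportion of false-positive calls rather than a fixed per-site level.

```python
from math import comb
from typing import Optional


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def min_methylated_reads(depth: int, error_rate: float,
                         alpha: float = 0.05) -> Optional[int]:
    """Smallest number of C-supporting reads that is unlikely under error alone.

    error_rate is the assumed combined probability that an unmethylated
    cytosine is read as C (bisulfite non-conversion plus sequencing error).
    Returns None when no count at this depth reaches significance.
    """
    for k in range(1, depth + 1):
        if binomial_tail(k, depth, error_rate) <= alpha:
            return k
    return None


if __name__ == "__main__":
    # Hypothetical example: 10 reads cover the site, 1% combined error rate.
    print(min_methylated_reads(depth=10, error_rate=0.01))
```

A site would be called methylated only if its count of unconverted reads meets the threshold for its depth, which is why the required count grows with coverage and with the assumed error rate.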









