Genome sequence

Genome Analytics

The Genome Analytics (GMAK) facility provides next-generation sequencing (NGS) technology to the HZI. External users may also have access to the technology, e.g. upon collaborative arrangements. In addition to sample DNA/RNA quality check, library preparation and sequencing, the facility provides primary data processing, data quality check and analysis. A number of pipelines for secondary data processing are established.

Head

Dr Robert Geffers
Research Group Leader

Our Services

RNA Sequencing

RNA sequencing, or RNA-Seq, is the latest technology to study the transcriptome, i.e., the full set of RNA transcripts as genome readouts in a cell or population of cells. This technology directly sequences RNA molecules in the transcriptome in order to determine their genes of origin and abundance. RNA species need to undergo a sequencing library preparatory process prior to sequencing. The libraries are then sequenced to generate millions of reads for each sample. After sequencing, the generated reads are mapped to the reference genome to identify their genomic origin. The total number of reads mapped to a particular genomic region represents the level of transcriptional activity in the region. The more transcriptionally active a genomic region is, the more copies of RNA transcripts it produces, and the more RNA-Seq reads it generates. RNA-seq is essentially a counting game.
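The counting idea described above can be illustrated with a minimal sketch. It assumes reads have already been mapped and are represented only by chromosome and start position; the gene names, coordinates and reads are hypothetical toy values, and real analyses use dedicated aligners and counting tools rather than code like this.

```python
# Hypothetical annotation for illustration only:
# gene name -> (chromosome, start, end), 1-based inclusive coordinates.
GENES = {
    "geneA": ("chr1", 100, 500),
    "geneB": ("chr1", 800, 1200),
}


def count_reads_per_gene(read_starts, genes):
    """Count mapped reads whose start position falls inside each gene interval."""
    counts = {name: 0 for name in genes}
    for chrom, pos in read_starts:
        for name, (g_chrom, start, end) in genes.items():
            if chrom == g_chrom and start <= pos <= end:
                counts[name] += 1
    return counts


# Toy mapped reads as (chromosome, start position) pairs.
reads = [("chr1", 150), ("chr1", 320), ("chr1", 900), ("chr2", 150), ("chr1", 600)]
counts = count_reads_per_gene(reads, GENES)
print(counts)
```

In this toy input, two reads fall into geneA and one into geneB; the reads on chr2 and at position 600 are not assigned to any gene. Comparing such counts between samples, after normalization, is the basis of differential expression analysis.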

GMAK provides five types of RNA-seq services as detailed below:

  • mRNA-Seq: Starts with 100 ng to 1 µg high-quality total RNA. Prepares library from poly-A enriched mRNA species. Aims to identify differentially expressed protein-coding genes. Most requested.
  • Total RNA-Seq: Starts with 100 ng to 1 µg total RNA. Library prep based on rRNA depletion. Targets both protein-coding genes and long noncoding RNAs. Can accommodate degraded RNA, such as that extracted from FFPE or laser capture microdissected samples.
  • Low-input RNA-Seq: Accommodates limited amounts of total RNA in the range of 5-100 ng. Library prep can be based on poly(A) mRNA enrichment (default), or rRNA depletion (at additional cost).
  • Small RNA-Seq: Prepares sequencing libraries for small RNA species, e.g., miRNAs, from total RNA. Can start from 100 ng to 1 µg (standard input) or 5 to 100 ng (low input) total RNA.
  • Single-Cell RNA-Seq

Users who are new to bulk RNA-seq may refer to the core documents RNA-Seq Workflow Steps and Examples and the RNA-Seq Decision Tree to help decide which type of RNA-seq service is needed.

Read Length and Sequencing Depth

Standard mRNA- or total RNA-Seq: Paired-end 50 bp reads are mostly used for general gene expression profiling. To study alternative splicing variants, longer paired-end reads (up to 150 bp) are often requested. On sequencing depth, 25-30 million reads per sample are usually appropriate for general gene expression profiling, while 40-50 million reads are suggested for splicing variant detection.
Low-input RNA-seq: Read length remains the same as standard mRNA- or total RNA-seq. Sequencing depth may be reduced to some extent based on the amount of starting material.
Small RNA-seq: GMAK generates paired-end 50 bp reads for small RNA-seq. The suggested sequencing depth is 5-10 million reads per sample.

Service Request

Project consultation is provided free-of-charge.

Sample Submission

GMAK accepts extracted total RNA for RNA-seq (no tissues or cells). RNA quality is the single most important factor determining the final outcome. After sample drop-off and prior to library construction, core staff perform sample QC, which includes Qubit concentration measurement and Bioanalyzer-based RNA Integrity Number (RIN) determination. A RIN of at least 8 is required to proceed with mRNA-seq library construction. Submitted RNA samples must also be DNA-free, and we recommend always including a DNase treatment step during RNA extraction. Genomic DNA contamination is visible on Bioanalyzer traces in the range of 4-10 kb. Where RNA degradation is unavoidable, such as with FFPE tissues, total RNA-seq is suggested because it is less dependent on RNA integrity. Use our Sample Submission Form.
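As a rough orientation, the input amounts listed under the RNA-seq service types and the RIN requirement stated here can be combined into a simple pre-check. The sketch below is only one possible reading of the numbers on this page; the function name and the exact branching are assumptions, it is not the facility's RNA-Seq Decision Tree, and it does not replace a project consultation.

```python
def suggest_rnaseq_service(total_rna_ng, rin):
    """Suggest a bulk RNA-seq service type from RNA amount (ng) and RIN.

    Thresholds are taken from this page (5 ng, 100 ng, RIN 8);
    the branching itself is an illustrative assumption.
    """
    if total_rna_ng < 5:
        return "below listed input range, consult the core"
    if rin < 8:
        # Degraded RNA: rRNA depletion is less dependent on RNA integrity.
        if total_rna_ng >= 100:
            return "Total RNA-Seq"
        return "Low-input RNA-Seq (rRNA depletion)"
    if total_rna_ng < 100:
        return "Low-input RNA-Seq"
    return "mRNA-Seq"


# Hypothetical samples as (total RNA in ng, RIN).
for amount, rin in [(500, 9.2), (500, 5.0), (20, 8.5), (2, 9.0)]:
    print(amount, rin, "->", suggest_rnaseq_service(amount, rin))
```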

Bioinformatics

Data analysis is provided upon request. Standard RNA-seq bioinformatics service includes sequencing data QC, alignment, normalization, and differential expression analysis.

Single-Cell Sequencing

GMAK has offered single-cell sequencing since 2017. This state-of-the-art technology offers unprecedented opportunities to study cell-to-cell variation, identify/visualize different cell types/identities in a population, and infer cellular developmental trajectories. To help users accomplish these goals, GMAK assists users in every step of this process – from cell prep and sequencing library construction to bioinformatic analysis.

As the technology evolves, single-cell sequencing is becoming more diverse to meet varying project needs. Currently, GMAK offers high-throughput single-cell sequencing based on 10x Genomics technology.

10x Genomics Chromium

  • Target cell number: 1,000-10,000 cells in each sample
  • Input type: freshly prepared single-cell (or single-nucleus) suspensions, fixed cells (or nuclei), cryopreserved cells, and FFPE tissue.
  • Applications: 3’ and 5’ single-cell (or nucleus) RNA-seq, T-cell and B-cell V(D)J clone profiling, cell surface protein profiling, single nucleus ATAC-seq and multiome (simultaneous single nucleus ATAC-seq and RNA-seq on the same cells)

Service Request

To initiate a single-cell sequencing project, please contact us. Project consultation is provided free-of-charge. Consultation with the core prior to starting a single-cell sequencing experiment is highly recommended to ensure accomplishment of project goals.

Sample Submission

For all single-cell sequencing services, please make a sample submission appointment with us in advance.

Recommended Sequencing Parameters

  • RNA-Seq Libraries – Read 1 of 28 bp (Cell Barcode and UMI), i7 Index of 10 bp (Sample Index), i5 Index of 10 bp (Sample Index), and Read 2 of 90 bp (Transcript Insert) with a sequencing depth of >20,000 reads per cell;
  • ATAC-Seq Libraries – Read 1 of 50 bp (Transposed DNA), i7 Index of 8 bp (Sample Index), i5 Index of 16 bp (10x Barcode), and Read 2 of 49 bp (Transposed DNA) with a sequencing depth of >25,000 reads per cell.
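The per-cell depths above translate into a per-sample read budget by multiplying by the number of targeted cells. A minimal sketch, assuming the listed values are treated as lower bounds; the function and dictionary names are illustrative only.

```python
# Minimum recommended reads per cell, as listed above.
MIN_READS_PER_CELL = {"rna": 20_000, "atac": 25_000}


def min_reads_per_sample(target_cells, library_type):
    """Lower bound on reads per sample for a given number of targeted cells."""
    return target_cells * MIN_READS_PER_CELL[library_type]


# Example: 5,000 targeted cells (within the 1,000-10,000 range given for Chromium).
rna_reads = min_reads_per_sample(5_000, "rna")
atac_reads = min_reads_per_sample(5_000, "atac")
print(f"RNA-seq library:  at least {rna_reads:,} reads")
print(f"ATAC-seq library: at least {atac_reads:,} reads")
```

For 5,000 cells this gives at least 100 million reads for an RNA-seq library and at least 125 million reads for an ATAC-seq library.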

Bioinformatics

Data analysis is provided upon request.

Spatial Transcriptomics

Spatial transcriptomics enables interrogation of gene expression within the context of tissue architecture, tissue microenvironments and cell groups (especially when coupled with single-cell sequencing). To meet the rapidly increasing need for spatial omics studies, GMAK teams up with the HZI Core Unit Mouse Histology and Pathology hosted by the VMED department. This internal cooperation ensures users have access to the various techniques needed to carry out a typical spatial analysis workflow, such as tissue prep, cryosectioning, staining, imaging, tissue section QC, sequencing library prep, sequencing and data analysis. The established workflow accommodates both fresh frozen (FF) and formalin-fixed paraffin-embedded (FFPE) tissues. Pre-cut tissue sections on standard glass slides may also be used for the Visium platform from 10x Genomics, as the 10x CytAssist instrument available at GMAK enables sample transfer from pre-existing slides to Visium slides. The Visium platform from 10x Genomics has been offered since August 2023.

Service Request

Please contact GMAK to initiate a spatial analysis project. The Core works closely with the other HZI cores on the different steps of the workflow. Project consultation is provided free of charge. Consultation with the core prior to starting a spatial transcriptomics experiment is highly recommended to ensure accomplishment of project goals.

Whole Genome Sequencing

Whole-genome sequencing (WGS) is a method used to determine the complete DNA sequence of an organism's genome, including chromosomal DNA and mitochondrial DNA, and thus provides a comprehensive view of the genome. WGS methods can be categorized into de novo sequencing projects and re-sequencing projects. To decide which method suits which project, see the DNA-Seq_Decision_Tree.

De novo WGS

For de novo sequencing of genomes, long reads such as those produced by Oxford Nanopore sequencers (MinION, GridION) are advantageous. Since this technology is available at GMAK, we recommend a hybrid approach for de novo WGS. Here, short reads from Illumina sequencers and long reads from Oxford Nanopore (ONT) sequencers are used to improve the assembly results for genomes.

WGS Re-Sequencing

WGS Re-Sequencing is notably more cost-effective than de novo WGS, as it can be accomplished using only short reads when a high-quality assembled reference genome is available.

Sequencing Mode

  • Short reads (Illumina): paired-end run of at least 150 bp, preferably 300 bp, with 50x coverage of the genome for re-sequencing and 100x coverage of the genome for de novo sequencing
  • Long reads (ONT): 50x coverage of the genome for a hybrid assembly with Illumina short reads and 200x coverage of the genome for de novo sequencing

For successful de novo sequencing of small genomes (bacteria, viruses), we strongly recommend a hybrid approach, i.e., the combination of short reads (Illumina) with long reads (ONT).

For re-sequencing and the associated SNP/variant calling, we recommend the use of short reads (Illumina), as this sequencing technology currently has the lowest sequencing error rate and thus offers a better basis for subsequent analysis steps.
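The coverage targets listed under Sequencing Mode can be converted into read numbers using the general relation coverage = total sequenced bases / genome size. The sketch below assumes a hypothetical 5 Mb bacterial genome and ignores losses from quality filtering, duplicates and unmapped reads, so real projects usually need some margin on top.

```python
def read_pairs_for_coverage(genome_size_bp, coverage, read_length_bp):
    """Paired-end read pairs needed so that total bases / genome size >= coverage."""
    bases_needed = genome_size_bp * coverage
    bases_per_pair = 2 * read_length_bp
    # Integer ceiling division: round up to whole read pairs.
    return -(-bases_needed // bases_per_pair)


def bases_for_coverage(genome_size_bp, coverage):
    """Total sequenced bases needed for a given coverage (useful for long reads)."""
    return genome_size_bp * coverage


GENOME_SIZE = 5_000_000  # hypothetical bacterial genome, 5 Mb

reseq_pairs = read_pairs_for_coverage(GENOME_SIZE, 50, 150)     # 50x re-sequencing
denovo_pairs = read_pairs_for_coverage(GENOME_SIZE, 100, 150)   # 100x de novo
ont_hybrid_bases = bases_for_coverage(GENOME_SIZE, 50)          # 50x ONT for hybrid assembly

print(f"Re-sequencing, 2 x 150 bp: {reseq_pairs:,} read pairs")
print(f"De novo, 2 x 150 bp:       {denovo_pairs:,} read pairs")
print(f"ONT for hybrid assembly:   {ont_hybrid_bases / 1e9:.2f} Gb")
```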

Service request

If required, a free project consultation is offered.

Submission of the sample

  • Library prep for Illumina sequencers: 150 ng high-quality genomic DNA required.
  • Library prep for the MinION sequencer (ONT): 1 µg high-molecular-weight (HMW) DNA required.

Fluorometric methods, such as Qubit or PicoGreen, are preferred for DNA quantification. Spectrophotometric methods, such as Nanodrop, may not be accurate enough. Use our Sample Submission Form.

Bioinformatics

Data analysis is offered on request. Depending on the method, the standard bioinformatics service includes quality control of sequencing data, alignment, assembly, hybrid assembly, SNP/variant calling, and automatic annotation.

Whole Exome Sequencing

While the protein-coding region of the genome (i.e., the exome) represents only a small portion of the genome (less than 2 percent in humans), it is the most studied and best annotated. For example, the human exome contains approximately 85 percent of all known disease-related variants. Due to its cost effectiveness and better data manageability, whole exome sequencing (WES) offers an ideal approach when whole-genome sequencing is not practical or needed.

WES enables core users to focus their resources on genes that are most likely to have an impact on the phenotype or disease of interest. By scanning through the entire amino acid coding region of the genome, it leads to identification of relevant variants across a wide range of applications, including genetic diseases, cancer development and population genetics.

GMAK uses a capture-based approach to target exome regions for sequencing. We use biotinylated nucleic acid baits, which are complementary to the target exome, to hybridize to genomic DNA libraries for the capture. For our WES, we only require 150 ng of high-quality human genomic DNA.

Sequencing Mode

Paired-end 100 or 150 bp high- or mid-output runs are recommended for WES. A high-output run generates 1,600 million paired-end reads and a mid-output run 800 million.

Service Request

Project consultation is provided free-of-charge.

Sample Submission

For WES, 150 ng of high-quality human genomic DNA is required. For DNA quantification, fluorometric-based methods, such as Qubit or PicoGreen, are preferred. Spectrophotometric-based methods, such as Nanodrop, may not be accurate.

Bioinformatics

Data analysis is provided upon request. Standard WES bioinformatics service for variant discovery includes sequencing data QC, alignment, and variant calling. Delivered results are variant call (VCF) files.

ATAC-Seq for Open Chromatin Profiling

The eukaryotic genome is highly packaged to fit into the very limited nuclear space. As a result, access to genomic information is tightly regulated based on cellular state. Which regions of the genome are accessible reveals a great deal about the state of the cell. ATAC-seq, or Assay for Transposase-Accessible Chromatin coupled with next-gen sequencing, is a technique to locate accessible chromatin regions.

As the name suggests, ATAC-seq is based on the use of an engineered, hyperactive transposase (called Tn5), which fragments DNA in open regions of the chromatin. In the same process, it simultaneously tags the ends of the fragmented DNA with sequencing adapters. This tagmentation process is a key part of ATAC-seq library construction.

Standard ATAC-seq: GMAK needs 50,000 live cells as input to start library prep. Extra cells are needed for cell viability and density checks before library prep, so please submit at least 60,000 cells to the Core. Since cells need to be processed immediately after delivery to the Core, the user needs to contact the Core at least one week ahead of time to schedule the work.

Single-cell ATAC-seq using 10x Chromium: A single-nuclei suspension prepared from fresh, cryopreserved, or flash-frozen tissue or cell samples is needed for library prep. As with single-cell RNA-seq, the single-nuclei prep is first checked for quality and concentration, and 500-10,000 nuclei can be targeted in each sample. Because the single-nuclei prep needs to be processed by Core personnel immediately upon arrival, the user needs to schedule the work at least two weeks ahead of time.

Service Request

Project consultation is provided free-of-charge.

Sample Submission

Standard ATAC-seq: Live cells, not genomic DNA, are required as input for ATAC-seq. The number of cells is critical to project success, with 50,000 cells as a good starting point. Healthy cells in a homogeneous single-cell suspension work best. Before proceeding to library prep, GMAK staff will check cell viability and density.

Single cell ATAC-seq: Single nuclei suspension prepared from fresh, cryopreserved, and flash frozen tissue or cell samples is needed for library prep. The 10x Genomics scATAC-seq Support page lists Demonstrated Protocols for sample prep. GMAK staff will check single nuclei prep quality and concentration upon sample delivery.

It is essential to coordinate with core staff for sample delivery because samples need to be processed right away. It is advisable to schedule the work with the Core as early as possible, prior to cell preparation.

Sequencing Mode

  • Standard ATAC-seq: Paired-end 50 bp reads are usually enough for mapping ATAC-seq reads to the reference genome.
  • Sequencing depth: at least 50 million reads per sample are recommended.

Bioinformatics

Data analysis is provided upon request. Raw data and analysis results are usually returned to the user via an FTP service.

ChIP Sequencing

ChIP-seq is a genomics technology developed to map binding sites of a DNA-interacting protein across the genome. Examples of DNA-interacting proteins include transcription factors, histones, and enzymes for DNA repair and modification. A common application of ChIP-seq is to locate transcription factor binding patterns under different conditions, such as development stages or pathological conditions.

ChIP-seq starts with covalent cross-linking of DNA with interacting proteins, then shearing of chromatin into fragments, followed by enrichment of the protein of interest with its bound DNA by immunoprecipitation using an antibody specific for the protein. Subsequently, after dissociating the enriched protein-DNA complex, the released DNA fragments are subjected to sequencing. One key experimental factor in the ChIP-Seq process is the quality of the antibody used in the enrichment step, as the use of a poor-quality antibody can lead to high experimental noise due to non-specific precipitation of DNA fragments.

Sequencing Mode

A paired-end 50 bp run is sufficient for most cases. Longer reads may help read alignment, especially in repetitive regions.

Service Request

If needed, project consultation is provided free-of-charge.

Sample Submission

For library prep, 10 ng of ChIP and input DNA is required. For DNA quantification, fluorometric-based methods, such as Qubit or PicoGreen, are preferred. Spectrophotometric-based methods, such as Nanodrop, may not be accurate.

Bioinformatics

Data analysis is provided upon request. Standard ChIP-seq bioinformatics service includes sequencing data QC, alignment and peak calling.

DNA Methylation Sequencing

The methylation of cytosines is a major epigenomic mechanism that modulates the primary genomic code. This leads to the formation of 5-methylcytosines (5mCs) at select sites of the genome. Cytosine methylation regulates gene expression and chromatin remodeling, and as a result plays important roles in many biological functions including embryonic development, cell differentiation, and stem cell pluripotency. Abnormal DNA methylation can lead to diseases, such as cancer.

DNA methylation sequencing is a newer technology that is usually based on bisulfite conversion to differentiate methylated from unmethylated cytosines. Upon treatment with bisulfite, unmethylated cytosines are converted to uracils, while 5mCs are nonreactive and retained. In the sequencing step, unmethylated cytosines are read as thymines, while methylated cytosines are still read as cytosines.
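The read-out logic of bisulfite sequencing can be sketched in a few lines. The example below is a simplified in-silico model that assumes complete conversion, a single strand and error-free reads; the sequence and methylated position are made up for illustration, and real analyses rely on dedicated bisulfite-aware aligners and methylation callers.

```python
def bisulfite_convert(seq, methylated_positions):
    """Return the sequence as read after bisulfite treatment and sequencing.

    Unmethylated C is read as T; methylated C (0-based positions given in
    methylated_positions) is still read as C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


def call_methylation(reference, converted_read):
    """Compare a converted read with the reference at every reference cytosine."""
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference, converted_read)):
        if ref_base == "C":
            if read_base == "C":
                calls[i] = True    # retained C: methylated
            elif read_base == "T":
                calls[i] = False   # C read as T: unmethylated
    return calls


reference = "ACGTCCGA"                    # toy reference sequence
read = bisulfite_convert(reference, {1})  # only the C at position 1 is methylated
print(read)
print(call_methylation(reference, read))
```

In this toy case the cytosine at position 1 is retained, while the cytosines at positions 4 and 5 are read as thymines and are therefore called unmethylated.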

Depending on the genomic coverage required, bisulfite-conversion-based sequencing can be conducted at GMAK as either Whole-Genome Bisulfite Sequencing (WGBS) or Reduced Representation Bisulfite Sequencing (RRBS). WGBS costs more and the associated data analysis is much more involved. RRBS instead provides a cost-effective approach to survey DNA methylation by sampling CpG-rich regions of the genome. To perform RRBS, genomic DNA is digested with a methylation-insensitive restriction enzyme, such as MspI. The digested DNA fragments are then subjected to adapter ligation, bisulfite conversion, and PCR to generate a library for sequencing.
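The digestion step of RRBS can be illustrated with an in-silico digest. MspI recognizes the sequence CCGG and cuts after the first C, so every internal fragment starts and ends at a CpG-containing site. The sketch below considers the top strand of a made-up toy sequence only and ignores any later steps of library preparation.

```python
def mspi_digest(seq):
    """Cut a DNA sequence (top strand) at every MspI site, C^CGG."""
    site = "CCGG"
    cut_offset = 1  # MspI cuts between the first and second C of the site
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])  # remainder after the last cut
    return fragments


toy_sequence = "AACCGGTTTTCCGGAA"
fragments = mspi_digest(toy_sequence)
print(fragments)
```

Because the enzyme is methylation-insensitive, the same fragments are produced regardless of the methylation state of the CpG within each site, which is why the digest does not bias the later methylation read-out.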

Sequencing Mode

A paired-end 150 bp run is suggested for most cases.

Service Request

If needed, project consultation is provided free-of-charge.

Sample Submission

For library prep, 150 ng of high-quality genomic DNA is required. For DNA quantification, fluorometric-based methods, such as Qubit or PicoGreen, are preferred. Spectrophotometric-based methods, such as Nanodrop, may not be accurate.

Bioinformatics

Data analysis is provided upon request. Standard RRBS bioinformatics service includes sequencing data QC, alignment, and DNA methylation localization and quantification.

Nanopore Long-Reads Sequencing

Oxford Nanopore sequencing is a third-generation, single-molecule sequencing platform. It typically produces reads of 10-100 kb in long-read sequencing mode and 100-300 kb in ultra-long-read sequencing mode. The longest read achieved so far is 4 Mb. These long reads are needed for a range of applications (see below) that short-read sequencing struggles with.

With continuous technology development, the raw-read error rate of Nanopore sequencing is now about 1%, corresponding to 99% (Q20) accuracy. Consensus accuracy can reach Q47 at 60x coverage for human DNA by combining multiple raw reads from a genomic region into a single consensus sequence. The most common sequencing errors in nanopore sequencing occur in homopolymeric regions.
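The Q values quoted here follow the standard Phred definition, Q = -10 · log10(p), where p is the error probability. A short sketch of the conversion in both directions; the function names are illustrative.

```python
import math


def phred_from_error(error_probability):
    """Phred quality score for a given error probability."""
    return -10 * math.log10(error_probability)


def error_from_phred(q):
    """Error probability for a given Phred quality score."""
    return 10 ** (-q / 10)


q_raw = phred_from_error(0.01)       # 1% raw-read error rate
p_consensus = error_from_phred(47)   # consensus accuracy of Q47

print(f"1% error rate corresponds to Q{q_raw:.0f}")
print(f"Q47 corresponds to an error probability of {p_consensus:.2e}")
```

A 1% error rate thus corresponds to Q20, and Q47 to roughly two errors per 100,000 bases.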

Oxford Nanopore currently offers three main devices at different data throughput levels, i.e., MinION, GridION, and PromethION. MinION and GridION use the same flow cell type, which typically produces 10-20 Gb of data (30 Gb maximum). The flow cell used on the PromethION has a throughput of 50-100 Gb (170 Gb maximum).

Below are some of the major applications for Oxford Nanopore sequencing:

  • Structural variation detection
  • Single nucleotide variant phasing
  • Full-length transcript sequencing and splicing isoform detection
  • Detection of fusion transcripts
  • De novo genome assembly
  • Direct detection of epigenetic base modifications on DNA and RNA

Read Length and Sequencing Output

Long-reads sequencing mode: 10-100 kb in length. Good for:

  • Full-length transcript sequencing and splicing isoform detection
  • Detection of fusion transcripts
  • Direct detection of base modifications on DNA and RNA

Ultra-long reads sequencing mode: 100-300 kb in length. Good for:

  • De novo genome assembly
  • Haplotype phasing
  • Structural variant detection

Total data output: 10-20 Gb per MinION flow cell; up to 200 Gb per PromethION flow cell.

Service Request

Project consultation is provided free-of-charge.

Sample Submission

GMAK accepts extracted total RNA for RNA-seq or enriched poly(A) mRNA for direct RNA modification sequencing. RNA quality is the single most important factor determining the final outcome. After sample drop-off and prior to library construction, core staff perform sample QC, which includes Qubit concentration measurement and Bioanalyzer-based RNA Integrity Number (RIN) determination. A RIN of at least 8 is required to proceed with mRNA-seq library construction. Submitted RNA samples must also be DNA-free, and we recommend always including a DNase treatment step during RNA extraction. Genomic DNA contamination is visible on Bioanalyzer traces in the range of 4-10 kb.

For DNA samples used for genome assembly, DNA purity and length (in Mb) are critical to obtaining high-quality data. Please consult with the GMAK core about your project.

Bioinformatics

Data analysis is provided upon request.