Terminology

General terms

Next-generation sequencing

Also called second-generation sequencing or deep sequencing. This primarily refers to the Roche 454, Illumina Genome Analyzer, and ABI SOLiD.

Next-next-generation sequencing

Also called third-generation sequencing. The term most often covers sequencers that read a single DNA molecule without prior amplification (also called single-molecule sequencing). This includes the current/up-coming machines by Helicos, Pacific Biosciences, Oxford Nanopore…

Reads

The nucleotide sequences that come off a next-generation sequencer. Read length varies between platforms and runs.

Tag

  • ID-tag or barcode. Sequence added to a sample to allow IDentification after sequencing, usually from a pool of mixed samples with different ID-tags.
  • Reads aligned to a reference genome.
  • Captured RNA fragments, such as from DeepCAGE and DeepSAGE.

Barcoding or Indexing

Adding known nucleotide sequences (barcodes or indexes) to each sample so that individual samples can be separated out from a pool of samples after sequencing.
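
Separating a pool by barcode can be sketched as below. This is a minimal illustration, assuming the barcode sits at the very start of each read and must match exactly; the function name `demultiplex`, the sample names, and the `unassigned` bin are hypothetical and not taken from any particular pipeline.

```python
from collections import defaultdict

UNASSIGNED = "unassigned"  # hypothetical bin name for reads with no known barcode


def demultiplex(reads, barcode_to_sample, barcode_length):
    """Group reads by the sample their leading barcode belongs to.

    Assumes the barcode is the first `barcode_length` bases of the read
    and that only exact matches count (no mismatch tolerance).
    """
    bins = defaultdict(list)
    for read in reads:
        barcode = read[:barcode_length]
        sample = barcode_to_sample.get(barcode)
        if sample is None:
            # Unknown barcode: keep the full read so nothing is lost.
            bins[UNASSIGNED].append(read)
        else:
            # Known barcode: store the read with the barcode trimmed off.
            bins[sample].append(read[barcode_length:])
    return dict(bins)


pooled_reads = ["ACGTTTTT", "GGCAAAAA", "TTTTCCCC"]
barcodes = {"ACG": "sample_1", "GGC": "sample_2"}
per_sample = demultiplex(pooled_reads, barcodes, barcode_length=3)
```

Real demultiplexers usually tolerate one or more mismatches in the barcode; exact matching is used here only to keep the sketch short.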

Paired-end

Sequencing both ends of one DNA fragment. Note: for Illumina and SOLiD this is typically done on fragments of approximately 300-500 bp.

Mate-pairs

Sequencing both ends of larger DNA fragments (3-10 kbp range) from which the middle part was removed during sample preparation (usually involving a circularization step). Note: for Illumina this is typical for “longer” inserts.

The 2 DNA strands

  • Forward/Reverse: The forward and reverse sequence reads of a DNA fragment. Note that with hybridization capture you usually isolate only one strand (complementary to the probe used), which is subsequently sequenced in two directions. In a clinical setting this is different from sequencing a region in both directions (starting with a mix of both strands).
  • +/-: Usually used in genome browsers, “top and bottom” strands.
  • Sense/antisense: The direction of a sequence read relative to the transcriptional orientation of a gene. The sense strand has the same sequence as the transcript; the opposite (antisense) strand is its complement and serves as the template during transcription.
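
Converting a sequence between the two strands means taking its reverse complement. A minimal sketch (the function name `reverse_complement` is just an illustrative choice; only A, C, G, T and N are handled):

```python
# Translation table mapping each base to its complement; N stays N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(sequence):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return sequence.translate(COMPLEMENT)[::-1]


forward = "ATGC"
reverse = reverse_complement(forward)  # "GCAT"
```

Applying the function twice returns the original sequence, which is a quick sanity check when converting between + and - strand coordinates.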

Colorspace

A system used by ABI SOLiD in which each value (color) represents a dinucleotide (a pair of adjacent nucleotides) rather than a single nucleotide.
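
The dinucleotide encoding can be illustrated with a short sketch. It assumes the commonly described two-base scheme in which each pair of adjacent bases maps to one of four colors (0-3), identical neighbours give color 0, and a known first base is needed to decode. The function names are hypothetical.

```python
# Two-bit codes chosen so that color = code(base1) XOR code(base2),
# which reproduces the two-base scheme assumed above (e.g. AA=0, AC=1, AG=2, AT=3).
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = {code: base for base, code in BASE_TO_CODE.items()}


def to_colorspace(sequence):
    """Encode a nucleotide sequence as one color per adjacent base pair."""
    codes = [BASE_TO_CODE[base] for base in sequence]
    return "".join(str(a ^ b) for a, b in zip(codes, codes[1:]))


def from_colorspace(first_base, colors):
    """Decode colors back to nucleotides, starting from a known first base."""
    code = BASE_TO_CODE[first_base]
    bases = [first_base]
    for color in colors:
        code ^= int(color)  # each color tells how to step to the next base
        bases.append(CODE_TO_BASE[code])
    return "".join(bases)


colors = to_colorspace("ATGGC")          # "3103"
decoded = from_colorspace("A", colors)   # "ATGGC"
```

Because every base is decoded from the previous one, a single wrong color changes all bases after it when decoding naively, which is why colorspace reads are normally aligned in colorspace.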

Dark nucleotide

A nucleotide that in single-molecule sequencing is incorporated but not detected, due to incorporation of an unlabeled nucleotide, failure to fluoresce and/or failure to be detected by imaging.

BioInformatics terms

Alignment

Finding where on a reference genome reads fit.

  • Unique(ness): whether a read aligns to one or to multiple locations.
  • Mismatches: when a read aligns to the genome, but one or more nucleotides do not match the reference.
  • Probabilistic model: when a read aligns equally well to more than one location, the alignment output is distributed equally (randomly?) among these locations.
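
The three terms above can be illustrated with a deliberately naive sketch: slide the read along the reference, count mismatches at every position, and report whether the best hit is unique. It assumes forward-strand matching only, no gaps, and a hypothetical `max_mismatches` threshold; real aligners use indexes and are far more sophisticated.

```python
import random


def find_alignments(read, reference, max_mismatches=1):
    """Return (position, mismatch_count) for every acceptable placement."""
    hits = []
    for position in range(len(reference) - len(read) + 1):
        window = reference[position:position + len(read)]
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        if mismatches <= max_mismatches:
            hits.append((position, mismatches))
    return hits


def best_positions(read, reference, max_mismatches=1):
    """Positions sharing the lowest mismatch count (empty if unaligned)."""
    hits = find_alignments(read, reference, max_mismatches)
    if not hits:
        return []
    fewest = min(mismatches for _, mismatches in hits)
    return [position for position, mismatches in hits if mismatches == fewest]


def classify(read, reference, max_mismatches=1):
    """Label a read as 'unique', 'multiple' or 'unaligned'."""
    positions = best_positions(read, reference, max_mismatches)
    if not positions:
        return "unaligned"
    return "unique" if len(positions) == 1 else "multiple"


def assign_position(read, reference, rng, max_mismatches=1):
    """Pick one of the equally good positions at random, or None."""
    positions = best_positions(read, reference, max_mismatches)
    return rng.choice(positions) if positions else None


reference = "ACGTACGTTTGCA"
labels = [classify(r, reference, max_mismatches=0) for r in ("TTGC", "ACGT", "GGGG")]
chosen = assign_position("ACGT", reference, random.Random(0), max_mismatches=0)
```

Here `TTGC` fits at one position, `ACGT` fits equally well at two, and `GGGG` fits nowhere under the threshold. Random assignment is only one possible reading of the probabilistic model described above.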

Assembly

Building contigs out of reads.
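
As a toy illustration of the idea, the sketch below repeatedly merges the two sequences with the longest suffix-prefix overlap. This greedy approach and the `min_length` threshold are assumptions made for the example; it ignores sequencing errors, reverse-complement reads and repeats, and is not how production assemblers work.

```python
def overlap(left, right, min_length=3):
    """Length of the longest suffix of `left` that is a prefix of `right`."""
    for length in range(min(len(left), len(right)), min_length - 1, -1):
        if left.endswith(right[:length]):
            return length
    return 0


def greedy_assemble(reads, min_length=3):
    """Merge reads into contigs, always joining the best-overlapping pair."""
    contigs = list(reads)
    while True:
        best_length, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i != j:
                    length = overlap(left, right, min_length)
                    if length > best_length:
                        best_length, best_i, best_j = length, i, j
        if best_length == 0:
            return contigs  # no remaining pair overlaps enough
        merged = contigs[best_i] + contigs[best_j][best_length:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


contigs = greedy_assemble(["ACGTAC", "TACGGA", "GGATTC"])  # ["ACGTACGGATTC"]
```

Reads that do not overlap any other read by at least `min_length` bases are returned unchanged as single-read contigs.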

Dustbin

The collection of all reads not aligning to the reference sequence under the thresholds set.

Sequencability

Given the reference sequence used and the read lengths of the fragments obtained, the theoretical chance of covering the reference sequence. Note: regions with reduced “sequencability” usually contain sequences that are present multiple times in the genome.
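
One simple way to approximate this, offered as an assumption rather than an established definition, is to count how many read-length windows of the reference occur exactly once: a read starting in such a window can be placed unambiguously. The sketch considers the forward strand only and requires exact matches.

```python
from collections import Counter


def unique_window_fraction(reference, read_length):
    """Fraction of read-length windows that occur exactly once in the reference."""
    if read_length <= 0 or read_length > len(reference):
        raise ValueError("read_length must be between 1 and the reference length")
    windows = [
        reference[start:start + read_length]
        for start in range(len(reference) - read_length + 1)
    ]
    counts = Counter(windows)
    unique = sum(1 for window in windows if counts[window] == 1)
    return unique / len(windows)


# "ACGT" occurs twice in this toy reference, so 2 of the 7 windows are ambiguous.
fraction = unique_window_fraction("ACGTACGTTT", read_length=4)
```

Longer reads generally raise this fraction, because a longer window is less likely to be repeated elsewhere in the genome.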

FASTQ format

@HWIEAS422:7:1:9543:965#GTCTGG/1
NAGACCTGCGGCTCCTCATCCACGGGCTGGTCGTATGCCGGCTTCTGCTT
+
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

Read more on FASTQ format
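
A record like the one above consists of four lines: a name line starting with `@`, the sequence, a separator line starting with `+`, and one quality character per base. The sketch below parses one record and splits the name into fields. It assumes the older Illumina naming layout (instrument:lane:tile:x:y#index/read), and the quality offset is left as a parameter because it differs between pipeline versions (33 and 64 are both in use); all function names are hypothetical.

```python
def parse_fastq_record(lines):
    """Turn the four lines of one FASTQ record into a dictionary."""
    name_line, sequence, separator, quality = [line.rstrip("\n") for line in lines]
    if not name_line.startswith("@") or not separator.startswith("+"):
        raise ValueError("not a FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    return {"name": name_line[1:], "sequence": sequence, "quality": quality}


def parse_illumina_name(name):
    """Split a name of the assumed form instrument:lane:tile:x:y#index/read."""
    location, _, rest = name.partition("#")
    instrument, lane, tile, x, y = location.split(":")
    index, _, read_number = rest.partition("/")
    return {
        "instrument": instrument, "lane": int(lane), "tile": int(tile),
        "x": int(x), "y": int(y), "index": index, "read": read_number,
    }


def phred_scores(quality, offset):
    """Convert quality characters to integer scores for a given ASCII offset."""
    return [ord(character) - offset for character in quality]


sequence = "NAGACCTGCGGCTCCTCATCCACGGGCTGGTCGTATGCCGGCTTCTGCTT"
record = parse_fastq_record([
    "@HWIEAS422:7:1:9543:965#GTCTGG/1",
    sequence,
    "+",
    "B" * len(sequence),  # one quality character per base, as in the record above
])
fields = parse_illumina_name(record["name"])
scores = phred_scores(record["quality"], offset=64)  # offset 64 is an assumption
```

With an offset of 64 every `B` decodes to a score of 2; with an offset of 33 the same character would decode to 33, so the offset must be known before qualities are interpreted.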