FAQ: bioinformatics

What hardware do I need to store and/or analyze my data?

NGS data requires a lot of disk space. The raw data you receive can easily reach 50 gigabytes per lane of HiSeq data. This does not include analysis files, which, depending on your experiment, can easily double the disk space needed.
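
As a rough illustration of this arithmetic, the Python sketch below estimates the disk space for a project. The function name and its defaults are assumptions based on the figures quoted above (up to 50 GB per lane, analysis files roughly doubling that); they are not fixed rules.

```python
def estimate_storage_gb(lanes, raw_gb_per_lane=50, analysis_factor=2.0):
    """Rough upper estimate of disk space (GB) for raw plus analysis files.

    raw_gb_per_lane: assumed upper bound per HiSeq lane (figure quoted above).
    analysis_factor: total space relative to the raw data; 2.0 means the
    analysis files take as much space again as the raw data.
    """
    return lanes * raw_gb_per_lane * analysis_factor


# Example: 4 lanes of raw data, doubled by analysis files
print(estimate_storage_gb(4))  # 400.0
```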

For a small experiment, pocket-sized 1 terabyte USB disks are currently available for under 100 euros at most multimedia stores.

If you plan to purchase analysis hardware, it is probably better to make an appointment with us, because different aspects of a project call for different solutions.

Can I use hardware at the LGTC?

The LGTC has a computer cluster that can be used for analysis, with over 100 cores and nearly 1 TB of RAM. For details, please contact the LGTC.

What is the difference between reads and tags?

In most cases there is none. Both terms are used for sequencing products. Some people use "reads" for unaligned sequences and "tags" for aligned reads.

What about indexing and bar-coding?

These terms are likewise used interchangeably. Both refer to an identifying piece of artificial sequence that can be used to separate the different samples in a mixed dataset.
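
To make the idea concrete, here is a minimal Python sketch of separating samples by barcode. It assumes the barcode is simply the first few bases of each read and must match exactly; the barcode table, sample names, and function name are hypothetical, and real protocols (for example a separate index read, or tolerance for mismatches) differ.

```python
from collections import defaultdict

# Hypothetical barcode-to-sample table; real barcodes depend on the kit used.
BARCODES = {"ACGT": "sample_A", "TGCA": "sample_B"}


def demultiplex(reads, barcodes, barcode_length=4):
    """Group reads by the barcode found at the start of each sequence.

    Assumes the barcode is the first `barcode_length` bases of the read and
    requires an exact match; reads with an unknown barcode go to 'unassigned'.
    The barcode itself is trimmed from the stored sequence.
    """
    per_sample = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_length], read[barcode_length:]
        sample = barcodes.get(barcode, "unassigned")
        per_sample[sample].append(insert)
    return dict(per_sample)


# Made-up reads for illustration
reads = ["ACGTTTTGGG", "TGCAAACCCT", "ACGTGGGAAA", "NNNNACGTAC"]
for sample, sequences in demultiplex(reads, BARCODES).items():
    print(sample, sequences)
```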

What is the difference between genome builds and which is best?

Which build is best depends on your project. Generally, the latest build is considered the most complete, so at the LGTC we advise using the latest available build. More information can be found at the NCBI.

What are those strange chromosomes in the human genome with ‘un’ or ‘hap’ in their names?

The first are contigs whose location in the genome is not known, so they are included separately. The second are alternative haplotypes of a region; these are included as alternative alleles.
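
If you want to separate these sequences from the regular chromosomes before analysis, a small Python sketch like the one below may help. It assumes UCSC-style names such as chrUn_gl000211 and chr6_cox_hap2; the function name and category labels are made up for illustration, and other references may use different naming.

```python
def classify_sequence_name(name):
    """Classify a reference sequence name by its 'un' or 'hap' marker.

    Assumes UCSC-style naming: 'chrUn_...' for contigs of unknown location
    and '..._hapN' for alternative haplotypes. Everything else is 'other'.
    """
    lower = name.lower()
    if lower.startswith("chrun_"):
        return "unplaced"
    if "_hap" in lower:
        return "haplotype"
    return "other"


for name in ["chr1", "chrUn_gl000211", "chr6_cox_hap2"]:
    print(name, classify_sequence_name(name))
```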

How should I interpret coverage or depth?

Unfortunately, these terms have no clear definitions. Most likely, when 20X, 20 times coverage, or a depth of 20 is reported, you can expect an average of 20 reads covering each position in the reference.

Sometimes a coverage percentage and a depth are reported together. This most likely indicates what percentage of the bases in the complete genome is covered by at least the number of reads given as the depth.
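
The two ways of reporting can be sketched in Python as follows. This is a toy example under the interpretation given above; the function name and the per-position depth list are assumptions, and in practice the depths would come from an alignment tool.

```python
def coverage_summary(depths, min_depth=1):
    """Return (mean depth, percentage of positions with depth >= min_depth).

    depths: one integer per reference position, i.e. the number of reads
    covering that position (zeros included for uncovered positions).
    """
    if not depths:
        raise ValueError("depths must not be empty")
    mean_depth = sum(depths) / len(depths)
    covered = sum(1 for d in depths if d >= min_depth)
    return mean_depth, 100.0 * covered / len(depths)


# Toy reference of 10 positions
depths = [0, 5, 10, 20, 20, 25, 30, 30, 40, 20]
mean_depth, pct = coverage_summary(depths, min_depth=20)
print(f"mean depth {mean_depth:.1f}X, {pct:.0f}% of positions at >= 20X")
# mean depth 20.0X, 70% of positions at >= 20X
```

Note that an average depth of 20X does not mean every position is covered 20 times: in this toy example only 70% of the positions reach that depth.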

What is SAM/BAM?

The Sequence Alignment/Map (SAM) format is a format to store alignment information. There is also a binary format (BAM) available which requires less disk space and allows faster access. See the project homepage for more information.
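
As an illustration of what a SAM file contains, the Python sketch below parses the mandatory tab-separated fields of a single alignment line. The class and function names are made up for this example and the record is invented; for real work an established library or tool is the better choice.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    qname: str   # read name
    flag: int    # bitwise flags
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # alignment description
    seq: str     # read sequence
    qual: str    # base qualities


def parse_sam_line(line):
    """Parse one SAM alignment line (not a header line starting with '@').

    Fields 7 to 9 (mate reference, mate position, template length) are
    skipped to keep the sketch short.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line has at least 11 fields")
    return SamRecord(
        qname=fields[0], flag=int(fields[1]), rname=fields[2],
        pos=int(fields[3]), mapq=int(fields[4]), cigar=fields[5],
        seq=fields[9], qual=fields[10],
    )


def is_aligned(record):
    """True when the 'segment unmapped' flag bit (0x4) is not set."""
    return not record.flag & 0x4


# Made-up example record for illustration
line = "read1\t0\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII"
record = parse_sam_line(line)
print(record.rname, record.pos, is_aligned(record))
```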

Are there any other resources worth reading? / I can’t find my answer here!

A large community of researchers specialized in NGS has a communication platform called SEQanswers.

Why can’t I open my file on my Windows computer?

(Raw) NGS files are big. If you open such a file in Word or a spreadsheet, Windows will most likely freeze or crash. If you want to analyze the data yourself, we recommend installing Linux. Besides, most free software for NGS data analysis is developed specifically for Linux.