If this were any other field in computer science, people would be really critical of your methodology.
The size of the human genome is 21 MB.
If you are trying to find the coordinates of every cancer cell in a human body, then sure, you need a lot of RAM.
But the output of the collective field of cancer research doesn't seem to be there yet, so why do you need so much RAM?
Usually, when your problem becomes NP-hard, you switch to simpler models. Have you checked the search space of all the simpler models? Or are you sticking to complex models because they help you publish papers?
You also need to understand that hardware only gets you so far; running a cluster has its own costs, such as network latency.
More often than not, better techniques are what is needed, rather than complaining that the tremendous improvement in computational power is still not good enough.
> In the real world, right off the genome sequencer: ~200 gigabytes
> As a variant file, with just the list of mutations: ~125 megabytes
> What this means is that we’d all better brace ourselves for a major flood of genomic data. The 1000 genomes project data, for example, is now available in the AWS cloud and consists of >200 terabytes for the 1700 participants. As the cost of whole genome sequencing continues to drop, bigger and bigger sequencing studies are being rolled out. Just think about the storage requirements of this 10K Autism Genome project, or the UK’s 100k Genome project….. or even.. gasp.. this Million Human Genomes project. The computational demands are staggering, and the big question is: Can data analysis keep up, and what will we learn from this flood of A’s, T’s, G’s and C’s….?
Also, the world of genomics has done fantastic work on compression, and if you can compress it further you will probably win a decent award, with a ceremony and free booze.
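For a rough sense of where those numbers come from, here is a back-of-envelope sketch in Python. The genome length, coverage, per-base overhead, variant count, and bytes-per-variant are all round-number assumptions on my part, not figures from the article:

```python
# Back-of-envelope genomic data sizes (all inputs are assumed round numbers).

GENOME_BP = 3.2e9          # haploid human genome, base pairs (approx.)
COVERAGE = 30              # assumed whole-genome sequencing depth
FASTQ_BYTES_PER_BASE = 2   # base call + quality score + read IDs (rough)

# Raw reads off the sequencer, stored as FASTQ:
raw_fastq = GENOME_BP * COVERAGE * FASTQ_BYTES_PER_BASE
print(f"raw reads: ~{raw_fastq / 1e9:.0f} GB")          # ~190 GB

# An assembled genome packed at 2 bits per base (A/C/G/T only):
packed = GENOME_BP * 2 / 8
print(f"2-bit packed genome: ~{packed / 1e6:.0f} MB")   # ~800 MB

# Stored as a diff against a reference: assume ~4.5 million variants
# per person at a few dozen bytes each in a VCF-like text format.
variants = 4.5e6
bytes_per_variant = 30
print(f"variant file: ~{variants * bytes_per_variant / 1e6:.0f} MB")  # ~135 MB
```

The point of the sketch is just that the ~200 GB and ~125 MB figures quoted above are both plausible: the raw read data carries heavy redundancy and per-base quality scores, while the variant file only records differences against a shared reference.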
Scientific computing requires a lot of memory and a lot of computer time. I think it's fair to say that the underlying libraries (LAPACK, ScaLAPACK, Intel's MKL) are the most intensively optimised code in the world. Most of the non-trivial algorithms are polynomial in both time and memory.
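As a concrete illustration of that polynomial scaling, here is a small NumPy sketch; `numpy.linalg.solve` dispatches to LAPACK's `gesv` routine (backed by MKL or OpenBLAS depending on the build), and the matrix sizes below are arbitrary:

```python
# Dense linear algebra scaling: an n x n double-precision matrix needs
# 8*n^2 bytes of memory, and the LU factorisation behind solve() is
# O(n^3) time. Doubling n roughly quadruples memory and octuples time.
import time
import numpy as np

for n in (1000, 2000, 4000):
    A = np.random.rand(n, n)
    b = np.random.rand(n)
    t0 = time.perf_counter()
    np.linalg.solve(A, b)        # LU factorisation + solve via LAPACK gesv
    elapsed = time.perf_counter() - t0
    print(f"n={n}: matrix ~{8 * n * n / 1e6:.0f} MB, solve took {elapsed:.2f} s")
```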
I suspect this Press Release is hinting at a next-generation (cheap, fast) DNA sequencing method. These are derived from Shotgun Sequencing methods, where hundreds of gigabytes of random base pair sequences are reassembled into a coherent genome. The next-generation methods realise cost savings through an even lossier process of reading smaller fragments of the genome, with much greater computational demands for reassembly.
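To give a feel for why that reassembly is computationally demanding, here is a toy greedy overlap assembler in Python. It is only an illustrative sketch with made-up reads, not how production assemblers work (they use overlap graphs or de Bruijn graphs precisely because this brute-force, all-pairs approach does not scale to billions of reads):

```python
# Toy greedy assembly: repeatedly merge the pair of reads with the
# longest suffix/prefix overlap. Every merge step re-examines all pairs
# of reads, which is what makes the naive approach so expensive.

def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is a prefix of `b`."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)   # candidate overlap start
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1

def greedy_assemble(reads: list[str]) -> str:
    reads = list(reads)
    while len(reads) > 1:
        best = (0, 0, 1)                     # (overlap length, i, j)
        for i, a in enumerate(reads):        # all-pairs comparison
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b)
                    if olen > best[0]:
                        best = (olen, i, j)
        olen, i, j = best
        if olen == 0:                        # no overlaps left
            return "".join(reads)
        merged = reads[i] + reads[j][olen:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads[0]

print(greedy_assemble(["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]))
# -> ATTAGACCTGCCGGAATAC
```

With shorter reads you need many more of them for the same coverage, and the number of candidate overlaps grows accordingly, which is where the extra computational cost of the cheaper sequencing methods comes from.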