If this were any other field in computer science, people would be really critical of your methodology.
The size of the human genome is 21 MB.
If you are trying to find the coordinates of every cancer cell in a human body, then sure, you need a lot of RAM.
But the output of the collective field of cancer research doesn't seem to be there yet, so why do you need so much RAM?
Usually, when your problem becomes NP-hard, you switch to simpler models. Have you checked the search space of all the simpler models? Or are you sticking to complex models because they help you publish papers?
You also need to understand that hardware only gets you so far; running a cluster has its own costs, such as network latency.
More often than not, better techniques are what is needed, rather than complaining that the tremendous improvement in computational power is still not good enough.
> In the real world, right off the genome sequencer: ~200 gigabytes
> As a variant file, with just the list of mutations: ~125 megabytes
> What this means is that we’d all better brace ourselves for a major flood of genomic data. The 1000 genomes project data, for example, is now available in the AWS cloud and consists of >200 terabytes for the 1700 participants. As the cost of whole genome sequencing continues to drop, bigger and bigger sequencing studies are being rolled out. Just think about the storage requirements of this 10K Autism Genome project, or the UK’s 100k Genome project….. or even.. gasp.. this Million Human Genomes project. The computational demands are staggering, and the big question is: Can data analysis keep up, and what will we learn from this flood of A’s, T’s, G’s and C’s….?
Also, the world of genomics has done fantastic work on compression, and if you can compress it further you will probably win a decent award, with a ceremony and free booze.
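For a rough sense of where those numbers come from, here is a back-of-envelope sketch in Python. The genome length, coverage, per-base overhead, variant count, and bytes-per-variant are all round-number assumptions on my part, not figures from the article:

```python
# Back-of-envelope genomic data sizes (all inputs are assumed round numbers).

GENOME_BP = 3.2e9          # haploid human genome, base pairs (approx.)
COVERAGE = 30              # assumed whole-genome sequencing depth
FASTQ_BYTES_PER_BASE = 2   # base call + quality score + read IDs (rough)

# Raw reads off the sequencer, stored as FASTQ:
raw_fastq = GENOME_BP * COVERAGE * FASTQ_BYTES_PER_BASE
print(f"raw reads: ~{raw_fastq / 1e9:.0f} GB")          # ~190 GB

# An assembled genome packed at 2 bits per base (A/C/G/T only):
packed = GENOME_BP * 2 / 8
print(f"2-bit packed genome: ~{packed / 1e6:.0f} MB")   # ~800 MB

# Stored as a diff against a reference: assume ~4.5 million variants
# per person at a few dozen bytes each in a VCF-like text format.
variants = 4.5e6
bytes_per_variant = 30
print(f"variant file: ~{variants * bytes_per_variant / 1e6:.0f} MB")  # ~135 MB
```

The point of the sketch is just that the ~200 GB and ~125 MB figures quoted above are both plausible: the raw read data carries heavy redundancy and per-base quality scores, while the variant file only records differences against a shared reference.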
Scientific computing requires a lot of memory and a lot of computer time. I think it's fair to say that the underlying libraries (LAPACK, ScaLAPACK, Intel's MKL) are the most intensively optimised code in the world. Most of the non-trivial algorithms are polynomial in both time and memory.
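As a concrete illustration of that polynomial scaling, here is a small NumPy sketch; `numpy.linalg.solve` dispatches to LAPACK's `gesv` routine (backed by MKL or OpenBLAS depending on the build), and the matrix sizes below are arbitrary:

```python
# Dense linear algebra scaling: an n x n double-precision matrix needs
# 8*n^2 bytes of memory, and the LU factorisation behind solve() is
# O(n^3) time. Doubling n roughly quadruples memory and octuples time.
import time
import numpy as np

for n in (1000, 2000, 4000):
    A = np.random.rand(n, n)
    b = np.random.rand(n)
    t0 = time.perf_counter()
    np.linalg.solve(A, b)        # LU factorisation + solve via LAPACK gesv
    elapsed = time.perf_counter() - t0
    print(f"n={n}: matrix ~{8 * n * n / 1e6:.0f} MB, solve took {elapsed:.2f} s")
```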
I suspect this Press Release is hinting at a next-generation (cheap, fast) DNA sequencing method. These are derived from Shotgun Sequencing methods, where hundreds of gigabytes of random base pair sequences are reassembled into a coherent genome. The next-generation methods realise cost savings through an even lossier process of reading smaller fragments of the genome, with much greater computational demands for reassembly.
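To give a feel for why that reassembly is computationally demanding, here is a toy greedy overlap assembler in Python. It is only an illustrative sketch with made-up reads, not how production assemblers work (they use overlap graphs or de Bruijn graphs precisely because this brute-force, all-pairs approach does not scale to billions of reads):

```python
# Toy greedy assembly: repeatedly merge the pair of reads with the
# longest suffix/prefix overlap. Every merge step re-examines all pairs
# of reads, which is what makes the naive approach so expensive.

def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is a prefix of `b`."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)   # candidate overlap start
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1

def greedy_assemble(reads: list[str]) -> str:
    reads = list(reads)
    while len(reads) > 1:
        best = (0, 0, 1)                     # (overlap length, i, j)
        for i, a in enumerate(reads):        # all-pairs comparison
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b)
                    if olen > best[0]:
                        best = (olen, i, j)
        olen, i, j = best
        if olen == 0:                        # no overlaps left
            return "".join(reads)
        merged = reads[i] + reads[j][olen:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads[0]

print(greedy_assemble(["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]))
# -> ATTAGACCTGCCGGAATAC
```

With shorter reads you need many more of them for the same coverage, and the number of candidate overlaps grows accordingly, which is where the extra computational cost of the cheaper sequencing methods comes from.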