ah, yes. :) very nice detailed summary of some of the issues in this sector of "academia" (I put that in quotes only because all the research seems to be co-written by corps).
I am into audio DSP & am planning to port a couple of audio algorithms (lots of FFT & linear algebra) to run on GPU, but haven't even gotten to it because I've considered it a premature optimization up to this point. I'm sure it would improve performance, but nowhere near what GPU advocates would claim.
My biggest reason?
"PCIe transfer time to/from GPU", plus it would be unoptimized GPU code. Once you read a few of these papers it becomes painfully obvious that a lot of tuning goes into the GPU algorithms that offer anything more than a low single-digit factor of speedup. That's still very significant (cutting a 3-hour algorithm down to 1 would be huge), but if you're in an early stage of research it may be a toss-up whether it's better to just tune the algorithm itself / run computations overnight rather than going through the trouble of writing a GPU-based POC. Maybe if you have 1 or 2 under your belt it's not such a big deal, but for most of the researchers I know GPU algorithm rewrites would not be trivial. (I've been doing enterprise Java coding for about 2 years now so the idea isn't so intimidating anymore, but in a past life of mucking around with Matlab scripts I'm sure it would have been daunting.)
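To make the PCIe point concrete, here's a rough back-of-envelope sketch. All the numbers are assumptions I picked for illustration (roughly PCIe 3.0 x16 effective bandwidth, a sustained GPU FFT throughput, and the standard ~5·N·log2(N) FLOP estimate for a radix-2 FFT), not measurements:

```python
import math

def offload_times(n_samples, pcie_bytes_per_s=12e9, gpu_flops=200e9):
    """Estimate transfer vs. compute time for offloading one complex64 FFT.

    pcie_bytes_per_s and gpu_flops are assumed ballpark figures, not
    benchmarks of any particular card.
    """
    bytes_one_way = n_samples * 8               # complex64 = 8 bytes/sample
    transfer_s = 2 * bytes_one_way / pcie_bytes_per_s  # host->device and back
    flops = 5 * n_samples * math.log2(n_samples)       # radix-2 FFT estimate
    compute_s = flops / gpu_flops
    return transfer_s, compute_s

for n in (2**16, 2**20, 2**24):
    t, c = offload_times(n)
    print(f"N=2^{n.bit_length()-1}: transfer ~{t*1e6:.0f} us, compute ~{c*1e6:.0f} us")
```

Under these assumptions the round-trip transfer costs more than the FFT itself at every size, which is why a naive "ship one transform to the GPU and back" port can easily come out slower. The game is keeping data resident on the device across many operations, and that's exactly the restructuring work that makes these ports non-trivial.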