Amazon offers several instance types targeted at high CPU usage. They typically come with recent CPUs, sometimes GPUs as well, and are decent platforms for CPU-intensive work.
I recall some benchmarks showing Google's cloud offering (Compute Engine) had better CPU performance than EC2. There are other providers (e.g. Digital Ocean) that might be worth a look -- I expect their hardware is on average much newer than Amazon's.
It really depends on how much your time is worth vs the cost of the jobs you want to run. EC2 is the IBM (or Microsoft, depending on your age) of cloud offerings -- you won't be fired for choosing it, but you might be able to shave some time and $s off by going elsewhere.
Dedicated hardware? Come on. That might have been an option in the 90s. I want to be able to use as many CPUs as I want when I need them, not have a bunch of machines rotting away idle.
I would actually recommend a hybrid approach. If you don't need too much memory, there are CPU-heavy Amazon instances you can scale up and down, with a VPN to a dedicated co-lo machine that handles the base load. If you need both memory and CPU, it's going to be costly to use EC2 unless you pay upfront for dedicated instances, and even then the economics are probably not as favorable as getting some co-lo servers and letting them rot away. In short, you will need to do a cost-benefit analysis to see what makes sense.
For our service we are going to use a co-lo server for our processing with an Elastic Beanstalk frontend and use a VPC + OpenVPN setup to bridge the two. We will incur some bandwidth charges because of this, but the cross talk between the boxes is actually minimal, since the client will post directly to the co-lo box(es) when it needs to upload, etc.
I worked on the LHC's CMS data team about 2-3 years ago. We had thousands of machines crunching data. When we compared the cost to EC2 pricing, we laughed at the deal we were getting with our disposable hardware. Even buying EC2 capacity in bulk, the dedicated hardware was cheaper.
Amazon is a huge premium over physical hardware. You use them when a) you want to scale immediately, b) you have financial reasons for not buying your own gear for long-term use (1-3 years), and c) you don't mind paying the premium.
So you ask a question saying you do "a lot of number crunching", then complain that dedicated hardware will do what you need if you actually process a lot of data?
If you don't have fixed demand, you may want to consider spot instances. The upside is that they can be much cheaper, but the downside is that they can disappear at any second. If you design your app around high availability, it may work for you.
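A minimal sketch of what "design around instances disappearing" can mean in practice: EC2 publishes a spot termination notice on the instance metadata endpoint roughly two minutes before reclaiming the box, so a worker can poll for it and checkpoint in time. The polling helper names here are my own; only the metadata URL is EC2's.

```python
# Sketch of a spot-interruption check, assuming the worker knows how to
# checkpoint its own state. The metadata URL is EC2's spot termination
# notice; outside EC2 the request simply fails and we report "no notice".
import urllib.request
import urllib.error

TERMINATION_URL = "http://169.254.169.254/latest/meta-data/spot/termination-time"

def termination_pending(status_code):
    # EC2 returns 200 with a timestamp roughly two minutes before
    # reclaiming the instance, and 404 while the instance is safe.
    return status_code == 200

def poll_once(url=TERMINATION_URL, timeout=1):
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return termination_pending(resp.status)
    except (urllib.error.URLError, OSError):
        return False  # not on EC2, or no notice yet
```

A real worker would call `poll_once` every few seconds from its main loop and flush a checkpoint to durable storage when it returns True.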
This HNer runs core services on spot instances and claims at least a 70% savings.
I frequently want quick and easy access to a bunch of instances, but I don't expect to use any single instance for very long. For example, I might partition my data into 30 pieces and want to operate on each partition simultaneously for five minutes. Since Amazon charges a 1 hr minimum for each instance you set up, I'd pay for 30 hours.
I'm experimenting with a switch to PiCloud. PiCloud charges twice as much per hour, but they only charge for the time I use (150 minutes in this example).
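The arithmetic behind that, with an illustrative (made-up) hourly rate:

```python
# Worked cost comparison for the example above: 30 instances used for
# 5 minutes each. The $0.10/hr rate is illustrative, not a real price.
EC2_RATE = 0.10              # $/hour, assumed
PICLOUD_RATE = 2 * EC2_RATE  # "twice as much per hour"

instances = 30
minutes_each = 5

# EC2 rounds each instance up to a 1-hour billing minimum.
ec2_cost = instances * 1 * EC2_RATE                      # 30 instance-hours
# PiCloud bills only the minutes actually used.
picloud_cost = (instances * minutes_each / 60) * PICLOUD_RATE  # 150 minutes

print(ec2_cost, picloud_cost)  # 3.0 vs 0.5: 6x cheaper despite the 2x rate
```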
PiCloud has the additional advantage of being vastly easier to use than ec2.
There are use cases for ec2 (and elastic map-reduce)...
If you're going to use them a lot, you're better off using consumer-grade computers with the most expensive Intel processors available. What I usually do is take the Passmark CPU list, add $500 for the cost of the rest of the system, and sort by score/price.
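For illustration, that sort could look like the sketch below; the CPU names, scores, and prices are made up, and $500 is the rest-of-system fudge from above:

```python
# Hypothetical Passmark-style entries; the scores and prices are
# invented for illustration, not real benchmark data.
cpus = [
    {"name": "cpu-a", "score": 9000, "price": 300},
    {"name": "cpu-b", "score": 14000, "price": 600},
    {"name": "cpu-c", "score": 16000, "price": 1000},
]

SYSTEM_COST = 500  # rough cost of the rest of the machine

def value(cpu):
    # Score per dollar, counting the whole system, not just the chip.
    return cpu["score"] / (cpu["price"] + SYSTEM_COST)

best_first = sorted(cpus, key=value, reverse=True)
print([c["name"] for c in best_first])  # mid-range chip wins here
```

Note the mid-range part often wins once the fixed system cost is included, which is the whole point of the exercise.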
It depends on your utilisation vs. idle time. EC2 is more expensive than most other virtual server providers, but they let you start up or shut down many servers within minutes, and bill hourly instead of monthly.
If your load requirements are mostly fixed and spread out evenly, alternatives including buying your own hardware may be cheaper.
If you do a lot of number crunching in short (hours or days) bursts, EC2 can be significantly cheaper, as you are not paying for the idle time. Of course, if the numerical computation is scalable across computers, it can be made bursty - instead of running one average server for a month, you could use 30 servers for a day and get the results faster for the same total cost.
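The trade described above in numbers, with an assumed (not real) hourly rate:

```python
# Same total cost either way, assuming perfect scaling across machines
# and an illustrative hourly rate; only the wall-clock time differs.
RATE = 0.10  # $/hour, assumed

one_server = 1 * 30 * 24 * RATE  # 1 server running for 30 days
burst = 30 * 1 * 24 * RATE       # 30 servers running for 1 day

print(one_server, burst)  # identical cost, but the burst finishes 30x sooner
```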
The easy way: go for the high-CPU instances (don't bother with the 'regular' ones; other providers offer cheaper and better machines).
The slightly harder way: go for technologies that help you spread the load, so you spawn several instances that run simultaneously for some time and then stop, costing you maybe what 1 instance/month would but giving you the results faster.
The even harder way: go for spot instances and orchestrate the sharding and reassembly of results.
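A local stand-in for that shard/reassemble pattern: split the input, farm each shard out (here to local threads instead of spot instances), and merge the partial results. Real spot orchestration would also need retries for shards whose instance gets reclaimed mid-run; the function names are mine.

```python
# Shard the data, process each shard concurrently, reassemble the
# results. Threads here are a stand-in for remote spot workers.
from concurrent.futures import ThreadPoolExecutor

def crunch(shard):
    # Placeholder for the real number crunching on one worker.
    return sum(x * x for x in shard)

def make_shards(data, n):
    # Round-robin split: every element lands in exactly one shard.
    return [data[i::n] for i in range(n)]

def run(data, n_shards=4):
    shards = make_shards(data, n_shards)
    with ThreadPoolExecutor(max_workers=n_shards) as pool:
        partials = list(pool.map(crunch, shards))
    return sum(partials)  # the reassembly step

print(run(list(range(100))))  # 328350, same answer as the unsharded sum
```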
Is EC2 a good choice if you just want to do that? Or what would you guys use for that?