Judging by some of the comments here, it seems people are giving Greenspun a free pass because he's apparently getting at a deeper point. However, when I read this article, I found it chock full of straw men. The comparison between a competent Microsoft programmer and a complete bumbling fool labeled as an MIT Genius is at best intellectually dishonest. I wrote a lengthy response, which I'll post here in case his moderator decides he doesn't like it:
I read the moderation policy where it's suggested that reviews of the post are not valued. However, I feel an obligation to point out the factual errors in this post. There are dozens of nonsensical assertions that could potentially be very misleading to anyone who doesn't understand Rails or web development in general.
My first general critique is that there is no real comparison going on here. It says the business guy called up Microsoft and they recommended buying a bunch of hardware, but there's no discussion of who developed the site or how they got up and running. There's no discussion of the price of the hardware, which clearly looks to be well into the 5-figures, or the price of the fiber connection at home, system administration, backups, etc. To get into some specifics:
The programmer, being way smarter than the swaptree idiot, decided to use Ruby on Rails, the latest and greatest Web development tool. As only a fool would use obsolete systems such as SQL Server or Oracle, our brilliant programmer chose MySQL.
This is a caricature of an "MIT Genius" that doesn't jibe with reality. Anyone who was actually that smart would know better than to dismiss Oracle in favor of MySQL. They may prefer using Ruby on Rails and be more productive than if they used .NET, but they wouldn't go around calling people idiots for such superficial reasons. Therefore you're not describing an actual genius, just someone who thinks they are a genius but is actually a fool. Using such a person as the basis for an argument of why Microsoft's recommendations are better than Rails is intellectually dishonest.
How do you get scale and reliability? Start by virtualizing everything. The database server should be a virtual “slice” of a physical machine, without direct access to memory or disk, the two resources that dumb old database administrators thought that a database management system needed.
The reason that virtualization is done in the web deployment world is so that you can get access to fast and reliable hardware for less than the cost of the full machine when you don't need all of its resources. A degenerate example would be that if your capacity requirements could be met by a 250 MHz processor, you would get better throughput by using 1/8th of a 2 GHz server. The reasoning for this is that the vast majority of sites don't need dedicated hardware, which you seem to imply is cheaper, but clearly it is not if you are leasing server capacity.
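To put rough numbers on that degenerate example, here is a minimal Python sketch. The 50-million-cycle request, the eight-tenant split, and the idea that a slice can burst to the whole CPU when its neighbours are idle are all assumptions I made up for illustration. They are not figures from the article or from any hosting provider.

```python
# Back-of-envelope model: one CPU-bound request on a dedicated slow box
# versus a 1/8 share of a fast box. All figures are illustrative assumptions.

REQUEST_CYCLES = 50_000_000      # hypothetical CPU work per request
DEDICATED_HZ = 250_000_000       # the 250 MHz dedicated processor
SHARED_HZ = 2_000_000_000        # the 2 GHz shared server
TENANTS = 8                      # each tenant is entitled to 1/8 of the box


def seconds(cycles: int, hz: float) -> float:
    """Time to burn `cycles` at a clock rate of `hz`."""
    return cycles / hz


dedicated = seconds(REQUEST_CYCLES, DEDICATED_HZ)
shared_idle = seconds(REQUEST_CYCLES, SHARED_HZ)            # neighbours idle (assumes bursting)
shared_busy = seconds(REQUEST_CYCLES, SHARED_HZ / TENANTS)  # every neighbour busy

print(f"dedicated 250 MHz:       {dedicated:.3f} s")
print(f"shared, neighbours idle: {shared_idle:.3f} s")
print(f"shared, all busy:        {shared_busy:.3f} s")
```

Under these assumptions the slice is never slower than the dedicated box, and it is up to eight times faster whenever the neighbours are quiet. A real hypervisor's scheduler is more complicated than this.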
Ruby and Rails should run in some virtual “slices” too, restricted maybe to 500 MB or 800 MB of RAM. More users? Add some more slices!
I'm going to assume you are talking about EngineYard here, since that is the managed Rails hosting provider I am most familiar with and is somewhat in line with your pricing figures below. First, the 500 or 800 MB is just a base amount of RAM that is good for most small Rails apps. When that starts to run out, the solution is NOT to add more slices; you simply commission more RAM. EY can do this without even restarting your slice. Incidentally, you can also commission more CPU if you need it. The reason they start with two production slices is for redundancy. One of your slices goes down for some reason? That's okay because there's a backup.
The cost for all of this hosting wizardry at an expert Ruby on Rails shop? $1100 per month.
What you described above is a very poor description of what you are paying for at a managed hosting provider like EngineYard. I will describe managed hosting in a minute. But to compare to your unmanaged Microsoft example, I currently pay $8/month for 256 MB of unmanaged hosting, which is plenty to serve significant traffic on a well-optimized app. This is an order of magnitude less than the Verizon FiOS line alone, and provides much better network connectivity (i.e. multiple tier-1 connections, lower latency to more endpoints).
With managed hosting at EngineYard, you are not just paying for the server. You are basically paying for a full-time system administrator. They have people all over the world ready to help you at a moment's notice any time of day or night. They proactively monitor your server and contact you if they notice any abnormalities. They provide a large suite of finely tuned recipes and standard software installations that they can install at a moment's notice and that tie into their monit-based server monitoring setup. The individual machines in the cluster are optimized for their specific tasks. The network hardware and topology are optimized for real-world usage scenarios. They continuously tune the machines for throughput and move clients around to avoid bottlenecks. They will even take significant steps towards helping the client tune their own application, above and beyond their contractual obligations for server administration. In short, you've completely ignored 95% of what they do, and painted it as extremely expensive without even providing a comparison against the overhead costs of buying and managing your own servers.
For the last six months, my friend and his programmer have been trying to figure out why their site is so slow. It could take literally 5 minutes to load a user page. Updates to the database were proceeding at one every several seconds. Was the site heavily loaded? About one user every 10 minutes.
If a request on an unloaded server takes 5 minutes to load, and the programmer cannot figure it out in 6 months, then that programmer is incompetent, plain and simple. Laying this at the feet of Rails is just plain ridiculous.
I began emailing the sysadmins of the slices. How big was the MySQL database? How big were the thumbnail images? It turned out that the database was about 2.5 GB and the thumbnails and other stuff on disk worked out to 10 GB. The servers were thrashing constantly and every database request went to disk. I asked “How could this ever have worked?” The database “slice” had only 5 GB of RAM. It was shared with a bunch of other sites, all of which were more popular than mitgenius.com.
Are you implying that you need enough RAM to keep the entire database in physical memory? That is patently false. In a worst-case scenario, yes, it could take performance down quite a bit, but disk access is not nearly as slow as implied above. I've served tons of sites on pure shared hosting (not even virtualized) with much higher load and orders of magnitude better performance than you are describing here.
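As a rough sanity check on that claim, assume a random read from a spinning disk costs about 10 ms and a random read from RAM about 100 ns. Both are ballpark figures I am assuming, not measurements from the site in question.

```python
# How many serial random disk reads would a five-minute page load need?
# The latency figures below are assumed ballpark values, not measurements.

DISK_SEEK_S = 10e-3     # assumed cost of one random disk read
RAM_READ_S = 100e-9     # assumed cost of one random RAM read
PAGE_LOAD_S = 5 * 60    # the five-minute page load described in the article

seeks_per_page = round(PAGE_LOAD_S / DISK_SEEK_S)
disk_vs_ram = round(DISK_SEEK_S / RAM_READ_S)

print(f"random disk reads that fit in one page load: {seeks_per_page}")
print(f"disk is slower than RAM by a factor of about: {disk_vs_ram}")
```

With these numbers, disk is about five orders of magnitude slower than RAM, yet a page would still need on the order of 30,000 serial random reads to take five minutes on disk latency alone. That points at what the application is asking for, not only at where the data lives.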
How could a “slice” with 800 MB of RAM run out of memory and start swapping when all it was trying to do was run an HTTP server and a scripting language interpreter? Only a dinosaur would use SQL as a query language. Much better to pull entire tables into Ruby, the most beautiful computer language ever designed, and filter down to the desired rows using Ruby and its “ActiveRecord” facility.
This is nonsense, Philip. Please don't take this as an ad hominem, because there's no other way to put this. What you described here is 100% pure nonsense. ActiveRecord, like any ORM component, abstracts away some SQL in order to simplify common database interactions. The lion's share of ActiveRecord code is all about constructing efficient SQL. When you are developing with Rails, it shows you all the SQL it runs in the development log, and you can quickly spot N+1 query problems. If you need something more efficient, it offers plenty of levels of access right down to pure SQL.
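For anyone unfamiliar with the N+1 pattern, here is a minimal sketch in plain Python and SQLite rather than Rails, so it runs without any gems. The `users`/`posts` schema and the `run` helper are hypothetical names invented for illustration; this is not ActiveRecord's API. In Rails terms, the second function corresponds roughly to eager loading an association instead of fetching it lazily inside a loop.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT);
    """
)
conn.executemany(
    "INSERT INTO users (id, name) VALUES (?, ?)",
    [(i, f"user{i}") for i in range(1, 101)],
)
conn.executemany(
    "INSERT INTO posts (user_id, title) VALUES (?, ?)",
    [(i, f"post by user{i}") for i in range(1, 101)],
)

query_count = 0


def run(sql, params=()):
    """Execute one statement and count it, much like reading a query log."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()


def titles_n_plus_one():
    """One query for the users, then one more query per user."""
    rows = []
    for user_id, name in run("SELECT id, name FROM users ORDER BY id"):
        for (title,) in run("SELECT title FROM posts WHERE user_id = ?", (user_id,)):
            rows.append((name, title))
    return rows


def titles_joined():
    """The same result from a single JOIN."""
    return run(
        "SELECT u.name, p.title FROM users u "
        "JOIN posts p ON p.user_id = u.id ORDER BY u.id"
    )


query_count = 0
slow = titles_n_plus_one()
n_plus_one_queries = query_count

query_count = 0
fast = titles_joined()
joined_queries = query_count

print(n_plus_one_queries, joined_queries)
```

Both functions return the same rows, but the first issues one query per user on top of the initial one. That is exactly the kind of thing a query log makes obvious during development.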
In reviewing email traffic, I noticed much discussion of “mongrels” being restarted. I never did figure out what those were for ... What am I missing? To my inexperienced untrained-in-the-ways-of-Ruby mind, it would seem that enough RAM to hold the required data is more important than a “mongrel”. Can it be that simple?
I'm shocked that a programmer would speculate so wildly as to say something like this. A mongrel is an application server. I don't understand what you seem to think it is, but it's simply the process that handles Rails requests and hands the responses to the web server, which passes them through to the client. Typically you run more than one so you can serve multiple requests concurrently, but for a well-optimized app usually no more than 3 or 4 are necessary. Rails uses a non-threaded share-nothing architecture, which means you can scale horizontally across unlimited servers. Note that I am not talking about virtualized servers. I'm talking about when you have more traffic than the biggest server in the world can handle: Rails will let you scale out painlessly at the web server level until your database cannot be served by a single box. At that point you need to look at database sharding, or alternative data stores using MapReduce or some other scalable database solution.
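To illustrate why you run more than one mongrel, here is a toy Python model in which each worker process handles exactly one request at a time. The request count and the 0.5 s service time are made-up numbers, and `finish_time` is a hypothetical helper written for this sketch, not part of Mongrel or Rails.

```python
import heapq


def finish_time(num_workers: int, service_times: list[float]) -> float:
    """Time until the last request completes when each request is handed
    to whichever single-request worker frees up first."""
    free_at = [0.0] * num_workers   # when each worker next becomes idle
    heapq.heapify(free_at)
    last_done = 0.0
    for service in service_times:
        start = heapq.heappop(free_at)   # earliest available worker
        done = start + service
        last_done = max(last_done, done)
        heapq.heappush(free_at, done)
    return last_done


requests = [0.5] * 12   # twelve requests arriving at once, 0.5 s each
one_worker = finish_time(1, requests)
four_workers = finish_time(4, requests)

print(f"1 worker:  {one_worker} s")
print(f"4 workers: {four_workers} s")
```

Because the workers share nothing, the same arithmetic holds whether they sit on one box or are spread across several, up to the point where the database becomes the bottleneck.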
None of this is to say Rails doesn't have its warts. Ruby is memory hungry, leaky, and relatively slow. Deployment has traditionally been very complicated compared to something like PHP (although it's much improved with Phusion Passenger, a.k.a. mod_rails, for Apache/Nginx). There are many reasons why you would be well-advised not to use Rails; however, this article doesn't touch on any of them. Rails, just like Oracle, .NET, Java or many other technologies, is a proven platform with pros and cons. In this article you pit an apparently competent programmer developing swaptree.com against what can be described as nothing less than a complete bumbling idiot using Rails. You insist the cost of Rails is high without any justification or direct comparison against the costs of swaptree.com.
I've read your blog in the past and found it to be pretty interesting, which is why I've taken the time to write this response, and suggest politely that you retract this article.
My short reaction: Good lord, you know all that and still feel like the article is targeted at you? Or at Rails as a technology?
Long reaction: You missed the point of the article, which is that keeping on top of the latest and greatest technologies is almost never necessary, and it is never sufficient under any circumstances. You don't have to know what a mongrel is. You do have to understand the orders-of-magnitude difference between different levels in the memory hierarchy. (RAM is much better than disk -- a simple, stupid fact that people ignore all the time.) There are lots of people running around with credentials and hot technologies who don't know what they're doing, and there are lots of young people who worship those guys and spend their time running after trendy stuff because they haven't yet figured out the difference between learning technology and deciding what to wear. (Which might not be as bad as relying on engineering principles to choose your wardrobe. Hmmm, personal food for thought.)
Sure, his article isn't particularly original in intent or execution, but the need for this article is perennial. You have to keep updating it because it's aimed at people who only pay attention if you talk about the current latest and greatest. That's why Rails was the perfect victim -- that's where his target audience is right now. (And the Microsoft stack is the perfect frumpy foil to Rails.) Not that the Rails community doesn't contain other kinds of people; it evidently does, or posts like yours wouldn't exist. But it is also The Trendy Thing and is therefore cursed with attracting the naive my-favorite-band-is-better-than-yours types who think "follow the buzz" is the successful strategy for all domains of life.
Fast-forward ten years, and I'm sure he'll have written the same article with the blanks filled in with another hot technology. Which is a good thing.
So first of all, the article takes two degenerate cases and stereotypes and generalizes them to the extreme.
Part of the problem is that if he was dealing with EngineYard, they definitely have some issues with their infrastructure. After having hosted with them, I don't recommend clients use them any more. They are more marketing than technical substance. One of the critiques that he makes is that their database infrastructure is on a shared architecture, which unfortunately is true. They separate all of their front-end slices out, but all of their databases are on a shared architecture. Unfortunately, with most apps that have scaling issues, those issues are related to database access, which makes that the exact worst part of the system to be shared. Without going into too much more EY bashing, they sell you on the idea that you have a full-time admin working on your site, but the reality is vastly different. Personally, I've had much better luck using SliceHost than EngineYard, but YMMV.
I've been working with EY for two years. I've also had a few SliceHost slices over the years. I've also worked with a dozen other hosts ranging from managed Rackspace servers through Xen and Virtuozzo VPSes down to shared hosts, administered my own Linux server on Internet2 at a university, worked for a full-service agency that resold white-label hosting to over 1000 clients, and I've been building websites since 1994.
The fact that you would compare SliceHost to EngineYard is very, very fishy. These two companies are both excellent, but they are not providing services that can even be compared to each other. What I said about EngineYard is not based on "marketing"; it's based on considerable experience working with them. I have put them to the test and they have never come up short. I've worked with techs in 10+ countries at all hours to solve problems, and solve them quickly. I think it borders on libel to say they don't have support on hand 24 hours a day; they absolutely do. Rackspace is the only company I've used that matched that level of service and expertise. Try asking SliceHost for help with your server admin; it's just not available. Ezra Zygmuntowicz, one of the founders of EngineYard, literally wrote the book on Rails deployment. They have significantly sponsored Rubinius and Passenger development. They made TextDrive (aka Joyent, originally the "official" Rails host) look like amateurs. Their cap recipes gem and stock monit scripts are more comprehensive and reliable than anything else I've seen anywhere.
As far as the database sharing issue is concerned, the purpose of sharing is to allow you to save money by using only the resources you need. It's not true that all database machines are shared. They have spec'ed them out and refined them based on the resources their clients actually use. If you get to the point that you actually need a dedicated database server, then they will gladly provide that to you. At that point you will probably need multiple dedicated app servers as well, though of course that depends on the particulars of your app. In any case, I can't think of anyone that can build you a better Rails cluster. Sure, you pay a premium for their expertise, but that experience is currently second to none.
My comparison of SliceHost to EY is also based on actual experience deploying large-scale sites. EY claims to offer all the services in the world, but at the end of the day the technology they delivered was subpar. I understand all of the things you say about Ezra writing the Rails deployment books and their support of the community. None of that changes the fact that their offerings are vastly overpriced for what they deliver.
I've been building web solutions since 1994 as well, and have built dozens of web sites for Fortune 50 and better companies. I fully understand what it takes to scale out an architecture to 30,000 transactions per second at 30% average CPU utilization, or to deal with 70 terabytes of text and images with sub-second response times. I've built arbitrarily deeply nested hierarchies with hundreds of millions of items that have to return their result sets in under a second. I don't mention this to brag, but to illustrate that I have a substantial system engineering and architecture background.
I'm not sure how you can say that comparing SliceHost to EngineYard is "very very fishy". It depends on your perspective I guess. EngineYard seems to cater to companies that don't have strong in-house database talent, or who aren't comfortable with certain things like creating a Capistrano deployment file or basic sysadmin tasks. If you are comfortable with those things though, when you compare what they do with what they claim, they fall very short imo.
The thing with SliceHost is that they focus on one thing, which is delivering virtualized resources. And they deliver them fast. You can have a new slice up and running with SliceHost within minutes, whereas with EngineYard that same task takes weeks. On multiple occasions we had to escalate to one of the company owners before we got a new slice created.
So perhaps for my background, and for the needs of the companies I've been at, maybe EngineYard's services weren't a great fit. I'm willing to give them the benefit of the doubt. If you just need virtualized resources though and are comfortable building out your own architecture from there, EY may not be the best fit.
I do find it interesting that you mention Rackspace's service, considering they bought SliceHost some time ago.
If you have the manpower and talent to administer your own servers, then clearly EY is overpriced. However, the cost of acquiring said talent is significantly higher, and much riskier for smaller companies. My experience is not at as big a scale as yours, but I can see how there are increasing economies of scale in self-administration (and purchasing your own hardware, etc.) as you get bigger.
However, my beef is with saying that you had better luck with SliceHost than EY, which to me is a nonsensical comparison. SliceHost doesn't offer any of what you're paying for at EY. However, your explanation clarifies things significantly for me. EY is not very good at SliceHost's core competency; I'll tentatively agree with you here, since I haven't done a lot of slice commissioning on EY.
He did not reply. He deleted the comment (along with ostensibly dozens of others), and you can see his justification in comment #8. Clearly he did not address any of my points, and is not interested in an honest discussion.
The guy has lost all my respect. If he deletes this kind of comment, then what else is he deleting? I thought a number of his articles were quite good, but I can't trust someone who deletes comments that were based on this much consideration and experience. As far as I'm concerned he's an intellectual hack and I'll be avoiding his site from now on.
The kind of guy who would sincerely reply to your post would not have written the original article in the first place. Intellectual honesty does not seem to be one of his priorities.
Disappointing, but not surprising given the things I've seen written about him.
I would add that the original article is the perfect example of why a top-down argument is not a valid argument. In general, you should never start from examples and work your way down to a bottom line. As this post shows, you can easily make correlations that are flat-out wrong or can be explained in other ways that are more substantial.
If you want to learn how to properly draw conclusions from examples, learn statistics. Stats is all about making sure you have sufficient evidence (large sample sizes, small p-values) to back up a correlation.
"For the last six months, my friend and his programmer have been trying to figure out why their site is so slow. It could take literally 5 minutes to load a user page. Updates to the database were proceeding at one every several seconds. Was the site heavily loaded? About one user every 10 minutes."
I would have replied with "My staff sent me an internet last thursday and it only arrived this morning", but you get points for restraint.