Well, yes, but it would've been better if support had also been trained that this sort of issue gets a refund escalation. One should not have to rant on Twitter AND HN to get good service for a very reasonable failure.
I'm currently stuck in their support hell and have been told:
1. My issue is not real
2. Okay, your issue is real, but because you're not paying $$$ we're going to ignore you
3. I should do free work for Vercel and poll their community forums to see how widespread the issue is
4. Their support is only trained to handle frontend issues and because this is an issue with their CDN it's expected that they'll respond incompetently
5. They'll escalate with their CDN team and respond in one week (that was over a month ago, no follow up whatsoever)
It's hard to take Vercel seriously. As a toy, it's probably fine. But I'll ultimately move this project off of their CDN product as soon as it reaches costly volume.
Fair points. Hopefully this brings about the needed changes in these billing systems to prevent it happening in the future. It happens all the time on other providers as well. I'm very critical of the actual necessity of these infinitely scalable systems; see my other comment in this thread.
I wonder what the aversion is to using a plain old server / VPS. It's really not that difficult to deploy nowadays [0][1][2][3] and I'd rather get an $8 bill every month as insurance than ever worry about shit like OP just went through. It'll probably be more performant anyway due to cold starts and "edge" still having to hit us-east-1 for data. Cache your static files with Cloudflare/CloudFront. People are always surprised by how much traffic a single VPS can take[4] and believe it all has to be serverless to be web scale. I believe HN still runs on a single core or something.
There's a ton of places to get cloud credits as well, too many to link, so just Bing™ it
Vercel is sooo easy. Hook it to a repo and you’re done. It’s also cheap depending on use case. Edge functions are nice. Zero complaints from me as a customer for a little over a year.
I work with a small crew and having no server maintenance lets us ship more code. Really as simple as that.
You're right, I used the terms interchangeably. AWS Lightsail would have been a good service comparison, and those do start at $3.50[0]. The $8 figure came from running the smallest Fargate task at 0.25 vCPU and 512 MB, excluding the load balancer etc. [1]
It was announced today that App Runner now allows smaller instances[2], and the price/mo for this instance would be roughly a quarter of the previous default. So ~$14 [3]
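The $8 Fargate figure roughly checks out with a back-of-the-envelope calculation. The rates below are illustrative us-east-1 numbers, not authoritative; check current pricing:

```javascript
// Approximate Fargate on-demand rates (assumed, not authoritative).
const VCPU_PER_HOUR = 0.04048;  // USD per vCPU-hour
const GB_PER_HOUR = 0.004445;   // USD per GB of memory per hour
const HOURS_PER_MONTH = 730;

// Monthly cost of a task running 24/7 at the given size.
const monthlyCost = (vcpu, memoryGb) =>
  (vcpu * VCPU_PER_HOUR + memoryGb * GB_PER_HOUR) * HOURS_PER_MONTH;

// Smallest task: 0.25 vCPU, 512 MB => roughly $9/month, before the
// load balancer and other fixed costs.
```

At these assumed rates the smallest always-on task lands around $9/month, in the same ballpark as the $8 quoted above.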
Took me less than a minute to set up Next.js on AWS App Runner with auto deploys on commits; the rest was waiting for it to finish, which I'll admit Vercel is much faster at.
Bursty workloads. E.g. people use my app from 9am to 11am and then nothing for the rest of the day. I'm paying $8 but only around $2 of that compute is actually useful. Setting spend limits on Lambda / Vercel / cloud functions means the entire $8 goes to my bursty workload.
I've never been able to understand why so many usage-based hosting platforms don't give you the option to say "if my bill goes above X, shut down the service instead of continuing to charge me". It seems so easy and obvious, and I'll never ever use a platform for personal projects that doesn't let you have a failsafe
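The failsafe described here is conceptually simple. Here's a minimal sketch of a hard spend cap, with an entirely hypothetical interface (no real provider API implied):

```javascript
// Tracks accumulated spend and refuses further work past a hard limit,
// instead of silently continuing to bill.
function makeSpendGuard(limitUsd) {
  let spentUsd = 0;
  return {
    charge(amountUsd) {
      if (spentUsd + amountUsd > limitUsd) {
        // The "shut down the service" branch the comment asks for.
        throw new Error("spend cap reached: service suspended");
      }
      spentUsd += amountUsd;
      return spentUsd;
    },
    remaining: () => limitUsd - spentUsd,
  };
}
```

The hard part in practice is that usage metering is distributed and delayed, so a provider can only enforce a cap approximately; but even an approximate cap bounds the damage.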
> I've never been able to understand why so many usage-based hosting platforms don't give you the option to say "if my bill goes above X, shut down the service instead of continuing to charge me".
Large, valuable customers wouldn't use this option. It would only help the hobbyist types who aren't doing anything mission-critical and aren't the platform's most profitable (not big spenders, not particularly loyal because they aren't big enough to tailor their infra to that platform, etc.)
I'm not sure that's true. It's probably a lower risk for them, but still, a runaway service can rack up unlimited bills. At some point it matters to everybody. Even if they set it to $CURRENT_CASH_RESERVES, that has non-zero value
And when we talk about the in-between - startups - I'm pretty sure I read at least one anecdote on here about being put out of business by surprise hosting bills
> Large, valuable customers wouldn't use this option
Unless they prefer to rely on being large and valuable as leverage, they just have much bigger limits. And even in such an autoscale-type company there may be a separate department that wants to limit their budget, because they have one. Imagine accidentally spending an extra $100k at your company while the whole budget is around $10M: now you can't simply reach support with "that billion was clearly a mistake," because $10.1M seems normal.
With self-serve developer tooling like vercel they are hoping that devs bring the tools to work that they have been prototyping themselves. That’s why there’s a free tier.
From what I remember, they don't offer limits, and neither does GCP. They offer billing alerts and that's it. Both cloud platforms operate on what they call a "shared responsibility" model, aka you bill it you buy it.
The big clouds usually have something you can wrap around their usage-tracking APIs for alerting, but that's often very "eventually consistent" and might be delayed by days. That's understandable in some cases, and still useful in others, but it doesn't reliably protect you from "oh crap, a broken config launched way too many VMs and now I've spent $10k in a day".
It might be really hard to add in retroactively. Otherwise there’s no excuse, it’s such an annoying vulnerability for a customer, even if you can email support and get a refund. Where I work, modal.com, we have enforced budget limits for accounts.
Sure, it gets more complicated with storage. But I think we're mostly talking about stateless services here
And for storage or storage-having services you could easily have a "shut it down but preserve the data for a comparatively-small recurring fixed storage charge"
There’s a reason that services like AWS and GCP tend to be fairly forgiving of billing due to legitimate mistakes by first-time users (and others).
And that reason is perfectly summarized by this post: anyone reading this post is likely to put Vercel on a mental do-not-use list.
This is also the reason that cloud providers tend to have default quotas that, among other things, limit runaway usage.
Admittedly without knowing much about it, it sounds like Vercel may be pretty immature as a business.
Edit: after commenting I saw the Vercel response. I’m leaving my comment up because that response still seems to focus on a refund being conditional on some technical justification. That’s how engineers tend to think. It’s not how successful businesspeople need to think. It seems like both the refund policy and quota/limit management may need to be reviewed.
I hear this claim that AWS is forgiving of legitimate mistakes, but I don't think it's true.
It's true if you have friends in Amazon, or you manage to appear on ycombinator or have a few thousand Twitter followers. I've known at least three students who ended up with bills of around $100, which to them are huge, and which weren't waived.
The problem there is if you can’t afford $100, then under no circumstances should you consider using AWS or any major cloud provider. They simply don’t cater to that market.
They also charge more, sometimes around double, for domains: .studio is a nice even $50/yr on Vercel, but some odd cheaper price like $26 on Namecheap.
And it's not like that part of the service is any better; they still have to pass on weirdness from registrars. I tried registering a .md domain and it kept failing. Probably not on their end, but they didn't do anything extra like warn me it could happen or provide a helpful error message.
(I got excited about Zeit when it was going to be similar to Google Cloud Run or fly.io and am less interested in serverless that means things like database bouncers or using old versions of Node.js)
Earlier this week I was in a call with a colleague, an architect.
He was explaining his plan to add 2 new methods to an existing API and ended with a cost calculation where based on projected usage, it would cost the company $13/month, and growing over time.
I was shocked at the amount, as this is a tiny piece of a massive project, but equally shocked that we now need to think like this: deciphering real monetary cost, line by line.
I guess that's why there's a new profession: cloud cost optimization engineer. Not for me though, I stand by my god given right to ship shitty code without consequences.
If that doesn't work, I'll become a cloud cost optimization engineer imposter. I start with intentionally expensive code, erase any trace that I had any hand in it, then come in to fix it. I take a 50% commission from the savings.
Yeah, unfortunately I had a similar experience back when Vercel was Now. It seems like the optimal strategy is to stick with their free plan until you really need the paid features, then monitor it like a snake stalking a mouse.
The flip side is that they’ve been rock solid for years on their free plan. Super reliable and nothing but positive things to say about that tier.
Kinda surprising they wouldn’t forgive the $3k bill. That’s the cost of a nice MacBook Pro for a runaway experiment. You’d expect this sort of thing in ML training, not webdev…
I have never used functions before, but this seems very expensive compared to every other cloud service provider like DigitalOcean, AWS, GCP, and Azure.
I might be way off, but this would cost under $100 on all of them and you would most likely get a refund if you talked to support.
So why is Vercel so expensive? Do they not have pricing limits that you can set? It seems like a very bad idea to run functions that are billed per usage on a service that gives you no way to set limits on that usage.
After learning about this I will stay away until they implement obvious limits on usage. It's not cool to have to go viral publicly to get an unfair bill waived.
In a serverless environment, you're only billed for how long your function is running. In this case, however, the function never terminated, even after the request it was handling completed.
I suspect the user made the mistake of deploying a traditional web app (where a main process routes requests and waits indefinitely for the next one) instead of deploying individual "functions" that terminate gracefully after handling a request. That seems an honest mistake for a first-time user to make. On other platforms, the process would be killed by the timeout limit (serverless platforms usually have strict timeout limits), but for some reason, on Vercel it kept running forever and racked up a huge bill.
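The distinction drawn above can be sketched as two shapes of the same app. This is hypothetical, framework-agnostic code, not Vercel's actual API:

```javascript
// Serverless shape: a handler that does its work and RETURNS, letting
// the platform freeze or recycle the instance between requests.
function handler(request) {
  return { status: 200, body: `hello ${request.path}` };
}

// Traditional shape (shown as comments): a main process that listens
// and waits indefinitely for the next request. On a platform that
// bills for wall-clock execution time and lacks a hard timeout, this
// "never terminates" by design.
//   const server = http.createServer(route);
//   server.listen(3000); // blocks forever
```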
I am glad the issue was resolved for OP. I've had a great experience with Vercel and never had anything of this sort happen to me, fingers crossed I don't run into it any time soon :D
I had a very similar use case when I deployed Databricks through AWS. It went haywire and issued a trillion PUT requests to S3. I ran up a huge bill and was told to pound sand when I asked for a refund.
We've concluded our analysis.
1. We're refunding the overages
2. We identified the root cause
The root cause is that the Astro bundle handed to the deployment process is monolithic. There was a top-level `await` for an RSS endpoint which called an API with `fetch`. The issue is that these two (and the rest of the app) were bundled together!
Therefore, any time the function was invoked, that top-level `await` ran for all endpoints. It never yielded. And it's fully autonomous, meaning it would keep running even without a browser open once the chain reaction started.
This is a Swiss Cheese[2] kind of failure. It required the top-level `await`, the monolithic bundle, and the RSS function using `fetch` (i.e. over the network) rather than `import`-ing the data layer API directly.
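A minimal, synchronous simulation of that chain reaction (all names hypothetical; the real code awaited a network `fetch`, fired from a top-level `await`, back into the same deployment):

```javascript
let invocations = 0;
const MAX = 5; // cap so the sketch terminates; the real chain did not

// Stands in for the RSS endpoint's top-level `await fetch(OWN_API_URL)`:
// a request back into the same deployment, i.e. a fresh invocation of
// the same monolithic bundle.
function runModuleTopLevel() {
  if (invocations < MAX) invokeFunction("/api/posts");
}

// Because the bundle is monolithic, the top-level code runs for EVERY
// endpoint invocation, not just /rss.
function invokeFunction(endpoint) {
  invocations += 1;
  runModuleTopLevel();
  return endpoint;
}

invokeFunction("/rss"); // one request fans out into MAX invocations
```

Had the RSS endpoint `import`-ed the data layer directly instead of using `fetch`, the top-level work would have been a local call that spawns no new invocation, and the chain never starts.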
Most importantly, here's what we're doing: we are going to deploy a fix to ensure this doesn't happen again, across frameworks. I really appreciate Mike raising this and hopping on a Zoom call with me while our team investigated.
As an addendum for HN: we're continuing to refine the tools and patterns to best "harness" the practically-infinite ability for Serverless and Edge functions to scale horizontally. It's an awesome property, but it's taught us valuable lessons. We've come a long way in adding guardrails and alerts, and this will be another value-added protection that future customers will enjoy.
[1] https://twitter.com/rauchg/status/1644099739959590912
[2] https://en.wikipedia.org/wiki/Swiss_cheese_model