
I disagree entirely.

There’s a real non-negligible cost to Reddit hosting the content.

The bottom line is there are solutions here.

- High-volume API users pay up and subsidize the cost for everyone else

OR

- All Reddit users pay a monthly 9.99 subscription and the API stays the same.

OR

- A not-for-profit, let's say the Internet Archive, takes ownership and begs the Reddit community for donations (the way Wikipedia does)



If it was that expensive to host the API, they wouldn't have done it for over a decade. Reddit's popularity was at least partially built on the labor of 3rd party devs building mobile apps, moderation tools, useful bots, all types of stuff.

The bottom line is they'll happily pull the rug from under those 3rd party devs if it means they can pump their numbers for their IPO.


Multiple LLMs scraping all of reddit is a far different deal than how the APIs were used in the past.

Small cost x moderate usage = moderate cost

Small cost x over-the-top usage = over-the-top cost

What they should do is really hammer the LLMs.


They could just block them if they want. Can't be that hard.

I think this is more about the IPO and inflating figures.


OR

- Everyone will just run web scrapers increasing the load on reddit's servers even more

An API actually REDUCES load and lets you manage it.


Your solution isn’t a solution.

Please answer the question:

How is Reddit supposed to make money with increased load and demand for its content?


The framing in the question is wrong. Reddit gets ~80 comments per second on average. An API to get comments with ids starting after n would let you reasonably scrape all new comments with 1 request per second (or one every 10 seconds if the API returned up to 1000 results, which would be entirely reasonable). They get ~10 posts per second, so add another 1 request for that. The load is trivial. Storing the content is also trivial; the entire history of reddit fits very comfortably on a single consumer-tier SSD.
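To make the arithmetic above concrete, here is a rough sketch of the "everything with an id after n" polling idea. It runs against an in-memory list, not Reddit's real API; `fetch_after`, `drain`, and `requests_per_second` are hypothetical names made up for illustration, and the ~80 comments per second figure is just the estimate from this comment.

```python
def requests_per_second(items_per_second: float, page_size: int) -> float:
    """Minimum polling rate needed to keep up with a stream of new items."""
    return items_per_second / page_size


def fetch_after(store: list[int], last_id: int, limit: int) -> list[int]:
    """Simulated 'ids greater than last_id' endpoint (hypothetical, in memory)."""
    return [item_id for item_id in store if item_id > last_id][:limit]


def drain(store: list[int], page_size: int) -> tuple[int, int]:
    """Page through the store with a cursor; return (calls made, items seen)."""
    last_id = 0
    calls = 0
    seen = 0
    while True:
        page = fetch_after(store, last_id, page_size)
        calls += 1
        seen += len(page)
        if len(page) < page_size:
            # A short page means the cursor has caught up with the stream.
            return calls, seen
        last_id = page[-1]


# Assumed rate from the comment: ~80 comments per second.
backlog = list(range(1, 801))  # ten seconds' worth of comments at that rate

print(requests_per_second(80, 1000))  # fraction of a request per second needed
print(drain(backlog, 1000))           # one call covers the whole ten seconds
print(drain(backlog, 100))            # smaller pages need more calls
```

With a 1000-item page, keeping up takes well under one request per second, so polling every 10 seconds leaves headroom. The cursor also means each request only touches new items instead of re-reading pages the client has already seen.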

The infrastructure they provide is easily replicable at almost no cost (really it's just the bandwidth that costs any money at all). The community curation and moderation is done by volunteers. The content is all from the users. Reddit Inc is providing almost none of the value, and is just benefiting from network effects. People go there because people go there.


Reddit hosts images and videos


Only recently. For most of its history it only stored text.


It was 7 years ago.


Yes, and those are features most Reddit users couldn't care less about and/or can be linked easily enough.

The real meat and potatoes is plain old text with Reddit style markup.


Your question isn't the question you should be asking.

Please answer the question:

How is reddit supposed to prevent scraping when it is legal to do so, and they have incentive to appear in search engines? Considering that scraping is legal and WILL happen, why not opt to reduce the load by offering an API?


If consuming their content by scraping were easy, nobody would be complaining about the APIs being taken away. It's very easy to make web scraping impractical if they want to. Don't be deluded.


The context of the thread is machine learning companies. You're delusional if you think they don't have a ton of web scrapers running 24/7 already.


This content is text; it is extremely cheap to host. The site itself is not particularly resilient, so there is not a lot of overhead there. 10 euros a month is an absolutely ridiculous price for what probably doesn't even cost them 20 cents per user, per month.



