I disagree entirely. There’s a real non-negligible cost to Reddit hosting the co...

fkyoureadthedoc · on June 6, 2023

If it were that expensive to host the API, they wouldn't have done it for over a decade. Reddit's popularity was at least partially built on the labor of 3rd party devs building mobile apps, moderation tools, useful bots, all types of stuff.

The bottom line is they'll happily pull the rug from under those 3rd party devs if it means they can pump their numbers for their IPO.

compiler-guy · on June 6, 2023

Multiple LLMs scraping all of reddit is a far different deal than how the APIs were used in the past.

Small cost x moderate usage = moderate cost

Small cost x over-the-top usage = over-the-top cost

What they should do is really hammer the LLMs.

wkat4242 · on June 6, 2023

They could just block them if they want. Can't be that hard.

I think this is more about the IPO and inflating figures.

dingledork69 · on June 6, 2023

OR

- Everyone will just run web scrapers increasing the load on reddit's servers even more

An API actually REDUCES load and lets you manage it.

jollofricepeas · on June 6, 2023

Your solution isn’t a solution.

Please answer the question:

How is Reddit supposed to make money with increased load and demand for its content?

ndriscoll · on June 6, 2023

The framing in the question is wrong. Reddit gets ~80 comments per second on average. An API to get comments with ids starting after n would let you reasonably scrape all new comments with 1 request per second (or one every 10 seconds if the API returned up to 1000 results, which would be entirely reasonable). They get ~10 posts per second, so add another 1 request for that. The load is trivial. Storing the content is also trivial; the entire history of reddit fits very comfortably on a single consumer-tier SSD.
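To make the arithmetic above concrete, here is a minimal Python sketch. It assumes the comment's rough figures (~80 comments/s, pages of up to 1000 results) and simulates a hypothetical "ids after n" endpoint with an in-memory list. `fetch_after` and `drain` are illustrative names, not Reddit's actual API.

```python
from __future__ import annotations

import bisect

# Rough averages quoted in the comment above (not measurements).
COMMENTS_PER_SEC = 80
POSTS_PER_SEC = 10


def max_poll_interval(items_per_sec: float, page_size: int) -> float:
    """Longest gap between requests (in seconds) that still keeps up,
    if each request returns at most `page_size` new items."""
    return page_size / items_per_sec


def fetch_after(store: list[int], after_id: int, limit: int) -> list[int]:
    """Hypothetical 'ids greater than after_id' endpoint,
    simulated over a sorted list of ids."""
    start = bisect.bisect_right(store, after_id)
    return store[start:start + limit]


def drain(store: list[int], limit: int) -> tuple[int, int]:
    """Poll with a cursor until an empty page comes back.
    Returns (requests made, items seen)."""
    cursor, requests, seen = 0, 0, 0
    while True:
        page = fetch_after(store, cursor, limit)
        requests += 1
        if not page:
            break
        seen += len(page)
        cursor = page[-1]  # next request starts after the last id seen
    return requests, seen


# 100 seconds of comments at ~80/s, with sequential ids for simplicity.
backlog = list(range(1, 100 * COMMENTS_PER_SEC + 1))
requests, seen = drain(backlog, limit=1000)
```

Under these assumptions a 1000-item page covers 12.5 seconds of comments, so polling every 10 seconds leaves some headroom, and 100 seconds of comments are drained in 9 requests (8 full pages plus one empty page that signals the end).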

The infrastructure they provide is easily replicable at almost no cost (really it's just the bandwidth that costs any money at all). The community curation and moderation is done by volunteers. The content is all from the users. Reddit Inc is providing almost none of the value, and is just benefiting from network effects. People go there because people go there.

charcircuit · on June 7, 2023

Reddit hosts images and videos

flangola7 · on June 7, 2023

Only recently. For most of its history it only stored text.

charcircuit · on June 7, 2023

It was 7 years ago.

ddingus · on June 7, 2023

Yes, and those are features most Reddit users couldn't care less about, and/or ones that can be linked easily enough.

The real meat and potatoes is plain old text with Reddit style markup.

dingledork69 · on June 6, 2023

Your question isn't the question you should be asking.

Please answer the question:

How is reddit supposed to prevent scraping when it is legal to do so, and they have incentive to appear in search engines? Considering that scraping is legal and WILL happen, why not opt to reduce the load by offering an API?

brabel · on June 6, 2023

If consuming their content by scraping was easy, nobody would be complaining about the APIs being taken away... it's very easy to make web scraping impractical if they want. Don't be deluded.

dingledork69 · on June 6, 2023

The context of the thread is machine learning companies. You're delusional if you think they don't have a ton of web scrapers running 24/7 already.

krageon · on June 6, 2023

This content is text, it is extremely cheap to host. The site itself is not particularly resilient, so there is not a lot of overhead there. 10 euros a month is an absolutely ridiculous price for what probably doesn't even cost them 20 cents per user, per month.