Sort of interesting just to hear about the ups and downs of companies like Dubsmash. They were often cited as an example of Berlin's future as a startup city [1]. They went from 35+ employees to 27 [2] to now 12, as they've stated in this post. They also moved from Berlin to New York, which seems to imply they felt the city couldn't offer what they needed. It looks like they didn't take many of the employees with them in the move (maybe this was also a way out of strict German employment rules?). Seems like a bit of an attempt at a restart (co-founder Roland Grenke seems to be gone, etc.).
Looking at their rank history on App Annie, they were doing really well in 2015, but it's been downhill from there (from top 10 to >500 in all the major App Store charts). How they were able to go from 140M to 350M downloads in the last year (compare this article with the TechCrunch one) is a complete mystery. Also, stating your number of users without any qualifier (e.g. MAU) in a tech article is a bit of a red flag; in my experience that usually means it's a vanity number (yearly active? Who knows).
It also sounds odd that they have 3 engineers and 12 employees. What do the other people do?
And hopefully they had more than 3 engineers back when they had 35 employees... but even then, why would they choose to fire engineers and end up with that tech to non-tech ratio?
Dubsmash relies heavily on copyrighted content from big studios (at least it did when it became popular). I guess most of the staff works hand in hand with media companies to promote their content inside the app.
Sounds like that deserves a bit of caution. They were "hiring like crazy" in Berlin not long ago; I guess those jobs would have lasted less than a year:
Interesting, they can't even manage a basic signup process. The app says my email is already in the user list, but when I try to log in, it says no such email exists in their system. The "forgot password" option gives the same result. They should hire someone to fix their auth/ACL code ,_,
Agreed, that part was lengthy as well and seemed to have way too many steps, but given the number of users they've acquired, the process doesn't seem to be holding them back.
Thought the same. My wild guess is that those engineers are at their career peak in terms of energy and ability to deliver glue code, but a few years away from being well-rounded engineers who can live and work sustainably.
Three engineers maintain code in Java, Swift, previously Objective-C, Go, Python (both Django and Flask), Node.js, considering Kotlin, and additionally make use of Celery, RabbitMQ, React, Redux, Apollo, GraphQL, Postgres, Heroku, AWS, Jenkins, Kubernetes, Redis, DynamoDB, Elasticsearch, Algolia, Memcached, and more.
I might be an inexperienced engineer by comparison, but I'll be honest, that sounds absolutely fucking insane. These three people must be geniuses to be able to use all of that with sufficient mastery to effectively handle 200M users.
Sometimes I wonder if there are any internet companies (startup or otherwise) that do customer support. With numbers like that, it's hard to imagine one of those users getting even one second of attention with any problems they might have.
You can only really do customer support if it makes financial sense, which it won't unless you make a significant amount of money on your average customer. Tech companies that don't have sales, but instead take their revenue through ads or through selling data are making cents per customer. With average profit that low, even 1/1000 customers making use of your support for 5 minutes would destroy any chance of profit.
> We since have moved to a multi-way handshake-like upload process that uses signed URLs vendored to the clients upon request so they can upload the files directly to S3.
How does this work in practice / where can one learn more about this?
I want to make sure that I understand the security aspect of this.
You can argue that the user can upload anything using the original api anyway. But in the original case you can do server-side validation before the upload is proxied. I am thinking stuff that are domain specific like only allowing videos that are 6 seconds long or something.
You can move the validation to the client, but the client can be easily modified. An actual user might not do this, but someone trying to steal your storage space (for serving malware or something) might?
These signed URLs also seem to expire based on time, so you could potentially save the URL and upload again later if you allow a generous expiration. (Again, not really something I see being a huge problem.)
But I guess these aren't really serious issues compared to the cost savings. Am I missing other ways this can be exploited?
You would use two buckets in this case. Input bucket gets consumed by worker processes to do the transcoding (and validation) and then they upload into the output bucket. The output bucket is what you serve to clients (hopefully with a CDN in front).
This is more complicated than I imagined so I am not sure the cost saving will still work out (factoring in development time and extra code maintenance cost).
No, I don’t, sorry. What I can promise you is that you’ll thank yourself for implementing it! There is hardly any additional complexity here because you’d probably be uploading the derived content somewhere anyway. Now you’re just putting it in a different place than the source.
You can use whatever queue you’re comfortable with so long as you can pipe the upload events from the bucket into it. The pattern I’m outlining is just a physical separation of buckets to make access control much harder to screw up.
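A stdlib-only sketch of that separation, with the bucket I/O, validation, and transcoding stubbed out as function parameters (all names here are made up for illustration, not Dubsmash's actual code):

```python
import queue

def drain_upload_events(events: queue.Queue, read_input, validate,
                        transcode, write_output) -> int:
    """Consume pending upload notifications for the *input* bucket,
    validate and transcode each object, and publish the result to the
    separate *output* bucket that the CDN serves. Returns the number
    of objects published."""
    published = 0
    while True:
        try:
            event = events.get_nowait()   # e.g. an S3 ObjectCreated message
        except queue.Empty:
            return published
        raw = read_input(event["key"])    # fetch from the input bucket
        if not validate(raw):             # e.g. reject clips over 6 seconds
            continue                      # invalid uploads never reach the CDN
        write_output(event["key"], transcode(raw))
        published += 1
```

The point of the two buckets is exactly this: clients only ever get write access to the input bucket and read access to the output bucket, so a misconfigured ACL can't turn raw user uploads into publicly served files.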
I can comment on using pub/sub - it's an immensely useful abstraction for these kinds of tasks and something that is quite difficult to implement yourself with the same level of guarantees that using the cloud service will provide. Any time you need to pass information or trigger events asynchronously messaging is the first choice IMO.
Not 100% sure what they mean by _vendored_ here, but I'm guessing they make a request to one of their backends to generate the URL and return it to the client for use.
One thing to keep in mind, users should be able to upload (to the specific signed URL), they should not be able to download from that location. Don't make the files users can upload publicly downloadable, otherwise you can be used to host malware. After the video/image is uploaded, you need to download and process it[1], then upload it to an S3 bucket that allows download (e.g, via CDN).
[1] Use caution when processing user content. It is best to process media in a sandbox that can protect you against exploits in the media processing libraries.
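One cheap way to get some of that isolation on Linux is to cap the processing subprocess's resources (a sketch only; a real sandbox would add seccomp, namespaces, or containers, and the command you run is whatever media tool you use):

```python
import resource
import subprocess

def process_media(cmd, timeout=60, mem_bytes=512 * 1024 * 1024):
    """Run a media-processing command with CPU and memory ceilings so an
    exploit or decompression bomb in a codec can't take down the host."""
    def limits():
        # Applied in the child process just before exec.
        resource.setrlimit(resource.RLIMIT_AS, (mem_bytes, mem_bytes))
        resource.setrlimit(resource.RLIMIT_CPU, (timeout, timeout))
    return subprocess.run(cmd, preexec_fn=limits, timeout=timeout,
                          capture_output=True)
```

Anything that times out, OOMs, or exits non-zero gets treated as a rejected upload rather than crashing the worker.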
Client makes a request to the server, passing its auth token; the server verifies the token and uses the S3 library to generate a unique, short-lived URL for upload, which it returns to the client. The client then makes a PUT request directly to the S3 URL. There's no revocation step: the URL simply stops working once its expiration time passes.
Multipart signed upload is much harder and requires signing every chunk.
Just google s3 signed upload there are a few tutorials from Amazon.
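For anyone curious how the signing itself works, here's a simplified stdlib-only sketch of the idea. This is illustrative HMAC query-string signing, not AWS's actual SigV4 algorithm, and the secret, host, and parameter names are all made up:

```python
import hashlib
import hmac
import time
from urllib.parse import quote, urlencode

SECRET = b"server-side-signing-key"  # placeholder; never shipped to clients

def make_upload_url(bucket: str, key: str, expires_in: int = 300) -> str:
    """Server side: mint a short-lived URL the client may PUT to."""
    expires = int(time.time()) + expires_in
    payload = f"PUT\n/{bucket}/{key}\n{expires}".encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    query = urlencode({"Expires": expires, "Signature": sig})
    return f"https://{bucket}.storage.example.com/{quote(key)}?{query}"

def verify_upload(bucket: str, key: str, expires: str, sig: str) -> bool:
    """Storage side: accept the PUT only if the signature is valid and fresh."""
    if int(expires) < time.time():
        return False  # URL has expired; client must request a new one
    payload = f"PUT\n/{bucket}/{key}\n{expires}".encode()
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig)
```

Real S3 presigned URLs work the same way conceptually: boto3's `generate_presigned_url` computes the signature locally using the server's credentials, S3 re-derives it on receipt, and nothing verifies after the expiry timestamp. That's why no separate revocation call is needed.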
> However, we discovered after some time that the custom Python implementation for those workers was dropping up to 5% of the events. This was mostly due to the nature of how reading happens with Kinesis: every stream has multiple shards (ours up to 50!) and each reading client would use a so-called shard iterator to keep track of where it was reading last. Since the used machines could always crash, be recycled, or scaled down, we needed to save those shard iterators in some serialized format to Redis and share them across machines and process boundaries. Since we had so many shards, every once in awhile we would skip events and hence lose them.
I've never worked with Kinesis, but in Kafka you'd store offsets specifically to solve this issue. When one of the members of a consumer group drops out, the partition (read: shard) is automatically reassigned to another member. This gives an at-least-once delivery guarantee, which combined with idempotent actions gives effectively-once semantics. No need to lose any messages. What was the issue that the Dubsmash engineers were solving here?
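A toy stdlib model of that guarantee (the log structure, handler, and id field are made up for illustration; a real consumer would use the Kafka client's commit API instead):

```python
def consume(log, committed_offset, handler, seen):
    """At-least-once consumption: advance the committed offset only *after*
    a message is handled, so a crash replays messages rather than dropping
    them. Deduplicating on message id turns replays into effectively-once
    processing."""
    for offset in range(committed_offset, len(log)):
        message = log[offset]
        if message["id"] not in seen:    # idempotency: skip replayed messages
            handler(message)
            seen.add(message["id"])
        committed_offset = offset + 1    # "commit" only after success
    return committed_offset
```

Contrast this with the shard-iterator scheme described in the post: serializing iterators to Redis across crashing machines makes it easy to resume *past* unprocessed events, whereas committing after handling can at worst replay a few.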
Home-rolling a checkpoint-free event pipeline is a rookie mistake; it's a pity they didn't come across our Snowplow project (Apache 2.0 event pipeline running on Kinesis, Kafka and NSQ, https://github.com/snowplow/snowplow/).
> Although we were using Elasticsearch in the beginning to power our in-app search, we moved this part of our processing over to Algolia a couple of months ago;
Running an alternative solution with similar availability, performance and relevance will in most cases be substantially more expensive though. It really depends on your use case.
I am genuinely curious about the trade-offs, as the bad and the ugly are not mentioned. Realistically, there are a lot of moving pieces there, and yet the team of 3 keeps taking on experimental technology?
[1] http://www.wired.co.uk/article/european-startups-2016-berlin [2] https://techcrunch.com/2016/11/30/dubsmash-9m/