I got tired of spam calls and texts, so I built a script that automates the opt-out process across 500+ data brokers on a monthly schedule.
Where I need help:
The heuristic approach misses a lot. Many of the generic sites have unique flows the four generic strategies don't catch. I'm looking for people who want to:
- Verify which generic sites are actually succeeding vs. silently failing
- Add explicit broker definitions for high-value sites that are currently on the generic path
- Test on non-macOS (launchd scheduling is macOS-only; cron fallback would help Linux/Windows users)
- Handle email verification flows (script submits the form but can't click confirmation links in your inbox)
Repo: https://github.com/stephenlthorn/auto-identity-remove
No personal data in the repo — setup script prompts for your info locally and keeps it gitignored.
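For the cron fallback mentioned in the list above, here is a minimal sketch of generating a monthly crontab entry on Linux. The command `node run.js` and the working directory are placeholders for illustration, not names taken from the repo.

```python
import shlex


def monthly_cron_line(command: str, workdir: str, day: int = 1,
                      hour: int = 9, minute: int = 0) -> str:
    """Build a crontab line that runs `command` once a month.

    Crontab field order: minute hour day-of-month month day-of-week.
    """
    if not 1 <= day <= 28:
        # Days 29-31 do not exist in every month, so some runs would be skipped.
        raise ValueError("day must be between 1 and 28")
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour or minute out of range")
    # Quote the directory so paths containing spaces survive the shell.
    return f"{minute} {hour} {day} * * cd {shlex.quote(workdir)} && {command}"


if __name__ == "__main__":
    # Hypothetical entry point; substitute whatever the repo actually runs.
    print(monthly_cron_line("node run.js", "/home/me/auto-identity-remove"))
```

The printed line can be pasted into `crontab -e`. Capping the day at 28 is a design choice so that the job fires every month, including February.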
Does this current approach succeed for many sites? I see that this repo was clearly vibe coded or at least heavily used AI to write it. That can be fine, it just makes it more difficult to follow how much was done already and how much is left to get this properly working. As for email verification, a stopgap solution could be to just tell me to click confirm on the emails and which senders to look out for. Properly reading the actual inbox on record across providers could be difficult, it requires an actual email client. Also, forgive me if I'm off base on this one, but your comment appears to be AI generated. If so, that violates site guidelines.
> Don't post generated comments or AI-edited comments. HN is for conversation between humans.
1. It asks you to optionally sign up for a bunch of other services like Spokeo
2. It asks for access to your email via Apple's Mail app which I don't use
3. I got a lot of 404s anyway
4. Many sites require manual intervention to work
Nice idea, but it needs a LOT of TLC to make it generally useful. I suspect that having a non-numeric "zip" code and a non-US address might be breaking a lot of the automation.
Mail isn't documented as a requirement, but the first item in the Requirements section is "macOS (uses launchd for scheduling and Messages for iMessage)".
True. But Apple also enshittified the UI and they had an unforgivable data loss issue with Mail back in the Catalina days, which is why I switched to Thunderbird and haven't looked back.
Back in 2011 or so the Yellow Pages still delivered physical phone books to every address in the state where we were. My city literally sent out an extra off-cycle recycling truck the next day to pick them all up. Everyone threw them out.
Well my coworkers and I realized that the opt out form just needed an address. We contemplated pulling all known addresses for the entire country and automating submitting them all over several months to opt everyone out. I don’t think it ever materialized but we had a good chuckle about the emergency meeting the Yellow Pages web devs would have had and at what percentage of opt outs.
Around the same time frame, my brother rented some rooms in his house to people who had the occupation of actually delivering those phone books. (This was in a different country, but apparently the Yellow Pages existed everywhere.)
The delivery-people got overwhelmed and eventually just resorted to putting the stacks and stacks of phone books into piles and burning them. It took a long time until they got caught because nobody really misses a phone book.
Pre-internet the commercial phone book was actually fairly useful. The "problem" was that most people didn't need it updating as often as the phone book company would have liked.
> I don’t think it ever materialized but we had a good chuckle about the emergency meeting the Yellow Pages web devs would have had and at what percentage of opt outs.
They would just pretend they didn't receive the opt outs, like half of the direct mailers and spammers out there.
I've gone through the trouble of trying to get Uline to stop sending gigantic paper catalogs to my PO Box two or three times per year. They have a form, they just ignore the requests:
One day many years ago, I saw an item that I did an impulse buy on. It wasn't an ad, but just lame ol' bored surfing discovery. I never even saw the rest of the site the item was bought from. Later I started receiving printed catalogs from the site. It followed me through 3 moves, and I never used USPS forwarding. I assume the site eventually died as the catalogs just stopped showing up
I managed to get off Uline’s list; I think they have a phone number. If one ever orders from them again the process needs to be repeated. This was a physical address, not a PO Box.
Similarly I remember being at Australia Post discussing data privacy for a project and I couldn't help but make the wisecrack remark "don't y'all routinely distribute millions of individuals' personal data every year and just leave the information lying about on people's doorsteps for anyone to access?"
I got tired of spammers having my information, so I built a tool that submits an up-to-date copy of my information to over 500 websites. Surely this will help.
Jokes aside, I unironically suspect the purpose of many opt-out forms is merely to record the up-to-date information.
Agreed. Any time I click an “Unsubscribe” link in an email, that takes me to a site where I have to provide my email or indeed, do anything more than click “confirm,” I leave. I assume it either resets some kind of consent trigger or sells my data to a new third-party vendor. The assumption of bad faith is now baked into my interaction with almost every corporate entity.
Sometimes the people who set up the email service just forget or don't bother to add the receiver's email to the URL parameter when you click unsubscribe, so it'll ask for your email again, which is always an annoying step.
I refuse to believe that “someone just forgot” to implement a user-friendly feature whose omission coincidentally benefits their company. It is not a coincidence, and it was not done unintentionally. The same way that it is not a coincidence that the “unsubscribe” link is always in six-point font the same color as the rest of the email footer. Code does not happen in a vacuum. Code does not get pushed to production without vetting and approval. As I say, the assumption of bad faith is baked in.
There are plenty of dark patterns in digital marketing, and you're generally right about the thinking.
But there is a (somewhat plausible) defense here: if someone forwards you an email and you hit the unsubscribe link, then it unsubscribes them; not you. Requiring the user to enter their email helps ensure you don't accidentally unsubscribe the wrong person.
That said — the most impactful thing anyone can do to punish dark pattern digital marketing behavior is to report the message as SPAM in your email client. That'll hurt their delivery rates and damage their sending reputation with email providers.
> But there is a (somewhat plausible) defense here: if someone forwards you an email and you hit the unsubscribe link, then it unsubscribes them; not you.
Pre-filling the address in the field is easy and prevents that. But if I get redirected to an empty address field, I immediately close and mark as spam. I refuse to reward that behavior.
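As an illustration of the pre-filled approach: the link can carry the recipient's address plus a signature, so the landing page can show the address for confirmation instead of asking the user to type it. This is only a sketch. The endpoint URL, the parameter names `email` and `sig`, and the secret are all hypothetical.

```python
import hashlib
import hmac
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"replace-with-a-real-secret"  # hypothetical signing key


def _sign(email: str) -> str:
    """HMAC-SHA256 of the address, hex encoded."""
    return hmac.new(SECRET, email.encode("utf-8"), hashlib.sha256).hexdigest()


def unsubscribe_link(email: str,
                     base: str = "https://example.com/unsubscribe") -> str:
    """Return a link that carries the recipient's address and a signature."""
    return f"{base}?{urlencode({'email': email, 'sig': _sign(email)})}"


def prefilled_address(url: str) -> Optional[str]:
    """Return the address to pre-fill, or None if it is missing or altered."""
    params = parse_qs(urlparse(url).query)
    email = params.get("email", [""])[0]
    sig = params.get("sig", [""])[0]
    # Compare as bytes in constant time; a mismatch means the link was edited.
    if email and hmac.compare_digest(sig.encode("utf-8"),
                                     _sign(email).encode("utf-8")):
        return email
    return None
```

A forwarded email still shows the original recipient's address on the landing page, so the person clicking can see whose subscription they are about to cancel.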
I think they’re doing it because of your exact behavior: one-click unsubscribe links are easy to do even if you’re on mobile and aren’t giving the process your full attention. Making you enter your email is a barrier.
They already know your email, I don’t see why getting it again would sell it to a new vendor. Clicking an unsubscribe link already verifies you are a real person.
Very true, the act of unsubscribing itself signifies that the email is still live; more bad faith. As to why not sell it to a new vendor, because that would allow them to check a box that says “we offer a feature that allows users to opt out of data sharing agreement with the partners defined in the TOS and onboarding process.”
Thankfully most let you opt out with a single click - but if not I will put the whole domain in my killfile, so I won’t get any emails from them ever again
You can definitely outlaw this.
Under GDPR it’s much harder to lawfully collect and sell personal data large scale.
Not saying it doesn't still happen, but it gives you a legal basis to fight against it. noyb.eu / Max Schrems and others do some excellent, important work on that front.
I suspect they mean that while a lot of sites put on a good show of pretending to give you a choice, they probably either ignore your wishes anyway or they make sure to ask you so often that occasionally you’ll forget and they’ll share your data with data brokers anyway.
The problem is currently being addressed at the low level. I think we have to go higher up the food chain and go after the data brokers.
recaptcha v3 will require the human to have a Google certified android device (i.e. no GrapheneOS or LineageOS etc.) and a dedicated iOS app which leaks device ID and other data.
Google will get to know every user browsing the web and link it to a smartphone. Since they’re rolling out government issue ID verification at the OS level, this change will allow Google to identify a random web visitor to a govt ID.
I don't see "device id" on that page, just "Identifiers"
I don't believe that iOS apps get a stable device ID. It may be that Google is generating one, and it can be shared across Google apps. But I'm pretty sure there is no global device ID that's common across all apps.
The original point stands that if you use recaptcha and you're signed in to a Google account on the device and you then use another Google app and you've done some form of identity verification with Google, then Google could link identity to recaptcha.
But I think the claim was over-broad about device ID.
The "device ID" part is probably false and a red herring. What actually matters is that google can correlate which challenges a given device is solving, so if it's solving 10k challenges per day, it can be marked as being suspicious.
No, on both android and ios device id implies some sort of identifier that's reusable across apps. Otherwise a uuid that you generate and write to storage could qualify as a "device id".
So, essentially a super cookie? That is, generated once (at random or arbitrarily) and then included with proof of work? But not a fingerprint or otherwise linked to identity?
But then that would not work against correlating fraud detection as sketched above. A client could simply reset the app every now and then to generate a new UUID.
>So, essentially a super cookie? That is, generated once (at random or arbitrarily) and then included with proof of work?
You're just describing a regular cookie.
>But not a fingerprint or otherwise linked to identity?
You'll have to reverse-engineer the app to figure out whether it's actually fingerprinting, and whether it's fingerprinting to make sure it's a real device (vs emulator) or it's fingerprinting to uniquely identify someone. I suspect they're complying with app store guidelines and not doing the latter, because it's not worth the PR hit just to vaguely improve a product responsible for <1% of their revenue.
>But then that would not work against correlating fraud detection as sketched above. A client could simply reset the app every now and then to generate a new UUID.
The attestation result contains a count of attested keys generated in the past 30 days, which detects this case without a "supercookie" that persists across uninstalls.
Yes regular cookie from Google's perspective, but super in that it works across sites. If for some reason you don't just take Google's word you might suspect they collude and share / sell your identity to the site as well...
> The attestation result contains a count of attested keys generated in the past 30 days, which detects this case without a "supercookie" that persists across uninstalls.
Ah. So there is something special limiting control over the UUID? Or is there some way of correlating the physical device to the attestation history?
Why wouldn't I be able to reset and re-enroll in the app and then have it generate me a fresh new cookie attestation history?
>Yes regular cookie from Google's perspective, but super in that it works across sites. If for some reason you don't just take Google's word you might suspect they collude and share / sell your identity to the site as well...
That's just third party cookies.
>Why wouldn't I be able to reset and re-enroll in the app and then have it generate me a fresh new cookie attestation history?
You can get a new uuid, but then that'll be associated with a key that has a high attestation count, which is also suspicious. It's like detecting spam from an account that has 1000 posts in 1 hr vs an ip that created 1000 accounts in one hr making one post each. Both are suspicious.
I still don't get how those 1000 posts tallied with the previous UUID would get correlated with the new UUID. If it's only source IP address or similar fingerprints, those are relatively easy to get rid of, hide, renew.
(At least, when your goal is to do as many fake attestations as possible rather than use your device for something more useful)
You buy a new phone, install the app, and get a uuid with 0 attestation count. Now what? If you try to use that uuid to farm attestations, it'll be easily linked to that uuid. If you try to uninstall/reinstall, the attestation count will count up, eventually making the newly created uuids immediately suspicious. You might try to create one uuid per month and then try to farm those indefinitely, but they could require you to reattest every month, which should come back with 0-1 attestations, but if you were farming uuids that'll be immediately caught.
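A toy model of the two checks described in this subthread, to make the "1000 posts from one account vs. 1000 accounts from one IP" analogy concrete. It is not a description of how reCAPTCHA or any attestation service actually works. The thresholds are made up, and `device_key` stands in for whatever hardware-backed signal links reinstalls on the same device, which is exactly the part the thread is unsure about.

```python
from collections import Counter

MAX_CHALLENGES_PER_UUID = 100   # assumed threshold, purely illustrative
MAX_UUIDS_PER_DEVICE_KEY = 3    # assumed threshold, purely illustrative


def suspicious(events):
    """Flag farming in a stream of solved challenges.

    events: iterable of (device_key, uuid) pairs, one per solved challenge.
    Returns (flagged_uuids, flagged_device_keys) as two sets.
    """
    events = list(events)
    per_uuid = Counter(uuid for _, uuid in events)

    uuids_by_device = {}
    for device_key, uuid in events:
        uuids_by_device.setdefault(device_key, set()).add(uuid)

    # Case 1: one identifier solving far too many challenges.
    flagged_uuids = {u for u, n in per_uuid.items()
                     if n > MAX_CHALLENGES_PER_UUID}
    # Case 2: one device minting many fresh identifiers (reinstall farming).
    flagged_devices = {d for d, uuids in uuids_by_device.items()
                       if len(uuids) > MAX_UUIDS_PER_DEVICE_KEY}
    return flagged_uuids, flagged_devices
```

Resetting the app only defeats the first check. The second one depends on some count that survives the reset, which is the role the attestation count plays in the comment above.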
I guess I just misunderstand what is being attested. Is the attestation proving that your randomly assigned UUID belongs to a human, or conversely, does a proof of work simply prove that your device "owns" some UUID?
You can characterize this commercial arrangement as whatever you want, but not meaningfully different than what they had before, where they were getting users to click boxes and charging businesses per "verification".
It's only Google's ReCaptcha that sucks, with its eternal gaslighting.
"Select stairs": okay, does that mean the railing too? And probably some percentage of people clicked rails, so now I have to meta it and guess if that percentage is enough to throw off my guess.
"Select motorbike": okay, but you're showing me a bicycle. I'll click "skip". FAIL. TRY AGAIN. Sighs.. okay, I guess the average person is so dim-witted they will misidentify a bicycle for a motorbike.
It’s not just Google. Look at Arkose, which are not only difficult for humans to solve, they’re difficult for humans to even understand (“move the particle to the correct orbit”).
And the "correct" pictures all show steps, not stairs.
> "Select motorbike"
And the "correct" pictures all show mopeds, not motorbikes.
Christ, don't get me started on taxis that aren't black, fire hydrants that aren't a yellow H sign (apparently I'm supposed to look for something like a yellow painted R2D2) and WTF is a "crosswalk" (a pedestrian crossing?).
It is gaslighting me into thinking I gave the wrong answer.
> No, there are multiple accepted answers.
Nope, even for very simple things like "select all fire hydrants" (which are extremely obvious) or "select all images with cars" (with the images only being images completely devoid of cars or only cars, no lorries or busses), you still get a fail.
I assume you work on Captchas, which makes it extra cute you're trying to gaslight me about the built-in gaslighting :). It is really obvious too because it doesn't happen when not using a privacy browser and/or VPN.
At any rate, I hope you internalize that your work has made everyone's everyday life a little more miserable. A net negative to society.
I think my browsing habits may have changed, as I rarely see captchas. However, just the other day, my son was frustrated by one that he said had taken him fifteen or more tries, and he still hadn't succeeded.
Yeah, that is a very common complaint about Google's recaptcha. If they don't like you, they actually just send you through an infinite failure loop, even though you keep solving them correctly.
Creating a Windows service is a bit harder (as Windows actually uses a real API for services rather than just relying on process spawning and scripting around that), but with task scheduler you can schedule tasks to run once a month in all kinds of ways.
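For the Task Scheduler route, a sketch of building the `schtasks` invocation for a monthly run. The task name and the command being scheduled are placeholders, not names from the repo. The function only builds the argument list, so it can be inspected on any OS.

```python
from typing import List


def monthly_schtasks_args(task_name: str, command: str,
                          day: int = 1, start_time: str = "09:00") -> List[str]:
    """Return the argument list for a monthly Windows scheduled task.

    /SC MONTHLY with /D <day> runs on that day of each month,
    /ST is the start time in 24-hour HH:mm format.
    """
    if not 1 <= day <= 28:
        # Stay within days that exist in every month.
        raise ValueError("day must be between 1 and 28")
    return [
        "schtasks", "/Create",
        "/SC", "MONTHLY",
        "/D", str(day),
        "/TN", task_name,
        "/TR", command,
        "/ST", start_time,
    ]


if __name__ == "__main__":
    # Hypothetical task name and command. On Windows this list could be
    # passed to subprocess.run(args, check=True) to register the task.
    print(monthly_schtasks_args("BrokerOptOut", "node C:\\tools\\optout\\run.js"))
```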
A Windows Service is something you (generally) want running 24x7. In fact I think a Windows Service seems very much like the wrong thing to do here. Services are not the only way to schedule things in Windows.
And you can "just" use nssm to wrap any arbitrary executable with what is needed to make it a windows service.
edit: Windows can use Node and Playwright just fine. I think the only thing this needs a Mac for is to schedule and send messages as an alert.
Services are executables, but they have dedicated entrypoints/"signals" for interaction with the service manager. That means you can't point a service at a batch file or powershell script, because those applications don't have the symbols to respond to the signalling from Windows.
The state tracking and manual fallback are the most interesting parts to me. For a tool like this, I’d really want a dry-run/audit mode that shows which fields would be submitted to which broker before anything is sent. The awkward threat model is that the tool reduces exposure, but a broken selector could also leak personal data to the wrong place.
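A rough sketch of what such a dry-run/audit mode could look like: build the full submission plan and show it without opening a browser or sending anything. The `Broker` structure, its field mapping, and the profile keys are assumptions for illustration and do not reflect the repo's actual broker definitions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Broker:
    name: str
    optout_url: str
    fields: Dict[str, str]  # form field name -> key in the local profile


def audit_plan(brokers: List[Broker], profile: Dict[str, str]) -> List[dict]:
    """Return what would be submitted to each broker, without sending it."""
    plan = []
    for broker in brokers:
        would_submit = {}
        missing = []
        for form_field, profile_key in broker.fields.items():
            if profile_key in profile:
                would_submit[form_field] = profile[profile_key]
            else:
                # Surface gaps instead of silently submitting a partial form.
                missing.append(profile_key)
        plan.append({
            "broker": broker.name,
            "url": broker.optout_url,
            "would_submit": would_submit,
            "missing": missing,
        })
    return plan


if __name__ == "__main__":
    brokers = [Broker("ExampleBroker", "https://example.com/optout",
                      {"fname": "first_name", "zip": "postal_code"})]
    for entry in audit_plan(brokers, {"first_name": "Ada"}):
        print(entry)
```

Reviewing this output before the first real run would show every destination URL next to the exact values headed there, which is the check that matters if a selector or broker definition is wrong.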
I always get paranoid about these things ever since the Streisand Effect became a thing, I feel like the outcome is you enter a second list and this second list is maybe less friendly as it turns you into an outlier which brings other kinds of problems
It feels like the system is rigged and we need a better answer
No, the system is rigged in a different way. There is an actual data broker industry that's mostly hidden from view; and then there are hundreds upon hundreds of ephemeral "look up your neighbor" websites that you know about and end up submitting opt outs to, only for half of these websites to disappear and a new crop to show up in a year or two.
All of the quack around identity removal is essentially null and void in the United States. Companies have literally zero legal obligation to nuke your personal data and they will happily keep whatever you provide to them on file.
Until there is serious legislation like GDPR and the right to be forgotten in the United States, it's a non-starter.
A few of these services ask you to go find your record among their lists first, so you can confirm which record you want removed using the URL of the record. So either it has to guess on that, or simply isn't doing it.
Has anyone had any luck deleting themselves from the data brokers who sell cell data to political texters and/or survey companies? Those are the ones I really want to opt out of
The mention of states is because (besides the author likely being located in the States) many of the opt out forms are US only and filter on US state. You could probably just use an uncommon state or territory like Guam and try it, it would still submit opt outs for matching records on sites that are international. For example https://www.familytreenow.com/optout is listed in the broker list, and that seems to work for international profiles.
Good point, could be a solid benchmark. Sites are adversarially built to resist automation and success is verifiable later when records actually disappear, so harder to game than WebArena.
Why not just comment out the macNotify() calls in watcher.js and then run it periodically? There are also a few calls to send iMessages that you should remove.
Isn't this just a way to confirm that your email is still active, and a few milliseconds later you will be getting a lot of new spam from websites you never knew about?
I said it a couple of times already in this thread but the only thing that will stop these “businesses” is stronger data privacy regulation.
Sometimes it feels like US-Americans have lost all faith in their government’s ability to improve their lives. I can understand it, but at the same time, where will this lead?
I like the idea. But I stopped reading when I saw that I have to pay for an API, because it looks like advertising for it in npm package form.
Then I read HN, took a look at the code, GitHub, and found no website. Just an unknown author asking me to pass all my private data to a service to get it removed.
Everything is unclear, even if the intention might be good.
I must admit, I also have trust issues with services like Aura, NordProtect, or SurfShark. They sell you the same thing, plus more. Companies that collect all your information you don't want to see anywhere else. They might sell them or get breached.
I would love to see a do-one-thing-well, open-source alternative to them. But IMO this alternative must be super understandable and secure. Maybe npm and a (for me) unknown API is the wrong choice for that.
There are times where I immediately guess it, the recent mitchell post of AI psychosis was something that I recognized (which is now at 2k upvotes)
But there are other times where I am wrong too and I even comment on threads with less upvotes because the topic is so interesting yet my comment just ends up being isolated.
It's really more like a 50/50.
Even the one post of mine which had reached the front page of Hackernews was something that I absolutely knew could reach front page but then there weren't much responses for a few days but then after a few days, I saw that it was re-uploaded (I think that Hn selects a few submissions which are interesting, I forgot how that mechanism worked) and then I reached the front page of Hackernews ;)
Either way, I think people should just make what they feel is interesting but I remember reading some article once which said a few things which this article follows:
1. I built XYZ... gets more frontpage than we built XYZ...
2. having (Open source) in the title increases the chances too
This article has both of them, so it's definitely interesting to see it on the front page. Either way, it's a really interesting project :-D
I honestly find these kind of useless. I think a service that simply inserts thousands of bogus entries is way more valuable since a search is useless if it returns 100 addresses for where you live.
I use Optery for about two months a year, seems to do a good enough job for most of the data brokers. There are also discounts or promo codes to lower the price as well.
I tried Optery. It got a good chunk done in two months, then the rest were just pending for a year... until I cancelled. Felt like they were just keeping me on the monthly dole while they didn't do anything.