One interesting thing I learned while trying to handle local time and NTP on a Raspberry Pi Pico is that the TZ environment variable (usually something like "Etc/UTC", pointing to a zoneinfo file) can also contain an inline TZ rule, like "CET-1CEST,M3.5.0/2,M10.5.0/3". [1]
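On a regular Unix box you can see such an inline rule take effect without any zoneinfo files at all, via `time.tzset()` (a sketch; `tzset` is Unix-only):

```python
import os
import time

# A POSIX TZ rule: CET (UTC+1), switching to CEST (UTC+2) on the last
# Sunday of March at 02:00, and back on the last Sunday of October at 03:00.
os.environ["TZ"] = "CET-1CEST,M3.5.0/2,M10.5.0/3"
time.tzset()  # re-read TZ (Unix only)

winter = time.localtime(0)           # 1970-01-01 00:00 UTC -> 01:00 CET
summer = time.localtime(1688212800)  # 2023-07-01 12:00 UTC -> 14:00 CEST
print(winter.tm_hour, winter.tm_isdst)  # 1 0
print(summer.tm_hour, summer.tm_isdst)  # 14 1
```

This is handy on microcontrollers precisely because there is no tzdata on the device: the whole DST rule fits in one string.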
That unix domain socket solution sounds really nice. I wonder if it would be possible to send something naughty in the Host header (like something with ../../.. in it) to misuse this, or whether nginx does some validation before it reaches the proxy_pass...
I also tried to hack together my own solution [0] just for fun, but I didn't know about the unix socket part, so in the end I went with traefik and redis. :)
I updated the post late last night to address the security bits of the Host header. Based on my understanding of the nginx documentation and some brief testing, I don't think path traversal in the Host header is possible -- nginx throws a 400 instead of a 502, which indicates the request isn't making it to the proxy_pass. By the time it reaches the proxy_pass, the $host variable is essentially guaranteed to match the server_name regex -- so to further tighten it up, you could allow only alphanumeric characters in your server_name regex.
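As a rough sketch of that tightening (the domain, capture name, and socket path here are made up, and the exact `unix:` syntax is worth double-checking against the nginx docs):

```nginx
server {
    listen 80;
    # Only alphanumeric subdomains can match, so the captured $app
    # can never contain "/", ".", or anything else path-like.
    server_name ~^(?<app>[a-z0-9]+)\.example\.com$;

    location / {
        proxy_pass http://unix:/run/apps/$app.sock;
        proxy_set_header Host $host;
    }
}
```

With the character class that strict, a naughty Host header simply never matches the server block, so it falls through to the default server (or a 400) before proxy_pass is even consulted.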
I just checked out your solution and also learned a new trick about ssh! I didn't know that setting the port to 0 would cause the tunnel's port to be allocated dynamically. It makes sense; I knew about that 0 behavior from typical Linux processes binding to port 0, but never thought to apply it to an ssh tunnel.
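For anyone else who hadn't seen it, the trick looks like this (host and port here are just placeholders):

```shell
# Ask sshd to pick a free remote port for the reverse tunnel; it reports
# the choice on stderr, e.g. "Allocated port 49152 for remote forward
# to localhost:3000".
ssh -N -R 0:localhost:3000 user@example.com
```

It mirrors the usual bind(2) behavior where asking for port 0 means "give me any free port".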
I'm by no means an expert, but it could be interference from other electronic devices, the orientation of the ferrite rod antenna inside the clock (it can't pick up the signal if it's pointing end-on toward the transmitter), or maybe a low battery in the clock could also cause something like this (based on the fact that the Pico wasn't able to power the receiver module in my experiments).
I have a vague memory of a security talk where they used TXT records to deliver a payload to a machine, and they had to write the code so that the rows returned in the TXT records could be run in any order, because the order in which TXT records are returned is not deterministic.
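I don't remember how that talk solved it, but one common way around the ordering problem is to prefix each TXT chunk with a sequence number and sort on reassembly -- a rough sketch:

```python
import random

def reassemble(txt_records):
    """Rebuild a payload from TXT chunks of the form '<index>:<data>',
    tolerating whatever order the resolver returned them in."""
    indexed = (record.split(":", 1) for record in txt_records)
    return "".join(data for _, data in sorted(indexed, key=lambda p: int(p[0])))

chunks = ["0:echo ", "1:hello; ", "2:uname -a"]
random.shuffle(chunks)  # simulate non-deterministic DNS response order
print(reassemble(chunks))  # echo hello; uname -a
```

The alternative the talk apparently chose -- making every row independently runnable -- avoids even needing the index.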
When I first encountered GPT-4, I was pessimistic about the future of my job too. Now I think it will be a productivity multiplier in the right hands (or so I hope). You can get proper answers from it if you somewhat know the domain. I can ask GPT-4 about programming and mostly get what I want, but I'm useless if I have to get a decent picture out of DALL-E.
The question is what companies will do with this productivity multiplier. My pessimistic guess would be cost cutting and letting people go. A more optimistic view could be better software, better test coverage, improvements in code quality, more features delivered faster, or maybe more native applications if companies only need to develop one native app and AI can generate the apps for the other platforms.
I appreciate seeing this article that I wrote 4 years ago resurface :) Happy to answer any questions about it or the tech behind it.
We also use a SQLite database, and in a manner similar to the OP’s article. We use it to track which content-defined chunks are at which offsets in which files on disk. We deduplicate those chunks on the CDN so you have to download less data when updating the game, but on disk we need to recreate the files as originally uploaded because that’s how games load them.
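In the same spirit (this schema is hypothetical, not our actual one), the core mapping is just "which chunk lives at which offset in which file":

```python
import sqlite3

# Hypothetical schema: deduplicated content-defined chunks (keyed by hash)
# plus the placements needed to reconstruct the original files on disk.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE chunks (hash TEXT PRIMARY KEY, length INTEGER);
    CREATE TABLE placements (
        hash TEXT REFERENCES chunks(hash),
        path TEXT,
        off  INTEGER
    );
""")
db.execute("INSERT INTO chunks VALUES ('abc123', 65536)")
# The same chunk is downloaded once but can appear in several files/offsets:
db.executemany("INSERT INTO placements VALUES ('abc123', ?, ?)",
               [("game/pak0.pak", 0), ("game/pak1.pak", 131072)])

rows = db.execute(
    "SELECT path, off FROM placements WHERE hash = 'abc123' ORDER BY path"
).fetchall()
print(rows)
```

That one-to-many placements table is what lets the CDN serve each chunk once while the client writes it back everywhere it belongs.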
We’ve expanded the use of this tech quite a bit since the article. For example, my team has started using the patcher internally to manage our 2-million-file vendored code and binaries repo, as a replacement for Git LFS of sorts. Cloning the repo from scratch became 10 times faster, and so did switching branches, especially for remote devs, since the files are served through CloudFront.
Some of the more interesting work we’ve done recently has been optimizing the core content-defined chunking algorithm. The GearHash rolling hash algorithm from FastCDC is decently fast, but we were able to improve chunking speeds by 5x with vectorized implementations and using AES-NI instructions in a clever way to avoid the GearHash table lookups. It can do 11 GB/s per core now using AVX-512 instructions.
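For context, the scalar baseline we were beating looks roughly like this (a sketch, with a made-up gear table and mask, not our production code):

```python
import hashlib

# FastCDC-style gear hash: 256 pseudo-random 64-bit "gear" values,
# one table lookup and one shift-add per input byte.
GEAR = [int.from_bytes(hashlib.sha256(bytes([i])).digest()[:8], "big")
        for i in range(256)]
MASK = (1 << 13) - 1  # 13 mask bits -> ~8 KiB average chunk size

def chunk_boundaries(data: bytes):
    """Yield cut points after which a new chunk starts."""
    h = 0
    for i, b in enumerate(data):
        h = ((h << 1) + GEAR[b]) & 0xFFFFFFFFFFFFFFFF
        if h & MASK == 0:
            yield i + 1
            h = 0
```

Because the boundary decision depends only on a sliding window of recent bytes, inserting data early in a file only shifts boundaries locally; the chunker resynchronizes and later chunks hash identically, which is what makes deduplication work. The vectorized versions compute the same function, just many lanes at a time.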
Another thing we did recently was build a tool for repacking Unreal Engine Pak files so game assets maintain a consistent layout across versions, similar to what garaetjjte mentioned in a comment above about UnityFS files. This reduced the disk I/O needed to update VALORANT by over 90% and cut update times by half for players. The combination of content-defined chunking with tooling to make game data files friendlier to differential patching can make a huge difference.
[1] https://sourceware.org/glibc/manual/2.42/html_node/TZ-Variab...