> Managing multiple network upstreams (e.g. for network failover or load balanci...

jcgl · 2026-03-13T07:49:57 1773388197

> * Hosts react to the change by reconfiguring via SLAAC and/or DHCPv6, depending on the settings in the RA

This is the linchpin of the workflow you've outlined. Anecdotal experience in this area suggests it's not broadly effective enough in practice, not least because of this:

> * Existing client connections are still dead, [1] but the host gets to know that their global IP address has changed and has a chance to take action, rather than being entirely unaware

The old IP addresses (afaiu/ime) will not be removed before any dependent connections are removed. In other words, the application (not the host/OS) is driving just as much as the OS is. Imo, this is one of the core problems with the scenario, that the OS APIs for this stuff just aren't descriptive enough to describe the network reconfiguration event. Because of that, things will ~always be leaky.

> [1] I think whether they die slow or fast depends on how the router is configured

Yeah, and that configuration will presumably be sensitive to what caused the failover. This could manifest differently based on whether upstream A simply has some bad packet loss or whether it went down altogether (e.g. a physical fault).

In any case, this vision of the world misses on at least two things, in my view:

1. Administrative load balancing (e.g. lightly utilizing upstream B even when upstream A is still up

2. The long tail of devices that don't respond well to the flow you outlined. It's not enough to think of well-behaved servers that one has total control over; need to think also of random devices with network stacks of...varying quality (e.g. IOT devices)

simoncion · 2026-03-13T09:12:19 1773393139

> The old IP addresses (afaiu/ime) will not be removed before any dependent connections are removed.

I have two reactions to this.

1) Duh? I'm discussing a failover situation where your router has unexpectedly lost its connection to the outside world. You'd hope that your existing connections would fail quickly. The existence of the deprecated IP shoudn't be relevant because the OS isn't supposed to use it for any new connections.

2) If you're suggesting that network-management infrastructure running on the host will be unable to delete a deprecated address from an interface because existing connections haven't closed, that doesn't match my experience at all. I don't think you're suggesting this, but I'm bringing it up to be thorough.

> ...the OS APIs for this stuff just aren't descriptive enough to describe the network reconfiguration event.

I know that Linux has a system (netlink?) that's descriptive enough for daemons [0] to actively nearly-instantaneously start and stop listening on newly added/removed addresses. I'd be a little surprised if you couldn't use that mechanism to subscribe to "an address has become deprecated" events. I'd also be somewhat surprised if noone had built a nice little library over top of whatever mechanism that is. IDK about other OS's, but I'd be surprised if there weren't equivalents in the BSDs, Mac OS, and Windows.

> In any case, this vision of the world misses on at least two things, in my view:

> 1. Administrative load balancing...

I deliberately didn't talk about load balancing. I expect that if you don't do that at a layer below IP, then you're either stuck with something obscenely complicated or you're doing something like using special IP stacks on both ends... regardless of what version of IP your clients are using.

> 2. The long tail of devices that don't respond well to the flow you outlined.

Do they respond worse than in the IPv4 NAT world? This and other commentary throughout indicates that you missed the point I was making. That point was that -unlike in the NATted world- the OS and the applications running in it have a way to plausibly be informed of the network addressing change. In the NAT case, they can only infer that shit went bad.

[0] ...like BIND and NTPd...

jcgl · 2026-03-13T13:08:20 1773407300

> 1) Duh? I'm discussing a failover situation where your router has unexpectedly lost its connection to the outside world. You'd hope that your existing connections would fail quickly. The existence of the deprecated IP shoudn't be relevant because the OS isn't supposed to use it for any new connections.

Well failover is an administrative decision that can result from unexpectedly losing connection. But it can also be more ambiguous packet loss too, something that wouldn't necessarily manifest in broken connections--just degraded ones.

If upstream A is still passing traffic that simply gets lost further down the line, then there's no particular guarantee that the connection will fail quickly. If upstream A deliberately starts rejecting TCP traffic with RST, then sure, that'll be fine. But UDP and other traffic, no such luck. Whereas QUIC would fare just fine with NAT thanks to its roaming capabilities.

> I know that Linux has a system (netlink?) that's descriptive enough for daemons [0] to actively nearly-instantaneously start and stop listening on newly added/removed addresses. I'd be a little surprised if you couldn't use that mechanism to subscribe to "an address has become deprecated" events. I'd also be somewhat surprised if noone had built a nice little library over top of whatever mechanism that is. IDK about other OS's, but I'd be surprised if there weren't equivalents in the BSDs, Mac OS, and Windows.

Idk, I'll have to take your word for it. Instinctively though, this feels like a situation where the lowest common denominator wins. In other words, average applications aren't going to do any legwork here. The best thing to hope for is for language standard libraries to make this as built-in as possible. But if that exists, I'm extremely unaware of it.

> I deliberately didn't talk about load balancing. I expect that if you don't do that at a layer below IP, then you're either stuck with something obscenely complicated or you're doing something like using special IP stacks on both ends... regardless of what version of IP your clients are using.

I presume you meant a layer above IP? But no, I don't see why this is challenging in a NAT world. At least, I've worked with routers that support this, and it always seemed to Just Work™. I'd naively assume that the router is just modding the hash of the layer 3 addresses or something though.

> Do they respond worse than in the IPv4 NAT world?

I've basically only ever had good experiences in the IPv4 NAT world.

> That point was that -unlike in the NATted world- the OS and the applications running in it have a way to plausibly be informed of the network addressing change. In the NAT case, they can only infer that shit went bad.

I'm certainly sympathetic to this point. And, all things being equal, of course that seems better! If NAT66 were to not offer sufficient practical benefits, then I'd be convinced.

But please bear in mind that this was the original comment I responded to (not yours). Responding to this is where I'm coming from:

> Why would IPv6 ever need NAT?