It... kinda didn't though. I mean, yes, it seems complicated when you look back and ask "why can't it just have been a flat space", etc... But the same point persists: you have to look at what came before.
Take a look at the complexities of 8080/Z80 or 6502 addressing modes, or the kind of tricks "big" 16-bit architectures like the PDP-11/70 were playing to stretch addressable memory. The 8086 was a breath of fresh air! Your code could be separate from your stack and from your heap, and all three could be a full 64k without any crazy bank switching or copying! And all you had to do was set up 4 segment registers and then ignore them. And everything you were used to running did so in a clean, unconstrained environment.
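To make the "set them up and then ignore them" point concrete, here's a rough sketch of the 8086 address calculation (16-bit segment shifted left 4 bits, plus a 16-bit offset, wrapped to 20 bits). The function names and the particular segment values are made up for illustration; they aren't from any real program.

```python
def physical_address(segment: int, offset: int) -> int:
    """8086 translation: segment * 16 + offset, truncated to 20 bits."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    return ((segment << 4) + offset) & 0xFFFFF


def window(segment: int) -> tuple[int, int]:
    """First and last physical address reachable through one segment."""
    return physical_address(segment, 0x0000), physical_address(segment, 0xFFFF)


# Hypothetical layout: three segment values spaced 64k apart, so code,
# data and stack each get a full, non-overlapping 64k window.
CS, DS, SS = 0x1000, 0x2000, 0x3000

if __name__ == "__main__":
    for name, seg in (("code", CS), ("data", DS), ("stack", SS)):
        start, end = window(seg)
        print(f"{name:5s} {start:#07x}..{end:#07x}")
```

Once those registers are loaded, the program itself only ever deals in plain 16-bit offsets, which is exactly what 8080-era code was already doing.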
Really, read that 8086 datasheet again (it's like 8 pages), it was great stuff at the time.
It also carries over to OS implementations. For example, many OSes designed for the 386 failed to implement demand paging, for which the CPU had some great support. I believe Windows continued using segmented addressing for backward compatibility with earlier releases of the OS.