When WebRTC was being developed, a lot of the underlying protocols were derived from existing standards (RTP, SDP, etc.), ostensibly to make it easier to bridge to legacy video-conferencing systems that may have been running SIP or similar.
Beyond that, there is a ton of machinery to accommodate different device capabilities, network conditions, and the realities of the Internet (NATs, firewalls, and so on). WebRTC has tried to be a "nice thing" for developers from an API perspective for the use case of streaming realtime video to the browser, but of course all those knobs under the hood make it seem like a hodgepodge, rightly or not.
Would a clean-slate design with more tightly scoped goals fare better? Probably. But the underlying complexity needs to be handled somewhere and there are always trade-offs between control and ease-of-use.