One key thing to know is that encoding (UTF-8, UTF-16, UTF-32) is a completely separate problem from rendering text. I have had a couple of people say to me recently something along the lines of, "We don't need text shaping since UTF-8 takes care of it." That isn't remotely true. An encoding gets you a series of Unicode code points. To render them, these code points must first have the bidirectional algorithm (bidi) applied, and the "runs" that come out of the bidi algorithm are then shaped. The text shaper uses OpenType tables within the font to convert these code points into a series of glyph indices with x/y offsets. The renderer then works entirely on glyphs, which might not even map back to a code point in the font.
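To make the "encoding only gets you code points" part concrete, here is a rough sketch using nothing but the Python standard library. The sample string and variable names are mine, picked for illustration. It does not do any bidi reordering or shaping; it only shows that all three encodings decode to the same code points, and that the bidi class of each code point is a Unicode property that has nothing to do with which encoding was used.

```python
import unicodedata

# Illustrative sample: Latin "abc", a space, then four Hebrew letters.
text = "abc \u05e9\u05dc\u05d5\u05dd"

# Three encodings give three different byte sequences...
encoded = {name: text.encode(name) for name in ("utf-8", "utf-16-le", "utf-32-le")}

# ...but decoding any of them yields the same list of code points.
decoded = {
    name: [ord(ch) for ch in data.decode(name)]
    for name, data in encoded.items()
}

# The bidi class is a per-code-point Unicode property. A real bidi
# implementation (UAX #9) would use these classes to split the text
# into directional runs, which would then be handed to a shaper.
bidi_classes = [unicodedata.bidirectional(ch) for ch in text]

for name, data in encoded.items():
    print(f"{name}: {len(data)} bytes")
print([f"U+{cp:04X}" for cp in decoded["utf-8"]])
print(bidi_classes)
```

Nothing in this output says which glyphs to draw or where to put them. That part needs a font and a shaping engine such as HarfBuzz.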
The HarfBuzz manual touches on some of this: https://harfbuzz.github.io/why-do-i-need-a-shaping-engine.ht...