
That's true provided you validate all UTF-8 input. As you probably know, each multi-byte UTF-8 sequence starts with a lead byte that indicates how many of the following bytes are part of the same code point. If your code blindly believes what that lead byte claims, a malformed or truncated sequence could easily trick it into reading past the terminating '\0'.
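To make that failure mode concrete, here is a minimal C sketch. The names `utf8_claimed_len`, `scan_trusting`, and `scan_checked` are made up for illustration and do not come from any library. Both scanners walk a NUL-terminated buffer; the first advances by whatever the lead byte claims, the second only steps over bytes that really are continuation bytes (`10xxxxxx`), so it can never jump over the terminator.

```c
#include <stddef.h>

/* Sequence length claimed by a lead byte; 0 if the byte cannot start a sequence. */
static size_t utf8_claimed_len(unsigned char lead)
{
    if (lead < 0x80) return 1;            /* 0xxxxxxx: ASCII */
    if ((lead & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((lead & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 0;                             /* stray continuation or invalid lead */
}

/* UNSAFE: trusts the lead byte. A truncated sequence right before the
 * terminator makes the index jump over '\0' and keep reading.
 * Returns the offset at which the scan stopped. */
static size_t scan_trusting(const char *s)
{
    size_t i = 0;
    while (s[i] != '\0') {
        size_t n = utf8_claimed_len((unsigned char)s[i]);
        i += n ? n : 1;                   /* skip invalid leads one byte at a time */
    }
    return i;
}

/* Safer: consume a continuation byte only if it is actually present.
 * '\0' is not of the form 10xxxxxx, so the scan always stops at the
 * first terminator, even on malformed input. */
static size_t scan_checked(const char *s)
{
    size_t i = 0;
    while (s[i] != '\0') {
        size_t claimed = utf8_claimed_len((unsigned char)s[i]);
        size_t n = 1;
        while (n < claimed && ((unsigned char)s[i + n] & 0xC0) == 0x80)
            n++;
        i += n;
    }
    return i;
}
```

On a buffer holding the bytes `'a', 0xE2, '\0', 'b', 'c', '\0'`, `scan_trusting` steps over the first terminator because `0xE2` claims a three-byte sequence, while `scan_checked` stops at offset 2, the same place `strlen` would.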


Yes, you should validate the UTF-8 at some point before you actually manipulate the contents of the string. Before that, you can just treat it as a pile of bytes.

You don't need to go down different paths of how you read those bytes based on the codepoint lengths until you're actually doing something with codepoints.
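As a rough illustration of "validate once, then work with code points", here is a sketch of a validation pass in C. The function name `utf8_is_valid` is hypothetical, and the set of checks (truncation, continuation bytes, overlong forms, surrogates, the U+10FFFF ceiling) is my reading of what "validate" should cover, not something spelled out in this thread. Byte-level operations such as `strlen`, `memcpy`, or `memcmp` need none of this; only code that decodes code points does.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if the len bytes at s are well-formed UTF-8. */
static bool utf8_is_valid(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char b = s[i];
        size_t need;            /* continuation bytes expected after the lead */
        unsigned long cp, min;  /* decoded value and smallest value for this length */

        if (b < 0x80)                { i++; continue; }  /* ASCII */
        else if ((b & 0xE0) == 0xC0) { need = 1; cp = b & 0x1F; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { need = 2; cp = b & 0x0F; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { need = 3; cp = b & 0x07; min = 0x10000; }
        else return false;           /* stray continuation or invalid lead */

        if (len - i <= need) return false;               /* truncated sequence */

        for (size_t k = 1; k <= need; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return false; /* not 10xxxxxx */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }

        if (cp < min) return false;                      /* overlong encoding */
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  /* UTF-16 surrogate */
        if (cp > 0x10FFFF) return false;                 /* beyond Unicode range */

        i += need + 1;
    }
    return true;
}
```

Once a buffer has passed a check like this, a decoder further down the line can rely on each lead byte's claimed length without rechecking it at every step.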



