> Go's string is guaranteed to be a series of bytes, not Unicode code points. I'...

Jasper_ · on Jan 14, 2020

I don't know how to explain it any simpler. Iterating over a str type in Python 3 enumerates Unicode code points. The length of a str type is the number of code points it contains. Reversing a str will reverse the Unicode code points it contains (not guaranteed to be a sane operation). Indexing into a str with foo[0] gives you back a str containing a single Unicode code point.
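A small Python 3 sketch of the four behaviours listed above. The sample strings are illustrative choices: "na\u00efve" contains U+00EF, which is one code point but two bytes in UTF-8, and "e\u0301" is a plain "e" followed by a combining accent.

```python
# Python 3 str: the public interface counts and slices by code point.
s = "na\u00efve"                      # "naïve"; U+00EF is one code point

code_points = list(s)                 # iteration yields 1-code-point strs
n_code_points = len(s)                # length in code points
n_utf8_bytes = len(s.encode("utf-8")) # for contrast: length of the UTF-8 bytes
third = s[2]                          # indexing returns a str, not a byte

# Reversal also works on code points, which is not always sane:
# "e" + U+0301 COMBINING ACUTE ACCENT renders as one "é",
# but reversing puts the accent in front of the "e".
decomposed = "e\u0301"
reversed_decomposed = decomposed[::-1]

print(code_points, n_code_points, n_utf8_bytes, repr(third))
print([hex(ord(c)) for c in reversed_decomposed])
```

Note that the reversal is well defined at the code point level. It just does not match what a reader would call "the characters" once combining marks are involved.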

This is not an implementation detail; it is fundamental to how the str type in Python 3 operates. I have not said anything about the internal storage of this type, only about the interface it publicly exposes.

takeda · on Jan 14, 2020

This is called a leaky abstraction. I can't think of a good reason for a high-level language to behave this way. If you index into a Go string, you can easily get something that is invalid on its own (a single byte from the middle of a multi-byte UTF-8 character). At least in Python you get a code point, and in Java you get a UTF-16 code unit.
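To make the byte-versus-code-point point concrete without leaving Python, the sketch below uses Python's bytes type as a loose stand-in for a byte-indexed string like Go's (where s[i] yields a byte). This is an analogy, not Go code, and the sample text is an arbitrary choice.

```python
# Code-point view (str) versus byte view (bytes) of the same text.
text = "h\u00e9llo"            # "héllo"; U+00E9 is a single code point
raw = text.encode("utf-8")     # b'h\xc3\xa9llo': U+00E9 needs two bytes

char_at_1 = text[1]            # 'é', a complete code point
byte_at_1 = raw[1]             # 195 (0xC3), an int: only the lead byte of 'é'

# A lone UTF-8 lead byte cannot be decoded by itself.
try:
    bytes([byte_at_1]).decode("utf-8")
    lone_byte_decodes = True
except UnicodeDecodeError:
    lone_byte_decodes = False

print(repr(char_at_1), hex(byte_at_1), lone_byte_decodes)
```

Indexing ASCII positions would give valid one-byte characters either way; the problem only shows up once the text contains multi-byte characters.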

lmm · on Jan 14, 2020

Python 3 strs should not be iterated over, sure. Ban that in your linter and you're in the same position you would be in with Rust. It's a misfeature, but it's still a detail.
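In Rust, `for c in s` does not compile for a `&str`; you have to choose `.chars()` or `.bytes()` explicitly. A lint rule gets Python partway there. The checker below is a hypothetical sketch built on the standard `ast` module (the names `StrIterationChecker` and `find_str_iteration` are made up for illustration). Without type inference it only catches two easy cases: `for` loops over a string literal, and `for` loops over a function parameter annotated as `str`. It ignores comprehensions, `async def`, and reassigned parameters.

```python
from __future__ import annotations

import ast


class StrIterationChecker(ast.NodeVisitor):
    """Hypothetical lint rule: flag `for` loops that iterate directly over a str."""

    def __init__(self) -> None:
        self.violations: list[int] = []             # line numbers of flagged loops
        self._str_params: list[set[str]] = [set()]  # str-annotated params per scope

    def visit_FunctionDef(self, node: ast.FunctionDef) -> None:
        params = node.args.posonlyargs + node.args.args + node.args.kwonlyargs
        names = {
            a.arg
            for a in params
            if isinstance(a.annotation, ast.Name) and a.annotation.id == "str"
        }
        self._str_params.append(names)   # enter this function's scope
        self.generic_visit(node)
        self._str_params.pop()           # leave it again

    def visit_For(self, node: ast.For) -> None:
        it = node.iter
        is_str_literal = isinstance(it, ast.Constant) and isinstance(it.value, str)
        is_str_param = isinstance(it, ast.Name) and it.id in self._str_params[-1]
        if is_str_literal or is_str_param:
            self.violations.append(node.lineno)
        self.generic_visit(node)


def find_str_iteration(source: str) -> list[int]:
    """Return the line numbers of `for` loops this sketch would reject."""
    checker = StrIterationChecker()
    checker.visit(ast.parse(source))
    return checker.violations
```

Usage: running `find_str_iteration` on a module containing `def f(name: str): for ch in name: ...` flags that loop, and the fix is to make the unit explicit, e.g. `for b in name.encode("utf-8")` for bytes.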