Repository data bytes does not show up as string literals in your code, or keywo...

reubenmorais · on Jan 13, 2020

Python 3's approach means bytes/str poisons the whole expression. So if you want to do something like:

"%s/%s" % (repository_data_1, repository_data_2)

And have it work on Python 2 and 3, you're screwed.

luhn · on Jan 13, 2020

And Python 3's behavior is more correct—You can't just intermix binary and textual data, they're two different things. Python 2 would let you do that, and it would often cause subtle bugs with non-ASCII data. Python 3 requires you to encode/decode, so you're working consistently and explicitly with binary or text.

I don't quite understand your example. `b'%s/%s' % (b'abc', b'def')` works in both 2 and 3. So does `u'%s/%s' % (b'abc'.decode('utf8'), b'def'.decode('utf8'))`, if you wanted to get a unicode string out of it.
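
A minimal sketch of those two options side by side. The values bound to `repository_data_1` and `repository_data_2` are made-up stand-ins for the variables in the parent comment, and the bytes form assumes Python 2.7 or Python 3.5+ (see the article quote further down the thread):

```python
# Hypothetical stand-ins for the repository data in the parent comment.
repository_data_1 = b"abc"
repository_data_2 = b"d\xc3\xa9f"  # UTF-8 bytes for "d\u00e9f"

# Option 1: stay in bytes end to end.
joined_bytes = b"%s/%s" % (repository_data_1, repository_data_2)

# Option 2: decode at the boundary, then format as text.
joined_text = u"%s/%s" % (
    repository_data_1.decode("utf8"),
    repository_data_2.decode("utf8"),
)
```

Either way the types on both sides of `%` match, which is the property Python 3 enforces.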

reubenmorais · on Jan 14, 2020

> I don't quite understand your example. `b'%s/%s' % (b'abc', b'def')` works in both 2 and 3. So does `u'%s/%s' % (b'abc'.decode('utf8'), b'def'.decode('utf8'))`, if you wanted to get a unicode string out of it.

We're discussing the linked article, so I'm talking in the context of the linked article. I know it works now, but Python 3 initially removed %-formatting for bytes. I guess I should have used the past tense in my comment: "you were" screwed instead of "you are". From the article:

> Another feature was % formatting of strings. Python 2 allowed use of the % formatting operator on both its string types. But Python 3 initially removed the implementation of % from bytes. Why, I have no clue. It is perfectly reasonable to splice byte sequences into a buffer via use of a formatting string. But the Python language maintainers insisted otherwise. And it wasn't until the community complained about its absence loudly enough that this feature was restored in Python 3.5, which was released in September 2015.
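
For code that had to run on the versions in between, one possible workaround was to avoid `%` on bytes entirely. This is only a sketch: `join_path_bytes` and `HAS_BYTES_PERCENT` are hypothetical names, not something from the article.

```python
import sys

def join_path_bytes(*parts):
    """Join byte strings with b"/" without using %-formatting on bytes."""
    return b"/".join(parts)

# True where b"%s" % (...) is available: Python 2, or Python 3.5 and later.
HAS_BYTES_PERCENT = sys.version_info[0] == 2 or sys.version_info >= (3, 5)

joined = join_path_bytes(b"abc", b"def")
```

`bytes.join` exists on every Python 3 release, so the helper itself needs no version check. The flag is only there to show where the gap was.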

pdonis · on Jan 14, 2020

> Python 3's behavior is more correct—You can't just intermix binary and textual data, they're two different things.

Python 3's behavior as far as forcing you to explicitly recognize data type conversions is more correct, yes.

Python 3's behavior in assuming that nobody would ever need to do "text-like" operations like string formatting on byte sequences was not. At least this particular wart was fixed. But there are still a lot of places where Python makes you use the str "textual" data type when it's not the right one.

Python 3's behavior in making individual elements of a byte string integers instead of length-one byte strings is, frankly, braindead.
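
For anyone who hasn't run into it, a short illustration of that indexing behavior on Python 3 (the variable names are arbitrary):

```python
data = b"abc"

first = data[0]           # an int on Python 3, not a length-one byte string
first_slice = data[0:1]   # slicing still yields bytes

# Iterating yields ints as well, so getting length-one byte strings
# back takes an explicit slice per position.
as_ints = list(data)
as_single_bytes = [data[i:i + 1] for i in range(len(data))]
```

On Python 2, `data[0]` would have been the one-character string `'a'`.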

acdha · on Jan 14, 2020

That example works fine in both Python 2 and 3 if you’re not mixing types incorrectly. If you are, it will appear to work on Python 2 before failing the first time you encounter non-ASCII data, and it tends to greatly confuse people with errors which would have been caught immediately on Python 3. I’ve seen teams waste hours trying to track down errors like that.
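
A small sketch of the "caught immediately" case on Python 3. The helper `try_mix` is a hypothetical name used only to capture the exception:

```python
text = u"caf\u00e9"          # textual data
raw = text.encode("utf8")    # the same data as bytes

def try_mix(a, b):
    """Return a + b, or the TypeError raised when the types don't mix."""
    try:
        return a + b
    except TypeError as exc:
        return exc

# On Python 3 this is a TypeError regardless of the content.
result = try_mix(text, raw)
```

On Python 2 the same concatenation triggers an implicit ASCII decode of `raw`, so it succeeds for ASCII-only input and raises `UnicodeDecodeError` only once a non-ASCII byte shows up.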

zo1 · on Jan 14, 2020

Exactly this. The number of times I saw juniors "fixing" these sorts of obscure, subtle bugs with `str_var.decode("utf-8").encode("latin-1")`, and only after trying every other combination of those two decode/encode operations, is mind-boggling.

reubenmorais · on Jan 14, 2020

It works after Python 3.5. From the article:

> Another feature was % formatting of strings. Python 2 allowed use of the % formatting operator on both its string types. But Python 3 initially removed the implementation of % from bytes. Why, I have no clue. It is perfectly reasonable to splice byte sequences into a buffer via use of a formatting string. But the Python language maintainers insisted otherwise. And it wasn't until the community complained about its absence loudly enough that this feature was restored in Python 3.5, which was released in September 2015.