
It defaults to the system encoding. I don't use Python on Windows, but Windows evolved its default encoding over time: code pages were popular in Windows 9x; starting with the NT-based versions (2000, XP...) they used UTF-16, I believe; and then with Windows 7(?) it became UTF-8. Perhaps Python needs to be updated to reflect that?

You can also specify the encoding when calling open.



> Windows evolved its default encoding over time: code pages were popular in Windows 9x; starting with the NT-based versions (2000, XP...) they used UTF-16, I believe; and then with Windows 7(?) it became UTF-8.

They bolted on a separate set of functions that took UCS-2 and now take UTF-16.

The actual code pages, to this day, are legacy things that are mostly 8 bits. My system is set to code pages 437 and 1252, for example.
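To illustrate why the active code page matters, here is a minimal Python sketch that decodes one and the same byte with the two code pages mentioned above. The byte value 0xE9 is an arbitrary choice for illustration, not something from this thread.

```python
# The same raw byte maps to different characters under different
# legacy 8-bit code pages.
raw = b"\xe9"  # arbitrary example byte (an assumption for illustration)

as_cp437 = raw.decode("cp437")    # code page 437 (OEM, classic console)
as_cp1252 = raw.decode("cp1252")  # code page 1252 (ANSI, Western European)

# Both results are a single character, but not the same one.
print(repr(as_cp437), repr(as_cp1252))
```

Text written under one code page and read back under the other silently turns into different characters, with no error raised.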

They put together a code page for UTF-8 but it's behind a 'beta' warning.


> They bolted on a separate set of functions that took UCS-2 and now take UTF-16.

NT actually bolted on 8-bit versions of the native Unicode functions. FooBarA is a wrapper around FooBarW.

> They put together a code page for UTF-8 but it's behind a 'beta' warning.

Codepage 65001 has been a thing for quite a while. It's just that it's variable-width, and few applications are ready to handle that when they assume a 1:1 or 2:1 relationship between bytes and characters. It does sort of work for applications that don't do anything too weird to text, though, and can be a useful workaround in such cases to get UTF-8 support into legacy applications.
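A rough Python sketch of the bytes-per-character point, using a few arbitrarily chosen sample characters. The helper name `encoded_lengths` is made up for this example.

```python
# Bytes per character vary in UTF-8, and UTF-16 is not fixed-width either.
samples = [
    "A",           # U+0041, ASCII
    "\u00e9",      # U+00E9, e with acute accent
    "\u20ac",      # U+20AC, euro sign
    "\U0001F600",  # U+1F600, outside the Basic Multilingual Plane
]

def encoded_lengths(ch: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, UTF-16-LE byte count) for one character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))

for ch in samples:
    utf8_len, utf16_len = encoded_lengths(ch)
    print(f"U+{ord(ch):04X}: {utf8_len} byte(s) in UTF-8, {utf16_len} in UTF-16-LE")
```

Code that assumes a fixed 1:1 or 2:1 ratio breaks on the later samples: UTF-8 needs anywhere from one to four bytes here, and the last character needs a surrogate pair (four bytes) even in UTF-16.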

But in general, Windows is UTF-16LE and the code pages are indeed legacy cruft that no application should touch or even use. Sadly much software ported from Unix-likes notices »Hey, there's a default encoding in Windows too, so let's just use that«.


The default file encoding for Windows was changed to UTF-8 in Python 3.6. That particular problem on that particular platform is now a thing of the past.

It was just an example of why implicit conversions in the standard library functions don't save you from having to think about encodings. You get much more robust and user-friendly programs when you explicitly consider your encodings and the error-handling strategies to go with them.
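As a minimal sketch of what explicit handling could look like in Python: the helper `read_text`, its defaults, and the sample data are all assumptions made up for this example, not anything from the thread.

```python
import os
import tempfile

# Hypothetical helper for illustration: always pass encoding and errors
# explicitly instead of relying on whatever default applies.
def read_text(path: str, encoding: str = "utf-8", errors: str = "strict") -> str:
    """Read a text file with an explicit encoding and error policy."""
    with open(path, encoding=encoding, errors=errors) as f:
        return f.read()

# Write bytes that are valid cp1252 but not valid UTF-8.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write("caf\u00e9".encode("cp1252"))
    path = tmp.name

try:
    try:
        read_text(path)  # strict UTF-8: fails loudly on the bad byte
    except UnicodeDecodeError as exc:
        print("strict decode failed:", exc.reason)

    lossy = read_text(path, errors="replace")     # bad byte becomes U+FFFD
    correct = read_text(path, encoding="cp1252")  # right encoding, no loss
    print(repr(lossy), repr(correct))
finally:
    os.remove(path)
```

Which policy is right depends on the program: failing loudly suits data pipelines, while a lossy replacement may be acceptable for displaying logs. Either way the choice is made deliberately rather than inherited from the platform.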



