Love Flask, but this has always been a missing tool. I have a question though - it seems like you're actually modifying the response data type for Flask routes so that it's a Pydantic model. Is that an optional approach? While I wish that were the official standard, if it is not optional then I think that's quite a big ask for maintainers of existing APIs who want to use your docs library. Regardless, I'm looking forward to trying it out! Looks great.
Actually, returning a Pydantic model directly isn't mandatory—it's just a recommended and convenient approach to ensure automatic data validation and documentation.
If you prefer, you can keep your existing route handlers as-is, returning dictionaries or other JSON-serializable objects. FastOpenAPI will handle these just fine. But using Pydantic models provides type safety and cleaner docs out of the box.
Can somebody please eli5 why it is so unanimously accepted that Python's package management is terrible? For personal projects venv + requirements.txt has never caused problems for me. For work projects we use poetry on the assumption that we would need something better, but I remain unconvinced (nothing was actually causing a problem when that decision was made).
In your requirements.txt, do you pin the concrete versions or leave some leeway?
If you aren't precise, you're gonna get different versions of your dependencies on different machines. Oops.
Pinning concrete versions is of course better, but then there isn't a clear and easy way to upgrade all dependencies and check whether CI still passes.
You should use freeze files. Whatever language you are using, you should specify your dependencies in the loosest way possible, and use freeze files to pin them down.
The only difference from one language to another is that some make this mandatory, while in others it's only something that you should really do and there isn't any other real option you should consider.
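Concretely, the pip-tools workflow is one common way to do this in Python: a loose, hand-maintained spec file plus a generated pin file. The package names and versions below are illustrative, and the transitive pins are truncated:

```text
# requirements.in - loose, hand-maintained
flask>=3.0
requests

# requirements.txt - generated with `pip-compile requirements.in`; commit it
blinker==1.8.2        # via flask
flask==3.0.3
requests==2.32.3
werkzeug==3.0.4       # via flask
```

You upgrade by re-running `pip-compile` (or `pip-compile --upgrade`) and seeing whether CI still passes, while humans only ever edit the `.in` file.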
- Anyone with dependencies on native/non-Python libraries.
Conda definitely helps with 2 and 3 above, and uv is at least a nice, fast API over pip (which is better since it started doing dependency checking and binary wheels).
More generally, lots of the issues come from the nature of python as a glue language over compiled libraries, which is a relatively harder problem in general.
There are no Windows-specific issues in venv + pip. Windows can be more painful if you need to compile C extensions, but you usually don't, since most commonly used packages have had binary wheels for Windows on PyPI for many years.
Installing anything in Windows is a system-specific problem.
If a PyPI package has binary dependencies, it's a pip problem on Windows. If it depends on APIs that aren't installed by default, it's a pip problem on Windows. If it depends on specific versions of something that isn't in Python, it's a pip problem on Windows. If it depends on some API whose implementation is provided for free, but not freely, and is full of constraints on how you can install it, it's a pip problem on Windows.
Most other package managers have the exact same problems. But some of the Python alternatives whose point people often don't understand were created exactly to solve those problems.
For using packages, venv + requirements.txt works, but is a bit clunky and confusing. Virtual environments are very easy to break by moving them or by updating your OS (and getting a new Python with it). Poetry is one alternative, but there are far too many options and choices to make. For building packages, there are similarly many competing options with different qualities and issues.
i think there might be merit to gdiamos's point that python is a popular language with a large number of users, and this might mean that python package management isn't unusually bad, but more users implies more complaints.
i think there was a significant step change improvement in python packaging around 2012, when the wheel format was introduced, which standardised distributing prebuilt platform-specific binary packages. for packages with gnarly native library dependencies / build toolchains (e.g. typical C/fortran numeric or scientific library wrapped in a layer of python bindings), once someone sets up a build server to bake wheels for target platforms, it becomes very easy to pip install them without dragging in that project's native build-from-source toolchain.
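the wheel filename itself encodes the target platform, which is what makes the "bake once per platform" model work. the filename below is illustrative; the tag scheme comes from the wheel spec (PEP 427, with the tags defined in PEP 425):

```text
numpy-2.1.0-cp312-cp312-win_amd64.whl
  |     |     |     |       |
 name version python  abi  platform
              tag     tag  tag
```

pip simply picks the uploaded wheel whose tags match the running interpreter and OS, and only falls back to building from sdist when no compatible wheel exists.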
venv + pip (+ perhaps maintaining a stack of pre-built wheels for your target platform, for a commercial project where you want to be able to reproduce builds) gets most of the job done, and those ingredients have been in place for over 10 years.
around the time wheel was introduced, i was working at a company that shipped desktop software to windows machines, we used python for some of the application components. between venv + pip + wheels, it was OK.
where there were rough edges were things like: we have a dep on a python wrapper library pywhatever, which requires a native library libwhatever.dll built from the c++ whatever project to be installed. but libwhatever.dll has nothing to do with python; maybe its maintainers kindly provide an msi installer, so when you install it on a machine it goes into the windows system folder. venv isn't able to manage it and offer isolation if you need to install multiple versions for different projects / product lines, as venv only manages python packages, not arbitrary library dependencies from other ecosystems.
but it's a bit much to blame python for such difficulties: if you have a python library that has a native dependency on something that isn't a python package, you need to do something else to manage that dep. that's life. if you're trying to do it on windows, which doesn't have an O/S level package manager.. well, that's life.
Try building a package and you will get hundreds of little paper cuts. Need a different index for some packages? It will work from the CLI with `pip install --index-url ...`, but pip will not let you pin a specific index to a specific package in a requirements.txt for... security reasons. That means good luck trying to "enforce" the CUDA version of pytorch without third-party tooling. So you either hard-code a direct link (which works, but breaks platform portability), or give up on making your project installable with `pip install .` or `python -m build`. Remember, pytorch basically has no CUDA builds anymore in its PyPI index and no way to get CUDA torch from there (but I think this might have changed recently?)
Oh, and if some package you are using has a bug or something that requires you to vendor it in your repo, well then good luck, because PEP 508 does not support installing another package from a relative link. You either need to put all the code inside the same package, vendored dependency included, and do some weird stuff to make sure the module you wanted to vendor is used first, or... you just have to use the broken package, again for some sort of security reasons apparently.
Again, all of that might even work when using pip from the cli, but good luck trying to make a requirements.txt or define dependencies in a standard way that is even slightly outside of a certain workflow.
And have them build with `pip install .` or `python -m build`? With the default setuptools config (or even any tweaks to it)? It works until you actually try to package the app; that's where the edge cases start piling up, a lot of them due to very weird decisions made on a whim in some random Discourse thread.
Per-package index URLs are explicitly not supported in requirements.txt, in setuptools metadata, or by the default python build tooling.
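To illustrate the asymmetry (the versions here are illustrative; the cu121 index URL is the one the pytorch install docs point at): pip accepts *global* index options in a requirements file, but standard project metadata has no field for an index at all:

```text
# requirements.txt - pip accepts global index options here, but there is
# no way to say "torch from this index, everything else from PyPI only":
--extra-index-url https://download.pytorch.org/whl/cu121
torch==2.4.0

# pyproject.toml - standard (PEP 508/621) dependency metadata
# has no notion of an index at all:
# [project]
# dependencies = ["torch==2.4.0"]
```

So anyone running `pip install .` against your pyproject.toml gets whatever PyPI serves under that name, regardless of which index you intended.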
venv + requirements.txt has worked for every single python project I made for the last 2 years (I'm new to python). Only issue I had was when using a newish python version and not having a specific library released yet for this new version, but downgrading python solved this.
Being new to the ecosystem I have no clue why people would use Conda and why it matters. I tried it, but was left bewildered, not understanding the benefits.
The big thing to realise is that when Conda first was released it was the only packaging solution that truly treated Windows as a first class citizen and for a long time was really the only way to easily install python packages on Windows. This got it a huge following in the scientific community where many people don't have a solid programming/computer background and generally still ran Windows on their desktops.
Conda also not only manages your python interpreter and python libraries, it manages your entire dependency chain down to the C level in a cross platform way. If a python library is a wrapper around a C library then pip generally won't also install the C library, Conda (often) will. If you have two different projects that need two different versions of GDAL or one needs OpenBLAS and one that needs MKL, or two different versions of CUDA then Conda (attempts to) solve that in a way that transparently works on Windows, Linux and MacOS. Using venv + requirements.txt you're out of luck and will have to fall back on doing everything in its own docker container.
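For example, a single environment.yml (package names and versions below are illustrative) declares the native libraries alongside the Python ones, and conda solves them together:

```yaml
# environment.yml (illustrative)
name: geo
channels:
  - conda-forge
dependencies:
  - python=3.11
  - gdal=3.8          # brings the native GDAL/PROJ/GEOS libraries too
  - numpy
  - pip
  - pip:
      - some-pypi-only-package   # hypothetical PyPI-only dependency
```

`conda env create -f environment.yml` then reproduces the whole stack, C libraries included, on Windows, Linux or macOS.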
Conda lets you mix private and public repos, as well as mirroring public packages on-prem, in a transparent way much smoother than pip, and has tools for things like audit logging, fine-grained access control, package signing and centralised controls and policy management.
Conda also has support for managing multi-language projects. Does your python project need nodejs installed to build the front-end? Conda can also manage your nodejs install. Using R for some statistical analysis in some part of your data pipeline? Conda will manage your R install. Using a Java library for something? Conda will make sure everybody has the right version of Java installed.
Also, it at least used to be common for people writing numeric and scientific libraries to release Conda packages first and then only eventually publish on PyPi once the library was 'done' (which could very well be never). So if you wanted the latest cutting edge packages in many fields you needed Conda.
Now there are obviously a huge class of projects where none of these features are needed and mean nothing. If you don't need Conda, then Conda is no longer the best answer. But there are still a lot of niche things Conda does better than any other tool.
> it manages your entire dependency chain down to the C level in a cross platform way.
I love conda, but this isn't true. You need to opt-in to a bunch of optional compiler flags to get a portable yml file, and then it can often fail on different OS's/versions anyway.
I haven't done too much of this since 2021 (gave up and used containers instead) but it was a nightmare getting windows/mac builds to work correctly with conda back then.
> it was a nightmare getting windows/mac builds to work correctly
I think both statements can be true. Yes, getting cross-platform Windows/Mac/Linux builds to work using Conda could definitely be a nightmare, as you say. At the same time, it was still easier with Conda than with any other tool I've tried.
I can sort of see the argument, if you really really need to lock down your dependencies to very specific versions, which I don't recommend you do.
For development I use venv and pip, sometimes pyenv if I need a specific Python version. For production, I install Python packages with apt. The operating system can deal with upgrading minor library versions.
I really hate most other package managers; they are all too confusing and too hard to use. You need to remember to pull in library updates, rebuild and release. Poetry sucks too, it's way too complicated to use.
The technical arguments against Python package managers are completely valid, but when people bring up Maven, NPM or even Go as role models I check out. The ergonomics of those tools are worse than venv and pip. I also think that's why we put up with pip and venv, they are so much easier to use than the alternative (maybe excluding uv). If a project uses Poetry, I just know that I'm going to be spending half a day upgrading dependencies, because someone locked them down a year ago and there are now 15 security holes that need to be plugged.
No, what Python needs is to pull in requests and a web framework into the standard library, and then we can start building 50% of our projects without any dependencies at all. They could pull in Django, it only has two or three dependencies anyway.
I have only ever really used venv. Poetry was fine but didn't give me any additional benefits that I could see. What does this offer? And more broadly, why do people consider pip to be a problem? I have literally never had any issues with it in any of my projects.
Personally, I use it for everything right now. It's faster to do `uv init`, add your dependencies with `uv add`, and then just `uv run <whatever>`. You can argue that poetry does the same, but `uv` also has a pipx alternative, which I find myself using more than the package manager that my distro offers, since I never had compatibility issues with packages.
The main advantage these tools (poetry, pipenv, uv, ...) offer is that they let you create lock files, which make your venv reproducible. Without that, you are kind of living in the wild west, where tomorrow your project can break. These tools help with projects that are supposed to do more than "runs on my machine".
Maybe I'm not fully grasping lock files then, but why isn't it sufficient to just pin the versions in the requirements.txt? Obviously it doesn't handle the python version itself but I just use pyenv for that. So my stance is just pyenv + venv seems to solve those problems. But then I see people singing the praises of these newer tools and I wonder what I am not getting
I was in the same position myself and had to learn the answers myself. It still doesn't really matter very much for me - I do more library than application development, and my applications would probably generally be fine with a wide range of dependency versions - if they even have dependencies. In short, different people have very different use cases, and you might simply not have the use cases that drive so many others to the tools in question. But it's still useful to understand the theory.
>why isn't it sufficient to just pin the versions in the requirements.txt?
Because your dependencies have dependencies.
You can, in fact, pin those as well explicitly, and as long as what you pin is a valid solution, Pip will (to my understanding; I haven't done a thorough, explicit test) happily grab exactly what you asked for. And as long as the version number is enough information, that will work for your application.
But some people also want to ensure they use specific exact builds of a package, and verify their hashes. Some of them might be using private indexes (or even mixing and matching with PyPI) and need to worry about supply chain attacks. And at any rate they don't want to account for all the transitive dependencies manually.
In principle, a lock file is a record of such a solution, including all the transitive dependencies, plus file hashes, etc. There isn't perfect agreement about what needs to be in them, which is why discussion of a standard lock file format has been tried a few times and is still ongoing (the current effort is https://peps.python.org/pep-0751/ ; see the linked threads for just some of the related discussion of the concept - there is much more, in the abstract).
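As a sketch of what such a record looks like in pip's own requirements format (hashes truncated here; note that pip switches into hash-checking mode for the whole file once any requirement carries a hash):

```text
# produced by e.g. `pip-compile --generate-hashes` (hashes truncated)
certifi==2024.8.30 \
    --hash=sha256:...
requests==2.32.3 \
    --hash=sha256:...
```

This is stronger than plain `==` pins: even if someone re-uploads a tampered artifact under the same version to a private index, the install fails because the hash no longer matches.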
Just to give a concrete example, it helped me with my projects involving langchain and several langchain extensions. I have a clear and consistent source of version record for each dependency.
And you can use uv as a full replacement for that with less bugs, faster performance and just generally a more pleasant experience. pip-compile -> uv pip compile and pip-sync -> uv pip sync.
(Though I think the high level interface is the better thing to use)
"existing tools" -- What do you mean by that? pip-tools [1] seem to be just another PyPI package. What makes them more available than any of the other tools? The other tools "exist" as well.
You don't have to 'activate' anything if you don't want to. The bin/ directory inside your venv contains the binaries and scripted entry points for the packages installed in your virtualenv.
uv init; uv add requests and you automatically get environment that can be easily shared between team members with predictable locking, with no “source .venv/bin/activate” bullshit.
Activating the venv only changes some environment variables. You can, as explained before, use the environment's Python executable directly instead. Activation does allow other tools to use the venv without having to know anything about it. The point is that you aren't then putting `uv run` in front of every command, because you don't need to have an integrator program that's aware of the venv being there.
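To make that concrete, here is a stdlib-only sketch (nothing beyond the `venv` module is assumed): create a throwaway venv and run its interpreter directly by path, with no activation script involved.

```python
import os
import subprocess
import sys
import tempfile
import venv

def run_in_fresh_venv(code: str) -> str:
    """Create a throwaway venv and run `code` with its interpreter, by path."""
    envdir = tempfile.mkdtemp()
    venv.create(envdir, with_pip=False)  # with_pip=False keeps creation fast
    bindir = "Scripts" if os.name == "nt" else "bin"
    exe = "python.exe" if os.name == "nt" else "python"
    py = os.path.join(envdir, bindir, exe)
    # No `source bin/activate` anywhere: we just invoke the venv's python.
    return subprocess.check_output([py, "-c", code], text=True).strip()

venv_prefix = run_in_fresh_venv("import sys; print(sys.prefix)")
print(venv_prefix != sys.prefix)  # True; the child lives in its own prefix
```

Activation is just a convenience that puts that bin/ directory on your PATH; every tool that "removes the activation step" is ultimately doing the equivalent of the subprocess call above.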
If you had a bad experience with a different locking tool, sorry to hear it, but that has absolutely nothing to do with the venv itself nor its activation script.
Referring to the entire category in general: mainly, it offers to keep track of what you've installed, and help you set up reproducible scripts to install the same set of dependencies. Depending on the specific tool, additional functionality can vary widely. (Which is part of why there's no standard: there's no agreement on what the additional functionality should be.)
> why do people consider pip to be a problem?
Many problems with Pip are really problems with the underlying packaging standards. But Pip introduces a lot of its own problems as well:
* For years, everyone was expected to use a workflow whereby Pip is copied into each new venv (this is surprisingly slow - over 3 seconds on my machine); users have accidentally invented a million different ways (mainly platform-specific) for `pip` to refer to a different environment than `python` does, causing confusion. (For a while, Setuptools was also copied in by default and now it isn't any more, causing more confusion.) You don't need to do this - since 22.3, Pip has improved support for installing cross-environment, and IMX it Just Works - but people don't seem to know about it.
* Pip builds projects from sdists (thereby potentially running arbitrary code, before the user has had a chance to inspect anything - which is why this is much worse than the fact that the library itself is arbitrary code that you'll import and use later) with very little provocation. It even does this when you explicitly ask it just to download a package without installing it; and it does so in order to verify metadata (i.e., to check that building the sdist would result in an installable wheel with the right name and version). There's wide consensus that it doesn't really need to do this, but the internals aren't designed to make it an easy fix. I have an entire blog post in my planned pipeline about just this issue.
* Pip's algorithm for resolving package dependencies is thorough at the cost of speed. Depending on the kind of packages you use, it will often download multiple versions of a package as sdists, build them, check the resulting metadata, discover that this version isn't usable (either it doesn't satisfy something else's requirement, or its own requirements are incompatible) and try again. (This is partly due to how Python's import system, itself, works; you can't properly support multiple versions of the same library in the same environment, because the `import` syntax doesn't give you a clean way to specify which one you want.)
* Because of how the metadata works, you can't retroactively patch up your metadata for old published versions because e.g. you found out that you've been using something that's deprecated in the new Python release. This has especially bad interactions with the previous point in some cases and explaining it is beyond the scope of a post here; see for example https://iscinumpy.dev/post/bound-version-constraints/ for a proper explanation.
But the main reason why you'd use a package manager rather than directly working with Pip, is that Pip is only installing the packages, not, well, managing them. Pip has some record of which packages depend on which others, but it won't "garbage-collect" for you - if something was installed indirectly as a dependency, and then everything that depends on it is removed, the dependency is still there. Further, trying to upgrade stuff could, to my understanding, cause breakages if the dependency situation is complex enough. And above all of that, you're on your own for remembering why you installed any given thing into the current environment, or figuring out whether it's still needed. Which is important if you want to distribute your code, without expecting your users to recreate your entire environment (which might contain irrelevant things).
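Pip does record the dependency graph in each installed package's metadata, it just doesn't act on it. A rough stdlib-only sketch of the "garbage collection" pip doesn't do, using `importlib.metadata`; the requirement parsing here is deliberately crude and ignores environment markers, so treat it as an illustration rather than a safe uninstaller:

```python
import re
from importlib import metadata

def _canon(name: str) -> str:
    """Normalise a distribution name (PyPI treats -, _ and . alike)."""
    return re.sub(r"[-_.]+", "-", name).lower()

def removal_candidates() -> set:
    """Installed distributions that no other installed distribution requires."""
    installed = set()
    required_by_something = set()
    for dist in metadata.distributions():
        installed.add(_canon(dist.metadata["Name"] or ""))
        for req in dist.requires or []:
            # Keep only the name part of e.g. "urllib3<3,>=1.21 ; extra == 'x'"
            dep = re.split(r"[\s;<>=!~\[\(]", req, maxsplit=1)[0]
            required_by_something.add(_canon(dep))
    return installed - required_by_something

# Leaves of the dependency graph: removing these breaks no other package.
print(sorted(removal_candidates()))
```

A real package manager keeps the extra piece of state this sketch can't recover: which of those leaves you asked for on purpose and which were merely dragged in and later orphaned.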
I tried to follow along with the textbook before but really struggled with the practical side - R is just another world in terms of dependency management and organisation/documentation (compared to python at least). The book had me install some version of a library that was since unsupported. So I thought I would be a nerd and do everything in python instead, but there I had other problems installing pymc. After some hours of failing I just gave up. Can anyone speak to the state of the dependencies in this edition? Has everything been updated? Versions listed? Would love to give this another shot
I happened to install everything two days ago. R version 4.3.3 (I use RSwitch to switch between R versions on Mac). You should use renv for dependency management. There were no problems installing the rethinking package; the cmdstanr package just needed to be installed with devtools instead of install.packages.
I’m mostly a Python guy, and didn’t find it particularly hard to get this going. Although I’m always left scratching my head when using RStudio/Renv/R. It’s such a horrible environment (always hanging, crashing, slow, the tooling sucks ass). I refuse to believe that I’m the only person who has RStudio hang and require a restart or get stuck on some uninterruptible process and requires forcing killing it at least once a day.
> require a restart or get stuck on some uninterruptible process and requires forcing killing it at least once a day.
Yes, I think I've been trained by crashes to subconsciously limit interactions with the RStudio GUI while something is running, e.g resizing a window seems to be surefire way to cause a crash.
When I was working on the exercises, I found the Rocker project (https://rocker-project.org/) + DevContainers in VSCode to be a winning combination.
Combined with OrbStack (for Docker on MacOS) and Quarto (which is a nice Markdown-based alternative to Jupyter) I would go so far as to call the experience pleasant.
I don't remember running into version-related problems. Maybe I didn't make it as far in the book as you.
I recommend using BiocManager to install all R packages - it is very good at automatically resolving dependency issues. The built-in R stuff just isn't great... but there are a few extremely good 3rd party systems.
Ultimately there is no good solution- really in any language- that I know of for using old unmaintained packages on a modern version of the language.
There is a version of everything in the book reimplemented in rStan, which is a fairly easy to install and well supported R package that wraps Stan. I don’t have the link but should be easy to google.
I think it’s a magnificent book - definitely repays the time to work though in detail.
Sure there are alternatives and I agree with the author's criticisms overall. But boxplots are a staple in statistics, and if your audience can reasonably be assumed to have some level of statistical training then boxplots are perfectly reasonable in my opinion.
Are you sure that well-trained audiences are able to accurately assess box plots? For instance, most drivers think they are better than average drivers.
It being a staple in statistics is also not a good argument. The information conveyed through box plots is used in lots of fields with different educational backgrounds. If a visualization, which is itself a human simplification of data, is hard to understand, it will be misunderstood by some. That means those people will not be able to advance their field of research as well as they could with better visualization methodologies.
Would you care to address the specific argument that the author makes about not using box plots with audiences? I swear, statisticians are among the most inertia-prone groups of people that I’ve ever worked with. You need a certain degree of “do it this way because it’s done this way” to deal with the amount of BS going on in this field.