> Better off how? What does "better" refer to here? Size on disk? Data transfer? Time? CPU cycles? Brain cycles? Body health? Please be specific.
Getting shit done. Git doesn't handle checking blobs in very well, so you're pissing into the wind trying to do that because of philosophical opinions. It won't help you deliver your business objectives at the end of the day/week/month/year.
> This strikes me as profoundly myopic (maybe even willfully).
Based on a decade of experience, you do not need that incredibly strict level of control over reproducibility.
The bigger problem is just that your dependencies will change with new versions, and you need to pull in those versions and deal with the tech debt of integration. Even if you sit down and perfectly and accurately solve the problem of exact reproducibility of yesterday's builds, you will always have the problem of bumping versions and dealing with tomorrow's builds. Dealing with that constant incoming flow of technical debt starts to dwarf any small problems caused by slightly inconsistent "rebuildability", while the cost of pursuing the perfect ability to rebuild starts to climb. What you're dealing with is a multi-objective optimization problem[1]: the cost function is bounded by the finite resources available to you, the two problems cannot both be solved 100% under those constraints, and overall velocity suffers if the costs become too great. So you are looking for something close to a Pareto-optimal[2] tradeoff between the objectives rather than strict perfection along any one axis.
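(If you want the textbook form of [1] and [2] spelled out, it's roughly the following; f_1 and f_2 are just illustrative stand-ins for the two costs, neither of which is rigorously measurable in practice:)

```latex
% Two competing costs under one finite budget (illustrative notation only):
%   f_1(x) = cost of imperfect rebuildability of yesterday's builds
%   f_2(x) = cost of integrating tomorrow's dependency versions
%   c(x) <= B encodes the finite engineering resources available
\min_{x \in X} \ \bigl( f_1(x),\ f_2(x) \bigr)
\quad \text{subject to} \quad c(x) \le B

% A choice x^* is Pareto optimal if no feasible x improves one objective
% without worsening the other:
\nexists\, x \in X : \ f_i(x) \le f_i(x^*) \ \forall i
\ \text{and} \ f_j(x) < f_j(x^*) \ \text{for some } j
```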
But yeah, keeping things pretty consistent is useful for dealing with tomorrow's builds as well; that is why I suggested all the different lockfile approaches. The optimization problem is nonlinear, and up to a point solving yesterday's build problem does help tomorrow's build problem as well.
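(To be concrete about what the lockfile approaches buy you, the core idea is just "record a hash per dependency, verify before building". Here's a minimal sketch of that idea; the lockfile format, file names, and vendor layout are invented for illustration and aren't any particular tool's:)

```python
import hashlib
import json
import sys
from pathlib import Path

# Hypothetical lockfile, e.g. {"left-pad": {"version": "1.3.0", "sha256": "..."}, ...}
LOCKFILE = Path("deps.lock.json")
# Wherever the fetched dependency archives end up on disk.
VENDOR_DIR = Path("vendor")


def sha256_of(path: Path) -> str:
    """Hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def verify() -> bool:
    """True only if every locked dependency is present and matches its recorded hash."""
    lock = json.loads(LOCKFILE.read_text())
    ok = True
    for name, entry in lock.items():
        archive = VENDOR_DIR / f"{name}-{entry['version']}.tar.gz"
        if not archive.is_file():
            print(f"missing: {archive}")
            ok = False
        elif sha256_of(archive) != entry["sha256"]:
            print(f"hash mismatch: {archive}")
            ok = False
    return ok


if __name__ == "__main__":
    sys.exit(0 if verify() else 1)
```

npm's package-lock.json and Cargo.lock already record per-package integrity hashes along these lines, which is most of what you need to rebuild yesterday's build without also vendoring the dependency code into git.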
> Just to emphasize: you are correct to say that this is a cost you always have, whether or not you're checking the results of package installation into your repo. Not checking them in doesn't ipso facto cause any issues related to upgrades to go away, and checking them in doesn't ipso facto increase the amount of integration work you have to do. There's nothing magical about pulling the dependencies in over the network at the extreme last moment versus reading them from some other place on your own disk.
The problem is inherently in the bumping of versions, pulling in new code from upstream, and dealing with the emerging tech debt and integration problems. Once you have builds reasonably reproducible (lockfiles, etc.), then just dealing with all the new code you're pulling in dwarfs the benefits of iterating on _perfect_ reproducibility of yesterday's builds.
And in fact, if you are checking your artifacts into git _and then not bumping them_, you are just kicking the can down the road and hiding your future tech debt. Yeah, you can try to freeze time and build on 8-year-old versions of your build tools, but then one day you're going to have a business requirement that requires a newer toolchain (new O/S targets, new chipsets, whatever) and then you're stuck with solving 8 years of technical debt.

> 1. <https://en.wikipedia.org/wiki/Scientific_control>

Do you really think I'm dumb enough to need you to condescendingly cite this wikipedia page?

[1] https://en.wikipedia.org/wiki/Multi-objective_optimization

[2] https://en.wikipedia.org/wiki/Pareto_efficiency
Thank you for your response. Please do not assume ill intent on my part; there's nothing condescending (or passive aggressive, etc.) in linking the word "control" to the article on scientific controls. (There is also no attempt to seek philosophical purity on my part at the expense of productivity. I am _only_ interested in productivity and reason here.) However, I don't think you are thinking clearly. You're certainly not speaking clearly.
A reminder that the resolution up for debate is "Dependencies belong in version control". (Specifically, the author is making the case that they belong in the same version control system you're using to store your own application code that relies on those dependencies—i.e. dependencies should not live in e.g. some ZIP files "over there", nor should they be versioned solely in an orthogonal system like Cargo, NPM, or RubyGems, to the exclusion of your primary version control system. In other words, they should not be late-fetched from a package host immediately after cloning, or at some point leading up to and including immediately prior to the build. The author is saying you should check them into your repo. By doing this, once you've cloned your repo, you have the dependencies corresponding to the last version that you built/deployed/whatever at the time that you built/deployed it.)
I am not saying this to be condescending. I am saying this because it is important to be clear and for us to stay on topic rather than straying into totally unrelated territory, talking about totally unrelated things, or bringing hidden assumptions into the discussion, etc[1]. I believe that you are either responding to what could be reasonably characterized as perceived constraints that are unstated on my part (i.e. that I'm failing to disclose them but you understand them nonetheless—I'm not, and there are none there), or you are taking liberties in making assumptions of your own without stating them. This is bad, and it doesn't lead to clear thinking or useful conversation for the purpose of getting there.
> The problem is inherently in the bumping of versions, pulling in new code from upstream, and dealing with the emerging tech debt and integration problems. Once you have builds reasonably reproducible (lockfiles, etc.), then just dealing with all the new code you're pulling in dwarfs the benefits
Okay, that's nice, but... compared to what? How does late-fetching your dependencies from a third-party package repo sometime between cloning and build time solve the integration problem? (Alternatively, you can interpret this question as, "How does keeping a copy of those dependencies directly in your repo exacerbate integration problems?") Your position seems to hinge on some magical property where in the latter case, when your repo has its own copies of those dependencies and you don't have to fetch them in a separate step, integration somehow becomes hard. (NB: I'm not trying to be reductive. I'm trying to make sense of your position.) It doesn't; there is no such magical property. You were right the first time when you said "you _always_ have the cost of integration", with special emphasis on "always".
I'll repeat myself from before: if you're running into upgrade issues because of a change to a dependency, then not checking in your dependencies doesn't ipso facto solve any of those issues, and checking in your dependencies doesn't inexplicably cause any more. Where the bits are coming from—whether some location on your own disk, or streamed in at the last second over the network—doesn't matter in this regard. If there's a breaking change upstream, it's going to break no matter what.
(Am I being uncharitable? Am I misunderstanding what you're actually arguing for/against? I'm not asking out of convenience for myself, i.e., because it would be convenient for me if you were. It strikes me as something that nobody would actually advocate for, because of how obviously untrue it is. Again, I'm trying to make sense of your position based on what you've actually said. This is exactly why I pre-emptively pleaded that you be specific in your response.)
> Based on a decade of experience
I don't know why you think this is relevant. I have two decades of experience to your one. I am not a neophyte programmer who e.g. picked up React last year, is stymied or frustrated by all this tooling, and is looking for excuses to assure myself that it's not really all that important after all. I was around before any of this tooling was. I adopted this tooling when it showed up. But then I had a realization one day and stopped and said, "Waitaminute. What the hell are we doing? What _actual_ problem, in concrete terms, are we saying this solves, exactly?" My position is that we've sleepwalked into adopting complex tooling for which we can't clearly articulate the tangible benefit it's supposed to deliver when we use it this way. I can't articulate "The NPM hypothesis" myself, and in my post-realization prodding, I've been unsuccessful at getting anyone else to, either. I covered this all in my first comment to this submission[2]. But most importantly, none of this matters.
You seem to be approaching this as if it does matter—at least that's how I'm reading things. I could be wrong. (I'm just as open to the possibility that this is a poor reading as I am open to the possibility that I've overlooked something on the subject of package management—or anything else. I'm a programmer, after all. We write bugs and have to debug them and are shown every single day the things that we were wrong about. There's no reason programmers shouldn't be among the most humble people in the world, having gotten used to consistently being told they're wrong like this.)
The reality is that it doesn't matter. There are fact claims in play here that are within the realm of science, and they are either true or untrue. We should be able to subject them to scrutiny. Science doesn't care who's asking the questions, nor should it.
Feel free to point out anything that I'm overlooking or anywhere I've misrepresented what you're saying. If that has happened, I hope you do.