I love gnu parallel, but that citation thing is a bit of a drag on parallel. You only have to do this once, but I see why you'd write a blog post to include the flag and not have to discuss the issue.
I wish Ole would relax on this. It's not really that appropriate to demand that anyone using parallel for academics cite a magazine article. It's only appropriate for people doing research in parallel algorithms, and in any case it would make citing easier if there was a journal article to reference.
There's nothing wrong with asking nicely, and I try to spread the word about gnu parallel. But the heavy handed demand is annoying, even if I only deal with it like once a year.
Oh, I've seen the --tollef option in the help doc, but I still didn't realize there was a complete alternative to gnu parallel. Definitely will have to try.
Mr. Hess makes a lot of good points, most I think I agree with, so I can sympathize with his frustration.
The point about transferring files seems interesting. I haven't used gnu parallel in remote/ssh mode aside from simple tests, but that seems wicked useful for some situations. Is there a real and simple alternative to running gnu parallel on multiple remote hosts? Is that something the moreutils version can also do?
The pypi page for pssh links to an old (dead?) Google Code repository. Do you know if the project is still being maintained?
For most uses I find that ansible ad-hoc commands give a nice balance -- just a tiny bit of learning curve, and only a little bit different than typing the single SSH command that you want to run.
I use Linux but I've never used parallel. At first, I thought you were making that up.
But, I looked at your username and realized I have seen you post before - and I couldn't imagine you being dishonest about this. You just don't seem like the type to make this sort of stuff up.
So, curious, I headed to Google.
Sure enough, and for those who don't know, the author wants you to cite parallel when you use it in conjunction with something that is going to be published. Now, to be fair, I've always included citation of the tools involved when I published. However, nobody has ever asked me to directly and nobody has nagged me to do so.
It's a bit taboo, I guess. You can pay money and not have to cite it. You can also feed it the --will-cite argument and it won't bug you. Still, it is probably in poor taste. If nothing else, it is unconventional.
I didn't find a whole lot of complaints about it. The only hyperbolic outrage was, interestingly enough, someone on HN who was quite uptight about it.
I guess Ole is within his rights to do that, but it is certainly unconventional. I wonder if anyone takes his inflated number of citations as a negative against him?
From what I've now read, there don't seem to be a whole lot of complaints. Even the mailing list thread was short and didn't contain any drama. The only drama I found was the post on HN.
I have discussed this before, I really hope the hyperbolic outrage wasn't me though! ;) I like parallel, and this issue is relatively minor. But I admit the language of the nag bothers me just a smidge.
> You can pay money and not have to cite it.
You can also use it without paying. The nag is neither a license nor contract, and the proposed money isn't a required fee. The language of the notice makes it sound like you have to pay, and makes it sound like part of the license.
> I guess Ole is within his rights
I agree completely. And I'm in favor of him getting what he wants too! He's right that PR is helpful for his cause, and I appreciate his cause. I just wish he'd ask without the nag, and use softer language.
Using parallel isn't a reason to cite it, any more than using LaTeX or emacs would be a reason to cite LaTeX or emacs. And if gnu parallel counts as previous research, it's very unlikely he has to ask.
> I didn't find a whole lot of complaints about it.
There are some, and I'd agree it's not a huge issue, by and large. The main discussions I've seen have been with package maintainers. It is a small issue, and the few times I have brought it up, I've had a ton of confirmation that it bothers other people similarly: a little bit.
No, it wasn't you. I didn't mention their username as I don't want to make it look like I'm calling someone out in a thread and I don't want to appear to be harassing other site members.
They haven't posted in a couple of years. Instead of calling them out by name, I'll just go ahead and link to the thread:
At first, I was thinking it couldn't be true - but then I noticed your username. As I mentioned, I've seen you post before and I didn't think you'd be dishonest. So, I figured I'd go investigate.
It is definitely unconventional. Another poster mentioned it violated GNU's guidelines but I doubt that is true. The GNU folks are, shall we say, really big on remaining ethically consistent. If it violated their guidelines, I'm really certain they'd remove it from their site.
I am not sure I like the precedent that it sets, though I am not seeing it copied by other projects.
Like you, I get why they might do it but, now that I know about it, I don't like the idea of it. I'm not irate, or even perturbed really. It absolutely wouldn't stop me from using the software if I needed it.
It does make me curious as to when people stopped listing their tools. I have academic publications with my name on them, so I know you should cite your tools so that others can reproduce your work.
That is the whole reason you cite your tools: reproducibility. It absolutely shouldn't be because of academic fame for the creator of the tools. It shouldn't be about the toolmaker at all. You cite tools, and versions of said tools, so that others can reproduce your work and, if they can't, they can see if it isn't reproducible due to part of the tool chain being different.
In fact, that's one of the great reasons for preferring permissibility-based software licenses - so that you have permission to share the exact version of the tools you used to do your research. Citing the toolset just to ensure the author gets a higher citation count is less than optimal.
I realize this veers off-topic but I wonder where things went wrong? I finished my dissertation in the early 1990s and haven't published anything since. I took my newly minted doctorate and headed for the private sector. Perhaps someone who remained in academia knows?
I am really curious as to when this changed and why it changed? Begging for citations, for something that should give very little credit, shouldn't be a thing. Worse, the situation should never be that someone feels pressured to do so. How did it even get to this point? What did I miss?
I have to assume that gnu parallel is currently meeting gnu's guidelines here, perhaps by allowing users to mute the nag. I do wonder if this particular guideline was written specifically for parallel.
> It does make me curious as to when people stopped listing their tools.
That's an interesting question! Certainly some people still do list their tools.
My story isn't entirely different than yours, but I've continued to publish occasionally since my thesis, as well as edit and review papers for several journals.
For me, I think it comes down to methodology. If a program used represents previous research, then it should be cited. If a program used does factor into the methodology of the paper, then citing it for reproducibility is a great idea. If either of those is true with parallel, I will absolutely cite gnu parallel.
But parallel is generally used only to speed things up, and does not affect the methodology at all. I guess it could be nice for reproducibility, in case there are bugs. But if parallel is only used incidentally like that, and the publication isn't about parallelism, and parallelism doesn't affect the output, it's not something you should cite. The journals I've submitted to would normally ask you to remove noise like that if you included it. (And as an editor, I've asked people to remove non-academic references, or at the very least footnote them instead.)
I realize it's not a widespread problem, but take the idea to a logical extreme -- do I cite all my tools? Should I cite my version of Linux, and include that I used zsh instead of bash? I process my output using sed, Perl, awk, Python, numpy, and I did my user experiments using Chrome 55 with JavaScript and Angular. The list of tools I use is very long. As a paper reviewer or editor, I'd be annoyed if I had to wade through that. And the number of tools that affect the methodology is small; those are the ones I care about.
An author should simply include their entire source code as a single citation, rather than individually cite any tools. That satisfies reproducibility without adding any unnecessary noise to the appendix, or treating gnu parallel as a part of the research when it's used only incidentally.
Also, Ole's really asking for PR more than he's asking for academic citations. There are lots of other ways to give him PR and help him. I feel like the emphasis on academics in his citation nag is slightly separate from the overall goal.
If you look there, he says, "... please cite as per ..." So, it isn't required as a part of the license or condition of use. It's just begging.
I suspect that's how it's not in violation of the GNU terms and GPLv3, but I'm not an expert.
And you should cite what version of Perl you used, for example. You should also ensure the source for Perl is available for future researchers. That's why open source is so valuable in academia.
Obviously there is a reasonable limit. If it potentially had an impact, cite it. The key word is reasonable.
I don't know enough about parallel to comment about the viability of it impacting the output. I still find it alarming that they feel compelled to beg for citations.
Also, yeah, when I cited software, it went into the acknowledgments section. This being a different era, I included my email address (the Internet was not world wide back then, so to speak) so that people could contact me and I could mail them a copy of software that I wrote, both compiled and the source.
I'd cite any software that was reasonable to consider as relevant. Where possible, I'd cite a scientific article. A couple of times, the software wasn't necessarily all that important for the science, but I'd found it so useful that I'd cite it - though that was more to draw attention to it.
I do now wonder if it is a generational thing. Namely, when I was still in academia, there wasn't as much software as there is now. The use of computers was still fairly new. Citing our software tools was a bit more unique and citing COTS software was probably even more rare.
That may have something to do with it. While I still read a lot of papers, I'm completely removed from academia. I suspect I missed something along the way. It has been nearly 30 years - that's eons in the world of computers.
I would totally recommend giving it a try anyway, I don't mean to turn anyone away. Gnu parallel is really nice. I use it a ton for things like batch image resizing. I wish the citation request would take another form instead, but it really is minor.
You would then not be allowed to call it GNU Parallel due to possible trademark confusion. This is also why we have names like CentOS (not RedHat Free) and IceCat (not Firefox Free).
There was a court case in Germany about this (for some CMS tool - IIRC) where a forker used a similar name. The verdict was pretty clear: Forking was OK (copyright law - permission by GPL), but keeping the name was not (trademark law - no permission by GPL).
Good Q! I don't know, but I also feel like that could be a bit dirty, without having other reasons to fork. I might not want to encourage or support that.
There are some (non-fork) projects that provide the same functionality, with the stated motivation being in part because of the citation thing.
Nothing homebrew can do about it, of course, but this illustrates one of the implications of parallel doing something unexpected; package managers have to field the complaints.
I’ll look again later today. I tried to find one I saw earlier quickly when I replied above, but I didn’t see it. I think it was also called “parallel” and it mentioned the citation being a factor.
If the author wants to extract ten seconds of your time once a year, that's just the price of admission. They're not uptight; they're trying to advance their career.
This is a very important problem. We devs love to make tools, and we love to do it for free. But you can't eat idealism. Parallel is a nice tool, but it's probably not as impressive as it seems when it comes to landing a job.
You're free to decry the author and to point out alternatives. It's practically tradition. But I identify with where the author is coming from. How would you feel if you spent a bunch of time on a tool and the top comment blasts it for reasons unrelated to its merits?
> How would you feel if you spent a bunch of time on a tool and the top comment blasts it for reasons unrelated to its merits?
I love parallel, and I spread the word and participate in the PR that Ole is asking for.
> If the author wants to extract ten seconds of your time once a year, that's just the price of admission.
The ten seconds is not the problem. The problems are:
- The citation nag notice is confusing people on their legal responsibilities. It has prevented people from using it due to the fear of liability it causes. (RE: "If you pay 10000 EUR you should feel free to use GNU Parallel without citing." ... "If you use '--will-cite' in scripts you are expected to pay the 10000 EUR, because you are making it harder to see the citation notice.")
- Ole's citation is not from a scientific or peer-reviewed publication. It simply cannot be used in some contexts. In many contexts, a citation of Parallel is highly inappropriate.
- The small amount of time it takes to either cite Ole or read and understand this thread is largely irrelevant to the question of whether a citation is either warranted or appropriate or allowed (by the actual GNU license, by a user's employer, and/or by the publication they're submitting to). Understanding the license is a one-time event, and it's very important for the license to be legally clear. Parallel's citation notice is causing license confusion. The question is whether Parallel can and should be used without having to worry (forever) about the legal consequences of the contract I've agreed to by using the software.
- This approach isn't scalable. If other GNU tools, or other free software started using the same language as Ole, it would cause widespread problems. Mr. Hess is correct, this is very antithetical to Unix.
"Tut-tutting" is accurate. Your criticisms are mostly mistaken: He isn't demanding a citation. Let's look at what he's actually doing.
The best idea I have come up with so far is printing a citation notice
on STDERR if output is to a terminal when GNU Parallel starts. The
notice will not be printed if STDERR is redirected (to a pipe or a
file), it will also not be printed if --no-notice is given and it can
be disabled completely by running --bibtex once:
"""
When using GNU Parallel to process data for publication please cite:
O. Tange (2011): GNU Parallel - The Command-Line Power Tool,
;login: The USENIX Magazine, February 2011:42-47.
This helps funding further development.
To get rid of this notice run 'parallel --bibtex' once or use '--no-notice'.
"""
That's just about the most straightforward nag screen I've ever seen. It includes both the rationale and how to disable it permanently. Run --bibtex once!
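For anyone who wants the conditions in the quoted proposal spelled out, here is a minimal Python sketch of the decision it describes. This is only my reading of the quoted text, not GNU Parallel's actual implementation (which is Perl); the function name `should_print_notice` and its three parameters are hypothetical, and how the real tool remembers that `--bibtex` was run is not stated in the quote, so it is modelled as a plain boolean.

```python
import sys


def should_print_notice(stderr_is_tty, no_notice, silenced_by_bibtex):
    """Decide whether to show the citation notice, per the quoted proposal.

    All three parameters are illustrative stand-ins:
      stderr_is_tty      -- True if STDERR goes to a terminal (not a pipe or file)
      no_notice          -- True if --no-notice was given on this invocation
      silenced_by_bibtex -- True if the user ran --bibtex once in the past
                            (how that is persisted is an assumption left out here)
    """
    return stderr_is_tty and not no_notice and not silenced_by_bibtex


def maybe_print_notice(argv, silenced_by_bibtex=False, stream=None):
    """Print the first line of the quoted notice when the conditions hold."""
    stream = sys.stderr if stream is None else stream
    no_notice = "--no-notice" in argv
    if should_print_notice(stream.isatty(), no_notice, silenced_by_bibtex):
        print(
            "When using GNU Parallel to process data for publication please cite:",
            file=stream,
        )
        return True
    return False
```

The point of the sketch is that the notice only appears in one of the eight possible combinations: interactive terminal, no flag, and never silenced. Scripts that redirect STDERR never see it.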
--
> The citation nag notice is confusing people on their legal responsibilities. It has prevented people from using it due to the fear of liability it causes.
Whoever decided not to use parallel due to this was being silly. There are no legal concerns.
> Ole's citation is not from a scientific or peer-reviewed publication. It simply cannot be used in some contexts. In many contexts, a citation of Parallel is highly inappropriate.
He is asking for a citation where possible, to help him out.
> This approach isn't scalable. If other GNU tools, or other free software started using the same language as Ole, it would cause widespread problems.
But other tools don't do that. You can accept this one case to help this one author.
> Mr. Hess is correct, this is very antithetical to Unix.
Y'know what's more antithetical to Unix? When the authors can't build tools because it makes no sense to spend their limited time working on projects that further label them as "not a webdev, so why would I hire them?"
There are still plenty of jobs for non-webdevs, but millions of new programmers have popped up in recent years. Those non-webdev jobs are getting sparser and more competitive.
This argument is full of holes, but the overall point is that Unix as an ideology has been steadily losing ground for the last decade. It's not financially lucrative to be a Unix ideologue. You could argue that that's just the price of adhering to the philosophy. But when a project is actively shunned merely for making a polite citation request, what are onlookers supposed to think?
This is just about the least evil type of hustling that the author could do, yet it's still being treated as some kind of offense. How dare he ask you to run --bibtex.