
The same thing is trying to happen in scientific publication that's been happening in every other published medium: scarcity is gone. So why not embrace the abundance? Well if there's any contrary case to be made, it's that the explosion in quantity inevitably reduces the average quality. In the arts, low quality corresponds to that ineffable quality of shittiness. In news, it's fakeness. In science, it's irreproducibility. So what do you do? All the former built-in scarcity-based ways of filtering for quality went out the window along with the scarcity itself. You can no longer use the limited space on physical sheets of papyrus (or vinyl LP records, or celluloid) as an excuse to exclude things. So you need new ways of sorting out good quality work from the (suddenly vast) quantities of mediocre work. Seems like you're doing pretty well if you're lucky enough to have a whole new class of people step in to help with the filtering part. Unless of course you believe a scientific discovery is fundamentally the work of a person, an ego, rather than a fact that existed long before that particular person happened to discover it.

Democratization ends elitism; unfortunately elites sometimes really do consist of the best of a given thing. Especially newer elites. Every elite starts out as a meritocracy and ends up some kind of weird legacy-based cabal/cartel that deserves to be overthrown (that's the one you're probably accustomed to thinking of when you hear the term "elitism").



(I agree with the parent poster, but this seems like the right place to add the following.)

One thing that people who claim "the scarcity is gone, let's just publish everything" do miss is that there still is a scarcity: The time and attention of researchers.

Let me expand a bit on that from my own field, the study of reconnection in astrophysical plasma. As you can tell from the description, it is not a well-defined, closed topic. Instead there are tons of more or less adjacent fields: the study of solar flares (which might be triggered by reconnection), the study of coronal heating (the heat might be due to magnetic field energy getting converted to heat by reconnection), astronomical observations of AGN jets (reconnection might be what produces the energetic particles we see in the SED of those sources), observations of pulsar wind nebulae (the energetic particles there might have been accelerated at the termination shock, or due to reconnection), observations of giant pulses in (some) pulsars (which might be due to reconnection at the Y point of the current sheet just outside the light cylinder), and so on. On top of the related physical topics, I need to stay on top of developments in the simulation method I use (particle-in-cell codes) and in several related simulation methods (either because improvements there might make them viable for studying reconnection where that wasn't possible before, or because a neat new trick in method X might also improve the characteristics of PiC codes).

If I wanted to, I could easily spend 100 hours per week just reading papers. But staying current with the field is just 5 or maybe 10 percent of my job. So what I do is the following: papers that fall close enough to my special topic I will read. All of them. I will actually print them out, annotate them, go over them with a fine-toothed comb. For many other papers I will read the abstracts (10 to 50 each morning) to find the 1 or 2 papers that are worth reading. And this is where selection by editors and limits on publication come in: they rank important papers. I am much more likely to read "a novel approach to plasma simulation" if it was accepted into the Astrophysical Journal than if it just appears on ArXiv, because ApJ does not like pure code papers. So if it made the cut, 2 or 3 experts in the field deemed it worth the community's time.

Now that doesn't mean that all ArXiv papers are bad, that we should publish less or anything like that. When I talk to a colleague and ask about a technical detail I am SOOOO happy when they say "it's in the Arxiv paper ID 1906.bla". And on the other hand there is the notion of "it was in Nature but it still might be right". Bottom line is:

Do not discount the sorting by topic (numerical vs observational vs theoretical), impact ("here is one data point" vs "here is a completely new approach") and quality ("I'm not too convinced, but maybe it gives somebody an idea" vs "holy smokes how did we all miss that!") that journals provide. Any better, future alternative needs this sorting and ranking. Just dumping it onto the internet is not the solution.


Interesting. This seems to correlate almost exactly with discovery in music streaming too. Given the torrent of new music available, how do you discover the new music that you like?

At the moment, AI is doing a pretty bad job of this - my Spotify Discover Weekly is an interesting listen, but I know it's not the best selection of new music out there suited to my tastes (to be fair, it's not really trying to be that, though). My "recommended" list on Netflix bores me. I get why they're recommending them, but it's all things that I've seen and rejected from My List, rather than things that I would find genuinely new and interesting.

I think this is the next big problem to solve, for everyone. The combination of Discoverability (how do I get my music/paper/novel/art/film discovered by the audience of people who will like it?) and Search (how do I find new music/papers/novels/art/film that suit my tastes/research subject?).


Way back, it was always through word of mouth that one discovered things, and I still hold that to be true. You may have 10 friends who each have their niche, and once in a while they recommend something they think you would want to listen to.

I think that music matters more based on who recommended it. An algorithm won't have the same effect; it's missing the storyline of how you ended up watching movie X or listening to song Y.


Really good point. There are a number of bands that I listened to (and ended up liking) purely because of who told me about them.


I think the same applies to most things. Your idols influence you a lot. I see this as the best reason not to rely purely on algorithms.

Producing playlists is also a sort of art.


You're right, it does need some type of ranking. I think endorsement could be something like that. Instead of having journals endorse an article, let specific academics do it. So if Dr. John Doe really likes an article, he can endorse it personally and put his personal reputation on the line for it.

And then have an algorithm that sorts based on endorsements, or how trusted the endorsers are ('Do you agree with this endorsement', or how many endorsements they themselves have). Let the authors easily input all the topics they want as tags, making it easy to search as well. And, of course, public comments about what would need to change to get an endorsement, as well as allowing readers to look at all previous versions of an article to see what changed. Basically, let the academics take care of endorsing and topic tagging themselves.
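A minimal sketch of what such an endorsement-weighted ranking could look like. All names here (`rank_papers`, the example IDs, the reputation weights) are hypothetical, and the weighting rule (sum of endorser reputations) is just one possible choice, not an existing system:

```python
from collections import defaultdict

def rank_papers(endorsements, endorser_reputation):
    """Score each paper by the summed reputation of its endorsers.

    endorsements: list of (paper_id, endorser) pairs
    endorser_reputation: dict mapping endorser -> trust weight,
        e.g. derived from how many endorsements they themselves received
    """
    scores = defaultdict(float)
    for paper_id, endorser in endorsements:
        # Unknown endorsers count with a baseline weight of 1.0
        scores[paper_id] += endorser_reputation.get(endorser, 1.0)
    # Highest-scoring papers first
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Hypothetical example: two endorsers with different reputations
endorsements = [("1906.00001", "dr_doe"),
                ("1906.00001", "dr_roe"),
                ("1906.00002", "dr_roe")]
reputation = {"dr_doe": 3.0, "dr_roe": 1.5}
ranking = rank_papers(endorsements, reputation)
```

Making reputation itself depend on how the community rates an endorser's past endorsements would effectively turn this into an iterative, PageRank-like scheme.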


Reviewing every paper by just one to three qualified people is already hard and puts a lot of strain on the system. Trying to solve that by "just have everybody read everything and endorse what they like" is going to break it. Or rather, most papers will never be read nor endorsed by a professional. All it would do is turn science into even more of a who-likes-whom game.


The same is true within IT, you need to read a lot to make sure you're in the loop.

I think it correlates to map structures: if every point in a map is connected to every other point, each point gets a large surface in the beginning, but as the number of points grows, the surface (of possible attention, in this case) will shrink.
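A toy illustration of that shrinking-surface point: in a fully connected "map" where each point has a fixed total attention budget, the share available per neighbour falls as 1/(n-1). The function name and the budget of 1.0 are illustrative assumptions:

```python
def attention_per_neighbour(n_points, total_attention=1.0):
    """Attention each point can spend on each of its n-1 neighbours
    when a fixed budget is split evenly across a complete graph."""
    return total_attention / (n_points - 1)

# Attention per neighbour drops sharply as the network grows
shares = [attention_per_neighbour(n) for n in (10, 100, 10_000)]
```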

How much area of attention is needed to perform best in creative research, to produce the best work in a given time? Maybe the answer to that question will yield more papers of better quality.

My personal belief is that we need to dream more and do less work, to make sure the work that is done is of higher quality, such that no filtering is needed and we can truly publish everything.


Even if every paper is amazing, I still don't have the time to read about all topics. So even if every submitted paper is great and should be published, we still need sorting by topic.

Add to that the fact that "publish or perish" has led to a decline in at least the amount of actually new information per paper, if not in the outright quality of papers. And that is not going away easily.


Yeah, I agree that force-publishing something which is not ready is always bad.

Maybe publishing in itself is a method that does not scale well. Text has always been quite a slow way of making progress. And now, with VR, we have better ways to visualize and learn from others' work.

How would you like to work, given free time and funding? Would you like to be able to keep track of every related research?


This very site is an example of a way to handle abundant content.

"Post all of the things" doesn't mean that content shouldn't be prioritized. The process is just asynchronous. It avoids abuse in specialized fields where a low number of experts are the gateway to publication.


This site works OK for finding content that is interesting to a large community. The problem in science is "how do you find content for the 5 people working on problem X", for every X.


Great insight, thanks! It sounds like AI could play a part here in order to achieve the required efficiency, e.g. sorting out the right papers for you.


All I can say is that the feeds that currently promise to solve that problem (NASA ADS, Google Scholar, ResearchGate, and the feeds of several journals) are all pretty bad. I get several suggestions for medical/biophysical papers every day (probably due to "plasma" and "particle in _C_ell").
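The "plasma" mix-up above hints at why: naive keyword matching cannot tell the fields apart. This toy scorer (made-up interest list and abstracts, not any real feed's algorithm) rates a biomedical abstract almost as highly as the astrophysics one, because the ambiguous terms overlap:

```python
# Hypothetical interest profile of a plasma-astrophysics researcher
INTERESTS = {"plasma", "reconnection", "particle", "cell", "simulation"}

def keyword_score(abstract):
    """Count how many interest keywords appear in an abstract."""
    words = {w.strip(".,").lower() for w in abstract.split()}
    return len(words & INTERESTS)

astro = "Particle in cell simulation of magnetic reconnection in pair plasma"
bio = "Blood plasma particle uptake in a single cell membrane simulation"
# Both abstracts score highly despite being from unrelated fields
```

Distinguishing the two would need context beyond bag-of-words keywords, e.g. citation neighbourhoods or journal subject classes.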


This is a complex topic and I think Gelman (the linked poster) is either misinterpreting and/or confused by Kaufman and Glǎveanu (the article he's discussing). Just for some context, I agree with Gelman and KG both in their main arguments.

There are different issues colliding in the open science movement. One is what you're referring to, the fact that scarcity is gone. Combined with the overcrowding and hypercompetitiveness of science, you not only have a decrease in veridicality, you also have a decrease in signal-to-noise in general. So the nonsense increases, and the traditional signals of quality become less reliable. Nonsense appears in high-profile journals, and very high-quality work appears in low-profile journals or even as "unpublished" pieces.

The other problem, though, what my colleagues refer to as the "science police", is an increasing tendency for certain groups to argue that a certain set of practices is not only good but necessary for "good" science, and, by implication, that everything which does not follow them is "bad" science, in a black-and-white kind of way.

For one thing, not all problems are with replication. Nonsense can replicate well, and very important legitimate phenomena can be difficult to replicate. If something is really not replicable at all, that's a problem, but replicability per se is only one part of scientific progress, and it comes in degrees with various causes.

It's also much more difficult to determine what is replicable sometimes than it might seem on the surface. Replicate what? What's important to replicate? How? Sometimes this is clear, but other times it is not.

Also, when you really delve into it, there's not really a good rationale for what, exactly, are the important ingredients for open science, or why. For example, is it really necessary to have preregistered studies? What's to keep someone from preregistering but then silently declining to publish null results? Or to "preregister" something they've already collected? If an important unanalyzed pre-existing dataset becomes available, is that "tainted" because it wasn't preregistered? Is it important to preregister, or just to make the data openly available? Is it better to use modeling to identify anomalies in studies, or to rely on preregistration? These issues aren't always clear.

I think there's a sense sometimes that the open science movement is not only trying to dismantle a broken system run by an established elite, but to replace it dogmatically with a new system run by a new elite, with its own imperfect rules. Already I've seen misuses of open science guidelines used to bully and discredit legitimate work (for example, by suggesting that someone is hiding something by not sharing data, when the data contains protected healthcare information and would be accessible to them anyway if they would just go through proper channels). This is tricky to discuss, as you might imagine, so it comes out in pieces like KG's piece. Gelman is asking "why not publish everything", which is responding (I think) to something different from what KG are responding to. Maybe I'm misreading KG, but I think they might also argue "why not publish everything"; they just have a different group they're addressing when they would say that.



