Recently I was made aware by colleagues of a publication by authors of a new agent-based modeling toolkit in a different, hipper programming language. They compared their system to others, including mine, and made kind of a big checklist of who's better in what, and no surprise, theirs came out on top. But digging deeper, it quickly became clear that they didn't understand how to run my software correctly; and in many other places they bent over backwards to cherry-pick, and made a lot of bold and completely wrong claims. Correcting the record would place their software far below mine.
Mind you, I'm VERY happy to see newer toolkits which are better than mine -- I wrote this thing over 20 years ago, after all, and have since moved on. I wasn't inclined to make a fuss, but several colleagues demanded I ask the journal for a correction. After a lot of back-and-forth, however, it became clear that the journal's editor was too embarrassed and didn't want to require a retraction or revision. And the authors kept coming up with excuses for their errors. So the journal quietly dropped the complaint.
I'm afraid that this is very common.
I recommended that the journal not publish the paper, and gave them a solid list of improvements to pass along to the authors, to be made before re-submitting. The journal agreed with me and rejected the paper.
A couple of months later, I saw it had been published unchanged in a different journal. It wasn't even a lower-quality journal; if I recall correctly, the impact factor was actually higher than the original one's.
I despair of the scientific process.
This is one of the reasons you should never accept a single publication at face value. But this isn’t a bug — it’s part of the algorithm. It’s just that most muggles don’t know how science actually works. Once you read enough papers in an area, you have a good sense of what’s in the norm of the distribution of knowledge, and if some flashy new result comes over the transom, you might be curious, but you’re not going to accept it without a lot more evidence.
This situation is different, because it’s a case where an extremely popular bit of accepted wisdom is both wrong, and the system itself appears to be unwilling to acknowledge the error.
Schools should be using these kinds of examples in order to teach critical thinking. Unfortunately the other side of the lesson is how easy it is to push an agenda when you've got a little bit of private backing.
I was an undergraduate at the University of Maryland when you were a graduate student there in the mid nineties. A lot of what you had to say shaped the way I think about computer science. Thank you.
Universities care about money and reputation. Individuals at universities care about their careers.
With the exception of some saintly individual faculty members, a university is like a big for-profit corporation, only with less accountability.
Faculty bring in money, are strongly linked to reputation (scandal news articles may even say the university name in headlines rather than the person's name), and faculty are hard to get rid of.
Students are completely disposable, there will always be undamaged replacements standing by, and turnover means that soon hardly anyone at the university will even have heard of the student or internal scandal.
Unless you're really lucky, the university's position will be to suppress the messenger.
But if you go in with a lawyer, the lawyer may help your whistleblowing to be taken more seriously, and may also help you negotiate a deal to save your career. (One example of that help: you may need the university's/department's cooperation in switching advisors gracefully, with funding, even as the university/department is trying to minimize the number of people who know about the scandal.)
Integrity is hard, but reputations are lifelong.
Back in my day, grad students generally couldn't afford lawyers.
Our conclusion was to never trust psychology majors with computer code. And as with any other field of expertise, they should at the very least have shown their idea and/or code to some CS majors before publishing.
How sad. Admitting and correcting a mistake may feel difficult, but it makes you credible.
As a reader, I would have much greater trust in a journal that solicited criticism and readily published corrections and retractions when warranted.
Personally, I would agree with you. That's how these things are supposed to work. In practice, people are still people.
From the perspective of the academic community, there will be less incentive to publish incorrect results if data and code are shared.
They're usually published with a response by the authors.
Now I'm not saying that everything in M-S is junk, but the small subset I was exposed to was.
On my side-project todo list, I have an idea for a scientific service that overlays a "trust" network over the citation graph. Papers that uncritically cite other work that contains well-known issues should get tagged as "potentially tainted". Authors and institutions that accumulate too many of such sketchy works should be labeled equally. Over time this would provide an additional useful signal vs. just raw citation numbers. You could also look for citation rings and tag them. I think that could be quite useful but requires a bit of work.
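For what it's worth, the mechanical core of that idea is just taint propagation over the reverse citation graph. Here is a minimal sketch in Python, assuming you already have a paper -> references mapping from some citation database; the names and the toy graph are made up:

    from collections import deque

    def find_tainted(cites, known_flawed):
        """cites: paper -> set of papers it cites.
        Returns papers that cite a known-flawed paper, directly or transitively."""
        cited_by = {}                                  # reverse graph: paper -> its citers
        for paper, refs in cites.items():
            for ref in refs:
                cited_by.setdefault(ref, set()).add(paper)
        tainted, queue = set(), deque(known_flawed)
        while queue:                                   # BFS outwards from the flawed papers
            bad = queue.popleft()
            for citer in cited_by.get(bad, ()):
                if citer not in tainted and citer not in known_flawed:
                    tainted.add(citer)
                    queue.append(citer)
        return tainted

    # Toy example: A is known-flawed; B cites A, C cites B, D cites C and E.
    cites = {"B": {"A"}, "C": {"B"}, "D": {"C", "E"}}
    print(find_tainted(cites, {"A"}))                  # {'B', 'C', 'D'}, in some order

The hard parts are the data and the "known issues" labels rather than the propagation, and a real version would also want to distinguish critical citations from approving ones.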
The idea failed a simple sanity check: just going to Google Scholar, doing a generic search and reading randomly selected papers from within the past 15 years or so. It turned out most of them were bogus in some obvious way. A lot of ideas for science reform take as axiomatic that the bad stuff is rare and just needs to be filtered out. Once you engage with some field's literatures in a systematic way, it becomes clear that it's more like searching for diamonds in the rough than filtering out occasional corruption.
But at that point you wonder, why bother? There is no alchemical algorithm that can convert intellectual lead into gold. If a field is 90% bogus then it just shouldn't be engaged with at all.
1) Anyone publishes anything they want, whenever they want, as much or as little as they want. Publishing does not say anything about your quality as a researcher, since anyone can do it.
2) Being published doesn't mean it's right, or even credible. No one is filtering the stream, so there's no cachet to being published.
We then let memetic evolution run its course. This is the system that got us Newton, Einstein, Darwin, Mendeleev, Euler, etc. It works, but it's slow, sometimes ugly to watch, and hard to game so some people would much rather use the "Approved by A Council of Peers" nonsense we're presently mired in.
We are just back to the universities under the religious control system that we had before the Enlightenment. Any change would require separating academia from political government power.
Academia is just the propaganda machine for the government, just like the church was the tool for justifying god-gifted powers to kings.
Still I'm skeptical about any sort of system trying to figure out 'trust'. There's too much on the line for researchers/students/... to the point where anything will eventually be gamed. Just too many people trying to get into the system (and getting in is the most important part).
The system ends up promoting an even more conservative culture. What might start great will end up with groups and institutions being even more protective of 'their truths' to avoid getting tainted.
Don't think there's any system which can avoid these sorts of things; people were talking about this before WW1, and globalisation just put it in overdrive.
That's reference-stealing: some other paper I read cited this, so it should be OK, I'll steal their reference. I always make sure I read the cited paper before citing it myself; it's scary how often it says something rather different from what the citation implies. That's not necessarily bad research, more that the author of the citing paper was looking for effect A in the cited reference and I'm looking for effect B, so their reason for citing differs from mine, and it's a valid reference in their paper but wouldn't be in mine.
When you added it up, most of the hard parts were Engineering, and a bit of Econ. You would really struggle to work through tough questions in engineering, spend a lot of time on economic theory, and then read the management stuff like you were reading a newspaper.
Management you could spot a mile away as being soft. There are certainly some interesting ideas, but even as students we could smell that it was lacking something. It's just a bit too much like a History Channel documentary. Entertaining, certainly, but it felt like false enlightenment.
And from the comments:
> From my experience in social science, including some experience in management studies specifically, researchers regularly believe things – and will even give policy advice based on those beliefs – that have not even been seriously tested, or have straight up been refuted.
Sometimes people rely on fewer than one non-replicable study: they invent a study and cite that! An example is the "Harvard Goal Study" that is often trotted out at self-review time at companies. The supposed study suggests that people who write down their goals are more likely to achieve them than people who do not. However, Harvard itself cannot find any such study:
https://en.wikipedia.org/wiki/Addiction_Rare_in_Patients_Tre...
Straight-up replications are rare, but if a finding is real, other PIs will partially replicate and build upon it, typically as a smaller step in a related study. (E.g., a new finding about memory comes out, my field is emotion, I might do a new study looking at how emotion and your memory finding interact.)
If the effect is replicable, it will end up used in other studies (subject to randomness and the file drawer effect, anyway). But if an effect is rarely mentioned in the literature afterwards...run far, FAR away, and don't base your research off it.
A good advisor will be able to warn you off lost causes like this.
Ranking 1 to 3, with 1 being the best and 3 the bare minimum for publication (rough sketch as code below the list):
3. Citations only
2. Citations + full disclosure of data.
1. Citations + full disclosure of data + replicated
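In code, that rule is just this (a toy sketch of the proposal above, not anyone's real policy):

    def publication_tier(has_citations, data_disclosed, replicated):
        """Map disclosure properties to the proposed tier (1 = best, 3 = bare minimum)."""
        if not has_citations:
            return None                    # below the bare minimum for publication
        if data_disclosed and replicated:
            return 1
        if data_disclosed:
            return 2
        return 3

    print(publication_tier(True, True, False))   # 2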
You'll just get replication rings in addition to citation rings.
People who cheat in their papers will have no issues cheating in their replication studies too. All this does is give them a new tool to attack papers they don't like by faking a failed replication.
For example, here's an article that argues (with data) that there is actually little publication bias in medical studies in the Cochrane database:
https://pmc.ncbi.nlm.nih.gov/articles/PMC1182327/pdf/pmed.00...
Ioannidis' work during Covid raised him in my esteem. It's rare to see someone in academia who is willing to set their own reputation on fire in search of truth.
“Most Published Research Findings Are False” -> “Most Published COVID-19 Research Findings Are False” -> “Uh oh, I did a wrongthink, let’s backtrack a bit”.
Is that it?
If IFR is low then a lot of the assumptions that justified lockdowns are invalidated (the models and assumptions were wrong anyway for other reasons, but IFR is just another). So Ioannidis was a bit of a class traitor in that regard and got hammered a lot.
The claim he's a conspiracy theorist isn't supported, it's just the usual ad hominem nonsense (not that there's anything wrong with pointing out genuine conspiracies against the public! That's usually called journalism!). Wikipedia gives four citations for this claim and none of them show him proposing a conspiracy, just arguing that when used properly data showed COVID was less serious than others were claiming. One of the citations is actually of an article written by Ioannidis himself. So Wikipedia is corrupt as per usual. Grokipedia's article is significantly less biased and more accurate.
https://statmodeling.stat.columbia.edu/2020/04/19/fatal-flaw...
That said, I'd put both his serosurvey and the conduct he criticized in "Most Published Research Findings Are False" in a different category from the management science paper discussed here. Those seem mostly explainable by good-faith wishful thinking and motivated reasoning to me, while that paper seems hard to explain except as a knowing fraud.
In hindsight, I can't see any plausible argument for an IFR actually anywhere near 1%. So how were the other researchers "not necessarily wrong"? Perhaps their results were justified by the evidence available at the time, but that still doesn't validate the conclusion.
It's also hard to determine whether that serosurvey (or any other study) got the right answer. The IFR is typically observed to decrease over the course of a pandemic. For example, the IFR for COVID is much lower now than in 2020 even among unvaccinated patients, since they almost certainly acquired natural immunity in prior infections. So high-quality later surveys showing lower IFR don't say much about the IFR back in 2020.
Epidemiology tends to conflate IFR and CFR; that's one of the issues Ioannidis was highlighting in his work. IFR estimates do decline over time, but they decline even in the absence of natural-immunity buildup, because doctors become aware of more mild cases where the patient recovered without being detected. That leads to a higher number of infections with the same number of fatalities, hence a lower IFR even when computed retroactively, but there's no biological change happening. It's just a consequence of data-collection limits.
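To make the denominator effect concrete, a toy calculation with made-up numbers (not from any real study): the death count stays fixed while the number of known infections grows as mild cases are recognized, so the computed IFR falls with no biological change.

    deaths = 100                       # fixed; deaths are hard to miss
    early_known_infections = 5_000     # mostly severe, detected cases
    later_known_infections = 20_000    # after mild/undetected cases are recognized

    print(f"early IFR estimate: {deaths / early_known_infections:.1%}")   # 2.0%
    print(f"later IFR estimate: {deaths / later_known_infections:.1%}")   # 0.5%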
That problem is what motivated the serosurvey. A theoretically perfect serosurvey doesn't have such issues, so one would expect it to produce a lower IFR and to be a valuable type of study to do well. Part of the background of that work, and of why it was controversial, is that large parts of the public health community didn't actually want to know the true IFR, because they knew it would be much lower than their initial back-of-the-envelope calculations based on e.g. news reports from China. Surveys like that should have been commissioned by governments at scale, with enough data to resolve any possible complaint, but they weren't, because public health bodies are just not incentivized that way. Ioannidis didn't play ball and the pro-lockdown camp gave him a public beating. I think he was much closer to reality than they were, though. The whole saga spoke to the very warped incentives that come into play the moment you put the word "public" in front of something.
There's the other angle of selective outrage. The case for lockdowns was being promoted based on, amongst other things, the idea that PCR tests have a false positive rate of exactly zero, always, under all conditions. This belief is nonsense although I've encountered wet lab researchers who believe it - apparently this is how they are trained. In one case I argued with the researcher for a bit and discovered he didn't know what Ct threshold COVID labs were using; after I told him he went white and admitted that it was far too high, and that he hadn't known they were doing that.
Gelman's demands for an apology seem very different in this light. Ioannidis et al not only took test FP rates into account in their calculations but directly measured them to cross-check the manufacturer's claims. Nearly every other COVID paper I read simply assumed FPs don't exist at all, or used bizarre circular reasoning like "we know this test has an FP rate of zero because it detects every case perfectly when we define a case as a positive test result". I wrote about it at the time because this problem was so prevalent:
https://medium.com/mike-hearn/pseudo-epidemics-part-ii-61cb0...
I think Gelman realized after the fact that he was being over the top in his assessment, because the article has since been amended with numerous "P.S." paragraphs which walk back some of his own rhetoric. He's not a bad writer, but in this case I think the overwhelming peer pressure inside academia to conform to the public health narratives got to even him. If the cost of pointing out problems in your field is that every paper you write has to be considered perfect by every possible critic from that point on, that's just another way to stop people flagging problems.
https://sites.stat.columbia.edu/gelman/research/unpublished/...
I don't think Gelman walked anything back in his P.S. paragraphs. The only part I see that could be mistaken for that is his statement that "'not statistically significant' is not the same thing as 'no effect'", but that's trivially obvious to anyone with training in statistics. I read that as a clarification for people without that background.
We'd already discussed PCR specificity ad nauseam, at
https://news.ycombinator.com/item?id=36714034
These test accuracies mattered a lot while trying to forecast the pandemic, but in retrospect one can simply look at the excess mortality, no tests required. So it's odd to still be arguing about that after all the overrun hospitals, morgues, etc.
But then in the P.P.P.S sections he's saying things like "I’m not saying that the claims in the above-linked paper are wrong." (then he has to repeat that twice because in fact that's exactly what it sounds like he's saying), and "When I wrote that the authors of the article owe us all an apology, I didn’t mean they owed us an apology for doing the study" but given he wrote extensively about how he would not have published the study, I think he did mean that.
Also bear in mind there was a followup where Ioannidis's team went the extra mile to satisfy people like Gelman, and:
> They added more tests of known samples. Before, their reported specificity was 399/401; now it’s 3308/3324. If you’re willing to treat these as independent samples with a common probability, then this is good evidence that the specificity is more than 99.2%. I can do the full Bayesian analysis to be sure, but, roughly, under the assumption of independent sampling, we can now say with confidence that the true infection rate was more than 0.5%.
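As a rough back-of-the-envelope check on those numbers: the 3308/3324 specificity counts are from the quote above, while the crude positive rate and sensitivity below are values I'm assuming purely for illustration, not figures from the paper.

    from scipy.stats import beta

    x, n = 3308, 3324                         # correct negatives out of known-negative samples
    spec_lo = beta.ppf(0.05, x, n - x + 1)    # one-sided 95% Clopper-Pearson lower bound
    print(f"specificity lower bound ~ {spec_lo:.3f}")    # roughly 0.993, i.e. above 99.2%

    # At that lower bound, false positives can account for at most ~0.7% of samples.
    # Standard Rogan-Gladen correction, with ASSUMED illustrative inputs:
    crude_positive_rate = 0.015   # assumed for illustration only
    sensitivity = 0.80            # assumed for illustration only
    prevalence = (crude_positive_rate + spec_lo - 1) / (sensitivity + spec_lo - 1)
    print(f"implied prevalence ~ {prevalence:.1%}")      # about 1%, i.e. above 0.5%

So, at least arithmetically, the quoted claim that the extra negative-control tests support a true infection rate above 0.5% seems to hold.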
After taking into account the revised paper, which raised the standard from high to very high, there's not much of Gelman's critique left, tbh. I would respect this kind of critique more if he had mentioned the garbage-tier quality of the rest of the literature. Ioannidis' standards were still much higher than everyone else's at the time.
I hope this was sarcasm.
I don’t think the general idea of co-opting is hard to understand; it’s quite easy to understand. But there is a certain personality type, common among people who earn a living by telling Claude what to do, with a compulsion to “prove” people on the Internet “wrong,” and these people are constantly, blithely mobilized to further the political cause of someone who truly doesn’t give a fuck about them. Ioannidis is such a personality type, and, as you can see, a victim.
In rhetoric, yes. (At least, except when people are given the opportunity to appear virtuous by claiming that they would sacrifice themselves for others.)
In actions and revealed preferences, not so much.
It would be rather difficult to be a functional human being if one took that principle completely seriously, to its logical conclusion.
I can't recall ever hearing any calls for compulsory public interaction, only calls to stop forbidding various forms of public interaction.
If this isn't bad people, then who can ever be called bad people? The word "bad" loses its meaning if you explain away every bad deed by such people as something else. Putting other people's lives at risk by deciding to drive when you are drunk sounds like very bad people to me.
> They’re living in a world in which doing the bad thing–covering up error, refusing to admit they don’t have the evidence to back up their conclusions–is easy, whereas doing the good thing is hard.
I don't understand this line of reasoning. So if people do bad things because they know they can get away with it, they aren't bad people? How does this make sense?
> As researchers they’ve been trained to never back down, to dodge all criticism.
Exactly the opposite is taught. These people decide of their own accord not to back down or admit wrongdoing, not because of some "training".
“That’s a bad thing to do…”
Maybe should be: “That’s a stupid thing to do…”
Or: reckless, irresponsible, selfish, etc.
In other words, maybe it has nothing to do with morals and ethics. Bad is kind of a lame word with limited impact.
> because they know they can get away with it
The point is that the paved paths lead to bad behavior.
Well-designed systems make it easy to do good.
> Exactly the opposite is taught.
"trained" doesn't mean "taught". most things are learned but not taught
You guys are saying that drink driving does not make someone a bad person. Ok. Let's say I grant you that. Where do you draw the line for someone being a bad person?
I mean, with this line of reasoning you can "explain away" every bad deed, and then nobody is a bad person. So is there anyone you'd actually consider a bad person, and what would they have to do to cross the line where you can't explain away their bad deed anymore and you really consider them bad?
I don't think that that line can be drawn exactly. There are many factors to consider and I'm not sure that even considering them will allow you to draw this line and not come to claims like '99% of people are bad' or '99% of people are not bad'.
'Bad' is not an innate property of a person. 'Bad' is a label that exists only in an observer's model of the world. A spherical person in vacuum cannot be 'bad', but if we add an observer of the person, then they may become bad.
To my mind, labeling a person 'bad' or not reflects what the labeler has decided to do about them. First you decide how to respond to someone's bad behavior: if you decide on punishment, you call them 'bad'; if you decide to help them stop, you call them 'confused', or circumstantially forced, or whatever. You cannot change other people's personal traits, so if you declare that the cause of bad behavior is an innate trait of 'badness', you cannot do anything about it. If you want to change things, you need to find a cause of the bad behavior that can actually be controlled.
Once something enters The Canon, it becomes “untouchable,” and no one wants to question it. Fairly classic human nature.
> "The most erroneous stories are those we think we know best -and therefore never scrutinize or question."
-Stephen Jay Gould
Made me think of the black spoon error that was off by a factor of 10, where the author likewise said it didn't impact the main findings.
https://statmodeling.stat.columbia.edu/2024/12/13/how-a-simp...
ResearchGate says 3936 citations. I'm not sure what they are counting; probably all the PDFs uploaded to ResearchGate.
I'm not sure how they count 6000 citations, but I guess they are counting everything, including quotes by the vice president. Probably 6001 after my comment.
Quoted in the article:
>> 1. Journals should disclose comments, complaints, corrections, and retraction requests. Universities should report research integrity complaints and outcomes.
All comments, complaints, corrections, and retraction requests? Unmoderated? Einstein articles will be full of comments explaining why he is wrong, from racists to people who can't spell Minkowski to save their lives. In /newest there's about one post per week from someone who has discovered a new physics theory with the help of ChatGPT. Sometimes it's the same guy, sometimes it's a new one.
The number appears to be from Google Scholar, which currently reports 6269 citations for the paper.
Judging from PubPeer, which allows people to post all of the above anonymously and with minimal moderation, this is not an issue in practice.
It has 0 comments, for an article that forgot the "not" in "the result is *** statistically significant".