
Science-Fallacies

How the Scientific Establishment Generates Bad Science, and What We Can Do About It

Scientific Peer Review Process

The job of your group is to consider the peer review process.

There have been several high-profile incidents of dramatic failures in the process. In the Sokal Affair, physics professor Alan Sokal submitted a paper to Social Text to investigate whether “a leading North American journal of cultural studies – whose editorial collective includes such luminaries as Fredric Jameson and Andrew Ross – [would] publish an article liberally salted with nonsense if (a) it sounded good and (b) it flattered the editors’ ideological preconceptions”. The paper was published, of course.

In response to this, several MIT graduate students designed a tool, SCIgen, which generates gibberish computer-science papers, and had one accepted at WMSCI 2005, admittedly a weak conference. What was less publicized was that, in the decade that followed, over 120 papers generated by SCIgen appeared in credible conferences published by IEEE and Springer.
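To make the trick concrete, here is a minimal sketch of how a grammar-based generator of this kind can work. SCIgen itself uses a much larger, hand-written context-free grammar that assembles entire papers, complete with figures and citations; the rules and vocabulary below are invented purely for illustration and are not SCIgen's actual grammar.

```python
import random

# A toy context-free grammar in the spirit of SCIgen. Each non-terminal
# (e.g. "SENTENCE") maps to a list of possible productions; terminals are
# literal strings. The rules and buzzwords here are illustrative only.
GRAMMAR = {
    "SENTENCE": [
        ["In this paper we", "VERB_PHRASE", "using", "BUZZWORD", "."],
        ["Our evaluation shows that", "BUZZWORD", "VERB_PHRASE", "."],
    ],
    "VERB_PHRASE": [
        ["demonstrate the synthesis of", "BUZZWORD"],
        ["refute the need for", "BUZZWORD"],
    ],
    "BUZZWORD": [
        ["scalable Byzantine epistemologies"],
        ["homogeneous lambda calculus"],
        ["the partition table"],
    ],
}

def expand(symbol: str) -> str:
    """Recursively expand a grammar symbol into a string of terminals."""
    if symbol not in GRAMMAR:
        return symbol  # terminal: return the literal text unchanged
    production = random.choice(GRAMMAR[symbol])
    return " ".join(expand(s) for s in production)

if __name__ == "__main__":
    # Each run yields a different grammatical but content-free sentence.
    print(expand("SENTENCE").replace(" .", "."))
```

The output is syntactically well-formed and superficially plausible, which is exactly why such text can slip past a reviewer who skims rather than reads.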

These are not isolated incidents. In my experience, all but one of my papers were either not read or not understood by their reviewers (the exception being IEEE Transactions on Circuits and Systems II, where the reviewers not only read the paper, but re-did all of my derivations, caught errors, and provided very thoughtful feedback on both the concept and the writing). I include a sample review below.

Consider:

For participants unfamiliar with the review process, I attach a sample set of reviews of a paper on the deployment of a MOOC recommender system, submitted to an IEEE conference.

----------------------- REVIEW 1 ---------------------
PAPER: 184
TITLE: Point-of-need-help at Scale: Recommender Systems
OVERALL EVALUATION: 1 (weak accept)
Relevance: 5 (excellent)
Originality: 4 (good)
Research significance: 2 (poor)
Technical Quality: 2 (poor)
Research context/knowledge of the field: 4 (good)
Form - Organization and readability: 4 (good)
Form - Grammar and style: 4 (good)
Best Paper Nomination: 1 (Definitely not)

----------- REVIEW -----------

The paper aims to promote the use of student crowsourcing to obtain
quality recomendations.
Authors claim that students
who arrived at an incorrect answer and later a correct answer
could submit a remediation which would be seen by future
students who made the same mistake.

Comment: This sentence confuses our system with one described in our prior-work section; the review continues to conflate the two throughout.

however, the authors speak nothing about how are testing the
student remediation to be sure that they are correct. This is a
crucial point but it is not treated in the paper. It would be very
interesting to see a grafic with percentage of correct remediation
and incorrect remediation.

Explanation of Figure 3 is not understood. Which are the
correlations shown in Figure 3?

It is missing the explanation about the recomendation algorithm
used. The page contained the source code,
https://github.com/ANONYMIZED/ANONYMIZED is not operative.

Comment: We replaced identifying information throughout the paper with ‘ANONYMIZED.’

----------------------- REVIEW 2 ---------------------
PAPER: 184
TITLE: Point-of-need-help at Scale: Recommender Systems

OVERALL EVALUATION: -2 (reject)
Relevance: 5 (excellent)
Originality: 3 (fair)
Research significance: 2 (poor)
Technical Quality: 2 (poor)
Research context/knowledge of the field: 3 (fair)
Form - Organization and readability: 3 (fair)
Form - Grammar and style: 4 (good)
Best Paper Nomination: 1 (Definitely not)

----------- REVIEW -----------

The paper describes a system for resource suggestion by participants
on online courses.

The authors do not describe in depth the theoretical foundations
of their work, and the analysis of the results is very preliminary
and somewhat ad-hoc.

The presented qualitative results are limited to a very restricted
scale, given that the number of participants was small

Comment: This paper described the first use of a recommender system in a MOOC, and the largest deployment of such a recommender system to date, with thousands of students; prior work at RecSys used classrooms, typically with dozens of students. The reviewer somehow missed that the deployment was in a MOOC, and missed the number of students, which was given in the submitted paper and shown in all of the plots. The rest of the review presumes we did this in a single residential classroom.

and the observations were limited to one course. Thus, the main
point of the paper, the applicability of the proposed recommender
in large-scale environments, is not proved by the presented
experiments.

Furthermore, the presented results did not show any significant
effect of the system in student performance.

The paper is well-written and the goal and methodology of the
presented research is clear (disregarding the aforementioned
limitations).

Overall, the discussed work needs a complete, more detailed and in
larger scale redesign of the experiments in order to assess the
main hypothesis posed by the authors.