Machine learning analysis of scientific articles

Neural networks have an advantage over humans because they have access to a much larger body of information. This is why what looks like random noise to a human can, after correct processing by a machine learning algorithm, turn out to be a signature of the Higgs boson.

While analysing physics data with machine learning is definitely a great direction of research, another intriguing possibility is trying to infer what the researchers themselves think. For example, the field of sentiment analysis tries to extract not only the information contained in a text, but also the author's attitude towards that information.

With this insight, we (well, really Joe Weston, who did all the heavy lifting) tried to see whether an algorithm can infer from an article's full contents whether it reports a definitive discovery. Naturally, we first focused on a familiar topic: finding Majorana zero modes. Much to our surprise, the algorithm showed nearly 100% accuracy on both the training and validation datasets! By crunching more data, we were able to generalize it to analyse questions about quantum computing and even high energy physics.

To make this tool available to the community, we wrapped it in a simple web service and hooked it up to the arXiv API so that you can check interesting preprints for yourself. Disclaimer: this analysis is automated, we do not control the reported result, and we do not guarantee its correctness for the specific preprint you are checking.
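
For readers curious what talking to the arXiv API can look like, here is a minimal sketch of looking up a single preprint by its identifier. It is not the code behind our service. The function names (build_query_url, parse_entry, fetch_preprint) are made up for illustration, and the sketch only retrieves the title and abstract that the API returns as an Atom feed, not the full text of the article.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Public arXiv API endpoint; responses are Atom XML feeds.
ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"


def build_query_url(arxiv_id: str) -> str:
    """Return the API URL that looks up one preprint by its identifier."""
    return ARXIV_API + "?" + urllib.parse.urlencode({"id_list": arxiv_id})


def parse_entry(atom_xml: str) -> dict:
    """Extract the title and abstract from the first entry of an Atom feed."""
    root = ET.fromstring(atom_xml)
    entry = root.find(ATOM + "entry")
    if entry is None:
        raise ValueError("no entry found in the response")
    title = entry.findtext(ATOM + "title", default="")
    summary = entry.findtext(ATOM + "summary", default="")
    # Collapse the line breaks and repeated spaces that the feed may contain.
    return {
        "title": " ".join(title.split()),
        "summary": " ".join(summary.split()),
    }


def fetch_preprint(arxiv_id: str) -> dict:
    """Download and parse the metadata of one preprint (needs network access)."""
    with urllib.request.urlopen(build_query_url(arxiv_id), timeout=10) as response:
        return parse_entry(response.read().decode("utf-8"))
```

Calling fetch_preprint with an arXiv identifier returns a dictionary with the title and abstract, which could then be passed on to whatever text classifier you want to try.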

The result of our work is below.

While we only analysed publications related to a couple of research questions, we are happy to share the source code so that you can modify it, retrain it, and apply it to your own domain. Looking ahead, we expect that automated article assessment will become an important tool for the research community, used by everyone from individual researchers to funding agencies and policy makers.