
Can machine learning algorithms turn the tide when it comes to time-consuming and costly paper image checking?


Originally published under the headline "Researchers have finally created a tool to spot duplicated images across thousands of papers"

Published in Nature News on February 23, 2018

Original article by Declan Butler

Publishers need to join forces to apply image-checking software across the literature.

Optical microscopes capture micrographs of cells and tissues, and such images are sometimes duplicated in the scientific literature.

Source: Mikhail Tereshchenko/TASS/Getty

Computer software can now quickly scan large volumes of research literature and find duplicate images in it, three scientists say.

Daniel Acuna, a machine learning researcher at Syracuse University in New York, USA, led a team that posted a paper on the preprint server bioRxiv on 22 February describing the use of an algorithm to examine hundreds of thousands of biomedical papers and search for duplicate images within them. If journal editors were to adopt a similar approach, it could become easier to screen images before publication, a task that currently requires significant effort and that only a few publications undertake.

Acuna says the research shows that technology can be used for image checking. He has not made the algorithm public, but has discussed it with Lauran Qualkenbush, director of the Office of Research Integrity at Northwestern University in Chicago and vice president of the US Association of Research Integrity Officers. "It's very useful for the Office of Research Integrity," she says, "and I'm very hopeful that this year my office will be a pilot site for Daniel's tool."

In early 2015, Acuna and two colleagues used an algorithm to extract more than 2.6 million images, including micrographs of cells and tissues and images of gel blots, from the 760,000 papers then in the open-access subset of the PubMed database of biomedical literature. The algorithm then focused on the most feature-rich regions of each image, those with the greatest variation in color and grayscale, to extract a characteristic digital "fingerprint" for each image.
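
The idea of fingerprinting the most feature-rich regions can be illustrated with a small sketch. It is not the team's algorithm, which has not been made public. It assumes a grayscale image given as a list of pixel rows, uses the variance of each tile as a stand-in for "feature-rich", and all function names are hypothetical.

```python
from statistics import pvariance


def tile_pixels(image, row, col, size):
    """Return the pixels of the size x size tile whose top-left corner is (row, col)."""
    return [image[y][x] for y in range(row, row + size) for x in range(col, col + size)]


def tile_variances(image, size=4):
    """Map each non-overlapping tile position to the variance of its pixels."""
    height, width = len(image), len(image[0])
    return {
        (row, col): pvariance(tile_pixels(image, row, col, size))
        for row in range(0, height - size + 1, size)
        for col in range(0, width - size + 1, size)
    }


def fingerprint(image, size=4, top_k=2):
    """Assumed stand-in fingerprint: one bit string per high-variance tile.

    Each bit records whether a pixel is brighter than the mean of its tile,
    so a uniform brightness or contrast change leaves the bits unchanged.
    """
    scores = tile_variances(image, size)
    richest = sorted(scores, key=lambda pos: (-scores[pos], pos))[:top_k]
    prints = []
    for row, col in richest:
        pixels = tile_pixels(image, row, col, size)
        mean = sum(pixels) / len(pixels)
        prints.append("".join("1" if p > mean else "0" for p in pixels))
    return sorted(prints)


# Toy 8 x 8 image: the left half is flat, the right half is textured.
image = [[(x * y * 37 + x * 11) % 256 if x >= 4 else 100 for x in range(8)] for y in range(8)]
# The same image after a linear contrast and brightness change (no clipping).
adjusted = [[2 * p + 10 for p in row] for row in image]

print(fingerprint(image) == fingerprint(adjusted))
```

The flat tiles score zero and are ignored, so the fingerprint is built only from the textured half of the toy image. This sketch does not handle flips or resizing.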

After the team eliminated graphics such as arrows and flowchart elements, about 2 million images remained. To avoid the computational load of comparing every image with every other image, they compared images only across papers that shared a first or corresponding author. The algorithm can flag potential duplicates even when an image has been flipped or resized, or when its contrast or color has been changed.
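
A rough calculation shows why restricting the comparison matters: comparing about 2 million images all against all would mean roughly 2 trillion pairs. The sketch below counts those pairs and then builds candidate pairs only among images that share a first or corresponding author. The record layout, the author names and the function names are assumptions made for illustration, not details from the preprint.

```python
from collections import defaultdict
from itertools import combinations


def all_pairs_count(n):
    """Number of unordered pairs among n images: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def candidate_pairs(images):
    """Pair up only images whose papers share a first or corresponding author.

    `images` is a list of (image_id, paper_id, key_authors) tuples, where
    key_authors is the set of first and corresponding authors of the paper
    (a hypothetical record layout).
    """
    by_author = defaultdict(set)
    for image_id, _paper_id, key_authors in images:
        for author in key_authors:
            by_author[author].add(image_id)
    pairs = set()
    for image_ids in by_author.values():
        pairs.update(combinations(sorted(image_ids), 2))
    return pairs


# Hypothetical records: three papers by two overlapping authors, one unrelated.
images = [
    ("img1", "paperA", {"author_x"}),
    ("img2", "paperA", {"author_x"}),
    ("img3", "paperB", {"author_x", "author_y"}),
    ("img4", "paperC", {"author_y"}),
    ("img5", "paperD", {"author_z"}),
]

print(all_pairs_count(2_000_000))    # all-against-all pairs for 2 million images
print(len(candidate_pairs(images)))  # pairs left after grouping by author
print(all_pairs_count(len(images)))  # pairs without grouping
```

The trade-off is that duplicates reused by unrelated authors are never compared, so this kind of grouping lowers the cost at the price of missing some cases.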

After that, the trio manually examined a sample of about 3,750 images that had been flagged by the algorithm to determine if the duplicate images were suspicious or faked. Based on the results of the checks, they estimate that about 1.5% of the papers in the database contain suspicious images and 0.6% contain faked images.

Hany Farid, a computer scientist at Dartmouth College in the United States, noted that the researchers have not yet been able to benchmark the accuracy of the algorithm, because there is no database of scientific images known to be duplicates or non-duplicates for them to test against. However, he praised the Acuna team for applying existing technology to real-world images and for trying to put the tool in the hands of journal editors.

Time-consuming and difficult

Currently, many journals check some images, but very few have automated checking processes. As an example, Nature performs random sampling checks on incoming manuscripts and requires authors to submit unedited gel images for reference purposes. Nature is currently reviewing its image inspection process. (The Nature news team and its journal team are editorially independent of each other.)

A few journals, among them the Journal of Cell Biology and The EMBO Journal, are leading the way by manually screening most images in submissions, but Bernd Pulverer, editor-in-chief of The EMBO Journal, says the process is time-consuming and that a routine automated process has been slow to arrive.

Elsevier's head of research integrity, IJsbrand Jan Aalbersberg, believes that, to check for image reuse in the literature, publishers need to create a shared database of all published images and use it as a benchmark against which to compare the images in papers awaiting publication.

There is precedent for this type of cooperation. In 2010, academic publishers launched an industry-wide collaborative service to combat plagiarism. The non-profit collaborative Crossref, whose members include some 10,000 commercial and scholarly-society publishers, launched the CrossCheck service, which checks the full text of papers published by member publishers using the plagiarism detection software iThenticate, produced by California-based Turnitin. The service was later renamed "Similarity Check" and helped make plagiarism detection a regular practice in the publishing industry.

Ed Pentz, executive director of Crossref, said the organization currently has no plans to implement a publisher-wide image detection system, partly because the technology is not yet mature. However, he said Crossref will closely monitor relevant developments in the industry.

Elsevier, for its part, has expressed support for an image detection program similar to Similarity Check. Two years ago, Elsevier entered into a three-year, €1 million partnership with Humboldt University of Berlin to mine research papers and identify research misconduct. On January 25, the project announced plans to create a database of images from retracted publications. This database could serve as a test image library to help researchers develop automated tools for screening images in publications.

