ChatGPT: All talk and no substance?

Can we really use AI to write research papers?

As an editor, writer, and scientific researcher, I have been following with interest the growing momentum of artificial intelligence (AI) programs in scientific writing.

AI tools are certainly causing a stir, and even the leading journals Nature and Science are at loggerheads over the best way forward.

Once I learned that Nature was accepting submissions that acknowledged AI-assisted writing tools, I decided it was time to check it out for myself. After reading various opinions on the subject, I decided that, for the time being, these algorithms are likely best used for creating the “filler” text: introductions and summaries.

I initially asked one such prominent AI tool (ChatGPT) to compose an introduction for a review paper on acute kidney injury (AKI) in children. I was pretty impressed – within a few seconds I had some fairly decent prose written in the style of a review article introduction. Sentences perfectly formed. A native tone. Ideal.

But the text was superficial. The introduction comprised just 126 words, of which a third described what the review would be about (based on the text I input).

Perhaps the algorithm needed more input than I thought. So, giving it the benefit of the doubt, I supplied a bit more information, asking it to include a discussion on the genetic basis of AKI in children.

I gained just 20 words on the original.

First impression – a good starting point but certainly not relieving me of the task of writing my introduction.

On to its next test. “Can you include some references in the introduction you have written?”, I asked. “Certainly!”, it boldly responded.

The same introduction came back, but now with three reference citations dotted repeatedly about the text. I was shocked, as at first glance we now had something approaching the full package. A fully referenced piece of novel, grammatically correct text. The references were absolutely plausible – the journals were well known in the field, I knew the author names, and the subject area matched their expertise. There was nothing to suggest anything was wrong.

But the editor and researcher in me checked these three references out. I could not find them anywhere.

So I asked ChatGPT directly, “Are these real references?” The reply was adamant. “Yes, the references cited are genuine articles that have been published in the scientific literature…” The only concession ChatGPT made was that, being a language model, it does “not have the ability to independently verify the accuracy or validity of the content of these references” (a perhaps even more important issue for a later discussion).

I searched again, and again came up with nothing. I asked ChatGPT three times, and each time it maintained that the references were genuine, though it conceded that the citations perhaps contained typographical errors, which would explain my difficulty in finding them. On my fourth attempt at telling ChatGPT that the references did not exist, the response was unexpected, to say the least.

ChatGPT returned to me the 200 words or so of introductory text, but this time with a disclaimer that the references referred to “are fictional and provided as an example only”.

Fourth time lucky. It took me half an hour to create some 200 words of falsely referenced text, and half an hour to get ChatGPT to admit it. Could I even trust those 200 words now?

Armed with this knowledge, I repeated the experiment on a different subject area. The same thing happened. This time, two out of three references were invented by ChatGPT. Now knowing how to probe ChatGPT for the truth, I quickly got the response that “the references were generated based on commonly cited sources in the scientific literature” and that “they do not appear to be accurate or credible sources of information”. But I had to probe for this answer many times before the truth came out.

My discovery, therefore, is quite profound. ChatGPT lied. Several times. Here, no harm was caused, but I was committed to checking and double checking. Will everyone using these tools to save time be so persistent? In the real world, misattributing statements of fact to real researchers is dangerous and misleading. Moreover, if the content these tools create is also false, and then attributed to an active researcher…then what? The repercussions could be serious.

I have no doubt that these prototypes are going to develop into highly sophisticated tools that will have enormous benefits, uses and applications. But I urge caution, especially in the context of the biomedical sciences. As with most things, there are pros and cons with these tools, and while we are in the early days of their development, I suggest that you trust your own abilities rather than a computer to write your papers.

*My tests were based on the ChatGPT Jan 30 Version (Free Research Preview) and were conducted in February 2023.

Author

Jessica Tamanini
