Quick Thoughts on ChatGPT and Academic Research

Several years ago, in my doctoral seminar for first-year students, I distributed printouts of seven short papers, each co-authored by me and one of the seven students in the class. I indicated that I had painstakingly worked on these papers to give them a head start on the publication process, and I asked them to take 30 minutes to review their paper and let me know what they thought. The papers looked superficially credible but were garbage, generated by SCIgen, "a program that generates random Computer Science research papers, including graphs, figures, and citations. It uses a hand-written context-free grammar to form all elements of the papers." After review, only three of the seven students identified the nonsensical nature of the papers; two were unsure (perhaps because they did not want to challenge the instructor), and two indicated that they liked the papers and thanked me.

The technology is far better today, and ChatGPT, because of its easy accessibility, is causing widespread concern. Some journals and conferences have already set policies that prohibit the use of ChatGPT in the research product. For instance, the International Conference on Machine Learning indicates that "Papers that include text generated from a large-scale language model (LLM) such as ChatGPT are prohibited unless the produced text is presented as a part of the paper's experimental analysis."

Is this an overreaction? Certainly, the difficulty of discerning AI-generated prose from human-generated prose increases the diligence needed from our editors and reviewers. Most studies have shown that humans have a difficult time discriminating between AI- and human-generated text. Machines (i.e., bot-detection AI), however, perform better at discriminating. AI-generated writing tends to be less specific, less creative, to overgeneralize from specific instances, and to have a different writing style (e.g., it uses more statistically predictable words) than human writing. AI tools (like GPTZero) have been fairly successful at probabilistically identifying AI-generated writing.
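To make the "more predictable words" intuition concrete, here is a minimal sketch of perplexity-based screening in the spirit of tools like GPTZero. It is not GPTZero's actual method; GPT-2 is simply an illustrative stand-in, and a real detector would calibrate a threshold on labeled human and AI text.

```python
# Sketch: score a passage by how predictable a language model finds it.
# Low perplexity (very predictable text) is weak evidence of AI generation.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing labels=input_ids returns the average next-token
        # cross-entropy loss; exponentiating it gives perplexity.
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

print(perplexity("The results of the experiment were inconclusive."))
```

The score is only probabilistic evidence, which is why such tools report likelihoods rather than verdicts.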

However, while there may be legitimate reasons for reacting adversely to this tool, there are just as many reasons to embrace it proactively. ChatGPT is just that, a tool, and it can be embraced like other tools (e.g., Grammarly) to improve the quality of writing. For instance, the journal review process often ends with the tedium of shortening the paper to meet length requirements; the tool could ease the difficulty of deciding what to cut. Or consider the value to authors of feeding a complete paper to the AI tool and having it write the abstract. Similarly, complex papers could be made more accessible to different constituencies by simplifying the communication of complex ideas. This could facilitate better communication of our work to practice – something often discussed, but rarely done because it takes "extra" effort once the goal of journal publication is met. Non-native English-speaking researchers could benefit greatly from using the tool to improve the quality of their writing. The AI could also scrape websites or papers and organize the material at a general level, which might facilitate data collection (from websites) or a literature review (from papers).
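As a minimal sketch of the abstract-writing idea, here is how an author might script it against OpenAI's Python library (the pre-1.0 ChatCompletion interface). The model name, prompt, and parameters are illustrative assumptions, and any output would of course still need author review.

```python
# Sketch: feed a complete paper to the model and ask for a draft abstract.
import openai

openai.api_key = "YOUR_API_KEY"  # assumption: you have an API key

def draft_abstract(paper_text: str, max_words: int = 200) -> str:
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",  # illustrative model choice
        messages=[
            {"role": "system",
             "content": "You summarize academic papers into abstracts."},
            {"role": "user",
             "content": f"Write an abstract of at most {max_words} words "
                        f"for the following paper:\n\n{paper_text}"},
        ],
        temperature=0.3,  # keep the summary close to the source text
    )
    return response.choices[0].message.content

# Example use: print(draft_abstract(open("paper.txt").read()))
```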

The challenges are also substantial. If our systems (particularly less rigorous conferences) cannot discriminate, then the mass production of AI-generated research papers could tax the review system and challenge the integrity of research. False information is just as much of a potential problem in research as it is in journalism and news, because how the AI ingests information (based on its training set) and weighs certain information could lead to misleading conclusions. The problem is compounded when it is difficult to untangle the sources of the information and the attribution of credit. Where does the intellectual ownership lie? Is it with the training set used or with the algorithms, the latter of which are usually a black box behind a wall of corporate control? The lack of transparency can make the governance of the tool very messy.

So, where are we going with this – and what are the solutions? While it would be foolhardy to speculate with high specificity on direction, there are a few general tenets that I feel comfortable predicting.

  • The battle between bots (ChatGPT vs. bot-detection AI) is only a small part of the solution. While we can train models to separate human text from AI-generated text (see the sketch after this list), there will always be a degree of mismatch, as the training sets for the two need to constantly change as the AI evolves.
  • The AI will always get better (through reinforcement learning, bigger and better training sets, and access to the Internet), so fighting this trend will fail – policies need to be set around transparency instead.
  • For academic research, the line lies between using the chatbot to improve writing (a communication goal) and using it to generate new ideas (an innovation goal). Where that line is drawn between communication and innovation, and how policies are articulated, is an important professional conversation.
  • ChatGPT can never partake in co-authorship arrangements because it lacks accountability.
  • There needs to be a high cognizance of ethics in the AI to prevent the automation of misinformation and the spread of false research.
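The sketch promised in the first bullet above: a toy human-vs-AI text classifier. The two-sentence "corpora" are placeholders I invented for illustration; a usable detector would need large, current samples of both kinds of text, and, as the bullet notes, it goes stale and must be retrained as the generators evolve.

```python
# Sketch: a simple supervised detector trained on labeled human vs. AI text.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Placeholder training data; real training sets would hold thousands
# of labeled passages and would need constant refreshing.
human_texts = [
    "We struggled for months to replicate the effect across labs.",
    "Frankly, the second study surprised us more than the first.",
]
ai_texts = [
    "The results demonstrate that the proposed approach is effective.",
    "In conclusion, further research is needed to validate the findings.",
]

X = human_texts + ai_texts
y = [0] * len(human_texts) + [1] * len(ai_texts)  # 0 = human, 1 = AI

detector = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                         LogisticRegression())
detector.fit(X, y)

# Estimated probability that a new passage is AI-generated; this estimate
# is only valid until the underlying generator changes.
print(detector.predict_proba(["Text to screen."])[0][1])
```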

I suspect that, as with most creative AI tasks, a complementary combination of human and AI effort will produce the best product. ChatGPT as a tool can greatly facilitate research writing and other creative pursuits (like filmmaking, book writing, etc.) – but the open question is how good it can get. The perfect human-AI complementarity may be an elusive ideal that requires ongoing navigation through some delicate ethical boundaries.

