My working definition of an LLM is a model trained on vast amounts of text and designed to generate coherent output by predicting what should follow a prompt. A lot has been written about the use of LLMs in research. Several months ago, playing around with ChatGPT, I was not overly impressed with its ability to summarize reviewer reports to make it more convenient for me to write a Senior Editor letter. Nor was it very useful in generating a PowerPoint slide for insertion into a research paper. In both cases, I spent far more time trying to make it work better, and then I ended up just doing it myself. More recently, however, I revisited ChatGPT and explored its performance in other research activities, and noticed an apparent improvement. Below, I offer some broad observations specific to my experience with the tools.
- The more you are familiar with an area, the less value the LLM will provide.
- Better prompts provide better outputs – but alignment with what you need can be honed through conversations.
- Socratic questioning is effective: probe to clarify, challenge assumptions, and draw out implications. For instance, "What do you mean by X?", "Can you provide an example?", or "What are the implications?"
If we consider various other research tasks, here are some illustrations that describe my experience.
Research Brainstorming
LLMs seem quite effective in exploring ideas. For instance, if you feed one an anomaly (an observation that doesn't sit right with you) and ask for possible reasons, the yield can be useful. When I typed, "I don't understand why IT investments as a percentage of corporate budgets are going down in certain industries despite the digital age we are in. Can you explain this?" the LLM (in this case, Bard) gave me a cogent set of possible reasons, without challenging the claim inherent in my question. When I asked it to "explain why people who gamble tend to be innovative users of technology," it gave me a long list of very plausible reasons, including their probabilistic mindset, their tendency to use various gaming platforms, the rapid growth of online gambling, etc. This was useful in generating initial ideas for building theoretical models.
Theory/Review
LLMs do not seem as effective at summarizing literature, although, with some probing, the output can reach a marginally useful point. For instance, when I asked ChatGPT to summarize the literature on digital strategy, it gave several fairly generic statements. On probing for sources, it emerged that many of them were practice-oriented books and articles. Probing on key themes allowed the tool to give me enough for a basic understanding of some major themes, but there was no systematic way to validate the approach it followed.
On the theory side, specificity helped. Asking the LLM to provide a theoretical basis for the relationship between two constructs yielded useful output. For instance, when I asked, "Is Williamson's Transaction Cost Theory valid in today's highly digital environment?" or asked it to "critique the technology acceptance model," it provided a fairly complete set of points with reasonable validity. Probing its sources, however, proved unsatisfactory.
Data/Analysis
Trying to search for data was also not satisfying. I asked Bard, "I need a data set for ownership of cell phones in the USA. Where can I find it? Provide links." The outputs were generic links to public databases where such data might possibly be obtained, but nothing that was immediately useful.
However, on specific questions of analysis like "What is the best analysis for a dataset where multiple DVs are interval and the IVs are categorical?" the bot came back with MANOVA and the steps needed to conduct the analysis. Similarly, with specific questions like "Can you find an instrumental variable for IT self-efficacy?" or "Write code in R for discriminant analysis where Y is a three-level DV and A and B are IVs," the chatbot readily provided accurate prescriptions.
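To give a sense of the kind of code such a prompt produces, here is a minimal sketch of that discriminant analysis, written in Python with scikit-learn rather than R purely for illustration. The data are synthetic and the variable names (A, B, Y) simply mirror the prompt; none of this comes from the chatbot's actual answer.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Synthetic data: Y is a three-level DV; A and B are the two IVs.
rng = np.random.default_rng(0)
group_means = [(0, 0), (3, 0), (0, 3)]  # one (A, B) mean per group
X = np.vstack([rng.normal(m, 1.0, size=(50, 2)) for m in group_means])
y = np.repeat([0, 1, 2], 50)  # three-level DV, 50 cases per group

# Fit linear discriminant analysis and check in-sample classification
lda = LinearDiscriminantAnalysis()
lda.fit(X, y)
print("Training accuracy:", lda.score(X, y))
```

With group means this well separated, the model classifies most cases correctly; on real data you would of course hold out a test set rather than score on the training data.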
Writing
LLMs are quite effective at shortening text, although in some cases they do not abide by prescribed word limits. They can be useful in identifying keywords or suggesting titles from an abstract. They are also good at reviewing a document for clarity and grammar and at offering writing diagnostics (for example, you can ask the tool to identify the key point of a paragraph or flag sentences that are unclear).
Overall, based on my limited experience, I think LLMs can be useful in research, but they require some cognitive engagement on the part of the researcher. If the improvement I have observed in a matter of months is any indication, then I believe that at some point in the near term they might prove incredibly useful. However, usefulness as a tool to support research differs from adding useful creativity to the research. How we draw the line between using the LLM to improve writing (a communication goal) versus generating new ideas (an innovation goal) will be tricky. While editorial policies of journals are still being refined, most journals require full disclosure of LLM use, and some restrict use to writing only.