The potential of large language models is the central finding of a new working paper, which used ChatGPT-4 to replicate a study originally conducted for a Fortune 500 company without the use of AI.
Published by the Marketing Science Institute, the research explored how LLMs can be integrated across different phases of the market research process. Collaborating with a major U.S. food company, the authors recreated both qualitative and quantitative studies from 2019 using generative AI.
The qualitative part focused on understanding consumer perspectives around Friendsgiving, an increasingly popular offshoot of Thanksgiving. The quantitative phase tested reactions to refrigerated dog food among pet owners. These AI-generated studies were then benchmarked against the original human-led ones to evaluate performance. The research team also developed the system architecture and prompt design to ensure the LLM’s output aligned with research objectives.
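The paper does not publish its prompts, but the architecture it describes, priming an LLM to answer as a recruited participant before posing study questions, can be sketched roughly as follows. All names, persona fields, and wording below are illustrative assumptions, not the authors' actual materials.

```python
# Illustrative sketch of persona-conditioned prompt design for a
# synthetic qualitative interview. The persona and question are
# invented examples, not the study's actual research instruments.

def build_interview_prompt(persona: dict, question: str) -> str:
    """Compose a prompt that primes the model to answer as a
    specific consumer before posing a research question."""
    persona_lines = "\n".join(f"- {key}: {value}" for key, value in persona.items())
    return (
        "You are a participant in a consumer research interview.\n"
        "Answer in the first person, staying consistent with this profile:\n"
        f"{persona_lines}\n\n"
        f"Interviewer question: {question}\n"
        "Answer:"
    )

persona = {
    "age": 29,
    "household": "shares an apartment with two roommates",
    "tradition": "hosts Friendsgiving every November",
}
prompt = build_interview_prompt(
    persona, "What does Friendsgiving mean to you compared with Thanksgiving?"
)
print(prompt)
```

The value of this structure is that each synthetic respondent is anchored to a distinct profile, which is what lets researchers compare the generated answers against the human participants from the original 2019 study.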
Key Findings on the Use of LLMs in Market Research
New research into the application of large language models (LLMs) in market research highlights both the potential and limitations of this emerging technology. When combined with human input, LLMs can significantly enhance the quality and depth of insights, particularly in qualitative studies.
One of the most promising findings is that a hybrid LLM-human approach can yield high-quality, information-rich data. In qualitative settings, LLM-generated responses often outperform human counterparts in terms of depth and informativeness, offering detailed answers that introduce new and engaging ideas. However, these responses tend to suffer from lower readability, which may impact their immediate usability without further editing.
LLMs also demonstrated surprising strength in identifying niche participant segments that human researchers had previously overlooked, suggesting they can broaden the scope and inclusivity of consumer research. In their role as analysts, LLMs showed strong capabilities, performing comparably to humans at identifying core insights, clustering them into coherent themes, and providing concise summaries.
However, challenges remain – particularly in the quantitative domain. When used to generate synthetic responses for surveys, LLMs struggled with replicating the diversity and internal consistency found in human answers. This limitation can be mitigated by giving models additional context, such as prior survey responses, or by implementing retrieval-augmented generation (RAG), which allows LLMs to draw on a company’s proprietary knowledge base.
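The retrieval-augmented approach mentioned above can be sketched in miniature. The toy retriever below ranks prior human responses by simple word overlap with the survey question and prepends the best matches as context; a production system would use embeddings and a vector store over the company's knowledge base. All data and function names here are invented for illustration.

```python
# Minimal sketch of retrieval-augmented generation (RAG) for survey
# simulation: before generating a synthetic answer, retrieve the most
# similar prior human responses and include them in the prompt so the
# model's output reflects the observed diversity of real answers.

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank prior responses by word overlap with the query (toy scorer)."""
    query_words = set(query.lower().split())
    ranked = sorted(
        corpus, key=lambda doc: -len(query_words & set(doc.lower().split()))
    )
    return ranked[:k]

def build_rag_prompt(question: str, corpus: list[str]) -> str:
    """Prepend the retrieved context to the survey question."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, corpus))
    return (
        "Prior human survey responses on this topic:\n"
        f"{context}\n\n"
        "Drawing on the range of opinions above, answer as a new "
        f"respondent: {question}"
    )

prior_responses = [
    "I would try refrigerated dog food if my vet recommended it.",
    "Refrigerated dog food seems expensive compared to kibble.",
    "My dog is picky, so fresher food might help.",
]
prompt = build_rag_prompt(
    "Would you buy refrigerated dog food for your pet?", prior_responses
)
print(prompt)
```

Grounding the model in actual prior responses, rather than letting it invent opinions from its training data alone, is what the paper suggests helps close the gap in diversity and internal consistency between synthetic and human answers.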
Finally, the study cautions that LLMs may unintentionally reinforce human biases present in the training data or propagate misinformation. As such, human oversight remains essential to ensure the integrity and accuracy of the research findings.

