Date: 2025-07-15

When we write something to another person, over email or perhaps on social media, we may not state things directly, but our words may instead convey a latent meaning, an underlying subtext. We also often hope that this meaning will come through to the reader.

But what happens if an artificial intelligence system is at the other end, rather than a person? Can AI, especially conversational AI, understand the latent meaning in our text? And if so, what does this mean for us?

Latent content analysis is an area of study concerned with uncovering the deeper meanings, sentiments, and subtleties embedded in text. For example, this type of analysis can help us grasp political leanings present in communications that are perhaps not obvious to everyone.

Understanding how intense someone’s emotions are or whether they’re being sarcastic can be crucial in supporting a person’s mental health, improving customer service, and even keeping people safe at a national level.

These are only some examples. We can imagine benefits in other areas of life, like social science research, policymaking, and business. Given how important these tasks are—and how quickly conversational AI is improving—it’s essential to explore what these technologies can (and can’t) do in this regard.

Research on this question is only just starting. Early work shows that ChatGPT has had limited success in detecting political leanings on news websites. Another study, which compared sarcasm detection across different large language models (the technology behind AI chatbots such as ChatGPT), showed that some are better than others.

Finally, a study showed that LLMs can guess the emotional “valence” of words (the inherent positive or negative feeling associated with them). Our new study, published in Scientific Reports, tested whether conversational AI, including GPT-4 (a relatively recent version of ChatGPT), can read between the lines of human-written texts.

The goal was to find out how well LLMs simulate understanding of sentiment, political leaning, emotional intensity, and sarcasm, covering several kinds of latent meaning in one study. We evaluated the reliability, consistency, and quality of seven LLMs, including GPT-4, Gemini, Llama-3.1-70B, and Mixtral 8x7B.

We found that these LLMs are about as good as humans at analyzing sentiment, political leaning, and emotional intensity, and at detecting sarcasm. The study involved 33 human subjects and assessed 100 curated items of text.

For spotting political leanings, GPT-4 was more consistent than humans. That matters in fields like journalism, political science, or public health, where inconsistent judgement can skew findings or miss patterns.

GPT-4 also proved capable of picking up on emotional intensity and especially valence. Whether a tweet was composed by someone who was mildly annoyed or deeply outraged, the AI could tell. A human still had to confirm whether the AI's assessment was correct, however, because AI tends to downplay emotions. Sarcasm remained a stumbling block for both humans and machines.