Science and technology have a profound impact on everyone’s life, but the dense jargon experts use to talk about them makes it hard for laypeople to get a grip on these fields. Now a new tool that automatically identifies jargon could help scientists get their point across more effectively.
The “De-jargonizer” is the brainchild of Ayelet Baram-Tsabari, an associate professor at the Israel Institute of Technology who works on science education and communication. Before moving into academia she worked as a journalist covering science.
She says she received frequent complaints from scientists about how poorly science was represented in the media; they blamed unprofessional journalists, an ignorant general public, and a failing education system.
“But there was only one community that did not need to change anything—the scientists themselves,” she told Singularity Hub by email. “They did everything perfectly. And I remember thinking that the only group scientists can directly influence is themselves, and they could use a little help.”
The researchers were keen to note that scientists are not willfully making their language indecipherable; they simply suffer from the so-called “curse of knowledge.” After years of studying a topic, it’s difficult to remember what you didn’t know before you became an expert, which makes it hard to judge which terms will trip up a general audience.
So her group set to work creating a free online tool that automatically identifies the technical language that alienates outsiders from discussions about science and technology, in the hope that experts will use it to make their articles, blogs, and speeches more accessible.
To build it, the group hoovered up more than 90 million words from the roughly 250,000 articles published on the BBC’s website between 2012 and 2015. The words were classified by how frequently they appeared, and that frequency analysis underpins the online tool.
When a text is uploaded or pasted into the online tool, an algorithm color-codes words using black to denote commonly used words, orange for intermediate difficulty words, and red for jargon.
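The article doesn’t spell out the tool’s exact thresholds, but the basic mechanism (count how often each word appears in a large general-audience corpus, then bucket the words of a submitted text by frequency) can be sketched roughly as follows. The cutoff values and function names here are illustrative assumptions, not the De-jargonizer’s actual parameters:

```python
from collections import Counter
import re

def build_frequency_table(corpus_texts):
    """Count word occurrences across a large general-audience corpus,
    then normalize to occurrences per million words."""
    counts = Counter()
    for text in corpus_texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    total = sum(counts.values())
    return {word: n * 1_000_000 / total for word, n in counts.items()}

def classify_word(word, freq_per_million,
                  common_cutoff=1000.0, rare_cutoff=50.0):
    """Bucket a word by its corpus frequency.
    The cutoffs are made-up placeholders, not the tool's real values."""
    freq = freq_per_million.get(word.lower(), 0.0)
    if freq >= common_cutoff:
        return "common"        # rendered in black
    if freq >= rare_cutoff:
        return "intermediate"  # rendered in orange
    return "jargon"            # rendered in red

def color_code(text, freq_per_million):
    """Label every word in a submitted text with its frequency bucket."""
    return [(w, classify_word(w, freq_per_million))
            for w in re.findall(r"[A-Za-z']+", text)]
```

Counting the share of words that land in the “jargon” bucket then gives a rough measure of how accessible a draft is, along the lines of the figures reported below.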
Because it is based on a statistical analysis, the system’s classification is not always perfect, but the researchers say the corpus will be updated periodically, and they hope to let users flag misclassified words to help fine-tune the system.
Baram-Tsabari says the approach should also carry over to any language with enough written online content to establish word frequencies. Her group is testing it on Hebrew at the moment, and she says it seems to work well so far.
In a paper in the journal PLOS ONE last week, the researchers tested the De-jargonizer on 5,000 pairs of academic paper abstracts and their corresponding lay summaries, which are aimed at a wider audience.
They found the lay summaries did include less jargon than the abstracts: 10 percent of their words, compared with 14 percent. But previous research has found that readers need to be familiar with 98 percent of the words in a text to comprehend it adequately.

That means even when scientists try to adapt their writing for non-experts, much of it will still go over people’s heads; a summary that is 10 percent jargon leaves readers recognizing only about 90 percent of the words, well short of that threshold. Baram-Tsabari hopes the De-jargonizer can help scientists bring those figures down.
“I believe it is our duty as well as interest to make our work, mainly funded by taxpayer money, accessible to members of society,” she says. “I want to see evidence-based public discourse, and it’s hard to use evidence if the experts don’t do their share of communicating clearly.”
Image Credit: Stock Media provided by elxeneize / Pond5