The shortage of qualified data scientists is often highlighted as one of the major handbrakes on the adoption of big data and AI. But a growing number of tools are putting these capabilities in the hands of non-experts, for better and for worse.
Recent years have seen an explosion in the breadth and quality of self-service analytics platforms, which let non-technical employees tap the huge amounts of data businesses are sitting on. These platforms typically let users carry out simple, day-to-day analytic tasks—like creating reports or building data visualizations—without having to rely on the company’s data specialists.
Gartner recently predicted that workers using self-service analytics will output more analysis than professional data scientists. Given the perennial shortage of data specialists and the huge salaries they command these days, that’s probably music to the ears of most C-suite executives.
And increasingly, it’s not just simple analytic tasks that are being made more accessible. Driven in particular by large cloud computing providers like Amazon, Google, and Microsoft, there are a growing number of tools to help beginners start to build their own machine learning models.
These tools provide pre-built algorithms and intuitive interfaces that make it easy for someone with little experience to get started. They are aimed at developers rather than the everyday business users of simpler self-service analytics platforms, but they mean a PhD in advanced statistics is no longer a prerequisite.
Most recently, Google released a service called Cloud AutoML that actually uses machine learning itself to automate the complex process of building and tweaking a deep neural network for image recognition.
They aren’t the only ones automating machine learning. Boston-based DataRobot lets users upload their data and highlight their target variables; the system then automatically builds hundreds of models drawing on the platform’s library of hundreds of open-source machine learning algorithms. The user can then pick the best-performing model and use it to analyze future data.
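To give a rough sense of what that "build many models, keep the best" workflow looks like under the hood, here is a minimal sketch using the open-source scikit-learn library. It is not DataRobot's API; the dataset and the three candidate algorithms are arbitrary stand-ins.

```python
# Illustrative only: a hand-rolled "fit many models, keep the best" loop
# using open-source scikit-learn. Not DataRobot's actual API; the dataset
# and candidate list are placeholders.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

X, y = load_breast_cancer(return_X_y=True)  # stand-in for the user's uploaded data

candidates = {
    "logistic_regression": LogisticRegression(max_iter=5000),
    "random_forest": RandomForestClassifier(n_estimators=200),
    "gradient_boosting": GradientBoostingClassifier(),
}

# Score every candidate with cross-validation and keep the best performer.
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in candidates.items()}
best_name = max(scores, key=scores.get)
best_model = candidates[best_name].fit(X, y)  # retrain on all data for later use
print(best_name, round(scores[best_name], 3))
```

A commercial platform adds far more on top of this—automated feature engineering, hyperparameter tuning, and deployment—but the core loop of scoring many candidates and promoting the winner is the same.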
For the more adventurous developers, there are a growing number of open-source machine learning libraries that provide the basic sub-components needed to craft custom algorithms.
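As a sketch of what those "basic sub-components" look like in practice, the snippet below assembles a small model by hand from layers, an optimizer, and a loss function using the open-source Keras API bundled with TensorFlow. The layer sizes and 20-feature input shape are arbitrary placeholders.

```python
# Minimal sketch of building a model from open-source components
# (layers, optimizer, loss) with the Keras API in TensorFlow.
# Layer sizes and the 20-feature input shape are placeholders.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # binary prediction
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
```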
This still requires considerable coding experience and a brain wired for data, but just last month Austin-based CognitiveScale released Cortex, which they say is the first graphical user interface for building AI models.
Rather than having to specify what they want by writing and combining endless lines of code, users can simply drop various pre-made AI “skills” like sentiment analysis or natural language processing into a honeycomb-like interface with lines between the cells denoting data flows. These skills can be combined to build a more complex model that is able to carry out high-level tasks, like processing insurance claims using text analysis.
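Cortex’s honeycomb interface is visual, but the underlying idea of wiring pre-made skills together into a dataflow can be sketched in a few lines of plain Python. The "skills" below are hypothetical toy stand-ins, not CognitiveScale components; the point is only that each cell consumes the output of the one before it.

```python
# Hypothetical sketch of chaining pre-made "skills" into a dataflow.
# These functions are toy stand-ins, not CognitiveScale's Cortex skills.

def extract_text(claim_document: str) -> str:
    """Skill 1: pull the free-text description out of a claim record."""
    return claim_document.strip()

def detect_sentiment(text: str) -> str:
    """Skill 2: toy sentiment check (a real skill would use an NLP model)."""
    return "negative" if "angry" in text.lower() else "neutral"

def route_claim(text: str, sentiment: str) -> str:
    """Skill 3: decide where the claim goes based on upstream skill outputs."""
    return "escalate_to_agent" if sentiment == "negative" else "auto_process"

# The "lines between the cells": each skill's output feeds the next.
claim = "Customer is angry that the windshield repair was denied."
text = extract_text(claim)
decision = route_claim(text, detect_sentiment(text))
print(decision)  # -> escalate_to_agent
```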
Just as replacing esoteric command-line interfaces with visual GUIs like Windows greatly expanded the number of people who were able to engage with personal computers, the creators of Cortex say their tool could have a similar effect for AI.
All of these attempts to democratize access to advanced analytics could go a long way toward speeding up its adoption across all kinds of businesses. Putting these tools in the hands of non-experts could mean companies that don’t have the resources to compete for top data professionals can still reap the benefits of AI.
It also frees up experts to work on the most cutting-edge applications of the technology rather than getting bogged down in more mundane but commercially important projects.
But there are also risks that need to be considered before setting non-experts loose on an organization’s data sets. Data science isn’t just about knowing how to build an algorithm. It’s about understanding how to collect data effectively, how to prepare it for analysis, and the strengths and limitations of various statistical techniques.
The old adage “garbage in, garbage out” highlights the danger of putting powerful analytics in the hands of those who don’t fully understand the tools they are using, the provenance of their data, or the potential errors and biases that may be hidden in it.
Writing in Forbes, Brent Dykes from self-service analytics platform Domo points out that businesses should not expect the democratization of these technologies to magically turn their employees into effective “citizen data scientists.” He says they need to be coupled with solid training on how to interpret and analyze data properly, as well as robust data governance to make sure the data being used is reliable.
That will require trained data scientists to play a critical oversight role to ensure that the proliferation of AI provides businesses with reliable insights rather than leading them astray.
Image Credit: fatmawati achmad zaenuri / Shutterstock.com