Want an AI revolution? Bring on open data!

AI is most often synonymous with data analysis (or analytics). That is, you need data to analyze, and most of the work of analytics is finding and managing data. As more people publish open data, these tasks become easier. Without open data, the pace of the AI revolution will slow down.

As you might know, AI is a hot topic. AI, or artificial intelligence, represents the dream, or for some, the fear, of computers taking over many of the tasks that humans do. AI also represents the vision of creating new solutions that humans can’t create on their own.

How advanced is the development of AI in Sweden and other comparable countries? AI isn’t widely established, but there are active projects, some of which are producing good results. Most of them are in the AI sub-discipline of machine learning.

What is machine learning all about?

Perhaps “advanced data analytics” would be a better description than machine learning. Most analytics solutions aren’t intuitive. They often involve large amounts of data and require significant computing power.

One thing about machine learning is easy to understand: without access to the right data, there is no analysis. Experts and research typically conclude that between 80 and 90 percent of the time in machine learning projects is spent finding, organizing, quality checking, and managing data in various ways.

The remaining time is spent deciding what analyses to perform, formulating and implementing algorithms, performing the analyses, and interpreting the results of the analyses.

Machine learning is already being used, directly and indirectly, in many applications. For example, machine learning analysis is the basis for how autonomous cars decide what is and isn’t a bicycle. Another example is the approval or rejection of different types of applications, such as loans.

In short, all decisions and priorities can be based on data. Many things that exist in the world can be described as data, such as images, sounds, numbers, calculations, written text, and so on.

There are many reasons to use machine learning for such analysis. One is the ability to handle large amounts of data and perform advanced analysis on it. Others include high performance and a high degree of automation, which reduces the need for human intervention. That means lower costs and fewer errors, and it eases the problem of recruiting competent staff.

With machine learning, let’s start at the beginning: where to find the right data. This is where open data comes in. Access to open data can provide a solution to many of the problems associated with finding data for machine learning, such as:

1. Finding appropriate data in the first place.
2. Ensuring that the data is of sufficient quality, quantity, and variety.
3. Structuring the data well.
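
To make the quality, quantity, and variety concerns a bit more concrete, here is a minimal Python sketch of the kind of check a team might run before using a dataset. The function name `summarize_dataset`, the field names, and the sample records are all hypothetical and chosen only for illustration; a real project would need checks suited to its own data.

```python
from collections import Counter


def summarize_dataset(rows, required_fields, category_field):
    """Return simple quantity, quality, and variety indicators.

    rows is a list of dict records. All names here are illustrative assumptions.
    """
    total = len(rows)

    # Quality: share of records where a required field is missing or empty.
    missing_share = {}
    for field in required_fields:
        missing = sum(1 for row in rows if row.get(field) in (None, ""))
        missing_share[field] = missing / total if total else 0.0

    # Variety: how many records fall into each non-empty category.
    categories = Counter(
        row[category_field]
        for row in rows
        if row.get(category_field) not in (None, "")
    )

    # Quantity: the plain record count.
    return {
        "rows": total,
        "missing_share": missing_share,
        "categories": dict(categories),
    }


# Hypothetical sample records, not real open data.
records = [
    {"station": "A", "temperature": 12.5, "region": "north"},
    {"station": "B", "temperature": None, "region": "south"},
    {"station": "C", "temperature": 9.0, "region": "north"},
    {"station": "D", "temperature": 11.0, "region": ""},
]

report = summarize_dataset(records, ["temperature", "region"], "region")
print(report)
```

A report like this does not fix anything by itself, but it shows quickly whether a dataset is too small, too incomplete, or too one-sided to be worth analyzing.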

The better the quality of the data, the greater the opportunity for automation

Access to open, linked data through public APIs will solve the first problem, assuming public data is available. The availability of public data will also go a long way toward solving the third problem, and will help solve the second. Of course, all of these promises depend on competent people working on structuring and publishing the public data. It can be assumed that there are competent people in organizations that work with public data.

There are a lot of horror stories floating around, highlighting problems with data quality and variation in the context of machine learning. One of the most talked about is the recruiting solution that consistently discriminated against women and people who weren’t white.

Why? Because most of the people who held those positions at the time were white men, and the data used for analysis reflected that.

Problems like this can be avoided by taking care when building algorithms, interpreting results, and designing solutions. But that requires people to do the work. The better the quality of the data, the greater the opportunity for automation and the lower the risk of biased results from analysis.
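
One simple way to spot the kind of skew described above is to look at how groups are represented in the historical data before it is used for analysis. The sketch below is only an illustration: the function names, the `group` field, the sample records, and the 30 percent threshold are all assumptions, and a representation check like this is far from a complete fairness review.

```python
from collections import Counter


def group_shares(records, group_field):
    """Return each group's share of the records as a fraction between 0 and 1."""
    counts = Counter(record[group_field] for record in records)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}


def underrepresented(shares, minimum_share):
    """List the groups whose share falls below the chosen threshold."""
    return sorted(group for group, share in shares.items() if share < minimum_share)


# Hypothetical historical hiring records, invented for this example.
history = [{"group": "men"}] * 8 + [{"group": "women"}] * 2

shares = group_shares(history, "group")

# The 0.3 threshold is an arbitrary assumption, not a recommended value.
flagged = underrepresented(shares, 0.3)
print(shares, flagged)
```

If a group is flagged, people still have to decide what to do about it, for example by collecting more data or by adjusting how the results are interpreted.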

In summary, publishing open data is currently the best strategy for making data widely available for machine learning. This is especially true in the public sector, where agencies, districts, regions, organizations, and public companies should be incentivized to collaborate by publishing open data.

Everyone wins by following this strategy.