The DCAT Application Profile for data portals in Europe (DCAT-AP) is a specification based on the Data Catalog Vocabulary (DCAT) developed by the W3C. It defines how to describe public data in the EU.
How can information be collected efficiently from many organizations so that it is easy to find and understand? The European Commission's answer is DCAT-AP, which is both a standard and a method for sharing data.
The European Commission faced the challenge of gathering data from 28 EU countries, which in turn had 1000 organizations with over 100 datasets each.
This is why the Commission decided to collect dataset descriptions automatically, using DCAT-AP. It solves the scalability issue and also makes it easier to present and search for data and to develop data portals, so that anyone can find datasets in the EU.
DCAT-AP and open data
The European Commission's standard makes clear which information is mandatory and which is optional when publishing open data. It also prescribes the use of specific terms and references. It all started with the PSI Directive, now called the Open Data Directive.
Countries can choose to adapt the standard to get richer information that is easier to understand. One example is Sweden, where DCAT-AP-SE is defined. Using this profile is also the only way to publish open data to the national data portal, dataportal.se. Such an adaptation is motivated by two things: first, to be able to deliver information to the EU, and second, to make data collection within the state efficient. A good implementation can remove a lot of manual work and at the same time increase quality.
How DCAT-AP works
In DCAT-AP you describe your information in three steps: for a data catalog (1), for each dataset or data service in the catalog (2), and lastly for the distributions (resources) connected to each dataset (3).
A dataset is only a description of what data exists and what it contains. The actual access to the data is called a distribution. This makes it possible to give access to the same data in several different ways. For example, a dataset can be accessed both as a .zip file that you download and as an API.
Because DCAT-AP builds on W3C standards, it is straightforward to extend the descriptions to other use cases, such as terminologies (using the SKOS standard) and specifications (using the PROF standard).
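The three levels can be illustrated with a minimal sketch in Python that builds a JSON-LD style description using only the standard library. The class and property names (dcat:Catalog, dcat:Dataset, dcat:Distribution, dcat:dataset, dcat:distribution, dcat:accessURL, dct:title, dct:description) come from DCAT and Dublin Core; all example.org URIs, titles and the function name build_catalog are made-up placeholders, and a real DCAT-AP record would need more properties than shown here.

```python
import json

# Namespace prefixes for DCAT and Dublin Core terms.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}


def build_catalog():
    """Return one catalog (1) holding one dataset (2) with two distributions (3)."""
    # Two ways to access the same data: a file download and an API.
    zip_distribution = {
        "@id": "https://example.org/dist/air-quality-zip",  # placeholder URI
        "@type": "dcat:Distribution",
        "dct:title": "Air quality as a .zip file",
        "dcat:accessURL": {"@id": "https://example.org/files/air-quality.zip"},
    }
    api_distribution = {
        "@id": "https://example.org/dist/air-quality-api",  # placeholder URI
        "@type": "dcat:Distribution",
        "dct:title": "Air quality API",
        "dcat:accessURL": {"@id": "https://example.org/api/air-quality"},
    }
    # The dataset only describes the data; access goes through distributions.
    dataset = {
        "@id": "https://example.org/dataset/air-quality",  # placeholder URI
        "@type": "dcat:Dataset",
        "dct:title": "Air quality measurements",
        "dct:description": "Hourly air quality readings (illustrative).",
        "dcat:distribution": [zip_distribution, api_distribution],
    }
    return {
        "@context": CONTEXT,
        "@id": "https://example.org/catalog",  # placeholder URI
        "@type": "dcat:Catalog",
        "dct:title": "Example catalog",
        "dct:description": "A catalog used only for illustration.",
        "dcat:dataset": [dataset],
    }


if __name__ == "__main__":
    print(json.dumps(build_catalog(), indent=2))
```

Keeping the distributions as a list under the dataset mirrors the idea that one description can point to several access routes without being duplicated.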
Editor and harvester for DCAT-AP
When a country-specific profile is defined, two new needs may arise: an editor in which the data owner can describe data so that it can be published, and, on the other side, a way to collect data from several sources. The editor is referred to as a DCAT-AP data catalog and the collector is referred to as a DCAT-AP harvester.
The challenge for a data catalog is not to create the first correct DCAT-AP expression, but rather to provide a work process in which data is maintained without the need for specialists. It is about the lifecycle.
A harvester, on the other hand, is operated by specialists. It needs to validate data and also provide easy means of building a search interface. The combination of harvesting data and presenting data is what is called a data portal. A portal can be very specific to a region or a topic, or it can provide nationwide access to data.
MetaSolutions develops EntryScape, a set of open source modules for working with metadata. Catalog is packaged for the publisher, and Registry for the data collector, who is typically the data portal owner. You can use the modules in different scenarios, together with existing solutions. Both modules support several DCAT-AP profiles today, and new profiles are added continuously.
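The validation step of a harvester can be sketched roughly as follows. This is not how EntryScape or any particular harvester works; it is a simplified illustration that walks a harvested catalog and reports missing mandatory properties. The MANDATORY table is an assumption and should be checked against the DCAT-AP version or national profile being targeted, and the function names and example.org URIs are hypothetical.

```python
# Assumed mandatory properties per class. Verify these against the
# DCAT-AP version or national profile you actually harvest.
MANDATORY = {
    "dcat:Catalog": ["dct:title", "dct:description", "dct:publisher"],
    "dcat:Dataset": ["dct:title", "dct:description"],
    "dcat:Distribution": ["dcat:accessURL"],
}


def missing_properties(resource):
    """List the assumed-mandatory properties that are absent or empty."""
    required = MANDATORY.get(resource.get("@type"), [])
    return [prop for prop in required if not resource.get(prop)]


def validate_catalog(catalog):
    """Walk catalog -> datasets -> distributions and collect (id, property) problems."""
    problems = []

    def check(resource):
        for prop in missing_properties(resource):
            problems.append((resource.get("@id", "<no id>"), prop))

    check(catalog)
    for dataset in catalog.get("dcat:dataset", []):
        check(dataset)
        for distribution in dataset.get("dcat:distribution", []):
            check(distribution)
    return problems


# A harvested record with two deliberate gaps: the dataset lacks a
# description and the distribution lacks an access URL.
harvested = {
    "@id": "https://example.org/catalog",
    "@type": "dcat:Catalog",
    "dct:title": "Example catalog",
    "dct:description": "A catalog used only for illustration.",
    "dct:publisher": {"@id": "https://example.org/org"},
    "dcat:dataset": [
        {
            "@id": "https://example.org/dataset/1",
            "@type": "dcat:Dataset",
            "dct:title": "Air quality measurements",
            "dcat:distribution": [
                {"@id": "https://example.org/dist/1", "@type": "dcat:Distribution"}
            ],
        }
    ],
}

for resource_id, prop in validate_catalog(harvested):
    print(f"{resource_id} is missing {prop}")
```

Returning the problems as data rather than raising on the first error lets a portal owner report every gap back to the publisher in one pass.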