Configuring the EntryScape platform to effectively support Application Profiles
MetaSolutions AB, Stockholm, Sweden http://metasolutions.se
Building web applications for every metadata standard is a daunting task. Writing applications for every Application Profile of a standard, for instance national or topical variations of a standard, is an even worse prospect. The EntryScape platform is built on the assumption that application profiles can be expressed as configurations for user interfaces oriented towards managing metadata. In this short paper we describe how this approach has been successfully realized. We also go into detail regarding the DCAT-AP recommendation and what we learned when implementing it in the EntryScape user interface. We present some generic conclusions about what to take into consideration when writing standards or application profiles, in order to minimize the risk that developers rely on guesswork or ignore parts that are unclear.
EntryScape is an Open Source data management system based on Linked Data. EntryScape is built as a Single Page Application and is intended to meet the needs of a wide variety of users and use-cases. EntryScape comes with three main modules:
EntryScape’s architecture includes a backend and a framework that contribute with some fundamental functionality:
We will now present each of the modules and summarize their core functionality, their challenges, and their roadmap. After that we will discuss some issues and observations we made when implementing support for DCAT-AP. Finally, in the conclusions we list some insights regarding what could be done differently when providing recommendations or application profiles for using vocabularies and schemas.
EntryScape Catalog enables organizations to manage their data catalogs and datasets according to the DCAT-AP standard. More specifically, we have provided RDForms templates for the Swedish and the Norwegian adaptations of DCAT-AP (more are under development). The system allows data managers to collaboratively author a data catalog, datasets, distributions, publishers and contacts. If appropriate, whole data catalogs or individual datasets may be published as Linked Data and made available, e.g. to be harvested by data portals with support for DCAT-AP. It is also possible to download an entire catalog as a single file, with all DCAT-AP entities included. The latter is suitable for harvesting schemes and is currently in use by several Swedish agencies and municipalities to publish their datasets to the National Swedish Open Data portal (oppnadata.se) and the European Data Portal.
EntryScape Catalog supports uploading files to a dataset that will appear as distributions. APIs can be automatically generated from tabular data to provide search oriented access to rows of the dataset according to the CSV on the Web recommendations. The API will also appear as a distribution of the dataset.
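To illustrate what search-oriented access to the rows of a tabular dataset can look like, the following is a minimal sketch in Python. It is not the EntryScape implementation: the function names `load_rows` and `search_rows`, the sample data, and the choice of case-insensitive substring matching are all assumptions made for illustration.

```python
import csv
import io


def load_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def search_rows(rows, **filters):
    """Return the rows in which every filtered column contains the given text.

    Matching is a case-insensitive substring test (an assumption of this
    sketch; a real service might offer exact or typed matching instead).
    """
    def matches(row):
        return all(
            needle.lower() in (row.get(column) or "").lower()
            for column, needle in filters.items()
        )

    return [row for row in rows if matches(row)]


# Hypothetical sample data standing in for an uploaded CSV distribution.
SAMPLE = (
    "name,municipality,type\n"
    "Central Library,Stockholm,library\n"
    "City Pool,Uppsala,pool\n"
)

rows = load_rows(SAMPLE)
hits = search_rows(rows, municipality="stock")
```

A query such as `search_rows(rows, municipality="stock")` corresponds roughly to a request with one query parameter per column, which is the kind of access such a generated API could expose.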
EntryScape Catalog also offers a portal view and special scripts that allow a brief summary per dataset to be embedded on a web page, e.g. one generated by a CMS. How much information is included is configurable, and there are always links back to the full portal view.
We have seen that there is a need for supporting a larger work process from the idea of a dataset to a mature and published dataset. This led to work on introducing support for “candidate datasets” in EntryScape Catalog. Candidates include a checklist (configurable per organization) along with support for tracking comments and tasks. On the same note we are also looking into how to support the maintenance perspective of datasets, e.g. reminders and checks whose regularity should probably be based on the accrual periodicity as used by DCAT-AP.
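The idea of deriving maintenance reminders from the accrual periodicity can be sketched as follows. This is only an illustration under stated assumptions: the mapping from frequency terms to a number of days, and the helper names `next_check` and `is_overdue`, are hypothetical. The frequency URIs are assumed to follow the Publications Office frequency authority table used with dct:accrualPeriodicity.

```python
from datetime import date, timedelta

# Assumed namespace of the frequency vocabulary used with dct:accrualPeriodicity.
FREQ = "http://publications.europa.eu/resource/authority/frequency/"

# Hypothetical mapping from a frequency term to a review interval in days.
REVIEW_INTERVAL_DAYS = {
    FREQ + "DAILY": 1,
    FREQ + "WEEKLY": 7,
    FREQ + "MONTHLY": 30,
    FREQ + "QUARTERLY": 91,
    FREQ + "ANNUAL": 365,
}


def next_check(last_modified, periodicity_uri):
    """Return the date when the dataset should be reviewed next.

    Returns None when the periodicity is unknown, so that no reminder is
    scheduled on the basis of a guess.
    """
    days = REVIEW_INTERVAL_DAYS.get(periodicity_uri)
    if days is None:
        return None
    return last_modified + timedelta(days=days)


def is_overdue(last_modified, periodicity_uri, today):
    """True if the review date has passed without the dataset being updated."""
    due = next_check(last_modified, periodicity_uri)
    return due is not None and today > due
```

Returning `None` for unknown periodicities is a deliberate choice in this sketch: a dataset without a declared frequency gets no reminder rather than an arbitrary one.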
As others have already pointed out, we also see a need for better description of APIs. The current approach of providing an access URL for a distribution together with documentation links (foaf:page) and linked schemas (dct:conformsTo) requires an official recommendation to be useful. Even with that, we already know that some data maintainers need to provide information about a service (API) directly in the metadata. Requiring them to provide something like a Swagger file will in many cases overshoot the target with respect to the maintainer’s technical competence, commitment, or allocated time. It may also be more than what is needed for achieving discovery in Open Data portals.
EntryScape Terms manages concept schemes and concepts expressed in SKOS. We have created an RDForms template for this module as well, to allow easy editing of concepts including labels, notes, relations and mapping relations. Concepts can be dragged and dropped in a hierarchy represented by the narrower and broader relations, as expected. There is also support for organizing concepts into SKOS collections. Entire SKOS concept schemes can be imported, and the original URIs of the concepts can be preserved or replaced with newly minted ones depending on the use case.
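The choice between preserving and replacing URIs on import can be sketched as below. The data structure (a dictionary from concept URI to label and broader concept), the function name `import_concepts`, and the sequential minting scheme are assumptions for illustration only. The point of the sketch is that broader references inside the scheme must be rewritten consistently when new URIs are minted.

```python
def import_concepts(concepts, base_uri=None):
    """Import concepts, keeping their URIs or minting new ones.

    concepts maps a concept URI to {"label": str, "broader": URI or None}.
    When base_uri is None the original URIs are preserved. Otherwise each
    concept gets a new URI under base_uri, and broader references that point
    to concepts in the same scheme are rewritten to the new URIs.
    """
    if base_uri is None:
        return {uri: dict(concept) for uri, concept in concepts.items()}

    # Hypothetical minting scheme: sequential numbers in import order.
    mapping = {
        old: f"{base_uri}{number}"
        for number, old in enumerate(concepts, start=1)
    }

    imported = {}
    for old, concept in concepts.items():
        broader = concept.get("broader")
        imported[mapping[old]] = {
            "label": concept["label"],
            # References to concepts outside the scheme are left untouched.
            "broader": mapping.get(broader, broader),
        }
    return imported
```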
An important aspect of EntryScape Terms is how concepts can easily be selected from within metadata forms. For instance, a regional or topical adaptation of DCAT-AP may require a fine-grained categorization of datasets. If such a categorization is too large, or is expected to change quite often, it is not reasonable to enumerate it directly in the RDForms template. In such cases it is better to use EntryScape Terms to maintain the categorization and to provide a dynamic reference to it from within the metadata form, performing a type-ahead search when needed.
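A type-ahead lookup of this kind can be sketched as a prefix search over concept labels. The function name `type_ahead`, the in-memory dictionary of concepts, and the prefix-matching rule are assumptions of this sketch; an actual deployment would presumably query a search index rather than scan a dictionary.

```python
def type_ahead(concepts, typed, limit=10):
    """Return (label, uri) pairs whose label starts with the typed text.

    concepts maps a concept URI to its preferred label. Matching ignores case
    and surrounding whitespace, and results are sorted by label.
    """
    prefix = typed.strip().lower()
    if not prefix:
        # Nothing typed yet, so do not suggest the whole vocabulary.
        return []
    hits = [
        (label, uri)
        for uri, label in concepts.items()
        if label.lower().startswith(prefix)
    ]
    return sorted(hits)[:limit]
```

The metadata form then only stores the URI of the selected concept, so the categorization can grow or change without the form template being edited.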
Discussions with terminology experts indicate that a process for creating new concepts where documenting them is the last step would be much appreciated. This is tightly coupled with information about the status or maturity of concepts, which we see in practice in many systems. There is also a need for integration with other knowledge organization services like Wikidata. Such an integration could mean providing search functionality for referencing concepts or other entities in such external services.
EntryScape Workbench is different from the other modules in the sense that it does not come with support for a specific Application Profile out of the box. Instead, its purpose is to allow each organization to define a set of entity types and corresponding RDForms templates to meet their specific needs. All entities are organized in lists based on their entity types where they can be created, imported and interconnected if supported by the corresponding RDForms templates.
We believe that the list-oriented view is a fine baseline, but it needs to be complemented with other views. One good example is the tree view we already have in EntryScape Terms. To be useful in the Workbench, however, the broader and narrower relations need to be replaced with other properties that are defined per entity type. We also see a need for sorting entities orthogonally to entity types; options include tagging entities and using collections. We may end up supporting both. It is also clear that there is a need for a more flexible definition of entity types, as it is currently based solely on class membership. For example, keeping countries and continents separate, both being imported from GeoNames and typed with the geo:Feature class, requires using specific predicate-object pairs as separating constraints.
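Defining an entity type by a set of predicate-object constraints, rather than by class membership alone, can be sketched as follows. This is an illustration, not the EntryScape configuration format: the `ENTITY_TYPES` structure and the function `entity_types_of` are hypothetical, and the GeoNames namespace and the feature codes used to distinguish countries (A.PCLI) from continents (L.CONT) are assumptions of this sketch.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
GN = "http://www.geonames.org/ontology#"  # assumed GeoNames ontology namespace

# Each entity type is a set of (predicate, object) pairs that must all be
# present. Both types share the class constraint and differ only in the
# (assumed) feature code.
ENTITY_TYPES = {
    "country": {
        (RDF_TYPE, GN + "Feature"),
        (GN + "featureCode", GN + "A.PCLI"),
    },
    "continent": {
        (RDF_TYPE, GN + "Feature"),
        (GN + "featureCode", GN + "L.CONT"),
    },
}


def entity_types_of(triples, subject):
    """Return the names of all entity types whose constraints the subject meets.

    triples is an iterable of (subject, predicate, object) tuples.
    """
    predicate_objects = {(p, o) for s, p, o in triples if s == subject}
    return sorted(
        name
        for name, constraints in ENTITY_TYPES.items()
        if constraints <= predicate_objects
    )
```

With class membership alone, both a country and a continent would match the same type; the additional pair is what separates them.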
Our experience indicates that neither DCAT nor DCAT-AP, although good recommendations, is specific enough when implementing applications. Many additional decisions have to be made; in EntryScape we make those decisions in a configuration step when we write our RDForms templates. Here are a few examples:
In addition, there are recommendations that are clear but hard to realize without compromising high standards of usability, for example:
As a side note, DCAT-AP strongly suggests that vocabularies should use resolvable identifiers which is not true for any of the vocabularies provided by the Publications Office. In addition, it is recommended to use vocabularies that contain a relatively small number of terms, i.e. 10-25. This is not the case either, e.g. the Location vocabulary contains 2853 terms and the mediatypes vocabulary contains 119 terms.
First, we are happy to see that it is doable to build solutions directly on top of native Linked Data platforms using established schemas and vocabularies. EntryScape is such a flexible platform and we are looking forward to applying it within other knowledge intensive domains where there are strong mature linked data based schemas and vocabularies, e.g. cultural heritage, health, building sector etc.
We believe that at least four somewhat generic conclusions can be drawn from the discussion above, around the use of DCAT-AP and related to the provided examples:
Finally, it is worth observing that writing specifications and recommendations like DCAT-AP as thick documents with long tables is a practice that seems to have come to stay. Despite a good amount of expertise from the authors, there are errors (although quite few) and missing information that implementers have to deal with (or ignore while waiting for guidelines or the next version). There have been initiatives, like the Dublin Core-based Description Set Profiles, that aim to provide a machine-readable expression where missing information and bugs could be detected much sooner, at least in theory. However, we have a strong suspicion that this will not help. At least, it will not solve the problem of missing information, simply because the missing information in many cases reflects a lack of agreement, or even a lack of time to come to a conclusion on all issues. Maybe this is the cost of driving standardization based on consensus rather than by committee behind closed doors. If so, it is probably worth it.