feat: LDES for all public DCAT data#104
Open
mirdono wants to merge 9 commits into
This service will be used to add data to the LDES feeds. This commit only adds the service, without further configuration, so it uses its default configuration, which does nothing for this app. Configuration relevant to the data handled by this app will be added later.
Configuration for the `ldes-delta-pusher` service to initialise a public feed with all DCAT-related resources. This `initialization` configuration will also be re-used by the dispatch and healing functionality.
Process incoming delta messages by:
- extracting any inserts for the public graph
- extracting subjects for relevant resource types
- retrieving all triples for each subject
- inserting these triples into the stream
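For illustration, the first two steps could be sketched as a pure function over the delta message. This is a minimal sketch, not the code in this PR: the `Term`, `Quad` and `Changeset` shapes are simplified assumptions about the delta format, `relevantSubjects` is a hypothetical helper name, and it assumes the `rdf:type` triple of a resource arrives in the same delta message.

```typescript
// Simplified, assumed shapes of a delta message; the real format may carry more fields.
interface Term { type: string; value: string; }
interface Quad { subject: Term; predicate: Term; object: Term; graph: Term; }
interface Changeset { inserts: Quad[]; deletes: Quad[]; }

const RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

// Hypothetical helper: collect the subjects for which an rdf:type insert
// names one of the relevant resource types. When `graph` is given, only
// inserts in that graph are considered; when omitted, all graphs count.
function relevantSubjects(
  changesets: Changeset[],
  relevantTypes: Set<string>,
  graph?: string,
): string[] {
  const subjects = new Set<string>();
  for (const changeset of changesets) {
    for (const quad of changeset.inserts) {
      if (graph !== undefined && quad.graph.value !== graph) continue;
      if (quad.predicate.value === RDF_TYPE && relevantTypes.has(quad.object.value)) {
        subjects.add(quad.subject.value);
      }
    }
  }
  // Deduplicated, in order of first appearance.
  return Array.from(subjects);
}
```

Calling it with `"http://mu.semte.ch/graphs/public"` as `graph` drops every insert from other graphs, while omitting the argument leaves the selection entirely to the type check.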
The `dispatch` function is also reused in the healing flow. There, changeset inserts are faked with `http://mu.semte.ch/graphs/application` as graph[1]. Filtering inserts based on the public graph therefore breaks the healing functionality.
Alternatively, one could filter inserts for the `public` AND `application` graphs. But this might lead to similar subtle bugs when changing the configuration in the future. It seems better to rely fully on the filters in `initialization` to ensure only correct data is put on the LDES.
[1]: https://github.com/redpencilio/ldes-delta-pusher-service/blob/f0b272e4cda9bbcef69547d7073ff6b8710e4701/self-healing/heal-ldes-data.ts#L19-L55
Add some basic healing functionality. We rely on the `dcterms:modified` predicate to determine whether a resource has changed. This has two disadvantages:
- When changing an existing resource we need to make sure the appropriate value is set for `dcterms:modified`. This could be automated using the `track-modified-service`, but considering DCAT data should not change often we opt to avoid that additional complexity.
- For DCAT the absence of `dcterms:modified` has a specific meaning. If the upstream source for a DCAT resource chooses not to set a timestamp, e.g. to support continuous updates, our healing will fail to pick up any changes unless we add a timestamp ourselves.
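To make the second disadvantage concrete, here is a simplified model of healing on a single predicate. It is a sketch under stated assumptions, not the service's implementation: `ModifiedIndex` and `resourcesToHeal` are hypothetical names, and timestamps are compared as plain strings.

```typescript
// Assumed model: resource URI -> its dcterms:modified value,
// or undefined when the resource carries no such triple.
type ModifiedIndex = Map<string, string | undefined>;

// Hypothetical helper: list the resources whose dcterms:modified in the
// triplestore differs from (or is missing in) what the feed last published.
function resourcesToHeal(store: ModifiedIndex, feed: ModifiedIndex): string[] {
  const stale: string[] = [];
  store.forEach((modified, uri) => {
    // Without a timestamp there is nothing to compare, so changes to
    // such a resource are never detected by this approach.
    if (modified === undefined) return;
    if (feed.get(uri) !== modified) stale.push(uri);
  });
  return stale;
}
```

In this model a resource without `dcterms:modified` is skipped no matter how its other triples change, which is the blind spot described above.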
This PR adds a single LDES to the app that publishes all DCAT data in the app's public graph. Additional partner-specific streams will be added in follow-up PRs.
Proposed approach
Add the ldes-delta-pusher and ldes-serve-feed services to the stack.
The initialization file configures all RDF resource types that should be added to the LDES: all DCAT resource types as well as related ones such as `foaf:Agent` and `skos:Concept`. Using `graphFilters` we explicitly limit to data in the `http://mu.semte.ch/graphs/public` graph, as this is the only one currently containing DCAT data. Furthermore, associated data resources are only considered relevant if they are (indirectly) linked to a DCAT resource. For example, a `foaf:Agent` resource is only considered interesting if it is used as `dcterms:publisher` or `dcat:contactPoint` for a catalog or dataset resource.
The dispatch file handles processing incoming delta messages. For this it relies on the configuration in the initialization file. For each received delta message it essentially does the following:
- [...] `moveTriples` function for further processing.
The healing file converts the contents of `initialization` to a format appropriate for the healing functionality of the service. I use `dcterms:modified` as healing predicate. This means we need to make sure to update (or add) this triple whenever changing a published resource. This could be automated using the track-modified service, but I don't expect the published resources to change (often), so I opted to avoid this additional complexity. Note that `dcterms:modified` MAY be absent for DCAT resources; this means our healing will not work for DCAT resources where it is deliberately not added.
How to test
Initialising the LDES
- [...] `ldes-feed` folder in `data`
- [...] `docker-compose.override.yml`. The `WRITE_INITIAL_STATE` tells the delta pusher to create its initial feed upon starting. Setting the `LDES_BASE` and `BASE_URL` makes it easier to browse the feed locally, so that links to, for example, other pages take the form `http://localhost/ldes/public/3`.
- [...] `dispatcher` and `deltanotifier` services such that they load their updated configurations.
- [...] `ldes-delta-pusher` and `ldes-serve-feed`. The `ldes-delta-pusher` should automatically initialise itself; the logs will contain something like `ldes-delta-pusher-1 | 2026-05-21T14:32:41.413526565Z done writing initial state` when it has finished.
- [...] `data/ldes-feed/public/` folder. The initial data should correspond to that inserted in this migration.
Adding new resources to the LDES
To test whether the `ldes-delta-pusher` properly reacts to new data, you need to insert appropriate data via the `database` so that delta messages are forwarded. This can be simplified using a fake service that inserts data. To this end, add something like the following to your `docker-compose.override.yml`:
Create the `config/fake-data` folder and insert the following files:
`package.json`
`{ "dependencies": { "@lblod/mu-auth-sudo": "^1.1.0" } }`
`app.js`
Up the `fake-data` service. You can call the different endpoints using `docker compose exec fake-data curl http://localhost/ENDPOINT`. The comments for each endpoint in `app.js` briefly explain what behaviour is expected. For example, the `insert` endpoint inserts a bunch of DCAT data, all of which should appear in the LDES. Note that it might take a few moments before the data shows up; keep an eye on the `ldes-delta-pusher` logs for its progress.
Healing
To test the healing functionality you have to insert/update data directly in the triplestore, for example using a migration or by querying it directly. For instance, the following query updates the title of the Catalogue resource.
After executing the query, manually trigger the healing: `docker compose exec ldes-delta-pusher curl -X POST http://localhost/manual-healing`. Afterwards you should be able to find a new version of the catalogue resource with the updated title.
TODO
- [...] `dcterms:modified` triples. Furthermore, the associated resources, such as `foaf:Agent` and `skos:Concept`, have no suitable triples. While the former can be fixed in the data itself, we need to be careful not to change the meaning of the `dcterms:modified` triples. Possibly we need to add the track-modified service to support proper healing.
Notes
Related tickets