A component to ingest metadata from remote sources as documented at https://soilwise-he.github.io/SoilWise-documentation/technical_components/ingestion/.
Harvesting tasks are best triggered from a task runner, such as a CI/CD pipeline. Configuration scripts for running various harvesting tasks in a GitLab CI/CD environment are available in CI. Tasks are configured using environment variables. The results of the harvester are ingested into PostgreSQL storage, where follow-up processes pick them up.
flowchart LR
hc[engine] -->|harvests| db[(temporary storage)]
mh[harmonisation] <-->|harmonize| db[(storage)]
ma[augmentation] <-->|enrich| db
db -->|triplify| TS[(Triple store)]
db -->|query| py[(pycsw)]
db -->|indexing| SOLR[(Solr)]
SOLR <-->|query|CT[Catalogue]
This component is closely related to the md-harmonization and md-augmentation components. Harvested records are stored in a PostgreSQL database.
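The hand-off between the harvester and the follow-up components can be illustrated with a small sketch. It is not the project's actual schema or code: the table name `harvest_items`, its columns, and the `store_record` helper are hypothetical, and SQLite from the standard library stands in for PostgreSQL so the example runs without a database server. The idea shown is that a record is keyed by its source and a hash of its content, so re-running a harvest does not create duplicates.

```python
import hashlib
import sqlite3


def create_table(conn: sqlite3.Connection) -> None:
    # Hypothetical table layout, used for illustration only.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS harvest_items (
            source     TEXT NOT NULL,
            identifier TEXT NOT NULL,
            hash       TEXT NOT NULL,
            raw        TEXT NOT NULL,
            PRIMARY KEY (source, hash)
        )
        """
    )


def store_record(conn: sqlite3.Connection, source: str, identifier: str, raw: str) -> bool:
    """Store one harvested record; return True only if it was newly inserted."""
    digest = hashlib.sha256(raw.encode("utf-8")).hexdigest()
    cur = conn.execute(
        "INSERT OR IGNORE INTO harvest_items (source, identifier, hash, raw) "
        "VALUES (?, ?, ?, ?)",
        (source, identifier, digest, raw),
    )
    conn.commit()
    # rowcount is 0 when the row already existed and the insert was ignored.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
create_table(conn)
first = store_record(conn, "example-source", "rec-1", "<record>soil data</record>")
second = store_record(conn, "example-source", "rec-1", "<record>soil data</record>")
```

In this sketch the second call stores nothing, because the same content from the same source is already present. A follow-up process would then select rows from the table and write its own results elsewhere.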
- Ingests metadata from various source types (CSW, DataCite, tailored)
- Can run as a containerised workflow
- Stores metadata in a PostgreSQL database
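As an illustration of what harvesting a CSW source involves, the sketch below builds paged CSW 2.0.2 `GetRecords` requests using only the standard library. It does not reproduce the code in `csw/metadata.py`. The endpoint `https://example.org/csw` and the helper names `csw_getrecords_url` and `page_starts` are assumptions made for the example; the query parameters are the standard CSW key-value parameters.

```python
from urllib.parse import urlencode


def csw_getrecords_url(endpoint: str, start: int = 1, page_size: int = 50) -> str:
    """Build a CSW 2.0.2 GetRecords URL for one page of results."""
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "elementSetName": "full",
        "resultType": "results",
        "startPosition": start,   # CSW positions are 1-based
        "maxRecords": page_size,
    }
    return f"{endpoint}?{urlencode(params)}"


def page_starts(total: int, page_size: int = 50) -> list[int]:
    """Return the startPosition of every page needed to cover `total` records."""
    return list(range(1, total + 1, page_size))


# Hypothetical endpoint; a real harvester would read it from its configuration.
urls = [csw_getrecords_url("https://example.org/csw", start=s) for s in page_starts(120)]
```

For 120 matching records and a page size of 50 this yields three requests, starting at positions 1, 51 and 101. A harvester would fetch each URL, parse the returned records, and store them.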
Python >3.10 required.
Set up the SoilWise PostgreSQL database following the instructions at db-migrate.
Connection details are configured through environment variables, for example as a .env file.
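A minimal sketch of reading connection details from environment variables follows. The variable names (`POSTGRES_HOST`, `POSTGRES_PORT`, `POSTGRES_DB`, `POSTGRES_USER`, `POSTGRES_PASSWORD`) and the `DbSettings` and `load_db_settings` names are assumptions for illustration; check the repository for the names the harvesters actually expect.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DbSettings:
    host: str
    port: int
    dbname: str
    user: str
    password: str


def load_db_settings(env: Optional[Mapping[str, str]] = None) -> DbSettings:
    """Read database settings from the environment (variable names are hypothetical)."""
    env = os.environ if env is None else env
    required = ("POSTGRES_DB", "POSTGRES_USER", "POSTGRES_PASSWORD")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return DbSettings(
        host=env.get("POSTGRES_HOST", "localhost"),
        port=int(env.get("POSTGRES_PORT", "5432")),
        dbname=env["POSTGRES_DB"],
        user=env["POSTGRES_USER"],
        password=env["POSTGRES_PASSWORD"],
    )


# Example with an explicit mapping instead of the real environment.
settings = load_db_settings(
    {"POSTGRES_DB": "soilwise", "POSTGRES_USER": "harvester", "POSTGRES_PASSWORD": "secret"}
)
```

Failing early on missing variables gives a clear error in a CI/CD job log instead of a connection error later in the run.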
git clone https://github.com/soilwise-he/harvesters
cd harvesters
pip install -r test/requirements.txt
Run the unit tests with pytest (from the root folder):
pytest test
From a Python-enabled shell, run:
python csw/metadata.py
Run the script in a Docker container. Create a .env file with the harvester details.
docker build -t soilwise/harvesters .
docker run --env-file csw/.env soilwise/harvesters python csw/metadata.py
The following harvesters are configured:
Generic repositories
- INSPIRE
- Bonares repository
- data.europa.eu
- OpenAIRE
- EEA (including Copernicus)
Some project-specific repositories (while they are running)
Alternate harvesters
- Projects are harvested from ESDAC as well as the Soil Mission platform
- The newsfeeds harvester imports news feeds from Soil Mission websites
This work has been initiated as part of the Soilwise-he project. The project receives funding from the European Union’s HORIZON Innovation Actions 2022 under grant agreement No. 101112838. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or Research Executive Agency. Neither the European Union nor the granting authority can be held responsible for them.