@ullingerc ullingerc commented Oct 31, 2025

Included in this PR

  • infrastructure: MaterializedView, MaterializedViewWriter, MaterializedViewsManager
  • magic service query for reading any number of columns + magic predicate for reading a single column
  • IndexScan support for MaterializedView
  • support for reading from a materialized view in parser and query planner
  • writing views from SPARQL queries via an HTTP request and via libqlever
  • very large views can be written thanks to external sorting; sorting is only done if necessary (not if the query result is already sorted correctly)
  • explicitly loading views via an HTTP request and via libqlever
  • code documentation
  • unit tests for direct usage, usage through Libqlever, usage through HTTP and logging
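
The "sort only if necessary" idea from the list above can be illustrated with a minimal in-memory sketch. This is not the PR's implementation (which uses external sorting for very large views); the `Row` type and the function name `sortIfNecessary` are assumptions made purely for illustration.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <vector>

// Hypothetical row type: four ID columns, compared lexicographically.
using Row = std::array<std::uint64_t, 4>;

// Sorts `rows` only if they are not already in order.
// Returns true if a sort was actually performed.
bool sortIfNecessary(std::vector<Row>& rows) {
  if (std::is_sorted(rows.begin(), rows.end())) {
    return false;  // The query result already has the required order.
  }
  std::sort(rows.begin(), rows.end());
  return true;
}
```

The linear `std::is_sorted` check is cheap compared to a full sort, so a query result that is already in the required order skips the sort entirely.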

Note: this PR is merged with and depends on helper PR #2519

Shortcomings to be addressed in future PRs

  • The query to build a view currently needs to select a minimum of four columns and must not require a local vocab
  • Materialized Views are not detected automatically in query planning to accelerate conventional queries but rather need to be requested manually as shown below
  • MaterializedViewWriter currently computes and immediately discards an unused SOP permutation
  • Currently views can be loaded one-by-one or are loaded automatically upon first use. In the future the configuration should allow loading a list or all at server start.
  • Updates after creation of a view are ignored
  • Reading from a materialized view always reads the first two payload columns, even if they are not requested (because of the hard-coded triple semantics of IndexScan); they are discarded automatically via column stripping

Usage examples (Reading + Writing, Libqlever + HTTP)

Write a materialized view with libqlever:

qlever::EngineConfig config;
config.baseName_ = "my-dataset";
qlever::Qlever qlv{config};
qlv.writeMaterializedView("nameOfTheView", "SELECT ... { ... }");

Write a materialized view with an HTTP request:

curl "http://localhost:7930/?cmd=write-materialized-view&view-name=nameOfTheView&timeout=24h&access-token=$ACCESS_TOKEN" \
  -H "Accept: application/json" \
  -H "Content-type: application/sparql-query" \
  --data "SELECT * WHERE {... }"
# Returns: {"materialized-view-written":"nameOfTheView"}

Load a view explicitly with libqlever:

qlv.loadMaterializedView("nameOfTheView");

Load a view explicitly with an HTTP request (otherwise it is loaded automatically when the first query is issued on the view):

curl "http://localhost:7930/?cmd=load-materialized-view&view-name=nameOfTheView&access-token=$ACCESS_TOKEN"
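
For clients that assemble these request URLs programmatically, a small sketch may help. The helper names `percentEncode` and `viewCommandUrl` are hypothetical and not part of libqlever; only the query parameters `cmd`, `view-name`, and `access-token` are taken from the curl examples above.

```cpp
#include <cctype>
#include <string>

// Percent-encodes everything except the RFC 3986 unreserved characters.
std::string percentEncode(const std::string& value) {
  static const char* hex = "0123456789ABCDEF";
  std::string out;
  for (unsigned char c : value) {
    if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
      out += static_cast<char>(c);
    } else {
      out += '%';
      out += hex[c >> 4];
      out += hex[c & 0x0F];
    }
  }
  return out;
}

// Hypothetical helper: builds the URL for a materialized-view command such
// as "write-materialized-view" or "load-materialized-view".
std::string viewCommandUrl(const std::string& baseUrl, const std::string& cmd,
                           const std::string& viewName,
                           const std::string& accessToken) {
  return baseUrl + "/?cmd=" + percentEncode(cmd) +
         "&view-name=" + percentEncode(viewName) +
         "&access-token=" + percentEncode(accessToken);
}
```

Encoding the view name and token avoids malformed requests if either contains characters such as `&` or spaces.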

Different ways to read from a materialized view:

PREFIX osmkey: <https://www.openstreetmap.org/wiki/Key:>
PREFIX osmway: <https://www.openstreetmap.org/way/>
PREFIX view: <https://qlever.cs.uni-freiburg.de/materializedView/>
SELECT * WHERE {
  # Method 1: scan with fixed subject
  osmway:6593464 view:geom:geometry ?geo .
  
  # Method 2: scan without fixed subject
  ?x view:geom:geometry ?geo .
  ?x osmkey:waterway [] .
  
  # Method 3: service with control over which payload columns are read
  SERVICE view: {
    _:config view:name "geom" ;
             view:scan-column osmway:6593464 ; # could also be a variable here
             view:payload-geometry ?geo ;
             view:payload-centroid ?centroid .
  }
}
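
Method 3 follows a regular pattern, so the clause can be generated. The following is a sketch only: the function `buildViewServiceClause` is a hypothetical helper, it performs no validation or escaping, and it assumes the `view:` prefix is declared as in the query above.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: builds a `SERVICE view: { ... }` clause in the syntax
// shown above. Each entry of `payloads` pairs a payload column name with the
// variable it should be bound to. No validation or escaping is performed.
std::string buildViewServiceClause(
    const std::string& viewName, const std::string& scanColumn,
    const std::vector<std::pair<std::string, std::string>>& payloads) {
  std::string out = "SERVICE view: {\n";
  out += "  _:config view:name \"" + viewName + "\" ;\n";
  out += "           view:scan-column " + scanColumn;
  for (const auto& [column, variable] : payloads) {
    out += " ;\n           view:payload-" + column + " " + variable;
  }
  out += " .\n}";
  return out;
}
```

Calling it with the view name `geom`, the scan column `osmway:6593464`, and the payload pairs `geometry`/`?geo` and `centroid`/`?centroid` should reproduce the clause from the example query; only the listed payload columns are requested.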

Performance tests for geometries and geometry info

All experiments were run with a warm OS disk cache (run the query once, clear the cache, run the query again, and report the time of the run after the cache clear). The indices use the on-disk-compressed-geo-split vocabulary type. The materialized view contains geo:hasGeometry/geo:asWKT, geof:centroid, geof:metricLength, and geof:metricArea.

Query to write the view:

curl "http://localhost:7907/?cmd=write-materialized-view&view-name=geom&timeout=48h&access-token=$ACCESS_TOKEN" -H "Accept: application/json" -H "Content-type: application/sparql-query" --data "PREFIX geo: <http://www.opengis.net/ont/geosparql#> PREFIX geof: <http://www.opengis.net/def/function/geosparql/> SELECT * WHERE { ?osm geo:hasGeometry ?interm . ?interm geo:asWKT ?geometry . BIND(geof:centroid(?geometry) AS ?centroid) BIND(geof:metricLength(?geometry) AS ?length) BIND(geof:metricArea(?geometry) AS ?area) }"

Times below are given as: time without using the view -> time using the view

OSM Switzerland

View stats: 58,870,803 rows, time for writing 32 sec, size 941 MiB

  • Join geo:hasGeometry/geo:asWKT with ?osm_id osmkey:highway []: 600 ms -> 270 ms
  • Join geo:hasGeometry/geo:asWKT with ?osm_id osmkey:building []: 775 ms -> 290 ms
  • Join geo:hasGeometry/geo:asWKT with ?osm_id osmkey:amenity "restaurant": 250 ms -> 150 ms
  • Fixed subject osmrel:1690227 geo:hasGeometry/geo:asWKT (Zurich): 4 ms -> 2 ms
  • geo:hasGeometry/geo:asWKT, geof:centroid, and geof:metricArea of all osmkey:building []: 3780 ms -> 350 ms (Note that the index has on-disk-compressed-geo-split vocabulary type - the results are thus precomputed in both cases, just the method of retrieval is different)
  • Prefiltering test: ?osm_id osmkey:name "Zürich" . ?osm_id geo:hasGeometry/geo:asWKT ?geometry, both 13 ms, prefiltered Index Scan reads 8 of 1256 blocks of view

OSM Planet

View stats: 11,358,592,564 rows, time for writing 12 min 52 sec, size 183 GiB

  • Join geo:hasGeometry/geo:asWKT with ?osm_id osmkey:highway [] with LIMIT 1000: 26912 ms -> 1768 ms
  • Prefiltering test: ?osm_id osmkey:name "Zürich" . ?osm_id geo:hasGeometry/geo:asWKT ?geometry: 271 ms -> 66 ms
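
To make the measurements above easier to compare, the speedup factor and the average size per row can be derived from the reported numbers. This is a small arithmetic sketch; the function names are illustrative and not part of QLever.

```cpp
#include <cstdint>
#include <stdexcept>

// Speedup factor for a "time without view -> time with view" pair.
double speedup(double msWithoutView, double msWithView) {
  if (msWithView <= 0.0) {
    throw std::invalid_argument("time with view must be positive");
  }
  return msWithoutView / msWithView;
}

// Average on-disk bytes per row of a view, given its size in MiB.
double bytesPerRow(double sizeMiB, std::uint64_t numRows) {
  if (numRows == 0) {
    throw std::invalid_argument("view must contain at least one row");
  }
  return sizeMiB * 1024.0 * 1024.0 / static_cast<double>(numRows);
}
```

For example, the OSM Switzerland building query (3780 ms -> 350 ms) corresponds to a speedup of 10.8, the OSM Planet highway join (26912 ms -> 1768 ms) to roughly 15.2, and the Switzerland view (941 MiB for 58,870,803 rows) to roughly 16.8 bytes per row.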

Performance tests for large star (table-like structure)

As above, experiments were run with a warm cache. Queries were run with ?send=0 to avoid measuring export time.

DBLP

Wrote view with:

curl "http://localhost:7915/?cmd=write-materialized-view&view-name=articles&access-token=$ACCESS_TOKEN" -H "Accept: application/sparql-results+json" -H "Content-type: application/sparql-query" --data "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX dblp: <https://dblp.org/rdf/schema#> PREFIX datacite: <http://purl.org/spar/datacite/> SELECT * WHERE { ?s rdf:type dblp:Article ; dblp:numberOfCreators ?numberOfCreators ; dblp:yearOfPublication ?yearOfPublication ; dblp:title ?title ; dblp:bibtexType ?bibtexType }"

View stats: 3,039,856 rows, time for writing < 1 sec, size 17 MiB

Query without view: time to compute 397 ms, total time 444 ms

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dblp: <https://dblp.org/rdf/schema#>
PREFIX datacite: <http://purl.org/spar/datacite/>
SELECT * WHERE {
  ?s rdf:type dblp:Article ;
     dblp:numberOfCreators ?numberOfCreators ;
     dblp:yearOfPublication ?yearOfPublication ;
     dblp:title ?title ;
     dblp:bibtexType ?bibtexType
}

Query with view: time to compute 3 ms, total time 75 ms

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dblp: <https://dblp.org/rdf/schema#>
PREFIX datacite: <http://purl.org/spar/datacite/>
PREFIX view: <https://qlever.cs.uni-freiburg.de/materializedView/>
PREFIX p: <https://qlever.cs.uni-freiburg.de/materializedView/payload->
SELECT * WHERE {
  SERVICE view: {
    _:x view:name "articles" ;
        view:scan-column ?s ;
        p:numberOfCreators ?numberOfCreators ;
        p:yearOfPublication ?yearOfPublication ;
        p:title ?title ;
        p:bibtexType ?bibtexType .
  }
}

@ullingerc ullingerc changed the title from "Experiments in preparation for Materialized Views" to "Add support for materialized views" on Nov 4, 2025
hannahbast pushed a commit that referenced this pull request Nov 24, 2025
#2519)

Previously, the `IndexScan` constructor took a `Permutation::Enum` and retrieved the respective `Permutation` and `LocatedTriplesSnapshot` from its `QueryExecutionContext`. However, this was limited to the six standard permutations PSO, POS, SPO, SOP, OSP, and OPS. With this change, the `IndexScan` constructor takes a `PermutationPtr` and a `LocatedTriplesSnapshotPtr`, which are shared pointers to the "real thing". These can then also be used for custom permutations that are not part of the regular index, in particular those from the materialized views in #2482.
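
The design change described in this commit message can be sketched with simplified stand-in types. None of the definitions below are QLever's real classes; `Permutation` is reduced to a name, the snapshot pointer is omitted, and `IndexScanSketch` is a hypothetical name chosen to avoid confusion with the actual `IndexScan`.

```cpp
#include <memory>
#include <string>
#include <utility>

// Simplified stand-in, not QLever's real `Permutation` class.
struct Permutation {
  std::string name;
};
using PermutationPtr = std::shared_ptr<const Permutation>;

// Instead of looking up one of six standard permutations by enum value, the
// scan holds a shared pointer, so any permutation (standard or custom, e.g.
// one belonging to a materialized view) can be passed in.
class IndexScanSketch {
 public:
  explicit IndexScanSketch(PermutationPtr permutation)
      : permutation_{std::move(permutation)} {}

  const std::string& permutationName() const { return permutation_->name; }

 private:
  PermutationPtr permutation_;
};
```

Because the scan shares ownership of the permutation, a view's permutation stays alive for as long as any scan still references it.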

NOTE: The "normal" and "internal" variant of the PSO or POS permutation still use the same `PermutationPtr`. The distinction is made via `std::unique_ptr<Permutation> internalPermutation_` in the `Permutation` class, which is rather hacky. Instead, the distinction should be made in the query planner, so that the `PermutationPtr` object is already the correct permutation (normal or internal). That is work for a separate PR.
hannahbast pushed a commit that referenced this pull request Dec 5, 2025
Refactor the (rather lengthy and complex) `CompressedRelationWriter::createPermutationPair` method into its own `struct`, where complex lambdas become proper member functions and the core loop that iterates over all blocks of triples is now much shorter and easier to read. This is the foundation for further refactoring, in particular for writing only a single permutation as opposed to a pair of permutations. Preparation for #2537 and thus also #2482
@sparql-conformance

Overview

  • Number of Tests: 525
  • Passed ✅: 379
  • Intended ✅: 67
  • Failed ❌: 79
  • Not tested: 0

Conformance check passed ✅

No test result changes.

Details: https://qlever.dev/sparql-conformance-ui?cur=72e40be0de55f9e42ab86b2658216217f17f4235&prev=84156ed50f4f55b0fbfced0b9fbb167f1508a18f

sonarqubecloud bot commented Dec 9, 2025
