Add support for materialized views #2482

ullingerc · 2025-10-31T11:37:26Z

Included in this PR

- Infrastructure: `MaterializedView`, `MaterializedViewWriter`, `MaterializedViewsManager`
- A magic `SERVICE` query for reading any number of columns, plus a magic predicate for reading a single column
- `IndexScan` support for `MaterializedView`
- Support for reading from a materialized view in the parser and query planner
- Writing views from SPARQL queries via an HTTP request and via libqlever
- Very large views can be written thanks to external sorting; sorting is only performed when necessary (not if the query result is already sorted correctly)
- Explicitly loading views via an HTTP request and via libqlever
- Code documentation
- Unit tests for direct usage, usage through libqlever, usage through HTTP, and logging
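The writer only sorts the query result if it is not already in the required order. The following C++ sketch illustrates that check on an in-memory table. It assumes the required order is lexicographic over all columns, and `Row` and `sortIfNecessary` are hypothetical names. The actual writer uses external sorting so that results larger than RAM can be handled, which this sketch does not attempt.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical in-memory stand-in for a query result: each row is a vector of
// IDs, with the view's first ("scan") column at index 0.
using Row = std::vector<std::uint64_t>;

// Returns true if the rows had to be sorted, false if they were already in the
// required order and could be written as-is.
bool sortIfNecessary(std::vector<Row>& rows) {
  // std::vector compares lexicographically: column 0 first, then column 1, ...
  if (std::is_sorted(rows.begin(), rows.end())) {
    return false;  // Already sorted: skip the (potentially expensive) sort.
  }
  // A real writer would use external sorting here; this sketch sorts in RAM.
  std::sort(rows.begin(), rows.end());
  return true;
}
```

Checking sortedness is a single linear pass, which is cheap compared to sorting billions of rows, so it pays off whenever the query result already comes out of the engine in the right order.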

Note: this PR is merged with and depends on helper PR #2519

Shortcomings to be addressed in future PRs

- The query used to build a view currently must select at least four columns and must not require a local vocabulary.
- Materialized views are not yet detected automatically by the query planner to accelerate conventional queries; they must be requested explicitly, as shown below.
- `MaterializedViewWriter` currently computes and immediately discards an unused SOP permutation.
- Currently, views are either loaded one by one explicitly or loaded automatically on first use. In the future, the configuration should allow loading a list of views, or all views, at server start.
- Updates made after a view has been created are ignored.
- Reading from a materialized view always reads the first two payload columns, even if they are not requested (because of the hard-coded triple semantics of `IndexScan`); unrequested columns are discarded automatically via column stripping.
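Column stripping, as mentioned in the last point, can be pictured as projecting each scanned row onto the requested columns. This is a minimal sketch assuming a simple row-of-IDs representation; `stripColumns` is a hypothetical helper, not QLever's API. In the usage below, the scan produces the scan column plus three payload columns, but only the scan column and the third payload column were requested.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Row = std::vector<std::uint64_t>;

// Keeps only the columns listed in `keep` (in that order) and drops the rest,
// e.g. payload columns that were read by the scan but not requested.
std::vector<Row> stripColumns(const std::vector<Row>& rows,
                              const std::vector<std::size_t>& keep) {
  std::vector<Row> result;
  result.reserve(rows.size());
  for (const Row& row : rows) {
    Row stripped;
    stripped.reserve(keep.size());
    for (std::size_t col : keep) {
      assert(col < row.size());  // Requested column must exist in the scan.
      stripped.push_back(row[col]);
    }
    result.push_back(std::move(stripped));
  }
  return result;
}
```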

Usage examples (Reading + Writing, Libqlever + HTTP)

Write a materialized view with libqlever:

```cpp
qlever::EngineConfig config;
config.baseName_ = "my-dataset";
qlever::Qlever qlv{config};
qlv.writeMaterializedView("nameOfTheView", "SELECT ... { ... }");
```

Write a materialized view with an HTTP request:

```shell
curl "http://localhost:7930/?cmd=write-materialized-view&view-name=nameOfTheView&timeout=24h&access-token=$ACCESS_TOKEN" \
  -H "Accept: application/json" \
  -H "Content-type: application/sparql-query" \
  --data "SELECT * WHERE { ... }"
# Returns: {"materialized-view-written":"nameOfTheView"}
```
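The HTTP commands pass the view name and access token as URL query parameters, while the SPARQL query travels in the request body. If the name or token contains reserved characters (for example `&` or `+`), they should be percent-encoded. Below is a minimal C++ sketch of building such a command URL, assuming the server decodes standard RFC 3986 percent-encoding; `percentEncode` and `buildCommandUrl` are hypothetical helpers, not part of QLever or libqlever, and the parameter names are simply those used in the `curl` examples.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Percent-encodes every byte except RFC 3986 "unreserved" characters.
std::string percentEncode(const std::string& s) {
  std::string out;
  for (unsigned char c : s) {
    bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                      (c >= '0' && c <= '9') || c == '-' || c == '_' ||
                      c == '.' || c == '~';
    if (unreserved) {
      out += static_cast<char>(c);
    } else {
      char buf[4];  // "%XX" plus terminating NUL.
      std::snprintf(buf, sizeof(buf), "%%%02X", static_cast<unsigned>(c));
      out += buf;
    }
  }
  return out;
}

// Joins encoded key=value pairs, e.g. cmd, view-name, access-token.
std::string buildCommandUrl(
    const std::string& baseUrl,
    const std::vector<std::pair<std::string, std::string>>& params) {
  std::string url = baseUrl + "?";
  for (std::size_t i = 0; i < params.size(); ++i) {
    if (i > 0) url += '&';
    url += percentEncode(params[i].first) + "=" +
           percentEncode(params[i].second);
  }
  return url;
}
```

For example, `buildCommandUrl("http://localhost:7930/", {{"cmd", "write-materialized-view"}, {"view-name", "nameOfTheView"}, {"timeout", "24h"}})` yields the URL prefix used in the `curl` call above.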

Load a view explicitly with libqlever:

```cpp
qlv.loadMaterializedView("nameOfTheView");
```

Load a view explicitly with an HTTP request (otherwise it is loaded automatically when the first query is issued on the view):

```shell
curl "http://localhost:7930/?cmd=load-materialized-view&view-name=nameOfTheView&access-token=$ACCESS_TOKEN"
```
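Views can be loaded explicitly or, as described above, automatically on first use. One way to picture this is a thread-safe registry that loads a view the first time it is requested and hands out shared pointers afterwards. `ViewManager` and `LoadedView` below are hypothetical, much-simplified stand-ins, not the actual `MaterializedViewsManager` interface.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical stand-in for a loaded view (e.g. its on-disk permutation).
struct LoadedView {
  std::string name;
};

// Hypothetical registry: each view is loaded at most once, either explicitly
// via load() or lazily on the first call to get().
class ViewManager {
 public:
  void load(const std::string& name) { getOrLoad(name); }

  std::shared_ptr<const LoadedView> get(const std::string& name) {
    return getOrLoad(name);
  }

  int numLoads() const {
    std::lock_guard<std::mutex> lock{mutex_};
    return numLoads_;
  }

 private:
  std::shared_ptr<const LoadedView> getOrLoad(const std::string& name) {
    std::lock_guard<std::mutex> lock{mutex_};
    if (auto it = views_.find(name); it != views_.end()) {
      return it->second;  // Already loaded: reuse it.
    }
    // A real implementation would read the view's files from disk here.
    std::shared_ptr<const LoadedView> view =
        std::make_shared<LoadedView>(LoadedView{name});
    ++numLoads_;
    views_.emplace(name, view);
    return view;
  }

  mutable std::mutex mutex_;
  std::map<std::string, std::shared_ptr<const LoadedView>> views_;
  int numLoads_ = 0;
};
```

Explicit loading simply moves the one-time loading cost out of the first query that uses the view.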

Different ways to read from a materialized view:

```sparql
PREFIX osmkey: <https://www.openstreetmap.org/wiki/Key:>
PREFIX osmway: <https://www.openstreetmap.org/way/>
PREFIX view: <https://qlever.cs.uni-freiburg.de/materializedView/>
SELECT * WHERE {
  # Method 1: scan with fixed subject
  osmway:6593464 view:geom:geometry ?geo .

  # Method 2: scan without fixed subject
  ?x view:geom:geometry ?geo .
  ?x osmkey:waterway [] .

  # Method 3: service with control over which payload columns are read
  SERVICE view: {
    _:config view:name "geom" ;
             view:scan-column osmway:6593464 ; # could also be a variable here
             view:payload-geometry ?geo ;
             view:payload-centroid ?centroid .
  }
}
```
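Methods 1 and 2 encode the view and the column in the predicate (`view:geom:geometry`), while Method 3 names columns with `view:payload-<column>` inside the `SERVICE`. The sketch below shows how such IRIs could be split, assuming the expanded IRI is exactly the `view:` prefix followed by `<view>:<column>` or `payload-<column>`. The function names are hypothetical, and QLever's parser may handle this differently; the configuration predicates `view:name` and `view:scan-column` are not covered here.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

constexpr std::string_view kViewPrefix =
    "https://qlever.cs.uni-freiburg.de/materializedView/";

struct ViewColumnRef {
  std::string viewName;
  std::string columnName;
};

// Splits ".../materializedView/geom:geometry" into view "geom" and column
// "geometry"; returns nullopt for anything else.
std::optional<ViewColumnRef> parseMagicPredicate(std::string_view iri) {
  if (iri.substr(0, kViewPrefix.size()) != kViewPrefix) return std::nullopt;
  std::string_view rest = iri.substr(kViewPrefix.size());
  auto colon = rest.find(':');
  if (colon == std::string_view::npos || colon == 0 ||
      colon + 1 == rest.size()) {
    return std::nullopt;  // Need a non-empty view name and column name.
  }
  return ViewColumnRef{std::string(rest.substr(0, colon)),
                       std::string(rest.substr(colon + 1))};
}

// Extracts "centroid" from ".../materializedView/payload-centroid".
std::optional<std::string> parsePayloadPredicate(std::string_view iri) {
  constexpr std::string_view kPayload = "payload-";
  if (iri.substr(0, kViewPrefix.size()) != kViewPrefix) return std::nullopt;
  std::string_view rest = iri.substr(kViewPrefix.size());
  if (rest.substr(0, kPayload.size()) != kPayload ||
      rest.size() == kPayload.size()) {
    return std::nullopt;
  }
  return std::string(rest.substr(kPayload.size()));
}
```

The DBLP example further down uses the same payload form via the prefix `p:`, which expands to `.../materializedView/payload-`.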

Performance tests for geometries and geometry info

All experiments were run with a warm OS disk cache: run the query once, clear the cache (`clear-cache`), run the query again, and report the time of the second run. The indices use the `on-disk-compressed-geo-split` vocabulary type. The materialized view contains `geo:hasGeometry`/`geo:asWKT`, `geof:centroid`, `geof:metricLength`, and `geof:metricArea`.

Query to write the view:

```shell
curl "http://localhost:7907/?cmd=write-materialized-view&view-name=geom&timeout=48h&access-token=$ACCESS_TOKEN" \
  -H "Accept: application/json" \
  -H "Content-type: application/sparql-query" \
  --data "PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
SELECT * WHERE {
  ?osm geo:hasGeometry ?interm .
  ?interm geo:asWKT ?geometry .
  BIND(geof:centroid(?geometry) AS ?centroid)
  BIND(geof:metricLength(?geometry) AS ?length)
  BIND(geof:metricArea(?geometry) AS ?area)
}"
```

Times below are given as: time without the view -> time with the view

OSM Switzerland

View stats: 58,870,803 rows, time for writing 32 sec, size 941 MiB

- Join `geo:hasGeometry`/`geo:asWKT` with `?osm_id osmkey:highway []`: 600 ms -> 270 ms
- Join `geo:hasGeometry`/`geo:asWKT` with `?osm_id osmkey:building []`: 775 ms -> 290 ms
- Join `geo:hasGeometry`/`geo:asWKT` with `?osm_id osmkey:amenity "restaurant"`: 250 ms -> 150 ms
- Fixed subject `osmrel:1690227 geo:hasGeometry/geo:asWKT` (Zurich): 4 ms -> 2 ms
- `geo:hasGeometry`/`geo:asWKT`, `geof:centroid`, and `geof:metricArea` of all `osmkey:building []`: 3780 ms -> 350 ms (note that the index uses the `on-disk-compressed-geo-split` vocabulary type, so the results are precomputed in both cases; only the method of retrieval differs)
- Prefiltering test: `?osm_id osmkey:name "Zürich" . ?osm_id geo:hasGeometry/geo:asWKT ?geometry`: 13 ms in both cases; the prefiltered index scan reads only 8 of the view's 1256 blocks
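The prefiltering result above (8 of 1256 blocks read) comes from skipping blocks of the view that cannot contain any join partner. A minimal sketch of that idea, assuming each block stores the smallest and largest value of its first column and that the join keys from the other side are sorted and duplicate-free; `BlockMetadata` and `prefilterBlocks` are hypothetical and much simpler than QLever's actual block metadata.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-block metadata: smallest and largest value of the first
// (scan) column in the block. Blocks are sorted by this column.
struct BlockMetadata {
  std::uint64_t firstKey;
  std::uint64_t lastKey;
};

// Returns the indices of all blocks whose key range contains at least one of
// the sorted join keys. Only these blocks need to be read and decompressed.
std::vector<std::size_t> prefilterBlocks(
    const std::vector<BlockMetadata>& blocks,
    const std::vector<std::uint64_t>& sortedKeys) {
  std::vector<std::size_t> result;
  for (std::size_t i = 0; i < blocks.size(); ++i) {
    // Smallest join key that is >= the block's first key.
    auto it = std::lower_bound(sortedKeys.begin(), sortedKeys.end(),
                               blocks[i].firstKey);
    if (it != sortedKeys.end() && *it <= blocks[i].lastKey) {
      result.push_back(i);
    }
  }
  return result;
}
```

With few join keys (here, the OSM objects named "Zürich"), almost all blocks are skipped, which is why reading the view costs only a small fraction of a full scan.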

OSM Planet

View stats: 11,358,592,564 rows, time for writing 12 min 52 sec, size 183 GiB

- Join `geo:hasGeometry`/`geo:asWKT` with `?osm_id osmkey:highway []` with `LIMIT 1000`: 26912 ms -> 1768 ms
- Prefiltering test: `?osm_id osmkey:name "Zürich" . ?osm_id geo:hasGeometry/geo:asWKT ?geometry`: 271 ms -> 66 ms
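The speedups can be read off the numbers above as the ratio of the two times, and the on-disk footprint per row follows from the view stats. A small helper for both, assuming MiB and GiB denote powers of 1024:

```cpp
#include <cassert>
#include <cmath>

// Speedup as reported above: time without the view divided by time with it.
double speedup(double msWithoutView, double msWithView) {
  return msWithoutView / msWithView;
}

// Average on-disk size of a view per row, in bytes.
double bytesPerRow(double sizeInBytes, double numRows) {
  return sizeInBytes / numRows;
}
```

For example, the `LIMIT 1000` join on OSM Planet is about 15x faster (26912 / 1768 ≈ 15.2), and both views need roughly 17 bytes per row (941 MiB / 58,870,803 ≈ 16.8 and 183 GiB / 11,358,592,564 ≈ 17.3).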

Performance tests for large star (table-like structure)

As above, experiments were run with a warm cache. Queries were run with `?send=0` to avoid measuring export time.

DBLP

The view was written with:

```shell
curl "http://localhost:7915/?cmd=write-materialized-view&view-name=articles&access-token=$ACCESS_TOKEN" \
  -H "Accept: application/sparql-results+json" \
  -H "Content-type: application/sparql-query" \
  --data "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dblp: <https://dblp.org/rdf/schema#>
PREFIX datacite: <http://purl.org/spar/datacite/>
SELECT * WHERE {
  ?s rdf:type dblp:Article ;
     dblp:numberOfCreators ?numberOfCreators ;
     dblp:yearOfPublication ?yearOfPublication ;
     dblp:title ?title ;
     dblp:bibtexType ?bibtexType
}"
```

View stats: 3,039,856 rows, time for writing < 1 sec, size 17 MiB

Query without the view (time to compute: 397 ms, total time: 444 ms):

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dblp: <https://dblp.org/rdf/schema#>
PREFIX datacite: <http://purl.org/spar/datacite/>
SELECT * WHERE {
  ?s rdf:type dblp:Article ;
     dblp:numberOfCreators ?numberOfCreators ;
     dblp:yearOfPublication ?yearOfPublication ;
     dblp:title ?title ;
     dblp:bibtexType ?bibtexType
}
```

Query with the view (time to compute: 3 ms, total time: 75 ms):

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dblp: <https://dblp.org/rdf/schema#>
PREFIX datacite: <http://purl.org/spar/datacite/>
PREFIX view: <https://qlever.cs.uni-freiburg.de/materializedView/>
PREFIX p: <https://qlever.cs.uni-freiburg.de/materializedView/payload->
SELECT * WHERE {
  SERVICE view: {
    _:x view:name "articles" ;
        view:scan-column ?s ;
        p:numberOfCreators ?numberOfCreators ;
        p:yearOfPublication ?yearOfPublication ;
        p:title ?title ;
        p:bibtexType ?bibtexType .
  }
}
```
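Without the view, the star query joins one index scan per triple pattern on `?s`; the view stores the result of these joins once, so the query only has to read it. Below is a minimal sketch of one such join step (a merge join of two predicate lists sorted by subject), assuming at most one object per subject and predicate to keep it short. `Pairs`, `StarRow`, and `mergeJoinOnSubject` are hypothetical, and QLever's actual join implementations are more general.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// (subject, object) pairs of one predicate, sorted by subject.
using Pairs = std::vector<std::pair<std::uint64_t, std::uint64_t>>;

// One row of the joined star: the subject and one object per predicate.
struct StarRow {
  std::uint64_t subject;
  std::uint64_t a;
  std::uint64_t b;
};

// Merge join on the subject; assumes each subject occurs at most once per
// input list.
std::vector<StarRow> mergeJoinOnSubject(const Pairs& left, const Pairs& right) {
  std::vector<StarRow> out;
  std::size_t i = 0;
  std::size_t j = 0;
  while (i < left.size() && j < right.size()) {
    if (left[i].first < right[j].first) {
      ++i;
    } else if (right[j].first < left[i].first) {
      ++j;
    } else {
      out.push_back({left[i].first, left[i].second, right[j].second});
      ++i;
      ++j;
    }
  }
  return out;
}
```

The query without the view needs a chain of such joins (one per additional star predicate) over millions of articles, whereas the query with the view reads the precomputed rows directly.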

#2519) Previously, the `IndexScan` constructor took a `Permutation::Enum` and retrieved the respective `Permutation` and `LocatedTriplesSnapshot` from its `QueryExecutionContext`. However, this was limited to the six standard permutations PSO, POS, SPO, SOP, OSP, and OPS. With this change, the `IndexScan` constructor takes a `PermutationPtr` and a `LocatedTriplesSnapshotPtr`, which are shared pointers to the actual objects. This means it can also be used for custom permutations that are not part of the regular index, in particular those of the materialized views in #2482. NOTE: The "normal" and "internal" variants of the PSO and POS permutations still use the same `PermutationPtr`. The distinction is made via `std::unique_ptr<Permutation> internalPermutation_` in the `Permutation` class, which is rather hacky. Instead, the distinction should be made in the query planner, so that the `PermutationPtr` object already refers to the correct permutation (normal or internal). That is work for a separate PR.
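The change can be illustrated with a much-simplified C++ sketch: the scan receives shared pointers to its permutation and snapshot instead of an enum, so any permutation, including one belonging to a materialized view, can be scanned, and it stays alive as long as a scan uses it. The type and member names below mirror those in the text, but the classes are hypothetical stand-ins, not QLever's real `IndexScan` or `Permutation`.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical, heavily simplified stand-ins for QLever's classes.
struct Permutation {
  std::string name;  // e.g. "SPO", or the permutation of a materialized view
};
struct LocatedTriplesSnapshot {};

using PermutationPtr = std::shared_ptr<const Permutation>;
using LocatedTriplesSnapshotPtr = std::shared_ptr<const LocatedTriplesSnapshot>;

// Instead of looking up one of six fixed permutations by enum, the scan holds
// shared ownership of whatever permutation it was given.
class IndexScan {
 public:
  IndexScan(PermutationPtr permutation, LocatedTriplesSnapshotPtr snapshot)
      : permutation_{std::move(permutation)}, snapshot_{std::move(snapshot)} {}

  const Permutation& permutation() const { return *permutation_; }
  const LocatedTriplesSnapshot& snapshot() const { return *snapshot_; }

 private:
  PermutationPtr permutation_;
  LocatedTriplesSnapshotPtr snapshot_;
};
```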

Refactor the (rather lengthy and complex) `CompressedRelationWriter::createPermutationPair` method into its own `struct`, where complex lambdas become proper member functions and the core loop that iterates over all blocks of triples is now much shorter and easier to read. This is the foundation for further refactoring, in particular for writing only a single permutation instead of a pair of permutations. Preparation for #2537 and thus also #2482.
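A permutation pair presumably consists of two permutations that share their first column (as with PSO/POS, SPO/SOP, and OSP/OPS in QLever's regular index), which would also explain why writing a view currently produces an unused SOP permutation. Under that assumption, the second permutation of a pair can be derived from the first by swapping the last two columns and re-sorting within each run of equal first-column values. `Triple` and `partnerPermutation` are hypothetical names; the real writer works block-wise on disk rather than on an in-memory vector.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// (col0, col1, col2) in the order of the first permutation, e.g. SPO.
using Triple = std::array<std::uint64_t, 3>;

// Given triples sorted by (col0, col1, col2), returns the partner permutation
// sorted by (col0, col2, col1), e.g. SOP from SPO.
std::vector<Triple> partnerPermutation(std::vector<Triple> triples) {
  // Swap the last two columns; col0, and hence the run boundaries, stay fixed.
  for (Triple& t : triples) {
    std::swap(t[1], t[2]);
  }
  // Re-sort each run of equal col0 values independently.
  auto runBegin = triples.begin();
  while (runBegin != triples.end()) {
    const std::uint64_t key = (*runBegin)[0];
    auto runEnd = std::find_if(runBegin, triples.end(),
                               [key](const Triple& t) { return t[0] != key; });
    std::sort(runBegin, runEnd);
    runBegin = runEnd;
  }
  return triples;
}
```

Because only short runs have to be re-sorted, writing the second permutation of a pair is much cheaper than a full re-sort, but it is still wasted work when, as for a view, only one of the two permutations is needed.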

sparql-conformance · 2025-12-09T11:28:45Z

Overview

| Number of Tests | Passed ✅ | Intended ✅ | Failed ❌ | Not tested |
|---|---|---|---|---|
| 525 | 379 | 67 | 79 | 0 |

Conformance check passed ✅

No test result changes.

Details: https://qlever.dev/sparql-conformance-ui?cur=72e40be0de55f9e42ab86b2658216217f17f4235&prev=84156ed50f4f55b0fbfced0b9fbb167f1508a18f

sonarqubecloud · 2025-12-09T13:23:57Z

Quality Gate passed

Issues
- 12 new issues
- 0 accepted issues

Measures
- 0 security hotspots
- 0.0% coverage on new code
- 0.0% duplication on new code

See analysis details on SonarQube Cloud

ullingerc added 23 commits October 29, 2025 14:08

exp

0bd43a0

work

50a48ab

lots of stuff , still has problems

38cd939

fix error

c49d4f4

omg it seems to work

49f9bf2

dynamic table

aeb14bd

remove unnecessary permutation step

5af792a

do stuff

a3287bb

more columns

4072228

add MaterializedViewWriter

c15fda0

prepare stuff

d758017

permutation load

af7bb3d

stuff

f3d2bc9

work on infra

13e03f9

stuff

fb901db

hack some stuff into indexscan op

93b486d

Merge https://github.com/ad-freiburg/qlever into materializedviews-ex…

8d243d5

fix. order of initialisation

853cbfc

rough clean up

1585cfc

remember variable names

76fa613

test with regular index scan and additional cols

e234c73

notes

73e49eb

lots of work

596b697

ullingerc changed the title ~~Experiments in preparation for Materialized Views~~ Add support for materialized views Nov 4, 2025

ullingerc added 6 commits November 4, 2025 18:02

clean up

ab21496

work

748a6ca

explicit loading

c314ebd

loading in libqlever

030b0c5

start work on tests

3a27b54

add docs and first batch of tests

715f5bf

ullingerc added 13 commits November 20, 2025 15:17

fix spelling

35931b9

fix failing test

2dac105

use shared pointers instead of references

00e7a78

apply feedback by Johannes

59088f2

Merge branch 'indexscan-custom-permutation' into materializedviews-ex…

50d0485

use ptrs

2397cf2

clangformat

78cebe3

move varsToKeep

b23941a

Merge branch 'indexscan-custom-permutation' into materializedviews-ex…

eb2d9d2

Merge https://github.com/ad-freiburg/qlever into materializedviews-ex…

3f16de1

Merge https://github.com/ad-freiburg/qlever into indexscan-custom-per…

a1d0007

move two large objects

66873a5

Merge branch 'indexscan-custom-permutation' into materializedviews-ex…

68858aa

ullingerc added 5 commits November 24, 2025 11:22

Merge https://github.com/ad-freiburg/qlever into materializedviews-ex…

50bdd67

clean up after merge of indexscan changes

28e9379

clean up after merge of indexscan changes

82e0455

Merge https://github.com/ad-freiburg/qlever into materializedviews-ex…

18e687d

Merge https://github.com/ad-freiburg/qlever into materializedviews-ex…

57fb8ef

ullingerc mentioned this pull request Nov 25, 2025

Allow writing single permutations with CompressedRelationWriter #2537

Open

Merge remote-tracking branch 'origin/master' into materializedviews-e…

1b82e1d

ullingerc mentioned this pull request Dec 2, 2025

Refactor CompressedRelationWriter::createPermutationPair #2566

Merged

Merge https://github.com/ad-freiburg/qlever into materializedviews-ex…

c145b1e

Merge remote-tracking branch 'origin/master' into materializedviews-e…

e878424

ullingerc mentioned this pull request Dec 8, 2025

Add MaterializedViewQuery #2581

Open

ullingerc added 2 commits December 9, 2025 11:34

add coverage stuff from factored out pr

0558863

Merge https://github.com/ad-freiburg/qlever into materializedviews-ex…

72e40be


2 participants
