-
Notifications
You must be signed in to change notification settings - Fork 103
First version of magic proxy service #2575
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Includes `qlproxy_server.py` for end-to-end testing. This should eventually be removed again. For example, when that server is started with `python3 qlproxy_server.py --port 8888 verbose`, the the following should work in QLever: ```sparql PREFIX qlproxy: <https://qlever.cs.uni-freiburg.de/qlproxy/> SELECT ?result WHERE { VALUES (?x1 ?x2) { (1 3) (2 2) (3 1) } SERVICE qlproxy: { _:config qlproxy:endpoint <https://qlever.dev/api/proxy-test> ; qlproxy:payload_first ?x1 ; qlproxy:payload_second ?x2 ; qlproxy:result_res ?result ; qlproxy:param_op "add" . } } ```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR implements a new magic proxy service (SERVICE qlproxy:) for QLever that enables SPARQL queries to send payload bindings to remote endpoints and receive result bindings back. The implementation allows arithmetic and other operations to be delegated to external services, with payload and result variables mapped between QLever and the remote endpoint.
Key changes:
- Added
ProxyQueryparser to handleqlproxy:service configuration with endpoint URL, payload variables, result variables, and URL parameters - Implemented
ProxyOperationengine operation that serializes payload as SPARQL Results JSON, sends HTTP POST requests, and parses responses - Integrated proxy operation into the query planner with child operation support similar to
SpatialJoin
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
src/parser/ProxyQuery.h |
Defines ProxyQuery and ProxyConfiguration structures for parsing and validating qlproxy service configuration |
src/parser/ProxyQuery.cpp |
Implements parameter extraction and validation for endpoint, payload, result, and URL parameter specifications |
src/parser/MagicServiceIriConstants.h |
Adds QLPROXY_IRI constant for the qlproxy service IRI |
src/parser/GraphPatternOperation.h |
Adds ProxyQuery to the graph pattern operation variant |
src/parser/sparqlParser/SparqlQleverVisitor.h |
Declares visitProxyQuery method for SPARQL parser visitor |
src/parser/sparqlParser/SparqlQleverVisitor.cpp |
Implements visitor method to parse and validate qlproxy service clauses |
src/parser/CMakeLists.txt |
Adds ProxyQuery.cpp to parser build |
src/engine/ProxyOperation.h |
Defines ProxyOperation class with HTTP request handling and JSON serialization |
src/engine/ProxyOperation.cpp |
Implements proxy operation execution, including payload serialization, HTTP communication, and result parsing |
src/engine/QueryPlanner.h |
Declares methods for proxy operation planning and child addition |
src/engine/QueryPlanner.cpp |
Implements query planner logic to join proxy operations with child operations providing payload variables |
src/engine/CheckUsePatternTrick.cpp |
Adds ProxyQuery to operations where pattern trick is disabled |
src/engine/Service.h |
Removes bindingToTripleComponent method (moved to shared utility) |
src/engine/Service.cpp |
Refactors to use sparqlJsonBindingUtils::bindingToTripleComponent instead of member method |
src/engine/CMakeLists.txt |
Adds ProxyOperation.cpp to engine build and updates comment about C++20-dependent operations |
test/ProxyOperationTest.cpp |
Comprehensive tests for ProxyOperation including basic methods, multiple result variables, URL parameters, and error handling |
test/ServiceTest.cpp |
Updates bindingToTripleComponent test to use shared utility function |
test/CMakeLists.txt |
Adds ProxyOperationTest and SparqlJsonBindingUtilsTest to test suite |
qlproxy_server.py |
Python test server for end-to-end testing, performing arithmetic operations on received bindings |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| } | ||
| first = false; | ||
| // URL-encode the value (simple encoding for common characters). | ||
| url += name + "=" + |
Copilot
AI
Dec 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The parameter name should also be URL-encoded to handle special characters properly. Currently only the value is encoded. Consider using:
url += boost::urls::encode(name, boost::urls::unreserved_chars) + "=" +
boost::urls::encode(value, boost::urls::unreserved_chars);| url += name + "=" + | |
| url += boost::urls::encode(name, boost::urls::unreserved_chars) + "=" + |
|
|
||
| # Build the response | ||
| response = { | ||
| "head": {"vars": ["res"]}, |
Copilot
AI
Dec 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The hardcoded variable name "res" doesn't match the example in the docstring (lines 16-19), which shows qlproxy:result_result ?sum, implying the result variable name should be "result". The server should either:
- Accept the variable names from the request's "head.vars" field and echo them back in the response, or
- Update the docstring example to use
qlproxy:result_res ?sumto match the hardcoded "res" name.
Option 1 is more flexible and correct for a general-purpose test server.
| std::string url = config_.endpoint_; | ||
| if (!config_.parameters_.empty()) { | ||
| url += "?"; | ||
| bool first = true; | ||
| for (const auto& [name, value] : config_.parameters_) { | ||
| if (!first) { | ||
| url += "&"; | ||
| } | ||
| first = false; | ||
| // URL-encode the value (simple encoding for common characters). | ||
| url += name + "=" + | ||
| boost::urls::encode(value, boost::urls::unreserved_chars); | ||
| } | ||
| } |
Copilot
AI
Dec 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The URL string is built using repeated += operations which can cause multiple reallocations. For better performance with many parameters, consider using absl::StrCat or preallocating the string size, similar to:
std::vector<std::string> parts;
parts.reserve(config_.parameters_.size());
for (const auto& [name, value] : config_.parameters_) {
parts.push_back(absl::StrCat(name, "=",
boost::urls::encode(value, boost::urls::unreserved_chars)));
}
return absl::StrCat(config_.endpoint_, "?", absl::StrJoin(parts, "&"));| } else { | ||
| throw ProxyException(absl::StrCat( | ||
| "Unsupported parameter `", predString, | ||
| "` in qlproxy service`. Supported parameters are: `<endpoint>`, " |
Copilot
AI
Dec 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The backtick placement in this error message is incorrect. The string currently reads `in qlproxy service`. which places the backtick before "service`" oddly. It should either be:
in qlproxy service.(remove the misplaced backtick), or`in qlproxy service.`(move the closing backtick after the period)
Most likely the backtick should be removed entirely for consistency with the rest of the message.
| "` in qlproxy service`. Supported parameters are: `<endpoint>`, " | |
| "` in qlproxy service. Supported parameters are: `<endpoint>`, " |
src/engine/CMakeLists.txt
Outdated
| CartesianProductJoin.cpp TextIndexScanForWord.cpp TextIndexScanForEntity.cpp | ||
| TextLimit.cpp LazyGroupBy.cpp GroupByHashMapOptimization.cpp SpatialJoin.cpp | ||
| CountConnectedSubgraphs.cpp SpatialJoinAlgorithms.cpp PathSearch.cpp ExecuteUpdate.cpp | ||
| CountConnectedSubgraphs.cpp SpatialJoinAlgorithms.cpp PathSearch.cpp ProxyOperation.cpp ExecuteUpdate.cpp |
Copilot
AI
Dec 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding ProxyOperation.cpp to the engine target enables the SERVICE qlproxy: feature, whose implementation sends HTTP/HTTPS requests to the user-specified qlproxy:endpoint URL with no host/port restrictions. On a publicly reachable SPARQL endpoint this lets an attacker craft queries that cause QLever to issue arbitrary HTTP(S) POSTs to internal services (e.g., 127.0.0.1, cloud metadata IPs, admin panels), which is a classic SSRF primitive. To mitigate this, gate SERVICE qlproxy: behind server-side configuration and/or enforce a strict allowlist for proxy targets (e.g., configured hostnames and safe ports, blocking private/reserved IP ranges) instead of accepting arbitrary external URLs from queries.
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #2575 +/- ##
==========================================
- Coverage 91.14% 90.75% -0.39%
==========================================
Files 469 474 +5
Lines 40181 40463 +282
Branches 5376 5424 +48
==========================================
+ Hits 36622 36724 +102
- Misses 2021 2189 +168
- Partials 1538 1550 +12 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Overview
Conformance check passed ✅No test result changes. |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested this this morning and it works very well as long as there is a single proxy service. If you try to combine multiple proxy services, it is quite easy to generate cryptic assertion failures:
PREFIX qlproxy: <https://qlever.cs.uni-freiburg.de/qlproxy/>
SELECT ?result2 WHERE {
VALUES (?x1 ?x2) { (1 3) (2 2) (3 1) }
SERVICE qlproxy: {
_:config qlproxy:endpoint <https://qlever.dev/api/proxy-test> ;
qlproxy:payload_first ?x1 ;
qlproxy:payload_second ?x2 ;
qlproxy:result_res ?result ;
qlproxy:param_op "add" .
}
SERVICE qlproxy: {
_:config qlproxy:endpoint <https://qlever.dev/api/proxy-test> ;
qlproxy:payload_first ?x1 ;
qlproxy:payload_second ?x2 ;
qlproxy:result_res ?result2 ;
qlproxy:param_op "add" .
}
}
results in an ERROR: Assertion 'res > 0' failed. in MultiColumnJoin.cpp. I guess that query doesn't make any sense because the variables ?x1 and ?x2 are not "passed through" by the proxy, but the error message should be better.
For the following query, I would naively expect a result of
8
8
8
but I am getting ERROR: Assertion 'variablesAreDisjoint' failed.:
PREFIX qlproxy: <https://qlever.cs.uni-freiburg.de/qlproxy/>
SELECT ?result2 WHERE {
VALUES (?x1 ?x2) { (1 3) (2 2) (3 1) }
SERVICE qlproxy: {
_:config qlproxy:endpoint <https://qlever.dev/api/proxy-test> ;
qlproxy:payload_first ?x1 ;
qlproxy:payload_second ?x2 ;
qlproxy:result_res ?result ;
qlproxy:param_op "add" .
}
SERVICE qlproxy: {
_:config qlproxy:endpoint <https://qlever.dev/api/proxy-test> ;
qlproxy:payload_first ?result ;
qlproxy:payload_second ?result ;
qlproxy:result_res ?result2 ;
qlproxy:param_op "add" .
}
}
However, I am not sure whether queries which include multiple SERVICE proxy calls like that even make any sense.
EDIT: in the query above, the following works:
PREFIX qlproxy: <https://qlever.cs.uni-freiburg.de/qlproxy/>
SELECT ?result2 WHERE {
{
SELECT * WHERE {
VALUES (?x1 ?x2) { (1 3) (2 2) (3 1) }
SERVICE qlproxy: {
_:config qlproxy:endpoint <https://qlever.dev/api/proxy-test> ;
qlproxy:payload_first ?x1 ;
qlproxy:payload_second ?x2 ;
qlproxy:result_res ?result ;
qlproxy:param_op "add" .
}
}
}
SERVICE qlproxy: {
_:config qlproxy:endpoint <https://qlever.dev/api/proxy-test> ;
qlproxy:payload_first ?result ;
qlproxy:payload_second ?result ;
qlproxy:result_res ?result2 ;
qlproxy:param_op "add" .
}
}



Includes
qlproxy_server.pyfor end-to-end testing. This should eventually be removed again. For example, when that server is started withpython3 qlproxy_server.py --port 8888 verbose, the the following should work in QLever: