Skip to content

Conversation

@hannahbast
Copy link
Member

Includes qlproxy_server.py for end-to-end testing. This should eventually be removed again. For example, when that server is started with python3 qlproxy_server.py --port 8888 verbose, the the following should work in QLever:

PREFIX qlproxy: <https://qlever.cs.uni-freiburg.de/qlproxy/>
SELECT ?result WHERE {
  VALUES (?x1 ?x2) { (1 3) (2 2) (3 1) }
  SERVICE qlproxy: {
    _:config qlproxy:endpoint <https://qlever.dev/api/proxy-test> ;
             qlproxy:payload_first ?x1 ;
             qlproxy:payload_second ?x2 ;
             qlproxy:result_res ?result ;
             qlproxy:param_op "add" .
  }
}

Includes `qlproxy_server.py` for end-to-end testing. This should
eventually be removed again. For example, when that server is started
with `python3 qlproxy_server.py --port 8888 verbose`, the the following
should work in QLever:

```sparql
PREFIX qlproxy: <https://qlever.cs.uni-freiburg.de/qlproxy/>
SELECT ?result WHERE {
  VALUES (?x1 ?x2) { (1 3) (2 2) (3 1) }
  SERVICE qlproxy: {
    _:config qlproxy:endpoint <https://qlever.dev/api/proxy-test> ;
             qlproxy:payload_first ?x1 ;
             qlproxy:payload_second ?x2 ;
             qlproxy:result_res ?result ;
             qlproxy:param_op "add" .
  }
}
```
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements a new magic proxy service (SERVICE qlproxy:) for QLever that enables SPARQL queries to send payload bindings to remote endpoints and receive result bindings back. The implementation allows arithmetic and other operations to be delegated to external services, with payload and result variables mapped between QLever and the remote endpoint.

Key changes:

  • Added ProxyQuery parser to handle qlproxy: service configuration with endpoint URL, payload variables, result variables, and URL parameters
  • Implemented ProxyOperation engine operation that serializes payload as SPARQL Results JSON, sends HTTP POST requests, and parses responses
  • Integrated proxy operation into the query planner with child operation support similar to SpatialJoin

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
src/parser/ProxyQuery.h Defines ProxyQuery and ProxyConfiguration structures for parsing and validating qlproxy service configuration
src/parser/ProxyQuery.cpp Implements parameter extraction and validation for endpoint, payload, result, and URL parameter specifications
src/parser/MagicServiceIriConstants.h Adds QLPROXY_IRI constant for the qlproxy service IRI
src/parser/GraphPatternOperation.h Adds ProxyQuery to the graph pattern operation variant
src/parser/sparqlParser/SparqlQleverVisitor.h Declares visitProxyQuery method for SPARQL parser visitor
src/parser/sparqlParser/SparqlQleverVisitor.cpp Implements visitor method to parse and validate qlproxy service clauses
src/parser/CMakeLists.txt Adds ProxyQuery.cpp to parser build
src/engine/ProxyOperation.h Defines ProxyOperation class with HTTP request handling and JSON serialization
src/engine/ProxyOperation.cpp Implements proxy operation execution, including payload serialization, HTTP communication, and result parsing
src/engine/QueryPlanner.h Declares methods for proxy operation planning and child addition
src/engine/QueryPlanner.cpp Implements query planner logic to join proxy operations with child operations providing payload variables
src/engine/CheckUsePatternTrick.cpp Adds ProxyQuery to operations where pattern trick is disabled
src/engine/Service.h Removes bindingToTripleComponent method (moved to shared utility)
src/engine/Service.cpp Refactors to use sparqlJsonBindingUtils::bindingToTripleComponent instead of member method
src/engine/CMakeLists.txt Adds ProxyOperation.cpp to engine build and updates comment about C++20-dependent operations
test/ProxyOperationTest.cpp Comprehensive tests for ProxyOperation including basic methods, multiple result variables, URL parameters, and error handling
test/ServiceTest.cpp Updates bindingToTripleComponent test to use shared utility function
test/CMakeLists.txt Adds ProxyOperationTest and SparqlJsonBindingUtilsTest to test suite
qlproxy_server.py Python test server for end-to-end testing, performing arithmetic operations on received bindings

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

}
first = false;
// URL-encode the value (simple encoding for common characters).
url += name + "=" +
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parameter name should also be URL-encoded to handle special characters properly. Currently only the value is encoded. Consider using:

url += boost::urls::encode(name, boost::urls::unreserved_chars) + "=" +
       boost::urls::encode(value, boost::urls::unreserved_chars);
Suggested change
url += name + "=" +
url += boost::urls::encode(name, boost::urls::unreserved_chars) + "=" +

Copilot uses AI. Check for mistakes.

# Build the response
response = {
"head": {"vars": ["res"]},
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hardcoded variable name "res" doesn't match the example in the docstring (lines 16-19), which shows qlproxy:result_result ?sum, implying the result variable name should be "result". The server should either:

  1. Accept the variable names from the request's "head.vars" field and echo them back in the response, or
  2. Update the docstring example to use qlproxy:result_res ?sum to match the hardcoded "res" name.

Option 1 is more flexible and correct for a general-purpose test server.

Copilot uses AI. Check for mistakes.
Comment on lines +114 to +127
std::string url = config_.endpoint_;
if (!config_.parameters_.empty()) {
url += "?";
bool first = true;
for (const auto& [name, value] : config_.parameters_) {
if (!first) {
url += "&";
}
first = false;
// URL-encode the value (simple encoding for common characters).
url += name + "=" +
boost::urls::encode(value, boost::urls::unreserved_chars);
}
}
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The URL string is built using repeated += operations which can cause multiple reallocations. For better performance with many parameters, consider using absl::StrCat or preallocating the string size, similar to:

std::vector<std::string> parts;
parts.reserve(config_.parameters_.size());
for (const auto& [name, value] : config_.parameters_) {
  parts.push_back(absl::StrCat(name, "=", 
      boost::urls::encode(value, boost::urls::unreserved_chars)));
}
return absl::StrCat(config_.endpoint_, "?", absl::StrJoin(parts, "&"));

Copilot uses AI. Check for mistakes.
} else {
throw ProxyException(absl::StrCat(
"Unsupported parameter `", predString,
"` in qlproxy service`. Supported parameters are: `<endpoint>`, "
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The backtick placement in this error message is incorrect. The string currently reads `in qlproxy service`. which places the backtick before "service`" oddly. It should either be:

  • in qlproxy service. (remove the misplaced backtick), or
  • `in qlproxy service.` (move the closing backtick after the period)

Most likely the backtick should be removed entirely for consistency with the rest of the message.

Suggested change
"` in qlproxy service`. Supported parameters are: `<endpoint>`, "
"` in qlproxy service. Supported parameters are: `<endpoint>`, "

Copilot uses AI. Check for mistakes.
CartesianProductJoin.cpp TextIndexScanForWord.cpp TextIndexScanForEntity.cpp
TextLimit.cpp LazyGroupBy.cpp GroupByHashMapOptimization.cpp SpatialJoin.cpp
CountConnectedSubgraphs.cpp SpatialJoinAlgorithms.cpp PathSearch.cpp ExecuteUpdate.cpp
CountConnectedSubgraphs.cpp SpatialJoinAlgorithms.cpp PathSearch.cpp ProxyOperation.cpp ExecuteUpdate.cpp
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding ProxyOperation.cpp to the engine target enables the SERVICE qlproxy: feature, whose implementation sends HTTP/HTTPS requests to the user-specified qlproxy:endpoint URL with no host/port restrictions. On a publicly reachable SPARQL endpoint this lets an attacker craft queries that cause QLever to issue arbitrary HTTP(S) POSTs to internal services (e.g., 127.0.0.1, cloud metadata IPs, admin panels), which is a classic SSRF primitive. To mitigate this, gate SERVICE qlproxy: behind server-side configuration and/or enforce a strict allowlist for proxy targets (e.g., configured hostnames and safe ports, blocking private/reserved IP ranges) instead of accepting arbitrary external URLs from queries.

Copilot uses AI. Check for mistakes.
@codecov
Copy link

codecov bot commented Dec 5, 2025

Codecov Report

❌ Patch coverage is 44.13580% with 181 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.75%. Comparing base (6c4c294) to head (0b3bdf0).

Files with missing lines Patch % Lines
src/engine/Proxy.cpp 48.04% 84 Missing and 9 partials ⚠️
src/parser/ProxyQuery.cpp 0.00% 54 Missing ⚠️
src/engine/QueryPlanner.cpp 41.17% 17 Missing and 3 partials ⚠️
src/parser/sparqlParser/SparqlQleverVisitor.cpp 0.00% 9 Missing and 1 partial ⚠️
src/engine/Proxy.h 50.00% 2 Missing ⚠️
src/parser/ProxyQuery.h 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2575      +/-   ##
==========================================
- Coverage   91.14%   90.75%   -0.39%     
==========================================
  Files         469      474       +5     
  Lines       40181    40463     +282     
  Branches     5376     5424      +48     
==========================================
+ Hits        36622    36724     +102     
- Misses       2021     2189     +168     
- Partials     1538     1550      +12     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@sparql-conformance
Copy link

Overview

Number of Tests Passed ✅ Intended ✅ Failed ❌ Not tested
525 379 67 79 0

Conformance check passed ✅

No test result changes.

Details: https://qlever.dev/sparql-conformance-ui?cur=0b3bdf0cb8b40c8684d016af1fce05b5661fe8f8&prev=6c4c2949e7ee88390ae18948ac6d13dd7107c96e

@sonarqubecloud
Copy link

sonarqubecloud bot commented Dec 5, 2025

Copy link
Member

@patrickbr patrickbr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested this this morning and it works very well as long as there is a single proxy service. If you try to combine multiple proxy services, it is quite easy to generate cryptic assertion failures:

PREFIX qlproxy: <https://qlever.cs.uni-freiburg.de/qlproxy/>
SELECT ?result2 WHERE {
  VALUES (?x1 ?x2) { (1 3) (2 2) (3 1) }
  SERVICE qlproxy: {
    _:config qlproxy:endpoint <https://qlever.dev/api/proxy-test> ;
             qlproxy:payload_first ?x1 ;
             qlproxy:payload_second ?x2 ;
             qlproxy:result_res ?result ;
             qlproxy:param_op "add" .
  }
  SERVICE qlproxy: {
    _:config qlproxy:endpoint <https://qlever.dev/api/proxy-test> ;
             qlproxy:payload_first ?x1 ;
             qlproxy:payload_second ?x2 ;
             qlproxy:result_res ?result2 ;
             qlproxy:param_op "add" .
  }
}

results in an ERROR: Assertion 'res > 0' failed. in MultiColumnJoin.cpp. I guess that query doesn't make any sense because the variables ?x1 and ?x2 are not "passed through" by the proxy, but the error message should be better.

For the following query, I would naively expect a result of

8
8
8

but I am getting ERROR: Assertion 'variablesAreDisjoint' failed.:

PREFIX qlproxy: <https://qlever.cs.uni-freiburg.de/qlproxy/>
SELECT ?result2 WHERE {
  VALUES (?x1 ?x2) { (1 3) (2 2) (3 1) }
  SERVICE qlproxy: {
    _:config qlproxy:endpoint <https://qlever.dev/api/proxy-test> ;
             qlproxy:payload_first ?x1 ;
             qlproxy:payload_second ?x2 ;
             qlproxy:result_res ?result ;
             qlproxy:param_op "add" .
  }
  SERVICE qlproxy: {
    _:config qlproxy:endpoint <https://qlever.dev/api/proxy-test> ;
             qlproxy:payload_first ?result ;
             qlproxy:payload_second ?result ;
             qlproxy:result_res ?result2 ;
             qlproxy:param_op "add" .
  }
}

However, I am not sure whether queries which include multiple SERVICE proxy calls like that even make any sense.

EDIT: in the query above, the following works:

PREFIX qlproxy: <https://qlever.cs.uni-freiburg.de/qlproxy/>
SELECT ?result2 WHERE {
  {
    SELECT * WHERE {
      VALUES (?x1 ?x2) { (1 3) (2 2) (3 1) }
      SERVICE qlproxy: {
        _:config qlproxy:endpoint <https://qlever.dev/api/proxy-test> ;
                 qlproxy:payload_first ?x1 ;
                 qlproxy:payload_second ?x2 ;
                 qlproxy:result_res ?result ;
                 qlproxy:param_op "add" .
      }
    }
  }
  SERVICE qlproxy: {
    _:config qlproxy:endpoint <https://qlever.dev/api/proxy-test> ;
             qlproxy:payload_first ?result ;
             qlproxy:payload_second ?result ;
             qlproxy:result_res ?result2 ;
             qlproxy:param_op "add" .
  }
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants