Move to stable C++ API #21
|
Not sure why the Windows CI is failing - it seems this might not have been tested before? CC @Maxxen
|
I think you removed the "require unstable C API" setting in the CMake file. It doesn't require the unstable API per se, but it requires the behavior change that enabled function overloads by default, and that's only available on DuckDB main. |
|
Sorry, it's also in the Makefile (lines 8 to 14 in b0eaf5b). |
|
Grumble grumble :) thanks |
|
For the failures I ran into before - I have now patched it in the same way as #20 but this will break again when the next DuckDB nightly build is out. The problem is:
For dev / unstable builds these versions are not going to be the same for long, since we can't control the version we pull from PyPI unless it is a release version. We need to figure out something else with the versions here, or maybe just forgo the version check on load when testing using the unstable versions. |
Follow-up from #20
Instead of using the C API directly, we try to define a stable C++ API that is effectively a wrapper for the C API. This API just calls into the C API methods, but tries to look similar to our own internal C++ API. As a result, extensions written with this API look more similar to our internal codebase and our C++ extensions. This should make writing these extensions easier and make migrations easier, while still maintaining all of the advantages of C API extensions.
The stable C++ API lives in https://github.com/duckdb/duckdb-cpp-api.
Dependencies
The stable C++ API depends only on (1) the DuckDB C API, and (2) the C++ STL. It is important that this does not in any way depend on the main DuckDB C++ codebase, even if this means re-implementing certain functions. This independence allows it to be moved around, edited or extended in extensions as well if required, although ideally, at least for our own extensions, we avoid doing this and use a central stable DuckDB C++ API.
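To make the wrapper idea concrete, here is a minimal sketch of a C++ class that owns a C API handle while depending only on a C API and the STL. The `mock_type` handle, the `mock_create_type` / `mock_destroy_type` functions and the `TypeWrapper` class are all hypothetical stand-ins so that the snippet compiles on its own. They are not DuckDB functions, and this is not the actual implementation in duckdb-cpp-api.

```cpp
#include <cassert>
#include <string>
#include <utility>

// --- Hypothetical stand-in for a C API (NOT the DuckDB C API) ---
struct mock_type_t {
    std::string name;
};
typedef mock_type_t *mock_type;

static int live_handles = 0; // number of handles that have not been destroyed yet

mock_type mock_create_type(const char *name) {
    live_handles++;
    return new mock_type_t{name};
}

void mock_destroy_type(mock_type *type) {
    if (type && *type) {
        delete *type;
        *type = nullptr;
        live_handles--;
    }
}

// --- C++ wrapper: owns the handle and releases it in the destructor ---
class TypeWrapper {
public:
    explicit TypeWrapper(mock_type handle) : handle(handle) {}
    ~TypeWrapper() { Reset(); }

    // The handle has a single owner, so copying is disabled and moving transfers ownership
    TypeWrapper(const TypeWrapper &) = delete;
    TypeWrapper &operator=(const TypeWrapper &) = delete;
    TypeWrapper(TypeWrapper &&other) noexcept : handle(other.handle) {
        other.handle = nullptr;
    }
    TypeWrapper &operator=(TypeWrapper &&other) noexcept {
        if (this != &other) {
            Reset();
            handle = other.handle;
            other.handle = nullptr;
        }
        return *this;
    }

    // Helper in the style of a static factory for a primitive type
    static TypeWrapper VARCHAR() { return TypeWrapper(mock_create_type("VARCHAR")); }

    const std::string &Name() const {
        assert(handle != nullptr);
        return handle->name;
    }
    // Access to the raw handle for passing back into the C API
    mock_type c_type() const { return handle; }

private:
    void Reset() { mock_destroy_type(&handle); }
    mock_type handle;
};
```

With a wrapper like this, calling code never has to call the destroy function by hand: the handle is released when the wrapper goes out of scope, and moving the wrapper hands the handle over without duplicating it.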
Entrypoint
To register a C++ extension, the DUCKDB_EXTENSION_CPP_ENTRYPOINT entrypoint can be used:
Internally this calls DUCKDB_EXTENSION_ENTRYPOINT and creates an ExtensionLoader class. This class has a number of helper methods to register extension callbacks (Register(LogicalType), Register(CastFunction), Register(ScalarFunction), ...).
Types
Logical types exist as the LogicalType class in duckdb/stable/logical_type.hpp. There are helper methods to create various primitive logical types similar to our internal code, e.g.:
LogicalType::VARCHAR()
Internally this class has a duckdb_logical_type that is managed by this class.
Functions, Casts & Executors
In order to register functions, we can inherit from a ScalarFunction object. This requires a number of fields to be defined, which are then used to create a duckdb_scalar_function which can be registered.
In order to simplify defining functions, there is an Executor class that can be used to loop over vectors similar to how we have the GenericExecutor in our own codebase. This uses composable types so that it can also work for nested types (and is used both for structs and primitives in this PR).
There are two functions that use this executor internally - UnaryExecutor and BinaryExecutor - as well as the CastExecutor for casts. These allow for very easy definitions of functions, e.g. here is the host function:
The logical argument types and return types are inferred by the templated parameter types:
And here is how we register the function in the entrypoint:
HostFunction host_function;
Register(host_function);
Exceptions
Exceptions are automatically caught and passed on as errors to the C API layer. We also introduce a stable Exception class that has support for formatting, e.g.:
This is done using a separate FormatValue that mimics our ExceptionFormatValue - but the code is greatly simplified because it does not depend on fmt but just does a simple loop to replace instances of {} with the replacement arguments.
For now the actual exception types are not yet propagated down, but we could do this in the future by formatting exceptions using JSON.
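The placeholder replacement described above can be sketched roughly as follows. This is an illustration only, not the code from this PR: `SketchException`, `FormatString` and `ToFormatString` are hypothetical names, and stringifying arguments through `std::ostringstream` is an assumption made to keep the snippet short.

```cpp
#include <cstddef>
#include <exception>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: turns any streamable argument into a string
template <typename T>
std::string ToFormatString(const T &value) {
    std::ostringstream stream;
    stream << value;
    return stream.str();
}

// Replaces each "{}" in `format` with the next entry of `values`, left to right.
// Placeholders that have no matching value are copied through unchanged.
inline std::string FormatString(const std::string &format, const std::vector<std::string> &values) {
    std::string result;
    std::size_t value_index = 0;
    for (std::size_t i = 0; i < format.size(); i++) {
        bool is_placeholder = format[i] == '{' && i + 1 < format.size() && format[i + 1] == '}';
        if (is_placeholder && value_index < values.size()) {
            result += values[value_index++];
            i++; // skip the closing brace
        } else {
            result += format[i];
        }
    }
    return result;
}

// Hypothetical exception type whose constructor formats its message
class SketchException : public std::exception {
public:
    template <typename... ARGS>
    explicit SketchException(const std::string &format, ARGS... args)
        : message(FormatString(format, {ToFormatString(args)...})) {
    }

    const char *what() const noexcept override {
        return message.c_str();
    }

private:
    std::string message;
};
```

In this sketch, something like `throw SketchException("Cannot cast {} to {}", "abc", 42);` would carry the message `Cannot cast abc to 42`. Because the loop only recognizes the literal two-character sequence `{}`, there are no format specifiers or escapes, which is what keeps it free of any dependency on fmt.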
CC @Maxxen @taniabogatsch