Move to stable C++ API #21

Merged
Mytherin merged 37 commits into duckdb:main from Mytherin:cppstable on Oct 8, 2025

Mytherin commented Oct 2, 2025

Contributor

Follow-up from #20

Instead of using the C API directly, we try to define a stable C++ API that is effectively a wrapper for the C API. This API just calls into the C API methods, but tries to look similar to our own internal C++ API. As a result, extensions written with this API look more similar to our internal codebase and our C++ extensions. This should make writing these extensions easier and make migrations easier, while still maintaining all of the advantages of C API extensions.

The stable C++ API lives in https://github.com/duckdb/duckdb-cpp-api.

Dependencies

The stable C++ API depends only on (1) the DuckDB C API, and (2) the C++ STL. It is important that this does not depend in any way on the main DuckDB C++ codebase, even if this means re-implementing certain functions. This independence allows it to be moved around, edited, or extended within extensions if required or desired, although ideally, at least for our own extensions, we avoid doing this and use a central stable DuckDB C++ API.

Entrypoint

To register a C++ extension, the DUCKDB_EXTENSION_CPP_ENTRYPOINT entrypoint can be used:

DUCKDB_EXTENSION_CPP_ENTRYPOINT(INET) {
    // registration code
}

Internally this calls DUCKDB_EXTENSION_ENTRYPOINT and creates an ExtensionLoader class. This class has a number of helper methods to register extension callbacks (Register(LogicalType), Register(CastFunction), Register(ScalarFunction), ...).

Types

Logical types exist as the LogicalType class in duckdb/stable/logical_type.hpp. There are helper methods to create various primitive logical types similar to our internal code, e.g.:

LogicalType::VARCHAR()

Internally this class has a duckdb_logical_type that is managed by this class.
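
As a rough illustration of what "managed by this class" can mean, here is a minimal RAII sketch. It is not the code from duckdb-cpp-api: the `fake_*` functions stand in for the C API so the example compiles on its own, `SketchLogicalType` is a hypothetical name, and the sketch is move-only for brevity even though the real class may well be copyable.

```cpp
#include <cassert>
#include <utility>

// Stand-ins for a C API handle and its create/destroy functions (NOT the DuckDB C API).
// In the real C API the analogous calls are duckdb_create_logical_type and
// duckdb_destroy_logical_type.
struct fake_type {
	int id;
};
typedef fake_type *fake_logical_type;

static int live_handles = 0; // counts handles that have been created but not yet destroyed

inline fake_logical_type fake_create_logical_type(int id) {
	++live_handles;
	return new fake_type {id};
}

inline void fake_destroy_logical_type(fake_logical_type *type) {
	if (type && *type) {
		delete *type;
		*type = nullptr;
		--live_handles;
	}
}

// Hypothetical wrapper: owns exactly one handle and destroys it when it goes out of scope.
class SketchLogicalType {
public:
	explicit SketchLogicalType(fake_logical_type handle_p) : handle(handle_p) {
	}
	~SketchLogicalType() {
		fake_destroy_logical_type(&handle);
	}
	// Move-only in this sketch, so a handle is never destroyed twice.
	SketchLogicalType(const SketchLogicalType &) = delete;
	SketchLogicalType &operator=(const SketchLogicalType &) = delete;
	SketchLogicalType(SketchLogicalType &&other) noexcept : handle(other.handle) {
		other.handle = nullptr;
	}
	SketchLogicalType &operator=(SketchLogicalType &&other) noexcept {
		if (this != &other) {
			fake_destroy_logical_type(&handle);
			handle = other.handle;
			other.handle = nullptr;
		}
		return *this;
	}

	// Helper in the style of LogicalType::VARCHAR(); the id value is arbitrary here.
	static SketchLogicalType VARCHAR() {
		return SketchLogicalType(fake_create_logical_type(1));
	}

	// Borrow the raw handle, e.g. to pass it to a C API call.
	fake_logical_type c_type() const {
		return handle;
	}

private:
	fake_logical_type handle;
};
```

The point of the pattern is that callers never pair create and destroy calls by hand; ownership follows the C++ object.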

Functions, Casts & Executors

To register functions, we inherit from the ScalarFunction class. This requires a number of methods to be defined, which are then used to create a duckdb_scalar_function that can be registered.

class ScalarFunction {
public:
	virtual const char *Name() const;
	virtual LogicalType ReturnType() const = 0;
	virtual std::vector<LogicalType> Arguments() const = 0;
	virtual duckdb_scalar_function_t GetFunction() const = 0;
};

In order to simplify defining functions, there is an Executor class that can be used to loop over vectors similar to how we have the GenericExecutor in our own codebase. This uses composable types so that it can also work for nested types (and is used both for structs and primitives in this PR).

There are two functions that use this executor internally - UnaryExecutor and BinaryExecutor - as well as the CastExecutor for casts. These allow for very easy definitions of functions, e.g. here is the host function:

using INET_EXECUTOR_TYPE = StructTypeTernary<PrimitiveType<uint8_t>, PrimitiveType<duckdb_hugeint>, PrimitiveType<uint16_t>>;

class HostFunction : public UnaryFunction<HostFunction, INET_EXECUTOR_TYPE, PrimitiveType<string_t>, StringBuffer> {
public:
	const char *Name() const override {
		return "host";
	}

	static RESULT_TYPE::ARG_TYPE Operation(const INPUT_TYPE::ARG_TYPE &input, STATIC_DATA &data) {
		auto &buffer = data.buffer;
		INET_IPAddress inet;
		inet.type = (INET_IPAddressType)input.a_val;
		inet.address = from_compatible_address(input.b_val, inet.type);
		inet.mask = inet.type == INET_IP_ADDRESS_V4 ? 32 : 128;

		size_t len = ipaddress_to_string(&inet, buffer, sizeof(buffer));

		if (len == 0) {
			throw std::runtime_error("Could not write inet string");
		}
		if (len >= sizeof(buffer)) {
			throw std::runtime_error("Could not write string");
		}
		return string_t(buffer, len);
	}
};

The logical argument types and return types are inferred by the templated parameter types:

namespace duckdb_stable {

template <>
LogicalType TemplateToType<INET_EXECUTOR_TYPE>() {
	return make_inet_type();
}

template <>
LogicalType TemplateToType<PrimitiveType<string_t>>() {
	return LogicalType::VARCHAR();
}

}

And here is how we register the function in the entrypoint:

HostFunction host_function;
Register(host_function);
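
To make the executor pattern above more concrete, here is a heavily simplified sketch of the CRTP idea: the derived class supplies a static Operation, and the base class supplies the loop. This is an illustration under assumptions, not the stable API itself: `SketchUnaryFunction` and `PrefixLengthFunction` are hypothetical names, `std::vector` stands in for DuckDB vectors, and NULL handling, static data, and type inference are all omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for a unary function base class.
// DERIVED must provide: static RESULT Operation(const INPUT &input).
template <class DERIVED, class INPUT, class RESULT>
class SketchUnaryFunction {
public:
	// Apply DERIVED::Operation to every input row and write the results in order.
	static void Execute(const std::vector<INPUT> &input, std::vector<RESULT> &result) {
		result.resize(input.size());
		for (size_t i = 0; i < input.size(); i++) {
			result[i] = DERIVED::Operation(input[i]);
		}
	}
};

// Example function: full prefix length for an address family flag
// (illustrative encoding: 0 = IPv4, anything else = IPv6).
class PrefixLengthFunction : public SketchUnaryFunction<PrefixLengthFunction, uint8_t, uint16_t> {
public:
	static uint16_t Operation(const uint8_t &is_v6) {
		return is_v6 ? 128 : 32;
	}
};
```

Because Operation is resolved statically through the template parameter, the per-row call needs no virtual dispatch and can be inlined into the loop.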

Exceptions

Exceptions are automatically caught and passed on as errors to the C API layer. We also introduce a stable Exception class that has support for formatting, e.g.:

throw OutOfRangeException("Value {} is out of range", 42);

This is done using a separate FormatValue that mimics our ExceptionFormatValue - but the code is greatly simplified because it does not depend on fmt but just does a simple loop to replace instances of {} with the replacement arguments.
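
A minimal sketch of such a replacement loop, assuming every argument can be streamed to a `std::ostream`. The names `ToFormatString`, `ReplacePlaceholders`, `FormatMessage`, and `SketchOutOfRangeException` are hypothetical and are not the names used in duckdb-cpp-api; the real FormatValue may handle types and edge cases differently.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: convert any streamable value to its string form.
template <class T>
std::string ToFormatString(const T &value) {
	std::ostringstream out;
	out << value;
	return out.str();
}

// Replace each "{}" in `format`, left to right, with the next entry of `values`.
// Placeholders without a matching value are left in place.
inline std::string ReplacePlaceholders(const std::string &format, const std::vector<std::string> &values) {
	std::string result;
	size_t value_idx = 0;
	size_t pos = 0;
	while (pos < format.size()) {
		if (value_idx < values.size() && format.compare(pos, 2, "{}") == 0) {
			result += values[value_idx++];
			pos += 2;
		} else {
			result += format[pos++];
		}
	}
	assert(value_idx <= values.size());
	return result;
}

// Stringify all arguments, then substitute them into the format string.
template <class... ARGS>
std::string FormatMessage(const std::string &format, const ARGS &...args) {
	std::vector<std::string> values {ToFormatString(args)...};
	return ReplacePlaceholders(format, values);
}

// Hypothetical exception type that formats its message on construction.
class SketchOutOfRangeException : public std::runtime_error {
public:
	template <class... ARGS>
	explicit SketchOutOfRangeException(const std::string &format, const ARGS &...args)
	    : std::runtime_error(FormatMessage(format, args...)) {
	}
};
```

With this sketch, `SketchOutOfRangeException("Value {} is out of range", 42)` carries the message "Value 42 is out of range".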

For now the actual exception types are not yet propagated down, but we could do this in the future by formatting exceptions using JSON.
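
The catch-and-forward behavior can be pictured roughly as follows. This is a sketch under assumptions rather than the wrapper's actual code: `FakeFunctionInfo` and `fake_set_error` are stand-ins so the example compiles without DuckDB headers (in the C API, a scalar function reports failure through `duckdb_scalar_function_set_error`), and `CallAndForwardErrors` is a hypothetical name. Only the message survives the boundary, which matches the note above that exception types are not propagated.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Stand-in for the per-call state a C callback receives (NOT a DuckDB type).
struct FakeFunctionInfo {
	bool has_error = false;
	std::string error;
};

// Stand-in for the C API call that reports an error message for the current call.
inline void fake_set_error(FakeFunctionInfo &info, const char *message) {
	info.has_error = true;
	info.error = message;
}

// Run `body` and turn any exception into an error on `info`, so that no C++
// exception unwinds through the C boundary.
template <class FUNC>
void CallAndForwardErrors(FakeFunctionInfo &info, FUNC &&body) {
	try {
		body();
	} catch (const std::exception &ex) {
		fake_set_error(info, ex.what());
	} catch (...) {
		fake_set_error(info, "Unknown exception");
	}
}
```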

CC @Maxxen @taniabogatsch

Mytherin commented Oct 2, 2025

Contributor Author

Not sure why the Windows CI is failing - it seems this might not have been tested before? CC @Maxxen

_duckdb.InvalidInputException: Invalid Input Error: Failed to load 'build/release/inet.duckdb_extension', The file was built specifically for DuckDB version 'd52dd4e3df' and can only be loaded with that version of DuckDB. (this version of DuckDB is '2259ad7316')

Maxxen commented Oct 2, 2025

Member

Invalid Input Error: Failed to load 'build/release/inet.duckdb_extension', The file was built specifically for DuckDB version 'd52dd4e3df' and can only be loaded with that version of

I think you removed the "require unstable c-API" setting in the CMake configuration. It doesn't require the unstable API per se, but it does require the behavior change that enabled function overloads by default, and that's only available on DuckDB main.

Maxxen commented Oct 2, 2025

Member

Sorry, it's also in the Makefile

duckdb-inet/Makefile

Lines 8 to 14 in b0eaf5b

# Set to 1 to enable Unstable API (binaries will only work on TARGET_DUCKDB_VERSION, forwards compatibility will be broken)
# WARNING: When set to 1, the duckdb_extension.h from the TARGET_DUCKDB_VERSION must be used, using any other version of
# the header is unsafe.
USE_UNSTABLE_C_API=1
# The DuckDB version to target
TARGET_DUCKDB_VERSION=d52dd4e3df

Maxxen commented Oct 7, 2025

Member

static_assert(false, ...) is always ill-formed in C++ versions < 17, even if the branch/code path/specialization is never instantiated, see e.g. https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2593r0.html#the-actual-rule
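
A common workaround, sketched here with hypothetical names (`always_false`, `TypeName`), is to make the asserted condition depend on a template parameter so that it is only evaluated when the template is actually instantiated. This is not necessarily the fix applied in this PR.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>

// Always false, but only once T is known, so an assertion on it is checked
// at instantiation time rather than when the template is first parsed.
template <class T>
struct always_false : std::false_type {};

// Hypothetical primary template: any type without a specialization is rejected,
// but only if something actually instantiates TypeName for that type.
template <class T>
struct TypeName {
	static_assert(always_false<T>::value, "TypeName is not specialized for this type");
};

template <>
struct TypeName<int32_t> {
	static const char *Get() {
		return "INTEGER";
	}
};

template <>
struct TypeName<std::string> {
	static const char *Get() {
		return "VARCHAR";
	}
};
```

Using `TypeName<int32_t>::Get()` compiles, while using `TypeName<double>` anywhere produces the static_assert message at compile time.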

Mytherin commented Oct 7, 2025

Contributor Author

Grumble grumble :) thanks

Mytherin commented Oct 7, 2025

Contributor Author

For the failures I ran into before: I have now patched them in the same way as #20, but this will break again when the next DuckDB nightly build is out.

The problem is:

  • We build against a DuckDB version specified in MainDistributionPipeline.yml
  • We test against a version pulled from PyPI

For dev / unstable builds these versions are not going to be the same for long, since we can't control the version we pull from PyPI unless it is a release version. We need to figure out something else for the versions here, or maybe just forgo the version check on load when testing with unstable versions.

Mytherin merged commit f82bb00 into duckdb:main on Oct 8, 2025
11 checks passed