Add option for deterministic defaults #18

Open
giograno wants to merge 1 commit into main from deterministic-defaults
@giograno (Member) commented Feb 18, 2026

Background

For our use case, we have decided to distinguish between 3 cases when it comes to Avro compatibility:

  1. Two schemas are identical;
  2. Two schemas differ but are backward compatible;
  3. Two schemas differ and are not compatible.
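The three cases above can be sketched as a small classifier. This is only an illustration under loud assumptions: the `Compatibility` enum and the `classify` function are hypothetical names, schemas are plain dicts describing flat records, and the compatibility check is a simplification (a field added in the new schema needs a default, and a shared field must keep its type) that ignores Avro type promotion, aliases, and nested records.

```python
from enum import Enum


class Compatibility(Enum):
    IDENTICAL = "identical"
    BACKWARD_COMPATIBLE = "backward_compatible"
    INCOMPATIBLE = "incompatible"


def classify(old: dict, new: dict) -> Compatibility:
    """Classify `new` against `old` (hypothetical helper, simplified rules)."""
    if old == new:
        return Compatibility.IDENTICAL

    old_fields = {f["name"]: f for f in old["fields"]}
    for field in new["fields"]:
        previous = old_fields.get(field["name"])
        if previous is None:
            # A field unknown to old data can only be read if it has a default.
            if "default" not in field:
                return Compatibility.INCOMPATIBLE
        elif previous["type"] != field["type"]:
            # Simplification: any type change is treated as a breaking change.
            return Compatibility.INCOMPATIBLE
    return Compatibility.BACKWARD_COMPATIBLE


old = {"type": "record", "name": "R", "fields": [{"name": "a", "type": "long"}]}
with_default = {"type": "record", "name": "R", "fields": [
    {"name": "a", "type": "long"},
    {"name": "b", "type": "int", "default": 0},
]}
without_default = {"type": "record", "name": "R", "fields": [
    {"name": "a", "type": "long"},
    {"name": "b", "type": "int"},
]}
```

Note that the first branch relies on plain equality, which is exactly why a default that changes between runs pushes an unchanged schema out of case 1.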

Problem

Avro supports default values, and they are often used within dataclasses.
Imagine the following code:

import dataclasses
from datetime import datetime, timezone

@dataclasses.dataclass
class PyType:
    creation_time: datetime = dataclasses.field(default_factory=lambda: datetime.now(tz=timezone.utc))

The generated schema will look like this:

{
  "type": "record",
  "name": "PyType",
  "namespace": "tests.unit.persistence.avro.aws.test_store_schemas",
  "fields": [
    {
      "type": {
        "type": "long",
        "logicalType": "timestamp-micros"
      },
      "name": "creation_time",
      "default": 1771420337157151
    }
  ]
}

Because the default comes from a default factory, its value changes every time we regenerate the schema. The code has not changed, so the two schemas should count as identical (case 1), yet they differ in the default.
Avro defines a "canonical" representation of a schema, which strips the default attribute. Unfortunately, we can't use it: adding a field with a default is a backward-compatible change, but once the defaults are stripped it would be marked as incompatible.
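To make the trade-off concrete, here is a rough sketch of why stripping defaults fixes one problem and creates another. `strip_defaults` is a hypothetical stand-in for the canonical form (the real one removes more than defaults), `added_fields_have_defaults` is a hypothetical, simplified compatibility check, and the second timestamp and the `retries` field are made-up values for illustration.

```python
import copy


def strip_defaults(schema: dict) -> dict:
    """Return a copy of a flat record schema without field defaults."""
    stripped = copy.deepcopy(schema)
    for field in stripped["fields"]:
        field.pop("default", None)
    return stripped


def added_fields_have_defaults(old: dict, new: dict) -> bool:
    """Simplified check: every field added in `new` must carry a default."""
    old_names = {f["name"] for f in old["fields"]}
    return all("default" in f for f in new["fields"] if f["name"] not in old_names)


ts = {"type": "long", "logicalType": "timestamp-micros"}
run_1 = {"type": "record", "name": "PyType", "fields": [
    {"name": "creation_time", "type": ts, "default": 1771420337157151},
]}
run_2 = copy.deepcopy(run_1)
run_2["fields"][0]["default"] = 1771420399000000  # same code, later run

# Stripping defaults removes the noise between two runs...
same_after_strip = strip_defaults(run_1) == strip_defaults(run_2)

# ...but it also hides the default that makes an added field compatible.
evolved = copy.deepcopy(run_1)
evolved["fields"].append({"name": "retries", "type": "int", "default": 0})
compatible_raw = added_fields_have_defaults(run_1, evolved)
compatible_stripped = added_fields_have_defaults(
    strip_defaults(run_1), strip_defaults(evolved)
)
```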

Solution

We introduce a new option that emits a fixed placeholder instead of the value produced by a default factory. We do this for datetime and UUID fields, as these are the ones we found to change on every regeneration.
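The idea can be sketched roughly as follows. This is not the code from this PR: the `avro_default` function, the `deterministic` flag name, and the placeholder values (epoch 0 and the nil UUID) are all assumptions chosen for illustration.

```python
import dataclasses
import uuid
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Hypothetical placeholders; the actual values used by the option may differ.
PLACEHOLDERS = {datetime: 0, uuid.UUID: str(uuid.UUID(int=0))}


def avro_default(field: dataclasses.Field, deterministic: bool):
    """Return an Avro-encodable default for a dataclass field (sketch)."""
    if field.default_factory is dataclasses.MISSING:
        return field.default
    if deterministic and field.type in PLACEHOLDERS:
        # Skip the factory entirely so the schema is stable across runs.
        return PLACEHOLDERS[field.type]
    value = field.default_factory()
    if isinstance(value, datetime):
        # timestamp-micros: microseconds since the Unix epoch
        return (value - EPOCH) // timedelta(microseconds=1)
    if isinstance(value, uuid.UUID):
        return str(value)
    return value


@dataclasses.dataclass
class PyType:
    creation_time: datetime = dataclasses.field(
        default_factory=lambda: datetime.now(tz=timezone.utc)
    )
    id: uuid.UUID = dataclasses.field(default_factory=uuid.uuid4)


fields = {f.name: f for f in dataclasses.fields(PyType)}
```

With the flag on, regenerating the schema from unchanged code yields the same defaults every time, while fields with ordinary (non-factory) defaults are left untouched.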

Usage

When saving Avro states, we'll keep the option disabled, as we want to properly record defaults in the schema saved along with the records. For the schemas we want to keep on disk, we'll turn the option on.

@giograno giograno requested review from bentsku and purcell February 18, 2026 07:39
@giograno giograno self-assigned this Feb 18, 2026
@giograno giograno marked this pull request as ready for review February 18, 2026 13:19
@bentsku bentsku left a comment

LGTM, I do not see a big issue with this. Even if we end up not using it, it is behind an optional flag. We can also extend it if we find other cases.

Thanks for jumping on this, really pragmatic and clean approach 👌
