Add a JSON reader option to ignore type conflicts #7276

Merged
alamb merged 13 commits into apache:main from scovich:json-ignore-type-conflicts-option
Apr 7, 2026

Conversation

@scovich
Contributor

@scovich scovich commented Mar 12, 2025

Which issue does this PR close?

Rationale for this change

JSON data is notoriously non-homogeneous, but the JSON parser today is super strict -- it requires a concrete schema, and parsing fails if any field of any row encounters a type conflict. In such cases, it can be preferable for incompatible fields to parse as NULL instead of producing a hard error.

What changes are included in this PR?

Adds a new method arrow_json::reader::ReaderBuilder::with_ignore_type_conflicts, which overrides the default behavior of returning an error on type conflict, so that the reader produces NULL values instead.

Plumb that flag through to all ten decoders so they honor it: Null, Boolean, Primitive, Decimal, Timestamp, String, StringView, List, Map, Struct.

Add both positive and negative unit tests for each decoder type, to ensure the plumbing worked.

Are there any user-facing changes?

New API method, see above.

@github-actions github-actions bot added the arrow Changes to the arrow crate label Mar 12, 2025
Comment on lines +38 to +41
for p in pos {
    if !matches!(tape.get(*p), TapeElement::Null) {
        return Err(tape.error(*p, "null"));
    }
Contributor Author


NOTE: Indentation-only change

@tustvold
Contributor

Have you run the benchmarks for this?

@scovich
Contributor Author

scovich commented Mar 12, 2025

Have you run the benchmarks for this?

Not yet... but https://github.com/apache/arrow-rs/blob/main/CONTRIBUTING.md makes it look very easy. Will do so and report back.

@scovich
Contributor Author

scovich commented Mar 12, 2025

@tustvold is cargo bench -p arrow-json sufficient? Or do I need to benchmark some other sub-crates as well? Asking because there didn't seem to be very many benchmarks in the arrow-json crate?

@alamb
Contributor

alamb commented Mar 12, 2025

@tustvold is cargo bench -p arrow-json sufficient? Or do I need to benchmark some other sub-crates as well? Asking because there didn't seem to be very many benchmarks in the arrow-json crate?

I think this one is probably what @tustvold is referring to: https://github.com/apache/arrow-rs/blob/a75da00eed762f8ab201c6cb4388921ad9b67e7e/arrow/benches/json_reader.rs#L30-L29

so like

cargo bench --bench json_reader

@scovich
Contributor Author

scovich commented Mar 12, 2025

Hmm, the benchmark results are not stable from run to run. Even benchmarking the main branch against itself gives a random and changing set of regressions and improvements. I tried on two very different computers: a 2021 MacBook Pro (Apple M1 Max) and a 2019 Lenovo T490s (Intel Core i5-8365U). Different absolute numbers, same large jitter.

Is there some trick for getting stable numbers? I tried increasing the measurement interval to 10s, it didn't solve the problem.

@alamb
Contributor

alamb commented Mar 12, 2025

Is there some trick for getting stable numbers? I tried increasing the measurement interval to 10s, it didn't solve the problem.

I am not super familiar

Maybe you could use a non laptop (sometimes they vary based on thermostats, etc)?

@scovich
Contributor Author

scovich commented Mar 13, 2025

Finally got some reasonably stable benchmark results using an EC2 m6i.8xlarge instance, rustc 1.85.0. It uncovered one issue with the append helpers I had introduced. After addressing that, we now have:

benchmark              run1          run2          run3          run4          run5
small_bench_primitive  --noise--     --noise--     --noise--     --noise--     --noise--
large_bench_primitive  2.27% faster  2.27% faster  1.65% faster  2.71% faster  1.36% faster
small_bench_list       --noise--     --noise--     3.69% faster  2.71% faster  6.02% faster

I don't know why my changes should have caused a speedup, but at least there's no slowdown.

Benchmarking commands used
# 5 runs against upstream main branch
git checkout 82c2d5f4c 
for i in $(seq 5); do cargo bench --bench json_reader -- --save-baseline main$i; done

# 5 runs against this PR
git switch json-ignore-type-conflicts-option
for i in $(seq 5); do cargo bench --bench json_reader -- --save-baseline feature$i; done

# compare the run results
for i in $(seq 5); do cargo bench --bench json_reader -- --load-baseline feature$i --baseline main$i; done

(see https://bheisler.github.io/criterion.rs/book/user_guide/command_line_options.html#baselines)

@scovich
Contributor Author

scovich commented Mar 13, 2025

@tustvold -- does the above work? Or are there other benchmarks to double check?

@scovich
Contributor Author

scovich commented Apr 4, 2025

bump?

@Blizzara
Contributor

Hey @tustvold, have you had a chance to look at this? :) It would be very useful for our use-case as well 🤞

@alamb
Contributor

alamb commented Apr 28, 2025

@scovich and @Blizzara -- would this PR be superseded by this PR?

BTW I am pretty sure this usecase is what @mwylde described in his blog from arroyo last year: https://www.arroyo.dev/blog/fast-arrow-json-decoding

@scovich
Contributor Author

scovich commented Apr 28, 2025

@scovich and @Blizzara -- would this PR be superseded by this PR?

* [Add custom decoder in arrow-json #7442](https://github.com/apache/arrow-rs/pull/7442)

Potentially? But I'd be very interested to see how one could actually achieve the same result with that other PR in practice. I suspect it would require a pretty annoying exercise to replicate each decoder type, just to inject the null handling. Maybe each custom decoder could be a thin wrapper around the default one... except I don't think any of the decoders are public today (see e.g. https://docs.rs/arrow-json/55.0.0/arrow_json/all.html)?

@Blizzara
Contributor

@scovich and @Blizzara -- would this PR be superseded by this PR?

@cht42 can confirm from our side, but I believe we'd need (or like) both; this PR handles type conflicts for most types, while the other (#7442) allows us to handle the specific case for strings. Using #7442 for all conflicts may be possible but likely requires us to have more custom code (as we then need to override all decoders rather than just one, I think).

@Rafferty97
Contributor

@scovich @Blizzara I've noticed this PR has been sitting around for a while. From my perspective, this looks like a useful addition to the codebase. Is there still any interest in moving this along?

@scovich
Contributor Author

scovich commented Mar 5, 2026

@scovich @Blizzara I've noticed this PR has been sitting around for a while. From my perspective, this looks like a useful addition to the codebase. Is there still any interest in moving this along?

This and similar PRs for customizing JSON parsing have stalled on the architectural/API question of whether we should publicly widen the JSON parser interface with more configs, rely on variant for fancy tricks, or just expose the tape decoder and let people do whatever they want. I don't have a good answer to that, though now that we actually have variant support, I suspect that going through variant would only address some of the motivations for customizing a JSON parser.

@Rafferty97
Contributor

@scovich Thanks for providing that added context. Given I've got some outstanding PRs that also expand the capabilities of the JSON parser, and thus the available config options, it would be good to tie it together into one conversation. Where is that discussion being held?

@scovich
Contributor Author

scovich commented Mar 6, 2026

The conversation has been stalled for several months at this point. A very good question when/how we should resume it. Maybe @alamb or @tustvold has ideas?

@alamb
Contributor

alamb commented Mar 11, 2026

Sorry -- I have been overwhelmed for a few weeks on DataFusion releases and other things. I hope to get back to this at some point.

Contributor

@alamb alamb left a comment


Thank you @scovich

I just went through this PR carefully (and had codex help me). I think it looks good

Some things I think we should do before merging:

  1. Update the comments to explain the user facing behavior in more detail (I left some suggestions)
  2. Resolve the merge conflicts

Comment on lines +287 to +291
/// Sets whether the decoder should produce NULL instead of returning an error if it encounters
/// a type conflict on a nullable column (effectively treating it as a non-existent column).
///
/// NOTE: The inferred NULL on type conflict will still produce errors for non-nullable columns,
/// the same as any other NULL or missing value.
Contributor


Can we please define what a "type conflict" means more specifically? Perhaps with an example

I think it means something like:

the JSON decoder encounters a value that cannot be parsed into the specified column type.

For example, if the type is declared to be a nullable [DataType::Int32] but the reader encounters a string value "foo":

  • If with_ignore_type_conflicts is set to false (the default), the reader will return an error.
  • If with_ignore_type_conflicts is set to true, the reader will fill in a NULL value for that array element.
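
The behavior described in this comment can be modeled with a small, std-only sketch. This is not the arrow-json implementation: the `Json` enum and the `decode_int32` function are hypothetical names invented for illustration, and only the flag name `ignore_type_conflicts` comes from the PR. The sketch assumes that a type conflict is turned into an inferred NULL, which is then subject to the usual nullability check, as the doc comment under review states.

```rust
/// Toy stand-in for a parsed JSON scalar (hypothetical; not arrow-json's tape).
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Null,
    Bool(bool),
    Int(i32),
    Str(String),
}

/// Decode one column of values as 32-bit integers (hypothetical helper).
///
/// * `nullable`: whether the column's schema allows NULL.
/// * `ignore_type_conflicts`: models the new reader option.
fn decode_int32(
    values: &[Json],
    nullable: bool,
    ignore_type_conflicts: bool,
) -> Result<Vec<Option<i32>>, String> {
    let mut out = Vec::with_capacity(values.len());
    for (row, value) in values.iter().enumerate() {
        let decoded = match value {
            Json::Int(i) => Some(*i),
            Json::Null => None,
            // Type conflict: the value cannot be parsed as an int32.
            // With the flag set, it becomes an inferred NULL.
            _ if ignore_type_conflicts => None,
            // Default behavior: a hard error.
            other => return Err(format!("row {row}: expected int32, got {other:?}")),
        };
        // An inferred NULL is treated like any other NULL, so it still
        // fails on a non-nullable column.
        if decoded.is_none() && !nullable {
            return Err(format!("row {row}: NULL in non-nullable column"));
        }
        out.push(decoded);
    }
    Ok(out)
}
```

In this model, a nullable column containing `1`, `"foo"`, `null`, `true` fails with the flag off and decodes to `[Some(1), None, None, None]` with the flag on. The same input on a non-nullable column fails either way.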

#[test]
fn test_type_conflict_non_nullable() {
    let fields = [
        Field::new("bool", DataType::Boolean, false),
Contributor


👍

alamb previously approved these changes Mar 18, 2026
Contributor

@alamb alamb left a comment


Thanks again @scovich

@alamb
Contributor

alamb commented Mar 18, 2026

The initial benchmarks show some slowdown: #7276 (comment)

I am rerunning to see if it is reproducible


@alamb
Contributor

alamb commented Mar 18, 2026

The benchmarks appear to show 10-15% slowdown for some queries

decode_list_long_i64_json/131072     1.12    345.3±1.77ms   226.8 MB/sec    1.00    307.0±1.67ms   255.0 MB/sec
decode_list_long_i64_serialize       1.09    211.4±6.92ms        ? ?/sec    1.00    193.8±5.98ms        ? ?/sec
decode_list_short_i64_json/131072    1.10     21.7±0.12ms   240.1 MB/sec    1.00     19.7±0.03ms   264.9 MB/sec
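
A note on reading these rows: the leading factor in each column group appears to be that run's time divided by the faster of the two times, so the faster side shows 1.00. The sketch below checks this against the numbers above. It assumes the first column group is the PR branch and the second is main (the layout used by the later result tables in this thread); `relative_to_fastest` is a hypothetical helper, not part of any benchmark tool.

```rust
/// Ratio of one run's time to the faster of the two runs, rounded to
/// two decimals (hypothetical helper for reading the comparison rows).
fn relative_to_fastest(time_ms: f64, other_ms: f64) -> f64 {
    let fastest = time_ms.min(other_ms);
    (time_ms / fastest * 100.0).round() / 100.0
}
```

For example, `relative_to_fastest(345.3, 307.0)` reproduces the 1.12 shown for decode_list_long_i64_json/131072, which corresponds to roughly a 12% longer run time on the branch side.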

@alamb alamb dismissed their stale review March 18, 2026 21:17

Need to look into potential performance regression

@scovich
Contributor Author

scovich commented Mar 18, 2026

@alamb -- I tweaked the inner loop logic in a way that seems to help considerably on my laptop. Can you re-spin the benchmarks?


@alamb
Contributor

alamb commented Mar 21, 2026

New benchmarks look much better to me (no regressions)

However, the tests are now failing 🤔

@scovich scovich requested a review from alamb March 23, 2026 18:01
@scovich
Contributor Author

scovich commented Mar 23, 2026

New benchmarks look much better to me (no regressions)

🎉

However, the tests are now failing 🤔

🤦 found+fixed an unfaithful bit of the refactoring. I don't think it would impact performance, just match arm wrangling.

Contributor

@alamb alamb left a comment


Looks good to me -- thank you @scovich

@alamb
Contributor

alamb commented Mar 31, 2026

run benchmark json_reader

@alamb
Contributor

alamb commented Mar 31, 2026

run benchmark json_reader

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4165081597-637-snwnl 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing json-ignore-type-conflicts-option (0e0d257) to 3b61796 (merge-base) diff
BENCH_NAME=json_reader
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench json_reader
BENCH_FILTER=
Results will be posted here when complete



@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4165081766-638-w4k2l 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux


Comparing json-ignore-type-conflicts-option (0e0d257) to 3b61796 (merge-base) diff
BENCH_NAME=json_reader
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench json_reader
BENCH_FILTER=
Results will be posted here when complete



@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                                        json-ignore-type-conflicts-option      main
-----                                        ---------------------------------      ----
decode_binary_hex_json                       1.00     13.7±0.04ms        ? ?/sec    1.02     14.0±0.04ms        ? ?/sec
decode_binary_view_hex_json                  1.02     14.1±0.10ms        ? ?/sec    1.00     13.9±0.06ms        ? ?/sec
decode_fixed_binary_hex_json                 1.01     13.8±0.05ms        ? ?/sec    1.00     13.7±0.06ms        ? ?/sec
decode_list_long_i64_json/131072             1.01    310.2±1.39ms   252.4 MB/sec    1.00    306.7±1.81ms   255.3 MB/sec
decode_list_long_i64_serialize               1.02    187.7±4.79ms        ? ?/sec    1.00    184.5±5.72ms        ? ?/sec
decode_list_short_i64_json/131072            1.02     20.3±0.19ms   257.0 MB/sec    1.00     20.0±0.29ms   261.1 MB/sec
decode_list_short_i64_serialize              1.00     11.5±0.17ms        ? ?/sec    1.02     11.8±0.22ms        ? ?/sec
decode_wide_object_i64_json                  1.02    469.6±3.31ms        ? ?/sec    1.00   459.5±16.10ms        ? ?/sec
decode_wide_object_i64_serialize             1.00   422.3±10.78ms        ? ?/sec    1.02   430.9±14.88ms        ? ?/sec
decode_wide_projection_full_json/131072      1.00    764.3±8.26ms   227.7 MB/sec    1.00   765.9±13.19ms   227.2 MB/sec
decode_wide_projection_narrow_json/131072    1.05    453.2±0.56ms   383.9 MB/sec    1.00    430.8±0.77ms   403.9 MB/sec
infer_json_schema/1000                       1.00  1576.6±15.54µs    80.1 MB/sec    1.00  1578.8±17.01µs    80.0 MB/sec
large_bench_primitive                        1.00   1522.8±2.49µs        ? ?/sec    1.01   1533.2±6.02µs        ? ?/sec
small_bench_list                             1.00      7.9±0.02µs        ? ?/sec    1.02      8.0±0.02µs        ? ?/sec
small_bench_primitive                        1.01      4.4±0.01µs        ? ?/sec    1.00      4.4±0.01µs        ? ?/sec
small_bench_primitive_with_utf8view          1.00      4.4±0.01µs        ? ?/sec    1.00      4.4±0.01µs        ? ?/sec

Resource Usage

base (merge-base)

Metric Value
Wall time 304.2s
Peak memory 3.5 GiB
Avg memory 2.9 GiB
CPU user 287.4s
CPU sys 16.6s
Disk read 0 B
Disk write 602.9 MiB

branch

Metric Value
Wall time 307.0s
Peak memory 3.5 GiB
Avg memory 2.9 GiB
CPU user 289.1s
CPU sys 17.8s
Disk read 0 B
Disk write 1.2 MiB


@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                                        json-ignore-type-conflicts-option      main
-----                                        ---------------------------------      ----
decode_binary_hex_json                       1.00     13.6±0.08ms        ? ?/sec    1.02     13.9±0.06ms        ? ?/sec
decode_binary_view_hex_json                  1.03     14.2±0.33ms        ? ?/sec    1.00     13.7±0.06ms        ? ?/sec
decode_fixed_binary_hex_json                 1.02     13.8±0.08ms        ? ?/sec    1.00     13.5±0.06ms        ? ?/sec
decode_list_long_i64_json/131072             1.01    308.8±1.11ms   253.6 MB/sec    1.00    305.3±0.65ms   256.5 MB/sec
decode_list_long_i64_serialize               1.00    186.7±4.71ms        ? ?/sec    1.00    186.1±4.57ms        ? ?/sec
decode_list_short_i64_json/131072            1.03     20.4±0.11ms   256.4 MB/sec    1.00     19.8±0.10ms   264.2 MB/sec
decode_list_short_i64_serialize              1.00     11.5±0.16ms        ? ?/sec    1.06     12.2±0.23ms        ? ?/sec
decode_wide_object_i64_json                  1.03    468.8±3.34ms        ? ?/sec    1.00    456.4±6.73ms        ? ?/sec
decode_wide_object_i64_serialize             1.00   422.4±11.35ms        ? ?/sec    1.03   436.8±15.92ms        ? ?/sec
decode_wide_projection_full_json/131072      1.00    762.4±8.26ms   228.2 MB/sec    1.00   759.8±11.77ms   229.0 MB/sec
decode_wide_projection_narrow_json/131072    1.05    451.5±0.47ms   385.4 MB/sec    1.00    431.3±1.22ms   403.4 MB/sec
infer_json_schema/1000                       1.00   1577.6±9.68µs    80.0 MB/sec    1.00  1583.8±24.64µs    79.7 MB/sec
large_bench_primitive                        1.00   1524.0±2.48µs        ? ?/sec    1.01   1534.1±2.59µs        ? ?/sec
small_bench_list                             1.00      7.9±0.02µs        ? ?/sec    1.01      8.0±0.02µs        ? ?/sec
small_bench_primitive                        1.01      4.4±0.01µs        ? ?/sec    1.00      4.4±0.01µs        ? ?/sec
small_bench_primitive_with_utf8view          1.01      4.4±0.02µs        ? ?/sec    1.00      4.4±0.01µs        ? ?/sec

Resource Usage

base (merge-base)

Metric Value
Wall time 304.4s
Peak memory 3.5 GiB
Avg memory 2.9 GiB
CPU user 286.0s
CPU sys 18.2s
Disk read 0 B
Disk write 622.4 MiB

branch

Metric Value
Wall time 306.3s
Peak memory 3.5 GiB
Avg memory 2.9 GiB
CPU user 288.7s
CPU sys 17.5s
Disk read 0 B
Disk write 1.6 MiB


@alamb
Contributor

alamb commented Mar 31, 2026

👌

@alamb
Contributor

alamb commented Apr 6, 2026

This PR looks like it has some conflicts but otherwise is ready to go

@alamb alamb merged commit 43d984e into apache:main Apr 7, 2026
23 checks passed
@alamb
Contributor

alamb commented Apr 7, 2026

Thank you @scovich


Labels

arrow Changes to the arrow crate


Development

Successfully merging this pull request may close these issues.

Option for JSON parser to return NULL field values on type mismatch

6 participants