Skip to content

Comments

perf: optimize block creation#3093

Merged
julienrbrt merged 14 commits intomainfrom
alex/execute_perf
Feb 20, 2026
Merged

perf: optimize block creation#3093
julienrbrt merged 14 commits intomainfrom
alex/execute_perf

Conversation

@alpe
Copy link
Contributor

@alpe alpe commented Feb 19, 2026

Bring moonshot 2 optimizations to main

Performance Optimizations Analysis (alex/bench_phase1_start vs HEAD)

Summary: Targeted optimizations focused on reducing CPU overhead (marshalling/hashing) and I/O latency (store reads) in the block production hot path.

1. Executor: In-Memory Caching for Block Production

  • Change: Executor now caches lastHeaderHash, lastDataHash, and lastSignature in memory after committing a block.
  • Why: Eliminates the need to call store.GetBlockData and store.GetSignature (database I/O) at the start of CreateBlock. This significantly speeds up the "cold start" of a new block round.

2. Executor: Avoid Re-hashing

  • Change: CreateBlock reuses newState.LastHeaderHash (computed during state transition) instead of calling header.Hash() again before broadcasting.
  • Why: header.Hash() triggers a full Protobuf marshal and SHA-256 hash. Reusing the already-computed hash saves CPU cycles.

3. Executor: Optimized Metrics Collection

  • Change: Replaced data.Size() with data.TxsByteSize() for the BlockSizeBytes metric.
  • Why: data.Size() marshals the entire object to Protobuf to get the size. data.TxsByteSize() simply sums the lengths of the transaction byte slices, avoiding expensive allocations and serialization.

4. Types: Cached Public Key Marshalling

  • Change: Signer struct now caches the marshalled public key bytes (marshalledPubKey).
  • Why: ToProto methods (called frequently during P2P exchange and storage) used to call crypto.MarshalPublicKey every time. Caching this eliminates repeated serialization overhead.

5. Types: Lightweight Sequence Validation

  • Change: Introduced AssertValidSequence in State.
  • Why: Provides a "fast path" validation for self-produced blocks. It checks simple fields (Height, Time, Hash Chaining) but skips the expensive Validate() call, which requires re-marshalling and re-hashing the entire block data to verify the DACommitment.

6. Store/Executor: Pre-serialized writes (SaveBlockDataFromBytes)

  • Change: Added SaveBlockDataFromBytes to the Store interface; signHeader now returns the serialized header bytes.
  • Why: Allows the Executor to write the block to disk using the exact bytes it already generated for signing/broadcasting, avoiding a redundant MarshalBinary call at the storage layer.

7. Protobuf: Zero-Allocation Timestamp Conversion

  • Change: In State.ToProto, manually constructed &timestamppb.Timestamp{...}.
  • Why: Replaces timestamppb.New(time), which performs a heap allocation. Small optimization for a high-frequency operation.

8. Executor: Sequential P2P Broadcast

  • Change: Changed Header and Data broadcast from parallel (errgroup) to sequential (Header then Data).
  • Why: Correctness fix that also aids performance stability. Prevents peers from rejecting Data because they haven't received the Header yet, preventing network churn and retries.
go test -bench=BenchmarkProduceBlock -count=5 -benchmem -run='^$'  ./block/internal/executing/...

This branch:

goos: darwin
goarch: arm64
pkg: github.com/evstack/ev-node/block/internal/executing
cpu: Apple M2 Max
BenchmarkProduceBlock/single_tx-12                 43686             26992 ns/op            7205 B/op         80 allocs/op
BenchmarkProduceBlock/single_tx-12                 45000             27086 ns/op            7305 B/op         80 allocs/op
BenchmarkProduceBlock/single_tx-12                 44564             27181 ns/op            7263 B/op         80 allocs/op
BenchmarkProduceBlock/single_tx-12                 44956             27089 ns/op            7295 B/op         80 allocs/op
BenchmarkProduceBlock/single_tx-12                 44187             27195 ns/op            7234 B/op         80 allocs/op
BenchmarkProduceBlock/100_txs-12                   36262             32719 ns/op           25766 B/op         80 allocs/op
BenchmarkProduceBlock/100_txs-12                   36657             32518 ns/op           25758 B/op         80 allocs/op
BenchmarkProduceBlock/100_txs-12                   37702             32044 ns/op           25739 B/op         80 allocs/op
BenchmarkProduceBlock/100_txs-12                   37425             31974 ns/op           25744 B/op         80 allocs/op
BenchmarkProduceBlock/100_txs-12                   37120             32146 ns/op           25750 B/op         80 allocs/op
BenchmarkProduceBlock/empty_batch-12               46038             26368 ns/op            6992 B/op         70 allocs/op
BenchmarkProduceBlock/empty_batch-12               45804             26304 ns/op            6944 B/op         70 allocs/op
BenchmarkProduceBlock/empty_batch-12               45894             26368 ns/op            6975 B/op         70 allocs/op
BenchmarkProduceBlock/empty_batch-12               46137             26293 ns/op            6986 B/op         70 allocs/op
BenchmarkProduceBlock/empty_batch-12               44925             26837 ns/op            6846 B/op         70 allocs/op
PASS
ok      github.com/evstack/ev-node/block/internal/executing     23.345s

base alex/bench_phase1_start

goos: darwin
goarch: arm64
pkg: github.com/evstack/ev-node/block/internal/executing
cpu: Apple M2 Max
BenchmarkProduceBlock/empty_batch-12               14835             80996 ns/op           11042 B/op        155 allocs/op
BenchmarkProduceBlock/empty_batch-12               14977             80256 ns/op           11031 B/op        154 allocs/op
BenchmarkProduceBlock/empty_batch-12               15104             81055 ns/op           11024 B/op        154 allocs/op
BenchmarkProduceBlock/empty_batch-12               14895             81077 ns/op           11035 B/op        154 allocs/op
BenchmarkProduceBlock/empty_batch-12               14671             80922 ns/op           11048 B/op        154 allocs/op
BenchmarkProduceBlock/single_tx-12                 14695             81520 ns/op           11605 B/op        171 allocs/op
BenchmarkProduceBlock/single_tx-12                 14871             81326 ns/op           11592 B/op        170 allocs/op
BenchmarkProduceBlock/single_tx-12                 14636             81783 ns/op           11605 B/op        170 allocs/op
BenchmarkProduceBlock/single_tx-12                 14800             81486 ns/op           11595 B/op        170 allocs/op
BenchmarkProduceBlock/single_tx-12                 14617             81957 ns/op           11606 B/op        170 allocs/op
BenchmarkProduceBlock/100_txs-12                   12792             93848 ns/op           47152 B/op        277 allocs/op
BenchmarkProduceBlock/100_txs-12                   12963             93155 ns/op           47128 B/op        277 allocs/op
BenchmarkProduceBlock/100_txs-12                   13046             92560 ns/op           47121 B/op        277 allocs/op
BenchmarkProduceBlock/100_txs-12                   12789             93442 ns/op           47141 B/op        277 allocs/op
BenchmarkProduceBlock/100_txs-12                   12891             93534 ns/op           47133 B/op        277 allocs/op
PASS
ok      github.com/evstack/ev-node/block/internal/executing     31.700s

@github-actions
Copy link
Contributor

github-actions bot commented Feb 19, 2026

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

BuildFormatLintBreakingUpdated (UTC)
✅ passed⏩ skipped✅ passed✅ passedFeb 20, 2026, 12:34 PM

@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @alpe, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on significant performance improvements for block execution and storage operations. By introducing caching mechanisms, optimizing data serialization, and refining key generation, the changes aim to reduce CPU and memory overhead, particularly in high-throughput scenarios. The modifications also streamline the block production pipeline by avoiding redundant computations and ensuring correct P2P broadcast ordering, contributing to a more efficient and robust node operation.

Highlights

  • Performance Optimizations: Implemented several performance enhancements across block execution, storage, and serialization paths. This includes reducing memory allocations during key generation and protobuf marshalling, and optimizing data access patterns.
  • Caching and Amortization: Introduced in-memory caching for last block data (header hash, data hash, signature) to avoid redundant store reads during block creation. Pending block checks are now amortized to reduce the frequency of expensive store scans.
  • Streamlined Block Production: Refactored block production logic to use a new SaveBlockDataFromBytes method, allowing pre-serialized data to be saved directly. Validation for self-produced blocks now uses a lightweight AssertValidSequence to skip expensive re-computations.
  • P2P Broadcast Ordering: Ensured sequential broadcasting of block headers before block data to the P2P network, addressing potential validation failures due to out-of-order delivery on peers.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog
  • block/internal/executing/executor.go
    • Removed errgroup import to simplify concurrency model.
    • Added hasPendingBlock, lastHeaderHash, lastDataHash, lastSignature, and pendingCheckCounter fields to Executor for caching and amortized checks.
    • Initialized and warmed the new last-block cache fields in initializeState.
    • Modified ProduceBlock to amortize pending limit checks, reducing store scans.
    • Optimized ProduceBlock to use hasPendingBlock flag to conditionally check for pending blocks in the store.
    • Updated signHeader call in ProduceBlock to retrieve serialized header bytes, avoiding re-marshalling.
    • Replaced blockProducer.ValidateBlock with currentState.AssertValidSequence for lightweight validation of self-produced blocks.
    • Removed deletePendingBlock call and explicitly set hasPendingBlock to false after commit.
    • Updated the in-memory last-block cache after successful block production.
    • Changed P2P broadcasting from parallel errgroup to sequential, ensuring header broadcast before data.
    • Implemented a fast path in CreateBlock to use cached last block data, falling back to store reads if the cache is empty.
    • Optimized ApplyBlock to conditionally allocate rawTxs slice and used a dedicated context for transaction execution.
    • Modified signHeader to return both the signature and the serialized header bytes.
    • Updated recordBlockMetrics to use data.TxsByteSize() for more accurate block size metrics.
  • pkg/store/batch.go
    • Updated ds.NewKey calls to ds.RawKey for direct key usage.
    • Introduced SaveBlockDataFromBytes to save pre-serialized block data, avoiding redundant marshalling.
    • Refactored SaveBlockData to utilize the new SaveBlockDataFromBytes method.
  • pkg/store/keys.go
    • Added heightKey helper function for efficient, allocation-minimal key generation for height-based keys.
    • Updated GetHeaderKey, getDataKey, GetSignatureKey, and getStateAtHeightKey to use the new heightKey function.
    • Simplified getHeightKey to directly return a constant string.
  • pkg/store/kv.go
    • Removed unused path import.
    • Refactored GenerateKey to use strings.Builder for more efficient string concatenation and reduced allocations.
  • pkg/store/tracing.go
    • Added tracing instrumentation for the new SaveBlockDataFromBytes method.
  • pkg/store/tracing_test.go
    • Added a mock implementation for the SaveBlockDataFromBytes method for testing purposes.
  • pkg/store/types.go
    • Added SaveBlockDataFromBytes to the Batch interface definition.
  • types/data.go
    • Added TxsByteSize() method to calculate the total byte size of transactions without protobuf marshalling, for cheaper metrics.
  • types/serialization.go
    • Modified SignedHeader.ToProto and SignedData.ToProto to use a cached marshalled public key from the Signer struct.
    • Updated SignedHeader.FromProto and SignedData.FromProto to populate the cached marshalled public key.
    • Optimized State.ToProto to avoid timestamppb.New allocation by directly constructing the timestamppb.Timestamp.
  • types/signer.go
    • Added marshalledPubKey field to Signer struct to cache marshalled public key bytes.
    • Implemented MarshalledPubKey() method to retrieve the cached public key or marshal it if not present.
    • Updated NewSigner to pre-cache the marshalled public key upon creation.
  • types/state.go
    • Added AssertValidSequence method for lightweight structural validation of self-produced blocks, bypassing expensive DACommitment re-computation.
Activity
  • No specific review comments or activity have been recorded for this pull request yet.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@claude
Copy link
Contributor

claude bot commented Feb 19, 2026

Claude finished @alpe's task in 4m 36s —— View job


Code Review

Tasks

  • Gather context and understand request
  • Read changed files: executor.go, types, store packages
  • Review code quality, correctness, and architecture
  • Review performance and resource efficiency
  • Review testing and reliability
  • Post final review

Great set of optimizations with solid benchmarks backing them up. The ~3x throughput improvement and ~50% allocation reduction are impressive. A few issues need addressing before merge:

Bugs / Correctness

1. Optimization #6 is incomplete — signHeader bytes are discarded [block/internal/executing/executor.go:534]

signHeader was refactored to return (Signature, []byte, error) specifically to allow the caller to pass pre-serialized bytes to SaveBlockDataFromBytes. However the bytes are silently discarded with _ and the old SaveBlockData (which re-marshals) is used:

// line 534
signature, _, err := e.signHeader(&header.Header)
// ...
// line 553
if err := batch.SaveBlockData(header, data, &signature); err != nil {

The optimization is wired halfway — the interface change and signHeader change happened, but the caller was never updated to use SaveBlockDataFromBytes. Either complete the wiring or revert signHeader to return only (Signature, error) to avoid the misleading dead return value.

Note: this only works if AggregatorNodeSignatureBytesProvider produces the same bytes that header.MarshalBinary() would produce. If the signing payload is different (e.g., ev-abci uses a custom provider), then the bytes cannot be reused for storage and this optimization may not be universally applicable. Fix this →

2. lastBlockInfo cache not invalidated on rollback [block/internal/executing/executor.go:601–605]

e.lastBlockInfo.Store(&lastBlockInfo{
    headerHash: newState.LastHeaderHash,
    dataHash:   data.Hash(),
    signature:  signature,
})

If store.Rollback is triggered (e.g., during replay/sync), the in-memory lastBlockInfo and lastState caches would be stale. A subsequent CreateBlock using the cached values would build on a rolled-back block, producing a chain fork. Is there a guarantee that Rollback is never called while the executor is running? If not, this cache needs to be invalidated on rollback or the rollback path needs to update these caches.

Thread Safety

3. Signer.MarshalledPubKey() lazy init without synchronization [types/signer.go:19–32]

func (s *Signer) MarshalledPubKey() ([]byte, error) {
    if len(s.marshalledPubKey) > 0 {
        return s.marshalledPubKey, nil
    }
    // ...
    s.marshalledPubKey = bz  // unsynchronized write
    return bz, nil
}

This is a check-then-act without synchronization. If the same SignedHeader is ever accessed from multiple goroutines (e.g., passed to the P2P layer which may read it concurrently), this is a data race. In the current sequential broadcast path it's likely safe, but it's fragile. Using sync.Once or atomic.Pointer would be safer, or simply ensuring marshalledPubKey is always populated at construction time (as NewSigner does). Note that CreateBlock constructs types.Signer inline without calling NewSigner, so the cache is never pre-populated for locally produced blocks:

// executor.go:738-743
Signer: types.Signer{
    PubKey:  pubKey,        // marshalledPubKey is zero
    Address: e.genesis.ProposerAddress,
},

4. pendingCheckCounter not atomic [block/internal/executing/executor.go:73]

pendingCheckCounter uint64

This field is only accessed from the execution loop goroutine, so it's safe today. But it's a plain field adjacent to atomic.Pointer and atomic.Bool fields, which creates a misleading inconsistency. Consider making it atomic.Uint64 for clarity, or add a comment stating it's only accessed from executionLoop.

Design / Correctness Concerns

5. AssertValidSequence skips DataHash and ValidatorHash [types/state.go:70–94]

The new lightweight validation intentionally skips Validate(header, data) which checks DataHash. This is the right trade-off for self-produced blocks, but ValidatorHash is also not checked. If a validator rotation happened and produced an incorrect ValidatorHash, it would not be caught. The comment says "skip the expensive Validate() / DACommitment() re-computation" but DACommitment checks DataHash, not ValidatorHash. ValidatorHash check is cheap to add and would catch a class of bugs. Worth documenting explicitly which fields are intentionally skipped and why.

6. savePendingBlock never sets hasPendingBlock = true [block/internal/executing/executor.go:514]

if err := e.savePendingBlock(ctx, header, data); err != nil {
    return fmt.Errorf("failed to save block data: %w", err)
}
// hasPendingBlock is still false here

The flag only transitions false → true in initializeState. If the node crashes after savePendingBlock but before batch.Commit, on restart initializeState correctly finds and sets the flag. This is consistent with its purpose as a "crash recovery" indicator. However, the semantics are subtle: hasPendingBlock means "a pending block from a previous crashed cycle exists" not "a pending block is currently being produced". Consider renaming to recoveredPendingBlock or adding a comment to clarify this invariant.

Code Quality

7. signatureHash misleading name [pkg/store/batch.go:66]

signatureHash := *signature

signature is already a types.Signature (i.e., []byte). Naming the copy signatureHash implies it's a hash, but it's the raw signature bytes. This was noted in Gemini's review too. Rename to signatureCopy or sig. Fix this →

8. getHeightKey format inconsistency [pkg/store/keys.go:90–92]

func getHeightKey() string {
    return "/" + heightPrefix
}

All other height-related keys now use the heightKey() helper which produces /prefix/height. But getHeightKey returns /t directly (no height component). This is fine since it's the "current height" key, but the heightPrefix = "t" constant is otherwise only used here, while heightKey() is used for all block-indexed keys. The inconsistency is harmless but the heightPrefix constant is confusingly named given it's only used in getHeightKey.

9. GenerateKey drops path.Clean [pkg/store/kv.go:43–56]

The old implementation presumably used path.Clean. The new strings.Builder version is cleaner and more efficient, but if any caller passes fields with leading/trailing slashes or empty strings, the generated key format changes. All current callers appear to use well-behaved string constants, so this is low risk. A one-line comment noting the inputs are assumed clean would prevent future bugs.

10. timestamppb.Timestamp manual construction [types/serialization.go:377–378]

LastBlockTime: &timestamppb.Timestamp{Seconds: secs, Nanos: nanos},

timestamppb.New() validates that Nanos is in [0, 999999999]. Go's time.Nanosecond() is always in that range, so this is safe. Add a brief comment to make this invariant explicit, since a future refactor replacing time.Nanosecond() with something else could silently produce an invalid protobuf.

Performance (Minor)

11. Sequential broadcast adds latency to block cycle [block/internal/executing/executor.go:607–620]

The correctness argument for sequential broadcast is sound. But this adds the latency of header broadcast to the block production cycle, potentially increasing block times under slow P2P conditions. Consider whether the broadcast should happen in a background goroutine after the batch is committed (with the header-before-data ordering preserved). The current approach is the safest and likely acceptable, but worth profiling under network load.

Testing

12. No test for AssertValidSequence [types/state.go:70]

The new lightweight validation method has no dedicated unit tests. Given it's now on the critical path for every produced block (replacing ValidateBlock), it deserves table-driven tests covering: wrong chain ID, wrong height, time before last block, mismatched last header hash, mismatched app hash, and initial state (empty LastHeaderHash).

13. SaveBlockDataFromBytes has no test [pkg/store/batch.go:62]

The tracing wrapper and mock exist, but there's no test verifying that SaveBlockDataFromBytes produces the same stored result as SaveBlockData with the same inputs. A round-trip test would catch any hash computation differences.

Summary

The performance results are compelling and the architecture is sound. The most actionable items before merge:

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

The pull request introduces several performance optimizations aimed at reducing redundant store reads and protobuf marshaling/unmarshaling. Key changes include caching lastHeaderHash, lastDataHash, and lastSignature in the Executor struct, amortizing pending limit checks, and introducing SaveBlockDataFromBytes to avoid re-serialization. Additionally, TxsByteSize is added for more efficient metrics reporting, and MarshalledPubKey caches public key serialization. The signHeader function now returns serialized header bytes to prevent redundant marshaling. Overall, these changes should lead to a more efficient block production pipeline.

"github.com/libp2p/go-libp2p/core/crypto"
"github.com/rs/zerolog"
"golang.org/x/sync/errgroup"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The golang.org/x/sync/errgroup import was removed, but the corresponding usage in ProduceBlock (lines 548-558 in the old code) was replaced with sequential broadcasting. This is a functional change that might impact the overall block production time if broadcasting was previously done concurrently. While the comment on line 598 explains the sequential broadcast, it's worth noting the performance implication of this change.

// For based sequencer, this will return an empty signature
signature, err := e.signHeader(header.Header)
// For based sequencer, this will return an empty signature.
signature, _, err := e.signHeader(&header.Header)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The signHeader function now returns three values, but only two are being captured here. The headerBytes (second return value) is not being used, which could be passed to batch.SaveBlockDataFromBytes to avoid re-marshalling the header. This would further optimize the block production process.

Suggested change
signature, _, err := e.signHeader(&header.Header)
signature, headerBytes, err := e.signHeader(&header.Header)

height := header.Height()
signatureHash := *signature

headerBlob, err := header.MarshalBinary()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The height variable is not used in this function after the refactoring. It can be removed.

signatureHash := *signature

headerBlob, err := header.MarshalBinary()
if err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The signatureHash variable is not used in this function after the refactoring. It can be removed.

// SaveBlockDataFromBytes saves pre-serialized block data to the batch.
// This avoids re-marshalling header and data when the caller already has the binary blobs.
func (b *DefaultBatch) SaveBlockDataFromBytes(header *types.SignedHeader, headerBlob, dataBlob []byte, signature *types.Signature) error {
height := header.Height()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The height variable is already available from header.Height() and signatureHash from *signature. These lines are redundant and can be removed.

Suggested change
height := header.Height()
height := header.Height()
signatureHash := *signature

@@ -39,29 +39,40 @@ const (
heightPrefix = "t"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The heightPrefix constant is defined but not used in the new heightKey function. It seems like the intention was to use it as the prefix, but it's hardcoded as 't' in the constant definition and then 'h', 'd', 'c', 's' in the heightKey calls. This could lead to confusion or incorrect key generation.

Comment on lines 90 to +91
func getHeightKey() string {
return GenerateKey([]string{heightPrefix})
return "/" + heightPrefix
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The getHeightKey function now returns a hardcoded string "/t". This is inconsistent with the heightKey function which dynamically generates keys based on a prefix and height. If heightPrefix is intended to be used here, it should be passed to heightKey or directly used in the string concatenation.

Suggested change
func getHeightKey() string {
return GenerateKey([]string{heightPrefix})
return "/" + heightPrefix
func getHeightKey() string {
return heightKey(heightPrefix, 0) // Or whatever default height is appropriate
}

Comment on lines 43 to +55
func GenerateKey(fields []string) string {
key := "/" + strings.Join(fields, "/")
return path.Clean(key)
// Pre-calculate total size to avoid re-allocation.
n := 0
for _, f := range fields {
n += 1 + len(f) // '/' + field
}
var b strings.Builder
b.Grow(n)
for _, f := range fields {
b.WriteByte('/')
b.WriteString(f)
}
return b.String()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The GenerateKey function was optimized to use strings.Builder for performance. However, the path.Clean call was removed. While strings.Builder is more efficient for concatenation, path.Clean ensures that the generated key is always in a canonical form (e.g., handling double slashes or ./.. components). Removing it might lead to non-canonical keys if the input fields are not perfectly clean, which could cause issues with datastore lookups.

Base automatically changed from alex/bench_phase1_start to main February 19, 2026 16:20
@alpe alpe changed the title Alex/execute perf perf: optimize block creation Feb 19, 2026
Copy link
Member

@julienrbrt julienrbrt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice! mostly lgtm! i think we can delete some old code and delete some struct cache to put them in cached_store instead (better for readability and consistency)


// Cached per-block data to avoid store reads + protobuf deserialization
// in CreateBlock. Updated after each successful block production.
lastHeaderHash types.Hash
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still believe those aren't necessary, because GetBlockData is cached on the second call.
We may simply want to add GetSignature to the pkg/store/cached_store.go for consistency.

Copy link
Contributor Author

@alpe alpe Feb 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I replaced the code to use GetBlockData and ran benchmarks. It is slower with GetBlockData (using a memory DB).

This branch:

BenchmarkProduceBlock/empty_batch-12               43850             26704 ns/op            6758 B/op         70 allocs/op
BenchmarkProduceBlock/empty_batch-12               45126             26834 ns/op            6854 B/op         70 allocs/op
BenchmarkProduceBlock/empty_batch-12               47043             27658 ns/op            7074 B/op         70 allocs/op
BenchmarkProduceBlock/empty_batch-12               45932             25491 ns/op            6985 B/op         70 allocs/op
BenchmarkProduceBlock/empty_batch-12               46345             26730 ns/op            7016 B/op         70 allocs/op
BenchmarkProduceBlock/single_tx-12                 44605             27264 ns/op            7258 B/op         80 allocs/op
BenchmarkProduceBlock/single_tx-12                 44408             26924 ns/op            7252 B/op         80 allocs/op
BenchmarkProduceBlock/single_tx-12                 44490             27180 ns/op            7253 B/op         80 allocs/op
BenchmarkProduceBlock/single_tx-12                 44314             27036 ns/op            7234 B/op         80 allocs/op
BenchmarkProduceBlock/single_tx-12                 45469             27158 ns/op            7367 B/op         80 allocs/op
BenchmarkProduceBlock/100_txs-12                   36204             32863 ns/op           25767 B/op         80 allocs/op
BenchmarkProduceBlock/100_txs-12                   36198             32815 ns/op           25767 B/op         80 allocs/op
BenchmarkProduceBlock/100_txs-12                   37375             32799 ns/op           25745 B/op         80 allocs/op
BenchmarkProduceBlock/100_txs-12                   36219             32371 ns/op           25767 B/op         80 allocs/op
BenchmarkProduceBlock/100_txs-12                   36669             32604 ns/op           25758 B/op         80 allocs/op
PASS
ok      github.com/evstack/ev-node/block/internal/executing     23.490s

With GetBlockData

BenchmarkProduceBlock/empty_batch-12               41150             28790 ns/op            8979 B/op        107 allocs/op
BenchmarkProduceBlock/empty_batch-12               42402             28799 ns/op            8966 B/op        107 allocs/op
BenchmarkProduceBlock/empty_batch-12               41312             29003 ns/op            8977 B/op        107 allocs/op
BenchmarkProduceBlock/empty_batch-12               42162             29241 ns/op            8967 B/op        107 allocs/op
BenchmarkProduceBlock/empty_batch-12               41576             28934 ns/op            8973 B/op        107 allocs/op
BenchmarkProduceBlock/single_tx-12                 41461             29483 ns/op            9471 B/op        120 allocs/op
BenchmarkProduceBlock/single_tx-12                 40598             29281 ns/op            9484 B/op        120 allocs/op
BenchmarkProduceBlock/single_tx-12                 41346             29488 ns/op            9472 B/op        120 allocs/op
BenchmarkProduceBlock/single_tx-12                 40634             28958 ns/op            9483 B/op        120 allocs/op
BenchmarkProduceBlock/single_tx-12                 38173             29275 ns/op            9523 B/op        120 allocs/op
BenchmarkProduceBlock/100_txs-12                   30086             39664 ns/op           39148 B/op        226 allocs/op
BenchmarkProduceBlock/100_txs-12                   30886             39006 ns/op           39126 B/op        226 allocs/op
BenchmarkProduceBlock/100_txs-12                   30547             39378 ns/op           39135 B/op        226 allocs/op
BenchmarkProduceBlock/100_txs-12                   31329             38776 ns/op           39115 B/op        226 allocs/op
BenchmarkProduceBlock/100_txs-12                   31376             38786 ns/op           39113 B/op        226 allocs/op
PASS
ok      github.com/evstack/ev-node/block/internal/executing     23.685s

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, interesting. Have you made sure to call store.NewCachedStore on that memdb store?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EDIT: Checked with store.NewCachedStore and it is slightly slower still, a field access will probably always be faster 👍🏾

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, let me try again.

lastSignature = e.lastSignature
} else {
// Cold start fallback: read from store
lastHeader, lastData, err := e.store.GetBlockData(ctx, height-1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

technically not true, without those cached struct, it would have read from cache anyway

@codecov
Copy link

codecov bot commented Feb 20, 2026

Codecov Report

❌ Patch coverage is 67.41573% with 58 lines in your changes missing coverage. Please review.
✅ Project coverage is 60.91%. Comparing base (dc473e5) to head (985e9f9).
⚠️ Report is 5 commits behind head on main.

Files with missing lines Patch % Lines
block/internal/executing/executor.go 73.86% 17 Missing and 6 partials ⚠️
pkg/store/tracing.go 0.00% 11 Missing ⚠️
types/signer.go 62.50% 3 Missing and 3 partials ⚠️
types/data.go 0.00% 5 Missing ⚠️
types/serialization.go 73.33% 4 Missing ⚠️
block/internal/executing/pending.go 0.00% 3 Missing ⚠️
pkg/store/batch.go 70.00% 0 Missing and 3 partials ⚠️
pkg/store/keys.go 80.00% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3093      +/-   ##
==========================================
- Coverage   61.18%   60.91%   -0.27%     
==========================================
  Files         113      113              
  Lines       11444    11617     +173     
==========================================
+ Hits         7002     7077      +75     
- Misses       3654     3740      +86     
- Partials      788      800      +12     
Flag Coverage Δ
combined 60.91% <67.41%> (-0.27%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@alpe alpe marked this pull request as ready for review February 20, 2026 14:02
@alpe alpe requested a review from julienrbrt February 20, 2026 14:02
Copy link
Member

@julienrbrt julienrbrt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏🏾

@julienrbrt julienrbrt merged commit ce18484 into main Feb 20, 2026
35 checks passed
@julienrbrt julienrbrt deleted the alex/execute_perf branch February 20, 2026 14:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants