{% hint style="info" %}
Substreams builds upon Firehose.
Keep track of Firehose releases and Data model updates in the Firehose documentation.
{% endhint %}
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Fix `Sinker.requestActiveStartBlock` not being set when the handler implements `SinkerSessionInitHandler`, which previously caused `ProgressMessageLastContiguousBlock` to be incorrect for production-mode mapper stages.
- Server: the `tier1` forkable hub now logs under the `tier1` logger instead of the generic `bstream` package logger, so `processing block` (and related hub) log lines are correctly attributed to the component (requires `bstreamhub.WithLogger`).
- Server: per-block execution timeouts (`--substreams-block-execution-timeout`) are no longer silently swallowed when a WASM host-function panic (e.g. in wasmtime) coincides with the deadline. Previously, `recoverExecutionPanic` would return `nil` instead of `CodeDeadlineExceeded`, causing the offending block to be skipped and the stream to complete successfully.
- CI: Docker image login, build and push are now skipped for fork PRs; the image is still built (without push) to validate the Dockerfile.
- added more metrics to identify time spent squashing
- Server: the tier1 job scheduler no longer slows down on very large reprocessings (hundreds of thousands of segments). Both `NextJob` and `AllStoresCompleted` used to rescan the whole completed-segment prefix on every scheduling event, making job selection O(segments²) over a run; they now advance a forward-only cursor and are O(1) amortized.
- Server: `UpdateStats` (progress reporting) now builds each stage's ranges in a single sort-free pass instead of one map+sort per stage every second.
- Server: removed per-message overhead in the scheduler event loop: the debug-state env var is read once at startup instead of on every message, and the per-message debug log no longer builds its fields when debug logging is disabled.
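The forward-only cursor idea can be sketched in Go as follows. This is a simplified, hypothetical model (`segmentCursor` and its methods are illustrative names, not Substreams APIs) that only tracks which segments have completed, assuming segments are identified by consecutive indexes starting at 0.

```go
package main

// segmentCursor is a hypothetical sketch of forward-only cursor bookkeeping
// over segment completion; it is not the actual Substreams scheduler.
type segmentCursor struct {
	completed []bool // completed[i] becomes true once segment i has finished
	next      int    // every segment below next is known to be completed
}

func newSegmentCursor(segments int) *segmentCursor {
	return &segmentCursor{completed: make([]bool, segments)}
}

// markCompleted records that segment i has finished (segments may finish in any order).
func (c *segmentCursor) markCompleted(i int) {
	c.completed[i] = true
}

// advance moves the cursor over the completed prefix. Because it never moves
// backwards, all calls over a run take at most len(completed) steps in total,
// i.e. O(1) amortized per call instead of a rescan from segment 0 each time.
func (c *segmentCursor) advance() {
	for c.next < len(c.completed) && c.completed[c.next] {
		c.next++
	}
}

// firstIncomplete returns the lowest segment that has not finished, or -1.
func (c *segmentCursor) firstIncomplete() int {
	c.advance()
	if c.next == len(c.completed) {
		return -1
	}
	return c.next
}

// allCompleted reports whether every segment has finished.
func (c *segmentCursor) allCompleted() bool {
	c.advance()
	return c.next == len(c.completed)
}
```

The key design point is that `next` only ever increases, so the total scanning work across a run is bounded by the number of segments rather than by segments times scheduling events.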
- Server: the cached-output streaming buffer now appends and checks for flushing under a single lock per block.
- Index optimisation: optimized `ClockDistributor` to skip blocks earlier and faster when using a block filter.
- Fix server-side bug that would cause a Blocks request to fail after a few retries with `load full store (...) load store stream: opening file for streaming: not found` when depending on a store that is being merged slowly.
- add 'substreams_tier1_active_requests_hard_limit' and 'substreams_tier2_max_concurrent_requests' attributes to prometheus metrics (constant, reflects the configuration)
- Fix server-side bug that would prevent forkableHub from correctly updating metrics when receiving partial or out-of-order blocks
- Add optional 'secret key' authentication between tier1 and tier2 services.
- Fixed a case where a nil pointer exception could happen on storage error(s).
- Substreams client library will no longer forcefully strip authentication from plaintext connections.
- Substreams tier1 will now retry failed tier2 jobs that stream data directly if they have not produced any data yet.
- Substreams worker jobs will always retry tier2 jobs that return `codes.Unavailable: no healthy upstream`, treating this error as the load-balancer equivalent of `codes.ResourceExhausted`.
- Substreams will reject requests with stores when the expected total memory usage from stores alone is above 45% (down from the previous 75%).
You can set `SUBSTREAMS_TOTAL_STORE_SIZE_LIMIT_PERCENT=75` to go back to the previous behavior, or `SUBSTREAMS_ENFORCE_TOTAL_STORE_SIZE_LIMIT=false` to disable the feature.
- Fix `InferOutputModuleFromPackage` sentinel value causing an error when used with manifests containing a `networks:` section. The sentinel value was being passed to the manifest reader before resolution, causing a `could not find module @!##InferOutputModuleFromSpkg##!@ in graph` error.
- Fix V4 buffer flushing on end-of-file: the buffer is now properly flushed when the last block in a segment is reached
- Fix error handling for `InvalidArgument` and other validation errors: they were shown as `Unknown`.
- Fix partial blocks output in V4: prevent `NewPartial` and `UndoPartial` cursor steps from being exposed to clients (normalized to `New`/`Undo` respectively).
- Fix partial block data to be correctly wrapped in `BlockScopedDatas` on V4.
- Last partial blocks are now accepted interchangeably with `new` blocks and vice versa, allowing faster full blocks for requests that do not ask for partial blocks.
- Improved the `substreams auth` command to better guide new users by showing the registration link alongside the authentication link. The command now also accepts API keys directly and automatically exchanges them for JWT tokens.
- Added support for buf.build commit references in descriptor set module versions. In addition to semantic versions (e.g., `v1.2.3`), you can now use buf.build commit hashes (32 lowercase hex characters, e.g., `@f3ab5976b9ba4f9bac28b26271fca7d7`). Commit references are also cached since they are immutable.
- Fixed `protogen` hash caching behavior when descriptor sets don't have a pinned version. Previously, when using descriptor sets without a version (resolving to "latest"), the `.last_generated_hash` file would cache incorrectly and skip regeneration even when the remote content had changed. Now:
  - Descriptor sets without a deterministic version (semver or commit ref) will always trigger regeneration
  - A warning is emitted listing which descriptor sets need pinned versions
  - The hash file is removed when non-deterministic descriptor sets are present to prevent stale caches
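Deciding whether a descriptor set version is "deterministic" amounts to a string classification. The Go sketch below is a hypothetical illustration, not the Substreams implementation: it assumes semantic versions are `v`-prefixed (as in `v1.2.3`), that the `@` separator has already been stripped, and that anything else (empty or `latest`) counts as unpinned.

```go
package main

import "regexp"

var (
	// v-prefixed semantic version, optionally with pre-release/build suffixes (assumption).
	semverRe = regexp.MustCompile(`^v\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$`)
	// buf.build commit reference: exactly 32 lowercase hex characters.
	commitRe = regexp.MustCompile(`^[0-9a-f]{32}$`)
)

type versionKind int

const (
	versionUnpinned versionKind = iota // empty or "latest": remote content may change
	versionSemver
	versionCommit
)

// classifyVersion reports which kind of version string v is.
func classifyVersion(v string) versionKind {
	switch {
	case semverRe.MatchString(v):
		return versionSemver
	case commitRe.MatchString(v):
		return versionCommit
	default:
		return versionUnpinned
	}
}

// isDeterministic reports whether a version can be safely cached.
func isDeterministic(v string) bool {
	return classifyVersion(v) != versionUnpinned
}
```

Under this sketch, only `versionUnpinned` entries would trigger regeneration and the pinned-version warning.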
- Improved 'partial blocks': support new pbbstream's "LastPartial" field, fix 'undo' scenarios for stores
This release introduces significant performance optimizations to the Substreams gRPC communication layer. See RPC Protocol Reference for details.
- RPC V4 protocol with `BlockScopedDatas` batching: the new V4 protocol batches multiple `BlockScopedData` messages into a single `BlockScopedDatas` response, reducing gRPC round-trips and message framing overhead during backfill.
- S2 compression (new default): S2 compression replaces gzip as the default compression algorithm. S2 provides ~3-5x faster compression/decompression than gzip with comparable compression ratios. The client automatically negotiates compression with the server.
- VTProtobuf fast serialization: both client and server now use vtprotobuf for protobuf marshaling/unmarshaling, providing ~2-3x faster serialization with reduced memory allocations.
- Server-side message buffering: configurable via the `OutputBufferSize` flag (default: 100 blocks) or the `MESSAGE_BUFFER_MAX_DATA_SIZE` environment variable (default: 10MB).
- Automatic protocol fallback: clients gracefully fall back V4 → V3 → V2 when connecting to older servers.
- Improved Connect/gRPC protocol selection: the server now routes requests to the appropriate handler based on content-type, improving performance by ~15% for pure gRPC clients (previously all requests went through the Connect RPC layer).
- Updated sink library to leverage the RPC V4 protocol with `BlockScopedDatas` batching, improving throughput by reducing per-message processing overhead.
- Fix issue where a retry on dstore while writing a fullKV would corrupt the file, making it unreadable. Fix prevents this and also now deletes affected files when they are detected.
- Fix bug in event loop where `loop.NewQuitMsg()` (which returns a `*QuitMsg` pointer) was not being handled, causing quit messages from error paths to be silently ignored and requests to hang indefinitely.
- Fix issue where transient HTTP/2 stream errors (e.g., `INTERNAL_ERROR`) from `dstore` were being treated as fatal errors instead of being retried. These transient network errors are now detected and retried with exponential backoff. Added comprehensive diagnostic logging, including working duration tracking and ERROR-level alerts when the walker is stuck for more than 5 minutes.
- Fix `progress_running_jobs` not resetting its counter when reaching 0 jobs running on the server.
- Added bucketed Prometheus metrics `head_block_relative_time_sum{app=substreams_output}` that show the latency between outputting live blocks and their block time.
- Fixed underflow in 'FailedPrecondition desc = request needs to process a total of x blocks' error when running from 'substreams run' with a start-block in the future.
- Fix parsing of params with modules derived with `use:`: now allows reusing the same modules with different params.
- Fix `substreams pack` for a substreams.yaml where no modules are defined, but a SinkConfig points to an imported module (force modules inclusion).
- The `substreams init` command will now list generators in the server's specified order (instead of displaying them randomly).
- The `substreams init` command now has proper error handling when a file cannot be uploaded to the server.
- Added `substreams sink protojson` (migrated from https://github.com/streamingfast/substreams-sink-files).
- Fix issue where "live backfiller" would not create segments after reconnecting with a cursor starting from a previous quicksave, causing delays in future reconnection
- Prevent "panic" when log messages are too large: instead, they will be truncated with a 'some logs were truncated' message.
- Raise max individual log message size from 128k to 512k
- Raise max log message size for a full block from 128k to 5MiB
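Truncating instead of panicking can be illustrated with a small collector that enforces both caps. The Go sketch below is hypothetical: it reads "512k" and "5MiB" as binary units, and appending the `some logs were truncated` notice once at the end is an assumption about where that message appears.

```go
package main

// Limits from the entries above, read as binary units (an assumption).
const (
	maxLogMessageBytes = 512 * 1024      // per individual log message
	maxBlockLogBytes   = 5 * 1024 * 1024 // all log messages of one block
	truncatedNotice    = "some logs were truncated"
)

// blockLogs is a hypothetical collector that truncates instead of panicking.
type blockLogs struct {
	lines     []string
	size      int
	truncated bool
}

// add appends msg, cutting it to fit both the per-message and the per-block
// limit. Byte-based cuts may split a multi-byte UTF-8 character.
func (b *blockLogs) add(msg string, maxMsg, maxBlock int) {
	if len(msg) > maxMsg {
		msg = msg[:maxMsg]
		b.truncated = true
	}
	if remaining := maxBlock - b.size; len(msg) > remaining {
		b.truncated = true
		if remaining <= 0 {
			return // block budget exhausted: drop the message entirely
		}
		msg = msg[:remaining]
	}
	b.lines = append(b.lines, msg)
	b.size += len(msg)
}

// result returns the collected lines, followed by a notice if anything was cut.
func (b *blockLogs) result() []string {
	if b.truncated {
		return append(b.lines, truncatedNotice)
	}
	return b.lines
}
```

A caller would use `b.add(msg, maxLogMessageBytes, maxBlockLogBytes)`; the limits are parameters so the behavior can be checked with small values.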
- Reduce log level from Warn to Debug when we fail to get or set the store size (for backends that don't support it)
- Added `--bytes-encoding` flag to `run` and `gui`, accepted values: ['', 'hex', 'base58', 'base64', 'string'] (default: '' still auto-detects from network).
- Removed PartialsData message and brought back this data inside the good old BlockScopedData
- Added the following fields to `BlockScopedData`:
  - `bool is_partial` to indicate if this block is a partial block. The following two fields are only present when `is_partial == true`.
  - `optional bool is_last_partial` to indicate if this is the last partial of a given block (with the correct block hash).
  - `optional uint32 partial_index` to indicate the index of this partial block within the full block.
- Renamed the `--partial_blocks_only` flag to `partial_blocks` on the substreams Blocks request.
- Removed the `--include_partial_blocks` flag from the substreams Blocks request.
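A client receiving these fields typically needs to know whether a message carries a block's final data. The Go sketch below uses a hand-written, hypothetical `blockScopedData` struct (not the generated protobuf type), mapping the proto3 `optional` fields to pointers as Go protobuf code generally does.

```go
package main

import "fmt"

// blockScopedData is a hand-written stand-in for the fields described above,
// not the generated protobuf type. Optional proto3 fields map to pointers.
type blockScopedData struct {
	IsPartial     bool
	IsLastPartial *bool   // only set when IsPartial is true
	PartialIndex  *uint32 // only set when IsPartial is true
}

// completesBlock reports whether this message carries the block's final data:
// either a full block, or the last partial of a block.
func completesBlock(d blockScopedData) bool {
	if !d.IsPartial {
		return true
	}
	return d.IsLastPartial != nil && *d.IsLastPartial
}

// describe renders a short label for logging.
func describe(d blockScopedData) string {
	if !d.IsPartial {
		return "full block"
	}
	idx := "?"
	if d.PartialIndex != nil {
		idx = fmt.Sprint(*d.PartialIndex)
	}
	if completesBlock(d) {
		return "last partial #" + idx
	}
	return "partial #" + idx
}
```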
- Added experimental support for partial blocks (ex: Base's Flash Blocks), only supported on the https://base-mainnet-flash.streamingfast.io endpoint.
  - See more details in the documentation.
- The `run` and `sink webhook` commands now support these flags:
  - `--include-partial-blocks`: sends every block as partial(s) but also as a real block.
  - `--partial-blocks-only`: only sends partials (for every block, every bit of data should be there).
- To use the new partial blocks in a sink that uses github.com/streamingfast/substreams/sink, simply:
  - Define your sink flags with `sink.FlagIncludePartialBlocks` and/or `sink.FlagPartialBlocksOnly` under `FlagIncludeOptional()`.
  - Implement the function `HandlePartialBlockData(...)` and pass it to `NewSinkerFullHandlersWithPartial(...)` when creating the sinker.
- Server accepts new `include_partial_blocks` and `partial_blocks_only` boolean params in the request body.
- Responses, when requested with the above params, now include a new message `PartialBlockData`, containing the usual "map module output", the clock and the index of that partial.
- Server accepts the environment variable `SUBSTREAMS_BIGGEST_PARTIAL_BLOCK_INDEX` (default 10): it will emit a partial with this index when it gets the final part of a block.
- The `--endpoint` flag on various commands (`substreams run`/`gui`/`sink`) now accepts a short identifier, which must match an entry of The Graph Network Registry; for example, `--endpoint=solana` can now be used directly.
- Added validation in the tier1 service for WASM modules that import `eth_call` or `eth_get_balance`. When the environment variable `X-substreams-acknowledge-non-deterministic` is set, requests using such modules must include the header `X-substreams-acknowledge-non-deterministic` set to `true` to acknowledge the non-deterministic nature of external calls.
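The acknowledgement check can be sketched as a scan of the module's imports. This Go sketch is hypothetical: `checkAcknowledgement` is an illustrative name, `required` stands for the server-side setting being enabled, and requiring the header value to be exactly `true` is an assumption.

```go
package main

import "fmt"

// nonDeterministicImports lists the host functions named in the entry above.
var nonDeterministicImports = map[string]bool{
	"eth_call":        true,
	"eth_get_balance": true,
}

// checkAcknowledgement rejects a request when the requirement is enabled, the
// module imports one of the functions above, and the acknowledgement header is
// not exactly "true".
func checkAcknowledgement(required bool, moduleImports []string, ackHeader string) error {
	if !required || ackHeader == "true" {
		return nil
	}
	for _, name := range moduleImports {
		if nonDeterministicImports[name] {
			return fmt.Errorf("module imports non-deterministic %q: set header X-substreams-acknowledge-non-deterministic to true", name)
		}
	}
	return nil
}
```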
- [Server] Fix regression from v1.17.3 where "store stages" would not always correctly get scheduled for backfilling, resulting in `get size of store "...": opening file: not found`.
- substreams GUI: fix setting (only) the start block with a relative value: it will now change the default stop block from +1000 to 0 instead of returning the error `relative end block is supported only with an absolute start block`.
- Fix handling of relative stop block (regression) in GUI, RUN and other sinks.
- Fix GUI handling of substreams that are not built yet.
- Fix running from manifests that were created with `tools unpack` and contain imported modules, preventing a "failed to get entrypoint" error.
- Added opt-in memory limits related to loading FullKV stores, gated by environment variables:
  - `SUBSTREAMS_STORE_SIZE_LIMIT_PER_REQUEST` (default allows 5GiB: `5368709120`): limits the size of all loaded stores for a single request, in bytes. Set to a numeric value in bytes.
  - `SUBSTREAMS_ENFORCE_STORE_SIZE_LIMIT_PER_REQUEST` (default: false): if set to `true`, enforce the limit above instead of just logging a warning.
  - `SUBSTREAMS_TOTAL_STORE_SIZE_LIMIT_PERCENT` (default: 75): limits the in-memory size of all stores loaded concurrently on the instance, as a percentage of usable memory (cgroup or system total, regardless of free or available memory).
  - `SUBSTREAMS_ENFORCE_TOTAL_STORE_SIZE_LIMIT` (default: false): if set to `true`, enforce the limit above instead of just logging a warning.
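The interplay between each limit and its "enforce" switch can be sketched as follows. This Go code is a hypothetical illustration: the struct merely mirrors the environment variables, and whether the instance-wide total already includes the current request's stores is an assumption left to the caller.

```go
package main

import "fmt"

// storeLimits mirrors the environment variables above (defaults in comments).
type storeLimits struct {
	PerRequestBytes   uint64 // SUBSTREAMS_STORE_SIZE_LIMIT_PER_REQUEST (5368709120)
	EnforcePerRequest bool   // SUBSTREAMS_ENFORCE_STORE_SIZE_LIMIT_PER_REQUEST (false)
	TotalPercent      uint64 // SUBSTREAMS_TOTAL_STORE_SIZE_LIMIT_PERCENT (75)
	EnforceTotal      bool   // SUBSTREAMS_ENFORCE_TOTAL_STORE_SIZE_LIMIT (false)
}

func defaultStoreLimits() storeLimits {
	return storeLimits{PerRequestBytes: 5 << 30, TotalPercent: 75}
}

// check compares requestBytes (stores loaded by one request) and instanceBytes
// (stores loaded on the whole instance) against the limits, where usableMemory
// is the cgroup or system total. Unenforced limits only produce warnings.
func (l storeLimits) check(requestBytes, instanceBytes, usableMemory uint64) (warnings []string, err error) {
	if requestBytes > l.PerRequestBytes {
		msg := fmt.Sprintf("request stores need %d bytes, per-request limit is %d", requestBytes, l.PerRequestBytes)
		if l.EnforcePerRequest {
			return warnings, fmt.Errorf("%s", msg)
		}
		warnings = append(warnings, msg)
	}
	totalLimit := usableMemory * l.TotalPercent / 100
	if instanceBytes > totalLimit {
		msg := fmt.Sprintf("loaded stores use %d bytes, instance limit is %d (%d%% of %d)", instanceBytes, totalLimit, l.TotalPercent, usableMemory)
		if l.EnforceTotal {
			return warnings, fmt.Errorf("%s", msg)
		}
		warnings = append(warnings, msg)
	}
	return warnings, nil
}
```

For example, with 8GiB of usable memory and the default 75%, the instance-wide limit works out to 6GiB.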
- Fixed an edge case where substreams with modules depending on stores that start in the future would fail and incorrectly report an error about "tier2 version being incompatible".
- Reduced memory usage associated with reading and writing large stores by streaming the marshalling process (~45% reduction in peak usage).
- New environment variable `SUBSTREAMS_TIER1_DEBUG_API_ADDR` now enables the debug API on the tier1 service.
- Renamed environment variable `SUBSTREAMS_DEBUG_API_ADDR` to `SUBSTREAMS_TIER2_DEBUG_API_ADDR`, since it only affected tier2.
- Improved `substreams init` to expand `~` and environment variables when resolving local file paths.
- Improved `substreams init` error reporting when a local file cannot be uploaded/read correctly.
- Improved `substreams init` rendering of list items and some other elements.
- Fixed `substreams init` to correctly show the selected label when selecting from a list of items.
- Fixed `substreams protogen` when params are defined on networks for imported modules that are not dependencies of any local module.
- Fix a panic (nil pointer) when skipping blocks via indexes on stores on tier2
- Fix egress bytes calculation when running in noop or dev mode with specified output debug modules
- Reduced memory usage while reading or writing large stores
- Fixed the `substreams publish` alias command not being fully aligned with the `substreams registry publish` "main" version.
- This new endpoint removes the need for complex "mangling" of the package on the client side.
- Instead of expecting `sf.substreams.v1.Modules` (with the client having to apply parameters, network, etc.), the `sf.substreams.rpc.v3.Request` now expects the following, which are all applied to the package server-side:
  - a `sf.substreams.v1.Package`,
  - a `map<string, string>` of `params`,
  - the `network` string.
- It returns the same object as the v2 endpoint, i.e. a stream of `sf.substreams.rpc.v2.Response`.
- Watch for releases for firehose-core, firehose-ethereum, etc. to include this new endpoint.
- It is added on top of the existing 'v2' endpoint, both being active at the same time.
- To enable it, operators will simply need to ensure that their routing allows the `/sf.substreams.rpc.v3.Stream/*` path.
- Cached spkg on the server will now contain protobuf definitions, simplifying debugging of user requests.
- Emitted metrics for requests can now be `sf.substreams.rpc.v3/Blocks` instead of always `sf.substreams.rpc.v2/Blocks`; make sure that your metering endpoint supports it.
- The clients provided in this release (substreams run, gui and sinks that are linked to the 'client' library) will now support both endpoints.
- Without any flag, they will use the v3 endpoint by default and automatically fall back to v2 if they hit a "404 Not Found" or "Not Implemented" error.
- The `--force-protocol-version` flag has been added to all clients. Set it to `2` to force only v2, set it to `3` to force only v3, or leave it unset (`0`) to use the default "v3 with fallback to v2".
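The three accepted values of `--force-protocol-version` map to an ordered list of endpoints to try. The Go sketch below is illustrative only; `protocolCandidates` is a hypothetical helper, not a function of the client library.

```go
package main

import "fmt"

// protocolCandidates maps a --force-protocol-version value to the ordered list
// of endpoint versions a client would try, per the entry above.
func protocolCandidates(force int) ([]int, error) {
	switch force {
	case 0:
		return []int{3, 2}, nil // default: v3, falling back to v2
	case 2:
		return []int{2}, nil // v2 only
	case 3:
		return []int{3}, nil // v3 only
	default:
		return nil, fmt.Errorf("invalid --force-protocol-version %d: expected 0, 2 or 3", force)
	}
}
```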
- Added filesystem-backed caching for Buf BSR API requests to improve build performance and prevent rate limit errors. The cache uses SHA256 keys based on module/version/symbols, stores to `~/.config/substreams/buf-cache/`, and only caches deterministic semver versions. It falls back to an in-memory cache if the filesystem is unavailable, and warns when descriptor sets lack version specifications, as these cannot be cached and may cause rate limit issues.
- Added support for `@version` notation in the `protobuf.descriptorSets` section of the manifest. You can now specify versions in multiple ways:
  - Separate fields: `module: buf.build/streamingfast/substreams-sink-sql`
  - Separate fields with explicit latest: `module: buf.build/streamingfast/substreams-sink-sql` with `version: latest`
  - Inline notation: `module: buf.build/streamingfast/substreams-sink-sql@v0.1.0`
  - Note: `@latest` inline notation is not allowed; use `version: latest` or omit the version instead
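Resolving a descriptor set reference under these rules is mostly string splitting. The Go sketch below is hypothetical (`resolveDescriptorSetVersion` is not a Substreams function); rejecting a version given both inline and in the `version` field is an assumption this sketch makes, not documented behavior.

```go
package main

import (
	"errors"
	"strings"
)

// resolveDescriptorSetVersion splits "module@version" and merges it with a
// separate version field: an omitted version means latest, and inline
// "@latest" is refused.
func resolveDescriptorSetVersion(module, version string) (string, string, error) {
	base, inline, hasInline := strings.Cut(module, "@")
	if !hasInline {
		if version == "" {
			version = "latest"
		}
		return base, version, nil
	}
	switch {
	case inline == "":
		return "", "", errors.New("empty version after '@'")
	case inline == "latest":
		return "", "", errors.New(`inline "@latest" is not allowed; use "version: latest" or omit the version`)
	case version != "":
		// Assumption: this sketch refuses setting both forms at once.
		return "", "", errors.New("version given both inline and in the version field")
	}
	return base, inline, nil
}
```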
- Fixed a bug with BlockFilter: a skipped module would send BlockScopedData (in dev or near HEAD, to follow progress) with an empty module name, breaking some sinks. Module name was present if requesting a module dependent on that skipped module. Now the module name is always included.
- Fix "max-retries" so that it only 'resets' the counter if it receives actual data.
- Updated Wasmtime runtime from v30.0.0 to v36.0.0, bringing performance improvements, inlining support, Component Model async implementation, and enhanced security features.
- Added WASM bindgen shims support for the Wasmtime runtime to handle WASM modules with WASM bindgen imports (when the Substreams Module binary is defined as type `wasm/rust-v1+wasm-bindgen-shims`).
- Added support for foundational-store (in wasmtime and wazero).
- Added support for the new 'sf.substreams.rpc.v3.Stream/Blocks' endpoint that sends the full '.spkg' data with params and network value, so the client does not need to do any mangling.
  - This requires the substreams server to support it (under the `/sf.substreams.rpc.v3.Stream/*` location).
  - On the `run`, `gui` and `sink` commands, the `--force-protocol-version` flag is available to specify the protocol version (2 or 3); the `v2` endpoint will also be tried as a fallback if the server responds with 404 or MethodNotAllowed.
- Added foundational-store grpc client to substreams engine.
- Fixed module caching to properly handle modules with different runtime extensions.
- Added support for `http://` and `https://` prefixes in the `--endpoint` flag. Setting the protocol (http/https) in the URL will ignore the `--plaintext` flag setting. The default (no prefix) is still SSL. The enforcing of `--plaintext` and `--insecure` has been relaxed: plaintext+insecure is simply plaintext.
- Fixed the progress logs and Prometheus metrics from `substreams sink noop` when running with an output_module of type "index" in production mode (other sinks will now refuse to run in this mode).
- Removed 'progress_last_contiguous_block' from sink logs, as it was often misleading. Getting a correct value in all cases would require doing a slow lookup on all cached files, which is not desirable.
- Fixed the `substreams registry publish` command, which now properly returns non-zero exit codes when publishing fails (e.g., authentication errors), enabling scripts and CI/CD pipelines to correctly detect failures.
- Removed the `substreams proxy` command.
- BREAKING Concurrent streams and workers limits are now handled under the new session plugin (see CHANGELOG in github.com/streamingfast/firehose-core for details and usage)
- removed 'WorkerPoolFactory' from Tier1Modules
- removed 'GlobalRequestPool' from Tier1Modules
- added 'SessionPool' (dsession.SessionPool) to Tier1Modules
- BREAKING Add a maximum execution time for a full tier2 segment. By default, this is 60 minutes. It will fail with `rpc error: code = DeadlineExceeded desc = request active for too long`. It can be configured from the `SegmentExecutionTimeout` configuration option on `Tier2Config`, or disabled by setting it to 0.
- Improve log message for 'request active for a long time', adding stats.
- Fix `subscription channel at max capacity` error: when the LIVE channel is full (ex: slow module execution or slow client reader), the request will continue from merged files instead of failing, and gracefully recover if performance is restored.
- Fixed a small context memory leak when using wasmtime (especially with the grpc-based metering plugin).
- BREAKING: Replaced the `--infinite-retry` boolean flag with the `--max-retries` integer flag in the sink package for more flexible retry control:
  - `--max-retries 0`: No retries (fail immediately on first error)
  - `--max-retries 3`: Default behavior (retry up to 3 times)
  - `--max-retries -1`: Infinite retries (equivalent to the old `--infinite-retry` flag)
- Fix zstd thread/mem leak on filereader
People using their own authentication layer will need to consider these changes before upgrading!
- Renamed config headers that come from the authentication layer:
  - `x-sf-user-id` renamed to `x-user-id` (from dauth module)
  - `x-sf-api-key-id` renamed to `x-api-key-id` (from dauth module)
  - `x-sf-meta` renamed to `x-meta` (from dauth module)
  - `x-sf-substreams-parallel-jobs` renamed to `x-substreams-parallel-workers`
- Allow decreasing `x-substreams-parallel-workers` through an HTTP header (the auth layer determines the upper bound).
- Detect the value for the 'stage layer parallel executor max count' based on the `x-plan-tier` header (removed `x-sf-substreams-stage-layer-parallel-executor-max-count` handling).
- Added the `tgm://auth.thegraph.market?indexer-api-key=<API_KEY>&reissue-jwt-max-age-secs=600` plugin that allows an indexer to use The Graph Market as the authentication source. An API key with the special "indexer" feature is needed to allow repeated calls to the API without rate limiting (for key-based authentication and reissuance of "untrusted long-lived JWTs").
- Added the `substreams registry verify` command to validate a package is ready for publishing without actually publishing it. Only available as `registry verify` (no alias).
- Added the `--yes` flag to the `substreams registry publish` command to auto-confirm package publishing without prompting.
- Added the `--team-slug` flag to the `substreams registry publish` command and deprecated `--teamSlug` (use `--team-slug` instead).
- Refuse `<name>@latest` in imports: it resolves to a different version at different times, busting the Substreams cache. Use a specific version `<name>@<version>` instead, where `version` must respect semantic versioning (SemVer).
- Add close match suggestions when a module name cannot be found on `substreams run`/`sink` command(s).
- Do not print the usage report when there was no usage at all, usually when there is an error on `substreams run`/`sink` command(s).
- Improved the error message when the Substreams short package notation (`<name>@<version>`) is used but malformed.
- Added a mechanism to immediately cancel pending requests that are doing an 'external call' (ex: eth_call) on a given block when it gets forked out (UNDO because of a reorg).
- Fixed handling of invalid module kind: prevent heavy logging from a recovered panic.
- Errors considered deterministic, which are cached forever, now have their message suffixed as `<original message> (deterministic error)`.
- Improved `substreams run` command output to have humanized bytes/values and harmonized output with `substreams build`.
- Fixed the GUI, which no longer showed the 'dev outputs' from other modules in development mode.
- More tweaks to `substreams build` and `substreams protogen` command output.
- Added support for package version notation using `@` syntax (e.g., `package@v1.2.3` or `package@latest`) in manifest imports and package references.
- Added the `--prometheus-addr` flag to sink commands for binding the Prometheus metrics server to a specified address.
- Fixed the `substreams build` command when there is no WASM file already present on disk.
- Improved `substreams build`, `substreams protogen` and `substreams pack` command outputs to be streamlined and condensed.
- Added support for reading the manifest from stdin across all manifest-accepting commands using `"-"` as the manifest path. Affected commands: `build`, `run`, `gui`, `info`, `graph`, `pack`, `protogen`. This enables dynamic manifest generation and preprocessing workflows, including integration with tools like `envsubst` for environment variable substitution and CI/CD pipeline automation.
- Added the `substreams sink webhook` command to send Substreams output to a webhook endpoint. See the documentation for more information.
- Changed the `substreams-api-token-envvar` flag to `api-token-envvar`.
- Changed the `substreams-api-key-envvar` flag to `api-key-envvar`.
- Moved github.com/streamingfast/substreams-sink library in this repo, under github.com/streamingfast/substreams/sink
- Re-release of v1.15.9 with missing dependency update.
- [BREAKING CHANGE] `substreams-tier2` servers must be upgraded before tier1 servers: tier2 servers will stream outputs for the 'first segment', to speed up time to first block.
- Return the `processed_blocks` counter to the client at the end of the request.
- Progress notifications will only be sent every 500ms for the first minute, then reduce their rate down to once every 5 seconds (can be overridden per request).
- Added `dev_output_modules` to the protobuf request (if present, in dev mode, only send the output of the modules listed).
- Added `progress_messages_interval_ms` to the protobuf request (if present, overrides the rate of progress messages to that many milliseconds).
- Updated to latest networks registry version.
- Added the `--proto-path` flag to `substreams run` and `substreams gui` commands: allows loading protobuf definitions from a directory containing `.proto` files on top of the substreams package protobuf definitions.
- Added the `--proto-descriptor-set` flag to `substreams run` and `substreams gui` commands: allows loading protobuf definitions from a single protobuf descriptor set file on top of the substreams package protobuf definitions.
- Both flags work with both manifest files (`.yaml`) and pre-compiled packages (`.spkg`), enabling additional protobuf types to be available during execution.
- Added the `substreams unpack` command to extract the contents of a .spkg file to a tweakable YAML manifest.
- Added validation of protobuf outputs when doing 'pack' and 'publish' (they must have protobuf definitions attached to the manifest).
- Set `dev_output_modules` to only show the output_module when using `substreams run`, and all non-imported modules when using `substreams gui`.
- Print the `processed blocks` counter to the client at the end of the request.
- `substreams run` now prints "Total Egress Bytes" as well as "Total Processed Bytes".
Rework the execout File read/write:
- This reduces the RAM usage necessary to read and stream data to the user on tier1, as well as to read the existing execouts on tier2 jobs (in multi-stage scenarios).
- The cached execouts need to be rewritten to take advantage of this, since their data is currently not ordered: the system will automatically load and rewrite existing execouts when they are used.
- Code changes include:
  - new FileReader / FileWriter that "read as you go" or "write as you go"
  - No more 'KV' map attached to the File
  - Split the IndexWriter away from its dependencies on execoutMappers.
  - Clock distributor now also reads "as you go", using a small "one-block-cache"
- Removed env vars and behaviors:
  - removed SUBSTREAMS_DISABLE_PRELOAD_EXEC_FILES (no more preloading; it was mostly useful because reading the full file and unmarshalling it was necessary when streaming)
  - removed SUBSTREAMS_OUTPUT_SIZE_LIMIT_PER_SEGMENT (this is not a RAM issue anymore)
- Add the `uncompressed_egress_bytes` field to the `substreams request stats` log message. Only tier1 will produce a non-zero value there.
- Tier2 jobs now write mapper outputs "as they progress", preventing memory usage spikes when saving them to disk. This should considerably reduce the memory footprint of tier2 instances.
- Tier2 jobs now limit writing and loading mapper output files to a maximum size of 8GiB by default.
- Added the `SUBSTREAMS_OUTPUT_SIZE_LIMIT_PER_SEGMENT` environment variable to control this new limit.
- Gate the DebugAPI feature on tier2 with the `SUBSTREAMS_DEBUG_API_ADDR` environment variable (set it to `localhost:8081` to keep the behavior from v1.15.5).
- Removed the 'codegen subgraph' command from the CLI as SpS are being deprecated.
- Added `--skip-package-validation` and `--extension-configs` flags to the `tools tier2call` dev command.
- The `substreams run` command will now better render bytes depending on the network.
- The `substreams run`/`gui` JSON renderer is now able to render known `anypb.Any` types correctly.
- Integrated the Network Registry to better track supported networks.
- Add SUBSTREAMS_STORE_SIZE_LIMIT env var to allow overwriting the default 1GiB value
- Add env var SUBSTREAMS_PRINT_STACK to enable printing full stack traces when caught panic occurs
- Prevent a deterministic failure on a module definition (mode, valueType, updatePolicy) from persisting when the issue is fixed in the substreams.yaml #621
- Metering events on tier2 are now bundled at the end of the job (prevents sending metering events for failing jobs).
- Added metering for: "processed_blocks" (block * number of stages where execution happened) and "egress_bytes"
- Added a 'debug API' that listens on localhost:8081 and allows blocking connections, running GC, listing or canceling active requests.
- Add `unichain` to the list of supported chains.
- dedupe modules with same hash when computing graph. (#619)
- prevent memory usage burst when writing mapper by streaming protobuf items to writer
- ignore "service currently overloaded" worker errors in the "maxRetries" count. Tier1 requests should not error out because tier2 servers are ramping up, only when they fail multiple times.
- Default SUBSTREAMS_WORKER_MAX_RETRIES now set to 5.
- Catch "store errors" as deterministic (ex: invalid operation, store too big...), writing them to the module cache as well as errors that happen directly in the WASM code.
- Ensure the 'error cache' is effective even when the "stop block" is unset (0)
- Fix 'SUBSTREAMS_WORKERS_RAMPUP_TIME' environment variable that was not being honored
- `substreams init`: fix project creation when using the `--force-download-cwd` flag.
- Fix quicksave feature (incorrect block hash on quicksave)
- Fix logging of wasm external calls in `substreams request stats` (previously missing in the wasmtime engine).
- Save deterministic failures in WASM in the module cache (under a file named `errors.0123456789.zst` at the failed block number), so further requests depending on this module at the same block can return the error immediately without re-executing the module.
- `substreams init`: add Stellar to the list of supported grouped chains (this will require everyone to upgrade the CLI version to use codegen).
- `substreams init`: create the project in a new directory, not in the current directory of the user.
- `substreams init`: new Protobuf field to enforce versions with the codegen.
- Tier2 now returns gRPC error codes: `DeadlineExceeded` when it times out, and `ResourceExhausted` when a request is rejected due to overload.
- Tier1 now correctly reports tier2 job outcomes in the `substreams request stats`.
- Added jitter in "retry" logic to prevent all workers from retrying at the same time when tier2 servers are overloaded.
- Fix panic on tier2 when hitting a timeout for requests running from pre-cached module outputs
- Add environment variables to control retry behavior, "SUBSTREAMS_WORKER_MAX_RETRIES" (default 10) and "SUBSTREAMS_WORKER_MAX_TIMEOUT_RETRIES" (default 2), changing from previous defaults (720 and 3). "SUBSTREAMS_WORKER_MAX_TIMEOUT_RETRIES" is the number of retries specifically applied to block execution timing out (ex: because of external calls).
- The mechanism to slow down processing segments "ahead of blocks being sent to user" has been disabled on "noop-mode" requests, since these requests are used to pre-cache data and should not be slowed down.
- The "number of segments ahead" in this mechanism has been increased from `<number of parallel workers>` to `<number of parallel workers> * 1.5`
- Bugfix on server: fix panic on requests disconnecting before the resolvedStartBlock is set.
- Properly reject requests with a stop-block below the "resolved" StartBlock (caused by module initialBlocks or a chain's firstStreamableBlock)
- Added the `resolved-start-block` to the `substreams request stats` log
- Fix the 'Hint' shown when `--limit-processed-blocks` is too low (it sometimes suggested "0 or 0"), and fix some typos
- The `substreams gui` flag `--debug-modules-output` has been removed; it had no effect.
- The `substreams run` flag `--debug-modules-output` now accepts regular expressions like `substreams run --debug-modules-output=".*"`.
- Fixed `--skip-package-validation` to also skip sub packages being imported.
- Added `--limit-processed-blocks` flag to `substreams run` and `substreams gui` to set the `limit_processed_blocks` field in the request.
- The information messages in 'substreams run' now print to STDERR instead of STDOUT.
- Added a mechanism to slow down processing "ahead of blocks being sent to user" for 'production-mode' requests. The tier1 will not schedule tier2 jobs more than `max_parallel_subrequests` segments above the current block being streamed to the user. This ensures that a user slowly reading blocks 1, 2, 3... will not trigger a flood of tier2 jobs for much higher blocks (say, 300_000_000) that might never get read.
- Added a validation on a module for the existence of 'triggering' inputs: the server will now fail with a clear error message when the only available inputs are stores used with mode 'get' (not 'deltas'), instead of silently skipping the module on every block.
- Fixed `runtime error: slice bounds out of range` error on heavy memory usage with the wasmtime engine
- Added information about the number of blocks that need to be processed for a given request in the `sf.substreams.rpc.v2.SessionInit` message
- Added an optional field `limit_processed_blocks` to the `sf.substreams.rpc.v2.Request`. When set to a non-zero value, the server will reject a request that would process more blocks than the given value with the `FailedPrecondition` GRPC error code.
- Improved error messages when a module execution is timing out on a block (ex: due to a slow external call), which now returns a `DeadlineExceeded` Connect/GRPC error code instead of an `Internal` one. Removed 'panic' from the wording.
- Improved connection draining on shutdown: the server now waits for the end of the 'shutdown-delay' before draining and refusing new connections, then waits for 'quicksaves' and successful signaling of clients, up to a max of 30 sec.
- In `substreams request stats` log, add fields: `remote_jobs_completed`, `remote_blocks_processed` and `total_uncompressed_read_bytes`
- Fix a bug where a 'worker pool' could incorrectly get exhausted
- Fix another `cannot resolve 'old cursor' from files in passthrough mode -- not implemented` bug when receiving a request in production-mode with a cursor that is below the "linear handoff" block
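The look-ahead rule for production-mode requests described in the entries above can be sketched as a single predicate. `canSchedule` is a hypothetical Go helper, not part of the Substreams codebase: it allows a tier2 job only if its segment is at most `parallelWorkers * factor` segments above the segment being streamed, rounding down. A `factor` of 1.0 matches the original `max_parallel_subrequests` rule, 1.5 matches the later change, and noop-mode requests bypass the check entirely. Whether the bound is inclusive is an assumption.

```go
package main

// canSchedule reports whether a tier2 job for jobSegment may be scheduled while the
// user is being streamed streamingSegment. Hypothetical sketch: the look-ahead is
// parallelWorkers * factor segments, rounded down, and the bound is inclusive.
func canSchedule(jobSegment, streamingSegment uint64, parallelWorkers int, factor float64, noopMode bool) bool {
	if noopMode {
		// noop-mode requests only pre-cache data, so they are never slowed down.
		return true
	}
	lookahead := uint64(float64(parallelWorkers) * factor)
	return jobSegment <= streamingSegment+lookahead
}
```

With 10 parallel workers and the 1.5 factor, a user sitting on segment 0 allows jobs up to segment 15, and the window slides forward as blocks are streamed.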
This release brings performance improvements to the substreams engine, through the introduction of a new "QuickSave" feature, and a switch to wasmtime as the default runtime for Rust modules.
- Implement "QuickSave" feature to save the state of "live running" substreams stores when shutting down, and then resume processing from that point if the cursor matches.
  - enabled if the "QuickSaveStoreURL" attribute is not empty in the tier1 config
  - requires the "CheckPendingShutdown" module to be passed to the app via NewTier1()
- Rust modules will now be executed with `wasmtime` by default instead of `wazero`.
  - Prevents the whole server from stalling in certain memory-intensive operations in wazero.
  - Speed improvement: cuts the execution time in half in some circumstances.
  - Wazero is still used for modules with `wbindgen` and modules compiled with `tinygo`.
  - Set env var `SUBSTREAMS_WASM_RUNTIME=wazero` to revert to previous behavior.
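The runtime selection rules listed above can be summarized in a small Go function. `chooseRuntime`, `runtimeFromEnv` and their boolean inputs are hypothetical; only the `SUBSTREAMS_WASM_RUNTIME` variable and its `wazero` value come from the entry itself, and how the engine actually detects `wbindgen` or TinyGo modules is not shown.

```go
package main

import "os"

// chooseRuntime picks the WASM runtime for a module, following the rules listed above.
// usesWasmBindgen and builtWithTinyGo are hypothetical flags describing the module binary.
func chooseRuntime(envValue string, usesWasmBindgen, builtWithTinyGo bool) string {
	if envValue == "wazero" {
		// SUBSTREAMS_WASM_RUNTIME=wazero reverts to the previous behavior.
		return "wazero"
	}
	if usesWasmBindgen || builtWithTinyGo {
		// These modules are still executed with wazero.
		return "wazero"
	}
	return "wasmtime"
}

// runtimeFromEnv reads the override from the process environment.
func runtimeFromEnv(usesWasmBindgen, builtWithTinyGo bool) string {
	return chooseRuntime(os.Getenv("SUBSTREAMS_WASM_RUNTIME"), usesWasmBindgen, builtWithTinyGo)
}
```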
- Fixed `--skip-package-validation` to also skip sub packages being imported.
- Trim down packages when using 'imports': only the modules explicitly defined in the YAML manifest and their dependencies will end up in the final spkg.
- Added `GlobalRequestPool` to the `Tier1Modules` struct in `app/tier1.go` and integrated it into the `Run` method to enhance request lifecycle management. When set, the `GlobalRequestPool` will manage the borrowing, quotas, and keep-alive mechanisms for user requests via requests to a GRPC remote server.
- Added `WorkerPoolFactory` to the `Tier1Modules` struct in `app/tier1.go` and integrated it into the `Run` method to enhance worker lifecycle management. When set, the `WorkerPool` will manage the borrowing, quotas, and keep-alive mechanisms for worker subrequests on tier2, via requests to a GRPC remote server.
- Added 'shared cache' on tier1: execution of modules near the HEAD of the chain will be done once for a given module hash and the result shared between requests. This will reduce CPU usage and increase performance when many requests are using the same modules (ex: foundational modules)
- Improved "time to first block" when a lot of cached files exist on dependency substreams modules, by skipping reads of segments that won't be used and assuming stores "full KVs" are always filled sequentially (since they are!)
- Limit parallel execution of a stage's layer. Previously, the engine executed all modules in a stage's layer in parallel. Development mode will now execute every module sequentially, and production mode will limit parallelism to 2 (hard-coded for now). The auth plugin can control that value dynamically by providing a trusted header `X-Sf-Substreams-Stage-Layer-Parallel-Executor-Max-Count`.
- Fixed a regression since "v1.12.2" where the SkipEmptyOutput instruction was ignored in substreams mappers
- Removed enforcement of `BUFBUILD_AUTH_TOKEN` environment variable when using descriptor sets. It appears there is now a public free tier to query those which should work in most cases.
- When running a Solana package, set base58 encoding by default in the GUI.
- Add Sei Mainnet to the `ChainConfigByID` map.
- Fix log regression on 'substreams request stats' (bad value for production_mode/tier)
- Added `WorkerPoolFactory` to `Tier1Modules` and `RemoteWorkerClient` to `Tier2Modules` to support enhanced worker pool management.
- Introduced `WorkerKeepAliveDelay` in `Tier1Config` to manage worker pool keep-alive settings.
- Updated `Tier1App` and `Tier2App` to utilize the new worker pool components in their `Run` methods.
- Refactored the orchestrator `loop` package to introduce a new `Msg` interface and associated message types, enhancing the message handling mechanism.
- Modified the `Scheduler` and `ParallelProcessor` to integrate with the new worker pool interface and handle job scheduling more effectively.
- Introduce `loop.IsMsg` interface to ensure proper message handling.
- Improve noop-mode: will now only send one signal per bundle, without any data.
- Improve logging.
- Add `--noop-mode` flag to `substreams run` as a simple way to force the server to generate caches in production-mode.
- Add Stellar Mainnet and Testnet to the HardcodedEndpoints map.
- Fix a panic when a substreams was using an index as an input which contained empty output
- Fixed `tier2` app not setting itself as ready on startup
- Added support for extra ad-hoc prometheus labels in 'tools prometheus-explorer', passed as query params on each endpoint.
- Fix a thread leak in cursor resolution resulting in a bad value for active_connections metric
- Fix detection of accepted gzip compression when multiple values are sent in the `Grpc-Accept-Encoding` header (ex: Python library)
- Properly accept and compress responses with `gzip` for browser HTTP clients using ConnectWeb with `Accept-Encoding` header
- Allow setting subscription channel max capacity via `SOURCE_CHAN_SIZE` env var (default: 100)
- Added tier1 app configuration option to limit max active requests a single instance can accept before starting to reject them with 'Unavailable' gRPC code.
- Added new Prometheus metric `substreams_{tier1,tier2}_rejected_request_counter` to the tier1 & tier2 apps, to track rejected requests, especially when the hard limit is reached.
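Detecting gzip support in a header that carries several comma-separated encodings (the `Grpc-Accept-Encoding` fix above) can be sketched as follows. `acceptsGzip` is a hypothetical helper. It ignores parameters such as `;q=0.8` and, for brevity, does not treat `q=0` as a refusal the way HTTP content negotiation would.

```go
package main

import "strings"

// acceptsGzip reports whether a Grpc-Accept-Encoding or Accept-Encoding header value
// lists gzip, even when several comma-separated encodings are sent
// (ex: "identity, deflate, gzip"). A "q=0" parameter is not handled in this sketch.
func acceptsGzip(header string) bool {
	for _, part := range strings.Split(header, ",") {
		// Drop optional parameters such as ";q=0.8", then surrounding whitespace.
		name, _, _ := strings.Cut(part, ";")
		if strings.EqualFold(strings.TrimSpace(name), "gzip") {
			return true
		}
	}
	return false
}
```

A naive `header == "gzip"` comparison is the kind of check that fails as soon as a client sends more than one value.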
- Improvements to 'tools prometheus-explorer'
  - change flags `lookup_interval` and `lookup_timeout` to `--interval` and `--timeout`
  - now supports relative blocks (default is now: -1) and does not use 'final-blocks-only' flag on request
  - add `--max-freshness` flag to check for block age (when using relative block)
  - add `substreams_healthcheck_block_age_ms` prometheus metric
  - `--block-height` is now a flag instead of a positional argument
  - improve logging
  - removed "3 retries" that were built in and causing more confusion
- Add User-Agent headers depending on the client command
- Fixed: detection of gzip compression on 'connect' protocol (js/ts clients)
- Added: tier1.Config `EnforceCompression` to refuse incoming connections that do not support GZIP compression (default: false)
- Fix too many memory allocations impacting performance when stores are used
- Force topological ordering of protobuf descriptors when 'packing' an spkg (affecting current substreams-js clients)
- Allow `substreams pack` to do a "re-packing" of an existing spkg file. Useful to apply the protobuf descriptor ordering fix.
- Rebuild of v1.11.1 to generate the Docker `latest` tag with the revamped Docker image building.
- Substreams CLI is now built using Ubuntu 22; previous releases were built using Ubuntu 20.
- Substreams Docker image is now using `ubuntu:22` as its base; previous releases were built using `ubuntu:20.04`.
- Fix the `gui` breaking when the network field is not set in the spkg
- Fixed `SUBSTREAMS_REGISTRY_TOKEN` environment variable not taking precedence over the `registry-token` file.
- Commands `run`, `gui` and `info` now accept the new standard package definition (ex: `ethereum-common@latest`) to reference an spkg file from `https://substreams.dev`.
- Changed `substreams run`: the two positional parameters now align with `gui`: `[package [module_name]]`. The syntax `substreams run <module_name>` is not accepted anymore.
- Added `substreams publish` to `publish` a package on the substreams registry (check on `https://substreams.dev`).
- Added `substreams registry` to `login` and `publish` on the substreams registry (check on `https://substreams.dev`).
- Added `substreams tools extract-wasm` to extract a wasm file from a substreams package.
- Add `avalanche-mainnet` to the CLI.
- Fix `substreams gui` selecting the wrong module in the 'outputs' view if there is no output for the selected output_module.
- Add the block 'age' to the printed clock headers in the `substreams run` command.
- Add Mantra Mainnet and Testnet to the HardcodedEndpoints map.
- Add Vara Mainnet and Testnet to the HardcodedEndpoints map.
- Fix `substreams gui` command downloading the spkg twice, which would cause some issues with very big spkg files.
- Add base58 decoding in the output view for the `substreams gui`
Note All caches for stores using the updatePolicy `set_sum` (added in substreams v1.7.0) and modules that depend on them will need to be deleted, since they may contain bad data.
- Fix bad data in stores using `set_sum` policy: squashing of store segments incorrectly "summed" some values that should have been "set" if the last event for a key on this segment was a "sum"
- Fix panic in initialization (`metrics sender not set`)
- Fix small bug making some requests in development-mode slow to start (when starting close to the module initialBlock with a store that doesn't start on a boundary)
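A minimal model of the `set_sum` squashing rule, using int64 values and string operation names for illustration only (real store values and types differ). The `partialEntry` type and the `squash` function are hypothetical. A segment remembers whether any `set` happened for a key, so a segment whose last operation was a `sum` but which contained an earlier `set` still replaces the previous value instead of being added to it. That is the case the fix above addresses.

```go
package main

// partialEntry is a hypothetical summary of one key's operations within a store segment.
type partialEntry struct {
	value  int64
	hasSet bool // true if any "set" happened for the key in this segment
}

// apply records one set_sum operation in the segment summary.
func (p *partialEntry) apply(op string, v int64) {
	switch op {
	case "set":
		p.value, p.hasSet = v, true
	case "sum":
		p.value += v
	}
}

// squash merges a segment summary onto the value produced by previous segments.
// A "set" anywhere in the segment discards the previous value, even when the
// last operation for the key was a "sum".
func squash(prev int64, p partialEntry) int64 {
	if p.hasSet {
		return p.value
	}
	return prev + p.value
}
```

With a previous value of 100 and a segment containing `set 5` then `sum 3`, the correct result is 8; treating the segment as a pure sum because its last event was a `sum` would give 108.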
- Fixed `substreams build` creating a buf.gen.yaml file with absolute paths (should be relative)
- Removed `--show-generated-buf-gen` flag from `substreams protogen`
- Bumped neoeinstein-prost version in auto-generated `buf.gen.yaml` file when using `substreams protogen` or `substreams build` (compatible with new substreams-0.6 and prost-0.13)
- Fixed `substreams gui` panic (regression appeared in v1.10.3)
- Fixed an(other) issue where multiple stores running on the same stage with different initialBlocks would fail to progress (and hang)
- Fix bug where some invalid cursors may be sent (with 'LIB' being above the block being sent) and add safeguard/logging if the bug appears again
- Fix panic in the whole tier2 process when stores go above the size limit while being read from "kvops" cached changes
- Add `-o cursor` output type to `substreams run` for debugging purposes
- Fix "cannot resolve 'old cursor' from files in passthrough mode" error on some requests with an old cursor
- Fix handling of 'special case' substreams module with only "params" as its input: should not skip this execution (used in graph-node for head tracking)
-> empty files in module cache with hash `d3b1920483180cbcd2fd10abcabbee431146f4c8` should be deleted for consistency
- Add `substreams tools default-endpoint {network-name}` to help with auto-configuration tools
- Bump `substreams init` protocol version to "1" to be compatible with new codegen endpoint
- `substreams gui`: fix panic in some conditions when streaming from block 0
Note Since a bug that affected substreams with "skipping blocks" was corrected in this release, any previously produced substreams cache should be considered as possibly corrupted and eventually be replaced
- Fix handling of modules that receive both filtered AND unfiltered data as their inputs -> some "repeated entries" could appear where no data should have shown up
- Fix stalling on substreams with both map and store with different initialBlocks on the same stage
- Fix: prevent execution of modules that should be skipped when running live or dev mode (different outputs than when running in batch mode on tier2)
- `substreams gui`: fixed a panic occurring if the given package path doesn't exist
- `substreams init` must now be called from within your project folder (it no longer downloads files in a subdirectory)
- (since v1.10.0) `substreams gui` no longer accepts "output_module" as a single argument. It either receives nothing, the package, or the package followed by the output_module
- Add `sf.substreams.rpc.v2.EndpointInfo/Info` endpoint (if the infoserver is given as a module, i.e. from firehose-core)
- Add an execution timeout of 3 minutes per block by default (can be overridden in tier1/tier2 Configs) -- this is useful when an external call (eth_call) is stuck on a forked block hash.
- Revert 'initialBlocks' changes from v1.9.1 because a 'changing module hash' causes more trouble.
- Wazero: bump v1.8.0 and activate caching of precompiled wasm modules in `/tmp/wazero` to decrease compilation time
- Metering update: more detailed metering with addition of new metrics (`live_uncompressed_read_bytes`, `live_uncompressed_read_forked_bytes`, `file_uncompressed_read_bytes`, `file_uncompressed_read_forked_bytes`, `file_compressed_read_forked_bytes`, `file_compressed_read_bytes`, `file_uncompressed_write_bytes`, `file_compressed_write_bytes`). DEPRECATION WARNING: `bytes_read` and `bytes_written` metrics will be removed in the future, please use the new metrics for metering instead.
- Manifest reader: increase timeout of remote spkg fetch to 5 minutes, up from 30 seconds
- Add `substreams auth` command, to authenticate via `thegraph.market` and to get a dev API Key.
- Rename `--discovery-endpoint` into `codegen-endpoint` in `substreams init` command.
- Add `substreams codegen subgraph` command that takes a substreams `module` and an `spkg` and generates a simple `subgraph` from the `module` output.
- On `substreams init` command, if flag `--state-file` is provided, the state file is used by default for project generation.
- In `substreams init` command, the state file is named using a `Date format` and not using `Unix` anymore.
- Tools->prometheus: added the possibility to override the start-block on an endpoint
- `substreams gui` no longer accepts "output_module" as a single argument. It either receives nothing, the package, or the package followed by the output_module
- Fixed error handling issue in 'backprocessing' causing high CPU usage in tier1 servers
- Fixed handling of packages referenced by `ipfs://` URL (now simply using /api/v0/cat?arg=...)
- Added `--used-modules-only` flag to `substreams info` to only show modules that are in the execution tree for the given output_module
- Added support for directly reading spkg file that is compressed with zstd (from http, gs, s3, azure or local)
- Prevent Noop handler from sending outputs with 'Stalled' step in cursor (which breaks substreams-sink-kv)
Fixed substreams hanging in production-mode on chains with a 'first-streamable-block' higher than 0:
- all initialBlocks will be 'bumped' to the first-streamable-block if it is higher
- this will affect the module hashes: use `substreams info --first-streamable-block=<block_num>` to see how a value will affect your modules
- modules with initialBlocks higher than the first-streamable-block of a chain will be unaffected.
- Fix a bug introduced in v1.6.0 that could result in a corrupted store "state" file if all the "outputs" were already cached for a module in a given segment (rare occurrence)
- We recommend clearing your substreams cache after this upgrade and re-processing or validating your data if you use stores.
- substreams 'tools decode state' now correctly prints the `kvops` when pointing to store output files
- Expose a new intrinsic to modules: `skip_empty_output`, which causes the module output to be skipped if it has zero bytes. (Watch out, a protobuf object with all its default values will have zero bytes)
- Improve schedule order (faster time to first block) for substreams with multiple stages when starting mid-chain
- `substreams init` (code generation): fix displaying of saved path in filenames
- Add a `NoopMode` to the `Tier1`, making it possible to avoid sending data back to the requester while processing live.
The `substreams init` command now fetches a list of available 'code generators' from "https://codegen.substreams.dev".
Upon selection of a code generator, it launches an interactive session to gather the information necessary to build your substreams.
This allows flexibility and getting anything from "skeleton" of a substreams for a given chain up to a fully built .spkg file with subgraph bindings.
- Add 'compressed' boolean field to the 'incoming request' log
- Add a substreams live back filler, so a request running close to HEAD in production-mode on tier1 will trigger jobs on tier2 when boundaries are passed by final blocks, backfilling the cache. These jobs will be "unmetered".
- Fixed Substreams tier1 active worker request metric that was not decrementing correctly.
- Truncate error messages log lines to 18k characters to prevent them from disappearing through some load balancers.
- Removed local ethereum code generation from `init` command.
- Faster bootstrapping through bstream improvements, now only loads and keeps 200 blocks below LIB to link with merged blocks.
- Fixed delay in serving requests close to chain HEAD when using production-mode
- If a module with the `use` attribute has no `inputs` at all, its inputs are replaced by the used module's inputs
- If a module with the `use` attribute has no `blockFilter`, it's replaced by the used module's `blockFilter`
- If `blockFilter` is set to `{}`, it will be considered as `nil` in the spkg, enabling a module with the `use` attribute to override the `blockFilter` with a `nil` one
- Substreams engine is now able to run Rust code that depends on `solana_program` in Solana land to decode and `alloy`/`ether-rs` in Ethereum land
Those libraries, when used in a `wasm32-unknown-unknown` context, create a bunch of wasmbindgen imports in the resulting Substreams Rust code, imports that led to runtime errors because the Substreams engine didn't know about those special imports until today.
The Substreams engine is now able to "shim" those wasmbindgen imports, enabling you to run code that depends on libraries like `solana_program` and `alloy`/`ether-rs`, which are known to pull in those wasmbindgen imports. This works as long as you do not actually call those special imports; normal usage of those libraries does not call those methods. If they are called, the WASM module will fail at runtime and stall the Substreams module from going forward.
To enable this feature, you need to explicitly opt-in by appending `+wasm-bindgen-shims` at the end of the binary's type in your Substreams manifest:
```yaml
binaries:
  default:
    type: wasm/rust-v1
    file: <some_file>
```

to become

```yaml
binaries:
  default:
    type: wasm/rust-v1+wasm-bindgen-shims
    file: <some_file>
```

- `substreams.yaml` now supports `localPath` attribute under `protobuf.descriptorSets`, so you can pre-build a descriptor set using `buf build --as-file-descriptor-set -o myfile.binpb` and add it directly to your substreams package.
- Substreams clients now enable gzip compression over the network (already supported by servers).
- Substreams binary type can now be optionally composed of runtime extensions by appending a `+<extension>,[<extensions...>]` at the end of the binary type. Extensions are `key[=value]` pairs that are runtime specific.
  [!NOTE] If you are a library author parsing generic Substreams manifest(s), you will now need to handle that possibility in the binary type. If you were reading the field without any processing, you don't have to change anything.
- Fixed a failure in protogen where duplicate files would "appear multiple times" and fail.
- Fixed bug with block rate underflow in `gui`.
- Added store with update policy `set_sum` which allows the store to either sum a numerical value, or set it to a new value.
- Re-added Ethereum Sepolia support in `substreams init`.
- Fixed a bug with the new `descriptorSets` feature that wasn't ordered properly to correctly generate Protobuf bindings.
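For library authors affected by the binary type note above, splitting `<type>+<extension>,<extension>...` into a base type and `key[=value]` extensions could look like the following sketch. `parseBinaryType` is hypothetical; it assumes a single `+` separator and does not validate extension names.

```go
package main

import "strings"

// parseBinaryType splits a manifest binary type such as "wasm/rust-v1+wasm-bindgen-shims"
// into its base type and its runtime extensions (comma-separated, key[=value]).
func parseBinaryType(t string) (string, map[string]string) {
	base, rest, found := strings.Cut(t, "+")
	exts := map[string]string{}
	if !found {
		return base, exts
	}
	for _, ext := range strings.Split(rest, ",") {
		if ext == "" {
			continue // tolerate empty entries such as a trailing comma
		}
		key, value, _ := strings.Cut(ext, "=") // value is "" for a bare key
		exts[key] = value
	}
	return base, exts
}
```

Code that compared the whole field against `wasm/rust-v1` would stop matching once an extension is appended, which is why the base type has to be split off first.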
- execout: preload only one file instead of two, log if undeleted caches found
- execout: add environment variable SUBSTREAMS_DISABLE_PRELOAD_EXEC_FILES to disable file preloading
- Revert sanity check to support the special case of a substreams with only 'params' as input. This allows a chain-agnostic event to be sent, along with the clock.
- Fix error handling when resolved start-block == stop-block and stop-block is defined as non-zero
Note Upgrading to v1.6.0 will require changing the tier1 and tier2 versions concurrently, as the internal protocol has changed.
- Index Modules and Block Filter can now be used to speed up processing and reduce the amount of parsed data.
- When indexes are used along with the `BlockFilter` attribute on a mapper, blocks can be skipped completely: they will not be run in downstream modules or sent in the output stream, except in the live segment or in dev-mode, where an empty 'clock' is still sent.
- See https://github.com/streamingfast/substreams-foundational-modules for an example implementation
- Blocks that are skipped will still appear in the metering as "read bytes" (unless a full segment is skipped), but the index stores themselves are not "metered"
- The scheduler no longer duplicates work in the first segments of a request with multiple stages.
- Fix all issues with running a substreams where modules have different "initial blocks"
- Maximum Tier1 output speed improved for data that is already processed
- Tier1 'FileWalker' now polls more aggressively on local filesystem to prevent extra seconds of wait time.
- Fix a bug in the `gui` that would crash when trying to restart the stream.
- Fix total read bytes in case data is already cached
- New environment variable `SUBSTREAMS_WORKERS_RAMPUP_TIME` can specify the initial delay before tier1 will reach the number of tier2 concurrent requests.
- Add 'clock' output to `substreams run` command, useful mostly for performance testing or pre-caching
- (alpha) Introduce the `wasip1/tinygo-v1` binary type.
- Disabled `otelcol://` tracing protocol; its mere presence affected performance.
- Previous value for `SUBSTREAMS_WORKERS_RAMPUP_TIME` was `4s`, now set to `0`, disabling the mechanism by default.
- Fix bug where substreams tier2 would sometimes write outputs with the wrong tag (leaked from another tier1 request)
- Removed MaxWasmFuel since it is not supported in Wazero
- bump wazero execution to fix issue with certain substreams causing the server process to freeze
- Allow unordered ordinals to be applied from the substreams (automatic ordering before flushing to stores)
- add `substreams_tier1_worker_retry_counter` metric to count all worker errors returned by tier2
- add `substreams_tier1_worker_rejected_overloaded_counter` metric to count only worker errors with string "service currently overloaded"
- add `google/protobuf/duration.proto` to system proto files
- Support for buf build urls in substreams manifest. Ex.:

```yaml
protobuf:
  buf_build:
    - buf.build/streamingfast/firehose-cosmos
```

- fix a possible panic() when a request is interrupted during the file loading phase of a squashing operation.
- fix a rare possibility of stalling if only some fullkv stores caches were deleted, but further segments were still present.
- fix stats counters for store operations time
Performance, memory leak and bug fixes
- fix memory leak on substreams execution (by bumping wazero dependency)
- prevent substreams-tier1 stopping if blocktype auto-detection times out
- allow specifying blocktype directly in Tier1 config to skip auto-detection
- fix missing error handling when writing output data to files. This could result in tier1 request just "hanging" waiting for the file never produced by tier2.
- fix handling of dstore error in tier1 'execout walker' causing stalling issues on S3 or on unexpected storage errors
- increase number of retries on storage when writing states or execouts (5 -> 10)
- prevent slow squashing when loading each segment from full KV store (can happen when a stage contains multiple stores)
- prevent 'gui' command from crashing on 'incomplete' spkgs without moduledocs (when using --skip-package-validation)
- Fix a context leak causing tier1 responses to slow down progressively
- Fix a panic on tier2 when not using any wasm extension.
- Fix a thread leak on metering GRPC emitter
- Rollback scheduler optimisation: different stages can run concurrently if they are schedulable. This will prevent taking much time to execute when restarting close to HEAD.
- Add `substreams_tier2_active_requests` and `substreams_tier2_request_counter` prometheus metrics
- Fix the `tools tier2call` method to make it work with the new 'generic' tier2 (added necessary flags)
- A single substreams-tier2 instance can now serve requests for multiple chains or networks. All network-specific parameters are now passed from Tier1 to Tier2 in the internal ProcessRange request.
Important
Since the tier2 services will now get the network information from the tier1 request, you must make sure that the file paths and network addresses will be the same for both tiers.
Tip
The cached 'partial' files no longer contain the "trace ID" in their filename, preventing accumulation of "unsquashed" partial store files. The system will delete files under '{modulehash}/state' named in the format `{blocknumber}-{blocknumber}.{hexadecimal}.partial.zst` when it runs into them.
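Recognizing the legacy partial filenames described in the tip above could be done with a regular expression. This Go sketch is hypothetical: it assumes block numbers are decimal digits and that the `{hexadecimal}` part (presumably the old trace ID) is made only of hex characters.

```go
package main

import "regexp"

// legacyPartial matches the old partial store filename format
// "{blocknumber}-{blocknumber}.{hexadecimal}.partial.zst".
var legacyPartial = regexp.MustCompile(`^\d+-\d+\.[0-9a-fA-F]+\.partial\.zst$`)

// isLegacyPartial reports whether a file under "{modulehash}/state" uses the old naming.
func isLegacyPartial(name string) bool {
	return legacyPartial.MatchString(name)
}
```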
- Implement a `use` feature, enabling a module to use an existing module by overriding its inputs or initial block. (Inputs should have the same output type as the overridden module's inputs.) Check a usage of this new feature on the substreams-db-graph-converter repository.
- Fix panic when using '--header (-H)' flag on `gui` command
- When packing substreams, pick up docs from the README.md or README in the same directory as the manifest, when top-level package.doc is empty
- Added "Total read bytes" summary at the end of 'substreams run' command
Some redundant reprocessing has been removed, along with a better usage of caches to reduce reading the blocks multiple times when it can be avoided. Concurrent requests may benefit the other's work to a certain extent (up to 75%)
- All module outputs are now cached (previously, only the last module was cached, along with the "store snapshots", to allow parallel processing). This will increase disk usage; there is no automatic removal of old module caches.
- Tier2 will now read back mapper outputs (if they exist) to prevent running them again. Additionally, it will not read back the full blocks if its inputs can be satisfied from existing cached mapper outputs.
- Tier2 will skip processing completely if it's processing the last stage and the `output_module` is a mapper that has already been processed (ex: when multiple requests are indexing the same data at the same time)
- Tier2 will skip processing completely if it's processing a stage that is not the last, but all the stores and outputs have been processed and cached.
- The "partial" store outputs no longer contain the trace ID in the filename, allowing them to be reused. If many requests point to the same modules being squashed, the squasher will detect if another Tier1 has squashed its file and reload the store from the produced full KV.
- Scheduler modification: a stage now waits for the previous stage to have completed the same segment before running, to take advantage of the cached intermediate layers.
- Improved file listing performance for Google Storage backends by 25%
- Tier2 service now supports a maximum concurrent requests limit. Default set to 0 (unlimited).
- Readiness metric for Substreams tier1 app is now named `substreams_tier1` (was mistakenly called `firehose` before).
- Added back readiness metric for Substreams tier2 app (named `substreams_tier2`).
- Added metric `substreams_tier1_active_worker_requests` which gives the number of active Substreams worker requests a tier1 app is currently doing against tier2 nodes.
- Added metric `substreams_tier1_worker_request_counter` which gives the total Substreams worker requests a tier1 app made against tier2 nodes.
- Fixed `substreams init` generated The Graph GraphQL regarding wrong `Bool` types.
- The `substreams init` command can now be used on Arbitrum Mainnet network.
This release brings important server-side improvements regarding performance, especially while processing over historical blocks in production-mode.
- Performance: prevent reprocessing jobs when there is only a mapper in production mode and everything is already cached
- Performance: prevent "UpdateStats" from running too often and stalling other operations when running with a high parallel jobs count
- Performance: fixed bug in scheduler ramp-up function sometimes waiting before raising the number of workers
- Added support for authentication using api keys. The env variable can be specified with `--substreams-api-key-envvar` and defaults to `SUBSTREAMS_API_KEY`.
- Added the output module's hash to the "incoming request" log
- Added `trace_id` in grpc authentication calls
- Bumped connect-go library to new "connectrpc.com/connect" location
- Enable gRPC reflection API on tier1 substreams service
- Added `substreams init` support for creating a substreams with data from fully-decoded Calls instead of only extracting events.
- Added `substreams init` support for creating a substreams with the "Dynamic DataSources" pattern (ex: a `Factory` contract creating `pool` contracts through the `PoolCreated` event)
- Changed `substreams init` to always prefix the tables and entities with the project name
- Fixed `substreams init` support for unnamed params and topics on log events
- Fixed `substreams init` generated code when dealing with Ethereum ABI events containing array types.
  [!NOTE] For now, the generated code only works with Postgres, an upcoming revision is going to lift that constraint.
- Fixed `store.has_at` Wazero signature which was defined as `has_at(storeIdx: i32, ord: i32, key_ptr: i32, key_len: i32)` but should have been `has_at(storeIdx: i32, ord: i64, key_ptr: i32, key_len: i32)`.
- Fixed the local `substreams alpha service serve` ClickHouse deployment which was failing with a message regarding fork handling.
- Catch more cases of WASM deterministic errors as `InvalidArgument`.
- Added some output-stream info to logs.
- Fixed error-passing between tier2 and tier1 (tier1 will not retry sending requests that fail deterministically to tier2)
- Tier1 will now schedule a single job on tier2, quickly ramping up to the requested number of workers after 4 seconds of delay, to catch early exceptions
- "store became too big" is now considered a deterministic error and returns code "InvalidArgument"
- Support new `networks` configuration block in `substreams.yaml` to override modules' params and initial_block. Network can be specified at run-time, avoiding the need for separate spkg files for each chain.
- [BREAKING CHANGE] Remove the support for the `deriveFrom` overrides. The `imports`, along with the new `networks` feature, should provide a better mechanism to cover the use cases that `deriveFrom` tried to address.
{% hint style="info" %}
These changes are all handled in the substreams CLI, applying the necessary changes to the package before sending the requests. The Substreams server endpoints do not need to be upgraded to support it.
{% endhint %}
- Added `networks` field at the top level of the manifest definition, with `initialBlock` and `params` overrides for each module. See the substreams.yaml.example file in the repository or https://substreams.streamingfast.io/reference-and-specs/manifests for more details and example usage.
- The networks `params` and `initialBlock` overrides for the chosen network are applied to the module directly before being sent to the server. All network configurations are kept when packing an .spkg file.
- Added the `--network` flag for choosing the network on `run`, `gui` and `alpha service deploy` commands. Default behavior is to use the one defined as `network` in the manifest.
- Added the `--endpoint` flag to `substreams alpha service serve` to specify the substreams endpoint to connect to
- Added endpoints for Antelope chains
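A minimal Go sketch of how per-network overrides could be applied to modules before a request is sent. The `Module` and `NetworkOverrides` types and the `applyNetwork` function are hypothetical and only mirror the `initialBlock`/`params` fields named above; the real manifest types and YAML keys may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// Module is a hypothetical, trimmed-down view of a manifest module.
type Module struct {
	Name         string
	InitialBlock uint64
	Params       string
}

// NetworkOverrides is a hypothetical view of one entry of the `networks`
// section: per-module initial blocks and params.
type NetworkOverrides struct {
	InitialBlocks map[string]uint64
	Params        map[string]string
}

// applyNetwork returns a copy of modules with the chosen network's overrides
// applied; modules without an override keep their original values.
func applyNetwork(modules []Module, networks map[string]NetworkOverrides, network string) ([]Module, error) {
	ov, ok := networks[network]
	if !ok {
		known := make([]string, 0, len(networks))
		for name := range networks {
			known = append(known, name)
		}
		sort.Strings(known)
		return nil, fmt.Errorf("network %q not found in manifest (known: %v)", network, known)
	}
	out := make([]Module, len(modules))
	copy(out, modules) // never mutate the caller's slice
	for i := range out {
		if b, ok := ov.InitialBlocks[out[i].Name]; ok {
			out[i].InitialBlock = b
		}
		if p, ok := ov.Params[out[i].Name]; ok {
			out[i].Params = p
		}
	}
	return out, nil
}

func main() {
	// Example values are made up for illustration.
	modules := []Module{{Name: "map_pools", InitialBlock: 12_000_000, Params: "factory=aaaa"}}
	networks := map[string]NetworkOverrides{
		"polygon": {
			InitialBlocks: map[string]uint64{"map_pools": 22_000_000},
			Params:        map[string]string{"map_pools": "factory=bbbb"},
		},
	}
	resolved, err := applyNetwork(modules, networks, "polygon")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", resolved[0]) // {Name:map_pools InitialBlock:22000000 Params:factory=bbbb}
}
```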
- Command 'substreams info' now shows the params
- Removed the handling of the `DeriveFrom` keyword in manifest; this override feature is going away.
- Removed the `--skip-package-validation` option only on run/gui/inspect/info
- Added the `--params` flag to `alpha service deploy` to apply per-module parameters to the substreams before pushing it.
- Renamed the `--parameters` flag to `--deployment-params` in `alpha service deploy`, to clarify the intent of those parameters (given to the endpoint, not applied to the substreams modules)
- Small improvement on `substreams gui` command: no longer reads the .spkg multiple times with different behavior during its process.
- Fixed bug in `substreams init` with numbers in ABI types
- Return the correct gRPC code instead of wrapping it under an "Unknown" error. "Clean shutdown" now returns `CodeUnavailable`. This is compatible with previous substreams clients like substreams-sql, which should retry automatically.
- Upgraded components to manage the new block encapsulation format in merged-blocks and on the wire required for firehose-core v1.0.0
- Fix fuzzy matching when endpoint requires auth headers
- Fix panic in "serve" when trying to delete a non-existing deployment
- Add validation check of substreams package before sending deploy request to server
- Codegen: substreams-database-change bumped to v1.3, which properly generates the primary key to support chain reorgs in the Postgres sink.
- Sink server commands all moved from `substreams alpha sink-*` to `substreams alpha service *`
- Sink server: support for deploying sinks with DBT configuration, so that users can deploy their own DBT models (supported on Postgres and ClickHouse sinks). Example manifest file segment:

  ```yaml
  [...]
  sink:
    module: db_out
    type: sf.substreams.sink.sql.v1.Service
    config:
      schema: "./schema.sql"
      wire_protocol_access: true
      postgraphile_frontend:
        enabled: true
      pgweb_frontend:
        enabled: true
      dbt:
        files: "./dbt"
        run_interval_seconds: 60
  ```

  where "./dbt" is a folder containing the dbt project.
- Sink server: added REST interface support for ClickHouse sinks. Example manifest file segment:

  ```yaml
  [...]
  sink:
    module: db_out
    type: sf.substreams.sink.sql.v1.Service
    config:
      schema: "./schema.clickhouse.sql"
      wire_protocol_access: true
      engine: clickhouse
      postgraphile_frontend:
        enabled: false
      pgweb_frontend:
        enabled: false
      rest_frontend:
        enabled: true
  ```
- Fix `substreams info` CLI doc field, which wasn't printing any doc output
- Optimized start of output stream in developer mode when start block is in reversible segment and output module does not have any stores in its dependencies.
- Fixed bug where the first streamable block of a chain was not processed correctly when the start block was set to the default zero value.
- Codegen: Now generates separate substreams.{target}.yaml files for sql, clickhouse and graphql sink targets.
- Codegen: Added support for clickhouse in schema.sql
- Fixed metrics for time spent in eth_calls within modules stats (server and GUI)
- Fixed `undo` JSON message in 'run' command
- Fixed stream ending immediately in dev mode when start/end blocks are both 0.
- Sink-serve: fix missing output details on docker-compose apply errors
- Codegen: Fixed pluralized entity created for db_out and graph_out
- Fixed a regression where start block was not resolved correctly when it was in the reversible segment of the chain, causing the substreams to reprocess a segment in tier 2 instead of linearly in tier 1.
- Fixed missing decrement on metric `substreams_active_requests`
- Added `substreams_active_requests` and `substreams_counter` metrics to `substreams-tier1`
- Changed `evt_block_time` in ms to timestamp in `lib.rs`, proto definition and `schema.sql`
- This release brings the `substreams init` command out of alpha! You can quickly generate a Substreams from an Ethereum ABI:
- New Alpha feature: deploy your Substreams Sink as a deployable unit to a local docker environment!

- See those two new features in action in this tutorial
- Sink configs can now use protobuf annotations (aka Field Options) to determine how the field will be interpreted in substreams.yaml:

  - `load_from_file` will put the content of the file directly in the field (string and bytes contents are supported).
  - `zip_from_folder` will create a zip archive and put its content in the field (field type must be bytes).

  Example protobuf definition:

  ```protobuf
  import "sf/substreams/v1/options.proto";

  message HostedPostgresDatabase {
    bytes schema = 1 [ (sf.substreams.v1.options).load_from_file = true ];
    bytes extra_config_files = 2 [ (sf.substreams.v1.options).zip_from_folder = true ];
  }
  ```

  Example manifest file:

  ```yaml
  [...]
  network: mainnet

  sink:
    module: main:db_out
    type: sf.substreams.sink.sql.v1.Service
    config:
      schema: "./schema.sql"
      wire_protocol_access: true
      postgraphile_frontend:
        enabled: true
      pgweb_frontend:
        enabled: true
  ```

- `substreams info` command now properly displays the content of sink configs, optionally writing the fields that were bundled from files to disk with `--output-sinkconfig-files-path=</some/path>`
- `substreams alpha init` renamed to `substreams init`. It now includes `db_out` module and `schema.sql` to support the substreams-sql-sink directly.
- The override feature has been overhauled. Users may now override an existing substreams by pointing to an override file in the `run` or `gui` command. This override manifest will have a `deriveFrom` field which points to the original substreams which is to be overridden. This is useful to port a substreams from one network to another. Example of an override manifest:

  ```yaml
  deriveFrom: path/to/mainnet-substreams.spkg # this can also be a remote url

  package:
    name: "polygon-substreams"
    version: "100.0.0"

  network: polygon

  initialBlocks:
    module1: 17500000

  params:
    module1: "address=2a75ca72679cf1299936d6104d825c9654489058"
  ```
- The `substreams run` and `substreams gui` commands now determine the endpoint from the 'network' field in the manifest if no value is passed in the `--substreams-endpoint` flag.
- The endpoint for each network can be set by using an environment variable `SUBSTREAMS_ENDPOINTS_CONFIG_<network_name>`, ex: `SUBSTREAMS_ENDPOINTS_CONFIG_MAINNET=my-endpoint:443`
- The `substreams alpha init` has been moved to `substreams init`
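The endpoint lookup order described above (explicit flag first, then the per-network environment variable) can be sketched in Go as follows. `resolveEndpoint` is a hypothetical helper; upper-casing the network name follows the `MAINNET` example, while any other normalization or built-in default endpoints the CLI may have are not modeled.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// resolveEndpoint returns the endpoint to use: the --substreams-endpoint
// flag value if set, otherwise SUBSTREAMS_ENDPOINTS_CONFIG_<NETWORK>.
// lookup is injected (os.LookupEnv in practice) so the logic is testable.
func resolveEndpoint(flag, network string, lookup func(string) (string, bool)) (string, error) {
	if flag != "" {
		return flag, nil
	}
	if network == "" {
		return "", fmt.Errorf("no --substreams-endpoint given and manifest has no network")
	}
	key := "SUBSTREAMS_ENDPOINTS_CONFIG_" + strings.ToUpper(network) // assumption: plain upper-casing
	if v, ok := lookup(key); ok && v != "" {
		return v, nil
	}
	return "", fmt.Errorf("no endpoint configured for network %q (set %s)", network, key)
}

func main() {
	os.Setenv("SUBSTREAMS_ENDPOINTS_CONFIG_MAINNET", "my-endpoint:443")
	ep, err := resolveEndpoint("", "mainnet", os.LookupEnv)
	if err != nil {
		panic(err)
	}
	fmt.Println(ep) // my-endpoint:443
}
```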
- Fixed the `substreams gui` command to correctly compute the stop-block when given a relative value (ex: '-t +10')
- Fixed (bumped) substreams protobuf definitions that get embedded in `spkg` to match the new progress messages from v1.1.12.
- Regression fix: fixed a bug where negative start blocks would not be resolved correctly when using `substreams run` or `substreams gui`.
substreams runorsubstreams gui. - In the request plan, the process previously panicked when errors related to block number validation occurred. Now the error will be returned to the client.
- If the initial block or start block is less than the first block in the chain, the substreams will now start from the first block in the chain. Previously, setting the initial block to a block before the first block in the chain would cause the substreams to hang.
- Fixed a bug where the substreams would fail if the start block was set to a future block. The substreams will now wait for the block to be produced before starting.
- Complete redesign of the progress messages:
  - Tier2 internal stats are aggregated on Tier1 and sent out every 500ms (no more bursts)
  - No need to collect events on client: a single message now represents the current state
  - Message now includes list of running jobs and information about execution stages
  - Performance metrics have been added to show which modules are executing slowly and where the time is spent (eth calls, store operations, etc.)
> [!IMPORTANT]
> The client and servers will both need to be upgraded at the same time for the new progress messages to be parsed:
>
> - The new Substreams servers will NOT send the old `modules` field as part of its `progress` message, only the new `running_jobs`, `modules_stats`, `stages`.
> - The new Substreams clients will NOT be able to decode the old progress information when connecting to older servers.
>
> However, the actual data (and cursor) will work correctly between versions. Only incompatible progress information will be ignored.
- Bumped `substreams` and `substreams-ethereum` to latest in `substreams alpha init`.
- Improved error message when `<module_name>` is not received. Previously this would lead to a confusing error message; now, if the input is likely a manifest, the error message says so clearly.
- Fixed compilation errors when tracking some contracts when using `substreams alpha init`.
- `substreams info` now takes an optional second parameter `<output-module>` to show how the substreams modules can be divided into stages
- Pack command: added `-c` flag to allow overriding of certain substreams.yaml values by passing in the path of a yaml file. Example yaml contents:

  ```yaml
  package:
    name: my_custom_package_name

  network: arbitrum-one

  initialBlocks:
    module_name_1: 123123123

  params:
    mod1: "custom_parameter"
  ```
- Removed `Config.RequestStats`; stats are now always enabled.
- Added metering of live blocks
- Fixed/Removed: jobs would hang when config parameter `StateBundleSize` was different from `SubrequestsSize`. The latter has been removed completely: Subrequests size will now always be aligned with bundle size.
- Auth: added support for continuous authentication via the gRPC auth plugin (allowing cutoff triggered by the auth system).
- Fixed params handling in `gui` mode
- Massive refactoring of the scheduler: prevent excessive splitting of jobs, grouping them into stages when they have the same dependencies. This should reduce the required number of `tier2` workers (2x to 3x, depending on the substreams).
- The `tier1` and `tier2` config have a new configuration `StateStoreDefaultTag`, which will be appended to the `StateStoreURL` value to form the final state store URL, ex: `StateStoreURL="/data/states"` and `StateStoreDefaultTag="v2"` will make `/data/states/v2` the default state store location, while allowing users to provide an `X-Sf-Substreams-Cache-Tag` header (gated by auth module) to point to `/data/states/v1`, and so on.
- Authentication plugin `trust` can now specify an exclusive list of `allowed` headers (all lowercase), ex: `trust://?allowed=x-sf-user-id,x-sf-api-key-id,x-real-ip,x-sf-substreams-cache-tag`
- The `tier2` app no longer has a customizable auth plugin (or any Modules); `trust` will always be used, so that `tier1` can pass down its headers (e.g. `X-Sf-Substreams-Cache-Tag`). The `tier2` instances should not be accessible publicly.
- Color theme is now adapted to the terminal background (fixes readability on 'light' background)
- Provided parameters are now shown in the 'Request' tab.
- `alpha init` command: replace `initialBlock` for generated manifest based on contract creation block.
- `alpha init` now prompts for the Ethereum chain. Added: Mainnet, BNB, Polygon, Goerli, Mumbai.
- `alpha init` reports better progress, especially when performing ABI & creation block retrieval.
- `alpha init` command without contracts: fixed Protogen command invocation.
- Max-subrequests can now be overridden by auth header `X-Sf-Substreams-Parallel-Jobs` (note: if your auth plugin is 'trust', make sure that you filter out this header from public access)
- Request Stats logging. When enabled, it will log metrics associated to a Tier1 and Tier2 request
- On request, save "substreams.partial.spkg" file to the state cache for debugging purposes.
- Manifest reader can now read 'partial' spkg files (without protobuf and metadata) with an option.
- Fixed a bug which caused "live" blocks to be sent while the stream previously received block(s) were historic.
- In GUI, module output now shows fields with default values, i.e. `0`, `""`, `false`
Now using plugin `buf.build/community/neoeinstein-prost-crate:v0.3.1` when generating the Protobuf Rust `mod.rs`, which fixes the warning that remote plugins are deprecated.
Previously we were using `remote: buf.build/prost/plugins/crate:v0.3.1-1`. But remote plugins when using https://buf.build (which we use to generate the Protobuf) are now deprecated and will cease to function on July 10th, 2023.
The net effect of this is that if you don't update your Substreams CLI to 1.1.7, on July 10th 2023 and after, `substreams protogen` will not work anymore.
- `substreams-tier1` and `substreams-tier2` are now standalone Apps, to be used as such by server implementations (firehose-ethereum, etc.)
- `substreams-tier1` now listens to Connect protocol, enabling browser-based substreams clients
- Authentication has been overhauled to take advantage of https://github.com/streamingfast/dauth, allowing the use of a gRPC-based sidecar or reverse-proxy to provide authentication.
- Metering has been overhauled to take advantage of https://github.com/streamingfast/dmetering plugins, allowing the use of a GRPC sidecar or logs to expose usage metrics.
- The tier2 logs no longer show a `parent_trace_id`: the `trace_id` is now the same as tier1 jobs. Unique tier2 jobs can be distinguished by their `stage` and `segment`, corresponding to the `output_module_name` and `startblock:stopblock`
- The `substreams protogen` command now uses this Buf plugin https://buf.build/community/neoeinstein-prost to generate the Rust code for your Substreams definitions.
- The `substreams protogen` command no longer generates the `FILE_DESCRIPTOR_SET` constant, which generated an unused warning in Rust. We don't think anybody relied on having the `FILE_DESCRIPTOR_SET` constant generated, but if that's the case, you can provide your own `buf.gen.yaml` that will be used instead of the generated one when doing `substreams protogen`.
- Added `-H` flag on the `substreams run` command, to set HTTP Headers in the Substreams request.
- Fixed generated `buf.gen.yaml` not being deleted when an error occurs while generating the Rust code.
This release fixes data determinism issues. This comes at a 20% performance cost but is necessary for integration with The Graph ecosystem.
- When upgrading a substreams server to this version, you should delete all existing module caches to benefit from deterministic output
- Tier1 now records deterministic failures in WASM and "blacklists" identical requests for 10 minutes (by serving them the same `InvalidArgument` error) with a forced incremental backoff. This prevents accidental bad actors from hogging tier2 resources when their substreams cannot go past a certain block.
- Tier1 now sends the ResolvedStartBlock, LinearHandoffBlock and MaxJobWorkers in SessionInit message for the client and gui to show
- Substreams CLI can now read manifests/spkg directly from an IPFS address (subgraph deployment or the spkg itself), using `ipfs://Qm...` notation
- When talking to an updated server, the gui will not overflow on a negative start block, using the newly available resolvedStartBlock instead.
- When running in development mode with a start-block in the future on a cold cache, you would sometimes get invalid "updates" from the store passed down to your modules that depend on them. It did not impact the caches but caused invalid output.
- The WASM engine was incorrectly reusing memory, preventing deterministic output. It made things go faster, but at the cost of determinism. Memory is now reset between WASM executions on each block.
- The GUI no longer panics when an invalid output-module is given as argument
- Changed default WASM engine from `wasmtime` to `wazero`; use `SUBSTREAMS_WASM_RUNTIME=wasmtime` to revert to the prior engine. Note that `wasmtime` will now run a lot slower than before because resetting the memory in `wasmtime` is more expensive than in `wazero`.
- Execution of modules is now done in parallel within a single instance, based on a tree of module dependencies.
- The `substreams gui` and `substreams run` commands now accept commas inside a `param` value. For example: `substreams run --param=p1=bar,baz,qux --param=p2=foo,baz`. However, you can no longer pass multiple parameters using an ENV variable, or a `.yaml` config file.
- Module hashing changed to fix cache reuse on substreams that use imported modules
- Memory leak fixed on rpc-enabled servers
- GUI more responsive
- BREAKING: The module hashing algorithm wrongfully changed the hash for imported modules, which made it impossible to leverage caches when composing new substreams off of imported ones.
  - Operationally, if you want to keep your caches, you will need to copy or move the old hashes to the new ones.
    - You can obtain the prior hashes for a given spkg with `substreams info my.spkg`, using a prior release of the `substreams` CLI.
    - With a more recent `substreams` release, you can obtain the new hashes with the same command.
    - You can then `cp` or `mv` the caches for each module hash.
  - You can also ignore this change. This will simply invalidate your cache.
- Fixed a memory leak where "PostJobHooks" were not always called. These are used to hook in RPC calls in the Ethereum chain. They are now always called, even if no block has been processed (can be called with `nil` value for the clock)
- Jobs that fail deterministically (during WASM execution) on tier2 will fail faster, without retries from tier1.
- `substreams gui` command now handles params flag (it was ignored)
- Substreams GUI responsiveness improved significantly when handling large payloads
- Added Tracing capabilities, using https://github.com/streamingfast/sf-tracing . See repository for details on how to enable.
- If the cached substreams states are missing a 'full-kv' file in its sequence (not a normal scenario), requests will fail with `opening file: not found` #222
This release contains fixes for race conditions that happen when multiple requests try to sync the same range using the same .spkg. Those fixes will avoid weird state errors at the cost of duplicating work in some circumstances. A future refactor of the Substreams engine scheduler will fix those inefficiencies.
Operators, please read the operators section for upgrade instructions.
> [!NOTE]
> This upgrade procedure applies if your Substreams deployment topology includes both `tier1` and `tier2` processes. If you have defined the config value `substreams-tier2: true` somewhere, then this applies to you; otherwise, you can ignore the upgrade procedure.
This release includes a small change in the internal RPC layer between tier1 processes and tier2 processes. This change requires an ordered upgrade of the processes to avoid errors.
The components should be deployed in this order:
- Deploy and roll out `tier1` processes first
- Deploy and roll out `tier2` processes second
If you upgrade in the wrong order or if somehow tier2 processes start using the new protocol without tier1 being aware, users will end up with backend error(s) saying that some partial files are not found. Those will be resolved only when tier1 processes have been upgraded successfully.
- Fixed a race when multiple Substreams requests execute on the same `.spkg`, which was causing races between the two executors.
- GUI: fixed an issue which would slow down message consumption when progress page was shown in ASCII art "bars" mode
- GUI: fixed the display of blocks per second to represent actual blocks, not messages count
- [`binary`]: Commands `substreams <...>` that fail now correctly return an exit code 1.
- [`library`]: The `manifest.NewReader` signature changed and will now return a `*Reader, error` (previously `*Reader`).
- [`library`]: The `manifest.Reader` gained the ability to infer the path if provided with input `""` based on the current working directory.
- [`library`]: The `manifest.Reader` gained the ability to infer the path if provided with input that is a directory.
This release contains bug fixes and speed/scaling improvements around the Substreams engine. It also contains a few small enhancements for `substreams gui`.
This release fixes an important bug that could have generated corrupted store state files. This is important for developers and operators.
The store state files will be fully deleted on the Substreams server to start fresh again. The impact for you as a developer is that Substreams that were fully synced will now need to re-generate the store's state from the initial block. So you might see long delays before getting new block data while the Substreams engine is re-computing the store states from scratch.
You need to clear the state store and remove all the files that are stored under substreams-state-store-url flag. You can also make it point to a brand new folder and delete the old one after the rollout.
- Fix a bug where not all extra modules would be sent back on debug mode
- Fixed a bug in tier1 that could result in corrupted state files when getting close to chain HEAD
- Fixed some performance and stalling issues when using GCS for blocks
- Fixed storage logs not being shown properly
- GUI: Fixed panic race condition
- GUI: Cosmetic changes
- GUI: Added traceID
This release introduces a new RPC protocol and the old one has been removed. The new RPC protocol is in a new Protobuf package `sf.substreams.rpc.v2` and it drastically changes how chain re-orgs are signaled to the user. Here are the highlights of this release:
- Getting rid of `undo` payload during re-org
- `substreams gui` improvements
- Substreams integration testing
- Substreams Protobuf definitions updated
Previously, the GRPC endpoint sf.substreams.v1.Stream/Blocks would send a payload with the corresponding "step", NEW or UNDO.
Unfortunately, this led to some cases where the payload could not be deterministically generated for old blocks that had been forked out, resulting in a stalling request, a failure, or in some worst cases, incomplete data.
The new design, under `sf.substreams.rpc.v2.Stream/Blocks`, takes care of these situations by removing the 'step' component and using these two message types:

- `sf.substreams.rpc.v2.BlockScopedData` when chain progresses, with the payload
- `sf.substreams.rpc.v2.BlockUndoSignal` during a reorg, with the last valid block number + block hash
The client now has the burden of keeping the necessary means of performing the undo actions (ex: a map of previous values for each block). The BlockScopedData message now includes the final_block_height to let you know when this "undo data" can be discarded.
With these changes, a substreams server can even handle a cursor for a block that it has never seen, provided that it is a valid cursor, by signaling the client to revert up to the last known final block, trading efficiency for resilience in these extreme cases.
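With the v2 protocol, the client keeps its own undo data. The Go sketch below is one possible approach, assuming a simple key/value state: `UndoBuffer`, `Apply` and `Undo` are hypothetical names, `Apply` stands in for handling a `BlockScopedData` message (including pruning at `final_block_height`) and `Undo` for a `BlockUndoSignal`; block hashes and the actual protobuf payloads are left out.

```go
package main

import "fmt"

// UndoBuffer (hypothetical) keeps, for each not-yet-final block, the previous
// value of every key that block changed, so an undo signal can restore them.
type UndoBuffer struct {
	state  map[string]string
	undo   map[uint64]map[string]*string // block -> key -> previous value (nil: key was absent)
	blocks []uint64                      // applied, not-yet-final block numbers, ascending
}

func NewUndoBuffer() *UndoBuffer {
	return &UndoBuffer{state: map[string]string{}, undo: map[uint64]map[string]*string{}}
}

// Apply handles a BlockScopedData-like message carrying the writes of block
// num, then discards undo data for blocks at or below finalBlockHeight.
func (u *UndoBuffer) Apply(num uint64, writes map[string]string, finalBlockHeight uint64) {
	prev := make(map[string]*string, len(writes))
	for k, v := range writes {
		if old, ok := u.state[k]; ok {
			prev[k] = &old
		} else {
			prev[k] = nil
		}
		u.state[k] = v
	}
	u.undo[num] = prev
	u.blocks = append(u.blocks, num)

	kept := u.blocks[:0]
	for _, b := range u.blocks {
		if b <= finalBlockHeight {
			delete(u.undo, b) // final: can no longer be reverted
		} else {
			kept = append(kept, b)
		}
	}
	u.blocks = kept
}

// Undo handles a BlockUndoSignal-like message: every block above
// lastValidBlock is reverted, newest first.
func (u *UndoBuffer) Undo(lastValidBlock uint64) {
	for len(u.blocks) > 0 {
		b := u.blocks[len(u.blocks)-1]
		if b <= lastValidBlock {
			break
		}
		for k, old := range u.undo[b] {
			if old == nil {
				delete(u.state, k)
			} else {
				u.state[k] = *old
			}
		}
		delete(u.undo, b)
		u.blocks = u.blocks[:len(u.blocks)-1]
	}
}

func main() {
	u := NewUndoBuffer()
	u.Apply(100, map[string]string{"balance": "10"}, 98)
	u.Apply(101, map[string]string{"balance": "15"}, 99)
	u.Apply(102, map[string]string{"balance": "7", "owner": "bob"}, 99)
	u.Undo(100)          // reorg: blocks 101 and 102 were forked out
	fmt.Println(u.state) // map[balance:10]
}
```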
- Added key 'f' shortcut for changing display encoding of bytes value (hex, pruned string, base64)
- Added `jq` search mode (hit `/` twice). Filters the output with the `jq` expression, and applies the search to match all blocks.
- Added search history (with `up`/`down`), similar to `less`.
- Running a search now applies it to all blocks, and highlights the matching ones in the blocks bar (in red).
- Added `O` and `P`, to jump to prev/next block with matching search results.
- Added module search with `m`, to quickly switch from module to module.
Added a basic Substreams testing framework that validates module outputs against expected values. The testing framework currently runs on the `substreams run` command, where you can specify the following flags:

- `test-file`: points to a file that contains your test specs
- `test-verbose`: enables verbose mode while testing.
The test file specifies the expected output for a given substreams module at a given block.
We changed the Substreams Protobuf definitions making a major overhaul of the RPC communication. This is a breaking change for those consuming Substreams through gRPC.
> [!NOTE]
> There are no breaking changes for Substreams developers regarding your Rust code, Substreams manifest and Substreams package.
- Removed the `Request` and `Response` messages (and related) from `sf.substreams.v1`; they have been moved to `sf.substreams.rpc.v2`. You will need to update your usage if you were consuming Substreams through gRPC.
- The new `Request` excludes fields and usages that were already deprecated, like using multiple `module_outputs`.
- The `Response` now contains a single module output
- In `development` mode, the additional modules output can be inspected under `debug_map_outputs` and `debug_store_outputs`.
Separating Tier1 vs Tier2 gRPC protocol (for Substreams server operators)
Now that the Blocks request has been moved from sf.substreams.v1 to sf.substreams.rpc.v2, the communication between a substreams instance acting as tier1 and a tier2 instance that performs the background processing has also been reworked, and put under sf.substreams.internal.v2.Stream/ProcessRange. It has also been stripped of parameters that were not used for that level of communication (ex: cursor, logs...)
- The `final_blocks_only: true` on the `Request` was not honored on the server. It now correctly sends only blocks that are final/irreversible (according to Firehose rules).
- Prevent substreams panic when requested module has unknown value for "type"
- The `substreams run` command now has flag `--final-blocks-only`
This should be the last release before a breaking change in the API and handling of the reorgs and UNDO messages.
- Added support for resolving a negative start-block on server
- CHANGED: The `run` command now resolves a start-block=-1 from the head of the chain (as supported by the servers now). Prior to this change, the `-1` value meant the 'initialBlock' of the requested module. The empty string is now used for this purpose.
- GUI: Added support for search, similar to `less`, with `/`.
- GUI: Search and output offset is conserved when switching module/block number in the "Output" tab.
- Library: protobuf message descriptors now exposed in the `manifest/package`. This is useful to any sink that would need to interpret the protobuf messages inside a Package.
- Added support for resolving a negative start-block on server (also added to run command)
- The `run` and `gui` commands no longer resolve a `start-block=-1` to the 'initialBlock' of the requested module. To get this behavior, simply assign an empty string value to the flag `start-block` instead.
- Added support for search within the Substreams gui `output` view. Usage of search within `output` behaves similar to the `less` command, and can be toggled with "/".
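The start-block and stop-block conventions above can be sketched in Go. The empty-string and `+N` behaviors follow the entries in this changelog; resolving a negative value as `head + N` and treating `0` as "no stop block" are assumptions for illustration, and `resolveStartBlock`/`resolveStopBlock` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// resolveStartBlock: "" -> the module's initialBlock; a non-negative number
// is absolute; a negative number is relative to the chain head.
// Assumption: negative N resolves to head+N (not the server's exact rule).
func resolveStartBlock(raw string, initialBlock, head uint64) (uint64, error) {
	if raw == "" {
		return initialBlock, nil
	}
	n, err := strconv.ParseInt(raw, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid start block %q: %w", raw, err)
	}
	if n >= 0 {
		return uint64(n), nil
	}
	back := uint64(-n)
	if back > head {
		return 0, fmt.Errorf("start block %d goes before the first block", n)
	}
	return head - back, nil
}

// resolveStopBlock: "+N" is relative to the resolved start block (as in
// `-t +10`); "" and "0" mean no stop block (assumption for this sketch).
func resolveStopBlock(raw string, start uint64) (uint64, error) {
	if raw == "" || raw == "0" {
		return 0, nil
	}
	if strings.HasPrefix(raw, "+") {
		n, err := strconv.ParseUint(raw[1:], 10, 64)
		if err != nil {
			return 0, fmt.Errorf("invalid relative stop block %q: %w", raw, err)
		}
		return start + n, nil
	}
	return strconv.ParseUint(raw, 10, 64)
}

func main() {
	start, err := resolveStartBlock("-100", 12_000_000, 18_000_000)
	if err != nil {
		panic(err)
	}
	stop, err := resolveStopBlock("+10", start)
	if err != nil {
		panic(err)
	}
	fmt.Println(start, stop) // 17999900 17999910
}
```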
- Release was retracted because it contained the refactoring expected for 1.1.0 by mistake, check https://github.com/streamingfast/substreams/releases/tag/v1.0.3 instead.
- Fixed "undo" messages incorrectly containing too many module outputs (all modules, with some duplicates).
- Fixed status bar message cutoff bug
- Fixed `substreams run` when `manifest` contains unknown attributes
- Fixed bubble tea program error when exiting the `run` command
- Added command `substreams gui`, providing a terminal-based GUI to inspect the streamed data. Also adds `--replay` support, to save a stream to `replay.log` and load it back in the UI later. You can use it as you would `substreams run`. Feedback welcome.
- Modified command `substreams protogen`, defaulting to generating the `mod.rs` file alongside the rust bindings. Also added `--generate-mod-rs` flag to toggle `mod.rs` generation.
- Added support for module parameterization. Defined in the manifest as:
  ```yaml
  module:
    name: my_module
    inputs:
      params: string
    ...

  params:
    my_module: "0x123123"
    "imported:module": override value from imported module
  ```
  and on the command-line as:

  ```shell
  substreams run -p module=value -p "module2=other value" ...
  ```
Servers need to be updated for packages to be able to be consumed this way.
This change keeps backwards compatibility. Old Substreams Packages will still work the same, with no changes to module hashes.
- Added support for `{version}` template in `--output-file` flag value on `substreams pack`.
- Added fuel limit to WASM execution as a server-side option, preventing the WASM process from running forever.
- Added 'Network' and 'Sink{Type, Module, Config}' fields in the manifest and protobuf definition for future bundling of substreams sink definitions within a substreams package.
- Improved execution speed and module loading speed by bumping WASM Time to version 4.0.
- Improved developer experience on the CLI by making the `<manifest>` argument optional.

  When the `<manifest>` argument is not provided, the CLI will now look in the current directory for a `substreams.yaml` file and use it if present. So if you are in your Substreams project and your file is named `substreams.yaml`, you can simply do `substreams pack`, `substreams protogen`, etc.

  Moreover, we added the possibility to pass a directory containing a `substreams.yaml` directly, so `substreams pack path/to/project` would work as long as `path/to/project` contains a file named `substreams.yaml`.
Fixed a bug that was preventing production mode to complete properly when using a bounded block range.
-
Improved overall stability of the Substreams engine.
- **Breaking** Config values `substreams-stores-save-interval` and `substreams-output-cache-save-interval` have been merged together into `substreams-cache-save-interval` in the `firehose-<chain>` repositories. Refer to the chain-specific `firehose-<chain>` repository for further details.
- The `<manifest>` can point to a directory that contains a `substreams.yaml` file instead of having to point to the file directly.
- The `<manifest>` parameter is now optional in all commands requiring it.
- Fixed valuetype mismatch for stores
- Fixed production mode not completing when block range was specified
- Fixed tier1 crashing due to missing context canceled check.
- Fixed some code paths where locking could have happened due to incorrect checking of context cancellation.
- Request validation for blockchain's input type is now made only against the requested module's transitive dependencies.
- Updated WASM Time library to 4.0.0 leading to improved execution speed.
- Remove distinction between `output-save-interval` and `store-save-interval`.
- `substreams init` has been moved under `substreams alpha init`, as this feature was included by mistake in the latest release and should not have been displayed in the main list of commands.
- `substreams codegen` has been moved under `substreams alpha codegen`, as this feature was included by mistake in the latest release and should not have been displayed in the main list of commands.
This upcoming release is going to bring significant changes to how Substreams are developed and consumed, and to their speed of execution. Note that there are no breaking changes related to your Substreams' Rust code; the only breaking changes are about how Substreams are run and the available features/flags.
Here are the highlights of elements that will change in the next release:
- Production vs Development Mode
- Single Output Module
- Output Module must be of type `map`
- `InitialSnapshots` is now a `development` mode feature only
- Enhanced Parallel Execution
In the rest of this post, we are going to go through each of them in greater detail, along with the implications they have for you. The full changelog is available after.
Warning: Operators, refer to the Operators Notes section for specific instructions on deploying this new version.
We introduce an execution mode for running Substreams: either production mode or development mode. The execution mode impacts how the Substreams get executed, specifically:
- The time to first byte
- The module logs and outputs sent back to the client
- How parallel execution is applied through the requested range
The differences between the modes are:
- In `development` mode, the client receives all the logs of the executed `modules`. In `production` mode, logs are not available at all.
- In `development` mode, modules are always re-executed from the request's start block, meaning that logs will always be visible to the user. In `production` mode, if a module's output is found in cache, module execution is skipped completely and data is returned directly.
- In `development` mode, only backward parallel execution can be effective. In `production` mode, both backward parallel execution and forward parallel execution can be effective. See the Enhanced parallel execution section for further details about parallel execution.
- In `development` mode, every module's output is returned in the response, but only the root module's output is displayed by default in the `substreams` CLI (configurable via a flag). In `production` mode, only the root module's output is returned.
- In `development` mode, you may request specific `store` snapshots that are in the execution tree via the `substreams` CLI `--debug-modules-initial-snapshots` flag. In `production` mode, this feature is not available.
The execution mode is specified at the gRPC request level, and the default mode is `development`. The `substreams` CLI tool is first and foremost a development tool, so we do not expect people to activate production mode (`-p`) when using it, except perhaps for testing purposes.
If you have sink code that makes the gRPC request itself and you use it for production consumption, ensure that the field `production_mode` in your Substreams request is set to `true`. StreamingFast-provided sinks like `substreams-sink-postgres`, `substreams-sink-files` and others have already been updated to use `production_mode` by default.
As a final note, we recommend running production mode against a compiled `.spkg` file that should ideally be released and versioned. This ensures stable module hashes and lets you leverage cached output properly.
We now support only 1 output module when running a Substreams, whereas prior to this release it was possible to have multiple ones.
- Only a single module can now be requested. The previous version allowed requesting N modules.
- Only a `map` module can now be requested. The previous version allowed `map` and `store` to be requested.
- `InitialSnapshots` is now forbidden in `production` mode and still allowed in `development` mode.
- In `development` mode, the server sends back output for all executed modules (by default the CLI displays only the requested module's output).
Note: We added `output_module` to the Substreams request and kept `output_modules` to remain backwards compatible for a while. If an `output_module` is specified, we honor that module. If not, we check `output_modules` to ensure there is only 1 output module. In a future release, we are going to remove `output_modules` altogether.
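The resolution rule in the note above, together with the development-only snapshot restriction, can be sketched as a small validation function. This is a minimal illustration under assumptions. `Request` is a hypothetical, simplified struct whose field names mirror the ones mentioned in this post. It is not the generated Protobuf type, and the real server-side validation may differ.

```rust
/// Hypothetical, simplified view of a Substreams request (NOT the generated Protobuf type).
#[derive(Debug, Default)]
pub struct Request {
    pub output_module: String,
    pub output_modules: Vec<String>,
    pub production_mode: bool,
    pub debug_initial_store_snapshot_for_modules: Vec<String>,
}

/// Resolves the single module to run, following the rules described in this post.
pub fn resolve_output_module(req: &Request) -> Result<String, String> {
    // Initial store snapshots are a development-mode-only debugging feature.
    if req.production_mode && !req.debug_initial_store_snapshot_for_modules.is_empty() {
        return Err("initial store snapshots are not available in production mode".to_string());
    }
    // The new `output_module` field takes precedence when it is set.
    if !req.output_module.is_empty() {
        return Ok(req.output_module.clone());
    }
    // Backwards compatibility: fall back to `output_modules`, which must hold exactly 1 entry.
    match req.output_modules.as_slice() {
        [single] => Ok(single.clone()),
        other => Err(format!("expected exactly 1 output module, got {}", other.len())),
    }
}
```

A legacy request that sets only `output_modules = ["map_pools"]` still resolves to `map_pools`. A request listing two modules is rejected.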
With the introduction of development vs production mode, we changed a behavior to reduce the friction this change has on debugging. In development mode, the output of all executed modules is sent back to the user. This includes the requested output module as well as all its dependencies. The `substreams` CLI has been adjusted to show only the output of the requested output module by default. The new `substreams` CLI flag `--debug-modules-output` can be used to control which modules' output is actually displayed by the CLI.
Migration Path: If you are currently requesting more than one module, refactor your Substreams code so that a single `map` module aggregates all the required information from your different dependencies into one output.
It is now forbidden to request a store module as the output module of the Substreams request. The requested output module must now be of kind `map`. Several factors motivated this change:
- Recently, we have seen incorrect usage of `store` modules. A `store` module was not intended to be used as persistent long-term storage. `store` modules were conceived as a place to aggregate data for later steps in the computation. Using one as persistent storage makes the store unmanageable.
- We had always expected users to consume a `map` module returning data formatted according to a final `sink` spec, which then permanently stores the extracted data. We never envisioned `store` acting as long-term storage.
- Forward parallel execution does not support a `store` as its last step.
Migration Path: If you are currently using a `store` module as your output module, you will need to create a `map` module that takes the `deltas` of said `store` module as input and returns the deltas.
Let's assume a Substreams with these dependencies: [block] --> [map_pools] --> [store_pools] --> [map_transfers]
- Running `substreams run substreams.yaml map_transfers` will only print the outputs and logs from the `map_transfers` module.
- Running `substreams run substreams.yaml map_transfers --debug-modules-output=map_pools,map_transfers,store_pools` will print the outputs of those 3 modules.
Now that a store cannot be requested as the output module, making `InitialSnapshots` available no longer made sense. Moreover, we have seen people using it to retrieve the initial state and then continue syncing. While it's a fair use case, we always wanted people to perform the synchronization using the streaming primitive and not by using a `store` as long-term storage.
However, `InitialSnapshots` is a useful tool for debugging what a store contains at a given block. So we decided to keep it in development mode only, where you can request the snapshot of a store module as part of your request. In the Substreams request/response, `initial_store_snapshot_for_modules` has been renamed to `debug_initial_store_snapshot_for_modules`, `snapshot_data` to `debug_snapshot_data` and `snapshot_complete` to `debug_snapshot_complete`.
Migration Path: If you were relying on the `InitialSnapshots` feature in production, you will need to create a `map` module that takes the `deltas` of said `store` module as input, and then synchronize the full state on the consuming side.
Let's assume a Substreams with these dependencies: [block] --> [map_pools] --> [store_pools] --> [map_transfers]
- Running `substreams run substreams.yaml map_transfers -s 1000 -t +5 --debug-modules-initial-snapshots=store_pools` will print all the entries in `store_pools` at block 999, then continue with outputs and logs from `map_transfers` in blocks 1000 to 1004.
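On the consuming side, synchronizing the full state amounts to replaying the store deltas emitted by your `map` module, in order, into your own storage. This is a minimal sketch under assumptions. `Operation` and `StoreDelta` are hypothetical, simplified types that keep only an operation, a key and a new value. The real delta messages may carry more fields, and your sink's storage would replace the `HashMap`.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified delta operation kinds.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Operation {
    Create,
    Update,
    Delete,
}

/// Hypothetical, simplified store delta as forwarded by a `map` module.
#[derive(Debug, Clone)]
pub struct StoreDelta {
    pub operation: Operation,
    pub key: String,
    pub new_value: Vec<u8>,
}

/// Applies deltas, in order, to a local key/value copy of the store.
pub fn apply_deltas(state: &mut HashMap<String, Vec<u8>>, deltas: &[StoreDelta]) {
    for d in deltas {
        match d.operation {
            // Creates and updates both overwrite the key with its new value.
            Operation::Create | Operation::Update => {
                state.insert(d.key.clone(), d.new_value.clone());
            }
            // Deletes drop the key from the local copy.
            Operation::Delete => {
                state.remove(&d.key);
            }
        }
    }
}
```

Order matters here. Applying an `Update` before the matching `Create` would still produce the same final value, but applying a `Delete` out of order would not.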
Parallel execution can happen in 2 ways: backward or forward.
Backward parallel execution consists of executing, in parallel, block ranges from the module's start block up to the start block of the request. If the start block of the request matches the module's start block, there is no backward parallel execution to perform. Also, this happens only for dependencies of type `store`. If you depend only on other `map` modules, no backward parallel execution happens.
Forward parallel execution consists of executing, in parallel, block ranges from the start block of the request up to the last known final block (a.k.a. the irreversible block) or the stop block of the request, whichever is smaller. Forward parallel execution significantly improves the performance of the Substreams, as we execute your module in advance through the chain history in parallel. What we stream back to you is the cached output of your module's execution, which essentially means that we stream back data written in flat files. This gives a major performance boost because, in almost all cases, the data will already be ready for you to consume.
Forward parallel execution happens only in production mode. It is always disabled in development mode. Moreover, since we read data back from cache, the logs of your modules are never accessible, as we do not store them.
Backward parallel execution still occurs in both development and production mode. The diagram below gives details about when parallel execution happens.
You can see that in production mode, parallel execution happens before the Substreams request range as well as within the requested range. In development mode, parallel execution happens only before the Substreams request range. That is, it runs between the module's start block and the start block of the requested range (backward parallel execution only).
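The range rules described above can be summarized in a small planning function. This is an illustrative sketch only, not the scheduler's actual code. The names `ExecutionPlan`, `plan_parallel_ranges`, `split_segments` and `segment_size` are hypothetical. Block ranges are assumed to be half-open `[start, end)`, and the final block is assumed to act as an exclusive bound like the stop block. Segments are aligned to multiples of `segment_size` purely for illustration.

```rust
use std::ops::Range;

/// Hypothetical summary of which block ranges may be processed in parallel.
#[derive(Debug, PartialEq)]
pub struct ExecutionPlan {
    /// Module start block up to request start block (store dependencies only).
    pub backward: Option<Range<u64>>,
    /// Request start block up to min(final block, stop block), production mode only.
    pub forward: Option<Range<u64>>,
}

pub fn plan_parallel_ranges(
    module_start: u64,
    request_start: u64,
    stop_block: u64,  // exclusive
    final_block: u64, // treated as an exclusive bound (assumption)
    has_store_deps: bool,
    production_mode: bool,
) -> ExecutionPlan {
    // Backward: only with store dependencies, and only if the request starts after the module does.
    let backward = (has_store_deps && request_start > module_start).then(|| module_start..request_start);
    // Forward: production mode only, bounded by whichever of final block and stop block is smaller.
    let forward_end = stop_block.min(final_block);
    let forward = (production_mode && forward_end > request_start).then(|| request_start..forward_end);
    ExecutionPlan { backward, forward }
}

/// Splits a range into segments whose boundaries fall on multiples of `segment_size`.
pub fn split_segments(range: Range<u64>, segment_size: u64) -> Vec<Range<u64>> {
    assert!(segment_size > 0, "segment_size must be positive");
    let mut out = Vec::new();
    let mut start = range.start;
    while start < range.end {
        // Next aligned boundary, clamped to the end of the range.
        let end = ((start / segment_size + 1) * segment_size).min(range.end);
        out.push(start..end);
        start = end;
    }
    out
}
```

With `module_start = 100`, `request_start = 1000`, `stop_block = 2000` and `final_block = 1500`, development mode yields only the backward range `100..1000`. Production mode additionally yields the forward range `1000..1500`.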
The state output format for map and store modules has changed internally to be more compact in Protobuf format. When deploying this new version, previously existing state files should be deleted, or the deployment updated to point to a new store location. The state output store is defined by the `--substreams-state-store-url` flag on the chain-specific binary (e.g. `fireeth`).
- Added `production_mode` to Substreams Request
- Added `output_module` to Substreams Request
- Fixed `Ctrl-C` not working directly when in TUI mode.
- Added `Trace ID` printing once available.
- Added command `substreams tools analytics store-stats` to get statistics for a given store.
- Added `--debug-modules-output` (comma-separated module names) (unavailable in `production` mode).
- Breaking: Renamed flag `--initial-snapshots` to `--debug-modules-initial-snapshots` (comma-separated module names) (unavailable in `production` mode).
- Moved Rust modules to `github.com/streamingfast/substreams-rs`
- Significantly improved execution time when saving and loading stores during the squashing process, by leveraging vtprotobuf.
- Added XDS support for tier 2s
- Added intrinsic support for type `bigdecimal`, will deprecate `bigfloat`.
- Significant improvements in code coverage and full integration tests.
- Added `substreams tools proxy <package>` subcommand to allow calling Substreams with a pre-defined package easily from a web browser using bufbuild/connect-web.
- Lowered gRPC client keep-alive frequency, to prevent the "Too Many Pings" disconnection issue.
- Added a fast failure when attempting to connect to an unreachable substreams endpoint.
- CLI is now able to read `.spkg` from `gs://`, `s3://` and `az://` URLs (the URL format must be supported by our dstore library).
- Command `substreams pack` is now restricted to local manifest files.
- Added command `substreams tools module` to introspect a store state in storage.
- Made changes to allow the `substreams` CLI to run on Windows OS (thanks @robinbernon).
- Added flag `--output-file <template>` to the `substreams pack` command to control where the `.spkg` is written. `{manifestDir}` and `{spkgDefaultName}` can be used in the `template` value. `{manifestDir}` resolves to the manifest's directory. `{spkgDefaultName}` is the pre-computed default name in the form `<name>-<version>`, where `<name>` is the manifest's `package.name` value (`_` characters in the name are replaced by `-`) and `<version>` is the `package.version` value.
- Fixed relative path not resolved correctly against manifest's location in `protobuf.files` list.
- Fixed relative path not resolved correctly against manifest's location in `binaries` list.
- `substreams protogen <package> --output-path <path>` flag is now relative to `<package>` if `<package>` is a local manifest file ending with `.yaml`.
- Endpoint's port is now validated. Previously, an unspecified port produced an endless 'Connecting...' message that never resolved.
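The `--output-file` template expansion described above can be sketched as two small functions. This is an illustration under assumptions, not the CLI's code. The function names are hypothetical, and only the two documented placeholders are handled.

```rust
/// Computes the `<name>-<version>` default package name, replacing `_` with `-` in the name.
pub fn spkg_default_name(package_name: &str, package_version: &str) -> String {
    format!("{}-{}", package_name.replace('_', "-"), package_version)
}

/// Expands the `{manifestDir}` and `{spkgDefaultName}` placeholders of an output template.
pub fn expand_output_template(template: &str, manifest_dir: &str, default_name: &str) -> String {
    template
        .replace("{manifestDir}", manifest_dir)
        .replace("{spkgDefaultName}", default_name)
}
```

For a hypothetical package named `uniswap_v3` at version `v0.1.0`, the template `{manifestDir}/dist/{spkgDefaultName}.spkg` with the manifest in `/work/uni` would expand to `/work/uni/dist/uniswap-v3-v0.1.0.spkg`.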
- Fixed error when importing `http/https` `.spkg` files in `imports` section.
New `updatePolicy` `append`, which allows one to build a store that concatenates values and supports parallelism. This affects the server, the manifest format (additive only), the `substreams` crate and the generated code therein.
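One way to see why concatenation supports parallelism: partial stores built independently for consecutive block segments can be merged by appending each key's values in segment order. The result is the same as appending everything sequentially. This is a minimal sketch of that idea using a hypothetical in-memory `AppendStore` type. It is not the `substreams` crate's store API, and it does not describe how the server actually merges segments.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory append store: each key maps to the concatenation of its appended bytes.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct AppendStore {
    values: BTreeMap<String, Vec<u8>>,
}

impl AppendStore {
    /// Appends `bytes` to whatever is already stored under `key`.
    pub fn append(&mut self, key: &str, bytes: &[u8]) {
        self.values.entry(key.to_string()).or_default().extend_from_slice(bytes);
    }

    /// Merges a store built for the *next* block segment into this one.
    /// Concatenation is associative, so merging segment results in order
    /// gives the same result as appending everything sequentially.
    pub fn merge_next(&mut self, next: &AppendStore) {
        for (key, bytes) in &next.values {
            self.append(key, bytes);
        }
    }

    /// Returns the concatenated bytes stored under `key`, if any.
    pub fn get(&self, key: &str) -> Option<&[u8]> {
        self.values.get(key).map(|v| v.as_slice())
    }
}
```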
- Store API methods now accept `key` of type `AsRef<str>`, which means, for example, that both `String` and `&str` are accepted as inputs in:
  - `StoreSet::set`
  - `StoreSet::set_many`
  - `StoreSet::set_if_not_exists`
  - `StoreSet::set_if_not_exists_many`
  - `StoreAddInt64::add`
  - `StoreAddInt64::add_many`
  - `StoreAddFloat64::add`
  - `StoreAddFloat64::add_many`
  - `StoreAddBigFloat::add`
  - `StoreAddBigFloat::add_many`
  - `StoreAddBigInt::add`
  - `StoreAddBigInt::add_many`
  - `StoreMaxInt64::max`
  - `StoreMaxFloat64::max`
  - `StoreMaxBigInt::max`
  - `StoreMaxBigFloat::max`
  - `StoreMinInt64::min`
  - `StoreMinFloat64::min`
  - `StoreMinBigInt::min`
  - `StoreMinBigFloat::min`
  - `StoreAppend::append`
  - `StoreAppend::append_bytes`
  - `StoreGet::get_at`
  - `StoreGet::get_last`
  - `StoreGet::get_first`
- Low-level state methods now accept `key` of type `AsRef<str>`, which means, for example, that both `String` and `&str` are accepted as inputs in:
  - `state::get_at`
  - `state::get_last`
  - `state::get_first`
  - `state::set`
  - `state::set_if_not_exists`
  - `state::append`
  - `state::delete_prefix`
  - `state::add_bigint`
  - `state::add_int64`
  - `state::add_float64`
  - `state::add_bigfloat`
  - `state::set_min_int64`
  - `state::set_min_bigint`
  - `state::set_min_float64`
  - `state::set_min_bigfloat`
  - `state::set_max_int64`
  - `state::set_max_bigint`
  - `state::set_max_float64`
  - `state::set_max_bigfloat`
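Taking a generic `K: AsRef<str>` key is what lets callers pass an owned `String`, a `&String` or a `&str` without converting first. The toy below illustrates that pattern only. `KeyValueStore` is a hypothetical type that mimics just the key parameter of a `set`/`get`-style method. It omits the other arguments the real store methods take.

```rust
use std::collections::HashMap;

/// Toy store used only to illustrate `AsRef<str>` keys (hypothetical, not the crate's API).
#[derive(Default)]
pub struct KeyValueStore {
    values: HashMap<String, i64>,
}

impl KeyValueStore {
    /// Accepts any key type that can be viewed as `&str`: `String`, `&String`, `&str`, ...
    pub fn set<K: AsRef<str>>(&mut self, key: K, value: i64) {
        self.values.insert(key.as_ref().to_string(), value);
    }

    /// Looks up a key using the same flexible key type.
    pub fn get<K: AsRef<str>>(&self, key: K) -> Option<i64> {
        self.values.get(key.as_ref()).copied()
    }
}
```

Callers can therefore write `store.set(format!("token:{}", id), v)` or `store.set("token:7", v)` interchangeably.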
- Bumped `prost` (and related dependencies) to `^0.11.0`
- Environment variables are now accepted in manifest's `imports` list.
- Environment variables are now accepted in manifest's `protobuf.importPaths` list.
- Fixed relative path not resolved correctly against manifest's location in `imports` list.
- Changed the output modes: `module-*` modes are gone and become the format for `jsonl` and `json`. This means all printed outputs are wrapped to provide the module name, and other metadata.
- Added `--initial-snapshots` (or `-i`) to the `run` command, which will dump the stores specified as output modules.
- Added color for `ui` output mode under a tty.
- Added some request validation on both client and server (validate that output modules are present in the modules graph)
- Added support to serve the initial snapshot
- Changed `substreams manifest info` -> `substreams info`
- Changed `substreams manifest graph` -> `substreams graph`
- Updated usage
- Multiple fixes to boundaries
- Various bug fixes around store and parallel execution.
- Fix null pointer exception at the end of CLI run in some cases.
- No longer log the last error when the CLI exits with an error, as the error is already printed to the user and logging it again creates a weird behavior.
- Ensure arguments can be passed to Docker built image.
- Various bug fixes around store and parallel execution.
- Fixed logs being repeated on modules with inputs that were receiving nothing.
- Added `substreams::hex` wrapper around `hex_literal::hex` macro
- Added `substreams run -o ui|json|jsonl|module-json|module-jsonl`.
- Fixed a whole bunch of issues in parallel processing. More stable caching. See chain-specific releases.
- Fixed `substreams` crate usage from tagged version published on crates.io.
- Changed `startBlock` to `initialBlock` in `substreams.yaml` manifests.
- `code:` is now defined in the `binaries` section of the manifest, instead of in each module. A module can select which binary to use with the `binary:` field on the Module definition.
- Added `substreams inspect ./substreams.yaml` or `inspect some.spkg` to see what's inside. Requires `protoc` to be installed (which you should have anyway).
- Added command `substreams protogen` that writes a temporary `buf.gen.yaml` and generates Rust structs based on the contents of the provided manifest or package.
- Added `substreams::handlers` macros to reduce boilerplate when creating Substreams modules.

  `substreams::handlers::map` is used for the handlers corresponding to modules of type `map`. Modules of type `map` should return a `Result` where the error is of type `Error`:

  ```rust
  /// Map module example
  #[substreams::handlers::map]
  pub fn map_module_func(blk: eth::Block) -> Result<erc721::Transfers, Error> {
      // Module logic elided.
      todo!()
  }
  ```

  `substreams::handlers::store` is used for the handlers corresponding to modules of type `store`. Modules of type `store` should have no return value:

  ```rust
  /// Store module example
  #[substreams::handlers::store]
  pub fn store_module(
      transfers: erc721::Transfers,
      s: store::StoreAddInt64,
      pairs: store::StoreGet,
      tokens: store::StoreGet,
  ) {
      // Module logic elided.
      todo!()
  }
  ```
- Implemented packages (see docs).
- Added `substreams::Hex` wrapper type to more easily deal with printing and encoding bytes to hexadecimal strings.
- Added `substreams::log::info!(...)` and `substreams::log::debug!(...)` supporting formatting arguments (acts like the `println!()` macro).
- Added new field `logs_truncated` that can be used to determine if logs were truncated.
- Increased the logs truncation limit to 128 KiB per module per block.
- Updated `substreams run` to properly report module progress errors.
- When a module's WASM execution errors out, progress with failure logs is now returned before closing the Substreams connection.
- The API token is no longer passed if the connection is using the plain-text option `--plaintext`.
- The `-c` (or `--compact-output`) flag can be used to print JSON as a single compact line.
- The `--stop-block` flag on `substreams run` can be defined as `+1000` to stream up to start block + 1000.
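A relative `--stop-block` value is resolved against the start block. The following is a minimal sketch of that rule, not the CLI's parsing code, and the function name is hypothetical. It treats the resulting stop block as exclusive, matching the `-s 1000 -t +5` example earlier in this post, which streams blocks 1000 to 1004.

```rust
use std::num::ParseIntError;

/// Resolves a stop block given either an absolute value ("5000") or a value
/// relative to the start block ("+1000"). The returned stop block is exclusive.
pub fn resolve_stop_block(value: &str, start_block: u64) -> Result<u64, ParseIntError> {
    match value.strip_prefix('+') {
        // Relative form: stop = start + N.
        Some(delta) => Ok(start_block + delta.parse::<u64>()?),
        // Absolute form: use the value as-is.
        None => value.parse::<u64>(),
    }
}
```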
- Added Dockerfile support.
- Improved defaults for `--proto-path` and `--proto`, using globs.
- WASM file paths in `substreams.yaml` manifests now resolve relative to the location of the yaml file.
- Added `substreams manifest package` to create `.pb` packages to simplify querying using other languages. See the python example.
- Added `substreams manifest graph` to show the Mermaid graph alone.
- Improved mermaid graph layout.
- Removed native Go code support for now.
- Always writes store snapshots, every 10,000 blocks.
- A few tools to manage partial snapshots under `substreams tools`
First chain-agnostic release. THIS IS BETA SOFTWARE. USE AT YOUR OWN RISK. WE PROVIDE NO BACKWARDS COMPATIBILITY GUARANTEES FOR THIS RELEASE.
See https://github.com/streamingfast/substreams for usage docs.
- Removed `local` command. See README.md for instructions on how to run locally now. Build `sfeth` from source for now.
- Changed the `remote` command to `run`.
- Changed `run` command's `--substreams-api-key-envvar` flag to `--substreams-api-token-envvar`, and its default value is changed from `SUBSTREAMS_API_KEY` to `SUBSTREAMS_API_TOKEN`. See README.md to learn how to obtain such tokens.
