feat: add observability tooling for Flashbox images#93

Open
MoeMahhouk wants to merge 12 commits into main from moe/flashbox-observability

Conversation

Member

@MoeMahhouk MoeMahhouk commented Feb 6, 2026

This pull request introduces a comprehensive observability and monitoring stack to the project, centered around Prometheus and its exporters. It adds Prometheus, node-exporter, and process-exporter as services, configures them for system and container-level metrics collection, and sets up recording rules for aggregated metrics. The changes also include dynamic configuration improvements, firewall adjustments for metrics endpoints, and new helper scripts for environment-specific configuration.

Observability & Monitoring Integration

  • Added Prometheus, node-exporter, and process-exporter as systemd services, including installation, configuration, and service enablement for system and container monitoring (prometheus.service, node-exporter.service, process-exporter.service).
  • Introduced Prometheus configuration templates and recording rules for aggregated CPU, memory, disk, network, and container health metrics (prometheus.yml.tmpl, recording_rules.yml, process-exporter.yml).
  • Added gomplate as a build dependency to render dynamic Prometheus configuration from templates.
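
As a rough illustration of what a rendered Prometheus configuration for this setup might look like, the sketch below writes a minimal config from a shell function. It uses a plain heredoc as a stand-in for gomplate, and it is not the PR's actual template: the function name `render_prometheus_config`, the job names, the 15s interval, the rules file path, and the targets are all assumptions (9100 and 9256 are the upstream default ports for node-exporter and process-exporter).

```shell
# Hypothetical stand-in for the gomplate-rendered prometheus.yml.tmpl.
# $1 is the remote write URL (in the PR this comes from dynamic config).
render_prometheus_config() {
    remote_write_url="$1"
    cat <<EOF
global:
  scrape_interval: 15s

rule_files:
  - /etc/prometheus/recording_rules.yml

scrape_configs:
  - job_name: node
    static_configs:
      - targets: ["localhost:9100"]
  - job_name: process
    static_configs:
      - targets: ["localhost:9256"]

remote_write:
  - url: "${remote_write_url}"
EOF
}
```

Calling `render_prometheus_config "$METRICS_ENDPOINT" > prometheus.yml` would then produce the file, assuming `METRICS_ENDPOINT` holds the remote write URL.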

Firewall & Networking

  • Updated firewall scripts in both L1 and L2 to dynamically allow outgoing traffic to the observability metrics endpoint, using the METRICS_ENDPOINT variable loaded from configuration.
  • Adjusted searcher-firewall.service dependencies to ensure correct ordering with configuration fetching.
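
To make the firewall change more concrete, here is a minimal sketch of how a script could turn a metrics URL into an egress rule. It is not the PR's firewall code: the helper names, the use of the `OUTPUT` chain, and the assumption that `METRICS_ENDPOINT` is a URL of the form `scheme://host[:port]/path` are all illustrative. The rule is printed rather than applied, and IPv6 literals and userinfo in the URL are not handled.

```shell
# Strip scheme and path from a URL, leaving "host" or "host:port".
url_authority() {
    rest="${1#*://}"              # drop a leading "scheme://" if present
    printf '%s\n' "${rest%%/*}"   # drop everything from the first "/"
}

# Host part of the URL.
url_host() {
    authority="$(url_authority "$1")"
    printf '%s\n' "${authority%%:*}"
}

# Explicit port if present, otherwise 443 for https and 80 for anything else.
url_port() {
    authority="$(url_authority "$1")"
    case "$authority" in
        *:*) printf '%s\n' "${authority##*:}" ;;
        *)
            case "$1" in
                https://*) echo 443 ;;
                *) echo 80 ;;
            esac
            ;;
    esac
}

# Print (do not run) a rule that would allow outgoing TCP to the endpoint.
metrics_egress_rule() {
    printf 'iptables -A OUTPUT -p tcp -d %s --dport %s -j ACCEPT\n' \
        "$(url_host "$1")" "$(url_port "$1")"
}
```

A firewall script could call `metrics_egress_rule "$METRICS_ENDPOINT"` after sourcing the fetched configuration, which is why the ordering against configuration fetching matters.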

Dynamic Configuration

  • Added project-specific dynamic configuration scripts for bob-l1 and bob-l2, supporting both QEMU development and Vault-based production environments. These scripts generate environment-specific config files based on mode and available secrets.

Miscellaneous

  • Ensured correct ownership of Prometheus data directories after installation to avoid permission issues.

Together, these changes enable monitoring of both the host system and key containers, and prepare the environment for future observability enhancements.

@MoeMahhouk force-pushed the moe/flashbox-observability branch from 8f75a8f to b0557e3 on February 12, 2026 17:42
@MoeMahhouk marked this pull request as ready for review on February 13, 2026 12:45
@@ -0,0 +1,5 @@
process_names:
# Monitor the searcher container (conmon + all children via --children flag)
- name: "searcher-container"
Contributor

Should we also monitor Lighthouse in bob-l1?

Member Author

If Lighthouse needs dedicated monitoring in the bob-l1 image, that config should live in the bob-l1 directory and extend this configuration if possible. WDYT?

Comment on lines +1 to +49
#!/bin/sh
set -eu

# Project-specific dynamic configuration for bob-l1
# Called by fetch-config.sh with mode (qemu/vault) and config path

MODE="$1"
CONFIG_PATH="$2"

if [ "$MODE" = "qemu" ]; then
    # Local QEMU development configuration
    # GATEWAY is exported by the common fetch-config.sh
    cat <<EOF >> "$CONFIG_PATH"
CONFIG_NETWORK_ID='1'
CONFIG_NETWORK_NAME='mainnet'
CONFIG_JWT_SECRET='1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef'
CONFIG_CL_STATIC_PEERS=''
CONFIG_EL_STATIC_PEERS='enode://abc123@${GATEWAY}:30303'
CONFIG_TITAN_IP='52.207.17.217'
CONFIG_FLASHBOTS_BUNDLE_1='18.221.59.61'
CONFIG_FLASHBOTS_BUNDLE_2='3.15.88.156'
CONFIG_FLASHBOTS_TX_STREAM_1='3.136.107.142'
CONFIG_FLASHBOTS_TX_STREAM_2='3.149.14.12'
EOF

elif [ "$MODE" = "vault" ]; then
    # Production configuration from Vault
    # get_data_value and get_ips_from_uris are exported by fetch-config.sh

    # For bob-l1, we might not have Vault set up yet
    # This is a placeholder for when Vault integration is added
    echo "Warning: Vault configuration not yet implemented for bob-l1"
    echo "Using environment variables or defaults..."

    # You can add Vault-based configuration here when ready
    # For now, we can use environment variables as fallback
    cat <<EOF >> "$CONFIG_PATH"
CONFIG_NETWORK_ID='${CONFIG_NETWORK_ID:-1}'
CONFIG_NETWORK_NAME='${CONFIG_NETWORK_NAME:-mainnet}'
CONFIG_JWT_SECRET='${CONFIG_JWT_SECRET:-}'
CONFIG_CL_STATIC_PEERS='${CONFIG_CL_STATIC_PEERS:-}'
CONFIG_EL_STATIC_PEERS='${CONFIG_EL_STATIC_PEERS:-}'
CONFIG_TITAN_IP='${CONFIG_TITAN_IP:-52.207.17.217}'
CONFIG_FLASHBOTS_BUNDLE_1='${CONFIG_FLASHBOTS_BUNDLE_1:-18.221.59.61}'
CONFIG_FLASHBOTS_BUNDLE_2='${CONFIG_FLASHBOTS_BUNDLE_2:-3.15.88.156}'
CONFIG_FLASHBOTS_TX_STREAM_1='${CONFIG_FLASHBOTS_TX_STREAM_1:-3.136.107.142}'
CONFIG_FLASHBOTS_TX_STREAM_2='${CONFIG_FLASHBOTS_TX_STREAM_2:-3.149.14.12}'
EOF
fi
Contributor

Do we need this file for L1?

Member Author

Currently not, because everything is hardcoded. But if we want to unify the approaches for L1 and L2, it should be included. Also, the remote write URL is currently fetched dynamically from Vault for both L1 and L2.
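
For readers unfamiliar with the fallback used in the vault branch of the script under discussion, the sketch below isolates the `${VAR:-default}` expansion inside an unquoted heredoc. The function name `emit_fallback_config` is hypothetical. The two variables and the default IP are taken from the script.

```shell
# Emit two config lines the same way the vault branch does:
# use the environment value when set and non-empty, else the default.
emit_fallback_config() {
    cat <<EOF
CONFIG_TITAN_IP='${CONFIG_TITAN_IP:-52.207.17.217}'
CONFIG_JWT_SECRET='${CONFIG_JWT_SECRET:-}'
EOF
}
```

Two properties are worth noting. The `:-` form treats an empty variable the same as an unset one, so `CONFIG_TITAN_IP=''` still yields the default. With an empty default, as for `CONFIG_JWT_SECRET`, a missing secret is written out as an empty string without any error, even under `set -u`.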

pablin-10 and others added 2 commits February 26, 2026 01:33
```
/usr/bin/fetch-config.sh: 136: export: Illegal option -f
```
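
The message above is what dash, the default `/bin/sh` on Debian-based systems, prints for `export -f`. That option is a bash extension and not part of POSIX sh. How these commits resolved it is not shown here. One portable pattern, sketched below with hypothetical names (`HELPERS_LIB`, `describe_mode`, `run_child`), is to keep shared helpers in a file and have each child script source it with `.`:

```shell
# Shared helpers live in a file instead of being exported with `export -f`.
lib="$(mktemp)"
cat > "$lib" <<'EOF'
describe_mode() {
    printf 'mode=%s\n' "$1"
}
EOF

# A child script that loads the helpers itself; works under any POSIX sh.
child="$(mktemp)"
cat > "$child" <<'EOF'
#!/bin/sh
set -eu
. "$HELPERS_LIB"
describe_mode "$1"
EOF

# The parent passes only the library path through the environment.
run_child() {
    HELPERS_LIB="$lib" sh "$child" "$1"
}
```

Only plain variables cross the process boundary here, so the approach does not depend on which shell `/bin/sh` points to.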
Description=Searcher Network and Firewall Rules
After=network.target network-setup.service
Requires=network-setup.service
After=network.target network-setup.service fetch-config.service
Contributor

Why do we need this?

Member Author

Because we need to fetch the config first: it contains the Prometheus remote write URL, which is then included in the firewall configuration.

Comment on lines +10 to +24
if [ "$MODE" = "qemu" ]; then
    # Local QEMU development configuration
    # GATEWAY is exported by the common fetch-config.sh
    cat <<EOF >> "$CONFIG_PATH"
CONFIG_NETWORK_ID='12345'
CONFIG_NETWORK_NAME='local-testnet'
CONFIG_JWT_SECRET='1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef'
CONFIG_EL_STATIC_PEERS='enode://abc123@${GATEWAY}:30303'
CONFIG_EL_PEERS_IPS='${GATEWAY}'
CONFIG_SIMULATOR_RPC_URL='http://${GATEWAY}:8545'
CONFIG_SIMULATOR_WS_URL='ws://${GATEWAY}:8546'
CONFIG_SIMULATOR_IP='${GATEWAY}'
EOF

elif [ "$MODE" = "vault" ]; then
Contributor

@niccoloraspa niccoloraspa Feb 26, 2026

I think we should find a better approach than this if/elif chain.

Member Author

I agree; that depends on the merge efforts.
Note: this logic isn't introduced in this PR. It was moved from the bob-l2 image to the shared bob-common base so that both L1 and L2 can use it.
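
Purely as an illustration of what an alternative to the if/elif chain could look like (nothing in this thread proposes it), the sketch below dispatches on the mode with a `case` statement and keeps one writer function per mode. All function names are hypothetical and the config bodies are trimmed to two keys. An unknown mode fails loudly instead of silently writing nothing.

```shell
# One writer per mode; each appends its settings to the file given as $1.
write_qemu_config() {
    cat <<EOF >> "$1"
CONFIG_NETWORK_ID='12345'
CONFIG_NETWORK_NAME='local-testnet'
EOF
}

write_vault_config() {
    cat <<EOF >> "$1"
CONFIG_NETWORK_ID='${CONFIG_NETWORK_ID:-1}'
CONFIG_NETWORK_NAME='${CONFIG_NETWORK_NAME:-mainnet}'
EOF
}

# Dispatch on the mode; reject anything that is not explicitly supported.
write_config() {
    mode="$1"
    config_path="$2"
    case "$mode" in
        qemu)  write_qemu_config "$config_path" ;;
        vault) write_vault_config "$config_path" ;;
        *)
            echo "unknown mode: $mode" >&2
            return 1
            ;;
    esac
}
```

Adding a new mode then means adding one function and one `case` arm, without touching the existing branches.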

4 participants