Skip to content

Latest commit

 

History

History
188 lines (145 loc) · 6.71 KB

File metadata and controls

188 lines (145 loc) · 6.71 KB

AWS Lambda

AWS Lambda function for processing ICESat-2 ATL06 data by morton cell.

Overview

The Lambda function processes a single morton cell (order 6) by:

  1. Reading HDF5 files directly from S3 using h5coro (no downloads)
  2. Spatial filtering using morton indexing
  3. Calculating summary statistics for child cells (order 12)
  4. Writing xdggs-enabled Zarr to S3

Architecture

┌─────────────────────────────────────────────────────────────┐
│  Lambda Function (process-morton-cell)                      │
│  ──────────────────────────────────────────────────────────  │
│  Runtime: Python 3.12                                       │
│  Memory: 2048 MB (2 GB)                                     │
│  Timeout: 720s (12 minutes)                                 │
│  ──────────────────────────────────────────────────────────  │
│  Code (~5 MB):                                              │
│    - deployment/aws/lambda_handler.py (AWS wrapper)         │
│    - src/magg/ package (processing, auth, catalog)          │
│  ──────────────────────────────────────────────────────────  │
│  Layer (~70 MB compressed, ~240 MB uncompressed):           │
│    - numpy, pandas, h5coro, mortie, healpy                  │
│    - fastparquet, cramjam, shapely, astropy, earthaccess    │
│    - pydantic-zarr, zarr, obstore, pyarrow                  │
└─────────────────────────────────────────────────────────────┘

Files

File Purpose
deployment/aws/lambda_handler.py AWS Lambda wrapper function
src/magg/processing.py Cloud-agnostic core processing logic
src/magg/auth.py NASA Earthdata authentication helper
src/magg/catalog.py CMR granule catalog builder
deployment/aws/invoke_lambda.py Orchestration script
deployment/aws/build_arm64_layer.sh ARM64 Lambda layer build script

Event Payload

{
  "parent_morton": 123456,
  "parent_order": 6,
  "child_order": 12,
  "granule_urls": [
    "s3://nsidc-cumulus-prod-protected/ATLAS/ATL06/007/2023/12/18/...",
    "s3://nsidc-cumulus-prod-protected/ATLAS/ATL06/007/2023/12/19/..."
  ],
  "store_path": "s3://your-output-bucket/atl06/production.zarr",
  "s3_credentials": {
    "accessKeyId": "ASIA...",
    "secretAccessKey": "...",
    "sessionToken": "..."
  }
}

Parameters

Parameter Type Required Description
parent_morton int Yes Morton index of parent cell (order 6)
parent_order int Yes Order of parent cell (typically 6)
child_order int Yes Order of child cells for statistics (typically 12)
granule_urls list Yes Pre-computed list of S3 URLs from catalog
store_path str Yes Output Zarr store path (e.g. s3://bucket/prefix.zarr)
s3_credentials dict Yes NSIDC S3 credentials for reading source data

S3 Credentials

Credentials are obtained by the orchestrator once before invoking Lambda functions:

from magg.auth import get_nsidc_s3_credentials

# Get credentials (valid for ~1 hour)
s3_creds = get_nsidc_s3_credentials()

# Pass to each Lambda invocation
event = {
    "parent_morton": -6134114,
    "parent_order": 6,
    "child_order": 12,
    "granule_urls": [...],
    "store_path": "s3://output-bucket/atl06/production.zarr",
    "s3_credentials": s3_creds,
}

This approach avoids rate limiting from 1,872 simultaneous NASA logins and eliminates an AWS Secrets Manager dependency.

Deployment

Step 1: Create the function package

cd /path/to/magg

# Create function.zip with handler and magg package
zip -j deployment/aws/function.zip deployment/aws/lambda_handler.py && \
  cd src && zip -ur ../deployment/aws/function.zip magg/ -i "*.py" && cd ..

Step 2: Build and deploy the Lambda layer

See ARM64 Layer for building and deploying the Lambda layer.

Step 3: Create the Lambda function

aws lambda create-function \
  --function-name process-morton-cell \
  --runtime python3.12 \
  --architectures arm64 \
  --role arn:aws:iam::ACCOUNT_ID:role/lambda-execution-role \
  --handler deployment.aws.lambda_handler.lambda_handler \
  --zip-file fileb://deployment/aws/function.zip \
  --timeout 720 \
  --memory-size 2048 \
  --layers arn:aws:lambda:REGION:ACCOUNT_ID:layer:magg-layer-arm64:VERSION

Updating function code

# Re-create the zip
zip -j deployment/aws/function.zip deployment/aws/lambda_handler.py && \
  cd src && zip -ur ../deployment/aws/function.zip magg/ -i "*.py" && cd ..

# Update the Lambda function
aws lambda update-function-code \
  --function-name process-morton-cell \
  --zip-file fileb://deployment/aws/function.zip

Testing

# Build a granule catalog
uv run python -m magg.catalog --cycle 22 --parent-order 6

# Test locally first (no Lambda required)
uv run python -m magg --config atl06.yaml --catalog catalog.json \
  --store ./test.zarr --max-cells 1

# Dry run with the Lambda orchestrator
uv run python deployment/aws/invoke_lambda.py \
  --config atl06.yaml --catalog catalog.json --dry-run

Performance

Metric Value
Average execution time 2--3 minutes per cell
Maximum execution time 10 minutes
Lambda timeout 12 minutes (720s)
Configured memory 2048 MB
Typical memory usage 1--1.5 GB
Cold start 3--5 seconds

Cost Estimate

Per invocation (180s average, 2 GB memory): ~$0.006

Full run (~1,300 cells at order 6): ~$2 including S3 and CloudWatch costs.

Troubleshooting

!!! warning "Missing s3_credentials" Ensure your orchestrator script calls [get_nsidc_s3_credentials][magg.auth.get_nsidc_s3_credentials] and passes the credentials to each Lambda invocation.

!!! info "No granules found" This is normal for cells outside the data coverage area. The function returns gracefully with error: "No granules found".

!!! warning "S3 write permission denied" Check that the Lambda execution role has s3:PutObject permission for the output bucket.

!!! warning "Too many open files" Decrease max workers (e.g., --max-workers 50) or increase ulimit (ulimit -n 10000).