
Rethink event lifecycle & idempotency #36

@alexluong

Here's some pseudocode for the critical path of the event lifecycle.

Current design

Idempotency logic:

type Idempotence interface {
  Exec(ctx context.Context, key string, exec func() error) error
}
impl:
  options:
    timeoutTTL time.Duration    // "processing"
    successfulTTL time.Duration // "processed"

Exec:
  acquired = SET key "processing" NX EX timeoutTTL
  if acquired {
    err = exec()
    if err != nil {
      DEL key
      return err
    }
    SET key "processed" EX successfulTTL
    return nil
  } else {
    status = GET key
    if status == nil
      CONSIDERATION: how to handle this?
    else if status == processed
      return nil
    else (status == processing)
      time.Sleep(timeoutTTL)
      status = GET key
      if status == processed
        return nil
      else return ErrConflict
  }
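
For concreteness, here's a minimal runnable sketch of this interface backed by go-redis (github.com/redis/go-redis/v9). The status strings and the decision to treat a missing key after a lost SETNX race as a conflict are assumptions on my part, not a settled answer to the CONSIDERATION above.

package idempotence

import (
	"context"
	"errors"
	"time"

	"github.com/redis/go-redis/v9"
)

var ErrConflict = errors.New("idempotence: conflicting execution in progress")

type RedisIdempotence struct {
	client        *redis.Client
	timeoutTTL    time.Duration // lifetime of the "processing" claim
	successfulTTL time.Duration // lifetime of the "processed" marker
}

func (i *RedisIdempotence) Exec(ctx context.Context, key string, exec func() error) error {
	// Claim the key; SetNX succeeds only if no one else holds it.
	acquired, err := i.client.SetNX(ctx, key, "processing", i.timeoutTTL).Result()
	if err != nil {
		return err
	}
	if acquired {
		if err := exec(); err != nil {
			// Release the claim so a retried message can run the handler again.
			i.client.Del(ctx, key)
			return err
		}
		return i.client.Set(ctx, key, "processed", i.successfulTTL).Err()
	}

	status, err := i.client.Get(ctx, key).Result()
	if errors.Is(err, redis.Nil) {
		// Key expired between SETNX and GET (the open CONSIDERATION);
		// treating it as a conflict lets the broker redeliver.
		return ErrConflict
	}
	if err != nil {
		return err
	}
	if status == "processed" {
		return nil
	}
	// Another worker holds the "processing" claim; wait it out and re-check.
	time.Sleep(i.timeoutTTL)
	status, err = i.client.Get(ctx, key).Result()
	if err == nil && status == "processed" {
		return nil
	}
	return ErrConflict
}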

PublishMQ consumer:

idempotence.Exec(ctx, "idempotency:publishmq:"+event.ID, handler)
idempotence opts:
  assuming visibility timeout is 5s
  timeoutTTL: 5s (or 5s+1s=6s)
  successfulTTL: 10s (2x visibility timeout)

handler logic:
  destinations = listDestinations(event) // list from tenant ID, destinationID, topic, etc
  for destination in destinations {
    deliveryEvent = newDeliveryEvent(event, destination)
    err = deliverymq.Publish(deliveryEvent) // TODO: parallelize this op
    if err != nil {
      // NOTE_1
      return err
    }
  }
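
To make the consumer wiring concrete, here's a sketch using the RedisIdempotence type above; handlePublishEvent and the msg.Ack/msg.Nack plumbing are hypothetical stand-ins for the actual consumer code.

idem := &RedisIdempotence{
	client:        redisClient,
	timeoutTTL:    6 * time.Second,  // visibility timeout (5s) + 1s buffer
	successfulTTL: 10 * time.Second, // 2x visibility timeout
}

err := idem.Exec(ctx, "idempotency:publishmq:"+event.ID, func() error {
	return handlePublishEvent(ctx, event) // the handler logic above
})
if err != nil {
	msg.Nack() // redelivered after the visibility timeout
} else {
	msg.Ack()
}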

DeliveryMQ consumer:

idempotence.Exec(ctx, "idempotency:deliverymq:"+deliveryEvent.ID, handler)
idempotence opts:
  assuming visibility timeout is 30s
  timeoutTTL: 30s (or 30s+1s=31s)
  successfulTTL: 24hr

handler logic:
  deliveryEvent.result = destination.Publish(event)
  err = logmq.Publish(deliveryEvent)
  if err != nil {
    return err
  }
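
And the same wiring for the DeliveryMQ consumer with the longer TTLs, again as a sketch; the destination/logmq call shapes follow the pseudocode above.

idem := &RedisIdempotence{
	client:        redisClient,
	timeoutTTL:    31 * time.Second, // visibility timeout (30s) + 1s buffer
	successfulTTL: 24 * time.Hour,
}

err := idem.Exec(ctx, "idempotency:deliverymq:"+deliveryEvent.ID, func() error {
	// Attempt the delivery, then record the outcome on logmq; a logmq
	// failure returns an error so the message is retried and the result
	// isn't silently dropped.
	deliveryEvent.Result = destination.Publish(ctx, event)
	return logmq.Publish(ctx, deliveryEvent)
})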

Problem & potential solution

At NOTE_1, how should we handle this error better? Say an event is supposed to be delivered to 10 different destinations. If one destination fails, then under the current logic we nack the message, and upon retry we attempt to redeliver the event to all 10 destinations again, causing duplicate deliveries to the destinations that already succeeded.

Solution:

One solution is to wrap each enqueue with an idempotence handler. We could then consider skipping the consumer-level idempotency logic, since it may no longer be necessary (a latency vs. Redis memory usage trade-off).

Here's the updated logic for PublishMQ consumer:

idempotence.Exec(ctx, "idempotency:publishmq:"+event.ID, handler) // optional, should we still?
idempotence opts:
  assuming visibility timeout is 5s
  timeoutTTL: 5s (or 5s+1s=6s)
  successfulTTL: 10s (2x visibility timeout)

+ deliveryIdempotenceOptions: 5s timeoutTTL, 1hr successfulTTL (or 24hr?)

handler logic:
  destinations = listDestinations(event) // list from tenant ID, destinationID, topic, etc
  var errs []error
  // assuming this loop is parallelized
  for destination in destinations {
    deliveryEvent = newDeliveryEvent(event, destination)
    key = "idempotence:event_destination:" + event.ID + "_" + destination.ID
    err = deliveryIdempotence.Exec(ctx, key, func() error {
      return deliverymq.Publish(deliveryEvent)
    })
    if err != nil {
      errs = append(errs, err)
    }
  }
  if len(errs) > 0 { Nack } else { Ack }
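
Assuming the loop is parallelized, a sketch with errgroup (golang.org/x/sync/errgroup) might look like the following; Event, listDestinations, newDeliveryEvent, and deliverymq.Publish are stand-ins matching the pseudocode above. A plain errgroup.Group (no shared cancellation) is used so every destination still gets attempted even if one fails.

func handlePublishEvent(ctx context.Context, event Event) error {
	destinations, err := listDestinations(ctx, event) // tenant ID, destinationID, topic, etc.
	if err != nil {
		return err
	}

	var g errgroup.Group
	for _, destination := range destinations {
		destination := destination // capture per-iteration (needed before Go 1.22)
		g.Go(func() error {
			deliveryEvent := newDeliveryEvent(event, destination)
			key := "idempotence:event_destination:" + event.ID + "_" + destination.ID
			// Pairs already enqueued on a previous attempt return nil here.
			return deliveryIdempotence.Exec(ctx, key, func() error {
				return deliverymq.Publish(ctx, deliveryEvent)
			})
		})
	}

	// Wait for all goroutines; a non-nil result nacks the message, and the
	// already-succeeded pairs are skipped by their idempotence keys on retry.
	return g.Wait()
}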

Let's evaluate the Redis key usage for one event lifecycle. Say 1 event is to be delivered to 10 destinations:

  • 1 publishmq key: idempotency:publishmq:<event_id> (10s successfulTTL - could be negligible given the short TTL)
  • 10 delivery event keys: idempotence:event_destination:<event_id>_<destination_id> (1hr successfulTTL)
  • 10 deliverymq keys: idempotency:deliverymq:<delivery_event_id> (24hr successfulTTL)
  • optionally, another 10 logmq keys as well

We could further optimize Redis memory usage with an "eventLifecycleIdempotence" implementation that shares the long-term keys across the stages of each event delivery. Instead of 1 publishmq key + 30 keys, we'd have just 1 publishmq key + 10 keys. We may need separate ephemeral "processing" keys on top of the 10 long-term (24hr) keys, but given their shorter TTL, the overhead should be negligible, all things considered.
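
One possible shape for that, purely as a sketch (all names hypothetical): key the long-term marker by event + destination and share it across the enqueue and consume stages, with a short-lived per-stage claim on top.

// Shared long-term (24hr) "processed" marker, written once per
// event/destination pair and checked by both the deliverymq enqueue
// and the deliverymq consumer.
func lifecycleKey(eventID, destinationID string) string {
	return "idempotence:event_destination:" + eventID + "_" + destinationID
}

// Ephemeral per-stage "processing" claim; its TTL tracks that stage's
// visibility timeout, so it adds little memory next to the 24hr keys.
func processingKey(stage, eventID, destinationID string) string {
	return "idempotence:processing:" + stage + ":" + eventID + "_" + destinationID
}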

Summary

Here are the next action items:

  • parallelize deliverymq enqueue ops
  • employ idempotence implementation when enqueueing to deliverymq to avoid duplicate deliveries
  • (optional) custom event lifecycle idempotence implementation to optimize Redis memory usage

Would appreciate your input on this design! Let me know if I'm missing anything and what the next steps should be. Specifically, I'd love to hear your thoughts on the TTLs for these idempotency keys.
