@@ -0,0 +1,301 @@
---
layout: docs
page_title: Configure singleton deployments
description: |-
  Declare a job that guarantees only a single instance can run at a time, with
  minimal downtime.
---

# Configure singleton deployments

A singleton deployment is one where at most one instance of a given workload
runs on the cluster at a time. You might need this if the workload needs
exclusive access to a remote resource such as a data store. Nomad does not
support singleton deployments as a built-in feature. Your workloads continue to
run even when the Nomad client agent has crashed, so ensuring there's at most
one allocation for a given workload requires some cooperation from the job.
This document describes how to implement singleton deployments.

## Design goals

The configuration described here meets these primary design goals:

- The design prevents a specific process within a task from running if there
  is another instance of that task running anywhere else on the Nomad cluster.
- Nomad should be able to recover from failure of the task or the node on which
  the task is running with minimal downtime, where "recovery" means that Nomad
  stops the original task and schedules a replacement.
- Nomad should minimize false positive detection of failures to avoid
unnecessary downtime during the cutover.

There's a tradeoff between recovery speed and false positives. The faster you
make Nomad attempt to recover from failure, the more likely it is that a
transient failure causes Nomad to schedule a replacement, resulting in
unnecessary downtime.

Note that it's not possible to design a perfectly zero-downtime singleton
allocation in a distributed system. This design errs on the side of
correctness: having zero or one allocations running rather than incorrectly
having two allocations running.

## Overview

There are several options available for some details of the implementation, but
all of them include the following:

- You must have a distributed lock with a TTL that's refreshed from the
allocation. The process that sets and refreshes the lock must have its
lifecycle tied to the main task. It can be either in-process, in-task with
supervision, or run as a sidecar. If the allocation cannot obtain the lock,
then it must not start whatever process or operation you intend to be a
singleton. After a configurable window without obtaining the lock, the
allocation must fail.
- You must set the [`group.disconnect.stop_on_client_after`][] field. This
forces a Nomad client that's disconnected from the server to stop the
singleton allocation, which in turn releases the lock or allows its TTL to
expire.

Tune the lock TTL, the time the allocation waits before giving up on the lock,
and the `stop_on_client_after` duration to bound the maximum amount of downtime
the application can have.

The Nomad [Locks API][] supports the operations needed. In pseudocode, these
operations are the following; a sketch of the same control flow in Go appears
after the list:

- To acquire the lock, `PUT /v1/var/:path?lock-acquire`
  - On success: start heartbeat every 1/2 TTL.
  - On conflict or failure: retry with backoff and timeout.
    - Once out of attempts, exit the process with an error code.
- To heartbeat, `PUT /v1/var/:path?lock-renew`
  - On success: continue.
  - On conflict: exit the process with an error code.
  - On failure: retry with backoff up to the TTL.
    - If the TTL expires, attempt to revoke the lock, then exit the process with an error code.
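
The following Go sketch illustrates that control flow. The `acquireLock` and
`renewLock` helpers are hypothetical placeholders for the `lock-acquire` and
`lock-renew` calls above, and the TTL, attempt count, and backoff values are
illustrative rather than recommendations.

```go
// Minimal sketch of the acquire/heartbeat flow. acquireLock and renewLock
// are hypothetical placeholders for the lock-acquire and lock-renew
// Variables API calls; timing values are illustrative only.
package main

import (
	"errors"
	"fmt"
	"os"
	"time"
)

const (
	lockTTL     = 15 * time.Second // must match the TTL requested on acquire
	maxAttempts = 5                // acquire attempts before giving up
)

// errConflict means another allocation holds the lock.
var errConflict = errors.New("lock held by another allocation")

// acquireLock stands in for PUT /v1/var/<path>?lock-acquire.
func acquireLock() (lockID string, err error) { return "", errors.New("not implemented") }

// renewLock stands in for PUT /v1/var/<path>?lock-renew.
func renewLock(lockID string) error { return errors.New("not implemented") }

func main() {
	// Acquire with backoff; exit non-zero when out of attempts so Nomad sees
	// a task failure and applies the group's restart and reschedule policy.
	var lockID string
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if lockID, err = acquireLock(); err == nil {
			break
		}
		time.Sleep(time.Duration(attempt) * time.Second) // linear backoff
	}
	if err != nil {
		fmt.Fprintln(os.Stderr, "could not acquire lock:", err)
		os.Exit(1)
	}

	// Start the singleton process here, only after the lock is held.

	// Heartbeat at half the TTL. A conflict means another holder took the
	// lock, so exit immediately; transient failures are retried until the
	// TTL would have expired.
	deadline := time.Now().Add(lockTTL)
	for range time.Tick(lockTTL / 2) {
		err := renewLock(lockID)
		switch {
		case err == nil:
			deadline = time.Now().Add(lockTTL) // renewed; push the deadline out
		case errors.Is(err, errConflict), time.Now().After(deadline):
			fmt.Fprintln(os.Stderr, "lock lost:", err)
			os.Exit(1)
		}
		// Otherwise: transient failure, retry on the next tick.
	}
}
```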

The allocation can safely use the Nomad [Task API][] socket to write to the
locks API, rather than communicating with the server directly. This reduces load
on the server and speeds up detection of failed client nodes because the
disconnected client cannot forward the Task API requests to the leader.
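
As an illustration, a lock helper written in Go can reach the locks API
through the Task API socket with a standard `http.Client` that dials a Unix
socket. This sketch assumes the default `api.sock` location inside the task's
secrets directory and that the workload identity token is exposed as
`NOMAD_TOKEN` through `identity { env = true }`.

```go
// Sketch: an HTTP client that talks to the Nomad Task API over its Unix
// socket instead of the server address.
package main

import (
	"context"
	"fmt"
	"net"
	"net/http"
	"os"
	"path/filepath"
)

func taskAPIClient() *http.Client {
	socket := filepath.Join(os.Getenv("NOMAD_SECRETS_DIR"), "api.sock")
	return &http.Client{
		Transport: &http.Transport{
			// Ignore the host in the URL and always dial the task's socket.
			DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
				var d net.Dialer
				return d.DialContext(ctx, "unix", socket)
			},
		},
	}
}

func main() {
	client := taskAPIClient()

	// The host portion of the URL is a placeholder; the transport above
	// routes every request to the Unix socket. Reading the lock variable
	// may return 404 until the variable exists.
	req, err := http.NewRequest(http.MethodGet, "http://localhost/v1/var/nomad/jobs/example/lock", nil)
	if err != nil {
		panic(err)
	}
	req.Header.Set("X-Nomad-Token", os.Getenv("NOMAD_TOKEN"))

	resp, err := client.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("task API responded with status:", resp.Status)
}
```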

The [`nomad var lock`][] command implements this acquire-and-heartbeat logic,
so you can use it to shim the process being locked.

### ACLs

Allocations cannot write to Nomad variables by default. You must configure a
[workload-associated ACL policy][] that allows write access in the
[`namespace.variables`][] block. For example, the following ACL policy allows
access to write a lock on the path `nomad/jobs/example/lock` in the `prod`
namespace:

```hcl
namespace "prod" {
  variables {
    path "nomad/jobs/example/lock" {
      capabilities = ["write", "read", "list"]
    }
  }
}
```

You set this policy on the job with `nomad acl policy apply -namespace prod -job
example example-lock ./policy.hcl`.

## Implementation

### Use `nomad var lock`

We recommend implementing the locking logic with `nomad var lock` as a shim in
your task. This example jobspec assumes there's a Nomad binary in the container
image.

```hcl
job "example" {
  group "group" {

    disconnect {
      stop_on_client_after = "1m"
    }

    task "primary" {
      driver = "docker"

      config {
        image   = "example/app:1"
        command = "nomad"
        args = [
          "var", "lock", "nomad/jobs/example/lock", # lock
          "busybox", "httpd",                       # application
          "-vv", "-f", "-p", "8001", "-h", "/local" # application args
        ]
      }

      identity {
        env = true # make NOMAD_TOKEN available to lock command
      }
    }
  }
}
```

If you don't want to ship a Nomad binary in the container image, make a
read-only mount of the binary from a host volume. This only works in cases
where the Nomad binary has been statically linked or you have glibc in the
container image.

<CodeBlockConfig lineNumbers highlight="8-12,30-33">

```hcl
job "example" {
group "group" {

disconnect {
stop_on_client_after = "1m"
}

volume "binaries" {
type = "host"
source = "binaries"
read_only = true
}

task "primary" {
config {
driver = "docker"
image = "example/app:1"
command = "/opt/bin/nomad"
args = [
"var", "lock", "nomad/jobs/example/lock", # lock
"busybox", "httpd", # application
"-vv", "-f", "-p", "8001", "-h", "/local" # application args
]
}

identity {
env = true # make NOMAD_TOKEN available to lock command
}

volume_mount {
volume = "binaries"
destination = "/opt/bin"
}
}
}
}

### Sidecar lock

If you cannot implement the lock logic in your application or with a shim such
as `nomad var lock`, implement it so that the task you are locking runs as a
sidecar of the locking task, which has [`task.leader=true`][] set.

<CodeBlockConfig lineNumbers highlight="9">

```hcl
job "example" {
group "group" {

disconnect {
stop_on_client_after = "1m"
}

task "lock" {
leader = true
config {
driver = "raw_exec"
command = "/opt/lock-script.sh"
pid_mode = "host"
}

identity {
env = true # make NOMAD_TOKEN available to lock command
}
}

task "application" {
lifecycle {
hook = "poststart"
sidecar = true
}

config {
driver = "docker"
image = "example/app:1"
}
}
}
}

The locking task has the following requirements:

- Must be in the same group as the task being locked.
- Must be able to terminate the task being locked without the Nomad client being
up. For example, they share the same PID namespace, or the locking task is
privileged.
- Must have a way of signaling the task being locked that it is safe to start.
  For example, the locking task can write a sentinel file into the `/alloc`
  directory, which the locked task tries to read on startup, blocking until it
  exists. The sketch after this list shows one way to implement that wait.
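
The following Go sketch shows one way the locked task could implement that
startup wait. The sentinel file name `lock-held` and the one-second poll
interval are arbitrary conventions chosen for this example.

```go
// Sketch: block at startup until the locking task has written a sentinel
// file into the shared alloc directory, then start the real workload.
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"time"
)

func main() {
	sentinel := filepath.Join(os.Getenv("NOMAD_ALLOC_DIR"), "lock-held")

	// Poll until the locking task creates the file.
	for {
		if _, err := os.Stat(sentinel); err == nil {
			break
		}
		time.Sleep(1 * time.Second)
	}

	fmt.Println("lock sentinel found; starting application")
	// ... exec or start the singleton workload here ...
}
```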

If you cannot meet the third requirement, then you need to split the lock
acquisition and lock heartbeat into separate tasks.

<CodeBlockConfig lineNumbers highlight="8-20,22-32">

```hcl
job "example" {
group "group" {

disconnect {
stop_on_client_after = "1m"
}

task "acquire" {
lifecycle {
hook = "prestart"
sidecar = false
}
config {
driver = "raw_exec"
command = "/opt/lock-acquire-script.sh"
}
identity {
env = true # make NOMAD_TOKEN available to lock command
}
}

task "heartbeat" {
leader = true
config {
driver = "raw_exec"
command = "/opt/lock-heartbeat-script.sh"
pid_mode = "host"
}
identity {
env = true # make NOMAD_TOKEN available to lock command
}
}

task "application" {
lifecycle {
hook = "poststart"
sidecar = true
}

config {
driver = "docker"
image = "example/app:1"
}
}
}
}

[`group.disconnect.stop_on_client_after`]: /nomad/docs/job-specification/disconnect#stop_on_client_after
[Locks API]: /nomad/api-docs/variables/locks
[Task API]: /nomad/api-docs/task-api
[`nomad var lock`]: /nomad/commands/var/lock
[workload-associated ACL policy]: /nomad/docs/concepts/workload-identity#workload-associated-acl-policies
[`namespace.variables`]: /nomad/docs/other-specifications/acl-policy#variables
[`task.leader=true`]: /nomad/docs/job-specification/task#leader
[`restart`]: /nomad/docs/job-specification/restart
4 changes: 4 additions & 0 deletions content/nomad/v1.11.x/data/docs-nav-data.json
@@ -697,6 +697,10 @@
{
"title": "Configure rolling",
"path": "job-declare/strategy/rolling"
},
{
"title": "Configure singleton",
"path": "job-declare/strategy/singleton"
}
]
},