
Commit f12f70d

tgross and aimeeu authored
Nomad: recommendations for singleton deployments (#1473)
Many users have a requirement to run exactly one instance of a given allocation because it requires exclusive access to some cluster-wide resource, which we'll refer to here as a "singleton allocation". This is challenging to implement, so this document is intended to describe an accepted design to publish as a how-to/tutorial. Co-authored-by: Aimee Ukasick <[email protected]>
1 parent c5b9672 commit f12f70d

File tree: 2 files changed, +305 -0 lines changed
Lines changed: 301 additions & 0 deletions
@@ -0,0 +1,301 @@
---
layout: docs
page_title: Configure singleton deployments
description: |-
  Declare a job that guarantees only a single instance can run at a time, with
  minimal downtime.
---

# Configure singleton deployments

A singleton deployment is one where there is at most one instance of a given
allocation running on the cluster at one time. You might need this if the
workload needs exclusive access to a remote resource like a data store. Nomad
does not support singleton deployments as a built-in feature. Your workloads
continue to run even when the Nomad client agent has crashed, so ensuring
there's at most one allocation for a given workload requires some cooperation
from the job. This document describes how to implement singleton deployments.

## Design Goals

The configuration described here meets these primary design goals:

- The design prevents a specific process within a task from running if there
  is another instance of that task running anywhere else on the Nomad cluster.
- Nomad should be able to recover from failure of the task or the node on which
  the task is running with minimal downtime, where "recovery" means that Nomad
  stops the original task and schedules a replacement task.
- Nomad should minimize false positive detection of failures to avoid
  unnecessary downtime during the cutover.

There's a tradeoff between recovery speed and false positives. The faster you
make Nomad attempt to recover from failure, the more likely it is that a
transient failure causes Nomad to schedule a replacement and a subsequent
downtime.

Note that it's not possible to design a perfectly zero-downtime singleton
allocation in a distributed system. This design errs on the side of
correctness: having zero or one allocation running rather than incorrectly
having two allocations running.

## Overview

There are several options available for some details of the implementation, but
all of them include the following:

- You must have a distributed lock with a TTL that's refreshed from the
  allocation. The process that sets and refreshes the lock must have its
  lifecycle tied to the main task. It can be in-process, in-task with
  supervision, or run as a sidecar. If the allocation cannot obtain the lock,
  then it must not start whatever process or operation you intend to be a
  singleton. After a configurable window without obtaining the lock, the
  allocation must fail.
- You must set the [`group.disconnect.stop_on_client_after`][] field. This
  forces a Nomad client that's disconnected from the server to stop the
  singleton allocation, which in turn releases the lock or allows its TTL to
  expire.

Tune the lock TTL, the time it takes the allocation to give up, and the
`stop_on_client_after` duration to reduce the maximum amount of downtime the
application can have.

The Nomad [Locks API][] can support the operations needed. In pseudo-code these
operations are the following:

- To acquire the lock, `PUT /v1/var/:path?lock-acquire`
  - On success: start heartbeat every 1/2 TTL.
  - On conflict or failure: retry with backoff and timeout.
    - Once out of attempts, exit the process with an error code.
- To heartbeat, `PUT /v1/var/:path?lock-renew`
  - On success: continue.
  - On conflict: exit the process with an error code.
  - On failure: retry with backoff up to TTL.
    - If TTL expires, attempt to revoke the lock, then exit the process with an
      error code.
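
The following Go program is a minimal sketch of this control flow, not the
implementation Nomad ships. The `acquireLock`, `renewLock`, and `releaseLock`
helpers are hypothetical placeholders for the Locks API calls, and the TTL,
backoff, and attempt counts are only illustrative.

```go
// Sketch of the lock loop described above. The acquire/renew/release helpers
// are placeholders for the Locks API calls and must be wired up to the HTTP
// endpoints (directly or through the Task API socket).
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
	"time"
)

const lockTTL = 15 * time.Second // must match the TTL sent on acquire

var errConflict = errors.New("lock held by another allocation")

// Hypothetical helpers standing in for PUT /v1/var/:path?lock-acquire,
// ?lock-renew, and the release/revoke call.
func acquireLock() error { return nil }
func renewLock() error   { return nil }
func releaseLock() error { return nil }

func main() {
	// Acquire with backoff and a bounded number of attempts so the allocation
	// fails instead of waiting forever.
	acquired := false
	for attempt, backoff := 0, time.Second; attempt < 10; attempt++ {
		if err := acquireLock(); err == nil {
			acquired = true
			break
		}
		time.Sleep(backoff)
		backoff *= 2
	}
	if !acquired {
		fmt.Fprintln(os.Stderr, "could not acquire lock")
		os.Exit(1)
	}

	// Start the singleton process only after the lock is held.
	app := exec.Command("busybox", "httpd", "-vv", "-f", "-p", "8001")
	app.Stdout, app.Stderr = os.Stdout, os.Stderr
	if err := app.Start(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	// Heartbeat every 1/2 TTL. Exit on conflict; on other errors keep retrying
	// until the TTL would have expired, then release the lock and exit.
	deadline := time.Now().Add(lockTTL)
	for range time.Tick(lockTTL / 2) {
		err := renewLock()
		switch {
		case err == nil:
			deadline = time.Now().Add(lockTTL)
		case errors.Is(err, errConflict):
			app.Process.Kill()
			os.Exit(1)
		case time.Now().After(deadline):
			releaseLock()
			app.Process.Kill()
			os.Exit(1)
		}
	}
}
```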

The allocation can safely use the Nomad [Task API][] socket to write to the
locks API, rather than communicating with the server directly. This reduces load
on the server and speeds up detection of failed client nodes because the
disconnected client cannot forward the Task API requests to the leader.
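
As a sketch of what that looks like from inside the task, the following Go
snippet points an HTTP client at the Task API unix socket and issues the
lock-acquire request. The socket path (`${NOMAD_SECRETS_DIR}/api.sock`), the
`X-Nomad-Token` header populated from `NOMAD_TOKEN`, and the request body shape
are assumptions to verify against the Task API and Locks API documentation.

```go
// Sketch: calling the Locks API through the Task API unix socket from inside
// the allocation. The request body is a placeholder; consult the Locks API
// documentation for the exact variable and lock fields.
package main

import (
	"context"
	"fmt"
	"net"
	"net/http"
	"os"
	"path/filepath"
	"strings"
)

func main() {
	sock := filepath.Join(os.Getenv("NOMAD_SECRETS_DIR"), "api.sock")

	// Every request is dialed over the unix socket, so the host in the URL is
	// only a placeholder.
	client := &http.Client{
		Transport: &http.Transport{
			DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
				return (&net.Dialer{}).DialContext(ctx, "unix", sock)
			},
		},
	}

	// Placeholder body: a variable spec that carries the lock TTL.
	body := strings.NewReader(`{"Lock": {"TTL": "15s"}}`)

	req, err := http.NewRequest(http.MethodPut,
		"http://localhost/v1/var/nomad/jobs/example/lock?lock-acquire", body)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	req.Header.Set("X-Nomad-Token", os.Getenv("NOMAD_TOKEN"))

	resp, err := client.Do(req)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer resp.Body.Close()
	fmt.Println("lock-acquire status:", resp.Status)
}
```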

The [`nomad var lock`][] command implements this logic, so you can use it to shim
the process being locked.

### ACLs

Allocations cannot write to Nomad variables by default. You must configure a
[workload-associated ACL policy][] that allows write access in the
[`namespace.variables`][] block. For example, the following ACL policy allows
access to write a lock on the path `nomad/jobs/example/lock` in the `prod`
namespace:

```hcl
namespace "prod" {
  variables {
    path "nomad/jobs/example/lock" {
      capabilities = ["write", "read", "list"]
    }
  }
}
```

You set this policy on the job with
`nomad acl policy apply -namespace prod -job example example-lock ./policy.hcl`.

## Implementation

### Use `nomad var lock`

We recommend implementing the locking logic with `nomad var lock` as a shim in
your task. This example jobspec assumes there's a Nomad binary in the container
image.

```hcl
job "example" {
  group "group" {

    disconnect {
      stop_on_client_after = "1m"
    }

    task "primary" {
      driver = "docker"
      config {
        image   = "example/app:1"
        command = "nomad"
        args = [
          "var", "lock", "nomad/jobs/example/lock",  # lock
          "busybox", "httpd",                        # application
          "-vv", "-f", "-p", "8001", "-h", "/local"  # application args
        ]
      }

      identity {
        env = true
      }
    }
  }
}
```

If you don't want to ship a Nomad binary in the container image, make a
read-only mount of the binary from a host volume. This only works in cases
where the Nomad binary has been statically linked or you have glibc in the
container image.

<CodeBlockConfig lineNumbers highlight="8-12,30-33">

```hcl
job "example" {
  group "group" {

    disconnect {
      stop_on_client_after = "1m"
    }

    volume "binaries" {
      type      = "host"
      source    = "binaries"
      read_only = true
    }

    task "primary" {
      driver = "docker"
      config {
        image   = "example/app:1"
        command = "/opt/bin/nomad"
        args = [
          "var", "lock", "nomad/jobs/example/lock",  # lock
          "busybox", "httpd",                        # application
          "-vv", "-f", "-p", "8001", "-h", "/local"  # application args
        ]
      }

      identity {
        env = true # make NOMAD_TOKEN available to lock command
      }

      volume_mount {
        volume      = "binaries"
        destination = "/opt/bin"
      }
    }
  }
}
```

</CodeBlockConfig>

### Sidecar lock

If you cannot implement the lock logic in your application or with a shim such
as `nomad var lock`, you need to run the task you are locking as a sidecar of
the locking task, which has [`task.leader=true`][] set.

<CodeBlockConfig lineNumbers highlight="9">

```hcl
job "example" {
  group "group" {

    disconnect {
      stop_on_client_after = "1m"
    }

    task "lock" {
      leader = true
      driver = "raw_exec"
      config {
        command  = "/opt/lock-script.sh"
        pid_mode = "host"
      }

      identity {
        env = true # make NOMAD_TOKEN available to lock command
      }
    }

    task "application" {
      lifecycle {
        hook    = "poststart"
        sidecar = true
      }

      driver = "docker"
      config {
        image = "example/app:1"
      }
    }
  }
}
```

</CodeBlockConfig>

The locking task has the following requirements:

- Must be in the same group as the task being locked.
- Must be able to terminate the task being locked without the Nomad client being
  up. For example, they share the same PID namespace, or the locking task is
  privileged.
- Must have a way of signalling the task being locked that it is safe to start.
  For example, the locking task can write a sentinel file into the `/alloc`
  directory, which the locked task tries to read on startup and blocks until it
  exists.
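
To illustrate the third requirement, the following Go sketch shows an
entrypoint wrapper for the locked task: it blocks until the locking task has
written an agreed-upon sentinel file under the shared allocation directory and
then execs the real application. The sentinel file name, polling interval, and
application command are arbitrary example values.

```go
// Sketch of the locked task's entrypoint: wait for the sentinel file that the
// locking task writes after it acquires the lock, then exec the application.
package main

import (
	"os"
	"path/filepath"
	"syscall"
	"time"
)

func main() {
	// Hypothetical sentinel path agreed on by both tasks.
	sentinel := filepath.Join(os.Getenv("NOMAD_ALLOC_DIR"), "lock-held")

	// Block until the locking task signals that the lock is held.
	for {
		if _, err := os.Stat(sentinel); err == nil {
			break
		}
		time.Sleep(time.Second)
	}

	// Replace this process with the real application (example command only).
	app := "/usr/bin/busybox"
	args := []string{app, "httpd", "-vv", "-f", "-p", "8001"}
	if err := syscall.Exec(app, args, os.Environ()); err != nil {
		os.Exit(1)
	}
}
```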

If you cannot meet the third requirement, then you need to split the lock
acquisition and lock heartbeat into separate tasks.

<CodeBlockConfig lineNumbers highlight="8-20,22-32">

```hcl
job "example" {
  group "group" {

    disconnect {
      stop_on_client_after = "1m"
    }

    task "acquire" {
      lifecycle {
        hook    = "prestart"
        sidecar = false
      }
      driver = "raw_exec"
      config {
        command = "/opt/lock-acquire-script.sh"
      }
      identity {
        env = true # make NOMAD_TOKEN available to lock command
      }
    }

    task "heartbeat" {
      leader = true
      driver = "raw_exec"
      config {
        command  = "/opt/lock-heartbeat-script.sh"
        pid_mode = "host"
      }
      identity {
        env = true # make NOMAD_TOKEN available to lock command
      }
    }

    task "application" {
      lifecycle {
        hook    = "poststart"
        sidecar = true
      }

      driver = "docker"
      config {
        image = "example/app:1"
      }
    }
  }
}
```

</CodeBlockConfig>

[`group.disconnect.stop_on_client_after`]: /nomad/docs/job-specification/disconnect#stop_on_client_after
[Locks API]: /nomad/api-docs/variables/locks
[Task API]: /nomad/api-docs/task-api
[`nomad var lock`]: /nomad/commands/var/lock
[workload-associated ACL policy]: /nomad/docs/concepts/workload-identity#workload-associated-acl-policies
[`namespace.variables`]: /nomad/docs/other-specifications/acl-policy#variables
[`task.leader=true`]: /nomad/docs/job-specification/task#leader
[`restart`]: /nomad/docs/job-specification/restart

content/nomad/v1.11.x/data/docs-nav-data.json

Lines changed: 4 additions & 0 deletions
@@ -697,6 +697,10 @@
 {
   "title": "Configure rolling",
   "path": "job-declare/strategy/rolling"
+},
+{
+  "title": "Configure singleton",
+  "path": "job-declare/strategy/singleton"
 }
 ]
 },
