Nature of Ansible's "Timeout (12s) waiting for privilege escalation prompt" error
Ansible can occasionally fail with the following error: "Timeout (12s) waiting for privilege escalation prompt".
At some point I wanted to finally learn exactly what this "privilege escalation prompt timeout" is, where I can change its value and where it says that it defaults to 12 seconds.
Clearly it relates to sudo
(or other become
mechanism), but I could not
find anything about it in Ansible documentation - nothing about such specific
timeout. In fact, it seemed that there is no setting in Ansible as a whole
with a default value of 12 seconds at all. And I did not have any such value set anywhere on
my side, too.
Inspecting Ansible source
Luckly, searching Ansible source code for the exact place where this error originates from turned out to be trivial:
$ git log -1 --pretty=oneline
82529e534dd3edd84aba03d86b337f88c58b9982 (HEAD, tag: v2.19.0) New release v2.19.0 (#85513)
$ grep -Rn "waiting for privilege escalation prompt" lib/
lib/ansible/plugins/connection/ssh.py:1245: raise AnsibleConnectionFailure('Timeout (%ds) waiting for privilege escalation prompt: %s' % (timeout, to_native(b_stdout)))
From there it's easy to track down the whole context that is important here.
First we have the code that starts ssh
in a subprocess (with a proper connection timeout
passed as an argument, e.g. as -o ConnectTimeout
to openssh client):
p = subprocess.Popen(...)
Then we construct the timeout value the we discuss in this note:
# select timeout should be longer than the connect timeout, otherwise
# they will race each other when we can't connect, and the connect
# timeout usually fails
timeout = 2 + self.get_option('timeout')
And we use it just a moment later, when we wait for data from the connection:
selector = selectors.DefaultSelector()
selector.register(p.stdout, selectors.EVENT_READ)
selector.register(p.stderr, selectors.EVENT_READ)
...
while True:
poll = p.poll()
events = selector.select(timeout)
# We pay attention to timeouts only while negotiating a prompt.
if not events:
# We timed out
if state <= states.index('awaiting_escalation'):
# If the process has already exited, then it's not really a
# timeout; we'll let the normal error handling deal with it.
if poll is not None:
break
self._terminate_process(p)
raise AnsibleConnectionFailure('Timeout (%ds) waiting for privilege escalation prompt: %s' % (timeout, to_native(b_stdout)))
The two key things to note here are:
-
The value of 12 seconds mentioned in the title, which is used as a timeout for reaching privilege escalation prompt, comes from the value of SSH connection timeout (which defaults to 10 seconds) extended by hardcoded extra 2 seconds.
-
Countdown for both timeouts (connection and reaching privilege escalation prompt) is started approximately at the same time - when connection initiation starts, i.e. they more or less overlap (as opposed to prompt timeout starting its countdown after the connection is ready).
For defaults we have 10 seconds to connect to the host (timeout passed to the
ssh
process), and then we have all what is left from these 10 seconds, plus
extra 2 seconds, to reach privilege escalation prompt from the remote system
(assuming we requested such escalation).
Side note on these two extra seconds
It looks like this 2 seconds extension was added a long time ago (2015), to avoid a situation where the
connection timeout managed by the ssh
process and the reaching-escalation-prompt
timeout managed inside the Python process would race each other upon slow
connection (or no connectivity at all) and then lead to other confusing error message.
Meaning that this extension is not motivated by wanting to have an extra time to get to the escalation prompt after connecting, but is rather a trick to solve different issue, with a side effect of having this one extra timeout value.
Conclusions
There is a dedicated timeout for obtaining privilege escalation prompt on the host, but its value is an implementation detail and is not exposed to the user as a separate setting. It's simply derived from SSH connection timeout by extending it by 2 seconds, which, if default settings are used, results in the value of 12 seconds. At the same time this value is present in the timeout error message, which may be slightly confusing as to what is the exact nature of the timeout and its value.
Changing the SSH connection timeout setting is the way to influence how long Ansible will wait until triggering the timeout that generates the discussed error.
Slow connection process (taking substantial part of the connection timeout) reduces time left for awaiting escalation prompt after connection succeeded. On a heavy loaded systems this may trigger timeouts reported as "Timeout (12s) waiting for privilege escalation prompt", while the culprit may mostly come from some networking issues (as this reported timeout overlaps the connection process).
On the other hand, "Timeout (12s) waiting for privilege escalation prompt",
may also mean that ssh
connection succeeded with some delay, and yes it's
the host that is not too responsive, but it does not mean we have spent the whole
12 seconds between connection being established and us stopping awaiting for the
prompt - it might as well be, say, half that time (rest being used up during the
connection process).