Ansible - 'until' loop
Introduction
In this short post I'll introduce you to a lesser-known type of Ansible loop: the "until" loop. This loop is used to retry a task until a certain condition is met.
To use this loop in a task you essentially need to add 3 arguments to it:
- until: the condition that must be met for the loop to stop. That is, Ansible will keep executing the task until the expression used here evaluates to true. The expression usually refers to the task's own result, saved with register.
- retries: how many times Ansible should retry the task before giving up.
- delay: the delay, in seconds, between retries.
As an example, the task below will keep sending GET requests to the specified URL until the "status" key in the response is equal to "READY". We ask Ansible to retry the task up to 10 times, with a delay of 1 second between attempts. If the condition in until is still not met after the final attempt, the task is marked as failed.
- name: Wait until web app status is "READY"
  uri:
    url: "{{ app_url }}/status"
  register: app_status
  until: app_status.json.status == "READY"
  retries: 10
  delay: 1
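Two related details are worth knowing. When a task uses until, the result saved with register also gets an attempts key, which records how many attempts the task took. According to the Ansible docs, if you set until but leave out retries and delay, they default to 3 retries and 5 seconds. The sketch below illustrates both points. It assumes the same app_url variable as the example above.

```yaml
# Sketch: no 'retries' or 'delay' given, so Ansible falls back to
# its defaults (3 retries, 5 seconds apart).
- name: Wait until web app status is "READY" (default retries/delay)
  uri:
    url: "{{ app_url }}/status"
  register: app_status
  until: app_status.json.status == "READY"

# 'attempts' is added to the registered result because the task used 'until'.
- name: Show how many attempts were needed
  debug:
    msg: "Status was READY after {{ app_status.attempts }} attempt(s)"
```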
What's so cool about this loop is that you can use it to actively check the result of a task before proceeding to other tasks.
This is different from using the when task argument, for instance, where we only execute a task IF a condition is met. Here the condition MUST be met before we execute the next task.
One is conditional execution, usually based on a static check, e.g. the existence of a package or feature, or the value of a pre-defined variable. The other pauses execution until the condition is met, and fails the task if it isn't, to ensure the desired state is in place before proceeding.
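To make the contrast concrete, here are two hypothetical tasks side by side. The first uses when, so its condition is evaluated once before the task runs, and the task is skipped if the condition is false. The second uses until, so the task runs first and its registered result is checked after each attempt. The nginx package, the service, and the Debian-family check are assumptions chosen only for illustration.

```yaml
# 'when': evaluated once, before the task runs (needs gathered facts).
# If the expression is false, the task is skipped, not failed.
- name: Install nginx, but only on Debian-family hosts
  apt:
    name: nginx
    state: present
  become: yes
  when: ansible_facts['os_family'] == "Debian"

# 'until': the task runs first, then the expression is checked against
# its registered result. The task is retried until the expression is true,
# and fails once the retries are used up.
- name: Wait until nginx reports as active
  command: systemctl is-active nginx
  register: nginx_state
  until: nginx_state.stdout == "active"
  retries: 5
  delay: 2
  changed_when: no
```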
Some scenarios where the until loop could be useful:
- Making sure a web app service has come up before progressing with the Playbook.
- Checking the status of a long-running asynchronous task via an API endpoint.
- Waiting for a routing protocol adjacency to come up.
- Waiting for the system to converge, e.g. routing in networking.
- Checking if a Docker container is reporting as healthy.
- Retrying a service that might take multiple attempts to come up fully.
Basically, there are a lot of use cases for the until loop :)
Note that some of the above can also be achieved with the wait_for module, which is a bit more specialized. The wait_for module can check the status of ports, files and processes, among other things. Have a look at the link in References if you want to find out more.
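For comparison, here is a minimal wait_for sketch. It waits for the Flask app from Example 1 (assumed to listen on 127.0.0.1:5010) to accept TCP connections. An open port only shows that something is listening. It says nothing about application-level readiness, such as a "READY" status, and that gap is where until is the better fit.

```yaml
# Waits until something accepts TCP connections on 127.0.0.1:5010.
# 'delay' is an initial wait before the first check, 'sleep' is the pause
# between checks, and the task fails if the port isn't open within 'timeout'.
- name: Wait for the web app port to open
  wait_for:
    host: 127.0.0.1
    port: 5010
    state: started
    delay: 1
    sleep: 1
    timeout: 30
```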
Examples
We now know what the until loop is, how to use it, and where it could be useful. Next we'll go through some examples to give you a better intuition of how one would go about using it in Playbooks.
Setup details
Details of the setup used for the examples:
- Python 3.8.5
- Ansible 2.9.10 running in a Python virtual environment
- Python libraries listed in requirements.txt in the GitHub repository for this post
- Docker engine
- Docker image "veos:4.18.10M", built with vrnetlab from the "vEOS-lab-4.18.10M.vmdk" disk image
Example 1 - Polling web app status via API
In the first example I have a Playbook that gets the content of a web app's home page. The twist is that this web app takes some time to fully come up. Fortunately, there is an API endpoint that we can query to check if the app is ready to accept requests.
We'll take advantage of the until loop to keep polling the status until we get the green light to proceed.
until_web_app.yml
---
- name: "PLAY 1. Use 'until' to wait for Web APP to come up."
  hosts: local
  connection: local
  gather_facts: no
  vars:
    app_url: "http://127.0.0.1:5010"
  tasks:
    - name: "TASK 1.1. Start Web app (async 20 keeps up app in bg for 20 secs)."
      command: python flask_app/main.py
      async: 20
      poll: 0
      changed_when: no
    - name: "TASK 1.2. Retrieve Web app home page (should fail)."
      uri:
        url: "{{ app_url }}"
      register: app_hp
      ignore_errors: yes
    - name: "TASK 1.3. Display HTTP code returned by home page."
      debug:
        msg: "Web app returned {{ app_hp.status }} HTTP code"
    - name: "TASK 1.4. Wait until GET to 'status' returns 'READY'."
      uri:
        url: "{{ app_url }}/status"
      register: app_status
      until: app_status.json.status == "READY"
      retries: 10
      delay: 1
    - name: "TASK 1.5. Retrieve Web app home page (should succeed now)."
      uri:
        url: "{{ app_url }}"
      register: app_hp
    - name: "TASK 1.6. Display HTTP code and body returned by home page."
      debug:
        msg:
          - "Web app returned {{ app_hp.status }} HTTP code"
          - "Web page content: {{ lookup('url', app_url) }}"
Let's have a look at the interesting bits in this Playbook.
I built this app with an API endpoint that returns the status of the service in its JSON payload. The status can be either "NOT_READY" or "READY".
- In TASK 1.1 we launch a small Flask web app that takes 10 seconds to fully come up. I use the async argument here to trick Ansible into keeping it running in the background for 20 seconds. Otherwise the Playbook would get stuck on this task.
- In TASK 1.2 we get an error while retrieving the home page because the app is not ready yet.
- In TASK 1.4 we use the until loop to keep querying the status endpoint until the returned value equals "READY". Only when this task succeeds do we proceed to the next task, now knowing that our chance of succeeding is much higher.
- In TASK 1.5 we retrieve the home page again. This should now succeed, and we display its contents in TASK 1.6.
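TASK 1.1 starts the app with poll: 0 and never checks on it again, which suits a server that should keep running. For a background job that is expected to finish, the Ansible docs pair async with the async_status module and an until loop. Below is a sketch of that pattern. The script path and the timing values are hypothetical.

```yaml
# Start a (hypothetical) long-running script in the background.
- name: Start long-running job
  command: /usr/local/bin/long_job.sh   # placeholder command
  async: 600   # let the job run for up to 600 seconds
  poll: 0      # don't wait here, move on immediately
  register: long_job

# Poll the background job until Ansible reports it as finished.
# 60 retries x 10 seconds roughly matches the 600-second async limit.
- name: Wait for the job to finish
  async_status:
    jid: "{{ long_job.ansible_job_id }}"
  register: job_result
  until: job_result.finished
  retries: 60
  delay: 10
```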
A lot of different web API services expose some kind of status or healthcheck endpoint, so this example shows a very useful pattern that we can use elsewhere.
If you're curious, you can find the code of the Flask app in the GitHub repository, together with the Playbook.
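One thing to watch out for is an app that isn't listening at all yet. In that case the registered result has no json key, and evaluating app_status.json.status can make the task error out straight away instead of retrying. The sketch below is a more defensive version of TASK 1.4 that checks that the key exists first.

```yaml
# Sketch of a more defensive TASK 1.4: if the app is not listening yet,
# the registered result has no 'json' key. Checking for it first lets
# 'and' short-circuit, so the task is retried instead of erroring out.
- name: "TASK 1.4. Wait until GET to 'status' returns 'READY'."
  uri:
    url: "{{ app_url }}/status"
  register: app_status
  until: app_status.json is defined and app_status.json.status == "READY"
  retries: 10
  delay: 1
```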
And this is the output from the Playbook run:
(venv) przemek@quark:~/netdev/repos/ans_unt$ ansible-playbook -i hosts.yml until_web_app.yml
PLAY [PLAY 1. Use 'until' to wait for Web APP to come up.] *********************************************************************************************************************************************
TASK [TASK 1.1. Start Web app (async 20 keeps up app in bg for 20 secs).] ******************************************************************************************************************************
ok: [localhost]
TASK [TASK 1.2. Retrieve Web app home page (should fail).] *********************************************************************************************************************************************
fatal: [localhost]: FAILED! => {"changed": false, "content": "", "content_length": "0", "content_type": "text/html; charset=utf-8", "date": "Sun, 22 Nov 2020 16:47:37 GMT", "elapsed": 0, "msg": "Status code was 503 and not [200]: HTTP Error 503: SERVICE UNAVAILABLE", "redirected": false, "server": "Werkzeug/1.0.1 Python/3.8.5", "status": 503, "url": "http://127.0.0.1:5010"}
...ignoring
TASK [TASK 1.3. Display HTTP code returned by home page.] **********************************************************************************************************************************************
ok: [localhost] => {
"msg": "Web app returned 503 HTTP code"
}
TASK [TASK 1.4. Wait until GET to 'status' returns 'READY'.] *******************************************************************************************************************************************
FAILED - RETRYING: TASK 1.4. Wait until GET to 'status' returns 'READY'. (10 retries left).
FAILED - RETRYING: TASK 1.4. Wait until GET to 'status' returns 'READY'. (9 retries left).
FAILED - RETRYING: TASK 1.4. Wait until GET to 'status' returns 'READY'. (8 retries left).
FAILED - RETRYING: TASK 1.4. Wait until GET to 'status' returns 'READY'. (7 retries left).
FAILED - RETRYING: TASK 1.4. Wait until GET to 'status' returns 'READY'. (6 retries left).
FAILED - RETRYING: TASK 1.4. Wait until GET to 'status' returns 'READY'. (5 retries left).
FAILED - RETRYING: TASK 1.4. Wait until GET to 'status' returns 'READY'. (4 retries left).
ok: [localhost]
TASK [TASK 1.5. Retrieve Web app home page (should succeed now).] **************************************************************************************************************************************
ok: [localhost]
TASK [TASK 1.6. Display HTTP code and body returned by home page.] *************************************************************************************************************************************
ok: [localhost] => {
"msg": [
"Web app returned 200 HTTP code",
"Web page content: Service ready for use."
]
}
PLAY RECAP *********************************************************************************************************************************************************************************************
localhost : ok=6 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=1
Example 2 - Wait for BGP to establish before retrieving peer routes
In the world of networking we often encounter situations where some kind of adjacency, be it BFD, PIM or BGP, has to be established before we can retrieve the information we're interested in.
To illustrate this, I wrote a Playbook that waits for BGP peerings to come up before checking the routes we receive from neighbors.
Bear in mind this is simplified for use in an example. In the real world you might need to add more checks to ensure routing information between peers has been fully exchanged.
until_eos_net.yml
---
- name: "PLAY 1. Use 'until' to wait for BGP sessions to establish."
  hosts: veos_net
  gather_facts: no
  tasks:
    - name: "TASK 1.1. Record peer IPs for use in 'until' task."
      eos_command:
        commands:
          - command: show ip bgp summary
            output: json
      register: init_bgp_sum
    - name: "TASK 1.2. Forcefully reset BGP sessions."
      eos_command:
        commands: clear ip bgp neighbor *
    - name: "TASK 1.3. Use 'until' to wait for all BGP sessions to establish."
      eos_command:
        commands:
          - command: show ip bgp summary
            output: json
      register: u_bgp_sum
      until: u_bgp_sum.stdout.0.vrfs.default.peers[item.key].peerState == "Established"
      retries: 15
      delay: 1
      loop: "{{ init_bgp_sum.stdout.0.vrfs.default.peers | dict2items }}"
      loop_control:
        label: "{{ item.key }}"
    - name: "TASK 1.4. Retrieve neighbor routes."
      eos_command:
        commands:
          - command: "show ip bgp neighbors {{ item.key }} routes"
            output: json
      register: nbr_routes
      loop: "{{ init_bgp_sum.stdout.0.vrfs.default.peers | dict2items }}"
      loop_control:
        label: "{{ item.key }}"
    - name: "TASK 1.5. Display neighbor routes."
      debug:
        msg:
          - "{{ ''.center(80, '=') }}"
          - "Neighbor: {{ nbr.item.key }}"
          - "{{ nbr.stdout.0.vrfs.default.bgpRouteEntries.keys() | list }}"
          - "{{ ''.center(80, '=') }}"
      loop: "{{ nbr_routes.results }}"
      loop_control:
        loop_var: nbr
        label: "{{ nbr.item.key }}"
Again, we'll look more closely at the tasks that do something interesting.
- In TASK 1.1 we record the output of show ip bgp summary, which we'll use to iterate over the list of BGP neighbors.
- In TASK 1.3 we have the core of our logic.
  - Using the until loop, we keep checking the status of each peer until all of them report the "Established" value.
  - During each retry the output of show ip bgp summary is recorded in the u_bgp_sum variable.
  - To add to the fun, the until loop runs inside a standard outer loop. The outer loop feeds the peer IPs to until, which makes it easier to access the data structure recorded in u_bgp_sum.
- In TASK 1.4 we get the routes received from each neighbor, knowing that all of the peerings are now established. These routes are displayed in TASK 1.5.
Waiting for convergence, or for an adjacency to be established, is another common use case. Hopefully this example illustrates how we can handle it.
You can also see here that the until loop happily cooperates with a standard loop, which lets us handle even more use cases.
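If you'd rather not use an outer loop, the same check can be collapsed into a single until condition that looks at every peer at once. The sketch below assumes the same JSON structure of show ip bgp summary as in TASK 1.3. It keeps retrying until no peer reports a state other than "Established". The equalto test needs Jinja2 2.8 or newer. This condition is also true when the peers dictionary is empty, which is one reason why iterating over the peers recorded before the reset, as TASK 1.3 does, is the more robust choice.

```yaml
# Alternative sketch of TASK 1.3 without an outer loop: a single task that
# retries until no peer in the default VRF is in a state other than "Established".
# Caveat: the condition is also true if 'peers' is empty.
- name: "TASK 1.3. Use 'until' to wait for all BGP sessions to establish."
  eos_command:
    commands:
      - command: show ip bgp summary
        output: json
  register: u_bgp_sum
  until: >-
    u_bgp_sum.stdout.0.vrfs.default.peers.values()
    | rejectattr('peerState', 'equalto', 'Established')
    | list | length == 0
  retries: 15
  delay: 1
```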
Below is the result of running this Playbook.
(venv) przemek@quark:~/netdev/repos/ans_unt$ ansible-playbook -i hosts.yml until_eos_net.yml
PLAY [PLAY 1. Use 'until' to wait for BGP sessions to establish.] **************************************************************************************************************************************
TASK [TASK 1.1. Record peer IPs for use in 'until' task.] **********************************************************************************************************************************************
ok: [veos01]
TASK [TASK 1.2. Forcefully reset BGP sessions.] ********************************************************************************************************************************************************
ok: [veos01]
TASK [TASK 1.3. Use 'until' to wait for all BGP sessions to establish.] ********************************************************************************************************************************
FAILED - RETRYING: TASK 1.3. Use 'until' to wait for all BGP sessions to establish. (15 retries left).
FAILED - RETRYING: TASK 1.3. Use 'until' to wait for all BGP sessions to establish. (14 retries left).
FAILED - RETRYING: TASK 1.3. Use 'until' to wait for all BGP sessions to establish. (13 retries left).
ok: [veos01] => (item=10.0.13.2)
ok: [veos01] => (item=10.0.12.2)
FAILED - RETRYING: TASK 1.3. Use 'until' to wait for all BGP sessions to establish. (15 retries left).
FAILED - RETRYING: TASK 1.3. Use 'until' to wait for all BGP sessions to establish. (14 retries left).
ok: [veos01] => (item=10.1.11.2)
TASK [TASK 1.4. Retrieve neighbor routes.] *************************************************************************************************************************************************************
ok: [veos01] => (item=10.0.13.2)
ok: [veos01] => (item=10.0.12.2)
ok: [veos01] => (item=10.1.11.2)
TASK [TASK 1.5. Display neighbor routes.] **************************************************************************************************************************************************************
ok: [veos01] => (item=10.0.13.2) => {
"msg": [
"================================================================================",
"Neighbor: 10.0.13.2",
[
"192.168.0.0/25",
"192.168.1.0/25",
"192.168.4.0/24",
"192.168.7.0/24",
"192.168.6.0/24",
"10.50.255.3/32",
"192.168.5.0/24"
],
"================================================================================"
]
}
ok: [veos01] => (item=10.0.12.2) => {
"msg": [
"================================================================================",
"Neighbor: 10.0.12.2",
[
"10.50.255.2/32",
"192.168.0.0/25"
],
"================================================================================"
]
}
ok: [veos01] => (item=10.1.11.2) => {
"msg": [
"================================================================================",
"Neighbor: 10.1.11.2",
[],
"================================================================================"
]
}
PLAY RECAP *********************************************************************************************************************************************************************************************
veos01 : ok=5 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
Example 3 - Polling health status of Docker container
Docker is everywhere these days. There are a lot of tools, like docker-compose, that make running containers easier. But we can also use Ansible to manage our containers.
Many containers these days come with built-in health checks, which the Docker engine can use to report on the health of a given container.
In this example I'll show you how we can talk to Docker to get the container status from inside an Ansible Playbook. Our goal is to launch 4 containers with virtual routers that we want to dynamically add to the Ansible inventory. But we only want to do that once all of them have come up fully.
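This approach relies on the image shipping a health check. Without one, Docker reports no health status at all. If an image you use lacks a health check, the docker_container module can define one through its healthcheck option. The sketch below is hypothetical. The container name, the some_image_name variable, and the /healthcheck.sh script are placeholders and are not part of the setup used in this post.

```yaml
# Hypothetical: give a container a health check when its image has none.
# Docker then populates State.Health.Status, which 'until' can poll.
- name: Start container with an explicit health check
  docker_container:
    name: example_node                  # placeholder name
    image: "{{ some_image_name }}"      # placeholder image variable
    healthcheck:
      test: ["CMD", "/healthcheck.sh"]  # placeholder check command
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s
```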
until_docker.yml
---
- name: "PLAY 1. Provision lab with virtual routers."
  hosts: local
  connection: local
  gather_facts: no
  tasks:
    - name: "TASK 1.1. Bring up virtual router containers."
      docker_container:
        name: "{{ item }}"
        image: "{{ vr_image_name }}"
        privileged: yes
      register: cont_data
      loop: "{{ vnodes }}"
      loop_control:
        pause: 10
    - name: "TASK 1.2. Wait for virtual routers to finish booting."
      docker_container_info:
        name: "{{ item }}"
      register: cont_check
      until: cont_check.container.State.Health.Status == 'healthy'
      retries: 15
      delay: 25
      loop: "{{ vnodes }}"
    - name: "TASK 1.3. Auto discover device IPs and add to inventory group."
      set_fact:
        dyn_grp: "{{ dyn_grp | combine({cont_name: {'ansible_host': cont_ip_add }}) }}"
      vars:
        cont_ip_add: "{{ item.container.NetworkSettings.IPAddress }}"
        cont_name: '{{ item.container.Name | replace("/", "") }}'
        dyn_grp: {}
      loop: "{{ cont_data.results }}"
      loop_control:
        label: "{{ cont_name }}"
    - name: "TASK 1.4. Dynamically create hosts.yml inventory."
      copy:
        content: "{{ dyn_inv | to_nice_yaml }}"
        dest: ./lab_hosts.yml
      vars:
        dyn_inv:
          "{{ {'all': {'children': {inv_name: {'hosts': dyn_grp}}}} }}"
Of interest here are mostly TASK 1.1 and TASK 1.2. The remaining tasks deal with generating and saving the inventory, but I wanted to leave them here to provide context.
Let's have a look at the first two tasks then.
- In TASK 1.1 we loop over the container names recorded in the vnodes var and launch a container for each of the entries. I added a 10 second pause between launching each container to avoid overwhelming my local Docker.
- In TASK 1.2 we have our until loop inside a standard loop. In the until loop we ask Docker for info on the container whose name is fed from the outer loop. Then we check if the value of the health status is healthy. We'll keep retrying until we get the status we want. If we exceed the number of retries, the task fails.
You might wonder how I chose the values for the retries and delay arguments. These are completely arbitrary and depend on your machine and on the containers you're running. In my case I know from running these by hand that it takes some time for all containers to come up, so 15 retries with 25 second delays fits my case well.
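With retries: 15 and delay: 25, each container gets up to 15 × 25 = 375 seconds of waiting after its first check before the task gives up. If you reuse this task elsewhere, a slightly more defensive condition can help. When the container is missing, docker_container_info reports exists: false and no container data. State.Health is only present when the container has a health check. The sketch below checks both before comparing the status, so those cases lead to normal retries and, eventually, a failure rather than a template error.

```yaml
# Sketch of a more defensive TASK 1.2: 'container' is empty when the container
# doesn't exist, and 'State.Health' only exists when it has a health check.
# Checking both first lets 'and' short-circuit instead of raising a template error.
- name: "TASK 1.2. Wait for virtual routers to finish booting."
  docker_container_info:
    name: "{{ item }}"
  register: cont_check
  until: >-
    cont_check.exists
    and cont_check.container.State.Health is defined
    and cont_check.container.State.Health.Status == 'healthy'
  retries: 15
  delay: 25
  loop: "{{ vnodes }}"
```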
As you can see, you can have Ansible poll the status of your containers. Pretty cool, right?
To finish off, here's the result of this playbook being executed.
(venv) przemek@quark:~/netdev/repos/ans_unt$ ansible-playbook -i hosts.yml until_docker.yml
PLAY [PLAY 1. Provision lab with virtual routers.] *****************************************************************************************************************************************************
TASK [TASK 1.1. Bring up virtual router containers.] ***************************************************************************************************************************************************
changed: [localhost] => (item=spine1)
changed: [localhost] => (item=spine2)
changed: [localhost] => (item=leaf1)
changed: [localhost] => (item=leaf2)
TASK [TASK 1.2. Wait for virtual routers to finish booting.] *******************************************************************************************************************************************
FAILED - RETRYING: TASK 1.2. Wait for virtual routers to finish booting. (15 retries left).
FAILED - RETRYING: TASK 1.2. Wait for virtual routers to finish booting. (14 retries left).
FAILED - RETRYING: TASK 1.2. Wait for virtual routers to finish booting. (13 retries left).
FAILED - RETRYING: TASK 1.2. Wait for virtual routers to finish booting. (12 retries left).
FAILED - RETRYING: TASK 1.2. Wait for virtual routers to finish booting. (11 retries left).
FAILED - RETRYING: TASK 1.2. Wait for virtual routers to finish booting. (10 retries left).
FAILED - RETRYING: TASK 1.2. Wait for virtual routers to finish booting. (9 retries left).
ok: [localhost] => (item=spine1)
FAILED - RETRYING: TASK 1.2. Wait for virtual routers to finish booting. (15 retries left).
ok: [localhost] => (item=spine2)
ok: [localhost] => (item=leaf1)
FAILED - RETRYING: TASK 1.2. Wait for virtual routers to finish booting. (15 retries left).
ok: [localhost] => (item=leaf2)
TASK [TASK 1.3. Auto discover device IPs and add to inventory group.] **********************************************************************************************************************************
ok: [localhost] => (item=spine1)
ok: [localhost] => (item=spine2)
ok: [localhost] => (item=leaf1)
ok: [localhost] => (item=leaf2)
TASK [TASK 1.4. Dynamically create hosts.yml inventory.] ***********************************************************************************************************************************************
changed: [localhost]
PLAY RECAP *********************************************************************************************************************************************************************************************
localhost : ok=4 changed=2 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
Conclusion
Adding the Ansible until loop to your toolset will open up some new possibilities. The ability to dynamically repeat polling until a certain condition is met is powerful. It lets you add logic to your Playbooks that might otherwise be difficult to achieve.
I hope that my examples helped illustrate the value of the until loop and that you found this post useful.
Thanks for reading!
References
- Ansible docs for 'until' loop: https://docs.ansible.com/ansible/latest/user_guide/playbooks_loops.html#retrying-a-task-until-a-condition-is-met
- Ansible docs for 'wait_for' module: https://docs.ansible.com/ansible/latest/collections/ansible/builtin/wait_for_module.html
- vrnetlab: https://github.com/plajjan/vrnetlab
- TTL255 post on vrnetlab: https://ttl255.com/vrnetlab-run-virtual-routers-in-docker-containers/
- GitHub repo with resources for this post: https://github.com/progala/ttl255.com/tree/master/ansible/until-loop