Notes on Ansible

By: Cam Wohlfeil
Published: 2019-06-28 1235 EDT
Category: DevOps
Tags: ansible

These are my notes from Ansible training I have done; this is not a full walkthrough or best practices guide. Here are the files: https://gitlab.com/cwohlfeil/MasteringAnsible

Preparations

See topology.pdf and the Vagrantfile.

https://www.vagrantup.com/ https://docs.ansible.com/ansible/intro_installation.html https://rogerwelin.github.io/ansible/docker/2016/07/04/testing-ansible-playbooks-with-docker.html

Foundations

Inventory

There are two types of inventory, static and dynamic. An inventory lists hosts, but it can also provide additional details such as how to connect, group hosts together by role, and pass in variables. List every host with `ansible --list-hosts all`. Dummy hosts are included as a template in /etc/ansible/hosts. By default Ansible tries to connect via SSH; in the dev file we tell it to use a local connection to the control node. Global config is located at /etc/ansible/ansible.cfg.
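As a rough sketch, a static inventory with role groups and a local connection for the control node could look like this in Ansible's YAML inventory format. The host names lb01 and db01 and the database group are assumptions for illustration; the dev file in the repo may well use the INI format and different names.

```yaml
# Hypothetical static inventory; host names are placeholders.
all:
  children:
    loadbalancer:
      hosts:
        lb01:
    webserver:
      hosts:
        app01:
    database:
      hosts:
        db01:
    control:
      hosts:
        control:
          # Run tasks for the control node locally instead of over SSH.
          ansible_connection: local
```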

https://docs.ansible.com/ansible/intro_inventory.html https://docs.ansible.com/ansible/intro_dynamic_inventory.html

Host Selection

* `ansible --list-hosts loadbalancer` selects by host or group name; patterns also support globbing and wildcards.
* `ansible --list-hosts webserver[0]` supports indexing into a group (this selects the first host).
* `ansible --list-hosts \!control` supports negation (the ! has to be escaped for bash).

https://docs.ansible.com/ansible/intro_patterns.html

Tasks

`ansible -m ping all`: ping is the module to run and all is the host pattern to run it on. Every task returns a status, even on failure. When running commands, non-zero exit codes are considered a failure. This is useful for basic troubleshooting.

https://docs.ansible.com/ansible/intro_getting_started.html#your-first-commands https://docs.ansible.com/ansible/ping_module.html https://docs.ansible.com/ansible/command_module.html

Plays

A playbook is a YAML file with plays in it. A play is a set of target hosts and a set of tasks to run on them; `hosts` and `tasks` are the two YAML keys in the file. See ansible/playbooks/hostname.yml. With a playbook you don't have to respecify targets, you can track changes, and you can add or modify tasks. Don't just worry about running commands and their results, but about the process as well.
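A minimal sketch of a one-play playbook, assuming it targets all hosts and runs the hostname command. The actual hostname.yml in the repo may differ.

```yaml
---
# One play: a set of target hosts plus the tasks to run on them.
- hosts: all
  tasks:
    - name: get the server hostname
      command: hostname
```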

https://docs.ansible.com/ansible/playbooks_intro.html#playbook-language-example

Playbook Execution

First Ansible gathers facts about the hosts. Next, the command is executed. Rather than inspecting the output from each end host, the focus shifts towards errors only. Last is the play recap. Even though we only ran a command, Ansible considers it a change.

https://docs.ansible.com/ansible/playbooks_intro.html#executing-a-playbook

Playbooks

Four major aspects:

* Packages needed
* Service handler
* System config
* App config files

Standard playbook creation loop: pick a module, implement what it needs, and run to test.

https://docs.ansible.com/ansible/modules_by_category.html

Packages: `apt`

A task has a `name`, for example install nginx, and a module call: `apt: name=nginx state=present update_cache=yes`. This is the same as using apt on the command line. `state` is the desired package state: present checks that the package is installed, latest ensures it's up to date, and you can also pin the package version. The third parameter, `update_cache=yes`, runs apt update first.
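Written out in full YAML, the nginx install task might look like the sketch below. The loadbalancer target and the play-level become are assumptions (installing packages needs root).

```yaml
---
- hosts: loadbalancer
  become: true
  tasks:
    - name: install nginx
      apt:
        name: nginx
        state: present      # "latest" would also upgrade an installed package
        update_cache: yes   # run "apt update" before installing
```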

https://docs.ansible.com/ansible/apt_module.html

Packages: `become`

Even though the ansible user is a sudoer, we need to tell Ansible to use those permissions. We can do so by adding `become: true`. It applies at the level where it is specified, i.e. if set at the top of a play, it applies to every task in that play. This used to be called sudo.
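To illustrate the two levels, here is a sketch with become set once for a whole play and once for a single task. The package names are assumptions picked for the example.

```yaml
---
- hosts: webserver
  become: true            # every task in this play runs with privilege escalation
  tasks:
    - name: install apache2
      apt:
        name: apache2
        state: present

- hosts: control
  tasks:
    - name: install curl
      become: true        # only this task is escalated
      apt:
        name: curl
        state: present
```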

https://docs.ansible.com/ansible/become.html https://docs.ansible.com/ansible/YAMLSyntax.html#yaml-basics http://yaml.org/type/bool.html

Packages: `with_items`

Ansible includes the `with_items` loop to help reduce code repetition. Feed it a list of things to loop over, and use Jinja2 templating to reference the current element as {{ item }} in the task declaration.
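A sketch of a with_items loop that installs several packages from one task. The package list is an assumption and not necessarily what the training files install.

```yaml
---
- hosts: webserver
  become: true
  tasks:
    - name: install web components
      apt:
        name: "{{ item }}"   # replaced by each list element in turn
        state: present
      with_items:
        - apache2
        - libapache2-mod-wsgi
        - python-pip
        - python-virtualenv
```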

https://docs.ansible.com/ansible/playbooks_loops.html#standard-loops http://jinja.pocoo.org/

Services: `service`

Ansible can manage services; it just needs to know they are there. Example: `service: name=nginx state=started enabled=yes`. For `state`, the most common values are started, stopped, restarted, and reloaded. `enabled=yes` means the service will start at boot.
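The same service call in full YAML form, as a small sketch. The loadbalancer target is assumed.

```yaml
---
- hosts: loadbalancer
  become: true
  tasks:
    - name: ensure nginx started
      service:
        name: nginx
        state: started
        enabled: yes   # start the service at boot
```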

https://docs.ansible.com/ansible/service_module.html

Support Playbook 1 - Stack Restart

This playbook restarts the entire stack to bring it back to a known good configuration. Start by taking down the stack in order from the user-facing side inward (i.e. load balancer, then web server), then restart the database, and finally bring the services back up in reverse order.
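A possible shape for such a restart playbook is sketched below. The database group and the apache2 and mysql service names are assumptions; the real stack_restart.yml may be organized differently.

```yaml
---
# Bring the stack down, user-facing side first.
- hosts: loadbalancer
  become: true
  tasks:
    - name: stop nginx
      service:
        name: nginx
        state: stopped

- hosts: webserver
  become: true
  tasks:
    - name: stop apache2
      service:
        name: apache2
        state: stopped

# Restart the database while nothing is connected to it.
- hosts: database
  become: true
  tasks:
    - name: restart mysql
      service:
        name: mysql
        state: restarted

# Bring the stack back up in reverse order.
- hosts: webserver
  become: true
  tasks:
    - name: start apache2
      service:
        name: apache2
        state: started

- hosts: loadbalancer
  become: true
  tasks:
    - name: start nginx
      service:
        name: nginx
        state: started
```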

Services: `apache2_module`, `handlers`, `notify`

Here we begin to prepare Apache for our Python application, which we will serve with mod_wsgi. The Apache module can be enabled with the `apache2_module` Ansible module. After enabling an Apache module, Apache must be restarted. `apache2_module` is idempotent, so if the module is already enabled the task reports no change, but a service task with state=restarted would restart Apache on every run no matter what. To solve this, we can define a handler: by default it will not fire unless a task that changed something requests it with `notify`. The nice thing about notify is that it aggregates multiple calls and runs the handler only once.
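A sketch of the notify and handler pairing for mod_wsgi. The handler name is arbitrary, but the string given to notify has to match it exactly.

```yaml
---
- hosts: webserver
  become: true
  tasks:
    - name: ensure mod_wsgi enabled
      apache2_module:
        name: wsgi
        state: present
      notify: restart apache2   # only queued when this task reports "changed"

  handlers:
    - name: restart apache2
      service:
        name: apache2
        state: restarted
```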

https://docs.ansible.com/ansible/playbooks_intro.html#handlers-running-operations-on-change https://docs.ansible.com/ansible/apache2_module_module.html

Files: `copy`

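A relative source path is resolved relative to the playbook file. If the source is a directory, a trailing / copies only the contents of the directory; without it, the directory itself is copied.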
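A short sketch of copying a directory with copy. The source path demo/app/ is hypothetical; the destination reuses the /var/www/demo path that appears elsewhere in these notes.

```yaml
---
- hosts: webserver
  become: true
  tasks:
    - name: copy demo app source
      copy:
        src: demo/app/        # trailing slash: copy the directory contents only
        dest: /var/www/demo
        mode: "0755"
```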

https://docs.ansible.com/ansible/copy_module.html

Application Modules: `pip`

Works as expected: `pip: requirements=/var/www/demo/requirements.txt virtualenv=/var/www/demo/.venv` installs the listed requirements into the given virtualenv.

https://docs.ansible.com/ansible/pip_module.html

Files: `file`

Can be used to ensure files exist, do not exist, are symlinks, etc. Here we will be ensuring the default sites-enabled conf is absent and our demo site conf is a link instead.
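As a sketch, removing the default site and linking the demo site could look like this. It assumes the nginx paths used in the template section; the same pattern applies to Apache's sites-enabled directory.

```yaml
---
- hosts: loadbalancer
  become: true
  tasks:
    - name: de-activate default nginx site
      file:
        path: /etc/nginx/sites-enabled/default
        state: absent

    - name: activate demo nginx site
      file:
        src: /etc/nginx/sites-available/demo
        dest: /etc/nginx/sites-enabled/demo
        state: link
```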

https://docs.ansible.com/ansible/file_module.html

Files: `template`

Templates let you use Jinja2, including programming constructs such as loops, to generate file contents based on variables. Example: `template: src=templates/nginx.conf.j2 dest=/etc/nginx/sites-available/demo mode=0755`

https://docs.ansible.com/ansible/template_module.html

Files: `lineinfile`

`lineinfile` ensures a particular line is present in a file on the host, optionally replacing an existing line that matches a regular expression. In this case we point it at the MySQL config and use a regex to set the bind-address.
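A sketch of the bind-address change. The config path /etc/mysql/my.cnf, the 0.0.0.0 value, the database group, and the restart handler are all assumptions for illustration.

```yaml
---
- hosts: database
  become: true
  tasks:
    - name: ensure mysql listening on all interfaces
      lineinfile:
        path: /etc/mysql/my.cnf            # assumed location of the config
        regexp: '^bind-address'            # line to replace if it exists
        line: 'bind-address = 0.0.0.0'     # line that must be present
      notify: restart mysql

  handlers:
    - name: restart mysql
      service:
        name: mysql
        state: restarted
```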

https://docs.ansible.com/ansible/lineinfile_module.html

Application Modules: `mysql_db`, `mysql_user`

Ansible has several modules for working with MySQL; here we use `mysql_db` and `mysql_user` to set up the basic configuration. They require the python-mysqldb package to be installed on the host. For `mysql_user`, priv=demo.*:ALL uses standard MySQL privilege syntax.
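A sketch of the database and user setup. The database name, user name, password, and host value are placeholders; only the priv string comes from the notes above.

```yaml
---
- hosts: database
  become: true
  tasks:
    - name: install mysql python bindings
      apt:
        name: python-mysqldb
        state: present

    - name: create demo database
      mysql_db:
        name: demo
        state: present

    - name: create demo user
      mysql_user:
        name: demo
        password: demo        # placeholder only; real secrets belong in a vault
        priv: "demo.*:ALL"
        host: "%"
        state: present
```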

https://docs.ansible.com/ansible/mysql_db_module.html https://docs.ansible.com/ansible/mysql_user_module.html

Support Playbook 2 - Stack Status: `wait_for`

It's helpful now to make a playbook that lets us quickly check the status of services. We want this playbook to be read-only, which we can do with status commands. We also want to ensure the hosts are answering on the correct ports; for this we can use `wait_for` with very short timeouts, since they should respond quickly. Additionally, we can add `wait_for` to the restart playbook to handle draining.
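A sketch of a read-only status check for one tier. Port 80 and the one-second timeout are assumptions.

```yaml
---
- hosts: loadbalancer
  become: true
  tasks:
    - name: verify nginx service
      command: service nginx status

    - name: verify nginx is listening on 80
      wait_for:
        port: 80
        timeout: 1   # seconds; the port should answer almost immediately
```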

https://docs.ansible.com/ansible/wait_for_module.html

Support Playbook 2 - Stack Status: `uri`, `register`, `fail`, `when`

`uri` gives us an end-to-end web application test. `register` stores the result of a task, including the returned content and output, in a variable. `fail` checks the contents and fails under certain conditions. `when` provides the conditional logic, such as deciding when fail should trigger. By putting all this together, we can create an end-to-end test for our app.
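Putting the four together, an end-to-end check might be sketched as follows. The variable name lb_index and the expected string Hello are hypothetical, and it assumes the load balancer host names resolve from the control node.

```yaml
---
- hosts: control
  tasks:
    - name: verify end-to-end response
      uri:
        url: "http://{{ item }}"
        return_content: yes
      with_items: "{{ groups['loadbalancer'] }}"
      register: lb_index

    - name: fail if the expected text is missing
      fail:
        msg: "index failed to return content"
      when: "'Hello' not in item.content"
      with_items: "{{ lb_index.results }}"   # one result per looped host
```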

https://docs.ansible.com/ansible/uri_module.html https://docs.ansible.com/ansible/playbooks_conditionals.html#register-variables https://docs.ansible.com/ansible/playbooks_conditionals.html#the-when-statement https://docs.ansible.com/ansible/playbooks_loops.html#standard-loops

Roles

At this point we are finished with the basics, but we are making a lot of assumptions and not following best practices, especially around security. We've made it work, now we make it right. We can do this by going back and introducing roles and variables. We will focus on reusability, maintainability, and security. We should be able to change playbooks easily without much extra work, and roles allow this through encapsulation.

https://docs.ansible.com/ansible/playbooks_roles.html

Converting to Roles: `tasks`, `handlers`

Once the tasks have been moved to the role's tasks/main.yml, Ansible can make some implicit assumptions about where to find them, so we no longer need to define these tasks in the plays and can instead reference them with the `roles` key. Note: the database tasks had a possible deadlock condition; to fix this we put the ensure mysql started task after the configuration changes. Handlers can be moved into the role in the same way.
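After the conversion, a play shrinks to something like the sketch below. The directory layout in the comments is the standard role layout; the apache2 role name is taken from the next section.

```yaml
---
# Standard role layout (the files hold plain lists, without a "tasks:" key):
#   roles/apache2/tasks/main.yml
#   roles/apache2/handlers/main.yml
- hosts: webserver
  become: true
  roles:
    - apache2
```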

https://docs.ansible.com/ansible/playbooks_roles.html

Converting to Roles: `files`, `templates`

We're going to move the nginx.conf.j2 to a location closer to the role file and change the relative path. We'll also move the demo app files under the apache2 role.

Site.yml: `include`

By creating a site.yml and adding all our playbooks with include, we can automate executing all of them.
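A sketch of what such a site.yml could contain. The playbook file names are hypothetical. Newer Ansible releases use import_playbook for this purpose instead of include.

```yaml
---
# site.yml: run every playbook in order with a single command.
- include: control.yml
- include: database.yml
- include: webserver.yml
- include: loadbalancer.yml
```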

https://docs.ansible.com/ansible/playbooks_roles.html

Variables: `facts`

By using facts, we can create dynamic variables based on host information, such as the IP address. We can replace the mysql bind-address value with {{ ansible_eth0.ipv4.address }}. stack_status.yml is now out of sync with our configuration and must be fixed.

https://docs.ansible.com/ansible/playbooks_variables.html#information-discovered-from-systems-facts

Variables: `defaults`

Now we can replace the hard-coded configuration values in the database role with variables, and put their default values in defaults/main.yml.
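A sketch of a defaults file and a task that consumes it, shown as two YAML documents. The role name mysql and every variable name and value here are assumptions.

```yaml
---
# roles/mysql/defaults/main.yml: lowest-precedence values, easy to override
db_name: myapp
db_user_name: dbuser
db_user_pass: dbpass      # placeholder
db_user_host: localhost
---
# roles/mysql/tasks/main.yml (excerpt): reference the variables, not literals
- name: create database
  mysql_db:
    name: "{{ db_name }}"
    state: present
```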

https://docs.ansible.com/ansible/playbooks_roles.html#role-default-variables

Variables: `vars`

There are several precedence levels for variables; generally, the closer you get to what is actually being run, the higher the precedence you want. Be careful not to have too many overrides. Variables effectively share a global scope, so organization is important. Here we will use three levels: defaults, roles, and group_vars. Variables are passed in as a dictionary.

https://docs.ansible.com/ansible/playbooks_variables.html

Variables: `with_dict`

`with_dict` is how dictionary variables can be looped over. Since each item comes from a dictionary, we use {{ item.key }} to reference the key and {{ item.value }} to reference the value.
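A sketch of looping over a dictionary of sites. The contents of sites (the demo key and its port values) are hypothetical.

```yaml
---
- hosts: loadbalancer
  become: true
  vars:
    sites:
      demo:
        frontend: 80
        backend: 80
  tasks:
    - name: configure nginx sites
      template:
        src: templates/nginx.conf.j2
        # item.key is "demo"; item.value is the nested dict, which is
        # also available inside the template
        dest: "/etc/nginx/sites-available/{{ item.key }}"
        mode: "0755"
      with_dict: "{{ sites }}"
```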

https://docs.ansible.com/ansible/playbooks_loops.html#looping-over-hashes

Selective Removal: `shell`, `register`, `with_items`, `when`

Drift has happened in our config now: the demo site is still there, but Ansible no longer cares about it since it's no longer in the config. To handle this, we will ensure nothing else is active either. Use `shell` to run a command on the host to find out what is already activated. `register` stores the output in a variable, whose stdout_lines is a list. `with_items` loops through the list. `when` performs the action only when it finds a site that is not in the sites key.
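The steps above can be sketched like this. The variable names active and sites follow the ones used later in these notes; the ls command, the nginx path, and the myapp entry are assumptions.

```yaml
---
- hosts: loadbalancer
  become: true
  vars:
    sites:
      myapp: {}
  tasks:
    - name: get active sites
      shell: ls -1 /etc/nginx/sites-enabled
      register: active

    - name: de-activate sites that are no longer configured
      file:
        path: "/etc/nginx/sites-enabled/{{ item }}"
        state: absent
      with_items: "{{ active.stdout_lines }}"
      when: item not in sites   # "in" checks the dictionary keys
```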

https://docs.ansible.com/ansible/shell_module.html https://docs.ansible.com/ansible/playbooks_conditionals.html#register-variables https://docs.ansible.com/ansible/playbooks_conditionals.html#the-when-statement https://docs.ansible.com/ansible/playbooks_loops.html#standard-loops

Variables: `vars_files`, `group_vars`

Ansible has a few ways to keep variables in external files, such as the inventory, vars_files, and group_vars. We're not going to use the inventory, so as not to overload it and mix in logic; instead we use group_vars/all.yml to keep everything in the same location.

https://docs.ansible.com/ansible/intro_inventory.html#splitting-out-host-and-group-specific-data https://docs.ansible.com/ansible/playbooks_variables.html#variable-file-separation

Variables: `vault`

Secrets (passwords, SSH keys, etc.) are very dangerous to leave in plain text in your configuration. Vault creates an encrypted file to store secrets safely, protected by a passphrase. Since it encrypts the entire file, it's best to separate your secret variables from the rest of your variables, and that's what we will do: group_vars/all becomes group_vars/all/vars, and group_vars/all/vault will be the vault. You can use `ansible-playbook --ask-vault-pass` to unlock the vault while you work. You can also create a vault password file somewhere safe, like in your home directory, and tell Ansible where to find it by defining vault_password_file = <file path> in ansible.cfg. These only support one vault password, so if you are managing multiple environments this may be tricky.
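One way to split the two files is sketched below as two YAML documents. The vault_ prefix is a common naming convention rather than something Ansible requires, and the variable names and value are placeholders.

```yaml
---
# group_vars/all/vars: plain text, safe to read and diff
db_user_pass: "{{ vault_db_user_pass }}"
---
# group_vars/all/vault: shown decrypted here; on disk the whole file is
# encrypted with ansible-vault
vault_db_user_pass: changeme
```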

https://docs.ansible.com/ansible/playbooks_vault.html

External Roles & Galaxy

Ansible Galaxy is a platform to share third-party roles. There are advantages and disadvantages, like with using any external libraries and tools.

Some considerations:

* Age
* Ratings
* App feature coverage
* Updates
* Dependence
* Modifications you must make

Quiz

Q: How could you define a variable value and be absolutely sure that it would not be overridden anywhere else by Ansible?
A: Pass the variable using the -e or --extra-vars parameter when running ansible-playbook.

Q: What ad-hoc command would you run to determine the facts available for a server?
A: ansible -m setup, followed by the host pattern. The setup module will query all facts on a host and return them.

Advanced Execution

Now that we are done with our configuration, we move on to making things faster. This isn't critical if performance is good enough, and if the playbooks are still actively being changed it's a waste of time. First we start with a benchmark to refer to later: `time ansible-playbook site.yml`. This should be a no-op run in which no changes are made. We will also benchmark a stack restart: `time ansible-playbook playbooks/stack_restart.yml`

Removing Unnecessary Steps: `gather_facts`

For any play where we do not need to use facts, we can simply add a key to skip the step: `gather_facts: false`. This gives immediate gains with no downsides if done correctly.
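A sketch of a play that skips fact gathering. It is only safe because none of its tasks reference a fact.

```yaml
---
- hosts: loadbalancer
  become: true
  gather_facts: false     # skip the setup step; no task here uses facts
  tasks:
    - name: ensure nginx started
      service:
        name: nginx
        state: started
```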

https://docs.ansible.com/ansible/playbooks_variables.html#turning-off-facts

Extracting Repetitive Tasks: `cache_valid_time`

Updating the apt cache has a lot of overhead, so by setting `cache_valid_time` we tell Ansible to refresh the cache only if it is older than the given number of seconds, not on every single run. This means that when we do refresh we pay slightly higher time costs, but we do it far less often, with big long-term gains.
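A sketch of a single shared cache-update task. The 24-hour window is an assumption; pick whatever period is reasonable for your environment.

```yaml
---
- hosts: all
  become: true
  tasks:
    - name: update apt cache at most once a day
      apt:
        update_cache: yes
        cache_valid_time: 86400   # seconds; 86400 = 24 hours
```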

https://docs.ansible.com/ansible/apt_module.html

Limiting Execution by Hosts: `limit`

Now site.yml is taking about 1 minute, which isn't very long, but it can quickly add up. If we only need to run a playbook on a subset of hosts, we can run it with a limit specified: `ansible-playbook site.yml --limit app01`. This lets us keep using the whole site.yml logic while only touching those hosts.

https://docs.ansible.com/ansible/intro_patterns.html

Limiting Execution by Tasks: `tags`

The `tags` key allows us to run only a specific subset of tasks or plays. For example, setting tags: [ 'packages' ] on a task allows us to do this: `ansible-playbook site.yml --tags "packages"`, and only the tagged tasks will execute. We can also invert this logic: `ansible-playbook site.yml --skip-tags "packages"`. Now our runtime is down to 15 seconds. Tags are flexible but can easily get out of hand.
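A sketch of tagged tasks. The packages tag comes from the notes above; the service tag is a hypothetical second tag for contrast.

```yaml
---
- hosts: loadbalancer
  become: true
  tasks:
    - name: install nginx
      apt:
        name: nginx
        state: present
      tags: [ 'packages' ]   # runs with --tags "packages"

    - name: ensure nginx started
      service:
        name: nginx
        state: started
      tags: [ 'service' ]    # skipped with --tags "packages"
```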

https://docs.ansible.com/ansible/playbooks_tags.html

Idempotence: `changed_when`, `failed_when`

By using these, we can define when a task is reported as changed or failed based on conditions. `changed_when: false` means the task is never reported as changed. `changed_when: "active.stdout_lines != sites.keys()"` is a Jinja2 expression (it reads much like Python) that is evaluated for truthiness.
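A sketch of changed_when on a read-only command, which would otherwise show up as changed on every run. The ls command and path are assumptions.

```yaml
---
- hosts: loadbalancer
  become: true
  tasks:
    - name: get active sites
      shell: ls -1 /etc/nginx/sites-enabled
      register: active
      changed_when: false   # read-only command, so never report a change
```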

https://docs.ansible.com/ansible/playbooks_error_handling.html#overriding-the-changed-result

Accelerated Mode and Pipelining

Ansible uses the installed OpenSSH, with a fallback to the paramiko library for compatibility at the cost of some performance. Accelerated mode and pipelining take advantage of newer features as long as your environment meets the requirements. They are not considered best practice, but they are available if needed.

https://docs.ansible.com/ansible/playbooks_acceleration.html https://docs.ansible.com/ansible/intro_configuration.html#pipelining

Troubleshooting, Testing, & Validation

Troubleshooting Ordering Problems

Inevitably you will run into errors and playbooks will not execute; ordering problems are usually the cause. For example: a configuration change might not work, so you can't restart the service, and the service is expected to be up on the next run. There are many ways around this, but the best way is to think through logically what is happening and fix the source of the problem.

Jumping to Specific Tasks: `list-tasks`, `step`, `start-at-task`

While troubleshooting, you can use these options to limit the run to just the tasks you are working on.

* `ansible-playbook site.yml --step` goes through the playbook step by step, requiring interaction to continue.
* `ansible-playbook site.yml --list-tasks` lists every task that would be executed, so we can pick one to pass to --start-at-task.

https://docs.ansible.com/ansible/playbooks_startnstep.html

Retrying Failed Hosts

Not every host will necessarily fail. When some hosts fail, Ansible writes a file containing just those hosts, which can be used to limit the next run: `--limit @/home/ansible/site.retry`

Syntax-Check & Dry-Run: `syntax-check`, `check`

Static analysis and a no-op dry run:

* `ansible-playbook --syntax-check site.yml`
* `ansible-playbook --check site.yml`

It may be helpful to run these against specific playbooks rather than an overarching playbook like site.yml. Dynamic data or reliance on previous tasks can't be checked, so this is just a best guess. You always have to provide some sort of inventory to run these.

https://docs.ansible.com/ansible/playbooks_checkmode.html

Debugging: `debug`

Prints messages or variable values at the defined point in your playbook. For example: `- debug: var=active.stdout_lines`
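A sketch showing both forms of debug, var and msg, against a registered result. The shell command and path are assumptions.

```yaml
---
- hosts: loadbalancer
  become: true
  tasks:
    - name: get active sites
      shell: ls -1 /etc/nginx/sites-enabled
      register: active
      changed_when: false

    - name: show the registered list
      debug:
        var: active.stdout_lines

    - name: show a formatted message
      debug:
        msg: "found {{ active.stdout_lines | length }} active site(s)"
```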

https://docs.ansible.com/ansible/debug_module.html

Quiz

Q: Which of the following is NOT an option presented after running ansible-playbook with '--step'?
A: c - cancel execution and return to the prompt. Entering "c" continues the playbook as normal. Use Ctrl+C to break and cancel playbook execution.

Q: When there are failed hosts, Ansible creates a "retry" file at /home/<user>/<playbook>.retry. How would you use a file /home/ansible/site.retry to retry the last execution?
A: ansible-playbook site.yml --limit @/home/ansible/site.retry. The site.yml playbook should be executed as normal. The .retry file is a list of hosts that can be used to limit the execution to only the failures from the previous run. The "@" instructs ansible-playbook to use the contents of the retry file for the limit, not the literal file path.
