I recently did a two-part video series on HA AWX Ansible. Check out part 1 here if you haven’t seen it.
In this post I’ll share some of the finer details of what I did in the video.
To start off, I need to clarify what I mean by “High Availability”. This is not a true HA solution but rather a “reasonably resilient to failure” one.
In my lab I have infrastructure in two different data centers on opposite sides of the country. Each site is protected by a firewall, with an IPsec VPN tunnel between them.
For this all to work, I need an external postgres database server, separate from the AWX host.
To make things easy, I’ll be using Juju charms and Metal as a Service (MAAS) to provision the infrastructure and bring postgres to life.
To make sure the database machines get placed in opposite data centers, I add a tag of awx_db to a machine in US East inside MAAS and then instruct Juju to build a postgresql instance on an Ubuntu 18.04 server, using the tag as a constraint. Once the primary server is provisioned, I add the same tag to a second machine in US West and then “scale”, that is, add another postgresql unit to the model.
root@maas-region-ctl:~# juju add-model ha-awx-ansible
root@maas-region-ctl:~# juju deploy postgresql awx-postgres-db --series bionic --constraints tags=awx_db
root@maas-region-ctl:~# juju add-unit awx-postgres-db
Once the database servers are up, I use the Juju GUI to make one small configuration change: setting the extra_pg_auth option, which allows external applications to connect to the postgres server. I set the string to:
host all all 0.0.0.0/0 md5
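That string is a pg_hba.conf-style rule. Its fields are the connection type `host` (TCP/IP), the database `all`, the user `all`, the client address `0.0.0.0/0`, and the `md5` password method. The sketch below parses such a rule with Python’s standard `ipaddress` module to show how broad `0.0.0.0/0` is: it matches every IPv4 client, which leaves the password as the only gate.

`parse_hba_host_rule` and `rule_admits` are hypothetical helpers, not part of the charm. The `10.10.0.0/16` subnet is a made-up example of a tighter, per-site rule.

```python
import ipaddress

def parse_hba_host_rule(line: str) -> dict:
    """Split a pg_hba.conf-style 'host' rule into its five fields.

    Assumes the form: host DATABASE USER ADDRESS METHOD (CIDR address).
    """
    fields = line.split()
    if len(fields) != 5 or fields[0] != "host":
        raise ValueError(f"expected 'host DATABASE USER ADDRESS METHOD', got {line!r}")
    conn_type, database, user, address, method = fields
    return {
        "type": conn_type,          # 'host' = TCP/IP connections
        "database": database,       # 'all' = every database
        "user": user,               # 'all' = every role
        "network": ipaddress.ip_network(address),
        "method": method,           # 'md5' = MD5-hashed password authentication
    }

def rule_admits(rule: dict, client_ip: str) -> bool:
    """True if client_ip falls inside the rule's ADDRESS network."""
    # An IPv6 client never matches an IPv4 network (and vice versa).
    return ipaddress.ip_address(client_ip) in rule["network"]

wide_open = parse_hba_host_rule("host all all 0.0.0.0/0 md5")
print(rule_admits(wide_open, "203.0.113.7"))  # True: 0.0.0.0/0 covers every IPv4 address

# Hypothetical tighter rule limited to one site's subnet.
site_only = parse_hba_host_rule("host all all 10.10.0.0/16 md5")
print(rule_admits(site_only, "203.0.113.7"))  # False
```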
Once this is done I’ll ssh to both units, switch to the postgres user and set the postgres password.
root@maas-region-ctl:~# juju ssh awx-postgres-db/0
ubuntu@use-med-vm-04:~$ sudo su - postgres
postgres@use-med-vm-04:~$ psql
psql (10.6 (Ubuntu 10.6-0ubuntu0.18.04.1))
Type "help" for help.
postgres=# \password
Enter new password:
Enter it again:
postgres=# \q
After this, I can use pgAdmin to connect and create a new database for the AWX data. Click here to read Getting Started with pgAdmin.
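pgAdmin does this through its GUI. If you would rather type SQL into psql, the setup comes down to two statements: create a login role, then create a database owned by it. Run them against the primary, because the replica is read-only.

The helper below is a hypothetical sketch that only builds the statements and never connects to anything. The name `awx` for both the database and the user is an assumption; use whatever values you will put in the AWX inventory. The literal quoting assumes `standard_conforming_strings` is on, which is the default in modern PostgreSQL.

```python
from typing import List

def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'

def quote_literal(value: str) -> str:
    """Quote a PostgreSQL string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"

def awx_bootstrap_sql(db_name: str, user: str, password: str) -> List[str]:
    """Build the SQL to create a login role and a database owned by it."""
    role = quote_ident(user)
    return [
        f"CREATE ROLE {role} WITH LOGIN PASSWORD {quote_literal(password)};",
        f"CREATE DATABASE {quote_ident(db_name)} OWNER {role};",
    ]

for stmt in awx_bootstrap_sql("awx", "awx", "change-me"):  # placeholder password
    print(stmt)
# CREATE ROLE "awx" WITH LOGIN PASSWORD 'change-me';
# CREATE DATABASE "awx" OWNER "awx";
```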
Next, I instruct Juju to build two more servers in the same model, which I’ll use as the AWX hosts. Just as with the postgres instances, I use a tag, awx_web, to stand up the AWX web servers, and give them the friendly names awx-web-primary and awx-web-secondary.
root@maas-region-ctl:~# juju deploy ubuntu awx-web-primary --series bionic --constraints tags=awx_web
root@maas-region-ctl:~# juju deploy ubuntu awx-web-secondary --series bionic --constraints tags=awx_web
Now that I have hosts for the web tier, I need to install Ansible and the Docker engine, and finally pull down the AWX code from GitHub. Check out the Docker install instructions here.
root@maas-region-ctl:~# juju ssh awx-web-primary/0
ubuntu@usw-med-vm-03:~$ sudo apt-get update
ubuntu@usw-med-vm-03:~$ sudo apt-get install software-properties-common
ubuntu@usw-med-vm-03:~$ sudo apt-add-repository --yes --update ppa:ansible/ansible
ubuntu@usw-med-vm-03:~$ sudo apt-get install ansible
ubuntu@usw-med-vm-03:~$ sudo su -
root@usw-med-vm-03:~# git clone https://github.com/ansible/awx.git
root@usw-med-vm-03:~# cd awx/installer/
Once the AWX repository has been cloned, I open the inventory file inside the installer folder and fill out all the required parameters. Remember to uncomment “pg_hostname=” to use a remote postgres server. If “pg_hostname=” is not set, a local Docker container is started to hold the postgres data.
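This edit has to be repeated on both web servers, and again after a failover, so it can help to script it. Below is a minimal sketch. It assumes the inventory uses plain `key=value` lines and that commented-out settings look like `# pg_hostname=...`.

`set_inventory_values` is a hypothetical helper, and the IP address is made up. Keys missing from the file are appended at the end, which assumes the last section is the one that holds these variables.

```python
import re
from typing import Dict

def set_inventory_values(text: str, values: Dict[str, str]) -> str:
    """Return inventory text with each key set to the given value.

    A commented line such as '# pg_hostname=postgresql' is uncommented and
    overwritten; keys that do not appear at all are appended at the end.
    """
    remaining = dict(values)
    out_lines = []
    for line in text.splitlines():
        # Match 'key=' or '# key=' at the start of a line.
        match = re.match(r"^\s*#?\s*([A-Za-z_][A-Za-z0-9_]*)\s*=", line)
        if match and match.group(1) in remaining:
            key = match.group(1)
            line = f"{key}={remaining.pop(key)}"
        out_lines.append(line)
    for key, value in remaining.items():
        out_lines.append(f"{key}={value}")
    return "\n".join(out_lines) + "\n"

sample = "[all:vars]\n# pg_hostname=postgresql\npg_username=awx\npg_port=5432\n"
print(set_inventory_values(sample, {"pg_hostname": "10.20.0.15", "pg_database": "awx"}))
```

With the sample above, `pg_hostname` is uncommented and set to `10.20.0.15`, the other lines are left alone, and `pg_database=awx` is appended.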
To install AWX, you actually use Ansible itself: run the install.yml playbook and pass it the inventory file that holds all the settings.
root@usw-med-vm-03:~/awx/installer# ansible-playbook -i inventory install.yml
After the playbook run completes successfully, there should be four Docker containers running.
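A quick way to confirm this is to list the running containers. The sketch below calls `docker ps --format '{{.Names}}'` and checks the result.

- Only awx_web and awx_task are checked by name, because those are the two container names that come up in this post.
- The expected count of four comes from the sentence above and assumes the external-database setup described here.
- `check_awx_containers` and the other functions are hypothetical helpers.

```python
import subprocess
from typing import List

def parse_names(output: str) -> List[str]:
    """Turn one-name-per-line output into a list, skipping blank lines."""
    return [line.strip() for line in output.splitlines() if line.strip()]

def running_container_names() -> List[str]:
    """Names of running containers, via `docker ps --format '{{.Names}}'`."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}"],
        capture_output=True, text=True, check=True,
    )
    return parse_names(result.stdout)

def check_awx_containers(names: List[str], expected_count: int = 4) -> List[str]:
    """Return a list of problems; an empty list means the install looks healthy."""
    problems = []
    for required in ("awx_web", "awx_task"):
        if required not in names:
            problems.append(f"missing container: {required}")
    if len(names) != expected_count:
        problems.append(f"expected {expected_count} containers, found {len(names)}")
    return problems
```

On the AWX host, `check_awx_containers(running_container_names())` returns an empty list when everything came up.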
At this point, I have a working copy of AWX. I repeat this process on the second server and point its AWX configuration at the live postgres server as well.
To move the database from US East to US West, I simply instruct Juju to destroy the live master server. This releases the machine back into the MAAS pool and promotes the replica to be the new live master.
root@maas-region-ctl:~# juju remove-unit awx-postgres-db/0
Once the database is live in US West, I SSH into each AWX web server, edit the inventory file again so that “pg_hostname=” points to the IP address of the live postgres server in the West, and then run the installer again:
root@usw-med-vm-03:~/awx/installer# ansible-playbook -i inventory install.yml
11 thoughts on ““High Availability” AWX Ansible”
Sean,
That was a wonderful write-up. Everything worked perfectly up to the database replication (I’m about to start on that with an Oracle database). I have a question that’s been on my mind for a long time. You built a failure-resilient setup as a small version of High Availability, but how many resources do you think AWX and its containers need? And when it comes to the database, how do we figure out which tables AWX needs, and what resources the database itself needs?
Since you’ve been using it for some time, what problems have you run into, and how did you resolve them?
If you could share your thoughts on the questions above, it would be really helpful.
Hey John
I would always reference the official docs first. You can check out the system requirements here:
https://github.com/ansible/awx/blob/devel/INSTALL.md#system-requirements
I’ll be honest, I’m really not strong when it comes to the database pieces.
What I do know is that after you install AWX, the containers log into the database and create all the necessary schemas and tables. What those are exactly, I don’t know (and I’m too lazy to care), as long as they’re replicated elsewhere and backed up regularly.
My database machines are tiny: 2 cores, 8 GB of memory and 100 GB of storage, and they’ve been fine. That said, my Ansible inventories add up to about 150-200 hosts. A large deployment may need more database resources.
Problems I’ve seen:
Having both web servers pointed at the same live database sometimes results in duplicate jobs (this behaviour is inconsistent, though, so I haven’t really figured it out).
The backup web hosts have run out of space on a few occasions because they were pointed at the read-only database replica and the log files just went crazy 😀 LoL. Now I just stop the awx_task and awx_web containers on the backup web servers until I need them.
After I patch the boxes, sometimes the Docker containers don’t come up. A reboot of the host usually sorts this out.
But for the most part it’s been like super stable.
Cool, that sounds good. I can keep an eye on those failovers. I’m also wondering why you didn’t try load-balancing the two AWX instances so that it would be a complete High Availability architecture. Right now it seems to be an active-passive architecture. Any future plans to add a load balancer? Any tips on that would be helpful too.
So yeah I have thought about that.
What I’m actually doing is using Application Gateways in MS Azure to reverse proxy connections to both awx_web instances over a VPN connection (overkill, I know).
The benefit here is that I interact with AWX over HTTPS, and the backend pools on the gateway have both AWX web servers as targets, so if one goes down the app gateway automatically reroutes the request to the surviving server.
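The rerouting behaviour can be pictured as a backend pool plus a health probe. Below is a toy Python model of that idea. It is a hypothetical class with an in-memory probe function, and it does not describe how Azure Application Gateway works internally. Requests rotate round-robin across the two AWX web servers, and any server that fails its probe is skipped.

```python
from itertools import cycle
from typing import Callable, List

class BackendPool:
    """Toy round-robin pool that skips backends failing a health probe."""

    def __init__(self, backends: List[str], probe: Callable[[str], bool]):
        self.backends = backends
        self.probe = probe
        self._order = cycle(backends)

    def pick(self) -> str:
        # Try each backend at most once per request, in round-robin order.
        for _ in range(len(self.backends)):
            candidate = next(self._order)
            if self.probe(candidate):
                return candidate
        raise RuntimeError("no healthy backends")

healthy = {"awx-web-primary": True, "awx-web-secondary": True}
pool = BackendPool(["awx-web-primary", "awx-web-secondary"], lambda b: healthy[b])
print(pool.pick(), pool.pick())  # awx-web-primary awx-web-secondary
healthy["awx-web-primary"] = False
print(pool.pick(), pool.pick())  # awx-web-secondary awx-web-secondary
```

A real gateway probes on a schedule and caches the result rather than probing on every request, but the failover effect is the same: traffic only goes to servers that last passed their probe.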
Check out the video I did 😁
https://youtu.be/z7Zs-TmqlME
Actually, it’s not resolved; I’m still getting the same issue somehow. When I open awx_web in the browser, it says “SERVER ERROR : A server error has occurred”.
Port 5432 was closed on my local machine before; now I’ve opened it, but there’s still no change.
Do you think a couple more changes are needed if we deploy AWX on localhost using an external postgres?
Hey I’m not so sure. What you’re trying to do is a little strange for me.
The localhost implementation usually brings up an additional docker container for the postgres db.
There might be some conflicting logic built into the code when you set the DB connection string to ‘localhost’ while running postgres outside of Docker.
That was helpful, thanks man. I followed the same steps on localhost with an external DB, but my awx_task and awx_web containers are failing with “Is this server running on “localhost” (::1) and accepting TCP/IP connections on port 5432?”
Hey Naren
Thanks for dropping by!
Just to confirm a few things.
In the awx inventory file, did you uncomment and set the value for “pg_hostname=”?
Also, did you connect to your external postgres host and create the awx database and database user for awx_web and awx_task to connect to?
If that’s all done, I would make sure that you can connect to the postgres host on port 5432 from the actual awx server (tower host) using a telnet command like
telnet my.postgres.domain 5432
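If telnet isn’t installed on the AWX host, a few lines of Python do the same reachability test. The sketch uses the standard library’s `socket.create_connection`. Like telnet, it only proves that the TCP port is reachable. It does not check the postgres password or the pg_hba rules. `can_connect` is a hypothetical helper name.

```python
import socket

def can_connect(host: str, port: int = 5432, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # The connection is closed again as soon as the with-block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS lookup failures.
        return False
```

Run it on the AWX server itself, for example `can_connect("my.postgres.domain")` with the placeholder hostname from above.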
If you’ve used Juju to bring up the postgres host, make sure you’re setting the extra_pg_auth string on the charm and not manually on the host, because jujud will overwrite your settings.
Yup, I uncommented it and gave the details as follows:
pg_hostname: localhost (as I’m using local postgres, and AWX is deployed on my localhost, macOS)
pg_username=awx
pg_password=""
pg_database=awx
pg_port=5432
I didn’t understand what you meant by “Also, did you connect to your external postgres host and create the awx database and database user for awx_web and awx_task to connect to?”
I changed pf.conf to open port 5432, and I can see that the port is open.
I can still see these error logs in awx_task:
```
The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/bin/awx-manage", line 11, in <module>
    load_entry_point('awx==6.0.0.0', 'console_scripts', 'awx-manage')()
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/__init__.py", line 140, in manage
    execute_from_command_line(sys.argv)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/core/management/__init__.py", line 364, in execute_from_command_line
    utility.execute()
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/core/management/__init__.py", line 356, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/core/management/base.py", line 283, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/core/management/base.py", line 330, in execute
    output = self.handle(*args, **options)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/management/commands/run_dispatcher.py", line 123, in handle
    reaper.reap()
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/dispatch/reaper.py", line 36, in reap
    me = instance or Instance.objects.me()
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/managers.py", line 114, in me
    if node.exists():
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/models/query.py", line 673, in exists
    return self.query.has_results(using=self.db)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/models/sql/query.py", line 517, in has_results
    return compiler.has_results()
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/models/sql/compiler.py", line 858, in has_results
    return bool(self.execute_sql(SINGLE))
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/models/sql/compiler.py", line 887, in execute_sql
    cursor = self.connection.cursor()
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/backends/base/base.py", line 254, in cursor
    return self._cursor()
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/backends/base/base.py", line 229, in _cursor
    self.ensure_connection()
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/backends/base/base.py", line 213, in ensure_connection
    self.connect()
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/utils.py", line 94, in __exit__
    six.reraise(dj_exc_type, dj_exc_value, traceback)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/utils/six.py", line 685, in reraise
    raise value.with_traceback(tb)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/backends/base/base.py", line 213, in ensure_connection
    self.connect()
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/backends/base/base.py", line 189, in connect
    self.connection = self.get_new_connection(conn_params)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/backends/postgresql/base.py", line 176, in get_new_connection
    connection = Database.connect(**conn_params)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/psycopg2/__init__.py", line 130, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
django.db.utils.OperationalError: could not connect to server: Connection refused
Is the server running on host "localhost" (127.0.0.1) and accepting
TCP/IP connections on port 5432?
could not connect to server: Cannot assign requested address
Is the server running on host "localhost" (::1) and accepting
TCP/IP connections on port 5432?
```
My bad. I just did it the way you described: created the DB server, a user with permissions, and a DB, all named awx.
Ah awesome! Glad you figured it out 😀
Happy hacking, have fun!