How to Unarchive Different Files in Different Servers in Just One Shot

Learn how to set up automation for unarchiving different files on different servers using Ansible, to save you a lot of time.

It would be simpler if you had to unarchive just one file on several servers, but what about different files on different servers? A sysadmin friend of mine reached out to me with exactly that challenge: quite often, he had to place specific files on a bunch of servers for monitoring purposes.

He had a routine to package all the files needed by each server into TAR.GZ files. After the packaging step, he put all the tarballs on an Apache server, so that each one could be downloaded through its own URL. Finally, no matter how long it took, he logged in to the servers one by one, downloaded the specific compressed file, and extracted it to a directory. Needless to say, there was a better way.

The solution can be checked out on GitHub. It was developed using Ansible and tested in a VM environment built using Vagrant and the VirtualBox hypervisor. The details are shown below.

The Environment

In order to simulate my friend’s environment, 3 VMs were used: 1 representing the Apache server, called repo, and 2 representing the different servers, server1 and server2. Each one received an IP address, and communication between them was established through a private network. Vagrant was the VM management tool used to bring them all up with just one command: vagrant up. The Vagrantfile below tells Vagrant how to do that.

Vagrant.configure("2") do |config|
  config.vm.box = "minimal/trusty64"

  config.vm.define "repo" do |repo|
    repo.vm.hostname = "repo.local"
    repo.vm.network "private_network", ip: "192.168.33.10"
    repo.vm.provision "ansible" do |ansible|
      ansible.playbook = "playbook-repo.yml"
    end
  end

  config.vm.define "server1" do |server1|
    server1.vm.hostname = "server1.local"
    server1.vm.network "private_network", ip: "192.168.33.20"
  end

  config.vm.define "server2" do |server2|
    server2.vm.hostname = "server2.local"
    server2.vm.network "private_network", ip: "192.168.33.30"
  end
end

Notice that the Vagrantfile defines:

  • The VM image (box) to be used: minimal/trusty64 (requires the Oracle VM VirtualBox Extension Pack), a reduced version of Ubuntu (faster download and boot);

  • The hostname and the IP of each VM, including how they communicate with each other: private_network;

  • The provisioning of the repo VM, done by Ansible, an automation tool that must be installed on the Vagrant host machine beforehand.

The Repo Server Provisioning

The repo server is provisioned by Ansible during the vagrant up execution. The Apache HTTP Server is installed and 2 compressed files are obtained from the Internet. The objective is to make the files available for internal download through their URLs. Ansible executes the playbook-repo.yml below to do that.

---
- hosts: repo
  become: yes
  gather_facts: no
  tasks:
  - name: Install Apache 2
    apt:
      name: apache2
      update_cache: yes
  - name: Download files
    get_url:
      url: "{{item.url}}"
      dest: "/var/www/html/{{item.dest}}"
    with_items:
    - url: "https://archive.apache.org/dist/maven/maven-3/3.5.0/binaries/apache-maven-3.5.0-bin.tar.gz"
      dest: "server1.tar.gz"
    - url: "https://archive.apache.org/dist/ant/binaries/apache-ant-1.10.1-bin.zip"
      dest: "server2.zip"

Some details about the playbook-repo.yml execution:

  • The VM user must become root, in order to install the Apache Server, hence the become: yes;
  • Ansible by default collects information about the target host. It’s an initial step before the tasks are executed. When such information is not necessary, the step can be bypassed. Setting gather_facts: no, in this case, is recommended to save time, too;
  • The installation of the Apache Server was done through the apt module, which drives APT, the package management tool of Ubuntu. If the OS were CentOS, for example, it could be installed through the yum module;
  • Both files are downloaded in just one task. It’s possible because Ansible allows the use of loops, through the with_items statement.
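
The CentOS case mentioned above could look roughly like the sketch below. It is not part of the original solution and was not tested in the environment described here. It assumes a CentOS target, where Apache is packaged as httpd and, unlike on Ubuntu, is not started automatically after installation.

```yaml
# Sketch only: CentOS counterpart of the "Install Apache 2" task (assumption, untested here).
- name: Install Apache (packaged as httpd on CentOS)
  yum:
    name: httpd
    state: present
# The service has to be started explicitly on CentOS.
- name: Start Apache now and enable it at boot
  service:
    name: httpd
    state: started
    enabled: yes
```

These tasks would take the place of the apt task in playbook-repo.yml. The download task could stay as it is, assuming the document root is /var/www/html, which is the default for the httpd package.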

The playbook-servers.yml Execution

Ansible can execute tasks on several target hosts in just one shot. That’s possible because of the inventory file, where groups of hosts can be defined. The hosts file below defines the servers group, composed of server1 (192.168.33.20) and server2 (192.168.33.30).

[repo]
192.168.33.10

[servers]
192.168.33.20
192.168.33.30
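
The same groups can also be written in the YAML inventory format supported by recent Ansible releases. The sketch below is meant as an equivalent of the hosts file above. The file name hosts.yml is an assumption, and the rest of this article keeps using the INI-style hosts file.

```yaml
# hosts.yml (hypothetical file name): YAML equivalent of the INI-style hosts file
all:
  children:
    repo:
      hosts:
        192.168.33.10:
    servers:
      hosts:
        192.168.33.20:
        192.168.33.30:
```

Such a file would be passed to ansible-playbook the same way as the INI version, through the -i argument.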

An important part of the solution was to separate all the needed parameters into a specific file, called params.json. In this file, each server has its compressed file URL defined, as well as its target directory, where the downloaded file will be extracted, as shown below. Notice that both URLs point to the repo server (192.168.33.10), each one to a file made available during the provisioning phase.

[
  {
    "host": "server1",
    "url": "http://192.168.33.10/server1.tar.gz",
    "target": "/var/target"
  },
  {
    "host": "server2",
    "url": "http://192.168.33.10/server2.zip",
    "target": "/var/target"
  }
]

With the environment up and the parameters defined, we can finally unarchive different files on different servers in just one shot by executing the command ansible-playbook playbook-servers.yml -u vagrant -k -i hosts. The -u argument defines the SSH user, the -k argument prompts for the SSH password (vagrant, too), and the -i argument points to the hosts file described earlier, instead of the default /etc/ansible/hosts. The playbook-servers.yml is shown below.

---
- hosts: servers
  become: yes
  vars:
    hostname: "{{ansible_hostname}}"
    params: "{{lookup('file', 'params.json')}}"
    url_query: "[?host=='{{hostname}}'].url"
    url_param: "{{(params|json_query(url_query))[0]}}"
    target_query: "[?host=='{{hostname}}'].target"
    target_param: "{{(params|json_query(target_query))[0]}}"
  tasks:
  - name: Create the target directory if it doesn't exist
    file:
      path: "{{target_param}}"
      state: directory
  - name: Install unzip
    apt:
      name: unzip
      update_cache: yes
    when: url_param is match(".*\.zip$")
  - name: Unarchive from url
    unarchive:
      src: "{{url_param}}"
      dest: "{{target_param}}"
      remote_src: yes

Some details about the playbook-servers.yml execution:

  • By pointing to the group servers (hosts: servers), Ansible is able to execute the same playbook for both servers: server1 and server2;
  • The parameters of each server are obtained through variables:
    • hostname – the name of the current host, found by Ansible during the fact-gathering phase;
    • params – the params.json file content, returned by the lookup function;
    • url_query – the query to find the URL parameter defined for the current host;
    • url_param – the URL parameter defined for the current host, returned by the json_query filter;
    • target_query – the query to find the target parameter defined for the current host;
    • target_param – the target directory defined for the current host, returned by the json_query filter.
  • The target directory is created, if it doesn’t exist yet. It’s required by the unarchive task. Otherwise, an error occurs;
  • The unzip tool is installed only if the remote file has the .zip extension. That’s the case for server2’s remote file, and the subsequent unarchive task relies on the unzip command being present on the target host to extract ZIP archives. If the when condition is not met, the task is skipped;
  • Finally, the compressed file is downloaded from the repo server and extracted to the target directory.
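
The json_query filter depends on the jmespath Python library being installed on the machine that runs Ansible. Where that library is not available, the per-host lookup described above could be sketched with Jinja2's built-in selectattr filter instead. This is an alternative, not the approach used in the solution. The host_params variable name is hypothetical, and the debug task only prints what was resolved for each host.

```yaml
---
# Sketch only: the same per-host lookup without json_query (assumption, not the original playbook).
- hosts: servers
  vars:
    # Parse the params.json content into a list of dictionaries
    params: "{{ lookup('file', 'params.json') | from_json }}"
    # First entry whose "host" equals the current host's short hostname
    host_params: "{{ params | selectattr('host', 'equalto', ansible_hostname) | list | first }}"
  tasks:
  - name: Show the parameters resolved for this host
    debug:
      msg: "url={{ host_params.url }} target={{ host_params.target }}"
```

With this variant, host_params.url and host_params.target would play the roles of url_param and target_param in the tasks.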

The output of the execution is shown below:

ansible-playbook playbook-servers.yml -u vagrant -k -i hosts
SSH password: 

PLAY [servers] *********************************************************************************************************************************************************************************************

TASK [Gathering Facts] *************************************************************************************************************************************************************************************
ok: [192.168.33.30]
ok: [192.168.33.20]

TASK [Create the target directory if it doesn't exist] *****************************************************************************************************************************************************
changed: [192.168.33.20]
changed: [192.168.33.30]

TASK [Install unzip] ***************************************************************************************************************************************************************************************
skipping: [192.168.33.20]
changed: [192.168.33.30]

TASK [Unarchive from url] **********************************************************************************************************************************************************************************
changed: [192.168.33.20]
changed: [192.168.33.30]

PLAY RECAP *************************************************************************************************************************************************************************************************
192.168.33.20              : ok=3    changed=2    unreachable=0    failed=0   
192.168.33.30              : ok=4    changed=3    unreachable=0    failed=0

Conclusion

My friend was really happy to save so much time with this automation, and I’m sure other sysadmins with the same or similar tasks can benefit from it. So, if you enjoyed the solution, or think it could be useful to a friend of yours, don’t hesitate to share it.

Regardless of its utility, bear in mind that this solution is a work in progress, so feel free to collaborate and improve it. After all, that’s the open source way.

Finally, if you want my help in automating something, please give me more details and tell me about your problem. It may be someone else’s problem, too.

Published at DZone with permission of Gustavo Carmo. See the original article here.
