How to Bulk Fetch Files from Multiple Remote Servers to Local Using Ansible

When managing infrastructure at scale, you'll often need to retrieve log files, configuration backups, or diagnostic data from multiple servers. Ansible's built-in fetch module has a significant limitation: it transfers only a single file per task invocation.

# This only fetches one file
- name: Fetch single file
  ansible.builtin.fetch:
    src: /var/log/app/error.log
    dest: /backup/logs/
    flat: yes

1. Using find + fetch with register

This approach dynamically discovers files and processes them sequentially:

- name: Find all target files
  ansible.builtin.find:
    paths: /var/log/app/
    patterns: "*.log"
    recurse: no
  register: found_files

- name: Fetch all matched files
  ansible.builtin.fetch:
    src: "{{ item.path }}"
    dest: "/backup/logs/{{ inventory_hostname }}/"
    flat: yes
  loop: "{{ found_files.files }}"

2. Parallel Transfer with async and async_status

For large file collections across many hosts, parallel execution improves performance. One caveat: fetch does not support async, because its action plugin transfers files from the controller side, so applying async directly to a fetch task fails. A common pattern is instead to run the slow remote-side step asynchronously, such as bundling the files into one archive, and then fetch the finished archive. The sketch below assumes the community.general collection is installed, that files_to_fetch is a list of remote paths, and the bundle name is illustrative:

- name: Archive target files on each host (runs remotely, so async works)
  community.general.archive:
    path: "{{ files_to_fetch }}"
    dest: /tmp/fetch_bundle.tar.gz
  async: 300
  poll: 0
  register: archive_job

- name: Wait for the archive job to finish
  ansible.builtin.async_status:
    jid: "{{ archive_job.ansible_job_id }}"
  register: archive_status
  until: archive_status.finished
  retries: 30
  delay: 5

- name: Fetch the finished archive
  ansible.builtin.fetch:
    src: /tmp/fetch_bundle.tar.gz
    dest: "/backup/{{ inventory_hostname }}/"
    flat: yes

3. Custom Module for Complex Scenarios

When you need advanced features like compression or delta transfers, a custom module is an option. Below is a minimal, illustrative sketch, not a full implementation:

# my_fetch_all.py (custom module)
# NOTE: modules run on the managed host, so dest_base must be a path that
# host can write to (a staging dir or shared storage); copying straight to
# the controller would require an action plugin instead.
from ansible.module_utils.basic import AnsibleModule
import glob
import os
import shutil

def main():
    module = AnsibleModule(
        argument_spec=dict(
            src_dir=dict(type='str', required=True),
            dest_base=dict(type='str', required=True),
            pattern=dict(type='str', default='*')
        ),
        supports_check_mode=True
    )
    # Collect matching files and stage them under dest_base
    matched = glob.glob(os.path.join(module.params['src_dir'], module.params['pattern']))
    if not module.check_mode:
        os.makedirs(module.params['dest_base'], exist_ok=True)
        for path in matched:
            if os.path.isfile(path):
                shutil.copy2(path, module.params['dest_base'])
    module.exit_json(changed=bool(matched), files=matched)

if __name__ == '__main__':
    main()

A few practices worth following (the sketch after this list combines them):

  • Always include inventory_hostname in destination paths to avoid conflicts between hosts
  • fetch has no mode parameter; adjust permissions afterwards on the controller if needed
  • For large transfers, use throttle to limit how many hosts transfer at once
  • Implement proper error handling with fail_on_missing, ignore_errors, and failed_when
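
A minimal sketch combining these, reusing the found_files result registered by the earlier find task:

- name: Fetch matched files defensively
  ansible.builtin.fetch:
    src: "{{ item.path }}"
    dest: "/backup/logs/{{ inventory_hostname }}/"
    flat: yes
    fail_on_missing: no      # skip files deleted since the find ran
    validate_checksum: yes   # verify each transfer (the module default)
  loop: "{{ found_files.files }}"
  throttle: 5                # at most five hosts transfer at once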

When dealing with hundreds of servers, discover the remote files with find first; a fileglob lookup would match files on the controller, not on the remote hosts:

- name: Find archives on each host
  ansible.builtin.find:
    paths: /data/
    patterns: "*.tar.gz"
  register: data_archives

- name: Optimized batch fetch
  ansible.builtin.fetch:
    src: "{{ item.path }}"
    dest: "/archive/{{ inventory_hostname }}/"
    flat: yes
  loop: "{{ data_archives.files }}"
  throttle: 10
  become: false  # reduces privilege escalation overhead
  vars:
    ansible_ssh_pipelining: true
    ansible_scp_if_ssh: true  # deprecated in newer releases in favor of ansible_ssh_transfer_method

The same find + fetch pattern scales up into a complete playbook. While fetch works perfectly for single files, combining it with find handles multiple files or entire directory trees end to end:


- name: Gather multiple files using find + fetch combo
  hosts: webservers
  tasks:
    - name: Find all .log files in /var/log/app/
      ansible.builtin.find:
        paths: /var/log/app/
        patterns: "*.log"
        recurse: yes
      register: found_files

    - name: Fetch each found file
      ansible.builtin.fetch:
        src: "{{ item.path }}"
        dest: "/tmp/ansible_fetched/{{ inventory_hostname }}/"
        flat: yes  # with recurse, same-named files in different dirs will overwrite each other
      loop: "{{ found_files.files }}"

The solution combines two powerful Ansible modules:

  • find: Recursively locates files matching specific patterns
  • fetch: Transfers files while maintaining host-based directory structure

For more complex file selection criteria:


- name: Find backups modified in the last 24 hours and larger than 1 MB
  ansible.builtin.find:
    paths: /opt/backups/
    age: "-1d"   # negative age selects files newer than the given interval
    size: "1m"   # at least 1 MB; the find module accepts a leading "-" but no "+"
    file_type: file
  register: recent_backups
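
The registered list feeds the same fetch loop as before:

- name: Fetch the recent backups
  ansible.builtin.fetch:
    src: "{{ item.path }}"
    dest: "/collected/{{ inventory_hostname }}/"
    flat: yes
  loop: "{{ recent_backups.files }}"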

When dealing with hundreds of files across dozens of servers:

  • Use the throttle keyword to limit how many hosts transfer concurrently
  • Consider async mode for long-running remote-side steps such as archiving
  • Use serial at the play level in resource-constrained environments (see the sketch after this list)
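
A minimal play-level sketch of serial batching; the host group and batch size here are illustrative:

- name: Collect logs in small host batches
  hosts: webservers
  serial: 5  # only five hosts run the play at a time
  tasks:
    - name: Fetch application error log
      ansible.builtin.fetch:
        src: /var/log/app/error.log
        dest: "/collected/{{ inventory_hostname }}/"
        flat: yes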

Make your playbook resilient with proper error handling:


- name: Safely fetch files with error tolerance
  ansible.builtin.fetch:
    src: "{{ item.path }}"
    dest: "/collected/{{ inventory_hostname }}/"
    flat: yes
  loop: "{{ found_files.files }}"
  ignore_errors: yes              # keep going if a single file fails
  when: found_files.matched > 0   # skip the task entirely when nothing matched

For collecting Apache access logs from a web server cluster:


- name: Collect rotated access logs
  ansible.builtin.find:
    paths: /var/log/httpd/
    patterns: "access.log*"
  register: apache_logs

- name: Fetch logs with date-based organization (ansible_date_time requires gathered facts)
  ansible.builtin.fetch:
    src: "{{ item.path }}"
    dest: "/analytics/logs/{{ inventory_hostname }}/{{ ansible_date_time.date }}/"
    flat: yes  # without flat, fetch would append the hostname and full remote path again
  loop: "{{ apache_logs.files }}"