Optimizing List Merging in Ansible: Techniques to Combine and Deduplicate Nested Lists



When working with complex data structures in Ansible, particularly nested lists containing duplicate elements, performance can become a significant concern. The common approach using with_subelements runs one loop iteration per subelement, so it is often slow on large datasets and processes identical elements over and over.

Consider this typical inventory structure:

my_list:
  - { name: foo, settings: ['x', 'y', 'z'] }
  - { name: bar, settings: ['x', 'y', 'q', 'w'] }

Using the basic extraction method:

- name: get all settings
  set_fact:
    all_settings: "{{ my_list|map(attribute='settings')|list }}"

This gives us nested lists rather than the desired flattened, deduplicated result.

To efficiently combine and deduplicate these lists, we can chain Jinja2's map with Ansible's flatten and unique filters (flatten requires Ansible 2.5 or later):

- name: combine and deduplicate settings
  set_fact:
    unique_settings: "{{ my_list|map(attribute='settings')|flatten|unique|list }}"

Let's break down the filter sequence:

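  1. map(attribute='settings'): Extracts each item's settings list, yielding a lazy sequence of lists
  2. flatten: Combines the nested lists into a single list, recursing through every nesting level by default
  3. unique: Removes duplicate elements while keeping the order of first appearance
  4. list: Converts the result to a plain list; unique already returns one, so this final step is optional but harmless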
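
To make the steps above concrete, the following sketch prints the intermediate value after each filter. It assumes the my_list variable from the example is in scope; the task name and the keys under msg are illustrative only.

```yaml
- name: show each stage of the filter chain
  ansible.builtin.debug:
    msg:
      # [['x', 'y', 'z'], ['x', 'y', 'q', 'w']]
      extracted: "{{ my_list | map(attribute='settings') | list }}"
      # ['x', 'y', 'z', 'x', 'y', 'q', 'w']
      flattened: "{{ my_list | map(attribute='settings') | flatten }}"
      # ['x', 'y', 'z', 'q', 'w']
      deduplicated: "{{ my_list | map(attribute='settings') | flatten | unique }}"
```

The explicit list after map is only needed when the raw result is displayed, because map returns a generator.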

This approach significantly outperforms with_subelements for several reasons:

  • The whole expression is evaluated in a single task instead of one loop iteration per subelement
  • Deduplication is one unique pass over the flattened list
  • Duplicate elements are dropped before any later task has to process them
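
For contrast, here is a rough sketch of the loop-based pattern being replaced, written with loop and the subelements filter (the modern spelling of with_subelements). The fact name looped_settings is hypothetical, and the task assumes the my_list variable from the example.

```yaml
# Sketch only: accumulates the result one subelement at a time.
- name: collect settings one subelement at a time
  ansible.builtin.set_fact:
    # item.0 is the parent dictionary, item.1 is a single entry of its settings list
    looped_settings: "{{ (looped_settings | default([]) + [item.1]) | unique }}"
  loop: "{{ my_list | subelements('settings') }}"
```

With the two-item example this task runs seven iterations (three for foo, four for bar), and each iteration re-evaluates the template, which is where the cost grows on large inputs.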

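For more complex scenarios where settings might be deeply nested, flatten recurses through every nesting level by default, so the same chain still works. The extraction step can also be written with json_query (from the community.general collection):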

- name: handle nested structures
  set_fact:
    all_settings: "{{ my_list|json_query('[].settings')|flatten|unique|list }}"
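
As a small illustration of how flatten treats deeper nesting, the sketch below compares the default behavior with the levels argument. The variable nested_settings is a made-up example value, not part of the original data.

```yaml
- name: compare full and single-level flattening
  vars:
    # Hypothetical value with two extra levels of nesting
    nested_settings: ['x', ['y', ['z', 'x']]]
  ansible.builtin.debug:
    msg:
      # ['x', 'y', 'z']
      all_levels: "{{ nested_settings | flatten | unique }}"
      # ['x', 'y', ['z', 'x']]
      one_level: "{{ nested_settings | flatten(levels=1) }}"
```

Limiting levels is useful when inner lists are meaningful units that should survive the merge.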

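If you have the community.general collection installed (8.4.0 or later), lists_union can flatten and deduplicate in one step. Note that lists_mergeby merges lists of dictionaries by a key attribute, so it does not apply to plain lists of values:

- name: using community.general filters
  set_fact:
    unique_settings: "{{ my_list|map(attribute='settings')|list|community.general.lists_union(flatten=true) }}"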

Verify the output with:

- name: display results
  debug:
    var: unique_settings

This should output the desired ['x', 'y', 'z', 'q', 'w'] structure.


When working with Ansible inventories containing nested data structures, we often need to extract and combine elements from multiple lists. The challenge intensifies when dealing with large datasets where performance becomes critical, especially when duplicate elements exist that don't require repeated processing.

Given our example inventory:

my_list:
  - { name: foo, settings: ['x', 'y', 'z'] }
  - { name: bar, settings: ['x', 'y', 'q', 'w'] }

We want to transform this into a single list containing unique elements: ['x', 'y', 'z', 'q', 'w'].

The straightforward method using map(attribute='settings') gives us nested lists:

- name: get all settings
  set_fact:
    all_settings: "{{ my_list|map(attribute='settings')|list }}"

This creates the intermediary structure we need to process further.

An efficient way to combine and deduplicate these lists is to chain Jinja2's map with Ansible's flatten and unique filters:

- name: combine and deduplicate settings
  set_fact:
    unique_settings: "{{ my_list|map(attribute='settings')|flatten|unique|list }}"
  
- name: display final result
  debug:
    var: unique_settings

Let's examine each filter in the solution:

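  1. map(attribute='settings'): Extracts the settings list from each item
  2. flatten: Combines nested lists into a single list
  3. unique: Removes duplicate elements, keeping the first occurrence of each
  4. list: Converts the result to a plain list (optional here, since unique already returns one)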

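This approach is significantly more efficient than with_subelements for several reasons:

  • Processes the data in one task rather than one loop iteration per subelement
  • Runs entirely inside a single template evaluation, using filters implemented in Python
  • Avoids per-iteration task overhead and repeatedly rebuilt intermediate facts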

As an alternative, JMESPath's flatten operator can do the extraction and the flattening in a single query:

- name: advanced list processing
  set_fact:
    enhanced_result: "{{ my_list|json_query('[].settings[]')|unique }}"

The json_query filter ships in the community.general collection and requires the jmespath Python library on the controller.
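
To see what the trailing [] in the JMESPath expression contributes, this sketch runs the query with and without it. It assumes the my_list variable from the example and that community.general and jmespath are installed; the keys under msg are illustrative.

```yaml
- name: compare JMESPath projections
  ansible.builtin.debug:
    msg:
      # [['x', 'y', 'z'], ['x', 'y', 'q', 'w']]
      nested: "{{ my_list | community.general.json_query('[].settings') }}"
      # ['x', 'y', 'z', 'x', 'y', 'q', 'w']
      flat: "{{ my_list | community.general.json_query('[].settings[]') }}"
```

The JMESPath flatten operator removes only one level of nesting per [], unlike Ansible's flatten filter, which recurses by default.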

Always verify your results with test cases:

- name: validate unique settings
  assert:
    that:
      - "'x' in unique_settings"
      - "'w' in unique_settings"
      - "unique_settings|length == 5"
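
Putting the pieces together, a minimal self-contained playbook might look like the sketch below. The play name and the choice of localhost are assumptions made for illustration; the data and filter chain are the ones used throughout this article.

```yaml
- name: merge and deduplicate nested lists
  hosts: localhost
  gather_facts: false
  vars:
    my_list:
      - { name: foo, settings: ['x', 'y', 'z'] }
      - { name: bar, settings: ['x', 'y', 'q', 'w'] }
  tasks:
    - name: combine and deduplicate settings
      ansible.builtin.set_fact:
        unique_settings: "{{ my_list | map(attribute='settings') | flatten | unique }}"

    - name: check both contents and order
      ansible.builtin.assert:
        that:
          - unique_settings == ['x', 'y', 'z', 'q', 'w']
```

Comparing against the full expected list is stricter than checking the length and individual members, since it also confirms that order of first appearance is preserved.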

This technique is particularly useful when:

  • Generating configuration files from multiple sources
  • Creating consolidated reports from inventory data
  • Preparing input for other tasks that require unique values