When managing web filtering whitelists in SquidGuard, you might encounter a frustrating behavior: if both example.com and www.example.com exist in your whitelist, SquidGuard will only honor the more specific www.example.com entry. This becomes problematic when you need to whitelist entire domains.
Sorting domains by their labels from right to left (TLD first) helps you visually identify and merge overlapping entries, because each domain then sorts immediately before its subdomains. For example:
original order:
www.example.com
example.com
reverse-sorted order:
example.com
www.example.com
The sorted version makes it immediately obvious that example.com already covers www.example.com.
Here are several ways to achieve this sorting:
Using Python
def reverse_domain_sort(domains):
    # Compare domains label by label, starting from the TLD; lowercase
    # first because DNS names are case-insensitive
    return sorted(domains,
                  key=lambda x: list(reversed(x.lower().split('.'))))

domains = [
    "www.activityvillage.co.uk",
    "ajax.googleapis.com",
    # ... other domains ...
]
sorted_domains = reverse_domain_sort(domains)
print('\n'.join(sorted_domains))
Using AWK (Unix/Linux)
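awk '{
    # split() returns the number of labels; length() on an array is not POSIX
    n = split($0, arr, ".");
    for (i = n; i >= 1; i--) {
        printf "%s", arr[i];
        if (i > 1) printf ".";
    }
    # Emit "reversed<TAB>original" so sort orders lines by the reversed form
    printf "\t%s\n", $0;
}' domains.txt | LC_ALL=C sort | cut -f2   # LC_ALL=C: compare bytes, not locale collation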
Using PowerShell (Windows)
$domains = Get-Content .\domains.txt
$domains | Sort-Object {
    # Sort key: the labels in reverse order, e.g. uk.co.bbc for bbc.co.uk
    $parts = $_.Split('.')
    [array]::Reverse($parts)
    $parts -join '.'
}
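Consider these special cases in your implementation:
- Internationalized domain names (IDNs), which can appear in either Unicode or Punycode (xn--) form; convert them to one form before sorting
- Fully qualified domains with a trailing dot (example.com.), which leave an empty label after splitting on dots
- Letter case: DNS names are case-insensitive, so lowercase entries before comparing them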
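One way to cover these cases is to normalize each entry before sorting. This is a sketch, not SquidGuard behavior: `normalize_domain` and `sort_key` are hypothetical names, and the code relies on Python's built-in `idna` codec, which implements IDNA 2003 (stricter IDNA 2008 handling is available from the third-party `idna` package).

```python
def normalize_domain(raw):
    """Hypothetical helper: canonicalize one whitelist entry before sorting.

    Strips whitespace and trailing dots, converts Unicode labels to
    Punycode with Python's built-in IDNA 2003 codec, and lowercases.
    Returns None for blank or malformed entries.
    """
    domain = raw.strip().rstrip('.')
    if not domain:
        return None
    try:
        ascii_form = domain.encode('idna').decode('ascii')
    except UnicodeError:
        # Empty labels (a..b) or labels longer than 63 characters
        return None
    # The codec passes pure-ASCII names through unchanged, so lowercase here
    return ascii_form.lower()


def sort_key(domain):
    # Reverse the labels so the TLD is compared first
    return domain.split('.')[::-1]


entries = ["WWW.Example.com.", "bücher.de", "example.com", "Example.com.", "   "]
# The set removes entries that only differed in case or trailing dot
cleaned = sorted({d for d in map(normalize_domain, entries) if d}, key=sort_key)
print('\n'.join(cleaned))
```

This prints example.com, www.example.com and xn--bcher-kva.de: the two spellings of example.com collapse into one entry, and the blank line is dropped.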
After sorting, you can more easily:
- Identify redundant entries
- Merge overlapping domains
- Spot potential conflicts
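The first two tasks can also be automated: an entry is redundant whenever one of its parent domains (a suffix made of whole labels, so notexample.com is not treated as a child of example.com) is also listed. A minimal sketch, assuming entries are already lowercased and have no trailing dots; `drop_redundant` is a hypothetical helper and the whitelist is illustrative.

```python
def drop_redundant(domains):
    """Hypothetical helper: keep only entries with no listed parent domain.

    www.example.com is dropped when example.com (or any other suffix made
    of whole labels) is also in the list. Exact duplicates are collapsed.
    """
    listed = set(domains)
    kept = []
    for domain in dict.fromkeys(domains):  # dedupe, preserving input order
        labels = domain.split('.')
        # Every proper parent: example.com, then com, for www.example.com
        parents = ('.'.join(labels[i:]) for i in range(1, len(labels)))
        if not any(p in listed for p in parents):
            kept.append(domain)
    return kept


whitelist = ["example.com", "www.example.com", "sub.www.example.com", "bbc.co.uk"]
print(drop_redundant(whitelist))
```

Here www.example.com and sub.www.example.com are dropped because example.com covers them, leaving ['example.com', 'bbc.co.uk'].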
For example, after sorting you might find:
domain.com
sub.domain.com
This clearly shows that domain.com already covers sub.domain.com, along with every other subdomain.
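For large whitelists (10,000+ domains):
- Python's sorted() works entirely in memory and is generally fast enough at this scale
- AWK handles medium files well, and the sort it pipes into can fall back to temporary files when the input exceeds memory (GNU sort does this)
- PowerShell's Sort-Object collects all input before sorting and may be slower for very large files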
When managing SquidGuard whitelists, we often encounter situations where domain entries like example.com and www.example.com coexist. Due to SquidGuard's matching behavior, this can lead to unexpected access restrictions. Sorting domains by their components in reverse order (from TLD to subdomain) groups each domain with its subdomains, so these overlaps are easy to find and remove.
Consider a whitelist containing domains such as:
www.activityvillage.co.uk
ajax.googleapis.com
akhet.co.uk
When sorted traditionally (left-to-right), we get alphabetical ordering that doesn't reflect the actual domain hierarchy. What we need is:
chrome.angrybirds.com
crl.godaddy.com
ajax.googleapis.com
www.activityvillage.co.uk
akhet.co.uk
bbc.co.uk
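To see the difference, this sketch sorts the six domains above both ways; `reversed_labels` is an illustrative key function (the same idea as the implementations that follow), not part of any library.

```python
def reversed_labels(domain):
    # 'www.activityvillage.co.uk' -> ['uk', 'co', 'activityvillage', 'www']
    return domain.lower().split('.')[::-1]


domains = [
    "www.activityvillage.co.uk", "ajax.googleapis.com", "akhet.co.uk",
    "bbc.co.uk", "crl.godaddy.com", "chrome.angrybirds.com",
]

# Plain left-to-right alphabetical order: .com and .co.uk entries interleave
print(sorted(domains))
# Reversed-label order: grouped by TLD, then by registered domain
print(sorted(domains, key=reversed_labels))
```

The second line reproduces the order listed above, with all .com entries first and all .co.uk entries together.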
Here are three practical solutions:
1. Using awk for Quick Sorting
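awk -F. '{
    # Print the labels right to left: bbc.co.uk -> uk.co.bbc
    printf "%s", $NF;
    for (i = NF - 1; i >= 1; i--) {
        printf ".%s", $i
    }
    # Append the original domain after a tab so cut can recover it
    printf "\t%s\n", $0
}' domains.txt | LC_ALL=C sort | cut -f2   # LC_ALL=C: compare bytes, not locale collation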
2. Python Implementation
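def reverse_domain(domain):
    # Labels from TLD to subdomain, lowercased: bbc.co.uk -> ['uk', 'co', 'bbc'].
    # Comparing lists (not a re-joined string) keeps every subdomain directly
    # after its parent, even when a sibling domain contains a hyphen.
    return domain.lower().split('.')[::-1]

with open('domains.txt') as f:
    # Skip blank lines
    domains = [line.strip() for line in f if line.strip()]

sorted_domains = sorted(domains, key=reverse_domain)
for domain in sorted_domains:
    print(domain)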
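A note on the sort key: re-joining the reversed labels into a single string (com.example.www) has a subtle pitfall. In byte order `-` sorts before `.`, so a sibling such as example-foo.com can land between example.com and www.example.com and split that group. Comparing lists of labels avoids this. The awk pipelines build the same kind of joined string key, so they can show the same effect. The domains below are made up for illustration, and `joined_key` / `label_key` are illustrative names.

```python
def joined_key(domain):
    # Reversed labels joined back into one string: 'com.example.www'
    return '.'.join(reversed(domain.split('.')))


def label_key(domain):
    # Reversed labels kept as a list: ['com', 'example', 'www']
    return domain.split('.')[::-1]


domains = ["www.example.com", "example-foo.com", "example.com"]
# String key: example-foo.com lands inside the example.com group
print(sorted(domains, key=joined_key))
# List key: www.example.com stays directly after example.com
print(sorted(domains, key=label_key))
```

The first call yields ['example.com', 'example-foo.com', 'www.example.com']; the second yields ['example.com', 'www.example.com', 'example-foo.com'].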
3. PowerShell Solution
Get-Content .\domains.txt |
    ForEach-Object {
        $parts = $_.Split('.')
        [array]::Reverse($parts)
        New-Object PSObject -Property @{
            Original = $_
            Reversed = $parts -join '.'
        }
    } |
    Sort-Object -Property Reversed |
    Select-Object -ExpandProperty Original
After sorting, you can easily spot and remove redundant entries. For example:
# Before sorting
www.example.com
sub.www.example.com
example.com
# After sorting
example.com
www.example.com
sub.www.example.com
This visual grouping makes it obvious which domains might be causing conflicts in your whitelist.
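Because the reversed-label order places every subdomain right after its listed parent, conflicts can also be reported in one pass over the sorted list by keeping a stack of the current entry's listed ancestors. A sketch; `find_conflicts` is a hypothetical name and the entries are illustrative. Keys are tuples of reversed labels so duplicates can be removed with a set.

```python
def find_conflicts(domains):
    """Hypothetical helper: return (parent, child) pairs where a listed
    parent domain already covers a listed child domain."""
    conflicts = []
    stack = []  # reversed-label tuples of the current entry's listed ancestors
    keys = sorted({tuple(d.strip().lower().split('.')[::-1]) for d in domains})
    for key in keys:
        # Pop stack entries that are not a label prefix of the current key
        while stack and key[:len(stack[-1])] != stack[-1]:
            stack.pop()
        if stack:
            # The nearest listed ancestor covers this entry
            parent = '.'.join(reversed(stack[-1]))
            conflicts.append((parent, '.'.join(reversed(key))))
        stack.append(key)
    return conflicts


whitelist = ["sub.www.example.com", "www.example.com", "example.com",
             "example-foo.com", "bbc.co.uk"]
for parent, child in find_conflicts(whitelist):
    print(f"{child} is already covered by {parent}")
```

example-foo.com and bbc.co.uk are not reported, because no listed entry is a whole-label suffix of either.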
For large lists (100k+ domains), consider these optimizations:
- Use parallel processing in Python with multiprocessing
- For extremely large files, implement external sorting
- Add validation to skip malformed domains
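For the validation step, one option is to check RFC 1123 hostname syntax: each label is 1 to 63 ASCII letters, digits, or hyphens and does not start or end with a hyphen, and the whole name is at most 253 characters. This sketch assumes IDNs are already in Punycode form; `is_valid_domain` is hypothetical, and it rejects underscores, which some real-world names use, so it may be stricter than you need.

```python
import re

# One DNS label: 1-63 letters, digits, or hyphens, no leading/trailing hyphen
LABEL = re.compile(r'[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?', re.IGNORECASE)


def is_valid_domain(domain):
    """Hypothetical check based on RFC 1123 hostname syntax (ASCII only)."""
    domain = domain.rstrip('.')  # tolerate a fully qualified trailing dot
    if not domain or len(domain) > 253:
        return False
    # fullmatch so that stray characters such as a trailing newline fail
    return all(LABEL.fullmatch(label) for label in domain.split('.'))


lines = ["bbc.co.uk", "-bad-.example.com", "a..b",
         "xn--bcher-kva.de", "under_score.com", ""]
accepted = [d for d in lines if is_valid_domain(d)]
print(accepted)
```

Only bbc.co.uk and xn--bcher-kva.de pass; the others fail on a leading hyphen, an empty label, an underscore, or being blank.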
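Here's a parallel Python version for large datasets. Because the per-domain work is tiny, process startup and data transfer can outweigh the gain, so time it against a plain sorted() call before adopting it: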
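import concurrent.futures

def process_domain(line):
    # Return (sort_key, domain), or None so blank lines can be filtered out
    domain = line.strip()
    if not domain:
        return None
    return (domain.lower().split('.')[::-1], domain)

if __name__ == '__main__':
    with open('large_domains.txt') as f:
        # Processes, not threads: this work is CPU-bound, so threads would be
        # serialized by the GIL. A large chunksize reduces inter-process overhead.
        with concurrent.futures.ProcessPoolExecutor() as executor:
            results = [r for r in executor.map(process_domain, f, chunksize=10000) if r]
    sorted_domains = [d for _, d in sorted(results, key=lambda x: x[0])]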
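For the external-sorting option, one standard approach is a two-phase merge sort: sort fixed-size chunks in memory, write each to a temporary file, then merge the sorted files lazily with heapq.merge. A sketch, assuming one domain per line; `external_sort_domains`, its chunk size, and the file names are placeholders.

```python
import heapq
import itertools
import tempfile


def label_key(line):
    # Reversed, lowercased labels: 'www.bbc.co.uk\n' -> ['uk', 'co', 'bbc', 'www']
    return line.strip().lower().split('.')[::-1]


def external_sort_domains(src_path, dst_path, chunk_size=100_000):
    """Hypothetical helper: sort a large one-domain-per-line file with bounded memory.

    Phase 1 sorts chunk_size lines at a time into temporary files;
    phase 2 streams them through heapq.merge, which keeps only one
    pending line per chunk in memory.
    """
    chunk_files = []
    try:
        with open(src_path) as src:
            while True:
                lines = list(itertools.islice(src, chunk_size))
                if not lines:
                    break  # end of input
                # Normalize line endings and drop blank lines before sorting
                chunk = sorted((ln.strip() + '\n' for ln in lines if ln.strip()),
                               key=label_key)
                tmp = tempfile.TemporaryFile(mode='w+')
                tmp.writelines(chunk)
                tmp.seek(0)
                chunk_files.append(tmp)
        with open(dst_path, 'w') as dst:
            dst.writelines(heapq.merge(*chunk_files, key=label_key))
    finally:
        for tmp in chunk_files:
            tmp.close()


# Example call (file names are placeholders):
# external_sort_domains('large_domains.txt', 'large_domains.sorted.txt')
```

The key function is the same reversed-label list used earlier, so the merged output has the same grouping as an in-memory sort.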