Many developers encounter situations where they need to process spreadsheet files on headless servers. The specific case of converting Excel XML (SpreadsheetML) to CSV is particularly tricky because:
- Most command-line tools fail to properly handle Excel XML format
- LibreOffice/OpenOffice conversions often alter data formats
- Python libraries like pandas have limited XML Spreadsheet support
Gnumeric's ssconvert handles this conversion well because it ships with both a SpreadsheetML importer and a CSV exporter. You can confirm what your build supports:
ssconvert --list-importers
# Should include an entry for Excel 2003 XML (SpreadsheetML) input
ssconvert --list-exporters
# Should include Gnumeric_stf:stf_csv (for CSV output)
For Ubuntu 12.04 (Precise Pangolin), here's how to install just the essentials:
sudo apt-get update
sudo apt-get install --no-install-recommends gnumeric-common
Then force-install the binary package:
sudo apt-get install --no-install-recommends gnumeric --force-yes
Check if ssconvert works without GUI dependencies:
ssconvert --version
# Should output version info without errors
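Convert Excel XML to CSV with explicit export options:
ssconvert -S --export-type=Gnumeric_stf:stf_assistant \
  --export-options="separator=, eol=unix" \
  input.xml output.csv
Key parameters explained:
- -S: write one output file per sheet; the output name is treated as a template in which %n is replaced by the sheet number and %s by the sheet name
- --export-type=Gnumeric_stf:stf_assistant: the configurable text exporter; the ssconvert manual documents the text export options for this exporter rather than for Gnumeric_stf:stf_csv
- separator=,: force a comma delimiter
- eol=unix: UNIX line endings
- quote: left at the exporter's default (the double quote); an unbalanced quote=\" inside the option string can be read as an unterminated quoted value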
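If you call ssconvert from a script, the same options can be assembled as an argument list so that no shell quoting is involved. This is only a minimal sketch: `build_ssconvert_command` is a hypothetical helper name, and it assumes the `Gnumeric_stf:stf_assistant` exporter, which is the one the ssconvert manual lists as accepting `separator` and `eol`.

```python
def build_ssconvert_command(input_path, output_path,
                            separator=",", eol="unix", per_sheet=False):
    """Return the argv list for an ssconvert text export (nothing is executed).

    Hypothetical helper: the exporter ID below is an assumption based on the
    ssconvert manual, not something verified against every Gnumeric release.
    """
    options = f"separator={separator} eol={eol}"
    cmd = ["ssconvert"]
    if per_sheet:
        cmd.append("-S")  # one output file per sheet
    cmd += [
        "--export-type=Gnumeric_stf:stf_assistant",
        f"--export-options={options}",
        input_path,
        output_path,
    ]
    return cmd
```

The returned list can be handed straight to `subprocess.run(...)`; because it is a list rather than a shell string, file names with spaces need no extra escaping.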
If you encounter library errors, install these additional packages:
sudo apt-get install libgoffice-0.8-8 libgsf-1-114
For missing locale warnings (common on servers):
export LC_ALL=C
ssconvert yourfile.xml output.csv
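When ssconvert is launched from Python rather than an interactive shell, the locale override can be scoped to the child process instead of being exported globally. A small sketch, assuming a hypothetical helper named `c_locale_env`:

```python
import os


def c_locale_env(base_env=None):
    """Return a copy of the environment with LC_ALL forced to C.

    The mapping passed in (or os.environ by default) is not modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["LC_ALL"] = "C"
    return env
```

Usage would look like `subprocess.run(["ssconvert", "yourfile.xml", "output.csv"], env=c_locale_env())`, which leaves the locale of the calling process untouched.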
For cleaner isolation (especially on newer Ubuntu versions), run the conversion in a container:
docker run --rm -v "$(pwd)":/data ubuntu:12.04 \
  bash -c "apt-get update && \
    apt-get install -y gnumeric && \
    ssconvert /data/input.xml /data/output.csv"
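To process many files in bulk, wrap ssconvert in a script:
#!/bin/bash
# Convert every .xml file in the current directory to a .csv with the same base name
shopt -s nullglob  # skip the loop entirely when no .xml files match
for xml_file in *.xml; do
    base_name="${xml_file%.*}"
    ssconvert "$xml_file" "${base_name}.csv"
done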
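The same loop can be sketched in Python if you want to collect failures instead of stopping at the first one. The function names `plan_conversions` and `convert_all` are made up for this example, and it assumes ssconvert is on the PATH when `convert_all` is called.

```python
import subprocess
from pathlib import Path


def plan_conversions(directory, pattern="*.xml"):
    """Return sorted (source, target) pairs; the target swaps the suffix for .csv."""
    return [(src, src.with_suffix(".csv"))
            for src in sorted(Path(directory).glob(pattern))]


def convert_all(directory):
    """Run ssconvert on every planned pair and return a list of (source, stderr) failures."""
    failures = []
    for src, dst in plan_conversions(directory):
        result = subprocess.run(["ssconvert", str(src), str(dst)],
                                capture_output=True, text=True)
        if result.returncode != 0:
            failures.append((src, result.stderr.strip()))
    return failures
```

Keeping the planning step separate makes it possible to check the file name mapping (for example `report.xml` to `report.csv`) without running any conversion.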
When working with financial data pipelines on Ubuntu servers, we often encounter Excel XML files that need conversion to CSV format. While tools like libreoffice --headless exist, they struggle with the Excel XML (SpreadsheetML) format specifically. Gnumeric's ssconvert handles this conversion reliably, but installing it on headless servers is awkward because Gnumeric is a GNOME desktop application and its package pulls in desktop dependencies by default.
For Ubuntu 12.04 LTS (Precise Pangolin), the solution lies in installing only the required dependencies without pulling in the entire GNOME stack:
sudo apt-get update
sudo apt-get install --no-install-recommends gnumeric
The --no-install-recommends flag is crucial here: it prevents installation of recommended packages like the GNOME desktop environment while still pulling in essential dependencies.
After installation, verify ssconvert works in headless mode:
ssconvert --version
# Expected output: version information, e.g. 1.10.17
Here are some common conversion scenarios with actual usage examples:
# Basic Excel XML to CSV conversion
ssconvert input.xml output.csv
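# Convert a specific sheet (Sheet2) to CSV; the sheet option belongs to the
# stf_assistant text exporter and may not be available in older Gnumeric releases
ssconvert --export-type=Gnumeric_stf:stf_assistant --export-options "sheet=Sheet2" input.xml sheet2_output.csv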
# Batch conversion in a directory (report.xml becomes report.csv, not report.xml.csv)
find /data/exports -name "*.xml" -exec sh -c 'ssconvert "$1" "${1%.xml}.csv"' _ {} \;
For production environments, consider this Python wrapper that adds error handling:
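# Requires Python 3.6+ (subprocess.run and f-strings)
import subprocess

def convert_xml_to_csv(input_path, output_path):
    """Run ssconvert and return True on success, False if it exits with an error."""
    try:
        subprocess.run(
            ['ssconvert', input_path, output_path],
            check=True,
            stderr=subprocess.PIPE
        )
        return True
    except subprocess.CalledProcessError as e:
        # ssconvert writes its diagnostics to stderr
        print(f"Conversion failed: {e.stderr.decode()}")
        return False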
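The wrapper above can still raise if ssconvert is not installed, and it will wait indefinitely on a hung conversion. One possible variant, sketched here under the assumption that a 60-second limit is acceptable, returns a status and a message instead of printing; `convert_with_diagnostics` and its `binary` and `timeout` parameters are illustrative names, not part of any existing API.

```python
import shutil
import subprocess


def convert_with_diagnostics(input_path, output_path,
                             binary="ssconvert", timeout=60):
    """Return (ok, message) so callers can log or retry as they see fit."""
    # Fail early with a clear message when the converter is not installed.
    if shutil.which(binary) is None:
        return False, f"{binary} not found on PATH"
    try:
        subprocess.run([binary, input_path, output_path],
                       check=True, capture_output=True, text=True,
                       timeout=timeout)
    except subprocess.CalledProcessError as exc:
        detail = (exc.stderr or "").strip()
        return False, detail or f"exit status {exc.returncode}"
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout} s"
    return True, ""
```

A caller might use it as `ok, msg = convert_with_diagnostics("input.xml", "output.csv")` and write `msg` to its own log when `ok` is false.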
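If you encounter "cannot open display" errors despite being on a headless server, something in the process is trying to reach an X server. Setting DISPLAY only helps if an X server (or a virtual one such as Xvfb) is actually listening on that display:
export DISPLAY=:0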
For more complex Excel XML files, you might need additional dependencies:
sudo apt-get install libxml-libxml-perl libarchive-zip-perl