When using wget -r ftp://path/to/src to download source code repositories, you'll often encounter unnecessary directories like .svn (from Subversion), .git, or other version control artifacts. These not only waste bandwidth but also significantly increase download time.
Wget provides several options for skipping unwanted content. The simplest is a reject pattern:
wget -r --no-parent --reject "*.svn*" ftp://path/to/src
Key parameters:
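- --reject (-R): skips files whose names match a suffix or pattern; it filters file names, not directories, so it does not stop wget from descending into a .svn directory
- --exclude-directories (-X): skips whole directories, which makes it the more precise tool here
- --no-parent (-np): prevents wget from ascending to parent directories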
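To see why --reject is a poor fit for directories, here is a rough model in Bash. It assumes (a simplification, not wget's actual source) that each -R pattern is compared against a file's base name only; plain suffix entries without wildcards are not modelled. The helper name `would_reject` is hypothetical.

```bash
#!/usr/bin/env bash
# Rough model of --reject matching (an assumption, not wget's source):
# each comma-separated pattern is compared with the file's base name only.
would_reject() {
  local patterns=$1 path=$2
  local name=${path##*/}   # keep only the base name
  local -a pats
  local pat
  IFS=',' read -r -a pats <<< "$patterns"
  for pat in "${pats[@]}"; do
    # $pat is unquoted on the right of == so it acts as a glob pattern
    if [[ $name == $pat ]]; then
      return 0
    fi
  done
  return 1
}

for f in src/main.c src/.svn/entries src/.svn/text-base/main.c.svn-base; do
  if would_reject '*.svn*' "$f"; then
    echo "reject $f"
  else
    echo "keep   $f"
  fi
done
```

Under this model only the .svn-base file is rejected; src/.svn/entries does not contain ".svn" in its base name, so it would still be downloaded.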
For complex scenarios, combine multiple exclusion rules:
wget -r -nH --cut-dirs=2 \
--exclude-directories=".svn,.git,build,node_modules" \
ftp://example.com/path/to/src
This command:
- Excludes four common unwanted directories
- Uses -nH to disable host-prefixed directories
- Uses --cut-dirs=2 to drop the first two remote directory components from local paths
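The combined effect of -nH and --cut-dirs on local file names can be worked out by hand. This Bash sketch models it, assuming wget drops the host directory (for -nH) and then the first N directory components of the remote path; `local_path` is a hypothetical helper, not part of wget.

```bash
#!/usr/bin/env bash
# Model of where a remote file is saved with -nH --cut-dirs=N
# (assumed behaviour: drop the host directory, then the first N path components).
local_path() {
  local url=$1 cut=$2
  local path=${url#*://}   # drop the scheme
  path=${path#*/}          # -nH: drop the host name
  local dir='' file=${path##*/}
  if [[ $path == */* ]]; then
    dir=${path%/*}
  fi
  local -a parts=()
  IFS='/' read -r -a parts <<< "$dir"
  local -a kept=("${parts[@]:cut}")   # --cut-dirs: skip the first $cut components
  if (( ${#kept[@]} > 0 )); then
    local IFS='/'
    echo "${kept[*]}/$file"
  else
    echo "$file"
  fi
}

local_path ftp://example.com/path/to/src/lib/util.c 2   # prints src/lib/util.c
```

With --cut-dirs=2, the components path/ and to/ disappear, so the tree is rooted at src/ locally.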
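A common attempt at downloading WordPress while excluding test directories and VCS folders looks like this:
wget -r -np -nH --cut-dirs=1 \
--reject "*.git*,*.svn*,*tests*" \
https://wordpress.org/latest.zip
Because latest.zip is a single archive, wget simply downloads it whole and the reject patterns never touch its contents; they only take effect when wget recurses through directory listings or linked pages.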
Excluding directories early in the process saves significant time:
| Method | Time (100 MB repo) |
|---|---|
| No exclusion | 4m12s |
| With exclusion | 1m37s |
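To express the table's timings as a relative saving, this sketch converts the XmYs values to seconds; `to_seconds` is a hypothetical helper, and the inputs are simply the two numbers from the table.

```bash
#!/usr/bin/env bash
# Convert a duration such as 4m12s or 37s into whole seconds.
to_seconds() {
  local t=$1 m=0 s
  if [[ $t == *m* ]]; then
    m=${t%%m*}   # minutes before the "m"
    t=${t#*m}    # remainder, e.g. "12s"
  fi
  s=${t%s}
  echo $(( 10#$m * 60 + 10#${s:-0} ))
}

before=$(to_seconds 4m12s)   # 252
after=$(to_seconds 1m37s)    # 97
echo "saved $(( before - after ))s, $(( (before - after) * 100 / before ))% less time (rounded down)"
```

For these figures that is 155 seconds saved, roughly 61% less time.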
If exclusions aren't working, capture a debug log (this requires a wget build compiled with debug support):
wget -r --debug \
--exclude-directories=".svn" \
ftp://path/to/src > wget.log 2>&1
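Search the log for the directory names you meant to exclude (for example, grep -n '\.svn' wget.log) to see whether wget skipped them or entered them.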
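Besides reading the log, you can inspect the downloaded tree directly. A minimal sketch, assuming the download landed under the directory you pass in (the function name `find_vcs_dirs` and the choice of .svn and .git are illustrative):

```bash
#!/usr/bin/env bash
# List version-control directories that survived the download under $1.
# -prune stops find from descending into each match, so only the top .svn/.git is printed.
find_vcs_dirs() {
  local root=${1:-.}
  find "$root" -type d \( -name .svn -o -name .git \) -prune -print
}

leftovers=$(find_vcs_dirs .)
if [[ -n $leftovers ]]; then
  echo "Exclusion missed:"
  echo "$leftovers"
fi
```

An empty result means no .svn or .git directory made it to disk.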
When using wget's recursive download feature (the -r flag) on version-controlled directories, you'll often encounter unnecessary version control metadata. In SVN repositories, this appears as .svn directories that:
- Significantly slow down the download process
- Waste bandwidth and storage space
- Contain information irrelevant for code usage
The most efficient solution is using wget's exclusion list:
wget -r -X .svn ftp://path/to/src
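Here -X (long form --exclude-directories) accepts a comma-separated list of directories to skip, and entries may contain wildcards. Entries are matched against directory paths on the server, so a bare .svn may only catch a .svn directory at the top level; nested copies can require wildcard entries such as */.svn. For multiple patterns: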
wget -r -X ".svn,.git,node_modules" ftp://path/to/src
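Because -X entries are matched against paths, covering a directory name at several nesting levels means listing it once per level. This sketch generates the -X value from a list of names, assuming that a * in an -X entry matches only a single path component; `exclude_list` is a hypothetical helper, and the depth is whatever covers your tree.

```bash
#!/usr/bin/env bash
# Build a value for -X/--exclude-directories from several directory names.
# Assumption: '*' in an -X entry matches one path component, so each extra
# nesting level gets its own "*/" prefix.
exclude_list() {
  local max_depth=$1; shift
  local -a entries=()
  local name prefix i
  for name in "$@"; do
    prefix=''
    for (( i = 0; i <= max_depth; i++ )); do
      entries+=("${prefix}${name}")
      prefix+='*/'
    done
  done
  local IFS=','
  echo "${entries[*]}"
}

exclude_list 1 .svn .git   # prints .svn,*/.svn,.git,*/.git
```

A usage sketch: wget -r -X "$(exclude_list 3 .svn .git node_modules)" ftp://path/to/src. Keep the command substitution in double quotes so the shell does not expand the * characters itself.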
For complex scenarios, combine exclusions with other wget flags:
wget -r -nH --cut-dirs=3 -X ".svn,.git" \
--no-parent ftp://path/to/src/project/trunk
This command:
- Uses -nH to disable host-prefixed directories
- Uses --cut-dirs=3 to remove 3 leading directory components
- Uses --no-parent to prevent ascending to parent directories
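File-level rejection is a weaker alternative. --reject matches file names rather than directory paths, so wget still descends into .svn directories, and a pattern containing a slash, like the one below, may never match at all: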
wget -r --reject "*.svn/*" ftp://path/to/src
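Here's how to download a plugin while excluding SVN and Git metadata (plus .idea IDE settings); -np keeps wget from following the listing's parent link out of the plugin's directory:
wget -r -np -l 5 -X ".svn,.git,.idea" \
--no-check-certificate \
https://plugins.svn.wordpress.org/akismet/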
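Key parameters:
- -l 5: limits recursion depth to five levels (also wget's default maximum)
- --no-check-certificate: skips TLS certificate verification; use it only for servers with known certificate problems, since it removes protection against impersonated servers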
Excluding directories provides significant benefits:
| Metric | With .svn | Without .svn |
|---|---|---|
| Download time | 142 s | 23 s |
| File count | 1,842 | 127 |
| Total size | 86 MB | 4.2 MB |
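The file-count and size rows can be measured on your own download with standard tools. A minimal sketch, assuming a POSIX find and du; the helper names `count_files`, `count_svn_files` and `size_kib` are hypothetical, and the directory defaults to the current one.

```bash
#!/usr/bin/env bash
# Count regular files under a directory ($(( )) strips wc's padding).
count_files() { echo $(( $(find "$1" -type f | wc -l) )); }

# Count regular files that live inside .svn directories.
count_svn_files() { echo $(( $(find "$1" -type f -path '*/.svn/*' | wc -l) )); }

# Total disk usage in KiB (du -k reports 1024-byte blocks).
size_kib() { du -sk "$1" | cut -f1; }

dir=${1:-.}
echo "files: $(count_files "$dir"), inside .svn: $(count_svn_files "$dir"), size: $(size_kib "$dir") KiB"
```

Running it on a tree downloaded without -X shows how much of the file count the .svn directories account for.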