Efficiently Retrieve Last Modified Dates for All Files in a Git Repository



When working with version control, developers often need to analyze file modification patterns across an entire codebase. While checking a single file's last modified date is straightforward, doing this at scale requires a more sophisticated approach.

For individual files, we typically use:

git log -1 --format="%ad" -- path/to/file

This outputs the author date (%ad) of the most recent commit (-1) affecting the specified file.
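The %ad placeholder honors Git's --date option, so the same query can return a fixed format that is easier to parse than the default. A minimal sketch, assuming a hypothetical helper named last_modified (the name is illustrative, not part of Git):

```shell
# last_modified PATH [DATE_FORMAT]
# Print the author date of the newest commit touching PATH.
# DATE_FORMAT is handed to git's --date option (default: iso-strict).
last_modified() {
    git log -1 --format=%ad --date="${2:-iso-strict}" -- "$1"
}
```

Calling last_modified path/to/file short would print only the calendar date (YYYY-MM-DD), while the default prints a strict ISO 8601 timestamp.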

To process all files in the repository efficiently, we can combine Git commands with shell scripting:

git ls-files | while IFS= read -r file; do
    printf "%-40s %s\n" "$file" "$(git log -1 --format="%ad" -- "$file")"
done

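For output that is easier to parse (tab-separated, with ISO 8601-like dates) and that handles unusual filenames safely, pass each path as a positional argument instead of splicing it into the script text:

git ls-files -z | xargs -0 -n1 sh -c '
    printf "%s\t%s\n" "$1" "$(git log -1 --format="%ai" -- "$1")"
' _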

For large repositories, consider these optimizations:

  • Limit to specific file types: git ls-files '*.js'
  • Process in parallel: parallel -j4 with GNU Parallel
  • Cache results for repeated queries
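The first two optimizations can be combined in one pipeline. The sketch below uses xargs -P (available in GNU and BSD xargs, though not required by POSIX) as a stand-in for GNU Parallel; the function name dates_for_pattern and the job count of 4 are illustrative assumptions:

```shell
# dates_for_pattern PATHSPEC
# Print "<path> TAB <date>" for tracked files matching PATHSPEC,
# running up to 4 git log queries at a time. Output order is not stable.
dates_for_pattern() {
    git ls-files -z -- "$1" |
    xargs -0 -n1 -P4 sh -c 'printf "%s\t%s\n" "$1" "$(git log -1 --format=%ai -- "$1")"' _
}
```

For example, dates_for_pattern '*.js' | sort would list every tracked JavaScript file with its last commit date. Because jobs finish in arbitrary order, pipe the result through sort when a stable listing matters.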

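For the best performance, avoid one git log call per file and walk the history once instead. Each path is printed the first time it appears, which is its most recent commit; paths that have since been deleted are listed too:

git log --format='%x09%ai' --name-only |
awk '/^\t/ { date = substr($0, 2); next }       # commit line (tab-prefixed): remember its date
     NF && !seen[$0]++ { print $0 "\t" date }'  # first sighting of a path is its newest commit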

This technique is useful for:

  • Codebase archaeology and change analysis
  • Build system optimizations
  • Documentation generation
  • Cache invalidation strategies


A more comprehensive solution that also prints the abbreviated commit hash and a relative date:

git ls-files -z | xargs -0 -n1 sh -c 'printf "%s\t%s\n" "$(git log -1 --format="%h%x09%ad%x09%ar" -- "$1")" "$1"' _

To sort files by modification date (newest first):

git ls-files | while IFS= read -r file; do
    # One git log call per file: "<unix timestamp> TAB <date>", then the path
    printf "%s\t%s\n" "$(git log -1 --format="%ct%x09%ad" -- "$file")" "$file"
done | sort -nr | cut -f2-
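The same timestamp column can also drive a filter rather than a sort. As a rough sketch, the hypothetical helper below (stale_files is an illustrative name) lists files whose newest commit is older than a given number of days, using the committer timestamp %ct:

```shell
# stale_files DAYS
# List tracked files whose newest commit is older than DAYS days, oldest first.
stale_files() {
    cutoff=$(( $(date +%s) - $1 * 86400 ))
    git ls-files | while IFS= read -r file; do
        printf '%s\t%s\n' "$(git log -1 --format=%ct -- "$file")" "$file"
    done | awk -F'\t' -v cutoff="$cutoff" '$1 < cutoff' | sort -n | cut -f2-
}
```

Running stale_files 365 would print the files that have not been touched by a commit in the past year.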

For repositories with non-ASCII filenames or special characters, read the NUL-delimited path list directly (this loop needs bash for read -d):

git ls-files -z | while IFS= read -r -d '' file; do
    printf '%s\t%s\n' "$file" "$(git log -1 --format="%ad" -- "$file")"
done
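The -z flag matters because, without it, git ls-files wraps paths containing unusual bytes in double quotes with C-style escapes (behavior governed by core.quotePath), and those quoted strings are not valid arguments for git log. A small sketch, with a hypothetical helper name, that prints the raw paths one per line:

```shell
# raw_paths
# Print tracked paths exactly as stored, one per line.
# Assumes no path contains a newline; consume the NUL-delimited
# stream directly if that cannot be guaranteed.
raw_paths() {
    git ls-files -z | tr '\0' '\n'
}
```

Comparing raw_paths with plain git ls-files in a repository that contains an accented filename shows the difference: the former prints the name as-is, the latter prints an escaped, quoted form.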

For better performance in large repositories, read the history once and keep the first (newest) date seen for each path that is still tracked. This needs bash for process substitution:

awk '
    FNR == NR { tracked[$0] = 1; next }       # first input: paths tracked at HEAD
    /^\t/     { date = substr($0, 2); next }  # commit line (tab-prefixed): remember its date
    tracked[$0] == 1 {                        # first sighting is the newest commit
        print $0 "\t" date
        tracked[$0] = 2
    }
' <(git ls-files) <(git log --format='%x09%ad' --name-only HEAD)