How to Extract Repository Name from GitHub URL in Bash: A Robust Solution for All URL Formats



When working with GitHub repositories in bash scripts, you often need to extract just the repository name from various URL formats. GitHub URLs can come in several forms:

git://github.com/user/repo.git
git@github.com:user/repo.git
https://github.com/user/repo.git

The challenge is to create a bash solution that reliably extracts "repo" from any of these formats.

Here's a robust bash function that handles all common GitHub URL formats:

extract_repo_name() {
    local url="$1"
    # Remove protocol prefixes
    url=${url#git://}
    url=${url#git@}
    url=${url#https://}
    # Remove domain part
    url=${url#github.com[/:]}
    # Remove .git suffix if present
    url=${url%.git}
    # Extract the last part after /
    local repo_name="${url##*/}"
    echo "$repo_name"
}

Let's verify it works with all URL formats:

extract_repo_name "git://github.com/some-user/my-repo.git"
# Output: my-repo

extract_repo_name "git@github.com:some-user/my-repo.git"
# Output: my-repo

extract_repo_name "https://github.com/some-user/my-repo.git"
# Output: my-repo

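For those who prefer a one-liner using sed (using # as the delimiter so the slashes need no escaping, and stripping .git separately so repository names that contain dots stay intact):

echo "git@github.com:some-user/my-repo.git" | sed -E 's#.*github\.com[/:][^/]+/##; s#\.git$##'
# Output: my-repo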

The solution handles most cases, but you might want to add validation for:

  • URLs without .git extension
  • URLs with multiple path segments
  • Invalid URLs
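
One way to add that validation is sketched below. The function name extract_repo_name_checked and its error message are hypothetical, not part of the original function, and the sketch assumes that only the three URL shapes listed in this article, with exactly an owner/repo path, should be accepted.

```bash
# Hypothetical helper: prints the repository name, or returns a non-zero
# status for input that is not one of the three URL shapes shown above.
extract_repo_name_checked() {
    local url="$1" path
    # Accept only the known prefixes and strip them in one step.
    case "$url" in
        git://github.com/*)   path=${url#git://github.com/} ;;
        git@github.com:*)     path=${url#git@github.com:} ;;
        https://github.com/*) path=${url#https://github.com/} ;;
        *) echo "not a recognised GitHub URL: $url" >&2; return 1 ;;
    esac
    path=${path%/}       # tolerate a single trailing slash
    path=${path%.git}    # the .git suffix is optional
    # Require exactly two non-empty segments: owner/repo.
    case "$path" in
        */*/*|/*|*/) echo "unexpected path: $path" >&2; return 1 ;;
        */*) ;;
        *) echo "unexpected path: $path" >&2; return 1 ;;
    esac
    printf '%s\n' "${path#*/}"
}
```

Because the function returns a status, callers can write name=$(extract_repo_name_checked "$url") || exit 1 and stop early on bad input. Rejecting extra path segments is a design choice of this sketch; the original function accepts them and returns the last one.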

This technique is useful for:

  • Automating git operations in scripts
  • Generating local directory names from URLs
  • Creating log messages with repository names
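
As an illustration of the directory-name and log-message uses, here is a minimal sketch. The helper repo_checkout_dir and the /tmp/checkouts base directory are assumptions made up for this example; the git clone call is left commented out so that running the snippet has no side effects.

```bash
# The article's function, repeated so the snippet is self-contained.
extract_repo_name() {
    local url="$1"
    url=${url#git://}
    url=${url#git@}
    url=${url#https://}
    url=${url#github.com[/:]}
    url=${url%.git}
    printf '%s\n' "${url##*/}"
}

# Hypothetical helper: builds a checkout path under a base directory.
repo_checkout_dir() {
    local url="$1" base="${2:-$HOME/src}"
    printf '%s/%s\n' "$base" "$(extract_repo_name "$url")"
}

url="git@github.com:some-user/my-repo.git"
dest=$(repo_checkout_dir "$url" /tmp/checkouts)
echo "Cloning $(extract_repo_name "$url") into $dest"
# git clone "$url" "$dest"    # uncomment to actually clone
```

With the URL above, the echo line prints "Cloning my-repo into /tmp/checkouts/my-repo".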

When working with GitHub repositories in bash scripts, we commonly encounter three main URL formats:

git://github.com/user/repo.git
git@github.com:user/repo.git
https://github.com/user/repo.git

The key challenge is creating a solution that works reliably across all these formats while handling edge cases like:

  • Different protocol prefixes (git://, git@, https://)
  • Optional .git suffix
  • Potential subdirectories or branch names
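
For URLs that carry trailing components such as a branch path, taking the last path segment returns the wrong piece. The sketch below takes the segment right after the owner instead. The name extract_repo_by_position is hypothetical, and the approach assumes the repository is always the second path segment and that the URL has no port number after the host.

```bash
# Hypothetical variant: picks the second path segment (owner/REPO/...)
# rather than the last one, so trailing components are ignored.
extract_repo_by_position() {
    local url="$1" path rest
    # Drop everything up to and including "github.com/" or "github.com:".
    case "$url" in
        *github.com/*) path=${url#*github.com/} ;;
        *github.com:*) path=${url#*github.com:} ;;
        *) return 1 ;;
    esac
    rest=${path#*/}                      # drop the owner segment
    [ "$rest" != "$path" ] || return 1   # no slash found: owner only
    rest=${rest%%/*}                     # keep only the repository segment
    rest=${rest%.git}                    # the .git suffix is optional
    [ -n "$rest" ] || return 1
    printf '%s\n' "$rest"
}

extract_repo_by_position "https://github.com/some-user/my-repo/tree/main"
# Output: my-repo
```

The trade-off is that a URL with an extra directory before the repository name would yield that directory instead, so this variant and the last-segment approach suit different inputs.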

Here's a bash function that handles all three formats, with or without the .git suffix:

extract_repo_name() {
    local url="$1"
    # Remove protocol prefixes
    url=${url#git://}
    url=${url#git@}
    url=${url#https://}
    # Remove domain part
    url=${url#github.com[:/]}
    # Remove .git suffix if present
    url=${url%.git}
    # Extract the last path component
    local repo_name="${url##*/}"
    echo "$repo_name"
}

Let's verify it works with all URL formats:

# Test cases
extract_repo_name "git://github.com/some-user/my-repo.git"  # Output: my-repo
extract_repo_name "git@github.com:some-user/my-repo.git"    # Output: my-repo
extract_repo_name "https://github.com/some-user/my-repo.git" # Output: my-repo
extract_repo_name "https://github.com/org/another.repo"     # Output: another.repo

For those who prefer regular expressions:

extract_with_regex() {
    [[ $1 =~ ([^/:]+)/?$ ]] && echo "${BASH_REMATCH[1]%.git}"
}

Because the function keeps only the last path component, it also works with:

  • URLs without .git suffix
  • URLs with additional path components before the repository name
  • URLs with port numbers or authentication

# Additional test cases
extract_repo_name "git@github.com:user/repo"               # no .git, Output: repo
extract_repo_name "https://github.com/org/subdir/repo.git" # with subdir, Output: repo
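
The test cases above can also be checked automatically instead of by eye. This is a minimal sketch; the check helper and the failures counter are names introduced here for illustration.

```bash
# The article's function, repeated so the snippet is self-contained.
extract_repo_name() {
    local url="$1"
    url=${url#git://}
    url=${url#git@}
    url=${url#https://}
    url=${url#github.com[:/]}
    url=${url%.git}
    printf '%s\n' "${url##*/}"
}

# Compare actual and expected output for one case and count failures.
failures=0
check() {
    local expected="$1" url="$2" actual
    actual=$(extract_repo_name "$url")
    if [ "$actual" = "$expected" ]; then
        echo "ok   $url -> $actual"
    else
        echo "FAIL $url -> '$actual' (expected '$expected')"
        failures=$((failures + 1))
    fi
}

check my-repo      "git://github.com/some-user/my-repo.git"
check my-repo      "git@github.com:some-user/my-repo.git"
check my-repo      "https://github.com/some-user/my-repo.git"
check another.repo "https://github.com/org/another.repo"
check repo         "git@github.com:user/repo"
check repo         "https://github.com/org/subdir/repo.git"
echo "failures: $failures"
```

Adding a new URL format then only takes one more check line, and a non-zero failures count can be used to fail a CI step.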

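The parameter expansion method is generally faster than regex for simple cases, and unlike the sed one-liner it starts no external process. Benchmark with (output is discarded so printing does not dominate the timing):

time for i in {1..1000}; do extract_repo_name "https://github.com/user/repo.git" > /dev/null; done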