How to Extract Repository Name from GitHub URL in Bash: A Robust Solution for All URL Formats


When working with GitHub repositories in bash scripts, you often need to extract just the repository name from various URL formats. GitHub URLs can come in several forms:

git://github.com/user/repo.git
git@github.com:user/repo.git
https://github.com/user/repo.git

The challenge is to create a bash solution that reliably extracts "repo" from any of these formats.

Here's a robust bash function that handles all common GitHub URL formats:

extract_repo_name() {
    local url="$1"
    # Remove protocol prefixes
    url=${url#git://}
    url=${url#git@}
    url=${url#https://}
    # Remove domain part
    url=${url#github.com[/:]}
    # Remove .git suffix if present
    url=${url%.git}
    # Extract the last part after /
    local repo_name=${url##*/}
    echo "$repo_name"
}

Let's verify it works with all URL formats:

extract_repo_name "git://github.com/some-user/my-repo.git"
# Output: my-repo

extract_repo_name "git@github.com:some-user/my-repo.git"
# Output: my-repo

extract_repo_name "https://github.com/some-user/my-repo.git"
# Output: my-repo

For those who prefer a one-liner using sed:

echo "git@github.com:some-user/my-repo.git" | sed -E -e 's#.*github\.com[/:][^/]+/##' -e 's#\.git$##'
# Output: my-repo

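The function handles the three formats above, but it does not validate its input. Cases worth checking explicitly:

  • URLs without .git extension (already handled: ${url%.git} leaves them unchanged)
  • URLs with multiple path segments (only the last segment is returned)
  • Invalid URLs (no error is raised; whatever follows the last / is printed)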
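
One way to add that validation is sketched below. The name extract_repo_name_checked and the error message are hypothetical, not part of the original function, and the check is deliberately strict: it assumes the URL is exactly host/owner/repo, so anything with extra path segments is rejected rather than guessed at.

```shell
# Hypothetical helper: print the repository name, or return 1 when the URL
# is not one of the three GitHub formats with exactly owner/repo after the host.
extract_repo_name_checked() {
    local url="$1" path
    # Accept only the three prefixes discussed in this article.
    case "$url" in
        git://github.com/*|https://github.com/*|git@github.com:*) ;;
        *) echo "not a recognized GitHub URL: $url" >&2; return 1 ;;
    esac
    path=${url#*github.com[:/]}   # owner/repo, possibly with .git or a trailing /
    path=${path%/}                # drop one trailing slash
    path=${path%.git}             # drop the optional .git suffix
    # Require exactly two non-empty segments: owner/repo.
    case "$path" in
        */*/*|/*|*/) echo "unexpected path in URL: $url" >&2; return 1 ;;
        */*) ;;
        *) echo "missing owner or repository in URL: $url" >&2; return 1 ;;
    esac
    echo "${path#*/}"
}
```

Callers can then branch on the exit status, for example: name=$(extract_repo_name_checked "$url") || exit 1.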

This technique is useful for:

  • Automating git operations in scripts
  • Generating local directory names from URLs
  • Creating log messages with repository names

As a recap, bash scripts that work with GitHub repositories commonly encounter three main URL formats:

git://github.com/user/repo.git
git@github.com:user/repo.git
https://github.com/user/repo.git

The key challenge is creating a solution that works reliably across all these formats while handling edge cases like:

  • Different protocol prefixes (git://, git@, https://)
  • Optional .git suffix
  • Potential subdirectories or branch names

Here's a bash function that covers the three formats and the optional .git suffix. When extra path segments are present, it returns the last one:

extract_repo_name() {
    local url="$1"
    # Remove protocol prefixes
    url=${url#git://}
    url=${url#git@}
    url=${url#https://}
    # Remove domain part
    url=${url#github.com[:/]}
    # Remove .git suffix if present
    url=${url%.git}
    # Extract the last path component
    local repo_name=${url##*/}
    echo "$repo_name"
}

Let's verify it works with all URL formats:

# Test cases
extract_repo_name "git://github.com/some-user/my-repo.git"  # Output: my-repo
extract_repo_name "git@github.com:some-user/my-repo.git"    # Output: my-repo
extract_repo_name "https://github.com/some-user/my-repo.git" # Output: my-repo
extract_repo_name "https://github.com/org/another.repo"     # Output: another.repo
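
Because extract_repo_name keeps the last path component, a URL copied from a browser, such as https://github.com/user/repo/tree/main, would yield main instead of repo. The sketch below takes the repository by position instead (the second segment after the host). The function name extract_repo_by_position is hypothetical, and the code assumes the input contains github.com followed by at least owner/repo.

```shell
# Hypothetical variant: pick the repository by position (second path segment
# after the host) instead of taking the last path component.
extract_repo_by_position() {
    local path="${1#*github.com[:/]}"   # drop everything up to and including the host
    local rest="${path#*/}"             # drop the owner segment
    local repo="${rest%%/*}"            # keep only the next segment
    echo "${repo%.git}"                 # strip the optional .git suffix
}

extract_repo_by_position "https://github.com/user/repo/tree/main"   # Output: repo
extract_repo_by_position "git@github.com:some-user/my-repo.git"     # Output: my-repo
```

The trade-off is that this version depends on the owner/repo layout, so it is a better fit for GitHub URLs specifically than for arbitrary Git remotes.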

For those who prefer regular expressions:

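extract_with_regex() {
    # Capture the final run of characters that are neither / nor :,
    # allowing one trailing slash, then strip an optional .git suffix.
    [[ $1 =~ ([^/:]+)/?$ ]] && echo "${BASH_REMATCH[1]%.git}"
}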

Because the last step keeps only the final path component, the function also works with:

  • URLs without .git suffix
  • URLs with additional path components
  • URLs with port numbers or authentication

# Additional test cases
extract_repo_name "git@github.com:user/repo"               # no .git; Output: repo
extract_repo_name "https://github.com/org/subdir/repo.git" # with subdir; Output: repo
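
Checks like these are easier to keep up to date as a small table of inputs and expected names. The following is a minimal sketch of that idea. The function is repeated in compact form only so the snippet runs on its own, and the failures counter and the ok/FAIL output format are illustrative choices.

```shell
# Compact copy of the function from this article, so the snippet is self-contained.
extract_repo_name() {
    local url="$1"
    url=${url#git://}
    url=${url#git@}
    url=${url#https://}
    url=${url#github.com[:/]}
    url=${url%.git}
    echo "${url##*/}"
}

# Each line of the table is: <url> <expected repository name>
failures=0
while read -r url expected; do
    actual=$(extract_repo_name "$url")
    if [ "$actual" = "$expected" ]; then
        echo "ok   $url -> $actual"
    else
        echo "FAIL $url -> $actual (expected $expected)"
        failures=$((failures + 1))
    fi
done <<'EOF'
git://github.com/some-user/my-repo.git my-repo
git@github.com:user/repo repo
https://github.com/org/subdir/repo.git repo
https://github.com/org/another.repo another.repo
EOF
echo "failures: $failures"
```

The here-document feeds the loop directly rather than through a pipe, so the failures counter is still set after the loop ends.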

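The parameter expansion method runs entirely inside the shell, so it avoids the process startup cost of the sed one-liner. The difference from the [[ =~ ]] version is typically small. Measure on your own system, discarding the output so that printing does not dominate the timing:

time for i in {1..1000}; do extract_repo_name "https://github.com/user/repo.git" > /dev/null; done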