How to Automatically Mirror and Sync a GitHub Repository to GitLab (OpenStack Nova Example)



When working with large open-source projects like OpenStack Nova, maintaining a synchronized mirror in your GitLab instance becomes crucial for development workflows. The primary repository at https://github.com/openstack/nova receives frequent updates, and keeping your local GitLab copy current requires an automated solution.

Before implementing the sync solution, ensure you have:

  • Admin access to your GitLab instance
  • SSH keys configured for repository access
  • Cron or similar scheduling capability
  • An initial copy of the GitHub repository already pushed to a project in your GitLab instance
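The command-line prerequisites above can be checked before anything is scheduled. This is a minimal sketch: the function name missing_tools is made up for illustration, and the tool list (git, ssh, crontab) is an assumption based on the items in the list, so adjust it to your setup.

```shell
#!/bin/bash
# Print the name of every required tool that is not found on PATH.
# The tools to check are passed as arguments so the list is easy to adapt.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example: check the tools this guide relies on.
# missing=$(missing_tools git ssh crontab)
# [ -z "$missing" ] || { echo "Missing tools: $missing" >&2; exit 1; }
```

It only verifies that the commands exist. Whether your SSH key is actually accepted by GitLab still has to be tested with a real push.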

Here's the complete solution using Git's native capabilities combined with cron:

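#!/bin/bash
# Sync script for the nova repository.
# Assumes REPO_DIR is a mirror clone (git clone --mirror) that has a
# remote named "github" pointing at https://github.com/openstack/nova.git
REPO_DIR="/path/to/your/local/nova/repo"
GITLAB_REMOTE="git@your-gitlab-instance.com:your-group/nova.git"

cd "$REPO_DIR" || exit 1
git fetch --prune github || exit 1
git push --mirror "$GITLAB_REMOTE"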

To schedule hourly updates using cron:

# Edit cron jobs
crontab -e

# Add this line for hourly sync (>> appends, so earlier runs stay in the log)
0 * * * * /path/to/your/sync-script.sh >> /var/log/nova-sync.log 2>&1
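With an hourly schedule and a repository as large as Nova, one run can still be active when cron starts the next. A simple guard against overlapping runs might look like the sketch below. The lock path and the function names are hypothetical; the technique relies only on mkdir being atomic.

```shell
#!/bin/bash
# Hypothetical lock location; any path writable by the cron user works.
LOCK_DIR="${LOCK_DIR:-/tmp/nova-sync.lock}"

# mkdir is atomic: it succeeds for exactly one caller and fails if the
# directory already exists, so it can serve as a simple lock.
acquire_lock() {
    mkdir "$1" 2>/dev/null
}

# Remove the lock directory so the next run can proceed.
release_lock() {
    rmdir "$1" 2>/dev/null
}

# Usage inside the sync script:
# acquire_lock "$LOCK_DIR" || { echo "Previous sync still running" >&2; exit 0; }
# trap 'release_lock "$LOCK_DIR"' EXIT
# ... git fetch / git push ...
```

If the script is killed before the trap runs, the lock directory stays behind and must be removed by hand.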

For GitLab Premium instances, you can use the built-in repository mirroring:

  1. Navigate to your project in GitLab
  2. Go to Settings > Repository
  3. Expand "Mirroring repositories"
  4. Enter the GitHub repository URL
  5. Set the mirror direction to Pull (GitLab then updates the mirror automatically on its own schedule)
  6. Provide authentication details if the source repository requires them

Authentication failures: Ensure your SSH keys are properly configured in both GitHub and GitLab.

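Diverged branches (rejected pushes): If the GitLab copy has commits that do not exist on GitHub, a normal push is rejected. For a pure mirror, reset to the GitHub state and force the push. This requires a non-bare clone and overwrites anything unique to the GitLab copy:

git reset --hard github/master
git push --force "$GITLAB_REMOTE"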

Large repository handling: For massive repos like Nova, add these git config settings:

git config --global pack.windowMemory "100m"
git config --global pack.packSizeLimit "100m"
git config --global pack.threads "1"

Implement logging to track synchronization:

# Enhanced sync script with logging
LOG_FILE="/var/log/nova-sync.log"
# Compute the timestamp on every call so start and end times differ
log() { echo "[$(date +"%Y-%m-%d %T")] $1" >> "$LOG_FILE"; }

log "Starting sync"
git fetch github 2>&1 | tee -a "$LOG_FILE"
git push --mirror "$GITLAB_REMOTE" 2>&1 | tee -a "$LOG_FILE"
log "Sync completed"
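One caveat with the logging snippet above: when a git command is piped through tee, the pipeline reports the exit status of tee, so a failed fetch or push can go unnoticed. The sketch below redirects output to the log instead and returns the command's own status. The names log and run_logged and the default log path are assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical default; the article itself logs to /var/log/nova-sync.log.
LOG_FILE="${LOG_FILE:-/tmp/nova-sync.log}"

# Append a timestamped message to the log.
log() {
    printf '[%s] %s\n' "$(date +"%Y-%m-%d %T")" "$*" >> "$LOG_FILE"
}

# Run a command, append its output to the log, and return the
# command's own exit status rather than that of a later pipe stage.
run_logged() {
    log "Running: $*"
    if "$@" >> "$LOG_FILE" 2>&1; then
        status=0
    else
        status=$?
    fi
    [ "$status" -eq 0 ] || log "Failed with exit status $status: $*"
    return "$status"
}

# Usage:
# run_logged git fetch github && run_logged git push --mirror "$GITLAB_REMOTE"
```

Because the status is preserved, the push in the usage line only runs when the fetch succeeded.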

A second approach keeps a dedicated bare mirror clone of OpenStack Nova on a server and pushes it to GitLab on a schedule, so the copy used for development and testing stays current without manual intervention.

First, ensure you've created the initial mirror in your GitLab instance. Here's how we did it initially:

git clone --mirror https://github.com/openstack/nova.git
cd nova.git
git remote set-url --push origin http://your-gitlab-instance/namespace/nova.git
git push --mirror
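After the initial push it is worth confirming that both sides list the same branches and tags. One way to do that, sketched below, is to compare the output of git ls-remote from each server. The function name same_refs and the temporary file names are hypothetical; the URLs are the placeholders used in the commands above.

```shell
#!/bin/bash
# Succeeds when two files of `git ls-remote` output list the same refs
# at the same commits, regardless of line order.
same_refs() {
    [ "$(sort "$1")" = "$(sort "$2")" ]
}

# Usage (requires network access to both servers):
# git ls-remote --heads --tags https://github.com/openstack/nova.git > /tmp/github.refs
# git ls-remote --heads --tags http://your-gitlab-instance/namespace/nova.git > /tmp/gitlab.refs
# same_refs /tmp/github.refs /tmp/gitlab.refs && echo "Mirror is up to date"
```

The comparison is limited to branches and tags because each server also keeps refs of its own that the other side does not have.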

The most robust solution involves creating a scheduled task that runs the synchronization at your desired interval (hourly/daily). Here's a script you can use:

#!/bin/bash

# Configuration
GITHUB_REPO="https://github.com/openstack/nova.git"
LOCAL_REPO="/path/to/your/local/nova.git"
GITLAB_REPO="http://your-gitlab-instance/namespace/nova.git"
LOG_FILE="/var/log/github-to-gitlab-sync.log"

# Sync function
sync_repo() {
    cd "$LOCAL_REPO" || exit 1
    git remote update &>> "$LOG_FILE"
    if [ $? -ne 0 ]; then
        echo "$(date) - Failed to fetch updates from GitHub" >> "$LOG_FILE"
        exit 1
    fi
    
    git push --mirror "$GITLAB_REPO" &>> "$LOG_FILE"
    if [ $? -ne 0 ]; then
        echo "$(date) - Failed to push updates to GitLab" >> $LOG_FILE
        exit 1
    fi
    
    echo "$(date) - Successfully synchronized repository" >> $LOG_FILE
}

# Execute sync
sync_repo
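The script above gives up on the first failed fetch or push, which means a brief network problem costs a whole sync interval. A small retry wrapper is one possible remedy. This is a sketch under assumptions: the function name retry and the RETRY_DELAY variable are not part of the original script.

```shell
#!/bin/bash
# Retry a command up to $1 times, doubling the wait between attempts.
# RETRY_DELAY (seconds to wait after the first failure) is a hypothetical knob.
retry() {
    attempts=$1
    shift
    delay="${RETRY_DELAY:-5}"
    n=1
    while :; do
        if "$@"; then
            return 0
        fi
        # Stop once the allowed number of attempts has been used up.
        if [ "$n" -ge "$attempts" ]; then
            return 1
        fi
        sleep "$delay"
        delay=$((delay * 2))
        n=$((n + 1))
    done
}

# Usage inside sync_repo:
# retry 3 git remote update || exit 1
```

Retrying helps with transient failures only. An authentication error fails the same way on every attempt and should still be reported.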

For hourly updates, add this to your crontab:

0 * * * * /path/to/your/sync-script.sh

For daily updates at midnight:

0 0 * * * /path/to/your/sync-script.sh

OpenStack Nova is a large repository. Consider these optimizations:

  • Use git config --global pack.windowMemory 256m to limit memory usage
  • Set git config --global pack.packSizeLimit 256m to limit pack size
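  • Add --depth=1 to git clone only if recent history is enough; a shallow clone is not a complete mirror, and the receiving server may reject pushes from it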

Enhance the script with better error handling and notifications:

#!/bin/bash
# Previous configuration remains

send_alert() {
    # Implement your notification method (email, Slack, etc.)
    echo "$1" | mail -s "GitHub-GitLab Sync Error" admin@example.com
}

sync_repo() {
    cd "$LOCAL_REPO" || { send_alert "Cannot access local repo"; exit 1; }
    
    if ! git remote update &>> "$LOG_FILE"; then
        send_alert "Failed to fetch from GitHub"
        exit 1
    fi
    
    if ! git push --mirror "$GITLAB_REPO" &>> "$LOG_FILE"; then
        send_alert "Failed to push to GitLab"
        exit 1
    fi
}

# Execute sync
sync_repo
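The send_alert function above relies on a working mail command. If you would rather post to a chat webhook, a variant could look like the sketch below. It assumes an incoming webhook that accepts a JSON body of the form {"text": "..."}; ALERT_WEBHOOK_URL and alert_payload are hypothetical names, and only single-line messages are handled.

```shell
#!/bin/bash
# Build a {"text":"..."} JSON body from a single-line message,
# escaping backslashes and double quotes.
alert_payload() {
    escaped=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
    printf '{"text":"%s"}' "$escaped"
}

# Post the message to a webhook. ALERT_WEBHOOK_URL is a hypothetical
# variable that must hold your own incoming-webhook URL.
send_alert() {
    curl -fsS -X POST -H 'Content-Type: application/json' \
        -d "$(alert_payload "$1")" "$ALERT_WEBHOOK_URL"
}
```

Since it keeps the name send_alert, it can replace the mail-based function without changes to sync_repo.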

As an alternative to cron, a scheduled GitLab CI/CD pipeline can run the synchronization:

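sync_job:
  image:
    name: alpine/git
    entrypoint: [""]  # the image's default entrypoint is git, which would swallow the script
  script:
    # GITHUB_REPO and GITLAB_REPO must be defined as CI/CD variables;
    # GITLAB_REPO needs to include credentials that are allowed to push
    - git clone --mirror "$GITHUB_REPO" nova.git
    - cd nova.git
    - git remote set-url --push origin "$GITLAB_REPO"
    - git push --mirror
  only:
    - schedules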

Configure the pipeline to run on a schedule through GitLab's UI.