NeuroAgent

How to Merge Git Repositories While Preserving History

Learn how to merge Git repositories while preserving complete history using git subtree and filter-repo methods. Step-by-step guide for integrating projects as subdirectories.

Question

How do you merge two Git repositories while preserving history?

I have developed a small experimental project A in its own Git repository. It has now matured, and I’d like to add it as a subdirectory to a larger project B, which has its own repository. How can I merge repository A into repository B without losing the history of either repository?

NeuroAgent

To merge two Git repositories while preserving history, you can use the git subtree command or the git filter-repo tool to integrate one repository into a subdirectory of another while maintaining complete commit history. The subtree approach is simpler and more widely compatible, while filter-repo offers more advanced functionality for complex repository transformations.

Understanding Repository Merging Options

When you need to merge two Git repositories while preserving complete history, you have several approaches to consider. Each method has its own advantages and trade-offs in terms of complexity, compatibility, and functionality.

The most common approaches are:

  1. Git Subtree - Integrates one repository as a subdirectory of another, preserving all commit history
  2. Git Filter-Repo - More powerful tool for rewriting repository history before merging
  3. Git Submodule - Links to the external repository as a reference, but doesn’t integrate history
  4. Manual Import with Git Archive - Copies a snapshot of A's files into B but discards A's commit history entirely

For your specific use case of adding project A as a subdirectory to project B while preserving history, git subtree is often the most straightforward and effective solution.

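Key Consideration: Unlike simple file copying (or a git archive import), git subtree and git filter-repo bring A's complete commit history into repository B, so you can see the full evolution of your experimental project within the larger project structure. A submodule keeps that history in A's own repository instead.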

Using Git Subtree for Repository Integration

The git subtree command is specifically designed to integrate one repository into a subdirectory of another while preserving all commit history. It’s part of Git’s contrib scripts and provides a clean way to merge repositories.

Basic Subtree Commands

First, make sure git subtree is available. It lives in Git's contrib directory and most Git packages install it, but some minimal builds leave it out:

bash
git subtree -h  # Prints usage if installed; otherwise Git reports that 'subtree' is not a git command
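Reading that output by eye works when you are at a terminal. A script needs a yes/no answer, which it can get by looking for the git-subtree helper where Git itself searches for external commands. Git checks its exec path (reported by git --exec-path) first and then $PATH. This is a minimal POSIX-shell sketch. The function name subtree_available and the SUBTREE_STATUS variable are illustrative.

```shell
# Succeeds if a git-subtree helper is installed where Git can find it
subtree_available() {
  # Git looks in its own exec path first (most packages install git-subtree there)...
  [ -x "$(git --exec-path)/git-subtree" ] && return 0
  # ...and then falls back to a git-subtree executable on $PATH
  command -v git-subtree >/dev/null 2>&1
}

if subtree_available; then
  SUBTREE_STATUS=available
else
  SUBTREE_STATUS=missing
fi
echo "git subtree: $SUBTREE_STATUS"
```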

The key commands you’ll need are:

bash
# Add repository A as a subdirectory in repository B
git subtree add --prefix=projectA <repositoryA_url> <branch_or_tag>

# Pull updates from repository A into the subdirectory
git subtree pull --prefix=projectA <repositoryA_url> <branch_or_tag>

# Push changes from the subdirectory back to repository A
git subtree push --prefix=projectA <repositoryA_url> <target_branch>

How Subtree Preserves History

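Unlike simple file copying, git subtree add (without --squash) keeps every original commit of repository A. It fetches A's commits into repository B unchanged and records a single merge commit whose second parent is A's latest commit. That merge commit is where A's files appear under the prefix. A's original commits still store their files at the repository root, so git log -- projectA/ stops at the merge commit. To see A's full history, run git log on the merge's second parent (or use git log --graph). Adding --squash would instead collapse A's history into a single commit.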

This approach creates a unified history where you can see the complete evolution of both projects within a single repository structure.
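The mechanics are easiest to see on throwaway repositories. This sketch makes several assumptions: hypothetical repoA and repoB directories under a temporary folder, a local path standing in for <repositoryA_url>, and a made-up commit identity. It imports A and then inspects the result. The merge's second parent is A's untouched tip, A's files are still at the root in those commits, and a path-limited log reaches back only to the merge.

```shell
set -e
# Throwaway commit identity so the demo works without any git config (assumption)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
WORK=$(mktemp -d)

# Hypothetical helper: create a repository with one committed file on branch main
new_repo() {
  git init -q "$1"
  git -C "$1" symbolic-ref HEAD refs/heads/main
  echo "$3" > "$1/$2"
  git -C "$1" add "$2"
  git -C "$1" commit -q -m "initial $2"
}

# Stand-ins for project A (two commits, files at the root) and project B
new_repo "$WORK/repoA" lib.txt v1
echo v2 > "$WORK/repoA/lib.txt"
git -C "$WORK/repoA" commit -q -am "A: v2"
new_repo "$WORK/repoB" app.txt app

# Import A under projectA/; a local path works as the <repository> argument
cd "$WORK/repoB"
git subtree add --prefix=projectA "$WORK/repoA" main

# HEAD is a merge: first parent is B's previous tip, second parent is A's tip
git log --oneline --graph
# A's commits are unchanged, so their files are still at the root
git ls-tree --name-only HEAD^2
# A path-limited log therefore reaches back only to the subtree merge commit
git log --oneline -- projectA/
```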

Alternative Methods with Git Filter-Repo

For more complex scenarios, git filter-repo offers powerful capabilities for repository history rewriting before merging. This tool is particularly useful when you need to:

  • Rewrite author information or commit dates
  • Filter out specific files or directories
  • Change commit messages en masse
  • Restructure repository layout before merging

Filter-Repo Workflow

bash
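# First, install git-filter-repo if it is not available
pip install git-filter-repo

# Clone repository A into a fresh, temporary copy and rewrite its history
# (filter-repo expects a fresh clone; add --no-local when cloning from a local path)
git clone <repositoryA_url> temp-repo
cd temp-repo
git filter-repo --to-subdirectory-filter projectA

# Go back to repository B and merge the rewritten history
cd /path/to/projectB
git remote add temp-repo /path/to/temp-repo
git fetch temp-repo
git merge temp-repo/main --allow-unrelated-histories

# Remove the temporary remote afterwards
git remote remove temp-repo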

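Note: git filter-repo rewrites A's commits, so they get new hashes and store their files under projectA/ from the first commit onward. As a result, git log -- projectA/ shows A's full history after the merge. The trade-off is that B no longer shares commits with the original repository A, so keeping the two in sync later (for example with git subtree pull) does not work without extra effort. This approach suits a one-time import, or situations where you need fine-grained control over the history rewriting process.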
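The filter-repo route can also be tried end to end on throwaway repositories. This sketch assumes git-filter-repo is installed on $PATH and skips the rewrite otherwise. The directory names, the demo identity and the main branch are also assumptions. It clones with --no-local, which filter-repo recommends for local paths so that the copy passes its fresh-clone check.

```shell
set -e
# Throwaway commit identity for the demo (assumption)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
WORK=$(mktemp -d)

# Hypothetical helper: create a repository with one committed file on branch main
new_repo() {
  git init -q "$1"
  git -C "$1" symbolic-ref HEAD refs/heads/main
  echo "$3" > "$1/$2"
  git -C "$1" add "$2"
  git -C "$1" commit -q -m "initial $2"
}

new_repo "$WORK/repoA" lib.txt v1
echo v2 > "$WORK/repoA/lib.txt"
git -C "$WORK/repoA" commit -q -am "A: v2"
new_repo "$WORK/repoB" app.txt app

if command -v git-filter-repo >/dev/null 2>&1; then
  # A real (non-hardlinked) clone, so filter-repo treats it as fresh
  git clone -q --no-local "$WORK/repoA" "$WORK/temp-repo"
  cd "$WORK/temp-repo"
  git filter-repo --to-subdirectory-filter projectA

  cd "$WORK/repoB"
  git remote add temp-repo "$WORK/temp-repo"
  git fetch -q temp-repo
  git merge -q --allow-unrelated-histories -m "Merge projectA history" temp-repo/main
  git remote remove temp-repo

  # Rewritten commits already store files under projectA/
  git log --oneline -- projectA/
else
  echo "git-filter-repo not found on PATH; skipping the rewrite" >&2
fi
```

Unlike the subtree import, the final git log -- projectA/ here lists both of A's commits, because every rewritten commit already stores its files under projectA/.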

Step-by-Step Implementation Guide

Let’s walk through a complete implementation using the git subtree approach, which is ideal for your use case.

Prerequisites

Before starting, ensure both repositories are accessible and you have write permissions:

bash
# Navigate to your main project B repository
cd /path/to/projectB

# Check for uncommitted changes (git subtree add refuses to run if tracked files are modified)
git status
git remote -v
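Two of the conditions that make git subtree add stop can be checked up front: uncommitted changes to tracked files, and a prefix directory that already exists. A minimal sketch of such a pre-flight check follows. subtree_preflight is a hypothetical helper, it assumes it is given the top-level directory of the repository, and the throwaway repository stands in for project B.

```shell
set -e
# Throwaway commit identity for the demo (assumption)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: prints "ok" or the first blocking problem; non-zero exit on failure.
# $1 must be the top-level directory of the repository, $2 the planned prefix.
subtree_preflight() {
  repo=$1
  prefix=$2
  if ! git -C "$repo" rev-parse --is-inside-work-tree >/dev/null 2>&1; then
    echo "not a git work tree"
    return 1
  fi
  # git subtree add refuses to run with uncommitted changes to tracked files
  if [ -n "$(git -C "$repo" status --porcelain --untracked-files=no)" ]; then
    echo "uncommitted changes"
    return 1
  fi
  # ...and refuses to add into a prefix that already exists
  if [ -e "$repo/$prefix" ]; then
    echo "prefix already exists"
    return 1
  fi
  echo "ok"
}

# Usage on a throwaway stand-in for project B
WORK=$(mktemp -d)
git init -q "$WORK/projectB"
echo app > "$WORK/projectB/app.txt"
git -C "$WORK/projectB" add app.txt
git -C "$WORK/projectB" commit -q -m "B: initial"
subtree_preflight "$WORK/projectB" projectA
```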

Step 1: Add Repository A as Subdirectory

bash
# Add repository A as a subdirectory named 'projectA'
git subtree add --prefix=projectA https://github.com/yourusername/repositoryA.git main

# The command will:
# 1. Fetch the given branch of repository A
# 2. Read A's files into the projectA/ subdirectory
# 3. Create a merge commit whose second parent is A's latest commit,
#    which keeps all of A's commit history reachable from B

Step 2: Verify the Integration

bash
# Check that files are in the correct location
ls -la projectA/

# View the commit history to see merged commits
git log --oneline --graph --all

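# List A's original commits: right after the add, HEAD is the subtree merge and
# HEAD^2 is A's latest commit (git log projectA/ would stop at the merge commit)
git log --oneline -10 HEAD^2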
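A stricter check than eyeballing the ls output relies on the fact that git subtree add copies the tree of A's tip unchanged under the prefix. Right after the add, the tree at HEAD:projectA should therefore be the same object as the tree of the merge's second parent. The sketch below builds throwaway repositories and wraps that comparison in a hypothetical verify_subtree_add helper. It is meant to run before any further commits, while HEAD is still the subtree merge.

```shell
set -e
# Throwaway commit identity for the demo (assumption)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
WORK=$(mktemp -d)

# Hypothetical helper: create a repository with one committed file on branch main
new_repo() {
  git init -q "$1"
  git -C "$1" symbolic-ref HEAD refs/heads/main
  echo "$3" > "$1/$2"
  git -C "$1" add "$2"
  git -C "$1" commit -q -m "initial $2"
}

new_repo "$WORK/repoA" lib.txt v1
new_repo "$WORK/repoB" app.txt app
cd "$WORK/repoB"
git subtree add --prefix=projectA "$WORK/repoA" main

# Hypothetical helper, run from the top of B while HEAD is still the subtree merge.
# Prints "match" when the tree stored at <prefix> is the same object as the root
# tree of HEAD's second parent (the imported tip of A).
verify_subtree_add() {
  if ! git rev-parse -q --verify "HEAD^2" >/dev/null; then
    echo "HEAD is not a merge"
    return 1
  fi
  if [ "$(git rev-parse "HEAD:$1")" = "$(git rev-parse "HEAD^2^{tree}")" ]; then
    echo "match"
  else
    echo "mismatch"
    return 1
  fi
}

verify_subtree_add projectA
```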

Step 3: Push to Remote Repository

bash
# Push the merged repository to remote
git push origin main

# git subtree add creates no additional branches, so pushing main is enough;
# avoid 'git push origin --all', which would publish every local branch

Step 4: Ongoing Maintenance

As repository A evolves, you can pull updates:

bash
# Pull latest changes from repository A
git subtree pull --prefix=projectA https://github.com/yourusername/repositoryA.git main

# If you make changes in projectA/ that should go back to repository A
git subtree push --prefix=projectA https://github.com/yourusername/repositoryA.git main
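Both maintenance commands can be rehearsed locally before touching real remotes. In this sketch, local directories stand in for the repository URLs. The push goes to a new branch (from-projectB, a hypothetical name) because Git by default refuses to update the checked-out branch of a non-bare repository such as the demo repoA. The -m option supplies the merge message so git subtree pull does not need an editor.

```shell
set -e
# Throwaway commit identity for the demo (assumption)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
WORK=$(mktemp -d)

# Hypothetical helper: create a repository with one committed file on branch main
new_repo() {
  git init -q "$1"
  git -C "$1" symbolic-ref HEAD refs/heads/main
  echo "$3" > "$1/$2"
  git -C "$1" add "$2"
  git -C "$1" commit -q -m "initial $2"
}

new_repo "$WORK/repoA" lib.txt v1
new_repo "$WORK/repoB" app.txt app
cd "$WORK/repoB"
git subtree add --prefix=projectA "$WORK/repoA" main

# Repository A moves on independently...
echo v2 > "$WORK/repoA/lib.txt"
git -C "$WORK/repoA" commit -q -am "A: v2"

# ...and B pulls the update into projectA/
git subtree pull --prefix=projectA -m "Pull projectA updates" "$WORK/repoA" main

# A fix made inside B's copy of the subproject
echo "v2 + fix from B" > projectA/lib.txt
git commit -q -am "Fix lib.txt in projectA"

# Send it back to A on a separate branch, to be merged inside A afterwards
git subtree push --prefix=projectA "$WORK/repoA" from-projectB
```

After the push, repository A has a from-projectB branch whose lib.txt (at the root, as A expects) contains B's fix, ready to be reviewed and merged there.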

Best Practices and Considerations

When merging repositories while preserving history, consider these important best practices:

Branch Strategy

  • Consider creating a feature branch before performing the merge to isolate the work
  • Test the merge in a local clone before pushing it to main
  • Document the merge process for future reference
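One way to apply the feature-branch advice is to do the subtree add on a separate branch, check it there, and only then fast-forward main. The sketch below shows this on throwaway repositories. The branch name import-projectA and the demo identity are illustrative.

```shell
set -e
# Throwaway commit identity for the demo (assumption)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
WORK=$(mktemp -d)

# Hypothetical helper: create a repository with one committed file on branch main
new_repo() {
  git init -q "$1"
  git -C "$1" symbolic-ref HEAD refs/heads/main
  echo "$3" > "$1/$2"
  git -C "$1" add "$2"
  git -C "$1" commit -q -m "initial $2"
}

new_repo "$WORK/repoA" lib.txt v1
new_repo "$WORK/repoB" app.txt app
cd "$WORK/repoB"

# Do the import on its own branch so main stays untouched until it has been checked
git checkout -q -b import-projectA
git subtree add --prefix=projectA "$WORK/repoA" main

# Build and test here. To abandon the attempt instead:
#   git checkout main && git branch -D import-projectA

# When satisfied, fast-forward main; --ff-only keeps the subtree merge commit as-is
git checkout -q main
git merge -q --ff-only import-projectA
```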

Conflict Resolution

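  • Expect merge conflicts mainly during later git subtree pull runs, when the same file has changed on both sides; the initial git subtree add refuses to run if the prefix directory already exists, so it cannot collide with files already in B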
  • Resolve conflicts carefully, preserving the intent of both codebases
  • Test thoroughly after resolving conflicts to ensure functionality is preserved

History Management

  • Keep commit messages clear and descriptive of what was merged
  • Consider tagging important points in the history before major operations
  • Back up your repositories before performing complex operations
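The tagging and backup advice can be combined cheaply. First tag the current state, then write all refs into a single git bundle file that can later be cloned like a remote. This is a sketch on a throwaway repository, assuming a tag name of pre-projectA-merge (illustrative). It is not the only backup strategy; a plain git clone --mirror works too.

```shell
set -e
# Throwaway commit identity for the demo (assumption)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
WORK=$(mktemp -d)

# Throwaway stand-in for project B
git init -q "$WORK/repoB"
git -C "$WORK/repoB" symbolic-ref HEAD refs/heads/main
echo app > "$WORK/repoB/app.txt"
git -C "$WORK/repoB" add app.txt
git -C "$WORK/repoB" commit -q -m "B: initial"
cd "$WORK/repoB"

# Mark the state before the merge with an annotated tag (name is illustrative)
git tag -a pre-projectA-merge -m "Project B before importing projectA"

# Write every branch and tag into one file that can be stored elsewhere
git bundle create "$WORK/projectB-backup.bundle" --all
git bundle verify "$WORK/projectB-backup.bundle"

# Restoring: a bundle can be cloned like any other repository
git clone -q -b main "$WORK/projectB-backup.bundle" "$WORK/projectB-restored"

# Rolling back in place instead (destructive, discards later commits):
#   git reset --hard pre-projectA-merge
```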

Performance Considerations

  • Large repositories may take longer to merge due to history processing
  • Network connectivity is crucial for remote repository operations
  • Disk space requirements increase with preserved history

Troubleshooting Common Issues

Subtree Not Available

If git subtree is not available:

bash
# For macOS with Homebrew
brew install git

# For Ubuntu/Debian
sudo apt-get install git

# Or, from a built Git source tree, install the contrib script
cd contrib/subtree && make && sudo make install

Merge Conflicts

If you encounter merge conflicts:

bash
# Check conflicted files
git status

# Resolve conflicts manually in your editor
git add projectA/path/to/conflicted/file

# Complete the merge
git commit
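This sketch provokes a conflict during git subtree pull on throwaway repositories and then resolves it. The resolution keeps A's version, which is an arbitrary choice for the demo; in practice you would edit the file to combine both changes. Inside B, the conflicted path carries the projectA/ prefix.

```shell
set -e
# Throwaway commit identity for the demo (assumption)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
WORK=$(mktemp -d)

# Hypothetical helper: create a repository with one committed file on branch main
new_repo() {
  git init -q "$1"
  git -C "$1" symbolic-ref HEAD refs/heads/main
  echo "$3" > "$1/$2"
  git -C "$1" add "$2"
  git -C "$1" commit -q -m "initial $2"
}

new_repo "$WORK/repoA" lib.txt v1
new_repo "$WORK/repoB" app.txt app
cd "$WORK/repoB"
git subtree add --prefix=projectA "$WORK/repoA" main

# Both repositories change the same line of the same file
echo "changed in B" > projectA/lib.txt
git commit -q -am "B: edit projectA/lib.txt"
echo "changed in A" > "$WORK/repoA/lib.txt"
git -C "$WORK/repoA" commit -q -am "A: edit lib.txt"

# The pull stops with a conflict on projectA/lib.txt
if ! git subtree pull --prefix=projectA -m "Pull projectA updates" "$WORK/repoA" main; then
  git status --short
  # Resolve (this demo simply keeps A's side), stage the file, and conclude the merge
  git checkout --theirs -- projectA/lib.txt
  git add projectA/lib.txt
  git commit -q --no-edit
fi
```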

History Issues

If you need to rewrite history after a subtree merge:

bash
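# Interactive rebase is safe only for commits made after the subtree merge;
# a range that includes the merge would flatten it and replay A's commits at the root
git rebase -i HEAD~3

# Caution: --path keeps ONLY projectA/ and deletes everything else from history.
# Run it in a fresh clone, for example to extract the subproject again.
git filter-repo --path projectA/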
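One way to keep an interactive rebase from crossing the subtree merge is to start it at that merge. git subtree add writes git-subtree-dir:, git-subtree-mainline: and git-subtree-split: lines into the merge message, so the commit can be located with git log --grep. The sketch below assumes a single import under projectA/ and builds throwaway repositories. The variable name SUBTREE_MERGE is illustrative.

```shell
set -e
# Throwaway commit identity for the demo (assumption)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
WORK=$(mktemp -d)

# Hypothetical helper: create a repository with one committed file on branch main
new_repo() {
  git init -q "$1"
  git -C "$1" symbolic-ref HEAD refs/heads/main
  echo "$3" > "$1/$2"
  git -C "$1" add "$2"
  git -C "$1" commit -q -m "initial $2"
}

new_repo "$WORK/repoA" lib.txt v1
new_repo "$WORK/repoB" app.txt app
cd "$WORK/repoB"
git subtree add --prefix=projectA "$WORK/repoA" main

# Two ordinary commits made in B after the import
echo one >> app.txt
git commit -q -am "B: after import (1)"
echo two >> app.txt
git commit -q -am "B: after import (2)"

# Find the most recent commit whose message names the projectA/ subtree
SUBTREE_MERGE=$(git log -F --grep="git-subtree-dir: projectA" --format=%H -n 1)

# These are the commits a plain interactive rebase can safely rewrite
git log --oneline "$SUBTREE_MERGE..HEAD"
# To edit them:  git rebase -i "$SUBTREE_MERGE"
```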

Remote Repository Issues

If you have trouble accessing the remote repository:

bash
# Verify remote URL
git remote -v

# Update remote URL if needed
git remote set-url origin https://new-url.com/repository.git

# Test connectivity
git fetch origin

Conclusion

Merging two Git repositories while preserving history is entirely achievable using modern Git tools. For your specific use case of integrating project A into project B as a subdirectory, the git subtree approach provides the best balance of simplicity and functionality.

Key takeaways:

  • Git subtree maintains complete commit history while integrating repositories as subdirectories
  • The process creates a unified history where both projects’ evolution is visible
  • Regular maintenance allows you to pull updates from the original repository or push changes back
  • Proper planning and testing help avoid common issues like merge conflicts

Recommended next steps:

  1. Create a backup of both repositories before starting
  2. Test the merge in a local clone first
  3. Document the process for future reference
  4. Consider establishing a regular sync schedule if both repositories continue to evolve independently

By following these methods, you can successfully combine your experimental project with the larger project while preserving the valuable history that shows how your work has evolved over time.