Git Maintenance and Visualization: Keeping Your Repository Healthy

Author

Andres Monge

Published

December 21, 2024

Git repositories require regular maintenance to ensure optimal performance and prevent accumulation of unnecessary data. This article covers essential tools and techniques for maintaining repository health, including garbage collection, visualization with tig, and advanced recovery operations.

Git Garbage Collection: Repository Housekeeping

Git’s garbage collection system automatically cleans up unreachable objects and optimizes repository storage. Understanding when and how to use these tools is essential for maintaining repository performance.

Basic Garbage Collection

# Basic cleanup (Git also runs "git gc --auto" automatically after some commands)
git gc

# Deeper, much slower optimization (worth running occasionally, not routinely)
git gc --aggressive

Basic git gc performs lightweight cleanup:

  • Packs loose objects into packfiles
  • Removes unreachable loose objects older than two weeks (the default for gc.pruneExpire)
  • Consolidates existing packfiles, packs references and expires old reflog entries
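To see the packing step in action, the sketch below builds a throwaway repository, counts its loose objects with git count-objects -v (the "count:" line), runs git gc, and counts again. It assumes bash and a reasonably recent Git; the repository, file name and identity are made up for the demo.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repository so the demo never touches real work
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# Each commit writes new loose objects (a blob, a tree and a commit)
for i in 1 2 3; do
  echo "version $i" > notes.txt
  git add notes.txt
  git commit -q -m "Commit $i"
done

# "count:" in `git count-objects -v` is the number of loose objects
loose_count() { git count-objects -v | awk '/^count:/ {print $2}'; }

before=$(loose_count)
git gc --quiet
after=$(loose_count)
echo "loose objects before gc: $before, after gc: $after"
```

Since every object here is reachable, git gc moves all of them into a packfile and the loose count drops to zero.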

Aggressive Garbage Collection

# Comprehensive repository optimization
git gc --aggressive --prune=now

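Aggressive collection includes:

  • More thorough delta compression: --aggressive recomputes all deltas from scratch with a larger search window (gc.aggressiveWindow, default 250) and depth (gc.aggressiveDepth, default 50)
  • Immediate pruning of all unreachable loose objects, which comes from --prune=now rather than --aggressive; avoid it while other Git processes may be writing to the repository, since it raises the risk of corruption
  • A much longer run time than plain git gc, which makes --aggressive better suited to occasional than routine use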
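The difference between the default two-week grace period and --prune=now can be checked directly. This sketch writes an object that nothing references (via git hash-object -w), runs a plain git gc, and tests whether the object still exists with git cat-file -e; it then repeats the check after git gc --prune=now. --aggressive is left out to keep the demo fast, since it only changes delta compression, not pruning. The repository and contents are made up.

```shell
#!/usr/bin/env bash
set -euo pipefail

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "tracked" > file.txt
git add file.txt && git commit -q -m "Initial commit"

# Write a blob that no ref, index entry or reflog points to
orphan=$(echo "orphaned content" | git hash-object -w --stdin)

# Succeeds if the object is still in the object database
exists() { git cat-file -e "$1" 2>/dev/null; }

git gc --quiet                 # default: keeps unreachable objects < 2 weeks old
exists "$orphan" && after_gc=kept || after_gc=removed

git gc --quiet --prune=now     # no grace period at all
exists "$orphan" && after_prune_now=kept || after_prune_now=removed

echo "after git gc: $after_gc; after git gc --prune=now: $after_prune_now"
```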

Manual Pruning Operations

# Remove all unreachable loose objects immediately (git gc runs git prune for you)
git prune

# List unreachable objects, preview what would be deleted, then prune
git fsck --unreachable
git prune --dry-run --verbose
git prune
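A dry run is the safest way to see what git prune will delete. The sketch below creates one unreachable blob in a temporary repository, shows it in the output of git fsck --unreachable and of git prune --dry-run --verbose (which deletes nothing), and only then prunes for real. The content and identity are made up.

```shell
#!/usr/bin/env bash
set -euo pipefail

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "kept" > file.txt
git add file.txt && git commit -q -m "Initial commit"

# An object that no ref, index entry or reflog points to
orphan=$(echo "scratch data" | git hash-object -w --stdin)

# 1. List unreachable objects (lines look like "unreachable blob <hash>")
unreachable=$(git fsck --unreachable 2>/dev/null)
echo "$unreachable"

# 2. Preview: --dry-run reports "<hash> <type>" but deletes nothing
preview=$(git prune --dry-run --verbose)
echo "would prune: $preview"

# 3. Prune for real, then confirm the object is gone
git prune
if git cat-file -e "$orphan" 2>/dev/null; then state=present; else state=gone; fi
echo "orphan is $state"
```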

When to Run Maintenance

  1. Weekly: basic git gc for routine housekeeping
  2. Occasionally (for example monthly): git gc --aggressive for deeper optimization
  3. After major operations: branch cleanup or history rewriting
  4. When the repository slows down: an aggressive cleanup
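Instead of a fixed calendar, maintenance can also be triggered by the number of loose objects, which is roughly what Git's own git gc --auto does (its default threshold, gc.auto, is 6700). The sketch below is a simplified, hypothetical helper called maybe_gc: it counts loose objects exactly with git count-objects -v and runs git gc only above a threshold you pass in, whereas Git's real heuristic estimates the count and also looks at the number of packs.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: run git gc only when loose objects exceed a threshold.
# Simplified: Git's own "git gc --auto" estimates the count and also checks packs.
maybe_gc() {
  local threshold=$1
  local loose
  loose=$(git count-objects -v | awk '/^count:/ {print $2}')
  if [ "$loose" -gt "$threshold" ]; then
    echo "$loose loose objects > $threshold: running git gc"
    git gc --quiet
  else
    echo "$loose loose objects <= $threshold: skipping"
  fi
}

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "hello" > a.txt
git add a.txt
git commit -q -m "Add a.txt"     # 3 loose objects: blob, tree, commit

first=$(maybe_gc 100)            # below the threshold: nothing happens
second=$(maybe_gc 1)             # above the threshold: packs everything
echo "$first"
echo "$second"
remaining=$(git count-objects -v | awk '/^count:/ {print $2}')
```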

Space Recovery

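Compare the size of .git before and after an aggressive cleanup. Objects still referenced by reflog entries are kept, so the savings can be smaller than expected: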

# Check repository size
du -sh .git

# Aggressive cleanup
git gc --aggressive --prune=now

# Check size after cleanup
du -sh .git

Repository Visualization with tig

tig is an ncurses-based text-mode interface for Git that provides powerful repository visualization capabilities.

Installation

# Ubuntu/Debian
sudo apt install tig

# macOS
brew install tig

# Arch Linux / Manjaro
sudo pacman -S tig

# Manjaro (alternative)
pamac install tig

Basic Usage

# Main view (commit log; press Enter on a commit to open its diff)
tig

# Status view (like git status with interactive staging)
tig status

# Blame view (line-by-line history)
tig blame <file>

# References view (branches, tags and remote branches)
tig refs

# Stash view
tig stash

Log View Features

# Start tig with specific commit range
tig HEAD~10..HEAD

# Filter by author
tig --author="John Doe"

# Filter by file path
tig -- path/to/file

# Show specific branch only
tig branch-name
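tig hands revision ranges, options such as --author, and paths after -- on to Git's log machinery, so the same filters can be checked non-interactively with git log. The sketch below builds a repository with two made-up authors and applies the range, author and path filters from the examples above; commit_as is a hypothetical helper for the demo.

```shell
#!/usr/bin/env bash
set -euo pipefail

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q

# Hypothetical helper: commit_as NAME FILE MESSAGE (made-up identities)
commit_as() {
  echo "$3" >> "$2"
  git add "$2"
  git -c user.name="$1" -c user.email="$1@example.com" commit -q -m "$3"
}

commit_as "John Doe" docs.txt "Write docs"
commit_as "Jane Roe" app.txt "Add app"
commit_as "John Doe" app.txt "Fix app"

# Same filter as: tig --author="John Doe"
by_john=$(git log --author="John Doe" --format='%s')
# Same filter as: tig -- app.txt
touching_app=$(git log --format='%s' -- app.txt)
# Same range as: tig HEAD~1..HEAD
last_one=$(git log --format='%s' HEAD~1..HEAD)

printf 'by John:\n%s\n\ntouching app.txt:\n%s\n\nlast commit:\n%s\n' \
  "$by_john" "$touching_app" "$last_one"
```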

Status View Workflow

The status view is particularly powerful for interactive staging:

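  1. Run tig status
  2. Navigate to files with j/k (or the arrow keys)
  3. Press u to stage or unstage the selected file
  4. Press Enter on a file to open its diff in the stage view, where u stages the hunk under the cursor and 1 stages a single line (similar to git add -p)
  5. Press C to run git commit on the staged changes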
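For scripting, or to understand what the status view does, the file-level steps map roughly onto plain Git commands: staging a file is like git add, unstaging it is like git restore --staged (Git 2.23 or later), and committing runs git commit. The sketch below walks through that non-interactive equivalent in a throwaway repository with made-up file names; hunk-level staging is left to git add -p, which is interactive.

```shell
#!/usr/bin/env bash
set -euo pipefail

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "one" > a.txt
echo "one" > b.txt
git add a.txt b.txt
git commit -q -m "Initial commit"

# Modify two files in the working tree
echo "two" >> a.txt
echo "two" >> b.txt

git add a.txt b.txt              # roughly: u on each unstaged file
git restore --staged b.txt       # roughly: u again on the staged b.txt
staged=$(git diff --cached --name-only)
echo "staged for commit: $staged"

git commit -q -m "Update a.txt"  # roughly: C in the status view
committed=$(git diff-tree --no-commit-id --name-only -r HEAD)
still_modified=$(git diff --name-only)
echo "committed: $committed; still modified: $still_modified"
```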

Advanced Recovery Operations

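Git’s reflog records every update to HEAD and to each local branch tip, which makes it a safety net for recovering committed work after resets, rebases, amends or deleted branches. It is local to your clone, does not cover uncommitted changes, and its entries expire (by default after 90 days, or 30 days for entries that are no longer reachable).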

Understanding reflog

# View recent operations
git reflog

# View with more details
git reflog --date=iso

# View branch-specific reflog
git reflog feature-branch
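Every reflog entry can be used as a revision with the ref@{n} syntax, where HEAD@{1} means "where HEAD was one update ago". The sketch below makes two commits, throws the second away with git reset --hard, and uses HEAD@{1} to get it back. The repository and messages are made up.

```shell
#!/usr/bin/env bash
set -euo pipefail

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "first" > file.txt
git add file.txt && git commit -q -m "First"
echo "second" >> file.txt
git commit -q -am "Second"
lost=$(git rev-parse HEAD)

# The "mistake": move HEAD and the branch back one commit
git reset -q --hard HEAD~1

# The reflog still records where HEAD was before the reset
git reflog -n 3 --format='%gd %gs'
previous=$(git rev-parse 'HEAD@{1}')

# Undo the mistake by resetting to the recorded position
git reset -q --hard 'HEAD@{1}'
restored=$(git rev-parse HEAD)
```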

Recovering Lost Commits

# Find lost commits (unreachable but not yet garbage collected); --no-reflogs
# also reports commits that are only remembered by a reflog entry
git fsck --unreachable --no-reflogs

# Recover a specific commit by hash
git checkout <commit-hash>
git checkout -b recovery-branch

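# Move a branch back to where it pointed before its last update
# (git branch -f refuses to move the currently checked-out branch)
git branch -f feature-branch feature-branch@{1}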
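Branch reflogs work like HEAD's: feature-branch@{1} is where feature-branch pointed before its most recent update. The sketch below simulates an accidental git branch -f that moves a branch back one commit, finds the orphaned commit with git fsck --unreachable --no-reflogs (without --no-reflogs, fsck treats commits still listed in a reflog as reachable), and moves the branch back using its own reflog. All names are made up for the demo.

```shell
#!/usr/bin/env bash
set -euo pipefail

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "base" > file.txt
git add file.txt && git commit -q -m "Base"
main=$(git symbolic-ref --short HEAD)    # "main" or "master", depending on config

git checkout -q -b feature-branch
echo "feature work" >> file.txt
git commit -q -am "Feature work"
tip=$(git rev-parse HEAD)
git checkout -q "$main"

# Accident: move feature-branch back to the base commit
git branch -f feature-branch "$main"

# The commit is now referenced only by reflog entries
orphans=$(git fsck --unreachable --no-reflogs 2>/dev/null)
echo "$orphans"

# Restore from the branch's own reflog (feature-branch is not checked out)
git branch -f feature-branch 'feature-branch@{1}'
restored=$(git rev-parse feature-branch)
```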

Branch Recovery

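# Find the deleted branch's last commit in HEAD's reflog: the entry listed
# just below "checkout: moving from feature-branch to ..." holds it
# (deleting a branch also deletes that branch's own reflog)
git reflog --date=iso | grep -A1 "moving from feature-branch"

# Restore the deleted branch at that commit
git checkout -b restored-feature <commit-hash>

# Or point a branch at where HEAD was two reflog entries ago
git update-ref refs/heads/feature-branch HEAD@{2}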
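Because deleting a branch deletes its own reflog too, HEAD's reflog is the usual place to look: the entry recorded just before "checkout: moving from feature-branch to ..." shows the commit HEAD was on while the branch was checked out. The sketch below automates that lookup with git log -g and awk. find_deleted_tip is a hypothetical helper, and it only recovers the tip as it was when you last switched away from the branch; commits added to the branch without checking it out are not covered.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the commit HEAD was on just before the newest
# "checkout: moving from <branch> to ..." entry in HEAD's reflog.
find_deleted_tip() {
  git log -g --format='%H %gs' HEAD | awk -v from="checkout: moving from $1 to " '
    found && !printed { print $1; printed = 1 }   # next (older) entry = branch tip
    !found && index($0, from) { found = 1 }       # newest matching checkout entry
  '
}

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "base" > file.txt
git add file.txt && git commit -q -m "Base"
main=$(git symbolic-ref --short HEAD)

git checkout -q -b feature-branch
echo "feature" >> file.txt
git commit -q -am "Feature commit"
tip=$(git rev-parse HEAD)
git checkout -q "$main"
git branch -q -D feature-branch          # the mistake: its own reflog is gone too

found=$(find_deleted_tip feature-branch)
echo "last known tip of feature-branch: $found"
git branch restored-feature "$found"
```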

Repository Integrity Checking

Regular integrity checks help identify and prevent repository corruption.

Basic Integrity Check

# Check repository integrity
git fsck

# More verbose checking
git fsck --full

# Check with progress indication
git fsck --progress

Specific Checks

# Report dangling objects (on by default; usually harmless leftovers)
git fsck --dangling

# Check for unreachable objects
git fsck --unreachable

# Faster check: verify connectivity only, without fully validating object contents
git fsck --connectivity-only
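A dangling object is not an error, which is easy to confirm in a scratch repository. The sketch below creates a blob that nothing points to, then shows that git fsck lists it as "dangling blob <hash>" yet still exits successfully, and that the connectivity-only check passes too. The content and identity are made up.

```shell
#!/usr/bin/env bash
set -euo pipefail

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "kept" > file.txt
git add file.txt && git commit -q -m "Initial commit"

# A blob nothing points to: fsck calls it "dangling"
dangling=$(echo "leftover" | git hash-object -w --stdin)

# Dangling objects are reported but do not make fsck fail
if report=$(git fsck 2>&1); then fsck_status=ok; else fsck_status=failed; fi
echo "$report"

# The faster connectivity-only check should pass as well
git fsck --connectivity-only >/dev/null 2>&1 && conn_status=ok || conn_status=failed
echo "fsck: $fsck_status; connectivity-only: $conn_status"
```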

Large File Management

Repositories with large files require special maintenance considerations.

Large File Detection

# Find the 10 largest blobs in history (object name, size in bytes, path)
git rev-list --objects --all \
  | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
  | sed -n 's/^blob //p' | sort -k2,2n | tail -n 10
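A largest-blob search can be packaged as a reusable function. The sketch below is a hypothetical largest_blobs helper built on git rev-list --objects --all and git cat-file --batch-check; it also prints %(objectsize:disk), the compressed size on disk, next to the uncompressed size. The demo shows that a file deleted in a later commit still shows up, because it remains in history.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: list the N largest blobs reachable from any ref, largest last.
# Columns: object name, size in bytes, size on disk (compressed), path.
largest_blobs() {
  local n=${1:-10}
  git rev-list --objects --all \
    | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(objectsize:disk) %(rest)' \
    | sed -n 's/^blob //p' \
    | sort -k2,2n \
    | tail -n "$n"
}

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "small" > small.txt
head -c 200000 /dev/urandom > big.bin     # 200000 bytes of incompressible data
git add small.txt big.bin
git commit -q -m "Add files"
git rm -q big.bin
git commit -q -m "Remove big.bin"         # still in history, so still listed

top=$(largest_blobs 1)
echo "$top"
```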

LFS Cleanup

# Delete local copies of old LFS files that recent commits no longer reference
git lfs prune

# Check LFS status
git lfs status

# Verify LFS objects
git lfs fsck

Performance Monitoring

Monitor repository health and performance over time.

Repository Statistics

# Count objects
git count-objects -v

# Repository size information
git count-objects -vH

# Commit counts per author across all refs (whole history)
git shortlog -n -s --all
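The output of git count-objects -v is a list of "name: value" lines, so it is easy to turn into a one-line health summary for scripts or cron jobs. The sketch below is a hypothetical repo_summary helper; Git reports size and size-pack in KiB. The demo prints the summary before and after git gc in a throwaway repository.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: one-line summary of `git count-objects -v`.
# "size" and "size-pack" are reported by Git in KiB.
repo_summary() {
  git count-objects -v | awk -F': ' '
    { v[$1] = $2 }
    END {
      printf "loose=%d (%d KiB) packed=%d in %d pack(s) (%d KiB) garbage=%d\n",
             v["count"], v["size"], v["in-pack"], v["packs"], v["size-pack"], v["garbage"]
    }'
}

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "data" > file.txt
git add file.txt && git commit -q -m "Initial commit"

before=$(repo_summary)
git gc --quiet
after=$(repo_summary)
echo "before gc: $before"
echo "after gc:  $after"
```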

Performance Testing

# Test clone performance
time git clone <repository>

# Test log performance
time git log --oneline | wc -l

# Test status performance
time git status
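A single timing is noisy, so it helps to repeat a command and look at the spread. The sketch below is a hypothetical bench helper that uses bash's time keyword with TIMEFORMAT='%3R' to capture the wall-clock seconds of each run. It assumes bash, since TIMEFORMAT is a bash feature, and times git status in a throwaway repository.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: run a command N times, printing each run's
# wall-clock time in seconds (bash-specific: relies on TIMEFORMAT).
bench() {
  local runs=$1; shift
  local TIMEFORMAT='%3R'
  local i
  for ((i = 1; i <= runs; i++)); do
    # The command's own output is discarded; only the timing reaches stdout
    { time "$@" >/dev/null 2>&1; } 2>&1
  done
}

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "x" > file.txt
git add file.txt && git commit -q -m "Initial commit"

timings=$(bench 3 git status)
echo "$timings"
```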

Best Practices for Repository Maintenance

Regular Maintenance Schedule

# Weekly maintenance alias (git gc already prunes old unreachable objects)
git config --global alias.maintain-weekly gc

# Monthly deep maintenance (git gc already repacks, so no separate git repack)
git config --global alias.maintain-monthly 'gc --aggressive --prune=now'
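The aliases above act on one repository at a time. If you keep many clones under one directory, a small loop can apply the weekly step to each of them. The sketch below assumes one repository per subdirectory (the directory names are made up), uses a hypothetical gc_all helper, and runs git gc inside each repository via git -C, skipping directories that are not repositories.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: run `git gc` in every repository directly under $1
gc_all() {
  local root=$1 dir
  for dir in "$root"/*/; do
    [ -d "$dir/.git" ] || continue        # skip non-repositories
    echo "gc: $dir"
    git -C "$dir" gc --quiet
  done
}

# Demo layout: two throwaway repositories and one plain directory
root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT
for name in alpha beta; do
  git init -q "$root/$name"
  echo "$name" > "$root/$name/README"
  git -C "$root/$name" add README
  git -C "$root/$name" -c user.name="Demo User" -c user.email="demo@example.com" \
    commit -q -m "Initial commit"
done
mkdir "$root/not-a-repo"

gc_all "$root"
loose_alpha=$(git -C "$root/alpha" count-objects -v | awk '/^count:/ {print $2}')
loose_beta=$(git -C "$root/beta" count-objects -v | awk '/^count:/ {print $2}')
```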

Pre-Backup Checks

# Comprehensive pre-backup maintenance
git gc --aggressive --prune=now
git fsck --full
git count-objects -vH

Large Repository Considerations

# Optimize for large repositories
git config gc.auto 0  # Disable automatic gc (you must then run git gc yourself)
git config pack.threads 1  # One packing thread: lower memory use, slower repacks
git config pack.windowMemory 1g  # Cap delta-search memory per packing thread

Essential Git Configuration for Maintenance

A sample Git configuration (for ~/.gitconfig or .git/config) tuned for manual maintenance:

[gc]
    # Disable auto-gc so it never runs in the middle of work (run git gc manually)
    auto = 0

    # Delta search window and depth used by git gc --aggressive (Git's defaults)
    aggressiveWindow = 250
    aggressiveDepth = 50

[pack]
    # Bound memory use at the cost of speed
    threads = 1
    windowMemory = 1g

    # Delta chain depth and search window for normal repacks (Git's defaults)
    depth = 50
    window = 10

[fetch]
    # Remove remote-tracking branches that were deleted on the remote
    prune = true

[repack]
    # Use offset-based delta references in packfiles (already the default)
    usedeltabaseoffset = true

Advanced Maintenance Workflows

Comprehensive Repository Audit

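# Complete repository health check (read-only)
git fsck --full --verbose
git count-objects -vH

# WARNING: the next two commands discard all reflog history and permanently
# delete unreachable objects, so lost commits can no longer be recovered
git reflog expire --expire=now --all
git gc --aggressive --prune=now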
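Because the reflog expiry and --prune=now steps in that audit change the repository irreversibly, it can be useful to keep a read-only variant for routine checks. The sketch below is a hypothetical audit_readonly helper that only runs git fsck --full and git count-objects -vH and returns a non-zero status when fsck finds a problem, so it is safe to run on any repository or from cron. The demo confirms that the object counts are identical before and after the audit.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: read-only health check that never modifies the repository
audit_readonly() {
  if ! git fsck --full >/dev/null 2>&1; then
    echo "fsck: FAILED (run 'git fsck --full' for details)"
    return 1
  fi
  echo "fsck: ok"
  git count-objects -vH
}

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "data" > file.txt
git add file.txt && git commit -q -m "Initial commit"

objects_before=$(git count-objects -v)
report=$(audit_readonly)
objects_after=$(git count-objects -v)
echo "$report"
```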

Recovery from Corruption

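# In case of repository corruption
# Back up the entire repository directory first: the cleanup below is irreversible
# Identify missing or corrupt objects
git fsck --full
# Only after the damaged objects are restored (e.g. from another clone):
# drop reflog entries that point at them and remove unreachable objects
git reflog expire --expire=now --all
git gc --aggressive --prune=now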
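To see what git fsck reports when an object really is missing, the sketch below damages a throwaway repository on purpose by deleting the loose object file that holds one file's content (never do this to a real repository). fsck then exits with a non-zero status and names the missing object, which is the information you need before restoring it from another clone. The file name and content are made up.

```shell
#!/usr/bin/env bash
set -euo pipefail

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "precious" > file.txt
git add file.txt && git commit -q -m "Initial commit"

# Deliberately delete the loose object holding file.txt's content
# (loose objects live in objects/<first 2 hex chars>/<remaining chars>)
blob=$(git rev-parse HEAD:file.txt)
rm -f "$(git rev-parse --git-path objects)/${blob:0:2}/${blob:2}"

# fsck now fails and names the missing blob
if report=$(git fsck --full 2>&1); then status=ok; else status=failed; fi
echo "$report"
```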

tig Advanced Features

Custom Key Bindings

Add to ~/.tigrc:

# Run an aggressive gc from the main and status views
# (note: R is tig's default key for reloading the current view)
bind main R !git gc --aggressive
bind status G !git gc --aggressive

Color Configuration

Enhanced color scheme in ~/.tigrc:

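# Enhanced color scheme (unquoted names refer to tig's color areas;
# quoted strings would instead match lines starting with that text)
color default     yellow   red     reverse
color cursor      yellow   magenta reverse
color selected    white    blue    bold
color line-number green    default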

Conclusion

Regular Git maintenance ensures optimal repository performance and prevents data bloat. Garbage collection tools like git gc and git prune keep repositories lean, while visualization tools like tig provide essential insights into repository state. Understanding recovery operations through git reflog and git fsck ensures resilience against accidental data loss.

Establishing regular maintenance routines and monitoring repository health prevents performance degradation and maintains a healthy development environment. These tools and techniques are essential for repositories of any size, but become critical as repositories grow in complexity and history depth.