Git Maintenance and Visualization: Keeping Your Repository Healthy
Git repositories require regular maintenance to ensure optimal performance and prevent accumulation of unnecessary data. This article covers essential tools and techniques for maintaining repository health, including garbage collection, visualization with tig, and advanced recovery operations.
Git Garbage Collection: Repository Housekeeping
Git’s garbage collection system automatically cleans up unreachable objects and optimizes repository storage. Understanding when and how to use these tools is essential for maintaining repository performance.
Basic Garbage Collection
# Basic cleanup (run automatically by Git)
git gc
# More aggressive cleanup (recommended for periodic maintenance)
git gc --aggressive
Basic git gc performs lightweight cleanup:
- Packs loose objects into packfiles
- Removes unreachable objects older than 2 weeks
- Optimizes packfile deltas and offsets
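The packing step can be observed directly. The following is a minimal sketch that assumes git is installed and works in a throwaway repository created with mktemp; the demo identity and file name are placeholders, not part of any real project.

```shell
#!/bin/sh
# Sketch: watch loose objects move into a packfile when `git gc` runs.
set -eu

repo=$(mktemp -d)          # throwaway repository (assumption: safe to discard)
git init -q "$repo"
cd "$repo"

# Create a few commits; each one adds a blob, a tree, and a commit object.
for i in 1 2 3; do
  echo "revision $i" > file.txt
  git add file.txt
  git -c user.name=demo -c user.email=demo@example.com commit -q -m "commit $i"
done

# In `git count-objects -v`, "count" is loose objects and "in-pack" is packed ones.
loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')
git gc -q
loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
packed_after=$(git count-objects -v | awk '/^in-pack:/ {print $2}')

echo "loose before: $loose_before, loose after: $loose_after, packed: $packed_after"
```

Because every object in this sketch is reachable, nothing is pruned; the objects are only moved from loose storage into a pack.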
Aggressive Garbage Collection
# Comprehensive repository optimization
git gc --aggressive --prune=now
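Aggressive collection includes:
- More thorough delta compression (--aggressive), at the cost of a much longer run time
- Immediate pruning of all unreachable objects (--prune=now), instead of waiting for the default two-week grace period
- Recomputing deltas for existing packfiles rather than reusing them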
Manual Pruning Operations
# Remove unreachable objects immediately
git prune
# List unreachable objects first, then prune them
git fsck --unreachable
git prune
When to Run Maintenance
- Weekly: Basic git gc for routine housekeeping
- Monthly: git gc --aggressive for deeper optimization
- After major operations: After branch cleanup or history rewriting
- When repository slows down: Periodic aggressive cleanup
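A fixed schedule can be complemented by a check on how many loose objects have piled up. The sketch below uses a hypothetical helper, maybe_gc, with an arbitrary default threshold of 1000 loose objects; both the name and the number are assumptions for illustration (Git's own gc.auto default is roughly 6700).

```shell
#!/bin/sh
# Sketch: run `git gc` only when a repository has accumulated enough loose objects.
set -eu

# maybe_gc REPO [THRESHOLD]
# Hypothetical helper; THRESHOLD defaults to 1000, an arbitrary cut-off.
maybe_gc() {
  repo=$1
  threshold=${2:-1000}
  loose=$(git -C "$repo" count-objects -v | awk '/^count:/ {print $2}')
  if [ "$loose" -gt "$threshold" ]; then
    git -C "$repo" gc -q
    echo "gc ran ($loose loose objects)"
  else
    echo "skipped ($loose loose objects)"
  fi
}
```

It could be called from a cron job or a shell profile, for example as maybe_gc ~/src/project 2000, where the path and threshold are placeholders.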
Space Recovery
Large repositories benefit from aggressive cleanup:
# Check repository size
du -sh .git
# Aggressive cleanup
git gc --aggressive --prune=now
# Check size after cleanup
du -sh .git
Repository Visualization with tig
tig is an ncurses-based text-mode interface for Git that provides powerful repository visualization capabilities.
Installation
# Ubuntu/Debian
sudo apt install tig
# macOS
brew install tig
# Arch/Manjaro Linux
sudo pacman -S tig
Basic Usage
# Main view (log with diff)
tig
# Status view (like git status with interactive staging)
tig status
# Blame view (line-by-line history)
tig blame <file>
# Branch view
tig refs
# Stash view
tig stash
Log View Features
# Start tig with specific commit range
tig HEAD~10..HEAD
# Filter by author
tig --author="John Doe"
# Filter by file path
tig -- path/to/file
# Show specific branch only
tig branch-name
Status View Workflow
The status view is particularly powerful for interactive staging:
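- Run tig status
- Navigate to files with j/k
- Press u to stage or unstage the selected file
- Press Enter on a file to open its diff in the stage view; there, u stages the hunk under the cursor and 1 stages a single line (like git add -p)
- Press C to commit staged changes
- To commit all changes, stage everything first (for example by pressing u on a section header), then press C; with the default key bindings, c opens the stage view rather than committing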
Advanced Recovery Operations
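Git’s reflog records every local update to HEAD and to branch tips, which makes it a safety net for recovering from most mistakes, as long as the entries have not expired (90 days by default) and the objects they point to have not been pruned.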
Understanding reflog
# View recent operations
git reflog
# View with more details
git reflog --date=iso
# View branch-specific reflog
git reflog feature-branch
Recovering Lost Commits
# Find lost commits (unreachable but not yet garbage collected)
git fsck --unreachable
# Recover a specific commit by hash onto a new branch
git checkout -b recovery-branch <commit-hash>
# Restore a branch to its previous state (run this from a different branch)
git branch -f feature-branch feature-branch@{1}
Branch Recovery
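# Find the last commit of the deleted branch (its own reflog is removed with it)
git reflog --date=iso | grep "feature-branch"
# Restore the deleted branch at that commit
git checkout -b restored-feature <commit-hash>
# Or recreate the branch under its original name without checking it out
git update-ref refs/heads/feature-branch <commit-hash>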
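To make the recovery path concrete, here is a minimal end-to-end sketch in a throwaway repository. It assumes git is installed and that reflogs are enabled (the default for non-bare repositories); the branch names, commit messages, and demo identity are placeholders.

```shell
#!/bin/sh
# Sketch: delete a branch, then recover its tip from HEAD's reflog.
set -eu

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Small wrapper so the demo does not depend on global user settings.
commit() {
  git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "$1"
}

commit "base"
base_branch=$(git rev-parse --abbrev-ref HEAD)

git checkout -q -b feature-branch
commit "feature work"
lost=$(git rev-parse HEAD)        # remembered only to verify the result

git checkout -q "$base_branch"
git branch -q -D feature-branch   # the branch's own reflog is deleted with it

# HEAD's reflog still lists the commit that was made while on the branch.
found=$(git reflog --format='%H %gs' | awk '/commit: feature work/ {print $1; exit}')
git branch restored-feature "$found"

echo "recovered $found as restored-feature"
```

Searching the reflog by commit message is only one option; in practice the hash is often picked by eye from git reflog output.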
Repository Integrity Checking
Regular integrity checks help identify and prevent repository corruption.
Basic Integrity Check
# Check repository integrity
git fsck
# Check all object stores, including packfiles and alternates (the default in current Git)
git fsck --full
# Check with progress indication
git fsck --progress
Specific Checks
# Check for dangling objects (safe, usually not a problem)
git fsck --dangling
# Check for unreachable objects
git fsck --unreachable
# Verify object connectivity
git fsck --connectivity-only
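For scheduled checks, the exit status of git fsck is usually more convenient than its text output, since it is non-zero when errors are found. The sketch below wraps that in a hypothetical helper named check_repo; the name and the "healthy"/"damaged" labels are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: turn the exit status of `git fsck` into a one-word health report.
set -eu

# check_repo REPO
# Hypothetical helper: prints "healthy" if fsck exits 0, "damaged" otherwise.
check_repo() {
  if git -C "$1" fsck --full --no-progress >/dev/null 2>&1; then
    echo "healthy"
  else
    echo "damaged"
  fi
}

# Check the directory given as the first argument, or the current one.
check_repo "${1:-.}"
```

Note that a directory that is not a Git repository at all is also reported as "damaged" by this sketch, because git exits with an error in that case too.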
Large File Management
Repositories with large files require special maintenance considerations.
Large File Detection
# Find the 10 largest blobs in history (type, hash, size in bytes, path)
git rev-list --objects --all \
  | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
  | awk '$1 == "blob"' | sort -k3 -n | tail -n 10
LFS Cleanup
# Clean up LFS objects
git lfs prune
# Check LFS status
git lfs status
# Verify LFS objects
git lfs fsck
Performance Monitoring
Monitor repository health and performance over time.
Repository Statistics
# Count objects
git count-objects -v
# Repository size information
git count-objects -vH
# Commit counts per author across all refs
git shortlog -n -s --all
Performance Testing
# Test clone performance
time git clone <repository>
# Test log performance
time git log --oneline | wc -l
# Test status performance
time git status
Best Practices for Repository Maintenance
Regular Maintenance Schedule
# Weekly maintenance alias
git config --global alias.maintain-weekly '!git gc && git prune'
# Monthly deep maintenance
git config --global alias.maintain-monthly '!git gc --aggressive --prune=now && git repack -a -d'
Pre-Backup Checks
# Comprehensive pre-backup maintenance
git gc --aggressive --prune=now
git fsck --full
git count-objects -vH
Large Repository Considerations
# Optimize for large repositories
git config gc.auto 0 # Disable automatic gc
git config pack.threads 1 # Single thread for stability
git config pack.windowMemory 1g # Limit window memory
Essential Git Configuration for Maintenance
Enhanced configuration for optimal maintenance workflows:
[gc]
# Disable auto-gc to prevent performance issues during work
auto = 0
# Aggressive settings for manual runs (used by git gc --aggressive)
aggressiveWindow = 250
aggressiveDepth = 50
[pack]
# Optimize for reliability over speed
threads = 1
windowMemory = 1g
# Delta compression settings
depth = 50
window = 10
[fetch]
# Automatically prune deleted refs
prune = true
[repack]
# Optimize packfile creation
usedeltabaseoffset = true
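Before committing to these settings globally, it can help to apply a few of them to a single repository and read them back. This is a minimal sketch in a throwaway repository; the selection of keys is just an example taken from the configuration above.

```shell
#!/bin/sh
# Sketch: apply maintenance settings to one repository and read them back.
set -eu

repo=$(mktemp -d)
git init -q "$repo"

# Without --global, `git config` writes to this repository's .git/config only.
git -C "$repo" config gc.auto 0
git -C "$repo" config pack.threads 1
git -C "$repo" config fetch.prune true

# List what Git actually stored for the gc, pack, and fetch sections.
git -C "$repo" config --local --get-regexp '^(gc|pack|fetch)\.'
```

Git stores keys it does not recognize without complaint, so reading a value back confirms only that it was written, not that Git will act on it.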
Advanced Maintenance Workflows
Comprehensive Repository Audit
# Complete repository health check
git fsck --full --verbose
git count-objects -vH
# Warning: expiring the reflog discards the recovery history that git reflog relies on
git reflog expire --expire=now --all
git gc --aggressive --prune=now
git repack -a -d
Recovery from Corruption
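# In case of repository corruption: copy .git somewhere safe first,
# because the steps after fsck permanently delete unreachable objects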
git fsck --full
git prune
git reflog expire --expire=now --all
git gc --aggressive --prune=now
tig Advanced Features
Custom Key Bindings
Add to ~/.tigrc:
# Custom key bindings
bind main R !git gc --aggressive
bind status G !git gc --aggressive
Color Configuration
Enhanced color scheme in ~/.tigrc:
# Enhanced color scheme
color "default" yellow red reverse
color "cursor" yellow magenta reverse
color "selected" white blue bold
color "line-number" green default
Conclusion
Regular Git maintenance ensures optimal repository performance and prevents data bloat. Garbage collection tools like git gc and git prune keep repositories lean, while visualization tools like tig provide essential insights into repository state. Understanding recovery operations through git reflog and git fsck ensures resilience against accidental data loss.
Establishing regular maintenance routines and monitoring repository health prevents performance degradation and maintains a healthy development environment. These tools and techniques are essential for repositories of any size, but become critical as repositories grow in complexity and history depth.