git du: historic object size
published on Tuesday, May 30, 2017
Ever wanted a git du that accumulates the size of folders or files across the entire history? Or just wanted to know the maximum or average size of a file or folder?
Download the git-du-helper.pl script into your current working directory and then execute the following command:
You get a text file with the columns SUM_SIZE, MAX_SIZE, NUM_REVS and PATH, one row per path. Dividing SUM_SIZE by NUM_REVS gives the average size of that path.
What's going on here?
- git rev-list prints all revisions in the history of a branch.
- xargs -l1 -- git ls-tree runs git ls-tree once per revision and lists the files and their sizes for that revision.
- ./git-du-helper.pl analyzes the resulting text data (no interaction with git).
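The three steps above can be sketched roughly as follows. This is not the git-du-helper.pl script itself: `git_du_aggregate` is a hypothetical awk stand-in that only handles files (no folder totals), writes no header row, and assumes input in the format printed by `git ls-tree -r -l`.

```shell
# Hypothetical stand-in for the aggregation step (files only, no folders).
# Expects `git ls-tree -r -l` output on stdin:
#   <mode> <type> <object> <size>\t<path>
git_du_aggregate() {
  awk -F'\t' '
    {
      split($1, meta, " ")           # meta: mode, type, object id, size
      if (meta[2] != "blob") next    # trees and submodules have size "-"
      size = meta[4] + 0
      path = $2
      sum[path] += size              # SUM_SIZE: total over all revisions
      if (size > max[path]) max[path] = size   # MAX_SIZE
      revs[path]++                   # NUM_REVS: revisions containing the path
    }
    END {
      for (p in sum) printf "%.0f %.0f %d %s\n", sum[p], max[p], revs[p], p
    }
  ' | LC_ALL=C sort -k4
}

# Possible usage inside a repository (output file name is an assumption):
#   git rev-list HEAD | xargs -L1 git ls-tree -r -l | git_du_aggregate > git-du.txt
```

Because the aggregation only reads text from stdin, it can be tried on saved `git ls-tree` output without touching the repository again.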
You can now run further queries on this file to get pretty-printed output, e.g. sort by MAX_SIZE, show human-readable sizes (KiB/MiB/…) and align the columns:
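One way to do such a query with only sort and awk is sketched below. The function name `git_du_pretty` is made up for illustration, and the sketch assumes the input has four whitespace-separated columns (SUM_SIZE MAX_SIZE NUM_REVS PATH) and no header row.

```shell
# Hypothetical pretty-printer: sort by MAX_SIZE (descending), convert byte
# counts to binary units, and print right-aligned columns.
# Reads the files given as arguments, or stdin if there are none.
git_du_pretty() {
  sort -k2,2nr "$@" | awk '
    function human(n,    i) {
      n += 0
      i = 1
      while (n >= 1024 && i < 5) { n /= 1024; i++ }
      if (i == 1) return n "B"
      return sprintf("%.1f%s", n, unit[i])
    }
    BEGIN {
      split("B KiB MiB GiB TiB", unit, " ")
      printf "%10s %10s %8s  %s\n", "SUM_SIZE", "MAX_SIZE", "NUM_REVS", "PATH"
    }
    {
      path = $0
      sub(/^[^ ]+ +[^ ]+ +[^ ]+ +/, "", path)   # keep spaces inside the path
      printf "%10s %10s %8s  %s\n", human($1), human($2), $3, path
    }
  '
}

# Possible usage (file name is an assumption):
#   git_du_pretty git-du.txt | head -n 20
```

On systems with GNU coreutils and util-linux, `numfmt --to=iec-i` and `column -t` are alternatives for the unit conversion and the column alignment.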
Note that these numbers don't reflect the actual storage size on disk, because git compresses objects and can store similar objects as deltas in packfiles (see also this excellent answer on SO and maybe this post). If you want to find large files on disk, take a look at Steve Lorek's article How to Shrink a Git Repository, which shows how to get the actual on-disk sizes using git verify-pack.
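As a rough illustration of the on-disk view, the sketch below filters the output of `git verify-pack -v` for blobs and ranks them by their size inside the packfile. It is not the command from the article; `largest_packed_blobs` is a hypothetical helper, and it assumes the usual column order of `git verify-pack -v` (object id, type, size, size in packfile, offset).

```shell
# Hypothetical helper: list the N largest blobs by packed (on-disk) size.
# Expects `git verify-pack -v` output on stdin; prints "<packed size> <object id>".
largest_packed_blobs() {
  awk '$2 == "blob" { print $4, $1 }' | sort -k1,1nr | head -n "${1:-10}"
}

# Possible usage inside a repository that has packfiles:
#   git verify-pack -v .git/objects/pack/pack-*.idx | largest_packed_blobs 5
# Object ids can be mapped back to paths with: git rev-list --objects --all
```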
For completeness, my git-du-helper.pl looks as follows: