I’m working on a prototyping project that has been going on for a few years. This year, its source got migrated to git. For the last month or two, all of the interesting action has been in one subdirectory of the repository. We wanted to split the work off into another repository that didn’t have all the old cruft. It wasn’t too hard.

To do it, I took advantage of git’s internal structures. Conceptually, I did the opposite of a subtree merge… so, it was a subtree extract. Our subdirectory has always been in the same place, so the combination of git log HEAD -- [subtree] and git ls-tree [commit] [subtree] got me a list of commits and the tree IDs for the subtree I was extracting. From there, I used commit-tree to build up the new history for the tree.

That description makes it sound like I should have had about a 5-line shell script. But there are obviously some details left out. If you want everything, check out extract_subtree.rb.

If you decide to use this script, please be careful with it. It shouldn’t destroy anything, but it might mess up your repo if something isn’t set up right. Also, this won’t deal with the .gitmodules file, so if you use submodules, you’ll need to manually build your .gitmodules file again.

If you want to know more about git’s internals, check out Scott Chacon’s ProGit book.

>>> git tfs clone http://team:8080/ $/sandbox sample_for_git_tfs
Initialized empty Git repository in d:/Projects/sample_for_git_tfs/.git/
C4949 = 84cfc504fd85a826ede8d852e3a5e75fc8952bd2
C69840 = fb10351a88a948d07b3eb6ef176b7e3bb1999ecb
C69841 = 764abfc9995945cafce2e23a1d2735aec8d288bd
C69842 = bb87001af0e93de25e13dab4602db2f84f4b1932
C69843 = 664705c06790fbd1da066833e9e81da24c88ea61
C86163 = 99d300275e87adedae65cd3c9a57c9f55c76eb94
C86164 = 2d10de0deb9fe96e22b14eb2e65e2f95ef55c1d6
C86165 = da9da4b79b2d9e16aa17f6a12317bfe3e29687de
C86167 = b3d7f12b3cf39b546cb19ddc37518be1080b86d8
C86170 = 5df88af2597f219a7b4ccf979ff90ff1cc7ed05b
C86209 = 46d814b83a5a75508e1584a8850d8bcb3c9d5a7d
>>> cd sample_for_git_tfs
>>> ls -R
.:
ANewDirectory

./ANewDirectory:
SomeDir

./ANewDirectory/SomeDir:
readme.txt sayhi.bat

>>> echo another line >> ANewDirectory/SomeDir/readme.txt
>>> git commit -a -m "Contributed using git-tfs."
[master a72b561] Contributed using git-tfs.
1 files changed, 1 insertions(+), 0 deletions(-)
>>> git tfs shelve GIT_TFS_SAMPLE
edit ANewDirectory/SomeDir/readme.txt
>>> mkdir ../sample_workspace
>>> cd ../sample_workspace
>>> tf workspace -new -noprompt sample_workspace
>>> tf unshelve -noprompt GIT_TFS_SAMPLE
sandbox\ANewDirectory\SomeDir:
Unshelving edit: readme.txt
>>> tf diff -noprompt
edit: d:\Projects\sample_workspace\sandbox\ANewDirectory\SomeDir\readme.txt
File: sandbox\ANewDirectory\SomeDir\readme.txt
===================================================================
--- sandbox\ANewDirectory\SomeDir\readme.txt;C69843 (server) 1/28/2010 1:02 PM
+++ sandbox\ANewDirectory\SomeDir\readme.txt (local) 1/28/2010 1:02 PM
@@ -4,3 +4,4 @@
----
This line was added for the second checkin.
This line was added after the parent directory was nested inside a new top-level-directory.
+another line
===================================================================