Understanding and Working with Submodules in Git
Most modern software projects depend on the work of others. It would be a waste of time to reinvent the wheel in your own code when someone else has already written a wonderful solution. That’s why so many projects use third-party code in the form of libraries or modules.
Git, the world’s most popular version control system, offers an elegant, robust way to manage these dependencies. Its “submodule” concept allows us to include and manage third-party libraries while keeping them cleanly separated from our own code.
In this article, you’ll learn why submodules in Git are so useful, what they actually are, and how they work.
Keeping Code Separate
To make clear why Git’s submodules are indeed an invaluable structure, let’s look at a case without submodules. When you need to include third-party code (such as an open-source library) you can of course go the easy way: just download the code from GitHub and dump it somewhere into your project. While certainly quick, this approach is definitely dirty for a couple of reasons:
- By brute force copying third-party code into your project, you’re effectively mixing multiple projects into one. The line between your own project and that of someone else (the library) starts to get blurry.
- Whenever you need to update the library code (because its maintainer delivered a great new feature or fixed a nasty bug) you again have to download, copy, and paste. This quickly becomes a tedious process.
The general rule in software development to “keep separate things separate” exists for a reason. And it’s certainly true for managing third-party code in your own projects. Luckily, Git’s submodule concept was made for exactly these situations.
But of course, submodules aren’t the only available solution for this kind of problem. You could also use one of the various “package manager” systems that many modern languages and frameworks provide. And there’s nothing wrong with that!
However, you could argue that Git’s submodule architecture comes with a couple of advantages:
- Submodules provide a consistent, reliable interface — no matter what language or framework you’re using. Especially if you’re working with multiple technologies, each one might have its own package manager with its own set of rules and commands. Submodules, on the other hand, always work the same.
- Not every piece of code might be available over a package manager. Maybe you just want to share your own code between two projects — a situation where submodules might offer the simplest possible workflow.
What Git Submodules Really Are
Submodules in Git are really just standard Git repositories. No fancy innovation, just the same Git repositories that we all know so well by now. This is also part of the power of submodules: they’re so robust and straightforward because they are so “boring” (from a technological point of view) and field-tested.
The only thing that makes a Git repository a submodule is that it’s placed inside another, parent Git repository.
Other than that, a Git submodule remains a fully functional repository: you can perform all the actions that you already know from your “normal” Git work — from modifying files, all the way to committing, pulling and pushing. Everything’s possible in a submodule.
Adding a Submodule
Let’s take the classic example and say we’d like to add a third-party library to our project. Before we go get any code, it makes sense to create a separate folder where things like these can have a home:
$ mkdir lib
$ cd lib
$ git submodule add https://github.com/spencermountain/spacetime.git
When we run this command, Git starts cloning the repository into our project, as a submodule:
Cloning into 'carparts-website/lib/spacetime'...
remote: Enumerating objects: 7768, done.
remote: Counting objects: 100% (1066/1066), done.
remote: Compressing objects: 100% (445/445), done.
remote: Total 7768 (delta 615), reused 975 (delta 588), pack-reused 6702
Receiving objects: 100% (7768/7768), 4.02 MiB | 7.78 MiB/s, done.
Resolving deltas: 100% (5159/5159), done.
And if we take a look at our working copy folder, we can see that the library files have in fact arrived in our project.
“So what’s the difference?” you might ask. After all, the third-party library’s files are here, just like they would be if we had copy-pasted them. The crucial difference is indeed that they are contained in their own Git repository! Had we just downloaded some files, thrown them into our project and then committed them — like the other files in our project — they would have been part of the same Git repository. The submodule, however, makes sure that the library files don’t “leak” into our main project’s repository.
Let’s see what else has happened: a new .gitmodules file has been created in the root folder of our main project. Here’s what it contains:
[submodule "lib/spacetime"]
    path = lib/spacetime
    url = https://github.com/spencermountain/spacetime.git
This .gitmodules file is one of multiple places where Git keeps track of the submodules in our project. Another one is .git/config, which now ends like this:
[submodule "lib/spacetime"]
    url = https://github.com/spencermountain/spacetime.git
    active = true
And finally, Git also keeps an internal copy of each submodule’s .git repository.
All of these are technical details you don’t have to remember. However, it probably helps you to understand that the internal maintenance of Git submodules is quite complex. That’s why it’s important to take one thing away: don’t mess with Git submodule configuration by hand! If you want to move, delete, or otherwise manipulate a submodule, please do yourself a favor and do not try this manually. Either use the proper Git commands or a desktop GUI for Git like “Tower”, which takes care of these details for you.
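As a rough illustration of what "use the proper Git commands" can look like, the following sketch moves and then removes a submodule in a throwaway sandbox. The repository names (`library`, `main`), the `vendor` folder, and the `Demo` identity are made up for this example, and the `protocol.file.allow` setting is only there because the sandbox clones from a local path instead of a real remote.

```shell
set -e
# Throwaway sandbox; all names and paths below are hypothetical.
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
# Recent Git versions refuse local-path submodule clones unless this is set.
export GIT_CONFIG_COUNT=1 GIT_CONFIG_KEY_0=protocol.file.allow GIT_CONFIG_VALUE_0=always

# A tiny stand-in for the third-party library.
git init -q library
echo "v1" > library/README
git -C library add README
git -C library commit -q -m "Library v1"

# The main project, with the library added as a submodule.
git init -q main
cd main
git submodule --quiet add "$work/library" lib/library
git commit -q -m "Add library as a submodule"

# Move: git mv relocates the working copy and updates the path in .gitmodules.
mkdir vendor
git mv lib/library vendor/library
git commit -q -m "Move library to vendor/"
moved_path=$(git config -f .gitmodules --get submodule.lib/library.path)
echo "path recorded in .gitmodules: $moved_path"

# Delete: unregister the submodule, then remove it from the index and .gitmodules.
git submodule deinit -f vendor/library
git rm -q vendor/library
git commit -q -m "Remove library submodule"
git status --short
```

Letting Git do the bookkeeping means the working copy, the index, and the configuration entries change together, which is the part that is easy to get wrong by hand.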
Let’s have a look at the status of our main project, now that we’ve added the submodule:
$ git status
On branch master
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        new file:   .gitmodules
        new file:   lib/spacetime
As you can see, Git regards adding a submodule as a change like any other. Accordingly, we have to commit this change like any other:
$ git commit -m "Add timezone converter library as a submodule"
Cloning a Project with Git Submodules
In our example above, we added a new submodule to an existing Git repository. But what about “the other way around”, when you clone a repository that already contains submodules?
If we performed a vanilla git clone <remote-URL> on the command line, we would download the main project, but we would find any submodule folder empty! This again is vivid proof that submodule files are separate and not included in their parent repositories.
In such a case, to populate submodules after you’ve cloned their parent repository, you can simply execute git submodule update --init --recursive afterwards. The even better way is to simply add the --recurse-submodules option right when you call git clone in the first place.
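The two cloning approaches can be compared side by side in a small sandbox. This is only a sketch: the repositories `library` and `main`, the clone folders, and the `Demo` identity are invented for the example, and `protocol.file.allow` is set solely because everything is cloned from local paths.

```shell
set -e
# Throwaway sandbox; all names and paths below are hypothetical.
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
# Recent Git versions refuse local-path submodule clones unless this is set.
export GIT_CONFIG_COUNT=1 GIT_CONFIG_KEY_0=protocol.file.allow GIT_CONFIG_VALUE_0=always

# A stand-in library and a main project that includes it as a submodule.
git init -q library
echo "v1" > library/README
git -C library add README
git -C library commit -q -m "Library v1"
git init -q main
git -C main submodule --quiet add "$work/library" lib/library
git -C main commit -q -m "Add library as a submodule"

# Approach 1: a plain clone leaves the submodule folder empty ...
git clone -q main plain-clone
before=$(ls -A plain-clone/lib/library)
echo "files after plain clone: '${before}'"
# ... until the submodules are initialized and checked out.
git -C plain-clone submodule --quiet update --init --recursive
ls plain-clone/lib/library

# Approach 2: ask for the submodules right when cloning.
git clone -q --recurse-submodules main full-clone
ls full-clone/lib/library
```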
Checking Out Revisions
In a “normal” Git repository, we usually check out branches. By using git checkout <branchname> or the newer git switch <branchname>, we’re telling Git what our currently active branch should be. When new commits are made on this branch, the HEAD pointer is automatically moved to the very latest commit. This is important to understand, because Git submodules work differently!
In a submodule, we’re always checking out a specific revision, not a branch! Even when you execute a command like git checkout main in a submodule, the parent repository records only the commit that this branch currently points to, not the branch itself.
This behavior, of course, is not a mistake. Think about it: when you include a third-party library, you want to have complete control over what exact code is being used in your main project. When the library’s maintainer releases a new version, that’s all well and good … but you don’t necessarily want this new version to be automatically used in your project. Simply because you don’t know if those new changes might break your project!
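One way to see that the parent repository pins an exact commit is to look at how the submodule appears in the parent's tree. The sketch below uses made-up repositories (`library`, `main`) and a `Demo` identity in a temporary folder; `protocol.file.allow` is set only because the submodule is cloned from a local path.

```shell
set -e
# Throwaway sandbox; all names and paths below are hypothetical.
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
# Recent Git versions refuse local-path submodule clones unless this is set.
export GIT_CONFIG_COUNT=1 GIT_CONFIG_KEY_0=protocol.file.allow GIT_CONFIG_VALUE_0=always

git init -q library
echo "v1" > library/README
git -C library add README
git -C library commit -q -m "Library v1"
git init -q main
cd main
git submodule --quiet add "$work/library" lib/library
git commit -q -m "Add library as a submodule"

# The parent tree lists the submodule as one "commit" entry (mode 160000),
# not as a folder full of files.
git ls-tree HEAD lib/library

# The recorded hash is exactly the commit checked out inside the submodule.
recorded=$(git rev-parse HEAD:lib/library)
checked_out=$(git -C lib/library rev-parse HEAD)
echo "recorded in parent:       $recorded"
echo "checked out in submodule: $checked_out"
```

Because only that hash is stored, new commits in the library have no effect on the main project until someone deliberately moves the pointer and commits the change.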
If you want to find out what revision your submodules are using, you can request this information in your main project:
$ git submodule status
 ea703a7d557efd90ccae894db96368d750be93b6 lib/spacetime (6.16.3)
This returns the currently checked out revision of our lib/spacetime submodule. And it also lets us know that this revision is a tag, named “6.16.3”. It’s pretty common to use tags heavily when working with submodules in Git.
Let’s say you wanted your submodule to use an older version, which was tagged “6.14.0”. First, we have to change directories so that our Git command will be executed in the context of the submodule, not our main project. Then, we can simply run git checkout with the tag name:
$ cd lib/spacetime/
$ git checkout 6.14.0
Previous HEAD position was ea703a7 Merge pull request
HEAD is now at 7f78d50 Merge pull request
If we now go back into our main project and execute git submodule status again, we’ll see our checkout reflected:
$ cd ../..
$ git submodule status
+7f78d50156ae1205aa50675ddede81a61a45fade lib/spacetime (6.14.0)
Take a close look at the output, though: the little + symbol in front of that SHA-1 hash tells us that the submodule is at a different revision than is currently stored in the parent repository. As we just changed the checked out revision, this looks correct.
git status in our main project now informs us about this fact, too:
$ git status
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   lib/spacetime (new commits)
You can see that Git considers moving a submodule’s pointer as a change like any other: we have to stage and commit it to the repository if we want it to be stored:
$ git add lib/spacetime
$ git commit -m "Changed checked out revision in submodule"
$ git push
Updating a Git Submodule
In the above steps, it was us who moved the submodule pointer: we were the ones who chose to check out a different revision, commit it, and push it to our team’s remote repository. But what if one of our colleagues changed the submodule revision, maybe because an interesting new version of the submodule was released and our colleague decided to use it in our project (after thorough testing, of course)?
Let’s do a simple git pull in our main project, as we would probably do quite often anyway, to get new changes from the shared remote repository:
$ git pull
From https://github.com/gntr/git-crash-course
   d86f6e0..055333e  main       -> origin/main
Updating d86f6e0..055333e
Fast-forward
 lib/spacetime | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
The second to last line indicates that something in our submodule has been changed. But let’s take a closer look:
$ git submodule status
+7f78d50156ae1205aa50675ddede81a61a45fade lib/spacetime (6.14.0)
I’m sure you remember that little + sign: it means the submodule pointer was moved! To update our locally checked out revision to the “official” one that our teammate chose, we can run the git submodule update command:
$ git submodule update lib/spacetime
Submodule path 'lib/spacetime': checked out '5e3d70a88180879ae0222b6929551c41c3e5309e'
Alright! Our submodule is now checked out at the revision that’s recorded in our main project repository!
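The whole round trip (a colleague moves the pointer, we pull, we update) can be replayed locally. In this sketch the `shared` repository plays the role of the team's remote and of the colleague at the same time, and `mine` is our own clone; these names, the `library` repository, and the `Demo` identity are all invented, and `protocol.file.allow` is set only because the sandbox works with local paths.

```shell
set -e
# Throwaway sandbox; all names and paths below are hypothetical.
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
# Recent Git versions refuse local-path submodule clones unless this is set.
export GIT_CONFIG_COUNT=1 GIT_CONFIG_KEY_0=protocol.file.allow GIT_CONFIG_VALUE_0=always

git init -q library
echo "v1" > library/README
git -C library add README
git -C library commit -q -m "Library v1"
git init -q shared
git -C shared submodule --quiet add "$work/library" lib/library
git -C shared commit -q -m "Add library as a submodule"

# Our own clone, with the submodule populated.
git clone -q --recurse-submodules shared mine

# The library gets a new commit, and the "colleague" adopts it in the shared project.
echo "v2" >> library/README
git -C library commit -q -am "Library v2"
git -C shared/lib/library pull -q
git -C shared commit -q -am "Use newer library revision"

# Back in our clone: pulling moves the recorded pointer, not our checkout.
cd mine
git pull -q
before=$(git submodule status)
echo "$before"

# Check out the revision that the main project now records.
git submodule update lib/library
git submodule status
```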
Working with Submodules in Git
We’ve covered the basic building blocks of working with Git submodules. Other workflows are really quite standard!
Checking for new changes in a submodule, for example, works like in any other Git repository: you run a git fetch command inside the submodule repository, possibly followed by something like git pull origin main if you want to indeed make use of the updates.
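As an alternative to fetching and pulling inside the submodule folder, Git also offers git submodule update --remote, which fetches the submodule's remote and checks out the tip of its tracked branch from the main project. The sketch below tries this in a sandbox; the `library` and `main` repositories and the `Demo` identity are assumptions for the example, and `protocol.file.allow` is set only because of the local paths.

```shell
set -e
# Throwaway sandbox; all names and paths below are hypothetical.
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
# Recent Git versions refuse local-path submodule clones unless this is set.
export GIT_CONFIG_COUNT=1 GIT_CONFIG_KEY_0=protocol.file.allow GIT_CONFIG_VALUE_0=always

git init -q library
echo "v1" > library/README
git -C library add README
git -C library commit -q -m "Library v1"
git init -q main
cd main
git submodule --quiet add "$work/library" lib/library
git commit -q -m "Add library as a submodule"

# Upstream publishes a new commit.
echo "v2" >> "$work/library/README"
git -C "$work/library" commit -q -am "Library v2"

# Fetch the submodule's remote and check out the latest upstream commit.
git submodule update --remote lib/library
git status --short

# The moved pointer is a normal change: stage and commit it in the main project.
git add lib/library
git commit -q -m "Update library to latest upstream commit"
```

Either route ends the same way: the new revision only becomes part of the main project once the moved pointer is committed there.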
Making changes in a submodule might also be a use case for you, especially if you manage the library code yourself (because it’s an internal library, not from a third party). You can work with the submodule like with any other Git repository: you can make changes, commit them, push them, and so on.
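A change made inside a submodule typically involves two commits: one in the submodule itself and one in the main project to record the new revision. Here is a minimal sketch of that sequence, assuming an internal library you are allowed to push to; the repositories `library` and `main`, the branch name `improve-readme`, and the `Demo` identity are hypothetical, and `protocol.file.allow` is set only because the sandbox uses local paths.

```shell
set -e
# Throwaway sandbox; all names and paths below are hypothetical.
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
# Recent Git versions refuse local-path submodule clones unless this is set.
export GIT_CONFIG_COUNT=1 GIT_CONFIG_KEY_0=protocol.file.allow GIT_CONFIG_VALUE_0=always

git init -q library
echo "v1" > library/README
git -C library add README
git -C library commit -q -m "Library v1"
git init -q main
cd main
git submodule --quiet add "$work/library" lib/library
git commit -q -m "Add library as a submodule"

# Step 1: work inside the submodule, on a branch so the commit is easy to find again.
cd lib/library
git checkout -q -b improve-readme
echo "More documentation" >> README
git commit -q -am "Improve README"
git push -q origin improve-readme

# Step 2: record the new submodule revision in the main project.
cd ../..
git add lib/library
git commit -q -m "Use improved library README"
git submodule status
```

Pushing the submodule commit before committing the pointer matters for teammates: the main project should never reference a revision that exists only on your machine.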
Using the Full Power of Git
Git has a whole lot of power under the hood. But many of its advanced tools — like Git submodules — aren’t well known. It’s really a pity that so many developers are missing out on a lot of powerful stuff!
If you want to go deeper and get a glimpse of some other advanced Git techniques, I highly recommend the “Advanced Git Kit”: it’s a (free!) collection of short videos that introduce you to topics like the Reflog, Interactive Rebase, Cherry-Picking, and even branching strategies.
Have fun becoming a better developer!