Git Index
TL;DR
Be careful how file paths are written to the git index, because they can be set to any string (not just a normalized filename) if you’re manipulating the index yourself instead of going through git. I created a repo with all the code snippets to make things easier to follow, instead of having you dig through my project full of unrelated application logic.
Project
I was recently working on a Go project that needed to interact with a git repo, and it took me down a very unexpected, but interesting, rabbit hole. Without going into a bunch of unrelated detail, my project needed to do the following in a git repo:
- Read a file in the repo
- Update the contents of the file
- Write the changes to disk
- Add the file changes to git
- Commit the file changes
Implementation
I’m writing this project in Go, so I’ll need a way to interact with git. I decided to use go-git because it was one of the first libraries that popped up. But first, let’s read the file:
func ReadFile(filename string) (string, error) {
	b, err := os.ReadFile(filename)
	return string(b), err
}
Update the file contents returned by the function above:
updatedContent := strings.ReplaceAll(content, "great", "good")
Write the updated contents back to the file:
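func WriteFile(filename string, content string) error {
	// Write to a temporary file first so the original is only replaced
	// once the new contents are fully written.
	tmpFile := fmt.Sprintf("%s.tmp", filename)
	fout, err := os.Create(tmpFile)
	if err != nil {
		return err
	}
	bufOut := bufio.NewWriter(fout)
	if _, err = bufOut.WriteString(content); err != nil {
		fout.Close()
		return err
	}
	// Flush and close explicitly (not with defer) so the data is on disk
	// before the rename, and so their errors aren't silently dropped.
	if err = bufOut.Flush(); err != nil {
		fout.Close()
		return err
	}
	if err = fout.Close(); err != nil {
		return err
	}
	return os.Rename(tmpFile, filename)
}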
Finally, add the updated file to git and commit it:
func AddToRepo(repo string, filepath string) error {
	// Open the existing repository on disk.
	gitRepo, err := git.PlainOpen(repo)
	if err != nil {
		return err
	}
	gitWorktree, err := gitRepo.Worktree()
	if err != nil {
		return err
	}
	// Stage the file (the equivalent of git add).
	if _, err = gitWorktree.Add(filepath); err != nil {
		return err
	}
	// Commit everything that is staged.
	_, err = gitWorktree.Commit("Updated file contents", &git.CommitOptions{})
	return err
}
Putting it all together:
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"strings"

	"github.com/go-git/go-git/v5"
)

func main() {
	repo := "."
	filename := "my-file"
	filepath := strings.Join([]string{repo, filename}, string(os.PathSeparator))
	content, err := ReadFile(filepath)
	if err != nil {
		log.Fatal(err)
	}
	updatedContent := strings.ReplaceAll(content, "great", "good")
	if err := WriteFile(filepath, updatedContent); err != nil {
		log.Fatal(err)
	}
	if err := AddToRepo(repo, filepath); err != nil {
		log.Fatal(err)
	}
}

func ReadFile(filename string) (string, error) {
	b, err := os.ReadFile(filename)
	return string(b), err
}

func WriteFile(filename string, content string) error {
	// Write to a temporary file first so the original is only replaced
	// once the new contents are fully written.
	tmpFile := fmt.Sprintf("%s.tmp", filename)
	fout, err := os.Create(tmpFile)
	if err != nil {
		return err
	}
	bufOut := bufio.NewWriter(fout)
	if _, err = bufOut.WriteString(content); err != nil {
		fout.Close()
		return err
	}
	// Flush and close before renaming so the data is actually on disk.
	if err = bufOut.Flush(); err != nil {
		fout.Close()
		return err
	}
	if err = fout.Close(); err != nil {
		return err
	}
	return os.Rename(tmpFile, filename)
}

func AddToRepo(repo string, filepath string) error {
	gitRepo, err := git.PlainOpen(repo)
	if err != nil {
		return err
	}
	gitWorktree, err := gitRepo.Worktree()
	if err != nil {
		return err
	}
	// Stage the file, then commit it.
	if _, err = gitWorktree.Add(filepath); err != nil {
		return err
	}
	_, err = gitWorktree.Commit("Updated file contents", &git.CommitOptions{})
	return err
}
Now let’s make sure the repo looks the way it’s supposed to:
$ git status
On branch master
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
	new file: ./my-file
	new file: my-file
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified: my-file
What the hell? I made a change to a single file, but now I have two staged changes and one unstaged change all for the same file??
Troubleshooting
Sanity check
Just to make sure I’m not forgetting something from the way I normally interact with git, let’s try doing it in bash:
file='my-file'
cat "${file}"
content="$(< "${file}")"
printf "%s\n" "${content//great/good}" > "${file}.tmp"
mv "${file}.tmp" "${file}"
cat "${file}"
git add "${file}"
git commit -m "Updated file contents"
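One detail worth matching when comparing the two approaches: bash’s ${var/pattern/replacement} replaces only the first match, while ${var//pattern/replacement} replaces every match, which is what strings.ReplaceAll does in the Go version. The helper names below are made up for illustration; the sketch just puts the two Go equivalents side by side.

```go
package main

import (
	"fmt"
	"strings"
)

// replaceFirst (hypothetical helper) mirrors bash's ${var/pattern/replacement}:
// only the first occurrence is replaced.
func replaceFirst(s, old, repl string) string {
	return strings.Replace(s, old, repl, 1)
}

// replaceEvery (hypothetical helper) mirrors bash's ${var//pattern/replacement},
// which is what the Go program does with strings.ReplaceAll.
func replaceEvery(s, old, repl string) string {
	return strings.ReplaceAll(s, old, repl)
}

func main() {
	content := "great code, great tests"
	fmt.Println(replaceFirst(content, "great", "good")) // good code, great tests
	fmt.Println(replaceEvery(content, "great", "good")) // good code, good tests
}
```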
Nothing exciting here, just the usual flow of git commands. Now let’s check what git reports this time:
$ git status
On branch master
Your branch is ahead of 'origin/master' by 1 commit.
(use "git push" to publish your local commits)
Well, this is what I was expecting all along. So what’s being done differently between git and go-git?
Git index
I started digging deeper into the git side of things by comparing the state of the repo after each approach. This is the initial state of the git index before any changes are made to the repo:
$ git ls-files --stage my-file
100644 09721149f4f734c171891b7da25c4f7db30ea616 0 my-file
Now let’s run the sanity check implementation and check the index again:
$ git ls-files --stage my-file
100644 47c9021d2cbbd8f691d9ff2e1df9a053ad7b28ee 0 my-file
Okay, things look good. The updated file is there and the hash has been updated as expected (because the file was modified). Now let’s put the repo back to its original state:
$ git reset --hard origin/master
Then run the Go implementation and check the index again:
$ git ls-files --stage my-file
100644 09721149f4f734c171891b7da25c4f7db30ea616 0 my-file
Weird, the hash is the same as the original file even though it’s been changed. Let’s look at the whole index:
$ git ls-files --stage
100644 47c9021d2cbbd8f691d9ff2e1df9a053ad7b28ee 0 ./my-file
100644 09721149f4f734c171891b7da25c4f7db30ea616 0 my-file
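Since the index stores each entry’s path as a literal string, one quick way to spot this kind of duplicate is to clean every path from git ls-files --stage and look for collisions. This is a sketch with a hypothetical helper, assuming paths are printed unquoted (git quotes unusual characters unless you pass -z). Note that the real output separates the stage number from the path with a tab.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// findAliasedEntries parses `git ls-files --stage` output, one
// "<mode> <hash> <stage>\t<path>" entry per line, and groups the raw index
// paths by their cleaned form. Any group with more than one raw path is a
// file the index is tracking under two different names.
func findAliasedEntries(lsFiles string) map[string][]string {
	byClean := make(map[string][]string)
	seen := make(map[string]bool)
	for _, line := range strings.Split(strings.TrimSpace(lsFiles), "\n") {
		_, p, ok := strings.Cut(line, "\t")
		if !ok || seen[p] {
			continue // not an entry line, or a repeated conflict stage
		}
		seen[p] = true
		clean := path.Clean(p)
		byClean[clean] = append(byClean[clean], p)
	}
	for clean, raw := range byClean {
		if len(raw) < 2 {
			delete(byClean, clean) // only one name: nothing aliased
		}
	}
	return byClean
}

func main() {
	// The two entries from the go-git run above.
	out := "100644 47c9021d2cbbd8f691d9ff2e1df9a053ad7b28ee 0\t./my-file\n" +
		"100644 09721149f4f734c171891b7da25c4f7db30ea616 0\tmy-file\n"
	for clean, raw := range findAliasedEntries(out) {
		fmt.Printf("%s is indexed as %q\n", clean, raw) // my-file is indexed as ["./my-file" "my-file"]
	}
}
```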
Well, this doesn’t make any goddamned sense. The hash for my-file is the same as the hash for the original file, and the hash for ./my-file is the same as the hash for the updated file. So does that mean the go-git library is updating the git index incorrectly?
Spoiler alert: yes it does.
Debugging
I’ll spare you the actual process of setting breakpoints in the go-git library to chase down exactly where this problem arises, but it ends up boiling down to this section of worktree_status.go:
if err != nil || !fi.IsDir() {
	added, h, err = w.doAddFile(idx, s, path, ignorePattern)
} else {
	added, err = w.doAddDirectory(idx, s, path, ignorePattern)
}
In this case, I’m concerned with doAddFile(), which passes the path along as ./my-file and takes me further down into the library, to where the index is actually modified in index.go:
func (i *Index) Add(path string) *Entry {
	e := &Entry{
		Name: filepath.ToSlash(path),
	}
	i.Entries = append(i.Entries, e)
	return e
}
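And there it is!! The path being passed is ./my-file instead of my-file. Index.Add only converts path separators with filepath.ToSlash; it doesn’t clean the path, so the entry is written with the wrong filename, or at least a different filename than the git CLI uses. Because the index records each entry under its literal path string, ./my-file and my-file end up as two separate entries for the same file.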
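The leading ./ comes from how main builds the path: strings.Join just concatenates the pieces, while filepath.Join also cleans the result. A small sketch comparing the two, with indexNames as a hypothetical helper that applies filepath.ToSlash the way Index.Add does:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// indexNames (hypothetical helper) returns the name each path-building
// approach would produce after filepath.ToSlash, as applied by Index.Add.
func indexNames(repo, filename string) (naive, cleaned string) {
	// What main does: plain concatenation, so "." + "/" + "my-file" keeps the "./".
	joined := strings.Join([]string{repo, filename}, string(os.PathSeparator))
	// ToSlash only swaps separators; it never removes "./".
	naive = filepath.ToSlash(joined)
	// filepath.Join runs filepath.Clean on its result, dropping the "./".
	cleaned = filepath.ToSlash(filepath.Join(repo, filename))
	return naive, cleaned
}

func main() {
	naive, cleaned := indexNames(".", "my-file")
	fmt.Println(naive)   // ./my-file
	fmt.Println(cleaned) // my-file
}
```

filepath.Join only happens to produce a worktree-relative path here because repo is "."; for any other repo directory, the path still has to be made relative to the worktree root.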
I’ll open an issue with the go-git maintainers to see about getting this fixed if it’s not the intended behavior (it would be strange if it were). Either way, I can now update my application to clean up the file path before handing it to go-git. Plus, this little speed bump forced me to dive into the git internals to track down the issue, so I guess it wasn’t a complete waste of time.
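One way to do that cleanup, sketched under the assumption that go-git expects paths relative to the worktree root (the form git itself stores): resolve both the repo and the file to absolute paths, take the relative path between them, and convert it to forward slashes. repoRelativePath is a hypothetical helper, not part of go-git.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

// repoRelativePath (hypothetical helper) converts a path to a file inside
// the repo into the form git writes to the index: relative to the worktree
// root, cleaned, and using forward slashes.
func repoRelativePath(repo, path string) (string, error) {
	absRepo, err := filepath.Abs(repo) // Abs also cleans the path
	if err != nil {
		return "", err
	}
	absPath, err := filepath.Abs(path)
	if err != nil {
		return "", err
	}
	rel, err := filepath.Rel(absRepo, absPath)
	if err != nil {
		return "", err
	}
	// A result starting with ".." means the file lives outside the repo.
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", errors.New("path is outside the repository: " + path)
	}
	return filepath.ToSlash(rel), nil
}

func main() {
	rel, err := repoRelativePath(".", "./my-file")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(rel) // my-file
}
```

In the program above, the result would be passed to AddToRepo in place of the joined path, while ReadFile and WriteFile keep using the OS path. filepath.Abs doesn’t resolve symlinks, so if the repo directory is reached through one, filepath.EvalSymlinks may also be needed.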