Git Tools¶

This part is mainly based on Pro Git, Chapter 7.

Tip

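This section is updated whenever I get to it: I learn things when I need them and write up a summary afterwards.

I've recently been doing intensive development on skypilot and ran into many Git issues I used to overlook, so I'm taking the chance to learn them and collect them here.

In fact, all the gap-filling during development eventually comes down to understanding this set of Git commands.

That said, there's little point in spending time studying it systematically: look things up when you need them, and learn them once you have.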

Stash and Clean¶

Examples¶

Setup

Bash
# `more` branch is created by `main`
❯ git ls-tree --name-only main
1.py
2.py
3.py
4.py
❯ git ls-tree --name-only more
1.py
2.py
3.py
5.py
❯ git branch                  
* main
  more

Scenario

Now, on the more branch, I create two new files: bxhu_add_not_commit.py and bxhu_not_add.py.

Bash
❯ git add bxhu_add_not_commit.py
❯ git status
On branch more
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
    new file:   bxhu_add_not_commit.py

Untracked files:
  (use "git add <file>..." to include in what will be committed)
    bxhu_not_add.py

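To better illustrate how stash behaves, I deliberately set up two files: one is staged (tracked) but not committed, the other is not tracked at all.

Now the problem: I suddenly need to go back to the main branch and create a 6.py, but both files above are throwaway scribbles that I don't even want to commit, so for now I can't switch branches ToT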

Legal but not good

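Strictly speaking, "I can't switch branches" isn't entirely accurate: Git does let you switch, but doing so is unsafe and not good practice.

If you add new files on the more branch without committing them, they will still show up in the main branch's working tree when you switch back to main. This happens because:

Staged file bxhu_add_not_commit.py:
- It has been staged with git add but not committed. Git carries staged changes over to the branch you switch to (as long as they don't conflict with that branch), because as far as Git is concerned the index holds pending, unfinished changes.
Untracked file bxhu_not_add.py:
- Untracked files stay in the working tree and persist across branch switches. Git does not remove untracked files on its own unless you clean them up with git clean.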
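Since git clean is the command that removes untracked files, here is a minimal sketch of it in a throwaway repository. The temporary directory, the `demo` identity and the file contents are assumptions made so the script runs anywhere; `git init -b` needs Git 2.28 or newer. `git clean -n` only reports what it would delete, `git clean -f` actually deletes, and the staged file is not touched.

```shell
set -eu
# Throwaway identity so the demo does not depend on your global Git config
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com \
       GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

T=$(mktemp -d)
cd "$T"
git init -q -b main
echo "print(1)" > 1.py
git add 1.py
git commit -q -m "init"

echo "staged" > bxhu_add_not_commit.py
git add bxhu_add_not_commit.py      # staged: git clean leaves it alone
echo "scratch" > bxhu_not_add.py    # untracked: git clean targets it

git clean -n    # dry run, prints: Would remove bxhu_not_add.py
git clean -f    # actually deletes the untracked file
git status --short
```

Add `-d` to also remove untracked directories; running `-n` first is a cheap safety net because the deletion cannot be undone.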

So the cleanest approach here is git stash, which sets the current changes aside on a stash stack.

From ProGit

Now you want to switch branches, but you don’t want to commit what you’ve been working on yet, so you’ll stash the changes. To push a new stash onto your stack, run git stash or git stash push:

Now let's stash:

Stash state 1:

Bash
❯ git status
On branch more
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
    new file:   bxhu_add_not_commit.py

Untracked files:
  (use "git add <file>..." to include in what will be committed)
    bxhu_not_add.py

❯ git stash save "bxhu_not_add.py is not tracked now"
Saved working directory and index state On more: bxhu_not_add.py is not tracked now

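Stash state 2 (the first git stash only took the staged file and left the untracked bxhu_not_add.py in the working tree, so we stage it and stash again):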

Bash
❯ git add bxhu_not_add.py

❯ git stash save "track bxhu_not_add.py"
Saved working directory and index state On more: track bxhu_not_add.py

❯ git status
On branch more
nothing to commit, working tree clean

Check the current stash list:

Bash
❯ git stash list
stash@{0}: On more: track bxhu_not_add.py
stash@{1}: On more: bxhu_not_add.py is not tracked now

Note that stash@{0} is the most recent stash, stash@{1} is the one before it, and so on...

Now suppose we ==went back to the main branch, did some work there, and then returned to the more branch==. To restore the earlier stashes, we can use git stash apply:

First, try the older state (stash@{1}):

Bash
❯ git stash apply stash@{1}
On branch more
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
    new file:   bxhu_add_not_commit.py

❯ git status
On branch more
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
    new file:   bxhu_add_not_commit.py

Then the newest state (stash@{0}):

Bash
❯ git stash apply stash@{0}
On branch more
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
    new file:   bxhu_add_not_commit.py
    new file:   bxhu_not_add.py

Note that after git stash apply, Git prints the current status automatically, so there's no need to run git status separately.
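To replay the walkthrough above without touching a real project, here is a sketch in a throwaway repository. It uses `git stash push -m` instead of the deprecated `git stash save` (same effect); the temp directory, the `demo` identity and the file contents are assumptions. The key point it shows is that the first stash leaves the untracked file behind.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com \
       GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

T=$(mktemp -d)
cd "$T"
git init -q -b main
for f in 1.py 2.py 3.py; do echo "# $f" > "$f"; done
git add . && git commit -q -m "init"
git checkout -q -b more

echo "a" > bxhu_add_not_commit.py
git add bxhu_add_not_commit.py          # staged, not committed
echo "b" > bxhu_not_add.py              # untracked

# Stash 1: takes only the staged file; the untracked file stays in place
git stash push -m "bxhu_not_add.py is not tracked now"
ls bxhu_not_add.py

# Stash 2: stage the leftover file and stash again
git add bxhu_not_add.py
git stash push -m "track bxhu_not_add.py"
git stash list

# Restore in the same order as above: older entry first, then the newest
git stash apply 'stash@{1}'
git stash apply 'stash@{0}'
```

Because apply (unlike pop) keeps the entries, both are still in `git stash list` afterwards.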

Summary¶

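Personally, I think of stash as a "time machine": it lets you shelve the current working tree, switch to another branch and finish the work there, then switch back and restore what you had. Note that the stash stack belongs to the whole repository rather than to one branch: the "On more:" prefix only records where an entry was created, and any entry can be applied on any branch.

Here is a summary of common stash commands:

1) Save changes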

Bash
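git stash # save current changes to tracked files (staged + unstaged)
git stash save "specific message for this item" # Strongly recommended!!!
# `save` is deprecated; the modern equivalent is:
git stash push -m "specific message for this item"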

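Tracked and Untracked

By default, git stash only saves changes to tracked files (both staged and unstaged)
Use git stash -u (--include-untracked) to also save untracked files; git stash -a (--all) additionally saves ignored files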
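A minimal sketch of the difference in a throwaway repository (temp directory, `demo` identity and file names are assumptions): with `-u`, an unstaged change to a tracked file, a staged new file and an untracked file all go into one stash entry, leaving a clean working tree.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com \
       GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

T=$(mktemp -d)
cd "$T"
git init -q -b main
echo "v1" > tracked.txt
git add tracked.txt
git commit -q -m "init"

echo "v2" > tracked.txt            # unstaged change to a tracked file
echo "new" > staged_new.py
git add staged_new.py              # staged new file
echo "scratch" > untracked.py      # untracked file

git stash push -u -m "everything, including untracked"
CLEAN_STATUS=$(git status --porcelain)   # empty: nothing was left behind

git stash pop                      # brings all three back and drops the entry
```

Without `-u`, the same `git stash push` would leave `untracked.py` in place, which is exactly what happened in stash state 1 above.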

2) Restore changes

Bash
git stash pop # back to "timepoint" and delete corresponding stash item
git stash apply # back to "timepoint" but keep stash item
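A small sketch to see the difference (throwaway repository; the file name and `demo` identity are assumptions): after apply the entry is still listed, after pop it is gone.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com \
       GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

T=$(mktemp -d)
cd "$T"
git init -q -b main
echo "v1" > a.txt
git add a.txt
git commit -q -m "init"

echo "v2" > a.txt
git stash push -q -m "demo"
git stash apply -q                      # change is back, entry kept
APPLY_COUNT=$(git stash list | wc -l | tr -d ' ')

git checkout -q -- a.txt                # discard the change again
git stash pop -q                        # change is back, entry dropped
POP_COUNT=$(git stash list | wc -l | tr -d ' ')
echo "entries after apply: $APPLY_COUNT, after pop: $POP_COUNT"
```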

3) View the stash list

Bash
git stash list # PS: Stack Structure,  stash@{0} is the latest stash

4) Delete entries from the list

Bash
git stash drop stash@{n} # delete specific item in Stash List
git stash clear          # delete all items in Stash List
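One detail worth knowing before dropping entries: the indices are positions in the stack, so dropping one renumbers everything below it. A sketch in a throwaway repository (file name, messages and `demo` identity are assumptions):

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com \
       GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

T=$(mktemp -d)
cd "$T"
git init -q -b main
echo "v1" > a.txt
git add a.txt
git commit -q -m "init"

# Three entries; each stash resets a.txt, so every iteration starts clean
for msg in first second third; do
  echo "$msg" > a.txt
  git stash push -q -m "$msg"
done
git stash list                  # newest first: third, second, first

git stash drop -q 'stash@{1}'   # remove "second"
git stash list                  # "first" has moved up to stash@{1}
```

So after a drop, re-run `git stash list` before using another `stash@{n}`.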

My Choice:

git stash save "msg"
git stash list
git stash apply stash@{n}
git stash drop stash@{n}

Rebase and Merge¶

This has long been a point of confusion for me, so today I finally took the time to sort it out systematically:

This video is all you need :)

Bash
# stage 1 in main
git init -b main  # name the first branch main (Git >= 2.28); a plain git init may call it master
echo "init content" >> file.txt
git add file.txt
git commit -m "init file"

# stage 2 in feature
git checkout -b feature
echo "modified by feature branch" >> file.txt
git add file.txt
git commit -m "add new content in file"

# stage 3 in main
git checkout main
echo "modified by main branch" >> file.txt
git add file.txt
git commit -m "add new functionality in main"

The commit timeline now looks like this:

[Figure: commit timeline, with main and feature diverging after "init file"]

Difference¶

1) With merge:

Bash
git checkout main
git merge feature

[Figure: commit graph after git merge feature]

2) With rebase:

Bash
git checkout feature
git rebase main
git checkout main
git merge feature

Intermediate state, after git rebase main:

[Figure: commit graph after git rebase main]

Final state, after git merge feature:

[Figure: commit graph after git merge feature]
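To check the difference in the shape of history without relying on the figures, here is a sketch that builds the same three-stage history twice in throwaway repositories and counts merge commits. One assumption differs from the example above: each branch writes its own hypothetical file (feature.txt, main.txt) instead of appending to file.txt, so neither path stops on a conflict.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com \
       GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Build the three-stage history in a fresh temp repo and cd into it
make_repo() {
  cd "$(mktemp -d)"
  git init -q -b main
  echo "init content" > file.txt
  git add file.txt
  git commit -q -m "init file"
  git checkout -q -b feature
  echo "modified by feature branch" > feature.txt
  git add feature.txt
  git commit -q -m "add new content in feature"
  git checkout -q main
  echo "modified by main branch" > main.txt
  git add main.txt
  git commit -q -m "add new functionality in main"
}

# 1) merge: main gets an extra merge commit with two parents
make_repo
MERGE_REPO=$PWD
git merge -q --no-edit feature
git --no-pager log --oneline --graph --all

# 2) rebase: feature is replayed on top of main, then main fast-forwards
make_repo
REBASE_REPO=$PWD
git checkout -q feature
git rebase -q main
git checkout -q main
git merge -q feature
git --no-pager log --oneline --graph --all
```

The merge history keeps both lines of development plus a merge commit; the rebase history is a single straight line of three commits.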

Exception Process¶

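When merging the feature branch into main, some common conflicts can come up; here is how to resolve each kind. (In the example above, both branches append a different line after "init content" in file.txt, so both the merge and the rebase stop with a conflict on that file.)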

Merge Conflict

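First, make sure you have edited the conflicting file (file.txt), removed the conflict markers (<<<<<<<, =======, >>>>>>>), and saved the content you actually want.

For this step:

In vim, compare the two sides between the <<<<<<<, ======= and >>>>>>> markers one by one and edit by hand;
In VS Code, the merge conflict editor lets you just click "Accept Current Change" or "Accept Incoming Change", which is very convenient :)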

1) Stage the resolved file:

Bash
git add file.txt

2) Commit the merge result:

Bash
git commit -m "Merge branch 'feature': resolve conflicts in file.txt"

PS: if you change your mind while resolving conflicts and want to cancel this merge, use:

Bash
git merge --abort

3) If you then need to push to the remote repo, just push as usual:

Bash
git push origin <branch-name>

Commands for checking state

git status: shows the current conflict state
git diff: shows the conflicting content in detail
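The whole merge-conflict flow can be reproduced with the file.txt example from above, since both branches append a line at the same place. A sketch in a throwaway repository (temp directory and `demo` identity are assumptions); the resolution chosen here, keeping both lines, is only one possible choice, and `git diff --name-only --diff-filter=U` lists the files that are still unmerged.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com \
       GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

T=$(mktemp -d)
cd "$T"
git init -q -b main
echo "init content" >> file.txt
git add file.txt
git commit -q -m "init file"
git checkout -q -b feature
echo "modified by feature branch" >> file.txt
git commit -q -am "add new content in file"
git checkout -q main
echo "modified by main branch" >> file.txt
git commit -q -am "add new functionality in main"

# Both branches appended after the same line, so the merge stops
git merge feature || echo "merge stopped: resolve the conflict"
git status --short                        # UU file.txt
git diff --name-only --diff-filter=U      # unmerged paths: file.txt
cat file.txt                              # contains the conflict markers

# Resolve by hand (here: keep both lines, main's first), then stage and commit
printf 'init content\nmodified by main branch\nmodified by feature branch\n' > file.txt
git add file.txt
git commit -q -m "Merge branch 'feature': resolve conflicts in file.txt"
git --no-pager log --oneline --graph
```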

Rebase Conflict

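First, make sure you have edited the conflicting file (file.txt), removed the conflict markers (<<<<<<<, =======, >>>>>>>), and saved the content you actually want.

This is exactly the same as for merge above.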

1) Stage the resolved file:

Bash
git add file.txt

2) Continue the rebase:

Bash
git rebase --continue

abort in rebase

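You may need to resolve conflicts several times during a single rebase, because it replays your commits one by one and each of them can conflict.

To give up on this rebase and return to where you were before it started: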

Bash
git rebase --abort

Careful: this returns to the very beginning (before the rebase started), not to the previous step!

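3) After the rebase is finished, if the branch was already on the remote, you must force-push, because its history has been rewritten:

Bash
git push -f origin <branch-name>
# safer: only overwrites the remote branch if it is still where your last fetch saw it
git push --force-with-lease origin <branch-name>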
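A sketch of the rebase side with the same conflicting file.txt example: a first attempt is aborted (HEAD returns to where the rebase started), a second one is resolved and continued, and the rewritten branch is force-pushed with `--force-with-lease`. A local bare repository standing in for origin, the temp paths and the `demo` identity are assumptions of the demo; `GIT_EDITOR=true` keeps `git rebase --continue` from opening an editor.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com \
       GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

T=$(mktemp -d)
cd "$T"
git init -q --bare -b main remote.git          # hypothetical stand-in for origin
git init -q -b main work
cd work
git remote add origin "$T/remote.git"

echo "init content" >> file.txt
git add file.txt
git commit -q -m "init file"
git checkout -q -b feature
echo "modified by feature branch" >> file.txt
git commit -q -am "add new content in file"
git push -q -u origin feature                  # feature is already published
git checkout -q main
echo "modified by main branch" >> file.txt
git commit -q -am "add new functionality in main"

git checkout -q feature
BEFORE=$(git rev-parse HEAD)

# Attempt 1: stop on the conflict, then abort; HEAD goes back to BEFORE
git rebase main || echo "rebase stopped on a conflict"
git rebase --abort
AFTER_ABORT=$(git rev-parse HEAD)

# Attempt 2: resolve, stage, continue
git rebase main || true
printf 'init content\nmodified by main branch\nmodified by feature branch\n' > file.txt
git add file.txt
GIT_EDITOR=true git rebase --continue           # keep the original commit message

# History was rewritten, so a plain git push would be rejected here
git push -q --force-with-lease origin feature
```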

Git-LFS¶

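Git Large File Storage (LFS) is a Git extension for managing large files such as video, audio, and images.

git-lfs website

git-lfs official documentation

It stores large files on a server outside the Git repository and keeps pointers to them in the repository, which solves Git's inefficiency in handling large files.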

Why Git-LFS¶

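[Figure: how Git LFS stores large files]

This figure captures the design idea of git-lfs:

Very large files are actually stored in git-lfs remote storage (hosted remotely, e.g. by GitHub, with a usage quota; beyond it you have to pay), while the local repository stores only a pointer to each file. This greatly reduces the size of the local repository and makes Git much faster.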

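[Figure: opening image of the git-lfs website]

This is the opening image of the git-lfs website, and it says it more concisely:

Ordinary source code files are normally not very large, so plain Git is enough for them.

But large files such as file.psd, or media and document files such as .pdf / .mp4, are where git-lfs comes in.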

How it works¶

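[Figure: how Git LFS works]

Git LFS manages large files by replacing them with pointer files:

The local repository holds only pointer files, which are tiny (usually under 1 KB)
The actual content of the large files is stored on the Git LFS server

A pointer file contains three key fields: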

Text Only
version https://git-lfs.github.com/spec/v1
oid sha256:[unique SHA-256 hash of the file content]
size [file size in bytes]
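To make the three fields concrete, here is a sketch that builds the same pointer text by hand for a dummy 1 MiB file. It is an illustration, not how git-lfs is invoked; it assumes GNU `sha256sum` (on macOS, `shasum -a 256` prints the same hash). If git-lfs is installed, `git lfs pointer --file=big.bin` should produce an equivalent pointer.

```shell
set -eu
T=$(mktemp -d)
cd "$T"
head -c 1048576 /dev/zero > big.bin          # 1 MiB dummy "large file"

OID=$(sha256sum big.bin | cut -d' ' -f1)     # 64 hex characters
SIZE=$(wc -c < big.bin | tr -d ' ')          # size in bytes
printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%s\nsize %s\n' \
  "$OID" "$SIZE" > big.bin.pointer

cat big.bin.pointer
wc -c < big.bin.pointer    # 132 bytes here, versus 1048576 for the real file
```

The 132 bytes match the pointer size quoted in the Storage section below: only this text goes into Git history, whatever the size of the real file.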

Process¶

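When adding a file:
- When you run git add, Git LFS's clean filter replaces the file's content with a pointer file
- The actual content is stored in the local Git LFS cache (the .git/lfs/objects directory)
When pushing to the remote:
- When you run git push, Git LFS's pre-push hook is triggered
- The large file content is uploaded from the local LFS cache directly to the remote Git LFS storage server
- while ==the pointer files are pushed to the regular Git repository==
When cloning and checking out:
- The cloned Git history contains only pointer files, not the large file content
- Git LFS downloads the actual content of a file version from the LFS server only when that version is checked out (during a clone, that means just the files in the checked-out commit)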

Storage¶

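Local repository:
- Git history stores only pointer files (about 132 bytes each)
- The actual files live under .git/lfs/objects
- Only the file versions needed for your work so far are kept
Remote repository:
- The Git repository stores the pointer files
- The LFS server stores the actual large file content
- All historical versions of the files are kept

Advantages of this design:

Significantly smaller Git repository
Faster clone and pull operations
Only the file versions you actually need are downloaded

Disadvantages:

Depends on network access to the LFS server
The local repository is no longer a complete copy of the repository; an LFS server is additionally required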

How to Use¶

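Setting up git-lfs on macOS/Linux is simple: install the git-lfs package with your package manager, then run git lfs install once (it registers the LFS filters and hooks in your Git config):

Bash
brew install git-lfs        # macOS (Homebrew)
sudo apt install git-lfs    # Debian/Ubuntu
git lfs install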

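git lfs install does not create anything inside the repository. The .gitattributes file, which configures which files are managed by git-lfs, is created by git lfs track.

Track the large files you need. A pattern can be a file extension (a group of files) or a single file name:

Bash
# track all .pdf files in this repo
git lfs track "*.pdf"

# track a specific file
git lfs track "bxhu-handsome.pdf"

# `.gitattributes` now exists in the repo root
ls -a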
Now check the .gitattributes file and you'll see the new rule:

Text Only
❯ cat .gitattributes
*.pdf filter=lfs diff=lfs merge=lfs -text

Also add the .gitattributes file itself:

Bash
git add .gitattributes
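To double-check which paths a rule actually catches, `git check-attr` reads .gitattributes directly. In this sketch the rule is written by hand (it is the line `git lfs track "*.pdf"` produced above), so it runs even where git-lfs is not installed; the file names paper.pdf and notes.txt are hypothetical.

```shell
set -eu
T=$(mktemp -d)
cd "$T"
git init -q -b main
# Same line that `git lfs track "*.pdf"` writes
echo '*.pdf filter=lfs diff=lfs merge=lfs -text' > .gitattributes
touch paper.pdf notes.txt

git check-attr filter text -- paper.pdf notes.txt
# paper.pdf: filter: lfs
# paper.pdf: text: unset
# notes.txt: filter: unspecified
# notes.txt: text: unspecified
```

Any path that reports `filter: lfs` will be stored as a pointer on the next `git add`.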

Not Retroactive!

Note that defining the file types Git LFS should track will not, by itself, convert any pre-existing files to Git LFS, such as files on other branches or in your prior commit history. To do that, use the git lfs migrate command, which has a range of options designed to suit various potential use cases.

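If I already committed some large files before enabling git-lfs, adding git-lfs afterwards will not convert them retroactively.

I need to run the git lfs migrate command separately to convert those large files into git-lfs format.

For a more recommended approach, see: here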

Now you can add, commit and push (a-c-p) as usual:

Bash
git add .
git commit -m "Add pdf with git-lfs"
git push origin main

TL;DR¶

Bash
git lfs track "*.pdf"
git add .gitattributes
git add .
git commit -m "Add pdf with git-lfs"
git push origin main

Git Submodules¶

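In large projects, especially when working on a paper, there are often several relatively independent experiments. Our usual practice is:

Keep each independent experiment in its own git repository and work on each part separately. When the paper's code is to be open-sourced as a whole, gather all experiments into one "superproject", with each experiment as a submodule of it.

Let's walk through a real example from my INFOCOM 2026 project.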

Task¶

In the star-alliance-core repository, add star-alliance-webrtc and star-alliance-webrtc-exp as submodules.

Bash
star-alliance-core: https://github.com/root-hbx/star-alliance-core
star-alliance-webrtc: https://github.com/root-hbx/star-alliance-webrtc
star-alliance-webrtc-exp: https://github.com/root-hbx/star-alliance-webrtc-exp

Adding "submodules" to the "superproject"¶

Bash
# enter the superproject that will collect everything
cd star-alliance-core

# use `git submodule add` with each repository's SSH URL
# format: git submodule add <URL> <PATH>
git submodule add git@github.com:root-hbx/star-alliance-webrtc.git star-alliance-webrtc
git submodule add git@github.com:root-hbx/star-alliance-webrtc-exp.git star-alliance-webrtc-exp

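After adding the submodules, you'll notice new entries in the working tree: a .gitmodules file and the two submodule directories you just added.

Now commit these changes to the "superproject":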

Bash
git add .gitmodules star-alliance-webrtc-exp star-alliance-webrtc
git commit -m "feat: Add webrtc and webrtc-exp submodules"
git push

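Strictly speaking, the process is finished at this point, but a few questions deserve more thought:

How does a "submodule" git pull on its own?
How does git pull in the "superproject" relate to git pull in a "submodule"?
After someone else clones star-alliance-core, do they automatically get the submodules' content? In other words, is a submodule "content directly included in" the superproject, or "a pointer held by" the superproject?

A "submodule" is the "superproject's" pointer to "one specific commit" of an independent repository¶

When you update the submodule's own repository (star-alliance-webrtc-exp) independently, the superproject (star-alliance-core) does not learn about the update automatically.

The superproject only records one specific commit ID for each submodule. What you need to do is go into the superproject and tell it to point to the submodule's latest commit ⚠️

As shown in the figure:

[Figure: star-alliance-core on GitHub, listing star-alliance-webrtc @ 86fdcab]

For example, the star-alliance-webrtc submodule points to its commit 86fdcab.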
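The "pointer" can be seen directly with Git's own commands. A sketch with local stand-in repositories; the names webrtc and core, the temporary HOME and the `protocol.file.allow` setting are assumptions of the demo (recent Git versions refuse to clone submodules from local paths without that setting). The superproject's tree stores the submodule as an entry of mode 160000 and type commit, whose id is the submodule's checked-out commit.

```shell
set -eu
T=$(mktemp -d)
cd "$T"
export HOME="$T"     # throwaway HOME: the --global settings stay inside this demo
git config --global user.name demo
git config --global user.email demo@example.com
git config --global protocol.file.allow always   # allow submodules from local paths

# Hypothetical local stand-ins for star-alliance-webrtc and star-alliance-core
git init -q -b main webrtc
git -C webrtc commit -q --allow-empty -m "webrtc: first commit"
git init -q -b main core
cd core
git commit -q --allow-empty -m "core: first commit"
git submodule add "$T/webrtc" webrtc
git commit -q -m "feat: add webrtc submodule"

# The superproject stores only a "gitlink": mode 160000, type commit, commit id
git ls-tree HEAD webrtc
git submodule status
```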

Suppose you pushed new code to star-alliance-webrtc-exp from another machine. Now, in the star-alliance-core repository, do the following:

Bash
cd star-alliance-webrtc-exp
# this directory is itself a complete Git repository, so you can run git pull right here
# fetch and merge the latest changes from the remote
git pull origin main

Done!

Now run git status in the star-alliance-core repository and you'll see something like this:

Bash
On branch main
Your branch is up to date with 'origin/main'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   star-alliance-webrtc-exp (new commits)

no changes added to commit (use "git add" and/or "git commit -a")

Look closely: modified: star-alliance-webrtc-exp (new commits) tells you explicitly that Git has detected that star-alliance-webrtc-exp now points to a new commit.

Now commit this "pointer" change to the superproject:

Bash
git add star-alliance-webrtc-exp
git commit -m "[BRANCH] feat (webrtc-exp): Update submodule to latest commit"
git push
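A sketch of the same "move the pointer forward" flow with a local stand-in repository (the temp HOME, `protocol.file.allow` setting and paths are assumptions of the demo). Instead of `cd` + `git pull`, it uses `git submodule update --remote`, which fetches the submodule's remote and checks out the branch recorded by `git submodule add -b main`; the pointer you commit afterwards is the same.

```shell
set -eu
T=$(mktemp -d)
cd "$T"
export HOME="$T"     # throwaway HOME: the --global settings stay inside this demo
git config --global user.name demo
git config --global user.email demo@example.com
git config --global protocol.file.allow always   # allow submodules from local paths

# Hypothetical local stand-in for star-alliance-webrtc-exp
git init -q -b main webrtc-exp
git -C webrtc-exp commit -q --allow-empty -m "exp: v1"

git init -q -b main core
cd core
git commit -q --allow-empty -m "core: first commit"
# -b main records (in .gitmodules) which branch `update --remote` follows
git submodule add -b main "$T/webrtc-exp" webrtc-exp
git commit -q -m "feat: add webrtc-exp submodule"

# "Another machine" adds a commit to the submodule's own repository
git -C "$T/webrtc-exp" commit -q --allow-empty -m "exp: v2"

# One-step alternative to: cd webrtc-exp && git pull origin main
git submodule update --remote webrtc-exp
git status --short           # webrtc-exp is reported as modified

# Record the new pointer in the superproject
git add webrtc-exp
git commit -q -m "feat (webrtc-exp): Update submodule to latest commit"
```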

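TL;DR

At this point, the first two questions have been answered:

How does a "submodule" git pull on its own?
How does git pull in the "superproject" relate to git pull in a "submodule"?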

Cloning a repository that has submodules¶

Bash
# clone as usual:
git clone https://github.com/root-hbx/star-alliance-core
cd star-alliance-core

# the "pointers" are here now, but the content still has to be fetched:
# initialize + update the submodules
git submodule init
git submodule update

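PS: Development tip

Bash
# initialize + update the submodules
git submodule init
git submodule update

I recommend combining them into one:

Bash
git submodule update --init --recursive

because the former does not handle nested submodules (submodules of submodules) ⚠️

If your star-alliance-webrtc submodule contains a submodule of its own, those two commands know nothing about it...

--recursive saves you the worry. If you haven't cloned yet, git clone --recurse-submodules <URL> does the clone and the recursive initialization in one step.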
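A sketch that shows the difference with a hypothetical three-level chain core → webrtc → codec built from local repositories (all names, the temp HOME and the `protocol.file.allow` setting are assumptions of the demo): a plain clone leaves the submodule directory empty, init + update fills webrtc but not its nested codec, and `--recursive` (or `clone --recurse-submodules`) fills everything.

```shell
set -eu
T=$(mktemp -d)
cd "$T"
export HOME="$T"     # throwaway HOME: the --global settings stay inside this demo
git config --global user.name demo
git config --global user.email demo@example.com
git config --global protocol.file.allow always   # allow submodules from local paths

# Hypothetical chain: core -> webrtc -> codec (codec is a nested submodule)
git init -q -b main codec
echo "codec" > codec/README
git -C codec add README
git -C codec commit -q -m "codec"

git init -q -b main webrtc
git -C webrtc commit -q --allow-empty -m "webrtc: init"
git -C webrtc submodule add "$T/codec" codec
git -C webrtc commit -q -m "webrtc with nested codec"

git init -q -b main core
git -C core commit -q --allow-empty -m "core: init"
git -C core submodule add "$T/webrtc" webrtc
git -C core commit -q -m "core with webrtc"

# Plain clone: the submodule directory exists but is empty
git clone -q "$T/core" clone-plain
PLAIN_BEFORE=$(ls -A clone-plain/webrtc)

# init + update: webrtc is filled in, but its nested codec stays empty
git -C clone-plain submodule init
git -C clone-plain submodule update
NESTED_AFTER_PLAIN=$(ls -A clone-plain/webrtc/codec)

# --init --recursive also fills the nested submodule
git -C clone-plain submodule update --init --recursive

# Or do everything at clone time
git clone -q --recurse-submodules "$T/core" clone-rec
```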

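Development workflow conventions¶

From the sections above we now know:

How to add submodules to an existing repository
How to clone a repository that has submodules
How to sync the "submodule inside the superproject" after the submodule's own independent repository has been updated

One question remains: what if I modify a "submodule's" content from inside the "superproject"?

Can the change be synced back to the original "independent repository"? If so, how?
After the "submodule's" content has been modified, how should other users sync?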

Syncing back to the original "independent repository"

Bash
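cd star-alliance-webrtc-exp
# after `git submodule update` a submodule sits on a detached HEAD; get onto a branch first
git checkout main
# edit files as in any ordinary repository
git add .
git commit -m "feat: Add new feature from within the core project"
# the new commit is now in the submodule's local history;
# push it to star-alliance-webrtc-exp's own, original independent repository
git push origin main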

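Now go back to the "superproject" and you'll notice: the submodule's content has been updated, but the pointer does not yet point to its latest commit ⚠️

Therefore: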

Bash
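# stage the submodule change (this actually updates its commit pointer)
git add star-alliance-webrtc-exp
# commit this update to the superproject
git commit -m "chore: Update webrtc-exp submodule to latest commit"
# finally, push the superproject's update to the remote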
git push
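A sketch of this round trip with a local bare repository standing in for star-alliance-webrtc-exp on GitHub (the bare repo, the seed repo used to give it a first commit, the temp HOME and the `protocol.file.allow` setting are assumptions of the demo). It commits inside the submodule, pushes to the submodule's own repository, then records the new pointer in the superproject.

```shell
set -eu
T=$(mktemp -d)
cd "$T"
export HOME="$T"     # throwaway HOME: the --global settings stay inside this demo
git config --global user.name demo
git config --global user.email demo@example.com
git config --global protocol.file.allow always   # allow submodules from local paths

# Hypothetical bare repo standing in for star-alliance-webrtc-exp on GitHub
git init -q --bare -b main exp.git
git init -q -b main seed
git -C seed commit -q --allow-empty -m "exp: v1"
git -C seed push -q "$T/exp.git" main

git init -q -b main core
cd core
git commit -q --allow-empty -m "core: first commit"
git submodule add "$T/exp.git" star-alliance-webrtc-exp
git commit -q -m "feat: add webrtc-exp submodule"

# Work inside the submodule and push to its own repository
cd star-alliance-webrtc-exp
git checkout -q main
echo "new feature" > feature.txt
git add feature.txt
git commit -q -m "feat: Add new feature from within the core project"
git push -q origin main

# Back in the superproject: the pointer has moved but is not committed yet
cd ..
git status --short
git add star-alliance-webrtc-exp
git commit -q -m "chore: Update webrtc-exp submodule to latest commit"
```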

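How other users sync after a "submodule's" content changes

Collaborators need to fetch all the updates I just made (both the superproject update and the submodule update).

(1) Update the superproject: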

Bash
cd star-alliance-core
git pull

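Now their local superproject also knows that the star-alliance-webrtc-exp submodule should point to a new commit. However, the actual code in the submodule directory has not been updated yet.

(2) Update the submodule code: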

Bash
git submodule update --init --recursive

This command updates each submodule's actual code to the commit recorded by the superproject's latest pointer 👍
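A sketch of both sides with local bare repositories standing in for the two GitHub repositories ("me", "collab", the bare repos, the temp HOME and the `protocol.file.allow` setting are all assumptions of the demo). After `git pull` the collaborator's submodule still sits on the old commit; `git submodule update --init --recursive` moves it to the new pointer.

```shell
set -eu
T=$(mktemp -d)
cd "$T"
export HOME="$T"     # throwaway HOME: the --global settings stay inside this demo
git config --global user.name demo
git config --global user.email demo@example.com
git config --global protocol.file.allow always   # allow submodules from local paths

# Hypothetical bare repos standing in for the two GitHub repositories
git init -q --bare -b main exp.git
git init -q --bare -b main core.git
git init -q -b main seed
git -C seed commit -q --allow-empty -m "exp: v1"
git -C seed push -q "$T/exp.git" main

# Me: create the superproject with the submodule and publish it
git init -q -b main me
cd me
git remote add origin "$T/core.git"
git commit -q --allow-empty -m "core: first commit"
git submodule add "$T/exp.git" webrtc-exp
git commit -q -m "feat: add webrtc-exp submodule"
git push -q -u origin main
cd ..

# Collaborator: clones once, submodules included
git clone -q --recurse-submodules "$T/core.git" collab

# Me: commit inside the submodule, push it, then push the new pointer
cd me/webrtc-exp
git checkout -q main
git commit -q --allow-empty -m "exp: v2"
git push -q origin main
cd ..
git add webrtc-exp
git commit -q -m "chore: Update webrtc-exp submodule to latest commit"
git push -q
cd ..

# Collaborator: (1) pull the superproject, (2) check out the new pointer
cd collab
git pull -q
BEFORE_SYNC=$(git -C webrtc-exp rev-parse HEAD)   # still the old commit
git submodule update --init --recursive
```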

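Why not just cd + git pull here?

| | git submodule update (correct) | cd ...; git pull (wrong) |
|---|---|---|
| What it does | Syncs to the version the superproject specifies | Updates to the submodule's own latest version |
| Risk | Low, reproducible | High, may pull in untested code |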

Git Branch¶

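How to rename a branch "gracefully"
How to set a branch as the "default branch"

Renaming the dev branch to artifacts (locally and on the remote)¶

Rename it locally first, then push it to the remote, and finally delete the old branch on the remote.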

(1) Make sure you are on the dev branch and pull the latest code

Bash
git checkout dev
git pull origin dev

(2) Rename dev to artifacts locally (-m)

Bash
git branch -m dev artifacts

Your local dev branch has now become artifacts (the old local dev no longer exists).

(3) Push the new artifacts branch to the remote

Bash
git push -u origin artifacts

This creates a new artifacts branch on the remote origin
and links the local branch to it as its upstream (-u)

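(4) The remote now has both dev and artifacts, so delete the old dev branch. If dev is the remote's default branch (e.g. on GitHub), make artifacts the default first, because the remote refuses to delete its default branch.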

Bash
git push origin --delete dev

Done!
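The four steps can be rehearsed against a local bare repository standing in for origin (an assumption of this sketch, as are the temp paths and the `demo` identity). At the end the remote has only main and artifacts, and the local artifacts branch tracks origin/artifacts.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com \
       GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

T=$(mktemp -d)
cd "$T"
git init -q --bare -b main remote.git        # hypothetical stand-in for origin
git init -q -b main work
cd work
git remote add origin "$T/remote.git"
git commit -q --allow-empty -m "init"
git push -q -u origin main
git checkout -q -b dev
git commit -q --allow-empty -m "dev work"
git push -q -u origin dev

# Steps (1)-(4) from above
git checkout -q dev
git pull -q origin dev
git branch -m dev artifacts
git push -q -u origin artifacts
git push -q origin --delete dev

git ls-remote --heads origin                        # only main and artifacts remain
git rev-parse --abbrev-ref 'artifacts@{upstream}'   # origin/artifacts
```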

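Setting a branch as the "default branch"¶

Simply do this in the repository's settings on GitHub:

[Figure: GitHub repository settings, default branch]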