Convert a Git folder to a submodule retroactively?


115

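Quite often you are writing a project of some sort, and after a while it becomes clear that some component of the project is actually useful as a standalone component (a library, perhaps). If you've had that idea from early on, there's a fair chance that most of that code already lives in its own folder.

Is there a way to convert one of the subdirectories in a Git project into a submodule?

Ideally this would happen such that all of the code in that directory is removed from the parent project, the submodule project is added in its place with all the appropriate history, and all the parent project commits point to the correct submodule commits.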



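It's not part of the original question, but what would be even cooler is a way to keep the history of files that started outside the folder and were later moved into it. At the moment, all of the answers lose any history from before the move.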
naught101

2
@ggll's link is dead. Here's an archived copy.
s3cur3

Answers:


84

To isolate a subdirectory into its own repository, use filter-branch on a clone of the original repository:

git clone <your_project> <your_submodule>
cd <your_submodule>
git filter-branch --subdirectory-filter 'path/to/your/submodule' --prune-empty -- --all

Then it's nothing more than deleting the original directory and adding the submodule to your parent project.
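A minimal end-to-end sketch of the steps above, run against a throwaway repository so that nothing real is touched. The names `parent`, `lib` and `lib-repo` and the sample files are assumptions made up for the illustration. `protocol.file.allow=always` is only needed because this sketch adds the submodule from a local path, and `FILTER_BRANCH_SQUELCH_WARNING` merely silences the deprecation notice of filter-branch.

```shell
#!/bin/sh
set -e

# Throwaway sandbox; all repository and file names below are hypothetical.
work=$(mktemp -d)
cd "$work"

# Identity for the demo commits, so no global git config is required.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

# A parent project with a "lib" subdirectory and two commits touching it.
git init -q parent
cd parent
mkdir lib
echo 'v1' > lib/util.txt
echo 'app' > main.txt
git add .
git commit -q -m 'initial import'
echo 'v2' > lib/util.txt
git commit -q -am 'update lib'
cd ..

# Step 1: isolate lib/ in a clone, keeping only its history.
git clone -q parent lib-repo
cd lib-repo
git filter-branch --subdirectory-filter lib --prune-empty -- --all >/dev/null 2>&1
git remote rm origin
cd ..

# Step 2: drop the directory from the parent and add the submodule in its place.
cd parent
git rm -q -r lib
git commit -q -m 'remove lib directory'
git -c protocol.file.allow=always submodule --quiet add "$work/lib-repo" lib
git commit -q -m 'add lib as submodule'

# The parent now records lib as a gitlink (mode 160000),
# and lib-repo carries the two commits that touched lib/.
lib_mode=$(git ls-tree HEAD lib | cut -d' ' -f1)
lib_commits=$(git -C "$work/lib-repo" rev-list --count HEAD)
echo "mode=$lib_mode commits=$lib_commits"
```

In a real project the submodule URL would normally be a remote you pushed the filtered clone to, rather than a local path. Note that older commits of the parent still contain the directory as plain files.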


18
You probably also want to git remote rm <name> after the filter-branch, and then perhaps add a new remote. Also, if there are ignored files, a git clean -xd -f may be useful.
naught101

-- --all can be replaced with the name of a branch if the submodule should be extracted from that branch only.
adius 2015

Does git clone <your_project> <your_submodule> download only the files for your_submodule?
Dominic

@DominicTobias: git clone source destination simply tells Git where to place the cloned files. The actual magic of filtering out your submodule's files then happens in the filter-branch step.
knittl

filter-branch is now deprecated. You can use git clone --filter, but your Git server must be configured to allow filtering, otherwise you'll get warning: filtering not recognized by server, ignoring
Matthias Braun

24

First change directory to the folder that will become the submodule. Then:

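# Inside the folder: turn it into its own repository and publish it
git init
git remote add origin <repo-url>
git add .
git commit -am 'first commit in submodule'
git push -u origin master
# Back in the parent: remove the plain folder and commit the deletion
cd ..
rm -rf <folder-which-will-be-a-submodule>
git commit -am 'deleting folder'
# Re-add the same path as a submodule
git submodule add <repo-url> <folder-which-will-be-a-submodule>
git commit -am 'adding submodule'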

9
This will lose all of the history for that folder.
naught101 '16

6
The history of the folder is kept in the main repository, and new commits are saved in the submodule's history.
zednight

11

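I know this is an old thread, but the answers here squash any related commits in other branches.

A simple way to clone and keep all those extra branches and commits:

1 - Make sure you have this git alias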

git config --global alias.clone-branches '! git branch -a | sed -n "/\/HEAD /d; /\/master$/d; /remotes/p;" | xargs -L1 git checkout -t'
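To see which branches the alias would check out, the `sed` filter can be run on its own against some sample `git branch -a` output. The branch names below are made up for the illustration; the filter expression is the one from the alias.

```shell
#!/bin/sh
# Sample `git branch -a` output (hypothetical branch names).
sample='* master
  remotes/origin/HEAD -> origin/master
  remotes/origin/develop
  remotes/origin/feature-x
  remotes/origin/master'

# Same sed program as in the alias: drop the HEAD pointer and the remote
# master line, then print only the remaining remote-tracking branches.
selected=$(printf '%s\n' "$sample" | sed -n '/\/HEAD /d; /\/master$/d; /remotes/p;')

# In the alias, xargs -L1 then runs `git checkout -t <branch>` once per line.
printf '%s\n' "$selected"
```

Only `remotes/origin/develop` and `remotes/origin/feature-x` survive the filter, so those are the branches that would get a local tracking branch. The remote master is skipped, presumably because a fresh clone already has it checked out.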

2 - Clone the remote, pull all branches, change the remote, filter your directory, push

git clone git@github.com:user/existing-repo.git new-repo
cd new-repo
git clone-branches
git remote rm origin
git remote add origin git@github.com:user/new-repo.git
git remote -v
git filter-branch --subdirectory-filter my_directory/ -- --all
git push --all
git push --tags

1
My original had a link to a gist instead of embedding the code here on SO
oodavid

1

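It can be done, but it's not simple. If you search for git filter-branch, subdirectory and submodule, there are some decent write-ups on the process. It essentially entails creating two clones of your project, using git filter-branch to remove everything except the one subdirectory in one clone, and to remove only that subdirectory in the other. Then you can establish the first repository (the one holding only the subdirectory) as a submodule of the second.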
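A rough sketch of the two-clone approach described above, using a throwaway repository. The names `project`, `sub`, `only-sub` and `without-sub` are assumptions for the illustration and not part of the original answer. This sketch only splits the histories; it does not make old parent commits point at matching submodule commits.

```shell
#!/bin/sh
set -e

# Throwaway sandbox; all names below are hypothetical.
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

# Sample project: one file at the top level, one inside sub/.
git init -q project
cd project
mkdir sub
echo 'lib' > sub/lib.txt
echo 'app' > app.txt
git add .
git commit -q -m 'initial import'
cd ..

# Clone 1: keep only the contents of sub/ (it becomes the repository root).
git clone -q project only-sub
git -C only-sub filter-branch --subdirectory-filter sub --prune-empty \
    -- --all >/dev/null 2>&1

# Clone 2: remove sub/ from every commit and keep everything else.
git clone -q project without-sub
git -C without-sub filter-branch \
    --index-filter 'git rm -r -q --cached --ignore-unmatch sub' \
    --prune-empty -- --all >/dev/null 2>&1

# Inspect what each rewritten history contains at HEAD.
only_sub_files=$(git -C only-sub ls-tree -r --name-only HEAD)
without_sub_files=$(git -C without-sub ls-tree -r --name-only HEAD)
echo "only-sub:    $only_sub_files"
echo "without-sub: $without_sub_files"
```

From here, `only-sub` would be pushed somewhere and added to `without-sub` with `git submodule add`.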


0

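Status quo

Let's assume we have a repository called repo-old which contains a subdirectory sub that we would like to convert into a submodule with its own repo repo-sub.

It is further intended that the original repo repo-old should be converted into a modified repo repo-new, where all commits touching the previously existing subdirectory sub now point to the corresponding commits of our extracted submodule repo repo-sub.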

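Let's change

It is possible to achieve this with the help of git filter-branch in a two-step process:

  1. Subdirectory extraction from repo-old to repo-sub (already mentioned in the accepted answer)
  2. Subdirectory replacement from repo-old to repo-new (with proper commit mapping)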

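Remark: I am aware that this question is old and that it has been mentioned that git filter-branch is deprecated and might be dangerous. On the other hand, it might help others with personal repositories that are easy to validate after conversion. So be warned! And please let me know if there is any other tool that does the same thing without being deprecated and is safe to use!

I'll explain below how I realized both steps on Linux with git version 2.26.2. Older versions might work to some extent, but that needs to be tested.

For simplicity I'll restrict myself to the case where there is only a master branch and an origin remote in the original repo repo-old. Also be warned that I rely on temporary git tags with the prefix temp_, which are removed in the process. So if there are already tags named similarly, you might want to adjust the prefix below. Finally, please be aware that I have not tested this extensively and that there might be corner cases where the recipe fails. So please back up everything before proceeding!

The following bash snippets can be concatenated into one big script, which should then be executed in the same folder where the repo repo-old lives. Copying and pasting everything directly into a command window is not recommended (even though I have tested this successfully)!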

0. Preparation

Variables

# Root directory where repo-old lives
# and a temporary location for git filter-branch
root="$PWD"
temp='/dev/shm/tmp'

# The old repository and the subdirectory we'd like to extract
repo_old="$root/repo-old"
repo_old_directory='sub'

# The new submodule repository, its url
# and a hash map folder which will be populated
# and later used in the filter script below
repo_sub="$root/repo-sub"
repo_sub_url='https://github.com/somewhere/repo-sub.git'
repo_sub_hashmap="$root/repo-sub.map"

# The new modified repository, its url
# and a filter script which is created as heredoc below
repo_new="$root/repo-new"
repo_new_url='https://github.com/somewhere/repo-new.git'
repo_new_filter="$root/repo-new.sh"

Filter script

# The index filter script which converts our subdirectory into a submodule
cat << EOF > "$repo_new_filter"
#!/bin/bash

# Submodule hash map function
sub ()
{
    local old_commit=\$(git rev-list -1 \$1 -- '$repo_old_directory')

    if [ ! -z "\$old_commit" ]
    then
        echo \$(cat "$repo_sub_hashmap/\$old_commit")
    fi
}

# Submodule config
SUB_COMMIT=\$(sub \$GIT_COMMIT)
SUB_DIR='$repo_old_directory'
SUB_URL='$repo_sub_url'

# Submodule replacement
if [ ! -z "\$SUB_COMMIT" ]
then
    touch '.gitmodules'
    git config --file='.gitmodules' "submodule.\$SUB_DIR.path" "\$SUB_DIR"
    git config --file='.gitmodules' "submodule.\$SUB_DIR.url" "\$SUB_URL"
    git config --file='.gitmodules' "submodule.\$SUB_DIR.branch" 'master'
    git add '.gitmodules'

    git rm --cached -qrf "\$SUB_DIR"
    git update-index --add --cacheinfo 160000 \$SUB_COMMIT "\$SUB_DIR"
fi
EOF
chmod +x "$repo_new_filter"
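The key line in the filter script is the `git update-index --add --cacheinfo 160000 ...` call: mode `160000` marks an index entry as a gitlink, which is how Git records a submodule commit. A small sketch in a scratch repository shows the effect in isolation. The repository names `sub-src` and `parent` and the sample file are made up here and are not part of the recipe.

```shell
#!/bin/sh
set -e

# Throwaway sandbox; all names below are hypothetical.
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A stand-in for the extracted submodule repository.
git init -q sub-src
git -C sub-src commit -q --allow-empty -m 'submodule commit'
sub_commit=$(git -C sub-src rev-parse HEAD)

# A parent repository that still tracks sub/ as ordinary files.
git init -q parent
cd parent
mkdir sub
echo 'data' > sub/file.txt
git add sub
git commit -q -m 'sub as plain directory'

# Same two index operations as in the filter script:
# drop the tracked files, then record a gitlink at the same path.
git rm --cached -qrf sub
git update-index --add --cacheinfo 160000 "$sub_commit" sub

# The index entry for sub now has mode 160000 and the submodule commit id.
entry=$(git ls-files -s sub)
echo "$entry"
```

Because an index filter only touches the index, the same two commands can run once per commit during `git filter-branch --index-filter` without checking out any files.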

1. Subdirectory extraction

cd "$root"

# Create a new clone for our new submodule repo
git clone "$repo_old" "$repo_sub"

# Enter the new submodule repo
cd "$repo_sub"

# Remove the old origin remote
git remote remove origin

# Loop over all commits and create temporary tags
for commit in $(git rev-list --all)
do
    git tag "temp_$commit" $commit
done

# Extract the subdirectory and slice commits
mkdir -p "$temp"
git filter-branch --subdirectory-filter "$repo_old_directory" \
                  --tag-name-filter 'cat' \
                  --prune-empty --force -d "$temp" -- --all

# Populate hash map folder from our previously created tag names
mkdir -p "$repo_sub_hashmap"
for tag in $(git tag | grep "^temp_")
do
    old_commit=${tag#'temp_'}
    sub_commit=$(git rev-list -1 $tag)

    echo $sub_commit > "$repo_sub_hashmap/$old_commit"
done
# Delete the temporary tags (stdout and stderr both silenced)
git tag | grep "^temp_" | xargs -d '\n' git tag -d > /dev/null 2>&1

# Add the new url for this repository (and e.g. push)
git remote add origin "$repo_sub_url"
# git push -u origin master

2. Subdirectory replacement

cd "$root"

# Create a clone for our modified repo
git clone "$repo_old" "$repo_new"

# Enter the new modified repo
cd "$repo_new"

# Remove the old origin remote
git remote remove origin

# Replace the subdirectory and map all sliced submodule commits using
# the filter script from above
mkdir -p "$temp"
git filter-branch --index-filter "$repo_new_filter" \
                  --tag-name-filter 'cat' --force -d "$temp" -- --all

# Add the new url for this repository (and e.g. push)
git remote add origin "$repo_new_url"
# git push -u origin master

# Cleanup (commented for safety reasons)
# rm -rf "$repo_sub_hashmap"
# rm -f "$repo_new_filter"

Remark: If the newly created repo repo-new hangs during git submodule update --init, try re-cloning the repository recursively once instead:

cd "$root"

# Clone the new modified repo recursively
git clone --recursive "$repo_new" "$repo_new-tmp"

# Now use the newly cloned one
mv "$repo_new" "$repo_new-bak"
mv "$repo_new-tmp" "$repo_new"

# Cleanup (commented for safety reasons)
# rm -rf "$repo_new-bak"

0

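This does the conversion in place, and you can back it out as you would any filter-branch (I use git fetch . +refs/original/*:*).

I have a project with a utils library that has started to be useful in other projects, and I wanted to split its history off into a submodule. I didn't think to look on SO first, so I wrote my own. It builds the history locally, so it's a good bit faster. Afterwards, if you want, you can set up the helper command's .gitmodules file and such, and push the submodule histories themselves anywhere you want.

The stripped command itself is here; the documentation is in the comments of the unstripped one that follows. Run it as its own command with subdir set, like subdir=utils git split-submodule if you're splitting the utils directory. It's hacky because it's a one-off, but I tested it on the Documentation subdirectory in the Git history.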

#!/bin/bash
# put this or the commented version below in e.g. ~/bin/git-split-submodule
${GIT_COMMIT-exec git filter-branch --index-filter "subdir=$subdir; ${debug+debug=$debug;} $(sed 1,/SNIP/d "$0")" "$@"}
${debug+set -x}
fam=(`git rev-list --no-walk --parents $GIT_COMMIT`)
pathcheck=(`printf "%s:$subdir\\n" ${fam[@]} \
    | git cat-file --batch-check='%(objectname)' | uniq`)
[[ $pathcheck = *:* ]] || {
    subfam=($( set -- ${fam[@]}; shift;
        for par; do tpar=`map $par`; [[ $tpar != $par ]] &&
            git rev-parse -q --verify $tpar:"$subdir"
        done
    ))
    git rm -rq --cached --ignore-unmatch  "$subdir"
    if (( ${#pathcheck[@]} == 1 && ${#fam[@]} > 1 && ${#subfam[@]} > 0)); then
        git update-index --add --cacheinfo 160000,$subfam,"$subdir"
    else
        subnew=`git cat-file -p $GIT_COMMIT | sed 1,/^$/d \
            | git commit-tree $GIT_COMMIT:"$subdir" $(
                ${subfam:+printf ' -p %s' ${subfam[@]}}) 2>&-
            ` &&
        git update-index --add --cacheinfo 160000,$subnew,"$subdir"
    fi
}
${debug+set +x}

#!/bin/bash
# Git filter-branch to split a subdirectory into a submodule history.

# In each commit, the subdirectory tree is replaced in the index with an
# appropriate submodule commit.
# * If the subdirectory tree has changed from any parent, or there are
#   no parents, a new submodule commit is made for the subdirectory (with
#   the current commit's message, which should presumably say something
#   about the change). The new submodule commit's parents are the
#   submodule commits in any rewrites of the current commit's parents.
# * Otherwise, the submodule commit is copied from a parent.

# Since the new history includes references to the new submodule
# history, the new submodule history isn't dangling, it's incorporated.
# Branches for any part of it can be made casually and pushed into any
# other repo as desired, so hooking up the `git submodule` helper
# command's conveniences is easy, e.g.
#     subdir=utils git split-submodule master
#     git branch utils $(git rev-parse master:utils)
#     git clone -sb utils . ../utilsrepo
# and you can then submodule add from there in other repos, but really,
# for small utility libraries and such, just fetching the submodule
# histories into your own repo is easiest. Setup on cloning a
# project using "incorporated" submodules like this is:
#   setup:  utils/.git
#
#   utils/.git:
#       @if _=`git rev-parse -q --verify utils`; then \
#           git config submodule.utils.active true \
#           && git config submodule.utils.url "`pwd -P`" \
#           && git clone -s . utils -nb utils \
#           && git submodule absorbgitdirs utils \
#           && git -C utils checkout $$(git rev-parse :utils); \
#       fi
# with `git config -f .gitmodules submodule.utils.path utils` and
# `git config -f .gitmodules submodule.utils.url ./`; cloners don't
# have to do anything but `make setup`, and `setup` should be a prereq
# on most things anyway.

# You can test that a commit and its rewrite put the same tree in the
# same place with this function:
# testit ()
# {
#     tree=($(git rev-parse `git rev-parse $1`: refs/original/refs/heads/$1));
#     echo $tree `test $tree != ${tree[1]} && echo ${tree[1]}`
# }
# so e.g. `testit make~95^2:t` will print the `t` tree there and if
# the `t` tree at ~95^2 from the original differs it'll print that too.

# To run it, say `subdir=path/to/it git split-submodule` with whatever
# filter-branch args you want.

# $GIT_COMMIT is set if we're already in filter-branch, if not, get there:
${GIT_COMMIT-exec git filter-branch --index-filter "subdir=$subdir; ${debug+debug=$debug;} $(sed 1,/SNIP/d "$0")" "$@"}

${debug+set -x}
fam=(`git rev-list --no-walk --parents $GIT_COMMIT`)
pathcheck=(`printf "%s:$subdir\\n" ${fam[@]} \
    | git cat-file --batch-check='%(objectname)' | uniq`)

[[ $pathcheck = *:* ]] || {
    subfam=($( set -- ${fam[@]}; shift;
        for par; do tpar=`map $par`; [[ $tpar != $par ]] &&
            git rev-parse -q --verify $tpar:"$subdir"
        done
    ))

    git rm -rq --cached --ignore-unmatch  "$subdir"
    if (( ${#pathcheck[@]} == 1 && ${#fam[@]} > 1 && ${#subfam[@]} > 0)); then
        # one id same for all entries, copy mapped mom's submod commit
        git update-index --add --cacheinfo 160000,$subfam,"$subdir"
    else
        # no mapped parents or something changed somewhere, make new
        # submod commit for current subdir content.  The new submod
        # commit has all mapped parents' submodule commits as parents:
        subnew=`git cat-file -p $GIT_COMMIT | sed 1,/^$/d \
            | git commit-tree $GIT_COMMIT:"$subdir" $(
                ${subfam:+printf ' -p %s' ${subfam[@]}}) 2>&-
            ` &&
        git update-index --add --cacheinfo 160000,$subnew,"$subdir"
    fi
}
${debug+set +x}
Licensed under cc by-sa 3.0 with attribution required.