How do I clone only certain directories of a git repository?


26

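For example, I want to download PCL's 3d_rec_framework.

This is the PCL git repository: https://github.com/PointCloudLibrary/pcl.git

How do I download just this directory?

https://github.com/PointCloudLibrary/pcl/tree/master/apps

I have already tried running the following, but it did not work: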

sam@sam:~/code/pcl_standalone$ git clone https://github.com/PointCloudLibrary/pcl/tree/master/apps/3d_rec_framework
Cloning into '3d_rec_framework'...
error: The requested URL returned error: 403 while accessing https://github.com/PointCloudLibrary/pcl/tree/master/apps/3d_rec_framework/info/refs
fatal: HTTP request failed
sam@sam:~/code/pcl_standalone$ 

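How can I download it?

By the way, I don't want to download the whole PCL git repository and then delete all the other directories I don't need. That is why I am asking this question.

Thanks~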

Answers:


7

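You can't. With git, you clone the entire repository, along with the repository's complete history.

There are some workarounds for getting single files out of a git archive, listed in a Stack Exchange answer to the same question, but you still have to download the entire repository to get the single file or directory you want.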



3
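@CelticParser So you claim my answer is incorrect, and then go on to point at an answer that requires downloading every file from the git repository in order to get one file?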
dobey '16


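> "I don't want to download PCL's git and delete all the other directories I don't need." That is open-ended. I read it as @sam not wanting to delete the directories manually.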
CelticParser

39

As of git v1.7, dobey's answer is no longer the case. You can now check out only certain folders from a repository. The full instructions are found here.

git init <repo>
cd <repo>
git remote add -f origin <url>

git config core.sparseCheckout true

echo "some/dir/" >> .git/info/sparse-checkout
echo "another/sub/tree" >> .git/info/sparse-checkout

This tells git which directories you want to check out. Then you can pull just those directories:

git pull origin master
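
The steps above can be tried end to end without touching a real remote. The following is a minimal sketch, not part of the original answer: the scratch directory, the repository names server_repo and local_repo, and the file contents are all made up for illustration, and a throwaway local repository stands in for <url>.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical scratch area; a local repository stands in for <url>.
work="$(mktemp -d)"
cd "$work"

# Build a small "server" repository with one wanted and one unwanted directory.
git init --quiet server_repo
cd server_repo
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called master
mkdir -p some/dir other/dir
printf 'wanted\n' > some/dir/file.txt
printf 'unwanted\n' > other/dir/file.txt
git add .
git -c user.name='a' -c user.email='a@example.com' commit --quiet -m 'initial'
cd ..

# The sparse checkout steps from the answer.
git init --quiet local_repo
cd local_repo
git symbolic-ref HEAD refs/heads/master
git remote add -f origin "file://$work/server_repo"
git config core.sparseCheckout true
mkdir -p .git/info                         # normally exists already; harmless if so
echo "some/dir/" >> .git/info/sparse-checkout
git pull --quiet origin master

# List the files in the working tree; only some/dir should have been checked out.
find . -path ./.git -prune -o -type f -print
```

Note that `git remote add -f` still fetches every object of the repository; only the working tree is restricted to the listed directories.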

3
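This implies that all Ubuntu versions have 1.7 available. You should check whether that is the case, and note in your answer which versions are actually in use. Also, PowerShell isn't Ubuntu, so I don't think it should be included.
Thomas Ward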

2
@ThomasW. All currently supported Ubuntu releases include at least git 1.7, and most are now on 2.x.
dobey

4
This still clones the entire repository and then does a sparse checkout.
Clerenz

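@dobey, seriously, you removed useful information from a question that people are likely to find through Google? If I were forced to use PowerShell, I would definitely want to see the details of the piping; they are not obvious! echo "some/dir/" | out-file -encoding ascii .git/info/sparse-checkout echo "another/sub/tree/" | out-file -append -encoding ascii .git/info/sparse-checkout
Samuel Åslund '18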

8

First, do:

git clone --depth 1 [repo root] [name of destination directory]

Then:

cd [name of destination directory]

...and lastly:

git filter-branch --prune-empty --subdirectory-filter [path to sub-dir] HEAD

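It's that easy. Git will rewrite the repository so that it contains only the desired subdirectory. This works even if the subdirectory is several levels deep. Just name the destination directory after the subdirectory, then give the relative path to the subdirectory in the "git filter-branch" command. Oh, and --depth 1 tells git to download only the tip of HEAD (essentially dropping the history).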
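
The three commands above can be exercised against a throwaway repository. This is only a sketch under stated assumptions: the paths apps/wanted and apps/other, the repository names, and the file contents are invented for illustration. It also assumes that filter-branch accepts a shallow clone, as the answer describes.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical scratch area with a small local "server" repository.
work="$(mktemp -d)"
cd "$work"

git init --quiet server_repo
cd server_repo
git symbolic-ref HEAD refs/heads/master
mkdir -p apps/wanted apps/other
printf 'keep me\n' > apps/wanted/file.txt
printf 'drop me\n' > apps/other/file.txt
git add .
git -c user.name='a' -c user.email='a@example.com' commit --quiet -m 'initial'
cd ..

# Shallow clone, named after the subdirectory. file:// is used because
# --depth is ignored when cloning from a plain local path.
git clone --quiet --depth 1 "file://$work/server_repo" wanted
cd wanted

# Newer Git releases print a long warning and pause before filter-branch
# runs; this environment variable silences that.
export FILTER_BRANCH_SQUELCH_WARNING=1
git filter-branch --prune-empty --subdirectory-filter apps/wanted HEAD

# The contents of apps/wanted should now sit at the top of the working tree.
ls
```

The clone step still downloads every file at the tip of the branch; the rewrite only discards the other directories afterwards.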


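This works for downloading one subdirectory, but the question is about multiple directories... does this handle that? I have to say, looking at the documentation, I don't see how it would work.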
Joeppie

Is there an easy way to refresh this directory from time to time?
Clerenz

4

git clone --filter from Git 2.19

This option actually skips fetching most of the unneeded objects from the server:

git clone --depth 1 --no-checkout --filter=blob:none \
  "file://$(pwd)/server_repo" local_repo
cd local_repo
git checkout master -- mydir/myfile

The server should be configured with:

git config --local uploadpack.allowfilter 1
git config --local uploadpack.allowanysha1inwant 1

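As of v2.19.0 there is no server support, but the feature can already be tested locally.

TODO: --filter=blob:none skips all blobs, but still fetches all tree objects. In a normal repository these should be tiny compared to the files themselves, so this is already good enough. Asked at: https://www.spinics.net/lists/git/msg342006.html The developers replied that a --filter=tree:0 is in the works to do that.

Remember that --depth 1 already implies --single-branch, see also: https://stackoverflow.com/questions/1778088/how-to-clone-a-single-branch-in-git

file://$(pwd) is required to overcome git clone protocol shenanigans: https://stackoverflow.com/questions/47307578/how-to-shallow-clone-a-local-git-repository-with-a-relative-path

The format of --filter is documented in man git-rev-list.

An extension was made to the Git remote protocol to support this feature.

Docs in the Git tree:

See also: https://stackoverflow.com/questions/2466735/how-to-checkout-only-one-file-from-git-repository-sparse-checkout

Test it out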

#!/usr/bin/env bash
set -eu

list-objects() (
  git rev-list --all --objects
  echo "master commit SHA: $(git log -1 --format="%H")"
  echo "mybranch commit SHA: $(git log -1 --format="%H")"
  git ls-tree master
  git ls-tree mybranch | grep mybranch
  git ls-tree master~ | grep root
)

# Reproducibility.
export GIT_COMMITTER_NAME='a'
export GIT_COMMITTER_EMAIL='a'
export GIT_AUTHOR_NAME='a'
export GIT_AUTHOR_EMAIL='a'
export GIT_COMMITTER_DATE='2000-01-01T00:00:00+0000'
export GIT_AUTHOR_DATE='2000-01-01T00:00:00+0000'

rm -rf server_repo local_repo
mkdir server_repo
cd server_repo

# Create repo.
git init --quiet
git config --local uploadpack.allowfilter 1
git config --local uploadpack.allowanysha1inwant 1

# First commit.
# Directories present in all branches.
mkdir d1 d2
printf 'd1/a' > ./d1/a
printf 'd1/b' > ./d1/b
printf 'd2/a' > ./d2/a
printf 'd2/b' > ./d2/b
# Present only in root.
mkdir 'root'
printf 'root' > ./root/root
git add .
git commit -m 'root' --quiet

# Second commit only on master.
git rm --quiet -r ./root
mkdir 'master'
printf 'master' > ./master/master
git add .
git commit -m 'master commit' --quiet

# Second commit only on mybranch.
git checkout -b mybranch --quiet master~
git rm --quiet -r ./root
mkdir 'mybranch'
printf 'mybranch' > ./mybranch/mybranch
git add .
git commit -m 'mybranch commit' --quiet

echo "# List and identify all objects"
list-objects
echo

# Restore master.
git checkout --quiet master
cd ..

# Clone. Don't checkout for now, only .git/ dir.
git clone --depth 1 --quiet --no-checkout --filter=blob:none "file://$(pwd)/server_repo" local_repo
cd local_repo

# List missing objects from master.
echo "# Missing objects after --no-checkout"
git rev-list --all --quiet --objects --missing=print
echo

echo "# Git checkout fails without internet"
mv ../server_repo ../server_repo.off
! git checkout master
echo

echo "# Git checkout fetches the missing file from internet"
mv ../server_repo.off ../server_repo
git checkout master -- d1/a
echo

echo "# Missing objects after checking out d1/a"
git rev-list --all --quiet --objects --missing=print

GitHub upstream.

Output in Git v2.19.0:

# List and identify all objects
c6fcdfaf2b1462f809aecdad83a186eeec00f9c1
fc5e97944480982cfc180a6d6634699921ee63ec
7251a83be9a03161acde7b71a8fda9be19f47128
62d67bce3c672fe2b9065f372726a11e57bade7e
b64bf435a3e54c5208a1b70b7bcb0fc627463a75 d1
308150e8fddde043f3dbbb8573abb6af1df96e63 d1/a
f70a17f51b7b30fec48a32e4f19ac15e261fd1a4 d1/b
84de03c312dc741d0f2a66df7b2f168d823e122a d2
0975df9b39e23c15f63db194df7f45c76528bccb d2/a
41484c13520fcbb6e7243a26fdb1fc9405c08520 d2/b
7d5230379e4652f1b1da7ed1e78e0b8253e03ba3 master
8b25206ff90e9432f6f1a8600f87a7bd695a24af master/master
ef29f15c9a7c5417944cc09711b6a9ee51b01d89
19f7a4ca4a038aff89d803f017f76d2b66063043 mybranch
1b671b190e293aa091239b8b5e8c149411d00523 mybranch/mybranch
c3760bb1a0ece87cdbaf9a563c77a45e30a4e30e
a0234da53ec608b54813b4271fbf00ba5318b99f root
93ca1422a8da0a9effc465eccbcb17e23015542d root/root
master commit SHA: fc5e97944480982cfc180a6d6634699921ee63ec
mybranch commit SHA: fc5e97944480982cfc180a6d6634699921ee63ec
040000 tree b64bf435a3e54c5208a1b70b7bcb0fc627463a75    d1
040000 tree 84de03c312dc741d0f2a66df7b2f168d823e122a    d2
040000 tree 7d5230379e4652f1b1da7ed1e78e0b8253e03ba3    master
040000 tree 19f7a4ca4a038aff89d803f017f76d2b66063043    mybranch
040000 tree a0234da53ec608b54813b4271fbf00ba5318b99f    root

# Missing objects after --no-checkout
?f70a17f51b7b30fec48a32e4f19ac15e261fd1a4
?8b25206ff90e9432f6f1a8600f87a7bd695a24af
?41484c13520fcbb6e7243a26fdb1fc9405c08520
?0975df9b39e23c15f63db194df7f45c76528bccb
?308150e8fddde043f3dbbb8573abb6af1df96e63

# Git checkout fails without internet
fatal: '/home/ciro/bak/git/test-git-web-interface/other-test-repos/partial-clone.tmp/server_repo' does not appear to be a git repository
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

# Git checkout fetches the missing directory from internet
remote: Enumerating objects: 1, done.
remote: Counting objects: 100% (1/1), done.
remote: Total 1 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (1/1), 45 bytes | 45.00 KiB/s, done.
remote: Enumerating objects: 1, done.
remote: Counting objects: 100% (1/1), done.
remote: Total 1 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (1/1), 45 bytes | 45.00 KiB/s, done.

# Missing objects after checking out d1
?f70a17f51b7b30fec48a32e4f19ac15e261fd1a4
?8b25206ff90e9432f6f1a8600f87a7bd695a24af
?41484c13520fcbb6e7243a26fdb1fc9405c08520
?0975df9b39e23c15f63db194df7f45c76528bccb

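Conclusions: all blobs except d1/a are missing. E.g. f70a17f51b7b30fec48a32e4f19ac15e261fd1a4, which is d1/b, is not there after checking out d1/a.

Note that root/root and mybranch/mybranch are also missing, but --depth 1 hides that from the list of missing files. If you remove --depth 1, then they show up in the list of missing files.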
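
For the original question (a whole directory rather than a single file), the partial clone can be combined with a sparse checkout so that only the blobs under one directory are fetched. The following is a minimal sketch and not part of the answer above. It assumes a recent Git, since the `git sparse-checkout` command first appeared in Git 2.25, and it reuses the hypothetical server_repo/local_repo layout with made-up file contents.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical scratch area.
work="$(mktemp -d)"
cd "$work"

# Throwaway server repository that allows filtered fetches.
git init --quiet server_repo
cd server_repo
git symbolic-ref HEAD refs/heads/master
git config --local uploadpack.allowfilter 1
git config --local uploadpack.allowanysha1inwant 1
mkdir d1 d2
printf 'd1/a' > d1/a
printf 'd1/b' > d1/b
printf 'd2/a' > d2/a
git add .
git -c user.name='a' -c user.email='a@example.com' commit --quiet -m 'root'
cd ..

# Partial clone: no blobs are fetched and nothing is checked out yet.
git clone --quiet --depth 1 --no-checkout --filter=blob:none \
  "file://$work/server_repo" local_repo
cd local_repo

# Restrict the working tree to d1/, then check out the branch.
git sparse-checkout set d1
git checkout --quiet master

# Files under d1/ are fetched on demand during checkout.
find . -path ./.git -prune -o -type f -print
# Objects that were never fetched are printed with a leading '?'.
git rev-list --all --objects --missing=print | grep '^?' || true
```

The design choice here is to let the sparse patterns decide which blobs the checkout asks the server for, instead of naming each file on the `git checkout` command line.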


Licensed under cc by-sa 3.0 with attribution required.