How can I replace a string in a file(s)?

751

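Replacing strings in files based on certain search criteria is a very common task. How can I

replace string foo with bar in all files in the current directory?
do the same recursively for subdirectories?
replace only if the file name matches another string?
replace only if the string is found in a certain context?
replace if the string is on a certain line number?
replace multiple strings with the same replacement?
replace multiple strings with different replacements?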

— Terdon

2

This is intended to be a canonical Q&A on this subject (see this meta discussion); please feel free to edit my answer below or add your own.

— terdon

1009

1. Replace all occurrences of one string with another in all files in the current directory:

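These are for cases where you know that the directory contains only regular files and that you want to process all non-hidden files. If that is not the case, use the approaches in 2.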

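All sed solutions in this answer assume GNU sed. If using FreeBSD or OS X, replace -i with -i ''. Also note that using the -i switch with any version of sed has certain filesystem security implications, and is inadvisable in any script you plan to distribute in any way.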

Non-recursive, files in this directory only:
```
sed -i -- 's/foo/bar/g' *
perl -i -pe 's/foo/bar/g' ./* 
```
(the perl one will fail for file names ending in | or space).
Recursive, regular files (including hidden ones) in this and all subdirectories:
```
find . -type f -exec sed -i 's/foo/bar/g' {} +
```
If you are using zsh:
```
sed -i -- 's/foo/bar/g' **/*(D.)
```
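(may fail if the list is too big; see zargs to work around that).

Bash can't check directly for regular files, so a loop is needed (the parentheses run it in a subshell, which avoids setting the options globally):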
```
( shopt -s globstar dotglob;
    for file in **; do
        if [[ -f $file ]] && [[ -w $file ]]; then
            sed -i -- 's/foo/bar/g' "$file"
        fi
    done
)
```
Files are selected when they are actual (regular) files (-f) and writable (-w).
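For comparison, here is the same recursive replacement as a rough Python sketch (not from the original answer; replace_recursively is a hypothetical name). Assumptions: a plain substring replace instead of a sed regex; files are handled as raw bytes so their encoding is left alone; symlinks are skipped, and os.walk() does not descend into symlinked directories by default (unlike bash before 4.3, as noted in the comments); the writable-file check from the bash loop is kept; and .git directories are pruned, following the caution about git repositories in the comments.

```python
import os


def replace_recursively(root, old, new):
    """Replace `old` with `new` (plain substrings, not regexes) in every
    regular, writable file under `root`, hidden files included.
    Returns the list of files that were changed."""
    old_b, new_b = old.encode(), new.encode()
    changed = []
    # os.walk() does not descend into symlinked directories by default
    for dirpath, dirnames, filenames in os.walk(root):
        # Assumption: never touch git's internal files
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Only real regular files: skip symlinks, FIFOs, sockets, etc.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            # Same idea as [[ -w $file ]] in the bash loop
            if not os.access(path, os.W_OK):
                continue
            with open(path, "rb") as f:
                data = f.read()
            if old_b in data:
                with open(path, "wb") as f:
                    f.write(data.replace(old_b, new_b))
                changed.append(path)
    return changed
```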

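2. Replace only if the file name matches another string / has a specific extension / is of a certain type, etc.:

Non-recursive, files in this directory only:
```
sed -i -- 's/foo/bar/g' *baz*    ## all files whose name contains baz
sed -i -- 's/foo/bar/g' *.baz    ## files ending in .baz
```
Recursive, regular files in this and all subdirectories: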
```
find . -type f -name "*baz*" -exec sed -i 's/foo/bar/g' {} +
```
If you are using bash (the parentheses avoid setting the options globally):
```
( shopt -s globstar dotglob
    sed -i -- 's/foo/bar/g' **/*baz*
    sed -i -- 's/foo/bar/g' **/*.baz
)
```
If you are using zsh:
```
sed -i -- 's/foo/bar/g' **/*baz*(D.)
sed -i -- 's/foo/bar/g' **/*.baz(D.)
```
The -- serves to tell sed that no more flags will be given on the command line. This is useful to protect against file names starting with -.
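sed's own option parsing cannot be shown in Python, but Python's argparse follows the same end-of-options convention, so here is a small analogy. The -f option and the files argument are hypothetical, chosen to mirror sed's -f script-file option; a file literally named -foo gets swallowed as "-f oo" unless it comes after --.

```python
import argparse


def parse(argv):
    # Hypothetical CLI with a "-f FILE" option, like sed's "-f script-file"
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("-f")
    parser.add_argument("files", nargs="*")
    return parser.parse_args(argv)


# Without "--", "-foo" is read as option -f with the attached value "oo"
print(parse(["-foo"]))
# After "--", "-foo" is treated as an ordinary file name
print(parse(["--", "-foo"]))
```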
If a file is of a certain type, for example, executable (see man find for more options):
```
find . -type f -executable -exec sed -i 's/foo/bar/g' {} +
```
zsh:
```
sed -i -- 's/foo/bar/g' **/*(D*)
```

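3. Replace only if the string is found in a certain context

Replace foo with bar only if there is a baz later on the same line:
```
sed -i 's/foo\(.*baz\)/bar\1/' file
```
In sed, using \( \) saves whatever is in the parentheses, and you can then access it with \1. There are many variations of this theme; to learn more about such regular expressions, see here.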
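The same capture-and-reuse idea, sketched with Python's re module for illustration (the function name is hypothetical). Python writes the group as ( ) rather than sed's \( \), but \1 in the replacement works the same way, and count=1 mirrors the sed command's lack of a g flag:

```python
import re


def replace_foo_before_baz(line):
    # Same idea as sed 's/foo\(.*baz\)/bar\1/': capture everything after
    # "foo" up to the last "baz" on the line and put it back after "bar".
    # count=1 replaces at most one occurrence, like sed without the g flag.
    return re.sub(r"foo(.*baz)", r"bar\1", line, count=1)


print(replace_foo_before_baz("foo and baz"))  # bar and baz
print(replace_foo_before_baz("foo alone"))    # foo alone
```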
Replace foo with bar only if foo is found in the 3rd column (field) of the input file (assuming whitespace-separated fields):
```
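gawk -i inplace '{gsub(/foo/,"bar",$3); print}' file
```
(requires gawk 4.1.0 or newer).
For a different field, just use $N, where N is the number of the field of interest. For a different field separator (: in this example), use:
```
gawk -i inplace -F':' '{gsub(/foo/,"bar",$3); print}' file
```
Another solution using perl:
```
perl -i -ane '$F[2]=~s/foo/bar/g; $" = " "; print "@F\n"' file
```
NOTE: both the awk and perl solutions will affect spacing in the file (removing leading and trailing blanks, and converting sequences of blanks to one space character in the lines that match). For a different field, use $F[N-1], where N is the field number you want, and for a different field separator use the following ($"=":" sets the output field separator to :):
```
perl -i -F':' -ane '$F[2]=~s/foo/bar/g; $"=":";print "@F"' file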
```
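To see why the spacing changes, here is a small Python sketch of what the whitespace-split perl -a version does to each line (an illustration, not either tool's actual implementation; the function name is hypothetical): the line is split on runs of whitespace and rejoined with single spaces, so the original spacing is lost.

```python
def replace_in_field(line, n, old, new):
    # Split on runs of whitespace, like awk's and perl -a's default splitting
    fields = line.split()
    if len(fields) >= n:
        fields[n - 1] = fields[n - 1].replace(old, new)
    # Rejoining with single spaces is what normalizes the spacing
    return " ".join(fields)


print(replace_in_field("  a b   foo c ", 3, "foo", "bar"))  # a b bar c
```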

Replace foo with bar only on the 4th line:
```
sed -i '4s/foo/bar/g' file
gawk -i inplace 'NR==4{gsub(/foo/,"bar")};1' file
perl -i -pe 's/foo/bar/g if $.==4' file
```
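The line-address logic, sketched in Python for illustration (the function name is hypothetical; line numbers are 1-based, like sed's address and perl's $.):

```python
def replace_on_line(text, lineno, old, new):
    # Keep the line endings so the text can be rebuilt unchanged elsewhere
    lines = text.splitlines(keepends=True)
    # Only touch the requested line, if the text has that many lines
    if 0 < lineno <= len(lines):
        lines[lineno - 1] = lines[lineno - 1].replace(old, new)
    return "".join(lines)


print(replace_on_line("foo\nfoo\nfoo\nfoo foo\nfoo\n", 4, "foo", "bar"), end="")
```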

4. Multiple replace operations: replace with different strings

You can combine sed commands:
```
sed -i 's/foo/bar/g; s/baz/zab/g; s/Alice/Joan/g' file
```
Be aware that order matters (sed 's/foo/bar/g; s/bar/baz/g' will substitute foo with baz).
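A Python sketch of why order matters, plus one common way around it: a single pass over an alternation of all patterns, so a replacement is never rewritten by a later pattern. The single-pass technique is an alternative not used in this answer, and both function names are hypothetical.

```python
import re


def chained(text):
    # Like sed 's/foo/bar/g; s/bar/baz/g': the second substitution also
    # rewrites the "bar" produced by the first one
    return text.replace("foo", "bar").replace("bar", "baz")


def single_pass(text, mapping):
    # Try longer keys first so that overlapping keys prefer the longest match
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # Each match is replaced exactly once, by looking it up in the mapping
    return pattern.sub(lambda m: mapping[m.group(0)], text)


print(chained("foo bar"))                                    # baz baz
print(single_pass("foo bar", {"foo": "bar", "bar": "baz"}))  # bar baz
```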

or Perl commands:
```
perl -i -pe 's/foo/bar/g; s/baz/zab/g; s/Alice/Joan/g' file
```

If you have a large number of patterns, it is easier to save your patterns and their replacements in a sed script file:
```
#! /usr/bin/sed -f
s/foo/bar/g
s/baz/zab/g
```
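Or, if you have too many pattern pairs for the above to be feasible, you can read pattern pairs from a file (two space-separated patterns, $pattern and $replacement, per line):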
```
while read -r pattern replacement; do   
    sed -i "s/$pattern/$replacement/" file
done < patterns.txt
```
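That will be quite slow for long lists of patterns and large data files, so you might want to read the patterns and create a sed script from them instead. The following assumes a <space> delimiter separates a list of MATCH<space>REPLACE pairs occurring one per line in the file patterns.txt:
```
sed 's| *\([^ ]*\) *\([^ ]*\).*|s/\1/\2/g|' <patterns.txt |
sed -f- ./editfile >outfile
```
The above format is largely arbitrary and, for example, doesn't allow for a <space> in either MATCH or REPLACE. The method is very general though: basically, if you can create an output stream that looks like a sed script, then you can source that stream as a sed script by specifying sed's script file as - (stdin).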
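The script-generation step, ported to Python's re module for illustration (the helper name is hypothetical). Like the sed version, neither field may contain a space and nothing is escaped; the returned text could be saved to a file and passed to sed -f.

```python
import re

# Literal port of the sed pattern ' *\([^ ]*\) *\([^ ]*\).*'
PAIR = re.compile(r" *([^ ]*) *([^ ]*).*")


def pairs_to_sed_script(lines):
    commands = []
    for line in lines:
        # The pattern can match the empty string, so match() never fails;
        # either group may be empty, exactly as with the sed one-liner
        m = PAIR.match(line.rstrip("\n"))
        commands.append("s/{}/{}/g".format(m.group(1), m.group(2)))
    return "\n".join(commands) + "\n"


print(pairs_to_sed_script(["foo bar\n", "baz zab\n"]), end="")
```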

You can combine and concatenate multiple scripts in a similar fashion:
```
SOME_PIPELINE |
sed -e'#some expression script'  \
    -f./script_file -f-          \
    -e'#more inline expressions' \
./actual_edit_file >./outfile
```

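POSIX sed will concatenate all scripts into one in the order they appear on the command line. None of these need to end in a \newline.

grep can work the same way:
```
sed -e'#generate a pattern list' <in |
grep -f- ./grepped_file
```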

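When working with fixed strings as patterns, it is good practice to escape regular expression metacharacters. You can do this rather easily:
```
sed 's/[]$&^*\./[]/\\&/g
     s| *\([^ ]*\) *\([^ ]*\).*|s/\1/\2/g|
' <patterns.txt |
sed -f- ./editfile >outfile
```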
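The escaping step on its own, as a Python sketch that backslash-escapes the same set of characters as the bracket expression in the first sed command above (] $ & ^ * \ . / [). The function name is hypothetical.

```python
# The characters escaped by the sed command above: ] $ & ^ * \ . / [
SED_SPECIAL = set("]$&^*\\./[")


def sed_escape(s):
    # Put a backslash in front of each special character, as the
    # sed 's/[...]/\\&/g' command does
    return "".join("\\" + c if c in SED_SPECIAL else c for c in s)


print(sed_escape("a.b*c"))  # a\.b\*c
print(sed_escape("1/2"))    # 1\/2
```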

5. Multiple replace operations: replace multiple patterns with the same string

Replace any of foo, bar or baz with foobar:
```
sed -Ei 's/foo|bar|baz/foobar/g' file
```

or

```
perl -i -pe 's/foo|bar|baz/foobar/g' file
```

— Terdon

2

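@StéphaneChazelas thanks for the edit, it did indeed fix several things. However, please don't remove information that is relevant to bash. Not everyone uses zsh. By all means add zsh info, but there is no reason to remove the bash stuff. Also, I know that using the shell for text processing is not ideal, but there are cases where it is needed. I edited in a better version of my original script that creates a sed script instead of actually using the shell loop to parse. This can be useful if you have several hundred pairs of patterns, for example.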

— terdon

2

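@terdon, your bash one is wrong. bash before 4.3 will follow symlinks when descending. Also, bash has no equivalent for the (.) glob qualifier, so it cannot be used here. (You're missing some -- as well.) The for loop is incorrect (missing -r), means making several passes over the files, and adds no benefit over a sed script.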

— Stéphane Chazelas

7

@terdon What does the -- after sed -i and before the substitute command indicate?

— Geek

5

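@Geek that's a POSIX thing. It signifies the end of options and lets you pass arguments starting with -. Using it ensures that the commands will work on files with names like -foo. Without it, the -f would be parsed as an option.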

— terdon

1

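Be very careful when executing some of the recursive commands in git repositories. For example, the solutions provided in section 1 of this answer will actually modify internal git files in the .git directory, and can mess up your checkout. It is better to operate within/on specific directories by name.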

— Pistos

75

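A good replacement Linux tool is rpl, which was originally written for the Debian project, so it is available with apt-get install rpl in any Debian-derived distro, and may be for others; otherwise you can download the tar.gz file from SourceForge.

Simplest example of use:
```
$ rpl old_string new_string test.txt
```
Note that if the string contains spaces, it should be enclosed in quotation marks. By default rpl is case-sensitive but does not restrict matches to whole words; you can change these defaults with the options -i (ignore case) and -w (whole words). You can also specify multiple files:
```
$ rpl -i -w "old string" "new string" test.txt test2.txt
```
Or even specify the extensions (-x) to search, or search recursively (-R) in the directory:
```
$ rpl -x .html -x .txt -R old_string new_string test*
```
You can also search/replace interactively with the -p (prompt) option.

The output shows the number of files/strings replaced and the type of search (case sensitive/insensitive, whole/partial words), but it can be silenced with the -q (quiet mode) option, or made even more verbose, listing the line numbers that contain matches in each file and directory, with the -v (verbose mode) option.

Other options worth remembering are -e (honor escapes), which enables escape sequences so you can also search for tabs (\t), newlines (\n), etc. You can even use -f to force permissions (of course, only when the user has write permission) and -d to preserve modification times.

Finally, if you are unsure of exactly what will be replaced, use -s (simulate mode).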

— Fran

2

So much better at feedback and simplicity than sed. I just wish it allowed acting on file names; then it would be perfect as-is.

— Kzqai

1

I like the -s (simulate mode) :-)

— erm3nda

25

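How to search and replace over multiple files suggests:

You could also use find and sed, but I find that this little line of perl works nicely.
```
perl -pi -w -e 's/search/replace/g;' *.php
```
-e means execute the following line of code.

-i means edit in-place

-w write warnings

-p loop over the input file, printing each line after the script is applied to it.

My best results come from using perl and grep (to ensure that files contain the search expression):
```
perl -pi -w -e 's/search/replace/g;' $( grep -rl 'search' )
```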

— Alejandro Salamanca Mazuelo

13

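You can use Vim in Ex mode:

Replace string ALF with BRA in all files in the current directory?
```
for CHA in *
do
  ex -sc '%s/ALF/BRA/g' -cx "$CHA"
done
```

Do the same recursively for subdirectories?
```
find -type f -exec ex -sc '%s/ALF/BRA/g' -cx {} ';'
```

Replace only if the file name matches another string?
```
for CHA in *.txt
do
  ex -sc '%s/ALF/BRA/g' -cx "$CHA"
done
```

Replace only if the string is found in a certain context?
```
ex -sc 'g/DEL/s/ALF/BRA/g' -cx file
```

Replace if the string is on a certain line number?
```
ex -sc '2s/ALF/BRA/g' -cx file
```

Replace multiple strings with the same replacement
```
ex -sc '%s/\vALF|ECH/BRA/g' -cx file
```

Replace multiple strings with different replacements
```
ex -sc '%s/ALF/BRA/g|%s/FOX/GOL/g' -cx file
```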

— Steven Penny

13

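I used this:
```
grep -r "old_string" -l | tr '\n' ' ' | xargs sed -i 's/old_string/new_string/g'
```
grep for all files that contain old_string.
Replace the newlines in the result with spaces (so that the list of files can be fed to sed).
Run sed on those files to replace the old string with the new one.

Update: the above will fail on file names containing whitespace. Instead, use:
```
grep --null -lr "old_string" | xargs --null sed -i 's/old_string/new_string/g'
```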
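Why the NUL-separated version is safer, sketched in Python. This is a simplification: real xargs also treats quotes and backslashes specially, which is ignored here, and both function names are hypothetical.

```python
def split_on_blanks(output):
    # Roughly what plain xargs does with its input: any run of blanks or
    # newlines separates two arguments (quote handling is ignored here)
    return output.split()


def split_on_nul(output):
    # What xargs --null does: only NUL bytes separate arguments, and a
    # file name can never contain a NUL byte
    return [name for name in output.split("\0") if name]


names = ["a.txt", "my file.txt"]
print(split_on_blanks("\n".join(names) + "\n"))  # ['a.txt', 'my', 'file.txt']
print(split_on_nul("\0".join(names) + "\0"))     # ['a.txt', 'my file.txt']
```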

— o_o_o--

Note that this will fail if any of your file names contain spaces, tabs or newlines. Using grep --null -lr "old_string" | xargs --null sed -i 's/old_string/new_string/g' will make it handle arbitrary file names.

— terdon

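Thanks, guys. I added the update and kept the old code, because it's an interesting caveat that could be useful to people unaware of this behavior.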

— o_o_o-- 2015

6

A nice and simple Unix tool that does this job perfectly, from a user's point of view, is qsubst. For example,
```
% qsubst foo bar *.c *.h
```

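will replace foo with bar in all my C files. A nice feature is that qsubst does a query-replace, i.e., it shows me each occurrence of foo and asks whether I want to replace it or not. [You can replace unconditionally (no asking) with the -go option, and there are other options, e.g., -w if you only want to replace foo when it is a whole word.]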
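The query-replace idea can be sketched in Python, with the interactive prompt replaced by an ask callback so it can run non-interactively. This is a hypothetical illustration of the behavior described above, not qsubst's actual code.

```python
import re


def query_replace(text, old, new, ask):
    """Replace each occurrence of `old` only if ask(text, offset) returns True.
    `ask` stands in for the interactive y/n prompt."""
    def decide(match):
        return new if ask(text, match.start()) else match.group(0)

    # re.escape() makes `old` a fixed string; matches are visited left to right
    return re.sub(re.escape(old), decide, text)


answers = iter([True, False, True])
print(query_replace("foo foo foo", "foo", "bar", lambda text, pos: next(answers)))
# bar foo bar
```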

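How to get it: qsubst was written by der Mouse (from McGill) and posted to comp.unix.sources 11(7) in August 1987. Updated versions exist. For example, the NetBSD version qsubst.c,v 1.8 2004/11/01 compiles and runs perfectly on my Mac.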

— phs

2

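I needed something that would provide a dry-run option and would work recursively with a glob, and after trying to do it with awk and sed I gave up and did it in Python instead.

The script recursively searches all files matching a glob pattern (e.g. --glob="*.html") for a regex and replaces it with the replacement regex:
```
find_replace.py [--dir=my_folder] \
    --search-regex=<search_regex> \
    --replace-regex=<replace_regex> \
    --glob=[glob_pattern] \
    --dry-run
```
Every long option such as --search-regex has a corresponding short option, i.e. -s. Run with -h to see all options.

For example, this will flip all dates from 2017-12-31 to 31-12-2017:
```
python find_replace.py --glob=myfile.txt \
    --search-regex="(\d{4})-(\d{2})-(\d{2})" \
    --replace-regex="\3-\2-\1" \
    --dry-run --verbose
```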

```
import os
import fnmatch
import sys
import shutil
import re

import argparse

def find_replace(cfg):
    search_pattern = re.compile(cfg.search_regex)

    if cfg.dry_run:
        print('THIS IS A DRY RUN -- NO FILES WILL BE CHANGED!')

    for path, dirs, files in os.walk(os.path.abspath(cfg.dir)):
        for filename in fnmatch.filter(files, cfg.glob):

            if cfg.print_parent_folder:
                pardir = os.path.normpath(os.path.join(path, '..'))
                pardir = os.path.split(pardir)[-1]
                print('[%s]' % pardir)
            filepath = os.path.join(path, filename)

            # backup original file
            if cfg.create_backup:
                backup_path = filepath + '.bak'

                while os.path.exists(backup_path):
                    backup_path += '.bak'
                print('DBG: creating backup', backup_path)
                shutil.copyfile(filepath, backup_path)

            with open(filepath) as f:
                old_text = f.read()

            all_matches = search_pattern.findall(old_text)

            if all_matches:

                print('Found {} matches in file {}'.format(len(all_matches), filename))

                new_text = search_pattern.sub(cfg.replace_regex, old_text)

                if not cfg.dry_run:
                    with open(filepath, "w") as f:
                        print('DBG: replacing in file', filepath)
                        f.write(new_text)
                else:
                    for idx, matches in enumerate(all_matches):
                        print("Match #{}: {}".format(idx, matches))

                    print("NEW TEXT:\n{}".format(new_text))

            elif cfg.verbose:
                print('File {} does not contain search regex "{}"'.format(filename, cfg.search_regex))


if __name__ == '__main__':

    parser = argparse.ArgumentParser(description='''DESCRIPTION:
    Find and replace recursively from the given folder using regular expressions''',
                                     formatter_class=argparse.RawDescriptionHelpFormatter,
                                     epilog='''USAGE:
    {0} -d [my_folder] -s <search_regex> -r <replace_regex> -g [glob_pattern]

    '''.format(os.path.basename(sys.argv[0])))

    parser.add_argument('--dir', '-d',
                        help='folder to search in; by default current folder',
                        default='.')

    parser.add_argument('--search-regex', '-s',
                        help='search regex',
                        required=True)

    parser.add_argument('--replace-regex', '-r',
                        help='replacement regex',
                        required=True)

    parser.add_argument('--glob', '-g',
                        help='glob pattern, e.g. *.html',
                        default="*.*")

    parser.add_argument('--dry-run', '-dr',
                        action='store_true',
                        help="don't replace anything just show what is going to be done",
                        default=False)

    parser.add_argument('--create-backup', '-b',
                        action='store_true',
                        help='Create backup files',
                        default=False)

    parser.add_argument('--verbose', '-v',
                        action='store_true',
                        help="Show files which don't match the search regex",
                        default=False)

    parser.add_argument('--print-parent-folder', '-p',
                        action='store_true',
                        help="Show the parent info for debug",
                        default=False)

    config = parser.parse_args(sys.argv[1:])

    find_replace(config)
```

Here is an updated version of the script that highlights the search terms and replacements in different colors.

— ccpizza

1

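I don't understand why you would make something so complex. For recursion, use bash's (or your shell's equivalent) globstar option and ** globs, or find. For a dry run, just use sed: unless you use the -i option, it won't make any changes. For a backup, use sed -i.bak (or perl -i.bak); for files that don't match, use grep PATTERN file || echo file. And why in the world would you have Python expand the glob instead of letting the shell do it? Why script.py --glob=foo* instead of just script.py foo*?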

— terdon

1

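My reasons are simple: (1) above all, ease of debugging; (2) using only a single, well-documented tool with a supportive community; (3) I don't know sed and awk well and am not willing to spend more time mastering them; (4) readability; (5) this solution also works on non-POSIX systems (not that I need that, but others might).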

— ccpizza

1

ripgrep (command name rg) is a grep tool, but supports search and replace as well.
```
$ cat ip.txt
dark blue and light blue
light orange
blue sky
$ # by default, line number is displayed if output destination is stdout
$ # by default, only lines that matched the given pattern is displayed
$ # 'blue' is search pattern and -r 'red' is replacement string
$ rg 'blue' -r 'red' ip.txt
1:dark red and light red
3:red sky

$ # --passthru option is useful to print all lines, whether or not it matched
$ # -N will disable line number prefix
$ # this command is similar to: sed 's/blue/red/g' ip.txt
$ rg --passthru -N 'blue' -r 'red' ip.txt
dark red and light red
light orange
red sky
```

rg doesn't support an in-place option, so you'll have to do it yourself:
```
$ # -N isn't needed here as output destination is a file
$ rg --passthru 'blue' -r 'red' ip.txt > tmp.txt && mv tmp.txt ip.txt
$ cat ip.txt
dark red and light red
light orange
red sky
```
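The write-to-a-temporary-file-then-rename pattern can also be sketched in Python (the function name is hypothetical, and UTF-8 text is assumed). Creating the temporary file in the same directory keeps os.replace() a rename within one filesystem; copying the original permission bits is an extra step that the shell version above does not do.

```python
import os
import shutil
import tempfile


def rewrite_in_place(path, transform):
    """Replace the contents of `path` with transform(old_contents)."""
    directory = os.path.dirname(os.path.abspath(path))
    # newline="" keeps \r\n line endings unchanged
    with open(path, encoding="utf-8", newline="") as f:
        text = f.read()
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8", newline="") as f:
            f.write(transform(text))
        shutil.copymode(path, tmp)  # keep the original permission bits
        os.replace(tmp, path)       # rename over the original file
    except BaseException:
        os.unlink(tmp)              # do not leave the temporary file behind
        raise
```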

See the Rust regex documentation for regex syntax and features. The -P switch will enable the PCRE2 flavor. rg supports Unicode by default.
```
$ # non-greedy quantifier is supported
$ echo 'food land bark sand band cue combat' | rg 'foo.*?ba' -r 'X'
Xrk sand band cue combat

$ # unicode support
$ echo 'fox:αλεπού,eagle:αετός' | rg '\p{L}+' -r '($0)'
(fox):(αλεπού),(eagle):(αετός)

$ # set operator example, remove all punctuation characters except . ! and ?
$ para='"Hi", there! How *are* you? All fine here.'
$ echo "$para" | rg '[[:punct:]--[.!?]]+' -r ''
Hi there! How are you? All fine here.

$ # use -P if you need even more advanced features
$ echo 'car bat cod map' | rg -P '(bat|map)(*SKIP)(*F)|\w+' -r '[$0]'
[car] bat [cod] map
```

Like grep, the -F option will allow fixed strings to be matched, a handy option that I feel sed should implement too.
```
$ printf '2.3/[4]*6\nfoo\n5.3-[4]*9\n' | rg --passthru -F '[4]*' -r '2'
2.3/26
foo
5.3-29
```
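In Python, the counterpart of -F is to escape the pattern with re.escape() (or simply use str.replace). A minimal sketch reproducing the example above, with a hypothetical function name:

```python
import re


def replace_fixed(text, old, new):
    # re.escape() makes [ ] * . etc. match literally, much like rg -F;
    # the lambda keeps backslashes in `new` from being interpreted
    return re.sub(re.escape(old), lambda m: new, text)


print(replace_fixed("2.3/[4]*6\nfoo\n5.3-[4]*9\n", "[4]*", "2"), end="")
```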

Another handy option is -U, which enables multiline matching:
```
$ # (?s) flag will allow . to match newline characters as well
$ printf '42\nHi there\nHave a Nice Day' | rg --passthru -U '(?s)the.*ice' -r ''
42
Hi  Day
```
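Python's re module accepts the same inline (?s) flag (equivalently re.DOTALL), and it always works on the whole string, so the example above can be reproduced like this, as a sketch:

```python
import re

text = "42\nHi there\nHave a Nice Day"
# (?s) lets "." match newlines too, so the match can span lines
result = re.sub(r"(?s)the.*ice", "", text)
print(result)
```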

rg can handle DOS-style files too:
```
$ # same as: sed -E 's/\w+(\r?)$/123\1/'
$ printf 'hi there\r\ngood day\r\n' | rg --passthru --crlf '\w+$' -r '123'
hi 123
good 123
```
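A Python version of the same sed expression, for comparison (a sketch). In re.MULTILINE mode, $ matches just before each \n but not before \r, so the optional \r has to be captured and put back:

```python
import re

text = "hi there\r\ngood day\r\n"
# \w+ grabs the last word; (\r?) keeps a DOS line ending intact
result = re.sub(r"\w+(\r?)$", r"123\1", text, flags=re.MULTILINE)
print(repr(result))
```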

Another advantage of rg is that it is likely to be faster than sed:
```
$ # for small files, initial processing time of rg is a large component
$ time echo 'aba' | sed 's/a/b/g' > f1
real    0m0.002s
$ time echo 'aba' | rg --passthru 'a' -r 'b' > f2
real    0m0.007s

$ # for larger files, rg is likely to be faster
$ # 6.2M sample ASCII file
$ wget https://norvig.com/big.txt
$ time LC_ALL=C sed 's/\bcat\b/dog/g' big.txt > f1
real    0m0.060s
$ time rg --passthru '\bcat\b' -r 'dog' big.txt > f2
real    0m0.048s
$ diff -s f1 f2
Files f1 and f2 are identical

$ time LC_ALL=C sed -E 's/\b(\w+)(\s+\1)+\b/\1/g' big.txt > f1
real    0m0.725s
$ time rg --no-pcre2-unicode --passthru -wP '(\w+)(\s+\1)+' -r '$1' big.txt > f2
real    0m0.093s
$ diff -s f1 f2
Files f1 and f2 are identical
```

— Sundeep