How can I join all lines that end with a backslash character?

36

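Using common command-line tools such as sed or awk, is it possible to join all lines that end with a given character, such as a backslash?

For example, given the file: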

foo bar \
bash \
baz
dude \
happy

I would like to get the following output:

foo bar bash baz
dude happy

— Cory Klein

1

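Pass the file through cpp :)

— imz – Ivan Zakharyaschev, 2011

So many great answers, I wish I could mark them all as the answer! Thanks for the deep insight into awk, sed and perl; these are all great examples.

— Cory Klein

Note that this is covered in the sed FAQ.

— Stéphane Chazelas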

27

A shorter and simpler sed solution:

sed  '
: again
/\\$/ {
    N
    s/\\\n//
    t again
}
' textfile

Or as a one-liner with GNU sed:

sed ':x; /\\$/ { N; s/\\\n//; tx }' textfile

— neurino

1

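Good... I initially glanced at it and could not understand it (so it went into the too-hard basket)... but after digging into Gilles' answer (which took quite a while) I took another look at yours, and it now looks very easy to follow. I think I am starting to understand sed :) ... You append each line directly to the pattern space, and when a line with a "normal ending" comes along, the whole pattern space falls through and is printed automatically (because there is no -n option)... neat! .. +1

— Peter.O

@fred: thanks, I think I am starting to understand sed too. It offers nice tools for multi-line editing, but how to combine them to get what you need is not straightforward, and readability is not its strong point...

— neurino

Watch out for DOS line endings too, i.e. carriage returns or \r!

— user77376 '16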
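A minimal sketch of handling that case, assuming the input uses CRLF line endings and that deleting every carriage return is acceptable. With CRLF input the last character before the newline is \r, not the backslash, so a pattern such as /\\$/ never matches. Here tr removes the \r characters first, and the awk command from the awk answer on this page does the joining. The crlf_input helper and its data are hypothetical.

```shell
# Hypothetical sample: the question's data, but with DOS (CRLF) line endings.
crlf_input() {
  printf 'foo bar \\\r\nbash \\\r\nbaz\r\ndude \\\r\nhappy\r\n'
}

# Without stripping \r, no line ends in a backslash, so nothing is joined.
naive=$(crlf_input |
  awk '{ if (sub(/\\$/, "")) printf "%s", $0; else print $0 }')

# Strip every carriage return first, then join the continuation lines.
joined=$(crlf_input | tr -d '\r' |
  awk '{ if (sub(/\\$/, "")) printf "%s", $0; else print $0 }')

printf '%s\n' "$joined"
```

Note that tr -d '\r' also removes carriage returns in the middle of a line, which may or may not be what you want.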

1

What is wrong with sed -e :a -e '/\\$/N; s/\\\n//; ta'?

— Isaac

18

This is possibly easiest with perl (since perl is like sed and awk, I hope it is acceptable to you):

perl -p -e 's/\\\n//'

— camh

Short and simple, I like that one. +1. And he did not explicitly ask for sed or awk.

— rudolfson

17

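Here is an awk solution. If a line ends with a \, strip the backslash and print the line with no terminating newline; otherwise, print the line with a terminating newline.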

awk '{if (sub(/\\$/,"")) printf "%s", $0; else print $0}'

It is not too bad in sed either, though awk is obviously more readable.

— Gilles 'SO- stop being evil'

2

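This is not an answer as such. It is a side issue about sed.

Specifically, I needed to take Gilles' sed command apart piece by piece to understand it... I started writing some notes on it, and then thought they might be useful to someone.

So here it is... Gilles' sed script, in documented format: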

#!/bin/bash
#######################################
sed_dat="$HOME/ztest.dat"
while IFS= read -r line ;do echo "$line" ;done <<'END_DAT' >"$sed_dat"
foo bar \
bash \
baz
dude \
happy
yabba dabba 
doo
END_DAT

#######################################
sedexec="$HOME/ztest.sed"
while IFS= read -r line ;do echo "$line" ;done <<'END-SED' >"$sedexec"; \
sed  -nf "$sedexec" "$sed_dat"

  s/\\$//        # If a line has trailing '\', remove the '\'
                 #    
  t'Hold-append' # branch: Branch conditionally to the label 'Hold-append'
                 #         The condition is that a replacement was made.
                 #         The current pattern-space had a trailing '\' which  
                 #         was replaced, so branch to 'Hold-append' and append
                 #         the now-truncated line to the hold-space
                 #
                 # This branching occurs for each (successive) such line. 
                 #
                 # PS. The 't' command may be so named because it means 'on true' 
                 #     (I'm not sure about this, but the shoe fits)  
                 #
                 # Note: Appending to the hold-space introduces a leading '\n'   
                 #       delimiter for each appended line
                 #  
                 #   eg. compare the hex dump of the following 4 example commands:
                 #       'x' swaps the hold and pattern spaces
                 #
                 #       echo -n "a" |sed -ne         'p' |xxd -p  ## 61 
                 #       echo -n "a" |sed -ne     'H;x;p' |xxd -p  ## 0a61
                 #       echo -n "a" |sed -ne   'H;H;x;p' |xxd -p  ## 0a610a61
                 #       echo -n "a" |sed -ne 'H;H;H;x;p' |xxd -p  ## 0a610a610a61

   # No replacement was made above, so the current pattern-space
   #   (input line) has a "normal" ending.

   x             # Swap the pattern-space (the just-read "normal" line)
                 #   with the hold-space. The hold-space holds the accumulation
                 #   of appended "stripped-of-backslash" lines

   G             # The pattern-space now holds zero to many "stripped-of-backslash" lines
                 #   each of which has a preceding '\n'
                 # The 'G' command Gets the Hold-space and appends it to 
                 #   the pattern-space. This append action introduces another
                 #   '\n' delimiter to the pattern space. 

   s/\n//g       # Remove all '\n' newlines from the pattern-space

   p             # Print the pattern-space

   s/.*//        # Now we need to remove all data from the pattern-space
                 # This is done as a means to remove data from the hold-space 
                 #  (there is no way to directly remove data from the hold-space)

   x             # Swap the no-data pattern space with the hold-space
                 # This leaves the hold-space re-initialized to empty...
                 # The current pattern-space will be overwritten by the next line-read

   b             # Everything is ready for the next line-read. It is time to make 
                 # an unconditional branch to the end of processing for this line
                 #  ie. skip any remaining logic, read the next line and start the process again.

  :'Hold-append' # The ':' (colon) indicates a label.. 
                 # A label is the target of the 2 branch commands, 'b' and 't'
                 # A label can be a single letter (it is often 'a')
                 # Note;  'b' can be used without a label as seen in the previous command 

    H            # Append the pattern to the hold buffer
                 # The pattern is prefixed with a '\n' before it is appended

END-SED
#######

— Peter.O

1

Actually, neurino's solution is pretty simple. Speaking of mildly complicated sed, this may interest you.

— Gilles 'SO- stop being evil'

2

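Another common command-line tool would be ed, which by default modifies files in place and therefore leaves file permissions unmodified (for more information on ed, see "Editing files with the ed text editor from scripts").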

str='
foo bar \
bash 1 \
bash 2 \
bash 3 \
bash 4 \
baz
dude \
happy
xxx
vvv 1 \
vvv 2 \
CCC
'

# We are using (1,$)g/re/command-list and (.,.+1)j to join lines ending with a '\'
# ?? repeats the last regex search.
# replace ',p' with 'wq' to edit files in-place
# (using Bash and FreeBSD ed on Mac OS X)
cat <<-'EOF' | ed -s <(printf '%s' "$str")
H
,g/\\$/s///\
.,.+1j\
??s///\
.,.+1j
,p
EOF

— Verdo

2

Using the fact that read in the shell interprets backslashes when used without -r:

$ while IFS= read line; do printf '%s\n' "$line"; done <file
foo bar bash baz
dude happy

Note that this will also interpret any other backslashes in the data.
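A small sketch of that caveat, assuming a POSIX-style shell. The sample helper and its data are hypothetical: the first line contains a backslash in the middle as well as one at the end. Without -r, read treats the mid-line backslash as an escape character and drops it, in addition to joining the continuation line.

```shell
# Hypothetical input: a backslash inside the first line and one at its end.
sample() { printf '%s\n' 'C:\temp \' 'still line one' 'plain'; }

# read without -r: backslash-newline joins lines, and every other
# backslash is consumed as an escape character.
joined=$(sample | while IFS= read line; do printf '%s\n' "$line"; done)

printf '%s\n' "$joined"
```

Here the first output line comes out as "C:temp still line one": the lines are joined, but the backslash before "temp" is gone as well. If the data may contain backslashes that must survive, one of the sed, awk or perl approaches is the safer choice.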

— Kusalananda

No. It will not remove all backslashes. Try a\\b\\\\\\\\\\\c

— Isaac

@Isaac Ah, perhaps I should have said "interpret any other backslashes"?

— Kusalananda

1

A simple solution that loads the whole file into memory:

sed -z 's/\\\n//g' file                   # GNU sed 4.2.2+.

Or a version, still short, that works on whole (output) lines (GNU syntax):

sed ':x;/\\$/{N;bx};s/\\\n//g' file

On one line (POSIX syntax):

sed -e :x -e '/\\$/{N;bx' -e '}' -e 's/\\\n//g' file

Or use awk (if the file is too big to fit in memory):

awk '{a=sub(/\\$/,"");printf("%s%s",$0,a?"":RS)}' file
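As a quick sanity check, the POSIX sed form and the awk form above can be run side by side. This is only a sketch on the question's sample data; the sample helper is hypothetical, and matching output on one input does not prove the two commands agree on every input.

```shell
# Hypothetical helper that reproduces the sample file from the question.
sample() { printf '%s\n' 'foo bar \' 'bash \' 'baz' 'dude \' 'happy'; }

# POSIX sed form: collect continuation lines, then drop each backslash-newline.
sed_out=$(sample | sed -e :x -e '/\\$/{N;bx' -e '}' -e 's/\\\n//g')

# awk form: strip a trailing backslash and suppress the newline for that line.
awk_out=$(sample | awk '{a=sub(/\\$/,"");printf("%s%s",$0,a?"":RS)}')

if [ "$sed_out" = "$awk_out" ]; then
  printf 'outputs match:\n%s\n' "$sed_out"
else
  printf 'outputs differ\n'
fi
```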

— Isaac

0

A Mac version based on @Giles' solution would look like this:

sed ':x
/\\$/{N; s|\\'$'\\n||; tx
}' textfile

The main difference is in how the newline is represented; combining any more of it onto a single line breaks it.

— Andy

-1

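You can use cpp, but it produces some empty lines where it merged the output, plus some introductory lines that I removed with sed. Maybe that can also be done with cpp flags and options: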

echo 'foo bar \
bash \
baz
dude \
happy' | cpp | sed 's/# 1 .*//;/^$/d'
foo bar bash baz
dude happy

— user unknown

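Are you sure cpp is a solution? In your example, the echo of the string in double quotes already outputs the straightened text, so cpp is pointless. (That applies to your sed code as well.) If you put the string in single quotes, cpp just removes the backslashes but does not join the lines. (Joining with cpp would work if there were no space before the backslash, but then the separate words would be joined with no separator.)

— manatwork, 2012

@manatwork: Outsch! :) I was surprised that the sed command worked, but of course it was not the sed command: bash itself interpreted the backslash-newline as a continuation of the previous line.

— user unknown

Using cpp like that still does not join the lines for me. And sed is definitely not needed. Use cpp -P: "-P Inhibit generation of linemarkers in the output from the preprocessor." – man cpp

— manatwork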

— manatwork

Your command does not work for me:

cpp: “-P: No such file or directory cpp: warning: '-x c' after last input file has no effect cpp: unrecognized option '-P:' cpp: no input files

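A cpp --version reveals cpp (Ubuntu 4.4.3-4ubuntu5.1) 4.4.3. What? Ubuntu is patching cpp? Why? I would have expected to read GNU ...

— user unknown

Interesting. Ubuntu's cpp does join the lines, and leaves some blank ones behind. Even more interesting, the same version 4.4.3-4ubuntu5.1 accepts -P here. However, it only eliminates the linemarkers; the empty lines remain.

— manatwork, 2012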