How do I append lines to the previous line?

9

I have a log file that needs to be parsed and analyzed. The file contains something like the following:

File:

20141101 server contain dump
20141101 server contain nothing
    {uekdmsam ikdas 

jwdjamc ksadkek} ssfjddkc * kdlsdl
sddsfd jfkdfk 
20141101 server contain dump

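Given the above, I need to check each line: if it does not start with a date (a number), it has to be appended to the previous line.

Output file: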

20141101 server contain dump
20141101 server contain nothing {uekdmsam ikdas jwdjamc ksadkek} ssfjddkc * kdlsdl sddsfd jfkdfk 
20141101 server contain dump

text-processing sed awk

— William R

11

A version in perl, using a negative lookahead:

$ perl -0pe 's/\n(?!([0-9]{8}|$))//g' test.txt
20141101 server contain dump
20141101 server contain nothing    {uekdmsam ikdas jwdjamc ksadkek} ssfjddkc * kdlsdlsddsfd jfkdfk
20141101 server contain dump

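The -0 allows the regex to be matched across the entire file, and \n(?!([0-9]{8}|$)) is a negative lookahead: it matches a newline that is not followed by 8 digits or by the end of the line (which, with -0, is the end of the file).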

— muru

@terdon, updated to preserve the last newline.

— muru

Nice one! I'd upvote you, but I'm afraid I already have :)

— terdon

No, -0 is for NUL-delimited records. Use -0777 to slurp the whole file into memory (which you don't need here).

— Stéphane Chazelas

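@StéphaneChazelas So, what would be the best way to make Perl match the newline, other than reading the whole file?

— muru 2014

See the other answers that process the file line by line.

— Stéphane Chazelas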

5

It might be a little easier with sed:

sed -e ':1 ; N ; $!b1' -e 's/\n\+\( *[^0-9]\)/\1/g'

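The first part, :1;N;$!b1, collects all the lines of the file into one long line, separated by \n.
The second part strips a newline if it is followed by a non-digit character, with possible spaces in between.

To avoid memory limitations (especially with big files), you can use: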
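The two parts can be tried on a small sample. This is only a sketch: the function name join_sed and the sample lines are made up for illustration, and it uses the POSIX spelling \{1,\} with the label in its own -e fragment rather than the GNU-specific \+.

```shell
#!/bin/sh
# join_sed is a hypothetical helper name, not part of the original answer.
join_sed() {
    # Part 1 (:1 / N / $!b1): append every following line to the pattern
    # space, so the whole input becomes one string with embedded newlines.
    # Part 2 (s///g): drop one or more newlines when a non-digit follows,
    # optionally preceded by spaces.
    sed -e :1 -e 'N;$!b1' -e 's/\n\{1,\}\( *[^0-9]\)/\1/g' "$@"
}

# Sample input (made up for this demo).
printf '%s\n' \
    '20141101 server contain nothing' \
    '    {uekdmsam ikdas' \
    'jwdjamc ksadkek}' \
    '20141101 server contain dump' | join_sed
```

Because the whole file is held in the pattern space, this form is best kept for small inputs.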

sed -e '1{h;d}' -e '1!{/^[0-9]/!{H;d};/^[0-9]/x;$G}' -e 's/\n\+\( *[^0-9]\)/\1/g'

Or forget the difficult sed scripts and remember that the year starts with 2:

tr '\n2' ' \n' | sed -e '1!s/^/2/' -e 1{/^$/d} -e $a

— Costas

Nice, +1. Could you add an explanation of how it works?

— terdon

1

Aw, nice one. I always do tr '\n' $'\a' | sed $'s/\a\a*\( *[^0-9]\)/\1/g' | tr $'\a' '\n' myself.

— mirabilos 2014年

Sorry, but I have to downvote this for using things in sed(1) that are not POSIX Basic Regular Expressions; that is a GNUism.

— mirabilos 2014年

1

@Costas, that's the man page for GNU grep. The POSIX BRE spec is there. The BRE equivalent of the ERE + is \{1,\}. [\n] is not portable either. \n\{1,\} would be POSIX.

— Stéphane Chazelas

1

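Also, you can't have another command after a label. : 1;x defines the label "1;x" in POSIX seds. So you need: sed -e :1 -e 'N;$!b1' -e 's/\n\{1,\}\( *[^0-9]\)/\1/g'. Also note that many sed implementations have a small limit on the size of their pattern space (POSIX only guarantees 10 x LINE_MAX, IIRC).

— Stéphane Chazelas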

5

One way would be:

 $ perl -lne 's/^/\n/ if $.>1 && /^\d+/; printf "%s",$_' file
 20141101 server contain dump
 20141101 server contain nothing    {uekdmsam ikdas jwdjamc ksadkek} ssfjddkc * kdlsdlsddsfd jfkdfk 
 20141101 server contain dump

However, that also removes the final newline. To add it again, use:

$ { perl -lne 's/^/\n/ if $.>1 && /^\d+/; printf "%s",$_' file; echo; } > new

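Explanation

The -l removes trailing newlines (and also adds one to each print call, which is why I use printf instead). Then, if the current line starts with digits (/^\d+/) and the current line number is greater than one ($.>1, which is needed to avoid adding an extra empty line at the beginning), a \n is added to the beginning of the line. The printf prints each line.

Alternatively, you can change all \n characters to \0, then change the \0 characters that come right before a string of digits back to \n: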

$ tr '\n' '\0' < file | perl -pe 's/\0\d+ |$/\n$&/g' | tr -d '\0'
20141101 server contain dump
20141101 server contain nothing    {uekdmsam ikdas jwdjamc ksadkek} ssfjddkc * kdlsdlsddsfd jfkdfk 
20141101 server contain dump

To make it match only strings of 8 digits, use this instead:

$ tr '\n' '\0' < file | perl -pe 's/\0\d{8} |$/\n$&/g' | tr -d '\0'

— Terdon

The first argument of printf is the format. Use printf "%s", $_

— Stéphane Chazelas

@StéphaneChazelas Why? I mean, I know it's cleaner and perhaps easier to understand, but is there any danger it protects against?

— terdon

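Yes, it's wrong and potentially dangerous if the input may contain % characters. Try it with an input of %10000000000s, for instance.

— Stéphane Chazelas

In C, that's a well-known, very bad practice and a source of vulnerabilities. With perl, echo %.10000000000f | perl -ne printf brings my machine to its knees.

— Stéphane Chazelas

@StéphaneChazelas Wow, yes. Mine too. Fair enough then, answer edited, thanks.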

— terdon

3

Try doing this using awk:

#!/usr/bin/awk -f

{
    # if the current line begins with 8 digits followed by
    # 'nothing' OR the current line doesn't start with 8 digits
    if (/^[0-9]{8}.*nothing/ || !/^[0-9]{8}/) {
        # print current line without newline
        printf "%s", $0
        # feeding a 'state' variable
        weird=1
    }
    else {
        # if last line was treated in the 'if' statement
        if (weird==1) {
            printf "\n%s", $0
            weird=0
        }
        else {
            print # print the current line
        }
    }
}
END{
    print "" # add a newline when there's no more line to treat
}

To use it:

chmod +x script.awk
./script.awk file.txt

— Gilles Quenot

2

Another, simpler way (compared to my other answer), using awk and terdon's algorithm:

awk 'NR>1 && /^[0-9]{8}/{printf "%s","\n"$0;next}{printf "%s",$0}END{print}' file
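Some awk implementations may not handle the {8} interval expression (older mawk, and gawk before 4.0 without --re-interval, as far as I know). A sketch of the same algorithm that spells the eight digits out and prints an explicit empty string at the end might look like this; the function name join_awk and the sample lines are assumptions made for illustration.

```shell
#!/bin/sh
# join_awk is a hypothetical helper name, not part of the original answer.
join_awk() {
    awk '
        # A line starting with 8 digits begins a new output line,
        # except when it is the very first line of the input.
        /^[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]/ {
            if (NR > 1) printf "\n"
        }
        # Every line is printed without a trailing newline.
        { printf "%s", $0 }
        # Terminate the last output line, if there was any input at all.
        END { if (NR > 0) print "" }
    ' "$@"
}

# Sample input (made up for this demo).
printf '%s\n' \
    '20141101 server contain nothing' \
    '    {uekdmsam ikdas' \
    '20141101 server contain dump' | join_awk
```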

— Gilles Quenot

ITYM END{print ""}. Alternative: awk -v ORS= 'NR>1 && /^[0-9]{8}/{print "\n"};1;END{print "\n"}'

— Stéphane Chazelas

1

sed -e:t -e '$!N;/\n *[0-9]\{6\}/!s/\n */ /;tt' -eP\;D

— mikeserv

0

A bash script:

while read LINE
do
    if [[ $LINE =~ ^[0-9]{8} ]]
    then
        echo -ne "\n${LINE} "
    else
        echo -n "${LINE} "
    fi
done < file.txt

In one-line form:

while read L; do if [[ $L =~ ^[0-9]{8} ]]; then echo -ne "\n${L} "; else echo -n "${L} "; fi done < file.txt

A solution that preserves backslashes (read -r) and leading whitespace (just add IFS= after while):

while IFS= read -r LINE
do
    if [[ $LINE =~ ^[0-9]{8} ]]
    then
        echo
        echo -nE "${LINE} "
    else
        echo -nE "${LINE} "
    fi
done < file.txt

In one-line form:

while IFS= read -r L; do if [[ $L =~ ^[0-9]{8} ]]; then echo; echo -nE "${L} "; else echo -nE "${L} "; fi done < file.txt
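Since the behaviour of echo -n and echo -E differs between shells, a sketch of the same loop using printf and a case pattern (which should work in any POSIX sh) might look like this. The function name join_sh and the sample lines are assumptions made for illustration; unlike the loops above, it puts the separating space before each continuation line instead of after every line.

```shell
#!/bin/sh
# join_sh is a hypothetical helper name, not part of the original answer.
join_sh() {
    first=1
    # IFS= keeps leading whitespace, -r keeps backslashes; the extra test
    # also processes a last line that lacks a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]*)
                # A dated line starts a new output line (except the first).
                [ "$first" -eq 1 ] || printf '\n'
                printf '%s' "$line"
                ;;
            *)
                # Anything else is appended to the current output line.
                printf ' %s' "$line"
                ;;
        esac
        first=0
    done
    printf '\n'
}

# Sample input (made up for this demo).
printf '%s\n' \
    '20141101 server contain nothing' \
    '{uekdmsam ikdas' \
    '20141101 server contain dump' | join_sh
```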

— rook

This will break if the line contains a backslash followed by n. It also strips whitespace. But you can use mksh to do this: while IFS= read -r L; do [[ $L = [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]* ]] && print; print -nr -- "$L"; done; print

— mirabilos

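Sure, it's not a one-size-fits-all algorithm, but a solution for the requirements stated in the task. Of course, the final solution will become more complex and less readable at a glance, as usually happens in real life :)

— rook 2014

I agree, but I've learnt the hard way not to assume too much about the OP ☺ especially when they replace the actual text with dummy text.

— mirabilos 2014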

0

[shyam@localhost ~]$ perl -lne 's/^/\n/ if $.>1 && /^\d+/; printf "%s",$_' appendDateText.txt

That works.

i/p:
##06/12/2016 20:30 Test Test Test
##TestTest
##06/12/2019 20:30 abbs  abcbcb abcbc
##06/11/2016 20:30 test test
##i123312331233123312331233123312331233123312331233Test
## 06/12/2016 20:30 abc

o/p:
##06/12/2016 20:30 Test Test TestTestTest
##06/12/2019 20:30 abbs  abcbcb abcbc
##06/11/2016 20:30 test ##testi123312331233123312331233123312331233123312331233Test
06/12/2016 20:30 abc vi appendDateText.txt

— Shyam Gupta