Oneliner to merge lines with the same first field


This is my first code golf question, so I apologize in advance if it's not appropriate, and I welcome any feedback.

I have a file with this format:

a | rest of first line
b | rest of second line
b | rest of third line
c | rest of fourth line
d | rest of fifth line
d | rest of sixth line

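The actual contents and the delimiter vary. The content is just text. The delimiter appears only once per line. For this puzzle, feel free to change the delimiter, e.g. use "%" as the delimiter.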

Desired output:

a | rest of first line
b | rest of second line % rest of third line
c | rest of fourth line
d | rest of fifth line % rest of sixth line

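I already have ruby and awk scripts that do this merge, but I suspect a shorter oneliner is possible, i.e. a oneliner that can be used together with pipes and other commands on the command line. I can't figure it out, and my own scripts are too long to just squeeze onto the command line.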

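Shortest in characters is preferred. The input is not necessarily sorted, but we are only interested in merging consecutive lines with matching first fields. Any number of consecutive lines may share the same first field. Field 1 can be anything, e.g. fruit names, proper names, etc.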

(I run MacOS, so I am personally most interested in implementations that run on the Mac.)
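For reference, here is a minimal Python sketch of the rule described above. It is not one of the posted answers, and the names `merge_runs`, `sep` and `joiner` are my own. It assumes the key is everything before the first separator and is compared exactly, so with `sep="|"` a space before the `|` is part of the key.

```python
from itertools import groupby


def merge_runs(lines, sep="|", joiner="%"):
    """Merge runs of consecutive lines whose first field (text before `sep`) is equal."""
    # Split every line once into (key, rest); partition(...)[::2] keeps items 0 and 2.
    pairs = [line.rstrip("\n").partition(sep)[::2] for line in lines]
    # groupby only groups *adjacent* equal keys, which is exactly the rule in the question.
    return [key + sep + joiner.join(rest for _, rest in run)
            for key, run in groupby(pairs, key=lambda kv: kv[0])]
```

Calling `merge_runs(lines, sep=" | ", joiner=" % ")` reproduces the first example's output exactly. Because iterating over a file object yields its lines, the same function can be fed `sys.stdin` in a pipeline.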


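Here is a second example/test. Note that "|" is the delimiter. A space before the "|" is immaterial; if present, it should be considered part of the key. I use "%" as the delimiter in the output, but again, feel free to change the delimiter (just don't use square brackets).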

Input:

why|[may express] surprise, reluctance, impatience, annoyance, indignation
whom|[used in] questions, subordination
whom|[possessive] whose
whom|[subjective] who
whoever|[objective] whomever
whoever|[possessive] whosever
who|[possessive] whose
who|[objective] whom

Desired output:

why|[may express] surprise, reluctance, impatience, annoyance, indignation
whom|[used in] questions, subordination%[possessive] whose%[subjective] who
whoever|[objective] whomever%[possessive] whosever
who|[possessive] whose%[objective] whom

Is a newline at the beginning of the output allowed?
mIllIbyte

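I added comments to the original question. And @mIllIbyte, a leading newline doesn't matter to me. But as far as I'm concerned, there are no empty lines and no error checking is needed. I assume all lines have text, and at least the first column and the delimiter.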
MichaelCodes

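Judging from the test cases, can we assume all keys are grouped? I.e. ["A|some text", "B|other text", "A|yet some other text"] is not a desired input to test for, since the A keys are not one after another in the list.
Kevin Cruijssen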

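I have assumed all keys are grouped. I'm not worried about the case where they aren't, although in theory such non-adjacent lines would not be merged under a single key.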
MichaelCodes
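Kevin's example is exactly where the two readings of the task diverge. Several answers below (SQL `GROUP BY`, the Java `HashMap`, the PowerShell hashtable, q's `exec ... by`) group by key regardless of adjacency. The hypothetical helper `merge_all` below sketches that hash-map behaviour in Python: the two non-adjacent `A` lines get merged, whereas under the question's consecutive-only rule all three lines would stay separate.

```python
def merge_all(lines, sep="|", joiner="%"):
    """Merge *every* line sharing a key, adjacent or not (hash-map style grouping)."""
    # Python dicts keep insertion order (3.7+), so keys come out in first-seen order.
    table = {}
    for line in lines:
        key, _, rest = line.rstrip("\n").partition(sep)
        table.setdefault(key, []).append(rest)
    return [key + sep + joiner.join(rests) for key, rests in table.items()]
```

For input whose keys are already grouped, as the OP assumes, this gives the same lines as a consecutive-only merge, although unordered structures such as a Java `HashMap` or an unsorted SQL result may emit them in a different order.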

Answers:


Retina, 17 bytes

  • Saved 12 bytes thanks to @MartinEnder
  • Saved 1 byte thanks to @jimmy23013

The byte count assumes ISO 8859-1 encoding.

Uses ; instead of | as the input field separator.

(?<=(.+;).+)¶\1
%

Try it online.



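@LeakyNun Because lookarounds are atomic. The first time the lookaround is used, it captures the entire prefix of the line, and after that the regex engine won't backtrack into it.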
Martin Ender

V, 13 bytes (previously 16)

òí^¨á«©.*úsî±

Try it online!

You said

feel free to change the delimiter

So I chose | as the delimiter. If this is not valid, let me know and I'll change it.

Explanation:

ò                #Recursively:
 í               #Search for the following on any line:
  ^¨á«©          #1 or more alphabetic characters at the beginning of the line
       .*        #Followed by anything
         ús      #Mark everything after this to be removed:
           î±    #A new line, then the first match again (one or more alphabetic characters)


@ΈρικΚωνσταντόπουλος Is it? Is that a problem?
DJMcMayhem

'For this puzzle, feel free to change the delimiter, e.g. use "%" as the delimiter.' That says e.g., not i.e.
Erik the Outgolfer

The "|" delimiter is fine.
MichaelCodes

@MichaelCodes Could you add more test cases so we can verify whether the solutions are valid?
DJMcMayhem

Perl -0n, 2 + 43 = 45 bytes

s/
.*\|/%/g,print for/(.*\|)((?:
\1|.)*
)/g

Demo:

$ perl -0ne 's/
> .*\|/%/g,print for/(.*\|)((?:
> \1|.)*
> )/g' <<EOF
> why|[may express] surprise, reluctance, impatience, annoyance, indignation
> whom|[used in] questions, subordination
> whom|[possessive] whose
> whom|[subjective] who
> whoever|[objective] whomever
> whoever|[possessive] whosever
> who|[possessive] whose
> who|[objective] whom
> EOF
why|[may express] surprise, reluctance, impatience, annoyance, indignation
whom|[used in] questions, subordination%[possessive] whose%[subjective] who
whoever|[objective] whomever%[possessive] whosever
who|[possessive] whose%[objective] whom
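To make the Perl answer easier to follow, here is a rough Python transliteration. It is a sketch, and it assumes the whole input is read at once (like `-0`) and ends with a newline. `merge_like_perl` is a hypothetical name. The first pattern captures a key together with its `|`, then swallows every following line that starts with that same `key|`. The second pattern replaces each newline-plus-`key|` inside that run with `%`. Because the captured key includes the `|`, `who|` cannot match the start of `whom|`.

```python
import re

# Key including "|", then the rest of that line plus every following line starting with the same key.
GROUP = re.compile(r"(.*\|)((?:\n\1|.)*\n)")
# A line break followed by a repeated "key|" inside one group.
CONT = re.compile(r"\n.*\|")


def merge_like_perl(text):
    if not text.endswith("\n"):
        text += "\n"  # the group pattern needs a final newline to close the last run
    out = []
    for m in GROUP.finditer(text):
        out.append(m.group(1))                 # "key|" is emitted unchanged
        out.append(CONT.sub("%", m.group(2)))  # repeated "\nkey|" collapsed to "%"
    return "".join(out)
```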

SQL (PostgreSQL), 72 bytes (previously 43)

COPY T FROM'T'(DELIMITER'|');SELECT a,string_agg(b,'%')FROM T GROUP BY A

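This uses PostgreSQL's handy string_agg aggregate function. Input comes from a table T with 2 columns, A and B. To comply better with the question, I've included the command to load the data from a file into the table; the file is also named T. I haven't counted the table create statement.
The output will be unordered, but if that is a problem it can be fixed with ORDER BY A.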

SQLFiddle didn't want to play for me, but this is what I get in my setup.

CREATE TABLE T (A VARCHAR(9),B VARCHAR(30));

COPY T FROM'T'(DELIMITER'|');SELECT a,string_agg(b,'%')FROM T GROUP BY A
a   string_agg
--- ----------------------------------------
c   rest of fourth line
b   rest of second line%rest of third line
a   rest of first line
d   rest of fifth line%rest of sixth line

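To be fair, I'd suggest you also include a COPY command to read the contents of a file in the specified format into the table. Otherwise you aren't solving the same problem as everyone else.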
Jules

@Jules Fair. I had the default I/O consensus in mind when I answered, but after rereading the question I'll edit the answer.
MickyT

C, 127 bytes

o[99],n[99],p=n;main(i){for(;gets(n);strncmp(o,n,i-p)?printf(*o?"\n%s":"%s",n),strcpy(o,n):printf(" /%s",i))i=1+strchr(n,'|');}

Works with gcc. Changes the delimiter to /. Takes input from stdin and writes output to stdout, so call it with input redirection: ./a.out <filename

Ungolfed:

o[99],n[99] //declare int, to save two bytes for the bounds
,p=n; //p is an int, saves one byte as opposed to applying an (int) cast to n,
//or to declaring o and n as char arrays
main(i){for(;gets(n);strncmp(o,n,i-p //an (int)n cast would be needed;
// -(n-i) does not work either,
//because pointer arithmetics scales to (int*)
)?printf(*o?"\n%s":"%s" //to avoid a newline at the beginning of output
,n),strcpy(o,n):printf(" /%s",i))i=1+strchr(n,'|');}

Pyth - 15 bytes

This makes some assumptions about the question; I'll change it once the OP clarifies.

jm+Khhd-sdK.ghk

Try it online here.


This doesn't work if the "keys" are words rather than single letters. (The OP clarified this in the comments.)
DJMcMayhem

Python 3 - 146 bytes

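Input is a file name or file path, output is to stdout. Could be a lot shorter if I could take the input as raw text from the command line.

Takes input from stdin and outputs to stdout. The delimiter is set to "|". To test the first example input, use the delimiter " | ".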

from itertools import*
for c,b in groupby([x.split("|")for x in input().split("\n")],key=lambda x:x[0]):print(c,"|"," % ".join((a[1]for a in b)))

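This challenge doesn't explicitly require reading the input from a file, so I think our default I/O methods apply here. And since other answers also take input from STDIN, I think the OP is fine with that.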
Denker

@DenkerAffe Okay, I'll edit it, although I don't think you can even give actual multi-line input through stdin, so it will be pretty much useless.
Keatinge

But you can use input redirection when you run the script.
mIllIbyte

Java 7, 167 bytes

Can probably be golfed more by using a different approach.

import java.util.*;Map c(String[]a){Map m=new HashMap();for(String s:a){String[]x=s.split("=");Object l;m.put(x[0],(l=m.get(x[0]))!=null?l+"%"+x[1]:x[1]);}return m;}

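Note: the method above creates and returns a HashMap with the desired key-value pairs. However, it doesn't print them in the exact output format of the OP's question, with | as the output delimiter between the key and the merged value. Judging from MickyT's SQL answer, which returns a database table, I assumed this is allowed. If not, more bytes would have to be added for a print function.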

Ungolfed & test code:

import java.util.*;

class Main{

    static Map c(String[] a){
        Map m = new HashMap();
        for(String s : a){
            String[] x = s.split("\\|");
            Object l;
            m.put(x[0], (l = m.get(x[0])) != null
                            ? l + "%" + x[1]
                            : x[1]);
        }
        return m;
    }

    public static void main(String[] a){
        Map m = c(new String[]{
            "why|[may express] surprise, reluctance, impatience, annoyance, indignation",
            "whom|[used in] questions, subordination",
            "whom|[possessive] whose",
            "whom|[subjective] who",
            "whoever|[objective] whomever",
            "whoever|[possessive] whosever",
            "who|[possessive] whose",
            "who|[objective] whom"
        });

        // Object instead of Map.Entry because the method returns a raw (non-generic) Map
        for (Object e : m.entrySet()){
            System.out.println(e.toString().replace("=", "|"));
        }
    }
}

Output:

whoever|[objective] whomever%[possessive] whosever
whom|[used in] questions, subordination%[possessive] whose%[subjective] who
why|[may express] surprise, reluctance, impatience, annoyance, indignation
who|[possessive] whose%[objective] whom

PowerShell, 85 bytes

Uses a hashtable to merge the strings:

%{$h=@{}}{$k,$v=$_-split'\|';$h.$k=($h.$k,$v|?{$_})-join'%'}{$h.Keys|%{$_+'|'+$h.$_}}

Since PowerShell doesn't support stdin redirection with <, I'm assuming that piping from Get-Content .\Filename.txt | counts as a default I/O method.

Get-Content .\Filename.txt | %{$h=@{}}{$k,$v=$_-split'\|';$h.$k=($h.$k,$v|?{$_})-join'%'}{$h.Keys|%{$_+'|'+$h.$_}}

Output:

whoever|[objective] whomever%[possessive] whosever
why|[may express] surprise, reluctance, impatience, annoyance, indignation
whom|[used in] questions, subordination%[possessive] whose%[subjective] who
who|[possessive] whose%[objective] whom

APL, 42 characters

{⊃{∊⍺,{⍺'%'⍵}/⍵}⌸/↓[1]↑{(1,¯1↓'|'=⍵)⊂⍵}¨⍵}

Isn't that one byte per character in an APL encoding?
Zacharý

Sed, 55 bytes

:a N;:b s/^\([^|]*\)|\([^\n]*\)\n\1|/\1|\2 %/;ta;P;D;tb

Test run:

$ echo """why|[may express] surprise, reluctance, impatience, annoyance, indignation
> whom|[used in] questions, subordination
> whom|[possessive] whose
> whom|[subjective] who
> whoever|[objective] whomever
> whoever|[possessive] whosever
> who|[possessive] whose
> who|[objective] whom""" | sed ':a N;:b s/^\([^|]*\)|\([^\n]*\)\n\1|/\1|\2 %/;ta;P;D;tb'
why|[may express] surprise, reluctance, impatience, annoyance, indignation
whom|[used in] questions, subordination %[possessive] whose %[subjective] who
whoever|[objective] whomever %[possessive] whosever
who|[possessive] whose %[objective] whom
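The sed answer repeatedly rewrites `key|A`, newline, `key|` into `key|A %`, and branches back with `ta` as long as a substitution succeeded. Python's `re` rejects variable-width lookbehind, so the Retina trick doesn't port directly, but the sed loop does. The sketch below assumes the whole input is held in one string rather than in sed's sliding pattern space, and it joins with `%` without the leading space. `merge_by_substitution` is a hypothetical name.

```python
import re

# One substitution step of the sed script: "key|A\nkey|" -> "key|A%".
# The key may not contain "|" or a newline, and "\1\|" makes "who" unable to match "whom|".
STEP = re.compile(r"^([^|\n]*)\|([^\n]*)\n\1\|", re.MULTILINE)


def merge_by_substitution(text):
    """Apply the step until nothing changes, like sed's `ta` loop."""
    while True:
        text, n = STEP.subn(r"\1|\2%", text)
        if n == 0:
            return text
```

Each pass merges non-overlapping pairs of adjacent lines, so a long run of one key needs several passes, just as the sed script loops.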

q/kdb+, 46 bytes

Solution:

exec"%"sv v by k from flip`k`v!("s*";"|")0:`:f

Example:

q)exec"%"sv v by k from flip`k`v!("s*";"|")0:`:f
who    | "[possessive] whose%[objective] whom"
whoever| "[objective] whomever%[possessive] whosever"
whom   | "[used in] questions, subordination%[possessive] whose%[subjective] who"
why    | "[may express] surprise, reluctance, impatience, annoyance, indignation"

Explanation:

`:f            // assumes the file is named 'f'
("s*";"|")0:   // read in file, assume it has two columns delimitered by pipe
flip `k`v      // convert into table with columns k (key) and v (value)
exec .. by k   // group on key
"%"sv v        // join values with "%"
Licensed under cc by-sa 3.0 with attribution required.