Search text for a prefix and list all its suffixes in the text


17

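I use "suffix" loosely here to mean "any substring that follows the prefix".

"Prefix" here means the START of a word, where the start of a word is defined as either right after a space or at the first character of the input text (for the first word). A "prefix" in the middle of a word is ignored.

E.g. if your input prefix is "arm" and the input text is "Dumbledore's army was fully armed for the impending armageddon", then the output list contains (y, ed, ageddon).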
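For reference, the rule above can be written out as a short, ungolfed Python sketch. The function name `suffixes` is made up for illustration, and the sketch assumes that only the space character separates words.

```python
def suffixes(prefix, text):
    """Return what follows `prefix` in every space-delimited word that starts with it."""
    words = text.split(" ")  # only the space character separates words
    return [w[len(prefix):] for w in words if w.startswith(prefix)]


print(suffixes("arm", "Dumbledore's army was fully armed for the impending armageddon"))
# ['y', 'ed', 'ageddon']
```

Duplicates are kept, which the challenge allows.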

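Test Cases

Assume matching is case-sensitive and that strings end at spaces. The input will not start with a space.

Removing duplicates is optional.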


Input prefix: "1"

Input text:

"He1in aosl 1ll j21j 1lj2j 1lj2 1ll l1j2i"

Output: (ll, lj2j, lj2) - in any permutation

Input prefix: "frac"

Input text: 

"fracking fractals fracted fractional currency fractionally fractioned into fractious fractostratic fractures causing quite a fracas"

Output: (king, tals, ted, tional, tionally, tioned, tious, tostratic, tures, as)

Input prefix: "href="https://www.astrotheme.com/astrology/"

Input text: 

"(div style="padding: 0; background: url('https://www.astrotheme.com/images/site/arrondi_450_hd.png') no-repeat; text-align: left; font-weight: bold; width: 450px; height: 36px")
  (div class="titreFiche" style="padding: 5px 0 0 6px")(a href="https://www.astrotheme.com/astrology/Nolwenn_Leroy" title="Nolwenn Leroy: Astrology, birth chart, horoscope and astrological portrait")Nolwenn Leroy(br /)
(/div)
  (div style="text-align: right; border-left: 1px solid #b2c1e2; border-right: 1px solid #b2c1e2; width: 446px; padding: 1px 1px 0; background: #eff8ff")
    (table style="width: 100%")(tr)(td style="width: 220px")
(div style="padding: 0; background: url('https://www.astrotheme.com/images/site/arrondi_450_hd.png') no-repeat; text-align: left; font-weight: bold; width: 450px; height: 36px")
  (div class="titreFiche" style="padding: 5px 0 0 6px")(a href="https://www.astrotheme.com/astrology/Kim_Kardashian" title="Kim Kardashian: Astrology, birth chart, horoscope and astrological portrait")Kim Kardashian(br /)(span style="font-weight: normal; font-size: 11px")Display her detailed horoscope and birth chart(/span)(/a)(/div)
(/div)
(div style="padding: 0; background: url('https://www.astrotheme.com/images/site/arrondi_450_hd.png') no-repeat; text-align: left; font-weight: bold; width: 450px; height: 36px")
  (div class="titreFiche" style="padding: 5px 0 0 6px")(a href="https://www.astrotheme.com/astrology/Julia_Roberts" title="Julia Roberts: Astrology, birth chart, horoscope and astrological portrait")Julia Roberts(br /)(span style="font-weight: normal; font-size: 11px")Display her detailed horoscope and birth chart(/span)(/a)(/div)
    (td id="cfcXkw9aycuj35h" style="text-align: right")
  (/div)"

Output: (Nolwenn_Leroy", Kim_Kardashian", Julia_Roberts")

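Winner

This is code-golf, so the fewest bytes wins. :)

You can accept the inputs in any way that works, as long as your code can solve arbitrary problems like the test cases.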


2
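To be clear, the prefix has to be at the start of a word? If the second test case contained the word "diffraction", would that change the output?
sundar - Reinstate Monica

2
How can https://www.astrotheme.com/astrology/ be a prefix when it is preceded by href="?
Neil

1
Can the suffix be empty?
user202729

1
I would suggest allowing people to split on other whitespace as well as spaces, since a few answers seem to be doing so anyway. I would also suggest saying that there will not be multiple spaces in a row in the input (or, somewhat equivalently, that empty words may result in undefined behaviour). I suggest both of these things because the main part of the challenge is not the splitting into words (I'd suggest just allowing a list of words, or even just a word, as input, but it's too late now with 22 answers; something to note for future challenges, though).
Jonathan Allan

1
-1 for allowing splitting on other whitespace now. The challenge should have been that way originally, but changing it now would split the answers into ones that do two different things. This isn't like the cases where some languages can't handle e.g. 64-bit numbers; here it just means implementing a slightly (possibly) more complex match, so it makes more sense to correct the answers that made the wrong assumption, and maybe also add a test case to check for this.
sundar - Reinstate Monica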

Answers:


5

R, 63 bytes

function(s,p,z=el(strsplit(s,' ')))sub(p,'',z[startsWith(z,p)])

Try it online!

Unfortunately, the positive-lookbehind implementation is 5 bytes longer because of the huge regmatches/gregexpr combination:

function(s,p)regmatches(s,gregexpr(paste0('(?<=',p,')[^ ]*'),s,,T))
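For readers who don't use R, here is a rough Python sketch of the same lookaround idea. It is not a translation of the R code: it escapes the prefix and adds an explicit word-start check, both of which are assumptions of this sketch, and the name `suffixes_regex` is made up.

```python
import re


def suffixes_regex(prefix, text):
    # (?<![^ ])  the match must not be preceded by a non-space character,
    #            i.e. it sits at the start of the text or right after a space
    # ([^ ]*)    the suffix: everything up to the next space
    pattern = r"(?<![^ ])" + re.escape(prefix) + r"([^ ]*)"
    return re.findall(pattern, text)


print(suffixes_regex("frac", "diffraction fracas frac"))
# ['as', '']
```

With a single capture group, `re.findall` returns only the captured suffixes, so no separate step is needed to strip the prefix.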

2
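A naive sub(grep()) is slightly better than the lookbehind at 66, but it still doesn't encroach on startsWith(). I don't see much room for improvement here without a change of approach. Try it online!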
刑事


4

Japt, 9 bytes

8 bytes if we can take input as an array of words.

¸kbV msVl
¸         // Shorthand for `qS`, split into words.
 kbV      // Filter the words, selecting only those that start with the prefix.
     msVl // For each remaining word, remove prefix length chars from the start.

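Try it online!


Very nice, but it doesn't seem to work for the last test case. Possibly because of the quotes inside the string? Or the newlines?
DrQuarius

@DrQuarius Your last test case is flawed, isn't it? All of the strings you're looking for are in the middle of words (enclosed in url('')), and none of them are at the start.
Nit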


4

C (gcc), 113 109 106 105 bytes

-4 bytes thanks to @LambdaBeta!
-3 bytes thanks to @WindmillCookies!

i;f(char*s,char*t){for(i=strlen(s);*t;t++)if(!strncmp(t,s,i))for(t+=i,puts("");*t^32&&*t;)putchar(*t++);}

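Try it online!


1
You can save 4 bytes by removing both ^0. Just ;*t; and &&*t;
LambdaBeta

@LambdaBeta Thanks! I missed that.
betseg

1
I was able to get it down to 107 using a different strategy, sorry :)
LambdaBeta

@LambdaBeta I actually thought of that method, but I didn't think it would be shorter than the solution I posted. Nice answer, upvoted.
betseg

1
Used puts instead of putchar; it's 107 now, with output on separate lines: tio.run/...
Windmill Cookies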

3

Japt, 16 12 bytes

Port of Arnauld's answer

-4 bytes from @Shaggy

iS qS+V Å®¸g

iS                  Insert S value (S = " ") at beginning of first input (Implicit)
   q                split using
    S+V             S + Second input
        Å           slice 1
         ®          map
          ¸         split using S
           g        get first position

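Try it online!



Should probably mention that this is a port of Arnauld's solution. (Assuming it wasn't independently derived, of course)
Shaggy

@Shaggy Honestly, I didn't notice this was the same answer. Either way, I will give him credit. Sorry
Luis felipe De jesus Munoz

There is a 9-byte solution if you want to take a shot at it.
Shaggy

@Shaggy Do you mean this, or did you have something different in mind?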
Nit

3

05AB1E, 11 bytes

#ʒηså}εsgF¦

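Try it online! (here is a demo for multiline strings)

How does it work?

#ʒηså}εsgF¦    Full program.
#              Split the first input by spaces.
 ʒ   }         Filter the words by...
  ηså          ... "Does the second input occur in the prefixes of the word?"
      ε        And for each valid word
       sg      Retrieve the length of the second input.
         F¦    And drop the first character of the word that many times.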
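A literal Python rendering of those steps may help readers who don't know 05AB1E. It is deliberately unidiomatic so that each line maps to one step of the explanation; the helper names are made up.

```python
def prefixes(word):
    """All non-empty prefixes of word, shortest first."""
    return [word[:i] for i in range(1, len(word) + 1)]


def suffixes_via_prefix_list(text, prefix):
    result = []
    for word in text.split(" "):          # split the first input by spaces
        if prefix in prefixes(word):      # is the second input one of the word's prefixes?
            for _ in range(len(prefix)):  # as many times as the prefix is long...
                word = word[1:]           # ...drop the first character
            result.append(word)
    return result


print(suffixes_via_prefix_list("He1in aosl 1ll j21j 1lj2j 1lj2 1ll l1j2i", "1"))
# ['ll', 'lj2j', 'lj2', 'll']
```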

:) Very nice, thanks for the multiline demo! I think that was what caused problems for the other programs.
DrQuarius

3

Stax, 8 bytes

·B¬╤²*6&

Run and debug it

Explanation:

j{x:[fmx|- Full program, implicit input: On stack in order, 1st input in X register
j          Split string on spaces
 {   f     Filter:
  x:[        Is X a prefix?
      m    Map passing elements:
       x|-   Remove all characters in X the first time they occur in the element
             Implicit output

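I could also use x%t (length of X, trim from left), which is equally long but packs to 9 bytes.


Beautiful. :) I think this might be the winner. Most of the contenders with the lowest byte counts couldn't parse the third test case. :)
DrQuarius

Ahh... but I see how you've done it now: you had to let the program know that the quote marks in the string are not part of the program. I think that's fine. And yours is still the shortest regardless. :)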
DrQuarius

3

Retina, 31 bytes

L`(?<=^\2¶(.|¶)*([^ ¶]+))[^ ¶]+

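Try it online! The first line should be the desired prefix, and the rest is the input text. Does not remove duplicates. Would be 25 bytes if any white space were a valid separator. Explanation: We want to list the suffixes of valid prefixes. The [^ ¶]+ matches the suffix itself. The first part of the regexp is a lookbehind that ensures that the suffix is preceded by the input prefix. As a lookbehind is evaluated right-to-left, it starts by matching the prefix (using the same pattern, but inside ()s to capture it), then any characters, before finally matching the prefix on its own line at the start of the input.


Does white space mean spaces and/or line breaks? I think that's a valid solution if so, but to be fair to everyone I'll leave the question as it is.
DrQuarius

@DrQuarius No, any white space includes tabs, formfeeds and even ellipses.
Neil

Retina was the first language that came to my mind when I saw the post (although I don't know the language yet). I thought it would be shorter, though. Could I trouble you for an explanation? E.g. the docs say ¶ is a newline character, but I can't figure out why so many are needed here.
sundar - Reinstate Monica

@sundar Sorry, I was in a bit of a rush at the time. The first ¶ ensures that the entire first line is matched against the prefix. The second ¶ is needed because it's not known how many lines lie in between. The last two ¶s work the same way: negated character classes normally include newlines, but we don't want that here.
Neil

No problem, thanks for adding that in. "normally include newlines but we don't want that here" <- If I understand correctly, we do want that here. OP specifies strictly that only spaces count as separators: prefixes begin after a space and suffixes end at a space. So e.g. "dif\nfractional" shouldn't match for "frac", because the prefix comes after a newline, not a space. Similarly, "fracture-\nrelated" should return the suffix "ture-\nrelated". Which is good news I think, because you can remove at least one ¶, possibly more.
sundar - Reinstate Monica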
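The distinction discussed in this thread can be made concrete with a small Python sketch that runs the same matching under both readings of "separator". The `sep` parameter and the function name are made up for the illustration.

```python
def suffixes(prefix, text, sep=" "):
    # sep=" "  : strict reading, only the space character separates words
    # sep=None : loose reading, any run of whitespace separates words
    words = text.split() if sep is None else text.split(sep)
    return [w[len(prefix):] for w in words if w.startswith(prefix)]


text = "dif\nfractional fracture-\nrelated"
print(suffixes("frac", text))            # ['ture-\nrelated']
print(suffixes("frac", text, sep=None))  # ['tional', 'ture-']
```

Under the strict reading the newline stays inside the word, so "fractional" is not at a word start and the second suffix keeps its embedded newline.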

3

Brachylog, 24 21 bytes

tṇ₁W&h;Wz{tR&h;.cR∧}ˢ

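Try it online!

Could have been a few bytes shorter if there was variable sharing with inline predicates.

Input is an array with the prefix as the first element and the text as the second element.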

tṇ₁W                    % Split the text at spaces, call that W
    &h;Wz               % Zip the prefix with each word, to give a list of pairs
         {         }ˢ   % Select the outputs where this predicate succeeds:
          tR            % Call the current word R
            &h;.c       % The prefix and the output concatenated
                 R      % should be R
                  ∧     % (No more constraints on output)

2

IBM/Lotus Notes Formula, 54 bytes

c:=@Explode(b);@Trim(@If(@Begins(c;a);@Right(c;a);""))

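Takes its input from two fields named a and b. This works because Formula applies a function recursively to a list without the need for an @For loop.

No TIO available, so here's a screenshot: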

enter image description here


2

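APL (Dyalog Unicode), 23 bytes SBCS

Full program. Prompts for the text and the prefix from stdin. Prints the list to stdout.

(5⌽'(\w+)\b',⎕)⎕S'\1'⊢⎕

Try it online!

⎕ prompt (for text)

⊢ yield that (separates '\1' from ⎕)

(…)⎕S'\1' PCRE Search for, and return the list of capture group 1 from, the following regex:

⎕ prompt (for prefix)

'(\w+)\b', prepend this string (a group of word characters followed by a word boundary)

5⌽ rotate the first 5 characters to the end; '\bPREFIX(\w+)'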
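The rotation trick can be checked with a small Python sketch. It uses Python's `re` module rather than the PCRE engine behind ⎕S, leaves the prefix unescaped as in the APL program, and the helper names are made up.

```python
import re


def rotate_left(s, n):
    """Move the first n characters of s to the end."""
    return s[n:] + s[:n]


def suffixes_word_boundary(prefix, text):
    # r"(\w+)\b" is 7 characters long; rotating the first 5 of them to the
    # end leaves r"\b" in front of the prefix and r"(\w+)" behind it
    pattern = rotate_left(r"(\w+)\b" + prefix, 5)
    return re.findall(pattern, text)


print(rotate_left(r"(\w+)\b" + "frac", 5))
# \bfrac(\w+)
```

Note that `\b` and `\w+` are not the same rule as "split on spaces": a word boundary also occurs after punctuation such as a hyphen, and `\w+` needs at least one character, so a word equal to the prefix yields nothing.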


2

C (clang), 107 bytes

i;f(s,t,_)char*s,*t,*_;{i=strlen(s);_=strtok(t," ");while((strncmp(_,s,i)||puts(_+i))&&(_=strtok(0," ")));}

Try it online!

Description:

i;f(s,t,_)char*s,*t,*_;{   // f takes s and t and uses i (int) and s,t,_ (char*)
    i=strlen(s);           // save strlen(s) in i
    _=strtok(t," ");       // set _ to the first word of t
    while(                 // while loop
        (strncmp(_,s,i)||  // short-circuited if (if _ doesn't match s to i places)
         puts(_+i))        // print _ starting at the i'th character
        &&                 // the previous expression always returns true
        (_=strtok(0," "))) // set _ to the next word of t
    ;                      // do nothing in the actual loop
}

Has to be clang, because gcc segfaults without #include <string.h> due to strtok issues.



2

MATL, 17 bytes

Yb94ih'(.*)'h6&XX

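Try it on MATL Online

How?

Yb - Split the input at spaces, place the results in a cell array

94 - ASCII code for the ^ character

ih - Get the input (say "frac"), concatenate '^' and the input

'(.*)'h - Push the string '(.*)' onto the stack, concatenate '^frac' and '(.*)'. So now we have '^frac(.*)', a regex that matches "frac" at the beginning of the string and captures whatever comes after.

6&XX - Run regexp matching, with 6& specifying 'Tokens' mode, i.e. the matched capture groups are returned instead of the entire match.

Implicitly output the results.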
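The same anchor-and-capture idea can be sketched in Python. Unlike the MATL program, this sketch escapes the prefix before building the pattern, which is an assumption added here; the function name is made up.

```python
import re


def suffixes_tokens(prefix, text):
    # "^" anchors at the start of each word; "(.*)" captures whatever follows.
    # re.DOTALL lets "." match a newline that may sit inside a word.
    pattern = re.compile("^" + re.escape(prefix) + "(.*)", re.DOTALL)
    out = []
    for word in text.split(" "):
        m = pattern.match(word)
        if m:
            out.append(m.group(1))  # the captured group only, not the whole match
    return out


print(suffixes_tokens("frac", "fracking into fracas"))
# ['king', 'as']
```

Returning `m.group(1)` plays the role that 'Tokens' mode plays in the MATL answer: the caller gets the capture, not the full match.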


So that's what 'Tokens' does; good to know!
Luis Mendo

1
Haha. I had no idea either, figured it out by trial and error.
sundar - Reinstate Monica


2

PowerShell 3.0, 60 62 59 bytes

param($p,$s)-split$s|%{if($_-cmatch"^$p(.*)"){$Matches[1]}}

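Lost some bytes suppressing the cmatch output. I had a janky solution that gained some bytes by purposely causing duplicates. But it also threw red error lines if the first word didn't match, and thinking about it now, that's not fine. +2 bytes to fix it.


The 60-byte solution returns duplicate answers in some cases (king, tals, ted, tional, tional, tionally, tioned, tioned, tious, tostratic, tures, tures, tures, tures, as) and shows an index error on the He1in example. PowerShell 5.1, 6.0.2. The 62-byte solution is OK.
mazzy

1
@mazzy I knew that. I was just abusing the "duplicates are allowed" bit to have it return more duplicates when it ran into a non-match, and throw red on a first iteration with no match.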
Veskah

1

JavaScript (ES6), 57 bytes

Takes input in currying syntax (text)(prefix). Does not remove duplicates.

s=>p=>(' '+s).split(' '+p).slice(1).map(s=>s.split` `[0])

Try it online!




1

Husk, 11 bytes

Pretty much just a port of the Haskell answer

m↓L⁰foΠz=⁰w

Try it online!

Explanation

m↓L⁰f(Πz=⁰)w  -- prefix is explicit argument ⁰, the other one implicit. eg: ⁰ = "ab" and implicit "abc def"
           w  -- words: ["abc","def"]
    f(    )   -- filter by (example w/ "abc"
       z=⁰    -- | zip ⁰ and element with equality: [1,1]
      Π       -- | product: 1
              -- : ["abc"]
m             -- map the following
 ↓            -- | drop n elements
  L⁰          -- | n being the length of ⁰ (2)
              -- : ["c"]
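The zip-and-product filter translates to Python roughly as below. One caveat applies to the Python version: `zip` stops at the shorter input, so a word shorter than the prefix would pass the pairwise check, and the sketch adds a length guard. Whether the Husk program needs such a guard is not established here. The function names are made up.

```python
def starts_with_zip(prefix, word):
    pairwise = [a == b for a, b in zip(prefix, word)]  # e.g. "ab" vs "abc" -> [True, True]
    # zip truncates to the shorter input, so "fr" would pass against "frac"
    # without the explicit length check
    return len(word) >= len(prefix) and all(pairwise)


def suffixes_zip(prefix, text):
    # keep the words that pass the filter, then drop len(prefix) characters
    return [w[len(prefix):] for w in text.split(" ") if starts_with_zip(prefix, w)]


print(suffixes_zip("ab", "abc def"))
# ['c']
```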

1

Jelly, 11 9 bytes

Ḳœṣ€ḢÐḟj€

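A dyadic Link accepting the text (a list of characters) on the left and the prefix (a list of characters) on the right, which yields a list of lists of characters (the resulting suffixes).

Try it online! (the footer joins with spaces to avoid the full program's implicit smashing)
Note: I added three edge cases to the string in the OP, including nofracfrachere at the beginning.

How?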

Ḳœṣ€ḢÐḟj€ - Link: text, prefix                        e.g. "fracfracit unfracked", "frac"
Ḳ         - split (text) at spaces -> list of words        ["fracfracit", "unfracked"]
   €      - for each (word):
 œṣ       -   split around sublists equal to (prefix)       ["","","it"]  ["un","ked"]
     Ðḟ   - filter discard items for which this is truthy:
    Ḣ     -   head
          -   -- Crucially this modifies the list:             ["","it"]       ["ked"]
          -   -- and yields the popped item:                 ""            "un"
          -   -- and only non-empty lists are truthy:       kept          discarded
          -            ...so we end up with the list:      [["","it"]]
        € - for each (remaining list of lists of characters):
       j  -   join with the prefix                          "fracit"                                             
          -                                                ["fracit"]
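The split-around, head, filter, join sequence can be mimicked with Python's `str.split` and `str.join`. This is only a sketch: it assumes a non-empty prefix (Python raises ValueError when splitting on an empty string) and adds a guard against empty words that the Jelly explanation does not mention. The function name is made up.

```python
def suffixes_split_around(text, prefix):
    results = []
    for word in text.split(" "):
        parts = word.split(prefix)       # "fracfracit" -> ["", "", "it"]
        head, rest = parts[0], parts[1:]
        # an empty head means the word began with the prefix; a non-empty
        # rest means the prefix really occurred (guards against empty words)
        if head == "" and rest:
            results.append(prefix.join(rest))  # ["", "it"] -> "fracit"
    return results


print(suffixes_split_around("fracfracit unfracked", "frac"))
# ['fracit']
```

Joining the remaining pieces with the prefix restores any later occurrences of the prefix inside the suffix, which is why "fracfracit" yields "fracit" and not "it".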

Previous 11-byter:

Ḳs€L}Ḣ⁼¥ƇẎ€

Also a dyadic Link as above.

Try it online!


1

Perl 5 with -asE, 23 22 21 bytes (?)

say/^$b(.*)/ for@F

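Try it online!

Can be run as a command-line one-liner as perl -asE 'say/^$b(.*)/ for@F' -- -b=frac -, or with a filename in place of the last -.
Or from a script file, say as perl -as -M5.010 script.pl -b=frac - (thanks to @Brad Gilbert b2gills for the TIO link demonstrating this).

The code itself is 18 bytes. I added 3 bytes for the -b= option, which assigns its value (the prefix input) to the variable named $b in the code. That felt like an exception to the usual "flags aren't counted" consensus.

-a splits each input line at spaces and places the result in the array @F. -s is a shortcut for assigning a command-line argument to a variable by giving a name on the command line. Here the argument is -b=frac, which places the prefix "frac" in the variable $b.

/^$b(.*)/ - Matches the value of $b at the beginning of the string. .* is whatever comes after that, until the end of the word, and the surrounding parentheses capture this value. The captured values are automatically returned, to be printed by say. Iterating through space-separated words with for @F means we don't have to check for initial or final spaces.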



1

Perl 6, 30 bytes

{$^t.comb: /[^|' ']$^p <(\S+/}

Test it

Expanded:

{  # bare block lambda with placeholder params $p, $t

  $^t.comb:    # find all the substrings that match the following
  /
    [ ^ | ' ' ] # beginning of string or space
    $^p        # match the prefix
    <(         # don't include anything before this
    \S+        # one or more non-space characters (suffix)
  /
}

@sundar fixed
Brad Gilbert b2gills

You seem to have an extra space between 'p' and '<' btw.
sundar - Reinstate Monica

@sundar The space between p and <( is necessary as otherwise it may be seen as $v<…> which is short for $v{qw '…'}.
Brad Gilbert b2gills

1
Seems to work without it though, at least in this case.
sundar - Reinstate Monica

1
@sundar Technically it just warns, but I don't like writing code that warns when it is only one byte different than code that doesn't warn.
Brad Gilbert b2gills

1

Java 10, 94 bytes

p->s->{for(var w:s.split(" "))if(w.startsWith(p))System.out.println(w.substring(p.length()));}

Try it online here.

Ungolfed:

p -> s -> { // lambda taking prefix and text as Strings in currying syntax
    for(var w:s.split(" ")) // split the String into words (delimited by a space); for each word ...
        if(w.startsWith(p)) //  ... test whether p is a prefix ...
            System.out.println(w.substring(p.length())); // ... if it is, output the suffix
}

1

Small Basic, 242 bytes

A Script that takes no input and outputs to the TextWindow Object

c=TextWindow.Read()
s=TextWindow.Read()
i=1
While i>0
i=Text.GetIndexOf(s," ")
w=Text.GetSubText(s,1,i)
If Text.StartsWith(w,c)Then
TextWindow.WriteLine(Text.GetSubTextToEnd(w,Text.GetLength(c)+1))
EndIf
s=Text.GetSubTextToEnd(s,i+1)
EndWhile

Try it at SmallBasic.com! Requires IE/Silverlight



1

Brachylog, 12 bytes

hṇ₁∋R&t;.cR∧

Try it online!

Takes input as [text, prefix] through the input variable, and generates each word through the output variable. This was originally sundar's answer, which I started trying to golf after reading that it "could have been a few bytes shorter if there was variable sharing with inline predicates", which is possible now. Turns out that generator output saves even more bytes.

    R           R
   ∋            is an element of
h               the first element of
                the input
 ṇ₁             split on spaces,
     &          and the input
      t         's last element
         c      concatenated
       ;        with
        .       the output variable
          R     is R
           ∧    (which is not necessarily equal to the output).

My first two attempts at golfing it down, using fairly new features of the language:

With the global variables that had been hoped for: hA⁰&tṇ₁{∧A⁰;.c?∧}ˢ (18 bytes)

With the apply-to-head metapredicate: ṇ₁ᵗz{tR&h;.cR∧}ˢ (16 bytes)

And my original solution:

Brachylog, 15 bytes

ṇ₁ʰlᵗ↙X⟨∋a₀⟩b↙X

Try it online!

Same I/O. This is essentially a generator for words with the prefix, ṇ₁ʰ⟨∋a₀⟩, modified to remove the prefix.

                   The input variable
  ʰ                with its first element replaced with itself
ṇ₁                 split on spaces
    ᵗ              has a last element
   l               the length of which
     ↙X            is X,
       ⟨   ⟩       and the output from the sandwich
       ⟨∋  ⟩       is an element of the first element of the modified input
       ⟨ a₀⟩       and has the last element of the input as a prefix.
                   The output variable
       ⟨   ⟩       is the output from the sandwich
            b      with a number of characters removed from the beginning
             ↙X    equal to X.

A very different predicate with the same byte count:

Brachylog, 15 bytes

hṇ₁∋~c₂Xh~t?∧Xt

Try it online!

Same I/O.

   ∋               An element of
h                  the first element of
                   the input variable
 ṇ₁                split on spaces
    ~c             can be un-concatenated
      ₂            into a list of two strings
       X           which we'll call X.
        h          Its first element
         ~t        is the last element of
           ?       the input variable,
            ∧      and
             Xt    its last element is
                   the output variable.


0

Pyth, 21 20 18 17 16 bytes

AQVcH)IqxNG0:NG"

Try it online!

-1 by using V instead of FN because V implicitly sets N

-2 after some further reading about string slicing options

-1 using x to check for the presence of the substring at index 0

-1 using replace with "" for getting the end of the string

I'm sure this could use some serious golfing but as a Pyth beginner, just getting it to work was a bonus.

How does it work?

assign('Q',eval_input())
assign('[G,H]',Q)
for N in num_to_range(chop(H)):
    if equal(index(N,G),0):
        imp_print(at_slice(N,G,""))
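The listing above is Pyth's own expansion and relies on Pyth helper functions, so it does not run as plain Python. A standalone sketch of the same two ideas (prefix found at index 0, then replace it with an empty string) might look like this. It assumes that only the first occurrence should be removed and that words are split on single spaces; neither has been checked against Pyth's behaviour. The function name is made up.

```python
def suffixes_index_replace(prefix, text):
    out = []
    for word in text.split(" "):
        if word.find(prefix) == 0:                   # the prefix occurs at index 0
            out.append(word.replace(prefix, "", 1))  # remove only that first occurrence
    return out


print(suffixes_index_replace("1", "He1in aosl 1ll j21j 1lj2j 1lj2 1ll l1j2i"))
# ['ll', 'lj2j', 'lj2', 'll']
```

The count argument of `str.replace` matters for words such as "fracfracit", where replacing every occurrence would return "it" instead of "fracit".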

0

Excel VBA, 86 bytes

Takes input as prefix in [A1] and values in [B1] and outputs to the console.

For each w in Split([B1]):?IIf(Left(w,[Len(A1)])=[A1],Mid(w,[Len(A1)+1])+" ","");:Next
Licensed under cc by-sa 3.0 with attribution required.