Remove duplicates from a String


17

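Inspired by this unassuming StackOverflow question.

The idea is simple: given a String and an array of Strings, remove every instance of the words in the array (ignoring case) from the input String other than the first, along with any additional whitespace this may leave. The words must match entire words in the input String, not parts of words.

e.g. "A cat called matt sat on a mat and wore a hat A cat called matt sat on a mat and wore a hat", ["cat", "mat"] should output "A cat called matt sat on a mat and wore a hat A called matt sat on a and wore a hat"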

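Input

  • Input can be taken as either a String and an array of Strings, or an array of Strings where the input String is the first element. These parameters can be in either order.
  • The input String may not be taken as a list of space-delimited Strings.
  • The input String will have no leading, trailing or consecutive spaces.
  • All input will contain only the characters [A-Za-z0-9], except that the input String also includes spaces.
  • The input array may be empty or contain words not in the input String.

Output

  • The output can be either the return value of a function, or printed to STDOUT
  • The output must be in the same case as the original String

Test cases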

the blue frog lived in a blue house, [blue] -> the blue frog lived in a house
he liked to read but was filled with dread wherever he would tread while he read, [read] -> he liked to read but was filled with dread wherever he would tread while he
this sentence has no matches, [ten, cheese] -> this sentence has no matches
this one will also stay intact, [] -> this one will also stay intact
All the faith he had had had had no effect on the outcome of his life, [had] -> All the faith he had no effect on the outcome of his life
5 times 5 is 25, [5, 6] -> 5 times is 25
Case for different case, [case] -> Case for different
the letters in the array are in a different case, [In] -> the letters in the array are a different case
This is a test Will this be correct Both will be removed, [this,will] -> This is a test Will be correct Both be removed

As this is code golf, the lowest byte count wins!
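For readers who want an ungolfed baseline to check answers against, here is a minimal reference sketch of the rules above. The function name remove_repeats is hypothetical and not part of the challenge; the sketch splits on single spaces, which relies on the stated guarantee that the input has no leading, trailing or consecutive spaces.

```python
def remove_repeats(sentence, words):
    """Keep only the first occurrence of each listed word, ignoring case."""
    targets = {w.lower() for w in words}
    seen = set()
    kept = []
    for token in sentence.split(" "):
        key = token.lower()
        # Drop a token only if it is a listed word that has already appeared.
        if key in targets and key in seen:
            continue
        seen.add(key)
        kept.append(token)
    # Re-joining the kept tokens also removes the whitespace a deletion would leave.
    return " ".join(kept)
```

Calling remove_repeats("5 times 5 is 25", ["5", "6"]) gives "5 times is 25", matching the corresponding test case.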

Answers:


9

R, 84 bytes

function(s,w,S=el(strsplit(s," ")),t=tolower)cat(S[!duplicated(x<-t(S))|!x%in%t(w)])

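Try it online!

Under 100 bytes, and on a string challenge at that?

Explanation:

After we break the string into words, we need to exclude those that are

  1. duplicates and
  2. in w

or alternatively, turning that on its head, keep those that are

  1. the first occurrence of a word OR
  2. not in w.

duplicated neatly returns logical indices of those which are not the first occurrence, so !duplicated() returns logical indices of those which are first occurrences, and x%in%w returns logical indices for x of those which are in w. Neat.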
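The two logical vectors described above can be sketched in Python as follows. This is only an illustration of the masking idea, not a port of the R code; the name keep_by_mask is hypothetical, and the duplicated list is a hand-rolled stand-in for R's duplicated().

```python
def keep_by_mask(sentence, words):
    tokens = sentence.split(" ")
    lowered = [t.lower() for t in tokens]
    targets = {w.lower() for w in words}
    # True where the lowercased token already appeared earlier (like duplicated(x)).
    duplicated = [low in lowered[:i] for i, low in enumerate(lowered)]
    # True where the lowercased token is one of the listed words (like x %in% w).
    in_w = [low in targets for low in lowered]
    # Keep a token if it is a first occurrence OR it is not a listed word.
    keep = [(not d) or (not m) for d, m in zip(duplicated, in_w)]
    return " ".join(t for t, k in zip(tokens, keep) if k)
```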


6

Java 8, 117 110 bytes

a->s->{for(String x:a)for(x="(?i)(.*"+x+".* )"+x+"( |$)(.*)";s.matches(x);s=s.replaceAll(x,"$1$3"));return s;}

Explanation:

Try it online.

a->s->{                // Method with String-array and String parameters and String return
  for(String x:a)      //  Loop over the input-array
    for(x="(?i)(.*"+x+".* )"+x+"( |$)(.*)";
                       //   Regex to match
        s.matches(x);  //   Inner loop as long as the input matches this regex
      s=s.replaceAll(x,"$1$3")); 
                       //    Replace the regex-match with the 1st and 3rd capture groups
  return s;}           //  Return the modified input-String

Additional explanation for the regex:

(?i)(.*"+x+".* )"+x+"( |$)(.*)   // Main regex to match:
(?i)                             //  Enable case insensitivity
    (                            //  Open capture group 1
     .*                          //   Zero or more characters
       "+x+"                     //   The current word from the array
            .*                   //   Zero or more characters, followed by a space
               )                 //  End of capture group 1
                "+x+"            //  The same word again
                     (           //  Open capture group 2
                       |$        //   Either a space or the end of the String
                         )       //  End of capture group 2
                          (      //  Open capture group 3
                           .*    //   Zero or more characters
                             )   //  End of capture group 3

$1$3                             // Replace the entire match with:
$1                               //  The match of capture group 1
  $3                             //  concatted with the match of capture group 3
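The loop-until-no-match idea can be sketched in Python as below. This is not a literal port of the Java regex: for this sketch the space is moved outside the first group (so that removing the last word leaves no trailing space), and \b anchors and re.escape are added. Those changes, and the name strip_repeats, are assumptions made for illustration.

```python
import re

def strip_repeats(sentence, words):
    for word in words:
        w = re.escape(word)
        # Group 1 must contain an earlier whole-word occurrence; the later
        # occurrence is matched together with the space in front of it.
        pattern = re.compile(r"(.*\b" + w + r"\b.*) " + w + r"\b(.*)", re.IGNORECASE)
        replaced = 1
        while replaced:
            # Each pass removes one later occurrence; stop when nothing matched.
            sentence, replaced = pattern.subn(r"\1\2", sentence, count=1)
    return sentence
```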

4

MATL, 19 18 bytes

"Ybtk@kmFyfX<(~)Zc

Inputs are: a cell array of strings, then a string.

Try it online! Or verify all test cases.

How it works

"        % Take 1st input (implicit): cell array of strings. For each
  Yb     %   Take 2nd input (implicit) in the first iteration: string; or
         %   use the string from previous iteration. Split on spaces. Gives
         %   a cell array of strings
  tk     %   Duplicate. Make lowercase
  @k     %   Push current string from the array taken as 1st input. Make
         %   lowercase
  m      %   Membership: gives true-false array containing true for strings
         %   in the first input argument that equal the string in the second
         %   input argument
  F      %   Push false
  y      %   Duplicate from below: pushes the true-false array again
  f      %   Find: integer indices of true entries (may be empty)
  X<     %   Minimum (may be empty)
  (      %   Assignment indexing: write false in the true-false array at that
         %   position. So this replaces the first true (if any) by false
  ~      %   Logical negate: false becomes true, true becomes false
  )      %   Reference indexing: in the array of (sub)strings that was
         %   obtained from the second input, keep only those indicated by the
         %   (negated) true-false array
  Zc     %   Join strings in the resulting array, with a space between them
         % End (implicit). Display (implicit)
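The per-word masking steps above translate fairly directly into Python. The sketch below follows the same order of operations (membership mask, clear the first true entry, negate, index), but the function name is hypothetical and nothing here is claimed about MATL internals.

```python
def remove_later_occurrences(words, sentence):
    tokens = sentence.split(" ")
    for word in words:
        target = word.lower()
        # Membership: True for every token equal to the current word, ignoring case.
        mask = [t.lower() == target for t in tokens]
        # Write False at the position of the first True, if there is one.
        if True in mask:
            mask[mask.index(True)] = False
        # Negate and index: keep only the tokens whose mask entry is False.
        tokens = [t for t, m in zip(tokens, mask) if not m]
    return " ".join(tokens)
```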

3

Perl 5, 49 bytes

@B=<>;$_=join$",grep!(/^$_$/xi~~@B&&$v{+lc}++),@F

Try it online!

Saved 9 (!!) bytes thanks to @TonHospel!


1
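This seems to fail for This is a test Will this be correct Both will be removed + this will. The second two words are removed correctly, but for some reason it also removed the be after the second will.
Kevin Cruijssen

1
@KevinCruijssen Hmmm, I can see why that's happening at the moment. I'll try to take a proper look at lunch tomorrow, but I've fixed it for now at a cost of +4. Thanks for letting me know!
Dom Hastings '18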

49:@B=<>;$_=join$",grep!(/^$_$/xi~~@B&&$v{+lc}++),@F
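Ton Hospel

@TonHospel Ahh, I spent a while trying to get lc to be called without parens. Awesome! And using a regex against the array is much better, thank you! I struggle to remember all your tips!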
Dom Hastings

2

Pyth, 27 bytes

jdeMf!}r0eT@mr0dQmr0dPT._cz

Try it online

Explanation

jdeMf!}r0eT@mr0dQmr0dPT._cz
                          z  Take the string input.
                       ._c   Get all the prefixes...
    f    eT@                 ... which end with something...
     !}         Q    PT      ... which is not in the input and the prefix...
       r0   mr0d mr0d        ... case insensitive.
jdeM                         Join the ends of each valid prefix.

I'm sure the 10 bytes spent on the case-insensitive check can be reduced, but I don't see how.


2

Stax, 21 bytes (CP437)

åìøΓ²¬$M¥øHΘQä~╥ôtΔ♫╟

25 bytes when unpacked,

vjcm[]Ii<;e{vm_]IU>*Ciyj@

The result is an array. The convenient output for Stax is one element per line.

Run and debug online!

Explanation

vj                           Convert 1st input to lowercase and split at spaces,
  c                          Duplicate at the main stack
   m                         Map array with the rest of the program 
                                 Implicitly output
    []I                      Get the first index of the current array element in the array
       i<                    Test 1: The first index is smaller than the iteration index
                                 i.e. not the first appearance
         ;                   2nd input
          {vm                Lowercase all elements
             _]I             Index of the current element in the 2nd input (-1 if not found)
                U>           Test 2: The index is non-negative
                                 i.e. current element is a member of the 2nd input
                  *C         If test 1 and test 2, drop the current element
                                 and go on mapping the next
                    iyj@     Fetch the corresponding element in the original input and return it as the mapped result
                                 This preserves the original case

2

Perl 6, 49 bytes

->$_,+w{~.words.grep:{.lc∉w».lc||!(%){.lc}++}}

Test it

Expanded:

->              # pointy block lambda
  $_,           # first param 「$_」 (string)
  +w            # slurpy second param 「w」 (words)
{

  ~             # stringify the following (joins with spaces)

  .words        # split into words (implicit method call on 「$_」)

  .grep:        # take only the words we want

   {
     .lc        # lowercase the word being tested
     ∉          # is it not an element of
     w».lc      # the list of words, lowercased

     ||         # if it was one of the words we need to do a secondary check

     !          # Boolean invert the following
                # (returns true the first time the word was found)

     (
       %        # anonymous state Hash variable
     ){ .lc }++ # look up with the lowercase of the current word, and increment
   }
}

2

Perl 5, 50 48 bytes

Includes +1 for -p

Give the target string followed by each filter word on separate lines on STDIN:

perl -pe '$"="|";s%\b(@{[<>]})\s%$&x!$v{lc$1}++%iegx;chop';echo
This is a test Will this be correct Both will be removed
this
will
^D
^D

The chop is only needed to fix the trailing space in case the last word gets removed

Just the code:

$"="|";s%\b(@{[<>]})\s%$&x!$v{lc$1}++%iegx;chop

Try it online!


1

JavaScript (ES6), 98 bytes

s=>a=>s.split` `.filter(q=x=>(q[x=x.toLowerCase()]=eval(`/\\b${x}\\b/i`).test(a)<<q[x])<2).join` `

1

K4, 41 bytes

Solution:

{" "/:x_/y@>y:,/1_'&:'(_y)~/:\:_x:" "\:x}

Examples:

q)k){" "/:x_/y@>y:,/1_'&:'(_y)~/:\:_x:" "\:x}["A cat called matt sat on a mat and wore a hat A cat called matt sat on a mat and wore a hat";("cat";"mat")]
"A cat called matt sat on a mat and wore a hat A called matt sat on a and wore a hat"

q)k){" "/:x_/y@>y:,/1_'&:'(_y)~/:\:_x:" "\:x}["Case for different case";enlist "case"]
"Case for different"

q)k){" "/:x_/y@>y:,/1_'&:'(_y)~/:\:_x:" "\:x}["the letters in the array are in a different case";enlist "In"]
"the letters in the array are a different case"

q)k){" "/:x_/y@>y:,/1_'&:'(_y)~/:\:_x:" "\:x}["5 times 5 is 25";(1#"5";1#"6")]
"5 times is 25"

Explanation:

Split on whitespace, lowercase both inputs, look for matches, remove all but the first occurrence, then join the string back together.
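The steps just described (collect the matching indices, drop the first of each, delete the rest in descending order) can be sketched in Python like this. It only illustrates the index-based approach; drop_by_index is a hypothetical name, and using a set to merge indices is a choice made for this sketch.

```python
def drop_by_index(sentence, words):
    tokens = sentence.split(" ")
    lowered = [t.lower() for t in tokens]
    to_drop = set()
    for word in words:
        target = word.lower()
        # Indices of every token that matches the current word.
        hits = [i for i, low in enumerate(lowered) if low == target]
        # Drop the first index so the first occurrence survives.
        to_drop.update(hits[1:])
    # Delete from the highest index down so earlier indices stay valid.
    for i in sorted(to_drop, reverse=True):
        del tokens[i]
    return " ".join(tokens)
```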

{" "/:x_/y@>y:,/1_'&:'(_y)~/:\:_x:" "\:x} / the solution
{                                       } / lambda with implicit x & y args
                                  " "\:x  / split (\:) on whitespace " "
                                x:        / save result as x
                               _          / lowercase x
                          ~/:\:           / match (~) each right (/:), each left (\:)
                      (_y)                / lowercase y
                   &:'                    / where (&:) each ('), ie indices of matches
                1_'                       / drop first of each result
              ,/                          / flatten
            y:                            / save result as y
         y@>                              / descending indices (>) apply (@) to y
      x_/                                 / drop (_) from x
 " "/:                                    / join (/:) on whitespace " "

1

JavaScript (Node.js), 75 bytes

f=(s,a)=>a.map(x=>s=s.replace(eval(`/\\b${x}\\b */ig`),s=>i++?"":s,i=0))&&s

Try it online!


1
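As this isn't a recursive function, you don't need to include the f= in your byte count. You could also save a byte by currying the parameters, replacing (s,a)=> with s=>a=> and then calling the function with f(s)(a).
Shaggy

@Shaggy Yes, but I really mind about golfing the function's definition, since the main thing is golfing the body. But that's a good tip :)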
DanielIndie 18'2

1

JavaScript ES6, 78 bytes

f=(s,a,t={})=>s.split` `.filter(w=>a.find(e=>w==e)?(t[w]?0:t[w]=1):1).join` `

How it works:

f=(s,a,t={})=> // Function declaration; t is an empty object by default
s.split` ` // Split the string into an array of words
.filter(w=> // Declare a function that, if it returns false, will delete the word
  a.find(e=>w==e) // Returns undefined (falsy) if the word isn't in the list
  ?(t[w]?0 // If it is in the list and t[w] exists, return 0 (false)
    :t[w]=1) // Else make t[w] exist and return 1 (true)
  :1) // If the word isn't in the array, return true (keep the word for sure)
.join` ` // Rejoin the string

2
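Welcome to PPCG! Since you aren't using the function name f for a recursive call, an unnamed function would also be a valid submission, so you can save two bytes by dropping the f=.
Martin Ender

Welcome to PPCG! Sadly, this fails when different cases are involved.
Shaggy

If it weren't for that, you could get this down to 67 bytes
Shaggy

@MartinEnder Thanks for the tip!
Ian

@Shaggy Using the input array as an object is an interesting idea I hadn't thought of. I'll try to fix the case problem.
Ian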

0

PowerShell v3 or later, 104 bytes

Param($s,$w)$w|?{$_-and$s-match($r="\b$_(?: |$)")}|%{$h,$t=$s-split$r;$s="$h$($Matches.0)$(-join$t)"};$s

At the cost of one byte, it can be made to run in PS 2.0 by replacing $Matches.0 with $Matches[0].

Long version:

Param($s, $w)
$w | Where-Object {$_ -and $s -match ($r = "\b$_(?: |$)")} |    # Process each word in the word list, but only if it matches the RegEx (which will be saved in $r).
    ForEach-Object {                                            # \b - word boundary, followed by the word $_, and either a space or the end of the string ($)
        $h, $t = $s -split $r                                   # Split the string on all occurrences of the word; the first substring will end up in $h(ead), the rest in $t(ail) (might be an array)
        $s = "$h$($Matches.0)$(-join $t)"                       # Create a string from the head, the first match (can't use the word, because of the case), and the joined tail array
    }
$s                                                              # Return the result

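Usage
Save as Whatever.ps1 and call it with the string and the words as arguments. If more than one word needs to be passed, the words need to be wrapped in @():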

.\Whatever.ps1 -s "A cat called matt sat on a mat and wore a hat A cat called matt sat on a mat and wore a hat" -w @("cat", "mat")

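Alternative without a file (can be pasted directly into a PS console):
Save the script as a ScriptBlock (inside curly braces) in a variable, then call its Invoke() method, or use it with Invoke-Command: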

$f={Param($s,$w)$w|?{$_-and$s-match($r="\b$_(?: |$)")}|%{$h,$t=$s-split$r;$s="$h$($Matches.0)$(-join$t)"};$s}
$f.Invoke("A cat called matt sat on a mat and wore a hat A cat called matt sat on a mat and wore a hat", @("cat", "mat"))
Invoke-Command -ScriptBlock $f -ArgumentList "A cat called matt sat on a mat and wore a hat A cat called matt sat on a mat and wore a hat", @("cat", "mat")

0

Javascript, 150 bytes

s=(x, y)=>{let z=new Array(y.length).fill(0);let w=[];for(f of x)(y.includes(f))?(!z[y.indexOf(f)])&&(z[y.indexOf(f)]=1,w.push(f)):w.push(f);return w}

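Aside from the golfing issues (see the other JS solutions for some tips), this takes the first input as an array of words and outputs an array of words, which the challenge spec does not allow. It also fails when different cases are involved.
Shaggy

@Shaggy "The output can be either the return value of a function". This looks like it returns a value from the function?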
aimorris

0

Clean, 153 142 138 134 bytes

import StdEnv,StdLib,Text
@ =toUpperCase
$s w#s=split" "s
=join" "[u\\u<-s&j<-[0..]|and[i<>j\\e<-w,i<-drop 1(elemIndices(@e)(map@s))]]

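Try it online!

Defines the function $ :: String [String] -> String, which does pretty much literally what the challenge describes. For each target word, it finds and removes every occurrence after the first.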


0

Retina, 46 37 bytes

+i`(^|,)((.+),.*\3.* )\3( |$)
$2
.*,

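-14 bytes thanks to @Neil, and +5 bytes for a bug fix.

Input is in the format word1,word2,word3,sentence, because I'm not sure how to take multi-line input (where the inputs are used differently).

Explanation:

Try it online.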

+i`(^|,)((.+),.*\3.* )\3( |$)   Main regex to match:
+i`                              Repeat until nothing changes (+); enable case insensitivity (i)
   (^|,)                          Either the start of the string, or a comma
        (                         Open capture group 2
         (                         Open capture group 3
          .+                        1 or more characters
            )                      Close capture group 3
             ,                     A comma
              .*                   0 or more characters
                \3                 The match of capture group 3
                  .*               0 or more characters, followed by a space
                     )            Close capture group 2
                      \3          The match of capture group 3 again
                        ( |$)     Followed by either a space, or it's the end of the string
$2                              And replace everything with:
                                 The match of capture group 2

.*,                             Then get everything before the last comma (the list)
                                 and remove it (including the comma itself)

1
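As written, you can simplify the first line to +i`((.+),.*\2.* )\2( |$) and the second to $1, but I notice that your code fails on often,he intended to keep ten geese anyway.
Neil

@Neil Thanks for the -14 golf, and fixed the bug for +1.
Kevin Cruijssen

...except that this now fails on one of the original test cases...
Neil

@Neil Ah oops.. Fixed again for +4 bytes.
Kevin Cruijssen

The good news is that I think you can use \b instead of (^|,), but the bad news is that I think you need \b\3\b (although I haven't devised a suitable test case).
Neil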

0

Red, 98 bytes

func[s w][foreach v w[parse s[thru[any" "v ahead" "]any[to remove[" "v ahead[" "| end]]| skip]]]s]

Try it online!

f: func [s w][ 
    foreach v w [                   ; for each string in the array
        parse s [                   ; parse the input string as follows:
            thru [                  ; keep everything thru: 
                any " "             ; 0 or more spaces followed by
                v                   ; the current string from the array followed by
                ahead " "           ; look ahead for a space
            ]
            any [ to remove [       ; 0 or more: keep to here; then remove: 
                " "                 ; a space followed by 
                v                   ; the current string from the array
                ahead [" " | end]]  ; look ahead for a space or the end of the string
            | skip                  ; or advance the input by one 
            ]
        ]
    ]
    s                               ; return the processed string 
]

0

Husk, 13 bytes

wüöVËm_Ṗ3+⁰ew

Takes a list of strings and a single string as arguments, in this order. Assumes that the list is duplicate-free. Try it online!

Explanation

wüöVËm_Ṗ3+⁰ew  Inputs: list of strings L (explicit, accessed with ⁰), string S (implicit).
               For example, L = ["CASE","for"], S = "Case for a different case".
            w  Split S on spaces: ["Case","for","a","different","case"]
 ü             Remove duplicates wrt an equality predicate.
               This means that a function is called on each pair of strings,
               and if it returns a truthy value, the second one is removed.
  öVËm_Ṗ3+⁰e    The predicate. Arguments are two strings, say A = "Case", B = "case".
           e    Put A and B into a list: ["Case","case"]
         +⁰     Concatenate with L: ["CASE","for","Case","case"]
       Ṗ3       All 3-element subsets: [["CASE","for","Case"],["CASE","for","case"],
                                        ["CASE","Case","case"],["for","Case","case"]]
  öV            Does any of them satisfy this:
    Ë            All strings are equal
     m_          after converting each character to lowercase.
                In this case, ["CASE","Case","case"] satisfies the condition.
               Result: ["Case","for","a","different"]
w              Join with spaces, print implicitly.
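The "remove duplicates with respect to a predicate" step can be sketched in Python with a small helper modelled on Haskell's nubBy. Whether Husk's ü agrees with this helper in every corner case is an assumption; the names nub_by and dedupe_listed are hypothetical, and the predicate is written directly rather than through 3-element subsets.

```python
def nub_by(items, same):
    """Keep an item only if no previously kept item is 'the same' under the predicate."""
    kept = []
    for item in items:
        if not any(same(prev, item) for prev in kept):
            kept.append(item)
    return kept

def dedupe_listed(words, sentence):
    listed = {w.lower() for w in words}

    def same(a, b):
        # Two tokens count as duplicates only if they are equal ignoring case
        # and that shared word is in the list.
        return a.lower() == b.lower() and a.lower() in listed

    return " ".join(nub_by(sentence.split(" "), same))
```

With the example inputs from the explanation, dedupe_listed(["CASE", "for"], "Case for a different case") returns "Case for a different".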

0

min, 125 bytes

=a () =b a 1 get =c a 0 get " " split
(:d (b d in?) ((c d in?) (d b append #b) unless) (d b append #b) if) foreach
b " " join

Input is a quot on the stack with the input String as the first element and a quot of the duplicate Strings as the second element, i.e.

("this sentence has no matches" ("ten" "cheese"))

0

Python 3, 168 bytes

def f(s,W):
 s=s.split(" ");c={w:0for w in W}
 for w in W: 
  for i,v in enumerate(s):
   if v.lower()==w.lower():
    c[w]+=1
    if c[w]>1:s.pop(i)
 return" ".join(s)

Try it online!


0

AWK, 120 bytes

NR%2{for(;r++<NF;)R[tolower($r)]=1}NR%2==0{for(;i++<NF;$i=$(i+s))while(R[x=tolower($(i+s))])U[x]++?++s:i++;NF-=s}NR%2==0

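Try it online!

The "remove whitespace" part made this a bit more challenging than I first thought. Setting a field to "" removes the field, but leaves an extra separator.

The TIO link has 28 extra bytes to allow multiple entries.

Input is given over 2 lines. The first line is the list of words and the second is the "sentence". Note that "word" and "word," are not considered identical because of the attached punctuation. Having punctuation requirements would likely make this an even more fun problem.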


0

Ruby, 63 61 60 59 bytes

->s,w{w.any?{|i|s.sub! /\b(#{i}\b.*) #{i}\b/i,'\1'}?redo:s}

Try it online!

A shorter version that is case-sensitive and fails about once every 10^15 times because of randomness (37 bytes)

->s,w{s.uniq{|i|w.member?(i)?i:rand}}

0

Python 2, 140 bytes

from re import*
p='\s?%s'
S,A=input()
for a in A:S=sub(p%a,lambda s:s.end()==search(p%a,S,flags=I).end()and s.group()or'',S,flags=I)
print S

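Try it online!

Explanation:

re.sub(..) can take a function as an argument instead of a replacement string, so here we have some fancy lambda. The function is called for each occurrence of the pattern, and one object is passed to it: the match object. This object holds information about the occurrence that was found. I'm interested in the index of that occurrence, which can be retrieved with the start() or end() function. The latter is shorter, so it is the one used.

To exclude the first occurrence of the word from replacement, I used another regex search function to get exactly the first match, and then compared the indices using the same end().

The flag re.I is the short version of re.IGNORECASE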
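The callback technique described above can be sketched in Python 3 as follows. It is not the golfed submission: the \b anchors and re.escape are additions made for this sketch so that only whole words match, and remove_after_first is a hypothetical name.

```python
import re

def remove_after_first(sentence, words):
    for word in words:
        # Optional leading whitespace plus the whole word, ignoring case.
        pattern = re.compile(r"\s?\b" + re.escape(word) + r"\b", re.I)
        first = pattern.search(sentence)
        if first is None:
            continue
        first_end = first.end()
        # re.sub calls the function once per match; the first match is returned
        # unchanged, every later one is replaced by the empty string.
        sentence = pattern.sub(
            lambda m: m.group() if m.end() == first_end else "", sentence
        )
    return sentence
```

Because each later match includes the space in front of it, deleting the match also removes the whitespace that would otherwise be left behind.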

Licensed under cc by-sa 3.0 with attribution required.