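Regex: Match an egalitarian series

Introduction

I don't see many regex challenges on here, so I would like to offer up this deceptively simple one, which can be done in a number of ways using a number of regex flavours. I hope it provides regex enthusiasts with a bit of fun golfing time.

Challenge

The challenge is to match what I've very loosely dubbed an "egalitarian" series: a series of equal numbers of different characters. This is best described with examples.

Match: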

aaabbbccc
xyz
iillppddff
ggggggoooooollllllffffff
abc
banana

Do not match:

aabc
xxxyyzzz
iilllpppddff
ggggggoooooollllllfff
aaaaaabbbccc
aaabbbc
abbaa
aabbbc

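To generalize, we want to match a subject of the form (c₁)ⁿ(c₂)ⁿ(c₃)ⁿ...(c_k)ⁿ for any list of characters c₁ to c_k, where c_i != c_(i+1) for all i, k > 1, and n > 0.

Clarifications:

Input will not be empty.
A character may repeat itself later in the string (e.g. "banana").
k > 1, so there will always be at least 2 different characters in the string.
You can assume only ASCII characters will be passed as input and no character will be a line terminator.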
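
As a cross-check for the examples above, the definition can be sketched procedurally in Python. This is only a reference checker, not a valid submission (answers must be a single regex), and the name is_egalitarian is made up for illustration.

```python
from itertools import groupby


def is_egalitarian(s: str) -> bool:
    """Return True if s has the form (c1)^n (c2)^n ... (ck)^n with k > 1, n > 0."""
    # Collapse the string into the lengths of its runs of identical characters.
    run_lengths = [len(list(group)) for _, group in groupby(s)]
    # k > 1 means at least two runs; every run must share the same length n.
    return len(run_lengths) > 1 and len(set(run_lengths)) == 1


matching = ["aaabbbccc", "xyz", "iillppddff",
            "ggggggoooooollllllffffff", "abc", "banana"]
non_matching = ["aabc", "xxxyyzzz", "iilllpppddff", "ggggggoooooollllllfff",
                "aaaaaabbbccc", "aaabbbc", "abbaa", "aabbbc"]

print(all(is_egalitarian(s) for s in matching))      # True
print(any(is_egalitarian(s) for s in non_matching))  # False
```

Because groupby only merges adjacent equal characters, "banana" yields six runs of length 1, which is why it counts as egalitarian.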

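Rules

(Thank you to Martin Ender for this excellently stated block of rules)

Your answer should consist of a single regex, without any additional code (except, optionally, a list of regex modifiers required to make your solution work). You must not use features of your language's regex flavour that allow you to invoke code in the hosting language (e.g. Perl's e modifier).

You can use any regex flavour which existed before this challenge, but please specify the flavour.

Do not assume that the regex is anchored implicitly, e.g. if you're using Python, assume that your regex is used with re.search and not with re.match. Your regex must match the entire string for valid egalitarian strings and yield no matches for invalid strings. You may use as many capturing groups as you wish.

You may assume that the input will always be a string of two or more ASCII characters not containing any line terminators.

This is regex golf, so the shortest regex in bytes wins. If your language requires delimiters (usually /.../) to denote regular expressions, don't count the delimiters themselves. If your solution requires modifiers, add one byte per modifier.

Criteria

This is good ol' fashioned golf, so forget efficiency and just try to get your regex as small as possible.

Please mention which regex flavour you have used and, if possible, include a link showing an online demo of your expression in action.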
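
Two of the rules above, explicit anchoring and byte counting, can be illustrated with a small Python sketch. The pattern ab+ is a throwaway example and not a solution, golf_score is a hypothetical helper, and counting bytes as UTF-8 is an assumption (for ASCII-only regexes it equals the character count).

```python
import re

# Illustrative patterns only; neither is a solution to this challenge.
unanchored = r"ab+"
anchored = r"^ab+$"

# re.search may start anywhere, so an unanchored pattern can match a substring.
print(re.search(unanchored, "xxabbcc").group())  # abb
# re.match anchors at the start implicitly, which the rules say not to rely on.
print(re.match(unanchored, "xxabbcc"))           # None
# With explicit anchors, re.search succeeds only when the whole string fits.
print(re.search(anchored, "xxabbcc"))            # None
print(re.search(anchored, "abb").group())        # abb


def golf_score(pattern: str, modifiers: str = "") -> int:
    """Pattern length in bytes (delimiters excluded) plus one byte per modifier."""
    return len(pattern.encode("utf-8")) + len(modifiers)


print(golf_score(anchored))        # 5
print(golf_score(anchored, "im"))  # 7
```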

code-golf string regular-expression

— jaytea

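Is this regex golf? You should probably clarify that, along with the rules for it. Most challenges on this site are open to assorted programming languages.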

— LyricLy

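@LyricLy Thanks for the advice! Yes, I would like it to be purely regex, i.e. a single regular expression in a regex flavour of the submitter's choice. Are there any other rules I should be mindful of including?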

— jaytea

I don't understand your definition of "egalitarian", such that banana is egalitarian.

— msh210 '17

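@msh210 When I came up with the term "egalitarian" to describe the series, I didn't consider that I would be allowing characters to be repeated later in the series (such as in "banana" or "aaabbbcccaaa", etc.). I just wanted a term to represent the idea that every chunk of repeated characters is the same size. Since "banana" has no repeated characters, the definition holds true for it.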

— jaytea

Answers:

.NET flavour, 48 bytes

^(.)\1*((?<=(\5())*(.))(.)(?<-4>\6)*(?!\4|\6))+$

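Try it online! (using Retina)

Well, it turns out that not negating the logic is simpler after all. I'm making this a separate answer, because the two approaches are completely different.

Explanation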

^            # Anchor the match to the beginning of the string.
(.)\1*       # Match the first run of identical characters. In principle, 
             # it's possible that this matches only half, a quarter, an 
             # eighth etc. of the first run, but that won't affect the
             # result of the match (in other words, if the match fails with 
             # matching this as the entire first run, then backtracking into
             # only matching half of it won't cause the rest of the regex to
             # match either).
(            # Match this part one or more times. Each instance matches one
             # run of identical letters.
  (?<=       #   We start with a lookbehind to record the length
             #   of the preceding run. Remember that the lookbehind
             #   should be read from the bottom up (and so should
             #   my comments).
    (\5())*  #     And then we match all of its adjacent copies, pushing an
             #     empty capture onto stack 4 each time. That means at the
             #     end of the lookbehind, we will have n-1 captures on
             #     stack 4, where n is the length of the preceding run. Due
             #     to the atomic nature of lookbehinds, we don't have to worry
             #     about backtracking matching less than n-1 copies here.
    (.)      #     We capture the character that makes up the preceding
             #     run in group 5.
  )
  (.)        #   Capture the character that makes up the next run in group 6.
  (?<-4>\6)* #   Match copies of that character while depleting stack 4.
             #   If the runs are the same length that means we need to be
             #   able to get to the end of the run at the same time we
             #   empty stack 4 completely.
  (?!\4|\6)  #   This lookahead ensures that. If stack 4 is not empty yet,
             #   \4 will match, because the captures are all empty, so the
             #   backreference can't fail. If the stack is empty though,
             #   then the backreference will always fail. Similarly, if we
             #   are not at the end of the run yet, then \6 will match
             #   another copy of the run. So we ensure that neither \4 nor
             #   \6 are possible at this position to assert that this run
             #   has the same length as the previous one.
)+
$            # Finally, we make sure that we can cover the entire string
             # by going through runs of identical lengths like this.
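
For readers without a .NET engine at hand, the run-by-run logic described above can be approximated in plain Python. This is a rough sketch rather than the regex itself: Python's re module has no balancing groups, so an integer counter stands in for capture stack 4, regex backtracking is not simulated (the first run is always consumed in full), and the function name is hypothetical.

```python
def matches_run_by_run(s: str) -> bool:
    """Procedural analogue of the stack-based comparison of adjacent runs."""
    if not s:
        return False
    # (.)\1* : consume the first run in full (the greedy first attempt).
    pos = 1
    while pos < len(s) and s[pos] == s[0]:
        pos += 1
    runs_matched = 0
    while pos < len(s):
        # Lookbehind: count the n-1 extra copies of the preceding character.
        prev = s[pos - 1]
        stack = 0
        back = pos - 2
        while back >= 0 and s[back] == prev:
            stack += 1
            back -= 1
        # (.) : the first character of the next run.
        cur = s[pos]
        pos += 1
        # (?<-4>\6)* : consume copies of it while popping the stack.
        while stack > 0 and pos < len(s) and s[pos] == cur:
            stack -= 1
            pos += 1
        # (?!\4|\6) : fail if the stack is non-empty or the run continues.
        if stack > 0 or (pos < len(s) and s[pos] == cur):
            return False
        runs_matched += 1
    # ( ... )+ needs at least one iteration; the loop ends at the string's end.
    return runs_matched >= 1


print(matches_run_by_run("aaabbbccc"))  # True
print(matches_run_by_run("aabbbc"))     # False
```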

— Martin Ender

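I love that you've vacillated between these two approaches! I also thought the negative approach ought to be shorter, until I actually tried it and found it much more awkward (even though it feels like it should be simpler). I have 48b in PCRE and 49b in Perl, using completely different methods, and with a third approach in .NET at the same size, I'd say this is a pretty cool regex challenge :D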

— jaytea

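@jaytea I'd love to see those. If no one comes up with anything in a week or so, I hope you'll post them yourself. :) And yes, agreed, it's nice that the approaches are so close in byte count.

— Martin Ender

I just might! Also, the Perl one has been golfed down to 46b ;)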

— jaytea

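So I figured you might want to see these now! Here's the 48b in PCRE: ((^.|\2(?=.*\4\3)|\4(?!\3))(?=\2*+((.)\3?)))+\3$ I was experimenting with \3* in place of (?!\3) to make it 45b, but that fails on "aabbbc" :( The Perl version is easier to understand, and it's down to 45b now: ^((?=(.)\2*(.))(?=(\2(?4)?\3)(?!\3))\2+)+\3+$ - the reason I call it Perl even though it appears to be valid PCRE is that PCRE thinks (\2(?4)?\3) could recurse indefinitely, whereas Perl is more…

— jaytea

@jaytea Ah, those are really neat solutions. You should post them in a separate answer. :)

— Martin Ender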

.NET flavour, 54 bytes

^(?!.*(?<=(\2)*(.))(?!\2)(?>(.)(?<-1>\3)*)(?(1)|\3)).+

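Try it online! (using Retina)

I'm pretty sure this is suboptimal, but it's the best I can come up with for balancing groups right now. I've got one alternative at the same byte count, which is mostly the same: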

^(?!.*(?<=(\3())*(.))(?!\3)(?>(.)(?<-2>\4)*)(\2|\4)).+

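Explanation

The main idea is to invert the problem: match non-egalitarian strings and put the whole thing in a negative lookahead to negate the result. The benefit is that we don't have to keep track of n throughout the entire string to check that all runs are of equal length (due to the nature of balancing groups, you usually consume n when checking it). Instead, we just look for a single pair of adjacent runs that don't have the same length. That way, I only need to use n once.

Here is a breakdown of the regex.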

^(?!.*         # This negative lookahead means that we will match
               # all strings where the pattern inside the lookahead
               # would fail if it were used as a regex on its own.
               # Due to the .* that inner regex can match from any
               # position inside the string. The particular position
               # we're looking for is between two runs (and this
               # will be ensured later).

  (?<=         #   We start with a lookbehind to record the length
               #   of the preceding run. Remember that the lookbehind
               #   should be read from the bottom up (and so should
               #   my comments).
    (\2)*      #     And then we match all of its adjacent copies, capturing
               #     them separately in group 1. That means at the
               #     end of the lookbehind, we will have n-1 captures
               #     on stack 1, where n is the length of the preceding
               #     run. Due to the atomic nature of lookbehinds, we
               #     don't have to worry about backtracking matching
               #     less than n-1 copies here.
    (.)        #     We capture the character that makes up the preceding
               #     run in group 2.
  )
  (?!\2)       #   Make sure the next character isn't the same as the one
               #   we used for the preceding run. This ensures we're at a
               #   boundary between runs.
  (?>          #   Match the next stuff with an atomic group to avoid
               #   backtracking.
    (.)        #     Capture the character that makes up the next run
               #     in group 3.
    (?<-1>\3)* #     Match as many of these characters as possible while
               #     depleting the captures on stack 1.
  )
               #   Due to the atomic group, there are two possible
               #   situations that cause the previous quantifier to stop
               #   matching.
               #   Either the run has ended, or stack 1 has been depleted.
               #   If both of those are true, the runs are the same length,
               #   and we don't actually want a match here. But if the runs
               #   are of different lengths then either the run ended but
               #   the stack isn't empty yet, or the stack was depleted but
               #   the run hasn't ended yet.
  (?(1)|\3)    #   This conditional matches these last two cases. If there's
               #   still a capture on stack 1, we don't match anything,
               #   because we know this run was shorter than the previous
               #   one. But if stack 1 is empty, we want to match another
               #   copy of the character in this run to ensure that this run
               #   is longer than the previous one.
)
.+             # Finally we just match the entire string to comply with the
               # challenge spec.
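
The inverted idea, rejecting a string as soon as one pair of adjacent runs differs in length, can also be sketched in plain Python. This is an approximation for illustration, not the regex: the helper names are hypothetical and itertools.groupby replaces the lookbehind and balancing-group machinery.

```python
from itertools import groupby


def has_unequal_adjacent_runs(s: str) -> bool:
    """True if some pair of neighbouring runs differs in length."""
    lengths = [len(list(group)) for _, group in groupby(s)]
    # Compare each run with the run that follows it.
    return any(a != b for a, b in zip(lengths, lengths[1:]))


def matches_inverted(s: str) -> bool:
    # ^(?! ... ).+ : accept any non-empty string without an offending boundary.
    return len(s) > 0 and not has_unequal_adjacent_runs(s)


print(matches_inverted("iillppddff"))  # True
print(matches_inverted("xxxyyzzz"))    # False
```

Note that this sketch accepts a single-run string such as "aaa", since it contains no boundary between runs; it relies on the challenge's guarantee that the input contains at least two different characters.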

— Martin Ender

Things I tried in order to make it fail: banana, aba, bbbaaannnaaannnaaa, bbbaaannnaaannnaaaaaa, The Nineteenth Byte, 11, 110, ^(?!.*(?<=(\2)*(.))(?!\2)(?>(.)(?<-1>\3)*)(?(1)|\3)).+, bababa. I'm the one who failed. :( +1

— Erik the Outgolfer

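That moment when you finish your explanation and then figure out that you can save 1 byte by using the exact opposite approach... I guess I'll make another answer in a bit... :|

— Martin Ender

@MartinEnder ...and then realise you can golf this one by 2 bytes, haha :P

— Mr. Xcoder, 2017

@Mr.Xcoder It would have to be 7 bytes now, so I hope I'm safe. ;)

— Martin Ender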