Regex (.NET flavour), 182 181 145 132 126 114 104 100 98 97 96 bytes
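2D ASCII art pattern recognition? Sounds like a job for regex! (It doesn't.)
I know this will once again spark the endless discussion about whether regex submissions are valid programs, but I doubt this will beat APL or CJam anyway, so I don't see any harm. (That said, they do pass our die-hard test for "What is a programming language?".)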
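This takes the input as the string to be matched against, and the result is the number of matches found. It uses an underscore (_) in place of the dot (.), because I would have to escape the latter. It also requires a trailing newline.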
(X(X){1,21})(?=\D+((?>(?<-2>_)+)_))(?=.((?!\7)(.)*
.*(X\3X|()\1.)(?=(?<-5>.)*(?(5)!)
)){4,23}\7)
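You can test it live at RegexHero or RegexStorm. The matches will be the top obsidian rows of the portals. If you can find a test case where it fails, please let me know!
What is this sorcery?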
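The following explanation assumes a basic understanding of .NET's balancing groups. The gist is that captures are stacks in .NET regex: every new capture for the same name is pushed onto the stack, but there is also syntax to pop captures from these stacks again, as well as syntax to pop a capture from one stack and push a capture onto another at the same time. For a more complete picture, you can have a look at my answer on Stack Overflow, which should cover all the details.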
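As a minimal sketch of the stack behaviour described above, the following C# snippet uses a balancing group to accept strings of the form a^n b^n. The pattern, the group name n, and the class and method names are illustrative assumptions and not part of the submission.

```csharp
using System.Text.RegularExpressions;

static class BalancingDemo
{
    // (?<n>a)+     pushes one capture onto the "n" stack per 'a'.
    // (?<-n>b)+    pops one capture per 'b' and fails once the stack is empty.
    // (?(n)(?!))   fails if anything is still left on the stack.
    static readonly Regex Balanced =
        new Regex(@"^(?<n>a)+(?<-n>b)+(?(n)(?!))$");

    public static bool IsBalanced(string s) => Balanced.IsMatch(s);
}
```

Calling BalancingDemo.IsBalanced("aabb") succeeds, while "aab" and "abb" are rejected because the pushes and pops do not cancel out.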
The basic idea is to match the following pattern:
 X{n}..{m}
X_{n}X.{m} |
X_{n}X.{m} | 3 to 22 times
X_{n}X.{m} |
 X{n}..{m}
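where n is between 2 and 22 (inclusive). The tricky part is getting all the n's and all the m's to be the same. Since the actual characters differ, we can't simply use a backreference.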
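To illustrate why a plain backreference is not enough and how a capture stack can carry a count over to different characters, here is a small C# sketch. It accepts N X's, a colon, and then exactly N underscores, using the same "pop N-1 times inside an atomic group, then match one more" idea that the full regex uses. The colon separator, the group numbering, and all names are assumptions made for this example only.

```csharp
using System.Text.RegularExpressions;

static class CountTransferDemo
{
    // The first X is matched plainly; every further X is pushed onto stack 1,
    // so N X's leave N-1 captures. The atomic group pops one capture per
    // underscore (at most N-1), and the trailing "_" brings the total to N.
    static readonly Regex Strict =
        new Regex(@"^X(X)+:((?>(?<-1>_)+)_)$");

    // Same pattern without the atomic group: the engine may backtrack into
    // the popping loop, so fewer than N underscores can slip through.
    static readonly Regex Loose =
        new Regex(@"^X(X)+:((?<-1>_)+_)$");

    public static bool SameLength(string s) => Strict.IsMatch(s);

    public static bool SameLengthLoose(string s) => Loose.IsMatch(s);
}
```

With these definitions, SameLength("XXX:___") succeeds and SameLength("XXX:__") fails, whereas SameLengthLoose("XXX:__") succeeds, which is why the atomic group matters.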
Note that the regex has to embed literal newlines, which are written as \n in the following.
( # Open capturing group 1. This will contain the top of a portal, which
# I can reuse later to match the bottom (being of the same length).
X # Match a single X.
(X){1,21} # Match 1 to 21 X's, and push each separately on the <2> stack. Let's
# call the number of X's captured N-1 (so N is the inner width of the
# portal).
) # End of group 1. This now contains N X's.
(?= # Start a lookahead. The purpose of this lookahead is to capture a
# string of N underscores in group 3, so I can easily use this to match
# the inside rows of the portal later on. I can be sure that such a
# string can always be found for a valid portal (since it cannot have 0
# inner height).
\D+ # Skip past a bunch of non-digits - i.e. *any* of the valid characters
# of the input (_, X, \n). This is to make sure I search for my N
# underscores anywhere in the remainder of the input.
( # Open capturing group 3. This will contain a portal row.
(?> # This is an atomic group. Once the engine has successfully matched the
# contents of this group, it will not go back into the group and try to
# backtrack other possible matches for the subpattern.
(?<-2>_)+ # Match underscores while popping from the <2> stack. This will match as
# many underscores as possible (but not more than N-1).
) # End of the atomic group. There are two possible reasons for the
# subpattern stopping to match: either the <2> stack is empty, and we've
# matched N-1 underscores; or we've run out of underscores, in which
# case we don't know how many underscores we matched (which is not
# good).
_ # We simply try to match one more underscore. This ensures that we
# stopped because the <2> stack was empty and that group 3 will contain
# exactly N underscores.
) # End of group 3.
) # End of the lookahead. We've got what we want in group 3 now, but the
# regex engine's "cursor" is still at the end of the portal's top.
(?= # Start another lookahead. This ensures that there's actually a valid
# portal beneath the top. In theory, this doesn't need to be a
# lookahead - I could just match the entire portal (including the lines
# it covers). But matches cannot overlap, so if there were multiple
# portals next to each other, this wouldn't return all of them. By
# putting the remainder of the check in a lookahead the actual matches
# won't overlap (because the top cannot be shared by two portals).
. # Match either _ or X. This is the character above the portal side.
( # This group (4) is where the real magic happens. Its purpose is to
# count the length of the rest of the current line. Then find a portal
# row in the next line, and ensure that it's the same distance from the
# end of the line. Rinse and repeat. The tricky thing is that this is a
# single loop which matches both inner portal rows, as well as the
# bottom, while making sure that the bottom pattern comes last.
(?!\7) # We didn't have a group 7 yet... group 7 is further down the pattern.
# It will capture an empty string once the bottom row has been matched.
# While the bottom row has not been matched, and nothing has been
# captured, the backreference will fail, so the negative lookahead will
# pass. But once we have found the bottom row, the backreference will
# always match (since it's just an empty string) and so the lookahead
# will fail. This means, we cannot repeat group 4 any more after the
# bottom has been matched.
(.)* # Match all characters until the end of the line, and push each onto
# stack <5>.
\n # Match a newline to go to the next line.
.* # Match as many characters as necessary to search for the next portal
# row. The conditions afterwards will ensure that this backtracks to
# the right position (if one exists).
( # This group (6) will match either an inner portal row, or the bottom
# of the portal.
X\3X # Match X, then N underscores, then X - a valid inner portal row.
| # OR
() # Capture an empty string into group 7 to prevent matching further rows.
\1. # Use the captured top to match the bottom and another character.
)
(?= # This lookahead makes sure that the row was found at the same
# horizontal position as the top, by checking that the remaining line
# is the same length.
(?<-5>.)* # Match characters while popping from the <5> stack.
(?(5)!)\n # Make sure we've hit end of the line, *and* the <5> stack is empty.
)
){4,23} # Repeat this 4 to 23 times, to ensure an admissible portal height.
# Note that this is one more than the allowed inner height, to account
# for the bottom row.
\7 # Now in the above repetition there is nothing requiring that we have
# actually matched any bottom row - it just ensured we didn't continue
# if we had found one. This backreference takes care of that. If no
# bottom row was found, nothing was captured into group 7 and this
# backreference fails. Otherwise, this backreference contains an empty
# string which always matches.
)
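Two of the tricks from the annotated regex can be tried in isolation. The C# sketch below is only an illustration under simplified assumptions: SameColumn checks that a | in one line has a | at the same distance from the end of the next line (the stack <5> idea, here on stack 1), and EndsWithB uses an empty capture as a flag so that a loop stops after one particular alternative and cannot finish without it (the \7 idea, here \2). The patterns, the | marker, and all names are hypothetical.

```csharp
using System.Text.RegularExpressions;

static class LoopTricksDemo
{
    // Push the rest of the current line onto stack 1, move to the next line,
    // find a '|' there, then pop one capture per remaining character. The
    // conditional fails if captures are left over, and the final \n fails if
    // characters are left over, so both remainders have the same length.
    static readonly Regex SameColumn =
        new Regex(@"\|(.)*\n.*\|(?=(?<-1>.)*(?(1)(?!))\n)");

    // Group 2 captures an empty string only when the 'b' alternative is used.
    // (?!\2) blocks further iterations once that has happened, and the final
    // \2 fails if it never happened. The net effect is a*b.
    static readonly Regex EndsWithB =
        new Regex(@"^(?:(?!\2)(a|()b))+\2$");

    public static bool MarkersAligned(string twoLines) => SameColumn.IsMatch(twoLines);

    public static bool IsAsThenB(string s) => EndsWithB.IsMatch(s);
}
```

For example, MarkersAligned("ab|cd\nxy|zw\n") succeeds, while shifting the second marker by one column in either direction makes it fail. The comparison is made from the right-hand end of each line, so it corresponds to column alignment only when all lines have the same length.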
C#, 185 bytes
Here is a full C# function, just to make this a valid entry. It's about time I wrote a command-line "interpreter" for .NET regular expressions...
static int f(string p){return System.Text.RegularExpressions.Regex.Matches(p,@"(X(X){1,21})(?=\D+((?>(?<-2>_)+)_))(?=.((?!\7)(.)*
.*(X\3X|()\1.)(?=(?<-5>.)*(?(5)!)
)){4,23}\7)").Count;}
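For experimenting, an ungolfed wrapper along the following lines may be more convenient. It is a sketch rather than the submission itself: the class and method names are made up, and the two embedded newlines are written as explicit "\n" escapes so that the pattern does not depend on the line endings of the source file.

```csharp
using System.Text.RegularExpressions;

static class PortalCounter
{
    // Same pattern as in the submission, split at its two embedded newlines.
    const string Pattern =
        @"(X(X){1,21})(?=\D+((?>(?<-2>_)+)_))(?=.((?!\7)(.)*" + "\n" +
        @".*(X\3X|()\1.)(?=(?<-5>.)*(?(5)!)" + "\n" +
        @")){4,23}\7)";

    // Input uses X for obsidian and _ for empty cells, one row per line,
    // and must end with a trailing newline.
    public static int Count(string grid) => Regex.Matches(grid, Pattern).Count;
}
```

On the smallest admissible portal, PortalCounter.Count("XXXX\nX__X\nX__X\nX__X\nXXXX\n") should report one match, and a frame with an inner height of only one row, such as "XXXX\nX__X\nXXXX\n", should report none.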