Is there an extension of regular expressions that captures the context-free languages?

25

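In many papers on context-free grammars (CFGs), the example grammars presented often admit an easy characterization of the language they generate. For example: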

$S \to a a S b$
$S \to$

generates $\{ a^{2i} b^i \mid i \geq 0\}$,

$S \to a S b$
$S \to a a S b$
$S \to$

generates $\{ a^i b^j \mid 2j \geq i \geq j \geq 0 \}$ (each $b$ is produced together with one or two $a$'s), and

$S \to a S a$
$S \to b S b$
$S \to$

generates $\{ w w^R \mid w \in (a|b)^* \}$, or equivalently $\{ ((a|b)^*)_1 ((a|b)^*)_2 \mid p_1 = p_2^R \}$ (where $p_1$ refers to the part captured by $(...)_1$).

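The examples above can all be generated by adding indices ($a^i$), simple constraints on those indices ($i > j$), and pattern matching to regular expressions. This makes me wonder whether all context-free languages can be generated by some extension of regular expressions.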
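The three grammars above are small enough to check by brute force. The following is a minimal sketch, not part of the original question: the helper `derive` and the bound `N` are names introduced here, and the search relies on each example grammar having a single nonterminal $S$ that occurs at most once per rule. Note that for the second grammar the enumeration matches $j \leq i \leq 2j$, since every $b$ is produced together with one or two $a$'s.

```python
from itertools import product

def derive(rules, max_len):
    """Return all terminal strings of length <= max_len derivable from S.

    `rules` lists the right-hand sides for the single nonterminal S.
    Every rule here is either empty or of the shape u S v, so sentential
    forms only grow and the search terminates.
    """
    seen, frontier, words = set(), {"S"}, set()
    while frontier:
        form = frontier.pop()
        if form in seen:
            continue
        seen.add(form)
        if "S" not in form:
            words.add(form)
            continue
        for rhs in rules:
            new = form.replace("S", rhs, 1)
            # Terminals never disappear, so forms that are too long are pruned.
            if len(new.replace("S", "")) <= max_len:
                frontier.add(new)
    return words

N = 8
g1 = derive(["aaSb", ""], N)
g2 = derive(["aSb", "aaSb", ""], N)
g3 = derive(["aSa", "bSb", ""], N)

# Closed-form descriptions, restricted to words of length <= N.
lang1 = {"a" * (2 * i) + "b" * i for i in range(N) if 3 * i <= N}
lang2 = {"a" * i + "b" * j for i in range(N + 1) for j in range(N + 1)
         if j <= i <= 2 * j and i + j <= N}
lang3 = {w + w[::-1] for n in range(N // 2 + 1)
         for w in map("".join, product("ab", repeat=n))}

print(g1 == lang1, g2 == lang2, g3 == lang3)  # True True True
```

A bounded check like this is no proof, but it catches mismatches quickly: `"aaab"` has $i = 3$ and $j = 1$ and is not derivable from the second grammar.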

Is there an extension of regular expressions that can generate all, or some significant subset, of the context-free languages?

fl.formal-languages context-free context-free-languages

— Alex ten Brink

3

Observe that adding indices and constraints is too powerful: you would be able to define $a^nb^nc^n$, which is not a CFL.
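To make the point concrete, here is a small sketch of "regular expression plus index constraints". The function name `match_indexed` is made up for this illustration. It uses Python's `re` module to capture the three blocks and then imposes the constraint $i = j = k$ on their lengths.

```python
import re

def match_indexed(word):
    """Match a^i b^j c^k with a regular expression, then require i = j = k."""
    m = re.fullmatch(r"(a*)(b*)(c*)", word)
    if m is None:
        return False
    i, j, k = (len(group) for group in m.groups())
    return i == j == k

print(match_indexed("aabbcc"))  # True
print(match_indexed("aabbc"))   # False
```

The regular expression alone accepts $a^* b^* c^*$. The equality constraint on the indices is what pushes the language outside the context-free class.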

— Shaull

34

Yes, there is. Define a context-free expression to be a term generated by the following grammar:

$\begin{array}{lcll} g & ::= & \epsilon & \mbox{Empty string}\\ & | & c & \mbox{Character $c$ in alphabet $\Sigma$} \\ & | & g \cdot g & \mbox{Concatenation} \\ & | & \bot & \mbox{Failing pattern} \\ & | & g \vee g & \mbox{Disjunction}\\ & | & \mu \alpha.\; g & \mbox{Recursive grammar expression} \\ & | & \alpha & \mbox{Variable expression} \end{array}$

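That is, we have all the constructors for regular languages except the Kleene star, which is replaced by a general fixed-point operator $\mu \alpha.\;g$ and a variable reference mechanism. (The Kleene star is not needed, since it can be defined as $g\ast \triangleq \mu \alpha.\;\epsilon \vee g\cdot\alpha$.)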
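As an illustration (not part of the original answer), the three grammars from the question translate directly into context-free expressions, one $\mu$ per nonterminal. The table assumes that $\cdot$ binds tighter than $\vee$ and that the body of $\mu$ extends as far to the right as possible; neither precedence is fixed by the grammar above.

```latex
\begin{array}{lcl}
S \to a a S b \mid \epsilon & \quad\leadsto\quad & \mu \alpha.\; \epsilon \vee a \cdot a \cdot \alpha \cdot b \\
S \to a S b \mid a a S b \mid \epsilon & \quad\leadsto\quad & \mu \alpha.\; \epsilon \vee a \cdot \alpha \cdot b \vee a \cdot a \cdot \alpha \cdot b \\
S \to a S a \mid b S b \mid \epsilon & \quad\leadsto\quad & \mu \alpha.\; \epsilon \vee a \cdot \alpha \cdot a \vee b \cdot \alpha \cdot b
\end{array}
```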

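Interpreting a context-free expression requires accounting for the interpretation of its free variables. So define an environment $\rho$ to be a map from variables to languages (i.e., subsets of $\Sigma^*$), and let $[\rho|\alpha:L]$ be the function that behaves like $\rho$ on all inputs except $\alpha$, for which it returns the language $L$.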

Now, define the interpretation of a context-free expression as follows:

$\newcommand{\interp}[2]{[\![{#1}]\!]\;{#2}} \newcommand{\setof}[1]{\left\{#1\right\}} \newcommand{\comprehend}[2]{\setof{{#1}\;\mid\;{#2}}} \begin{array}{lcl} \interp{\epsilon}{\rho} & = & \setof{\epsilon} \\ \interp{c}{\rho} & = & \setof{c} \\ \interp{g_1\cdot g_2}{\rho} & = & \comprehend{w_1 \cdot w_2}{w_1 \in \interp{g_1}{\rho} \land w_2 \in \interp{g_2}{\rho}} \\ \interp{\bot}{\rho} & = & \emptyset \\ \interp{g_1 \vee g_2}{\rho} & = & \interp{g_1}{\rho} \cup \interp{g_2}{\rho} \\ \interp{\alpha}{\rho} & = & \rho(\alpha) \\ \interp{\mu \alpha.\; g}{\rho} & = & \bigcup_{n \in \mathbb{N}} L_n \\ \mbox{where} & & \\ L_0 & = & \emptyset \\ L_{n+1} & = & L_n \cup \interp{g}{[\rho|\alpha:L_n]} \end{array}$

Using the Knaster-Tarski theorem, it is easy to see that the interpretation of $\mu \alpha.g$ is the least fixed point of the expression.
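The semantic clauses can be executed almost verbatim if every language is cut off at a length bound, which makes the chain $L_0 \subseteq L_1 \subseteq \dots$ stationary after finitely many steps. The sketch below is one possible reading, not the author's code: the tuple encoding of expressions, the names `interp` and `star`, and the bound `n` are all assumptions made for this example. Truncation is harmless here because every factor of a word of length at most $n$ also has length at most $n$.

```python
def interp(g, rho, n):
    """Words of length <= n in the language [[g]] rho.

    Expressions are tagged tuples: ("eps",), ("chr", c), ("cat", g1, g2),
    ("bot",), ("or", g1, g2), ("mu", alpha, g), ("var", alpha).
    rho maps variable names to sets of words, each of length <= n.
    """
    tag = g[0]
    if tag == "eps":
        return frozenset({""})
    if tag == "chr":
        return frozenset({g[1]}) if n >= 1 else frozenset()
    if tag == "cat":
        left, right = interp(g[1], rho, n), interp(g[2], rho, n)
        return frozenset(u + v for u in left for v in right if len(u + v) <= n)
    if tag == "bot":
        return frozenset()
    if tag == "or":
        return interp(g[1], rho, n) | interp(g[2], rho, n)
    if tag == "var":
        return rho[g[1]]
    if tag == "mu":
        _, alpha, body = g
        lang = frozenset()  # L_0 is the empty language
        while True:
            # L_{k+1} = L_k united with [[body]] [rho | alpha : L_k]
            nxt = lang | interp(body, {**rho, alpha: lang}, n)
            if nxt == lang:  # the chain has become stationary
                return lang
            lang = nxt
    raise ValueError(f"unknown expression: {g!r}")

def star(g, alpha="_s"):
    """Kleene star as mu alpha. eps or g.alpha; g must not use alpha freely."""
    return ("mu", alpha, ("or", ("eps",), ("cat", g, ("var", alpha))))

a, b = ("chr", "a"), ("chr", "b")
# mu x. eps or a.x.b denotes { a^i b^i | i >= 0 }.
anbn = ("mu", "x", ("or", ("eps",), ("cat", a, ("cat", ("var", "x"), b))))

print(sorted(interp(anbn, {}, 6), key=len))       # ['', 'ab', 'aabb', 'aaabbb']
print(sorted(interp(star(("or", a, b)), {}, 2)))  # ['', 'a', 'aa', 'ab', 'b', 'ba', 'bb']
```

The dictionary update `{**rho, alpha: lang}` plays the role of $[\rho|\alpha:L_n]$: it agrees with `rho` everywhere except at `alpha`.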

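It is straightforward (though not entirely trivial) to show that you can give a context-free expression deriving the same language as any context-free grammar, and vice versa. The non-triviality arises because context-free expressions have nested fixed points, whereas a context-free grammar gives you a single fixed point over a tuple. This requires Bekić's lemma, which says precisely that nested fixed points can be converted into a single fixed point over a product (and vice versa). But that is the only subtlety.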
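Bekić's lemma can be checked numerically on a small case. The sketch below uses a hypothetical two-nonterminal grammar, $S \to a T b \mid \epsilon$ and $T \to S \mid c T$, chosen only for illustration. It compares the simultaneous least fixed point over the pair $(S, T)$ with the nested form $\mu s.\, F(s, \mu t.\, G(s, t))$. Languages are truncated at length `N` so that iteration terminates; `cat`, `lfp`, `F`, and `G` are names introduced here.

```python
def cat(x, y, n):
    """Concatenation of two languages, keeping only words of length <= n."""
    return frozenset(u + v for u in x for v in y if len(u + v) <= n)

def lfp(f, bottom):
    """Least fixed point of a monotone f, by iteration from `bottom`."""
    x = bottom
    while True:
        y = f(x)
        if y == x:
            return x
        x = y

N = 7
EMPTY = frozenset()
A, B, C, EPS = (frozenset({s}) for s in ("a", "b", "c", ""))

def F(s, t):
    """Right-hand side of S -> a T b | eps."""
    return EPS | cat(A, cat(t, B, N), N)

def G(s, t):
    """Right-hand side of T -> S | c T."""
    return s | cat(C, t, N)

# Grammar view: a single fixed point over the pair (S, T).
s_pair, t_pair = lfp(lambda p: (F(*p), G(*p)), (EMPTY, EMPTY))

# Expression view: mu s. F(s, mu t. G(s, t)), one fixed point inside another.
s_nested = lfp(lambda s: F(s, lfp(lambda t: G(s, t), EMPTY)), EMPTY)

print(s_pair == s_nested)  # True
```

In the notation of the answer, the nested form corresponds to the expression $\mu\sigma.\;\epsilon \vee a\cdot(\mu\tau.\;\sigma \vee c\cdot\tau)\cdot b$.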

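EDIT: No, I don't know a standard reference for this: I worked it out for my own interest. However, it is an obvious enough construction that I am confident it has been invented before. Some casual Googling turns up a recent paper by Joost Winter, Marcello Bonsangue, and Jan Rutten, "Context-Free Languages, Coalgebraically", in which they give a variant of this definition (requiring all fixed points to be guarded) that they also call context-free expressions.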

— Neel Krishnaswami

Awesome! Is there a standard name or reference for this?

— Alex ten Brink, 2013

5

Arto Salomaa covers this in his 1973 book "Formal Languages". He calls them "regular-like expressions".

— 2014

3

There is a closely related question (with several answers) on MathOverflow about languages whose generating function is holonomic.

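Interestingly, Neel's definition of the semantics above corresponds exactly to the (constructive) proof, via the Implicit Species Theorem, that a recursive species equation has a species solution. Unfortunately, his proof outline must then also contain a subtle mistake, because there are cases where things become "infinite". In other words, a condition is needed: the Jacobian of the transformation on grammars must be non-singular. This is probably why Bonsangue and Rutten require their $\mu$ fixed points to be guarded, as a way of ensuring that this Jacobian condition holds.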
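A small example, offered as an illustration rather than as the specific case this answer has in mind, shows how an unguarded fixed point misbehaves once derivations are counted instead of merely collected. Take $\mu\alpha.\; a \vee \alpha$ and compare it with the guarded $\mu\alpha.\; a \vee a\cdot\alpha$. Here $A(x)$ is the generating function counting derivations by word length and $H$ is the right-hand side of the equation $y = H(x, y)$. The sketch assumes that the Jacobian condition amounts to $1 - \partial H / \partial y$ being invertible at the origin.

```latex
\begin{array}{lcl}
\mbox{unguarded, as a language:} & & L_0 = \emptyset,\quad L_1 = \{a\},\quad L_2 = \{a\} = L_1 \\
\mbox{unguarded, counting derivations:} & & A(x) = x + A(x) \mbox{ has no power-series solution} \\
\mbox{unguarded, Jacobian:} & & H(x, y) = x + y,\quad 1 - \partial H / \partial y = 0 \quad \mbox{(singular)} \\
\mbox{guarded, counting derivations:} & & A(x) = x + x\,A(x) \;\Longrightarrow\; A(x) = x / (1 - x) \\
\mbox{guarded, Jacobian:} & & H(x, y) = x + x y,\quad 1 - \partial H / \partial y \,\big|_{(0,0)} = 1 \quad \mbox{(non-singular)}
\end{array}
```

As a set of strings the unguarded expression is unproblematic, since its least fixed point is $\{a\}$. But the word $a$ then has infinitely many derivations, which is the kind of "infinite" behaviour that guardedness rules out.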

— Jacques Carette

$\mu\alpha.\;g$

$[\mu\alpha.\;g/\alpha]g$

1

We recently published an overview of a framework that does exactly this. Look under comp.compilers, where I posted a notice along with some links.

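The new development is based on the Chomsky-Schützenberger theorem and can be regarded as the completion of that result. Chomsky himself has been informed of the development and has said he hopes to "catch up".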

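Alongside this development, we have also established two equivalent formulations of context-free expressions. One is a "least fixed point" μ-calculus form (an extension and completion of work originally done by Gruska, Yntema, and McWhirter), which was finalized in 2014; the other was published in 2008.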

— NinjaDarth

4

Please include all relevant information in the answer itself. “Look under comp.compilers” is an unhelpful answer already now, and it will be completely useless in a couple of months.

— Emil Jeřábek supports Monica

That's totally wrong. Comp.compilers (unlike this site and other blogs, by the way) is permanently archived. There you will find all the details you need. There are many links that may be found there, in the most recently posted article, as well. Also, unlike blog sites, it is open to the outside and useful to a much wider audience. You should have no difficulty finding anything on the USENET - which is where queries like this should be addressed and discussed. If you have difficulty, here is the link. groups.google.com/forum/#!topic/comp.compilers/YCa5jHUR1iQ

— NinjaDarth

2

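The problem is not that it isn't archived, but that the archive is huge. When I look at the archive now, I can find your post somewhere near the top, but someone who sees this answer months or years from now will have no idea where to start digging. Making readers carry out a lengthy and unreliable search when you could point them to a specific location is arrogant and rude. Now I have done it for you. It took about 30 seconds. You could have done it yourself.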

— Emil Jeřábek