Proving the pumping lemma for context-free languages using pushdown automata

21

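The pumping lemma for regular languages can be proved by considering a finite-state automaton that recognizes the language under study, picking a string whose length is greater than the automaton's number of states, and applying the pigeonhole principle. The pumping lemma for context-free languages (as well as Ogden's lemma, which is slightly more general), however, is proved by considering a context-free grammar for the language under study, picking a sufficiently long string, and looking at its parse tree.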

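Given the similarity of the two pumping lemmas, you would expect that the context-free one could also be proved in a way similar to the regular one, by considering a pushdown automaton that recognizes the language rather than a grammar. However, I have not been able to find any reference to such a proof.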

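Hence my question: is there a proof of the pumping lemma for context-free languages that involves only pushdown automata and not grammars?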

— a3nm
source

16

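I thought about this problem again, and I have a complete proof. It is trickier than I expected. Comments are welcome! Update: I have submitted this proof on arXiv, in case it is useful to someone: http://arxiv.org/abs/1207.2819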

$\DeclareMathOperator{\fp}{fp}$ $\DeclareMathOperator{\lp}{lp}$ $\newcommand{\fpp}[1]{\widehat{\fp{#1}}}$ $\newcommand{\lpp}[1]{\widehat{\lp{#1}}}$

Let $L$ be a context-free language over an alphabet $\Sigma$ . Let $A$ be a pushdown automaton which recognizes $L$ , with stack alphabet $\Gamma$ . We denote by $|A|$ the number of states of $A$ . Without loss of generality, we can assume that transitions of $A$ pop the topmost symbol of the stack and either push no symbol on the stack or push on the stack the previous topmost symbol and some other symbol.

We define $p' = |A|^2 |\Gamma|$ and $p = |A| (|\Gamma|+1)^{p'}$ the pumping length, and will show that all $w \in L$ such that $|w| > p$ have a decomposition of the form $w = u v x y z$ such that $|vxy| \leq p$ , $|vy| \geq 1$ and $\forall n \geq 0, u v^n x y^n z \in L$ .
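To get a feel for the size of these bounds, here is a minimal Python sketch that evaluates $p'$ and $p$ from the two formulas above. The function name `pumping_bounds` and the sample values (2 states, 2 stack symbols) are illustrative assumptions, not part of the proof.

```python
def pumping_bounds(num_states, stack_alphabet_size):
    """Return (p_prime, p) following the two formulas in the text.

    p_prime = |A|^2 * |Gamma|
    p       = |A| * (|Gamma| + 1) ** p_prime
    """
    p_prime = num_states ** 2 * stack_alphabet_size
    p = num_states * (stack_alphabet_size + 1) ** p_prime
    return p_prime, p


# Hypothetical automaton with 2 states and 2 stack symbols.
print(pumping_bounds(2, 2))  # (8, 13122)
```

Even for this tiny automaton the pumping length is already in the thousands, which shows that the bound is far from tight.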

Let $w \in L$ such that $|w| > p$ . Let $\pi$ be an accepting path of minimal length for $w$ (represented as a sequence of transitions of $A$ ); we denote its length by $|\pi|$ . For $0 \leq i < |\pi|$ , we define $s_i$ as the size of the stack at position $i$ of the accepting path. For all $N > 0$ , we define an $N$ -level over $\pi$ as a set of three indices $i, j, k$ with $0 \leq i < j < k \leq p$ such that:

$s_i = s_k, s_j = s_i + N$
for all $n$ such that $i \leq n \leq j$ , $s_i \leq s_n \leq s_j$
for all $n$ such that $j \leq n \leq k$ , $s_k \leq s_n \leq s_j$ .

(For an example of this, see the picture for case 2 below which illustrates an $N$ -level.)
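The definition can be checked mechanically on a sequence of stack sizes. The following Python sketch is a brute-force illustration only: the names `is_n_level` and `level`, the sample sequence `heights`, and the `bound` parameter (standing in for $p$ ) are assumptions made for the example. It takes the condition on the descending part to be that stack sizes stay between $s_k$ and $s_j$ .

```python
def is_n_level(s, i, j, k, n_level):
    """Check whether indices (i, j, k) form an N-level over stack sizes s."""
    if not (0 <= i < j < k < len(s)):
        return False
    # The stack returns to its initial size and peaks N symbols higher.
    if s[i] != s[k] or s[j] != s[i] + n_level:
        return False
    # Between i and j, and between j and k, sizes stay within [s_i, s_j].
    rising = all(s[i] <= s[n] <= s[j] for n in range(i, j + 1))
    falling = all(s[k] <= s[n] <= s[j] for n in range(j, k + 1))
    return rising and falling


def level(s, bound):
    """Largest N such that s has an N-level with all indices <= bound (0 if none)."""
    best = 0
    last = min(bound, len(s) - 1)
    for i in range(last + 1):
        for j in range(i + 1, last + 1):
            for k in range(j + 1, last + 1):
                n_level = s[j] - s[i]
                if n_level > best and is_n_level(s, i, j, k, n_level):
                    best = n_level
    return best


# Hypothetical stack-size profile of a short path.
heights = [0, 1, 2, 1, 2, 3, 2, 1, 0]
print(is_n_level(heights, 0, 5, 8, 3))  # True
print(level(heights, 8))                # 3
```

The search is quartic in the path length, which is fine for illustration but not meant as an efficient algorithm.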

We define the level $l$ of $\pi$ as the maximal $N$ such that $\pi$ has an $N$ -level. This definition is motivated by the following property: if the size of the stack over a path $\pi$ becomes larger than its level $l$ , then the stack symbols more than $l$ levels deep will never be popped. We will now distinguish two cases: either $l < p'$ , in which case we know that the same configuration for the automaton state and the topmost $l$ symbols of the stack is encountered twice in the first $p+1$ steps of $\pi$ , or $l \geq p'$ , and there must be a stacking and unstacking position that can be repeated an arbitrary number of times, from which we construct $v$ and $y$ .

Case 1. $l < p'$ . We define the configurations of $A$ as the pairs of a state of $A$ and a sequence of $l$ stack symbols (where stacks of size less than $l$ will be represented by padding them to $l$ with a special blank symbol, which is why we use $|\Gamma| + 1$ when defining $p$ ). By definition, there are $|A| (|\Gamma| + 1)^l$ such configurations, which is less than $p$ . Hence, in the first $p+1$ steps of $\pi$ , the same configuration is encountered twice at two different positions, say $i < j$ . Denote by $\widehat{i}$ (resp. $\widehat{j}$ ) the position of the last letter of $w$ read at step $i$ (resp. $j$ ) of $\pi$ . We have $\widehat{i} \leq \widehat{j}$ . Hence, we can factor $w = u v x y z$ with $x = y = \epsilon$ , $u = w_{0 \cdots \widehat{i}}$ , $v = w_{\widehat{i} \cdots \widehat{j}}$ , $z = w_{\widehat{j} \cdots |w|}$ . (By $w_{a \cdots b}$ we denote the letters of $w$ from $a$ inclusive to $b$ exclusive.) Since each transition reads at most one letter, $\widehat{j} \leq j \leq p$ , so $|vxy| = |v| \leq p$ .

We also have to show that $\forall n \geq 0, u v^n x y^n z = u v^n z \in L$ , but this follows from our observation above: stack symbols deeper than $l$ are never popped, so there is no way to distinguish configurations which are equal according to our definition, and an accepting path for $u v^n z$ is built from that of $w$ by repeating the steps between $i$ and $j$ , $n$ times.

Finally, we also have $|v| > 0$ , because if $v = \epsilon$ , then, because we have the same configuration at steps $i$ and $j$ in $\pi$ , $\pi' = \pi_{0 \cdots i} \pi_{j \cdots |\pi|}$ would be an accepting path for $w$ , contradicting the minimality of $\pi$ .

(Note that this case amounts to applying the pumping lemma for regular languages by hardcoding the topmost $l$ stack symbols in the automaton state, which is adequate because $l$ is small enough to ensure that $|w|$ is larger than the number of states of this automaton. The main trick is that we must adjust for $\epsilon$ -transitions.)
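The pigeonhole step of Case 1 can be sketched in Python as follows. This is only an illustration under assumed conventions: a stack is a list whose last element is the top, `BLANK` is a hypothetical padding symbol, and `configuration`, `first_repeat` and the sample `trace` are names invented for the example.

```python
BLANK = "_"  # hypothetical padding symbol, assumed not to be in the stack alphabet


def configuration(state, stack, l):
    """Pair a state with the topmost l stack symbols, padded with BLANK."""
    top = list(stack[-l:]) if l > 0 else []
    padding = [BLANK] * (l - len(top))
    return (state, tuple(padding + top))


def first_repeat(items):
    """Return the first pair of positions (i, j), i < j, holding equal items."""
    seen = {}
    for position, item in enumerate(items):
        if item in seen:
            return seen[item], position
        seen[item] = position
    return None


# Hypothetical trace of (state, stack) pairs along a path, with l = 1.
trace = [
    ("q0", []),
    ("q0", ["A"]),
    ("q1", ["A", "B"]),
    ("q0", ["A", "A"]),
    ("q1", ["B", "B"]),
]
configs = [configuration(state, stack, 1) for state, stack in trace]
print(first_repeat(configs))  # (1, 3)
```

Positions 1 and 3 share the state `q0` and the top symbol `A`, so the steps between them play the role of the repeated segment in the argument above.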

Case 2. $l \geq p'$ . Let $i, j, k$ be a $p'$ -level. To any stack size $h$ , $s_i \leq h \leq s_j$ , we associate the last push $\lp(h) = \max(\{y \leq j | s_y = h\})$ and the first pop $\fp(h) = \min(\{y \geq j | s_y = h\})$ . By definition, $i \leq \lp(h) \leq j$ and $j \leq \fp(h) \leq k$ . Here is an illustration of this construction. To simplify the drawing, I omit the distinction between path positions and word positions, which we will have to make later.

Illustration of the construction for case 2. To simplify the drawing, the distinction between path positions and word positions is omitted.
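In place of the picture, the two maps can be computed directly from a sequence of stack sizes. This Python sketch is an illustration only: the function names `last_push` and `first_pop` and the sample sequence `heights` (with peak position `j = 5`) are assumptions made for the example.

```python
def last_push(s, j, h):
    """lp(h): the largest position y <= j at which the stack size is h."""
    return max(y for y in range(j + 1) if s[y] == h)


def first_pop(s, j, h):
    """fp(h): the smallest position y >= j at which the stack size is h."""
    return min(y for y in range(j, len(s)) if s[y] == h)


# Hypothetical stack-size profile where (i, j, k) = (0, 5, 8) is a 3-level.
heights = [0, 1, 2, 1, 2, 3, 2, 1, 0]
j = 5
for h in range(heights[0], heights[j] + 1):
    print(h, last_push(heights, j, h), first_pop(heights, j, h))
# 0 0 8
# 1 3 7
# 2 4 6
# 3 5 5
```

Both functions raise `ValueError` if the stack never has size `h` on the relevant side of `j`, which cannot happen for sizes between $s_i$ and $s_j$ inside a level.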

We say that the full state of a stack size $h$ is the triple formed by:

the automaton state at position $\lp(h)$
the topmost stack symbol at position $\lp(h)$
the automaton state at position $\fp(h)$

There are $p'$ possible full states, and $p' + 1$ stack sizes between $s_i$ and $s_j$ , so, by the pigeonhole principle, there exist two stack sizes $g, h$ with $s_i \leq g < h \leq s_j$ such that the full states at $g$ and $h$ are the same. As in Case 1, we denote by $\widehat{\lp(g)}$ , $\widehat{\lp(h)}$ , $\widehat{\fp(h)}$ and $\widehat{\fp(g)}$ the positions of the last letters of $w$ read at the corresponding positions in $\pi$ . We factor $w = u v x y z$ where $u = w_{0 \cdots \widehat{\lp(g)}}$ , $v = w_{\widehat{\lp(g)} \cdots \widehat{\lp(h)}}$ , $x = w_{\widehat{\lp(h)} \cdots \widehat{\fp(h)}}$ , $y = w_{\widehat{\fp(h)} \cdots \widehat{\fp(g)}}$ , and $z = w_{\widehat{\fp(g)} \cdots |w|}$ .
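Once the four word positions are known, the factorization is just slicing. Here is a small Python sketch under stated assumptions: the cut points `a <= b <= c <= d` stand for the word positions associated with $\lp(g)$ , $\lp(h)$ , $\fp(h)$ and $\fp(g)$ , and the language $\{a^n b^n\}$ with its membership test `in_anbn` is a hypothetical example, not something taken from the proof.

```python
def factor(w, a, b, c, d):
    """Split w into (u, v, x, y, z) at cut points a <= b <= c <= d."""
    return w[:a], w[a:b], w[b:c], w[c:d], w[d:]


def pump(parts, n):
    """Build u v^n x y^n z from a factorization."""
    u, v, x, y, z = parts
    return u + v * n + x + y * n + z


def in_anbn(word):
    """Membership test for the example language { a^n b^n : n >= 0 }."""
    half = len(word) // 2
    return len(word) % 2 == 0 and word == "a" * half + "b" * half


# Hypothetical cut points for w = aaabbb.
parts = factor("aaabbb", 1, 2, 4, 5)
print(parts)           # ('a', 'a', 'ab', 'b', 'b')
print(pump(parts, 0))  # aabb
print(pump(parts, 2))  # aaaabbbb
print(all(in_anbn(pump(parts, n)) for n in range(5)))  # True
```

Here `v` covers one push and `y` the matching pop, so repeating both the same number of times keeps the word in the language.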

This factorization ensures that $|vxy| \leq p$ (because $k \leq p$ by our definition of levels).

We also have to show that $\forall n \geq 0, u v^n x y^n z \in L$ . To do so, observe that each time that we repeat $v$ , we start from the same state and the same stack top and we do not pop below our current position in the stack (otherwise we would have to push again at the current position, violating the maximality of $\lp(g)$ ), so we can follow the same path in $A$ and push the same symbol sequence on the stack. By the maximality of $\lp(h)$ and the minimality of $\fp(h)$ , while reading $x$ , we do not pop below our current position in the stack, so the path followed in the automaton is the same regardless of the number of times we repeated $v$ . Now, if we repeat $y$ as many times as we repeat $v$ , since we start from the same state, since we have pushed the same symbol sequence on the stack with our repeats of $v$ , and since we do not pop more than what $v$ has stacked by minimality of $\fp(g)$ , we can follow the same path in $A$ and pop the same symbol sequence from the stack. Hence, an accepting path for $u v^n x y^n z$ can be constructed from the accepting path for $w$ .

Finally, we also have $|vy| > 0$ , because, as in Case 1, if $v = \epsilon$ and $y = \epsilon$ , we can build a shorter accepting path for $w$ by removing $\pi_{\lp(g)\cdots\lp(h)}$ and $\pi_{\fp(h)\cdots\fp(g)}$ .

Hence, we have an adequate factorization in both cases, and the result is proved.

(Credit goes to Marc Jeanmougin for helping me with this proof.)

— a3nm
source

7

Yes it is possible. We could use the notion of surface configurations; they were introduced by Cook a long time back. With this it should be quite easy to get a version of pumping lemma out.

As to surface configurations, almost any paper on LogCFL should carry its definition. Here is a recent paper and here is a thesis

Maybe someone more energetic can spell out the details!

— V Vinay
source

Thanks for answering! Yes, it is pretty natural to look at the combination of automaton state and topmost stack symbol. I am still thinking about this problem, though, and I can't manage to figure out the details... Help is appreciated. :-)

— a3nm

3

For completeness, here is a reference to a proof in this direction.

A. Ehrenfeucht, H.J. Hoogeboom, G. Rozenberg: Coordinated pair systems. I: Dyck words and classical pumping. RAIRO, Inf. Théor. Appl. 20, 405-424 (1986)

Abstract. The notion of a coordinated pair system [...] corresponds very closely to (is another formulation of) the notion of a push-down automaton. In this paper we [...] investigate the possibility of obtaining pumping properties of context-free languages via the analysis of computations in cp systems. In order to do this we analyze the combinatorial structure of Dyck words. The properties of Dyck words we investigate stem from the combinatorial analysis of computations in cp systems. We demonstrate how this correspondence can be used for proving the classical pumping lemma.

— Hendrik Jan
source

1

When I discussed this problem with Géraud Sénizergues, he pointed me to this paper by Sakarovitch, which already proves this result. The proof seems to date back to this paper by Ogden.

References:

Sakarovitch, Jacques. Sur une propriété d’itération des langages algébriques déterministes. (French. English summary). Math. Systems Theory 14 (1981), no. 3, 247–288.
William F. Ogden. 1969. Intercalation theorems for stack languages. In Proceedings of the first annual ACM symposium on Theory of computing (STOC '69).

— Lamine
source