Proving the pumping lemma for context-free languages using pushdown automata


21

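The pumping lemma for regular languages can be proved by considering a finite-state automaton that recognizes the language under study, picking a string whose length exceeds the number of its states, and applying the pigeonhole principle. The pumping lemma for context-free languages (as well as Ogden's lemma, which is slightly more general), however, is proved by considering a context-free grammar of the language under study, picking a sufficiently long string, and looking at its parse tree.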

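Given the similarity of the two pumping lemmas, you would expect that the context-free one could also be proved in a way similar to the regular one, by considering a pushdown automaton that recognizes the language rather than a grammar. However, I have not found any reference to such a proof.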

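Hence my question: is there a proof of the pumping lemma for context-free languages that involves only pushdown automata and not grammars?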

Answers:


16

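I thought about this problem again and I think I have a full proof. It is trickier than I anticipated. Comments are welcome! Update: I submitted this proof on arXiv, in case it is useful to someone: http://arxiv.org/abs/1207.2819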

Let L be a context-free language over an alphabet Σ. Let A be a pushdown automaton which recognizes L, with stack alphabet Γ. We denote by |A| the number of states of A. Without loss of generality, we can assume that transitions of A pop the topmost symbol of the stack and either push no symbol on the stack or push on the stack the previous topmost symbol and some other symbol.

We define $p' = |A|^2 |\Gamma|$ and $p = |A| (|\Gamma|+1)^{p'}$, the pumping length, and will show that every $w \in L$ such that $|w| > p$ has a decomposition of the form $w = uvxyz$ such that $|vxy| \le p$, $|vy| \ge 1$ and $\forall n \ge 0,\ uv^n x y^n z \in L$.
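To get a feeling for how large these bounds are, here is a minimal Python sketch that evaluates the two formulas above. The function name `pumping_bounds` and the sample automaton sizes are illustrative assumptions, not part of the proof.

```python
from typing import Tuple


def pumping_bounds(num_states: int, stack_alphabet_size: int) -> Tuple[int, int]:
    """Return (p', p) where p' = |A|^2 * |Gamma| and p = |A| * (|Gamma| + 1)^p'."""
    p_prime = num_states ** 2 * stack_alphabet_size
    p = num_states * (stack_alphabet_size + 1) ** p_prime
    return p_prime, p


# Hypothetical automaton with 2 states and 2 stack symbols.
print(pumping_bounds(2, 2))  # (8, 13122)
```

Even for a tiny automaton the pumping length is large, since $p$ is exponential in $p'$.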

Let $w \in L$ such that $|w| > p$. Let $\pi$ be an accepting path of minimal length for $w$ (represented as a sequence of transitions of $A$); we denote its length by $|\pi|$. For $0 \le i < |\pi|$, we define $s_i$ as the size of the stack at position $i$ of the accepting path. For all $N > 0$, we define an $N$-level over $\pi$ as a set of three indices $i, j, k$ with $0 \le i < j < k \le p$ such that:

  1. $s_i = s_k$, $s_j = s_i + N$
  2. for all $n$ such that $i \le n \le j$, $s_i \le s_n \le s_j$
  3. for all $n$ such that $j \le n \le k$, $s_k \le s_n \le s_j$.

(For an example of this, see the picture for case 2 below which illustrates an N-level.)
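The definition of an $N$-level can be checked mechanically on a list of stack sizes. The sketch below is only an illustration: the names `is_n_level` and `level` are made up, condition 3 is read as $s_k \le s_n \le s_j$, and the brute-force search is chosen for clarity, not efficiency.

```python
from typing import Sequence


def is_n_level(s: Sequence[int], i: int, j: int, k: int, n_level: int, p: int) -> bool:
    """Check whether (i, j, k) is an N-level for the stack sizes s, with N = n_level."""
    if not (0 <= i < j < k <= p and k < len(s)):
        return False
    # Condition 1: same stack size at i and k, and exactly N more at j.
    if s[i] != s[k] or s[j] != s[i] + n_level:
        return False
    # Condition 2: between i and j the stack stays within [s_i, s_j].
    rising = all(s[i] <= s[n] <= s[j] for n in range(i, j + 1))
    # Condition 3 (as read here): between j and k the stack stays within [s_k, s_j].
    falling = all(s[k] <= s[n] <= s[j] for n in range(j, k + 1))
    return rising and falling


def level(s: Sequence[int], p: int) -> int:
    """Largest N such that some (i, j, k) is an N-level; 0 if there is none."""
    best = 0
    top = min(p, len(s) - 1)
    for i in range(top + 1):
        for j in range(i + 1, top + 1):
            for k in range(j + 1, top + 1):
                n = s[j] - s[i]
                if n > best and is_n_level(s, i, j, k, n, p):
                    best = n
    return best


sizes = [0, 1, 2, 3, 2, 1, 0]  # stack grows to height 3, then shrinks
print(is_n_level(sizes, 0, 3, 6, 3, p=10))  # True
print(level(sizes, p=10))                   # 3
```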

We define the level $l$ of $\pi$ as the maximal $N$ such that $\pi$ has an $N$-level. This definition is motivated by the following property: if the size of the stack over a path $\pi$ becomes larger than its level $l$, then the stack symbols more than $l$ levels deep will never be popped. We will now distinguish two cases: either $l < p'$, in which case we know that the same configuration for the automaton state and the topmost $l$ symbols of the stack is encountered twice in the first $p+1$ steps of $\pi$, or $l \ge p'$, and there must be a stacking and unstacking position that can be repeated an arbitrary number of times, from which we construct $v$ and $y$.

Case 1. $l < p'$. We define the configurations of $A$ as the pairs of a state of $A$ and a sequence of $l$ stack symbols (where stacks of size less than $l$ will be represented by padding them to $l$ with a special blank symbol, which is why we use $|\Gamma|+1$ when defining $p$). By definition, there are $|A| (|\Gamma|+1)^l$ such configurations, which is less than $p$. Hence, in the first $p+1$ steps of $\pi$, the same configuration is encountered twice at two different positions, say $i < j$. Denote by $\hat{i}$ (resp. $\hat{j}$) the position of the last letter of $w$ read at step $i$ (resp. $j$) of $\pi$. We have $\hat{i} \le \hat{j}$. Hence, we can factor $w = uvxyz$ with $y = z = \epsilon$, $u = w_{0 \ldots \hat{i}}$, $v = w_{\hat{i} \ldots \hat{j}}$, $x = w_{\hat{j} \ldots |w|}$. (By $w_{x \ldots y}$ we denote the letters of $w$ from $x$ inclusive to $y$ exclusive.) By construction, $|vxy| \le p$.

We also have to show that $\forall n \ge 0,\ uv^n x y^n z = uv^n x \in L$, but this follows from our observation above: stack symbols deeper than $l$ are never popped, so there is no way to distinguish configurations which are equal according to our definition, and an accepting path for $uv^n x$ is built from that of $w$ by repeating the steps between $i$ and $j$, $n$ times.

Finally, we also have $|v| > 0$, because if $v = \epsilon$, then, because we have the same configuration at steps $i$ and $j$ in $\pi$, $\pi' = \pi_{0 \ldots i}\,\pi_{j \ldots |\pi|}$ would be a shorter accepting path for $w$, contradicting the minimality of $\pi$.

(Note that this case amounts to applying the pumping lemma for regular languages by hardcoding the topmost l stack symbols in the automaton state, which is adequate because l is small enough to ensure that |w| is larger than the number of states of this automaton. The main trick is that we must adjust for ϵ-transitions.)
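The pigeonhole step of Case 1 can be sketched as follows. This assumes the accepting path is given as a list of (state, stack) pairs with the top of the stack at the end of the tuple; the names `truncated_configuration`, `first_repeat` and `BLANK` are hypothetical helpers for illustration only.

```python
from typing import Hashable, List, Optional, Sequence, Tuple

BLANK = None  # special blank symbol used to pad short stacks


def truncated_configuration(state: Hashable, stack: Sequence[Hashable], l: int) -> Tuple:
    """Pair the state with the topmost l stack symbols, padded with BLANK."""
    top = tuple(stack[-l:]) if l > 0 else ()
    padding = (BLANK,) * (l - len(top))
    return (state, padding + top)


def first_repeat(path: List[Tuple[Hashable, Sequence[Hashable]]], l: int) -> Optional[Tuple[int, int]]:
    """Return positions i < j with equal truncated configurations, or None."""
    seen = {}
    for pos, (state, stack) in enumerate(path):
        config = truncated_configuration(state, stack, l)
        if config in seen:
            return seen[config], pos
        seen[config] = pos
    return None


# Toy path: the configuration at position 1 is seen again at position 3.
path = [("q", ("Z",)), ("q", ("Z", "a")), ("r", ("Z",)), ("q", ("Z", "a"))]
print(first_repeat(path, l=2))  # (1, 3)
```

The steps between the two returned positions are the ones that get repeated to pump $v$.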

Case 2. $l \ge p'$. Let $i, j, k$ be a $p'$-level. To any stack size $h$, $s_i \le h \le s_j$, we associate the last push $lp(h) = \max(\{y \le j \mid s_y = h\})$ and the first pop $fp(h) = \min(\{y \ge j \mid s_y = h\})$. By definition, $i \le lp(h) \le j$ and $j \le fp(h) \le k$. Here is an illustration of this construction. To simplify the drawing, I omit the distinction between the path positions and word positions which we will have to make later.

Illustration of the construction for case 2. To simplify the drawing, the distinction between the path positions and word positions is omitted.
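As a small illustration of the last push and first pop, here is a Python sketch that computes them from a list of stack sizes. The function names `last_push` and `first_pop` and the sample sequence are assumptions made for this example.

```python
from typing import Sequence


def last_push(s: Sequence[int], j: int, h: int) -> int:
    """lp(h): largest position y <= j with s[y] == h (raises ValueError if none)."""
    return max(y for y in range(j + 1) if s[y] == h)


def first_pop(s: Sequence[int], j: int, h: int) -> int:
    """fp(h): smallest position y >= j with s[y] == h (raises ValueError if none)."""
    return min(y for y in range(j, len(s)) if s[y] == h)


# Stack sizes along a toy path; (i, j, k) = (0, 5, 8) is a 3-level.
sizes = [0, 1, 2, 1, 2, 3, 2, 1, 0]
j = 5
for h in range(sizes[0], sizes[j] + 1):
    print(h, last_push(sizes, j, h), first_pop(sizes, j, h))
```

Note that height 1 is reached at positions 1 and 3 before the peak, and `last_push` picks position 3: the stack never drops below height 1 again until after the peak.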

We say that the full state of a stack size h is the triple formed by:

  1. the automaton state at position lp(h)
  2. the topmost stack symbol at position lp(h)
  3. the automaton state at position fp(h)

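There are $p'$ possible full states, and $p'+1$ stack sizes between $s_i$ and $s_j$, so, by the pigeonhole principle, there exist two stack sizes $g, h$ with $s_i \le g < h \le s_j$ such that the full states at $g$ and $h$ are the same. Like in Case 1, we denote by $\widehat{lp}(g)$, $\widehat{lp}(h)$, $\widehat{fp}(h)$ and $\widehat{fp}(g)$ the positions of the last letters of $w$ read at the corresponding positions in $\pi$. We factor $w = uvxyz$ where $u = w_{0 \ldots \widehat{lp}(g)}$, $v = w_{\widehat{lp}(g) \ldots \widehat{lp}(h)}$, $x = w_{\widehat{lp}(h) \ldots \widehat{fp}(h)}$, $y = w_{\widehat{fp}(h) \ldots \widehat{fp}(g)}$, and $z = w_{\widehat{fp}(g) \ldots |w|}$.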
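The pigeonhole argument on full states can be sketched in Python as below. The inputs are assumptions for illustration: `states[n]` and `tops[n]` are the automaton state and topmost stack symbol at path position `n`, `s[n]` is the stack size, and `find_repeated_full_state` is a hypothetical helper name.

```python
from typing import Hashable, Optional, Sequence, Tuple


def find_repeated_full_state(
    states: Sequence[Hashable],
    tops: Sequence[Hashable],
    s: Sequence[int],
    i: int,
    j: int,
) -> Optional[Tuple[int, int]]:
    """Return stack sizes g < h in [s[i], s[j]] with equal full states, or None."""
    seen = {}
    for h in range(s[i], s[j] + 1):
        lp = max(y for y in range(j + 1) if s[y] == h)      # last push of height h
        fp = min(y for y in range(j, len(s)) if s[y] == h)  # first pop of height h
        full_state = (states[lp], tops[lp], states[fp])
        if full_state in seen:
            return seen[full_state], h
        seen[full_state] = h
    return None


# Toy run resembling a^3 b^3: push three symbols in state q, pop them in state r.
s = [0, 1, 2, 3, 2, 1, 0]
states = ["q", "q", "q", "q", "r", "r", "r"]
tops = ["Z", "a", "a", "a", "a", "a", "Z"]
print(find_repeated_full_state(states, tops, s, i=0, j=3))  # (1, 2)
```

In this toy run, heights 1 and 2 share the full state (q, a, r), so the push between them and the matching pop are the parts that can be pumped.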

This factorization ensures that $|vxy| \le p$ (because $k \le p$ by our definition of levels).
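Once the four cut positions in $w$ are known, the factorization and the pumped words are plain string slicing. The sketch below uses half-open slices, matching the "inclusive to exclusive" convention used for subwords of $w$; the names `factor` and `pump` and the sample word are illustrative assumptions.

```python
from typing import Tuple


def factor(w: str, a: int, b: int, c: int, d: int) -> Tuple[str, str, str, str, str]:
    """Split w into u, v, x, y, z at positions 0 <= a <= b <= c <= d <= len(w)."""
    return w[:a], w[a:b], w[b:c], w[c:d], w[d:]


def pump(w: str, a: int, b: int, c: int, d: int, n: int) -> str:
    """Build u v^n x y^n z from the factorization of w."""
    u, v, x, y, z = factor(w, a, b, c, d)
    return u + v * n + x + y * n + z


w = "aaabbb"
print(factor(w, 1, 2, 4, 5))   # ('a', 'a', 'ab', 'b', 'b')
print(pump(w, 1, 2, 4, 5, 0))  # aabb
print(pump(w, 1, 2, 4, 5, 3))  # aaaaabbbbb
```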

We also have to show that $\forall n \ge 0,\ uv^n x y^n z \in L$. To do so, observe that each time that we repeat $v$, we start from the same state and the same stack top and we do not pop below our current position in the stack (otherwise we would have to push again at the current position, violating the maximality of $lp(g)$), so we can follow the same path in $A$ and push the same symbol sequence on the stack. By the maximality of $lp(h)$ and the minimality of $fp(h)$, while reading $x$, we do not pop below our current position in the stack, so the path followed in the automaton is the same regardless of the number of times we repeated $v$. Now, if we repeat $y$ as many times as we repeat $v$, since we start from the same state, since we have pushed the same symbol sequence on the stack with our repeats of $v$, and since we do not pop more than what $v$ has stacked by minimality of $fp(g)$, we can follow the same path in $A$ and pop the same symbol sequence from the stack. Hence, an accepting path for $uv^n x y^n z$ can be constructed from the accepting path for $w$.

Finally, we also have $|vy| \ge 1$, because like in case 1, if $v = \epsilon$ and $y = \epsilon$, we can build a shorter accepting path for $w$ by removing $\pi_{lp(g) \ldots lp(h)}$ and $\pi_{fp(h) \ldots fp(g)}$.

Hence, we have an adequate factorization in both cases, and the result is proved.

(Credit goes to Marc Jeanmougin for helping me with this proof.)


7

Yes it is possible. We could use the notion of surface configurations; they were introduced by Cook a long time back. With this it should be quite easy to get a version of pumping lemma out.

As to surface configurations, almost any paper on LogCFL should carry its definition. Here is a recent paper and here is a thesis

Maybe someone more energetic can spell out the details!


Thanks for answering! Yes, it is pretty natural to look at the combination of automaton state and topmost stack symbol. I am still thinking about this problem, though, and I can't manage to figure out the details... Help is appreciated. :-)
a3nm

3

For completeness a reference to a proof in this direction.

A. Ehrenfeucht, H.J. Hoogeboom, G. Rozenberg: Coordinated pair systems. I: Dyck words and classical pumping. RAIRO, Inf. Théor. Appl. 20, 405-424 (1986)

Abstract. The notion of a coordinated pair system [...] corresponds very closely to (is another formulation of) the notion of a push-down automaton. In this paper we [...] investigate the possibility of obtaining pumping properties of context-free languages via the analysis of computations in cp systems. In order to do this we analyze the combinatorial structure of Dyck words. The properties of Dyck words we investigate stem from the combinatorial analysis of computations in cp systems. We demonstrate how this correspondence can be used for proving the classical pumping lemma.


1

When I discussed this problem with Géraud Sénizergues, he pointed me to this paper by Sakarovitch that already proves this result. The proof seems to date back to this paper by Ogden.

References:

  • Sakarovitch, Jacques. Sur une propriété d’itération des langages algébriques déterministes. (French. English summary). Math. Systems Theory 14 (1981), no. 3, 247–288.
  • William F. Ogden. 1969. Intercalation theorems for stack languages. In Proceedings of the first annual ACM symposium on Theory of computing (STOC '69).
Licensed under cc by-sa 3.0 with attribution required.