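I thought about this problem again, and I have a full proof. It is trickier than I expected. Comments are welcome! Update: in case it is useful to someone, I have submitted this proof to arXiv: http://arxiv.org/abs/1207.2819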
Let L be a context-free language over an alphabet Σ. Let A be a
pushdown automaton which recognizes L, with stack alphabet Γ. We denote
by |A| the number of states of A. Without loss of generality, we can assume
that transitions of A pop the topmost symbol of the stack and either push no
symbol on the stack or push on the stack the previous topmost symbol and some
other symbol.
We define $p' = |A|^2 |\Gamma|$ and $p = |A| (|\Gamma|+1)^{p'}$ the pumping
length, and will show that all $w \in L$ such that $|w| > p$ have a
decomposition of the form $w = uvxyz$ such that $|vxy| \le p$, $|vy| \ge 1$ and $\forall n \ge 0, uv^n x y^n z \in L$.
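To get a feel for how large this pumping length is, here is a minimal sketch that evaluates the two formulas above. The function name and its parameters are mine, chosen for illustration only.

```python
def pumping_length(num_states, stack_alphabet_size):
    """Return (p_prime, p) with p' = |A|^2 * |Γ| and p = |A| * (|Γ| + 1)^p'."""
    p_prime = num_states ** 2 * stack_alphabet_size
    p = num_states * (stack_alphabet_size + 1) ** p_prime
    return p_prime, p


# A hypothetical automaton with 2 states and 2 stack symbols.
print(pumping_length(2, 2))  # (8, 13122)
```

The bound grows exponentially in $p'$, so it is only meant as an existence bound, not as a tight one.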
Let w∈L such that |w|>p. Let π be an accepting path of minimal
length for w (represented as a sequence of transitions of A), we denote its
length by $|\pi|$. We can define, for $0 \le i < |\pi|$, $s_i$ the size of the
stack at position $i$ of the accepting path. For all $N > 0$, we define an
$N$-level over $\pi$ as a set of three indices $i, j, k$ with $0 \le i < j < k \le p$ such that:
- $s_i = s_k$, $s_j = s_i + N$
- for all $n$ such that $i \le n \le j$, $s_i \le s_n \le s_j$
- for all $n$ such that $j \le n \le k$, $s_k \le s_n \le s_j$.
(For an example of this, see the picture for case 2 below which illustrates an N-level.)
We define the level l of π as the maximal N such that π has an
N-level. This definition is motivated by the following property: if the size
of the stack over a path π becomes larger than its level l, then the stack
symbols more than l levels deep will never be popped. We will now distinguish
two cases: either l<p′, in which case we know that the same configuration
for the automaton state and the topmost l symbols of the stack is encountered
twice in the first p+1 steps of π, or l≥p′, and there must be a
stacking and unstacking position that can be repeated an arbitrary number of
times, from which we construct v and y.
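The definitions of an $N$-level and of the level can be checked by brute force on a sequence of stack sizes. The sketch below is only an illustration: the names `is_n_level` and `level` are mine, the `bound` parameter plays the role of $p$, and returning 0 when no $N$-level exists is my own convention (the text above does not cover that situation).

```python
def is_n_level(s, i, j, k, n_height):
    """Check whether (i, j, k) is an N-level of the stack sizes s, with N = n_height."""
    if not (0 <= i < j < k < len(s)):
        return False
    if s[i] != s[k] or s[j] != s[i] + n_height:
        return False
    # Between i and j the stack size stays between s[i] and s[j].
    rising = all(s[i] <= s[m] <= s[j] for m in range(i, j + 1))
    # Between j and k the stack size stays between s[k] and s[j].
    falling = all(s[k] <= s[m] <= s[j] for m in range(j, k + 1))
    return rising and falling


def level(s, bound):
    """Largest N > 0 such that s has an N-level with k <= bound (0 if none, by assumption)."""
    best = 0
    last = min(bound, len(s) - 1)
    for i in range(last + 1):
        for j in range(i + 1, last + 1):
            for k in range(j + 1, last + 1):
                n_height = s[j] - s[i]
                if n_height > best and is_n_level(s, i, j, k, n_height):
                    best = n_height
    return best


sizes = [1, 2, 3, 2, 3, 2, 1]  # made-up stack sizes along a path
print(is_n_level(sizes, 0, 2, 6, 2))  # True
print(level(sizes, 6))  # 2
```

The triple loop is cubic in the path length, which is fine for experimenting but is not meant to be efficient.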
Case 1. $l < p'$. We define the configurations of $A$ as the couples
of a state of $A$ and a sequence of $l$ stack symbols (where stacks of size less
than $l$ will be represented by padding them to $l$ with a special blank symbol,
which is why we use $|\Gamma|+1$ when defining $p$). By definition, there are
$|A| (|\Gamma|+1)^l$ such configurations, which is less than $p$. Hence, in
the $p+1$ first steps of $\pi$, the same configuration is encountered twice at
two different positions, say $i < j$. Denote by $\hat{i}$ (resp.
$\hat{j}$) the position of the last letter of $w$ read at step $i$ (resp.
$j$) of $\pi$. We have $\hat{i} \le \hat{j}$. Hence, we can factor $w = uvxyz$ with $yz = \epsilon$, $u = w_{0 \cdots \hat{i}}$, $v = w_{\hat{i} \cdots \hat{j}}$, $x = w_{\hat{j} \cdots |w|}$. (By $w_{a \cdots b}$ we denote the letters of $w$ from $a$ inclusive to $b$ exclusive.) By
construction, $|vxy| \le p$.
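The search for a repeated configuration can be sketched as follows. This is an illustration under my own assumptions: a path is given as a list of `(state, stack)` pairs with the top of the stack at the end of the tuple, and `BLANK`, `configuration` and `first_repeat` are hypothetical names.

```python
BLANK = None  # hypothetical padding symbol, assumed not to belong to the stack alphabet


def configuration(state, stack, l):
    """State plus the topmost l stack symbols, padded with BLANK (top of stack is stack[-1])."""
    top = tuple(stack[-l:]) if l > 0 else ()
    return state, (BLANK,) * (l - len(top)) + top


def first_repeat(path, l):
    """Return positions i < j of path with equal configurations (smallest such j), or None."""
    seen = {}
    for pos, (state, stack) in enumerate(path):
        conf = configuration(state, stack, l)
        if conf in seen:
            return seen[conf], pos
        seen[conf] = pos
    return None


# A made-up path: states and stack contents at each step.
path = [
    ("q0", ("Z",)),
    ("q1", ("Z", "X")),
    ("q0", ("Z", "X", "X")),
    ("q1", ("Z", "X", "X")),
]
print(first_repeat(path, 1))  # (1, 3)
print(first_repeat(path, 2))  # None
```

With `l = 1` only the top symbol is compared, so steps 1 and 3 collide; with `l = 2` the padded and unpadded stacks differ and no repeat is found on this short path.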
We also have to show that $\forall n \ge 0, uv^n x y^n z = uv^n x \in L$, but
this follows from our observation above: stack symbols deeper than $l$ are never
popped, so there is no way to distinguish configurations which are equal
according to our definition, and an accepting path for $uv^n x$ is built from
that of $w$ by repeating the steps between $i$ and $j$, $n$ times.
Finally, we also have $|v| > 0$, because if $v = \epsilon$, then, because we
have the same configuration at steps $i$ and $j$ in $\pi$, $\pi' = \pi_{0 \cdots i} \pi_{j \cdots |\pi|}$ would be an accepting path for $w$, contradicting the
minimality of $\pi$.
(Note that this case amounts to applying the pumping lemma for regular languages
by hardcoding the topmost l stack symbols in the automaton state, which is
adequate because l is small enough to ensure that |w| is larger than the
number of states of this automaton. The main trick is that we must adjust for
ϵ-transitions.)
Case 2. $l \ge p'$. Let $i, j, k$ be a $p'$-level. To any stack
size $h$, $s_i \le h \le s_j$, we associate the last push
$lp(h) = \max(\{y \le j \mid s_y = h\})$ and the first pop
$fp(h) = \min(\{y \ge j \mid s_y = h\})$.
By definition, $i \le lp(h) \le j$ and $j \le fp(h) \le k$. Here is an illustration of this construction. To simplify the drawing, I omit the distinction between the path positions and word positions, which we will have to make later.
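The last push and first pop can be read directly off the sequence of stack sizes. Here is a minimal sketch; the function names and the example sequence are mine, and `j` is assumed to be the peak position of a level.

```python
def last_push(s, j, h):
    """lp(h): the largest position y <= j with s[y] == h."""
    return max(y for y in range(j + 1) if s[y] == h)


def first_pop(s, j, h):
    """fp(h): the smallest position y >= j with s[y] == h."""
    return min(y for y in range(j, len(s)) if s[y] == h)


# Made-up stack sizes; (i, j, k) = (0, 5, 8) is a 3-level of this sequence.
sizes = [1, 2, 3, 2, 3, 4, 3, 2, 1]
print(last_push(sizes, 5, 2), first_pop(sizes, 5, 2))  # 3 7
print(last_push(sizes, 5, 3), first_pop(sizes, 5, 3))  # 4 6
```

Note that size 2 is reached at positions 1 and 3 on the way up; only the later one counts, which is exactly what makes the stack below that point untouched until the first pop.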
We say that the full state of a stack size h is the triple formed
by:
- the automaton state at position lp(h)
- the topmost stack symbol at position lp(h)
- the automaton state at position fp(h)
There are $p'$ possible full states, and $p'+1$ stack sizes between $s_i$ and
$s_j$, so, by the pigeonhole principle, there exist two stack sizes $g, h$ with
$s_i \le g < h \le s_j$ such that the full states at $g$ and $h$ are the same.
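The pigeonhole step can be sketched as a scan over stack sizes that stops at the first repeated full state. Everything here is illustrative: `matching_heights` is a name I made up, and the example path is that of a hypothetical two-state automaton for $a^n b^n$ reading `aaabbb` (it is not put in the normal form assumed at the start of the proof).

```python
def matching_heights(states, tops, s, i, j):
    """Find stack sizes g < h in [s[i], s[j]] with equal full states, or None."""
    seen = {}
    for h in range(s[i], s[j] + 1):
        lp = max(y for y in range(j + 1) if s[y] == h)       # last push of size h
        fp = min(y for y in range(j, len(s)) if s[y] == h)   # first pop of size h
        full = (states[lp], tops[lp], states[fp])
        if full in seen:
            return seen[full], h
        seen[full] = h
    return None


# States, topmost stack symbols and stack sizes at each position of the path.
states = ["q0", "q0", "q0", "q0", "q1", "q1", "q1"]
tops = ["Z", "X", "X", "X", "X", "X", "Z"]
sizes = [1, 2, 3, 4, 3, 2, 1]
print(matching_heights(states, tops, sizes, 0, 3))  # (2, 3)
```

In this toy example the sizes 2 and 3 share the full state `("q0", "X", "q1")`, so they play the roles of $g$ and $h$.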
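As in Case 1, we denote by $\widehat{lp(g)}$, $\widehat{lp(h)}$, $\widehat{fp(h)}$ and $\widehat{fp(g)}$ the
positions of the last letters of $w$ read at the corresponding positions in $\pi$.
We factor $w = uvxyz$ where $u = w_{0 \cdots \widehat{lp(g)}}$,
$v = w_{\widehat{lp(g)} \cdots \widehat{lp(h)}}$,
$x = w_{\widehat{lp(h)} \cdots \widehat{fp(h)}}$,
$y = w_{\widehat{fp(h)} \cdots \widehat{fp(g)}}$,
and $z = w_{\widehat{fp(g)} \cdots |w|}$.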
This factorization ensures that |vxy|≤p (because k≤p by our
definition of levels).
We also have to show that $\forall n \ge 0, uv^n x y^n z \in L$. To do so,
observe that each time that we repeat v, we start from the same state and the
same stack top and we do not pop below our current position in the stack
(otherwise we would have to push again at the current position, violating the
maximality of lp(g)), so we can follow the same path in A and push the
same symbol sequence on the stack. By the maximality of lp(h) and the
minimality of fp(h), while reading x, we do not pop below our current
position in the stack, so the path followed in the automaton is the same
regardless of the number of times we repeated $v$. Now, if we repeat $y$ as many
times as we repeat $v$, since we start from the same state, since we have pushed
the same symbol sequence on the stack with our repeats of $v$, and since we do
not pop more than what $v$ has stacked by minimality of $fp(g)$, we can follow
the same path in $A$ and pop the same symbol sequence from the stack. Hence, an
accepting path for $uv^n x y^n z$ can be constructed from the accepting path
for $w$.
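As a sanity check of the pumping claim, here is a small sketch on the language $a^n b^n$ with $w$ = `aaabbb`. The factorization used is the one you get from a hypothetical automaton that reads one letter per transition and has matching stack sizes 2 and 3 (word positions 1, 2, 4, 5); the helper names are mine.

```python
def pump(u, v, x, y, z, n):
    """Build the word u v^n x y^n z."""
    return u + v * n + x + y * n + z


def in_anbn(word):
    """Membership test for the language { a^n b^n : n >= 0 }."""
    half, rem = divmod(len(word), 2)
    return rem == 0 and word == "a" * half + "b" * half


w = "aaabbb"
u, v, x, y, z = w[0:1], w[1:2], w[2:4], w[4:5], w[5:]
print([pump(u, v, x, y, z, n) for n in range(3)])  # ['aabb', 'aaabbb', 'aaaabbbb']
print(all(in_anbn(pump(u, v, x, y, z, n)) for n in range(10)))  # True
```

Repeating `v` pushes one more symbol and repeating `y` pops it again, which is the balance the argument above relies on.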
Finally, we also have $|vy| \ge 1$, because as in Case 1, if $v = \epsilon$ and $y = \epsilon$, we can build a shorter accepting path for $w$ by
removing $\pi_{lp(g) \cdots lp(h)}$ and $\pi_{fp(h) \cdots fp(g)}$.
Hence, we have an adequate factorization in both cases, and the result is
proved.
(Credit goes to Marc Jeanmougin for helping me with this proof.)