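It is well known that the language of words containing equal numbers of 0s and 1s is not regular, while the language of words containing equal numbers of occurrences of 001 and 100 is regular (see here).
Given two words $w_1$ and $w_2$, is it decidable whether the language of words containing equal numbers of occurrences of $w_1$ and $w_2$ is regular?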
Answers:
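Given two words $w_1$, $w_2$, is it decidable whether the language $L$ of words containing equal numbers of occurrences of $w_1$ and $w_2$ is regular?
First some definitions:
They could be made more concise, and the notation could be improved if it is to be used in proofs. This is only a first draft.
Given two words $w_1$ and $w_2$, we say that: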
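$w_1$ always occurs with $w_2$, noted $w_1 \triangleleft w_2$, iff
$w_1$ always cooccurs with $w_2$, noted $w_1 \triangleleft\triangleright w_2$, iff each always occurs with the other,
$w_1$ and $w_2$ occur independently, noted $w_1 \triangleright\triangleleft w_2$, iff neither always occurs with the other,
$w_1$ always occurs at least $m$ times with $w_2$, noted $w_1 \triangleleft^m w_2$, iff for any string $s$ such that $s = x w_2 y$ with $|x|, |y| \geq |w_1| + |w_2|$ there are $m$ other decompositions $s = x_i w_1 y_i$ for $1 \leq i \leq m$, such that $i \neq j$ implies $x_i \neq x_j$.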
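These definitions are constructed so that we can ignore what happens at the ends of the string where $w_1$ and $w_2$ are supposed to occur. Boundary effects at the ends of the string have to be analyzed separately, but they represent only finitely many cases (actually, I think I forgot one or two such boundary subcases in my first analysis below, but it does not really matter). The definitions are compatible with overlapping occurrences.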
There are 4 main cases to consider (ignoring the symmetry between $w_1$ and $w_2$):
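The two words necessarily come together, except possibly at the ends of the string. This concerns only pairs of the form $1^i0$ and $01^i$, or $0^i1$ and $10^i$. This is easily recognized by a finite automaton that only checks for lone occurrences at the two ends of the string to be recognized, making sure that there is a lone occurrence at both ends or at neither end. There is also the degenerate case $w_1 = w_2$: then the language $L$ is obviously regular.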
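The bounded behaviour in this case can be checked by brute force on short strings. The following is only an illustrative sketch, not part of the argument: the helper names `count_occurrences` and `max_count_gap` are made up for this example, occurrences are counted with overlaps, and a bounded gap on short inputs is evidence rather than a proof.

```python
from itertools import product


def count_occurrences(s: str, w: str) -> int:
    """Count the (possibly overlapping) occurrences of w in s."""
    return sum(1 for i in range(len(s) - len(w) + 1) if s.startswith(w, i))


def max_count_gap(w1: str, w2: str, max_len: int) -> int:
    """Largest |#w1 - #w2| over all binary strings of length <= max_len."""
    return max(
        abs(count_occurrences(s, w1) - count_occurrences(s, w2))
        for n in range(max_len + 1)
        for s in map("".join, product("01", repeat=n))
    )


# Pairs of the form 1^i 0 and 0 1^i: the two counts never drift apart.
print(max_count_gap("110", "011", 10))
# Single letters 0 and 1: the gap grows with the length of the string.
print(max_count_gap("0", "1", 6))
```

For the pairs of this case the gap stays at 1 however long the strings get, which is what lets a finite automaton decide equality by looking only at the ends of the string.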
$w_1 \triangleleft w_2$, but not $w_2 \triangleleft w_1$
One of the 2 words cannot occur without the other, but the converse is not true (except possibly at the ends of the string). This happens when:
one word is a substring of the other: then a finite automaton can just check that the shorter word does not occur outside an instance of the longer one.
and for some word , : then a finite automaton checks, as in the previous case, that does not occur separated from . However, the automaton allows counting one extra instance of that will allow acceptance if is a suffix of the string. There are three other symmetrical cases (1-0 symmetry and left-right symmetry).
One of the 2 words occurs twice in the other. This can be recognized by a finite automaton that checks that the smaller word never occurs in the string. There is also a slightly more complex variant that combines the two variations of case 2. In this case the automaton checks that the smaller word never occurs, except possibly as part of the larger one coming as a suffix of the string (and 3 other cases by symmetry).
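A quick way to convince oneself of the plain version of this case is to test it exhaustively on short strings. The sketch below is an illustration only: `occurrences` and `matches_claim` are hypothetical helper names, the alphabet is assumed to be binary, and the suffix variant mentioned above is not covered.

```python
import re
from itertools import product


def occurrences(s: str, w: str) -> int:
    """Count overlapping occurrences of w in s using a zero-width lookahead."""
    return len(re.findall(f"(?={re.escape(w)})", s))


def matches_claim(small: str, large: str, max_len: int) -> bool:
    """Check, for all binary strings up to max_len, that the two counts
    are equal exactly when the smaller word does not occur at all."""
    for n in range(max_len + 1):
        for s in map("".join, product("01", repeat=n)):
            equal = occurrences(s, small) == occurrences(s, large)
            if equal != (small not in s):
                return False
    return True


# "01" occurs twice in "0101"; "0" occurs twice in "00".
print(matches_claim("01", "0101", 10))
print(matches_claim("0", "00", 8))
```

The reason the check succeeds is that the rightmost occurrence of the smaller word can never be the first of the two copies inside an occurrence of the larger word, so the smaller word is always strictly more frequent as soon as it occurs.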
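The 2 words can occur independently of each other. We build a generalized sequential machine (gsm) $G$ that outputs one symbol, say $a$, when it recognizes an occurrence of $w_1$ and another symbol, say $b$, when it recognizes an occurrence of $w_2$, and forgets everything else. Since regular languages are closed under gsm mapping, the language $L$ is regular only if the language $G(L)$ is regular. But $G(L) = \{ w \in \{a, b\}^* \mid |w|_a = |w|_b \}$, which is clearly context-free and not regular. Hence $L$ is not regular.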
Actually we have $L = G^{-1}(G(L))$. Since regular languages and context-free languages are closed under gsm mapping and inverse gsm mapping, we also know that $L$ is context-free.
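The gsm of this case can be simulated in a few lines. This is a minimal sketch under stated assumptions: the output letters `a` and `b` and the function names `gsm_output` and `in_language` are chosen for the example, both words are assumed to be non-empty, and the finite control is represented simply as the window of the last few input symbols.

```python
def gsm_output(s: str, w1: str, w2: str) -> str:
    """Emit 'a' for each occurrence of w1 and 'b' for each occurrence of w2.

    The only memory is the window of the last k input symbols, where k is
    the length of the longer word, so the machine is finite-state.
    """
    k = max(len(w1), len(w2))
    window = ""
    out = []
    for c in s:
        window = (window + c)[-k:]
        if window.endswith(w1):
            out.append("a")
        if window.endswith(w2):
            out.append("b")
    return "".join(out)


def in_language(s: str, w1: str, w2: str) -> bool:
    """s is in L iff its gsm image has as many a's as b's."""
    image = gsm_output(s, w1, w2)
    return image.count("a") == image.count("b")


# With the independent pair 00 and 11, the input 0^(n+1) 1^(n+1) maps to a^n b^n.
print(gsm_output("000111", "00", "11"))
print(in_language("000111", "00", "11"))
```

Membership in $L$ depends only on the image of the input, which is the content of the inverse gsm mapping remark above.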
One way to organize a formal proof could be the following. First build a PDA that recognizes the language. Actually it can be done with a 1-counter machine, but it is easier to have two stack symbols to avoid duplicating the finite control. Then, for the cases where the language should be recognizable by a FA, show that the counter can be bounded by a constant that depends only on the two words. For the other cases, show that the counter can reach arbitrarily large values. Of course, the PDA should be organized so that the proofs are easy enough to carry out.
Representing the FA as a PDA with 2 stack symbols is probably the simplest representation for it. In the non-regular case, the finite control part of the PDA is the same as that of the GSM in the proof sketch above. Instead of outputting $a$'s and $b$'s like the GSM, the PDA uses the stack to count the difference between the numbers of occurrences.
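The counter idea can be explored experimentally. The sketch below is an assumption-laden illustration rather than the construction itself: it uses a signed integer in place of the two stack symbols, a window of recent symbols as finite control, a binary alphabet, and the hypothetical name `counter_range`. It enumerates the configurations reachable on inputs up to a given length and reports the extreme counter values.

```python
def counter_range(w1: str, w2: str, max_len: int) -> tuple[int, int]:
    """Return (min, max) of the counter #w1 - #w2 over all configurations
    reachable on binary inputs of length <= max_len."""
    k = max(len(w1), len(w2))
    frontier = {("", 0)}          # (window of recent symbols, counter)
    seen = set(frontier)
    for _ in range(max_len):
        successors = set()
        for window, counter in frontier:
            for c in "01":
                w = (window + c)[-k:]
                delta = int(w.endswith(w1)) - int(w.endswith(w2))
                successors.add((w, counter + delta))
        # A configuration fully determines the future, so configurations
        # already reached with a shorter input need not be expanded again.
        frontier = successors - seen
        seen |= successors
    counters = [counter for _, counter in seen]
    return min(counters), max(counters)


print(counter_range("001", "100", 12))  # stays within a fixed range
print(counter_range("0", "1", 5))       # widens as max_len grows
```

When the range stops growing, the counter can be folded into the finite control, giving a FA; when it keeps growing with the input length, that suggests the unbounded case, although a finite experiment of this kind cannot replace the proof.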