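Is the regularity of a language described by occurrence counts decidable?

It is well known that the language of words containing equal numbers of 0 and 1 is not regular, while the language of words containing equal numbers of 001 and 100 is regular (see here).

Given two words $w_1,w_2$, is it decidable whether the language of words containing equal numbers of $w_1$ and $w_2$ is regular?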
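To make the two examples above concrete, here is a minimal Python sketch of the membership test. The names `count_occurrences` and `in_language` are hypothetical, and the sketch assumes that overlapping occurrences are all counted and that both words are non-empty.

```python
def count_occurrences(s: str, w: str) -> int:
    """Count occurrences of w in s, overlapping ones included (assumption)."""
    return sum(1 for i in range(len(s) - len(w) + 1) if s.startswith(w, i))


def in_language(s: str, w1: str, w2: str) -> bool:
    """True when s contains as many occurrences of w1 as of w2."""
    return count_occurrences(s, w1) == count_occurrences(s, w2)
```

For instance, `in_language("00100", "001", "100")` holds, since each word occurs exactly once, while `in_language("001", "001", "100")` does not.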

regular-languages undecidability

— sdcvvc

Can you give other examples of regular languages defined this way, besides $1^i0$ and $01^i$, or $0^i1$ and $10^i$? What about an example with a 3-symbol alphabet?

— babou, 2013

If $w_1$ is a strict subword of $w_2$, then the language is likely to be empty, and thus regular. I don't know of other examples.

— sdcvvc, 2013

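I strongly suspect that the examples above are the only ones, which would make the problem decidable. If only two substrings are specified, I would guess it is CF ... depending on how you are allowed to specify occurrences. You are not precise enough about what you mean by "described by occurrence counts".

— babou, 2013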

The question body is precise enough, IMO.

— sdcvvc, 2013

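The solutions so far for special cases seem to hinge on the idea that an occurrence of the substring $w_1$ guarantees only a single intervening occurrence of $w_2$. So, assuming somehow that the current answer is correct (which is not yet entirely clear to me), it seems there is some relation between $w_1$ and $w_2$ which guarantees that, while scanning the string, one can be in an "equal" or an "unequal" state, but the "unequal" case is only allowed up to a finite maximum.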

— vzn

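Given two words $w_1$, $w_2$, is it decidable whether the language $L$ of words containing equal numbers of $w_1$ and $w_2$ is regular?

First some definitions. They could be made more concise, and the notation could be improved if it is to be used in proofs. This is only a first draft.

Given two words $w_1$ and $w_2$, we say that:

- $w_1$ always occurs with $w_2$, noted $w_1\triangleleft w_2$, iff
  1. for any string $s$ such that $s=xw_2y$ with $\mid x\mid,\ \mid y\mid\ \geq \mid w_1\mid +\mid w_2\mid$ and $|x|_0,|x|_1,|y|_0,|y|_1 \geq 1$, there is another decomposition $s=x'w_1y'$. Note: the condition that $x$ and $y$ each contain at least one 0 and one 1 is required by a pathological case (found by @sdcvvc): $w_1=1^i0$, $w_2=v1^{i+j}$ and $y\in1^*$, and its symmetrical variants.
  2. there is a string $s=xw_2y$ with $\mid x\mid,\ \mid y\mid\ \geq \mid w_1\mid +\mid w_2\mid$ such that there is at most one decomposition $s=x'w_1y'$.
- $w_1$ always cooccurs with $w_2$, noted $w_1\triangleleft \triangleright\,w_2$, iff each always occurs with the other.
- $w_1$ and $w_2$ occur independently, noted $w_1\triangleright \triangleleft\,w_2$, iff neither always occurs with the other.
- $w_1$ always occurs $m$ times more than $w_2$, noted $w_1\triangleleft_m w_2$, iff for any string $s$ such that $s=xw_2y$ with $\mid x\mid,\ \mid y\mid\ \geq \mid w_1\mid +\mid w_2\mid$ there are $m$ other decompositions $s=x_iw_1y_i$ for $i\in[1,m]$ such that $i\neq j$ implies $x_i\neq x_j$.

These definitions are built so that we can ignore what happens at the ends of the string in which $w_1$ and $w_2$ are supposed to occur. Boundary effects at the ends of the string have to be analyzed separately, but they represent only finitely many cases (actually, I think I forgot one or two such boundary subcases in my first analysis below, but it does not really matter). The definitions are compatible with overlapping occurrences.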

There are 4 main cases to consider (ignoring the symmetry between $w_1$ and $w_2$):

1. $w_1\triangleleft \triangleright\,w_2$
   The two words necessarily come together, except possibly at the ends of the string. This concerns only pairs of the form $1^i0$ and $01^i$, or $0^i1$ and $10^i$. This is easily recognized by a finite automaton that just checks for lone occurrences at both ends of the string to be recognized, to make sure there is a lone occurrence at both ends or at neither. There is also the degenerate case when $w_1=w_2$: then the language $L$ is obviously regular.
2. $w_1\triangleleft w_2$, but not $w_2\triangleleft w_1$
   One of the 2 words cannot occur without the other, but the converse is not true (except possibly at the ends of the string). This happens when:
   - $w_1$ is a substring of $w_2$: then a finite automaton can just check that $w_1$ does not occur outside an instance of $w_2$.
   - $w_1=1^i0$ and $w_2=v1^j$ for some word $v\in\{0,1\}^*$, $v\neq01^i$: then a finite automaton checks, as in the previous case, that $w_1$ does not occur separated from $w_2$. However, the automaton allows counting one extra instance of $w_1$, which will allow acceptance if $w_2$ is a suffix of the string. There are three other symmetrical cases (1-0 symmetry and left-right symmetry).
3. $w_1\triangleleft_2 w_2$
   One of the 2 words occurs twice in the other. That can be recognized by a finite automaton that checks that the smaller word never occurs in the string. There is also a slightly more complex variant that combines the two variations of case 2. In this case the automaton checks that the smaller string $1^i0$ never occurs, except possibly as part of $v$ in the larger one $v1^j$ coming as a suffix of the string (and 3 other cases by symmetry).
4. $w_1\triangleright \triangleleft\,w_2$
   The 2 words can occur independently of each other. We build a generalized sequential machine (gsm) $G$ that outputs $a$ when it recognizes an occurrence of $w_1$ and $b$ when it recognizes an occurrence of $w_2$, and forgets everything else. The language $L$ is regular only if the language $G(L)$ is regular. But $G(L)=\{w\in\{a,b\}^*\mid\ \mid w\mid_a=\mid w\mid_b\}$, which is clearly context-free and not regular. Hence $L$ is not regular.
   Actually we have $L=G^{-1}(G(L))$. Since regular languages and context-free languages are closed under gsm mapping and inverse gsm mapping, we also know that $L$ is context-free.
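The output of the gsm $G$ used in the last case can be sketched in Python as follows. This is only an illustration under assumptions: the function name `gsm_output` is hypothetical, and the finite control is simulated by direct suffix tests on the prefix read so far rather than by an explicit state table.

```python
def gsm_output(s: str, w1: str, w2: str) -> str:
    """Emit 'a' at each position where an occurrence of w1 ends and
    'b' where an occurrence of w2 ends; every other symbol is forgotten."""
    out = []
    for end in range(1, len(s) + 1):
        # s.endswith(w, 0, end) tests whether the prefix s[:end] ends with w.
        if s.endswith(w1, 0, end):
            out.append("a")
        if s.endswith(w2, 0, end):
            out.append("b")
    return "".join(out)
```

A string $s$ is then in $L$ exactly when `gsm_output(s, w1, w2)` contains as many `a` as `b`. For example, `gsm_output("0110", "0", "1")` gives `"abba"`.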

One way to organize a formal proof could be the following. First build a PDA that recognizes the language. Actually it can be done with a 1-counter machine, but it is easier to have two stack symbols to avoid duplicating the finite control. Then, for the cases where it should be a FA, show that the counter can be bounded by a constant that depends only on the two words. For the other cases, show that the counter can reach any arbitrary value. Of course, the PDA should be organized so that the proofs are easy enough to carry out.

Representing the FA as a 2-stack-symbols PDA is probably the simplest representation for it. In the non-regular case, the finite control part of the PDA is the same as that of the GSM in the proof sketch above. Instead of outputting $a$ 's and $b$ 's like the GSM, the PDA counts the difference in number with the stack.
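The counter argument can be explored experimentally with a small Python sketch. Everything here is an assumption made for illustration: the names `counter_trace` and `max_counter` are hypothetical, the alphabet is taken to be $\{0,1\}$, both words are assumed non-empty, and a signed integer stands in for the two stack symbols. A bounded search of this kind only suggests a bound; it proves nothing.

```python
from itertools import product


def counter_trace(s: str, w1: str, w2: str) -> list[int]:
    """Scan s left to right, keeping (#w1 seen) - (#w2 seen) in a counter.
    The finite control is the window of the last k symbols read."""
    k = max(len(w1), len(w2))
    window = ""
    counter = 0
    trace = [0]
    for c in s:
        window = (window + c)[-k:]
        if window.endswith(w1):
            counter += 1
        if window.endswith(w2):
            counter -= 1
        trace.append(counter)
    return trace


def max_counter(w1: str, w2: str, length: int) -> int:
    """Largest |counter| reached on any binary string of the given length
    (prefixes are covered because the whole trace is inspected)."""
    best = 0
    for bits in product("01", repeat=length):
        trace = counter_trace("".join(bits), w1, w2)
        best = max(best, max(abs(v) for v in trace))
    return best
```

With these definitions, `max_counter("001", "100", n)` stays at 1 for the lengths tried, while `max_counter("0", "1", n)` grows with `n`, which matches the regular and non-regular cases respectively.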

— babou

I had a question about context-freeness in the case of three words. I deleted it when I realised it could be analyzed similarly. I had first thought that proving non-CFness would make an original exercise, but the GSM ruins it.

— babou

It is not clear what you mean by "occur independently of each other", "come necessarily together" etc. Please write formal definitions instead, and prove that they cover all cases.

— sdcvvc

I am not sure what you are asking, and what level of formalization you need, for what purpose. I realized that analyzing by hand the possible relations of the two words is not guaranteed to be correct, and does not matter anyway. What matters is whether an occurrence of one word can exist without creating at the same time an occurrence (or several) of the other word. The details do not matter as it will always be localized and thus manageable finitely. The two ends do not matter either as they are localized too. Even overlaps of occurrences do not matter since there can only be finitely many in one place.

— babou

I asked you about precise definitions of the terms mentioned in the comment. Thank you for writing them. Was I supposed to guess them previously? Anyway, you seem to claim that $0^i 1 \triangleleft \triangleright 1 0^i$. This does not satisfy condition 1 of the definition of "$w_1$ always occurs with $w_2$", since there is no occurrence of $1 0^i$ in $s=0^M 0^i 1 1^M$.

— sdcvvc

Sorry, I did not mean to make you guess. It only took me time to understand what exactly you wanted. My failing only. Regarding your counterexample, you are correct. But for me it only means that I have to be a little bit more careful about telomeres in the definition of the relations. I defined them too quickly, but $0^M$ or $1^M$ do not convey much information in this context. This is really a boundary pathological example within a pathological case, one that actually cannot occur when more than 2 symbols are used. I just do not believe it changes anything.

— babou