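All character data in SQL Server is associated with a collation, which determines the domain of characters that can be stored as well as the rules used to compare and sort the data. Collations apply to both Unicode and non-Unicode data.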
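SQL Server includes 3 broad categories of collations: binary, legacy, and Windows. Collations in the binary category (_BIN suffix) compare the underlying code points, so an equality comparison returns not equal whenever the code points differ, regardless of the character they represent. Legacy (SQL_ prefix) and Windows collations provide sorting and comparison semantics based on more natural dictionary rules. This allows comparisons to take case, accent, width, and kana into account. Windows collations provide more robust word-sort rules that closely match the Windows operating system, whereas legacy collations consider only single characters.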
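As a minimal sketch of the three categories, the query below compares the same two literals under one collation from each. The collation names (Latin1_General_100_BIN2, SQL_Latin1_General_CP1_CI_AS, Latin1_General_100_CI_AS) are illustrative choices of mine, not ones the text above prescribes. Any binary, SQL_ and Windows collation installed on the instance would serve.

```sql
-- Compare the same pair of values under one collation from each category.
-- The collation names are illustrative assumptions; substitute any from sys.fn_helpcollations().
SELECT
      CASE WHEN N'a' = N'A' COLLATE Latin1_General_100_BIN2
           THEN 'equal' ELSE 'not equal' END AS BinaryResult   -- code points differ: not equal
    , CASE WHEN N'a' = N'A' COLLATE SQL_Latin1_General_CP1_CI_AS
           THEN 'equal' ELSE 'not equal' END AS LegacyResult   -- case-insensitive: equal
    , CASE WHEN N'a' = N'A' COLLATE Latin1_General_100_CI_AS
           THEN 'equal' ELSE 'not equal' END AS WindowsResult; -- case-insensitive: equal
```

The COLLATE clause attaches to the expression on its left, so each comparison runs under the named collation regardless of the database default.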
The example below illustrates the difference between a Windows collation and a binary collation, using the Teth character:
CREATE TABLE dbo.WindowsColationExample
(
Character1 nchar(1) COLLATE Arabic_100_CI_AS_SC
, Character2 nchar(1) COLLATE Arabic_100_CI_AS_SC
, Character3 nchar(1) COLLATE Arabic_100_CI_AS_SC
, Character4 nchar(1) COLLATE Arabic_100_CI_AS_SC
);
CREATE TABLE dbo.BinaryColationExample
(
Character1 nchar(1) COLLATE Arabic_100_BIN
, Character2 nchar(1) COLLATE Arabic_100_BIN
, Character3 nchar(1) COLLATE Arabic_100_BIN
, Character4 nchar(1) COLLATE Arabic_100_BIN
);
INSERT INTO dbo.BinaryColationExample
VALUES ( NCHAR(65217), NCHAR(65218), NCHAR(65219), NCHAR(65220) );
INSERT INTO dbo.WindowsColationExample
VALUES ( NCHAR(65217), NCHAR(65218), NCHAR(65219), NCHAR(65220) );
-- Returns no rows: under the binary collation, no pair of characters compares equal
SELECT *
FROM dbo.BinaryColationExample
WHERE
Character1 = Character2
OR Character1 = Character3
OR Character1 = Character4
OR Character2 = Character3
OR Character2 = Character4
OR Character3 = Character4;
-- Each query returns the row: under the Windows collation, every pair of characters compares equal
SELECT *
FROM dbo.WindowsColationExample
WHERE Character1 = Character2;
SELECT *
FROM dbo.WindowsColationExample
WHERE Character1 = Character3;
SELECT *
FROM dbo.WindowsColationExample
WHERE Character1 = Character4;
SELECT *
FROM dbo.WindowsColationExample
WHERE Character2 = Character3;
SELECT *
FROM dbo.WindowsColationExample
WHERE Character2 = Character4;
SELECT *
FROM dbo.WindowsColationExample
WHERE Character3 = Character4;
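The collation does not have to come from the column definition. A COLLATE clause on one operand overrides it for a single comparison. The sketch below assumes the two tables from the script above exist and each hold their one row. If the Windows collation treats these forms as equal, as the script above reports, the first query should return the row and the second should return nothing.

```sql
-- Assumes dbo.BinaryColationExample and dbo.WindowsColationExample from the script above.
-- An explicit COLLATE clause takes precedence over the column's own collation,
-- so this comparison uses the Windows collation rules instead of raw code points.
SELECT *
FROM dbo.BinaryColationExample
WHERE Character1 = Character2 COLLATE Arabic_100_CI_AS_SC;

-- The reverse: force a code point comparison on the Windows-collated columns.
SELECT *
FROM dbo.WindowsColationExample
WHERE Character1 = Character2 COLLATE Arabic_100_BIN;
```

This is handy for a one-off comparison, though it is worth remembering that overriding the collation in a predicate can prevent an index on the column from being used for a seek.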
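The reasons why Unicode may contain different code points for identical glyphs are outlined at http://en.wikipedia.org/wiki/Duplicate_characters_in_Unicode. In summary, it may be for legacy compatibility, or because the characters are not canonically equivalent. Note that the Teth character ﻁ is used in different languages (http://en.wikipedia.org/wiki/Teth).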
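To confirm that the stored values really are distinct code points even where the collation reports them as equal, the built-in UNICODE() function can be used. This sketch again assumes the dbo.WindowsColationExample table from the script above. The values 65217 through 65220 correspond to U+FEC1 through U+FEC4.

```sql
-- Assumes dbo.WindowsColationExample from the script above.
-- UNICODE() returns the integer code point of the first character of its argument.
SELECT
      UNICODE(Character1) AS CodePoint1  -- inserted as NCHAR(65217)
    , UNICODE(Character2) AS CodePoint2  -- inserted as NCHAR(65218)
    , UNICODE(Character3) AS CodePoint3  -- inserted as NCHAR(65219)
    , UNICODE(Character4) AS CodePoint4  -- inserted as NCHAR(65220)
FROM dbo.WindowsColationExample;
```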