Case-insensitive string comparison in C

Question 1

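I have two postcodes as char* that I want to compare, ignoring case. Is there a function to do this?

Or do I have to loop through each one, apply tolower, and then do the comparison?

Any idea how this function will react to digits in the string?

Thanks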

Question 2

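There is no function that does this in the C standard. Unix systems that comply with POSIX are required to have strcasecmp in the header strings.h; Microsoft systems have stricmp. To be on the portable side, write your own: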

int strcicmp(char const *a, char const *b)
{
    for (;; a++, b++) {
        int d = tolower((unsigned char)*a) - tolower((unsigned char)*b);
        if (d != 0 || !*a)
            return d;
    }
}

But note that none of these solutions will work with UTF-8 strings, only ASCII ones.
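As for digits: tolower() returns any character that is not an uppercase letter unchanged, so digits and spaces simply compare as themselves. A minimal usage sketch for the postcode case, reusing strcicmp() from above (the helper name same_postcode is made up for illustration):

```c
#include <ctype.h>
#include <stdbool.h>

/* strcicmp() as defined in the answer above. */
int strcicmp(char const *a, char const *b)
{
    for (;; a++, b++) {
        int d = tolower((unsigned char)*a) - tolower((unsigned char)*b);
        if (d != 0 || !*a)
            return d;
    }
}

/* Hypothetical helper: true if two postcodes match, ignoring letter case. */
bool same_postcode(char const *a, char const *b)
{
    return strcicmp(a, b) == 0;
}
```

Here same_postcode("sw1a 1aa", "SW1A 1AA") is true, while two postcodes that differ in a digit compare as different.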

Question 3

Take a look at strcasecmp() in strings.h.

Question 4

I've found a built-in method for this in a header that contains string functions additional to those in the standard header.

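Additional pitfalls to watch out for when doing case-insensitive compares:

Comparing as lowercase or as uppercase? (common enough issue)

Both functions below return 0 for strcicmpL("A", "a") and strcicmpU("A", "a").
Yet strcicmpL("A", "_") and strcicmpU("A", "_") can return results of different sign, as '_' often lies between the uppercase and lowercase letters.

This affects the sort order when used with qsort(..., ..., ..., strcicmp). Non-standard library C functions such as the commonly available stricmp() or strcasecmp() tend to be well defined and favor comparing via lowercase. Yet variations exist.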

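int strcicmpL(char const *a, char const *b) {
  while (*a) {
    // Cast to unsigned char: passing a negative char to tolower() is undefined.
    int d = tolower((unsigned char)*a) - tolower((unsigned char)*b);
    if (d) {
        return d;
    }
    a++;
    b++;
  }
  // a has ended; the result is nonzero if b still has characters left.
  return -tolower((unsigned char)*b);
}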

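int strcicmpU(char const *a, char const *b) {
  while (*a) {
    // Cast to unsigned char: passing a negative char to toupper() is undefined.
    int d = toupper((unsigned char)*a) - toupper((unsigned char)*b);
    if (d) {
        return d;
    }
    a++;
    b++;
  }
  // a has ended; the result is nonzero if b still has characters left.
  return -toupper((unsigned char)*b);
}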
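To make the sign difference concrete, here is a small sketch assuming an ASCII execution character set, where '_' (95) lies between 'A' (65) and 'a' (97). The names cmp_lower and cmp_upper are illustrative only; they fold through tolower() and toupper() respectively and also handle strings of different lengths.

```c
#include <ctype.h>

/* Compare by folding both sides to lowercase. */
int cmp_lower(char const *a, char const *b)
{
    for (;; a++, b++) {
        int d = tolower((unsigned char)*a) - tolower((unsigned char)*b);
        if (d != 0 || *a == '\0')
            return d;
    }
}

/* Same loop, but folding both sides to uppercase. */
int cmp_upper(char const *a, char const *b)
{
    for (;; a++, b++) {
        int d = toupper((unsigned char)*a) - toupper((unsigned char)*b);
        if (d != 0 || *a == '\0')
            return d;
    }
}
```

With ASCII, cmp_lower("A", "_") evaluates 'a' - '_' (positive) while cmp_upper("A", "_") evaluates 'A' - '_' (negative), so a qsort() built on one or the other places "_" on opposite sides of the letters.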

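char can have a negative value. (not rare)

toupper(int) and tolower(int) are specified for unsigned char values and the negative EOF. Further, strcmp() returns results as if each char were converted to unsigned char, regardless of whether char is signed or unsigned.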

tolower(*a); // Potential UB
tolower((unsigned char) *a); // Correct
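One way to avoid forgetting the cast is to wrap it in a helper. A sketch, where lower_char is an assumed name rather than a standard function:

```c
#include <ctype.h>

/* Lowercase one char safely: the cast keeps the argument within the
   range tolower() is defined for, even when plain char is signed. */
char lower_char(char c)
{
    return (char)tolower((unsigned char)c);
}
```

Calling lower_char() on a byte such as (char)0xE4 is then well defined on every platform, whatever value it maps to in the current locale.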

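Locale (less common)

Although character sets using ASCII codes (0-127) are ubiquitous, the remaining codes tend to have locale-specific issues. So strcasecmp("\xE4", "a") might return 0 on one system and non-zero on another.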
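If locale-independent behaviour matters more than handling non-ASCII letters, one option is to fold only 'A'-'Z' by hand. The sketch below assumes an ASCII-compatible character set; ascii_lower and ascii_strcicmp are made-up names.

```c
/* Fold only ASCII uppercase letters; every other byte is returned as-is,
   regardless of the current locale. Assumes an ASCII-compatible charset. */
static int ascii_lower(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

/* Case-insensitive compare that ignores the locale entirely. */
int ascii_strcicmp(char const *a, char const *b)
{
    for (;; a++, b++) {
        int d = ascii_lower((unsigned char)*a) - ascii_lower((unsigned char)*b);
        if (d != 0 || *a == '\0')
            return d;
    }
}
```

With this, ascii_strcicmp("\xE4", "a") is nonzero on every system, because byte 0xE4 is never treated as a letter.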

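Unicode (the way of the future)

If a solution needs to handle more than ASCII, consider a unicode_strcicmp(). As the C library does not provide such a function, a pre-coded function from some alternate library is recommended. Writing your own unicode_strcicmp() is a daunting task.

Do all letters map one lowercase to one uppercase? (pedantic)

[A-Z] maps one-to-one with [a-z], yet various locales map several lowercase characters to one uppercase character and vice versa. Further, some uppercase characters may lack a lowercase equivalent and, again, vice versa.

This obliges code to convert through both tolower() and toupper().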

int d = tolower(toupper(*a)) - tolower(toupper(*b));

Likewise, sorting may give different results if code uses tolower(toupper(*a)) rather than toupper(tolower(*a)).
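Put together, a comparison that folds through both conversions might look like the sketch below. The names fold_char and strcicmp_fold are illustrative, and the choice of tolower(toupper()) over the reverse order is arbitrary here.

```c
#include <ctype.h>

/* Fold via upper then lower, so letters that share an uppercase form
   compare equal in locales without a one-to-one case mapping. */
static int fold_char(char c)
{
    return tolower(toupper((unsigned char)c));
}

/* Case-insensitive compare built on the double fold. */
int strcicmp_fold(char const *a, char const *b)
{
    for (;; a++, b++) {
        int d = fold_char(*a) - fold_char(*b);
        if (d != 0 || *a == '\0')
            return d;
    }
}
```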

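Portability

@B. Nadolson recommends against rolling your own strcicmp(), and this is reasonable, except when code needs highly equivalent portable functionality.

Below is an approach that even ran faster than some system-provided functions. It does a single compare per loop iteration rather than two, by using 2 different tables that differ at '\0'. Your results may vary.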

static unsigned char low1[UCHAR_MAX + 1] = {
  0, 1, 2, 3, ...
  '@', 'a', 'b', 'c', ... 'z', '[', ...  // @ABC... Z[...
  '`', 'a', 'b', 'c', ... 'z', '{', ...  // `abc... z{...
};
static unsigned char low2[UCHAR_MAX + 1] = {
// v--- Not zero, but A which matches none in `low1[]`
  'A', 1, 2, 3, ...
  '@', 'a', 'b', 'c', ... 'z', '[', ...
  '`', 'a', 'b', 'c', ... 'z', '{', ...
};

int strcicmp_ch(char const *a, char const *b) {
  // compare using tables that differ slightly.
  while (low1[(unsigned char)*a] == low2[(unsigned char)*b]) {
    a++;
    b++;
  }
  // Either strings differ or null character detected.
  // Perform subtraction using same table.
  return (low1[(unsigned char)*a] - low1[(unsigned char)*b]);
}
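The table contents above are elided. As a sketch, the two tables could be filled once at start-up instead of being written out by hand. The names below (ci_low1, ci_low2, ci_tables_init, strcicmp_tbl) are assumptions for illustration, and ci_tables_init() must be called before the first comparison.

```c
#include <ctype.h>
#include <limits.h>

static unsigned char ci_low1[UCHAR_MAX + 1];
static unsigned char ci_low2[UCHAR_MAX + 1];

/* Fill both tables with lowercase-folded values. The only difference is
   entry 0: ci_low2[0] is 'A', a value that never occurs in ci_low1
   because 'A' itself has been folded to 'a' there. */
void ci_tables_init(void)
{
    for (int c = 0; c <= UCHAR_MAX; c++) {
        ci_low1[c] = (unsigned char)tolower(c);
        ci_low2[c] = (unsigned char)tolower(c);
    }
    ci_low2[0] = 'A';
}

int strcicmp_tbl(char const *a, char const *b)
{
    /* One comparison per iteration: the tables disagree at '\0',
       so the loop stops at a mismatch or at the end of either string. */
    while (ci_low1[(unsigned char)*a] == ci_low2[(unsigned char)*b]) {
        a++;
        b++;
    }
    /* Subtract using the same table so equal strings yield 0. */
    return ci_low1[(unsigned char)*a] - ci_low1[(unsigned char)*b];
}
```

Filling the tables at run time also picks up whatever tolower() does in the current locale, which a hand-written ASCII table would not.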

Question 7

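I'm not really a fan of the most-upvoted answer here (in part because it seems incorrect: it should continue when it reads a null terminator in only one of the two strings, and it doesn't do this), so I wrote my own.

This is a direct drop-in replacement for `strncmp()`, and has been tested with numerous test cases, as shown below.

It is identical to strncmp() except:

It is case-insensitive.
The behavior is NOT undefined (it is well-defined) if either string is a null ptr. Regular strncmp() has undefined behavior if either string is a null ptr (see: https://en.cppreference.com/w/cpp/string/byte/strncmp).
It returns INT_MIN as a special sentinel error value if either input string is a NULL ptr.

LIMITATIONS: Note that this code works on the original 7-bit ASCII character set only (decimal values 0 to 127, inclusive), NOT on Unicode characters, such as the Unicode character encodings UTF-8 (the most popular), UTF-16, and UTF-32.

Here is the code only (no comments):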

int strncmpci(const char * str1, const char * str2, size_t num)
{
    int ret_code = 0;
    size_t chars_compared = 0;

    if (!str1 || !str2)
    {
        ret_code = INT_MIN;
        return ret_code;
    }

    while ((*str1 || *str2) && (chars_compared < num))
    {
        ret_code = tolower((unsigned char)(*str1)) - tolower((unsigned char)(*str2));
        if (ret_code != 0)
        {
            break;
        }
        chars_compared++;
        str1++;
        str2++;
    }

    return ret_code;
}

Fully-commented version:

/// \brief      Perform a case-insensitive string compare (`strncmp()` case-insensitive) to see
///             if two C-strings are equal.
/// \note       1. Identical to `strncmp()` except:
///               1. It is case-insensitive.
///               2. The behavior is NOT undefined (it is well-defined) if either string is a null
///               ptr. Regular `strncmp()` has undefined behavior if either string is a null ptr
///               (see: https://en.cppreference.com/w/cpp/string/byte/strncmp).
///               3. It returns `INT_MIN` as a special sentinel value for certain errors.
///             - Posted as an answer here: https://stackoverflow.com/a/55293507/4561887.
///               - Aided/inspired, in part, by `strcicmp()` here:
///                 https://stackoverflow.com/a/5820991/4561887.
/// \param[in]  str1        C string 1 to be compared.
/// \param[in]  str2        C string 2 to be compared.
/// \param[in]  num         max number of chars to compare
/// \return     A comparison code (identical to `strncmp()`, except with the addition
///             of `INT_MIN` as a special sentinel value):
///
///             INT_MIN (usually -2147483648 for int32_t integers)  Invalid arguments (one or both
///                      of the input strings is a NULL pointer).
///             <0       The first character that does not match has a lower value in str1 than
///                      in str2.
///              0       The contents of both strings are equal.
///             >0       The first character that does not match has a greater value in str1 than
///                      in str2.
int strncmpci(const char * str1, const char * str2, size_t num)
{
    int ret_code = 0;
    size_t chars_compared = 0;

    // Check for NULL pointers
    if (!str1 || !str2)
    {
        ret_code = INT_MIN;
        return ret_code;
    }

    // Continue doing case-insensitive comparisons, one-character-at-a-time, of `str1` to `str2`,
    // as long as at least one of the strings still has more characters in it, and we have
    // not yet compared `num` chars.
    while ((*str1 || *str2) && (chars_compared < num))
    {
        ret_code = tolower((unsigned char)(*str1)) - tolower((unsigned char)(*str2));
        if (ret_code != 0)
        {
            // The 2 chars just compared don't match
            break;
        }
        chars_compared++;
        str1++;
        str2++;
    }

    return ret_code;
}

Test code:

Download the entire sample code, with unit tests, from my eRCaGuy_hello_world repository: "strncmpci.c":

(this is just a snippet)

int main()
{
    printf("-----------------------\n"
           "String Comparison Tests\n"
           "-----------------------\n\n");

    int num_failures_expected = 0;

    printf("INTENTIONAL UNIT TEST FAILURE to show what a unit test failure looks like!\n");
    EXPECT_EQUALS(strncmpci("hey", "HEY", 3), 'h' - 'H');
    num_failures_expected++;
    printf("------ beginning ------\n\n");


    const char * str1;
    const char * str2;
    size_t n;

    // NULL ptr checks
    EXPECT_EQUALS(strncmpci(NULL, "", 0), INT_MIN);
    EXPECT_EQUALS(strncmpci("", NULL, 0), INT_MIN);
    EXPECT_EQUALS(strncmpci(NULL, NULL, 0), INT_MIN);
    EXPECT_EQUALS(strncmpci(NULL, "", 10), INT_MIN);
    EXPECT_EQUALS(strncmpci("", NULL, 10), INT_MIN);
    EXPECT_EQUALS(strncmpci(NULL, NULL, 10), INT_MIN);

    EXPECT_EQUALS(strncmpci("", "", 0), 0);
    EXPECT_EQUALS(strncmp("", "", 0), 0);

    str1 = "";
    str2 = "";
    n = 0;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
    EXPECT_EQUALS(strncmp(str1, str2, n), 0);

    str1 = "hey";
    str2 = "HEY";
    n = 0;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
    EXPECT_EQUALS(strncmp(str1, str2, n), 0);

    str1 = "hey";
    str2 = "HEY";
    n = 3;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
    EXPECT_EQUALS(strncmp(str1, str2, n), 'h' - 'H');

    str1 = "heY";
    str2 = "HeY";
    n = 3;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
    EXPECT_EQUALS(strncmp(str1, str2, n), 'h' - 'H');

    str1 = "hey";
    str2 = "HEdY";
    n = 3;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 'y' - 'd');
    EXPECT_EQUALS(strncmp(str1, str2, n), 'h' - 'H');

    str1 = "heY";
    str2 = "hEYd";
    n = 3;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
    EXPECT_EQUALS(strncmp(str1, str2, n), 'e' - 'E');

    str1 = "heY";
    str2 = "heyd";
    n = 6;
    EXPECT_EQUALS(strncmpci(str1, str2, n), -'d');
    EXPECT_EQUALS(strncmp(str1, str2, n), 'Y' - 'y');

    str1 = "hey";
    str2 = "hey";
    n = 6;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
    EXPECT_EQUALS(strncmp(str1, str2, n), 0);

    str1 = "hey";
    str2 = "heyd";
    n = 6;
    EXPECT_EQUALS(strncmpci(str1, str2, n), -'d');
    EXPECT_EQUALS(strncmp(str1, str2, n), -'d');

    str1 = "hey";
    str2 = "heyd";
    n = 3;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
    EXPECT_EQUALS(strncmp(str1, str2, n), 0);

    str1 = "hEY";
    str2 = "heyYOU";
    n = 3;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
    EXPECT_EQUALS(strncmp(str1, str2, n), 'E' - 'e');

    str1 = "hEY";
    str2 = "heyYOU";
    n = 10;
    EXPECT_EQUALS(strncmpci(str1, str2, n), -'y');
    EXPECT_EQUALS(strncmp(str1, str2, n), 'E' - 'e');

    str1 = "hEYHowAre";
    str2 = "heyYOU";
    n = 10;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 'h' - 'y');
    EXPECT_EQUALS(strncmp(str1, str2, n), 'E' - 'e');

    EXPECT_EQUALS(strncmpci("nice to meet you.,;", "NICE TO MEET YOU.,;", 100), 0);
    EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "NICE TO MEET YOU.,;", 100), 'n' - 'N');
    EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "nice to meet you.,;", 100), 0);

    EXPECT_EQUALS(strncmpci("nice to meet you.,;", "NICE TO UEET YOU.,;", 100), 'm' - 'u');
    EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "nice to uEET YOU.,;", 100), 'm' - 'u');
    EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "nice to UEET YOU.,;", 100), 'm' - 'U');

    EXPECT_EQUALS(strncmpci("nice to meet you.,;", "NICE TO MEET YOU.,;", 5), 0);
    EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "NICE TO MEET YOU.,;", 5), 'n' - 'N');

    EXPECT_EQUALS(strncmpci("nice to meet you.,;", "NICE eo UEET YOU.,;", 5), 0);
    EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "nice eo uEET YOU.,;", 5), 0);

    EXPECT_EQUALS(strncmpci("nice to meet you.,;", "NICE eo UEET YOU.,;", 100), 't' - 'e');
    EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "nice eo uEET YOU.,;", 100), 't' - 'e');

    EXPECT_EQUALS(strncmpci("nice to meet you.,;", "nice-eo UEET YOU.,;", 5), ' ' - '-');
    EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "nice-eo UEET YOU.,;", 5), ' ' - '-');


    if (globals.error_count == num_failures_expected)
    {
        printf(ANSI_COLOR_GRN "All unit tests passed!" ANSI_COLOR_OFF "\n");
    }
    else
    {
        printf(ANSI_COLOR_RED "FAILED UNIT TESTS! NUMBER OF UNEXPECTED FAILURES = %i"
            ANSI_COLOR_OFF "\n", globals.error_count - num_failures_expected);
    }

    assert(globals.error_count == num_failures_expected);
    return globals.error_count;
}

Sample output:

$ gcc -Wall -Wextra -Werror -ggdb -std=c11 -o ./bin/tmp strncmpci.c && ./bin/tmp
-----------------------
String Comparison Tests
-----------------------

INTENTIONAL UNIT TEST FAILURE to show what a unit test failure looks like!
FAILED at line 250 in function main! strncmpci("hey", "HEY", 3) != 'h' - 'H'
  a: strncmpci("hey", "HEY", 3) is 0
  b: 'h' - 'H' is 32

------ beginning ------

All unit tests passed!

References:

This question and the other answers here served as inspiration and gave some insight (Case Insensitive String comp in C)
http://www.cplusplus.com/reference/cstring/strncmp/
https://zh.wikipedia.org/wiki/ASCII
https://en.cppreference.com/w/c/language/operator_precedence

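Topics for further research

(Note: this is C++, not C) Lowercase of Unicode character
tolower_tests.c on OnlineGDB: https://onlinegdb.com/HyZieXcew

TODO:

Make a version of this code that also works on Unicode's UTF-8 implementation (character encoding)!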

Question 8

As others have stated, there is no portable function that works on all systems. You can partially circumvent this with a simple ifdef:

#include <stdio.h>

#ifdef _WIN32
#include <string.h>
#define strcasecmp _stricmp
#else // assuming POSIX or BSD compliant system
#include <strings.h>
#endif

int main() {
    printf("%d", strcasecmp("teSt", "TEst"));
}
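The same switch can also be wrapped in a function rather than a macro, which keeps the platform-specific names in one place. A sketch, where ci_compare is a made-up name; _stricmp is the Microsoft CRT function and strcasecmp the POSIX one:

```c
#ifdef _WIN32
#include <string.h>   /* _stricmp */
#else
#include <strings.h>  /* strcasecmp (POSIX) */
#endif

/* Hypothetical wrapper: returns <0, 0 or >0 like strcmp(), ignoring case. */
int ci_compare(const char *a, const char *b)
{
#ifdef _WIN32
    return _stricmp(a, b);
#else
    return strcasecmp(a, b);
#endif
}
```

Callers then use ci_compare("teSt", "TEst") everywhere and never see the per-platform name.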

Question 9

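If the library has nothing suitable, you can get an idea of how to implement an efficient version from here.

It uses a table for all 256 chars.

In that table, every char except letters maps to its own ASCII code.
For uppercase letters, the table holds the codes of the corresponding lowercase letters.

Then we just need to traverse the strings and compare the table cells for the given chars: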

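// Assumes charmap is an array of 256 unsigned char entries.
// Using unsigned char keeps the table index in 0..255 even where char is signed.
const unsigned char *cm = charmap,
        *us1 = (const unsigned char *)s1,
        *us2 = (const unsigned char *)s2;
while (cm[*us1] == cm[*us2++])
    if (*us1++ == '\0')
        return (0);
return (cm[*us1] - cm[*--us2]);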

Question 10

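static int ignoreCaseComp (const char *str1, const char *str2, int length)
{
    int k;
    for (k = 0; k < length; k++)
    {
        // In ASCII, OR-ing with 32 (bit 5) maps 'A'-'Z' onto 'a'-'z'.
        // It also conflates some non-letters (e.g. '@' and '`'), and the
        // loop does not stop at '\0', so both strings need `length` chars.
        if ((str1[k] | 32) != (str2[k] | 32))
            break;
    }

    // Returns 0 if the first `length` chars match, 1 otherwise.
    if (k != length)
        return 1;
    return 0;
}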
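A sketch of a variant that restricts the bit-5 trick to letters and stops at the end of either string. It assumes an ASCII character set; ascii_fold and ignore_case_equal_n are illustrative names, and unlike the function above it returns true on a match.

```c
#include <stdbool.h>
#include <stddef.h>

/* Set bit 5 only for 'A'..'Z', leaving every other byte untouched. */
static unsigned char ascii_fold(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c | 32) : c;
}

/* True if the first `length` chars match ignoring ASCII case.
   Stops early at a terminating '\0' so it never reads past either string. */
bool ignore_case_equal_n(const char *s1, const char *s2, size_t length)
{
    for (size_t k = 0; k < length; k++) {
        unsigned char c1 = ascii_fold((unsigned char)s1[k]);
        unsigned char c2 = ascii_fold((unsigned char)s2[k]);
        if (c1 != c2)
            return false;
        if (c1 == '\0')
            return true;   /* both strings ended together */
    }
    return true;
}
```

For instance, '@' (0x40) and the backquote (0x60) become equal under a bare OR with 32, but remain different here because neither is a letter.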

Reference

Question 11

Simple solution:

int str_case_ins_cmp(const char* a, const char* b) {
  int rc;

  while (1) {
    rc = tolower((unsigned char)*a) - tolower((unsigned char)*b);
    if (rc || !*a) {
      break;
    }

    ++a;
    ++b;
  }

  return rc;
}

Question 12

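#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated lowercase copy of a (caller frees), or NULL on failure. */
char* lowerCaseWord(const char* a)
{
    size_t len = strlen(a);
    char *b = malloc(len + 1);   /* +1 for the terminating '\0' */
    if (b == NULL)
        return NULL;
    for (size_t i = 0; i < len; i++)
        b[i] = (char)tolower((unsigned char)a[i]);
    b[len] = '\0';
    return b;
}

int strcmpInsensitive(const char* a, const char* b)
{
    char *la = lowerCaseWord(a);
    char *lb = lowerCaseWord(b);
    /* If an allocation failed, fall back to a plain case-sensitive compare. */
    int result = (la && lb) ? strcmp(la, lb) : strcmp(a, b);
    free(la);
    free(lb);
    return result;
}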

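Good luck

Edit: the lowerCaseWord function takes a char* variable and returns the lowercase version of that string. For example, for a char* holding "AbCdE" it returns "abcde".

Basically, it converts both char* variables to lowercase and then calls strcmp on the results.

For example, if we call strcmpInsensitive with the values "AbCdE" and "ABCDE", it first converts both values to lowercase ("abcde") and then runs strcmp on them.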