Quick way to implement a dictionary in C

132

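One of the things I miss when writing programs in C is a dictionary data structure. What is the most convenient way to implement one in C? I am not looking for performance, but for ease of coding it from scratch. I don't need it to be generic either; something like string -> int will do. But I do want it to be able to store an arbitrary number of items.

This is intended more as an exercise. I know there are third-party libraries available. But assume for a moment that they don't exist. In that situation, what is the quickest way to implement a dictionary that satisfies the requirements above?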

c data-structures dictionary

— Rohit
source

4

If you miss having it provided for you, why do you want to write it from scratch instead of using a third-party implementation?

— Karl Knechtel

Yes, that alternative always exists. I posed this question more as an exercise.

— Rohit, 2010

10

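Writing a hash table in C is a fun exercise. Every serious C programmer should do it at least once.

— Lee, 2010

I think of a dictionary as a data type rather than a data structure, since it can be implemented in many ways: a list, a hash table, a tree, a self-balancing tree, and so on. Are you asking for a dictionary or for a hash table?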

— Paul Hankin

1

Related: How to represent a Python-like dictionary in C? (stackoverflow.com/questions/3269881/...)

— Gaurang Tandon

114

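Section 6.6 of The C Programming Language presents a simple dictionary (hash table) data structure. I don't think a useful dictionary implementation could get any simpler than this. For your convenience, I reproduce the code here.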

struct nlist { /* table entry: */
    struct nlist *next; /* next entry in chain */
    char *name; /* defined name */
    char *defn; /* replacement text */
};

#define HASHSIZE 101
static struct nlist *hashtab[HASHSIZE]; /* pointer table */

/* hash: form hash value for string s */
unsigned hash(char *s)
{
    unsigned hashval;
    for (hashval = 0; *s != '\0'; s++)
      hashval = *s + 31 * hashval;
    return hashval % HASHSIZE;
}

/* lookup: look for s in hashtab */
struct nlist *lookup(char *s)
{
    struct nlist *np;
    for (np = hashtab[hash(s)]; np != NULL; np = np->next)
        if (strcmp(s, np->name) == 0)
          return np; /* found */
    return NULL; /* not found */
}

char *strdup(char *);
/* install: put (name, defn) in hashtab */
struct nlist *install(char *name, char *defn)
{
    struct nlist *np;
    unsigned hashval;
    if ((np = lookup(name)) == NULL) { /* not found */
        np = (struct nlist *) malloc(sizeof(*np));
        if (np == NULL || (np->name = strdup(name)) == NULL)
          return NULL;
        hashval = hash(name);
        np->next = hashtab[hashval];
        hashtab[hashval] = np;
    } else /* already there */
        free((void *) np->defn); /*free previous defn */
    if ((np->defn = strdup(defn)) == NULL)
       return NULL;
    return np;
}

char *strdup(char *s) /* make a duplicate of s */
{
    char *p;
    p = (char *) malloc(strlen(s)+1); /* +1 for '\0' */
    if (p != NULL)
       strcpy(p, s);
    return p;
}

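Note that if the hashes of two strings collide, both land in the same chain, and with many collisions lookup can degrade to O(n). You can reduce the likelihood of collisions by increasing the value of HASHSIZE. For a complete discussion of the data structure, please consult the book.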
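Since the question asks for string -> int, the same chained-bucket idea can be specialized to integer values. The following is only a sketch, not code from the book: the names `NBUCKETS`, `struct entry`, `bucket_of`, `dict_put` and `dict_get` are made up for illustration, and the key is copied with `malloc`/`memcpy` so that no `strdup` declaration is needed.

```c
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 101  /* hypothetical name; plays the role of HASHSIZE */

struct entry {
    struct entry *next;  /* next entry in the same bucket */
    char *key;           /* copy of the key, owned by the table */
    int value;
};

static struct entry *buckets[NBUCKETS];  /* static storage: starts out all NULL */

/* Multiply-by-31 hash in the style of the quoted code
 * (bytes are read as unsigned char here, which is an assumption of this sketch). */
static unsigned bucket_of(const char *s)
{
    unsigned h = 0;
    for (; *s != '\0'; s++)
        h = (unsigned char)*s + 31 * h;
    return h % NBUCKETS;
}

/* Return a pointer to the stored value, or NULL if the key is absent. */
int *dict_get(const char *key)
{
    struct entry *e;
    for (e = buckets[bucket_of(key)]; e != NULL; e = e->next)
        if (strcmp(key, e->key) == 0)
            return &e->value;
    return NULL;
}

/* Insert or overwrite; returns 0 on success, -1 if out of memory. */
int dict_put(const char *key, int value)
{
    int *slot = dict_get(key);
    struct entry *e;
    unsigned b;
    size_t n;

    if (slot != NULL) {          /* key already present: just overwrite */
        *slot = value;
        return 0;
    }
    e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    n = strlen(key) + 1;
    e->key = malloc(n);
    if (e->key == NULL) {
        free(e);                 /* do not leak the node when the key copy fails */
        return -1;
    }
    memcpy(e->key, key, n);
    e->value = value;
    b = bucket_of(key);
    e->next = buckets[b];        /* push onto the front of the chain */
    buckets[b] = e;
    return 0;
}
```

Returning a pointer from `dict_get` lets the caller tell "absent" apart from any stored integer, which a plain `int` return value cannot do without reserving a sentinel.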

— Vijay Mathew
source

1

Given that it comes from the C book, I wonder whether there can be a more compact implementation.

— Rohit, 2010

30

@Rohit, for a piece of useful C code, it doesn't get much more compact than that. I suppose you could always remove some whitespace...

— Ryan Calhoun, 2010

7

Why does hashval = *s + 31 * hashval; use exactly 31 and not some other number?

— 2014

12

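31 is prime. Primes are often used in hash functions to reduce the probability of collisions. It has something to do with integer factorization (i.e. you can't factor primes).

— jnovacho, 2014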

2

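@Overdrivr: Not necessary in this case. hashtab has static duration. Uninitialized variables with static duration (that is, those declared outside of functions and those declared with the storage class static) are guaranteed to start out as zero of the right type (i.e. 0, NULL or 0.0).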
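A small check of that guarantee, as a sketch: `hashtab_is_empty` is a hypothetical helper, and `struct nlist` is left incomplete because only pointers to it are stored here.

```c
#include <stddef.h>

#define HASHSIZE 101

struct nlist;                            /* incomplete type: pointers suffice here */
static struct nlist *hashtab[HASHSIZE];  /* no initializer, static storage duration */

/* Returns 1 if every bucket is NULL, which holds before anything is installed. */
int hashtab_is_empty(void)
{
    size_t i;
    for (i = 0; i < HASHSIZE; i++)
        if (hashtab[i] != NULL)
            return 0;
    return 1;
}
```

The same array declared inside a function without `static` would have indeterminate contents and would need an explicit initializer.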

— 开头

19

The quickest way would be to use an already existing implementation, like uthash.

— 紫罗兰色
source

8

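For ease of implementation, it's hard to beat naively searching through an array. Apart from some error checking, this is a complete implementation (untested).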

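#include <stdlib.h> /* malloc, realloc, free */
#include <string.h> /* strcmp; strdup is POSIX, and ISO C only since C23 */

typedef struct dict_entry_s {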
    const char *key;
    int value;
} dict_entry_s;

typedef struct dict_s {
    int len;
    int cap;
    dict_entry_s *entry;
} dict_s, *dict_t;

int dict_find_index(dict_t dict, const char *key) {
    for (int i = 0; i < dict->len; i++) {
        if (!strcmp(dict->entry[i].key, key)) {
            return i;
        }
    }
    return -1;
}

int dict_find(dict_t dict, const char *key, int def) {
    int idx = dict_find_index(dict, key);
    return idx == -1 ? def : dict->entry[idx].value;
}

void dict_add(dict_t dict, const char *key, int value) {
   int idx = dict_find_index(dict, key);
   if (idx != -1) {
       dict->entry[idx].value = value;
       return;
   }
   if (dict->len == dict->cap) {
       dict->cap *= 2;
       dict->entry = realloc(dict->entry, dict->cap * sizeof(dict_entry_s));
   }
   dict->entry[dict->len].key = strdup(key);
   dict->entry[dict->len].value = value;
   dict->len++;
}

dict_t dict_new(void) {
    dict_s proto = {0, 10, malloc(10 * sizeof(dict_entry_s))};
    dict_t d = malloc(sizeof(dict_s));
    *d = proto;
    return d;
}

void dict_free(dict_t dict) {
    for (int i = 0; i < dict->len; i++) {
        free((void *)dict->entry[i].key); /* cast away const: the dict owns this copy */
    }
    free(dict->entry);
    free(dict);
}
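The error checking that the answer leaves out matters most in the growth step: assigning the result of `realloc` straight back to `dict->entry` loses the old block if the call fails. One possible way to guard it is sketched below. `dict_reserve_one` is a hypothetical helper, not part of the answer, and the two typedefs are repeated only so the block compiles on its own.

```c
#include <stdlib.h>

typedef struct dict_entry_s {
    const char *key;
    int value;
} dict_entry_s;

typedef struct dict_s {
    int len;
    int cap;
    dict_entry_s *entry;
} dict_s, *dict_t;

/* Hypothetical helper: make sure there is room for one more entry.
 * Returns 0 on success, -1 if realloc fails (the dict is left unchanged). */
int dict_reserve_one(dict_t dict)
{
    dict_entry_s *grown;
    int new_cap;

    if (dict->len < dict->cap)
        return 0;                               /* still room */
    new_cap = dict->cap > 0 ? dict->cap * 2 : 10;
    grown = realloc(dict->entry, (size_t)new_cap * sizeof *grown);
    if (grown == NULL)
        return -1;                              /* old block is still valid */
    dict->entry = grown;
    dict->cap = new_cap;
    return 0;
}
```

With a helper like this, `dict_add` could return an error code instead of `void` and bail out before writing past the end of the array.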

— Paul Hankin
source

2

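"For ease of implementation": you're exactly right, this is the easiest. It also fulfils the OP's request "I do want it to be able to store an arbitrary number of items", which the highest-voted answer does not (unless you think that choosing a compile-time constant satisfies "arbitrary").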

— davidbak

1

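This may be a valid approach depending on the use case, but the OP explicitly asked for a dictionary, and this is definitely not a dictionary.

— Dan Bechard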

3

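Create a simple hash function and an array of linked lists of structures; the hash value decides which linked list a value is inserted into. Use the same hash to retrieve it.

I wrote a simple implementation some time ago: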

...
#define K 16 // chaining coefficient

struct dict
{
    char *name;        /* name of key */
    int val;           /* value */
    struct dict *next; /* link field */
};

typedef struct dict dict;
dict *table[K];
int initialized = 0;


void putval(char *, int);

void init_dict()
{
    initialized = 1;
    int i;
    for (i = 0; i ... [the rest of init_dict(), all of hash() and the start of putval() are missing from the source] ...
    ptr->name = (char *) malloc(strlen(key_name) + 1);
    ptr->val = sval;
    strcpy(ptr->name, key_name);


    ptr->next = (struct dict *) table[hsh];
    table[hsh] = ptr;

}


int getval(char *key_name)
{
    int hsh = hash(key_name);
    dict *ptr;
    for (ptr = table[hsh]; ptr != (dict *) 0;
        ptr = (dict *) ptr->next)
    if (strcmp(ptr->name, key_name) == 0)
        return ptr->val;
    return -1;
}
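The listing above has lost `hash()` and the opening of `putval()`. What follows is a guess at how the missing pieces could look, not the author's code: the byte-sum `hash` is an assumption, and `putval` is reconstructed only from the surviving lines that assign `ptr->name`, `ptr->val` and `ptr->next`.

```c
#include <stdlib.h>
#include <string.h>

#define K 16 /* number of chains, as in the answer */

struct dict {
    char *name;         /* name of key */
    int val;            /* value */
    struct dict *next;  /* link field */
};
typedef struct dict dict;

static dict *table[K];  /* static storage, so every chain starts out NULL */

/* Assumed hash: sum of the bytes, reduced modulo K. The original is not shown. */
static int hash(const char *s)
{
    unsigned h = 0;
    while (*s != '\0')
        h += (unsigned char)*s++;
    return (int)(h % K);
}

/* Assumed putval: push a new node onto the front of its chain.
 * Like the surviving fragment, it does not look for an existing key first. */
void putval(const char *key_name, int sval)
{
    int hsh = hash(key_name);
    dict *ptr = malloc(sizeof *ptr);
    if (ptr == NULL)
        return;
    ptr->name = malloc(strlen(key_name) + 1);
    if (ptr->name == NULL) {
        free(ptr);
        return;
    }
    strcpy(ptr->name, key_name);
    ptr->val = sval;
    ptr->next = table[hsh];
    table[hsh] = ptr;
}

/* Same lookup as in the answer: -1 doubles as the "not found" value. */
int getval(const char *key_name)
{
    dict *ptr;
    for (ptr = table[hash(key_name)]; ptr != NULL; ptr = ptr->next)
        if (strcmp(ptr->name, key_name) == 0)
            return ptr->val;
    return -1;
}
```

Because new nodes go to the front of the chain, a second `putval` with the same key shadows the first rather than replacing it, and a stored value of -1 cannot be told apart from a miss.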

— abc def foo bar
source

1

Aren't you missing half of the code? Where are "hash()" and "putval()"?

— swdev

3

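GLib and gnulib

These are probably your best bets if you don't have more specific requirements, since they are widely available, portable and likely efficient.

GLib: https://developer.gnome.org/glib/ by the GNOME project. Several containers are documented at https://developer.gnome.org/glib/stable/glib-data-types.html, including "Hash Tables" and "Balanced Binary Trees". License: LGPL.
gnulib: https://www.gnu.org/software/gnulib/ by the GNU project. You are meant to copy-paste the source into your own code. Several containers are documented at https://www.gnu.org/software/gnulib/MODULES.html#ansic_ext_container, including "rbtree-list", "linkedhash-list" and "rbtreehash-list". License: GPL.

See also: Are there any open source C libraries with common data structures?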

— Ciro Santilli郝海东冠状病六四事件法轮功
source

2

Here is a quick implementation that I used to get a "matrix" (struct) from a string. You can have a bigger array and also change its values at run time:

#include <stdio.h>  /* printf */
#include <string.h> /* strcmp */

typedef struct  { int** lines; int isDefined; }mat;
mat matA, matB, matC, matD, matE, matF;

/* an auxiliary struct to be used in a dictionary */
typedef struct  { char* str; mat *matrix; }stringToMat;

/* creating a 'dictionary' for a mat name to its mat. lower case only! */
stringToMat matCases [] =
{
    { "mat_a", &matA },
    { "mat_b", &matB },
    { "mat_c", &matC },
    { "mat_d", &matD },
    { "mat_e", &matE },
    { "mat_f", &matF },
};

mat* getMat(char * str)
{
    stringToMat* pCase;
    mat * selected = NULL;
    if (str != NULL)
    {
        /* scan the dictionary for the selected mat */
        for(pCase = matCases; pCase != matCases + sizeof(matCases) / sizeof(matCases[0]); pCase++ )
        {
            if(!strcmp( pCase->str, str))
                selected = (pCase->matrix);
        }
        if (selected == NULL)
            printf("%s is not a valid matrix name\n", str);
    }
    else
        printf("expected matrix name, got NULL\n");
    return selected;
}

— dagoltz
source

2

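I'm surprised no one has mentioned the hsearch/hcreate family of functions, which, although not available on Windows, is mandated by POSIX and is therefore available on Linux/GNU systems.

The link has a simple and complete basic example that explains its usage very well.

It even has a thread-safe variant, is easy to use and performs very well.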
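As a rough illustration of the interface (not the example from the link), here is a string -> int wrapper around `hsearch`. The names `hs_put` and `hs_get` are invented for this sketch, and storing the integer in the `data` pointer through `intptr_t` is a convenience assumed here. The caller is expected to call `hcreate(n)` once beforehand with an upper bound on the number of entries.

```c
#include <search.h>   /* hcreate, hsearch, ENTRY, ENTER, FIND (POSIX) */
#include <stddef.h>
#include <stdint.h>   /* intptr_t */

/* Store value under key; returns 0 on success, -1 if the table is full. */
int hs_put(char *key, int value)
{
    ENTRY item, *found;
    item.key = key;                         /* hsearch keeps this pointer, not a copy */
    item.data = (void *)(intptr_t)value;
    found = hsearch(item, ENTER);
    if (found == NULL)
        return -1;
    found->data = (void *)(intptr_t)value;  /* overwrite if the key already existed */
    return 0;
}

/* Look up key; returns 1 and fills *out if present, 0 otherwise. */
int hs_get(char *key, int *out)
{
    ENTRY item, *found;
    item.key = key;
    item.data = NULL;
    found = hsearch(item, FIND);
    if (found == NULL)
        return 0;
    *out = (int)(intptr_t)found->data;
    return 1;
}
```

A few limits are worth knowing: this interface manages a single global table, the capacity is fixed at `hcreate` time, there is no way to delete one entry, and key strings must stay alive for as long as the table does. Whether `hdestroy` frees the keys varies between implementations, so it is worth checking the platform manual before passing string literals.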

— fkl
source

2

It's worth mentioning that people here say it is unusable, although I haven't tried it myself: stackoverflow.com/a/6118591/895245

— Ciro Santilli郝海东冠状病六四事件法轮功

1

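Fair enough. However, I have used the hcreate_r version (for multiple hash tables) in at least one application that ran long enough to count as real-world use. Agreed that it is a GNU extension, but so are many other libraries. I would still argue that you can use it for one large set of key-value pairs in a real-world application.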

— fkl

0

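A hash table is the traditional implementation of a simple "dictionary". If you don't care about speed or size, just google for it. There are many freely available implementations.

Here's the first one I saw. At first glance it looks OK to me. (It's pretty basic. If you really want it to hold an unlimited amount of data, you'll need to add some logic to "realloc" the table memory as it grows.)

Good luck!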
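The growth logic mentioned above is not shown in the answer. For a chained table it usually means allocating a larger bucket array and relinking every node, since a plain `realloc` would leave nodes in buckets that no longer match their hash. A minimal sketch under assumed names (`struct hnode`, `struct table`, `table_grow`) that are not from any implementation in this thread:

```c
#include <stdlib.h>

struct hnode {
    struct hnode *next;
    unsigned long hash;   /* full hash, cached so keys need not be hashed again */
    /* key and value fields would go here */
};

struct table {
    struct hnode **buckets;
    size_t nbuckets;
    size_t count;         /* number of stored nodes, maintained by the insert code */
};

/* Double the bucket array and relink every node.
 * Returns 0 on success, -1 if allocation fails (table left unchanged). */
int table_grow(struct table *t)
{
    size_t new_n = t->nbuckets ? t->nbuckets * 2 : 8;
    struct hnode **nb = calloc(new_n, sizeof *nb);
    size_t i;

    if (nb == NULL)
        return -1;
    for (i = 0; i < t->nbuckets; i++) {
        struct hnode *n = t->buckets[i];
        while (n != NULL) {
            struct hnode *next = n->next;   /* save before relinking */
            size_t b = n->hash % new_n;
            n->next = nb[b];
            nb[b] = n;
            n = next;
        }
    }
    free(t->buckets);
    t->buckets = nb;
    t->nbuckets = new_n;
    return 0;
}
```

An insert routine would typically call `table_grow` when `count` exceeds some multiple of `nbuckets`, which keeps the chains short as the table fills.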

— 背风处
source

-1

Hashing is the key. I think a lookup table indexed by a hash of the key is the way to go. You can find many hash functions online.

— ashmish2
source

-1

The quickest method would be to use a binary tree. Its worst case is also only O(log n).

— 程序员
source

15

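This is incorrect. The worst-case lookup for a binary tree is O(n) when it is unbalanced (the degenerate case caused by a bad insertion order, which basically results in a linked list).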
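The degenerate case is easy to reproduce with a plain, unbalanced binary search tree. In this sketch (the names `bst_insert` and `bst_height` are illustrative), inserting keys in ascending order makes every node a right child of the previous one, so the height equals the number of keys.

```c
#include <stdlib.h>

struct node {
    int key;
    struct node *left, *right;
};

/* Plain (unbalanced) BST insert; returns the root, which changes only
 * when the tree was empty. Duplicates and failed allocations are ignored. */
struct node *bst_insert(struct node *root, int key)
{
    struct node **link = &root;
    while (*link != NULL) {
        if (key == (*link)->key)
            return root;
        link = key < (*link)->key ? &(*link)->left : &(*link)->right;
    }
    *link = malloc(sizeof **link);
    if (*link != NULL) {
        (*link)->key = key;
        (*link)->left = (*link)->right = NULL;
    }
    return root;
}

/* Number of nodes on the longest root-to-leaf path (0 for an empty tree). */
int bst_height(const struct node *root)
{
    int l, r;
    if (root == NULL)
        return 0;
    l = bst_height(root->left);
    r = bst_height(root->right);
    return 1 + (l > r ? l : r);
}
```

Inserting 1 through 100 in order gives a tree of height 100, while inserting 4, 2, 6, 1, 3, 5, 7 gives height 3 for seven keys. Self-balancing trees such as red-black or AVL trees exist to rule out the first shape.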

— Randy Howard