在哪里可以找到Java中基于Trie的标准地图实现?[关闭]


76

我有一个Java程序,它存储了很多从字符串到各种对象的映射。

现在,我的选择是依靠散列(通过HashMap)或二进制搜索(通过TreeMap)。我想知道在流行且优质的收藏库中是否有一种有效且基于标准的基于trie的地图实现?

我过去写过自己的书,但是我愿意使用一些标准的东西(如果有的话)。

快速澄清:虽然我的问题很笼统,但在当前项目中,我正在处理大量数据,这些数据已通过完全合格的类名或方法签名进行了索引。因此,存在许多共享前缀。


事先知道琴弦吗?是否仅需要通过字符串访问它们?
sfossen,2009年

Answers:


32

您可能想看看Limewire为Google Guava贡献Trie实现


8
看起来Google馆藏已被Guava code.google.com/p/guava-libraries取代,但是很遗憾,我在任何地方都看不到Trie类。Patricia Trie现在似乎拥有自己的项目页面:code.google.com/p/patricia-trie
Dan J

1
Limewire / Google链接现在也有些混乱。虽然我确实设法找到了包含实际文件的code.google.com/archive/p/google-collections/issues/5,但请注意,Apache Commons Collections附带了许多尝试(包括patricia trie)。那就是我现在推荐的那个。
Jason C

同样,Apache Commons实现似乎与Limewire贡献源于同一地方,因为PatriciaTrie的Commons文档中的摘要注释与Limewire贡献的实现中的摘要注释相同。
杰森C

10

核心Java库中没有trie数据结构。

这可能是因为尝试通常被设计为存储字符串,而Java数据结构更通用,通常包含任何字符串Object(定义相等性和哈希操作),尽管有时它们限于 Comparable对象(定义顺序)。尽管CharSequence适用于字符串,但“符号序列”没有通用的抽象,我想您可以Iterable对其他类型的符号进行处理。

这是要考虑的另一点:当尝试在Java中实现传统的trie时,您很快就会遇到Java支持Unicode的事实。为了提高空间效率,您必须将特里中的字符串限制为某些符号子集,或者放弃将符号节点索引的数组中存储子节点的常规方法。这可能是为什么尝试被认为不够通用而不能包含在核心库中的另一个原因,并且是您实施自己的库或使用第三方库时要提防的地方。


这个答案假设我想为字符串实现特里。特里是一种通用的数据结构,能够保存任意序列并提供快速的前缀查找。
保罗·德雷珀

1
@PaulDraper这个答案不假设您想要什么,因为您是在提出问题多年后才出现的。而且由于该问题是专门针对字符串的,因此这是此答案的重点。尽管我确实花了很多时间指出Java特里将需要推广到任何类型的Comparable
埃里克森


4

Apache Commons集合 v4.0现在支持trie结构。

有关更多信息,请参见org.apache.commons.collections4.trie包装信息。特别是,检查PatriciaTrie类:

PATRICIA Trie(检索字母数字编码信息的实用算法)的实现。

PATRICIA Trie是压缩的Trie。PATRICIA不是将所有数据存储在Trie的边缘(并且内部节点为空),而是将数据存储在每个节点中。这允许非常有效的遍历,插入,删除,前任,后继,前缀,范围和select(Object)操作。所有操作都在O(K)时间中最差的时间执行,其中K是树中最大项的位数。实际上,操作实际上需要O(A(K))时间,其中A(K)是树中所有项目的平均位数。


3

在这里编写并发布了一个简单而快速的实现。


我想这样,但是您的每个节点都需要1024个字节,并且仅代表一个字符。由于Java更改了substring()的语义,现在插入也需要O(n ^ 2)时间。这种实现确实不是很实用。
Stefan Reich

@Stefan Reich,该数组空间仅用于内部节点,考虑到Trie树的扩散速度,该数组空间几乎消失了。
梅琳达·格林

感谢您的回答,但我不相信。尝试可能不会总是迅速进行,实际上它们可能不会包含真实数据。您的阵列扫描内容的速度也很慢。我们应该真正使用Patricia Tries使事情紧凑高效。我已经制定了自己的实现,我可能会在不久后发布。没有辛苦的感觉,只是尝试进行优化:)许多问候
Stefan Reich

我的尝试只能迅速散开,因为冗余被排除在外并存储在“前缀”成员中。根据您要优化的内容,有很多不同的实现方式。就我而言,我的目标是简单实用。
Melinda Green

1
啊,我确实误解了代码的那部分。有太多“对象”和演员表,以至于我没有看到它。所以这是帕特里夏·特里(Patricia Trie)。我的错。
Stefan Reich





0

如果您需要排序的地图,那么尝试是值得的。如果您不这样做,那么hashmap会更好。带有字符串键的哈希表可以通过标准Java实现进行改进: 数组哈希表



0

下面是Trie的基本HashMap实现。有些人可能会觉得这很有用...

class Trie {

    HashMap<Character, HashMap> root;

    public Trie() {
        root = new HashMap<Character, HashMap>();
    }

    public void addWord(String word) {
        HashMap<Character, HashMap> node = root;
        for (int i = 0; i < word.length(); i++) {
            Character currentLetter = word.charAt(i);
            if (node.containsKey(currentLetter) == false) {
                node.put(currentLetter, new HashMap<Character, HashMap>());
            }
            node = node.get(currentLetter);
        }
    }

    public boolean containsPrefix(String word) {
        HashMap<Character, HashMap> node = root;
        for (int i = 0; i < word.length(); i++) {
            Character currentLetter = word.charAt(i);
            if (node.containsKey(currentLetter)) {
                node = node.get(currentLetter);
            } else {
                return false;
            }
        }
        return true;
    }
}

0

这是我的实现,请通过以下方式享受它:GitHub-MyTrie.java

/* usage:
    MyTrie trie = new MyTrie();
    trie.insert("abcde");
    trie.insert("abc");
    trie.insert("sadas");
    trie.insert("abc");
    trie.insert("wqwqd");
    System.out.println(trie.contains("abc"));
    System.out.println(trie.contains("abcd"));
    System.out.println(trie.contains("abcdefg"));
    System.out.println(trie.contains("ab"));
    System.out.println(trie.getWordCount("abc"));
    System.out.println(trie.getAllDistinctWords());
*/

import java.util.*;

public class MyTrie {
  private class Node {
    public int[] next = new int[26];
    public int wordCount;
    public Node() {
      for(int i=0;i<26;i++) {
        next[i] = NULL;
      }
      wordCount = 0;
    }
  }

  private int curr;
  private Node[] nodes;
  private List<String> allDistinctWords;
  public final static int NULL = -1;

  public MyTrie() {
    nodes = new Node[100000];
    nodes[0] = new Node();
    curr = 1;
  }

  private int getIndex(char c) {
    return (int)(c - 'a');
  }

  private void depthSearchWord(int x, String currWord) {
    for(int i=0;i<26;i++) {
      int p = nodes[x].next[i];
      if(p != NULL) {
        String word = currWord + (char)(i + 'a');
        if(nodes[p].wordCount > 0) {
          allDistinctWords.add(word);
        }
        depthSearchWord(p, word);
      }
    }
  }

  public List<String> getAllDistinctWords() {
    allDistinctWords = new ArrayList<String>();
    depthSearchWord(0, "");
    return allDistinctWords;
  }

  public int getWordCount(String str) {
    int len = str.length();
    int p = 0;
    for(int i=0;i<len;i++) {
      int j = getIndex(str.charAt(i));
      if(nodes[p].next[j] == NULL) {
        return 0;
      }
      p = nodes[p].next[j];
    }
    return nodes[p].wordCount;
  }

  public boolean contains(String str) {
    int len = str.length();
    int p = 0;
    for(int i=0;i<len;i++) {
      int j = getIndex(str.charAt(i));
      if(nodes[p].next[j] == NULL) {
        return false;
      }
      p = nodes[p].next[j];
    }
    return nodes[p].wordCount > 0;
  }

  public void insert(String str) {
    int len = str.length();
    int p = 0;
    for(int i=0;i<len;i++) {
      int j = getIndex(str.charAt(i));
      if(nodes[p].next[j] == NULL) {
        nodes[curr] = new Node();
        nodes[p].next[j] = curr;
        curr++;
      }
      p = nodes[p].next[j];
    }
    nodes[p].wordCount++;
  }
}

0

我刚刚尝试了自己的并发TRIE实现,但不是基于字符,而是基于HashCode。我们仍然可以为每个CHAR hascode使用具有Map of Map的地图。
您可以使用以下代码进行测试 https: //github.com/skanagavelu/TrieHashMap/blob/master/src/TrieMapPerformanceTest.java https://github.com/skanagavelu/TrieHashMap/blob/master/src/TrieMapValidationTest.java

import java.util.concurrent.atomic.AtomicReferenceArray;

public class TrieMap {
    public static int SIZEOFEDGE = 4; 
    public static int OSIZE = 5000;
}

abstract class Node {
    public Node getLink(String key, int hash, int level){
        throw new UnsupportedOperationException();
    }
    public Node createLink(int hash, int level, String key, String val) {
        throw new UnsupportedOperationException();
    }
    public Node removeLink(String key, int hash, int level){
        throw new UnsupportedOperationException();
    }
}

class Vertex extends Node {
    String key;
    volatile String val;
    volatile Vertex next;

    public Vertex(String key, String val) {
        this.key = key;
        this.val = val;
    }

    @Override
    public boolean equals(Object obj) {
        Vertex v = (Vertex) obj;
        return this.key.equals(v.key);
    }

    @Override
    public int hashCode() {
        return key.hashCode();
    }

    @Override
    public String toString() {
        return key +"@"+key.hashCode();
    }
}


class Edge extends Node {
    volatile AtomicReferenceArray<Node> array; //This is needed to ensure array elements are volatile

    public Edge(int size) {
        array = new AtomicReferenceArray<Node>(8);
    }


    @Override
    public Node getLink(String key, int hash, int level){
        int index = Base10ToBaseX.getBaseXValueOnAtLevel(Base10ToBaseX.Base.BASE8, hash, level);
        Node returnVal = array.get(index);
        for(;;) {
            if(returnVal == null) {
                return null;
            }
            else if((returnVal instanceof Vertex)) {
                Vertex node = (Vertex) returnVal;
                for(;node != null; node = node.next) {
                    if(node.key.equals(key)) {  
                        return node; 
                    }
                } 
                return null;
            } else { //instanceof Edge
                level = level + 1;
                index = Base10ToBaseX.getBaseXValueOnAtLevel(Base10ToBaseX.Base.BASE8, hash, level);
                Edge e = (Edge) returnVal;
                returnVal = e.array.get(index);
            }
        }
    }

    @Override
    public Node createLink(int hash, int level, String key, String val) { //Remove size
        for(;;) { //Repeat the work on the current node, since some other thread modified this node
            int index =  Base10ToBaseX.getBaseXValueOnAtLevel(Base10ToBaseX.Base.BASE8, hash, level);
            Node nodeAtIndex = array.get(index);
            if ( nodeAtIndex == null) {  
                Vertex newV = new Vertex(key, val);
                boolean result = array.compareAndSet(index, null, newV);
                if(result == Boolean.TRUE) {
                    return newV;
                }
                //continue; since new node is inserted by other thread, hence repeat it.
            } 
            else if(nodeAtIndex instanceof Vertex) {
                Vertex vrtexAtIndex = (Vertex) nodeAtIndex;
                int newIndex = Base10ToBaseX.getBaseXValueOnAtLevel(Base10ToBaseX.Base.BASE8, vrtexAtIndex.hashCode(), level+1);
                int newIndex1 = Base10ToBaseX.getBaseXValueOnAtLevel(Base10ToBaseX.Base.BASE8, hash, level+1);
                Edge edge = new Edge(Base10ToBaseX.Base.BASE8.getLevelZeroMask()+1);
                if(newIndex != newIndex1) {
                    Vertex newV = new Vertex(key, val);
                    edge.array.set(newIndex, vrtexAtIndex);
                    edge.array.set(newIndex1, newV);
                    boolean result = array.compareAndSet(index, vrtexAtIndex, edge); //REPLACE vertex to edge
                    if(result == Boolean.TRUE) {
                        return newV;
                    }
                   //continue; since vrtexAtIndex may be removed or changed to Edge already.
                } else if(vrtexAtIndex.key.hashCode() == hash) {//vrtex.hash == hash) {       HERE newIndex == newIndex1
                    synchronized (vrtexAtIndex) {   
                        boolean result = array.compareAndSet(index, vrtexAtIndex, vrtexAtIndex); //Double check this vertex is not removed.
                        if(result == Boolean.TRUE) {
                            Vertex prevV = vrtexAtIndex;
                            for(;vrtexAtIndex != null; vrtexAtIndex = vrtexAtIndex.next) {
                                prevV = vrtexAtIndex; // prevV is used to handle when vrtexAtIndex reached NULL
                                if(vrtexAtIndex.key.equals(key)){
                                    vrtexAtIndex.val = val;
                                    return vrtexAtIndex;
                                }
                            } 
                            Vertex newV = new Vertex(key, val);
                            prevV.next = newV; // Within SYNCHRONIZATION since prevV.next may be added with some other.
                            return newV;
                        }
                        //Continue; vrtexAtIndex got changed
                    }
                } else {   //HERE newIndex == newIndex1  BUT vrtex.hash != hash
                    edge.array.set(newIndex, vrtexAtIndex);
                    boolean result = array.compareAndSet(index, vrtexAtIndex, edge); //REPLACE vertex to edge
                    if(result == Boolean.TRUE) {
                        return edge.createLink(hash, (level + 1), key, val);
                    }
                }
            }               
            else {  //instanceof Edge
                return nodeAtIndex.createLink(hash, (level + 1), key, val);
            }
        }
    }




    @Override
    public Node removeLink(String key, int hash, int level){
        for(;;) {
            int index = Base10ToBaseX.getBaseXValueOnAtLevel(Base10ToBaseX.Base.BASE8, hash, level);
            Node returnVal = array.get(index);
            if(returnVal == null) {
                return null;
            }
            else if((returnVal instanceof Vertex)) {
                synchronized (returnVal) {
                    Vertex node = (Vertex) returnVal;
                    if(node.next == null) {
                        if(node.key.equals(key)) {
                            boolean result = array.compareAndSet(index, node, null); 
                            if(result == Boolean.TRUE) {
                                return node;
                            }
                            continue; //Vertex may be changed to Edge
                        }
                        return null;  //Nothing found; This is not the same vertex we are looking for. Here hashcode is same but key is different. 
                    } else {
                        if(node.key.equals(key)) { //Removing the first node in the link
                            boolean result = array.compareAndSet(index, node, node.next);
                            if(result == Boolean.TRUE) {
                                return node;
                            }
                            continue; //Vertex(node) may be changed to Edge, so try again.
                        }
                        Vertex prevV = node; // prevV is used to handle when vrtexAtIndex is found and to be removed from its previous
                        node = node.next;
                        for(;node != null; prevV = node, node = node.next) {
                            if(node.key.equals(key)) {
                                prevV.next = node.next; //Removing other than first node in the link
                                return node; 
                            }
                        } 
                        return null;  //Nothing found in the linked list.
                    }
                }
            } else { //instanceof Edge
                return returnVal.removeLink(key, hash, (level + 1));
            }
        }
    }

}



class Base10ToBaseX {
    public static enum Base {
        /**
         * Integer is represented in 32 bit in 32 bit machine.
         * There we can split this integer no of bits into multiples of 1,2,4,8,16 bits
         */
        BASE2(1,1,32), BASE4(3,2,16), BASE8(7,3,11)/* OCTAL*/, /*BASE10(3,2),*/ 
        BASE16(15, 4, 8){       
            public String getFormattedValue(int val){
                switch(val) {
                case 10:
                    return "A";
                case 11:
                    return "B";
                case 12:
                    return "C";
                case 13:
                    return "D";
                case 14:
                    return "E";
                case 15:
                    return "F";
                default:
                    return "" + val;
                }

            }
        }, /*BASE32(31,5,1),*/ BASE256(255, 8, 4), /*BASE512(511,9),*/ Base65536(65535, 16, 2);

        private int LEVEL_0_MASK;
        private int LEVEL_1_ROTATION;
        private int MAX_ROTATION;

        Base(int levelZeroMask, int levelOneRotation, int maxPossibleRotation) {
            this.LEVEL_0_MASK = levelZeroMask;
            this.LEVEL_1_ROTATION = levelOneRotation;
            this.MAX_ROTATION = maxPossibleRotation;
        }

        int getLevelZeroMask(){
            return LEVEL_0_MASK;
        }
        int getLevelOneRotation(){
            return LEVEL_1_ROTATION;
        }
        int getMaxRotation(){
            return MAX_ROTATION;
        }
        String getFormattedValue(int val){
            return "" + val;
        }
    }

    public static int getBaseXValueOnAtLevel(Base base, int on, int level) {
        if(level > base.getMaxRotation() || level < 1) {
            return 0; //INVALID Input
        }
        int rotation = base.getLevelOneRotation();
        int mask = base.getLevelZeroMask();

        if(level > 1) {
            rotation = (level-1) * rotation;
            mask = mask << rotation;
        } else {
            rotation = 0;
        }
        return (on & mask) >>> rotation;
    }
}
By using our site, you acknowledge that you have read and understand our Cookie Policy and Privacy Policy.
Licensed under cc by-sa 3.0 with attribution required.