Explain using a bit vector for determining if all characters are unique


150

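I am confused about how a bit vector would work to do this (I am not too familiar with bit vectors). Here is the code given. Could someone please walk me through this?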

public static boolean isUniqueChars(String str) {
    int checker = 0;
    for (int i = 0; i < str.length(); ++i) {
        int val = str.charAt(i) - 'a';
        if ((checker & (1 << val)) > 0) return false;
        checker |= (1 << val);
    }
    return true;
}

In particular, what is checker doing?


It is in Java, but if there is something similar in C/C++ that would be more helpful to me.
user1136342

101
This code is taken from the book Cracking the Coding Interview
Dejell, 2013

2
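Have you tested this? It seems that because checker is set to 0, left shifting it will still keep it at 0, so it would fail to detect duplicate 'a' characters.
Riz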

3
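Note that this solution is for the lowercase characters a-z, which means we are using it to find duplicates among 26 characters. So a 32-bit int can be used here. If the range were larger, the solution would not work.
a3.14_Infinity, 2015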

1
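Where people go wrong is in confusing the syntax of the left shift operator: it is 1 that is shifted left by x (= str.charAt(i) - 'a') places, not x that is shifted left by 1 place.
nanosoft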

Answers:


100

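int checker is used here as storage for bits. Every bit in the integer value can be treated as a flag, so ultimately the int is an array of bits (flags). Each bit in your code states whether the character with that bit's index was found in the string or not. You could use a bit vector instead of an int for the same purpose. There are two differences between them:

  • Size. An int has a fixed size, usually 4 bytes, which means 8 * 4 = 32 bits (flags). A bit vector can usually have a different size, or you specify the size in the constructor.

  • API. With a bit vector you will have easier-to-read code, probably something like this: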

    vector.SetFlag(4, true); // set flag at index 4 as true

    for an int you will have lower-level bit logic code:

    checker |= (1 << 5); // set flag at index 5 to true

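Also, an int will probably be a little faster, because bit operations are very low level and can be executed as-is by the CPU. A BitVector lets you write slightly less cryptic code instead, and it can store more flags.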
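To make the two styles concrete, here is a minimal Java sketch that puts them side by side. The class name FlagStorageDemo and the helpers setFlag and isFlagSet are made up for illustration; java.util.BitSet is the standard JDK class and stands in for the generic "bit vector" described above.

```java
import java.util.BitSet;

final class FlagStorageDemo {
    // Returns `flags` with the bit at `index` turned on (index assumed to be 0..31).
    static int setFlag(int flags, int index) {
        return flags | (1 << index);
    }

    // Tests the bit at `index`. Comparing with != 0 also works for index 31 (the sign bit).
    static boolean isFlagSet(int flags, int index) {
        return (flags & (1 << index)) != 0;
    }

    public static void main(String[] args) {
        // An int used as 32 flags.
        int checker = 0;
        checker = setFlag(checker, 5);
        System.out.println(isFlagSet(checker, 5)); // true
        System.out.println(isFlagSet(checker, 4)); // false

        // java.util.BitSet: the same idea, but it grows past 32 flags on demand.
        BitSet vector = new BitSet();
        vector.set(5);
        vector.set(100);
        System.out.println(vector.get(5));   // true
        System.out.println(vector.get(100)); // true
        System.out.println(vector.get(4));   // false
    }
}
```

The int version is limited to 32 flags, while the BitSet accepts index 100 without any extra work.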

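For future reference: a bit vector is also known as a bitSet or bitArray. Here are some links to this data structure for different languages/platforms: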


Does Java have a BitVector class? I was not able to find any documentation for it!
Dejell

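The size is fixed at 32 bits. Does that mean it can only test the uniqueness of 32 characters? I have tested that this function returns false for "abcdefgZZ", but returns true for "abcdefg@@".
tli2020, 2014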
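The result reported in this comment can be reproduced, and it comes from the > 0 comparison combined with characters outside a-z. The sketch below is only an illustration: the class and method names are made up, the first method is the loop from the question, and the second differs only in comparing with != 0. Neither variant is meaningful outside a-z (for example, 'z' and 'Z' still share bit 25).

```java
final class SignBitDemo {
    // The loop from the question, unchanged: the duplicate test uses "> 0".
    static boolean isUniqueGreaterThanZero(String str) {
        int checker = 0;
        for (int i = 0; i < str.length(); ++i) {
            int val = str.charAt(i) - 'a';
            if ((checker & (1 << val)) > 0) return false;
            checker |= (1 << val);
        }
        return true;
    }

    // Same loop, except that the duplicate test uses "!= 0".
    static boolean isUniqueNotEqualZero(String str) {
        int checker = 0;
        for (int i = 0; i < str.length(); ++i) {
            int val = str.charAt(i) - 'a';
            if ((checker & (1 << val)) != 0) return false;
            checker |= (1 << val);
        }
        return true;
    }

    public static void main(String[] args) {
        // '@' - 'a' == -33. Java uses only the low five bits of an int shift count,
        // so the shift is by 31 and the mask is Integer.MIN_VALUE, a negative number.
        System.out.println(1 << ('@' - 'a'));                     // -2147483648
        System.out.println(isUniqueGreaterThanZero("abcdefg@@")); // true (duplicate missed)
        System.out.println(isUniqueNotEqualZero("abcdefg@@"));    // false
        // 'Z' - 'a' == -7, which wraps to bit 25, so "ZZ" is caught either way.
        System.out.println(isUniqueGreaterThanZero("abcdefgZZ")); // false
    }
}
```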

1
Google led me here. @Dejel Here's the Java data structure that you can use: docs.oracle.com/javase/7/docs/api/java/util/BitSet.html. Hopefully this helps someone travelling through the intertubes.
nattyddubbs, 2014

@nattyddubbs, thanks, I've added this and several other links to the answer
Snowbear, 2014

222

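I have a sneaking suspicion you got this code from the same book I'm reading... The code itself isn't nearly as cryptic as the operators |=, & and <<, which aren't normally used by us laymen. The author didn't take the extra time to explain the process or the actual mechanics involved here. I was content with the previous answer on this thread at first, but only on an abstract level. I came back to it because I felt there needed to be a more concrete explanation; the lack of one always leaves me with an uneasy feeling.

The operator << is a bitwise left shift: it takes the binary representation of the operand on its left and shifts it left by the number of places given by the operand on its right. It works like shifting digits in a decimal number, except that each place is a power of 2 rather than a power of 10. So the number on the right is an exponent: x << n is x multiplied by 2^n (as long as no set bits are shifted out).

The operator |= takes the operand on its left, ORs it with the operand on its right and stores the result back in the left operand. The operator & ANDs the bits of the operands on its left and right.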
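A small sketch of the three operators in isolation may help here. The class BitOperatorsDemo and its helper bits are names I made up for this illustration; the operators and Integer.toBinaryString are standard Java.

```java
final class BitOperatorsDemo {
    // Renders an int as a zero-padded 32-character binary string.
    static String bits(int value) {
        return String.format("%32s", Integer.toBinaryString(value)).replace(' ', '0');
    }

    public static void main(String[] args) {
        // << : shift 1 left by three places, i.e. 1 * 2^3.
        int shifted = 1 << 3;
        System.out.println(shifted + " " + bits(shifted)); // 8 00000000000000000000000000001000

        // |= : OR the right operand into the left one and store the result.
        int flags = 0b0001;
        flags |= 0b0100; // same as flags = flags | 0b0100
        System.out.println(flags + " " + bits(flags));     // 5 00000000000000000000000000000101

        // & : keep only the bits that are set in both operands.
        System.out.println(flags & 0b0100); // 4, the bit is set in both
        System.out.println(flags & 0b0010); // 0, no bit in common
    }
}
```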

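So what we have here is a kind of hash table stored in a 32-bit binary number. Every time checker is OR'd (checker |= (1 << val)) with the designated binary value of a letter, the corresponding bit is set to true. The character's value is AND'd with checker ((checker & (1 << val)) > 0); if the result is greater than 0 we know we have a dupe, because two identical bits set to true and AND'd together return true, or '1'.

There are 26 binary places, each of which corresponds to a lowercase letter (the author did say to assume the string contains only lowercase letters). That matters because a 32-bit integer leaves only 6 more places to consume, and after that we would get a collision.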

00000000000000000000000000000001 a 2^0

00000000000000000000000000000010 b 2^1

00000000000000000000000000000100 c 2^2

00000000000000000000000000001000 d 2^3

00000000000000000000000000010000 e 2^4

00000000000000000000000000100000 f 2^5

00000000000000000000000001000000 g 2^6

00000000000000000000000010000000 h 2^7

00000000000000000000000100000000 i 2^8

00000000000000000000001000000000 j 2^9

00000000000000000000010000000000 k 2^10

00000000000000000000100000000000 l 2^11

00000000000000000001000000000000 m 2^12

00000000000000000010000000000000 n 2^13

00000000000000000100000000000000 o 2^14

00000000000000001000000000000000 p 2^15

00000000000000010000000000000000 q 2^16

00000000000000100000000000000000 r 2^17

00000000000001000000000000000000 s 2^18

00000000000010000000000000000000 t 2^19

00000000000100000000000000000000 u 2^20

00000000001000000000000000000000 v 2^21

00000000010000000000000000000000 w 2^22

00000000100000000000000000000000 x 2^23

00000001000000000000000000000000 y 2^24

00000010000000000000000000000000 z 2^25

So, for the input string "azya", we move step by step:

string "a"

a      =00000000000000000000000000000001
checker=00000000000000000000000000000000

a and checker=0 no dupes condition

checker=a or checker;
// checker now becomes = 00000000000000000000000000000001

string "az"

checker=00000000000000000000000000000001
z      =00000010000000000000000000000000

z and checker=0 no dupes 

checker=z or checker;
// checker now becomes 00000010000000000000000000000001  

string "azy"

checker= 00000010000000000000000000000001    
y      = 00000001000000000000000000000000 

checker and y=0 no dupes condition 

checker= checker or y;
// checker now becomes = 00000011000000000000000000000001

string "azya"

checker= 00000011000000000000000000000001
a      = 00000000000000000000000000000001

a and checker=1 we have a dupe

And now it declares a duplicate.
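If you want to watch these steps happen, the following sketch prints the mask and the checker for each character. It is only an illustration under the same a-z assumption: CheckerTrace, bits and isUniqueCharsTraced are hypothetical names, and the loop is the one from the question with printouts added and != 0 used for the test.

```java
final class CheckerTrace {
    // Renders an int as a zero-padded 32-character binary string.
    static String bits(int value) {
        return String.format("%32s", Integer.toBinaryString(value)).replace(' ', '0');
    }

    // The loop from the question with a printout per character (assumes 'a'..'z' only).
    static boolean isUniqueCharsTraced(String str) {
        int checker = 0;
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            int mask = 1 << (c - 'a');
            System.out.println(c + "       = " + bits(mask));
            System.out.println("checker = " + bits(checker));
            if ((checker & mask) != 0) {
                System.out.println(c + " and checker is nonzero: we have a dupe");
                return false;
            }
            checker |= mask;
            System.out.println("no dupe, checker becomes " + bits(checker));
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isUniqueCharsTraced("azya")); // false
    }
}
```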


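@ivan-tichy Have you tested this? It seems that because checker is set to 0, left shifting it will still keep it at 0, so it would fail to detect duplicate 'a' characters.
Riz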

1
@Riz No, it always starts with '1'; the algorithm shifts that 1 based on the letter. So if the letter 'a' comes up once, the value is 1, which is (....000001).
Taylor Halliday, 2014

2
@Ivan Man, I was thinking the same thing. Even the selected answer didn't explain the operators. Thank you for the detailed info.
WowBow

Should I assume the above unique check works only with a sorted character set (abcd...z), and not with (bcad...)?
abdul

"I have a sneaking suspicion you got this code from the same book I'm reading" Same here :) That made me laugh
backbone

39

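I think all these answers do explain how this works, but I wanted to share how I came to see it more clearly, by renaming some variables, adding a few others and adding comments: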

public static boolean isUniqueChars(String str) {

    /*
    checker is the bit array, it will have a 1 on the character index that
    has appeared before and a 0 if the character has not appeared, you
    can see this number initialized as 32 0 bits:
    00000000 00000000 00000000 00000000
     */
    int checker = 0;

    //loop through each String character
    for (int i = 0; i < str.length(); ++i) {
        /*
        a through z in ASCII are characters numbered 97 through 122, 26 characters total
        with this, you get a number between 0 and 25 to represent each character index
        0 for 'a' and 25 for 'z'

        renamed 'val' as 'characterIndex' to be more descriptive
         */
        int characterIndex = str.charAt(i) - 'a'; //char 'a' would get 0 and char 'z' would get 25

        /*
        created a new variable to make things clearer 'singleBitOnPosition'

        It is used to calculate a number that represents the bit value of having that 
        character index as a 1 and the rest as a 0, this is achieved
        by getting the single digit 1 and shifting it to the left as many
        times as the character index requires
        e.g. character 'd'
        00000000 00000000 00000000 00000001
        Shift 3 spaces to the left (<<) because 'd' index is number 3
        1 shift: 00000000 00000000 00000000 00000010
        2 shift: 00000000 00000000 00000000 00000100
        3 shift: 00000000 00000000 00000000 00001000

        Therefore the number representing 'd' is
        00000000 00000000 00000000 00001000

         */
        int singleBitOnPosition = 1 << characterIndex;

        /*
        This performs an AND between the checker, which is the bit array
        containing everything that has been found before and the number
        representing the bit that will be turned on for this particular
        character. e.g.
        if we have already seen 'a', 'b' and 'd', checker will have:
        checker = 00000000 00000000 00000000 00001011
        And if we see 'b' again:
        'b' = 00000000 00000000 00000000 00000010

        it will do the following:
        00000000 00000000 00000000 00001011
        & (AND)
        00000000 00000000 00000000 00000010
        -----------------------------------
        00000000 00000000 00000000 00000010

        Since this number is different than '0' it means that the character
        was seen before, because on that character index we already have a 
        1 bit value
         */
        if ((checker & singleBitOnPosition) > 0) {
            return false;
        }

        /* 
        Remember that 
        checker |= singleBitOnPosition is the same as  
        checker = checker | singleBitOnPosition
        Sometimes it is easier to see it expanded like that.

        What this achieves is that it builds the checker to have the new 
        value it hasn't seen, by doing an OR between checker and the value 
        representing this character index as a 1. e.g.
        If the character is 'f' and the checker has seen 'g' and 'a', the 
        following will happen

        'f' = 00000000 00000000 00000000 00100000
        checker(seen 'a' and 'g' so far) = 00000000 00000000 00000000 01000001

        00000000 00000000 00000000 00100000
        | (OR)
        00000000 00000000 00000000 01000001
        -----------------------------------
        00000000 00000000 00000000 01100001

        Therefore getting a new checker as 00000000 00000000 00000000 01100001

         */
        checker |= singleBitOnPosition;
    }
    return true;
}

2
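Great explanation. Thank you!
Hormigas

Clear explanation..thanks

Great explanation. Easy to understand. Thanks
Anil Kumar

That's the best one
Vladimir Nabokov

This is why comments were invented.
Mr. Suryaa Jha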

30

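I also assume that your example comes from the book Cracking the Coding Interview, and my answer relates to that context.

In order to use this algorithm to solve the problem, we have to accept that we are only going to pass characters from a to z (lowercase).

As there are only 26 letters and they are properly ordered in the encoding table we use, this guarantees that every possible difference str.charAt(i) - 'a' will be less than 32 (the size in bits of the int variable checker).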
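Since everything depends on that assumption, one option is to enforce it instead of trusting the caller. The sketch below is a variation, not the book's code: LowercaseOnlyChecker and isUniqueLowercase are hypothetical names, and rejecting bad input with IllegalArgumentException is my own choice.

```java
final class LowercaseOnlyChecker {
    // Same bit-flag idea, but input outside 'a'..'z' is rejected up front,
    // so the shift amount is always in the range 0..25.
    static boolean isUniqueLowercase(String str) {
        int checker = 0;
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (c < 'a' || c > 'z') {
                throw new IllegalArgumentException("only a-z is supported, got: " + c);
            }
            int mask = 1 << (c - 'a');
            if ((checker & mask) != 0) return false;
            checker |= mask;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isUniqueLowercase("abc"));  // true
        System.out.println(isUniqueLowercase("test")); // false
    }
}
```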

As explained by Snowbear, we are going to use the checker variable as an array of bits. Let's work through an example:

Let's say str equals "test"

  • First pass (i = t)

checker == 0 (00000000000000000000000000000000)

In ASCII, val = str.charAt(i) - 'a' = 116 - 97 = 19
What about 1 << val ?
1          == 00000000000000000000000000000001
1 << 19    == 00000000000010000000000000000000
checker |= (1 << val) means checker = checker | (1 << val)
so checker = 00000000000000000000000000000000 | 00000000000010000000000000000000
checker == 524288 (00000000000010000000000000000000)
  • Second pass (i = e)

checker == 524288 (00000000000010000000000000000000)

val = 101 - 97 = 4
1          == 00000000000000000000000000000001
1 << 4     == 00000000000000000000000000010000
checker |= (1 << val) 
so checker = 00000000000010000000000000000000 | 00000000000000000000000000010000
checker == 524304 (00000000000010000000000000010000)

and so on, until we find an already-set bit in checker for a specific character via the condition

(checker & (1 << val)) > 0
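The numbers in this walkthrough can be checked with a few lines of Java. This is just a verification sketch: TestWalkthrough and checkerAfter are names invented for it, and it assumes lowercase input as above.

```java
final class TestWalkthrough {
    // Checker value after OR-ing in the first `count` characters of `str`
    // (assumes 'a'..'z' only).
    static int checkerAfter(String str, int count) {
        int checker = 0;
        for (int i = 0; i < count; i++) {
            checker |= 1 << (str.charAt(i) - 'a');
        }
        return checker;
    }

    public static void main(String[] args) {
        String str = "test";
        System.out.println(checkerAfter(str, 1)); // 524288 ('t' sets bit 19)
        System.out.println(checkerAfter(str, 2)); // 524304 ('e' sets bit 4)
        System.out.println(checkerAfter(str, 3)); // 786448 ('s' sets bit 18)

        // The fourth character is 't' again, so its bit is already set.
        int mask = 1 << (str.charAt(3) - 'a');
        System.out.println((checkerAfter(str, 3) & mask) > 0); // true: duplicate found
    }
}
```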

Hope it helps


2
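A much better explanation than the rest, IMO, but one thing I still don't get is checker = 00000000000010000000000000000000 | 00000000000000000000000000010000. Isn't that the bitwise OR operator |=? Wouldn't that choose one value or the other? Why does it keep and set both bits?
CodeCrack, 2015

@CodeCrack You said it yourself: it is a bitwise OR. It compares at the bit level, not at the bit-array level. Note: an int is a bit array
MusicMan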

7

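A couple of excellent answers have already been provided above, so I don't want to repeat everything that has been said. But I did want to add a few things to help with the program above, as I just worked through the same program and had a couple of questions; after spending some time on it, I understand the program better.

First of all, "checker" is used to keep track of the characters already traversed in the String, in order to see whether any character is repeated.

Now, "checker" is an int, so it holds only 32 bits, i.e. 4 bytes (in Java this does not depend on the platform), and this program can therefore only work correctly for a character set within a range of 32 characters. That is why the program subtracts 'a' from each character: it makes the program work for lowercase characters only. If you mix lowercase and uppercase characters, it will not work.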
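If mixed upper and lower case is all you need, one possible variation (my own, not from the book) is to use a long, which gives 64 flags. The range 'A'..'z' covers codes 65 to 122, 58 values in total, so it fits. The names LongCheckerDemo and isUniqueMixedCase below are hypothetical, and the input is assumed to stay within 'A'..'z'.

```java
final class LongCheckerDemo {
    // Uses a 64-bit long as the flag store. Assumes every character is in
    // 'A'..'z' (codes 65..122); anything else is rejected.
    static boolean isUniqueMixedCase(String str) {
        long checker = 0L;
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (c < 'A' || c > 'z') {
                throw new IllegalArgumentException("only 'A'..'z' is supported, got: " + c);
            }
            long mask = 1L << (c - 'A'); // 1L so that the shift is done on 64 bits
            if ((checker & mask) != 0L) return false;
            checker |= mask;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isUniqueMixedCase("aA"));  // true: 'a' and 'A' use different bits
        System.out.println(isUniqueMixedCase("aAa")); // false
    }
}
```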

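By the way, if you do not subtract 'a' from each character (see the statement below), the program will still work correctly for a String with only uppercase characters or a String with only lowercase characters, because Java uses only the low five bits of the shift count for an int. So the scope of the program grows from lowercase characters only to uppercase characters as well, but the two cannot be mixed.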

int val = str.charAt(i) - 'a'; 
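Here is a small sketch of that claim. NoSubtractDemo and isUniqueNoSubtract are made-up names; the only change from the original loop is that the character code is used directly as the shift count, and != 0 is used for the test.

```java
final class NoSubtractDemo {
    // The loop without "- 'a'". For an int, Java takes the shift count modulo 32,
    // so 'a' (97) and 'A' (65) both end up on bit 1.
    static boolean isUniqueNoSubtract(String str) {
        int checker = 0;
        for (int i = 0; i < str.length(); i++) {
            int mask = 1 << str.charAt(i);
            if ((checker & mask) != 0) return false;
            checker |= mask;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isUniqueNoSubtract("abcxyz")); // true
        System.out.println(isUniqueNoSubtract("ABCXYZ")); // true
        System.out.println(isUniqueNoSubtract("aA"));     // false, although 'a' and 'A' differ
    }
}
```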

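However, I wanted to write a generic program using bitwise operations that works for any ASCII character, without worrying about uppercase, lowercase, digits or special characters. To do this, our "checker" must be large enough to store 256 characters (the extended ASCII character set size). An int in Java will not work, as it can only store 32 bits. Hence, in the program below, I am using the BitSet class available in the JDK, which accepts any user-defined size when a BitSet object is instantiated.

Here is a program that does the same thing as the program above written with bitwise operators, but it works for a String containing any character from the ASCII character set.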

import java.util.BitSet;

public static boolean isUniqueStringUsingBitVectorClass(String s) {

    final int ASCII_CHARACTER_SET_SIZE = 256;

    final BitSet tracker = new BitSet(ASCII_CHARACTER_SET_SIZE);

    // if more than  256 ASCII characters then there can't be unique characters
    if(s.length() > 256) {
        return false;
    }

    //this will be used to keep the location of each character in String
    final BitSet charBitLocation = new BitSet(ASCII_CHARACTER_SET_SIZE);

    for(int i = 0; i < s.length(); i++) {

        int charVal = s.charAt(i);
        charBitLocation.set(charVal); //set the char location in BitSet

        //check if tracker has already bit set with the bit present in charBitLocation
        if(tracker.intersects(charBitLocation)) {
            return false;
        }

        //set the tracker with new bit from charBitLocation
        tracker.or(charBitLocation);

        charBitLocation.clear(); //clear charBitLocation to store bit for character in the next iteration of the loop

    }

    return true;

}

1
I was looking for this solution, but there is no need for two BitSet variables. The tracker alone is enough. Updated loop code: for(int i = 0; i < s.length(); i++) { int charVal = s.charAt(i); if(tracker.get(charVal)) { return false; } tracker.set(charVal); }
zambro

7

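Reading Ivan's answer above really helped me, although I would phrase it somewhat differently.

The << in (1 << val) is a bit-shifting operator. It takes 1 (which in binary is represented as 000000001, with as many leading zeroes as you like or as memory allocates) and shifts it to the left by val places. Since we are assuming a-z only and subtracting a each time, each letter has a value of 0-25, which is that letter's index from the right in the checker integer's boolean representation, since we move the 1 to the left val times.

At the end of each check, we see the |= operator. This merges two binary numbers, replacing 0's with 1's wherever a 1 exists in either operand at that index. Here, that means that wherever a 1 exists in (1 << val), that 1 is copied over into checker, while all of checker's existing 1's are preserved.

As you can probably guess, a 1 functions here as a boolean flag for true. When we check whether a character is already represented in the string, we compare checker, which at this point is essentially an array of boolean flags (1 values) at the indexes of the characters already seen, with what is essentially an array of boolean values with a single 1 flag at the index of the current character.

The & operator performs this check. Similar to |=, the & operator copies over a 1 only if both operands have a 1 at that index. So, essentially, only flags already present in checker that are also present in (1 << val) are copied over. In this case, that means a 1 is present anywhere in the result of checker & (1 << val) only if the current character has already been seen. And if a 1 is present anywhere in the result of that operation, the result is > 0, and the method returns false.

This is, I'm guessing, why bit vectors are also called bit arrays: even though they are not of the array data type, they can be used much like arrays to store boolean flags.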
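To illustrate that last point, here is a sketch of the same check written twice: once with a real boolean[26] and once with the bits of an int. BooleanArrayEquivalent and both method names are hypothetical, and both versions assume lowercase a-z input.

```java
final class BooleanArrayEquivalent {
    // One boolean per letter: seen[index] plays the role of bit `index`.
    static boolean isUniqueWithArray(String str) {
        boolean[] seen = new boolean[26];
        for (int i = 0; i < str.length(); i++) {
            int index = str.charAt(i) - 'a';
            if (seen[index]) return false; // corresponds to (checker & (1 << index)) != 0
            seen[index] = true;            // corresponds to checker |= (1 << index)
        }
        return true;
    }

    // The same logic with the 26 flags packed into one int.
    static boolean isUniqueWithBits(String str) {
        int checker = 0;
        for (int i = 0; i < str.length(); i++) {
            int index = str.charAt(i) - 'a';
            if ((checker & (1 << index)) != 0) return false;
            checker |= (1 << index);
        }
        return true;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"abc", "azya", "test", ""}) {
            System.out.println(s + ": " + isUniqueWithArray(s) + " " + isUniqueWithBits(s));
        }
    }
}
```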


1
Very helpful, thanks for the sprinkle of Java info.
Bachiri Taoufiq Abderrahman

4

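Simple explanation (with JS code below)

  • At the machine level, an integer variable is an array of 32 bits
  • All bitwise operations are 32-bit
  • They are independent of the OS / CPU architecture and of the number system chosen by the language (e.g. JS numbers are 64-bit floating point).
  • This approach to finding duplicates is similar to storing characters in an array of size 32, where we set the 0th index if we find a in the string, the 1st for b, and so on.
  • A duplicate character in the string will find its corresponding bit already occupied, or, in this case, set to 1.
  • Ivan has already explained how this index calculation works in a previous answer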

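Summary of operations:

  • Perform an AND operation between checker and the bit for the character's index
  • Internally both are Int-32-Arrays
  • It is a bitwise operation between these two.
  • Check whether the output of the operation is nonzero
  • If output != 0
    • The index-th bit is set in both arrays
    • Thus it is a duplicate.
  • If output == 0
    • This character has not been found so far
    • Perform an OR operation between checker and the bit for the character's index
    • Thereby, the index-th bit is updated to 1
    • Assign the output to checker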

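Assumptions:

  • We have assumed that we will get all lowercase characters
  • And that size 32 is enough
  • Hence, we began our index counting from 96 as the reference point, considering that the ASCII code for a is 97

Given below is the JavaScript source code.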

function checkIfUniqueChars (str) {

    var checker = 0; // treated as a 32-bit integer by the bitwise operators

    for (var i = 0; i< str.length; i++) {
        var index = str[i].charCodeAt(0) - 96;
        var bitRepresentationOfIndex = 1 << index;

        if ( (checker & bitRepresentationOfIndex) !== 0) {
            console.log(str, false);
            return false;
        } else {
            checker = (checker | bitRepresentationOfIndex);
        }
    }
    console.log(str, true);
    return true;
}

checkIfUniqueChars("abcdefghi");  // true
checkIfUniqueChars("aabcdefghi"); // false
checkIfUniqueChars("abbcdefghi"); // false
checkIfUniqueChars("abcdefghii"); // false

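Note that in JS, although numbers are 64-bit floating-point values, bitwise operations are always performed on 32 bits.

Example: if the string is aa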

// checker is initialized to 32-bit-Int(0)
// therefore, checker is
checker= 00000000000000000000000000000000

i = 0

str[0] is 'a'
str[i].charCodeAt(0) - 96 = 1
1 << 1 = 00000000000000000000000000000010

checker 'AND' (1 << 1) = 00000000000000000000000000000000
Boolean(0) == false

// So, we go for the 'OR' operation.

checker = checker OR (1 << 1)
checker = 00000000000000000000000000000010

i = 1

str[1] is 'a'
str[i].charCodeAt(0) - 96 = 1

checker= 00000000000000000000000000000010
a      = 00000000000000000000000000000010

checker 'AND' (1 << 1) = 00000000000000000000000000000010
Boolean(2) == true
// We've our duplicate now

3

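Let's break down the code line by line.

int checker = 0; We are initializing a checker which will help us find duplicate values.

int val = str.charAt(i) - 'a'; We are getting the ASCII value of the character at the i-th position of the string and subtracting the ASCII value of 'a' from it. Since the assumption is that the string contains lowercase characters only, the number of distinct characters is limited to 26. Hence, the value of 'val' will always be >= 0 (and at most 25).

if ((checker & (1 << val)) > 0) return false;

checker |= (1 << val);

Now this is the tricky part. Let's consider an example with the string "abcda". This should ideally return false.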

For loop iteration 1:

checker: 00000000000000000000000000000000

val: 97 - 97 = 0

1 << 0: 00000000000000000000000000000001

checker & (1 << val): 00000000000000000000000000000000 is not > 0

Hence checker: 00000000000000000000000000000001

For loop iteration 2:

checker: 00000000000000000000000000000001

val: 98 - 97 = 1

1 << 1: 00000000000000000000000000000010

checker & (1 << val): 00000000000000000000000000000000 is not > 0

Hence checker: 00000000000000000000000000000011

For loop iteration 3:

checker: 00000000000000000000000000000011

val: 99 - 97 = 2

1 << 2: 00000000000000000000000000000100

checker & (1 << val): 00000000000000000000000000000000 is not > 0

Hence checker: 00000000000000000000000000000111

For loop iteration 4:

checker: 00000000000000000000000000000111

val: 100 - 97 = 3

1 << 3: 00000000000000000000000000001000

checker & (1 << val): 00000000000000000000000000000000 is not > 0

Hence checker: 00000000000000000000000000001111

For loop iteration 5:

checker: 00000000000000000000000000001111

val: 97 - 97 = 0

1 << 0: 00000000000000000000000000000001

checker & (1 << val): 00000000000000000000000000000001 is > 0

Hence return false.


val: 99 - 97 should be 2, and val: 100 - 97 should be 3
Brosef

2
public static void main (String[] args)
{
    //In order to understand this algorithm, it is necessary to understand the following:

    //int checker = 0;
    //Here we are using the primitive int almost like an array of size 32 where the only values can be 1 or 0
    //Since in Java, we have 4 bytes per int, 8 bits per byte, we have a total of 4x8=32 bits to work with

    //int val = str.charAt(i) - 'a';
    //In order to understand what is going on here, we must realize that all characters have a numeric value
    for (int i = 0; i < 256; i++)
    {
        char val = (char)i;
        System.out.print(val);
    }

    //The output is something like:
    //             !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~ ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ
    //There seems to be ~15 leading spaces that do not copy paste well, so I had to use real spaces instead

    //To only print the characters from 'a' on forward:
    System.out.println();
    System.out.println();

    for (int i=0; i < 256; i++)
    {
        char val = (char)i;
        //char val2 = val + 'a'; //incompatible types. required: char found: int
        int val2 = val + 'a';  //shift to the 'a', we must use an int here otherwise the compiler will complain
        char val3 = (char)val2;  //convert back to char. there should be a more elegant way of doing this.
        System.out.print(val3);
    }

    //Notice how the following does not work:
    System.out.println();
    System.out.println();

    for (int i=0; i < 256; i++)
    {
        char val = (char)i;
        int val2 = val - 'a';
        char val3 = (char)val2;
        System.out.print(val3);
    }
    //I'm not sure why this spills out into 2 lines:
    //EDIT I cant seem to copy this into stackoverflow!

    System.out.println();
    System.out.println();

    //So back to our original algorithm:
    //int val = str.charAt(i) - 'a';
    //We take the i'th character of the String as a number and subtract 'a' from it, which maps 'a'..'z' to 0..25

    //if ((checker & (1 << val)) > 0) return false;
    //This line is quite a mouthful, lets break it down:
    System.out.println(0<<0);
    //00000000000000000000000000000000
    System.out.println(0<<1);
    //00000000000000000000000000000000
    System.out.println(0<<2);
    //00000000000000000000000000000000
    System.out.println(0<<3);
    //00000000000000000000000000000000
    System.out.println(1<<0);
    //00000000000000000000000000000001
    System.out.println(1<<1);
    //00000000000000000000000000000010 == 2
    System.out.println(1<<2);
    //00000000000000000000000000000100 == 4
    System.out.println(1<<3);
    //00000000000000000000000000001000 == 8
    System.out.println(2<<0);
    //00000000000000000000000000000010 == 2
    System.out.println(2<<1);
    //00000000000000000000000000000100 == 4
    System.out.println(2<<2);
    // == 8
    System.out.println(2<<3);
    // == 16
    System.out.println("3<<0 == "+(3<<0));
    // != 4 why 3???
    System.out.println(3<<1);
    //00000000000000000000000000000011 == 3
    //shift left by 1
    //00000000000000000000000000000110 == 6
    System.out.println(3<<2);
    //00000000000000000000000000000011 == 3
    //shift left by 2
    //00000000000000000000000000001100 == 12
    System.out.println(3<<3);
    // 24

    //It seems that the -  'a' is not necessary
    //Back to if ((checker & (1 << val)) > 0) return false;
    //(1 << val means we simply shift 1 by the numeric representation of the current character
    //the bitwise & works as such:
    System.out.println();
    System.out.println();
    System.out.println(0&0);    //0
    System.out.println(0&1);       //0
    System.out.println(0&2);          //0
    System.out.println();
    System.out.println();
    System.out.println(1&0);    //0
    System.out.println(1&1);       //1
    System.out.println(1&2);          //0
    System.out.println(1&3);             //1
    System.out.println();
    System.out.println();
    System.out.println(2&0);    //0
    System.out.println(2&1);       //0   0010 & 0001 == 0000 = 0
    System.out.println(2&2);          //2  0010 & 0010 == 2
    System.out.println(2&3);             //2  0010 & 0011 = 0010 == 2
    System.out.println();
    System.out.println();
    System.out.println(3&0);    //0    0011 & 0000 == 0
    System.out.println(3&1);       //1  0011 & 0001 == 0001 == 1
    System.out.println(3&2);          //2  0011 & 0010 == 0010 == 2, 0&1 = 0 1&1 = 1
    System.out.println(3&3);             //3 why?? 3 == 0011 & 0011 == 3???
    System.out.println(9&11);   // should be... 1001 & 1011 == 1001 == 8+1 == 9?? yay!

    //so when we do (1 << val), we take 0001 and shift it by say, 97 for 'a', since any 'a' is also 97 (Java uses only the low 5 bits of an int shift count, so this is a shift by 97 & 31 == 1)

    //why is it that the result of bitwise & is > 0 means its a dupe?
    //lets see..

    //0011 & 0011 is 0011 means its a dupe
    //0000 & 0011 is 0000 means no dupe
    //0010 & 0001 is 0000 means its no dupe
    //hmm
    //only when it is all 0000 means its no dupe

    //so moving on:
    //checker |= (1 << val)
    //the |= needs exploring:

    int x = 0;
    int y = 1;
    int z = 2;
    int a = 3;
    int b = 4;
    System.out.println("x|=1 "+(x|=1));  //1
    System.out.println(x|=1);     //1
    System.out.println(x|=1);      //1
    System.out.println(x|=1);       //1
    System.out.println(x|=1);       //1
    System.out.println(y|=1); // 0001 |= 0001 == ?? 1????
    System.out.println(y|=2); // ??? == 3 why??? 0001 |= 0010 == 3... hmm
    System.out.println(y);  //should be 3?? 
    System.out.println(y|=1); //already 3 so... 0011 |= 0001... maybe 0011 again? 3?
    System.out.println(y|=2); //0011 |= 0010..... hmm maybe.. 0011??? still 3? yup!
    System.out.println(y|=3); //0011 |= 0011, still 3
    System.out.println(y|=4);  //0011 |= 0100.. should be... 0111? so... 11? no its 7
    System.out.println(y|=5);  //so we're at 7 which is 0111, 0111 |= 0101 means 0111 still 7
    System.out.println(b|=9); //so 0100 |= 1001 is... seems like xor?? or just or i think, just or... so its 1101 so its 13? YAY!

    //so the |= is just a bitwise OR!
}

public static boolean isUniqueChars(String str) {
    int checker = 0;
    for (int i = 0; i < str.length(); ++i) {
        int val = str.charAt(i) - 'a';  //the - 'a' maps 'a'..'z' to bits 0..25; without it Java would still wrap the shift count modulo 32
        if ((checker & (1 << val)) > 0) return false;
        checker |= (1 << val);
    }
    return true;
}

public static boolean is_unique(String input)
{
    int using_int_as_32_flags = 0;
    for (int i=0; i < input.length(); i++)
    {
        int numeric_representation_of_char_at_i = input.charAt(i);
        int using_0001_and_shifting_it_by_the_numeric_representation = 1 << numeric_representation_of_char_at_i; //here we shift the bitwise representation of 1 by the numeric val of the character
        int result_of_bitwise_and = using_int_as_32_flags & using_0001_and_shifting_it_by_the_numeric_representation;
        boolean already_bit_flagged = result_of_bitwise_and > 0;              //needs clarification why is it that the result of bitwise & is > 0 means its a dupe?
        if (already_bit_flagged)
            return false;
        using_int_as_32_flags |= using_0001_and_shifting_it_by_the_numeric_representation;
    }
    return true;
}

0

The previous posts explain well what the code block does, and I want to add my simple solution using the BitSet Java data structure:

import java.util.BitSet;

private static String isUniqueCharsUsingBitSet(String string) {
    BitSet bitSet = new BitSet();
    for (int i = 0; i < string.length(); ++i) {
        int val = string.charAt(i);
        // a bit that is already set means this character was seen before
        if (bitSet.get(val)) return "NO";
        bitSet.set(val);
    }
    return "YES";
}

0
Line 1:   public static boolean isUniqueChars(String str) {
Line 2:      int checker = 0;
Line 3:      for (int i = 0; i < str.length(); ++i) {
Line 4:          int val = str.charAt(i) - 'a';
Line 5:          if ((checker & (1 << val)) > 0) return false;
Line 6:         checker |= (1 << val);
Line 7:      }
Line 8:      return true;
Line 9:   }

This is the way I understood it, using JavaScript. Assume the input var inputChar = "abca"; //find if inputChar has all unique characters

Let's start

Line 4: int val = str.charAt(i) - 'a';

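The line above finds the binary value of the first character in inputChar, which is a. In ASCII a = 97, and 97 converted to binary is 1100001.

In JavaScript, e.g.: "a".charCodeAt().toString(2) returns 1100001

checker = 0 // binary 32-bit representation = 00000000000000000000000000000000

checker = 1100001 | checker; // checker becomes 1100001 (in 32-bit representation it becomes 000000000.....00001100001)

But I want my bitmask (int checker) to set only one bit, whereas checker is now 1100001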

Line 4:          int val = str.charAt(i) - 'a';

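This is where the code above comes in handy. I always subtract 97 (the ASCII value of a)

val = 0; // 97 - 97  Which is  a - a
val = 1; // 98 - 97 Which is b - a
val = 2;  // 99 - 97 Which is c - a

Let's use val, which has now been reset

Lines 5 and 6 are well explained in @Ivan's answer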


0

In case anyone is looking for the Kotlin equivalent of checking for unique characters in a string using a bit vector

fun isUnique(str: String): Boolean {
    var checker = 0
    for (i in str.indices) {
        val bit = str.get(i) - 'a'
        if (checker.and(1 shl bit) > 0) return false
        checker = checker.or(1 shl bit)
    }
    return true
}

Reference: https://www.programiz.com/kotlin-programming/bitwise

Licensed under cc by-sa 3.0 with attribution required.