Best practices for circular shift (rotate) operations in C++

Question 1

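Left and right shift operators (<< and >>) are already available in C++. However, I couldn't find out how to perform circular shift or rotate operations.

How can operations like "Rotate Left" and "Rotate Right" be performed?

Rotating right twice here

Initial --> 1000 0011 0100 0010

should result in:

Final   --> 1010 0000 1101 0000

An example would be helpful.

(Editor's note: Many common ways of expressing rotates in C suffer from undefined behaviour if the rotate count is zero, or compile to more than just a single rotate machine instruction. This question's answers should document best practices.)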

Question 2

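See also an earlier version of this answer on another rotate question, with some more details about what asm gcc/clang produce for x86.

The most compiler-friendly way to express a rotate in C and C++ that avoids any undefined behaviour seems to be John Regehr's implementation. I've adapted it to rotate by the width of the type (using fixed-width types like uint32_t).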

#include <stdint.h>   // for uint32_t
#include <limits.h>   // for CHAR_BIT
// #define NDEBUG
#include <assert.h>

static inline uint32_t rotl32 (uint32_t n, unsigned int c)
{
  const unsigned int mask = (CHAR_BIT*sizeof(n) - 1);  // assumes width is a power of 2.

  // assert ( (c<=mask) &&"rotate by type width or more");
  c &= mask;
  return (n<<c) | (n>>( (-c)&mask ));
}

static inline uint32_t rotr32 (uint32_t n, unsigned int c)
{
  const unsigned int mask = (CHAR_BIT*sizeof(n) - 1);

  // assert ( (c<=mask) &&"rotate by type width or more");
  c &= mask;
  return (n>>c) | (n<<( (-c)&mask ));
}

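This works for any unsigned integer type, not just uint32_t, so you could make versions for other sizes.

See also a C++11 template version with lots of safety checks (including a static_assert that the type width is a power of 2), which isn't the case on some 24-bit DSPs or 36-bit mainframes, for example.

I'd recommend only using the template as a back-end for wrappers with names that include the rotate width explicitly. Integer-promotion rules mean that rotl_template(u16 & 0x11UL, 7) would do a 32-bit or 64-bit rotate, not a 16-bit one (depending on the width of unsigned long). Even uint16_t & uint16_t is promoted to signed int by C++'s integer-promotion rules, except on platforms where int is no wider than uint16_t.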
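
As a rough illustration of the wrapper approach described above, here is a minimal C++14 sketch. It is not the linked template version: the name rotl_template comes from the example in this answer, but its body, the checks, and the wrapper names rotl16 / rotl32 are assumptions made for illustration, on a platform where unsigned int is 32 bits wide.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical back-end: a sketch, not the template linked above.
template <typename T>
constexpr T rotl_template(T n, unsigned int c)
{
    static_assert(std::is_unsigned<T>::value, "rotate only makes sense for unsigned types");
    constexpr unsigned int width = std::numeric_limits<T>::digits;
    static_assert((width & (width - 1)) == 0, "type width must be a power of 2");

    // Do the arithmetic in a type at least as wide as unsigned int, so that
    // promotion of narrow types to signed int never gets involved.
    using wide_t = typename std::common_type<T, unsigned int>::type;
    const unsigned int mask = width - 1;
    c &= mask;
    const wide_t w = n;
    return static_cast<T>((w << c) | (w >> ((-c) & mask)));
}

// Wrappers whose names state the rotate width. The parameter type converts
// the argument back to that width, whatever it was promoted to by the caller.
constexpr std::uint16_t rotl16(std::uint16_t n, unsigned int c) { return rotl_template<std::uint16_t>(n, c); }
constexpr std::uint32_t rotl32(std::uint32_t n, unsigned int c) { return rotl_template<std::uint32_t>(n, c); }

static_assert(rotl16(0x8001u, 1) == 0x0003u, "the top bit wraps around to bit 0");
static_assert(rotl32(0x80000001u, 1) == 0x00000003u, "same pattern at 32 bits");
static_assert(rotl32(0x12345678u, 32) == 0x12345678u, "a count equal to the width is masked to 0");
```

With wrappers like these, a call such as rotl16(u16 & 0x11UL, 7) still rotates 16 bits, because the wider intermediate value is converted to uint16_t before the template is instantiated.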

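On x86, this version inlines to a single rol r32, cl (or rol r32, imm8) with compilers that grok it, because the compiler knows that x86 rotate and shift instructions mask the shift count the same way the C source does.

Compiler support for this UB-avoiding idiom on x86, for uint32_t x and unsigned int n for variable-count shifts:

clang: recognized for variable-count rotates since clang 3.5; multiple shift + or instructions before that.
gcc: recognized for variable-count rotates since gcc 4.9; multiple shift + or instructions before that. gcc 5 and later also optimize away the branch and mask in the Wikipedia version, using just a ror or rol instruction for variable counts.
icc: supported for variable-count rotates since ICC 13 or earlier. Constant-count rotates use shld edi,edi,7, which is slower and takes more bytes than rol edi,7 on some CPUs (especially AMD, but also some Intel), when BMI2 isn't available for rorx eax,edi,25 to save a MOV.
MSVC: x86-64 CL19: only recognized for constant-count rotates. (The Wikipedia idiom is recognized, but the branch and AND aren't optimized away.) Use the _rotl / _rotr intrinsics from <intrin.h> on x86 (including x86-64).

gcc for ARM uses an and r1, r1, #31 for variable-count rotates, but still does the actual rotate with a single instruction: ror r0, r0, r1. So gcc doesn't realize that rotate counts are inherently modular. As the ARM docs say, "ROR with shift length, n, more than 32 is the same as ROR with shift length n-32". I think gcc gets confused here because left/right shifts on ARM saturate the count, so a shift by 32 or more will clear the register. (Unlike x86, where shifts mask the count the same way rotates do.) It probably decides it needs an AND instruction before recognizing the rotate idiom, because of how non-circular shifts work on that target.

Current x86 compilers still use an extra instruction to mask a variable count for 8-bit and 16-bit rotates, probably for the same reason they don't avoid the AND on ARM. This is a missed optimization, because performance doesn't depend on the rotate count on any x86-64 CPU. (Masking of counts was introduced with the 286 for performance reasons, because it handled shifts iteratively, not with constant latency like modern CPUs.)

BTW, prefer rotate-right for variable-count rotates, to avoid making the compiler do 32-n to implement a left rotate on architectures like ARM and MIPS that only provide a rotate-right. (This optimizes away with compile-time-constant counts.)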
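
To make the 32-n remark concrete, the sketch below expresses a left rotate in terms of a right rotate, which is roughly the extra work a compiler has to emit on a target that only has a rotate-right instruction. It assumes a 32-bit width and C++14; rotr32_sketch and rotl32_via_rotr are illustrative names, not functions from any library.

```cpp
#include <cstdint>

// Same masking idiom as rotr32 above, written as a constexpr function.
constexpr std::uint32_t rotr32_sketch(std::uint32_t n, unsigned int c)
{
    const unsigned int mask = 31;
    c &= mask;
    return (n >> c) | (n << ((-c) & mask));
}

// Rotating left by c is the same as rotating right by (32 - c) mod 32.
// Negating the unsigned count and masking computes that value without
// ever producing a count of 32.
constexpr std::uint32_t rotl32_via_rotr(std::uint32_t n, unsigned int c)
{
    return rotr32_sketch(n, (-c) & 31u);
}

static_assert(rotl32_via_rotr(0x80000001u, 1) == 0x00000003u, "left by 1 == right by 31");
static_assert(rotl32_via_rotr(0x12345678u, 0) == 0x12345678u, "a zero count leaves the value unchanged");
```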

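Fun fact: ARM doesn't really have dedicated shift/rotate instructions; it's just MOV with the source operand going through the barrel shifter in ROR mode: mov r0, r0, ror r1. So a rotate can fold into a register-source operand for an EOR instruction or something similar.

Make sure you use unsigned types for n and the return value, or else it won't be a rotate. (gcc for x86 targets does arithmetic right shifts, shifting in copies of the sign bit rather than zeroes, which causes a problem when you OR the two shifted values together. Right-shifting a negative signed integer is implementation-defined behaviour in C.)

Also, make sure the shift count is an unsigned type, because (-n)&31 with a signed type could be one's complement or sign/magnitude, and not the same as the modular 2^n you get with unsigned or two's complement. (See the comments on Regehr's blog post.) unsigned int does well on every compiler I've looked at, for every width of x. Some other types actually defeat the idiom recognition for some compilers, so don't just use the same type as x.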
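
If the value you start with is signed, one way to follow the advice above is to convert it to the unsigned type of the same width, rotate, and convert back. The following is only a sketch under that assumption (C++14); rotr32_signed is a hypothetical helper name.

```cpp
#include <cstdint>

constexpr std::int32_t rotr32_signed(std::int32_t v, unsigned int c)
{
    const unsigned int mask = 31;
    c &= mask;
    // Signed-to-unsigned conversion is modular and well defined.
    const std::uint32_t u = static_cast<std::uint32_t>(v);
    const std::uint32_t r = (u >> c) | (u << ((-c) & mask));
    // Converting back is implementation-defined before C++20 when r does not
    // fit in int32_t; mainstream two's-complement compilers wrap as expected.
    return static_cast<std::int32_t>(r);
}

// -2 is 0xFFFFFFFE. A rotate right by 1 gives 0x7FFFFFFF, whereas an
// arithmetic shift of the signed value would have kept the sign bit set.
static_assert(rotr32_signed(-2, 1) == 2147483647, "zeroes, not sign bits, are shifted in");
```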

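Some compilers provide intrinsics for rotates, which is far better than inline asm if the portable version doesn't generate good code on the compiler you're targeting. There aren't cross-platform intrinsics for any compilers that I know of. These are some of the x86 options:

Intel documents that <immintrin.h> provides the _rotl and _rotl64 intrinsics, and the same for right rotates. MSVC requires <intrin.h>, while gcc requires <x86intrin.h>. An #ifdef takes care of gcc vs. icc, but clang doesn't seem to provide them anywhere, except in MSVC compatibility mode with -fms-extensions -fms-compatibility -fms-compatibility-version=17.00. And the asm it emits for them is poor (extra masking and a CMOV).
MSVC: _rotr8 and _rotr16.
gcc and icc (not clang): <x86intrin.h> also provides __rolb / __rorb for 8-bit rotate left/right, __rolw / __rorw (16-bit), __rold / __rord (32-bit), and __rolq / __rorq (64-bit, only defined for 64-bit targets). For narrow rotates, the implementation uses __builtin_ia32_rolhi or ...qi, but the 32-bit and 64-bit rotates are defined using shift/or (with no protection against UB, because the code in ia32intrin.h only has to work on gcc for x86). GNU C appears not to have any cross-platform __builtin_rotate functions the way it does for __builtin_popcount (which expands to whatever is optimal on the target platform, even if it's not a single instruction). Most of the time you get good code from idiom recognition.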

// For real use, probably use a rotate intrinsic for MSVC, or this idiom for other compilers.  This pattern of #ifdefs may be helpful
#if defined(__x86_64__) || defined(__i386__)

#ifdef _MSC_VER
#include <intrin.h>
#else
#include <x86intrin.h>  // Not just <immintrin.h> for compilers other than icc
#endif

uint32_t rotl32_x86_intrinsic(uint32_t x, unsigned n) {
  //return __builtin_ia32_rorhi(x, 7);  // 16-bit rotate, GNU C
  return _rotl(x, n);  // gcc, icc, msvc.  Intel-defined.
  //return __rold(x, n);  // gcc, icc.
  // can't find anything for clang
}
#endif

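Presumably some non-x86 compilers have intrinsics too, but let's not expand this community-wiki answer to include them all. (Maybe do that in the existing answer about intrinsics.)

(The old version of this answer suggested MSVC-specific inline asm (which only works for 32-bit x86 code), or http://www.devx.com/tips/Tip/14043 for a C version.)

Inline asm defeats many optimizations, especially MSVC-style, because it forces inputs to be stored/reloaded. A carefully written GNU C inline-asm rotate would allow the count to be an immediate operand for compile-time-constant shift counts, but it still couldn't optimize away entirely if the value to be shifted is also a compile-time constant after inlining. https://gcc.gnu.org/wiki/DontUseInlineAsm.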

Question 3

Since it's C++, use an inline function:

#include <climits>   // for CHAR_BIT

template <typename INT>
INT rol(INT val) {
    return (val << 1) | (val >> (sizeof(INT)*CHAR_BIT-1));
}

C++11 variant:

#include <climits>      // for CHAR_BIT
#include <type_traits>  // for std::is_unsigned

template <typename INT>
constexpr INT rol(INT val) {
    static_assert(std::is_unsigned<INT>::value,
                  "Rotate Left only makes sense for unsigned types");
    return (val << 1) | (val >> (sizeof(INT)*CHAR_BIT-1));
}

Question 4

Most compilers have intrinsics for that. Visual Studio, for example, has _rotr8 and _rotr16.

Question 5

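Definitively:

#include <climits>   // for CHAR_BIT

template<class T>
T ror(T x, unsigned int moves)
{
  const unsigned int bits = sizeof(T) * CHAR_BIT;
  moves %= bits;                // keep the count in [0, bits)
  if (moves == 0) return x;     // avoid shifting by the full width, which is undefined
  return (x >> moves) | (x << (bits - moves));
}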

Question 6

C++20 std::rotl and std::rotr

It has arrived! See http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p0553r4.html; the functions are added to the <bit> header.

cppreference says that the usage will be like:

#include <bit>
#include <bitset>
#include <cstdint>
#include <iostream>

int main()
{
    std::uint8_t i = 0b00011101;
    std::cout << "i          = " << std::bitset<8>(i) << '\n';
    std::cout << "rotl(i,0)  = " << std::bitset<8>(std::rotl(i,0)) << '\n';
    std::cout << "rotl(i,1)  = " << std::bitset<8>(std::rotl(i,1)) << '\n';
    std::cout << "rotl(i,4)  = " << std::bitset<8>(std::rotl(i,4)) << '\n';
    std::cout << "rotl(i,9)  = " << std::bitset<8>(std::rotl(i,9)) << '\n';
    std::cout << "rotl(i,-1) = " << std::bitset<8>(std::rotl(i,-1)) << '\n';
}

giving output:

i          = 00011101
rotl(i,0)  = 00011101
rotl(i,1)  = 00111010
rotl(i,4)  = 11010001
rotl(i,9)  = 00111010
rotl(i,-1) = 10001110

I'll give it a try when support arrives in GCC; GCC 9.1.0 with g++-9 -std=c++2a still doesn't support it.

The proposal says:

Header:

namespace std {
  // 25.5.5, rotating   
  template<class T>
    [[nodiscard]] constexpr T rotl(T x, int s) noexcept;
  template<class T>
    [[nodiscard]] constexpr T rotr(T x, int s) noexcept;

and:

25.5.5 Rotating [bitops.rot]

In the following descriptions, let N denote std::numeric_limits<T>::digits.

template<class T>
  [[nodiscard]] constexpr T rotl(T x, int s) noexcept;

Constraints: T is an unsigned integer type (3.9.1 [basic.fundamental]).

Let r be s % N.

Returns: If r is 0, x; if r is positive, (x << r) | (x >> (N - r)); if r is negative, rotr(x, -r).

template<class T>
  [[nodiscard]] constexpr T rotr(T x, int s) noexcept;

Constraints: T is an unsigned integer type (3.9.1 [basic.fundamental]).

Let r be s % N.

Returns: If r is 0, x; if r is positive, (x >> r) | (x << (N - r)); if r is negative, rotl(x, -r).
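
The wording above translates almost directly into code. Below is a sketch of that specification in C++14, for toolchains whose standard library does not ship <bit> yet. The names rotl_sketch and rotr_sketch are made up for this example, and this is not meant to reflect how any particular standard library implements std::rotl.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

template <class T>
constexpr T rotr_sketch(T x, int s) noexcept;

template <class T>
constexpr T rotl_sketch(T x, int s) noexcept
{
    static_assert(std::is_unsigned<T>::value, "T must be an unsigned integer type");
    // Shift in a type at least as wide as unsigned int to avoid promotion to signed int.
    using wide_t = typename std::common_type<T, unsigned int>::type;
    constexpr int N = std::numeric_limits<T>::digits;
    const int r = s % N;                       // keeps the sign of s
    if (r == 0) return x;
    if (r < 0) return rotr_sketch(x, -r);
    const wide_t w = x;
    return static_cast<T>((w << r) | (w >> (N - r)));
}

template <class T>
constexpr T rotr_sketch(T x, int s) noexcept
{
    static_assert(std::is_unsigned<T>::value, "T must be an unsigned integer type");
    using wide_t = typename std::common_type<T, unsigned int>::type;
    constexpr int N = std::numeric_limits<T>::digits;
    const int r = s % N;
    if (r == 0) return x;
    if (r < 0) return rotl_sketch(x, -r);
    const wide_t w = x;
    return static_cast<T>((w >> r) | (w << (N - r)));
}

// Same inputs as the cppreference example above: i = 0b00011101 = 0x1D.
static_assert(rotl_sketch<std::uint8_t>(0x1D, 0) == 0x1D, "rotl(i,0)");
static_assert(rotl_sketch<std::uint8_t>(0x1D, 4) == 0xD1, "rotl(i,4)");
static_assert(rotl_sketch<std::uint8_t>(0x1D, 9) == 0x3A, "rotl(i,9) == rotl(i,1)");
static_assert(rotl_sketch<std::uint8_t>(0x1D, -1) == 0x8E, "rotl(i,-1) == rotr(i,1)");
```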

A std::popcount was also added to count the number of 1 bits: How to count the number of set bits in a 32-bit integer?

Question 7

How about something like this, using the standard bitset...

#include <bitset> 
#include <iostream> 

template <std::size_t N> 
inline void 
rotate(std::bitset<N>& b, unsigned m) 
{ 
   m %= N;                     // keep the count in [0, N)
   b = b << m | b >> (N-m);
} 

int main() 
{ 
   std::bitset<8> b(15); 
   std::cout << b << '\n'; 
   rotate(b, 2); 
   std::cout << b << '\n'; 

   return 0;
}

HTH，

Question 8

If x is an 8-bit value, you can use this:

x=(x>>1 | x<<7);

Question 9

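In detail, you can apply the following logic.

If the bit pattern is 33602 as an integer:

1000 0011 0100 0010

and you need to rotate it right by 2, then first make a copy of the bit pattern and left-shift it by (length - right shift). Here the length is 16 and the right-shift count is 2, so 16 - 2 = 14.

After left-shifting 14 times (and keeping only the low 16 bits) you get:

1000 0000 0000 0000

Now right-shift the value 33602 by 2, as required. You get:

0010 0000 1101 0000

Now OR the value left-shifted by 14 with the value right-shifted by 2.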

1000 0000 0000 0000
0010 0000 1101 0000
===================
1010 0000 1101 0000
===================

That gives you the rotated value. Remember that bitwise operations are fast, and this doesn't even require a loop.
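
The steps above can be sketched in code as follows. This is only an illustration for a 16-bit value and a count between 1 and 15 (C++14); rotr16_by_steps is a hypothetical name.

```cpp
#include <cstdint>

constexpr std::uint16_t rotr16_by_steps(std::uint16_t value, unsigned int count)
{
    // Assumes 0 < count < 16; other counts are not handled in this sketch.
    const unsigned int width = 16;
    const unsigned int v = value;  // work in unsigned int to sidestep promotion to signed int

    // Step 1: left-shift a copy by (width - count) and keep only the low 16 bits.
    const unsigned int left = (v << (width - count)) & 0xFFFFu;
    // Step 2: right-shift the original by count.
    const unsigned int right = v >> count;
    // Step 3: OR the two parts together.
    return static_cast<std::uint16_t>(left | right);
}

// 33602 == 0x8342 == 1000 0011 0100 0010
// result   0xA0D0 == 1010 0000 1101 0000
static_assert(rotr16_by_steps(0x8342u, 2) == 0xA0D0u, "matches the worked example");
```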

Question 10

Assuming you want to rotate right by L bits, and the input x is a number with N bits:

unsigned ror(unsigned x, int L, int N) 
{
    unsigned lsbs = x & ((1 << L) - 1);
    return (x >> L) | (lsbs << (N-L));
}

Question 11

The correct answer is the following:

#define BitsCount( val ) ( sizeof( val ) * CHAR_BIT )
#define Shift( val, steps ) ( steps % BitsCount( val ) )
#define ROL( val, steps ) ( ( val << Shift( val, steps ) ) | ( val >> ( BitsCount( val ) - Shift( val, steps ) ) ) )
#define ROR( val, steps ) ( ( val >> Shift( val, steps ) ) | ( val << ( BitsCount( val ) - Shift( val, steps ) ) ) )

Question 12

Source code (x is the number of bits):

int x =8;
unsigned char data =15; //input
unsigned char tmp;
for(int i =0;i<x;i++)
{
printf("Data & 1    %d\n",data&1);
printf("Data Shifted value %d\n",data>>1^(data&1)<<(x-1));
tmp = data>>1|(data&1)<<(x-1);
data = tmp;  
}

Question 13

Another suggestion

template<class T>
inline T rotl(T x, unsigned char moves){
    unsigned char temp;
    __asm{
        mov temp, CL
        mov CL, moves
        rol x, CL
        mov CL, temp
    };
    return x;
}

Question 14

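Below is a slightly improved version of Dídac Pérez's answer, with both directions implemented, along with a demo of these functions using unsigned char and unsigned long long values. Several notes:

The functions are inlined for compiler optimizations.
I used the cout << +value trick for tersely outputting an unsigned char numerically, which I found here: https://stackoverflow.com/a/28414758/1599699
I recommend using the explicit <put the type here> syntax for clarity and safety.
I used unsigned char for the shiftNum parameter because of what I found in the Additional Details section here:

The result of a shift operation is undefined if additive-expression is negative or if additive-expression is greater than or equal to the number of bits in the (promoted) shift-expression.

Here's the code I'm using: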

#include <cstdlib>    // for system
#include <iostream>

using namespace std;

template <typename T>
inline T rotateAndCarryLeft(T rotateMe, unsigned char shiftNum)
{
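    static const unsigned char TBitCount = sizeof(T) * 8U;

    shiftNum %= TBitCount;                 // keep the count below the bit width
    if (shiftNum == 0U) return rotateMe;   // shifting by the full width is undefined

    return (rotateMe << shiftNum) | (rotateMe >> (TBitCount - shiftNum));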
}

template <typename T>
inline T rotateAndCarryRight(T rotateMe, unsigned char shiftNum)
{
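    static const unsigned char TBitCount = sizeof(T) * 8U;

    shiftNum %= TBitCount;                 // keep the count below the bit width
    if (shiftNum == 0U) return rotateMe;   // shifting by the full width is undefined

    return (rotateMe >> shiftNum) | (rotateMe << (TBitCount - shiftNum));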
}

int main()
{
    //00010100 == (unsigned char)20U
    //00000101 == (unsigned char)5U == rotateAndCarryLeft(20U, 6U)
    //01010000 == (unsigned char)80U == rotateAndCarryRight(20U, 6U)

    cout << "unsigned char " << 20U << " rotated left by 6 bits == " << +rotateAndCarryLeft<unsigned char>(20U, 6U) << "\n";
    cout << "unsigned char " << 20U << " rotated right by 6 bits == " << +rotateAndCarryRight<unsigned char>(20U, 6U) << "\n";

    cout << "\n";


    for (unsigned char shiftNum = 0U; shiftNum <= sizeof(unsigned char) * 8U; ++shiftNum)
    {
        cout << "unsigned char " << 21U << " rotated left by " << +shiftNum << " bit(s) == " << +rotateAndCarryLeft<unsigned char>(21U, shiftNum) << "\n";
    }

    cout << "\n";

    for (unsigned char shiftNum = 0U; shiftNum <= sizeof(unsigned char) * 8U; ++shiftNum)
    {
        cout << "unsigned char " << 21U << " rotated right by " << +shiftNum << " bit(s) == " << +rotateAndCarryRight<unsigned char>(21U, shiftNum) << "\n";
    }


    cout << "\n";

    for (unsigned char shiftNum = 0U; shiftNum <= sizeof(unsigned long long) * 8U; ++shiftNum)
    {
        cout << "unsigned long long " << 3457347ULL << " rotated left by " << +shiftNum << " bit(s) == " << rotateAndCarryLeft<unsigned long long>(3457347ULL, shiftNum) << "\n";
    }

    cout << "\n";

    for (unsigned char shiftNum = 0U; shiftNum <= sizeof(unsigned long long) * 8U; ++shiftNum)
    {
        cout << "unsigned long long " << 3457347ULL << " rotated right by " << +shiftNum << " bit(s) == " << rotateAndCarryRight<unsigned long long>(3457347ULL, shiftNum) << "\n";
    }

    cout << "\n\n";
    system("pause");
}

Question 15

--- Substituting RLC in 8051 C for speed --- Rotate left carry
Here is an example using RLC to update a serial 8 bit DAC msb first:
                               (r=DACVAL, P1.4= SDO, P1.5= SCLK)
MOV     A, r
MOV     B, #8
?1:
RLC     A
MOV     P1.4, C
CLR     P1.5
SETB    P1.5
DJNZ    B, ?1

Here is the code in 8051 C at its fastest:
sbit ACC_7  = ACC ^ 7 ; //define this at the top to access bit 7 of ACC
ACC     =   r;
B       =   8;  
do  {
P1_4    =   ACC_7;  // this assembles into mov c, acc.7  mov P1.4, c 
ACC     <<= 1;
P1_5    =   0;
P1_5    =   1;
B       --  ; 
    } while ( B!=0 );
The Keil compiler will use DJNZ when a loop is written this way.
I am cheating here by using registers ACC and B in C code.
If you cannot cheat, then substitute with:
P1_4    =   ( r & 128 ) ? 1 : 0 ;
r     <<=   1;
This only takes a few extra instructions.
Also, replacing B with a local char variable n works the same.
Keil shifts ACC left with ADD A, ACC, which is the same as multiplying by 2.
It only takes one extra opcode, I think.
Keeping code entirely in C keeps things simpler sometimes.

Question 16

Overload a function:

unsigned int rotate_right(unsigned int x)
{
 return (x>>1 | (x&1?0x80000000:0));  // assumes a 32-bit unsigned int
}

unsigned short rotate_right(unsigned short x) { /* etc. */ }

Question 17

#define ROTATE_RIGHT(x) ( (x>>1) | (x&1?0x8000:0) )