RISC-V doesn't have such an instruction but there are cores that do macro-op fusion of just that sequence.
And with hard-coded immediates xor+sub also ends up at twice the code size as shl+shr, so there's some trade-off. (but yeah if code size isn't a concern, xor+sub wins out)
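For anyone who wants the two sequences side by side, here is a minimal C sketch for an 11-bit field. The function names are made up for illustration, and it assumes a compiler where right-shifting a negative signed value is arithmetic (implementation-defined before C23, but true of GCC, Clang and MSVC).

```c
#include <assert.h>
#include <stdint.h>

/* This sketch relies on arithmetic right shift of negative values. */
static_assert((-1 >> 1) == -1, "arithmetic right shift assumed");

/* shl+shr: push bit 10 up to bit 31, then shift back down arithmetically.
   Bits above the field are discarded by the left shift. */
int32_t sext11_shift(uint32_t x)
{
    return (int32_t)(x << 21) >> 21;
}

/* xor+sub: flip the field's sign bit, then subtract it again.
   Unlike the shift version, this needs the bits above the field to be
   zero, hence the mask. */
int32_t sext11_xor(uint32_t x)
{
    x &= 0x7FFu;
    return (int32_t)((x ^ 0x400u) - 0x400u);
}
```

Both map 0x7FF to -1 and 0x3FF to 1023. If the input is already known to be zero-extended, the mask in the xor version can go away, which is what gets it down to two instructions.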
I would really like to see single operand operators, similar to 'i++'. '!!i' to do 'i~=i', '<<i' to do 'i=i<<1'.
The 'rep' instructions are also nice: https://www.felixcloutier.com/x86/rep:repe:repz:repne:repnz
Imagine doing something like this in C or Rust: 'int i=10;rep i printf("called %u times", i);' where rep would store the value of i in rcx, set i to zero and keep it in rax, and then jump to whatever function or code block you specified (inline code or a lambda expression) rcx times, optionally passing i's current value to the target code block. It would essentially be shorthand for 'for(int i=0;i<10;i++){printf("called %u times",i);}', except that it's easier to use for simpler constructs like 'rep 8 <<i;' (just an example; you could simply write 'i = i << 8;') if you combine it with the left-shift operator I proposed above.
The high level abstraction for xor r64, r64 is foo = 0. High level abstraction for sign/zero extension is casting to a larger type.
When you need sign extension or zero extension for conversion between standard integer types, you have to write only the type casting operator.
Having to use any of the inefficient tricks presented in the parent article is necessary only when you do not declare the correct types for your variables.
An unsigned has no sign bit, so the left shift just needs to be unsigned to make it "technically correct".
(Remember to not use smaller than int types though, due to integer promotion issues)
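A rough sketch of what that looks like in practice, with hypothetical helper names. The left shift happens on a uint32_t, so no sign bit is ever shifted; the conversion back to int32_t and the right shift of a negative value are still implementation-defined before C23, though mainstream compilers do the expected thing.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low `bits` bits of x (1 <= bits <= 32). */
int32_t sext(uint32_t x, unsigned bits)
{
    assert(bits >= 1 && bits <= 32);
    unsigned s = 32 - bits;
    /* x is unsigned here, so the left shift is well defined. */
    return (int32_t)(x << s) >> s;
}

/* The promotion pitfall: in `raw << 21` a uint16_t is first promoted to
   (signed) int, so the shift would be a signed one after all. Widening
   to uint32_t before shifting avoids that. */
int32_t sext11_from_u16(uint16_t raw)
{
    return sext((uint32_t)raw, 11);
}
```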
So, given
struct test_struct {
    uint8_t a: 1;
    uint8_t b: 1;
    uint8_t x: 6;
};
is the layout xxxxxxba or abxxxxxx? I.e., to get to member 'a', do you mask 0x80 or 0x01?
The compiler will always choose the appropriate machine instructions for the target ISA, which may have dedicated bit field extraction and bit field insertion instructions that are more efficient than the equivalent sequences of masking, shifting and merging instructions.
Moreover, the compiler will handle any endianness correctly.
In networking applications, where structures created on a computer may need to be used on another computer, the communication protocol will always serialize any data to a known format and the protocol implementation will provide conversion procedures to and from the native data formats.
So the only place where one may be concerned about endianness is when writing a conversion function between a native data format and some format specified for communication or storage. In this case the endianness of both the native CPU and the standard storage or communication data format is known precisely.
For the standard format, its specification, e.g. an Internet RFC, will specify exactly the layout of the bits. For the native data format, you do not need to know the order of the bit fields. You just assign data to the structure members, or you assign the structure members to other variables, and the compiler will take care of the layout.
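As a sketch of such a conversion function: assume a made-up one-byte wire format with 'a' in the most significant bit, 'b' next, and 'x' in the low six bits. The struct and function names are hypothetical. Because the layout is spelled out with shifts and masks, the result does not depend on how the compiler orders bit-fields.

```c
#include <assert.h>
#include <stdint.h>

/* Native representation: plain members, no bit-fields needed. */
struct test_fields {
    uint8_t a;  /* 0 or 1 */
    uint8_t b;  /* 0 or 1 */
    uint8_t x;  /* 0..63  */
};

/* Native -> wire: the assumed wire layout is a b x x x x x x (MSB first). */
uint8_t pack_test(struct test_fields f)
{
    assert(f.a <= 1 && f.b <= 1 && f.x <= 0x3F);
    return (uint8_t)((f.a << 7) | (f.b << 6) | f.x);
}

/* Wire -> native. */
struct test_fields unpack_test(uint8_t wire)
{
    struct test_fields f;
    f.a = (wire >> 7) & 1;
    f.b = (wire >> 6) & 1;
    f.x = wire & 0x3F;
    return f;
}
```

With this layout, a = 1, b = 0, x = 5 packs to 0x85 on any architecture.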
The difference between xxxxxxba and abxxxxxx is implementation defined, and changes with different architectures even with the same compiler.
That means in the "an Internet RFC [...] exactly [specified] the bits" case, you actually need multiple implementations for different architectures to be portable.
So what you end up having is
struct test_struct {
#if ARCH_BIT_ENDIANNESS == BIG
    uint8_t a: 1;
    uint8_t b: 1;
    uint8_t x: 6;
#else
    uint8_t x: 6;
    uint8_t b: 1;
    uint8_t a: 1;
#endif
};
for every struct in portable code that has to use bit-fields with an externally defined bit pattern. And the architectures that do it the opposite way from x86 aren't exotic; PowerPC is a good example.
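To see which layout a given compiler and target actually pick, something like the following can be run. It is only a sketch: the function name is made up, and it assumes the implementation accepts uint8_t as a bit-field type and packs the struct into one byte (GCC and Clang do).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct test_struct {
    uint8_t a: 1;
    uint8_t b: 1;
    uint8_t x: 6;
};

/* Returns the mask that selects member 'a' in the struct's storage byte
   on this implementation: 0x01 for xxxxxxba, 0x80 for abxxxxxx. */
uint8_t mask_of_a(void)
{
    struct test_struct s;
    uint8_t raw;
    static_assert(sizeof s == 1, "one-byte struct assumed");
    memset(&s, 0, sizeof s);
    s.a = 1;
    memcpy(&raw, &s, 1);  /* inspect the object representation */
    return raw;
}
```

The point of the discussion is that the answer is whatever the target ABI says, so treat the result as an observation about one build and not a guarantee.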
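int sign_extend(int val_11b) {
    /* 'signed' is explicit: whether a plain 'int' bit-field is signed is implementation-defined */
    struct { signed int v : 11; } t = { val_11b };
    return t.v;
}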
in Compiler Explorer produces pretty much the same x86-64 assembly as the first function in the post (the shift left, then shift right version) under GCC, Clang, and MSVC when optimizations are turned on. https://godbolt.org/z/d3Kf9fsE6
(But I do love that xor variant; that's really clever and clean.)
I love when people say this as if there's exactly one compiler with a fixed implementation for whatever opt pass.
The last one is definitely nice though!
const expectEqual = @import("std").testing.expectEqual;

fn signExtend(raw: u11) i32 {
    return @as(i11, @bitCast(raw));
}
test "signExtend" {
    try expectEqual(1023, signExtend(1023));
    try expectEqual(-1, signExtend(2047));
}
fn sign_extend_u11(x: u32) -> u32 {
    (((x as i32) << (32-11)) >> (32-11)) as u32
}
Doesn't have any of the C++ issues he mentions. And it will be faster than the alternative, since it's just two instructions. (OK, this is never going to matter in practice, but still...)
(x ^ 0x400) - 0x400 (parenthesized, since '-' binds tighter than '^' in both C and Rust)