Clang 8.0.0 introduces support for the char8_t type from C++20. However, I would expect the following functions to have the same compiler output:

#include <algorithm>

bool compare4(char const* pchA, char const* pchB, int n) {
    return std::equal(pchA, pchA+4, pchB);
}

bool compare4(char8_t const* pchA, char8_t const* pchB, int n) {
    return std::equal(pchA, pchA+4, pchB);
}

However, they compile under -std=c++2a -O2 to

compare4(char const*, char const*, int):   # @compare4(char const*, char const*, int)
        mov     eax, dword ptr [rdi]
        cmp     eax, dword ptr [rsi]
        sete    al
        ret
_Z8compare4PKDuS0_i:                       # @_Z8compare4PKDuS0_i
        mov     al, byte ptr [rdi]
        cmp     al, byte ptr [rsi]
        jne     .LBB1_4
        mov     al, byte ptr [rdi + 1]
        cmp     al, byte ptr [rsi + 1]
        jne     .LBB1_4
        mov     al, byte ptr [rdi + 2]
        cmp     al, byte ptr [rsi + 2]
        jne     .LBB1_4
        mov     al, byte ptr [rdi + 3]
        cmp     al, byte ptr [rsi + 3]
        sete    al
        ret
.LBB1_4:
        xor     eax, eax
        ret

in which the latter is clearly less optimized. Is there a reason for this (I couldn't find any in the standard), or is this a bug in Clang?

  • Looks like a flaw/missing optimization. GCC produces the same code for both functions. – NathanOliver Apr 16 at 14:10
  • Do note that char8_t has unsigned char as its underlying type, but even when the first function is changed to take unsigned char the compiler still optimizes the code, so it can't be that: godbolt.org/z/uUhMkW – NathanOliver Apr 16 at 14:19
  • I have a suspicion that it might have to do with the fact that char8_t is a library-provided type which the compiler has no intrinsic understanding of. – SergeyA Apr 16 at 14:20
  • Not completely related, but interesting nonetheless: adding -stdlib=libc++ to the compile options on godbolt gives the same compiler output for both functions, but in the less optimized form (it seems godbolt uses libstdc++ by default). See here. So apparently the standard library implementation matters as well. – andreee Apr 16 at 14:21
  • Note also the author's comment: "This implementation is experimental, and will be removed or revised substantially to match the proposal as it makes its way through the C++ committee." – andreee Apr 16 at 14:27
  1. In libstdc++, std::equal calls __builtin_memcmp when it detects that the arguments are "simple"; otherwise it uses a naive for loop. "Simple" here means pointers (or certain iterator wrappers around pointers) to the same integer or pointer type. (relevant source code)

    • Whether a type is an integer type is detected by the internal __is_integer trait, but libstdc++ 8.2.0 (the version used on godbolt.org) does not specialize this trait for char8_t, so char8_t is not detected as an integer type. (relevant source code)
  2. Clang (with this particular configuration) generates more verbose assembly in the for loop case than in the __builtin_memcmp case. (But the former is not necessarily less optimized in terms of performance. See loop unrolling.)

So there's a reason for this difference, and it's not a bug in clang IMO.
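
To make the dispatch described above concrete, here is a minimal sketch of the idea. It is not libstdc++'s actual code: is_simple_integer, equal_impl, sketch_equal, path, and unlisted_byte are names made up for illustration, and unlisted_byte only stands in for a type, such as char8_t, that the trait was never specialized for.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for libstdc++'s internal __is_integer trait:
// only the types specialized here count as "simple".
template <class T> struct is_simple_integer : std::false_type {};
template <> struct is_simple_integer<char> : std::true_type {};
template <> struct is_simple_integer<unsigned char> : std::true_type {};

// Records which branch ran, purely so the sketch can be inspected.
enum class path { memcmp_call, element_loop };

// "Simple" element type: compare the whole range as raw bytes.
template <class T>
bool equal_impl(T const* first, T const* last, T const* other,
                std::true_type, path& taken) {
    taken = path::memcmp_call;
    std::size_t const bytes = static_cast<std::size_t>(last - first) * sizeof(T);
    return std::memcmp(first, other, bytes) == 0;
}

// Anything else: fall back to an element-by-element loop.
template <class T>
bool equal_impl(T const* first, T const* last, T const* other,
                std::false_type, path& taken) {
    taken = path::element_loop;
    for (; first != last; ++first, ++other)
        if (!(*first == *other)) return false;
    return true;
}

// Tag dispatch on the trait decides which implementation is used.
template <class T>
bool sketch_equal(T const* first, T const* last, T const* other, path& taken) {
    assert(first <= last);  // precondition: a valid range
    return equal_impl(first, last, other, is_simple_integer<T>{}, taken);
}

// A type the trait does not know about (assumption: plays the role of char8_t).
enum class unlisted_byte : unsigned char {};
```

Calling sketch_equal on char arrays takes the memcmp branch, while the same call on unlisted_byte arrays takes the loop, even though both types are one byte wide. Adding a specialization of the trait for the new type is all it takes to move it onto the fast path in this sketch.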

  • I completely agree with your first point, thanks for pointing out the relevant source code! But I don't agree with your second point: The short assembly is comparing four bytes in a single instruction, so that is even more loop-unrolled than the simple unrolling from the verbose assembly. – Tobi Apr 17 at 8:48
  • @Tobi Good point – cpplearner Apr 17 at 9:16

This is not a "bug" in Clang; merely a missed opportunity for optimization.

You can replicate the Clang compiler output by using the same function taking an enum class whose underlying type is unsigned char. By contrast, GCC recognizes a difference between an enumeration with an underlying type of unsigned char and char8_t. It emits the same code for unsigned char and char8_t, but emits more complex code for the enum class case.

So something about Clang's implementation of char8_t seems to think of it more as a user-defined enumeration than as a fundamental type. It's best to just consider it an early implementation of the standard.
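
For anyone who wants to try the comparison, a minimal pair of functions might look like the following. The enumeration name byte_like is made up for this example, and the unused int parameter from the question is dropped. Nothing here asserts what a given compiler will emit; paste it into a compiler explorer and compare the output yourself.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical enumeration with unsigned char as its underlying type.
enum class byte_like : unsigned char {};

// Baseline: plain unsigned char elements.
bool compare4(unsigned char const* a, unsigned char const* b) {
    assert(a != nullptr && b != nullptr);
    return std::equal(a, a + 4, b);
}

// Same body, but the element type is the enumeration.
bool compare4(byte_like const* a, byte_like const* b) {
    assert(a != nullptr && b != nullptr);
    return std::equal(a, a + 4, b);
}
```

Both overloads have identical behavior, so any difference in the generated assembly comes from how the library and compiler treat the element type.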

It should be noted that one of the most important differences between unsigned char and char8_t is aliasing requirements. unsigned char pointers may alias with pretty much anything else. By contrast, char8_t pointers cannot. As such, it is reasonable to expect (on a mature implementation, not something that beats the standard it implements to market) different code to be emitted in different cases. The trick is that char8_t code ought to be more efficient if it's different, since the compiler no longer has to emit code that performs additional work to deal with potential aliasing from stores.
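
As a rough illustration of the aliasing point (not of std::equal itself), consider a function that reads an int, stores through a byte pointer, and reads the int again. The function name read_store_read is made up for this sketch, and the char8_t overload is only compiled when the compiler defines __cpp_char8_t.

```cpp
#include <cassert>

// unsigned char may alias any object, so the store through `out` might
// change the int that `value` points to.
int read_store_read(int const* value, unsigned char* out) {
    assert(value != nullptr && out != nullptr);
    int const first = *value;
    *out = 0;               // may legally modify *value
    return first + *value;  // so *value has to be loaded again
}

#if defined(__cpp_char8_t)
// A char8_t lvalue may not be used to access an int, so a compiler is
// allowed to assume the store leaves *value alone and reuse `first`.
int read_store_read(int const* value, char8_t* out) {
    assert(value != nullptr && out != nullptr);
    int const first = *value;
    *out = 0;
    return first + *value;
}
#endif
```

Passing the address of the int itself as the unsigned char pointer is well defined and changes the result, which is exactly why the first overload cannot skip the second load.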

  • I don't think aliasing can affect std::equal in any way. – cpplearner Apr 17 at 7:45

