Clang 8.0.0 introduces support for the char8_t type from C++20. I would expect the following two functions to produce the same compiler output:

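#include <algorithm>

// Both overloads compare the first four elements; n is unused.
// The char8_t overload requires C++20 (-std=c++2a).
bool compare4(char const* pchA, char const* pchB, int n) {
    return std::equal(pchA, pchA + 4, pchB);
}

bool compare4(char8_t const* pchA, char8_t const* pchB, int n) {
    return std::equal(pchA, pchA + 4, pchB);
}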

However, compiled with Clang under -std=c++2a -O2, they produce:

compare4(char const*, char const*, int):   # @compare4(char const*, char const*, int)
        mov     eax, dword ptr [rdi]
        cmp     eax, dword ptr [rsi]
        sete    al
        ret
_Z8compare4PKDuS0_i:                       # @_Z8compare4PKDuS0_i
        mov     al, byte ptr [rdi]
        cmp     al, byte ptr [rsi]
        jne     .LBB1_4
        mov     al, byte ptr [rdi + 1]
        cmp     al, byte ptr [rsi + 1]
        jne     .LBB1_4
        mov     al, byte ptr [rdi + 2]
        cmp     al, byte ptr [rsi + 2]
        jne     .LBB1_4
        mov     al, byte ptr [rdi + 3]
        cmp     al, byte ptr [rsi + 3]
        sete    al
        ret
.LBB1_4:
        xor     eax, eax
        ret

The latter is clearly less optimized. Is there a reason for this (I couldn't find one in the standard), or is this a bug in Clang?

  • Looks like a flaw/missing optimization. GCC produces the same code for both functions. – NathanOliver Apr 16 at 14:10
  • Do note that char8_t has the same underlying type as unsigned char, but even after making that change to the first function, the compiler still optimizes the code, so that can't be the cause: godbolt.org/z/uUhMkW – NathanOliver Apr 16 at 14:19
  • I suspect it might have to do with the fact that char8_t is a library-provided type which the compiler has no intrinsic understanding of. – SergeyA Apr 16 at 14:20
  • Not completely related, but interesting nonetheless: adding -stdlib=libc++ to the compile options on godbolt makes both functions produce the same output, but in the less optimized form (godbolt seems to use libstdc++ by default). See here. So the standard library implementation apparently matters as well. – andreee Apr 16 at 14:21
  • Note also the author's comment: "This implementation is experimental, and will be removed or revised substantially to match the proposal as it makes its way through the C++ committee." – andreee Apr 16 at 14:27
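The comments above touch on how char8_t relates to unsigned char. For reference, C++20 defines char8_t as a distinct fundamental type (spelled with a keyword) whose underlying type is unsigned char. A small sketch of what that means in code, assuming a compiler with char8_t support; the checks are guarded by the standard feature-test macro `__cpp_char8_t`, and the function name `which` is hypothetical:

```cpp
#include <cassert>      // used by the usage checks
#include <type_traits>

#if defined(__cpp_char8_t)
// char8_t is its own type, not a typedef for unsigned char or char...
static_assert(!std::is_same_v<char8_t, unsigned char>);
static_assert(!std::is_same_v<char8_t, char>);
// ...but it has the size and signedness of its underlying type, unsigned char.
static_assert(sizeof(char8_t) == sizeof(unsigned char));
static_assert(std::is_unsigned_v<char8_t>);
#endif

// Hypothetical overload pair: because the types are distinct,
// overload resolution tells them apart.
constexpr int which(unsigned char) { return 1; }
#if defined(__cpp_char8_t)
constexpr int which(char8_t) { return 2; }
#endif
```

Because the types are distinct, the two compare4 overloads in the question are genuinely different functions rather than a redefinition.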
  1. In libstdc++, std::equal calls __builtin_memcmp when it detects that the arguments are "simple"; otherwise, it uses a naive for loop. "Simple" here means pointers (or certain iterator wrappers around pointers) to the same integer or pointer type. (relevant source code)

    • Whether a type is an integer type is detected by the internal __is_integer trait, but libstdc++ 8.2.0 (the version used on godbolt.org) does not specialize this trait for char8_t, so char8_t is not detected as an integer type. (relevant source code)
  2. Clang (with this particular configuration) generates more verbose assembly for the for-loop case than for the __builtin_memcmp case. (But the former is not necessarily less optimized in terms of performance. See loop unrolling.)
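The dispatch described in this answer can be sketched roughly as follows. This is a simplified illustration, not libstdc++'s actual code: the trait `my_is_integer` is a hypothetical stand-in for the internal `__is_integer`, it ignores the pointer-type and iterator-wrapper cases, and, like the libstdc++ 8.2.0 trait described above, it deliberately has no char8_t specialization, so char8_t ranges fall back to the loop.

```cpp
#include <cassert>   // used by the usage checks
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for libstdc++'s internal __is_integer trait.
// Only explicitly listed types count; char8_t is intentionally missing.
template <typename T> struct my_is_integer { static constexpr bool value = false; };
template <> struct my_is_integer<char> { static constexpr bool value = true; };
template <> struct my_is_integer<signed char> { static constexpr bool value = true; };
template <> struct my_is_integer<unsigned char> { static constexpr bool value = true; };
template <> struct my_is_integer<int> { static constexpr bool value = true; };

// Records which path was taken so the dispatch can be observed.
enum class Path { Memcmp, Loop };

template <typename T>
bool my_equal(const T* first1, const T* last1, const T* first2, Path* taken) {
    if constexpr (my_is_integer<T>::value) {
        *taken = Path::Memcmp;
        // "Simple" case: compare the raw bytes in one call
        // (skipped for empty ranges, where the pointers may be null).
        const std::size_t len = static_cast<std::size_t>(last1 - first1);
        return len == 0 || std::memcmp(first1, first2, len * sizeof(T)) == 0;
    } else {
        *taken = Path::Loop;
        // Generic case: naive element-by-element loop.
        for (; first1 != last1; ++first1, ++first2)
            if (!(*first1 == *first2)) return false;
        return true;
    }
}
```

Adding a `my_is_integer<char8_t>` specialization to this sketch would route char8_t ranges to the memcmp path as well.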

So there is a reason for this difference, and IMO it's not a bug in Clang.

  • I completely agree with your first point; thanks for pointing out the relevant source code! But I don't agree with your second point: the short assembly compares four bytes in a single instruction, which is even more unrolled than the byte-by-byte unrolling in the verbose assembly. – Tobi Apr 17 at 8:48
  • @Tobi Good point – cpplearner Apr 17 at 9:16
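To illustrate Tobi's point: the single `mov`/`cmp` pair in the short assembly amounts to loading each side's four bytes as one 32-bit word and comparing the words. A minimal portable sketch of that idea; the function name `equal4_as_word` is hypothetical, and `std::memcpy` is used instead of a pointer cast to avoid alignment and aliasing problems (optimizing compilers commonly lower such a memcpy to a single load):

```cpp
#include <cassert>   // used by the usage checks
#include <cstdint>
#include <cstring>

// Hypothetical helper: compare 4 bytes by loading each side as one 32-bit word.
bool equal4_as_word(const unsigned char* a, const unsigned char* b) {
    std::uint32_t wa, wb;
    std::memcpy(&wa, a, sizeof wa);
    std::memcpy(&wb, b, sizeof wb);
    return wa == wb;  // equal words <=> all four bytes are equal
}
```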

This is not a "bug" in Clang, merely a missed optimization opportunity.

You can replicate Clang's output by using the same function with an enum class whose underlying type is unsigned char. By contrast, GCC distinguishes between an enumeration with an underlying type of unsigned char and char8_t: it emits the same code for unsigned char and char8_t, but more complex code for the enum class case.
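A sketch of the setup for that experiment, reusing the question's function body; the enumeration name `Byte8` is hypothetical, and the assembly itself would still have to be compared on a compiler such as godbolt.org:

```cpp
#include <algorithm>
#include <cassert>  // used by the usage checks

// Hypothetical scoped enumeration whose underlying type is unsigned char.
enum class Byte8 : unsigned char {};

// Same body as the question's compare4, with the enum in place of char8_t.
bool compare4(Byte8 const* pchA, Byte8 const* pchB, int /*n*/) {
    return std::equal(pchA, pchA + 4, pchB);  // scoped enums support ==
}

// The unsigned char version, for comparison.
bool compare4(unsigned char const* pchA, unsigned char const* pchB, int /*n*/) {
    return std::equal(pchA, pchA + 4, pchB);
}
```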

So something about Clang's implementation of char8_t seems to treat it more like a user-defined enumeration than a fundamental type. It's best to consider it an early implementation of the standard.

It should be noted that one of the most important differences between unsigned char and char8_t is their aliasing rules. An unsigned char pointer may alias pretty much any other object; a char8_t pointer may not. As such, it is reasonable to expect a mature implementation (not one that beats the standard it implements to market) to emit different code in some cases. The point is that if the char8_t code differs, it ought to be more efficient, since the compiler no longer has to emit extra work to account for stores that might alias other objects.
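A minimal sketch of the kind of code where this aliasing difference can matter; the function names are hypothetical. It only concerns stores through the byte pointer (std::equal only reads), and whether a given compiler actually hoists the load is not guaranteed:

```cpp
#include <cassert>   // used by the usage checks
#include <cstddef>

// Stores through unsigned char* may alias any object, including *value,
// so the compiler must assume each store could change *value and reload it.
void fill_uc(unsigned char* dst, std::size_t n, const int* value) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = static_cast<unsigned char>(*value);
}

#if defined(__cpp_char8_t)
// char8_t is not among the types allowed to alias other objects, so the
// compiler may assume stores to dst never modify *value and load it once.
void fill_u8(char8_t* dst, std::size_t n, const int* value) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = static_cast<char8_t>(*value);
}
#endif
```

Both functions compute the same result when dst and value do not overlap; the difference is only in what the optimizer is allowed to assume.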

  • I don't think aliasing can affect std::equal in any way. – cpplearner Apr 17 at 7:45
