Memory Alignment: A Closer Look at alignof, alignas, aligned_storage, and align

fibonaccii

2022-01-01

Modern C++

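Memory alignment has several benefits: aligned accesses are generally faster, and some instructions and platforms require them. It is therefore a routine consideration when allocating memory.

This section covers memory alignment and the four alignment facilities introduced in C++11: alignof, alignas, std::aligned_storage, and std::align. The first two are keywords; the latter two are a class template and a function, respectively.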

alignment

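On common platforms, the built-in fundamental types of C++, such as char, int, float, and double, are aligned to their sizeof value. This is an ABI convention rather than a language guarantee; what the language does guarantee is that sizeof(T) is a multiple of alignof(T).

What does alignment mean?

For example, sizeof(int) is 4. If an int variable a is properly aligned, then its address &a modulo 4 is 0.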

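The macro CHECK_ALIGN below checks alignment at run time. The check cannot be a static_assert: the address of a local variable is not a constant expression, and reinterpret_cast is not permitted in constant expressions.

#include <cassert>
#include <cstdint>

#define CHECK_ALIGN(ptr, alignment)                                     \
  do {                                                                  \
    /* Aborts (unless NDEBUG is defined) if ptr is misaligned. */       \
    assert(reinterpret_cast<std::uintptr_t>(ptr) % (alignment) == 0 &&  \
           "ptr must be aligned");                                      \
  } while (0)

The following demo checks that, on this platform, the alignment of each built-in type equals its sizeof(T).

int main(int argc, char const *argv[]) {
  char c;
  int i;
  long l;
  float f;
  CHECK_ALIGN(&c, sizeof(c));
  CHECK_ALIGN(&i, sizeof(i));
  CHECK_ALIGN(&l, sizeof(l));
  CHECK_ALIGN(&f, sizeof(f));
  CHECK_ALIGN(&i, sizeof(l)); // may fail: an int is only guaranteed 4-byte alignment
  return 0;
}

The last check, CHECK_ALIGN(&i, sizeof(l));, is not guaranteed to pass: an int only requires 4-byte alignment, whereas long is 8 bytes with gcc on 64-bit Linux and macOS, so sizeof(l) is 8. The assertion fires whenever i happens to sit at an address that is a multiple of 4 but not of 8.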

At this point, the meaning of "memory alignment" should be clear.

alignof

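The alignof keyword, introduced in C++11, yields the alignment requirement of a type T directly. Its result has type size_t, and it is used much like sizeof.

Let's first look at how alignof is used.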

#define SHOW_SIZEOF_AND_ALIGNOF(T)                                   \
  do {                                                               \
    std::cout << "sizeof(" << #T << "):\t" << sizeof(T) << ",\t"     \
              << "alignof(" << #T << "):\t" << alignof(T)            \
              << std::endl;                                          \
  }while(0)

int main(int argc, char const *argv[]) {
  SHOW_SIZEOF_AND_ALIGNOF(char);
  SHOW_SIZEOF_AND_ALIGNOF(int);
  SHOW_SIZEOF_AND_ALIGNOF(long);
  SHOW_SIZEOF_AND_ALIGNOF(float);
  SHOW_SIZEOF_AND_ALIGNOF(double);
  return 0;
}

The output is shown below; it is consistent with the earlier discussion of how fundamental types are aligned.

$ g++ main.cc -o main && ./main
sizeof(char):   1,      alignof(char):  1
sizeof(int):    4,      alignof(int):   4
sizeof(long):   8,      alignof(long):  8
sizeof(float):  4,      alignof(float): 4
sizeof(double): 8,      alignof(double):8
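Because alignof yields a constant expression, a few of its properties can be checked at compile time. The sketch below is only an illustration; the helper size_is_multiple_of_alignment is a hypothetical name, not something from the standard library.

```cpp
#include <cstddef>

// alignof produces a std::size_t constant, so it works inside static_assert.
static_assert(alignof(char) == 1, "char has the weakest alignment");
// For an array type, alignof gives the alignment of the element type.
static_assert(alignof(int[10]) == alignof(int), "array uses element alignment");
// For a reference type, alignof gives the alignment of the referenced type.
static_assert(alignof(double&) == alignof(double), "reference uses referent alignment");
// No scalar type is more strictly aligned than std::max_align_t.
static_assert(alignof(long double) <= alignof(std::max_align_t), "scalar bound");

// Hypothetical helper: sizeof(T) is always a multiple of alignof(T),
// otherwise the elements of a T[] could not all be aligned.
template <typename T>
constexpr bool size_is_multiple_of_alignment() {
  return sizeof(T) % alignof(T) == 0;
}

static_assert(size_is_multiple_of_alignment<double>(), "holds for double");
```

If the file compiles, every property holds on your platform; nothing needs to run.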

With the basics of memory alignment and alignof covered, let's turn to the alignment of classes.

Consider the class Foo:

struct Foo { 
  char c;
  int i1; 
  int i2;
  long l;
};

Think about what alignof(Foo) and sizeof(Foo) will be. In other words, what does the demo below print?

int main(int argc, char const *argv[]) {
  SHOW_SIZEOF_AND_ALIGNOF(Foo);
}

Think Again~~~~

$ g++ main.cc -o main && ./main
sizeof(Foo):    24,     alignof(Foo):   8

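Hm? Why this result?

To explain it, I will use the offsetof macro, which gives the offset of a member from the start of its class. gcc defines it as follows:

/* Offset of member MEMBER in a struct of type TYPE. */
#define offsetof(OBJECT_TYPE, MEMBER) __builtin_offsetof (OBJECT_TYPE, MEMBER)

Now look at the code below and guess its output.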

int main(int argc, char const *argv[]) {
  std::cout << offsetof(Foo, c)  << '\n'
            << offsetof(Foo, i1) << '\n' 
            << offsetof(Foo, i2) << '\n'
            << offsetof(Foo, l)  << '\n'; 
}

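The output is:

$ g++ main.cc -o main && ./main
0   # offset of c is 0
4   # offset of i1 is 4: 3 padding bytes follow c to meet the 4-byte alignment of int
8   # offset of i2 is 8: no padding between i1 and i2
16  # offset of l is 16: 4 padding bytes follow i2 to meet the 8-byte alignment of long

This output explains both numbers. l occupies bytes 16 through 23, so sizeof(Foo) is 24. That leaves alignof(Foo).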

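For a Foo object to be aligned, every one of its fields must be aligned. Once the most strictly aligned field (the one with the largest alignment) is satisfied, all the others are satisfied too.

Suppose there are three candidate start addresses: 0, 1, and 4. Let's see whether each of them satisfies the alignment requirements of all fields in Foo.

With start addresses 0, 1, and 4, the field addresses are shown in the three columns below.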

struct Foo {   
  char c;     // 0  |  1  |  4
  int i1;     // 4  |  5  |  8
  int i2;     // 8  |  9  |  12 
  long l;     // 16 |  17 |  20
};

As the three columns on the right show, only start address 0 (a multiple of 8) meets the alignment requirement of every field. Hence alignof(Foo) is 8.
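The same reasoning can be written as compile-time checks that do not hard-code 8 or 24. This is a sketch under the assumption of a mainstream ABI, where a struct takes the alignment of its strictest member; max_of, kStrictestMember, and kPaddingBytes are names made up for this illustration.

```cpp
#include <cstddef>

struct Foo {
  char c;
  int i1;
  int i2;
  long l;
};

constexpr std::size_t max_of(std::size_t a, std::size_t b) {
  return a > b ? a : b;
}

// Alignment of the most strictly aligned member of Foo.
constexpr std::size_t kStrictestMember =
    max_of(alignof(char), max_of(alignof(int), alignof(long)));

// Bytes of Foo that hold no member. With the layout discussed above
// this is 7: 3 bytes after c plus 4 bytes after i2.
constexpr std::size_t kPaddingBytes =
    sizeof(Foo) - (sizeof(char) + 2 * sizeof(int) + sizeof(long));

// The struct inherits the strictest member alignment (ABI assumption).
static_assert(alignof(Foo) == kStrictestMember, "struct follows strictest member");
// The size is rounded up so that every element of a Foo[] stays aligned.
static_assert(sizeof(Foo) % alignof(Foo) == 0, "size is a multiple of alignment");
// Every member offset is a multiple of that member's own alignment.
static_assert(offsetof(Foo, i1) % alignof(int) == 0, "i1 is aligned");
static_assert(offsetof(Foo, i2) % alignof(int) == 0, "i2 is aligned");
static_assert(offsetof(Foo, l) % alignof(long) == 0, "l is aligned");
```

Because the start address is a multiple of alignof(Foo) and each offset is a multiple of the member's alignment, every member address ends up aligned.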

alignas

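The alignment requirements discussed so far are the defaults. Sometimes, for reasons such as cache lines or vectorized operations, you need to change the alignof value of a class.

How?

Before C++11 this required compiler-specific extensions; since C++11 the alignas keyword does the job.

For example, before C++11, gcc achieved the effect of alignas(alignment) with __attribute__((__aligned__(alignment))).

Taking Foo again, suppose you want every Foo object to start at an address that is a multiple of 32. With alignas this looks as follows: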

struct alignas(32) Foo { 
  Foo() { std::cout << this << std::endl; }
  char c;
  int i1; 
  int i2;
  long l;
};

int main(int argc, char const *argv[]) {
  Foo foo;
  CHECK_ALIGN(&foo, alignof(Foo));
  SHOW_SIZEOF_AND_ALIGNOF(Foo);
  return 0;
}

The output is:

$ g++ main.cc -o main && ./main
0x16d6f34e0
sizeof(Foo):    32,     alignof(Foo):   32
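To connect this with the cache-line motivation, here is a small sketch of my own. It assumes a 64-byte cache line, which is common but not universal; PaddedCounter and counters_use_separate_lines are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 64 bytes is the cache-line size; check your target CPU.
struct alignas(64) PaddedCounter {
  long value;
};

// alignas raises the alignment, and sizeof is rounded up to a multiple of it.
static_assert(alignof(PaddedCounter) == 64, "alignas(64) sets the alignment");
static_assert(sizeof(PaddedCounter) == 64, "size is padded up to the alignment");

// Hypothetical helper: checks that neighbouring array elements start on
// different 64-byte boundaries, so they never share a line of that size.
inline bool counters_use_separate_lines() {
  static PaddedCounter counters[2];
  const auto a = reinterpret_cast<std::uintptr_t>(&counters[0]);
  const auto b = reinterpret_cast<std::uintptr_t>(&counters[1]);
  assert(a % 64 == 0 && b % 64 == 0);
  return b - a == 64;
}
```

Calling counters_use_separate_lines() from main should return true; the cost is that each counter now occupies 64 bytes instead of sizeof(long).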

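That covers the basic usage of alignas. Now for the caveats: the alignment in alignas(alignment) cannot be arbitrary. For a type T, it must satisfy the following two conditions.

1. alignment >= alignof(T)

Taking Foo again: without alignas, its default alignment alignof(Foo) is 8. Let's try to use alignas to lower it to 4: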

struct alignas(4) Foo { 
  char c;
  int i1;
  int i2;
  long l; 
};

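SHOW_SIZEOF_AND_ALIGNOF(Foo); now prints:

$ g++ main.cc -o main && ./main
sizeof(Foo):    24,     alignof(Foo):   8

As you can see, the alignas has no effect here. The standard treats an alignas that is weaker than the natural alignment as ill-formed, so other compilers may reject the code outright.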

2. alignment == pow(2, N)

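That is, the alignment passed to alignas must be a power of two (N >= 0; alignas(0) is simply ignored). Any other value is invalid, and compilers such as gcc reject it.

Taking Foo again: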

struct alignas(9) Foo { 
  char c;
  int i1;
  int i2;
  long l; 
};

Compiling gives:

$ g++ main.cc -o main && ./main
main.cc:20:19: error: requested alignment '9' is not a positive power of 2
   20 | struct alignas(9) Foo {
      |                   ^~~
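The two conditions can be expressed as a small compile-time predicate. This is only a sketch: is_power_of_two and is_usable_alignment are hypothetical helpers, and the check ignores the upper limit that each implementation places on alignment.

```cpp
#include <cstddef>

// True if n is a power of two: 1, 2, 4, 8, ...
constexpr bool is_power_of_two(std::size_t n) {
  return n != 0 && (n & (n - 1)) == 0;
}

// Hypothetical helper: does `requested` satisfy both conditions for T?
//   1. requested >= alignof(T)
//   2. requested is a power of two
template <typename T>
constexpr bool is_usable_alignment(std::size_t requested) {
  return is_power_of_two(requested) && requested >= alignof(T);
}

struct Foo {
  char c;
  int i1;
  int i2;
  long l;
};

static_assert(is_usable_alignment<Foo>(32), "32 is a power of two and strict enough");
static_assert(!is_usable_alignment<Foo>(9), "9 is not a power of two");
static_assert(!is_usable_alignment<Foo>(alignof(Foo) / 2), "weaker than the default");
```

The bit trick n & (n - 1) clears the lowest set bit, so the result is zero exactly when n has a single bit set.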

By now you should have a working understanding of the alignof and alignas keywords. For more usage details, see cppreference.

std::aligned_storage

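C++11 also introduced std::aligned_storage, a class template that provides statically allocated storage meeting a given alignment requirement. Its declaration is: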

// in <type_traits>
template< std::size_t Len, 
          std::size_t Align = /*default-alignment*/ >
struct aligned_storage;

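The storage itself is the member type std::aligned_storage<Len, Align>::type: an object of that type occupies at least Len bytes and meets the alignment requirement Align.

Let's start with a demo from cppreference to see how std::aligned_storage is used.

The class StaticVector is a fixed-capacity array whose storage is properly aligned. The template parameter T is the element type and N is the number of elements.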

template<typename T, size_t N>
class StaticVector {
public:
    StaticVector() { 
      std::cout << alignof(T) << "/" << sizeof(T)<< std::endl;
      for (size_t idx = 0; idx < N; ++idx) { 
        std::cout << &data[idx] << std::endl;
      }
    }
    
    ~StaticVector() {
      for(size_t pos = 0; pos < m_size; ++pos) {
        reinterpret_cast<T*>(data+pos)->~T();
      }
    }

    template<typename ...Args> 
    void emplace_back(Args&&... args) {
      if(m_size >= N) {
        throw std::bad_alloc{};
      }
      new(data+m_size) T(std::forward<Args>(args)...);
      ++m_size;
    }
 
    const T& operator[](size_t pos) const {
      return *reinterpret_cast<const T*>(data+pos);
    }
 
private:
  // std::aligned_storage<sizeof(T), alignof(T)>::type data[N]; // C++11
  std::aligned_storage_t<sizeof(T), alignof(T)> data[N];        // c++14
  size_t m_size = 0;
};

StaticVector is used as follows:

struct alignas(32) Foo { 
  char c;
  int i1; 
  int i2;
  long l;
};

int main(int argc, char const *argv[]) {
    StaticVector<std::string, 2> v1;
    v1.emplace_back(5, '*');
    std::cout << v1[0] << '\n';

    StaticVector<Foo, 2>v2;
}

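Before looking at the output, let's make a prediction:

alignof(std::string) is 8, so the addresses of the two std::string slots in StaticVector should both be multiples of 8.
alignof(Foo) is 32, so the addresses of the two Foo slots in StaticVector should both be multiples of 32.

Now let's look at the output: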

$ g++ align_stroe.cc -o as && ./as
8/32
0x16b5734c0
0x16b5734e0
*****
32/32
0x16b573470
0x16b573490

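This should make the meaning of "aligned" in std::aligned_storage clear: every object placed in the storage is properly aligned.

With the usage covered, let's look at the implementation. After all, nobody wants to just call libraries without knowing what is inside.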

// in std namespace;
template <std::size_t _Len>
struct __aligned_storage_msa {
  union __type {
    unsigned char __data[_Len];
    struct __attribute__((__aligned__)) { } __align;
  };
};

template <std::size_t _Len, 
	      std::size_t _Align = alignof(typename __aligned_storage_msa<_Len>::__type)>
struct aligned_storage {
  union type {
    unsigned char __data[_Len];
    struct alignas(_Align) { } __align;
  };
};

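Internally, std::aligned_storage is implemented with a union:

unsigned char __data[_Len]; guarantees that the storage is at least _Len bytes long.
struct alignas(_Align) { } __align; guarantees that the storage is aligned to _Align.

The second point is easy to demonstrate: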

int main(int argc, char const *argv[]) {
    char data[16];
    alignas(16) char aligned_data[16];
    // alignof applied to an expression is a GNU extension; standard alignof takes a type.
    std::cout << "unaligned: "<< alignof(data) << ", aligned: " << alignof(aligned_data) << std::endl;
}

The output is:

unaligned: 1, aligned: 16

So unsigned char __data[_Len]; alone cannot guarantee alignment; it needs the help of struct alignas(_Align) { } __align.
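The same idea works without std::aligned_storage, which C++23 deprecates: put alignas directly on a byte array. The sketch below is my own illustration, not library code; make_in_place_copy is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <new>
#include <string>
#include <utility>

// Hypothetical helper: builds a T inside a raw byte buffer that has the
// size and alignment of T, copies the object out, then destroys it.
template <typename T, typename... Args>
T make_in_place_copy(Args&&... args) {
  // The array gives the size; alignas(T) gives the alignment.
  alignas(T) unsigned char buf[sizeof(T)];
  assert(reinterpret_cast<std::uintptr_t>(buf) % alignof(T) == 0);

  // Placement new constructs the object inside the buffer.
  T* obj = ::new (static_cast<void*>(buf)) T(std::forward<Args>(args)...);
  T copy = *obj;
  // Objects created with placement new must be destroyed explicitly.
  obj->~T();
  return copy;
}
```

For example, make_in_place_copy<std::string>(5, '*') constructs a string of five asterisks in the buffer and returns a copy of it.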

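Finally, a word on why std::__aligned_storage_msa is needed: if std::aligned_storage is instantiated without its second template argument _Align, std::__aligned_storage_msa supplies the default alignment.

In the implementation of std::__aligned_storage_msa, __attribute__((__aligned__)) takes no argument, so gcc picks the default maximum alignment for the target platform.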

int main(int argc, char const *argv[]) {
    std::cout << alignof(std::__aligned_storage_msa<1>::__type) << std::endl;
    std::cout << alignof(std::__aligned_storage_msa<4>::__type) << std::endl;
    std::cout << alignof(std::__aligned_storage_msa<16>::__type) << std::endl;
    std::cout << alignof(std::__aligned_storage_msa<32>::__type) << std::endl;
}

The output is:

$ g++ align_stroe.cc -o as && ./as
16
16
16
16

This value is gcc's default alignment on this platform, and it does not depend on _Len.

std::align

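std::aligned_storage is a static aligned allocator: the size and alignment of its storage are fixed at compile time. But what if you already have a block of memory and want to carve out a piece that meets some alignment requirement?

That is what the std::align function is for. Its prototype is: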

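/// @param  alignment the desired alignment; must be a power of two
/// @param  size      size in bytes of the storage to be aligned
/// @param  ptr       in/out: on input, points to the start of the available buffer; on success, updated to the first address that satisfies alignment
/// @param  space     in/out: number of bytes available from ptr; on success, reduced by the number of bytes skipped for alignment
/// @return the adjusted ptr if size bytes still fit after alignment; otherwise nullptr, with ptr and space left unchanged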
void* align( std::size_t alignment,
             std::size_t size,
             void*& ptr,
             std::size_t& space);
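To make the in/out behaviour of ptr and space concrete, here is a small sketch of my own. AlignResult and align_from_offset_one are hypothetical names; the buffer is declared alignas(16) so that the offsets are predictable.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

struct AlignResult {
  std::ptrdiff_t offset;   // aligned pointer minus buffer start, or -1 on failure
  std::size_t space_left;  // value of `space` after the call
};

// Hypothetical helper: starts one byte into a 16-byte-aligned, 32-byte
// buffer and asks std::align for `size` bytes at the given alignment.
inline AlignResult align_from_offset_one(std::size_t alignment, std::size_t size) {
  alignas(16) static char buf[32];
  void* p = buf + 1;                    // deliberately misaligned start
  std::size_t space = sizeof(buf) - 1;  // 31 bytes remain from p

  void* r = std::align(alignment, size, p, space);
  if (r == nullptr) {
    return {-1, space};  // on failure p and space are left unchanged
  }
  assert(r == p);  // on success p itself is moved to the aligned address
  return {static_cast<char*>(r) - buf, space};
}
```

With alignment 8 and size 4, the pointer moves from offset 1 to offset 8 and space drops from 31 to 24. Note that std::align only subtracts the bytes skipped for alignment; subtracting size is the caller's job. With size 30 the request no longer fits, so the call fails and space stays at 31.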

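As before, let's first look at a demo from cppreference to see how std::align is used.

The class Arena owns a buffer named buffer. Each call to AlignedAllocate<T>(size_t alignment) carves a block of sizeof(T) bytes out of buffer, and the alignment parameter specifies the alignment that the returned pointer must satisfy.

Here is the implementation.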

template <size_t N>
struct Arena {
  char buffer[N];
  void* ptr;
  size_t size;

  Arena() : ptr(buffer), size(N) { }
	
  /// @return a pointer that satisfies the alignment requirement, or nullptr if the buffer is exhausted
  template <typename T>
  T* AlignedAllocate(size_t alignment = alignof(T)) {
      std::cout << "ptr: " << reinterpret_cast<void*>(ptr) << ", ";
      if (std::align(alignment, sizeof(T), ptr, size)) {
          T* result = reinterpret_cast<T*>(ptr);
          ptr = (char*)ptr + sizeof(T);
          size -= sizeof(T);
          return result;
      }
      // Not enough space left: return nullptr
      return nullptr;
  }
};

Here is the test.

int main(int argc, char const *argv[]) {
    Arena<64> arena;

    char* p1 = arena.AlignedAllocate<char>();
    if (p1) *p1 = 'a';
    std::cout << "allocated a char at " << (void*)p1 << '\n';
 
    int* p2 = arena.AlignedAllocate<int>();
    if (p2) *p2 = 1;
    std::cout << "allocated an int at " << (void*)p2 << '\n';
 
    int* p3 = arena.AlignedAllocate<int>(32);
    if (p3) *p3 = 2;
    std::cout << "allocated an int at " << (void*)p3 << '\n';
}

The output below shows that every address returned by AlignedAllocate meets the requested alignment.

$ g++ align.cc -o align && ./align 
ptr: 0x16fc2b4b8, allocated a char at 0x16fc2b4b8     # 1-byte alignment: no adjustment needed
ptr: 0x16fc2b4b9, allocated an int at 0x16fc2b4bc     # 4-byte alignment: pointer advanced by 3 bytes
ptr: 0x16fc2b4c0, allocated an int at 0x16fc2b4c0     # 32-byte alignment: no adjustment needed

Finally, let's look at the implementation of std::align, slightly simplified:

// in <memory>
inline void* align(size_t __align, size_t __size, void *&__ptr, size_t &__space) noexcept {
  const auto __intptr = reinterpret_cast<uintptr_t>(__ptr);
  const auto __aligned = (__intptr - 1u + __align) & -__align;
  const auto __diff = __aligned - __intptr;
  // Not enough room for __size bytes after alignment: return nullptr
  if ((__size + __diff) > __space)
    return nullptr;
  __space -= __diff;
  return __ptr = reinterpret_cast<void *>(__aligned);
}

The key step in std::align is computing the aligned address:

const auto __aligned = (__intptr - 1u + __align) & -__align;

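I had thought about writing a proof for this step, but an example is easier to follow.

Aligning to __align amounts to rounding an address up to the next multiple of __align.

What does that mean?

Take decimal as an analogy: given the address 12, how do we round it up to a multiple of 10?

First add one step minus one: 12 + 10 - 1 = 21
Then drop the remainder: 21 - 21 % 10 = 20

The alignment computation is the same rounding, done with bit operations. This works only because __align is a power of two: in unsigned arithmetic, -__align equals ~(__align - 1), a mask whose low bits are all 0 and whose high bits are all 1. __intptr - 1u + __align moves forward by almost one full step, and the & with -__align clears the low bits, which are exactly the remainder. For example, with __align = 8 and address 13: 13 + 7 = 20, and 20 & ~7 = 16.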
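The rounding can be checked in isolation with a constexpr function. This is a sketch for illustration; align_up and align_up_neg are hypothetical names, and both assume that align is a power of two.

```cpp
#include <cstdint>

// Rounds addr up to the next multiple of align (a power of two).
// ~(align - 1) is a mask with the low log2(align) bits cleared.
constexpr std::uintptr_t align_up(std::uintptr_t addr, std::uintptr_t align) {
  return (addr + align - 1) & ~(align - 1);
}

// Same computation written the way the std::align listing writes it:
// for unsigned values, -align is the same bit pattern as ~(align - 1).
constexpr std::uintptr_t align_up_neg(std::uintptr_t addr, std::uintptr_t align) {
  return (addr - 1u + align) & -align;
}

static_assert(align_up(13, 8) == 16, "13 rounds up to 16");
static_assert(align_up(16, 8) == 16, "aligned addresses are unchanged");
static_assert(align_up(0x4b9, 4) == 0x4bc, "low digits of the int allocation above");
static_assert(align_up_neg(13, 8) == align_up(13, 8), "both forms agree");
```

The third check uses the low digits of the second allocation in the Arena output: the pointer ending in 4b9 moved forward by 3 bytes to 4bc.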

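Memory alignment shows up in many projects. I ran into it again recently while reading RocksDB, which gave me the occasion to write this post. I will try to follow up with posts on RocksDB.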