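## Compile-Time Reflection in C++14

By Qi Yu (祁宇)

> This article analyzes the magic_get source code to introduce the key techniques behind it and to explain in depth how reflection for pod types is implemented.

### Compile-time reflection for pod types

Reflection is a mechanism for obtaining information about the internals of a class from metadata: given the metadata, you can retrieve an object's fields, methods, and so on. The reflection facilities of C# and Java both work by reading an object's metadata. Reflection is useful in areas that depend closely on the structure of objects, such as dependency injection, ORM (object-entity mapping), and serialization and deserialization. Dependency injection in Java's Spring framework, for example, is built on reflection: it reads type information from metadata and creates objects dynamically. ORM mapping between objects and entities is implemented with reflection as well.

Java and C# both run on an intermediate runtime, and that runtime provides the reflection mechanism, so reflection comes easily to them. For a language without an intermediate runtime, implementing reflection is hard.

At CppCon 2016, Antony Polukhin gave a talk on reflection in C++ in which he proposed a new approach: reflection without macros, markup, or extra tools. This looks like an impossible task, because C++ has no reflection mechanism and offers no direct way to obtain an object's metadata. Antony Polukhin found, however, that template metaprogramming techniques from Modern C++ make this kind of compile-time reflection possible for pod types. He open-sourced a compile-time reflection library for pod types, magic_get (https://github.com/apolukhin/magic_get), which is also being prepared for inclusion in Boost. Here is a usage example:

```
#include <boost/pfr.hpp>  // assumed: the header name is missing from the original listing

struct foo {
    int some_integer;
    char c;
};

foo f {777, '!'};
auto& r1 = boost::pfr::flat_get<0>(f); // access the 1st field of foo by index
auto& r2 = boost::pfr::flat_get<1>(f); // access the 2nd field of foo by index
```

As the example shows, magic_get really does access the fields of a foo object non-intrusively. There are no macros, no extra code, and no special tools; the fields of a pod object are resolved at compile time with no runtime overhead. It does feel a little magic. The rest of this article explains how it works.

### Key techniques

The idea behind pod reflection is as follows: first convert the pod type into a corresponding tuple type, then assign the pod value to that tuple, and finally access the tuple elements by index. The two key problems are therefore how to convert a pod type into the corresponding tuple type and how to assign a pod value to the tuple.

#### Converting a pod type to a tuple type

What does the tuple type corresponding to a pod type look like? For the foo above it should be `tuple<int, char>`: the element types of the tuple match the field types of the pod type one to one, in the same order.

The basic approach to generating a tuple from a struct is to extract the type of each field in order, store the types, and later read them back to build the tuple type. The field types differ, though, and C++ has no container that can directly hold different types. A workaround is needed: store the extracted types indirectly by converting each type into an id of type size_t and saving the ids in an `array<std::size_t, N>`. Later the ids are mapped back to the actual types to produce the tuple type. The problem to solve here is how to convert between types and ids in both directions.

#### Converting between types and ids at compile time

An empty class template holds the actual type, and a C++14 constexpr function returns the compile-time id for that type. Together they convert a type to an id. The code is as follows:

```
http://ipad-cms.csdn.net/cms/article/code/3445
```

The code above encodes the types int and char at compile time, turning each type into a compile-time constant from which the concrete type can later be recovered. The code that maps an id back to a type at compile time is:

```
// ids 6 and 9 follow the REGISTER_TYPE table shown later in this section
constexpr auto id_to_type(
    std::integral_constant<std::size_t, 6>) noexcept {
    int res{};
    return res;
}

constexpr auto id_to_type(
    std::integral_constant<std::size_t, 9>) noexcept {
    char res{};
    return res;
}
```

Note that id_to_type returns an instance of the type associated with the id. To get the type itself, apply decltype to the call. magic_get uses a macro to encode all the fundamental pod types, which gives the two-way compile-time conversion between types and ids.

```
#define REGISTER_TYPE(Type, Index) \
    constexpr std::size_t type_to_id(identity<Type>) noexcept { return Index; } \
    constexpr auto id_to_type( std::integral_constant<std::size_t, Index > ) noexcept { Type res{}; return res; } \

// Register all base types here
REGISTER_TYPE(unsigned short    , 1)
REGISTER_TYPE(unsigned int      , 2)
REGISTER_TYPE(unsigned long long, 3)
REGISTER_TYPE(signed char       , 4)
REGISTER_TYPE(short             , 5)
REGISTER_TYPE(int               , 6)
REGISTER_TYPE(long long         , 7)
REGISTER_TYPE(unsigned char     , 8)
REGISTER_TYPE(char              , 9)
REGISTER_TYPE(wchar_t           , 10)
REGISTER_TYPE(long              , 11)
REGISTER_TYPE(unsigned long     , 12)
REGISTER_TYPE(void*             , 13)
REGISTER_TYPE(const void*       , 14)
REGISTER_TYPE(char16_t          , 15)
REGISTER_TYPE(char32_t          , 16)
REGISTER_TYPE(float             , 17)
REGISTER_TYPE(double            , 18)
REGISTER_TYPE(long double       , 19)
```

Once the types are encoded, the next questions are where to store the ids and how to read them back. magic_get defines its own array to hold the type ids of the struct fields.

```
template <class T, std::size_t N>
struct array {
    typedef T type;
    T data[N];

    static constexpr std::size_t size() noexcept { return N; }
};
```

The fixed-length member data stores the id of each field type, and the array index is the position of the field within the struct.

#### Extracting the fields of a pod struct

The previous section showed how field types are stored and retrieved. How are they extracted from the pod struct in the first place? There are three steps:

- define an array that holds the field type ids;
- convert each field type of the pod to its id and store the ids in the array in order;
- strip the unused tail of the array.

The implementation is:

```
template <class T>
constexpr auto fields_count_and_type_ids_with_zeros() noexcept {
    static_assert(std::is_trivial<T>::value, "Not applyable");
    array<std::size_t, sizeof(T)> types{};
    detect_fields_count_and_type_ids<T>(types.data, std::make_index_sequence<sizeof(T)>{});
    return types;
}

template <class T>
constexpr auto array_of_type_ids() noexcept {
    constexpr auto types = fields_count_and_type_ids_with_zeros<T>();
    constexpr std::size_t count = count_nonzeros(types);
    array<std::size_t, count> res{};
    for (std::size_t i = 0; i < count; ++i) {
        res.data[i] = types.data[i];
    }
    return res;
}
```

The array needs a fixed length, so how long should it be? It must be large enough for the maximum possible number of fields. Every field occupies at least one byte, so a struct has at most sizeof(T) fields, and the array length is set to sizeof(T). All elements are initialized to 0. A struct usually has fewer fields than sizeof(T), which leaves unused elements at the end of the array. These have to be removed so that only the valid field type ids remain. This is done by counting the nonzero elements and copying them into a new array. The count is also computed at compile time with constexpr:

```
template <class Array>
constexpr auto count_nonzeros(Array a) noexcept {
    std::size_t count = 0;
    for (std::size_t i = 0; i < Array::size() && a.data[i]; ++i)
        ++ count;

    return count;
}
```

Because the fields are stored in order and every registered id is nonzero, the value of count when the first 0 is reached is the number of valid elements.

Next comes detect_fields_count_and_type_ids, the constexpr function that writes the field type ids of the struct into the array's data.

```
detect_fields_count_and_type_ids<T>(types.data, std::make_index_sequence<sizeof(T)>{});
```

Its first argument is the data member of the fixed-length array, and its second argument is a std::index_sequence of integers. The implementation is:

```
template <class T, std::size_t I0, std::size_t... I>
constexpr auto detect_fields_count_and_type_ids(std::size_t* types, std::index_sequence<I0, I...>) noexcept
    -> decltype( type_to_array_of_type_ids<T, I0, I...>(types) )
{
    return type_to_array_of_type_ids<T, I0, I...>(types);
}

template <class T, std::size_t... I>
constexpr T detect_fields_count_and_type_ids(std::size_t* types, std::index_sequence<I...>) noexcept {
    return detect_fields_count_and_type_ids<T>(types, std::make_index_sequence<sizeof...(I) - 1>{});
}

template <class T>
constexpr T detect_fields_count_and_type_ids(std::size_t*, std::index_sequence<>) noexcept {
    static_assert(!!sizeof(T), "Failed for unknown reason");
    return T{};
}
```

These overloads start with the index sequence 0, 1, 2, ..., sizeof(T) - 1 and pass it to type_to_array_of_type_ids, which stores the field type ids in the array. If T cannot be initialized from that many values, the first overload is discarded through SFINAE and the second overload retries with a sequence that is one element shorter. The process repeats until the number of initializers matches the number of fields.

Before looking at type_to_array_of_type_ids, consider the helper struct ubiq. Storing the pod field type ids is actually done by ubiq, which is implemented as follows:

```
template <std::size_t I>
struct ubiq {
    std::size_t* ref_;

    template <class Type>
    constexpr operator Type() const noexcept {
        ref_[I] = type_to_id(identity<Type>{});
        return Type{};
    }
};
```

This struct is unusual, so let us simplify it first.

```
struct ubiq {
    template <class Type>
    constexpr operator Type() const {
        return Type{};
    }
};
```

What makes it special is that it can be used to construct a value of any pod type, such as int, char, or double.

```
int i = ubiq{};
double d = ubiq{};
char c = ubiq{};
```

The target type of ubiq's conversion operator is deduced by the compiler from the context, which is why ubiq can produce a value of any pod type. Once ubiq has learned which type is being constructed, that type still has to be converted to an id and stored in the fixed-length array in order.

```
template <std::size_t I>
struct ubiq {
    std::size_t* ref_;

    template <class Type>
    constexpr operator Type() const noexcept {
        ref_[I] = type_to_id(identity<Type>{});
        return Type{};
    }
};
```

The code above converts the type deduced by the compiler to an id and stores it at index I of the array. Now back to type_to_array_of_type_ids:

```
template <class T, std::size_t... I>
constexpr auto type_to_array_of_type_ids(std::size_t* types) noexcept
    -> decltype(T{ ubiq<I>{types}... })
{
    return T{ ubiq<I>{types}... };
}
```

type_to_array_of_type_ids has two template parameters. The first, T, is the pod struct type. The second, size_t..., is an integer sequence that starts at 0 and has at most sizeof(T) elements. The function parameter is a size_t* that actually points to the data of an `array<std::size_t, sizeof(T)>`, where the field type ids are stored. The key line is `T{ ubiq<I>{types}... }`. It relies on aggregate initialization of the pod type with a braced list: the compiler deduces the type of each field of T, and ubiq converts that type to an id and stores it in the array. This is the magic in magic_get.

With the field ids of the pod struct stored in the array, the next step is to convert the list of ids into a tuple.

#### Converting the sequence of field ids to a tuple

Converting the pod field id sequence to a tuple takes two steps:

- put the field type ids stored in the array into an integer sequence std::index_sequence;
- convert the type ids in that index_sequence to the corresponding types and form a tuple from them.

The implementation is:

```
template <std::size_t I, class T, std::size_t N>
constexpr const T& get(const array<T,N>& a) noexcept {
    return a.data[I];
}

template <class T, std::size_t... I>
constexpr auto array_of_type_ids_to_index_sequence(std::index_sequence<I...>) noexcept {
    constexpr auto a = array_of_type_ids<T>();
    return std::index_sequence< get<I>(a)...>{};
}
```

get returns the element at a given index of the array, that is, a type id. The returned ids are placed in a std::index_sequence, and the ids in that sequence are then converted back to types to form a tuple.

```
template <std::size_t... I>
constexpr auto as_tuple_impl(std::index_sequence<I...>) noexcept {
    return std::tuple<
        decltype( id_to_type(std::integral_constant<std::size_t, I>{}) )...
    >{};
}

template <class T>
constexpr auto as_tuple() noexcept {
    static_assert(std::is_pod<T>::value, "Not applyable");
    constexpr auto res = as_tuple_impl(
        array_of_type_ids_to_index_sequence<T>(
            std::make_index_sequence<
                decltype(array_of_type_ids<T>())::size()
            >()
        )
    );

    static_assert(sizeof(res) == sizeof(T), "sizes check failed");
    static_assert(
        std::alignment_of<decltype(res)>::value == std::alignment_of<T>::value,
        "alignment check failed"
    );

    return res;
}
```

id_to_type returns an instance of the type associated with an id, so decltype is needed to recover the type. With this in place a tuple type can be obtained from T. What remains is to assign the value of T to the tuple, after which the fields of T can be accessed by index.

#### Assigning a pod to a tuple

With the clang compiler a pod struct can be converted directly to std::tuple, so for clang the work ends here.

```
template <std::size_t I, class T>
decltype(auto) get(const T& val) noexcept {
    auto t = reinterpret_cast<const decltype(as_tuple<T>())*>( std::addressof(val) );
    return get<I>(*t);
}
```

With other compilers, such as msvc or gcc, the memory of std::tuple is not laid out contiguously in field order, so T cannot be converted to the tuple directly. The more general solution is to write a tuple whose memory is contiguous; T can then be converted to that tuple directly.

##### A tuple with contiguous memory

Here is the code for a tuple with contiguous memory:

```
template <std::size_t N, class T>
struct base_from_member {
    T value;
};

template <class I, class ...Tail>
struct tuple_base;

template <std::size_t... I, class ...Tail>
struct tuple_base< std::index_sequence<I...>, Tail... >
    : base_from_member<I, Tail>...
{
    static constexpr std::size_t size_v = sizeof...(I);

    constexpr tuple_base() noexcept = default;
    constexpr tuple_base(tuple_base&&) noexcept = default;
    constexpr tuple_base(const tuple_base&) noexcept = default;

    constexpr tuple_base(Tail... v) noexcept
        : base_from_member<I, Tail>{ v }...
    {}
};

template <>
struct tuple_base<std::index_sequence<> > {
    static constexpr std::size_t size_v = 0;
};

template <class ...Values>
struct tuple: tuple_base<
    std::make_index_sequence<sizeof...(Values)>,
    Values...>
{
    using tuple_base<
        std::make_index_sequence<sizeof...(Values)>,
        Values...
    >::tuple_base;
};
```

base_from_member stores the index and value of one tuple element. tuple_base derives from base_from_member and generates one base_from_member for every element type of the tuple. tuple derives from tuple_base to simplify the use of tuple_base. Finally, a helper is added to tuple for fetching an element by index.

```
template <std::size_t N, class T>
constexpr const T& get_impl(const base_from_member<N, T>& t) noexcept {
    return t.value;
}

template <std::size_t N, class ...T>
constexpr decltype(auto) get(const tuple<T...>& t) noexcept {
    static_assert(N < tuple<T...>::size_v, "Tuple index out of bounds");
    return get_impl<N>(t);
}
```

The elements of the tuple can now be retrieved with get.

That completes the analysis of the core of magic_get. The real code is more complex; to make it easier to follow I chose a simplified version. The complete code is available on GitHub at [magic_get](https://github.com/apolukhin/magic_get), and the simplified version at [https://github.com/qicosmos/cosmos/blob/master/pod_reflection.hpp](https://github.com/qicosmos/cosmos/blob/master/pod_reflection.hpp).

### Summary

magic_get implements reflection for pod types: the fields of a pod struct can be accessed directly by index, without any extra macros, markup, or tools, which really is magic. It is built mainly on C++11/14 variadic templates, constexpr, index_sequence, aggregate initialization of pod types, and a good number of template metaprogramming techniques. What can magic_get be used for? Because it provides compile-time reflection without extra overhead or code, it is well suited to ORM database access engines and generic serialization/deserialization libraries, and I believe more potential and applications are waiting to be discovered.

Seemingly unremarkable features of Modern C++, when combined, can produce something magical. One cannot help admiring how many possibilities Modern C++ holds.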
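
To make the pipeline described in this article easier to try out, here is a condensed C++14 sketch of its first half: type-to-id encoding, the `ubiq` probe, field detection, and rebuilding a `std::tuple` type from the stored ids. It is not magic_get's code. The names `sketch`, `id_array`, `fill_ids`, `ids_with_zeros`, `fields_count`, `as_tuple_t`, and `bar` are made up for illustration, only int and char are registered (with arbitrary ids 1 and 2), and an int/long tag parameter stands in for the overload set used by detect_fields_count_and_type_ids. It also stops at the tuple type and does not reinterpret the struct's memory.

```cpp
// Condensed C++14 sketch; all names below are illustrative assumptions.
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

namespace sketch {

template <class T> struct identity {};  // empty tag that carries a type

// Type <-> id, registered for two types only; ids are arbitrary but nonzero.
constexpr std::size_t type_to_id(identity<int>) noexcept { return 1; }
constexpr std::size_t type_to_id(identity<char>) noexcept { return 2; }
constexpr int id_to_type(std::integral_constant<std::size_t, 1>) noexcept { return 0; }
constexpr char id_to_type(std::integral_constant<std::size_t, 2>) noexcept { return 0; }

template <std::size_t N>
struct id_array { std::size_t data[N]; };

// Converts to whatever field type is requested and records its id in slot I.
template <std::size_t I>
struct ubiq {
    std::size_t* ref_;
    template <class Type>
    constexpr operator Type() const noexcept {
        ref_[I] = type_to_id(identity<Type>{});
        return Type{};
    }
};

// Preferred overload (exact match for the literal 0): viable only if T can be
// brace-initialized from sizeof...(I) values; otherwise SFINAE removes it.
template <class T, std::size_t... I>
constexpr auto fill_ids(std::size_t* ids, std::index_sequence<I...>, int) noexcept
    -> decltype(T{ubiq<I>{ids}...}) {
    return T{ubiq<I>{ids}...};
}

// Fallback: retry with one initializer fewer.
template <class T, std::size_t... I>
constexpr T fill_ids(std::size_t* ids, std::index_sequence<I...>, long) noexcept {
    return fill_ids<T>(ids, std::make_index_sequence<sizeof...(I) - 1>{}, 0);
}

template <class T>
constexpr id_array<sizeof(T)> ids_with_zeros() noexcept {
    id_array<sizeof(T)> ids{};  // zero-filled; unused slots stay 0
    fill_ids<T>(ids.data, std::make_index_sequence<sizeof(T)>{}, 0);
    return ids;
}

template <class T>
constexpr std::size_t fields_count() noexcept {
    constexpr id_array<sizeof(T)> ids = ids_with_zeros<T>();
    std::size_t n = 0;
    while (n < sizeof(T) && ids.data[n] != 0) ++n;
    return n;
}

// Map the stored ids back to types and pack them into std::tuple.
template <class T, std::size_t... I>
constexpr auto as_tuple_impl(std::index_sequence<I...>) noexcept {
    constexpr id_array<sizeof(T)> ids = ids_with_zeros<T>();
    return std::tuple<decltype(id_to_type(
        std::integral_constant<std::size_t, ids.data[I]>{}))...>{};
}

template <class T>
using as_tuple_t =
    decltype(as_tuple_impl<T>(std::make_index_sequence<fields_count<T>()>{}));

struct foo { int some_integer; char c; };  // the struct from the article
struct bar { char a; int b; char c; };     // hypothetical second example

static_assert(fields_count<foo>() == 2, "foo has two fields");
static_assert(std::is_same<as_tuple_t<foo>, std::tuple<int, char>>::value,
              "field types are recovered in declaration order");
static_assert(std::is_same<as_tuple_t<bar>, std::tuple<char, int, char>>::value,
              "padding does not produce extra fields");

}  // namespace sketch
```

Everything here is checked by the compiler: if the file compiles, the static_asserts have passed. A struct with a field type that has no `type_to_id` overload (double, for instance) will fail to compile in this sketch, which is the gap that the REGISTER_TYPE table fills in the real library.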