Remove Orca assertions when merging buckets
These assertions started tripping in the previous commit when tests were added, but they are not related to the Epsilon change. Rather, we calculate the frequency of a singleton bucket using two different methods, and the two results can disagree, which breaks the assertions.

The first method (calculating the upper_third) assumes the singleton has 1 NDV and that there is an even distribution across the NDVs. The second (in GetOverlapPercentage) calculates a "resolution" that is based on Epsilon and assumes the bucket contains some small Epsilon frequency. This makes the overlap percentage too high; it should likely be based on the NDV as well.

In practice, this won't have much impact unless the NDV is very small. Additionally, the conditional logic is based on the bounds, not the frequency. However, it would be good to align the two methods in the future so that our statistics calculations are simpler to understand and more predictable. For now, we remove the assertions and add a TODO. Once the methods are aligned, we should add the assertions back.
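
To illustrate the mismatch, here is a minimal sketch of two ways to attribute part of a bucket's frequency to a single value. It is not Orca's code: the Bucket struct, the function names, the resolution parameter, and the formulas are all hypothetical stand-ins, chosen only to show why an assertion that expects both estimates to match can fail.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a histogram bucket; not Orca's CBucket.
struct Bucket {
    double lower;      // lower bound
    double upper;      // upper bound
    double frequency;  // fraction of rows that fall in the bucket
    double ndv;        // number of distinct values in the bucket
};

// Method 1 (assumed): the singleton is one of the bucket's NDVs and the
// frequency is spread evenly across them.
double SingletonFreqByNdv(const Bucket &b) {
    return b.frequency / b.ndv;
}

// Method 2 (assumed): the singleton is treated as a sliver of width
// `resolution`, and its share is the ratio of that width to the bucket width.
double SingletonFreqByResolution(const Bucket &b, double resolution) {
    double overlap = resolution / (b.upper - b.lower);
    if (overlap > 1.0) {
        overlap = 1.0;  // a share cannot exceed the whole bucket
    }
    return b.frequency * overlap;
}

// The kind of consistency check that breaks when the methods are not aligned.
bool EstimatesAgree(const Bucket &b, double resolution, double tolerance) {
    double diff = SingletonFreqByNdv(b) - SingletonFreqByResolution(b, resolution);
    return std::fabs(diff) <= tolerance;
}
```

With made-up numbers such as a bucket over [0, 1] with frequency 0.5 and 100 NDVs, the NDV-based estimate is 0.005, while a resolution of 0.05 gives 0.025. The two only agree when the resolution happens to equal the bucket width divided by the NDV, which is why basing both calculations on the NDV would make them consistent.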