Make 'rows' estimate more accurate for plans that fetch only a few rows.

In commit c5f6dbbe, we changed the row and cost estimates on plan nodes
to represent per-segment costs. That made some estimates worse, because
the effects of the estimate "clamping" compound. Per my comment on the
PR back then:

> One interesting effect of this change, that explains many of the
> plan changes: If you have a table with very few rows, or e.g. a qual
> like id = 123 that matches exactly one row, the Seq/Index Scan on it
> will be marked with rows=1. It now means that we estimate that every
> segment returns one row, although in reality, only one of them will
> return a row, and the rest will return nothing. That's because the
> row count estimates are "clamped" in the planner to at least
> 1. That's not a big deal on its own, but if you then have e.g. a
> Gather Motion on top of the Scan, the planner will estimate that the
> Gather Motion returns as many rows as there are segments. If you
> have e.g. 100 segments, that's relatively a big discrepancy, with
> 100 rows vs 1. I don't think that's a big problem in practice, I
> don't think most plans are very sensitive to that kind of a
> misestimate. What do you think?
>
> If we wanted to fix that, perhaps we should stop "clamping" the
> estimates to 1. I don't think there's any fundamental reason we need
> to do it. Perhaps clamp down to 1 / numsegments instead.

But I came up with a less intrusive idea, implemented in this commit:
Most Motion nodes have a "parent" RelOptInfo, and the RelOptInfo
contains an estimate of the total number of rows, before dividing it by
the number of segments or clamping. So if the row estimate we get from
the subpath seems clamped to 1.0, we look at the row estimate on the
underlying RelOptInfo instead, and use that if it's smaller. That makes
the row count estimates better for plans that fetch a single row or a
few rows, same as they were before commit c5f6dbbe.
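
To illustrate the idea, here is a minimal standalone sketch in C. It is
not the code from this commit: SketchRel, SketchPath, sketch_clamp_rows
and sketch_motion_rows are hypothetical names, and treating a
non-positive rel->rows as "no estimate" is an assumption made for the
example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the planner structs, not the real ones. */
typedef struct
{
	double		rows;	/* total rows across all segments; <= 0 is
						 * assumed to mean "no estimate" */
} SketchRel;

typedef struct
{
	double		rows;	/* per-segment rows, already clamped to >= 1 */
} SketchPath;

/* Simplified clamp: never estimate fewer than one row. */
static double
sketch_clamp_rows(double rows)
{
	return (rows < 1.0) ? 1.0 : rows;
}

/*
 * Estimate the total number of rows flowing through a Motion that
 * collects the subpath's output from all segments.
 */
static double
sketch_motion_rows(const SketchRel *rel, const SketchPath *subpath,
				   int numsegments)
{
	double		total;

	assert(numsegments > 0);

	/* Naive estimate: every segment returns subpath->rows rows. */
	total = subpath->rows * numsegments;

	/*
	 * If the per-segment estimate looks clamped, and the parent rel has
	 * a smaller total estimate, trust the rel instead.
	 */
	if (subpath->rows <= 1.0 && rel != NULL &&
		rel->rows > 0 && rel->rows < total)
		total = rel->rows;

	return sketch_clamp_rows(total);
}
```

With 100 segments and a rel estimated at one row in total, the
per-segment estimate is clamped to 1, so the naive estimate for the
Motion is 100 rows; the sketch returns 1 instead. If the per-segment
estimate is above 1, or there is no rel estimate, the naive value is
kept.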

Not all RelOptInfos have a row count estimate, and the subpath's
estimate is more accurate if the number of rows produced by the path
differs from the number of rows in the underlying relation, e.g. because
of a ProjectSet node, so we still prefer the subpath's estimate if it
doesn't seem clamped.

Reviewed-by: Zhenghua Lyu <zlv@pivotal.io>