    Make 'rows' estimate more accurate for plans that fetch only a few rows. · f4d48358
    Heikki Linnakangas committed
    In commit c5f6dbbe, we changed the row and cost estimates on plan nodes
    to represent per-segment costs. That made some estimates worse, because
    the effects of estimate "clamping" compound. Per my comment on the
    PR back then:
    
    > One interesting effect of this change, that explains many of the
    > plan changes: If you have a table with very few rows, or e.g. a qual
    > like id = 123 that matches exactly one row, the Seq/Index Scan on it
    > will be marked with rows=1. It now means that we estimate that every
    > segment returns one row, although in reality, only one of them will
    > return a row, and the rest will return nothing. That's because the
    > row count estimates are "clamped" in the planner to at least
    > 1. That's not a big deal on its own, but if you then have e.g. a
    > Gather Motion on top of the Scan, the planner will estimate that the
    > Gather Motion returns as many rows as there are segments. If you
    > have e.g. 100 segments, that's a relatively big discrepancy, with
    > 100 rows vs 1. I don't think that's a big problem in practice, I
    > don't think most plans are very sensitive to that kind of a
    > misestimate. What do you think?
    >
    > If we wanted to fix that, perhaps we should stop "clamping" the
    > estimates to 1. I don't think there's any fundamental reason we need
    > to do it. Perhaps clamp down to 1 / numsegments instead.
    
    But I came up with a less intrusive idea, implemented in this commit:
    Most Motion nodes have a "parent" RelOptInfo, and the RelOptInfo
    contains an estimate of the total number of rows, before dividing it
    by the number of segments or clamping. So if the row estimate we get
    from the subpath seems clamped to 1.0, we look at the row estimate on
    the underlying RelOptInfo instead, and use that if it's smaller. That
    makes the row count estimates better for plans that fetch a single row
    or a few rows, same as they were before commit c5f6dbbe. Not all
    RelOptInfos have a row count estimate, and the subpath's estimate is
    more accurate if the number of rows produced by the path differs from
    the number of rows in the underlying relation, e.g. because of a
    ProjectSet node, so we still prefer the subpath's estimate if it
    doesn't seem clamped.
    Reviewed-by: Zhenghua Lyu <zlv@pivotal.io>