gpMgmt/bin/gpcheckcat · 226e8867d88d3dca0c1381cd0d90255c91221058 · Greenplum / Gpdb

Don't use a temp table in gpcheckcat when checking for missing entries. · 226e8867

Committed by Heikki Linnakangas on Oct 28, 2017

The new query is simpler. There was a comment about using the temp table
to avoid gathering all the data to the master, but I don't think that is a
good tradeoff. Creating a temp table is pretty expensive, and even with
the temp table, the master needs to broadcast all of its entries to the
segments. For comparison, with the Gather node, all the segments
need to send their entries to the master. Isn't that roughly the same
amount of traffic?
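
As a rough illustration of that comparison, here is a back-of-the-envelope
sketch in Python. The function names and the row counts are hypothetical,
chosen only to show the arithmetic; they are not measurements from a real
cluster, and the sketch ignores row width and network overhead.

```python
def rows_sent_by_broadcast(master_rows, num_segments):
    """Temp-table approach: the master's entries are sent to every segment."""
    return master_rows * num_segments


def rows_sent_by_gather(rows_per_segment):
    """Gather approach: every segment sends its own entries to the master."""
    return sum(rows_per_segment)


# Hypothetical cluster: 1000 segments, each holding about as many catalog
# rows as the master does (catalogs are expected to be nearly identical).
num_segments = 1000
master_rows = 10_000
rows_per_segment = [master_rows] * num_segments

broadcast_total = rows_sent_by_broadcast(master_rows, num_segments)
gather_total = rows_sent_by_gather(rows_per_segment)
```

Under the assumption that each segment's catalog is about the same size as
the master's, both totals come out equal, which is the point of the
question above.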

A long time ago, the query was made to use the temp table, after a report
from a huge cluster with over 1000 segments, where the total size of
pg_attribute, across all the nodes, was over 200 GB. So the catalogs can
be large. But even then, I don't think this query can get much better than
this.

The new query moves some of the logic from SQL to the Python code. Seems
simpler that way.
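
A minimal sketch of what doing this kind of check on the Python side can
look like. This is not the actual gpcheckcat code: the function name, the
(segment id, key) row shape, and the sample data are all assumptions made
for illustration. It assumes the query has already gathered one row per
(node, catalog entry), with the master identified as segment -1.

```python
from collections import defaultdict


def find_missing_entries(rows, all_segment_ids):
    """Return {key: [segment ids where the entry is absent]}.

    rows: iterable of (segment_id, key) tuples gathered from all nodes
          (hypothetical shape, not gpcheckcat's real result format).
    all_segment_ids: every segment id expected to hold the entry.
    """
    # Group the segment ids on which each catalog entry was seen.
    seen = defaultdict(set)
    for segment_id, key in rows:
        seen[key].add(segment_id)

    expected = set(all_segment_ids)
    missing = {}
    for key, segments in seen.items():
        absent = expected - segments
        if absent:
            missing[key] = sorted(absent)
    return missing


# Entry 100 exists everywhere; entry 200 is missing on segment 0.
sample_rows = [(-1, 100), (0, 100), (1, 100), (-1, 200), (1, 200)]
result = find_missing_entries(sample_rows, [-1, 0, 1])
```

The grouping and set difference replace what would otherwise be a join
against a temp table in SQL.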

The real reason to do this right now is that in the next commit, I'm
going to change the way snapshots are dispatched with a query, and that
change will affect the visibility of the temp table that was created in
the same command. In a nutshell, currently, if you do "CREATE TABLE mytemp
AS SELECT oid FROM pg_class WHERE relname='mytemp'", the oid of the table
being created is included. On PostgreSQL, and after the snapshot changes
I'm working on, it will not be, and that would confuse this gpcheckcat query.

