1. 14 Mar 2019, 1 commit
  2. 27 Feb 2019, 1 commit
    • J
      refactor NUMSEGMENTS related macro (#7028) · d28b7057
      Authored by Jialun
      - Retire GP_POLICY_ALL_NUMSEGMENTS and GP_POLICY_ENTRY_NUMSEGMENTS,
        unify them to getgpsegmentCount
      - Retire GP_POLICY_MINIMAL_NUMSEGMENTS & GP_POLICY_RANDOM_NUMSEGMENTS
      - Change the NUMSEGMENTS-related macros from variable macros to function
        macros
      - Change the default return value of getgpsegmentCount to 1, which
        represents a singleton PostgreSQL in utility mode
      - Change __GP_POLICY_INVALID_NUMSEGMENTS to GP_POLICY_INVALID_NUMSEGMENTS
      d28b7057
  3. 15 Feb 2019, 1 commit
  4. 14 Feb 2019, 1 commit
    • P
      Handle parameterized paths correctly when creating a join path. (#6770) · 5a808652
      Authored by Paul Guo
      Since we have had parameterized paths since PostgreSQL 9.2, and LATERAL since 9.3
      (although we do not support the full functionality yet), merge join paths and
      hash join paths need to take them into account. Besides, for nestloop paths the
      previous code was wrong:

      1. It did not allow motion for paths that include an index
         (path_contains_inner_index()). That is wrong.  Here are two examples of
         index paths which do allow motion:
      
      ->  Broadcast Motion 3:3  (slice1; segments: 3)  (cost=0.17..24735.67 rows=86100 width=0)
          ->  Index Only Scan using t2i on t2  (cost=0.17..21291.67 rows=28700 width=0)
      
      ->  Broadcast Motion 1:3  (slice1; segments: 1)  (cost=0.17..6205.12 rows=259 width=8)
          ->  Index Scan using t2i on t2  (cost=0.17..6201.67 rows=29 width=8)
              Index Cond: (4 = a)
      
      2. The inner path and outer path might require parameters from upper nodes for
         parameterized paths, so the current check
           bms_overlap(inner_req_outer, outer_path->parent->relids)
         is definitely not sufficient; besides, the outer_path could have parameterized
         paths as well.
      
      For nestloop joins, case 1 is covered by the test case added in join_gp.
      Case 2 is partially covered by the test case in join.sql added in this
      patch (although currently ignored).

      Note that the change in this patch is conservative. In theory, we could follow
      the subplan code and allow broadcast of the base rel if needed (with that
      solution no motion is needed), but that needs much effort and does not seem
      worthwhile given that we will probably refactor the related code for
      LATERAL support in the near future.
      5a808652
  5. 12 Feb 2019, 1 commit
    • H
      Ensure that Motion nodes in parameterized plans are not rescanned. · 25763c22
      Authored by Heikki Linnakangas
      In plans with a Nested Loop join on the inner side of another Nested Loop
      join, the planner could produce a plan where a Motion node was rescanned.
      That produced an error at execution time:
      
      ERROR:  illegal rescan of motion node: invalid plan (nodeMotion.c:1604)  (seg0 slice4 127.0.0.1:40000 pid=27206) (nodeMotion.c:1604)
      HINT:  Likely caused by bad NL-join, try setting enable_nestloop to off
      
      Make sure we add a Materialize node to shield the Motion node from rescanning
      in such cases.
      
      While we're at it, add an explicit flag to MaterialPaths and plans, to
      indicate that the Material node was added to shield the child node from
      rescanning. There was a weaker test in ExecInitMaterial itself, which just
      checked if the immediate child was a Motion node, but that feels
      sketchy; what if there's a Result node in between, for example? However, I
      kept the direct check for a Motion node, too, because I'm not sure if there
      are other places where we add Material nodes on top of Motions, aside from
      the call in create_nestloop_path() that this fixes. ORCA probably does
      that, at least.
      
      Fixes https://github.com/greenplum-db/gpdb/issues/6769
      Reviewed-by: Pengzhou Tang <ptang@pivotal.io>
      Reviewed-by: Paul Guo <pguo@pivotal.io>
      25763c22
  6. 01 Feb 2019, 1 commit
    • H
      Use normal hash operator classes for data distribution. · 242783ae
      Authored by Heikki Linnakangas
      Replace the use of the built-in hashing support for built-in datatypes, in
      cdbhash.c, with the normal PostgreSQL hash functions. Now is a good time
      to do this, since we've already made the change to use jump consistent
      hashing in GPDB 6, so we'll need to deal with the upgrade problems
      associated with changing the hash functions, anyway.
      
      It is no longer enough to track which columns/expressions are used to
      distribute data. You also need to know the hash function used. For that,
      a new field is added to gp_distribution_policy, to record the hash
      operator class used for each distribution key column. In the planner,
      a new opfamily field is added to DistributionKey, to track that throughout
      the planning.
      
      Normally, if you do "CREATE TABLE ... DISTRIBUTED BY (column)", the
      default hash operator class for the datatype is used. But this patch
      extends the syntax so that you can specify the operator class explicitly,
      like "... DISTRIBUTED BY (column opclass)". This is similar to how an
      operator class can be specified for each column in CREATE INDEX.
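      As a hedged illustration of the syntax (the table and column names here are hypothetical, not from this commit):

      ```sql
      -- Default hash operator class for the datatype is used:
      CREATE TABLE orders_default (id int4, note text) DISTRIBUTED BY (id);

      -- Explicitly pick an operator class for the distribution key column,
      -- similar to specifying an opclass per column in CREATE INDEX:
      CREATE TABLE orders_legacy (id int4, note text) DISTRIBUTED BY (id cdbhash_int4_ops);
      ```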
      
      To support upgrade, the old hash functions have been converted to special
      (non-default) operator classes, named cdbhash_*_ops. For example, if you
      want to use the old hash function for an integer column, you could do
      "DISTRIBUTED BY (intcol cdbhash_int4_ops)". The old hard-coded whitelist
      of operators that have "compatible" cdbhash functions has been replaced
      by putting the compatible hash opclasses in the same operator family. For
      example, all the legacy integer operator classes, cdbhash_int2_ops,
      cdbhash_int4_ops and cdbhash_int8_ops, are part of the
      cdbhash_integer_ops operator family.
      
      This removes the pg_database.hashmethod field. The hash method is now
      tracked on a per-table and per-column basis, using the opclasses, so it's
      not needed anymore.
      
      To help with upgrade from GPDB 5, this introduces a new GUC called
      'gp_use_legacy_hashops'. If it's set, CREATE TABLE uses the legacy hash
      opclasses, instead of the default hash opclasses, if the opclass is not
      specified explicitly. pg_upgrade will set the new GUC, to force the use of
      legacy hashops, when restoring the schema dump. It will also set the GUC
      on all upgraded databases, as a per-database option, so any new tables
      created after upgrade will also use the legacy opclasses. It seems better
      to be consistent after upgrade, so that collocation between old and new
      tables works, for example. The idea is that some time after the upgrade, the
      admin can reorganize all tables to use the default opclasses instead. At
      that point, he should also clear the GUC on the converted databases. (Or
      rather, the automated tool that hasn't been written yet, should do that.)
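      A hedged sketch of how the GUC might be used around an upgrade (the database and table names are placeholders):

      ```sql
      -- Force CREATE TABLE to pick the legacy cdbhash_*_ops opclasses by default:
      SET gp_use_legacy_hashops = on;
      CREATE TABLE upgraded_style (id int4) DISTRIBUTED BY (id);

      -- Keep the behaviour for all new tables in an upgraded database,
      -- as pg_upgrade does per database ('mydb' is a placeholder):
      ALTER DATABASE mydb SET gp_use_legacy_hashops = on;
      ```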
      
      ORCA doesn't know about hash operator classes, or the possibility that we
      might need to use a different hash function for two columns with the same
      datatype. Therefore, it cannot produce correct plans for queries that mix
      different distribution hash opclasses for the same datatype, in the same
      query. There are checks in the Query->DXL translation, to detect that
      case, and fall back to the planner. As long as you stick to the default
      opclasses in all tables, we let ORCA create the plan without any regard
      to them, and use the default opclasses when translating the DXL plan to a
      Plan tree. We also allow the case that all tables in the query use the
      "legacy" opclasses, so that ORCA works after pg_upgrade. But a mix of the
      two, or using any non-default opclasses, forces ORCA to fall back.
      
      One curiosity with this is the "int2vector" and "aclitem" datatypes. They
      have a hash opclass, but no b-tree operators. GPDB 4 used to allow them
      as DISTRIBUTED BY columns, but we forbid that in GPDB 5, in commit
      56e7c16b. Now they are allowed again, so you can specify an int2vector
      or aclitem column in DISTRIBUTED BY, but it's still pretty useless,
      because the planner still can't form EquivalenceClasses on it, and will
      treat it as "strewn" distribution, and won't co-locate joins.
      
      Abstime, reltime, tinterval datatypes don't have default hash opclasses.
      They are being removed completely in PostgreSQL v12, and users shouldn't
      be using them in the first place, so instead of adding hash opclasses for
      them now, we accept that they can't be used as distribution key columns
      anymore. Add a check to pg_upgrade, to refuse upgrade if they are used
      as distribution keys in the old cluster. Do the same for 'money' datatype
      as well, although that's not being removed in upstream.
      
      The legacy hashing code for anyarray in GPDB 5 was actually broken. It
      could produce a different hash value for two arrays that are considered
      equal, according to the = operator, if there were differences in e.g.
      whether the null bitmap was stored or not. Add a check to pg_upgrade, to
      reject the upgrade if array types were used as distribution keys. The
      upstream hash opclass for anyarray works, though, so it is OK to use
      arrays as distribution keys in new tables. We just don't support binary
      upgrading them from GPDB 5. (See github issue
      https://github.com/greenplum-db/gpdb/issues/5467). The legacy hashing of
      'anyrange' had the same problem, but that was new in GPDB 6, so we don't
      need a pg_upgrade check for that.
      
      This also tightens the checks in ALTER TABLE ALTER COLUMN and CREATE UNIQUE
      INDEX, so that you can no longer create a situation where a non-hashable
      column becomes the distribution key. (Fixes github issue
      https://github.com/greenplum-db/gpdb/issues/6317)
      
      Discussion: https://groups.google.com/a/greenplum.org/forum/#!topic/gpdb-dev/4fZVeOpXllQ
      Co-authored-by: Mel Kiyama <mkiyama@pivotal.io>
      Co-authored-by: Abhijit Subramanya <asubramanya@pivotal.io>
      Co-authored-by: Pengzhou Tang <ptang@pivotal.io>
      Co-authored-by: Chris Hajas <chajas@pivotal.io>
      Reviewed-by: Bhuvnesh Chaudhary <bchaudhary@pivotal.io>
      Reviewed-by: Ning Yu <nyu@pivotal.io>
      Reviewed-by: Simon Gao <sgao@pivotal.io>
      Reviewed-by: Jesse Zhang <jzhang@pivotal.io>
      Reviewed-by: Zhenghua Lyu <zlv@pivotal.io>
      Reviewed-by: Melanie Plageman <mplageman@pivotal.io>
      Reviewed-by: Yandong Yao <yyao@pivotal.io>
      242783ae
  7. 15 Dec 2018, 1 commit
    • H
      Refactor executor code for TableScan, DynamicTableScan, BitmapHeapScan. · db516347
      Authored by Heikki Linnakangas
      This removes a lot of GPDB-specific code that was used to deal with
      dynamic scans, and code duplication between nodes dealing with Heap, AO
      and AOCS tables.
      
      * Resurrect SeqScan node. We had replaced it with TableScan in GPDB.
        Teach SeqScan to also work on append-only and AOCS tables, and remove
        TableScan and all the code changes that were made in GPDB earlier to
        deal with all three table types.
      
      * Merge BitmapHeapScan, BitmapAppendOnlyScan, and BitmapTableScan node
        types. They're all BitmapHeapScans now. We used to use BitmapTableScans
        in ORCA-generated plans, and BitmapHeapScan/BitmapAppendOnlyScan in
        planner-generated plans, and there was no good reason for the
        difference. The "heap" part in the name is a bit misleading, but I
        prefer to keep the upstream name, even though it now handles AO tables
        as well. It's more like the old BitmapTableScan now, which also handled
        all three table types, but the code is refactored to stay as close to
        upstream as possible.
      
      * Introduce DynamicBitmapHeapScan. BitmapTableScan used to perform Dynamic
        scans too, now it's the responsibility of the new DynamicBitmapHeapScan
        plan node, just like we have DynamicTableScan and DynamicIndexScan as
        wrappers around SeqScans and IndexScans.
      
      * Get rid of BitmapAppendOnlyPath in the planner, too. Use BitmapHeapPath
        also for AO tables.
      
      * Refactor the way Dynamic Table Scan works. A Dynamic Table Scan node
        is now just a thin wrapper around SeqScan. It initializes a new
        SeqScan executor node for every partition, and lets it do the actual
        scanning. It now works the same way that I refactored Dynamic Index
        Scans to work in commit 198f701e. This allowed removing a lot of code
        that we used to use for both Dynamic Index Scans and Dynamic Table
        Scans, but is no longer used.
      
      There's now some duplication in the Dynamic* nodes, to walk through the
      partitions. They all have a function called setPidIndex(), for example,
      which does the same thing. But I think it's much more clear this way,
      than the previous DynamicController stuff. We could perhaps extract some
      of the code to common helper functions, but I think this is OK for now.
      
      This also fixes issue #6274. I'm not sure what exactly the bug was, but it
      was clearly in the Bitmap Table Scan code that is used with ORCA-generated
      plans. Now that we use the same code for plans generated with the Postgres
      planner and ORCA, it's not surprising that the bug is gone.
      Reviewed-by: Ashwin Agrawal <aagrawal@pivotal.io>
      Reviewed-by: Bhuvnesh Chaudhary <bchaudhary@pivotal.io>
      db516347
  8. 07 Dec 2018, 1 commit
    • N
      Create partition table with same numsegments for parent and children · 8f898338
      Authored by Ning Yu
      When creating a partition table we want the children to have the same
      numsegments as the parent.  As they all set their numsegments to DEFAULT,
      does this meet our expectation?  No, because DEFAULT does not always
      equal DEFAULT itself.  When DEFAULT is set to RANDOM, a different
      value is returned each time.
      
      So we have to align numsegments explicitly.
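      As a hedged illustration of the expected behaviour (the table and partition names are hypothetical):

      ```sql
      -- Parent and child partitions should now report the same numsegments:
      CREATE TABLE sales (id int, region int)
          DISTRIBUTED BY (id)
          PARTITION BY LIST (region)
          (PARTITION r1 VALUES (1), PARTITION r2 VALUES (2));

      SELECT localoid::regclass, numsegments
          FROM gp_distribution_policy
          WHERE localoid::regclass::text LIKE 'sales%';
      ```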
      
      Also removed an incorrect assert and comment.
      8f898338
  9. 03 Dec 2018, 2 commits
  10. 27 Nov 2018, 1 commit
    • H
      Replace PathKey with new DistributionKey struct, in CdbPathLocus. · 882958da
      Authored by Heikki Linnakangas
      In PostgreSQL, a PathKey represents sort ordering, but we have been using
      it in GPDB to also represent the distribution keys of hash-distributed
      data in the planner. (i.e. the keys in DISTRIBUTED BY of a table, but also
      when data is redistributed by some other key on the fly). That's been
      convenient, and there's some precedent for that, since PostgreSQL also
      uses PathKey to represent GROUP BY columns, which is quite similar to
      DISTRIBUTED BY.
      
      However, there are some differences. The opfamily, strategy and nulls_first
      fields in PathKey are not applicable to distribution keys. Using the same
      struct to represent ordering and hash distribution is sometimes convenient,
      for example when we need to test whether the sort order or grouping is
      "compatible" with the distribution. But at other times, it's confusing.
      
      To clarify that, introduce a new DistributionKey struct, to represent
      a hashed distribution. While we're at it, simplify the representation of
      HashedOJ locus types, by including a List of EquivalenceClasses in
      DistributionKey, rather than just one EC like a PathKey has. CdbPathLocus
      now has only one 'distkey' list that is used for both Hashed and HashedOJ
      locus, and it's a list of DistributionKeys. Each DistributionKey in turn
      can contain multiple EquivalenceClasses.
      
      Looking ahead, I'm working on a patch to generalize the "cdbhash"
      mechanism, so that we'd use the normal Postgres hash opclasses for
      distribution keys, instead of hard-coding support for specific datatypes.
      With that, the hash operator class or family will be an important part of
      the distribution key, in addition to the datatype. The plan is to store
      that also in DistributionKey.
      Reviewed-by: Melanie Plageman <mplageman@pivotal.io>
      882958da
  11. 23 Nov 2018, 1 commit
    • N
      Fix numsegments when appending multiple SingleQEs · fa86f160
      Authored by Ning Yu
      When an Append node contains a SingleQE subpath we used to put the Append on ALL
      the segments; however, if the SingleQE is only partially distributed then
      apparently we cannot put the SingleQE on ALL the segments, and this
      conflict could result in runtime errors or incorrect results.
      
      To fix this we should put Append on SingleQE's segments.
      
      On the other hand, when there are multiple SingleQE subpaths we should
      put the Append on the common segments of the SingleQEs.
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      fa86f160
  12. 19 Nov 2018, 1 commit
    • A
      Add support for executing foreign tables on master, any or all segments · 3c6c6ab2
      Authored by Adam Lee
      This commit adds support for the `mpp_execute 'MASTER | ANY |
      ALL SEGMENTS'` option for foreign tables.

      MASTER is the default: the FDW requests data from the master.

      ANY: the FDW requests data from the master or any one segment, depending on
      which path costs less.

      ALL SEGMENTS: the FDW requests data from all segments, so wrappers need to
      have a policy matching the segments to data.
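      As a hedged sketch (the foreign table and server names are hypothetical; only the mpp_execute option and its values come from this commit):

      ```sql
      -- Hypothetical foreign server and table; only the mpp_execute option
      -- is from this commit, the rest is illustrative.
      CREATE FOREIGN TABLE events_ft (id int, payload text)
          SERVER some_fdw_server
          OPTIONS (mpp_execute 'all segments');

      -- Switch back to requesting data from the master only:
      ALTER FOREIGN TABLE events_ft OPTIONS (SET mpp_execute 'master');
      ```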
      
      For instance, file_fdw probes the mpp_execute value, then loads different
      files based on the segment number. But something like gpfdist on the
      foreign side doesn't need this, since it hands out a different slice of the
      data to each request; all segments can request the same location.
      3c6c6ab2
  13. 12 Nov 2018, 1 commit
    • P
      Fix another issue of inheritance table · 39856768
      Authored by Pengzhou Tang
      Previously, when creating an APPEND node for an inheritance table, if
      the subpaths had different numbers of segments in gp_distribution_policy,
      the whole APPEND node might be assigned a wrong numsegments, so some
      segments could not get plans and data was lost from the results.
      39856768
  14. 05 Nov 2018, 1 commit
  15. 12 Oct 2018, 1 commit
  16. 28 Sep 2018, 1 commit
    • Z
      Allow tables to be distributed on a subset of segments · 4eb65a53
      Authored by ZhangJackey
      There was an assumption in GPDB that a table's data is always
      distributed on all segments. However, this is not always true: for example,
      when a cluster is expanded from M segments to N (N > M), all the tables
      are still on M segments. To work around the problem we used to have to
      alter all the hash-distributed tables to randomly distributed to get
      correct query results, at the cost of bad performance.
      
      Now we support table data to be distributed on a subset of segments.
      
      A new column `numsegments` is added to the catalog table
      `gp_distribution_policy` to record how many segments a table's data is
      distributed on.  By doing so we can allow DML on M-segment tables; joins
      between M-segment and N-segment tables are also supported.
      
      ```sql
      -- t1 and t2 are both distributed on (c1, c2),
      -- one on 1 segment, the other on 2 segments
      select localoid::regclass, attrnums, policytype, numsegments
          from gp_distribution_policy;
       localoid | attrnums | policytype | numsegments
      ----------+----------+------------+-------------
       t1       | {1,2}    | p          |           1
       t2       | {1,2}    | p          |           2
      (2 rows)
      
      -- t1 and t1 have exactly the same distribution policy,
      -- join locally
      explain select * from t1 a join t1 b using (c1, c2);
                         QUERY PLAN
      ------------------------------------------------
       Gather Motion 1:1  (slice1; segments: 1)
         ->  Hash Join
               Hash Cond: a.c1 = b.c1 AND a.c2 = b.c2
               ->  Seq Scan on t1 a
               ->  Hash
                     ->  Seq Scan on t1 b
       Optimizer: legacy query optimizer
      
      -- t1 and t2 are both distributed on (c1, c2),
      -- but as they have different numsegments,
      -- one has to be redistributed
      explain select * from t1 a join t2 b using (c1, c2);
                                QUERY PLAN
      ------------------------------------------------------------------
       Gather Motion 1:1  (slice2; segments: 1)
         ->  Hash Join
               Hash Cond: a.c1 = b.c1 AND a.c2 = b.c2
               ->  Seq Scan on t1 a
               ->  Hash
                     ->  Redistribute Motion 2:1  (slice1; segments: 2)
                           Hash Key: b.c1, b.c2
                           ->  Seq Scan on t2 b
       Optimizer: legacy query optimizer
      ```
      4eb65a53
  17. 25 Sep 2018, 1 commit
    • P
      Allow to add motion to unique-ify the path in create_unique_path(). (#5589) · e9fe4224
      Authored by Paul Guo
      create_unique_path() can be used to convert a semi join into an inner join.
      Previously, during the semi-join refactoring in commit d4ce0921, creating a unique
      path was disabled for the case where duplicates might be on different QEs.

      In this patch we enable adding a motion to unique-ify the path, but only if the
      unique method is not UNIQUE_PATH_NOOP. We don't create a unique path in that case
      because later on, during plan creation, it is possible to create a motion
      above this unique path whose subpath is a motion. In that case, the unique path
      node would be ignored and we would get a motion plan node above a motion plan
      node, which is bad. We could improve that further, but not in this patch.
      Co-authored-by: Alexandra Wang <lewang@pivotal.io>
      Co-authored-by: Paul Guo <paulguo@gmail.com>
      e9fe4224
  18. 21 Sep 2018, 1 commit
    • H
      Remove duplicated code to handle SeqScan, AppendOnlyScan and AOCSScan. · ff8161a2
      Authored by Heikki Linnakangas
      They were all treated the same, with the SeqScan code being duplicated
      for AppendOnlyScans and AOCSScans. That is a merge hazard: if some code
      is changed for SeqScans, we would have to remember to manually update
      the other copies. Small differences in the code had already crept in,
      although given that everything worked, I guess it had no effect. Or
      only had a small effect on the computed costs.
      
      To avoid the duplication, use SeqScan for all of them. Also get rid of
      TableScan as a separate node type, and have ORCA translator also create
      SeqScans.
      
      The executor for SeqScan node can handle heap, AO and AOCS tables, because
      we're not actually using the upstream SeqScan code for it. We're using the
      GPDB code in nodeTableScan.c, and a TableScanState, rather than
      SeqScanState, as the executor node. That's how it worked before this patch
      already, what this patch changes is that we now use SeqScan *before* the
      executor phase, instead of SeqScan/AppendOnlyScan/AOCSScan/TableScan.
      
      To avoid having to change all the expected outputs for tests that use
      EXPLAIN, add code to still print the SeqScan as "Seq Scan", "Table Scan",
      "Append-only Scan" or "Append-only Columnar Scan", depending on whether
      the plan was generated by ORCA, and what kind of a table it is.
      ff8161a2
  19. 19 Sep 2018, 1 commit
    • H
      Fix "could not find pathkey item to sort" error with MergeAppend plans. · 1722adb8
      Authored by Heikki Linnakangas
      When building a Sort node to represent the ordering that is preserved
      by a Motion node, in make_motion(), the call to make_sort_from_pathkeys()
      would sometimes fail with "could not find pathkey item to sort". This
      happened when the ordering was over a UNION ALL operation. When building
      Motion nodes for MergeAppend subpaths, the path keys that represented the
      ordering referred to the items in the append rel's target list, not the
      subpaths. In create_merge_append_plan(), where we do a similar thing for
      each subpath, we correctly passed the 'relids' argument to
      prepare_sort_from_pathkeys(), so that prepare_sort_from_pathkeys() can
      match the target list entries of the append relation with the entries of
      the subpaths. But when creating the Motion nodes for each subpath, we
      were passing NULL as 'relids' (via make_sort_from_pathkeys()).
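      As a hedged illustration only (hypothetical tables; the real regression query lives in 'olap_group'), the failing plans involved an ordered result over a UNION ALL, roughly of this shape:

      ```sql
      -- A sorted Motion over a MergeAppend of UNION ALL subpaths could hit
      -- "could not find pathkey item to sort" before this fix.
      SELECT a, count(*)
      FROM (SELECT a FROM t_part1
            UNION ALL
            SELECT a FROM t_part2) u
      GROUP BY a
      ORDER BY a;
      ```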
      
      At a high level, the fix is straightforward: we need to pass the correct
      'relids' argument to prepare_sort_from_pathkeys(), in
      cdbpathtoplan_create_motion_plan(). However, the current code structure
      makes that not so straightforward, so this required some refactoring of
      the make_motion() and related functions:
      
      Previously, make_motion() and make_sorted_union_motion() would take a path
      key list as argument, to represent the ordering, and it called
      make_sort_from_pathkeys() to extract the sort columns, operators etc.
      After this patch, those functions take arrays of sort columns, operators,
      etc. directly as arguments, and the caller is expected to do the call to
      make_sort_from_pathkeys() to get them, or build them through some other
      means. In cdbpathtoplan_create_motion_plan(), call
      prepare_sort_from_pathkeys() directly, rather than the
      make_sort_from_pathkeys() wrapper, so that we can pass the 'relids'
      argument. Because prepare_sort_from_pathkeys() is marked as 'static', move
      cdbpathtoplan_create_motion_plan() from cdbpathtoplan.c to createplan.c,
      so that it can call it.
      
      Add test case. It's a slightly reduced version of a query that we already
      had in 'olap_group' test, but seems better to be explicit. Revert the
      change in expected output of 'olap_group', made in commit 28087f4e,
      which memorized the error in the expected output.
      
      Fixes https://github.com/greenplum-db/gpdb/issues/5695.
      Reviewed-by: Pengzhou Tang <ptang@pivotal.io>
      Reviewed-by: Melanie Plageman <mplageman@pivotal.io>
      1722adb8
  20. 29 Aug 2018, 1 commit
  21. 24 Aug 2018, 1 commit
    • J
      Fix redistribute bug on some types which need to convert (#5568) · b0fbb5c7
      Authored by Jinbao Chen
      After the 8.4 merge, we have two restriction lists, 'mergeclause_list'
      and 'hashclause_list', in the function 'add_paths_to_joinrel'. We
      use mergeclause_list for the cdb motion in hash joins. But some of the
      keys should not be used as distribution keys.

      Add a whitelist of which operators are distribution-compatible.
      b0fbb5c7
  22. 15 Aug 2018, 1 commit
  23. 13 Aug 2018, 1 commit
    • X
      Remove cdbpath_rows function · b2411b59
      Authored by xiong-gang
      Replace the function `cdbpath_rows(root, path)` with path->rows; this is more in
      line with upstream 9.2 and thus removes a GPDB_92_MERGE_FIXME.

      Co-authored-by: Alexandra Wang <lewang@pivotal.io>
      Co-authored-by: Gang Xiong <gxiong@pivotal.io>
      b2411b59
  24. 03 Aug 2018, 1 commit
  25. 02 Aug 2018, 1 commit
    • R
      Merge with PostgreSQL 9.2beta2. · 4750e1b6
      Authored by Richard Guo
      This is the final batch of commits from PostgreSQL 9.2 development,
      up to the point where the REL9_2_STABLE branch was created, and 9.3
      development started on the PostgreSQL master branch.
      
      Notable upstream changes:
      
      * Index-only scan was included in the batch of upstream commits. It
        allows queries to retrieve data only from indexes, avoiding heap access.
      
      * Group commit was added to work effectively under heavy load. Previously,
        batching of commits became ineffective as the write workload increased,
        because of internal lock contention.
      
      * A new fast-path lock mechanism was added to reduce the overhead of
        taking and releasing certain types of locks which are taken and released
        very frequently but rarely conflict.
      
      * The new "parameterized path" mechanism was added. It allows inner index
        scans to use values from relations that are more than one join level up
        from the scan. This can greatly improve performance in situations where
        semantic restrictions (such as outer joins) limit the allowed join orderings.
      
      * SP-GiST (Space-Partitioned GiST) index access method was added to support
        unbalanced partitioned search structures. For suitable problems, SP-GiST can
        be faster than GiST in both index build time and search time.
      
      * Checkpoints now are performed by a dedicated background process. Formerly
        the background writer did both dirty-page writing and checkpointing. Separating
        this into two processes allows each goal to be accomplished more predictably.
      
      * Custom plan was supported for specific parameter values even when using
        prepared statements.
      
      * API for FDW was improved to provide multiple access "paths" for their tables,
        allowing more flexibility in join planning.
      
      * Security_barrier option was added for views to prevent optimizations that
        might allow view-protected data to be exposed to users.
      
      * Range data type was added to store a lower and upper bound belonging to its
        base data type.
      
      * CTAS (CREATE TABLE AS/SELECT INTO) is now treated as a utility statement. The
        SELECT query is planned during the execution of the utility. To conform to
        this change, GPDB executes the utility statement only on QD and dispatches
        the plan of the SELECT query to QEs.
      Co-authored-by: Adam Lee <ali@pivotal.io>
      Co-authored-by: Alexandra Wang <lewang@pivotal.io>
      Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>
      Co-authored-by: Asim R P <apraveen@pivotal.io>
      Co-authored-by: Daniel Gustafsson <dgustafsson@pivotal.io>
      Co-authored-by: Gang Xiong <gxiong@pivotal.io>
      Co-authored-by: Haozhou Wang <hawang@pivotal.io>
      Co-authored-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      Co-authored-by: Jesse Zhang <sbjesse@gmail.com>
      Co-authored-by: Jinbao Chen <jinchen@pivotal.io>
      Co-authored-by: Joao Pereira <jdealmeidapereira@pivotal.io>
      Co-authored-by: Melanie Plageman <mplageman@pivotal.io>
      Co-authored-by: Paul Guo <paulguo@gmail.com>
      Co-authored-by: Richard Guo <guofenglinux@gmail.com>
      Co-authored-by: Shujie Zhang <shzhang@pivotal.io>
      Co-authored-by: Taylor Vesely <tvesely@pivotal.io>
      Co-authored-by: Zhenghua Lyu <zlv@pivotal.io>
      4750e1b6
  26. 09 Jul 2018, 1 commit
  27. 29 Mar 2018, 1 commit
    • P
      Support replicated table in GPDB · 7efe3204
      Authored by Pengzhou Tang
      * Support replicated table in GPDB

      Currently, tables in GPDB are distributed across all segments by hash or randomly. There
      is a requirement to introduce a new table type, called a replicated table, where all
      segments have a full duplicate of the table's data.
      
      To implement it, we added a new distribution policy named POLICYTYPE_REPLICATED to mark
      a replicated table, and a new locus type named CdbLocusType_SegmentGeneral to specify
      the distribution of the tuples of a replicated table.  CdbLocusType_SegmentGeneral implies
      the data is generally available on all segments but not available on the qDisp, so a plan
      node with this locus type can be flexibly planned to execute on either a single QE or all
      QEs. It is similar to CdbLocusType_General; the only difference is that a
      CdbLocusType_SegmentGeneral node can't be executed on the qDisp. To guarantee this, we try
      our best to add a gather motion on top of a CdbLocusType_SegmentGeneral node when planning
      motion for a join, even when the other rel has a bottleneck locus type. One problem is that
      such a motion may be redundant if the single QE is not ultimately promoted to execute on the
      qDisp, so we need to detect that case and omit the redundant motion at the end of
      apply_motion(). We don't reuse CdbLocusType_Replicated since it always implies a broadcast
      motion below it, and it is not easy to plan such a node as direct dispatch to avoid getting
      duplicate data.

      We don't support replicated tables with inherit/partition-by clauses yet; the main problem
      is that update/delete on multiple result relations can't work correctly at the moment. We
      can fix this later.
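      As a hedged illustration (assuming the DISTRIBUTED REPLICATED syntax added for this feature; the table name is hypothetical):

      ```sql
      -- Every segment stores a full copy of the table's data.
      CREATE TABLE config_lookup (key text, value text) DISTRIBUTED REPLICATED;

      -- The new policytype column should mark the replicated policy ('r' assumed here).
      SELECT localoid::regclass, policytype
          FROM gp_distribution_policy
          WHERE localoid = 'config_lookup'::regclass;
      ```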
      
      * Allow spi_* to access replicated tables on QEs

      Previously, GPDB didn't allow a QE to access non-catalog tables because the
      data is incomplete; we can remove this limitation now if it only accesses
      replicated tables.

      One problem is that a QE needs to know whether a table is a replicated table.
      Previously, QEs didn't maintain the gp_distribution_policy catalog, so we need
      to pass the policy info to QEs for replicated tables.
      
      * Change the schema of gp_distribution_policy to identify replicated tables

      Previously, we used a magic number, -128, in the gp_distribution_policy table
      to identify a replicated table, which was quite a hack, so we add a new column
      to gp_distribution_policy to identify replicated tables and partitioned
      tables.

      This commit also abandons the old way of using a 1-length NULL list and a
      2-length NULL list to identify the DISTRIBUTED RANDOMLY and DISTRIBUTED
      FULLY clauses.

      Besides, this commit refactors the code to make the decision-making around
      distribution policies clearer.
      
      * Support COPY for replicated tables

      * Disable the row ctid unique path for replicated tables.
        Previously, GPDB used a special Unique path on rowid to address queries
        like "x IN (subquery)". For example, for
        select * from t1 where t1.c2 in (select c2 from t3), the plan looks
        like:
         ->  HashAggregate
               Group By: t1.ctid, t1.gp_segment_id
                  ->  Hash Join
                        Hash Cond: t2.c2 = t1.c2
                      ->  Seq Scan on t2
                      ->  Hash
                          ->  Seq Scan on t1
      
        Obviously, the plan is wrong if t1 is a replicated table, because ctid
        + gp_segment_id can't identify a tuple: in a replicated table, a logical
        row may have a different ctid and gp_segment_id on each segment. So we
        disable such plans for replicated tables temporarily. This is not the best
        approach, because the rowid unique path may be cheaper than a normal hash
        semi join, so we left a FIXME for later optimization.
      
      * ORCA related fix
        Reported and added by Bhuvnesh Chaudhary <bchaudhary@pivotal.io>
        Fall back to the legacy query optimizer for queries over replicated tables.

      * Adapt pg_dump/gpcheckcat to replicated tables
        gp_distribution_policy is no longer a master-only catalog, so do the
        same checks as for other catalogs.

      * Support gpexpand on replicated tables and altering the distribution policy of replicated tables
      7efe3204
  28. 09 Mar 2018, 1 commit
    • H
      Whitespace and formatting fixes. · ff940ddc
      Authored by Heikki Linnakangas
      The immediate reason to do this was the "this ‘else’ clause does not guard"
      gcc warning from create_mergejoin_path(). But while we're at it, might as
      well clean up the whole file.
      
      I spotted one piece of code that looks broken, marked that with a FIXME to
      make sure we revisit that.
      ff940ddc
  29. 08 Mar 2018, 1 commit
    • H
      Allow using a merge join for a dummy FULL JOIN ON TRUE. · a61bf5e0
      Authored by Heikki Linnakangas
      Like in the 'join' regression test:
      
      postgres=# select * from int4_tbl a full join int4_tbl b on true;
      ERROR:  Query requires a feature that has been disabled by a configuration setting.
      DETAIL:  Could not devise a query plan for the given query.
      HINT:  Current settings:  optimizer=off
      a61bf5e0
  30. 06 Mar 2018, 1 commit
  31. 09 Feb 2018, 1 commit
    • H
      Refactor the way Semi-Joins plans are constructed. · d4ce0921
      Authored by Heikki Linnakangas
      This removes much of the GPDB machinery to handle "deduplication paths"
      within the planner. We will now use the upstream code to build JOIN_SEMI
      paths, as well as paths where the outer side of the join is first
      deduplicated (JOIN_UNIQUE_OUTER/INNER).
      
      The old style "join first and deduplicate later" plans can be better in
      some cases, however. To still be able to generate such plan, add new
      JOIN_DEDUP_SEMI join type, which is transformed into JOIN_INNER followed
      by the deduplication step after the join, during planning.
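      As a hedged illustration only (hypothetical tables), this is the general shape of query these join strategies apply to:

      ```sql
      -- An IN/EXISTS sublink that the planner may turn into a JOIN_SEMI,
      -- a unique-ified inner/outer side, or (with JOIN_DEDUP_SEMI) an
      -- inner join followed by deduplication.
      SELECT *
      FROM orders o
      WHERE o.customer_id IN (SELECT customer_id FROM vip_customers);
      ```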
      
      This new way of constructing these plans is simpler, and allows removing
      a bunch of code, and reverting some more code to the way it is in the
      upstream.
      
      I'm not sure if this can generate the same plans that the old code could,
      in all cases. In particular, I think the old "late deduplication"
      mechanism could delay the deduplication further, all the way to the top of
      the join tree. I'm not sure when that would be useful, though, and the
      regression suite doesn't seem to contain any such cases (with EXPLAIN). Or
      maybe I misunderstood the old code. In any case, I think this is good
      enough.
      d4ce0921
  32. 24 Jan 2018, 1 commit
    • T
      Teach reparameterize_path() to handle AppendPaths. · 54e1599c
      Authored by Tom Lane
      If we're inside a lateral subquery, there may be no unparameterized paths
      for a particular child relation of an appendrel, in which case we *must*
      be able to create similarly-parameterized paths for each other child
      relation, else the planner will fail with "could not devise a query plan
      for the given query".  This means that there are situations where we'd
      better be able to reparameterize at least one path for each child.
      
      This calls into question the assumption in reparameterize_path() that
      it can just punt if it feels like it.  However, the only case that is
      known broken right now is where the child is itself an appendrel so that
      all its paths are AppendPaths.  (I think possibly I disregarded that in
      the original coding on the theory that nested appendrels would get folded
      together --- but that only happens *after* reparameterize_path(), so it's
      not excused from handling a child AppendPath.)  Given that this code's been
      like this since 9.3 when LATERAL was introduced, it seems likely we'd have
      heard of other cases by now if there were a larger problem.
      
      Per report from Elvis Pranskevichus.  Back-patch to 9.3.
      
      Discussion: https://postgr.es/m/5981018.zdth1YWmNy@hammer.magicstack.net
      54e1599c
  33. 27 Sep 2017, 7 commits
    • E
      Don't assume a subquery's output is unique if there's a SRF in its tlist · e7ff3ef1
      Authored by Ekta Khanna and Jemish Patel
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Tue Jul 8 14:03:32 2014 -0400
      
          While the x output of "select x from t group by x" can be presumed unique,
          this does not hold for "select x, generate_series(1,10) from t group by x",
          because we may expand the set-returning function after the grouping step.
          (Perhaps that should be re-thought; but considering all the other oddities
          involved with SRFs in targetlists, it seems unlikely we'll change it.)
          Put a check in query_is_distinct_for() so it's not fooled by such cases.
      
          Back-patch to all supported branches.
      
          David Rowley
      
      (cherry picked from commit 2e7469dc8b3bac4fe0f9bd042aaf802132efde85)
      e7ff3ef1
    • D
      Rename all 8.4-9.0 merge FIXMEs as `GPDB_84_MERGE_FIXME` · 2228c939
      Authored by Dhanashree Kashid, Ekta Khanna and Omer Arap
      We had a bunch of FIXMEs that we added as part of the subselect merge.
      All of those FIXMEs are now marked as `GPDB_84_MERGE_FIXME` so that they can
      be grepped easily.
      2228c939
    • D
      Implement CDB like pre-join deduplication · efb2777a
      Authored by Dhanashree Kashid, Ekta Khanna and Omer Arap
      For flattened IN or EXISTS sublinks, if we chose INNER JOIN path instead
      of SEMI JOIN then we need to apply duplicate suppression.
      
      The deduplication can be done in two ways:
      1. post-join dedup
      unique-ify the inner join results. try_postjoin_dedup in CdbRelDedupInfo denotes
      whether we need to go for post-join dedup.
      
      2. pre-join dedup
      unique-ify the rows coming from the rel containing the subquery result,
      before that is joined with any other rels. join_unique_ininfo in
      CdbRelDedupInfo denotes if we need to go for pre-join dedup.
      semi_operators and semi_rhs_exprs are used for this. We ported a
      function from 9.5 to compute these in make_outerjoininfo().
      
      Upstream has a completely different implementation of this. Upstream explores JOIN_UNIQUE_INNER
      and JOIN_UNIQUE_OUTER paths for this, and deduplication is done in create_unique_path().
      GPDB does this differently since JOIN_UNIQUE_INNER and JOIN_UNIQUE_OUTER are obsolete
      for us. Hence we have kept the GPDB-style deduplication mechanism as is in this merge.

      Post-join dedup has been implemented in previous merge commits.
      
      Ref [#146890743]
      efb2777a
    • S
      CDB Specific changes, other fix-ups after merging e549722a · e5f6e826
      Authored by Shreedhar Hardikar
      0. Fix up post-join dedup logic after the cherry-pick
      0. Fix pull_up_sublinks_jointree_recurse returning garbage relids
      0. Update the gporca, rangefuncs, and eagerfree answer files
      	1. gporca
      	Previously we were generating a Hash Inner Join with a
      	HashAggregate for deduplication. Now we generate a Hash
      	Semi Join, in which case we do not need to deduplicate the
      	inner side.
      
      	2. rangefuncs
      	We updated this answer file during the cherry-pick of
      	e006a24a since there was a change in plan.
      	After these cherry-picks, we are back to the original
      	plan as master. Hence we see the original error.
      
      	3. eagerfree
      	We are generating a not-very-useful subquery scan node
      	with this change. This is not producing wrong results,
      	but this subquery scan needs to be removed.
      	We will file a follow-up chore to investigate and fix this.
      
      0. We no longer need the helper function `hasSemiJoin()` to check whether
      the SpecialJoinInfo list has any SpecialJoinInfos constructed for a Semi Join
      (IN/EXISTS sublink). We have moved that check inside
      `cdb_set_cheapest_dedup()`.
      
      0. We are not exercising the pre-join-deduplication code path after
      this cherry-pick. Before this merge, we had three CDB-specific
      fields in `InClauseInfo` in which we recorded information for
      pre-join dedup in the case of simple uncorrelated IN sublinks:
      `try_join_unique`, `sub_targetlist` and `InOperators`.
      Since we now have `SpecialJoinInfo` instead of `InClauseInfo`, we need
      to devise a way to record this information in `SpecialJoinInfo`.
      We have filed a follow-up story for this.
      
      Ref [#142356521]
      Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
      e5f6e826
    • E
      Remove InClauseInfo and OuterJoinInfo · 8b63aafb
      Authored by Ekta Khanna
      Since `InClauseInfo` and `OuterJoinInfo` are now combined into
      `SpecialJoinInfo` after merging with e006a24a, this commit removes them
      from the relevant places.
      
      Access `join_info_list` instead of `in_info_list` and `oj_info_list`
      
      Previously, `CdbRelDedupInfo` contained a list of `InClauseInfo`s. While
      making join decisions and during overall join processing, we traversed this list
      and invoked cdb-specific functions: `cdb_make_rel_dedup_info()`, `cdbpath_dedup_fixup()`.

      Since `InClauseInfo` is no longer available, `CdbRelDedupInfo` will contain a list of
      `SpecialJoinInfo`s. All the cdb-specific routines which were previously called for
      the `InClauseInfo` list will now be called if `CdbRelDedupInfo` has a valid `SpecialJoinInfo`
      list and if the join type in the `SpecialJoinInfo` is `JOIN_SEMI`. A new helper routine `hasSemiJoin()`
      has been added which traverses the `SpecialJoinInfo` list to check if it contains `JOIN_SEMI`.
      
      Ref [#142355175]
      Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
      8b63aafb
    • E
      CDBlize the cherry-pick e006a24a · 0feb1bd9
      Authored by Ekta Khanna
      Original Flow:
      cdb_flatten_sublinks
      	+--> pull_up_IN_clauses
      		+--> convert_sublink_to_join
      
      New Flow:
      cdb_flatten_sublinks
      	+--> pull_up_sublinks
      
      This commit contains relevant changes for the above flow.
      
      Previously, `try_join_unique` was part of `InClauseInfo`. It was getting
      set in `convert_IN_to_join()` and used in `cdb_make_rel_dedup_info()`.
      Now, since `InClauseInfo` is not present, we construct a
      `FlattenedSublink` instead in `convert_ANY_sublink_to_join()`, and later
      in the flow we construct a `SpecialJoinInfo` from the `FlattenedSublink` in
      `deconstruct_sublink_quals_to_rel()`. Hence, `try_join_unique` is added as
      part of both `FlattenedSublink` and `SpecialJoinInfo`.
      
      Ref [#142355175]
      Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
      0feb1bd9
    • E
      Implement SEMI and ANTI joins in the planner and executor. · fe2eb2c9
      Authored by Ekta Khanna
      commit e006a24a
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Thu Aug 14 18:48:00 2008 +0000
      
          Implement SEMI and ANTI joins in the planner and executor.  (Semijoins replace
          the old JOIN_IN code, but antijoins are new functionality.)  Teach the planner
          to convert appropriate EXISTS and NOT EXISTS subqueries into semi and anti
          joins respectively.  Also, LEFT JOINs with suitable upper-level IS NULL
          filters are recognized as being anti joins.  Unify the InClauseInfo and
          OuterJoinInfo infrastructure into "SpecialJoinInfo".  With that change,
          it becomes possible to associate a SpecialJoinInfo with every join attempt,
          which permits some cleanup of join selectivity estimation.  That needs to be
          taken much further than this patch does, but the next step is to change the
          API for oprjoin selectivity functions, which seems like material for a
          separate patch.  So for the moment the output size estimates for semi and
          especially anti joins are quite bogus.
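      As a hedged illustration of the sublink shapes this converts (tables are hypothetical):

      ```sql
      -- EXISTS is converted into a semi join:
      SELECT c.name
      FROM customers c
      WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id);

      -- NOT EXISTS is converted into an anti join:
      SELECT c.name
      FROM customers c
      WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id);
      ```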
      
      Ref [#142355175]
      Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
      fe2eb2c9