1. 14 Mar 2019 (2 commits)
  2. 01 Feb 2019 (1 commit)
    • Use normal hash operator classes for data distribution. · 242783ae
      Committed by Heikki Linnakangas
      Replace the use of the built-in hashing support for built-in datatypes, in
      cdbhash.c, with the normal PostgreSQL hash functions. Now is a good time
      to do this, since we've already made the change to use jump consistent
      hashing in GPDB 6, so we'll need to deal with the upgrade problems
      associated with changing the hash functions, anyway.
      
      It is no longer enough to track which columns/expressions are used to
      distribute data. You also need to know the hash function used. For that,
      a new field is added to gp_distribution_policy, to record the hash
      operator class used for each distribution key column. In the planner,
      a new opfamily field is added to DistributionKey, to track that throughout
      the planning.
      
      Normally, if you do "CREATE TABLE ... DISTRIBUTED BY (column)", the
      default hash operator class for the datatype is used. But this patch
      extends the syntax so that you can specify the operator class explicitly,
      like "... DISTRIBUTED BY (column opclass)". This is similar to how an
      operator class can be specified for each column in CREATE INDEX.
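      For example, a minimal sketch (table and column names are illustrative;
      cdbhash_int4_ops is the legacy opclass described below):

      ```
      -- Default: c1 uses the default hash opclass for its datatype.
      CREATE TABLE t1 (c1 int, c2 text) DISTRIBUTED BY (c1);

      -- Explicit: name the hash operator class, as in CREATE INDEX.
      CREATE TABLE t2 (c1 int, c2 text) DISTRIBUTED BY (c1 cdbhash_int4_ops);
      ```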
      
      To support upgrade, the old hash functions have been converted to special
      (non-default) operator classes, named cdbhash_*_ops. For example, if you
      want to use the old hash function for an integer column, you could do
      "DISTRIBUTED BY (intcol cdbhash_int4_ops)". The old hard-coded whitelist
      of operators that have "compatible" cdbhash functions has been replaced
      by putting the compatible hash opclasses in the same operator family. For
      example, the legacy integer operator classes cdbhash_int2_ops,
      cdbhash_int4_ops and cdbhash_int8_ops are all part of the
      cdbhash_integer_ops operator family.
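      A hedged way to see that grouping from the catalogs (a sketch using the
      standard pg_opclass/pg_opfamily catalogs; the family name is from the
      text above):

      ```
      -- The legacy integer opclasses all belong to one operator family,
      -- so joins between legacy int2/int4/int8 keys can still co-locate:
      SELECT opf.opfname AS opfamily, opc.opcname AS opclass
      FROM pg_opclass opc
      JOIN pg_opfamily opf ON opf.oid = opc.opcfamily
      WHERE opf.opfname = 'cdbhash_integer_ops';
      ```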
      
      This removes the pg_database.hashmethod field. The hash method is now
      tracked on a per-table and per-column basis, using the opclasses, so it's
      not needed anymore.
      
      To help with upgrade from GPDB 5, this introduces a new GUC called
      'gp_use_legacy_hashops'. If it's set, CREATE TABLE uses the legacy hash
      opclasses, instead of the default hash opclasses, if the opclass is not
      specified explicitly. pg_upgrade will set the new GUC, to force the use of
      legacy hashops, when restoring the schema dump. It will also set the GUC
      on all upgraded databases, as a per-database option, so any new tables
      created after upgrade will also use the legacy opclasses. It seems better
      to be consistent after upgrade, so that collocation between old and new
      tables works, for example. The idea is that some time after the upgrade,
      the admin can reorganize all tables to use the default opclasses instead.
      At that point, they should also clear the GUC on the converted databases.
      (Or rather, the automated tool that hasn't been written yet should do
      that.)
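      A sketch of that workflow (database and table names are illustrative;
      ALTER TABLE ... SET WITH (REORGANIZE=true) is the usual GPDB
      redistribution syntax):

      ```
      -- pg_upgrade sets the GUC per database, so new tables stay legacy:
      ALTER DATABASE mydb SET gp_use_legacy_hashops = on;

      -- Later: convert a table to the default opclasses (with the GUC off
      -- in this session, so the plain syntax picks the defaults)...
      SET gp_use_legacy_hashops = off;
      ALTER TABLE t2 SET WITH (REORGANIZE = true) DISTRIBUTED BY (c1);

      -- ...and once everything is converted, clear the per-database GUC.
      ALTER DATABASE mydb RESET gp_use_legacy_hashops;
      ```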
      
      ORCA doesn't know about hash operator classes, or the possibility that we
      might need to use a different hash function for two columns with the same
      datatype. Therefore, it cannot produce correct plans for queries that mix
      different distribution hash opclasses for the same datatype, in the same
      query. There are checks in the Query->DXL translation, to detect that
      case, and fall back to the planner. As long as you stick to the default
      opclasses in all tables, we let ORCA create the plan without any regard
      to them, and use the default opclasses when translating the DXL plan to a
      Plan tree. We also allow the case that all tables in the query use the
      "legacy" opclasses, so that ORCA works after pg_upgrade. But a mix of the
      two, or using any non-default opclasses, forces ORCA to fall back.
      
      One curiosity with this is the "int2vector" and "aclitem" datatypes. They
      have a hash opclass, but no b-tree operators. GPDB 4 used to allow them
      as DISTRIBUTED BY columns, but we forbade that in GPDB 5, in commit
      56e7c16b. Now they are allowed again, so you can specify an int2vector
      or aclitem column in DISTRIBUTED BY, but it's still pretty useless,
      because the planner still can't form EquivalenceClasses on it, and will
      treat it as "strewn" distribution, and won't co-locate joins.
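      For instance (a sketch; the table name is illustrative):

      ```
      -- Allowed again, but of limited use: without b-tree operators the
      -- planner can't form EquivalenceClasses on k, treats the data as
      -- "strewn", and won't co-locate joins on it.
      CREATE TABLE t_vec (k int2vector, v int) DISTRIBUTED BY (k);
      ```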
      
      The abstime, reltime and tinterval datatypes don't have default hash
      opclasses. They are being removed completely in PostgreSQL v12, and users
      shouldn't be using them in the first place, so instead of adding hash
      opclasses for them now, we accept that they can't be used as distribution
      key columns anymore. Add a check to pg_upgrade, to refuse the upgrade if
      they are used as distribution keys in the old cluster. Do the same for
      the 'money' datatype as well, although that is not being removed
      upstream.
      
      The legacy hashing code for anyarray in GPDB 5 was actually broken. It
      could produce a different hash value for two arrays that are considered
      equal, according to the = operator, if there were differences in e.g.
      whether the null bitmap was stored or not. Add a check to pg_upgrade, to
      reject the upgrade if array types were used as distribution keys. The
      upstream hash opclass for anyarray works, though, so it is OK to use
      arrays as distribution keys in new tables. We just don't support binary
      upgrading them from GPDB 5. (See github issue
      https://github.com/greenplum-db/gpdb/issues/5467). The legacy hashing of
      'anyrange' had the same problem, but that was new in GPDB 6, so we don't
      need a pg_upgrade check for that.
      
      This also tightens the checks in ALTER TABLE ALTER COLUMN and CREATE
      UNIQUE INDEX, so that you can no longer create a situation where a
      non-hashable column becomes the distribution key. (Fixes github issue
      https://github.com/greenplum-db/gpdb/issues/6317)
      
      Discussion: https://groups.google.com/a/greenplum.org/forum/#!topic/gpdb-dev/4fZVeOpXllQ
      Co-authored-by: Mel Kiyama <mkiyama@pivotal.io>
      Co-authored-by: Abhijit Subramanya <asubramanya@pivotal.io>
      Co-authored-by: Pengzhou Tang <ptang@pivotal.io>
      Co-authored-by: Chris Hajas <chajas@pivotal.io>
      Reviewed-by: Bhuvnesh Chaudhary <bchaudhary@pivotal.io>
      Reviewed-by: Ning Yu <nyu@pivotal.io>
      Reviewed-by: Simon Gao <sgao@pivotal.io>
      Reviewed-by: Jesse Zhang <jzhang@pivotal.io>
      Reviewed-by: Zhenghua Lyu <zlv@pivotal.io>
      Reviewed-by: Melanie Plageman <mplageman@pivotal.io>
      Reviewed-by: Yandong Yao <yyao@pivotal.io>
  3. 09 Jan 2019 (1 commit)
    • Do not enforce join ordering for ANTI and LASJ. (#6625) · 29daab51
      Committed by Richard Guo
      The following identity holds true:
      
      	(A antijoin B on (Pab)) innerjoin C on (Pac)
          	= (A innerjoin C on (Pac)) antijoin B on (Pab)
      
      So we should not enforce join ordering for ANTI. Instead we need to
      collapse ANTI join nodes so that they participate fully in the join
      order search.
      
      For example:
      
      	select * from a join b on a.i = b.i where
      		not exists (select i from c where a.i = c.i);
      
      For this query, the original join order is "(a innerjoin b) antijoin c".
      If we enforce ANTI join ordering, this will be the final join order. But
      another join order, "(a antijoin c) innerjoin b", is also legal. We
      should take this order into consideration and pick the cheaper one.
      
      LASJ is handled the same way as ANTI joins.
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      Reviewed-by: Melanie Plageman <mplageman@pivotal.io>
  4. 07 Jan 2019 (1 commit)
    • Refactor executor Squelch and EagerFree functions. · 6195b967
      Committed by Heikki Linnakangas
      Merge the two concepts, squelching and eager-freeing. There is now only
      one function, ExecSquelchNode(), that you can call. It recurses to
      children, as before, but it now also performs eager-freeing of resources.
      Previously that was done as a separate call, but that was an unnecessary
      distinction, because all callers of ExecSquelchNode() also called
      ExecEagerFree().
      
      The concept of eager-freeing still lives on, as ExecEagerFree*() functions
      specific to many node types. But it no longer recurses! The pattern is
      that ExecSquelchNode() always performs eager freeing of the node, and also
      recurses. In addition to that, some node types also call the node-specific
      EagerFree function of the same node, after reaching the end of tuples.
      
      This makes it more clear which function should be called when.
      
      ExecEagerWalker() used to have special handling for the pattern of:
        Result
        -> Material
          -> Broadcast Motion
      
      I tried removing that, but then I started to get "illegal rescan of motion
      node" errors in the regression tests, from queries with a LIMIT node in a
      subquery. Upon closer look, I believe that was because the Limit node was
      calling ExecSquelchNode() on the input, even though the Limit node was
      marked as rescannable. To fix that, I added delayEagerFree logic to the
      Limit node, to not call ExecSquelchNode() when the node might get
      rescanned later.
      
      The planstate_walk_kids() code did not know how to recurse into the
      children of a MergeAppend node. We missed adding that logic, when we
      merged the MergeAppend node type from upstream, in 9.1. We don't use that
      mechanism for recursing in ExecSquelchNode() anymore, but that probably
      should be fixed, anyway, as a separate commit later.
      
      Fixes https://github.com/greenplum-db/gpdb/issues/6602 and
      https://github.com/greenplum-db/gpdb/issues/6074.
      Reviewed-by: Tang Pengzhou <ptang@pivotal.io>
  5. 29 Oct 2018 (1 commit)
    • Simplify direct dispatch related code (#6080) · 576690f2
      Committed by Tang Pengzhou
      * Simplify direct dispatch related code
      
      This commit includes two parts:
      * simplify direct-dispatch dispatching code
      * simplify direct-dispatch DTM related code
      
      Previously, cdbdisp_dispatchToGang needed CdbDispatchDirectDesc info;
      now a gang only contains the in-use segments, so the direct-dispatch
      info is useless.

      Another thing is that we need to decide within dtmPreCommand whether
      DTM is available for direct dispatch. The logic is complex: you need to
      know whether the main plan is direct-dispatched and whether the init
      plan contains direct dispatch.
      
      One example is:
      "update foo set foo.c2 = 2
      where foo.c1 = 1 and exists (select * from bar where bar.c1=4)"

      The main plan can be direct-dispatched to segment 1 and the init plan
      to segment 2. With the old logic, DTM commands like PREPARE needed to
      be dispatched to all segments, so dtmPreCommand had to dispatch a DTM
      command named 'DTX_PROTOCOL_COMMAND_STAY_AT_OR_BECOME_IMPLIED' to every
      segment, so that segments like segment 3, which didn't receive the
      plan, could be ready for two-phase commit.
      
      With the new gang API, we can simplify this process: we add a list in
      currentGxact to record which segments actually get involved in a
      two-phase commit, and then we can dispatch DTM commands to them directly.
      This is also very useful for queries on tables that are not fully
      expanded yet.
      
      * support direct dispatch to more than one segment
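      A hedged sketch of the query shapes involved (assuming foo and bar are
      hash-distributed by c1; the IN example illustrates the
      more-than-one-segment case):

      ```
      -- Equality on the distribution key pins each plan to one segment;
      -- the two-phase commit now only involves the segments that took part.
      update foo set foo.c2 = 2
      where foo.c1 = 1 and exists (select * from bar where bar.c1 = 4);

      -- Multiple pinned values can now be direct-dispatched to just the
      -- segments that hold them, instead of the whole cluster.
      select * from foo where c1 in (1, 4);
      ```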
  6. 10 Oct 2018 (1 commit)
    • Enable ALL_SUBLINK pullup based on LHS input. · 996639e0
      Committed by Richard Guo
      This patch takes the LHS input into consideration when pulling up
      sublinks of type ALL_SUBLINK. As a result, more sublinks can be converted
      into joins, as in the query below:
      
      ```
      select * from A,B where exists (select * from C where C.j = A.j and B.i
          not in (select C.i from C where C.i != 10));
      ```
      
      This is only possible after commit eb919e8f, in which a set of
      cartesian-product joins would be generated as a last-ditch effort, if we failed
      to find any usable joins.
      Authored-by: Richard Guo <guofenglinux@gmail.com>
      Authored-by: Alexandra Wang <leiwangcheme@gmail.com>
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
  7. 25 Sep 2018 (1 commit)
    • Allow to add motion to unique-ify the path in create_unique_path(). (#5589) · e9fe4224
      Committed by Paul Guo
      create_unique_path() can be used to convert a semi join to an inner join.
      Previously, during the semi-join refactor in commit d4ce0921, creating a
      unique path was disabled for the case where duplicates might be on
      different QEs.

      In this patch we enable adding a motion to unique-ify the path, but only
      if the unique method is not UNIQUE_PATH_NOOP. We don't create a unique
      path in that case because, later on during plan creation, it would be
      possible to create a motion above this unique path whose subpath is
      already a motion; the unique path node would then be ignored and we would
      get a motion plan node directly above another motion plan node, which is
      bad. We could improve this further, but not in this patch.
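      A hedged example of the affected shape (table names are illustrative):

      ```
      -- A semi join the planner may convert to an inner join by
      -- unique-ifying the inner side; when duplicates of t2.j can sit on
      -- different QEs, a motion is now added before the deduplication.
      select * from t1 where t1.i in (select t2.j from t2);
      ```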
      Co-authored-by: Alexandra Wang <lewang@pivotal.io>
      Co-authored-by: Paul Guo <paulguo@gmail.com>
  8. 22 Sep 2018 (1 commit)
    • Change pretty-printing of expressions in EXPLAIN to match upstream. · 4c54c894
      Committed by Heikki Linnakangas
      We had changed this in GPDB, to print fewer parens. That's fine and
      dandy, but it hardly seems worth it to carry a diff vs. upstream for
      this. Which format is better is a matter of taste. The extra parens make
      some expressions clearer, but OTOH, they are unnecessarily verbose for
      simple expressions. Let's follow upstream on this.
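      A sketch of the difference in output (exact plans vary):

      ```
      EXPLAIN SELECT * FROM t WHERE a > 1 AND b < 2;
      -- old GPDB style:  Filter: a > 1 AND b < 2
      -- upstream style:  Filter: ((a > 1) AND (b < 2))
      ```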
      
      These changes were made to GPDB back in 2006, as part of backporting
      EXPLAIN-related patches from PostgreSQL 8.2. But I didn't see any
      explanation for this particular change in output in that commit message.
      
      It's nice to match upstream, to make merging easier. However, this won't
      make much difference to that: almost all EXPLAIN plans in regression
      tests are different from upstream anyway, because GPDB needs Motion nodes
      for most queries. But every little helps.
  9. 18 Sep 2018 (2 commits)
  10. 11 Sep 2018 (1 commit)
    • Make planner generate redistribute-motion. · a4cbf586
      Committed by ZhangJackey
      When doing an inner join, we test whether we can use a redistribute
      motion, in the function cdbpath_partkeys_from_preds. But if a_partkey is
      NIL (it is NIL at the beginning of the function), we append nothing to
      it, so the function can only return false. This meant the planner could
      only generate a broadcast motion for the inner relation.

      We fix this with the same logic as for an outer join.

      A WTS node is immovable; this commit adds some code to handle that.
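      A hedged example (assuming a and b are distributed on columns other than
      the join keys):

      ```
      -- Before the fix the inner relation could only be broadcast; now the
      -- planner can also consider redistributing both sides on the join key.
      select * from a inner join b on a.i = b.j;
      ```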
      Co-authored-by: Shujie Zhang <shzhang@pivotal.io>
      Co-authored-by: Zhenghua Lyu <zlv@pivotal.io>
  11. 07 Sep 2018 (1 commit)
  12. 31 Aug 2018 (1 commit)
    • Replace GPDB versions of some numeric aggregates with upstream's. · 325e6fcd
      Committed by Heikki Linnakangas
      Among other things, this fixes the inaccuracy of integer avg() and sum()
      functions. (i.e. fixes https://github.com/greenplum-db/gpdb/issues/5525)
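      A hedged sketch of the class of inaccuracy (values chosen so that a
      float-based accumulator would round; whether the old code used float8,
      and the exact old output, are assumptions based on the issue report):

      ```
      -- 2^53 + 1 is not exactly representable in float8, so a float-based
      -- transition state silently rounds; the upstream numeric/128-bit
      -- transition state returns the exact result.
      SELECT sum(x) FROM (VALUES (9007199254740993::int8), (1::int8)) t(x);
      ```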
      
      The upstream versions are from PostgreSQL 9.6, using the 128-bit math
      from the following commit:
      
      commit 959277a4
      Author: Andres Freund <andres@anarazel.de>
      Date:   Fri Mar 20 10:26:17 2015 +0100
      
          Use 128-bit math to accelerate some aggregation functions.
      
          On platforms where we support 128bit integers, use them to implement
          faster transition functions for sum(int8), avg(int8),
          var_*(int2/int4),stdev_*(int2/int4). Where not supported continue to use
          numeric as a transition type.
      
          In some synthetic benchmarks this has been shown to provide significant
          speedups.
      
          Bumps catversion.
      
          Discussion: 544BB5F1.50709@proxel.se
          Author: Andreas Karlsson
          Reviewed-By: Peter Geoghegan, Petr Jelinek, Andres Freund,
              Oskari Saarenmaa, David Rowley
  13. 18 Aug 2018 (1 commit)
  14. 03 Aug 2018 (1 commit)
  15. 02 Aug 2018 (1 commit)
    • Merge with PostgreSQL 9.2beta2. · 4750e1b6
      Committed by Richard Guo
      This is the final batch of commits from PostgreSQL 9.2 development,
      up to the point where the REL9_2_STABLE branch was created, and 9.3
      development started on the PostgreSQL master branch.
      
      Notable upstream changes:
      
      * Index-only scan was included in the batch of upstream commits. It
        allows queries to retrieve data only from indexes, avoiding heap access
        (see the sketch after this list).
      
      * Group commit was added to work effectively under heavy load. Previously,
        batching of commits became ineffective as the write workload increased,
        because of internal lock contention.
      
      * A new fast-path lock mechanism was added to reduce the overhead of
        taking and releasing certain types of locks which are taken and released
        very frequently but rarely conflict.
      
      * The new "parameterized path" mechanism was added. It allows inner index
        scans to use values from relations that are more than one join level up
        from the scan. This can greatly improve performance in situations where
        semantic restrictions (such as outer joins) limit the allowed join orderings.
      
      * SP-GiST (Space-Partitioned GiST) index access method was added to support
        unbalanced partitioned search structures. For suitable problems, SP-GiST can
        be faster than GiST in both index build time and search time.
      
      * Checkpoints are now performed by a dedicated background process. Formerly
        the background writer did both dirty-page writing and checkpointing.
        Separating this into two processes allows each goal to be accomplished
        more predictably.

      * Custom plans are now supported for specific parameter values, even when
        using prepared statements.

      * The FDW API was improved to let foreign data wrappers provide multiple
        access "paths" for their tables, allowing more flexibility in join
        planning.

      * The security_barrier option was added for views, to prevent
        optimizations that might allow view-protected data to be exposed to
        users.

      * Range data types were added, storing a lower and an upper bound
        belonging to a base data type.

      * CTAS (CREATE TABLE AS / SELECT INTO) is now treated as a utility
        statement. The SELECT query is planned during the execution of the
        utility statement. To conform to this change, GPDB executes the utility
        statement only on the QD and dispatches the plan of the SELECT query to
        the QEs.
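      A minimal sketch of index-only scan, the first item above (table and
      index names are illustrative):

      ```
      CREATE TABLE measurements (id int, reading float8);
      CREATE INDEX measurements_id_idx ON measurements (id);
      VACUUM measurements;  -- sets the visibility map that the scan relies on

      -- Can be answered from the index alone, without heap access:
      SELECT id FROM measurements WHERE id BETWEEN 10 AND 20;
      ```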
      Co-authored-by: Adam Lee <ali@pivotal.io>
      Co-authored-by: Alexandra Wang <lewang@pivotal.io>
      Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>
      Co-authored-by: Asim R P <apraveen@pivotal.io>
      Co-authored-by: Daniel Gustafsson <dgustafsson@pivotal.io>
      Co-authored-by: Gang Xiong <gxiong@pivotal.io>
      Co-authored-by: Haozhou Wang <hawang@pivotal.io>
      Co-authored-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      Co-authored-by: Jesse Zhang <sbjesse@gmail.com>
      Co-authored-by: Jinbao Chen <jinchen@pivotal.io>
      Co-authored-by: Joao Pereira <jdealmeidapereira@pivotal.io>
      Co-authored-by: Melanie Plageman <mplageman@pivotal.io>
      Co-authored-by: Paul Guo <paulguo@gmail.com>
      Co-authored-by: Richard Guo <guofenglinux@gmail.com>
      Co-authored-by: Shujie Zhang <shzhang@pivotal.io>
      Co-authored-by: Taylor Vesely <tvesely@pivotal.io>
      Co-authored-by: Zhenghua Lyu <zlv@pivotal.io>
  16. 09 Jul 2018 (1 commit)
  17. 11 Apr 2018 (1 commit)
    • Address GPDB_84_MERGE_FIXME in simplify_EXISTS_query() · 99450728
      Committed by Abhijit Subramanya
      This FIXME is two-fold:
      - Handling LIMIT 0
        The LIMIT is already handled in the caller,
        convert_EXISTS_sublink_to_join(): when an existential sublink contains
        an aggregate without GROUP BY or HAVING, we can safely replace it by a
        one-time TRUE/FALSE filter, based on the type of sublink, since the
        result of an aggregate is always going to be one row even if its input
        has zero rows. However, this assumption is incorrect when the sublink
        contains LIMIT/OFFSET, for example if the final limit count after
        applying the offset is 0.
      
      - Rules for demoting HAVING to WHERE
        Previously, simplify_EXISTS_query() allowed demoting HAVING quals to
        WHERE only if the query did not contain any aggregates. To determine
        that, it used query->hasAggs, which is incorrect, since hasAggs
        indicates that an aggregate is present either in the targetlist or in
        HAVING. This penalized queries wherein HAVING did not contain the agg
        but the targetlist did (as demonstrated in the newly added test). This
        check is now replaced by contain_aggs_of_level(). Also, do not demote
        if HAVING contains volatile functions, since they need to be evaluated
        once per group.
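      Hedged examples of the two parts (table names are illustrative):

      ```
      -- Part 1: the aggregate alone would make EXISTS always true, but
      -- LIMIT 0 makes it always false, so no TRUE filter may be assumed.
      select * from a where exists (select max(i) from b limit 0);

      -- Part 2: the aggregate is only in the target list, so the HAVING
      -- qual can still be demoted to WHERE; query->hasAggs was too coarse.
      select * from a where exists
        (select max(i) from b group by j having j > 10);
      ```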
      Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
  18. 22 Mar 2018 (1 commit)
    • Fix wrong results for queries of type NOT (select <>) · a0a38df9
      Committed by Dhanashree Kashid
      For queries of the form "NOT (subselect)", the planner lost the NOT
      operator during the initial pull-up in pull_up_sublinks_qual_recurse(),
      which resulted in an incorrect filter and hence wrong results. Inside
      pull_up_sublinks_qual_recurse(), when a qual contains NOT, we check if
      the sublink type is any one of EXISTS, ANY or ALL, and invoke the
      appropriate sublink pull-up routine. For a qual of the form
      "NOT (SELECT <>)" the sublink type is EXPR; we recursed into the
      argument of NOT, at which point we lost the information about the NOT
      operator.

      This commit fixes the issue by returning the node unmodified when the
      argument of NOT is an EXPR sublink. The EXPR sublink later gets pulled
      up by preprocess_qual_conditions(), wherein pull_up_sublinks() is
      invoked again to handle sublinks in an expression.
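      A hedged example of the query shape (tables and the boolean column are
      illustrative):

      ```
      -- The sublink is an EXPR sublink (not EXISTS/ANY/ALL), so the NOT
      -- must be kept; the sublink is pulled up later by
      -- preprocess_qual_conditions().
      select * from foo where not (select bar.flag from bar limit 1);
      ```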
      Signed-off-by: Sambitesh Dash <sdash@pivotal.io>
  19. 09 Feb 2018 (1 commit)
    • Refactor the way Semi-Join plans are constructed. · d4ce0921
      Committed by Heikki Linnakangas
      This removes much of the GPDB machinery to handle "deduplication paths"
      within the planner. We will now use the upstream code to build JOIN_SEMI
      paths, as well as paths where the outer side of the join is first
      deduplicated (JOIN_UNIQUE_OUTER/INNER).
      
      The old style "join first and deduplicate later" plans can be better in
      some cases, however. To still be able to generate such a plan, add a new
      JOIN_DEDUP_SEMI join type, which is transformed into JOIN_INNER followed
      by the deduplication step after the join, during planning.
      
      This new way of constructing these plans is simpler, and allows removing
      a bunch of code, and reverting some more code to the way it is in the
      upstream.
      
      I'm not sure if this can generate the same plans that the old code could,
      in all cases. In particular, I think the old "late deduplication"
      mechanism could delay the deduplication further, all the way to the top of
      the join tree. I'm not sure when that would be useful, though, and the
      regression suite doesn't seem to contain any such cases (with EXPLAIN). Or
      maybe I misunderstood the old code. In any case, I think this is good
      enough.
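      A hedged sketch of the strategies now weighed for one semi-join query
      (the query itself is illustrative):

      ```
      -- JOIN_SEMI:          stop probing the inner side after a match
      -- JOIN_UNIQUE_INNER:  deduplicate t2.j first, then plain inner join
      -- JOIN_DEDUP_SEMI:    inner join first, deduplicate rows afterwards
      select * from t1 where t1.i in (select t2.j from t2);
      ```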
  20. 12 Jan 2018 (1 commit)
    • Fix Filter required properties for correlated subqueries in ORCA · 59abec44
      Committed by Shreedhar Hardikar
      This commit brings in ORCA changes that ensure that a Materialize node is not
      added under a Filter when its child contains outer references.  Otherwise, the
      subplan is not rescanned (because it is under a Material), producing wrong
      results. A rescan is necessary because it re-evaluates the subplan for
      each of the outer-referenced values.
      
      For example:
      
      ```
      SELECT * FROM A,B WHERE EXISTS (
        SELECT * FROM E WHERE E.j = A.j and B.i NOT IN (
          SELECT E.i FROM E WHERE E.i != 10));
      ```
      
      For the above query ORCA produces a plan with two nested subplans:
      
      ```
      Result
        Filter: (SubPlan 2)
        ->  Gather Motion 3:1
              ->  Nested Loop
                    Join Filter: true
                    ->  Broadcast Motion 3:3
                          ->  Table Scan on a
                    ->  Table Scan on b
        SubPlan 2
          ->  Result
                Filter: public.c.j = $0
                ->  Materialize
                      ->  Result
                            Filter: (SubPlan 1)
                            ->  Materialize
                                  ->  Gather Motion 3:1
                                        ->  Table Scan on c
                            SubPlan 1
                              ->  Materialize
                                    ->  Gather Motion 3:1
                                          ->  Table Scan on c
                                                Filter: i <> 10
      ```
      
      The Materialize node (on top of Filter with Subplan 1) has cdb_strict = true.
      The cdb_strict semantics dictate that when the Materialize is rescanned,
      instead of destroying its tuplestore, it resets the accessor pointer to the
      beginning and the subtree is NOT rescanned.
      So the entries from the first scan are returned for all future calls; i.e. the
      results depend on the first row output by the cross join. This causes wrong and
      non-deterministic results.
      
      Also, this commit reinstates this test in qp_correlated_query.sql. It also
      fixes another wrong result caused by the same issue. Note that the changes in
      rangefuncs_optimizer.out are because ORCA now no longer falls back for those
      queries. Instead it produces a plan which is executed on the master
      (instead of on the segments, as with the planner), which changes the
      error messages.
      
      Also bump ORCA version to 2.53.8.
      Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
  21. 11 Nov 2017 (1 commit)
    • Align simplify_EXISTS_query with upstream · c823e7c6
      Committed by Dhanashree Kashid
      This function had diverged a lot from upstream after the subselect merge.
      One of the main reasons is that upstream has a lot of restrictive checks
      which prevent pull-up of EXISTS/NOT EXISTS. GPDB handles those cases
      differently, producing a join, an initplan, or a one-time filter.
      
      The cases that GPDB handles and for which we have not ported the checks
      from upstream are as follows:
      
      - AGG with limit count with/without offset
      - HAVING clause without AGG
      - AGG without HAVING clause
      
      For other conditions, we bail out as upstream does. Hence we have added
      the checks for HAVING and aggregates differently inside
      simplify_EXISTS_query. The rest of the code is similar to upstream.
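      Hedged examples of the three GPDB-handled cases listed above (table
      names are illustrative):

      ```
      -- AGG with a limit count, with/without offset:
      select * from a where exists (select max(i) from b limit 1);

      -- HAVING clause without an AGG:
      select * from a where exists
        (select j from b group by j having j > 0);

      -- AGG without a HAVING clause:
      select * from a where exists (select count(*) from b);
      ```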
      Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
  22. 27 Sep 2017 (8 commits)
    • Disable flattening of IN/EXISTS sublinks inside outer joins · fb1448d0
      Committed by Tom Lane
      commit 07b9936a
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Fri Feb 27 23:30:29 2009 +0000
      
          Temporarily (I hope) disable flattening of IN/EXISTS sublinks that are within
          the ON clause of an outer join.  Doing so is semantically correct but results
          in de-optimizing queries that were structured to take advantage of the sublink
          style of execution, as seen in recent complaint from Kevin Grittner.  Since
          the user can get the other behavior by reorganizing his query, having the
          flattening happen automatically is just a convenience, and that doesn't
          justify breaking existing applications.  Eventually it would be nice to
          re-enable this, but that seems to require a significantly different approach
          to outer joins in the executor.
      
      Added relevant test case.
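      A hedged example of the pattern that stays unflattened (tables are
      illustrative):

      ```
      -- The EXISTS is inside the ON clause of an outer join; flattening it
      -- is semantically correct but can de-optimize sublink-style plans,
      -- so it is left as a sublink.
      select * from a left join b
        on a.i = b.i and exists (select 1 from c where c.j = b.j);
      ```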
      Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
    • Misc test answer file changes · 8bd49b1b
      Committed by Ekta Khanna
      Signed-off-by: Jemish Patel <jpatel@pivotal.io>
    • Remove is_simple_subquery() check in simplify_EXISTS_query() · 77f804f5
      Committed by Shreedhar Hardikar
      GPDB handles a lot of the cases that are restricted by
      is_simple_subquery(); the restrictions that are not handled are checked
      for separately in convert_EXISTS_sublink_to_join().
      
      While resolving cascading ICG failures, we also fixed the following:
      
      - initialize all the members of IncrementVarSublevelsUp_context
        properly.
      - remove incorrect assertions brought in from upstream. In GPDB, these
        cases are handled.
      - improve plans for NOT EXISTS sub-queries containing an aggregation
        without limits by creating a "false" one-time filter.
      Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
    • Remove dead code around JoinExpr::subqfromlist. · f16deabd
      Committed by Shreedhar Hardikar
      This was used to keep information about the subquery join tree for
      pulled-up sublinks for use later in deconstruct_recurse().  With the
      upstream subselect merge, a JoinExpr is constructed at pull-up time
      itself, so this is no longer needed, since the subquery join tree
      information is available in the constructed JoinExpr.
      
      Also with the merge, deconstruct_recurse() handles JOIN_SEMI JoinExprs.
      However, since GPDB differs from upstream by treating SEMI joins as
      INNER joins for internal join planning, this commit also updates
      inner_join_rels correctly for SEMI joins (see regression test).
      
      Also remove unused function declaration for not_null_inner_vars().
      Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
    • Do not commute inner/outer rels of JOIN_ANTI and JOIN_LASJ_NOTIN · 70be2285
      Committed by Shreedhar Hardikar
      This issue was discovered during the subselect merge, wherein the planner
      incorrectly commuted anti joins.
      `cdb_add_subquery_join_paths()` creates join paths for (rel1, rel2) and
      (rel2, rel1) for all join types, including JOIN_ANTI and JOIN_LASJ_NOTIN.
      This produces wrong results, since these joins are order-sensitive w.r.t.
      the inner and outer relations (see the new regression tests). So, do not
      add (rel2, rel1) for JOIN_ANTI and JOIN_LASJ_NOTIN.
      
      This commit also refactors cdb_add_subquery_join_paths() and
      make_join_rel() to make it easier to control the commuting.
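      A hedged illustration of the order sensitivity:

      ```
      -- An anti join returns the OUTER side's rows with no inner match;
      -- "rows of a unmatched in b" is a different question from "rows of
      -- b unmatched in a", so (rel1, rel2) must not be swapped.
      select * from a where not exists (select 1 from b where b.i = a.i);
      ```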
      Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
    • Change EXPLAIN output so that subplans and initplans · acfd5f4d
      Committed by Tom Lane
      commit fbcce080
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Sun Apr 5 19:59:40 2009 +0000
      
          Change EXPLAIN output so that subplans and initplans (particularly CTEs)
          are individually labeled, rather than just grouped under an "InitPlan"
          or "SubPlan" heading.  This in turn makes it possible for decompilation of
          a subplan reference to usefully identify which subplan it's referencing.
          I also made InitPlans identify which parameter symbol(s) they compute,
          so that references to those parameters elsewhere in the plan tree can
          be connected to the initplan that will be executed.  Per a gripe from
          Robert Haas about EXPLAIN output of a WITH query being inadequate,
          plus some longstanding pet peeves of my own.
    • Updated test answer files after merging dc9cc88 · 745bcd2b
      Committed by Dhanashree Kashid
      After the introduction of the `placeholder` mechanism,
      we see some changes in the plans.
      These are all good changes, in that we remove the
      redundant `NotIn_SUBQUERY` subquery scan nodes:
      
      Before
      ```
      ...
      ->  Broadcast Motion 3:3  (slice1; segments: 3)
      	->  Subquery Scan "NotIn_SUBQUERY"
      		->  Seq Scan on subselect_tbl
      			Filter: f3 IS NOT NULL
      ```
      After
      ```
      ...
      ->  Broadcast Motion 3:3  (slice1; segments: 3)
      	 ->  Seq Scan on subselect_tbl
      	 	 Filter: f3 IS NOT NULL
      ```
      
      The new plans have a better cost, since we will not be executing a
      subquery scan for each tuple coming from its outer side.

      We are now consistent with the plans produced for the same queries
      with IN instead of NOT IN.
      Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
    • Update subselect EXPLAIN tests · e0415acc
      Committed by Shreedhar Hardikar
      This commit updates test answer files after merging e006a24a, 19e34b62 and
      e549722a. As a result of these merges, we produce inferior plans for
      which we have added FIX_ME's appropriately. They will be addressed in
      later commits.
  23. 12 Sep 2017 (1 commit)
  24. 30 Jun 2017 (1 commit)
  25. 14 Feb 2017 (1 commit)
    • Remove redundant queries from schema_topology. · b0690a83
      Committed by Heikki Linnakangas
      Most of these queries were already in the qp_correlated_query test. A few
      were not; those were moved instead of outright removed.
      
      This removed the last queries from schema_topology whose output was
      different with ORCA than without ORCA, so remove the ORCA-specific expected
      output file.
  26. 15 Dec 2016 (3 commits)
    • Cleanup ICG qp_correlated_query · 8db2ce37
      Committed by Xin Zhang
      - Remove RESET ALL so that it can actually test with optimizer=on
      - Remove duplicated population of tables A, B, C
      - Rename tables touched by DML to avoid duplicated setup
      - Fix subquery to verify the correct result rather than verifying the error message
      - Remove query with non-deterministic result

      Now, with GPORCA, all SQL statements pass without ERROR. The planner
      still has a few failures due to multi-level correlated subqueries, which
      is expected.
    • Revert "Cleanup ICG qp_correlated_query" · 3b4fd28c
      Committed by Xin Zhang
      This reverts commit 6f2e35bc.
      
      The pipeline detected a failure when codegen is enabled with GPORCA.
    • Cleanup ICG qp_correlated_query · 6f2e35bc
      Committed by Xin Zhang
      - Remove RESET ALL so that it can actually test with optimizer=on
      - Remove duplicated population of tables A, B, C
      - Rename tables touched by DML to avoid duplicated setup
      - Fix subquery to verify the correct result rather than verifying the error message

      Now, with GPORCA, all SQL statements pass without ERROR. The planner
      still has a few failures due to multi-level correlated subqueries, which
      is expected.
  27. 06 Dec 2016 (1 commit)
    • Fix for issue with EXISTS subquery with CTE · 5838bc1b
      Committed by Xin Zhang
      When we have a correlated subquery with EXISTS and a CTE, the planner
      produced a wrong plan:
      
      ```
      pivotal=# explain with t as (select 1) select * from foo where exists (select * from bar where foo.a = 'a' and foo.b = 'b');
                                                   QUERY PLAN
      ----------------------------------------------------------------------------------------------------
       Gather Motion 3:1  (slice3; segments: 3)  (cost=0.01..1522.03 rows=5055210000 width=16)
         ->  Nested Loop  (cost=0.01..1522.03 rows=1685070000 width=16)
               ->  Broadcast Motion 1:3  (slice2; segments: 1)  (cost=0.01..0.03 rows=1 width=0)
                     ->  Limit  (cost=0.01..0.01 rows=1 width=0)
                           ->  Gather Motion 3:1  (slice1; segments: 3)  (cost=0.01..0.03 rows=1 width=0)
                                 ->  Limit  (cost=0.01..0.01 rows=1 width=0)
                                       ->  Result  (cost=0.01..811.00 rows=23700 width=0)
                                             One-Time Filter: $0 = 'a'::bpchar AND $1 = 'b'::bpchar
                                             ->  Seq Scan on bar  (cost=0.01..811.00 rows=23700 width=0)
               ->  Seq Scan on foo  (cost=0.00..811.00 rows=23700 width=16)
       Optimizer status: legacy query optimizer
      (11 rows)
      ```
      
      This failed during execution because $0 is referenced across the slices.
      
      Root Cause:
      That's because the planner produced a plan with `$0`, a.k.a. a `param`,
      but without a `subplan`. The `param` is created by `replace_outer_var()`,
      when the planner detects a query referring to relations from its
      outer/parent query. Such a `var` is created when removing the `sublink`
      in the `convert_EXISTS_to_join()` function. In that function, when
      handling the `EXISTS` query, we convert the `EXISTS_sublink` to a
      `subquery RTE` (and expect it to get pulled up later by
      `pull_up_subquery()`). However, the subquery cannot be pulled up by
      `pull_up_subquery()`, since it is not a simple subquery
      (`is_simple_subquery()` returns false because of the CTE in this case).
      So the `sublink` got removed, but it could not produce a `subplan`
      (which would have been a valid option), and the `var` was left behind as
      an outer reference and then converted to a `param`, which blew up during
      query execution. There is a mismatch between the conditions in
      `convert_EXISTS_to_join()` and `pull_up_subquery()` about whether this
      subquery can be pulled up.

      The fix is to reuse the `is_simple_subquery()` check in
      `convert_EXISTS_to_join()`, so that it is consistent with
      `pull_up_subquery()` on whether the subquery can be pulled up or not.

      The correct plan after the fix is:
      
      ```
      pivotal=# explain with t as (select 1) select * from foo where exists (select * from bar where foo.a = 'a' and foo.b = 'b');
                                                            QUERY PLAN
      ----------------------------------------------------------------------------------------------------------------------
       Gather Motion 3:1  (slice2; segments: 3)  (cost=0.00..1977.50 rows=35550 width=16)
         ->  Seq Scan on foo  (cost=0.00..1977.50 rows=11850 width=16)
               Filter: (subplan)
               SubPlan 1
                 ->  Result  (cost=0.01..811.00 rows=23700 width=16)
                       One-Time Filter: $0 = 'a'::bpchar AND $1 = 'b'::bpchar
                       ->  Result  (cost=882.11..1593.11 rows=23700 width=16)
                             ->  Materialize  (cost=882.11..1593.11 rows=23700 width=16)
                                   ->  Broadcast Motion 3:3  (slice1; segments: 3)  (cost=0.01..811.00 rows=23700 width=16)
                                         ->  Seq Scan on bar  (cost=0.01..811.00 rows=23700 width=16)
       Optimizer status: legacy query optimizer
      (11 rows)
      ```
      Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
  28. 18 Nov 2016 (2 commits)
    • Move some correlated test queries from bugbuster to qp_correlated_query. · b1af58aa
      Committed by Heikki Linnakangas
      We have similar tests there, and have the same test tables set up
      already.
    • Use proper error code for errors. · 0bf31cd6
      Committed by Heikki Linnakangas
      Attach a suitable error code to many errors that were previously reported
      as "internal errors". GPDB's elog.c prints a source file name and line
      number for any internal error, which is a bit ugly for errors that are
      not in fact unexpected internal errors, but user-facing errors that
      happen as a result of e.g. an invalid query.
      
      To make sure we don't accumulate more of these, adjust the regression tests
      to not ignore the source file and line number in error messages. There are
      a few exceptions, which are listed explicitly.