1. 15 Oct 2020, 1 commit
    • Increment ExternalScan::scancounter across queries in ORCA · de9b0e26
      Committed by Shreedhar Hardikar
      gpfdist uses the global xid & timestamp to distinguish whether each
      connection belongs to the same external scan or not.
      
      ORCA generates a unique scan number for each ExternalScan within the
      same plan, but not across plans. So, within a transaction, we may issue
      multiple external scans that do not get differentiated properly,
      producing different results.
      
      This commit patches that by using a different scan number across plans,
      just as the planner does. Ideally, gpfdist should also take into
      account the command-id of the query to prevent this problem for other
      cases such as prepared statements.
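
      As a minimal illustration (the external table and gpfdist URL below are
      made up, not from this commit), this is the kind of sequence that was
      affected: both statements share the same global xid and timestamp, so
      gpfdist relies on the scan counter to tell the two scans apart.

      CREATE EXTERNAL TABLE ext_events (id int, payload text)
          LOCATION ('gpfdist://etl-host:8081/events.txt')
          FORMAT 'TEXT' (DELIMITER '|');

      BEGIN;
      -- Before this fix, ORCA restarted its scan numbering for each plan, so
      -- the second scan could be mistaken for a continuation of the first and
      -- return inconsistent results.
      SELECT count(*) FROM ext_events;
      SELECT count(*) FROM ext_events;
      COMMIT;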
  2. 23 Sep 2020, 1 commit
    • Format ORCA and GPOPT. · 4a37ae94
      Committed by Jesse Zhang
      The canonical config file is src/backend/gporca/.clang-format; I've
      created two symlinks, one for the GPOPT headers and one for GPOPT.
      
      This is spiritually a cherry-pick of commit 2f7dd76c, but with
      the actual code of this branch (6X_STABLE) formatted, of course.
      
      (cherry picked from commit 2f7dd76c)
  3. 30 Jul 2020, 1 commit
    • Add Orca support for index only scan · 93c9829a
      Committed by David Kimura
      This commit allows Orca to select plans that leverage the IndexOnlyScan
      node. A new GUC, 'optimizer_enable_indexonlyscan', is used to enable or
      disable this feature. Index only scan is disabled by default, until the
      following issues are addressed:
      
        1) Implement a cost comparison model for index only scans. Currently,
           the cost is hard coded for testing purposes.
        2) Support index only scans using GiST and SP-GiST indexes where allowed.
           Currently, the code only supports index only scans on B-tree indexes.
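
      A sketch of how the feature is exercised (the table, index, and query are
      illustrative, not from this commit; only the GUC name is):

      SET optimizer = on;
      SET optimizer_enable_indexonlyscan = on;  -- off by default, per above

      CREATE TABLE measurements (id int, reading float8) DISTRIBUTED BY (id);
      CREATE INDEX measurements_id_idx ON measurements USING btree (id);

      -- A query that touches only the indexed column is the candidate shape
      -- for an IndexOnlyScan node.
      EXPLAIN SELECT id FROM measurements WHERE id BETWEEN 100 AND 200;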
      Co-authored-by: Chris Hajas <chajas@vmware.com>
      (cherry picked from commit 3b72df18)
  4. 20 Nov 2019, 1 commit
  5. 01 Jun 2019, 1 commit
  6. 01 Feb 2019, 1 commit
    • Use normal hash operator classes for data distribution. · 242783ae
      Committed by Heikki Linnakangas
      Replace the use of the built-in hashing support for built-in datatypes, in
      cdbhash.c, with the normal PostgreSQL hash functions. Now is a good time
      to do this, since we've already made the change to use jump consistent
      hashing in GPDB 6, so we'll need to deal with the upgrade problems
      associated with changing the hash functions, anyway.
      
      It is no longer enough to track which columns/expressions are used to
      distribute data. You also need to know the hash function used. For that,
      a new field is added to gp_distribution_policy, to record the hash
      operator class used for each distribution key column. In the planner,
      a new opfamily field is added to DistributionKey, to track that throughout
      the planning.
      
      Normally, if you do "CREATE TABLE ... DISTRIBUTED BY (column)", the
      default hash operator class for the datatype is used. But this patch
      extends the syntax so that you can specify the operator class explicitly,
      like "... DISTRIBUTED BY (column opclass)". This is similar to how an
      operator class can be specified for each column in CREATE INDEX.
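
      For example (table names are illustrative):

      -- The default hash opclass for int4 is picked implicitly:
      CREATE TABLE orders_default (order_id int4, note text)
          DISTRIBUTED BY (order_id);

      -- The extended syntax names the hash operator class explicitly, here
      -- using the legacy class described below:
      CREATE TABLE orders_legacy (order_id int4, note text)
          DISTRIBUTED BY (order_id cdbhash_int4_ops);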
      
      To support upgrade, the old hash functions have been converted to special
      (non-default) operator classes, named cdbhash_*_ops. For example, if you
      want to use the old hash function for an integer column, you could do
      "DISTRIBUTED BY (intcol cdbhash_int4_ops)". The old hard-coded whitelist
      of operators that have "compatible" cdbhash functions has been replaced
      by putting the compatible hash opclasses in the same operator family. For
      example, all legacy integer operator classes, cdbhash_int2_ops,
      cdbhash_int4_ops and cdbhash_int8_ops, are all part of the
      cdbhash_integer_ops operator family.
      
      This removes the pg_database.hashmethod field. The hash method is now
      tracked on a per-table and per-column basis, using the opclasses, so it's
      not needed anymore.
      
      To help with upgrade from GPDB 5, this introduces a new GUC called
      'gp_use_legacy_hashops'. If it's set, CREATE TABLE uses the legacy hash
      opclasses, instead of the default hash opclasses, if the opclass is not
      specified explicitly. pg_upgrade will set the new GUC, to force the use of
      legacy hashops, when restoring the schema dump. It will also set the GUC
      on all upgraded databases, as a per-database option, so any new tables
      created after upgrade will also use the legacy opclasses. It seems better
      to be consistent after upgrade, so that, for example, co-location between
      old and new tables still works. The idea is that some time after the upgrade, the
      admin can reorganize all tables to use the default opclasses instead. At
      that point, he should also clear the GUC on the converted databases. (Or
      rather, the automated tool that hasn't been written yet should do that.)
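
      Roughly (table names are illustrative):

      SET gp_use_legacy_hashops = on;
      -- With no explicit opclass, this now picks cdbhash_int4_ops:
      CREATE TABLE upgraded_style (id int4, v text) DISTRIBUTED BY (id);

      SET gp_use_legacy_hashops = off;
      -- With the GUC off, the default hash opclass is used:
      CREATE TABLE new_style (id int4, v text) DISTRIBUTED BY (id);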
      
      ORCA doesn't know about hash operator classes, or the possibility that we
      might need to use a different hash function for two columns with the same
      datatype. Therefore, it cannot produce correct plans for queries that mix
      different distribution hash opclasses for the same datatype, in the same
      query. There are checks in the Query->DXL translation, to detect that
      case, and fall back to planner. As long as you stick to the default
      opclasses in all tables, we let ORCA create the plan without any regard
      to them, and use the default opclasses when translating the DXL plan to a
      Plan tree. We also allow the case that all tables in the query use the
      "legacy" opclasses, so that ORCA works after pg_upgrade. But a mix of the
      two, or using any non-default opclasses, forces ORCA to fall back.
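
      For instance, a query joining the two tables sketched above mixes a
      legacy and a default hash opclass on the same datatype, so ORCA falls
      back to the planner for it (the tables are illustrative):

      SET optimizer = on;
      SELECT *
      FROM new_style n
      JOIN upgraded_style u ON n.id = u.id;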
      
      One curiosity with this is the "int2vector" and "aclitem" datatypes. They
      have a hash opclass, but no b-tree operators. GPDB 4 used to allow them
      as DISTRIBUTED BY columns, but we forbade that in GPDB 5, in commit
      56e7c16b. Now they are allowed again, so you can specify an int2vector
      or aclitem column in DISTRIBUTED BY, but it's still pretty useless,
      because the planner still can't form EquivalenceClasses on it, and will
      treat it as "strewn" distribution, and won't co-locate joins.
      
      Abstime, reltime, tinterval datatypes don't have default hash opclasses.
      They are being removed completely in PostgreSQL v12, and users shouldn't
      be using them in the first place, so instead of adding hash opclasses for
      them now, we accept that they can't be used as distribution key columns
      anymore. Add a check to pg_upgrade, to refuse upgrade if they are used
      as distribution keys in the old cluster. Do the same for 'money' datatype
      as well, although that's not being removed in upstream.
      
      The legacy hashing code for anyarray in GPDB 5 was actually broken. It
      could produce a different hash value for two arrays that are considered
      equal, according to the = operator, if there were differences in e.g.
      whether the null bitmap was stored or not. Add a check to pg_upgrade, to
      reject the upgrade if array types were used as distribution keys. The
      upstream hash opclass for anyarray works, though, so it is OK to use
      arrays as distribution keys in new tables. We just don't support binary
      upgrading them from GPDB 5. (See github issue
      https://github.com/greenplum-db/gpdb/issues/5467). The legacy hashing of
      'anyrange' had the same problem, but that was new in GPDB 6, so we don't
      need a pg_upgrade check for that.
      
      This also tightens the checks in ALTER TABLE ALTER COLUMN and CREATE
      UNIQUE INDEX, so that you can no longer create a situation where a non-hashable
      column becomes the distribution key. (Fixes github issue
      https://github.com/greenplum-db/gpdb/issues/6317)
      
      Discussion: https://groups.google.com/a/greenplum.org/forum/#!topic/gpdb-dev/4fZVeOpXllQ
      Co-authored-by: Mel Kiyama <mkiyama@pivotal.io>
      Co-authored-by: Abhijit Subramanya <asubramanya@pivotal.io>
      Co-authored-by: Pengzhou Tang <ptang@pivotal.io>
      Co-authored-by: Chris Hajas <chajas@pivotal.io>
      Reviewed-by: Bhuvnesh Chaudhary <bchaudhary@pivotal.io>
      Reviewed-by: Ning Yu <nyu@pivotal.io>
      Reviewed-by: Simon Gao <sgao@pivotal.io>
      Reviewed-by: Jesse Zhang <jzhang@pivotal.io>
      Reviewed-by: Zhenghua Lyu <zlv@pivotal.io>
      Reviewed-by: Melanie Plageman <mplageman@pivotal.io>
      Reviewed-by: Yandong Yao <yyao@pivotal.io>
  7. 15 Dec 2018, 1 commit
    • Refactor executor code for TableScan, DynamicTableScan, BitmapHeapScan. · db516347
      Committed by Heikki Linnakangas
      This removes a lot of GPDB-specific code that was used to deal with
      dynamic scans, and code duplication between nodes dealing with Heap, AO
      and AOCS tables.
      
      * Resurrect SeqScan node. We had replaced it with TableScan in GPDB.
        Teach SeqScan to also work on append-only and AOCS tables, and remove
        TableScan and all the code changes that were made in GPDB earlier to
        deal with all three table types.
      
      * Merge BitmapHeapScan, BitmapAppendOnlyScan, and BitmapTableScan node
        types. They're all BitmapHeapScans now. We used to use BitmapTableScans
        in ORCA-generated plans, and BitmapHeapScan/BitmapAppendOnlyScan in
        planner-generated plans, and there was no good reason for the
        difference. The "heap" part in the name is a bit misleading, but I
        prefer to keep the upstream name, even though it now handles AO tables
        as well. It's more like the old BitmapTableScan now, which also handled
        all three table types, but the code is refactored to stay as close to
        upstream as possible.
      
      * Introduce DynamicBitmapHeapScan. BitmapTableScan used to perform Dynamic
        scans too, now it's the responsibility of the new DynamicBitmapHeapScan
        plan node, just like we have DynamicTableScan and DynamicIndexScan as
        wrappers around SeqScans and IndexScans.
      
      * Get rid of BitmapAppendOnlyPath in the planner, too. Use BitmapHeapPath
        also for AO tables.
      
      * Refactor the way Dynamic Table Scan works. A Dynamic Table Scan node
        is now just a thin wrapper around SeqScan. It initializes a new
        SeqScan executor node for every partition, and lets it do the actual
        scanning. It now works the same way that I refactored Dynamic Index
        Scans to work in commit 198f701e. This allowed removing a lot of code
        that we used to use for both Dynamic Index Scans and Dynamic Table
        Scans, but is no longer used.
      
      There's now some duplication in the Dynamic* nodes, to walk through the
      partitions. They all have a function called setPidIndex(), for example,
      which does the same thing. But I think it's much more clear this way,
      than the previous DynamicController stuff. We could perhaps extract some
      of the code to common helper functions, but I think this is OK for now.
      
      This also fixes issue #6274. I'm not sure what exactly the bug was, but it
      was clearly in the Bitmap Table Scan code that is used with ORCA-generated
      plans. Now that we use the same code for plans generated with the Postgres
      planner and ORCA, it's not surprising that the bug is gone.
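
      As an illustration (the table and index names are made up), a bitmap-driven
      query over an append-only table now goes through the same BitmapHeapScan
      node regardless of whether the plan comes from the planner or from ORCA:

      CREATE TABLE ao_events (id int, kind int, payload text)
          WITH (appendonly = true)
          DISTRIBUTED BY (id);
      CREATE INDEX ao_events_kind_idx ON ao_events (kind);

      EXPLAIN SELECT * FROM ao_events WHERE kind IN (1, 2, 3);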
      Reviewed-by: Ashwin Agrawal <aagrawal@pivotal.io>
      Reviewed-by: Bhuvnesh Chaudhary <bchaudhary@pivotal.io>
  8. 16 Aug 2018, 2 commits
  9. 15 Aug 2018, 2 commits
  10. 01 Aug 2018, 1 commit
  11. 21 Nov 2017, 1 commit
    • Refactor dynamic index scans and bitmap scans, to reduce diff vs. upstream. · 198f701e
      Committed by Heikki Linnakangas
      Much of the code and structs used by index scans and bitmap index scans had
      been fused together and refactored in GPDB, to share code between dynamic
      index scans and regular ones. However, it would be nice to keep upstream
      code unchanged as much as possible. To that end, refactor the executor code
      for dynamic index scans and dynamic bitmap index scans, to reduce the diff
      vs upstream.
      
      The Dynamic Index Scan executor node is now a thin wrapper around the
      regular Index Scan node, even thinner than before. When a new Dynamic Index
      Scan node is initialized, we don't do much at that point. When the scan
      begins, we initialize an Index Scan node for the first partition, and
      return rows from it until it's exhausted. On next call, the underlying
      Index Scan is destroyed, and a new Index Scan node is created, for the next
      partition, and so on. Creating and destroying the IndexScanState for every
      partition adds some overhead, but it's not significant compared to all the
      other overhead of opening and closing the relations, building scan keys
      etc.
      
      Similarly, a Dynamic Bitmap Index Scan executor node is just a thin wrapper
      for regular Bitmap Index Scan. When MultiExecDynamicBitmapIndexScan() is
      called, it initializes a BitmapIndexScanState for the current partition,
      and calls it. On ReScan, the BitmapIndexScan executor node for the old
      partition is shut down. A Dynamic Bitmap Index Scan differs from Dynamic
      Index Scan in that a Dynamic Index Scan is responsible for iterating
      through all the active partitions, while a Dynamic Bitmap Index Scan works
      as a slave for the Dynamic Bitmap Heap Scan node above it.
      
      It'd be nice to do a similar refactoring for heap scans, but that's for
      another day.
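
      For illustration (names are made up), the kind of plan this affects is an
      index scan over a partitioned table, where the Dynamic Index Scan node now
      simply opens a regular Index Scan for one partition at a time:

      CREATE TABLE sales (id int, sale_date date, amount numeric)
          DISTRIBUTED BY (id)
          PARTITION BY RANGE (sale_date)
          (START ('2017-01-01') END ('2018-01-01') EVERY (INTERVAL '1 month'));
      CREATE INDEX sales_amount_idx ON sales (amount);

      EXPLAIN SELECT * FROM sales WHERE amount > 10000;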
  12. 25 Sep 2017, 1 commit
    • Remove the concept of window "key levels". · b1651a43
      Committed by Heikki Linnakangas
      It wasn't very useful. ORCA and Postgres both just stack WindowAgg nodes
      on top of each other, and no-one's been unhappy about that, so we might as
      well do that, too. This reduces the difference between GPDB and the upstream
      implementation, and will hopefully make it smoother to switch.
      
      Rename the Window Plan node type to WindowAgg, to match upstream, now
      that it is fairly close to the upstream version.
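
      For example (table and columns are illustrative), two window functions
      with different PARTITION BY clauses cannot share a window, so the plan
      simply stacks one WindowAgg node on top of another, as upstream does:

      SELECT id,
             rank()   OVER (PARTITION BY dept ORDER BY salary),
             count(*) OVER (PARTITION BY city)
      FROM employees;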
  13. 17 Sep 2017, 1 commit
    • Convert WindowFrame to frameOptions + start + end · ebf9763c
      Committed by Heikki Linnakangas
      In GPDB, we have so far used a WindowFrame struct to represent the start
      and end window bounds in a ROWS/RANGE BETWEEN clause, while PostgreSQL
      uses the combination of a frameOptions bitmask and start and end
      expressions. Refactor to replace the WindowFrame with the upstream
      representation.
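
      For example (table and columns are illustrative), the frame clause below
      is what used to be carried in GPDB's WindowFrame struct; it is now
      represented by a frameOptions bitmask plus start and end offset
      expressions, as in upstream:

      SELECT id,
             sum(amount) OVER (ORDER BY id
                               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
      FROM payments;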
  14. 04 Sep 2017, 1 commit
    • Share external URL-mapping code between planner and ORCA. · cbb8ea18
      Committed by Heikki Linnakangas
      Planner and ORCA translator both implemented the same logic, to assign
      external table URIs to segments. But I spotted one case where the logic
      differed:
      
      CREATE EXTERNAL TABLE exttab_with_on_master( i int, j text )
      LOCATION ('file://@hostname@@abs_srcdir@/data/exttab_few_errors.data') ON MASTER FORMAT 'TEXT' (DELIMITER '|');
      
      SELECT * FROM exttab_with_on_master;
      ERROR:  'ON MASTER' is not supported by this protocol yet.
      
      With ORCA you got a less user-friendly error:
      
      set optimizer=on;
      set optimizer_enable_master_only_queries = on;
      postgres=# explain SELECT * FROM exttab_with_on_master;
      ERROR:  External scan error: Could not assign a segment database for external file (CTranslatorDXLToPlStmt.cpp:472)
      
      The immediate cause was that commit fcf82234 didn't also modify the ORCA
      translator's copy of the same logic. But really, it's silly
      and error-prone to duplicate the code, so modify ORCA to use the same code
      that the planner does.
  15. 17 Aug 2017, 1 commit
    • Remove unused Plan.plan_parent_node_id field. · 5c155847
      Committed by Heikki Linnakangas
      This allows removing all the code in CTranslatorDXLToPlStmt that tracked
      the parent of each call.
      
      I found the plan node IDs awkward, when I was hacking on
      CTranslatorDXLToPlStmt. I tried to make a change where a function would
      construct a child Plan node first, and a Result node on top of that, but
      only if necessary, depending on the kind of child plan. The parent plan
      node IDs made it impossible to construct a part of Plan tree like that, in
      a bottom-up fashion, because you always had to pass the parent's ID when
      constructing a child node. Now that is possible.
  16. 19 Jul 2017, 1 commit
    • [#147774653] Implemented ValuesScan Operator in ORCA · 819107b7
      Committed by Bhuvnesh Chaudhary
      This commit introduces a new operator for ValuesScan. Earlier we
      generated a `UNION ALL` for cases where the VALUES lists passed are all
      constants; now a new operator, CLogicalConstTable, with an array of
      const tuples is generated instead.

      Once the plan is generated by ORCA, it is translated to a ValuesScan
      node in GPDB.

      This enhancement significantly improves the total run time of queries
      involving a values scan over constant values in ORCA.
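
      For example (column aliases are illustrative), an all-constant VALUES
      list like this is now modeled with CLogicalConstTable and executed with
      a ValuesScan node, instead of being expanded into a UNION ALL:

      SELECT v.code, v.label
      FROM (VALUES (1, 'red'), (2, 'green'), (3, 'blue')) AS v(code, label);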
      Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
  17. 04 Apr 2017, 1 commit
    • Fix duplicate typedefs. · 615b4c69
      Committed by Heikki Linnakangas
      It's an error in standard C - at least in older standards - to typedef
      the same type more than once, even if the definition is the same. Newer
      versions of gcc don't complain about it, but you can see the warnings
      with -pedantic (among a ton of other warnings, search for "redefinition").
      
      To fix, remove the duplicate typedefs. The ones in src/backend/gpopt and
      src/include/gpopt were actually OK, because a duplicate typedef is OK in
      C++, and those files are compiled with a C++ compiler. But many of the
      typedefs in those files were not used for anything, so I nevertheless
      removed the duplicate ones that caught my eye there, too.
      
      In gpmon.h, we were redefining apr_*_t types when postgres.h had been
      included. But as far as I can tell, that was always the case - all the files
      that included gpmon.h also included postgres.h, directly or indirectly, before it.
      Search & replace the references to apr_*_t types in that file with the
      postgres equivalents, to make it more clear what they actually are.
  18. 04 Aug 2016, 1 commit
  19. 13 Jul 2016, 1 commit
    • Populate canSetTag of PlannedStmt from Query object (#934) · f01bb84b
      Committed by foyzur
      * Preventing multiple ResLockPortal calls for the same portal when running
        multiple queries via PortalRunMulti, by correctly populating canSetTag in
        the PlannedStmt from the Query object during DXL-to-PlannedStmt translation.
      
      * ICG tests for checking if ORCA correctly populates canSetTag.
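
      A hypothetical illustration (the tables and rule are made up, not from
      this commit) of a statement that runs multiple plans through one portal:
      a DO ALSO rule expands a single INSERT into two PlannedStmts executed via
      PortalRunMulti, and only the one corresponding to the original query
      should carry canSetTag, so ResLockPortal is called just once for the
      portal.

      CREATE TABLE t_main  (id int) DISTRIBUTED BY (id);
      CREATE TABLE t_audit (id int) DISTRIBUTED BY (id);

      CREATE RULE audit_insert AS ON INSERT TO t_main
          DO ALSO INSERT INTO t_audit VALUES (NEW.id);

      INSERT INTO t_main VALUES (42);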
  20. 19 May 2016, 1 commit
  21. 24 Nov 2015, 1 commit
  22. 28 Oct 2015, 1 commit