1. 02 Jul 2020 (2 commits)
    • Check whether the directory exists when deleting the tablespace (#10388) · 270a775b
      Committed by Jinbao Chen
      If the tablespace directory does not exist, we should raise an error
      at commit time, but an error during transaction commit causes a
      panic. So check the tablespace directory up front to avoid the
      panic.
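      The idea can be sketched in Python (an illustration only; the real fix lives in Greenplum's C tablespace code, and `drop_tablespace` here is a hypothetical stand-in):

```python
import os
import tempfile

def drop_tablespace(location):
    """Check the tablespace directory *before* the commit phase.

    Raising here (pre-commit) is an ordinary, recoverable error; finding
    the directory missing during commit would have to PANIC, because
    commit itself is not allowed to fail with an error.
    """
    if not os.path.isdir(location):
        raise FileNotFoundError(f"tablespace directory {location!r} does not exist")
    os.rmdir(location)      # the actual removal happens only after the check

with tempfile.TemporaryDirectory() as d:
    loc = os.path.join(d, "ts1")
    os.mkdir(loc)
    drop_tablespace(loc)            # directory exists: removed normally
    try:
        drop_tablespace(loc)        # directory gone: plain error, no panic
        failed = False
    except FileNotFoundError:
        failed = True
```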
      270a775b
    • Make QEs not depend on the GUC gp_enable_global_deadlock_detector. · fde4af75
      Committed by Zhenghua Lyu
      Previously, some code executed on QEs checked the value of the GUC
      gp_enable_global_deadlock_detector. The historical reason is:
        Before we had GDD, UPDATE|DELETE operations could not be executed
        concurrently, so we never hit the EvalPlanQual issue or the
        concurrent split-update issue. After GDD was added, we did hit
        those issues and added code to resolve them, guarding it with
        `if (gp_enable_global_deadlock_detector)` from the very start.
        That was just habitual thinking.
      
      In fact, we do not rely on it in QEs, and I tried to remove this in
      the context of #9992 (review). I tried to add an assert there but
      could not handle utility mode, as Heikki commented. To continue that
      idea, we can simply remove the check of
      gp_enable_global_deadlock_detector. This brings two benefits:
        1. Some users only change this GUC on the master. With the usage
           in QEs removed, it is now safe to make the GUC master-only.
        2. We can bring back the trick of restarting only the master
           node's postmaster to enable GDD, which saves a lot of time in
           the pipeline. This commit also does this for the isolation2
           test cases: lockmodes and gdd/.
      
      Github issue #10030 is gone after this commit.
      
      Also, this commit backports the lockmodes test from the master branch
      and refactors the concurrent_update test cases.
      fde4af75
  2. 01 Jul 2020 (1 commit)
    • README: Fix documentation to install Greenplum with ORCA in Ubuntu (#10253) · 335b99ad
      Committed by Francisco Guerrero
      While installing ORCA on Ubuntu 18.04 I ran into issues while trying
      to install gp-xerces. During the build, Greenplum was trying to link
      against the installed xerces package `libxerces-c-dev`, which is
      installed by running the `./README.ubuntu.bash` script. By default,
      Ubuntu 18.04 installs xerces 3.2, which is incompatible with the
      version used by Greenplum 6.
      335b99ad
  3. 30 Jun 2020 (7 commits)
    • Fix binary_swap failure. · c17af285
      Committed by Junfeng (Jerome) Yang
      The pg_get_viewdef() function was fixed to properly show view
      definitions. Views that are not supported in the old version should
      be removed before the binary swap test.
      c17af285
    • Skip foreign table when gpexpand · 5b508da5
      Committed by Hubert Zhang
      For non-partitioned tables we already skip external tables during
      gpexpand, but when a child partition of a partitioned table is an
      external table, gpexpand errors out. This is not correct behavior.
      Since the data of a foreign table is located outside gpdb, it is
      enough to skip these tables during gpexpand.
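      The intended behavior can be sketched as follows (a minimal illustration; the table model and `tables_to_expand` are hypothetical, not gpexpand's actual code):

```python
# Tables modeled as (name, kind, children); "foreign" tables hold their
# data outside gpdb, so expansion simply skips them instead of erroring.
def tables_to_expand(tables):
    result = []
    for name, kind, children in tables:
        if kind == "foreign":
            continue                      # skip, do not error out
        result.append(name)
        # a child partition may itself be foreign: skip those too
        result.extend(tables_to_expand(children))
    return result

catalog = [
    ("plain", "heap", []),
    ("ext", "foreign", []),
    ("parted", "heap", [
        ("parted_1", "heap", []),
        ("parted_ext", "foreign", []),    # used to make gpexpand error out
    ]),
]
expanded = tables_to_expand(catalog)
```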
      
      Co-authored-by: Ning Yu <nyu@pivotal.io>
      
      (cherry picked from commit b9691eba)
      5b508da5
    • Fix CASE WHEN IS NOT DISTINCT FROM clause incorrect dump. (#10365) · 70eee180
      Committed by (Jerome) Junfeng Yang
      The dump of a 'CASE WHEN (arg1) IS NOT DISTINCT FROM (arg2)' clause
      misses arg1. For example:
      ```
      CREATE OR REPLACE VIEW xxxtest AS
      SELECT
          CASE
          WHEN 'I will disappear' IS NOT DISTINCT FROM ''::text
          THEN 'A'::text
          ELSE 'B'::text
          END AS t;
      ```
      The dump will lose 'I will disappear'.
      
      ```
      SELECT
          CASE
          WHEN IS NOT DISTINCT FROM ''::text
          THEN 'A'::text
          ELSE 'B'::text
          END AS t;
      ```
      
      For the example below:
      ```
      CREATE TABLE mytable2 (
          key character varying(20) NOT NULL,
          key_value character varying(50)
      ) DISTRIBUTED BY (key);
      
      CREATE VIEW aaa AS
      SELECT
          CASE mytable2.key_value
              WHEN IS NOT DISTINCT FROM 'NULL'::text THEN 'now'::text::date
              ELSE to_date(mytable2.key_value::text, 'YYYYMM'::text)
              END AS t
          FROM mytable2;
      
      ```
      
      mytable2.key_value is cast to type date. For the clause `(ARG1) IS NOT
      DISTINCT FROM (ARG2)`, this turns ARG1 into a RelabelType node that
      contains a CaseTestExpr node in RelabelType->arg.
      
      So when dumping the view, the output comes out as
      
      ```
      select pg_get_viewdef('notdisview3',false);
                                     pg_get_viewdef
      -----------------------------------------------------------------------------
        SELECT                                                                    +
               CASE mytable2.key_value                                            +
                   WHEN (CASE_TEST_EXPR) IS NOT DISTINCT FROM 'NULL'::text THEN ('now'::text)::date+
                   ELSE to_date((mytable2.key_value)::text, 'YYYYMM'::text)       +
               END AS t                                                           +
          FROM mytable2;
      (1 row)
      ```
      
      Digging into commit a453004e: if the left-hand argument of `IS NOT
      DISTINCT FROM` contains any `CaseTestExpr` node, the left-hand arg
      should be omitted, because `CaseTestExpr` is a placeholder for the
      CASE expression.
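      The omission rule can be sketched with toy expression trees (illustrative only; the real logic lives in the C deparsing code behind pg_get_viewdef, and these helpers are hypothetical):

```python
# Toy expression trees: tuples of (node_type, *children_or_values).
def contains_case_test_expr(node):
    if node[0] == "CaseTestExpr":
        return True
    return any(isinstance(c, tuple) and contains_case_test_expr(c)
               for c in node[1:])

def deparse(node):
    if node[0] == "Const":
        return repr(node[1])
    raise NotImplementedError(node[0])

def deparse_distinct_when(lhs, rhs_sql):
    # Omit the left-hand argument only when it is the implicit CASE
    # placeholder (possibly wrapped, e.g. in a RelabelType cast); an
    # explicit left-hand expression must be printed.
    if contains_case_test_expr(lhs):
        return f"WHEN IS NOT DISTINCT FROM {rhs_sql}"
    return f"WHEN ({deparse(lhs)}) IS NOT DISTINCT FROM {rhs_sql}"

implicit = ("RelabelType", ("CaseTestExpr",))    # CASE expr WHEN ... form
explicit = ("Const", "I will disappear")         # CASE WHEN expr ... form

out_implicit = deparse_distinct_when(implicit, "'NULL'::text")
out_explicit = deparse_distinct_when(explicit, "''::text")
```

The bug amounted to applying the first branch to explicit left-hand expressions as well, dropping 'I will disappear' from the dump.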
      70eee180
    • resgroup: fix the cpu value of the per host status view · 9c81451f
      Committed by Ning Yu
      Resource groups do not distinguish per-segment cpu usage; the cpu
      usage reported by a segment is actually the total cpu usage of all
      the segments on the host.  This is by design, not a bug.  However,
      the gp_toolkit.gp_resgroup_status_per_host view reports the cpu
      usage as the sum over all the segments on the same host, so the
      reported per-host cpu usage is actually N times the actual usage,
      where N is the number of segments on that host.
      
      Fixed by reporting the avg() instead of the sum().
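      The fix amounts to the difference between sum() and avg() over the per-segment reports (a minimal sketch, assuming each segment reports the host-wide usage):

```python
# Each segment on a host reports the *host-wide* cpu usage of all
# segments, so a per-host rollup must average the reports, not add them.
segment_reports = {                       # host -> one report per segment
    "seg-host-a": [30.0, 30.0, 30.0],     # 3 segments, real host usage 30%
    "seg-host-b": [12.5, 12.5],
}

buggy = {h: sum(v) for h, v in segment_reports.items()}            # sum(): N x actual
fixed = {h: sum(v) / len(v) for h, v in segment_reports.items()}   # avg()
```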
      
      Tests are not provided, as resgroup/resgroup_views has never verified
      cpu usage, because the cpu usage is unstable on the pipelines.
      However, I have verified manually.
      Reviewed-by: Hubert Zhang <hzhang@pivotal.io>
      (cherry picked from commit e0d78729)
      9c81451f
    • Fix assert failure in cdbcomponent_getCdbComponents() (#10355) · 8b6e19ab
      Committed by Paul Guo
      It could be called in utility mode, but we should avoid calling into
      FtsNotifyProber() in that case. Ideally we could do the call whenever
      we access the master node and postgres is not in single-user mode,
      but that does not seem necessary.
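      The guard can be sketched like this (illustrative Python; the real code is C and the function names here are stand-ins):

```python
GP_ROLE_DISPATCH, GP_ROLE_UTILITY = "dispatch", "utility"

probes = []

def fts_notify_prober():
    # In the server this path asserts Gp_role == GP_ROLE_DISPATCH.
    probes.append("probe")

def cdbcomponent_get_cdb_components(role):
    # Only ask FTS for fresh segment info when running as the dispatcher;
    # in utility mode the old code reached FtsNotifyProber() and tripped
    # the assertion shown in the stack below.
    if role == GP_ROLE_DISPATCH:
        fts_notify_prober()
    return ["master", "seg0", "seg1"]      # placeholder component list

cdbcomponent_get_cdb_components(GP_ROLE_UTILITY)    # no probe, no assert
cdbcomponent_get_cdb_components(GP_ROLE_DISPATCH)   # probes exactly once
```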
      
      Here is the stack of the issue I encountered.
      
      0  0x0000003397e32625 in raise () from /lib64/libc.so.6
      1  0x0000003397e33e05 in abort () from /lib64/libc.so.6
      2  0x0000000000b39844 in ExceptionalCondition (conditionName=0xeebac0 "!(Gp_role == GP_ROLE_DISPATCH)", errorType=0xeebaaf "FailedAssertion",
          fileName=0xeeba67 "cdbfts.c", lineNumber=101) at assert.c:66
      3  0x0000000000bffb1e in FtsNotifyProber () at cdbfts.c:101
      4  0x0000000000c389a4 in cdbcomponent_getCdbComponents () at cdbutil.c:714
      5  0x0000000000c26e3a in gp_pgdatabase__ (fcinfo=0x7ffd7b009c10) at cdbpgdatabase.c:74
      6  0x000000000076dbd6 in ExecMakeTableFunctionResult (funcexpr=0x3230fc0, econtext=0x3230988, argContext=0x3232b68, expectedDesc=0x3231df8, randomAccess=0 '\000',
      	    operatorMemKB=32768) at execQual.c:2275
      ......
      18 0x00000000009afeb2 in exec_simple_query (query_string=0x316ce20 "select * from gp_pgdatabase") at postgres.c:1778
      Reviewed-by: Hubert Zhang <hzhang@pivotal.io>
      
      Cherry-picked from 06d7fe0a
      8b6e19ab
    • Doc updates to cover packaging changes/additions for versions 7/6 (#10364) · 4631aedb
      Committed by David Yozie
      * Doc updates to cover packaging changes/additions for versions 7/6
      
      * Removing nav link to non-default steps
      
      * Add missing chown examples; add note about symbolic link update in upgrade procedure
      
      * Typo fix
      4631aedb
    • Docs - update PXF version info · ab59c8dd
      Committed by David Yozie
      ab59c8dd
  4. 29 Jun 2020 (4 commits)
    • Skip external tables during analyze · a990ebde
      Committed by Ashwin Agrawal
      Analyze errored out with "ERROR: unsupported table type" if an
      external table is part of a partition hierarchy. That also blocks
      database-wide analyze, since such partitions can't be analyzed. This
      commit adds a check to skip external tables during
      acquire_inherited_sample_rows().
      
      Fixes github issue #9506.
      Co-authored-by: Alexandra Wang <lewang@pivotal.io>
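      The skip can be sketched as follows (a toy model; the real acquire_inherited_sample_rows() works on heap tuples, not Python lists):

```python
def acquire_inherited_sample_rows(children, rows_by_table):
    """Gather sample rows across a partition hierarchy, skipping external
    tables instead of raising 'unsupported table type'."""
    sample = []
    for name, kind in children:
        if kind == "external":
            continue                      # previously: ERROR
        sample.extend(rows_by_table[name])
    return sample

children = [("p_heap", "heap"), ("p_ao", "ao"), ("p_ext", "external")]
rows_by_table = {"p_heap": [1, 2], "p_ao": [3], "p_ext": []}
sample = acquire_inherited_sample_rows(children, rows_by_table)
```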
      a990ebde
    • Fix tupdesc dangling pointer segfault in HashAgg (#10384) · ef010af2
      Committed by (Jerome) Junfeng Yang
      This problem manifests itself with a HashAgg on top of a
      DynamicIndexScan node and can cause a segmentation fault.
      
      1. A HashAgg node initializes a tuple descriptor for its hash
      slot using a reference from input tuples (coming from
      DynamicIndexScan through a Sequence node).
      2. At the end of every partition index scan in DynamicIndexScan
      we unlink and free unused memory chunks and reset the partition's
      memory context. This destroys every object in the context,
      including the partition index tuple descriptor used by the
      HashAgg node.
      As a result, we get a dangling pointer in HashAgg on switching to
      a new index partition during DynamicIndexScan, which can cause a
      segfault.
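      The dangling-reference pattern and the defensive fix can be modeled like this (a Python analogy for memory-context reset; the actual fix is in C):

```python
import copy

class MemoryContext:
    """Toy stand-in for a per-partition memory context."""
    def __init__(self):
        self.objects = []
    def palloc(self, obj):
        self.objects.append(obj)
        return obj
    def reset(self):
        for obj in self.objects:
            obj.clear()          # wipe in place, like freeing the chunks
        self.objects.clear()

partition_ctx = MemoryContext()
tupdesc = partition_ctx.palloc({"cols": ["a", "b"]})

dangling = tupdesc                  # buggy: bare reference into the context
owned = copy.deepcopy(tupdesc)      # fixed: HashAgg keeps its own copy

partition_ctx.reset()               # end of one partition's index scan
```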
      
      This is a backport of commit 41ce55bf.
      Co-authored-by: Denis Smirnov <sd@arenadata.io>
      ef010af2
    • gpexpand: cleanup new segments in parallel · 328f215f
      Committed by Ning Yu
      When cleaning up the master-only files on the new segments we used to
      do the job one by one; with tens or hundreds of segments this can be
      very slow.
      
      Now we cleanup in parallel.
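      The serial-to-parallel change can be sketched with a thread pool (illustrative; gpexpand's actual worker mechanism may differ):

```python
from concurrent.futures import ThreadPoolExecutor

cleaned = []

def cleanup_segment(seg):
    # stand-in for removing the master-only files on one new segment
    cleaned.append(seg)
    return seg

segments = [f"seg{i}" for i in range(8)]

# Old behavior: for seg in segments: cleanup_segment(seg)   (one by one)
# New behavior: clean all new segments concurrently.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(cleanup_segment, segments))
```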
      
      (cherry picked from commit 857763ae)
      Reviewed-by: Zhenghua Lyu <zlv@pivotal.io>
      328f215f
    • For Python testing artifacts, introduce combination of Concourse cache and pip --cache-dir. · 8ee3e05f
      Committed by Ed Espino
      For the Python testing artifacts used by the CLI tools, utilize the
      Concourse cached directories feature to create and use a pip cache dir
      shared between task runs.
      
      Be aware, the cache is scoped to the worker the task is run on. We do
      not get a cache hit when subsequent builds run on different workers.
      
      * The environment variable PIP_CACHE_DIR is used to store the cache
      directory.
      
      * Add "--retries 10" to Behave test dependency pip install commands.
      8ee3e05f
  5. 26 Jun 2020 (7 commits)
  6. 25 Jun 2020 (9 commits)
    • Recompile plperl to set the right RUNPATH · 67877c4e
      Committed by Shaoqi Bai
      Currently, GPDB5 is built with --enable-rpath (the default configure
      option). plperl's Makefile specifies an absolute path to the
      location of "$(perl -MConfig -e 'print $Config{archlibexp}')/CORE"
      (e.g., /usr/lib64/perl5/CORE on RHEL7). This directory is not on the
      default search path of the runtime linker. Without the proper
      RUNPATH entry, libperl.so cannot be found when postgres tries to
      load the plperl extension.
      
      Without the correct RUNPATH set for plperl.so, you will see an error
      like the following:
      ERROR:  could not load library
      "/usr/local/greenplum-db-devel/lib/postgresql/plperl.so": libperl.so:
      cannot open shared object file: No such file or directory
      Authored-by: Shaoqi Bai <bshaoqi@vmware.com>
      67877c4e
    • Conditionally set $PYTHONHOME only when vendoring python · 111b9db6
      Committed by Bradford D. Boyle
      Setting `$PYTHONHOME` for releases where python is not vendored
      causes the system python to fail with an error message about not
      being able to find the site module.
      
      This commit updates the logic in generate-greenplum_path.sh to check
      whether we are vendoring python and, if so, to include setting
      `$PYTHONHOME` in greenplum_path.sh. When vendoring python,
      `$PYTHONHOME` is set at build time to point at our vendored python.
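      The conditional emission can be sketched as follows (a simplified Python model of generate-greenplum_path.sh's logic, not its actual code):

```python
def greenplum_path_lines(vendor_python, gphome="/usr/local/greenplum-db-devel"):
    """Emit the PYTHONHOME stanza only when python is vendored, so a
    non-vendored build leaves the system python alone."""
    lines = [f'GPHOME="{gphome}"']
    if vendor_python:
        lines.append(f'PYTHONHOME="{gphome}/ext/python"')
    return "\n".join(lines)

with_vendored = greenplum_path_lines(True)
without_vendored = greenplum_path_lines(False)
```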
      
      [#173046174]
      Authored-by: Bradford D. Boyle <bradfordb@vmware.com>
      111b9db6
    • Remove AIX logic from generate-greenplum-path.sh · 800d1f07
      Committed by Bradford D. Boyle
      [#173046174]
      Authored-by: Bradford D. Boyle <bradfordb@vmware.com>
      800d1f07
    • Recompile plpython subdir to set the right RUNPATH · d7a8a2a2
      Committed by Shaoqi Bai
      Authored-by: Shaoqi Bai <sbai@pivotal.io>
      d7a8a2a2
    • Add curly braces for GPHOME var · 2875e8c4
      Committed by Tingfang Bao
      Authored-by: Tingfang Bao <baotingfang@gmail.com>
      2875e8c4
    • Using $ORIGIN as RUNPATH for runtime link · c6bd54a4
      Committed by Bradford D. Boyle
      When upgrading from GPDB5 to GPDB6, gpupgrade needs to be able to
      call binaries from both major versions. Relying on LD_LIBRARY_PATH
      is not an option because it can cause binaries to load libraries
      from the wrong version. Instead, we need the libraries to have
      RPATH/RUNPATH set correctly. Since the built binaries may be
      relocated, we need to use a relative path.
      
      This commit disables the rpath configure option (which would result
      in an absolute path) and uses LDFLAGS to set `$ORIGIN`.
      
      For most ELF files a RUNPATH of `$ORIGIN/../lib` is correct. For the
      pygresql python module and the quicklz_compressor extension, the
      RUNPATH needs to be adjusted accordingly. The LDFLAGS for those
      artifacts can be modified via the environment variables
      PYGRESQL_LDFLAGS and QUICKLZ_LDFLAGS.
      
      We always use `--enable-new-dtags` to set RUNPATH. On CentOS 6, with new dtags,
      both DT_RPATH and DT_RUNPATH are set and DT_RPATH will be ignored.
      
      [#171588878]
      Co-authored-by: Bradford D. Boyle <bboyle@pivotal.io>
      Co-authored-by: Xin Zhang <xzhang@pivotal.io>
      c6bd54a4
    • Update generate-greenplum-path.sh for upgrade package · d02cb625
      Committed by Tingfang Bao
      Following the [Greenplum Server RPM Packaging Specification][0], we
      need to update the greenplum_path.sh file and ensure the environment
      variables are set correctly.
      
      There are a few basic requirements for the Greenplum Path Layer:
      
      * greenplum-path.sh shall be installed to `${installation
        prefix}/greenplum-db-[package-version]/greenplum_path.sh`
      * ${GPHOME} is set by given parameter, by default it should point to
        `%{installation prefix}/greenplum-db-devel`
      * `${LD_LIBRARY_PATH}` shall be safely set to avoid a trailing colon
        (which will cause the linker to search the current directory when
        resolving shared objects)
      * `${PYTHONHOME}` shall be set to `${GPHOME}/ext/python`
      * `${PYTHONPATH}` shall be set to `${GPHOME}/lib/python`
      * `${PATH}` shall be set to `${GPHOME}/bin:${PYTHONHOME}/bin:${PATH}`
      * If the file `${GPHOME}/etc/openssl.cnf` exists then `${OPENSSL_CONF}`
        shall be set to `${GPHOME}/etc/openssl.cnf`
      * The greenplum_path.sh file shall pass [ShellCheck][1]
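      The trailing-colon requirement in particular can be sketched as (an illustrative helper, not the script's actual code):

```python
def build_ld_library_path(gphome, existing=""):
    # A trailing or leading colon would make the runtime linker search
    # the current directory, so join only the non-empty components.
    parts = [p for p in (gphome + "/lib", existing) if p]
    return ":".join(parts)

empty_env = build_ld_library_path("/usr/local/gpdb")
with_env = build_ld_library_path("/usr/local/gpdb", "/opt/lib")
```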
      
      [0]: https://github.com/greenplum-db/greenplum-database-release/blob/master/Greenplum-Server-RPM-Packaging-Specification.md#detailed-package-behavior
      [1]: https://github.com/koalaman/shellcheck
      
      Co-authored-by: Tingfang Bao <bbao@pivotal.io>
      Co-authored-by: Xin Zhang <zhxin@vmware.com>
      Co-authored-by: Ning Wu <ningw@vmware.com>
      Co-authored-by: Shaoqi Bai <bshaoqi@vmware.com>
      d02cb625
    • Improve handling of target lists of window queries · d3fcb525
      Committed by Hans Zeller
      Fixing two bugs related to handling queries with window functions,
      and refactoring the related code.
      
      ORCA can't handle expressions on window functions, such as
      rank() over() - 1, in a target list. To avoid these, we split Query
      blocks that contain them into two. The new lower Query computes the
      window functions; the new upper Query computes the expressions.
      We use three mutators and walkers to help with this process:
      
      * Increase the varlevelsup of outer references in the new lower
        Query, since we have now inserted a new scope above it.
      * Split expressions on window functions into the window functions
        (for the lower scope) and expressions with a Var substituted for
        the WindowFunc (for the upper scope). Also adjust the varattno
        for Vars that now appear in the upper scope.
      * Increase the ctelevelsup for any RangeTblEntrys in the lower
        scope.
      
      The bugs we saw were related to these mutators. The second one
      didn't recurse correctly into the required types of subqueries, and
      the third one didn't always increment the query level correctly.
      The refactor hopefully simplifies this code somewhat. For details,
      see the individual commit messages.
      
      Note: In addition to cherry-picking from the master branch, this
      also removes the temporary check that triggered a fallback to the
      planner when we saw window queries with outer refs in them. See
      #10265.
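      The Query-splitting idea can be sketched with toy expression trees (illustrative only; ORCA's translator works on real Query and WindowFunc nodes):

```python
# Split a target expression such as rank() OVER () - 1 into a lower-query
# target (the WindowFunc itself) and an upper-query expression in which
# the WindowFunc is replaced by a Var pointing into the lower targetlist.
def split_window_expr(expr, lower_targets):
    kind = expr[0]
    if kind == "WindowFunc":
        lower_targets.append(expr)
        return ("Var", len(lower_targets))      # varattno into lower query
    if kind == "Const":
        return expr
    if kind == "Op":
        return ("Op", expr[1]) + tuple(
            split_window_expr(arg, lower_targets) for arg in expr[2:])
    raise NotImplementedError(kind)

lower = []
upper = split_window_expr(("Op", "-", ("WindowFunc", "rank"), ("Const", 1)), lower)
```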
      
      * Add test cases
      * Refactor: Renaming misc variables and methods
      * Refactor RunIncrLevelsUpMutator
      
      Made multiple changes to how we use the mutator:
      
      1. Start the call with a method from gpdbwrappers.h, for two reasons:
         a) execute the needed wrapping code for GPDB calls
         b) avoid calling the walker function on the top node, since we don't
            want to increment the query level when we call the method on a
            query node
      
      2. Now that we don't have to worry anymore about finding a top-level
         query node, simplify the logic to recurse into subqueries by simply
         doing that when we encounter a Query node further down. Remove the
         code dealing with sublinks, RTEs, CTEs.
      
      3. From inside the walker functions, call GPDB methods without going
         through the wrapping layer again.
      
      4. Let the mutator code make a copy of the target entry instead of
         creating one before calling the mutator.
      
      * Refactor RunWindowProjListMutator, fix bug
      
      Same as the previous commit, this time RunWindowProjListMutator gets
      refactored. This change should also fix one of the bugs we have
      seen: this mutator did not recurse into derived tables that were
      inside scalar subqueries in the select list.
      
      * Refactor RunFixCTELevelsUpMutator, fix bug
      
      Converted this mutator into a walker, since only walkers visit RTEs, which
      makes things a lot easier.
      
      Fixed a bug where we incremented the CTE levels for scalar subqueries
      that went into the upper-level query.
      
      Otherwise, same types of changes as in previous two commits.
      
      * Refactor and reorder code
      
      Slightly modified the flow in methods CQueryMutators::ConvertToDerivedTable
      and CQueryMutators::NormalizeWindowProjList
      
      * Remove obsolete methods
      * Update expected files
      
      See  https://github.com/greenplum-db/gpdb/pull/10309
      
      (cherry picked from commit 33c4582e)
      d3fcb525
  7. 24 Jun 2020 (6 commits)
  8. 23 Jun 2020 (4 commits)
    • Fix flaky appendonly test. (#10349) · 6a730079
      Committed by (Jerome) Junfeng Yang
      This fixes the following test diff:
      ```
      ---
      /tmp/build/e18b2f02/gpdb_src/src/test/regress/expected/appendonly.out
      2020-06-16 08:30:46.484398384 +0000
      +++ /tmp/build/e18b2f02/gpdb_src/src/test/regress/results/appendonly.out
      2020-06-16 08:30:46.556404454 +0000
      @@ -709,8 +709,8 @@
         SELECT oid FROM pg_class WHERE relname='tenk_ao2'));
             case    | objmod | last_sequence | gp_segment_id
              -----------+--------+---------------+---------------
            + NormalXid |      0 | 1-2900        |             1
              NormalXid |      0 | >= 3300       |             0
            - NormalXid |      0 | >= 3300       |             1
              NormalXid |      0 | >= 3300       |             2
              NormalXid |      1 | zero          |             0
              NormalXid |      1 | zero          |             1
      ```
      
      The flakiness comes from the fact that with ORCA, a `CREATE TABLE`
      statement without `DISTRIBUTED BY` treats the table as randomly
      distributed, while the planner treats it as distributed by the
      table's first column.
      
      ORCA:
      ```
      CREATE TABLE tenk_ao2 with(appendonly=true, compresslevel=0,
      blocksize=262144) AS SELECT * FROM tenk_heap;
      NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause. Creating a NULL
      policy entry.
      ```
      
      Planner:
      ```
      CREATE TABLE tenk_ao2 with(appendonly=true, compresslevel=0,
      blocksize=262144) AS SELECT * FROM tenk_heap;
      NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column(s)
      named 'unique1' as the Greenplum Database data distribution key for this
      table.
      ```
      
      So the data distribution of table tenk_ao2 is not as expected.
      Reviewed-by: Zhenghua Lyu <zlv@pivotal.io>
      6a730079
    • Make cdbpullup_missingVarWalker also consider PlaceHolderVar. · 5750a0f2
      Committed by Zhenghua Lyu
      When the planner adds a redistribute motion above a subplan, it
      invokes `cdbpullup_findEclassInTargetList` to make sure the distkey
      can be computed from the subplan's targetlist. When the distkey is
      an expression based on PlaceHolderVar elements in the targetlist,
      the function `cdbpullup_missingVarWalker` does not handle it
      correctly.
      
      For example, when distkey is:
      
      ```sql
      CoalesceExpr [coalescetype=23 coalescecollid=0 location=586]
              [args]
                      PlaceHolderVar [phrels=0x00000040 phid=1 phlevelsup=0]
                              [phexpr]
                                      CoalesceExpr [coalescetype=23 coalescecollid=0 location=49]
                                              [args] Var [varno=6 varattno=1 vartype=23 varnoold=6 varoattno=1]
      ```
      
      and targetlist is:
      
      ```
      TargetEntry [resno=1]
              Var [varno=2 varattno=1 vartype=23 varnoold=2 varoattno=1]
      TargetEntry [resno=2]
              Var [varno=2 varattno=2 vartype=23 varnoold=2 varoattno=2]
      TargetEntry [resno=3]
              PlaceHolderVar [phrels=0x00000040 phid=1 phlevelsup=0]
                      [phexpr]
                              CoalesceExpr [coalescetype=23 coalescecollid=0 location=49]
                                      [args] Var [varno=6 varattno=1 vartype=23 varnoold=6 varoattno=1]
      TargetEntry [resno=4]
              PlaceHolderVar [phrels=0x00000040 phid=2 phlevelsup=0]
                      [phexpr]
                              CoalesceExpr [coalescetype=23 coalescecollid=0 location=78]
                                      [args] Var [varno=6 varattno=2 vartype=23 varnoold=6 varoattno=2]
      ```
      
      Previously, `cdbpullup_missingVarWalker` only considered Var nodes,
      which made it fail in this case.
      
      See Github issue: https://github.com/greenplum-db/gpdb/issues/10315 for
      details.
      
      This commit fixes the issue by considering PlaceHolderVar in function
      `cdbpullup_missingVarWalker`.
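      The fix can be sketched with toy node tuples (illustrative; the real walker operates on PostgreSQL expression nodes):

```python
# New behavior: a distkey expression is computable from a targetlist if
# every Var *and* every PlaceHolderVar it mentions appears there.
def missing_var_walker(node, targetlist):
    if not isinstance(node, tuple):
        return False                       # leaf scalar: nothing missing
    if node[0] in ("Var", "PlaceHolderVar"):
        return node not in targetlist
    return any(missing_var_walker(c, targetlist) for c in node[1:])

# Old behavior: only Var nodes were recognized, so a PlaceHolderVar was
# blindly recursed into and its inner Var (absent from the targetlist
# itself) made the check fail.
def missing_var_walker_old(node, targetlist):
    if not isinstance(node, tuple):
        return False
    if node[0] == "Var":
        return node not in targetlist
    return any(missing_var_walker_old(c, targetlist) for c in node[1:])

phv = ("PlaceHolderVar", 1, ("Coalesce", ("Var", 6, 1)))
tlist = [("Var", 2, 1), ("Var", 2, 2), phv]
distkey = ("Coalesce", phv)

computable = not missing_var_walker(distkey, tlist)     # fixed: True
old_fails = missing_var_walker_old(distkey, tlist)      # buggy: missing
```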
      5750a0f2
    • Fix pg_upgrade unit tests · a4fffa3d
      Committed by Bhuvnesh Chaudhary
      - Add the --disable-gpcloud configure flag, as it's not required for
        testing pg_upgrade
      - Remove -fsanitize=address
      a4fffa3d
    • Docs - updating book build dependencies · b6d4e444
      Committed by David Yozie
      b6d4e444