1. 09 Nov, 2020 1 commit
    • X
      compatible gpload (#11103) · f7174966
      xiaoxiao committed
      * refactor gpload test file TEST.py
      
      1. migrate gpload test to pytest
      2. new function to form config file through yaml package and make it more reasonable
      3. add a case to cover gpload update_condition argument
      
      * migrate gpload and TEST.py to python3.6
      new test case 43 to test gpload behavior when column name has capital letters and without data type
      change some ans files since psql reacts differently
      
      * change sql to find reusable external table to make gpload compatible in gp7 and gp6
      better TEST.py to write config file with ruamel.yaml module
      Co-authored-by: XiaoxiaoHe <hxiaoxiao@vmware.com>
      f7174966
  2. 25 Sep, 2020 5 commits
    • J
      3c124b2f
    • J
      Update utilities code to work with Python 3 · 78f5cf43
      Jamie McAtamney committed
      This commit makes several broad changes to address conversion issues common to
      multiple utilities:
      
      - The input and output of subprocess in Python 3 are now bytestrings instead
        of strings. Thus, some sanitizing of inputs and outputs is necessary
      
      - Many built-in functions like raw_input and __cmp__ are deprecated in Python 3,
        and as a side effect list sorting and hashing work differently, requiring a
        different set of helper functions
      
      - Implicit relative imports no longer work, so dbconn (in utilities code) and
        mgmt_utils (in test code) must be added to the search path and imported using
        a full path instead
      
      - File objects require flush methods in python3, and popen2 has been deprecated
      Co-authored-by: Jamie McAtamney <jmcatamney@vmware.com>
      Co-authored-by: Tyler Ramer <tramer@vmware.com>
      78f5cf43
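The first two points of the commit above can be illustrated with a minimal sketch. It is not taken from the gpMgmt code: `run_text` and `compare_versions` are hypothetical helper names, and UTF-8 output is an assumption.

```python
import functools
import subprocess
import sys


def run_text(cmd):
    """Run cmd and return its stdout as str (Python 3 pipes yield bytes)."""
    raw = subprocess.check_output(cmd)
    # Assumption: the child writes UTF-8; decode before any string handling.
    return raw.decode("utf-8").strip()


def compare_versions(a, b):
    """Old-style cmp function: negative, zero, or positive."""
    return (a > b) - (a < b)


out = run_text([sys.executable, "-c", "print('hello')"])

# sorted() no longer accepts cmp=; wrap the old comparison with cmp_to_key.
ordered = sorted([(6, 1), (5, 28), (7, 0)],
                 key=functools.cmp_to_key(compare_versions))
```

Wrapping an existing comparison function with `functools.cmp_to_key` is one way to keep Python 2 ordering logic without rewriting it as a key function.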
    • T
      Remove subprocess32 · 6071056d
      Tyler Ramer committed
      The subprocess32 package is a backport of Python 3 subprocess functionality to
      Python 2, so with the upgrade to Python 3 it is no longer necessary.
      
      This commit deletes the package from pythonSrc and changes import statements to
      import subprocess directly, instead of falling back to it only if subprocess32
      is not importable.
      Co-authored-by: Jamie McAtamney <jmcatamney@vmware.com>
      Co-authored-by: Tyler Ramer <tramer@vmware.com>
      6071056d
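The import change described above might look roughly like the sketch below. The exact form of the old fallback in the utilities is an assumption; only the direction of the change comes from the commit message.

```python
import sys

# Before (assumed shape of the Python 2 era import):
# try:
#     import subprocess32 as subprocess
# except ImportError:
#     import subprocess

# After: the standard library module already has the Python 3 behaviour.
import subprocess

result = subprocess.run([sys.executable, "-c", "print('ok')"],
                        stdout=subprocess.PIPE, check=True)
text = result.stdout.decode().strip()
```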
    • T
      Allow GPDB to build and test with Python 3 · 7306abea
      Tyler Ramer committed
      - Update Python file shebangs to use python3 and update gp_replicate_check and
        gpversion.py to allow running under Python 3
      
      - Use Centos 7 dev containers with Python 3 and pip3 installed for testing, as
        prod containers do not yet work with Python 3, and update Travis with Python 3
      
      - Install dependencies with pip3 to get Python 3-compatible versions
      
      - Copy the Python 3 version of .so files, don't unset PYTHONHOME and PYTHONPATH,
        and don't remove built files from install locations, so that the Python 2 and
        Python 3 versions of various files can coexist
      Co-authored-by: Jamie McAtamney <jmcatamney@vmware.com>
      Co-authored-by: Kris Macoskey <kmacoskey@vmware.com>
      Co-authored-by: Tyler Ramer <tramer@vmware.com>
      7306abea
    • J
      Run 2to3 against Python code · 54a65573
      Jamie McAtamney committed
      The 2to3 utility is an officially-supported script to automatically convert
      Python 2 code to Python 3.  It's not a complete fix by any means, but it
      handles most basic syntax transformations and similar.
      
      This commit is the result of running 2to3 against every Python file in the
      gpMgmt directory, so it's quite large and fairly scattershot.  Manual updates
      to any code that 2to3 can't handle will come in later commits.
      54a65573
  3. 16 Sep, 2020 1 commit
  4. 07 Sep, 2020 2 commits
  5. 31 Aug, 2020 1 commit
  6. 08 Jul, 2020 1 commit
    • T
      Update PyYAML to 5.3.1 · 8d6c3059
      Tyler Ramer committed
      The version of PyYAML vendored in gpMgmt/bin/ext is old, unmaintained,
      and does not support python3. Actually, it does not even contain a
      `__version__` attribute, so it is not possible to know the version.
      
      We need to unvendor YAML and get to a library version that supports
      python3 - for this reason, we are updating to the latest PyYAML
      available.
      
      Also update yaml.load to use yaml.safe_load instead.
      Co-authored-by: Tyler Ramer <tramer@vmware.com>
      Co-authored-by: Jamie McAtamney <jmcatamney@vmware.com>
      8d6c3059
  7. 17 Jun, 2020 1 commit
    • T
      Update PyGreSQL from 4.0.0 to 5.1.2 · f5758021
      Tyler Ramer committed
      This commit updates pygresql from 4.0.0 to 5.1.2, which requires
      numerous changes to take advantage of the major result syntax change
      that pygresql5 implemented. Of note, cursors or query objects
      automatically cast returned values as appropriate python types - list of
      ints, for example, instead of a string like "{1,2}". This is the bulk of
      the changes.
      
      Updating to pygresql 5.1.2 provides numerous benefits, including the
      following:
      
      - CVE-2018-1058 was addressed in pygresql 5.1.1
      
      - We can save notices in the pgdb module, rather than relying on importing
        the pg module, thanks to the new "set_notices()"
      
      - pygresql 5 supports python3
      
      - Thanks to a change in the cursor, using a "with" syntax guarantees a
        "commit" on the close of the with block.
      
      This commit is a starting point for additional changes, including
      refactoring the dbconn module.
      
      Additionally, since isolation2 uses pygresql, some pl/python scripts
      were updated, and isolation2 SQL output is further decoupled from
      pygresql. The output of a psql command should be similar enough to
      isolation2's pg output that minimal or no modification is needed to
      ensure gpdiff can recognize the output.
      Co-authored-by: Tyler Ramer <tramer@pivotal.io>
      Co-authored-by: Jamie McAtamney <jmcatamney@pivotal.io>
      f5758021
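To make the casting change concrete, the sketch below shows the kind of manual parsing a caller might have needed when an array column came back as the string "{1,2}". `parse_int_array` is a hypothetical helper written for this illustration, not part of PyGreSQL or gpMgmt, and it only handles flat integer arrays.

```python
def parse_int_array(literal):
    """Convert a flat PostgreSQL int array literal such as "{1,2}" to [1, 2]."""
    text = literal.strip()
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError("not an array literal: %r" % literal)
    inner = text[1:-1]
    # An empty literal "{}" maps to an empty list.
    return [int(item) for item in inner.split(",")] if inner else []


old_style_value = "{1,2}"                    # what the commit says 4.x-era code saw
new_style_value = parse_int_array(old_style_value)  # the list form 5.x returns directly
```

According to the commit message, pygresql 5 performs this kind of conversion itself, so helpers like the one above become unnecessary.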
  8. 03 Jun, 2020 1 commit
  9. 15 May, 2020 1 commit
  10. 12 May, 2020 1 commit
  11. 03 Apr, 2020 1 commit
  12. 20 Mar, 2020 1 commit
    • (
      Enable external table's error log to be persistent for ETL. (#9757) · 04fdd0a6
      (Jerome)Junfeng Yang committed
      In ETL scenarios, the same external table may be created and dropped
      frequently. Once the external table is dropped, all errors stored in
      its error log are lost.
      
      To make the error log persistent for external tables with the same
      "dbname"."namespace"."table", introduce the "error_log_persistent"
      external table option. If the external table is created with
      `OPTIONS (error_log_persistent 'true')` and `LOG ERROR`,
      the external table's error log is named "dbid_namespaceid_tablename"
      under the "errlogpersistent" directory,
      and dropping the external table does not delete the error log.
      
      Since GPDB 5 and 6 still use pg_exttable's options to mark
      LOG ERRORS PERSISTENTLY, keep the ability to load from
      OPTIONS(error_log_persistent 'true').
      
      Create a separate `gp_read_persistent_error_log` function to read the
      persistent error log.
      If the external table has been deleted, only the namespace owner
      has permission to delete the error log.
      
      Create a separate `gp_truncate_persistent_error_log` function to delete
      the persistent error log.
      If the external table has been deleted, only the namespace owner has
      permission to delete the error log.
      It also supports wildcard input to delete the error logs
      belonging to a database or the whole cluster.
      
      If an external table created with `error_log_persistent` is dropped and
      the same "dbname"."namespace"."table" external table is then created
      without a persistent error log, errors are written to the normal error log.
      The persistent error log still exists.
      Reviewed-by: HaozhouWang <hawang@pivotal.io>
      Reviewed-by: Adam Lee <ali@pivotal.io>
      04fdd0a6
  13. 28 Feb, 2020 1 commit
    • H
      Add max_retries flag for gpload (#9606) · b891b85b
      Huiliang.liu committed
      Add a max_retries flag for gpload. It sets the maximum number of retries when connecting to GPDB times out.
      The default value of max_retries is 0, which means no retry.
      If max_retries is -1 or any other negative value, gpload retries forever.
      
      Tested manually.
      b891b85b
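The max_retries semantics described above can be sketched as a small retry loop. This is an illustration only: `connect_with_retries` and the use of `TimeoutError` as the failure signal are assumptions, not the actual gpload implementation.

```python
import itertools


def connect_with_retries(connect, max_retries=0):
    """Call connect() until it succeeds or the retry budget is used up.

    max_retries == 0 : one attempt, no retry
    max_retries  > 0 : up to max_retries additional attempts
    max_retries  < 0 : retry forever
    """
    for attempt in itertools.count():
        try:
            return connect()
        except TimeoutError:
            # attempt counts the retries already used; a negative budget
            # never satisfies this condition, so the loop keeps going.
            if 0 <= max_retries <= attempt:
                raise
```

With `max_retries=2`, for example, a connection that always times out is attempted three times in total before the error is re-raised.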
  14. 31 Jan, 2020 2 commits
    • H
      Change the catalog representation for external tables. · b62e0601
      Heikki Linnakangas committed
      External tables now use relkind='f', like all foreign tables. They have
      an entry in pg_foreign_table, as if they belonged to a special foreign
      server called "exttable_server". That foreign server gets special
      treatment in the planner and executor, so that we still plan and execute
      it the same as before.
      
      * ALTER / DROP EXTERNAL TABLE is now mapped to ALTER / DROP FOREIGN TABLE.
        There is no "OCLASS_EXTTABLE" anymore. This leaks through to the user
        in error messages, e.g:
      
          postgres=# drop external table boo;
          ERROR:  foreign table "boo" does not exist
      
        and to the command tag on success:
      
          postgres=# drop external table boo;
          DROP FOREIGN TABLE
      
      * psql \d now prints external tables as Foreign Tables.
      
      Next steps:
      * Use the foreign table API routines instead of special casing
        "exttable_server" everywhere.
      
      * Get rid of the pg_exttable table, and store all the options in
        pg_foreign_table.ftoptions instead.
      
      * Get rid of the extra fields in pg_authid to store permissions to
        create different kinds of external tables. Store them as ACLs in
        pg_foreign_server.
      b62e0601
    • H
      Remove unnecessary condition on relstorage from gpload. · 4f10a873
      Heikki Linnakangas committed
      The condition listed all possible values of relstorage, except for
      'f' for RELSTORAGE_FOREIGN. The condition on relkind filters out foreign
      tables as well, so the condition on relstorage is redundant. (Although
      I don't think filtering out foreign tables was even the intention here.)
      4f10a873
  15. 01 Nov, 2019 1 commit
  16. 27 Sep, 2019 1 commit
  17. 20 Sep, 2019 1 commit
    • P
      Ship subprocess32 and replace subprocess with it in python code (#8658) · 9c4a885b
      Paul Guo committed
      * Ship modified python module subprocess32 again
      
      subprocess32 is preferred over subprocess according to the Python documentation.
      In addition, we long ago modified the code to use vfork() instead of fork() to
      avoid "Cannot allocate memory" errors (a false alarm, since memory is actually
      sufficient) in GPDB production environments, which usually have memory
      overcommit disabled. We used to compile and ship it, but later it was
      compiled and not shipped, apparently due to a makefile change (maybe a
      regression). Let's ship it again.
      
      * Replace subprocess with our own subprocess32 in python code.
      9c4a885b
  18. 26 Aug, 2019 2 commits
  19. 09 Jul, 2019 1 commit
    • D
      Remove variable self-assignment in gpload · d6239081
      Daniel Gustafsson committed
      Setting a variable to itself is a no-op which can be removed. This
      may have been introduced in error and may be masking a real bug,
      but if so, we have lived with it for two years, so I'm opting
      for removing it.
      
      Reviewed-by: Asim R P and Bhuvnesh Chaudhary
      d6239081
  20. 10 Apr, 2019 1 commit
  21. 01 Feb, 2019 1 commit
    • H
      Rename gp_distribution_policy.attrnums to distkey, and make it int2vector. · 69ec6926
      Heikki Linnakangas committed
      This is in preparation for adding operator classes as a new column
      (distclass) to gp_distribution_policy. This naming is consistent with
      pg_index.indkey/indclass. Change the datatype to int2vector, also for
      consistency with pg_index, and some other catalogs that store attribute
      numbers, and because int2vector is slightly more convenient to work with
      in the backend. Move the column to the end of the table, so that all the
      variable-length and nullable columns are at the end, which makes it
      possible to reference the other columns directly in Form_gp_policy.
      
      Add a backend function, pg_get_table_distributedby(), to deparse the
      DISTRIBUTED BY definition of a table into a string. This is similar to
      pg_get_indexdef_columns(), pg_get_functiondef() etc. functions that we
      have. Use the new function in psql and pg_dump, when connected to a GPDB6
      server.
      Co-authored-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      Co-authored-by: Peifeng Qiu <pqiu@pivotal.io>
      Co-authored-by: Adam Lee <ali@pivotal.io>
      69ec6926
  22. 17 Jan, 2019 1 commit
  23. 13 Dec, 2018 1 commit
    • D
      Reporting cleanup for GPDB specific errors/messages · 56540f11
      Daniel Gustafsson committed
      The Greenplum specific error handling via ereport()/elog() calls was
      in need of a unification effort as some parts of the code were using a
      different messaging style to others (and to upstream). This aims at
      bringing many of the GPDB error calls in line with the upstream error
      message writing guidelines and thus make the user experience of
      Greenplum more consistent.
      
      The main contributions of this patch are:
      
      * errmsg() messages shall start with a lowercase letter, and not end
        with a period. errhint() and errdetail() shall be complete sentences
        starting with capital letter and ending with a period. This attempts
        to fix this on as many ereport() calls as possible, with too detailed
        errmsg() content broken up into details and hints where possible.
      
      * Reindent ereport() calls to be more consistent with the common style
        used in upstream and most parts of Greenplum:
      
      	ereport(ERROR,
      			(errcode(<CODE>),
      			 errmsg("short message describing error"),
      			 errhint("Longer message as a complete sentence.")));
      
      * Avoid breaking messages due to long lines since it makes grepping
        for error messages harder when debugging. This is also the de facto
        standard in upstream code.
      
      * Convert a few internal error ereport() calls to elog(). There are
        no doubt more that can be converted, but the low hanging fruit has
        been dealt with. Also convert a few elog() calls which are user
        facing to ereport().
      
      * Update the testfiles to match the new messages.
      
      Spelling and wording is mostly left for a follow-up commit, as this was
      getting big enough as it was. The most obvious cases have been handled
      but there is work left to be done here.
      
      Discussion: https://github.com/greenplum-db/gpdb/pull/6378
      Reviewed-by: Ashwin Agrawal <aagrawal@pivotal.io>
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      56540f11
  24. 30 Nov, 2018 1 commit
  25. 29 Nov, 2018 1 commit
    • D
      Compare with None using the is operator · e39047b5
      Daniel Gustafsson committed
      While == None works for comparison, it's a wasteful operation as it
      performs type conversion and expansion. Instead move to using the
      "is" operator which is the documented best practice for Python code.
      
      Reviewed-by: Jacob Champion
      e39047b5
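A small sketch of how the two comparisons can differ in behaviour: `==` dispatches to a user-defined `__eq__`, while `is` checks identity and cannot be overridden. `AlwaysEqual` is a hypothetical class made up for this illustration.

```python
class AlwaysEqual:
    """Hypothetical class whose __eq__ claims equality with everything."""

    def __eq__(self, other):
        return True


obj = AlwaysEqual()

equals_none = (obj == None)  # noqa: E711 -- calls AlwaysEqual.__eq__
is_none = (obj is None)      # identity check against the None singleton
```

Here `equals_none` ends up true even though `obj` is not `None`, which is one reason the `is` form is the safer choice.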
  26. 14 Nov, 2018 1 commit
  27. 30 Jul, 2018 1 commit
    • P
      gpload: exit with os._exit to prevent hang (#5335) · f3e5c093
      Peifeng Qiu committed
      The gpload test case runs gpload with subprocess, reads stdout
      and stderr from it, and waits for it to exit. sys.exit in gpload does
      some cleanup that may cause a deadlock between the test and gpload.
      os._exit exits immediately, but stdout and stderr must be flushed
      before that.
      f3e5c093
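The flush-then-exit pattern described above can be sketched as follows. The child script is a stand-in for gpload, and the exit code 3 is arbitrary; neither is taken from the real code.

```python
import subprocess
import sys

# Stand-in for gpload: write to both streams, flush, then exit immediately.
CHILD = r"""
import os, sys
sys.stdout.write("done\n")
sys.stderr.write("warning\n")
# os._exit skips interpreter cleanup, so buffered output must be flushed first.
sys.stdout.flush()
sys.stderr.flush()
os._exit(3)
"""

# Stand-in for the test harness: capture both streams and wait for exit.
proc = subprocess.Popen([sys.executable, "-c", CHILD],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = proc.communicate()
```

Without the explicit flush calls, text still sitting in the stdout buffer could be lost, because `os._exit` does not run the normal shutdown steps.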
  28. 24 Jul, 2018 1 commit
  29. 23 Jul, 2018 1 commit
    • H
      support fast_match option in gpload config file (#5310) · d240a284
      Huiliang.liu committed
      - Add a fast_match option to the gpload config file. If both reuse_tables
        and fast_match are true, gpload tries to fast match the external
        table (without checking columns). If reuse_tables is false and
        fast_match is true, it prints a warning message.
      d240a284
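The option interplay above can be sketched as a small decision function. `resolve_fast_match`, the warning text, and the assumption that fast_match is simply ignored when reuse_tables is false are all illustrative, not taken from gpload.

```python
def resolve_fast_match(reuse_tables, fast_match, warn=print):
    """Return True when external tables should be matched without checking columns."""
    if fast_match and not reuse_tables:
        # Assumption: the option has no effect in this case, so only warn.
        warn("fast_match is ignored because reuse_tables is false")
        return False
    return bool(reuse_tables and fast_match)
```

For example, `resolve_fast_match(True, True)` selects the fast path, while `resolve_fast_match(False, True)` emits the warning and falls back to the regular behaviour.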
  30. 23 Apr, 2018 1 commit
  31. 03 Apr, 2018 1 commit
    • A
      Get rid of pg_exttable.fmterrtbl · 8f6fe2d6
      Adam Lee committed
      The pg_exttable.fmterrtbl column stored the OID of the error table, but
      without an error table it is just set to the OID of the external table.
      That is not necessary, there are other columns which indicate if error
      logging is enabled. Therefore this column can be removed.
      8f6fe2d6
  32. 27 Mar, 2018 1 commit
    • P
      Use hard kill in gpload to avoid unexpected gpfdist hang (#4765) · 83fb63c0
      Peifeng Qiu committed
      When gpload finishes its query, it sends SIGTERM to gpfdist.
      gpfdist handles SIGTERM with exit(1), which invokes the registered
      apr handlers and cleans up all apr resources, including apr_pool. If
      this happens during the normal destruction of apr_pool in
      do_close, gpfdist will hang.
      
      Call _exit in gpfdist to avoid any cleanup handlers, and let gpload
      send SIGKILL to perform hard kill.
      83fb63c0
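The hard-kill side of this change can be sketched with the standard library. The sleeping child is a stand-in for gpfdist, and how gpload actually delivers the signal is an assumption.

```python
import subprocess
import sys

# Stand-in for gpfdist: a child that would otherwise run for a minute.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

# Popen.kill() sends SIGKILL on POSIX. SIGKILL cannot be caught, so no
# signal handlers or exit handlers run in the child.
child.kill()
exit_code = child.wait()
```

On POSIX systems the resulting `exit_code` is the negated signal number, which lets the caller confirm that the child was killed and did not exit on its own.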
  33. 26 Feb, 2018 1 commit
    • H
      fix gpload bug about handling nullas option (#4583) · 2e330960
      huiliang-liu committed
      - If the data file contains "\N" as the delimiter, gpload does not
        recognize it properly.
      - Root cause: gpload replaces the quote in the nullas option and also
        replaces '\' with '\\'.
      - Solution: add a quote_no_slash function to handle the nullas option.
      2e330960
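The difference between the two quoting behaviours can be sketched as below. Only the name quote_no_slash comes from the commit message; both function bodies are assumptions written to match the described root cause, not copies of the gpload source.

```python
def quote(value):
    """Quote for SQL, doubling single quotes and backslashes."""
    return "'" + value.replace("'", "''").replace("\\", "\\\\") + "'"


def quote_no_slash(value):
    """Quote for SQL, doubling single quotes but leaving backslashes alone."""
    return "'" + value.replace("'", "''") + "'"


null_marker = "\\N"  # two characters: a backslash followed by N

escaped = quote(null_marker)            # backslash doubled, marker changed
preserved = quote_no_slash(null_marker)  # marker passed through unchanged
```

With the backslash doubled, the marker handed to the database no longer matches the "\N" that appears in the data file, which is consistent with the root cause given in the commit message.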