  1. 27 Jun 2018 (1 commit)
  2. 26 Jun 2018 (6 commits)
  3. 25 Jun 2018 (2 commits)
  4. 23 Jun 2018 (4 commits)
    • Remove QP_memory_accounting job from pipeline · 69c3727e
      Committed by Sambitesh Dash
      The QP_memory_accounting tests have been moved to the isolation2 test suite, so we no
      longer need this job in the pipeline.
    • Implement filter pushdown for PXF data sources (#4968) · 6d36a1c0
      Committed by Ivan Leskin
      * Change src/backend/access/external functions to extract and pass query constraints;
      * Add a field with constraints to 'ExtProtocolData';
      * Add 'pxffilters' to gpAux/extensions/pxf and modify the extension to use pushdown.
      
      * Remove duplicate '=' check in PXF
      
      Remove the check for a duplicate '=' in the parameters of an external table. Some databases (MS SQL, for example) may use '=' in a database name or other parameters. Now the PXF extension finds the first '=' in a parameter and treats the whole remaining string as the parameter value.
      
      * Disable pushdown by default
      * Disallow passing constraints of type boolean (the decoding fails on the PXF side);
      
      * Fix implicit AND expressions addition
      
      Fix implicit addition of extra 'BoolExpr' to a list of expression items. Before, there was a check that the expression items list did not contain logical operators (and if it did, no extra implicit AND operators were added). This behaviour is incorrect. Consider the following query:
      
      SELECT * FROM table_ex WHERE bool1=false AND id1=60003;
      
      Such a query is translated into a list of three items: 'BoolExpr', 'Var' and 'OpExpr'.
      Due to the presence of a 'BoolExpr', the extra implicit 'BoolExpr' will not be added, and
      we get the error "stack is not empty ...".
      
      This commit changes the signatures of some internal pxffilters functions to fix this error.
      We now pass the number of required extra 'BoolExpr's to 'add_extra_and_expression_items'.
      
      As 'BoolExpr's of different origin may be present in the list of expression items,
      the mechanism for freeing BoolExpr nodes also changes.
      
      The current mechanism for adding implicit AND expressions is suitable only until
      OR operators are introduced (we would then have to add those expressions to different
      parts of the list, not just to the end, as is done now).
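      A minimal sketch of the idea, with hypothetical names (the real pxffilters code uses its
      own expression-item wrapper and differs in detail): the caller now tells the helper how
      many implicit ANDs to append, instead of the helper inferring it from whether a
      'BoolExpr' is already present in the list.

      /* Hedged sketch only; names and signature are illustrative. */
      static void
      add_extra_and_expression_items(List **items, int extraAndOperators)
      {
          /* append one implicit AND per pair of predicates that is not
           * already joined by an explicit logical operator */
          for (int i = 0; i < extraAndOperators; i++)
          {
              Node *implicitAnd = (Node *) makeBoolExpr(AND_EXPR, NIL, -1);

              *items = lappend(*items, implicitAnd);
          }
      }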
    • Added details of how aosegments tables are named and retrieved (#5178) · 55d8635c
      Committed by Soumyadeep Chakraborty
      * Added details of how aosegments tables are named
      
      1) How aosegments tables are initially named and how they are renamed following a DDL operation.
      2) How to find the current aosegments table for a particular AO table.
      
      * Detail: creation of a new aosegments table post DDL
      
      Incorporated PR feedback to include details about how a new aosegments table is created after a DDL operation that rewrites the table on disk is applied.
    • docs - create ... external ... temp table (#5180) · 738ddc21
      Committed by Lisa Owen
      * docs - create ... external ... temp table
      
      * update CREATE EXTERNAL TABLE sgml docs
  5. 22 Jun 2018 (6 commits)
    • Bump ORCA version to 2.64.0 · b7ce9c43
      Committed by Abhijit Subramanya
    • Revert "ProcDie: Reply only after syncing to mirror for commit-prepared." · 23241a2b
      Committed by Ashwin Agrawal
      This reverts commit a7842ea9. The issue is yet to be fully investigated, but it
      sometimes hits the assertion
      ("!(SyncRepQueueIsOrderedByLSN(mode))", File: "syncrep.c", Line: 214).
    • ProcDie: Reply only after syncing to mirror for commit-prepared. · a7842ea9
      Committed by Ashwin Agrawal
      Upstream, and for the Greenplum master, if procdie is received while waiting for
      replication, only a WARNING is issued and the transaction moves forward without
      waiting for the mirror. But that would cause an inconsistency on a QE if failover
      happens to such a mirror, which is missing the commit-prepared record.
      
      If only the prepare has been performed and the primary is yet to process the
      commit-prepared, the gxact is present in memory. If commit-prepared processing is
      complete on the primary, the gxact has been removed from memory. If the gxact is found,
      we go through the regular commit-prepared flow, emit the xlog record and sync it to the
      mirror. But if the gxact is not found on the primary, we used to blindly return success
      to the QD. Hence, the code is modified to always call `SyncRepWaitForLSN()` before
      replying to the QD in case the gxact is not found on the primary.
      
      It calls `SyncRepWaitForLSN()` with the `flush` lsn value from
      `xlogctl->LogwrtResult`, as there is no way to find out the actual lsn of the
      commit-prepared record on the primary. Usage of that lsn is based on the following
      assumptions:
      	- WAL is always written serially forward
      	- A synchronous mirror that has xlog record xyz must have all xlog records before xyz
      	- Not finding a gxact entry in memory on the primary for a commit-prepared retry
      	  from the QD means it was definitely committed (completed) on the primary
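      A hedged sketch of that fallback path (the surrounding control flow, variable names
      and the exact `SyncRepWaitForLSN()` signature here are assumptions for illustration,
      not the actual diff):

      /* Hedged sketch only: a commit-prepared retry arrives but no gxact is found. */
      if (!gxact_found)
      {
          XLogRecPtr  flushLsn;

          /* The commit-prepared record was already written locally, but its exact
           * LSN is unknown. Waiting for everything flushed so far guarantees the
           * record has reached the synchronous mirror before we ack the QD. */
          SpinLockAcquire(&xlogctl->info_lck);
          flushLsn = xlogctl->LogwrtResult.Flush;
          SpinLockRelease(&xlogctl->info_lck);

          SyncRepWaitForLSN(flushLsn);
      }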
    • Make gprecoverseg use new pg_basebackup flag --force-overwrite · e029720d
      Committed by Jimmy Yih
      This is needed during gprecoverseg full recovery to preserve important files
      such as pg_log files. We pass this flag down the call stack to prevent
      other utilities, such as gpinitstandby or gpaddmirrors, from using the
      new flag. The new flag can be dangerous if not used properly and
      should only be used when data directory file preservation is
      necessary.
    • Add pg_basebackup flag to force overwrite of destination data directory · 4333acd9
      Committed by Jimmy Yih
      Currently, pg_basebackup has a hard restriction that the destination
      data directory must be empty or nonexistent. It is expected that
      anything of interest is moved somewhere else temporarily and then
      copied back in. To reduce this complexity, we introduce a new flag,
      --force-overwrite, which deletes the directories or files that are
      about to be copied from the source data directory before doing the actual
      copy. Combined with the Greenplum-specific exclusion flag (-E), we are
      now able to preserve files of interest.
      
      Our main example is gprecoverseg full recovery and pg_log
      files. There have been times when a mirror failed and a full recovery
      was run, which dropped the entire mirror directory before running
      pg_basebackup, erasing the mirror log files from before the
      crash. This is substantially worse in the gprecoverseg rebalancing
      scenario, where we currently do not have pg_rewind and must run full
      recovery to bring the old primary back up... which erases vast amounts
      of old primary log files. Then, during rebalance, the acting primary,
      which returns to being a mirror, also goes through a full recovery, so
      its logs as a primary are removed as well. The obvious solution would be
      to tar these logs out and untar them back in afterwards, but what if there
      are other files that must be preserved? Creating a copy may be costly
      in environments where disk space is valued highly.
    • Feature/kerberos setup edit (#5159) · b133cfe1
      Committed by Chuck Litzell
      * Edits to apply organizational improvements made in the HAWQ version, using consistent realm and domain names, and testing that procedures work.
      
      * Convert tasks to topics to fix formatting. Clean up pg_ident.conf topic.
      
      * Convert another task to topic
      
      * Remove extraneous tag
      
      * Formatting and minor edits
      
      * - Added $ or # prompts for all code blocks
      - Reworked the section "Mapping Kerberos Principals to Greenplum Database Roles" to describe, generally, a user's authentication process and to more clearly describe how a principal name is mapped to a Greenplum Database role name.
      
      * - add krb_realm auth param
      
      - add description of include_realm=1 for completeness
  6. 21 Jun 2018 (8 commits)
  7. 20 Jun 2018 (6 commits)
  8. 19 Jun 2018 (7 commits)
    • docs - docs and updates for pgbouncer 1.8.1 (#5151) · a99194e0
      Committed by Lisa Owen
      * docs - docs and updates for pgbouncer 1.8.1
      
      * some edits requested by david
      
      * add pgbouncer config page to see also, include directive
      
      * add auth_hba_type config param
      
      * ldap - add info to migrating section, remove ldap passwds
      
      * remove ldap note
    • Update utilities to capture hyperloglog counter · aa5fe3d5
      Committed by Omer Arap
      This commit updates the GPSD utility to capture the value of the column
      `stainherit`, as well as the HLL counters stored in the column `stavalues4`
      of the `pg_statistic` table, which are generated for sample- or full-table-scan
      based HLL analyze.
      
      This commit also updates the minirepro utility to capture the hyperloglog
      counter.
      Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
    • Utilize hyperloglog and merge utilities to derive root table statistics · 9c1b1ae3
      Committed by Omer Arap
      This commit introduces an end-to-end scalable solution for generating
      statistics for the root partitions. This is done by merging the
      statistics of the leaf partition tables to generate the statistics of the
      root partition. The ability to merge leaf table statistics into the
      root table statistics makes analyze incremental and stable.
      
      **CHANGES IN LEAF TABLE STATS COLLECTION:**
      
      Incremental analyze creates a sample for each partition, as in the
      previous version. While analyzing the sample and generating statistics
      for the partition, it also creates a `hyperloglog_counter` data
      structure, adds the sample values to it, and records metadata such as
      the number of multiples and the sample size. Once the entire sample is
      processed, analyze saves the `hyperloglog_counter` as a byte array
      in the `pg_statistic` catalog table. We reserve a slot for the
      `hyperloglog_counter` in the table and signify it with a specific
      statistic kind, `STATISTIC_KIND_HLL`. We only keep the
      `hyperloglog_counter` in `pg_statistic` for the leaf partitions. If
      the user chooses to run a FULL scan for HLL, we signify the kind as
      `STATISTIC_KIND_FULLHLL`.
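      A hedged sketch of how such a slot might be filled from the analyze code path
      (field names follow PostgreSQL's VacAttrStats; the slot index, helper name and the
      way the counter bytes arrive are assumptions for illustration):

      /* Hedged sketch only: stash the serialized HLL counter in a pg_statistic slot. */
      static void
      store_hll_counter(VacAttrStats *stats, int slot, bytea *hll_bytes, bool fullscan)
      {
          Datum *values = (Datum *) palloc(sizeof(Datum));

          values[0] = PointerGetDatum(hll_bytes);

          stats->stakind[slot] = fullscan ? STATISTIC_KIND_FULLHLL : STATISTIC_KIND_HLL;
          stats->staop[slot] = InvalidOid;     /* no operator is associated with this kind */
          stats->stavalues[slot] = values;     /* the byte array holding the HLL counter */
          stats->numvalues[slot] = 1;
          stats->statypid[slot] = BYTEAOID;    /* stored as bytea, not the column's type */
      }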
      
      **MERGING LEAF STATISTICS**
      
      Once all the leaf partitions are analyzed, we analyze the root
      partition. Initially, we check whether all the partitions have been analyzed
      properly and all their statistics are available in the
      `pg_statistic` catalog table. If a partition has no tuples,
      we consider it analyzed even though it has no entry in `pg_statistic`.
      If for some reason a single partition is not analyzed, we fall back to
      the original analyze algorithm, which acquires a sample for the
      root partition and calculates statistics based on that sample.
      
      Merging the null fraction and average width from the leaf partition statistics
      is trivial and does not pose a significant challenge, so we calculate
      them first. The remaining statistics are:
      
      - Number of distinct values (NDV)
      
      - Most common values (MCV), and their frequencies termed as most common
      frequency (MCF)
      
      - Histograms that represent the distribution of the data values in the
      table
      
      **Merging NDV:**
      
      Hyperloglog provides the functionality to merge multiple
      `hyperloglog_counter`s into one and calculate the number of distinct
      values using the aggregated `hyperloglog_counter`. This aggregated
      `hyperloglog_counter` is sufficient only if the user chooses to run a full
      scan for hyperloglog. In the sample-based approach, without the
      hyperloglog algorithm, deriving the number of distinct values is not
      possible. Hyperloglog enables us to merge the `hyperloglog_counter`s
      from each partition and calculate the NDV on the merged
      `hyperloglog_counter` with an acceptable error rate. However, it does
      not give us the ultimate NDV of the root partition; it gives us the
      NDV of the union of the samples from each partition.
      
      The rest of the NDV interpolation depends on the four metrics used in the
      Postgres formula: the NDV in the sample, the number of
      multiple values in the sample, the sample size, and the total rows in the table.
      Using these values the algorithm calculates the approximate NDV for the
      table. While merging the statistics from the leaf partitions, with the
      help of hyperloglog we can accurately derive the NDV of the sample, the
      sample size and the total rows; however, the number of multiples in the
      accumulated sample is unknown, since we do not have access to the
      accumulated sample at this point.
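      For reference, a sketch of the Postgres ANALYZE estimator referred to above (the
      Duj1-style formula in analyze.c), where d is the sample NDV, nMultiple the number of
      sampled values seen more than once, n the sample size and N the total row count:

          f1  = d - nMultiple
          NDV ~= (n * d) / (n - f1 + f1 * n / N)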
      
      _Number of Multiples_
      
      Our approach to estimating the number of multiples in the aggregated
      sample for the root (which itself is unavailable) requires the
      NDV, the number of multiples, and the size of each leaf sample.
      The NDV of each sample is trivial to calculate using the partition's
      `hyperloglog_counter`. The number of multiples and the sample size for each
      partition are saved in the partition's `hyperloglog_counter` during leaf
      statistics gathering, to be used in the merge.
      
      Estimating the number of multiples in the aggregate sample for the root
      partition is a two-step process. First, we estimate the
      number of values that reside in more than one partition's sample. Then,
      we estimate the number of multiples that exist uniquely in a single
      partition. Finally, we add these values to estimate the overall number
      of multiples in the aggregate sample of the root partition.
      
      To count the number of values that exist uniquely in one single
      partition, we utilize hyperloglog functionality. We can easily estimate
      how many values appear only in a specific partition _i_. We call the NDV
      of the aggregate of all partitions `NDV_all`, and the NDV of the
      aggregate of all partitions but _i_ `NDV_minus_i`. The difference between
      `NDV_all` and `NDV_minus_i` gives the number of values that appear in
      only that one partition. The remaining values contribute to the
      overall number of multiples in the root's aggregated sample; we call
      their count `nMultiple_inter`, the number of values that appear in more
      than one partition.
      
      However, that is not enough: even if a value resides in only
      one partition, that partition might contain multiple instances of it. We need a way
      to account for these values. Remember that
      we also track the number of multiples that exist uniquely within a partition
      sample. We already know the number of multiples inside a partition
      sample; however, we need to normalize this value by the ratio of
      the number of values unique to the partition sample to the number of
      distinct values of the partition sample. The normalized value is
      partition sample i's contribution to the overall calculation of the
      nMultiple.
      
      Finally, `nMultiple_root` is `nMultiple_inter` plus the sum of
      `normalized_m_i` over all partition samples.
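      Putting the pieces above together as plain formulas (a sketch of the relationships
      described, not the exact code; nUnique_i, nMultiple_i and NDV_i are names introduced
      here for partition i's per-sample quantities):

          nUnique_i       = NDV_all - NDV_minus_i          (values seen only in partition i's sample)
          nMultiple_inter = NDV_all - sum_i(nUnique_i)     (values seen in more than one sample)
          normalized_m_i  = nMultiple_i * (nUnique_i / NDV_i)
          nMultiple_root  = nMultiple_inter + sum_i(normalized_m_i)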
      
      **Merging MCVs:**
      
      We utilize the merge functionality imported from the 4.3 version of
      Greenplum DB. The algorithm is straightforward: we convert each MCV's
      frequency into a count and add the counts up for values that appear in more than one
      partition. After every candidate's count has been calculated,
      we sort the candidate values and pick the top ones, as defined by
      `default_statistics_target`. 4.3 previously picked the top
      values with the highest count blindly; we instead incorporate the same logic
      used in current Greenplum and Postgres and test whether a value is a
      real MCV. Therefore, even after the merge, the
      logic aligns with Postgres.
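      A hedged sketch of the count-aggregation step (the types and helper names are
      illustrative, not the imported 4.3 functions):

      /* Hedged sketch only: leaf MCV frequencies become absolute counts so they
       * can be summed across partitions, then the top candidates are kept. */
      typedef struct McvCandidate
      {
          Datum   value;
          double  count;      /* sum of (leaf_mcv_freq * leaf_reltuples) over all leaves */
      } McvCandidate;

      static int
      cmp_candidate_desc(const void *a, const void *b)
      {
          double ca = ((const McvCandidate *) a)->count;
          double cb = ((const McvCandidate *) b)->count;

          return (ca < cb) - (ca > cb);   /* sort by count, descending */
      }

      static int
      pick_top_mcvs(McvCandidate *candidates, int ncandidates, int stats_target)
      {
          qsort(candidates, ncandidates, sizeof(McvCandidate), cmp_candidate_desc);
          /* further "is this really an MCV" checks follow, as in Postgres */
          return Min(ncandidates, stats_target);
      }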
      
      **Merging Histograms:**
      
      One of the main novel contributions of this commit is how we merge
      the histograms from the leaf partitions. In 4.3 we use a priority queue to
      merge the histograms from the leaf partitions. However, that approach is
      naive and loses important statistical information. In
      Postgres, the histogram is calculated over the values that did not qualify
      as MCVs. The merge logic for histograms in 4.3 did not take this
      into consideration, and significant statistical information was lost while
      merging the MCV values.
      
      We introduce a novel approach that feeds the MCVs from the leaf partitions
      that did not qualify as root MCVs into the histogram merge logic. To
      fully utilize the previously implemented priority queue logic, we
      treat non-qualified MCVs as the histograms of so-called `dummy`
      partitions. To be more precise, if an MCV m1 is a non-qualified MCV, we
      create a histogram [m1, m1] that has only one bucket, where the bucket
      size is the count of this non-qualified MCV. When we merge the
      histograms of the leaf partitions and these dummy partitions, the merged
      histogram does not lose any statistical information.
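      A hedged sketch of the dummy single-bucket histogram (names are illustrative only):

      /* Hedged sketch: wrap a non-qualified MCV as a one-bucket "dummy partition"
       * histogram so the existing priority-queue merge can consume it unchanged. */
      typedef struct DummyHist
      {
          Datum   bounds[2];  /* [m1, m1]: a single bucket */
          double  rows;       /* bucket size = count of the non-qualified MCV */
      } DummyHist;

      static DummyHist
      make_dummy_histogram(Datum mcv_value, double mcv_count)
      {
          DummyHist h;

          h.bounds[0] = mcv_value;
          h.bounds[1] = mcv_value;
          h.rows = mcv_count;
          return h;
      }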
      Signed-off-by: Jesse Zhang <sbjesse@gmail.com>
      Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
    • Import and modify analyze utility functions from 4.3 · 7ea27fc5
      Committed by Omer Arap
      In the previous generation of analyze, gpdb provided features to merge
      statistics such as MCVs (most common values) and histograms for the root
      or mid-level partitions from the leaf partitions' statistics.
      
      This commit imports the utility functions for merging MCVs and
      histograms and modifies them based on the needs of the current version.
      Signed-off-by: Bhunvesh Chaudhary <bchaudhary@pivotal.io>
    • 95279b10
    • Port hyperloglog extension into gpdb · a9301fdc
      Committed by Abhijit Subramanya
      - Port the hyperloglog extension into the contrib directory and make
      corresponding makefile changes to get it to compile.
      - Also modify initdb to install the HLL extension as part of gpinitsystem.
      Signed-off-by: Omer Arap <oarap@pivotal.io>
      Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
    • Fix COPY TO ON SEGMENT processed counting · cb63e543
      Committed by Adam Lee
      The `processed` variable should not be reset while looping over all partitions.
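      A hedged sketch of the fix pattern (the helper name and loop structure are
      illustrative, not the actual COPY code):

      /* Hedged sketch only: keep the running total outside the per-partition loop
       * instead of resetting it each iteration. */
      static uint64
      copy_all_partitions(List *partitions)
      {
          uint64      processed = 0;  /* running total across all partitions */
          ListCell   *lc;

          foreach(lc, partitions)
          {
              /* before the fix, the counter was effectively reset here, so only
               * the last partition's rows were reported */
              processed += copy_one_partition((Relation) lfirst(lc));
          }
          return processed;
      }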