1. 11 Jan 2019, 3 commits
    • Use centos6-test image in Concourse task files · 4ae1f3aa
      Committed by Karen Huddleston
      Also updated docker README to reference new build image.
      Co-authored-by: David Sharp <dsharp@pivotal.io>
      Co-authored-by: Karen Huddleston <khuddleston@pivotal.io>
    • Use centos6-test image for test jobs · 53f84628
      Committed by David Sharp
      and centos6-build for the OSS build job. The image for the OSS build
      job was left unmodified when we previously introduced the centos6-build
      image for enterprise compile jobs; we changed the OSS job to also use
      the centos6-build image so we could remove the centos-gpdb-test-6
      image.
      
      We also renamed the image resources to
      centos<version>-<test,build>-gpdb6, to clarify what they are
      building/testing. Docker resource names should match the repository and
      tag of the docker image.
      Co-authored-by: David Sharp <dsharp@pivotal.io>
      Co-authored-by: Ben Christel <bchristel@pivotal.io>
      Co-authored-by: Karen Huddleston <khuddleston@pivotal.io>
    • Disable auto segment rebalance during gpstart. · 95606763
      Committed by Ashwin Agrawal
      gpstart tried to automatically rebalance the cluster if synced
      segment pairs were not in their preferred segment roles (primary or
      mirror). This worked and was basically free under file replication. As
      part of the cluster start, the gpstart utility would see that the
      primary and mirror pair were both in up/sync state but with their
      segment roles reversed in the catalog. It was simple to just send the
      correct filerep signals to switch them back to their preferred roles.
      
      With WAL replication, this is not as trivial.  The primary and mirror
      segments themselves are very aware of their segment roles.  If a
      segment finds a recovery.conf file in its data directory, it will
      automatically start as a mirror.
      
      So, until this is properly implemented (if it is decided that gpstart
      should still support it), remove the currently broken logic from
      gpstart.
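
      For context, the recovery.conf behavior described above boils down to a
      file-existence check at segment startup. Here is a minimal sketch of
      that check, with a hypothetical helper name (this is not the actual
      gpstart or postmaster code):

        #include <stdbool.h>
        #include <stdio.h>
        #include <sys/stat.h>

        /* Sketch: a segment starts as a mirror if recovery.conf is present. */
        static bool
        starts_as_mirror(const char *datadir)
        {
            char        path[1024];
            struct stat st;

            snprintf(path, sizeof(path), "%s/recovery.conf", datadir);
            return stat(path, &st) == 0;
        }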
  2. 10 Jan 2019, 15 commits
    • docs: update links to the website · b841c2b4
      Committed by Daniel Gustafsson
      The website redesign altered the URLs with no redirects, so existing
      links need to be updated to match the new structure.
      
      Reviewed-by: Mel Kiyama
      Reviewed-by: David Yozie
    • Minor wordsmithing on the README · a20ebe96
      Committed by Daniel Gustafsson
      This polishes the wording in the README a bit where it seemed either
      convoluted or strange to me. On top of wording, it updates the directory
      structure to reference gpcontrib, and removes the mention of TINC with
      the rationale that anyone reading this file is a new contributor and
      there is little value in bringing up a framework which is on its
      deathbed. It also fixes the links to the website to actually work,
      since the site redesign broke the old links without redirects. The ORCA
      naming is discussed, with all mentions changed to GPORCA, since that's
      the name used in the documentation.
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      Discussion: https://github.com/greenplum-db/gpdb/pull/6648
    • Don't use TIDs with high offset numbers in AO tables. · c249ac7a
      Committed by Heikki Linnakangas
      Change the mapping of AO segfilenum+rownum to an ItemPointer, so that we
      avoid using ItemPointer.ip_posid values higher than 32768. Such offsets
      are impossible on heap tables, because you can't fit that many tuples on
      a page. In GiST, since PostgreSQL 9.1, we have taken advantage of that by
      using 0xfffe (65534) to mark special "invalid" GiST tuples. We can tolerate
      that, because those invalid tuples can only appear on internal pages, so
      they cannot be confused with AO TIDs, which only appear on leaf pages. But
      later versions of PostgreSQL will also use those high values for other
      similar magic purposes, so it seems better to steer clear of them, even
      if we could make it work.
      
      To allow binary upgrades of indexes that already contain AO tids with high
      offsets, we still allow and handle those, too, in the code to fetch AO
      tuples. Also relax the sanity check in GiST code, to not confuse those high
      values with invalid tuples.
      
      Fixes https://github.com/greenplum-db/gpdb/issues/6227
      Reviewed-by: Ashwin Agrawal <aagrawal@pivotal.io>
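
      A minimal sketch of the packing idea, with hypothetical field widths
      (the real AOTupleId encoding in GPDB differs in detail); the point is
      simply that the offset part stays at or below 32768:

        #include "postgres.h"
        #include "storage/itemptr.h"

        /* Hypothetical split: low 15 bits of rownum -> offset, the rest ->
         * block number, alongside the segment file number. */
        static void
        ao_rownum_to_tid(int segfileno, uint64 rownum, ItemPointer tid)
        {
            /* 1-based offset, never exceeding 0x8000 (32768) */
            OffsetNumber offset = (OffsetNumber) ((rownum & 0x7fff) + 1);
            BlockNumber  blkno  = (BlockNumber) (((uint64) segfileno << 25) |
                                                 (rownum >> 15));

            ItemPointerSet(tid, blkno, offset);
        }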
    • Don't bind outgoing TCP connections to a particular IP address. · 8b8523eb
      Committed by Heikki Linnakangas
      In the TCP interconnect, we used to bind outgoing TCP connections to the
      same source IP address that the libpq connection came from. That can lead
      to running out of ephemeral TCP ports. I was seeing these errors when
      running the regression tests with the TCP interconnect:
      
      ERROR:  interconnect error setting up outgoing connection
      DETAIL:  Could not bind to local addr 2.0.0.0: Address already in use
      
      This was easily reproducible by running the parallel group of tests that
      includes the qp_misc_jiras test. Apparently that parallel group opens
      especially many connections.
      
      When a socket is bound to a particular IP address with bind(), it is also
      allocated an ephemeral TCP port. On Linux, the range of ports available
      can be seen in /proc/sys/net/ipv4/ip_local_port_range. It defaults to
      32768-60999, but even if you increase the range, it's always quite
      limited. bind() reserves the whole TCP port, even though multiple outgoing
      connections could share the same source port, as long as their destination
      IP address or port is different, because bind() doesn't know whether
      you're going to use the port to listen for incoming connections, or for
      establishing an outgoing connection. Listening for incoming connections
      needs to reserve the port.
      
      Linux kernel 4.2 introduced a new socket option, IP_BIND_ADDRESS_NO_PORT,
      that we could use to give bind() a hint that we're using the socket for
      an outgoing connection, so there's no need to reserve the whole port. But
      actually, I don't think we should be calling bind() on outgoing
      connections in the first place. I don't think the logic, to use the
      incoming libpq connection's IP address as the source IP address of outgoing
      interconnect connections, makes sense. The comment says that it is for
      fault tolerance, but I don't buy that argument. If a network adapter is not
      working, it should be disabled in the OS configuration so that it is not
      used. It is not the application's job to make routing decisions.
      
      Forcing the same source IP address seems outright wrong in some scenarios.
      Imagine that the QD has two network adapters: one for connecting to the
      outside world, and another for the internal network where the QEs are. In
      that scenario, the interconnect connections between the QD and the QEs
      should definitely *not* be established through the same network adapter as
      the user's libpq connection.
      
      Better to just remove the code to bind to a particular source IP address,
      and let the OS do its job of routing TCP connections. (AFAICS, the UDP
      interconnect never tried to force a particular source IP address when
      sending.)
      Reviewed-by: Paul Guo <pguo@pivotal.io>
      Discussion: https://groups.google.com/a/greenplum.org/d/msg/gpdb-dev/ITkZdACpcVQ/H_74phbMFgAJ
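
      For illustration, a sketch of the IP_BIND_ADDRESS_NO_PORT alternative
      mentioned above (Linux 4.2+); the commit itself takes the simpler route
      of dropping the bind() entirely and calling connect() directly:

        #include <netinet/in.h>
        #include <sys/socket.h>

        /* Sketch: open an outgoing connection without reserving a whole
         * ephemeral port via bind(). */
        static int
        open_outgoing_conn(const struct sockaddr_in *dest)
        {
            int fd = socket(AF_INET, SOCK_STREAM, 0);

            if (fd < 0)
                return -1;
        #ifdef IP_BIND_ADDRESS_NO_PORT
            int one = 1;

            /* hint to the kernel: this socket is for an outgoing connection */
            setsockopt(fd, IPPROTO_IP, IP_BIND_ADDRESS_NO_PORT, &one, sizeof(one));
        #endif
            /* no bind() to a source IP: let the OS pick route, address, port */
            return connect(fd, (const struct sockaddr *) dest, sizeof(*dest)) == 0
                ? fd : -1;
        }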
    • Introduce new class to hold context for Query->DXL translation. · c1de30da
      Committed by Heikki Linnakangas
      The new class, CContextQueryToDXL, holds information that's global for
      the whole query. This makes subquery planning less awkward, as we don't
      need to pass global information up and down the query levels.
      Reviewed-by: Ekta Khanna <ekhanna@pivotal.io>
      Reviewed-by: Venkatesh Raghavan <vraghavan@pivotal.io>
      Reviewed-by: Bhuvnesh Chaudhary <bchaudhary@pivotal.io>
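
      The pattern, sketched in C for brevity (CContextQueryToDXL itself is a
      C++ class in the ORCA translator; the fields here are hypothetical):
      one context object is created at the top level and handed to every
      recursive call, instead of threading counters through each query level:

        /* Hypothetical query-global state for the translation. */
        typedef struct QueryToDXLContext
        {
            int next_col_id;    /* column id counter shared by all levels */
            int next_cte_id;    /* CTE id counter shared by all levels */
        } QueryToDXLContext;

        static int
        assign_col_id(QueryToDXLContext *ctx)
        {
            /* every subquery draws from the same query-wide counter */
            return ctx->next_col_id++;
        }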
    • EXCHANGE PARTITION should ERROR out when relpersistence differs · 3d6c5664
      Committed by Shaoqi Bai
      Currently, EXCHANGE PARTITION will allow a target partition and source
      table with differing relpersistence types. This should be checked for
      and banned when checking whether the relation is_exchangeable.
      Co-authored-by: Melanie Plageman <mplageman@pivotal.io>
      Co-authored-by: Shaoqi Bai <sbai@pivotal.io>
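
      A sketch of the kind of check this adds, with hypothetical names (the
      real test sits with the other is_exchangeable checks):

        #include "postgres.h"
        #include "utils/rel.h"

        /* Sketch: reject mismatched persistence during EXCHANGE PARTITION. */
        static void
        check_exchange_persistence(Relation part_rel, Relation cand_rel)
        {
            if (part_rel->rd_rel->relpersistence !=
                cand_rel->rd_rel->relpersistence)
                ereport(ERROR,
                        (errcode(ERRCODE_WRONG_OBJECT_TYPE),
                         errmsg("cannot exchange a partition with a relation "
                                "of a different persistence")));
        }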
    • Resolve FIXME for validatepart by passing relpersistence of root · 263355cf
      Committed by Melanie Plageman
      MergeAttributes was used in atpxPart_validate_spec to get the schema and
      constraints to make a new leaf partition as part of ADD or SPLIT
      PARTITION. It was likely used as a convenience, since it already
      existed, and seems like the wrong function for the job.
      
      Previously, atpxPart_validate_spec simply hard-coded false for the
      relation persistence, since the parameter was simply `isTemp`. Once the
      options for relation persistence were expanded to include unlogged, this
      parameter was changed to take a relpersistence. In MergeAttributes, in
      the code path we actually hit when calling it from here (we pass in the
      schema as NIL and therefore hit only half of the MergeAttributes code),
      the `supers` parameter is actually that of the parent partition and
      includes its relpersistence. So, by passing in the relpersistence of the
      parent, the checks we do around relpersistence are redundant: we are
      comparing the parent's relpersistence to its own. However, this function
      is currently only called when we are making a new relation that would
      just use the relpersistence of the parent anyway (since we don't allow a
      different persistence to be specified for the child), so passing in a
      hard-coded value would incorrectly assume that we are always creating a
      permanent relation.
      
      Since MergeAttributes was overkill, we wrote a new helper function,
      SetSchemaAndConstraints, to get the schema and constraints of a
      relation. This function doesn't do many of the special validation checks
      that may be required by callers using it in the context of partition
      tables (so user beware); however, it is probably only useful in the
      context of partition tables anyway, because it assumes constraints will
      be cooked, which wouldn't be the case for all relations. We split it
      into two smaller inline functions for clarity. We also felt this would
      be a useful helper function in general, so we extern'd it.
      
      This commit also sets the relpersistence that is used to make the leaf
      partition when adding a new partition or splitting an existing
      partition.
      
      makeRangeVar is a function from upstream which is basically a
      constructor. It sets relpersistence in the RangeVar to a hard-coded
      value of RELPERSISTENCE_PERMANENT. However, because we use the root
      partition to get the constraints and column information for the new
      leaf, after we use the default construction of the RangeVar, we need to
      set the relpersistence to that of the parent.
      
      This commit specifically only sets it back for the case in which we are
      adding a partition with `ADD PARTITION` or through `SPLIT PARTITION`.
      
      Without this commit, a leaf partition of an unlogged table created
      through `ADD PARTITION` or `SPLIT PARTITION` would incorrectly have its
      relpersistence set to permanent.
      Co-authored-by: Alexandra Wang <lewang@pivotal.io>
      Co-authored-by: Melanie Plageman <mplageman@pivotal.io>
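
      A sketch of the RangeVar part of the fix (helper name hypothetical):
      makeRangeVar() hard-codes RELPERSISTENCE_PERMANENT, so the new leaf's
      RangeVar is patched afterwards with the root's persistence:

        #include "postgres.h"
        #include "nodes/makefuncs.h"
        #include "utils/rel.h"

        /* Sketch: build the leaf's RangeVar, then inherit the root's
         * relpersistence instead of the hard-coded permanent default. */
        static RangeVar *
        make_leaf_rangevar(Relation root_rel, char *schemaname, char *leafname)
        {
            RangeVar *rv = makeRangeVar(schemaname, leafname, -1);

            rv->relpersistence = root_rel->rd_rel->relpersistence;
            return rv;
        }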
    • Fix unit test failure for gp_replication_test. · 3267a548
      Committed by Ashwin Agrawal
    • df97fc50
    • gp_replica_check: Ignore status check unless it is_for_gp_walreciever · 6fb27ca7
      Committed by Taylor Vesely
      The WalSndCtl can have status information for non-mirror walsender
      connections, e.g. pg_basebackup connections. Ignore them.
      Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>
    • Make GetMirrorStatus more intelligent · e03a755a
      Committed by Asim R P
      Now that we allow multiple WalSnd objects, FTS probes need to recognize
      the WalSnd object corresponding to the mirror.  This is achieved by
      defining the Greenplum-specific application name "gp_replication".  The
      mirrors use this application name as a connection parameter.  Any
      other replication connections (backup and log streamer connections
      initiated by pg_basebackup) do not use this application name.

      In particular, the log streamer replication connection initiated by
      pg_basebackup must NOT use the Greenplum-specific application name.
      Co-authored-by: David Kimura <dkimura@pivotal.io>
      Co-authored-by: Adam Berlin <aberlin@pivotal.io>
      Co-authored-by: Taylor Vesely <tvesely@pivotal.io>
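
      The selection this implies, sketched with illustrative structure (the
      previous commit's is_for_gp_walreciever flag is how the mirror's
      walsender is distinguished in practice; the loop shape here is
      hypothetical):

        #include "postgres.h"
        #include "replication/walsender.h"
        #include "replication/walsender_private.h"

        /* Sketch: find the mirror's WalSnd slot among all walsenders. */
        static WalSnd *
        find_mirror_walsnd(void)
        {
            for (int i = 0; i < max_wal_senders; i++)
            {
                WalSnd *walsnd = &WalSndCtl->walsnds[i];

                /* skip pg_basebackup and other non-mirror connections */
                if (walsnd->is_for_gp_walreciever)
                    return walsnd;
            }
            return NULL;
        }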
    • Initialize cluster with primary/mirror replication slots. · 06e7bf32
      Committed by Adam Berlin
      This replication slot is used for WAL replication between primary and
      mirror segments, and also between master and standby.  The replication
      slot is created when a mirror / standby segment is initialized using
      pg_basebackup.  The replication slot is used by the primary to keep
      track of the WAL flush location reported by the mirror.  When the
      mirror disconnects, the slot allows the primary to retain enough WAL so
      that the mirror can catch up after reconnecting in the future.
      
      - defaults max_wal_senders to 10 to allow for basebackup to spin up
      senders matching upstream
      - defaults max_replication_slots to 10 instead of 0
      - changes gp_basebackup to create a replication slot when a slot name is
      provided during gpinitsystem
      - changes gp_basebackup to use streaming replication during gpinitsystem
      - creates and uses replication slot during full recovery
      
      Note: We intend to reason more deeply about the default GUC settings in
      a later feature.
      Co-authored-by: David Kimura <dkimura@pivotal.io>
      Co-authored-by: Asim R P <apraveen@pivotal.io>
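
      For context, creating a physical slot over a replication connection
      looks roughly like this (connection string and slot name are made up;
      gpdb wires the real name through gpinitsystem and pg_basebackup --slot,
      as in the next commit):

        #include <stdio.h>
        #include <libpq-fe.h>

        /* Sketch: issue CREATE_REPLICATION_SLOT on a replication connection. */
        static void
        create_physical_slot(const char *conninfo, const char *slot)
        {
            PGconn   *conn = PQconnectdb(conninfo);  /* needs replication=true */
            char      cmd[256];
            PGresult *res;

            snprintf(cmd, sizeof(cmd),
                     "CREATE_REPLICATION_SLOT \"%s\" PHYSICAL", slot);
            res = PQexec(conn, cmd);
            if (PQresultStatus(res) != PGRES_TUPLES_OK)
                fprintf(stderr, "could not create slot: %s",
                        PQerrorMessage(conn));
            PQclear(res);
            PQfinish(conn);
        }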
    • pg_basebackup: Add --slot option · 0d1b5de6
      Committed by Peter Eisentraut
      This option specifies a replication slot for WAL streaming (-X stream),
      so that there can be continuous replication slot use between WAL
      streaming during the base backup and the start of regular streaming
      replication.
      Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
    • Remove the distributed log on new segments · 68536097
      Committed by ZhangJackey
      Commit df19119c eliminated distributed transaction log creation and
      maintenance on the QD (only a 32K `pg_distributedlog/0000` file
      exists). gpexpand copies data files from the QD to the new segments, so
      on the new segments the oldestXID is 3 (loaded from the pg_control
      file copied from the QD).

      After the new segments join the cluster, they maintain the oldestXmin,
      so they will loop to find the page in the distributed transaction log.

      If the local transaction ID (xid) is large on the new segments, there
      will be a hole between 0000 and TransactionIdToPage(xid), and an error
      will be raised.

      In this commit we truncate the distributed log with a cutoff of
      oldestXid on the new segments; the hole is then gone, and the
      oldestXmin will be initialized to oldestLocalXmin.
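
      The arithmetic behind the hole, sketched with the usual SLRU page
      mapping (macro shapes as in upstream's clog code; the exact names in
      the distributed log module may differ):

        /* Each SLRU page holds a fixed number of distributed log entries. */
        #define ENTRIES_PER_PAGE  (BLCKSZ / sizeof(DistributedLogEntry))
        #define TransactionIdToPage(xid) \
            ((xid) / (TransactionId) ENTRIES_PER_PAGE)

        /*
         * On a freshly copied segment only page file 0000 exists. With a
         * large local xid, TransactionIdToPage(xid) lands far beyond page 0,
         * and reading any page in between fails. Truncating at the oldestXid
         * cutoff removes the hole.
         */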
    • Update instructions in README.docker.md · f3807f13
      Committed by gshaw-pivotal
      - Include how to make Greenplum within docker accessible to SQL editors
        running on the local machine (outside of docker).
      - Update the pip install command so that psutil and lockfile are
        accessible when the make cluster command is executed.
  3. 09 Jan 2019, 15 commits
    • Fix assertion failure in join planning. · 22288e8d
      Committed by Heikki Linnakangas
      cdbpath_motion_for_join() was sometimes returning an incorrect locus for
      a join between SingleQE and Hashed loci. This happened when even the
      "last resort" strategy of moving the hashed side to the single QE
      failed, which can happen at least in the query that's added to the
      regression tests. The query involves a nested loop join path, where one
      side is a SingleQE locus and the other side is a Hashed locus, and
      there are no join predicates that can be used to determine the
      resulting locus.
      
      While we're at it, turn the assertion that this tripped, and some
      related ones at the same place, into elog()s. There is no need to crash
      the whole server if the planner screws up, and it'd be good to perform
      these sanity checks in production, too.
      
      The failure of the "last resort" codepath was left unhandled by commit
      0522e960. Fixes https://github.com/greenplum-db/gpdb/issues/6643.
      Reviewed-by: Paul Guo <pguo@pivotal.io>
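
      The assertion-to-elog() change follows the usual pattern; a sketch
      (condition and message are illustrative, not the exact code):

        /* Before: only checked in assert-enabled builds */
        Assert(CdbPathLocus_IsHashed(other_path->locus));

        /* After: checked in production too; fail the query, don't crash */
        if (!CdbPathLocus_IsHashed(other_path->locus))
            elog(ERROR, "unexpected locus type in cdbpath_motion_for_join: %d",
                 other_path->locus.locustype);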
    • fix typo and indent (#6653) · 1ffc362e
      Committed by Yandong Yao
    • Do not enforce join ordering for ANTI and LASJ. (#6625) · 29daab51
      Committed by Richard Guo
      The following identity holds true:
      
          (A antijoin B on (Pab)) innerjoin C on (Pac)
            = (A innerjoin C on (Pac)) antijoin B on (Pab)
      
      So we should not enforce join ordering for ANTI. Instead we need to
      collapse ANTI join nodes so that they participate fully in the join
      order search.
      
      For example:
      
      	select * from a join b on a.i = b.i where
      		not exists (select i from c where a.i = c.i);
      
      For this query, the original join order is "(a innerjoin b) antijoin
      c". If we enforce ANTI join ordering, this will be the final join
      order. But another join order, "(a antijoin c) innerjoin b", is also
      legal. We should take this order into consideration and pick the
      cheaper one.
      
      For LASJ, the handling is the same as for ANTI joins.
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      Reviewed-by: Melanie Plageman <mplageman@pivotal.io>
    • pg_rewind: parse bitmap wal records. · 161920e8
      Committed by Ashwin Agrawal
      Co-authored-by: Ekta Khanna <ekhanna@pivotal.io>
    • pg_rewind: add test for bitmap wal records. · a6913d2f
      Committed by Ashwin Agrawal
      Co-authored-by: Ekta Khanna <ekhanna@pivotal.io>
    • In maintenance_mode ignore distributed log. · 4eb48055
      Committed by Ashwin Agrawal
      With this commit, a QE in maintenance mode will ignore the distributed
      log and just behave like a single-instance Postgres.

      Without this, when a QE is started as a single instance, no distributed
      snapshot is used. Because of that, the distributed oldest xmin points
      to the oldest datfrozenxid in the system. As a result, vacuuming any
      table reports HEAP_TUPLE_RECENTLY_DEAD and avoids cleaning up dead
      rows.
      Co-authored-by: Ekta Khanna <ekhanna@pivotal.io>
    • Fix calculation of WorkfileMgrLock and WorkfileQuerySpaceLock · 1540eb1c
      Committed by Pengzhou Tang
      All lwlocks are stored in MainLWLockArray, which is an array of
      LWLockPadded structures:
      
      typedef union LWLockPadded
      {
        LWLock lock;
        char pad[LWLOCK_PADDED_SIZE];
      } LWLockPadded;
      
      The calculation in SyncHTPartLockId to fetch an lwlock is
      incorrect because it offsets the array as an array of LWLock.
      In the current code base it happens to work, because the size of
      LWLock is 32 bytes; if the LWLock structure were ever enlarged,
      the calculation would break.
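
      The difference, in a sketch (the buggy line mirrors what
      SyncHTPartLockId effectively did):

        /* Wrong: steps through the array in sizeof(LWLock) increments, which
         * only works while sizeof(LWLock) == LWLOCK_PADDED_SIZE holds. */
        LWLock *lock_bad  = ((LWLock *) MainLWLockArray) + id;

        /* Right: index the padded union, then take its lock member. */
        LWLock *lock_good = &MainLWLockArray[id].lock;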
    • fix according to comments · a8c2f7c4
      Committed by Pengzhou Tang
    • Dispatcher should use DISPATCH_WAIT_FINISH mode to wait QEs for init plans · b7bb5438
      Committed by Pengzhou Tang
      GPDB always sets the REWIND flag for subplans, including init plans. In
      6195b967 we tightened the restriction so that a node which is not eager
      free cannot be squelched early, including init plans. This exposed a
      few hidden bugs: if an init plan contains a motion node that needs to
      be squelched early, the whole query gets stuck in
      cdbdisp_checkDispatchResult(), because some QEs keep sending tuples.

      To resolve this, the dispatcher uses DISPATCH_WAIT_FINISH mode to wait
      for the dispatch results of an init plan. An init plan with a motion is
      always executed on the QD and should always be a SELECT-like plan, so
      it must already have fetched all the tuples it needs before the
      dispatcher waits for the QEs; DISPATCH_WAIT_FINISH is therefore the
      right mode for init plans.
    • bd7c4b1a
    • Fix distributed snapshot xmax check. · 2b4674a4
      Committed by Ekta Khanna
      As part of commit dc78e56c, the distributed snapshot logic was modified
      to use latestCompletedDxid. This changed xmax from being an inclusive
      bound to an exclusive bound on the visible transactions in the
      snapshot. Hence, update the check to return
      DISTRIBUTEDSNAPSHOT_COMMITTED_INPROGRESS even for a transaction id
      equal to the global xmax. The other way to fix this would be to use
      latestCompletedDxid without the +1 for xmax, but it is better to keep
      the logic similar to the local snapshot check and not have xmax in the
      inclusive range of visible transactions.
      
      This was exposed in CI by the test
      isolation/results/heap-repeatable-read-vacuum-freeze failing
      intermittently, because the isolation framework itself triggers a query
      on pg_locks to check for deadlocks. This commit explicitly adds a test
      to cover the scenario.
      Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>
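
      The boundary change amounts to the following (variable names are
      hypothetical; compare the local snapshot rule that xmax itself is never
      visible):

        /*
         * xmax = latestCompletedDxid + 1 is an exclusive bound: a
         * distributed xid equal to xmax must still be reported as
         * in-progress.
         */
        if (dxid >= snapshot->xmax)
            return DISTRIBUTEDSNAPSHOT_COMMITTED_INPROGRESS;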
    • Avoid calling CreateRestartPoint() from startup process. · 558c460e
      Committed by Ashwin Agrawal
      With commit 8a11bfff, aggressive restart point creation is no longer
      performed in GPDB either. Since CreateRestartPoint() is not coded to be
      called from the startup process, a GPDB-specific exception was added in
      the past so that the earlier aggressive restart point creation, which
      could happen via the startup process, worked correctly.

      Now a restart point is created on a checkpoint record only when
      gp_replica_check is running, and that should be done via the
      checkpointer process. Eliminate any case of calling
      CreateRestartPoint() from the startup process, thereby removing the
      GPDB-added exception in CreateRestartPoint() and aligning with upstream
      code.
    • Make gptransfer test's output understandable to gpdiff. · c3b9d927
      Committed by Heikki Linnakangas
      The gptransfer behave test was using gpdiff to compare data between the
      source and target systems, relying on gpdiff to mask row-order
      differences. However, after 1f44603a, gpdiff no longer recognized the
      results as psql result sets, because the test did not echo the SELECT
      statements to the output, and gpdiff expects to see those. Fix by
      echoing the statements, like in pg_regress. That makes the output more
      readable anyway, if there are any differences.

      While we're at it, change the gpdiff invocation to produce a unified
      diff. If the test fails because there is a difference, that makes the
      output a lot more readable.
    • Fix regression failure on "gpmapreduce --help". · f8035a33
      Committed by Heikki Linnakangas
      The test was using "-- ignore" to make gpdiff ignore any differences in
      the test output. But after commit 1f44603a, gpdiff doesn't consider the
      test's output to be a psql result set anymore, so the "-- ignore"
      directive no longer works. Use the more common
      "-- start_ignore"/"-- end_ignore" block instead.
      
      (I'm not sure how useful the test is if we don't check the output, but
      that's a different story.)
  4. 08 Jan 2019, 7 commits
    • Little refine of SendEosUDPIFC() to resolve potential issue · 3b6da4c6
      Committed by Pengzhou Tang
      Previously, even when a connection had been explicitly set to inactive,
      the old code might still treat the connection as active, if
      conn->cdbProc was not null and conn->sndQueue was not empty, and
      increase activeCount. In the next loop, because conn->stillActive was
      then true, conn->unackQueue and conn->sndQueue would never be freed,
      activeCount would stay non-zero, and this could cause an infinite loop.
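
      The shape of the fix, sketched (field tests simplified;
      sndQueueNonEmpty() stands in for the real queue check):

        /* Sketch: a connection explicitly marked inactive must never be
         * counted as active again, whatever its queues look like. */
        if (!conn->stillActive)
            continue;
        if (conn->cdbProc != NULL && sndQueueNonEmpty(conn))
            activeCount++;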
    • Fix expected output for the changes in isolation2 tester. · 20b3aa3a
      Committed by Heikki Linnakangas
      Commit 7d7782f1 changed the formatting of result sets slightly in
      isolation2 output. I missed changing these expected outputs in that
      commit.
    • Detect beginning of result set better in gpdiff. · 1f44603a
      Committed by Heikki Linnakangas
      Improve the detection of the beginning of a result set. Previously, it
      would get confused by comments like "-------", which look a lot like
      the beginning of a single-column psql result set. That doesn't matter
      much as long as the test is passing, but if such a test failed, the
      diff was very difficult to read, as atmsort reordered the SQL lines,
      too.
      Make the detection more resilient, by looking at the previous line. In a
      real psql result set, the previous line should be a header line, like
      " col1 | col2 ". A header line begins and ends with spaces, anything else
      means that we're seeing a SQL comment rather than a psql result set.
      
      While we're at it, if the "------" line has any leading or trailing
      whitespace, it's not a psql result set. I'm not sure why we were lenient
      on that, but let's make that more strict, too.
      Reviewed-by: Asim R P <apraveen@pivotal.io>
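
      The heuristic in C form, for illustration only (gpdiff/atmsort are
      Perl; the helper and its calling convention are hypothetical, and lines
      are assumed to be already stripped of their newline):

        #include <stdbool.h>
        #include <string.h>

        /* Sketch: a "-----" divider starts a psql result set only if it has
         * no leading/trailing whitespace and the previous line looks like a
         * psql header, i.e. it begins and ends with a space (" c1 | c2 "). */
        static bool
        looks_like_resultset(const char *prev, const char *divider)
        {
            size_t dlen = strlen(divider);
            size_t plen = strlen(prev);

            if (dlen == 0 || strspn(divider, "-+") != dlen)
                return false;   /* indented or non-dash line: just a comment */
            return plen >= 2 && prev[0] == ' ' && prev[plen - 1] == ' ';
        }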
    • Make isolation2 result sets look more like psql's. · 7d7782f1
      Committed by Heikki Linnakangas
      Why, you might ask? The next commit will modify the code in gpdiff.pl, so
      that it doesn't get fooled by "----"-style comments, thinking that they
      are psql result sets. A side-effect of that is that it would also no
      longer recognize the result sets in the isolation2 output, without this
      patch.
    • Fix leftover references to MirroredFileSysObj_JustInTimeDbDirCreate. · b58556d7
      Committed by Heikki Linnakangas
      It was removed along with file replication, in commit 5c158ff3.
    • Remove duplicated code. · f7625523
      Committed by Heikki Linnakangas
      These creep in easily when merging code from upstream that had already
      been backported earlier.
    • Remove some unnecessary differences vs. upstream. · fbaf499a
      Committed by Heikki Linnakangas
      To make merging and diffing with upstream easier.