Modify SQL references for replicated tables (#5883)

* Modify SQL references for replicated tables
* Changes for edits and review comments

fa01ff3d · Chuck Litzell · GitHub · 04e43e64 · fa01ff3d · fa01ff3d
4 changed files
--- a/gpdb-doc/dita/ref_guide/sql_commands/ALTER_TABLE.xml
+++ b/gpdb-doc/dita/ref_guide/sql_commands/ALTER_TABLE.xml
@@ -16,7 +16,8 @@ ALTER TABLE <varname>name</varname> SET SCHEMA <varname>new_schema</varname>
 ALTER TABLE [ONLY] <varname>name</varname> SET 
      WITH (REORGANIZE=true|false)
    | DISTRIBUTED BY (<varname>column</varname>, [ ... ] ) 
-   | DISTRIBUTED RANDOMLY 
+   | DISTRIBUTED RANDOMLY
+   | DISTRIBUTED REPLICATED 
 
 ALTER TABLE [ONLY] <varname>name</varname> <varname>action</varname> [, ... ]

@@ -205,10 +206,11 @@ ALTER TABLE <varname>name</varname>
                                        fillfactors are appropriate. Note that the table contents
                                        will not be modified immediately by this command. You will
                                        need to rewrite the table to get the desired effects.</li>
-                                <li id="ay141947"><b>SET DISTRIBUTED</b> —Changes the distribution
-                                        policy of a table. Changes to a hash distribution policy
-                                        will cause the table data to be physically redistributed on
-                                        disk, which can be resource intensive.</li>
+                                <li id="ay141947"><b>SET DISTRIBUTED</b> — Changes the distribution
+                                        policy of a table. Changing a hash distribution policy, or
+                                        changing to or from a replicated policy, will cause the
+                                        table data to be physically redistributed on disk, which can
+                                        be resource intensive.</li>
                                <li id="ay137232"><b>INHERIT <varname>parent_table</varname> / NO
                                                INHERIT <varname>parent_table</varname></b> — Adds
                                        or removes the target table as a child of the specified
@@ -217,12 +219,12 @@ ALTER TABLE <varname>name</varname>
                                        target table must already contain all the same columns as
                                        the parent (it could have additional columns, too). The
                                        columns must have matching data types, and if they have
-                                                <codeph>NOT</codeph><codeph>NULL</codeph>
-                                        constraints in the parent then they must also have
-                                                <codeph>NOT NULL</codeph> constraints in the child.
-                                        There must also be matching child-table constraints for all
-                                                <codeph>CHECK</codeph> constraints of the
-                                        parent.</li>
+                                                <codeph>NOT NULL</codeph> constraints in the parent
+                                        then they must also have <codeph>NOT NULL</codeph>
+                                        constraints in the child. There must also be matching
+                                        child-table constraints for all <codeph>CHECK</codeph>
+                                        constraints of the parent. Neither the target table nor the
+                                        parent table may be replicated tables.</li>
                                <li id="ay137262"><b>OWNER</b> — Changes the owner of the table,
                                        sequence, or view to the specified user. </li>
                                <li id="ay137271"><b>SET TABLESPACE</b> — Changes the table's
@@ -412,15 +414,16 @@ ALTER TABLE <varname>name</varname>
                                </plentry>
                                <plentry>
                                        <pt>DISTRIBUTED BY (<varname>column</varname>) | DISTRIBUTED
-                                                RANDOMLY</pt>
+                                                RANDOMLY | DISTRIBUTED REPLICATED</pt>
                                        <pd>Specifies the distribution policy for a table. Changing
-                                                a hash distribution policy will cause the table data
-                                                to be physically redistributed on disk, which can be
-                                                resource intensive. If you declare the same hash
-                                                distribution policy or change from hash to random
-                                                distribution, data will not be redistributed unless
-                                                you declare <codeph>SET WITH
-                                                  (REORGANIZE=true)</codeph>.</pd>
+                                                a hash distribution policy causes the table data to
+                                                be physically redistributed, which can be resource
+                                                intensive. If you declare the same hash distribution
+                                                policy or change from hash to random distribution,
+                                                data will not be redistributed unless you declare
+                                                  <codeph>SET WITH (REORGANIZE=true)</codeph>.</pd>
+                                        <pd>Changing to or from a replicated distribution policy
+                                                causes the table data to be redistributed.</pd>
                                </plentry>
                                <plentry>
                                        <pt>REORGANIZE=true|false</pt>
@@ -583,7 +586,9 @@ ALTER TABLE <varname>name</varname>
                                                You can exchange a table where the table data is
                                                stored in the database. For example, the table is
                                                created with the <codeph>CREATE TABLE</codeph>
-                                                command. </pd>
+                                                command. The table must have the same number of
+                                                columns, column order, column names, column types,
+                                                and distribution policy as the parent table.</pd>
                                        <pd>With the <codeph>EXCHANGE PARTITION</codeph> clause, you
                                                can also exchange a readable external table (created
                                                with the <codeph>CREATE EXTERNAL TABLE</codeph>
@@ -754,7 +759,7 @@ ALTER TABLE <varname>name</varname>
                                can recurse only for <codeph>CHECK</codeph> constraints.</p>
                        <p>These <codeph>ALTER PARTITION</codeph> operations are supported if no
                                data is changed on a partitioned table that contains a leaf child
-                                partition that has been exchanged to use an external table
+                                partition that has been exchanged to use an external table.
                                Otherwise, an error is returned.<ul id="ul_hcw_mrn_qs">
                                        <li>Adding or dropping a column.</li>
                                        <li>Changing the data type of column.</li>
@@ -788,6 +793,8 @@ ALTER TABLE <varname>name</varname>
 (char_length(zipcode) = 5);</codeblock>
                        <p>Move a table to a different schema:</p>
                        <codeblock>ALTER TABLE myschema.distributors SET SCHEMA yourschema;</codeblock>
+                        <p>Change the distribution policy of a table to replicated:</p>
+                        <codeblock>ALTER TABLE myschema.distributors SET DISTRIBUTED REPLICATED;</codeblock>
                        <p>Add a new partition to a partitioned table:</p>
                        <codeblock>ALTER TABLE sales ADD PARTITION 
             START (date '2017-02-01') INCLUSIVE 
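
The `SET DISTRIBUTED` wording changed in this file can be illustrated with a short sequence. This is a minimal sketch, not taken from the commit: the table `region_lookup` and its columns are hypothetical, and the comments only restate what the revised text says about when data is moved.

```sql
-- Hypothetical lookup table, initially hash distributed on region_id.
CREATE TABLE region_lookup (
    region_id   int,
    region_name text
) DISTRIBUTED BY (region_id);

-- Changing to a replicated policy: per the revised text, the table data
-- is redistributed, which can be resource intensive on large tables.
ALTER TABLE region_lookup SET DISTRIBUTED REPLICATED;

-- Changing from replicated back to hash also redistributes the data.
ALTER TABLE region_lookup SET DISTRIBUTED BY (region_id);

-- Declaring the same hash policy again does not move data by itself;
-- REORGANIZE=true requests the rewrite explicitly.
ALTER TABLE region_lookup SET WITH (REORGANIZE=true);
```

Because every policy change to or from replicated rewrites the table, it is probably best suited to small, rarely changing tables such as lookup tables.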

--- a/gpdb-doc/dita/ref_guide/sql_commands/COPY.xml
+++ b/gpdb-doc/dita/ref_guide/sql_commands/COPY.xml
@@ -58,6 +58,11 @@ COPY {table [(<varname>column</varname> [, ...])] | (<varname>query</varname>)}
        either the absolute path or the <codeph>&lt;SEG_DATA_DIR></codeph> string literal. When the
          <codeph>COPY</codeph> operation is run, the segment IDs and the paths of the segment data
        directories are substituted for the string literal values. </p>
+      <p>Using <codeph>COPY TO</codeph> with a replicated table (<codeph>DISTRIBUTED
+          REPLICATED</codeph>) as source creates a file with rows from a single segment so that the
+        target file contains no duplicate rows. Using <codeph>COPY TO</codeph> with the <codeph>ON
+          SEGMENT</codeph> clause with a replicated table as source creates target files on segment
+        hosts containing all table rows.</p>
      <p>The <codeph>ON SEGMENT</codeph> clause allows you to copy table data to files on segment
        hosts for use in operations such as migrating data between clusters or performing a backup.
        Segment data created by the <codeph>ON SEGMENT</codeph> clause can be restored by tools such
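
The two `COPY TO` behaviors described in the added paragraph might look like the following. This is a sketch under assumptions: the table `region_lookup` and the output paths are hypothetical, and the comments restate the documented behavior rather than output observed on a cluster.

```sql
-- Hypothetical replicated table.
CREATE TABLE region_lookup (
    region_id   int,
    region_name text
) DISTRIBUTED REPLICATED;

-- Plain COPY TO: rows are read from a single segment, so the target
-- file contains each row once, with no duplicates.
COPY region_lookup TO '/tmp/region_lookup.csv' WITH CSV;

-- COPY TO ... ON SEGMENT: one file is written per segment, and for a
-- replicated table each of those files contains all table rows.
COPY region_lookup TO '<SEG_DATA_DIR>/region_lookup_<SEGID>.csv' ON SEGMENT;
```

Note that loading all of the per-segment files back into one table would duplicate the rows, since each file already holds the full table.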

--- a/gpdb-doc/dita/ref_guide/sql_commands/CREATE_TABLE.xml
+++ b/gpdb-doc/dita/ref_guide/sql_commands/CREATE_TABLE.xml
@@ -23,7 +23,8 @@
    [ WITH ( <varname>storage_parameter</varname>=<varname>value</varname> [, ... ] )
    [ ON COMMIT {PRESERVE ROWS | DELETE ROWS | DROP} ]
    [ TABLESPACE <varname>tablespace</varname> ]
-   [ DISTRIBUTED BY (<varname>column</varname>, [ ... ] ) | DISTRIBUTED RANDOMLY ]
+   [ DISTRIBUTED BY (<varname>column</varname>, [ ... ] ) | DISTRIBUTED RANDOMLY 
+       | DISTRIBUTED REPLICATED ]
    [ PARTITION BY <varname>partition_type</varname> (<varname>column</varname>)
        [ SUBPARTITION BY <varname>partition_type</varname> (<varname>column</varname>) ] 
           [ SUBPARTITION TEMPLATE ( <varname>template_spec </varname>) ]
@@ -152,15 +153,24 @@
        constraint can also be written as a table constraint; a column constraint is only a
        notational convenience for use when the constraint only affects one column. </p>
      <p>When creating a table, there is an additional clause to declare the Greenplum Database
-        distribution policy. If a <codeph>DISTRIBUTED BY</codeph> or <codeph>DISTRIBUTED
-          RANDOMLY</codeph> clause is not supplied, then Greenplum assigns a hash distribution
-        policy to the table using either the <codeph>PRIMARY KEY</codeph> (if the table has one) or
-        the first column of the table as the distribution key. Columns of geometric or user-defined
-        data types are not eligible as Greenplum distribution key columns. If a table does not have
-        a column of an eligible data type, the rows are distributed based on a round-robin or random
-        distribution. To ensure an even distribution of data in your Greenplum Database system, you
-        want to choose a distribution key that is unique for each record, or if that is not
-        possible, then choose <codeph>DISTRIBUTED RANDOMLY</codeph>.</p>
+        distribution policy. If a <codeph>DISTRIBUTED BY</codeph>, <codeph>DISTRIBUTED
+          RANDOMLY</codeph>, or <codeph>DISTRIBUTED REPLICATED</codeph> clause is not supplied, then
+        Greenplum Database assigns a hash distribution policy to the table using either the
+          <codeph>PRIMARY KEY</codeph> (if the table has one) or the first column of the table as
+        the distribution key. Columns of geometric or user-defined data types are not eligible as
+        Greenplum distribution key columns. If a table does not have a column of an eligible data
+        type, the rows are distributed based on a round-robin or random distribution. To ensure an
+        even distribution of data in your Greenplum Database system, you want to choose a
+        distribution key that is unique for each record, or if that is not possible, then choose
+          <codeph>DISTRIBUTED RANDOMLY</codeph>. </p>
+      <p>If the <codeph>DISTRIBUTED REPLICATED</codeph> clause is supplied, Greenplum Database
+        distributes all rows of the table to all segments in the Greenplum Database system. This
+        option can be used in cases where user-defined functions must execute on the segments, and
+        the functions require access to all rows of the table. Replicated tables can also be used
+        to improve query performance by preventing broadcast motions for the table. The
+          <codeph>DISTRIBUTED REPLICATED</codeph> clause cannot be used with the <codeph>PARTITION
+          BY</codeph> clause or the <codeph>INHERITS</codeph> clause. A replicated table also cannot
+        be inherited by another table.</p>
      <p>The <codeph>PARTITION BY</codeph> clause allows you to divide the table into multiple
        sub-tables (or parts) that, taken together, make up the parent table and share its schema.
        Though the sub-tables exist as independent tables, the Greenplum Database restricts their
@@ -321,8 +331,8 @@
            identifying a set of columns as primary key also provides metadata about the design of
            the schema, as a primary key implies that other tables may rely on this set of columns
            as a unique identifier for rows. For a table to have a primary key, it must be hash
-            distributed (not randomly distributed), and the primary key The column(s) that are
-            unique must contain all the columns of the Greenplum distribution key. In addition, the
+            distributed (not randomly distributed), and the primary key, the column(s) that are
+            unique, must contain all the columns of the Greenplum distribution key. In addition, the
              <codeph>&lt;key&gt;</codeph> must contain all the columns in the partition key if the
            table is partitioned. Note that a <codeph>&lt;key&gt;</codeph> constraint in a
            partitioned table is not the same as a simple <codeph>UNIQUE INDEX</codeph>.</pd>
@@ -452,6 +462,7 @@
        <plentry>
          <pt>DISTRIBUTED BY (<varname>column</varname>, [ ... ] )</pt>
          <pt>DISTRIBUTED RANDOMLY</pt>
+          <pt>DISTRIBUTED REPLICATED</pt>
          <pd>Used to declare the Greenplum Database distribution policy for the table.
              <codeph>DISTRIBUTED BY</codeph> uses hash distribution with one or more columns
            declared as the distribution key. For the most even data distribution, the distribution
@@ -479,7 +490,14 @@
                    <cmdname>DISTRIBUTED BY</cmdname> clause is not specified as part of the table
                  creation command, the command fails.</li>
              </ul></p></pd>
-          <pd>For information about the parameter, see "Server Configuration Parameters."</pd>
+          <pd>For more information about setting the default table distribution policy, see <xref
+              href="../config_params/guc-list.xml#gp_create_table_random_default_distribution"
+              type="section" format="dita"
+                ><codeph>gp_create_table_random_default_distribution</codeph></xref>. </pd>
+          <pd>The <codeph>DISTRIBUTED REPLICATED</codeph> clause replicates the entire table to all
+            Greenplum Database segment instances. It can be used when it is necessary to execute
+            user-defined functions on segments when the functions require access to all rows in the
+            table, or to improve query performance by preventing broadcast motions. </pd>
        </plentry>
        <plentry id="part_by">
          <pt>PARTITION BY</pt>
@@ -573,8 +591,8 @@
      <title>Notes</title>
      <ul id="ul_stf_sl1_tt">
        <li>In Greenplum Database (a Postgres-based system) the data types <codeph>VARCHAR</codeph>
-          or <codeph>TEXT</codeph> handles padding added to the textual data (space characters added
-          after the last non-space character) as significant characters, the data type
+          or <codeph>TEXT</codeph> handle padding added to the textual data (space characters added
+          after the last non-space character) as significant characters; the data type
            <codeph>CHAR</codeph> does not.<p>In Greenplum Database, values of type
                <codeph>CHAR(<varname>n</varname>)</codeph> are padded with trailing spaces to the
            specified width <varname>n</varname>. The values are stored and displayed with the
@@ -598,10 +616,12 @@
          as (or a superset of) the table's distribution key columns. Also, the distribution key
          must be a left-subset of the constraint columns with the columns in the correct order. For
          example, if the primary key is (a,b,c), the distribution key can be only one of the
-          following: (a), (a,b), or (a,b,c).<p>A primary key constraint is simply a combination of a
-            unique constraint and a not-null constraint.</p><p>Greenplum Database automatically
-            creates a <codeph>UNIQUE</codeph> index for each <codeph>UNIQUE</codeph> or
-              <codeph>PRIMARY KEY</codeph> constraint to enforce uniqueness. Thus, it is not
+          following: (a), (a,b), or (a,b,c).<p>Replicated tables (<codeph>DISTRIBUTED
+              REPLICATED</codeph>) can have both <codeph>PRIMARY KEY</codeph> and
+              <codeph>UNIQUE</codeph> column constraints.</p><p>A primary key constraint is simply a
+            combination of a unique constraint and a not-null constraint.</p><p>Greenplum Database
+            automatically creates a <codeph>UNIQUE</codeph> index for each <codeph>UNIQUE</codeph>
+            or <codeph>PRIMARY KEY</codeph> constraint to enforce uniqueness. Thus, it is not
            necessary to create an index explicitly for primary key columns. <codeph>UNIQUE</codeph>
            and <codeph>PRIMARY KEY</codeph> constraints are not allowed on append-optimized tables
            because the <codeph>UNIQUE</codeph> indexes that are created by the constraints are not
@@ -611,8 +631,8 @@
            implementation.</p></li>
        <li>For append-optimized tables, <codeph>UPDATE</codeph> and <codeph>DELETE</codeph> are not
          allowed in a serializable transaction and will cause the transaction to abort.
-            <codeph>CLUSTER</codeph>, <codeph>DECLARE...FOR</codeph><codeph>UPDATE</codeph>, and
-          triggers are not supported with append-optimized tables.</li>
+            <codeph>CLUSTER</codeph>, <codeph>DECLARE...FOR UPDATE</codeph>, and triggers are not
+          supported with append-optimized tables.</li>
        <li>To insert data into a partitioned table, you specify the root partitioned table, the
          table created with the <codeph>CREATE TABLE</codeph> command. You also can specify a leaf
          child table of the partitioned table in an <codeph>INSERT</codeph> command. An error is
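
The `DISTRIBUTED REPLICATED` rules added to this file can be sketched as follows. The table names are hypothetical, and the statement described as rejected is left commented out because that outcome is taken from the text above, not from a test run.

```sql
-- Hypothetical replicated table. Per the added note, replicated tables
-- can carry PRIMARY KEY and UNIQUE column constraints.
CREATE TABLE country_codes (
    code char(2) PRIMARY KEY,
    name text UNIQUE
) DISTRIBUTED REPLICATED;

-- Per the added text, a replicated table cannot be inherited by
-- another table, so a statement like this is expected to fail:
-- CREATE TABLE country_codes_eu () INHERITS (country_codes);
```

Since every segment holds the full table, joins against `country_codes` should not require a broadcast motion for that table, which is the query-performance case the new paragraph mentions.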

--- a/gpdb-doc/dita/ref_guide/sql_commands/CREATE_TABLE_AS.xml
+++ b/gpdb-doc/dita/ref_guide/sql_commands/CREATE_TABLE_AS.xml
@@ -13,7 +13,8 @@
    [ON COMMIT {PRESERVE ROWS | DELETE ROWS | DROP}]
    [TABLESPACE <varname>tablespace</varname>]
    AS <varname>query</varname>
-   [DISTRIBUTED BY (<varname>column</varname>, [ ... ] ) | DISTRIBUTED RANDOMLY]</codeblock>
+   [DISTRIBUTED BY (<varname>column</varname>, [ ... ] ) | DISTRIBUTED RANDOMLY 
+      | DISTRIBUTED REPLICATED ]</codeblock>
      <p>where <varname>storage_parameter</varname> is:</p>
      <codeblock>   APPENDONLY={TRUE|FALSE}
    BLOCKSIZE={8192-2097152}
@@ -134,12 +135,16 @@
        <plentry>
          <pt>DISTRIBUTED BY (<varname>column</varname>, [ ... ] )</pt>
          <pt>DISTRIBUTED RANDOMLY</pt>
+          <pt>DISTRIBUTED REPLICATED</pt>
          <pd>Used to declare the Greenplum Database distribution policy for the table.
              <codeph>DISTRIBUTED BY</codeph> uses hash distribution with one or more columns
            declared as the distribution key. For the most even data distribution, the distribution
            key should be the primary key of the table or a unique column (or set of columns). If
            that is not possible, then you may choose <codeph>DISTRIBUTED RANDOMLY</codeph>, which
-            will send the data round-robin to the segment instances. </pd>
+            will send the data round-robin to the segment instances.</pd>
+          <pd><codeph>DISTRIBUTED REPLICATED</codeph> replicates all rows in the table to all
+            Greenplum Database segments. It cannot be used with partitioned tables or with tables
+            that inherit from other tables.</pd>
          <pd>The Greenplum Database server configuration parameter
              <codeph>gp_create_table_random_default_distribution</codeph> controls the default
            table distribution policy if the <cmdname>DISTRIBUTED BY</cmdname> clause is not
@@ -151,11 +156,14 @@
              <li>If the legacy query optimizer creates the table, and the value of the parameter is
                  <codeph>on</codeph>, the table distribution policy is random.</li>
              <li>If GPORCA creates the table, the table distribution policy is random. The
-                parameter value has no affect. </li>
+                parameter value has no effect. </li>
            </ul></pd>
-          <pd>For information about the parameter, see "Server Configuration Parameters." For
-            information about the legacy query optimizer and GPORCA, see "Querying Data" in the
-              <cite>Greenplum Database Administrator Guide</cite>. </pd>
+          <pd>For more information about setting the default table distribution policy, see <xref
+              href="../config_params/guc-list.xml#gp_create_table_random_default_distribution"
+                ><codeph>gp_create_table_random_default_distribution</codeph></xref>. For
+            information about the legacy query optimizer and GPORCA, see <xref
+              href="../../admin_guide/query/topics/query.xml#topic1">Querying Data</xref> in the
+              <cite>Greenplum Database Administrator Guide</cite>.</pd>
        </plentry>
      </parml>
    </section>
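
For `CREATE TABLE AS`, the new clause goes after the query, as in the revised synopsis. A minimal sketch with hypothetical table names:

```sql
-- Hypothetical hash-distributed source table.
CREATE TABLE orders (
    order_id int,
    status   text
) DISTRIBUTED BY (order_id);

-- Build a small replicated table from a query; every segment receives
-- all rows of the result.
CREATE TABLE order_statuses AS
    SELECT DISTINCT status
    FROM orders
DISTRIBUTED REPLICATED;
```

Naming the policy explicitly also avoids relying on the default, which per the text above depends on `gp_create_table_random_default_distribution` and on which optimizer creates the table.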