Unverified commit fa01ff3d. Author: Chuck Litzell. Committer: GitHub.

Modify SQL references for replicated tables (#5883)

* Modify SQL references for replicated tables

* Changes for edits and review comments
Parent 04e43e64
......@@ -16,7 +16,8 @@ ALTER TABLE <varname>name</varname> SET SCHEMA <varname>new_schema</varname>
ALTER TABLE [ONLY] <varname>name</varname> SET
    WITH (REORGANIZE=true|false)
   | DISTRIBUTED BY (<varname>column</varname>, [ ... ] )
   | DISTRIBUTED RANDOMLY
   | DISTRIBUTED RANDOMLY
| DISTRIBUTED REPLICATED
ALTER TABLE [ONLY] <varname>name</varname> <varname>action</varname> [, ... ]
......@@ -205,10 +206,11 @@ ALTER TABLE <varname>name</varname>
fillfactors are appropriate. Note that the table contents
will not be modified immediately by this command. You will
need to rewrite the table to get the desired effects.</li>
<li id="ay141947"><b>SET DISTRIBUTED</b> —Changes the distribution
policy of a table. Changes to a hash distribution policy
will cause the table data to be physically redistributed on
disk, which can be resource intensive.</li>
<li id="ay141947"><b>SET DISTRIBUTED</b> — Changes the distribution
policy of a table. Changing a hash distribution policy, or
changing to or from a replicated policy, will cause the
table data to be physically redistributed on disk, which can
be resource intensive.</li>
<li id="ay137232"><b>INHERIT <varname>parent_table</varname> / NO
INHERIT <varname>parent_table</varname></b> — Adds
or removes the target table as a child of the specified
......@@ -217,12 +219,12 @@ ALTER TABLE <varname>name</varname>
target table must already contain all the same columns as
the parent (it could have additional columns, too). The
columns must have matching data types, and if they have
<codeph>NOT</codeph><codeph>NULL</codeph>
constraints in the parent then they must also have
<codeph>NOT NULL</codeph> constraints in the child.
There must also be matching child-table constraints for all
<codeph>CHECK</codeph> constraints of the
parent.</li>
<codeph>NOT NULL</codeph> constraints in the parent
then they must also have <codeph>NOT NULL</codeph>
constraints in the child. There must also be matching
child-table constraints for all <codeph>CHECK</codeph>
constraints of the parent. Neither the target table nor the
parent table may be replicated tables.</li>
<li id="ay137262"><b>OWNER</b> — Changes the owner of the table,
sequence, or view to the specified user. </li>
<li id="ay137271"><b>SET TABLESPACE</b> — Changes the table's
......@@ -412,15 +414,16 @@ ALTER TABLE <varname>name</varname>
</plentry>
<plentry>
<pt>DISTRIBUTED BY (<varname>column</varname>) | DISTRIBUTED
RANDOMLY</pt>
RANDOMLY | DISTRIBUTED REPLICATED</pt>
<pd>Specifies the distribution policy for a table. Changing
a hash distribution policy will cause the table data
to be physically redistributed on disk, which can be
resource intensive. If you declare the same hash
distribution policy or change from hash to random
distribution, data will not be redistributed unless
you declare <codeph>SET WITH
(REORGANIZE=true)</codeph>.</pd>
a hash distribution policy causes the table data to
be physically redistributed, which can be resource
intensive. If you declare the same hash distribution
policy or change from hash to random distribution,
data will not be redistributed unless you declare
<codeph>SET WITH (REORGANIZE=true)</codeph>.</pd>
<pd>Changing to or from a replicated distribution policy
causes the table data to be redistributed.</pd>
</plentry>
<plentry>
<pt>REORGANIZE=true|false</pt>
......@@ -583,7 +586,9 @@ ALTER TABLE <varname>name</varname>
You can exchange a table where the table data is
stored in the database. For example, the table is
created with the <codeph>CREATE TABLE</codeph>
command. </pd>
command. The table must have the same number of
columns, column order, column names, column types,
and distribution policy as the parent table.</pd>
<pd>With the <codeph>EXCHANGE PARTITION</codeph> clause, you
can also exchange a readable external table (created
with the <codeph>CREATE EXTERNAL TABLE</codeph>
......@@ -754,7 +759,7 @@ ALTER TABLE <varname>name</varname>
can recurse only for <codeph>CHECK</codeph> constraints.</p>
<p>These <codeph>ALTER PARTITION</codeph> operations are supported if no
data is changed on a partitioned table that contains a leaf child
partition that has been exchanged to use an external table
partition that has been exchanged to use an external table.
Otherwise, an error is returned.<ul id="ul_hcw_mrn_qs">
<li>Adding or dropping a column.</li>
<li>Changing the data type of a column.</li>
......@@ -788,6 +793,8 @@ ALTER TABLE <varname>name</varname>
(char_length(zipcode) = 5);</codeblock>
<p>Move a table to a different schema:</p>
<codeblock>ALTER TABLE myschema.distributors SET SCHEMA yourschema;</codeblock>
<p>Change the distribution policy of a table to replicated:</p>
<codeblock>ALTER TABLE myschema.distributors SET DISTRIBUTED REPLICATED;</codeblock>
<p>Add a new partition to a partitioned table:</p>
<codeblock>ALTER TABLE sales ADD PARTITION
            START (date '2017-02-01') INCLUSIVE
......
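The distribution-policy changes described in the ALTER TABLE hunks above can be sketched as follows. This is a minimal illustration, not part of the commit: the table `region_codes` and its columns are hypothetical, and the comments restate only what the revised reference text says about when data is redistributed.

```sql
-- Hypothetical table, created with a hash distribution policy.
CREATE TABLE region_codes (code int, name text) DISTRIBUTED BY (code);

-- Changing to a replicated policy redistributes the table data:
-- every segment ends up with all rows.
ALTER TABLE region_codes SET DISTRIBUTED REPLICATED;

-- Changing from a replicated policy back to hash also redistributes the data.
ALTER TABLE region_codes SET DISTRIBUTED BY (code);

-- Declaring the same hash policy again does not move data unless
-- REORGANIZE=true is declared.
ALTER TABLE region_codes SET WITH (REORGANIZE=true);
```

Because redistribution can be resource intensive, a change to or from `DISTRIBUTED REPLICATED` on a large table is probably best scheduled like any other table rewrite.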
......@@ -58,6 +58,11 @@ COPY {table [(<varname>column</varname> [, ...])] | (<varname>query</varname>)}
either the absolute path or the <codeph>&lt;SEG_DATA_DIR></codeph> string literal. When the
<codeph>COPY</codeph> operation is run, the segment IDs and the paths of the segment data
directories are substituted for the string literal values. </p>
<p>Using <codeph>COPY TO</codeph> with a replicated table (<codeph>DISTRIBUTED
REPLICATED</codeph>) as source creates a file with rows from a single segment so that the
target file contains no duplicate rows. Using <codeph>COPY TO</codeph> with the <codeph>ON
SEGMENT</codeph> clause with a replicated table as source creates target files on segment
hosts containing all table rows.</p>
<p>The <codeph>ON SEGMENT</codeph> clause allows you to copy table data to files on segment
hosts for use in operations such as migrating data between clusters or performing a backup.
Segment data created by the <codeph>ON SEGMENT</codeph> clause can be restored by tools such
......
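The two COPY TO behaviors described in the hunk above might look like this in practice. This is a sketch under assumptions: the table `country_lookup` and both file paths are hypothetical, and the `<SEGID>` placeholder in the second file name is the segment ID string literal that the COPY reference says is substituted at run time.

```sql
-- Hypothetical replicated table, for illustration only.
CREATE TABLE country_lookup (iso_code char(2), name text) DISTRIBUTED REPLICATED;

-- Without ON SEGMENT, rows are taken from a single segment,
-- so the target file contains no duplicate rows.
COPY country_lookup TO '/tmp/country_lookup.txt';

-- With ON SEGMENT, a target file is created on each segment host.
-- For a replicated table, each of those files contains all table rows.
COPY country_lookup TO '<SEG_DATA_DIR>/country_lookup_<SEGID>.txt' ON SEGMENT;
```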
......@@ -23,7 +23,8 @@
   [ WITH ( <varname>storage_parameter</varname>=<varname>value</varname> [, ... ] )
   [ ON COMMIT {PRESERVE ROWS | DELETE ROWS | DROP} ]
   [ TABLESPACE <varname>tablespace</varname> ]
   [ DISTRIBUTED BY (<varname>column</varname>, [ ... ] ) | DISTRIBUTED RANDOMLY ]
   [ DISTRIBUTED BY (<varname>column</varname>, [ ... ] ) | DISTRIBUTED RANDOMLY
| DISTRIBUTED REPLICATED ]
   [ PARTITION BY <varname>partition_type</varname> (<varname>column</varname>)
       [ SUBPARTITION BY <varname>partition_type</varname> (<varname>column</varname>) ]
          [ SUBPARTITION TEMPLATE ( <varname>template_spec </varname>) ]
......@@ -152,15 +153,24 @@
constraint can also be written as a table constraint; a column constraint is only a
notational convenience for use when the constraint only affects one column. </p>
<p>When creating a table, there is an additional clause to declare the Greenplum Database
distribution policy. If a <codeph>DISTRIBUTED BY</codeph> or <codeph>DISTRIBUTED
RANDOMLY</codeph> clause is not supplied, then Greenplum assigns a hash distribution
policy to the table using either the <codeph>PRIMARY KEY</codeph> (if the table has one) or
the first column of the table as the distribution key. Columns of geometric or user-defined
data types are not eligible as Greenplum distribution key columns. If a table does not have
a column of an eligible data type, the rows are distributed based on a round-robin or random
distribution. To ensure an even distribution of data in your Greenplum Database system, you
want to choose a distribution key that is unique for each record, or if that is not
possible, then choose <codeph>DISTRIBUTED RANDOMLY</codeph>.</p>
distribution policy. If a <codeph>DISTRIBUTED BY</codeph>, <codeph>DISTRIBUTED
RANDOMLY</codeph>, or <codeph>DISTRIBUTED REPLICATED</codeph> clause is not supplied, then
Greenplum Database assigns a hash distribution policy to the table using either the
<codeph>PRIMARY KEY</codeph> (if the table has one) or the first column of the table as
the distribution key. Columns of geometric or user-defined data types are not eligible as
Greenplum distribution key columns. If a table does not have a column of an eligible data
type, the rows are distributed based on a round-robin or random distribution. To ensure an
even distribution of data in your Greenplum Database system, you want to choose a
distribution key that is unique for each record, or if that is not possible, then choose
<codeph>DISTRIBUTED RANDOMLY</codeph>. </p>
<p>If the <codeph>DISTRIBUTED REPLICATED</codeph> clause is supplied, Greenplum Database
distributes all rows of the table to all segments in the Greenplum Database system. This
option can be used in cases where user-defined functions must execute on the segments, and
the functions require access to all rows of the table. Replicated tables can also be used
to improve query performance by preventing broadcast motions for the table. The
<codeph>DISTRIBUTED REPLICATED</codeph> clause cannot be used with the <codeph>PARTITION
BY</codeph> clause or the <codeph>INHERITS</codeph> clause. A replicated table also cannot
be inherited by another table.</p>
<p>The <codeph>PARTITION BY</codeph> clause allows you to divide the table into multiple
sub-tables (or parts) that, taken together, make up the parent table and share its schema.
Though the sub-tables exist as independent tables, the Greenplum Database restricts their
......@@ -321,8 +331,8 @@
identifying a set of columns as primary key also provides metadata about the design of
the schema, as a primary key implies that other tables may rely on this set of columns
as a unique identifier for rows. For a table to have a primary key, it must be hash
distributed (not randomly distributed), and the primary key The column(s) that are
unique must contain all the columns of the Greenplum distribution key. In addition, the
distributed (not randomly distributed), and the primary key, the column(s) that are
unique, must contain all the columns of the Greenplum distribution key. In addition, the
<codeph>&lt;key&gt;</codeph> must contain all the columns in the partition key if the
table is partitioned. Note that a <codeph>&lt;key&gt;</codeph> constraint in a
partitioned table is not the same as a simple <codeph>UNIQUE INDEX</codeph>.</pd>
......@@ -452,6 +462,7 @@
<plentry>
<pt>DISTRIBUTED BY (<varname>column</varname>, [ ... ] )</pt>
<pt>DISTRIBUTED RANDOMLY</pt>
<pt>DISTRIBUTED REPLICATED</pt>
<pd>Used to declare the Greenplum Database distribution policy for the table.
<codeph>DISTRIBUTED BY</codeph> uses hash distribution with one or more columns
declared as the distribution key. For the most even data distribution, the distribution
......@@ -479,7 +490,14 @@
<cmdname>DISTRIBUTED BY</cmdname> clause is not specified as part of the table
creation command, the command fails.</li>
</ul></p></pd>
<pd>For information about the parameter, see "Server Configuration Parameters."</pd>
<pd>For more information about setting the default table distribution policy, see <xref
href="../config_params/guc-list.xml#gp_create_table_random_default_distribution"
type="section" format="dita"
><codeph>gp_create_table_random_default_distribution</codeph></xref>. </pd>
<pd>The <codeph>DISTRIBUTED REPLICATED</codeph> clause replicates the entire table to all
Greenplum Database segment instances. It can be used when it is necessary to execute
user-defined functions on segments when the functions require access to all rows in the
table, or to improve query performance by preventing broadcast motions. </pd>
</plentry>
<plentry id="part_by">
<pt>PARTITION BY</pt>
......@@ -573,8 +591,8 @@
<title>Notes</title>
<ul id="ul_stf_sl1_tt">
<li>In Greenplum Database (a Postgres-based system) the data types <codeph>VARCHAR</codeph>
or <codeph>TEXT</codeph> handles padding added to the textual data (space characters added
after the last non-space character) as significant characters, the data type
or <codeph>TEXT</codeph> handle padding added to the textual data (space characters added
after the last non-space character) as significant characters; the data type
<codeph>CHAR</codeph> does not.<p>In Greenplum Database, values of type
<codeph>CHAR(<varname>n</varname>)</codeph> are padded with trailing spaces to the
specified width <varname>n</varname>. The values are stored and displayed with the
......@@ -598,10 +616,12 @@
as (or a superset of) the table's distribution key columns. Also, the distribution key
must be a left-subset of the constraint columns with the columns in the correct order. For
example, if the primary key is (a,b,c), the distribution key can be only one of the
following: (a), (a,b), or (a,b,c).<p>A primary key constraint is simply a combination of a
unique constraint and a not-null constraint.</p><p>Greenplum Database automatically
creates a <codeph>UNIQUE</codeph> index for each <codeph>UNIQUE</codeph> or
<codeph>PRIMARY KEY</codeph> constraint to enforce uniqueness. Thus, it is not
following: (a), (a,b), or (a,b,c).<p>Replicated tables (<codeph>DISTRIBUTED
REPLICATED</codeph>) can have both <codeph>PRIMARY KEY</codeph> and
<codeph>UNIQUE</codeph> column constraints.</p><p>A primary key constraint is simply a
combination of a unique constraint and a not-null constraint.</p><p>Greenplum Database
automatically creates a <codeph>UNIQUE</codeph> index for each <codeph>UNIQUE</codeph>
or <codeph>PRIMARY KEY</codeph> constraint to enforce uniqueness. Thus, it is not
necessary to create an index explicitly for primary key columns. <codeph>UNIQUE</codeph>
and <codeph>PRIMARY KEY</codeph> constraints are not allowed on append-optimized tables
because the <codeph>UNIQUE</codeph> indexes that are created by the constraints are not
......@@ -611,8 +631,8 @@
implementation.</p></li>
<li>For append-optimized tables, <codeph>UPDATE</codeph> and <codeph>DELETE</codeph> are not
allowed in a serializable transaction and will cause the transaction to abort.
<codeph>CLUSTER</codeph>, <codeph>DECLARE...FOR</codeph><codeph>UPDATE</codeph>, and
triggers are not supported with append-optimized tables.</li>
<codeph>CLUSTER</codeph>, <codeph>DECLARE...FOR UPDATE</codeph>, and triggers are not
supported with append-optimized tables.</li>
<li>To insert data into a partitioned table, you specify the root partitioned table, the
table created with the <codeph>CREATE TABLE</codeph> command. You also can specify a leaf
child table of the partitioned table in an <codeph>INSERT</codeph> command. An error is
......
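As a small illustration of the CREATE TABLE changes above, the following sketch creates a replicated table. The table and column names are hypothetical. The `PRIMARY KEY` is included only because the revised notes state that replicated tables can have both `PRIMARY KEY` and `UNIQUE` column constraints.

```sql
-- Hypothetical lookup table: all rows are stored on every segment.
CREATE TABLE country_lookup (
    iso_code char(2) PRIMARY KEY,  -- allowed on a replicated table per the revised notes
    name     text NOT NULL
) DISTRIBUTED REPLICATED;

-- Per the revised text, the following combinations are not permitted
-- and are shown here as comments only:
--   DISTRIBUTED REPLICATED together with PARTITION BY
--   DISTRIBUTED REPLICATED together with INHERITS
--   another table inheriting from a replicated table
```

A small, frequently joined dimension table is the kind of case the text has in mind: keeping a full copy on every segment can avoid a broadcast motion at query time.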
......@@ -13,7 +13,8 @@
   [ON COMMIT {PRESERVE ROWS | DELETE ROWS | DROP}]
   [TABLESPACE <varname>tablespace</varname>]
   AS <varname>query</varname>
   [DISTRIBUTED BY (<varname>column</varname>, [ ... ] ) | DISTRIBUTED RANDOMLY]</codeblock>
   [DISTRIBUTED BY (<varname>column</varname>, [ ... ] ) | DISTRIBUTED RANDOMLY
| DISTRIBUTED REPLICATED ]</codeblock>
<p>where <varname>storage_parameter</varname> is:</p>
<codeblock>   APPENDONLY={TRUE|FALSE}
   BLOCKSIZE={8192-2097152}
......@@ -134,12 +135,16 @@
<plentry>
<pt>DISTRIBUTED BY (<varname>column</varname>, [ ... ] )</pt>
<pt>DISTRIBUTED RANDOMLY</pt>
<pt>DISTRIBUTED REPLICATED</pt>
<pd>Used to declare the Greenplum Database distribution policy for the table.
<codeph>DISTRIBUTED BY</codeph> uses hash distribution with one or more columns
declared as the distribution key. For the most even data distribution, the distribution
key should be the primary key of the table or a unique column (or set of columns). If
that is not possible, then you may choose <codeph>DISTRIBUTED RANDOMLY</codeph>, which
will send the data round-robin to the segment instances. </pd>
will send the data round-robin to the segment instances.</pd>
<pd><codeph>DISTRIBUTED REPLICATED</codeph> replicates all rows in the table to all
Greenplum Database segments. It cannot be used with partitioned tables or with tables
that inherit from other tables.</pd>
<pd>The Greenplum Database server configuration parameter
<codeph>gp_create_table_random_default_distribution</codeph> controls the default
table distribution policy if the <cmdname>DISTRIBUTED BY</cmdname> clause is not
......@@ -151,11 +156,14 @@
<li>If the legacy query optimizer creates the table, and the value of the parameter is
<codeph>on</codeph>, the table distribution policy is random.</li>
<li>If GPORCA creates the table, the table distribution policy is random. The
parameter value has no affect. </li>
parameter value has no effect. </li>
</ul></pd>
<pd>For information about the parameter, see "Server Configuration Parameters." For
information about the legacy query optimizer and GPORCA, see "Querying Data" in the
<cite>Greenplum Database Administrator Guide</cite>. </pd>
<pd>For more information about setting the default table distribution policy, see <xref
href="../config_params/guc-list.xml#gp_create_table_random_default_distribution"
><codeph>gp_create_table_random_default_distribution</codeph></xref>. For
information about the legacy query optimizer and GPORCA, see <xref
href="../../admin_guide/query/topics/query.xml#topic1">Querying Data</xref> in the
<cite>Greenplum Database Administrator Guide</cite>.</pd>
</plentry>
</parml>
</section>
......
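The CREATE TABLE AS synopsis change above places the new clause after the query. A minimal sketch, assuming a hypothetical source table named `country_lookup` already exists:

```sql
-- Create a replicated copy of the (hypothetical) source table.
-- The distribution clause follows the AS query, as in the synopsis.
CREATE TABLE country_lookup_copy AS
    SELECT * FROM country_lookup
DISTRIBUTED REPLICATED;
```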