Commit 3373508e authored by Mel Kiyama, committed by David Yozie

docs - move install guide to gpdb repo (#8666)

* docs - move install guide to gpdb repo

--move Install Guide source files back to gpdb repo.
--update config.yml and gpdb-landing-subnav.erb files for OSS doc builds.
--removed refs directory - unused utility reference pages.
--Also added more info to creating a gpadmin user.

These files have conditionalized text (pivotal and oss-only).

./supported-platforms.xml
./install_gpdb.xml
./apx_mgmt_utils.xml
./install_guide.ditamap
./preinstall_concepts.xml
./migrate.xml
./install_modules.xml
./prep_os.xml
./upgrading.xml

* docs - updated supported platforms with PXF information.

* docs - install guide review comment update

-- renamed one file from supported-platforms.xml to platform-requirements.xml

* docs - reworded requirement/warning based on review comments.
Parent 693b28e1
@@ -23,6 +23,13 @@ sections:
dita_sections:
- repository:
name: dita
at_path: install_guide
directory: 6-0/install_guide
ditamap_location: install_guide.ditamap
ditaval_location: ../gpdb-webhelp.ditaval
- repository:
name: dita
at_path: admin_guide
......
@@ -5,6 +5,9 @@
<li>
<a href="/6-0/common/gpdb-features.html">Greenplum Database&reg; 6.0 Documentation</a>
</li>
<li>
<a href="/6-0/install_guide/install_guide.html">Installation Guide</a>
</li>
<li>
<a href="/6-0/admin_guide/admin_guide.html">Administrator Guide</a>
</li>
......
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic
PUBLIC "-//OASIS//DTD DITA Composite//EN" "ditabase.dtd">
<topic id="topic1" xml:lang="en">
<title id="ji162018">About Implicit Text Casting in Greenplum Database</title>
<shortdesc>Greenplum Database version 4.3.x is based on PostgreSQL version 8.2. Greenplum Database
version 6.x is based on PostgreSQL version 9.4. PostgreSQL 8.3 removed automatic implicit casts
between the <codeph>text</codeph> type and other data types. When you migrate from Greenplum
Database version 4.3.x to version 6, this change in behavior might impact existing applications
and queries. </shortdesc>
<body>
<p>For information about how Greenplum Database 6 performs type casts, see <xref
href="../admin_guide/query/topics/defining-queries.xml#topic14" scope="peer">Type
Casts</xref><ph otherprops="op-print">in the <cite>Greenplum Database Administrator
Guide</cite></ph>.</p>
<p><b>What is different in Greenplum Database 6</b></p>
<p>Greenplum Database 6 does not automatically implicitly cast between text and other data
types. Greenplum Database 6 also treats certain automatic implicit casts differently than
version 4.3.x, and in some cases does not handle them at all. <b>Applications or queries that
you wrote for Greenplum Database 4.3.x that rely on automatic implicit casting may fail on
Greenplum Database version 6.</b></p>
<p>(The term <i>implicit cast</i>, when used in the remainder of this section, refers to
implicit casts automatically applied by Greenplum Database.)</p>
<ul>
<li>Greenplum Database 6 has downgraded implicit casts in the to-text type direction; these
casts are now treated as assignment casts. A cast from a data type to the text type will
continue to work in Greenplum Database 6 if used in assignment contexts.</li>
<li>Greenplum Database 6 no longer automatically provides an implicit cast in the to-text type
direction that can be used in expression contexts. Additionally, Greenplum Database 6 no
longer provides implicit casts in the from-text type direction. When such expressions or
assignments are encountered, Greenplum Database 6 returns an error and the following
message:<codeblock>HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.</codeblock>To
illustrate, suppose you create two
tables:<codeblock>CREATE TABLE foo (a int) DISTRIBUTED RANDOMLY ;
CREATE TABLE bar (b text) DISTRIBUTED RANDOMLY ;</codeblock>
The following examples demonstrate certain types of text comparison queries that will fail
on Greenplum Database 6. <note>This is not an exhaustive list of failure scenarios.</note><ul>
<li>Queries that reference <codeph>text</codeph> type and non-text type columns in an
expression. In this example query, the comparison expression returns a cast
error.<codeblock>SELECT * FROM foo, bar WHERE foo.a = bar.b;
ERROR: operator does not exist: integer = text
LINE 1: SELECT * FROM foo, bar WHERE foo.a = bar.b;
^
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.</codeblock>The
updated example casts the <codeph>text</codeph> type to an <codeph>integer</codeph>
type.<codeblock>SELECT * FROM foo, bar WHERE foo.a = bar.b::int;</codeblock></li>
<li>Queries that mix the <codeph>text</codeph> type and non-text type columns in function
and aggregate arguments. In this example, the query that executes the example function
<codeph>concat</codeph> returns a cast
error.<codeblock>CREATE FUNCTION concat(TEXT, TEXT)
RETURNS TEXT AS $$
SELECT $1 || $2
$$ STRICT LANGUAGE SQL;
SELECT concat('a'::TEXT, 2);</codeblock>Adding
an explicit cast from <codeph>integer</codeph> to <codeph>text</codeph> fixes the
issue.<codeblock>SELECT concat('a', 2::text);</codeblock></li>
<li>Queries that perform comparisons between a <codeph>text</codeph> type column and a
non-quoted literal such as an <codeph>integer</codeph>, <codeph>number</codeph>,
<codeph>float</codeph>, or <codeph>oid</codeph>. This example query that compares text
and non-quoted integer returns an
error.<codeblock>SELECT * FROM bar WHERE b = 123;</codeblock>Adding an explicit cast to
text fixes the issue.<codeblock>SELECT * FROM bar WHERE b = 123::text;</codeblock></li>
<li>Queries that perform comparisons between a <codeph>date</codeph> type column or
literal and an integer-like column (Greenplum Database internally converts date types to
the text type). This example query that compares an <codeph>integer</codeph> column
with a literal of type <codeph>date</codeph> returns an
error.<codeblock>SELECT * FROM foo WHERE a = '20130101'::DATE;</codeblock>There is no
built-in cast from integer type to <codeph>date</codeph> type. However, you can
explicitly cast an <codeph>integer</codeph> to <codeph>text</codeph> and then to
<codeph>date</codeph>. The updated examples use the <codeph>cast</codeph> and
<codeph>::</codeph>
syntax.<codeblock>SELECT * FROM foo WHERE cast(cast(a AS text) AS date) = '20130101'::date;
SELECT * FROM foo WHERE (a::text)::date = '20130101'::date;</codeblock></li>
</ul></li>
</ul>
<p><b>The only supported workaround for the implicit casting differences between Greenplum
Database versions 4.3.x and 6 is to analyze failing applications and queries and update the
application or query to use explicit casts to fix the failures.</b></p>
<p>If rewriting the application or query is not feasible, you may choose to temporarily work
around the change in behavior introduced by the removal of automatic implicit casts in
Greenplum Database 6. There are two well-known workarounds to this PostgreSQL issue:<ul>
<li>Re-create the implicit casts (described in <xref
href="http://petereisentraut.blogspot.com/2008/03/readding-implicit-casts-in-postgresql.html"
format="html" scope="external">Readding implicit casts in PostgreSQL 8.3</xref>).</li>
<li>Create missing operators (described in <xref
href="http://blog.ioguix.net/postgresql/2010/12/11/Problems-and-workaround-recreating-casts-with-8.3+.html"
format="html" scope="external">Problems and workaround recreating implicit casts using
8.3+</xref>).</li>
</ul>
</p>
<p>The workaround to re-create the implicit casts is not recommended as it breaks concatenation
functionality. With the create missing operators workaround, you create the operators and
functions that implement the comparison expressions that are failing in your applications and
queries.</p>
</body>
<topic id="temp_workaround">
<title>Workaround: Manually Creating Missing Operators</title>
<body>
<note type="warning">Use this workaround only to aid migration to Greenplum Database 6 for
evaluation purposes. Do not use this workaround in a production environment.</note>
<p>When you create an operator, you identify the data types of the left operand and the right
operand. You also identify the name of a function that Greenplum Database invokes to
evaluate the operator expression between the specified data types. The operator function
evaluates the expression by performing either to-text or from-text conversion using the
INPUT/OUTPUT methods of the data types involved. By creating operators for each (text type,
other data type) and (other data type, text type) combination, you effectively implement the
casts that are missing in Greenplum Database 6. </p>
<p>To implement this workaround, complete the following tasks <b>after</b> you install
Greenplum Database 6:</p>
<ol>
<li>Identify and note the names of the Greenplum 6 databases in which you want to create the
missing operators. Consider applying this workaround to all databases in your Greenplum
Database deployment.</li>
<li>Identify a schema in which to create the operators and functions. Use a schema other
than <codeph>pg_catalog</codeph> to ensure that these objects are included in a
<codeph>pg_dump</codeph> or <codeph>gpbackup</codeph> of the database. This procedure
will use a schema named <codeph>cast_fix</codeph> for illustrative purposes.</li>
<li>Review the blog entry <xref
href="http://blog.ioguix.net/postgresql/2010/12/11/Problems-and-workaround-recreating-casts-with-8.3+.html"
format="html" scope="external">Problems and workaround recreating implicit casts using
8.3+</xref>. The blog discusses this temporary workaround to the casting issue, i.e.
creating missing operators. It also references a SQL script that you can run to create a
set of equality (<codeph>=</codeph>) operators and functions for several text and other
data type comparisons.</li>
<li>Download the <xref href="https://gist.github.com/ioguix/4dd187986c4a1b7e1160"
format="html" scope="external"><codeph>8.3 operator workaround.sql</codeph></xref>
script referenced on the blog page, noting the location to which the file was downloaded
on your local system.</li>
<li>The <codeph>8.3 operator workaround.sql</codeph> script creates the equality operators
and functions. Open the script in the editor of your choice, and examine the contents. For
example, using the <codeph>vi</codeph>
editor:<codeblock>vi "8.3 operator workaround.sql"</codeblock><p>Notice that the script
creates the operators and functions in the <codeph>pg_catalog</codeph> schema.</p></li>
<li>Replace occurrences of <codeph>pg_catalog</codeph> in the script with the name of the
schema that you identified in Step 2, and then save the file and exit the editor. (You
will create this schema in an upcoming step if the schema does not already exist.) For
example:<codeblock>:%s/pg_catalog/cast_fix/g
:wq</codeblock></li>
<li>Analyze your failing queries, identifying the operators and from-type and to-type data
type comparisons that are the source of the failures. Compare this list to the contents of
the <codeph>8.3 operator workaround.sql</codeph> script, and identify the minimum set of
additional operators and left_type/right_type expression combinations that you must
support.</li>
<li>For each operator and left_type/right_type combination that you identify in the previous
step, add <codeph>CREATE</codeph> statements for the following <i>objects</i> to the
<codeph>8.3 operator workaround.sql</codeph> script:<ol>
<li><i>Create the function that implements the left_type operator right_type
comparison.</i> For example, to create a function that implements the greater than
(<codeph>&gt;</codeph>) operator for text (left_type) to integer (right_type)
comparison:<codeblock>CREATE FUNCTION cast_fix.textgtint(text, integer)
RETURNS boolean
STRICT IMMUTABLE LANGUAGE SQL AS $$
SELECT $1 > textin(int4out($2));
$$;</codeblock><p>
Be sure to schema-qualify the function name.</p></li>
<li><i>Create the operator</i>. For example, to create a greater than
(<codeph>&gt;</codeph>) operator for text (left_type) to integer (right_type) type
comparison that specifies the function you created
above:<codeblock>CREATE OPERATOR cast_fix.> (PROCEDURE=cast_fix.textgtint, LEFTARG=text, RIGHTARG=integer, COMMUTATOR=OPERATOR(cast_fix.>))</codeblock><p>
Be sure to schema-qualify the operator and function names.</p></li>
<li>You must create another operator and function if you want the operator to work in
reverse (i.e. using the example above, if you want a greater than operator for integer
(left_type) to text (right_type) comparison.)</li>
</ol>
</li>
<li>For each database that you identified in Step 1, add the missing operators. For example:<ol>
<li>Connect to the database as an administrative user. For
example:<codeblock>$ psql -d database1 -U gpadmin</codeblock></li>
<li>Create the schema if it does not already exist. For example:
<codeblock>CREATE SCHEMA cast_fix;</codeblock></li>
<li>Run the script. For example, if you downloaded the file to the <codeph>/tmp</codeph>
directory:<codeblock>\i '/tmp/8.3 operator workaround.sql'</codeblock></li>
</ol><p>You must create the schema and run the script for every new database that you
create in your Greenplum Database cluster.</p></li>
<li>Identify and note the names of the users/roles to which you want to provide this
capability. Consider exposing this to all roles in your Greenplum Database
deployment.</li>
<li>For each role that you identified in Step 10, add the schema to the role's
<codeph>search_path</codeph>. For
example:<codeblock>SHOW search_path;
ALTER ROLE bill SET search_path TO existing_search_path, cast_fix;</codeblock><p>If
required, also grant schema-level permissions to the role.</p></li>
</ol>
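<p>Step 8 can become repetitive when several operators are missing. The following Python sketch
generates the <codeph>CREATE FUNCTION</codeph> and <codeph>CREATE OPERATOR</codeph> statements
for text (left_type) to integer (right_type) comparisons in the style of the
<codeph>textgtint</codeph> example, with the text operand on the left of the operator. The helper
name <codeph>text_int_operator_ddl</codeph>, the suffix table, and the generated function names
are hypothetical and shown for illustration only. The generated operators omit the optional
<codeph>COMMUTATOR</codeph> clause. Note that these operators compare text values, so results can
differ from a numeric comparison.</p>
<codeblock>
```python
# Hypothetical helper: emits DDL modeled on the textgtint example.
# The suffix table and naming scheme are assumptions for illustration.
OPERATOR_SUFFIX = {">": "gt", ">=": "ge"}


def text_int_operator_ddl(op, schema="cast_fix"):
    """Return CREATE FUNCTION and CREATE OPERATOR text for `text op integer`."""
    func = f"{schema}.text{OPERATOR_SUFFIX[op]}int"
    create_function = (
        f"CREATE FUNCTION {func}(text, integer)\n"
        "RETURNS boolean\n"
        "STRICT IMMUTABLE LANGUAGE SQL AS $$\n"
        # Convert the integer to text, then compare text to text.
        f"  SELECT $1 {op} textin(int4out($2));\n"
        "$$;"
    )
    create_operator = (
        f"CREATE OPERATOR {schema}.{op} "
        f"(PROCEDURE={func}, LEFTARG=text, RIGHTARG=integer);"
    )
    return create_function + "\n" + create_operator


if __name__ == "__main__":
    # Print statements that could be appended to the workaround script.
    for op in OPERATOR_SUFFIX:
        print(text_int_operator_ddl(op))
        print()
```
</codeblock>
<p>Review the generated statements before you add them to the
<codeph>8.3 operator workaround.sql</codeph> script, and extend the suffix table only for the
operators that your failing queries actually require.</p>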
</body>
</topic>
</topic>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">
<topic id="Untitled1">
<title>Example Ansible Playbook</title>
<shortdesc>A sample Ansible playbook to install a Greenplum Database software release onto the
hosts that will comprise a Greenplum Database system.</shortdesc>
<body>
<p>This Ansible playbook shows how tasks described in <xref href="install_gpdb.xml#topic1"/>
might be automated using <xref href="https://docs.ansible.com" format="html" scope="external"
>Ansible</xref>.</p>
<note type="important">This playbook is provided as an <i>example only</i> to illustrate how
Greenplum Database cluster configuration and software installation tasks can be automated
using provisioning tools such as Ansible, Chef, or Puppet. Pivotal does not provide support
for Ansible or for the playbook presented in this example.</note>
<p>The example playbook is designed for use with CentOS 7. It creates the
<codeph>gpadmin</codeph> user, installs the Greenplum Database software release, sets the
owner and group of the installed software to <codeph>gpadmin</codeph>, and sets the PAM
security limits for the <codeph>gpadmin</codeph> user. </p>
<p>You can revise the script to work with your operating system platform and to perform
additional host configuration tasks.</p>
<p>Following are steps to use this Ansible playbook.</p>
<ol id="ol_umm_h25_yhb">
<li>Install Ansible on the control node using your package manager. See the <xref
href="https://docs.ansible.com" format="html" scope="external">Ansible documentation</xref>
for help with installation.</li>
<li>Set up passwordless SSH from the control node to all hosts that will be a part of the
Greenplum Database cluster. You can use the <codeph>ssh-copy-id</codeph> command to install
your public SSH key on each host in the cluster. Alternatively, your provisioning software
may provide more convenient ways to securely install public keys on multiple hosts.</li>
<li>Create an Ansible inventory by creating a file called <codeph>hosts</codeph> with a list
of the hosts that will comprise your Greenplum Database cluster. For
example:<codeblock>mdw
sdw1
sdw2
...</codeblock>This file can be edited and used with the
Greenplum Database <codeph>gpssh-exkeys</codeph> and <codeph>gpinitsystem</codeph> utilities
later on.</li>
<li>Copy the playbook code below to a file <codeph>ansible-playbook.yml</codeph> on your
Ansible control node.</li>
<li>Edit the playbook variables at the top of the playbook, such as the
<codeph>gpadmin</codeph> administrative user and password to create, and the version of
Greenplum Database you are installing. </li>
<li>Run the playbook, passing the package to be installed to the <codeph>package_path</codeph>
parameter.<codeblock>ansible-playbook ansible-playbook.yml -i hosts -e package_path=./greenplum-db-6.0.0-rhel7-x86_64.rpm</codeblock></li>
</ol>
<section>
<title>Ansible Playbook - Greenplum Database Installation for CentOS 7</title>
<codeblock>
---
- hosts: all
  vars:
    - version: "6.0.0"
    - greenplum_admin_user: "gpadmin"
    - greenplum_admin_password: "changeme"
    # - package_path: passed via the command line with: -e package_path=./greenplum-db-6.0.0-rhel7-x86_64.rpm
  remote_user: root
  become: yes
  become_method: sudo
  connection: ssh
  gather_facts: yes
  tasks:
    - name: create greenplum admin user
      user:
        name: "{{ greenplum_admin_user }}"
        password: "{{ greenplum_admin_password | password_hash('sha512', 'DvkPtCtNH+UdbePZfm9muQ9pU') }}"
    - name: copy package to host
      copy:
        src: "{{ package_path }}"
        dest: /tmp
    - name: install package
      yum:
        name: "/tmp/{{ package_path | basename }}"
        state: present
    - name: cleanup package file from host
      file:
        path: "/tmp/{{ package_path | basename }}"
        state: absent
    - name: find install directory
      find:
        paths: /usr/local
        patterns: 'greenplum*'
        file_type: directory
      register: installed_dir
    - name: change install directory ownership
      file:
        path: '{{ item.path }}'
        owner: "{{ greenplum_admin_user }}"
        group: "{{ greenplum_admin_user }}"
        recurse: yes
      with_items: "{{ installed_dir.files }}"
    - name: update pam_limits
      pam_limits:
        domain: "{{ greenplum_admin_user }}"
        limit_type: '-'
        limit_item: "{{ item.key }}"
        value: "{{ item.value }}"
      with_dict:
        nofile: 524288
        nproc: 131072
    - name: find installed greenplum version
      shell: . /usr/local/greenplum-db/greenplum_path.sh &amp;&amp; /usr/local/greenplum-db/bin/postgres --gp-version
      register: postgres_gp_version
    - name: fail if the correct greenplum version is not installed
      fail:
        msg: "Expected greenplum version {{ version }}, but found '{{ postgres_gp_version.stdout }}'"
      when: "version is not defined or version not in postgres_gp_version.stdout"
</codeblock>
</section>
<p>When the playbook has executed successfully, you can proceed with <xref
href="create_data_dirs.xml#topic13"/> and <xref href="init_gpdb.xml#topic1"/>.</p>
</body>
</topic>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic
PUBLIC "-//OASIS//DTD DITA Composite//EN" "ditabase.dtd">
<topic id="topic1" xml:lang="en">
<title id="jn135496">Installation Management Utilities</title>
<shortdesc>References for the command-line management utilities used to install and initialize a
Greenplum Database system. </shortdesc>
<body>
<p>For a full reference of all Greenplum Database utilities, see the <xref
href="../utility_guide/utility_guide.xml#topic_nz5_lnf_kp">Greenplum Database Utility
Guide</xref>.</p>
<p>The following Greenplum Database management utilities are located in
<codeph>$GPHOME/bin</codeph>.<simpletable id="jn163810">
<strow>
<stentry>
<ul id="ul_vsx_zwn_r4">
<li>
<codeph>
<xref href="../utility_guide/admin_utilities/gpactivatestandby.xml" type="topic"
format="dita" scope="peer">gpactivatestandby</xref>
</codeph>
</li>
<li>
<p>
<codeph>
<xref href="../utility_guide/admin_utilities/gpaddmirrors.xml" type="topic"
format="dita" scope="peer">gpaddmirrors</xref>
</codeph>
</p>
</li>
<li>
<p>
<codeph>
<xref href="../utility_guide/admin_utilities/gpcheckperf.xml" type="topic"
format="dita" scope="peer">gpcheckperf</xref>
</codeph>
</p>
</li>
<li otherprops="pivotal">
<p>
<codeph>
<xref href="../utility_guide/admin_utilities/gpcopy.xml" type="topic"
format="dita" scope="peer">gpcopy</xref>
</codeph><!--Pivotal-->
</p>
</li>
<li>
<p>
<codeph>
<xref href="../utility_guide/admin_utilities/gpdeletesystem.xml" type="topic"
format="dita" scope="peer">gpdeletesystem</xref>
</codeph>
</p>
</li>
<li>
<p>
<codeph>
<xref href="../utility_guide/admin_utilities/gpinitstandby.xml" type="topic"
format="dita" scope="peer">gpinitstandby</xref>
</codeph>
</p>
</li>
</ul>
</stentry>
<stentry>
<ul id="ul_zy5_fxn_r4">
<li>
<p>
<codeph>
<xref href="../utility_guide/admin_utilities/gpinitsystem.xml" type="topic"
format="dita" scope="peer">gpinitsystem</xref>
</codeph>
</p>
</li>
<li>
<codeph>
<xref href="../utility_guide/admin_utilities/gppkg.xml" type="topic" format="dita"
scope="peer">gppkg</xref>
</codeph>
</li>
<li>
<p>
<codeph>
<xref href="../utility_guide/admin_utilities/gpscp.xml" type="topic"
format="dita" scope="peer">gpscp</xref>
</codeph>
</p>
</li>
<li>
<p>
<codeph>
<xref href="../utility_guide/admin_utilities/gpssh.xml" type="topic"
format="dita" scope="peer">gpssh</xref>
</codeph>
</p>
</li>
<li>
<p>
<codeph>
<xref href="../utility_guide/admin_utilities/gpssh-exkeys.xml" type="topic"
format="dita" scope="peer">gpssh-exkeys</xref>
</codeph>
</p>
</li>
<li>
<p>
<codeph>
<xref href="../utility_guide/admin_utilities/gpstart.xml" type="topic"
format="dita" scope="peer">gpstart</xref>
</codeph>
</p>
</li>
<li>
<p>
<codeph>
<xref href="../utility_guide/admin_utilities/gpstop.xml" type="topic"
format="dita" scope="peer">gpstop</xref>
</codeph>
</p>
</li>
</ul>
</stentry>
</strow>
</simpletable></p>
</body>
</topic>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic
PUBLIC "-//OASIS//DTD DITA Composite//EN" "ditabase.dtd">
<topic id="topic1" xml:lang="en">
<title id="jh138244">Estimating Storage Capacity</title>
<shortdesc>To estimate how much data your Greenplum Database system can accommodate, use these
measurements as guidelines. Also keep in mind that you may want to have extra space for landing
backup files and data load files on each segment host. </shortdesc>
<topic id="topic2" xml:lang="en">
<title id="jh159441">Calculating Usable Disk Capacity</title>
<body>
<p>To calculate how much data a Greenplum Database system can hold, you have to calculate the
usable disk capacity per segment host and then multiply that by the number of segment hosts
in your Greenplum Database array. Start with the raw capacity of the physical disks on a
segment host that are available for data storage (<varname>raw_capacity</varname>), which
is:</p>
<codeblock><varname>disk_size</varname> * <varname>number_of_disks</varname></codeblock>
<p>Account for file system formatting overhead (roughly 10 percent) and the RAID level you are
using. For example, if using RAID-10, the calculation would be:</p>
<codeblock>(<varname>raw_capacity</varname> * 0.9) / 2 = <varname>formatted_disk_space</varname></codeblock>
<p>For optimal performance, do not completely fill your disks to
capacity, but run at 70% or lower. So with this in mind, calculate the usable disk space as
follows:</p>
<codeblock><varname>formatted_disk_space</varname> * 0.7 = <varname>usable_disk_space</varname></codeblock>
<p>Once you have formatted RAID disk arrays and accounted for the maximum recommended capacity
(<varname>usable_disk_space</varname>), you will need to calculate how much storage is
actually available for user data (<codeph>U</codeph>). If using Greenplum Database mirrors
for data redundancy, this would then double the size of your user data (<codeph>2 *
U</codeph>). Greenplum Database also requires some space be reserved as a working area for
active queries. The work space should be approximately one third the size of your user data
(work space =
<codeph>U/3</codeph>):<codeblock><b>With mirrors:</b> (2 * U) + U/3 = <varname>usable_disk_space</varname>
<b>Without mirrors:</b> U + U/3 = <varname>usable_disk_space</varname></codeblock></p>
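<p>The capacity formulas above can be sketched in Python as a quick sanity check. This is an
illustrative sketch only: the function names, the <codeph>raid_divisor</codeph> parameter (2 for
the RAID-10 example), and the sample disk sizes are assumptions and are not part of Greenplum
Database.</p>
<codeblock>
```python
def usable_disk_space(disk_size, number_of_disks, raid_divisor=2.0):
    """Usable capacity of one segment host, in the same unit as disk_size."""
    raw_capacity = disk_size * number_of_disks
    # Roughly 10 percent file system formatting overhead, then the RAID
    # level (dividing by 2 corresponds to the RAID-10 example).
    formatted_disk_space = (raw_capacity * 0.9) / raid_divisor
    # Run the disks at 70% of capacity or lower.
    return formatted_disk_space * 0.7


def user_data_capacity(usable, mirrored=True):
    """Solve (2 * U) + U/3 = usable (with mirrors) or U + U/3 = usable for U."""
    copies = 2 if mirrored else 1
    return usable / (copies + 1.0 / 3.0)


if __name__ == "__main__":
    # Hypothetical segment host: 8 disks of 1000 GB each, RAID-10.
    usable = usable_disk_space(1000, 8)
    print("usable_disk_space (GB):", round(usable))
    print("U with mirrors (GB):", round(user_data_capacity(usable)))
    print("U without mirrors (GB):", round(user_data_capacity(usable, mirrored=False)))
```
</codeblock>
<p>The result is per segment host. Multiply it by the number of segment hosts to estimate the
capacity of the whole system.</p>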
<p>Guidelines for temporary file space and user data space assume a typical analytic workload.
Highly concurrent workloads or workloads with queries that require very large amounts of
temporary space can benefit from reserving a larger working area. Typically, overall system
throughput can be increased while decreasing work area usage through proper workload
management. Additionally, temporary space and user space can be isolated from each other by
specifying that they reside on different tablespaces.</p>
<p>In the <i>Greenplum Database Administrator Guide</i>, see these topics:</p>
<ul>
<li id="jh161458">"Managing Workload and Resources" for information about workload
management </li>
<li id="jh161469">"Creating and Managing Tablespaces" for information about moving the
location of temporary files </li>
<li id="jh161162">"Monitoring System State" for information about monitoring Greenplum
Database disk space usage</li>
</ul>
</body>
</topic>
<topic id="topic3" xml:lang="en">
<title id="jh159695">Calculating User Data Size</title>
<body>
<p>As with all databases, the size of your raw data will be slightly larger once it is loaded
into the database. On average, raw data will be about 1.4 times larger on disk after it is
loaded into the database, but could be smaller or larger depending on the data types you are
using, table storage type, in-database compression, and so on.</p>
<ul>
<li id="jh157686">Page Overhead - When your data is loaded into Greenplum Database, it is
divided into pages of 32KB each. Each page has 20 bytes of page overhead.</li>
<li id="jh157687">Row Overhead - In a regular 'heap' storage table, each row of data has 24
bytes of row overhead. An 'append-optimized' storage table has only 4 bytes of row
overhead.</li>
<li id="jh157688">Attribute Overhead - For the data values themselves, the size associated with
each attribute value is dependent upon the data type chosen. As a general rule, you want
to use the smallest data type possible to store your data (assuming you know the possible
values a column will have).</li>
<li id="jh157689">Indexes - In Greenplum Database, indexes are distributed across the
segment hosts as is table data. The default index type in Greenplum Database is B-tree.
Because index size depends on the number of unique values in the index and the data to be
inserted, precalculating the exact size of an index is impossible. However, you can
roughly estimate the size of an index using these
formulas.<codeblock><b>B-tree:</b> <varname>unique_values</varname> * (<varname>data_type_size</varname> + 24 bytes)
<b>Bitmap:</b> (<varname>unique_values</varname> * <varname>number_of_rows</varname> * 1 bit * <varname>compression_ratio</varname> / 8) + (<varname>unique_values</varname> * 32)</codeblock></li>
</ul>
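<p>As a rough illustration, the index formulas and the average 1.4 expansion factor can be
written as small Python functions. The function names and sample inputs are assumptions chosen
for this sketch, and the results are estimates only. Actual sizes depend on data types, storage
type, and compression.</p>
<codeblock>
```python
def loaded_data_size(raw_size, expansion_factor=1.4):
    """Estimated on-disk size of raw data after loading (average factor 1.4)."""
    return raw_size * expansion_factor


def btree_index_bytes(unique_values, data_type_size):
    """B-tree estimate: unique_values * (data_type_size + 24 bytes)."""
    return unique_values * (data_type_size + 24)


def bitmap_index_bytes(unique_values, number_of_rows, compression_ratio):
    """Bitmap estimate: one bit per (value, row) pair scaled by the
    compression ratio, converted to bytes, plus 32 bytes per unique value."""
    bitmap_bits = unique_values * number_of_rows * compression_ratio
    return bitmap_bits / 8 + unique_values * 32


if __name__ == "__main__":
    # Hypothetical inputs: 1,000,000 unique 8-byte keys for the B-tree,
    # and 100 unique values over 1,000,000 rows for the bitmap index.
    print("B-tree bytes:", btree_index_bytes(1000000, 8))
    print("Bitmap bytes:", round(bitmap_index_bytes(100, 1000000, 0.1)))
    print("Loaded size of 500 GB raw data (GB):", round(loaded_data_size(500)))
```
</codeblock>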
</body>
</topic>
<topic id="topic4" xml:lang="en">
<title id="jh159741">Calculating Space Requirements for Metadata and Logs</title>
<body>
<p>On each segment host, you will also want to account for space for Greenplum Database log
files and metadata:</p>
<ul>
<li id="jh159754"><b>System Metadata</b> — For each Greenplum Database segment instance
(primary or mirror) or master instance running on a host, estimate approximately 20 MB for
the system catalogs and metadata. </li>
<li id="jh159758"><b>Write Ahead Log</b> — For each Greenplum Database segment (primary or
mirror) or master instance running on a host, allocate space for the write ahead log
(WAL). The WAL is divided into segment files of 64 MB each. At most, the number of WAL
files will be: <codeblock>2 * <varname>checkpoint_segments</varname> + 1</codeblock><p>You
can use this to estimate space requirements for WAL. The default
<varname>checkpoint_segments</varname> setting for a Greenplum Database instance is 8,
meaning 1088 MB WAL space allocated for each segment or master instance on a
host.</p></li>
<li id="jh159765"><b>Greenplum Database Log Files</b> — Each segment instance and the master
instance generates database log files, which will grow over time. Sufficient space should
be allocated for these log files, and some type of log rotation facility should be used to
ensure that log files do not grow too large. </li>
<li id="jh160818"><b>Command Center Data</b> — The data collection agents utilized by
Command Center run on the same set of hosts as your Greenplum Database instance and
utilize the system resources of those hosts. The resource consumption of the data
collection agent processes on these hosts is minimal and should not significantly impact
database performance. Historical data collected by the collection agents is stored in its
own Command Center database (named <codeph>gpperfmon</codeph>) within your Greenplum
Database system. Collected data is distributed just like regular database data, so you
will need to account for disk space in the data directory locations of your Greenplum
segment instances. The amount of space required depends on the amount of historical data
you would like to keep. Historical data is not automatically truncated. Database
administrators must set up a truncation policy to maintain the size of the Command Center
database.</li>
</ul>
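<p>The metadata and WAL guidelines above can be combined into a per-host estimate. The following
Python sketch is an assumption-laden illustration: the function names and the example of eight
instances on one host are hypothetical, and the estimate excludes database log files and Command
Center data, whose size depends on your retention policy.</p>
<codeblock>
```python
WAL_SEGMENT_FILE_MB = 64       # size of each WAL segment file
SYSTEM_METADATA_MB = 20        # approximate catalog and metadata per instance


def max_wal_space_mb(checkpoint_segments=8):
    """Maximum WAL space for one instance: (2 * checkpoint_segments + 1) files."""
    return (2 * checkpoint_segments + 1) * WAL_SEGMENT_FILE_MB


def host_overhead_mb(instances_on_host, checkpoint_segments=8):
    """Metadata plus maximum WAL space for every instance on one host."""
    per_instance = SYSTEM_METADATA_MB + max_wal_space_mb(checkpoint_segments)
    return instances_on_host * per_instance


if __name__ == "__main__":
    print("WAL per instance (MB):", max_wal_space_mb())
    # Hypothetical host running 4 primary and 4 mirror segment instances.
    print("Metadata and WAL for 8 instances (MB):", host_overhead_mb(8))
```
</codeblock>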
</body>
</topic>
</topic>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic
PUBLIC "-//OASIS//DTD DITA Composite//EN" "ditabase.dtd">
<topic xml:lang="en-us" id="about">
<title>Copyright</title>
<body outputclass="db.chapter">
<p>
<xref href="http://pivotal.io/privacy-policy" format="html" scope="external">Privacy
Policy</xref> | <xref href="http://pivotal.io/terms-of-use" format="html"
scope="external">Terms of Use</xref></p>
<p>Copyright © 2017 Pivotal Software, Inc. All rights reserved.</p>
<p>Pivotal Software, Inc. believes the information in this publication is accurate as of its
publication date. The information is subject to change without notice. THE INFORMATION IN
THIS PUBLICATION IS PROVIDED "AS IS." PIVOTAL SOFTWARE, INC. ("Pivotal") MAKES NO
REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS
PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS
FOR A PARTICULAR PURPOSE.</p>
<p>Use, copying, and distribution of any Pivotal software described in this publication
requires an applicable software license.</p>
<p>All trademarks used herein are the property of Pivotal or their respective owners.</p>
<p> </p>
<p>Revised January 2017 (4.3.11.2)</p>
</body>
</topic>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic
PUBLIC "-//OASIS//DTD DITA Composite//EN" "ditabase.dtd">
<topic id="topic13">
<title id="ji162534">Creating the Data Storage Areas</title>
<shortdesc>Describes how to create the directory locations where Greenplum Database data is stored
for each master, standby, and segment instance.</shortdesc>
<topic id="topic_wqb_1lc_wp">
<title>Creating Data Storage Areas on the Master and Standby Master Hosts</title>
<body>
<p>A data storage area is required on the Greenplum Database master and standby master hosts
to store Greenplum Database system data such as catalog data and other system metadata. </p>
<section id="topic_ix1_x1n_tp">
<title>To create the data directory location on the master</title>
<p>The data directory location on the master is different from those on the segments. The
master does not store any user data; only the system catalog tables and system metadata
are stored on the master instance. Therefore, you do not need to designate as much
storage space as on the segments.</p>
<ol id="ol_x3b_clc_wp">
<li id="ji162541">Create or choose a directory that will serve as your master data
storage area. This directory should have sufficient disk space for your data and be
owned by the <codeph>gpadmin</codeph> user and group. For example, run the following
command as <codeph>root</codeph>:<codeblock># mkdir -p /data/master</codeblock></li>
<li id="ji162549">Change ownership of this directory to the <codeph>gpadmin</codeph>
user. For example:<codeblock># chown gpadmin:gpadmin /data/master</codeblock></li>
<li id="ji162557">Using <codeph><xref href="../utility_guide/admin_utilities/gpssh.xml"
format="dita" scope="peer" type="topic">gpssh</xref></codeph>, create the master
data directory location on your standby master as well. For
example:<codeblock># source /usr/local/greenplum-db/greenplum_path.sh
# gpssh -h smdw -e 'mkdir -p /data/master'
# gpssh -h smdw -e 'chown gpadmin:gpadmin /data/master'</codeblock></li>
</ol>
</section>
</body>
</topic>
<topic id="topic_plx_zps_vhb">
<title>Creating Data Storage Areas on Segment Hosts</title>
<body>
<p>Data storage areas are required on the Greenplum Database segment hosts for primary
segments. Separate storage areas are required for mirror segments.</p>
<section id="topic_tnb_v1n_tp">
<title>To create the data directory locations on all segment hosts</title>
<ol id="ol_otk_xkc_wp">
<li id="ji162571">On the master host, log in as
<codeph>root</codeph>:<codeblock># su</codeblock></li>
<li id="ji162573">Create a file called <codeph>hostfile_gpssh_segonly</codeph>. This file
should contain one line for each segment host, giving that machine's configured host name. For example, if
you have three segment hosts:<codeblock>sdw1
sdw2
sdw3</codeblock></li>
<li id="ji162580">Using <codeph><xref href="../utility_guide/admin_utilities/gpssh.xml"
format="dita" scope="peer">gpssh</xref></codeph>, create the primary and mirror data
directory locations on all segment hosts at once using the
<codeph>hostfile_gpssh_segonly</codeph> file you just created. For
example:<codeblock># source /usr/local/greenplum-db/greenplum_path.sh
# gpssh -f hostfile_gpssh_segonly -e 'mkdir -p /data/primary'
# gpssh -f hostfile_gpssh_segonly -e 'mkdir -p /data/mirror'
# gpssh -f hostfile_gpssh_segonly -e 'chown -R gpadmin /data/*'</codeblock></li>
</ol>
</section>
</body>
</topic>
<topic id="topic_cwj_hzb_vhb">
<title>Next Steps</title>
<body>
<ul id="ul_xsq_jzb_vhb">
<li><xref href="validate.xml#topic1">Validating Your Systems</xref></li>
<li><xref href="init_gpdb.xml#topic1">Initializing a Greenplum Database System</xref></li>
</ul>
</body>
</topic>
</topic>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE map PUBLIC "-//OASIS//DTD DITA Map//EN" "map.dtd">
<map title="Installing the Data Science Package">
<topicref href="data_sci_pkgs.xml" navtitle="Installing the Data Science Packages">
<topicref href="install_extensions.xml"/>
<topicref href="install_python_dsmod.xml"
navtitle="Installing the Python Data Science Modules"/>
<topicref href="install_r_dslib.xml" navtitle="Installing the R Data Science Libraries"/>
<topicref href="install_pxf.xml"/>
</topicref>
</map>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">
<topic id="topic_dscipkg">
<title>Installing Optional Extensions</title>
<titlealts>
<!--HTML-only page-->
<navtitle>Installing Optional Extensions</navtitle>
</titlealts>
<shortdesc>Information about installing optional Greenplum Database extensions and packages, such
as the Procedural Language extensions and the Python and R Data Science Packages.</shortdesc>
</topic>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic
PUBLIC "-//OASIS//DTD DITA Composite//EN" "ditabase.dtd">
<topic id="topic_lvl_dsy_4bb">
<title>DCA System Installation and Upgrade</title>
<body>
<p>On supported Dell EMC DCA systems, you can install Pivotal Greenplum 6, or you can upgrade
from Pivotal Greenplum 6.x to 6.new.</p>
<p>Only Pivotal Greenplum Database is supported on DCA systems. Open source versions of
Greenplum Database are not supported. </p>
<ul id="ul_ryv_wn2_pbb">
<li><xref href="#topic_x2w_ltv_yq" format="dita"/></li>
<li><xref href="#topic_usj_2xl_xq" format="dita"/></li>
</ul>
<note type="important">Upgrading Pivotal Greenplum Database 4 or 5 to Pivotal Greenplum 6 is not
supported.</note>
</body>
<topic id="topic_x2w_ltv_yq">
<title class="- topic/title ">Installing the Pivotal Greenplum 6 Software Binaries on DCA
Systems</title>
<body class="- topic/body ">
<note type="important">This section is for installing Pivotal Greenplum 6 only on DCA systems.
Also, see the information on the <xref href="https://support.emc.com/" format="html"
scope="external">DELL EMC support site</xref> (requires login). </note>
<section>
<title>Prerequisites</title>
<ul id="ul_fjm_nx2_pbb">
<li>Ensure your DCA system supports Pivotal Greenplum 6. See <xref
href="supported-platforms.xml" format="dita"/>.</li>
<li>Ensure that no previous versions of Greenplum Database are installed on your
system.<p>Installing Pivotal Greenplum 6 on a DCA system with an existing Greenplum
Database installation is not supported. For information about uninstalling Greenplum
Database software, see your Dell EMC DCA documentation.</p></li>
</ul>
</section>
<section>
<title>Installing Pivotal Greenplum 6</title>
<ol class="- topic/ol " id="ol_ctw_ltv_yq">
<li class="- topic/li ">Download or copy the Greenplum Database DCA installer file to the
Greenplum Database master host.</li>
<li class="- topic/li ">As root, run the DCA installer for Greenplum 6 on the Greenplum
Database master host and specify the file <codeph>hostfile</codeph> that lists all hosts
in the cluster, one host name per line. If necessary, copy <codeph>hostfile</codeph> to
the directory containing the installer before running the installer.<p>This example
command runs the installer for Greenplum Database
6.</p><codeblock># ./greenplum-db-appliance-&lt;version>-RHEL6-x86_64.bin hostfile</codeblock></li>
</ol>
</section>
</body>
</topic>
<topic id="topic_usj_2xl_xq">
<title class="- topic/title ">Upgrading Greenplum 6.x on DCA Systems</title>
<body class="- topic/body ">
<p>Upgrading Pivotal Greenplum from 6.x to 6.new on a Dell EMC DCA system involves stopping
Greenplum Database, updating the Greenplum Database software binaries, and restarting
Greenplum Database. </p>
<note type="important">This section is only for upgrading Pivotal Greenplum 6 on DCA systems.
For information about upgrading on non-DCA systems, see <xref
href="upgrading.xml#topic_tbx_szy_kbb"/>.</note>
<ol class="- topic/ol " id="ol_fql_2xl_xq">
<li class="- topic/li ">Log in to your Greenplum Database master host as the Greenplum
administrative user (<codeph>gpadmin</codeph>):<codeblock># su - gpadmin</codeblock></li>
<li class="- topic/li ">Download or copy the installer file
<codeph>greenplum-db-appliance-&lt;6.new>-RHEL6-x86_64.bin</codeph> to the Greenplum
Database master host.</li>
<li class="- topic/li ">Perform a smart shutdown of your existing Greenplum Database 6.x
system (there can be no active connections to the database). This example uses the
<codeph>-a</codeph> option to disable confirmation
prompts:<codeblock>$ gpstop -a</codeblock></li>
<li class="- topic/li ">As root, run the Greenplum Database DCA installer for 6.new on the
Greenplum Database master host and specify the file <codeph>hostfile</codeph> that lists
all hosts in the cluster. If necessary, copy <codeph>hostfile</codeph> to the directory
containing the installer before running the installer.<p>This example command runs the
installer for Greenplum Database 6.new for Red Hat Enterprise Linux
6.x.</p><codeblock># ./greenplum-db-appliance-&lt;6.new>-RHEL6-x86_64.bin hostfile</codeblock><p>The
file <codeph>hostfile</codeph> is a text file that lists all hosts in the cluster, one
host name per line.</p></li>
<li>If needed, update the <codeph>greenplum_path.sh</codeph> file for use with your specific
installation. These are some examples.<ul id="ul_vs1_3lq_cgb">
<li>If Greenplum Database uses LDAP authentication, edit the
<codeph>greenplum_path.sh</codeph> file to add the
line:<codeblock>export LDAPCONF=/etc/openldap/ldap.conf</codeblock></li>
<li>If Greenplum Database uses PL/Java, you might need to set or update the environment
variables <codeph>JAVA_HOME</codeph> and <codeph>LD_LIBRARY_PATH</codeph> in
<codeph>greenplum_path.sh</codeph>. </li>
</ul>
<note>When comparing the previous and new <codeph>greenplum_path.sh</codeph> files, be
aware that installing some Greenplum Database extensions also updates the
<codeph>greenplum_path.sh</codeph> file. The <codeph>greenplum_path.sh</codeph> from
the previous release might contain updates that were the result of those
extensions.</note></li>
<li class="- topic/li ">Install Greenplum Database extension packages. For information about
installing a Greenplum Database extension package, see <codeph>gppkg</codeph> in the
<cite>Greenplum Database Utility Guide</cite>.<p>Also migrate any additional files that
are used by the extensions (such as JAR files, shared object files, and libraries) from
the previous version installation directory to the new version installation directory.
</p></li>
<li class="- topic/li ">After all segment hosts have been upgraded, you can log in as the
<codeph>gpadmin</codeph> user and restart your Greenplum Database
system:<codeblock># su - gpadmin
$ gpstart</codeblock></li>
<li class="- topic/li ">If you are utilizing Data Domain Boost, you have to re-enter your DD
Boost credentials after upgrading to Greenplum Database 6 as follows:<codeblock>gpcrondump --ddboost-host <varname>ddboost_hostname</varname> --ddboost-user <varname>ddboost_user</varname>
--ddboost-backupdir <varname>backup_directory</varname></codeblock>
<note>If you do not re-enter your login credentials after an upgrade, backups will not
start because Greenplum Database cannot connect to the Data Domain system. You
will receive an error advising you to check your login credentials. </note></li>
</ol>
<p>After upgrading Greenplum Database, verify that features work as expected. For example,
test backup and restore operations, Greenplum Database features such as user-defined
functions, and extensions such as MADlib and PostGIS.</p>
</body>
</topic>
</topic>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic
PUBLIC "-//OASIS//DTD DITA Composite//EN" "ditabase.dtd">
<topic id="topic1" xml:lang="en">
<title id="ji163114">Enabling iptables (Optional)</title>
<shortdesc>On Linux systems, you can configure and enable the <codeph>iptables</codeph> firewall
to work with Greenplum Database.</shortdesc>
<body>
<note type="note">Greenplum Database performance might be impacted when
<codeph>iptables</codeph> is enabled. You should test the performance of your application
with <codeph>iptables</codeph> enabled to ensure that performance is acceptable.</note>
<p>For more information about <codeph>iptables</codeph>, see the <codeph>iptables</codeph> and
firewall documentation for your operating system. </p>
<section id="ji163124">
<title>How to Enable iptables</title>
<ol id="ol_akk_knz_thb">
<li id="ji163128">As <codeph>gpadmin</codeph>, run this command on the Greenplum Database
master host to stop Greenplum Database:<codeblock>$ gpstop -a</codeblock></li>
<li id="ji163139">On the Greenplum Database hosts:<ol id="ol_bkk_knz_thb">
<li id="ji163142">Update the file <codeph>/etc/sysconfig/iptables</codeph> based on the
<xref href="#topic16" type="topic" format="dita"/>. </li>
<li id="ji163144">As root user, run these commands to enable
<codeph>iptables</codeph>:<codeblock># chkconfig iptables on
# service iptables start</codeblock></li>
</ol></li>
<li id="ji163149">As gpadmin, run this command on the Greenplum Database master host to
start Greenplum Database:<codeblock>$ gpstart -a</codeblock></li>
</ol>
<note type="warning">After enabling <codeph>iptables</codeph>, this error in the
<codeph>/var/log/messages</codeph> file indicates that the setting for the
<codeph>iptables</codeph> table is too low and needs to be
increased.<codeblock>ip_conntrack: table full, dropping packet.</codeblock><p>As root, run
this command to view the <codeph>iptables</codeph> table
value:</p><codeblock># sysctl net.ipv4.netfilter.ip_conntrack_max</codeblock><p>The
following is the recommended setting to ensure that the Greenplum Database workload does
not overflow the <codeph>iptables</codeph> table. The value might need to be adjusted for
your hosts: <codeph>net.ipv4.netfilter.ip_conntrack_max=6553600</codeph></p><p>You can
update the <codeph>/etc/sysctl.conf</codeph> file with the value. See also <xref
href="prep_os.xml#topic3" type="topic" format="dita"/>.</p><p>To set the value until the next
reboot, run this command as
root:</p><codeblock># sysctl net.ipv4.netfilter.ip_conntrack_max=6553600</codeblock></note>
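<p>As a sketch of the persistent form of this setting, an entry such as the following could be
added to <codeph>/etc/sysctl.conf</codeph>. The value is the recommended value from this topic
and might need to be adjusted for your hosts. The entry assumes that your kernel exposes the
parameter under this name, which you can confirm with the <codeph>sysctl</codeph> command
that views the current value.</p>
<codeblock># /etc/sysctl.conf fragment (example value from this topic; adjust as needed)
net.ipv4.netfilter.ip_conntrack_max = 6553600</codeblock>
<p>Running <codeph>sysctl -p</codeph> as root reloads the settings from
<codeph>/etc/sysctl.conf</codeph> without a reboot.</p>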
</section>
</body>
<topic id="topic16" xml:lang="en">
<title id="ji163171">Example iptables Rules</title>
<body>
<p>When <codeph>iptables</codeph> is enabled, <codeph>iptables</codeph> manages the IP
communication on the host system based on configuration settings (rules). The example rules
are used to configure <codeph>iptables</codeph> for Greenplum Database master host, standby
master host, and segment hosts. </p>
<ul id="ul_ckk_knz_thb">
<li id="ji163179">
<xref href="#topic17" type="topic" format="dita"/>
</li>
<li id="ji163183">
<xref href="#topic18" type="topic" format="dita"/>
</li>
</ul>
<p>The two sets of rules account for the different types of communication Greenplum Database
expects on the master (primary and standby) and segment hosts. The rules should be added to
the <codeph>/etc/sysconfig/iptables</codeph> file of the Greenplum Database hosts. For
Greenplum Database, <codeph>iptables</codeph> rules should allow the following
communication: </p>
<ul id="ul_dkk_knz_thb">
<li id="ji163197">For customer facing communication with the Greenplum Database master,
allow at least <codeph>postgres</codeph> and <codeph>28080</codeph> (<codeph>eth1</codeph>
interface in the example). </li>
<li id="ji163201">For Greenplum Database system interconnect, allow communication using
<codeph>tcp</codeph>, <codeph>udp</codeph>, and <codeph>icmp</codeph> protocols
(<codeph>eth4</codeph> and <codeph>eth5</codeph> interfaces in the example).<p>The
network interfaces that you specify in the <codeph>iptables</codeph> settings are the
interfaces for the Greenplum Database hosts that you list in the
<varname>hostfile_gpinitsystem</varname> file. You specify the file when you run the
<codeph>gpinitsystem</codeph> command to initialize a Greenplum Database system. See
<xref href="./init_gpdb.xml#topic1" type="topic" format="dita"/> for information about
the <varname>hostfile_gpinitsystem</varname> file and the <codeph>gpinitsystem</codeph>
command. </p></li>
<li id="ji163212" otherprops="dca">For the administration network on a Greenplum DCA, allow
communication using <codeph>ssh</codeph>, <codeph>ntp</codeph>, and <codeph>icmp</codeph>
protocols. (<codeph>eth0</codeph> interface in the example).</li>
</ul>
<p>In the <codeph>iptables</codeph> file, each append rule command (lines starting with
<codeph>-A</codeph>) is a single line.</p>
<p>The example rules should be adjusted for your configuration. For example:</p>
<ul id="ul_ekk_knz_thb">
<li id="ji163215">In the append commands (the <codeph>-A</codeph> lines), the interface parameter
<codeph>-i</codeph> should match the network interfaces of your hosts.</li>
<li id="ji163216">The CIDR network mask information for the source parameter
<codeph>-s</codeph> should match the IP addresses for your network.</li>
</ul>
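<p>For illustration only, suppose the interconnect on your hosts uses a single interface named
<codeph>bond0</codeph> on the network <codeph>10.10.0.0/24</codeph>. Both values are
hypothetical and do not appear in the example rule sets. The interconnect rules might then be
adapted along these lines:</p>
<codeblock># Hypothetical interface (bond0) and network (10.10.0.0/24); substitute your own values.
-A INPUT -i bond0 -p udp -s 10.10.0.0/24 -j ACCEPT
-A INPUT -i bond0 -p tcp -s 10.10.0.0/24 -j ACCEPT --syn -m state --state NEW
-A INPUT -i bond0 -p icmp -s 10.10.0.0/24 -j ACCEPT</codeblock>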
</body>
<topic id="topic17" xml:lang="en">
<title id="ji163218">Example Master and Standby Master iptables Rules</title>
<body>
<p>Example <codeph>iptables</codeph> rules with comments for the
<codeph>/etc/sysconfig/iptables</codeph> file on the Greenplum Database master host and
standby master host.</p>
<codeblock>*filter
# The following 3 lines are the default policies. A packet that passes
# through the rule set without matching a rule gets these policies.
# Drop all inbound packets by default.
# Drop all forwarded (routed) packets.
# Let anything outbound go through.
:INPUT DROP [0:0]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [0:0]
# Accept anything on the loopback interface.
-A INPUT -i lo -j ACCEPT
# If a connection has already been established allow the
# remote host packets for the connection to pass through.
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
# These rules let all tcp and udp through on the standard
# interconnect IP addresses and on the interconnect interfaces.
# NOTE: gpsyncmaster uses random tcp ports in the range 1025 to 65535
# and Greenplum Database uses random udp ports in the range 1025 to 65535.
-A INPUT -i eth4 -p udp -s 192.0.2.0/22 -j ACCEPT
-A INPUT -i eth5 -p udp -s 198.51.100.0/22 -j ACCEPT
-A INPUT -i eth4 -p tcp -s 192.0.2.0/22 -j ACCEPT --syn -m state --state NEW
-A INPUT -i eth5 -p tcp -s 198.51.100.0/22 -j ACCEPT --syn -m state --state NEW<ph otherprops="dca">
# Allow udp/tcp ntp connections on the admin network on Greenplum DCA.
-A INPUT -i eth0 -p udp --dport ntp -s 203.0.113.0/21 -j ACCEPT
-A INPUT -i eth0 -p tcp --dport ntp -s 203.0.113.0/21 -j ACCEPT --syn -m state --state NEW</ph>
# Allow ssh on all networks (This rule can be more strict).
-A INPUT -p tcp --dport ssh -j ACCEPT --syn -m state --state NEW
# Allow Greenplum Database on all networks.
-A INPUT -p tcp --dport postgres -j ACCEPT --syn -m state --state NEW
# Allow Greenplum Command Center on the customer facing network.
-A INPUT -i eth1 -p tcp --dport 28080 -j ACCEPT --syn -m state --state NEW
# Allow ping and any other icmp traffic on the interconnect networks.
-A INPUT -i eth4 -p icmp -s 192.0.2.0/22 -j ACCEPT
-A INPUT -i eth5 -p icmp -s 198.51.100.0/22 -j ACCEPT<ph otherprops="dca">
# Allow ping only on the admin network on Greenplum DCA.
-A INPUT -i eth0 -p icmp --icmp-type echo-request -s 203.0.113.0/21 -j ACCEPT</ph>
# Log an error if a packet passes through the rules to the default
# INPUT rule (a DROP).
-A INPUT -m limit --limit 5/min -j LOG --log-prefix "iptables denied: " --log-level 7
COMMIT</codeblock>
</body>
</topic>
<topic id="topic18" xml:lang="en">
<title id="ji163239">Example Segment Host iptables Rules</title>
<body>
<p>Example <codeph>iptables</codeph> rules for the <codeph>/etc/sysconfig/iptables</codeph>
file on the Greenplum Database segment hosts. The rules for segment hosts are similar to
the master rules with fewer interfaces and fewer <codeph>udp</codeph> and
<codeph>tcp</codeph> services. </p>
<codeblock>*filter
:INPUT DROP
:FORWARD DROP
:OUTPUT ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -i eth2 -p udp -s 192.0.2.0/22 -j ACCEPT
-A INPUT -i eth3 -p udp -s 198.51.100.0/22 -j ACCEPT
-A INPUT -i eth2 -p tcp -s 192.0.2.0/22 -j ACCEPT --syn -m state --state NEW
-A INPUT -i eth3 -p tcp -s 198.51.100.0/22 -j ACCEPT --syn -m state --state NEW
-A INPUT -p tcp --dport ssh -j ACCEPT --syn -m state --state NEW
-A INPUT -i eth2 -p icmp -s 192.0.2.0/22 -j ACCEPT
-A INPUT -i eth3 -p icmp -s 198.51.100.0/22 -j ACCEPT
-A INPUT -i eth0 -p icmp --icmp-type echo-request -s 203.0.113.0/21 -j ACCEPT
-A INPUT -m limit --limit 5/min -j LOG --log-prefix "iptables denied: " --log-level 7
COMMIT</codeblock>
</body>
</topic>
</topic>
</topic>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic
PUBLIC "-//OASIS//DTD DITA Composite//EN" "ditabase.dtd">
<topic xmlns:ditaarch="http://dita.oasis-open.org/architecture/2005/" id="topic1" xml:lang="en"
ditaarch:DITAArchVersion="1.1"
domains="(topic ui-d) (topic hi-d) (topic pr-d) (topic sw-d) (topic ut-d) (topic indexing-d)"
class="- topic/topic ">
<title id="id135496" class="- topic/title ">Greenplum Environment Variables</title>
<shortdesc>Reference of the environment variables to set for Greenplum Database. </shortdesc>
<body class="- topic/body ">
<p>Set these in your user's startup shell profile (such as <codeph
class="+ topic/ph pr-d/codeph ">~/.bashrc</codeph> or <codeph
class="+ topic/ph pr-d/codeph ">~/.bash_profile</codeph>), or in
<codeph class="+ topic/ph pr-d/codeph ">/etc/profile</codeph> if you
want to set them for all users.</p>
</body>
<topic id="topic2" xml:lang="en" ditaarch:DITAArchVersion="1.1"
domains="(topic ui-d) (topic hi-d) (topic pr-d) (topic sw-d) (topic ut-d) (topic indexing-d)"
class="- topic/topic ">
<title class="- topic/title ">Required Environment Variables</title>
<body class="- topic/body ">
<note type="note" class="- topic/note "><codeph
class="+ topic/ph pr-d/codeph ">GPHOME</codeph>, <codeph
class="+ topic/ph pr-d/codeph ">PATH</codeph> and <codeph
class="+ topic/ph pr-d/codeph ">LD_LIBRARY_PATH</codeph> can
be set by sourcing the <codeph>greenplum_path.sh</codeph> file from
your Greenplum Database installation directory.</note>
</body>
<topic id="topic3" xml:lang="en" ditaarch:DITAArchVersion="1.1"
domains="(topic ui-d) (topic hi-d) (topic pr-d) (topic sw-d) (topic ut-d) (topic indexing-d)"
class="- topic/topic ">
<title id="id138636" class="- topic/title ">GPHOME</title>
<body class="- topic/body ">
<p>This is the installed location of your Greenplum Database
software. For example:</p>
<codeblock>GPHOME=/usr/local/greenplum-db-6.<varname>x.x</varname>
export GPHOME</codeblock>
</body>
</topic>
<topic id="topic4" xml:lang="en" ditaarch:DITAArchVersion="1.1"
domains="(topic ui-d) (topic hi-d) (topic pr-d) (topic sw-d) (topic ut-d) (topic indexing-d)"
class="- topic/topic ">
<title id="id139357" class="- topic/title ">PATH</title>
<body class="- topic/body ">
<p>Your <codeph class="+ topic/ph pr-d/codeph ">PATH</codeph>
environment variable should point to the location of the
Greenplum Database <codeph class="+ topic/ph pr-d/codeph "
>bin</codeph> directory. For example:</p>
<codeblock>PATH=$GPHOME/bin:$PATH
export PATH</codeblock>
</body>
</topic>
<topic id="topic5" xml:lang="en" ditaarch:DITAArchVersion="1.1"
domains="(topic ui-d) (topic hi-d) (topic pr-d) (topic sw-d) (topic ut-d) (topic indexing-d)"
class="- topic/topic ">
<title id="id138662" class="- topic/title ">LD_LIBRARY_PATH</title>
<body class="- topic/body ">
<p>The <codeph class="+ topic/ph pr-d/codeph "
>LD_LIBRARY_PATH</codeph> environment variable
should point to the location of the Greenplum
Database/PostgreSQL library files. For example:</p>
<codeblock>LD_LIBRARY_PATH=$GPHOME/lib
export LD_LIBRARY_PATH</codeblock>
</body>
</topic>
<topic id="topic6" xml:lang="en" ditaarch:DITAArchVersion="1.1"
domains="(topic ui-d) (topic hi-d) (topic pr-d) (topic sw-d) (topic ut-d) (topic indexing-d)"
class="- topic/topic ">
<title id="id138677" class="- topic/title ">MASTER_DATA_DIRECTORY</title>
<body class="- topic/body ">
<p>This should point to the directory created by the <codeph>gpinitsystem</codeph>
utility in the master data directory location. For
example:</p>
<codeblock>MASTER_DATA_DIRECTORY=/data/master/gpseg-1
export MASTER_DATA_DIRECTORY</codeblock>
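<p>Taken together, the required variables could be set with a profile fragment such as the
following sketch. It assumes the installation path
<codeph>/usr/local/greenplum-db</codeph> and the example master data directory shown above;
adjust both for your system.</p>
<codeblock># ~/.bashrc fragment for the gpadmin user (paths are examples; adjust for your system)
source /usr/local/greenplum-db/greenplum_path.sh   # sets GPHOME, PATH, LD_LIBRARY_PATH
export MASTER_DATA_DIRECTORY=/data/master/gpseg-1</codeblock>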
</body>
</topic>
</topic>
<topic id="topic7" xml:lang="en" ditaarch:DITAArchVersion="1.1"
domains="(topic ui-d) (topic hi-d) (topic pr-d) (topic sw-d) (topic ut-d) (topic indexing-d)"
class="- topic/topic ">
<title class="- topic/title ">Optional Environment Variables</title>
<body class="- topic/body ">
<p>The following are standard PostgreSQL environment variables, which are
also recognized in Greenplum Database. You may want to add the
connection-related environment variables to your profile for
convenience, so you do not have to type so many options on the
command line for client connections. Note that these environment
variables should be set on the Greenplum Database master host
only.</p>
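<p>A minimal sketch of such profile entries follows. The host name <codeph>mdw</codeph>, the
user, and the database name are placeholder values chosen for illustration, not values
required by Greenplum Database.</p>
<codeblock># Hypothetical connection defaults; replace each value with your own.
export PGHOST=mdw
export PGPORT=5432
export PGUSER=gpadmin
export PGDATABASE=postgres</codeblock>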
</body>
<topic id="topic8" xml:lang="en" ditaarch:DITAArchVersion="1.1"
domains="(topic ui-d) (topic hi-d) (topic pr-d) (topic sw-d) (topic ut-d) (topic indexing-d)"
class="- topic/topic ">
<title id="id139713" class="- topic/title ">PGAPPNAME</title>
<body class="- topic/body ">
<p>The name of the application, usually set by the application itself
when it connects to the server. This name is displayed in
the activity view and in log entries. The <codeph
class="+ topic/ph pr-d/codeph ">PGAPPNAME</codeph>
environment variable behaves the same as the <codeph
class="+ topic/ph pr-d/codeph "
>application_name</codeph> connection parameter. The
default value for <codeph class="+ topic/ph pr-d/codeph "
>application_name</codeph> is <codeph>psql</codeph>.
The name cannot be longer than 63 characters. </p>
</body>
</topic>
<topic id="topic9" xml:lang="en" ditaarch:DITAArchVersion="1.1"
domains="(topic ui-d) (topic hi-d) (topic pr-d) (topic sw-d) (topic ut-d) (topic indexing-d)"
class="- topic/topic ">
<title id="id139717" class="- topic/title ">PGDATABASE</title>
<body class="- topic/body ">
<p>The name of the default database to use when connecting.</p>
</body>
</topic>
<topic id="topic10" xml:lang="en" ditaarch:DITAArchVersion="1.1"
domains="(topic ui-d) (topic hi-d) (topic pr-d) (topic sw-d) (topic ut-d) (topic indexing-d)"
class="- topic/topic ">
<title id="id138709" class="- topic/title ">PGHOST</title>
<body class="- topic/body ">
<p>The Greenplum Database master host name.</p>
</body>
</topic>
<topic id="topic11" xml:lang="en" ditaarch:DITAArchVersion="1.1"
domains="(topic ui-d) (topic hi-d) (topic pr-d) (topic sw-d) (topic ut-d) (topic indexing-d)"
class="- topic/topic ">
<title id="id138715" class="- topic/title ">PGHOSTADDR</title>
<body class="- topic/body ">
<p>The numeric IP address of the master host. This can be set
instead of or in addition to <codeph
class="+ topic/ph pr-d/codeph ">PGHOST</codeph> to
avoid DNS lookup overhead.</p>
</body>
</topic>
<topic id="topic12" xml:lang="en" ditaarch:DITAArchVersion="1.1"
domains="(topic ui-d) (topic hi-d) (topic pr-d) (topic sw-d) (topic ut-d) (topic indexing-d)"
class="- topic/topic ">
<title id="id138718" class="- topic/title ">PGPASSWORD </title>
<body class="- topic/body ">
<p>The password used if the server demands password authentication.
Use of this environment variable is not recommended for
security reasons (some operating systems allow non-root
users to see process environment variables via <codeph
class="+ topic/ph pr-d/codeph ">ps</codeph>).
Instead consider using the <codeph
class="+ topic/ph pr-d/codeph ">~/.pgpass</codeph>
file.</p>
</body>
</topic>
<topic id="topic13" xml:lang="en" ditaarch:DITAArchVersion="1.1"
domains="(topic ui-d) (topic hi-d) (topic pr-d) (topic sw-d) (topic ut-d) (topic indexing-d)"
class="- topic/topic ">
<title id="id138721" class="- topic/title ">PGPASSFILE </title>
<body class="- topic/body ">
<p>The name of the password file to use for lookups. If not set, it
defaults to <codeph class="+ topic/ph pr-d/codeph "
>~/.pgpass</codeph>. See the topic about <xref
href="https://www.postgresql.org/docs/9.4/libpq-pgpass.html"
scope="external" format="html" class="- topic/xref "
>The Password File</xref> in the PostgreSQL
documentation for more information.</p>
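<p>As a brief illustration, each line of a password file has the form
<codeph>hostname:port:database:username:password</codeph>, and the file must not be
accessible to group or others (for example, mode <codeph>0600</codeph>). The entry below is
hypothetical; every value in it is a placeholder.</p>
<codeblock># hostname:port:database:username:password (all values are placeholders)
mdw:5432:*:gpadmin:changeme</codeblock>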
</body>
</topic>
<topic id="topic14" xml:lang="en" ditaarch:DITAArchVersion="1.1"
domains="(topic ui-d) (topic hi-d) (topic pr-d) (topic sw-d) (topic ut-d) (topic indexing-d)"
class="- topic/topic ">
<title id="id138725" class="- topic/title ">PGOPTIONS</title>
<body class="- topic/body ">
<p>Sets additional configuration parameters for the Greenplum
Database master server.</p>
</body>
</topic>
<topic id="topic15" xml:lang="en" ditaarch:DITAArchVersion="1.1"
domains="(topic ui-d) (topic hi-d) (topic pr-d) (topic sw-d) (topic ut-d) (topic indexing-d)"
class="- topic/topic ">
<title id="id138731" class="- topic/title ">PGPORT</title>
<body class="- topic/body ">
<p>The port number of the Greenplum Database server on the master
host. The default port is 5432.</p>
</body>
</topic>
<topic id="topic16" xml:lang="en" ditaarch:DITAArchVersion="1.1"
domains="(topic ui-d) (topic hi-d) (topic pr-d) (topic sw-d) (topic ut-d) (topic indexing-d)"
class="- topic/topic ">
<title id="id138741" class="- topic/title ">PGUSER</title>
<body class="- topic/body ">
<p>The Greenplum Database user name used to connect.</p>
</body>
</topic>
<topic id="topic17" xml:lang="en" ditaarch:DITAArchVersion="1.1"
domains="(topic ui-d) (topic hi-d) (topic pr-d) (topic sw-d) (topic ut-d) (topic indexing-d)"
class="- topic/topic ">
<title id="id138747" class="- topic/title ">PGDATESTYLE</title>
<body class="- topic/body ">
<p>Sets the default style of date/time representation for a session.
(Equivalent to <codeph class="+ topic/ph pr-d/codeph ">SET
datestyle TO...</codeph>)</p>
</body>
</topic>
<topic id="topic18" xml:lang="en" ditaarch:DITAArchVersion="1.1"
domains="(topic ui-d) (topic hi-d) (topic pr-d) (topic sw-d) (topic ut-d) (topic indexing-d)"
class="- topic/topic ">
<title id="id138750" class="- topic/title ">PGTZ</title>
<body class="- topic/body ">
<p>Sets the default time zone for a session. (Equivalent to <codeph
class="+ topic/ph pr-d/codeph ">SET timezone
TO...</codeph>)</p>
</body>
</topic>
<topic id="topic19" xml:lang="en" ditaarch:DITAArchVersion="1.1"
domains="(topic ui-d) (topic hi-d) (topic pr-d) (topic sw-d) (topic ut-d) (topic indexing-d)"
class="- topic/topic ">
<title id="id138753" class="- topic/title ">PGCLIENTENCODING</title>
<body class="- topic/body ">
<p>Sets the default client character set encoding for a session.
(Equivalent to <codeph class="+ topic/ph pr-d/codeph ">SET
client_encoding TO...</codeph>)</p>
</body>
</topic>
</topic>
</topic>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE dita PUBLIC "-//OASIS//DTD DITA Composite//EN" "ditabase.dtd">
<dita>
<topic id="topic12" xml:lang="en">
<title id="ji162527">Procedural Language, Machine Learning, and Geospatial Extensions</title>
<body>
<p><i>Optional.</i> Use the Greenplum package manager (<codeph>gppkg</codeph>) to install
Greenplum Database extensions such as PL/Java, PL/R, PostGIS, and MADlib, along with their
dependencies, across an entire cluster. The package manager also integrates with existing
scripts so that any packages are automatically installed on any new hosts introduced into
the system following cluster expansion or segment host recovery.</p>
<p>See <codeph><xref href="../utility_guide/admin_utilities/gppkg.xml" format="dita"
scope="peer" type="topic">gppkg</xref></codeph> for more information, including
usage.</p>
<p>Extension packages can be downloaded from the Greenplum Database page on <xref
href="https://network.pivotal.io/products/pivotal-gpdb" scope="external" format="html"
class="- topic/xref ">Pivotal Network</xref>. The extension documentation in the <i><xref
format="dita" scope="peer" type="topic" href="../ref_guide/ref_guide.xml">Greenplum
Database Reference Guide</xref></i> contains information about installing extension
packages and using extensions.<ul id="ul_i5j_hrl_dbb">
<li><xref format="dita" scope="peer" type="topic"
href="../ref_guide/extensions/pl_r.xml#topic1">Greenplum PL/R Language
Extension</xref></li>
<li><xref format="dita" scope="peer" type="topic"
href="../ref_guide/extensions/pl_java.xml#topic1">Greenplum PL/Java Language
Extension</xref></li>
<li><xref format="dita" scope="peer" type="topic"
href="../ref_guide/extensions/madlib.xml#topic1">Greenplum MADlib Extension for
Analytics</xref></li>
<li><xref format="dita" scope="peer" type="topic"
href="../ref_guide/extensions/postGIS.xml#topic1">Greenplum PostGIS
Extension</xref></li>
</ul></p>
<note type="important">If you intend to use an extension package with Greenplum Database 6,
you must install and use a Greenplum Database extension package (gppkg files and contrib
modules) that is built for Greenplum Database 6. Any custom modules that were used with
earlier versions must be rebuilt for use with Greenplum Database 6.</note>
</body>
</topic>
</dita>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic
PUBLIC "-//OASIS//DTD DITA Composite//EN" "ditabase.dtd">
<topic id="topic1" xml:lang="en">
<title id="ji162018">Installing the Greenplum Database Software</title>
<shortdesc>Describes how to install the Greenplum Database software binaries on all of the hosts
that will comprise your Greenplum Database system, how to enable passwordless SSH for the
<codeph>gpadmin</codeph> user, and how to verify the installation.</shortdesc>
<body>
<p>Perform the following tasks in order:</p>
<ol>
<li><xref href="#topic_oy5_21n_1jb" format="dita"/></li>
<li><xref href="#topic_xmb_gb5_vhb" format="dita"/></li>
<li><xref href="#topic10" format="dita">Confirm the software installation.</xref></li>
<li><xref href="#topic_cwj_hzb_vhb" format="dita"/></li>
</ol>
</body>
<topic id="topic_oy5_21n_1jb">
<title>Installing Greenplum Database</title>
<body>
<p>You must install Greenplum Database on each host machine of the Greenplum Database
system.<ph otherprops="oss-only"> You can configure a minimal Greenplum Database system to
run on a single host system with the master instance and segment instances running on the
same host system.</ph><ph otherprops="pivotal"> Pivotal distributes the Greenplum Database
software as a downloadable package that you install on each host system with the operating
system's package management system. You can download the package from <xref
href="https://network.pivotal.io/products/pivotal-gpdb" format="html" scope="external"
>Pivotal Network</xref>.</ph></p>
<!--OSS only-->
<p otherprops="oss-only">Greenplum Database releases are available as source code tarballs,
RPM installers for CentOS, and DEB packages for Debian and Ubuntu. See <xref
href="https://greenplum.org/download/" format="html" scope="external"
>https://greenplum.org/download/</xref> for links to source code and instructions to
compile Greenplum Database from source, and for links to download pre-built binaries in RPM
and DEB format. For the Ubuntu operating system, Greenplum also offers a binary that can be
installed via the <codeph>apt-get</codeph> command with the Ubuntu Personal Package Archive
system.</p>
<p>Before you begin installing Greenplum Database, be sure you have completed the steps in
<xref href="prep_os.xml#topic1"/> to configure each of the master, standby master, and
segment host machines for Greenplum Database.</p>
<note type="important">After installing Greenplum Database, you must set Greenplum Database
environment variables. See <xref href="init_gpdb.xml#topic8"/>.</note>
<p>See <xref href="ansible-example.xml#Untitled1"/> for an example script that shows how you
can automate creating the <codeph>gpadmin</codeph> user and installing Greenplum
Database.</p>
<p>Follow these instructions to install Greenplum Database<ph otherprops="oss-only"> from a
pre-built binary</ph>.</p>
<note type="important">You will need sudo or root user access to install from a pre-built
binary. </note>
<ol id="ol_mm5_v1n_1jb">
<li>Download and copy the Greenplum Database package to the <codeph>gpadmin</codeph> user's
home directory on the master, standby master, and every segment host machine. The
distribution file name has the format
<codeph>greenplum-db-&lt;version>-&lt;platform>.rpm</codeph> for RHEL and CentOS
systems, or <codeph>greenplum-db-&lt;version>-&lt;platform>.deb</codeph> for Ubuntu
systems, where <codeph>&lt;platform></codeph> is similar to <codeph>rhel7-x86_64</codeph>
(Red Hat 7 64-bit).</li>
<li>With sudo (or as root), install the Greenplum Database package on each host machine
using your system's package manager software. <ul id="ol_wgt_mxz_1jb">
<li>For RHEL/CentOS systems, execute the <codeph>yum</codeph> command:
<codeblock>$ sudo yum install ./greenplum-db-&lt;version>-&lt;platform>.rpm</codeblock></li>
<li>For Ubuntu systems, execute the <codeph>apt</codeph>
command:<codeblock>$ sudo apt install ./greenplum-db-&lt;version>-&lt;platform>.deb</codeblock></li>
</ul><p>The <codeph>yum</codeph> or <codeph>apt</codeph> command installs software
dependencies, copies the Greenplum Database software files into a version-specific
directory, <codeph>/usr/local/greenplum-db-&lt;version></codeph>, and creates the
symbolic link <codeph>/usr/local/greenplum-db</codeph> to the installation
directory.</p></li>
<li>Change the owner and group of the installed files to
<codeph>gpadmin</codeph>:<codeblock>$ sudo chown -R gpadmin:gpadmin /usr/local/greenplum*</codeblock></li>
</ol>
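<p>The choice of package manager in step 2 can be sketched as a small shell helper. This is
a minimal illustration, not part of the Greenplum Database distribution: the function name
<codeph>gp_install_cmd</codeph> is hypothetical, and it only prints the command that matches
the package file extension so that you can review it before running it on each host.</p>
<codeblock>
```shell
# Hypothetical helper (not a Greenplum utility): print the install command
# for a Greenplum Database package file, chosen by its file extension.
gp_install_cmd() {
  pkg="$1"
  case "$pkg" in
    *.rpm) echo "sudo yum install ./$pkg" ;;   # RHEL/CentOS
    *.deb) echo "sudo apt install ./$pkg" ;;   # Ubuntu
    *) return 1 ;;                             # unknown package type
  esac
}
```
</codeblock>
<p>For example, calling <codeph>gp_install_cmd</codeph> with a file name that ends in
<codeph>.rpm</codeph> prints the corresponding <codeph>yum</codeph> command, and the function
returns a non-zero status for any other file type.</p>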
</body>
</topic>
<topic id="topic_xmb_gb5_vhb">
<title>Enabling Passwordless SSH</title>
<body>
<p>The <codeph>gpadmin</codeph> user on each Greenplum host must be able to SSH from any host
in the cluster to any other host in the cluster without entering a password or passphrase
(called "passwordless SSH"). If you enable passwordless SSH from the master host to every
other host in the cluster ("1-<i>n</i> passwordless SSH"), you can use the Greenplum
Database <codeph>gpssh-exkeys</codeph> command-line utility to enable passwordless SSH from
every host to every other host ("<i>n</i>-<i>n</i> passwordless SSH"). </p>
<ol id="ol_iyx_3b5_vhb">
<li>Log in to the master host as the <codeph>gpadmin</codeph> user.</li>
<li>Source the <codeph>greenplum_path.sh</codeph> file in the Greenplum Database installation directory.<codeblock>$ source /usr/local/greenplum-db-&lt;version>/greenplum_path.sh</codeblock>
<note>Add the above <codeph>source</codeph> command to the <codeph>gpadmin</codeph> user's
<codeph>.bashrc</codeph> or other shell startup file so that the Greenplum Database
path and environment variables are set whenever you log in as
<codeph>gpadmin</codeph>.</note></li>
<li>Use the <codeph>ssh-copy-id</codeph> command to add the <codeph>gpadmin</codeph> user's
public key to the <codeph>authorized_keys</codeph> SSH file on every other host in the
cluster.
<codeblock>$ ssh-copy-id smdw
$ ssh-copy-id sdw1
$ ssh-copy-id sdw2
$ ssh-copy-id sdw3
. . .</codeblock>
This enables 1-<i>n</i> passwordless SSH. You will be prompted to enter the
<codeph>gpadmin</codeph> user's password for each host. If you have the
<codeph>sshpass</codeph> command on your system, you can use a command like the
following to avoid the
prompt.<codeblock>$ SSHPASS=&lt;password> sshpass -e ssh-copy-id smdw</codeblock></li>
<li>In the <codeph>gpadmin</codeph> home directory, create a file named
<codeph>hostfile_exkeys</codeph> that lists the configured host names and host
addresses (interface names) of every host in your Greenplum system (master, standby
master, and segment hosts), one per line. Make sure there are no blank lines or extra spaces. Check the
<codeph>/etc/hosts</codeph> file on your systems for the correct host names to use for
your environment. For example, if you have a master, standby master, and three segment
hosts with two unbonded network interfaces per host, your file would look something like
this:<codeblock>mdw
mdw-1
mdw-2
smdw
smdw-1
smdw-2
sdw1
sdw1-1
sdw1-2
sdw2
sdw2-1
sdw2-2
sdw3
sdw3-1
sdw3-2</codeblock></li>
<li>Run the <codeph>gpssh-exkeys</codeph> utility with your <codeph>hostfile_exkeys</codeph>
file to enable <i>n</i>-<i>n</i> passwordless SSH for the <codeph>gpadmin</codeph>
user.<codeblock>$ gpssh-exkeys -f hostfile_exkeys</codeblock></li>
</ol>
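<p>If your interface names follow a regular pattern, the <codeph>hostfile_exkeys</codeph>
file can be generated rather than typed by hand. The following is a minimal sketch under
stated assumptions: the function name <codeph>gen_hostfile_exkeys</codeph> is hypothetical,
and it assumes that every host has interface names of the form
<codeph>host-1</codeph> through <codeph>host-N</codeph>, as in the sample listing. Always
check the result against <codeph>/etc/hosts</codeph> on your systems.</p>
<codeblock>
```shell
# Hypothetical helper (not a Greenplum utility): print hostfile_exkeys
# entries, one per line, with no blank lines or trailing spaces.
# Usage: gen_hostfile_exkeys NUM_INTERFACES HOST [HOST ...]
gen_hostfile_exkeys() {
  nics="$1"
  shift
  for host in "$@"; do
    echo "$host"                 # the host name itself
    i=1
    while [ "$i" -le "$nics" ]; do
      echo "${host}-${i}"        # assumed interface naming: host-1, host-2, ...
      i=$((i + 1))
    done
  done
}
```
</codeblock>
<p>For example, <codeph>gen_hostfile_exkeys 2 mdw smdw sdw1 sdw2 sdw3 > hostfile_exkeys</codeph>
writes a file with the same entries as the sample listing.</p>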
</body>
</topic>
<topic id="topic10" xml:lang="en">
<title>Confirming Your Installation</title>
<body>
<p>To make sure the Greenplum software was installed and configured correctly, run the
following confirmation steps from your Greenplum master host. If necessary, correct any
problems before continuing on to the next task.</p>
<ol id="ol_yjk_knz_thb">
<li id="ji162490">Log in to the master host as
<codeph>gpadmin</codeph>:<codeblock>$ su - <varname>gpadmin</varname></codeblock></li>
<li id="ji162506">Use the <codeph>gpssh</codeph> utility to see if you can log in to all
hosts without a password prompt, and to confirm that the Greenplum software was installed
on all hosts. Use the <codeph>hostfile_exkeys</codeph> file you used to set up
passwordless SSH. For
example:<codeblock>$ gpssh -f hostfile_exkeys -e 'ls -l /usr/local/greenplum-db-&lt;version>'</codeblock><p>If
the installation was successful, you should be able to log in to all hosts without a
password prompt. All hosts should show that they have the same contents in their
installation directories, and that the directories are owned by the
<codeph>gpadmin</codeph> user.</p><p>If you are prompted for a password, run the
following command to redo the SSH key
exchange:</p><codeblock>$ gpssh-exkeys -f hostfile_exkeys</codeblock></li>
</ol>
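<p>As a complement to the <codeph>gpssh</codeph> check, you can test each host with plain
<codeph>ssh</codeph>. The sketch below is illustrative only: the function name
<codeph>check_passwordless_ssh</codeph> and the <codeph>SSH_BIN</codeph> override variable are
assumptions of this example. It relies on the OpenSSH <codeph>BatchMode=yes</codeph> option,
which makes <codeph>ssh</codeph> fail instead of prompting for a password.</p>
<codeblock>
```shell
# Hypothetical check (not a Greenplum utility): attempt a non-interactive
# SSH login to every host listed in a host file such as hostfile_exkeys.
# SSH_BIN is an assumption of this sketch; it defaults to ssh.
check_passwordless_ssh() {
  hostfile="$1"
  failed=0
  for host in $(cat "$hostfile"); do
    # BatchMode=yes: fail rather than prompt; ConnectTimeout: do not hang.
    if ! "${SSH_BIN:-ssh}" -n -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
      echo "password prompt or connection failure: $host"
      failed=1
    fi
  done
  return "$failed"
}
```
</codeblock>
<p>Run <codeph>check_passwordless_ssh hostfile_exkeys</codeph> as <codeph>gpadmin</codeph> on
the master host. A non-zero exit status means that at least one host would have prompted for a
password or could not be reached.</p>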
</body>
</topic>
<topic id="topic_zdf_1f5_vhb">
<title>About Your Greenplum Database Installation</title>
<body>
<ul id="ul_a2f_1f5_vhb">
<li><codeph>greenplum_path.sh</codeph> — This file contains the environment variables for
Greenplum Database. See <xref href="./init_gpdb.xml#topic8" type="topic" format="dita"
/>.</li>
<li><b>bin</b> — This directory contains the Greenplum Database management utilities. This
directory also contains the PostgreSQL client and server programs, most of which are also
used in Greenplum Database.</li>
<li><b>docs/cli_help</b> — This directory contains help files for Greenplum Database
command-line utilities. </li>
<li><b>docs/cli_help/gpconfigs</b> — This directory contains sample
<codeph>gpinitsystem</codeph> configuration files and host files that can be modified
and used when installing and initializing a Greenplum Database system.</li>
<li><b>ext</b> — Bundled programs (such as Python) used by some Greenplum Database
utilities.</li>
<li><b>include</b> — The C header files for Greenplum Database.</li>
<li><b>lib</b> — Greenplum Database and PostgreSQL library files.</li>
<li><b>sbin</b> — Supporting/Internal scripts and programs.</li>
<li><b>share</b> — Shared files for Greenplum Database.</li>
</ul>
</body>
</topic>
<topic id="topic_cwj_hzb_vhb">
<title>Next Steps</title>
<body>
<ul id="ul_xsq_jzb_vhb">
<li><xref href="create_data_dirs.xml#topic13"/></li>
<li><xref href="validate.xml#topic1">Validating Your Systems</xref></li>
<li><xref href="init_gpdb.xml#topic1">Initializing a Greenplum Database System</xref></li>
</ul>
</body>
</topic>
</topic>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE map PUBLIC "-//OASIS//DTD DITA Map//EN" "map.dtd">
<map>
<topicref href="../../6-0/homenav.html" scope="external"
navtitle="Pivotal Greenplum® 6.0 Documentation" format="html" otherprops="op-help"/>
<topicref href="install_guide.xml" navtitle="Installation Guide">
<topicref href="platform-requirements.xml"/>
<topicref href="preinstall_concepts.xml">
<topicref href="preinstall_concepts.xml#topic2"/>
<topicref href="preinstall_concepts.xml#topic4"/>
<topicref href="preinstall_concepts.xml#topic9"/>
<topicref href="preinstall_concepts.xml#topic13"/>
<!-- pivotal only -->
<topicref href="preinstall_concepts.xml#topic_e5t_whm_kbb" otherprops="pivotal"/>
</topicref>
<topicref href="capacity_planning.xml" navtitle="Estimating Storage Capacity">
<topicref href="capacity_planning.xml#topic2"
navtitle="Calculating Usable Disk
Capacity"/>
<topicref href="capacity_planning.xml#topic3" navtitle="Calculating User Data Size"/>
<topicref href="capacity_planning.xml#topic4"
navtitle="Calculating Space Requirements
for Metadata and Logs"/>
</topicref>
<topicref href="prep_os.xml" navtitle="Configuring Your Systems"/>
<topicref href="install_gpdb.xml" navtitle="Installing the Greenplum Database Software"/>
<topicref href="create_data_dirs.xml"/>
<topicref href="validate.xml" navtitle="Validating Your Systems">
<topicref href="validate.xml#topic4" navtitle="Validating Network Performance"/>
<topicref href="validate.xml#topic5" navtitle="Validating Disk I/O and Memory Bandwidth"
/>
</topicref>
<topicref href="init_gpdb.xml" navtitle="Initializing a Greenplum Database System"/>
<!-- pivotal only -->
<topicref href="data_sci_pkgs.ditamap" format="ditamap" otherprops="pivotal"/>
<topicref href="install_modules.xml" navtitle="Installing Additional Supplied Modules"/>
<topicref href="localization.xml" navtitle="Configuring Localization Settings"/>
<!-- hidden for 6.0 beta releases -->
<topicref otherprops="op-hidden" href="upgrading.xml"/>
<topicref href="migrate.xml"/>
<topicref href="enable_iptables.xml" navtitle="Enabling iptables"/>
<topicref href="apx_mgmt_utils.xml" navtitle="Installation Management Utilities"/>
<topicref href="env_var_ref.xml" navtitle="Greenplum Environment Variables">
<topicref href="env_var_ref.xml#topic2" navtitle="Required Environment Variables"/>
<topicref href="env_var_ref.xml#topic7" navtitle="Optional Environment Variables"/>
</topicref>
<topicref href="ansible-example.xml"/>
</topicref>
</map>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">
<topic id="topic_gls_1nf_kp">
<title>Installing and Upgrading Greenplum</title>
<!--HTML-only page-->
<titlealts>
<navtitle>Installation Guide</navtitle>
</titlealts>
<shortdesc>Information about installing, configuring, and upgrading Greenplum Database software
and configuring Greenplum Database host machines.</shortdesc>
</topic>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE dita PUBLIC "-//OASIS//DTD DITA Composite//EN" "ditabase.dtd">
<dita>
<topic xml:lang="en" id="topic_d45_wcw_pgb">
<title>Installing Additional Supplied Modules</title>
<shortdesc>The Greenplum Database distribution includes several PostgreSQL- and
Greenplum-sourced <codeph>contrib</codeph> modules that you have the option to install. </shortdesc>
<body>
<p>Each module is typically packaged as a Greenplum Database extension. You must register
a module in each database in which you want to use it. For example, to register the
<codeph>dblink</codeph> module in the database named <codeph>testdb</codeph>, use the
command:</p>
<codeblock>$ psql -d testdb -c 'CREATE EXTENSION dblink;'</codeblock>
<p>To remove a module from a database, drop the associated extension. For example, to remove
the <codeph>dblink</codeph> module from the <codeph>testdb</codeph> database:</p>
<codeblock>$ psql -d testdb -c 'DROP EXTENSION dblink;'</codeblock>
<note>When you drop a module extension from a database, any user-defined function that you
created in the database that references functions defined in the module will no longer work.
If you created any database objects that use data types defined in the module, Greenplum
Database will notify you of these dependencies when you attempt to drop the module
extension.</note>
<p>You can register the following modules in this manner:</p>
<simpletable frame="all" relcolwidth="1.0* 1.0* 1.0*" otherprops="op-help">
<strow>
<stentry>
<ul id="ul_tc3_nlx_wp">
<li><xref href="../ref_guide/modules/citext.xml" type="topic" scope="peer"
format="dita">citext</xref></li>
<li><xref href="../ref_guide/modules/dblink.xml" type="topic" scope="peer"
format="dita">dblink</xref></li>
<li><xref href="../ref_guide/modules/fuzzystrmatch.xml" type="topic" scope="peer"
format="dita">fuzzystrmatch</xref></li>
<li><xref href="../ref_guide/modules/gp_sparse_vector.xml" type="topic" scope="peer"
format="dita">gp_sparse_vector</xref></li>
</ul>
</stentry>
<stentry>
<ul>
<li><xref href="../ref_guide/modules/hstore.xml" type="topic" scope="peer"
format="dita">hstore</xref></li>
<li otherprops="pivotal"><xref href="../ref_guide/modules/orafce_ref.xml" type="topic"
scope="peer" format="dita">orafce</xref><!--Pivotal only--></li>
<li><xref href="../ref_guide/modules/pageinspect.xml" type="topic" scope="peer"
format="dita">pageinspect</xref></li>
<li><xref href="../ref_guide/modules/pgcrypto.xml" type="topic" scope="peer"
format="dita">pgcrypto</xref></li>
<li><xref href="../ref_guide/modules/sslinfo.xml" type="topic" scope="peer"
format="dita">sslinfo</xref></li>
</ul>
</stentry>
</strow>
</simpletable>
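<p>To register several modules in one database, you can generate the
<codeph>CREATE EXTENSION</codeph> statements and pipe them to <codeph>psql</codeph>. This is a
minimal sketch: the function name <codeph>register_modules_sql</codeph> is hypothetical, and
the use of <codeph>IF NOT EXISTS</codeph> is a choice made here so that the generated SQL can
be re-run without errors.</p>
<codeblock>
```shell
# Hypothetical helper (not a Greenplum utility): print one CREATE EXTENSION
# statement for each module name given on the command line.
register_modules_sql() {
  for module in "$@"; do
    echo "CREATE EXTENSION IF NOT EXISTS $module;"
  done
}
```
</codeblock>
<p>For example, <codeph>register_modules_sql dblink hstore pgcrypto | psql -d testdb</codeph>
registers three modules in the <codeph>testdb</codeph> database.</p>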
<p>For additional information about the modules supplied with Greenplum Database, refer to
<xref href="../ref_guide/modules/intro.xml" format="dita" scope="peer">Additional Supplied
Modules</xref> in the <i>Greenplum Database Reference Guide</i>. </p>
</body>
</topic>
</dita>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE dita PUBLIC "-//OASIS//DTD DITA Composite//EN" "ditabase.dtd">
<dita>
<topic id="topic12xx" xml:lang="en">
<title id="ji162527xx">Greenplum Platform Extension Framework (PXF)</title>
<body>
<p><i>Optional.</i> If you do not plan to use PXF, no action is necessary.</p>
<p>If you plan to use PXF, refer to <xref href="../pxf/instcfg_pxf.html" type="topic"
format="html">Configuring PXF</xref> for instructions.</p>
</body>
</topic>
</dita>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic
PUBLIC "-//OASIS//DTD DITA Composite//EN" "ditabase.dtd">
<topic id="topic1" xml:lang="en">
<title id="pw216155">Python Data Science Module Package</title>
<body>
<p>Greenplum Database provides a collection of data science-related Python modules that can be
used with the Greenplum Database PL/Python language. You can download these modules in
<codeph>.gppkg</codeph> format from <xref
href="https://network.pivotal.io/products/pivotal-gpdb" format="html" scope="external"
>Pivotal Network</xref>.</p>
<p>This section contains the following information:</p>
<ul>
<li id="pw22228177">
<xref href="#topic_pydatascimod" type="topic" format="dita"/>
</li>
<li id="pw22228178">
<xref href="#topic_instpdsm" type="topic" format="dita"/>
</li>
<li id="pw22228179">
<xref href="#topic_removepdsm" type="topic" format="dita"/>
</li>
</ul>
<p>For information about the Greenplum Database PL/Python Language, see <xref scope="peer"
type="topic" format="dita" href="../ref_guide/extensions/pl_python.xml#topic1">Greenplum
PL/Python Language Extension</xref>.</p>
</body>
<topic id="topic_pydatascimod">
<title>Python Data Science Modules</title>
<body>
<p>Modules provided in the Python Data Science package include: <table id="iq1395577">
<title>Data Science Modules</title>
<tgroup cols="2">
<colspec colnum="1" colname="col1" colwidth="1*"/>
<colspec colnum="2" colname="col2" colwidth="2*"/>
<thead>
<row>
<entry colname="col1">Module Name</entry>
<entry colname="col2">Description/Used For</entry>
</row>
</thead>
<tbody>
<row>
<entry>atomicwrites</entry>
<entry>Atomic file writes</entry>
</row>
<row>
<entry>attrs</entry>
<entry>Declarative approach for defining class attributes</entry>
</row>
<row>
<entry>Autograd</entry>
<entry>Gradient-based optimization</entry>
</row>
<row>
<entry>backports.functools-lru-cache</entry>
<entry>Backports <codeph>functools.lru_cache</codeph> from Python 3.3</entry>
</row>
<row>
<entry colname="col1">Beautiful Soup</entry>
<entry colname="col2">Navigating HTML and XML</entry>
</row>
<row>
<entry>Blis</entry>
<entry>Blis linear algebra routines</entry>
</row>
<row>
<entry>Boto</entry>
<entry>Amazon Web Services library</entry>
</row>
<row>
<entry>Boto3</entry>
<entry>The AWS SDK</entry>
</row>
<row>
<entry>botocore</entry>
<entry>Low-level, data-driven core of boto3</entry>
</row>
<row>
<entry>Bottleneck</entry>
<entry>Fast NumPy array functions</entry>
</row>
<row>
<entry>Bz2file</entry>
<entry>Read and write bzip2-compressed files</entry>
</row>
<row>
<entry>Certifi</entry>
<entry>Provides Mozilla CA bundle</entry>
</row>
<row>
<entry>Chardet</entry>
<entry>Universal encoding detector for Python 2 and 3</entry>
</row>
<row>
<entry>ConfigParser</entry>
<entry>Updated <codeph>configparser</codeph> module</entry>
</row>
<row>
<entry>contextlib2</entry>
<entry>Backports and enhancements for the <codeph>contextlib</codeph> module</entry>
</row>
<row>
<entry>Cycler</entry>
<entry>Composable style cycles</entry>
</row>
<row>
<entry>cymem</entry>
<entry>Manage calls to calloc/free through Cython</entry>
</row>
<row>
<entry>Docutils</entry>
<entry>Python documentation utilities</entry>
</row>
<row>
<entry>enum34</entry>
<entry>Backport of Python 3.4 Enum</entry>
</row>
<row>
<entry>Funcsigs</entry>
<entry>Python function signatures from PEP362</entry>
</row>
<row>
<entry>functools32</entry>
<entry>Backport of the <codeph>functools</codeph> module from Python 3.2.3</entry>
</row>
<row>
<entry>funcy</entry>
<entry>Functional tools focused on practicality</entry>
</row>
<row>
<entry>future</entry>
<entry>Compatibility layer between Python 2 and Python 3</entry>
</row>
<row>
<entry>futures</entry>
<entry>Backport of the <codeph>concurrent.futures</codeph> package from Python
3</entry>
</row>
<row>
<entry colname="col1"> Gensim </entry>
<entry colname="col2">Topic modeling and document indexing</entry>
</row>
<row>
<entry>h5py</entry>
<entry>Read and write HDF5 files</entry>
</row>
<row>
<entry>idna</entry>
<entry>Internationalized Domain Names in Applications (IDNA)</entry>
</row>
<row>
<entry>importlib-metadata</entry>
<entry>Read metadata from Python packages</entry>
</row>
<row>
<entry>Jinja2</entry>
<entry>Stand-alone template engine</entry>
</row>
<row>
<entry>JMESPath</entry>
<entry>JSON Matching Expressions</entry>
</row>
<row>
<entry>Joblib</entry>
<entry>Python functions as pipeline jobs</entry>
</row>
<row>
<entry>jsonschema</entry>
<entry>JSON Schema validation</entry>
</row>
<row>
<entry colname="col1"> Keras (RHEL/CentOS 7 only) </entry>
<entry colname="col2">Deep learning</entry>
</row>
<row>
<entry>Keras Applications</entry>
<entry>Reference implementations of popular deep learning models</entry>
</row>
<row>
<entry>Keras Preprocessing</entry>
<entry>Easy data preprocessing and data augmentation for deep learning
models</entry>
</row>
<row>
<entry>Kiwi</entry>
<entry>A fast implementation of the Cassowary constraint solver</entry>
</row>
<row>
<entry colname="col1"> Lifelines </entry>
<entry colname="col2">Survival analysis</entry>
</row>
<row>
<entry colname="col1"> lxml </entry>
<entry colname="col2">XML and HTML processing</entry>
</row>
<row>
<entry>MarkupSafe</entry>
<entry>Safely add untrusted strings to HTML/XML markup</entry>
</row>
<row>
<entry>Matplotlib</entry>
<entry>Python plotting package</entry>
</row>
<row>
<entry>mock</entry>
<entry>Rolling backport of <codeph>unittest.mock</codeph></entry>
</row>
<row>
<entry>more-itertools</entry>
<entry>More routines for operating on iterables, beyond itertools</entry>
</row>
<row>
<entry>MurmurHash</entry>
<entry>Cython bindings for MurmurHash</entry>
</row>
<row>
<entry colname="col1"> NLTK </entry>
<entry colname="col2">Natural language toolkit</entry>
</row>
<row>
<entry>NumExpr</entry>
<entry>Fast numerical expression evaluator for NumPy</entry>
</row>
<row>
<entry colname="col1"> NumPy </entry>
<entry colname="col2">Scientific computing</entry>
</row>
<row>
<entry>packaging</entry>
<entry>Core utilities for Python packages</entry>
</row>
<row>
<entry colname="col1"> Pandas </entry>
<entry colname="col2">Data analysis</entry>
</row>
<row>
<entry>pathlib, pathlib2</entry>
<entry>Object-oriented filesystem paths</entry>
</row>
<row>
<entry>patsy</entry>
<entry>Package for describing statistical models and for building design
matrices</entry>
</row>
<row>
<entry colname="col1"> Pattern-en </entry>
<entry colname="col2">Part-of-speech tagging</entry>
</row>
<row>
<entry>pip</entry>
<entry>Tool for installing Python packages</entry>
</row>
<row>
<entry>plac</entry>
<entry>Command line arguments parser</entry>
</row>
<row>
<entry>pluggy</entry>
<entry>Plugin and hook calling mechanisms</entry>
</row>
<row>
<entry>preshed</entry>
<entry>Cython hash table that trusts the keys are pre-hashed</entry>
</row>
<row>
<entry>protobuf</entry>
<entry>Protocol buffers</entry>
</row>
<row>
<entry>py</entry>
<entry>Cross-python path, ini-parsing, io, code, log facilities</entry>
</row>
<row>
<entry colname="col1"> pyLDAvis </entry>
<entry colname="col2">Interactive topic model visualization</entry>
</row>
<row>
<entry colname="col1"> PyMC3 </entry>
<entry colname="col2">Statistical modeling and probabilistic machine
learning</entry>
</row>
<row>
<entry>pyparsing</entry>
<entry>Python parsing</entry>
</row>
<row>
<entry>pytest</entry>
<entry>Testing framework</entry>
</row>
<row>
<entry>python-dateutil</entry>
<entry>Extensions to the standard Python datetime module</entry>
</row>
<row>
<entry>pytz</entry>
<entry>World timezone definitions, modern and historical</entry>
</row>
<row>
<entry>PyYAML</entry>
<entry>YAML parser and emitter</entry>
</row>
<row>
<entry>requests</entry>
<entry>HTTP library</entry>
</row>
<row>
<entry>s3transfer</entry>
<entry>Amazon S3 transfer manager</entry>
</row>
<row>
<entry>scandir</entry>
<entry>Directory iteration function</entry>
</row>
<row>
<entry colname="col1"> scikit-learn </entry>
<entry colname="col2">Machine learning, data mining, and analysis</entry>
</row>
<row>
<entry colname="col1"> SciPy </entry>
<entry colname="col2">Scientific computing</entry>
</row>
<row>
<entry>setuptools</entry>
<entry>Download, build, install, upgrade, and uninstall Python packages</entry>
</row>
<row>
<entry>six</entry>
<entry>Python 2 and 3 compatibility library</entry>
</row>
<row>
<entry>smart-open</entry>
<entry>Utilities for streaming large files (S3, HDFS, gzip, bz2, and so
forth)</entry>
</row>
<row>
<entry colname="col1"> spaCy </entry>
<entry colname="col2">Large scale natural language processing</entry>
</row>
<row>
<entry>srsly</entry>
<entry>Modern high-performance serialization utilities for Python</entry>
</row>
<row>
<entry colname="col1"> StatsModels </entry>
<entry colname="col2">Statistical modeling</entry>
</row>
<row>
<entry>subprocess32</entry>
<entry>Backport of the subprocess module from Python 3</entry>
</row>
<row>
<entry colname="col1"> Tensorflow (RHEL/CentOS 7 only) </entry>
<entry colname="col2">Numerical computation using data flow graphs</entry>
</row>
<row>
<entry>Theano</entry>
<entry>Optimizing compiler for evaluating mathematical expressions on CPUs and
GPUs</entry>
</row>
<row>
<entry>thinc</entry>
<entry>Practical Machine Learning for NLP</entry>
</row>
<row>
<entry>tqdm</entry>
<entry>Fast, extensible progress meter</entry>
</row>
<row>
<entry>urllib3</entry>
<entry>HTTP library with thread-safe connection pooling, file post, and more</entry>
</row>
<row>
<entry>wasabi</entry>
<entry>Lightweight console printing and formatting toolkit</entry>
</row>
<row>
<entry>wcwidth</entry>
<entry>Measures number of Terminal column cells of wide-character codes</entry>
</row>
<row>
<entry>Werkzeug</entry>
<entry>Comprehensive WSGI web application library</entry>
</row>
<row>
<entry>wheel</entry>
<entry>A built-package format for Python</entry>
</row>
<row>
<entry colname="col1"> XGBoost </entry>
<entry colname="col2">Gradient boosting, classifying, ranking </entry>
</row>
<row>
<entry>zipp</entry>
<entry>Backport of pathlib-compatible object wrapper for zip files</entry>
</row>
</tbody>
</tgroup>
</table></p>
</body>
</topic>
<topic id="topic_instpdsm" xml:lang="en">
<title>Installing the Python Data Science Module Package</title>
<body>
<p>Before you install the Python Data Science Module package, make sure that Greenplum
Database is running, that you have sourced <codeph>greenplum_path.sh</codeph>, and that the
<codeph>$MASTER_DATA_DIRECTORY</codeph> and <codeph>$GPHOME</codeph> environment variables
are set.</p>
<note>The <codeph>PyMC3</codeph> module depends on <codeph>Tk</codeph>. If you want to use
<codeph>PyMC3</codeph>, you must install the <codeph>tk</codeph> OS package on every node
in your cluster. For example: <codeblock>$ sudo yum install tk</codeblock></note>
<ol>
<li>Locate the Python Data Science module package that you built or downloaded.<p>The file
name format of the package is
<codeph>DataSciencePython-&lt;version&gt;-rhel&lt;N&gt;_x86_64.gppkg</codeph>.</p></li>
<li>Copy the package to the Greenplum Database master host.</li>
<li>Use the <codeph>gppkg</codeph> command to install the package. For
example:<codeblock>$ gppkg -i DataSciencePython-&lt;version&gt;-rhel&lt;N&gt;_x86_64.gppkg</codeblock><p><codeph>gppkg</codeph>
installs the Python Data Science modules on all nodes in your Greenplum Database
cluster. The command also updates the <codeph>PYTHONPATH</codeph>,
<codeph>PATH</codeph>, and <codeph>LD_LIBRARY_PATH</codeph> environment variables in
your <codeph>greenplum_path.sh</codeph> file.</p></li>
<li>Restart Greenplum Database. You must re-source <codeph>greenplum_path.sh</codeph> before
restarting your Greenplum
cluster:<codeblock>$ source /usr/local/greenplum-db/greenplum_path.sh
$ gpstop -r</codeblock></li>
</ol>
<p>The Greenplum Database Python Data Science Modules are installed in the following
directory:</p>
<codeblock>$GPHOME/ext/DataSciencePython/lib/python2.7/site-packages/</codeblock>
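<p>After you re-source <codeph>greenplum_path.sh</codeph>, you can check that the module
directory appears in <codeph>PYTHONPATH</codeph>. The following sketch is illustrative only:
the function name <codeph>path_list_contains</codeph> is hypothetical, and the exact form of
the entry that <codeph>gppkg</codeph> writes (for example, with or without a trailing slash)
is an assumption that you should confirm on your system.</p>
<codeblock>
```shell
# Hypothetical helper (not a Greenplum utility): succeed if a directory is
# an exact entry in a colon-separated path list such as $PYTHONPATH.
# Usage: path_list_contains PATH_LIST DIRECTORY
path_list_contains() {
  case ":$1:" in
    *":$2:"*) return 0 ;;   # the directory is one of the entries
    *) return 1 ;;          # not found
  esac
}
```
</codeblock>
<p>For example, <codeph>path_list_contains "$PYTHONPATH"
"$GPHOME/ext/DataSciencePython/lib/python2.7/site-packages"</codeph> returns a zero exit
status only when that exact entry is present.</p>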
</body>
</topic>
<topic id="topic_removepdsm" xml:lang="en">
<title>Uninstalling the Python Data Science Module Package</title>
<body>
<p>Use the <codeph>gppkg</codeph> utility to uninstall the Python Data Science Module package.
You must include the version number in the package name you provide to
<codeph>gppkg</codeph>.</p>
<p> To determine your Python Data Science Module package version number and remove this
package:</p>
<codeblock>$ gppkg -q --all | grep DataSciencePython
DataSciencePython-&lt;version&gt;
$ gppkg -r DataSciencePython-&lt;version&gt;</codeblock>
<p>The command removes the Python Data Science modules from your Greenplum Database cluster.
It also updates the <codeph>PYTHONPATH</codeph>, <codeph>PATH</codeph>, and
<codeph>LD_LIBRARY_PATH</codeph> environment variables in your
<codeph>greenplum_path.sh</codeph> file to their pre-installation values.</p>
<p>Re-source <codeph>greenplum_path.sh</codeph> and restart Greenplum Database after you
remove the Python Data Science Module package:</p>
<codeblock>$ . /usr/local/greenplum-db/greenplum_path.sh
$ gpstop -r </codeblock>
<note>When you uninstall the Python Data Science Module package from your Greenplum Database
cluster, any UDFs that you have created that import Python modules installed with this
package will return an error.</note>
</body>
</topic>
</topic>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic
PUBLIC "-//OASIS//DTD DITA Composite//EN" "ditabase.dtd">
<topic id="topic1" xml:lang="en">
<title id="py212122">R Data Science Library Package</title>
<body>
<p> R packages are modules that contain R functions and data sets. Greenplum Database provides a
collection of data science-related R libraries that can be used with the Greenplum Database
PL/R language. You can download these libraries in <codeph>.gppkg</codeph> format from <xref
href="https://network.pivotal.io/products/pivotal-gpdb" format="html" scope="external"
>Pivotal Network</xref>.</p>
<p>This chapter contains the following information:</p>
<ul>
<li id="py2177">
<xref href="#topic2" type="topic" format="dita"/>
</li>
<li id="py21366577">
<xref href="#topic_instpdsl" type="topic" format="dita"/>
</li>
<li id="py217165">
<xref href="#topic_removepdsl" type="topic" format="dita"/>
</li>
</ul>
<p>For information about the Greenplum Database PL/R Language, see <xref scope="peer"
type="topic" format="dita" href="../ref_guide/extensions/pl_r.xml#topic1">Greenplum PL/R
Language Extension</xref>.</p>
</body>
<topic xml:lang="en" id="topic2">
<title>R Data Science Libraries</title>
<body>
<p>Libraries provided in the R Data Science package include: <simpletable id="l33">
<strow>
<stentry>
<p>abind</p>
<p>adabag</p>
<p>arm</p>
<p>assertthat</p>
<p>backports</p>
<p>BH</p>
<p>bitops</p>
<p>car</p>
<p>caret</p>
<p>caTools</p>
<p>cli</p>
<p>clipr</p>
<p>coda</p>
<p>colorspace</p>
<p>compHclust</p>
<p>crayon</p>
<p>curl</p>
<p>data.table</p>
<p>DBI</p>
<p>Deriv</p>
<p>dichromat</p>
<p>digest</p>
<p>doParallel</p>
<p>dplyr</p>
<p>e1071</p>
<p>fansi</p>
<p>fastICA</p>
<p>fBasics</p>
<p>fGarch</p>
<p>flashClust</p>
<p>foreach</p>
<p>forecast</p>
<p>foreign</p>
<p>fracdiff</p>
<p>gdata</p>
<p>generics</p>
<p>ggplot2</p>
<p>glmnet</p>
<p>glue</p>
<p>gower</p>
<p>gplots</p>
</stentry>
<stentry>
<p>gss</p>
<p>gtable</p>
<p>gtools</p>
<p>hms</p>
<p>hybridHclust</p>
<p>igraph</p>
<p>ipred</p>
<p>iterators</p>
<p>labeling</p>
<p>lattice</p>
<p>lava</p>
<p>lazyeval</p>
<p>lme4</p>
<p>lmtest</p>
<p>lubridate</p>
<p>magrittr</p>
<p>MASS</p>
<p>Matrix</p>
<p>MatrixModels</p>
<p>mcmc</p>
<p>MCMCpack</p>
<p>minqa</p>
<p>ModelMetrics</p>
<p>MTS</p>
<p>munsell</p>
<p>mvtnorm</p>
<p>neuralnet</p>
<p>nloptr</p>
<p>nnet</p>
<p>numDeriv</p>
<p>pbkrtest</p>
<p>pillar</p>
<p>pkgconfig</p>
<p>plogr</p>
<p>plyr</p>
<p>prodlim</p>
<p>purrr</p>
<p>quadprog</p>
<p>quantmod</p>
<p>quantreg</p>
<p>R2jags</p>
</stentry>
<stentry>
<p>R2WinBUGS</p>
<p>R6</p>
<p>randomForest</p>
<p>RColorBrewer</p>
<p>Rcpp</p>
<p>RcppArmadillo</p>
<p>RcppEigen</p>
<p>RcppRoll</p>
<p>readr</p>
<p>recipes</p>
<p>reshape2</p>
<p>rjags</p>
<p>rlang</p>
<p>RobustRankAggreg</p>
<p>ROCR</p>
<p>rpart</p>
<p>RPostgreSQL</p>
<p>sandwich</p>
<p>scales</p>
<p>SparseM</p>
<p>SQUAREM</p>
<p>stabledist</p>
<p>stringi</p>
<p>stringr</p>
<p>survival</p>
<p>tibble</p>
<p>tidyr</p>
<p>tidyselect</p>
<p>timeDate</p>
<p>timeSeries</p>
<p>tseries</p>
<p>TTR</p>
<p>urca</p>
<p>utf8</p>
<p>vctrs</p>
<p>viridisLite</p>
<p>withr</p>
<p>xts</p>
<p>zeallot</p>
<p>zoo</p>
</stentry>
</strow>
</simpletable></p>
</body>
</topic>
<topic id="topic_instpdsl" xml:lang="en">
<title>Installing the R Data Science Library Package</title>
<body>
<p>Before you install the R Data Science Library package, make sure that Greenplum Database
is running, that you have sourced <codeph>greenplum_path.sh</codeph>, and that the
<codeph>$MASTER_DATA_DIRECTORY</codeph> and <codeph>$GPHOME</codeph> environment variables
are set.</p>
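<p>As a quick pre-installation check, a session along the following lines can confirm these
conditions. This is a sketch rather than a required procedure. It assumes the default
<codeph>/usr/local/greenplum-db</codeph> installation path used elsewhere in this topic. The
two <codeph>echo</codeph> commands should each print a non-empty path, and
<codeph>gpstate</codeph> reports an error if the cluster is not running.</p>
<codeblock>$ source /usr/local/greenplum-db/greenplum_path.sh
$ echo $GPHOME
$ echo $MASTER_DATA_DIRECTORY
$ gpstate</codeblock>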
<ol>
<li>Locate the R Data Science library package that you built or downloaded.<p>The file name
format of the package is
<codeph>DataScienceR-&lt;version&gt;-relhel&lt;N&gt;_x86_64.gppkg</codeph>.</p></li>
<li>Copy the package to the Greenplum Database master host.</li>
<li>Use the <codeph>gppkg</codeph> command to install the package. For
example:<codeblock>$ gppkg -i DataScienceR-&lt;version&gt;-relhel&lt;N&gt;_x86_64.gppkg</codeblock><p><codeph>gppkg</codeph>
installs the R Data Science libraries on all nodes in your Greenplum Database cluster.
The command also sets the <codeph>R_LIBS_USER</codeph> environment variable and updates
the <codeph>PATH</codeph> and <codeph>LD_LIBRARY_PATH</codeph> environment variables in
your <codeph>greenplum_path.sh</codeph> file.</p></li>
<li>Restart Greenplum Database. You must re-source <codeph>greenplum_path.sh</codeph> before
restarting your Greenplum
cluster:<codeblock>$ source /usr/local/greenplum-db/greenplum_path.sh
$ gpstop -r</codeblock></li>
</ol>
<p>The Greenplum Database R Data Science Modules are installed in the following
directory:<codeblock>$GPHOME/ext/DataScienceR/library</codeblock></p>
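<p>One way to confirm the installation, shown here as a sketch, is to query the installed
packages with <codeph>gppkg</codeph> and list the library directory on the master host. The
exact package version string and directory contents depend on the package you installed.</p>
<codeblock>$ gppkg -q --all | grep DataScienceR
$ ls $GPHOME/ext/DataScienceR/library</codeblock>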
<note><codeph>rjags</codeph> libraries are installed in the
<codeph>$GPHOME/ext/DataScienceR/extlib/lib</codeph> directory. If you want to use
<codeph>rjags</codeph> and your <codeph>$GPHOME</codeph> is not
<codeph>/usr/local/greenplum-db</codeph>, you must perform additional configuration steps
to create a symbolic link from <codeph>$GPHOME</codeph> to
<codeph>/usr/local/greenplum-db</codeph> on each node in your Greenplum Database cluster.
For example:
<codeblock>$ gpssh -f all_hosts -e 'ln -s $GPHOME /usr/local/greenplum-db'
$ gpssh -f all_hosts -e 'chown -h gpadmin /usr/local/greenplum-db'
</codeblock></note>
</body>
</topic>
<topic id="topic_removepdsl" xml:lang="en">
<title>Uninstalling the R Data Science Library Package</title>
<body>
<p>Use the <codeph>gppkg</codeph> utility to uninstall the R Data Science Library package. You
must include the version number in the package name you provide to
<codeph>gppkg</codeph>.</p>
<p>To determine your R Data Science Library package version number and remove the
package:</p>
<codeblock>$ gppkg -q --all | grep DataScienceR
DataScienceR-&lt;version&gt;
$ gppkg -r DataScienceR-&lt;version&gt;</codeblock>
<p>The command removes the R Data Science libraries from your Greenplum Database cluster. It
also removes the <codeph>R_LIBS_USER</codeph> environment variable and updates the
<codeph>PATH</codeph> and <codeph>LD_LIBRARY_PATH</codeph> environment variables in your
<codeph>greenplum_path.sh</codeph> file to their pre-installation values.</p>
<p>Re-source <codeph>greenplum_path.sh</codeph> and restart Greenplum Database after you
remove the R Data Science Library package:</p>
<codeblock>$ . /usr/local/greenplum-db/greenplum_path.sh
$ gpstop -r </codeblock>
<note>When you uninstall the R Data Science Library package from your Greenplum Database
cluster, any UDFs that you have created that use R libraries installed with this package
will return an error.</note>
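<p>Before you uninstall, it can help to list the PL/R functions defined in each database so
that you know which UDFs to review. The following query is a sketch that assumes PL/R is
registered under the language name <codeph>plr</codeph>; <codeph>&lt;database&gt;</codeph> is
a placeholder for your database name. The query lists every PL/R function, not only those
that load libraries from this package, so inspect each function body to see which libraries
it uses.</p>
<codeblock>$ psql -d &lt;database&gt; -c "SELECT n.nspname, p.proname
    FROM pg_proc p
    JOIN pg_namespace n ON n.oid = p.pronamespace
    JOIN pg_language l ON l.oid = p.prolang
    WHERE l.lanname = 'plr';"</codeblock>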
</body>
</topic>
</topic>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic
PUBLIC "-//OASIS//DTD DITA Composite//EN" "ditabase.dtd">
<topic id="topic1" xml:lang="en">
<title id="jf110126">Preface</title>
<body>
<p>This guide describes the tasks you must complete to install and start your Greenplum Database
system. </p>
<ul>
<li id="jf165405">
<xref href="#topic2" type="topic" format="dita"/>
</li>
<li id="jf165409">
<xref href="#topic4" type="topic" format="dita"/>
</li>
<li id="jf135184">
<xref href="#topic7" type="topic" format="dita"/>
</li>
</ul>
</body>
<topic id="topic2" xml:lang="en">
<title id="jf132855">About This Guide</title>
<body>
<p>This guide provides information and instructions for installing and initializing a
Greenplum Database system. This guide is intended for system administrators responsible for
building a Greenplum Database system. </p>
<p>This guide assumes knowledge of Linux/Unix system administration, database management
systems, database administration, and structured query language (SQL).</p>
<p>This guide contains the following chapters and appendices:</p>
<ul>
<li id="jf158523"><xref href="preinstall_concepts.xml#topic1" type="topic" format="dita"/>:
Information about the Greenplum system architecture and components.</li>
<li id="jf158530"><xref href="capacity_planning.xml#topic1" type="topic" format="dita"/>:
Guidelines for sizing a Greenplum Database system.</li>
<li id="jf158662"><xref href="prep_os.xml#topic1" type="topic" format="dita"/>:
Instructions for installing and configuring the Greenplum software on all hosts in your
Greenplum Database array.</li>
<li id="jf158540"><xref href="validate.xml#topic1" type="topic" format="dita"/>: Validation
utilities and tests you can perform to ensure your Greenplum Database system will operate
properly.</li>
<li id="jf158550"><xref href="localization.xml#topic1" type="topic" format="dita"/>:
Localization features of Greenplum Database. Locale settings must be configured prior to
initializing your Greenplum Database system.</li>
<li id="jf148595"><xref href="init_gpdb.xml#topic1" type="topic" format="dita"/>:
Instructions for initializing a Greenplum Database system. Each database instance (the
master and all segments) must be initialized across all of the hosts in the system in such
a way that they can all work together as a unified DBMS. </li>
<li id="jf159937"><xref href="apx_mgmt_utils.xml#topic1" type="topic" format="dita"/>:
Reference information about the command-line management utilities you use to install and
initialize a Greenplum Database system.</li>
<li id="jf167820"><xref href="env_var_ref.xml#topic1" type="topic"
format="dita"/>: Reference information about Greenplum environment variables you can
set in your system user's profile file.</li>
</ul>
</body>
</topic>
<topic id="topic3" xml:lang="en">
<title>About the Greenplum Database Documentation Set</title>
<body>
<p>The Greenplum Database server documentation set consists of the following guides.</p>
<table id="jf168868">
<title>Greenplum Database server documentation set</title>
<tgroup cols="2">
<colspec colnum="1" colname="col1" colwidth="130pt"/>
<colspec colnum="2" colname="col2" colwidth="243pt"/>
<thead>
<row>
<entry colname="col1">Guide Name</entry>
<entry colname="col2">Description</entry>
</row>
</thead>
<tbody>
<row>
<entry colname="col1">
<i>Greenplum</i>
<i>Database Administrator Guide</i>
</entry>
<entry colname="col2">Information for administering the Greenplum Database system and
managing databases. It covers topics such as Greenplum Database architecture and
concepts and everyday system administration tasks such as configuring the server,
monitoring system activity, enabling high availability, backing up and restoring
databases, and expanding the system. Database administration topics include
configuring access control, creating databases and database objects, loading data
into databases, writing queries, managing workloads, and monitoring and
troubleshooting performance.</entry>
</row>
<row>
<entry colname="col1">
<i>Greenplum</i>
<i>Database Reference Guide</i>
</entry>
<entry colname="col2">Reference information for Greenplum Database systems: SQL
commands, system catalogs, environment variables, character set support, data types,
the Greenplum MapReduce specification, PostGIS extension, server parameters, the
gp_toolkit administrative schema, and SQL 2008 support.</entry>
</row>
<row>
<entry colname="col1">
<i>Greenplum Database Utility Guide</i>
</entry>
<entry colname="col2">Reference information for command-line utilities, client
programs, and Oracle compatibility functions.</entry>
</row>
<row>
<entry colname="col1">
<i>Greenplum Database Installation Guide</i>
</entry>
<entry colname="col2">Information and instructions for installing and initializing a
Greenplum Database system.</entry>
</row>
</tbody>
</tgroup>
</table>
</body>
</topic>
<topic id="topic4" xml:lang="en">
<title id="jf165456">Document Conventions</title>
<body>
<p>Greenplum documentation adheres to the following conventions to help you identify certain
types of information.</p>
<ul>
<li id="jf165467" otherprops="op-hidden">
<xref href="#topic5" type="topic" format="dita"/>
</li>
<li id="jf165471">
<xref href="#topic6" format="dita"/>
</li>
</ul>
</body>
<topic id="topic5" xml:lang="en" otherprops="op-hidden">
<title id="jf165544">Text Conventions</title>
<body>
<table id="jf165473">
<title>Text Conventions</title>
<tgroup cols="3">
<colspec colnum="1" colname="col1" colwidth="110pt"/>
<colspec colnum="2" colname="col2" colwidth="165pt"/>
<colspec colnum="3" colname="col3" colwidth="174pt"/>
<thead>
<row>
<entry colname="col1">Text Convention</entry>
<entry colname="col2">Usage</entry>
<entry colname="col3">Examples</entry>
</row>
</thead>
<tbody>
<row>
<entry colname="col1">
<b>bold</b>
</entry>
<entry colname="col2">Button, menu, tab, page, and field names in GUI
applications</entry>
<entry colname="col3">Click <b>Cancel</b> to exit the page without saving your
changes.</entry>
</row>
<row>
<entry colname="col1">
<i>italics</i>
</entry>
<entry colname="col2">New terms where they are defined<p>Database objects, such as
schema, table, or column names</p></entry>
<entry colname="col3">The <i>master instance</i> is the <codeph>postgres</codeph>
process that accepts client connections.<p>Catalog information for Greenplum
Database resides in the <i>pg_catalog</i> schema.</p></entry>
</row>
<row>
<entry colname="col1">
<codeph>monospace</codeph>
</entry>
<entry colname="col2">File names and path names<p>Programs and
executables</p><p>Command names and syntax</p><p>Parameter names</p></entry>
<entry colname="col3">Edit the <codeph>postgresql.conf</codeph> file.<p>Use
<codeph>gpstart</codeph> to start Greenplum Database.</p></entry>
</row>
<row>
<entry colname="col1">
<varname>monospace italics</varname>
</entry>
<entry colname="col2">Variable information within file paths and file
names<p>Variable information within command syntax</p></entry>
<entry colname="col3">
<codeph>/home/gpadmin/</codeph>
<varname>config_file</varname>
<p>
<codeph>COPY</codeph>
<varname>tablename</varname>
<codeph>FROM '</codeph>
<varname>filename</varname>
<codeph>'</codeph>
</p>
</entry>
</row>
<row>
<entry colname="col1">
<b>monospace bold</b>
</entry>
<entry colname="col2">Used to call attention to a particular part of a command,
parameter, or code snippet.</entry>
<entry colname="col3">Change the host name, port, and database name in the JDBC
connection
URL:<p><codeph>jdbc:postgresql://<b>host</b>:<b>5432</b>/<b>mydb</b></codeph></p></entry>
</row>
<row>
<entry colname="col1">
<codeph>UPPERCASE</codeph>
</entry>
<entry colname="col2">Environment variables<p>SQL commands</p><p>Keyboard
keys</p></entry>
<entry colname="col3">Make sure that the Java <codeph>/bin</codeph> directory is in
your <codeph>$PATH</codeph>. <p><codeph>SELECT * FROM</codeph>
<varname>my_table</varname><codeph>;</codeph></p><p>Press
<codeph>CTRL+C</codeph> to escape.</p></entry>
</row>
</tbody>
</tgroup>
</table>
</body>
</topic>
<topic id="topic6" xml:lang="en">
<title id="jf165598">Command Syntax Conventions</title>
<body>
<table id="jf165546">
<title>Command Syntax Conventions</title>
<tgroup cols="3">
<colspec colnum="1" colname="col1" colwidth="124pt"/>
<colspec colnum="2" colname="col2" colwidth="162pt"/>
<colspec colnum="3" colname="col3" colwidth="162pt"/>
<thead>
<row>
<entry colname="col1">Text Convention</entry>
<entry colname="col2">Usage</entry>
<entry colname="col3">Examples</entry>
</row>
</thead>
<tbody>
<row>
<entry colname="col1">
<codeph>{ }</codeph>
</entry>
<entry colname="col2">Within command syntax, curly braces group related command
options. Do not type the curly braces.</entry>
<entry colname="col3">
<codeph>FROM</codeph>
<b>{</b>
<codeph>'</codeph>
<varname>filename</varname>
<codeph>' | STDIN</codeph>
<b>}</b>
</entry>
</row>
<row>
<entry colname="col1">
<codeph>[ ]</codeph>
</entry>
<entry colname="col2">Within command syntax, square brackets denote optional
arguments. Do not type the brackets.</entry>
<entry colname="col3">
<codeph>TRUNCATE</codeph>
<b>[</b>
<codeph>TABLE</codeph>
<b>]</b>
<varname>name</varname>
</entry>
</row>
<row>
<entry colname="col1">
<codeph>...</codeph>
</entry>
<entry colname="col2">Within command syntax, an ellipsis denotes repetition of a
command, variable, or option. Do not type the ellipsis.</entry>
<entry colname="col3">
<codeph>DROP TABLE </codeph>
<varname>name</varname>
<codeph>[,</codeph>
<b>...</b>
<codeph>]</codeph>
</entry>
</row>
<row>
<entry colname="col1">
<codeph>|</codeph>
</entry>
<entry colname="col2">Within command syntax, the pipe symbol denotes an "OR"
relationship. Do not type the pipe symbol.</entry>
<entry colname="col3">
<codeph>VACUUM [ FULL</codeph>
<b>|</b>
<codeph>FREEZE ]</codeph>
</entry>
</row>
<row>
<entry colname="col1"><codeph>$</codeph> system_command<p><codeph>#</codeph>
root_system_command</p><p><codeph>=&gt;</codeph>
gpdb_command</p><p><codeph>=#</codeph> su_gpdb_command</p></entry>
<entry colname="col2">Denotes a command prompt. Do not type the prompt symbol.
<codeph>$</codeph> and <codeph>#</codeph> denote terminal command prompts.
<codeph>=&gt;</codeph> and <codeph>=#</codeph> denote Greenplum Database
interactive program command prompts (<codeph>psql</codeph> or
<codeph>gpssh</codeph>, for example).</entry>
<entry colname="col3">$ <codeph>createdb mydatabase</codeph><p># <codeph>chown
gpadmin -R /datadir</codeph></p><p>=&gt; <codeph>SELECT * FROM
mytable;</codeph></p><p>=# <codeph>SELECT * FROM
pg_database;</codeph></p></entry>
</row>
</tbody>
</tgroup>
</table>
</body>
</topic>
<topic id="topic7" xml:lang="en">
<title id="jf165600">Getting Support</title>
<body>
<p>For technical support, documentation, release notes, software updates, or for
information about Pivotal products, licensing, and services, go to <xref
href="http://www.pivotal.io" scope="external" format="html"
>www.pivotal.io</xref>.</p>
</body>
</topic>
</topic>
</topic>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic
PUBLIC "-//OASIS//DTD DITA Composite//EN" "ditabase.dtd">
<topic id="topic_tbx_szy_kbb" otherprops="pivotal">
<title class="- topic/title ">Upgrading from an Earlier Greenplum 6 Release</title>
<!--GPDB 6X Pivotal only. Hidden: see ditamap-->
<shortdesc>The upgrade path supported for this release is Greenplum Database 6.x to a newer
Greenplum Database 6.x release. Direct upgrade from Greenplum Database 4 or 5 to Greenplum 6 is
not supported.</shortdesc>
<body class="- topic/body ">
<note type="important">Pivotal recommends that customers set the Greenplum Database timezone to
a value that is compatible with their host systems. Setting the Greenplum Database timezone
prevents Greenplum Database from selecting a timezone each time the cluster is restarted and
sets the timezone for the Greenplum Database master and segment instances. After you upgrade
to this release and if you have not set a Greenplum Database timezone value, verify that the
selected Greenplum Database timezone is acceptable for your deployment. See <xref
href="localization.xml" format="dita" scope="peer">Configuring Timezone and Localization
Settings</xref> for more information.</note>
<section id="gpdb_prereq">
<title>Prerequisites</title>
<p>Before starting the upgrade process, Pivotal recommends performing the following checks. </p>
<ul id="ul_ubx_szy_kbb">
<li>Verify the health of the Greenplum Database host hardware, and verify that the hosts
meet the requirements for running Greenplum Database. The Greenplum Database
<codeph>gpcheckperf</codeph> utility can assist you in confirming the host requirements.
<note>If you need to run the <codeph>gpcheckcat</codeph> utility, Pivotal recommends
running it a few weeks before the upgrade and that you run <codeph>gpcheckcat</codeph>
during a maintenance period. If necessary, you can resolve any issues found by the
utility before the scheduled upgrade.</note><p>The <codeph>gpcheckcat</codeph> utility is in
<codeph>$GPHOME/bin</codeph>. Pivotal recommends that Greenplum Database be in
restricted mode when you run the <codeph>gpcheckcat</codeph> utility. See the
<cite>Greenplum Database Utility Guide</cite> for information about the
<codeph>gpcheckcat</codeph> utility.</p><p>If <codeph>gpcheckcat</codeph> reports
catalog inconsistencies, you can run <codeph>gpcheckcat</codeph> with the
<codeph>-g</codeph> option to generate SQL scripts to fix the
inconsistencies.</p><p>After you run the SQL scripts, run <codeph>gpcheckcat</codeph>
again. You might need to repeat the process of running <codeph>gpcheckcat</codeph> and
creating SQL scripts to ensure that there are no inconsistencies. Pivotal recommends
that the SQL scripts generated by <codeph>gpcheckcat</codeph> be run on a quiescent
system. The utility might report false alerts if there is activity on the system. </p>
<note type="important">If the <codeph>gpcheckcat</codeph> utility reports errors, but does
not generate a SQL script to fix the errors, contact Pivotal Support. Information for
contacting Pivotal Support is at <xref href="https://support.pivotal.io"
scope="external" format="html" class="- topic/xref "
>https://support.pivotal.io</xref>.</note></li>
<li>During the migration process from Greenplum Database 6, a backup is made of some files
and directories in <codeph>$MASTER_DATA_DIRECTORY</codeph>. Pivotal recommends that files
and directories that are not used by Greenplum Database be backed up, if necessary, and
removed from the <codeph>$MASTER_DATA_DIRECTORY</codeph> before migration. For information
about the Greenplum Database migration utilities, see the <cite>Release Notes</cite>.</li>
</ul>
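<p>The checks above might be run as in the following sketch. The names
<codeph>&lt;hostfile></codeph>, <codeph>&lt;test_dir></codeph>,
<codeph>&lt;output_dir></codeph>, and <codeph>&lt;database></codeph> are placeholders, and the
option choices are one reasonable combination rather than a prescribed procedure. The example
runs the <codeph>gpcheckperf</codeph> disk and stream tests, restarts Greenplum Database in
restricted mode, checks the catalog of every database, generates repair scripts for one
database, and then returns the system to normal mode.</p>
<codeblock>$ gpcheckperf -f &lt;hostfile> -r ds -D -d &lt;test_dir>
$ gpstop -a
$ gpstart -a -R
$ gpcheckcat -A
$ gpcheckcat -g &lt;output_dir> &lt;database>
$ gpstop -a -r</codeblock>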
</section>
<p>If you have configured the Greenplum Platform Extension Framework (PXF) in your previous
Greenplum Database installation, you must stop the PXF service, and you might need to back up
PXF configuration files before upgrading to a new version of Greenplum Database. Refer to
<xref href="../pxf/upgrade_pxf.html#pxfpre" type="topic" format="html">PXF Pre-Upgrade
Actions</xref> for instructions.</p>
<p>If you have not yet configured PXF, no action is necessary.</p>
</body>
<topic id="topic17">
<title id="pm440937" class="- topic/title ">Upgrading from <ph otherprops="0_or_x">6.x</ph> to a
Newer 6.x Release</title>
<body class="- topic/body ">
<p>An upgrade from Greenplum Database <ph otherprops="0_or_x">6.x</ph> to a newer 6.x release
involves stopping Greenplum Database, updating the Greenplum Database software binaries, and
restarting Greenplum Database. If you are using Greenplum Database extension packages, there
are additional requirements. See <xref href="#topic_tbx_szy_kbb/gpdb_prereq" format="dita"/>
in the previous section.</p>
<ol class="- topic/ol " id="ol_wbx_szy_kbb">
<li class="- topic/li ">Log in to your Greenplum Database master host as the Greenplum
administrative user:<codeblock>$ su - gpadmin</codeblock></li>
<li class="- topic/li ">Perform a smart shutdown of your Greenplum Database 6.x system
(there can be no active connections to the database). This example uses the
<codeph>-a</codeph> option to disable confirmation
prompts:<codeblock>$ gpstop -a</codeblock></li>
<li>Copy the new Greenplum Database software installation package to the
<codeph>gpadmin</codeph> user's home directory on each master, standby, and segment
host.</li>
<li class="- topic/li ">As <codeph>root</codeph>, install the new Greenplum Database
software release on each host. For example, on RHEL/CentOS
systems:<codeblock>$ sudo yum install greenplum-db-&lt;version>-&lt;platform>.rpm</codeblock><p>On
Ubuntu
systems:<codeblock># apt install ./greenplum-db-&lt;version>-&lt;platform>.deb</codeblock></p></li>
<li>Update the permissions for the new installation. For example, run this command as
<codeph>root</codeph> to change the user and group of the installed files to
<codeph>gpadmin</codeph>.<codeblock>$ sudo chown -R gpadmin:gpadmin /usr/local/greenplum*</codeblock></li>
<li>If needed, update the <codeph>greenplum_path.sh</codeph> file on the master and standby
master hosts for use with your specific installation. These are some examples.<ul
id="ul_mk3_xjq_cgb">
<li>If Greenplum Database uses LDAP authentication, edit the
<codeph>greenplum_path.sh</codeph> file to add the
line:<codeblock>export LDAPCONF=/etc/openldap/ldap.conf</codeblock></li>
<li>If Greenplum Database uses PL/Java, you might need to set or update the environment
variables <codeph>JAVA_HOME</codeph> and <codeph>LD_LIBRARY_PATH</codeph> in
<codeph>greenplum_path.sh</codeph>. </li>
</ul>
<note>When comparing the previous and new <codeph>greenplum_path.sh</codeph> files, be
aware that installing some Greenplum Database extensions also updates the
<codeph>greenplum_path.sh</codeph> file. The <codeph>greenplum_path.sh</codeph> from
the previous release might contain updates that were the result of installing those
extensions.</note></li>
<li class="- topic/li ">Edit the environment of the Greenplum Database superuser
(<codeph>gpadmin</codeph>) and make sure you are sourcing the <codeph
class="+ topic/ph pr-d/codeph ">greenplum_path.sh</codeph> file for the new
installation. For example, change the following line in the <codeph
class="+ topic/ph pr-d/codeph ">.bashrc</codeph> or your chosen profile
file:<codeblock>source /usr/local/greenplum-db-&lt;current_version>/greenplum_path.sh</codeblock><p>to:</p><codeblock>source /usr/local/greenplum-db-&lt;new_version>/greenplum_path.sh</codeblock><p>Or
if you are sourcing a symbolic link (<codeph>/usr/local/greenplum-db</codeph>) in your
profile files, update the link to point to the newly installed version. For
example:</p><codeblock>$ rm /usr/local/greenplum-db
$ ln -s /usr/local/greenplum-db-&lt;new_version> /usr/local/greenplum-db</codeblock></li>
<li class="- topic/li ">Source the environment file you just edited. For
example:<codeblock>$ source ~/.bashrc</codeblock></li>
<li class="- topic/li ">Use the Greenplum Database <codeph>gppkg</codeph> utility to
re-install Greenplum Database extensions. If you were previously using any Greenplum
Database extensions such as pgcrypto, PL/R, PL/Java, or PostGIS, download the
corresponding packages from <xref href="https://network.pivotal.io/products/pivotal-gpdb"
scope="external" format="html" class="- topic/xref ">Pivotal Network</xref>, and install
using this utility. See the extension documentation for details.<p>Also copy any files
that are used by the extensions (such as JAR files, shared object files, and libraries)
from the previous version installation directory to the new version installation
directory on the master and segment host systems. </p></li>
<li class="- topic/li ">After all segment hosts have been upgraded, log in as the
<codeph>gpadmin</codeph> user and restart your Greenplum Database
system:<codeblock># su - gpadmin
$ gpstart</codeblock></li>
<li>If you configured PXF in your previous Greenplum Database installation, you must
re-initialize the PXF service after you upgrade Greenplum Database. Refer to <xref
href="../pxf/upgrade_pxf.html#pxfup" type="topic" format="html">Upgrading PXF</xref> for
instructions.</li>
</ol>
<p>After upgrading Greenplum Database, ensure that all features work as expected. For example,
test that backup and restore operations succeed, and that Greenplum Database features such as
user-defined functions and extensions such as MADlib and PostGIS perform as expected.</p>
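<p>As a first sanity check after the restart, a few read-only commands can confirm that every
instance is running the new software and that the extension packages are registered. This is
a sketch that assumes you can connect to the <codeph>postgres</codeph> database as
<codeph>gpadmin</codeph>. It does not replace functional testing of your own workloads.</p>
<codeblock>$ gpstate -i
$ psql -d postgres -c 'SELECT version();'
$ gppkg -q --all</codeblock>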
</body>
</topic>
<topic id="topic_zbx_szy_kbb">
<title class="- topic/title ">Troubleshooting a Failed Upgrade</title>
<body class="- topic/body ">
<p>If you experience issues during the migration process and have active entitlements for
Greenplum Database that were purchased through Pivotal, contact Pivotal Support. Information
for contacting Pivotal Support is at <xref href="https://support.pivotal.io"
scope="external" format="html" class="- topic/xref ">https://support.pivotal.io</xref>. </p>
<p>
<b class="+ topic/ph hi-d/b ">Be prepared to provide the following information:</b>
</p>
<ul class="- topic/ul " id="ul_acx_szy_kbb">
<li class="- topic/li ">A completed <xref href="#topic17" format="dita">Upgrade
Procedure</xref></li>
<li class="- topic/li ">Log output from <codeph class="+ topic/ph pr-d/codeph "
>gpcheckcat</codeph> (located in <codeph class="+ topic/ph pr-d/codeph "
>~/gpAdminLogs</codeph>)</li>
</ul>
</body>
</topic>
</topic>