Unverified commit 0af5719f, authored by David Yozie, committed by GitHub

Docs: remove install guide source (#6859)

* bump postgresql url reference to 9.4

* Remove source for install guide

* Revert "bump postgresql url reference to 9.4"

This reverts commit ab3405ae380f2f5a08ca5305f51fd431f479eae3.
Parent 6cacc636
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic
PUBLIC "-//OASIS//DTD DITA Composite//EN" "ditabase.dtd">
<topic id="topic1" xml:lang="en">
<title id="jn135496">Installation Management Utilities</title>
<shortdesc>References for the command-line management utilities used to install and initialize a
Greenplum Database system. </shortdesc>
<body>
<p>For a full reference of all Greenplum Database utilities, see the <i>Greenplum Database
Utility Guide</i>.</p>
<p>The following Greenplum Database management utilities are located in
<codeph>$GPHOME/bin</codeph>.<simpletable id="jn163810">
<strow>
<stentry>
<ul id="ul_vsx_zwn_r4">
<li>
<codeph>
<xref href="../utility_guide/admin_utilities/gpactivatestandby.xml" type="topic"
format="dita" scope="peer">gpactivatestandby</xref>
</codeph>
</li>
<li>
<p>
<codeph>
<xref href="../utility_guide/admin_utilities/gpaddmirrors.xml" type="topic"
format="dita" scope="peer">gpaddmirrors</xref>
</codeph>
</p>
</li>
<li>
<p>
<codeph>
<xref href="../utility_guide/admin_utilities/gpcheck.xml" type="topic"
format="dita" scope="peer">gpcheck</xref></codeph> (deprecated)</p>
</li>
<li>
<p>
<codeph>
<xref href="../utility_guide/admin_utilities/gpcheckperf.xml" type="topic"
format="dita" scope="peer">gpcheckperf</xref>
</codeph>
</p>
</li>
<li>
<p>
<codeph>
<xref href="../utility_guide/admin_utilities/gpdeletesystem.xml" type="topic"
format="dita" scope="peer">gpdeletesystem</xref>
</codeph>
</p>
</li>
<li>
<p>
<codeph>
<xref href="../utility_guide/admin_utilities/gpinitstandby.xml" type="topic"
format="dita" scope="peer">gpinitstandby</xref>
</codeph>
</p>
</li>
<li>
<p>
<codeph>
<xref href="../utility_guide/admin_utilities/gpinitsystem.xml" type="topic"
format="dita" scope="peer">gpinitsystem</xref>
</codeph>
</p>
</li>
</ul>
</stentry>
<stentry>
<ul id="ul_zy5_fxn_r4">
<li>
<codeph>
<xref href="../utility_guide/admin_utilities/gppkg.xml" type="topic" format="dita"
scope="peer">gppkg</xref>
</codeph>
</li>
<li>
<p>
<codeph>
<xref href="../utility_guide/admin_utilities/gpscp.xml" type="topic"
format="dita" scope="peer">gpscp</xref>
</codeph>
</p>
</li>
<li>
<p>
<codeph>
<xref href="../utility_guide/admin_utilities/gpseginstall.xml" type="topic"
format="dita" scope="peer">gpseginstall</xref>
</codeph>
</p>
</li>
<li>
<p>
<codeph>
<xref href="../utility_guide/admin_utilities/gpssh.xml" type="topic"
format="dita" scope="peer">gpssh</xref>
</codeph>
</p>
</li>
<li>
<p>
<codeph>
<xref href="../utility_guide/admin_utilities/gpssh-exkeys.xml" type="topic"
format="dita" scope="peer">gpssh-exkeys</xref>
</codeph>
</p>
</li>
<li>
<p>
<codeph>
<xref href="../utility_guide/admin_utilities/gpstart.xml" type="topic"
format="dita" scope="peer">gpstart</xref>
</codeph>
</p>
</li>
<li>
<p>
<codeph>
<xref href="../utility_guide/admin_utilities/gpstop.xml" type="topic"
format="dita" scope="peer">gpstop</xref>
</codeph>
</p>
</li>
</ul>
</stentry>
</strow>
</simpletable></p>
</body>
</topic>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic
PUBLIC "-//OASIS//DTD DITA Composite//EN" "ditabase.dtd">
<topic id="topic1" xml:lang="en">
<title id="jh138244">Estimating Storage Capacity</title>
<shortdesc>To estimate how much data your Greenplum Database system can accommodate, use these
measurements as guidelines. Also keep in mind that you may want to have extra space for landing
backup files and data load files on each segment host. </shortdesc>
<topic id="topic2" xml:lang="en">
<title id="jh159441">Calculating Usable Disk Capacity</title>
<body>
<p>To calculate how much data a Greenplum Database system can hold, you determine the usable
disk capacity per segment host and then multiply that by the number of segment hosts in your
Greenplum Database array. Start with the raw capacity of the physical disks on a segment
host that are available for data storage (<varname>raw_capacity</varname>), which is:</p>
<codeblock><varname>disk_size</varname> * <varname>number_of_disks</varname></codeblock>
<p>Account for file system formatting overhead (roughly 10 percent) and the RAID level you are
using. For example, if using RAID-10, the calculation would be:</p>
<codeblock>(<varname>raw_capacity</varname> * 0.9) / 2 = <varname>formatted_disk_space</varname></codeblock>
<p>For optimal performance, do not completely fill your disks to
capacity, but run at 70% or lower. So with this in mind, calculate the usable disk space as
follows:</p>
<codeblock><varname>formatted_disk_space</varname> * 0.7 = <varname>usable_disk_space</varname></codeblock>
<p>Once you have formatted RAID disk arrays and accounted for the maximum recommended capacity
(<varname>usable_disk_space</varname>), you will need to calculate how much storage is
actually available for user data (<codeph>U</codeph>). If using Greenplum Database mirrors
for data redundancy, this would then double the size of your user data (<codeph>2 *
U</codeph>). Greenplum Database also requires some space be reserved as a working area for
active queries. The work space should be approximately one third the size of your user data
(work space =
<codeph>U/3</codeph>):<codeblock><b>With mirrors:</b> (2 * U) + U/3 = <varname>usable_disk_space</varname>
<b>Without mirrors:</b> U + U/3 = <varname>usable_disk_space</varname></codeblock></p>
<p>Guidelines for temporary file space and user data space assume a typical analytic workload.
Highly concurrent workloads or workloads with queries that require very large amounts of
temporary space can benefit from reserving a larger working area. Typically, overall system
throughput can be increased while decreasing work area usage through proper workload
management. Additionally, temporary space and user space can be isolated from each other by
specifying that they reside on different tablespaces.</p>
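<p>As a worked illustration of the formulas above, the following Python sketch chains the steps for a single segment host. The function name, its parameters, and the sample disk sizes are assumptions made for this example. The 10 percent formatting overhead, the RAID-10 halving, the 70 percent fill limit, and the U/3 work space come from the text.</p>

```python
def user_data_capacity(disk_size_gb, number_of_disks, mirrored=True):
    """Estimate user data capacity (U) in GB for one RAID-10 segment host."""
    raw_capacity = disk_size_gb * number_of_disks
    # Roughly 10 percent file system overhead, then halve for RAID-10.
    formatted_disk_space = (raw_capacity * 0.9) / 2
    # Run the disks at 70 percent of capacity or lower.
    usable_disk_space = formatted_disk_space * 0.7
    # With mirrors:    (2 * U) + U/3 = usable_disk_space
    # Without mirrors:  U + U/3      = usable_disk_space
    copies = 2 if mirrored else 1
    return usable_disk_space / (copies + 1 / 3)


if __name__ == "__main__":
    # Hypothetical host with eight 1000 GB disks.
    print(round(user_data_capacity(1000, 8, mirrored=True)))   # 1080
    print(round(user_data_capacity(1000, 8, mirrored=False)))  # 1890
```

<p>Multiply the per-host result by the number of segment hosts to approximate the capacity of the whole array.</p>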
<p>In the <i>Greenplum Database Administrator Guide</i>, see these topics:</p>
<ul>
<li id="jh161458">"Managing Workload and Resources" for information about workload
management </li>
<li id="jh161469">"Creating and Managing Tablespaces" for information about moving the
location of temporary files </li>
<li id="jh161162">"Monitoring System State" for information about monitoring Greenplum
Database disk space usage</li>
</ul>
</body>
</topic>
<topic id="topic3" xml:lang="en">
<title id="jh159695">Calculating User Data Size</title>
<body>
<p>As with all databases, the size of your raw data will be slightly larger once it is loaded
into the database. On average, raw data will be about 1.4 times larger on disk after it is
loaded into the database, but could be smaller or larger depending on the data types you are
using, table storage type, in-database compression, and so on.</p>
<ul>
<li id="jh157686">Page Overhead - When your data is loaded into Greenplum Database, it is
divided into pages of 32KB each. Each page has 20 bytes of page overhead.</li>
<li id="jh157687">Row Overhead - In a regular 'heap' storage table, each row of data has 24
bytes of row overhead. An 'append-optimized' storage table has only 4 bytes of row
overhead.</li>
<li id="jh157688">Attribute Overhead - For the data values themselves, the size associated with
each attribute value is dependent upon the data type chosen. As a general rule, you want
to use the smallest data type possible to store your data (assuming you know the possible
values a column will have).</li>
<li id="jh157689">Indexes - In Greenplum Database, indexes are distributed across the
segment hosts as is table data. The default index type in Greenplum Database is B-tree.
Because index size depends on the number of unique values in the index and the data to be
inserted, precalculating the exact size of an index is impossible. However, you can
roughly estimate the size of an index using these
formulas.<codeblock><b>B-tree:</b> <varname>unique_values</varname> * (<varname>data_type_size</varname> + 24 bytes)
<b>Bitmap:</b> (<varname>unique_values</varname> * <varname>number_of_rows</varname> * 1 bit * <varname>compression_ratio</varname> / 8) + (<varname>unique_values</varname> * 32)</codeblock></li>
</ul>
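<p>The two index formulas can be written as small Python helpers, shown here as a rough sketch only. The function names and the sample inputs (including the compression ratio of 0.25) are assumptions for illustration. As the text notes, the results are rough estimates, not exact sizes.</p>

```python
def btree_index_bytes(unique_values, data_type_size):
    """B-tree estimate: unique_values * (data_type_size + 24 bytes)."""
    return unique_values * (data_type_size + 24)


def bitmap_index_bytes(unique_values, number_of_rows, compression_ratio):
    """Bitmap estimate:
    (unique_values * number_of_rows * 1 bit * compression_ratio / 8)
    + (unique_values * 32)
    """
    bitmap_bits = unique_values * number_of_rows * 1 * compression_ratio
    return bitmap_bits / 8 + unique_values * 32


if __name__ == "__main__":
    # Hypothetical column with one million distinct 8-byte values.
    print(btree_index_bytes(1_000_000, 8))          # 32000000
    # Hypothetical low-cardinality column: 10 distinct values over one
    # million rows, with an assumed compression ratio of 0.25.
    print(bitmap_index_bytes(10, 1_000_000, 0.25))  # 312820.0
```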
</body>
</topic>
<topic id="topic4" xml:lang="en">
<title id="jh159741">Calculating Space Requirements for Metadata and Logs</title>
<body>
<p>On each segment host, you will also want to account for space for Greenplum Database log
files and metadata:</p>
<ul>
<li id="jh159754"><b>System Metadata</b> — For each Greenplum Database segment instance
(primary or mirror) or master instance running on a host, estimate approximately 20 MB for
the system catalogs and metadata. </li>
<li id="jh159758"><b>Write Ahead Log</b> — For each Greenplum Database segment (primary or
mirror) or master instance running on a host, allocate space for the write ahead log
(WAL). The WAL is divided into segment files of 64 MB each. At most, the number of WAL
files will be: <codeblock>2 * <varname>checkpoint_segments</varname> + 1</codeblock><p>You
can use this to estimate space requirements for WAL. The default
<varname>checkpoint_segments</varname> setting for a Greenplum Database instance is 8,
meaning 1088 MB WAL space allocated for each segment or master instance on a
host.</p></li>
<li id="jh159765"><b>Greenplum Database Log Files</b> — Each segment instance and the master
instance generates database log files, which will grow over time. Sufficient space should
be allocated for these log files, and some type of log rotation facility should be used to
ensure that the log files do not grow too large. </li>
<li id="jh160818"><b>Command Center Data</b> — The data collection agents utilized by
Command Center run on the same set of hosts as your Greenplum Database instance and
utilize the system resources of those hosts. The resource consumption of the data
collection agent processes on these hosts is minimal and should not significantly impact
database performance. Historical data collected by the collection agents is stored in its
own Command Center database (named <codeph>gpperfmon</codeph>) within your Greenplum
Database system. Collected data is distributed just like regular database data, so you
will need to account for disk space in the data directory locations of your Greenplum
segment instances. The amount of space required depends on the amount of historical data
you would like to keep. Historical data is not automatically truncated. Database
administrators must set up a truncation policy to maintain the size of the Command Center
database.</li>
</ul>
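<p>The WAL and metadata figures above can be combined into a per-host estimate. The sketch below is illustrative only: the function names and the example of eight instances on a host are assumptions, while the 64 MB segment file size, the <codeph>2 * checkpoint_segments + 1</codeph> bound, and the 20 MB metadata estimate come from the list above. Log files and Command Center data are not included because the text gives no fixed size for them.</p>

```python
WAL_SEGMENT_FILE_MB = 64        # size of each WAL segment file, from the text
METADATA_MB_PER_INSTANCE = 20   # approximate system catalog and metadata space


def max_wal_files(checkpoint_segments=8):
    """Upper bound on the number of WAL files for one instance."""
    return 2 * checkpoint_segments + 1


def wal_and_metadata_mb(instances_on_host, checkpoint_segments=8):
    """WAL plus metadata space in MB for all instances on one host."""
    wal_mb = max_wal_files(checkpoint_segments) * WAL_SEGMENT_FILE_MB
    return instances_on_host * (wal_mb + METADATA_MB_PER_INSTANCE)


if __name__ == "__main__":
    print(max_wal_files() * WAL_SEGMENT_FILE_MB)  # 1088, the default case
    # Hypothetical host running 4 primary and 4 mirror segment instances.
    print(wal_and_metadata_mb(8))                 # 8864
```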
</body>
</topic>
</topic>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic
PUBLIC "-//OASIS//DTD DITA Composite//EN" "ditabase.dtd">
<topic xml:lang="en-us" id="about">
<title>Copyright</title>
<body outputclass="db.chapter">
<p>
<xref href="http://pivotal.io/privacy-policy" format="html" scope="external">Privacy
Policy</xref> | <xref href="http://pivotal.io/terms-of-use" format="html"
scope="external">Terms of Use</xref></p>
<p>Copyright © 2017 Pivotal Software, Inc. All rights reserved.</p>
<p>Pivotal Software, Inc. believes the information in this publication is accurate as of its
publication date. The information is subject to change without notice. THE INFORMATION IN
THIS PUBLICATION IS PROVIDED "AS IS." PIVOTAL SOFTWARE, INC. ("Pivotal") MAKES NO
REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS
PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS
FOR A PARTICULAR PURPOSE.</p>
<p>Use, copying, and distribution of any Pivotal software described in this publication
requires an applicable software license.</p>
<p>All trademarks used herein are the property of Pivotal or their respective owners.</p>
<p> </p>
<p>Revised January 2017 (4.3.11.2)</p>
</body>
</topic>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE map PUBLIC "-//OASIS//DTD DITA Map//EN" "map.dtd">
<map>
<topicref href="data_sci_pkgs.xml" navtitle="Installing the Data Science Packages">
<topicref href="install_python_dsmod.xml"
navtitle="Installing the Python Data Science Modules"/>
<topicref href="install_r_dslib.xml" navtitle="Installing the R Data Science Libraries"/>
</topicref>
</map>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">
<topic id="topic_dscipkg">
<title>Installing the Data Science Packages</title>
<titlealts>
<!--HTML-only page-->
<navtitle>Installing the Data Science Packages</navtitle>
</titlealts>
<shortdesc>Information about installing the Greenplum Database Python and R Data Science Packages.</shortdesc>
</topic>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic
PUBLIC "-//OASIS//DTD DITA Composite//EN" "ditabase.dtd">
<topic xmlns:ditaarch="http://dita.oasis-open.org/architecture/2005/" id="topic1" xml:lang="en"
ditaarch:DITAArchVersion="1.1"
domains="(topic ui-d) (topic hi-d) (topic pr-d) (topic sw-d) (topic ut-d) (topic indexing-d)"
class="- topic/topic ">
<title id="id135496" class="- topic/title ">Greenplum Environment Variables</title>
<shortdesc>Reference of the environment variables to set for Greenplum Database. </shortdesc>
<body class="- topic/body ">
<p>Set these in your user's startup shell profile (such as <codeph
class="+ topic/ph pr-d/codeph ">~/.bashrc</codeph> or <codeph
class="+ topic/ph pr-d/codeph ">~/.bash_profile</codeph>), or in
<codeph class="+ topic/ph pr-d/codeph ">/etc/profile</codeph> if you
want to set them for all users.</p>
</body>
<topic id="topic2" xml:lang="en" ditaarch:DITAArchVersion="1.1"
domains="(topic ui-d) (topic hi-d) (topic pr-d) (topic sw-d) (topic ut-d) (topic indexing-d)"
class="- topic/topic ">
<title class="- topic/title ">Required Environment Variables</title>
<body class="- topic/body ">
<note type="note" class="- topic/note "><codeph
class="+ topic/ph pr-d/codeph ">GPHOME</codeph>, <codeph
class="+ topic/ph pr-d/codeph ">PATH</codeph> and <codeph
class="+ topic/ph pr-d/codeph ">LD_LIBRARY_PATH</codeph> can
be set by sourcing the <codeph>greenplum_path.sh</codeph> file from
your Greenplum Database installation directory.</note>
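<p>To confirm that a login shell actually exports the four required variables described in the topics that follow, a small check such as this Python sketch may help. The helper name is hypothetical and is not part of Greenplum Database. It only reports variables that are unset or empty and does not validate their values.</p>

```python
import os

REQUIRED_VARIABLES = ("GPHOME", "PATH", "LD_LIBRARY_PATH", "MASTER_DATA_DIRECTORY")


def missing_required_variables(environ=None):
    """Return the required Greenplum variables that are unset or empty."""
    if environ is None:
        environ = os.environ
    return [name for name in REQUIRED_VARIABLES if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_required_variables()
    if missing:
        print("Not set: " + ", ".join(missing))
    else:
        print("All required Greenplum environment variables are set.")
```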
</body>
<topic id="topic3" xml:lang="en" ditaarch:DITAArchVersion="1.1"
domains="(topic ui-d) (topic hi-d) (topic pr-d) (topic sw-d) (topic ut-d) (topic indexing-d)"
class="- topic/topic ">
<title id="id138636" class="- topic/title ">GPHOME</title>
<body class="- topic/body ">
<p>This is the installed location of your Greenplum Database
software. For example:</p>
<codeblock>GPHOME=/usr/local/greenplum-db-4.3.<varname>x.x</varname>
export GPHOME</codeblock>
</body>
</topic>
<topic id="topic4" xml:lang="en" ditaarch:DITAArchVersion="1.1"
domains="(topic ui-d) (topic hi-d) (topic pr-d) (topic sw-d) (topic ut-d) (topic indexing-d)"
class="- topic/topic ">
<title id="id139357" class="- topic/title ">PATH</title>
<body class="- topic/body ">
<p>Your <codeph class="+ topic/ph pr-d/codeph ">PATH</codeph>
environment variable should point to the location of the
Greenplum Database <codeph class="+ topic/ph pr-d/codeph "
>bin</codeph> directory. For example:</p>
<codeblock>PATH=$GPHOME/bin:$PATH
export PATH</codeblock>
</body>
</topic>
<topic id="topic5" xml:lang="en" ditaarch:DITAArchVersion="1.1"
domains="(topic ui-d) (topic hi-d) (topic pr-d) (topic sw-d) (topic ut-d) (topic indexing-d)"
class="- topic/topic ">
<title id="id138662" class="- topic/title ">LD_LIBRARY_PATH</title>
<body class="- topic/body ">
<p>The <codeph class="+ topic/ph pr-d/codeph "
>LD_LIBRARY_PATH</codeph> environment variable
should point to the location of the Greenplum
Database/PostgreSQL library files. For example:</p>
<codeblock>LD_LIBRARY_PATH=$GPHOME/lib
export LD_LIBRARY_PATH</codeblock>
</body>
</topic>
<topic id="topic6" xml:lang="en" ditaarch:DITAArchVersion="1.1"
domains="(topic ui-d) (topic hi-d) (topic pr-d) (topic sw-d) (topic ut-d) (topic indexing-d)"
class="- topic/topic ">
<title id="id138677" class="- topic/title ">MASTER_DATA_DIRECTORY</title>
<body class="- topic/body ">
<p>This should point to the directory created by the gpinitsystem
utility in the master data directory location. For
example:</p>
<codeblock>MASTER_DATA_DIRECTORY=/data/master/gpseg-1
export MASTER_DATA_DIRECTORY</codeblock>
</body>
</topic>
</topic>
<topic id="topic7" xml:lang="en" ditaarch:DITAArchVersion="1.1"
domains="(topic ui-d) (topic hi-d) (topic pr-d) (topic sw-d) (topic ut-d) (topic indexing-d)"
class="- topic/topic ">
<title class="- topic/title ">Optional Environment Variables</title>
<body class="- topic/body ">
<p>The following are standard PostgreSQL environment variables, which are
also recognized in Greenplum Database. You may want to add the
connection-related environment variables to your profile for
convenience, so you do not have to type so many options on the
command line for client connections. Note that these environment
variables should be set on the Greenplum Database master host
only.</p>
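<p>The following Python sketch shows one way to inspect which connection-related variables are set in the current environment. It is an illustration only: the helper name is an assumption, client programs read these variables themselves, and the only default applied here is the port 5432 stated in this reference.</p>

```python
import os

# Connection-related variables described in this section.
CONNECTION_VARIABLES = ("PGHOST", "PGHOSTADDR", "PGPORT", "PGDATABASE", "PGUSER")


def connection_settings(environ=None):
    """Collect the connection-related variables that are currently set."""
    if environ is None:
        environ = os.environ
    settings = {
        name: environ[name] for name in CONNECTION_VARIABLES if environ.get(name)
    }
    # This reference gives 5432 as the default port when PGPORT is not set.
    settings.setdefault("PGPORT", "5432")
    return settings


if __name__ == "__main__":
    for name, value in sorted(connection_settings().items()):
        print(f"{name}={value}")
```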
</body>
<topic id="topic8" xml:lang="en" ditaarch:DITAArchVersion="1.1"
domains="(topic ui-d) (topic hi-d) (topic pr-d) (topic sw-d) (topic ut-d) (topic indexing-d)"
class="- topic/topic ">
<title id="id139713" class="- topic/title ">PGAPPNAME</title>
<body class="- topic/body ">
<p>The name of the application that is usually set by an application
when it connects to the server. This name is displayed in
the activity view and in log entries. The <codeph
class="+ topic/ph pr-d/codeph ">PGAPPNAME</codeph>
environment variable behaves the same as the <codeph
class="+ topic/ph pr-d/codeph "
>application_name</codeph> connection parameter. The
default value for <codeph class="+ topic/ph pr-d/codeph "
>application_name</codeph> is <codeph>psql</codeph>.
The name cannot be longer than 63 characters. </p>
</body>
</topic>
<topic id="topic9" xml:lang="en" ditaarch:DITAArchVersion="1.1"
domains="(topic ui-d) (topic hi-d) (topic pr-d) (topic sw-d) (topic ut-d) (topic indexing-d)"
class="- topic/topic ">
<title id="id139717" class="- topic/title ">PGDATABASE</title>
<body class="- topic/body ">
<p>The name of the default database to use when connecting.</p>
</body>
</topic>
<topic id="topic10" xml:lang="en" ditaarch:DITAArchVersion="1.1"
domains="(topic ui-d) (topic hi-d) (topic pr-d) (topic sw-d) (topic ut-d) (topic indexing-d)"
class="- topic/topic ">
<title id="id138709" class="- topic/title ">PGHOST</title>
<body class="- topic/body ">
<p>The Greenplum Database master host name.</p>
</body>
</topic>
<topic id="topic11" xml:lang="en" ditaarch:DITAArchVersion="1.1"
domains="(topic ui-d) (topic hi-d) (topic pr-d) (topic sw-d) (topic ut-d) (topic indexing-d)"
class="- topic/topic ">
<title id="id138715" class="- topic/title ">PGHOSTADDR</title>
<body class="- topic/body ">
<p>The numeric IP address of the master host. This can be set
instead of or in addition to <codeph
class="+ topic/ph pr-d/codeph ">PGHOST</codeph> to
avoid DNS lookup overhead.</p>
</body>
</topic>
<topic id="topic12" xml:lang="en" ditaarch:DITAArchVersion="1.1"
domains="(topic ui-d) (topic hi-d) (topic pr-d) (topic sw-d) (topic ut-d) (topic indexing-d)"
class="- topic/topic ">
<title id="id138718" class="- topic/title ">PGPASSWORD </title>
<body class="- topic/body ">
<p>The password used if the server demands password authentication.
Use of this environment variable is not recommended for
security reasons (some operating systems allow non-root
users to see process environment variables via <codeph
class="+ topic/ph pr-d/codeph ">ps</codeph>).
Instead consider using the <codeph
class="+ topic/ph pr-d/codeph ">~/.pgpass</codeph>
file.</p>
</body>
</topic>
<topic id="topic13" xml:lang="en" ditaarch:DITAArchVersion="1.1"
domains="(topic ui-d) (topic hi-d) (topic pr-d) (topic sw-d) (topic ut-d) (topic indexing-d)"
class="- topic/topic ">
<title id="id138721" class="- topic/title ">PGPASSFILE </title>
<body class="- topic/body ">
<p>The name of the password file to use for lookups. If not set, it
defaults to <codeph class="+ topic/ph pr-d/codeph "
>~/.pgpass</codeph>. See the topic about <xref
href="https://www.postgresql.org/docs/8.2/libpq-pgpass.html"
scope="external" format="html" class="- topic/xref "
>The Password File</xref> in the PostgreSQL
documentation for more information.</p>
</body>
</topic>
<topic id="topic14" xml:lang="en" ditaarch:DITAArchVersion="1.1"
domains="(topic ui-d) (topic hi-d) (topic pr-d) (topic sw-d) (topic ut-d) (topic indexing-d)"
class="- topic/topic ">
<title id="id138725" class="- topic/title ">PGOPTIONS</title>
<body class="- topic/body ">
<p>Sets additional configuration parameters for the Greenplum
Database master server.</p>
</body>
</topic>
<topic id="topic15" xml:lang="en" ditaarch:DITAArchVersion="1.1"
domains="(topic ui-d) (topic hi-d) (topic pr-d) (topic sw-d) (topic ut-d) (topic indexing-d)"
class="- topic/topic ">
<title id="id138731" class="- topic/title ">PGPORT</title>
<body class="- topic/body ">
<p>The port number of the Greenplum Database server on the master
host. The default port is 5432.</p>
</body>
</topic>
<topic id="topic16" xml:lang="en" ditaarch:DITAArchVersion="1.1"
domains="(topic ui-d) (topic hi-d) (topic pr-d) (topic sw-d) (topic ut-d) (topic indexing-d)"
class="- topic/topic ">
<title id="id138741" class="- topic/title ">PGUSER</title>
<body class="- topic/body ">
<p>The Greenplum Database user name used to connect.</p>
</body>
</topic>
<topic id="topic17" xml:lang="en" ditaarch:DITAArchVersion="1.1"
domains="(topic ui-d) (topic hi-d) (topic pr-d) (topic sw-d) (topic ut-d) (topic indexing-d)"
class="- topic/topic ">
<title id="id138747" class="- topic/title ">PGDATESTYLE</title>
<body class="- topic/body ">
<p>Sets the default style of date/time representation for a session.
(Equivalent to <codeph class="+ topic/ph pr-d/codeph ">SET
datestyle TO...</codeph>)</p>
</body>
</topic>
<topic id="topic18" xml:lang="en" ditaarch:DITAArchVersion="1.1"
domains="(topic ui-d) (topic hi-d) (topic pr-d) (topic sw-d) (topic ut-d) (topic indexing-d)"
class="- topic/topic ">
<title id="id138750" class="- topic/title ">PGTZ</title>
<body class="- topic/body ">
<p>Sets the default time zone for a session. (Equivalent to <codeph
class="+ topic/ph pr-d/codeph ">SET timezone
TO...</codeph>)</p>
</body>
</topic>
<topic id="topic19" xml:lang="en" ditaarch:DITAArchVersion="1.1"
domains="(topic ui-d) (topic hi-d) (topic pr-d) (topic sw-d) (topic ut-d) (topic indexing-d)"
class="- topic/topic ">
<title id="id138753" class="- topic/title ">PGCLIENTENCODING</title>
<body class="- topic/body ">
<p>Sets the default client character set encoding for a session.
(Equivalent to <codeph class="+ topic/ph pr-d/codeph ">SET
client_encoding TO...</codeph>)</p>
</body>
</topic>
</topic>
</topic>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic
PUBLIC "-//OASIS//DTD DITA Composite//EN" "ditabase.dtd">
<topic id="topic1" xml:lang="en">
<title id="jm138244">Initializing a Greenplum Database System</title>
<shortdesc>Describes how to initialize a Greenplum Database system. </shortdesc>
<body>
<p>The instructions in this chapter assume you have already installed the Greenplum Database
software on all of the hosts in the system according to the instructions in <xref
href="prep_os_install_gpdb.xml#topic1" type="topic" format="dita"/>. </p>
<p>This chapter contains the following topics:</p>
<ul>
<li id="jm137819">
<xref href="#topic2" type="topic" format="dita"/>
</li>
<li id="jm138219">
<xref href="#topic3" type="topic" format="dita"/>
</li>
<li id="jm138984">
<xref href="#topic9" type="topic" format="dita"/>
</li>
</ul>
</body>
<topic id="topic2" xml:lang="en">
<title id="jm137384">Overview</title>
<body>
<p>Because Greenplum Database is distributed, the process for initializing a Greenplum
Database management system (DBMS) involves initializing several individual PostgreSQL
database instances (called <i>segment instances</i> in Greenplum).</p>
<p>Each database instance (the master and all segments) must be initialized across all of the
hosts in the system in such a way that they can all work together as a unified DBMS.
Greenplum provides its own version of <codeph>initdb</codeph> called <codeph><xref
href="../utility_guide/admin_utilities/gpinitsystem.xml" type="topic" format="dita"
scope="peer">gpinitsystem</xref></codeph>, which takes care of initializing the database
on the master and on each segment instance, and starting each instance in the correct order. </p>
<p>After the Greenplum Database system has been initialized and started, you can then
create and manage databases as you would in a regular PostgreSQL DBMS by connecting to the
Greenplum master.</p>
</body>
</topic>
<topic id="topic3" xml:lang="en">
<title id="jm137833">Initializing Greenplum Database</title>
<body>
<p>These are the high-level tasks for initializing Greenplum Database:</p>
<ol>
<li id="jm138478">Make sure you have completed all of the installation tasks described in
<xref href="prep_os_install_gpdb.xml#topic1" type="topic" format="dita"/>.</li>
<li id="jm138489">Create a host file that contains the host addresses of your
<varname>segments</varname>. See <xref href="#topic4" type="topic" format="dita"/>.</li>
<li id="jm138493">Create your Greenplum Database system configuration file. See <xref
href="#topic5" type="topic" format="dita"/>.</li>
<li id="jm138497">By default, Greenplum Database will be initialized using the locale of the
master host system. Make sure this is the correct locale you want to use, as some locale
options cannot be changed after initialization. See <xref href="localization.xml#topic1"
type="topic" format="dita"/> for more information.</li>
<li id="jm143805">Run the Greenplum Database initialization utility on the master host. See
<xref href="#topic6" type="topic" format="dita"/>.</li>
</ol>
</body>
<topic id="topic4" xml:lang="en">
<title id="jm138555">Creating the Initialization Host File</title>
<body>
<p>The <codeph><xref href="../utility_guide/admin_utilities/gpinitsystem.xml" type="topic"
format="dita" scope="peer">gpinitsystem</xref></codeph> utility requires a host file
that contains the list of addresses for each segment host. The initialization utility
determines the number of segment instances per host by the number of host addresses listed
per host times the number of data directory locations specified in the
<codeph>gpinitsystem_config</codeph> file.</p>
<p>This file should only contain <varname>segment</varname> host addresses (not the master
or standby master). For segment machines with multiple, unbonded network interfaces, this
file should list the host address names for each interface — one per line.</p>
<note type="note">The Greenplum Database segment host naming convention is
<varname>sdwN</varname> where <varname>sdw</varname> is a prefix and
<varname>N</varname> is an integer. For example, <codeph>sdw1</codeph>, <codeph>sdw2</codeph>, and so on. If
hosts have multiple unbonded NICs, the convention is to append a dash (<codeph>-</codeph>)
and number to the host name. For example, <codeph>sdw1-1</codeph> and
<codeph>sdw1-2</codeph> are the two interface names for host <codeph>sdw1</codeph>.
However, NIC bonding is recommended to create a load-balanced, fault-tolerant
network.</note>
<section id="jm138608">
<title>To create the initialization host file</title>
<ol>
<li id="jm144077">Log in as
<codeph>gpadmin</codeph>.<codeblock>$ su - gpadmin</codeblock></li>
<li id="jm144112">Create a file named <codeph>hostfile_gpinitsystem</codeph>. In this
file, add the host address names of your <i>segment</i> host interfaces, one name per
line, with no extra lines or spaces. For example, if you have four segment hosts with two
unbonded network interfaces
each:<codeblock>sdw1-1
sdw1-2
sdw2-1
sdw2-2
sdw3-1
sdw3-2
sdw4-1
sdw4-2</codeblock></li>
<li id="jm138635">Save and close the file.</li>
</ol>
<note type="note">If you are not sure of the host names and/or interface address names
used by your machines, look in the <codeph>/etc/hosts</codeph> file.</note>
</section>
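<p>For larger clusters, the host file can be generated rather than typed by hand. The Python sketch below assumes hosts follow the <codeph>sdwN</codeph> naming convention described in the note above and that every host has the same number of interfaces. The function names are hypothetical. Check the generated names against <codeph>/etc/hosts</codeph> before using them.</p>

```python
def hostfile_lines(segment_hosts, interfaces_per_host):
    """Build host file entries such as sdw1-1, one interface name per entry."""
    lines = []
    for host_number in range(1, segment_hosts + 1):
        if interfaces_per_host == 1:
            # A single (or bonded) interface uses the plain host name.
            lines.append(f"sdw{host_number}")
            continue
        for nic_number in range(1, interfaces_per_host + 1):
            lines.append(f"sdw{host_number}-{nic_number}")
    return lines


def hostfile_text(segment_hosts, interfaces_per_host):
    """Join the entries one per line, with no blank lines or extra spaces."""
    return "\n".join(hostfile_lines(segment_hosts, interfaces_per_host)) + "\n"


if __name__ == "__main__":
    # Four segment hosts with two unbonded interfaces each, as in the
    # example above.
    print(hostfile_text(4, 2), end="")
```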
</body>
</topic>
<topic id="topic5" xml:lang="en">
<title id="jm138566">Creating the Greenplum Database Configuration File</title>
<body>
<p>Your Greenplum Database configuration file tells the <codeph><xref
href="../utility_guide/admin_utilities/gpinitsystem.xml" type="topic" format="dita"
scope="peer">gpinitsystem</xref></codeph> utility how you want to configure your
Greenplum Database system. An example configuration file can be found in
<codeph>$GPHOME/docs/cli_help/gpconfigs/gpinitsystem_config</codeph>.</p>
<section id="jm138725">
<title>To create a gpinitsystem_config file</title>
<ol>
<li id="jm144228">Log in as
<codeph>gpadmin</codeph>.<codeblock>$ su - gpadmin</codeblock></li>
<li id="jm144214">Make a copy of the <codeph>gpinitsystem_config</codeph> file to use as
a starting point. For
example:<codeblock>$ cp $GPHOME/docs/cli_help/gpconfigs/gpinitsystem_config \
/home/gpadmin/gpconfigs/gpinitsystem_config</codeblock></li>
<li id="jm138730">Open the file you just copied in a text editor. <p>Set all of the
required parameters according to your environment. See <xref
href="../utility_guide/admin_utilities/gpinitsystem.xml" type="topic"
format="dita" scope="peer">gpinitsystem</xref> for more information. A Greenplum
Database system must contain a master instance and at <i>least two</i> segment
instances (even if setting up a single node system). </p><p>The
<codeph>DATA_DIRECTORY</codeph> parameter is what determines how many segments per
host will be created. If your segment hosts have multiple network interfaces, and
you used their interface address names in your host file, the number of segments
will be evenly spread over the number of available interfaces.</p><p>Here is an
example of the <i>required</i> parameters in the
<codeph>gpinitsystem_config</codeph> file:
</p><codeblock>ARRAY_NAME="EMC Greenplum DW"
SEG_PREFIX=gpseg
PORT_BASE=40000
declare -a DATA_DIRECTORY=(/data1/primary /data1/primary
/data1/primary /data2/primary /data2/primary /data2/primary)
MASTER_HOSTNAME=mdw
MASTER_DIRECTORY=/data/master
MASTER_PORT=5432
TRUSTED_SHELL=ssh
CHECK_POINT_SEGMENTS=8
ENCODING=UNICODE</codeblock></li>
<li id="jm143448">(Optional) If you want to deploy mirror segments, uncomment and set
the mirroring parameters according to your environment. Here is an example of the
<i>optional</i> mirror parameters in the <codeph>gpinitsystem_config</codeph> file:<codeblock>MIRROR_PORT_BASE=7000
REPLICATION_PORT_BASE=8000
MIRROR_REPLICATION_PORT_BASE=9000
declare -a MIRROR_DATA_DIRECTORY=(/data1/mirror /data1/mirror /data1/mirror /data2/mirror /data2/mirror /data2/mirror)</codeblock>
<note>You can initialize your Greenplum system with primary segments only and deploy
mirrors later using the <codeph><xref
href="../utility_guide/admin_utilities/gpaddmirrors.xml" type="topic"
format="dita" scope="peer">gpaddmirrors</xref></codeph> utility.</note></li>
<li id="jm138783">Save and close the file.</li>
</ol>
</section>
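<p>Before running the initialization utility, it can be useful to check that the edited file still assigns every required parameter. The Python sketch below is a hypothetical helper, not a Greenplum tool. It looks only for uncommented assignments of the parameter names from the example of required parameters, written in their underscore-separated form (for example <codeph>TRUSTED_SHELL</codeph>), and does not validate the values.</p>

```python
import re

# Parameter names from the example of required parameters.
REQUIRED_PARAMETERS = (
    "ARRAY_NAME", "SEG_PREFIX", "PORT_BASE", "DATA_DIRECTORY",
    "MASTER_HOSTNAME", "MASTER_DIRECTORY", "MASTER_PORT",
    "TRUSTED_SHELL", "CHECK_POINT_SEGMENTS", "ENCODING",
)

# Matches NAME=value and "declare -a NAME=(...)" at the start of a line.
# Lines commented out with "#" do not match.
ASSIGNMENT = re.compile(r"^\s*(?:declare\s+-a\s+)?([A-Z_]+)=")


def missing_parameters(config_text):
    """Return required parameter names that have no uncommented assignment."""
    assigned = set()
    for line in config_text.splitlines():
        match = ASSIGNMENT.match(line)
        if match:
            assigned.add(match.group(1))
    return [name for name in REQUIRED_PARAMETERS if name not in assigned]


if __name__ == "__main__":
    sample = "ARRAY_NAME=\"EMC Greenplum DW\"\nSEG_PREFIX=gpseg\n#PORT_BASE=40000\n"
    print(missing_parameters(sample))
```

<p>A misspelled name, such as one containing a space instead of an underscore, is reported as missing because it does not match the expected parameter name.</p>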
</body>
</topic>
<topic id="topic6" xml:lang="en">
<title id="jm142333">Running the Initialization Utility</title>
<body>
<p>The <codeph><xref href="../utility_guide/admin_utilities/gpinitsystem.xml" type="topic"
format="dita" scope="peer">gpinitsystem</xref></codeph> utility will create a
Greenplum Database system using the values defined in the configuration file.</p>
<section id="jm138821">
<title>To run the initialization utility</title>
<ol>
<li id="jm143533">Run the following command referencing the path and file name of your
initialization configuration file (<codeph>gpinitsystem_config</codeph>) and host file
(<codeph>hostfile_gpinitsystem</codeph>). For
example:<codeblock>$ cd ~
$ gpinitsystem -c gpconfigs/gpinitsystem_config -h gpconfigs/hostfile_gpinitsystem</codeblock><p>For
a fully redundant system (with a standby master and a <i>spread</i> mirror
configuration) include the <codeph>-s</codeph> and <codeph>-S</codeph> options. For
example:</p><codeblock>$ gpinitsystem -c gpconfigs/gpinitsystem_config -h gpconfigs/hostfile_gpinitsystem \
-s <varname>standby_master_hostname</varname> -S</codeblock></li>
<li id="jm138853">The utility will verify your setup information and make sure it can
connect to each host and access the data directories specified in your configuration.
If all of the pre-checks are successful, the utility will prompt you to confirm your
configuration. For
example:<codeblock>=&gt; Continue with Greenplum creation? <b>Yy/Nn</b></codeblock></li>
<li id="jm143919">Press <codeph>y</codeph> to start the initialization.</li>
<li id="jm141337">The utility will then begin setup and initialization of the master
instance and each segment instance in the system. Each segment instance is set up in
parallel. Depending on the number of segments, this process can take a while.</li>
<li id="jm141341">At the end of a successful setup, the utility will start your
Greenplum Database system. You should
see:<codeblock><varname>=&gt; Greenplum Database instance successfully created.</varname></codeblock></li>
</ol>
</section>
</body>
<topic id="topic7" xml:lang="en">
<title id="jm138878">Troubleshooting Initialization Problems</title>
<body>
<p>If the utility encounters any errors while setting up an instance, the entire process
will fail, and could possibly leave you with a partially created system. Refer to the
error messages and logs to determine the cause of the failure and where in the process
the failure occurred. Log files are created in <codeph>~/gpAdminLogs</codeph>.</p>
<p>Depending on when the error occurred in the process, you may need to clean up and then
try the gpinitsystem utility again. For example, if some segment instances were created
and some failed, you may need to stop <codeph>postgres</codeph> processes and remove any
utility-created data directories from your data storage area(s). A backout script is
created to help with this cleanup if necessary.</p>
<section id="jm139087">
<title>Using the Backout Script</title>
<p>If the gpinitsystem utility fails, it will create the following backout script if it
has left your system in a partially installed state:</p>
<p>
<codeph>~/gpAdminLogs/backout_gpinitsystem_<varname>&lt;user&gt;</varname>_<varname>&lt;timestamp&gt;</varname></codeph>
</p>
<p>You can use this script to clean up a partially created Greenplum Database system.
This backout script will remove any utility-created data directories,
<codeph>postgres</codeph> processes, and log files. After correcting the error that
caused gpinitsystem to fail and running the backout script, you should be ready to
retry initializing your Greenplum Database array.</p>
<p>The following example shows how to run the backout script:</p>
<codeblock>$ sh backout_gpinitsystem_gpadmin_20071031_121053</codeblock>
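<p>If you do not know the exact file name, a sketch along the following lines can locate the
most recent backout script. It assumes the default <codeph>~/gpAdminLogs</codeph> location
described above. Review the script before you run it, because it removes data directories.</p>
<codeblock>$ latest=$(ls -t ~/gpAdminLogs/backout_gpinitsystem_* 2&gt;/dev/null | head -n 1)
$ echo "$latest"
$ [ -n "$latest" ] &amp;&amp; sh "$latest"</codeblock>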
</section>
</body>
</topic>
</topic>
</topic>
<topic id="topic8" xml:lang="en">
<title id="jm144793">Setting Greenplum Environment Variables</title>
<body>
<p>You must configure your environment on the Greenplum Database master (and standby master).
A <codeph>greenplum_path.sh</codeph> file is provided in your <codeph>$GPHOME</codeph>
directory with environment variable settings for Greenplum Database. You can source this
file in the <codeph>gpadmin</codeph> user's startup shell profile (such as
<codeph>.bashrc</codeph>). </p>
<p>The Greenplum Database management utilities also require that the
<codeph>MASTER_DATA_DIRECTORY</codeph> environment variable be set. This should point to
the directory created by the gpinitsystem utility in the master data directory location. </p>
<note>The <codeph>greenplum_path.sh</codeph> script changes the operating environment in order
to support running the Greenplum Database-specific utilities. These same changes to the
environment can negatively affect the operation of other system-level utilities, such as
<codeph>ps</codeph> or <codeph>yum</codeph>. Use separate accounts for performing system
administration and database administration, instead of attempting to perform both functions
as <codeph>gpadmin</codeph>.</note>
<section id="jm144961">
<title>To set up your user environment for Greenplum</title>
<ol>
<li id="jm144981">Make sure you are logged in as
<codeph>gpadmin</codeph>:<codeblock>$ su - gpadmin</codeblock></li>
<li id="jm145055">Open your profile file (such as <codeph>.bashrc</codeph>) in a text
editor. For example:<codeblock>$ vi ~/.bashrc</codeblock></li>
<li id="jm145072">Add lines to this file to source the <codeph>greenplum_path.sh</codeph>
file and set the <codeph>MASTER_DATA_DIRECTORY</codeph> environment variable. For
example:<codeblock>source /usr/local/greenplum-db/greenplum_path.sh
export MASTER_DATA_DIRECTORY=/data/master/gpseg-1</codeblock></li>
<li id="jm145148">(Optional) You may also want to set some client session environment
variables such as <codeph>PGPORT</codeph>, <codeph>PGUSER</codeph> and
<codeph>PGDATABASE</codeph> for convenience. For
example:<codeblock>export PGPORT=5432
export PGUSER=gpadmin
export PGDATABASE=<varname>default_login_database_name</varname></codeblock></li>
<li>(Optional) If you use RHEL 7 or CentOS 7, add the following line to the end of the
<codeph>.bashrc</codeph> file to enable using the <codeph>ps</codeph> command in the
<codeph>greenplum_path.sh</codeph>
environment:<codeblock>export LD_PRELOAD=/lib64/libz.so.1 ps</codeblock></li>
<li id="jm145208">Save and close the file.</li>
<li id="jm145155">After editing the profile file, source it to make the changes active.
For example:<codeblock>$ source ~/.bashrc
</codeblock></li>
<li id="jm145342">If you have a standby master host, copy your environment file to the
standby master as well. For
example:<codeblock>$ cd ~
$ scp .bashrc <varname>standby_hostname</varname>:`pwd`</codeblock></li>
</ol>
<note type="note">The <codeph>.bashrc</codeph> file should not produce any output. If you
wish to have a message display to users upon logging in, use the <codeph>.profile</codeph>
file instead. </note>
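<p>After sourcing the profile, you can confirm that the variables are set. The following is a
minimal sketch for a <codeph>bash</codeph> session. The messages are illustrative, and it
assumes that <codeph>greenplum_path.sh</codeph> sets <codeph>GPHOME</codeph>.</p>
<codeblock>$ for name in GPHOME MASTER_DATA_DIRECTORY; do
    # ${!name} is bash indirect expansion: the value of the variable named by $name
    if [ -z "${!name}" ]; then echo "$name is not set"; else echo "$name=${!name}"; fi
  done
$ [ -d "$MASTER_DATA_DIRECTORY" ] || echo "master data directory not found on this host"</codeblock>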
</section>
</body>
</topic>
<topic id="topic9" xml:lang="en">
<title id="jm138979">Next Steps</title>
<body>
<p>After your system is up and running, the next steps are:</p>
<ul>
<li id="jm139449">
<xref href="#topic10" type="topic" format="dita"/>
</li>
<li id="jm142387">
<xref href="#topic11" type="topic" format="dita"/>
</li>
</ul>
</body>
<topic id="topic10" xml:lang="en">
<title id="jm138989">Allowing Client Connections</title>
<body>
<p>When a Greenplum Database system is first initialized, it allows only local connections to
the database from the <codeph>gpadmin</codeph> role (or whichever system user ran
<codeph>gpinitsystem</codeph>). To let other users or client machines connect to Greenplum
Database, you must give them access. See the <i>Greenplum
Database Administrator Guide</i> for more information.</p>
</body>
</topic>
<topic id="topic11" xml:lang="en">
<title id="jm139424">Creating Databases and Loading Data</title>
<body>
<p>After verifying your installation, you may want to begin creating databases and loading
data. See <xref href="../admin_guide/ddl/ddl.xml" format="dita" type="topic" scope="peer"
>Defining Database Objects</xref> and <xref
href="../admin_guide/load/topics/g-loading-and-unloading-data.xml" type="topic"
format="dita" scope="peer">Loading and Unloading Data</xref> in the <i>Greenplum
Database Administrator Guide</i> for more information about creating databases, schemas,
tables, and other database objects in Greenplum Database and loading your data.</p>
</body>
</topic>
</topic>
</topic>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE map PUBLIC "-//OASIS//DTD DITA Map//EN" "map.dtd">
<map>
<topicref href="../../600/homenav.html" scope="external" navtitle="Greenplum Database® 6.0 Documentation"
format="html" otherprops="oss-only"/>
<topicref href="../../600/homenav.html" scope="external" navtitle="Pivotal Greenplum® 6.0 Documentation"
format="html" otherprops="pivotal"/>
<topicref href="install_guide.xml" navtitle="Installation Guide">
<topicref href="preinstall_concepts.xml">
<topicref href="preinstall_concepts.xml#topic2"/>
<topicref href="preinstall_concepts.xml#topic4"/>
<topicref href="preinstall_concepts.xml#topic9"/>
<topicref href="preinstall_concepts.xml#topic13"/>
<topicref href="preinstall_concepts.xml#topic_ln5_xhm_kbb"/>
</topicref>
<topicref href="capacity_planning.xml" navtitle="Estimating Storage Capacity">
<topicref href="capacity_planning.xml#topic2" navtitle="Calculating Usable Disk
Capacity"/>
<topicref href="capacity_planning.xml#topic3" navtitle="Calculating User Data Size"/>
<topicref href="capacity_planning.xml#topic4" navtitle="Calculating Space Requirements
for Metadata and Logs"/>
</topicref>
<topicref href="prep_os_install_gpdb.xml" navtitle="Configuring Your Systems and
Installing Greenplum">
<topicref href="prep_os_install_gpdb.xml#topic2" navtitle="System Requirements"/>
<topicref href="prep_os_install_gpdb.xml#topic3" navtitle="Setting the Greenplum Recommended OS
Parameters"/>
<topicref href="prep_os_install_gpdb.xml#topic_ylh_b53_c1b"/>
<topicref href="prep_os_install_gpdb.xml#topic8" navtitle="Installing and Configuring
Greenplum on all Hosts"/>
<topicref href="prep_os_install_gpdb.xml#topic11" navtitle="Installing Oracle
Compatibility Functions"/>
<topicref href="prep_os_install_gpdb.xml#topic_sqb_bsw_2z"/>
<topicref href="prep_os_install_gpdb.xml#topic12"
navtitle="Installing Greenplum
Database Extensions"
otherprops="pivotal"/>
<topicref href="prep_os_install_gpdb.xml#topic13" navtitle="Creating the Data Storage
Areas">
<topicref href="prep_os_install_gpdb.xml#topic_wqb_1lc_wp"
navtitle="Creating the Data Storage Areas - Master"/>
<topicref href="prep_os_install_gpdb.xml#topic_pgz_qkc_wp"
navtitle="Creating the Data Storage Areas - Segments"/>
</topicref>
<topicref href="prep_os_install_gpdb.xml#topic_qst_s5t_wy" navtitle="Synchronizing System Clocks"/>
<topicref href="prep_os_install_gpdb.xml#topic15" navtitle="Enabling iptables"/>
<topicref href="prep_os_install_gpdb.xml#ec2_config" navtitle="Amazon EC2 Configuration"/>
<topicref href="prep_os_install_gpdb.xml#topic19" navtitle="Next Steps"/>
</topicref>
<topicref href="data_sci_pkgs.ditamap" format="ditamap" otherprops="pivotal" />
<topicref href="validate.xml" navtitle="Validating Your Systems">
<topicref href="validate.xml#topic2" navtitle="Validating Your Systems"/>
<topicref href="validate.xml#topic3" navtitle="Validating Your Systems">
<topicref href="validate.xml#topic4" navtitle="Validating Your Systems"/>
</topicref>
<topicref href="validate.xml#topic5" navtitle="Validating Your Systems"/>
</topicref>
<topicref href="localization.xml" navtitle="Configuring Localization Settings"/>
<topicref href="init_gpdb.xml" navtitle="Initializing a Greenplum Database System"/>
<topicref href="apx_mgmt_utils.xml" navtitle="Installation Management Utilities"/>
<topicref href="env_var_ref.xml" navtitle="Greenplum Environment Variables">
<topicref href="env_var_ref.xml#topic2" navtitle="Required Environment Variables"/>
<topicref href="env_var_ref.xml#topic7" navtitle="Optional Environment Variables"/>
</topicref>
</topicref>
</map>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">
<topic id="topic_gls_1nf_kp">
<title>Greenplum Database Installation Guide</title>
<!--HTML-only page-->
<titlealts>
<navtitle>Installation Guide</navtitle>
</titlealts>
<shortdesc>Information about installing and configuring Greenplum Database software and
configuring Greenplum Database host machines.</shortdesc>
</topic>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic
PUBLIC "-//OASIS//DTD DITA Composite//EN" "ditabase.dtd">
<topic id="topic1" xml:lang="en">
<title id="pw216155">Python Data Science Module Package</title>
<body>
<p>Greenplum Database provides a collection of data science-related Python modules that can be
used with the Greenplum Database PL/Python language. You can download these modules in
<codeph>.gppkg</codeph> format from <xref
href="https://network.pivotal.io/products/pivotal-gpdb" format="html" scope="external"
>Pivotal Network</xref>.</p>
<p>This section contains the following information:</p>
<ul>
<li id="pw22228177">
<xref href="#topic_pydatascimod" type="topic" format="dita"/>
</li>
<li id="pw22228178">
<xref href="#topic_instpdsm" type="topic" format="dita"/>
</li>
<li id="pw22228179">
<xref href="#topic_removepdsm" type="topic" format="dita"/>
</li>
</ul>
<p>For information about the Greenplum Database PL/Python Language, see <xref scope="peer"
type="topic" format="dita" href="../ref_guide/extensions/pl_python.xml#topic1">Greenplum
PL/Python Language Extension</xref>.</p>
</body>
<topic id="topic_pydatascimod">
<title>Python Data Science Modules</title>
<body>
<p>Modules provided in the Python Data Science package include: <table id="iq1395577">
<title>Data Science Modules</title>
<tgroup cols="2">
<colspec colnum="1" colname="col1" colwidth="1*"/>
<colspec colnum="2" colname="col2" colwidth="2*"/>
<thead>
<row>
<entry colname="col1">Module Name</entry>
<entry colname="col2">Description/Used For</entry>
</row>
</thead>
<tbody>
<row>
<entry colname="col1"> Beautiful Soup </entry>
<entry colname="col2">Navigating HTML and XML</entry>
</row>
<row>
<entry colname="col1"> Gensim </entry>
<entry colname="col2">Topic modeling and document indexing</entry>
</row>
<row>
<entry colname="col1"> Keras (RHEL/CentOS 7 only) </entry>
<entry colname="col2">Deep learning</entry>
</row>
<row>
<entry colname="col1"> Lifelines </entry>
<entry colname="col2">Survival analysis</entry>
</row>
<row>
<entry colname="col1"> lxml </entry>
<entry colname="col2">XML and HTML processing</entry>
</row>
<row>
<entry colname="col1"> NLTK </entry>
<entry colname="col2">Natural language toolkit</entry>
</row>
<row>
<entry colname="col1"> NumPy </entry>
<entry colname="col2">Scientific computing</entry>
</row>
<row>
<entry colname="col1"> Pandas </entry>
<entry colname="col2">Data analysis</entry>
</row>
<row>
<entry colname="col1"> Pattern-en </entry>
<entry colname="col2">Part-of-speech tagging</entry>
</row>
<row>
<entry colname="col1"> pyLDAvis </entry>
<entry colname="col2">Interactive topic model visualization</entry>
</row>
<row>
<entry colname="col1"> PyMC3 </entry>
<entry colname="col2">Statistical modeling and probabilistic machine
learning</entry>
</row>
<row>
<entry colname="col1"> scikit-learn </entry>
<entry colname="col2">Machine learning, data mining, and analysis</entry>
</row>
<row>
<entry colname="col1"> SciPy </entry>
<entry colname="col2">Scientific computing</entry>
</row>
<row>
<entry colname="col1"> spaCy </entry>
<entry colname="col2">Large scale natural language processing</entry>
</row>
<row>
<entry colname="col1"> StatsModels </entry>
<entry colname="col2">Statistical modeling</entry>
</row>
<row>
<entry colname="col1"> TensorFlow (RHEL/CentOS 7 only) </entry>
<entry colname="col2">Numerical computation using data flow graphs</entry>
</row>
<row>
<entry colname="col1"> XGBoost </entry>
<entry colname="col2">Gradient boosting, classifying, ranking </entry>
</row>
</tbody>
</tgroup>
</table></p>
</body>
</topic>
<topic id="topic_instpdsm" xml:lang="en">
<title>Installing the Python Data Science Module Package</title>
<body>
<p>Before you install the Python Data Science Module package, make sure that your Greenplum
Database is running, you have sourced <codeph>greenplum_path.sh</codeph>, and that the
<codeph>$MASTER_DATA_DIRECTORY</codeph> and <codeph>$GPHOME</codeph> environment variables
are set.</p>
<note>The <codeph>PyMC3</codeph> module depends on <codeph>Tk</codeph>. If you want to use
<codeph>PyMC3</codeph>, you must install the <codeph>tk</codeph> OS package on every node
in your cluster. For example: <codeblock>$ yum install tk
</codeblock></note>
<ol>
<li>Locate the Python Data Science module package that you built or downloaded.<p>The file
name format of the package is
<codeph>DataSciencePython-&lt;version&gt;-rhel&lt;N&gt;-x86_64.gppkg</codeph>.</p></li>
<li>Copy the package to the Greenplum Database master host.</li>
<li>Use the <codeph>gppkg</codeph> command to install the package. For
example:<codeblock>$ gppkg -i DataSciencePython-&lt;version&gt;-rhel&lt;N&gt;-x86_64.gppkg</codeblock><p><codeph>gppkg</codeph>
installs the Python Data Science modules on all nodes in your Greenplum Database
cluster. The command also updates the <codeph>PYTHONPATH</codeph>,
<codeph>PATH</codeph>, and <codeph>LD_LIBRARY_PATH</codeph> environment variables in
your <codeph>greenplum_path.sh</codeph> file.</p></li>
<li>Restart Greenplum Database. You must re-source <codeph>greenplum_path.sh</codeph> before
restarting your Greenplum
cluster:<codeblock>$ source /usr/local/greenplum-db/greenplum_path.sh
$ gpstop -r</codeblock></li>
</ol>
<p>The Greenplum Database Python Data Science Modules are installed in the following
directory:</p>
<codeblock>$GPHOME/ext/DataSciencePython/lib/python2.7/site-packages/</codeblock>
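<p>To confirm that the modules are importable, you can try a quick check from the shell. This
is a sketch that assumes the <codeph>python</codeph> found on your <codeph>PATH</codeph> after
sourcing <codeph>greenplum_path.sh</codeph> is the Python that Greenplum Database uses.
<codeph>numpy</codeph> and <codeph>pandas</codeph> are the import names of the NumPy and
Pandas modules listed above.</p>
<codeblock>$ source /usr/local/greenplum-db/greenplum_path.sh
$ python -c 'import numpy, pandas; print(numpy.__version__)'</codeblock>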
</body>
</topic>
<topic id="topic_removepdsm" xml:lang="en">
<title>Uninstalling the Python Data Science Module Package</title>
<body>
<p>Use the <codeph>gppkg</codeph> utility to uninstall the Python Data Science Module package.
You must include the version number in the package name you provide to
<codeph>gppkg</codeph>.</p>
<p> To determine your Python Data Science Module package version number and remove this
package:</p>
<codeblock>$ gppkg -q --all | grep DataSciencePython
DataSciencePython-&lt;version&gt;
$ gppkg -r DataSciencePython-&lt;version&gt;</codeblock>
<p>The command removes the Python Data Science modules from your Greenplum Database cluster.
It also updates the <codeph>PYTHONPATH</codeph>, <codeph>PATH</codeph>, and
<codeph>LD_LIBRARY_PATH</codeph> environment variables in your
<codeph>greenplum_path.sh</codeph> file to their pre-installation values.</p>
<p>Re-source <codeph>greenplum_path.sh</codeph> and restart Greenplum Database after you
remove the Python Data Science Module package:</p>
<codeblock>$ . /usr/local/greenplum-db/greenplum_path.sh
$ gpstop -r </codeblock>
<note>When you uninstall the Python Data Science Module package from your Greenplum Database
cluster, any UDFs that you have created that import Python modules installed with this
package will return an error.</note>
</body>
</topic>
</topic>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic
PUBLIC "-//OASIS//DTD DITA Composite//EN" "ditabase.dtd">
<topic id="topic1" xml:lang="en">
<title id="py212122">R Data Science Library Package</title>
<body>
<p> R packages are modules that contain R functions and data sets. Greenplum Database provides a
collection of data science-related R libraries that can be used with the Greenplum Database
PL/R language. You can download these libraries in <codeph>.gppkg</codeph> format from <xref
href="https://network.pivotal.io/products/pivotal-gpdb" format="html" scope="external"
>Pivotal Network</xref>.</p>
<p>This chapter contains the following information:</p>
<ul>
<li id="py2177">
<xref href="#topic2" type="topic" format="dita"/>
</li>
<li id="py21366577">
<xref href="#topic_instpdsl" type="topic" format="dita"/>
</li>
<li id="py217165">
<xref href="#topic_removepdsl" type="topic" format="dita"/>
</li>
</ul>
<p>For information about the Greenplum Database PL/R Language, see <xref scope="peer"
type="topic" format="dita" href="../ref_guide/extensions/pl_r.xml#topic1">Greenplum PL/R
Language Extension</xref>.</p>
</body>
<topic xml:lang="en" id="topic2">
<title>R Data Science Libraries</title>
<body>
<p> Libraries provided in the R Data Science package include: <simpletable id="l33">
<strow>
<stentry>
<p>abind</p>
<p>adabag</p>
<p>arm</p>
<p>assertthat</p>
<p>BH</p>
<p>bitops</p>
<p>car</p>
<p>caret</p>
<p>caTools</p>
<p>coda</p>
<p>colorspace</p>
<p>compHclust</p>
<p>curl</p>
<p>data.table</p>
<p>DBI</p>
<p>dichromat</p>
<p>digest</p>
<p>dplyr</p>
<p>e1071</p>
<p>flashClust</p>
<p>forecast</p>
<p>foreign</p>
<p>gdata</p>
<p>ggplot2</p>
</stentry>
<stentry>
<p>glmnet</p>
<p>gplots</p>
<p>gtable</p>
<p>gtools</p>
<p>hms</p>
<p>hybridHclust</p>
<p>igraph</p>
<p>labeling</p>
<p>lattice</p>
<p>lazyeval</p>
<p>lme4</p>
<p>lmtest</p>
<p>magrittr</p>
<p>MASS</p>
<p>Matrix</p>
<p>MCMCpack</p>
<p>minqa</p>
<p>MTS</p>
<p>munsell</p>
<p>neuralnet</p>
<p>nloptr</p>
<p>nnet</p>
<p>pbkrtest</p>
<p>plyr</p>
</stentry>
<stentry>
<p>quantreg</p>
<p>R2jags</p>
<p>R6</p>
<p>randomForest</p>
<p>RColorBrewer</p>
<p>Rcpp</p>
<p>RcppEigen</p>
<p>readr</p>
<p>reshape2</p>
<p>rjags</p>
<p>RobustRankAggreg</p>
<p>ROCR</p>
<p>rpart</p>
<p>RPostgreSQL</p>
<p>sandwich</p>
<p>scales</p>
<p>SparseM</p>
<p>stringi</p>
<p>stringr</p>
<p>survival</p>
<p>tibble</p>
<p>tseries</p>
<p>zoo</p>
</stentry>
</strow>
</simpletable></p>
</body>
</topic>
<topic id="topic_instpdsl" xml:lang="en">
<title>Installing the R Data Science Library Package</title>
<body>
<p>Before you install the R Data Science Library package, make sure that your Greenplum
Database is running, you have sourced <codeph>greenplum_path.sh</codeph>, and that the
<codeph>$MASTER_DATA_DIRECTORY</codeph> and <codeph>$GPHOME</codeph> environment variables
are set.</p>
<ol>
<li>Locate the R Data Science library package that you built or downloaded.<p>The file name
format of the package is
<codeph>DataScienceR-&lt;version&gt;-rhel&lt;N&gt;-x86_64.gppkg</codeph>.</p></li>
<li>Copy the package to the Greenplum Database master host.</li>
<li>Use the <codeph>gppkg</codeph> command to install the package. For
example:<codeblock>$ gppkg -i DataScienceR-&lt;version&gt;-rhel&lt;N&gt;-x86_64.gppkg</codeblock><p><codeph>gppkg</codeph>
installs the R Data Science libraries on all nodes in your Greenplum Database cluster.
The command also sets the <codeph>R_LIBS_USER</codeph> environment variable and updates
the <codeph>PATH</codeph> and <codeph>LD_LIBRARY_PATH</codeph> environment variables in
your <codeph>greenplum_path.sh</codeph> file.</p></li>
<li>Restart Greenplum Database. You must re-source <codeph>greenplum_path.sh</codeph> before
restarting your Greenplum
cluster:<codeblock>$ source /usr/local/greenplum-db/greenplum_path.sh
$ gpstop -r</codeblock></li>
</ol>
<p>The Greenplum Database R Data Science libraries are installed in the following
directory:<codeblock>$GPHOME/ext/DataScienceR/library</codeblock></p>
<note><codeph>rjags</codeph> libraries are installed in the
<codeph>$GPHOME/ext/DataScienceR/extlib/lib</codeph> directory. If you want to use
<codeph>rjags</codeph> and your <codeph>$GPHOME</codeph> is not
<codeph>/usr/local/greenplum-db</codeph>, you must perform additional configuration steps
to create a symbolic link from <codeph>$GPHOME</codeph> to
<codeph>/usr/local/greenplum-db</codeph> on each node in your Greenplum Database cluster.
For example:
<codeblock>$ gpssh -f all_hosts -e 'ln -s $GPHOME /usr/local/greenplum-db'
$ gpssh -f all_hosts -e 'chown -h gpadmin /usr/local/greenplum-db'
</codeblock></note>
</body>
</topic>
<topic id="topic_removepdsl" xml:lang="en">
<title>Uninstalling the R Data Science Library Package</title>
<body>
<p>Use the <codeph>gppkg</codeph> utility to uninstall the R Data Science Library package. You
must include the version number in the package name you provide to
<codeph>gppkg</codeph>.</p>
<p> To determine your R Data Science Library package version number and remove this
package:</p>
<codeblock>$ gppkg -q --all | grep DataScienceR
DataScienceR-&lt;version&gt;
$ gppkg -r DataScienceR-&lt;version&gt;</codeblock>
<p>The command removes the R Data Science libraries from your Greenplum Database cluster. It
also removes the <codeph>R_LIBS_USER</codeph> environment variable and updates the
<codeph>PATH</codeph> and <codeph>LD_LIBRARY_PATH</codeph> environment variables in your
<codeph>greenplum_path.sh</codeph> file to their pre-installation values.</p>
<p>Re-source <codeph>greenplum_path.sh</codeph> and restart Greenplum Database after you
remove the R Data Science Library package:</p>
<codeblock>$ . /usr/local/greenplum-db/greenplum_path.sh
$ gpstop -r </codeblock>
<note>When you uninstall the R Data Science Library package from your Greenplum Database
cluster, any UDFs that you have created that use R libraries installed with this package
will return an error.</note>
</body>
</topic>
</topic>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic
PUBLIC "-//OASIS//DTD DITA Composite//EN" "ditabase.dtd">
<topic id="topic1" xml:lang="en">
<title id="jf110126">Preface</title>
<body>
<p>This guide describes the tasks you must complete to install and start your Greenplum Database
system. </p>
<ul>
<li id="jf165405">
<xref href="#topic2" type="topic" format="dita"/>
</li>
<li id="jf165409">
<xref href="#topic4" type="topic" format="dita"/>
</li>
<li id="jf135184">
<xref href="#topic7" type="topic" format="dita"/>
</li>
</ul>
</body>
<topic id="topic2" xml:lang="en">
<title id="jf132855">About This Guide</title>
<body>
<p>This guide provides information and instructions for installing and initializing a
Greenplum Database system. This guide is intended for system administrators responsible for
building a Greenplum Database system. </p>
<p>This guide assumes knowledge of Linux/Unix system administration, database management
systems, database administration, and structured query language (SQL).</p>
<p>This guide contains the following chapters and appendices:</p>
<ul>
<li id="jf158523"><xref href="preinstall_concepts.xml#topic1" type="topic" format="dita"/>:
Information about the Greenplum system architecture and components.</li>
<li id="jf158530"><xref href="capacity_planning.xml#topic1" type="topic" format="dita"/>:
Guidelines for sizing a Greenplum Database system.</li>
<li id="jf158662"><xref href="prep_os_install_gpdb.xml#topic1" type="topic" format="dita"/>:
Instructions for installing and configuring the Greenplum software on all hosts in your
Greenplum Database array.</li>
<li id="jf158540"><xref href="validate.xml#topic1" type="topic" format="dita"/>: Validation
utilities and tests you can perform to ensure your Greenplum Database system will operate
properly.</li>
<li id="jf158550"><xref href="localization.xml#topic1" type="topic" format="dita"/>:
Localization features of Greenplum Database. Locale settings must be configured prior to
initializing your Greenplum Database system.</li>
<li id="jf148595"><xref href="init_gpdb.xml#topic1" type="topic" format="dita"/>:
Instructions for initializing a Greenplum Database system. Each database instance (the
master and all segments) must be initialized across all of the hosts in the system in such
a way that they can all work together as a unified DBMS. </li>
<li id="jf159937"><xref href="apx_mgmt_utils.xml#topic1" type="topic" format="dita"/>:
Reference information about the command-line management utilities you use to install and
initialize a Greenplum Database system.</li>
<li id="jf167820"><xref href="env_var_ref.xml#topic1" type="topic"
format="dita"/>: Reference information about Greenplum environment variables you can
set in your system user's profile file.</li>
</ul>
</body>
</topic>
<topic id="topic3" xml:lang="en">
<title>About the Greenplum Database Documentation Set</title>
<body>
<p>The Greenplum Database 4.3 server documentation set consists of the following guides.</p>
<table id="jf168868">
<title>Greenplum Database server documentation set</title>
<tgroup cols="2">
<colspec colnum="1" colname="col1" colwidth="130pt"/>
<colspec colnum="2" colname="col2" colwidth="243pt"/>
<thead>
<row>
<entry colname="col1">Guide Name</entry>
<entry colname="col2">Description</entry>
</row>
</thead>
<tbody>
<row>
<entry colname="col1">
<i>Greenplum</i>
<i>Database Administrator Guide</i>
</entry>
<entry colname="col2">Information for administering the Greenplum Database system and
managing databases. It covers topics such as Greenplum Database architecture and
concepts and everyday system administration tasks such as configuring the server,
monitoring system activity, enabling high-availability, backing up and restoring
databases, and expanding the system. Database administration topics include
configuring access control, creating databases and database objects, loading data
into databases, writing queries, managing workloads, and monitoring and
troubleshooting performance.</entry>
</row>
<row>
<entry colname="col1">
<i>Greenplum</i>
<i>Database Reference Guide</i>
</entry>
<entry colname="col2">Reference information for Greenplum Database systems: SQL
commands, system catalogs, environment variables, character set support, data types,
the Greenplum MapReduce specification, PostGIS extension, server parameters, the
gp_toolkit administrative schema, and SQL 2008 support.</entry>
</row>
<row>
<entry colname="col1">
<i>Greenplum Database Utility Guide</i>
</entry>
<entry colname="col2">Reference information for command-line utilities, client
programs, and Oracle compatibility functions.</entry>
</row>
<row>
<entry colname="col1">
<i>Greenplum Database Installation Guide</i>
</entry>
<entry colname="col2">Information and instructions for installing and initializing a
Greenplum Database system.</entry>
</row>
</tbody>
</tgroup>
</table>
</body>
</topic>
<topic id="topic4" xml:lang="en">
<title id="jf165456">Document Conventions</title>
<body>
<p>Greenplum documentation adheres to the following conventions to help you identify certain
types of information.</p>
<ul>
<li id="jf165467" otherprops="op-hidden">
<xref href="#topic5" type="topic" format="dita"/>
</li>
<li id="jf165471">
<xref href="#topic6" format="dita"/>
</li>
</ul>
</body>
<topic id="topic5" xml:lang="en" otherprops="op-hidden">
<title id="jf165544">Text Conventions</title>
<body>
<table id="jf165473">
<title>Text Conventions</title>
<tgroup cols="3">
<colspec colnum="1" colname="col1" colwidth="110pt"/>
<colspec colnum="2" colname="col2" colwidth="165pt"/>
<colspec colnum="3" colname="col3" colwidth="174pt"/>
<thead>
<row>
<entry colname="col1">Text Convention</entry>
<entry colname="col2">Usage</entry>
<entry colname="col3">Examples</entry>
</row>
</thead>
<tbody>
<row>
<entry colname="col1">
<b>bold</b>
</entry>
<entry colname="col2">Button, menu, tab, page, and field names in GUI
applications</entry>
<entry colname="col3">Click <b>Cancel</b> to exit the page without saving your
changes.</entry>
</row>
<row>
<entry colname="col1">
<i>italics</i>
</entry>
<entry colname="col2">New terms where they are defined<p>Database objects, such as
schema, table, or column names</p></entry>
<entry colname="col3">The <i>master instance</i> is the <codeph>postgres</codeph>
process that accepts client connections.<p>Catalog information for Greenplum
Database resides in the <i>pg_catalog</i> schema.</p></entry>
</row>
<row>
<entry colname="col1">
<codeph>monospace</codeph>
</entry>
<entry colname="col2">File names and path names<p>Programs and
executables</p><p>Command names and syntax</p><p>Parameter names</p></entry>
<entry colname="col3">Edit the <codeph>postgresql.conf</codeph> file.<p>Use
<codeph>gpstart</codeph> to start Greenplum Database.</p></entry>
</row>
<row>
<entry colname="col1">
<varname>monospace italics</varname>
</entry>
<entry colname="col2">Variable information within file paths and file
names<p>Variable information within command syntax</p></entry>
<entry colname="col3">
<codeph>/home/gpadmin/</codeph>
<varname>config_file</varname>
<p>
<codeph>COPY</codeph>
<varname>tablename</varname>
<codeph>FROM '</codeph>
<varname>filename</varname>
<codeph>'</codeph>
</p>
</entry>
</row>
<row>
<entry colname="col1">
<b>monospace bold</b>
</entry>
<entry colname="col2">Used to call attention to a particular part of a command,
parameter, or code snippet.</entry>
<entry colname="col3">Change the host name, port, and database name in the JDBC
connection
URL:<p><codeph>jdbc:postgresql://<b>host</b>:<b>5432</b>/<b>mydb</b></codeph></p></entry>
</row>
<row>
<entry colname="col1">
<codeph>UPPERCASE</codeph>
</entry>
<entry colname="col2">Environment variables<p>SQL commands</p><p>Keyboard
keys</p></entry>
<entry colname="col3">Make sure that the Java <codeph>/bin</codeph> directory is in
your <codeph>$PATH</codeph>. <p><codeph>SELECT * FROM</codeph>
<varname>my_table</varname><codeph>;</codeph></p><p>Press
<codeph>CTRL+C</codeph> to escape.</p></entry>
</row>
</tbody>
</tgroup>
</table>
</body>
</topic>
<topic id="topic6" xml:lang="en">
<title id="jf165598">Command Syntax Conventions</title>
<body>
<table id="jf165546">
<title>Command Syntax Conventions</title>
<tgroup cols="3">
<colspec colnum="1" colname="col1" colwidth="124pt"/>
<colspec colnum="2" colname="col2" colwidth="162pt"/>
<colspec colnum="3" colname="col3" colwidth="162pt"/>
<thead>
<row>
<entry colname="col1">Text Convention</entry>
<entry colname="col2">Usage</entry>
<entry colname="col3">Examples</entry>
</row>
</thead>
<tbody>
<row>
<entry colname="col1">
<codeph>{ }</codeph>
</entry>
<entry colname="col2">Within command syntax, curly braces group related command
options. Do not type the curly braces.</entry>
<entry colname="col3">
<codeph>FROM</codeph>
<b>{</b>
<codeph>'</codeph>
<varname>filename</varname>
<codeph>' | STDIN</codeph>
<b>}</b>
</entry>
</row>
<row>
<entry colname="col1">
<codeph>[ ]</codeph>
</entry>
<entry colname="col2">Within command syntax, square brackets denote optional
arguments. Do not type the brackets.</entry>
<entry colname="col3">
<codeph>TRUNCATE</codeph>
<b>[</b>
<codeph>TABLE</codeph>
<b>]</b>
<varname>name</varname>
</entry>
</row>
<row>
<entry colname="col1">
<codeph>...</codeph>
</entry>
<entry colname="col2">Within command syntax, an ellipsis denotes repetition of a
command, variable, or option. Do not type the ellipsis.</entry>
<entry colname="col3">
<codeph>DROP TABLE </codeph>
<varname>name</varname>
<codeph>[,</codeph>
<b>...</b>
<codeph>]</codeph>
</entry>
</row>
<row>
<entry colname="col1">
<codeph>|</codeph>
</entry>
<entry colname="col2">Within command syntax, the pipe symbol denotes an "OR"
relationship. Do not type the pipe symbol.</entry>
<entry colname="col3">
<codeph>VACUUM [ FULL</codeph>
<b>|</b>
<codeph>FREEZE ]</codeph>
</entry>
</row>
<row>
<entry colname="col1"><codeph>$</codeph> system_command<p><codeph>#</codeph>
root_system_command</p><p><codeph>=&gt;</codeph>
gpdb_command</p><p><codeph>=#</codeph> su_gpdb_command</p></entry>
<entry colname="col2">Denotes a command prompt. Do not type the prompt symbol.
<codeph>$</codeph> and <codeph>#</codeph> denote terminal command prompts.
<codeph>=&gt;</codeph> and <codeph>=#</codeph> denote Greenplum Database
interactive program command prompts (<codeph>psql</codeph> or
<codeph>gpssh</codeph>, for example).</entry>
<entry colname="col3">$ <codeph>createdb mydatabase</codeph><p># <codeph>chown
gpadmin -R /datadir</codeph></p><p>=&gt; <codeph>SELECT * FROM
mytable;</codeph></p><p>=# <codeph>SELECT * FROM
pg_database;</codeph></p></entry>
</row>
</tbody>
</tgroup>
</table>
</body>
</topic>
<topic id="topic7" xml:lang="en">
<title id="jf165600">Getting Support</title>
<body>
<p>For technical support, documentation, release notes, software updates, or for
information about Pivotal products, licensing, and services, go to <xref
href="http://www.pivotal.io" scope="external" format="html"
>www.pivotal.io</xref>.</p>
</body>
</topic>
</topic>
</topic>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">
<topic id="gpactivatestandby" conref="../../utility_guide/admin_utilities/gpactivatestandby.xml">
<title>gpactivatestandby</title>
</topic>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">
<topic conref="../../utility_guide/admin_utilities/gpaddmirrors.xml" id="gpaddmirrors">
<title>gpaddmirrors</title>
</topic>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">
<topic id="gpcheck" conref="../../utility_guide/admin_utilities/gpcheck.xml">
<title>gpcheck</title>
</topic>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">
<topic id="gpcheckperf" conref="../../utility_guide/admin_utilities/gpcheckperf.xml">
<title>gpcheckperf</title>
</topic>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">
<topic id="gpdeletesystem" conref="../../utility_guide/admin_utilities/gpdeletesystem.xml">
<title>gpdeletesystem</title>
</topic>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">
<topic id="gpinitstandby" conref="../../utility_guide/admin_utilities/gpinitstandby.xml">
<title>gpinitstandby</title>
</topic>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">
<topic id="gpinitsystem" conref="../../utility_guide/admin_utilities/gpinitsystem.xml">
<title>gpinitsystem</title>
</topic>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">
<topic id="gppkg" conref="../../utility_guide/admin_utilities/gppkg.xml">
<title>gppkg</title>
</topic>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">
<topic id="gpscp" conref="../../utility_guide/admin_utilities/gpscp.xml">
<title>gpscp</title>
</topic>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">
<topic id="gpseginstall" conref="../../utility_guide/admin_utilities/gpseginstall.xml">
<title>gpseginstall</title>
</topic>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">
<topic id="gpssh-exkeys" conref="../../utility_guide/admin_utilities/gpssh-exkeys.xml">
<title>gpssh-exkeys</title>
</topic>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">
<topic id="gpssh" conref="../../utility_guide/admin_utilities/gpssh.xml">
<title>gpssh</title>
</topic>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">
<topic id="gpstart" conref="../../utility_guide/admin_utilities/gpstart.xml">
<title>gpstart</title>
</topic>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">
<topic id="gpstop" conref="../../utility_guide/admin_utilities/gpstop.xml">
<title>gpstop</title>
</topic>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic
PUBLIC "-//OASIS//DTD DITA Composite//EN" "ditabase.dtd">
<topic id="topic1" xml:lang="en">
<title id="jj138244">Validating Your Systems</title>
<shortdesc>Validate your operating system settings, hardware, and network.</shortdesc>
<body>
<p>Greenplum provides the following utilities to validate the configuration and performance of
your systems: </p>
<ul>
<li id="jj161657"><codeph><xref href="../utility_guide/admin_utilities/gpcheck.xml" type="topic" format="dita"
scope="peer">gpcheck</xref></codeph></li>
<li id="jj161668"><codeph><xref href="../utility_guide/admin_utilities/gpcheckperf.xml" type="topic" format="dita"
scope="peer">gpcheckperf</xref></codeph></li>
</ul>
<p>These utilities can be found in <codeph>$GPHOME/bin</codeph> of your Greenplum
installation.</p>
<p>The following tests should be run prior to initializing your Greenplum Database system. </p>
</body>
<topic id="topic2" xml:lang="en">
<title id="jj161684">Validating OS Settings</title>
<body>
<p>Greenplum provides a utility called <codeph><xref href="../utility_guide/admin_utilities/gpcheck.xml"
type="topic" format="dita" scope="peer">gpcheck</xref></codeph> that can be used to
verify that all hosts in your array have the recommended OS settings for running a
production Greenplum Database system. To run <codeph>gpcheck</codeph>:</p>
<ol>
<li id="jj161449">Log in on the master host as the <codeph>gpadmin</codeph> user.</li>
<li id="jj167683">Source the <codeph>greenplum_path.sh</codeph> path file from your
Greenplum installation. For
example:<codeblock>$ source /usr/local/greenplum-db/greenplum_path.sh</codeblock></li>
<li id="jj161454">Create a file called <codeph>hostfile_gpcheck</codeph> that has the
machine-configured host names of each Greenplum host (master, standby master and
segments), one host name per line. Make sure there are no blank lines or extra spaces.
This file should just have a <i>single</i> host name per host. For
example:<codeblock>mdw
smdw
sdw1
sdw2
sdw3</codeblock></li>
<li id="jj161459">Run the <codeph><xref href="../utility_guide/admin_utilities/gpcheck.xml" type="topic"
format="dita" scope="peer">gpcheck</xref></codeph> utility using the host file you
just created. For
example:<codeblock>$ gpcheck -f hostfile_gpcheck -m mdw -s smdw</codeblock></li>
<li id="jj167249">After <codeph><xref href="../utility_guide/admin_utilities/gpcheck.xml" type="topic"
format="dita" scope="peer">gpcheck</xref></codeph> finishes verifying OS
parameters on all hosts (masters and segments), you might be prompted to modify certain OS
parameters before initializing your Greenplum Database system.</li>
</ol>
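<p>The host file rules in step 3 can be checked before you run <codeph>gpcheck</codeph>. The
following is a minimal shell sketch, not part of Greenplum Database.
<codeph>validate_hostfile</codeph> is a hypothetical helper name, and treating a repeated host
name as an error is an assumption based on the rule of a single host name per host.</p>
<codeblock># Hypothetical helper, not a Greenplum utility.
# Returns 0 if the host file looks clean, 1 otherwise.
validate_hostfile() {
  file="$1"
  # Reject blank lines and any line that contains whitespace.
  if grep -q -E '^$|[[:space:]]' "$file"; then
    echo "blank line or extra whitespace in $file"
    return 1
  fi
  # Assumption: a host name listed twice is a mistake.
  if [ -n "$(sort "$file" | uniq -d)" ]; then
    echo "duplicate host name in $file"
    return 1
  fi
  return 0
}</codeblock>
<p>For example, <codeph>validate_hostfile hostfile_gpcheck</codeph> returns a nonzero status and
prints a short message if the file needs to be cleaned up.</p>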
</body>
</topic>
<topic id="topic3" xml:lang="en">
<title id="jj163212">Validating Hardware Performance</title>
<body>
<p>Greenplum provides a management utility called <codeph><xref
href="../utility_guide/admin_utilities/gpcheckperf.xml" type="topic" format="dita" scope="peer"
>gpcheckperf</xref></codeph>, which can be used to identify hardware and system-level
issues on the machines in your Greenplum Database array. <codeph><xref
href="../utility_guide/admin_utilities/gpcheckperf.xml" type="topic" format="dita" scope="peer"
>gpcheckperf</xref></codeph> starts a session on the specified hosts and runs the
following performance tests:</p>
<ul>
<li id="jj165025">Network Performance (<codeph>gpnetbench*</codeph>)</li>
<li id="jj165034">Disk I/O Performance (<codeph>dd</codeph> test)</li>
<li id="jj165035">Memory Bandwidth (<codeph>stream</codeph> test)</li>
</ul>
<p>Before using <codeph>gpcheckperf</codeph>, you must have a trusted host setup between the
hosts involved in the performance test. You can use the utility <codeph><xref
href="../utility_guide/admin_utilities/gpssh-exkeys.xml" type="topic" format="dita" scope="peer"
>gpssh-exkeys</xref></codeph> to update the known host files and exchange public keys
between hosts if you have not done so already. Note that <codeph>gpcheckperf</codeph> calls
<codeph><xref href="../utility_guide/admin_utilities/gpssh.xml" type="topic" format="dita" scope="peer"
>gpssh</xref></codeph> and <codeph><xref href="../utility_guide/admin_utilities/gpscp.xml" type="topic"
format="dita" scope="peer">gpscp</xref></codeph>, so these Greenplum utilities must
be in your <codeph>$PATH</codeph>.</p>
</body>
<topic id="topic4" xml:lang="en">
<title>Validating Network Performance</title>
<body>
<p>To test network performance, run <codeph><xref href="../utility_guide/admin_utilities/gpcheckperf.xml"
type="topic" format="dita" scope="peer">gpcheckperf</xref></codeph> with one of
the network test run options: parallel pair test (<codeph>-r N</codeph>), serial pair test
(<codeph>-r n</codeph>), or full matrix test (<codeph>-r M</codeph>). The utility runs a
network benchmark program that transfers a 5-second stream of data from the current host
to each remote host included in the test. By default, the data is transferred in parallel
to each remote host, and the minimum, maximum, average, and median network transfer rates
are reported in megabytes (MB) per second. If the summary transfer rate is slower than
expected (less than 100 MB/s), you can run the network test serially using the <codeph>-r
n</codeph> option to obtain per-host results. To run a full-matrix bandwidth test,
specify <codeph>-r M</codeph>, which causes every host to send data to and receive data
from every other host specified. This test is best used to validate whether the switch fabric
can tolerate a full-matrix workload.</p>
<p>Most systems in a Greenplum Database array are configured with multiple network interface
cards (NICs), each NIC on its own subnet. When testing network performance, it is
important to test each subnet individually. For example, consider the following network
configuration of two NICs per host:</p>
<table id="jj165255">
<title>Example Network Interface Configuration</title>
<tgroup cols="3">
<colspec colnum="1" colname="col1" colwidth="139pt"/>
<colspec colnum="2" colname="col2" colwidth="118pt"/>
<colspec colnum="3" colname="col3" colwidth="118pt"/>
<thead>
<row>
<entry colname="col1">Greenplum Host</entry>
<entry colname="col2">Subnet1 NICs</entry>
<entry colname="col3">Subnet2 NICs</entry>
</row>
</thead>
<tbody>
<row>
<entry colname="col1">Segment 1</entry>
<entry colname="col2">sdw1-1</entry>
<entry colname="col3">sdw1-2</entry>
</row>
<row>
<entry colname="col1">Segment 2</entry>
<entry colname="col2">sdw2-1</entry>
<entry colname="col3">sdw2-2</entry>
</row>
<row>
<entry colname="col1">Segment 3</entry>
<entry colname="col2">sdw3-1</entry>
<entry colname="col3">sdw3-2</entry>
</row>
</tbody>
</tgroup>
</table>
<p>You would create two distinct host files, one per subnet, for use with the <codeph><xref
href="../utility_guide/admin_utilities/gpcheckperf.xml" type="topic" format="dita" scope="peer"
>gpcheckperf</xref></codeph> network test:</p>
<table id="jj165603">
<title>Example Network Test Host File Contents</title>
<tgroup cols="2">
<colspec colnum="1" colname="col1" colwidth="188pt"/>
<colspec colnum="2" colname="col2" colwidth="188pt"/>
<thead>
<row>
<entry colname="col1">hostfile_gpchecknet_ic1</entry>
<entry colname="col2">hostfile_gpchecknet_ic2</entry>
</row>
</thead>
<tbody>
<row>
<entry colname="col1">sdw1-1</entry>
<entry colname="col2">sdw1-2</entry>
</row>
<row>
<entry colname="col1">sdw2-1</entry>
<entry colname="col2">sdw2-2</entry>
</row>
<row>
<entry colname="col1">sdw3-1</entry>
<entry colname="col2">sdw3-2</entry>
</row>
</tbody>
</tgroup>
</table>
<p>You would then run <codeph><xref href="../utility_guide/admin_utilities/gpcheckperf.xml" type="topic"
format="dita" scope="peer">gpcheckperf</xref></codeph> once per subnet. For
example (if testing an <i>even</i> number of hosts, run in parallel pairs test mode):</p>
<codeblock>$ gpcheckperf -f hostfile_gpchecknet_ic1 -r N -d /tmp &gt; subnet1.out
$ gpcheckperf -f hostfile_gpchecknet_ic2 -r N -d /tmp &gt; subnet2.out</codeblock>
<p>If you have an <i>odd</i> number of hosts to test, you can run in serial test mode
(<codeph>-r n</codeph>).</p>
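<p>The choice between parallel pair mode and serial pair mode described above can be sketched as
a small shell function. This is an illustration only: <codeph>pick_net_test_mode</codeph> is a
hypothetical helper name, not a Greenplum utility, and it assumes the host file has one host
name per line.</p>
<codeblock># Hypothetical helper, not a Greenplum utility.
# Prints N (parallel pair test) for an even number of hosts,
# or n (serial pair test) for an odd number of hosts.
pick_net_test_mode() {
  # Count the non-empty lines in the host file.
  count=$(grep -c . "$1")
  if [ $((count % 2)) -eq 0 ]; then
    echo N
  else
    echo n
  fi
}

# Possible usage with one of the host files shown above:
# gpcheckperf -f hostfile_gpchecknet_ic1 -r "$(pick_net_test_mode hostfile_gpchecknet_ic1)" -d /tmp</codeblock>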
</body>
</topic>
</topic>
<topic id="topic5" xml:lang="en">
<title id="jj164897">Validating Disk I/O and Memory Bandwidth</title>
<body>
<p>To test disk and memory bandwidth performance, run <codeph><xref
href="../utility_guide/admin_utilities/gpcheckperf.xml" type="topic" format="dita" scope="peer"
>gpcheckperf</xref></codeph> with the disk and stream test run options (<codeph>-r
ds</codeph>). The disk test uses the <codeph>dd</codeph> command (a standard UNIX utility)
to test the sequential throughput performance of a logical disk or file system. The memory
test uses the STREAM benchmark program to measure sustainable memory bandwidth. Results are
reported in MB per second (MB/s).</p>
<section id="jj161569">
<title>To run the disk and stream tests</title>
<ol>
<li id="jj167791">Log in on the master host as the <codeph>gpadmin</codeph> user.</li>
<li id="jj167824">Source the <codeph>greenplum_path.sh</codeph> path file from your
Greenplum installation. For
example:<codeblock>$ source /usr/local/greenplum-db/greenplum_path.sh</codeblock></li>
<li id="jj167777">Create a host file named <codeph>hostfile_gpcheckperf</codeph> that has
one host name per segment host. Do not include the master host. For
example:<codeblock>sdw1
sdw2
sdw3
sdw4</codeblock></li>
<li id="jj162990">Run the <codeph>gpcheckperf</codeph> utility using the
<codeph>hostfile_gpcheckperf</codeph> file you just created. Use the
<codeph>-d</codeph> option to specify the file systems you want to test on each host
(you must have write access to these directories). You will want to test all primary and
mirror segment data directory locations. For
example:<codeblock>$ gpcheckperf -f hostfile_gpcheckperf -r ds -D \
  -d /data1/primary -d /data2/primary \
  -d /data1/mirror -d /data2/mirror</codeblock></li>
<li id="jj163241">The utility may take a while to run the tests because it copies very
large files between the hosts. When it finishes, you see the summary results for
the Disk Write, Disk Read, and Stream tests.</li>
</ol>
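<p>When a host has many primary and mirror data directories, the <codeph>-d</codeph> options in
step 4 can be assembled from a list. The following shell sketch only prints the command for
review and does not run it. <codeph>print_disk_test_command</codeph> is a hypothetical helper
name, and the sketch assumes directory paths that contain no spaces.</p>
<codeblock># Hypothetical helper, not a Greenplum utility.
# Usage: print_disk_test_command HOSTFILE DIR [DIR ...]
print_disk_test_command() {
  hostfile="$1"
  shift
  args=""
  # Add one -d option for each data directory to test.
  for dir in "$@"; do
    args="$args -d $dir"
  done
  echo "gpcheckperf -f $hostfile -r ds -D$args"
}</codeblock>
<p>For example, <codeph>print_disk_test_command hostfile_gpcheckperf /data1/primary
/data1/mirror</codeph> prints a <codeph>gpcheckperf</codeph> command line with one
<codeph>-d</codeph> option per directory.</p>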
</section>
</body>
</topic>
</topic>