perf_issues.xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic
  PUBLIC "-//OASIS//DTD DITA Composite//EN" "ditabase.dtd">
<topic id="topic1" xml:lang="en">
  <title id="jb177957">Common Causes of Performance Issues</title>
  <shortdesc>This section explains the troubleshooting processes for common performance issues and
    potential solutions to these issues. </shortdesc>
  <body/>
  <topic id="topic2" xml:lang="en">
    <title id="jb150555">Identifying Hardware and Segment Failures</title>
    <body>
      <p>The performance of Greenplum Database depends on the hardware and IT
        infrastructure on which it runs. Greenplum Database is comprised of several
        servers (hosts) acting together as one cohesive system (array); as a first step in
        diagnosing performance problems, ensure that all Greenplum Database segments
        are online. Greenplum Database's performance will be as fast as the slowest
        host in the array. Problems with CPU utilization, memory management, I/O processing, or
        network load affect performance. Common hardware-related issues are:</p>
      <ul>
        <li id="jb155623"><b>Disk Failure</b> – Although a single disk failure should not
          dramatically affect database performance if you are using RAID, disk resynchronization
          does consume resources on the host with failed disks. The <codeph>gpcheckperf</codeph>
          utility can help identify segment hosts that have disk I/O issues.</li>
        <li id="jb155625"><b>Host Failure</b> – When a host is offline, the segments on that host
          are nonoperational. This means other hosts in the array must perform twice their usual
          workload because they are running the primary segments and multiple mirrors. If mirrors
          are not enabled, service is interrupted. Service is temporarily interrupted to recover
          failed segments. The <codeph>gpstate</codeph> utility helps identify failed segments.</li>
        <li id="jb155741"><b>Network Failure</b> – Failure of a network interface card, a switch, or
          DNS server can bring down segments. If host names or IP addresses cannot be resolved
          within your Greenplum array, these manifest themselves as
          interconnect errors in Greenplum Database. The <codeph>gpcheckperf</codeph>
          utility helps identify segment hosts that have network issues.</li>
        <li id="jb155762"><b>Disk Capacity</b> – Disk capacity on your segment hosts should never
          exceed 70 percent full. Greenplum Database needs some free space for runtime
          processing. <ph>To reclaim disk space that deleted rows occupy, run
              <codeph>VACUUM</codeph> after loads or updates. </ph>The <i>gp_toolkit</i>
          administrative schema has many views for checking the size of distributed database
          objects. <p>See the <i>Greenplum Database Reference Guide</i> for
            information about checking database object sizes and disk space.</p></li>
      </ul>
    </body>
  </topic>
  <topic id="topic3" xml:lang="en">
    <title id="jb155843">Managing Workload</title>
    <body>
      <p>A database system has a limited CPU capacity, memory, and disk I/O resources. When multiple
        workloads compete for access to these resources, database performance suffers. Resource
        management maximizes system throughput while meeting varied business requirements. Greenplum
        Database provides resource queues and resource groups to help you manage these 
        system resources.</p>
      <p>Resource queues and resource groups limit resource usage and the total number of
        concurrent queries executing in the particular queue or group. By assigning database roles
        to the appropriate queue or group, administrators can control concurrent user queries
         and prevent system overload. For more information about resource queues and resource
         groups, including selecting the appropriate scheme for your Greenplum Database environment,
         see 
          <xref href="./wlmgmt.xml#topic1" type="topic" format="dita"/>.</p>
      <p>Greenplum Database administrators should run maintenance workloads such as
        data loads and <codeph>VACUUM ANALYZE</codeph> operations after business hours. Do not
        compete with database users for system resources; perform administrative tasks at low-usage
        times.</p>
    </body>
  </topic>
  <topic id="topic4" xml:lang="en">
    <title id="jb155291">Avoiding Contention</title>
    <body>
      <p>Contention arises when multiple users or workloads try to use the system in a conflicting
        way; for example, contention occurs when two transactions try to update a table
        simultaneously. A transaction that seeks a table-level or row-level lock will wait
        indefinitely for conflicting locks to be released. Applications should not hold transactions
        open for long periods of time, for example, while waiting for user input.</p>
    </body>
  </topic>
  <topic id="topic5" xml:lang="en">
    <title id="jb155292">Maintaining Database Statistics</title>
    <body>
      <p>Greenplum Database uses a cost-based query optimizer that relies on database
        statistics. Accurate statistics allow the query optimizer to better estimate the number of
        rows retrieved by a query to choose the most efficient query plan. Without database
        statistics, the query optimizer cannot estimate how many records will be returned. The
        optimizer does not assume it has sufficient memory to perform certain operations such as
        aggregations, so it takes the most conservative action and does these operations by reading
        and writing from disk. This is significantly slower than doing them in memory.
          <cmdname>ANALYZE</cmdname> collects statistics about the database that the query optimizer
        needs.<note>When executing an SQL command with GPORCA, Greenplum Database issues a warning if the command performance could be improved by
          collecting statistics on a column or set of columns referenced by the command. The warning
          is issued on the command line and information is added to the Greenplum Database log file. For information about collecting statistics on table
          columns, see the <cmdname>ANALYZE</cmdname> command in the <cite>Greenplum Database Reference Guide</cite></note></p>
    </body>
    <topic id="topic6" xml:lang="en">
      <title>Identifying Statistics Problems in Query Plans</title>
      <body>
        <p>Before you interpret a query plan for a query using <ph>EXPLAIN</ph> or <codeph>EXPLAIN
            ANALYZE</codeph>, familiarize yourself with the data to help identify possible
          statistics problems. Check the plan for the following indicators of inaccurate
          statistics:</p>
        <ul>
          <li id="jb176751"><b>Are the optimizer's estimates close to reality?</b> Run
              <codeph>EXPLAIN ANALYZE</codeph> and see if the number of rows the optimizer estimated
            is close to the number of rows the query operation returned .</li>
          <li id="jb158716"><b>Are selective predicates applied early in the plan?</b> The most
            selective filters should be applied early in the plan so fewer rows move up the plan
            tree.</li>
          <li id="jb158755"><b>Is the optimizer choosing the best join order?</b> When you have a
            query that joins multiple tables, make sure the optimizer chooses the most selective
            join order. Joins that eliminate the largest number of rows should be done earlier in
            the plan so fewer rows move up the plan tree.</li>
        </ul>
        <p>See <xref href="query/topics/query-profiling.xml#topic39"/> for more information about
          reading query plans.</p>
      </body>
    </topic>
    <topic id="topic7" xml:lang="en">
      <title>Tuning Statistics Collection</title>
      <body>
        <p>The following configuration parameters control the amount of data sampled for statistics
          collection:</p>
        <ul>
          <li id="jb158767">
            <codeph>default_statistics_target</codeph>
          </li>
        </ul>
        <p>These parameters control statistics sampling at the system level. It is better to sample
          only increased statistics for columns used most frequently in query predicates. You can
          adjust statistics for a particular column using the command:</p>
        <p>
          <codeph>ALTER TABLE...SET STATISTICS</codeph>
        </p>
        <p>For example:</p>
        <p>
          <codeblock>ALTER TABLE sales ALTER COLUMN region SET STATISTICS 50;
</codeblock>
        </p>
        <p>This is equivalent to changing <codeph>default_statistics_target</codeph> for a
          particular column. Subsequent <codeph>ANALYZE</codeph> operations will then gather more
          statistics data for that column and produce better query plans as a result.</p>
      </body>
    </topic>
  </topic>
  <topic id="topic8" xml:lang="en">
    <title id="jb155293">Optimizing Data Distribution</title>
    <body>
      <p>When you create a table in Greenplum Database, you must declare a
        distribution key that allows for even data distribution across all segments in the system.
        Because the segments work on a query in parallel, Greenplum Database will
        always be as fast as the slowest segment. If the data is unbalanced, the segments that have
        more data will return their results slower and therefore slow down the entire system.</p>
    </body>
  </topic>
  <topic id="topic9" xml:lang="en">
    <title id="jb155264">Optimizing Your Database Design</title>
    <body>
      <p>Many performance issues can be improved by database design. Examine your database design
        and consider the following: </p>
      <ul>
        <li id="jb155523">Does the schema reflect the way the data is accessed? </li>
        <li id="jb155527">Can larger tables be broken down into partitions? </li>
        <li id="jb155528">Are you using the smallest data type possible to store column values? </li>
        <li id="jb155534">Are columns used to join tables of the same datatype? </li>
        <li id="jb155543">Are your indexes being used?</li>
      </ul>
    </body>
    <topic id="topic10" xml:lang="en">
      <title>Greenplum Database Maximum Limits</title>
      <body>
        <p>To help optimize database design, review the maximum limits that Greenplum Database supports:</p>
        <table id="jb163544">
          <title>Maximum Limits of Greenplum Database</title>
          <tgroup cols="2">
            <colspec colnum="1" colname="col1" colwidth="180pt"/>
            <colspec colnum="2" colname="col2" colwidth="180pt"/>
            <thead>
              <row>
                <entry colname="col1">Dimension</entry>
                <entry colname="col2">Limit</entry>
              </row>
            </thead>
            <tbody>
              <row>
                <entry colname="col1">Database Size</entry>
                <entry colname="col2">Unlimited</entry>
              </row>
              <row>
                <entry colname="col1">Table Size</entry>
                <entry colname="col2">Unlimited, 128 TB per partition per segment</entry>
              </row>
              <row>
                <entry colname="col1">Row Size</entry>
                <entry colname="col2">1.6 TB (1600 columns * 1 GB)</entry>
              </row>
              <row>
                <entry colname="col1">Field Size</entry>
                <entry colname="col2">1 GB</entry>
              </row>
              <row>
                <entry colname="col1">Rows per Table</entry>
                <entry colname="col2">281474976710656 (2^48)</entry>
              </row>
              <row>
                <entry colname="col1">Columns per Table/View</entry>
                <entry colname="col2">1600</entry>
              </row>
              <row>
                <entry colname="col1">Indexes per Table</entry>
                <entry colname="col2">Unlimited</entry>
              </row>
              <row>
                <entry colname="col1">Columns per Index</entry>
                <entry colname="col2">32</entry>
              </row>
              <row>
                <entry colname="col1">Table-level Constraints per Table</entry>
                <entry colname="col2">Unlimited</entry>
              </row>
              <row>
                <entry colname="col1">Table Name Length</entry>
                <entry colname="col2">63 Bytes (Limited by <i>name</i> data type)</entry>
              </row>
            </tbody>
          </tgroup>
        </table>
        <p>Dimensions listed as unlimited are not intrinsically limited by Greenplum Database. However, they are limited in practice to available disk space
          and memory/swap space. Performance may suffer when these values are unusually large.</p>
        <note type="note">
          <p>There is a maximum limit on the number of objects (tables<ph>,
              indexes,</ph> and views, but not rows) that may exist at one time. This limit is
            4294967296 (2^32).</p>
        </note>
      </body>
    </topic>
  </topic>
</topic>