diff --git a/gpdb-doc/dita/analytics/analytics.ditamap b/gpdb-doc/dita/analytics/analytics.ditamap index 110788ac78718a142d885b2d08368fcef379f1b7..a48f78574aa6534a7ef636c10b8e841c831de8a8 100644 --- a/gpdb-doc/dita/analytics/analytics.ditamap +++ b/gpdb-doc/dita/analytics/analytics.ditamap @@ -12,7 +12,13 @@ - + + + + + + + diff --git a/gpdb-doc/dita/analytics/graphics/pl_container_architecture.png b/gpdb-doc/dita/analytics/graphics/pl_container_architecture.png new file mode 100644 index 0000000000000000000000000000000000000000..36acff43a8943630eece62facaea649cd6508a3f Binary files /dev/null and b/gpdb-doc/dita/analytics/graphics/pl_container_architecture.png differ diff --git a/gpdb-doc/dita/analytics/pl_container.xml b/gpdb-doc/dita/analytics/pl_container.xml index 62c950a466bd44dc65571048d01361fdc70ab77f..4e21c04ea3d9b6c8668cdfd711dfb090d6943434 100644 --- a/gpdb-doc/dita/analytics/pl_container.xml +++ b/gpdb-doc/dita/analytics/pl_container.xml @@ -4,430 +4,329 @@ PL/Container Language -

This section includes the following information about PL/Container 1.1 and later:

+

User-defined functions (UDFs) written in the Greenplum-supported procedural languages PL/Python and PL/R should be created and used only by trusted Greenplum Database administrators. This restriction prevents data scientists from using their own functions to train and debug models. PL/Container overcomes this limitation by allowing users to run UDFs inside a restricted environment without introducing security risks.

+

This topic describes the architecture, installation, and setup of PL/Container:

    -
  • -
  • -
  • -
  • -
  • +
  • -
  • -
  • -
  • +
  • +
  • -
  • -
  • -
  • -
  • -
+

For detailed information about using PL/Container, refer to:

+

+

    +
  • PL/Container Resource + Management
  • +
  • PL/Container Functions
  • +
+

+

The PL/Container language extension is available as an open source module. For information about the module, see the README file in the GitHub repository at https://github.com/greenplum-db/plcontainer.

- - About the PL/Container Language Extension + + About PL/Container Language Extension -

The Greenplum Database PL/Container language extension (PL/Container) is an interface that - allows Greenplum Database to interact with a Docker container to execute a user-defined - function (UDF) in the container. Docker containers ensure the user code cannot access the - file system of the source host. Also, containers are started without network access or with - limited network access and cannot connect back to Greenplum Database or open any other - external connections. For information about available UDF languages, see

-

Generally speaking, a Docker container is a Linux process that runs in a managed way - by using Linux kernel features such as cgroups, namespaces and union file systems. A Docker - image is the basis of a container. A Docker container is a running instance of a - Docker image. When you start a Docker container you specify a Docker image. A Docker image - is the collection of root filesystem changes and execution parameters that are used when you - run a Docker container on the host system. An image does not have state and never changes. - For information about Docker, see the Docker web site https://www.docker.com/.

-

Greenplum Database starts a container only on the first call to a function in that - container. For example, consider a query that selects table data using all available - segments, and applies a transformation to the data using a PL/Container function. In this - case, Greenplum Database would start the Docker container only once on each segment, and - then contact the running container to obtain the results.

-

After starting a full cycle of a query execution, the executor sends a call to the - container. The container might respond with an SPI - SQL query executed by the container to - get some data back from the database, returning the result to the query executor.

-

The container shuts down when the connection to it is closed. This occurs when you close - the Greenplum Database session that started the container. A container running in standby - mode has almost no consumption of CPU resources as it is waiting on the socket. PL/Container - memory consumption depends on the amount of data you cache in global dictionaries.

-

The PL/Container language extension is available as an open source module. For information - about the module, see the README file in the GitHub repository at https://github.com/greenplum-db/plcontainer.

- -
- - Requirements and Limitations - -
- Requirements -

These are requirements for running PL/Container with Greenplum Database:

-
    -
  • PL/Container is supported with these Greenplum Database releases:
      -
    • PL/Container 1.x and 2.0.x is supported on Greenplum Database 5.2.x and later on - Red Hat Enterprise Linux (RHEL) 7.x (or later) and CentOS 7.x (or later).
    • -
    • PL/Container 2.1.0 and later is supported on Greenplum Database 6 on CentOS 7.x - (or later), RHEL 7.x (or later), or Ubuntu 18.04. - PL/Container 2.1.0 and later supports Docker images with Python 3 - installed.
    • -
    - PL/Container is not supported when Greenplum Database is run within - a Docker container.
  • -
  • For the Docker host RHEL 7.x and CentOS 7.x operating system, the minimum supported - Linux OS kernel version is 3.10.

    You can check your kernel version with the command - uname -r.

  • -
-
    -
  • These are the minimum Docker versions that must be installed on Greenplum Database - hosts (master, primary and all standby hosts):
      -
    • For PL/Container 1.x or 2.0.x on RHEL or CentOS 7.x - Docker 17.05
    • -
    • -

      For PL/Container 2.1.x on CentOS or RHEL 7.x, or Ubuntu 18.04 - Docker 19.03

      -
    • -

    See .

  • -
  • On each Greenplum Database host, the gpadmin user should be part of - the docker group for the user to be able to manage Docker images and - containers.
  • -
-
-
- Limitations -

These are PL/Container limitations:

-
    -
  • Python and R call stack information is not displayed when debugging a UDF.
  • -
  • The plpy.execute() methods nrows() and - status() are not supported.
  • -
  • Multi-dimensional arrays are not supported.
  • -
  • The PL/Python function plpy.SPIError() is not supported.
  • -
  • Executing the SAVEPOINT command with plpy.execute() - is not supported.
  • -
  • Greenplum Database domains are not supported.
  • -
  • The DO command is not supported.
  • -
  • Only PL/Container 1.2.x and later can be managed by Greenplum Database resource - groups. Resource Groups are available in Greenplum Database 5.8.0 and later. See .
  • -
  • OUT parameters are not supported.
  • -
  • The Python dict type cannot be returned from a PL/Python UDF. When - returning the Python dict type from a UDF, you can convert the - dict type to a Greenplum Database user-defined data type (UDT).
  • -
  • You cannot upgrade from version 1.0 to 1.1 or - later with the gppkg utility -u option. You uninstall - version 1.0 and install the new version. See .
  • -
+

The Greenplum Database PL/Container language extension allows users to create and run PL/Python or PL/R user-defined functions (UDFs) securely, inside a Docker container. A Docker container is a Linux process that is managed using Linux kernel features such as cgroups, namespaces, and union file systems. For information about Docker, see the Docker web site at https://www.docker.com/.

+

Running UDFs inside a Docker container has several benefits:

+
    +
  • Ensures isolation of the execution process in a separate environment and decouples the data processing: SQL operators such as "scan," "filter," and "project" are executed on the query executor (QE) side, while advanced data analysis is executed on the container side.
  • The user code cannot access the OS or the file system of the local host.
  • The code cannot introduce any security risks.
  • The functions cannot connect back to Greenplum Database if the container is started with limited or no network access.
  • The user functions cannot open any insecure external connections.
  • +
+ +
+ PL/Container Architecture + + + + + +

Example of the process flow:

+

Consider a query that selects table data using all available segments, and transforms the data using a PL/Container function. On the first call to a function in a segment container, the query executor on the master host starts the container on that segment host. It then contacts the running container to obtain the results. The container might respond with an SPI request (a SQL query that the container runs to fetch data back from the database), returning the result to the query executor.

+
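A minimal sketch of this flow, assuming a runtime named plc_python_shared has been configured (see the installation sections in this topic) and that a table named measurements with columns id and value exists:

CREATE OR REPLACE FUNCTION pylog10(x double precision) RETURNS double precision AS $$
# container: plc_python_shared
import math
return math.log10(x)
$$ LANGUAGE plcontainer;

SELECT id, pylog10(value) FROM measurements;

The first call to pylog10() on each segment starts a container from the configured Docker image on that segment host; later calls in the same session contact the already-running container.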

A container running in standby mode waits on the socket and consumes almost no CPU resources. PL/Container memory consumption depends on the amount of data cached in global dictionaries.

+

The container shuts down when its connection is closed, which happens when you close the Greenplum Database session that started the container.

+
- - About PL/Container Resource Management - -

Greenplum Database runs PL/Container user-defined functions in Docker containers. The - Docker containers and the Greenplum Database server share CPU and memory resources on the - same hosts. In the default case, Greenplum Database is unaware of the resources consumed by - running PL/Container instances. Using PL/Container 1.2 and later with Greenplum Database - 5.8.0 and later, you can use Greenplum Database resource groups to control overall CPU and - memory resource usage for running PL/Container instances, as described in the following - section.

-

PL/Container manages resource usage at two levels - the container level and the runtime - level. You can control container-level CPU and memory resources with the - memory_mb and cpu_share settings that you configure for - the PL/Container runtime. memory_mb governs the memory resources available - to each container instance. The cpu_share setting identifies the relative - weighting of a container's CPU usage compared to other containers. Refer to for PL/Container configuration information.

-
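For example, to raise the per-container limits for an existing runtime (a sketch only; the runtime name python_run1 and the image name follow the examples in this section, and the memory_mb and cpu_share values shown are illustrative assumptions):

plcontainer runtime-replace -r python_run1 -i pivotaldata/plcontainer_python_shared:devel -l python -s memory_mb=2048 -s cpu_share=1024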

You cannot, by default, restrict the number of executing PL/Container container instances, - nor can you restrict the total amount of memory or CPU resources that they consume.

- - - Using Resource Groups to Manage PL/Container Resources - -

With PL/Container 1.2.0 and later, you can use Greenplum Database resource groups to - manage and limit the total CPU and memory resources of containers in PL/Container - runtimes. For more information about enabling, configuring, and using Greenplum Database - resource groups, refer to Using Resource Groups in the Greenplum Database - Administrator Guide.

- If you do not explicitly configure resource groups for a PL/Container runtime, its - container instances are limited only by system resources. The containers may consume - resources at the expense of the Greenplum Database server. -

Resource groups for external components such as PL/Container use Linux control groups - (cgroups) to manage component-level use of memory and CPU resources. When you manage - PL/Container resources with resource groups, you configure both a memory limit and a CPU - limit that Greenplum Database applies to all container instances that share the same - PL/Container runtime configuration.

-

When you create a resource group to manage the resources of a PL/Container runtime, you - must specify MEMORY_AUDITOR=cgroup and CONCURRENCY=0 in - addition to the required CPU and memory limits. For example, the following command creates - a resource group named plpy_run1_rg for a PL/Container runtime: - CREATE RESOURCE GROUP plpy_run1_rg WITH (MEMORY_AUDITOR=cgroup, CONCURRENCY=0, - CPU_RATE_LIMIT=10, MEMORY_LIMIT=10);

-

PL/Container does not use the MEMORY_SHARED_QUOTA and - MEMORY_SPILL_RATIO resource group memory limits. Refer to the - CREATE RESOURCE GROUP reference page for - detailed information about this SQL command.

-

You can create one or more resource groups to manage your running PL/Container instances. - After you create a resource group for PL/Container, you assign the resource group to one - or more PL/Container runtimes. You make this assignment using the groupid - of the resource group. You can determine the groupid for a given resource - group name from the gp_resgroup_config - gp_toolkit view. For example, the following query displays the - groupid of a resource group named - plpy_run1_rg:SELECT groupname, groupid FROM gp_toolkit.gp_resgroup_config - WHERE groupname='plpy_run1_rg'; - - groupname | groupid ---------------+---------- - plpy_run1_rg | 16391 -(1 row)

-

You assign a resource group to a PL/Container runtime configuration by specifying the - -s resource_group_id=rg_groupid option to the - plcontainer runtime-add (new runtime) or plcontainer - runtime-replace (existing runtime) commands. For example, to assign the - plpy_run1_rg resource group to a new PL/Container runtime named - python_run1: - plcontainer runtime-add -r python_run1 -i pivotaldata/plcontainer_python_shared:devel -l python -s resource_group_id=16391

-

You can also assign a resource group to a PL/Container runtime using the - plcontainer runtime-edit command. For information about the - plcontainer command, see .

-

After you assign a resource group to a PL/Container runtime, all container instances that - share the same runtime configuration are subject to the memory limit and the CPU limit - that you configured for the group. If you decrease the memory limit of a PL/Container - resource group, queries executing in running containers in the group may fail with an out - of memory error. If you drop a PL/Container resource group while there are running - container instances, Greenplum Database kills the running containers.

- -
- - Configuring Resource Groups for PL/Container - -

To use Greenplum Database resource groups to manage PL/Container resources, you must - explicitly configure both resource groups and PL/Container.

- PL/Container 1.2 and later utilizes the new resource group capabilities introduced in - Greenplum Database 5.8.0. If you downgrade to a Greenplum Database system that uses - PL/Container 1.1. or earlier, you must use plcontainer runtime-edit to - remove any resource_group_id settings from your PL/Container runtime - configuration. - - - Procedure - -

Perform the following procedure to configure PL/Container to use Greenplum Database - resource groups for CPU and memory resource management:

-
    -
  1. If you have not already configured and enabled resource groups in your Greenplum - Database deployment, configure cgroups and enable Greenplum Database resource groups - as described in Using Resource Groups in the Greenplum Database - Administrator Guide. - If you have previously configured and enabled resource groups in your - deployment, ensure that the Greenplum Database resource group - gpdb.conf cgroups configuration file includes a memory { - } block as described in the previous link.
  2. -
  3. Analyze the resource usage of your Greenplum Database deployment. Determine the - percentage of resource group CPU and memory resources that you want to allocate to - PL/Container Docker containers.
  4. -
  5. Determine how you want to distribute the total PL/Container CPU and memory resources - that you identified in the step above among the PL/Container runtimes. Identify:
      -
    • The number of PL/Container resource group(s) that you require.
    • -
    • The percentage of memory and CPU resources to allocate to each resource - group.
    • -
    • The resource-group-to-PL/Container-runtime assignment(s).
    • -
  6. -
  7. Create the PL/Container resource groups that you identified in the step above. For - example, suppose that you choose to allocate 25% of both memory and CPU Greenplum - Database resources to PL/Container. If you further split these resources among 2 - resource groups 60/40, the following SQL commands create the resource - groups:CREATE RESOURCE GROUP plr_run1_rg WITH (MEMORY_AUDITOR=cgroup, CONCURRENCY=0, - CPU_RATE_LIMIT=15, MEMORY_LIMIT=15); -CREATE RESOURCE GROUP plpy_run1_rg WITH (MEMORY_AUDITOR=cgroup, CONCURRENCY=0, - CPU_RATE_LIMIT=10, MEMORY_LIMIT=10);
  8. -
  9. Find and note the groupid associated with each resource group that - you created. For - example:SELECT groupname, groupid FROM gp_toolkit.gp_resgroup_config - WHERE groupname IN ('plpy_run1_rg', 'plr_run1_rg'); - - groupname | groupid ---------------+---------- - plpy_run1_rg | 16391 - plr_run1_rg | 16393 -(1 row)
  10. -
  11. Assign each resource group that you created to the desired PL/Container runtime - configuration. If you have not yet created the runtime configuration, use the - plcontainer runtime-add command. If the runtime already exists, use - the plcontainer runtime-replace or plcontainer - runtime-edit command to add the resource group assignment to the runtime - configuration. For example: - plcontainer runtime-add -r python_run1 -i pivotaldata/plcontainer_python_shared:devel -l python -s resource_group_id=16391 -plcontainer runtime-replace -r r_run1 -i pivotaldata/plcontainer_r_shared:devel -l r -s resource_group_id=16393

    For - information about the plcontainer command, see .

  12. -
- -
-
-
- - PL/Container Docker Images - -

PL/Python and PL/R image are available from the Greenplum Database product download site of - Pivotal Network at https://network.pivotal.io/.

    -
  • PL/Container for Python 3 - Docker image with Python 3.7 and the Python Data Science - Module package installed. - PL/Container 2.1.0 and later supports Docker images with Python 3 - installed.
  • -
  • PL/Container for Python 2 - Docker image with Python 2.7.12 and the Python Data - Science Module package installed.
  • -
  • PL/Container for R - A Docker image with container with R-3.3.3 and the R Data Science - Library package installed.
  • -

-

The Data Science packages contain a set of Python modules or R functions and data sets - related to data science. For information about the packages, see - Python Data Science - Module Package and R Data Science Library Package.

-

The Docker image tag represents the PL/Container release version (for example, 2.1.0). For - example, the full Docker image name for the PL/Container for Python Docker image is similar - to pivotaldata/plc_python_shared:2.1.0. This is the name that is referred - to in the default PL/Container configuration. Also, you can create custom Docker images, - install the image and add the image to the PL/Container configuration.

- -
+ - Installing the PL/Container Language Extension + Install PL/Container + -

To use PL/Container, install the PL/Container language extension, install Docker images, - and configure PL/Container to use the images.

    -
  1. Ensure the Greenplum Database hosts meet the prerequisites, see .
  2. -
  3. Install the PL/Container extension, see .

    If you are upgrading from PL/Container - 1.0, see .

  4. -
  5. Build and Install the PL/Container extension from source, see - .
  6. -
  7. Install Docker images and configure PL/Container, see .
  8. -

- - - Installing the PL/Container Language Extension Package - - -

Install the PL/Container language extension with the Greenplum Database - gppkg utility.

+
This topic describes how to:

    +
  • install Docker
  • +
  • install PL/Container
  • +
  • install the PL/Container + Docker images
  • +
  • test the PL/Container + installation.
  • +

The following sections describe these tasks in detail.

+
+ +
+ Prerequisites +
    +
  1. For PL/Container 2.1.0 and later, use Greenplum Database 6 on CentOS 7.x (or later), RHEL 7.x (or later), or Ubuntu 18.04. PL/Container 2.1.0 and later supports Docker images with Python 3 installed.
  2. +
  3. The minimum supported Linux OS kernel version is 3.10. To verify your kernel release, run: $ uname -r
  4. +
  5. The minimum Docker version required on all hosts is Docker 19.03.
  6. +
+ +
+ +
+ Install Docker +

To use PL/Container, you must install Docker on all Greenplum Database host systems. These instructions show how to set up the Docker service on CentOS 7; the process on RHEL 7 is similar.

+

These steps install the docker package and start the Docker service as a user with sudo + privileges.

+
    +
  1. Ensure the CentOS extras repository is accessible.
  2. +
  3. Ensure the user has sudo privileges or is root.
  4. +
  5. Install the dependencies required for + Docker:sudo yum install -y yum-utils device-mapper-persistent-data lvm2
  6. +
  7. Add the Docker + repo:sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
  8. +
  9. Update yum cache:sudo yum makecache fast
  10. +
  11. Install Docker:sudo yum -y install docker-ce
  12. +
  13. Start Docker daemon:sudo systemctl start docker
  14. +
  15. On each Greenplum Database host, the gpadmin user should be part of + the docker group for the user to be able to manage Docker images and containers. Assign + the Greenplum Database administrator gpadmin to the group + docker: sudo usermod -aG docker gpadmin
  16. +
  17. Log out of the session and log in again so that the new group membership takes effect.
  18. +
  19. Configure Docker to start when the host system starts:
      sudo systemctl enable docker.service
      sudo systemctl start docker.service
  20. +
  21. Run a Docker command to test the Docker installation. This command lists the currently + running Docker containers. docker ps
  22. +
  23. After you install Docker on all Greenplum Database hosts, restart the Greenplum + Database system to give Greenplum Database access to Docker. + gpstop -ra
  24. +
+

For notes about using Docker with PL/Container, see the Notes section. For a list of Docker reference documentation, see Docker References.

+
+ +
+ Install PL/Container + +

Install the PL/Container language extension using the gppkg utility.

    -
  1. Copy the PL/Container language extension package to the Greenplum Database master host - as the gpadmin user.
  2. -
  3. Make sure Greenplum Database is up and running. If not, bring it up with this - command.gpstart -a
  4. +
  5. Download the PL/Container package that applies to your Greenplum Database version from VMware Tanzu Network. PL/Container is listed under Greenplum Database Language extensions.
  6. +
  7. As gpadmin, copy the PL/Container language extension package to the + master host.
  8. +
  9. Make sure Greenplum Database is up and running: gpstate -s
     If it is not running, start it: gpstart -a
  10. Run the package installation - command.gppkg -i plcontainer-2.1.0-rhel7-x86_64.gppkg
  11. + command:gppkg -i plcontainer-2.1.0-rhel7-x86_64.gppkg
  12. Source the file - $GPHOME/greenplum_path.sh.source $GPHOME/greenplum_path.sh
  13. -
  14. Restart Greenplum Database.gpstop -ra
  15. -
  16. Enable PL/Container for specific databases by running this command.
      -
    1. For PL/Container 1.1 and later, log into the database as a Greenplum Database - superuser (gpadmin) and run this - command.CREATE EXTENSION plcontainer;

      The command - registers PL/Container and creates PL/Container-specific functions and - views.

    2. -
    3. For PL/Container 1.0, run this - command.psql -d your_database -f $GPHOME/share/postgresql/plcontainer/plcontainer_install.sql

      The - SQL script registers the language plcontainer in the database and - creates PL/Container-specific functions and views.

    4. -
  17. + $GPHOME/greenplum_path.sh:source $GPHOME/greenplum_path.sh +
  18. Restart Greenplum Database:gpstop -ra
  19. +
  20. Log in to one of the available databases, for example:
      psql postgres
  21. +
  22. Register the PL/Container extension, which creates the PL/Container-specific functions and views:
      CREATE EXTENSION plcontainer;
      You must register the extension separately in each database that requires PL/Container functionality, as shown in the example after these steps.

-
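For example, to register the extension in an additional database (the database name mytest here is only an example):

psql -d mytest -c 'CREATE EXTENSION plcontainer;'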

After installing PL/Container, you can manage Docker images and manage the PL/Container - configuration with the Greenplum Database plcontainer utility.

- - - - Upgrading from PL/Container 1.0 - -

To upgrade to PL/Container 1.1 or later, uninstall version 1.0 and install the new - version. You cannot use the gppkg option -u. The - gppkg utility installs PL/Container 1.1 and later as a Greenplum - Database extension, while PL/Container 1.0 is installed as a Greenplum Database language. - The Docker images and the PL/Container configuration do not change when upgrading - PL/Container, only the PL/Container extension installation changes.

-

As part of the upgrade process, you must drop PL/Container from all databases that are - configured with PL/Container.

- Dropping PL/Container from a database drops all PL/Container UDFs - from the database, including user-created PL/Container UDFs. If the UDFs are required, - ensure you can re-create the UDFs before dropping PL/Container. This - SELECT command lists the names of and body of PL/Container UDFs in a - database.SELECT proname, prosrc FROM pg_proc WHERE prolang = (SELECT oid FROM pg_language WHERE lanname = 'plcontainer');

For - information about the catalog tables, pg_proc and - pg_language, see System Tables.

-

These steps upgrade from PL/Container 1.0 to PL/Container 1.1 or later in a database. The - steps save the PL/Container 1.0 configuration and restore the configuration for use with - PL/Container 1.1 or later.

    -
  1. Save the PL/Container configuration. This example saves the configuration to - plcontainer10-backup.xml in the local - directory.plcontainer runtime-backup -f plcontainer10-backup.xml
  2. -
  3. Remove any setting elements that contain the - use_container_network attribute from the configuration file. For - example, this setting element must be removed from the configuration - file.<setting use_container_network="yes"/>
  4. -
  5. Run the plcontainer_uninstall.sql script as the - gpadmin user for each database that is configured with - PL/Container. For example, this command drops the plcontainer - language in the mytest database. - psql -d mytest -f $GPHOME/share/postgresql/plcontainer/plcontainer_uninstall.sql

    The - script drops the plcontainer language with the - CASCADE clause that drops PL/Container-specific functions and - views in the database.

  6. -
  7. Use the Greenplum Database gppkg utility with the - -r option to uninstall the PL/Container language extension. This - example uninstalls the PL/Container language extension on a Linux - system.$ gppkg -r plcontainer-1.0.0
  8. -
  9. Run the package installation command. This example installs the PL/Container 2.1.0 - language extension on a Linux - system.gppkg -i plcontainer-2.1.0-gp6-rhel7_x86_64.gppkg
  10. -
  11. Source the file - $GPHOME/greenplum_path.sh.source $GPHOME/greenplum_path.sh
  12. -
  13. Update the PL/Container configuration. This command restores the saved - configuration.plcontainer runtime-restore -f plcontainer10-backup.xml
  14. -
  15. Restart Greenplum Database.gpstop -ra
  16. -
  17. Register the new PL/Container extension as an extension for each database that uses - PL/Container UDFs. This psql command runs a CREATE - EXTENSION command to register PL/Container in the database - mytest. - psql -d mytest -c 'CREATE EXTENSION plcontainer;'

    The - command registers PL/Container as an extension and creates PL/Container-specific - functions and views.

  18. -

-

After upgrading PL/Container for a database, re-install any user-created PL/Container - UDFs that are required.

- -
- - Upgrading from PL/Container 1.1 +
+ +
+ Install PL/Container Docker Images +

Install the Docker images that PL/Container uses to create language-specific containers that run the UDFs.

+ +

The PL/Container open source module contains dockerfiles to build + Docker images that can be used with PL/Container. You can build a Docker image to run + PL/Python UDFs and a Docker image to run PL/R UDFs. See the dockerfiles in the GitHub + repository at https://github.com/greenplum-db/plcontainer.

+ +
    +
  • Download the tar.gz files that contain the Docker images from VMware Tanzu Network. For example, for Greenplum 6.3, download PL/Container Docker Image for Python with the file name plcontainer-python-image-2.1.0-gp6.tar.gz, which includes Python 2.7.12 and the Python Data Science Module package.
    If you require images different from the ones provided by Pivotal, you can create custom Docker images, install them, and add them to the PL/Container configuration.

  • +
  • +

Use the plcontainer utility command image-add to install the images on all Greenplum Database hosts, where the -f option indicates the location of the downloaded files:

    + # Install a Python 2 based Docker image +plcontainer image-add -f /home/gpadmin/plcontainer-python-image-2.1.0-gp6.tar.gz + +# Install a Python 3 based Docker image +plcontainer image-add -f /home/gpadmin/plcontainer-python3-image-2.1.0-gp6.tar.gz + +# Install an R based Docker image +plcontainer image-add -f /home/gpadmin/plcontainer-r-image-2.1.0-gp6.tar.gz +

    The utility displays progress information, similar to:

    + 20200127:21:54:43:004607 plcontainer:mdw:gpadmin-[INFO]:-Checking whether docker is installed on all hosts... +20200127:21:54:43:004607 plcontainer:mdw:gpadmin-[INFO]:-Distributing image file /home/gpadmin/plcontainer-python-images-1.5.0.tar to all hosts... +20200127:21:54:55:004607 plcontainer:mdw:gpadmin-[INFO]:-Loading image on all hosts... +20200127:21:55:37:004607 plcontainer:mdw:gpadmin-[INFO]:-Removing temporary image files on all hosts... +

    For more information on image-add options, visit the plcontainer reference page.

    +
  • +
  • To display the installed Docker images on the local host, use:
    $ plcontainer image-list
    REPOSITORY                               TAG     IMAGE ID       CREATED
    pivotaldata/plcontainer_r_shared         devel   7427f920669d   10 months ago
    pivotaldata/plcontainer_python_shared    devel   e36827eba53e   10 months ago
    pivotaldata/plcontainer_python3_shared   devel   y32827ebe55b   5 months ago
  • +
  • Add the image information to the PL/Container configuration file using + plcontainer runtime-add, to allow PL/Container to associate + containers with specified Docker images.

Use the -r option to specify a user-defined runtime ID name, the -i option to specify the Docker image, and the -l option to specify the Docker image language. When there are multiple versions of the same Docker image, for example 1.0.0 or 1.2.0, specify the TAG version using ":" after the image name.

# Add a Python 2 based runtime
plcontainer runtime-add -r plc_python_shared -i pivotaldata/plcontainer_python_shared:devel -l python

# Add a Python 3 based runtime that is supported with PL/Container 2.1.0 and later
plcontainer runtime-add -r plc_python3_shared -i pivotaldata/plcontainer_python3_shared:devel -l python3

# Add an R based runtime
plcontainer runtime-add -r plc_r_shared -i pivotaldata/plcontainer_r_shared:devel -l r

The utility displays progress information as it updates the PL/Container configuration file on the Greenplum Database instances.

    For details on other + runtime-add options, see the plcontainer reference page. +

  • +
  • Optional: Use Greenplum Database resource groups to manage and limit the total CPU and memory resources of containers in PL/Container runtimes. In this example, the Python runtime is used with a preconfigured resource group whose groupid is 16391:
    plcontainer runtime-add -r plc_python_shared -i pivotaldata/plcontainer_python_shared:devel -l python -s resource_group_id=16391

    For + more information about enabling, configuring, and using Greenplum Database resource + groups with PL/Container, see PL/Container Resource Management .

  • +
+

You can now create a simple function to test your PL/Container installation.

+
+
+ Test the PL/Container Installation +

List the names of the runtimes you created and added to the PL/Container XML file:
    plcontainer runtime-show

+

The command shows a list of all installed runtimes, for example:
PL/Container Runtime Configuration:
---------------------------------------------------------
  Runtime ID: plc_python_shared
  Linked Docker Image: pivotaldata/plcontainer_python_shared:devel
  Runtime Setting(s):
  Shared Directory:
  ---- Shared Directory From HOST '/usr/local/greenplum-db/./bin/plcontainer_clients' to Container '/clientdir', access mode is 'ro'
---------------------------------------------------------

+

You can also view the PL/Container configuration information with the plcontainer + runtime-show -r <runtime_id> command. You can view the PL/Container + configuration XML file with the plcontainer runtime-edit command.

+

Use the psql utility and select an existing database:

psql postgres

If the PL/Container extension is not registered in the selected database, first enable the extension:

+ postgres=# CREATE EXTENSION plcontainer; +

Create a simple function to test your installation; the function uses the runtime plc_python_shared:

+ postgres=# CREATE FUNCTION dummyPython() RETURNS text AS $$ +# container: plc_python_shared +return 'hello from Python' +$$ LANGUAGE plcontainer; +

And test the function using:

+ postgres=# SELECT dummyPython(); + dummypython +------------------- + hello from Python +(1 row) + +

For further details and examples about using PL/Container functions, see PL/Container Functions .

+
+ +
+ + + Upgrade PL/Container -

To upgrade from PL/Container 1.1 or higher, you save the current configuration, upgrade - PL/Container, and then restore the configuration after upgrade. There is no need to update - the Docker images when you upgrade PL/Container.

+

To upgrade PL/Container, you save the current configuration, upgrade PL/Container, and + then restore the configuration after upgrade. There is no need to update the Docker images + when you upgrade PL/Container.

Before you perform this upgrade procedure, ensure that you have migrated your - PL/Container 1.1 package from your previous Greenplum Database installation to your new + PL/Container package from your previous Greenplum Database installation to your new Greenplum Database installation. Refer to the gppkg command for package installation and migration information. -

Perform the following procedure to upgrade from PL/Container 1.1 to PL/Container version - 1.2 or later.

    +

    To upgrade, perform the following procedure: +

    1. Save the PL/Container configuration. For example, to save the configuration to a - file named plcontainer110-backup.xml in the local - directory:$ plcontainer runtime-backup -f plcontainer110-backup.xml
    2. + file named plcontainer202-backup.xml in the local + directory:$ plcontainer runtime-backup -f plcontainer202-backup.xml
    3. Use the Greenplum Database gppkg utility with the -u option to update the PL/Container language extension. For example, the following command updates the PL/Container language extension to version @@ -435,94 +334,22 @@ plcontainer runtime-replace -r r_run1 -i pivotaldata/plcontainer_r_shared:devel system:$ gppkg -u plcontainer-2.1.0-gp6-rhel7_x86_64.gppkg
    4. Source the Greenplum Database environment file $GPHOME/greenplum_path.sh.$ source $GPHOME/greenplum_path.sh
    5. -
    6. Restore the PL/Container configuration. For example, this command restores the - PL/Container configuration that you saved in a previous step: - $ plcontainer runtime-restore -f plcontainer110-backup.xml
    7. +
    8. Restore the PL/Container configuration that you saved in a previous step: + $ plcontainer runtime-restore -f plcontainer202-backup.xml
    9. Restart Greenplum Database.$ gpstop -ra
    10. You do not need to re-register the PL/Container extension in the databases in which - you previously created the extension. You must register the PL/Container extension in - each new database that will run PL/Container UDFs. For example, the following command - registers PL/Container in a database named mytest: - $ psql -d mytest -c 'CREATE EXTENSION plcontainer;'

      The - command also creates PL/Container-specific functions and views.

you previously created the extension. However, ensure that you register the PL/Container extension in each new database that will run PL/Container UDFs. For example, the following command registers PL/Container in a database named mytest:
      $ psql -d mytest -c 'CREATE EXTENSION plcontainer;'

      The + command also creates PL/Container-specific functions and views.

    - PL/Container 1.2 and later utilizes the new resource group capabilities introduced in - Greenplum Database 5.8.0. If you downgrade to a Greenplum Database system that uses - PL/Container 1.1. or earlier, you must use plcontainer runtime-edit to - remove any resource_group_id settings from your PL/Container runtime - configuration. - - - - Building and Installing the PL/Container Language Extension - - -

    The PL/Container language extension is available as an open source module. For - information about the building and installing the module as part of Greenplum Database, - see the README file in the GitHub repository at https://github.com/greenplum-db/plcontainer.

    - +
    - - - Installing PL/Container Docker Images - -

    The PL/Container language extension includes the plcontainer utility that - installs Docker images on the Greenplum Database hosts and adds configuration information to - the PL/Container configuration file. The configuration information allows PL/Container to - create Docker containers with the Docker images. For information about - plcontainer, see .

    - -

    Download the tar.gz file that contains the Docker - images from Pivotal Network.

      -
    • plcontainer-python3-image-<version>.tar.gz
    • -
    • plcontainer-python-image-<version>.tar.gz
    • -
    • plcontainer-r-image-<version>.tar.gz
    • -

    - -

    The PL/Container open source module contains dockerfiles to build - Docker images that can be used with PL/Container. You can build a Docker image to run - PL/Python UDFs and a Docker image to run PL/R UDFs. See the dockerfiles in the GitHub - repository at https://github.com/greenplum-db/plcontainer.

    -

    Install Docker images on all the Greenplum Database hosts with the plcontainer - image-add command. These examples assume that the Docker images are in files - located in /home/gpadmin. - # Install a Python 2 based Docker image -plcontainer image-add -f /home/gpadmin/plcontainer-python-image-2.1.0-gp6.tar.gz -# Install a Python 3 based Docker image -plcontainer image-add -f /home/gpadmin/plcontainer-python3-image-2.1.0-gp6.tar.gz - -# Install an R based Docker image -plcontainer image-add -f /home/gpadmin/plcontainer-r-image-2.1.0-gp6.tar.gz

    -

    The utility displays progress information as it installs the Docker image on the Greenplum - Database hosts.

    -

    Use the plcontainer image-list command to display the installed Docker - images on the local host.

    -

    Use the plcontainer runtime-add command to add information to the - PL/Container configuration file so that PL/Container can access the Docker image. The - -r option specifies the runtime ID that you choose. The - -l option specifies the language that is contained in the Docker image. - # Add a Python 2 based runtime -plcontainer runtime-add -r plc_py -i pivotaldata/plcontainer_python_shared:devel -l python - -# Add a Python 3 based runtime that is supported with PL/Container 2.1.0 and later -plcontainer runtime-add -r plc_py3 -i pivotaldata/plcontainer_python3_shared:devel -l python3 - -# Add an R based runtime -plcontainer runtime-add -r plc_r -i pivotaldata/plcontainer_r_shared:devel -l r

    -

    The utility displays progress information as it updates the PL/Container configuration file - on the Greenplum Database instances.

    -

    You can view the PL/Container configuration information with the plcontainer - runtime-show -r <runtime_id> command. You can view the PL/Container - configuration XML file with the plcontainer runtime-edit command.

    - -
    - - Uninstalling PL/Container + + + Uninstall PL/Container

    To uninstall PL/Container, remove Docker containers and images, and then remove the PL/Container support from Greenplum Database.

    @@ -553,31 +380,15 @@ plcontainer runtime-add -r plc_r -i pivotaldata/plcontainer_r_shared:devel -l r< Remove PL/Container Support for a Database -

For a database that no longer requires PL/Container, remove support for PL/Container.

    -
    - PL/Container 1.1 and Later -

    For PL/Container 1.1 and later, drop the extension from the database. This - psql command runs a DROP EXTENION command to remove - PL/Container in the database - mytest.psql -d mytest -c 'DROP EXTENSION plcontainer CASCADE;'

    -

    The command drops the plcontainer extension. The - CASCADE keyword drops PL/Container-specific functions and views from - the database.

    -
    -
    - PL/Container 1.0 -

    Run the plcontainer_uninstall.sql script as the - gpadmin user. For example, this command removes the - plcontainer language in the mytest database.

    - psql -d mytest -f $GPHOME/share/postgresql/plcontainer/plcontainer_uninstall.sql -

    The script drops the plcontainer language with - CASCADE to drop PL/Container-specific functions and views from the - database.

    -
    +

To remove support for PL/Container, drop the extension from the database. Use the psql utility with the DROP EXTENSION command (using -c) to remove PL/Container from the mytest database.

    + psql -d mytest -c 'DROP EXTENSION plcontainer CASCADE;' +

    The CASCADE keyword drops PL/Container-specific functions and views.

    - Uninstalling PL/Container Language Extension + Uninstall PL/Container Language Extension

    If no databases have plcontainer as a registered language, uninstall the Greenplum Database PL/Container language extension with the gppkg @@ -595,844 +406,16 @@ plcontainer runtime-add -r plc_r -i pivotaldata/plcontainer_r_shared:devel -l r<

-
- - Using PL/Container Functions - -

When you enable PL/Container in a database of a Greenplum Database system, the language - plcontainer is registered in the database. You can create and run - user-defined functions in the procedural languages supported by the PL/Container Docker - images when you specify plcontainer as a language in a UDF definition.

-

A UDF definition that uses PL/Container must have the these items.

-
    -
  • The first line of the UDF must be # container: - ID
  • -
  • The LANGUAGE attribute must be plcontainer
  • -
-

The ID is the name that PL/Container uses to identify a Docker image. - When Greenplum Database executes a UDF on a host, the Docker image on the host is used to - start a Docker container that runs the UDF. In the XML configuration file - plcontainer_configuration.xml, there is a runtime XML - element that contains a corresponding id XML element that specifies the - Docker container startup information. See for - information about how PL/Container maps the ID to a Docker image. See - for example UDF definitions.

-

The PL/Container configuration file is read only on the first invocation of a PL/Container - function in each Greenplum Database session that runs PL/Container functions. You can force - the configuration file to be re-read by performing a SELECT command on the - view plcontainer_refresh_config during the session. For example, this - SELECT command forces the configuration file to be read.

- SELECT * FROM plcontainer_refresh_config; -

Running the command executes a PL/Container function that updates the configuration on the - master and segment instances and returns the status of the - refresh. gp_segment_id | plcontainer_refresh_local_config ----------------+---------------------------------- - 1 | ok - 0 | ok - -1 | ok -(3 rows)

-

Also, you can show all the configurations in the session by performing a - SELECT command on the view plcontainer_show_config. For - example, this SELECT command returns the PL/Container configurations.

- SELECT * FROM plcontainer_show_config; -

Running the command executes a PL/Container function that displays configuration - information from the master and segment instances. This is an example of the start and end - of the view - output.INFO: plcontainer: Container 'plc_py_test' configuration -INFO: plcontainer: image = 'pivotaldata/plcontainer_python_shared:devel' -INFO: plcontainer: memory_mb = '1024' -INFO: plcontainer: use container network = 'no' -INFO: plcontainer: use container logging = 'no' -INFO: plcontainer: shared directory from host '/usr/local/greenplum-db/./bin/plcontainer_clients' to container '/clientdir' -INFO: plcontainer: access = readonly - - ... - -INFO: plcontainer: Container 'plc_r_example' configuration (seg0 slice3 192.168.180.45:40000 pid=3304) -INFO: plcontainer: image = 'pivotaldata/plcontainer_r_without_clients:0.2' (seg0 slice3 192.168.180.45:40000 pid=3304) -INFO: plcontainer: memory_mb = '1024' (seg0 slice3 192.168.180.45:40000 pid=3304) -INFO: plcontainer: use container network = 'no' (seg0 slice3 192.168.180.45:40000 pid=3304) -INFO: plcontainer: use container logging = 'yes' (seg0 slice3 192.168.180.45:40000 pid=3304) -INFO: plcontainer: shared directory from host '/usr/local/greenplum-db/bin/plcontainer_clients' to container '/clientdir' (seg0 slice3 192.168.180.45:40000 pid=3304) -INFO: plcontainer: access = readonly (seg0 slice3 192.168.180.45:40000 pid=3304) -gp_segment_id | plcontainer_show_local_config ----------------+------------------------------- - 0 | ok - -1 | ok - 1 | ok

-

The PL/Container function plcontainer_containers_summary() displays - information about the currently running Docker - containers.SELECT * FROM plcontainer_containers_summary();

-

If a normal (non-superuser) Greenplum Database user runs the function, the function - displays information only for containers created by the user. If a Greenplum Database - superuser runs the function, information for all containers created by Greenplum Database - users is displayed. This is sample output when 2 containers are running.

- SEGMENT_ID | CONTAINER_ID | UP_TIME | OWNER | MEMORY_USAGE(KB) -------------+------------------------------------------------------------------+--------------+---------+------------------ - 1 | 693a6cb691f1d2881ec0160a44dae2547a0d5b799875d4ec106c09c97da422ea | Up 8 seconds | gpadmin | 12940 - 1 | bc9a0c04019c266f6d8269ffe35769d118bfb96ec634549b2b1bd2401ea20158 | Up 2 minutes | gpadmin | 13628 -(2 rows) - - - Examples - -

The values in the # container lines of the examples, - plc_python_shared and plc_r_shared, are the - id XML elements defined in the plcontainer_config.xml - file. The id element is mapped to the image element that - specifies the Docker image to be started. If you configured PL/Container with a different - ID, change the value of the # container line. For information about - configuring PL/Container and viewing the configuration settings, see .

-

This is an example of PL/Python function that runs using the - plc_python_shared container that contains Python - 2:CREATE OR REPLACE FUNCTION pylog100() RETURNS double precision AS $$ -# container: plc_python_shared -import math -return math.log10(100) -$$ LANGUAGE plcontainer;

-

This is an example of a similar function using the plc_r_shared - container:CREATE OR REPLACE FUNCTION rlog100() RETURNS text AS $$ -# container: plc_r_shared -return(log10(100)) -$$ LANGUAGE plcontainer;

-

If the # container line in a UDF specifies an ID that is not in the - PL/Container configuration file, Greenplum Database returns an error when you try to - execute the UDF.

- -
-
- - About PL/Container Running PL/Python - -

In the Python language container, the module plpy is implemented. The - module contains these methods:

-
    -
  • plpy.execute(stmt) - Executes the query string stmt - and returns query result in a list of dictionary objects. To be able to access the result - fields ensure your query returns named fields.
  • -
  • plpy.prepare(stmt[, argtypes]) - Prepares the execution plan for a - query. It is called with a query string and a list of parameter types, if you have - parameter references in the query.
  • -
  • plpy.execute(plan[, argtypes]) - Executes a prepared plan.
  • -
  • plpy.debug(msg) - Sends a DEBUG2 message to the Greenplum Database - log.
  • -
  • plpy.log(msg) - Sends a LOG message to the Greenplum Database log.
  • -
  • plpy.info(msg) - Sends an INFO message to the Greenplum Database - log.
  • -
  • plpy.notice(msg) - Sends a NOTICE message to the Greenplum Database - log.
  • -
  • plpy.warning(msg) - Sends a WARNING message to the Greenplum Database - log.
  • -
  • plpy.error(msg) - Sends an ERROR message to the Greenplum Database log. - An ERROR message raised in Greenplum Database causes the query execution process to stop - and the transaction to rollback.
  • -
  • plpy.fatal(msg) - Sends a FATAL message to the Greenplum Database log. - A FATAL message causes Greenplum Database session to be closed and transaction to be - rolled back.
  • -
  • plpy.subtransaction() - Manages plpy.execute calls in - an explicit subtransaction. See Explicit Subtransactions in the PostgreSQL documentation for - additional information about plpy.subtransaction().
  • -
-
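The sketch below shows a PL/Container UDF that uses plpy.execute(); it assumes a runtime named plc_python_shared is configured, and the query returns a named field (cnt) so that the result can be accessed by name:

CREATE OR REPLACE FUNCTION count_tables() RETURNS bigint AS $$
# container: plc_python_shared
rv = plpy.execute("SELECT count(*) AS cnt FROM pg_catalog.pg_class")
return rv[0]["cnt"]
$$ LANGUAGE plcontainer;

SELECT count_tables();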

If an error of level ERROR or FATAL is raised in a nested - Python function call, the message includes the list of enclosing functions.

-

The Python language container supports these string quoting functions that are useful when - constructing ad-hoc queries.

    -
  • plpy.quote_literal(string) - Returns the string quoted to be used as - a string literal in an SQL statement string. Embedded single-quotes and backslashes are - properly doubled. quote_literal() returns null on null input (empty - input). If the argument might be null, quote_nullable() might be more - appropriate.
  • -
  • plpy.quote_nullable(string) - Returns the string quoted to be used as - a string literal in an SQL statement string. If the argument is null, returns - NULL. Embedded single-quotes and backslashes are properly - doubled.
  • -
  • - plpy.quote_ident(string) - Returns the string quoted to be used as an - identifier in an SQL statement string. Quotes are added only if necessary (for example, - if the string contains non-identifier characters or would be case-folded). Embedded - quotes are properly doubled.
  • -

-
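A short sketch of using plpy.quote_ident() to build an ad-hoc query safely; the runtime name plc_python_shared and the function itself are illustrative assumptions:

CREATE OR REPLACE FUNCTION count_rows(tabname text) RETURNS bigint AS $$
# container: plc_python_shared
query = "SELECT count(*) AS cnt FROM " + plpy.quote_ident(tabname)
rv = plpy.execute(query)
return rv[0]["cnt"]
$$ LANGUAGE plcontainer;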

When returning text from a PL/Python function, PL/Container converts a Python unicode - object to text in the database encoding. If the conversion cannot be performed, an error is - returned.

-

PL/Container does not support this Greenplum Database PL/Python feature:

    -
  • Multi-dimensional arrays.
  • -

-

Also, the Python module has two global dictionary objects that retain the data between - function calls. They are named GD and SD. GD is used to share the data between all the - function running within the same container, while SD is used for sharing the data between - multiple calls of each separate function. Be aware that accessing the data is possible only - within the same session, when the container process lives on a segment or master. Be aware - that for idle sessions Greenplum Database terminates segment processes, which means the - related containers would be shut down and the data from GD and SD lost.

-
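A minimal sketch showing SD retaining state between calls of the same function in one session; the runtime name plc_python_shared is an assumption:

CREATE OR REPLACE FUNCTION call_counter() RETURNS integer AS $$
# container: plc_python_shared
SD['n'] = SD.get('n', 0) + 1
return SD['n']
$$ LANGUAGE plcontainer;

Repeated calls to call_counter() within the same session return increasing values; a new session, or a segment process that Greenplum Database terminates while idle, starts the count again from 1.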

For information about PL/Python, see .

-

For information about the plpy methods, see https://www.postgresql.org/docs/9.4/plpython-database.htm.

- -
- - About PL/Container Running PL/R - -

In the R language container, the module pg.spi is implemented. The module - contains these methods:

-
    -
  • pg.spi.exec(stmt) - Executes the query string stmt and - returns query result in R data.frame. To be able to access the result - fields make sure your query returns named fields.
  • -
  • pg.spi.prepare(stmt[, argtypes]) - Prepares the execution plan for a - query. It is called with a query string and a list of parameter types if you have - parameter references in the query.
  • -
  • pg.spi.execp(plan[, argtypes]) - Execute a prepared plan.
  • -
  • pg.spi.debug(msg) - Sends a DEBUG2 message to the Greenplum Database - log.
  • -
  • pg.spi.log(msg) - Sends a LOG message to the Greenplum Database - log.
  • -
  • pg.spi.info(msg) - Sends an INFO message to the Greenplum Database - log.
  • -
  • pg.spi.notice(msg) - Sends a NOTICE message to the Greenplum Database - log.
  • -
  • pg.spi.warning(msg) - Sends a WARNING message to the Greenplum Database - log.
  • -
  • pg.spi.error(msg) - Sends an ERROR message to the Greenplum Database - log. An ERROR message raised in Greenplum Database causes the query execution process to - stop and the transaction to rollback.
  • -
  • pg.spi.fatal(msg) - Sends a FATAL message to the Greenplum Database - log. A FATAL message causes Greenplum Database session to be closed and transaction to be - rolled back.
  • -
-
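A minimal sketch of a PL/R UDF that uses pg.spi.exec(); it assumes a runtime named plc_r_shared is configured, and the query returns a named field (cnt) so the value can be read from the returned data.frame:

CREATE OR REPLACE FUNCTION r_count_tables() RETURNS integer AS $$
# container: plc_r_shared
rv <- pg.spi.exec("SELECT count(*) AS cnt FROM pg_catalog.pg_class")
return(rv$cnt[1])
$$ LANGUAGE plcontainer;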

PL/Container does not support this PL/R feature:

    -
  • Multi-dimensional arrays.
  • -

-

For information about PL/R, see .

-

For information about the pg.spi methods, see http://www.joeconway.com/plr/doc/plr-spi-rsupport-funcs-normal.html

- -
- - Configuring PL/Container - -

The Greenplum Database utility plcontainer manages the PL/Container - configuration files in a Greenplum Database system. The utility ensures that the - configuration files are consistent across the Greenplum Database master and segment - instances.

- Modifying the configuration files on the segment instances without using - the utility might create different, incompatible configurations on different Greenplum - Database segments that could cause unexpected behavior. -

Configuration changes that are made with the utility are applied to the XML files on all - Greenplum Database segments. However, PL/Container configurations of currently running - sessions use the configuration that existed during session start up. To update the - PL/Container configuration in a running session, execute this command in the session.

- SELECT * FROM plcontainer_refresh_config; -

Running the command executes a PL/Container function that updates the session configuration - on the master and segment instances.

- - - The plcontainer Utility - -

The plcontainer utility installs Docker images and manages the - PL/Container configuration. The utility consists of two sets of commands.

-
    -
  • image-* commands manage Docker images on the Greenplum Database - system hosts.
  • -
  • runtime-* commands manage the PL/Container configuration file on the - Greenplum Database instances. You can add Docker image information to the PL/Container - configuration file including the image name, location, and shared folder information. - You can also edit the configuration file.
  • -
-

To configure PL/Container to use a Docker image, you install the Docker image on all the - Greenplum Database hosts and then add configuration information to the PL/Container - configuration.

-

PL/Container configuration values, such as image names, runtime IDs, and parameter values - and names are case sensitive.

-
- plcontainer Syntax - plcontainer [command] [-h | --help] [--verbose] -

Where command is one of the following.

- image-add {{-f | --file} image_file} | {{-u | --URL} image_URL} - image-delete {-i | --image} image_name - image-list + - runtime-add {-r | --runtime} runtime_id - {-i | --image} image_name {-l | --language} {python | python3 | r} - [{-v | --volume} shared_volume [{-v| --volume} shared_volume...]] - [{-s | --setting} param=value [{-s | --setting} param=value ...]] - runtime-replace {-r | --runtime} runtime_id - {-i | --image} image_name -l {r | python} - [{-v | --volume} shared_volume [{-v | --volume} shared_volume...]] - [{-s | --setting} param=value [{-s | --setting} param=value ...]] - runtime-show {-r | --runtime} runtime_id - runtime-delete {-r | --runtime} runtime_id - runtime-edit [{-e | --editor} editor] - runtime-backup {-f | --file} config_file - runtime-restore {-f | --file} config_file - runtime-verify -
-
- plcontainer Commands and Options -
- - - image-add location - Install a Docker image on the Greenplum Database hosts. Specify either the location - of the Docker image file on the host or the URL to the Docker image. These are the - supported location options.
    -
  • {-f | --file} image_file Specify the tar - archive file on the host that contains the Docker image. This example points to an - image file in the gpadmin home directory - /home/gpadmin/test_image.tar.gz
  • -
  • {-u | --URL} image_URL Specify the URL of the - Docker repository and image. This example URL points to a local Docker repository - 192.168.0.1:5000/images/mytest_plc_r:devel
  • -
- After installing the Docker image, use the runtime-add - command to configure PL/Container to use the Docker image. -
image-delete {-i | --image} image_name
Remove an installed Docker image from all Greenplum Database hosts. Specify the full Docker image name, including the tag, for example pivotaldata/plcontainer_python_shared:1.0.0.

image-list
List the Docker images installed on the host. The command lists only the images on the local host, not remote hosts. The command lists all installed Docker images, including images installed with Docker commands.

runtime-add options
Add configuration information to the PL/Container configuration file on all Greenplum Database hosts. If the specified runtime_id exists, the utility returns an error and the configuration information is not added.
For information about PL/Container configuration, see .
These are the supported options:

{-i | --image} docker-image
Required. Specify the full Docker image name, including the tag, that is installed on the Greenplum Database hosts. For example pivotaldata/plcontainer_python:1.0.0. The utility returns a warning if the specified Docker image is not installed. The plcontainer image-list command displays installed image information including the name and tag (the Repository and Tag columns).

{-l | --language} python | python3 | r
Required. Specify the PL/Container language type. Supported values are python (PL/Python using Python 2), python3 (PL/Python using Python 3), and r (PL/R). When adding configuration information for a new runtime, the utility adds a startup command to the configuration based on the language you specify.
Startup command for the Python 2 language: /clientdir/pyclient.sh
Startup command for the Python 3 language: /clientdir/pyclient3.sh
Startup command for the R language: /clientdir/rclient.sh

{-r | --runtime} runtime_id
Required. Add the runtime ID. When adding a runtime element in the PL/Container configuration file, this is the value of the id element in the PL/Container configuration file. Maximum length is 63 bytes. You specify the name in the Greenplum Database UDF on the # container line. See .

{-s | --setting} param=value
Optional. Specify a setting to add to the runtime configuration information. You can specify this option multiple times. The setting applies to the runtime configuration specified by the runtime_id. The parameter is the XML attribute of the settings element in the PL/Container configuration file. These are the valid parameters:
    -
  • cpu_share - Set the CPU limit for each container in the - runtime configuration. The default value is 1024. The value is a relative - weighting of CPU usage compared to other containers.
  • -
  • memory_mb - Set the memory limit for each container in - the runtime configuration. The default value is 1024. The value is an - integer that specifies the amount of memory in MB.
  • -
  • resource_group_id - Assign the specified resource group - to the runtime configuration. The resource group limits the total CPU and - memory resource usage for all containers that share this runtime - configuration. You must specify the groupid of the resource - group. For information about managing PL/Container resources, see About PL/Container Resource - Management.
  • -
  • roles - Specify the Greenplum Database roles that are allowed to run a container for the runtime configuration. You can specify a single role name or a comma-separated list of role names. The default is no restriction.
  • -
  • use_container_logging - Enable or disable Docker logging - for the container. The value is either yes (enable logging) - or no (disable logging, the default).

    The Greenplum - Database server configuration parameter log_min_messages controls the log level. - The default log level is warning. For information about - PL/Container log information, see Notes.

  • -
-
- - {-v | --volume} shared-volume - Optional. Specify a Docker volume to bind mount. You can specify this option - multiple times to define multiple volumes. - The format for a shared volume: - host-dir:container-dir:[rw|ro]. - The information is stored as attributes in the shared_directory - element of the runtime element in the PL/Container - configuration file.
    -
  • host-dir - absolute path to a directory on the host - system. The Greenplum Database administrator user (gpadmin) must have - appropriate access to the directory.
  • -
  • container-dir - absolute path to a directory in the - Docker container.
  • -
  • [rw|ro] - read-write or read-only access to the host - directory from the container.
  • -
- When adding configuration information for a new runtime, the utility adds this - read-only shared volume information. - - greenplum-home/bin/plcontainer_clients:/clientdir:ro - - If needed, you can specify other shared directories. The utility returns an - error if the specified container-dir is the same as the one - that is added by the utility, or if you specify multiple shared volumes with the - same container-dir. - Allowing read-write access to a host directory requires - special considerations.
    -
  • When specifying read-write access to a host directory, ensure that the specified host directory has the correct permissions.
  • -
  • When running PL/Container user-defined functions, multiple concurrent - Docker containers that are running on a host could change data in the host - directory. Ensure that the functions support multiple concurrent access to - the data in the host directory.
  • -
-
-
-
-
- - runtime-backup {-f | --file} config_file - -

Copies the PL/Container configuration file to the specified file on the - local host.

-
-
- - runtime-delete {-r | --runtime} runtime_id - -

Removes runtime configuration information in the PL/Container - configuration file on all Greenplum Database instances. The utility returns a - message if the specified runtime_id does not exist in the - file.

-
-
- - runtime-edit [{-e | --editor} editor] - Edit the XML file plcontainer_configuration.xml with the specified - editor. The default editor is vi.

Saving the file updates the - configuration file on all Greenplum Database hosts. If errors exist in the updated - file, the utility returns an error and does not update the file.

-
- - runtime-replace options - -

Replaces runtime configuration information in the PL/Container - configuration file on all Greenplum Database instances. If the - runtime_id does not exist, the information is added to the - configuration file. The utility adds a startup command and shared directory to the - configuration.

-

See runtime-add for command options and information added to the - configuration.

-
-
- - runtime-restore {-f | --file} config_file - -

Replaces information in the PL/Container configuration file - plcontainer_configuration.xml on all Greenplum Database instances - with the information from the specified file on the local host.

-
-
- - runtime-show [{-r | --runtime} runtime_id] - -

Displays formatted PL/Container runtime configuration information. If a runtime_id is not specified, the configurations for all runtime IDs are displayed.

-
-
- - runtime-verify - -

Checks the PL/Container configuration information on the Greenplum Database instances against the configuration information on the master. If the utility finds inconsistencies, you are prompted to replace the remote copy with the local copy. The utility also performs XML validation.

-
-
- - -h | --help - Display help text. If specified without a command, displays help for all - plcontainer commands. If specified with a command, displays help - for the command. - - - --verbose - Enable verbose logging for the command. - -
-
- Examples -

These are examples of common commands to manage PL/Container:

-
    -
  • Install a Docker image on all Greenplum Database hosts. This example loads a Docker - image from a file. The utility displays progress information on the command line as - the utility installs the Docker image on all the - hosts.plcontainer image-add -f plc_newr.tar.gz

    After - installing the Docker image, you add or update a runtime entry in the PL/Container - configuration file to give PL/Container access to the Docker image to start Docker - containers.

  • -
  • Add a container entry to the PL/Container configuration file. This example adds - configuration information for a PL/R runtime, and specifies a shared volume and - settings for memory and logging. - plcontainer runtime-add -r runtime2 -i test_image2:0.1 -l r \ - -v /host_dir2/shared2:/container_dir2/shared2:ro \ - -s memory_mb=512 -s use_container_logging=yes

    The - utility displays progress information on the command line as it adds the runtime - configuration to the configuration file and distributes the updated configuration to - all instances.

  • -
  • Show the configuration for a specific runtime ID in the configuration file: plcontainer runtime-show -r plc_python_shared

    The - utility displays the configuration information similar to this - output.PL/Container Runtime Configuration: ---------------------------------------------------------- - Runtime ID: plc_python_shared - Linked Docker Image: test1:latest - Runtime Setting(s): - Shared Directory: - ---- Shared Directory From HOST '/usr/local/greenplum-db/bin/plcontainer_clients' to Container '/clientdir', access mode is 'ro' - ---- Shared Directory From HOST '/home/gpadmin/share/' to Container '/opt/share', access mode is 'rw' ----------------------------------------------------------

  • -
  • Edit the configuration in an interactive editor of your choice. This example edits - the configuration file with the vim - editor.plcontainer runtime-edit -e vim

    When you save the - file, the utility displays progress information on the command line as it - distributes the file to the Greenplum Database hosts.

  • -
  • Save the current PL/Container configuration to a file. This example saves the file - to the local file - /home/gpadmin/saved_plc_config.xmlplcontainer runtime-backup -f /home/gpadmin/saved_plc_config.xml
  • -
  • Overwrite PL/Container configuration file with an XML file. This example replaces - the information in the configuration file with the information from the file in the - /home/gpadmin - directory.plcontainer runtime-restore -f /home/gpadmin/new_plcontainer_configuration.xmlThe - utility displays progress information on the command line as it distributes the - updated file to the Greenplum Database instances.
  • -
-
- -
- - PL/Container Configuration File - -

PL/Container maintains a configuration file - plcontainer_configuration.xml in the data directory of all Greenplum - Database segments. The PL/Container configuration file is an XML file. In the XML file, - the root element configuration contains one or more - runtime elements. You specify the id of the - runtime element in the # container: line of a - PL/Container function definition.

-

In an XML file, names, such as element and attribute names, and values are case - sensitive.

-

This is an example file.
<?xml version="1.0" ?>
<configuration>
    <runtime>
        <id>plc_python_example1</id>
        <image>pivotaldata/plcontainer_python_with_clients:0.1</image>
        <command>./pyclient</command>
    </runtime>
    <runtime>
        <id>plc_python_example2</id>
        <image>pivotaldata/plcontainer_python_without_clients:0.1</image>
        <command>/clientdir/pyclient.sh</command>
        <shared_directory access="ro" container="/clientdir" host="/usr/local/greenplum-db/bin/plcontainer_clients"/>
        <setting memory_mb="512"/>
        <setting use_container_logging="yes"/>
        <setting cpu_share="1024"/>
        <setting resource_group_id="16391"/>
    </runtime>
    <runtime>
        <id>plc_r_example</id>
        <image>pivotaldata/plcontainer_r_without_clients:0.2</image>
        <command>/clientdir/rclient.sh</command>
        <shared_directory access="ro" container="/clientdir" host="/usr/local/greenplum-db/bin/plcontainer_clients"/>
        <setting use_container_logging="yes"/>
        <setting roles="gpadmin,user1"/>
    </runtime>
</configuration>

-

These are the XML elements and attributes in a PL/Container configuration file.

- - - configuration - Root element for the XML file. - - - runtime - One element for each specific container available in the system. These are child - elements of the configuration element. - - - - id - Required. The value is used to reference a Docker container from a - PL/Container user-defined function. The id value must be unique - in the configuration. The id must start with a character or - digit (a-z, A-Z, or 0-9) and can contain characters, digits, or the characters - _ (underscore), . (period), or - - (dash). Maximum length is 63 Bytes.

The - id specifies which Docker image to use when PL/Container - creates a Docker container to execute a user-defined function.

-
- - image - -

Required. The value is the full Docker image name, including the image tag, specified the same way you specify an image when starting a container in Docker. The configuration can contain multiple container objects that reference the same image name; in Docker, these are represented by identical containers.

-

For example, you might have two runtime elements, with - different id elements, plc_python_128 and - plc_python_256, both referencing the Docker image - pivotaldata/plcontainer_python:1.0.0. The first - runtime specifies a 128MB RAM limit and the second one - specifies a 256MB limit that is specified by the memory_mb - attribute of a setting element.

-
-
command
Required. The value is the command that is run inside the container to start the client process in the container. When creating a runtime element, the plcontainer utility adds a command element based on the language (the -l option).
command element for the Python 2 language: <command>/clientdir/pyclient.sh</command>
command element for the Python 3 language: <command>/clientdir/pyclient3.sh</command>
command element for the R language: <command>/clientdir/rclient.sh</command>
You should modify the value only if you build a custom container and want to implement some additional initialization logic before the container starts. This element cannot be set with the plcontainer utility. You can update the configuration file with the plcontainer runtime-edit command.

shared_directory
Optional. This element specifies a shared Docker volume for a container, with access information. Multiple shared_directory elements are allowed. Each shared_directory element specifies a single shared volume. XML attributes for the shared_directory element:
    -
  • host - a directory location on the host system.
  • -
  • container - a directory location inside the container.
  • -
  • access - access level to the host directory, which can be - either ro (read-only) or rw (read-write). -
  • -
- When creating a runtime element, the - plcontainer utility adds a shared_directory - element.<shared_directory access="ro" container="/clientdir" host="/usr/local/greenplum-db/bin/plcontainer_clients"/> - For each runtime element, the container - attribute of the shared_directory elements must be unique. For - example, a runtime element cannot have two - shared_directory elements with attribute - container="/clientdir". - Allowing read-write access to a host directory requires - special consideration.
    -
  • When specifying read-write access to a host directory, ensure that the specified host directory has the correct permissions.
  • -
  • When running PL/Container user-defined functions, multiple concurrent - Docker containers that are running on a host could change data in the host - directory. Ensure that the functions support multiple concurrent access to - the data in the host directory.
  • -
-
- - settings - Optional. This element specifies Docker container configuration information. - Each setting element contains one attribute. The element - attribute specifies logging, memory, or networking information. For example, - this element enables - logging.<setting use_container_logging="yes"/> - These are the valid attributes. - - cpu_share - Optional. Specify the CPU usage for each PL/Container container in the - runtime. The value of the element is a positive integer. The default value - is 1024. The value is a relative weighting of CPU usage compared to other - containers. - For example, a container with a cpu_share of 2048 is - allocated double the CPU slice time compared with container with the - default value of 1024. - - - memory_mb="size" - Optional. The value specifies the amount of memory, in MB, that each - container is allowed to use. Each container starts with this amount of RAM - and twice the amount of swap space. The container memory consumption is - limited by the host system cgroups configuration, which - means in case of memory overcommit, the container is terminated by the - system. - - - resource_group_id="rg_groupid" - Optional. The value specifies the groupid of the - resource group to assign to the PL/Container runtime. The resource group - limits the total CPU and memory resource usage for all running containers - that share this runtime configuration. You must specify the - groupid of the resource group. If you do not assign a - resource group to a PL/Container runtime configuration, its container - instances are limited only by system resources. For information about - managing PL/Container resources, see About PL/Container Resource Management. - - - roles="list_of_roles" - Optional. The value is a Greenplum Database role name or a - comma-separated list of roles. PL/Container runs a container that uses the - PL/Container runtime configuration only for the listed roles. If the - attribute is not specified, any Greenplum Database role can run an - instance of this container runtime configuration. For example, you create - a UDF that specifies the plcontainer language and - identifies a # container: runtime configuration that has - the roles attribute set. When a role (user) runs the UDF, - PL/Container checks the list of roles and runs the container only if the - role is on the list. - - - use_container_logging="{yes | no}" - Optional. Enables or disables Docker logging for the container. The - attribute value yes enables logging. The attribute value - no disables logging (the default). - The Greenplum Database server configuration parameter log_min_messages controls the - PL/Container log level. The default log level is warning. - For information about PL/Container log information, see Notes. - -

By default, the PL/Container log information is sent to a system - service. On Red Hat 7 or CentOS 7 systems, the log information is sent - to the journald service. On Red Hat 6 or CentOS 6 - systems, the log is sent to the syslogd service.

-
-
-
-
-
-
-
-
- -
- - Updating the PL/Container Configuration - -

You can add a runtime element to the PL/Container configuration file - with the plcontainer runtime-add command. The command options specify - information such as the runtime ID, Docker image, and language. You can use the - plcontainer runtime-replace command to update an existing - runtime element. The utility updates the configuration file on the - master and all segment instances.

-

The PL/Container configuration file can contain multiple runtime elements that reference the same Docker image specified by the XML element image. In the example configuration file, the runtime elements contain id elements named plc_python_128 and plc_python_256, both referencing the Docker image pivotaldata/plcontainer_python:1.0.0. The first runtime element is defined with a 128MB RAM limit and the second one with a 256MB RAM limit.

<configuration>
    <runtime>
        <id>plc_python_128</id>
        <image>pivotaldata/plcontainer_python:1.0.0</image>
        <command>./client</command>
        <shared_directory access="ro" container="/clientdir" host="/usr/local/gpdb/bin/plcontainer_clients"/>
        <setting memory_mb="128"/>
    </runtime>
    <runtime>
        <id>plc_python_256</id>
        <image>pivotaldata/plcontainer_python:1.0.0</image>
        <command>./client</command>
        <shared_directory access="ro" container="/clientdir" host="/usr/local/gpdb/bin/plcontainer_clients"/>
        <setting memory_mb="256"/>
        <setting resource_group_id="16391"/>
    </runtime>
</configuration>
+ Notes +

Docker Notes

    -
  • PL/Container does not support the Greenplum Database domain object.
  • -
  • PL/Container maintains the configuration file - plcontainer_configuration.xml in the data directory of all Greenplum - Database segment instances: master, standby master, primary, and mirror. This query - lists the Greenplum Database system data - directories:SELECT hostname, datadir FROM gp_segment_configuration;

    A - sample PL/Container configuration file is in - $GPHOME/share/postgresql/plcontainer.

  • -
  • When Greenplum Database executes a PL/Container UDF, Query Executor (QE) processes start Docker containers and reuse them as needed. After a certain amount of idle time, a QE process quits and destroys its Docker containers. You can control the amount of idle time with the Greenplum Database server configuration parameter gp_vmem_idle_resource_timeout. Controlling the idle time might help with Docker container reuse and avoid the overhead of creating and starting a Docker container. Changing the gp_vmem_idle_resource_timeout value might affect performance due to resource issues. The parameter also controls the freeing of Greenplum Database resources other than Docker containers.
  • If a PL/Container Docker container exceeds the maximum allowed memory, it is - terminated and an out of memory warning is displayed. On Red Hat 6 or CentOS 6 systems - that are configured with Docker version 1.7.1, the out of memory warning is also - displayed if the PL/Container Docker container main program (PID 1) is terminated.
  • -
  • In some cases, when PL/Container is running in a high concurrency environment, the - Docker daemon hangs with log entries that indicate a memory shortage. This can happen - even when the system seems to have adequate free memory.

    The issue seems to be - triggered by a combination of two factors, the aggressive virtual memory requirement - of the Go language (golang) runtime that is used by PL/Container, and - the Greenplum Database Linux server kernel parameter setting for - overcommit_memory. The parameter is set to 2 which does not allow - memory overcommit.

    A workaround that might help is to increase the amount of - swap space and increase the Linux server kernel parameter - overcommit_ratio. If the issue still occurs after the changes, - there might be memory shortage. You should check free memory on the system and add - more RAM if needed. You can also decrease the cluster load.

  • + terminated and an out of memory warning is displayed.
  • PL/Container does not limit the Docker base device size (the size of the Docker container). In some cases, the Docker daemon controls the base device size. For example, if the Docker storage driver is devicemapper, the Docker daemon sets a default base device size. For information about Docker storage drivers, see the Docker documentation topic Daemon storage-driver.

    When setting the Docker base device size, the size must be set on all Greenplum Database hosts.

  • -
  • When PL/Container logging is enabled, you can set the log level with the Greenplum - Database server configuration parameter log_min_messages. The default log level is - warning. The parameter controls the PL/Container log level and also - controls the Greenplum Database log level.
      -
    • PL/Container logging is enabled or disabled for each runtime ID with the - setting attribute use_container_logging. The - default is no logging.
    • -
    • The PL/Container log information is the information from the UDF that is run in the Docker container. By default, the PL/Container log information is sent to a system service. On Red Hat 7 or CentOS 7 systems, the log information is sent to the journald service. On Red Hat 6 or CentOS 6 systems, the log information is sent to the syslogd service. The PL/Container log information is sent to the log file of the host where the Docker container runs.
    • -
    • The Greenplum Database log information is sent to the log file on the Greenplum Database master.
    • -

    When testing or troubleshooting a PL/Container UDF, you can change the Greenplum - Database log level with the SET command. You can set the parameter in - the session before you run your PL/Container UDF. This example sets the log level to - debug1.

    SET log_min_messages='debug1';
    The parameter log_min_messages controls both the Greenplum Database and PL/Container logging; increasing the log level might affect Greenplum Database performance even if a PL/Container UDF is not running.
  • +
  • +

    Known issue:

    +

    Occasionally, when PL/Container is running in a high concurrency environment, the + Docker daemon hangs with log entries that indicate a memory shortage. This can happen + even when the system seems to have adequate free memory.

    +

    The issue seems to be triggered by the aggressive virtual memory requirement of the Go + language (golang) runtime that is used by PL/Container, and the Greenplum Database Linux + server kernel parameter setting for overcommit_memory. The parameter is set to 2 + which does not allow memory overcommit.

    +

    A workaround that might help is to increase the amount of swap space and increase the + Linux server kernel parameter overcommit_ratio. If the issue still occurs after the + changes, there might be memory shortage. You should check free memory on the system and + add more RAM if needed. You can also decrease the cluster load.

    +
-
- - Installing Docker - -

To use PL/Container, Docker must be installed on all Greenplum Database host systems. These instructions show how to set up the Docker service on CentOS 7. Installing on RHEL 7 is a similar process.

-

Before performing the Docker installation, ensure these requirements are met.

    -
  • The CentOS extras repository is accessible.
  • -
  • The user has sudo privileges or is root.
  • -

-

See also the Docker site installation instructions for CentOS https://docs.docker.com/engine/installation/linux/centos/. For a - list of Docker commands, see the Docker engine Run Reference https://docs.docker.com/engine/reference/run/.

-
- Installing Docker on CentOS 7 -

These steps install the Docker package and start the Docker service as a user with sudo - privileges.

-
    -
  1. Install dependencies required for - Dockersudo yum install -y yum-utils device-mapper-persistent-data lvm2
  2. -
  3. Add the Docker - reposudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
  4. -
  5. Update yum cachesudo yum makecache fast
  6. -
  7. Install Dockersudo yum -y install docker-ce
  8. -
  9. Start Docker daemon.sudo systemctl start docker
  10. -
  11. To give access to the Docker daemon and docker commands, assign the Greenplum Database - administrator (gpadmin) to the group - docker.sudo usermod -aG docker gpadmin
  12. -
  13. Exit the session and log in again to update the privileges.
  14. -
  15. Run a Docker command to test the Docker installation. This command lists the currently - running Docker containers. docker ps
  16. -
-

This command configures Docker to start when the host system starts: sudo systemctl enable docker.service

-

After you have installed Docker on all Greenplum Database hosts, restart the Greenplum - Database system to give Greenplum Database access to Docker. - gpstop -ra

-
-
- Installing Docker on CentOS 6 -

These steps install the Docker package and start the docker service as a user with sudo - privileges.

-
    -
  1. Install EPEL packagesudo yum -y install epel-release
  2. -
  3. Install Dockersudo yum -y install docker-io
  4. -
  5. Create a docker groupsudo groupadd docker
  6. -
  7. Start Dockersudo service docker start
  8. -
  9. To give access to the Docker daemon and docker commands, assign the Greenplum Database - administrator (gpadmin) to the group - docker.sudo usermod -aG docker gpadmin
  10. -
  11. Exit the session and log in again to update the privileges.
  12. -
  13. Run a Docker command to test the Docker installation. This command lists the currently - running Docker containers. docker ps
  14. -
-

This command configures Docker to start when the host system - starts.sudo chkconfig docker on

-

After you have installed Docker on all Greenplum Database hosts, restart the Greenplum - Database system to give Greenplum Database access to Docker. - gpstop -ra

-
- -
+ + + - References - + Docker References +

Docker home page https://www.docker.com/

Docker command line interface

Dockerfile reference https://docs.docker.com/engine/reference/builder/

+

For CentOS, see Docker site installation instructions for CentOS.

+

For a list of Docker commands, see the Docker + engine Run Reference.

Installing Docker on Linux systems https://docs.docker.com/engine/installation/linux/centos/


What's New in Python

Porting from Python 2 to 3: Porting Python 2 Code to Python 3

- +
+ diff --git a/gpdb-doc/dita/analytics/pl_container_using.xml b/gpdb-doc/dita/analytics/pl_container_using.xml new file mode 100644 index 0000000000000000000000000000000000000000..9998ed63e7d5d78403d7c8db4bc8ed288f5c31ad --- /dev/null +++ b/gpdb-doc/dita/analytics/pl_container_using.xml @@ -0,0 +1,472 @@ + + + + Using PL/Container + +

This topic covers further details on:

+

+

    +
  • PL/Container Resource Management
  • +
  • PL/Container Functions
  • +
+

+ + + PL/Container Resource Management + +
+

The Docker containers and the Greenplum Database servers share CPU and memory + resources on the same hosts. In the default case, Greenplum Database is unaware + of the resources consumed by running PL/Container instances. You can use + Greenplum Database resource groups to control overall CPU and memory resource + usage for running PL/Container instances.

+

PL/Container manages resource usage at two levels - the container level and the + runtime level. You can control container-level CPU and memory resources with the + memory_mb and cpu_share settings that you + configure for the PL/Container runtime. memory_mb governs the + memory resources available to each container instance. The + cpu_share setting identifies the relative weighting of a + container's CPU usage compared to other containers. See for further + details.

+

You cannot, by default, restrict the number of executing PL/Container container + instances, nor can you restrict the total amount of memory or CPU resources that + they consume.

+
+
+ Using Resource Groups to Manage PL/Container Resources +

With PL/Container 1.2.0 and later, you can use Greenplum Database resource groups + to manage and limit the total CPU and memory resources of containers in + PL/Container runtimes. For more information about enabling, configuring, and + using Greenplum Database resource groups, refer to Using Resource Groups in the Greenplum Database Administrator + Guide.

+ If you do not explicitly configure resource groups for a PL/Container runtime, + its container instances are limited only by system resources. The containers may + consume resources at the expense of the Greenplum Database server. +

Resource groups for external components such as PL/Container use Linux control + groups (cgroups) to manage component-level use of memory and CPU resources. When + you manage PL/Container resources with resource groups, you configure both a + memory limit and a CPU limit that Greenplum Database applies to all container + instances that share the same PL/Container runtime configuration.

+

When you create a resource group to manage the resources of a PL/Container + runtime, you must specify MEMORY_AUDITOR=cgroup and + CONCURRENCY=0 in addition to the required CPU and memory + limits. For example, the following command creates a resource group named + plpy_run1_rg for a PL/Container runtime: + CREATE RESOURCE GROUP plpy_run1_rg WITH (MEMORY_AUDITOR=cgroup, CONCURRENCY=0, + CPU_RATE_LIMIT=10, MEMORY_LIMIT=10);

+

PL/Container does not use the MEMORY_SHARED_QUOTA and + MEMORY_SPILL_RATIO resource group memory limits. Refer to + the CREATE RESOURCE GROUP + reference page for detailed information about this SQL command.

+

You can create one or more resource groups to manage your running PL/Container + instances. After you create a resource group for PL/Container, you assign the + resource group to one or more PL/Container runtimes. You make this assignment + using the groupid of the resource group. You can determine the + groupid for a given resource group name from the + gp_resgroup_config + gp_toolkit view. For example, the following query displays the + groupid of a resource group named + plpy_run1_rg:SELECT groupname, groupid FROM gp_toolkit.gp_resgroup_config + WHERE groupname='plpy_run1_rg'; + + groupname | groupid + --------------+---------- + plpy_run1_rg | 16391 + (1 row)

+

You assign a resource group to a PL/Container runtime configuration by specifying + the -s resource_group_id=rg_groupid option + to the plcontainer runtime-add (new runtime) or + plcontainer runtime-replace (existing runtime) commands. + For example, to assign the plpy_run1_rg resource group to a new + PL/Container runtime named python_run1: + plcontainer runtime-add -r python_run1 -i pivotaldata/plcontainer_python_shared:devel -l python -s resource_group_id=16391

+

You can also assign a resource group to a PL/Container runtime using the + plcontainer runtime-edit command. For information about the + plcontainer command, see reference page.

+

After you assign a resource group to a PL/Container runtime, all container + instances that share the same runtime configuration are subject to the memory + limit and the CPU limit that you configured for the group. If you decrease the + memory limit of a PL/Container resource group, queries executing in running + containers in the group may fail with an out of memory error. If you drop a + PL/Container resource group while there are running container instances, + Greenplum Database kills the running containers.
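For example, a minimal sketch of decreasing the limits of the plpy_run1_rg group created above; the new value is illustrative only, and the appropriate limits depend on your deployment:
-- Reduce the memory limit of the plpy_run1_rg resource group.
-- Queries running in containers assigned to this group may fail with an
-- out of memory error if they exceed the reduced limit.
ALTER RESOURCE GROUP plpy_run1_rg SET MEMORY_LIMIT 5;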

+
+
+ Configuring Resource Groups for PL/Container +

To use Greenplum Database resource groups to manage PL/Container resources, you + must explicitly configure both resource groups and PL/Container.

+ +

Perform the following procedure to configure PL/Container to use Greenplum Database + resource groups for CPU and memory resource management:

+
    +
  1. If you have not already configured and enabled resource groups in your Greenplum + Database deployment, configure cgroups and enable Greenplum Database resource + groups as described in Using Resource Groups in the + Greenplum Database Administrator Guide. If you have + previously configured and enabled resource groups in your deployment, ensure + that the Greenplum Database resource group gpdb.conf + cgroups configuration file includes a memory { } block as + described in the previous link.
  2. +
  3. Analyze the resource usage of your Greenplum Database deployment. Determine the + percentage of resource group CPU and memory resources that you want to allocate + to PL/Container Docker containers.
  4. +
  5. Determine how you want to distribute the total PL/Container CPU and memory + resources that you identified in the step above among the PL/Container runtimes. + Identify:
      +
    • The number of PL/Container resource group(s) that you require.
    • +
    • The percentage of memory and CPU resources to allocate to each resource + group.
    • +
    • The resource-group-to-PL/Container-runtime assignment(s).
    • +
  6. +
  7. Create the PL/Container resource groups that you identified in the step above. + For example, suppose that you choose to allocate 25% of both memory and CPU + Greenplum Database resources to PL/Container. If you further split these + resources among 2 resource groups 60/40, the following SQL commands create the + resource + groups:CREATE RESOURCE GROUP plr_run1_rg WITH (MEMORY_AUDITOR=cgroup, CONCURRENCY=0, + CPU_RATE_LIMIT=15, MEMORY_LIMIT=15); + CREATE RESOURCE GROUP plpy_run1_rg WITH (MEMORY_AUDITOR=cgroup, CONCURRENCY=0, + CPU_RATE_LIMIT=10, MEMORY_LIMIT=10);
  8. +
  9. Find and note the groupid associated with each resource group that you created. For example:
     SELECT groupname, groupid FROM gp_toolkit.gp_resgroup_config
       WHERE groupname IN ('plpy_run1_rg', 'plr_run1_rg');

       groupname   | groupid
     --------------+----------
      plpy_run1_rg | 16391
      plr_run1_rg  | 16393
     (2 rows)
  10. +
  11. Assign each resource group that you created to the desired PL/Container runtime + configuration. If you have not yet created the runtime configuration, use the + plcontainer runtime-add command. If the runtime already + exists, use the plcontainer runtime-replace or + plcontainer runtime-edit command to add the resource group + assignment to the runtime configuration. For example: + plcontainer runtime-add -r python_run1 -i pivotaldata/plcontainer_python_shared:devel -l python -s resource_group_id=16391 + plcontainer runtime-replace -r r_run1 -i pivotaldata/plcontainer_r_shared:devel -l r -s resource_group_id=16393

    For + information about the plcontainer command, see reference page.

  12. +
+ +
+ Notes +

PL/Container logging

+

When PL/Container logging is enabled, you can set the log level with the + Greenplum Database server configuration parameter log_min_messages. The default log level is + warning. The parameter controls the PL/Container log level + and also controls the Greenplum Database log level.

+
    +
  • PL/Container logging is enabled or disabled for each runtime ID with the + setting attribute + use_container_logging. The default is no logging.
  • +
  • The PL/Container log information is the information from the UDF that is run + in the Docker container. By default, the PL/Container log information is + sent to a system service. On Red Hat 7 or CentOS 7 systems, the log + information is sent to the journald service.
  • +
  • The Greenplum Database log information is sent to log file on the Greenplum + Database master.
  • +
  • When testing or troubleshooting a PL/Container UDF, you can change the Greenplum Database log level with the SET command. You can set the parameter in the session before you run your PL/Container UDF. This example sets the log level to debug1. SET log_min_messages='debug1';
    The parameter log_min_messages controls both the Greenplum Database and PL/Container logging; increasing the log level might affect Greenplum Database performance even if a PL/Container UDF is not running.
  • +
+
+ +
+ + + PL/Container Functions + +

When you enable PL/Container in a database of a Greenplum Database system, the + language plcontainer is registered in that database. Specify + plcontainer as a language in a UDF definition to create and run + user-defined functions in the procedural languages supported by the PL/Container + Docker images.

+ +
+ Limitations +

Review the following limitations when creating and using PL/Container PL/Python + and PL/R functions:

+
    +
  • Greenplum Database domains are not supported.
  • +
  • Multi-dimensional arrays are not supported.
  • +
  • Python and R call stack information is not displayed when debugging a + UDF.
  • +
  • The plpy.execute() methods nrows() and + status() are not supported.
  • +
  • The PL/Python function plpy.SPIError() is not + supported.
  • +
  • Executing the SAVEPOINT command with plpy.execute() is not + supported.
  • +
  • The DO command is not supported. See .
  • OUT parameters are not supported.
  • The Python dict type cannot be returned from a PL/Python UDF.
  • +
  • When returning the Python dict type from a UDF, you can convert the dict + type to a Greenplum Database user-defined data type (UDT).
  • +
+
+ +
+ Using PL/Container functions +

A UDF definition that uses PL/Container must have these items.

+
    +
  • The first line of the UDF must be # container: + ID
  • +
  • The LANGUAGE attribute must be + plcontainer
  • +
+

The ID is the name that PL/Container uses to identify a Docker + image. When Greenplum Database executes a UDF on a host, the Docker image on the + host is used to start a Docker container that runs the UDF. In the XML + configuration file plcontainer_configuration.xml, there is a + runtime XML element that contains a corresponding + id XML element that specifies the Docker container startup + information. See for information about how PL/Container maps the + ID to a Docker image. See for example UDF definitions.
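For example, this minimal skeleton shows both required items. The runtime ID my_runtime is hypothetical and would have to exist in the PL/Container configuration file; the body assumes a Python runtime:
CREATE OR REPLACE FUNCTION plc_hello() RETURNS text AS $$
# container: my_runtime
# The function body runs inside the Docker container for the my_runtime runtime.
return 'hello from PL/Container'
$$ LANGUAGE plcontainer;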

+

The PL/Container configuration file is read only on the first invocation of a + PL/Container function in each Greenplum Database session that runs PL/Container + functions. You can force the configuration file to be re-read by performing a + SELECT command on the view + plcontainer_refresh_config during the session. For example, + this SELECT command forces the configuration file to be + read.

+ SELECT * FROM plcontainer_refresh_config; +

Running the command executes a PL/Container function that updates the + configuration on the master and segment instances and returns the status of the + refresh. gp_segment_id | plcontainer_refresh_local_config + ---------------+---------------------------------- + 1 | ok + 0 | ok + -1 | ok + (3 rows)

+

Also, you can show all the configurations in the session by performing a + SELECT command on the view + plcontainer_show_config. For example, this + SELECT command returns the PL/Container configurations.

+ SELECT * FROM plcontainer_show_config; +

Running the command executes a PL/Container function that displays configuration + information from the master and segment instances. This is an example of the + start and end of the view + output.INFO: plcontainer: Container 'plc_py_test' configuration + INFO: plcontainer: image = 'pivotaldata/plcontainer_python_shared:devel' + INFO: plcontainer: memory_mb = '1024' + INFO: plcontainer: use container network = 'no' + INFO: plcontainer: use container logging = 'no' + INFO: plcontainer: shared directory from host '/usr/local/greenplum-db/./bin/plcontainer_clients' to container '/clientdir' + INFO: plcontainer: access = readonly + + ... + + INFO: plcontainer: Container 'plc_r_example' configuration (seg0 slice3 192.168.180.45:40000 pid=3304) + INFO: plcontainer: image = 'pivotaldata/plcontainer_r_without_clients:0.2' (seg0 slice3 192.168.180.45:40000 pid=3304) + INFO: plcontainer: memory_mb = '1024' (seg0 slice3 192.168.180.45:40000 pid=3304) + INFO: plcontainer: use container network = 'no' (seg0 slice3 192.168.180.45:40000 pid=3304) + INFO: plcontainer: use container logging = 'yes' (seg0 slice3 192.168.180.45:40000 pid=3304) + INFO: plcontainer: shared directory from host '/usr/local/greenplum-db/bin/plcontainer_clients' to container '/clientdir' (seg0 slice3 192.168.180.45:40000 pid=3304) + INFO: plcontainer: access = readonly (seg0 slice3 192.168.180.45:40000 pid=3304) + gp_segment_id | plcontainer_show_local_config + ---------------+------------------------------- + 0 | ok + -1 | ok + 1 | ok

+

The PL/Container function plcontainer_containers_summary() + displays information about the currently running Docker + containers.SELECT * FROM plcontainer_containers_summary();

+

If a normal (non-superuser) Greenplum Database user runs the function, the + function displays information only for containers created by the user. If a + Greenplum Database superuser runs the function, information for all containers + created by Greenplum Database users is displayed. This is sample output when 2 + containers are running.

+ SEGMENT_ID | CONTAINER_ID | UP_TIME | OWNER | MEMORY_USAGE(KB) + ------------+------------------------------------------------------------------+--------------+---------+------------------ + 1 | 693a6cb691f1d2881ec0160a44dae2547a0d5b799875d4ec106c09c97da422ea | Up 8 seconds | gpadmin | 12940 + 1 | bc9a0c04019c266f6d8269ffe35769d118bfb96ec634549b2b1bd2401ea20158 | Up 2 minutes | gpadmin | 13628 + (2 rows) +

When Greenplum Database executes a PL/Container UDF, Query Executer (QE) + processes start Docker containers and reuse them as needed. After a certain + amount of idle time, a QE process quits and destroys its Docker containers. You + can control the amount of idle time with the Greenplum Database server + configuration parameter gp_vmem_idle_resource_timeout. Controlling + the idle time might help with Docker container reuse and avoid the overhead of + creating and starting a Docker container.

+ Changing gp_vmem_idle_resource_timeout value, + might affect performance due to resource issues. The parameter also controls the + freeing of Greenplum Database resources other than Docker containers. +
+ +
+ Examples +

The values in the # container lines of the examples, plc_python_shared and plc_r_shared, are the id XML elements defined in the plcontainer_configuration.xml file. The id element is mapped to the image element that specifies the Docker image to be started. If you configured PL/Container with a different ID, change the value of the # container line. For information about configuring PL/Container and viewing the configuration settings, see .

+

This is an example of a PL/Python function that runs using the plc_python_shared container that contains Python 2:
CREATE OR REPLACE FUNCTION pylog100() RETURNS double precision AS $$
# container: plc_python_shared
import math
return math.log10(100)
$$ LANGUAGE plcontainer;

+

This is an example of a similar function using the plc_r_shared + container:CREATE OR REPLACE FUNCTION rlog100() RETURNS text AS $$ +# container: plc_r_shared +return(log10(100)) +$$ LANGUAGE plcontainer;
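Assuming the runtimes above are configured and their Docker images are installed on every Greenplum Database host, you call these functions like any other UDF:
SELECT pylog100();
SELECT rlog100();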

+

If the # container line in a UDF specifies an ID that is not in + the PL/Container configuration file, Greenplum Database returns an error when + you try to execute the UDF.

+
+ + +
+ About PL/Container Running PL/Python + +

In the Python language container, the module plpy is implemented. The module contains these methods (a short usage sketch appears below the list):

+
    +
  • plpy.execute(stmt) - Executes the query string + stmt and returns query result in a list of dictionary + objects. To be able to access the result fields ensure your query returns + named fields.
  • +
  • plpy.prepare(stmt[, argtypes]) - Prepares the execution + plan for a query. It is called with a query string and a list of parameter + types, if you have parameter references in the query.
  • +
  • plpy.execute(plan[, args]) - Executes a prepared plan, optionally with a list of argument values.
  • +
  • plpy.debug(msg) - Sends a DEBUG2 message to the Greenplum + Database log.
  • +
  • plpy.log(msg) - Sends a LOG message to the Greenplum + Database log.
  • +
  • plpy.info(msg) - Sends an INFO message to the Greenplum + Database log.
  • +
  • plpy.notice(msg) - Sends a NOTICE message to the Greenplum + Database log.
  • +
  • plpy.warning(msg) - Sends a WARNING message to the + Greenplum Database log.
  • +
  • plpy.error(msg) - Sends an ERROR message to the Greenplum Database log. An ERROR message raised in Greenplum Database causes the query execution process to stop and the transaction to roll back.
  • +
  • plpy.fatal(msg) - Sends a FATAL message to the Greenplum + Database log. A FATAL message causes Greenplum Database session to be closed + and transaction to be rolled back.
  • +
  • plpy.subtransaction() - Manages + plpy.execute calls in an explicit subtransaction. See + Explicit Subtransactions in the + PostgreSQL documentation for additional information about + plpy.subtransaction().
  • +
+

If an error of level ERROR or FATAL is raised + in a nested Python function call, the message includes the list of enclosing + functions.
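The following sketch shows plpy.execute() used inside a PL/Container UDF. It assumes the plc_python_shared runtime used elsewhere in this topic and a hypothetical table named sales with a numeric column amount:
CREATE OR REPLACE FUNCTION py_total_sales() RETURNS double precision AS $$
# container: plc_python_shared
# plpy.execute() returns a list of rows; access fields by the names the query returns.
rv = plpy.execute("SELECT sum(amount) AS total FROM sales")
plpy.info("total sales computed")   # message is sent to the Greenplum Database log
return rv[0]["total"]
$$ LANGUAGE plcontainer;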

+

The Python language container supports these string quoting functions that are + useful when constructing ad-hoc queries.

    +
  • plpy.quote_literal(string) - Returns the string quoted + to be used as a string literal in an SQL statement string. Embedded + single-quotes and backslashes are properly doubled. + quote_literal() returns null on null input (empty + input). If the argument might be null, quote_nullable() + might be more appropriate.
  • +
  • plpy.quote_nullable(string) - Returns the string quoted + to be used as a string literal in an SQL statement string. If the + argument is null, returns NULL. Embedded single-quotes + and backslashes are properly doubled.
  • +
  • + plpy.quote_ident(string) - Returns the string quoted to + be used as an identifier in an SQL statement string. Quotes are added + only if necessary (for example, if the string contains non-identifier + characters or would be case-folded). Embedded quotes are properly + doubled.
  • +

+

When returning text from a PL/Python function, PL/Container converts a Python + unicode object to text in the database encoding. If the conversion cannot be + performed, an error is returned.
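As an illustration of the quoting helpers listed above, this sketch builds an ad-hoc query from a table name passed as an argument; the function and table names are hypothetical:
CREATE OR REPLACE FUNCTION py_count_rows(tabname text) RETURNS bigint AS $$
# container: plc_python_shared
# Quote the identifier so the table name can be interpolated into SQL safely.
query = "SELECT count(*) AS n FROM " + plpy.quote_ident(tabname)
rv = plpy.execute(query)
return rv[0]["n"]
$$ LANGUAGE plcontainer;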

+

PL/Container does not support this Greenplum Database PL/Python feature:

    +
  • Multi-dimensional arrays.
  • +

+

Also, the Python module has two global dictionary objects that retain data between function calls. They are named GD and SD. GD is used to share data among all the functions running within the same container, while SD is used to share data between multiple calls of each separate function. The data is accessible only within the same session, while the container process lives on a segment or the master. Note that Greenplum Database terminates segment processes for idle sessions, which means the related containers are shut down and the data in GD and SD is lost.
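A small sketch of GD, assuming the plc_python_shared runtime; the counter persists across calls only while the session's container is running:
CREATE OR REPLACE FUNCTION py_call_count() RETURNS integer AS $$
# container: plc_python_shared
# GD is shared by all functions that run in this session's container.
if 'calls' not in GD:
    GD['calls'] = 0
GD['calls'] += 1
return GD['calls']
$$ LANGUAGE plcontainer;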

+

For information about PL/Python, see .

+

For information about the plpy methods, see https://www.postgresql.org/docs/9.4/plpython-database.html.

+ +
+
+ About PL/Container Running PL/R + +

In the R language container, the module pg.spi is implemented. The module contains these methods (a short usage sketch follows the list):

+
    +
  • pg.spi.exec(stmt) - Executes the query string + stmt and returns query result in R + data.frame. To be able to access the result fields make + sure your query returns named fields.
  • +
  • pg.spi.prepare(stmt[, argtypes]) - Prepares the execution + plan for a query. It is called with a query string and a list of parameter + types if you have parameter references in the query.
  • +
  • pg.spi.execp(plan[, args]) - Executes a prepared plan, optionally with a list of argument values.
  • +
  • pg.spi.debug(msg) - Sends a DEBUG2 message to the Greenplum + Database log.
  • +
  • pg.spi.log(msg) - Sends a LOG message to the Greenplum + Database log.
  • +
  • pg.spi.info(msg) - Sends an INFO message to the Greenplum + Database log.
  • +
  • pg.spi.notice(msg) - Sends a NOTICE message to the + Greenplum Database log.
  • +
  • pg.spi.warning(msg) - Sends a WARNING message to the + Greenplum Database log.
  • +
  • pg.spi.error(msg) - Sends an ERROR message to the Greenplum Database log. An ERROR message raised in Greenplum Database causes the query execution process to stop and the transaction to roll back.
  • +
  • pg.spi.fatal(msg) - Sends a FATAL message to the Greenplum + Database log. A FATAL message causes Greenplum Database session to be closed + and transaction to be rolled back.
  • +
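A PL/R sketch using pg.spi.exec(); it assumes the plc_r_shared runtime and the same hypothetical sales table with a numeric column amount:
CREATE OR REPLACE FUNCTION r_total_sales() RETURNS double precision AS $$
# container: plc_r_shared
# pg.spi.exec returns a data.frame; read the named column of the first row.
rv <- pg.spi.exec("SELECT sum(amount) AS total FROM sales")
pg.spi.info("total sales computed")
return(rv$total[1])
$$ LANGUAGE plcontainer;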
+

PL/Container does not support this PL/R feature:

    +
  • Multi-dimensional arrays.
  • +

+

For information about PL/R, see .

+

For information about the pg.spi methods, see http://www.joeconway.com/plr/doc/plr-spi-rsupport-funcs-normal.html

+ +
+ +
+
\ No newline at end of file diff --git a/gpdb-doc/dita/analytics/plcontainer-configuration-xml.xml b/gpdb-doc/dita/analytics/plcontainer-configuration-xml.xml new file mode 100644 index 0000000000000000000000000000000000000000..196218b9f33440df92d74845c8c399b440b4a9ba --- /dev/null +++ b/gpdb-doc/dita/analytics/plcontainer-configuration-xml.xml @@ -0,0 +1,316 @@ + + + + PL/Container Configuration File + +

The Greenplum Database utility plcontainer manages the + PL/Container configuration files in a Greenplum Database system. The utility + ensures that the configuration files are consistent across the Greenplum + Database master and segment instances.

+ Modifying the configuration files on the segment instances + without using the utility might create different, incompatible configurations on + different Greenplum Database segments that could cause unexpected behavior. +
PL/Container Configuration File +

PL/Container maintains a configuration file + plcontainer_configuration.xml in the data directory of + all Greenplum Database segments. This query lists the Greenplum Database + system data directories: +

SELECT hostname, datadir FROM gp_segment_configuration; + A sample PL/Container configuration file is in + $GPHOME/share/postgresql/plcontainer.

In an XML file, + names, such as element and attribute names, and values are case + sensitive.

In this XML file, the root element + configuration contains one or more runtime + elements. You specify the id of the runtime + element in the # container: line of a PL/Container function + definition.

This is an example file. Note that all XML elements, names, and attributes are case sensitive.
<?xml version="1.0" ?>
<configuration>
    <runtime>
        <id>plc_python_example1</id>
        <image>pivotaldata/plcontainer_python_with_clients:0.1</image>
        <command>./pyclient</command>
    </runtime>
    <runtime>
        <id>plc_python_example2</id>
        <image>pivotaldata/plcontainer_python_without_clients:0.1</image>
        <command>/clientdir/pyclient.sh</command>
        <shared_directory access="ro" container="/clientdir" host="/usr/local/greenplum-db/bin/plcontainer_clients"/>
        <setting memory_mb="512"/>
        <setting use_container_logging="yes"/>
        <setting cpu_share="1024"/>
        <setting resource_group_id="16391"/>
    </runtime>
    <runtime>
        <id>plc_r_example</id>
        <image>pivotaldata/plcontainer_r_without_clients:0.2</image>
        <command>/clientdir/rclient.sh</command>
        <shared_directory access="ro" container="/clientdir" host="/usr/local/greenplum-db/bin/plcontainer_clients"/>
        <setting use_container_logging="yes"/>
        <setting roles="gpadmin,user1"/>
    </runtime>
</configuration>

These + are the XML elements and attributes in a PL/Container configuration + file.

+ + configuration + Root element for the XML file. + + + runtime + One element for each specific container available in the system. + These are child elements of the configuration + element. + + + + id + Required. The value is used to reference a Docker + container from a PL/Container user-defined function. The + id value must be unique in the + configuration. The id must start with a + character or digit (a-z, A-Z, or 0-9) and can contain + characters, digits, or the characters _ + (underscore), . (period), or + - (dash). Maximum length is 63 + Bytes.

The id specifies which + Docker image to use when PL/Container creates a + Docker container to execute a user-defined + function.

+
+ + image + +

Required. The value is the full Docker image name, including the image tag, specified the same way you specify an image when starting a container in Docker. The configuration can contain multiple container objects that reference the same image name; in Docker, these are represented by identical containers.

+

For example, you might have two + runtime elements, with different + id elements, + plc_python_128 and + plc_python_256, both referencing + the Docker image + pivotaldata/plcontainer_python:1.0.0. + The first runtime specifies a 128MB + RAM limit and the second one specifies a 256MB limit + that is specified by the memory_mb + attribute of a setting element.

+
+
command
Required. The value is the command that is run inside the container to start the client process in the container. When creating a runtime element, the plcontainer utility adds a command element based on the language (the -l option).
command element for the Python 2 language: <command>/clientdir/pyclient.sh</command>
command element for the Python 3 language: <command>/clientdir/pyclient3.sh</command>
command element for the R language: <command>/clientdir/rclient.sh</command>
You should modify the value only if you build a custom container and want to implement some additional initialization logic before the container starts. This element cannot be set with the plcontainer utility. You can update the configuration file with the plcontainer runtime-edit command.

shared_directory
Optional. This element specifies a shared Docker volume for a container, with access information. Multiple shared_directory elements are allowed. Each shared_directory element specifies a single shared volume. XML attributes for the shared_directory element:
    +
  • host - a directory location on + the host system.
  • +
  • container - a directory + location inside of container.
  • +
  • access - access level to the + host directory, which can be either + ro (read-only) or + rw (read-write).
  • +
+ When creating a runtime element, the + plcontainer utility adds a + shared_directory + element.<shared_directory access="ro" container="/clientdir" host="/usr/local/greenplum-db/bin/plcontainer_clients"/> + For each runtime element, the + container attribute of the + shared_directory elements must be + unique. For example, a runtime element + cannot have two shared_directory + elements with attribute + container="/clientdir". Allowing read-write access to a host + directory requires special consideration.
    +
  • When specifying read-write access to host + directory, ensure that the specified host + directory has the correct permissions.
  • +
  • When running PL/Container user-defined + functions, multiple concurrent Docker containers + that are running on a host could change data in + the host directory. Ensure that the functions + support multiple concurrent access to the data in + the host directory.
  • +
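For illustration only (the host path is hypothetical), a read-write shared volume declared with the same attribute layout might look like this:

    <shared_directory access="rw" container="/opt/share" host="/home/gpadmin/share"/>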
+
setting
  Optional. This element specifies Docker container configuration information. Each setting element contains one attribute; the attribute specifies CPU, memory, resource group, role, or logging information. For example, this element enables logging:
      <setting use_container_logging="yes"/>

  These are the valid attributes.

  cpu_share
    Optional. Specifies the CPU usage for each PL/Container container in the runtime. The value is a positive integer; the default is 1024. The value is a relative weighting of CPU usage compared to other containers. For example, a container with a cpu_share of 2048 is allocated double the CPU slice time of a container with the default value of 1024.

  memory_mb="size"
    Optional. The value specifies the amount of memory, in MB, that each container is allowed to use. Each container starts with this amount of RAM and twice that amount of swap space. Container memory consumption is limited by the host system cgroups configuration; if memory is overcommitted, the system terminates the container.

  resource_group_id="rg_groupid"
    Optional. The value specifies the groupid of the resource group to assign to the PL/Container runtime. The resource group limits the total CPU and memory resource usage for all running containers that share this runtime configuration. You must specify the groupid of the resource group. If you do not assign a resource group to a PL/Container runtime configuration, its container instances are limited only by system resources. For information about managing PL/Container resources, see About PL/Container Resource Management.

  roles="list_of_roles"
    Optional. The value is a Greenplum Database role name or a comma-separated list of role names. PL/Container runs a container that uses this runtime configuration only for the listed roles. If the attribute is not specified, any Greenplum Database role can run an instance of this container runtime configuration. For example, suppose you create a UDF that specifies the plcontainer language and identifies a # container: runtime configuration that has the roles attribute set. When a role (user) runs the UDF, PL/Container checks the list of roles and runs the container only if the role is on the list.

  use_container_logging="{yes | no}"
    Optional. Enables or disables Docker logging for the container. The value yes enables logging; the value no (the default) disables logging. The Greenplum Database server configuration parameter log_min_messages controls the PL/Container log level. The default log level is warning. For information about PL/Container log information, see Notes.

    By default, PL/Container log information is sent to a system service. On Red Hat 7 or CentOS 7 systems, the log information is sent to the journald service. On Red Hat 6 or CentOS 6 systems, the log is sent to the syslogd service.
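As a sketch of one way to raise the PL/Container log level cluster-wide (rather than per session), you could set log_min_messages with gpconfig and then reload the configuration; this assumes you want the change to apply to all sessions:

    gpconfig -c log_min_messages -v info
    gpstop -u    # reload configuration files without restarting the cluster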

+ Update the PL/Container Configuration +

You can add a runtime element to the PL/Container configuration file with the plcontainer runtime-add command. The command options specify information such as the runtime ID, Docker image, and language. You can use the plcontainer runtime-replace command to update an existing runtime element. The utility updates the configuration file on the master and all segment instances.
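For illustration, assuming the pivotaldata/plcontainer_python:1.0.0 image from the example below is already installed on all hosts, a runtime such as plc_python_256 could be added with a command like this (adjust the settings to your needs):

    plcontainer runtime-add -r plc_python_256 -i pivotaldata/plcontainer_python:1.0.0 -l python \
      -s memory_mb=256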

+

The PL/Container configuration file can contain multiple runtime elements that reference the same Docker image, which is specified by the XML element image. In the example configuration file, the runtime elements contain id elements named plc_python_128 and plc_python_256, both referencing the Docker image pivotaldata/plcontainer_python:1.0.0. The first runtime element is defined with a 128MB RAM limit and the second one with a 256MB RAM limit.

    <configuration>
      <runtime>
        <id>plc_python_128</id>
        <image>pivotaldata/plcontainer_python:1.0.0</image>
        <command>./client</command>
        <shared_directory access="ro" container="/clientdir" host="/usr/local/gpdb/bin/plcontainer_clients"/>
        <setting memory_mb="128"/>
      </runtime>
      <runtime>
        <id>plc_python_256</id>
        <image>pivotaldata/plcontainer_python:1.0.0</image>
        <command>./client</command>
        <shared_directory access="ro" container="/clientdir" host="/usr/local/gpdb/bin/plcontainer_clients"/>
        <setting memory_mb="256"/>
        <setting resource_group_id="16391"/>
      </runtime>
    </configuration>

Configuration changes that are made with the utility are applied to the XML files on all Greenplum Database segments. However, PL/Container configurations of currently running sessions use the configuration that existed during session start up. To update the PL/Container configuration in a running session, execute this command in the session.

    SELECT * FROM plcontainer_refresh_config;

Running the command executes a PL/Container function that updates the session configuration on the master and segment instances.

+
+ + + +
diff --git a/gpdb-doc/dita/analytics/plcontainer_ref.xml b/gpdb-doc/dita/analytics/plcontainer_ref.xml new file mode 100644 index 0000000000000000000000000000000000000000..1dd82aa42c305ca35d169891eb6a046e97e66cf6 --- /dev/null +++ b/gpdb-doc/dita/analytics/plcontainer_ref.xml @@ -0,0 +1,316 @@ + + + + plcontainer + +

The plcontainer utility installs Docker images and manages the PL/Container configuration. The utility consists of two sets of commands.

• image-* commands manage Docker images on the Greenplum Database system hosts.
• runtime-* commands manage the PL/Container configuration file on the Greenplum Database instances. You can add Docker image information to the PL/Container configuration file, including the image name, location, and shared folder information. You can also edit the configuration file.

To configure PL/Container to use a Docker image, you install the Docker image on all the Greenplum Database hosts and then add configuration information to the PL/Container configuration.
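As a sketch of that two-step flow (the tar file name is hypothetical; the runtime ID and image name mirror examples used later in this topic):

    # 1. Install the Docker image on every Greenplum Database host
    plcontainer image-add -f /home/gpadmin/plcontainer_python_shared_image.tar.gz

    # 2. Add a runtime configuration that references the installed image
    plcontainer runtime-add -r plc_python_shared -i pivotaldata/plcontainer_python_shared:1.0.0 -l python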

+

PL/Container configuration values, such as image names, runtime IDs, and parameter values and names, are case sensitive.

+
plcontainer Syntax

    plcontainer [command] [-h | --help] [--verbose]

Where command is one of the following.

    image-add {{-f | --file} image_file} | {{-u | --URL} image_URL}
    image-delete {-i | --image} image_name
    image-list

    runtime-add {-r | --runtime} runtime_id
        {-i | --image} image_name {-l | --language} {python | python3 | r}
        [{-v | --volume} shared_volume [{-v | --volume} shared_volume ...]]
        [{-s | --setting} param=value [{-s | --setting} param=value ...]]
    runtime-replace {-r | --runtime} runtime_id
        {-i | --image} image_name {-l | --language} {python | python3 | r}
        [{-v | --volume} shared_volume [{-v | --volume} shared_volume ...]]
        [{-s | --setting} param=value [{-s | --setting} param=value ...]]
    runtime-show {-r | --runtime} runtime_id
    runtime-delete {-r | --runtime} runtime_id
    runtime-edit [{-e | --editor} editor]
    runtime-backup {-f | --file} config_file
    runtime-restore {-f | --file} config_file
    runtime-verify
+
+ plcontainer Commands and Options +
image-add location
  Install a Docker image on the Greenplum Database hosts. Specify either the location of the Docker image file on the host or the URL of the Docker image. These are the supported location options:

  • {-f | --file} image_file  Specify the tar archive file on the host that contains the Docker image, for example an image file in the gpadmin home directory: /home/gpadmin/test_image.tar.gz
  • {-u | --URL} image_URL  Specify the URL of the Docker repository and image, for example a URL that points to a local Docker repository: 192.168.0.1:5000/images/mytest_plc_r:devel

  After installing the Docker image, use the runtime-add command to configure PL/Container to use the Docker image.
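For example, assuming the local Docker repository shown above is reachable from every host, the URL form of the command would be:

    plcontainer image-add -u 192.168.0.1:5000/images/mytest_plc_r:devel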
image-delete {-i | --image} image_name
  Remove an installed Docker image from all Greenplum Database hosts. Specify the full Docker image name, including the tag, for example pivotaldata/plcontainer_python_shared:1.0.0.

image-list
  List the Docker images installed on the host. The command lists only the images on the local host, not remote hosts. The command lists all installed Docker images, including images installed with Docker commands.

runtime-add options
  Add configuration information to the PL/Container configuration file on all Greenplum Database hosts. If the specified runtime_id exists, the utility returns an error and the configuration information is not added.

  These are the supported options:

  {-i | --image} docker-image
    Required. Specify the full Docker image name, including the tag, that is installed on the Greenplum Database hosts, for example pivotaldata/plcontainer_python:1.0.0. The utility returns a warning if the specified Docker image is not installed.
    The plcontainer image-list command displays installed image information, including the name and tag (the Repository and Tag columns).

  {-l | --language} python | python3 | r
    Required. Specify the PL/Container language type. Supported values are python (PL/Python using Python 2), python3 (PL/Python using Python 3), and r (PL/R). When adding configuration information for a new runtime, the utility adds a startup command to the configuration based on the language you specify.

    Startup command for the Python 2 language: /clientdir/pyclient.sh
    Startup command for the Python 3 language: /clientdir/pyclient3.sh
    Startup command for the R language: /clientdir/rclient.sh

  {-r | --runtime} runtime_id
    Required. Add the runtime ID. When adding a runtime element in the PL/Container configuration file, this is the value of the id element. Maximum length is 63 bytes.
    You specify the name in the Greenplum Database UDF on the # container: line.

  {-s | --setting} param=value
    Optional. Specify a setting to add to the runtime configuration information. You can specify this option multiple times. The setting applies to the runtime configuration specified by the runtime_id. The parameter is the XML attribute of the setting element in the PL/Container configuration file. These are the valid parameters:

    • cpu_share - Set the CPU limit for each container in the runtime configuration. The default value is 1024. The value is a relative weighting of CPU usage compared to other containers.
    • memory_mb - Set the memory limit, in MB, for each container in the runtime configuration. The value is an integer; the default value is 1024.
    • resource_group_id - Assign the specified resource group to the runtime configuration. The resource group limits the total CPU and memory resource usage for all containers that share this runtime configuration. You must specify the groupid of the resource group (see the example query after this list). For information about managing PL/Container resources, see About PL/Container Resource Management.
    • roles - Specify the Greenplum Database roles that are allowed to run a container for the runtime configuration. You can specify a single role name or a comma-separated list of role names. The default is no restriction.
    • use_container_logging - Enable or disable Docker logging for the container. The value is either yes (enable logging) or no (disable logging, the default). The Greenplum Database server configuration parameter log_min_messages controls the log level. The default log level is warning. For information about PL/Container log information, see Notes.
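The example query referenced in the resource_group_id entry above: one way to look up the groupid of an existing resource group is the gp_toolkit.gp_resgroup_config view (a sketch; your resource group names and IDs will differ):

    SELECT groupid, groupname FROM gp_toolkit.gp_resgroup_config;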
  {-v | --volume} shared-volume
    Optional. Specify a Docker volume to bind mount. You can specify this option multiple times to define multiple volumes.

    The format for a shared volume is host-dir:container-dir:[rw|ro]. The information is stored as attributes in the shared_directory element of the runtime element in the PL/Container configuration file.

    • host-dir - absolute path to a directory on the host system. The Greenplum Database administrator user (gpadmin) must have appropriate access to the directory.
    • container-dir - absolute path to a directory in the Docker container.
    • [rw|ro] - read-write or read-only access to the host directory from the container.

    When adding configuration information for a new runtime, the utility adds this read-only shared volume information:

        greenplum-home/bin/plcontainer_clients:/clientdir:ro

    If needed, you can specify other shared directories. The utility returns an error if the specified container-dir is the same as the one added by the utility, or if you specify multiple shared volumes with the same container-dir.

    Allowing read-write access to a host directory requires special consideration.

    • When specifying read-write access to a host directory, ensure that the specified host directory has the correct permissions.
    • When running PL/Container user-defined functions, multiple concurrent Docker containers running on a host could change data in the host directory. Ensure that the functions support concurrent access to the data in the host directory.
+
+
+
+
+ + runtime-backup {-f | --file} config_file + +

Copies the PL/Container configuration file to the specified file on the local host.

+
+
+ + runtime-delete {-r | --runtime} runtime_id + +

Removes runtime configuration information from the PL/Container configuration file on all Greenplum Database instances. The utility returns a message if the specified runtime_id does not exist in the file.

+
+
runtime-edit [{-e | --editor} editor]
  Edit the XML file plcontainer_configuration.xml with the specified editor. The default editor is vi.

  Saving the file updates the configuration file on all Greenplum Database hosts. If errors exist in the updated file, the utility returns an error and does not update the file.

+
+ + runtime-replace options + +

Replaces runtime configuration information in the PL/Container configuration file on all Greenplum Database instances. If the runtime_id does not exist, the information is added to the configuration file. The utility adds a startup command and shared directory to the configuration.

+

See runtime-add for command options and information added to the configuration.

+
+
+ + runtime-restore {-f | --file} config_file + +

Replaces information in the PL/Container configuration file plcontainer_configuration.xml on all Greenplum Database instances with the information from the specified file on the local host.

+
+
+ + runtime-show [{-r | --runtime} runtime_id] + +

Displays formatted PL/Container runtime configuration information. If a runtime_id is not specified, the configurations for all runtime IDs are displayed.

+
+
+ + runtime-verify + +

Checks the PL/Container configuration information on the Greenplum Database instances against the configuration information on the master. If the utility finds inconsistencies, you are prompted to replace the remote copy with the local copy. The utility also performs XML validation.

+
+
-h | --help
  Display help text. If specified without a command, displays help for all plcontainer commands. If specified with a command, displays help for the command.

--verbose
  Enable verbose logging for the command.
+
+ Examples +

These are examples of common commands to manage PL/Container:

+
    +
• Install a Docker image on all Greenplum Database hosts. This example loads a Docker image from a file. The utility displays progress information on the command line as it installs the Docker image on all the hosts.

      plcontainer image-add -f plc_newr.tar.gz

  After installing the Docker image, you add or update a runtime entry in the PL/Container configuration file to give PL/Container access to the Docker image to start Docker containers.

• Add a container entry to the PL/Container configuration file. This example adds configuration information for a PL/R runtime, and specifies a shared volume and settings for memory and logging.

      plcontainer runtime-add -r runtime2 -i test_image2:0.1 -l r \
        -v /host_dir2/shared2:/container_dir2/shared2:ro \
        -s memory_mb=512 -s use_container_logging=yes

  The utility displays progress information on the command line as it adds the runtime configuration to the configuration file and distributes the updated configuration to all instances.

• Show the configuration for a specific runtime ID in the configuration file.

      plcontainer runtime-show -r plc_python_shared

  The utility displays configuration information similar to this output.

      PL/Container Runtime Configuration:
      ---------------------------------------------------------
        Runtime ID: plc_python_shared
        Linked Docker Image: test1:latest
        Runtime Setting(s):
        Shared Directory:
        ---- Shared Directory From HOST '/usr/local/greenplum-db/bin/plcontainer_clients' to Container '/clientdir', access mode is 'ro'
        ---- Shared Directory From HOST '/home/gpadmin/share/' to Container '/opt/share', access mode is 'rw'
      ---------------------------------------------------------

• Edit the configuration in an interactive editor of your choice. This example edits the configuration file with the vim editor.

      plcontainer runtime-edit -e vim

  When you save the file, the utility displays progress information on the command line as it distributes the file to the Greenplum Database hosts.

• Save the current PL/Container configuration to a file. This example saves the configuration to the local file /home/gpadmin/saved_plc_config.xml.

      plcontainer runtime-backup -f /home/gpadmin/saved_plc_config.xml

• Overwrite the PL/Container configuration file with an XML file. This example replaces the information in the configuration file with the information from the file in the /home/gpadmin directory.

      plcontainer runtime-restore -f /home/gpadmin/new_plcontainer_configuration.xml

  The utility displays progress information on the command line as it distributes the updated file to the Greenplum Database instances.
+
+ +
\ No newline at end of file diff --git a/gpdb-doc/dita/utility_guide/contrib-programs.xml b/gpdb-doc/dita/utility_guide/contrib-programs.xml index bd1ec36e5cce3e04fa771bfc0d2da9db88412965..6922f29e485b72b49cf0ef076bf4e038c4f98c47 100644 --- a/gpdb-doc/dita/utility_guide/contrib-programs.xml +++ b/gpdb-doc/dita/utility_guide/contrib-programs.xml @@ -9,7 +9,7 @@ installed:

  • pg_upgrade - Server program to upgrade a Greenplum + format="html">pg_upgrade - Server program to upgrade a Postgres Database server instance.pg_upgrade is not intended for direct use with Greenplum 6, but will be used by Greenplum upgrade utilities in a future release.