.. Licensed to the Apache Software Foundation (ASF) under one
   or more contributor license agreements.  See the NOTICE file
   distributed with this work for additional information
   regarding copyright ownership.  The ASF licenses this file
   to you under the Apache License, Version 2.0 (the
   "License"); you may not use this file except in compliance
   with the License.  You may obtain a copy of the License at

..   http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing,
   software distributed under the License is distributed on an
   "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
   KIND, either express or implied.  See the License for the
   specific language governing permissions and limitations
   under the License.

Getting Started
===============

To get started with *PyDolphinScheduler*, you must have Python and pip
installed on your machine. If you are already set up, you can skip straight
to `Installing PyDolphinScheduler`_; otherwise, please continue with
`Installing Python`_.

Installing Python
-----------------

How to install Python and pip depends on which operating system you are
using. The Python wiki provides up-to-date `instructions for all platforms here`_.
When you open the website and choose your operating system, you will be offered
a choice of Python versions. *PyDolphinScheduler* recommends Python 3.6 or
above, and we highly recommend installing a *Stable Release* rather than a
*Pre-release*.

After you have downloaded and installed Python, open your terminal and run
:code:`python --version` to check whether the installation is correct. If
everything is fine, you should see the version printed in the console without
errors (here is an example after installing Python 3.8.7):

.. code-block:: bash

    $ python --version
    Python 3.8.7
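
If you want to script this check, for example in a CI job, the hypothetical
``python_version_ok`` helper below is a minimal sketch that parses the output of
:code:`python --version` and succeeds only for Python 3.6 or newer. It assumes
the output has the usual ``Python X.Y.Z`` form shown above; the helper name is
ours and is not part of *PyDolphinScheduler*.

.. code-block:: bash

    #!/usr/bin/env bash
    # Hypothetical helper (not part of PyDolphinScheduler): succeed if a
    # `python --version` string such as "Python 3.8.7" reports 3.6 or newer.
    python_version_ok() {
      local ver="${1#Python }"   # "Python 3.8.7" -> "3.8.7"
      local major="${ver%%.*}"   # "3.8.7" -> "3"
      local rest="${ver#*.}"     # "3.8.7" -> "8.7"
      local minor="${rest%%.*}"  # "8.7" -> "8"
      # Reject anything that does not look like numeric major/minor versions
      [[ "$major" =~ ^[0-9]+$ && "$minor" =~ ^[0-9]+$ ]] || return 1
      # Compare numerically, so that 3.10 counts as newer than 3.6
      (( major > 3 || (major == 3 && minor >= 6) ))
    }

    # Python 2 prints its version to stderr, so merge stderr into stdout
    if python_version_ok "$(python --version 2>&1)"; then
      echo "Python version looks suitable for PyDolphinScheduler"
    else
      echo "Please install Python 3.6 or newer" >&2
    fi

Comparing the numbers arithmetically rather than as strings avoids treating
``3.10`` as older than ``3.6``.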

Installing PyDolphinScheduler
-----------------------------

Once Python is installed on your machine (see `installing Python`_), it is
easy to install *PyDolphinScheduler* with pip.

.. code-block:: bash

    $ pip install apache-dolphinscheduler
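
To confirm the installation, you can ask pip for the package metadata with
:code:`python -m pip show apache-dolphinscheduler`, whose output includes a
``Version:`` line. The sketch below extracts that field; ``pip_show_version``
is a hypothetical helper name, and an empty result is taken to mean that pip
could not find the package.

.. code-block:: bash

    #!/usr/bin/env bash
    # Hypothetical helper: read `pip show` output on stdin and print the
    # value of its "Version:" field (prints nothing if the field is absent).
    pip_show_version() {
      awk -F': ' '$1 == "Version" { print $2; exit }'
    }

    # `pip show` prints nothing to stdout (only a warning on stderr) when the
    # package is missing, so an empty version means it is not installed.
    installed="$(python -m pip show apache-dolphinscheduler 2>/dev/null | pip_show_version)"
    if [ -n "$installed" ]; then
      echo "apache-dolphinscheduler ${installed} is installed"
    else
      echo "apache-dolphinscheduler does not appear to be installed" >&2
    fi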

After you run the ``pip install`` command in your terminal, the latest version
of *PyDolphinScheduler* is installed. Next, `start Python Gateway Service`_ to
finish the preparation, and then go to :doc:`tutorial` to get your hands dirty.
If you want to install an unreleased version of *PyDolphinScheduler*, see the
section `installing PyDolphinScheduler in dev`_ for more details.

Installing PyDolphinScheduler In Dev
------------------------------------

Because the project is under active development, some features have not been
released yet. If you want to try something unreleased, you can install
*PyDolphinScheduler* from the source code hosted on GitHub:

.. code-block:: bash

    # Clone the Apache DolphinScheduler repository
    $ git clone git@github.com:apache/dolphinscheduler.git
    # Install PyDolphinScheduler in development (editable) mode
    $ cd dolphinscheduler/dolphinscheduler-python/pydolphinscheduler && pip install -e .

After you have installed *PyDolphinScheduler*, remember to `start Python Gateway Service`_,
which waits for workflow definition requests from *PyDolphinScheduler*.

Start Python Gateway Service
----------------------------

Since **PyDolphinScheduler** is a Python API for `Apache DolphinScheduler`_, it
can define the structure of workflows and tasks, but it cannot run them unless you
`install Apache DolphinScheduler`_ and start its API server, which includes the
Python gateway service. We only list the key steps here; see
`install Apache DolphinScheduler`_ for more details.

.. code-block:: bash

    # Start the DolphinScheduler api-server, which includes the Python gateway service
    $ ./bin/dolphinscheduler-daemon.sh start api-server

To check whether the server is alive, run the JDK's :code:`jps` tool. The server
is healthy if the keyword ``ApiApplicationServer`` appears in the console output:

.. code-block:: bash

    $ jps
    ....
    201472 ApiApplicationServer
    ....
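
If you want to script this check, for example right after starting the API
server, the sketch below wraps the :code:`jps` check in a small polling loop.
``api_server_running`` and ``wait_for_api_server`` are hypothetical helper
names, and the 60-second default timeout and 2-second poll interval are
arbitrary choices, not DolphinScheduler settings.

.. code-block:: bash

    #!/usr/bin/env bash
    # Hypothetical helper: succeed if the `jps` output read on stdin lists
    # an ApiApplicationServer process.
    api_server_running() {
      grep -qw 'ApiApplicationServer'
    }

    # Hypothetical helper: poll `jps` every 2 seconds until ApiApplicationServer
    # shows up, giving up after roughly TIMEOUT seconds (default 60, arbitrary).
    wait_for_api_server() {
      local timeout="${1:-60}"
      local waited=0
      until jps 2>/dev/null | api_server_running; do
        if (( waited >= timeout )); then
          echo "ApiApplicationServer not found after ${timeout}s" >&2
          return 1
        fi
        sleep 2
        waited=$(( waited + 2 ))
      done
      echo "ApiApplicationServer is running"
    }

For example, running ``wait_for_api_server 120`` after starting the API server
waits up to about two minutes and returns a non-zero status if the process never
appears, which makes it usable in scripts that should stop on failure.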

.. note::

   Make sure the Python gateway service is enabled so that it starts along with ``api-server``. It is controlled by
   the YAML key ``python-gateway.enabled: true`` in the api-server configuration file ``api-server/conf/application.yaml``.
   The default value is ``true``, so the Python gateway service starts when the API server starts.
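
Besides checking the process with :code:`jps`, you can check that the Python
gateway is actually accepting connections. The sketch below assumes the gateway
listens on port 25333 (the py4j default, which we believe is also the default
``python-gateway`` port in ``application.yaml``; check your own configuration),
and relies on bash's built-in ``/dev/tcp`` redirection and the coreutils
``timeout`` command. ``port_open`` is a hypothetical helper name.

.. code-block:: bash

    #!/usr/bin/env bash
    # Hypothetical helper: succeed if a TCP connection to HOST:PORT can be
    # opened within 3 seconds. Needs bash (for /dev/tcp) and coreutils timeout.
    port_open() {
      local host="$1" port="$2"
      timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
    }

    # Assumed default gateway port; adjust it to match api-server/conf/application.yaml
    gateway_port=25333
    if port_open 127.0.0.1 "$gateway_port"; then
      echo "Python gateway is accepting connections on port ${gateway_port}"
    else
      echo "Cannot connect to port ${gateway_port}; is python-gateway.enabled true?" >&2
    fi

When your workflow script runs on another machine, replace ``127.0.0.1`` with
the address of the API server.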

Run an Example
--------------

Before running an example for *PyDolphinScheduler*, you need to get the example code from its source code. You can
get it with a single bash command:

.. code-block:: bash

   $ wget https://raw.githubusercontent.com/apache/dolphinscheduler/dev/dolphinscheduler-python/pydolphinscheduler/src/pydolphinscheduler/examples/tutorial.py

or you can copy and paste the content from `tutorial source code`_. Then you can run the example in your
terminal:

.. code-block:: bash

   $ python tutorial.py

If you want to submit your workflow to a remote API server, which means your workflow script runs on a different
machine from the API server, you should first change the *PyDolphinScheduler* configuration and then submit the workflow script:

.. code-block:: bash

   $ pydolphinscheduler config --init
   $ pydolphinscheduler config --set java_gateway.address <your-api-server-ip-or-hostname>
   $ python tutorial.py

.. note::

   See :doc:`config` for more information about all the configuration options *PyDolphinScheduler* supports.


What's More
-----------

If you are not familiar with *PyDolphinScheduler*, go to :doc:`tutorial`
to see how it works. If you already know the internals of *PyDolphinScheduler*,
you can go and play with all the :doc:`tasks/index` *PyDolphinScheduler* supports.

.. _`instructions for all platforms here`: https://wiki.python.org/moin/BeginnersGuide/Download
.. _`Apache DolphinScheduler`: https://dolphinscheduler.apache.org
.. _`install Apache DolphinScheduler`: https://dolphinscheduler.apache.org/en-us/docs/latest/user_doc/guide/installation/standalone.html
.. _`tutorial source code`: https://raw.githubusercontent.com/apache/dolphinscheduler/dev/dolphinscheduler-python/pydolphinscheduler/src/pydolphinscheduler/examples/tutorial.py