<pre>
======================================================================
               __________  ____  ____  _________
              / ____/ __ \/ __ \/ __ \/ ____/   |
             / / __/ /_/ / / / / / /_/ / /   / /| |
            / /_/ / ____/ /_/ / _, _/ /___/ ___ |
            \____/_/    \____/_/ |_|\____/_/  |_|
                  The Greenplum Query Optimizer
              Copyright (c) 2015, Pivotal Software, Inc.
            Licensed under the Apache License, Version 2.0
======================================================================
</pre>

Welcome to GPORCA, the Greenplum Next Generation Query Optimizer!

Want to [Contribute](#contribute)?

GPORCA supports various build types: debug, release with debug info, release.
On x86 systems, GPORCA can also be built as a 32-bit or 64-bit library. You'll
need CMake 3.1 or higher to build GPORCA. Get it from cmake.org, or your
operating system's package manager.
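
As a quick sanity check before configuring, the CMake version requirement above can be verified from the shell. The following is a minimal sketch and not part of GPORCA: the `version_ge` helper is a hypothetical name introduced here, and it compares only the major and minor version components.

```shell
# Hypothetical helper (not part of GPORCA): succeeds when HAVE >= WANT,
# comparing only the major and minor version components.
version_ge() {
    have_major=${1%%.*}; have_rest=${1#*.}; have_minor=${have_rest%%.*}
    want_major=${2%%.*}; want_rest=${2#*.}; want_minor=${want_rest%%.*}
    [ "$have_major" -gt "$want_major" ] ||
        { [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]; }
}

if command -v cmake >/dev/null 2>&1; then
    # The first line of `cmake --version` looks like "cmake version 3.16.3".
    have=$(cmake --version | awk 'NR==1 {print $3}')
    if version_ge "$have" 3.1; then
        echo "cmake $have is new enough"
    else
        echo "cmake $have is too old; GPORCA needs 3.1 or higher"
    fi
else
    echo "cmake not found; install it from cmake.org or your package manager"
fi
```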

# First Time Setup

## Clone GPORCA

```
git clone https://github.com/greenplum-db/gporca.git
cd gporca
```

## Pre-Requisites

GPORCA uses the following library:
* GP-Xerces - Greenplum's patched version of Xerces-C 3.1.X

### Installing GP-Xerces

[GP-XERCES is available here](https://github.com/greenplum-db/gp-xerces). The GP-XERCES README
gives instructions for building and installing.

## Build and install GPORCA

ORCA is built with [CMake](https://cmake.org), so any build system supported by
CMake can be used. The team uses [Ninja](https://ninja-build.org) because it is
fast and convenient.

From the `gporca` directory, run:

```
mkdir build
cd build
cmake -GNinja ../
ninja install
```

<a name="test"></a>
## Test GPORCA

To run all GPORCA tests, run `ctest` from the build directory after the build
finishes.

```
ctest
```

Much like `make`, `ctest` has a `-j` option that allows running multiple tests in
parallel to save time. Using it is recommended for faster testing.

```
ctest -j7
```

By default, `ctest` does not print the output of failed tests. To print the
output of failed tests, use the `--output-on-failure` flag like so (this is
useful for debugging failed tests):

```
ctest -j7 --output-on-failure
```
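
The job count does not have to be hard-coded. As a rough sketch, it can be derived from the number of online CPUs; the variable names `jobs` and `ctest_cmd` are illustrative only, and the snippet prints the resulting command rather than running it:

```shell
# Derive a parallelism level from the number of online CPUs (falls back to 1).
jobs=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# Assemble the ctest invocation; print it so it can be reviewed first.
ctest_cmd="ctest -j${jobs} --output-on-failure"
echo "$ctest_cmd"

# From the build directory, run it with:
#   $ctest_cmd
```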

To run a specific individual test, use the `gporca_test` executable directly.

```
./server/gporca_test -U CAggTest
```

To run a specific minidump, for example for `../data/dxl/minidump/TVFRandom.mdp`:
```
./server/gporca_test -d ../data/dxl/minidump/TVFRandom.mdp
```

Note that some tests use assertions that are only enabled for DEBUG builds, so
DEBUG-mode tests tend to be more rigorous.

<a name="addtest"></a>
## Adding tests

Most of the regression tests come in the form of a "minidump" file.
A minidump is an XML file that contains all the input needed to plan a query,
including information about all tables, datatypes, and functions used, as well
as statistics. It also contains the resulting plan.

A new minidump can be created by running a query on a live GPDB server:

1. Run these in a psql session:

```
set client_min_messages='log';
set optimizer=on;
set optimizer_enumerate_plans=on;
set optimizer_minidump=always;
set optimizer_enable_constant_expression_evaluation=off;
```

2. Run the query in the same psql session. It will create a minidump file
   under the "minidumps" directory, in the master's data directory:

```
$ ls -l ~/data-master/minidumps/
total 12
-rw------- 1 heikki heikki 10818 Jun 10 22:02 Minidump_20160610_220222_4_14.mdp
```

3. Run xmllint on the minidump to format it better, and copy it under the
   data/dxl/minidump directory:

```
xmllint --format ~/data-master/minidumps/Minidump_20160610_220222_4_14.mdp > data/dxl/minidump/MyTest.mdp
```

4. Add it to the test suite, in server/src/unittest/gpopt/minidump/CICGTest.cpp:

```
--- a/server/src/unittest/gpopt/minidump/CICGTest.cpp
+++ b/server/src/unittest/gpopt/minidump/CICGTest.cpp
@@ -217,6 +217,7 @@ const CHAR *rgszFileNames[] =
                "../data/dxl/minidump/EffectsOfJoinFilter.mdp",
                "../data/dxl/minidump/Join-IDF.mdp",
                "../data/dxl/minidump/CoerceToDomain.mdp",
+               "../data/dxl/minidump/MyTest.mdp",
                "../data/dxl/minidump/LeftOuter2InnerUnionAllAntiSemiJoin.mdp",
 #ifndef GPOS_DEBUG
                // TODO:  - Jul 14 2015; disabling it for debug build to reduce testing time
```
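
A new minidump can also be exercised on its own with the `-d` option of `gporca_test` described in the Test GPORCA section. The wrapper below is a small sketch, assuming it is run from the build directory; `run_minidump` is a hypothetical convenience function, not something shipped with GPORCA:

```shell
# Hypothetical wrapper: run a single minidump through gporca_test,
# failing early with a clear message if the file does not exist.
run_minidump() {
    mdp="$1"
    if [ ! -f "$mdp" ]; then
        echo "minidump not found: $mdp" >&2
        return 1
    fi
    ./server/gporca_test -d "$mdp"
}

# Example (from the build directory):
#   run_minidump ../data/dxl/minidump/TVFRandom.mdp
```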


## [Experimental] Concourse
GPORCA contains a series of pipeline and task files to run various sets of tests
on [concourse](http://concourse.ci/). You can learn more about deploying concourse with
[bosh at bosh.io](http://bosh.io/).

Our concourse currently runs the following sets of tests:
* build and ctest on centos5
* build and ctest on debian8

We are currently working on adding support for the following sets of tests:
* build and ctest on centos6
* build GPDB with GPORCA and run `make installcheck-good` on centos6

All configuration files for our concourse pipelines can be found in the `concourse/` 
directory.

Note: concourse jobs and pipelines for GPORCA are currently experimental and should not be considered
ready for use in production-level CI environments.

# Advanced Setup

## How to generate build files with different options

Here are a few build flavors:

```
# debug build
cmake -GNinja -D CMAKE_BUILD_TYPE=DEBUG ../
```

```
# release build with debug info
cmake -GNinja -D CMAKE_BUILD_TYPE=RelWithDebInfo ../
```

```
# release build
cmake -GNinja -D CMAKE_BUILD_TYPE=RELEASE ../
```
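
Since each flavor is a separate CMake configuration, one option is to keep a build directory per build type so that switching flavors does not force a full rebuild. This is only a sketch: the `build-<type>` directory names and the `configure_cmd` helper are assumptions made here, and the function prints the commands instead of running them.

```shell
# Hypothetical helper: print the configure command for one build type,
# using a separate build-<type> directory under the gporca checkout.
configure_cmd() {
    build_type="$1"
    build_dir="build-$(printf '%s' "$build_type" | tr '[:upper:]' '[:lower:]')"
    printf 'mkdir -p %s && cd %s && cmake -GNinja -D CMAKE_BUILD_TYPE=%s ../\n' \
        "$build_dir" "$build_dir" "$build_type"
}

# Show the commands for the three flavors listed above.
for t in DEBUG RelWithDebInfo RELEASE; do
    configure_cmd "$t"
done
```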

## Explicitly Specifying GP-Xerces For Build

### GP-XERCES

It is recommended to use the `--prefix` option to the Xerces-C configure script
to install GP-Xerces in a location other than the default under `/usr/local/`,
because you may have other software that depends on Xerces-C, and the changes
introduced in the GP-Xerces patch make it incompatible with the upstream
version. Installing in a non-default prefix allows you to have GP-Xerces
installed side-by-side with unpatched Xerces without incompatibilities.

To use the current build scripts in GPDB, Xerces with the gp_xerces patch needs
to be placed on the /usr path. Otherwise, you can point cmake at your patched
GP-Xerces installation using the `XERCES_INCLUDE_DIR` and `XERCES_LIBRARY`
options like so:

```
cmake -GNinja -D XERCES_INCLUDE_DIR=/opt/gp_xerces/include -D XERCES_LIBRARY=/opt/gp_xerces/lib/libxerces-c.so ..
```

On Mac OS X, the library name will end with `.dylib` instead of `.so`.
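
The platform difference can be handled in a script by choosing the suffix from `uname`. The following is a sketch under the assumption that GP-Xerces was installed under `/opt/gp_xerces` as in the example above; the variable names are illustrative, and the command is printed rather than executed:

```shell
# Install prefix of the patched GP-Xerces (assumed; adjust to your setup).
xerces_prefix=/opt/gp_xerces

# Shared libraries end in .dylib on Mac OS X and .so elsewhere.
case "$(uname -s)" in
    Darwin) lib_suffix=dylib ;;
    *)      lib_suffix=so ;;
esac

# Print the resulting configure command for review.
echo cmake -GNinja \
    -D XERCES_INCLUDE_DIR="$xerces_prefix/include" \
    -D XERCES_LIBRARY="$xerces_prefix/lib/libxerces-c.$lib_suffix" ..
```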

## Cross-Compiling 32-bit or 64-bit libraries

### GP-XERCES
Unless you intend to cross-compile a 32-bit or 64-bit version of GPORCA, you can ignore these
instructions. If you need to explicitly compile for the 32-bit or 64-bit version of
your architecture, set the `CFLAGS` and `CXXFLAGS` environment
variables for the configure script like so (use `-m32` for 32-bit, `-m64` for
64-bit):

```
CFLAGS="-m32" CXXFLAGS="-m32" ../configure --prefix=/opt/gp_xerces_32
```

### GPORCA

For the most part you should not need to explicitly compile a 32-bit or 64-bit
version of the optimizer libraries. By default, a "native" version for your host
platform will be compiled. However, if you are on x86 and want to, for example,
build a 32-bit version of Optimizer libraries on a 64-bit machine, you can do
so as described below. Note that you will need a "multilib" C++ compiler that
supports the -m32/-m64 switches, and you may also need to install 32-bit ("i386")
versions of the C and C++ standard libraries for your OS. Finally, you will need
to build 32-bit or 64-bit versions of GP-Xerces as appropriate.

Toolchain files for building 32 or 64-bit x86 libraries are located in the cmake
directory. Here is an example of building for 32-bit x86:

```
cmake -GNinja -D CMAKE_TOOLCHAIN_FILE=../cmake/i386.toolchain.cmake ../
```

And for 64-bit x86:

```
cmake -GNinja -D CMAKE_TOOLCHAIN_FILE=../cmake/x86_64.toolchain.cmake ../
```
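
In a build script, the choice between the two toolchain files can be driven by a single word-size parameter. A minimal sketch, where `toolchain_for` is a hypothetical helper name and the paths are the two toolchain files shown above:

```shell
# Hypothetical helper: map a word size (32 or 64) to the matching
# x86 toolchain file, relative to the build directory.
toolchain_for() {
    case "$1" in
        32) echo ../cmake/i386.toolchain.cmake ;;
        64) echo ../cmake/x86_64.toolchain.cmake ;;
        *)  echo "unsupported word size: $1" >&2; return 1 ;;
    esac
}

# Print the configure command for a 32-bit build.
echo "cmake -GNinja -D CMAKE_TOOLCHAIN_FILE=$(toolchain_for 32) ../"
```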

## How to speed-up the build (or debug it)

If you generated Makefiles rather than Ninja files, use the `-j` option of make for a faster build. For instance, the following command runs make with 7 job slots:

```
make -j7
```

To show all commands being run as part of make (for debugging purposes):

```
make VERBOSE=1
```
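
The `make` commands above apply when CMake generated Makefiles. With the Ninja generator used elsewhere in this README, the analogous options are `ninja -j7` and `ninja -v`. As a sketch, a script can detect which tool a build directory was configured for by looking for `build.ninja` or `Makefile`; the `build_tool` helper is a hypothetical name introduced for illustration:

```shell
# Hypothetical helper: report which build tool a build directory
# was configured for, based on the files CMake generated in it.
build_tool() {
    dir="${1:-.}"
    if [ -f "$dir/build.ninja" ]; then
        echo ninja
    elif [ -f "$dir/Makefile" ]; then
        echo make
    else
        echo "no build files found in $dir" >&2
        return 1
    fi
}

# Both tools accept -j, so a parallel build could be started with:
#   "$(build_tool)" -j7
```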

### Extended Tests

Debug builds of GPORCA include a couple of "extended" tests for features like
fault-simulation and time-slicing that work by running the entire test suite
in combination with the feature being tested. These tests can take a long time
to run and are not enabled by default. To turn extended tests on, add the cmake
argument `-D ENABLE_EXTENDED_TESTS=1`.

## Installation Details

GPORCA has four libraries:

1. libnaucrates --- has all DXL related classes, and statistics related classes
2. libgpopt     --- has all the code related to the optimization engine, meta-data accessor, logical / physical operators,
                    transformation rules, and translators (DXL to expression and vice versa).
3. libgpdbcost  --- cost model for GPDB.
4. libgpos      --- abstraction of memory allocation, scheduling, error handling, and testing framework.

By default, GPORCA will be installed under /usr/local. You can change this by
setting CMAKE_INSTALL_PREFIX when running cmake, for example:
```
cmake -GNinja -D CMAKE_INSTALL_PREFIX=/home/user/gporca ../
```

By default, the header files are located in:
```
/usr/local/include/naucrates
/usr/local/include/gpdbcost
/usr/local/include/gpopt
/usr/local/include/gpos
```
The libraries are located at:

```
/usr/local/lib/libnaucrates.so*
/usr/local/lib/libgpdbcost.so*
/usr/local/lib/libgpopt.so*
/usr/local/lib/libgpos.so*
```
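
After installing, the presence of the four header directories and libraries listed above can be checked with a short script. This is a sketch only: `check_install` is a hypothetical helper, and it assumes the `.so` naming shown here (on Mac OS X the libraries would end in `.dylib`).

```shell
# Hypothetical helper: verify that the GPORCA headers and libraries
# exist under an install prefix (defaults to /usr/local).
check_install() {
    prefix="${1:-/usr/local}"
    missing=0
    for lib in naucrates gpdbcost gpopt gpos; do
        [ -d "$prefix/include/$lib" ] ||
            { echo "missing headers: $prefix/include/$lib"; missing=1; }
        ls "$prefix/lib/lib$lib.so"* >/dev/null 2>&1 ||
            { echo "missing library: $prefix/lib/lib$lib.so*"; missing=1; }
    done
    return "$missing"
}

# Check the default prefix, or pass the value used for CMAKE_INSTALL_PREFIX.
check_install /usr/local || echo "GPORCA installation looks incomplete"
```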

Build and install:
```
make install
```

Build and install with verbose output:
```
make VERBOSE=1 install
```

## Cleanup

Remove the `cmake` files generated under the `build` folder of the `gporca` repo:

```
rm -fr build/*
```

Remove the GPORCA header files and libraries (assuming the default install prefix /usr/local):

```
rm -rf /usr/local/include/naucrates
rm -rf /usr/local/include/gpdbcost
rm -rf /usr/local/include/gpopt
rm -rf /usr/local/include/gpos
rm -rf /usr/local/lib/libnaucrates.so*
rm -rf /usr/local/lib/libgpdbcost.so*
rm -rf /usr/local/lib/libgpopt.so*
rm -rf /usr/local/lib/libgpos.so*
```

<a name="contribute"></a>
# How to Contribute

We accept contributions via [GitHub pull requests](https://help.github.com/articles/using-pull-requests) only.

Follow the steps below to open a PR:
1. Fork the project’s repository
1. Create your own feature branch (e.g. `git checkout -b better_orca`) and make changes on this branch.
    * Follow the previous sections on this page to set up and build in your environment.
1. Run through all the [tests](#test) in your feature branch and ensure they are successful.
    * Follow the [Adding tests](#addtest) section to add new tests.
1. Push your local branch to your fork (e.g. `git push origin better_orca`) and [submit a pull request](https://help.github.com/articles/creating-a-pull-request)

Your contribution will be analyzed for product fit and engineering quality prior to merging.  
Note: All contributions must be sent using GitHub Pull Requests.  

**Your pull request is much more likely to be accepted if it is small and focused with a clear message that conveys the intent of your change.**

Overall we follow GPDB's comprehensive contribution policy. Please refer to it [here](https://github.com/greenplum-db/gpdb#contributing) for details.