Information moved to doc/src/sgml/geqo.sgml.

596731dc · Thomas G. Lockhart · be8300b1 · be8300b1 · be8300b1
隐藏空白更改
内联并排

Showing with 0 addition and 219 deletion

doc/README.GEQO doc/README.GEQO +0 -160

doc/TODO.GEQO doc/TODO.GEQO +0 -59

未找到文件。
--- a/doc/README.GEQO
+++ b/doc/README.GEQO
-
-=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
-Genetic Query Optimization in Database Systems
-=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
-
-                  Martin Utesch
-
-           <utesch@aut.tu-freiberg.de>
-
-          Institute of Automatic Control
-        University of Mining and Technology
-                 Freiberg, Germany
-
-                    02/10/1997
-
-
-1.) Query Handling as a Complex Optimization Problem
-====================================================
-
-   Among all relational operators the most difficult one to process and
-optimize is the JOIN. The number of alternative plans to answer a query
-grows exponentially with the number of JOINs included in it. Further
-optimization effort is caused by the support of a variety of *JOIN
-methods* (e.g., nested loop, index scan, merge join in Postgres) to
-process individual JOINs and a diversity of *indices* (e.g., r-tree,
-b-tree, hash in Postgres) as access paths for relations.
-
-   The current Postgres optimizer implementation performs a *near-
-exhaustive search* over the space of alternative strategies. This query
-optimization technique is inadequate to support database application
-domains that evolve the need for extensive queries, such as artifcial
-intelligence.
-
-   The Institute of Automatic Control at the University of Mining and
-Technology Freiberg, Germany encountered the described problems as its
-folks wanted to take the Postgres DBMS as the backend for a decision
-support knowledge based system for the maintenance of an electrical
-power grid. The DBMS needed to handle large JOIN queries for the
-inference machine of the knowledge based system.
-
-   Performance difficulties within exploring the space of possible query
-plans arose the demand for a new optimization technique being developed.
-
-   In the following we propose the implementation of a *Genetic
-Algorithm* as an option for the database query optimization problem.
-
-
-2.) Genetic Algorithms (GA)
-===========================
-
-   The GA is a heuristic optimization method which operates through 
-determined, randomized search. The set of possible solutions for the
-optimization problem is considered as a *population* of *individuals*.
-The degree of adaption of an individual to its environment is specified
-by its *fitness*.
-
-   The coordinates of an individual in the search space are represented
-by *chromosomes*, in essence a set of character strings. A *gene* is a
-subsection of a chromosome which encodes the value of a single parameter
-being optimized. Typical encodings for a gene could be *binary* or
-*integer*.
-
-   Through simulation of the evolutionary operations *recombination*,
-*mutation*, and *selection* new generations of search points are found
-that show a higher average fitness than their ancestors.
-
-   According to the "comp.ai.genetic" FAQ it cannot be stressed too
-strongly that a GA is not a pure random search for a solution to a
-problem. A GA uses stochastic processes, but the result is distinctly
-non-random (better than random). 
-
-Structured Diagram of a GA:
---------------------------
-
-P(t)    generation of ancestors at a time t
-P''(t)  generation of descendants at a time t
-
-+=========================================+
-|>>>>>>>>>>>  Algorithm GA  <<<<<<<<<<<<<<|
-+=========================================+
-| INITIALIZE t := 0                       |
-+=========================================+
-| INITIALIZE P(t)                         |
-+=========================================+
-| evalute FITNESS of P(t)                 |
-+=========================================+
-| while not STOPPING CRITERION do         |
-|   +-------------------------------------+
-|   | P'(t)  := RECOMBINATION{P(t)}       |
-|   +-------------------------------------+
-|   | P''(t) := MUTATION{P'(t)}           |
-|   +-------------------------------------+
-|   | P(t+1) := SELECTION{P''(t) + P(t)}  |
-|   +-------------------------------------+
-|   | evalute FITNESS of P''(t)           |
-|   +-------------------------------------+
-|   | t := t + 1                          |
-+===+=====================================+
-
-
-3.) Genetic Query Optimization (GEQO) in PostgreSQL
-===================================================
-
-   The GEQO module is intended for the solution of the query
-optimization problem similar to a traveling salesman problem (TSP).
-Possible query plans are encoded as integer strings. Each string
-represents the JOIN order from one relation of the query to the next.
-E. g., the query tree  /\
-                      /\ 2
-                     /\ 3
-                    4  1  is encoded by the integer string '4-1-3-2',
-which means, first join relation '4' and '1', then '3', and
-then '2', where 1, 2, 3, 4 are relids in PostgreSQL.
-
-   Parts of the GEQO module are adapted from D. Whitley's Genitor
-algorithm.
-
-   Specific characteristics of the GEQO implementation in PostgreSQL
-are:
-
-o  usage of a *steady state* GA (replacement of the least fit
-   individuals in a population, not whole-generational replacement)
-   allows fast convergence towards improved query plans. This is
-   essential for query handling with reasonable time;
-
-o  usage of *edge recombination crossover* which is especially suited
-   to keep edge losses low for the solution of the TSP by means of a GA;
-
-o  mutation as genetic operator is deprecated so that no repair
-   mechanisms are needed to generate legal TSP tours.
-
-   The GEQO module gives the following benefits to the PostgreSQL DBMS
-compared to the Postgres query optimizer implementation:
-
-o  handling of large JOIN queries through non-exhaustive search;
-
-o  improved cost size approximation of query plans since no longer
-   plan merging is needed (the GEQO module evaluates the cost for a
-   query plan as an individual).
-
-
-References
-==========
-
-J. Heitk"otter, D. Beasley:
---------------------------
-   "The Hitch-Hicker's Guide to Evolutionary Computation",
-   FAQ in 'comp.ai.genetic',
-   'ftp://ftp.Germany.EU.net/pub/research/softcomp/EC/Welcome.html'
-
-Z. Fong:
--------
-   "The Design and Implementation of the Postgres Query Optimizer",
-   file 'planner/Report.ps' in the 'postgres-papers' distribution
-
-R. Elmasri, S. Navathe:
-----------------------
-   "Fundamentals of Database Systems",
-   The Benjamin/Cummings Pub., Inc.
-
--- a/doc/TODO.GEQO
+++ b/doc/TODO.GEQO
-
-=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
-*         Things left to done for the PostgreSQL                    *
-=           Genetic Query Optimization (GEQO)                       =
-*              module implementation                                *
-=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
-* Martin Utesch		      * Institute of Automatic Control      *
-=                             = University of Mining and Technology =
-* utesch@aut.tu-freiberg.de   * Freiberg, Germany                   *
-=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
-
-
-1.) Basic Improvements
-===============================================================
-
-a) improve freeing of memory when query is already processed:
-------------------------------------------------------------
-with large JOIN queries the computing time spent for the genetic query
-optimization seems to be a mere *fraction* of the time Postgres
-needs for freeing memory via routine 'MemoryContextFree',
-file 'backend/utils/mmgr/mcxt.c';
-debugging showed that it get stucked in a loop of routine
-'OrderedElemPop', file 'backend/utils/mmgr/oset.c';
-the same problems arise with long queries when using the normal
-Postgres query optimization algorithm;
-
-b) improve genetic algorithm parameter settings:
------------------------------------------------
-file 'backend/optimizer/geqo/geqo_params.c', routines
-'gimme_pool_size' and 'gimme_number_generations';
-we have to find a compromise for the parameter settings
-to satisfy two competing demands:
-1.  optimality of the query plan
-2.  computing time
-
-c) find better solution for integer overflow:
---------------------------------------------
-file 'backend/optimizer/geqo/geqo_eval.c', routine
-'geqo_joinrel_size';
-the present hack for MAXINT overflow is to set the Postgres integer
-value of 'rel->size' to its logarithm;
-modifications of 'struct Rel' in 'backend/nodes/relation.h' will
-surely have severe impacts on the whole PostgreSQL implementation.
-
-d) find solution for exhausted memory:
--------------------------------------
-that may occur with more than 10 relations involved in a query,
-file 'backend/optimizer/geqo/geqo_eval.c', routine
-'gimme_tree' which is recursively called;
-maybe I forgot something to be freed correctly, but I dunno what;
-of course the 'rel' data structure of the JOIN keeps growing and
-growing the more relations are packed into it;
-suggestions are welcome :-(
-
-
-2.) Further Improvements
-===============================================================
-Enable bushy query tree processing within PostgreSQL;
-that may improve the quality of query plans.