diff --git a/contrib/adminpack/README.adminpack b/contrib/adminpack/README.adminpack
deleted file mode 100644
index 1eb0ef5174e20ec9a0e631c586b764f7b22aba5d..0000000000000000000000000000000000000000
--- a/contrib/adminpack/README.adminpack
+++ /dev/null
@@ -1,48 +0,0 @@
-PostgreSQL Administration Functions
-===================================
-
-This directory is a PostgreSQL 'contrib' module which implements a number of
-support functions which pgAdmin and other administration and management tools
-can use to provide additional functionality if installed on a server.
-
-Installation
-============
-
-This module is normally distributed as a PostgreSQL 'contrib' module. To
-install it from a pre-configured source tree, run the following commands
-as a user with appropriate privileges from the adminpack source directory:
-
-make
-make install
-
-Alternatively, if you have a PostgreSQL 8.2 or higher installation but no
-source tree, you can install using PGXS. Simply run the following commands in
-the adminpack source directory:
-
-make USE_PGXS=1
-make USE_PGXS=1 install
-
-pgAdmin will look for the functions in the Maintenance Database (usually
-"postgres" for 8.2 servers) specified in the connection dialogue for the server.
-To install the functions in the database, either run the adminpack.sql script
-using the pgAdmin SQL tool (and then close and reopen the connection to the
-freshly instrumented server), or run the script using psql, e.g.:
-
-psql -U postgres postgres < adminpack.sql
-
-Other administration tools that use this module may have different requirements;
-please consult the tool's documentation for further details.
-
-Objects implemented (superuser only)
-====================================
-
-int8 pg_catalog.pg_file_write(fname text, data text, append bool)
-bool pg_catalog.pg_file_rename(oldname text, newname text, archivname text)
-bool pg_catalog.pg_file_rename(oldname text, newname text)
-bool pg_catalog.pg_file_unlink(fname text)
-setof record pg_catalog.pg_logdir_ls()
-
-/* Renaming of existing backend functions for pgAdmin compatibility */
-text pg_catalog.pg_file_read(fname text, offs bigint, len bigint)
-bigint pg_catalog.pg_file_length(text)
-int4 pg_catalog.pg_logfile_rotate()
diff --git a/contrib/btree_gist/README.btree_gist b/contrib/btree_gist/README.btree_gist
deleted file mode 100644
index f54a300babe36a4fa36d59b1fe1632132de7929a..0000000000000000000000000000000000000000
--- a/contrib/btree_gist/README.btree_gist
+++ /dev/null
@@ -1,55 +0,0 @@
-This is a B-Tree implementation using GiST that supports the int2, int4,
-int8, float4, float8, timestamp with/without time zone, time
-with/without time zone, date, interval, oid, money, macaddr, char,
-varchar/text, bytea, numeric, bit, varbit and inet/cidr types.
-
-All work was done by Teodor Sigaev (teodor@stack.net), Oleg Bartunov
-(oleg@sai.msu.su) and Janko Richter (jankorichter@yahoo.de).
-See http://www.sai.msu.su/~megera/postgres/gist for additional
-information.
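-
-For example (a minimal, hypothetical sketch -- the table and column names
-are not part of the module), loading btree_gist lets a GiST index handle
-an ordinary scalar column such as a timestamp:
-
-   create table events (occurred timestamp);
-   create index events_idx on events using gist (occurred);
-   select * from events where occurred < '2004-01-01';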
- -NEWS: - - Apr 17, 2004 - Performance optimizing - - Jan 21, 2004 - add support for bytea, numeric, bit, varbit, inet/cidr - - Jan 17, 2004 - Reorganizing code and add support for char, varchar/text - - Jan 10, 2004 - btree_gist now support oid , timestamp with time zone , - time with and without time zone, date , interval - money, macaddr - - Feb 5, 2003 - btree_gist now support int2, int8, float4, float8 - -NOTICE: - This version will only work with PostgreSQL version 7.4 and above - because of changes in the system catalogs and the function call - interface. - - If you want to index varchar attributes, you have to index using - the function text(): - Example: - CREATE TABLE test ( a varchar(23) ); - CREATE INDEX testidx ON test USING GIST ( text(a) ); - - -INSTALLATION: - - gmake - gmake install - -- load functions - psql < btree_gist.sql - -REGRESSION TEST: - - gmake installcheck - -EXAMPLE USAGE: - - create table test (a int4); - -- create index - create index testidx on test using gist (a); - -- query - select * from test where a < 10; - diff --git a/contrib/chkpass/README.chkpass b/contrib/chkpass/README.chkpass deleted file mode 100644 index e1491839e207adefa104bdfaf74b1d877ed0fe9c..0000000000000000000000000000000000000000 --- a/contrib/chkpass/README.chkpass +++ /dev/null @@ -1,56 +0,0 @@ -$PostgreSQL: pgsql/contrib/chkpass/README.chkpass,v 1.5 2007/10/01 19:06:48 darcy Exp $ - -Chkpass is a password type that is automatically checked and converted upon -entry. It is stored encrypted. To compare, simply compare against a clear -text password and the comparison function will encrypt it before comparing. -It also returns an error if the code determines that the password is easily -crackable. This is currently a stub that does nothing. - -I haven't worried about making this type indexable. I doubt that anyone -would ever need to sort a file in order of encrypted password. - -If you precede the string with a colon, the encryption and checking are -skipped so that you can enter existing passwords into the field. - -On output, a colon is prepended. This makes it possible to dump and reload -passwords without re-encrypting them. If you want the password (encrypted) -without the colon then use the raw() function. This allows you to use the -type with things like Apache's Auth_PostgreSQL module. - -The encryption uses the standard Unix function crypt(), and so it suffers -from all the usual limitations of that function; notably that only the -first eight characters of a password are considered. - -Here is some sample usage: - -test=# create table test (p chkpass); -CREATE TABLE -test=# insert into test values ('hello'); -INSERT 0 1 -test=# select * from test; - p ----------------- - :dVGkpXdOrE3ko -(1 row) - -test=# select raw(p) from test; - raw ---------------- - dVGkpXdOrE3ko -(1 row) - -test=# select p = 'hello' from test; - ?column? ----------- - t -(1 row) - -test=# select p = 'goodbye' from test; - ?column? ----------- - f -(1 row) - -D'Arcy J.M. Cain -darcy@druid.net - diff --git a/contrib/cube/README.cube b/contrib/cube/README.cube deleted file mode 100644 index 56b06202dc332c6d142f47b161adc61b2d701b3e..0000000000000000000000000000000000000000 --- a/contrib/cube/README.cube +++ /dev/null @@ -1,355 +0,0 @@ -This directory contains the code for the user-defined type, -CUBE, representing multidimensional cubes. 
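-
-As a quick illustration (the full syntax and more examples appear below),
-a three-dimensional box can be written as a pair of opposite corners:
-
-   select '(0,1,2),(3,4,5)'::cube;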
-
-
-FILES
------
-
-Makefile        building instructions for the shared library
-
-README.cube     the file you are now reading
-
-cube.c          the implementation of this data type in c
-
-cube.sql.in     SQL code needed to register this type with postgres
-                (transformed to cube.sql by make)
-
-cubedata.h      the data structure used to store the cubes
-
-cubeparse.y     the grammar file for the parser (used by cube_in() in cube.c)
-
-cubescan.l      scanner rules (used by cube_yyparse() in cubeparse.y)
-
-
-INSTALLATION
-============
-
-To install the type, run
-
-	make
-	make install
-
-The user running "make install" may need root access, depending on how you
-configured the PostgreSQL installation paths.
-
-This only installs the type implementation and documentation. To make the
-type available in any particular database, as a postgres superuser do:
-
-	psql -d databasename < cube.sql
-
-If you install the type in the template1 database, all subsequently created
-databases will inherit it.
-
-To test the new type, after "make install" do
-
-	make installcheck
-
-If it fails, examine the file regression.diffs to find out the reason (the
-test code is a direct adaptation of the regression tests from the main
-source tree).
-
-By default the external functions are made executable by anyone.
-
-SYNTAX
-======
-
-The following are valid external representations for the CUBE type:
-
-'x'             A floating point value representing
-                a one-dimensional point or a one-dimensional
-                zero-length cubement
-
-'(x)'           Same as above
-
-'x1,x2,x3,...,xn' A point in n-dimensional space,
-                represented internally as a zero volume box
-
-'(x1,x2,x3,...,xn)' Same as above
-
-'(x),(y)'       1-D cubement starting at x and ending at y
-                or vice versa; the order does not matter
-
-'(x1,...,xn),(y1,...,yn)' n-dimensional box represented by
-                a pair of its opposite corners, no matter which.
-                Functions take care of swapping to achieve
-                "lower left -- upper right" representation
-                before computing any values
-
-Grammar
--------
-
-rule 1 box -> O_BRACKET paren_list COMMA paren_list C_BRACKET
-rule 2 box -> paren_list COMMA paren_list
-rule 3 box -> paren_list
-rule 4 box -> list
-rule 5 paren_list -> O_PAREN list C_PAREN
-rule 6 list -> FLOAT
-rule 7 list -> list COMMA FLOAT
-
-Tokens
-------
-
-n           [0-9]+
-integer     [+-]?{n}
-real        [+-]?({n}\.{n}?|\.{n})
-FLOAT       ({integer}|{real})([eE]{integer})?
-O_BRACKET   \[
-C_BRACKET   \]
-O_PAREN     \(
-C_PAREN     \)
-COMMA       \,
-
-
-Examples of valid CUBE representations:
---------------------------------------
-
-'x'             A floating point value representing
-                a one-dimensional point (or, zero-length
-                one-dimensional interval)
-
-'(x)'           Same as above
-
-'x1,x2,x3,...,xn' A point in n-dimensional space,
-                represented internally as a zero volume cube
-
-'(x1,x2,x3,...,xn)' Same as above
-
-'(x),(y)'       A 1-D interval starting at x and ending at y
-                or vice versa; the order does not matter
-
-'[(x),(y)]'     Same as above
-
-'(x1,...,xn),(y1,...,yn)' An n-dimensional box represented by
-                a pair of its diagonally opposite corners,
-                regardless of order. Swapping is provided
-                by all comparison routines to ensure the
-                "lower left -- upper right" representation
-                before actual comparison takes place.
-
-'[(x1,...,xn),(y1,...,yn)]' Same as above
-
-
-White space is ignored, so '[(x),(y)]' can be: '[ ( x ), ( y ) ]'
-
-
-DEFAULTS
-========
-
-I believe this union:
-
-select cube_union('(0,5,2),(2,3,1)','0');
-cube_union
--------------------
-(0, 0, 0),(2, 5, 2)
-(1 row)
-
-does not contradict common sense, and neither does the intersection
-
-select cube_inter('(0,-1),(1,1)','(-2),(2)');
-cube_inter
--------------
-(0, 0),(1, 0)
-(1 row)
-
-In all binary operations on differently sized boxes, I assume the smaller
-one to be a cartesian projection, i.e., having zeroes in place of coordinates
-omitted in the string representation. The above examples are equivalent to:
-
-cube_union('(0,5,2),(2,3,1)','(0,0,0),(0,0,0)');
-cube_inter('(0,-1),(1,1)','(-2,0),(2,0)');
-
-
-The following containment predicate uses the point syntax,
-while in fact the second argument is internally represented by a box.
-This syntax makes it unnecessary to define the special Point type
-and functions for (box,point) predicates.
-
-select cube_contains('(0,0),(1,1)', '0.5,0.5');
-cube_contains
---------------
-t
-(1 row)
-
-
-PRECISION
-=========
-
-Values are stored internally as 64-bit floating point numbers. This means that
-numbers with more than about 16 significant digits will be truncated.
-
-
-USAGE
-=====
-
-The access method for CUBE is a GiST index (gist_cube_ops), which is a
-generalization of R-tree. GiSTs allow the postgres implementation of
-R-tree, originally encoded to support 2-D geometric types such as
-boxes and polygons, to be used with any data type whose data domain
-can be partitioned using the concepts of containment, intersection and
-equality. In other words, everything that can intersect or contain
-its own kind can be indexed with a GiST. That includes, among other
-things, all geometric data types, regardless of their dimensionality
-(see also contrib/seg).
-
-The operators supported by the GiST access method include:
-
-a = b		Same as
-
-	The cubements a and b are identical.
-
-a && b		Overlaps
-
-	The cubements a and b overlap.
-
-a @> b		Contains
-
-	The cubement a contains the cubement b.
-
-a <@ b		Contained in
-
-	The cubement a is contained in b.
-
-(Before PostgreSQL 8.2, the containment operators @> and <@ were
-respectively called @ and ~. These names are still available, but are
-deprecated and will eventually be retired. Notice that the old names
-are reversed from the convention formerly followed by the core geometric
-datatypes!)
-
-Although the mnemonics of the following operators are questionable, I
-preserved them to maintain visual consistency with other geometric
-data types defined in Postgres.
-
-Other operators:
-
-[a, b] < [c, d]		Less than
-[a, b] > [c, d]		Greater than
-
-	These operators do not make a lot of sense for any practical
-	purpose but sorting. These operators first compare (a) to (c),
-	and if these are equal, compare (b) to (d). That accounts for
-	reasonably good sorting in most cases, which is useful if
-	you want to use ORDER BY with this type.
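-
-For example, a GiST index over gist_cube_ops serves the overlap and
-containment operators described above (a minimal sketch; the table and
-column names are hypothetical):
-
-	create table boxes (b cube);
-	create index boxes_idx on boxes using gist (b);
-	-- find stored cubes overlapping a query box
-	select * from boxes where b && '(0,0),(2,2)'::cube;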
-
-The following functions are available:
-
-cube_distance(cube, cube) returns double
-  cube_distance returns the distance between two cubes. If both cubes are
-  points, this is the normal distance function.
-
-cube(float8) returns cube
-  This makes a one dimensional cube with both coordinates the same.
-  If the type of the argument is a numeric type other than float8 an
-  explicit cast to float8 may be needed.
-  cube(1) == '(1)'
-
-cube(float8, float8) returns cube
-  This makes a one dimensional cube.
-  cube(1,2) == '(1),(2)'
-
-cube(float8[]) returns cube
-  This makes a zero-volume cube using the coordinates defined by the
-  array.
-  cube(ARRAY[1,2]) == '(1,2)'
-
-cube(float8[], float8[]) returns cube
-  This makes a cube, with upper right and lower left coordinates as
-  defined by the 2 float arrays. Arrays must be of the same length.
-  cube('{1,2}'::float[], '{3,4}'::float[]) == '(1,2),(3,4)'
-
-cube(cube, float8) returns cube
-  This builds a new cube by adding a dimension on to an existing cube with
-  the same values for both parts of the new coordinate. This is useful for
-  building cubes piece by piece from calculated values.
-  cube('(1)',2) == '(1,2),(1,2)'
-
-cube(cube, float8, float8) returns cube
-  This builds a new cube by adding a dimension on to an existing cube.
-  This is useful for building cubes piece by piece from calculated values.
-  cube('(1,2)',3,4) == '(1,3),(2,4)'
-
-cube_dim(cube) returns int
-  cube_dim returns the number of dimensions stored in the data structure
-  for a cube. This is useful for constraints on the dimensions of a cube.
-
-cube_ll_coord(cube, int) returns double
-  cube_ll_coord returns the nth coordinate value for the lower left corner
-  of a cube. This is useful for doing coordinate transformations.
-
-cube_ur_coord(cube, int) returns double
-  cube_ur_coord returns the nth coordinate value for the upper right corner
-  of a cube. This is useful for doing coordinate transformations.
-
-cube_subset(cube, int[]) returns cube
-  Builds a new cube from an existing cube, using a list of dimension indexes
-  from an array. Can be used to find both the ll and ur coordinates of a
-  single dimension, e.g.: cube_subset(cube('(1,3,5),(6,7,8)'), ARRAY[2]) = '(3),(7)'
-  Or can be used to drop dimensions, or reorder them as desired, e.g.:
-  cube_subset(cube('(1,3,5),(6,7,8)'), ARRAY[3,2,1,1]) = '(5, 3, 1, 1),(8, 7, 6, 6)'
-
-cube_is_point(cube) returns bool
-  cube_is_point returns true if a cube is also a point. This is true when the
-  two defining corners are the same.
-
-cube_enlarge(cube, double, int) returns cube
-  cube_enlarge increases the size of a cube by a specified radius in at least
-  n dimensions. If the radius is negative the box is shrunk instead. This
-  is useful for creating bounding boxes around a point for searching for
-  nearby points. All defined dimensions are changed by the radius. If n
-  is greater than the number of defined dimensions and the cube is being
-  increased (r >= 0) then 0 is used as the base for the extra coordinates.
-  LL coordinates are decreased by r and UR coordinates are increased by r. If
-  a LL coordinate is increased to larger than the corresponding UR coordinate
-  (this can only happen when r < 0) then both coordinates are set to their
-  average. To make it harder for people to break things there is an effective
-  maximum on the dimension of cubes of 100. This is set in cubedata.h if
-  you need something bigger.
-
-There are a few other potentially useful functions defined in cube.c
-that vanished from the schema because I stopped using them. Some of
-these were meant to support type casting. Let me know if I was wrong:
-I will then add them back to the schema. I would also appreciate
-other ideas that would enhance the type and make it more useful.
-
-For examples of usage, see sql/cube.sql
-
-
-CREDITS
-=======
-
-This code is essentially based on the example written for
-Illustra, http://garcia.me.berkeley.edu/~adong/rtree
-
-My thanks are primarily to Prof. Joe Hellerstein
-(http://db.cs.berkeley.edu/~jmh/) for elucidating the gist of the GiST
-(http://gist.cs.berkeley.edu/), and to his former student, Andy Dong
-(http://best.me.berkeley.edu/~adong/), for his exemplar.
-I am also grateful to all postgres developers, present and past, for enabling
-me to create my own world and live undisturbed in it. And I would like to
-acknowledge my gratitude to Argonne Lab and to the U.S. Department of Energy
-for the years of faithful support of my database research.
-
-------------------------------------------------------------------------
-Gene Selkov, Jr.
-Computational Scientist
-Mathematics and Computer Science Division
-Argonne National Laboratory
-9700 S Cass Ave.
-Building 221
-Argonne, IL 60439-4844
-
-selkovjr@mcs.anl.gov
-
-------------------------------------------------------------------------
-
-Minor updates to this package were made by Bruno Wolff III
-in August/September of 2002.
-
-These include changing the precision from single precision to double
-precision and adding some new functions.
-
-------------------------------------------------------------------------
-
-Additional updates were made by Joshua Reich in July 2006.
-
-These include cube(float8[], float8[]) and cleaning up the code to use
-the V1 call protocol instead of the deprecated V0 form.
diff --git a/contrib/dblink/README.dblink b/contrib/dblink/README.dblink
deleted file mode 100644
index 5b6ffa8ae7897be6a632306dae8470c0881a7451..0000000000000000000000000000000000000000
--- a/contrib/dblink/README.dblink
+++ /dev/null
@@ -1,109 +0,0 @@
-/*
- * dblink
- *
- * Functions returning results from a remote database
- *
- * Joe Conway
- * And contributors:
- * Darko Prenosil
- * Shridhar Daithankar
- * Kai Londenberg (K.Londenberg@librics.de)
- *
- * Copyright (c) 2001-2007, PostgreSQL Global Development Group
- * ALL RIGHTS RESERVED;
- *
- * Permission to use, copy, modify, and distribute this software and its
- * documentation for any purpose, without fee, and without a written agreement
- * is hereby granted, provided that the above copyright notice and this
- * paragraph and the following two paragraphs appear in all copies.
- *
- * IN NO EVENT SHALL THE AUTHOR OR DISTRIBUTORS BE LIABLE TO ANY PARTY FOR
- * DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
- * LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS
- * DOCUMENTATION, EVEN IF THE AUTHOR OR DISTRIBUTORS HAVE BEEN ADVISED OF THE
- * POSSIBILITY OF SUCH DAMAGE.
- *
- * THE AUTHOR AND DISTRIBUTORS SPECIFICALLY DISCLAIMS ANY WARRANTIES,
- * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
- * AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS
- * ON AN "AS IS" BASIS, AND THE AUTHOR AND DISTRIBUTORS HAS NO OBLIGATIONS TO
- * PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
- *
- */
-
-Release Notes:
-  27 August 2006
-   - Added async query capability. Original patch by
-     Kai Londenberg (K.Londenberg@librics.de), modified by Joe Conway
-  Version 0.7 (as of 25 Feb, 2004)
-   - Added new version of dblink, dblink_exec, dblink_open, dblink_close,
-     and dblink_fetch -- allows ERROR on remote side of connection to
-     throw NOTICE locally instead of ERROR
-  Version 0.6
-   - functions deprecated in 0.5 have been removed
-   - added ability to create "named" persistent connections
-  Version 0.5
-   - dblink now supports use directly as a table function; this is the new
-     preferred usage going forward
-   - Use of dblink_tok is now deprecated; original form of dblink is also
-     deprecated. They _will_ be removed in the next version.
-   - dblink_last_oid is also deprecated; use dblink_exec() which returns
-     the command status as a single row, single column result.
-   - Original dblink, dblink_tok, and dblink_last_oid are commented out in
-     dblink.sql; remove the comments to use the deprecated functions.
-   - dblink_strtok() and dblink_replace() functions were removed. Use
-     split() and replace() respectively (new backend functions in
-     PostgreSQL 7.3) instead.
-   - New functions: dblink_exec() for non-SELECT queries; dblink_connect()
-     opens a connection that persists for the duration of a backend;
-     dblink_disconnect() closes a persistent connection; dblink_open()
-     opens a cursor; dblink_fetch() fetches results from an open cursor;
-     dblink_close() closes a cursor.
-   - New test suite: dblink_check.sh, dblink.test.sql,
-     dblink.test.expected.out. Execute dblink_check.sh from the same
-     directory as the other two files. Output is dblink.test.out and
-     dblink.test.diff. Note that dblink.test.sql is a good source
-     of example usage.
-
-  Version 0.4
-   - removed cursor wrap around input sql to allow for remote
-     execution of INSERT/UPDATE/DELETE
-   - dblink now returns a resource id instead of a real pointer
-   - added several utility functions -- see below
-
-  Version 0.3
-   - fixed dblink invalid pointer causing corrupt elog message
-   - fixed dblink_tok improper handling of null results
-   - fixed examples in README.dblink
-
-  Version 0.2
-   - initial release
-
-Installation:
-  Place these files in a directory called 'dblink' under 'contrib' in the
-  PostgreSQL source tree. Then run:
-
-  make
-  make install
-
-  You can use dblink.sql to create the functions in your database of
-  choice, e.g.:
-
-  psql template1 < dblink.sql
-
-  installs the dblink functions into database template1.
-
-Documentation:
-
-  Note: Parameters representing relation names must include double
-  quotes if the names are mixed-case or contain special characters. They
-  must also be appropriately qualified with a schema name if applicable.
-
-  See the following files:
-	doc/connection
-	doc/cursor
-	doc/query
-	doc/execute
-	doc/misc
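-
-  As a quick illustration (a minimal sketch; the connection string,
-  connection name and query are hypothetical), a named connection can be
-  opened, queried as a table function, and closed:
-
-	select dblink_connect('myconn', 'dbname=postgres');
-	select * from dblink('myconn', 'select proname, prosrc from pg_proc')
-	  as t(proname name, prosrc text);
-	select dblink_disconnect('myconn');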
-
-==================================================================
--- Joe Conway
-
diff --git a/contrib/earthdistance/README.earthdistance b/contrib/earthdistance/README.earthdistance
deleted file mode 100644
index 9be761cf17a45d5207383e809a808b4bc9125866..0000000000000000000000000000000000000000
--- a/contrib/earthdistance/README.earthdistance
+++ /dev/null
@@ -1,127 +0,0 @@
-This contrib package contains two different approaches to calculating
-great circle distances on the surface of the Earth. The one described
-first depends on the contrib/cube package (which MUST be installed before
-earthdistance is installed). The second one is based on the point
-datatype, using latitude and longitude for the coordinates. The install
-script makes the defined functions executable by anyone.
-
-Make sure contrib/cube has been installed.
-make
-make install
-make installcheck
-
-To use these functions in a particular database as a postgres superuser do:
-psql databasename < earthdistance.sql
-
--------------------------------------------
-contrib/cube based Earth distance functions
-Bruno Wolff III
-September 2002
-
-A spherical model of the Earth is used.
-
-Data is stored in cubes that are points (both corners are the same) using 3
-coordinates representing the distance from the center of the Earth.
-
-The radius of the Earth is obtained from the earth() function. It is
-given in meters. But by changing this one function you can change it
-to use some other units or to use a different value of the radius
-that you feel is more appropriate.
-
-This package has applications to astronomical databases as well.
-Astronomers will probably want to change earth() to return a radius of
-180/pi() so that distances are in degrees.
-
-Functions are provided to allow for input in latitude and longitude (in
-degrees), to allow for output of latitude and longitude, to calculate
-the great circle distance between two points and to easily specify a
-bounding box usable for index searches.
-
-The functions are all 'sql' functions. If you want to make these functions
-executable by other people you will also have to make the referenced
-cube functions executable. cube(text), cube(float8), cube(cube,float8),
-cube_distance(cube,cube), cube_ll_coord(cube,int) and
-cube_enlarge(cube,float8,int) are used indirectly by the earth distance
-functions. is_point(cube) and cube_dim(cube) are used in constraints for data
-in domain earth. cube_ur_coord(cube,int) is used in the regression tests and
-might be useful for looking at bounding box coordinates in user applications.
-
-A domain of type cube named earth is defined.
-There are constraints on it defined to make sure the cube is a point,
-that it does not have more than 3 dimensions and that it is very near
-the surface of a sphere centered about the origin with the radius of
-the Earth.
-
-The following functions are provided:
-
-earth() - Returns the radius of the Earth in meters.
-
-sec_to_gc(float8) - Converts the normal straight line (secant) distance
-between two points on the surface of the Earth to the great circle distance
-between them.
-
-gc_to_sec(float8) - Converts the great circle distance between two points
-on the surface of the Earth to the normal straight line (secant) distance
-between them.
-
-ll_to_earth(float8, float8) - Returns the location of a point on the surface
-of the Earth given its latitude (argument 1) and longitude (argument 2) in
-degrees.
-
-latitude(earth) - Returns the latitude in degrees of a point on the surface
-of the Earth.
-
-longitude(earth) - Returns the longitude in degrees of a point on the surface
-of the Earth.
-
-earth_distance(earth, earth) - Returns the great circle distance between
-two points on the surface of the Earth.
-
-earth_box(earth, float8) - Returns a box suitable for an indexed search using
-the cube @> operator for points within a given great circle distance of a
-location. Some points in this box are further than the specified great circle
-distance from the location, so a second check using earth_distance should be
-made at the same time.
-
-One advantage of using the cube representation over a point using latitude and
-longitude for coordinates is that you don't have to worry about special
-conditions at +/- 180 degrees of longitude or near the poles.
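-
-For example (a minimal sketch; the table is hypothetical and not part of
-the module), a radius search combines earth_box for the indexable test
-with earth_distance for the exact check:
-
-create table places (name text, lat float8, lon float8);
-create index places_loc_idx on places using gist (ll_to_earth(lat, lon));
-select name from places
- where earth_box(ll_to_earth(51.5, -0.1), 10000) @> ll_to_earth(lat, lon)
-   and earth_distance(ll_to_earth(51.5, -0.1), ll_to_earth(lat, lon)) < 10000;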
- -Below is the documentation for the Earth distance operator that works -with the point data type. - ---------------------------------------------------------------------- - -I corrected a bug in the geo_distance code where two double constants -were declared as int. I also changed the distance function to use -the haversine formula which is more accurate for small distances. -Bruno Wolff -September 2002 - ---------------------------------------------------------------------- - -Date: Wed, 1 Apr 1998 15:19:32 -0600 (CST) -From: Hal Snyder -To: vmehr@ctp.com -Subject: [QUESTIONS] Re: Spatial data, R-Trees - -> From: Vivek Mehra -> Date: Wed, 1 Apr 1998 10:06:50 -0500 - -> Am just starting out with PostgreSQL and would like to learn more about -> the spatial data handling ablilities of postgreSQL - in terms of using -> R-tree indexes, user defined types, operators and functions. -> -> Would you be able to suggest where I could find some code and SQL to -> look at to create these? - -Here's the setup for adding an operator '<@>' to give distance in -statute miles between two points on the Earth's surface. Coordinates -are in degrees. Points are taken as (longitude, latitude) and not vice -versa as longitude is closer to the intuitive idea of x-axis and -latitude to y-axis. - -There's C source, Makefile for FreeBSD, and SQL for installing and -testing the function. - -Let me know if anything looks fishy! diff --git a/contrib/fuzzystrmatch/README.fuzzystrmatch b/contrib/fuzzystrmatch/README.fuzzystrmatch deleted file mode 100644 index b47d66c4c1dc84909cb55602b751341e810a14e9..0000000000000000000000000000000000000000 --- a/contrib/fuzzystrmatch/README.fuzzystrmatch +++ /dev/null @@ -1,144 +0,0 @@ -/* - * fuzzystrmatch.c - * - * Functions for "fuzzy" comparison of strings - * - * Joe Conway - * - * Copyright (c) 2001-2007, PostgreSQL Global Development Group - * ALL RIGHTS RESERVED; - * - * levenshtein() - * ------------- - * Written based on a description of the algorithm by Michael Gilleland - * found at http://www.merriampark.com/ld.htm - * Also looked at levenshtein.c in the PHP 4.0.6 distribution for - * inspiration. - * - * metaphone() - * ----------- - * Modified for PostgreSQL by Joe Conway. - * Based on CPAN's "Text-Metaphone-1.96" by Michael G Schwern - * Code slightly modified for use as PostgreSQL function (palloc, elog, etc). - * Metaphone was originally created by Lawrence Philips and presented in article - * in "Computer Language" December 1990 issue. - * - * dmetaphone() and dmetaphone_alt() - * --------------------------------- - * A port of the DoubleMetaphone perl module by Andrew Dunstan. See dmetaphone.c - * for more detail. - * - * soundex() - * ----------- - * Folded existing soundex contrib into this one. Renamed text_soundex() (C function) - * to soundex() for consistency. - * - * difference() - * ------------ - * Return the difference between two strings' soundex values. Kris Jurka - * - * Permission to use, copy, modify, and distribute this software and its - * documentation for any purpose, without fee, and without a written agreement - * is hereby granted, provided that the above copyright notice and this - * paragraph and the following two paragraphs appear in all copies. 
- * - * IN NO EVENT SHALL THE AUTHORS OR DISTRIBUTORS BE LIABLE TO ANY PARTY FOR - * DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING - * LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS - * DOCUMENTATION, EVEN IF THE AUTHOR OR DISTRIBUTORS HAVE BEEN ADVISED OF THE - * POSSIBILITY OF SUCH DAMAGE. - * - * THE AUTHORS AND DISTRIBUTORS SPECIFICALLY DISCLAIM ANY WARRANTIES, - * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY - * AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS - * ON AN "AS IS" BASIS, AND THE AUTHOR AND DISTRIBUTORS HAS NO OBLIGATIONS TO - * PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS. - * - */ - - -Version 0.3 (30 June, 2004): - -Release Notes: - Version 0.3 - - added double metaphone code from Andrew Dunstan - - change metaphone so that an empty input string causes an empty - output string to be returned, instead of throwing an ERROR - - fixed examples in README.soundex - - Version 0.2 - - folded soundex contrib into this one - - Version 0.1 - - initial release - -Installation: - Place these files in a directory called 'fuzzystrmatch' under 'contrib' in the PostgreSQL source tree. Then run: - - make - make install - - You can use fuzzystrmatch.sql to create the functions in your database of choice, e.g. - - psql -U postgres template1 < fuzzystrmatch.sql - - installs following functions into database template1: - - levenshtein() - calculates the levenshtein distance between two strings - metaphone() - calculates the metaphone code of an input string - -Documentation -================================================================== -Name - -levenshtein -- calculates the levenshtein distance between two strings - -Synopsis - -levenshtein(text source, text target) - -Inputs - - source - any text string, 255 characters max, NOT NULL - - target - any text string, 255 characters max, NOT NULL - -Outputs - - Returns int - -Example usage - - select levenshtein('GUMBO','GAMBOL'); - -================================================================== -Name - -metaphone -- calculates the metaphone code of an input string - -Synopsis - -metaphone(text source, int max_output_length) - -Inputs - - source - any text string, 255 characters max, NOT NULL - - max_output_length - maximum length of the output metaphone code; if longer, the output - is truncated to this length - -Outputs - - Returns text - -Example usage - - select metaphone('GUMBO',4); - -================================================================== --- Joe Conway - diff --git a/contrib/fuzzystrmatch/README.soundex b/contrib/fuzzystrmatch/README.soundex deleted file mode 100644 index cb33c64469df9d56cfa9ab93388b6f6f763ba108..0000000000000000000000000000000000000000 --- a/contrib/fuzzystrmatch/README.soundex +++ /dev/null @@ -1,66 +0,0 @@ -NOTE: Modified August 07, 2001 by Joe Conway. Updated for accuracy - after combining soundex code into the fuzzystrmatch contrib ---------------------------------------------------------------------- -The Soundex system is a method of matching similar sounding names -(or any words) to the same code. It was initially used by the -United States Census in 1880, 1900, and 1910, but it has little use -beyond English names (or the English pronunciation of names), and -it is not a linguistic tool. 
- -When comparing two soundex values to determine similarity, the -difference function reports how close the match is on a scale -from zero to four, with zero being no match and four being an -exact match. - -The following are some usage examples: - -SELECT soundex('hello world!'); - -SELECT soundex('Anne'), soundex('Ann'), difference('Anne', 'Ann'); -SELECT soundex('Anne'), soundex('Andrew'), difference('Anne', 'Andrew'); -SELECT soundex('Anne'), soundex('Margaret'), difference('Anne', 'Margaret'); - -CREATE TABLE s (nm text); - -INSERT INTO s VALUES ('john'); -INSERT INTO s VALUES ('joan'); -INSERT INTO s VALUES ('wobbly'); -INSERT INTO s VALUES ('jack'); - -SELECT * FROM s WHERE soundex(nm) = soundex('john'); - -SELECT a.nm, b.nm FROM s a, s b WHERE soundex(a.nm) = soundex(b.nm) AND a.oid <> b.oid; - -CREATE FUNCTION text_sx_eq(text, text) RETURNS boolean AS -'select soundex($1) = soundex($2)' -LANGUAGE SQL; - -CREATE FUNCTION text_sx_lt(text, text) RETURNS boolean AS -'select soundex($1) < soundex($2)' -LANGUAGE SQL; - -CREATE FUNCTION text_sx_gt(text, text) RETURNS boolean AS -'select soundex($1) > soundex($2)' -LANGUAGE SQL; - -CREATE FUNCTION text_sx_le(text, text) RETURNS boolean AS -'select soundex($1) <= soundex($2)' -LANGUAGE SQL; - -CREATE FUNCTION text_sx_ge(text, text) RETURNS boolean AS -'select soundex($1) >= soundex($2)' -LANGUAGE SQL; - -CREATE FUNCTION text_sx_ne(text, text) RETURNS boolean AS -'select soundex($1) <> soundex($2)' -LANGUAGE SQL; - -DROP OPERATOR #= (text, text); - -CREATE OPERATOR #= (leftarg=text, rightarg=text, procedure=text_sx_eq, commutator = #=); - -SELECT * FROM s WHERE text_sx_eq(nm, 'john'); - -SELECT * FROM s WHERE s.nm #= 'john'; - -SELECT * FROM s WHERE difference(s.nm, 'john') > 2; diff --git a/contrib/hstore/README.hstore b/contrib/hstore/README.hstore deleted file mode 100644 index b8c9711389343215eec3618be6b698eafd9f28ec..0000000000000000000000000000000000000000 --- a/contrib/hstore/README.hstore +++ /dev/null @@ -1,188 +0,0 @@ -Hstore - contrib module for storing (key,value) pairs - -[Online version] (http://www.sai.msu.su/~megera/oddmuse/index.cgi?Hstore) - -Motivation - -Many attributes rarely searched, semistructural data, lazy DBA - -Authors - - * Oleg Bartunov , Moscow, Moscow University, Russia - * Teodor Sigaev , Moscow, Delta-Soft Ltd.,Russia - -LEGAL NOTICES: This module is released under BSD license (as PostgreSQL -itself) - -Operations - - * hstore -> text - get value , perl analogy $h{key} - -select 'a=>q, b=>g'->'a'; - ? ------- - q - - * hstore || hstore - concatenation, perl analogy %a=( %b, %c ); - -regression=# select 'a=>b'::hstore || 'c=>d'::hstore; - ?column? --------------------- - "a"=>"b", "c"=>"d" -(1 row) - -but, notice - -regression=# select 'a=>b'::hstore || 'a=>d'::hstore; - ?column? ----------- - "a"=>"d" -(1 row) - - * text => text - creates hstore type from two text strings - -select 'a'=>'b'; - ?column? ----------- - "a"=>"b" - - * hstore @> hstore - contains operation, check if left operand contains right. - -regression=# select 'a=>b, b=>1, c=>NULL'::hstore @> 'a=>c'; - ?column? ----------- - f -(1 row) - -regression=# select 'a=>b, b=>1, c=>NULL'::hstore @> 'b=>1'; - ?column? ----------- - t -(1 row) - - * hstore <@ hstore - contained operation, check if left operand is contained - in right - -(Before PostgreSQL 8.2, the containment operators @> and <@ were -respectively called @ and ~. These names are still available, but are -deprecated and will eventually be retired. 
Notice that the old names
-are reversed from the convention formerly followed by the core geometric
-datatypes!)
-
-Functions
-
-  * akeys(hstore) - returns all keys from hstore as an array
-
-regression=# select akeys('a=>1,b=>2');
- akeys
--------
- {a,b}
-
-  * skeys(hstore) - returns all keys from hstore as strings
-
-regression=# select skeys('a=>1,b=>2');
- skeys
--------
- a
- b
-
-  * avals(hstore) - returns all values from hstore as an array
-
-regression=# select avals('a=>1,b=>2');
- avals
--------
- {1,2}
-
-  * svals(hstore) - returns all values from hstore as strings
-
-regression=# select svals('a=>1,b=>2');
- svals
--------
- 1
- 2
-
-  * delete (hstore,text) - deletes the (key,value) pair from hstore if the
-    key matches the argument.
-
-regression=# select delete('a=>1,b=>2','b');
-  delete
-----------
- "a"=>"1"
-
-  * each(hstore) - returns (key, value) pairs
-
-regression=# select * from each('a=>1,b=>2');
- key | value
------+-------
- a   | 1
- b   | 2
-
-  * exist (hstore,text)
-  * hstore ? text
-    - returns true if the key exists in hstore and false otherwise.
-
-regression=# select exist('a=>1','a'), 'a=>1' ? 'a';
- exist | ?column?
--------+----------
- t     | t
-
-  * defined (hstore,text) - returns true if the key exists in hstore and
-    its value is not NULL.
-
-regression=# select defined('a=>NULL','a');
- defined
----------
- f
-
-Indices
-
-The module provides index support for the '@>' and '?' operations.
-
-create index hidx on testhstore using gist(h);
-create index hidx on testhstore using gin(h);
-
-Note
-
-Use parentheses in the select below, because the priority of 'is' is
-higher than that of '->':
-
-select id from entrants where (info->'education_period') is not null;
-
-Examples
-
-  * add key
-
-update tt set h=h||'c=>3';
-
-  * delete key
-
-update tt set h=delete(h,'k1');
-
-  * Statistics
-
-The hstore type, because of its intrinsic liberality, could contain a lot of
-different keys. Checking for valid keys is the task of the application.
-The examples below demonstrate several techniques for checking key statistics.
-
-  o simple example
-
-select * from each('aaa=>bq, b=>NULL, ""=>1 ');
-
-  o using a table
-
-select (each(h)).key, (each(h)).value into stat from testhstore;
-
-  o online stat
-
-select key, count(*) from (select (each(h)).key from testhstore) as stat group by key order by count desc, key;
-    key    | count
------------+-------
- line      |   883
- query     |   207
- pos       |   203
- node      |   202
- space     |   197
- status    |   195
- public    |   194
- title     |   190
- org       |   189
-...................
diff --git a/contrib/intagg/README.int_aggregate b/contrib/intagg/README.int_aggregate
deleted file mode 100644
index 0c7317ccc9afd1add15d3f95c1ebde864194cd09..0000000000000000000000000000000000000000
--- a/contrib/intagg/README.int_aggregate
+++ /dev/null
@@ -1,55 +0,0 @@
-Integer aggregator/enumerator.
-
-Many database systems have the notion of a one to many table.
-
-A one to many table usually sits between two indexed tables,
-as:
-
-create table one_to_many(left int, right int);
-
-And it is used like this:
-
-SELECT right.* from right JOIN one_to_many ON (right.id = one_to_many.right)
-  WHERE one_to_many.left = item;
-
-This will return all the items in the right hand table for an entry
-in the left hand table. This is a very common construct in SQL.
-
-Now, this methodology can be cumbersome with a very large number of
-entries in the one_to_many table. Depending on the order in which
-data was entered, a join like this could result in an index scan
-and a fetch for each right hand entry in the table for a particular
-left hand entry.
-
-If you have a very dynamic system, there is not much you can do.
-However, if you have some data which is fairly static, you can
-create a summary table with the aggregator.
-
-CREATE TABLE summary as SELECT left, int_array_aggregate(right)
-  AS right FROM one_to_many GROUP BY left;
-
-This will create a table with one row per left item, and an array
-of right items. Now this is pretty useless without some way of using
-the array; that's why there is an array enumerator.
-
-SELECT left, int_array_enum(right) FROM summary WHERE left = item;
-
-The above query using int_array_enum produces the same results as:
-
-SELECT left, right FROM one_to_many WHERE left = item;
-
-The difference is that the query against the summary table has to get
-only one row from the table, whereas the query against "one_to_many"
-must index scan and fetch a row for each entry.
-
-On our system, an EXPLAIN shows that a query with a cost of 8488 gets
-reduced to a cost of 329. The query is a join involving the
-one_to_many table:
-
-select right, count(right) from
-(
-  select left, int_array_enum(right) as right from summary join
-    (select left from left_table where left = item) as lefts
-    ON (summary.left = lefts.left)
-) as list group by right order by count desc;
-
diff --git a/contrib/intarray/README.intarray b/contrib/intarray/README.intarray
deleted file mode 100644
index 9f16ca53eccfac09afe84e59bd246727507febd5..0000000000000000000000000000000000000000
--- a/contrib/intarray/README.intarray
+++ /dev/null
@@ -1,185 +0,0 @@
-This is an implementation of the RD-tree data structure using the GiST
-interface of PostgreSQL. It has built-in lossy compression.
-
-The current implementation provides index support for one-dimensional arrays
-of integers: gist__int_ops, suitable for small and medium-size arrays (used
-by default), and gist__intbig_ops for indexing large arrays (we use a
-superimposed signature with a length of 4096 bits to represent sets). There
-is also a non-default gin__int_ops for GIN indexes on integer arrays.
-
-All work was done by Teodor Sigaev (teodor@stack.net) and Oleg Bartunov
-(oleg@sai.msu.su). See http://www.sai.msu.su/~megera/postgres/gist
-for additional information. Andrey Oktyabrski did great work on
-adding new functions and operations.
-
-
-FUNCTIONS:
-
-  int icount(int[]) - the number of elements in intarray
-
-test=# select icount('{1,2,3}'::int[]);
- icount
---------
-      3
-(1 row)
-
-  int[] sort(int[], 'asc' | 'desc') - sort intarray
-
-test=# select sort('{1,2,3}'::int[],'desc');
-  sort
----------
- {3,2,1}
-(1 row)
-
-  int[] sort(int[]) - sort in ascending order
-  int[] sort_asc(int[]), sort_desc(int[]) - shortcuts for sort
-
-  int[] uniq(int[]) - returns unique elements
-
-test=# select uniq(sort('{1,2,3,2,1}'::int[]));
-  uniq
----------
- {1,2,3}
-(1 row)
-
-  int idx(int[], int item) - returns the index of the first intarray element
-    matching item, or 0 if there is no match
-
-test=# select idx('{1,2,3,2,1}'::int[],2);
- idx
------
-   2
-(1 row)
-
-
-  int[] subarray(int[], int START [, int LEN]) - returns the part of intarray
-    starting at element number START (counting from 1) with length LEN.
-
-test=# select subarray('{1,2,3,2,1}'::int[],2,3);
- subarray
-----------
- {2,3,2}
-(1 row)
-
-  int[] intset(int4) - casting int4 to int[]
-
-test=# select intset(1);
- intset
---------
- {1}
-(1 row)
-
-OPERATIONS:
-
-  int[] && int[]  - overlap - returns TRUE if arrays have at least one common element
-  int[] @> int[]  - contains - returns TRUE if left array contains right array
-  int[] <@ int[]  - contained - returns TRUE if left array is contained in right array
-  # int[]         - returns the number of elements in array
-  int[] + int     - push element to array (add to end of array)
-  int[] + int[]   - merge of arrays (right array added to the end of left one)
-  int[] - int     - remove entries matched by right argument from array
-  int[] - int[]   - remove right array from left
-  int[] | int     - returns intarray - union of arguments
-  int[] | int[]   - returns intarray as a union of two arrays
-  int[] & int[]   - returns intersection of arrays
-  int[] @@ query_int - returns TRUE if array satisfies query (like '1&(2|3)')
-  query_int ~~ int[] - returns TRUE if array satisfies query (commutator of @@)
-
-(Before PostgreSQL 8.2, the containment operators @> and <@ were
-respectively called @ and ~. These names are still available, but are
-deprecated and will eventually be retired. Notice that the old names
-are reversed from the convention formerly followed by the core geometric
-datatypes!)
-
-CHANGES:
-
-August 6, 2002
-   1. Reworked patch from Andrey Oktyabrski (ano@spider.ru) with
-      functions: icount, sort, sort_asc, uniq, idx, subarray
-      operations: #, +, -, |, &
-October 1, 2001
-   1. Change search method in array to binary
-September 28, 2001
-   1. gist__int_ops is now non-lossy
-   2. add sort entry in picksplit
-September 21, 2001
-   1. Added support for boolean query (indexable operator @@, looks like
-      a @@ '1|(2&3)', performance is better in any case)
-   2. Done some small optimizations
-March 19, 2001
-   1. Added support for toastable keys
-   2. Improved split algorithm for intbig (selection speedup is about 30%)
-
-INSTALLATION:
-
-  gmake
-  gmake install
-  -- load functions
-  psql < _int.sql
-
-REGRESSION TEST:
-
-  gmake installcheck
-
-EXAMPLE USAGE:
-
-  create table message (mid int not null, sections int[]);
-  create table message_section_map (mid int not null, sid int not null);
-
-  -- create indices
-  CREATE unique index message_key on message ( mid );
-  CREATE unique index message_section_map_key2 on message_section_map (sid, mid );
-  CREATE INDEX message_rdtree_idx on message using gist ( sections gist__int_ops );
-
-  -- select some messages with section in 1 OR 2 - OVERLAP operator
-  select message.mid from message where message.sections && '{1,2}';
-
-  -- select messages containing sections 1 AND 2 - CONTAINS operator
-  select message.mid from message where message.sections @> '{1,2}';
-  -- the same, CONTAINED operator
-  select message.mid from message where '{1,2}' <@ message.sections;
-
-BENCHMARK:
-
-  The subdirectory bench contains a benchmark suite.
-  cd ./bench
-  1. createdb TEST
-  2. psql TEST < ../_int.sql
-  3. ./create_test.pl | psql TEST
-  4. ./bench.pl - perl script to benchmark queries, supports OR, AND queries
-     with/without RD-Tree. Run the script without arguments to
-     see available options.
-
-     a) test without RD-Tree (OR)
-        ./bench.pl -d TEST -c -s 1,2 -v
-     b) test with RD-Tree
-        ./bench.pl -d TEST -c -s 1,2 -v -r
-
-BENCHMARKS:
-
-Size of table message: 200000
-Size of table message_section_map: 269133
-
-Distribution of messages by sections:
-
-section 0: 74377 messages
-section 1: 16284 messages
-section 50: 1229 messages
-section 99: 683 messages
-
-old - without RD-Tree support,
-new - with RD-Tree
-
-+----------+---------------+----------------+
-|Search set|OR, time in sec|AND, time in sec|
-|          +-------+-------+--------+-------+
-|          |  old  |  new  |  old   |  new  |
-+----------+-------+-------+--------+-------+
-|         1|  0.625|  0.101|      - |     - |
-+----------+-------+-------+--------+-------+
-|        99|  0.018|  0.017|      - |     - |
-+----------+-------+-------+--------+-------+
-|       1,2|  0.766|  0.133|   0.628|  0.045|
-+----------+-------+-------+--------+-------+
-| 1,2,50,65|  0.794|  0.141|   0.030|  0.006|
-+----------+-------+-------+--------+-------+
diff --git a/contrib/isn/README.isn b/contrib/isn/README.isn
deleted file mode 100644
index 22154266f0ea287a385adec906f18d0a65dd003a..0000000000000000000000000000000000000000
--- a/contrib/isn/README.isn
+++ /dev/null
@@ -1,220 +0,0 @@
-
--- EAN13 - UPC - ISBN (books) - ISMN (music) - ISSN (serials)
--------------------------------------------------------------
-
-Copyright Germán Méndez Bravo (Kronuz), 2004 - 2006
-This module is released under the same BSD license as the rest of PostgreSQL.
-
-The information to implement this module was collected through
-several sites, including:
-   http://www.isbn-international.org/
-   http://www.issn.org/
-   http://www.ismn-international.org/
-   http://www.wikipedia.org/
-the prefixes used for hyphenation were also compiled from:
-   http://www.gs1.org/productssolutions/idkeys/support/prefix_list.html
-   http://www.isbn-international.org/en/identifiers.html
-   http://www.ismn-international.org/ranges.html
-Care was taken during the creation of the algorithms and they
-were meticulously verified against the suggested algorithms
-in the official ISBN, ISMN and ISSN User Manuals.
-
-!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
-THIS MODULE IS PROVIDED "AS IS" AND WITHOUT ANY WARRANTY
-OF ANY KIND, EXPRESS OR IMPLIED.
-!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
-
--- Content of the Module
--------------------------------------------------
-
-This directory contains definitions for a few PostgreSQL
-data types, for the following international-standard namespaces:
-EAN13, UPC, ISBN (books), ISMN (music), and ISSN (serials). This module
-is inspired by Garrett A. Wollman's isbn_issn code.
-
-I wanted the database to fully validate numbers and also to use the
-upcoming ISBN-13 and the EAN13 standards, as well as to have it
-automatically do hyphenation for ISBN numbers.
-
-This new module validates, and automatically adds the correct
-hyphenations to, the numbers. Also, it supports the new ISBN-13
-numbers to be used starting in January 2007.
-
-Premises:
-1. ISBN13, ISMN13, ISSN13 numbers are all EAN13 numbers
-2. EAN13 numbers aren't always ISBN13, ISMN13 or ISSN13 (some are)
-3. some ISBN13 numbers can be displayed as ISBN
-4. some ISMN13 numbers can be displayed as ISMN
-5. some ISSN13 numbers can be displayed as ISSN
-6. all UPC, ISBN, ISMN and ISSN can be represented as EAN13 numbers
-
-Note: All types are internally represented as 64 bit integers,
-      and internally all are consistently interchangeable.
-
-We have the following data types:
-
-+ EAN13 for European Article Numbers.
-  This type will always show the EAN13-display format.
-  The output function for this is -> ean13_out()
-
-+ ISBN13 for International Standard Book Numbers to be displayed in
-  the new EAN13-display format.
-+ ISMN13 for International Standard Music Numbers to be displayed in
-  the new EAN13-display format.
-+ ISSN13 for International Standard Serial Numbers to be displayed
-  in the new EAN13-display format.
-  These types will always display the long version of the ISxN (EAN13).
-  The output function to do this is -> ean13_out()
-  * The need for these types is just for displaying the same data in
-    different ways:
-    ISBN13 is actually the same as ISBN, ISMN13=ISMN and ISSN13=ISSN.
-
-+ ISBN for International Standard Book Numbers to be displayed in
-  the current short-display format.
-+ ISMN for International Standard Music Numbers to be displayed in
-  the current short-display format.
-+ ISSN for International Standard Serial Numbers to be displayed
-  in the current short-display format.
-  These types will display the short version of the ISxN (ISxN 10)
-  whenever it's possible, and will show ISxN 13 when it's
-  impossible to show the short version.
-  The output function to do this is -> isn_out()
-
-+ UPC for Universal Product Codes.
-  UPC numbers are a subset of the EAN13 numbers (they are basically
-  EAN13 without the first '0' digit.)
-  The output function to do this is also -> isn_out()
-
-We have the following input functions:
-+ To take a string and return an EAN13 -> ean13_in()
-+ To take a string and return valid ISBN or ISBN13 numbers -> isbn_in()
-+ To take a string and return valid ISMN or ISMN13 numbers -> ismn_in()
-+ To take a string and return valid ISSN or ISSN13 numbers -> issn_in()
-+ To take a string and return a UPC code -> upc_in()
-
-We are able to cast from:
-+ ISBN13 -> EAN13
-+ ISMN13 -> EAN13
-+ ISSN13 -> EAN13
-
-+ ISBN -> EAN13
-+ ISMN -> EAN13
-+ ISSN -> EAN13
-+ UPC  -> EAN13
-
-+ ISBN <-> ISBN13
-+ ISMN <-> ISMN13
-+ ISSN <-> ISSN13
-
-We have two operator classes (for btree and for hash) so each data type
-can be indexed for faster access.
-
-The C API is implemented as:
-extern Datum isn_out(PG_FUNCTION_ARGS);
-extern Datum ean13_out(PG_FUNCTION_ARGS);
-extern Datum ean13_in(PG_FUNCTION_ARGS);
-extern Datum isbn_in(PG_FUNCTION_ARGS);
-extern Datum ismn_in(PG_FUNCTION_ARGS);
-extern Datum issn_in(PG_FUNCTION_ARGS);
-extern Datum upc_in(PG_FUNCTION_ARGS);
-
-On success:
-+ isn_out() takes any of our types and returns a string containing
-  the shortest possible representation of the number.
-
-+ ean13_out() takes any of our types and returns the
-  EAN13 (long) representation of the number.
-
-+ ean13_in() takes a string and returns an EAN13. Which, as stated in (2),
-  may or may not be any of our types, but it certainly is an EAN13
-  number. Only if the string is a valid EAN13 number; otherwise it fails.
-
-+ isbn_in() takes a string and returns an ISBN/ISBN13. Only if the string
-  is really an ISBN/ISBN13; otherwise it fails.
-
-+ ismn_in() takes a string and returns an ISMN/ISMN13. Only if the string
-  is really an ISMN/ISMN13; otherwise it fails.
-
-+ issn_in() takes a string and returns an ISSN/ISSN13. Only if the string
-  is really an ISSN/ISSN13; otherwise it fails.
-
-+ upc_in() takes a string and returns a UPC. Only if the string is
-  really a UPC; otherwise it fails.
-
-(on failure, the functions 'ereport' the error)
-
--- Testing/Playing Functions
--------------------------------------------------
-isn_weak(boolean) - Sets the weak input mode.
-This function is intended for testing use only!
-isn_weak() gets the current status of the weak mode.
-
-"Weak" mode is used to be able to insert "invalid" data into a table.
-"Invalid" as in the check digit being wrong, not missing numbers.
-
-Why would you want to use the weak mode? Well, it could be that
-you have a huge collection of ISBN numbers, and that there are so many of
-them that for weird reasons some have the wrong check digit (perhaps the
-numbers were scanned from a printed list and the OCR got the numbers wrong,
-perhaps the numbers were manually captured... who knows). Anyway, the thing
-is you might want to clean the mess up, but you still want to be able to have
-all the numbers in your database and maybe use an external tool to access
-the invalid numbers in the database so you can verify the information and
-validate it more easily, e.g. by selecting all the invalid numbers in the
-table.
-
-When you insert invalid numbers into a table using the weak mode, the number
-will be inserted with the corrected check digit, but it will be flagged
-with an exclamation mark ('!') at the end (i.e. 0-11-000322-5!)
-
-You can also force the insertion of invalid numbers even when not in the
-weak mode, by appending the '!' character at the end of the number.
-
-To work with invalid numbers, you can use two functions:
-  + make_valid(), which validates an invalid number (deleting the invalid flag)
-  + is_valid(), which checks for the invalid flag presence.
-
--- Examples of Use
--------------------------------------------------
---Using the types directly:
-	select isbn('978-0-393-04002-9');
-	select isbn13('0901690546');
-	select issn('1436-4522');
-
---Casting types:
--- note that you can only cast from ean13 to another type when the casted
--- number would be valid in the realm of the casted type;
--- thus, the following will NOT work: select isbn(ean13('0220356483481'));
--- but these will:
-	select upc(ean13('0220356483481'));
-	select ean13(upc('220356483481'));
-
---Create a table with a single column to hold ISBN numbers:
-	create table test ( id isbn );
-	insert into test values('9780393040029');
-
---Automatically calculating check digits (observe the '?'):
-	insert into test values('220500896?');
-	insert into test values('978055215372?');
-
-	select issn('3251231?');
-	select ismn('979047213542?');
-
---Using the weak mode:
-	select isn_weak(true);
-	insert into test values('978-0-11-000533-4');
-	insert into test values('9780141219307');
-	insert into test values('2-205-00876-X');
-	select isn_weak(false);
-
-	select id from test where not is_valid(id);
-	update test set id=make_valid(id) where id = '2-205-00876-X!';
-
-	select * from test;
-
-	select isbn13(id) from test;
-
--- Contact
--------------------------------------------------
-Please send suggestions or bug reports to kronuz at users.sourceforge.net
-
-Last reviewed on August 23, 2006 by Kronuz.
diff --git a/contrib/lo/README.lo b/contrib/lo/README.lo
deleted file mode 100644
index a7b99940f2a6e022a436e4de1a826e58dfeed7be..0000000000000000000000000000000000000000
--- a/contrib/lo/README.lo
+++ /dev/null
@@ -1,88 +0,0 @@
-PostgreSQL type extension for managing Large Objects
-----------------------------------------------------
-
-Overview
-
-One of the problems with the JDBC driver (and this affects the ODBC driver
-also) is that the specification assumes that references to BLOBs (Binary
-Large OBjects) are stored within a table, and if that entry is changed, the
-associated BLOB is deleted from the database.
-
-As PostgreSQL stands, this doesn't occur. Large objects are treated as
-objects in their own right; a table entry can reference a large object by
-OID, but there can be multiple table entries referencing the same large
-object OID, so the system doesn't delete the large object just because you
-change or remove one such entry.
-
-Now this is fine for new PostgreSQL-specific applications, but existing ones
-using JDBC or ODBC won't delete the objects, resulting in orphaning - objects
-that are not referenced by anything, and simply occupy disk space.
-
-
-The Fix
-
-I've fixed this by creating a new data type 'lo', some support functions, and
-a trigger which handles the orphaning problem. The trigger essentially just
-does a 'lo_unlink' whenever you delete or modify a value referencing a large
-object. When you use this trigger, you are assuming that there is only one
-database reference to any large object that is referenced in a
-trigger-controlled column!
-
-The 'lo' type was created because we needed to differentiate between plain
-OIDs and Large Objects. Currently the JDBC driver handles this dilemma easily,
-but (after talking to Byron) the ODBC driver needed a unique type. They had
-created an 'lo' type, but not the solution to orphaning.
-
-You don't actually have to use the 'lo' type to use the trigger, but it may be
-convenient to use it to keep track of which columns in your database represent
-large objects that you are managing with the trigger.
-
-
-Install
-
-OK, first build the shared library and install it. Typing 'make install' in
-the contrib/lo directory should do it.
-
-Then, as the postgres superuser, run the lo.sql script in any database that
-needs the features. This will install the type, and define the support
-functions. You can run the script once in template1, and the objects will be
-inherited by subsequently-created databases.
-
-
-How to Use
-
-The easiest way is by an example:
-
-> create table image (title text, raster lo);
-> create trigger t_raster before update or delete on image
-> for each row execute procedure lo_manage(raster);
-
-Create a trigger for each column that contains a lo type, and give the column
-name as the trigger procedure argument. You can have more than one trigger on
-a table if you need multiple lo columns in the same table, but don't forget to
-give a different name to each trigger.
-
-
-Issues
-
-* Dropping a table will still orphan any objects it contains, as the trigger
-  is not executed.
-
-  Avoid this by preceding the 'drop table' with 'delete from {table}'
-  (see the sketch after this list).
-
-  If you already have, or suspect you have, orphaned large objects, see
-  the contrib/vacuumlo module to help you clean them up. It's a good idea
-  to run contrib/vacuumlo occasionally as a back-stop to the lo_manage
-  trigger.
-
-* Some frontends may create their own tables, and will not create the
-  associated trigger(s). Also, users may not remember (or know) to create
-  the triggers.
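-
-For example, a minimal sketch of the delete-before-drop workaround, reusing
-the 'image' table from the example above (the table name is illustrative):
-
-> delete from image;  -- fires the lo_manage trigger for each row,
->                     -- unlinking the referenced large objects
-> drop table image;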
-
-As the ODBC driver needs a permanent lo type (& JDBC could be optimised to
-use it if its Oid is fixed), and as the above issues can only be fixed by
-some internal changes, I feel it should become a permanent built-in type.
-
-I'm releasing this into contrib, just to get it out and tested.
-
-Peter Mount June 13 1998
diff --git a/contrib/ltree/README.ltree b/contrib/ltree/README.ltree
deleted file mode 100644
index a9d722d0514bfa515f8bd6e68dc19bae79dc9d4d..0000000000000000000000000000000000000000
--- a/contrib/ltree/README.ltree
+++ /dev/null
@@ -1,512 +0,0 @@
-contrib/ltree module
-
-ltree is a PostgreSQL contrib module which contains an implementation of data
-types, indexed access methods and queries for data organized as tree-like
-structures.
-This module works with PostgreSQL version 7.3.
-(A version for 7.2 is available from http://www.sai.msu.su/~megera/postgres/gist/ltree/ltree-7.2.tar.gz)
--------------------------------------------------------------------------------
-All work was done by Teodor Sigaev (teodor@stack.net) and Oleg Bartunov
-(oleg@sai.msu.su). See http://www.sai.msu.su/~megera/postgres/gist for
-additional information. The authors would like to thank Eugeny Rodichev for
-helpful discussions. Comments and bug reports are welcome.
--------------------------------------------------------------------------------
-
-LEGAL NOTICES: This module is released under the BSD license (as PostgreSQL
-itself). This work was done in the framework of the Russian Scientific
-Network and partially supported by the Russian Foundation for Basic Research
-and the Stack Group.
--------------------------------------------------------------------------------
-
-MOTIVATION
-
-This is a placeholder for an introduction to the problem. Hopefully, people
-reading this document don't need it too much :-)
-
-DEFINITIONS
-
-A label of a node is a sequence of one or more words separated by the blank
-character '_' and containing letters and digits (for example, [a-zA-Z0-9] for
-the C locale). The length of a label is limited to 256 bytes.
-
-Example: 'Countries', 'Personal_Services'
-
-A label path of a node is a sequence of one or more dot-separated labels
-l1.l2...ln, representing the path from the root to the node. The length of a
-label path is limited to 65Kb, but a size <= 2Kb is preferable. We consider
-this not a strict limitation (the maximal size of a label path in the DMOZ
-catalogue - http://www.dmoz.org - is about 240 bytes!)
-
-Example: 'Top.Countries.Europe.Russia'
-
-We introduce several datatypes:
-
-ltree
-  - is a datatype for a label path.
-
-ltree[]
-  - is a datatype for arrays of ltree.
-
-lquery
-  - is a path expression that has regular-expression-like syntax in the
-    label path and is used for ltree matching. The star symbol (*) is used
-    to specify any number of labels (levels) and can be used at the
-    beginning and the end of an lquery, for example, '*.Europe.*'.
-
-    The following quantifiers are recognized for '*' (like in Perl):
-
-	{n}	Match exactly n levels
-	{n,}	Match at least n levels
-	{n,m}	Match at least n but not more than m levels
-	{,m}	Match at maximum m levels (eq. to {0,m})
-
-    It is possible to use several modifiers at the end of a label:
-
-	@	Do case-insensitive label matching
-	*	Do prefix matching for a label
-	%	Don't account for the word separator '_' in label matching,
-		so that 'Russian%' would match 'Russian_nations', but not
-		'Russian'
-
-    An lquery can contain the logical '!' (NOT) at the beginning of a label
-    and '|' (OR) to specify possible alternatives for label matching.
-
-    Example of lquery:
-
-	Top.*{0,2}.sport*@.!football|tennis.Russ*|Spain
-	a)  b)     c)      d)                e)
-
-    This matches a label path that:
-    + a) begins at a node with label 'Top'
-    + b) followed by zero to two labels, until
-    + c) a node whose label begins with the case-insensitive prefix 'sport'
-    + d) followed by a node whose label does not match 'football' or
-	 'tennis', and
-    + e) ends at a node whose label begins with 'Russ' or exactly matches
-	 'Spain'.
-
-ltxtquery
-  - is a datatype for label searching (like the type 'query' for full text
-    searching, see contrib/tsearch). It's possible to use the modifiers
-    @,%,* at the end of a word. The meaning of the modifiers is the same as
-    for lquery.
-
-    Example: 'Europe & Russia*@ & !Transportation'
-
-    This searches for paths that contain the words 'Europe' and 'Russia*'
-    (case-insensitive) and not 'Transportation'. Notice that the order of
-    the words as they appear in the label path is not important!
-
-OPERATIONS
-
-The following operations are defined for type ltree:
-
-<,>,<=,>=,=, <>
-  - have their usual meanings. Comparison is done in the order of direct
-    tree traversal; children of a node are sorted lexicographically.
-ltree @> ltree
-  - returns TRUE if the left argument is an ancestor of the right argument
-    (or equal).
-ltree <@ ltree
-  - returns TRUE if the left argument is a descendant of the right argument
-    (or equal).
-ltree ~ lquery, lquery ~ ltree
-  - returns TRUE if the node represented by ltree satisfies lquery.
-ltree ? lquery[], lquery ? ltree[]
-  - returns TRUE if the node represented by ltree satisfies at least one
-    lquery from the array.
-ltree @ ltxtquery, ltxtquery @ ltree
-  - returns TRUE if the node represented by ltree satisfies ltxtquery.
-ltree || ltree, ltree || text, text || ltree
-  - returns the concatenated ltree.
-
-Operations for arrays of ltree (ltree[]):
-
-ltree[] @> ltree, ltree <@ ltree[]
-  - returns TRUE if the array ltree[] contains an ancestor of ltree.
-ltree @> ltree[], ltree[] <@ ltree
-  - returns TRUE if the array ltree[] contains a descendant of ltree.
-ltree[] ~ lquery, lquery ~ ltree[]
-  - returns TRUE if the array ltree[] contains label paths matching lquery.
-ltree[] ? lquery[], lquery[] ? ltree[]
-  - returns TRUE if the array ltree[] contains label paths matching at
-    least one lquery from the array.
-ltree[] @ ltxtquery, ltxtquery @ ltree[]
-  - returns TRUE if the array ltree[] contains label paths matching
-    ltxtquery (full text search).
-ltree[] ?@> ltree, ltree ?<@ ltree[], ltree[] ?~ lquery, ltree[] ?@ ltxtquery
-  - returns the first element of the array ltree[] that satisfies the
-    corresponding condition, or NULL if none does.
-
-REMARK
-
-The operations <@, @>, @ and ~ have analogues - ^<@, ^@>, ^@, ^~ - which
-don't use indexes!
-
-INDICES
-
-Various indexes can be created to speed up execution of operations:
-
- * B-tree index over ltree:
-   <, <=, =, >=, >
- * GiST index over ltree:
-   <, <=, =, >=, >, @>, <@, @, ~, ?
-   Example:
-   create index path_gist_idx on test using gist (path);
- * GiST index over ltree[]:
-   ltree[] <@ ltree, ltree @> ltree[], @, ~, ?.
-   Example:
-   create index path_gist_idx on test using gist (array_path);
-   Notice: This index is lossy.
-
-FUNCTIONS
-
-ltree subltree
-	ltree subltree(ltree, start, end)
-	returns the subpath of ltree from start (inclusive) until the end.
-	# select subltree('Top.Child1.Child2',1,2);
-	subltree
-	--------
-	Child1
-ltree subpath
-	ltree subpath(ltree, OFFSET,LEN)
-	ltree subpath(ltree, OFFSET)
-	returns the subpath of ltree from OFFSET (inclusive) with length LEN.
-	If OFFSET is negative, the subpath starts that far from the end
-	of the path.
-	If LEN is omitted, returns everything to the end
-	of the path. If LEN is negative, leaves that many labels off
-	the end of the path.
-	# select subpath('Top.Child1.Child2',1,2);
-	subpath
-	-------
-	Child1.Child2
-
-	# select subpath('Top.Child1.Child2',-2,1);
-	subpath
-	---------
-	Child1
-int4 nlevel
-
-	int4 nlevel(ltree)
-	returns the level of the node.
-	# select nlevel('Top.Child1.Child2');
-	nlevel
-	--------
-	3
-	Note that the arguments start, end, OFFSET and LEN all denote
-	levels of the node!
-
-int4 index(ltree,ltree), int4 index(ltree,ltree,OFFSET)
-	returns the level number of the first occurrence of the second
-	argument in the first one, beginning from OFFSET. If OFFSET is
-	negative, the search begins |OFFSET| levels from the end of the
-	path.
-	SELECT index('0.1.2.3.5.4.5.6.8.5.6.8','5.6',3);
-	index
-	-------
-	6
-	SELECT index('0.1.2.3.5.4.5.6.8.5.6.8','5.6',-4);
-	index
-	-------
-	9
-
-ltree text2ltree(text), text ltree2text(text)
-	cast functions between ltree and text.
-
-
-ltree lca(ltree,ltree,...) (up to 8 arguments)
-	ltree lca(ltree[])
-	Returns the Lowest Common Ancestor (lca).
-	# select lca('1.2.2.3','1.2.3.4.5.6');
-	lca
-	-----
-	1.2
-	# select lca('{la.2.3,1.2.3.4.5.6}') is null;
-	?column?
-	----------
-	f
-
-
-INSTALLATION
-
-	cd contrib/ltree
-	make
-	make install
-	make installcheck
-
-EXAMPLE OF USAGE
-
-	createdb ltreetest
-	psql ltreetest < /usr/local/pgsql/share/contrib/ltree.sql
-	psql ltreetest < ltreetest.sql
-
-Now we have a database ltreetest populated with data describing the
-hierarchy shown below:
-
-
-                            TOP
-                         /   |   \
-                 Science Hobbies Collections
-                /            |            \
-       Astronomy   Amateurs_Astronomy   Pictures
-          /  \                             |
-Astrophysics  Cosmology                Astronomy
-                                      /    |    \
-                               Galaxies  Stars  Astronauts
-
-Inheritance:
-
-ltreetest=# select path from test where path <@ 'Top.Science';
-                path
-------------------------------------
- Top.Science
- Top.Science.Astronomy
- Top.Science.Astronomy.Astrophysics
- Top.Science.Astronomy.Cosmology
-(4 rows)
-
-Matching:
-
-ltreetest=# select path from test where path ~ '*.Astronomy.*';
-                     path
------------------------------------------------
- Top.Science.Astronomy
- Top.Science.Astronomy.Astrophysics
- Top.Science.Astronomy.Cosmology
- Top.Collections.Pictures.Astronomy
- Top.Collections.Pictures.Astronomy.Stars
- Top.Collections.Pictures.Astronomy.Galaxies
- Top.Collections.Pictures.Astronomy.Astronauts
-(7 rows)
-ltreetest=# select path from test where path ~ '*.!pictures@.*.Astronomy.*';
-                path
-------------------------------------
- Top.Science.Astronomy
- Top.Science.Astronomy.Astrophysics
- Top.Science.Astronomy.Cosmology
-(3 rows)
-
-Full text search:
-
-ltreetest=# select path from test where path @ 'Astro*% & !pictures@';
-                path
-------------------------------------
- Top.Science.Astronomy
- Top.Science.Astronomy.Astrophysics
- Top.Science.Astronomy.Cosmology
- Top.Hobbies.Amateurs_Astronomy
-(4 rows)
-
-ltreetest=# select path from test where path @ 'Astro* & !pictures@';
-                path
-------------------------------------
- Top.Science.Astronomy
- Top.Science.Astronomy.Astrophysics
- Top.Science.Astronomy.Cosmology
-(3 rows)
-
-Using Functions:
-
-ltreetest=# select subpath(path,0,2)||'Space'||subpath(path,2) from test where path <@ 'Top.Science.Astronomy';
-                 ?column?
-------------------------------------------
- Top.Science.Space.Astronomy
- Top.Science.Space.Astronomy.Astrophysics
- Top.Science.Space.Astronomy.Cosmology
-(3 rows)
-We can create an SQL function:
-CREATE FUNCTION ins_label(ltree, int4, text) RETURNS ltree
-AS 'select subpath($1,0,$2) || $3 || subpath($1,$2);'
-LANGUAGE SQL IMMUTABLE;
-
-and the previous select can be rewritten as:
-
-ltreetest=# select ins_label(path,2,'Space') from test where path <@ 'Top.Science.Astronomy';
-                ins_label
-------------------------------------------
- Top.Science.Space.Astronomy
- Top.Science.Space.Astronomy.Astrophysics
- Top.Science.Space.Astronomy.Cosmology
-(3 rows)
-
-Or with other arguments:
-
-CREATE FUNCTION ins_label(ltree, ltree, text) RETURNS ltree
-AS 'select subpath($1,0,nlevel($2)) || $3 || subpath($1,nlevel($2));'
-LANGUAGE SQL IMMUTABLE;
-
-ltreetest=# select ins_label(path,'Top.Science'::ltree,'Space') from test where path <@ 'Top.Science.Astronomy';
-                ins_label
-------------------------------------------
- Top.Science.Space.Astronomy
- Top.Science.Space.Astronomy.Astrophysics
- Top.Science.Space.Astronomy.Cosmology
-(3 rows)
-
-ADDITIONAL DATA
-
-To get more of a feeling for the ltree module you can download
-dmozltree-eng.sql.gz (about a 3Mb tar.gz archive containing 300,274 nodes),
-available from http://www.sai.msu.su/~megera/postgres/gist/ltree/
-dmozltree-eng.sql.gz, which is the DMOZ catalogue, prepared for use with
-ltree. Set up your test database (dmoz), load the ltree module and issue
-the command:
-
-zcat dmozltree-eng.sql.gz | psql dmoz
-
-Data will be loaded into the database dmoz and all indexes will be created.
-
-BENCHMARKS
-
-All runs were performed on my IBM ThinkPad T21 (256 MB RAM, 750MHz) using
-DMOZ data, containing 300,274 nodes (see above for the download link). We
-used some basic queries typical for walking through a catalogue.
-
-QUERIES
-
- * Q0: Count all rows (sort of a base time for comparison)
-   select count(*) from dmoz;
-    count
-   --------
-    300274
-   (1 row)
- * Q1: Get direct children (without inheritance)
-   select path from dmoz where path ~ 'Top.Adult.Arts.Animation.*{1}';
-                  path
-   -----------------------------------
-    Top.Adult.Arts.Animation.Cartoons
-    Top.Adult.Arts.Animation.Anime
-   (2 rows)
- * Q2: The same as Q1 but with counting of successors
-   select path as parentpath , (select count(*)-1 from dmoz where path <@
-   p.path) as count from dmoz p where path ~ 'Top.Adult.Arts.Animation.*{1}';
-               parentpath              | count
-   -----------------------------------+-------
-    Top.Adult.Arts.Animation.Cartoons |     2
-    Top.Adult.Arts.Animation.Anime    |    61
-   (2 rows)
- * Q3: Get all parents
-   select path from dmoz where path @> 'Top.Adult.Arts.Animation' order by
-   path asc;
-              path
-   --------------------------
-    Top
-    Top.Adult
-    Top.Adult.Arts
-    Top.Adult.Arts.Animation
-   (4 rows)
- * Q4: Get all parents with counting of children
-   select path, (select count(*)-1 from dmoz where path <@ p.path) as count
-   from dmoz p where path @> 'Top.Adult.Arts.Animation' order by path asc;
-              path            | count
-   --------------------------+--------
-    Top                      | 300273
-    Top.Adult                |   4913
-    Top.Adult.Arts           |    339
-    Top.Adult.Arts.Animation |     65
-   (4 rows)
- * Q5: Get all children with levels
-   select path, nlevel(path) - nlevel('Top.Adult.Arts.Animation') as level
-   from dmoz where path ~ 'Top.Adult.Arts.Animation.*{1,2}' order by path asc;
-                        path                       | level
-   ------------------------------------------------+-------
-    Top.Adult.Arts.Animation.Anime                 |     1
-    Top.Adult.Arts.Animation.Anime.Fan_Works       |     2
-    Top.Adult.Arts.Animation.Anime.Games           |     2
-    Top.Adult.Arts.Animation.Anime.Genres          |     2
-    Top.Adult.Arts.Animation.Anime.Image_Galleries |     2
-    Top.Adult.Arts.Animation.Anime.Multimedia      |     2
-    Top.Adult.Arts.Animation.Anime.Resources       |     2
-    Top.Adult.Arts.Animation.Anime.Titles          |     2
-    Top.Adult.Arts.Animation.Cartoons              |     1
-    Top.Adult.Arts.Animation.Cartoons.AVS          |     2
-    Top.Adult.Arts.Animation.Cartoons.Members      |     2
-   (11 rows)
-
-Timings
-
-+---------------------------------------------+
-|Query|Rows|Time (ms) index|Time (ms) no index|
-|-----+----+---------------+------------------|
-|   Q0|   1|             NA|           1453.44|
-|-----+----+---------------+------------------|
-|   Q1|   2|           0.49|           1001.54|
-|-----+----+---------------+------------------|
-|   Q2|   2|           1.48|           3009.39|
-|-----+----+---------------+------------------|
-|   Q3|   4|           0.55|            906.98|
-|-----+----+---------------+------------------|
-|   Q4|   4|       24385.07|           4951.91|
-|-----+----+---------------+------------------|
-|   Q5|  11|           0.85|           1003.23|
-+---------------------------------------------+
-Timings without indexes were obtained using the operations which don't use
-indexes (see above).
-
-Remarks
-
-We didn't run full-scale tests; also, we didn't present (yet) data for
-operations with arrays of ltree (ltree[]) and full text searching. We'd
-appreciate your input. So far, some (rather obvious) results:
-
- * Indexes do help query execution
- * Q4 performs badly because one needs to read almost all data from the HDD
-
-CHANGES
-
-Mar 28, 2003
- - Added functions index(ltree,ltree,offset), text2ltree(text),
-   ltree2text(text)
-Feb 7, 2003
- - Add ? operation
-   Fix ~ operation bug: eg '1.1.1' ~ '*.1'
-   Optimize index storage
-Aug 9, 2002
- - Fixed very stupid but important bug :-)
-July 31, 2002
- - Now works on 64-bit platforms.
- - Added function lca - lowest common ancestor
- - Version for 7.2 is distributed as a separate package -
-   http://www.sai.msu.su/~megera/postgres/gist/ltree/ltree-7.2.tar.gz
-July 13, 2002
- - Initial release.
-
-TODO
-
- * Testing on 64-bit platforms. There are several known problems with byte
-   alignment; -- RESOLVED
- * Better documentation;
- * We plan (probably) to improve regular expression processing using
-   non-deterministic automata;
- * Some sort of XML support;
- * Better full text searching;
-
-SOME BACKGROUNDS
-
-The approach we use for ltree is much like the one we used in our other GiST
-based contrib modules (intarray, tsearch, tree, btree_gist, rtree_gist). The
-theoretical background is available in papers referenced from our GiST
-development page (http://www.sai.msu.su/~megera/postgres/gist).
-
-A hierarchical data structure (tree) is a set of nodes. Each node has a
-signature (LPS) of a fixed size, which is a hashed label path of that node.
-While traversing a tree we can *certainly* prune branches if
-
-LQS (bitwise AND) LPS != LQS
-
-where LQS is the signature of the lquery or ltxtquery, obtained in the same
-way as the LPS.
-
-ltree[]:
-For an array of ltree, the LPS is the bitwise OR of the signatures of *ALL*
-children reachable from that node. Signatures are stored in an RD-tree,
-implemented using GiST, which provides indexed access.
-
-ltree:
-For ltree we store the LPS in a B-tree, implemented using GiST. Each node
-entry is represented by (left_bound, signature, right_bound), so that we can
-speed up the operations <, <=, =, >=, > using left_bound and right_bound,
-and prune branches of the tree using the signature.
--------------------------------------------------------------------------------
-We ask people who find the module useful to send us a postcard to:
-Moscow, 119899, Universitetski pr.13, Moscow State University, Sternberg
-Astronomical Institute, Russia
-For: Bartunov O.S.
-and
-Moscow, Bratislavskaya str.23, appt. 18, Russia
-For: Sigaev F.G.
diff --git a/contrib/pageinspect/README.pageinspect b/contrib/pageinspect/README.pageinspect
deleted file mode 100644
index fc4991db6410f537f65fed67e0956444a72e35dd..0000000000000000000000000000000000000000
--- a/contrib/pageinspect/README.pageinspect
+++ /dev/null
@@ -1,94 +0,0 @@
-The functions in this module allow you to inspect the contents of data pages
-at a low level, for debugging purposes. All of these functions may be used
-only by superusers.
-
-1. Installation
-
-   $ make
-   $ make install
-   $ psql -e -f /usr/local/pgsql/share/contrib/pageinspect.sql test
-
-2. Functions included:
-
-   get_raw_page
-   ------------
-   get_raw_page reads one block of the named table and returns a copy as a
-   bytea field. This allows a single time-consistent copy of the block to be
-   made.
-
-   page_header
-   -----------
-   page_header shows fields which are common to all PostgreSQL heap and index
-   pages.
-
-   A page image obtained with get_raw_page should be passed as argument:
-
-   regression=# SELECT * FROM page_header(get_raw_page('pg_class',0));
-       lsn    | tli | flags | lower | upper | special | pagesize | version | prune_xid
-   -----------+-----+-------+-------+-------+---------+----------+---------+-----------
-    0/24A1B50 |   1 |     1 |   232 |   368 |    8192 |     8192 |       4 |         0
-   (1 row)
-
-   The returned columns correspond to the fields in the PageHeaderData struct.
-   See src/include/storage/bufpage.h for details.
-
-   heap_page_items
-   ---------------
-   heap_page_items shows all line pointers on a heap page. For those line
-   pointers that are in use, tuple headers are also shown.
-   All tuples are shown, whether or not the tuples were visible to an MVCC
-   snapshot at the time the raw page was copied.
-
-   A heap page image obtained with get_raw_page should be passed as argument:
-
-   test=# SELECT * FROM heap_page_items(get_raw_page('pg_class',0));
-
-   See src/include/storage/itemid.h and src/include/access/htup.h for
-   explanations of the fields returned.
-
-   bt_metap
-   --------
-   bt_metap() returns information about a btree index's metapage:
-
-   test=> SELECT * FROM bt_metap('pg_cast_oid_index');
-   -[ RECORD 1 ]-----
-   magic     | 340322
-   version   | 2
-   root      | 1
-   level     | 0
-   fastroot  | 1
-   fastlevel | 0
-
-   bt_page_stats
-   -------------
-   bt_page_stats() shows information about single btree pages:
-
-   test=> SELECT * FROM bt_page_stats('pg_cast_oid_index', 1);
-   -[ RECORD 1 ]-+-----
-   blkno         | 1
-   type          | l
-   live_items    | 256
-   dead_items    | 0
-   avg_item_size | 12
-   page_size     | 8192
-   free_size     | 4056
-   btpo_prev     | 0
-   btpo_next     | 0
-   btpo          | 0
-   btpo_flags    | 3
-
-   bt_page_items
-   -------------
-   bt_page_items() returns information about specific items on btree pages:
-
-   test=> SELECT * FROM bt_page_items('pg_cast_oid_index', 1);
-    itemoffset |  ctid  | itemlen | nulls | vars |    data
-   ------------+--------+---------+-------+------+-------------
-             1 | (0,1)  |      12 | f     | f    | 23 27 00 00
-             2 | (0,2)  |      12 | f     | f    | 24 27 00 00
-             3 | (0,3)  |      12 | f     | f    | 25 27 00 00
-             4 | (0,4)  |      12 | f     | f    | 26 27 00 00
-             5 | (0,5)  |      12 | f     | f    | 27 27 00 00
-             6 | (0,6)  |      12 | f     | f    | 28 27 00 00
-             7 | (0,7)  |      12 | f     | f    | 29 27 00 00
-             8 | (0,8)  |      12 | f     | f    | 2a 27 00 00
diff --git a/contrib/pg_freespacemap/README.pg_freespacemap b/contrib/pg_freespacemap/README.pg_freespacemap
deleted file mode 100644
index 9210419cb8c4d0bee00930cf0675fe90ff3e7474..0000000000000000000000000000000000000000
--- a/contrib/pg_freespacemap/README.pg_freespacemap
+++ /dev/null
@@ -1,173 +0,0 @@
-Pg_freespacemap - Real time queries on the free space map (FSM).
----------------
-
-  This module consists of two C functions: 'pg_freespacemap_relations()' and
-  'pg_freespacemap_pages()' that return a set of records, plus two views
-  'pg_freespacemap_relations' and 'pg_freespacemap_pages' for more
-  user-friendly access to the functions.
-
-  The module provides the ability to examine the contents of the free space
-  map, without having to restart or rebuild the server with additional
-  debugging code.
-
-  By default public access is REVOKED from the functions and views, just in
-  case there are security issues present in the code.
-
-
-Installation
-------------
-
-  Build and install the main PostgreSQL source, then this contrib module:
-
-  $ cd contrib/pg_freespacemap
-  $ gmake
-  $ gmake install
-
-
-  To register the functions and views:
-
-  $ psql -d <database> -f pg_freespacemap.sql
-
-
-Notes
------
-
-  The definitions for the columns exposed in the views are:
-
-  pg_freespacemap_relations
-
-   Column           |      references      | Description
-   ------------------+----------------------+----------------------------------
-   reltablespace    | pg_tablespace.oid    | Tablespace oid of the relation.
-   reldatabase      | pg_database.oid      | Database oid of the relation.
-   relfilenode      | pg_class.relfilenode | Relfilenode of the relation.
-   avgrequest       |                      | Moving average of free space
-                    |                      | requests (NULL for indexes)
-   interestingpages |                      | Count of pages last reported as
-                    |                      | containing useful free space.
-   storedpages      |                      | Count of pages actually stored
-                    |                      | in free space map.
-   nextpage         |                      | Page index (from 0) to start next
-                    |                      | search at.
-
-
-  pg_freespacemap_pages
-
-   Column         |      references      | Description
-   ----------------+----------------------+------------------------------------
-   reltablespace  | pg_tablespace.oid    | Tablespace oid of the relation.
-   reldatabase    | pg_database.oid      | Database oid of the relation.
-   relfilenode    | pg_class.relfilenode | Relfilenode of the relation.
-   relblocknumber |                      | Page number in the relation.
-   bytes          |                      | Free bytes in the page, or NULL
-                  |                      | for an index page (see below).
-
-
-  For pg_freespacemap_relations, there is one row for each relation in the
-  free space map. storedpages is the number of pages actually stored in the
-  map, while interestingpages is the number of pages the last VACUUM thought
-  had useful amounts of free space.
-
-  If storedpages is consistently less than interestingpages then it'd be a
-  good idea to increase max_fsm_pages. Also, if the number of rows in
-  pg_freespacemap_relations is close to max_fsm_relations, then you should
-  consider increasing max_fsm_relations.
-
-  For pg_freespacemap_pages, there is one row for each page in the free space
-  map. The number of rows for a relation will match the storedpages column
-  in pg_freespacemap_relations.
-
-  For indexes, what is tracked is entirely-unused pages, rather than free
-  space within pages. Therefore, the average request size and free bytes
-  within a page are not meaningful, and are shown as NULL.
-
-  Because the map is shared by all the databases, it will include relations
-  not belonging to the current database.
-
-  When either of the views is accessed, internal free space map locks are
-  taken, and a copy of the map data is made for them to display.
-  This ensures that the views produce a consistent set of results, while not
-  blocking normal activity longer than necessary. Nonetheless there
-  could be some impact on database performance if they are read often.
-
-
-Sample output - pg_freespacemap_relations
--------------
-
-regression=# \d pg_freespacemap_relations
-View "public.pg_freespacemap_relations"
-      Column       |  Type   | Modifiers
-------------------+---------+-----------
- reltablespace    | oid     |
- reldatabase      | oid     |
- relfilenode      | oid     |
- avgrequest       | integer |
- interestingpages | integer |
- storedpages      | integer |
- nextpage         | integer |
-View definition:
- SELECT p.reltablespace, p.reldatabase, p.relfilenode, p.avgrequest, p.interestingpages, p.storedpages, p.nextpage
-   FROM pg_freespacemap_relations() p(reltablespace oid, reldatabase oid, relfilenode oid, avgrequest integer, interestingpages integer, storedpages integer, nextpage integer);
-
-regression=# SELECT c.relname, r.avgrequest, r.interestingpages, r.storedpages
-             FROM pg_freespacemap_relations r INNER JOIN pg_class c
-             ON c.relfilenode = r.relfilenode INNER JOIN pg_database d
-             ON r.reldatabase = d.oid AND (d.datname = current_database())
-             ORDER BY r.storedpages DESC LIMIT 10;
-             relname             | avgrequest | interestingpages | storedpages
----------------------------------+------------+------------------+-------------
- onek                            |        256 |              109 |         109
- pg_attribute                    |        167 |               93 |          93
- pg_class                        |        191 |               49 |          49
- pg_attribute_relid_attnam_index |            |               48 |          48
- onek2                           |        256 |               37 |          37
- pg_depend                       |         95 |               26 |          26
- pg_type                         |        199 |               16 |          16
- pg_rewrite                      |       1011 |               13 |          13
- pg_class_relname_nsp_index      |            |               10 |          10
- pg_proc                         |        302 |                8 |           8
-(10 rows)
-
-
-Sample output - pg_freespacemap_pages
--------------
-
-regression=# \d pg_freespacemap_pages
-   View "public.pg_freespacemap_pages"
-     Column     |  Type   | Modifiers
-----------------+---------+-----------
- reltablespace  | oid     |
- reldatabase    | oid     |
- relfilenode    | oid     |
- relblocknumber | bigint  |
- bytes          | integer |
-View definition:
- SELECT p.reltablespace, p.reldatabase, p.relfilenode, p.relblocknumber, p.bytes
-   FROM pg_freespacemap_pages() p(reltablespace oid, reldatabase oid, relfilenode oid, relblocknumber bigint, bytes integer);
-
-regression=# SELECT c.relname, p.relblocknumber, p.bytes
-             FROM pg_freespacemap_pages p INNER JOIN pg_class c
-             ON c.relfilenode = p.relfilenode INNER JOIN pg_database d
-             ON (p.reldatabase = d.oid AND d.datname = current_database())
-             ORDER BY c.relname LIMIT 10;
-   relname    | relblocknumber | bytes
---------------+----------------+-------
- a_star       |              0 |  8040
- abstime_tbl  |              0 |  7908
- aggtest      |              0 |  8008
- altinhoid    |              0 |  8128
- altstartwith |              0 |  8128
- arrtest      |              0 |  7172
- b_star       |              0 |  7976
- box_tbl      |              0 |  7912
- bt_f8_heap   |             54 |  7728
- bt_i4_heap   |             49 |  8008
-(10 rows)
-
-
-
-Author
-------
-
-  * Mark Kirkwood
-
diff --git a/contrib/pg_standby/README.pg_standby b/contrib/pg_standby/README.pg_standby
deleted file mode 100644
index b0b55a25381b93172bd279a1320b6da1961d96e6..0000000000000000000000000000000000000000
--- a/contrib/pg_standby/README.pg_standby
+++ /dev/null
@@ -1,206 +0,0 @@
-pg_standby README		2006/12/08 Simon Riggs
-
-o What is pg_standby?
-
-  pg_standby allows the creation of a Warm Standby server.
-  It is designed to be a production-ready program, as well as a
-  customisable template should you require specific modifications.
-  Other configuration is required as well, all of which is
-  described in the main server manual.
-
-  The program is designed to be a wait-for restore_command,
-  required to turn a normal archive recovery into a Warm Standby.
-  Within the restore_command of recovery.conf you could
-  configure pg_standby in the following way:
-
-    restore_command = 'pg_standby archiveDir %f %p %r'
-
-  which would be sufficient to define that files will be restored
-  from archiveDir.
-
-o features of pg_standby
-
-  - pg_standby is written in C. So it is very portable
-    and easy to install.
-
-  - supports copy or link from a directory (only)
-
-  - source is easy to modify, with specifically designated
-    sections to modify for your own needs, allowing
-    interfaces to be written for additional Backup Archive Restore
-    (BAR) systems
-
-  - portable: tested on Linux and Windows
-
-o How to install pg_standby
-
-  $ make
-  $ make install
-
-o How to use pg_standby?
-
-  pg_standby should be used within the restore_command of the
-  recovery.conf file. See the main PostgreSQL manual for details.
-
-  The basic usage should be like this:
-
-    restore_command = 'pg_standby archiveDir %f %p %r'
-
-  with the pg_standby command usage as
-
-    pg_standby [OPTION]... ARCHIVELOCATION NEXTWALFILE XLOGFILEPATH [RESTARTWALFILE]
-
-  When used within the restore_command, the %f and %p macros
-  will provide the actual file and path required for the restore/recovery.
-
-  pg_standby assumes that ARCHIVELOCATION is a directory accessible by the
-  server-owning user.
-
-  If RESTARTWALFILE is specified, typically by using the %r option, then all
-  files prior to this file will be removed from ARCHIVELOCATION. This then
-  minimises the number of files that need to be held, whilst at the same
-  time maintaining restart capability. This capability additionally assumes
-  that the ARCHIVELOCATION directory is writable.
-
-o options
-
-  pg_standby allows the following command line switches
-
-  -c
-	use copy/cp command to restore WAL files from archive
-
-  -d
-	debug/logging option.
-
-  -k numfiles
-	Clean up files in the archive so that we maintain no more
-	than this many files in the archive. This parameter will
-	be silently ignored if RESTARTWALFILE is specified, since
-	that specification method is more accurate in determining
-	the correct cut-off point in the archive.
-
-	You should be wary of setting this number too low,
-	since this may mean you cannot restart the standby. This
-	is because the last restartpoint marked in the WAL files
-	may be many files in the past and can vary considerably.
-	This should be set to a value exceeding the number of WAL
-	files that can be recovered in 2*checkpoint_timeout seconds,
-	according to the value in the warm standby postgresql.conf.
-	It is wholly unrelated to the setting of checkpoint_segments
-	on either primary or standby.
-
-	Setting numfiles to zero will disable deletion of files
-	from ARCHIVELOCATION.
-
-	If in doubt, use a large value or do not set a value at all.
-
-	If you specify neither RESTARTWALFILE nor -k, then -k 0
-	will be assumed, i.e. keep all files in the archive.
-	Default=0, Min=0
-
-  -l
-	use ln command to restore WAL files from archive
-	WAL files will remain in archive
-
-	Link is more efficient, but the default is copy to
-	allow you to maintain the WAL archive for recovery
-	purposes as well as high-availability.
-	The default setting is not necessarily recommended;
-	consult the main database server manual for discussion.
-
-	On Windows, this option uses the Vista command mklink
-	to provide a file-to-file symbolic link. -l will
-	not work on versions of Windows prior to Vista.
-	Use the -c option instead.
-	see http://en.wikipedia.org/wiki/NTFS_symbolic_link
-
-  -r maxretries
-	the maximum number of times to retry the restore command if it
-	fails. After each failure, we wait for sleeptime * num_retries
-	so that the wait time increases progressively; by default
-	we will wait 5 secs, 10 secs, then 15 secs before reporting
-	the failure back to the database server. This will be
-	interpreted as an end of recovery and the Standby will come
-	up fully as a result.
-	Default=3, Min=0
-
-  -s sleeptime
-	the number of seconds to sleep between testing to see
-	if the file to be restored is available in the archive yet.
-	The default setting is not necessarily recommended;
-	consult the main database server manual for discussion.
-	Default=5, Min=1, Max=60
-
-  -t triggerfile
-	the presence of the triggerfile will cause recovery to end,
-	whether or not the next file is available.
-	It is recommended that you use a structured filename to
-	avoid confusion as to which server is being triggered
-	when multiple servers exist on the same system,
-	e.g. /tmp/pgsql.trigger.5432
-
-  -w maxwaittime
-	the maximum number of seconds to wait for the next file,
-	after which recovery will end and the Standby will come up.
-	A setting of zero means wait forever.
-	The default setting is not necessarily recommended;
-	consult the main database server manual for discussion.
-	Default=0, Min=0
-
-  Note: --help is not supported since pg_standby is not intended
-  for interactive use, except during dev/test
-
-o examples
-
-  Linux
-
-  archive_command = 'cp %p ../archive/%f'
-
-  restore_command = 'pg_standby -l -d -k 255 -r 2 -s 2 -w 0 -t /tmp/pgsql.trigger.5442 $PWD/../archive %f %p 2>> standby.log'
-
-  which will
-  - use an ln command to restore WAL files from archive
-  - produce logfile output in standby.log
-  - keep the last 255 full WAL files, plus the current one
-  - sleep for 2 seconds between checks to see whether the next WAL file
-    is full
-  - never timeout if the file is not found
-  - stop waiting when a trigger file called /tmp/pgsql.trigger.5442 appears
-
-  Windows
-
-  archive_command = 'copy %p ..\\archive\\%f'
-
-  Note that backslashes need to be doubled in the archive_command, but
-  *not* in the restore_command, in 8.2, 8.1, 8.0 on Windows.
-
-  restore_command = 'pg_standby -c -d -s 5 -w 0 -t C:\pgsql.trigger.5442 ..\archive %f %p 2>> standby.log'
-
-  which will
-  - use a copy command to restore WAL files from archive
-  - produce logfile output in standby.log
-  - sleep for 5 seconds between checks to see whether the next WAL file
    is full
-  - never timeout if the file is not found
-  - stop waiting when a trigger file called C:\pgsql.trigger.5442 appears
-
-o supported versions
-
-  pg_standby is designed to work with PostgreSQL 8.2 and later. It is
-  currently compatible across the minor differences between the way 8.3
-  and 8.2 operate.
-
-  PostgreSQL 8.3 provides the %r command line substitution, designed to
-  let pg_standby know the last file it needs to keep. If the last
-  parameter is omitted, no error is generated, allowing pg_standby to
-  function correctly with PostgreSQL 8.2 also. With PostgreSQL 8.2,
-  the -k option must be used if archive cleanup is required. This option
-  remains available in 8.3.
-
-o reported test success
-
-  SUSE Linux 10.2
-  Windows XP Pro
-
-o additional design notes
-
-  The use of a move command seems like it would be a good idea, but
-  this would prevent recovery from being restartable. Also, the last WAL
-  file is always requested twice from the archive.
diff --git a/contrib/pg_trgm/README.pg_trgm b/contrib/pg_trgm/README.pg_trgm
deleted file mode 100644
index e7ff73e4f10686ed2b6aba31bb1bd3a593b718b7..0000000000000000000000000000000000000000
--- a/contrib/pg_trgm/README.pg_trgm
+++ /dev/null
@@ -1,144 +0,0 @@
-trgm - Trigram matching for PostgreSQL
---------------------------------------
-
-Introduction
-
-  This module is sponsored by Delta-Soft Ltd., Moscow, Russia.
-
-  The pg_trgm contrib module provides functions and index classes
-  for determining the similarity of text based on trigram
-  matching.
-
-Definitions
-
-  Trigram (or Trigraph)
-
-    A trigram is a set of three consecutive characters taken
-    from a string. A string is considered to have two spaces
-    prefixed and one space suffixed when determining the set
-    of trigrams that comprise the string.
-
-    eg. The set of trigrams in the word "cat" is "  c", " ca",
-    "at " and "cat".
-
-Public Functions
-
-  real similarity(text, text)
-
-    Returns a number that indicates how closely the two arguments
-    match. A zero result indicates that the two words
-    are completely dissimilar, and a result of one indicates that
-    the two words are identical.
-
-  real show_limit()
-
-    Returns the current similarity threshold used by the '%'
-    operator. This in effect sets the minimum similarity between
-    two words in order that they be considered similar enough to
-    be misspellings of each other, for example.
-
-  real set_limit(real)
-
-    Sets the current similarity threshold that is used by the '%'
-    operator, and is returned by the show_limit() function.
-
-  text[] show_trgm(text)
-
-    Returns an array of all the trigrams of the supplied text
-    parameter.
-
-Public Operators
-
-  text % text (returns boolean)
-
-    The '%' operator returns TRUE if its two arguments have a similarity
-    that is greater than the similarity threshold set by set_limit(). It
-    will return FALSE if the similarity is less than the current
-    threshold.
-
-Public Index Operator Classes
-
-  gist_trgm_ops
-
-    The pg_trgm module comes with an index operator class that allows a
-    developer to create an index over a text column for the purpose
-    of very fast similarity searches.
-
-    To use this index, the '%' operator must be used and an appropriate
-    similarity threshold for the application must be set.
-
-    eg.
-
-    CREATE TABLE test_trgm (t text);
-    CREATE INDEX trgm_idx ON test_trgm USING gist (t gist_trgm_ops);
-
-    At this point, you will have an index on the t text column that you
-    can use for similarity searching.
-
-    eg.
-
-    SELECT
-      t,
-      similarity(t, 'word') AS sml
-    FROM
-      test_trgm
-    WHERE
-      t % 'word'
-    ORDER BY
-      sml DESC, t;
-
-    This will return all values in the text column that are sufficiently
-    similar to 'word', sorted from best match to worst. The index will
-    be used to make this a fast operation over very large data sets.
-
-Tsearch2 Integration
-
-  Trigram matching is a very useful tool when used in conjunction
-  with a text index created by the Tsearch2 contrib module. (See
-  contrib/tsearch2)
-
-  The first step is to generate an auxiliary table containing all
-  the unique words in the Tsearch2 index:
-
-    CREATE TABLE words AS SELECT word FROM
-      stat('SELECT to_tsvector(''simple'', bodytext) FROM documents');
-
-  Where 'documents' is a table that has a text field 'bodytext'
-  that Tsearch2 is used to search. The use of the 'simple' dictionary
-  with the to_tsvector function, instead of just using the already
-  existing vector, is to avoid creating a list of already stemmed
-  words. This way, only the original, unstemmed words are added
-  to the word list.
-
-  Next, create a trigram index on the word column:
-
-    CREATE INDEX words_idx ON words USING gist(word gist_trgm_ops);
-  or
-    CREATE INDEX words_idx ON words USING gin(word gin_trgm_ops);
-
-  Now, a SELECT query similar to the example above can be used to
-  suggest spellings for misspelled words in user search terms. A
-  useful extra clause is to ensure that the similar words are also
-  of similar length to the misspelled word; a sketch of such a
-  query follows.
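-
-  For instance (a sketch only; the misspelled term 'wrod' and the length
-  tolerance of 2 are illustrative, not part of the module):
-
-    SELECT word, similarity(word, 'wrod') AS sml
-    FROM words
-    WHERE word % 'wrod'                            -- trigram match, uses the index
-      AND abs(length(word) - length('wrod')) <= 2  -- keep words of similar length
-    ORDER BY sml DESC, word;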
-
-  Note: Since the 'words' table has been generated as a separate,
-  static table, it will need to be periodically regenerated so that
-  it remains up to date with the word list in the Tsearch2 index.
-
-Authors
-
-  Oleg Bartunov, Moscow, Moscow University, Russia
-  Teodor Sigaev, Moscow, Delta-Soft Ltd., Russia
-
-Contributors
-
-  Christopher Kings-Lynne wrote this README file
-
-References
-
-  Tsearch2 Development Site
-    http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/
-
-  GiST Development Site
-    http://www.sai.msu.su/~megera/postgres/gist/
-
diff --git a/contrib/pgbench/README.pgbench b/contrib/pgbench/README.pgbench
deleted file mode 100644
index b8572319e13b745eb0a700208870e319a09d3407..0000000000000000000000000000000000000000
--- a/contrib/pgbench/README.pgbench
+++ /dev/null
@@ -1,284 +0,0 @@
-$PostgreSQL: pgsql/contrib/pgbench/README.pgbench,v 1.20 2007/07/06 20:17:02 wieck Exp $
-
-pgbench README
-
-o What is pgbench?
-
-  pgbench is a simple program to run a benchmark test. pgbench is a
-  client application of PostgreSQL and runs with PostgreSQL only. It
-  performs lots of small and simple transactions including
-  SELECT/UPDATE/INSERT operations, then calculates the number of
-  transactions successfully completed within a second (transactions
-  per second, tps). The target data includes a table with at least
-  100k tuples.
-
-  Example output from pgbench looks like:
-
-	number of clients: 4
-	number of transactions per client: 100
-	number of processed transactions: 400/400
-	tps = 19.875015(including connections establishing)
-	tps = 20.098827(excluding connections establishing)
-
-  A similar program called "JDBCBench" already exists, but it requires
-  Java, which may not be available on every platform. Moreover, some
-  people are concerned that the overhead of Java might lead to
-  inaccurate results. So I decided to write it in pure C, and named
-  it "pgbench."
-
-o features of pgbench
-
-  - pgbench is written in C using libpq only. So it is very portable
-    and easy to install.
-
-  - pgbench can simulate concurrent connections using the asynchronous
-    capability of libpq. No threading is required.
-
-o How to install pgbench
-
-  $ make
-  $ make install
-
-o How to use pgbench?
-
-  (1) (optional) Initialize the database by:
-
-	pgbench -i <dbname>
-
-      where <dbname> is the name of the database. pgbench uses four
-      tables: accounts, branches, history and tellers. These tables will
-      be destroyed. Be very careful if you have tables with the same
-      names. Default test data contains:
-
-	table		# of tuples
-	-------------------------
-	branches	1
-	tellers		10
-	accounts	100000
-	history		0
-
-      You can increase the number of tuples by using the -s option. The
-      branches, tellers and accounts tables are created with a fillfactor
-      which is set using the -F option. See below.
-
-  (2) Run the benchmark test
-
-	pgbench <dbname>
-
-      The default configuration is:
-
-	number of clients: 1
-	number of transactions per client: 10
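-
-      As a quick-start sketch, (1) and (2) together look like this (the
-      database name "test" and the option values are illustrative only;
-      the options are described below):
-
-	pgbench -i -s 10 test     # initialize at scaling factor 10
-	pgbench -c 8 -t 100 test  # 8 clients, 100 transactions per client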
-
-o options
-
-  pgbench has a number of options.
-
-  -h hostname
-	the hostname where the backend is running. If this option
-	is omitted, pgbench will connect to the localhost via
-	a Unix domain socket.
-
-  -p port
-	the port number on which the backend is listening. The default
-	is libpq's default, usually 5432.
-
-  -c number_of_clients
-	Number of clients simulated. The default is 1.
-
-  -t number_of_transactions
-	Number of transactions each client runs. The default is 10.
-
-  -s scaling_factor
-	this should be used with the -i (initialize) option.
-	The number of tuples generated will be a multiple of the
-	scaling factor. For example, -s 100 will imply 10M
-	(10,000,000) tuples in the accounts table.
-	The default is 1. NOTE: the scaling factor should be at least
-	as large as the largest number of clients you intend
-	to test; else you'll mostly be measuring update contention.
-	Regular (not initializing) runs using one of the
-	built-in tests will detect the scale based on the number of
-	branches in the database. For custom (-f) runs it can
-	be manually specified with this parameter.
-
-  -D varname=value
-	Define a variable. It can be referred to by a script
-	provided using the -f option. Multiple -D options are allowed.
-
-  -U login
-	Specify the db user's login name if it is different from
-	the Unix login name.
-
-  -P password
-	Specify the db password. CAUTION: using this option
-	might be a security hole since the ps command will
-	show the password. Use this for TESTING PURPOSES ONLY.
-
-  -n
-	No vacuuming and cleaning of the history table prior to the
-	test is performed.
-
-  -v
-	Do vacuuming before testing. This will take some time.
-	With neither -n nor -v, pgbench will vacuum the tellers and
-	branches tables only.
-
-  -S
-	Perform select-only transactions instead of TPC-B.
-
-  -N
-	Do not update "branches" and "tellers". This will
-	avoid heavy update contention on branches and tellers,
-	though pgbench will then no longer be running TPC-B-like
-	transactions.
-
-  -f filename
-	Read the transaction script from a file. A detailed
-	explanation appears later.
-
-  -C
-	Establish a connection for each transaction, rather than
-	doing it just once at the beginning of the pgbench run as in
-	the normal mode. This is useful to measure the connection
-	overhead.
-
-  -l
-	Write the time taken by each transaction to a logfile,
-	with the name "pgbench_log.xxx", where xxx is the PID
-	of the pgbench process. The format of the log is:
-
-	client_id transaction_no time file_no time-epoch time-us
-
-	where time is measured in microseconds, file_no is
-	which test file was used (useful when multiple files were
-	specified with -f), and time-epoch/time-us are a
-	UNIX-epoch-format timestamp followed by an offset
-	in microseconds (suitable for creating an ISO 8601
-	timestamp with a fraction of a second) of when
-	the transaction completed.
-
-	Here are example outputs:
-
-	 0 199 2241 0 1175850568 995598
-	 0 200 2465 0 1175850568 998079
-	 0 201 2513 0 1175850569 608
-	 0 202 2038 0 1175850569 2663
-
-  -F fillfactor
-
-	Create the tables (accounts, tellers and branches) with the given
-	fillfactor. The default is 100. This should be used with the -i
-	(initialize) option.
-
-  -d
-	debug option.
-
-
-o What is the "transaction" actually performed in pgbench?
-
-	(1) begin;
-
-	(2) update accounts set abalance = abalance + :delta where aid = :aid;
-
-	(3) select abalance from accounts where aid = :aid;
-
-	(4) update tellers set tbalance = tbalance + :delta where tid = :tid;
-
-	(5) update branches set bbalance = bbalance + :delta where bid = :bid;
-
-	(6) insert into history(tid,bid,aid,delta) values(:tid,:bid,:aid,:delta);
-
-	(7) end;
-
-If you specify -N, (4) and (5) aren't included in the transaction.
-
-o -f option
-
-  This supports reading a transaction script from a specified
-  file. This file should include one SQL command per line; SQL
-  commands consisting of multiple lines are not supported. Empty lines
-  and lines beginning with "--" will be ignored.
-
-  Multiple -f options are allowed. In this case each transaction is
-  assigned a randomly chosen script.
-
-  SQL commands can be mixed with "meta commands", which begin with "\"
-  (backslash). A meta command takes arguments separated by white
-  space. Currently the following meta commands are supported:
-
-  \set name operand1 [ operator operand2 ]
-	sets the value calculated from "operand1" "operator"
-	"operand2" to the variable "name". If "operator" and "operand2"
-	are omitted, the value of operand1 is assigned to variable "name".
-
-	example:
-
-	\set ntellers 10 * :scale
-
-  \setrandom name min max
-
-	assigns a random integer between min and max to name
-
-	example:
-
-	\setrandom aid 1 100000
-
-  Variables can be referred to in SQL commands by adding ":" in front
-  of the variable name.
-
-	example:
-
-	SELECT abalance FROM accounts WHERE aid = :aid
-
-  Variables can also be defined by using the -D option.
-
-  \sleep num [us|ms|s]
-
-	causes script execution to sleep for the specified duration of
-	microseconds (us), milliseconds (ms) or seconds (s, the default).
-
-	example:
-
-	\setrandom millisec 1000 2500
-	\sleep :millisec ms
-
-  For example, a TPC-B-like benchmark can be defined as follows (scaling
-  factor = 1):
-
-\set nbranches :scale
-\set ntellers 10 * :scale
-\set naccounts 100000 * :scale
-\setrandom aid 1 :naccounts
-\setrandom bid 1 :nbranches
-\setrandom tid 1 :ntellers
-\setrandom delta 1 10000
-BEGIN
-UPDATE accounts SET abalance = abalance + :delta WHERE aid = :aid
-SELECT abalance FROM accounts WHERE aid = :aid
-UPDATE tellers SET tbalance = tbalance + :delta WHERE tid = :tid
-UPDATE branches SET bbalance = bbalance + :delta WHERE bid = :bid
-INSERT INTO history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, 'now')
-END
-
-If you want to automatically set the scaling factor from the number of
-tuples in the branches table, use the -s option and a shell command like
-this:
-
-pgbench -s $(psql -At -c "SELECT count(*) FROM branches") -f tpc_b.sql
-
-Notice that the -f option does not vacuum or clear the history
-table before starting the benchmark.
-
-o License?
-
-Basically it is the same as the BSD license. See pgbench.c for more details.
-
-o History before being contributed to PostgreSQL
-
-2000/1/15 pgbench-1.2 contributed to PostgreSQL
-	* Add -v option
-
-1999/09/29 pgbench-1.1 released
-	* Apply cygwin patches contributed by Yutaka Tanida
-	* More robust when backends die
-	* Add -S option (select only)
-
-1999/09/04 pgbench-1.0 released
diff --git a/contrib/pgcrypto/README.pgcrypto b/contrib/pgcrypto/README.pgcrypto
deleted file mode 100644
index 05f0e27781b5efd886cdfbe967177260c9564a69..0000000000000000000000000000000000000000
--- a/contrib/pgcrypto/README.pgcrypto
+++ /dev/null
@@ -1,709 +0,0 @@
-pgcrypto - cryptographic functions for PostgreSQL
-=================================================
-Marko Kreen
-
-// Note: this document is in asciidoc format.
-
-
-1. Installation
------------------
-
-Run the following commands:
-
-	make
-	make install
-	make installcheck
-
-The `make installcheck` command is important. It runs regression tests
-for the module. They make sure the functions here produce correct
-results.
-
-Next, to put the functions into a particular database, run the commands in
-the file pgcrypto.sql, which has been installed into the shared files
-directory.
-
-Example using psql:
-
-	psql -d DBNAME -f pgcrypto.sql
-
-
-2. Notes
-----------
-
-2.1. Configuration
-~~~~~~~~~~~~~~~~~~~~
-
-pgcrypto configures itself according to the findings of the main PostgreSQL
-`configure` script. The options that affect it are `--with-zlib` and
-`--with-openssl`.
-
-When compiled with zlib, PGP encryption functions are able to
-compress data before encrypting.
-
-When compiled with OpenSSL, more algorithms will be available.
-Public-key encryption functions will also be faster, as OpenSSL
-has more optimized BIGNUM functions.
-
-Summary of functionality with and without OpenSSL:
-
-`----------------------------`---------`------------
- Functionality                built-in  OpenSSL
------------------------------------------------------
- MD5                          yes       yes
- SHA1                         yes       yes
- SHA224/256/384/512           yes       yes (3)
- Any other digest algo        no        yes (1)
- Blowfish                     yes       yes
- AES                          yes       yes (2)
- DES/3DES/CAST5               no        yes
- Raw encryption               yes       yes
- PGP Symmetric encryption     yes       yes
- PGP Public-Key encryption    yes       yes
------------------------------------------------------
-
-1. Any digest algorithm OpenSSL supports is automatically picked up.
-   This is not possible with ciphers, which need to be supported
-   explicitly.
-
-2. AES is included in OpenSSL since version 0.9.7. If pgcrypto is
-   compiled against an older version, it will use the built-in AES code,
-   so it always has AES available.
-
-3. SHA2 algorithms were added to OpenSSL in version 0.9.8. For
-   older versions, pgcrypto will use built-in code.
-
-
-2.2. NULL handling
-~~~~~~~~~~~~~~~~~~~~
-
-As is standard in SQL, all functions return NULL if any of the arguments
-are NULL. This may create security risks with careless usage.
-
-
-2.3. Security
-~~~~~~~~~~~~~~~
-
-All the functions here run inside the database server. That means that all
-the data and passwords move between pgcrypto and the client application in
-clear text. Thus you must:
-
-1. Connect locally or use SSL connections.
-2. Trust both the system and database administrator.
-
-If you cannot, then it is better to do crypto inside the client application.
-
-
-3. General hashing
---------------------
-
-3.1. digest(data, type)
-~~~~~~~~~~~~~~~~~~~~~~~~~
-
-	digest(data text, type text) RETURNS bytea
-	digest(data bytea, type text) RETURNS bytea
-
-Here `type` is the algorithm to use. Standard algorithms are `md5` and
-`sha1`, although there may be more supported, depending on build
-options.
-
-Returns a binary hash.
-
-If you want a hexadecimal string, use `encode()` on the result. Example:
-
-	CREATE OR REPLACE FUNCTION sha1(bytea) RETURNS text AS $$
-		SELECT encode(digest($1, 'sha1'), 'hex')
-	$$ LANGUAGE SQL STRICT IMMUTABLE;
-
-
-3.2. hmac(data, key, type)
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-	hmac(data text, key text, type text) RETURNS bytea
-	hmac(data bytea, key text, type text) RETURNS bytea
-
-Calculates a hashed MAC over the data. `type` is the same as in `digest()`.
-If the key is larger than the hash block size, it will first be hashed and
-the hash will be used as the key.
-
-It is similar to digest(), but the hash can be recalculated only by knowing
-the key. This prevents the scenario of someone altering the data and also
-changing the hash to match.
-
-Returns a binary hash.
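-
-For instance, a hex-output wrapper in the same style as the digest()
-example above (a sketch; the function name is illustrative):
-
-	CREATE OR REPLACE FUNCTION hmac_sha1_hex(bytea, text) RETURNS text AS $$
-		SELECT encode(hmac($1, $2, 'sha1'), 'hex')
-	$$ LANGUAGE SQL STRICT IMMUTABLE;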
-
-The number is algorithm specific:
-
-`-----'---------'-----'----------
- type   default   min   max
----------------------------------
- `xdes`  725      1     16777215
- `bf`    6        4     31
----------------------------------
-
-In the case of xdes there is an additional limitation: the count must
-be an odd number.
-
-Notes:
-
-- Original DES crypt was designed to have the speed of 4 hashes per
-  second on the hardware of that time.
-- Anything slower than 4 hashes per second would probably dampen
-  usability.
-- Anything faster than 100 hashes per second is probably too fast.
-- See the next section about possible values for `crypt-bf`.
-
-
-4.4. Comparison of crypt and regular hashes
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-Here is a table that should give an overview of the relative slowness
-of different hashing algorithms.
-
-* The goal is to crack an 8-character password, which consists of:
-  1. only lowercase letters, or
-  2. numbers plus lower- and uppercase letters.
-* The table below shows how much time it would take to try all
-  combinations of characters.
-* `crypt-bf` is featured in several settings - the number after the
-  slash is the `rounds` parameter of `gen_salt()`.
-
-`------------'----------'--------------'--------------------
-Algorithm    Hashes/sec  Chars: [a-z]   Chars: [A-Za-z0-9]
--------------------------------------------------------------
-crypt-bf/8   28          246 years      251322 years
-crypt-bf/7   57          121 years      123457 years
-crypt-bf/6   112         62 years       62831 years
-crypt-bf/5   211         33 years       33351 years
-crypt-md5    2681        2.6 years      2625 years
-crypt-des    362837      7 days         19 years
-sha1         590223      4 days         12 years
-md5          2345086     1 day          3 years
--------------------------------------------------------------
-
-* The machine used is a 1.5GHz Pentium 4.
-* crypt-des and crypt-md5 algorithm numbers are taken from
-  John the Ripper v1.6.38 `-test` output.
-* MD5 numbers are from mdcrack 1.2.
-* SHA1 numbers are from lcrack-20031130-beta.
-* `crypt-bf` numbers were taken using a simple program that loops
-  over 1000 8-character passwords.  That way the speed with different
-  numbers of rounds can be shown.  For reference: `john -test` shows
-  213 loops/sec for crypt-bf/5.  (The small difference in results is
-  in accordance with the fact that the `crypt-bf` implementation in
-  pgcrypto is the same one used in John the Ripper.)
-
-Note that "try all combinations" is not a realistic exercise.
-Usually password cracking is done with the help of dictionaries, which
-contain both regular words and various mutations of them.  So, even
-somewhat word-like passwords could be cracked much faster than the
-above numbers suggest, while a 6-character non-word-like password may
-escape cracking.  Or not.
-
-
-5. PGP encryption
-------------------
-
-The functions here implement the encryption part of the OpenPGP
-(RFC 2440) standard.  Both symmetric-key and public-key encryption
-are supported.
-
-
-5.1. Overview
-~~~~~~~~~~~~~~~
-
-An encrypted PGP message consists of 2 packets:
-
-- A packet for the session key - either symmetric- or public-key
-  encrypted.
-- A packet for the session-key encrypted data.
-
-When encrypting with a password:
-
-1. The given password is hashed using a String2Key (S2K) algorithm.
-   This is rather similar to a `crypt()` algorithm - purposefully slow
-   and with a random salt - but it produces a full-length binary key.
-2. If a separate session key is requested, a new random key will be
-   generated.  Otherwise the S2K key will be used directly as the
-   session key.
-3. If the S2K key is to be used directly, then only the S2K settings
-   will be put into the session key packet.
   Otherwise the session key will be encrypted with the S2K key and
-   put into the session key packet.
-
-When encrypting with a public key:
-
-1. A new random session key is generated.
-2. It is encrypted using the public key and put into the session key
-   packet.
-
-Now the common part, the session-key encrypted data packet:
-
-1. Optional data manipulation: compression, conversion to UTF-8,
-   conversion of line-endings.
-2. The data is prefixed with a block of random bytes.  This is equal
-   to using a random IV.
-3. A SHA1 hash of the random prefix and the data is appended.
-4. All this is encrypted with the session key.
-
-
-5.2. pgp_sym_encrypt(data, psw)
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-    pgp_sym_encrypt(data text, psw text [, options text] ) RETURNS bytea
-    pgp_sym_encrypt_bytea(data bytea, psw text [, options text] ) RETURNS bytea
-
-Return a symmetric-key encrypted PGP message.
-
-Options are described in section 5.8.
-
-
-5.3. pgp_sym_decrypt(msg, psw)
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-    pgp_sym_decrypt(msg bytea, psw text [, options text] ) RETURNS text
-    pgp_sym_decrypt_bytea(msg bytea, psw text [, options text] ) RETURNS bytea
-
-Decrypt a symmetric-key encrypted PGP message.
-
-Decrypting bytea data with `pgp_sym_decrypt` is disallowed.
-This is to avoid outputting invalid character data.  Decrypting
-originally textual data with `pgp_sym_decrypt_bytea` is fine.
-
-Options are described in section 5.8.
-
-
-5.4. pgp_pub_encrypt(data, pub_key)
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-    pgp_pub_encrypt(data text, key bytea [, options text] ) RETURNS bytea
-    pgp_pub_encrypt_bytea(data bytea, key bytea [, options text] ) RETURNS bytea
-
-Encrypt data with a public key.  Giving this function a secret key
-will produce an error.
-
-Options are described in section 5.8.
-
-
-5.5. pgp_pub_decrypt(msg, sec_key [, psw])
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-    pgp_pub_decrypt(msg bytea, key bytea [, psw text [, options text]] ) \
-        RETURNS text
-    pgp_pub_decrypt_bytea(msg bytea, key bytea [,psw text [, options text]] ) \
-        RETURNS bytea
-
-Decrypt a public-key encrypted message with a secret key.  If the
-secret key is password-protected, you must give the password in `psw`.
-If there is no password, but you want to specify options for the
-function, you need to give an empty password.
-
-Decrypting bytea data with `pgp_pub_decrypt` is disallowed.
-This is to avoid outputting invalid character data.  Decrypting
-originally textual data with `pgp_pub_decrypt_bytea` is fine.
-
-Options are described in section 5.8.
-
-
-5.6. pgp_key_id(key / msg)
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-    pgp_key_id(key or msg bytea) RETURNS text
-
-If given a PGP public or secret key, it shows you the key ID.  If
-given an encrypted message, it shows the key ID that was used for
-encrypting the data.
-
-It can return 2 special key IDs:
-
-SYMKEY::
-    The data is encrypted with a symmetric key.
-
-ANYKEY::
-    The data is public-key encrypted, but the key ID is cleared.
-    That means you need to try all your secret keys on it to see
-    which one decrypts it.  pgcrypto itself does not produce such
-    messages.
-
-Note that different keys may have the same ID.  This is a rare but
-normal event.  The client application should then try to decrypt with
-each one, to see which fits - like handling ANYKEY.
-
-
-5.7. armor / dearmor
-~~~~~~~~~~~~~~~~~~~~~~
-
-    armor(data bytea) RETURNS text
-    dearmor(data text) RETURNS bytea
-
-These wrap/unwrap data in PGP ASCII Armor, which is basically Base64
-with a CRC and additional formatting.
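-
-As a quick round-trip illustration (a minimal sketch; the options
-string is optional, its keys are described in the next section, and
-`compress-algo` needs a zlib-enabled build):
-
-    SELECT pgp_sym_decrypt(
-        pgp_sym_encrypt('Secret message', 'mypass', 'compress-algo=1'),
-        'mypass');
-
-This should return the original cleartext.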
-
-
-5.8. Options for PGP functions
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-Options are named to be similar to GnuPG.  An option's value should be
-given after an equal sign; separate options from each other with
-commas.  Example:
-
-    pgp_sym_encrypt(data, psw, 'compress-algo=1, cipher-algo=aes256')
-
-All of the options except `convert-crlf` apply only to encrypt
-functions.  Decrypt functions get the parameters from the PGP data.
-
-The most interesting options are probably `compress-algo` and
-`unicode-mode`.  The rest should have reasonable defaults.
-
-
-cipher-algo::
-    Which cipher algorithm to use.
-
-    Values: bf, aes128, aes192, aes256 (OpenSSL-only: `3des`, `cast5`)
-    Default: aes128
-    Applies: pgp_sym_encrypt, pgp_pub_encrypt
-
-compress-algo::
-    Which compression algorithm to use.  Requires a build with zlib.
-
-    Values:
-        0 - no compression
-        1 - ZIP compression
-        2 - ZLIB compression [=ZIP plus meta-data and block-CRC's]
-    Default: 0
-    Applies: pgp_sym_encrypt, pgp_pub_encrypt
-
-compress-level::
-    How much to compress.  A bigger level compresses smaller but is
-    slower.  0 disables compression.
-
-    Values: 0, 1-9
-    Default: 6
-    Applies: pgp_sym_encrypt, pgp_pub_encrypt
-
-convert-crlf::
-    Whether to convert `\n` into `\r\n` when encrypting and `\r\n` to
-    `\n` when decrypting.  RFC 2440 specifies that text data should be
-    stored using `\r\n` line-feeds.  Use this to get fully
-    RFC-compliant behavior.
-
-    Values: 0, 1
-    Default: 0
-    Applies: pgp_sym_encrypt, pgp_pub_encrypt, pgp_sym_decrypt, pgp_pub_decrypt
-
-disable-mdc::
-    Do not protect data with SHA-1.  The only good reason to use this
-    option is to achieve compatibility with ancient PGP products, as
-    the SHA-1-protected packet comes from an upcoming update to
-    RFC 2440.  (Currently at version RFC2440bis-14.)  Recent gnupg.org
-    and pgp.com software supports it fine.
-
-    Values: 0, 1
-    Default: 0
-    Applies: pgp_sym_encrypt, pgp_pub_encrypt
-
-enable-session-key::
-    Use a separate session key.  Public-key encryption always uses a
-    separate session key; this option is for symmetric-key encryption,
-    which by default uses the S2K key directly.
-
-    Values: 0, 1
-    Default: 0
-    Applies: pgp_sym_encrypt
-
-s2k-mode::
-    Which S2K algorithm to use.
-
-    Values:
-        0 - Without salt.  Dangerous!
-        1 - With salt but with a fixed iteration count.
-        3 - Variable iteration count.
-    Default: 3
-    Applies: pgp_sym_encrypt
-
-s2k-digest-algo::
-    Which digest algorithm to use in the S2K calculation.
-
-    Values: md5, sha1
-    Default: sha1
-    Applies: pgp_sym_encrypt
-
-s2k-cipher-algo::
-    Which cipher to use for encrypting the separate session key.
-
-    Values: bf, aes, aes128, aes192, aes256
-    Default: use cipher-algo.
-    Applies: pgp_sym_encrypt
-
-unicode-mode::
-    Whether to convert textual data from the database internal
-    encoding to UTF-8 and back.  If your database already is UTF-8,
-    no conversion will be done, but the data will be tagged as UTF-8.
-    Without this option it will not be.
-
-    Values: 0, 1
-    Default: 0
-    Applies: pgp_sym_encrypt, pgp_pub_encrypt
-
-
-5.9. Generating keys with GnuPG
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-Generate a new key:
-
-    gpg --gen-key
-
-The preferred key type is "DSA and Elgamal".
-
-For RSA encryption you must create either a DSA or an RSA sign-only
-key as the master and then add an RSA encryption subkey with
-`gpg --edit-key`.
- -List keys: - - gpg --list-secret-keys - -Export ascii-armored public key: - - gpg -a --export KEYID > public.key - -Export ascii-armored secret key: - - gpg -a --export-secret-keys KEYID > secret.key - -You need to use `dearmor()` on them before giving them to -pgp_pub_* functions. Or if you can handle binary data, you can drop -"-a" from gpg. - -For more details see `man gpg`, http://www.gnupg.org/gph/en/manual.html[ -The GNU Privacy Handbook] and other docs on http://www.gnupg.org[] site. - - -5.10. Limitations of PGP code -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -- No support for signing. That also means that it is not checked - whether the encryption subkey belongs to master key. - -- No support for encryption key as master key. As such practice - is generally discouraged, it should not be a problem. - -- No support for several subkeys. This may seem like a problem, as this - is common practice. On the other hand, you should not use your regular - GPG/PGP keys with pgcrypto, but create new ones, as the usage scenario - is rather different. - - -6. Raw encryption -------------------- - -Those functions only run a cipher over data, they don't have any advanced -features of PGP encryption. Therefore they have some major problems: - -1. They use user key directly as cipher key. -2. They don't provide any integrity checking, to see - if the encrypted data was modified. -3. They expect that users manage all encryption parameters - themselves, even IV. -4. They don't handle text. - -So, with the introduction of PGP encryption, usage of raw -encryption functions is discouraged. - - - encrypt(data bytea, key bytea, type text) RETURNS bytea - decrypt(data bytea, key bytea, type text) RETURNS bytea - - encrypt_iv(data bytea, key bytea, iv bytea, type text) RETURNS bytea - decrypt_iv(data bytea, key bytea, iv bytea, type text) RETURNS bytea - -Encrypt/decrypt data with cipher, padding data if needed. - -`type` parameter description in pseudo-noteup: - - algo ['-' mode] ['/pad:' padding] - -Supported algorithms: - -* `bf` - Blowfish -* `aes` - AES (Rijndael-128) - -Modes: - -* `cbc` - next block depends on previous. (default) -* `ecb` - each block is encrypted separately. - (for testing only) - -Padding: - -* `pkcs` - data may be any length (default) -* `none` - data must be multiple of cipher block size. - -IV is initial value for mode, defaults to all zeroes. It is ignored for -ECB. It is clipped or padded with zeroes if not exactly block size. - -So, example: - - encrypt(data, 'fooz', 'bf') - -is equal to - - encrypt(data, 'fooz', 'bf-cbc/pad:pkcs') - - -7. Random bytes ------------------ - - gen_random_bytes(count integer) - -Returns `count` cryptographically strong random bytes as bytea value. -There can be maximally 1024 bytes extracted at a time. This is to avoid -draining the randomness generator pool. - - -8. Credits ------------- - -I have used code from following sources: - -`--------------------`-------------------------`------------------------------- - Algorithm Author Source origin -------------------------------------------------------------------------------- - DES crypt() David Burren and others FreeBSD libcrypt - MD5 crypt() Poul-Henning Kamp FreeBSD libcrypt - Blowfish crypt() Solar Designer www.openwall.com - Blowfish cipher Simon Tatham PuTTY - Rijndael cipher Brian Gladman OpenBSD sys/crypto - MD5 and SHA1 WIDE Project KAME kame/sys/crypto - SHA256/384/512 Aaron D. Gifford OpenBSD sys/crypto - BIGNUM math Michael J. 
Fromberger dartmouth.edu/~sting/sw/imath -------------------------------------------------------------------------------- - - -9. Legalese -------------- - -* I owe a beer to Poul-Henning. - - -10. References/Links ----------------------- - -10.1. Useful reading -~~~~~~~~~~~~~~~~~~~~~~ - -http://www.gnupg.org/gph/en/manual.html[]:: - The GNU Privacy Handbook - -http://www.openwall.com/crypt/[]:: - Describes the crypt-blowfish algorithm. - -http://www.stack.nl/~galactus/remailers/passphrase-faq.html[]:: - How to choose good password. - -http://world.std.com/~reinhold/diceware.html[]:: - Interesting idea for picking passwords. - -http://www.interhack.net/people/cmcurtin/snake-oil-faq.html[]:: - Describes good and bad cryptography. - - -10.2. Technical references -~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -http://www.ietf.org/rfc/rfc2440.txt[]:: - OpenPGP message format - -http://www.imc.org/draft-ietf-openpgp-rfc2440bis[]:: - New version of RFC2440. - -http://www.ietf.org/rfc/rfc1321.txt[]:: - The MD5 Message-Digest Algorithm - -http://www.ietf.org/rfc/rfc2104.txt[]:: - HMAC: Keyed-Hashing for Message Authentication - -http://www.usenix.org/events/usenix99/provos.html[]:: - Comparison of crypt-des, crypt-md5 and bcrypt algorithms. - -http://csrc.nist.gov/cryptval/des.htm[]:: - Standards for DES, 3DES and AES. - -http://en.wikipedia.org/wiki/Fortuna_(PRNG)[]:: - Description of Fortuna CSPRNG. - -http://jlcooke.ca/random/[]:: - Jean-Luc Cooke Fortuna-based /dev/random driver for Linux. - -http://www.cs.ut.ee/~helger/crypto/[]:: - Collection of cryptology pointers. - - -// $PostgreSQL: pgsql/contrib/pgcrypto/README.pgcrypto,v 1.19 2007/03/28 22:48:58 neilc Exp $ diff --git a/contrib/pgrowlocks/README.pgrowlocks b/contrib/pgrowlocks/README.pgrowlocks deleted file mode 100644 index 6964cc9c73e0463506d0870692640c437f1a317f..0000000000000000000000000000000000000000 --- a/contrib/pgrowlocks/README.pgrowlocks +++ /dev/null @@ -1,88 +0,0 @@ -$PostgreSQL: pgsql/contrib/pgrowlocks/README.pgrowlocks,v 1.2 2007/08/27 00:13:51 tgl Exp $ - -pgrowlocks README Tatsuo Ishii - -1. What is pgrowlocks? - - pgrowlocks shows row locking information for specified table. - - pgrowlocks returns following columns: - - locked_row TID, -- row TID - lock_type TEXT, -- lock type - locker XID, -- locking XID - multi bool, -- multi XID? - xids xid[], -- multi XIDs - pids INTEGER[] -- locker's process id - - Here is a sample execution of pgrowlocks: - -test=# SELECT * FROM pgrowlocks('t1'); - locked_row | lock_type | locker | multi | xids | pids -------------+-----------+--------+-------+-----------+--------------- - (0,1) | Shared | 19 | t | {804,805} | {29066,29068} - (0,2) | Shared | 19 | t | {804,805} | {29066,29068} - (0,3) | Exclusive | 804 | f | {804} | {29066} - (0,4) | Exclusive | 804 | f | {804} | {29066} -(4 rows) - - locked_row -- tuple ID(TID) of each locked rows - lock_type -- "Shared" for shared lock, "Exclusive" for exclusive lock - locker -- transaction ID of locker (note 1) - multi -- "t" if locker is a multi transaction, otherwise "f" - xids -- XIDs of lockers (note 2) - pids -- process ids of locking backends - - note1: if the locker is multi transaction, it represents the multi ID - - note2: if the locker is multi, multiple data are shown - -2. Installing pgrowlocks - - Installing pgrowlocks requires PostgreSQL 8.0 or later source tree. - - $ cd /usr/local/src/postgresql-8.1/contrib - $ tar xfz /tmp/pgrowlocks-1.0.tar.gz - - If you are using PostgreSQL 8.0, you need to modify pgrowlocks source code. 
-   Around line 61, you will see:
-
-    #undef MAKERANGEVARFROMNAMELIST_HAS_TWO_ARGS
-
-   change this to:
-
-    #define MAKERANGEVARFROMNAMELIST_HAS_TWO_ARGS
-
-   $ make
-   $ make install
-
-   $ psql -e -f pgrowlocks.sql test
-
-3. How to use pgrowlocks
-
-   pgrowlocks grabs an AccessShareLock on the target table and reads
-   each row one by one to collect the row locking information.  Note
-   that:
-
-   1) if the table is exclusively locked by someone else, pgrowlocks
-      will be blocked.
-
-   2) pgrowlocks may show incorrect information if a new lock is taken
-      or a lock is freed during its execution.
-
-   pgrowlocks does not show the contents of locked rows.  If you want
-   to take a look at the row contents at the same time, you could do
-   something like this:
-
-   SELECT * FROM accounts AS a, pgrowlocks('accounts') AS p WHERE p.locked_row = a.ctid;
-
-
-4. License
-
-   pgrowlocks is distributed under the (modified) BSD license
-   described in the source file.
-
-5. History
-
-   2006/03/21 pgrowlocks version 1.1 released (tested on 8.2 current)
-   2005/08/22 pgrowlocks version 1.0 released
diff --git a/contrib/pgstattuple/README.pgstattuple b/contrib/pgstattuple/README.pgstattuple
deleted file mode 100644
index 8b35ae32a1281bf00fe4fca59bdc01b47a1fe07f..0000000000000000000000000000000000000000
--- a/contrib/pgstattuple/README.pgstattuple
+++ /dev/null
@@ -1,102 +0,0 @@
-pgstattuple README			2002/08/29 Tatsuo Ishii
-
-1. Functions supported:
-
-    pgstattuple
-    -----------
-    pgstattuple() returns the relation length, the percentage of
-    "dead" tuples in a relation, and other info.  This may help users
-    determine whether vacuum is necessary or not.  Here is an example
-    session:
-
-	test=> \x
-	Expanded display is on.
-	test=> SELECT * FROM pgstattuple('pg_catalog.pg_proc');
-	-[ RECORD 1 ]------+-------
-	table_len          | 458752
-	tuple_count        | 1470
-	tuple_len          | 438896
-	tuple_percent      | 95.67
-	dead_tuple_count   | 11
-	dead_tuple_len     | 3157
-	dead_tuple_percent | 0.69
-	free_space         | 8932
-	free_percent       | 1.95
-
-    Here are explanations for each column:
-
-	table_len          -- physical relation length in bytes
-	tuple_count        -- number of live tuples
-	tuple_len          -- total tuples length in bytes
-	tuple_percent      -- live tuples in %
-	dead_tuple_count   -- number of dead tuples
-	dead_tuple_len     -- total dead tuples length in bytes
-	dead_tuple_percent -- dead tuples in %
-	free_space         -- free space in bytes
-	free_percent       -- free space in %
-
-    pg_relpages
-    -----------
-    pg_relpages() returns the number of pages in the relation.
-
-    pgstatindex
-    -----------
-    pgstatindex() returns a single row showing information about an index:
-
-	test=> \x
-	Expanded display is on.
-	test=> SELECT * FROM pgstatindex('pg_cast_oid_index');
-	-[ RECORD 1 ]------+------
-	version            | 2
-	tree_level         | 0
-	index_size         | 8192
-	root_block_no      | 1
-	internal_pages     | 0
-	leaf_pages         | 1
-	empty_pages        | 0
-	deleted_pages      | 0
-	avg_leaf_density   | 50.27
-	leaf_fragmentation | 0
-
-
-2. Installing pgstattuple
-
-    $ make
-    $ make install
-    $ psql -e -f /usr/local/pgsql/share/contrib/pgstattuple.sql test
-
-
-3. Using pgstattuple
-
-    pgstattuple may be called as a relation function and is
-    defined as follows:
-
-    CREATE OR REPLACE FUNCTION pgstattuple(text) RETURNS pgstattuple_type
-      AS 'MODULE_PATHNAME', 'pgstattuple'
-      LANGUAGE C STRICT;
-
-    CREATE OR REPLACE FUNCTION pgstattuple(oid) RETURNS pgstattuple_type
-      AS 'MODULE_PATHNAME', 'pgstattuplebyid'
-      LANGUAGE C STRICT;
-
-    The argument is the relation name (optionally it may be qualified)
-    or the OID of the relation.
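-
-    For example (a hypothetical session; both argument forms should be
-    equivalent, and the table name and OID here are made up):
-
-	SELECT * FROM pgstattuple('public.accounts');
-	SELECT * FROM pgstattuple(16384);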
Note that pgstattuple only returns - one row. - - -4. Notes - - pgstattuple acquires only a read lock on the relation. So concurrent - update may affect the result. - - pgstattuple judges a tuple is "dead" if HeapTupleSatisfiesNow() - returns false. - - -5. History - - 2007/05/17 - - Moved page-level functions to contrib/pageinspect. - - 2006/06/28 - - Extended to work against indexes. diff --git a/contrib/seg/README.seg b/contrib/seg/README.seg deleted file mode 100644 index 7fa29c44e08312cafad02198af3ef26b05017940..0000000000000000000000000000000000000000 --- a/contrib/seg/README.seg +++ /dev/null @@ -1,326 +0,0 @@ -This directory contains the code for the user-defined type, -SEG, representing laboratory measurements as floating point -intervals. - -RATIONALE -========= - -The geometry of measurements is usually more complex than that of a -point in a numeric continuum. A measurement is usually a segment of -that continuum with somewhat fuzzy limits. The measurements come out -as intervals because of uncertainty and randomness, as well as because -the value being measured may naturally be an interval indicating some -condition, such as the temperature range of stability of a protein. - -Using just common sense, it appears more convenient to store such data -as intervals, rather than pairs of numbers. In practice, it even turns -out more efficient in most applications. - -Further along the line of common sense, the fuzziness of the limits -suggests that the use of traditional numeric data types leads to a -certain loss of information. Consider this: your instrument reads -6.50, and you input this reading into the database. What do you get -when you fetch it? Watch: - -test=> select 6.50 as "pH"; - pH ---- -6.5 -(1 row) - -In the world of measurements, 6.50 is not the same as 6.5. It may -sometimes be critically different. The experimenters usually write -down (and publish) the digits they trust. 6.50 is actually a fuzzy -interval contained within a bigger and even fuzzier interval, 6.5, -with their center points being (probably) the only common feature they -share. We definitely do not want such different data items to appear the -same. - -Conclusion? It is nice to have a special data type that can record the -limits of an interval with arbitrarily variable precision. Variable in -a sense that each data element records its own precision. - -Check this out: - -test=> select '6.25 .. 6.50'::seg as "pH"; - pH ------------- -6.25 .. 6.50 -(1 row) - - -FILES -===== - -Makefile building instructions for the shared library - -README.seg the file you are now reading - -seg.c the implementation of this data type in c - -seg.sql.in SQL code needed to register this type with postgres - (transformed to seg.sql by make) - -segdata.h the data structure used to store the segments - -segparse.y the grammar file for the parser (used by seg_in() in seg.c) - -segscan.l scanner rules (used by seg_yyparse() in segparse.y) - -seg-validate.pl a simple input validation script. It is probably a - little stricter than the type itself: for example, - it rejects '22 ' because of the trailing space. Use - as a filter to discard bad values from a single column; - redirect to /dev/null to see the offending input - -sort-segments.pl a script to sort the tables having a SEG type column - - -INSTALLATION -============ - -To install the type, run - - make - make install - -The user running "make install" may need root access; depending on how you -configured the PostgreSQL installation paths. 
- -This only installs the type implementation and documentation. To make the -type available in any particular database, do - - psql -d databasename < seg.sql - -If you install the type in the template1 database, all subsequently created -databases will inherit it. - -To test the new type, after "make install" do - - make installcheck - -If it fails, examine the file regression.diffs to find out the reason (the -test code is a direct adaptation of the regression tests from the main -source tree). - - -SYNTAX -====== - -The external representation of an interval is formed using one or two -floating point numbers joined by the range operator ('..' or '...'). -Optional certainty indicators (<, > and ~) are ignored by the internal -logics, but are retained in the data. - -Grammar -------- - -rule 1 seg -> boundary PLUMIN deviation -rule 2 seg -> boundary RANGE boundary -rule 3 seg -> boundary RANGE -rule 4 seg -> RANGE boundary -rule 5 seg -> boundary -rule 6 boundary -> FLOAT -rule 7 boundary -> EXTENSION FLOAT -rule 8 deviation -> FLOAT - -Tokens ------- - -RANGE (\.\.)(\.)? -PLUMIN \'\+\-\' -integer [+-]?[0-9]+ -real [+-]?[0-9]+\.[0-9]+ -FLOAT ({integer}|{real})([eE]{integer})? -EXTENSION [<>~] - - -Examples of valid SEG representations: --------------------------------------- - -Any number (rules 5,6) -- creates a zero-length segment (a point, - if you will) - -~5.0 (rules 5,7) -- creates a zero-length segment AND records - '~' in the data. This notation reads 'approximately 5.0', - but its meaning is not recognized by the code. It is ignored - until you get the value back. View it is a short-hand comment. - -<5.0 (rules 5,7) -- creates a point at 5.0; '<' is ignored but - is preserved as a comment - ->5.0 (rules 5,7) -- creates a point at 5.0; '>' is ignored but - is preserved as a comment - -5(+-)0.3 -5'+-'0.3 (rules 1,8) -- creates an interval '4.7..5.3'. As of this - writing (02/09/2000), this mechanism isn't completely accurate - in determining the number of significant digits for the - boundaries. For example, it adds an extra digit to the lower - boundary if the resulting interval includes a power of ten: - - postgres=> select '10(+-)1'::seg as seg; - seg - --------- - 9.0 .. 11 -- should be: 9 .. 11 - - Also, the (+-) notation is not preserved: 'a(+-)b' will - always be returned as '(a-b) .. (a+b)'. The purpose of this - notation is to allow input from certain data sources without - conversion. - -50 .. (rule 3) -- everything that is greater than or equal to 50 - -.. 0 (rule 4) -- everything that is less than or equal to 0 - -1.5e-2 .. 2E-2 (rule 2) -- creates an interval (0.015 .. 0.02) - -1 ... 2 The same as 1...2, or 1 .. 2, or 1..2 (space is ignored). - Because of the widespread use of '...' in the data sources, - I decided to stick to is as a range operator. This, and - also the fact that the white space around the range operator - is ignored, creates a parsing conflict with numeric constants - starting with a decimal point. - - -Examples of invalid SEG input: ------------------------------- - -.1e7 should be: 0.1e7 -.1 .. .2 should be: 0.1 .. 0.2 -2.4 E4 should be: 2.4E4 - -The following, although it is not a syntax error, is disallowed to improve -the sanity of the data: - -5 .. 2 should be: 2 .. 5 - - -PRECISION -========= - -The segments are stored internally as pairs of 32-bit floating point -numbers. It means that the numbers with more than 7 significant digits -will be truncated. 
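-
-For example, a quick check (a hypothetical session; the exact
-truncated digits depend on float4 rounding):
-
-	test=> select '6.50'::seg as "pH", '1.2345678901'::seg as long;
-
-The first value should come back as 6.50, trailing zero intact, while
-the second is cut to roughly 7 significant digits.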
-
-The numbers with 7 or fewer significant digits retain their original
-precision.  That is, if your query returns 0.00, you can be sure that
-the trailing zeroes are not artifacts of formatting: they reflect the
-precision of the original data.  The number of leading zeroes does not
-affect precision: the value 0.0067 is considered to have just 2
-significant digits.
-
-
-USAGE
-=====
-
-The access method for SEG is a GiST index (gist_seg_ops), which is a
-generalization of R-tree.  GiSTs allow the postgres implementation of
-R-tree, originally encoded to support 2-D geometric types such as
-boxes and polygons, to be used with any data type whose data domain
-can be partitioned using the concepts of containment, intersection and
-equality.  In other words, everything that can intersect or contain
-its own kind can be indexed with a GiST.  That includes, among other
-things, all geometric data types, regardless of their dimensionality
-(see also contrib/cube).
-
-The operators supported by the GiST access method include:
-
-
-[a, b] << [c, d]	Is left of
-
-	The left operand, [a, b], occurs entirely to the left of the
-	right operand, [c, d], on the axis (-inf, inf).  It means,
-	[a, b] << [c, d] is true if b < c and false otherwise.
-
-[a, b] >> [c, d]	Is right of
-
-	[a, b] occurs entirely to the right of [c, d].
-	[a, b] >> [c, d] is true if a > d and false otherwise.
-
-[a, b] &< [c, d]	Overlaps or is left of
-
-	This might be better read as "does not extend to right of".
-	It is true when b <= d.
-
-[a, b] &> [c, d]	Overlaps or is right of
-
-	This might be better read as "does not extend to left of".
-	It is true when a >= c.
-
-[a, b] = [c, d]		Same as
-
-	The segments [a, b] and [c, d] are identical, that is, a == c
-	and b == d.
-
-[a, b] && [c, d]	Overlaps
-
-	The segments [a, b] and [c, d] overlap.
-
-[a, b] @> [c, d]	Contains
-
-	The segment [a, b] contains the segment [c, d], that is,
-	a <= c and b >= d.
-
-[a, b] <@ [c, d]	Contained in
-
-	The segment [a, b] is contained in [c, d], that is,
-	a >= c and b <= d.
-
-(Before PostgreSQL 8.2, the containment operators @> and <@ were
-respectively called @ and ~.  These names are still available, but are
-deprecated and will eventually be retired.  Notice that the old names
-are reversed from the convention formerly followed by the core
-geometric datatypes!)
-
-Although the mnemonics of the following operators are questionable, I
-preserved them to maintain visual consistency with other geometric
-data types defined in Postgres.
-
-Other operators:
-
-[a, b] < [c, d]		Less than
-[a, b] > [c, d]		Greater than
-
-	These operators do not make a lot of sense for any practical
-	purpose but sorting.  They first compare (a) to (c), and if
-	these are equal, compare (b) to (d).  That results in
-	reasonably good sorting in most cases, which is useful if
-	you want to use ORDER BY with this type.
-
-There are a few other potentially useful functions defined in seg.c
-that vanished from the schema because I stopped using them.  Some of
-these were meant to support type casting.  Let me know if I was wrong:
-I will then add them back to the schema.  I would also appreciate
-other ideas that would enhance the type and make it more useful.
-
-For examples of usage, see sql/seg.sql.
-
-NOTE: The performance of an R-tree index can largely depend on the
-order of input values.  It may be very helpful to sort the input table
-on the SEG column (see the script sort-segments.pl for an example).
-
-
-CREDITS
-=======
-
-My thanks are primarily to Prof. Joe Hellerstein
-(http://db.cs.berkeley.edu/~jmh/) for elucidating the gist of the GiST
-(http://gist.cs.berkeley.edu/).  I am also grateful to all postgres
-developers, present and past, for enabling me to create my own
-world and live undisturbed in it.  And I would like to acknowledge my
-gratitude to Argonne Lab and to the U.S. Department of Energy for the
-years of faithful support of my database research.
-
-
-------------------------------------------------------------------------
-Gene Selkov, Jr.
-Computational Scientist
-Mathematics and Computer Science Division
-Argonne National Laboratory
-9700 S Cass Ave.
-Building 221
-Argonne, IL 60439-4844
-
-selkovjr@mcs.anl.gov
-
diff --git a/contrib/sslinfo/README.sslinfo b/contrib/sslinfo/README.sslinfo
deleted file mode 100644
index 5ce13f54f5c799922a34bd9a071854b9ee6f120f..0000000000000000000000000000000000000000
--- a/contrib/sslinfo/README.sslinfo
+++ /dev/null
@@ -1,120 +0,0 @@
-sslinfo - information about current SSL certificate for PostgreSQL
-==================================================================
-Author: Victor Wagner, Cryptocom LTD
-E-Mail of Cryptocom OpenSSL development group:
-
-
-1. Notes
---------
-This extension won't build unless your PostgreSQL server is configured
-with --with-openssl.  The information provided by these functions
-would be completely useless if you don't use SSL to connect to the
-database.
-
-
-2. Functions Description
-------------------------
-
-2.1. ssl_is_used()
-~~~~~~~~~~~~~~~~~~
-
-	ssl_is_used() RETURNS boolean;
-
-Returns TRUE if the current connection to the server uses SSL, and
-FALSE otherwise.
-
-2.2. ssl_client_cert_present()
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-	ssl_client_cert_present() RETURNS boolean
-
-Returns TRUE if the current client has presented a valid SSL client
-certificate to the server, and FALSE otherwise (e.g., no SSL, or the
-certificate wasn't requested by the server).
-
-2.3. ssl_client_serial()
-~~~~~~~~~~~~~~~~~~~~~~~~
-
-	ssl_client_serial() RETURNS numeric
-
-Returns the serial number of the current client certificate.  The
-combination of certificate serial number and certificate issuer is
-guaranteed to uniquely identify a certificate (but not its owner --
-the owner ought to regularly change his keys and get new certificates
-from the issuer).
-
-So, if you run your own CA and allow only certificates from this CA
-to be accepted by the server, the serial number is the most reliable
-(albeit not very mnemonic) means to identify a user.
-
-2.4. ssl_client_dn()
-~~~~~~~~~~~~~~~~~~~~
-
-	ssl_client_dn() RETURNS text
-
-Returns the full subject of the current client certificate, converting
-character data into the current database encoding.  It is assumed that
-if you use non-Latin characters in the certificate names, your
-database is able to represent these characters, too.  If your database
-uses the SQL_ASCII encoding, non-Latin characters in the name will be
-represented as UTF-8 sequences.
-
-The result looks like '/CN=Somebody /C=Some country/O=Some organization'.
-
-2.5. ssl_issuer_dn()
-~~~~~~~~~~~~~~~~~~~~
-
-Returns the full issuer name of the client certificate, converting
-character data into the current database encoding.
-
-The combination of the return value of this function with the
-certificate serial number uniquely identifies the certificate.
-
-The result of this function is really useful only if you have more
-than one trusted CA certificate in your server's root.crt file, or if
-this CA has issued some intermediate certificate authority
-certificates.
-
-2.6. ssl_client_dn_field()
-~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-	ssl_client_dn_field(fieldName text) RETURNS text
-
-This function returns the value of the specified field in the
-certificate subject.  Field names are string constants that are
-converted into ASN.1 object identifiers using the OpenSSL object
-database.  The following values are acceptable:
-
-	commonName (alias CN)
-	surname (alias SN)
-	name
-	givenName (alias GN)
-	countryName (alias C)
-	localityName (alias L)
-	stateOrProvinceName (alias ST)
-	organizationName (alias O)
-	organizationUnitName (alias OU)
-	title
-	description
-	initials
-	postalCode
-	streetAddress
-	generationQualifier
-	description
-	dnQualifier
-	x500UniqueIdentifier
-	pseudonym
-	role
-	emailAddress
-
-All of these fields are optional, except commonName.  It depends
-entirely on your CA policy which of them would be included and which
-wouldn't.  The meaning of these fields, however, is strictly defined
-by the X.500 and X.509 standards, so you cannot just assign arbitrary
-meaning to them.
-
-2.7. ssl_issuer_field()
-~~~~~~~~~~~~~~~~~~~~~~~
-
-	ssl_issuer_field(fieldName text) RETURNS text;
-
-Does the same as ssl_client_dn_field(), but for the certificate issuer
-rather than the certificate subject.
diff --git a/contrib/tablefunc/README.tablefunc b/contrib/tablefunc/README.tablefunc
deleted file mode 100644
index c54f53231bb263ab07f4ec347efda23da53062d6..0000000000000000000000000000000000000000
--- a/contrib/tablefunc/README.tablefunc
+++ /dev/null
@@ -1,642 +0,0 @@
-/*
- * tablefunc
- *
- * Sample to demonstrate C functions which return setof scalar
- * and setof composite.
- * Joe Conway
- * And contributors:
- * Nabil Sayegh
- *
- * Copyright (c) 2002-2007, PostgreSQL Global Development Group
- *
- * Permission to use, copy, modify, and distribute this software and its
- * documentation for any purpose, without fee, and without a written agreement
- * is hereby granted, provided that the above copyright notice and this
- * paragraph and the following two paragraphs appear in all copies.
- *
- * IN NO EVENT SHALL THE AUTHORS OR DISTRIBUTORS BE LIABLE TO ANY PARTY FOR
- * DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
- * LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS
- * DOCUMENTATION, EVEN IF THE AUTHOR OR DISTRIBUTORS HAVE BEEN ADVISED OF THE
- * POSSIBILITY OF SUCH DAMAGE.
- *
- * THE AUTHORS AND DISTRIBUTORS SPECIFICALLY DISCLAIM ANY WARRANTIES,
- * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
- * AND FITNESS FOR A PARTICULAR PURPOSE.  THE SOFTWARE PROVIDED HEREUNDER IS
- * ON AN "AS IS" BASIS, AND THE AUTHOR AND DISTRIBUTORS HAS NO OBLIGATIONS TO
- * PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
- *
- */
-Version 0.1 (20 July, 2002):
-  First release
-
-Release Notes:
-
-  Version 0.1
-    - initial release
-
-Installation:
-  Place these files in a directory called 'tablefunc' under 'contrib' in
-  the PostgreSQL source tree.  Then run:
-
-    make
-    make install
-
-  You can use tablefunc.sql to create the functions in your database of
-  choice, e.g.
- - psql -U postgres template1 < tablefunc.sql - - installs following functions into database template1: - - normal_rand(int numvals, float8 mean, float8 stddev) - - returns a set of normally distributed float8 values - - crosstabN(text sql) - - returns a set of row_name plus N category value columns - - crosstab2(), crosstab3(), and crosstab4() are defined for you, - but you can create additional crosstab functions per the instructions - in the documentation below. - - crosstab(text sql) - - returns a set of row_name plus N category value columns - - requires anonymous composite type syntax in the FROM clause. See - the instructions in the documentation below. - - crosstab(text sql, N int) - - obsolete version of crosstab() - - the argument N is now ignored, since the number of value columns - is always determined by the calling query - - connectby(text relname, text keyid_fld, text parent_keyid_fld - [, text orderby_fld], text start_with, int max_depth - [, text branch_delim]) - - returns keyid, parent_keyid, level, and an optional branch string - and an optional serial column for ordering siblings - - requires anonymous composite type syntax in the FROM clause. See - the instructions in the documentation below. - -Documentation -================================================================== -Name - -normal_rand(int, float8, float8) - returns a set of normally - distributed float8 values - -Synopsis - -normal_rand(int numvals, float8 mean, float8 stddev) - -Inputs - - numvals - the number of random values to be returned from the function - - mean - the mean of the normal distribution of values - - stddev - the standard deviation of the normal distribution of values - -Outputs - - Returns setof float8, where the returned set of random values are normally - distributed (Gaussian distribution) - -Example usage - - test=# SELECT * FROM - test=# normal_rand(1000, 5, 3); - normal_rand ----------------------- - 1.56556322244898 - 9.10040991424657 - 5.36957140345079 - -0.369151492880995 - 0.283600703686639 - . - . - . - 4.82992125404908 - 9.71308014517282 - 2.49639286969028 -(1000 rows) - - Returns 1000 values with a mean of 5 and a standard deviation of 3. - -================================================================== -Name - -crosstabN(text) - returns a set of row_name plus N category value columns - -Synopsis - -crosstabN(text sql) - -Inputs - - sql - - A SQL statement which produces the source set of data. The SQL statement - must return one row_name column, one category column, and one value - column. row_name and value must be of type text. - - e.g. provided sql must produce a set something like: - - row_name cat value - ----------+-------+------- - row1 cat1 val1 - row1 cat2 val2 - row1 cat3 val3 - row1 cat4 val4 - row2 cat1 val5 - row2 cat2 val6 - row2 cat3 val7 - row2 cat4 val8 - -Outputs - - Returns setof tablefunc_crosstab_N, which is defined by: - - CREATE TYPE tablefunc_crosstab_N AS ( - row_name TEXT, - category_1 TEXT, - category_2 TEXT, - . - . - . - category_N TEXT - ); - - for the default installed functions, where N is 2, 3, or 4. - - e.g. the provided crosstab2 function produces a set something like: - <== values columns ==> - row_name category_1 category_2 - ---------+------------+------------ - row1 val1 val2 - row2 val5 val6 - -Notes - - 1. The sql result must be ordered by 1,2. - - 2. The number of values columns depends on the tuple description - of the function's declared return type. - - 3. Missing values (i.e. 
not enough adjacent rows of same row_name to - fill the number of result values columns) are filled in with nulls. - - 4. Extra values (i.e. too many adjacent rows of same row_name to fill - the number of result values columns) are skipped. - - 5. Rows with all nulls in the values columns are skipped. - - 6. The installed defaults are for illustration purposes. You - can create your own return types and functions based on the - crosstab() function of the installed library. See below for - details. - - -Example usage - -create table ct(id serial, rowclass text, rowid text, attribute text, value text); -insert into ct(rowclass, rowid, attribute, value) values('group1','test1','att1','val1'); -insert into ct(rowclass, rowid, attribute, value) values('group1','test1','att2','val2'); -insert into ct(rowclass, rowid, attribute, value) values('group1','test1','att3','val3'); -insert into ct(rowclass, rowid, attribute, value) values('group1','test1','att4','val4'); -insert into ct(rowclass, rowid, attribute, value) values('group1','test2','att1','val5'); -insert into ct(rowclass, rowid, attribute, value) values('group1','test2','att2','val6'); -insert into ct(rowclass, rowid, attribute, value) values('group1','test2','att3','val7'); -insert into ct(rowclass, rowid, attribute, value) values('group1','test2','att4','val8'); - -select * from crosstab3( - 'select rowid, attribute, value - from ct - where rowclass = ''group1'' - and (attribute = ''att2'' or attribute = ''att3'') order by 1,2;'); - - row_name | category_1 | category_2 | category_3 -----------+------------+------------+------------ - test1 | val2 | val3 | - test2 | val6 | val7 | -(2 rows) - -================================================================== -Name - -crosstab(text) - returns a set of row_names plus category value columns - -Synopsis - -crosstab(text sql) - -crosstab(text sql, int N) - -Inputs - - sql - - A SQL statement which produces the source set of data. The SQL statement - must return one row_name column, one category column, and one value - column. - - e.g. provided sql must produce a set something like: - - row_name cat value - ----------+-------+------- - row1 cat1 val1 - row1 cat2 val2 - row1 cat3 val3 - row1 cat4 val4 - row2 cat1 val5 - row2 cat2 val6 - row2 cat3 val7 - row2 cat4 val8 - - N - - Obsolete argument; ignored if supplied (formerly this had to match - the number of category columns determined by the calling query) - -Outputs - - Returns setof record, which must be defined with a column definition - in the FROM clause of the SELECT statement, e.g.: - - SELECT * - FROM crosstab(sql) AS ct(row_name text, category_1 text, category_2 text); - - the example crosstab function produces a set something like: - <== values columns ==> - row_name category_1 category_2 - ---------+------------+------------ - row1 val1 val2 - row2 val5 val6 - -Notes - - 1. The sql result must be ordered by 1,2. - - 2. The number of values columns is determined by the column definition - provided in the FROM clause. The FROM clause must define one - row_name column (of the same datatype as the first result column - of the sql query) followed by N category columns (of the same - datatype as the third result column of the sql query). You can - set up as many category columns as you wish. - - 3. Missing values (i.e. not enough adjacent rows of same row_name to - fill the number of result values columns) are filled in with nulls. - - 4. Extra values (i.e. 
too many adjacent rows of same row_name to fill - the number of result values columns) are skipped. - - 5. Rows with all nulls in the values columns are skipped. - - 6. You can avoid always having to write out a FROM clause that defines the - output columns by setting up a custom crosstab function that has - the desired output row type wired into its definition. - - There are two ways you can set up a custom crosstab function: - - A. Create a composite type to define your return type, similar to the - examples in the installation script. Then define a unique function - name accepting one text parameter and returning setof your_type_name. - For example, if your source data produces row_names that are TEXT, - and values that are FLOAT8, and you want 5 category columns: - - CREATE TYPE my_crosstab_float8_5_cols AS ( - row_name TEXT, - category_1 FLOAT8, - category_2 FLOAT8, - category_3 FLOAT8, - category_4 FLOAT8, - category_5 FLOAT8 - ); - - CREATE OR REPLACE FUNCTION crosstab_float8_5_cols(text) - RETURNS setof my_crosstab_float8_5_cols - AS '$libdir/tablefunc','crosstab' LANGUAGE C STABLE STRICT; - - B. Use OUT parameters to define the return type implicitly. - The same example could also be done this way: - - CREATE OR REPLACE FUNCTION crosstab_float8_5_cols(IN text, - OUT row_name TEXT, - OUT category_1 FLOAT8, - OUT category_2 FLOAT8, - OUT category_3 FLOAT8, - OUT category_4 FLOAT8, - OUT category_5 FLOAT8) - RETURNS setof record - AS '$libdir/tablefunc','crosstab' LANGUAGE C STABLE STRICT; - - -Example usage - -create table ct(id serial, rowclass text, rowid text, attribute text, value text); -insert into ct(rowclass, rowid, attribute, value) values('group1','test1','att1','val1'); -insert into ct(rowclass, rowid, attribute, value) values('group1','test1','att2','val2'); -insert into ct(rowclass, rowid, attribute, value) values('group1','test1','att3','val3'); -insert into ct(rowclass, rowid, attribute, value) values('group1','test1','att4','val4'); -insert into ct(rowclass, rowid, attribute, value) values('group1','test2','att1','val5'); -insert into ct(rowclass, rowid, attribute, value) values('group1','test2','att2','val6'); -insert into ct(rowclass, rowid, attribute, value) values('group1','test2','att3','val7'); -insert into ct(rowclass, rowid, attribute, value) values('group1','test2','att4','val8'); - -SELECT * -FROM crosstab( - 'select rowid, attribute, value - from ct - where rowclass = ''group1'' - and (attribute = ''att2'' or attribute = ''att3'') order by 1,2;', 3) -AS ct(row_name text, category_1 text, category_2 text, category_3 text); - - row_name | category_1 | category_2 | category_3 -----------+------------+------------+------------ - test1 | val2 | val3 | - test2 | val6 | val7 | -(2 rows) - -================================================================== -Name - -crosstab(text, text) - returns a set of row_name, extra, and - category value columns - -Synopsis - -crosstab(text source_sql, text category_sql) - -Inputs - - source_sql - - A SQL statement which produces the source set of data. The SQL statement - must return one row_name column, one category column, and one value - column. It may also have one or more "extra" columns. - - The row_name column must be first. The category and value columns - must be the last two columns, in that order. "extra" columns must be - columns 2 through (N - 2), where N is the total number of columns. - - The "extra" columns are assumed to be the same for all rows with the - same row_name. 
The values returned are copied from the first row - with a given row_name and subsequent values of these columns are ignored - until row_name changes. - - e.g. source_sql must produce a set something like: - SELECT row_name, extra_col, cat, value FROM foo; - - row_name extra_col cat value - ----------+------------+-----+--------- - row1 extra1 cat1 val1 - row1 extra1 cat2 val2 - row1 extra1 cat4 val4 - row2 extra2 cat1 val5 - row2 extra2 cat2 val6 - row2 extra2 cat3 val7 - row2 extra2 cat4 val8 - - category_sql - - A SQL statement which produces the distinct set of categories. The SQL - statement must return one category column only. category_sql must produce - at least one result row or an error will be generated. category_sql - must not produce duplicate categories or an error will be generated. - - e.g. SELECT DISTINCT cat FROM foo; - - cat - ------- - cat1 - cat2 - cat3 - cat4 - -Outputs - - Returns setof record, which must be defined with a column definition - in the FROM clause of the SELECT statement, e.g.: - - SELECT * FROM crosstab(source_sql, cat_sql) - AS ct(row_name text, extra text, cat1 text, cat2 text, cat3 text, cat4 text); - - the example crosstab function produces a set something like: - <== values columns ==> - row_name extra cat1 cat2 cat3 cat4 - ---------+-------+------+------+------+------ - row1 extra1 val1 val2 val4 - row2 extra2 val5 val6 val7 val8 - -Notes - - 1. source_sql must be ordered by row_name (column 1). - - 2. The number of values columns is determined at run-time. The - column definition provided in the FROM clause must provide for - the correct number of columns of the proper data types. - - 3. Missing values (i.e. not enough adjacent rows of same row_name to - fill the number of result values columns) are filled in with nulls. - - 4. Extra values (i.e. source rows with category not found in category_sql - result) are skipped. - - 5. Rows with a null row_name column are skipped. - - 6. You can create predefined functions to avoid having to write out - the result column names/types in each query. See the examples - for crosstab(text). 
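-
-   For instance, a sketch of such a predefined function (this assumes
-   the two-parameter crosstab is backed by the C function
-   'crosstab_hash', as in tablefunc.sql; the output columns match the
-   cth example below):
-
-   CREATE OR REPLACE FUNCTION crosstab_cth(IN text, IN text,
-	OUT rowid text,
-	OUT rowdt timestamp,
-	OUT temperature int4,
-	OUT test_result text,
-	OUT test_startdate timestamp,
-	OUT volts float8)
-   RETURNS setof record
-   AS '$libdir/tablefunc','crosstab_hash' LANGUAGE C STABLE STRICT;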
- - -Example usage - -create table cth(id serial, rowid text, rowdt timestamp, attribute text, val text); -insert into cth values(DEFAULT,'test1','01 March 2003','temperature','42'); -insert into cth values(DEFAULT,'test1','01 March 2003','test_result','PASS'); -insert into cth values(DEFAULT,'test1','01 March 2003','volts','2.6987'); -insert into cth values(DEFAULT,'test2','02 March 2003','temperature','53'); -insert into cth values(DEFAULT,'test2','02 March 2003','test_result','FAIL'); -insert into cth values(DEFAULT,'test2','02 March 2003','test_startdate','01 March 2003'); -insert into cth values(DEFAULT,'test2','02 March 2003','volts','3.1234'); - -SELECT * FROM crosstab -( - 'SELECT rowid, rowdt, attribute, val FROM cth ORDER BY 1', - 'SELECT DISTINCT attribute FROM cth ORDER BY 1' -) -AS -( - rowid text, - rowdt timestamp, - temperature int4, - test_result text, - test_startdate timestamp, - volts float8 -); - rowid | rowdt | temperature | test_result | test_startdate | volts --------+--------------------------+-------------+-------------+--------------------------+-------- - test1 | Sat Mar 01 00:00:00 2003 | 42 | PASS | | 2.6987 - test2 | Sun Mar 02 00:00:00 2003 | 53 | FAIL | Sat Mar 01 00:00:00 2003 | 3.1234 -(2 rows) - -================================================================== -Name - -connectby(text, text, text[, text], text, text, int[, text]) - returns a set - representing a hierarchy (tree structure) - -Synopsis - -connectby(text relname, text keyid_fld, text parent_keyid_fld - [, text orderby_fld], text start_with, int max_depth - [, text branch_delim]) - -Inputs - - relname - - Name of the source relation - - keyid_fld - - Name of the key field - - parent_keyid_fld - - Name of the key_parent field - - orderby_fld - - If optional ordering of siblings is desired: - Name of the field to order siblings - - start_with - - root value of the tree input as a text value regardless of keyid_fld type - - max_depth - - zero (0) for unlimited depth, otherwise restrict level to this depth - - branch_delim - - If optional branch value is desired, this string is used as the delimiter. - When not provided, a default value of '~' is used for internal - recursion detection only, and no "branch" field is returned. - -Outputs - - Returns setof record, which must defined with a column definition - in the FROM clause of the SELECT statement, e.g.: - - SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'row2', 0, '~') - AS t(keyid text, parent_keyid text, level int, branch text); - - - or - - - SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'row2', 0) - AS t(keyid text, parent_keyid text, level int); - - - or - - - SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2', 0, '~') - AS t(keyid text, parent_keyid text, level int, branch text, pos int); - - - or - - - SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2', 0) - AS t(keyid text, parent_keyid text, level int, pos int); - -Notes - - 1. keyid and parent_keyid must be the same data type - - 2. The column definition *must* include a third column of type INT4 for - the level value output - - 3. If the branch field is not desired, omit both the branch_delim input - parameter *and* the branch field in the query column definition. Note - that when branch_delim is not provided, a default value of '~' is used - for branch_delim for internal recursion detection, even though the branch - field is not returned. - - 4. 
If the branch field is desired, it must be the fourth column in the query - column definition, and it must be type TEXT. - - 5. The parameters representing table and field names must include double - quotes if the names are mixed-case or contain special characters. - - 6. If sorting of siblings is desired, the orderby_fld input parameter *and* - a name for the resulting serial field (type INT32) in the query column - definition must be given. - -Example usage - -CREATE TABLE connectby_tree(keyid text, parent_keyid text, pos int); - -INSERT INTO connectby_tree VALUES('row1',NULL, 0); -INSERT INTO connectby_tree VALUES('row2','row1', 0); -INSERT INTO connectby_tree VALUES('row3','row1', 0); -INSERT INTO connectby_tree VALUES('row4','row2', 1); -INSERT INTO connectby_tree VALUES('row5','row2', 0); -INSERT INTO connectby_tree VALUES('row6','row4', 0); -INSERT INTO connectby_tree VALUES('row7','row3', 0); -INSERT INTO connectby_tree VALUES('row8','row6', 0); -INSERT INTO connectby_tree VALUES('row9','row5', 0); - --- with branch, without orderby_fld -SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'row2', 0, '~') - AS t(keyid text, parent_keyid text, level int, branch text); - keyid | parent_keyid | level | branch --------+--------------+-------+--------------------- - row2 | | 0 | row2 - row4 | row2 | 1 | row2~row4 - row6 | row4 | 2 | row2~row4~row6 - row8 | row6 | 3 | row2~row4~row6~row8 - row5 | row2 | 1 | row2~row5 - row9 | row5 | 2 | row2~row5~row9 -(6 rows) - --- without branch, without orderby_fld -SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'row2', 0) - AS t(keyid text, parent_keyid text, level int); - keyid | parent_keyid | level --------+--------------+------- - row2 | | 0 - row4 | row2 | 1 - row6 | row4 | 2 - row8 | row6 | 3 - row5 | row2 | 1 - row9 | row5 | 2 -(6 rows) - --- with branch, with orderby_fld (notice that row5 comes before row4) -SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2', 0, '~') - AS t(keyid text, parent_keyid text, level int, branch text, pos int) ORDER BY t.pos; - keyid | parent_keyid | level | branch | pos --------+--------------+-------+---------------------+----- - row2 | | 0 | row2 | 1 - row5 | row2 | 1 | row2~row5 | 2 - row9 | row5 | 2 | row2~row5~row9 | 3 - row4 | row2 | 1 | row2~row4 | 4 - row6 | row4 | 2 | row2~row4~row6 | 5 - row8 | row6 | 3 | row2~row4~row6~row8 | 6 -(6 rows) - --- without branch, with orderby_fld (notice that row5 comes before row4) -SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2', 0) - AS t(keyid text, parent_keyid text, level int, pos int) ORDER BY t.pos; - keyid | parent_keyid | level | pos --------+--------------+-------+----- - row2 | | 0 | 1 - row5 | row2 | 1 | 2 - row9 | row5 | 2 | 3 - row4 | row2 | 1 | 4 - row6 | row4 | 2 | 5 - row8 | row6 | 3 | 6 -(6 rows) - -================================================================== --- Joe Conway - diff --git a/contrib/uuid-ossp/README.uuid-ossp b/contrib/uuid-ossp/README.uuid-ossp deleted file mode 100644 index 6c5b0d04ed1f2363642546ea725aacb5fdfc331a..0000000000000000000000000000000000000000 --- a/contrib/uuid-ossp/README.uuid-ossp +++ /dev/null @@ -1,97 +0,0 @@ -UUID Generation Functions -========================= -Peter Eisentraut - -This module provides functions to generate universally unique -identifiers (UUIDs) using one of the several standard algorithms, as -well as functions to produce certain special UUID constants. 
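-
-For example, once the module is installed (a hypothetical session;
-generated values will of course differ), the functions described below
-can be called directly:
-
-    SELECT uuid_generate_v4();
-    SELECT uuid_generate_v5(uuid_ns_url(), 'http://www.postgresql.org');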
- - -Installation ------------- - -The extra library required can be found at -. - - -UUID Generation ---------------- - -The relevant standards ITU-T Rec. X.667, ISO/IEC 9834-8:2005, and RFC -4122 specify four algorithms for generating UUIDs, identified by the -version numbers 1, 3, 4, and 5. (There is no version 2 algorithm.) -Each of these algorithms could be suitable for a different set of -applications. - -uuid_generate_v1() -~~~~~~~~~~~~~~~~~~ - -This function generates a version 1 UUID. This involves the MAC -address of the computer and a time stamp. Note that UUIDs of this -kind reveal the identity of the computer that created the identifier -and the time at which it did so, which might make it unsuitable for -certain security-sensitive applications. - -uuid_generate_v1mc() -~~~~~~~~~~~~~~~~~~~~ - -This function generates a version 1 UUID but uses a random multicast -MAC address instead of the real MAC address of the computer. - -uuid_generate_v3(namespace uuid, name text) -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -This function generates a version 3 UUID in the given namespace using -the specified input name. The namespace should be one of the special -constants produced by the uuid_ns_*() functions shown below. (It -could be any UUID in theory.) The name is an identifier in the -selected namespace. For example: - - uuid_generate_v3(uuid_ns_url(), 'http://www.postgresql.org') - -The name parameter will be MD5-hashed, so the cleartext cannot be -derived from the generated UUID. - -The generation of UUIDs by this method has no random or -environment-dependent element and is therefore reproducible. - -uuid_generate_v4() -~~~~~~~~~~~~~~~~~~ - -This function generates a version 4 UUID, which is derived entirely -from random numbers. - -uuid_generate_v5(namespace uuid, name text) -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -This function generates a version 5 UUID, which works like a version 3 -UUID except that SHA-1 is used as a hashing method. Version 5 should -be preferred over version 3 because SHA-1 is thought to be more secure -than MD5. - - -UUID Constants --------------- - - uuid_nil() - -A "nil" UUID constant, which does not occur as a real UUID. - - uuid_ns_dns() - -Constant designating the DNS namespace for UUIDs. - - uuid_ns_url() - -Constant designating the URL namespace for UUIDs. - - uuid_ns_oid() - -Constant designating the ISO object identifier (OID) namespace for -UUIDs. (This pertains to ASN.1 OIDs, unrelated to the OIDs used in -PostgreSQL.) - - uuid_ns_x500() - -Constant designating the X.500 distinguished name (DN) namespace for -UUIDs. diff --git a/contrib/vacuumlo/README.vacuumlo b/contrib/vacuumlo/README.vacuumlo deleted file mode 100644 index 560649a65436f0a6b315d59de106c1a187328636..0000000000000000000000000000000000000000 --- a/contrib/vacuumlo/README.vacuumlo +++ /dev/null @@ -1,58 +0,0 @@ -$PostgreSQL: pgsql/contrib/vacuumlo/README.vacuumlo,v 1.5 2005/06/23 00:06:37 tgl Exp $ - -This is a simple utility that will remove any orphaned large objects out of a -PostgreSQL database. An orphaned LO is considered to be any LO whose OID -does not appear in any OID data column of the database. - -If you use this, you may also be interested in the lo_manage trigger in -contrib/lo. lo_manage is useful to try to avoid creating orphaned LOs -in the first place. - - -Compiling --------- - -Simply run make. A single executable "vacuumlo" is created. - - -Usage ------ - -vacuumlo [options] database [database2 ... 
databasen] - -All databases named on the command line are processed. Available options -include: - - -v Write a lot of progress messages - -n Don't remove large objects, just show what would be done - -U username Username to connect as - -W Prompt for password - -h hostname Database server host - -p port Database server port - - -Method ------- - -First, it builds a temporary table which contains all of the OIDs of the -large objects in that database. - -It then scans through all columns in the database that are of type "oid" -or "lo", and removes matching entries from the temporary table. - -The remaining entries in the temp table identify orphaned LOs. These are -removed. - - -Notes ------ - -I decided to place this in contrib as it needs further testing, but hopefully, -this (or a variant of it) would make it into the backend as a "vacuum lo" -command in a later release. - -Peter Mount -http://www.retep.org.uk -March 21 1999 - -Committed April 10 1999 Peter diff --git a/contrib/xml2/README.xml2 b/contrib/xml2/README.xml2 deleted file mode 100644 index 28d3db0e006aa9d6a4b652c3e7973f1e6de59d3f..0000000000000000000000000000000000000000 --- a/contrib/xml2/README.xml2 +++ /dev/null @@ -1,278 +0,0 @@ -XML-handling functions for PostgreSQL -===================================== - - DEPRECATION NOTICE: From PostgreSQL 8.3 on, there is XML-related - functionality based on the SQL/XML standard in the core server. - That functionality covers XML syntax checking and XPath queries, - which is what this module does as well, and more, but the API is - not at all compatible. It is planned that this module will be - removed in PostgreSQL 8.4 in favor of the newer standard API, so - you are encouraged to try converting your applications. If you - find that some of the functionality of this module is not - available in an adequate form with the newer API, please explain - your issue to pgsql-hackers@postgresql.org so that the deficiency - can be addressed. - -- Peter Eisentraut, 2007-05-24 - -Development of this module was sponsored by Torchbox Ltd. (www.torchbox.com) -It has the same BSD licence as PostgreSQL. - -This version of the XML functions provides both XPath querying and -XSLT functionality. There is also a new table function which allows -the straightforward return of multiple XML results. Note that the current code -doesn't take any particular care over character sets - this is -something that should be fixed at some point! - -Installation ------------- - -The current build process will only work if the files are in -contrib/xml2 in a PostgreSQL 7.3 or later source tree which has been -configured and built (If you alter the subdir value in the Makefile -you can place it in a different directory in a PostgreSQL tree). - -Before you begin, just check the Makefile, and then just 'make' and -'make install'. - -By default, this module requires both libxml2 and libxslt to be installed -on your system. If you do not have libxslt or do not want to use XSLT -functions, you must edit the Makefile to not build the XSLT functions, -as directed in its comments; and edit pgxml.sql.in to remove the XSLT -function declarations, as directed in its comments. - -Description of functions ------------------------- - -The first set of functions are straightforward XML parsing and XPath queries: - -xml_is_well_formed(document) RETURNS bool - -This parses the document text in its parameter and returns true if the -document is well-formed XML. (Note: before PostgreSQL 8.2, this function -was called xml_valid(). 
-That is the wrong name since validity and
-well-formedness have different meanings in XML. The old name is still
-available, but is deprecated and will be removed in 8.3.)
-
-xpath_string(document,query) RETURNS text
-xpath_number(document,query) RETURNS float4
-xpath_bool(document,query) RETURNS bool
-
-These functions evaluate the XPath query on the supplied document, and
-cast the result to the specified type.
-
-
-xpath_nodeset(document,query,toptag,itemtag) RETURNS text
-
-This evaluates query on document and wraps the result in XML tags. If
-the result is multivalued, the output will look like:
-
-<toptag>
-<itemtag>Value 1 which could be an XML fragment</itemtag>
-<itemtag>Value 2....</itemtag>
-</toptag>
-
-If either toptag or itemtag is an empty string, the relevant tag is omitted.
-There are also wrapper functions for this operation:
-
-xpath_nodeset(document,query) RETURNS text omits both tags.
-xpath_nodeset(document,query,itemtag) RETURNS text omits toptag.
-
-
-xpath_list(document,query,separator) RETURNS text
-
-This function returns multiple values separated by the specified
-separator, e.g. Value 1,Value 2,Value 3 if separator=','.
-
-xpath_list(document,query) RETURNS text
-
-This is a wrapper for the above function that uses ',' as the separator.
-
-
-xpath_table
------------
-
-This is a table function which evaluates a set of XPath queries on
-each of a set of documents and returns the results as a table. The
-primary key field from the original document table is returned as the
-first column of the result so that the result set from xpath_table can
-be readily used in joins.
-
-The function itself takes 5 arguments, all text.
-
-xpath_table(key,document,relation,xpaths,criteria)
-
-key - the name of the "key" field - this is just a field to be used as
-the first column of the output table i.e. it identifies the record from
-which each output row came (see note below about multiple values).
-
-document - the name of the field containing the XML document
-
-relation - the name of the table or view containing the documents
-
-xpaths - multiple xpath expressions separated by |
-
-criteria - the contents of the WHERE clause. This needs to be specified,
-so use "true" or "1=1" here if you want to process all the rows in the
-relation.
-
-NB These parameters (except the XPath strings) are just substituted
-into a plain SQL SELECT statement, so you have some flexibility - the
-statement is
-
-SELECT <key>, <document> FROM <relation> WHERE <criteria>
-
-so those parameters can be *anything* valid in those particular
-locations. The result from this SELECT needs to return exactly two
-columns (which it will unless you try to list multiple fields for key
-or document). Beware that this simplistic approach requires that you
-validate any user-supplied values to avoid SQL injection attacks.
-
-Using the function
-
-The function has to be used in a FROM expression. This gives the following
-form:
-
-SELECT * FROM
-xpath_table('article_id',
-            'article_xml',
-            'articles',
-            '/article/author|/article/pages|/article/title',
-            'date_entered > ''2003-01-01'' ')
-AS t(article_id integer, author text, page_count integer, title text);
-
-The AS clause defines the names and types of the columns in the
-virtual table. If there are more XPath queries than result columns,
-the extra queries will be ignored. If there are more result columns
-than XPath queries, the extra columns will be NULL.
-
-Note that I've said in this example that pages is an integer.
The -function deals internally with string representations, so when you say -you want an integer in the output, it will take the string -representation of the XPath result and use PostgreSQL input functions -to transform it into an integer (or whatever type the AS clause -requests). An error will result if it can't do this - for example if -the result is empty - so you may wish to just stick to 'text' as the -column type if you think your data has any problems. - -The select statement doesn't need to use * alone - it can reference the -columns by name or join them to other tables. The function produces a -virtual table with which you can perform any operation you wish (e.g. -aggregation, joining, sorting etc). So we could also have: - -SELECT t.title, p.fullname, p.email -FROM xpath_table('article_id','article_xml','articles', - '/article/title|/article/author/@id', - 'xpath_string(article_xml,''/article/@date'') > ''2003-03-20'' ') - AS t(article_id integer, title text, author_id integer), - tblPeopleInfo AS p -WHERE t.author_id = p.person_id; - -as a more complicated example. Of course, you could wrap all -of this in a view for convenience. - -Multivalued results - -The xpath_table function assumes that the results of each XPath query -might be multi-valued, so the number of rows returned by the function -may not be the same as the number of input documents. The first row -returned contains the first result from each query, the second row the -second result from each query. If one of the queries has fewer values -than the others, NULLs will be returned instead. - -In some cases, a user will know that a given XPath query will return -only a single result (perhaps a unique document identifier) - if used -alongside an XPath query returning multiple results, the single-valued -result will appear only on the first row of the result. The solution -to this is to use the key field as part of a join against a simpler -XPath query. 
As an example: - - -CREATE TABLE test -( - id int4 NOT NULL, - xml text, - CONSTRAINT pk PRIMARY KEY (id) -) -WITHOUT OIDS; - -INSERT INTO test VALUES (1, ' -123 -112233 -'); - -INSERT INTO test VALUES (2, ' -111222333 -111222333 -'); - - -The query: - -SELECT * FROM xpath_table('id','xml','test', -'/doc/@num|/doc/line/@num|/doc/line/a|/doc/line/b|/doc/line/c','1=1') -AS t(id int4, doc_num varchar(10), line_num varchar(10), val1 int4, -val2 int4, val3 int4) -WHERE id = 1 ORDER BY doc_num, line_num - - -Gives the result: - - id | doc_num | line_num | val1 | val2 | val3 -----+---------+----------+------+------+------ - 1 | C1 | L1 | 1 | 2 | 3 - 1 | | L2 | 11 | 22 | 33 - -To get doc_num on every line, the solution is to use two invocations -of xpath_table and join the results: - -SELECT t.*,i.doc_num FROM - xpath_table('id','xml','test', - '/doc/line/@num|/doc/line/a|/doc/line/b|/doc/line/c','1=1') - AS t(id int4, line_num varchar(10), val1 int4, val2 int4, val3 int4), - xpath_table('id','xml','test','/doc/@num','1=1') - AS i(id int4, doc_num varchar(10)) -WHERE i.id=t.id AND i.id=1 -ORDER BY doc_num, line_num; - -which gives the desired result: - - id | line_num | val1 | val2 | val3 | doc_num -----+----------+------+------+------+--------- - 1 | L1 | 1 | 2 | 3 | C1 - 1 | L2 | 11 | 22 | 33 | C1 -(2 rows) - - - -XSLT functions --------------- - -The following functions are available if libxslt is installed (this is -not currently detected automatically, so you will have to amend the -Makefile) - -xslt_process(document,stylesheet,paramlist) RETURNS text - -This function appplies the XSL stylesheet to the document and returns -the transformed result. The paramlist is a list of parameter -assignments to be used in the transformation, specified in the form -'a=1,b=2'. Note that this is also proof-of-concept code and the -parameter parsing is very simple-minded (e.g. parameter values cannot -contain commas!) - -Also note that if either the document or stylesheet values do not -begin with a < then they will be treated as URLs and libxslt will -fetch them. It thus follows that you can use xslt_process as a means -to fetch the contents of URLs - you should be aware of the security -implications of this. - -There is also a two-parameter version of xslt_process which does not -pass any parameters to the transformation. - - -Feedback --------- - -If you have any comments or suggestions, please do contact me at -jgray@azuli.co.uk. Unfortunately, this isn't my main job, so I can't -guarantee a rapid response to your query! diff --git a/doc/src/sgml/adminpack.sgml b/doc/src/sgml/adminpack.sgml new file mode 100644 index 0000000000000000000000000000000000000000..10816f5c24dce0bada4ca877bda685e0fe5cf79e --- /dev/null +++ b/doc/src/sgml/adminpack.sgml @@ -0,0 +1,32 @@ + + adminpack + + adminpack is a PostgreSQL standard module that implements a number of + support functions which pgAdmin and other administration and management tools + can use to provide additional functionality if installed on a server. + + + + Functions implemented + + Functions implemented by adminpack can only be run by a superuser. 
Here's a + list of these functions: + + + + int8 pg_catalog.pg_file_write(fname text, data text, append bool) + bool pg_catalog.pg_file_rename(oldname text, newname text, archivname text) + bool pg_catalog.pg_file_rename(oldname text, newname text) + bool pg_catalog.pg_file_unlink(fname text) + setof record pg_catalog.pg_logdir_ls() + + /* Renaming of existing backend functions for pgAdmin compatibility */ + int8 pg_catalog.pg_file_read(fname text, data text, append bool) + bigint pg_catalog.pg_file_length(text) + int4 pg_catalog.pg_logfile_rotate() + + + + + + diff --git a/doc/src/sgml/btree-gist.sgml b/doc/src/sgml/btree-gist.sgml new file mode 100644 index 0000000000000000000000000000000000000000..4e1126e33c3b2e9f299a16c581e9e9f0ed3280eb --- /dev/null +++ b/doc/src/sgml/btree-gist.sgml @@ -0,0 +1,40 @@ + + + + btree-gist + + + btree-gist is a B-Tree implementation using GiST that supports the int2, int4, + int8, float4, float8 timestamp with/without time zone, time + with/without time zone, date, interval, oid, money, macaddr, char, + varchar/text, bytea, numeric, bit, varbit and inet/cidr types. + + + + Example usage + + CREATE TABLE test (a int4); + -- create index + CREATE INDEX testidx ON test USING gist (a); + -- query + SELECT * FROM test WHERE a < 10; + + + + + Authors + + All work was done by Teodor Sigaev (teodor@stack.net) , + Oleg Bartunov (oleg@sai.msu.su), Janko Richter + (jankorichter@yahoo.de). See + for additional + information. + + + + + diff --git a/contrib/pg_buffercache/README.pg_buffercache b/doc/src/sgml/buffercache.sgml similarity index 53% rename from contrib/pg_buffercache/README.pg_buffercache rename to doc/src/sgml/buffercache.sgml index 5be9af8ce412b8a24992028435cbba8e48a9ef25..1347c75ba3249c069780e111857bf7e82fbcb539 100644 --- a/contrib/pg_buffercache/README.pg_buffercache +++ b/doc/src/sgml/buffercache.sgml @@ -1,37 +1,32 @@ -Pg_buffercache - Real time queries on the shared buffer cache. --------------- - - This module consists of a C function 'pg_buffercache_pages()' that returns - a set of records, plus a view 'pg_buffercache' to wrapper the function. - - The intent is to do for the buffercache what pg_locks does for locks, i.e - - ability to examine what is happening at any given time without having to - restart or rebuild the server with debugging code added. - + + pg_buffercache + + + pg_buffercache + + + + pg_buffercache module provides the means for examining + what's happening to the buffercache at any given time without having to + restart or rebuild the server with debugging code added. The intent is to + do for the buffercache what pg_locks does for locks. + + + This module consists of a C function pg_buffercache_pages() + that returns a set of records, plus a view pg_buffercache + to wrapper the function. + + By default public access is REVOKED from both of these, just in case there are security issues lurking. - - -Installation ------------- - - Build and install the main Postgresql source, then this contrib module: - - $ cd contrib/pg_buffercache - $ gmake - $ gmake install - - - To register the functions: - - $ psql -d -f pg_buffercache.sql - - -Notes ------ - - The definition of the columns exposed in the view is: - + + + + Notes + + The definition of the columns exposed in the view is: + + Column | references | Description ----------------+----------------------+------------------------------------ bufferid | | Id, 1..shared_buffers. @@ -41,23 +36,27 @@ Notes relblocknumber | | Offset of the page in the relation. 
isdirty | | Is the page dirty? usagecount | | Page LRU count - - There is one row for each buffer in the shared cache. Unused buffers are - shown with all fields null except bufferid. - - Because the cache is shared by all the databases, there are pages from - relations not belonging to the current database. - - When the pg_buffercache view is accessed, internal buffer manager locks are - taken, and a copy of the buffer cache data is made for the view to display. - This ensures that the view produces a consistent set of results, while not - blocking normal buffer activity longer than necessary. Nonetheless there - could be some impact on database performance if this view is read often. - - -Sample output -------------- - + + + There is one row for each buffer in the shared cache. Unused buffers are + shown with all fields null except bufferid. + + + Because the cache is shared by all the databases, there are pages from + relations not belonging to the current database. + + + When the pg_buffercache view is accessed, internal buffer manager locks are + taken, and a copy of the buffer cache data is made for the view to display. + This ensures that the view produces a consistent set of results, while not + blocking normal buffer activity longer than necessary. Nonetheless there + could be some impact on database performance if this view is read often. + + + + + Sample output + regression=# \d pg_buffercache; View "public.pg_buffercache" Column | Type | Modifiers @@ -98,18 +97,25 @@ Sample output (10 rows) regression=# + + + + + Authors + + + + Mark Kirkwood markir@paradise.net.nz + + + + Design suggestions: Neil Conway neilc@samurai.com + + + Debugging advice: Tom Lane tgl@sss.pgh.pa.us + + + + + - -Author ------- - - * Mark Kirkwood - - -Help ----- - - * Design suggestions : Neil Conway - * Debugging advice : Tom Lane - - Thanks guys! diff --git a/doc/src/sgml/chkpass.sgml b/doc/src/sgml/chkpass.sgml new file mode 100644 index 0000000000000000000000000000000000000000..e0179b3971c2337d0478e1d25741920b06e760a9 --- /dev/null +++ b/doc/src/sgml/chkpass.sgml @@ -0,0 +1,84 @@ + + chkpass + + + + chkpass is a password type that is automatically checked and converted upon + entry. It is stored encrypted. To compare, simply compare against a clear + text password and the comparison function will encrypt it before comparing. + It also returns an error if the code determines that the password is easily + crackable. This is currently a stub that does nothing. + + + + Note that the chkpass data type is not indexable. + + + + + If you precede the string with a colon, the encryption and checking are + skipped so that you can enter existing passwords into the field. + + + + On output, a colon is prepended. This makes it possible to dump and reload + passwords without re-encrypting them. If you want the password (encrypted) + without the colon then use the raw() function. This allows you to use the + type with things like Apache's Auth_PostgreSQL module. + + + + The encryption uses the standard Unix function crypt(), and so it suffers + from all the usual limitations of that function; notably that only the + first eight characters of a password are considered. 
+
+
+ Here is some sample usage:
+
+test=# create table test (p chkpass);
+CREATE TABLE
+test=# insert into test values ('hello');
+INSERT 0 1
+test=# select * from test;
+       p
+----------------
+ :dVGkpXdOrE3ko
+(1 row)
+
+test=# select raw(p) from test;
+      raw
+---------------
+ dVGkpXdOrE3ko
+(1 row)
+
+test=# select p = 'hello' from test;
+ ?column?
+----------
+ t
+(1 row)
+
+test=# select p = 'goodbye' from test;
+ ?column?
+----------
+ f
+(1 row)
+
+ Author
+
+ D'Arcy J.M. Cain darcy@druid.net
+
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
new file mode 100644
index 0000000000000000000000000000000000000000..54c66d33a5ca0605244e5b481afa0ff780830260
--- /dev/null
+++ b/doc/src/sgml/contrib.sgml
@@ -0,0 +1,56 @@
+
+ Standard Modules
+
+ This section contains information regarding the standard modules which
+ can be found in the contrib directory of the
+ PostgreSQL distribution. These are porting tools, analysis utilities,
+ and plug-in features that are not part of the core PostgreSQL system,
+ mainly because they address a limited audience or are too experimental
+ to be part of the main source tree. This does not preclude their
+ usefulness.
+
+ Some modules supply new user-defined functions, operators, or types. In
+ these cases, you will need to run make and make
+ install in contrib/module. After you have
+ installed the files, you need to register the new entities in the database
+ system by running the commands in the supplied .sql file. For example,
+
+ $ psql -d dbname -f module.sql
+
+ &adminpack;
+ &btree-gist;
+ &chkpass;
+ &cube;
+ &dblink;
+ &earthdistance;
+ &fuzzystrmatch;
+ &hstore;
+ &intagg;
+ &intarray;
+ &isn;
+ &lo;
+ &ltree;
+ &oid2name;
+ &pageinspect;
+ &pgbench;
+ &buffercache;
+ &pgcrypto;
+ &freespacemap;
+ &pgrowlocks;
+ &standby;
+ &pgstattuple;
+ &trgm;
+ &seg;
+ &sslinfo;
+ &tablefunc;
+ &uuid-ossp;
+ &vacuumlo;
+ &xml2;
+
diff --git a/doc/src/sgml/cube.sgml b/doc/src/sgml/cube.sgml
new file mode 100644
index 0000000000000000000000000000000000000000..da19ae204afd38ee61a001993c0ad3c53e42681e
--- /dev/null
+++ b/doc/src/sgml/cube.sgml
@@ -0,0 +1,529 @@
+
+ cube
+
+ cube
+
+ This module contains the user-defined type, CUBE, representing
+ multidimensional cubes.
+
+ Syntax
+
+ The following are valid external representations for the CUBE type:
+
+ Cube external representations
+
+ 'x'
+ A floating point value representing a one-dimensional point or
+ zero-length one-dimensional interval
+
+ '(x)'
+ Same as above
+
+ 'x1,x2,x3,...,xn'
+ A point in n-dimensional space, represented internally as a zero
+ volume box
+
+ '(x1,x2,x3,...,xn)'
+ Same as above
+
+ '(x),(y)'
+ A 1-D interval starting at x and ending at y or vice versa; the
+ order does not matter
+
+ '(x1,...,xn),(y1,...,yn)'
+ An n-dimensional box represented by a pair of its opposite corners, in
+ either order. Functions take care of swapping to achieve "lower left --
+ upper right" representation before computing any values
+
+
+ + + Grammar + + Cube Grammar Rules + + + + rule 1 + box -> O_BRACKET paren_list COMMA paren_list C_BRACKET + + + rule 2 + box -> paren_list COMMA paren_list + + + rule 3 + box -> paren_list + + + rule 4 + box -> list + + + rule 5 + paren_list -> O_PAREN list C_PAREN + + + rule 6 + list -> FLOAT + + + rule 7 + list -> list COMMA FLOAT + + + +
+
+
+
+ Tokens
+
+ Cube Grammar Tokens
+
+ n
+ [0-9]+
+
+ integer
+ [+-]?{n}
+
+ real
+ [+-]?({n}\.{n}?|\.{n})
+
+ FLOAT
+ ({integer}|{real})([eE]{integer})?
+
+ O_BRACKET
+ \[
+
+ C_BRACKET
+ \]
+
+ O_PAREN
+ \(
+
+ C_PAREN
+ \)
+
+ COMMA
+ \,
+
+
+
+
+ Examples
+
+ Examples
+
+ 'x'
+ A floating point value representing a one-dimensional point
+ (or, zero-length one-dimensional interval)
+
+ '(x)'
+ Same as above
+
+ 'x1,x2,x3,...,xn'
+ A point in n-dimensional space, represented internally as a zero
+ volume cube
+
+ '(x1,x2,x3,...,xn)'
+ Same as above
+
+ '(x),(y)'
+ A 1-D interval starting at x and ending at y or vice versa; the
+ order does not matter
+
+ '[(x),(y)]'
+ Same as above
+
+ '(x1,...,xn),(y1,...,yn)'
+ An n-dimensional box represented by a pair of its diagonally
+ opposite corners, regardless of order. Swapping is provided
+ by all comparison routines to ensure the
+ "lower left -- upper right" representation
+ before actual comparison takes place.
+
+ '[(x1,...,xn),(y1,...,yn)]'
+ Same as above
+
+ + White space is ignored, so '[(x),(y)]' can be: '[ ( x ), ( y ) ]' + +
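+
+ As a quick illustration of the input syntax (a minimal sketch; the exact
+ output spacing may differ between versions):
+
+select '(1,2),(3,4)'::cube;
+     cube
+---------------
+ (1, 2),(3, 4)
+(1 row)
+
+select '1,2,3'::cube;
+   cube
+-----------
+ (1, 2, 3)
+(1 row)
+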
+
+ Defaults
+
+ I believe this union:
+
+select cube_union('(0,5,2),(2,3,1)','0');
+cube_union
+-------------------
+(0, 0, 0),(2, 5, 2)
+(1 row)
+
+ does not contradict common sense, nor does the intersection:
+
+select cube_inter('(0,-1),(1,1)','(-2),(2)');
+cube_inter
+-------------
+(0, 0),(1, 0)
+(1 row)
+
+ In all binary operations on differently sized boxes, I assume the smaller
+ one to be a Cartesian projection, i.e., having zeroes in place of coordinates
+ omitted in the string representation. The above examples are equivalent to:
+
+cube_union('(0,5,2),(2,3,1)','(0,0,0),(0,0,0)');
+cube_inter('(0,-1),(1,1)','(-2,0),(2,0)');
+
+ The following containment predicate uses the point syntax,
+ while in fact the second argument is internally represented by a box.
+ This syntax makes it unnecessary to define the special Point type
+ and functions for (box,point) predicates.
+
+select cube_contains('(0,0),(1,1)', '0.5,0.5');
+cube_contains
+--------------
+t
+(1 row)
+
+ Precision
+
+Values are stored internally as 64-bit floating point numbers. This means that
+numbers with more than about 16 significant digits will be truncated.
+
+ Usage
+
+ The access method for CUBE is a GiST index (gist_cube_ops), which is a
+ generalization of R-tree. GiSTs allow the postgres implementation of
+ R-tree, originally encoded to support 2-D geometric types such as
+ boxes and polygons, to be used with any data type whose data domain
+ can be partitioned using the concepts of containment, intersection and
+ equality. In other words, everything that can intersect or contain
+ its own kind can be indexed with a GiST. That includes, among other
+ things, all geometric data types, regardless of their dimensionality
+ (see also contrib/seg).
+
+ The operators supported by the GiST access method include:
+
+a = b          Same as
+
+ The cubes a and b are identical.
+
+a && b         Overlaps
+
+ The cubes a and b overlap.
+
+a @> b         Contains
+
+ The cube a contains the cube b.
+
+a <@ b         Contained in
+
+ The cube a is contained in b.
+
+ (Before PostgreSQL 8.2, the containment operators @> and <@ were
+ respectively called @ and ~. These names are still available, but are
+ deprecated and will eventually be retired. Notice that the old names
+ are reversed from the convention formerly followed by the core geometric
+ datatypes!)
+
+ Although the mnemonics of the following operators are questionable, I
+ preserved them to maintain visual consistency with other geometric
+ data types defined in Postgres.
+
+ Other operators:
+
+[a, b] < [c, d]                Less than
+[a, b] > [c, d]                Greater than
+
+ These operators do not make a lot of sense for any practical
+ purpose but sorting. These operators first compare (a) to (c),
+ and if these are equal, compare (b) to (d). That accounts for
+ reasonably good sorting in most cases, which is useful if
+ you want to use ORDER BY with this type.
+
+ The following functions are available:
+
+ Functions available
+
+ cube_distance(cube, cube) returns double
+ cube_distance returns the distance between two cubes. If both
+ cubes are points, this is the normal distance function.
+
+ cube(float8) returns cube
+ This makes a one dimensional cube with both coordinates the same.
+ If the type of the argument is a numeric type other than float8, an
+ explicit cast to float8 may be needed.
+ cube(1) == '(1)'
+
+ cube(float8, float8) returns cube
+ This makes a one dimensional cube.
+ cube(1,2) == '(1),(2)'
+
+ cube(float8[]) returns cube
+ This makes a zero-volume cube using the coordinates
+ defined by the array.
+ cube(ARRAY[1,2]) == '(1,2)'
+
+ cube(float8[], float8[]) returns cube
+ This makes a cube, with upper right and lower left
+ coordinates as defined by the 2 float arrays. Arrays must be of the
+ same length.
+ cube('{1,2}'::float[], '{3,4}'::float[]) == '(1,2),(3,4)'
+
+ cube(cube, float8) returns cube
+ This builds a new cube by adding a dimension onto an
+ existing cube with the same values for both parts of the new coordinate.
+ This is useful for building cubes piece by piece from calculated values.
+ cube('(1)',2) == '(1,2),(1,2)'
+
+ cube(cube, float8, float8) returns cube
+ This builds a new cube by adding a dimension onto an
+ existing cube. This is useful for building cubes piece by piece from
+ calculated values.
+ cube('(1,2)',3,4) == '(1,3),(2,4)'
+
+ cube_dim(cube) returns int
+ cube_dim returns the number of dimensions stored in the
+ data structure for a cube. This is useful for constraints on the
+ dimensions of a cube.
+
+ cube_ll_coord(cube, int) returns double
+
+ cube_ll_coord returns the nth coordinate value for the lower left
+ corner of a cube. This is useful for doing coordinate transformations.
+
+ cube_ur_coord(cube, int) returns double
+
+ cube_ur_coord returns the nth coordinate value for the
+ upper right corner of a cube. This is useful for doing coordinate
+ transformations.
+
+ cube_subset(cube, int[]) returns cube
+
+ Builds a new cube from an existing cube, using a list of
+ dimension indexes from an array. Can be used to find both the ll and ur
+ coordinates of a single dimension, e.g.:
+ cube_subset(cube('(1,3,5),(6,7,8)'), ARRAY[2]) = '(3),(7)'.
+ Or can be used to drop dimensions, or reorder them as desired, e.g.:
+ cube_subset(cube('(1,3,5),(6,7,8)'), ARRAY[3,2,1,1]) =
+ '(5, 3, 1, 1),(8, 7, 6, 6)'
+
+ cube_is_point(cube) returns bool
+ cube_is_point returns true if a cube is also a point.
+ This is true when the two defining corners are the same.
+
+ cube_enlarge(cube, double, int) returns cube
+
+ cube_enlarge increases the size of a cube by a specified
+ radius in at least n dimensions. If the radius is negative the box is
+ shrunk instead. This
+ is useful for creating bounding boxes around a point for searching for
+ nearby points. All defined dimensions are changed by the radius. If n
+ is greater than the number of defined dimensions and the cube is being
+ increased (r >= 0) then 0 is used as the base for the extra coordinates.
+ LL coordinates are decreased by r and UR coordinates are increased by r.
+ If a LL coordinate is increased to larger than the corresponding UR
+ coordinate (this can only happen when r < 0) then both coordinates are
+ set to their average. To make it harder for people to break things, there
+ is an effective maximum on the dimension of cubes of 100. This is set
+ in cubedata.h if you need something bigger.
+
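+
+ A few of these functions combined, as a quick sketch (results shown with
+ the == notation used above; all values follow from the descriptions given):
+
+ cube_dim('(1,3,5),(6,7,8)') == 3
+ cube_ll_coord('(1,3,5),(6,7,8)', 2) == 3
+ cube_ur_coord('(1,3,5),(6,7,8)', 2) == 7
+ cube_subset(cube('(1,3,5),(6,7,8)'), ARRAY[2]) == '(3),(7)'
+ cube_enlarge('(1,2),(3,4)', 0.5, 2) == '(0.5, 1.5),(3.5, 4.5)'
+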
+ + + There are a few other potentially useful functions defined in cube.c + that vanished from the schema because I stopped using them. Some of + these were meant to support type casting. Let me know if I was wrong: + I will then add them back to the schema. I would also appreciate + other ideas that would enhance the type and make it more useful. + + + + For examples of usage, see sql/cube.sql + +
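+
+ A minimal indexing sketch based on the Usage notes above (the table and
+ column names are illustrative only):
+
+ create table boxes (b cube);
+ create index boxes_idx on boxes using gist (b);
+ -- find stored cubes overlapping a query box
+ select * from boxes where b && '(0,0),(10,10)'::cube;
+ -- find stored cubes containing a point
+ select * from boxes where b @> '5,5'::cube;
+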
+ + + Credits + + This code is essentially based on the example written for + Illustra, + + + My thanks are primarily to Prof. Joe Hellerstein + () for elucidating the + gist of the GiST (), and + to his former student, Andy Dong + (), for his exemplar. + I am also grateful to all postgres developers, present and past, for enabling + myself to create my own world and live undisturbed in it. And I would like to + acknowledge my gratitude to Argonne Lab and to the U.S. Department of Energy + for the years of faithful support of my database research. + + + + Gene Selkov, Jr. + Computational Scientist + Mathematics and Computer Science Division + Argonne National Laboratory + 9700 S Cass Ave. + Building 221 + Argonne, IL 60439-4844 + selkovjr@mcs.anl.gov + + + + Minor updates to this package were made by Bruno Wolff III + bruno@wolff.to in August/September of 2002. These include + changing the precision from single precision to double precision and adding + some new functions. + + + + Additional updates were made by Joshua Reich josh@root.net in + July 2006. These include cube(float8[], float8[]) and + cleaning up the code to use the V1 call protocol instead of the deprecated V0 + form. + + +
+
diff --git a/doc/src/sgml/dblink.sgml b/doc/src/sgml/dblink.sgml
new file mode 100644
index 0000000000000000000000000000000000000000..095d600099b31e8f07e334d67bd9692d9e9c625c
--- /dev/null
+++ b/doc/src/sgml/dblink.sgml
@@ -0,0 +1,1312 @@
+
+ dblink
+
+ dblink is a contrib module which allows connections to
+ other databases.
+
+ dblink_connect
+ opens a persistent connection to a remote database
+
+ dblink_connect(text connstr)
+ dblink_connect(text connname, text connstr)
+
+ Inputs
+
+ connname
+
+ if 2 arguments are given, the first is used as a name for a persistent
+ connection
+
+ connstr
+
+ standard libpq format connection string,
+ e.g. "hostaddr=127.0.0.1 port=5432 dbname=mydb user=postgres password=mypasswd"
+
+ if only one argument is given, the connection is unnamed; only one unnamed
+ connection can exist at a time
+
+ Outputs
+ Returns status = "OK"
+
+ Example
+
+ select dblink_connect('dbname=postgres');
+  dblink_connect
+ ----------------
+  OK
+ (1 row)
+
+ select dblink_connect('myconn','dbname=postgres');
+  dblink_connect
+ ----------------
+  OK
+ (1 row)
+
+ dblink_disconnect
+ closes a persistent connection to a remote database
+
+ dblink_disconnect()
+ dblink_disconnect(text connname)
+
+ Inputs
+
+ connname
+
+ if an argument is given, it is used as a name for a persistent
+ connection to close; otherwise the unnamed connection is closed
+
+ Outputs
+ Returns status = "OK"
+
+ Example
+
+ test=# select dblink_disconnect();
+  dblink_disconnect
+ -------------------
+  OK
+ (1 row)
+
+ select dblink_disconnect('myconn');
+  dblink_disconnect
+ -------------------
+  OK
+ (1 row)
+
+ dblink_open
+ opens a cursor on a remote database
+
+ dblink_open(text cursorname, text sql [, bool fail_on_error])
+ dblink_open(text connname, text cursorname, text sql [, bool fail_on_error])
+
+ Inputs
+
+ connname
+
+ if three arguments are present, the first is taken as the specific
+ connection name to use; otherwise the unnamed connection is assumed
+
+ cursorname
+
+ a reference name for the cursor
+
+ sql
+
+ sql statement that you wish to execute on the remote host,
+ e.g. "select * from pg_class"
+
+ fail_on_error
+
+ If true (default when not present) then an ERROR thrown on the remote side
+ of the connection causes an ERROR to also be thrown locally. If false, the
+ remote ERROR is locally treated as a NOTICE, and the return value is set
+ to 'ERROR'.
+
+ Outputs
+ Returns status = "OK"
+
+ Note
+
+ dblink_connect(text connstr) must be executed first
+
+ dblink_open starts an explicit transaction. If, after using dblink_open,
+ you use dblink_exec to change data, and then an error occurs or you use
+ dblink_disconnect without a dblink_close first, your change *will* be
+ lost. Also, using dblink_close explicitly ends the transaction and thus
+ effectively closes *all* open cursors.
+ + + + + + + Example + + test=# select dblink_connect('dbname=postgres'); + dblink_connect + ---------------- + OK + (1 row) + + test=# select dblink_open('foo','select proname, prosrc from pg_proc'); + dblink_open + ------------- + OK + (1 row) + + + + + + + + dblink_fetch + returns a set from an open cursor on a remote database + + + + + dblink_fetch(text cursorname, int32 howmany [, bool fail_on_error]) + dblink_fetch(text connname, text cursorname, int32 howmany [, bool fail_on_error]) + + + + + Inputs + + + connname + + if three arguments are present, the first is taken as the specific + connection name to use; otherwise the unnamed connection is assumed + + + + + cursorname + + The reference name for the cursor + + + + + howmany + + Maximum number of rows to retrieve. The next howmany rows are fetched, + starting at the current cursor position, moving forward. Once the cursor + has positioned to the end, no more rows are produced. + + + + + fail_on_error + + If true (default when not present) then an ERROR thrown on the remote side + of the connection causes an ERROR to also be thrown locally. If false, the + remote ERROR is locally treated as a NOTICE, and no rows are returned. + + + + + + Outputs + Returns setof record + + + + Note + + On a mismatch between the number of return fields as specified in the FROM + clause, and the actual number of fields returned by the remote cursor, an + ERROR will be thrown. In this event, the remote cursor is still advanced + by as many rows as it would have been if the ERROR had not occurred. + + + + + Example + + test=# select dblink_connect('dbname=postgres'); + dblink_connect + ---------------- + OK + (1 row) + + test=# select dblink_open('foo','select proname, prosrc from pg_proc where proname like ''bytea%'''); + dblink_open + ------------- + OK + (1 row) + + test=# select * from dblink_fetch('foo',5) as (funcname name, source text); + funcname | source + ----------+---------- + byteacat | byteacat + byteacmp | byteacmp + byteaeq | byteaeq + byteage | byteage + byteagt | byteagt + (5 rows) + + test=# select * from dblink_fetch('foo',5) as (funcname name, source text); + funcname | source + -----------+----------- + byteain | byteain + byteale | byteale + bytealike | bytealike + bytealt | bytealt + byteane | byteane + (5 rows) + + test=# select * from dblink_fetch('foo',5) as (funcname name, source text); + funcname | source + ------------+------------ + byteanlike | byteanlike + byteaout | byteaout + (2 rows) + + test=# select * from dblink_fetch('foo',5) as (funcname name, source text); + funcname | source + ----------+-------- + (0 rows) + + + + + + + + dblink_close + closes a cursor on a remote database + + + + + dblink_close(text cursorname [, bool fail_on_error]) + dblink_close(text connname, text cursorname [, bool fail_on_error]) + + + + + Inputs + + + connname + + if two arguments are present, the first is taken as the specific + connection name to use; otherwise the unnamed connection is assumed + + + + + cursorname + + a reference name for the cursor + + + + + fail_on_error + + If true (default when not present) then an ERROR thrown on the remote side + of the connection causes an ERROR to also be thrown locally. If false, the + remote ERROR is locally treated as a NOTICE, and the return value is set + to 'ERROR'. + + + + + + Outputs + Returns status = "OK" + + + + Note + + dblink_connect(text connstr) or dblink_connect(text connname, text connstr) + must be executed first. 
+ + + + + Example + + test=# select dblink_connect('dbname=postgres'); + dblink_connect + ---------------- + OK + (1 row) + + test=# select dblink_open('foo','select proname, prosrc from pg_proc'); + dblink_open + ------------- + OK + (1 row) + + test=# select dblink_close('foo'); + dblink_close + -------------- + OK + (1 row) + + select dblink_connect('myconn','dbname=regression'); + dblink_connect + ---------------- + OK + (1 row) + + select dblink_open('myconn','foo','select proname, prosrc from pg_proc'); + dblink_open + ------------- + OK + (1 row) + + select dblink_close('myconn','foo'); + dblink_close + -------------- + OK + (1 row) + + + + + + + + dblink_exec + executes an UPDATE/INSERT/DELETE on a remote database + + + + + dblink_exec(text connstr, text sql [, bool fail_on_error]) + dblink_exec(text connname, text sql [, bool fail_on_error]) + dblink_exec(text sql [, bool fail_on_error]) + + + + + Inputs + + + connname/connstr + + If two arguments are present, the first is first assumed to be a specific + connection name to use. If the name is not found, the argument is then + assumed to be a valid connection string, of standard libpq format, + e.g.: "hostaddr=127.0.0.1 dbname=mydb user=postgres password=mypasswd" + + If only one argument is used, then the unnamed connection is used. + + + + + sql + + sql statement that you wish to execute on the remote host, e.g.: + insert into foo values(0,'a','{"a0","b0","c0"}'); + + + + fail_on_error + + If true (default when not present) then an ERROR thrown on the remote side + of the connection causes an ERROR to also be thrown locally. If false, the + remote ERROR is locally treated as a NOTICE, and the return value is set + to 'ERROR'. + + + + + + Outputs + Returns status of the command, or 'ERROR' if the command failed. + + + + Notes + + dblink_open starts an explicit transaction. If, after using dblink_open, + you use dblink_exec to change data, and then an error occurs or you use + dblink_disconnect without a dblink_close first, your change *will* be + lost. 
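+
+ As a sketch of the pitfall described in these notes (assuming an unnamed
+ connection is already open and a remote table foo exists):
+
+ select dblink_open('foo_cur', 'select * from foo');  -- implicitly begins a remote transaction
+ select dblink_exec('update foo set f2 = ''z'' where f1 = 0');
+ select dblink_disconnect();  -- no dblink_close first: the update is rolled back
+
+ Issuing dblink_close('foo_cur') before disconnecting ends the transaction
+ and preserves the change.
+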
+
+ Example
+
+ select dblink_connect('dbname=dblink_test_slave');
+  dblink_connect
+ ----------------
+  OK
+ (1 row)
+
+ select dblink_exec('insert into foo values(21,''z'',''{"a0","b0","c0"}'');');
+  dblink_exec
+ -----------------
+  INSERT 943366 1
+ (1 row)
+
+ select dblink_connect('myconn','dbname=regression');
+  dblink_connect
+ ----------------
+  OK
+ (1 row)
+
+ select dblink_exec('myconn','insert into foo values(21,''z'',''{"a0","b0","c0"}'');');
+  dblink_exec
+ ------------------
+  INSERT 6432584 1
+ (1 row)
+
+ select dblink_exec('myconn','insert into pg_class values (''foo'')',false);
+ NOTICE:  sql error
+ DETAIL:  ERROR:  null value in column "relnamespace" violates not-null constraint
+
+  dblink_exec
+ -------------
+  ERROR
+ (1 row)
+
+ dblink_current_query
+ returns the current query string
+
+ dblink_current_query () RETURNS text
+
+ Inputs
+
+ None
+
+ Outputs
+ Returns text -- a copy of the currently executing query
+
+ Example
+
+ test=# select dblink_current_query() from (select dblink('dbname=postgres','select oid, proname from pg_proc where proname = ''byteacat''') as f1) as t1;
+                                                                 dblink_current_query
+ -----------------------------------------------------------------------------------------------------------------------------------------------------
+  select dblink_current_query() from (select dblink('dbname=postgres','select oid, proname from pg_proc where proname = ''byteacat''') as f1) as t1;
+ (1 row)
+
+ dblink_get_pkey
+ returns the position and field names of a relation's
+ primary key fields
+
+ dblink_get_pkey(text relname) RETURNS setof dblink_pkey_results
+
+ Inputs
+
+ relname
+
+ any relation name;
+ e.g. 'foobar'
+
+ Outputs
+
+ Returns setof dblink_pkey_results -- one row for each primary key field,
+ in order of position in the key. dblink_pkey_results is defined as follows:
+ CREATE TYPE dblink_pkey_results AS (position int4, colname text);
+
+ Example
+
+ test=# select * from dblink_get_pkey('foobar');
+  position | colname
+ ----------+---------
+         1 | f1
+         2 | f2
+         3 | f3
+         4 | f4
+         5 | f5
+
+ dblink_build_sql_insert
+
+ builds an insert statement using a local tuple, replacing the
+ selection key field values with alternate supplied values
+
+ dblink_build_sql_insert(text relname
+                         ,int2vector primary_key_attnums
+                         ,int2 num_primary_key_atts
+                         ,_text src_pk_att_vals_array
+                         ,_text tgt_pk_att_vals_array) RETURNS text
+
+ Inputs
+
+ relname
+
+ any relation name;
+ e.g. 'foobar'
+
+ primary_key_attnums
+
+ vector of primary key attnums (1 based, see pg_index.indkey);
+ e.g. '1 2'
+
+ num_primary_key_atts
+
+ number of primary key attnums in the vector; e.g. 2
+
+ src_pk_att_vals_array
+
+ array of primary key values, used to look up the local matching
+ tuple, the values of which are then used to construct the SQL
+ statement
+
+ tgt_pk_att_vals_array
+
+ array of primary key values, used to replace the local tuple
+ values in the SQL statement
+
+ Outputs
+ Returns text -- requested SQL statement
+
+ Example
+
+ test=# select dblink_build_sql_insert('foo','1 2',2,'{"1", "a"}','{"1", "b''a"}');
+                  dblink_build_sql_insert
+ --------------------------------------------------
+  INSERT INTO foo(f1,f2,f3) VALUES('1','b''a','1')
+ (1 row)
+
+ dblink_build_sql_delete
+ builds a delete statement using supplied values for selection
+ key field values
+
+ dblink_build_sql_delete(text relname
+                         ,int2vector primary_key_attnums
+                         ,int2 num_primary_key_atts
+                         ,_text tgt_pk_att_vals_array) RETURNS text
+
+ Inputs
+
+ relname
+
+ any relation name;
+ e.g. 'foobar'
+
+ primary_key_attnums
+
+ vector of primary key attnums (1 based, see pg_index.indkey);
+ e.g. '1 2'
+
+ num_primary_key_atts
+
+ number of primary key attnums in the vector; e.g. 2
+
+ tgt_pk_att_vals_array
+
+ array of primary key values, used in the WHERE clause of the
+ generated DELETE statement
+
+ Outputs
+ Returns text -- requested SQL statement
+
+ Example
+
+ test=# select dblink_build_sql_delete('MyFoo','1 2',2,'{"1", "b"}');
+            dblink_build_sql_delete
+ ---------------------------------------------
+  DELETE FROM "MyFoo" WHERE f1='1' AND f2='b'
+ (1 row)
+
+ dblink_build_sql_update
+ builds an update statement using a local tuple, replacing
+ the selection key field values with alternate supplied values
+
+ dblink_build_sql_update(text relname
+                         ,int2vector primary_key_attnums
+                         ,int2 num_primary_key_atts
+                         ,_text src_pk_att_vals_array
+                         ,_text tgt_pk_att_vals_array) RETURNS text
+
+ Inputs
+
+ relname
+
+ any relation name;
+ e.g. 'foobar'
+
+ primary_key_attnums
+
+ vector of primary key attnums (1 based, see pg_index.indkey);
+ e.g. '1 2'
+
+ num_primary_key_atts
+
+ number of primary key attnums in the vector; e.g.
2 + + + + src_pk_att_vals_array + + array of primary key values, used to look up the local matching + tuple, the values of which are then used to construct the SQL + statement + + + + tgt_pk_att_vals_array + + array of primary key values, used to replace the local tuple + values in the SQL statement + + + + + + Outputs + Returns text -- requested SQL statement + + + + Example + + test=# select dblink_build_sql_update('foo','1 2',2,'{"1", "a"}','{"1", "b"}'); + dblink_build_sql_update + ------------------------------------------------------------- + UPDATE foo SET f1='1',f2='b',f3='1' WHERE f1='1' AND f2='b' + (1 row) + + + + + + + + dblink_get_connections + returns a text array of all active named dblink connections + + + + + dblink_get_connections() RETURNS text[] + + + + + Inputs + + + none + + + + + + Outputs + Returns text array of all active named dblink connections + + + + Example + + SELECT dblink_get_connections(); + + + + + + + + dblink_is_busy + checks to see if named connection is busy with an async query + + + + + dblink_is_busy(text connname) RETURNS int + + + + + Inputs + + + connname + + The specific connection name to use + + + + + + Outputs + + Returns 1 if connection is busy, 0 if it is not busy. + If this function returns 0, it is guaranteed that dblink_get_result + will not block. + + + + + Example + + SELECT dblink_is_busy('dtest1'); + + + + + + + + dblink_cancel_query + cancels any active query on the named connection + + + + + dblink_cancel_query(text connname) RETURNS text + + + + + Inputs + + + connname + + The specific connection name to use. + + + + + + Outputs + + Returns "OK" on success, or an error message on failure. + + + + + Example + + SELECT dblink_cancel_query('dtest1'); + + + + + + + + dblink_error_message + gets last error message on the named connection + + + + + dblink_error_message(text connname) RETURNS text + + + + + Inputs + + + connname + + The specific connection name to use. + + + + + + Outputs + + Returns last error message. + + + + + Example + + SELECT dblink_error_message('dtest1'); + + + + + + + + dblink + returns a set from a remote database + + + + + dblink(text connstr, text sql [, bool fail_on_error]) + dblink(text connname, text sql [, bool fail_on_error]) + dblink(text sql [, bool fail_on_error]) + + + + + Inputs + + + connname/connstr + + If two arguments are present, the first is first assumed to be a specific + connection name to use. If the name is not found, the argument is then + assumed to be a valid connection string, of standard libpq format, + e.g.: "hostaddr=127.0.0.1 dbname=mydb user=postgres password=mypasswd" + + If only one argument is used, then the unnamed connection is used. + + + + + sql + + sql statement that you wish to execute on the remote host + e.g. "select * from pg_class" + + + + fail_on_error + + If true (default when not present) then an ERROR thrown on the remote side + of the connection causes an ERROR to also be thrown locally. If false, the + remote ERROR is locally treated as a NOTICE, and no rows are returned. 
+ + + + + + + + + + + Outputs + Returns setof record + + + + Example + + select * from dblink('dbname=postgres','select proname, prosrc from pg_proc') + as t1(proname name, prosrc text) where proname like 'bytea%'; + proname | prosrc + ------------+------------ + byteacat | byteacat + byteaeq | byteaeq + bytealt | bytealt + byteale | byteale + byteagt | byteagt + byteage | byteage + byteane | byteane + byteacmp | byteacmp + bytealike | bytealike + byteanlike | byteanlike + byteain | byteain + byteaout | byteaout + (12 rows) + + select dblink_connect('dbname=postgres'); + dblink_connect + ---------------- + OK + (1 row) + + select * from dblink('select proname, prosrc from pg_proc') + as t1(proname name, prosrc text) where proname like 'bytea%'; + proname | prosrc + ------------+------------ + byteacat | byteacat + byteaeq | byteaeq + bytealt | bytealt + byteale | byteale + byteagt | byteagt + byteage | byteage + byteane | byteane + byteacmp | byteacmp + bytealike | bytealike + byteanlike | byteanlike + byteain | byteain + byteaout | byteaout + (12 rows) + + select dblink_connect('myconn','dbname=regression'); + dblink_connect + ---------------- + OK + (1 row) + + select * from dblink('myconn','select proname, prosrc from pg_proc') + as t1(proname name, prosrc text) where proname like 'bytea%'; + proname | prosrc + ------------+------------ + bytearecv | bytearecv + byteasend | byteasend + byteale | byteale + byteagt | byteagt + byteage | byteage + byteane | byteane + byteacmp | byteacmp + bytealike | bytealike + byteanlike | byteanlike + byteacat | byteacat + byteaeq | byteaeq + bytealt | bytealt + byteain | byteain + byteaout | byteaout + (14 rows) + + + A more convenient way to use dblink may be to create a view: + + + create view myremote_pg_proc as + select * + from dblink('dbname=postgres','select proname, prosrc from pg_proc') + as t1(proname name, prosrc text); + + + Then you can simply write: + + + select * from myremote_pg_proc where proname like 'bytea%'; + + + + + + + + dblink_send_query + sends an async query to a remote database + + + + + dblink_send_query(text connname, text sql) + + + + + Inputs + + + connname + + The specific connection name to use. + + + + sql + + sql statement that you wish to execute on the remote host + e.g. "select * from pg_class" + + + + + + Outputs + + Returns int. A return value of 1 if the query was successfully dispatched, + 0 otherwise. If 1, results must be fetched by dblink_get_result(connname). + A running query may be cancelled by dblink_cancel_query(connname). + + + + + Example + + + SELECT dblink_connect('dtest1', 'dbname=contrib_regression'); + SELECT * FROM + dblink_send_query('dtest1', 'SELECT * FROM foo WHERE f1 < 3') AS t1; + + + + + + + + + dblink_get_result + gets an async query result + + + + + dblink_get_result(text connname [, bool fail_on_error]) + + + + + Inputs + + + connname + + The specific connection name to use. An asynchronous query must + have already been sent using dblink_send_query() + + + + fail_on_error + + If true (default when not present) then an ERROR thrown on the remote side + of the connection causes an ERROR to also be thrown locally. If false, the + remote ERROR is locally treated as a NOTICE, and no rows are returned. + + + + + + Outputs + Returns setof record + + + + Notes + + Blocks until a result gets available. + + This function *must* be called if dblink_send_query returned + a 1, even on cancelled queries - otherwise the connection + can't be used anymore. 
It must be called once for each query + sent, and one additional time to obtain an empty set result, + prior to using the connection again. + + + + + Example + + contrib_regression=# SELECT dblink_connect('dtest1', 'dbname=contrib_regression'); + dblink_connect + ---------------- + OK + (1 row) + + contrib_regression=# SELECT * from + contrib_regression-# dblink_send_query('dtest1', 'select * from foo where f1 < 3') as t1; + t1 + ---- + 1 + (1 row) + + contrib_regression=# SELECT * from dblink_get_result('dtest1') as t1(f1 int, f2 text, f3 text[]); + f1 | f2 | f3 + ----+----+------------ + 0 | a | {a0,b0,c0} + 1 | b | {a1,b1,c1} + 2 | c | {a2,b2,c2} + (3 rows) + + contrib_regression=# SELECT * from dblink_get_result('dtest1') as t1(f1 int, f2 text, f3 text[]); + f1 | f2 | f3 + ----+----+---- + (0 rows) + + contrib_regression=# SELECT * from + dblink_send_query('dtest1', 'select * from foo where f1 < 3; select * from foo where f1 > 6') as t1; + t1 + ---- + 1 + (1 row) + + contrib_regression=# SELECT * from dblink_get_result('dtest1') as t1(f1 int, f2 text, f3 text[]); + f1 | f2 | f3 + ----+----+------------ + 0 | a | {a0,b0,c0} + 1 | b | {a1,b1,c1} + 2 | c | {a2,b2,c2} + (3 rows) + + contrib_regression=# SELECT * from dblink_get_result('dtest1') as t1(f1 int, f2 text, f3 text[]); + f1 | f2 | f3 + ----+----+--------------- + 7 | h | {a7,b7,c7} + 8 | i | {a8,b8,c8} + 9 | j | {a9,b9,c9} + 10 | k | {a10,b10,c10} + (4 rows) + + contrib_regression=# SELECT * from dblink_get_result('dtest1') as t1(f1 int, f2 text, f3 text[]); + f1 | f2 | f3 + ----+----+---- + (0 rows) + + + + diff --git a/doc/src/sgml/earthdistance.sgml b/doc/src/sgml/earthdistance.sgml new file mode 100644 index 0000000000000000000000000000000000000000..2d08bb829d6e43535eb9fe5bed2e8ed1310eb68c --- /dev/null +++ b/doc/src/sgml/earthdistance.sgml @@ -0,0 +1,133 @@ + + earthdistance + + + earthdistance + + + + This module contains two different approaches to calculating + great circle distances on the surface of the Earth. The one described + first depends on the contrib/cube package (which MUST be installed before + earthdistance is installed). The second one is based on the point + datatype using latitude and longitude for the coordinates. The install + script makes the defined functions executable by anyone. + + + A spherical model of the Earth is used. + + + Data is stored in cubes that are points (both corners are the same) using 3 + coordinates representing the distance from the center of the Earth. + + + The radius of the Earth is obtained from the earth() function. It is + given in meters. But by changing this one function you can change it + to use some other units or to use a different value of the radius + that you feel is more appropiate. + + + This package also has applications to astronomical databases as well. + Astronomers will probably want to change earth() to return a radius of + 180/pi() so that distances are in degrees. + + + Functions are provided to allow for input in latitude and longitude (in + degrees), to allow for output of latitude and longitude, to calculate + the great circle distance between two points and to easily specify a + bounding box usable for index searches. + + + The functions are all 'sql' functions. If you want to make these functions + executable by other people you will also have to make the referenced + cube functions executable. 
cube(text), cube(float8), cube(cube,float8),
+ cube_distance(cube,cube), cube_ll_coord(cube,int) and
+ cube_enlarge(cube,float8,int) are used indirectly by the earth distance
+ functions. is_point(cube) and cube_dim(cube) are used in constraints for data
+ in domain earth. cube_ur_coord(cube,int) is used in the regression tests and
+ might be useful for looking at bounding box coordinates in user applications.
+
+
+ A domain of type cube named earth is defined.
+ There are constraints on it defined to make sure the cube is a point,
+ that it does not have more than 3 dimensions and that it is very near
+ the surface of a sphere centered about the origin with the radius of
+ the Earth.
+
+
+ The following functions are provided:
+
+
+
+ EarthDistance functions
+
+
+
+ earth()
+ Returns the radius of the Earth in meters.
+
+
+ sec_to_gc(float8)
+ Converts the normal straight line
+ (secant) distance between two points on the surface of the Earth
+ to the great circle distance between them.
+
+
+
+ gc_to_sec(float8)
+ Converts the great circle distance
+ between two points on the surface of the Earth to the normal straight line
+ (secant) distance between them.
+
+
+
+ ll_to_earth(float8, float8)
+ Returns the location of a point on the surface of the Earth given
+ its latitude (argument 1) and longitude (argument 2) in degrees.
+
+
+
+ latitude(earth)
+ Returns the latitude in degrees of a point on the surface of the
+ Earth.
+
+
+
+ longitude(earth)
+ Returns the longitude in degrees of a point on the surface of the
+ Earth.
+
+
+
+ earth_distance(earth, earth)
+ Returns the great circle distance between two points on the
+ surface of the Earth.
+
+
+
+ earth_box(earth, float8)
+ Returns a box suitable for an indexed search using the cube @>
+ operator for points within a given great circle distance of a location.
+ Some points in this box are farther than the specified great circle
+ distance from the location, so a second check using earth_distance
+ should be made at the same time.
+
+
+
+ <@> operator
+ Gives the distance in statute miles between
+ two points on the Earth's surface. Coordinates are in degrees. Points are
+ taken as (longitude, latitude) and not vice versa, as longitude is closer
+ to the intuitive idea of the x-axis and latitude to the y-axis.
+
+
+
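+
+ As an illustration of how these pieces fit together, here is a sketch only
+ (the places table and the coordinates are invented for the example; the
+ functions and the cube @> search pattern are the ones described above):
+
+ -- Hypothetical table; latitude and longitude are in degrees.
+ CREATE TABLE places (name text, lat float8, lon float8);
+
+ -- Great circle distance in meters between two coordinate pairs.
+ SELECT earth_distance(ll_to_earth(40.7, -74.0), ll_to_earth(52.5, 13.4));
+
+ -- Indexed radius search: earth_box() supplies a bounding box for the cube @>
+ -- operator, and the earth_distance() recheck removes the points the box lets
+ -- through that are farther away than the 100 km radius.
+ CREATE INDEX places_earth_idx ON places USING gist (ll_to_earth(lat, lon));
+ SELECT name FROM places
+  WHERE earth_box(ll_to_earth(40.7, -74.0), 100000) @> ll_to_earth(lat, lon)
+    AND earth_distance(ll_to_earth(40.7, -74.0), ll_to_earth(lat, lon)) < 100000;
+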
+
+ One advantage of the cube representation over a point with latitude and
+ longitude coordinates is that you don't have to worry about special
+ conditions at +/- 180 degrees of longitude or near the poles.
+
+ diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml index 5502bb42086929ddcf9daa9a10f424ad8be11a50..a1a8d048ed3cc2fa629c59b6ade8c75d55707aea 100644 --- a/doc/src/sgml/filelist.sgml +++ b/doc/src/sgml/filelist.sgml @@ -1,4 +1,4 @@ - + @@ -89,6 +89,38 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/doc/src/sgml/freespacemap.sgml b/doc/src/sgml/freespacemap.sgml new file mode 100644 index 0000000000000000000000000000000000000000..70b27415524d57f30c5de8bfd8c0f0db7b795078 --- /dev/null +++ b/doc/src/sgml/freespacemap.sgml @@ -0,0 +1,243 @@ + + pgfreespacemap + + + pgfreespacemap + + + + This modules provides the means for examining the free space map (FSM). It + consists of two C functions: pg_freespacemap_relations() + and pg_freespacemap_pages() that return a set + of records, plus two views pg_freespacemap_relations and + pg_freespacemap_pages for more user-friendly access to + the functions. + + + The module provides the ability to examine the contents of the free space + map, without having to restart or rebuild the server with additional + debugging code. + + + By default public access is REVOKED from the functions and views, just in + case there are security issues present in the code. + + + + Notes + + The definitions for the columns exposed in the views are: + + + + pg_freespacemap_relations + + + + Column + references + Description + + + + + reltablespace + pg_tablespace.oid + Tablespace oid of the relation. + + + reldatabase + pg_database.oid + Database oid of the relation. + + + relfilenode + pg_class.relfilenode + Relfilenode of the relation. + + + avgrequest + + Moving average of free space requests (NULL for indexes) + + + interestingpages + + Count of pages last reported as containing useful free space. + + + storedpages + + Count of pages actually stored in free space map. + + + nextpage + + Page index (from 0) to start next search at. + + + +
+ + + pg_freespacemap_pages + + + + Column + references + Description + + + + + reltablespace + pg_tablespace.oid + Tablespace oid of the relation. + + + reldatabase + pg_database.oid + Database oid of the relation. + + + relfilenode + pg_class.relfilenode + Relfilenode of the relation. + + + relblocknumber + + Page number in the relation. + + + bytes + + Free bytes in the page, or NULL for an index page (see below). + + + +
+
+
+ For pg_freespacemap_relations, there is one row for each
+ relation in the free space map. storedpages is the
+ number of pages actually stored in the map, while
+ interestingpages is the number of pages the last VACUUM
+ thought had useful amounts of free space.
+
+
+ If storedpages is consistently less than interestingpages,
+ then it would be a good idea to increase max_fsm_pages. Also,
+ if the number of rows in pg_freespacemap_relations is
+ close to max_fsm_relations, then you should consider
+ increasing max_fsm_relations.
+
+
+ For pg_freespacemap_pages, there is one row for each page
+ in the free space map. The number of rows for a relation will match the
+ storedpages column in
+ pg_freespacemap_relations.
+
+
+ For indexes, what is tracked is entirely-unused pages, rather than free
+ space within pages. Therefore, the average request size and free bytes
+ within a page are not meaningful, and are shown as NULL.
+
+
+ Because the map is shared by all the databases, it will include relations
+ not belonging to the current database.
+
+
+ When either of the views is accessed, internal free space map locks are
+ taken, and a copy of the map data is made for them to display.
+ This ensures that the views produce a consistent set of results, while not
+ blocking normal activity longer than necessary. Nonetheless there
+ could be some impact on database performance if they are read often.
+
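+
+ One way to act on this advice is to compare the map contents against the
+ current settings. The following is only a sketch (the view is the one
+ described above, current_setting() reads a configuration parameter, and any
+ threshold for "consistently less" is up to you):
+
+ -- Pages the last VACUUM considered interesting vs. pages actually stored;
+ -- a persistent shortfall suggests raising max_fsm_pages.
+ SELECT sum(interestingpages) AS wanted,
+        sum(storedpages) AS stored,
+        current_setting('max_fsm_pages') AS max_fsm_pages
+   FROM pg_freespacemap_relations;
+
+ -- Relations tracked vs. max_fsm_relations.
+ SELECT count(*) AS tracked,
+        current_setting('max_fsm_relations') AS max_fsm_relations
+   FROM pg_freespacemap_relations;
+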
+ + + Sample output - pg_freespacemap_relations + +regression=# \d pg_freespacemap_relations +View "public.pg_freespacemap_relations" + Column | Type | Modifiers +------------------+---------+----------- + reltablespace | oid | + reldatabase | oid | + relfilenode | oid | + avgrequest | integer | + interestingpages | integer | + storedpages | integer | + nextpage | integer | +View definition: + SELECT p.reltablespace, p.reldatabase, p.relfilenode, p.avgrequest, p.interestingpages, p.storedpages, p.nextpage + FROM pg_freespacemap_relations() p(reltablespace oid, reldatabase oid, relfilenode oid, avgrequest integer, interestingpages integer, storedpages integer, nextpage integer); + +regression=# SELECT c.relname, r.avgrequest, r.interestingpages, r.storedpages + FROM pg_freespacemap_relations r INNER JOIN pg_class c + ON c.relfilenode = r.relfilenode INNER JOIN pg_database d + ON r.reldatabase = d.oid AND (d.datname = current_database()) + ORDER BY r.storedpages DESC LIMIT 10; + relname | avgrequest | interestingpages | storedpages +---------------------------------+------------+------------------+------------- + onek | 256 | 109 | 109 + pg_attribute | 167 | 93 | 93 + pg_class | 191 | 49 | 49 + pg_attribute_relid_attnam_index | | 48 | 48 + onek2 | 256 | 37 | 37 + pg_depend | 95 | 26 | 26 + pg_type | 199 | 16 | 16 + pg_rewrite | 1011 | 13 | 13 + pg_class_relname_nsp_index | | 10 | 10 + pg_proc | 302 | 8 | 8 +(10 rows) + + + + + Sample output - pg_freespacemap_pages + +regression=# \d pg_freespacemap_pages + View "public.pg_freespacemap_pages" + Column | Type | Modifiers +----------------+---------+----------- + reltablespace | oid | + reldatabase | oid | + relfilenode | oid | + relblocknumber | bigint | + bytes | integer | +View definition: + SELECT p.reltablespace, p.reldatabase, p.relfilenode, p.relblocknumber, p.bytes + FROM pg_freespacemap_pages() p(reltablespace oid, reldatabase oid, relfilenode oid, relblocknumber bigint, bytes integer); + +regression=# SELECT c.relname, p.relblocknumber, p.bytes + FROM pg_freespacemap_pages p INNER JOIN pg_class c + ON c.relfilenode = p.relfilenode INNER JOIN pg_database d + ON (p.reldatabase = d.oid AND d.datname = current_database()) + ORDER BY c.relname LIMIT 10; + relname | relblocknumber | bytes +--------------+----------------+------- + a_star | 0 | 8040 + abstime_tbl | 0 | 7908 + aggtest | 0 | 8008 + altinhoid | 0 | 8128 + altstartwith | 0 | 8128 + arrtest | 0 | 7172 + b_star | 0 | 7976 + box_tbl | 0 | 7912 + bt_f8_heap | 54 | 7728 + bt_i4_heap | 49 | 8008 +(10 rows) + + + + + Author + + Mark Kirkwood markir@paradise.net.nz + + +
+
diff --git a/doc/src/sgml/fuzzystrmatch.sgml b/doc/src/sgml/fuzzystrmatch.sgml
new file mode 100644
index 0000000000000000000000000000000000000000..666e031c0d63fcf84fa26be0878ddc930892d081
--- /dev/null
+++ b/doc/src/sgml/fuzzystrmatch.sgml
@@ -0,0 +1,122 @@
+
+
+ fuzzystrmatch
+
+
+ This section describes the fuzzystrmatch module, which provides several
+ functions to determine similarities and distances between strings.
+
+
+
+ Soundex
+
+ The Soundex system is a method of matching similar-sounding names
+ (or any words) to the same code. It was initially used by the
+ United States Census in 1880, 1900, and 1910, but it has little use
+ beyond English names (or the English pronunciation of names), and
+ it is not a linguistic tool.
+
+
+ When comparing two soundex values to determine similarity, the
+ difference function reports how close the match is on a scale
+ from zero to four, with zero being no match and four being an
+ exact match.
+
+
+ The following are some usage examples:
+
+
+SELECT soundex('hello world!');
+
+SELECT soundex('Anne'), soundex('Ann'), difference('Anne', 'Ann');
+SELECT soundex('Anne'), soundex('Andrew'), difference('Anne', 'Andrew');
+SELECT soundex('Anne'), soundex('Margaret'), difference('Anne', 'Margaret');
+
+CREATE TABLE s (nm text);
+
+INSERT INTO s VALUES ('john');
+INSERT INTO s VALUES ('joan');
+INSERT INTO s VALUES ('wobbly');
+INSERT INTO s VALUES ('jack');
+
+SELECT * FROM s WHERE soundex(nm) = soundex('john');
+
+SELECT a.nm, b.nm FROM s a, s b WHERE soundex(a.nm) = soundex(b.nm) AND a.oid <> b.oid;
+
+CREATE FUNCTION text_sx_eq(text, text) RETURNS boolean AS
+'select soundex($1) = soundex($2)'
+LANGUAGE SQL;
+
+CREATE FUNCTION text_sx_lt(text, text) RETURNS boolean AS
+'select soundex($1) < soundex($2)'
+LANGUAGE SQL;
+
+CREATE FUNCTION text_sx_gt(text, text) RETURNS boolean AS
+'select soundex($1) > soundex($2)'
+LANGUAGE SQL;
+
+CREATE FUNCTION text_sx_le(text, text) RETURNS boolean AS
+'select soundex($1) <= soundex($2)'
+LANGUAGE SQL;
+
+CREATE FUNCTION text_sx_ge(text, text) RETURNS boolean AS
+'select soundex($1) >= soundex($2)'
+LANGUAGE SQL;
+
+CREATE FUNCTION text_sx_ne(text, text) RETURNS boolean AS
+'select soundex($1) <> soundex($2)'
+LANGUAGE SQL;
+
+DROP OPERATOR #= (text, text);
+
+CREATE OPERATOR #= (leftarg=text, rightarg=text, procedure=text_sx_eq, commutator = #=);
+
+SELECT * FROM s WHERE text_sx_eq(nm, 'john');
+
+SELECT * FROM s WHERE s.nm #= 'john';
+
+SELECT * FROM s WHERE difference(s.nm, 'john') > 2;
+
+
+
+
+ levenshtein
+
+ This function calculates the Levenshtein distance between two strings:
+
+
+ int levenshtein(text source, text target)
+
+
+ Both source and target can be any
+ NOT NULL string with a maximum of 255 characters.
+
+
+ Example:
+
+
+ SELECT levenshtein('GUMBO','GAMBOL');
+
+
+
+ metaphone
+
+ This function calculates and returns the metaphone code of an input string:
+
+
+ text metaphone(text source, int max_output_length)
+
+
+ source has to be a NOT NULL string with a maximum of
+ 255 characters. max_output_length fixes the maximum
+ length of the output metaphone code; if longer, the output is truncated
+ to this length.
+
+ Example
+
+ SELECT metaphone('GUMBO',4);
+
+
+
diff --git a/doc/src/sgml/hstore.sgml b/doc/src/sgml/hstore.sgml
new file mode 100644
index 0000000000000000000000000000000000000000..147fc7fba606ab0d3ef931f5264b8b5d6ce80960
--- /dev/null
+++ b/doc/src/sgml/hstore.sgml
@@ -0,0 +1,298 @@
+
+ hstore
+
+
+ hstore
+
+
+
+ The hstore module is useful for storing (key,value) pairs.
+ It can help in several scenarios: rows with many attributes that are
+ rarely searched, semistructured data, or a lazy DBA.
+
+
+
+ Operations
+
+
+
+ hstore -> text - get value, Perl analogy $h{key}
+
+
+select 'a=>q, b=>g'->'a';
+  ?
+ ------
+  q
+
+
+ Note the use of parentheses in the select below, because the precedence
+ of 'is' is higher than that of '->':
+
+
+SELECT id FROM entrants WHERE (info->'education_period') IS NOT NULL;
+
+
+
+
+ hstore || hstore - concatenation, Perl analogy %a=( %b, %c );
+
+
+regression=# select 'a=>b'::hstore || 'c=>d'::hstore;
+      ?column?
+--------------------
+ "a"=>"b", "c"=>"d"
+(1 row)
+
+
+
+ but notice:
+
+
+
+regression=# select 'a=>b'::hstore || 'a=>d'::hstore;
+ ?column?
+----------
+ "a"=>"d"
+(1 row)
+
+
+
+
+ text => text - creates hstore type from two text strings
+
+
+select 'a'=>'b';
+ ?column?
+----------
+ "a"=>"b"
+
+
+
+
+ hstore @> hstore - contains operation, checks whether the left
+ operand contains the right.
+
+
+regression=# select 'a=>b, b=>1, c=>NULL'::hstore @> 'a=>c';
+ ?column?
+----------
+ f
+(1 row)
+
+regression=# select 'a=>b, b=>1, c=>NULL'::hstore @> 'b=>1';
+ ?column?
+----------
+ t
+(1 row)
+
+
+
+
+ hstore <@ hstore - contained operation, checks whether the
+ left operand is contained in the right.
+
+
+ (Before PostgreSQL 8.2, the containment operators @> and <@ were
+ respectively called @ and ~. These names are still available, but are
+ deprecated and will eventually be retired. Notice that the old names
+ are reversed from the convention formerly followed by the core geometric
+ datatypes!)
+
+
+
+
+
+ Functions
+
+
+
+
+ akeys(hstore) - returns all keys from hstore as an array
+
+
+regression=# select akeys('a=>1,b=>2');
+ akeys
+-------
+ {a,b}
+
+
+
+
+ skeys(hstore) - returns all keys from hstore as strings
+
+
+regression=# select skeys('a=>1,b=>2');
+ skeys
+-------
+ a
+ b
+
+
+
+
+ avals(hstore) - returns all values from hstore as an array
+
+
+regression=# select avals('a=>1,b=>2');
+ avals
+-------
+ {1,2}
+
+
+
+
+ svals(hstore) - returns all values from hstore as
+ strings
+
+
+regression=# select svals('a=>1,b=>2');
+ svals
+-------
+ 1
+ 2
+
+
+
+
+ delete(hstore,text) - deletes the (key,value) pair from hstore if
+ the key matches the argument.
+
+
+regression=# select delete('a=>1,b=>2','b');
+  delete
+----------
+ "a"=>"1"
+
+
+
+
+ each(hstore) - returns (key, value) pairs
+
+
+regression=# select * from each('a=>1,b=>2');
+ key | value
+-----+-------
+ a   | 1
+ b   | 2
+
+
+
+
+ exist(hstore,text)
+
+
+ hstore ? text - returns true if the key exists in hstore
+ and false otherwise.
+
+
+regression=# select exist('a=>1','a'), 'a=>1' ? 'a';
+ exist | ?column?
+-------+----------
+ t     | t
+
+
+
+
+ defined(hstore,text) - returns true if the key exists in
+ hstore and its value is not NULL.
+
+
+regression=# select defined('a=>NULL','a');
+ defined
+---------
+ f
+
+
+
+
+
+ Indices
+
+ The module provides index support for the '@>' and '?' operations.
+
+
+CREATE INDEX hidx ON testhstore USING GIST(h);
+CREATE INDEX hidx ON testhstore USING GIN(h);
+
+
+
+ Examples
+
+
+ Add a key:
+
+
+UPDATE tt SET h=h||'c=>3';
+
+
+ Delete a key:
+
+
+UPDATE tt SET h=delete(h,'k1');
+
+
+
+ Statistics
+
+ The hstore type, because of its intrinsic liberality, could contain a lot of
+ different keys. Checking for valid keys is the task of the application.
+ The examples below demonstrate several techniques for checking key
+ statistics.
+
+
+
+ Simple example
+
+
+SELECT * FROM each('aaa=>bq, b=>NULL, ""=>1 ');
+
+
+
+ Using a table
+
+
+SELECT (each(h)).key, (each(h)).value INTO stat FROM testhstore ;
+
+
+ Online statistics
+
+SELECT key, count(*) FROM (SELECT (each(h)).key FROM testhstore) AS stat GROUP BY key ORDER BY count DESC, key;
+    key    | count
+-----------+-------
+ line      |   883
+ query     |   207
+ pos       |   203
+ node      |   202
+ space     |   197
+ status    |   195
+ public    |   194
+ title     |   190
+ org       |   189
+...................
+
+
+
+
+ Authors
+
+ Oleg Bartunov oleg@sai.msu.su, Moscow, Moscow University, Russia
+
+
+ Teodor Sigaev teodor@sigaev.ru, Moscow, Delta-Soft Ltd., Russia
+
+
+
diff --git a/doc/src/sgml/intagg.sgml b/doc/src/sgml/intagg.sgml
new file mode 100644
index 0000000000000000000000000000000000000000..3fbd5c3281fa558bf0bc87c7a26a81f3518b3ec9
--- /dev/null
+++ b/doc/src/sgml/intagg.sgml
@@ -0,0 +1,82 @@
+
+
+ intagg
+
+
+ intagg
+
+
+
+ This section describes the intagg module, which provides an integer
+ aggregator and an enumerator.
+
+
+ Many database systems have the notion of a one-to-many table. Such a table
+ usually sits between two indexed tables, for example:
+
+
+CREATE TABLE one_to_many(left INT, right INT);
+
+
+
+ And it is used like this:
+
+
+
+ SELECT right.* from right JOIN one_to_many ON (right.id = one_to_many.right)
+ WHERE one_to_many.left = item;
+
+
+
+ This will return all the items in the right-hand table for an entry
+ in the left-hand table. This is a very common construct in SQL.
+
+
+
+ Now, this methodology can be cumbersome with a very large number of
+ entries in the one_to_many table. Depending on the order in which
+ data was entered, a join like this could result in an index scan
+ and a fetch for each right-hand entry in the table for a particular
+ left-hand entry. If you have a very dynamic system, there is not much you
+ can do. However, if you have some data which is fairly static, you can
+ create a summary table with the aggregator.
+
+
+
+CREATE TABLE summary as SELECT left, int_array_aggregate(right)
+ AS right FROM one_to_many GROUP BY left;
+
+
+
+ This will create a table with one row per left item, and an array
+ of right items. This is pretty useless without some way of using
+ the array; that's why there is an array enumerator.
+
+
+SELECT left, int_array_enum(right) FROM summary WHERE left = item;
+
+
+
+ The above query using int_array_enum produces the same results as:
+
+
+SELECT left, right FROM one_to_many WHERE left = item;
+
+
+
+ The difference is that the query against the summary table has to get
+ only one row from the table, whereas the query against "one_to_many"
+ must index scan and fetch a row for each entry.
+
+
+ On our system, an EXPLAIN shows that a query with a cost of 8488 is
+ reduced to a cost of 329.
The query is a join between the summary table and the left-hand table:
+
+
+SELECT right, count(right) FROM
+(
+  SELECT left, int_array_enum(right) AS right FROM summary JOIN
+   (SELECT left FROM left_table WHERE left = item) AS lefts
+  ON (summary.left = lefts.left )
+) AS list GROUP BY right ORDER BY count DESC ;
+
+
diff --git a/doc/src/sgml/intarray.sgml b/doc/src/sgml/intarray.sgml
new file mode 100644
index 0000000000000000000000000000000000000000..7e538a894d5f510a0d51d8d816a741b2ddae8de9
--- /dev/null
+++ b/doc/src/sgml/intarray.sgml
@@ -0,0 +1,286 @@
+
+ intarray
+
+
+ intarray
+
+
+
+ This is an implementation of the RD-tree data structure using the GiST
+ interface of PostgreSQL. It has built-in lossy compression.
+
+
+
+ The current implementation provides index support for one-dimensional
+ arrays of int4: gist__int_ops, suitable for small and medium-size arrays
+ (used by default), and gist__intbig_ops for indexing large arrays (we use a
+ superimposed signature with a length of 4096 bits to represent sets).
+
+
+ Functions
+
+
+
+
+ int icount(int[]) - the number of elements in intarray
+
+
+test=# select icount('{1,2,3}'::int[]);
+ icount
+--------
+      3
+(1 row)
+
+
+
+
+ int[] sort(int[], 'asc' | 'desc') - sort intarray
+
+
+test=# select sort('{1,2,3}'::int[],'desc');
+  sort
+---------
+ {3,2,1}
+(1 row)
+
+
+
+
+ int[] sort(int[]) - sort in ascending order
+
+
+
+
+ int[] sort_asc(int[]), sort_desc(int[]) - shortcuts for sort
+
+
+
+
+ int[] uniq(int[]) - returns unique elements
+
+
+test=# select uniq(sort('{1,2,3,2,1}'::int[]));
+  uniq
+---------
+ {1,2,3}
+(1 row)
+
+
+
+
+ int idx(int[], int item) - returns the index of the first
+ intarray element matching item, or 0 if there is no match.
+
+
+test=# select idx('{1,2,3,2,1}'::int[],2);
+ idx
+-----
+   2
+(1 row)
+
+
+
+
+ int[] subarray(int[],int START [, int LEN]) - returns the
+ part of intarray starting at element number START (counting from 1) with
+ length LEN.
+
+
+test=# select subarray('{1,2,3,2,1}'::int[],2,3);
+ subarray
+----------
+ {2,3,2}
+(1 row)
+
+
+
+
+ int[] intset(int4) - casts int4 to int[]
+
+
+test=# select intset(1);
+ intset
+--------
+ {1}
+(1 row)
+
+
+
+
+
+ Operations
+
+ Operations
+
+
+
+ Operator
+ Description
+
+
+
+
+ int[] && int[]
+ overlap - returns TRUE if the arrays have at least one common element
+
+
+ int[] @> int[]
+ contains - returns TRUE if the left array contains the right array
+
+
+ int[] <@ int[]
+ contained - returns TRUE if the left array is contained in the right array
+
+
+ # int[]
+ returns the number of elements in the array
+
+
+ int[] + int
+ push element to array (add to end of array)
+
+
+ int[] + int[]
+ merge of arrays (right array added to the end of left one)
+
+
+ int[] - int
+ remove entries matched by right argument from array
+
+
+ int[] - int[]
+ remove right array from left
+
+
+ int[] | int
+ returns intarray - union of arguments
+
+
+ int[] | int[]
+ returns intarray as a union of two arrays
+
+
+
+ int[] & int[]
+ returns intersection of arrays
+
+
+
+ int[] @@ query_int
+
+ returns TRUE if array satisfies query (like
+ '1&(2|3)')
+
+
+
+
+ query_int ~~ int[]
+ returns TRUE if array satisfies query (commutator of @@)
+
+
+
+ + (Before PostgreSQL 8.2, the containment operators @> and <@ were + respectively called @ and ~. These names are still available, but are + deprecated and will eventually be retired. Notice that the old names + are reversed from the convention formerly followed by the core geometric + datatypes!) + +
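+
+ Since the example below exercises only the array-to-array operators, here
+ is a minimal sketch of the query_int matching described above (the literals
+ are made up for illustration):
+
+-- Does the array satisfy '1 and (2 or 3)'?
+SELECT '{1,4,5}'::int[] @@ '1&(2|3)'::query_int;  -- false: no 2 or 3
+SELECT '{1,3,5}'::int[] @@ '1&(2|3)'::query_int;  -- true: contains 1 and 3
+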
+
+
+ Example
+
+
+CREATE TABLE message (mid INT NOT NULL,sections INT[]);
+CREATE TABLE message_section_map (mid INT NOT NULL,sid INT NOT NULL);
+
+-- create indices
+CREATE unique index message_key ON message ( mid );
+CREATE unique index message_section_map_key2 ON message_section_map (sid, mid );
+CREATE INDEX message_rdtree_idx ON message USING GIST ( sections gist__int_ops);
+
+-- select some messages with section in 1 OR 2 - OVERLAP operator
+SELECT message.mid FROM message WHERE message.sections && '{1,2}';
+
+-- select messages contained in sections 1 AND 2 - CONTAINS operator
+SELECT message.mid FROM message WHERE message.sections @> '{1,2}';
+-- the same, CONTAINED operator
+SELECT message.mid FROM message WHERE '{1,2}' <@ message.sections;
+
+
+
+ Benchmark
+
+ The subdirectory bench contains a benchmark suite.
+
+
+ cd ./bench
+ 1. createdb TEST
+ 2. psql TEST < ../_int.sql
+ 3. ./create_test.pl | psql TEST
+ 4. ./bench.pl - perl script to benchmark queries, supports OR, AND queries
+    with/without RD-Tree. Run the script without arguments to
+    see available options.
+
+    a) test without RD-Tree (OR)
+    ./bench.pl -d TEST -c -s 1,2 -v
+    b) test with RD-Tree
+    ./bench.pl -d TEST -c -s 1,2 -v -r
+
+ BENCHMARKS:
+
+ Size of table <message>: 200000
+ Size of table <message_section_map>: 269133
+
+ Distribution of messages by sections:
+
+ section 0: 74377 messages
+ section 1: 16284 messages
+ section 50: 1229 messages
+ section 99: 683 messages
+
+ old - without RD-Tree support,
+ new - with RD-Tree
+
+ +----------+---------------+----------------+
+ |Search set|OR, time in sec|AND, time in sec|
+ |          +-------+-------+--------+-------+
+ |          |  old  |  new  |  old   |  new  |
+ +----------+-------+-------+--------+-------+
+ |         1|  0.625|  0.101|       -|      -|
+ +----------+-------+-------+--------+-------+
+ |        99|  0.018|  0.017|       -|      -|
+ +----------+-------+-------+--------+-------+
+ |       1,2|  0.766|  0.133|   0.628|  0.045|
+ +----------+-------+-------+--------+-------+
+ | 1,2,50,65|  0.794|  0.141|   0.030|  0.006|
+ +----------+-------+-------+--------+-------+
+
+
+
+ Authors
+
+ All work was done by Teodor Sigaev (teodor@stack.net) and Oleg
+ Bartunov (oleg@sai.msu.su). See
+ for
+ additional information. Andrey Oktyabrski did great work on adding new
+ functions and operations.
+
+
+
+ diff --git a/doc/src/sgml/isn.sgml b/doc/src/sgml/isn.sgml new file mode 100644 index 0000000000000000000000000000000000000000..c6fef47f0840a9feff8a5a4eb3a9ad0a509966ec --- /dev/null +++ b/doc/src/sgml/isn.sgml @@ -0,0 +1,502 @@ + + isn + + + isn + + + + The isn module adds data types for the following + international-standard namespaces: EAN13, UPC, ISBN (books), ISMN (music), + and ISSN (serials). This module is inspired by Garrett A. Wollman's + isbn_issn code. + + + This module validates, and automatically adds the correct + hyphenations to the numbers. Also, it supports the new ISBN-13 + numbers to be used starting in January 2007. + + + + Premises: + + + + + ISBN13, ISMN13, ISSN13 numbers are all EAN13 numbers + + + EAN13 numbers aren't always ISBN13, ISMN13 or ISSN13 (some are) + + + some ISBN13 numbers can be displayed as ISBN + + + some ISMN13 numbers can be displayed as ISMN + + + some ISSN13 numbers can be displayed as ISSN + + + all UPC, ISBN, ISMN and ISSN can be represented as EAN13 numbers + + + + + + All types are internally represented as 64 bit integers, + and internally all are consistently interchangeable. + + + + + We have two operator classes (for btree and for hash) so each data type + can be indexed for faster access. + + + + + Data types + + + We have the following data types: + + + + Data types + + + + Data type + Description + + + + + EAN13 + + + European Article Numbers. This type will always show the EAN13-display + format. Te output function for this is ean13_out() + + + + + + ISBN13 + + + For International Standard Book Numbers to be displayed in + the new EAN13-display format. + + + + + + ISMN13 + + + For International Standard Music Numbers to be displayed in + the new EAN13-display format. + + + + + ISSN13 + + + For International Standard Serial Numbers to be displayed in the new + EAN13-display format. + + + + + ISBN + + + For International Standard Book Numbers to be displayed in the current + short-display format. + + + + + ISMN + + + For International Standard Music Numbers to be displayed in the + current short-display format. + + + + + ISSN + + + For International Standard Serial Numbers to be displayed in the + current short-display format. These types will display the short + version of the ISxN (ISxN 10) whenever it's possible, and it will + show ISxN 13 when it's impossible to show the short version. The + output function to do this is isn_out() + + + + + UPC + + + For Universal Product Codes. UPC numbers are a subset of the EAN13 + numbers (they are basically EAN13 without the first '0' digit.) + The output function to do this is also isn_out() + + + + + +
+
+
+
+ EAN13, ISBN13,
+ ISMN13 and ISSN13 types will always
+ display the long version of the ISxN (EAN13). The output function to do
+ this is ean13_out().
+
+
+ These types exist only to display the same data in
+ different ways: ISBN13 is actually the same as
+ ISBN, ISMN13=ISMN and
+ ISSN13=ISSN.
+
+
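+ To see the display difference, one can select the same number through both
+ types (a sketch; the exact hyphenation is supplied by the module):
+
+-- One book number, shown in the short (ISBN) and long (EAN13) display forms.
+SELECT isbn('9780393040029'), isbn13('9780393040029');
+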
+
+
+ Input functions
+
+
+ We have the following input functions:
+
+
+
+ Input functions
+
+
+
+ Function
+ Description
+
+
+
+
+ ean13_in()
+
+
+ To take a string and return an EAN13.
+
+
+
+
+
+ isbn_in()
+
+
+ To take a string and return valid ISBN or ISBN13 numbers.
+
+
+
+
+
+ ismn_in()
+
+
+ To take a string and return valid ISMN or ISMN13 numbers.
+
+
+
+
+
+ issn_in()
+
+
+ To take a string and return valid ISSN or ISSN13 numbers.
+
+
+
+
+ upc_in()
+
+
+ To take a string and return a UPC code.
+
+
+
+
+
+
+
+
+ Casts
+
+
+ We are able to cast from:
+
+
+
+
+ ISBN13 -> EAN13
+
+
+
+
+ ISMN13 -> EAN13
+
+
+
+
+ ISSN13 -> EAN13
+
+
+
+
+ ISBN -> EAN13
+
+
+
+
+ ISMN -> EAN13
+
+
+
+
+ ISSN -> EAN13
+
+
+
+
+ UPC -> EAN13
+
+
+
+
+ ISBN <-> ISBN13
+
+
+
+
+ ISMN <-> ISMN13
+
+
+
+
+ ISSN <-> ISSN13
+
+
+
+
+
+
+ C API
+
+ The C API is implemented as:
+
+
+ extern Datum isn_out(PG_FUNCTION_ARGS);
+ extern Datum ean13_out(PG_FUNCTION_ARGS);
+ extern Datum ean13_in(PG_FUNCTION_ARGS);
+ extern Datum isbn_in(PG_FUNCTION_ARGS);
+ extern Datum ismn_in(PG_FUNCTION_ARGS);
+ extern Datum issn_in(PG_FUNCTION_ARGS);
+ extern Datum upc_in(PG_FUNCTION_ARGS);
+
+
+
+ On success:
+
+
+
+
+ isn_out() takes any of our types and returns a string containing
+ the shortest possible representation of the number.
+
+
+
+
+ ean13_out() takes any of our types and returns the
+ EAN13 (long) representation of the number.
+
+
+
+
+ ean13_in() takes a string and returns an EAN13 which, as stated in the
+ premises above, might or might not be any of our other types, but it
+ certainly is an EAN13 number. It succeeds only if the string is a valid
+ EAN13 number; otherwise it fails.
+
+
+
+
+ isbn_in() takes a string and returns an ISBN/ISBN13. It succeeds only if
+ the string really is an ISBN/ISBN13; otherwise it fails.
+
+
+
+
+ ismn_in() takes a string and returns an ISMN/ISMN13. It succeeds only if
+ the string really is an ISMN/ISMN13; otherwise it fails.
+
+
+
+
+ issn_in() takes a string and returns an ISSN/ISSN13. It succeeds only if
+ the string really is an ISSN/ISSN13; otherwise it fails.
+
+
+
+
+ upc_in() takes a string and returns a UPC. It succeeds only if the string
+ really is a UPC; otherwise it fails.
+
+
+
+
+
+ (on failure, the functions 'ereport' the error)
+
+
+
+
+ Testing functions
+
+ Testing functions
+
+
+
+ Function
+ Description
+
+
+
+
+ isn_weak(boolean)
+ Sets the weak input mode.
+
+
+ isn_weak()
+ Gets the current status of the weak mode.
+
+
+ make_valid()
+ Validates an invalid number (deleting the invalid flag).
+
+
+ is_valid()
+ Checks for the presence of the invalid flag.
+
+
+
+
+
+
+ Weak mode is used to be able to insert invalid data into
+ a table. Invalid means the check digit is wrong, not that digits are
+ missing.
+
+
+ Why would you want to use the weak mode? Well, it could be that
+ you have a huge collection of ISBN numbers, and that there are so many of
+ them that for weird reasons some have the wrong check digit (perhaps the
+ numbers were scanned from a printed list and the OCR got the numbers wrong,
+ perhaps the numbers were manually captured... who knows.) Anyway, the thing
+ is you might want to clean the mess up, but you still want to be able to have
+ all the numbers in your database and maybe use an external tool to access
+ the invalid numbers in the database so you can verify the information and
+ validate it more easily, for example by selecting all the invalid numbers
+ in the table.
+
+
+ When you insert invalid numbers into a table using the weak mode, the number
+ will be inserted with the corrected check digit, but it will be flagged
+ with an exclamation mark ('!') at the end (e.g. 0-11-000322-5!)
+
+
+ You can also force the insertion of invalid numbers even when not in weak
+ mode, by appending the '!' character at the end of the number.
+
+ + + Examples + +--Using the types directly: +SELECT isbn('978-0-393-04002-9'); +SELECT isbn13('0901690546'); +SELECT issn('1436-4522'); + +--Casting types: +-- note that you can only cast from ean13 to other type when the casted +-- number would be valid in the realm of the casted type; +-- thus, the following will NOT work: select isbn(ean13('0220356483481')); +-- but these will: +SELECT upc(ean13('0220356483481')); +SELECT ean13(upc('220356483481')); + +--Create a table with a single column to hold ISBN numbers: +CREATE TABLE test ( id isbn ); +INSERT INTO test VALUES('9780393040029'); + +--Automatically calculating check digits (observe the '?'): +INSERT INTO test VALUES('220500896?'); +INSERT INTO test VALUES('978055215372?'); + +SELECT issn('3251231?'); +SELECT ismn('979047213542?'); + +--Using the weak mode: +SELECT isn_weak(true); +INSERT INTO test VALUES('978-0-11-000533-4'); +INSERT INTO test VALUES('9780141219307'); +INSERT INTO test VALUES('2-205-00876-X'); +SELECT isn_weak(false); + +SELECT id FROM test WHERE NOT is_valid(id); +UPDATE test SET id=make_valid(id) WHERE id = '2-205-00876-X!'; + +SELECT * FROM test; + +SELECT isbn13(id) FROM test; + + + + + Bibliography + + The information to implement this module was collected through + several sites, including: + + + http://www.isbn-international.org/ + http://www.issn.org/ + http://www.ismn-international.org/ + http://www.wikipedia.org/ + + + the prefixes used for hyphenation where also compiled from: + + + http://www.gs1.org/productssolutions/idkeys/support/prefix_list.html + http://www.isbn-international.org/en/identifiers.html + http://www.ismn-international.org/ranges.html + + + Care was taken during the creation of the algorithms and they + were meticulously verified against the suggested algorithms + in the official ISBN, ISMN, ISSN User Manuals. + + + + + Author + + Germán Méndez Bravo (Kronuz), 2004 - 2006 + + +
+ diff --git a/doc/src/sgml/lo.sgml b/doc/src/sgml/lo.sgml new file mode 100644 index 0000000000000000000000000000000000000000..2a23a5b5cd053d1eed8b0343911ee301f0f6e5f2 --- /dev/null +++ b/doc/src/sgml/lo.sgml @@ -0,0 +1,118 @@ + + + lo + + + lo + + + + PostgreSQL type extension for managing Large Objects + + + + Overview + + One of the problems with the JDBC driver (and this affects the ODBC driver + also), is that the specification assumes that references to BLOBS (Binary + Large OBjectS) are stored within a table, and if that entry is changed, the + associated BLOB is deleted from the database. + + + As PostgreSQL stands, this doesn't occur. Large objects are treated as + objects in their own right; a table entry can reference a large object by + OID, but there can be multiple table entries referencing the same large + object OID, so the system doesn't delete the large object just because you + change or remove one such entry. + + + Now this is fine for new PostgreSQL-specific applications, but existing ones + using JDBC or ODBC won't delete the objects, resulting in orphaning - objects + that are not referenced by anything, and simply occupy disk space. + + + + + The Fix + + I've fixed this by creating a new data type 'lo', some support functions, and + a Trigger which handles the orphaning problem. The trigger essentially just + does a 'lo_unlink' whenever you delete or modify a value referencing a large + object. When you use this trigger, you are assuming that there is only one + database reference to any large object that is referenced in a + trigger-controlled column! + + + The 'lo' type was created because we needed to differentiate between plain + OIDs and Large Objects. Currently the JDBC driver handles this dilemma easily, + but (after talking to Byron), the ODBC driver needed a unique type. They had + created an 'lo' type, but not the solution to orphaning. + + + You don't actually have to use the 'lo' type to use the trigger, but it may be + convenient to use it to keep track of which columns in your database represent + large objects that you are managing with the trigger. + + + + + How to Use + + The easiest way is by an example: + + + CREATE TABLE image (title TEXT, raster lo); + CREATE TRIGGER t_raster BEFORE UPDATE OR DELETE ON image + FOR EACH ROW EXECUTE PROCEDURE lo_manage(raster); + + + Create a trigger for each column that contains a lo type, and give the column + name as the trigger procedure argument. You can have more than one trigger on + a table if you need multiple lo columns in the same table, but don't forget to + give a different name to each trigger. + + + + + Issues + + + + + Dropping a table will still orphan any objects it contains, as the trigger + is not executed. + + + Avoid this by preceding the 'drop table' with 'delete from {table}'. + + + If you already have, or suspect you have, orphaned large objects, see + the contrib/vacuumlo module to help you clean them up. It's a good idea + to run contrib/vacuumlo occasionally as a back-stop to the lo_manage + trigger. + + + + + Some frontends may create their own tables, and will not create the + associated trigger(s). Also, users may not remember (or know) to create + the triggers. + + + + + + As the ODBC driver needs a permanent lo type (& JDBC could be optimised to + use it if it's Oid is fixed), and as the above issues can only be fixed by + some internal changes, I feel it should become a permanent built-in type. 
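+
+ For the first issue in the list above, the suggested workaround looks like
+ this, reusing the image table from the earlier example (a sketch; the
+ t_raster trigger performs the lo_unlink for each deleted row):
+
+ -- Let the trigger unlink each referenced large object first,
+ -- then drop the now-empty table.
+ DELETE FROM image;
+ DROP TABLE image;
+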
+ + + + + Author + + Peter Mount peter@retep.org.uk June 13 1998 + + + + diff --git a/doc/src/sgml/ltree.sgml b/doc/src/sgml/ltree.sgml new file mode 100644 index 0000000000000000000000000000000000000000..75c02013c7315d23a99a63d2a740ece7c50bf576 --- /dev/null +++ b/doc/src/sgml/ltree.sgml @@ -0,0 +1,771 @@ + + + ltree + + + ltree + + + + ltree is a PostgreSQL module that contains implementation + of data types, indexed access methods and queries for data organized as a + tree-like structures. + + + + Definitions + + A label of a node is a sequence of one or more words + separated by blank character '_' and containing letters and digits ( for + example, [a-zA-Z0-9] for C locale). The length of a label is limited by 256 + bytes. + + + Example: 'Countries', 'Personal_Services' + + + A label path of a node is a sequence of one or more + dot-separated labels l1.l2...ln, represents path from root to the node. The + length of a label path is limited by 65Kb, but size <= 2Kb is preferrable. + We consider it's not a strict limitation (maximal size of label path for + DMOZ catalogue - , is about 240 + bytes!) + + + Example: 'Top.Countries.Europe.Russia' + + + We introduce several datatypes: + + + + + ltree - is a datatype for label path. + + + + + ltree[] - is a datatype for arrays of ltree. + + + + + lquery + - is a path expression that has regular expression in the label path and + used for ltree matching. Star symbol (*) is used to specify any number of + labels (levels) and could be used at the beginning and the end of lquery, + for example, '*.Europe.*'. + + + The following quantifiers are recognized for '*' (like in Perl): + + + + {n} Match exactly n levels + + + {n,} Match at least n levels + + + {n,m} Match at least n but not more than m levels + + + {,m} Match at maximum m levels (eq. to {0,m}) + + + + It is possible to use several modifiers at the end of a label: + + + + @ Do case-insensitive label matching + + + * Do prefix matching for a label + + + % Don't account word separator '_' in label matching, that is + 'Russian%' would match 'Russian_nations', but not 'Russian' + + + + + + lquery can contain logical '!' (NOT) at the beginning + of the label and '|' (OR) to specify possible alternatives for label + matching. + + + Example of lquery: + + + Top.*{0,2}.sport*@.!football|tennis.Russ*|Spain + a) b) c) d) e) + + + A label path should + + + + + begin from a node with label 'Top' + + + + + and following zero or 2 labels until + + + + + a node with label beginning from case-insensitive prefix 'sport' + + + + + following node with label not matched 'football' or 'tennis' and + + + + + end on node with label beginning from 'Russ' or strictly matched + 'Spain'. + + + + + + + + ltxtquery + - is a datatype for label searching (like type 'query' for full text + searching, see contrib/tsearch). It's possible to use modifiers @,%,* at + the end of word. The meaning of modifiers are the same as for lquery. + + + Example: 'Europe & Russia*@ & !Transportation' + + + Search paths contain words 'Europe' and 'Russia*' (case-insensitive) and + not 'Transportation'. Notice, the order of words as they appear in label + path is not important ! + + + + + + + + Operations + + The following operations are defined for type ltree: + + + + + + <,>,<=,>=,=, <> + - Have their usual meanings. Comparison is doing in the order of direct + tree traversing, children of a node are sorted lexicographic. + + + + + ltree @> ltree + - returns TRUE if left argument is an ancestor of right argument (or + equal). 
+
+
+
+
+ ltree <@ ltree
+ - returns TRUE if left argument is a descendant of right argument (or
+ equal).
+
+
+
+
+ ltree ~ lquery, lquery ~ ltree
+ - returns TRUE if node represented by ltree satisfies lquery.
+
+
+
+
+ ltree ? lquery[], lquery[] ? ltree
+ - returns TRUE if node represented by ltree satisfies at least one lquery
+ from the array.
+
+
+
+
+ ltree @ ltxtquery, ltxtquery @ ltree
+ - returns TRUE if node represented by ltree satisfies ltxtquery.
+
+
+
+
+ ltree || ltree, ltree || text, text || ltree
+ - returns concatenated ltree.
+
+
+
+
+
+ Operations for arrays of ltree (ltree[]):
+
+
+
+
+ ltree[] @> ltree, ltree <@ ltree[]
+ - returns TRUE if array ltree[] contains an ancestor of ltree.
+
+
+
+
+ ltree @> ltree[], ltree[] <@ ltree
+ - returns TRUE if array ltree[] contains a descendant of ltree.
+
+
+
+
+ ltree[] ~ lquery, lquery ~ ltree[]
+ - returns TRUE if array ltree[] contains label paths matching lquery.
+
+
+
+
+ ltree[] ? lquery[], lquery[] ? ltree[]
+ - returns TRUE if array ltree[] contains label paths matching at least one
+ lquery from the array.
+
+
+
+
+ ltree[] @ ltxtquery, ltxtquery @ ltree[]
+ - returns TRUE if array ltree[] contains label paths matching ltxtquery
+ (full text search).
+
+
+
+
+ ltree[] ?@> ltree, ltree ?<@ ltree[], ltree[] ?~ lquery, ltree[] ?@ ltxtquery
+
+ - returns the first element of array ltree[] that satisfies the
+ corresponding condition, or NULL if there is none.
+
+
+
+
+
+ Remark
+
+
+ Operations <@, @>, @ and
+ ~ have analogues - ^<@, ^@>, ^@, ^~, which don't use
+ indices!
+
+
+
+
+ Indices
+
+ Various indices can be created to speed up execution of operations:
+
+
+
+
+
+ B-tree index over ltree: <, <=, =, >=, >
+
+
+
+
+ GiST index over ltree: <, <=, =, >=, >, @>, <@, @, ~, ?
+
+
+ Example:
+
+
+ CREATE INDEX path_gist_idx ON test USING GIST (path);
+
+
+
+ GiST index over ltree[]:
+ ltree[] <@ ltree, ltree @> ltree[], @, ~, ?.
+
+
+ Example:
+
+
+ CREATE INDEX path_gist_idx ON test USING GIST (array_path);
+
+
+ Notice: This index is lossy.
+
+
+
+
+
+ Functions
+
+
+
+
+ ltree subltree(ltree, start, end)
+ returns the subpath of ltree from position start to position end-1
+ (counting from 0).
+
+
+ # select subltree('Top.Child1.Child2',1,2);
+ subltree
+ --------
+ Child1
+
+
+
+ ltree subpath(ltree, OFFSET,LEN) and
+ ltree subpath(ltree, OFFSET)
+ return the subpath of ltree starting at OFFSET (inclusive) with length LEN.
+ If OFFSET is negative, the subpath starts that far from the end
+ of the path. If LEN is omitted, returns everything to the end
+ of the path. If LEN is negative, leaves that many labels off
+ the end of the path.
+
+
+ # select subpath('Top.Child1.Child2',1,2);
+ subpath
+ -------
+ Child1.Child2
+
+ # select subpath('Top.Child1.Child2',-2,1);
+ subpath
+ ---------
+ Child1
+
+
+
+ int4 nlevel(ltree) - returns the level of the node.
+
+
+ # select nlevel('Top.Child1.Child2');
+ nlevel
+ --------
+ 3
+
+
+ Note that the arguments start, end, OFFSET and LEN are interpreted as
+ levels of the node!
+
+
+
+
+ int4 index(ltree,ltree) and
+ int4 index(ltree,ltree,OFFSET)
+ return the level of the first occurrence of the second argument in the
+ first one, beginning from OFFSET. If OFFSET is negative, the search begins
+ |OFFSET| levels from the end of the path.
+
+
+ SELECT index('0.1.2.3.5.4.5.6.8.5.6.8','5.6',3);
+ index
+ -------
+ 6
+ SELECT index('0.1.2.3.5.4.5.6.8.5.6.8','5.6',-4);
+ index
+ -------
+ 9
+
+
+
+ ltree text2ltree(text) and
+ text ltree2text(ltree) - cast functions between ltree and text.
+
+
+
+ ltree lca(ltree,ltree,...)
(up to 8 arguments) and + ltree lca(ltree[]) Returns Lowest Common Ancestor (lca). + + + # select lca('1.2.2.3','1.2.3.4.5.6'); + lca + ----- + 1.2 + # select lca('{la.2.3,1.2.3.4.5.6}') is null; + ?column? + ---------- + f + + + + + + + Installation + + cd contrib/ltree + make + make install + make installcheck + + + + + Example + + createdb ltreetest + psql ltreetest < /usr/local/pgsql/share/contrib/ltree.sql + psql ltreetest < ltreetest.sql + + + +Now, we have a database ltreetest populated with a data describing hierarchy +shown below: + + + + + + TOP + / | \ + Science Hobbies Collections + / | \ + Astronomy Amateurs_Astronomy Pictures + / \ | + Astrophysics Cosmology Astronomy + / | \ + Galaxies Stars Astronauts + + + Inheritance: + + + +ltreetest=# select path from test where path <@ 'Top.Science'; + path +------------------------------------ + Top.Science + Top.Science.Astronomy + Top.Science.Astronomy.Astrophysics + Top.Science.Astronomy.Cosmology +(4 rows) + + + Matching: + + +ltreetest=# select path from test where path ~ '*.Astronomy.*'; + path +----------------------------------------------- + Top.Science.Astronomy + Top.Science.Astronomy.Astrophysics + Top.Science.Astronomy.Cosmology + Top.Collections.Pictures.Astronomy + Top.Collections.Pictures.Astronomy.Stars + Top.Collections.Pictures.Astronomy.Galaxies + Top.Collections.Pictures.Astronomy.Astronauts +(7 rows) +ltreetest=# select path from test where path ~ '*.!pictures@.*.Astronomy.*'; + path +------------------------------------ + Top.Science.Astronomy + Top.Science.Astronomy.Astrophysics + Top.Science.Astronomy.Cosmology +(3 rows) + + + Full text search: + + +ltreetest=# select path from test where path @ 'Astro*% & !pictures@'; + path +------------------------------------ + Top.Science.Astronomy + Top.Science.Astronomy.Astrophysics + Top.Science.Astronomy.Cosmology + Top.Hobbies.Amateurs_Astronomy +(4 rows) + +ltreetest=# select path from test where path @ 'Astro* & !pictures@'; + path +------------------------------------ + Top.Science.Astronomy + Top.Science.Astronomy.Astrophysics + Top.Science.Astronomy.Cosmology +(3 rows) + + + Using Functions: + + +ltreetest=# select subpath(path,0,2)||'Space'||subpath(path,2) from test where path <@ 'Top.Science.Astronomy'; + ?column? 
+------------------------------------------ + Top.Science.Space.Astronomy + Top.Science.Space.Astronomy.Astrophysics + Top.Science.Space.Astronomy.Cosmology +(3 rows) +We could create SQL-function: +CREATE FUNCTION ins_label(ltree, int4, text) RETURNS ltree +AS 'select subpath($1,0,$2) || $3 || subpath($1,$2);' +LANGUAGE SQL IMMUTABLE; + + + and previous select could be rewritten as: + + + +ltreetest=# select ins_label(path,2,'Space') from test where path <@ 'Top.Science.Astronomy'; + ins_label +------------------------------------------ + Top.Science.Space.Astronomy + Top.Science.Space.Astronomy.Astrophysics + Top.Science.Space.Astronomy.Cosmology +(3 rows) + + + + Or with another arguments: + + + +CREATE FUNCTION ins_label(ltree, ltree, text) RETURNS ltree +AS 'select subpath($1,0,nlevel($2)) || $3 || subpath($1,nlevel($2));' +LANGUAGE SQL IMMUTABLE; + +ltreetest=# select ins_label(path,'Top.Science'::ltree,'Space') from test where path <@ 'Top.Science.Astronomy'; + ins_label +------------------------------------------ + Top.Science.Space.Astronomy + Top.Science.Space.Astronomy.Astrophysics + Top.Science.Space.Astronomy.Cosmology +(3 rows) + + + + + Additional data + + To get more feeling from our ltree module you could download + dmozltree-eng.sql.gz (about 3Mb tar.gz archive containing 300,274 nodes), + available from + + dmozltree-eng.sql.gz, which is DMOZ catalogue, prepared for use with ltree. + Setup your test database (dmoz), load ltree module and issue command: + + + zcat dmozltree-eng.sql.gz| psql dmoz + + + Data will be loaded into database dmoz and all indices will be created. + + + + + Benchmarks + + All runs were performed on my IBM ThinkPad T21 (256 MB RAM, 750Mhz) using DMOZ + data, containing 300,274 nodes (see above for download link). We used some + basic queries typical for walking through catalog. 
+ + + + Queries + + + + Q0: Count all rows (sort of base time for comparison) + + + select count(*) from dmoz; + count + -------- + 300274 + (1 row) + + + + + Q1: Get direct children (without inheritance) + + + select path from dmoz where path ~ 'Top.Adult.Arts.Animation.*{1}'; + path + ----------------------------------- + Top.Adult.Arts.Animation.Cartoons + Top.Adult.Arts.Animation.Anime + (2 rows) + + + + + Q2: The same as Q1 but with counting of successors + + + select path as parentpath , (select count(*)-1 from dmoz where path <@ + p.path) as count from dmoz p where path ~ 'Top.Adult.Arts.Animation.*{1}'; + parentpath | count + -----------------------------------+------- + Top.Adult.Arts.Animation.Cartoons | 2 + Top.Adult.Arts.Animation.Anime | 61 + (2 rows) + + + + + Q3: Get all parents + + + select path from dmoz where path @> 'Top.Adult.Arts.Animation' order by + path asc; + path + -------------------------- + Top + Top.Adult + Top.Adult.Arts + Top.Adult.Arts.Animation + (4 rows) + + + + + Q4: Get all parents with counting of children + + + select path, (select count(*)-1 from dmoz where path <@ p.path) as count + from dmoz p where path @> 'Top.Adult.Arts.Animation' order by path asc; + path | count + --------------------------+-------- + Top | 300273 + Top.Adult | 4913 + Top.Adult.Arts | 339 + Top.Adult.Arts.Animation | 65 + (4 rows) + + + + + Q5: Get all children with levels + + + select path, nlevel(path) - nlevel('Top.Adult.Arts.Animation') as level + from dmoz where path ~ 'Top.Adult.Arts.Animation.*{1,2}' order by path asc; + path | level + ------------------------------------------------+------- + Top.Adult.Arts.Animation.Anime | 1 + Top.Adult.Arts.Animation.Anime.Fan_Works | 2 + Top.Adult.Arts.Animation.Anime.Games | 2 + Top.Adult.Arts.Animation.Anime.Genres | 2 + Top.Adult.Arts.Animation.Anime.Image_Galleries | 2 + Top.Adult.Arts.Animation.Anime.Multimedia | 2 + Top.Adult.Arts.Animation.Anime.Resources | 2 + Top.Adult.Arts.Animation.Anime.Titles | 2 + Top.Adult.Arts.Animation.Cartoons | 1 + Top.Adult.Arts.Animation.Cartoons.AVS | 2 + Top.Adult.Arts.Animation.Cartoons.Members | 2 + (11 rows) + + + + + + + Timings + ++---------------------------------------------+ +|Query|Rows|Time (ms) index|Time (ms) no index| +|-----+----+---------------+------------------| +| Q0| 1| NA| 1453.44| +|-----+----+---------------+------------------| +| Q1| 2| 0.49| 1001.54| +|-----+----+---------------+------------------| +| Q2| 2| 1.48| 3009.39| +|-----+----+---------------+------------------| +| Q3| 4| 0.55| 906.98| +|-----+----+---------------+------------------| +| Q4| 4| 24385.07| 4951.91| +|-----+----+---------------+------------------| +| Q5| 11| 0.85| 1003.23| ++---------------------------------------------+ + + + Timings without indices were obtained using operations which doesn't use + indices (see above) + + + + + Remarks + + We didn't run full-scale tests, also we didn't present (yet) data for + operations with arrays of ltree (ltree[]) and full text searching. We'll + appreciate your input. So far, below some (rather obvious) results: + + + + + Indices does help execution of queries + + + + + Q4 performs bad because one needs to read almost all data from the HDD + + + + + + + Some Backgrounds + + The approach we use for ltree is much like one we used in our other GiST based + contrib modules (intarray, tsearch, tree, btree_gist, rtree_gist). Theoretical + background is available in papers referenced from our GiST development page + (). 
+ + + A hierarchical data structure (tree) is a set of nodes. Each node has a + signature (LPS) of a fixed size, which is a hashed label path of that node. + Traversing a tree we could *certainly* prune branches if + + + LQS (bitwise AND) LPS != LQS + + + where LQS is a signature of lquery or ltxtquery, obtained in the same way as + LPS. + + + ltree[]: + + + For array of ltree LPS is a bitwise OR-ed signatures of *ALL* children + reachable from that node. Signatures are stored in RD-tree, implemented using + GiST, which provides indexed access. + + + ltree: + + + For ltree we store LPS in a B-tree, implemented using GiST. Each node entry is + represented by (left_bound, signature, right_bound), so that we could speedup + operations <, <=, =, >=, > using left_bound, right_bound and prune branches of + a tree using signature. + + + + Authors + + All work was done by Teodor Sigaev (teodor@stack.net) and + Oleg Bartunov (oleg@sai.msu.su). See + for + additional information. Authors would like to thank Eugeny Rodichev for + helpful discussions. Comments and bug reports are welcome. + + + + diff --git a/contrib/oid2name/README.oid2name b/doc/src/sgml/oid2name.sgml similarity index 56% rename from contrib/oid2name/README.oid2name rename to doc/src/sgml/oid2name.sgml index 9dd1ddc310c34e298b2bd57ce5952711672de3e7..2c5c396522f30615d4057bf7585a90748408fd05 100644 --- a/contrib/oid2name/README.oid2name +++ b/doc/src/sgml/oid2name.sgml @@ -1,37 +1,70 @@ -This utility allows administrators to examine the file structure used by -PostgreSQL. To make use of it, you need to be familiar with the file -structure, which is described in the "Database File Layout" chapter of -the "Internals" section of the PostgreSQL documentation. - -Oid2name connects to the database and extracts OID, filenode, and table -name information. You can also have it show database OIDs and tablespace -OIDs. - -When displaying specific tables, you can select which tables to show by -using -o, -f and -t. The first switch takes an OID, the second takes -a filenode, and the third takes a tablename (actually, it's a LIKE -pattern, so you can use things like "foo%"). Note that you can use as many -of these switches as you like, and the listing will include all objects -matched by any of the switches. Also note that these switches can only -show objects in the database given in -d. - -If you don't give any of -o, -f or -t it will dump all the tables in the -database given in -d. If you don't give -d, it will show a database -listing. Alternatively you can give -s to get a tablespace listing. - -Additional switches: - -i include indexes and sequences in the database listing. - -x display more information about each object shown: - tablespace name, schema name, OID. - -S also show system objects - (those in information_schema, pg_toast and pg_catalog schemas) - -q don't display headers - (useful for scripting) - ---------------------------------------------------------------------------- - -Sample session: - + + oid2name + + + oid2name + + + + This utility allows administrators to examine the file structure used by + PostgreSQL. To make use of it, you need to be familiar with the file + structure, which is described in . + + + + Overview + + oid2name connects to the database and extracts OID, + filenode, and table name information. You can also have it show database + OIDs and tablespace OIDs. + + + When displaying specific tables, you can select which tables to show by + using -o, -f and -t. 
The first switch takes an OID, the second takes
+ a filenode, and the third takes a tablename (actually, it's a LIKE
+ pattern, so you can use things like "foo%"). Note that you can use as many
+ of these switches as you like, and the listing will include all objects
+ matched by any of the switches. Also note that these switches can only
+ show objects in the database given in -d.
+
+
+ If you don't give any of -o, -f or -t, it will dump all the tables in the
+ database given in -d. If you don't give -d, it will show a database
+ listing. Alternatively, you can give -s to get a tablespace listing.
+
+
+ Additional switches
+
+
+
+ -i
+ include indexes and sequences in the database listing.
+
+
+ -x
+ display more information about each object shown: tablespace name,
+ schema name, OID.
+
+
+
+ -S
+ also show system objects (those in information_schema, pg_toast
+ and pg_catalog schemas)
+
+
+
+ -q
+ don't display headers (useful for scripting)
+
+
+
+
+ + + Examples + + $ oid2name All databases: Oid Database Name Tablespace @@ -147,19 +180,26 @@ From database "alvherre": 155156 foo $ # end of sample session. + + + + You can also get approximate size data for each object using psql. For + example, + + + SELECT relpages, relfilenode, relname FROM pg_class ORDER BY relpages DESC; + + + Each page is typically 8k. Relpages is updated by VACUUM. + + + + + Author + + b. palmer, bpalmer@crimelabs.net + + + +
---------------------------------------------------------------------------- - -You can also get approximate size data for each object using psql. For -example, - -SELECT relpages, relfilenode, relname FROM pg_class ORDER BY relpages DESC; - -Each page is typically 8k. Relpages is updated by VACUUM. - ---------------------------------------------------------------------------- - -Mail me with any problems or additions you would like to see. Clearing -house for the code will be at: http://www.crimelabs.net - -b. palmer, bpalmer@crimelabs.net diff --git a/doc/src/sgml/pageinspect.sgml b/doc/src/sgml/pageinspect.sgml new file mode 100644 index 0000000000000000000000000000000000000000..3fe1edf378fc32fe60523cb2463c5de98ff406a3 --- /dev/null +++ b/doc/src/sgml/pageinspect.sgml @@ -0,0 +1,125 @@ + + + pageinspect + + + pageinspect + + + + The functions in this module allow you to inspect the contents of data pages + at a low level, for debugging purposes. + + + + Functions included + + + + + get_raw_page reads one block of the named table and returns a copy as a + bytea field. This allows a single time-consistent copy of the block to be + made. Use of this functions is restricted to superusers. + + + + + + page_header shows fields which are common to all PostgreSQL heap and index + pages. Use of this function is restricted to superusers. + + + A page image obtained with get_raw_page should be passed as argument: + + + test=# SELECT * FROM page_header(get_raw_page('pg_class',0)); + lsn | tli | flags | lower | upper | special | pagesize | version + ----------+-----+-------+-------+-------+---------+----------+--------- + 0/3C5614 | 1 | 1 | 216 | 256 | 8192 | 8192 | 4 + (1 row) + + + The returned columns correspond to the fields in the PageHeaderData-struct, + see src/include/storage/bufpage.h for more details. + + + + + + heap_page_items shows all line pointers on a heap page. For those line + pointers that are in use, tuple headers are also shown. All tuples are + shown, whether or not the tuples were visible to an MVCC snapshot at the + time the raw page was copied. Use of this function is restricted to + superusers. + + + A heap page image obtained with get_raw_page should be passed as argument: + + + test=# SELECT * FROM heap_page_items(get_raw_page('pg_class',0)); + + + See src/include/storage/itemid.h and src/include/access/htup.h for + explanations of the fields returned. 
+ + + + + bt_metap() returns information about the btree index metapage: + + test=> SELECT * FROM bt_metap('pg_cast_oid_index'); + -[ RECORD 1 ]----- + magic | 340322 + version | 2 + root | 1 + level | 0 + fastroot | 1 + fastlevel | 0 + + + + + bt_page_stats() shows information about a single btree page: + + test=> SELECT * FROM bt_page_stats('pg_cast_oid_index', 1); + -[ RECORD 1 ]-+----- + blkno | 1 + type | l + live_items | 256 + dead_items | 0 + avg_item_size | 12 + page_size | 8192 + free_size | 4056 + btpo_prev | 0 + btpo_next | 0 + btpo | 0 + btpo_flags | 3 + + + + + bt_page_items() returns information about specific items on btree pages: + + test=> SELECT * FROM bt_page_items('pg_cast_oid_index', 1); + itemoffset | ctid | itemlen | nulls | vars | data + ------------+---------+---------+-------+------+------------- + 1 | (0,1) | 12 | f | f | 23 27 00 00 + 2 | (0,2) | 12 | f | f | 24 27 00 00 + 3 | (0,3) | 12 | f | f | 25 27 00 00 + 4 | (0,4) | 12 | f | f | 26 27 00 00 + 5 | (0,5) | 12 | f | f | 27 27 00 00 + 6 | (0,6) | 12 | f | f | 28 27 00 00 + 7 | (0,7) | 12 | f | f | 29 27 00 00 + 8 | (0,8) | 12 | f | f | 2a 27 00 00 + + + + + + diff --git a/doc/src/sgml/pgbench.sgml b/doc/src/sgml/pgbench.sgml new file mode 100644 index 0000000000000000000000000000000000000000..7f73dfa9eb37785cb9e5d099cd3fbb3ebc9b2125 --- /dev/null +++ b/doc/src/sgml/pgbench.sgml @@ -0,0 +1,422 @@ + + + pgbench + + + pgbench + + + + pgbench is a simple program to run a benchmark test. + pgbench is a client application of PostgreSQL and runs + with PostgreSQL only. It performs lots of small and simple transactions + including SELECT/UPDATE/INSERT operations, and then calculates the number of + transactions successfully completed within a second (transactions + per second, tps). The test data consists of a set of tables, the largest of + which holds at least 100k tuples. + + + Example output from pgbench looks like: + +number of clients: 4 +number of transactions per client: 100 +number of processed transactions: 400/400 +tps = 19.875015(including connections establishing) +tps = 20.098827(excluding connections establishing) + + A similar program called "JDBCBench" already exists, but it requires + Java, which may not be available on every platform. Moreover, some + people are concerned that the overhead of Java might lead to + inaccurate results. So I decided to write pgbench in pure C, and named + it "pgbench." + + + + Features of pgbench: + + + + + pgbench is written in C using libpq only. So it is very portable + and easy to install. + + + + + pgbench can simulate concurrent connections using the asynchronous + capability of libpq. No threading is required. + + + + + + Overview + + + (Optional) Initialize the database with: + +pgbench -i <dbname> + + + where <dbname> is the name of the database. pgbench uses four tables: + accounts, branches, history and tellers. These tables will be + destroyed and recreated. Be very careful if you have tables with the same + names. The default test data contains: + +table # of tuples +------------------------- +branches 1 +tellers 10 +accounts 100000 +history 0 + + + You can increase the number of tuples with the -s option. The branches, + tellers and accounts tables are created with a fillfactor, which is + set using the -F option. See below. + + + + Run the benchmark test with + +pgbench <dbname> + + + The default configuration is: + + + number of clients: 1 + number of transactions per client: 10 + + + + + + <literal>pgbench</literal> options + + + + Parameter + Description + + + + + -h hostname + + + hostname where the backend is running.
If this option + is omitted, pgbench will connect to localhost via a + Unix domain socket. + + + + + -p port + + + the port number the backend is listening on. The default is + libpq's default, usually 5432. + + + + + -c number_of_clients + + + Number of clients simulated. The default is 1. + + + + + -t number_of_transactions + + + Number of transactions each client runs. The default is 10. + + + + + -s scaling_factor + + + this should be used with the -i (initialize) option. + The number of tuples generated will be a multiple of the + scaling factor. For example, -s 100 implies 10M + (10,000,000) tuples in the accounts table. + The default is 1. + + + NOTE: the scaling factor should be at least + as large as the largest number of clients you intend + to test; else you'll mostly be measuring update contention. + Regular (non-initializing) runs using one of the + built-in tests will detect the scale based on the number of + branches in the database. For custom (-f) runs it can + be manually specified with this parameter. + + + + + -D varname=value + + + Define a variable. It can be referred to by a script + provided with the -f option. Multiple -D options are allowed. + + + + + -U login + + + Specify the db user's login name if it is different from + the Unix login name. + + + + + -P password + + + Specify the db password. CAUTION: using this option + might be a security hole since the ps command will + show the password. Use this for TESTING PURPOSES ONLY. + + + + + -n + + + Do not vacuum or clean the history table prior to the + test. + + + + + -v + + + Do vacuuming before testing. This will take some time. + With neither -n nor -v, pgbench will vacuum the tellers and + branches tables only. + + + + + -S + + + Perform select-only transactions instead of TPC-B. + + + + + -N + + + Do not update "branches" and "tellers". This + avoids heavy update contention on branches and tellers, but + the transactions are then no longer TPC-B-like. + + + + + -f filename + + + Read the transaction script from a file. A detailed + explanation appears later. + + + + + -C + + + Establish a connection for each transaction, rather than + doing it just once at the beginning of the run, as pgbench + normally does. This is useful to measure the connection overhead. + + + + + -l + + + Write the time taken by each transaction to a logfile, + with the name "pgbench_log.xxx", where xxx is the PID + of the pgbench process. The format of the log is: + + + client_id transaction_no time file_no time-epoch time-us + + + where time is measured in microseconds, file_no indicates + which test file was used (useful when multiple files were + specified with -f), and time-epoch/time-us are a + UNIX epoch format timestamp followed by an offset + in microseconds (suitable for creating an ISO 8601 + timestamp with a fraction of a second) of when + the transaction completed. + + + Here is example output: + + + 0 199 2241 0 1175850568 995598 + 0 200 2465 0 1175850568 998079 + 0 201 2513 0 1175850569 608 + 0 202 2038 0 1175850569 2663 + + + + + -F fillfactor + + + Create the tables (accounts, tellers and branches) with the given + fillfactor. The default is 100. This should be used with the -i + (initialize) option. + + + + + -d + + + debug option. + + + + +
+
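+ + Putting a few of these options together, a read-only benchmark run against an already initialized database (the database name and option values are only illustrative) might look like: + + $ pgbench -c 10 -t 1000 -S test + + Here -S selects the built-in select-only transaction, so the run measures read throughput rather than TPC-B-style updates.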
+ + + What is the "transaction" actually performed in pgbench? + + begin; + + update accounts set abalance = abalance + :delta where aid = :aid; + + select abalance from accounts where aid = :aid; + + update tellers set tbalance = tbalance + :delta where tid = :tid; + + update branches set bbalance = bbalance + :delta where bid = :bid; + + insert into history(tid,bid,aid,delta) values(:tid,:bid,:aid,:delta); + + end; + + + If you specify -N, the tellers and branches updates (the fourth and + fifth statements) aren't included in the transaction. + + + + + Script file + + pgbench has support for reading a transaction script + from a specified file (-f option). This file should + contain one SQL command per line; SQL commands consisting of multiple + lines are not supported. Empty lines and lines beginning with "--" are + ignored. + + + Multiple -f options are allowed. In this case each + transaction is assigned a randomly chosen script. + + + SQL commands can include "meta commands", which begin with "\" (back + slash). A meta command takes some arguments separated by white + space. Currently the following meta commands are supported: + + + + + + \set name operand1 [ operator operand2 ] + - Sets variable "name" to the value calculated from "operand1" + "operator" "operand2". If "operator" and "operand2" + are omitted, the value of operand1 is assigned to variable "name". + + + Example: + + +\set ntellers 10 * :scale + + + + + \setrandom name min max + - Assigns a random integer between min and max to name. + + + Example: + + +\setrandom aid 1 100000 + + + + + Variables can be referred to in SQL commands by adding ":" in front + of the variable name. + + + Example: + + +SELECT abalance FROM accounts WHERE aid = :aid + + + Variables can also be defined with the -D option. + + + + + + + Examples + + For example, a TPC-B-like benchmark can be defined as follows (scaling + factor = 1): + + +\set nbranches :scale +\set ntellers 10 * :scale +\set naccounts 100000 * :scale +\setrandom aid 1 :naccounts +\setrandom bid 1 :nbranches +\setrandom tid 1 :ntellers +\setrandom delta 1 10000 +BEGIN +UPDATE accounts SET abalance = abalance + :delta WHERE aid = :aid +SELECT abalance FROM accounts WHERE aid = :aid +UPDATE tellers SET tbalance = tbalance + :delta WHERE tid = :tid +UPDATE branches SET bbalance = bbalance + :delta WHERE bid = :bid +INSERT INTO history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, 'now') +END + + + If you want to automatically set the scaling factor from the number of + tuples in the branches table, use the -s option and a shell command like this: + + +pgbench -s $(psql -At -c "SELECT count(*) FROM branches") -f tpc_b.sql + + + Notice that with the -f option pgbench does not vacuum or clear the + history table before starting the benchmark. + +
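+ + As a smaller illustration of custom scripts (a sketch; the file name is hypothetical and the accounts table assumes a database initialized with pgbench -i at scale 1), a select-only script similar to the built-in -S transaction could be written and run like this: + + $ cat select_only.sql + \setrandom aid 1 100000 + SELECT abalance FROM accounts WHERE aid = :aid + + $ pgbench -c 4 -t 100 -f select_only.sql test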
+ diff --git a/doc/src/sgml/pgcrypto.sgml b/doc/src/sgml/pgcrypto.sgml new file mode 100644 index 0000000000000000000000000000000000000000..4da29e0377997d86489e9e1abd03e5b5b0ab9f87 --- /dev/null +++ b/doc/src/sgml/pgcrypto.sgml @@ -0,0 +1,1144 @@ + + + pgcrypto + + + pgcrypto + + + + This module provides cryptographic functions for PostgreSQL. + + + + Notes + + Configuration + + pgcrypto configures itself according to the findings of the main PostgreSQL + configure script. The options that affect it are + --with-zlib and --with-openssl. + + + When compiled with zlib, PGP encryption functions are able to + compress data before encrypting. + + + When compiled with OpenSSL, more algorithms are available. + Public-key encryption functions will also be faster, as OpenSSL + has more optimized BIGNUM functions. + + + Summary of functionality with and without OpenSSL: + + + Summary of functionality with and without OpenSSL + + + + Functionality + built-in + OpenSSL + + + + + MD5 + yes + yes + + + SHA1 + yes + yes + + + SHA224/256/384/512 + yes + yes (3) + + + Any other digest algo + no + yes (1) + + + Blowfish + yes + yes + + + AES + yes + yes (2) + + + DES/3DES/CAST5 + no + yes + + + Raw encryption + yes + yes + + + PGP Symmetric encryption + yes + yes + + + PGP Public-Key encryption + yes + yes + + + +
+ + + + + Any digest algorithm OpenSSL supports is automatically picked up. + This is not possible with ciphers, which need to be supported + explicitly. + + + + + AES is included in OpenSSL since version 0.9.7. If pgcrypto is + compiled against an older version, it will use the built-in AES code, + so AES is always available. + + + + + SHA2 algorithms were added to OpenSSL in version 0.9.8. For + older versions, pgcrypto will use built-in code. + + + +
+ + + NULL handling + + As is standard in SQL, all functions return NULL if any of the arguments + are NULL. This may create security risks if used carelessly. + + + + + Security + + All the functions here run inside the database server. That means that all + the data and passwords move between pgcrypto and the client application in + clear text. Thus you must: + + + + + Connect locally or use SSL connections. + + + Trust both the system and the database administrator. + + + + If you cannot, then it is better to do the cryptography inside the client + application. + +
+ + + General hashing + + + <literal>digest(data, type)</literal> + + digest(data text, type text) RETURNS bytea + digest(data bytea, type text) RETURNS bytea + + + Here type is the algorithm to use. Standard algorithms are `md5` and + `sha1`, although there may be more supported, depending on build + options. + + + Returns a binary hash. + + + If you want a hexadecimal string, use `encode()` on the result. Example: + + + CREATE OR REPLACE FUNCTION sha1(bytea) RETURNS text AS $$ + SELECT encode(digest($1, 'sha1'), 'hex') + $$ LANGUAGE SQL STRICT IMMUTABLE; + + + + + <literal>hmac(data, key, type)</literal> + + hmac(data text, key text, type text) RETURNS bytea + hmac(data bytea, key text, type text) RETURNS bytea + + + Calculates a hashed MAC over data. `type` is the same as in `digest()`. + If the key is larger than the hash block size, it will first be hashed and + the hash will be used as the key. + + + It is similar to digest(), but the hash can be recalculated only by someone + knowing the key. This prevents the scenario of someone altering the data and + also changing the hash to match. + + + Returns a binary hash. + + + + + + Password hashing + + The functions crypt() and gen_salt() are specifically designed + for hashing passwords. crypt() does the hashing and `gen_salt()` + prepares algorithm parameters for it. + + + The algorithms in `crypt()` differ from the usual hashing algorithms like + MD5 or SHA1 in the following respects: + + + + + They are slow. As the amount of data is so small, this is the only + way to make brute-forcing passwords hard. + + + + + They include a random 'salt' with the result, so that users having the + same password will have different crypted passwords. This is also an + additional defense against reversing the algorithm. + + + + + They include the algorithm type in the result, so passwords hashed with + different algorithms can co-exist. + + + + + Some of them are adaptive - that means that as computers get + faster, you can tune the algorithm to be slower, without + introducing incompatibility with existing passwords. + + + + + + Supported algorithms: + + + Type Max password Adaptive Salt bits Description + ------------------------------------------------------------------ + bf 72 yes 128 Blowfish-based, variant 2a + md5 unlimited no 48 md5-based crypt() + xdes 8 yes 24 Extended DES + des 8 no 12 Original UNIX crypt + + + + crypt(password, salt) + + crypt(password text, salt text) RETURNS text + + + Calculates a UN*X crypt(3) style hash of the password. When storing a new + password, you need to use the function `gen_salt()` to generate a new salt. + When checking a password you should use the existing hash as the salt. + + + Example - setting a new password: + + + UPDATE .. SET pswhash = crypt('new password', gen_salt('md5')); + + + Example - authentication: + + + SELECT pswhash = crypt('entered password', pswhash) WHERE .. ; + + + This returns true or false depending on whether the entered password is + correct. It can also return NULL if the `pswhash` field is NULL. + + + + + gen_salt(type) + + gen_salt(type text) RETURNS text + + + Generates a new random salt for use in `crypt()`. For adaptive + algorithms, it uses the default iteration count. + + + Accepted types are: `des`, `xdes`, `md5` and `bf`. + + + + gen_salt(type, rounds) + + gen_salt(type text, rounds integer) RETURNS text + + + Same as above, but lets the user specify the iteration count for some + algorithms. The higher the count, the more time it takes to hash + the password and therefore the more time to break it.
Note, though, that with a + too high count the time to calculate a hash may be several years + - which is somewhat impractical. + + + The rounds number is algorithm specific: + + + type default min max + --------------------------------- + xdes 725 1 16777215 + bf 6 4 31 + + + In the case of xdes there is an additional limitation: the count must be + an odd number. + + + Notes: + + + + + The original DES crypt was designed to have the speed of 4 hashes per + second on the hardware of that time. + + + + + Slower than 4 hashes per second would probably dampen usability. + + + + + Faster than 100 hashes per second is probably too fast. + + + + + See the next section for possible values for `crypt-bf`. + + + + + + + Comparison of crypt and regular hashes + + Here is a table that should give an overview of the relative slowness + of different hashing algorithms. + + + + + The goal is to crack an 8-character password consisting of either: + + + only lowercase letters, or + numbers plus lower- and uppercase letters. + + + + + The table below shows how much time it would take to try all + combinations of characters. + + + + + crypt-bf is featured in several settings - the number + after the slash is the rounds parameter of + gen_salt(). + + + + + Algorithm Hashes/sec Chars: [a-z] Chars: [A-Za-z0-9] + ------------------------------------------------------------ + crypt-bf/8 28 246 years 251322 years + crypt-bf/7 57 121 years 123457 years + crypt-bf/6 112 62 years 62831 years + crypt-bf/5 211 33 years 33351 years + crypt-md5 2681 2.6 years 2625 years + crypt-des 362837 7 days 19 years + sha1 590223 4 days 12 years + md5 2345086 1 day 3 years + + + + + The machine used is a 1.5GHz Pentium 4. + + + + + crypt-des and crypt-md5 algorithm numbers are taken from + John the Ripper v1.6.38 `-test` output. + + + + + MD5 numbers are from mdcrack 1.2. + + + + + SHA1 numbers are from lcrack-20031130-beta. + + + + + crypt-bf numbers are taken using a simple program that loops + over 1000 8-character passwords. That way I can show the speed with + different numbers of rounds. For reference: john -test + shows 213 loops/sec for crypt-bf/5. (The small difference in results is + in accordance with the fact that the crypt-bf implementation in pgcrypto + is the same one used in John the Ripper.) + + + + + + Note that "try all combinations" is not a realistic exercise. + Usually password cracking is done with the help of dictionaries, which + contain both regular words and various mutations of them. So, even + somewhat word-like passwords could be cracked much faster than the above + numbers suggest, while a 6-character non-word-like password may escape + cracking. Or not. + + + + + + + PGP encryption + + The functions here implement the encryption part of the OpenPGP (RFC2440) + standard. Both symmetric-key and public-key encryption are supported. + + + + Overview + + An encrypted PGP message consists of 2 packets: + + + A packet for the session key - either symmetric- or public-key encrypted. + A packet for the session-key encrypted data. + + + When encrypting with a password: + + + + + The given password is hashed using the String2Key (S2K) algorithm. This + is rather similar to the `crypt()` algorithms - purposefully slow + and with random salt - but it produces a full-length binary key. + + + + + If a separate session key is requested, a new random key will be + generated.
Otherwise the S2K key will be used directly as the session key. + + + + + If the S2K key is to be used directly, then only the S2K settings will be put + into the session key packet. Otherwise the session key will be encrypted with + the S2K key and put into the session key packet. + + + + + When encrypting with a public key: + + + A new random session key is generated. + It is encrypted using the public key and put into the session key packet. + + + + Now the common part, the session-key encrypted data packet: + + + + + Optional data manipulation: compression, conversion to UTF-8, + conversion of line-endings. + + + + + Data is prefixed with a block of random bytes. This is equivalent + to using a random IV. + + + + + A SHA1 hash of the random prefix and data is appended. + + + + + All this is encrypted with the session key. + + + + + + + <literal>pgp_sym_encrypt(data, psw)</literal> + + pgp_sym_encrypt(data text, psw text [, options text] ) RETURNS bytea + pgp_sym_encrypt_bytea(data bytea, psw text [, options text] ) RETURNS bytea + + + Return a symmetric-key encrypted PGP message. + + + Options are described in section 5.8. + + + + + <literal>pgp_sym_decrypt(msg, psw)</literal> + + pgp_sym_decrypt(msg bytea, psw text [, options text] ) RETURNS text + pgp_sym_decrypt_bytea(msg bytea, psw text [, options text] ) RETURNS bytea + + + Decrypt a symmetric-key encrypted PGP message. + + + Decrypting bytea data with `pgp_sym_decrypt` is disallowed. + This is to avoid outputting invalid character data. Decrypting + originally textual data with `pgp_sym_decrypt_bytea` is fine. + + + Options are described in section 5.8. + + + + + <literal>pgp_pub_encrypt(data, pub_key)</literal> + + pgp_pub_encrypt(data text, key bytea [, options text] ) RETURNS bytea + pgp_pub_encrypt_bytea(data bytea, key bytea [, options text] ) RETURNS bytea + + + Encrypt data with a public key. Giving this function a secret key will + produce an error. + + + Options are described in section 5.8. + + + + + <literal>pgp_pub_decrypt(msg, sec_key [, psw])</literal> + + pgp_pub_decrypt(msg bytea, key bytea [, psw text [, options text]] ) RETURNS text + pgp_pub_decrypt_bytea(msg bytea, key bytea [,psw text [, options text]] ) RETURNS bytea + + + Decrypt a public-key encrypted message with a secret key. If the secret + key is password-protected, you must give the password in `psw`. If + there is no password, but you want to specify options for the function, you + need to give an empty password. + + + Decrypting bytea data with `pgp_pub_decrypt` is disallowed. + This is to avoid outputting invalid character data. Decrypting + originally textual data with `pgp_pub_decrypt_bytea` is fine. + + + Options are described in section 5.8. + + + + + <literal>pgp_key_id(key / msg)</literal> + + pgp_key_id(key or msg bytea) RETURNS text + + + Given a PGP public or secret key, it shows the key ID. Given an + encrypted message, it shows the key ID that was used for encrypting + the data. + + + It can return 2 special key IDs: + + + + + SYMKEY: + + + The data is encrypted with a symmetric key. + + + + + ANYKEY: + + + The data is public-key encrypted, but the key ID is cleared. + That means you need to try all your secret keys on it to see + which one decrypts it. pgcrypto itself does not produce such + messages. + + + + + Note that different keys may have the same ID. This is a rare but normal + event. The client application should then try to decrypt with each one, + to see which fits - like handling ANYKEY.
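+ + A quick round trip through the symmetric-key functions described above can be checked directly in psql (a minimal sketch; the message and password are arbitrary): + + SELECT pgp_sym_decrypt( + pgp_sym_encrypt('Secret message', 'mypass'), + 'mypass'); + + The result should be the original text, and pgp_key_id() applied to the encrypted value would report SYMKEY.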
+ + + + + <literal>armor / dearmor</literal> + + armor(data bytea) RETURNS text + dearmor(data text) RETURNS bytea + + + Those wrap/unwrap data into PGP Ascii Armor which is basically Base64 + with CRC and additional formatting. + + + + + Options for PGP functions + + Options are named to be similar to GnuPG. Values should be given after + an equal sign; separate options from each other with commas. Example: + + + pgp_sym_encrypt(data, psw, 'compress-algo=1, cipher-algo=aes256') + + + All of the options except `convert-crlf` apply only to encrypt + functions. Decrypt functions get the parameters from PGP data. + + + Most interesting options are probably `compression-algo` and + unicode-mode. The rest should have reasonable defaults. + + + + + cipher-algo + + What cipher algorithm to use. + + + Values: bf, aes128, aes192, aes256 (OpenSSL-only: `3des`, `cast5`) + Default: aes128 + Applies: pgp_sym_encrypt, pgp_pub_encrypt + + + + + compress-algo + + Which compression algorithm to use. Needs building with zlib. + + + Values: + + + 0 - no compression + 1 - ZIP compression + 2 - ZLIB compression [=ZIP plus meta-data and block-CRC's] + Default: 0 + Applies: pgp_sym_encrypt, pgp_pub_encrypt + + + + + compress-level + + How much to compress. Bigger level compresses smaller but is slower. + 0 disables compression. + + + Values: 0, 1-9 + Default: 6 + Applies: pgp_sym_encrypt, pgp_pub_encrypt + + + + + convert-crlf + + Whether to convert `\n` into `\r\n` when encrypting and `\r\n` to `\n` + when decrypting. RFC2440 specifies that text data should be stored + using `\r\n` line-feeds. Use this to get fully RFC-compliant + behavior. + + + Values: 0, 1 + Default: 0 + Applies: pgp_sym_encrypt, pgp_pub_encrypt, pgp_sym_decrypt, pgp_pub_decrypt + + + + + disable-mdc + + Do not protect data with SHA-1. Only good reason to use this + option is to achieve compatibility with ancient PGP products, as the + SHA-1 protected packet is from upcoming update to RFC2440. (Currently + at version RFC2440bis-14.) Recent gnupg.org and pgp.com software + supports it fine. + + + Values: 0, 1 + Default: 0 + Applies: pgp_sym_encrypt, pgp_pub_encrypt + + + + + enable-session-key + + Use separate session key. Public-key encryption always uses separate + session key, this is for symmetric-key encryption, which by default + uses S2K directly. + + + Values: 0, 1 + Default: 0 + Applies: pgp_sym_encrypt + + + + + s2k-mode + + Which S2K algorithm to use. + + + Values: + 0 - Without salt. Dangerous! + 1 - With salt but with fixed iteration count. + 3 - Variable iteration count. + Default: 3 + Applies: pgp_sym_encrypt + + + + + s2k-digest-algo + + Which digest algorithm to use in S2K calculation. + + + Values: md5, sha1 + Default: sha1 + Applies: pgp_sym_encrypt + + + + + s2k-cipher-algo + + Which cipher to use for encrypting separate session key. + + + Values: bf, aes, aes128, aes192, aes256 + Default: use cipher-algo. + Applies: pgp_sym_encrypt + + + + + unicode-mode + + Whether to convert textual data from database internal encoding to + UTF-8 and back. If your database already is UTF-8, no conversion will + be done, only the data will be tagged as UTF-8. Without this option + it will not be. + + + Values: 0, 1 + Default: 0 + Applies: pgp_sym_encrypt, pgp_pub_encrypt + + + + + + Generating keys with GnuPG + + Generate a new key: + + + gpg --gen-key + + + The preferred key type is "DSA and Elgamal". 
+ + + For RSA encryption you must create either DSA or RSA sign-only key + as master and then add RSA encryption subkey with `gpg --edit-key`. + + + List keys: + + + gpg --list-secret-keys + + + Export ascii-armored public key: + + + gpg -a --export KEYID > public.key + + + Export ascii-armored secret key: + + + gpg -a --export-secret-keys KEYID > secret.key + + + You need to use `dearmor()` on them before giving them to + pgp_pub_* functions. Or if you can handle binary data, you can drop + "-a" from gpg. + + + For more details see `man gpg`, + [The GNU + Privacy Handbook] and other docs on + site. + + + + + Limitations of PGP code + + + + No support for signing. That also means that it is not checked + whether the encryption subkey belongs to master key. + + + + + No support for encryption key as master key. As such practice + is generally discouraged, it should not be a problem. + + + + + No support for several subkeys. This may seem like a problem, as this + is common practice. On the other hand, you should not use your regular + GPG/PGP keys with pgcrypto, but create new ones, as the usage scenario + is rather different. + + + + + + + Raw encryption + + Those functions only run a cipher over data, they don't have any advanced + features of PGP encryption. Therefore they have some major problems: + + + + + They use user key directly as cipher key. + + + + + They don't provide any integrity checking, to see + if the encrypted data was modified. + + + + + They expect that users manage all encryption parameters + themselves, even IV. + + + + + They don't handle text. + + + + + So, with the introduction of PGP encryption, usage of raw + encryption functions is discouraged. + + + encrypt(data bytea, key bytea, type text) RETURNS bytea + decrypt(data bytea, key bytea, type text) RETURNS bytea + + encrypt_iv(data bytea, key bytea, iv bytea, type text) RETURNS bytea + decrypt_iv(data bytea, key bytea, iv bytea, type text) RETURNS bytea + + + Encrypt/decrypt data with cipher, padding data if needed. + + + type parameter description in pseudo-noteup: + + + algo ['-' mode] ['/pad:' padding] + + + Supported algorithms: + + + bf- Blowfish + aes- AES (Rijndael-128) + + + Modes: + + + + + cbc- next block depends on previous. (default) + + + + + ecb- each block is encrypted separately. (for testing + only) + + + + + Padding: + + + + + pkcs-data may be any length (default) + + + + + none- data must be multiple of cipher block size. + + + + + IV is initial value for mode, defaults to all zeroes. It is ignored for + ECB. It is clipped or padded with zeroes if not exactly block size. + + + So, example: + + + encrypt(data, 'fooz', 'bf') + + + is equal to + + + encrypt(data, 'fooz', 'bf-cbc/pad:pkcs') + + + + + Random bytes + + gen_random_bytes(count integer) + + + Returns `count` cryptographically strong random bytes as bytea value. + There can be maximally 1024 bytes extracted at a time. This is to avoid + draining the randomness generator pool. + + + + + References/Links + + + Useful reading + + + : + The GNU Privacy Handbook + + + : + Describes the crypt-blowfish algorithm. + + + + : + + How to choose good password. + + + : + Interesting idea for picking passwords. + + + + : + + Describes good and bad cryptography. + + + + + + Technical references + + + : + OpenPGP message format + + + + : + + New version of RFC2440. + + + : + The MD5 Message-Digest Algorithm + + + : + HMAC: Keyed-Hashing for Message Authentication + + + + : + + Comparison of crypt-des, crypt-md5 and bcrypt algorithms. 
+ + + : + Standards for DES, 3DES and AES. + + + + : + + Description of Fortuna CSPRNG. + + + : + Jean-Luc Cooke Fortuna-based /dev/random driver for Linux. + + + : + Collection of cryptology pointers. + + + + + + + Credits + + pgcrypto uses code from the following sources: + + + Credits + + + + Algorithm + Author + Source origin + + + + + DES crypt() + David Burren and others + FreeBSD libcrypt + + + MD5 crypt() + Poul-Henning Kamp + FreeBSD libcrypt + + + Blowfish crypt() + Solar Designer + www.openwall.com + + + Blowfish cipher + Simon Tatham + PuTTY + + + Rijndael cipher + Brian Gladman + OpenBSD sys/crypto + + + MD5 and SHA1 + WIDE Project + KAME kame/sys/crypto + + + SHA256/384/512 + Aaron D. Gifford + OpenBSD sys/crypto + + + BIGNUM math + Michael J. Fromberger + dartmouth.edu/~sting/sw/imath + + + +
+
+ + + Author + + Marko Kreen markokr@gmail.com + + + +
+ diff --git a/doc/src/sgml/pgrowlocks.sgml b/doc/src/sgml/pgrowlocks.sgml new file mode 100644 index 0000000000000000000000000000000000000000..f7b1e479a06c0e62f4fe94cb0759ce62d364a222 --- /dev/null +++ b/doc/src/sgml/pgrowlocks.sgml @@ -0,0 +1,123 @@ + + + pgrowlocks + + + pgrowlocks + + + + The pgrowlocks module provides a function to show row + locking information for a specified table. + + + + Overview + +pgrowlocks(text) RETURNS pgrowlocks_type + + + The parameter is a name of table. And pgrowlocks_type is + defined as: + + +CREATE TYPE pgrowlocks_type AS ( + locked_row TID, -- row TID + lock_type TEXT, -- lock type + locker XID, -- locking XID + multi bool, -- multi XID? + xids xid[], -- multi XIDs + pids INTEGER[] -- locker's process id +); + + + + pgrowlocks_type + + + + locked_row + tuple ID(TID) of each locked rows + + + lock_type + "Shared" for shared lock, "Exclusive" for exclusive lock + + + locker + transaction ID of locker (Note 1) + + + multi + "t" if locker is a multi transaction, otherwise "f" + + + xids + XIDs of lockers (Note 2) + + + pids + process ids of locking backends + + + +
+ + Note 1: If the locker is a multi-transaction, this column shows the multi ID. + + + Note 2: If the locker is a multi, multiple values are shown. + + + + pgrowlocks works as follows: it grabs an AccessShareLock on the target + table and reads each row one by one to collect the row locking + information. Note that: + + + + + if the table is exclusively locked by someone else, + pgrowlocks will be blocked. + + + + + pgrowlocks may show incorrect information if a new + lock is taken or a lock is freed during its execution. + + + + + pgrowlocks does not show the contents of locked rows. If + you want to look at the row contents at the same time, you could do + something like this: + + +SELECT * FROM accounts AS a, pgrowlocks('accounts') AS p WHERE p.locked_row = a.ctid; + +
+ + + Example + + Here is a sample execution of pgrowlocks: + + +test=# SELECT * FROM pgrowlocks('t1'); + locked_row | lock_type | locker | multi | xids | pids +------------+-----------+--------+-------+-----------+--------------- + (0,1) | Shared | 19 | t | {804,805} | {29066,29068} + (0,2) | Shared | 19 | t | {804,805} | {29066,29068} + (0,3) | Exclusive | 804 | f | {804} | {29066} + (0,4) | Exclusive | 804 | f | {804} | {29066} +(4 rows) + + +
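+ + Row locks like the ones above only exist while the locking transactions are open. To reproduce a similar picture yourself (a sketch; the table t1 and its contents are assumed to exist), keep a transaction holding row locks open in one session and call pgrowlocks from another: + + -- session 1: take shared row locks and keep the transaction open + BEGIN; + SELECT * FROM t1 FOR SHARE; + + -- session 2: inspect the locks while session 1 is still open + SELECT * FROM pgrowlocks('t1');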
+ diff --git a/doc/src/sgml/pgstattuple.sgml b/doc/src/sgml/pgstattuple.sgml new file mode 100644 index 0000000000000000000000000000000000000000..e8fa71602cfe2f5214b635ddb64cb5df04d312b9 --- /dev/null +++ b/doc/src/sgml/pgstattuple.sgml @@ -0,0 +1,158 @@ + + + pgstattuple + + + pgstattuple + + + + pgstattuple modules provides various functions to obtain + tuple statistics. + + + + Functions + + + + + pgstattuple() returns the relation length, percentage + of the "dead" tuples of a relation and other info. This may help users to + determine whether vacuum is necessary or not. Here is an example session: + + +test=> \x +Expanded display is on. +test=> SELECT * FROM pgstattuple('pg_catalog.pg_proc'); +-[ RECORD 1 ]------+------- +table_len | 458752 +tuple_count | 1470 +tuple_len | 438896 +tuple_percent | 95.67 +dead_tuple_count | 11 +dead_tuple_len | 3157 +dead_tuple_percent | 0.69 +free_space | 8932 +free_percent | 1.95 + + + Here are explanations for each column: + + + + <literal>pgstattuple()</literal> column descriptions + + + + Column + Description + + + + + table_len + physical relation length in bytes + + + tuple_count + number of live tuples + + + tuple_len + total tuples length in bytes + + + tuple_percent + live tuples in % + + + dead_tuple_len + total dead tuples length in bytes + + + dead_tuple_percent + dead tuples in % + + + free_space + free space in bytes + + + free_percent + free space in % + + + +
+ + + + pgstattuple acquires only a read lock on the relation, so + concurrent updates may affect the result. + + + + + pgstattuple judges a tuple to be "dead" if HeapTupleSatisfiesNow() + returns false. + + +
+ + + + + pg_relpages() returns the number of pages in the relation + (see the example following this list). + + + + + + pgstatindex() returns a record showing information about a btree index: + + +test=> \x +Expanded display is on. +test=> SELECT * FROM pgstatindex('pg_cast_oid_index'); +-[ RECORD 1 ]------+------ +version | 2 +tree_level | 0 +index_size | 8192 +root_block_no | 1 +internal_pages | 0 +leaf_pages | 1 +empty_pages | 0 +deleted_pages | 0 +avg_leaf_density | 50.27 +leaf_fragmentation | 0 + +
+
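+ + For example, pg_relpages can be combined with the block size to estimate a relation's physical size (a minimal sketch, assuming the default 8 kB block size; the index name follows the examples above): + + test=> SELECT pg_relpages('pg_cast_oid_index') * 8192 AS bytes;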
+ + + Usage + + pgstattuple may be called as a relation function and is + defined as follows: + + + CREATE OR REPLACE FUNCTION pgstattuple(text) RETURNS pgstattuple_type + AS 'MODULE_PATHNAME', 'pgstattuple' + LANGUAGE C STRICT; + + CREATE OR REPLACE FUNCTION pgstattuple(oid) RETURNS pgstattuple_type + AS 'MODULE_PATHNAME', 'pgstattuplebyid' + LANGUAGE C STRICT; + + + The argument is the relation name (which may be schema-qualified) + or the OID of the relation. Note that pgstattuple only returns + one row. + + +
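+ + Since there is an OID-based variant, a relation can also be passed by OID; a regclass cast is a convenient way to look the OID up (a sketch using the catalog table from the earlier example): + + SELECT * FROM pgstattuple('pg_catalog.pg_proc'::regclass::oid);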
+ diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml index 35fb92d57394bb72419b0aceea399991cb88f01d..2cc4d5731309113305297eb03daada7ad514f96c 100644 --- a/doc/src/sgml/postgres.sgml +++ b/doc/src/sgml/postgres.sgml @@ -1,4 +1,4 @@ - + + seg + + + seg + + + + The seg module contains the code for the user-defined + type, SEG, representing laboratory measurements as + floating point intervals. + + + + Rationale + + The geometry of measurements is usually more complex than that of a + point in a numeric continuum. A measurement is usually a segment of + that continuum with somewhat fuzzy limits. The measurements come out + as intervals because of uncertainty and randomness, as well as because + the value being measured may naturally be an interval indicating some + condition, such as the temperature range of stability of a protein. + + + Using just common sense, it appears more convenient to store such data + as intervals, rather than pairs of numbers. In practice, it even turns + out more efficient in most applications. + + + Further along the line of common sense, the fuzziness of the limits + suggests that the use of traditional numeric data types leads to a + certain loss of information. Consider this: your instrument reads + 6.50, and you input this reading into the database. What do you get + when you fetch it? Watch: + + +test=> select 6.50 as "pH"; + pH +--- +6.5 +(1 row) + + + In the world of measurements, 6.50 is not the same as 6.5. It may + sometimes be critically different. The experimenters usually write + down (and publish) the digits they trust. 6.50 is actually a fuzzy + interval contained within a bigger and even fuzzier interval, 6.5, + with their center points being (probably) the only common feature they + share. We definitely do not want such different data items to appear the + same. + + + Conclusion? It is nice to have a special data type that can record the + limits of an interval with arbitrarily variable precision. Variable in + a sense that each data element records its own precision. + + + Check this out: + + +test=> select '6.25 .. 6.50'::seg as "pH"; + pH +------------ +6.25 .. 6.50 +(1 row) + + + + + Syntax + + The external representation of an interval is formed using one or two + floating point numbers joined by the range operator ('..' or '...'). + Optional certainty indicators (<, > and ~) are ignored by the internal + logics, but are retained in the data. + + + + Rules + + + + rule 1 + seg -> boundary PLUMIN deviation + + + rule 2 + seg -> boundary RANGE boundary + + + rule 3 + seg -> boundary RANGE + + + rule 4 + seg -> RANGE boundary + + + rule 5 + seg -> boundary + + + rule 6 + boundary -> FLOAT + + + rule 7 + boundary -> EXTENSION FLOAT + + + rule 8 + deviation -> FLOAT + + + +
+ + + Tokens + + + + RANGE + (\.\.)(\.)? + + + PLUMIN + \'\+\-\' + + + integer + [+-]?[0-9]+ + + + real + [+-]?[0-9]+\.[0-9]+ + + + FLOAT + ({integer}|{real})([eE]{integer})? + + + EXTENSION + [<>~] + + + +
+ + + Examples of valid <literal>SEG</literal> representations + + + + Any number + + (rules 5,6) -- creates a zero-length segment (a point, + if you will) + + + + ~5.0 + + (rules 5,7) -- creates a zero-length segment AND records + '~' in the data. This notation reads 'approximately 5.0', + but its meaning is not recognized by the code. It is ignored + until you get the value back. View it as a short-hand comment. + + + + <5.0 + + (rules 5,7) -- creates a point at 5.0; '<' is ignored but + is preserved as a comment + + + + >5.0 + + (rules 5,7) -- creates a point at 5.0; '>' is ignored but + is preserved as a comment + + + + 5(+-)0.3 + + + (rules 1,8) -- creates an interval '4.7..5.3'. As of this + writing (02/09/2000), this mechanism isn't completely accurate + in determining the number of significant digits for the + boundaries. For example, it adds an extra digit to the lower + boundary if the resulting interval includes a power of ten: + + +postgres=> select '10(+-)1'::seg as seg; + seg +--------- +9.0 .. 11 -- should be: 9 .. 11 + + + Also, the (+-) notation is not preserved: 'a(+-)b' will + always be returned as '(a-b) .. (a+b)'. The purpose of this + notation is to allow input from certain data sources without + conversion. + + + + + 50 .. + (rule 3) -- everything that is greater than or equal to 50 + + + .. 0 + (rule 4) -- everything that is less than or equal to 0 + + + 1.5e-2 .. 2E-2 + (rule 2) -- creates an interval (0.015 .. 0.02) + + + 1 ... 2 + + The same as 1...2, or 1 .. 2, or 1..2 (space is ignored). + Because of the widespread use of '...' in the data sources, + I decided to stick to it as a range operator. This, and + also the fact that the white space around the range operator + is ignored, creates a parsing conflict with numeric constants + starting with a decimal point. + + + + +
+ + + Examples + + + + .1e7 + should be: 0.1e7 + + + .1 .. .2 + should be: 0.1 .. 0.2 + + + 2.4 E4 + should be: 2.4E4 + + + +
+ + The following, although it is not a syntax error, is disallowed to improve + the sanity of the data: + + + + + + + 5 .. 2 + should be: 2 .. 5 + + + +
+
+ + + Precision + + The segments are stored internally as pairs of 32-bit floating point + numbers. It means that the numbers with more than 7 significant digits + will be truncated. + + + The numbers with less than or exactly 7 significant digits retain their + original precision. That is, if your query returns 0.00, you will be + sure that the trailing zeroes are not the artifacts of formatting: they + reflect the precision of the original data. The number of leading + zeroes does not affect precision: the value 0.0067 is considered to + have just 2 significant digits. + + + + + Usage + + The access method for SEG is a GiST index (gist_seg_ops), which is a + generalization of R-tree. GiSTs allow the postgres implementation of + R-tree, originally encoded to support 2-D geometric types such as + boxes and polygons, to be used with any data type whose data domain + can be partitioned using the concepts of containment, intersection and + equality. In other words, everything that can intersect or contain + its own kind can be indexed with a GiST. That includes, among other + things, all geometric data types, regardless of their dimensionality + (see also contrib/cube). + + + The operators supported by the GiST access method include: + + + + +[a, b] << [c, d] Is left of + + + The left operand, [a, b], occurs entirely to the left of the + right operand, [c, d], on the axis (-inf, inf). It means, + [a, b] << [c, d] is true if b < c and false otherwise + + + + +[a, b] >> [c, d] Is right of + + + [a, b] is occurs entirely to the right of [c, d]. + [a, b] >> [c, d] is true if a > d and false otherwise + + + + +[a, b] &< [c, d] Overlaps or is left of + + + This might be better read as "does not extend to right of". + It is true when b <= d. + + + + +[a, b] &> [c, d] Overlaps or is right of + + + This might be better read as "does not extend to left of". + It is true when a >= c. + + + + +[a, b] = [c, d] Same as + + + The segments [a, b] and [c, d] are identical, that is, a == b + and c == d + + + + +[a, b] && [c, d] Overlaps + + + The segments [a, b] and [c, d] overlap. + + + + +[a, b] @> [c, d] Contains + + + The segment [a, b] contains the segment [c, d], that is, + a <= c and b >= d + + + + +[a, b] <@ [c, d] Contained in + + + The segment [a, b] is contained in [c, d], that is, + a >= c and b <= d + + + + + (Before PostgreSQL 8.2, the containment operators @> and <@ were + respectively called @ and ~. These names are still available, but are + deprecated and will eventually be retired. Notice that the old names + are reversed from the convention formerly followed by the core geometric + datatypes!) + + + Although the mnemonics of the following operators is questionable, I + preserved them to maintain visual consistency with other geometric + data types defined in Postgres. + + + Other operators: + + + +[a, b] < [c, d] Less than +[a, b] > [c, d] Greater than + + + These operators do not make a lot of sense for any practical + purpose but sorting. These operators first compare (a) to (c), + and if these are equal, compare (b) to (d). That accounts for + reasonably good sorting in most cases, which is useful if + you want to use ORDER BY with this type + + + + There are a few other potentially useful functions defined in seg.c + that vanished from the schema because I stopped using them. Some of + these were meant to support type casting. Let me know if I was wrong: + I will then add them back to the schema. I would also appreciate + other ideas that would enhance the type and make it more useful. 
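+ + To make the Usage discussion concrete, here is a minimal sketch of indexing and querying a seg column (the table and column names are made up; the @> containment operator is described above): + + CREATE TABLE measurements (pH seg); + INSERT INTO measurements VALUES ('6.25 .. 6.50'), ('6.7 .. 6.9'); + CREATE INDEX measurements_ph_idx ON measurements USING gist (pH); + SELECT * FROM measurements WHERE pH @> '6.4'::seg; -- segments containing the point 6.4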
+ + + For examples of usage, see sql/seg.sql. + + + NOTE: The performance of an R-tree index can largely depend on the + order of input values. It may be very helpful to sort the input table + on the SEG column (see the script sort-segments.pl for an example). + + + + + Credits + + My thanks are primarily to Prof. Joe Hellerstein + () for elucidating the + gist of the GiST (). I am + also grateful to all postgres developers, present and past, for enabling + me to create my own world and live undisturbed in it. And I would like + to acknowledge my gratitude to Argonne Lab and to the U.S. Department of + Energy for the years of faithful support of my database research. + + + Gene Selkov, Jr. + Computational Scientist + Mathematics and Computer Science Division + Argonne National Laboratory + 9700 S Cass Ave. + Building 221 + Argonne, IL 60439-4844 + + + selkovjr@mcs.anl.gov + + + + + diff --git a/doc/src/sgml/sslinfo.sgml b/doc/src/sgml/sslinfo.sgml new file mode 100644 index 0000000000000000000000000000000000000000..828fca2591e79f3608da078820c0cc33f7b98968 --- /dev/null +++ b/doc/src/sgml/sslinfo.sgml @@ -0,0 +1,164 @@ + + + sslinfo + + + sslinfo + + + + This module provides information about the current SSL certificate for PostgreSQL. + + + + Notes + + This extension won't build unless your PostgreSQL server is configured + with --with-openssl. The information provided by these functions is + useless if you don't use SSL to connect to the database. + + + + + Functions Description + + + + +ssl_is_used() RETURNS boolean; + + + Returns TRUE if the current connection to the server uses SSL, and FALSE + otherwise. + + + + + +ssl_client_cert_present() RETURNS boolean + + + Returns TRUE if the current client has presented a valid SSL client + certificate to the server, and FALSE otherwise (e.g., no SSL, or the + certificate was not requested by the server). + + + + + +ssl_client_serial() RETURNS numeric + + + Returns the serial number of the current client certificate. The combination + of certificate serial number and certificate issuer is guaranteed to + uniquely identify a certificate (but not its owner -- the owner ought to + regularly change his keys, and get new certificates from the issuer). + + + So, if you run your own CA and allow only certificates from this CA to + be accepted by the server, the serial number is the most reliable (albeit + not very mnemonic) means to identify a user. + + + + + +ssl_client_dn() RETURNS text + + + Returns the full subject of the current client certificate, converting + character data into the current database encoding. It is assumed that + if you use non-Latin characters in the certificate names, your + database is able to represent these characters, too. If your database + uses the SQL_ASCII encoding, non-Latin characters in the name will be + represented as UTF-8 sequences. + + + The result looks like '/CN=Somebody /C=Some country/O=Some organization'. + + + + + +ssl_issuer_dn() + + + Returns the full issuer name of the client certificate, converting + character data into the current database encoding. + + + The combination of the return value of this function with the + certificate serial number uniquely identifies the certificate. + + + The result of this function is really useful only if you have more + than one trusted CA certificate in your server's root.crt file, or if + this CA has issued some intermediate certificate authority + certificates.
+ + + + + +ssl_client_dn_field(fieldName text) RETURNS text + + + This function returns the value of the specified field in the + certificate subject. Field names are string constants that are + converted into ASN1 object identifiers using the OpenSSL object + database. The following values are acceptable: + + +commonName (alias CN) +surname (alias SN) +name +givenName (alias GN) +countryName (alias C) +localityName (alias L) +stateOrProvinceName (alias ST) +organizationName (alias O) +organizationUnitName (alias OU) +title +description +initials +postalCode +streetAddress +generationQualifier +description +dnQualifier +x500UniqueIdentifier +pseudonym +role +emailAddress + + + All of these fields are optional, except commonName. It depends + entirely on your CA policy which of them would be included and which + wouldn't. The meaning of these fields, however, is strictly defined by + the X.500 and X.509 standards, so you cannot just assign arbitrary + meaning to them. + + + + + +ssl_issuer_field(fieldName text) RETURNS text; + + + Does the same as ssl_client_dn_field, but for the certificate issuer + rather than the certificate subject. + + + + + + + Author + + Victor Wagner vitus@cryptocom.ru, Cryptocom LTD + E-Mail of Cryptocom OpenSSL development group: + openssl@cryptocom.ru + + + + diff --git a/doc/src/sgml/standby.sgml b/doc/src/sgml/standby.sgml new file mode 100644 index 0000000000000000000000000000000000000000..120fed4c2c0ade6ddec7141f061ace4752c7f711 --- /dev/null +++ b/doc/src/sgml/standby.sgml @@ -0,0 +1,249 @@ + + + pg_standby + + + pgstandby + + + + pg_standby is a production-ready program that can be used + to create a Warm Standby server. Other configuration is required as well, + all of which is described in the main server manual. + + + The program is designed to be a waiting restore_command, + which is what is required to turn a normal archive recovery into a Warm + Standby. Within the + restore_command of the recovery.conf + you could configure pg_standby in the following way: + + + restore_command = 'pg_standby archiveDir %f %p' + + + which would be sufficient to define that files will be restored from + archiveDir. + + + + pg_standby features include: + + + + + It is written in C, so it is very portable + and easy to install. + + + + + Supports copy or link from a directory (only) + + + + + The source is easy to modify, with specifically designated + sections to modify for your own needs, allowing + interfaces to be written for additional Backup Archive Restore + (BAR) systems + + + + + Already tested on Linux and Windows + + + + + + Usage + + pg_standby should be used within the + restore_command of the recovery.conf + file. + + + The basic usage should be like this: + + + restore_command = 'pg_standby archiveDir %f %p' + + + with the pg_standby command usage as + + + pg_standby [OPTION]... [ARCHIVELOCATION] [NEXTWALFILE] [XLOGFILEPATH] + + + When used within the restore_command, the %f and %p macros + will provide the actual file and path required for the restore/recovery. + + + + Options + + + + -c + use the copy/cp command to restore WAL files from the archive + + + -d + debug/logging option. + + + -k numfiles + + + Clean up files in the archive so that we maintain no more + than this many files in the archive. + + + You should be wary of setting this number too low, + since this may mean you cannot restart the standby. This + is because the last restartpoint marked in the WAL files + may be many files in the past and can vary considerably.
This should be set to a value exceeding the number of WAL + files that can be recovered in 2*checkpoint_timeout seconds, + according to the value in the warm standby postgresql.conf. + It is wholly unrelated to the setting of checkpoint_segments + on either primary or standby. + + + If in doubt, use a large value or do not set a value at all. + + + + + -l + + + use the ln command to restore WAL files from the archive; + the WAL files will remain in the archive. + + + Linking is more efficient, but the default is copy, so that + you can maintain the WAL archive for recovery + purposes as well as high availability. + + + This option uses the Windows Vista command mklink + to provide a file-to-file symbolic link. -l will + not work on versions of Windows prior to Vista. + Use the -c option instead. + + + + + -r maxretries + + + the maximum number of times to retry the restore command if it + fails. After each failure, we wait for sleeptime * num_retries, + so that the wait time increases progressively; by default + we will wait 5 secs, 10 secs, then 15 secs before reporting + the failure back to the database server. This will be + interpreted as an end of recovery and the Standby will come + up fully as a result. Default=3 + + + + + -s sleeptime + + the number of seconds to sleep between tests to see + whether the file to be restored is available in the archive yet. + The default setting is not necessarily recommended; + consult the main database server manual for discussion. + Default=5 + + + + -t triggerfile + + the presence of the triggerfile will cause recovery to end, + whether or not the next file is available. + It is recommended that you use a structured filename to + avoid confusion as to which server is being triggered + when multiple servers exist on the same system, + e.g. /tmp/pgsql.trigger.5432. + + + + -w maxwaittime + + the maximum number of seconds to wait for the next file, + after which recovery will end and the Standby will come up. + The default setting is not necessarily recommended; + consult the main database server manual for discussion. + Default=0 + + + + +
+ + + --help is not supported since + pg_standby is not intended for interactive use, except + during development and testing. + + +
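+ + Failover is triggered simply by creating the trigger file named with -t. For example, with the option settings used in the examples below, the standby could be promoted from a shell like this (a sketch; the path matches the -t value): + + $ touch /tmp/pgsql.trigger.5442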
+ + + Examples + + + + Example on Linux + +archive_command = 'cp %p ../archive/%f' + +restore_command = 'pg_standby -l -d -k 255 -r 2 -s 2 -w 0 -t /tmp/pgsql.trigger.5442 $PWD/../archive %f %p 2>> standby.log' + + + which will + + + use an ln command to restore WAL files from the archive + produce logfile output in standby.log + keep the last 255 full WAL files, plus the current one + sleep for 2 seconds between checks for the next full WAL file + never time out if the file is not found + stop waiting when a trigger file called /tmp/pgsql.trigger.5442 appears + + + + + + Example on Windows + + +archive_command = 'copy %p ..\\archive\\%f' + + + Note that backslashes need to be doubled in the archive_command, but + *not* in the restore_command, in 8.2, 8.1, 8.0 on Windows. + + +restore_command = 'pg_standby -c -d -s 5 -w 0 -t C:\pgsql.trigger.5442 + ..\archive %f %p 2>> standby.log' + + + which will + + + use a copy command to restore WAL files from the archive + produce logfile output in standby.log + sleep for 5 seconds between checks for the next full WAL file + never time out if the file is not found + stop waiting when a trigger file called C:\pgsql.trigger.5442 appears + + + + + +
+ diff --git a/doc/src/sgml/tablefunc.sgml b/doc/src/sgml/tablefunc.sgml new file mode 100644 index 0000000000000000000000000000000000000000..23d12b558040789e200f855733a987760af6a280 --- /dev/null +++ b/doc/src/sgml/tablefunc.sgml @@ -0,0 +1,765 @@ + + + tablefunc + + + tablefunc + + + + tablefunc provides functions to convert query rows into fields. + + + Functions + + + + + + Function + Returns + Comments + + + + + + + normal_rand(int numvals, float8 mean, float8 stddev) + + + + returns a set of normally distributed float8 values + + + + + crosstabN(text sql) + returns a set of row_name plus N category value columns + + crosstab2(), crosstab3(), and crosstab4() are defined for you, + but you can create additional crosstab functions per the instructions + in the documentation below. + + + + crosstab(text sql) + returns a set of row_name plus N category value columns + + requires anonymous composite type syntax in the FROM clause. See + the instructions in the documentation below. + + + + crosstab(text sql, N int) + + + obsolete version of crosstab() + + the argument N is now ignored, since the number of value columns + is always determined by the calling query + + + + + + + connectby(text relname, text keyid_fld, text parent_keyid_fld + [, text orderby_fld], text start_with, int max_depth + [, text branch_delim]) + + + + returns keyid, parent_keyid, level, and an optional branch string + and an optional serial column for ordering siblings + + + requires anonymous composite type syntax in the FROM clause. See + the instructions in the documentation below. + + + + +
+ + + <literal>normal_rand</literal> + +normal_rand(int numvals, float8 mean, float8 stddev) RETURNS SETOF float8 + + + Where numvals is the number of values to be returned + from the function. mean is the mean of the normal + distribution of values and stddev is the standard + deviation of the normal distribution of values. + + + Returns a float8 set of random values normally distributed (Gaussian + distribution). + + + Example: + + + test=# SELECT * FROM + test=# normal_rand(1000, 5, 3); + normal_rand +---------------------- + 1.56556322244898 + 9.10040991424657 + 5.36957140345079 + -0.369151492880995 + 0.283600703686639 + . + . + . + 4.82992125404908 + 9.71308014517282 + 2.49639286969028 +(1000 rows) + + + Returns 1000 values with a mean of 5 and a standard deviation of 3. + + + + + + <literal>crosstabN(text sql)</literal> + +crosstabN(text sql) + + + The sql parameter is a SQL statement which produces the + source set of data. The SQL statement must return one row_name column, one + category column, and one value column. row_name and + value must be of type text. The function returns a set of + row_name plus N category value columns. + + + Provided sql must produce a set something like: + + +row_name cat value +---------+-------+------- + row1 cat1 val1 + row1 cat2 val2 + row1 cat3 val3 + row1 cat4 val4 + row2 cat1 val5 + row2 cat2 val6 + row2 cat3 val7 + row2 cat4 val8 + + + The returned value is a SETOF table_crosstab_N, which + is defined by: + + +CREATE TYPE tablefunc_crosstab_N AS ( + row_name TEXT, + category_1 TEXT, + category_2 TEXT, + . + . + . + category_N TEXT +); + + + for the default installed functions, where N is 2, 3, or 4. + + + e.g. the provided crosstab2 function produces a set something like: + + + <== values columns ==> + row_name category_1 category_2 + ---------+------------+------------ + row1 val1 val2 + row2 val5 val6 + + + + The sql result must be ordered by 1,2. + + + The number of values columns depends on the tuple description + of the function's declared return type. + + + + + Missing values (i.e. not enough adjacent rows of same row_name to + fill the number of result values columns) are filled in with nulls. + + + + + Extra values (i.e. too many adjacent rows of same row_name to fill + the number of result values columns) are skipped. + + + + + Rows with all nulls in the values columns are skipped. + + + + + The installed defaults are for illustration purposes. You + can create your own return types and functions based on the + crosstab() function of the installed library. See below for + details. 
+
+
+
+
+
+ Example:
+
+
+create table ct(id serial, rowclass text, rowid text, attribute text, value text);
+insert into ct(rowclass, rowid, attribute, value) values('group1','test1','att1','val1');
+insert into ct(rowclass, rowid, attribute, value) values('group1','test1','att2','val2');
+insert into ct(rowclass, rowid, attribute, value) values('group1','test1','att3','val3');
+insert into ct(rowclass, rowid, attribute, value) values('group1','test1','att4','val4');
+insert into ct(rowclass, rowid, attribute, value) values('group1','test2','att1','val5');
+insert into ct(rowclass, rowid, attribute, value) values('group1','test2','att2','val6');
+insert into ct(rowclass, rowid, attribute, value) values('group1','test2','att3','val7');
+insert into ct(rowclass, rowid, attribute, value) values('group1','test2','att4','val8');
+
+select * from crosstab3(
+ 'select rowid, attribute, value
+ from ct
+ where rowclass = ''group1''
+ and (attribute = ''att2'' or attribute = ''att3'') order by 1,2;');
+
+ row_name | category_1 | category_2 | category_3
+----------+------------+------------+------------
+ test1 | val2 | val3 |
+ test2 | val6 | val7 |
+(2 rows)
+
+
+
+ <literal>crosstab(text)</literal>
+
+crosstab(text sql)
+crosstab(text sql, int N)
+
+
+ The sql parameter is a SQL statement which produces the
+ source set of data. The SQL statement must return one
+ row_name column, one category column,
+ and one value column. N is an
+ obsolete argument and is ignored if supplied (formerly it had to match
+ the number of category value columns, but that is now determined by the
+ calling query).
+
+
+
+
+ For example, the provided sql must produce a set something like:
+
+
+ row_name cat value
+----------+-------+-------
+ row1 cat1 val1
+ row1 cat2 val2
+ row1 cat3 val3
+ row1 cat4 val4
+ row2 cat1 val5
+ row2 cat2 val6
+ row2 cat3 val7
+ row2 cat4 val8
+
+
+ Returns a SETOF RECORD, which must be defined with a
+ column definition in the FROM clause of the SELECT statement, e.g.:
+
+
+ SELECT *
+ FROM crosstab(sql) AS ct(row_name text, category_1 text, category_2 text);
+
+
+ The example crosstab function produces a set something like:
+
+
+ <== values columns ==>
+row_name category_1 category_2
+ ---------+------------+------------
+ row1 val1 val2
+ row2 val5 val6
+
+
+ Note that it follows these rules:
+
+
+ The sql result must be ordered by 1,2.
+
+
+ The number of values columns is determined by the column definition
+ provided in the FROM clause. The FROM clause must define one
+ row_name column (of the same datatype as the first result column
+ of the sql query) followed by N category columns (of the same
+ datatype as the third result column of the sql query). You can
+ set up as many category columns as you wish.
+
+
+
+
+ Missing values (i.e. not enough adjacent rows of same row_name to
+ fill the number of result values columns) are filled in with nulls.
+
+
+
+
+ Extra values (i.e. too many adjacent rows of same row_name to fill
+ the number of result values columns) are skipped.
+
+
+
+
+ Rows with all nulls in the values columns are skipped.
+
+
+
+
+ You can avoid always having to write out a FROM clause that defines the
+ output columns by setting up a custom crosstab function that has
+ the desired output row type wired into its definition.
+
+
+
+
+ There are two ways you can set up a custom crosstab function:
+
+
+
+
+ Create a composite type to define your return type, similar to the
+ examples in the installation script.
Then define a unique function + name accepting one text parameter and returning setof your_type_name. + For example, if your source data produces row_names that are TEXT, + and values that are FLOAT8, and you want 5 category columns: + + + CREATE TYPE my_crosstab_float8_5_cols AS ( + row_name TEXT, + category_1 FLOAT8, + category_2 FLOAT8, + category_3 FLOAT8, + category_4 FLOAT8, + category_5 FLOAT8 + ); + + CREATE OR REPLACE FUNCTION crosstab_float8_5_cols(text) + RETURNS setof my_crosstab_float8_5_cols + AS '$libdir/tablefunc','crosstab' LANGUAGE C STABLE STRICT; + + + + + Use OUT parameters to define the return type implicitly. + The same example could also be done this way: + + + CREATE OR REPLACE FUNCTION crosstab_float8_5_cols(IN text, + OUT row_name TEXT, + OUT category_1 FLOAT8, + OUT category_2 FLOAT8, + OUT category_3 FLOAT8, + OUT category_4 FLOAT8, + OUT category_5 FLOAT8) + RETURNS setof record + AS '$libdir/tablefunc','crosstab' LANGUAGE C STABLE STRICT; + + + + + Example: + + +CREATE TABLE ct(id SERIAL, rowclass TEXT, rowid TEXT, attribute TEXT, value TEXT); +INSERT INTO ct(rowclass, rowid, attribute, value) VALUES('group1','test1','att1','val1'); +INSERT INTO ct(rowclass, rowid, attribute, value) VALUES('group1','test1','att2','val2'); +INSERT INTO ct(rowclass, rowid, attribute, value) VALUES('group1','test1','att3','val3'); +INSERT INTO ct(rowclass, rowid, attribute, value) VALUES('group1','test1','att4','val4'); +INSERT INTO ct(rowclass, rowid, attribute, value) VALUES('group1','test2','att1','val5'); +INSERT INTO ct(rowclass, rowid, attribute, value) VALUES('group1','test2','att2','val6'); +INSERT INTO ct(rowclass, rowid, attribute, value) VALUES('group1','test2','att3','val7'); +INSERT INTO ct(rowclass, rowid, attribute, value) VALUES('group1','test2','att4','val8'); + +SELECT * +FROM crosstab( + 'select rowid, attribute, value + from ct + where rowclass = ''group1'' + and (attribute = ''att2'' or attribute = ''att3'') order by 1,2;', 3) +AS ct(row_name text, category_1 text, category_2 text, category_3 text); + + row_name | category_1 | category_2 | category_3 +----------+------------+------------+------------ + test1 | val2 | val3 | + test2 | val6 | val7 | +(2 rows) + + + + + + <literal>crosstab(text, text)</literal> + +crosstab(text source_sql, text category_sql) + + + + Where source_sql is a SQL statement which produces the + source set of data. The SQL statement must return one + row_name column, one category column, + and one value column. It may also have one or more + extra columns. + + + The row_name column must be first. The + category and value columns must be + the last two columns, in that order. extra columns must + be columns 2 through (N - 2), where N is the total number of columns. + + + The extra columns are assumed to be the same for all + rows with the same row_name. The values returned are + copied from the first row with a given row_name and + subsequent values of these columns are ignored until + row_name changes. + + + e.g. source_sql must produce a set something like: + + + SELECT row_name, extra_col, cat, value FROM foo; + + row_name extra_col cat value + ----------+------------+-----+--------- + row1 extra1 cat1 val1 + row1 extra1 cat2 val2 + row1 extra1 cat4 val4 + row2 extra2 cat1 val5 + row2 extra2 cat2 val6 + row2 extra2 cat3 val7 + row2 extra2 cat4 val8 + + + + category_sql has to be a SQL statement which produces + the distinct set of categories. The SQL statement must return one category + column only. 
category_sql must produce at least one
+ result row, or an error will be generated. It must also not produce
+ duplicate categories, or an error will be generated. e.g.:
+
+SELECT DISTINCT cat FROM foo;
+ cat
+ -------
+ cat1
+ cat2
+ cat3
+ cat4
+
+
+ The function returns SETOF RECORD, which must be defined
+ with a column definition in the FROM clause of the SELECT statement, e.g.:
+
+
+ SELECT * FROM crosstab(source_sql, cat_sql)
+ AS ct(row_name text, extra text, cat1 text, cat2 text, cat3 text, cat4 text);
+
+
+ The example crosstab function produces a set something like:
+
+
+ <== values columns ==>
+ row_name extra cat1 cat2 cat3 cat4
+ ---------+-------+------+------+------+------
+ row1 extra1 val1 val2 val4
+ row2 extra2 val5 val6 val7 val8
+
+
+ Note that it follows these rules:
+
+
+ source_sql must be ordered by row_name (column 1).
+
+
+ The number of values columns is determined at run-time. The
+ column definition provided in the FROM clause must provide for
+ the correct number of columns of the proper data types.
+
+
+
+
+ Missing values (i.e. not enough adjacent rows of same row_name to
+ fill the number of result values columns) are filled in with nulls.
+
+
+
+
+ Extra values (i.e. source rows with category not found in category_sql
+ result) are skipped.
+
+
+
+
+ Rows with a null row_name column are skipped.
+
+
+
+
+ You can create predefined functions to avoid having to write out
+ the result column names/types in each query. See the examples
+ for crosstab(text).
+
+
+
+
+CREATE TABLE cth(id serial, rowid text, rowdt timestamp, attribute text, val text);
+INSERT INTO cth VALUES(DEFAULT,'test1','01 March 2003','temperature','42');
+INSERT INTO cth VALUES(DEFAULT,'test1','01 March 2003','test_result','PASS');
+INSERT INTO cth VALUES(DEFAULT,'test1','01 March 2003','volts','2.6987');
+INSERT INTO cth VALUES(DEFAULT,'test2','02 March 2003','temperature','53');
+INSERT INTO cth VALUES(DEFAULT,'test2','02 March 2003','test_result','FAIL');
+INSERT INTO cth VALUES(DEFAULT,'test2','02 March 2003','test_startdate','01 March 2003');
+INSERT INTO cth VALUES(DEFAULT,'test2','02 March 2003','volts','3.1234');
+
+SELECT * FROM crosstab
+(
+ 'SELECT rowid, rowdt, attribute, val FROM cth ORDER BY 1',
+ 'SELECT DISTINCT attribute FROM cth ORDER BY 1'
+)
+AS
+(
+ rowid text,
+ rowdt timestamp,
+ temperature int4,
+ test_result text,
+ test_startdate timestamp,
+ volts float8
+);
+ rowid | rowdt | temperature | test_result | test_startdate | volts
+-------+--------------------------+-------------+-------------+--------------------------+--------
+ test1 | Sat Mar 01 00:00:00 2003 | 42 | PASS | | 2.6987
+ test2 | Sun Mar 02 00:00:00 2003 | 53 | FAIL | Sat Mar 01 00:00:00 2003 | 3.1234
+(2 rows)
+
+
+
+ <literal>connectby(text, text, text[, text], text, text, int[, text])</literal>
+
+
+connectby(text relname, text keyid_fld, text parent_keyid_fld
+ [, text orderby_fld], text start_with, int max_depth
+ [, text branch_delim])
+
+
+ <literal>connectby</literal> parameters
+
+
+
+ Parameter
+ Description
+
+
+
+
+ relname
+ Name of the source relation
+
+
+ keyid_fld
+ Name of the key field
+
+
+ parent_keyid_fld
+ Name of the parent-key field
+
+
+ orderby_fld
+
+ If optional ordering of siblings is desired: name of the field by
+ which to order siblings
+
+
+
+ start_with
+
+ Root value of the tree, input as a text value regardless of the
+ data type of keyid_fld
+
+
+
+ max_depth
+
+ Zero (0) for unlimited depth, otherwise restrict level to this depth
+
+
+
+ branch_delim
+
+ If an optional branch value is desired, this string is used as the delimiter.
+ When not provided, a default value of '~' is used for internal
+ recursion detection only, and no "branch" field is returned.
+
+
+
+
+
+
+ The function returns SETOF RECORD, which must be defined
+ with a column definition in the FROM clause of the SELECT statement, e.g.:
+
+
+ SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'row2', 0, '~')
+ AS t(keyid text, parent_keyid text, level int, branch text);
+
+
+ or
+
+
+ SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'row2', 0)
+ AS t(keyid text, parent_keyid text, level int);
+
+
+ or
+
+
+ SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2', 0, '~')
+ AS t(keyid text, parent_keyid text, level int, branch text, pos int);
+
+
+ or
+
+
+ SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2', 0)
+ AS t(keyid text, parent_keyid text, level int, pos int);
+
+
+ Note that it follows these rules:
+
+
+ keyid and parent_keyid must be the same data type
+
+
+ The column definition *must* include a third column of type INT4 for
+ the level value output
+
+
+
+
+ If the branch field is not desired, omit both the branch_delim input
+ parameter *and* the branch field in the query column definition. Note
+ that when branch_delim is not provided, a default value of '~' is used
+ for branch_delim for internal recursion detection, even though the branch
+ field is not returned.
+
+
+
+
+ If the branch field is desired, it must be the fourth column in the query
+ column definition, and it must be type TEXT.
+
+
+
+
+ The parameters representing table and field names must include double
+ quotes if the names are mixed-case or contain special characters.
+
+
+
+
+ If sorting of siblings is desired, the orderby_fld input parameter *and*
+ a name for the resulting serial field (type INT4) in the query column
+ definition must be given.
+
+
+
+
+ Example:
+
+
+CREATE TABLE connectby_tree(keyid text, parent_keyid text, pos int);
+
+INSERT INTO connectby_tree VALUES('row1',NULL, 0);
+INSERT INTO connectby_tree VALUES('row2','row1', 0);
+INSERT INTO connectby_tree VALUES('row3','row1', 0);
+INSERT INTO connectby_tree VALUES('row4','row2', 1);
+INSERT INTO connectby_tree VALUES('row5','row2', 0);
+INSERT INTO connectby_tree VALUES('row6','row4', 0);
+INSERT INTO connectby_tree VALUES('row7','row3', 0);
+INSERT INTO connectby_tree VALUES('row8','row6', 0);
+INSERT INTO connectby_tree VALUES('row9','row5', 0);
+
+-- with branch, without orderby_fld
+SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'row2', 0, '~')
+ AS t(keyid text, parent_keyid text, level int, branch text);
+ keyid | parent_keyid | level | branch
+-------+--------------+-------+---------------------
+ row2 | | 0 | row2
+ row4 | row2 | 1 | row2~row4
+ row6 | row4 | 2 | row2~row4~row6
+ row8 | row6 | 3 | row2~row4~row6~row8
+ row5 | row2 | 1 | row2~row5
+ row9 | row5 | 2 | row2~row5~row9
+(6 rows)
+
+-- without branch, without orderby_fld
+SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'row2', 0)
+ AS t(keyid text, parent_keyid text, level int);
+ keyid | parent_keyid | level
+-------+--------------+-------
+ row2 | | 0
+ row4 | row2 | 1
+ row6 | row4 | 2
+ row8 | row6 | 3
+ row5 | row2 | 1
+ row9 | row5 | 2
+(6 rows)
+
+-- with branch, with orderby_fld (notice that row5 comes before row4)
+SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2', 0, '~')
+ AS t(keyid text, parent_keyid text, level int, branch text, pos int) ORDER BY t.pos;
+ keyid | parent_keyid | level | branch | pos
+-------+--------------+-------+---------------------+-----
+ row2 | | 0 | row2 | 1
+ row5 | row2 | 1
| row2~row5 | 2 + row9 | row5 | 2 | row2~row5~row9 | 3 + row4 | row2 | 1 | row2~row4 | 4 + row6 | row4 | 2 | row2~row4~row6 | 5 + row8 | row6 | 3 | row2~row4~row6~row8 | 6 +(6 rows) + +-- without branch, with orderby_fld (notice that row5 comes before row4) +SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2', 0) + AS t(keyid text, parent_keyid text, level int, pos int) ORDER BY t.pos; + keyid | parent_keyid | level | pos +-------+--------------+-------+----- + row2 | | 0 | 1 + row5 | row2 | 1 | 2 + row9 | row5 | 2 | 3 + row4 | row2 | 1 | 4 + row6 | row4 | 2 | 5 + row8 | row6 | 3 | 6 +(6 rows) + +
+
+ + Author + + Joe Conway + + +
+ diff --git a/doc/src/sgml/trgm.sgml b/doc/src/sgml/trgm.sgml new file mode 100644 index 0000000000000000000000000000000000000000..62a5e30382fb63f9e84648457fcbef5347f5accc --- /dev/null +++ b/doc/src/sgml/trgm.sgml @@ -0,0 +1,214 @@
+
+ pg_trgm
+
+
+ pgtrgm
+
+
+
+ The pg_trgm module provides functions and index operator classes
+ for determining the similarity of text based on trigram matching.
+
+
+
+ Trigram (or Trigraph)
+
+ A trigram is a group of three consecutive characters taken
+ from a string. A string is considered to have two spaces
+ prefixed and one space suffixed when determining the set
+ of trigrams that comprise the string.
+
+
+ e.g. the set of trigrams in the word "cat" is "  c", " ca",
+ "at " and "cat".
+
+
+
+
+ Public Functions
+
+ <literal>pg_trgm</literal> functions
+
+
+
+ Function
+ Description
+
+
+
+
+ real similarity(text, text)
+
+
+ Returns a number that indicates how similar the two
+ arguments are. A zero result indicates that the two words
+ are completely dissimilar, and a result of one indicates that
+ the two words are identical.
+
+
+
+
+ real show_limit()
+
+
+ Returns the current similarity threshold used by the '%'
+ operator. This threshold is the minimum similarity two words
+ must share in order to be considered similar enough to
+ be misspellings of each other, for example.
+
+
+
+
+ real set_limit(real)
+
+
+ Sets the current similarity threshold that is used by the '%'
+ operator, and is returned by the show_limit() function.
+
+
+
+
+ text[] show_trgm(text)
+
+
+ Returns an array of all the trigrams of the supplied text
+ parameter.
+
+
+
+
+ Operator: text % text (returns boolean)
+
+
+ The '%' operator returns TRUE if its two arguments have a similarity
+ that is greater than the similarity threshold set by set_limit(). It
+ will return FALSE if the similarity is less than the current
+ threshold.
+
+
+
+
+
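+ For illustration, the functions can be exercised like this (a minimal
+ sketch; the exact similarity values returned depend on the implementation):
+
+ SELECT show_trgm('cat'); -- trigrams of 'cat', e.g. {"  c"," ca","at ",cat}
+ SELECT similarity('word', 'words'); -- a value between 0 and 1
+ SELECT set_limit(0.5); -- raise the '%' threshold to 0.5
+ SELECT 'word' % 'words'; -- true only if similarity exceeds the threshold
+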
+
+
+
+ Public Index Operator Class
+
+ The pg_trgm module comes with the
+ gist_trgm_ops index operator class that allows a
+ developer to create an index over a text column for the purpose
+ of very fast similarity searches.
+
+
+ To use this index, the '%' operator must be used and an appropriate
+ similarity threshold for the application must be set. Example:
+
+
+CREATE TABLE test_trgm (t text);
+CREATE INDEX trgm_idx ON test_trgm USING gist (t gist_trgm_ops);
+
+
+ At this point, you will have an index on the t text column that you
+ can use for similarity searching. Example:
+
+
+SELECT
+ t,
+ similarity(t, 'word') AS sml
+FROM
+ test_trgm
+WHERE
+ t % 'word'
+ORDER BY
+ sml DESC, t;
+
+
+ This will return all values in the text column that are sufficiently
+ similar to 'word', sorted from best match to worst. The index will
+ be used to make this a fast operation over very large data sets.
+
+
+
+
+ Tsearch2 Integration
+
+ Trigram matching is a very useful tool when used in conjunction
+ with a text index created by the Tsearch2 contrib module. (See
+ contrib/tsearch2)
+
+
+ The first step is to generate an auxiliary table containing all
+ the unique words in the Tsearch2 index:
+
+
+CREATE TABLE words AS SELECT word FROM
+ stat('SELECT to_tsvector(''simple'', bodytext) FROM documents');
+
+
+ Here 'documents' is a table that has a text field 'bodytext'
+ that is searched with Tsearch2. The 'simple' dictionary is used
+ with the to_tsvector function, instead of the already existing
+ vector, to avoid creating a list of already stemmed
+ words. This way, only the original, unstemmed words are added
+ to the word list.
+
+
+ Next, create a trigram index on the word column:
+
+
+CREATE INDEX words_idx ON words USING gist(word gist_trgm_ops);
+
+
+ or
+
+
+CREATE INDEX words_idx ON words USING gin(word gin_trgm_ops);
+
+
+ Now, a SELECT query similar to the example above can be
+ used to suggest spellings for misspelled words in user search terms. A
+ useful extra clause is to ensure that the similar words are also
+ of similar length to the misspelled word.
+
+
+
+
+ Since the 'words' table has been generated as a separate,
+ static table, it will need to be periodically regenerated so that
+ it remains up to date with the word list in the Tsearch2 index.
+
+
+
+
+
+
+ References
+
+ Tsearch2 Development Site
+
+
+
+ GiST Development Site
+
+
+
+
+
+ Authors
+
+ Oleg Bartunov oleg@sai.msu.su, Moscow, Moscow University, Russia
+
+
+ Teodor Sigaev teodor@sigaev.ru, Moscow, Delta-Soft Ltd., Russia
+
+
+ Documentation: Christopher Kings-Lynne
+
+
+ This module is sponsored by Delta-Soft Ltd., Moscow, Russia.
+
+
+
+ diff --git a/doc/src/sgml/uuid-ossp.sgml b/doc/src/sgml/uuid-ossp.sgml new file mode 100644 index 0000000000000000000000000000000000000000..93e6c0faeaca5fa6062e1736a032468732861ece --- /dev/null +++ b/doc/src/sgml/uuid-ossp.sgml @@ -0,0 +1,163 @@ + + + uuid-ossp + + + uuid-ossp + + + + This module provides functions to generate universally unique + identifiers (UUIDs) using one of the several standard algorithms, as + well as functions to produce certain special UUID constants. + + + + UUID Generation + + The relevant standards ITU-T Rec. X.667, ISO/IEC 9834-8:2005, and RFC + 4122 specify four algorithms for generating UUIDs, identified by the + version numbers 1, 3, 4, and 5. (There is no version 2 algorithm.) + Each of these algorithms could be suitable for a different set of + applications. + + + + <literal>uuid-ossp</literal> functions + + + + Function + Description + + + + + uuid_generate_v1() + + + This function generates a version 1 UUID. This involves the MAC + address of the computer and a time stamp. Note that UUIDs of this + kind reveal the identity of the computer that created the identifier + and the time at which it did so, which might make it unsuitable for + certain security-sensitive applications. + + + + + uuid_generate_v1mc() + + + This function generates a version 1 UUID but uses a random multicast + MAC address instead of the real MAC address of the computer. + + + + + uuid_generate_v3(namespace uuid, name text) + + + This function generates a version 3 UUID in the given namespace using + the specified input name. The namespace should be one of the special + constants produced by the uuid_ns_*() functions shown below. (It + could be any UUID in theory.) The name is an identifier in the + selected namespace. For example: + + + + + uuid_generate_v3(uuid_ns_url(), 'http://www.postgresql.org') + + + The name parameter will be MD5-hashed, so the cleartext cannot be + derived from the generated UUID. + + + The generation of UUIDs by this method has no random or + environment-dependent element and is therefore reproducible. + + + + + uuid_generate_v4() + + + This function generates a version 4 UUID, which is derived entirely + from random numbers. + + + + + uuid_generate_v5(namespace uuid, name text) + + + This function generates a version 5 UUID, which works like a version 3 + UUID except that SHA-1 is used as a hashing method. Version 5 should + be preferred over version 3 because SHA-1 is thought to be more secure + than MD5. + + + + + +
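+ For example, the generator functions can be called directly in a query.
+ A minimal sketch (each call returns a single uuid value; the actual
+ values differ on every call, except for the name-based versions, which
+ are reproducible):
+
+ SELECT uuid_generate_v1();
+ SELECT uuid_generate_v4();
+ SELECT uuid_generate_v3(uuid_ns_url(), 'http://www.postgresql.org');
+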
+ + + UUID Constants + + + + uuid_nil() + + + A "nil" UUID constant, which does not occur as a real UUID. + + + + + uuid_ns_dns() + + + Constant designating the DNS namespace for UUIDs. + + + + + uuid_ns_url() + + + Constant designating the URL namespace for UUIDs. + + + + + uuid_ns_oid() + + + Constant designating the ISO object identifier (OID) namespace for + UUIDs. (This pertains to ASN.1 OIDs, unrelated to the OIDs used in + PostgreSQL.) + + + + + uuid_ns_x500() + + + Constant designating the X.500 distinguished name (DN) namespace for + UUIDs. + + + + + +
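+ Combining these constants with the name-based generators yields
+ reproducible identifiers; for example (a sketch; the result is stable
+ for a given namespace and name):
+
+ SELECT uuid_generate_v5(uuid_ns_dns(), 'www.example.com');
+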
+
+ + Author + + Peter Eisentraut peter_e@gmx.net + + +
+ diff --git a/doc/src/sgml/vacuumlo.sgml b/doc/src/sgml/vacuumlo.sgml new file mode 100644 index 0000000000000000000000000000000000000000..28219d2b257cf8443a4069a380c6f12944857573 --- /dev/null +++ b/doc/src/sgml/vacuumlo.sgml @@ -0,0 +1,74 @@
+
+ vacuumlo
+
+
+ vacuumlo
+
+
+
+ This is a simple utility that will remove any orphaned large objects from a
+ PostgreSQL database. An orphaned LO is considered to be any LO whose OID
+ does not appear in any OID data column of the database.
+
+
+ If you use this, you may also be interested in the lo_manage trigger in
+ contrib/lo. lo_manage is useful to try to avoid creating orphaned LOs
+ in the first place.
+
+
+
+
+ It was decided to place this in contrib as it needs further testing, but hopefully
+ this (or a variant of it) will make it into the backend as a "vacuum lo"
+ command in a later release.
+
+
+
+
+
+ Usage
+
+vacuumlo [options] database [database2 ... databasen]
+
+
+ All databases named on the command line are processed. Available options
+ include:
+
+
+-v Write a lot of progress messages
+-n Don't remove large objects, just show what would be done
+-U username Username to connect as
+-W Prompt for password
+-h hostname Database server host
+-p port Database server port
+
+
+
+
+ Method
+
+ First, it builds a temporary table which contains all of the OIDs of the
+ large objects in that database.
+
+
+ It then scans through all columns in the database that are of type "oid"
+ or "lo", and removes matching entries from the temporary table.
+
+
+ The remaining entries in the temp table identify orphaned LOs. These are
+ removed.
+
+
+
+
+ Author
+
+ Peter Mount peter@retep.org.uk
+
+
+
+
+
+
+
+ diff --git a/doc/src/sgml/xml2.sgml b/doc/src/sgml/xml2.sgml new file mode 100644 index 0000000000000000000000000000000000000000..d73789a155d193e39e1317642e4ca07ffdf4e46b --- /dev/null +++ b/doc/src/sgml/xml2.sgml @@ -0,0 +1,436 @@
+
+ xml2: XML-handling functions
+
+
+ xml2
+
+
+
+ Deprecation notice
+
+ From PostgreSQL 8.3 on, there is XML-related
+ functionality based on the SQL/XML standard in the core server.
+ That functionality covers XML syntax checking and XPath queries,
+ which is what this module does as well, and more, but the API is
+ not at all compatible. It is planned that this module will be
+ removed in PostgreSQL 8.4 in favor of the newer standard API, so
+ you are encouraged to try converting your applications. If you
+ find that some of the functionality of this module is not
+ available in an adequate form with the newer API, please explain
+ your issue to pgsql-hackers@postgresql.org so that the deficiency
+ can be addressed.
+
+
+
+
+ Description of functions
+
+ The first set of functions performs straightforward XML parsing and
+ XPath queries:
+
+
+
+ Functions
+
+
+
+
+
+ xml_is_well_formed(document) RETURNS bool
+
+
+
+
+ This parses the document text in its parameter and returns true if the
+ document is well-formed XML. (Note: before PostgreSQL 8.2, this function
+ was called xml_valid(). That is the wrong name since validity and
+ well-formedness have different meanings in XML. The old name is still
+ available, but is deprecated and will be removed in 8.3.)
+
+
+
+
+
+
+ xpath_string(document,query) RETURNS text
+ xpath_number(document,query) RETURNS float4
+ xpath_bool(document,query) RETURNS bool
+
+
+
+
+ These functions evaluate the XPath query on the supplied document, and
+ cast the result to the specified type.
+
+
+
+
+
+ xpath_nodeset(document,query,toptag,itemtag) RETURNS text
+
+
+
+
+ This evaluates query on document and wraps the result in XML tags. If
+ the result is multivalued, the output will look like:
+
+
+ <toptag>
+ <itemtag>Value 1 which could be an XML fragment</itemtag>
+ <itemtag>Value 2....</itemtag>
+ </toptag>
+
+
+ If either toptag or itemtag is an empty string, the relevant tag is omitted.
+
+
+
+
+
+
+ xpath_nodeset(document,query) RETURNS text
+
+
+
+
+ Like xpath_nodeset(document,query,toptag,itemtag) but omits both tags.
+
+
+
+
+
+
+ xpath_nodeset(document,query,itemtag) RETURNS text
+
+
+
+
+ Like xpath_nodeset(document,query,toptag,itemtag) but omits toptag.
+
+
+
+
+
+
+ xpath_list(document,query,separator) RETURNS text
+
+
+
+
+ This function returns multiple values separated by the specified
+ separator, e.g. Value 1,Value 2,Value 3 if separator=','.
+
+
+
+
+
+
+ xpath_list(document,query) RETURNS text
+
+
+
+ This is a wrapper for the above function that uses ',' as the separator.
+
+
+
+
+
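+ As a quick illustration of these functions, here they are applied to a
+ small literal document (a sketch only; the document values are invented
+ for this example):
+
+ SELECT xpath_string('<doc><author>John</author></doc>', '/doc/author/text()');
+ SELECT xpath_bool('<doc><author>John</author></doc>', 'count(/doc/author) > 0');
+ SELECT xpath_list('<doc><a>1</a><a>2</a></doc>', '/doc/a/text()');
+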
+ + + + <literal>xpath_table</literal> + + This is a table function which evaluates a set of XPath queries on + each of a set of documents and returns the results as a table. The + primary key field from the original document table is returned as the + first column of the result so that the resultset from xpath_table can + be readily used in joins. + + + The function itself takes 5 arguments, all text. + + + xpath_table(key,document,relation,xpaths,criteria) + + + Parameters + + + + key + + + the name of the "key" field - this is just a field to be used as + the first column of the output table i.e. it identifies the record from + which each output row came (see note below about multiple values). + + + + + document + + + the name of the field containing the XML document + + + + + relation + + + the name of the table or view containing the documents + + + + + xpaths + + + multiple xpath expressions separated by | + + + + + criteria + + + The contents of the where clause. This needs to be specified, + so use "true" or "1=1" here if you want to process all the rows in the + relation. + + + + + +
+ + + NB These parameters (except the XPath strings) are just substituted + into a plain SQL SELECT statement, so you have some flexibility - the + statement is + + + + + SELECT <key>,<document> FROM <relation> WHERE <criteria> + + + + + so those parameters can be *anything* valid in those particular + locations. The result from this SELECT needs to return exactly two + columns (which it will unless you try to list multiple fields for key + or document). Beware that this simplistic approach requires that you + validate any user-supplied values to avoid SQL injection attacks. + + + + Using the function + + + + The function has to be used in a FROM expression. This gives the following + form: + + + +SELECT * FROM +xpath_table('article_id', + 'article_xml', + 'articles', + '/article/author|/article/pages|/article/title', + 'date_entered > ''2003-01-01'' ') +AS t(article_id integer, author text, page_count integer, title text); + + + + The AS clause defines the names and types of the columns in the + virtual table. If there are more XPath queries than result columns, + the extra queries will be ignored. If there are more result columns + than XPath queries, the extra columns will be NULL. + + + + Note that I've said in this example that pages is an integer. The + function deals internally with string representations, so when you say + you want an integer in the output, it will take the string + representation of the XPath result and use PostgreSQL input functions + to transform it into an integer (or whatever type the AS clause + requests). An error will result if it can't do this - for example if + the result is empty - so you may wish to just stick to 'text' as the + column type if you think your data has any problems. + + + The select statement doesn't need to use * alone - it can reference the + columns by name or join them to other tables. The function produces a + virtual table with which you can perform any operation you wish (e.g. + aggregation, joining, sorting etc). So we could also have: + + + +SELECT t.title, p.fullname, p.email +FROM xpath_table('article_id','article_xml','articles', + '/article/title|/article/author/@id', + 'xpath_string(article_xml,''/article/@date'') > ''2003-03-20'' ') + AS t(article_id integer, title text, author_id integer), + tblPeopleInfo AS p +WHERE t.author_id = p.person_id; + + + + as a more complicated example. Of course, you could wrap all + of this in a view for convenience. + + + Multivalued results + + The xpath_table function assumes that the results of each XPath query + might be multi-valued, so the number of rows returned by the function + may not be the same as the number of input documents. The first row + returned contains the first result from each query, the second row the + second result from each query. If one of the queries has fewer values + than the others, NULLs will be returned instead. + + + In some cases, a user will know that a given XPath query will return + only a single result (perhaps a unique document identifier) - if used + alongside an XPath query returning multiple results, the single-valued + result will appear only on the first row of the result. The solution + to this is to use the key field as part of a join against a simpler + XPath query. 
As an example: + + + + + CREATE TABLE test + ( + id int4 NOT NULL, + xml text, + CONSTRAINT pk PRIMARY KEY (id) + ) + WITHOUT OIDS; + + INSERT INTO test VALUES (1, '<doc num="C1"> + <line num="L1"><a>1</a><b>2</b><c>3</c></line> + <line num="L2"><a>11</a><b>22</b><c>33</c></line> + </doc>'); + + INSERT INTO test VALUES (2, '<doc num="C2"> + <line num="L1"><a>111</a><b>222</b><c>333</c></line> + <line num="L2"><a>111</a><b>222</b><c>333</c></line> + </doc>'); + + + + + + The query + + + SELECT * FROM xpath_table('id','xml','test', + '/doc/@num|/doc/line/@num|/doc/line/a|/doc/line/b|/doc/line/c','1=1') + AS t(id int4, doc_num varchar(10), line_num varchar(10), val1 int4, + val2 int4, val3 int4) + WHERE id = 1 ORDER BY doc_num, line_num + + + + Gives the result: + + + + id | doc_num | line_num | val1 | val2 | val3 + ----+---------+----------+------+------+------ + 1 | C1 | L1 | 1 | 2 | 3 + 1 | | L2 | 11 | 22 | 33 + + + + To get doc_num on every line, the solution is to use two invocations + of xpath_table and join the results: + + + + SELECT t.*,i.doc_num FROM + xpath_table('id','xml','test', + '/doc/line/@num|/doc/line/a|/doc/line/b|/doc/line/c','1=1') + AS t(id int4, line_num varchar(10), val1 int4, val2 int4, val3 int4), + xpath_table('id','xml','test','/doc/@num','1=1') + AS i(id int4, doc_num varchar(10)) + WHERE i.id=t.id AND i.id=1 + ORDER BY doc_num, line_num; + + + + which gives the desired result: + + + + id | line_num | val1 | val2 | val3 | doc_num + ----+----------+------+------+------+--------- + 1 | L1 | 1 | 2 | 3 | C1 + 1 | L2 | 11 | 22 | 33 | C1 + (2 rows) + + +
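+ As suggested earlier, a query like this can be wrapped in a view for
+ convenience (a sketch; the view name is arbitrary and the query is the
+ same two-invocation join shown above):
+
+ CREATE VIEW test_doc_lines AS
+ SELECT t.*,i.doc_num FROM
+ xpath_table('id','xml','test',
+ '/doc/line/@num|/doc/line/a|/doc/line/b|/doc/line/c','1=1')
+ AS t(id int4, line_num varchar(10), val1 int4, val2 int4, val3 int4),
+ xpath_table('id','xml','test','/doc/@num','1=1')
+ AS i(id int4, doc_num varchar(10))
+ WHERE i.id=t.id;
+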
+
+
+
+
+ XSLT functions
+
+ The following functions are available if libxslt is installed (this is
+ not currently detected automatically, so you will have to amend the
+ Makefile).
+
+
+
+ <literal>xslt_process</literal>
+
+ xslt_process(document,stylesheet,paramlist) RETURNS text
+
+
+
+ This function applies the XSL stylesheet to the document and returns
+ the transformed result. The paramlist is a list of parameter
+ assignments to be used in the transformation, specified in the form
+ 'a=1,b=2'. Note that this is also proof-of-concept code and the
+ parameter parsing is very simple-minded (e.g. parameter values cannot
+ contain commas!).
+
+
+ Also note that if either the document or stylesheet values do not
+ begin with a < then they will be treated as URLs and libxslt will
+ fetch them. It thus follows that you can use xslt_process as a means
+ to fetch the contents of URLs - you should be aware of the security
+ implications of this.
+
+
+ There is also a two-parameter version of xslt_process which does not
+ pass any parameters to the transformation.
+
+
+
+
+
+ Credits
+
+ Development of this module was sponsored by Torchbox Ltd. (www.torchbox.com).
+ It has the same BSD licence as PostgreSQL.
+
+
+ This version of the XML functions provides both XPath querying and
+ XSLT functionality. There is also a new table function which allows
+ the straightforward return of multiple XML results. Note that the current code
+ doesn't take any particular care over character sets - this is
+ something that should be fixed at some point!
+
+
+ If you have any comments or suggestions, please do contact me at
+ jgray@azuli.co.uk. Unfortunately, this isn't my main job, so
+ I can't guarantee a rapid response to your query!
+
+
+