Commit c3c69ab4, authored by Bruce Momjian

Move most /contrib README files into SGML. Some still need conversion
or will never be converted.

Parent: 6e414a17
PostgreSQL Administration Functions
This directory is a PostgreSQL 'contrib' module which implements a number of
support functions which pgAdmin and other administration and management tools
can use to provide additional functionality if installed on a server.
This module is normally distributed as a PostgreSQL 'contrib' module. To
install it from a pre-configured source tree run the following commands
as a user with appropriate privileges from the adminpack source directory:
make install
Alternatively, if you have a PostgreSQL 8.2 or higher installation but no
source tree, you can install using PGXS. Simply run the following commands in
the adminpack source directory:
make USE_PGXS=1
make USE_PGXS=1 install
pgAdmin will look for the functions in the Maintenance Database (usually
"postgres" for 8.2 servers) specified in the connection dialogue for the server.
To install the functions in the database, either run the adminpack.sql script
using the pgAdmin SQL tool (and then close and reopen the connection to the
freshly instrumented server), or run the script using psql, e.g.:
psql -U postgres postgres < adminpack.sql
Other administration tools that use this module may have different requirements;
please consult the tool's documentation for further details.
Objects implemented (superuser only)
int8 pg_catalog.pg_file_write(fname text, data text, append bool)
bool pg_catalog.pg_file_rename(oldname text, newname text, archivname text)
bool pg_catalog.pg_file_rename(oldname text, newname text)
bool pg_catalog.pg_file_unlink(fname text)
setof record pg_catalog.pg_logdir_ls()
/* Renaming of existing backend functions for pgAdmin compatibility */
text pg_catalog.pg_file_read(text, bigint, bigint)
bigint pg_catalog.pg_file_length(text)
int4 pg_catalog.pg_logfile_rotate()
This is a B-Tree implementation using GiST that supports the int2, int4,
int8, float4, float8, timestamp with/without time zone, time
with/without time zone, date, interval, oid, money, macaddr, char,
varchar/text, bytea, numeric, bit, varbit and inet/cidr types.
All work was done by Teodor Sigaev (teodor@stack.net), Oleg Bartunov
(oleg@sai.msu.su) and Janko Richter (jankorichter@yahoo.de).
See http://www.sai.msu.su/~megera/postgres/gist for additional
information.
Apr 17, 2004 - Performance optimization
Jan 21, 2004 - Added support for bytea, numeric, bit, varbit, inet/cidr
Jan 17, 2004 - Reorganized code and added support for char, varchar/text
Jan 10, 2004 - btree_gist now supports oid, timestamp with time zone,
time with and without time zone, date, interval,
money, macaddr
Feb 5, 2003 - btree_gist now supports int2, int8, float4, float8
This version will only work with PostgreSQL version 7.4 and above
because of changes in the system catalogs and the function call
interface.
If you want to index varchar attributes, you have to index using
the function text(<varchar>):
CREATE TABLE test ( a varchar(23) );
CREATE INDEX testidx ON test USING GIST ( text(a) );
gmake install
-- load functions
psql <database> < btree_gist.sql
gmake installcheck
create table test (a int4);
-- create index
create index testidx on test using gist (a);
-- query
select * from test where a < 10;
$PostgreSQL: pgsql/contrib/chkpass/README.chkpass,v 1.5 2007/10/01 19:06:48 darcy Exp $
Chkpass is a password type that is automatically checked and converted upon
entry. It is stored encrypted. To compare, simply compare against a clear
text password and the comparison function will encrypt it before comparing.
It also returns an error if the code determines that the password is easily
crackable. This is currently a stub that does nothing.
I haven't worried about making this type indexable. I doubt that anyone
would ever need to sort a file in order of encrypted password.
If you precede the string with a colon, the encryption and checking are
skipped so that you can enter existing passwords into the field.
On output, a colon is prepended. This makes it possible to dump and reload
passwords without re-encrypting them. If you want the password (encrypted)
without the colon then use the raw() function. This allows you to use the
type with things like Apache's Auth_PostgreSQL module.
The encryption uses the standard Unix function crypt(), and so it suffers
from all the usual limitations of that function; notably that only the
first eight characters of a password are considered.
Here is some sample usage:
test=# create table test (p chkpass);
test=# insert into test values ('hello');
test=# select * from test;
(1 row)
test=# select raw(p) from test;
(1 row)
test=# select p = 'hello' from test;
(1 row)
test=# select p = 'goodbye' from test;
(1 row)
D'Arcy J.M. Cain
This directory contains the code for the user-defined type,
CUBE, representing multidimensional cubes.
Makefile      building instructions for the shared library
README.cube   the file you are now reading
cube.c        the implementation of this data type in C
cube.sql.in   SQL code needed to register this type with postgres
              (transformed to cube.sql by make)
cubedata.h    the data structure used to store the cubes
cubeparse.y   the grammar file for the parser (used by cube_in() in cube.c)
cubescan.l    scanner rules (used by cube_yyparse() in cubeparse.y)
To install the type, run
make install
The user running "make install" may need root access, depending on how you
configured the PostgreSQL installation paths.
This only installs the type implementation and documentation. To make the
type available in any particular database, as a postgres superuser do:
psql -d databasename < cube.sql
If you install the type in the template1 database, all subsequently created
databases will inherit it.
To test the new type, after "make install" do
make installcheck
If it fails, examine the file regression.diffs to find out the reason (the
test code is a direct adaptation of the regression tests from the main
source tree).
By default the external functions are made executable by anyone.
The following are valid external representations for the CUBE type:
'x'                       A floating point value representing
                          a one-dimensional point or a zero-length
                          one-dimensional interval
'(x)'                     Same as above
'x1,x2,x3,...,xn'         A point in n-dimensional space,
                          represented internally as a zero volume box
'(x1,x2,x3,...,xn)'       Same as above
'(x),(y)'                 1-D interval starting at x and ending at y
                          or vice versa; the order does not matter
'(x1,...,xn),(y1,...,yn)' n-dimensional box represented by
                          a pair of its opposite corners, no matter which.
                          Functions take care of swapping to achieve
                          "lower left -- upper right" representation
                          before computing any values
rule 1 box -> O_BRACKET paren_list COMMA paren_list C_BRACKET
rule 2 box -> paren_list COMMA paren_list
rule 3 box -> paren_list
rule 4 box -> list
rule 5 paren_list -> O_PAREN list C_PAREN
rule 6 list -> FLOAT
rule 7 list -> list COMMA FLOAT
n [0-9]+
integer [+-]?{n}
real [+-]?({n}\.{n}?|\.{n})
FLOAT ({integer}|{real})([eE]{integer})?
Examples of valid CUBE representations:
'x'                         A floating point value representing
                            a one-dimensional point (or, zero-length
                            one-dimensional interval)
'(x)'                       Same as above
'x1,x2,x3,...,xn'           A point in n-dimensional space,
                            represented internally as a zero volume cube
'(x1,x2,x3,...,xn)'         Same as above
'(x),(y)'                   A 1-D interval starting at x and ending at y
                            or vice versa; the order does not matter
'[(x),(y)]'                 Same as above
'(x1,...,xn),(y1,...,yn)'   An n-dimensional box represented by
                            a pair of its diagonally opposite corners,
                            regardless of order. Swapping is provided
                            by all comparison routines to ensure the
                            "lower left -- upper right" representation
                            before actual comparison takes place.
'[(x1,...,xn),(y1,...,yn)]' Same as above
White space is ignored, so '[(x),(y)]' can be: '[ ( x ), ( y ) ]'
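The grammar and token rules above can be sketched outside the database. The following is a minimal Python model, not the yacc/lex parser in cubeparse.y and cubescan.l; the names parse_cube and _floats are made up for this illustration, and stripping all blanks before matching is a simplification of "white space is ignored".

```python
import re

# Token definitions transcribed from the scanner rules; REAL is tried before
# INTEGER because Python alternation is ordered, unlike lex's longest match.
N = r"[0-9]+"
INTEGER = rf"[+-]?{N}"
REAL = rf"[+-]?(?:{N}\.(?:{N})?|\.{N})"
FLOAT = rf"(?:{REAL}|{INTEGER})(?:[eE]{INTEGER})?"
LIST = rf"{FLOAT}(?:,{FLOAT})*"


def _floats(body):
    """Convert a comma-separated list of FLOAT tokens to a tuple (hypothetical helper)."""
    return tuple(float(tok) for tok in body.split(","))


def parse_cube(text):
    """Return the two corners of a cube literal as tuples of floats."""
    # Simplification: blanks are stripped everywhere, so '1 2' reads as '12'.
    s = re.sub(r"\s+", "", text)
    bracketed = s.startswith("[") and s.endswith("]")
    if bracketed:
        s = s[1:-1]
    pair = re.fullmatch(rf"\(({LIST})\),\(({LIST})\)", s)   # rules 1 and 2
    if pair:
        a, b = _floats(pair.group(1)), _floats(pair.group(2))
        if len(a) != len(b):
            raise ValueError("corners have different dimensions")
        return a, b
    single = re.fullmatch(rf"\(({LIST})\)|({LIST})", s)     # rules 3 and 4
    if single and not bracketed:
        point = _floats(single.group(1) or single.group(2))
        return point, point
    raise ValueError(f"invalid cube literal: {text!r}")
```

A point comes back with both corners equal, which mirrors the "zero volume cube" wording above; the corners are returned in the order written, without swapping.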
I believe this union:
select cube_union('(0,5,2),(2,3,1)','0');
(0, 0, 0),(2, 5, 2)
(1 row)
does not contradict common sense, and neither does the intersection:
select cube_inter('(0,-1),(1,1)','(-2),(2)');
(0, 0),(1, 0)
(1 row)
In all binary operations on differently sized boxes, I assume the smaller
one to be a cartesian projection, i.e., having zeroes in place of coordinates
omitted in the string representation. The above examples are equivalent to:
cube_union('(0,5,2),(2,3,1)','(0,0,0),(0,0,0)');
cube_inter('(0,-1),(1,1)','(-2,0),(2,0)');
The following containment predicate uses the point syntax,
while in fact the second argument is internally represented by a box.
This syntax makes it unnecessary to define the special Point type
and functions for (box,point) predicates.
select cube_contains('(0,0),(1,1)', '0.5,0.5');
(1 row)
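The zero-padding rule for differently sized boxes can be illustrated with a small Python model. This is a sketch of the behavior described above, not the code in cube.c: a cube is represented here as a pair of coordinate tuples, and the helper _normalize is a name invented for this example. The intersection is only meaningful when the two boxes actually overlap.

```python
def _normalize(cube, dim):
    """Zero-pad both corners to `dim` coordinates and order them LL, UR."""
    a, b = (tuple(c) + (0.0,) * (dim - len(c)) for c in cube)
    return tuple(map(min, a, b)), tuple(map(max, a, b))


def _both(c1, c2):
    """Normalize two cubes to their common (larger) dimension."""
    dim = max(len(c1[0]), len(c2[0]))
    return _normalize(c1, dim), _normalize(c2, dim)


def cube_union(c1, c2):
    """Smallest box enclosing both arguments."""
    (ll1, ur1), (ll2, ur2) = _both(c1, c2)
    return tuple(map(min, ll1, ll2)), tuple(map(max, ur1, ur2))


def cube_inter(c1, c2):
    """Box common to both arguments (assumes that they overlap)."""
    (ll1, ur1), (ll2, ur2) = _both(c1, c2)
    return tuple(map(max, ll1, ll2)), tuple(map(min, ur1, ur2))


def cube_contains(c1, c2):
    """True if c2 lies entirely inside c1; a point is a zero-volume box."""
    (ll1, ur1), (ll2, ur2) = _both(c1, c2)
    return all(l1 <= l2 and u2 <= u1
               for l1, u1, l2, u2 in zip(ll1, ur1, ll2, ur2))
```

With this model, cube_union(((0, 5, 2), (2, 3, 1)), ((0,), (0,))) reproduces the (0, 0, 0),(2, 5, 2) result shown earlier, because the one-dimensional '0' is padded to (0, 0, 0).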
Values are stored internally as 64-bit floating point numbers. This means that
numbers with more than about 16 significant digits will be truncated.
The access method for CUBE is a GiST index (gist_cube_ops), which is a
generalization of R-tree. GiSTs allow the postgres implementation of
R-tree, originally encoded to support 2-D geometric types such as
boxes and polygons, to be used with any data type whose data domain
can be partitioned using the concepts of containment, intersection and
equality. In other words, everything that can intersect or contain
its own kind can be indexed with a GiST. That includes, among other
things, all geometric data types, regardless of their dimensionality
(see also contrib/seg).
The operators supported by the GiST access method include:
a = b Same as
The cubes a and b are identical.
a && b Overlaps
The cubes a and b overlap.
a @> b Contains
The cube a contains the cube b.
a <@ b Contained in
The cube a is contained in the cube b.
(Before PostgreSQL 8.2, the containment operators @> and <@ were
respectively called @ and ~. These names are still available, but are
deprecated and will eventually be retired. Notice that the old names
are reversed from the convention formerly followed by the core geometric
data types.)
Although the mnemonics of the following operators are questionable, I
preserved them to maintain visual consistency with other geometric
data types defined in Postgres.
Other operators:
[a, b] < [c, d] Less than
[a, b] > [c, d] Greater than
These operators do not make a lot of sense for any practical
purpose but sorting. These operators first compare (a) to (c),
and if these are equal, compare (b) to (d). That accounts for
reasonably good sorting in most cases, which is useful if
you want to use ORDER BY with this type.
The following functions are available:
cube_distance(cube, cube) returns double
cube_distance returns the distance between two cubes. If both cubes are
points, this is the normal distance function.
cube(float8) returns cube
This makes a one dimensional cube with both coordinates the same.
If the type of the argument is a numeric type other than float8 an
explicit cast to float8 may be needed.
cube(1) == '(1)'
cube(float8, float8) returns cube
This makes a one dimensional cube.
cube(1,2) == '(1),(2)'
cube(float8[]) returns cube
This makes a zero-volume cube using the coordinates defined by the
array.
cube(ARRAY[1,2]) == '(1,2)'
cube(float8[], float8[]) returns cube
This makes a cube, with upper right and lower left coordinates as
defined by the 2 float arrays. Arrays must be of the same length.
cube('{1,2}'::float[], '{3,4}'::float[]) == '(1,2),(3,4)'
cube(cube, float8) returns cube
This builds a new cube by adding a dimension on to an existing cube with
the same values for both parts of the new coordinate. This is useful for
building cubes piece by piece from calculated values.
cube('(1)',2) == '(1,2),(1,2)'
cube(cube, float8, float8) returns cube
This builds a new cube by adding a dimension on to an existing cube.
This is useful for building cubes piece by piece from calculated values.
cube('(1),(2)',3,4) == '(1,3),(2,4)'
cube_dim(cube) returns int
cube_dim returns the number of dimensions stored in the data structure
for a cube. This is useful for constraints on the dimensions of a cube.
cube_ll_coord(cube, int) returns double
cube_ll_coord returns the nth coordinate value for the lower left corner
of a cube. This is useful for doing coordinate transformations.
cube_ur_coord(cube, int) returns double
cube_ur_coord returns the nth coordinate value for the upper right corner
of a cube. This is useful for doing coordinate transformations.
cube_subset(cube, int[]) returns cube
Builds a new cube from an existing cube, using a list of dimension indexes
from an array. Can be used to find both the ll and ur coordinates of a single
dimension, e.g.: cube_subset(cube('(1,3,5),(6,7,8)'), ARRAY[2]) = '(3),(7)'
Or can be used to drop dimensions, or reorder them as desired, e.g.:
cube_subset(cube('(1,3,5),(6,7,8)'), ARRAY[3,2,1,1]) = '(5, 3, 1, 1),(8, 7, 6, 6)'
cube_is_point(cube) returns bool
cube_is_point returns true if a cube is also a point. This is true when the
two defining corners are the same.
cube_enlarge(cube, double, int) returns cube
cube_enlarge increases the size of a cube by a specified radius in at least
n dimensions. If the radius is negative the box is shrunk instead. This
is useful for creating bounding boxes around a point for searching for
nearby points. All defined dimensions are changed by the radius. If n
is greater than the number of defined dimensions and the cube is being
increased (r >= 0) then 0 is used as the base for the extra coordinates.
LL coordinates are decreased by r and UR coordinates are increased by r. If
a LL coordinate is increased to larger than the corresponding UR coordinate
(this can only happen when r < 0) then both coordinates are set to their
average.
To make it harder for people to break things, there is an effective
maximum of 100 on the dimension of cubes. This is set in cubedata.h, in case
you need something bigger.
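The descriptions of cube_subset and cube_enlarge above can be followed step by step in a short Python model. This is only a sketch built from the prose in this README, not a port of cube.c: cubes are pairs of (LL, UR) tuples that are assumed to be already ordered, and the constant name CUBE_MAX_DIM is an assumption standing in for the limit of 100 mentioned above.

```python
CUBE_MAX_DIM = 100  # assumed name for the limit of 100 described above


def cube_subset(cube, indexes):
    """Pick (possibly repeated or reordered) dimensions, counted from 1."""
    ll, ur = cube
    for i in indexes:
        if not 1 <= i <= len(ll):
            raise IndexError(f"dimension index {i} out of bounds")
    return (tuple(ll[i - 1] for i in indexes),
            tuple(ur[i - 1] for i in indexes))


def cube_enlarge(cube, r, n):
    """Grow (r >= 0) or shrink (r < 0) a cube by r in every defined dimension."""
    ll, ur = cube
    n = min(n, CUBE_MAX_DIM)
    new_ll, new_ur = [], []
    for lo, hi in zip(ll, ur):
        lo, hi = lo - r, hi + r
        if lo > hi:                  # only possible when r < 0
            lo = hi = (lo + hi) / 2  # collapse to the average
        new_ll.append(lo)
        new_ur.append(hi)
    if r >= 0:
        # Extra dimensions up to n use 0 as the base coordinate.
        extra = max(0, n - len(ll))
        new_ll += [0 - r] * extra
        new_ur += [0 + r] * extra
    return tuple(new_ll), tuple(new_ur)
```

For example, enlarging the one-dimensional point (1) by 2 in 3 dimensions gives (-1, -2, -2),(3, 2, 2) under this model, and shrinking (0, 0),(1, 4) by 1 collapses the first dimension to its midpoint 0.5.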
There are a few other potentially useful functions defined in cube.c
that vanished from the schema because I stopped using them. Some of
these were meant to support type casting. Let me know if I was wrong:
I will then add them back to the schema. I would also appreciate
other ideas that would enhance the type and make it more useful.
For examples of usage, see sql/cube.sql
This code is essentially based on the example written for
Illustra, http://garcia.me.berkeley.edu/~adong/rtree
My thanks are primarily to Prof. Joe Hellerstein
(http://db.cs.berkeley.edu/~jmh/) for elucidating the gist of the GiST
(http://gist.cs.berkeley.edu/), and to his former student, Andy Dong
(http://best.me.berkeley.edu/~adong/), for his exemplar.
I am also grateful to all postgres developers, present and past, for enabling
me to create my own world and live undisturbed in it. And I would like to
acknowledge my gratitude to Argonne Lab and to the U.S. Department of Energy
for the years of faithful support of my database research.
Gene Selkov, Jr.
Computational Scientist
Mathematics and Computer Science Division
Argonne National Laboratory
9700 S Cass Ave.
Building 221
Argonne, IL 60439-4844
Minor updates to this package were made by Bruno Wolff III <bruno@wolff.to>
in August/September of 2002.
These include changing the precision from single precision to double
precision and adding some new functions.
Additional updates were made by Joshua Reich <josh@root.net> in July 2006.
These include cube(float8[], float8[]) and cleaning up the code to use
the V1 call protocol instead of the deprecated V0 form.
* dblink
* Functions returning results from a remote database
* Joe Conway <mail@joeconway.com>
* And contributors:
* Darko Prenosil <Darko.Prenosil@finteh.hr>
* Shridhar Daithankar <shridhar_daithankar@persistent.co.in>
* Kai Londenberg (K.Londenberg@librics.de)
* Copyright (c) 2001-2007, PostgreSQL Global Development Group
* Permission to use, copy, modify, and distribute this software and its
* documentation for any purpose, without fee, and without a written agreement
* is hereby granted, provided that the above copyright notice and this
* paragraph and the following two paragraphs appear in all copies.
Release Notes:
27 August 2006
- Added async query capability. Original patch by
Kai Londenberg (K.Londenberg@librics.de), modified by Joe Conway
Version 0.7 (as of 25 Feb, 2004)
- Added new version of dblink, dblink_exec, dblink_open, dblink_close,
and dblink_fetch -- allows ERROR on remote side of connection to
throw NOTICE locally instead of ERROR
Version 0.6
- functions deprecated in 0.5 have been removed
- added ability to create "named" persistent connections
Version 0.5
- dblink now supports use directly as a table function; this is the new
preferred usage going forward
- Use of dblink_tok is now deprecated; original form of dblink is also
deprecated. They _will_ be removed in the next version.
- dblink_last_oid is also deprecated; use dblink_exec() which returns
the command status as a single row, single column result.
- Original dblink, dblink_tok, and dblink_last_oid are commented out in
dblink.sql; remove the comments to use the deprecated functions.
- dblink_strtok() and dblink_replace() functions were removed. Use
split() and replace() respectively (new backend functions in
PostgreSQL 7.3) instead.
- New functions: dblink_exec() for non-SELECT queries; dblink_connect()
opens connection that persists for duration of a backend;
dblink_disconnect() closes a persistent connection; dblink_open()
opens a cursor; dblink_fetch() fetches results from an open cursor.
dblink_close() closes a cursor.
- New test suite: dblink_check.sh, dblink.test.sql,
dblink.test.expected.out. Execute dblink_check.sh from the same
directory as the other two files. Output is dblink.test.out and
dblink.test.diff. Note that dblink.test.sql is a good source
of example usage.
Version 0.4
- removed cursor wrap around input sql to allow for remote
- dblink now returns a resource id instead of a real pointer
- added several utility functions -- see below
Version 0.3
- fixed dblink invalid pointer causing corrupt elog message
- fixed dblink_tok improper handling of null results
- fixed examples in README.dblink
Version 0.2
- initial release
Place these files in a directory called 'dblink' under 'contrib' in the PostgreSQL source tree. Then run:
make install
You can use dblink.sql to create the functions in your database of choice, e.g.
psql template1 < dblink.sql
installs dblink functions into database template1
Note: Parameters representing relation names must include double
quotes if the names are mixed-case or contain special characters. They
must also be appropriately qualified with schema name if applicable.
See the following files:
-- Joe Conway
This contrib package contains two different approaches to calculating
great circle distances on the surface of the Earth. The one described
first depends on the contrib/cube package (which MUST be installed before
earthdistance is installed). The second one is based on the point
datatype using latitude and longitude for the coordinates. The install
script makes the defined functions executable by anyone.
Make sure contrib/cube has been installed.
make install
make installcheck
To use these functions in a particular database as a postgres superuser do:
psql databasename < earthdistance.sql
contrib/cube based Earth distance functions
Bruno Wolff III
September 2002
A spherical model of the Earth is used.
Data is stored in cubes that are points (both corners are the same) using 3
coordinates representing the distance from the center of the Earth.
The radius of the Earth is obtained from the earth() function. It is
given in meters. But by changing this one function you can change it
to use some other units or to use a different value of the radius
that you feel is more appropriate.
This package also has applications to astronomical databases.
Astronomers will probably want to change earth() to return a radius of
180/pi() so that distances are in degrees.
Functions are provided to allow for input in latitude and longitude (in
degrees), to allow for output of latitude and longitude, to calculate
the great circle distance between two points and to easily specify a
bounding box usable for index searches.
The functions are all 'sql' functions. If you want to make these functions
executable by other people you will also have to make the referenced
cube functions executable. cube(text), cube(float8), cube(cube,float8),
cube_distance(cube,cube), cube_ll_coord(cube,int) and
cube_enlarge(cube,float8,int) are used indirectly by the earth distance
functions. cube_is_point(cube) and cube_dim(cube) are used in constraints for data
in domain earth. cube_ur_coord(cube,int) is used in the regression tests and
might be useful for looking at bounding box coordinates in user applications.
A domain of type cube named earth is defined.
There are constraints on it defined to make sure the cube is a point,
that it does not have more than 3 dimensions and that it is very near
the surface of a sphere centered about the origin with the radius of
the Earth.
The following functions are provided:
earth() - Returns the radius of the Earth in meters.
sec_to_gc(float8) - Converts the normal straight line (secant) distance
between two points on the surface of the Earth to the great circle distance
between them.
gc_to_sec(float8) - Converts the great circle distance between two points
on the surface of the Earth to the normal straight line (secant) distance
between them.
ll_to_earth(float8, float8) - Returns the location of a point on the surface
of the Earth given its latitude (argument 1) and longitude (argument 2) in
degrees.
latitude(earth) - Returns the latitude in degrees of a point on the surface
of the Earth.
longitude(earth) - Returns the longitude in degrees of a point on the surface
of the Earth.
earth_distance(earth, earth) - Returns the great circle distance between
two points on the surface of the Earth.
earth_box(earth, float8) - Returns a box suitable for an indexed search using
the cube @> operator for points within a given great circle distance of a
location. Some points in this box are further than the specified great circle
distance from the location so a second check using earth_distance should be
made at the same time.
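The geometry behind these functions can be sketched in plain Python. This is an illustrative model only, not the SQL in earthdistance.sql: points are plain 3-tuples instead of cubes, the radius value in EARTH_RADIUS_M is an assumption, and the clamping in sec_to_gc and gc_to_sec is a guard added for this sketch.

```python
import math

EARTH_RADIUS_M = 6378168.0  # assumed radius in meters; change earth() to use other units


def earth():
    return EARTH_RADIUS_M


def sec_to_gc(sec):
    """Chord (secant) length to great circle distance, on a sphere."""
    ratio = min(1.0, max(0.0, sec / (2 * earth())))   # guard added for this sketch
    return 2 * earth() * math.asin(ratio)


def gc_to_sec(gc):
    """Great circle distance to chord (secant) length, on a sphere."""
    angle = min(math.pi, max(0.0, gc / earth()))      # guard added for this sketch
    return 2 * earth() * math.sin(angle / 2)


def ll_to_earth(lat_deg, lon_deg):
    """Latitude and longitude in degrees to a 3-D point on the sphere."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    r = earth()
    return (r * math.cos(lat) * math.cos(lon),
            r * math.cos(lat) * math.sin(lon),
            r * math.sin(lat))


def latitude(p):
    return math.degrees(math.asin(max(-1.0, min(1.0, p[2] / earth()))))


def longitude(p):
    return math.degrees(math.atan2(p[1], p[0]))


def earth_distance(p, q):
    """Great circle distance: straight-line distance converted with sec_to_gc."""
    return sec_to_gc(math.dist(p, q))


def earth_box(p, gc_radius):
    """Axis-aligned box (LL, UR) around p, large enough to hold the search circle."""
    s = gc_to_sec(gc_radius)
    return tuple(c - s for c in p), tuple(c + s for c in p)
```

Because the box is a cube around the point, its corners lie further away than the requested radius, which is why the text recommends a second check with earth_distance.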
One advantage of using cube representation over a point using latitude and
longitude for coordinates, is that you don't have to worry about special
conditions at +/- 180 degrees of longitude or near the poles.
Below is the documentation for the Earth distance operator that works
with the point data type.
I corrected a bug in the geo_distance code where two double constants
were declared as int. I also changed the distance function to use
the haversine formula which is more accurate for small distances.
Bruno Wolff
September 2002
Date: Wed, 1 Apr 1998 15:19:32 -0600 (CST)
From: Hal Snyder <hal@vailsys.com>
To: vmehr@ctp.com
Subject: [QUESTIONS] Re: Spatial data, R-Trees
> From: Vivek Mehra <vmehr@ctp.com>
> Date: Wed, 1 Apr 1998 10:06:50 -0500
> Am just starting out with PostgreSQL and would like to learn more about
> the spatial data handling ablilities of postgreSQL - in terms of using
> R-tree indexes, user defined types, operators and functions.
> Would you be able to suggest where I could find some code and SQL to
> look at to create these?
Here's the setup for adding an operator '<@>' to give distance in
statute miles between two points on the Earth's surface. Coordinates
are in degrees. Points are taken as (longitude, latitude) and not vice
versa as longitude is closer to the intuitive idea of x-axis and
latitude to y-axis.
There's C source, Makefile for FreeBSD, and SQL for installing and
testing the function.
Let me know if anything looks fishy!
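For readers who want to check values by hand, the haversine calculation mentioned above can be sketched in Python. This is not the C source of the '<@>' operator; the function name geo_distance and the radius constant are assumptions for this illustration. Points are (longitude, latitude) pairs in degrees and the result is in statute miles, as described in the message.

```python
import math

EARTH_RADIUS_MILES = 3958.75  # assumed mean radius in statute miles


def geo_distance(p1, p2):
    """Great circle distance in statute miles between two (lon, lat) points."""
    lon1, lat1 = map(math.radians, p1)
    lon2, lat2 = map(math.radians, p2)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # Haversine formula; it stays accurate for small distances.
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    a = min(1.0, a)  # guard against rounding slightly above 1
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))
```

The formula works on longitude differences only through sin squared, so a pair of points on either side of +/- 180 degrees is handled without special cases.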
* fuzzystrmatch.c
* Functions for "fuzzy" comparison of strings
* Joe Conway <mail@joeconway.com>
* Copyright (c) 2001-2007, PostgreSQL Global Development Group
* levenshtein()
* -------------
* Written based on a description of the algorithm by Michael Gilleland
* found at http://www.merriampark.com/ld.htm
* Also looked at levenshtein.c in the PHP 4.0.6 distribution for
* inspiration.
* metaphone()
* -----------
* Modified for PostgreSQL by Joe Conway.
* Based on CPAN's "Text-Metaphone-1.96" by Michael G Schwern <schwern@pobox.com>
* Code slightly modified for use as PostgreSQL function (palloc, elog, etc).
* Metaphone was originally created by Lawrence Philips and presented in article
* in "Computer Language" December 1990 issue.
* dmetaphone() and dmetaphone_alt()
* ---------------------------------
* A port of the DoubleMetaphone perl module by Andrew Dunstan. See dmetaphone.c
* for more detail.
* soundex()
* -----------
* Folded existing soundex contrib into this one. Renamed text_soundex() (C function)
* to soundex() for consistency.
* difference()
* ------------
* Return the difference between two strings' soundex values. Kris Jurka
* Permission to use, copy, modify, and distribute this software and its
* documentation for any purpose, without fee, and without a written agreement
* is hereby granted, provided that the above copyright notice and this
* paragraph and the following two paragraphs appear in all copies.
Version 0.3 (30 June, 2004):
Release Notes:
Version 0.3
- added double metaphone code from Andrew Dunstan
- change metaphone so that an empty input string causes an empty
output string to be returned, instead of throwing an ERROR
- fixed examples in README.soundex
Version 0.2
- folded soundex contrib into this one
Version 0.1
- initial release
Place these files in a directory called 'fuzzystrmatch' under 'contrib' in the PostgreSQL source tree. Then run:
make install
You can use fuzzystrmatch.sql to create the functions in your database of choice, e.g.
psql -U postgres template1 < fuzzystrmatch.sql
installs following functions into database template1:
levenshtein() - calculates the levenshtein distance between two strings
metaphone() - calculates the metaphone code of an input string
levenshtein -- calculates the levenshtein distance between two strings
levenshtein(text source, text target)
source: any text string, 255 characters max, NOT NULL
target: any text string, 255 characters max, NOT NULL
Returns int
Example usage
select levenshtein('GUMBO','GAMBOL');
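As a reference for what the function computes, here is a minimal Python sketch of the classic dynamic-programming Levenshtein distance with unit costs for insertion, deletion and substitution. It is not the C code in fuzzystrmatch.c; the MAX_LEN check simply mirrors the 255-character limit stated above.

```python
MAX_LEN = 255  # limit on argument length stated in this README


def levenshtein(source, target):
    """Minimum number of single-character edits turning source into target."""
    if len(source) > MAX_LEN or len(target) > MAX_LEN:
        raise ValueError("argument exceeds max length of 255 characters")
    # previous[j] holds the distance between source[:i-1] and target[:j].
    previous = list(range(len(target) + 1))
    for i, s_ch in enumerate(source, start=1):
        current = [i]
        for j, t_ch in enumerate(target, start=1):
            cost = 0 if s_ch == t_ch else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]
```

For the example above, 'GUMBO' becomes 'GAMBOL' with one substitution (U to A) and one insertion (L), so this sketch returns 2.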
metaphone -- calculates the metaphone code of an input string
metaphone(text source, int max_output_length)
source: any text string, 255 characters max, NOT NULL
max_output_length: maximum length of the output metaphone code; if longer,
the output is truncated to this length
Returns text
Example usage
select metaphone('GUMBO',4);
-- Joe Conway
Hstore - contrib module for storing (key,value) pairs
[Online version] (http://www.sai.msu.su/~megera/oddmuse/index.cgi?Hstore)
Many attributes rarely searched, semistructural data, lazy DBA
* Oleg Bartunov <oleg@sai.msu.su>, Moscow, Moscow University, Russia
* Teodor Sigaev <teodor@sigaev.ru>, Moscow, Delta-Soft Ltd.,Russia
LEGAL NOTICES: This module is released under BSD license (as PostgreSQL
itself)
* hstore -> text - get value , perl analogy $h{key}
select 'a=>q, b=>g'->'a';
* hstore || hstore - concatenation, perl analogy %a=( %b, %c );
regression=# select 'a=>b'::hstore || 'c=>d'::hstore;
"a"=>"b", "c"=>"d"
(1 row)
but, notice
regression=# select 'a=>b'::hstore || 'a=>d'::hstore;
(1 row)
* text => text - creates hstore type from two text strings
select 'a'=>'b';
* hstore @> hstore - contains operation, check if left operand contains right.
regression=# select 'a=>b, b=>1, c=>NULL'::hstore @> 'a=>c';
(1 row)
regression=# select 'a=>b, b=>1, c=>NULL'::hstore @> 'b=>1';
(1 row)
* hstore <@ hstore - contained operation, check if left operand is contained
in right
(Before PostgreSQL 8.2, the containment operators @> and <@ were
respectively called @ and ~. These names are still available, but are
deprecated and will eventually be retired. Notice that the old names
are reversed from the convention formerly followed by the core geometric
data types.)
* akeys(hstore) - returns all keys from hstore as array
regression=# select akeys('a=>1,b=>2');
* skeys(hstore) - returns all keys from hstore as strings
regression=# select skeys('a=>1,b=>2');
* avals(hstore) - returns all values from hstore as array
regression=# select avals('a=>1,b=>2');
* svals(hstore) - returns all values from hstore as strings
regression=# select svals('a=>1,b=>2');
* delete (hstore,text) - delete (key,value) from hstore if key matches
regression=# select delete('a=>1,b=>2','b');
* each(hstore) - returns (key, value) pairs
regression=# select * from each('a=>1,b=>2');
 key | value
-----+-------
 a   | 1
 b   | 2
* exist (hstore,text)
* hstore ? text
- returns true if the key exists in hstore and false otherwise.
regression=# select exist('a=>1','a'), 'a=>1' ? 'a';
 exist | ?column?
-------+----------
 t     | t
* defined (hstore,text) - returns true if the key exists in hstore and
its value is not NULL.
regression=# select defined('a=>NULL','a');
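Since the operators are described by analogy with Perl hashes, a Python dict gives a convenient mental model. The sketch below is an approximation written for this README, not the hstore implementation: NULL is modelled as None, the function names are chosen to echo the SQL ones, and it assumes that in a concatenation the right-hand value wins for a duplicate key, as in Perl's %a=( %b, %c ).

```python
def concat(a, b):
    """hstore || hstore: keys from b override keys from a (assumed)."""
    return {**a, **b}


def contains(a, b):
    """hstore @> hstore: every (key, value) pair of b is present in a."""
    return all(k in a and a[k] == v for k, v in b.items())


def exist(h, key):
    """exist(hstore, text) or hstore ? text."""
    return key in h


def defined(h, key):
    """True if the key exists and its value is not NULL (None here)."""
    return h.get(key) is not None


def delete(h, key):
    """Copy of h without the given key."""
    return {k: v for k, v in h.items() if k != key}


def akeys(h):
    return list(h)


def avals(h):
    return list(h.values())


def each(h):
    return list(h.items())
```

Under this model, contains({'a': 'b', 'b': '1', 'c': None}, {'a': 'c'}) is false because the value for 'a' differs, while the same call with {'b': '1'} is true.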
The module provides index support for the '@>' and '?' operations.
create index hidx on testhstore using gist(h);
create index hidx on testhstore using gin(h);
Use parentheses in the select below, because the priority of 'is' is higher than that of '->':
select id from entrants where (info->'education_period') is not null;
* add key
update tt set h=h||'c=>3';
* delete key
update tt set h=delete(h,'k1');
* Statistics
The hstore type, because of its intrinsic liberality, can contain a lot of
different keys. Checking for valid keys is the task of the application.
The examples below demonstrate several techniques for gathering key statistics.
o simple example
select * from each('aaa=>bq, b=>NULL, ""=>1 ');
o using table
select (each(h)).key, (each(h)).value into stat from testhstore ;
o online stat
select key, count(*) from (select (each(h)).key from testhstore) as stat group by key order by count desc, key;
  key   | count
--------+-------
 line   |   883
 query  |   207
 pos    |   203
 node   |   202
 space  |   197
 status |   195
 public |   194
 title  |   190
 org    |   189
Integer aggregator/enumerator.
Many database systems have the notion of a one to many table.
A one to many table usually sits between two indexed tables, for example:
create table one_to_many(left int, right int) ;
And it is used like this:
SELECT right.* from right JOIN one_to_many ON (right.id = one_to_many.right)
WHERE one_to_many.left = item;
This will return all the items in the right hand table for an entry
in the left hand table. This is a very common construct in SQL.
Now, this methodology can be cumbersome with a very large number of
entries in the one_to_many table. Depending on the order in which
data was entered, a join like this could result in an index scan
and a fetch for each right hand entry in the table for a particular
left hand entry.
If you have a very dynamic system, there is not much you can do.
However, if you have some data which is fairly static, you can
create a summary table with the aggregator.
CREATE TABLE summary as SELECT left, int_array_aggregate(right)
AS right FROM one_to_many GROUP BY left;
This will create a table with one row per left item, and an array
of right items. Now this is pretty useless without some way of using
the array; that's why there is an array enumerator.
SELECT left, int_array_enum(right) FROM summary WHERE left = item;
The above query using int_array_enum, produces the same results as:
SELECT left, right FROM one_to_many WHERE left = item;
The difference is that the query against the summary table has to get
only one row from the table, whereas the query against "one_to_many"
must index scan and fetch a row for each entry.
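The idea of aggregating into a summary and enumerating it back can be shown with ordinary Python containers. This is only a model of the data flow described above, not the C aggregate: the one_to_many table is a list of (left, right) pairs, the summary table is a dict, and both function signatures are invented for this sketch.

```python
from collections import defaultdict


def int_array_aggregate(rows):
    """Group (left, right) pairs into one array of right values per left."""
    summary = defaultdict(list)
    for left, right in rows:
        summary[left].append(right)
    return dict(summary)


def int_array_enum(summary, item):
    """Expand the stored array for one left value back into rows."""
    return [(item, right) for right in summary.get(item, [])]


one_to_many = [(1, 10), (2, 20), (1, 11), (1, 12)]
summary = int_array_aggregate(one_to_many)

# One lookup in the summary replaces a scan over every matching row.
rows_from_summary = int_array_enum(summary, 1)
rows_from_table = [(l, r) for l, r in one_to_many if l == 1]
```

Both rows_from_summary and rows_from_table hold the same three rows here; the saving in the database comes from fetching a single summary row instead of one row per entry.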
On our system, an EXPLAIN shows a query with a cost of 8488 gets reduced
to a cost of 329. The query is a join between the one_to_many table,
select right, count(right) from
(
select left, int_array_enum(right) as right from summary join
(select left from left_table where left = item) as lefts
ON (summary.left = lefts.left )
) as list group by right order by count desc ;
This is an implementation of the RD-tree data structure using the GiST
interface of PostgreSQL. It has built-in lossy compression.
The current implementation provides index support for one-dimensional arrays
of integers: gist__int_ops, suitable for small and medium-sized arrays (used
by default), and gist__intbig_ops for indexing large arrays (a superimposed
signature 4096 bits long is used to represent sets). There is also a
non-default gin__int_ops for GIN indexes on integer arrays.
All work was done by Teodor Sigaev (teodor@stack.net) and Oleg Bartunov
(oleg@sai.msu.su). See http://www.sai.msu.su/~megera/postgres/gist
for additional information. Andrey Oktyabrski did great work
adding new functions and operations.
int icount(int[]) - the number of elements in intarray
test=# select icount('{1,2,3}'::int[]);
 icount
--------
      3
(1 row)
int[] sort(int[], 'asc' | 'desc') - sort intarray
test=# select sort('{1,2,3}'::int[],'desc');
  sort
---------
 {3,2,1}
(1 row)
int[] sort(int[]) - sort in ascending order
int[] sort_asc(int[]),sort_desc(int[]) - shortcuts for sort
int[] uniq(int[]) - returns unique elements
test=# select uniq(sort('{1,2,3,2,1}'::int[]));
  uniq
---------
 {1,2,3}
(1 row)
int idx(int[], int item) - returns the index of the first array element equal to item,
or 0 if there is no match.
test=# select idx('{1,2,3,2,1}'::int[],2);
 idx
-----
   2
(1 row)
int[] subarray(int[],int START [, int LEN]) - returns the part of intarray starting from
element number START (counting from 1) with length LEN.
test=# select subarray('{1,2,3,2,1}'::int[],2,3);
 subarray
----------
 {2,3,2}
(1 row)
int[] intset(int4) - casting int4 to int[]
test=# select intset(1);
 intset
--------
 {1}
(1 row)
int[] && int[] - overlap - returns TRUE if arrays have at least one common element
int[] @> int[] - contains - returns TRUE if left array contains right array
int[] <@ int[] - contained - returns TRUE if left array is contained in right array
# int[] - returns the number of elements in array
int[] + int - push element onto array (add to end of array)
int[] + int[] - merge of arrays (right array added to the end of left one)
int[] - int - remove entries matched by right argument from array
int[] - int[] - remove right array from left
int[] | int - returns intarray - union of arguments
int[] | int[] - returns intarray as a union of two arrays
int[] & int[] - returns intersection of arrays
int[] @@ query_int - returns TRUE if array satisfies query (like '1&(2|3)')
query_int ~~ int[] - returns TRUE if array satisfies query (commutator of @@)
(Before PostgreSQL 8.2, the containment operators @> and <@ were
respectively called @ and ~. These names are still available, but are
deprecated and will eventually be retired. Notice that the old names
are reversed from the convention formerly followed by the core geometric
datatypes!)
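The set-style operators listed above can be read as ordinary set operations. The Python sketch below only illustrates that reading and is not the module's implementation; the function names are invented here, and returning sorted lists for union and intersection is a choice made for the sketch.

```python
def overlap(a, b):
    """int[] && int[]: true if the arrays share at least one element."""
    return bool(set(a) & set(b))


def contains(a, b):
    """int[] @> int[]: true if every element of b is also in a."""
    return set(b) <= set(a)


def contained(a, b):
    """int[] <@ int[]: true if every element of a is also in b."""
    return set(a) <= set(b)


def union(a, b):
    """int[] | int[]: elements present in either array."""
    return sorted(set(a) | set(b))


def intersection(a, b):
    """int[] & int[]: elements present in both arrays."""
    return sorted(set(a) & set(b))


# A message filed under sections 1, 2 and 3, queried for sections 1 and 2.
sections = [1, 2, 3]
assert overlap(sections, [2, 7])
assert contains(sections, [1, 2])
assert contained([1, 2], sections)
```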
August 6, 2002
1. Reworked patch from Andrey Oktyabrski (ano@spider.ru) with
functions: icount, sort, sort_asc, uniq, idx, subarray
operations: #, +, -, |, &
October 1, 2001
1. Change search method in array to binary
September 28, 2001
1. gist__int_ops is no longer lossy
2. add sort entry in picksplit
September 21, 2001
1. Added support for boolean queries (indexable operator @@, looks like
a @@ '1|(2&3)'; performance is better in any case)
2. Done some small optimizations
March 19, 2001
1. Added support for toastable keys
2. Improved split algorithm for intbig (selection speedup is about 30%)
gmake install
-- load functions
psql <database> < _int.sql
gmake installcheck
create table message (mid int not null,sections int[]);
create table message_section_map (mid int not null,sid int not null);
-- create indices
CREATE unique index message_key on message ( mid );
CREATE unique index message_section_map_key2 on message_section_map (sid, mid );
CREATE INDEX message_rdtree_idx on message using gist ( sections gist__int_ops);
-- select messages in section 1 OR 2 - OVERLAP operator
select message.mid from message where message.sections && '{1,2}';
-- select messages contained in sections 1 AND 2 - CONTAINS operator
select message.mid from message where message.sections @> '{1,2}';
-- the same, CONTAINED operator
select message.mid from message where '{1,2}' <@ message.sections;
The bench subdirectory contains a benchmark suite.
cd ./bench
1. createdb TEST
2. psql TEST < ../_int.sql
3. ./create_test.pl | psql TEST
4. ./bench.pl - perl script to benchmark queries, supports OR, AND queries
with/without RD-Tree. Run the script without arguments to
see available options.
a)test without RD-Tree (OR)
./bench.pl -d TEST -c -s 1,2 -v
b)test with RD-Tree
./bench.pl -d TEST -c -s 1,2 -v -r
Size of table <message>: 200000
Size of table <message_section_map>: 269133
Distribution of messages by sections:
section 0: 74377 messages
section 1: 16284 messages
section 50: 1229 messages
section 99: 683 messages
old - without RD-Tree support,
new - with RD-Tree
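+----------+---------------+----------------+
|Search set|OR, time in sec|AND, time in sec|
|          +-------+-------+--------+-------+
|          |  old  |  new  |  old   |  new  |
+----------+-------+-------+--------+-------+
|         1|  0.625|  0.101|       -|      -|
|        99|  0.018|  0.017|       -|      -|
|       1,2|  0.766|  0.133|   0.628|  0.045|
| 1,2,50,65|  0.794|  0.141|   0.030|  0.006|
+----------+-------+-------+--------+-------+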
-- EAN13 - UPC - ISBN (books) - ISMN (music) - ISSN (serials)
Copyright Germán Méndez Bravo (Kronuz), 2004 - 2006
This module is released under the same BSD license as the rest of PostgreSQL.
The information to implement this module was collected through
several sites, including:
The prefixes used for hyphenation were also compiled from:
Care was taken during the creation of the algorithms and they
were meticulously verified against the suggested algorithms
in the official ISBN, ISMN, ISSN User Manuals.
-- Content of the Module
This directory contains definitions for a few PostgreSQL
data types, for the following international-standard namespaces:
EAN13, UPC, ISBN (books), ISMN (music), and ISSN (serials). This module
is inspired by Garrett A. Wollman's isbn_issn code.
I wanted the database to fully validate numbers and also to use the
upcoming ISBN-13 and the EAN13 standards, as well as to hyphenate
ISBN numbers automatically.
This new module validates the numbers and automatically adds the correct
hyphenation. It also supports the new ISBN-13
numbers to be used starting in January 2007.
1. ISBN13, ISMN13, ISSN13 numbers are all EAN13 numbers
2. EAN13 numbers aren't always ISBN13, ISMN13 or ISSN13 (some are)
3. some ISBN13 numbers can be displayed as ISBN
4. some ISMN13 numbers can be displayed as ISMN
5. some ISSN13 numbers can be displayed as ISSN
6. all UPC, ISBN, ISMN and ISSN can be represented as EAN13 numbers
Note: All types are internally represented as 64 bit integers,
and internally all are consistently interchangeable.
We have the following data types:
+ EAN13 for European Article Numbers.
This type will always show the EAN13-display format.
The output function for this is -> ean13_out()
+ ISBN13 for International Standard Book Numbers to be displayed in
the new EAN13-display format.
+ ISMN13 for International Standard Music Numbers to be displayed in
the new EAN13-display format.
+ ISSN13 for International Standard Serial Numbers to be displayed
in the new EAN13-display format.
These types will always display the long version of the ISxN (EAN13)
The output function to do this is -> ean13_out()
* These types exist only to display the same data in different
ways:
ISBN13 is actually the same as ISBN, ISMN13=ISMN and ISSN13=ISSN.
+ ISBN for International Standard Book Numbers to be displayed in
the current short-display format.
+ ISMN for International Standard Music Numbers to be displayed in
the current short-display format.
+ ISSN for International Standard Serial Numbers to be displayed
in the current short-display format.
These types will display the short version of the ISxN (ISxN 10)
whenever it's possible, and it will show ISxN 13 when it's
impossible to show the short version.
The output function to do this is -> isn_out()
+ UPC for Universal Product Codes.
UPC numbers are a subset of the EAN13 numbers (they are basically
EAN13 without the first '0' digit.)
The output function to do this is also -> isn_out()
We have the following input functions:
+ To take a string and return an EAN13 -> ean13_in()
+ To take a string and return valid ISBN or ISBN13 numbers -> isbn_in()
+ To take a string and return valid ISMN or ISMN13 numbers -> ismn_in()
+ To take a string and return valid ISSN or ISSN13 numbers -> issn_in()
+ To take a string and return a UPC code -> upc_in()
We are able to cast from:
+ ISBN13 -> EAN13
+ ISMN13 -> EAN13
+ ISSN13 -> EAN13
+ ISBN -> EAN13
+ ISMN -> EAN13
+ ISSN -> EAN13
+ UPC -> EAN13
+ ISBN <-> ISBN13
+ ISMN <-> ISMN13
+ ISSN <-> ISSN13
We have two operator classes (for btree and for hash) so each data type
can be indexed for faster access.
The C API is implemented as:
extern Datum isn_out(PG_FUNCTION_ARGS);
extern Datum ean13_out(PG_FUNCTION_ARGS);
extern Datum ean13_in(PG_FUNCTION_ARGS);
extern Datum isbn_in(PG_FUNCTION_ARGS);
extern Datum ismn_in(PG_FUNCTION_ARGS);
extern Datum issn_in(PG_FUNCTION_ARGS);
extern Datum upc_in(PG_FUNCTION_ARGS);
On success:
+ isn_out() takes any of our types and returns a string containing
the shortest possible representation of the number.
+ ean13_out() takes any of our types and returns the
EAN13 (long) representation of the number.
+ ean13_in() takes a string and returns an EAN13, which, as stated in (2),
may or may not be one of our other types, but is certainly an EAN13
number. It succeeds only if the string is a valid EAN13 number; otherwise it fails.
+ isbn_in() takes a string and returns an ISBN/ISBN13. It succeeds only if the
string really is an ISBN/ISBN13; otherwise it fails.
+ ismn_in() takes a string and returns an ISMN/ISMN13. It succeeds only if the
string really is an ISMN/ISMN13; otherwise it fails.
+ issn_in() takes a string and returns an ISSN/ISSN13. It succeeds only if the
string really is an ISSN/ISSN13; otherwise it fails.
+ upc_in() takes a string and returns a UPC. It succeeds only if the string
really is a UPC; otherwise it fails.
(On failure, the functions 'ereport' the error.)
-- Testing/Playing Functions
isn_weak(boolean) - Sets the weak input mode.
This function is intended for testing use only!
isn_weak() gets the current status of the weak mode.
"Weak" mode is used to be able to insert "invalid" data to a table.
"Invalid" as in the check digit being wrong, not missing numbers.
Why would you want to use the weak mode? Well, it could be that
you have a huge collection of ISBN numbers, so many that for
whatever reason some have the wrong check digit (perhaps the
numbers were scanned from a printed list and the OCR got them wrong,
perhaps the numbers were manually captured... who knows). Anyway, the point
is that you might want to clean the mess up, but still be able to have
all the numbers in your database, and maybe use an external tool to access
the invalid numbers in the database so you can verify the information and
validate it more easily, for example by selecting all the invalid numbers in the table.
When you insert invalid numbers in a table using the weak mode, the number
will be inserted with the corrected check digit, but it will be flagged
with an exclamation mark ('!') at the end (e.g. 0-11-000322-5!)
You can also force the insertion of invalid numbers even when not in weak mode
by appending the '!' character to the end of the number.
To work with invalid numbers, you can use two functions:
+ make_valid(), which validates an invalid number (deleting the invalid flag)
+ is_valid(), which checks for the invalid flag presence.
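For readers unfamiliar with the check digits that weak mode corrects, the sketch below shows the standard EAN13 and ISBN-10 check digit arithmetic in Python. It illustrates the public algorithms only and is not taken from the module's C code; the function names are invented, and inputs are assumed to be digit strings with hyphens already removed.

```python
def ean13_check_digit(first12):
    """Check digit for the first 12 digits of an EAN13 number (weights 1,3,1,3,...)."""
    if len(first12) != 12 or not first12.isdigit():
        raise ValueError("expected exactly 12 digits")
    total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(first12))
    return str((10 - total % 10) % 10)


def isbn10_check_digit(first9):
    """Check digit for the first 9 digits of a short ISBN (weights 10 down to 2, mod 11)."""
    if len(first9) != 9 or not first9.isdigit():
        raise ValueError("expected exactly 9 digits")
    total = sum(int(c) * w for c, w in zip(first9, range(10, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def isbn10_to_ean13(isbn10):
    """Re-express a short ISBN as EAN13: prefix 978, drop the old check digit, recompute."""
    body = "978" + isbn10[:9]
    return body + ean13_check_digit(body)


# Numbers taken from the examples in this document.
assert ean13_check_digit("978039304002") == "9"   # 978-0-393-04002-9
assert isbn10_check_digit("090169054") == "6"     # 0901690546
```

The same arithmetic explains the '?' placeholder in the examples: the missing last digit is simply the value these functions compute.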
-- Examples of Use
--Using the types directly:
select isbn('978-0-393-04002-9');
select isbn13('0901690546');
select issn('1436-4522');
--Casting types:
-- note that you can only cast from ean13 to another type when the
-- number would be valid in the realm of the target type;
-- thus, the following will NOT work: select isbn(ean13('0220356483481'));
-- but these will:
select upc(ean13('0220356483481'));
select ean13(upc('220356483481'));
--Create a table with a single column to hold ISBN numbers:
create table test ( id isbn );
insert into test values('9780393040029');
--Automatically calculating check digits (observe the '?'):
insert into test values('220500896?');
insert into test values('978055215372?');
select issn('3251231?');
select ismn('979047213542?');
--Using the weak mode:
select isn_weak(true);
insert into test values('978-0-11-000533-4');
insert into test values('9780141219307');
insert into test values('2-205-00876-X');
select isn_weak(false);
select id from test where not is_valid(id);
update test set id=make_valid(id) where id = '2-205-00876-X!';
select * from test;
select isbn13(id) from test;
-- Contact
Please send suggestions or bug reports to kronuz at users.sourceforge.net
Last reviewed on August 23, 2006 by Kronuz.
PostgreSQL type extension for managing Large Objects
One of the problems with the JDBC driver (and this affects the ODBC driver
also), is that the specification assumes that references to BLOBS (Binary
Large OBjectS) are stored within a table, and if that entry is changed, the
associated BLOB is deleted from the database.
As PostgreSQL stands, this doesn't occur. Large objects are treated as
objects in their own right; a table entry can reference a large object by
OID, but there can be multiple table entries referencing the same large
object OID, so the system doesn't delete the large object just because you
change or remove one such entry.
Now this is fine for new PostgreSQL-specific applications, but existing ones
using JDBC or ODBC won't delete the objects, resulting in orphaning - objects
that are not referenced by anything, and simply occupy disk space.
The Fix
I've fixed this by creating a new data type 'lo', some support functions, and
a Trigger which handles the orphaning problem. The trigger essentially just
does a 'lo_unlink' whenever you delete or modify a value referencing a large
object. When you use this trigger, you are assuming that there is only one
database reference to any large object that is referenced in a
trigger-controlled column!
The 'lo' type was created because we needed to differentiate between plain
OIDs and Large Objects. Currently the JDBC driver handles this dilemma easily,
but (after talking to Byron), the ODBC driver needed a unique type. They had
created an 'lo' type, but not the solution to orphaning.
You don't actually have to use the 'lo' type to use the trigger, but it may be
convenient to use it to keep track of which columns in your database represent
large objects that you are managing with the trigger.
First, build the shared library and install it. Typing 'make install' in the
contrib/lo directory should do it.
Then, as the postgres superuser, run the lo.sql script in any database that
needs the features. This will install the type, and define the support
functions. You can run the script once in template1, and the objects will be
inherited by subsequently-created databases.
How to Use
The easiest way is by an example:
> create table image (title text, raster lo);
> create trigger t_raster before update or delete on image
> for each row execute procedure lo_manage(raster);
Create a trigger for each column that contains a lo type, and give the column
name as the trigger procedure argument. You can have more than one trigger on
a table if you need multiple lo columns in the same table, but don't forget to
give a different name to each trigger.
* Dropping a table will still orphan any objects it contains, as the trigger
is not executed.
Avoid this by preceding the 'drop table' with 'delete from {table}'.
If you already have, or suspect you have, orphaned large objects, see
the contrib/vacuumlo module to help you clean them up. It's a good idea
to run contrib/vacuumlo occasionally as a back-stop to the lo_manage
trigger.
* Some frontends may create their own tables, and will not create the
associated trigger(s). Also, users may not remember (or know) to create
the triggers.
As the ODBC driver needs a permanent lo type (and JDBC could be optimised to
use it if its OID is fixed), and as the above issues can only be fixed by
some internal changes, I feel it should become a permanent built-in type.
I'm releasing this into contrib, just to get it out, and tested.
Peter Mount <peter@retep.org.uk> June 13 1998
contrib/ltree module
ltree is a PostgreSQL contrib module which contains implementations of data
types, indexed access methods and queries for data organized as tree-like
structures.
This module works with PostgreSQL version 7.3.
(A version for 7.2 is available from http://www.sai.msu.su/~megera/postgres/gist/ltree/ltree-7.2.tar.gz)
All work was done by Teodor Sigaev (teodor@stack.net) and Oleg Bartunov
(oleg@sai.msu.su). See http://www.sai.msu.su/~megera/postgres/gist for
additional information. Authors would like to thank Eugeny Rodichev for helpful
discussions. Comments and bug reports are welcome.
LEGAL NOTICES: This module is released under BSD license (as PostgreSQL
itself). This work was done in framework of Russian Scientific Network and
partially supported by Russian Foundation for Basic Research and Stack Group.
This is a placeholder for an introduction to the problem. We hope people reading
this document don't need it too much :-)
A label of a node is a sequence of one or more words separated by the blank
character '_' and containing letters and digits (for example, [a-zA-Z0-9] for
the C locale). The length of a label is limited to 256 bytes.
Example: 'Countries', 'Personal_Services'
A label path of a node is a sequence of one or more dot-separated labels
l1.l2...ln, representing the path from the root to the node. The length of a
label path is limited to 65Kb, but a size <= 2Kb is preferable. We do not
consider this a strict limitation (the maximal size of a label path for the
DMOZ catalogue, http://www.dmoz.org, is about 240 bytes!)
Example: 'Top.Countries.Europe.Russia'
We introduce several datatypes:
ltree
- is a datatype for label path.
ltree[]
- is a datatype for arrays of ltree.
lquery
- is a path expression that has a regular expression in the label path and is
used for ltree matching. The star symbol (*) is used to specify any number of
labels (levels) and can be used at the beginning and the end of lquery,
for example, '*.Europe.*'.
The following quantifiers are recognized for '*' (like in Perl):
{n} Match exactly n levels
{n,} Match at least n levels
{n,m} Match at least n but not more than m levels
{,m} Match at maximum m levels (eq. to {0,m})
It is possible to use several modifiers at the end of a label:
@ Do case-insensitive label matching
* Do prefix matching for a label
% Don't account word separator '_' in label matching, that is
'Russian%' would match 'Russian_nations', but not 'Russian'
lquery can contain a logical '!' (NOT) at the beginning of a label and
'|' (OR) to specify possible alternatives for label matching.
Example of lquery:
Top.*{0,2}.sport*@.!football|tennis.Russ*|Spain
a)  b)     c)      d)               e)
A label path should
+ a) begin with a node labeled 'Top'
+ b) followed by zero to two labels, until
+ c) a node whose label begins with the case-insensitive prefix 'sport'
+ d) followed by a node whose label matches neither 'football' nor 'tennis', and
+ e) end with a node whose label begins with 'Russ' or exactly matches
'Spain'.
ltxtquery
- is a datatype for label searching (like type 'query' for full text
searching, see contrib/tsearch). It's possible to use the modifiers @,%,* at
the end of a word. The meaning of the modifiers is the same as for lquery.
Example: 'Europe & Russia*@ & !Transportation'
Searches for paths that contain the words 'Europe' and 'Russia*'
(case-insensitive) and not 'Transportation'. Notice that the order in which
the words appear in the label path is not important!
The following operations are defined for type ltree:
<,>,<=,>=,=, <>
- have their usual meanings. Comparison is done in the order of direct
tree traversal; children of a node are sorted lexicographically.
ltree @> ltree
- returns TRUE if left argument is an ancestor of right argument (or
equal).
ltree <@ ltree
- returns TRUE if left argument is a descendant of right argument (or
equal).
ltree ~ lquery, lquery ~ ltree
- return TRUE if node represented by ltree satisfies lquery.
ltree ? lquery[], lquery[] ? ltree
- return TRUE if node represented by ltree satisfies at least one lquery
from array.
ltree @ ltxtquery, ltxtquery @ ltree
- return TRUE if node represented by ltree satisfies ltxtquery.
ltree || ltree, ltree || text, text || ltree
- return concatenated ltree.
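The ancestor and descendant tests compare whole labels rather than characters. A minimal Python sketch of that idea follows; the function names are invented for illustration, label paths are treated as plain dot-separated strings, and equal paths count as matching.

```python
def labels(path):
    """Split a label path such as 'Top.Science' into its labels."""
    return path.split(".")


def is_ancestor(left, right):
    """ltree @> ltree: left is a label-wise prefix of right (or equal)."""
    l, r = labels(left), labels(right)
    return r[:len(l)] == l


def is_descendant(left, right):
    """ltree <@ ltree: the mirror image of is_ancestor."""
    return is_ancestor(right, left)


def concat(left, right):
    """ltree || ltree: join two label paths with a dot."""
    return left + "." + right


assert is_ancestor("Top.Science", "Top.Science.Astronomy")
assert not is_ancestor("Top.Sci", "Top.Science")  # not a whole-label prefix
```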
Operations for arrays of ltree (ltree[]):
ltree[] @> ltree, ltree <@ ltree[]
- returns TRUE if array ltree[] contains an ancestor of ltree.
ltree @> ltree[], ltree[] <@ ltree
- returns TRUE if array ltree[] contains a descendant of ltree.
ltree[] ~ lquery, lquery ~ ltree[]
- returns TRUE if array ltree[] contains label paths matching lquery.
ltree[] ? lquery[], lquery[] ? ltree[]
- returns TRUE if array ltree[] contains label paths matching at least one
lquery from array.
ltree[] @ ltxtquery, ltxtquery @ ltree[]
- returns TRUE if array ltree[] contains label paths matching ltxtquery
(full text search).
ltree[] ?@> ltree, ltree ?<@ ltree[], ltree[] ?~ lquery, ltree[] ?@ ltxtquery
- returns the first element of array ltree[] that satisfies the corresponding
condition, or NULL if there is none.
Operations <@, @>, @ and ~ have analogues - ^<@, ^@>, ^@, ^~ - which don't use
indices!
Various indices could be created to speed up execution of operations:
* B-tree index over ltree:
<, <=, =, >=, >
* GiST index over ltree:
<, <=, =, >=, >, @>, <@, @, ~, ?
create index path_gist_idx on test using gist (path);
* GiST index over ltree[]:
ltree[]<@ ltree, ltree @> ltree[], @, ~, ?.
create index path_gist_idx on test using gist (array_path);
Note: This index is lossy.
ltree subltree
ltree subltree(ltree, start, end)
returns subpath of ltree from start (inclusive) until the end.
# select subltree('Top.Child1.Child2',1,2);
ltree subpath
ltree subpath(ltree, OFFSET,LEN)
ltree subpath(ltree, OFFSET)
returns subpath of ltree from OFFSET (inclusive) with length LEN.
If OFFSET is negative, the returned subpath starts that far from the end
of the path. If LEN is omitted, returns everything to the end
of the path. If LEN is negative, leaves that many labels off
the end of the path.
# select subpath('Top.Child1.Child2',1,2);
# select subpath('Top.Child1.Child2',-2,1);
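The OFFSET and LEN rules described above can be sketched in Python as follows. This is only an approximation for illustration: the error handling and the clamping of an over-long LEN are assumptions of the sketch, not a statement of what the module does in every edge case.

```python
def subpath(path, offset, length=None):
    """Return the labels of path selected by offset and length, joined by dots."""
    labels = path.split(".")
    n = len(labels)
    # A negative offset counts from the end of the path.
    start = offset if offset >= 0 else n + offset
    if length is None:
        end = n                  # omitted: everything to the end
    elif length >= 0:
        end = start + length     # take that many labels
    else:
        end = n + length         # negative: leave that many labels off the end
    if not 0 <= start <= end:
        raise ValueError("invalid positions")
    return ".".join(labels[start:min(end, n)])


assert subpath("Top.Child1.Child2", 1, 2) == "Child1.Child2"
assert subpath("Top.Child1.Child2", -2, 1) == "Child1"
```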
int4 nlevel
int4 nlevel(ltree) - returns level of the node.
# select nlevel('Top.Child1.Child2');
Note that the arguments start, end, OFFSET and LEN are expressed as levels of
the node!
int4 index(ltree,ltree), int4 index(ltree,ltree,OFFSET)
returns the level number of the first occurrence of the second argument in the
first one, beginning from OFFSET. If OFFSET is negative, then the search begins
|OFFSET| levels from the end of the path.
SELECT index('','5.6',3);
SELECT index('','5.6',-4);
ltree text2ltree(text), text ltree2text(ltree)
cast functions for ltree and text.
ltree lca(ltree,ltree,...) (up to 8 arguments)
ltree lca(ltree[])
Returns Lowest Common Ancestor (lca)
# select lca('','');
# select lca('{la.2.3,}') is null;
cd contrib/ltree
make install
make installcheck
createdb ltreetest
psql ltreetest < /usr/local/pgsql/share/contrib/ltree.sql
psql ltreetest < ltreetest.sql
Now we have a database ltreetest populated with data describing the hierarchy
shown below:
                            Top
                         /   |  \
                 Science Hobbies Collections
                     /       |              \
            Astronomy   Amateurs_Astronomy Pictures
               /  \                            |
    Astrophysics  Cosmology                Astronomy
                                            /  |    \
                                     Galaxies Stars Astronauts
ltreetest=# select path from test where path <@ 'Top.Science';
(4 rows)
ltreetest=# select path from test where path ~ '*.Astronomy.*';
(7 rows)
ltreetest=# select path from test where path ~ '*.!pictures@.*.Astronomy.*';
(3 rows)
Full text search:
ltreetest=# select path from test where path @ 'Astro*% & !pictures@';
(4 rows)
ltreetest=# select path from test where path @ 'Astro* & !pictures@';
(3 rows)
Using Functions:
ltreetest=# select subpath(path,0,2)||'Space'||subpath(path,2) from test where path <@ 'Top.Science.Astronomy';
(3 rows)
We could create an SQL function:
CREATE FUNCTION ins_label(ltree, int4, text) RETURNS ltree
AS 'select subpath($1,0,$2) || $3 || subpath($1,$2);'
LANGUAGE SQL IMMUTABLE;
and the previous select could be rewritten as:
ltreetest=# select ins_label(path,2,'Space') from test where path <@ 'Top.Science.Astronomy';
(3 rows)
Or with different arguments:
CREATE FUNCTION ins_label(ltree, ltree, text) RETURNS ltree
AS 'select subpath($1,0,nlevel($2)) || $3 || subpath($1,nlevel($2));'
LANGUAGE SQL IMMUTABLE;
ltreetest=# select ins_label(path,'Top.Science'::ltree,'Space') from test where path <@ 'Top.Science.Astronomy';
(3 rows)
To get a better feel for the ltree module you can download
dmozltree-eng.sql.gz (a gzipped file of about 3Mb containing 300,274 nodes),
available from http://www.sai.msu.su/~megera/postgres/gist/ltree/dmozltree-eng.sql.gz,
which is the DMOZ catalogue prepared for use with ltree.
Set up your test database (dmoz), load the ltree module and issue the command:
zcat dmozltree-eng.sql.gz| psql dmoz
The data will be loaded into database dmoz and all indices will be created.
All runs were performed on my IBM ThinkPad T21 (256 MB RAM, 750 MHz) using DMOZ
data, containing 300,274 nodes (see above for download link). We used some
basic queries typical for walking through a catalog.
* Q0: Count all rows (sort of base time for comparison)
select count(*) from dmoz;
(1 row)
* Q1: Get direct children (without inheritance)
select path from dmoz where path ~ 'Top.Adult.Arts.Animation.*{1}';
(2 rows)
* Q2: The same as Q1 but with counting of successors
select path as parentpath , (select count(*)-1 from dmoz where path <@
p.path) as count from dmoz p where path ~ 'Top.Adult.Arts.Animation.*{1}';
            parentpath             | count
-----------------------------------+-------
 Top.Adult.Arts.Animation.Cartoons |     2
 Top.Adult.Arts.Animation.Anime    |    61
(2 rows)
* Q3: Get all parents
select path from dmoz where path @> 'Top.Adult.Arts.Animation' order by
path asc;
(4 rows)
* Q4: Get all parents with counting of children
select path, (select count(*)-1 from dmoz where path <@ p.path) as count
from dmoz p where path @> 'Top.Adult.Arts.Animation' order by path asc;
           path           | count
--------------------------+--------
 Top                      | 300273
 Top.Adult                |   4913
 Top.Adult.Arts           |    339
 Top.Adult.Arts.Animation |     65
(4 rows)
* Q5: Get all children with levels
select path, nlevel(path) - nlevel('Top.Adult.Arts.Animation') as level
from dmoz where path ~ 'Top.Adult.Arts.Animation.*{1,2}' order by path asc;
                      path                      | level
------------------------------------------------+-------
 Top.Adult.Arts.Animation.Anime                 |     1
 Top.Adult.Arts.Animation.Anime.Fan_Works       |     2
 Top.Adult.Arts.Animation.Anime.Games           |     2
 Top.Adult.Arts.Animation.Anime.Genres          |     2
 Top.Adult.Arts.Animation.Anime.Image_Galleries |     2
 Top.Adult.Arts.Animation.Anime.Multimedia      |     2
 Top.Adult.Arts.Animation.Anime.Resources       |     2
 Top.Adult.Arts.Animation.Anime.Titles          |     2
 Top.Adult.Arts.Animation.Cartoons              |     1
 Top.Adult.Arts.Animation.Cartoons.AVS          |     2
 Top.Adult.Arts.Animation.Cartoons.Members      |     2
(11 rows)
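+---------------------------------------------+
|Query|Rows|Time (ms) index|Time (ms) no index|
|-----+----+---------------+------------------|
|   Q0|   1|             NA|           1453.44|
|   Q1|   2|           0.49|           1001.54|
|   Q2|   2|           1.48|           3009.39|
|   Q3|   4|           0.55|            906.98|
|   Q4|   4|       24385.07|           4951.91|
|   Q5|  11|           0.85|           1003.23|
+---------------------------------------------+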
Timings without indices were obtained using operations which don't use
indices (see above).
We didn't run full-scale tests, and we haven't (yet) presented data for
operations with arrays of ltree (ltree[]) and full text searching. We'd
appreciate your input. So far, some (rather obvious) results:
* Indices do help execution of queries
* Q4 performs badly because one needs to read almost all data from the HDD
Mar 28, 2003
Added functions index(ltree,ltree,offset), text2ltree(text),
ltree2text(ltree)
Feb 7, 2003
Add ? operation
Fix ~ operation bug: eg '1.1.1' ~ '*.1'
Optimize index storage
Aug 9, 2002
Fixed very stupid but important bug :-)
July 31, 2002
Now works on 64-bit platforms.
Added function lca - lowest common ancestor
Version for 7.2 is distributed as separate package -
July 13, 2002
Initial release.
* Testing on 64-bit platforms. There are several known problems with byte
alignment; -- RESOLVED
* Better documentation;
* We plan (probably) to improve regular expressions processing using
non-deterministic automata;
* Some sort of XML support;
* Better full text searching;
The approach we use for ltree is much like the one we used in our other GiST-based
contrib modules (intarray, tsearch, tree, btree_gist, rtree_gist). Theoretical
background is available in papers referenced from our GiST development page
A hierarchical data structure (tree) is a set of nodes. Each node has a
signature (LPS) of a fixed size, which is a hashed label path of that node.
Traversing a tree we could *certainly* prune branches if
LQS (bitwise AND) LPS != LQS
where LQS is a signature of lquery or ltxtquery, obtained in the same way as
LPS.
For an array of ltree, LPS is the bitwise OR of the signatures of *ALL* children
reachable from that node. Signatures are stored in RD-tree, implemented using
GiST, which provides indexed access.
For ltree we store LPS in a B-tree, implemented using GiST. Each node entry is
represented by (left_bound, signature, right_bound), so that we could speed up
operations <, <=, =, >=, > using left_bound, right_bound and prune branches of
a tree using signature.
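The pruning rule can be sketched in Python as below. The 64-bit signature length, the use of zlib.crc32 as the label hash and the function names are assumptions made for this sketch and say nothing about the sizes or hash function the module really uses.

```python
import zlib

SIGLEN_BITS = 64  # assumed signature length, chosen only for the sketch


def signature(labels):
    """Superimpose one hashed bit per label into a fixed-size bit signature."""
    sig = 0
    for label in labels:
        sig |= 1 << (zlib.crc32(label.encode("utf-8")) % SIGLEN_BITS)
    return sig


def can_prune(lqs, lps):
    """A branch can certainly be skipped if some query bit is missing from LPS."""
    return (lqs & lps) != lqs


lps = signature(["Top", "Science", "Astronomy"])
lqs = signature(["Astronomy"])
# Every query bit is present in the path signature, so this branch must be visited.
assert not can_prune(lqs, lps)
```

Because different labels can hash to the same bit, a branch that is not pruned may still contain no match; the test only guarantees that pruned branches contain none, which is why results found through the signature still have to be rechecked.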
We ask people who find the module useful to send us a postcard to:
Moscow, 119899, Universitetski pr.13, Moscow State University, Sternberg
Astronomical Institute, Russia
For: Bartunov O.S.
Moscow, Bratislavskaya str.23, appt. 18, Russia
For: Sigaev F.G.
pg_standby README 2006/12/08 Simon Riggs
o What is pg_standby?
pg_standby allows the creation of a Warm Standby server.
It is designed to be a production-ready program, as well as a
customisable template should you require specific modifications.
Other configuration is required as well, all of which is
described in the main server manual.
The program is designed to be a wait-for restore_command,
required to turn a normal archive recovery into a Warm Standby.
Within the restore_command of the recovery.conf you could
configure pg_standby in the following way:
restore_command = 'pg_standby archiveDir %f %p %r'
which would be sufficient to define that files will be restored
from archiveDir.
o features of pg_standby
- pg_standby is written in C, so it is very portable
and easy to install.
- supports copy or link from a directory (only)
- source easy to modify, with specifically designated
sections to modify for your own needs, allowing
interfaces to be written for additional Backup Archive Restore
(BAR) systems
- portable: tested on Linux and Windows
o How to install pg_standby
$ make install
o How to use pg_standby?
pg_standby should be used within the restore_command of the
recovery.conf file. See the main PostgreSQL manual for details.
The basic usage should be like this:
restore_command = 'pg_standby archiveDir %f %p %r'
with the pg_standby command usage as
When used within the restore_command the %f and %p macros
will provide the actual file and path required for the restore/recovery.
pg_standby assumes that ARCHIVELOCATION is a directory accessible by the
server-owning user.
If RESTARTWALFILE is specified, typically by using the %r option, then all files
prior to this file will be removed from ARCHIVELOCATION. This then minimises
the number of files that need to be held, whilst at the same time maintaining
restart capability. This capability additionally assumes that the
ARCHIVELOCATION directory is writable.
o options
pg_standby allows the following command line switches:
-c
use copy/cp command to restore WAL files from archive
-d
debug/logging option.
-k numfiles
Cleanup files in the archive so that we maintain no more
than this many files in the archive. This parameter will
be silently ignored if RESTARTWALFILE is specified, since
that specification method is more accurate in determining
the correct cut-off point in archive.
You should be wary of setting this number too low,
since this may mean you cannot restart the standby. This
is because the last restartpoint marked in the WAL files
may be many files in the past and can vary considerably.
This should be set to a value exceeding the number of WAL
files that can be recovered in 2*checkpoint_timeout seconds,
according to the value in the warm standby postgresql.conf.
It is wholly unrelated to the setting of checkpoint_segments
on either primary or standby.
Setting numfiles to zero will disable deletion of files.
If in doubt, use a large value or do not set a value at all.
If you specify neither RESTARTWALFILE nor -k, then -k 0
will be assumed, i.e. keep all files in archive.
Default=0, Min=0
-l
use ln command to restore WAL files from archive
WAL files will remain in archive
Link is more efficient, but the default is copy to
allow you to maintain the WAL archive for recovery
purposes as well as high-availability.
The default setting is not necessarily recommended,
consult the main database server manual for discussion.
This option uses the Windows Vista command mklink
to provide a file-to-file symbolic link. -l will
not work on versions of Windows prior to Vista.
Use the -c option instead.
see http://en.wikipedia.org/wiki/NTFS_symbolic_link
-r maxretries
the maximum number of times to retry the restore command if it
fails. After each failure, we wait for sleeptime * num_retries
so that the wait time increases progressively, so by default
we will wait 5 secs, 10 secs then 15 secs before reporting
the failure back to the database server. This will be
interpreted as an end of recovery and the Standby will come
up fully as a result.
Default=3, Min=0
-s sleeptime
the number of seconds to sleep between testing to see
if the file to be restored is available in the archive yet.
The default setting is not necessarily recommended,
consult the main database server manual for discussion.
Default=5, Min=1, Max=60
-t triggerfile
the presence of the triggerfile will cause recovery to end
whether or not the next file is available
It is recommended that you use a structured filename to
avoid confusion as to which server is being triggered
when multiple servers exist on same system.
e.g. /tmp/pgsql.trigger.5432
-w maxwaittime
the maximum number of seconds to wait for the next file,
after which recovery will end and the Standby will come up.
A setting of zero means wait forever.
The default setting is not necessarily recommended,
consult the main database server manual for discussion.
Default=0, Min=0
Note: --help is not supported since pg_standby is not intended
for interactive use, except during dev/test
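The progressive wait described under -r can be sketched as follows. This is
only an illustration of the arithmetic in Python, not code taken from
pg_standby; the function name retry_waits is hypothetical, and the defaults
are the ones quoted above (sleeptime 5, maxretries 3).

```python
def retry_waits(sleeptime=5, maxretries=3):
    """Return the wait (in seconds) that follows each failed restore attempt.

    Follows the rule "wait for sleeptime * num_retries": the n-th failure
    is followed by a wait of n * sleeptime seconds.
    """
    if maxretries < 0:
        raise ValueError("maxretries must be >= 0")
    return [sleeptime * n for n in range(1, maxretries + 1)]


if __name__ == "__main__":
    waits = retry_waits()
    print("waits:", waits, "total seconds:", sum(waits))
```

With -r 0 the list is empty, i.e. the first failure is reported straight away.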
o examples
archive_command = 'cp %p ../archive/%f'
restore_command = 'pg_standby -l -d -k 255 -r 2 -s 2 -w 0 -t /tmp/pgsql.trigger.5442 $PWD/../archive %f %p 2>> standby.log'
which will
- use a ln command to restore WAL files from archive
- produce logfile output in standby.log
- keep the last 255 full WAL files, plus the current one
- sleep for 2 seconds between checks for whether the next WAL file is full
- never timeout if file not found
- stop waiting when a trigger file called /tmp/pgsql.trigger.5442 appears
archive_command = 'copy %p ..\\archive\\%f'
Note that backslashes need to be doubled in the archive_command, but
*not* in the restore_command, in 8.2, 8.1, 8.0 on Windows.
restore_command = 'pg_standby -c -d -s 5 -w 0 -t C:\pgsql.trigger.5442 ..\archive %f %p 2>> standby.log'
which will
- use a copy command to restore WAL files from archive
- produce logfile output in standby.log
- sleep for 5 seconds between checks for whether the next WAL file is full
- never timeout if file not found
- stop waiting when a trigger file called C:\pgsql.trigger.5442 appears
o supported versions
pg_standby is designed to work with PostgreSQL 8.2 and later. It is
currently compatible across minor changes between the way 8.3 and 8.2
operate.
PostgreSQL 8.3 provides the %r command line substitution, designed to
let pg_standby know the last file it needs to keep. If the last
parameter is omitted, no error is generated, allowing pg_standby to
function correctly with PostgreSQL 8.2 also. With PostgreSQL 8.2,
the -k option must be used if archive cleanup is required. This option
remains available in 8.3.
o reported test success
SUSE Linux 10.2
Windows XP Pro
o additional design notes
The use of a move command seems like it would be a good idea, but
this would prevent recovery from being restartable. Also, the last WAL
file is always requested twice from the archive.
trgm - Trigram matching for PostgreSQL
This module is sponsored by Delta-Soft Ltd., Moscow, Russia.
The pg_trgm contrib module provides functions and index classes
for determining the similarity of text based on trigram
matching.
Trigram (or Trigraph)
A trigram is a set of three consecutive characters taken
from a string. A string is considered to have two spaces
prefixed and one space suffixed when determining the set
of trigrams that comprise the string.
e.g. The set of trigrams in the word "cat" is "  c", " ca",
"cat" and "at ".
Public Functions
real similarity(text, text)
Returns a number that indicates how closely the two
arguments match. A zero result indicates that the two words
are completely dissimilar, and a result of one indicates that
the two words are identical.
real show_limit()
Returns the current similarity threshold used by the '%'
operator. This in effect sets the minimum similarity between
two words in order that they be considered similar enough to
be misspellings of each other, for example.
real set_limit(real)
Sets the current similarity threshold that is used by the '%'
operator, and is returned by the show_limit() function.
text[] show_trgm(text)
Returns an array of all the trigrams of the supplied text
Public Operators
text % text (returns boolean)
The '%' operator returns TRUE if its two arguments have a similarity
that is greater than the similarity threshold set by set_limit(). It
will return FALSE if the similarity is less than the current
threshold.
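To make the trigram set, similarity() and the '%' operator concrete, here is
a small Python sketch. It is not the module's C code. It assumes that
similarity is the number of shared trigrams divided by the number of distinct
trigrams in either word, it handles a single word only, and it ignores case
folding. The names trigrams, similarity and is_similar are hypothetical.

```python
def trigrams(word):
    """Trigram set of one word: two spaces are prefixed and one suffixed."""
    padded = "  " + word + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a, b):
    """Shared trigrams divided by distinct trigrams (assumed formula)."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


def is_similar(a, b, limit):
    """Stand-in for 'a % b': true when similarity exceeds the threshold."""
    return similarity(a, b) > limit


if __name__ == "__main__":
    print(sorted(trigrams("cat")))
    print(similarity("word", "words"))
    print(is_similar("word", "words", 0.3))
```

The threshold is passed explicitly here; in the module it is the value
managed by set_limit() and show_limit().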
Public Index Operator Classes
The pg_trgm module comes with an index operator class that allows a
developer to create an index over a text column for the purpose
of very fast similarity searches.
To use this index, the '%' operator must be used and an appropriate
similarity threshold for the application must be set.
CREATE TABLE test_trgm (t text);
CREATE INDEX trgm_idx ON test_trgm USING gist (t gist_trgm_ops);
At this point, you will have an index on the t text column that you
can use for similarity searching.
SELECT t, similarity(t, 'word') AS sml
FROM test_trgm
WHERE t % 'word'
ORDER BY sml DESC, t;
This will return all values in the text column that are sufficiently
similar to 'word', sorted from best match to worst. The index will
be used to make this a fast operation over very large data sets.
Tsearch2 Integration
Trigram matching is a very useful tool when used in conjunction
with a text index created by the Tsearch2 contrib module. (See
The first step is to generate an auxiliary table containing all
the unique words in the Tsearch2 index:
CREATE TABLE words AS SELECT word FROM
stat('SELECT to_tsvector(''simple'', bodytext) FROM documents');
Where 'documents' is a table that has a text field 'bodytext'
that Tsearch2 is used to search. The 'simple' dictionary is used
with the to_tsvector function, instead of just using the already
existing vector, to avoid creating a list of already stemmed
words. This way, only the original, unstemmed words are added
to the word list.
Next, create a trigram index on the word column:
CREATE INDEX words_idx ON words USING gist(word gist_trgm_ops);
or
CREATE INDEX words_idx ON words USING gin(word gin_trgm_ops);
Now, a SELECT query similar to the example above can be used to
suggest spellings for misspelled words in user search terms. A
useful extra clause is to ensure that the similar words are also
of similar length to the misspelled word.
Note: Since the 'words' table has been generated as a separate,
static table, it will need to be periodically regenerated so that
it remains up to date with the word list in the Tsearch2 index.
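The spelling-suggestion idea, including the extra "similar length" clause,
can be sketched outside the database as follows. This is an in-memory Python
illustration, not a query against the words table. The similarity formula
(shared trigrams over distinct trigrams), the threshold 0.3, the allowed
length difference of 2 and the function name suggest are all assumptions
made for the example.

```python
def trigrams(word):
    """Trigram set of one word: two spaces are prefixed and one suffixed."""
    padded = "  " + word + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a, b):
    """Shared trigrams divided by distinct trigrams (assumed formula)."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


def suggest(term, words, limit=0.3, max_len_diff=2):
    """Return candidate spellings for term, best match first."""
    candidates = []
    for word in words:
        # Extra clause: skip words whose length differs too much.
        if abs(len(word) - len(term)) > max_len_diff:
            continue
        score = similarity(term, word)
        if score > limit:
            candidates.append((score, word))
    # Highest similarity first, ties broken alphabetically.
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return [word for _, word in candidates]


if __name__ == "__main__":
    print(suggest("databse", ["database", "databases", "data", "table"]))
```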
Oleg Bartunov <oleg@sai.msu.su>, Moscow, Moscow University, Russia
Teodor Sigaev <teodor@sigaev.ru>, Moscow, Delta-Soft Ltd.,Russia
Christopher Kings-Lynne wrote this README file
Tsearch2 Development Site
GiST Development Site
$PostgreSQL: pgsql/contrib/pgbench/README.pgbench,v 1.20 2007/07/06 20:17:02 wieck Exp $
pgbench README
o What is pgbench?
pgbench is a simple program to run a benchmark test. pgbench is a
client application of PostgreSQL and runs with PostgreSQL only. It
performs lots of small and simple transactions including
SELECT/UPDATE/INSERT operations, then calculates the number of
transactions successfully completed within a second (transactions
per second, tps). Targeting data includes a table with at least 100k
tuples.
Example outputs from pgbench look like:
number of clients: 4
number of transactions per client: 100
number of processed transactions: 400/400
tps = 19.875015(including connections establishing)
tps = 20.098827(excluding connections establishing)
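The tps figure is simply the number of processed transactions divided by the
elapsed time. A minimal Python sketch of that arithmetic follows; the
function name tps and the elapsed time used below are made up for
illustration and are not taken from a real run. The working assumption is
that the two reported figures differ only in whether the time spent
establishing connections is counted in the elapsed time.

```python
def tps(clients, transactions_per_client, elapsed_seconds):
    """Transactions per second for a run in which every transaction completed."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return clients * transactions_per_client / elapsed_seconds


if __name__ == "__main__":
    # 4 clients x 100 transactions finishing in a hypothetical 20 seconds.
    print("tps = %f" % tps(4, 100, 20.0))
```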
A similar program called "JDBCBench" already exists, but it requires
Java, which may not be available on every platform. Moreover, some
people are concerned that the overhead of Java might lead to
inaccurate results. So I decided to write it in pure C, and named
it "pgbench."
o features of pgbench
- pgbench is written in C using libpq only. So it is very portable
and easy to install.
- pgbench can simulate concurrent connections using asynchronous
capability of libpq. No threading is required.
o How to install pgbench
$ make install
o How to use pgbench?
(1) (optional) Initialize the database by:
pgbench -i <dbname>
where <dbname> is the name of the database. pgbench uses four tables:
accounts, branches, history and tellers. Any existing tables with
these names will be destroyed, so be very careful if you have tables
with the same names. Default test data contains:
table # of tuples
branches 1
tellers 10
accounts 100000
history 0
You can increase the number of tuples by using the -s option. The branches,
tellers and accounts tables are created with a fillfactor which is
set using the -F option. See below.
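How the table sizes grow with the scaling factor can be sketched in Python.
The base counts are the default test data listed above; the function name
rows_for_scale is hypothetical.

```python
# Default test data at scaling factor 1, as listed above.
BASE_ROWS = {"branches": 1, "tellers": 10, "accounts": 100000, "history": 0}


def rows_for_scale(scale):
    """Number of tuples created in each table by 'pgbench -i -s <scale>'."""
    if scale < 1:
        raise ValueError("scale must be >= 1")
    return {table: count * scale for table, count in BASE_ROWS.items()}


if __name__ == "__main__":
    for table, count in rows_for_scale(100).items():
        print(table, count)
```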
(2) Run the benchmark test
pgbench <dbname>
The default configuration is:
number of clients: 1
number of transactions per client: 10
o options
pgbench has a number of options.
-h hostname
hostname where the backend is running. If this option
is omitted, pgbench will connect to the localhost via
Unix domain socket.
-p port
the port number on which the backend is accepting connections.
default is libpq's default, usually 5432.
-c number_of_clients
Number of clients simulated. default is 1.
-t number_of_transactions
Number of transactions each client runs. default is 10.
-s scaling_factor
this should be used with -i (initialize) option.
number of tuples generated will be multiple of the
scaling factor. For example, -s 100 will imply 10M
(10,000,000) tuples in the accounts table.
default is 1. NOTE: scaling factor should be at least
as large as the largest number of clients you intend
to test; else you'll mostly be measuring update contention.
Regular (not initializing) runs using one of the
built-in tests will detect scale based on the number of
branches in the database. For custom (-f) runs it can
be manually specified with this parameter.
-D varname=value
Define a variable. It can be referred to by a script
provided by using the -f option. Multiple -D options are allowed.
-U login
Specify db user's login name if it is different from
the Unix login name.
-P password
Specify the db password. CAUTION: using this option
might be a security hole since ps command will
show the password. Use this for TESTING PURPOSE ONLY.
-n
No vacuuming and cleaning the history table prior to the
test is performed.
-v
Do vacuuming before testing. This will take some time.
With neither -n nor -v, pgbench will vacuum tellers and
branches tables only.
-S
Perform select only transactions instead of TPC-B.
-N
Do not update "branches" and "tellers". This will
avoid heavy update contention on branches and tellers,
but pgbench will then no longer run TPC-B like
transactions.
-f filename
Read transaction script from file. Detailed
explanation will appear later.
-C
Establish connection for each transaction, rather than
doing it just once at beginning of pgbench in the normal
mode. This is useful to measure the connection overhead.
-l
Write the time taken by each transaction to a logfile,
with the name "pgbench_log.xxx", where xxx is the PID
of the pgbench process. The format of the log is:
client_id transaction_no time file_no time-epoch time-us
where time is measured in microseconds, the file_no is
which test file was used (useful when multiple were
specified with -f), and time-epoch/time-us are a
UNIX epoch format timestamp followed by an offset
in microseconds (suitable for creating an ISO 8601
timestamp with a fraction of a second) of when
the transaction completed.
Here are example outputs:
0 199 2241 0 1175850568 995598
0 200 2465 0 1175850568 998079
0 201 2513 0 1175850569 608
0 202 2038 0 1175850569 2663
-F fillfactor
Create tables (accounts, tellers and branches) with the given
fillfactor. Default is 100. This should be used with the -i
(initialize) option.
-d
debug option.
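The per-transaction log format described above (six whitespace-separated
integers per line) can be read back with a few lines of Python. This is a
sketch for post-processing a log file, not part of pgbench; the names
LogRecord, parse_log_line and completed_at are hypothetical.

```python
from collections import namedtuple
from datetime import datetime, timezone

# Field order follows the log format:
# client_id transaction_no time file_no time-epoch time-us
LogRecord = namedtuple(
    "LogRecord", "client_id transaction_no time_us file_no epoch epoch_us"
)


def parse_log_line(line):
    """Split one log line into its six integer fields."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError("expected 6 fields, got %d" % len(fields))
    return LogRecord(*(int(field) for field in fields))


def completed_at(record):
    """Combine time-epoch and time-us into a UTC timestamp."""
    moment = datetime.fromtimestamp(record.epoch, tz=timezone.utc)
    return moment.replace(microsecond=record.epoch_us)


if __name__ == "__main__":
    rec = parse_log_line("0 199 2241 0 1175850568 995598")
    print(rec.time_us, "microseconds, completed at", completed_at(rec).isoformat())
```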
o What is the "transaction" actually performed in pgbench?
(1) begin;
(2) update accounts set abalance = abalance + :delta where aid = :aid;
(3) select abalance from accounts where aid = :aid;
(4) update tellers set tbalance = tbalance + :delta where tid = :tid;
(5) update branches set bbalance = bbalance + :delta where bid = :bid;
(6) insert into history(tid,bid,aid,delta) values(:tid,:bid,:aid,:delta);
(7) end;
If you specify -N, (4) and (5) aren't included in the transaction.
o -f option
This supports reading a transaction script from a specified
file. This file should include one SQL command per line. SQL
commands spanning multiple lines are not supported. Empty lines
and lines beginning with "--" will be ignored.
Multiple -f options are allowed. In this case each transaction is
assigned a randomly chosen script.
SQL commands can include a "meta command", which begins with "\" (back
slash). A meta command takes some arguments separated by white
space. Currently the following meta commands are supported:
\set name operand1 [ operator operand2 ]
set the calculated value using "operand1" "operator"
"operand2" to variable "name". If "operator" and "operand2"
are omitted, the value of operand1 is set to variable "name".
\set ntellers 10 * :scale
\setrandom name min max
assign random integer to name between min and max
\setrandom aid 1 100000
Variables can be referred to in SQL commands by adding ":" in front
of the variable name.
SELECT abalance FROM accounts WHERE aid = :aid
Variables can also be defined by using -D option.
\sleep num [us|ms|s]
causes script execution to sleep for the specified duration of
microseconds (us), milliseconds (ms) or the default seconds (s).
\setrandom millisec 1000 2500
\sleep :millisec ms
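The behaviour of \set, \setrandom and ":" variable references can be
illustrated with a toy interpreter in Python. This is a sketch only, not
pgbench's parser: it assumes integer operands, assumes the operators +, -, *
and /, leaves unknown variable references untouched, and does not cover
\sleep. The names substitute and run_meta are hypothetical.

```python
import random
import re

# ":name" not preceded by another ":" (so '::type' casts are left alone).
VAR_RE = re.compile(r"(?<!:):(\w+)")


def substitute(text, variables):
    """Replace every :name in text with the value of variable name."""
    def replace(match):
        name = match.group(1)
        return str(variables[name]) if name in variables else match.group(0)
    return VAR_RE.sub(replace, text)


def run_meta(line, variables, rng=random):
    """Execute one \\set or \\setrandom line, updating variables in place."""
    words = substitute(line, variables).split()
    command, args = words[0], words[1:]
    if command == r"\set" and len(args) in (2, 4):
        value = int(args[1])
        if len(args) == 4:
            operator, operand2 = args[2], int(args[3])
            if operator == "+":
                value += operand2
            elif operator == "-":
                value -= operand2
            elif operator == "*":
                value *= operand2
            elif operator == "/":
                value //= operand2  # floor division; fine for non-negative values
            else:
                raise ValueError("unsupported operator: " + operator)
        variables[args[0]] = value
    elif command == r"\setrandom" and len(args) == 3:
        variables[args[0]] = rng.randint(int(args[1]), int(args[2]))
    else:
        raise ValueError("unsupported meta command: " + line)


if __name__ == "__main__":
    variables = {"scale": 1}
    run_meta(r"\set ntellers 10 * :scale", variables)
    run_meta(r"\setrandom aid 1 100000", variables)
    print(substitute("SELECT abalance FROM accounts WHERE aid = :aid", variables))
```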
For example, a TPC-B like benchmark can be defined as follows (scaling
factor = 1):
\set nbranches :scale
\set ntellers 10 * :scale
\set naccounts 100000 * :scale
\setrandom aid 1 :naccounts
\setrandom bid 1 :nbranches
\setrandom tid 1 :ntellers
\setrandom delta 1 10000
UPDATE accounts SET abalance = abalance + :delta WHERE aid = :aid
SELECT abalance FROM accounts WHERE aid = :aid
UPDATE tellers SET tbalance = tbalance + :delta WHERE tid = :tid
UPDATE branches SET bbalance = bbalance + :delta WHERE bid = :bid
INSERT INTO history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, 'now')
If you want to automatically set the scaling factor from the number of
tuples in the branches table, use the -s option and a shell command like this:
pgbench -s $(psql -At -c "SELECT count(*) FROM branches") -f tpc_b.sql
Notice that the -f option does not vacuum or clear the history
table before starting the benchmark.
o License?
Basically it is the same as the BSD license. See pgbench.c for more details.
o History before contributed to PostgreSQL
2000/1/15 pgbench-1.2 contributed to PostgreSQL
* Add -v option
1999/09/29 pgbench-1.1 released
* Apply cygwin patches contributed by Yutaka Tanida
* More robust when backends die
* Add -S option (select only)
1999/09/04 pgbench-1.0 released
$PostgreSQL: pgsql/contrib/pgrowlocks/README.pgrowlocks,v 1.2 2007/08/27 00:13:51 tgl Exp $
pgrowlocks README Tatsuo Ishii
1. What is pgrowlocks?
pgrowlocks shows row locking information for a specified table.
pgrowlocks returns the following columns:
locked_row TID, -- row TID
lock_type TEXT, -- lock type
locker XID, -- locking XID
multi bool, -- multi XID?
xids xid[], -- multi XIDs
pids INTEGER[] -- locker's process id
Here is a sample execution of pgrowlocks:
test=# SELECT * FROM pgrowlocks('t1');
locked_row | lock_type | locker | multi | xids | pids
(0,1) | Shared | 19 | t | {804,805} | {29066,29068}
(0,2) | Shared | 19 | t | {804,805} | {29066,29068}
(0,3) | Exclusive | 804 | f | {804} | {29066}
(0,4) | Exclusive | 804 | f | {804} | {29066}
(4 rows)
locked_row -- tuple ID (TID) of each locked row
lock_type -- "Shared" for shared lock, "Exclusive" for exclusive lock
locker -- transaction ID of locker (note 1)
multi -- "t" if locker is a multi transaction, otherwise "f"
xids -- XIDs of lockers (note 2)
pids -- process ids of locking backends
note1: if the locker is multi transaction, it represents the multi ID
note2: if the locker is multi, multiple data are shown
2. Installing pgrowlocks
Installing pgrowlocks requires a PostgreSQL 8.0 or later source tree.
$ cd /usr/local/src/postgresql-8.1/contrib
$ tar xfz /tmp/pgrowlocks-1.0.tar.gz
If you are using PostgreSQL 8.0, you need to modify pgrowlocks source code.
Around line 61, you will see:
change this to:
$ make
$ make install
$ psql -e -f pgrowlocks.sql test
3. How to use pgrowlocks
pgrowlocks grabs an AccessShareLock on the target table and reads each
row one by one to get the row locking information. You should
notice that:
1) if the table is exclusive locked by someone else, pgrowlocks
will be blocked.
2) pgrowlocks may show incorrect information if a new lock is
taken or a lock is freed during its execution.
pgrowlocks does not show the contents of locked rows. If you want
to take a look at the row contents at the same time, you could do
something like this:
SELECT * FROM accounts AS a, pgrowlocks('accounts') AS p WHERE p.locked_row = a.ctid;
4. License
pgrowlocks is distributed under the (modified) BSD license described in
the source file.
5. History
2006/03/21 pgrowlocks version 1.1 released (tested on 8.2 current)
2005/08/22 pgrowlocks version 1.0 released
pgstattuple README 2002/08/29 Tatsuo Ishii
1. Functions supported:
pgstattuple() returns the relation length, percentage of the "dead"
tuples of a relation and other info. This may help users to determine
whether vacuum is necessary or not. Here is an example session:
test=> \x
Expanded display is on.
test=> SELECT * FROM pgstattuple('pg_catalog.pg_proc');
-[ RECORD 1 ]------+-------
table_len | 458752
tuple_count | 1470
tuple_len | 438896
tuple_percent | 95.67
dead_tuple_count | 11
dead_tuple_len | 3157
dead_tuple_percent | 0.69
free_space | 8932
free_percent | 1.95
Here are explanations for each column:
table_len -- physical relation length in bytes
tuple_count -- number of live tuples
tuple_len -- total tuples length in bytes
tuple_percent -- live tuples in %
dead_tuple_count -- number of dead tuples
dead_tuple_len -- total dead tuples length in bytes
dead_tuple_percent -- dead tuples in %
free_space -- free space in bytes
free_percent -- free space in %
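The three percentage columns in the example session are consistent with each
length being taken as a share of table_len. The Python sketch below checks
that reading against the sample figures; it is an inference from the sample
output, not a description of the C code, and the function name percentages
is hypothetical.

```python
def percentages(table_len, tuple_len, dead_tuple_len, free_space):
    """Express each length as a percentage of table_len, to 2 decimals."""
    def percent(part):
        return round(100.0 * part / table_len, 2) if table_len else 0.0

    return {
        "tuple_percent": percent(tuple_len),
        "dead_tuple_percent": percent(dead_tuple_len),
        "free_percent": percent(free_space),
    }


if __name__ == "__main__":
    # Figures from the sample session above.
    print(percentages(458752, 438896, 3157, 8932))
```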
pg_relpages() returns the number of pages in the relation.
pgstatindex() returns an array showing the information about an index:
test=> \x
Expanded display is on.
test=> SELECT * FROM pgstatindex('pg_cast_oid_index');
-[ RECORD 1 ]------+------
version | 2
tree_level | 0
index_size | 8192
root_block_no | 1
internal_pages | 0
leaf_pages | 1
empty_pages | 0
deleted_pages | 0
avg_leaf_density | 50.27
leaf_fragmentation | 0
2. Installing pgstattuple
$ make
$ make install
$ psql -e -f /usr/local/pgsql/share/contrib/pgstattuple.sql test
3. Using pgstattuple
pgstattuple may be called as a relation function and is
defined as follows:
CREATE OR REPLACE FUNCTION pgstattuple(text) RETURNS pgstattuple_type
AS 'MODULE_PATHNAME', 'pgstattuple'
CREATE OR REPLACE FUNCTION pgstattuple(oid) RETURNS pgstattuple_type
AS 'MODULE_PATHNAME', 'pgstattuplebyid'
The argument is the relation name (optionally it may be qualified)
or the OID of the relation. Note that pgstattuple only returns
one row.
4. Notes
pgstattuple acquires only a read lock on the relation, so concurrent
updates may affect the result.
pgstattuple judges a tuple to be "dead" if HeapTupleSatisfiesNow()
returns false.
5. History
Moved page-level functions to contrib/pageinspect.
Extended to work against indexes.
This directory contains the code for the user-defined type,
SEG, representing laboratory measurements as floating point
intervals.
The geometry of measurements is usually more complex than that of a
point in a numeric continuum. A measurement is usually a segment of
that continuum with somewhat fuzzy limits. The measurements come out
as intervals because of uncertainty and randomness, as well as because
the value being measured may naturally be an interval indicating some
condition, such as the temperature range of stability of a protein.
Using just common sense, it appears more convenient to store such data
as intervals, rather than pairs of numbers. In practice, it even turns
out more efficient in most applications.
Further along the line of common sense, the fuzziness of the limits
suggests that the use of traditional numeric data types leads to a
certain loss of information. Consider this: your instrument reads
6.50, and you input this reading into the database. What do you get
when you fetch it? Watch:
test=> select 6.50 as "pH";
(1 row)
In the world of measurements, 6.50 is not the same as 6.5. It may
sometimes be critically different. The experimenters usually write
down (and publish) the digits they trust. 6.50 is actually a fuzzy
interval contained within a bigger and even fuzzier interval, 6.5,
with their center points being (probably) the only common feature they
share. We definitely do not want such different data items to appear the
same.
Conclusion? It is nice to have a special data type that can record the
limits of an interval with arbitrarily variable precision. Variable in
a sense that each data element records its own precision.
Check this out:
test=> select '6.25 .. 6.50'::seg as "pH";
6.25 .. 6.50
(1 row)
Makefile building instructions for the shared library
README.seg the file you are now reading
seg.c the implementation of this data type in c
seg.sql.in SQL code needed to register this type with postgres
(transformed to seg.sql by make)
segdata.h the data structure used to store the segments
segparse.y the grammar file for the parser (used by seg_in() in seg.c)
segscan.l scanner rules (used by seg_yyparse() in segparse.y)
seg-validate.pl a simple input validation script. It is probably a
little stricter than the type itself: for example,
it rejects '22 ' because of the trailing space. Use
as a filter to discard bad values from a single column;
redirect to /dev/null to see the offending input
sort-segments.pl a script to sort the tables having a SEG type column
To install the type, run
make install
The user running "make install" may need root access, depending on how you
configured the PostgreSQL installation paths.
This only installs the type implementation and documentation. To make the
type available in any particular database, do
psql -d databasename < seg.sql
If you install the type in the template1 database, all subsequently created
databases will inherit it.
To test the new type, after "make install" do
make installcheck
If it fails, examine the file regression.diffs to find out the reason (the
test code is a direct adaptation of the regression tests from the main
source tree).
The external representation of an interval is formed using one or two
floating point numbers joined by the range operator ('..' or '...').
Optional certainty indicators (<, > and ~) are ignored by the internal
logic, but are retained in the data.
rule 1 seg -> boundary PLUMIN deviation
rule 2 seg -> boundary RANGE boundary
rule 3 seg -> boundary RANGE
rule 4 seg -> RANGE boundary
rule 5 seg -> boundary
rule 6 boundary -> FLOAT
rule 7 boundary -> EXTENSION FLOAT
rule 8 deviation -> FLOAT
RANGE (\.\.)(\.)?
PLUMIN \'\+\-\'
integer [+-]?[0-9]+
real [+-]?[0-9]+\.[0-9]+
FLOAT ({integer}|{real})([eE]{integer})?
Examples of valid SEG representations:
Any number (rules 5,6) -- creates a zero-length segment (a point,
if you will)
~5.0 (rules 5,7) -- creates a zero-length segment AND records
'~' in the data. This notation reads 'approximately 5.0',
but its meaning is not recognized by the code. It is ignored
until you get the value back. View it as a short-hand comment.
<5.0 (rules 5,7) -- creates a point at 5.0; '<' is ignored but
is preserved as a comment
>5.0 (rules 5,7) -- creates a point at 5.0; '>' is ignored but
is preserved as a comment
5(+-)0.3 (rules 1,8) -- creates an interval '4.7..5.3'. As of this
writing (02/09/2000), this mechanism isn't completely accurate
in determining the number of significant digits for the
boundaries. For example, it adds an extra digit to the lower
boundary if the resulting interval includes a power of ten:
postgres=> select '10(+-)1'::seg as seg;
9.0 .. 11 -- should be: 9 .. 11
Also, the (+-) notation is not preserved: 'a(+-)b' will
always be returned as '(a-b) .. (a+b)'. The purpose of this
notation is to allow input from certain data sources without
conversion.
50 .. (rule 3) -- everything that is greater than or equal to 50
.. 0 (rule 4) -- everything that is less than or equal to 0
1.5e-2 .. 2E-2 (rule 2) -- creates an interval (0.015 .. 0.02)
1 ... 2 The same as 1...2, or 1 .. 2, or 1..2 (space is ignored).
Because of the widespread use of '...' in the data sources,
I decided to stick to it as a range operator. This, and
also the fact that the white space around the range operator
is ignored, creates a parsing conflict with numeric constants
starting with a decimal point.
Examples of invalid SEG input:
.1e7 should be: 0.1e7
.1 .. .2 should be: 0.1 .. 0.2
2.4 E4 should be: 2.4E4
The following, although it is not a syntax error, is disallowed to improve
the sanity of the data:
5 .. 2 should be: 2 .. 5
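The external representation can be approximated with regular expressions.
The Python sketch below follows the grammar rules and the valid and invalid
examples above, but it is not the module's parser (segparse.y and segscan.l):
it discards the certainty indicators and the number of significant digits,
which the real type preserves, and it assumes that whitespace is allowed
around every token. The name parse_seg is hypothetical; None stands for an
open end.

```python
import re

FLOAT = r"[+-]?[0-9]+(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?"
BOUND = rf"[<>~]?\s*({FLOAT})"  # optional certainty indicator, then a number

PLUMIN_RE = re.compile(rf"^\s*{BOUND}\s*\(\+-\)\s*({FLOAT})\s*$")
POINT_RE = re.compile(rf"^\s*{BOUND}\s*$")
RANGE_RE = re.compile(rf"^\s*(?:{BOUND})?\s*\.\.\.?\s*(?:{BOUND})?\s*$")


def parse_seg(text):
    """Return (lower, upper) for a SEG literal, or raise ValueError."""
    match = PLUMIN_RE.match(text)
    if match:  # rule 1: boundary (+-) deviation
        center, deviation = float(match.group(1)), float(match.group(2))
        return (center - deviation, center + deviation)
    match = POINT_RE.match(text)
    if match:  # rule 5: a single boundary is a zero-length segment
        value = float(match.group(1))
        return (value, value)
    match = RANGE_RE.match(text)
    if match and (match.group(1) is not None or match.group(2) is not None):
        lower = float(match.group(1)) if match.group(1) is not None else None
        upper = float(match.group(2)) if match.group(2) is not None else None
        if lower is not None and upper is not None and lower > upper:
            raise ValueError("lower boundary exceeds upper: %r" % text)
        return (lower, upper)  # rules 2, 3 and 4
    raise ValueError("invalid SEG input: %r" % text)


if __name__ == "__main__":
    for literal in ["~5.0", "5(+-)0.3", "50 ..", ".. 0", "1.5e-2 .. 2E-2", "1...2"]:
        print(repr(literal), "->", parse_seg(literal))
```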
The segments are stored internally as pairs of 32-bit floating point
numbers. It means that the numbers with more than 7 significant digits
will be truncated.
The numbers with less than or exactly 7 significant digits retain their
original precision. That is, if your query returns 0.00, you will be
sure that the trailing zeroes are not the artifacts of formatting: they
reflect the precision of the original data. The number of leading
zeroes does not affect precision: the value 0.0067 is considered to
have just 2 significant digits.
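Both points can be tried out in Python: counting significant digits in the
textual form of a number, and the loss that occurs when a value with more
than 7 significant digits is squeezed into a 32-bit float. This is an
illustration only; the function names are hypothetical, and inputs that
consist solely of zeroes (such as 0.00) are not treated specially by this
sketch.

```python
import struct


def significant_digits(text):
    """Count significant digits in a decimal literal; leading zeroes do not count."""
    mantissa = text.lower().split("e")[0].lstrip("+-")
    digits = mantissa.replace(".", "")
    return len(digits.lstrip("0"))


def as_float32(value):
    """Round-trip a Python float through a 32-bit float."""
    return struct.unpack("f", struct.pack("f", value))[0]


if __name__ == "__main__":
    print(significant_digits("0.0067"), significant_digits("6.50"))
    print(as_float32(1.23456789))
```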
The access method for SEG is a GiST index (gist_seg_ops), which is a
generalization of R-tree. GiSTs allow the postgres implementation of
R-tree, originally encoded to support 2-D geometric types such as
boxes and polygons, to be used with any data type whose data domain
can be partitioned using the concepts of containment, intersection and
equality. In other words, everything that can intersect or contain
its own kind can be indexed with a GiST. That includes, among other
things, all geometric data types, regardless of their dimensionality
(see also contrib/cube).
The operators supported by the GiST access method include:
[a, b] << [c, d] Is left of
The left operand, [a, b], occurs entirely to the left of the
right operand, [c, d], on the axis (-inf, inf). It means,
[a, b] << [c, d] is true if b < c and false otherwise
[a, b] >> [c, d] Is right of
[a, b] occurs entirely to the right of [c, d].
[a, b] >> [c, d] is true if a > d and false otherwise
[a, b] &< [c, d] Overlaps or is left of
This might be better read as "does not extend to right of".
It is true when b <= d.
[a, b] &> [c, d] Overlaps or is right of
This might be better read as "does not extend to left of".
It is true when a >= c.
[a, b] = [c, d] Same as
The segments [a, b] and [c, d] are identical, that is, a == c
and b == d
[a, b] && [c, d] Overlaps
The segments [a, b] and [c, d] overlap.
[a, b] @> [c, d] Contains
The segment [a, b] contains the segment [c, d], that is,
a <= c and b >= d
[a, b] <@ [c, d] Contained in
The segment [a, b] is contained in [c, d], that is,
a >= c and b <= d
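The operator definitions translate directly into comparisons on the
endpoints. The Python sketch below mirrors them for segments with two finite
boundaries; "same as" compares corresponding endpoints. The test used for
"overlaps" (closed intervals sharing at least one point) is an assumption,
since the text above does not spell it out, and the Seg tuple and function
names are hypothetical.

```python
from collections import namedtuple

Seg = namedtuple("Seg", "lower upper")


def left_of(x, y):       # x << y
    return x.upper < y.lower


def right_of(x, y):      # x >> y
    return x.lower > y.upper


def over_left(x, y):     # x &< y : does not extend to the right of y
    return x.upper <= y.upper


def over_right(x, y):    # x &> y : does not extend to the left of y
    return x.lower >= y.lower


def same(x, y):          # x = y
    return x.lower == y.lower and x.upper == y.upper


def overlaps(x, y):      # x && y (assumed: closed intervals share a point)
    return x.lower <= y.upper and y.lower <= x.upper


def contains(x, y):      # x @> y
    return x.lower <= y.lower and x.upper >= y.upper


def contained_in(x, y):  # x <@ y
    return contains(y, x)


if __name__ == "__main__":
    a, b = Seg(1.0, 3.0), Seg(2.0, 5.0)
    print(left_of(a, b), overlaps(a, b), over_left(a, b), contains(b, a))
```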
(Before PostgreSQL 8.2, the containment operators @> and <@ were
respectively called @ and ~. These names are still available, but are
deprecated and will eventually be retired. Notice that the old names
are reversed from the convention formerly followed by the core geometric
data types.)
Although the mnemonics of the following operators is questionable, I
preserved them to maintain visual consistency with other geometric
data types defined in Postgres.
Other operators:
[a, b] < [c, d] Less than
[a, b] > [c, d] Greater than
These operators do not make a lot of sense for any practical
purpose but sorting. These operators first compare (a) to (c),
and if these are equal, compare (b) to (d). That accounts for
reasonably good sorting in most cases, which is useful if
you want to use ORDER BY with this type
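The ordering rule for < and > (compare the lower boundaries first, then the
upper boundaries on a tie) is the same as lexicographic ordering of
(lower, upper) pairs. A Python sketch, with the hypothetical name
sort_segments and segments given as plain tuples:

```python
def sort_segments(segments):
    """Sort (lower, upper) pairs by lower boundary, then by upper boundary."""
    return sorted(segments, key=lambda seg: (seg[0], seg[1]))


if __name__ == "__main__":
    print(sort_segments([(2, 5), (1, 9), (1, 3)]))
```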
There are a few other potentially useful functions defined in seg.c
that vanished from the schema because I stopped using them. Some of
these were meant to support type casting. Let me know if I was wrong:
I will then add them back to the schema. I would also appreciate
other ideas that would enhance the type and make it more useful.
For examples of usage, see sql/seg.sql
NOTE: The performance of an R-tree index can largely depend on the
order of input values. It may be very helpful to sort the input table
on the SEG column (see the script sort-segments.pl for an example)
My thanks are primarily to Prof. Joe Hellerstein
(http://db.cs.berkeley.edu/~jmh/) for elucidating the gist of the GiST
(http://gist.cs.berkeley.edu/). I am also grateful to all postgres
developers, present and past, for enabling myself to create my own
world and live undisturbed in it. And I would like to acknowledge my
gratitude to Argonne Lab and to the U.S. Department of Energy for the
years of faithful support of my database research.
Gene Selkov, Jr.
Computational Scientist
Mathematics and Computer Science Division
Argonne National Laboratory
9700 S Cass Ave.
Building 221
Argonne, IL 60439-4844
sslinfo - information about current SSL certificate for PostgreSQL
Author: Victor Wagner <vitus@cryptocom.ru>, Cryptocom LTD
E-Mail of Cryptocom OpenSSL development group: <openssl@cryptocom.ru>
1. Notes
This extension won't build unless your PostgreSQL server is configured
with --with-openssl. Information provided with these functions would
be completely useless if you don't use SSL to connect to database.
2. Functions Description
2.1. ssl_is_used()
ssl_is_used() RETURNS boolean;
Returns TRUE if the current connection to the server uses SSL, and FALSE
otherwise.
2.2. ssl_client_cert_present()
ssl_client_cert_present() RETURNS boolean
Returns TRUE if the current client has presented a valid SSL client
certificate to the server and FALSE otherwise (e.g., no SSL, or the
certificate hadn't been requested by the server).
2.3. ssl_client_serial()
ssl_client_serial() RETURNS numeric
Returns the serial number of the current client certificate. The combination
of certificate serial number and certificate issuer is guaranteed to
uniquely identify a certificate (but not its owner -- the owner ought to
regularly change his keys, and get new certificates from the issuer).
So, if you run your own CA and allow only certificates from this CA to
be accepted by the server, the serial number is the most reliable (albeit
not very mnemonic) means to identify a user.
2.4. ssl_client_dn()
ssl_client_dn() RETURNS text
Returns the full subject of current client certificate, converting
character data into the current database encoding. It is assumed that
if you use non-Latin characters in the certificate names, your
database is able to represent these characters, too. If your database
uses the SQL_ASCII encoding, non-Latin characters in the name will be
represented as UTF-8 sequences.
The result looks like '/CN=Somebody /C=Some country/O=Some organization'.
2.5. ssl_issuer_dn()
Returns the full issuer name of the client certificate, converting
character data into current database encoding.
The combination of the return value of this function with the
certificate serial number uniquely identifies the certificate.
The result of this function is really useful only if you have more
than one trusted CA certificate in your server's root.crt file, or if
this CA has issued some intermediate certificate authority
certificates.
2.6. ssl_client_dn_field()
ssl_client_dn_field(fieldName text) RETURNS text
This function returns the value of the specified field in the
certificate subject. Field names are string constants that are
converted into ASN1 object identifiers using the OpenSSL object
database. The following values are acceptable:
commonName (alias CN)
surname (alias SN)
givenName (alias GN)
countryName (alias C)
localityName (alias L)
stateOrProvinceName (alias ST)
organizationName (alias O)
organizationalUnitName (alias OU)
All of these fields are optional, except commonName. It depends
entirely on your CA policy which of them are included and which
are not. The meaning of these fields, however, is strictly defined by
the X.500 and X.509 standards, so you cannot just assign arbitrary
meaning to them.
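To make the alias rule concrete, here is a minimal Python sketch of a field lookup on a subject string of the shape returned by ssl_client_dn(). It only mimics the behaviour on text; the module itself asks OpenSSL for the field. The helper names parse_dn and dn_field are hypothetical, and the parser assumes that values contain no '/' and that no field is repeated.

```python
# Long names and their aliases, as listed above (OpenSSL spells the last
# one "organizationalUnitName").
ALIASES = {
    "CN": "commonName",
    "SN": "surname",
    "GN": "givenName",
    "C": "countryName",
    "L": "localityName",
    "ST": "stateOrProvinceName",
    "O": "organizationName",
    "OU": "organizationalUnitName",
}


def parse_dn(dn):
    """Split '/CN=a/C=b' into {'commonName': 'a', 'countryName': 'b'}.

    Naive by design: assumes no '/' inside values and no repeated fields.
    """
    fields = {}
    for part in dn.strip("/").split("/"):
        short, _, value = part.partition("=")
        fields[ALIASES.get(short, short)] = value
    return fields


def dn_field(dn, field_name):
    """Accept either the long name or its alias; return None if absent."""
    long_name = ALIASES.get(field_name, field_name)
    return parse_dn(dn).get(long_name)


dn = "/CN=Somebody/C=Some country/O=Some organization"
print(dn_field(dn, "CN"))                # Somebody
print(dn_field(dn, "organizationName"))  # Some organization
print(dn_field(dn, "OU"))                # None
```

A field that the CA did not include simply comes back empty, which is why application code should not rely on anything but commonName being present.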
2.7. ssl_issuer_field()
ssl_issuer_field(fieldName text) RETURNS text
Does the same as ssl_client_dn_field, but for the certificate issuer
rather than the certificate subject.
UUID Generation Functions
Peter Eisentraut <peter_e@gmx.net>
This module provides functions to generate universally unique
identifiers (UUIDs) using one of several standard algorithms, as
well as functions to produce certain special UUID constants.
The extra library required can be found at
UUID Generation
The relevant standards ITU-T Rec. X.667, ISO/IEC 9834-8:2005, and RFC
4122 specify four algorithms for generating UUIDs, identified by the
version numbers 1, 3, 4, and 5. (There is no version 2 algorithm.)
Each of these algorithms could be suitable for a different set of
applications.
uuid_generate_v1()
This function generates a version 1 UUID. This involves the MAC
address of the computer and a time stamp. Note that UUIDs of this
kind reveal the identity of the computer that created the identifier
and the time at which it did so, which might make it unsuitable for
certain security-sensitive applications.
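The privacy concern can be seen with any version 1 generator. The following sketch uses Python's standard uuid module rather than this module's SQL function, so it is only an illustration of what a version 1 value carries, not of the server-side implementation.

```python
import uuid
from datetime import datetime, timedelta, timezone

u = uuid.uuid1()

# Version 1 timestamps count 100-nanosecond intervals since the start of
# the Gregorian calendar.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)
created = GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)

print(u.version)            # 1
print(f"{u.node:012x}")     # 48-bit node field, normally the MAC address
print(created.isoformat())  # moment at which the UUID was generated
```

Anyone who sees the identifier can recover both values in the same way.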
uuid_generate_v1mc()
This function generates a version 1 UUID but uses a random multicast
MAC address instead of the real MAC address of the computer.
uuid_generate_v3(namespace uuid, name text)
This function generates a version 3 UUID in the given namespace using
the specified input name. The namespace should be one of the special
constants produced by the uuid_ns_*() functions shown below. (It
could be any UUID in theory.) The name is an identifier in the
selected namespace. For example:
uuid_generate_v3(uuid_ns_url(), 'http://www.postgresql.org')
The name parameter will be MD5-hashed, so the cleartext cannot be
derived from the generated UUID.
The generation of UUIDs by this method has no random or
environment-dependent element and is therefore reproducible.
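As an illustration of that reproducibility, the sketch below builds a version 3 UUID by hand and compares it with Python's standard uuid.uuid3(). It assumes the RFC 4122 construction (MD5 over the namespace bytes followed by the name); it is not the module's own code.

```python
import hashlib
import uuid

name = "http://www.postgresql.org"

# Library call corresponding to the SQL example above.
u = uuid.uuid3(uuid.NAMESPACE_URL, name)

# The same value by hand: MD5 over namespace bytes + name bytes, after
# which the version and variant bits are overwritten.
digest = hashlib.md5(uuid.NAMESPACE_URL.bytes + name.encode("utf-8")).digest()
by_hand = uuid.UUID(bytes=digest[:16], version=3)

print(u.version)                                  # 3
print(u == by_hand)                               # True
print(u == uuid.uuid3(uuid.NAMESPACE_URL, name))  # True
```

Because nothing random enters the computation, two machines given the same namespace and name will always agree on the result.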
uuid_generate_v4()
This function generates a version 4 UUID, which is derived entirely
from random numbers.
uuid_generate_v5(namespace uuid, name text)
This function generates a version 5 UUID, which works like a version 3
UUID except that SHA-1 is used as a hashing method. Version 5 should
be preferred over version 3 because SHA-1 is thought to be more secure
than MD5.
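A short sketch of the difference, again using Python's standard uuid module as a stand-in for the SQL functions: the same namespace and name give unrelated version 3 and version 5 values, and the version 5 value is the first 16 bytes of a 20-byte SHA-1 digest.

```python
import hashlib
import uuid

name = "http://www.postgresql.org"

v3 = uuid.uuid3(uuid.NAMESPACE_URL, name)
v5 = uuid.uuid5(uuid.NAMESPACE_URL, name)

# SHA-1 yields 20 bytes; only the first 16 fit into a UUID.
sha1 = hashlib.sha1(uuid.NAMESPACE_URL.bytes + name.encode("utf-8")).digest()
v5_by_hand = uuid.UUID(bytes=sha1[:16], version=5)

print(v5.version)        # 5
print(v5 == v5_by_hand)  # True
print(v3 == v5)          # False
```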
UUID Constants
uuid_nil()
A "nil" UUID constant, which does not occur as a real UUID.
uuid_ns_dns()
Constant designating the DNS namespace for UUIDs.
uuid_ns_url()
Constant designating the URL namespace for UUIDs.
uuid_ns_oid()
Constant designating the ISO object identifier (OID) namespace for
UUIDs. (This pertains to ASN.1 OIDs, unrelated to the OIDs used in
PostgreSQL.)
uuid_ns_x500()
Constant designating the X.500 distinguished name (DN) namespace for
UUIDs.
$PostgreSQL: pgsql/contrib/vacuumlo/README.vacuumlo,v 1.5 2005/06/23 00:06:37 tgl Exp $
This is a simple utility that will remove any orphaned large objects out of a
PostgreSQL database. An orphaned LO is considered to be any LO whose OID
does not appear in any OID data column of the database.
If you use this, you may also be interested in the lo_manage trigger in
contrib/lo. lo_manage is useful to try to avoid creating orphaned LOs
in the first place.
Simply run make. A single executable "vacuumlo" is created.
vacuumlo [options] database [database2 ... databasen]
All databases named on the command line are processed. Available options
are:
  -v            Write a lot of progress messages
  -n            Don't remove large objects, just show what would be done
  -U username   Username to connect as
  -W            Prompt for password
  -h hostname   Database server host
  -p port       Database server port
First, it builds a temporary table which contains all of the OIDs of the
large objects in that database.
It then scans through all columns in the database that are of type "oid"
or "lo", and removes matching entries from the temporary table.
The remaining entries in the temp table identify orphaned LOs. These are
removed.
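The three steps amount to a set difference. The following Python sketch models them on in-memory data; the function name find_orphans and the sample OIDs are hypothetical, and the real utility does this with SQL against a temporary table.

```python
def find_orphans(all_lo_oids, oid_columns):
    """Return the large-object OIDs that no column refers to.

    all_lo_oids: OIDs of every large object in the database.
    oid_columns: one iterable per "oid" or "lo" column (None models NULL).
    """
    candidates = set(all_lo_oids)      # step 1: the "temporary table"
    for column in oid_columns:         # step 2: scan every oid/lo column
        candidates.difference_update(v for v in column if v is not None)
    return sorted(candidates)          # step 3: what is left is orphaned


orphans = find_orphans([101, 102, 103, 104], [[101, None], [103, 103]])
print(orphans)  # [102, 104]
```

Note that any oid column counts as a reference, so an unrelated OID value that happens to match a large object keeps that object alive.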
I decided to place this in contrib as it needs further testing, but hopefully,
this (or a variant of it) would make it into the backend as a "vacuum lo"
command in a later release.
Peter Mount <peter@retep.org.uk>
March 21 1999
Committed April 10 1999 Peter
XML-handling functions for PostgreSQL
DEPRECATION NOTICE: From PostgreSQL 8.3 on, there is XML-related
functionality based on the SQL/XML standard in the core server.
That functionality covers XML syntax checking and XPath queries,
which is what this module does as well, and more, but the API is
not at all compatible. It is planned that this module will be
removed in PostgreSQL 8.4 in favor of the newer standard API, so
you are encouraged to try converting your applications. If you
find that some of the functionality of this module is not
available in an adequate form with the newer API, please explain
your issue to pgsql-hackers@postgresql.org so that the deficiency
can be addressed.
-- Peter Eisentraut, 2007-05-24
Development of this module was sponsored by Torchbox Ltd. (www.torchbox.com)
It has the same BSD licence as PostgreSQL.
This version of the XML functions provides both XPath querying and
XSLT functionality. There is also a new table function which allows
the straightforward return of multiple XML results. Note that the current code
doesn't take any particular care over character sets - this is
something that should be fixed at some point!
The current build process will only work if the files are in
contrib/xml2 in a PostgreSQL 7.3 or later source tree which has been
configured and built (If you alter the subdir value in the Makefile
you can place it in a different directory in a PostgreSQL tree).
Before you begin, just check the Makefile, and then just 'make' and
'make install'.
By default, this module requires both libxml2 and libxslt to be installed
on your system. If you do not have libxslt or do not want to use XSLT
functions, you must edit the Makefile to not build the XSLT functions,
as directed in its comments; and edit pgxml.sql.in to remove the XSLT
function declarations, as directed in its comments.
Description of functions
The first set of functions are straightforward XML parsing and XPath queries:
xml_is_well_formed(document) RETURNS bool
This parses the document text in its parameter and returns true if the
document is well-formed XML. (Note: before PostgreSQL 8.2, this function
was called xml_valid(). That is the wrong name since validity and
well-formedness have different meanings in XML. The old name is still
available, but is deprecated and will be removed in 8.3.)
xpath_string(document,query) RETURNS text
xpath_number(document,query) RETURNS float4
xpath_bool(document,query) RETURNS bool
These functions evaluate the XPath query on the supplied document, and
cast the result to the specified type.
xpath_nodeset(document,query,toptag,itemtag) RETURNS text
This evaluates query on document and wraps the result in XML tags. If
the result is multivalued, the output will look like:
<toptag>
<itemtag>Value 1 which could be an XML fragment</itemtag>
<itemtag>Value 2....</itemtag>
</toptag>
If either toptag or itemtag is an empty string, the relevant tag is omitted.
There are also wrapper functions for this operation:
xpath_nodeset(document,query) RETURNS text omits both tags.
xpath_nodeset(document,query,itemtag) RETURNS text omits toptag.
xpath_list(document,query,separator) RETURNS text
This function returns multiple values separated by the specified
separator, e.g. Value 1,Value 2,Value 3 if separator=','.
xpath_list(document,query) RETURNS text
This is a wrapper for the above function that uses ',' as the separator.
xpath_table
This is a table function which evaluates a set of XPath queries on
each of a set of documents and returns the results as a table. The
primary key field from the original document table is returned as the
first column of the result so that the resultset from xpath_table can
be readily used in joins.
The function itself takes 5 arguments, all text.
key - the name of the "key" field - this is just a field to be used as
the first column of the output table i.e. it identifies the record from
which each output row came (see note below about multiple values).
document - the name of the field containing the XML document
relation - the name of the table or view containing the documents
xpaths - multiple xpath expressions separated by |
criteria - the contents of the WHERE clause. This needs to be specified,
so use "true" or "1=1" here if you want to process all the rows in the
relation.
NB These parameters (except the XPath strings) are just substituted
into a plain SQL SELECT statement, so you have some flexibility - the
statement is
SELECT <key>,<document> FROM <relation> WHERE <criteria>
so those parameters can be *anything* valid in those particular
locations. The result from this SELECT needs to return exactly two
columns (which it will unless you try to list multiple fields for key
or document). Beware that this simplistic approach requires that you
validate any user-supplied values to avoid SQL injection attacks.
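The risk is easy to reproduce outside the database. This Python sketch performs the same plain substitution and shows how an unquoted value changes the meaning of the WHERE clause. The helper names, the author column and the hostile input are hypothetical, and quote_literal here is only a minimal illustration (it doubles single quotes and ignores backslash escapes), not a replacement for the server's own quoting functions.

```python
def build_select(key, document, relation, criteria):
    # Plain text substitution, mirroring the statement shown above.
    return f"SELECT {key},{document} FROM {relation} WHERE {criteria}"


def quote_literal(value):
    # Minimal string-literal quoting: double any embedded single quote.
    return "'" + value.replace("'", "''") + "'"


user_input = "x' OR 'a'='a"

unsafe = build_select("id", "xml", "test", "author = '" + user_input + "'")
safe = build_select("id", "xml", "test", "author = " + quote_literal(user_input))

print(unsafe)  # SELECT id,xml FROM test WHERE author = 'x' OR 'a'='a'
print(safe)    # SELECT id,xml FROM test WHERE author = 'x'' OR ''a''=''a'
```

In the unsafe form the condition is true for every row; in the quoted form the whole input is compared as a single string.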
Using the function
The function has to be used in a FROM expression. This gives the following
'date_entered > ''2003-01-01'' ')
AS t(article_id integer, author text, page_count integer, title text);
The AS clause defines the names and types of the columns in the
virtual table. If there are more XPath queries than result columns,
the extra queries will be ignored. If there are more result columns
than XPath queries, the extra columns will be NULL.
Note that I've said in this example that pages is an integer. The
function deals internally with string representations, so when you say
you want an integer in the output, it will take the string
representation of the XPath result and use PostgreSQL input functions
to transform it into an integer (or whatever type the AS clause
requests). An error will result if it can't do this - for example if
the result is empty - so you may wish to just stick to 'text' as the
column type if you think your data has any problems.
The select statement doesn't need to use * alone - it can reference the
columns by name or join them to other tables. The function produces a
virtual table with which you can perform any operation you wish (e.g.
aggregation, joining, sorting etc). So we could also have:
SELECT t.title, p.fullname, p.email
FROM xpath_table('article_id','article_xml','articles',
'xpath_string(article_xml,''/article/@date'') > ''2003-03-20'' ')
AS t(article_id integer, title text, author_id integer),
tblPeopleInfo AS p
WHERE t.author_id = p.person_id;
as a more complicated example. Of course, you could wrap all
of this in a view for convenience.
Multivalued results
The xpath_table function assumes that the results of each XPath query
might be multi-valued, so the number of rows returned by the function
may not be the same as the number of input documents. The first row
returned contains the first result from each query, the second row the
second result from each query. If one of the queries has fewer values
than the others, NULLs will be returned instead.
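The row-building rule behaves like zipping the per-query result lists and padding the shorter ones. A minimal Python sketch, assuming each XPath query has already been evaluated to a list of strings (rows_for_document is a hypothetical name, and None stands for NULL):

```python
from itertools import zip_longest


def rows_for_document(key, results_per_query):
    """Row i holds the i-th result of every query; missing values are None."""
    return [(key, *values) for values in zip_longest(*results_per_query)]


rows = rows_for_document(
    1,
    [["C1"], ["L1", "L2"], ["1", "11"], ["2", "22"], ["3", "33"]],
)
for row in rows:
    print(row)
# (1, 'C1', 'L1', '1', '2', '3')
# (1, None, 'L2', '11', '22', '33')
```

The single-valued first query fills only the first row, which is the behaviour the next paragraph works around.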
In some cases, a user will know that a given XPath query will return
only a single result (perhaps a unique document identifier) - if used
alongside an XPath query returning multiple results, the single-valued
result will appear only on the first row of the result. The solution
to this is to use the key field as part of a join against a simpler
XPath query. As an example:
CREATE TABLE test (
    id int4 NOT NULL,
    xml text
);
INSERT INTO test VALUES (1, '<doc num="C1">
<line num="L1"><a>1</a><b>2</b><c>3</c></line>
<line num="L2"><a>11</a><b>22</b><c>33</c></line>
</doc>');
INSERT INTO test VALUES (2, '<doc num="C2">
<line num="L1"><a>111</a><b>222</b><c>333</c></line>
<line num="L2"><a>111</a><b>222</b><c>333</c></line>
</doc>');
The query:
SELECT * FROM xpath_table('id','xml','test',
AS t(id int4, doc_num varchar(10), line_num varchar(10), val1 int4,
val2 int4, val3 int4)
WHERE id = 1 ORDER BY doc_num, line_num
Gives the result:
 id | doc_num | line_num | val1 | val2 | val3
----+---------+----------+------+------+------
  1 | C1      | L1       |    1 |    2 |    3
  1 |         | L2       |   11 |   22 |   33
To get doc_num on every line, the solution is to use two invocations
of xpath_table and join the results:
SELECT t.*,i.doc_num FROM
AS t(id int4, line_num varchar(10), val1 int4, val2 int4, val3 int4),
AS i(id int4, doc_num varchar(10))
WHERE i.id=t.id AND i.id=1
ORDER BY doc_num, line_num;
which gives the desired result:
 id | line_num | val1 | val2 | val3 | doc_num
----+----------+------+------+------+---------
  1 | L1       |    1 |    2 |    3 | C1
  1 | L2       |   11 |   22 |   33 | C1
(2 rows)
XSLT functions
The following functions are available if libxslt is installed (this is
not currently detected automatically, so you will have to amend the
Makefile):
xslt_process(document,stylesheet,paramlist) RETURNS text
This function applies the XSL stylesheet to the document and returns
the transformed result. The paramlist is a list of parameter
assignments to be used in the transformation, specified in the form
'a=1,b=2'. Note that this is also proof-of-concept code and the
parameter parsing is very simple-minded (e.g. parameter values cannot
contain commas!)
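To see why commas are a problem, consider a parser that simply splits on ',' and then on '='. The Python sketch below is such a parser; it is an assumption about the general approach, not a transcription of the module's C code, and parse_paramlist is a hypothetical name.

```python
def parse_paramlist(paramlist):
    """Split 'a=1,b=2' into {'a': '1', 'b': '2'} with no escaping rules."""
    params = {}
    for item in paramlist.split(","):
        if not item:
            continue
        name, _, value = item.partition("=")
        params[name] = value
    return params


print(parse_paramlist("a=1,b=2"))             # {'a': '1', 'b': '2'}
print(parse_paramlist("title=Hello, world"))  # {'title': 'Hello', ' world': ''}
```

In the second call the value is cut at the comma and the remainder is mistaken for a new parameter, so values of that kind cannot be passed.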
Also note that if either the document or stylesheet values do not
begin with a < then they will be treated as URLs and libxslt will
fetch them. It thus follows that you can use xslt_process as a means
to fetch the contents of URLs - you should be aware of the security
implications of this.
There is also a two-parameter version of xslt_process which does not
pass any parameters to the transformation.
If you have any comments or suggestions, please do contact me at
jgray@azuli.co.uk. Unfortunately, this isn't my main job, so I can't
guarantee a rapid response to your query!
adminpack is a PostgreSQL standard module that implements a number of
support functions which pgAdmin and other administration and management tools
can use to provide additional functionality if installed on a server.
<title>Functions implemented</title>
Functions implemented by adminpack can only be run by a superuser. Here's a
list of these functions:
int8 pg_catalog.pg_file_write(fname text, data text, append bool)
bool pg_catalog.pg_file_rename(oldname text, newname text, archivname text)
bool pg_catalog.pg_file_rename(oldname text, newname text)
bool pg_catalog.pg_file_unlink(fname text)
setof record pg_catalog.pg_logdir_ls()
/* Renaming of existing backend functions for pgAdmin compatibility */
int8 pg_catalog.pg_file_read(fname text, data text, append bool)
bigint pg_catalog.pg_file_length(text)
int4 pg_catalog.pg_logfile_rotate()
<indexterm zone="btree-gist">
btree-gist is a B-Tree implementation using GiST that supports the int2, int4,
int8, float4, float8, timestamp with/without time zone, time
with/without time zone, date, interval, oid, money, macaddr, char,
varchar/text, bytea, numeric, bit, varbit and inet/cidr types.
<title>Example usage</title>
CREATE TABLE test (a int4);
-- create index
CREATE INDEX testidx ON test USING gist (a);
-- query
SELECT * FROM test WHERE a < 10;
All work was done by Teodor Sigaev (<email>teodor@stack.net</email>),
Oleg Bartunov (<email>oleg@sai.msu.su</email>), and Janko Richter
(<email>jankorichter@yahoo.de</email>). See
<ulink url="http://www.sai.msu.su/~megera/postgres/gist"></ulink> for additional
information.
Pg_buffercache - Real time queries on the shared buffer cache.
This module consists of a C function 'pg_buffercache_pages()' that returns
a set of records, plus a view 'pg_buffercache' that wraps the function.
The intent is to do for the buffercache what pg_locks does for locks, i.e.
to provide the ability to examine what is happening at any given time without
having to restart or rebuild the server with debugging code added.
<sect1 id="buffercache">
<indexterm zone="buffercache">
The <literal>pg_buffercache</literal> module provides the means for examining
what's happening to the buffercache at any given time without having to
restart or rebuild the server with debugging code added. The intent is to
do for the buffercache what pg_locks does for locks.
This module consists of a C function <literal>pg_buffercache_pages()</literal>
that returns a set of records, plus a view <literal>pg_buffercache</literal>
that wraps the function.
By default public access is REVOKED from both of these, just in case there
are security issues lurking.
Build and install the main PostgreSQL source, then this contrib module:
$ cd contrib/pg_buffercache
$ gmake
$ gmake install
To register the functions:
$ psql -d <database> -f pg_buffercache.sql
The definition of the columns exposed in the view is:
Column | references | Description
bufferid | | Id, 1..shared_buffers.
......@@ -41,23 +36,27 @@ Notes
relblocknumber | | Offset of the page in the relation.
isdirty | | Is the page dirty?
usagecount | | Page LRU count
There is one row for each buffer in the shared cache. Unused buffers are
shown with all fields null except bufferid.
Because the cache is shared by all the databases, there are pages from
relations not belonging to the current database.
When the pg_buffercache view is accessed, internal buffer manager locks are
taken, and a copy of the buffer cache data is made for the view to display.
This ensures that the view produces a consistent set of results, while not
blocking normal buffer activity longer than necessary. Nonetheless there
could be some impact on database performance if this view is read often.
Sample output
<title>Sample output</title>
regression=# \d pg_buffercache;
View "public.pg_buffercache"
Column | Type | Modifiers
......@@ -98,18 +97,25 @@ Sample output
(10 rows)
Mark Kirkwood <email>markir@paradise.net.nz</email>
<para>Design suggestions: Neil Conway <email>neilc@samurai.com</email></para>
<para>Debugging advice: Tom Lane <email>tgl@sss.pgh.pa.us</email></para>
* Mark Kirkwood <markir@paradise.net.nz>
* Design suggestions : Neil Conway <neilc@samurai.com>
* Debugging advice : Tom Lane <tgl@sss.pgh.pa.us>
Thanks guys!
<sect1 id="chkpass">
<indexterm zone="chkpass">
chkpass is a password type that is automatically checked and converted upon
entry. It is stored encrypted. To compare, simply compare against a clear
text password and the comparison function will encrypt it before comparing.
It is also meant to return an error if the code determines that the password
is easily crackable; however, this check is currently a stub that does nothing.
Note that the chkpass data type is not indexable.
I haven't worried about making this type indexable. I doubt that anyone
would ever need to sort a file in order of encrypted password.
If you precede the string with a colon, the encryption and checking are
skipped so that you can enter existing passwords into the field.
On output, a colon is prepended. This makes it possible to dump and reload
passwords without re-encrypting them. If you want the password (encrypted)
without the colon then use the raw() function. This allows you to use the
type with things like Apache's Auth_PostgreSQL module.
The encryption uses the standard Unix function crypt(), and so it suffers
from all the usual limitations of that function; notably that only the
first eight characters of a password are considered.
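The input, output and comparison rules described above can be modelled in a few lines. This Python sketch is only a model: fake_crypt is a stand-in built on SHA-256 that imitates the eight-character limit, whereas the real type calls crypt() with a salt. All function names are hypothetical.

```python
import hashlib


def fake_crypt(password):
    # Stand-in for crypt(); NOT the real algorithm. Like traditional
    # crypt(), it looks only at the first eight characters.
    return hashlib.sha256(password[:8].encode("utf-8")).hexdigest()[:13]


def chkpass_in(text):
    # A leading colon means "already encrypted": store the rest unchanged.
    if text.startswith(":"):
        return text[1:]
    return fake_crypt(text)


def chkpass_out(stored):
    # Output is prefixed with a colon so that it can be reloaded as-is.
    return ":" + stored


def raw(stored):
    # The encrypted value without the colon.
    return stored


def chkpass_eq(stored, cleartext):
    # Comparison encrypts the clear-text side first.
    return stored == fake_crypt(cleartext)


stored = chkpass_in("hello")
print(chkpass_in(chkpass_out(stored)) == stored)  # True: dump and reload
print(chkpass_eq(stored, "hello"))                # True
print(chkpass_eq(stored, "goodbye"))              # False
print(chkpass_eq(chkpass_in("password123"), "passwordXYZ"))  # True
```

The last line shows the eight-character limitation: two passwords that share their first eight characters compare as equal.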
Here is some sample usage:
test=# create table test (p chkpass);
test=# insert into test values ('hello');
test=# select * from test;
(1 row)
test=# select raw(p) from test;
(1 row)
test=# select p = 'hello' from test;
(1 row)
test=# select p = 'goodbye' from test;
(1 row)
D'Arcy J.M. Cain <email>darcy@druid.net</email>
<chapter id="contrib">
<title>Standard Modules</title>
This section contains information regarding the standard modules which
can be found in the <literal>contrib</literal> directory of the
PostgreSQL distribution. These are porting tools, analysis utilities,
and plug-in features that are not part of the core PostgreSQL system,
mainly because they address a limited audience or are too experimental
to be part of the main source tree. This does not preclude their
usefulness.
Some modules supply new user-defined functions, operators, or types. In
these cases, you will need to run <literal>make</literal> and <literal>make
install</literal> in <literal>contrib/module</literal>. After you have
installed the files you need to register the new entities in the database
system by running the commands in the supplied .sql file. For example,
$ psql -d dbname -f module.sql
<!-- $PostgreSQL: pgsql/doc/src/sgml/filelist.sgml,v 1.51 2007/11/01 17:00:18 momjian Exp $ -->
<!-- $PostgreSQL: pgsql/doc/src/sgml/filelist.sgml,v 1.52 2007/11/10 23:30:46 momjian Exp $ -->
<!entity history SYSTEM "history.sgml">
<!entity info SYSTEM "info.sgml">
......@@ -89,6 +89,38 @@
<!entity sources SYSTEM "sources.sgml">
<!entity storage SYSTEM "storage.sgml">
<!-- contrib information -->
<!entity contrib SYSTEM "contrib.sgml">
<!entity adminpack SYSTEM "adminpack.sgml">
<!entity btree-gist SYSTEM "btree-gist.sgml">
<!entity chkpass SYSTEM "chkpass.sgml">
<!entity cube SYSTEM "cube.sgml">
<!entity dblink SYSTEM "dblink.sgml">
<!entity earthdistance SYSTEM "earthdistance.sgml">
<!entity fuzzystrmatch SYSTEM "fuzzystrmatch.sgml">
<!entity hstore SYSTEM "hstore.sgml">
<!entity intagg SYSTEM "intagg.sgml">
<!entity intarray SYSTEM "intarray.sgml">
<!entity isn SYSTEM "isn.sgml">
<!entity lo SYSTEM "lo.sgml">
<!entity ltree SYSTEM "ltree.sgml">
<!entity oid2name SYSTEM "oid2name.sgml">
<!entity pageinspect SYSTEM "pageinspect.sgml">
<!entity pgbench SYSTEM "pgbench.sgml">
<!entity buffercache SYSTEM "buffercache.sgml">
<!entity pgcrypto SYSTEM "pgcrypto.sgml">
<!entity freespacemap SYSTEM "freespacemap.sgml">
<!entity pgrowlocks SYSTEM "pgrowlocks.sgml">
<!entity standby SYSTEM "standby.sgml">
<!entity pgstattuple SYSTEM "pgstattuple.sgml">
<!entity trgm SYSTEM "trgm.sgml">
<!entity seg SYSTEM "seg.sgml">
<!entity sslinfo SYSTEM "sslinfo.sgml">
<!entity tablefunc SYSTEM "tablefunc.sgml">
<!entity uuid-ossp SYSTEM "uuid-ossp.sgml">
<!entity vacuumlo SYSTEM "vacuumlo.sgml">
<!entity xml2 SYSTEM "xml2.sgml">
<!-- appendixes -->
<!entity contacts SYSTEM "contacts.sgml">
<!entity cvs SYSTEM "cvs.sgml">
<!-- $PostgreSQL: pgsql/doc/src/sgml/postgres.sgml,v 1.83 2007/11/01 17:00:18 momjian Exp $ -->
<!-- $PostgreSQL: pgsql/doc/src/sgml/postgres.sgml,v 1.84 2007/11/10 23:30:46 momjian Exp $ -->
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V4.2//EN" [
......@@ -102,6 +102,7 @@