From 7eb16b78123bc229e906bc8fe6526bd321850984 Mon Sep 17 00:00:00 2001
From: "Thomas G. Lockhart"
Date: Thu, 8 Apr 1999 13:29:08 +0000
Subject: [PATCH] Add section from Tom Lane on hashjoin characteristics of operators. Add emacs editor hints to bottom of file.

---
 doc/src/sgml/xoper.sgml | 166 ++++++++++++++++++++++++++++------------
 1 file changed, 115 insertions(+), 51 deletions(-)

diff --git a/doc/src/sgml/xoper.sgml b/doc/src/sgml/xoper.sgml
index 1e96e0a18c..2a1957476b 100644
--- a/doc/src/sgml/xoper.sgml
+++ b/doc/src/sgml/xoper.sgml
@@ -1,52 +1,116 @@
-
-Extending <Acronym>SQL</Acronym>: Operators
-
-
- Postgres supports left unary, right unary and binary
- operators. Operators can be overloaded, or re-used
- with different numbers and types of arguments. If
- there is an ambiguous situation and the system cannot
- determine the correct operator to use, it will return
- an error and you may have to typecast the left and/or
- right operands to help it understand which operator you
- meant to use.
- To create an operator for adding two complex numbers
- can be done as follows. First we need to create a
- function to add the new types. Then, we can create the
- operator with the function.
-
-
- CREATE FUNCTION complex_add(complex, complex)
- RETURNS complex
- AS '$PWD/obj/complex.so'
- LANGUAGE 'c';
-
- CREATE OPERATOR + (
- leftarg = complex,
- rightarg = complex,
- procedure = complex_add,
- commutator = +
- );
-
-
-
-
- We've shown how to create a binary operator here. To
- create unary operators, just omit one of leftarg (for
- left unary) or rightarg (for right unary).
- If we give the system enough type information, it can
- automatically figure out which operators to use.
+
+ Extending <Acronym>SQL</Acronym>: Operators
+
+
+ Postgres supports left unary,
+ right unary and binary
+ operators. Operators can be overloaded, or re-used
+ with different numbers and types of arguments. If
+ there is an ambiguous situation and the system cannot
+ determine the correct operator to use, it will return
+ an error and you may have to typecast the left and/or
+ right operands to help it understand which operator you
+ meant to use.
+ To create an operator for adding two complex numbers
+ can be done as follows. First we need to create a
+ function to add the new types. Then, we can create the
+ operator with the function.
+
+
+CREATE FUNCTION complex_add(complex, complex)
+ RETURNS complex
+ AS '$PWD/obj/complex.so'
+ LANGUAGE 'c';
+
+CREATE OPERATOR + (
+ leftarg = complex,
+ rightarg = complex,
+ procedure = complex_add,
+ commutator = +
+);
+
+
+
+
+ We've shown how to create a binary operator here. To
+ create unary operators, just omit one of leftarg (for
+ left unary) or rightarg (for right unary).
+ If we give the system enough type information, it can
+ automatically figure out which operators to use.

-
- SELECT (a + b) AS c FROM test_complex;
-
- +----------------+
- |c               |
- +----------------+
- |(5.2,6.05)      |
- +----------------+
- |(133.42,144.95) |
- +----------------+
-
-
-
+
+SELECT (a + b) AS c FROM test_complex;
+
++----------------+
+|c               |
++----------------+
+|(5.2,6.05)      |
++----------------+
+|(133.42,144.95) |
++----------------+
+
+
+
+
+ Hash Join Operators
+
+
+ Author
+
+ Written by Tom Lane.
+
+
+
+
+ The assumption underlying hash join is that two values that will be
+ considered equal by the comparison operator will always have the same
+ hash value. If two values get put in different hash buckets, the join
+ will never compare them at all, so they are necessarily treated as
+ unequal.
+
+
+
+ But we have a number of datatypes for which the "=" operator is not
+ a straight bitwise comparison. For example, intervaleq is not bitwise
+ at all; it considers two time intervals equal if they have the same
+ duration, whether or not their endpoints are identical. What this means
+ is that a join using "=" between interval fields will yield different
+ results if implemented as a hash join than if implemented another way,
+ because a large fraction of the pairs that should match will hash to
+ different values and will never be compared.
+
+
+
+ I believe the same problem exists for float data; for example, on
+ IEEE-compliant machines, minus zero and plus zero have different bit
+ patterns (hence different hash values) but should be considered equal.
+ A hashjoin will get it wrong.
+
+
+
+ I will go through pg_operator and remove the hashable flag from
+ operators that are not safely hashable, but I see no way to
+ automatically check for this sort of mistake. The only long-term
+ answer is to raise the consciousness of datatype creators about what
+ it means to set the oprcanhash flag. Don't do it unless your equality
+ operator can be implemented as memcmp()!
+
+
+
+
+
-- 
GitLab
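
Note on the float case described in the new Hash Join Operators section: the following is a minimal sketch, not taken from the patch or from the Postgres sources, of how plus zero and minus zero compare equal as values yet differ as raw bytes. The helper names `bitwise_equal` and `toy_bucket` are hypothetical, and `toy_bucket` is a stand-in for a hash computed from the stored bytes; it is not the hash function Postgres uses. Python's `struct` module is used only to expose the IEEE 754 encoding that a memcmp()-style comparison would see.

```python
import struct


def bitwise_equal(a: float, b: float) -> bool:
    """Compare the raw 8-byte IEEE 754 encodings, as memcmp() would."""
    return struct.pack("<d", a) == struct.pack("<d", b)


def toy_bucket(x: float, nbuckets: int = 1021) -> int:
    """Hypothetical bucket choice derived only from the raw bytes of x."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    return bits % nbuckets


if __name__ == "__main__":
    plus_zero, minus_zero = 0.0, -0.0
    # The "=" comparison treats the two zeros as the same value...
    print("equal by value:", plus_zero == minus_zero)
    # ...but their stored bit patterns differ (only the sign bit is set
    # in minus zero), so a bytes-based hash can separate them.
    print("equal by bytes:", bitwise_equal(plus_zero, minus_zero))
    print("same bucket:   ", toy_bucket(plus_zero) == toy_bucket(minus_zero))
```

Under these assumptions the two zeros land in different buckets, so a hash join would never compare them, which is the failure the section warns about when oprcanhash is set on an equality operator that is not a bitwise comparison.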