diff --git a/gpdb-doc/dita/ref_guide/modules/fuzzystrmatch.xml b/gpdb-doc/dita/ref_guide/modules/fuzzystrmatch.xml index 6af225a8940cb55aff927919ff3ca3e7b03d59c4..e9a5d48201e70ad7663f8a261ae921f2040ba58d 100644 --- a/gpdb-doc/dita/ref_guide/modules/fuzzystrmatch.xml +++ b/gpdb-doc/dita/ref_guide/modules/fuzzystrmatch.xml @@ -6,167 +6,29 @@

The fuzzystrmatch module provides functions to determine similarities and distance between strings based on various algorithms.

- -

The Greenplum Database installation contains the files required for the functions in this - module and SQL scripts to define the module functions in a database and remove - the functions from a database.

- The functions soundex, metaphone, - dmetaphone, and dmetaphone_alt do not work well with - multibyte encodings (such as UTF-8). -

The Greenplum Database Fuzzy String Match is based on the PostgreSQL - fuzzystrmatch module.

+

The Greenplum Database fuzzystrmatch module is equivalent + to the PostgreSQL fuzzystrmatch module. There are no + Greenplum Database or MPP-specific considerations for the module.

- - Soundex Functions + + Installing and Registering the Module -

The Soundex system is a method of matching similar-sounding (similar phonemes) names by - converting them to the same code.

- Soundex is most useful for English names. -

These functions work with Soundex codes:

- soundex(text string1) returns text -difference(text string1, text string2) returns int -

The soundex function converts a string to its Soundex code. Soundex - codes consist of four characters.

-

The difference function converts two strings to their Soundex codes - and then reports the number of matching code positions. The result ranges from zero to - four, zero being no match and four being an exact match. These are some examples:

- SELECT soundex('hello world!'); -SELECT soundex('Anne'), soundex('Ann'), difference('Anne', 'Ann'); -SELECT soundex('Anne'), soundex('Andrew'), difference('Anne', 'Andrew'); -SELECT soundex('Anne'), soundex('Margaret'), difference('Anne', 'Margaret'); - -CREATE TABLE s (nm text); - -INSERT INTO s VALUES ('john'); -INSERT INTO s VALUES ('joan'); -INSERT INTO s VALUES ('wobbly'); -INSERT INTO s VALUES ('jack'); - -SELECT * FROM s WHERE soundex(nm) = soundex('john'); - -SELECT * FROM s WHERE difference(s.nm, 'john') > 2; -

For information about the Soundex indexing system see .

+

The fuzzystrmatch module is installed when you install + Greenplum Database. Before you can use any of the functions defined in the + module, you must register the fuzzystrmatch extension in + each database in which you want to use the functions. + Refer to Installing Additional Supplied Modules + for more information.

- - Levenshtein Functions + + Module Documentation -

These functions calculate the Levenshtein distance between two strings:

- levenshtein(text source, text target, int ins_cost, int del_cost, int sub_cost) returns int -levenshtein(text source, text target) returns int -levenshtein_less_equal(text source, text target, int ins_cost, int del_cost, int sub_cost, int max_d) returns int -levenshtein_less_equal(text source, text target, int max_d) returns int -

Both the source and target parameters can be any - non-null string, with a maximum of 255 bytes. The cost parameters - ins_cost, del_cost, and sub_cost - specify cost of a character insertion, deletion, or substitution, respectively. You can - omit the cost parameters, as in the second version of the function; in that case the cost - parameters default to 1.

-

- levenshtein_less_equal is accelerated version of - levenshtein function for low values of distance. If actual distance - is less or equal then max_d, then - levenshtein_less_equal returns an accurate value of the distance. - Otherwise, this function returns value which is greater than max_d. - Examples:

- test=# SELECT levenshtein('GUMBO', 'GAMBOL'); - levenshtein -------------- - 2 -(1 row) - -test=# SELECT levenshtein('GUMBO', 'GAMBOL', 2,1,1); - levenshtein -------------- - 3 -(1 row) - -test=# SELECT levenshtein_less_equal('extensive', 'exhaustive',2); - levenshtein_less_equal ------------------------- - 3 -(1 row) - -test=# SELECT levenshtein_less_equal('extensive', 'exhaustive',4); - levenshtein_less_equal ------------------------- - 4 -(1 row) -

For information about the Levenshtein algorithm, see .

- -
- - Metaphone Functions - -

Metaphone, like Soundex, is based on the idea of constructing a representative code for - an input string. Two strings are then deemed similar if they have the same codes. This - function calculates the metaphone code of an input string - :metaphone(text source, int max_output_length) returns text

-

The source parameter must be a non-null string with a maximum of 255 - characters. The max_output_length parameter sets the maximum length of - the output metaphone code; if longer, the output is truncated to this length. Example:

- test=# SELECT metaphone('GUMBO', 4); - metaphone ------------ - KM -(1 row) -

For information about the Metaphone algorithm, see .

- -
- - Double Metaphone Functions - -

The Double Metaphone system computes two "sounds like" strings for a given input string - - a "primary" and an "alternate". In most cases they are the same, but for non-English names - especially they can be a bit different, depending on pronunciation. These functions - compute the primary and alternate - codes:dmetaphone(text source) returns text -dmetaphone_alt(text source) returns text - There is no length limit on the input strings. Example:

- test=# select dmetaphone('gumbo'); - dmetaphone ------------- - KMP -(1 row) -

For information about the Double Metaphone algorithm, see .

- -
- - Registering the fuzzystrmatch Functions - -

Greenplum Database supplies SQL scripts to register the fuzzystrmatch - module functions.

-

Before you can use fuzzystrmatch functions, you must register - the fuzzystrmatch extension in each database in which you want - to use the functions:

- psql -d testdb -c "CREATE EXTENSION fuzzystrmatch" -

To remove the functions from a database:

- psql -d testdb -c "DROP EXTENSION fuzzystrmatch" - When you remove the fuzzystrmatchfunctions from a database, - UDFs that you created in the database that use the functions will no longer - work. +

See fuzzystrmatch in the PostgreSQL + documentation for detailed information about the individual functions in + this module.