functions-binarystring.md 17.4 KB
Newer Older
K
KyleZhang 已提交
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76
## 9.5. Binary String Functions and Operators

[]()

 This section describes functions and operators for examining and manipulating binary strings, that is values of type `bytea`. Many of these are equivalent, in purpose and syntax, to the text-string functions described in the previous section.

SQL defines some string functions that use key words, rather than commas, to separate arguments. Details are in [Table 9.11](functions-binarystring.html#FUNCTIONS-BINARYSTRING-SQL). PostgreSQL also provides versions of these functions that use the regular function invocation syntax (see [Table 9.12](functions-binarystring.html#FUNCTIONS-BINARYSTRING-OTHER)).

**Table 9.11. SQL Binary String Functions and Operators**

|                                                                                                                                                                                                Function/Operator<br/><br/> Description<br/><br/> Example(s)                                                                                                                                                                                                |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|                                                                                                                                              []() `bytea` `||` `bytea``bytea`<br/><br/> Concatenates the two binary strings.<br/><br/>`'\x123456'::bytea || '\x789a00bcde'::bytea``\x123456789a00bcde`                                                                                                                                               |
|                                                                                                                                       []() `bit_length` ( `bytea` ) → `integer`<br/><br/> Returns number of bits in the binary string (8 times the `octet_length`).<br/><br/>`bit_length('\x123456'::bytea)``24`                                                                                                                                        |
|                                                                                                                                                    []() `octet_length` ( `bytea` ) → `integer`<br/><br/> Returns number of bytes in the binary string.<br/><br/>`octet_length('\x123456'::bytea)``3`                                                                                                                                                    |
|[]() `overlay` ( *`bytes`* `bytea` `PLACING` *`newsubstring`* `bytea` `FROM` *`start`* `integer` [ `FOR` *`count`* `integer` ] ) → `bytea`<br/><br/> Replaces the substring of *`bytes`* that starts at the *`start`*'th byte and extends for *`count`* bytes with *`newsubstring`*. If *`count`* is omitted, it defaults to the length of *`newsubstring`*.<br/><br/>`overlay('\x1234567890'::bytea placing '\002\003'::bytea from 2 for 3)``\x12020390`|
|                                                                                           []() `position` ( *`substring`* `bytea` `IN` *`bytes`* `bytea` ) → `integer`<br/><br/> Returns first starting index of the specified *`substring`* within *`bytes`*, or zero if it's not present.<br/><br/>`position('\x5678'::bytea in '\x1234567890'::bytea)``3`                                                                                            |
|                             []() `substring` ( *`bytes`* `bytea` [ `FROM` *`start`* `integer` ] [ `FOR` *`count`* `integer` ] ) → `bytea`<br/><br/> Extracts the substring of *`bytes`* starting at the *`start`*'th byte if that is specified, and stopping after *`count`* bytes if that is specified. Provide at least one of *`start`* and *`count`*.<br/><br/>`substring('\x1234567890'::bytea from 3 for 2)``\x5678`                              |
|                                                 []() `trim` ( [ `LEADING` | `TRAILING` | `BOTH` ] *`bytesremoved`* `bytea` `FROM` *`bytes`* `bytea` ) → `bytea`<br/><br/> Removes the longest string containing only bytes appearing in *`bytesremoved`* from the start, end, or both ends (`BOTH` is the default) of *`bytes`*.<br/><br/>`trim('\x9012'::bytea from '\x1234567890'::bytea)``\x345678`                                                  |
|                                                                                                    `trim` ( [ `LEADING` | `TRAILING` | `BOTH` ] [ `FROM` ] *`bytes`* `bytea`, *`bytesremoved`* `bytea` ) → `bytea`<br/><br/> This is a non-standard syntax for `trim()`.<br/><br/>`trim(both from '\x1234567890'::bytea, '\x9012'::bytea)``\x345678`                                                                                                    |

 Additional binary string manipulation functions are available and are listed in [Table 9.12](functions-binarystring.html#FUNCTIONS-BINARYSTRING-OTHER). Some of them are used internally to implement the SQL-standard string functions listed in [Table 9.11](functions-binarystring.html#FUNCTIONS-BINARYSTRING-SQL).

**Table 9.12. Other Binary String Functions**

|                                                                                                                                                     Function<br/><br/> Description<br/><br/> Example(s)                                                                                                                                                     |
|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|                                                                             []() []() `bit_count` ( *`bytes`* `bytea` ) → `bigint`<br/><br/> Returns the number of bits set in the binary string (also known as “popcount”).<br/><br/>`bit_count('\x1234567890'::bytea)``15`                                                                             |
|                                         []() `btrim` ( *`bytes`* `bytea`, *`bytesremoved`* `bytea` ) → `bytea`<br/><br/> Removes the longest string containing only bytes appearing in *`bytesremoved`* from the start and end of *`bytes`*.<br/><br/>`btrim('\x1234567890'::bytea, '\x9012'::bytea)``\x345678`                                          |
|                                                                []() `get_bit` ( *`bytes`* `bytea`, *`n`* `bigint` ) → `integer`<br/><br/> Extracts [n'th](functions-binarystring.html#FUNCTIONS-ZEROBASED-NOTE) bit from binary string.<br/><br/>`get_bit('\x1234567890'::bytea, 30)``1`                                                                 |
|                                                              []() `get_byte` ( *`bytes`* `bytea`, *`n`* `integer` ) → `integer`<br/><br/> Extracts [n'th](functions-binarystring.html#FUNCTIONS-ZEROBASED-NOTE) byte from binary string.<br/><br/>`get_byte('\x1234567890'::bytea, 4)``144`                                                              |
|                                                                                                 []() []() []() `length` ( `bytea` ) → `integer`<br/><br/> Returns the number of bytes in the binary string.<br/><br/>`length('\x1234567890'::bytea)``5`                                                                                                  |
|                                                             `length` ( *`bytes`* `bytea`, *`encoding`* `name` ) → `integer`<br/><br/> Returns the number of characters in the binary string, assuming that it is text in the given *`encoding`*.<br/><br/>`length('jose'::bytea, 'UTF8')``4`                                                             |
|                                            []() `ltrim` ( *`bytes`* `bytea`, *`bytesremoved`* `bytea` ) → `bytea`<br/><br/> Removes the longest string containing only bytes appearing in *`bytesremoved`* from the start of *`bytes`*.<br/><br/>`ltrim('\x1234567890'::bytea, '\x9012'::bytea)``\x34567890`                                             |
|                                                 []() `md5` ( `bytea` ) → `text`<br/><br/> Computes the MD5 [hash](functions-binarystring.html#FUNCTIONS-HASH-NOTE) of the binary string, with the result written in hexadecimal.<br/><br/>`md5('Th\000omas'::bytea)``8ab2d3c9689aaf18​b4958c334c82d8b1`                                                  |
|                                             []() `rtrim` ( *`bytes`* `bytea`, *`bytesremoved`* `bytea` ) → `bytea`<br/><br/> Removes the longest string containing only bytes appearing in *`bytesremoved`* from the end of *`bytes`*.<br/><br/>`rtrim('\x1234567890'::bytea, '\x9012'::bytea)``\x12345678`                                              |
|                                         []() `set_bit` ( *`bytes`* `bytea`, *`n`* `bigint`, *`newvalue`* `integer` ) → `bytea`<br/><br/> Sets [n'th](functions-binarystring.html#FUNCTIONS-ZEROBASED-NOTE) bit in binary string to *`newvalue`*.<br/><br/>`set_bit('\x1234567890'::bytea, 30, 0)``\x1234563890`                                          |
|                                       []() `set_byte` ( *`bytes`* `bytea`, *`n`* `integer`, *`newvalue`* `integer` ) → `bytea`<br/><br/> Sets [n'th](functions-binarystring.html#FUNCTIONS-ZEROBASED-NOTE) byte in binary string to *`newvalue`*.<br/><br/>`set_byte('\x1234567890'::bytea, 4, 64)``\x1234567840`                                        |
|                                                      []() `sha224` ( `bytea` ) → `bytea`<br/><br/> Computes the SHA-224 [hash](functions-binarystring.html#FUNCTIONS-HASH-NOTE) of the binary string.<br/><br/>`sha224('abc'::bytea)``\x23097d223405d8228642a477bda2​55b32aadbce4bda0b3f7e36c9da7`                                                       |
|                                                  []() `sha256` ( `bytea` ) → `bytea`<br/><br/> Computes the SHA-256 [hash](functions-binarystring.html#FUNCTIONS-HASH-NOTE) of the binary string.<br/><br/>`sha256('abc'::bytea)``\xba7816bf8f01cfea414140de5dae2223​b00361a396177a9cb410ff61f20015ad`                                                   |
|                                  []() `sha384` ( `bytea` ) → `bytea`<br/><br/> Computes the SHA-384 [hash](functions-binarystring.html#FUNCTIONS-HASH-NOTE) of the binary string.<br/><br/>`sha384('abc'::bytea)``\xcb00753f45a35e8bb5a03d699ac65007​272c32ab0eded1631a8b605a43ff5bed​8086072ba1e7cc2358baeca134c825a7`                                  |
|                 []() `sha512` ( `bytea` ) → `bytea`<br/><br/> Computes the SHA-512 [hash](functions-binarystring.html#FUNCTIONS-HASH-NOTE) of the binary string.<br/><br/>`sha512('abc'::bytea)``\xddaf35a193617abacc417349ae204131​12e6fa4e89a97ea20a9eeee64b55d39a​2192992a274fc1a836ba3c23a3feebbd​454d4423643ce80e2a9ac94fa54ca49f`                  |
|[]() `substr` ( *`bytes`* `bytea`, *`start`* `integer` [, *`count`* `integer` ] ) → `bytea`<br/><br/> Extracts the substring of *`bytes`* starting at the *`start`*'th byte, and extending for *`count`* bytes if that is specified. (Same as `substring(*`bytes`* from *`start`* for *`count`*)`.)<br/><br/>`substr('\x1234567890'::bytea, 3, 2)``\x5678`|

 Functions `get_byte` and `set_byte` number the first byte of a binary string as byte 0. Functions `get_bit` and `set_bit` number bits from the right within each byte; for example bit 0 is the least significant bit of the first byte, and bit 15 is the most significant bit of the second byte.

 For historical reasons, the function `md5` returns a hex-encoded value of type `text` whereas the SHA-2 functions return type `bytea`. Use the functions [`encode`](functions-binarystring.html#FUNCTION-ENCODE) and [`decode`](functions-binarystring.html#FUNCTION-DECODE) to convert between the two. For example write `encode(sha256('abc'), 'hex')` to get a hex-encoded text representation, or `decode(md5('abc'), 'hex')` to get a `bytea` value.

[]() []() Functions for converting strings between different character sets (encodings), and for representing arbitrary binary data in textual form, are shown in [Table 9.13](functions-binarystring.html#FUNCTIONS-BINARYSTRING-CONVERSIONS). For these functions, an argument or result of type `text` is expressed in the database's default encoding, while arguments or results of type `bytea` are in an encoding named by another argument.

**Table 9.13. Text/Binary String Conversion Functions**

|                                                                                                                                                                                    Function<br/><br/> Description<br/><br/> Example(s)                                                                                                                                                                                   |
|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|[]() `convert` ( *`bytes`* `bytea`, *`src_encoding`* `name`, *`dest_encoding`* `name` ) → `bytea`<br/><br/> Converts a binary string representing text in encoding *`src_encoding`* to a binary string in encoding *`dest_encoding`* (see [Section 24.3.4](multibyte.html#MULTIBYTE-CONVERSIONS-SUPPORTED) for available conversions).<br/><br/>`convert('text_in_utf8', 'UTF8', 'LATIN1')``\x746578745f696e5f75746638`|
|                           []() `convert_from` ( *`bytes`* `bytea`, *`src_encoding`* `name` ) → `text`<br/><br/> Converts a binary string representing text in encoding *`src_encoding`* to `text` in the database encoding (see [Section 24.3.4](multibyte.html#MULTIBYTE-CONVERSIONS-SUPPORTED) for available conversions).<br/><br/>`convert_from('text_in_utf8', 'UTF8')``text_in_utf8`                            |
|                         []() `convert_to` ( *`string`* `text`, *`dest_encoding`* `name` ) → `bytea`<br/><br/> Converts a `text` string (in the database encoding) to a binary string encoded in encoding *`dest_encoding`* (see [Section 24.3.4](multibyte.html#MULTIBYTE-CONVERSIONS-SUPPORTED) for available conversions).<br/><br/>`convert_to('some_text', 'UTF8')``\x736f6d655f74657874`                         |
|        []() `encode` ( *`bytes`* `bytea`, *`format`* `text` ) → `text`<br/><br/> Encodes binary data into a textual representation; supported *`format`* values are: [`base64`](functions-binarystring.html#ENCODE-FORMAT-BASE64), [`escape`](functions-binarystring.html#ENCODE-FORMAT-ESCAPE), [`hex`](functions-binarystring.html#ENCODE-FORMAT-HEX).<br/><br/>`encode('123\000\001', 'base64')``MTIzAAE=`         |
|                                                                                     []() `decode` ( *`string`* `text`, *`format`* `text` ) → `bytea`<br/><br/> Decodes binary data from a textual representation; supported *`format`* values are the same as for `encode`.<br/><br/>`decode('MTIzAAE=', 'base64')``\x3132330001`                                                                                     |

 The `encode` and `decode` functions support the following textual formats:

base64 []()

 The `base64` format is that of [RFC 2045 Section 6.8](https://tools.ietf.org/html/rfc2045#section-6.8). As per the RFC, encoded lines are broken at 76 characters. However instead of the MIME CRLF end-of-line marker, only a newline is used for end-of-line. The `decode` function ignores carriage-return, newline, space, and tab characters. Otherwise, an error is raised when `decode` is supplied invalid base64 data — including when trailing padding is incorrect.

escape []()

 The `escape` format converts zero bytes and bytes with the high bit set into octal escape sequences (`\`*`nnn`*), and it doubles backslashes. Other byte values are represented literally. The `decode` function will raise an error if a backslash is not followed by either a second backslash or three octal digits; it accepts other byte values unchanged.

hex []()

 The `hex` format represents each 4 bits of data as one hexadecimal digit, `0` through `f`, writing the higher-order digit of each byte first. The `encode` function outputs the `a`-`f` hex digits in lower case. Because the smallest unit of data is 8 bits, there are always an even number of characters returned by `encode`. The `decode` function accepts the `a`-`f` characters in either upper or lower case. An error is raised when `decode` is given invalid hex data — including when given an odd number of characters.

 See also the aggregate function `string_agg` in [Section 9.21](functions-aggregate.html) and the large object functions in [Section 35.4](lo-funcs.html).