## 9.5. Binary String Functions and Operators
[]()
This section describes functions and operators for examining and manipulating binary strings, that is values of type `bytea`. Many of these are equivalent, in purpose and syntax, to the text-string functions described in the previous section.
SQL defines some string functions that use key words, rather than commas, to separate arguments. Details are in [Table 9.11](functions-binarystring.html#FUNCTIONS-BINARYSTRING-SQL). PostgreSQL also provides versions of these functions that use the regular function invocation syntax (see [Table 9.12](functions-binarystring.html#FUNCTIONS-BINARYSTRING-OTHER)).
**Table 9.11. SQL Binary String Functions and Operators**
| Function/Operator
Description
Example(s) |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| []() `bytea` `||` `bytea` → `bytea`
Concatenates the two binary strings.
`'\x123456'::bytea || '\x789a00bcde'::bytea` → `\x123456789a00bcde` |
| []() `bit_length` ( `bytea` ) → `integer`
Returns number of bits in the binary string (8 times the `octet_length`).
`bit_length('\x123456'::bytea)` → `24` |
| []() `octet_length` ( `bytea` ) → `integer`
Returns number of bytes in the binary string.
`octet_length('\x123456'::bytea)` → `3` |
|[]() `overlay` ( *`bytes`* `bytea` `PLACING` *`newsubstring`* `bytea` `FROM` *`start`* `integer` [ `FOR` *`count`* `integer` ] ) → `bytea`
Replaces the substring of *`bytes`* that starts at the *`start`*'th byte and extends for *`count`* bytes with *`newsubstring`*. If *`count`* is omitted, it defaults to the length of *`newsubstring`*.
`overlay('\x1234567890'::bytea placing '\002\003'::bytea from 2 for 3)` → `\x12020390`|
| []() `position` ( *`substring`* `bytea` `IN` *`bytes`* `bytea` ) → `integer`
Returns first starting index of the specified *`substring`* within *`bytes`*, or zero if it's not present.
`position('\x5678'::bytea in '\x1234567890'::bytea)` → `3` |
| []() `substring` ( *`bytes`* `bytea` [ `FROM` *`start`* `integer` ] [ `FOR` *`count`* `integer` ] ) → `bytea`
Extracts the substring of *`bytes`* starting at the *`start`*'th byte if that is specified, and stopping after *`count`* bytes if that is specified. Provide at least one of *`start`* and *`count`*.
`substring('\x1234567890'::bytea from 3 for 2)` → `\x5678` |
| []() `trim` ( [ `LEADING` | `TRAILING` | `BOTH` ] *`bytesremoved`* `bytea` `FROM` *`bytes`* `bytea` ) → `bytea`
Removes the longest string containing only bytes appearing in *`bytesremoved`* from the start, end, or both ends (`BOTH` is the default) of *`bytes`*.
`trim('\x9012'::bytea from '\x1234567890'::bytea)` → `\x345678` |
| `trim` ( [ `LEADING` | `TRAILING` | `BOTH` ] [ `FROM` ] *`bytes`* `bytea`, *`bytesremoved`* `bytea` ) → `bytea`
This is a non-standard syntax for `trim()`.
`trim(both from '\x1234567890'::bytea, '\x9012'::bytea)` → `\x345678` |
Additional binary string manipulation functions are available and are listed in [Table 9.12](functions-binarystring.html#FUNCTIONS-BINARYSTRING-OTHER). Some of them are used internally to implement the SQL-standard string functions listed in [Table 9.11](functions-binarystring.html#FUNCTIONS-BINARYSTRING-SQL).
**Table 9.12. Other Binary String Functions**
| Function
Description
Example(s) |
|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| []() []() `bit_count` ( *`bytes`* `bytea` ) → `bigint`
Returns the number of bits set in the binary string (also known as “popcount”).
`bit_count('\x1234567890'::bytea)` → `15` |
| []() `btrim` ( *`bytes`* `bytea`, *`bytesremoved`* `bytea` ) → `bytea`
Removes the longest string containing only bytes appearing in *`bytesremoved`* from the start and end of *`bytes`*.
`btrim('\x1234567890'::bytea, '\x9012'::bytea)` → `\x345678` |
| []() `get_bit` ( *`bytes`* `bytea`, *`n`* `bigint` ) → `integer`
Extracts [n'th](functions-binarystring.html#FUNCTIONS-ZEROBASED-NOTE) bit from binary string.
`get_bit('\x1234567890'::bytea, 30)` → `1` |
| []() `get_byte` ( *`bytes`* `bytea`, *`n`* `integer` ) → `integer`
Extracts [n'th](functions-binarystring.html#FUNCTIONS-ZEROBASED-NOTE) byte from binary string.
`get_byte('\x1234567890'::bytea, 4)` → `144` |
| []() []() []() `length` ( `bytea` ) → `integer`
Returns the number of bytes in the binary string.
`length('\x1234567890'::bytea)` → `5` |
| `length` ( *`bytes`* `bytea`, *`encoding`* `name` ) → `integer`
Returns the number of characters in the binary string, assuming that it is text in the given *`encoding`*.
`length('jose'::bytea, 'UTF8')` → `4` |
| []() `ltrim` ( *`bytes`* `bytea`, *`bytesremoved`* `bytea` ) → `bytea`
Removes the longest string containing only bytes appearing in *`bytesremoved`* from the start of *`bytes`*.
`ltrim('\x1234567890'::bytea, '\x9012'::bytea)` → `\x34567890` |
| []() `md5` ( `bytea` ) → `text`
Computes the MD5 [hash](functions-binarystring.html#FUNCTIONS-HASH-NOTE) of the binary string, with the result written in hexadecimal.
`md5('Th\000omas'::bytea)` → `8ab2d3c9689aaf18b4958c334c82d8b1` |
| []() `rtrim` ( *`bytes`* `bytea`, *`bytesremoved`* `bytea` ) → `bytea`
Removes the longest string containing only bytes appearing in *`bytesremoved`* from the end of *`bytes`*.
`rtrim('\x1234567890'::bytea, '\x9012'::bytea)` → `\x12345678` |
| []() `set_bit` ( *`bytes`* `bytea`, *`n`* `bigint`, *`newvalue`* `integer` ) → `bytea`
Sets [n'th](functions-binarystring.html#FUNCTIONS-ZEROBASED-NOTE) bit in binary string to *`newvalue`*.
`set_bit('\x1234567890'::bytea, 30, 0)` → `\x1234563890` |
| []() `set_byte` ( *`bytes`* `bytea`, *`n`* `integer`, *`newvalue`* `integer` ) → `bytea`
Sets [n'th](functions-binarystring.html#FUNCTIONS-ZEROBASED-NOTE) byte in binary string to *`newvalue`*.
`set_byte('\x1234567890'::bytea, 4, 64)` → `\x1234567840` |
| []() `sha224` ( `bytea` ) → `bytea`
Computes the SHA-224 [hash](functions-binarystring.html#FUNCTIONS-HASH-NOTE) of the binary string.
`sha224('abc'::bytea)` → `\x23097d223405d8228642a477bda255b32aadbce4bda0b3f7e36c9da7` |
| []() `sha256` ( `bytea` ) → `bytea`
Computes the SHA-256 [hash](functions-binarystring.html#FUNCTIONS-HASH-NOTE) of the binary string.
`sha256('abc'::bytea)` → `\xba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad` |
| []() `sha384` ( `bytea` ) → `bytea`
Computes the SHA-384 [hash](functions-binarystring.html#FUNCTIONS-HASH-NOTE) of the binary string.
`sha384('abc'::bytea)` → `\xcb00753f45a35e8bb5a03d699ac65007272c32ab0eded1631a8b605a43ff5bed8086072ba1e7cc2358baeca134c825a7` |
| []() `sha512` ( `bytea` ) → `bytea`
Computes the SHA-512 [hash](functions-binarystring.html#FUNCTIONS-HASH-NOTE) of the binary string.
`sha512('abc'::bytea)` → `\xddaf35a193617abacc417349ae20413112e6fa4e89a97ea20a9eeee64b55d39a2192992a274fc1a836ba3c23a3feebbd454d4423643ce80e2a9ac94fa54ca49f` |
|[]() `substr` ( *`bytes`* `bytea`, *`start`* `integer` [, *`count`* `integer` ] ) → `bytea`
Extracts the substring of *`bytes`* starting at the *`start`*'th byte, and extending for *`count`* bytes if that is specified. (Same as `substring(*`bytes`* from *`start`* for *`count`*)`.)
`substr('\x1234567890'::bytea, 3, 2)` → `\x5678`|
Functions `get_byte` and `set_byte` number the first byte of a binary string as byte 0. Functions `get_bit` and `set_bit` number bits from the right within each byte; for example bit 0 is the least significant bit of the first byte, and bit 15 is the most significant bit of the second byte.
For historical reasons, the function `md5` returns a hex-encoded value of type `text` whereas the SHA-2 functions return type `bytea`. Use the functions [`encode`](functions-binarystring.html#FUNCTION-ENCODE) and [`decode`](functions-binarystring.html#FUNCTION-DECODE) to convert between the two. For example write `encode(sha256('abc'), 'hex')` to get a hex-encoded text representation, or `decode(md5('abc'), 'hex')` to get a `bytea` value.
[]() []() Functions for converting strings between different character sets (encodings), and for representing arbitrary binary data in textual form, are shown in [Table 9.13](functions-binarystring.html#FUNCTIONS-BINARYSTRING-CONVERSIONS). For these functions, an argument or result of type `text` is expressed in the database's default encoding, while arguments or results of type `bytea` are in an encoding named by another argument.
**Table 9.13. Text/Binary String Conversion Functions**
| Function
Description
Example(s) |
|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|[]() `convert` ( *`bytes`* `bytea`, *`src_encoding`* `name`, *`dest_encoding`* `name` ) → `bytea`
Converts a binary string representing text in encoding *`src_encoding`* to a binary string in encoding *`dest_encoding`* (see [Section 24.3.4](multibyte.html#MULTIBYTE-CONVERSIONS-SUPPORTED) for available conversions).
`convert('text_in_utf8', 'UTF8', 'LATIN1')` → `\x746578745f696e5f75746638`|
| []() `convert_from` ( *`bytes`* `bytea`, *`src_encoding`* `name` ) → `text`
Converts a binary string representing text in encoding *`src_encoding`* to `text` in the database encoding (see [Section 24.3.4](multibyte.html#MULTIBYTE-CONVERSIONS-SUPPORTED) for available conversions).
`convert_from('text_in_utf8', 'UTF8')` → `text_in_utf8` |
| []() `convert_to` ( *`string`* `text`, *`dest_encoding`* `name` ) → `bytea`
Converts a `text` string (in the database encoding) to a binary string encoded in encoding *`dest_encoding`* (see [Section 24.3.4](multibyte.html#MULTIBYTE-CONVERSIONS-SUPPORTED) for available conversions).
`convert_to('some_text', 'UTF8')` → `\x736f6d655f74657874` |
| []() `encode` ( *`bytes`* `bytea`, *`format`* `text` ) → `text`
Encodes binary data into a textual representation; supported *`format`* values are: [`base64`](functions-binarystring.html#ENCODE-FORMAT-BASE64), [`escape`](functions-binarystring.html#ENCODE-FORMAT-ESCAPE), [`hex`](functions-binarystring.html#ENCODE-FORMAT-HEX).
`encode('123\000\001', 'base64')` → `MTIzAAE=` |
| []() `decode` ( *`string`* `text`, *`format`* `text` ) → `bytea`
Decodes binary data from a textual representation; supported *`format`* values are the same as for `encode`.
`decode('MTIzAAE=', 'base64')` → `\x3132330001` |
The `encode` and `decode` functions support the following textual formats:
base64 []()
The `base64` format is that of [RFC 2045 Section 6.8](https://tools.ietf.org/html/rfc2045#section-6.8). As per the RFC, encoded lines are broken at 76 characters. However instead of the MIME CRLF end-of-line marker, only a newline is used for end-of-line. The `decode` function ignores carriage-return, newline, space, and tab characters. Otherwise, an error is raised when `decode` is supplied invalid base64 data — including when trailing padding is incorrect.
escape []()
The `escape` format converts zero bytes and bytes with the high bit set into octal escape sequences (`\`*`nnn`*), and it doubles backslashes. Other byte values are represented literally. The `decode` function will raise an error if a backslash is not followed by either a second backslash or three octal digits; it accepts other byte values unchanged.
hex []()
The `hex` format represents each 4 bits of data as one hexadecimal digit, `0` through `f`, writing the higher-order digit of each byte first. The `encode` function outputs the `a`-`f` hex digits in lower case. Because the smallest unit of data is 8 bits, there are always an even number of characters returned by `encode`. The `decode` function accepts the `a`-`f` characters in either upper or lower case. An error is raised when `decode` is given invalid hex data — including when given an odd number of characters.
See also the aggregate function `string_agg` in [Section 9.21](functions-aggregate.html) and the large object functions in [Section 35.4](lo-funcs.html).