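# 4.1. Lexical Structure

4.1.1. Identifiers and Key Words

4.1.2. Constants

4.1.3. Operators

4.1.4. Special Characters

4.1.5. Comments

4.1.6. Operator Precedence

SQL input consists of a sequence of commands. A command is composed of a sequence of tokens, terminated by a semicolon (“;”). The end of the input stream also terminates a command.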

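Which tokens are valid depends on the syntax of the particular command. A token can be a key word, an identifier, a quoted identifier, a literal (or constant), or a special character symbol.

Tokens are normally separated by whitespace (space, tab, newline), but need not be if there is no ambiguity (which is generally only the case if a special character is adjacent to some other token type).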

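For example, the following is (syntactically) valid SQL input:

SELECT * FROM MY_TABLE;
UPDATE MY_TABLE SET A = 5;
INSERT INTO MY_TABLE VALUES (3, 'hi there');

This is a sequence of three commands, one per line (although this is not required; more than one command can be on a line, and commands can usefully be split across lines).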

Additionally, comments can occur in SQL input. They are not tokens; they are effectively equivalent to whitespace.

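The SQL syntax is not very consistent regarding which tokens identify commands and which are operands or parameters. The first few tokens are generally the command name, so in the above example we would usually speak of a “SELECT”, an “UPDATE”, and an “INSERT” command. But, for instance, the UPDATE command always requires a SET token to appear in a certain position, and this particular variation of INSERT also requires a VALUES token in order to be complete. The precise syntax rules for each command are described in Part VI.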

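# 4.1.1. Identifiers and Key Words

Tokens such as SELECT, UPDATE, or VALUES in the example above are examples of key words, that is, words that have a fixed meaning in the SQL language. The tokens MY_TABLE and A are examples of identifiers. They identify names of tables, columns, or other database objects, depending on the command they are used in. Therefore they are sometimes simply called “names”. Key words and identifiers have the same lexical structure, meaning that one cannot know whether a token is an identifier or a key word without knowing the language. A complete list of key words can be found in Appendix C.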

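SQL identifiers and key words must begin with a letter (a-z, but also letters with diacritical marks and non-Latin letters) or an underscore (_). Subsequent characters in an identifier or key word can be letters, underscores, digits (0-9), or dollar signs ($). Note that dollar signs are not allowed in identifiers according to the letter of the SQL standard, so their use might render applications less portable. The SQL standard will not define a key word that contains digits or starts or ends with an underscore, so identifiers of this form are safe against possible conflict with future extensions of the standard.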

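The system uses no more than NAMEDATALEN-1 bytes of an identifier; longer names can be written in commands, but they will be truncated. By default, NAMEDATALEN is 64, so the maximum identifier length is 63 bytes. If this limit is problematic, it can be raised by changing the NAMEDATALEN constant in src/include/pg_config_manual.h.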

Key words and unquoted identifiers are case insensitive. Therefore:

UPDATE MY_TABLE SET A = 5;

can equivalently be written as:

uPDaTE my_TabLE SeT a = 5;

A convention often used is to write key words in upper case and names in lower case, e.g.:

UPDATE my_table SET a = 5;

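There is a second kind of identifier: the delimited identifier or quoted identifier. It is formed by enclosing an arbitrary sequence of characters in double quotes ("). A delimited identifier is always an identifier, never a key word. So "select" could be used to refer to a column or table named “select”, whereas an unquoted select would be taken as a key word and would therefore provoke a parse error when used where a table or column name is expected. The example can be written with quoted identifiers like this: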

UPDATE "my_table" SET "a" = 5;

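Quoted identifiers can contain any character, except the character with code zero. (To include a double quote, write two double quotes.) This allows constructing table or column names that would otherwise not be possible, such as ones containing spaces or ampersands. The length limitation still applies.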

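Quoting an identifier also makes it case-sensitive, whereas unquoted names are always folded to lower case. For example, the identifiers FOO, foo, and "foo" are considered the same by PostgreSQL, but "Foo" and "FOO" are different from these three and from each other. (The folding of unquoted names to lower case in PostgreSQL is incompatible with the SQL standard, which says that unquoted names should be folded to upper case. Thus, foo should be equivalent to "FOO", not "foo", according to the standard. If you want to write portable applications you are advised to always quote a particular name or never quote it.)

A variant of quoted identifiers allows including escaped Unicode characters identified by their code points. This variant starts with U& (upper or lower case U followed by ampersand) immediately before the opening double quote, without any spaces in between, for example U&"foo". (Note that this creates an ambiguity with the operator &. Use spaces around the operator to avoid this problem.) Inside the quotes, Unicode characters can be specified in escaped form by writing a backslash followed by the four-digit hexadecimal code point number or alternatively a backslash followed by a plus sign followed by a six-digit hexadecimal code point number. For example, the identifier "data" could be written as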

U&"d\0061t\+000061"

The following less trivial example writes the Russian word “slon” (elephant) in Cyrillic letters:

U&"\0441\043B\043E\043D"

If a different escape character than backslash is desired, it can be specified using the UESCAPE clause after the string, for example:

U&"d!0061t!+000061" UESCAPE '!'

The escape character can be any single character other than a hexadecimal digit, the plus sign, a single quote, a double quote, or a whitespace character. Note that the escape character is written in single quotes, not double quotes, after UESCAPE.

To include the escape character in the identifier literally, write it twice.

Either the 4-digit or the 6-digit escape form can be used to specify UTF-16 surrogate pairs to compose characters with code points larger than U+FFFF, although the availability of the 6-digit form technically makes this unnecessary. (Surrogate pairs are not stored directly, but are combined into a single code point.)

If the server encoding is not UTF-8, the Unicode code point identified by one of these escape sequences is converted to the actual server encoding; an error is reported if that's not possible.
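The case-folding, quoting, and truncation rules from this subsection can be summarized in a short Python sketch. The function name `normalize_identifier` is hypothetical, UTF-8 is assumed as the encoding, and Python's `str.lower()` only approximates the server's folding of non-ASCII letters; this is an illustration of the rules as described above, not the server's implementation.

```python
NAMEDATALEN = 64  # default value named in the text above


def normalize_identifier(token: str) -> str:
    """Return the name an identifier token would refer to (illustrative only)."""
    if len(token) >= 2 and token.startswith('"') and token.endswith('"'):
        # Quoted: case is preserved, and each doubled double quote
        # stands for one literal double quote.
        name = token[1:-1].replace('""', '"')
    else:
        # Unquoted: folded to lower case (assumption: str.lower() is close
        # enough to the server's folding for this sketch).
        name = token.lower()
    if "\x00" in name:
        raise ValueError("the character with code zero is not allowed")
    # Keep at most NAMEDATALEN-1 bytes; a partial trailing character is dropped.
    raw = name.encode("utf-8")[: NAMEDATALEN - 1]
    return raw.decode("utf-8", errors="ignore")
```

With this sketch, `FOO`, `foo`, and `"foo"` all normalize to the same name, while `"Foo"` and `"FOO"` stay distinct.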

# 4.1.2. Constants

There are three kinds of implicitly-typed constants in PostgreSQL: strings, bit strings, and numbers. Constants can also be specified with explicit types, which can enable more accurate representation and more efficient handling by the system. These alternatives are discussed in the following subsections.

# 4.1.2.1. String Constants

A string constant in SQL is an arbitrary sequence of characters bounded by single quotes ('), for example 'This is a string'. To include a single-quote character within a string constant, write two adjacent single quotes, e.g., 'Dianne''s horse'. Note that this is not the same as a double-quote character (").

Two string constants that are only separated by whitespace with at least one newline are concatenated and effectively treated as if the string had been written as one constant. For example:

SELECT 'foo'
'bar';

is equivalent to:

SELECT 'foobar';

but:

SELECT 'foo'      'bar';

is not valid syntax. (This slightly bizarre behavior is specified by SQL; PostgreSQL is following the standard.)
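The continuation rule can be sketched as a small reader for plain string constants. The function name `read_string_constant` is hypothetical, and the sketch ignores comments inside the separating whitespace and all prefixed forms (E, U&, B, X); it only illustrates the doubling and newline rules stated above.

```python
def read_string_constant(sql, pos=0):
    """Read a string constant starting at sql[pos]; return (value, end_pos)."""
    if sql[pos] != "'":
        raise ValueError("expected an opening single quote")
    parts = []
    i = pos + 1
    while True:
        if i >= len(sql):
            raise ValueError("unterminated string constant")
        if sql[i] != "'":
            parts.append(sql[i])
            i += 1
        elif sql.startswith("''", i):
            parts.append("'")  # two adjacent quotes stand for one quote
            i += 2
        else:
            i += 1  # closing quote
            # Whitespace containing at least one newline, followed by another
            # opening quote, continues the same constant.
            j = i
            while j < len(sql) and sql[j] in " \t\r\n\f":
                j += 1
            if j < len(sql) and sql[j] == "'" and "\n" in sql[i:j]:
                i = j + 1
                continue
            return "".join(parts), i
```

For `'foo'      'bar'` the reader stops after `'foo'`, which leaves a second constant where no token is expected; that corresponds to the syntax error described above.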

# 4.1.2.2. String Constants with C-Style Escapes

PostgreSQL also accepts “escape” string constants, which are an extension to the SQL standard. An escape string constant is specified by writing the letter E (upper or lower case) just before the opening single quote, e.g., E'foo'. (When continuing an escape string constant across lines, write E only before the first opening quote.) Within an escape string, a backslash character (\) begins a C-like backslash escape sequence, in which the combination of backslash and following character(s) represent a special byte value, as shown in Table 4.1.

Table 4.1. Backslash Escape Sequences

| Backslash Escape Sequence | Interpretation |
|---|---|
| \b | backspace |
| \f | form feed |
| \n | newline |
| \r | carriage return |
| \t | tab |
| \o, \oo, \ooo (o = 0–7) | octal byte value |
| \xh, \xhh (h = 0–9, A–F) | hexadecimal byte value |
| \uxxxx, \Uxxxxxxxx (x = 0–9, A–F) | 16 or 32-bit hexadecimal Unicode character value |

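Any other character following a backslash is taken literally. Thus, to include a backslash character, write two backslashes (\\). Also, a single quote can be included in an escape string by writing \', in addition to the normal way of ''.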
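As a rough illustration of Table 4.1, the following Python sketch decodes the text between the quotes of an escape string constant into bytes. The name `decode_escape_body` is hypothetical, UTF-8 is assumed as the server encoding, octal values are reduced to one byte, and malformed \u or \U sequences are not diagnosed, so this should be read as a sketch of the rules rather than the server's exact behavior.

```python
import re

_SIMPLE = {"b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t"}
_ESCAPE = re.compile(
    r"(?P<qq>'')"
    r"|\\(?:(?P<oct>[0-7]{1,3})|x(?P<hex>[0-9A-Fa-f]{1,2})"
    r"|u(?P<u4>[0-9A-Fa-f]{4})|U(?P<u8>[0-9A-Fa-f]{8})|(?P<other>.))",
    re.DOTALL,
)


def decode_escape_body(body):
    """Decode the text between the quotes of E'...' into bytes."""
    out = bytearray()
    pos = 0
    for m in _ESCAPE.finditer(body):
        out += body[pos:m.start()].encode("utf-8")  # unescaped text
        if m["qq"] is not None:
            out += b"'"                              # '' is one quote
        elif m["oct"] is not None:
            out.append(int(m["oct"], 8) & 0xFF)      # octal byte value
        elif m["hex"] is not None:
            out.append(int(m["hex"], 16))            # hexadecimal byte value
        elif m["u4"] is not None or m["u8"] is not None:
            code_point = int(m["u4"] or m["u8"], 16)
            out += chr(code_point).encode("utf-8")   # Unicode character
        else:
            ch = m["other"]
            out += _SIMPLE.get(ch, ch).encode("utf-8")  # \n etc., else literal
        pos = m.end()
    out += body[pos:].encode("utf-8")
    return bytes(out)
```

Because the octal and hexadecimal forms append raw bytes, the result is not guaranteed to be valid in the assumed encoding.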

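It is your responsibility that the byte sequences you create, especially when using the octal or hexadecimal escapes, compose valid characters in the server character set encoding. A useful alternative is to use Unicode escapes or the alternative Unicode escape syntax, explained in Section 4.1.2.3; then the server will check that the character conversion is possible.

# Caution

If the configuration parameter standard_conforming_strings is off, then PostgreSQL recognizes backslash escapes in both regular and escape string constants. However, as of PostgreSQL 9.1, the default is on, meaning that backslash escapes are recognized only in escape string constants. This behavior is more standards-compliant, but might break applications which rely on the historical behavior, where backslash escapes were always recognized. As a workaround, you can set this parameter to off, but it is better to migrate away from using backslash escapes. If you need to use a backslash escape to represent a special character, write the string constant with an E.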

In addition to standard_conforming_strings, the configuration parameters escape_string_warning and backslash_quote govern treatment of backslashes in string constants.

The character with the code zero cannot be in a string constant.

# 4.1.2.3. String Constants with Unicode Escapes

PostgreSQL also supports another type of escape syntax for strings that allows specifying arbitrary Unicode characters by code point. A Unicode escape string constant starts with U& (upper or lower case letter U followed by ampersand) immediately before the opening quote, without any spaces in between, for example U&'foo'. (Note that this creates an ambiguity with the operator &. Use spaces around the operator to avoid this problem.) Inside the quotes, Unicode characters can be specified in escaped form by writing a backslash followed by the four-digit hexadecimal code point number or alternatively a backslash followed by a plus sign followed by a six-digit hexadecimal code point number. For example, the string 'data' could be written as

U&'d\0061t\+000061'

The following less trivial example writes the Russian word “slon” (elephant) in Cyrillic letters:

U&'\0441\043B\043E\043D'

If a different escape character than backslash is desired, it can be specified using the UESCAPE clause after the string, for example:

U&'d!0061t!+000061' UESCAPE '!'

The escape character can be any single character other than a hexadecimal digit, the plus sign, a single quote, a double quote, or a whitespace character.

To include the escape character in the string literally, write it twice.

Either the 4-digit or the 6-digit escape form can be used to specify UTF-16 surrogate pairs to compose characters with code points larger than U+FFFF, although the availability of the 6-digit form technically makes this unnecessary. (Surrogate pairs are not stored directly, but are combined into a single code point.)

If the server encoding is not UTF-8, the Unicode code point identified by one of these escape sequences is converted to the actual server encoding; an error is reported if that's not possible.

Also, the Unicode escape syntax for string constants only works when the configuration parameter standard_conforming_strings is turned on. This is because otherwise this syntax could confuse clients that parse the SQL statements to the point that it could lead to SQL injections and similar security issues. If the parameter is set to off, this syntax will be rejected with an error message.
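The Unicode escape rules of this subsection, including the UESCAPE character, the doubled escape character, and surrogate pairs, can be sketched in Python as follows. The name `decode_unicode_escapes` is hypothetical; the sketch works on the text between the quotes, assumes quote doubling has already been handled, and does not model conversion to a non-UTF-8 server encoding.

```python
_HEX = "0123456789abcdefABCDEF"


def decode_unicode_escapes(body, escape="\\"):
    """Decode the text between the quotes of a U& constant or identifier."""
    if len(escape) != 1 or escape in _HEX or escape in "+'\" \t\r\n\f":
        raise ValueError("invalid escape character")
    code_points = []
    i = 0
    while i < len(body):
        if body[i] != escape:
            code_points.append(ord(body[i]))
            i += 1
        elif body.startswith(escape * 2, i):
            code_points.append(ord(escape))  # doubled escape is literal
            i += 2
        else:
            if body.startswith("+", i + 1):  # six-digit form
                digits, width, step = body[i + 2:i + 8], 6, 8
            else:                            # four-digit form
                digits, width, step = body[i + 1:i + 5], 4, 5
            if len(digits) != width or any(c not in _HEX for c in digits):
                raise ValueError("invalid Unicode escape")
            code_points.append(int(digits, 16))
            i += step
    chars = []
    k = 0
    while k < len(code_points):
        cp = code_points[k]
        if 0xD800 <= cp <= 0xDBFF:
            # A high surrogate must be followed by a low surrogate; the pair
            # is combined into a single code point.
            low = code_points[k + 1] if k + 1 < len(code_points) else 0
            if not 0xDC00 <= low <= 0xDFFF:
                raise ValueError("invalid surrogate pair")
            cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00)
            k += 1
        elif 0xDC00 <= cp <= 0xDFFF or cp == 0:
            raise ValueError("invalid code point")
        chars.append(chr(cp))
        k += 1
    return "".join(chars)
```

Passing `"!"` as the second argument mirrors the UESCAPE '!' example above.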

# 4.1.2.4. Dollar-Quoted String Constants

While the standard syntax for specifying string constants is usually convenient, it can be difficult to understand when the desired string contains many single quotes or backslashes, since each of those must be doubled. To allow more readable queries in such situations, PostgreSQL provides another way, called “dollar quoting”, to write string constants. A dollar-quoted string constant consists of a dollar sign ($), an optional “tag” of zero or more characters, another dollar sign, an arbitrary sequence of characters that makes up the string content, a dollar sign, the same tag that began this dollar quote, and a dollar sign. For example, here are two different ways to specify the string “Dianne's horse” using dollar quoting:

$$Dianne's horse$$
$SomeTag$Dianne's horse$SomeTag$

Notice that inside the dollar-quoted string, single quotes can be used without needing to be escaped. Indeed, no characters inside a dollar-quoted string are ever escaped: the string content is always written literally. Backslashes are not special, and neither are dollar signs, unless they are part of a sequence matching the opening tag.

It is possible to nest dollar-quoted string constants by choosing different tags at each nesting level. This is most commonly used in writing function definitions. For example:

$function$
BEGIN
    RETURN ($1 ~ $q$[\t\r\n\v\\]$q$);
END;
$function$

Here, the sequence $q$[\t\r\n\v\\]$q$ represents a dollar-quoted literal string [\t\r\n\v\\], which will be recognized when the function body is executed by PostgreSQL. But since the sequence does not match the outer dollar quoting delimiter $function$, it is just some more characters within the constant so far as the outer string is concerned.

The tag, if any, of a dollar-quoted string follows the same rules as an unquoted identifier, except that it cannot contain a dollar sign. Tags are case sensitive, so $tag$String content$tag$ is correct, but $TAG$String content$tag$ is not.

A dollar-quoted string that follows a keyword or identifier must be separated from it by whitespace; otherwise the dollar quoting delimiter would be taken as part of the preceding identifier.

Dollar quoting is not part of the SQL standard, but it is often a more convenient way to write complicated string literals than the standard-compliant single quote syntax. It is particularly useful when representing string constants inside other constants, as is often needed in procedural function definitions. With single-quote syntax, each backslash in the above example would have to be written as four backslashes, which would be reduced to two backslashes in parsing the original string constant, and then to one when the inner string constant is re-parsed during function execution.
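The delimiter matching described in this subsection can be sketched in a few lines of Python. The name `read_dollar_quoted` is hypothetical, and the tag pattern is restricted to ASCII letters, digits, and underscores for brevity, which is narrower than the identifier rules given earlier.

```python
import re

# Opening delimiter: $, an optional tag without dollar signs, then $.
_OPEN = re.compile(r"\$([A-Za-z_][A-Za-z_0-9]*)?\$")


def read_dollar_quoted(sql, pos=0):
    """Return (content, end_pos) for a dollar-quoted constant at sql[pos]."""
    m = _OPEN.match(sql, pos)
    if m is None:
        raise ValueError("no dollar-quote delimiter at this position")
    delimiter = m.group(0)
    # The content runs up to the first exact, case-sensitive repetition of the
    # opening delimiter; nothing inside is treated as an escape.
    end = sql.find(delimiter, m.end())
    if end < 0:
        raise ValueError("unterminated dollar-quoted string")
    return sql[m.end():end], end + len(delimiter)
```

Applied to the function body example above, the outer call returns everything between the two $function$ delimiters, and a second call on that content at the position of $q$ returns the inner literal.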

# 4.1.2.5. Bit-String Constants

Bit-string constants look like regular string constants with a B (upper or lower case) immediately before the opening quote (no intervening whitespace), e.g., B'1001'. The only characters allowed within bit-string constants are 0 and 1.

Alternatively, bit-string constants can be specified in hexadecimal notation, using a leading X (upper or lower case), e.g., X'1FF'. This notation is equivalent to a bit-string constant with four binary digits for each hexadecimal digit.

Both forms of bit-string constant can be continued across lines in the same way as regular string constants. Dollar quoting cannot be used in a bit-string constant.
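The equivalence between the two notations can be checked with a small Python sketch. The name `bit_string_value` is hypothetical, and continuation across lines is not handled.

```python
_HEX_DIGITS = "0123456789abcdefABCDEF"


def bit_string_value(token):
    """Expand a B'...' or X'...' constant into its string of bits."""
    prefix, body = token[0].upper(), token[1:]
    if len(body) < 2 or body[0] != "'" or body[-1] != "'":
        raise ValueError("expected a quoted body right after the prefix")
    digits = body[1:-1]
    if prefix == "B":
        if any(d not in "01" for d in digits):
            raise ValueError("only 0 and 1 are allowed")
        return digits
    if prefix == "X":
        if any(d not in _HEX_DIGITS for d in digits):
            raise ValueError("only hexadecimal digits are allowed")
        # Four binary digits for each hexadecimal digit.
        return "".join(format(int(d, 16), "04b") for d in digits)
    raise ValueError("not a bit-string constant")
```

Under this sketch X'1FF' expands to the twelve bits 000111111111.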

# 4.1.2.6. Numeric Constants

Numeric constants are accepted in these general forms:

digits
digits.[digits][e[+-]digits]
[digits].digits[e[+-]digits]
digitse[+-]digits

where digits is one or more decimal digits (0 through 9). At least one digit must be before or after the decimal point, if one is used. At least one digit must follow the exponent marker (e), if one is present. There cannot be any spaces or other characters embedded in the constant. Note that any leading plus or minus sign is not actually considered part of the constant; it is an operator applied to the constant.

These are some examples of valid numeric constants:

42
3.5
4.
.001
5e2
1.925e-3

A numeric constant that contains neither a decimal point nor an exponent is initially presumed to be type integer if its value fits in type integer (32 bits); otherwise it is presumed to be type bigint if its value fits in type bigint (64 bits); otherwise it is taken to be type numeric. Constants that contain decimal points and/or exponents are always initially presumed to be type numeric.

The initially assigned data type of a numeric constant is just a starting point for the type resolution algorithms. In most cases the constant will be automatically coerced to the most appropriate type depending on context. When necessary, you can force a numeric value to be interpreted as a specific data type by casting it. For example, you can force a numeric value to be treated as type real (float4) by writing:
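The accepted forms and the initial type assignment can be expressed as a short Python sketch. The name `initial_numeric_type` is hypothetical; the regular expression is a direct transcription of the four general forms listed above, and because a leading sign is an operator rather than part of the constant, only unsigned tokens are accepted.

```python
import re

# digits | digits.[digits] | [digits].digits, each with an optional exponent.
_NUMERIC = re.compile(r"(?:[0-9]+\.[0-9]*|\.[0-9]+|[0-9]+)(?:[eE][+-]?[0-9]+)?")


def initial_numeric_type(token):
    """Return the type a numeric constant is initially presumed to have."""
    if not _NUMERIC.fullmatch(token):
        raise ValueError("not a numeric constant: %r" % token)
    if "." in token or "e" in token.lower():
        return "numeric"        # decimal point and/or exponent present
    value = int(token)
    if value <= 2**31 - 1:
        return "integer"        # fits in 32 bits
    if value <= 2**63 - 1:
        return "bigint"         # fits in 64 bits
    return "numeric"
```

One consequence of the sign rule is that 2147483648 is classified as bigint even when a minus sign precedes it in the query text.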

REAL '1.23'  -- string style
1.23::REAL   -- PostgreSQL (historical) style

These are actually just special cases of the general casting notations discussed next.

# 4.1.2.7. Constants of Other Types

A constant of an arbitrary type can be entered using any one of the following notations:

type 'string'
'string'::type
CAST ( 'string' AS type )

The string constant's text is passed to the input conversion routine for the type called type. The result is a constant of the indicated type. The explicit type cast can be omitted if there is no ambiguity as to the type the constant must be (for example, when it is assigned directly to a table column), in which case it is automatically coerced.

The string constant can be written using either regular SQL notation or dollar-quoting.

It is also possible to specify a type coercion using a function-like syntax:

typename ( 'string' )

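but not all type names can be used in this way; see Section 4.2.9 for details.

The ::, CAST(), and function-call syntaxes can also be used to specify run-time type conversions of arbitrary expressions, as discussed in Section 4.2.9. To avoid syntactic ambiguity, the type 'string' syntax can only be used to specify the type of a simple literal constant. Another restriction on the type 'string' syntax is that it does not work for array types; use :: or CAST() to specify the type of an array constant.

The CAST() syntax conforms to SQL. The type 'string' syntax is a generalization of the standard: SQL specifies this syntax only for a few data types, but PostgreSQL allows it for all types. The syntax with :: is historical PostgreSQL usage, as is the function-call syntax.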

# 4.1.3. Operators

An operator name is a sequence of up to NAMEDATALEN-1 (63 by default) characters from the following list:

+ - * / < > = ~ ! @ # % ^ & | ` ?

There are a few restrictions on operator names, however:

  • -- and /* cannot appear anywhere in an operator name, since they will be taken as the start of a comment.

  • A multiple-character operator name cannot end in + or -, unless the name also contains at least one of these characters:

    ~ ! @ # % ^ & | ` ?

    For example, @- is an allowed operator name, but *- is not. This restriction allows PostgreSQL to parse SQL-compliant queries without requiring spaces between tokens.

When working with non-SQL-standard operator names, you will usually need to separate adjacent operators with spaces to avoid ambiguity. For example, if you have defined a prefix operator named @, you cannot write X*@Y; you must write X* @Y to ensure that PostgreSQL reads it as two operator names not one.
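The naming restrictions above translate into a compact check. In this Python sketch the function name `is_valid_operator_name` and its `namedatalen` parameter are hypothetical; only the rules stated in this subsection are encoded.

```python
OPERATOR_CHARS = set("+-*/<>=~!@#%^&|`?")
# Characters that lift the restriction on a trailing + or -.
UNLOCKING_CHARS = set("~!@#%^&|`?")


def is_valid_operator_name(name, namedatalen=64):
    """Check a candidate operator name against the rules listed above."""
    if not name or len(name) > namedatalen - 1:
        return False
    if any(c not in OPERATOR_CHARS for c in name):
        return False
    if "--" in name or "/*" in name:
        return False  # would be read as the start of a comment
    if len(name) > 1 and name[-1] in "+-" and not (set(name) & UNLOCKING_CHARS):
        return False  # multi-character names may not end in + or -
    return True
```

The check accepts @- and rejects *-, matching the example in the list.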

# 4.1.4. Special Characters

Some characters that are not alphanumeric have a special meaning that is different from being an operator. Details on the usage can be found at the location where the respective syntax element is described. This section only exists to advise the existence and summarize the purposes of these characters.

  • A dollar sign ($) followed by digits is used to represent a positional parameter in the body of a function definition or a prepared statement. In other contexts the dollar sign can be part of an identifier or a dollar-quoted string constant.

  • Parentheses (()) have their usual meaning to group expressions and enforce precedence. In some cases parentheses are required as part of the fixed syntax of a particular SQL command.

  • Brackets ([]) are used to select the elements of an array. See Section 8.15 for more information on arrays.

  • Commas (,) are used in some syntactical constructs to separate the elements of a list.

  • The semicolon (;) terminates an SQL command. It cannot appear anywhere within a command, except within a string constant or quoted identifier.

  • The colon (:) is used to select “slices” from arrays. (See Section 8.15.) In certain SQL dialects (such as Embedded SQL), the colon is used to prefix variable names.

  • The asterisk (*) is used in some contexts to denote all the fields of a table row or composite value. It also has a special meaning when used as the argument of an aggregate function, namely that the aggregate does not require any explicit parameter.

  • The period (.) is used in numeric constants, and to separate schema, table, and column names.

# 4.1.5. Comments

A comment is a sequence of characters beginning with double dashes and extending to the end of the line, e.g.:

-- This is a standard SQL comment

Alternatively, C-style block comments can be used:

/* multiline comment
 * with nesting: /* nested block comment */
 */

where the comment begins with /* and extends to the matching occurrence of */. These block comments nest, as specified in the SQL standard but unlike C, so that one can comment out larger blocks of code that might contain existing block comments.

A comment is removed from the input stream before further syntax analysis and is effectively replaced by whitespace.
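The nesting behavior can be illustrated with a depth counter. In this Python sketch the name `strip_comments` is hypothetical, and, as a deliberate simplification, string constants, quoted identifiers, and dollar-quoted strings are not recognized, so comment markers inside them would be mishandled.

```python
def strip_comments(sql):
    """Replace every comment with one space (simplified: no string handling)."""
    out = []
    i = 0
    depth = 0  # current nesting level of /* ... */ comments
    while i < len(sql):
        two = sql[i:i + 2]
        if depth:
            if two == "/*":
                depth += 1
                i += 2
            elif two == "*/":
                depth -= 1
                i += 2
                if depth == 0:
                    out.append(" ")  # the whole comment acts as whitespace
            else:
                i += 1
        elif two == "/*":
            depth = 1
            i += 2
        elif two == "--":
            end = sql.find("\n", i)
            i = len(sql) if end < 0 else end  # the newline itself is kept
            out.append(" ")
        else:
            out.append(sql[i])
            i += 1
    if depth:
        raise ValueError("unterminated /* comment")
    return "".join(out)
```

Because the depth only returns to zero at the matching */, an inner comment does not end the outer one.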

# 4.1.6. Operator Precedence

Table 4.2 shows the precedence and associativity of the operators in PostgreSQL. Most operators have the same precedence and are left-associative. The precedence and associativity of the operators is hard-wired into the parser. Add parentheses if you want an expression with multiple operators to be parsed in some other way than what the precedence rules imply.

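Table 4.2. Operator Precedence (highest to lowest)

| Operator/Element | Associativity | Description |
|---|---|---|
| . | left | table/column name separator |
| :: | left | PostgreSQL-style typecast |
| [ ] | left | array element selection |
| + - | right | unary plus, unary minus |
| ^ | left | exponentiation |
| * / % | left | multiplication, division, modulo |
| + - | left | addition, subtraction |
| (any other operator) | left | all other native and user-defined operators |
| BETWEEN IN LIKE ILIKE SIMILAR | | range containment, set membership, string matching |
| < > = <= >= <> | | comparison operators |
| IS ISNULL NOTNULL | | IS TRUE, IS FALSE, IS NULL, IS DISTINCT FROM, etc. |
| NOT | right | logical negation |
| AND | left | logical conjunction |
| OR | left | logical disjunction |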

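Note that the operator precedence rules also apply to user-defined operators that have the same names as the built-in operators mentioned above. For example, if you define a “+” operator for some custom data type, it will have the same precedence as the built-in “+” operator, no matter what yours does.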
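To see how the arithmetic rows of Table 4.2 determine grouping, here is a minimal precedence-climbing sketch in Python. The names `BINARY`, `UNARY_LEVEL`, `parse`, and `grouping` are hypothetical, the numeric levels are only an encoding of the table order, and the sketch covers integers, parentheses, unary plus and minus, and the operators ^ * / % + - only. It is not the PostgreSQL parser.

```python
import re

# Arithmetic rows of Table 4.2; a higher number binds tighter.
BINARY = {"^": 3, "*": 2, "/": 2, "%": 2, "+": 1, "-": 1}
UNARY_LEVEL = 4  # unary plus and minus sit above ^ in the table


def tokenize(text):
    return re.findall(r"[0-9]+|[-+*/%^()]", text)


def parse(tokens, min_level=1):
    """Precedence climbing; returns a fully parenthesized string."""
    tok = tokens.pop(0)
    if tok in ("+", "-"):
        # Prefix operator: its operand may only contain tighter operators.
        left = "(" + tok + parse(tokens, UNARY_LEVEL) + ")"
    elif tok == "(":
        left = parse(tokens, 1)
        tokens.pop(0)  # discard the closing parenthesis
    else:
        left = tok
    while tokens and tokens[0] in BINARY and BINARY[tokens[0]] >= min_level:
        op = tokens.pop(0)
        # Left-associative: the right operand must bind strictly tighter.
        right = parse(tokens, BINARY[op] + 1)
        left = "(" + left + " " + op + " " + right + ")"
    return left


def grouping(text):
    return parse(tokenize(text))
```

Under these assumptions, `grouping("-2 ^ 2")` groups the unary minus first, and `grouping("2 ^ 3 ^ 3")` groups from the left, both of which follow from the order and associativity given in the table.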

When a schema-qualified operator name is used in the OPERATOR syntax, as for example in:

SELECT 3 OPERATOR(pg_catalog.+) 4;

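the OPERATOR construct is taken to have the default precedence shown in Table 4.2 for “any other operator”. This is true no matter which specific operator appears inside OPERATOR().

# Note

PostgreSQL versions before 9.5 used slightly different operator precedence rules. In particular, <= >= and <> used to be treated as generic operators; IS tests used to have higher priority; and NOT BETWEEN and related constructs acted inconsistently, being taken in some cases as having the precedence of NOT rather than BETWEEN. These rules were changed for better compliance with the SQL standard and to reduce confusion from inconsistent treatment of logically equivalent constructs. In most cases, these changes will result in no behavioral change, or perhaps in “no such operator” failures which can be resolved by adding parentheses. However, there are corner cases in which a query might change behavior without any parsing error being reported.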