# 70.2. TOAST

70.2.1. Out-of-Line, On-Disk TOAST Storage

70.2.2. Out-of-Line, In-Memory TOAST Storage

This section provides an overview of TOAST (The Oversized-Attribute Storage Technique).

PostgreSQL uses a fixed page size (commonly 8 kB), and does not allow tuples to span multiple pages. Therefore, it is not possible to store very large field values directly. To overcome this limitation, large field values are compressed and/or broken up into multiple physical rows. This happens transparently to the user, with only small impact on most of the backend code. The technique is affectionately known as TOAST (or “the best thing since sliced bread”). The TOAST infrastructure is also used to improve handling of large data values in-memory.

Only certain data types support TOAST; there is no need to impose the overhead on data types that cannot produce large field values. To support TOAST, a data type must have a variable-length (varlena) representation, in which, ordinarily, the first four-byte word of any stored value contains the total length of the value in bytes (including itself). TOAST does not constrain the rest of the data type's representation. The special representations collectively called TOASTed values work by modifying or reinterpreting this initial length word. Therefore, the C-level functions supporting a TOAST-able data type must be careful about how they handle potentially TOASTed input values: an input might not actually consist of a four-byte length word and contents until after it's been detoasted. (This is normally done by invoking PG_DETOAST_DATUM before doing anything with an input value, but in some cases more efficient approaches are possible. See Section 38.13.1 for more detail.)
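As a rough illustration of the ordinary (un-TOASTed) varlena layout, the sketch below builds a value whose first four bytes hold the total length, including the length word itself. It is a standalone model, not PostgreSQL code. The names `plain_varlena`, `make_plain_varlena` and `plain_varlena_data_len` are hypothetical, and the sketch deliberately ignores the two header bits that TOAST reserves, so the length is stored unshifted.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of an ordinary varlena: a 4-byte total length
 * (length word included) followed by the payload bytes. */
typedef struct plain_varlena {
    uint32_t total_len;   /* sizeof(total_len) + payload length */
    char     data[];      /* flexible array member holding the payload */
} plain_varlena;

/* Allocate a value holding a copy of `payload`; the caller frees it. */
static plain_varlena *make_plain_varlena(const char *payload, size_t n)
{
    plain_varlena *v = malloc(sizeof(plain_varlena) + n);
    if (v == NULL)
        return NULL;
    v->total_len = (uint32_t) (sizeof(v->total_len) + n);
    memcpy(v->data, payload, n);
    return v;
}

/* Payload length = total length minus the length word itself. */
static size_t plain_varlena_data_len(const plain_varlena *v)
{
    return v->total_len - sizeof(v->total_len);
}
```

A function receiving a possibly TOASTed input cannot safely read `total_len` like this until the input has been detoasted; only then is this plain layout guaranteed.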

TOAST usurps two bits of the varlena length word (the high-order bits on big-endian machines, the low-order bits on little-endian machines), thereby limiting the logical size of any value of a TOAST-able data type to 1 GB ($2^{30} - 1$ bytes). When both bits are zero, the value is an ordinary un-TOASTed value of the data type, and the remaining bits of the length word give the total datum size (including length word) in bytes. When the highest-order or lowest-order bit is set, the value has only a single-byte header instead of the normal four-byte header, and the remaining bits of that byte give the total datum size (including length byte) in bytes. This alternative supports space-efficient storage of values shorter than 127 bytes, while still allowing the data type to grow to 1 GB at need. Values with single-byte headers aren't aligned on any particular boundary, whereas values with four-byte headers are aligned on at least a four-byte boundary; this omission of alignment padding provides additional space savings that are significant compared to short values. As a special case, if the remaining bits of a single-byte header are all zero (which would be impossible for a self-inclusive length), the value is a pointer to out-of-line data, with several possible alternatives as described below. The type and size of such a TOAST pointer are determined by a code stored in the second byte of the datum. Lastly, when the highest-order or lowest-order bit is clear but the adjacent bit is set, the content of the datum has been compressed and must be decompressed before use. In this case the remaining bits of the four-byte length word give the total size of the compressed datum, not the original data. Note that compression is also possible for out-of-line data, but the varlena header does not tell whether it has occurred; the content of the TOAST pointer tells that instead.
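The header rules above can be modelled for the little-endian case, where the two flag bits are the low-order bits of the first byte in memory. This is a sketch with hypothetical names (`classify_header`, `header_size`, `toast_pointer_tag`); PostgreSQL implements the same tests with its own header macros, and on big-endian machines the flags sit in the high-order bits instead. The length word is decoded byte by byte so the result does not depend on the host's endianness.

```c
#include <stdint.h>

/* Hypothetical classification of a varlena header, little-endian layout. */
typedef enum header_kind {
    HDR_4B_UNCOMPRESSED,  /* low bits 00: ordinary 4-byte header        */
    HDR_4B_COMPRESSED,    /* low bits 10: 4-byte header, data compressed */
    HDR_1B,               /* low bit 1, other bits nonzero: short value  */
    HDR_1B_TOAST_POINTER  /* first byte exactly 0x01: TOAST pointer      */
} header_kind;

static header_kind classify_header(const uint8_t *p)
{
    if ((p[0] & 0x01) == 0x01)          /* lowest-order bit set */
        return p[0] == 0x01 ? HDR_1B_TOAST_POINTER : HDR_1B;
    if ((p[0] & 0x03) == 0x02)          /* lowest clear, adjacent bit set */
        return HDR_4B_COMPRESSED;
    return HDR_4B_UNCOMPRESSED;
}

/* Read a 4-byte little-endian word regardless of host byte order. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t) p[0] | ((uint32_t) p[1] << 8) |
           ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

/* Total size (header included) encoded in the header. For compressed
 * data this is the compressed size. Returns 0 for TOAST pointers, whose
 * size is determined by the tag byte instead. */
static uint32_t header_size(const uint8_t *p)
{
    switch (classify_header(p)) {
    case HDR_1B:
        return p[0] >> 1;               /* 7 remaining bits  */
    case HDR_4B_UNCOMPRESSED:
    case HDR_4B_COMPRESSED:
        return read_le32(p) >> 2;       /* 30 remaining bits */
    default:
        return 0;
    }
}

/* A TOAST pointer's type code lives in the second byte of the datum. */
static uint8_t toast_pointer_tag(const uint8_t *p)
{
    return p[1];
}
```

For example, a 6-byte short value (1 header byte plus 5 payload bytes) starts with the byte `(6 << 1) | 1 = 0x0D`, and the largest four-byte length word decodes to `0x3FFFFFFF`, which is the $2^{30} - 1$ limit.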

The compression technique used for either in-line or out-of-line compressed data can be selected for each column by setting the COMPRESSION column option in CREATE TABLE or ALTER TABLE. The default for columns with no explicit setting is to consult the default_toast_compression parameter at the time data is inserted.
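The rule for choosing a compression method can be stated as a one-line decision. In this hypothetical sketch, `resolve_compression` and the enum names are illustrative; pglz and lz4 mirror the values that default_toast_compression accepts (lz4 only in builds with LZ4 support). The key point is that the default is read when each value is inserted, not when the table is created.

```c
/* Hypothetical names for the compression choices a column may have. */
typedef enum compression_method {
    COMPRESSION_UNSET,  /* no COMPRESSION option given for the column */
    COMPRESSION_PGLZ,
    COMPRESSION_LZ4
} compression_method;

/* Pick the method for a value being inserted now: an explicit column
 * setting wins; otherwise use the value default_toast_compression has
 * at insert time (passed in here as `current_default`). */
static compression_method resolve_compression(compression_method column_setting,
                                              compression_method current_default)
{
    return column_setting != COMPRESSION_UNSET ? column_setting
                                               : current_default;
}
```

In this model, changing the default between two inserts into a column without an explicit setting yields different methods for the two values.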

As mentioned, there are multiple types of TOAST pointer datums. The oldest and most common type is a pointer to out-of-line data stored in a TOAST table that is separate from, but associated with, the table containing the TOAST pointer datum itself. These on-disk pointer datums are created by the TOAST management code (in access/common/toast_internals.c) when a tuple to be stored on disk is too large to be stored as-is. Further details appear in Section 70.2.1. Alternatively, a TOAST pointer datum can contain a pointer to out-of-line data that appears elsewhere in memory. Such datums are necessarily short-lived, and will never appear on-disk, but they are very useful for avoiding copying and redundant processing of large data values. Further details appear in Section 70.2.2.
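Because the pointer kind is encoded in the second byte, code can tell on-disk pointers from in-memory ones with a single byte comparison. The sketch below assumes the little-endian layout (first byte `0x01` marks a TOAST pointer); the tag values and function names are illustrative only, and PostgreSQL defines its own enumeration for them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative tag codes stored in the second byte of a TOAST pointer. */
enum toast_tag {
    TAG_INDIRECT    = 1,   /* in-memory: points at another varlena  */
    TAG_EXPANDED_RO = 2,   /* in-memory: read-only expanded object  */
    TAG_EXPANDED_RW = 3,   /* in-memory: read-write expanded object */
    TAG_ONDISK      = 18   /* points into a TOAST table             */
};

/* A TOAST pointer has a 1-byte header whose size bits are all zero. */
static bool is_toast_pointer(const uint8_t *p)
{
    return p[0] == 0x01;
}

/* Only on-disk pointers may ever be written into a stored tuple. */
static bool may_appear_on_disk(const uint8_t *p)
{
    return is_toast_pointer(p) && p[1] == TAG_ONDISK;
}

/* In-memory pointers are short-lived and must never reach disk. */
static bool is_in_memory_pointer(const uint8_t *p)
{
    return is_toast_pointer(p) &&
           (p[1] == TAG_INDIRECT || p[1] == TAG_EXPANDED_RO ||
            p[1] == TAG_EXPANDED_RW);
}
```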

# 70.2.1. Out-of-Line, On-Disk TOAST Storage

If any of the columns of a table are TOAST-able, the table will have an associated TOAST table, whose OID is stored in the table's pg_class.reltoastrelid entry. On-disk TOASTed values are kept in the TOAST table, as described in more detail below.

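Out-of-line values are divided (after compression if used) into chunks of at most TOAST_MAX_CHUNK_SIZE bytes (by default this value is chosen so that four chunk rows will fit on a page, making it about 2000 bytes). Each chunk is stored as a separate row in the TOAST table belonging to the owning table. Every TOAST table has the columns chunk_id (an OID identifying the particular TOASTed value), chunk_seq (a sequence number for the chunk within its value), and chunk_data (the actual data of the chunk). A unique index on chunk_id and chunk_seq provides fast retrieval of the values. A pointer datum representing an out-of-line on-disk TOASTed value therefore needs to store the OID of the TOAST table in which to look and the OID of the specific value (its chunk_id). For convenience, pointer datums also store the logical datum size (original uncompressed data length), physical stored size (different if compression was applied), and the compression method used, if any. Allowing for the varlena header bytes, the total size of an on-disk TOAST pointer datum is therefore 18 bytes regardless of the actual size of the represented value.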
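The 18-byte figure follows from four 4-byte fields plus two header bytes (the `0x01` header and the tag byte). The sketch below models that payload and the chunking arithmetic. The struct and function names are hypothetical, and packing the compression method into the top two bits of the physical-size word is an assumption made here so that five pieces of information fit in 16 bytes. chunk_seq numbering from 0 is likewise a choice of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical payload of an on-disk TOAST pointer: four 4-byte fields. */
typedef struct ondisk_pointer_payload {
    int32_t  rawsize;     /* logical (original uncompressed) size        */
    uint32_t extinfo;     /* physical stored size | (method << 30)       */
    uint32_t valueid;     /* chunk_id: OID of the value in the TOAST table */
    uint32_t toastrelid;  /* OID of the TOAST table to look in           */
} ondisk_pointer_payload;

/* 1 header byte + 1 tag byte + 16-byte payload = 18 bytes. */
#define ONDISK_POINTER_SIZE (2 + sizeof(ondisk_pointer_payload))

/* Assumed packing: 30 bits of size, 2 bits of compression method. */
static uint32_t pack_extinfo(uint32_t extsize, uint32_t method)
{
    return (extsize & 0x3FFFFFFFu) | (method << 30);
}
static uint32_t extinfo_size(uint32_t extinfo)   { return extinfo & 0x3FFFFFFFu; }
static uint32_t extinfo_method(uint32_t extinfo) { return extinfo >> 30; }

/* Number of chunk rows for `stored_size` bytes (after any compression). */
static size_t chunk_count(size_t stored_size, size_t max_chunk)
{
    return (stored_size + max_chunk - 1) / max_chunk;
}

/* Length of chunk `seq`; requires seq < chunk_count(...). */
static size_t chunk_len(size_t stored_size, size_t max_chunk, size_t seq)
{
    size_t left = stored_size - seq * max_chunk;
    return left < max_chunk ? left : max_chunk;
}
```

With a chunk limit of about 2000 bytes, a 5000-byte stored value becomes three chunk rows of 2000, 2000 and 1000 bytes, while the main table holds only the 18-byte pointer.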

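The TOAST management code is triggered only when a row value to be stored in a table is wider than TOAST_TUPLE_THRESHOLD bytes (normally 2 kB). The TOAST code will compress and/or move field values out-of-line until the row value is shorter than TOAST_TUPLE_TARGET bytes (also normally 2 kB, adjustable) or no more gains can be had. During an UPDATE operation, values of unchanged fields are normally preserved as-is; so an UPDATE of a row with out-of-line values incurs no TOAST costs if none of the out-of-line values change.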
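The threshold/target behaviour can be sketched as a greedy loop: do nothing if the row is at or below the threshold, otherwise repeatedly shrink the widest remaining column (compressing first, then moving it out of line) until the row fits under the target or no column can shrink further. This is a simplified model with hypothetical names; it ignores per-column storage strategies and the multiple passes the real code makes, and uses the 18-byte pointer size from the text.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-column state for one row being stored. */
typedef struct column_state {
    size_t inline_size;     /* bytes the column currently occupies in the row */
    size_t compressed_size; /* size after compression (>= inline if incompressible) */
    bool   compressed;      /* compression already applied                */
    bool   out_of_line;     /* already replaced by a TOAST pointer         */
} column_state;

#define POINTER_SIZE 18     /* on-disk TOAST pointer size */

static size_t row_size(const column_state *cols, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += cols[i].inline_size;
    return total;
}

/* Index of the widest column that could still shrink, or n if none. */
static size_t widest_candidate(const column_state *cols, size_t n)
{
    size_t best = n;
    for (size_t i = 0; i < n; i++) {
        if (cols[i].out_of_line || cols[i].inline_size <= POINTER_SIZE)
            continue;
        if (best == n || cols[i].inline_size > cols[best].inline_size)
            best = i;
    }
    return best;
}

/* Returns the final row size after (simplified) TOASTing. */
static size_t toast_row(column_state *cols, size_t n,
                        size_t threshold, size_t target)
{
    if (row_size(cols, n) <= threshold)
        return row_size(cols, n);            /* TOAST code not triggered */
    while (row_size(cols, n) > target) {
        size_t i = widest_candidate(cols, n);
        if (i == n)
            break;                           /* no more gains possible */
        if (!cols[i].compressed && cols[i].compressed_size < cols[i].inline_size) {
            cols[i].inline_size = cols[i].compressed_size;
            cols[i].compressed = true;
        } else {
            cols[i].inline_size = POINTER_SIZE;
            cols[i].out_of_line = true;
        }
    }
    return row_size(cols, n);
}
```

For a row with columns of 100, 6000 and 1500 bytes and a 2000-byte target, the sketch first compresses the 6000-byte column, then moves it out of line, leaving a row of 100 + 18 + 1500 = 1618 bytes.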

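The TOAST management code recognizes four different strategies for storing TOAST-able columns on disk:

  • PLAIN prevents either compression or out-of-line storage; furthermore, it disables use of single-byte headers for varlena types. This is the only possible strategy for columns of non-TOAST-able data types.

  • EXTENDED allows both compression and out-of-line storage. This is the default for most TOAST-able data types. Compression will be attempted first, then out-of-line storage if the row is still too big.

  • EXTERNAL allows out-of-line storage but not compression. Use of EXTERNAL will make substring operations on wide text and bytea columns faster (at the penalty of increased storage space) because these operations are optimized to fetch only the required parts of the out-of-line value when it is not compressed.

  • MAIN allows compression but not out-of-line storage. (Actually, out-of-line storage will still be performed for such columns, but only as a last resort when there is no other way to make the row small enough to fit on a page.)

Each TOAST-able data type specifies a default strategy for columns of that data type, but the strategy for a given table column can be altered with ALTER TABLE ... SET STORAGE.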
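The four strategies differ only in which operations they permit, which a small predicate table captures. The enum and function names below are hypothetical; the `last_resort` flag models the MAIN rule that out-of-line storage happens only when nothing else can make the row fit on a page.

```c
#include <stdbool.h>

/* The four strategies named in the text (hypothetical identifiers). */
typedef enum storage_strategy {
    STRATEGY_PLAIN,
    STRATEGY_EXTENDED,
    STRATEGY_EXTERNAL,
    STRATEGY_MAIN
} storage_strategy;

static bool allows_compression(storage_strategy s)
{
    return s == STRATEGY_EXTENDED || s == STRATEGY_MAIN;
}

/* `last_resort` is true only once every other way of shrinking the row
 * has failed; MAIN columns go out of line only then. */
static bool allows_out_of_line(storage_strategy s, bool last_resort)
{
    switch (s) {
    case STRATEGY_EXTENDED:
    case STRATEGY_EXTERNAL:
        return true;
    case STRATEGY_MAIN:
        return last_resort;
    case STRATEGY_PLAIN:
    default:
        return false;
    }
}

/* PLAIN additionally disables single-byte varlena headers. */
static bool allows_short_header(storage_strategy s)
{
    return s != STRATEGY_PLAIN;
}
```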

TOAST_TUPLE_TARGET can be adjusted for each table using ALTER TABLE ... SET (toast_tuple_target = N).

This scheme has a number of advantages compared to a more straightforward approach such as allowing row values to span pages. Assuming that queries are usually qualified by comparisons against relatively small key values, most of the work of the executor will be done using the main row entry. The big values of TOASTed attributes will only be pulled out (if selected at all) at the time the result set is sent to the client. Thus, the main table is much smaller and more of its rows fit in the shared buffer cache than would be the case without any out-of-line storage. Sort sets shrink also, and sorts will more often be done entirely in memory. A little test showed that a table containing typical HTML pages and their URLs was stored in about half of the raw data size including the TOAST table, and that the main table contained only about 10% of the entire data (the URLs and some small HTML pages). There was no run time difference compared to an un-TOASTed comparison table, in which all the HTML pages were cut down to 7 kB to fit.

# 70.2.2. Out-of-Line, In-Memory TOAST Storage

TOAST pointers can point to data that is not on disk, but is elsewhere in the memory of the current server process. Such pointers obviously cannot be long-lived, but they are nonetheless useful. There are currently two sub-cases: pointers to indirect data and pointers to expanded data.

Indirect TOAST pointers simply point at a non-indirect varlena value stored somewhere in memory. This case was originally created merely as a proof of concept, but it is currently used during logical decoding to avoid possibly having to create physical tuples exceeding 1 GB (as pulling all out-of-line field values into the tuple might do). The case is of limited use since the creator of the pointer datum is entirely responsible for ensuring that the referenced data survives for as long as the pointer could exist, and there is no infrastructure to help with this.
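An indirect pointer is little more than an address, which is why its lifetime is entirely the creator's problem. The struct below is a hypothetical model (real in-memory pointers are byte sequences inside a datum, not a C struct like this), with an illustrative tag value; nothing in it keeps the target alive.

```c
#include <stddef.h>
#include <stdint.h>

enum { TAG_INDIRECT = 1 };   /* illustrative tag value */

/* Hypothetical indirect TOAST pointer: header byte, tag byte, and the
 * address of a non-indirect value owned by someone else. */
typedef struct indirect_pointer {
    uint8_t     header;   /* 0x01 marks a TOAST pointer (little-endian) */
    uint8_t     tag;      /* TAG_INDIRECT                               */
    const void *target;   /* referenced varlena; NOT owned or pinned    */
} indirect_pointer;

static indirect_pointer make_indirect(const void *target)
{
    indirect_pointer p = { 0x01, TAG_INDIRECT, target };
    return p;
}

/* Following the pointer is valid only while the target is still alive;
 * the pointer itself cannot check that. */
static const void *follow_indirect(const indirect_pointer *p)
{
    return p->tag == TAG_INDIRECT ? p->target : NULL;
}
```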

Expanded TOAST pointers are useful for complex data types whose on-disk representation is not especially suited for computational purposes. As an example, the standard varlena representation of a PostgreSQL array includes dimensionality information, a nulls bitmap if there are any null elements, then the values of all the elements in order. When the element type itself is variable-length, the only way to find the *N*'th element is to scan through all the preceding elements. This representation is appropriate for on-disk storage because of its compactness, but for computations with the array it's much nicer to have an “expanded” or “deconstructed” representation in which all the element starting locations have been identified. The TOAST pointer mechanism supports this need by allowing a pass-by-reference Datum to point to either a standard varlena value (the on-disk representation) or a TOAST pointer that points to an expanded representation somewhere in memory. The details of this expanded representation are up to the data type, though it must have a standard header and meet the other API requirements given in src/include/utils/expandeddatum.h. C-level functions working with the data type can choose to handle either representation. Functions that do not know about the expanded representation, but simply apply PG_DETOAST_DATUM to their inputs, will automatically receive the traditional varlena representation; so support for an expanded representation can be introduced incrementally, one function at a time.
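The cost difference between the flat and expanded forms can be shown with a toy array of variable-length elements. The layout here (each element stored as a 1-byte length followed by its bytes) is a hypothetical simplification, not PostgreSQL's array format, and `expanded_array` does not follow the expandeddatum.h API; it only illustrates why precomputing element start offsets pays off.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Flat form: to reach element n, walk past every earlier element. */
static const uint8_t *flat_nth(const uint8_t *flat, size_t n)
{
    const uint8_t *p = flat;
    for (size_t i = 0; i < n; i++)
        p += 1 + p[0];                  /* skip length byte and payload */
    return p;
}

/* "Expanded" form: the same bytes plus precomputed element offsets. */
typedef struct expanded_array {
    const uint8_t *flat;
    size_t         nelems;
    size_t        *offsets;             /* offsets[i] = start of element i */
} expanded_array;

/* One pass over the data builds the index; returns 0 on success. */
static int expand_array(expanded_array *ea, const uint8_t *flat, size_t nelems)
{
    ea->flat = flat;
    ea->nelems = nelems;
    ea->offsets = malloc(nelems * sizeof(size_t));
    if (ea->offsets == NULL && nelems > 0)
        return -1;
    size_t off = 0;
    for (size_t i = 0; i < nelems; i++) {
        ea->offsets[i] = off;
        off += 1 + flat[off];
    }
    return 0;
}

/* After expansion, element n is a single lookup. */
static const uint8_t *expanded_nth(const expanded_array *ea, size_t n)
{
    return ea->flat + ea->offsets[n];
}

static void free_expanded(expanded_array *ea)
{
    free(ea->offsets);
    ea->offsets = NULL;
}
```

The flat scan costs time proportional to n for every access, while the expanded form pays that cost once and then answers each access directly, which is the trade-off the text describes.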

TOAST pointers to expanded values are further broken down into read-write and read-only pointers. The pointed-to representation is the same either way, but a function that receives a read-write pointer is allowed to modify the referenced value in-place, whereas one that receives a read-only pointer must not; it must first create a copy if it wants to make a modified version of the value. This distinction and some associated conventions make it possible to avoid unnecessary copying of expanded values during query execution.
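The read-write/read-only convention can be sketched as a modifier that updates in place when handed a read-write reference and copies first otherwise. Everything here is hypothetical (`expanded_ints`, `expanded_ref`, `set_item`); it models the rule from the text, not PostgreSQL's expanded-object API.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical expanded object: an array of ints. */
typedef struct expanded_ints {
    int    *items;
    size_t  n;
} expanded_ints;

/* Stand-in for a read-write or read-only expanded pointer. */
typedef struct expanded_ref {
    expanded_ints *obj;
    bool           read_write;
} expanded_ref;

static expanded_ints *copy_expanded(const expanded_ints *src)
{
    expanded_ints *dst = malloc(sizeof *dst);
    if (dst == NULL)
        return NULL;
    dst->n = src->n;
    dst->items = malloc(src->n * sizeof(int));
    if (dst->items == NULL && src->n > 0) {
        free(dst);
        return NULL;
    }
    if (src->n > 0)
        memcpy(dst->items, src->items, src->n * sizeof(int));
    return dst;
}

/* Set element i to v. A read-write reference is modified in place and
 * returned; a read-only one is copied first and the new copy returned
 * (the caller owns it). Returns NULL if the copy fails. */
static expanded_ints *set_item(expanded_ref ref, size_t i, int v)
{
    expanded_ints *target = ref.read_write ? ref.obj : copy_expanded(ref.obj);
    if (target != NULL && i < target->n)
        target->items[i] = v;
    return target;
}
```

The saving comes from the read-write case: a chain of modifications on a value nobody else references can reuse one object instead of copying it at every step.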

For all types of in-memory TOAST pointer, the TOAST management code ensures that no such pointer datum can accidentally get stored on disk. In-memory TOAST pointers are automatically expanded to normal in-line varlena values before storage, and then possibly converted to on-disk TOAST pointers if the containing tuple would otherwise be too big.
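The "never store an in-memory pointer" rule amounts to a preparation step before writing a tuple. This hypothetical sketch covers only the indirect case: the referenced value is copied in line, and datums that are already safe pass through untouched. Expanded values would need their data type's own flattening, which is not modelled here, and a later TOAST pass could still turn the in-line copy into an on-disk pointer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified view of a datum as seen by the storage code. */
typedef enum datum_kind {
    DATUM_INLINE,    /* ordinary in-line varlena bytes              */
    DATUM_INDIRECT,  /* in-memory pointer to a value held elsewhere  */
    DATUM_ONDISK     /* on-disk TOAST pointer: safe to store as-is   */
} datum_kind;

typedef struct datum_model {
    datum_kind  kind;
    const char *bytes;   /* INLINE: the value; INDIRECT: referenced value */
    size_t      len;     /* length of `bytes`                             */
} datum_model;

/* Prepare one datum for writing to disk. An indirect pointer becomes an
 * in-line copy of what it references; the copy is returned via *owned
 * and must be freed by the caller. Other datums pass through with
 * *owned == NULL. Returns 0, or -1 if the copy cannot be allocated. */
static int prepare_for_storage(const datum_model *in, datum_model *out,
                               char **owned)
{
    *owned = NULL;
    if (in->kind != DATUM_INDIRECT) {
        *out = *in;
        return 0;
    }
    char *copy = malloc(in->len > 0 ? in->len : 1);
    if (copy == NULL)
        return -1;
    if (in->len > 0)
        memcpy(copy, in->bytes, in->len);
    out->kind = DATUM_INLINE;
    out->bytes = copy;
    out->len = in->len;
    *owned = copy;
    return 0;
}
```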