PostgreSQL 16 文档: 69.3. 可扩展性

69.3. 可扩展性
上一个	向上	第 69 章 SP-GiST 索引	主页	下一个

69.3. 可扩展性 #

SP-GiST 提供了一个具有高抽象级别的接口，要求访问方法开发者仅实现特定于给定数据类型的方法。 SP-GiST 内核负责高效的磁盘映射和搜索树结构。它还负责并发性和日志记录注意事项。

SP-GiST 树的叶元组通常包含与索引列相同数据类型的数值，尽管它们也可能包含索引列的损耗表示。存储在根级别的叶元组将直接表示原始索引数据值，但较低级别的叶元组可能只包含部分值，例如后缀。在这种情况下，操作符类支持函数必须能够使用从传递到达到叶级别的内部元组中累积的信息来重建原始值。

当使用 INCLUDE 列创建 SP-GiST 索引时，这些列的值也存储在叶元组中。INCLUDE 列与 SP-GiST 操作符类无关，因此此处不再讨论它们。

内部元组更复杂，因为它们是搜索树中的分支点。每个内部元组包含一组一个或多个节点，它们表示相似叶值组。节点包含一个下行链接，该链接指向另一个较低级别的内部元组或指向同一索引页上的所有叶元组的短列表。每个节点通常有一个标签来描述它；例如，在基数树中，节点标签可以是字符串值的下一个字符。（或者，如果操作符类对所有内部元组使用一组固定节点，则可以省略节点标签；请参见第 69.4.2 节。）内部元组可以选择具有前缀值来描述其所有成员。在基数树中，这可以是所表示字符串的公共前缀。前缀值不一定真的是前缀，但可以是操作符类需要的任何数据；例如，在四叉树中，它可以存储四个象限相对于其测量的中心点。然后，四叉树内部元组还将包含四个与该中心点周围的象限相对应的节点。

一些树算法需要知道当前元组的级别（或深度），因此SP-GiST 核心为操作符类提供了在下降树时管理级别计数的可能性。在需要时，还支持增量重建表示的值，以及在树下降期间传递附加数据（称为遍历值）。

注意

SP-GiST 核心代码负责处理空条目。虽然SP-GiST 索引确实存储了索引列中的空值的条目，但这对索引操作符类代码是隐藏的：永远不会将空索引条目或搜索条件传递给操作符类方法。（假设SP-GiST 运算符是严格的，因此不能对空值成功。）因此，此处不再进一步讨论空值。

对于SP-GiST 的索引操作符类，有五个用户定义的方法必须提供，还有两个可选的方法。所有五个强制性方法都遵循接受两个internal 参数的约定，第一个参数是指向包含支持方法输入值的 C 结构的指针，而第二个参数是指向必须放置输出值的 C 结构的指针。四个强制性方法只返回void，因为它们的所有结果都出现在输出结构中；但leaf_consistent 返回boolean 结果。这些方法不得修改其输入结构的任何字段。在所有情况下，在调用用户定义的方法之前，输出结构都会初始化为零。可选的第六个方法compress 接受一个datum 作为唯一参数进行索引，并返回适合在叶元组中物理存储的值。可选的第七个方法options 接受一个internal 指针，指向 C 结构，其中应放置特定于操作类的参数，并返回void。

五种强制用户定义的方法是

config

返回有关索引实现的静态信息，包括前缀和节点标签数据类型的数据类型 OID。

SQL 函数声明必须如下所示

CREATE FUNCTION my_config(internal, internal) RETURNS void ...

第一个参数是指向 C 结构 spgConfigIn 的指针，其中包含函数的输入数据。第二个参数是指向 C 结构 spgConfigOut 的指针，函数必须用结果数据填充该结构。

typedef struct spgConfigIn
{
    Oid         attType;        /* Data type to be indexed */
} spgConfigIn;

typedef struct spgConfigOut
{
    Oid         prefixType;     /* Data type of inner-tuple prefixes */
    Oid         labelType;      /* Data type of inner-tuple node labels */
    Oid         leafType;       /* Data type of leaf-tuple values */
    bool        canReturnData;  /* Opclass can reconstruct original data */
    bool        longValuesOK;   /* Opclass can cope with values > 1 page */
} spgConfigOut;

attType 传递是为了支持多态索引操作符类；对于普通固定数据类型操作符类，它始终具有相同的值，因此可以忽略。

对于不使用前缀的操作符类，prefixType 可以设置为 VOIDOID。同样，对于不使用节点标签的操作符类，labelType 可以设置为 VOIDOID。如果操作符类能够重建最初提供的索引值，则应将 canReturnData 设置为 true。仅当 attType 为可变长度并且操作符类能够通过重复后缀对长值进行分段时，才应将 longValuesOK 设置为 true（请参阅第 69.4.1 节）。

leafType 应与操作符类的 opckeytype 目录项定义的索引存储类型相匹配。（请注意，opckeytype 可以为零，这意味着存储类型与操作符类的输入类型相同，这是最常见的情况。）出于向后兼容性的原因，config 方法可以将 leafType 设置为其他值，并且将使用该值；但这已弃用，因为索引内容随后在目录中被错误识别。此外，允许将 leafType 保留为未初始化（零）；这被解释为意味着从 opckeytype 派生的索引存储类型。

当 attType 和 leafType 不同时，必须提供可选方法 compress。方法 compress 负责将要编入索引的数据从 attType 转换为 leafType。

choose

选择一种将新值插入内部元组的方法。

SQL 函数声明必须如下所示

CREATE FUNCTION my_choose(internal, internal) RETURNS void ...

第一个参数是指向 C 结构 spgChooseIn 的指针，其中包含函数的输入数据。第二个参数是指向 C 结构 spgChooseOut 的指针，函数必须用结果数据填充该结构。

typedef struct spgChooseIn
{
    Datum       datum;          /* original datum to be indexed */
    Datum       leafDatum;      /* current datum to be stored at leaf */
    int         level;          /* current level (counting from zero) */

    /* Data from current inner tuple */
    bool        allTheSame;     /* tuple is marked all-the-same? */
    bool        hasPrefix;      /* tuple has a prefix? */
    Datum       prefixDatum;    /* if so, the prefix value */
    int         nNodes;         /* number of nodes in the inner tuple */
    Datum      *nodeLabels;     /* node label values (NULL if none) */
} spgChooseIn;

typedef enum spgChooseResultType
{
    spgMatchNode = 1,           /* descend into existing node */
    spgAddNode,                 /* add a node to the inner tuple */
    spgSplitTuple               /* split inner tuple (change its prefix) */
} spgChooseResultType;

typedef struct spgChooseOut
{
    spgChooseResultType resultType;     /* action code, see above */
    union
    {
        struct                  /* results for spgMatchNode */
        {
            int         nodeN;      /* descend to this node (index from 0) */
            int         levelAdd;   /* increment level by this much */
            Datum       restDatum;  /* new leaf datum */
        }           matchNode;
        struct                  /* results for spgAddNode */
        {
            Datum       nodeLabel;  /* new node's label */
            int         nodeN;      /* where to insert it (index from 0) */
        }           addNode;
        struct                  /* results for spgSplitTuple */
        {
            /* Info to form new upper-level inner tuple with one child tuple */
            bool        prefixHasPrefix;    /* tuple should have a prefix? */
            Datum       prefixPrefixDatum;  /* if so, its value */
            int         prefixNNodes;       /* number of nodes */
            Datum      *prefixNodeLabels;   /* their labels (or NULL for
                                             * no labels) */
            int         childNodeN;         /* which node gets child tuple */

            /* Info to form new lower-level inner tuple with all old nodes */
            bool        postfixHasPrefix;   /* tuple should have a prefix? */
            Datum       postfixPrefixDatum; /* if so, its value */
        }           splitTuple;
    }           result;
} spgChooseOut;

datum 是 spgConfigIn 的原始数据。attType 要插入到索引中的类型。leafDatum 是 spgConfigOut 的值。leafType 类型，它最初是当提供方法 compress 时对 datum 应用方法 compress 的结果，否则与 datum 的值相同。leafDatum 可以更改为树的较低级别，如果方法 choose 或 picksplit 更改了它。当插入搜索到达叶页面时，leafDatum 的当前值将存储在新建的叶元组中。level 是当前内部元组的级别，从根级别的零开始。allTheSame 为真，如果当前内部元组被标记为包含多个等效节点（请参见第 69.4.3 节）。hasPrefix 为真，如果当前内部元组包含前缀；如果是，prefixDatum 是它的值。nNodes 是内部元组中包含的子节点数，nodeLabels 是其标签值数组，如果没有标签，则为 NULL。

函数 choose 可以确定新值与现有子节点之一匹配，或者必须添加一个新的子节点，或者新值与元组前缀不一致，因此必须拆分内部元组以创建不太严格的前缀。

如果新值与现有子节点之一匹配，请将 resultType 设置为 spgMatchNode。将 nodeN 设置为节点数组中该节点的索引（从零开始）。将 levelAdd 设置为通过该节点下降导致的 level 的增量，或者如果操作符类不使用级别，则将其保留为零。将 restDatum 设置为等于 leafDatum（如果操作符类不修改从一个级别到下一个级别的日期），否则将其设置为要用于 leafDatum 的修改值在下一级。

如果必须添加一个新的子节点，将 resultType 设置为 spgAddNode。将 nodeLabel 设置为要用于新节点的标签，并将 nodeN 设置为在节点数组中插入节点的索引（从零开始）。添加节点后，将再次使用修改后的内部元组调用 choose 函数；该调用应导致 spgMatchNode 结果。

如果新值与元组前缀不一致，将 resultType 设置为 spgSplitTuple。此操作将所有现有节点移动到新的较低级别内部元组中，并将现有内部元组替换为具有指向新较低级别内部元组的单个下行链接的元组。设置 prefixHasPrefix 以指示新上层元组是否应具有前缀，如果应具有前缀，则将 prefixPrefixDatum 设置为前缀值。此新前缀值必须比原始值宽松很多，才能接受要编入索引的新值。将 prefixNNodes 设置为新元组中所需节点的数量，并将 prefixNodeLabels 设置为保存其标签的 palloc 数组，或在不需要节点标签时将其设置为 NULL。请注意，新上层元组的总大小不得大于它所替换的元组的总大小；这会限制新前缀和新标签的长度。将 childNodeN 设置为将下行链接到新较低级别内部元组的节点的索引（从零开始）。设置 postfixHasPrefix 以指示新较低级别内部元组是否应具有前缀，如果应具有前缀，则将 postfixPrefixDatum 设置为前缀值。这两个前缀和下行链接节点的标签（如果存在）的组合必须与原始前缀具有相同的含义，因为没有机会更改移动到新较低级别元组的节点标签，也没有机会更改任何子索引条目。拆分节点后，将再次使用替换内部元组调用 choose 函数。如果 spgSplitTuple 操作未创建合适的节点，则该调用可能会返回 spgAddNode 结果。最终，choose 必须返回 spgMatchNode 以允许插入降至下一级。

picksplit

决定如何针对一组叶元组创建新的内部元组。

SQL 函数声明必须如下所示

CREATE FUNCTION my_picksplit(internal, internal) RETURNS void ...

第一个参数是指向 C 结构 spgPickSplitIn 的指针，其中包含函数的输入数据。第二个参数是指向 C 结构 spgPickSplitOut 的指针，函数必须用结果数据填充该结构。

typedef struct spgPickSplitIn
{
    int         nTuples;        /* number of leaf tuples */
    Datum      *datums;         /* their datums (array of length nTuples) */
    int         level;          /* current level (counting from zero) */
} spgPickSplitIn;

typedef struct spgPickSplitOut
{
    bool        hasPrefix;      /* new inner tuple should have a prefix? */
    Datum       prefixDatum;    /* if so, its value */

    int         nNodes;         /* number of nodes for new inner tuple */
    Datum      *nodeLabels;     /* their labels (or NULL for no labels) */

    int        *mapTuplesToNodes;   /* node index for each leaf tuple */
    Datum      *leafTupleDatums;    /* datum to store in each new leaf tuple */
} spgPickSplitOut;

nTuples 是提供的叶元组数。 datums 是其数据值数组，类型为 spgConfigOut.leafType。 level 是所有叶元组共享的当前级别，它将成为新内部元组的级别。

设置 hasPrefix 以指示新内部元组是否应有前缀，如果应有，则将 prefixDatum 设置为前缀值。设置 nNodes 以指示新内部元组将包含的节点数，并将 nodeLabels 设置为其标签值数组，或在不需要节点标签时将其设置为 NULL。将 mapTuplesToNodes 设置为一个数组，该数组给出每个叶元组应分配到的节点的索引（从零开始）。将 leafTupleDatums 设置为要存储在新叶元组中的值数组（如果运算符类不修改从一个级别到下一级别的值，则这些值将与输入 datums 相同）。请注意， picksplit 函数负责 palloc'ing nodeLabels、mapTuplesToNodes 和 leafTupleDatums 数组。

如果提供了多个叶元组，则 picksplit 函数应将它们分类到多个节点中；否则无法将叶元组拆分到多个页面中，而这是此操作的最终目的。因此，如果 picksplit 函数最终将所有叶元组都放在同一节点中，则核心 SP-GiST 代码将覆盖该决策并生成一个内部元组，其中叶元组被随机分配到几个具有相同标签的节点中。这样的元组标记为 allTheSame 以表示已发生这种情况。 choose 和 inner_consistent 函数必须对这些内部元组进行适当处理。有关详细信息，请参见第 69.4.3 节。

仅当 config 函数将 longValuesOK 设置为 true 并且已提供大于页面的输入值时，才能将 picksplit 应用于单个叶元组。在这种情况下，操作的目的是剥离前缀并生成一个新的、更短的叶数据值。将重复调用，直到生成一个足够短以适合页面的叶数据为止。有关详细信息，请参见第 69.4.1 节。

inner_consistent

返回在树搜索期间要遵循的节点（分支）集。

SQL 函数声明必须如下所示

CREATE FUNCTION my_inner_consistent(internal, internal) RETURNS void ...

第一个参数是指向 C 结构 spgInnerConsistentIn 的指针，其中包含函数的输入数据。第二个参数是指向 C 结构 spgInnerConsistentOut 的指针，函数必须用结果数据填充该结构。

typedef struct spgInnerConsistentIn
{
    ScanKey     scankeys;       /* array of operators and comparison values */
    ScanKey     orderbys;       /* array of ordering operators and comparison
                                 * values */
    int         nkeys;          /* length of scankeys array */
    int         norderbys;      /* length of orderbys array */

    Datum       reconstructedValue;     /* value reconstructed at parent */
    void       *traversalValue; /* opclass-specific traverse value */
    MemoryContext traversalMemoryContext;   /* put new traverse values here */
    int         level;          /* current level (counting from zero) */
    bool        returnData;     /* original data must be returned? */

    /* Data from current inner tuple */
    bool        allTheSame;     /* tuple is marked all-the-same? */
    bool        hasPrefix;      /* tuple has a prefix? */
    Datum       prefixDatum;    /* if so, the prefix value */
    int         nNodes;         /* number of nodes in the inner tuple */
    Datum      *nodeLabels;     /* node label values (NULL if none) */
} spgInnerConsistentIn;

typedef struct spgInnerConsistentOut
{
    int         nNodes;         /* number of child nodes to be visited */
    int        *nodeNumbers;    /* their indexes in the node array */
    int        *levelAdds;      /* increment level by this much for each */
    Datum      *reconstructedValues;    /* associated reconstructed values */
    void      **traversalValues;        /* opclass-specific traverse values */
    double    **distances;              /* associated distances */
} spgInnerConsistentOut;

The array scankeys, of length nkeys, describes the index search condition(s). These conditions are combined with AND — only index entries that satisfy all of them are interesting. (Note that nkeys = 0 implies that all index entries satisfy the query.) Usually the consistent function only cares about the sk_strategy and sk_argument fields of each array entry, which respectively give the indexable operator and comparison value. In particular it is not necessary to check sk_flags to see if the comparison value is NULL, because the SP-GiST core code will filter out such conditions. The array orderbys, of length norderbys, describes ordering operators (if any) in the same manner. reconstructedValue is the value reconstructed for the parent tuple; it is (Datum) 0 at the root level or if the inner_consistent function did not provide a value at the parent level. traversalValue is a pointer to any traverse data passed down from the previous call of inner_consistent on the parent index tuple, or NULL at the root level. traversalMemoryContext is the memory context in which to store output traverse values (see below). level is the current inner tuple's level, starting at zero for the root level. returnData is true if reconstructed data is required for this query; this will only be so if the config function asserted canReturnData. allTheSame is true if the current inner tuple is marked “all-the-same”; in this case all the nodes have the same label (if any) and so either all or none of them match the query (see Section 69.4.3). hasPrefix is true if the current inner tuple contains a prefix; if so, prefixDatum is its value. nNodes is the number of child nodes contained in the inner tuple, and nodeLabels is an array of their label values, or NULL if the nodes do not have labels.

nNodes 必须设置为搜索需要访问的子节点数，nodeNumbers 必须设置为其索引的数组。如果运算符类跟踪级别，将 levelAdds 设置为在下降到要访问的每个节点时所需的级别增量数组。（通常这些增量对于所有节点都是相同的，但并非必然如此，因此使用数组。）如果需要值重构，将 reconstructedValues 设置为要访问的每个子节点重构的值的数组；否则，将 reconstructedValues 保留为 NULL。假定重构的值为 spgConfigOut.leafType 类型。（但是，由于核心系统除了可能复制它们之外不会对它们执行任何操作，因此它们具有与 leafType 相同的 typlen 和 typbyval 属性就足够了。）如果执行有序搜索，将 distances 设置为根据 orderbys 数组的距离值数组（距离最小的节点将首先处理）。否则，将其保留为 NULL。如果希望将附加带外信息（“遍历值”）传递到树搜索的较低级别，将 traversalValues 设置为适当遍历值的数组，每个要访问的子节点一个；否则，将 traversalValues 保留为 NULL。请注意，inner_consistent 函数负责在当前内存上下文中 palloc nodeNumbers、levelAdds、distances、reconstructedValues 和 traversalValues 数组。但是，traversalValues 数组指向的任何输出遍历值都应在 traversalMemoryContext 中分配。每个遍历值必须是一个单独的 palloc 块。

leaf_consistent

如果叶元组满足查询，则返回 true。

SQL 函数声明必须如下所示

CREATE FUNCTION my_leaf_consistent(internal, internal) RETURNS bool ...

第一个参数是指向 spgLeafConsistentIn C 结构的指针，其中包含函数的输入数据。第二个参数是指向 spgLeafConsistentOut C 结构的指针，函数必须用结果数据填充该结构。

typedef struct spgLeafConsistentIn
{
    ScanKey     scankeys;       /* array of operators and comparison values */
    ScanKey     orderbys;       /* array of ordering operators and comparison
                                 * values */
    int         nkeys;          /* length of scankeys array */
    int         norderbys;      /* length of orderbys array */

    Datum       reconstructedValue;     /* value reconstructed at parent */
    void       *traversalValue; /* opclass-specific traverse value */
    int         level;          /* current level (counting from zero) */
    bool        returnData;     /* original data must be returned? */

    Datum       leafDatum;      /* datum in leaf tuple */
} spgLeafConsistentIn;

typedef struct spgLeafConsistentOut
{
    Datum       leafValue;        /* reconstructed original data, if any */
    bool        recheck;          /* set true if operator must be rechecked */
    bool        recheckDistances; /* set true if distances must be rechecked */
    double     *distances;        /* associated distances */
} spgLeafConsistentOut;

长度为 nkeys 的数组 scankeys 描述了索引搜索条件。这些条件与 AND 结合使用 — 仅满足所有条件的索引条目才满足查询。（请注意，nkeys = 0 意味着所有索引条目都满足查询。）通常，一致性函数仅关心每个数组条目的 sk_strategy 和 sk_argument 字段，它们分别给出可索引运算符和比较值。特别是，无需检查 sk_flags 以查看比较值是否为 NULL，因为 SP-GiST 核心代码将过滤掉此类条件。长度为 norderbys 的数组 orderbys 以相同的方式描述排序运算符。 reconstructedValue 是为父元组重建的值；它在根级别或 inner_consistent 函数未在父级别提供值时为 (Datum) 0。 traversalValue 是指向从父索引元组上 inner_consistent 的前一次调用传递下来的任何遍历数据的指针，或在根级别为 NULL。 level 是当前叶元组的级别，从根级别的零开始。如果此查询需要重建数据，则 returnData 为 true；只有当 config 函数断言 canReturnData 时，情况才会如此。 leafDatum 是存储在当前叶元组中的 spgConfigOut.leafType 的键值。

如果叶元组与查询匹配，则函数必须返回 true，否则返回 false。在 true 情况下，如果 returnData 为 true，则必须将 leafValue 设置为最初提供给此叶元组进行索引的值（类型为 spgConfigIn.attType）。此外，如果匹配不确定，则可以将 recheck 设置为 true，因此必须将运算符重新应用于实际堆元组以验证匹配。如果执行有序搜索，请根据 orderbys 数组将 distances 设置为距离值数组。否则将其保留为 NULL。如果返回的距离中至少有一个不准确，请将 recheckDistances 设置为 true。在这种情况下，执行器将在从堆中获取元组后计算准确的距离，并在需要时重新排序元组。

可选的用户定义方法为

Datum compress(Datum in)

将数据项转换为适合在索引的叶元组中进行物理存储的格式。它接受类型为 spgConfigIn.attType 的值，并返回类型为 spgConfigOut.leafType 的值。输出值不得包含非内联 TOAST 指针。

注意：compress 方法仅应用于要存储的值。一致的方法接收查询 scankeys 不变，不使用 compress 进行转换。

options

定义一组用户可见的参数，用于控制运算符类的行为。

SQL 函数声明必须如下所示

CREATE OR REPLACE FUNCTION my_options(internal)
RETURNS void
AS 'MODULE_PATHNAME'
LANGUAGE C STRICT;

该函数传递给 local_relopts 结构的指针，该指针需要填充一组运算符类特定选项。可以使用 PG_HAS_OPCLASS_OPTIONS() 和 PG_GET_OPCLASS_OPTIONS() 宏从其他支持函数访问这些选项。

由于 SP-GiST 中键的表示是灵活的，因此它可能取决于用户指定的参数。

所有 SP-GiST 支持方法通常在短期内存上下文中调用；也就是说，在处理每个元组后，CurrentMemoryContext 将被重置。因此，不必非常担心 pfree 所有 palloc 的内容。（config 方法是一个例外：它应该尝试避免内存泄漏。但通常 config 方法只需将常量分配到传递的参数结构中即可。）

如果索引列是可整理数据类型，则将使用标准 PG_GET_COLLATION() 机制将索引整理规则传递给所有支持方法。

上一个	向上	下一个
69.2. 内置运算符类	主页	69.4. 实现