Recursive SQL query performance problem



This is my sixth attempt at asking this question, and the shortest one. All previous attempts turned into something resembling a blog post rather than a question, but I assure you my problem is real; it just touches a large topic, and without all of the detail below it would be unclear what my question actually is. So here goes...

Abstract

I have a database that stores data in a somewhat unconventional way and provides some non-standard features required by our business process. The features are:

  1. Non-destructive, non-blocking updates/deletes implemented via an insert-only approach, allowing data recovery and automatic auditing (every change is associated with the user who made it)
  2. Multi-versioned data (there may be several versions of the same record)
  3. Database-level permissions
  4. Eventual consistency with ACID, and transaction-safe create/update/delete
  5. The current view of the data can be rewound or fast-forwarded to any point in time.

There may be other features I haven't mentioned.

Database structure

All user data is stored in the Items table as JSON-encoded strings (ntext). All database operations go through two stored procedures, GetLatest and InsertSnapshot, which allow you to operate on the data much the way Git operates on source files.

On the front end, the resulting data is linked (JOINed) into a fully connected graph, so in most cases no further database queries are needed.

It would also be possible to store the data in regular SQL columns instead of in JSON-encoded form, but that would increase overall complexity.
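For orientation, the table layout implied by the Denormalizer view further down can be sketched roughly like this. Column types, constraints, and indexes here are my guesses inferred from the columns the view references; the authoritative schema is in the backup:

```sql
-- Rough sketch only, inferred from the Denormalizer view; not the real schema.
CREATE TABLE dbo.Operations
(
    Id         BIGINT IDENTITY PRIMARY KEY,
    Finished   BIT    NOT NULL DEFAULT 0,  -- operation committed and visible
    FinishedOn BIGINT NULL                 -- logical timestamp
);

CREATE TABLE dbo.Items
(
    Id              BIGINT IDENTITY PRIMARY KEY,
    PreviousId      BIGINT NULL REFERENCES dbo.Items (Id), -- version this edit was based on
    UUID            UNIQUEIDENTIFIER NOT NULL,             -- logical record identity
    Json            NTEXT  NULL,                           -- payload; NULL marks a deletion
    TableName       NVARCHAR(128) NOT NULL,
    OperationId     BIGINT NOT NULL REFERENCES dbo.Operations (Id),
    PermissionId    BIGINT NULL,
    Denormalized    BIT    NOT NULL DEFAULT 0,
    -- service fields filled in by the denormalization step:
    BranchId        BIGINT NULL,
    CreatedOn       BIGINT NULL,
    RecoveredOn     BIGINT NULL,
    UpdatedOnPrev   BIGINT NULL,
    UpdatedOnCurr   BIGINT NULL,
    UpdatedOnNext   BIGINT NULL,
    UpdatedOnNextId BIGINT NULL
);
```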

Reading data

GetLatest serves the data in the manner illustrated below; consider the following diagram:

[Diagram: evolution of the versions of a single record]

The diagram shows the evolution of the changes made to a single record. The arrows show which version an edit was based on (imagine a user updating some data offline, in parallel with updates made by online users; such a scenario introduces a conflict, i.e. two versions of the data instead of one).

So calling GetLatest with the following input time spans yields the following record versions:

GetLatest 0, 15  => 1       <= The data is created upon its first occurrence
GetLatest 0, 25  => 2       <= Inserting another version on top of first one overwrites the existing version
GetLatest 0, 30  => 3       <= The overwrite takes place as soon as the data is inserted
GetLatest 0, 45  => 3, 4    <= This is where the conflict is introduced in the system
GetLatest 0, 55  => 4, 5    <= You can still edit all the versions
GetLatest 0, 65  => 4, 6    <= You can still edit all the versions
GetLatest 0, 75  => 4, 6, 7 <= You can also create additional conflicts
GetLatest 0, 85  => 4, 7, 8 <= You can still edit records
GetLatest 0, 95  => 7, 8, 9 <= You can still edit records
GetLatest 0, 105 => 7, 8    <= Inserting a record with `Json` equal to `NULL` means that the record is deleted
GetLatest 0, 115 => 8       <= Deleting the conflicting versions is the only conflict-resolution scenario
GetLatest 0, 125 => 8, X    <= The conflict can be based on the version that was already deleted.
GetLatest 0, 135 => 8, Y    <= You can delete such a version too, and undelete another version in parallel within one Snapshot (or in several Snapshots).
GetLatest 0, 145 => 8       <= You can delete the undeleted versions by inserting NULL.
GetLatest 0, 155 => 8, Z    <= You can again undelete twice-deleted versions
GetLatest 0, 165 => 8       <= You can again delete three-times deleted versions
GetLatest 0, 10000 => 8     <= This means that in order to fast-forward view from moment 0 to moment `10000` you just have to expose record 8 to the user.
GetLatest 55, 115  => 8, [Remove 4], [Remove 5] <= At moment 55 there were two versions [4, 5] so in order to fast-forward to moment 115 the user has to delete versions 4 and 5 and introduce version 8. Please note that version 7 is not present in results since at moment 110 it got deleted.

To let GetLatest support this interface efficiently, every record carries special service attributes (BranchId, RecoveredOn, CreatedOn, UpdatedOnPrev, UpdatedOnCurr, UpdatedOnNext, UpdatedOnNextId) that GetLatest uses to work out whether the record falls within the time span given by the GetLatest parameters.
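As an illustration only (the real GetLatest is in the backup, and its parameter names and exact predicate may differ), precomputed service attributes like UpdatedOnCurr / UpdatedOnNext can turn the span check into a simple range comparison instead of a recursive walk over version chains:

```sql
-- Hypothetical sketch: "which versions are alive at the end of the span"
-- reduces to a per-row range test once the service fields are precomputed.
DECLARE @From BIGINT = 0, @To BIGINT = 115;

SELECT i.Id, i.UUID, i.Json
FROM dbo.Items AS i
WHERE i.Json IS NOT NULL             -- a NULL payload marks a deletion
  AND i.UpdatedOnCurr <= @To         -- version existed by the end of the span
  AND i.UpdatedOnNext  > @To;        -- ...and was not yet superseded or deleted
```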

Inserting data

To support eventual consistency, transaction safety, and performance, data is inserted into the database through a special multi-phase process:

  1. The data is simply inserted into the database, but cannot yet be queried by the GetLatest stored procedure.

  2. The data is made available to the GetLatest stored procedure, still in a non-denormalized (i.e. Denormalized = 0) state. While the data is in this state, the service fields BranchId, RecoveredOn, CreatedOn, UpdatedOnPrev, UpdatedOnCurr, UpdatedOnNext, and UpdatedOnNextId are computed on the fly, which is really slow.

  3. To speed things up, once the data has been made available to the GetLatest stored procedure, it is denormalized.

    • Since steps 1, 2, and 3 take place in separate transactions, a hardware failure can occur in the middle of any of them, leaving the data in an intermediate state. This situation is normal, and even if it happens, the data is recovered during a subsequent InsertSnapshot call. The code for this recovery can be found between steps 2 and 3 of the InsertSnapshot stored procedure.
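The three phases above can be sketched as separate transactions. This is a simplified outline under assumed column defaults; the actual InsertSnapshot logic, including the recovery code between phases 2 and 3, lives in the backup:

```sql
-- Simplified outline of the multi-phase insert; variable values are placeholders.
DECLARE @UUID UNIQUEIDENTIFIER = NEWID(),
        @Json NVARCHAR(MAX)    = N'{}',
        @TableName SYSNAME     = N'SomeTable',
        @Now  BIGINT           = 0,
        @OpId BIGINT;

-- Phase 1: insert rows under an unfinished operation;
-- GetLatest ignores them because Operations.Finished = 0.
BEGIN TRANSACTION;
    INSERT INTO dbo.Operations (Finished) VALUES (0);
    SET @OpId = SCOPE_IDENTITY();
    INSERT INTO dbo.Items (UUID, Json, TableName, OperationId, Denormalized)
    VALUES (@UUID, @Json, @TableName, @OpId, 0);
COMMIT;

-- Phase 2: flip the operation to finished; the new rows become visible to
-- GetLatest but are still Denormalized = 0 (service fields computed on the fly).
BEGIN TRANSACTION;
    UPDATE dbo.Operations SET Finished = 1, FinishedOn = @Now WHERE Id = @OpId;
COMMIT;

-- Phase 3: persist the computed service fields so later reads are cheap.
BEGIN TRANSACTION;
    UPDATE i
    SET i.BranchId      = d.BranchID,      i.CreatedOn     = d.CreatedOn,
        i.RecoveredOn   = d.RecoveredOn,   i.UpdatedOnPrev = d.UpdatedOnPrev,
        i.UpdatedOnCurr = d.UpdatedOnCurr, i.UpdatedOnNext = d.UpdatedOnNext,
        i.UpdatedOnNextId = d.UpdatedOnNextId,
        i.Denormalized  = 1
    FROM dbo.Items AS i
    INNER JOIN dbo.Denormalizer AS d ON d.Id = i.Id
    WHERE i.Denormalized = 0;
COMMIT;
```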

The problem

A new (business-required) feature forced me to refactor the special Denormalizer view, which ties all of this functionality together and is used by both GetLatest and InsertSnapshot. After that, I started running into performance problems. Where SELECT * FROM Denormalizer originally executed in a fraction of a second, it now takes almost 5 minutes to process 10,000 records.

I'm not a database expert, and it took me nearly six months to come up with the current database structure. I spent two weeks first doing the refactoring and then trying to find the root cause of the performance problem, and I just can't find it. I'm providing a database backup (you can find it here) because the schema (with all its indexes) is too big to fit into SQLFiddle; the database also contains stale data (more than 10,000 records) that I use for testing. I'm also providing the text of the refactored Denormalizer view that became so slow:

ALTER VIEW [dbo].[Denormalizer]
AS
WITH Computed AS
(
    SELECT  currItem.Id,
            nextOperation.id AS NextId,
            prevOperation.FinishedOn AS PrevComputed,
            currOperation.FinishedOn AS CurrComputed,
            nextOperation.FinishedOn AS NextComputed

    FROM Items currItem 
    INNER JOIN dbo.Operations AS currOperation ON currItem.OperationId = currOperation.Id

    LEFT OUTER JOIN dbo.Items AS prevItem ON currItem.PreviousId = prevItem.Id
    LEFT OUTER JOIN dbo.Operations AS prevOperation ON prevItem.OperationId = prevOperation.Id 
    LEFT OUTER JOIN
    (
        SELECT MIN(I.id) as id, S.PreviousId, S.FinishedOn
        FROM Items I
        INNER JOIN
        (
            SELECT I.PreviousId, MIN(nxt.FinishedOn) AS FinishedOn
            FROM dbo.Items I
            LEFT OUTER JOIN dbo.Operations AS nxt ON I.OperationId = nxt.Id
            GROUP BY I.PreviousId
        ) AS S ON I.PreviousId = S.PreviousId 
        GROUP BY S.PreviousId, S.FinishedOn
    ) AS nextOperation ON nextOperation.PreviousId = currItem.Id

    WHERE currOperation.Finished = 1 AND currItem.Denormalized = 0
),

RecursionInitialization AS
(
    SELECT  currItem.Id,
            currItem.PreviousId,
            currItem.UUID,
            currItem.Json,
            currItem.TableName,
            currItem.OperationId,
            currItem.PermissionId,
            currItem.Denormalized,
            currItem.Id AS BranchID,
            COALESCE (C.PrevComputed, C.CurrComputed) AS CreatedOn,
            COALESCE (C.PrevComputed, CAST(0 AS BIGINT)) AS RecoveredOn,
            COALESCE (C.PrevComputed, CAST(0 AS BIGINT)) AS UpdatedOnPrev,
            C.CurrComputed AS UpdatedOnCurr,
            COALESCE (C.NextComputed, CAST(8640000000000000 AS BIGINT)) AS UpdatedOnNext,
            C.NextId AS UpdatedOnNextId,

            0 AS RecursionLevel

    FROM Items AS currItem
    INNER JOIN Computed AS C ON currItem.Id = C.Id
    WHERE currItem.Denormalized = 0

    UNION ALL

    SELECT  currItem.Id,
            currItem.PreviousId,
            currItem.UUID,
            currItem.Json,
            currItem.TableName,
            currItem.OperationId,
            currItem.PermissionId,
            currItem.Denormalized,
            currItem.BranchId,
            currItem.CreatedOn,
            currItem.RecoveredOn,
            currItem.UpdatedOnPrev,
            currItem.UpdatedOnCurr,
            currItem.UpdatedOnNext,
            currItem.UpdatedOnNextId,

            0 AS RecursionLevel

    FROM Items AS currItem
    WHERE currItem.Denormalized = 1
),
Recursion AS
(
    SELECT *
    FROM RecursionInitialization AS currItem

    UNION ALL

    SELECT  currItem.Id,
            currItem.PreviousId,
            currItem.UUID,
            currItem.Json,
            currItem.TableName,
            currItem.OperationId,
            currItem.PermissionId,
            currItem.Denormalized,

            CASE
                WHEN prevItem.UpdatedOnNextId = currItem.Id
                THEN prevItem.BranchID
                ELSE currItem.Id
            END AS BranchID,

            prevItem.CreatedOn AS CreatedOn,

            CASE
                WHEN prevItem.Json IS NULL
                THEN CASE
                            WHEN currItem.Json IS NULL
                            THEN prevItem.RecoveredOn
                            ELSE C.CurrComputed
                        END
                ELSE prevItem.RecoveredOn
            END AS RecoveredOn,

            prevItem.UpdatedOnCurr AS UpdatedOnPrev,

            C.CurrComputed AS UpdatedOnCurr,

            COALESCE (C.NextComputed, CAST(8640000000000000 AS BIGINT)) AS UpdatedOnNext,

            C.NextId,

            prevItem.RecursionLevel + 1 AS RecursionLevel
    FROM Items currItem
    INNER JOIN Computed C ON currItem.Id = C.Id
    INNER JOIN Recursion AS prevItem ON currItem.PreviousId = prevItem.Id
    WHERE currItem.Denormalized = 0
)
SELECT  item.Id,
        item.PreviousId,
        item.UUID,
        item.Json,
        item.TableName,
        item.OperationId,
        item.PermissionId,
        item.Denormalized,
        item.BranchID,
        item.CreatedOn,
        item.RecoveredOn,
        item.UpdatedOnPrev,
        item.UpdatedOnCurr,
        item.UpdatedOnNext,
        item.UpdatedOnNextId

FROM Recursion AS item
INNER JOIN
(
    SELECT Id, MAX(RecursionLevel) AS Recursion
    FROM Recursion AS item
    GROUP BY Id
) AS nested ON item.Id = nested.Id AND item.RecursionLevel = nested.Recursion
GO

Questions

There are two cases to consider, the denormalized and the non-denormalized one:


  1. Looking at the original backup: what makes SELECT * FROM Denormalizer so slow? I have a feeling the problem is in the recursive part of the Denormalizer view; I tried restricting it to denormalized = 1, but that did not affect performance.

  2. After running UPDATE Items SET Denormalized = 0 (which makes GetLatest and SELECT * FROM Denormalizer hit the initially-thought-to-be-slow case), is there a way to speed things up when computing the service fields BranchId, RecoveredOn, CreatedOn, UpdatedOnPrev, UpdatedOnCurr, UpdatedOnNext, and UpdatedOnNextId?

Thank you in advance

PS

I'm trying to stick to standard SQL so that the query can be easily ported to other databases such as MySQL / Oracle / SQLite in the future, but if a database-specific construct would help, I can live with it.


Regarding standard SQL and the databases you list: you're using CTEs here, and MySQL doesn't support them, and there are syntax differences between the major implementations. Also, they are currently an optimization fence in Postgres, which can cause big performance problems. None of this should stop you from using them, since for trees stored in "adjacency list" form they are usually the right tool for the job, but keep these points in mind with regard to compatibility so they don't bite you, and be prepared for extra work when migrating to another DBMS becomes a reality.
David Spillett

Thanks, I'm trying to stick to standard SQL as much as possible. The reason is that I believe it should reduce the number of problems when the existing code needs to be migrated to other databases in the future. It's not always possible, though, and the time factor is also part of the equation. I've already spent half a year on the current database structure... I would love it to contain only standard constructs, but if that took another 10 years it would not be a feasible approach... So if you see more standards-conforming refactorings that are possible, I'll gladly accept them...
Lu4

Nah, that sounds like a pragmatic way to handle the differing priorities and complexities in such cases. I was just flagging up the issues in case you hadn't already come across them. (Better to know now, even if nothing practical can or will be done about it yet, than to be caught by surprise in production!)
David Spillett

Answers:



@Lu4 .. I voted to close this question as "tip of the iceberg", but with a query hint you can get it to run in under 1 second. This query could be refactored and CROSS APPLY could be used, but that would be a consulting engagement rather than an answer on a Q&A site.

Your query ran for more than 13 minutes on a server with 4 CPUs and 16GB of RAM.

[Screenshot: execution statistics for the original query]

I changed your query to use OPTION (MERGE JOIN), and it ran in under 1 second:

set nocount on 
set statistics io on
set statistics time on
;WITH Computed AS
(
    SELECT  currItem.Id,
            nextOperation.id AS NextId,
            prevOperation.FinishedOn AS PrevComputed,
            currOperation.FinishedOn AS CurrComputed,
            nextOperation.FinishedOn AS NextComputed

    FROM Items currItem 
    INNER JOIN dbo.Operations AS currOperation ON currItem.OperationId = currOperation.Id

    LEFT OUTER JOIN dbo.Items AS prevItem ON currItem.PreviousId = prevItem.Id
    LEFT OUTER JOIN dbo.Operations AS prevOperation ON prevItem.OperationId = prevOperation.Id 
    LEFT OUTER JOIN
    (
        SELECT MIN(I.id) as id, S.PreviousId, S.FinishedOn
        FROM Items I
        INNER JOIN
        (
            SELECT I.PreviousId, MIN(nxt.FinishedOn) AS FinishedOn
            FROM dbo.Items I
            LEFT OUTER JOIN dbo.Operations AS nxt ON I.OperationId = nxt.Id
            GROUP BY I.PreviousId
        ) AS S ON I.PreviousId = S.PreviousId 
        GROUP BY S.PreviousId, S.FinishedOn
    ) AS nextOperation ON nextOperation.PreviousId = currItem.Id

    WHERE currOperation.Finished = 1 AND currItem.Denormalized = 0
),

RecursionInitialization AS
(
    SELECT  currItem.Id,
            currItem.PreviousId,
            currItem.UUID,
            currItem.Json,
            currItem.TableName,
            currItem.OperationId,
            currItem.PermissionId,
            currItem.Denormalized,
            currItem.Id AS BranchID,
            COALESCE (C.PrevComputed, C.CurrComputed) AS CreatedOn,
            COALESCE (C.PrevComputed, CAST(0 AS BIGINT)) AS RecoveredOn,
            COALESCE (C.PrevComputed, CAST(0 AS BIGINT)) AS UpdatedOnPrev,
            C.CurrComputed AS UpdatedOnCurr,
            COALESCE (C.NextComputed, CAST(8640000000000000 AS BIGINT)) AS UpdatedOnNext,
            C.NextId AS UpdatedOnNextId,

            0 AS RecursionLevel

    FROM Items AS currItem
    INNER JOIN Computed AS C ON currItem.Id = C.Id
    WHERE currItem.Denormalized = 0

    UNION ALL

    SELECT  currItem.Id,
            currItem.PreviousId,
            currItem.UUID,
            currItem.Json,
            currItem.TableName,
            currItem.OperationId,
            currItem.PermissionId,
            currItem.Denormalized,
            currItem.BranchId,
            currItem.CreatedOn,
            currItem.RecoveredOn,
            currItem.UpdatedOnPrev,
            currItem.UpdatedOnCurr,
            currItem.UpdatedOnNext,
            currItem.UpdatedOnNextId,

            0 AS RecursionLevel

    FROM Items AS currItem
    WHERE currItem.Denormalized = 1
),
Recursion AS
(
    SELECT *
    FROM RecursionInitialization AS currItem

    UNION ALL

    SELECT  currItem.Id,
            currItem.PreviousId,
            currItem.UUID,
            currItem.Json,
            currItem.TableName,
            currItem.OperationId,
            currItem.PermissionId,
            currItem.Denormalized,

            CASE
                WHEN prevItem.UpdatedOnNextId = currItem.Id
                THEN prevItem.BranchID
                ELSE currItem.Id
            END AS BranchID,

            prevItem.CreatedOn AS CreatedOn,

            CASE
                WHEN prevItem.Json IS NULL
                THEN CASE
                            WHEN currItem.Json IS NULL
                            THEN prevItem.RecoveredOn
                            ELSE C.CurrComputed
                        END
                ELSE prevItem.RecoveredOn
            END AS RecoveredOn,

            prevItem.UpdatedOnCurr AS UpdatedOnPrev,

            C.CurrComputed AS UpdatedOnCurr,

            COALESCE (C.NextComputed, CAST(8640000000000000 AS BIGINT)) AS UpdatedOnNext,

            C.NextId,

            prevItem.RecursionLevel + 1 AS RecursionLevel
    FROM Items currItem
    INNER JOIN Computed C ON currItem.Id = C.Id
    INNER JOIN Recursion AS prevItem ON currItem.PreviousId = prevItem.Id
    WHERE currItem.Denormalized = 0
)
SELECT  item.Id,
        item.PreviousId,
        item.UUID,
        item.Json,
        item.TableName,
        item.OperationId,
        item.PermissionId,
        item.Denormalized,
        item.BranchID,
        item.CreatedOn,
        item.RecoveredOn,
        item.UpdatedOnPrev,
        item.UpdatedOnCurr,
        item.UpdatedOnNext,
        item.UpdatedOnNextId

FROM Recursion AS item
INNER JOIN
(
    SELECT Id, MAX(RecursionLevel) AS Recursion
    FROM Recursion AS item
    GROUP BY Id
) AS nested ON item.Id = nested.Id AND item.RecursionLevel = nested.Recursion
OPTION (MERGE JOIN)

set nocount oFF 
set statistics io OFF
set statistics time OFF

[Screenshot: execution statistics for the hinted query]

Note that you cannot use a query hint inside a view, so you'll have to figure out another option, such as exposing the view's query as a stored procedure, or some other workaround.
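One minimal workaround, assuming you are free to add a procedure, is to put the SELECT over the view into a stored procedure: statement-level hints such as OPTION (MERGE JOIN) are legal there and propagate into the expanded view. The procedure name below is hypothetical:

```sql
-- Sketch: wrap the hinted SELECT in a procedure so clients keep one call site.
CREATE PROCEDURE dbo.GetDenormalized
AS
BEGIN
    SET NOCOUNT ON;
    SELECT *
    FROM dbo.Denormalizer
    OPTION (MERGE JOIN);  -- statement-level hint; cannot live inside the view itself
END;
GO

EXEC dbo.GetDenormalized;
```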


Thank you very much; considering how far this question is from the Stack Overflow norm, that makes your work twice as valuable to me. I'll do my homework on CROSS APPLY and try to figure out OPTION (MERGE JOIN). It's not yet clear to me what the problem with the query is, but I'm pretty sure I'll work it out. Thanks again!
Lu4

@Lu4 The problem is that the query optimizer didn't choose (or generate) the best execution plan. In this case, the query hint "encourages" the optimizer to use a particular strategy for implementing the joins. For more details, see Join Hints (Transact-SQL).
Kenny Evitt

CROSS APPLY is great, but I'd recommend learning to read execution plans, and how to analyze them, before trying query hints.
Kenny Evitt
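The Join Hints article mentioned above also covers a narrower alternative to the statement-wide OPTION (MERGE JOIN): hinting a single join in place. A sketch against the tables from the question (not tested against this schema, and hinting individual joins can constrain join order, so measure before adopting):

```sql
-- Hinting one specific join instead of every join in the statement:
SELECT currItem.Id, currOperation.FinishedOn
FROM dbo.Items AS currItem
INNER MERGE JOIN dbo.Operations AS currOperation
    ON currItem.OperationId = currOperation.Id
WHERE currOperation.Finished = 1;
```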
Licensed under cc by-sa 3.0 with attribution required.