这是我第六次尝试问这个问题,也是最短的一个。先前的所有尝试都是通过类似于博客文章的方式而不是问题本身来实现的,但是我向您保证,我的问题是真实的,只是它涉及一个大主题,而没有所有这些细节,该问题将包含在其中。不清楚我的问题是什么。所以这里...
抽象
我有一个数据库,它允许以某种幻想的方式存储数据,并提供了业务流程所需的一些非标准功能。功能如下:
- 通过仅插入方法实现的非破坏性和非阻塞性更新/删除,允许数据恢复和自动记录(每个更改都与进行更改的用户相关)
- 多版本数据(同一数据可能有多个版本)
- 数据库级权限
- 最终与ACID规范和交易安全的创建/更新/删除保持一致
- 可以将当前数据视图快退或快进到任何时间点。
我可能还没有提到其他功能。
数据库结构
所有用户数据Items
均以JSON编码字符串(ntext
)的形式存储在表中。所有数据库操作都通过两个存储过程GetLatest
和进行InsertSnashot
,它们允许对数据进行操作,类似于GIT操作源文件的方式。
结果数据在前端被链接(JOINed)成完全链接的图,因此在大多数情况下不需要进行数据库查询。
也可以将数据存储在常规SQL列中,而不是以Json编码形式存储它们。但是,这增加了总体复杂性。
读取数据
GetLatest
结果以说明形式提供数据,请考虑下图进行说明:
该图显示了对单个记录所做的更改的演变。图表上的箭头显示了进行编辑所基于的版本(假设用户正在脱机更新某些数据,与在线用户进行的更新并行,这种情况会引入冲突,基本上是两种数据版本而不是一个)。
因此,GetLatest
在以下输入时间跨度内调用将产生以下记录版本:
GetLatest 0, 15 => 1 <= The data is created upon it's first occurance
GetLatest 0, 25 => 2 <= Inserting another version on top of first one overwrites the existing version
GetLatest 0, 30 => 3 <= The overwrite takes place as soon as the data is inserted
GetLatest 0, 45 => 3, 4 <= This is where the conflict is introduced in the system
GetLatest 0, 55 => 4, 5 <= You can still edit all the versions
GetLatest 0, 65 => 4, 6 <= You can still edit all the versions
GetLatest 0, 75 => 4, 6, 7 <= You can also create additional conflicts
GetLatest 0, 85 => 4, 7, 8 <= You can still edit records
GetLatest 0, 95 => 7, 8, 9 <= You can still edit records
GetLatest 0, 105 => 7, 8 <= Inserting a record with `Json` equal to `NULL` means that the record is deleted
GetLatest 0, 115 => 8 <= Deleting the conflicting versions is the only conflict-resolution scenario
GetLatest 0, 125 => 8, X <= The conflict can be based on the version that was already deleted.
GetLatest 0, 135 => 8, Y <= You can delete such version too and both undelete another version on parallel within one Snapshot (or in several Snapshots).
GetLatest 0, 145 => 8 <= You can delete the undeleted versions by inserting NULL.
GetLatest 0, 155 => 8, Z <= You can again undelete twice-deleted versions
GetLatest 0, 165 => 8 <= You can again delete three-times deleted versions
GetLatest 0, 10000 => 8 <= This means that in order to fast-forward view from moment 0 to moment `10000` you just have to expose record 8 to the user.
GetLatest 55, 115 => 8, [Remove 4], [Remove 5] <= At moment 55 there were two versions [4, 5] so in order to fast-forward to moment 115 the user has to delete versions 4 and 5 and introduce version 8. Please note that version 7 is not present in results since at moment 110 it got deleted.
为了GetLatest
支持这种高效的接口中的每个记录包含特殊的服务属性BranchId
,RecoveredOn
,CreatedOn
,UpdatedOnPrev
,UpdatedOnCurr
,UpdatedOnNext
,UpdatedOnNextId
所使用的GetLatest
计算出该记录是否属于充分进入规定的时间跨度GetLatest
参数
插入资料
为了支持最终的一致性,交易安全性和性能,通过特殊的多阶段过程将数据插入数据库。
数据只是插入到数据库中,而不能由
GetLatest
存储过程查询。使数据可用于
GetLatest
存储过程,使数据在规范化(即denormalized = 0
)状态下可用。虽然数据是标准化的状态,在服务领域BranchId
,RecoveredOn
,CreatedOn
,UpdatedOnPrev
,UpdatedOnCurr
,UpdatedOnNext
,UpdatedOnNextId
正在计算这实在是太慢了。为了加快处理速度,一旦将数据提供给
GetLatest
存储过程,就对其进行规范化处理。- 由于步骤1,2,3是在不同事务中进行的,因此可能在每个操作的中间发生硬件故障。使数据处于中间状态。这种情况是正常的,即使发生这种情况,数据也会在随后的
InsertSnapshot
调用中恢复。该部分的代码可以在InsertSnapshot
存储过程的步骤2和3之间找到。
- 由于步骤1,2,3是在不同事务中进行的,因此可能在每个操作的中间发生硬件故障。使数据处于中间状态。这种情况是正常的,即使发生这种情况,数据也会在随后的
问题
(业务需要的)新功能迫使我重构特殊Denormalizer
视图,该视图将所有功能捆绑在一起,并用于GetLatest
和InsertSnapshot
。之后,我开始遇到性能问题。如果最初SELECT * FROM Denormalizer
仅在几分之一秒内执行,那么现在要花费近5分钟来处理10000条记录。
我不是数据库专家,花了近六个月的时间才提出了当前的数据库结构。我先花了两周时间进行重构,然后尝试找出导致性能问题的根本原因。我只是找不到。我提供数据库备份(您可以在这里找到),因为架构(包含所有索引)非常大,适合SqlFiddle,数据库还包含我用于测试目的的过时数据(超过10000条记录) 。我还提供Denormalizer
了重构后变得非常缓慢的视图文本:
ALTER VIEW [dbo].[Denormalizer]
AS
WITH Computed AS
(
SELECT currItem.Id,
nextOperation.id AS NextId,
prevOperation.FinishedOn AS PrevComputed,
currOperation.FinishedOn AS CurrComputed,
nextOperation.FinishedOn AS NextComputed
FROM Items currItem
INNER JOIN dbo.Operations AS currOperation ON currItem.OperationId = currOperation.Id
LEFT OUTER JOIN dbo.Items AS prevItem ON currItem.PreviousId = prevItem.Id
LEFT OUTER JOIN dbo.Operations AS prevOperation ON prevItem.OperationId = prevOperation.Id
LEFT OUTER JOIN
(
SELECT MIN(I.id) as id, S.PreviousId, S.FinishedOn
FROM Items I
INNER JOIN
(
SELECT I.PreviousId, MIN(nxt.FinishedOn) AS FinishedOn
FROM dbo.Items I
LEFT OUTER JOIN dbo.Operations AS nxt ON I.OperationId = nxt.Id
GROUP BY I.PreviousId
) AS S ON I.PreviousId = S.PreviousId
GROUP BY S.PreviousId, S.FinishedOn
) AS nextOperation ON nextOperation.PreviousId = currItem.Id
WHERE currOperation.Finished = 1 AND currItem.Denormalized = 0
),
RecursionInitialization AS
(
SELECT currItem.Id,
currItem.PreviousId,
currItem.UUID,
currItem.Json,
currItem.TableName,
currItem.OperationId,
currItem.PermissionId,
currItem.Denormalized,
currItem.Id AS BranchID,
COALESCE (C.PrevComputed, C.CurrComputed) AS CreatedOn,
COALESCE (C.PrevComputed, CAST(0 AS BIGINT)) AS RecoveredOn,
COALESCE (C.PrevComputed, CAST(0 AS BIGINT)) AS UpdatedOnPrev,
C.CurrComputed AS UpdatedOnCurr,
COALESCE (C.NextComputed, CAST(8640000000000000 AS BIGINT)) AS UpdatedOnNext,
C.NextId AS UpdatedOnNextId,
0 AS RecursionLevel
FROM Items AS currItem
INNER JOIN Computed AS C ON currItem.Id = C.Id
WHERE currItem.Denormalized = 0
UNION ALL
SELECT currItem.Id,
currItem.PreviousId,
currItem.UUID,
currItem.Json,
currItem.TableName,
currItem.OperationId,
currItem.PermissionId,
currItem.Denormalized,
currItem.BranchId,
currItem.CreatedOn,
currItem.RecoveredOn,
currItem.UpdatedOnPrev,
currItem.UpdatedOnCurr,
currItem.UpdatedOnNext,
currItem.UpdatedOnNextId,
0 AS RecursionLevel
FROM Items AS currItem
WHERE currItem.Denormalized = 1
),
Recursion AS
(
SELECT *
FROM RecursionInitialization AS currItem
UNION ALL
SELECT currItem.Id,
currItem.PreviousId,
currItem.UUID,
currItem.Json,
currItem.TableName,
currItem.OperationId,
currItem.PermissionId,
currItem.Denormalized,
CASE
WHEN prevItem.UpdatedOnNextId = currItem.Id
THEN prevItem.BranchID
ELSE currItem.Id
END AS BranchID,
prevItem.CreatedOn AS CreatedOn,
CASE
WHEN prevItem.Json IS NULL
THEN CASE
WHEN currItem.Json IS NULL
THEN prevItem.RecoveredOn
ELSE C.CurrComputed
END
ELSE prevItem.RecoveredOn
END AS RecoveredOn,
prevItem.UpdatedOnCurr AS UpdatedOnPrev,
C.CurrComputed AS UpdatedOnCurr,
COALESCE (C.NextComputed, CAST(8640000000000000 AS BIGINT)) AS UpdatedOnNext,
C.NextId,
prevItem.RecursionLevel + 1 AS RecursionLevel
FROM Items currItem
INNER JOIN Computed C ON currItem.Id = C.Id
INNER JOIN Recursion AS prevItem ON currItem.PreviousId = prevItem.Id
WHERE currItem.Denormalized = 0
)
SELECT item.Id,
item.PreviousId,
item.UUID,
item.Json,
item.TableName,
item.OperationId,
item.PermissionId,
item.Denormalized,
item.BranchID,
item.CreatedOn,
item.RecoveredOn,
item.UpdatedOnPrev,
item.UpdatedOnCurr,
item.UpdatedOnNext,
item.UpdatedOnNextId
FROM Recursion AS item
INNER JOIN
(
SELECT Id, MAX(RecursionLevel) AS Recursion
FROM Recursion AS item
GROUP BY Id
) AS nested ON item.Id = nested.Id AND item.RecursionLevel = nested.Recursion
GO
问题
考虑了两种情况:非规范化和规范化的情况:
寻找原始备份,是什么让它
SELECT * FROM Denormalizer
如此缓慢地变慢,我觉得Denormalizer视图的递归部分存在问题,我尝试限制denormalized = 1
但不影响性能。运行后
UPDATE Items SET Denormalized = 0
它将使GetLatest
和SELECT * FROM Denormalizer
碰上(最初认为是)缓慢的情况下,是有办法加快速度吧,当我们计算服务领域BranchId
,RecoveredOn
,CreatedOn
,UpdatedOnPrev
,UpdatedOnCurr
,UpdatedOnNext
,UpdatedOnNextId
先感谢您
聚苯乙烯
我正在尝试使用标准SQL,以使查询可以轻松移植到其他数据库中,例如MySQL / Oracle / SQLite,以备将来使用,但是如果没有标准SQL可能会有所帮助,我可以坚持使用特定于数据库的构造。