如果并行交换事件死锁是无受害者的,这有问题吗?


10

我们在生产环境中会看到很多此类查询内并行线程死锁(SQL Server 2012 SP2-是的...我知道...),但是当查看通过扩展事件捕获的死锁XML时,受害者名单是空的。

<victim-list />

死锁似乎在4个线程之间,两个带有WaitType="e_waitPipeNewRow",两个带有WaitType="e_waitPipeGetRow"

 <resource-list>
  <exchangeEvent id="Pipe13904cb620" WaitType="e_waitPipeNewRow" nodeId="19">
   <owner-list>
    <owner id="process4649868" />
   </owner-list>
   <waiter-list>
    <waiter id="process40eb498" />
   </waiter-list>
  </exchangeEvent>
  <exchangeEvent id="Pipe30670d480" WaitType="e_waitPipeNewRow" nodeId="21">
   <owner-list>
    <owner id="process368ecf8" />
   </owner-list>
   <waiter-list>
    <waiter id="process46a0cf8" />
   </waiter-list>
  </exchangeEvent>
  <exchangeEvent id="Pipe13904cb4e0" WaitType="e_waitPipeGetRow" nodeId="19">
   <owner-list>
    <owner id="process40eb498" />
   </owner-list>
   <waiter-list>
    <waiter id="process368ecf8" />
   </waiter-list>
  </exchangeEvent>
  <exchangeEvent id="Pipe4a106e060" WaitType="e_waitPipeGetRow" nodeId="21">
   <owner-list>
    <owner id="process46a0cf8" />
   </owner-list>
   <waiter-list>
    <waiter id="process4649868" />
   </waiter-list>
  </exchangeEvent>
 </resource-list>

所以:

  1. 受害者名单是空的
  2. 运行查询的应用程序不会出错并完成查询
  3. 据我们所见,除了捕获图形外,没有明显的问题。

因此,除了噪音以外,还有什么需要担心的吗?

编辑:感谢Paul的回答,我可以看到问题可能发生在哪里,并且似乎可以通过tempdb溢出来解决。 在此处输入图片说明

Answers:


11

当通过交换溢出解决查询内并行死锁时,如果死锁图是这种样子的,我不会感到惊讶(因此,除了性能之外,没有其他受害者)。

您可以通过捕获交换溢出并使其与死锁匹配(或不匹配)来证实这一理论。

将交换缓冲区写入tempdb以解决死锁是不理想的。寻求消除执行计划中顺序保留操作的顺序(例如,顺序保留交换提供并行合并联接)。除非这不会引起明显的性能问题,否则您还有其他事情要担心。

出于兴趣,高碎片化/过时的统计数据是否可能使这个问题恶化?

碎片,没有。过时的统计信息:我没有想到的任何特定含义,不。当然,无代表性的统计数据通常很少是一件好事。

这里的根本问题是,当线程之间的依赖关系尽可能少时,并行性最有效。保留顺序会引入相当讨厌的依赖关系。事情很容易陷入困境,而清除僵局的唯一方法是将交易所保留的一堆行溢出到tempdb


-1

为了将这些非关键的“通过溢出自行解决”死锁与更重要的死锁区分开,可以将某些搜索语义应用于Xdl结构。

示例输出

以下SP不能立即使用,因为它取决于ufn_ExtractSubstringsByPattern(),但是该方法可以替换为直接返回不同计数的方法。

ALTER view [Common].[DeadLockRecentHistoryView]
as
/*---------------------------------------------------------------------------------------------------------------------
    Purpose:  List history of recent deadlock events

    Warning:  The XML processing may hit a recursion limit (100), suggest using "option (maxrecursion 10000)".

    Xdl File:
        The SSMS deadlock file format .XDL format (xml) has changed with later versions of SQL Server.  This version tested with 2012.

    Ring Buffer issues:
        https://connect.microsoft.com/SQLServer/feedback/details/754115/xevents-system-health-does-not-catch-all-deadlocks
        https://www.sqlskills.com/blogs/jonathan/why-i-hate-the-ring_buffer-target-in-extended-events/

    Links:
        http://www.sqlskills.com/blogs/jonathan/multi-victim-deadlocks/
        https://www.sqlskills.com/blogs/jonathan/graphically-viewing-extended-events-deadlock-graphs/
        http://www.mssqltips.com/sqlservertip/1234/capturing-sql-server-deadlock-information-in-xml-format/
        http://blogs.msdn.com/b/sqldatabasetalk/archive/2013/05/01/tracking-down-deadlocks-in-sql-database.aspx
        http://dba.stackexchange.com/questions/10644/deadlock-error-isnt-returning-the-deadlock-sql/10646#10646        

    Modified    By           Description
    ----------  -----------  ------------------------------------------------------------------------------------------
    2014.10.29  crokusek     From Internet, http://stackoverflow.com/questions/19817951
    2015.05.05  crokusek     Improve so that the output is consumable by SSMS 2012 as "Open .xdl file"                             
    2015.05.22  crokusek     Remove special character for the cast to Xml (like '&')
    2017.08.03  crokusek     Abandon ring-buffer approach and use event log files.  Filter out internal deadlocks.
    2018.07.16  crokusek     Added field(s) like ProbablyHandledBySpill to help identify non-critical deadlocks.
  ---------------------------------------------------------------------------------------------------------------------*/
with XmlDeadlockReports as
(
  select convert(xml, event_data) as EventData         
    from sys.fn_xe_file_target_read_file(N'system_health*.xel', NULL, NULL, NULL)      
   where substring(event_data, 1, 50) like '%"xml_deadlock_report"%'       
)
select top 10000
       EventData.value('(event/@timestamp)[1]', 'datetime2(7)') as CreatedUtc,
       --(select TimePst from Common.ufn_ConvertUtcToPst(EventData.value('(event/@timestamp)[1]', 'datetime2(7)'))) as CreatedPst,
       DistinctSpidCount,       
       HasExchangeEvent,
       IsVictimless,                  
       --
       -- If the deadlock contains Exchange Events and lists no victims, it probably occurred
       -- during execution of a single query that contained parallellism but got stuck due to 
       -- ordering issues.   /dba/197779
       -- 
       -- These will not raise an exception to the caller and will complete by spilling to tempdb
       -- however they may run much slower than they would without the spill(s).
       --
       convert(bit, iif(DistinctSpidCount = 1 and HasExchangeEvent = 1 and IsVictimless = 1, 1, 0)) as ProbablyHandledBySpill,
       len(et.XdlFileText) as LenXdlFile,
       eddl.XdlFile as XdlFile
  from XmlDeadlockReports
 cross apply 
     ( 
       select eventData.query('event/data/value/deadlock') as XdlFile 
     ) eddl
 cross apply 
     ( 
        select convert(nvarchar(max), eddl.XdlFile) as XdlFileText 
     ) as et
 cross apply 
     (
       select count(distinct Match) as DistinctSpidCount
         from common.ufn_ExtractSubstringsByPattern(et.XdlFileText, 'spid="%%"')
     ) spids
 cross apply
     (
       select convert(bit, iif(charindex('<exchangeEvent', et.XdlFileText) > 0, 1, 0)) as HasExchangeEvent,
              --
              convert(bit, iif(     charindex('<victim-list>', et.XdlFileText) = 0
                                and charindex('<victim-list/>', et.XdlFileText) > 0, 1, 0)) as IsVictimless
     ) as flags        
 order by CreatedUtc desc
GO
By using our site, you acknowledge that you have read and understand our Cookie Policy and Privacy Policy.
Licensed under cc by-sa 3.0 with attribution required.