Getting the streak count and streak type from win-loss-tie data


15

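I made a SQL Fiddle for this question, in case that makes things easier for anyone.

I have a fantasy sports database of sorts, and what I'm trying to figure out is how to derive "current streak" data: 'W2' if the team has won its last 2 matches, 'L1' if it lost its last match after winning the one before, or 'T1' if it tied its most recent match.

Here is my basic schema: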

CREATE TABLE FantasyTeams (
  team_id BIGINT NOT NULL
)

CREATE TABLE FantasyMatches(
    match_id BIGINT NOT NULL,
    home_fantasy_team_id BIGINT NOT NULL,
    away_fantasy_team_id BIGINT NOT NULL,
    fantasy_season_id BIGINT NOT NULL,
    fantasy_league_id BIGINT NOT NULL,
    fantasy_week_id BIGINT NOT NULL,
    winning_team_id BIGINT NULL
)

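A value of NULL in the winning_team_id column indicates a tie for that match.

Here is a sample DML statement with sample data for 6 teams and 3 weeks' worth of matchups: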

INSERT INTO FantasyTeams
SELECT 1
UNION
SELECT 2
UNION
SELECT 3
UNION
SELECT 4
UNION
SELECT 5
UNION
SELECT 6

INSERT INTO FantasyMatches
SELECT 1, 2, 1, 2, 4, 44, 2
UNION
SELECT 2, 5, 4, 2, 4, 44, 5
UNION
SELECT 3, 6, 3, 2, 4, 44, 3
UNION
SELECT 4, 2, 4, 2, 4, 45, 2
UNION
SELECT 5, 3, 1, 2, 4, 45, 3
UNION
SELECT 6, 6, 5, 2, 4, 45, 6
UNION
SELECT 7, 2, 6, 2, 4, 46, 2
UNION
SELECT 8, 3, 5, 2, 4, 46, 3
UNION
SELECT 9, 4, 1, 2, 4, 46, NULL

GO

Here is an example of the desired output (based on the DML above), which I'm having trouble even beginning to figure out how to derive:

| TEAM_ID | STREAK_TYPE | STREAK_COUNT |
|---------|-------------|--------------|
|       1 |           T |            1 |
|       2 |           W |            3 |
|       3 |           W |            3 |
|       4 |           T |            1 |
|       5 |           L |            2 |
|       6 |           L |            1 |

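I've tried various approaches using subqueries and CTEs, but I can't put it all together. I'd like to avoid a cursor, since I may have a large dataset to run this against in the future. I feel there might be a way involving table variables that somehow joins this data to itself, but I'm still working on it.

Additional info: the number of teams can vary (any even number between 6 and 10), and the total number of matchups increases by 1 for every team each week. Any ideas on how I should do this?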


2
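Incidentally, all such schemas I've ever seen use a tri-state column (e.g. 1 2 3 meaning Home Win / Tie / Away Win) for the match result, rather than your winning_team_id with a value of id / NULL / id. One less constraint for the DB to check.
AakashM, 2014

So, are you saying the design I've set up is "good"?
jamauss

1
Well, if I were asked for comments, I'd say: 1) why 'fantasy' in so many names 2) why bigint for so many columns, when int would probably do 3) why all the _s?! 4) I prefer table names to be singular, but acknowledge that not everyone agrees with me // but those aside, what you've shown us here looks coherent, yes
AakashM, 2014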

Answers:


17

Since you are on SQL Server 2012, you can use a couple of the new windowing functions.

with C1 as
(
  select T.team_id,
         case
           when M.winning_team_id is null then 'T'
           when M.winning_team_id = T.team_id then 'W'
           else 'L'
         end as streak_type,
         M.match_id
  from FantasyMatches as M
    cross apply (values(M.home_fantasy_team_id),
                       (M.away_fantasy_team_id)) as T(team_id)
), C2 as
(
  select C1.team_id,
         C1.streak_type,
         C1.match_id,
         lag(C1.streak_type, 1, C1.streak_type) 
           over(partition by C1.team_id 
                order by C1.match_id desc) as lag_streak_type
  from C1
), C3 as
(
  select C2.team_id,
         C2.streak_type,
         sum(case when C2.lag_streak_type = C2.streak_type then 0 else 1 end) 
           over(partition by C2.team_id 
                order by C2.match_id desc rows unbounded preceding) as streak_sum
  from C2
)
select C3.team_id,
       C3.streak_type,
       count(*) as streak_count
from C3
where C3.streak_sum = 0
group by C3.team_id,
         C3.streak_type
order by C3.team_id;

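SQL Fiddle

C1 calculates the streak_type for each team and match.

C2 finds the previous streak_type, ordered by match_id desc.

C3 generates a running sum, streak_sum, ordered by match_id desc, which stays at 0 for as long as streak_type is the same as the most recent value.

The main query counts the rows of each streak where streak_sum is 0.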
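For anyone who wants to try the idea without a SQL Server instance, here is a minimal sketch that runs an adapted query through Python's sqlite3 module. It is an adaptation under stated assumptions, not the answer's exact T-SQL: it assumes a SQLite build with window-function support (3.25 or later), uses UNION ALL in place of CROSS APPLY (SQLite has none), and uses coalesce(lag(...), streak_type) in place of the three-argument lag. The table is trimmed to the four columns the query reads, and the names streaks_via_window and SAMPLE_MATCHES are made up for illustration.

```python
import sqlite3

# Streak query adapted to SQLite (an assumption-laden port, see the notes above).
STREAK_QUERY = """
with C1 as (
  select T.team_id,
         case
           when T.winning_team_id is null then 'T'
           when T.winning_team_id = T.team_id then 'W'
           else 'L'
         end as streak_type,
         T.match_id
  from (select home_fantasy_team_id as team_id, match_id, winning_team_id
        from FantasyMatches
        union all
        select away_fantasy_team_id, match_id, winning_team_id
        from FantasyMatches) as T
), C2 as (
  select team_id, streak_type, match_id,
         coalesce(lag(streak_type)
                    over (partition by team_id order by match_id desc),
                  streak_type) as lag_streak_type
  from C1
), C3 as (
  select team_id, streak_type,
         sum(case when lag_streak_type = streak_type then 0 else 1 end)
           over (partition by team_id order by match_id desc
                 rows between unbounded preceding and current row) as streak_sum
  from C2
)
select team_id, streak_type, count(*) as streak_count
from C3
where streak_sum = 0
group by team_id, streak_type
order by team_id
"""

# Sample data from the question: (match_id, home, away, winning_team_id)
SAMPLE_MATCHES = [
    (1, 2, 1, 2), (2, 5, 4, 5), (3, 6, 3, 3),
    (4, 2, 4, 2), (5, 3, 1, 3), (6, 6, 5, 6),
    (7, 2, 6, 2), (8, 3, 5, 3), (9, 4, 1, None),
]


def streaks_via_window(matches):
    """Load matches into an in-memory table and return (team, type, count) rows."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "create table FantasyMatches ("
            " match_id integer not null,"
            " home_fantasy_team_id integer not null,"
            " away_fantasy_team_id integer not null,"
            " winning_team_id integer null)"
        )
        con.executemany("insert into FantasyMatches values (?, ?, ?, ?)", matches)
        return con.execute(STREAK_QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in streaks_via_window(SAMPLE_MATCHES):
        print(row)
```

Against the question's sample data this yields the same six rows as the desired output table, which is a quick way to check any changes to the CTE chain.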


4
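+1 for using LEAD(). Not enough people know about what is new in 2012.
Mark Sinkinson

4
+1, I like the trick of using descending order in LAG to find the most recent streak, very neat! By the way, since the OP only needs the team IDs, you could replace FantasyTeams JOIN FantasyMatches with FantasyMatches CROSS APPLY (VALUES (home_fantasy_team_id), (away_fantasy_team_id)), which might improve performance.
Andriy M

@AndriyM Good catch! I'll update the answer with that. If you need other columns from FantasyTeams, it is probably better to join it in the main query instead.
Mikael Eriksson, 2014

Thanks for the code sample - I'll give it a try and report back a bit later, once I'm out of meetings... >:-\
jamauss, 2014

@MikaelEriksson - this worked great - thank you! Quick question - I need to use this result set to update existing rows (joining on FantasyTeams.team_id) - how would you suggest turning this into an UPDATE statement? I started by just changing the SELECT to an UPDATE, but I can't use GROUP BY in an UPDATE. Would you say I should put the result set into a temp table and join to that in the UPDATE, or something else? Thanks!
jamauss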

10

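One intuitive way to approach this problem is:

  1. Find the most recent result for each team
  2. Check the previous match, and add one to the streak count if the result type matches
  3. Repeat step 2, but stop as soon as the first different result is encountered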
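The three steps above can be sketched in a few lines of plain Python. This only illustrates the control flow (start at the latest match, walk backwards, stop at the first different result); it does not reproduce the index seeks that make the SQL version efficient, since it simply sorts each team's matches in memory. The helper names result_for and walk_back_streaks are hypothetical, and the input is assumed to be (match_id, home, away, winning_team_id) tuples with None meaning a tie.

```python
from collections import defaultdict

# Sample data from the question: (match_id, home, away, winning_team_id)
SAMPLE_MATCHES = [
    (1, 2, 1, 2), (2, 5, 4, 5), (3, 6, 3, 3),
    (4, 2, 4, 2), (5, 3, 1, 3), (6, 6, 5, 6),
    (7, 2, 6, 2), (8, 3, 5, 3), (9, 4, 1, None),
]


def result_for(team_id, winning_team_id):
    """Classify one match from a team's point of view: 'W', 'L' or 'T'."""
    if winning_team_id is None:
        return "T"
    return "W" if winning_team_id == team_id else "L"


def walk_back_streaks(matches):
    """Return {team_id: (streak_type, streak_length)} for each team's current streak."""
    played = defaultdict(list)
    for match_id, home, away, winner in matches:
        played[home].append((match_id, winner))
        played[away].append((match_id, winner))

    streaks = {}
    for team_id, team_matches in played.items():
        # Most recent match first (match_id is assumed to increase over time).
        team_matches.sort(key=lambda m: m[0], reverse=True)

        # Step 1: the most recent result starts the streak.
        streak_type = result_for(team_id, team_matches[0][1])
        length = 1

        # Steps 2 and 3: extend while the result matches, stop at the first change.
        for _, winner in team_matches[1:]:
            if result_for(team_id, winner) != streak_type:
                break
            length += 1

        streaks[team_id] = (streak_type, length)
    return streaks


if __name__ == "__main__":
    for team_id, (kind, length) in sorted(walk_back_streaks(SAMPLE_MATCHES).items()):
        print(team_id, kind, length)
```

The early break is the point of the approach: only the rows that belong to the current streak, plus one more per team, need to be examined.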

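Assuming the recursive strategy is implemented efficiently, it may outperform the window function solution (which performs a full scan of the data) as the table grows larger. The key to success is providing efficient indexes to locate rows quickly (using seeks) and avoiding sorts. The indexes needed are: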

-- New index #1
CREATE UNIQUE INDEX uq1 ON dbo.FantasyMatches 
    (home_fantasy_team_id, match_id) 
INCLUDE (winning_team_id);

-- New index #2
CREATE UNIQUE INDEX uq2 ON dbo.FantasyMatches 
    (away_fantasy_team_id, match_id) 
INCLUDE (winning_team_id);

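To assist in query optimization, I will use a temporary table to hold the rows identified as forming part of a current streak. If streaks are typically short (as is true for the teams I follow, sadly), this table should be quite small: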

-- Table to hold just the rows that form streaks
CREATE TABLE #StreakData
(
    team_id bigint NOT NULL,
    match_id bigint NOT NULL,
    streak_type char(1) NOT NULL,
    streak_length integer NOT NULL,
);

-- Temporary table unique clustered index
CREATE UNIQUE CLUSTERED INDEX cuq ON #StreakData (team_id, match_id);

My recursive query solution is as follows (SQL Fiddle here):

-- Solution query
WITH Streaks AS
(
    -- Anchor: most recent match for each team
    SELECT 
        FT.team_id, 
        CA.match_id, 
        CA.streak_type, 
        streak_length = 1
    FROM dbo.FantasyTeams AS FT
    CROSS APPLY
    (
        -- Most recent match
        SELECT
            T.match_id,
            T.streak_type
        FROM 
        (
            SELECT 
                FM.match_id, 
                streak_type =
                    CASE 
                        WHEN FM.winning_team_id = FM.home_fantasy_team_id
                            THEN CONVERT(char(1), 'W')
                        WHEN FM.winning_team_id IS NULL
                            THEN CONVERT(char(1), 'T')
                        ELSE CONVERT(char(1), 'L')
                    END
            FROM dbo.FantasyMatches AS FM
            WHERE 
                FT.team_id = FM.home_fantasy_team_id
            UNION ALL
            SELECT 
                FM.match_id, 
                streak_type =
                    CASE 
                        WHEN FM.winning_team_id = FM.away_fantasy_team_id
                            THEN CONVERT(char(1), 'W')
                        WHEN FM.winning_team_id IS NULL
                            THEN CONVERT(char(1), 'T')
                        ELSE CONVERT(char(1), 'L')
                    END
            FROM dbo.FantasyMatches AS FM
            WHERE
                FT.team_id = FM.away_fantasy_team_id
        ) AS T
        ORDER BY 
            T.match_id DESC
            OFFSET 0 ROWS 
            FETCH FIRST 1 ROW ONLY
    ) AS CA
    UNION ALL
    -- Recursive part: prior match with the same streak type
    SELECT 
        Streaks.team_id, 
        LastMatch.match_id, 
        Streaks.streak_type, 
        Streaks.streak_length + 1
    FROM Streaks
    CROSS APPLY
    (
        -- Most recent prior match
        SELECT 
            Numbered.match_id, 
            Numbered.winning_team_id, 
            Numbered.team_id
        FROM
        (
            -- Assign a row number
            SELECT
                PreviousMatches.match_id,
                PreviousMatches.winning_team_id,
                PreviousMatches.team_id, 
                rn = ROW_NUMBER() OVER (
                    ORDER BY PreviousMatches.match_id DESC)
            FROM
            (
                -- Prior match as home or away team
                SELECT 
                    FM.match_id, 
                    FM.winning_team_id, 
                    team_id = FM.home_fantasy_team_id
                FROM dbo.FantasyMatches AS FM
                WHERE 
                    FM.home_fantasy_team_id = Streaks.team_id
                    AND FM.match_id < Streaks.match_id
                UNION ALL
                SELECT 
                    FM.match_id, 
                    FM.winning_team_id, 
                    team_id = FM.away_fantasy_team_id
                FROM dbo.FantasyMatches AS FM
                WHERE 
                    FM.away_fantasy_team_id = Streaks.team_id
                    AND FM.match_id < Streaks.match_id
            ) AS PreviousMatches
        ) AS Numbered
        -- Most recent
        WHERE 
            Numbered.rn = 1
    ) AS LastMatch
    -- Check the streak type matches
    WHERE EXISTS
    (
        SELECT 
            Streaks.streak_type
        INTERSECT
        SELECT 
            CASE 
                WHEN LastMatch.winning_team_id IS NULL THEN 'T' 
                WHEN LastMatch.winning_team_id = LastMatch.team_id THEN 'W' 
                ELSE 'L' 
            END
    )
)
INSERT #StreakData
    (team_id, match_id, streak_type, streak_length)
SELECT
    team_id,
    match_id,
    streak_type,
    streak_length
FROM Streaks
OPTION (MAXRECURSION 0);

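The T-SQL text is quite long, but each section of the query corresponds closely to the broad process outline given at the start of this answer. The query is made longer by the need to use certain tricks to avoid sorts and to produce a TOP in the recursive part of the query (which is normally not allowed).

The execution plan is relatively small and simple in comparison with the query. I have shaded the anchor region yellow and the recursive part green in the screenshot below:

[Image: recursive execution plan]

With the streak rows captured in a temporary table, it is easy to get the summary results required. (Using a temporary table also avoids a sort spill that might occur if the query below were combined with the main recursive query.)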

-- Basic results
SELECT
    SD.team_id,
    StreakType = MAX(SD.streak_type),
    StreakLength = MAX(SD.streak_length)
FROM #StreakData AS SD
GROUP BY 
    SD.team_id
ORDER BY
    SD.team_id;

[Image: basic query execution plan]

The same query can be used as the basis for updating the FantasyTeams table:

-- Update team summary
WITH StreakData AS
(
    SELECT
        SD.team_id,
        StreakType = MAX(SD.streak_type),
        StreakLength = MAX(SD.streak_length)
    FROM #StreakData AS SD
    GROUP BY 
        SD.team_id
)
UPDATE FT
SET streak_type = SD.StreakType,
    streak_count = SD.StreakLength
FROM StreakData AS SD
JOIN dbo.FantasyTeams AS FT
    ON FT.team_id = SD.team_id;

Or, if you prefer MERGE:

MERGE dbo.FantasyTeams AS FT
USING
(
    SELECT
        SD.team_id,
        StreakType = MAX(SD.streak_type),
        StreakLength = MAX(SD.streak_length)
    FROM #StreakData AS SD
    GROUP BY 
        SD.team_id
) AS StreakData
    ON StreakData.team_id = FT.team_id
WHEN MATCHED THEN UPDATE SET
    FT.streak_type = StreakData.StreakType,
    FT.streak_count = StreakData.StreakLength;

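Either approach produces an efficient execution plan (based on the known number of rows in the temporary table):

[Image: update execution plan]

Finally, because the recursive method naturally includes the match_id in its processing, it is easy to add a list of the match_ids that form each streak to the output: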

SELECT
    S.team_id,
    streak_type = MAX(S.streak_type),
    match_id_list =
        STUFF(
        (
            SELECT ',' + CONVERT(varchar(11), S2.match_id)
            FROM #StreakData AS S2
            WHERE S2.team_id = S.team_id
            ORDER BY S2.match_id DESC
            FOR XML PATH ('')
        ), 1, 1, ''),
    streak_length = MAX(S.streak_length)
FROM #StreakData AS S
GROUP BY 
    S.team_id
ORDER BY
    S.team_id;

Output:

[Image: output with the match list included]

Execution plan:

[Image: match list execution plan]


2
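Impressive! Is there a specific reason why the recursive part has WHERE EXISTS (... INTERSECT ...) rather than just Streaks.streak_type = CASE ...? I know the former can be useful when you need to match NULLs with NULLs as well as values on both sides, but in this case the right-hand part does not look like it can produce any NULLs, so...
Andriy M

2
@AndriyM Yes. The code is very carefully written, in a number of places and ways, to produce a plan with no sorts. When CASE is used, the optimizer cannot use a merge concatenation (which preserves the union key order) and uses a concatenation plus sorts instead.
Paul White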

8

Another way to get the result is with a recursive CTE:

WITH TeamRes As (
SELECT FT.Team_ID
     , FM.match_id
     , Previous_Match = LAG(match_id, 1, 0) 
                        OVER (PARTITION BY FT.Team_ID ORDER BY FM.match_id)
     , Matches = Row_Number() 
                 OVER (PARTITION BY FT.Team_ID ORDER BY FM.match_id Desc)
     , Result = Case Coalesce(winning_team_id, -1)
                     When -1 Then 'T'
                     When FT.Team_ID Then 'W'
                     Else 'L'
                End 
FROM   FantasyMatches FM
       INNER JOIN FantasyTeams FT ON FT.Team_ID IN 
         (FM.home_fantasy_team_id, FM.away_fantasy_team_id)
), Streaks AS (
SELECT Team_ID, Result, 1 As Streak, Previous_Match
FROM   TeamRes
WHERE  Matches = 1
UNION ALL
SELECT tr.Team_ID, tr.Result, Streak + 1, tr.Previous_Match
FROM   TeamRes tr
       INNER JOIN Streaks s ON tr.Team_ID = s.Team_ID 
                           AND tr.Match_id = s.Previous_Match 
                           AND tr.Result = s.Result
)
Select Team_ID, Result, Max(Streak) Streak
From   Streaks
Group By Team_ID, Result
Order By Team_ID

SQLFiddle demo


Thanks for this answer. It's nice to see more than one solution to the problem and to be able to compare the performance of the two.
jamauss
Licensed under cc by-sa 3.0 with attribution required.