创建和填充数字表的最佳方法是什么?


71

我已经看到了许多创建和填充数字表的方法。但是,创建和填充的最佳方法是什么?从“最重要”到“最不重要”定义“最佳”:

  • 用最佳索引创建的表
  • 行生成最快
  • 用于创建和填充的简单代码

如果您不知道数字表是什么,请看这里:为什么要考虑使用辅助数字表?


2
事实证明,这几乎是stackoverflow.com/questions/10819/…的副本,据我所知
Tao

到目前为止,最好的方法是对不需要物理分配的虚拟表的内置实现进行投票。目前可以在这里完成:https
Louis Somers

1
@LouisSomers,我喜欢这种方法。但是,如果您需要程序长时间运行才能添加该功能,则需要诉诸于构建自己的程序。
KM。

1
@KM哈哈,这是真的,我在同一条船上,我只是想筹集更多选票的功能,我觉得比Management Studio中黑暗的主题更重要......
路易·萨默斯

Answers:


136

这里是一些从网上获取的代码示例,以及对该问题的解答。

对于“每种方法”,我已经修改了原始代码,因此每种代码都使用相同的表和列:NumbersTest和Number,具有10,000行或尽可能接近的行。另外,我提供了到原产地的链接。

方法1这是一种非常慢的循环方法,从这里开始,
平均13.01秒
运行了3次最高移除,这是几秒钟的时间:12.42,13.60

DROP TABLE NumbersTest
DECLARE @RunDate datetime
SET @RunDate=GETDATE()
CREATE TABLE NumbersTest(Number INT IDENTITY(1,1)) 
SET NOCOUNT ON
WHILE COALESCE(SCOPE_IDENTITY(), 0) < 100000
BEGIN 
    INSERT dbo.NumbersTest DEFAULT VALUES 
END
SET NOCOUNT OFF
-- Add a primary key/clustered index to the numbers table
ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number)
PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE())/1000.0)+' seconds'
SELECT COUNT(*) FROM NumbersTest

方法2这里循环的速度要快得多,
平均1.1658秒
,而去掉最高11次,这是几秒钟的时间:1.117、1.140、1.203、1.170、1.173、1.156、1.203、1.153、1.173、1.170

DROP TABLE NumbersTest
DECLARE @RunDate datetime
SET @RunDate=GETDATE()
CREATE TABLE NumbersTest (Number INT NOT NULL);
DECLARE @i INT;
SELECT @i = 1;
SET NOCOUNT ON
WHILE @i <= 10000
BEGIN
    INSERT INTO dbo.NumbersTest(Number) VALUES (@i);
    SELECT @i = @i + 1;
END;
SET NOCOUNT OFF
ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number)
PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE())/1000.0)+' seconds'
SELECT COUNT(*) FROM NumbersTest

方法3这是基于代码从一个单一的INSERT这里
平均488.6毫秒
跑了11次来除去最高的,这里以毫秒为单位的时间:686,673,623,686,343,343,376,360,343,453

DROP TABLE NumbersTest
DECLARE @RunDate datetime
SET @RunDate=GETDATE()
CREATE TABLE NumbersTest (Number  int  not null)  
;WITH Nums(Number) AS
(SELECT 1 AS Number
 UNION ALL
 SELECT Number+1 FROM Nums where Number<10000
)
insert into NumbersTest(Number)
    select Number from Nums option(maxrecursion 10000)
ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number)
PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE()))+' milliseconds'
SELECT COUNT(*) FROM NumbersTest

方法4这里是“半循环”的方法从这里 平均348.3毫秒(这是很难获得,因为在代码中间的“走出去”的好时机,有什么建议,将不胜感激)
跑了11次来除去最高的,在这里以毫秒为单位的时间:356、360、283、346、360、376、326、373、330、373

DROP TABLE NumbersTest
DROP TABLE #RunDate
CREATE TABLE #RunDate (RunDate datetime)
INSERT INTO #RunDate VALUES(GETDATE())
CREATE TABLE NumbersTest (Number int NOT NULL);
INSERT NumbersTest values (1);
GO --required
INSERT NumbersTest SELECT Number + (SELECT COUNT(*) FROM NumbersTest) FROM NumbersTest
GO 14 --will create 16384 total rows
ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number)
SELECT CONVERT(varchar(20),datediff(ms,RunDate,GETDATE()))+' milliseconds' FROM #RunDate
SELECT COUNT(*) FROM NumbersTest

方法5这是Philip Kelley的回答中的一次INSERT,
平均92.7毫秒
最高去除了11次,这里是毫秒数:80、96、96、93、110、110、110、80、76、93、93

DROP TABLE NumbersTest
DECLARE @RunDate datetime
SET @RunDate=GETDATE()
CREATE TABLE NumbersTest (Number  int  not null)  
;WITH
  Pass0 as (select 1 as C union all select 1), --2 rows
  Pass1 as (select 1 as C from Pass0 as A, Pass0 as B),--4 rows
  Pass2 as (select 1 as C from Pass1 as A, Pass1 as B),--16 rows
  Pass3 as (select 1 as C from Pass2 as A, Pass2 as B),--256 rows
  Pass4 as (select 1 as C from Pass3 as A, Pass3 as B),--65536 rows
  --I removed Pass5, since I'm only populating the Numbers table to 10,000
  Tally as (select row_number() over(order by C) as Number from Pass4)
INSERT NumbersTest
        (Number)
    SELECT Number
        FROM Tally
        WHERE Number <= 10000
ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number)
PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE()))+' milliseconds'
SELECT COUNT(*) FROM NumbersTest

方法6这是来自的单个INSERTMladen Prajdic
平均82.3毫秒
最高运行了11次,这是毫秒级的时间:80、80、93、76、93、63、93、76、93、76

DROP TABLE NumbersTest
DECLARE @RunDate datetime
SET @RunDate=GETDATE()
CREATE TABLE NumbersTest (Number  int  not null)  
INSERT INTO NumbersTest(Number)
SELECT TOP 10000 row_number() over(order by t1.number) as N
FROM master..spt_values t1 
    CROSS JOIN master..spt_values t2
ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number);
PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE()))+' milliseconds'
SELECT COUNT(*) FROM NumbersTest

方法7在这里是基于从代码的单个INSERT这里
平均56.3毫秒
跑11次来除去最高,这里有以毫秒为单位的时间:63,50,63,46,60,63,63,46,63,46

DROP TABLE NumbersTest
DECLARE @RunDate datetime
SET @RunDate=GETDATE()
SELECT TOP 10000 IDENTITY(int,1,1) AS Number
    INTO NumbersTest
    FROM sys.objects s1       --use sys.columns if you don't get enough rows returned to generate all the numbers you need
    CROSS JOIN sys.objects s2 --use sys.columns if you don't get enough rows returned to generate all the numbers you need
ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number)
PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE()))+' milliseconds'
SELECT COUNT(*) FROM NumbersTest

看完所有这些方法后,我真的很喜欢方法7,这是最快的方法,代码也很简单。


几年后看到了这个帖子。我对计时一百万行或更多感兴趣。我可能有一天会尝试,但是10000可能是合理需要的数量。
菲利普·凯利

14
虽然很有趣,但时间安排对我而言似乎并不那么重要。特别是因为如果我需要数字表,那么我将创建一次并重复使用它。
本图尔

非常感谢!我知道这很旧,但是对于那些登陆这里的人,我建议创建一个100,000的Numbers表,以便您可以将其与日期结合使用。
BJury

1
方法7创建了一个包含9604行的表。
kerem 2014年

1
@Dave,HA,有一条评论说,他们在回答四年后就收到了9604行!回答六年后,您说它给出了随机结果。随机结果表示您获得随机值。如果sys.objects中的行很少,您将始终获得从1开始的连续整数值,可能小于10,000。我在一个新的数据库(sys.objects中有76行)上尝试了方法7,它可以创建5,776行(76 * 76)。如果添加CROSS JOIN sys.objects s3上一条注释中的建议,您将获得438,976行(76 * 76 * 76)。
KM。

60

我用这个快得快

insert into Numbers(N)
select top 1000000 row_number() over(order by t1.number) as N
from   master..spt_values t1 
       cross join master..spt_values t2

请注意,Azure SQL数据库不支持此功能。
robyaw

20

如果您只是在SQL Server Management Studio或中执行此操作sqlcmd.exe,则可以使用以下事实:批处理分隔符允许您重复批处理:

CREATE TABLE Number (N INT IDENTITY(1,1) PRIMARY KEY NOT NULL);
GO

INSERT INTO Number DEFAULT VALUES;
GO 100000

这会将100000条记录插入到 Numbers使用下一个标识的默认值表中。

太慢了 它与@KM。答案中的方法1相比,后者是最慢的示例。但是,它就像获得代码一样轻巧。您可以通过在插入批处理之后添加主键约束来加快速度。


@培根位,我只能插入一个或多个特定列吗?
方位角

@Azimuth您可以使用此方法,只要您可以编写单个INSERT语句即可,该语句在重复执行时将为每一行创建数据。批处理转发器所做的所有工作都是告诉客户端(SSMS或sqlcmd.exe)将完全相同的查询重复N次。您可以通过多种方式利用T-SQL进行操作,但是我怀疑它很快就会成为除代码轻之外的任何东西。
培根片

14

我从以下模板开始,该模板是从Itzik Ben-Gan例程的多次印刷中获得的:

;WITH
  Pass0 as (select 1 as C union all select 1), --2 rows
  Pass1 as (select 1 as C from Pass0 as A, Pass0 as B),--4 rows
  Pass2 as (select 1 as C from Pass1 as A, Pass1 as B),--16 rows
  Pass3 as (select 1 as C from Pass2 as A, Pass2 as B),--256 rows
  Pass4 as (select 1 as C from Pass3 as A, Pass3 as B),--65536 rows
  Pass5 as (select 1 as C from Pass4 as A, Pass4 as B),--4,294,967,296 rows
  Tally as (select row_number() over(order by C) as Number from Pass5)
 select Number from Tally where Number <= 1000000

“ WHERE N <= 1000000”子句将输出限制为1到1百万,可以很容易地将其调整到所需的范围。

由于这是一个WITH子句,因此可以将其插入INSERT ... SELECT ...中,如下所示:

--  Sample use: create one million rows
CREATE TABLE dbo.Example (ExampleId  int  not null)  

DECLARE @RowsToCreate int
SET @RowsToCreate = 1000000

--  "Table of numbers" data generator, as per Itzik Ben-Gan (from multiple sources)
;WITH
  Pass0 as (select 1 as C union all select 1), --2 rows
  Pass1 as (select 1 as C from Pass0 as A, Pass0 as B),--4 rows
  Pass2 as (select 1 as C from Pass1 as A, Pass1 as B),--16 rows
  Pass3 as (select 1 as C from Pass2 as A, Pass2 as B),--256 rows
  Pass4 as (select 1 as C from Pass3 as A, Pass3 as B),--65536 rows
  Pass5 as (select 1 as C from Pass4 as A, Pass4 as B),--4,294,967,296 rows
  Tally as (select row_number() over(order by C) as Number from Pass5)
INSERT Example (ExampleId)
 select Number
  from Tally
  where Number <= @RowsToCreate

建表后为表建立索引将是对其进行索引的最快方法。

哦,我将其称为“ Tally”表。我认为这是一个常用术语,您可以通过谷歌搜索找到许多技巧和示例。


4

对于正在寻找Azure解决方案的任何人

SET NOCOUNT ON    
CREATE TABLE Numbers (n bigint PRIMARY KEY)    
GO    
DECLARE @numbers table(number int);  
WITH numbers(number) as  (   
SELECT 1 AS number   
UNION all   
SELECT number+1 FROM numbers WHERE number<10000  
)  
INSERT INTO @numbers(number)  
SELECT number FROM numbers OPTION(maxrecursion 10000)
INSERT INTO Numbers(n)  SELECT number FROM @numbers

源自sql azure团队博客 http://azure.microsoft.com/blog/2010/09/16/create-a-numbers-table-in-sql-azure/


4

这是一个简短而快速的内存中解决方案,我利用SQL Server 2008中引入的表值构造函数想到的:

它将返回1,000,000行,但是您可以添加/删除CROSS JOIN,也可以使用TOP子句进行修改。

;WITH v AS (SELECT * FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) v(z))

SELECT N FROM (SELECT ROW_NUMBER() OVER (ORDER BY v1.z)-1 N FROM v v1 
    CROSS JOIN v v2 CROSS JOIN v v3 CROSS JOIN v v4 CROSS JOIN v v5 CROSS JOIN v v6) Nums

请注意,这可以快速进行计算,或者(甚至更好)存储在永久表中(只需INTOSELECT N段后添加一个子句),并在N字段上使用主键以提高效率。


如果您想要即时数字表,我喜欢这个想法。当您使用它生成实际表时,它比其他方法慢。
KM。

1
@KM。我只是在安装程序上进行了测试,只花了不到一秒钟的时间。但是,假设我们花了10秒,而另一个花了1秒(用于建立永久表)。IMO,这仍是微不足道的考虑,您将只需要设置一次永久表。其他因素,例如代码详细程度,对我来说更重要。1分钟1秒?那会有所不同,但是我的查询并不是那么慢。
iliketocode

3

我知道这个线程很旧并且可以回答,但是有一种方法可以从方法7中挤出一些额外的性能:

代替此方法(本质上是方法7,但要易于使用):

DECLARE @BIT AS BIT = 0
IF OBJECT_ID('tempdb..#TALLY') IS NOT NULL
  DROP TABLE #TALLY
DECLARE @RunDate datetime
SET @RunDate=GETDATE()
SELECT TOP 10000 IDENTITY(int,1,1) AS Number
    INTO #TALLY
    FROM sys.objects s1       --use sys.columns if you don't get enough rows returned to generate all the numbers you need
    CROSS JOIN sys.objects s2 --use sys.co
ALTER TABLE #TALLY ADD PRIMARY KEY(Number)
PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE()))+' milliseconds'

尝试这个:

DECLARE @BIT AS BIT = 0
IF OBJECT_ID('tempdb..#TALLY') IS NOT NULL
  DROP TABLE #TALLY
DECLARE @RunDate datetime
SET @RunDate=GETDATE()
SELECT TOP 10000 IDENTITY(int,1,1) AS Number
    INTO #TALLY
    FROM        (SELECT @BIT [X] UNION ALL SELECT @BIT) [T2]
    CROSS JOIN  (SELECT @BIT [X] UNION ALL SELECT @BIT) [T4]
    CROSS JOIN  (SELECT @BIT [X] UNION ALL SELECT @BIT) [T8]
    CROSS JOIN  (SELECT @BIT [X] UNION ALL SELECT @BIT) [T16]
    CROSS JOIN  (SELECT @BIT [X] UNION ALL SELECT @BIT) [T32]
    CROSS JOIN  (SELECT @BIT [X] UNION ALL SELECT @BIT) [T64]
    CROSS JOIN  (SELECT @BIT [X] UNION ALL SELECT @BIT) [T128]
    CROSS JOIN  (SELECT @BIT [X] UNION ALL SELECT @BIT) [T256]
    CROSS JOIN  (SELECT @BIT [X] UNION ALL SELECT @BIT) [T512]
    CROSS JOIN  (SELECT @BIT [X] UNION ALL SELECT @BIT) [T1024]
    CROSS JOIN  (SELECT @BIT [X] UNION ALL SELECT @BIT) [T2048]
    CROSS JOIN  (SELECT @BIT [X] UNION ALL SELECT @BIT) [T4096]
    CROSS JOIN  (SELECT @BIT [X] UNION ALL SELECT @BIT) [T8192]
    CROSS JOIN  (SELECT @BIT [X] UNION ALL SELECT @BIT) [T16384]
ALTER TABLE #TALLY ADD PRIMARY KEY(Number)
PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE()))+' milliseconds'

在我的服务器上,这需要10毫秒,而从sys.objects中进行选择需要16 -20毫秒。它还具有不依赖于sys.objects中有多少个对象的额外好处。虽然很安全,但从技术上讲它是一个依赖项,而另一个依赖项的运行速度更快。我认为,如果您进行更改,那么提速将取决于使用BIT:

DECLARE @BIT AS BIT = 0

至:

DECLARE @BIT AS BIGINT = 0

这将使我的服务器上的总时间增加〜8-10毫秒。就是说,当您扩展到1,000,000条记录时,BIT vs BIGINT不再明显影响我的查询,但是它仍然运行约680毫秒,而sys.objects约为730毫秒。


2

我使用数字表主要是在BIRT中伪造报表,而不必费心地动态创建记录集。

我对日期也做同样的事情,其表的范围从过去的10年到将来的10年(以及每天的几小时,以获取更详细的报告)。能够获取所有日期的值是一个很巧妙的技巧,即使您的“真实”数据表没有它们的数据也是如此。

我有一个脚本来创建这些脚本,就像(来自内存):

drop table numbers; commit;
create table numbers (n integer primary key); commit;
insert into numbers values (0); commit;
insert into numbers select n+1 from numbers; commit;
insert into numbers select n+2 from numbers; commit;
insert into numbers select n+4 from numbers; commit;
insert into numbers select n+8 from numbers; commit;
insert into numbers select n+16 from numbers; commit;
insert into numbers select n+32 from numbers; commit;
insert into numbers select n+64 from numbers; commit;

每行的行数加倍,因此生成真正的大表不需要很多。

我不确定我是否同意您的观点,因为您只需创建一次就可以快速创建。它的成本在所有访问中摊销,从而使时间变得微不足道。


每个提交;结果为Msg 3902,级别16,状态1,行1 COMMIT TRANSACTION请求没有相应的BEGIN TRANSACTION。
KM。

@KM,第一点很容易通过开始事务来解决(DB / 2,我选择的DBMS,通常配置为自动启动事务)。而且,如果需要更多行,则只需添加更多插入。每个范围都会加倍,因此如果需要,可以很容易地获得大量数字。我还更愿意在可能的情况下提供通用的SQL解决方案,而不是将解决方案限制于特定的供应商。
paxdiablo

2

这是几个其他方法:
方法1

IF OBJECT_ID('dbo.Numbers', 'U') IS NOT NULL
    DROP TABLE dbo.Numbers
GO

CREATE TABLE Numbers (Number int NOT NULL PRIMARY KEY);
GO

DECLARE @i int = 1;
INSERT INTO dbo.Numbers (Number) 
VALUES (1),(2);

WHILE 2*@i < 1048576
BEGIN
    INSERT INTO dbo.Numbers (Number) 
    SELECT Number + 2*@i
    FROM dbo.Numbers;
    SET @i = @@ROWCOUNT;
END
GO

SELECT COUNT(*) FROM Numbers AS RowCownt --1048576 rows

方法2

IF OBJECT_ID('dbo.Numbers', 'U') IS NOT NULL
    DROP TABLE dbo.Numbers
GO

CREATE TABLE dbo.Numbers (Number int NOT NULL PRIMARY KEY);
GO

DECLARE @i INT = 0; 
INSERT INTO dbo.Numbers (Number) 
VALUES (1);

WHILE @i <= 9
BEGIN
    INSERT INTO dbo.Numbers (Number)
    SELECT N.Number + POWER(4, @i) * D.Digit 
    FROM dbo.Numbers AS N
        CROSS JOIN (VALUES(1),(2),(3)) AS D(Digit)
    ORDER BY D.Digit, N.Number
    SET @i = @i + 1;
END
GO

SELECT COUNT(*) FROM dbo.Numbers AS RowCownt --1048576 rows

方法3

IF OBJECT_ID('dbo.Numbers', 'U') IS NOT NULL
    DROP TABLE dbo.Numbers
GO

CREATE TABLE Numbers (Number int identity NOT NULL PRIMARY KEY, T bit NULL);

WITH
    T1(T) AS (SELECT T FROM (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) AS T(T)) --10 rows
   ,T2(T) AS (SELECT A.T FROM T1 AS A CROSS JOIN T1 AS B CROSS JOIN T1 AS C) --1,000 rows
   ,T3(T) AS (SELECT A.T FROM T2 AS A CROSS JOIN T2 AS B CROSS JOIN T2 AS C) --1,000,000,000 rows

INSERT INTO dbo.Numbers(T)
SELECT TOP (1048576) NULL
FROM T3;

ALTER TABLE Numbers
    DROP COLUMN T; 
GO

SELECT COUNT(*) FROM dbo.Numbers AS RowCownt --1048576 rows

方法4,摘自Alex Kuznetsov的“防御性数据库编程”

IF OBJECT_ID('dbo.Numbers', 'U') IS NOT NULL
    DROP TABLE dbo.Numbers
GO

CREATE TABLE Numbers (Number int NOT NULL PRIMARY KEY);
GO

DECLARE @i INT = 1 ; 
INSERT INTO dbo.Numbers (Number) 
VALUES (1);

WHILE @i < 524289 --1048576
BEGIN; 
    INSERT INTO dbo.Numbers (Number) 
    SELECT Number + @i 
    FROM dbo.Numbers; 
    SET @i = @i * 2 ; 
END
GO

SELECT COUNT(*) FROM dbo.Numbers AS RowCownt --1048576 rows

方法5,摘自Erland Sommarskog的SQL Server 2005和“超越”一文中的数组和列表

IF OBJECT_ID('dbo.Numbers', 'U') IS NOT NULL
    DROP TABLE dbo.Numbers
GO

CREATE TABLE Numbers (Number int NOT NULL PRIMARY KEY);
GO

WITH digits (d) AS (
   SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL
   SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL
   SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9 UNION ALL
   SELECT 0)
INSERT INTO Numbers (Number)
   SELECT Number
   FROM   (SELECT i.d + ii.d * 10 + iii.d * 100 + iv.d * 1000 +
                  v.d * 10000 + vi.d * 100000 AS Number
           FROM   digits i
           CROSS  JOIN digits ii
           CROSS  JOIN digits iii
           CROSS  JOIN digits iv
           CROSS  JOIN digits v
           CROSS  JOIN digits vi) AS Numbers
   WHERE  Number > 0
GO

SELECT COUNT(*) FROM dbo.Numbers AS RowCownt --999999 rows

简介:
在这5种方法中,方法3似乎是最快的。


1

一些建议的方法基于系统对象(例如,基于“ sys.objects”)。他们假设这些系统对象包含足够的记录来生成我们的数字。

我不会基于任何不属于我的应用程序并且无法完全控制的内容。例如:这些sys表的内容可能会更改,这些表在新版本的SQL等中可能不再有效。

作为解决方案,我们可以创建带有记录的自己的表。然后,我们使用那个代替这些与系统相关的对象(如果事先知道范围,则带有所有数字的表应该很好,否则我们可以选择一个对象进行交叉联接)。

基于CTE的解决方案运行良好,但是它与嵌套循环有关。


0

这是对已接受答案的重新包装-但以一种让您自己比较它们的方式-比较了前3种算法(并且注释说明了为什么排除其他方法),并且您可以针对自己的设置进行测试看看它们各自如何以所需的序列大小执行。

SET NOCOUNT ON;

--
-- Set the count of numbers that you want in your sequence ...
--
DECLARE @NumberOfNumbers int = 10000000;
--
--  Some notes on choosing a useful length for your sequence ...
--      For a sequence of  100 numbers -- winner depends on preference of min/max/avg runtime ... (I prefer PhilKelley algo here - edit the algo so RowSet2 is max RowSet CTE)
--      For a sequence of   1k numbers -- winner depends on preference of min/max/avg runtime ... (Sadly PhilKelley algo is generally lowest ranked in this bucket, but could be tweaked to perform better)
--      For a sequence of  10k numbers -- a clear winner emerges for this bucket
--      For a sequence of 100k numbers -- do not test any looping methods at this size or above ...
--                                        the previous winner fails, a different method is need to guarantee the full sequence desired
--      For a sequence of  1MM numbers -- the statistics aren't changing much between the algorithms - choose one based on your own goals or tweaks
--      For a sequence of 10MM numbers -- only one of the methods yields the desired sequence, and the numbers are much closer than for smaller sequences

DECLARE @TestIteration int = 0;
DECLARE @MaxIterations int = 10;
DECLARE @MethodName varchar(128);

-- SQL SERVER 2017 Syntax/Support needed
DROP TABLE IF EXISTS #TimingTest
CREATE TABLE #TimingTest (MethodName varchar(128), TestIteration int, StartDate DateTime2, EndDate DateTime2, ElapsedTime decimal(38,0), ItemCount decimal(38,0), MaxNumber decimal(38,0), MinNumber decimal(38,0))

--
--  Conduct the test ...
--
WHILE @TestIteration < @MaxIterations
BEGIN
    -- Be sure that the test moves forward
    SET @TestIteration += 1;

/*  -- This method has been removed, as it is BY FAR, the slowest method
    -- This test shows that, looping should be avoided, likely at all costs, if one places a value / premium on speed of execution ...

    --
    -- METHOD - Fast looping
    --

    -- Prep for the test
    DROP TABLE IF EXISTS [Numbers].[Test];
    CREATE TABLE [Numbers].[Test] (Number INT NOT NULL);

    -- Method information
    SET @MethodName = 'FastLoop';

    -- Record the start of the test
    INSERT INTO #TimingTest(MethodName, TestIteration, StartDate)
    SELECT @MethodName, @TestIteration, GETDATE()

    -- Run the algorithm
    DECLARE @i INT = 1;
    WHILE @i <= @NumberOfNumbers
    BEGIN
        INSERT INTO [Numbers].[Test](Number) VALUES (@i);
        SELECT @i = @i + 1;
    END;

    ALTER TABLE [Numbers].[Test] ADD CONSTRAINT PK_Numbers_Test_Number PRIMARY KEY CLUSTERED (Number)

    -- Record the end of the test
    UPDATE tt
        SET 
            EndDate = GETDATE()
    FROM #TimingTest tt
    WHERE tt.MethodName = @MethodName
    and tt.TestIteration = @TestIteration

    -- And the stats about the numbers in the sequence
    UPDATE tt
        SET 
            ItemCount = results.ItemCount,
            MaxNumber = results.MaxNumber,
            MinNumber = results.MinNumber
    FROM #TimingTest tt
    CROSS JOIN (
        SELECT COUNT(Number) as ItemCount, MAX(Number) as MaxNumber, MIN(Number) as MinNumber FROM [Numbers].[Test]
    ) results
    WHERE tt.MethodName = @MethodName
    and tt.TestIteration = @TestIteration
*/

/*  -- This method requires GO statements, which would break the script, also - this answer does not appear to be the fastest *AND* seems to perform "magic"
    --
    -- METHOD - "Semi-Looping"
    --

    -- Prep for the test
    DROP TABLE IF EXISTS [Numbers].[Test];
    CREATE TABLE [Numbers].[Test] (Number INT NOT NULL);

    -- Method information
    SET @MethodName = 'SemiLoop';

    -- Record the start of the test
    INSERT INTO #TimingTest(MethodName, TestIteration, StartDate)
    SELECT @MethodName, @TestIteration, GETDATE()

    -- Run the algorithm 
    INSERT [Numbers].[Test] values (1);
--    GO --required

    INSERT [Numbers].[Test] SELECT Number + (SELECT COUNT(*) FROM [Numbers].[Test]) FROM [Numbers].[Test]
--    GO 14 --will create 16384 total rows

    ALTER TABLE [Numbers].[Test] ADD CONSTRAINT PK_Numbers_Test_Number PRIMARY KEY CLUSTERED (Number)

    -- Record the end of the test
    UPDATE tt
        SET 
            EndDate = GETDATE()
    FROM #TimingTest tt
    WHERE tt.MethodName = @MethodName
    and tt.TestIteration = @TestIteration

    -- And the stats about the numbers in the sequence
    UPDATE tt
        SET 
            ItemCount = results.ItemCount,
            MaxNumber = results.MaxNumber,
            MinNumber = results.MinNumber
    FROM #TimingTest tt
    CROSS JOIN (
        SELECT COUNT(Number) as ItemCount, MAX(Number) as MaxNumber, MIN(Number) as MinNumber FROM [Numbers].[Test]
    ) results
    WHERE tt.MethodName = @MethodName
    and tt.TestIteration = @TestIteration
*/
    --
    -- METHOD - Philip Kelley's algo 
    --          (needs tweaking to match the desired length of sequence in order to optimize its performance, relies more on the coder to properly tweak the algorithm)
    --

    -- Prep for the test
    DROP TABLE IF EXISTS [Numbers].[Test];
    CREATE TABLE [Numbers].[Test] (Number INT NOT NULL);

    -- Method information
    SET @MethodName = 'PhilKelley';

    -- Record the start of the test
    INSERT INTO #TimingTest(MethodName, TestIteration, StartDate)
    SELECT @MethodName, @TestIteration, GETDATE()

    -- Run the algorithm
    ; WITH
    RowSet0 as (select 1 as Item union all select 1),              --          2 rows   -- We only have to name the column in the first select, the second/union select inherits the column name
    RowSet1 as (select 1 as Item from RowSet0 as A, RowSet0 as B), --          4 rows
    RowSet2 as (select 1 as Item from RowSet1 as A, RowSet1 as B), --         16 rows
    RowSet3 as (select 1 as Item from RowSet2 as A, RowSet2 as B), --        256 rows
    RowSet4 as (select 1 as Item from RowSet3 as A, RowSet3 as B), --      65536 rows (65k)
    RowSet5 as (select 1 as Item from RowSet4 as A, RowSet4 as B), -- 4294967296 rows (4BB)
    -- Add more RowSetX to get higher and higher numbers of rows    
    -- Each successive RowSetX results in squaring the previously available number of rows
    Tally   as (select row_number() over (order by Item) as Number from RowSet5) -- This is what gives us the sequence of integers, always select from the terminal CTE expression
    -- Note: testing of this specific use case has shown that making Tally as a sub-query instead of a terminal CTE expression is slower (always) - be sure to follow this pattern closely for max performance
    INSERT INTO [Numbers].[Test] (Number)
    SELECT o.Number
    FROM Tally o
    WHERE o.Number <= @NumberOfNumbers

    ALTER TABLE [Numbers].[Test] ADD CONSTRAINT PK_Numbers_Test_Number PRIMARY KEY CLUSTERED (Number)

    -- Record the end of the test
    UPDATE tt
        SET 
            EndDate = GETDATE()
    FROM #TimingTest tt
    WHERE tt.MethodName = @MethodName
    and tt.TestIteration = @TestIteration

    -- And the stats about the numbers in the sequence
    UPDATE tt
        SET 
            ItemCount = results.ItemCount,
            MaxNumber = results.MaxNumber,
            MinNumber = results.MinNumber
    FROM #TimingTest tt
    CROSS JOIN (
        SELECT COUNT(Number) as ItemCount, MAX(Number) as MaxNumber, MIN(Number) as MinNumber FROM [Numbers].[Test]
    ) results
    WHERE tt.MethodName = @MethodName
    and tt.TestIteration = @TestIteration

    --
    -- METHOD - Mladen Prajdic answer
    --

    -- Prep for the test
    DROP TABLE IF EXISTS [Numbers].[Test];
    CREATE TABLE [Numbers].[Test] (Number INT NOT NULL);

    -- Method information
    SET @MethodName = 'MladenPrajdic';

    -- Record the start of the test
    INSERT INTO #TimingTest(MethodName, TestIteration, StartDate)
    SELECT @MethodName, @TestIteration, GETDATE()

    -- Run the algorithm
    INSERT INTO [Numbers].[Test](Number)
    SELECT TOP (@NumberOfNumbers) row_number() over(order by t1.number) as N
    FROM master..spt_values t1 
    CROSS JOIN master..spt_values t2

    ALTER TABLE [Numbers].[Test] ADD CONSTRAINT PK_Numbers_Test_Number PRIMARY KEY CLUSTERED (Number)

    -- Record the end of the test
    UPDATE tt
        SET 
            EndDate = GETDATE()
    FROM #TimingTest tt
    WHERE tt.MethodName = @MethodName
    and tt.TestIteration = @TestIteration

    -- And the stats about the numbers in the sequence
    UPDATE tt
        SET 
            ItemCount = results.ItemCount,
            MaxNumber = results.MaxNumber,
            MinNumber = results.MinNumber
    FROM #TimingTest tt
    CROSS JOIN (
        SELECT COUNT(Number) as ItemCount, MAX(Number) as MaxNumber, MIN(Number) as MinNumber FROM [Numbers].[Test]
    ) results
    WHERE tt.MethodName = @MethodName
    and tt.TestIteration = @TestIteration

    --
    -- METHOD - Single INSERT
    -- 

    -- Prep for the test
    DROP TABLE IF EXISTS [Numbers].[Test];
    -- The Table creation is part of this algorithm ...

    -- Method information
    SET @MethodName = 'SingleInsert';

    -- Record the start of the test
    INSERT INTO #TimingTest(MethodName, TestIteration, StartDate)
    SELECT @MethodName, @TestIteration, GETDATE()

    -- Run the algorithm
    SELECT TOP (@NumberOfNumbers) IDENTITY(int,1,1) AS Number
    INTO [Numbers].[Test]
    FROM sys.objects s1       -- use sys.columns if you don't get enough rows returned to generate all the numbers you need
    CROSS JOIN sys.objects s2 -- use sys.columns if you don't get enough rows returned to generate all the numbers you need

    ALTER TABLE [Numbers].[Test] ADD CONSTRAINT PK_Numbers_Test_Number PRIMARY KEY CLUSTERED (Number)

    -- Record the end of the test
    UPDATE tt
        SET 
            EndDate = GETDATE()
    FROM #TimingTest tt
    WHERE tt.MethodName = @MethodName
    and tt.TestIteration = @TestIteration

    -- And the stats about the numbers in the sequence
    UPDATE tt
        SET 
            ItemCount = results.ItemCount,
            MaxNumber = results.MaxNumber,
            MinNumber = results.MinNumber
    FROM #TimingTest tt
    CROSS JOIN (
        SELECT COUNT(Number) as ItemCount, MAX(Number) as MaxNumber, MIN(Number) as MinNumber FROM [Numbers].[Test]
    ) results
    WHERE tt.MethodName = @MethodName
    and tt.TestIteration = @TestIteration
END

-- Calculate the timespan for each of the runs
UPDATE tt
    SET
        ElapsedTime = DATEDIFF(MICROSECOND, StartDate, EndDate)
FROM #TimingTest tt

--
-- Report the results ...
--
SELECT 
    MethodName, AVG(ElapsedTime) / AVG(ItemCount) as TimePerRecord, CAST(AVG(ItemCount) as bigint) as SequenceLength,
    MAX(ElapsedTime) as MaxTime, MIN(ElapsedTime) as MinTime,
    MAX(MaxNumber) as MaxNumber, MIN(MinNumber) as MinNumber
FROM #TimingTest tt
GROUP by tt.MethodName
ORDER BY TimePerRecord ASC, MaxTime ASC, MinTime ASC
By using our site, you acknowledge that you have read and understand our Cookie Policy and Privacy Policy.
Licensed under cc by-sa 3.0 with attribution required.