Let's start by comparing the execution plans:
tinker=> EXPLAIN ANALYZE SELECT * FROM generate_series(1,1e7);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Function Scan on generate_series (cost=0.00..10.00 rows=1000 width=32) (actual time=2382.582..4291.136 rows=10000000 loops=1)
Planning time: 0.022 ms
Execution time: 5539.522 ms
(3 rows)
tinker=> EXPLAIN ANALYZE SELECT generate_series(1,1e7);
QUERY PLAN
-------------------------------------------------------------------------------------------------
Result (cost=0.00..5.01 rows=1000 width=0) (actual time=0.008..2622.365 rows=10000000 loops=1)
Planning time: 0.045 ms
Execution time: 3858.661 ms
(3 rows)
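OK, so now we know that SELECT * FROM generate_series() is executed using a Function Scan node, while SELECT generate_series() is executed using a Result node. Whatever makes these queries perform differently comes down to the difference between these two nodes, and we know exactly where to look.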
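One more interesting thing in the EXPLAIN ANALYZE output: look at the timings. SELECT generate_series() shows actual time=0.008..2622.365, while SELECT * FROM generate_series() shows actual time=2382.582..4291.136. The Function Scan node starts returning records at around the time the Result node finishes returning them.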
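To make the comparison concrete, here is a small Python sketch that pulls the two numbers out of each plan line. In EXPLAIN ANALYZE output the first number of actual time is the time until the node returns its first row and the second is the time until it returns its last, both in milliseconds. The plan lines are copied from the output above; the helper name actual_times is made up for this illustration.

```python
import re

# Plan lines copied from the EXPLAIN ANALYZE output above.
PLAN_LINES = {
    "Function Scan": "Function Scan on generate_series (cost=0.00..10.00 rows=1000 width=32) "
                     "(actual time=2382.582..4291.136 rows=10000000 loops=1)",
    "Result": "Result (cost=0.00..5.01 rows=1000 width=0) "
              "(actual time=0.008..2622.365 rows=10000000 loops=1)",
}

# Matches "actual time=<startup>..<total>" and captures both numbers.
ACTUAL_TIME = re.compile(r"actual time=(\d+(?:\.\d+)?)\.\.(\d+(?:\.\d+)?)")


def actual_times(plan_line):
    """Return (startup_ms, total_ms) from one EXPLAIN ANALYZE node line."""
    match = ACTUAL_TIME.search(plan_line)
    if match is None:
        raise ValueError("no 'actual time=' field in line")
    return float(match.group(1)), float(match.group(2))


for node, line in PLAN_LINES.items():
    startup, total = actual_times(line)
    print(f"{node}: first row at {startup} ms, last row at {total} ms "
          f"({startup / total:.0%} of the node's time passes before the first row)")
```

The share of time spent before the first row is the number to watch: it is negligible for Result and more than half for Function Scan.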
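What was PostgreSQL doing between t=0 and t=2382 in the Function Scan plan? That is evidently about how long it takes to run generate_series(), so I'd bet that is exactly what it was doing. The answer starts to take shape: it seems that Result returns results immediately, while Function Scan materializes the results and then scans them.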
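The hypothesis can be mimicked outside the database. The following Python sketch is only an analogy, not PostgreSQL's code: result_style hands each row to the caller as it is produced, while function_scan_style stores every row first and only then starts returning them. All function names here are invented for the illustration.

```python
import time


def generate_series(start, stop):
    """Stand-in for the SQL function: yields values one at a time."""
    for value in range(start, stop + 1):
        yield value


def result_style(rows):
    """Streaming: each row reaches the caller as soon as it is produced."""
    yield from rows


def function_scan_style(rows):
    """Materializing: store every row first, then scan the stored rows."""
    stored = list(rows)  # phase 1: run the source to completion
    yield from stored    # phase 2: scan what was stored


def time_to_first_and_last(rows):
    """Return (seconds until first row, seconds until last row, row count)."""
    begin = time.perf_counter()
    first = None
    count = 0
    for _ in rows:
        if first is None:
            first = time.perf_counter() - begin
        count += 1
    return first, time.perf_counter() - begin, count


N = 200_000
for name, strategy in [("Result-like", result_style),
                       ("Function Scan-like", function_scan_style)]:
    first, last, count = time_to_first_and_last(strategy(generate_series(1, N)))
    print(f"{name}: first row after {first * 1000:.3f} ms, "
          f"all {count} rows after {last * 1000:.3f} ms")
```

On a run of this sketch the materializing version should show a much later first row, which is the same shape as the startup times in the two plans.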
With EXPLAIN out of the way, let's look at the implementation. The Result node lives in nodeResult.c, whose header comment says:
* DESCRIPTION
*
* Result nodes are used in queries where no relations are scanned.
The code is simple.
Function Scan lives in nodeFunctionScan.c, and it does indeed appear to use a two-phase execution strategy:
/*
* If first time through, read all tuples from function and put them
* in a tuplestore. Subsequent calls just fetch tuples from
* tuplestore.
*/
For clarity, let's see what a tuplestore is:
* tuplestore.h
* Generalized routines for temporary tuple storage.
*
* This module handles temporary storage of tuples for purposes such
* as Materialize nodes, hashjoin batch files, etc. It is essentially
* a dumbed-down version of tuplesort.c; it does no sorting of tuples
* but can only store and regurgitate a sequence of tuples. However,
* because no sort is required, it is allowed to start reading the sequence
* before it has all been written. This is particularly useful for cursors,
* because it allows random access within the already-scanned portion of
* a query without having to process the underlying scan to completion.
* Also, it is possible to support multiple independent read pointers.
*
* A temporary file is used to handle the data if it exceeds the
* space limit specified by the caller.
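Hypothesis confirmed. Function Scan runs the function up front and materializes its results, and for large result sets those results spill to disk. Result doesn't materialize anything, but it only supports trivial operations.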
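To illustrate the spill-to-disk behaviour described in the tuplestore.h comment, here is a toy Python store that keeps rows in memory until a limit is reached and then moves everything to a temporary file. It is a rough sketch under simplifying assumptions: the limit is a row count rather than a memory budget, all writes happen before reading starts, and there is a single read pass, whereas the real tuplestore allows reading before writing has finished and supports multiple read pointers. The class name SpillingTupleStore and its methods are invented for this example.

```python
import os
import pickle
import tempfile


class SpillingTupleStore:
    """Toy write-then-read row store that spills to a temp file past a limit."""

    def __init__(self, max_rows_in_memory):
        self.max_rows_in_memory = max_rows_in_memory
        self.rows = []          # rows held in memory while under the limit
        self.spill_file = None  # temporary file, created once the limit is hit
        self.count = 0          # total rows stored so far

    def put(self, row):
        if self.spill_file is None and len(self.rows) >= self.max_rows_in_memory:
            # Limit reached: move everything stored so far to a temp file.
            self.spill_file = tempfile.TemporaryFile()
            for stored in self.rows:
                pickle.dump(stored, self.spill_file)
            self.rows = []
        if self.spill_file is not None:
            self.spill_file.seek(0, os.SEEK_END)  # always append at the end
            pickle.dump(row, self.spill_file)
        else:
            self.rows.append(row)
        self.count += 1

    def scan(self):
        """Yield every stored row in insertion order."""
        if self.spill_file is None:
            yield from self.rows
            return
        self.spill_file.seek(0)
        for _ in range(self.count):
            yield pickle.load(self.spill_file)


store = SpillingTupleStore(max_rows_in_memory=3)
for value in range(1, 11):
    store.put((value,))
print("spilled to disk:", store.spill_file is not None)
print("rows read back:", sum(1 for _ in store.scan()))
```

The caller sees the same sequence of rows either way; only where they are held changes, which is why a large generate_series() result in a Function Scan can turn into disk I/O.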