我正在HSQLDB
包含500 000项的表的服务器上执行一些测试。该表没有索引。有5000个不同的业务密钥。我需要他们的清单。自然地,我从DISTINCT
查询开始:
SELECT DISTINCT business_key FROM memory WHERE
concept <> 'case' or
attrib <> 'status' or
value <> 'closed'
大约需要90秒!!!
然后我尝试使用GROUP BY
:
SELECT business_key FROM memory WHERE
concept <> 'case' or
attrib <> 'status' or
value <> 'closed'
GROUP BY business_key
它需要1秒钟!!!
试图找出我运行的差异,EXLAIN PLAN FOR
但似乎为两个查询提供了相同的信息。
EXLAIN PLAN FOR DISTINCT ...
isAggregated=[false]
columns=[
COLUMN: PUBLIC.MEMORY.BUSINESS_KEY
]
[range variable 1
join type=INNER
table=MEMORY
alias=M
access=FULL SCAN
condition = [ index=SYS_IDX_SYS_PK_10057_10058
other condition=[
OR arg_left=[
OR arg_left=[
NOT_EQUAL arg_left=[
COLUMN: PUBLIC.MEMORY.CONCEPT] arg_right=[
VALUE = case, TYPE = CHARACTER]] arg_right=[
NOT_EQUAL arg_left=[
COLUMN: PUBLIC.MEMORY.ATTRIB] arg_right=[
VALUE = status, TYPE = CHARACTER]]] arg_right=[
NOT_EQUAL arg_left=[
COLUMN: PUBLIC.MEMORY.VALUE] arg_right=[
VALUE = closed, TYPE = CHARACTER]]]
]
]]
PARAMETERS=[]
SUBQUERIES[]
Object References
PUBLIC.MEMORY
PUBLIC.MEMORY.CONCEPT
PUBLIC.MEMORY.ATTRIB
PUBLIC.MEMORY.VALUE
PUBLIC.MEMORY.BUSINESS_KEY
Read Locks
PUBLIC.MEMORY
WriteLocks
EXLAIN PLAN FOR SELECT ... GROUP BY ...
isDistinctSelect=[false]
isGrouped=[true]
isAggregated=[false]
columns=[
COLUMN: PUBLIC.MEMORY.BUSINESS_KEY
]
[range variable 1
join type=INNER
table=MEMORY
alias=M
access=FULL SCAN
condition = [ index=SYS_IDX_SYS_PK_10057_10058
other condition=[
OR arg_left=[
OR arg_left=[
NOT_EQUAL arg_left=[
COLUMN: PUBLIC.MEMORY.CONCEPT] arg_right=[
VALUE = case, TYPE = CHARACTER]] arg_right=[
NOT_EQUAL arg_left=[
COLUMN: PUBLIC.MEMORY.ATTRIB] arg_right=[
VALUE = status, TYPE = CHARACTER]]] arg_right=[
NOT_EQUAL arg_left=[
COLUMN: PUBLIC.MEMORY.VALUE] arg_right=[
VALUE = closed, TYPE = CHARACTER]]]
]
]]
groupColumns=[
COLUMN: PUBLIC.MEMORY.BUSINESS_KEY]
PARAMETERS=[]
SUBQUERIES[]
Object References
PUBLIC.MEMORY
PUBLIC.MEMORY.CONCEPT
PUBLIC.MEMORY.ATTRIB
PUBLIC.MEMORY.VALUE
PUBLIC.MEMORY.BUSINESS_KEY
Read Locks
PUBLIC.MEMORY
WriteLocks
编辑:我做了其他测试。拥有500 000条HSQLDB
具有不同业务键的记录,DISTINCT
现在的性能更好-3秒,GROUP BY
而花费了大约9秒。
在MySQL
两个查询中,预执行都相同:
MySQL:500 000行-5000个不同的业务密钥:两个查询:0.5秒MySQL:500000行-所有不同的业务密钥:
SELECT DISTINCT ...
-11秒
SELECT ... GROUP BY business_key
-13秒
因此,问题仅与HSQLDB
。
如果有人能解释为什么会有如此巨大的差异,我将不胜感激。
EXPLAIN PLAN
AND的结果,并在运行DISTINCT
后尝试运行查询,GROUP BY
以查看是否某些缓存使时间错了……