How to do a PostgreSQL subquery in the select clause with a join in the from clause, like SQL Server?


81

I am trying to write the following query on PostgreSQL:

select name, author_id, count(1), 
    (select count(1)
    from names as n2
    where n2.id = n1.id
        and n2.author_id = n1.author_id
    )               
from names as n1
group by name, author_id

This would certainly work on Microsoft SQL Server, but it does not work at all on PostgreSQL. I read a bit of the documentation, and it seems I could rewrite it as:

select name, author_id, count(1), total                     
from names as n1, (select count(1) as total
    from names as n2
    where n2.id = n1.id
        and n2.author_id = n1.author_id
    ) as total
group by name, author_id

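But that returns the following error on PostgreSQL: "subquery in FROM cannot refer to other relations of same query level". So I'm stuck. Does anyone know how I can achieve this?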

Thanks


Actually, this seems like it should work on Postgres (maybe 6 years ago it didn't :))
qwertzguy
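
A minimal sketch of what that comment hints at, assuming the goal is a per-author total. The `total` and `count_1` aliases and the dropped `id` condition are assumptions, not part of the original question. PostgreSQL accepts a correlated subquery in the select list as long as it only references outer columns that appear in the GROUP BY:

```sql
-- Sketch only: counts rows per (name, author_id) plus a per-author total.
-- The subquery references n1.author_id, which is a GROUP BY column;
-- referencing an ungrouped column such as n1.id would be rejected.
select n1.name,
       n1.author_id,
       count(1) as count_1,
       (select count(1)
          from names as n2
         where n2.author_id = n1.author_id) as total  -- "total" is an assumed alias
  from names as n1
 group by n1.name, n1.author_id;
```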

Answers:


121

I'm not sure I understand your intent perfectly, but perhaps the following is close to what you want:

select n1.name, n1.author_id, count_1, total_count
  from (select id, name, author_id, count(1) as count_1
          from names
          group by id, name, author_id) n1
inner join (select id, author_id, count(1) as total_count
              from names
              group by id, author_id) n2
  on (n2.id = n1.id and n2.author_id = n1.author_id)

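Unfortunately this adds the requirement of grouping the first subquery by id as well as name and author_id, which I don't think was wanted. I'm not sure how to work around that, though, as you need to have id available to join to the second subquery. Perhaps someone else will come up with a better solution.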

Share and enjoy.
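
One possible way around the extra grouping by id, sketched with a window function rather than the join above. This swaps in a different technique and assumes the total is wanted per author_id; it has not been run against the asker's data:

```sql
-- Sketch: aggregate once per (name, author_id), then sum those counts
-- per author with a window function evaluated over the grouped rows.
select name,
       author_id,
       count(*) as count_1,
       sum(count(*)) over (partition by author_id) as total_count
  from names
 group by name, author_id;
```

Window functions are evaluated after GROUP BY, which is why `sum(count(*)) over (...)` is allowed here and no self-join is needed.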


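Perfect Bob, that really worked. Thanks a lot! I had to make a slight change, because I don't need the join on id, only on author_id. So the final query is: select n1.name, n1.author_id, count_1, total_count from (select id, name, author_id, count(1) as count_1 from names group by id, name, author_id) n1 inner join (select author_id, count(1) as total_count from names group by author_id) n2 on (n2.author_id = n1.author_id) Now, what I really want is to divide count_1 by total_count to get a normalized frequency. =D
Ricardo, 2010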

Oops, just realized the SQL isn't formatted correctly here. :( Will post an answer to complement this.
Ricardo

I'm not sure what Ricardo is talking 'bout, but this SQL totally solved my problem... :D Thank you!!!
tftd, 2011

15

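As a complement to the answers by @Bob Jarvis and @dmikam: Postgres does not produce a good plan when LATERAL is not used. In the simulation below, the query returns the same data in both cases, but the costs are very different.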

Table structure

CREATE TABLE ITEMS (
    N INTEGER NOT NULL,
    S TEXT NOT NULL
);

INSERT INTO ITEMS
  SELECT
    (random()*1000000)::integer AS n,
    md5(random()::text) AS s
  FROM
    generate_series(1,1000000);

CREATE INDEX N_INDEX ON ITEMS(N);

Running the JOIN with a GROUP BY subquery, without LATERAL

EXPLAIN 
SELECT 
    I.*
FROM ITEMS I
INNER JOIN (
    SELECT 
        COUNT(1), n
    FROM ITEMS
    GROUP BY N
) I2 ON I2.N = I.N
WHERE I.N IN (243477, 997947);

Result

Merge Join  (cost=0.87..637500.40 rows=23 width=37)
  Merge Cond: (i.n = items.n)
  ->  Index Scan using n_index on items i  (cost=0.43..101.28 rows=23 width=37)
        Index Cond: (n = ANY ('{243477,997947}'::integer[]))
  ->  GroupAggregate  (cost=0.43..626631.11 rows=861418 width=12)
        Group Key: items.n
        ->  Index Only Scan using n_index on items  (cost=0.43..593016.93 rows=10000000 width=4)

Using LATERAL

EXPLAIN 
SELECT 
    I.*
FROM ITEMS I
INNER JOIN LATERAL (
    SELECT 
        COUNT(1), n
    FROM ITEMS
    WHERE N = I.N
    GROUP BY N
) I2 ON 1=1 --I2.N = I.N
WHERE I.N IN (243477, 997947);

Result

Nested Loop  (cost=9.49..1319.97 rows=276 width=37)
  ->  Bitmap Heap Scan on items i  (cost=9.06..100.20 rows=23 width=37)
        Recheck Cond: (n = ANY ('{243477,997947}'::integer[]))
        ->  Bitmap Index Scan on n_index  (cost=0.00..9.05 rows=23 width=0)
              Index Cond: (n = ANY ('{243477,997947}'::integer[]))
  ->  GroupAggregate  (cost=0.43..52.79 rows=12 width=12)
        Group Key: items.n
        ->  Index Only Scan using n_index on items  (cost=0.43..52.64 rows=12 width=4)
              Index Cond: (n = i.n)

My Postgres version is PostgreSQL 10.3 (Debian 10.3-1.pgdg90+1)
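
The figures above are planner estimates. A sketch of how one might check measured run time instead, assuming the same ITEMS table: `EXPLAIN (ANALYZE, BUFFERS)` executes the query and reports actual timings and buffer usage. The two literal values come from the randomly generated data above, so they would need replacing with values that exist in your own table.

```sql
-- Sketch: ANALYZE actually runs the query, so the timings are measured,
-- not estimated. BUFFERS adds shared-buffer hit/read counts.
EXPLAIN (ANALYZE, BUFFERS)
SELECT
    I.*
FROM ITEMS I
INNER JOIN LATERAL (
    SELECT
        COUNT(1), n
    FROM ITEMS
    WHERE N = I.N
    GROUP BY N
) I2 ON 1=1
WHERE I.N IN (243477, 997947);  -- replace with values present in your data
```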


3
Thanks for the tip about using LATERAL!!
leole

13

I am just answering here with the formatted version of the final SQL I needed, based on Bob Jarvis's answer (as posted in my comment above):

select n1.name, n1.author_id, cast(count_1 as numeric)/total_count
  from (select id, name, author_id, count(1) as count_1
          from names
          group by id, name, author_id) n1
inner join (select author_id, count(1) as total_count
              from names
              group by author_id) n2
  on (n2.author_id = n1.author_id)
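
The cast matters because count() returns bigint, and dividing one integer by another in PostgreSQL truncates the result. A small sketch of the difference, using literal values instead of the names table (the column aliases are illustrative only):

```sql
-- Integer division truncates, so 1 / 3 yields 0.
select 1 / 3 as integer_division;

-- Casting one operand to numeric keeps the fractional part (about 0.333).
select cast(1 as numeric) / 3 as numeric_division;

-- round() with a scale is optional; 4 decimal places is an arbitrary choice.
select round(cast(1 as numeric) / 3, 4) as rounded;
```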

12

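I know this is old, but since PostgreSQL 9.3 there is an option to use the keyword "LATERAL" to use correlated subqueries inside JOINs, so the query from the question would look like: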

SELECT 
    name, author_id, count(*), t.total
FROM
    names as n1
    INNER JOIN LATERAL (
        SELECT 
            count(*) as total
        FROM 
            names as n2
        WHERE 
            n2.id = n1.id
            AND n2.author_id = n1.author_id
    ) as t ON 1=1
GROUP BY 
    -- t.total is selected outside an aggregate, so it must be grouped too
    n1.name, n1.author_id, t.total
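
The `ON 1=1` is only there because INNER JOIN requires a join condition. A sketch of the same shape written with CROSS JOIN LATERAL, assuming the correlation is on author_id only (as in the asker's follow-up comment); the `count_1` alias is illustrative:

```sql
-- Sketch: CROSS JOIN LATERAL needs no ON clause. The subquery is
-- evaluated once per row of n1 and can see n1's columns.
SELECT
    n1.name, n1.author_id, count(*) AS count_1, t.total
FROM
    names AS n1
    CROSS JOIN LATERAL (
        SELECT
            count(*) AS total
        FROM
            names AS n2
        WHERE
            n2.author_id = n1.author_id
    ) AS t
GROUP BY
    n1.name, n1.author_id, t.total;
```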

1
I wonder whether the performance of these two queries differs, or whether PostgreSQL uses the same plan for both
deFreitas

1
I ran this test; the results are here (my answer)
deFreitas

2
select n1.name, n1.author_id, cast(count_1 as numeric)/total_count
  from (select id, name, author_id, count(1) as count_1
          from names
          group by id, name, author_id) n1
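inner join (select distinct author_id,
                   -- count as a window function: a plain count(1) next to
                   -- author_id would require GROUP BY
                   count(1) over (partition by author_id) as total_count
              from names) n2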
  on (n2.author_id = n1.author_id)
Where true

Use distinct when there are more inner joins, because GROUP BY performance gets slower as more joins are added.

Licensed under cc by-sa 3.0 with attribution required.