Finding empty (all-NULL) columns of a table in PostgreSQL


Answers:


13

Testbed:

create role stack;
create schema authorization stack;
set role stack;

create table my_table as 
select generate_series(0,9) as id, 1 as val1, null::integer as val2;

create table my_table2 as 
select generate_series(0,9) as id, 1 as val1, null::integer as val2, 3 as val3;

Function:

create function has_nonnulls(p_schema in text, p_table in text, p_column in text)
                returns boolean language plpgsql as $$
declare 
  b boolean;
begin
  -- %I quotes each identifier, and p_schema is used so the right table is read
  execute format('select exists(select * from %I.%I where %I is not null)',
                 p_schema, p_table, p_column) into b;
  return b;
end;$$;

Query:

select table_schema, table_name, column_name, 
       has_nonnulls(table_schema, table_name, column_name)
from information_schema.columns
where table_schema='stack';

Result:

 table_schema | table_name | column_name | has_nonnulls
--------------+------------+-------------+--------------
 stack        | my_table   | id          | t
 stack        | my_table   | val1        | t
 stack        | my_table   | val2        | f
 stack        | my_table2  | id          | t
 stack        | my_table2  | val1        | t
 stack        | my_table2  | val2        | f
 stack        | my_table2  | val3        | t
(7 rows)

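Alternatively, you can get an approximate answer by querying the catalog: a null_frac of zero indicates no nulls (and a null_frac of 1 indicates only nulls), but because these are statistics gathered by the last ANALYZE, the result should be double-checked against the 'real' data: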

select tablename, attname, null_frac from pg_stats where schemaname='stack';

 tablename | attname | null_frac
-----------+---------+-----------
 my_table  | id      |         0
 my_table  | val1    |         0
 my_table  | val2    |         1
 my_table2 | id      |         0
 my_table2 | val1    |         0
 my_table2 | val2    |         1
 my_table2 | val3    |         0
(7 rows)

1
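This is a very old question, but people using spatial extensions (postgis) should note that spatial columns do not show up in pg_stats if they were empty when the table was created. I found this out today while doing some housekeeping. I came across some historical aspatial tables that had been imported using ogr2ogr. If the data being imported has no spatial column, ogr2ogr creates a geometry column full of <NULL>. My database's pg_stats has no entry for the geometry columns of the imported aspatial tables (it has all the other columns of those tables). I thought that was pretty odd.
GT.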
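
Columns like these can be spotted by listing what information_schema.columns knows about but pg_stats does not. The query below is a minimal sketch, assuming the stack schema from the testbed above and that ANALYZE has already been run; a column without a pg_stats row has no statistics at all, so any null_frac-based check silently skips it.

```sql
-- Columns of ordinary tables in schema "stack" that have no pg_stats row.
select c.table_schema, c.table_name, c.column_name
from information_schema.columns c
join information_schema.tables t
  on t.table_schema = c.table_schema
 and t.table_name   = c.table_name
left join pg_stats s
  on s.schemaname = c.table_schema
 and s.tablename  = c.table_name
 and s.attname    = c.column_name
where c.table_schema = 'stack'
  and t.table_type = 'BASE TABLE'   -- ignore views
  and s.attname is null;            -- no statistics row for this column
```

Any column returned here has to be checked against the data itself.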

6

In PostgreSQL, you can get the data directly from the statistics:

vacuum analyze; -- if needed

select schemaname, tablename, attname
from pg_stats
where most_common_vals is null
and most_common_freqs is null
and histogram_bounds is null
and correlation is null
and null_frac = 1;

You may get some false positives, so the candidates should be rechecked once you have found them.
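
The recheck can be folded into the same query. What follows is only a sketch: column_is_all_null is a hypothetical helper name (not a built-in), and the join to pg_tables is an assumption added to keep statistics rows that belong to expression indexes away from the helper.

```sql
-- Hypothetical helper: true when no row has a non-null value in the column.
create or replace function column_is_all_null(p_schema text, p_table text, p_column text)
returns boolean language plpgsql as $$
declare
  found_value boolean;
begin
  -- %I quotes each identifier; exists() stops at the first non-null value
  execute format('select exists(select 1 from %I.%I where %I is not null)',
                 p_schema, p_table, p_column) into found_value;
  return not found_value;
end;$$;

-- Use the statistics to pick candidates, then confirm each against the data.
select s.schemaname, s.tablename, s.attname
from pg_stats s
join pg_tables t
  on t.schemaname = s.schemaname
 and t.tablename  = s.tablename
where s.null_frac = 1
  and s.schemaname not in ('pg_catalog', 'information_schema')
  -- passing t.* columns forces the exact check to run after the join
  and column_is_all_null(t.schemaname, t.tablename, s.attname);
```

Only columns that the statistics flag and that the exact check confirms are returned; columns that have no pg_stats row at all are not considered.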


Do you need any conditions other than null_frac=1?
Jack Douglas

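I'm not sure. null_frac is presumably a real, so it might get rounded to 1 in some odd cases. But even with 1 non-null row in 10,000, the other conditions would no longer all hold.
Denis de Bernardy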

1

I'll show you my T-SQL solution, which works on SQL Server 2008. I'm not familiar with PostgreSQL, but I hope you'll find some guidance in my solution.

-- create test table
IF object_id ('dbo.TestTable') is not null
    DROP table testTable
go
create table testTable (
    id int identity primary key clustered,
    nullColumn varchar(100) NULL,
    notNullColumn varchar(100) not null,
    combinedColumn varchar(100) NULL,
    testTime datetime default getdate()
);
go

-- insert test data:
INSERT INTO testTable(nullColumn, notNullColumn, combinedColumn)
SELECT NULL, 'Test', 'Combination'
from sys.objects
union all
SELECT NULL, 'Test2', NULL
from sys.objects

select *
from testTable

-- FIXED SCRIPT FOR KNOWN TABLE (known structure) - find all completely NULL columns
select sum(datalength(id)) as SumColLength,
    'id' as ColumnName
from dbo.testTable
UNION ALL
select sum(datalength(nullColumn)) as SumColLength,
    'nullColumn' as ColumnName
from dbo.testTable
UNION ALL
select sum(datalength(notNullColumn)) as SumColLength,
    'notNullColumn' as ColumnName
from dbo.testTable
UNION ALL
select sum(datalength(combinedColumn)) as SumColLength,
    'combinedColumn' as ColumnName
from dbo.testTable
UNION ALL
select sum(datalength(testTime)) as SumColLength,
    'testTime' as ColumnName
from dbo.testTable

-- DYNAMIC SCRIPT (unknown structure) - find all completely NULL columns
declare @sql varchar(max) = '', @tableName sysname = 'testTable';

SELECT @sql +=
        'select sum(datalength(' + c.COLUMN_NAME + ')) as SumColLength,
    ''' + c.COLUMN_NAME + ''' as ColumnName
from ' + c.TABLE_SCHEMA + '.' + c.TABLE_NAME --as StatementToExecute
+ '
UNION ALL
'
FROM INFORMATION_SCHEMA.COLUMNS c
WHERE c.TABLE_NAME = @tableName;

SET @sql = left(@sql, len(@sql)-11)
print @sql;
exec (@sql);

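In short, I created a test table with 5 columns: id and testTime are generated by identity and the getdate() function, while the 3 varchar columns are the ones of interest. One contains only NULL values, one contains no NULLs, and the third is a mix of both. The end result is that the script reports column nullColumn as being NULL in every row.

The idea is to compute the DATALENGTH function (which returns the number of bytes of a given expression) for each column. So I compute the DATALENGTH value for every row of each column and sum it per column. If the SUM for a column is NULL, every row of that column is NULL; otherwise the column holds some data.

Now you have to find the translation to PostgreSQL, and hopefully a colleague can help you with that. Or maybe there is a nice system view that will show how silly I was to reinvent the wheel :-).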
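
A possible PostgreSQL analogue of the fixed script is sketched below. It is an assumption rather than a direct translation: it swaps sum(datalength(col)) for count(col), which counts only non-null values, and the table and column names are made up for the illustration.

```sql
-- Throwaway table mirroring the three varchar columns of interest.
create temp table test_table (
    id              serial primary key,
    null_column     varchar(100),
    not_null_column varchar(100) not null,
    combined_column varchar(100)
);

insert into test_table (null_column, not_null_column, combined_column)
select null, 'Test', 'Combination' from generate_series(1, 5)
union all
select null, 'Test2', null from generate_series(1, 5);

-- count(col) ignores NULLs, so 0 means the column is entirely NULL.
-- All columns are checked in a single scan of the table.
select count(null_column)     as null_column_nonnulls,
       count(not_null_column) as not_null_column_nonnulls,
       count(combined_column) as combined_column_nonnulls
from test_table;
```

With the ten rows inserted above, the three counts are 0, 10 and 5, so only null_column is reported as empty.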


1

You need to query the information schema for this information:

SELECT column_name FROM information_schema.columns WHERE table_name='your_table'

This gives you the columns of the matching table.

I don't have a Postgres installation at hand right now, but the rest should be simple:

   loop over the results of the above query and foreach result
        send a COUNT(column) to the table
        if the count is zero, give back the column,
                 else ignore it
   end foreach
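
The loop above could look roughly like this in PL/pgSQL. It is a sketch under assumptions: the public schema is assumed, your_table is the placeholder from the query above, and the result is reported with raise notice instead of being returned.

```sql
-- Loop over the columns of one table and report those with no non-null value.
do $$
declare
  col record;
  nonnull_count bigint;
begin
  for col in
    select table_schema, table_name, column_name
    from information_schema.columns
    where table_schema = 'public'        -- assumed schema
      and table_name   = 'your_table'    -- placeholder table name
  loop
    -- count(column) skips NULLs, so 0 means the column is entirely NULL
    execute format('select count(%I) from %I.%I',
                   col.column_name, col.table_schema, col.table_name)
      into nonnull_count;
    if nonnull_count = 0 then
      raise notice 'column % is empty', col.column_name;
    end if;
  end loop;
end;$$;
```

Note that an empty table reports every one of its columns as empty.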

This works, but it is an iterative approach :-). I prefer a set-based approach.
Marian

0

After combining several resources, I came up with this function and query to find all empty columns in all tables of the database:

CREATE OR REPLACE FUNCTION public.isEmptyColumn(IN table_name text, IN column_name text)
RETURNS boolean AS $$
declare 
    nonnull_count bigint;  -- COUNT(*) returns bigint
BEGIN
    -- %I quotes both identifiers, so mixed-case or unusual names work
    execute FORMAT('SELECT COUNT(*) from %I WHERE %I IS NOT NULL', table_name, column_name) into nonnull_count;
    RETURN (nonnull_count = 0);
END; $$
LANGUAGE PLPGSQL; 


SELECT s.table_name, s.column_name
FROM information_schema.columns s
WHERE (s.table_schema = 'public') AND
      (s.table_name NOT LIKE 'pg\_%') AND  -- escaped: a bare _ matches any character
      (public.isEmptyColumn(s.table_name::text, s.column_name::text));
Enjoy :)

Licensed under cc by-sa 3.0 with attribution required.