How to search for a specific value in all tables (PostgreSQL)?

111

Is it possible to search every column of every table for a particular value in PostgreSQL?

A similar question is available here for Oracle.

postgresql grep string-matching

— Sandro Munda
source

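Are you looking for a tool, or for an implementation of the procedures shown in the linked question?

— a_horse_with_no_name, 2011

No, just the simplest way to find a specific value in all fields/tables.

— Sandro Munda

So you don't want to use an external tool?

— a_horse_with_no_name, 2011

1

If it's the simplest way => an external tool is fine :-)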

— Sandro Munda

131

How about dumping the contents of the database, then using grep?

$ pg_dump --data-only --inserts -U postgres your-db-name > a.tmp
$ grep United a.tmp
INSERT INTO countries VALUES ('US', 'United States');
INSERT INTO countries VALUES ('GB', 'United Kingdom');

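The same utility, pg_dump, can include column names in the output. Just change --inserts to --column-inserts. That way you can search for specific column names, too. But if I were looking for column names, I'd probably dump the schema instead of the data.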

$ pg_dump --data-only --column-inserts -U postgres your-db-name > a.tmp
$ grep country_code a.tmp
INSERT INTO countries (iso_country_code, iso_country_name) VALUES ('US', 'United States');
INSERT INTO countries (iso_country_code, iso_country_name) VALUES ('GB', 'United Kingdom');

— Mike Sherrill 'Cat Recall'
source

5

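+1 free and simple. And if you want the structure, pg_dump can do that as well. Also, if grep isn't your thing, use whatever file-content search tool you like on the dumped structure and/or data.

— Kuberchaun, 2011

If you want to grep text data (which is typically encoded in more recent versions of postgres), you may need to run ALTER DATABASE your_db_name SET bytea_output = 'escape'; on the database (or a copy of it) before dumping it. (I don't see a way to specify this just for a pg_dump command.)

— phils, 2015

Can you explain in detail? How do I search for the string 'ABC' in all tables?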

— Mr. Bhosale

1

If you're using IntelliJ, you can just right-click on your database and select "Dump with 'pg_dump'" or "Dump data to file(s)".

— Laurens

3

How is this a valid solution for a database that is too large to dump to your disk?

— Govind Parmar

76

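Here's a pl/pgsql function that locates records where any column contains a specific value. It takes as arguments the value to search for in text format, an array of table names to search in (defaults to all tables) and an array of schema names (defaults to all schemas).

It returns a table structure with the schema, the table name, the column name and the pseudo-column ctid (the non-durable physical location of the row in the table, see System Columns).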

CREATE OR REPLACE FUNCTION search_columns(
    needle text,
    haystack_tables name[] default '{}',
    haystack_schema name[] default '{}'
)
RETURNS table(schemaname text, tablename text, columnname text, rowctid text)
AS $$
begin
  FOR schemaname,tablename,columnname IN
      SELECT c.table_schema,c.table_name,c.column_name
      FROM information_schema.columns c
        JOIN information_schema.tables t ON
          (t.table_name=c.table_name AND t.table_schema=c.table_schema)
        JOIN information_schema.table_privileges p ON
          (t.table_name=p.table_name AND t.table_schema=p.table_schema
              AND p.privilege_type='SELECT')
        JOIN information_schema.schemata s ON
          (s.schema_name=t.table_schema)
      WHERE (c.table_name=ANY(haystack_tables) OR haystack_tables='{}')
        AND (c.table_schema=ANY(haystack_schema) OR haystack_schema='{}')
        AND t.table_type='BASE TABLE'
  LOOP
    FOR rowctid IN
      EXECUTE format('SELECT ctid FROM %I.%I WHERE cast(%I as text)=%L',
       schemaname,
       tablename,
       columnname,
       needle
      )
    LOOP
      -- uncomment next line to get some progress report
      -- RAISE NOTICE 'hit in %.%', schemaname, tablename;
      RETURN NEXT;
    END LOOP;
 END LOOP;
END;
$$ language plpgsql;

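See also the version on github, based on the same principle but adding some speed and reporting improvements.

Examples of use in a test database:

Search in all tables (in this test database all hits are in the public schema):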

select * from search_columns('foobar');
 schemaname | tablename | columnname | rowctid
------------+-----------+------------+---------
 public     | s3        | usename    | (0,11)
 public     | s2        | relname    | (7,29)
 public     | w         | body       | (0,2)
(3 rows)

Search in a specific table:

select * from search_columns('foobar','{w}');
 schemaname | tablename | columnname | rowctid
------------+-----------+------------+---------
 public     | w         | body       | (0,2)
(1 row)

Search in a subset of tables obtained from a select:

select * from search_columns('foobar', array(select table_name::name from information_schema.tables where table_name like 's%'), array['public']);
 schemaname | tablename | columnname | rowctid
------------+-----------+------------+---------
 public     | s2        | relname    | (7,29)
 public     | s3        | usename    | (0,11)
(2 rows)

Get a result row with the corresponding base table and ctid:

select * from public.w where ctid='(0,2)';
 title |  body  |         tsv
-------+--------+---------------------
 toto  | foobar | 'foobar':2 'toto':1

Variants

To test against a regular expression instead of strict equality, like grep does, this part of the query:

SELECT ctid FROM %I.%I WHERE cast(%I as text)=%L

may be changed to:

SELECT ctid FROM %I.%I WHERE cast(%I as text) ~ %L

For case-insensitive comparisons, you could write:

SELECT ctid FROM %I.%I WHERE lower(cast(%I as text)) = lower(%L)
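
As a quick illustration of how these comparisons differ, the following standalone query evaluates them against a literal instead of a table column. The sample strings are made up for the example; ~* is the case-insensitive form of the regular expression operator.

```sql
-- Sample values only; no table is needed to run this.
SELECT cast('Foobar' as text) = 'foobar'               AS strict_equality,  -- false: case differs
       cast('Foobar' as text) ~ 'bar$'                 AS regex_match,      -- true: matches at the end
       cast('Foobar' as text) ~* '^foo'                AS regex_match_ci,   -- true: ~* ignores case
       lower(cast('Foobar' as text)) = lower('FOOBAR') AS lower_equality;   -- true
```

Note that the regular expression forms match substrings unless the pattern is anchored, whereas the equality forms compare the whole value.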

— Daniel Vérité
source

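ERROR: syntax error at or near "default" LINE 3: haystack_tables name[] default '{}' (using PostgreSQL 8.2.17 and cannot upgrade)

— Henno, 2014

@Henno: yes, it requires PG-9.1. Edited now to make that explicit. To use it with older versions, you'll have to adapt it.

— Daniel Vérité, 2014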

1

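@Rajendra_Prasad: the regular expression operator has a case-insensitive variant, ~*, which is more adequate than lower(). But anyway, the t.* is not part of the above answer. Searching column by column is not the same as searching the row as a value, because of the column separators.

— Daniel Vérité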

2

This only returns one row per schema-table-column.

— theGtknerd

1

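Thanks a lot. This solution works perfectly for me. I had to locate a table containing a specific URL in a list of more than 1000 tables. You saved my day!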

— Sunil

7

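search every column of every table for a particular value

This does not define how to match exactly.
Nor does it define what to return exactly.

Assuming:

Find any row with any column containing the given value in its text representation - as opposed to equaling the given value.
Return the table name (regclass) and the tuple ID (ctid), because that's simplest.

Here is a dead simple, fast and slightly dirty way: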

CREATE OR REPLACE FUNCTION search_whole_db(_like_pattern text)
  RETURNS TABLE(_tbl regclass, _ctid tid) AS
$func$
BEGIN
   FOR _tbl IN
      SELECT c.oid::regclass
      FROM   pg_class c
      JOIN   pg_namespace n ON n.oid = relnamespace
      WHERE  c.relkind = 'r'                           -- only tables
      AND    n.nspname !~ '^(pg_|information_schema)'  -- exclude system schemas
      ORDER BY n.nspname, c.relname
   LOOP
      RETURN QUERY EXECUTE format(
         'SELECT $1, ctid FROM %s t WHERE t::text ~~ %L'
       , _tbl, '%' || _like_pattern || '%')
      USING _tbl;
   END LOOP;
END
$func$  LANGUAGE plpgsql;

Call:

SELECT * FROM search_whole_db('mypattern');

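Provide the search pattern without enclosing %.

Why slightly dirty?

If separators and decorators for the row in text representation can be part of the search pattern, there can be false positives:

column separator: , by default
whole row is enclosed in parentheses: ()
some values are enclosed in double quotes: "
\ may be added as escape character

Also, the text representation of some columns may depend on local settings - but that ambiguity is inherent to the question, not to my solution.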
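
To see where such false positives come from, it may help to look at the text representation of a row directly. This sketch uses a made-up inline row (the aliases id, txt and quoted are assumptions for the example) and a pattern that only matches because of the column separator and an added double quote.

```sql
-- A made-up row: the second value contains a comma, the third contains double quotes.
SELECT t::text               AS row_as_text,
       t::text LIKE '%1,"a%' AS matches_across_columns
FROM  (VALUES (1, 'a,b', 'say "hi"')) AS t(id, txt, quoted);
```

With default settings the first column should read (1,"a,b","say ""hi""") and the second should be true, although no single column contains the string 1,"a.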

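Each qualifying row is returned only once, even when it matches multiple times (as opposed to other answers here).

This searches the whole database except for system catalogs. It will typically take a long time to finish. You might want to restrict it to certain schemas / tables (or even columns) as demonstrated in other answers. Or add notices and a progress indicator, also demonstrated in another answer.

The regclass object identifier type is represented as the table name, schema-qualified where necessary to disambiguate according to the current search_path: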

Find the referenced table name using table, field and schema name

What is the ctid?

How do I decompose ctid into page and row numbers?
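
For decomposing a ctid, one commonly used trick is to go through its text form and cast that to point, which exposes the two numbers as subscripted elements. A minimal sketch, using a literal tid so that it runs without any table; the column aliases are assumptions for the example.

```sql
-- '(0,2)' is a sample tuple ID; in practice this would be the ctid column of a table.
SELECT c                           AS ctid,
       (c::text::point)[0]::bigint AS page_number,  -- block within the table
       (c::text::point)[1]::int    AS tuple_index   -- position within the block
FROM  (SELECT '(0,2)'::tid AS c) AS sub;
```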

You might want to escape characters with special meaning in the search pattern. See:

Escape function for regular expression or LIKE patterns
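
A minimal sketch of such an escape function for LIKE patterns, assuming the default escape character (backslash) and standard_conforming_strings = on (the default in current versions). The name f_like_escape is a hypothetical helper name chosen for this example, not something defined elsewhere in this answer.

```sql
-- Hypothetical helper: make \, % and _ match literally in a LIKE pattern.
CREATE OR REPLACE FUNCTION f_like_escape(text)
  RETURNS text
  LANGUAGE sql IMMUTABLE STRICT AS
$func$
SELECT replace(replace(replace($1
         , '\', '\\')   -- escape the escape character itself first
         , '%', '\%')   -- then the multi-character wildcard
         , '_', '\_');  -- then the single-character wildcard
$func$;

-- Usage with the function above: search for the literal string 100%_sure
SELECT * FROM search_whole_db(f_like_escape('100%_sure'));
```

The backslash has to be handled first; otherwise the backslashes added for % and _ would be escaped again.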

— Erwin Brandstetter
source

This great solution is even better with lower() - 'SELECT $1, ctid FROM %s t WHERE lower(t::text) ~~ lower(%L)'

— Georgi Bonchev

5

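In case someone finds this helpful: here is @Daniel Vérité's function with an additional parameter that accepts the names of the columns to be used in the search. This reduces the processing time. At least in my test it was reduced a lot.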

CREATE OR REPLACE FUNCTION search_columns(
    needle text,
    haystack_columns name[] default '{}',
    haystack_tables name[] default '{}',
    haystack_schema name[] default '{public}'
)
RETURNS table(schemaname text, tablename text, columnname text, rowctid text)
AS $$
begin
  FOR schemaname,tablename,columnname IN
      SELECT c.table_schema,c.table_name,c.column_name
      FROM information_schema.columns c
      JOIN information_schema.tables t ON
        (t.table_name=c.table_name AND t.table_schema=c.table_schema)
      WHERE (c.table_name=ANY(haystack_tables) OR haystack_tables='{}')
        AND c.table_schema=ANY(haystack_schema)
        AND (c.column_name=ANY(haystack_columns) OR haystack_columns='{}')
        AND t.table_type='BASE TABLE'
  LOOP
    EXECUTE format('SELECT ctid FROM %I.%I WHERE cast(%I as text)=%L',
       schemaname,
       tablename,
       columnname,
       needle
    ) INTO rowctid;
    IF rowctid is not null THEN
      RETURN NEXT;
    END IF;
 END LOOP;
END;
$$ language plpgsql;

Below is an example of how to use the search_columns function created above.

SELECT * FROM search_columns('86192700'
    , array(SELECT DISTINCT a.column_name::name FROM information_schema.columns AS a
            INNER JOIN information_schema.tables as b ON (b.table_catalog = a.table_catalog AND b.table_schema = a.table_schema AND b.table_name = a.table_name)
        WHERE 
            a.column_name iLIKE '%cep%' 
            AND b.table_type = 'BASE TABLE'
            AND b.table_schema = 'public'
    )

    , array(SELECT b.table_name::name FROM information_schema.columns AS a
            INNER JOIN information_schema.tables as b ON (b.table_catalog = a.table_catalog AND b.table_schema = a.table_schema AND b.table_name = a.table_name)
        WHERE 
            a.column_name iLIKE '%cep%' 
            AND b.table_type = 'BASE TABLE'
            AND b.table_schema = 'public')
);

— Daniel Martinhao
source

5

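Without storing a new procedure, you can use a code block and execute it to obtain a table of occurrences. You can filter the results by schema, table or column name.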

DO $$
DECLARE
  value int := 0;
  sql text := 'The constructed select statement';
  rec1 record;
  rec2 record;
BEGIN
  DROP TABLE IF EXISTS _x;
  CREATE TEMPORARY TABLE _x (
    schema_name text, 
    table_name text, 
    column_name text,
    found text
  );
  FOR rec1 IN 
        SELECT table_schema, table_name, column_name
        FROM information_schema.columns 
        WHERE table_name <> '_x'
                AND UPPER(column_name) LIKE UPPER('%%')                  
                AND table_schema <> 'pg_catalog'
                AND table_schema <> 'information_schema'
                AND data_type IN ('character varying', 'text', 'character', 'char', 'varchar')
        LOOP
    sql := concat('SELECT ', rec1."column_name", ' AS "found" FROM ',rec1."table_schema" , '.',rec1."table_name" , ' WHERE UPPER(',rec1."column_name" , ') LIKE UPPER(''','%my_substring_to_find_goes_here%' , ''')');
    RAISE NOTICE '%', sql;
    BEGIN
        FOR rec2 IN EXECUTE sql LOOP
            RAISE NOTICE '%', sql;
            INSERT INTO _x VALUES (rec1."table_schema", rec1."table_name", rec1."column_name", rec2."found");
        END LOOP;
    EXCEPTION WHEN OTHERS THEN
    END;
  END LOOP;
  END; $$;

SELECT * FROM _x;
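
The block above builds its statement by plain concatenation, so a table or column name that needs quoting (mixed case, spaces) would break the generated query. One possible alternative, shown here only as a sketch, is format() (available from PostgreSQL 9.1) with %I for identifiers and %L for literals. The identifier and pattern values below are made up to show the quoting.

```sql
-- Made-up names containing a space, to show how %I and %L quote their arguments.
SELECT format(
         'SELECT %I AS found FROM %I.%I WHERE upper(%I) LIKE upper(%L)',
         'my col',     -- column name
         'public',     -- schema name
         'my table',   -- table name
         'my col',     -- column name again, for the WHERE clause
         '%needle%'    -- search pattern
       ) AS generated_sql;
```

The generated statement should read: SELECT "my col" AS found FROM public."my table" WHERE upper("my col") LIKE upper('%needle%')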

— Profimedica
source

Where do you specify the search string? Or is this just dumping the entire database, table by table?

— jimtut

1

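I didn't create a parameter for the string. You can either hardcode it and run the block directly, or create a stored procedure from it. In any case, the string to search for goes between the two percent signs: WHERE UPPER(',rec1."column_name" , ') LIKE UPPER(''','%%' , ''')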

— profimedica

5

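There is a way to achieve this without creating a function or using an external tool. By using Postgres' query_to_xml() function, which can dynamically run a query inside another query, it's possible to search for a text across many tables. This is based on my answer to retrieve the rowcount for all tables.

To search for the string foo across all tables in a schema, the following can be used: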

with found_rows as (
  select format('%I.%I', table_schema, table_name) as table_name,
         query_to_xml(format('select to_jsonb(t) as table_row 
                              from %I.%I as t 
                              where t::text like ''%%foo%%'' ', table_schema, table_name), 
                      true, false, '') as table_rows
  from information_schema.tables 
  where table_schema = 'public'
)
select table_name, x.table_row
from found_rows f
  left join xmltable('//table/row' 
                     passing table_rows
                       columns
                         table_row text path 'table_row') as x on true

Note that the use of xmltable requires Postgres 10 or newer. For older Postgres versions, this can also be done using xpath().

with found_rows as (
  select format('%I.%I', table_schema, table_name) as table_name,
         query_to_xml(format('select to_jsonb(t) as table_row 
                              from %I.%I as t 
                              where t::text like ''%%foo%%'' ', table_schema, table_name), 
                      true, false, '') as table_rows
  from information_schema.tables 
  where table_schema = 'public'
)
select table_name, r.data as table_row
from found_rows f
   cross join unnest(xpath('/table/row/table_row/text()', table_rows)) as r(data)

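The common table expression (WITH ...) is only used for convenience. It loops through all tables in the public schema. For each table, the following query is run through the query_to_xml() function: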

select to_jsonb(t)
from some_table t
where t::text like '%foo%';

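The where clause is used to make sure the expensive generation of XML content is only done for rows that contain the search string. This might return something like this: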

<table xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<row>
  <table_row>{"id": 42, "some_column": "foobar"}</table_row>
</row>
</table>

The complete row is converted to jsonb so that the result shows which value belongs to which column.

The above might return something like this:

table_name   |   table_row
-------------+----------------------------------------
public.foo   |  {"id": 1, "some_column": "foobar"}
public.bar   |  {"id": 42, "another_column": "barfoo"}
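
The row-count answer mentioned at the start of this answer relies on the same query_to_xml() idea. A sketch of how that might look for the public schema; the aliases cnt and row_count are names chosen for this example.

```sql
-- Count the rows of every table in the public schema without a helper function.
select table_schema,
       table_name,
       (xpath('/row/cnt/text()',
              query_to_xml(format('select count(*) as cnt from %I.%I',
                                  table_schema, table_name),
                           false,   -- nulls: leave out NULL columns
                           true,    -- tableforest: one <row> element per result row
                           '')      -- no target namespace
       ))[1]::text::bigint as row_count
from information_schema.tables
where table_schema = 'public'
  and table_type = 'BASE TABLE';
```

Because each inner query returns exactly one row, the XML fragment contains a single row element and the first xpath() result can be cast straight to a number.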

Online example for Postgres 10+

Online example for older Postgres versions

— a_horse_with_no_name
source

I tried to run the code for older PostgreSQL versions and got the following error:

ERROR: 42883: function format("unknown", information_schema.sql_identifier, information_schema.sql_identifier) does not exist

— Matt

You probably need to cast them: format('%I.%I', table_schema::text, table_name::text)

— a_horse_with_no_name

OK, did that, now I get ERROR: 42883: function format("unknown", character varying, character varying) does not exist

— Matt

Then your Postgres version is so old that it doesn't even have the format() function.

— a_horse_with_no_name

I think Redshift is based on 8.3?

— Matt

3

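Here's @Daniel Vérité's function with progress reporting functionality. It reports progress in three ways:

by RAISE NOTICE;
by decreasing the value of the supplied {progress_seq} sequence from {total number of columns to search in} down to 0;
by writing the progress, along with the tables found, into a text file located at c:\windows\temp\{progress_seq}.txt.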

CREATE OR REPLACE FUNCTION search_columns(
    needle text,
    haystack_tables name[] default '{}',
    haystack_schema name[] default '{public}',
    progress_seq text default NULL
)
RETURNS table(schemaname text, tablename text, columnname text, rowctid text)
AS $$
DECLARE
currenttable text;
columnscount integer;
foundintables text[];
foundincolumns text[];
begin
currenttable='';
columnscount = (SELECT count(1)
      FROM information_schema.columns c
      JOIN information_schema.tables t ON
        (t.table_name=c.table_name AND t.table_schema=c.table_schema)
      WHERE (c.table_name=ANY(haystack_tables) OR haystack_tables='{}')
        AND c.table_schema=ANY(haystack_schema)
        AND t.table_type='BASE TABLE')::integer;
PERFORM setval(progress_seq::regclass, columnscount);

  FOR schemaname,tablename,columnname IN
      SELECT c.table_schema,c.table_name,c.column_name
      FROM information_schema.columns c
      JOIN information_schema.tables t ON
        (t.table_name=c.table_name AND t.table_schema=c.table_schema)
      WHERE (c.table_name=ANY(haystack_tables) OR haystack_tables='{}')
        AND c.table_schema=ANY(haystack_schema)
        AND t.table_type='BASE TABLE'
  LOOP
    EXECUTE format('SELECT ctid FROM %I.%I WHERE cast(%I as text)=%L',
       schemaname,
       tablename,
       columnname,
       needle
    ) INTO rowctid;
    IF rowctid is not null THEN
      RETURN NEXT;
      foundintables = foundintables || tablename;
      foundincolumns = foundincolumns || columnname;
      RAISE NOTICE 'FOUND! %, %, %, %', schemaname,tablename,columnname, rowctid;
    END IF;
         IF (progress_seq IS NOT NULL) THEN 
        PERFORM nextval(progress_seq::regclass);
    END IF;
    IF(currenttable<>tablename) THEN  
    currenttable=tablename;
     IF (progress_seq IS NOT NULL) THEN 
        RAISE NOTICE 'Columns left to look in: %; looking in table: %', currval(progress_seq::regclass), tablename;
        EXECUTE 'COPY (SELECT unnest(string_to_array(''Current table (column ' || columnscount-currval(progress_seq::regclass) || ' of ' || columnscount || '): ' || tablename || '\n\nFound in tables/columns:\n' || COALESCE(
        (SELECT string_agg(c1 || '/' || c2, '\n') FROM (SELECT unnest(foundintables) AS c1,unnest(foundincolumns) AS c2) AS t1)
        , '') || ''',''\n''))) TO ''c:\WINDOWS\temp\' || progress_seq || '.txt''';
    END IF;
    END IF;
 END LOOP;
END;
$$ language plpgsql;

— alexkovelsky
source

3

-- The function below lists all tables in the database that contain a specific string

 select TablesCount('StringToSearch');

-- Iterates through all the tables in the database

CREATE OR REPLACE FUNCTION TablesCount(_searchText TEXT)
RETURNS text AS 
$$ -- here start procedural part
   DECLARE _tname text;
   DECLARE cnt int;
   BEGIN
    FOR _tname IN SELECT table_name FROM information_schema.tables where table_schema='public' and table_type='BASE TABLE'  LOOP
         cnt= getMatchingCount(_tname,Columnames(_tname,_searchText));
                                RAISE NOTICE 'Count% ', CONCAT('  ',cnt,' Table name: ', _tname);
                END LOOP;
    RETURN _tname;
   END;
$$ -- here finish procedural part
LANGUAGE plpgsql; -- language specification

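-- Returns the number of rows in the given table for which the condition is met. For example, if the intended text exists in any of the fields of the table, the count will be greater than 0. The notices can be found in the Messages section of the result viewer.

CREATE OR REPLACE FUNCTION getMatchingCount(_tname TEXT, _clause TEXT)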
RETURNS int AS 
$$
Declare outpt text;
    BEGIN
    EXECUTE 'Select Count(*) from '||_tname||' where '|| _clause
       INTO outpt;
       RETURN outpt;
    END;
$$ LANGUAGE plpgsql;

-- Gets the fields of each table and builds a where clause from all columns of the table.

CREATE OR REPLACE FUNCTION Columnames(_tname text,st text)
RETURNS text AS 
$$ -- here start procedural part
DECLARE
                _name text;
                _helper text;
   BEGIN
                FOR _name IN SELECT column_name FROM information_schema.Columns WHERE table_name =_tname LOOP
                                _name=CONCAT('CAST(',_name,' as VarChar)',' like ','''%',st,'%''', ' OR ');
                                _helper= CONCAT(_helper,_name,' ');
                END LOOP;
                RETURN CONCAT(_helper, ' 1=2');

   END;
$$ -- here finish procedural part
LANGUAGE plpgsql; -- language specification

— Ganesh
source