Alternatives to concatenating strings or going procedural to prevent SQL query code repetition?


19

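Disclaimer: Please bear with me, as I am someone who only uses databases occasionally. (Most of the time I do C++ programming at work, but every odd month I need to search/fix/add something in an Oracle database.)

I have repeatedly needed to write complex SQL queries, both ad-hoc queries and queries built into applications, where large parts of the query were just repeated "code".

Writing such abominations in a traditional programming language would get you into deep trouble, yet I have not been able to find any decent technique to prevent SQL query code repetition.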


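Edit: First, I want to thank the answerers who provided excellent improvements to my original example. However, this question is not about my example. It is about repetitiveness in SQL queries. As such, the answers so far (JackP, Leigh) do a great job of showing that you can reduce repetitiveness by writing better queries. Even then, however, you face some repetitiveness that apparently cannot be removed: this has always nagged me about SQL. In "traditional" programming languages I can refactor quite a lot to minimize repetition in the code, but with SQL there seem to be no(?) tools that allow this, other than writing a less repetitive statement to begin with.

Note that I have removed the Oracle tag again, as I would be genuinely interested to know whether any database or scripting language supports something more.


Here is one such gem that I cobbled together today. It basically reports the differences in a set of columns of a single table. Please skim through the following code, especially the large query at the end. I'll continue below.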

--
-- Create Table to test queries
--
CREATE TABLE TEST_ATTRIBS (
id NUMBER PRIMARY KEY,
name  VARCHAR2(300) UNIQUE,
attr1 VARCHAR2(2000),
attr2 VARCHAR2(2000),
attr3 INTEGER,
attr4 NUMBER,
attr5 VARCHAR2(2000)
);

--
-- insert some test data
--
insert into TEST_ATTRIBS values ( 1, 'Alfred',   'a', 'Foobar', 33, 44, 'e');
insert into TEST_ATTRIBS values ( 2, 'Batman',   'b', 'Foobar', 66, 44, 'e');
insert into TEST_ATTRIBS values ( 3, 'Chris',    'c', 'Foobar', 99, 44, 'e');
insert into TEST_ATTRIBS values ( 4, 'Dorothee', 'd', 'Foobar', 33, 44, 'e');
insert into TEST_ATTRIBS values ( 5, 'Emilia',   'e', 'Barfoo', 66, 44, 'e');
insert into TEST_ATTRIBS values ( 6, 'Francis',  'f', 'Barfoo', 99, 44, 'e');
insert into TEST_ATTRIBS values ( 7, 'Gustav',   'g', 'Foobar', 33, 44, 'e');
insert into TEST_ATTRIBS values ( 8, 'Homer',    'h', 'Foobar', 66, 44, 'e');
insert into TEST_ATTRIBS values ( 9, 'Ingrid',   'i', 'Foobar', 99, 44, 'e');
insert into TEST_ATTRIBS values (10, 'Jason',    'j', 'Bob',    33, 44, 'e');
insert into TEST_ATTRIBS values (12, 'Konrad',   'k', 'Bob',    66, 44, 'e');
insert into TEST_ATTRIBS values (13, 'Lucas',    'l', 'Foobar', 99, 44, 'e');

insert into TEST_ATTRIBS values (14, 'DUP_Alfred',   'a', 'FOOBAR', 33, 44, 'e');
insert into TEST_ATTRIBS values (15, 'DUP_Chris',    'c', 'Foobar', 66, 44, 'e');
insert into TEST_ATTRIBS values (16, 'DUP_Dorothee', 'd', 'Foobar', 99, 44, 'e');
insert into TEST_ATTRIBS values (17, 'DUP_Gustav',   'X', 'Foobar', 33, 44, 'e');
insert into TEST_ATTRIBS values (18, 'DUP_Homer',    'h', 'Foobar', 66, 44, 'e');
insert into TEST_ATTRIBS values (19, 'DUP_Ingrid',   'Y', 'foo',    99, 44, 'e');

insert into TEST_ATTRIBS values (20, 'Martha',   'm', 'Bob',    33, 88, 'f');

-- Create comparison view
CREATE OR REPLACE VIEW TA_SELFCMP as
select 
t1.id as id_1, t2.id as id_2, t1.name as name, t2.name as name_dup,
t1.attr1 as attr1_1, t1.attr2 as attr2_1, t1.attr3 as attr3_1, t1.attr4 as attr4_1, t1.attr5 as attr5_1,
t2.attr1 as attr1_2, t2.attr2 as attr2_2, t2.attr3 as attr3_2, t2.attr4 as attr4_2, t2.attr5 as attr5_2
from TEST_ATTRIBS t1, TEST_ATTRIBS t2
where t1.id <> t2.id
and t1.name <> t2.name
and t1.name = REPLACE(t2.name, 'DUP_', '')
;

-- NOTE THIS PIECE OF HORRIBLE CODE REPETITION --
-- Create comparison report
-- compare 1st attribute
select 'attr1' as Different,
id_1, id_2, name, name_dup,
CAST(attr1_1 AS VARCHAR2(2000)) as Val1, CAST(attr1_2 AS VARCHAR2(2000)) as Val2
from TA_SELFCMP
where attr1_1 <> attr1_2
or (attr1_1 is null and attr1_2 is not null)
or (attr1_1 is not null and attr1_2 is null)
union
-- compare 2nd attribute
select 'attr2' as Different,
id_1, id_2, name, name_dup,
CAST(attr2_1 AS VARCHAR2(2000)) as Val1, CAST(attr2_2 AS VARCHAR2(2000)) as Val2
from TA_SELFCMP
where attr2_1 <> attr2_2
or (attr2_1 is null and attr2_2 is not null)
or (attr2_1 is not null and attr2_2 is null)
union
-- compare 3rd attribute
select 'attr3' as Different,
id_1, id_2, name, name_dup,
CAST(attr3_1 AS VARCHAR2(2000)) as Val1, CAST(attr3_2 AS VARCHAR2(2000)) as Val2
from TA_SELFCMP
where attr3_1 <> attr3_2
or (attr3_1 is null and attr3_2 is not null)
or (attr3_1 is not null and attr3_2 is null)
union
-- compare 4th attribute
select 'attr4' as Different,
id_1, id_2, name, name_dup,
CAST(attr4_1 AS VARCHAR2(2000)) as Val1, CAST(attr4_2 AS VARCHAR2(2000)) as Val2
from TA_SELFCMP
where attr4_1 <> attr4_2
or (attr4_1 is null and attr4_2 is not null)
or (attr4_1 is not null and attr4_2 is null)
union
-- compare 5th attribute
select 'attr5' as Different,
id_1, id_2, name, name_dup,
CAST(attr5_1 AS VARCHAR2(2000)) as Val1, CAST(attr5_2 AS VARCHAR2(2000)) as Val2
from TA_SELFCMP
where attr5_1 <> attr5_2
or (attr5_1 is null and attr5_2 is not null)
or (attr5_1 is not null and attr5_2 is null)
;

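As you can see, the query that generates the "difference report" uses the same SQL SELECT block 5 times (it could easily be 42 times!). This strikes me as appalling (I'm allowed to say so, since I wrote the code after all), but I haven't been able to find any good solution.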

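  • If this were a query in some actual application code, I could write a function that cobbles this query together as a string and then executes the query as a string.

    • -> Building strings is horrible, and horrible to test and maintain. If the "application code" is written in a language such as PL/SQL, it feels so wrong it hurts.
  • Alternatively, if used from PL/SQL or the like, I'd guess there are some procedural means to make this query more maintainable.

    • -> Unrolling something that can be expressed in a single query into procedural steps, just to prevent code repetition, feels wrong too.
  • If this query were needed as a view in the database, then, as far as I understand, there would be no other way than to maintain the view definition exactly as I posted it above. (!!?)

    • -> I actually once had to do some maintenance on a 2-page view definition that was not far off the statement above. Obviously, changing anything in this view required a regexp text search over the view definition to determine whether the same sub-statement had been used in another line and whether it needed changing there.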
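
For reference, the string-building route from the first bullet might look roughly like the anonymous PL/SQL block below. This is only a sketch under assumptions: the variable names `l_sql` and `l_branch` and the `#N#` placeholder are made up for illustration, the `q'[...]'` quoting syntax needs Oracle 10g or later, and the block only prints the generated text rather than running it.

```sql
-- Sketch only: generate the five-branch report query in a loop.
-- l_sql, l_branch and the #N# placeholder are illustrative names.
declare
  l_sql    varchar2(32767);
  l_branch varchar2(4000);
begin
  for i in 1 .. 5 loop
    -- One branch of the report; #N# marks where the attribute number goes.
    l_branch := q'[select 'attr#N#' as different,
       id_1, id_2, name, name_dup,
       cast(attr#N#_1 as varchar2(2000)) as val1,
       cast(attr#N#_2 as varchar2(2000)) as val2
  from ta_selfcmp
 where attr#N#_1 <> attr#N#_2
    or (attr#N#_1 is null and attr#N#_2 is not null)
    or (attr#N#_1 is not null and attr#N#_2 is null)]';
    l_branch := replace(l_branch, '#N#', to_char(i));

    -- Join the branches; each one carries a different 'attrN' label.
    if i > 1 then
      l_sql := l_sql || chr(10) || 'union all' || chr(10);
    end if;
    l_sql := l_sql || l_branch;
  end loop;

  -- Inspect the generated statement before running it anywhere.
  dbms_output.put_line(l_sql);
end;
/
```

The SELECT block now exists only once, but the objections above still apply: the real query is hidden inside a string and has to be generated before it can be read or tested.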

So, as the title says: what techniques are there to avoid having to write such abominations?

Answers:


13

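You are too modest - your SQL is well and concisely written given the task you are undertaking. A few pointers:

  • t1.name <> t2.name is always true if t1.name = REPLACE(t2.name, 'DUP_', '') (name is unique and t1.id <> t2.id is already required) - you can drop the former
  • Usually you want union all. union means union all followed by removal of duplicates. It might make no difference in this case, but always using union all is a good habit unless you explicitly want to drop any duplicates.
  • If you are willing to have the numeric comparisons happen after casting to varchar, the following might be worth considering: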

    create view test_attribs_cast as 
    select id, name, attr1, attr2, cast(attr3 as varchar(2000)) as attr3, 
           cast(attr4 as varchar(2000)) as attr4, attr5
    from test_attribs;
    
    create view test_attribs_unpivot as 
    select id, name, 1 as attr#, attr1 as attr from test_attribs_cast union all
    select id, name, 2, attr2 from test_attribs_cast union all
    select id, name, 3, attr3 from test_attribs_cast union all
    select id, name, 4, attr4 from test_attribs_cast union all
    select id, name, 5, attr5 from test_attribs_cast;
    
    select 'attr'||t1.attr# as different, t1.id as id_1, t2.id as id_2, t1.name, 
           t2.name as name_dup, t1.attr as val1, t2.attr as val2
    from test_attribs_unpivot t1 join test_attribs_unpivot t2 on(
           t1.id<>t2.id and 
           t1.name = replace(t2.name, 'DUP_', '') and 
           t1.attr#=t2.attr# )
    where t1.attr<>t2.attr or (t1.attr is null and t2.attr is not null)
          or (t1.attr is not null and t2.attr is null);

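    The second view is a kind of unpivot operation - if you are on at least 11g, you can do this more concisely with the unpivot clause - see here for an example

  • I would say don't go the procedural route if you can do it in SQL, but...
  • Dynamic SQL is probably worth considering, despite the problems you mention with testing and maintenance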
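
As a rough sketch of the unpivot clause mentioned in the list above (assuming Oracle 11g or later and the test_attribs_cast view defined earlier), the second view could be written along these lines. INCLUDE NULLS is an assumption on my part: it keeps rows whose attribute is NULL, which the comparison query needs in order to report NULL versus non-NULL differences.

```sql
-- Sketch: same shape as the UNION ALL version of test_attribs_unpivot,
-- using the 11g UNPIVOT clause. All five source columns are VARCHAR2(2000)
-- in test_attribs_cast, as UNPIVOT expects a common data type.
create or replace view test_attribs_unpivot as
select id, name, attr#, attr
from test_attribs_cast
unpivot include nulls (
  attr for attr# in (attr1 as 1, attr2 as 2, attr3 as 3, attr4 as 4, attr5 as 5)
);
```

The base view is named once instead of five times, and adding a column means adding one entry to the IN list.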

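- Edit -

To answer the more general side of the question, there are techniques for reducing repetition in SQL, including:

  • Views - you know that one :)
  • Common Table Expressions (see here for example)
  • Individual features of the database, such as decode (see Leigh's answer for how this can reduce repetition), window functions, and hierarchical/recursive queries, to name but a few

But you can't bring OO ideas into the SQL world directly - in many cases repetition is fine if the query is readable and well written, and it would be unwise to resort to dynamic SQL (for example) just to avoid repetition.

The final query, including Leigh's suggested change and a CTE instead of a view, could look something like this: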

with t as ( select id, name, attr#, 
                   decode(attr#,1,attr1,2,attr2,3,attr3,4,attr4,attr5) attr
            from test_attribs
                 cross join (select rownum attr# from dual connect by rownum<=5))
select 'attr'||t1.attr# as different, t1.id as id_1, t2.id as id_2, t1.name, 
       t2.name as name_dup, t1.attr as val1, t2.attr as val2
from t t1 join t t2 
               on( t1.id<>t2.id and 
                   t1.name = replace(t2.name, 'DUP_', '') and 
                   t1.attr#=t2.attr# )
where t1.attr<>t2.attr or (t1.attr is null and t2.attr is not null)
      or (t1.attr is not null and t2.attr is null);

1
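+1, partly for UNION ALL. Often UNION without ALL results in a spool to temporary storage for the required sort operation (since UNION is effectively UNION ALL followed by DISTINCT, which implies a sort), so in some cases the performance difference can be huge.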
David Spillett

7

Here is an alternative to the test_attribs_unpivot view provided by JackPDouglas (+1) that works in versions prior to 11g and does fewer full table scans:

CREATE OR REPLACE VIEW test_attribs_unpivot AS
   SELECT ID, Name, MyRow Attr#, CAST(
      DECODE(MyRow,1,attr1,2,attr2,3,attr3,4,attr4,attr5) AS VARCHAR2(2000)) attr
   FROM TEST_ATTRIBS 
   CROSS JOIN (SELECT level MyRow FROM dual connect by level<=5);

With this view, his final query can be used as is.


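Much better! I think you could even drop the cast?
Jack Douglas

Instead of SELECT rownum MyRow FROM test_attribs where rownum<=5, use select level MyRow from dual connect by level <= 5. You don't want all those logical gets just to create 5 rows.
Štefan Oravec

@ŠtefanOravec - I had it that way originally, but I changed it because I wasn't sure which versions hierarchical queries were available in. Since the feature has been available since at least version 8, I have changed it back.
Leigh Riffel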

4

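I often run into the similar problem of comparing two versions of a table to find new, deleted, or changed rows. A month ago I published a solution for SQL Server using PowerShell here.

To adapt it to your problem, I first create two views to separate the original rows from the duplicated rows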

CREATE OR REPLACE VIEW V1_TEST_ATTRIBS AS 
select * from TEST_ATTRIBS where SUBSTR(name, 1, 4) <> 'DUP_'; 

CREATE OR REPLACE VIEW V2_TEST_ATTRIBS AS 
select id, REPLACE(name, 'DUP_', '') name, attr1, attr2, attr3, attr4, attr5 from TEST_ATTRIBS where SUBSTR(name, 1, 4) = 'DUP_'; 

and then I check for changes with

SELECT 1 SRC, NAME, ATTR1, ATTR2, ATTR3, ATTR4, ATTR5 FROM V1_TEST_ATTRIBS
MINUS
Select 1 SRC, NAME, ATTR1, ATTR2, ATTR3, ATTR4, ATTR5 from V2_TEST_ATTRIBS
UNION
SELECT 2 SRC, NAME, ATTR1, ATTR2, ATTR3, ATTR4, ATTR5 FROM V2_TEST_ATTRIBS
MINUS
SELECT 2 SRC ,NAME, ATTR1, ATTR2, ATTR3, ATTR4, ATTR5 FROM V1_TEST_ATTRIBS
ORDER BY NAME, SRC;

From there I can find your original IDs

Select NVL(v1.id, v2.id) id,  t.name, t.attr1, t.attr2, t.attr3, t.attr4, t.attr5 from
(
SELECT 1 SRC, NAME, ATTR1, ATTR2, ATTR3, ATTR4, ATTR5 FROM V1_TEST_ATTRIBS
MINUS
Select 1 SRC, NAME, ATTR1, ATTR2, ATTR3, ATTR4, ATTR5 from V2_TEST_ATTRIBS
UNION
SELECT 2 SRC, NAME, ATTR1, ATTR2, ATTR3, ATTR4, ATTR5 FROM V2_TEST_ATTRIBS
MINUS
Select 2 SRC ,NAME, ATTR1, ATTR2, ATTR3, ATTR4, ATTR5 from V1_TEST_ATTRIBS
) t
LEFT JOIN V1_TEST_ATTRIBS V1 ON T.NAME = V1.NAME AND T.SRC = 1
LEFT JOIN V2_TEST_ATTRIBS V2 ON T.NAME = V2.NAME AND T.SRC = 2
ORDER by NAME, SRC;

BTW: MINUS, UNION, and GROUP BY treat NULLs as equal to each other. Using these operations makes the queries more elegant.

Hint for SQL Server users: MINUS is named EXCEPT there, but works similarly.
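
To illustrate the NULL handling noted above, here is a small sketch that can be run against DUAL and the TA_SELFCMP view from the question. The last statement uses DECODE, which is a swapped-in technique rather than part of the MINUS approach: Oracle's DECODE also treats two NULLs as a match, so the three-part "differs, or exactly one side is NULL" predicate collapses into one expression.

```sql
-- Set operators compare NULLs as equal: this INTERSECT returns one row (a NULL).
select cast(null as varchar2(1)) as val from dual
intersect
select cast(null as varchar2(1)) as val from dual;

-- An ordinary comparison does not: NULL = NULL is unknown, so no rows come back.
select 'equal' as result
from dual
where cast(null as varchar2(1)) = cast(null as varchar2(1));

-- DECODE returns 0 when both values match (including NULL with NULL), else 1,
-- so this selects the same rows as the three-part predicate on attr1.
select id_1, id_2, name, name_dup, attr1_1 as val1, attr1_2 as val2
from ta_selfcmp
where decode(attr1_1, attr1_2, 0, 1) = 1;
```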

Licensed under cc by-sa 3.0 with attribution required.