Filter a .CSV file based on the value of its 5th column and print the matching records to a new file

I have a .CSV file in the following format:

"column 1","column 2","column 3","column 4","column 5","column 6","column 7","column 8","column 9","column 10
"12310","42324564756","a simple string with a , comma","string with or, without commas","string 1","USD","12","70%","08/01/2013",""
"23455","12312255564","string, with, multiple, commas","string with or, without commas","string 2","USD","433","70%","07/15/2013",""
"23525","74535243123","string , with commas, and - hypens and: semicolans","string with or, without commas","string 1","CAND","744","70%","05/06/2013",""
"46476","15467534544","lengthy string, with commas, multiple: colans","string with or, without commas","string 2","CAND","388","70%","09/21/2013",""

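The 5th column of the file holds different strings. I need to filter the file based on the value of that column. Say I need a new file, derived from the current one, that contains only the records whose fifth field has the value "string 1".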

For this I tried the following command,

awk -F"," ' { if toupper($5) == "STRING 1") PRINT }' file1.csv > file2.csv

but it throws the following error:

awk: { if toupper($5) == "STRING 1") PRINT }
awk: ^ syntax error
awk: { if toupper($5) == "STRING 1") PRINT }
awk: ^ syntax error

Then I used the following command, which gave me odd output.

awk -F"," '$5="string 1" {print}' file1.csv > file2.csv

Output:

"column 1" "column 2" "column 3" "column 4" string 1 "column 6" "column 7" "column 8" "column 9" "column 10
"12310" "42324564756" "a simple string with a comma" string 1 without commas" "string 1" "USD" "12" "70%" "08/01/2013" ""
"23455" "12312255564" "string with string 1 commas" "string with or without commas" "string 2" "USD" "433" "70%" "07/15/2013" ""
"23525" "74535243123" "string with commas string 1 "string with or without commas" "string 1" "CAND" "744" "70%" "05/06/2013" ""
"46476" "15467534544" "lengthy string with commas string 1 "string with or without commas" "string 2" "CAND" "388" "70%" "09/21/2013" ""

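P.S.: I used toupper to be on the safe side, because I am not sure whether the string is in lower or upper case. I need to know what is wrong with my code, and whether spaces in the string matter when searching for a pattern with AWK.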

— Dhruuv

Answers:

awk -F '","'  'BEGIN {OFS=","} { if (toupper($5) == "STRING 1")  print }' file1.csv > file2.csv

Output

"12310","42324564756","a simple string with a , comma","string with or, without commas","string 1","USD","12","70%","08/01/2013",""
"23525","74535243123","string , with commas, and - hypens and: semicolans","string with or, without commas","string 1","CAND","744","70%","05/06/2013",""

I guess this is what you want.
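A short sketch of why the two attempts in the question fail may help here. The first one is a syntax error because awk needs parentheses around the whole `if` condition and the statement is lowercase `print`. The second one uses a single `=`, which assigns "string 1" to $5 instead of comparing; the assignment is always true and forces awk to rebuild every record with the default output separator (a space). On top of that, -F"," also splits on the commas inside the quoted text, so $5 is not the fifth column. Spaces do matter: `==` is an exact string comparison. The temporary file and the shortened sample rows below are made up for the demonstration.

```shell
# Build a small sample; the path and the shortened rows are illustrative only.
sample=$(mktemp)
cat > "$sample" <<'EOF'
"12310","42324564756","a simple string with a , comma","string with or, without commas","string 1","USD"
"23455","12312255564","string, with, multiple, commas","string with or, without commas","string 2","USD"
EOF

# With -F"," the commas inside quoted text also split fields,
# so $5 of the first row is a fragment of the fourth column.
plain_fifth=$(awk -F"," 'NR == 1 { print $5 }' "$sample")

# With -F '","' the whole "," sequence is the separator,
# so $5 of the first row is the real fifth column.
quoted_fifth=$(awk -F '","' 'NR == 1 { print $5 }' "$sample")

# "==" compares without modifying the record, so matching rows print unchanged.
matches=$(awk -F '","' 'toupper($5) == "STRING 1"' "$sample")

printf '%s\n' "$plain_fifth" "$quoted_fifth" "$matches"
rm -f "$sample"
```

One caveat with this separator: the first field keeps its leading quote and the last field keeps its trailing quote, so a test on $1 or on the last column would have to account for that.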

— limovala

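The output is exactly what I needed. It had not occurred to me to use '","' as the delimiter, otherwise it would have solved my problem... nice solution...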

— Dhruuv

@Dhruuv using '","' as the delimiter is what most of the answers to your previous question suggested :).

— terdon

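@terdon: Yes, I know, but it did not catch my attention when I ran into the problem. Frankly, I thought the command itself, or something other than the delimiter, was causing it... :) so I did not try it... :(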

— Dhruuv

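@Dhruuv I am not sure about the details, since I can't tell what you are trying to do, but your else condition is almost certainly wrong. Are you trying to print a line only if $5 is HYPERION? If so, try else{if(toupper($5)=="HYPERION"){print}}. I am not at my computer right now, so my syntax may be off, but you can't give a condition to an else statement.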

— terdon

awk -F '","' 'BEGIN {OFS=","} { if (NR==1) {print} else{if (toupper($5) == "STRING 1") print} }' file1
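The same header-preserving filter can also be written as a bare awk pattern, since a pattern with no action prints the record. This is only a sketch: the temporary file and the shortened rows are invented for illustration, and the mixed-case "String 1" row is there to show the effect of toupper.

```shell
# Illustrative sample: a header plus three shortened data rows.
sample=$(mktemp)
cat > "$sample" <<'EOF'
"column 1","column 2","column 3","column 4","column 5","column 6"
"12310","42324564756","a , comma","or, without","string 1","USD"
"23455","12312255564","string, with","or, without","String 2","USD"
"23525","74535243123","string , with","or, without","String 1","CAND"
EOF

# NR == 1 keeps the header line; the second test keeps the rows whose
# fifth field equals "string 1", ignoring case.
filtered=$(awk -F '","' 'NR == 1 || toupper($5) == "STRING 1"' "$sample")

printf '%s\n' "$filtered"
rm -f "$sample"
```

Because the record is never modified, it is printed exactly as read, so setting OFS is not needed in this form.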

— limovala, 2013

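The problem with CSV is that there is no standard. If you need to deal with CSV-formatted data often, you may want to look into a more robust method than simply using "," as the field separator. In this case, Perl's Text::CSV CPAN modules are exceptionally well suited for the job: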

$ perl -mText::CSV_XS -WlanE '
    BEGIN {our $csv = Text::CSV_XS->new;} 
    $csv->parse($_); 
    my @fields = $csv->fields(); 
    print if $fields[4] =~ /string 1/i;
' file1.csv
"12310","42324564756","a simple string with a , comma","string with or, without commas","string 1","USD","12","70%","08/01/2013",""
"23525","74535243123","string , with commas, and - hypens and: semicolans","string with or, without commas","string 1","CAND","744","70%","05/06/2013",""

-1

awk 'BEGIN {FS = "," }'  '{ (if toupper($5)  == "STRING 1") print; }'  file1.csv > file2.csv

— PersianGulf

Sorry to say, your solution did not return any records from the file... I think just adding the delimiter '","' would do it... thanks... :)

— Dhruuv

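@Mohsen -1 because 1) you need to escape the ", otherwise they are not considered part of the field delimiter. See the answers to the OP's other question; 2) you have the BEGIN block completely separate from the rest of the command. Just try awk 'BEGIN {FS = "," }' '{print $0}' and you will see that it produces no output. In the future, please test your answers and see that they actually work before posting.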
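To make both points concrete, here is one possible repaired form of that command, offered as a sketch rather than as the poster's intended code. awk takes only the first non-option argument as the program, so the BEGIN block and the main rule have to live in the same quoted string; anything after it is taken as an input file name. Inside the awk string the double quotes of the separator are escaped with backslashes. The temporary file and shortened rows are assumptions made for the demonstration.

```shell
# Illustrative sample with two shortened rows.
sample=$(mktemp)
cat > "$sample" <<'EOF'
"12310","42324564756","a , comma","or, without","string 1","USD"
"23455","12312255564","string, with","or, without","string 2","USD"
EOF

# One program string: BEGIN sets FS to the three characters ","
# (quotes escaped inside the awk string), then the pattern filters on $5.
result=$(awk 'BEGIN { FS = "\",\"" } toupper($5) == "STRING 1"' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```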

— terdon