grep: ignoring a pattern

I am using cURL to extract URLs from a website, as shown below.

curl www.somesite.com | grep "<a href=.*title=" > new.txt

My new.txt file looks like this.

<a href="http://website1.com" title="something">
<a href="http://website1.com" information="something" title="something">
<a href="http://website2.com" title="some_other_thing">
<a href="http://website2.com" information="something" title="something">
<a href="http://websitenotneeded.com" title="something NOTNEEDED">

However, I only need to extract the following lines.

<a href="http://website1.com" title="something">
<a href="http://website2.com" information="something" title="something">

I am trying to skip the <a href entries that contain information and whose title ends with NOTNEEDED.

How can I modify my grep statement?

grep

— Ramesh

Is the output you show here correct? The text describing it does not make sense for this example.

— slm

Aren't you looking for curl www.somesite.com | grep "<a href=.*title=" | grep -v NOTNEEDED > new.txt?

— terdon

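@terdon, that is exactly what I wanted. If you post it, I can accept it as the answer.

— Ramesh, 2014

Ramesh, that is basically @slm's answer. I just edited it so you can accept it.

— terdon

Oh yes, I didn't realize pipes were so powerful. I have accepted it as the answer. Thanks!

— Ramesh, 2014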

Answers:

I don't entirely follow your example and description, but it sounds like this is what you want:

$ grep -v "<a href=.*title=.*NOTNEEDED" sample.txt 
<a href="http://website1.com" title="something">
<a href="http://website1.com" information="something" title="something">
<a href="http://website2.com" title="some_other_thing">
<a href="http://website2.com" information="something" title="something">

So, for your example:

$ curl www.example.com | grep "<a href=.*title=" | grep -v NOTNEEDED > new.txt
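
To try the two-stage filter without hitting a real site, here is a minimal sketch that runs it on the sample lines from the question. The file names sample.txt, new.txt and strict.txt, and the extra <p> line, are assumptions made for illustration. The second command is a stricter variant, not part of the question, that drops a line only when the title value itself ends in NOTNEEDED.

```shell
# Hypothetical sample data standing in for the curl output.
cat > sample.txt <<'EOF'
<a href="http://website1.com" title="something">
<a href="http://website1.com" information="something" title="something">
<a href="http://website2.com" title="some_other_thing">
<a href="http://website2.com" information="something" title="something">
<a href="http://websitenotneeded.com" title="something NOTNEEDED">
<p>not a link</p>
EOF

# Stage 1 keeps anchor lines that carry a title attribute.
# Stage 2 drops any of those lines that mention NOTNEEDED anywhere.
grep '<a href=.*title=' sample.txt | grep -v 'NOTNEEDED' > new.txt

# Stricter variant: drop a line only if the title value ends with NOTNEEDED.
grep '<a href=.*title=' sample.txt | grep -v 'title="[^"]*NOTNEEDED"' > strict.txt
```

grep is case-sensitive by default, so the lowercase "notneeded" in the last URL does not trigger the filter on its own; that line is dropped because of its title.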

— slm

I have a class inside the <a href section. Basically, I don't want that to show up in the output.

— Ramesh, 2014

The grep man page says:

-v, --invert-match
    Invert the sense of matching, to select non-matching lines. (-v is specified by POSIX.)

You can use a regular expression to invert on several patterns at once:

grep -v 'red\|green\|blue'

or

grep -v red | grep -v green | grep -v blue
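
For comparison, here is a small sketch of the same exclusion written a few different ways. The input file colors.txt and the output file names are made up for illustration. The \| alternation in a basic regular expression is a GNU extension, so the -E and -e forms may be the safer choice on other systems.

```shell
# Hypothetical input: one entry per line.
printf '%s\n' red green blue yellow 'dark red' orange > colors.txt

# Basic regex with \| alternation (a GNU extension, as in the answer above).
grep -v 'red\|green\|blue' colors.txt > a.txt

# Extended regex: | is alternation without the backslash.
grep -Ev 'red|green|blue' colors.txt > b.txt

# Several -e patterns: a line is dropped if it matches any one of them.
grep -v -e red -e green -e blue colors.txt > c.txt

# Chained filters, one pattern per pipeline stage.
grep -v red colors.txt | grep -v green | grep -v blue > d.txt
```

Each form should leave only the lines that contain none of the three words. Note that "dark red" is removed as well, because the patterns match substrings rather than whole lines.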

— Yes, that is my name