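I am trying to output the part of a string that lies between two words:
Input:
"Here is a String"
Output:
"is a"
Using:
sed -n '/Here/,/String/p'
includes the endpoints, but I do not want to include them.
The more common question is "how do I extract the text between specific lines"; that one is stackoverflow.com/questions/16643288/...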
Answers:
sed -e 's/Here\(.*\)String/\1/'
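Comment: with
echo "Here is a one is a String" | sed -e 's/one is\(.*\)String/\1/'
the text before "one is" is left in the output. If you want only the part between "one is" and "String", the regular expression has to match the whole line: sed -e 's/.*one is\(.*\)String.*/\1/'
In sed, s/pattern/replacement/ means "on each line, replace pattern with replacement". It changes only what matches the pattern, so to replace the entire line the pattern must match the entire line.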
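To make the difference concrete, here is a minimal sketch that runs both forms on the same line. The sample line (with extra text before Here and after String) and the variable names are made up for illustration.

```shell
# Sample line with extra text on both sides of the markers (illustrative).
line='x Here is a String y'

# Pattern matches only "Here ... String": text outside the match is kept.
partial=$(printf '%s\n' "$line" | sed -e 's/Here\(.*\)String/\1/')

# Pattern matches the whole line: only the captured group is kept.
whole=$(printf '%s\n' "$line" | sed -e 's/.*Here\(.*\)String.*/\1/')

printf 'partial: [%s]\n' "$partial"
printf 'whole:   [%s]\n' "$whole"
```

The brackets in the output make the leading and trailing spaces visible; the capture keeps the spaces that sit next to the two words.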
Here is a String Here is a String
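GNU grep also supports positive and negative lookahead and lookbehind. For your case the command is:
echo "Here is a string" | grep -o -P '(?<=Here).*(?=string)'
If Here and string each occur more than once, you can choose whether to match from the first Here to the last string, or to match each pair separately. In regex terms these are called greedy matching (the first case) and non-greedy matching (the second case):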
$ echo 'Here is a string, and Here is another string.' | grep -oP '(?<=Here).*(?=string)' # Greedy match
is a string, and Here is another
$ echo 'Here is a string, and Here is another string.' | grep -oP '(?<=Here).*?(?=string)' # Non-greedy match (Notice the '?' after '*' in .*)
is a
is another
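Comment: the -P option does not exist in the grep shipped with *BSD or with any SVR4 system (Solaris, etc.). On FreeBSD you can install the devel/pcre port, which includes pcregrep, which supports PCRE (and lookahead/lookbehind). Older versions of OS X used GNU grep, but in OS X Mavericks grep is derived from FreeBSD's version, which does not include the -P option.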
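Since -P is not available everywhere, one possible workaround is to probe for it and fall back to sed. This is only a sketch: the function name between is hypothetical, and the two branches agree only when Here and string each occur once per line (with several occurrences, the greedy grep pattern starts at the first Here, while the sed pattern starts at the last one).

```shell
# Hypothetical helper: print the text between "Here" and "string"
# for each line on standard input.
between() {
  # Probe whether this grep accepts -P (PCRE); errors are silenced.
  if printf 'test\n' | grep -qP 'te(?=st)' 2>/dev/null; then
    grep -oP '(?<=Here).*(?=string)'
  else
    # Fallback for greps without -P: whole-line substitution in sed.
    sed -n 's/.*Here\(.*\)string.*/\1/p'
  fi
}

result=$(echo "Here is a string" | between)
printf '[%s]\n' "$result"
```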
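Comment: for Here is a string a string, both " is a " and " is a string a " are valid answers (ignoring the quotes) as the question is stated. It depends on which one you want, and the answer may differ accordingly. In any case, for your requirement this will work: echo "Here is a string a string" | grep -o -P '(?<=Here).*?(?=string)'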
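The accepted answer does not remove the text before Here or after String. This will:
sed -e 's/.*Here\(.*\)String.*/\1/'
The main difference is the addition of .* immediately before Here and immediately after String.
Comment: . does not match a newline. If you want to match newlines, you can replace . with something like [\s\S] in a regex engine that supports it. Note that sed and grep read their input one line at a time by default, so the pattern never sees a newline unless the lines are joined first.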
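As a sketch of the [\s\S] idea, GNU grep can be asked to treat the whole input as one record with -z, so that a PCRE pattern is able to cross line breaks. This assumes a GNU grep built with PCRE support; the two-line sample text is made up for illustration.

```shell
# Two-line sample in which the markers sit on different lines (illustrative).
text=$(printf 'Here is\na String')

# -z: records end at NUL instead of newline, so the pattern sees the newline.
# [\s\S]*? matches any character, newline included, non-greedily.
# tr turns the NUL that -z appends to each match into a newline.
out=$(printf '%s\n' "$text" \
  | grep -zoP '(?<=Here )[\s\S]*?(?= String)' \
  | tr '\0' '\n')

printf '%s\n' "$out"
```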
You can strip the string in Bash alone:
$ foo="Here is a String"
$ foo=${foo##*Here }
$ echo "$foo"
is a String
$ foo=${foo%% String*}
$ echo "$foo"
is a
$
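The choice between # and ## (and between % and %%) matters once the words occur more than once. A small sketch, using a made-up sample string, shows three combinations:

```shell
s='Here is a String, and Here is another String'

# ## removes the longest matching prefix, %% the longest matching suffix:
# what remains lies between the last "Here " and the following " String".
t=${s##*Here }
last_pair=${t%% String*}

# # removes the shortest prefix, % the shortest suffix:
# what remains runs from the first "Here " to the last " String".
t=${s#*Here }
widest=${t% String*}

# Shortest prefix, longest suffix: between the first "Here " and
# the first " String" after it.
t=${s#*Here }
first_pair=${t%% String*}

printf '%s\n' "$last_pair" "$widest" "$first_pair"
```

These expansions are part of POSIX shell parameter expansion, so the same lines should also work in sh, not only in Bash.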
If you have a GNU grep that includes PCRE support, you can use zero-width assertions:
$ echo "Here is a String" | grep -Po '(?<=(Here )).*(?= String)'
is a
With GNU awk,
$ echo "Here is a string" | awk -v FS="(Here|string)" '{print $2}'
is a
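grep with the -P (perl-regexp) option supports \K, which discards the characters matched so far. In our case the previously matched string is Here, so it is dropped from the final output.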
$ echo "Here is a string" | grep -oP 'Here\K.*(?=string)'
is a
$ echo "Here is a string" | grep -oP 'Here\K(?:(?!string).)*'
is a
If you want the output to be is a without the surrounding spaces, you can try the following:
$ echo "Here is a string" | grep -oP 'Here\s*\K.*(?=\s+string)'
is a
$ echo "Here is a string" | grep -oP 'Here\s*\K(?:(?!\s+string).)*'
is a
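One reason to prefer \K here is that the prefix before it may have variable length, whereas an unbounded quantifier such as \s* is not accepted inside a PCRE lookbehind. The sketch below assumes a grep with -P and uses a made-up input with runs of several spaces.

```shell
# Input with a variable amount of whitespace around the words (illustrative).
input='Here     is a    string'

# Here\s*  consumes the word and any following whitespace,
# \K       drops everything matched so far from the reported match,
# .*?      takes as little as possible,
# (?=\s+string) stops right before the whitespace that precedes "string".
trimmed=$(printf '%s\n' "$input" | grep -oP 'Here\s*\K.*?(?=\s+string)')

printf '[%s]\n' "$trimmed"
```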
Comment: @Avinash Raj, echo "Here is a string dfdsf Here is a string" | awk -v FS="(Here|string)" '{print $2}' returns only is a rather than is a is a.
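One way to get every occurrence with the same field separator is to loop over the even-numbered fields. This is a sketch that assumes Here and string strictly alternate on the line; the function name all_between is hypothetical.

```shell
# Hypothetical helper: print each piece of text that sits between
# a "Here" and the following "string", one piece per output line.
all_between() {
  # With FS matching either word, field 1 is the text before the first
  # "Here", field 2 the text between "Here" and "string", field 3 the
  # text between "string" and the next "Here", and so on.
  # i < NF skips a trailing piece that has no closing word.
  awk -v FS='(Here|string)' '{ for (i = 2; i < NF; i += 2) print $i }'
}

out=$(echo "Here is a string dfdsf Here is a string" | all_between)
printf '%s\n' "$out"
```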
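If your file is long and contains many lines, it is useful to print the line numbers first:
cat -n file | sed -n '/Here/,/String/p'
Comment: -n is the cat option that numbers the output lines.
Comment: cat can be omitted entirely; sed knows how to read a file or standard input.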
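If the line numbers are wanted without cat, awk can print them itself through NR. The sketch below uses a made-up six-line sample and shows both the range including the marker lines and a variant that leaves them out, as the question asked.

```shell
# Six-line sample; the markers are on lines 2 and 5 (illustrative).
sample=$(printf 'intro\nHere\nfirst\nsecond\nString\noutro')

# Range pattern: numbered lines from the "Here" line through the "String" line.
with_ends=$(printf '%s\n' "$sample" \
  | awk '/Here/,/String/ { print NR ": " $0 }')

# Flag idiom: the flag is cleared before printing and set after printing,
# so the two marker lines themselves are never printed.
inner=$(printf '%s\n' "$sample" \
  | awk '/String/ { inside = 0 } inside { print NR ": " $0 } /Here/ { inside = 1 }')

printf '%s\n---\n%s\n' "$with_ends" "$inner"
```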
To understand the sed command, we have to build it up step by step.
Here is your original text:
user@linux:~$ echo "Here is a String"
Here is a String
user@linux:~$
Let's try to remove the string Here using sed's s (substitution) command:
user@linux:~$ echo "Here is a String" | sed 's/Here //'
is a String
user@linux:~$
At this point, I believe you will be able to remove String as well:
user@linux:~$ echo "Here is a String" | sed 's/String//'
Here is a
user@linux:~$
But this is not your desired output.
To combine the two sed commands, use the -e option:
user@linux:~$ echo "Here is a String" | sed -e 's/Here //' -e 's/String//'
is a
user@linux:~$
Hope this helps.
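As a small variation on the last step, the two substitutions can also be written in a single script separated by a semicolon, and including the space in front of String avoids the trailing blank in the output. This is a sketch of an equivalent spelling, not a different technique; the variable names are illustrative.

```shell
# Two -e expressions, with the space before "String" removed as well.
two_e=$(echo "Here is a String" | sed -e 's/Here //' -e 's/ String//')

# The same two commands in one script, separated by ';'.
one_script=$(echo "Here is a String" | sed 's/Here //; s/ String//')

printf '[%s]\n[%s]\n' "$two_e" "$one_script"
```

Note that this approach deletes the two words wherever they appear and keeps everything else, so any text before Here or after String stays in the output.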
You can use \1 (see http://www.grymoire.com/Unix/Sed.html#uh-4):
echo "Hello is a String" | sed 's/Hello\(.*\)String/\1/g'
Whatever is matched inside the parentheses is stored as \1.
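With more than one pair of parentheses, the groups are numbered from left to right as \1, \2, \3. A short sketch that swaps the two outer words illustrates the numbering; the variable name is illustrative.

```shell
# \1 = "Hello", \2 = the text in between, \3 = "String".
swapped=$(echo "Hello is a String" \
  | sed 's/\(Hello\)\(.*\)\(String\)/\3\2\1/')

printf '%s\n' "$swapped"
```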
Problem. My stored Claws Mail messages are wrapped as follows, and I am trying to extract the Subject line:
Subject: [SLC38A9 lysosomal arginine sensor; mTORC1 pathway] Key molecular
link in major cell growth pathway: Findings point to new potential
therapeutic target in pancreatic cancer [mTORC1 Activator SLC38A9 Is
Required to Efflux Essential Amino Acids from Lysosomes and Use Protein as
a Nutrient] [Re: Nutrient sensor in key growth-regulating metabolic pathway
identified [Lysosomal amino acid transporter SLC38A9 signals arginine
sufficiency to mTORC1]]
Message-ID: <20171019190902.18741771@VictoriasJourney.com>
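Per A2 in this thread, How to use sed/grep to extract text between two words?, the first expression below "works" as long as the matched text does not contain a newline: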
grep -o -P '(?<=Subject: ).*(?=molecular)' corpus/01
[SLC38A9 lysosomal arginine sensor; mTORC1 pathway] Key
However, despite trying numerous variants (.+?; /s; ...), I could not get these to work:
grep -o -P '(?<=Subject: ).*(?=link)' corpus/01
grep -o -P '(?<=Subject: ).*(?=therapeutic)' corpus/01
etc.
Solution 1.
Per "Extract text between two strings on different lines":
sed -n '/Subject: /{:a;N;/Message-ID:/!ba; s/\n/ /g; s/\s\s*/ /g; s/.*Subject: \|Message-ID:.*//g;p}' corpus/01
This gives
[SLC38A9 lysosomal arginine sensor; mTORC1 pathway] Key molecular link in major cell growth pathway: Findings point to new potential therapeutic target in pancreatic cancer [mTORC1 Activator SLC38A9 Is Required to Efflux Essential Amino Acids from Lysosomes and Use Protein as a Nutrient] [Re: Nutrient sensor in key growth-regulating metabolic pathway identified [Lysosomal amino acid transporter SLC38A9 signals arginine sufficiency to mTORC1]]
Solution 2. *
sed ':a;N;$!ba;s/\n/ /g' corpus/01
replaces newlines with spaces.
Chaining that with A2 in How to use sed/grep to extract text between two words?, we get:
sed ':a;N;$!ba;s/\n/ /g' corpus/01 | grep -o -P '(?<=Subject: ).*(?=Message-ID:)'
This gives
[SLC38A9 lysosomal arginine sensor; mTORC1 pathway] Key molecular link in major cell growth pathway: Findings point to new potential therapeutic target in pancreatic cancer [mTORC1 Activator SLC38A9 Is Required to Efflux Essential Amino Acids from Lysosomes and Use Protein as a Nutrient] [Re: Nutrient sensor in key growth-regulating metabolic pathway identified [Lysosomal amino acid transporter SLC38A9 signals arginine sufficiency to mTORC1]]
This variant removes double spaces:
sed ':a;N;$!ba;s/\n/ /g; s/\s\s*/ /g' corpus/01 | grep -o -P '(?<=Subject: ).*(?=Message-ID:)'
giving
[SLC38A9 lysosomal arginine sensor; mTORC1 pathway] Key molecular link in major cell growth pathway: Findings point to new potential therapeutic target in pancreatic cancer [mTORC1 Activator SLC38A9 Is Required to Efflux Essential Amino Acids from Lysosomes and Use Protein as a Nutrient] [Re: Nutrient sensor in key growth-regulating metabolic pathway identified [Lysosomal amino acid transporter SLC38A9 signals arginine sufficiency to mTORC1]]
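For comparison, the same extraction can be sketched in awk without joining the whole file first. The function name extract_subject is hypothetical, and the sketch assumes, as in the message above, that the Message-ID: header directly follows the wrapped Subject: header.

```shell
# Hypothetical helper: print the unwrapped Subject of a message read from
# the given file(s), or from standard input when no file is given.
extract_subject() {
  awk '
    # The next header ends the subject: print what was collected.
    /^Message-ID:/ { if (collecting) { print subject; collecting = 0 } }
    # Continuation line: strip leading blanks and append with one space.
    collecting     { sub(/^[ \t]+/, ""); subject = subject " " $0 }
    # First line: keep everything after "Subject: " (9 characters).
    /^Subject: /   { subject = substr($0, 10); collecting = 1 }
  ' "$@"
}

# Shortened, made-up message in the same shape as the one above.
msg=$(printf 'Subject: [A] Key molecular\n link in major\n pathway\nMessage-ID: <x>')
subject=$(printf '%s\n' "$msg" | extract_subject)
printf '%s\n' "$subject"
```

On the file from the question it would be called as extract_subject corpus/01.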
Comment: for Here is a Here String, what should the result be? Or for I Hereby Dub Thee Sir Stringy?