Answers:
Bash / grep version:
#!/bin/bash
# string-and-first-word.sh
# Finds a string and the first word of the line that contains that string.
text_file="$1"
shift
for string; do
    # Find string in file. Process output one line at a time.
    grep "$string" "$text_file" |
    while read -r line
    do
        # Get the first word of the line.
        first_word="${line%% *}"
        # Remove special characters from the first word.
        first_word="${first_word//[^[:alnum:]]/}"
        # If the first word is the same as the string, don't print it twice.
        if [[ "$string" != "$first_word" ]]; then
            echo -ne "$first_word\t"
        fi
        echo "$string"
    done
done
Call it like this:
./string-and-first-word.sh /path/to/file text thing try Better
Output:
This text
Another thing
It try
Better
Perl to the rescue!
#!/usr/bin/perl
use warnings;
use strict;
my $file = shift;
my $regex = join '|', map quotemeta, @ARGV;
$regex = qr/\b($regex)\b/;
open my $IN, '<', $file or die "$file: $!";
while (<$IN>) {
    if (my ($match) = /$regex/) {
        print my ($first) = /^\S+/g;
        if ($match ne $first) {
            print "\t$match";
        }
        print "\n";
    }
}
Save it as first-plus-word and run it as
perl first-plus-word file.txt text thing try Better
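It builds a regular expression from the words given on the command line. Each line is then matched against that regex; if there is a match, the first word of the line is printed, and if the matched word differs from the first word, the matched word is printed too.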
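For comparison, here is a rough Python sketch of the same idea. The function name first_plus_word and the in-memory sample list are made up for illustration; re.escape stands in for Perl's quotemeta, and the first word is taken with split(), which (unlike the Perl /^\S+/) also tolerates leading whitespace.

```python
import re

def first_plus_word(lines, words):
    """Return 'first<TAB>match' (or just 'first') for each line containing a word."""
    # Escape every word and join with '|', then require word boundaries,
    # mirroring qr/\b($regex)\b/ in the Perl script.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, words)) + r")\b")
    result = []
    for line in lines:
        found = pattern.search(line)
        if found is None:
            continue  # no listed word on this line
        first = line.split()[0]
        match = found.group(1)
        # Print the matched word only when it is not already the first word.
        result.append(first if match == first else first + "\t" + match)
    return result

# Hypothetical sample input, taken from the example file used in this thread.
sample = [
    "This is a single text line",
    "Another thing",
    "It is better you try again",
    "Better",
]
for row in first_plus_word(sample, ["text", "thing", "try", "Better"]):
    print(row)
```

The match is case-sensitive, so "better" in the third line is skipped and only "try" is reported there.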
Here's an awk version:
awk '
    NR==FNR {a[$0]++; next;}
    {
        gsub(/"/,"",$0);
        for (i=1; i<=NF; i++)
            if ($i in a) printf "%s\n", i==1? $i : $1"\t"$i;
    }
' file2 file1
where file2 is the word list and file1 contains the phrases.
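If the awk one-liner is hard to read, the same logic can be sketched in Python. This is only an illustration: match_fields and the sample lists are invented names, and the quoted sample lines are an assumption based on the script stripping double quotes.

```python
def match_fields(word_lines, phrase_lines):
    """Mimic the awk script: load the word list, then scan each phrase field by field."""
    # First file (NR==FNR in awk): remember every word, one per line.
    wanted = {w.rstrip("\n") for w in word_lines}
    out = []
    for line in phrase_lines:
        # gsub(/"/,"",$0): drop double quotes, then split into whitespace fields.
        fields = line.replace('"', "").split()
        for i, field in enumerate(fields):
            if field in wanted:
                # First field matches: print it alone; otherwise first field, tab, match.
                out.append(field if i == 0 else fields[0] + "\t" + field)
    return out

# Assumed sample data: words as in file2, phrases as in file1 (with quotes).
words = ["text", "thing", "try", "Better"]
phrases = [
    '"This is a single text line"',
    '"Another thing"',
    '"It is better you try again"',
    '"Better"',
]
for row in match_fields(words, phrases):
    print(row)
```

Because fields are compared as whole strings, a word only counts when it is an entire field, not a substring of one.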
Here's a Python version:
#!/usr/bin/env python
from __future__ import print_function
import sys
# List of strings that you want
# to search for in the file. Change it
# as you see fit. Remember the commas.
strings = [
    'text', 'thing',
    'try', 'Better'
]
with open(sys.argv[1]) as input_file:
    for line in input_file:
        for string in strings:
            if string in line:
                words = line.strip().split()
                print(words[0], end="")
                if len(words) > 1:
                    # Concatenate so no extra space follows the tab.
                    print("\t" + string)
                else:
                    print("")
$> cat input_file.txt
This is a single text line
Another thing
It is better you try again
Better
$> python ./initial_word.py input_file.txt
This text
Another thing
It try
Better
Side note: the script is python3 compatible, so you can run it with either python2 or python3.
Try this:
$ sed -En 's/(([[:alnum:]]+)[[:space:]].*)?(text|thing|try|Better).*/\2\t\3/p' File
This text
Another thing
It try
Better
If the tab before Better is a problem, try this instead:
$ sed -En 's/(([[:alnum:]]+)[[:space:]].*)?(text|thing|try|Better).*/\2\t\3/; ta; b; :a; s/^\t//; p' File
This text
Another thing
It try
Better
The above was tested on GNU sed (called gsed on OSX). For BSD sed, some minor changes may be needed.
s/(([[:alnum:]]+)[[:space:]].*)?(text|thing|try|Better).*/\2\t\3/
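This looks for a word, [[:alnum:]]+, followed by a space, [[:space:]], followed by anything, .*, followed by one of your words, text|thing|try|Better, followed by anything. If that is found, it is replaced with the first word on the line (if any), a tab, and the matched word.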
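To see what the two capture groups hold, here is a small Python approximation of that pattern. It is a sketch only: capture is a made-up helper, [A-Za-z0-9] and \s stand in for [[:alnum:]] and [[:space:]], and Python's re engine does not follow sed's POSIX matching rules in every case.

```python
import re

# Group 2 is the first word (optional), group 3 is the word being searched for.
PATTERN = re.compile(r"(([A-Za-z0-9]+)\s.*)?(text|thing|try|Better).*")

def capture(line):
    """Return (first_word, matched_word), or None when no listed word occurs."""
    found = PATTERN.search(line)
    if found is None:
        return None
    # Group 2 is unset when the matched word is itself the first word.
    return (found.group(2) or "", found.group(3))

print(capture("This is a single text line"))
print(capture("Better"))
```

For a line consisting only of Better the first group is empty, which is why the plain substitution leaves a leading tab on that line.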
ta; b; :a; s/^\t//; p
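If the substitution command made a substitution, meaning that one of your words was found on the line, the ta command tells sed to jump to label a. If not, we branch (b) to the next line. :a defines the label a. So, if one of your words was found, we (a) run the substitution s/^\t//, which removes a leading tab if there is one, and (b) print (p) the line.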
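The control flow of ta; b; :a; s/^\t//; p can be spelled out as an ordinary loop. The following Python is a hedged sketch, not a drop-in replacement: sed_like_filter is an invented name and the character classes are ASCII approximations of the POSIX ones.

```python
import re

PATTERN = re.compile(r"(([A-Za-z0-9]+)\s.*)?(text|thing|try|Better).*")

def sed_like_filter(lines):
    """Substitute, skip lines without a match, strip a leading tab, keep the rest."""
    out = []
    for line in lines:
        # s/.../\2\t\3/ : first word (possibly empty), a tab, the matched word.
        new, count = PATTERN.subn(
            lambda m: (m.group(2) or "") + "\t" + m.group(3), line, count=1
        )
        if count == 0:
            continue  # 'b': no substitution happened, so nothing is printed (-n)
        # ':a' branch: remove the leading tab, if any, then "print" the line.
        new = re.sub(r"^\t", "", new)
        out.append(new)
    return out

sample = [
    "This is a single text line",
    "Another thing",
    "It is better you try again",
    "Better",
    "no match here",
]
for row in sed_like_filter(sample):
    print(row)
```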
A simple bash/sed approach:
$ while read w; do sed -nE "s/\"(\S*).*$w.*/\1\t$w/p" file; done < words
This text
Another thing
It try
Better
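The while read w; do ...; done < words loop iterates over each line of the file words and saves it as $w. The -n makes sed print nothing by default. The sed command then replaces a double quote followed by non-whitespace (\"(\S*), where the parentheses "capture" the first word matched by \S* so that we can later refer to it as \1), then 0 or more characters (.*), then the word we are looking for ($w), then 0 or more characters again (.*). If this matches, we replace it with only the first word, a tab, and $w (\1\t$w), and print the line (that is what the p in s///p does).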
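The same loop can be sketched in Python to make the order of operations visible. Treat it as an assumption-laden illustration: first_and_word and the sample lists are made up, the input lines are assumed to start with a double quote as the sed pattern requires, and re.escape is used on each word, whereas the shell version interpolates $w into the regex unescaped.

```python
import re

def first_and_word(words, lines):
    """For each word, rewrite every matching line to 'first<TAB>word'."""
    out = []
    for w in words:  # outer loop: one pass over the file per word, like the shell loop
        # A double quote, the first word (captured), anything, the word, anything.
        pattern = re.compile(r'"(\S*).*' + re.escape(w) + r".*")
        for line in lines:
            new, count = pattern.subn(lambda m: m.group(1) + "\t" + w, line, count=1)
            if count:  # s///p prints only when a substitution was made
                out.append(new)
    return out

# Assumed sample data, quoted as the sed pattern expects.
words = ["text", "thing", "try", "Better"]
lines = [
    '"This is a single text line"',
    '"Another thing"',
    '"It is better you try again"',
    '"Better"',
]
for row in first_and_word(words, lines):
    print(repr(row))
```

Results come out grouped by word rather than in file order, and in this sketch a line whose first word is the search word itself yields an empty capture, so it starts with a tab.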
Here's a Ruby version:
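str_list = ['text', 'thing', 'try', 'Better']
File.open(ARGV[0]) do |f|
  f.each_line do |line|
    words = line.split(' ')
    # Check every search string against every line, so the result does not
    # depend on the lines appearing in the same order as str_list.
    str_list.each do |str|
      next unless line.include?(str)
      if words.length == 1
        puts words[0]
      else
        puts words[0] + "\t" + str
      end
    end
  end
end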
The sample text file hello.txt contains
This is a single text line
Another thing
It is better you try again
Better
Running ruby source.rb hello.txt results in
This text
Another thing
It try
Better