How to extract logs between two timestamps

25

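I want to extract all log lines between two timestamps. Some lines may not have a timestamp, but I want those lines too. In short, I want every line that falls between the two timestamps. My log looks like this: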

[2014-04-07 23:59:58] CheckForCallAction [ERROR] Exception caught in +CheckForCallAction :: null
--Checking user--
Post
[2014-04-08 00:00:03] MobileAppRequestFilter [DEBUG] Action requested checkforcall

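Suppose I want to extract everything between 2014-04-07 23:00 and 2014-04-08 02:00.

Note that the start or end timestamp may not be present in the log, but I still want every line between these two timestamps.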

— Amit

Possible duplicate of stackoverflow.com/questions/7575267/...

— Ramesh

Do you need to do this just once, or programmatically at different times?

— Bratchley, 2014

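The reason I ask is that if you know the literal values, you can do two contextual greps (one that grabs everything after the starting delimiter, and another that stops printing at the ending delimiter). If the dates/times can change, you can easily generate them on the fly by feeding the user input through the date -d command and using the result to build the search pattern.

— Bratchley, 2014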
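
As a rough illustration of the date -d idea in the comment above, the sketch below normalizes user input into the timestamp layout used by the log before it serves as a bound. It assumes GNU date (the -d option is not POSIX), and the function name normalize_ts is made up for this example.

```shell
# Hypothetical helper (not from the thread): turn user input into the
# "YYYY-MM-DD HH:MM:SS" layout used by the log. Requires GNU date.
normalize_ts() {
    date -d "$1" '+%Y-%m-%d %H:%M:%S'
}

# Missing seconds are filled in; unparsable input makes date fail,
# so the non-zero exit status can double as input validation.
if start=$(normalize_ts "2014-04-07 23:00") &&
   end=$(normalize_ts "2014-04-08 02:00"); then
    printf 'extracting from %s to %s\n' "$start" "$end"
else
    echo "could not parse the given timestamps" >&2
fi
```

The normalized strings can then be handed to whichever extraction command is used, as literal search patterns or as comparison bounds.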

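@Ramesh, the referenced question is too broad.

— maxschlepzig, 2014

@JoelDavis: I want to do it programmatically. Each time, I just want to enter the desired timestamps and have the logs between them extracted to my /tmp location.

— Amit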

19

You can use awk for this:

$ awk -F'[]]|[[]' \
  '$0 ~ /^\[/ && $2 >= "2014-04-07 23:00" { p=1 }
   $0 ~ /^\[/ && $2 >= "2014-04-08 02:00" { p=0 }
                                        p { print $0 }' log

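where:

-F specifies the characters [ and ] as field separators, using a regular expression
$0 refers to the complete line
$2 refers to the date field
p is used as a boolean variable that guards the actual printing
$0 ~ /regex/ is true if the regex matches $0
>= compares strings lexicographically (equivalent to e.g. strcmp())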
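
To see the flag logic in action, here is a minimal sketch that wraps the command above in a shell function and runs it on a small sample log. The function name extract_range and the sample data are illustrative assumptions, not part of the original answer.

```shell
# Hypothetical wrapper around the awk command above.
# Usage: extract_range START END FILE
extract_range() {
    awk -F'[]]|[[]' -v start="$1" -v end="$2" '
        $0 ~ /^\[/ && $2 >= start { p = 1 }   # first timestamp >= start opens the window
        $0 ~ /^\[/ && $2 >= end   { p = 0 }   # first timestamp >= end closes it again
        p                                     # lines without a timestamp inherit p
    ' "$3"
}

# Sample data modeled on the log format in the question.
log=$(mktemp)
cat > "$log" <<'EOF'
[2014-04-07 22:10:00] Early [INFO] before the window
[2014-04-07 23:59:58] CheckForCallAction [ERROR] Exception caught
--Checking user--
Post
[2014-04-08 02:30:00] Late [DEBUG] after the window
EOF

# Prints the 23:59:58 entry and the two untimestamped lines that follow it.
extract_range "2014-04-07 23:00" "2014-04-08 02:00" "$log"
rm -f "$log"
```

Because the comparison is purely lexicographic, neither bound has to occur in the log; this only works for timestamp layouts whose string order equals their chronological order, such as YYYY-MM-DD HH:MM:SS.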

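Variations

The above command line implements right-open time interval matching. To get closed interval semantics, just increment the right date, e.g.: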

$ awk -F'[]]|[[]' \
  '$0 ~ /^\[/ && $2 >= "2014-04-07 23:00"    { p=1 }
   $0 ~ /^\[/ && $2 >= "2014-04-08 02:00:01" { p=0 }
                                           p { print $0 }' log

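If you want to match timestamps in another format, you have to modify the $0 ~ /^\[/ sub-expression. Note that it serves to exclude lines without any timestamp from the print on/off logic.

For example, for a timestamp format like YYYY-MM-DD HH24:MI:SS (i.e. without the [] brackets), you could modify the command like this: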

$ awk \
  '# the opening brace must be on the same line as the pattern it belongs to
   $0 ~ /^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-2][0-9]:[0-5][0-9]:[0-5][0-9]/ {
        if ($1" "$2 >= "2014-04-07 23:00")     p=1;
        if ($1" "$2 >= "2014-04-08 02:00:01")  p=0;
      }
    p { print $0 }' log

(Note that the field separator has changed as well: the default one is used, which splits on blank/non-blank transitions.)

— Maxschlepzig

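Thanks for sharing the script, but it does not check the end timestamp. Could you please check? Also, let me know what to do if my log entries look like 2014-04-07 23:59:58, i.e. without brackets.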

— Amit

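@Amit, updated the answer.

— maxschlepzig, 2014

Although I don't think this is a string problem (see my answer), you could improve readability, and probably gain a little speed, by not repeating all the tests: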

$1 ~ /^[0-9]{4}-[0-9]{2}-[0-9]{2}/ && $2 ~/[0-2][0-9]:[0-5][0-9]:[0-5][0-9]/ { Time = $1" "$2; if (Time >= "2014-04-07 23:00" ) { p=1 } if (Time >= "2014-04-08 02:00:01" ) { p=0 } }  p

Hi Max, one more small doubt. If I have something like Apr-07-2014 10:51:17, what do I need to change? I tried

$0 ~ /^[a-z|A-Z]{4}-[0-9]{2}-[0-9]{4} [0-2][0-9]:[0-5][0-9]:[0-5][0-9]/ && $1" "$2 >= "Apr-07-2014 11:00" {p=1} $0 ~ /^[a-z|A-Z]{4}-[0-9]{2}-[0-9]{4} [0-2][0-9]:[0-5][0-9]:[0-5][0-9]/ && $1" "$2 >= "Apr-07-2014 12:00:01" {p=0}

but it does not work.

— Amit
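
Two things stand in the way of the attempt above: [a-z|A-Z]{4} demands four characters while "Apr" has only three, and a month-first layout does not sort lexicographically (as strings, "Apr" sorts before "Mar"). One possible sketch, assuming English three-letter month abbreviations, converts each timestamp into a sortable key before comparing. The names extract_range_mon and sortable are hypothetical, and the bounds are expected in the sortable YYYY-MM-DD form.

```shell
# Hypothetical helper: extract_range_mon START END FILE
# Log lines start with e.g. "Apr-07-2014 10:51:17"; START and END are
# given in the sortable form "YYYY-MM-DD HH:MM[:SS]".
extract_range_mon() {
    awk -v start="$1" -v end="$2" '
        # Convert "Apr-07-2014" plus a time into "2014-04-07 <time>".
        function sortable(d, t,    parts, m) {
            split(d, parts, "-")
            m = (index("JanFebMarAprMayJunJulAugSepOctNovDec", parts[1]) + 2) / 3
            return sprintf("%s-%02d-%s %s", parts[3], m, parts[2], t)
        }
        $1 ~ /^[A-Z][a-z][a-z]-[0-9][0-9]-[0-9][0-9][0-9][0-9]$/ &&
        $2 ~ /^[0-2][0-9]:[0-5][0-9]:[0-5][0-9]/ {
            ts = sortable($1, $2)
            if (ts >= start) p = 1
            if (ts >= end)   p = 0
        }
        p   # untimestamped lines keep the current state
    ' "$3"
}

# Usage: extract_range_mon "2014-04-07 11:00" "2014-04-07 12:00:01" log
```

The regular expression spells out each repetition instead of using {n} intervals, since not every awk implementation supports them.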

@awk_FTW, changed the code so that the regular expression is explicitly shared.

— maxschlepzig, 2014

12

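Check out dategrep at https://github.com/mdom/dategrep

Description:

dategrep searches the named input files for lines matching a date range and prints them to stdout.

If dategrep works on a seekable file, it can do a binary search to find the first and last line to print quite efficiently. dategrep can also read from stdin if one of the filename arguments is just a hyphen, but in that case it has to parse every single line, which is slower.

Usage examples: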

dategrep --start "12:00" --end "12:15" --format "%b %d %H:%M:%S" syslog
dategrep --end "12:15" --format "%b %d %H:%M:%S" syslog
dategrep --last-minutes 5 --format "%b %d %H:%M:%S" syslog
dategrep --last-minutes 5 --format rsyslog syslog
cat syslog | dategrep --end "12:15" -

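This limitation, however, may make it unsuitable for your exact problem:

At the moment, dategrep will die as soon as it finds a line that is not parsable. In a future version this will be configurable.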

— cpugeniusmv

I only learned about this a couple of days ago thanks to onethingwell.org/post/81991115668/dategrep, so kudos to him!

— cpugeniusmv, 2014

3

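An alternative to awk or a non-standard tool is to use GNU grep for its contextual greps. GNU grep lets you specify the number of lines to print after a match with -A and the number of lines to print before a match with -B. For example: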

[davisja5@xxxxxxlp01 ~]$ cat test.txt
Ignore this line, please.
This one too while you're at it...
[2014-04-07 23:59:58] CheckForCallAction [ERROR] Exception caught in +CheckForCallAction :: null
--Checking user--
Post
[2014-04-08 00:00:03] MobileAppRequestFilter [DEBUG] Action requested checkforcall
we don't
want these lines.


[davisja5@xxxxxxlp01 ~]$ egrep "^\[2014-04-07 23:59:58\]" test.txt -A 10000 | egrep "^\[2014-04-08 00:00:03\]" -B 10000
[2014-04-07 23:59:58] CheckForCallAction [ERROR] Exception caught in +CheckForCallAction :: null
--Checking user--
Post
[2014-04-08 00:00:03] MobileAppRequestFilter [DEBUG] Action requested checkforcall

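The above essentially tells grep to print the 10,000 lines that follow the line matching the pattern you want to start at, which makes the output start where you want it and run until the end (hopefully), whereas the second egrep in the pipeline tells it to print only the line with the ending delimiter and the 10,000 lines before it. The end result of the two is output that starts where you want and does not go past the point where you told it to stop.

10,000 is just a number I came up with; feel free to change it to a million if you think your output is going to be too long.

— Bratchley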
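
As a possible refinement of the approach above, the arbitrary 10,000 can be replaced by the line count of the file itself, which is always enough context. This is only a sketch: the function name between_literal is made up, it assumes a grep that supports -A and -B (GNU and BSD grep do), and both timestamps still have to occur literally in the log.

```shell
# Hypothetical wrapper (name and arguments are illustrative):
#   between_literal START END FILE
# START and END must be timestamps that literally occur, in brackets,
# at the start of a line in FILE.
between_literal() {
    n=$(( $(wc -l < "$3") ))    # total line count, used as the context size
    grep -A "$n" -e "^\[$1]" "$3" | grep -B "$n" -e "^\[$2]"
}

# Usage: between_literal "2014-04-07 23:59:58" "2014-04-08 00:00:03" test.txt
```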

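How would this work if there is no log entry for the start and end of the range? If the OP wants everything between 14:00 and 15:00, but there is no log entry for 14:00, then what?

It will work about as well as the sed approach, which also searches for literal matches. dategrep is probably the most correct of all the answers given (since you need to be able to be "fuzzy" about which timestamps you accept), but as the answer says, I was just mentioning it as an alternative. That said, if the log is active enough to generate enough output to warrant cutting, it will probably also have some kind of entry for the given time period.

— Bratchley, 2014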

0

Using sed:

#!/bin/bash

E_BADARGS=23

if [ $# -ne "3" ]
then
  echo "Usage: `basename $0` \"<start_date>\" \"<end_date>\" file"
  echo "NOTE:Make sure to put dates in between double quotes"
  exit $E_BADARGS
fi 

isDatePresent(){
        #check if given date exists in file.
        local date=$1
        local file=$2
        grep -q "$date" "$file"
        return $?

}

convertToEpoch(){
    #converts to epoch time
    local _date=$1
    local epoch_date=`date --date="$_date" +%s`
    echo $epoch_date
}

convertFromEpoch(){
    #converts to date/time format from epoch
    local epoch_date=$1
    local _date=`date  --date="@$epoch_date" +"%F %T"`
    echo $_date

}

getDates(){
        # collects all dates at beginning of lines in a file, converts them to epoch and returns a sequence of numbers
        local file="$1"
        local state="$2"
        local i=0
        local date_array=( )
        # compare as strings: -eq would evaluate "S" and "E" arithmetically (both 0)
        if [[ "$state" == "S" ]];then
            datelist=`cat "$file" | sed -r -e "s/^\[([^\[]+)\].*/\1/" | egrep  "^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}"`
        elif [[ "$state" == "E" ]];then
            datelist=`tac "$file" | sed -r -e "s/^\[([^\[]+)\].*/\1/" | egrep  "^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}"`

        else
            echo "Something went wrong while getting dates..." 1>&2
            exit 500
        fi

        while read _date
            do
                epoch_date=`convertToEpoch "$_date"`
                date_array[$i]=$epoch_date
                #echo "$_date" "$epoch_date" 1>&2

            (( i++ ))
            done<<<"$datelist"
        echo ${date_array[@]}   


}

findneighbours(){
    # search next best date if date is not in the file using recursivity
    IFS="$old_IFS"
    local elt=$1
    shift
    local state="$1"
    shift
    local -a array=( "$@" ) 

    index_pivot=`expr ${#array[@]} / 2`
    echo "#array="${#array[@]} ";array="${array[@]} ";index_pivot="$index_pivot 1>&2
    if [ "$index_pivot" -eq 1 -a ${#array[@]} -eq 2 ];then

        if [ "$state" == "E" ];then
            echo ${array[0]}
        elif [ "$state" == "S" ];then
            echo ${array[(( ${#array[@]} - 1 ))]} 
        else
            echo "State" $state "undefined" 1>&2
            exit 100
        fi

    else
        echo "elt with index_pivot="$index_pivot":"${array[$index_pivot]} 1>&2
        if [ $elt -lt ${array[$index_pivot]} ];then
            echo "elt is smaller than pivot" 1>&2
            array=( ${array[@]:0:(($index_pivot + 1)) } )
        else
            echo "elt is bigger than pivot" 1>&2
            array=( ${array[@]:$index_pivot:(( ${#array[@]} - 1 ))} ) 
        fi
        findneighbours "$elt" "$state" "${array[@]}"
    fi
}



findFirstDate(){
    local file="$1"
    echo "Looking for first date in file" 1>&2
    while read line
        do 
            echo "$line" | egrep -q "^\[[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}\]" &>/dev/null
            if [ "$?" -eq "0" ]
            then
                #echo "line=" "$line" 1>&2
                firstdate=`echo "$line" | sed -r -e "s/^\[([^\[]+)\].*/\1/"`
                echo "$firstdate"
                break
            else
                echo $? 1>&2
            fi
        done< <( cat "$file" )



}

findLastDate(){
    local file="$1"
    echo "Looking for last date in file" 1>&2
    while read line
        do 
            echo "$line" | egrep -q "^\[[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}\]" &>/dev/null
            if [ "$?" -eq "0" ]
            then
                #echo "line=" "$line" 1>&2
                lastdate=`echo "$line" | sed -r -e "s/^\[([^\[]+)\].*/\1/"`
                echo "$lastdate"
                break
            else
                echo $? 1>&2
            fi
        done< <( tac "$file" )


}

findBestDate(){

        IFS="$old_IFS"
        local initdate="$1"
        local file="$2"
        local state="$3"
        local first_elts="$4"
        local last_elts="$5"
        local date_array=( )
        local initdate_epoch=`convertToEpoch "$initdate"`   

        if [[ $initdate_epoch -lt $first_elt ]];then
            echo `convertFromEpoch "$first_elt"`
        elif [[ $initdate_epoch -gt $last_elt ]];then
            echo `convertFromEpoch "$last_elt"` 

        else
            date_array=( `getDates "$file" "$state"` )
            echo "date_array="${date_array[@]} 1>&2
            #first_elt=${date_array[0]}
            #last_elt=${date_array[(( ${#date_array[@]} - 1 ))]}

            echo `convertFromEpoch $(findneighbours "$initdate_epoch" "$state" "${date_array[@]}")`

        fi

}


main(){
    init_date_start="$1"
    init_date_end="$2"
    filename="$3"
    echo "problem start.." 1>&2
    date_array=( "$init_date_start","$init_date_end"  )
    flag_array=( 0 0 )
    i=0
    #echo "$IFS" | cat -vte
    old_IFS="$IFS"
    #changing separator to avoid whitespace issue in date/time format
    IFS=,
    for _date in ${date_array[@]}
    do
        #IFS="$old_IFS"
        #echo "$IFS" | cat -vte
        if isDatePresent "$_date" "$filename";then
            if [ "$i" -eq 0 ];then 
                echo "Starting date exists" 1>&2
                #echo "date_start=""$_date" 1>&2
                date_start="$_date"
            else
                echo "Ending date exists" 1>&2
                #echo "date_end=""$_date" 1>&2
                date_end="$_date"
            fi

        else
            if [ "$i" -eq 0 ];then 
                echo "start date $_date not found" 1>&2
            else
                echo "end date $_date not found" 1>&2
            fi
            flag_array[$i]=1
        fi
        #IFS=,
        (( i++ ))
    done

    IFS="$old_IFS"
    if [ ${flag_array[0]} -eq 1 -o ${flag_array[1]} -eq 1 ];then

        first_elt=`convertToEpoch "$(findFirstDate "$filename")"`
        last_elt=`convertToEpoch "$(findLastDate "$filename")"`
        border_dates_array=( "$first_elt","$last_elt" )

        #echo "first_elt=" $first_elt "last_elt=" $last_elt 1>&2
        i=0
        IFS=,
        for _date in ${date_array[@]}
        do
            if [ $i -eq 0 -a ${flag_array[$i]} -eq 1 ];then
                date_start=`findBestDate "$_date" "$filename" "S" "${border_dates_array[@]}"`
            elif [ $i -eq 1 -a ${flag_array[$i]} -eq 1 ];then
                date_end=`findBestDate "$_date" "$filename" "E" "${border_dates_array[@]}"`
            fi

            (( i++ ))
        done
    fi


    sed -r -n "/^\[${date_start}\]/,/^\[${date_end}\]/p" "$filename"

}


main "$1" "$2" "$3"

Copy this into a file. Debug information is sent to stderr, so if you don't want to see it, just add "2>/dev/null".

— UnX
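
Once both boundaries are known to exist in the file, the extraction itself comes down to the script's last command, a sed address range. The minimal sketch below isolates that step. The name between_sed is made up, and it assumes both timestamps occur literally in the log: if the start one is missing nothing is printed, and if the end one is missing sed prints through to the end of the file.

```shell
# Hypothetical helper: between_sed START END FILE
# Prints from the line starting with "[START]" through the line starting
# with "[END]", including any untimestamped lines in between.
between_sed() {
    sed -n "/^\[$1]/,/^\[$2]/p" "$3"
}

# Usage: between_sed "2014-04-07 23:59:58" "2014-04-08 00:00:03" logfile
```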

1

This does not display the log lines that have no timestamp.

— Amit

@Amit, yes it does. Have you tried it?

— UnX, 2014

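@rMistero, it won't work, because if there is no log entry at 22:30, the range will not be terminated. As the OP mentioned, the start and stop times may not be in the log. You can tweak your regex to make it work, but you will lose resolution, and there is no guarantee in advance that the range will terminate at the right time.

@awk_FTW this was an example; I did not use the timestamps provided by Amit. Again, a regex can be used. I agree, though, that it won't work if an explicitly provided timestamp does not exist in the log, or if no line matches the timestamp regex. I will improve it soon.

— UnX

"As the OP mentioned, the start and stop times may not be in the log." No, read the OP again. The OP says those markers WILL be present, but the lines in between will not necessarily start with a timestamp. It doesn't even make sense to say the stop time might not be there. How could you ever tell any tool where to stop if the terminating marker isn't guaranteed to be there? There would be no criterion you could give the tool to tell it where to stop processing.

— Bratchley, 2014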