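As Stack Overflow grows, we're starting to look closely at our IIS logs to identify problem HTTP clients: things like rogue web spiders, users who have a large page set to refresh every second, poorly written one-off web scrapers, tricksy users who try to increment the page count a zillion times, and so forth.
I've come up with a few LogParser queries that help us identify most of the oddities and abnormalities when pointed at an IIS log file.
Top bandwidth usage by URL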
SELECT TOP 50 DISTINCT
    SUBSTR(TO_LOWERCASE(cs-uri-stem), 0, 55) AS Url,
    COUNT(*) AS Hits,
    AVG(sc-bytes) AS AvgBytes,
    SUM(sc-bytes) AS ServedBytes
FROM {filename}
GROUP BY Url
HAVING Hits >= 20
ORDER BY ServedBytes DESC
Url                                                Hits  AvgBytes ServedBytes
-------------------------------------------------- ----- -------- -----------
/favicon.ico                                       16774 522      8756028
/content/img/search.png                            15342 446      6842532
Top hits by URL
SELECT TOP 100
cs-uri-stem AS Url,
COUNT(cs-uri-stem) AS Hits
FROM {filename}
GROUP BY cs-uri-stem
ORDER BY COUNT(cs-uri-stem) DESC
Url                                                Hits
-------------------------------------------------- -----
/content/img/sf/vote-arrow-down.png                14076
/content/img/sf/vote-arrow-up.png                  14018
Top bandwidth and hits by IP / User-Agent
SELECT TOP 30
    c-ip AS Client,
    SUBSTR(cs(User-Agent), 0, 70) AS Agent,
    SUM(sc-bytes) AS TotalBytes,
    COUNT(*) AS Hits
FROM {filename}
GROUP BY c-ip, cs(User-Agent)
ORDER BY TotalBytes DESC
Client        Agent                                     TotalBytes Hits
------------- ----------------------------------------- ---------- -----
66.249.68.47  Mozilla/5.0+(compatible;+Googlebot/2.1;   135131089  16640
194.90.190.41 omgilibot/0.3++omgili.com                 133805857  6447
Top bandwidth by hour by IP / User-Agent
SELECT TOP 30
    TO_STRING(time, 'h') AS Hour,
    c-ip AS Client,
    SUBSTR(cs(User-Agent), 0, 70) AS Agent,
    SUM(sc-bytes) AS TotalBytes,
    COUNT(*) AS Hits
FROM {filename}
GROUP BY c-ip, cs(User-Agent), Hour
ORDER BY SUM(sc-bytes) DESC
Hour Client        Agent                                TotalBytes Hits
---- ------------- ------------------------------------ ---------- ----
9    194.90.190.41 omgilibot/0.3++omgili.com            30634860   1549
10   194.90.190.41 omgilibot/0.3++omgili.com            29070370   1503
Top hits by hour by IP / User-Agent
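For anyone who wants to sanity-check the hourly grouping without LogParser, here is a rough Python sketch of the same idea. It is not part of the queries above: the helper names `parse_w3c` and `bandwidth_by_hour` are made up for illustration, and it assumes a W3C extended log whose `#Fields:` line includes `time`, `c-ip`, `cs(User-Agent)` and `sc-bytes`.

```python
from collections import Counter


def parse_w3c(lines):
    """Yield one dict per request line, keyed by the names in the #Fields: directive."""
    fields = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("#Fields:"):
            # The directive lists the column names, separated by spaces.
            fields = line.split()[1:]
        elif line and not line.startswith("#") and fields:
            values = line.split(" ")
            if len(values) == len(fields):  # skip truncated or malformed lines
                yield dict(zip(fields, values))


def bandwidth_by_hour(records, top=30):
    """Group by (hour, client IP, user agent) and rank by total bytes served."""
    total_bytes = Counter()
    hits = Counter()
    for rec in records:
        hour = int(rec["time"].split(":")[0])
        key = (hour, rec["c-ip"], rec["cs(User-Agent)"])
        total_bytes[key] += int(rec["sc-bytes"])
        hits[key] += 1
    ranked = sorted(total_bytes, key=lambda k: total_bytes[k], reverse=True)[:top]
    # Truncate the agent only for display, as the SUBSTR(..., 0, 70) call does.
    return [(h, ip, agent[:70], total_bytes[(h, ip, agent)], hits[(h, ip, agent)])
            for (h, ip, agent) in ranked]


# Hypothetical usage; the path is whatever log file you are inspecting:
# with open(log_path, encoding="utf-8", errors="replace") as f:
#     for row in bandwidth_by_hour(parse_w3c(f)):
#         print(*row)
```

Sorting on hit count instead of total bytes gives the "top hits by hour" variant.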
SELECT TOP 30
    TO_STRING(time, 'h') AS Hour,
    c-ip AS Client,
    SUBSTR(cs(User-Agent), 0, 70) AS Agent,
    COUNT(*) AS Hits,
    SUM(sc-bytes) AS TotalBytes
FROM {filename}
GROUP BY c-ip, cs(User-Agent), Hour
ORDER BY Hits DESC
Hour Client        Agent                                Hits TotalBytes
---- ------------- ------------------------------------ ---- ----------
10   194.90.190.41 omgilibot/0.3++omgili.com            1503 29070370
12   66.249.68.47  Mozilla/5.0+(compatible;+Googlebot/2.1 1363 13186302
The {filename} of course would be a path to an IIS log file, such as
c:\working\sologs\u_ex090708.log
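If you keep the queries as templates, the {filename} substitution is easy to script. The following is only a sketch of one way to do it in Python. The helper names `fill_query` and `logparser_argv` are invented here, and it assumes `LogParser.exe` is on the PATH and that the logs are in W3C extended format (hence the `-i:IISW3C` input format switch).

```python
import subprocess

# One of the query templates from above, with the {filename} placeholder intact.
TOP_HITS_QUERY = """
SELECT TOP 100
    cs-uri-stem AS Url,
    COUNT(cs-uri-stem) AS Hits
FROM {filename}
GROUP BY cs-uri-stem
ORDER BY COUNT(cs-uri-stem) DESC
"""


def fill_query(template, log_path):
    """Substitute the log path and collapse the query onto a single line."""
    # str.replace is used instead of str.format so that any other braces
    # in a query are left untouched.
    return " ".join(template.replace("{filename}", log_path).split())


def logparser_argv(query, exe="LogParser.exe"):
    """Build the argument list for running one query (assumed executable name)."""
    return [exe, "-i:IISW3C", query]


if __name__ == "__main__":
    query = fill_query(TOP_HITS_QUERY, r"c:\working\sologs\u_ex090708.log")
    print(query)
    # Uncomment on a machine where LogParser is installed:
    # subprocess.run(logparser_argv(query), check=True)
```

Passing the arguments as a list avoids having to quote the query for the shell.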
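I did a lot of web searches for good IIS LogParser queries and found precious little. The 5 above have helped us tremendously in identifying serious problem clients. But I'm wondering: what are we missing?
What other ways are there to slice and dice the IIS logs (preferably with LogParser queries) to mine them for statistical anomalies? Do you have any good IIS LogParser queries you run on your servers?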