How to find local peaks/valleys in a series of data?

Here is my experiment:

I am using the findPeaks function from the quantmod package:

I want to detect "local" peaks within a tolerance of 5, i.e. the first locations after the time series drops by 5 from a local peak:

library(quantmod)  # provides findPeaks
aa=100:1
bb=sin(aa/3)
cc=aa*bb
plot(cc, type="l")
p=findPeaks(cc, 5)
points(p, cc[p])
p

The output is

[1] 3 22 41

This seems wrong, because I expected more than 3 "local peaks"...

Any ideas?

r time-series

— Luna

I don't have this package. Can you describe the numerical procedure it uses?

— AdamO

The complete source code of findPeaks appears in my reply, @Adam. By the way, the package is "quantmod".

— whuber

Cross-posted on R-SIG-Finance.

— Joshua Ulrich

You can obtain the source of this code by typing its name at the R prompt. The output is

function (x, thresh = 0) 
{
    pks <- which(diff(sign(diff(x, na.pad = FALSE)), na.pad = FALSE) < 0) + 2
    if (!missing(thresh)) {
        pks[x[pks - 1] - x[pks] > thresh]
    }
    else pks
}

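The test x[pks - 1] - x[pks] > thresh compares each peak value to the value that immediately follows it in the series (not to the next trough in the series). It uses a (crude) estimate of the size of the function's slope immediately after the peak and selects only those peaks where that slope exceeds thresh in size. In your case, only the first three peaks are sharp enough to pass the test. You will detect all the peaks by using the default: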

> findPeaks(cc)
[1]  3 22 41 59 78 96
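To see the thresh test at work on the question's data, here is a Python port of the findPeaks source printed above. It is a sketch written for illustration only: quantmod_style_find_peaks and sign are hypothetical helper names, not quantmod functions, and thresh=None stands in for R's missing(thresh). Indices are reported 1-based, as R does.

```python
import math


def sign(v):
    """Sign of a number as -1, 0 or 1, like R's sign()."""
    return (v > 0) - (v < 0)


def quantmod_style_find_peaks(x, thresh=None):
    """Port of the printed findPeaks source; returns 1-based indices like R.

    thresh=None plays the role of R's missing(thresh).
    """
    s = [sign(x[i + 1] - x[i]) for i in range(len(x) - 1)]
    # k0 is 0-based in diff(sign(diff(x))); R's which() reports k0 + 1, then "+ 2" is added.
    pks = [k0 + 1 + 2 for k0 in range(len(s) - 1) if s[k0 + 1] - s[k0] < 0]
    if thresh is None:
        return pks
    # R: x[pks - 1] - x[pks] > thresh; subtract one more because Python lists are 0-based.
    return [p for p in pks if x[p - 2] - x[p - 1] > thresh]


aa = list(range(100, 0, -1))            # aa = 100:1
cc = [a * math.sin(a / 3) for a in aa]  # cc = aa * sin(aa / 3)

print(quantmod_style_find_peaks(cc, 5))  # [3, 22, 41]
print(quantmod_style_find_peaks(cc))     # [3, 22, 41, 59, 78, 96]
```

For the later, flatter peaks the one-step drop right after the peak is well under 5, so they fail the test even though they are clearly local maxima.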

— whuber

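I agree with whuber's response, but just wanted to add that the "+2" part of the code, which attempts to shift the index to match the newly found peak, actually "overshoots" and should be "+1". For instance, in the example at hand we obtain: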

> findPeaks(cc)
[1]  3 22 41 59 78 96

When we highlight these found peaks on the plot (in bold red), we see that they are consistently 1 point off from the actual peaks.

Consequently,

pks[x[pks - 1] - x[pks] > thresh]

should be either pks[x[pks] - x[pks + 1] > thresh] or pks[x[pks] - x[pks - 1] > thresh].
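The off-by-one can be checked with a small Python sketch (peak_indices is a hypothetical helper written for illustration, not part of quantmod). Entry k (1-based) of diff(sign(diff(x))) is built from x[k], x[k + 1] and x[k + 2], so the turning point it detects is x[k + 1]: the correct shift is +1, not +2.

```python
import math


def sign(v):
    return (v > 0) - (v < 0)


def peak_indices(x, offset):
    """1-based positions where diff(sign(diff(x))) < 0, shifted by `offset`."""
    s = [sign(x[i + 1] - x[i]) for i in range(len(x) - 1)]
    # k0 + 1 is the 1-based position that R's which() would return.
    return [k0 + 1 + offset for k0 in range(len(s) - 1) if s[k0 + 1] - s[k0] < 0]


cc = [a * math.sin(a / 3) for a in range(100, 0, -1)]

plus_two = peak_indices(cc, 2)  # what findPeaks reports
plus_one = peak_indices(cc, 1)  # corrected offset
print(plus_two)  # [3, 22, 41, 59, 78, 96]
print(plus_one)  # [2, 21, 40, 58, 77, 95]

# With +1 every index is a strict local maximum (1-based p is cc[p - 1] in Python).
print(all(cc[p - 2] < cc[p - 1] > cc[p] for p in plus_one))  # True

# Corrected threshold test, R: x[pks] - x[pks + 1] > thresh
print([p for p in plus_one if cc[p - 1] - cc[p] > 5])  # [2, 21, 40]
```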

BIG UPDATE

Following my own quest for an adequate peak-finding function, I wrote this:

find_peaks <- function (x, m = 3){
    shape <- diff(sign(diff(x, na.pad = FALSE)))
    pks <- sapply(which(shape < 0), FUN = function(i){
        z <- i - m + 1
        z <- ifelse(z > 0, z, 1)
        w <- i + m + 1
        w <- ifelse(w < length(x), w, length(x))
        if(all(x[c(z : i, (i + 2) : w)] <= x[i + 1])) return(i + 1) else return(numeric(0))
    })
    pks <- unlist(pks)
    pks
}

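A "peak" is defined as a local maximum whose m neighbouring points on either side are all no larger than it. Hence, the bigger the parameter m, the more stringent the peak-finding procedure. So: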

find_peaks(cc, m = 1)
[1]  2 21 40 58 77 95

The function can also be used to find the local minima of any sequential vector x via find_peaks(-x).
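For readers working outside R, here is a minimal Python port of find_peaks, assuming the same semantics as the R code above: a candidate is any point where the sign of the first difference turns downward, and it is kept only if the m points on each side (clipped at the ends of the series) are all <= it. The port is an illustrative sketch, not Stas_G's code, and it returns 1-based indices to match R.

```python
import math


def sign(v):
    return (v > 0) - (v < 0)


def find_peaks(x, m=3):
    """Python port of find_peaks(); returns 1-based indices like the R version."""
    x = list(x)
    n = len(x)
    peaks = []
    for j in range(1, n - 1):  # j is the 0-based candidate position
        if sign(x[j + 1] - x[j]) - sign(x[j] - x[j - 1]) < 0:
            lo = max(j - m, 0)       # R: z <- ifelse(z > 0, z, 1)
            hi = min(j + m, n - 1)   # R: w <- ifelse(w < length(x), w, length(x))
            neighbours = x[lo:j] + x[j + 1:hi + 1]
            if all(v <= x[j] for v in neighbours):
                peaks.append(j + 1)
    return peaks


v = [1, 3, 2, 5, 4]
print(find_peaks(v, m=1))                # [2, 4]
print(find_peaks(v, m=2))                # [4]: the 5 lies within 2 points of the 3
print(find_peaks([-a for a in v], m=1))  # [3]: a local minimum of v

cc = [a * math.sin(a / 3) for a in range(100, 0, -1)]
print(find_peaks(cc, m=1))               # [2, 21, 40, 58, 77, 95]
```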

Note: I have now put the function on GitHub in case anyone needs it: https://github.com/stas-g/findPeaks

— Stas_G

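Eek: minor update. I had to change two lines of code, the bounds (adding a -1 and a +1), to reach equivalence with Stas_G's function (in real data sets it was finding a few too many "extra peaks"). Apologies to anyone led astray by my original post.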

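I have been using Stas_G's find-peaks algorithm for quite some time now. Its simplicity made it useful for one of my later projects. However, I needed to run it millions of times for a computation, so I rewrote it in Rcpp (see the Rcpp package). In simple tests it is approximately 6x faster than the R version. If anyone is interested, I have added the code below. Hopefully this helps someone, cheers!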

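Some minor caveats. This function returns the peak indices in the reverse order of the R code. It requires an internal C++ sign function, which is included. It has not been fully optimized, but further performance gains are not expected.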

#include <Rcpp.h>
using namespace Rcpp;

//This function returns the sign of a given real valued double.
// [[Rcpp::export]]
double signDblCPP (double x){
  double ret = 0;
  if(x > 0){ret = 1;}
  if(x < 0){ret = -1;}
  return(ret);
}

//Tested to be 6x faster(37 us vs 207 us). This operation is done from 200x per layer
//Original R function by Stas_G
// [[Rcpp::export]]
NumericVector findPeaksCPP( NumericVector vY, int m = 3) {
  int sze = vY.size();
  int i = 0;//generic iterator
  int q = 0;//second generic iterator

  int lb = 0;//left bound
  int rb = 0;//right bound

  bool isGreatest = true;//flag to state whether current index is greatest known value

  NumericVector ret(1);
  int pksFound = 0;

  for(i = 0; i < (sze-2); ++i){
    //Find all regions with negative laplacian between neighbors
    //following expression is identical to diff(sign(diff(xV, na.pad = FALSE)))
    if(signDblCPP( vY(i + 2)  - vY( i + 1 ) ) - signDblCPP( vY( i + 1 )  - vY( i ) ) < 0){
      //Now assess all regions with negative laplacian between neighbors...
      lb = i - m - 1;// define left bound of vector
      if(lb < 0){lb = 0;}//ensure our neighbor comparison is bounded by vector length
      rb = i + m + 1;// define right bound of vector
      if(rb >= (sze-2)){rb = (sze-3);}//ensure our neighbor comparison is bounded by vector length
      //Scan through loop and ensure that the neighbors are smaller in magnitude
      for(q = lb; q < rb; ++q){
        if(vY(q) > vY(i+1)){ isGreatest = false; }
      }

      //We have found a peak by our criterion
      if(isGreatest){
        if(pksFound > 0){//Check vector size.
          ret.insert( 0, double(i + 2) );
        }else{
          ret(0) = double(i + 2);
        }
        pksFound = pksFound + 1;
      }else{ // we did not find a peak, reset location is peak max flag.
        isGreatest = true;
      }//End if found peak
    }//End if laplace condition
  }//End loop
  return(ret);
}//End Fn

— caseyk

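That for loop seems flawed, @caseyk: for(q = lb; q < rb; ++q){ if(vY(q) > vY(i+1)){ isGreatest = false; } } The last run through the loop "wins", which is equivalent to: isGreatest = vY(rb-1) <= vY(rb). To achieve what the comment just above that line claims, the for loop would have to be changed to: for(q = lb; isGreatest && (q < rb); ++q){ isGreatest = (vY(q) <= vY(i+1)); }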

— Bernhard Wagner

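Hmm. It has been a long time since I wrote this code. IIRC it was tested directly against Stas_G's function and gave exactly the same results. Although I do see what you are saying, I am not sure what difference it would make to the output. It would be worth comparing your solution against the one I suggested/adapted.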

— caseyk

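I should also add that I personally tested this script probably on the order of 100 times (assuming it is the one from my project), it has been used well over a million times, and it gives an indirect result that agrees completely with a published result for one specific test case. So, if it is "flawed", it is not that "flawed" ;)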

— caseyk
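Whether a loop like this has the "last iteration wins" problem depends on how the flag is updated. The Python sketch below (hypothetical function names, written only to illustrate the discussion above) contrasts three styles over the same neighbour window: clearing the flag only (the pattern in the posted C++ loop), reassigning it on every pass, and the early-exit form suggested in the comment.

```python
def clear_only(window, peak):
    """Flag starts True and is only ever set to False."""
    is_greatest = True
    for v in window:
        if v > peak:
            is_greatest = False
    return is_greatest


def reassign_each_pass(window, peak):
    """Flag is overwritten on every iteration, so only the last comparison survives."""
    is_greatest = True
    for v in window:
        is_greatest = v <= peak
    return is_greatest


def early_exit(window, peak):
    """Stop as soon as one neighbour exceeds the peak."""
    is_greatest = True
    for v in window:
        if not is_greatest:
            break
        is_greatest = v <= peak
    return is_greatest


window, peak = [5.0, 1.0], 3.0
print(clear_only(window, peak), reassign_each_pass(window, peak), early_exit(window, peak))
# False True False
```

Only the reassigning style reduces to checking the last element of the window; the other two are both equivalent to all(v <= peak for v in window), the early-exit version simply stopping sooner.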

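First: the algorithm also falsely flags the drop to the right of a flat plateau, since sign(diff(x, na.pad = FALSE)) will be 0 and then -1, so its diff will also be -1. A simple fix is to require that the sign-diff preceding the negative entry is not zero but positive: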

    n <- length(x)
    dx.1 <- sign(diff(x, na.pad = FALSE))
    pks <- which(diff(dx.1, na.pad = FALSE) < 0 & dx.1[-(n-1)] > 0) + 1
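A small Python sketch of the same fix, assuming the series is a plain list (peaks_naive and peaks_no_plateau_drop are hypothetical names for illustration). The extra condition s[k] > 0 mirrors dx.1[-(n-1)] > 0: the step into the turning point must be a genuine rise, not a flat run.

```python
def sign(v):
    return (v > 0) - (v < 0)


def peaks_naive(x):
    """1-based indices where the sign of the first difference turns down (+1 offset)."""
    s = [sign(x[i + 1] - x[i]) for i in range(len(x) - 1)]
    return [k + 2 for k in range(len(s) - 1) if s[k + 1] - s[k] < 0]


def peaks_no_plateau_drop(x):
    """Same, but the preceding sign must be positive, so flat-then-drop is not flagged."""
    s = [sign(x[i + 1] - x[i]) for i in range(len(x) - 1)]
    return [k + 2 for k in range(len(s) - 1) if s[k + 1] - s[k] < 0 and s[k] > 0]


x = [1, 3, 2, 2, 1]
print(peaks_naive(x))            # [2, 4]: 4 is the right end of the flat run 2, 2
print(peaks_no_plateau_drop(x))  # [2]
```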

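Second: the algorithm gives very local results, e.g. any "up" followed by a "down" within a run of three consecutive terms in the sequence. If one is interested instead in the local maxima of a noisy continuous function, there are probably other, better approaches, but this is my cheap and immediate solution: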

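First, identify candidate peaks using a running mean of 3 consecutive points, which smooths the data very slightly. Also apply the control described above against flat-then-drop.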
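The 3-point running mean used for the candidates can be sketched in Python as follows, assuming the same end padding as mu.y.loc in the R code below (the first and last means are repeated so the result has the original length). running_mean3 is a hypothetical name.

```python
def running_mean3(x):
    """3-point running mean, padded by repeating the first and last means."""
    if len(x) < 3:
        raise ValueError("need at least 3 points")
    inner = [(x[k] + x[k + 1] + x[k + 2]) / 3 for k in range(len(x) - 2)]
    return [inner[0]] + inner + [inner[-1]]


print(running_mean3([3, 6, 9, 12]))  # [6.0, 6.0, 9.0, 9.0]
```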

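Then, for the loess-smoothed version, filter these candidates by comparing the average within a window centered at each peak with the average of the local terms just outside it.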

"myfindPeaks" <- 
function (x, thresh=0.05, span=0.25, lspan=0.05, noisey=TRUE)
{
  n <- length(x)
  y <- x
  mu.y.loc <- y
  if(noisey)
  {
    mu.y.loc <- (x[1:(n-2)] + x[2:(n-1)] + x[3:n])/3
    mu.y.loc <- c(mu.y.loc[1], mu.y.loc, mu.y.loc[n-2])
  }
  y.loess <- loess(x~I(1:n), span=span)
  y <- y.loess[[2]]
  sig.y <- var(y.loess$resid, na.rm=TRUE)^0.5
  DX.1 <- sign(diff(mu.y.loc, na.pad = FALSE))
  pks <- which(diff(DX.1, na.pad = FALSE) < 0 & DX.1[-(n-1)] > 0) + 1
  out <- pks
  if(noisey)
  {
    n.w <- floor(lspan*n/2)
    out <- NULL
    for(pk in pks)
    {
      if(pk - 2*n.w < 1) next  # skip peaks whose outer window would start before x[1]
      inner <- (pk-n.w):(pk+n.w)
      outer <- c((pk-2*n.w):(pk-n.w),(pk+2*n.w):(pk+n.w))
      mu.y.outer <- mean(y[outer])
      if(!is.na(mu.y.outer)) 
        if (mean(y[inner])-mu.y.outer > thresh*sig.y) out <- c(out, pk)
    }
  }
  out
}
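The inner/outer window filter in myfindPeaks can be sketched in Python as below. This is an illustration under stated assumptions, not a port of the whole function: y is assumed to be an already-smoothed series and sigma a noise scale supplied by the caller (the R code gets both from a loess fit, which is not reproduced here), and window_filter is a hypothetical name. Candidates are 1-based, as in R. Peaks whose outer window would run off either end are skipped; the R code skips the right end through the NA check, and skipping the left end too is this sketch's choice.

```python
import math


def window_filter(y, candidates, sigma, thresh=0.05, lspan=0.05):
    """Keep candidate peaks whose inner-window mean beats the outer-window mean by thresh * sigma."""
    n = len(y)
    n_w = math.floor(lspan * n / 2)
    kept = []
    for pk in candidates:
        if pk - 2 * n_w < 1 or pk + 2 * n_w > n:
            continue  # outer window would fall off an end of the series
        inner = range(pk - n_w, pk + n_w + 1)
        outer = list(range(pk - 2 * n_w, pk - n_w + 1)) + list(range(pk + n_w, pk + 2 * n_w + 1))
        mean_inner = sum(y[i - 1] for i in inner) / len(inner)
        mean_outer = sum(y[i - 1] for i in outer) / len(outer)
        if mean_inner - mean_outer > thresh * sigma:
            kept.append(pk)
    return kept


y = [0, 0, 0, 0, 1, 5, 1, 0, 0, 0, 0]
print(window_filter(y, [2, 6, 9], sigma=1.0, thresh=0.05, lspan=0.2))  # [6]
```

Here n_w is 1, so candidate 2 is skipped (its outer window starts at 0), candidate 6 stands well above its surroundings, and candidate 9 sits in a flat stretch and is rejected.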

— Izmirli

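Indeed, the function also flags the ends of plateaus, but I think there is another, simpler fix: since the sign of the first difference around a real peak is "1" followed by "-1", the second difference will be "-2", and we can check for that directly: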

    pks <- which(diff(sign(diff(x, na.pad = FALSE)), na.pad = FALSE) == -2) + 1
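A short Python sketch of the "-2" check, using hypothetical helper names. It shows that a strict peak produces -2 in the second sign difference while the edge of a plateau only produces -1.

```python
def sign(v):
    return (v > 0) - (v < 0)


def second_sign_diff(x):
    s = [sign(x[i + 1] - x[i]) for i in range(len(x) - 1)]
    return [s[k + 1] - s[k] for k in range(len(s) - 1)]


def strict_peaks(x):
    """1-based indices (as in R) where the second sign difference is exactly -2."""
    return [k + 2 for k, d in enumerate(second_sign_diff(x)) if d == -2]


x = [1, 3, 2, 2, 1]
print(second_sign_diff(x))  # [-2, 1, -1]
print(strict_peaks(x))      # [2]
```

One consequence of this design: a flat-topped peak such as [1, 2, 2, 1] gives second differences of -1 only, so it is not reported at all.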

— aloHola94

This does not seem to answer the question.

— Michael R. Chernick

Using NumPy

import numpy as np

ser = np.random.randint(-40, 40, 100) # 100 points
d = np.diff(ser)
peak = np.where((d[:-1] > 0) & (d[1:] < 0))[0] + 1  # rise followed by fall

or

double_difference = np.diff(np.sign(np.diff(ser)))
peak = np.where(double_difference == -2)[0] + 1  # entry k is centred on ser[k + 1]

Using pandas

import pandas as pd

ser = pd.Series(np.random.randint(2, 5, 100))
peak_df = ser[(ser.shift(1) < ser) & (ser.shift(-1) < ser)]
peak = peak_df.index

— Faizanur Rahman