Calculating moving average

185

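I'm trying to use R to calculate the moving average over a series of values in a matrix. The usual R mailing list search hasn't been very helpful, though. There doesn't seem to be a built-in function in R that lets me calculate moving averages. Do any packages provide one? Or do I need to write my own?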

r moving-average r-faq

— Jared
source

140

Rolling means/maximums/medians in the zoo package (rollmean)
MovingAverages in TTR
ma in forecast

— f3lix
source

1

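Which moving average in R does not include future values relative to a given timestamp? I checked forecast::ma, and it uses the whole neighbourhood around each point, which is not what I want.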

— hhh

213

Or you can simply calculate it using filter. Here is the function I use:

ma <- function(x, n = 5){filter(x, rep(1 / n, n), sides = 2)}

If you use dplyr, be careful to specify stats::filter in the function above.

— Matti Pastell
source

49

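I should point out that "sides = 2" may be an important option that many people won't want to overlook in their use cases. If you want only trailing information in your moving average, you should use sides = 1.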
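To make the difference between the two settings concrete, here is a minimal base R sketch. The sample vector and the window length of 3 are illustrative assumptions; stats::filter is called with its namespace so that a loaded dplyr cannot mask it.

```r
# Illustrative data and window length (assumptions, not from the thread)
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
n <- 3
w <- rep(1 / n, n)  # equal weights that sum to 1

# sides = 2: the window is centered on each position,
# so each result uses one earlier and one later value when n = 3
centered <- as.numeric(stats::filter(x, w, sides = 2))

# sides = 1: the window ends at each position,
# so each result uses only the current and earlier values
trailing <- as.numeric(stats::filter(x, w, sides = 1))

print(centered)
print(trailing)
```

With sides = 2 the NA padding appears at both ends of the result; with sides = 1 it appears only at the start, in the first n - 1 positions.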

— evanrsparks, 2012

35

Some years later, but dplyr now has a filter function. If you have that package loaded, use stats::filter.

— blmoore, 2015

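sides = 2 is equivalent to align = "center" for zoo::rollmean or RcppRoll::roll_mean. sides = 1 is equivalent to "right" alignment. I don't see a way to do "left" alignment, or to calculate with "partial" data (2 or more values)?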
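One possible workaround for left alignment with stats::filter, offered as a sketch rather than a built-in option: reverse the vector, apply a trailing filter (sides = 1), and reverse the result. The data and window length are illustrative assumptions.

```r
# Illustrative data and window length (assumptions)
x <- c(1, 2, 3, 4, 5)
n <- 3
w <- rep(1 / n, n)

# Trailing filter on the reversed vector, then reverse back:
# each result now averages the current value and the n - 1 values after it
left <- rev(as.numeric(stats::filter(rev(x), w, sides = 1)))

print(left)
```

The NA padding ends up in the last n - 1 positions, where a full forward-looking window is not available.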

— Matt L., 2017

29

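Using cumsum should be sufficient and efficient. Assuming you have a vector x and you want a running sum of n numbers (divided by n below to give the running mean):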

cx <- c(0,cumsum(x))
rsum <- (cx[(n+1):length(cx)] - cx[1:(length(cx) - n)]) / n

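As pointed out in the comments by @mzuther, this assumes that there are no NAs in the data. To deal with those, each window has to be divided by its number of non-NA values. Here is one way of doing that, incorporating the comment from @Ricardo Cruz: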

cx <- c(0, cumsum(ifelse(is.na(x), 0, x)))
cn <- c(0, cumsum(ifelse(is.na(x), 0, 1)))
rx <- cx[(n+1):length(cx)] - cx[1:(length(cx) - n)]
rn <- cn[(n+1):length(cx)] - cn[1:(length(cx) - n)]
rsum <- rx / rn

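This still has one issue: if all the values in a window are NA, the window is divided by zero. In R that is 0 / 0, which yields NaN rather than raising an error.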
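One way to guard the all-NA window is to return NA wherever the count of non-NA values is zero. This is a minimal sketch under that assumption; the function name roll_mean_na is hypothetical and not part of the original answer.

```r
# Hypothetical helper: rolling mean over windows of length n,
# ignoring NAs and returning NA for windows that contain only NAs
roll_mean_na <- function(x, n) {
  stopifnot(n >= 1, n <= length(x))
  cx <- c(0, cumsum(ifelse(is.na(x), 0, x)))  # cumulative sum, NAs counted as 0
  cn <- c(0, cumsum(!is.na(x)))               # cumulative count of non-NA values
  hi <- (n + 1):length(cx)
  lo <- 1:(length(cx) - n)
  rx <- cx[hi] - cx[lo]                       # sum of non-NA values per window
  rn <- cn[hi] - cn[lo]                       # number of non-NA values per window
  ifelse(rn > 0, rx / rn, NA_real_)           # avoid reporting 0 / 0 as NaN
}

res <- roll_mean_na(c(1, NA, NA, 4, 5), n = 2)
print(res)
```

The result has length(x) - n + 1 elements, one per complete window, the same as the cumsum code above.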

— pipefish
source

8

One drawback of this solution is that it can't handle missing values: cumsum(c(1:3,NA,1:3))

— Jthorpe

You can easily make it handle NAs: cx <- c(0, cumsum(ifelse(is.na(x), 0, x))).

— Ricardo Cruz

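@Ricardo Cruz: it might be better to remove the NAs and adjust the vector length accordingly. Think of a vector with a lot of NAs: zeros will pull the average toward zero, while removing the NAs leaves the average as it is. Of course, it all depends on your data and the question you want to answer. :)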

— mzuther

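@mzuther, I updated the answer following your comments. Thanks for the input. I think the correct way of dealing with missing data is not to extend the window (by removing the NA values), but to average each window using the correct denominator.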

— pipefish '18

1

rn <- cn[(n+1):length(cx)] - cx[1:(length(cx) - n)] should actually be rn <- cn[(n+1):length(cx)] - cn[1:(length(cx) - n)]

— adrianmcmenamin

22

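In data.table 1.12.0 a new frollmean function was added to compute a fast and exact rolling mean, carefully handling NA, NaN, +Inf and -Inf values.

As there is no reproducible example in the question, there is not much more to address here.

You can find more information under ?frollmean in the manual, which is also available online.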

Examples from the manual:

library(data.table)
d = as.data.table(list(1:6/2, 3:8/4))

# rollmean of single vector and single window
frollmean(d[, V1], 3)

# multiple columns at once
frollmean(d, 3)

# multiple windows at once
frollmean(d[, .(V1)], c(3, 4))

# multiple columns and multiple windows at once
frollmean(d, c(3, 4))

## three above are embarrassingly parallel using openmp

— jangorecki
source

10

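The caTools package has very fast rolling mean/min/max/sd and a few other functions. I've only worked with runmean and runsd, and they are the fastest of all the packages mentioned so far.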

— eddi
source

1

This is awesome! It is the only function that does this in a nice, simple way. And it's 2018 now...

— Felipe Gerard '18

9

You can use RcppRoll for very quick moving averages written in C++. Just call the roll_mean function. Docs can be found here.

Otherwise, this (slower) for loop should do the trick:

ma <- function(arr, n=15){
  res = arr
  for(i in n:length(arr)){
    res[i] = mean(arr[(i-n):i])
  }
  res
}

— cantdutchthis
source

3

Can you please explain in detail how this algorithm works? I can't follow the idea.

— Daniel Yefimov

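First he initializes a vector of the same length with res = arr. Then there is a loop that iterates from n, here the 15th element, to the end of the array. That means the very first subset he takes the mean of is arr[1:15], which fills position res[15]. Personally, I prefer setting res = rep(NA, length(arr)) instead of res = arr, so that each element of res[1:14] is NA rather than a number, since a full average of 15 elements cannot be taken there.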
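A minimal sketch of the variant described in this comment, with the result initialized to NA. The function name ma_na is hypothetical. The window (i - n + 1):i is also an assumption made here so that every mean covers exactly n values; the loop in the answer above uses (i - n):i, which covers n + 1 values once i > n.

```r
# Hypothetical variant: trailing moving average with NA where no full window exists
ma_na <- function(arr, n = 15) {
  res <- rep(NA_real_, length(arr))      # positions 1..(n - 1) stay NA
  if (length(arr) >= n) {
    for (i in n:length(arr)) {
      res[i] <- mean(arr[(i - n + 1):i]) # exactly n values ending at position i
    }
  }
  res
}

out <- ma_na(1:5, n = 3)
print(out)
```

The length check avoids the descending sequence that n:length(arr) would produce when the input is shorter than the window.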

— Evan Friedland

7

In fact, RcppRoll is very good.

The code posted by cantdutchthis must be corrected in the fourth line so that the window has a fixed width:

ma <- function(arr, n=15){
  res = arr
  for(i in n:length(arr)){
    res[i] = mean(arr[(i-n+1):i])
  }
  res
}

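Another approach, which handles missing values, is given here.

A third approach, improving cantdutchthis's code so that partial averages can optionally be computed, follows: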

ma <- function(x, n = 2, parcial = TRUE) {
  res <- x  # same length as x; every element is overwritten below

  for (i in seq_along(x)) {
    t <- max(i - n + 1, 1)  # clamp the window start at the first element
    res[i] <- mean(x[t:i])
  }

  if (parcial || n < 2) {
    res  # keep the partial-window means at the start
  } else {
    res[-seq_len(n - 1)]  # remove the first n-1 values, i.e. the partial windows
  }
}

— Rodrigo Remedio
source

5

To complement the answers of cantdutchthis and Rodrigo Remedio:

moving_fun <- function(x, w, FUN, ...) {
  # x: a double vector
  # w: the length of the window, i.e., the section of the vector selected to apply FUN
  # FUN: a function that takes a vector and return a summarize value, e.g., mean, sum, etc.
  # Given a double type vector apply a FUN over a moving window from left to the right, 
  #    when a window boundary is not a legal section, i.e. lower_bound and i (upper bound) 
  #    are not contained in the length of the vector, return a NA_real_
  if (w < 1) {
    stop("The length of the window 'w' must be greater than 0")
  }
  output <- x
  for (i in 1:length(x)) {
     # plus 1 because the index is inclusive with the upper_bound 'i'
    lower_bound <- i - w + 1
    if (lower_bound < 1) {
      output[i] <- NA_real_
    } else {
      output[i] <- FUN(x[lower_bound:i], ...)  # extra arguments go to FUN, not to the subset
    }
  }
  output
}

# example
v <- seq(1:10)

# compute a MA(2)
moving_fun(v, 2, mean)

# compute moving sum of two periods
moving_fun(v, 2, sum)

— Cristóbal Alcázar
source

2

Here is example code showing how to compute a centered moving average and a trailing moving average using the rollmean function from the zoo package.

library(tidyverse)
library(zoo)

some_data = tibble(day = 1:10)
# cma = centered moving average
# tma = trailing moving average
some_data = some_data %>%
    mutate(cma = rollmean(day, k = 3, fill = NA)) %>%
    mutate(tma = rollmean(day, k = 3, fill = NA, align = "right"))
some_data
#> # A tibble: 10 x 3
#>      day   cma   tma
#>    <int> <dbl> <dbl>
#>  1     1    NA    NA
#>  2     2     2    NA
#>  3     3     3     2
#>  4     4     4     3
#>  5     5     5     4
#>  6     6     6     5
#>  7     7     7     6
#>  8     8     8     7
#>  9     9     9     8
#> 10    10    NA     9

— I Like to Code
source

1

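One can use the runner package for moving functions, in this case the mean_run function. The problem with cummean is that it doesn't handle NA values, but mean_run does. The runner package also supports irregular time series, and windows can depend on date: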

library(runner)
set.seed(11)
x1 <- rnorm(15)
x2 <- sample(c(rep(NA,5), rnorm(15)), 15, replace = TRUE)
date <- Sys.Date() + cumsum(sample(1:3, 15, replace = TRUE))

mean_run(x1)
#>  [1] -0.5910311 -0.2822184 -0.6936633 -0.8609108 -0.4530308 -0.5332176
#>  [7] -0.2679571 -0.1563477 -0.1440561 -0.2300625 -0.2844599 -0.2897842
#> [13] -0.3858234 -0.3765192 -0.4280809

mean_run(x2, na_rm = TRUE)
#>  [1] -0.18760011 -0.09022066 -0.06543317  0.03906450 -0.12188853 -0.13873536
#>  [7] -0.13873536 -0.14571604 -0.12596067 -0.11116961 -0.09881996 -0.08871569
#> [13] -0.05194292 -0.04699909 -0.05704202

mean_run(x2, na_rm = FALSE )
#>  [1] -0.18760011 -0.09022066 -0.06543317  0.03906450 -0.12188853 -0.13873536
#>  [7]          NA          NA          NA          NA          NA          NA
#> [13]          NA          NA          NA

mean_run(x2, na_rm = TRUE, k = 4)
#>  [1] -0.18760011 -0.09022066 -0.06543317  0.03906450 -0.10546063 -0.16299272
#>  [7] -0.21203756 -0.39209010 -0.13274756 -0.05603811 -0.03894684  0.01103493
#> [13]  0.09609256  0.09738460  0.04740283

mean_run(x2, na_rm = TRUE, k = 4, idx = date)
#> [1] -0.187600111 -0.090220655 -0.004349696  0.168349653 -0.206571573 -0.494335093
#> [7] -0.222969541 -0.187600111 -0.087636571  0.009742884  0.009742884  0.012326968
#> [13]  0.182442234  0.125737145  0.059094786

You can also specify other options such as lag, and roll only at specific indexes. See the package and function documentation for more.

— GoGonzo
source

0

Although it is a bit slow, you can also use zoo::rollapply to perform calculations on matrices.

reqd_ma <- rollapply(x, FUN = mean, width = n)

Here x is the data set and FUN = mean is the function; you can also change it to min, max, sd and so on. width is the size of the rolling window.

— Garima Gulati
source

1

It is not slow; compared with base R, it is much faster.

set.seed(123); x <- rnorm(1000); system.time(apply(embed(x, 5), 1, mean));  library(zoo); system.time(rollapply(x, 5, mean))

On my machine it is so fast that it returns a time of 0 seconds.
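For readers unfamiliar with the base R expression in that comparison, here is a small sketch of what embed does. The data and window length are illustrative assumptions, and rowMeans is used here as a vectorized stand-in for apply(..., 1, mean).

```r
# Illustrative data and window length (assumptions)
x <- c(1, 2, 3, 4, 5, 6)
n <- 3

# embed() builds a matrix with one row per complete window of length n;
# within each row the most recent value comes first
m <- embed(x, n)
print(m)

# Averaging each row gives the moving average over complete windows only
ma_embed <- rowMeans(m)
print(ma_embed)
```

The result has length(x) - n + 1 elements and no NA padding, so it has to be aligned with the original vector by hand if positions matter.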

— G. Grothendieck

0

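The slider package can be used for this. It has an interface that was specifically designed to feel similar to purrr. It accepts any arbitrary function and can return any type of output. Data frames are even iterated over row wise. The pkgdown site is here.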

library(slider)

x <- 1:3

# Mean of the current value + 1 value before it
# returned as a double vector
slide_dbl(x, ~mean(.x, na.rm = TRUE), .before = 1)
#> [1] 1.0 1.5 2.5


df <- data.frame(x = x, y = x)

# Slide row wise over data frames
slide(df, ~.x, .before = 1)
#> [[1]]
#>   x y
#> 1 1 1
#> 
#> [[2]]
#>   x y
#> 1 1 1
#> 2 2 2
#> 
#> [[3]]
#>   x y
#> 1 2 2
#> 2 3 3

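The overhead of both slider and data.table's frollapply() should be pretty low (much faster than zoo). frollapply() looks a little faster for this simple example, but note that it only takes numeric input and the output must be a scalar numeric value. slider functions are completely generic, and you can return any data type.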

library(slider)
library(zoo)
library(data.table)

x <- 1:50000 + 0L

bench::mark(
  slider = slide_int(x, function(x) 1L, .before = 5, .complete = TRUE),
  zoo = rollapplyr(x, FUN = function(x) 1L, width = 6, fill = NA),
  datatable = frollapply(x, n = 6, FUN = function(x) 1L),
  iterations = 200
)
#> # A tibble: 3 x 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 slider      19.82ms   26.4ms     38.4    829.8KB     19.0
#> 2 zoo        177.92ms  211.1ms      4.71    17.9MB     24.8
#> 3 datatable    7.78ms   10.9ms     87.9    807.1KB     38.7

— Davis Vaughan
source