What does interaction depth mean in GBM?



I have a question about the interaction depth parameter in gbm in R. This may be a noob question, for which I apologize, but I thought that parameter denotes the number of terminal nodes in a tree, i.e. basically an X-way interaction among the predictors? Just trying to understand how it works. Additionally, I get quite different models if I have a dataset with two different factor variables versus the same dataset with those two factor variables combined into one (e.g. X levels in factor 1, Y levels in factor 2, the combined variable has X * Y levels). The latter is significantly more predictive than the former. I had thought increasing the interaction depth would pick up this relationship.

Answers:



The previous two answers are wrong. The package gbm uses the interaction.depth parameter as the number of splits it has to perform on a tree (starting from a single node). As each split increases the number of nodes by 3 (node → {left node, right node, NA node}) and the number of terminal nodes by 2, the total number of nodes in a tree with N splits will be 3N + 1 and the number of terminal nodes 2N + 1. This can be verified by looking at the output of the function pretty.gbm.tree.
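As a sanity check of the counting argument above, here is a small Python sketch (not gbm itself, just its split bookkeeping as described: each split turns one terminal node into an internal node with three children):

```python
def gbm_tree_counts(n_splits):
    """Simulate gbm's split bookkeeping: each split turns one terminal
    node into an internal node with three children (left, right, NA)."""
    total_nodes, terminal_nodes = 1, 1  # start from a single root node
    for _ in range(n_splits):
        total_nodes += 3      # three new child nodes are added
        terminal_nodes += 2   # one terminal becomes internal, three terminals appear
    return total_nodes, terminal_nodes

# Matches the formulas 3N + 1 and 2N + 1 for any number of splits N.
for n in range(1, 10):
    assert gbm_tree_counts(n) == (3 * n + 1, 2 * n + 1)

print(gbm_tree_counts(3))  # (10, 7)
```

For interaction.depth = 3 this gives 10 nodes in total, 7 of them terminal, consistent with the formulas above.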

This behavior is quite misleading, since a user naturally expects the depth to be the depth of the resulting tree. It is not.


What is N here: the number of nodes, interaction.depth, or something else?
Julian

It is the number of splits performed starting from a single node, i.e. the interaction depth.
random

I think each split only increases the total number of terminal nodes by 1. Suppose a tree has just one split, so it has 2 terminal nodes; now perform a split on one of those terminal nodes, and the tree has 3 terminal nodes. So the increment is only 1. Am I right, or am I misunderstanding something?
Lily Long

@LilyLong It may not be obvious, but gbm actually splits a node into three, with the third child grouping the NA values (i.e. those that cannot be directly compared against the split value). This means each split increases the number of terminal nodes by two. The package may have evolved since I last used it so as to avoid creating the third child, so double-check by running the pretty.gbm.tree function.
random


I have a question about the interaction depth parameter in gbm in R. This may be a noob question, for which I apologize, but I thought that parameter denotes the number of terminal nodes in a tree, i.e. basically an X-way interaction among the predictors?

Link between interaction.depth and the number of terminal nodes

See interaction.depth as the number of splits performed on the tree. An interaction.depth fixed at k will result in a tree with k + 1 terminal nodes (omitting the NA nodes), so we have:

interaction.depth = #{TerminalNodes} − 1

Link between interaction.depth and the interaction order

The link between interaction.depth and interaction order is more tedious.

Instead of reasoning with the interaction.depth, let's reason with the number of terminal nodes, which we will call J.

Example: Let's say you have J = 4 terminal nodes (interaction.depth = 3); you can either:

  1. do the first split on the root, then the second split on the left node of the root and the third split on the right node of the root. The interaction order for this tree will be 2.
  2. do the first split on the root, then the second split on the left (respectively right) node of the root, and a third split on this very left (respectively right) node. The interaction order for this tree will be 3.

So you cannot know in advance what will be the interaction order between your features in a given tree. However it is possible to upper bound this value. Let P be the interaction order of the features in a given tree. We have :

P ≤ min(J − 1, n)
with n being the number of observations. For more details, see section 7 of Friedman's original article.
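The two tree shapes in the example above can be sketched in Python (a hypothetical illustration, ignoring gbm's NA children): a tree is a nested pair (left, right) for an internal node and None for a leaf, and the interaction order of a tree is the depth of its deepest root-to-leaf path:

```python
def n_leaves(tree):
    """Count terminal nodes; a leaf is represented as None."""
    return 1 if tree is None else sum(n_leaves(t) for t in tree)

def interaction_order(tree):
    """Depth of the deepest leaf = max number of splits (features) on one path."""
    return 0 if tree is None else 1 + max(interaction_order(t) for t in tree)

# Case 1: split the root, then each of its two children -> order 2.
balanced = ((None, None), (None, None))
# Case 2: split the root, its left child, then that child's left child -> order 3.
chain = (((None, None), None), None)

assert n_leaves(balanced) == n_leaves(chain) == 4   # both have J = 4
assert interaction_order(balanced) == 2
assert interaction_order(chain) == 3                # reaches the bound J - 1
```

Both trees use the same number of splits yet capture different interaction orders, which is why the order cannot be known in advance, only bounded.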


Previous answer is not correct.

Stumps will have an interaction.depth of 1 (and have two leaves). But interaction.depth=2 gives three leaves.

So: NumberOfLeaves = interaction.depth + 1



Actually, the previous answers are incorrect.

Let K be the interaction.depth; then the number of nodes N and leaves L (i.e. terminal nodes) are respectively given by the following:

N = 2^(K+1) − 1
L = 2^K

These two formulas are easily demonstrated: a tree of depth K can be seen as having K + 1 levels, with k ranging from 0 (root level) to K (leaf level).

Each of these levels has 2^k nodes, and the tree's total number of nodes is the sum of the number of nodes at each level.

In mathematical terms:

N = sum_{k=0}^{K} 2^k

which is equivalent to:

N = 2^(K+1) − 1

(as per the formula for the sum of the terms of a geometric progression).
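The geometric-progression identity used above can be checked numerically with a quick Python sketch:

```python
# Sum of nodes over levels 0..K versus the closed form 2^(K+1) - 1.
for K in range(10):
    level_counts = [2 ** k for k in range(K + 1)]  # 2^k nodes at level k
    assert sum(level_counts) == 2 ** (K + 1) - 1

print(sum(2 ** k for k in range(4)))  # levels 0..3 of a depth-3 tree -> 15
```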

0

You can try

table(predict(gbm(y ~ ., data = TrainingData, distribution = "gaussian", verbose = FALSE, n.trees = 1, shrinkage = 0.01, bag.fraction = 1, interaction.depth = 1), n.trees = 1))

and see that there are only 2 unique predicted values; interaction.depth = 2 will get you 3 distinct predicted values. Convince yourself.


Not clear how this answers the question.
Michael R. Chernick
Licensed under cc by-sa 3.0 with attribution required.