如何在机器学习中处理分层/嵌套数据
我将用一个例子来解释我的问题。假设您要根据以下属性预测个人的收入:{年龄,性别,国家/地区,城市}。你有一个像这样的训练数据集 train <- data.frame(CountryID=c(1,1,1,1, 2,2,2,2, 3,3,3,3), RegionID=c(1,1,1,2, 3,3,4,4, 5,5,5,5), CityID=c(1,1,2,3, 4,5,6,6, 7,7,7,8), Age=c(23,48,62,63, 25,41,45,19, 37,41,31,50), Gender=factor(c("M","F","M","F", "M","F","M","F", "F","F","F","M")), Income=c(31,42,71,65, 50,51,101,38, 47,50,55,23)) train CountryID RegionID CityID Age Gender Income 1 1 1 1 23 M 31 2 1 1 1 48 F 42 3 1 1 2 62 M 71 4 …