Chapter 4 exercises. Answers are not given for some problems.
1.
This one is fairly simple; anyone comfortable with high-school-level derivations should manage it.
2~3: proof problems, omitted.
4.
(a)
This question confuses me a little: the answer seems to be given right in the problem statement. Isn't it simply 10%?
(b)
The answer is (0.1*0.1)/(1*1), so 1%.
(c)
This is just the fraction of the feature space that is covered, so the answer is (0.1**100)*100 = 0.1**98 %.
(d)
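The answer is obvious: the fraction of training observations near a test point shrinks exponentially as the number of features p grows.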
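The fractions in (a) to (c) all follow one pattern: if each of p features is restricted to 10% of its range, the usable fraction is 0.1^p. A minimal R sketch, assuming uniformly distributed features and ignoring edge effects (the name `near.fraction` is my own, not from the book):

```r
# Fraction of training observations that fall within 10% of the range
# in every one of p features, assuming features are uniform on [0, 1].
near.fraction = function(p) {
    0.1^p
}

# p = 1, 2 and 100 correspond to parts (a), (b) and (c)
fractions = near.fraction(c(1, 2, 100))
print(fractions)
```

Multiplying each value by 100 gives the percentages quoted in the answers above.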
(e)
The side lengths are 0.1**(1), 0.1**(1/2), 0.1**(1/3), ..., 0.1**(1/100).
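To get a feel for these side lengths, they can be evaluated numerically. This is only a sketch under the same uniform-feature assumption; `cube.side` is a hypothetical helper name:

```r
# Side length of a hypercube that contains, on average, 10% of
# uniformly distributed training observations in p dimensions.
cube.side = function(p) {
    0.1^(1 / p)
}

sides = cube.side(c(1, 2, 3, 100))
print(round(sides, 3))
```

For p = 100 the side is roughly 0.98, so the "neighborhood" spans almost the whole range of every feature.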
5.
The bias-variance trade-off discussion on page 104 of the Chinese edition explains this one quite clearly.
(a)
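When the Bayes decision boundary is linear, QDA does better on the training set because it is more flexible and fits the training data more closely. On the test set LDA does better, because its linear assumption matches the true boundary.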
(b)
When the Bayes decision boundary is non-linear, QDA does better than LDA on both the training set and the test set.
(c)
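Relative to LDA, QDA's test accuracy improves. As the sample size n grows, a more flexible model (one with more degrees of freedom) tends to do better, because the larger sample offsets some of its variance.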
(d)
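False. With few samples QDA overfits: when the true boundary is linear, its extra flexibility adds variance without reducing bias.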
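The claims in (a) and (d) can be checked informally with a small simulation. The sketch below is my own setup, not from the book: two Gaussian classes with equal covariance (so the Bayes boundary is linear), a small training set, and a large test set. On a single random draw the ordering of the errors may not match the theory, so treat it as an illustration only.

```r
library(MASS)
set.seed(1)

# Simulate two classes whose means differ but whose covariance is the same,
# which makes the Bayes decision boundary linear.
sim = function(n) {
    y = rbinom(n, 1, 0.5)
    data.frame(y = factor(y), x1 = rnorm(n, mean = y), x2 = rnorm(n, mean = y))
}
train.df = sim(50)     # small training set
test.df = sim(5000)    # large test set

lda.fit = lda(y ~ x1 + x2, data = train.df)
qda.fit = qda(y ~ x1 + x2, data = train.df)

# Misclassification rate of a fitted model on a data frame
err = function(fit, df) mean(predict(fit, df)$class != df$y)

errors = c(lda.train = err(lda.fit, train.df), qda.train = err(qda.fit, train.df),
           lda.test = err(lda.fit, test.df), qda.test = err(qda.fit, test.df))
print(errors)
```

Increasing the training size in `sim(50)` is one way to explore the point made in (c).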
6.
(a)
Plugging into the formula directly gives p(X) = 37.75%.
(b)
Plug into the same formula again and solve for X1, which gives 50 hours.
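Both parts can be reproduced in a few lines of R. The coefficients and inputs below are the ones I take from the exercise statement in the book, which this post does not reproduce, so treat them as an assumption:

```r
# Assumed values from the exercise statement:
# beta0 = -6, beta1 = 0.05 (hours studied), beta2 = 1 (GPA)
beta0 = -6
beta1 = 0.05
beta2 = 1
hours = 40
gpa = 3.5

# (a) logistic function applied to the linear predictor
p.a = plogis(beta0 + beta1 * hours + beta2 * gpa)

# (b) p = 0.5 means the linear predictor equals 0; solve for hours
hours.b = (0 - beta0 - beta2 * gpa) / beta1

print(c(p.a = p.a, hours.b = hours.b))
```

`plogis()` is base R's logistic function, exp(z) / (1 + exp(z)).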
7.
This is just Bayes' theorem plus formula 4-12 on page 97 of the Chinese edition. It is a bit tedious; the final answer is 75.2%.
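For reference, here is the substitution written out. The numbers are the ones I take from the exercise statement, which this post does not reproduce (so treat them as an assumption): mean 10 for companies that issued a dividend, mean 0 for those that did not, common variance 36, prior 0.8, and X = 4. The normalizing constant of the normal density cancels between numerator and denominator.

```latex
P(\text{Yes} \mid X = 4)
  = \frac{0.8\, e^{-(4-10)^2 / (2 \cdot 36)}}
         {0.8\, e^{-(4-10)^2 / (2 \cdot 36)} + 0.2\, e^{-(4-0)^2 / (2 \cdot 36)}}
  = \frac{0.8\, e^{-1/2}}{0.8\, e^{-1/2} + 0.2\, e^{-2/9}}
  \approx 0.752
```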
8.
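A written question. With K=1, KNN has a 0% error rate on the training set, so the average error the problem reports for it implies a test error rate of 36%. Naturally we pick logistic regression.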
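The arithmetic behind the 36%, assuming the figures from the exercise statement (an 18% average of training and test error for 1-nearest neighbors, and a 30% test error for logistic regression):

```latex
\frac{\text{err}_{\text{train}} + \text{err}_{\text{test}}}{2} = 18\%,
\qquad \text{err}_{\text{train}} = 0\%
\;\Longrightarrow\;
\text{err}_{\text{test}} = 36\% > 30\%
```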
9.
See formula 4-3 on page 92; it is just substitution. The answer to the first part is 27% and to the second is 0.19.
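The two conversions are inverses of each other. A minimal sketch, assuming the inputs from the exercise statement (odds of 0.37 in the first part, probability 0.16 in the second); the helper names are my own:

```r
# odds = p / (1 - p), so p = odds / (1 + odds)
odds.to.prob = function(odds) odds / (1 + odds)
prob.to.odds = function(p) p / (1 - p)

p.first = odds.to.prob(0.37)     # first part
odds.second = prob.to.odds(0.16) # second part
print(round(c(p.first, odds.second), 2))
```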
10.
(a)
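When an exercise asks for numerical and graphical summaries, it mostly comes down to three commands: summary(), pairs(), and cor(). Note that pairs() is really slow when there are many features, and qualitative variables must be dropped before calling cor().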
library(ISLR)
summary(Weekly)
pairs(Weekly)
cor(Weekly[, -9])
(b)
attach(Weekly)
glm.fit = glm(Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + Volume, data = Weekly, family = binomial)
summary(glm.fit)
(c)
glm.probs = predict(glm.fit, type = "response")
glm.pred = rep("Down", length(glm.probs))
glm.pred[glm.probs > 0.5] = "Up"
table(glm.pred, Direction)
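To read a confusion table like the one above, the overall accuracy is the diagonal divided by the total, and each column gives a per-class rate. The sketch below uses hypothetical toy labels (not the Weekly data) so that it runs on its own:

```r
# Hypothetical toy labels, only to show how to read the confusion table
actual = factor(c("Up", "Up", "Down", "Up", "Down", "Down", "Up", "Up"))
pred = factor(c("Up", "Up", "Up", "Up", "Down", "Up", "Down", "Up"))

conf = table(pred, actual)   # rows: predictions, columns: truth
accuracy = sum(diag(conf)) / sum(conf)

# Fraction of actual "Up" cases predicted correctly, and likewise for "Down"
up.rate = conf["Up", "Up"] / sum(conf[, "Up"])
down.rate = conf["Down", "Down"] / sum(conf[, "Down"])

print(conf)
print(c(accuracy = accuracy, up = up.rate, down = down.rate))
```

A large gap between the two per-class rates would suggest the model mostly predicts one class.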
(d)
train = (Year < 2009)
Weekly.0910 = Weekly[!train, ]
glm.fit = glm(Direction ~ Lag2, data = Weekly, family = binomial, subset = train)
glm.probs = predict(glm.fit, Weekly.0910, type = "response")
glm.pred = rep("Down", length(glm.probs))
glm.pred[glm.probs > 0.5] = "Up"
Direction.0910 = Direction[!train]
table(glm.pred, Direction.0910)
mean(glm.pred == Direction.0910)
(e)
library(MASS)
lda.fit = lda(Direction ~ Lag2, data = Weekly, subset = train)
lda.pred = predict(lda.fit, Weekly.0910)
table(lda.pred$class, Direction.0910)
mean(lda.pred$class == Direction.0910)
(f)
qda.fit = qda(Direction ~ Lag2, data = Weekly, subset = train)
qda.class = predict(qda.fit, Weekly.0910)$class
table(qda.class, Direction.0910)
mean(qda.class == Direction.0910)
(g)
library(class)
train.X = as.matrix(Lag2[train])
test.X = as.matrix(Lag2[!train])
train.Direction = Direction[train]
set.seed(1)
knn.pred = knn(train.X, test.X, train.Direction, k = 1)
table(knn.pred, Direction.0910)
mean(knn.pred == Direction.0910)
(h)
Two of the methods give the same accuracy: logistic regression and LDA.
(i)
# Logistic regression with Lag2:Lag1
glm.fit = glm(Direction ~ Lag2:Lag1, data = Weekly, family = binomial, subset = train)
glm.probs = predict(glm.fit, Weekly.0910, type = "response")
glm.pred = rep("Down", length(glm.probs))
glm.pred[glm.probs > 0.5] = "Up"
Direction.0910 = Direction[!train]
table(glm.pred, Direction.0910)
mean(glm.pred == Direction.0910)
## [1] 0.5865
# LDA with Lag2 interaction with Lag1
lda.fit = lda(Direction ~ Lag2:Lag1, data = Weekly, subset = train)
lda.pred = predict(lda.fit, Weekly.0910)
mean(lda.pred$class == Direction.0910)
## [1] 0.5769
# QDA with sqrt(abs(Lag2))
qda.fit = qda(Direction ~ Lag2 + sqrt(abs(Lag2)), data = Weekly, subset = train)
qda.class = predict(qda.fit, Weekly.0910)$class
table(qda.class, Direction.0910)
mean(qda.class == Direction.0910)
## [1] 0.5769
# KNN k = 10
knn.pred = knn(train.X, test.X, train.Direction, k = 10)
table(knn.pred, Direction.0910)
mean(knn.pred == Direction.0910)
## [1] 0.5769
# KNN k = 100
knn.pred = knn(train.X, test.X, train.Direction, k = 100)
table(knn.pred, Direction.0910)
mean(knn.pred == Direction.0910)
## [1] 0.5577
The results are in the code comments; logistic regression does best.
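Trying several values of K by hand gets repetitive, so the KNN runs can be wrapped in a loop. This is a sketch on simulated stand-in data (hypothetical, so it runs without ISLR); with the real data, the `train.X`, `test.X`, and `train.Direction` objects from (g) would take the place of the simulated ones.

```r
library(class)
set.seed(1)

# Simulated one-predictor data standing in for Lag2 and Direction
n = 200
x = matrix(rnorm(n), ncol = 1)
y = factor(ifelse(x[, 1] + rnorm(n) > 0, "Up", "Down"))
train = 1:150

# Test accuracy for each candidate K
ks = c(1, 10, 100)
acc = sapply(ks, function(k) {
    pred = knn(x[train, , drop = FALSE], x[-train, , drop = FALSE], y[train], k = k)
    mean(pred == y[-train])
})
names(acc) = paste0("k=", ks)
print(acc)
```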
11.
(a)
library(ISLR)
summary(Auto)
attach(Auto)
mpg01 = rep(0, length(mpg))
mpg01[mpg > median(mpg)] = 1
Auto = data.frame(Auto, mpg01)
(b)
cor(Auto[, -9])
pairs(Auto)
(c)
train = (year%%2 == 0) # if the year is even
test = !train
Auto.train = Auto[train, ]
Auto.test = Auto[test, ]
mpg01.test = mpg01[test]
(d)
library(MASS)
lda.fit = lda(mpg01 ~ cylinders + weight + displacement + horsepower, data = Auto, subset = train)
lda.pred = predict(lda.fit, Auto.test)
mean(lda.pred$class != mpg01.test)
(e)
qda.fit = qda(mpg01 ~ cylinders + weight + displacement + horsepower, data = Auto, subset = train)
qda.pred = predict(qda.fit, Auto.test)
mean(qda.pred$class != mpg01.test)
(f)
glm.fit = glm(mpg01 ~ cylinders + weight + displacement + horsepower, data = Auto, family = binomial, subset = train)
glm.probs = predict(glm.fit, Auto.test, type = "response")
glm.pred = rep(0, length(glm.probs))
glm.pred[glm.probs > 0.5] = 1
mean(glm.pred != mpg01.test)
(g)
library(class)
train.X = cbind(cylinders, weight, displacement, horsepower)[train, ]
test.X = cbind(cylinders, weight, displacement, horsepower)[test, ]
train.mpg01 = mpg01[train]
set.seed(1)
# KNN(k=1)
knn.pred = knn(train.X, test.X, train.mpg01, k = 1)
mean(knn.pred != mpg01.test)
# KNN(k=10)
knn.pred = knn(train.X, test.X, train.mpg01, k = 10)
mean(knn.pred != mpg01.test)
# KNN(k=100)
knn.pred = knn(train.X, test.X, train.mpg01, k = 100)
mean(knn.pred != mpg01.test)
Problem 13 is similar to problem 11 and uses the same functions, so it is omitted.
12.
(a)~(b)
Power = function() {
2^3
}
print(Power())
Power2 = function(x, a) {
x^a
}
Power2(3, 8)
(c)
Power2(10, 3)
Power2(8, 17)
Power2(131, 3)
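As a sanity check on the three calls in (c), the results can be collected and printed without scientific notation. The function is redefined here only so that the sketch runs on its own:

```r
Power2 = function(x, a) {
    x^a
}

# 10^3, 8^17 and 131^3
results = c(Power2(10, 3), Power2(8, 17), Power2(131, 3))
print(format(results, scientific = FALSE))
```

Note that 8^17 equals 2^51, which a double can still represent exactly.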
(d)~(f)
Power3 = function(x, a) {
result = x^a
return(result)
}
x = 1:10
plot(x, Power3(x, 2), log = "xy", ylab = "Log of y = x^2", xlab = "Log of x",
main = "Log of x^2 versus Log of x")
PlotPower = function(x, a) {
plot(x, Power3(x, a))
}
PlotPower(1:10, 3)