1.5.2 Bagging Algorithm

Intelligent Systems and Technology Series · AI Security: An Introduction to Adversarial Examples (AI安全之对抗样本入门). Author: 兜哥 (Dou Ge)

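    Unlike boosting, bagging has no dependencies between its classifiers, so they can be generated in parallel. Bagging uses bootstrap sampling. Given an original training set of m samples, we randomly draw one sample, copy it into the sample set, and then put it back, so it may be drawn again on the next draw. Repeating this m times produces a sample set of m samples. Because the sampling is random, each sample set differs from the original training set and from the other sample sets. Training one classifier on each sample set therefore yields multiple different classifiers, and their predictions are then combined (for classification, typically by voting).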
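    The following is a minimal NumPy sketch of a single bootstrap draw as described above. `bootstrap_sample` is a hypothetical helper for illustration and is not part of the book or of scikit-learn. Any single sample is never picked with probability (1 - 1/m)^m, which approaches 1/e. As a result, only about 1 - 1/e ≈ 63.2% of the original samples appear in a given sample set, and the rest are "out-of-bag". This is why the sample sets differ from one another.

```python
import numpy as np

def bootstrap_sample(x, y, rng):
    """Hypothetical helper: return one bootstrap sample set of len(x) rows drawn with replacement."""
    m = len(x)
    idx = rng.integers(0, m, size=m)  # independent draws, so the same index can repeat
    return x[idx], y[idx], idx

rng = np.random.default_rng(0)
x = np.arange(1000).reshape(-1, 1)  # toy stand-in for m = 1000 training samples
y = np.arange(1000) % 2
xs, ys, idx = bootstrap_sample(x, y, rng)

# Fraction of the original samples that appear at least once in this sample set
unique_frac = len(np.unique(idx)) / len(x)
print(len(xs), unique_frac)  # unique_frac should land close to 1 - 1/e
```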
    The following example uses randomly generated data and a BaggingClassifier with the number of classifiers set to 100:
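    from sklearn import datasets
    from sklearn.ensemble import BaggingClassifier
    from sklearn.model_selection import train_test_split

    # 1000 random samples with 100 features, none of them redundant
    x, y = datasets.make_classification(n_samples=1000, n_features=100,
                                        n_redundant=0, random_state=1)
    # Hold out 20% of the samples as the test set
    train_x, test_x, train_y, test_y = train_test_split(x, y, test_size=0.2,
                                                        random_state=66)
    # 100 base classifiers (decision trees by default); no random_state is set,
    # so the exact figures below can vary slightly between runs
    clf = BaggingClassifier(n_estimators=100)
    clf.fit(train_x, train_y)
    pred_y = clf.predict(test_x)
    # report() is the metric-printing helper used earlier in the chapter
    report(test_y, pred_y)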
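    To make the mechanism concrete, here is a simplified from-scratch version of the same idea. It fits one decision tree per bootstrap sample set and combines the trees by majority vote. `fit_bagging` and `predict_bagging` are hypothetical names, and the code is not scikit-learn's internal implementation. One known difference: BaggingClassifier averages the base estimators' predicted probabilities when they support `predict_proba`, while this sketch uses a plain vote. The data set and split match the example above.

```python
import numpy as np
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def fit_bagging(x, y, n_estimators=100, seed=0):
    """Hypothetical helper: train n_estimators trees, each on its own bootstrap sample of (x, y)."""
    rng = np.random.default_rng(seed)
    m = len(x)
    trees = []
    for _ in range(n_estimators):
        idx = rng.integers(0, m, size=m)  # m rows drawn with replacement
        # The trees do not depend on each other, so this loop could run in parallel
        tree = DecisionTreeClassifier(random_state=int(rng.integers(1 << 31)))
        trees.append(tree.fit(x[idx], y[idx]))
    return trees

def predict_bagging(trees, x):
    """Hypothetical helper: majority vote over 0/1 labels (an exact tie goes to class 0)."""
    votes = np.stack([tree.predict(x) for tree in trees])  # shape (n_estimators, n_samples)
    return (votes.mean(axis=0) > 0.5).astype(int)

x, y = datasets.make_classification(n_samples=1000, n_features=100,
                                    n_redundant=0, random_state=1)
train_x, test_x, train_y, test_y = train_test_split(x, y, test_size=0.2,
                                                    random_state=66)
models = fit_bagging(train_x, train_y)
pred_y = predict_bagging(models, test_x)
print("accuracy:", accuracy_score(test_y, pred_y))
```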
    This prints the corresponding performance metrics: accuracy 83.5%, F1 score 84.21%, precision 84.62%, recall 83.81%, and AUC 0.83:
    accuracy_score:
    0.835
    f1_score:
    0.842105263158
    recall_score:
    0.838095238095
    precision_score:
    0.846153846154
    confusion_matrix:
    [[79 16]
    [17 88]]
    auc:
    0.834837092732
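    The printed metrics can all be recomputed from the confusion matrix. The calculation below assumes scikit-learn's layout, where rows are the true class and columns the predicted class, so the matrix is [[TN, FP], [FN, TP]]. The numbers bear this layout out. With hard 0/1 predictions, the ROC "curve" has a single interior point, so its area is (TPR + 1 - FPR) / 2. That this matches the printed 0.8348 suggests the AUC above was computed from predicted labels rather than from probabilities.

```python
# Confusion matrix printed above: [[TN, FP], [FN, TP]]
tn, fp = 79, 16
fn, tp = 17, 88

accuracy = (tp + tn) / (tp + tn + fp + fn)   # 167 / 200
precision = tp / (tp + fp)                   # 88 / 104
recall = tp / (tp + fn)                      # 88 / 105, the true positive rate (TPR)
f1 = 2 * precision * recall / (precision + recall)
fpr = fp / (fp + tn)                         # 16 / 95, the false positive rate (FPR)
# Area under a ROC curve with one interior point (FPR, TPR)
auc_hard = (recall + 1 - fpr) / 2

for name, value in [("accuracy", accuracy), ("precision", precision),
                    ("recall", recall), ("f1", f1), ("auc", auc_hard)]:
    print(f"{name}: {value:.4f}")
```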
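    The corresponding ROC curve is shown in Figure 1-37. Overall, the metrics are better than those of the KNN classifier covered earlier and slightly better than those of AdaBoost.
    Figure 1-37 ROC curve of bagging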
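    The plotting code behind Figure 1-37 is not shown in this section. The following is one possible sketch, assuming matplotlib is available and writing to a hypothetical file `bagging_roc.png`. It scores the test set with `predict_proba`, which yields a fuller curve than hard labels. Its AUC will therefore differ from the 0.83 reported above. A fixed `random_state` is added here only so the run is repeatable.

```python
import matplotlib
matplotlib.use("Agg")  # render straight to a file; no display needed
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split

x, y = datasets.make_classification(n_samples=1000, n_features=100,
                                    n_redundant=0, random_state=1)
train_x, test_x, train_y, test_y = train_test_split(x, y, test_size=0.2,
                                                    random_state=66)
clf = BaggingClassifier(n_estimators=100, random_state=0).fit(train_x, train_y)

# Score = mean of the trees' predicted probabilities for class 1
scores = clf.predict_proba(test_x)[:, 1]
fpr, tpr, _ = roc_curve(test_y, scores)
roc_auc = auc(fpr, tpr)

plt.plot(fpr, tpr, label=f"bagging (AUC = {roc_auc:.3f})")
plt.plot([0, 1], [0, 1], linestyle="--", label="chance")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend(loc="lower right")
plt.savefig("bagging_roc.png")
```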