Obtain the data as described at https://github.com/dmlc/xgboost/tree/master/demo/binary_classification

Questions

  • How the different booster types differ (see the gbtree vs. gblinear sketch after the first training run below)
    • gblinear
    • gbtree
import numpy as np
import xgboost as xgb
dtrain = xgb.DMatrix('agaricus.txt.train')
dtest = xgb.DMatrix('agaricus.txt.test')
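# Note (not part of the original demo): recent XGBoost releases expect an explicit format
# hint when loading text files, e.g. xgb.DMatrix('agaricus.txt.train?format=libsvm').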
params = {
    'booster': 'gbtree', # Specify which booster to use: gbtree, gblinear or dart.
    'objective': 'binary:logistic', # objective / loss function: logistic regression for binary classification
    'eta': 1.0, # learning rate (step-size shrinkage)
    'gamma': 1.0, # minimum loss reduction required to make a further partition on a leaf node of the tree. The larger, the more conservative the algorithm will be.
    'min_child_weight': 1, # stop partitioning a node once the sum of instance weights in a child falls below this value
    'max_depth': 3
    # the console (CLI) version of this demo additionally sets num_round = 2
}

We use the tree booster and logistic regression objective in our setting. This indicates that we accomplish our task using classic gradient boosted regression trees (GBRT), which is a promising method for binary classification.

watchlist = [(dtest, 'eval'), (dtrain, 'test')]
num_round = 1
bst = xgb.train(params, dtrain, num_round, watchlist)
[0]    eval-error:0.015793    test-error:0.014524
preds = bst.predict(dtest)
labels = dtest.get_label()
print('error=%f' % (sum(1 for i in range(len(preds)) if int(preds[i] > 0.5) != labels[i]) / float(len(preds))))
error=0.015793
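To the booster question above: 'gbtree' boosts decision trees, while 'gblinear' boosts a regularized linear model, so the latter cannot capture feature interactions the way trees do. A minimal sketch (not part of the original demo, reusing the dtrain/dtest/watchlist objects defined above) that trains one round with each booster for comparison:

for booster_type in ('gbtree', 'gblinear'):
    p = {'booster': booster_type, 'objective': 'binary:logistic'}
    if booster_type == 'gbtree':
        # tree-specific parameters; gblinear ignores them
        p.update({'eta': 1.0, 'gamma': 1.0, 'min_child_weight': 1, 'max_depth': 3})
    xgb.train(p, dtrain, num_boost_round=1, evals=watchlist)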
bst.dump_model('dump.nice.model', 'featmap.txt')
with open('dump.nice.model', 'r') as inf:
    print(inf.read())
booster[0]:
0:[odor=pungent] yes=2,no=1
    1:[stalk-root=cup] yes=4,no=3
        3:[stalk-root=missing] yes=8,no=7
            7:leaf=1.899
            8:leaf=-1.94737
        4:[bruises?=no] yes=10,no=9
            9:leaf=1.78378
            10:leaf=-1.98135
    2:[spore-print-color=orange] yes=6,no=5
        5:[stalk-surface-below-ring=silky] yes=12,no=11
            11:leaf=-1.98546
            12:leaf=0.938776
        6:leaf=1.87097
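The second argument to dump_model is a feature map: a text file whose lines map a feature index to a name and a type (the demo's preprocessing script generates featmap.txt, where type i marks a 0/1 indicator feature), which is why the dump above shows readable split conditions such as odor=pungent. The same file can be reused to read back named feature importances; a minimal sketch, assuming featmap.txt is in the working directory:

# per-feature importance keyed by the names from the feature map ('gain' = average loss reduction)
importance = bst.get_score(fmap='featmap.txt', importance_type='gain')
for name, score in sorted(importance.items(), key=lambda kv: -kv[1]):
    print('%s\t%.3f' % (name, score))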
