Author: Xinyu OU (欧新宇)
All test results shown in this document were obtained on: Intel Core i7-7700K CPU 4.2GHz
Last revised: 2020-01-29
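from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
# Load the breast cancer dataset (569 samples, 2 classes)
cancer = load_breast_cancer()
X, y = cancer.data, cancer.target
# Default split is 75% train / 25% test; no random_state is set, so results vary between runs
X_train, X_test, y_train, y_test = train_test_split(X, y)
# Train a Bernoulli naive Bayes classifier
bnb = BernoulliNB()
bnb.fit(X_train, y_train)
# Pick one test sample; the test set holds 143 samples, so 142 is the last valid index
data_id = 142
data_x = [X_test[data_id]]
data_y = y_test[data_id]
# Compute the class probabilities once and reuse them below
proba = bnb.predict_proba(data_x)[0]
print("True class of the sample: {}".format(data_y))
print("Class predicted by the BernoulliNB model: {}".format(bnb.predict(data_x)[0]))
print("+ Probability of class 0: {:.5f}".format(proba[0]))
print("+ Probability of class 1: {:.5f}".format(proba[1]))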
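Here, predict() returns the id of the class with the highest predicted probability, and predict_proba() returns the predicted probability of each class.
If the data contains n classes, predict() still returns the id of the most probable class, while predict_proba() returns n probabilities, one for each class.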
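The n-class case can be illustrated with a minimal sketch. It assumes the three-class iris dataset and a GaussianNB model, neither of which is used in the example above; they are chosen here only because iris has more than two classes and continuous features. The fixed random_state is likewise an assumption, added so the split is repeatable.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Assumption: iris (3 classes) stands in for "data with n classes"
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0)

# Assumption: GaussianNB is used because the iris features are continuous
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# Take the first test sample, kept 2-D as the estimator expects
sample = X_test[:1]
pred = gnb.predict(sample)[0]          # a single class id
proba = gnb.predict_proba(sample)[0]   # one probability per class

print("Predicted class id: {}".format(pred))
for cls, p in zip(gnb.classes_, proba):
    print("+ Probability of class {}: {:.5f}".format(cls, p))

# The predicted id is the class whose probability is largest
print("Class with the highest probability: {}".format(gnb.classes_[np.argmax(proba)]))
```

The columns of predict_proba() follow the order of the model's classes_ attribute, so indexing classes_ with the position of the largest probability gives the same id that predict() returns.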