Active Learning on Cell Image Data (Part 3)

from sklearn.model_selection import train_test_split

y = label_encoder.fit_transform(data[target])

# take the learning features only
X = data.iloc[:, 5:]

# create training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X.to_numpy(), y, test_size=0.33, random_state=42)
The next step is to create the models.
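from modAL.models import ActiveLearner
from modAL.uncertainty import uncertainty_sampling
from sklearn.linear_model import LogisticRegression

dummy_learner = LogisticRegression()

# query_strategy takes the strategy function itself, so it is not called here
active_learner = ActiveLearner(
    estimator=LogisticRegression(),
    query_strategy=uncertainty_sampling
)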
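dummy_learner is the model that uses the random strategy, while active_learner is the model that uses the active learning strategy. To instantiate an active learning model, we use the ActiveLearner object from the modAL package. In the 'estimator' field you can plug in any model that is compatible with the sklearn API. In the 'query_strategy' field you choose a specific active learning strategy. Here we use uncertainty_sampling; modAL expects the strategy function itself in this field, not the result of calling it. For more information, see the modAL documentation.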
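To make the idea behind uncertainty sampling concrete, here is a minimal sketch written with plain NumPy rather than modAL. It assumes the "least confident" measure (one minus the highest predicted class probability), which is how modAL's documentation describes classifier uncertainty. The helper name least_confident_index and the probability table are made up for illustration.

```python
import numpy as np

def least_confident_index(proba):
    """Return the index of the row whose top class probability is lowest."""
    proba = np.asarray(proba, dtype=float)
    # uncertainty = 1 - probability of the most likely class
    uncertainty = 1.0 - proba.max(axis=1)
    return int(np.argmax(uncertainty))

# hypothetical predicted class probabilities for three unlabeled samples
pool_proba = np.array([
    [0.95, 0.05],   # confident
    [0.55, 0.45],   # close to the decision boundary
    [0.10, 0.90],   # confident
])

query_idx = least_confident_index(pool_proba)
print(query_idx)
```

The second sample is the one the model is least sure about, so it is the one a labeler would be asked to annotate next.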
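We split the training data into two groups. The first is training data whose labels we know and which we use to train the model. The second is validation data: its labels are also known, but we pretend not to know them, and we evaluate the model by comparing its predicted labels with the actual ones. We then set the number of training samples we start with to 5.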
# the training size that we will start with
base_size = 5

# the 'base' data that will be the training set for our model
X_train_base_dummy = X_train[:base_size]
X_train_base_active = X_train[:base_size]
y_train_base_dummy = y_train[:base_size]
y_train_base_active = y_train[:base_size]

# the 'new' data that will simulate unlabeled data that we pick a sample from and label it
X_train_new_dummy = X_train[base_size:]
X_train_new_active = X_train[base_size:]
y_train_new_dummy = y_train[base_size:]
y_train_new_active = y_train[base_size:]

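We train for 298 epochs. In each epoch we train both models, pick the next sample to add to the 'base' data according to each model's strategy, and measure each model's performance on the test set. Because the classes are imbalanced, we use the average precision score to measure performance.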
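A small worked example shows why plain accuracy is a poor yardstick on imbalanced classes. This is only a sketch: the hand-written average_precision helper and the toy labels and scores are assumptions for illustration, and the helper assumes all scores are distinct. In the experiment itself the score comes from sklearn's average_precision_score.

```python
import numpy as np

def average_precision(y_true, y_score):
    """Mean of precision@k taken at the rank k of every positive sample.

    Assumes all scores are distinct, so ties need no special handling.
    """
    y_true = np.asarray(y_true)
    order = np.argsort(-np.asarray(y_score, dtype=float))
    hits = y_true[order]                      # labels sorted by descending score
    ranks = np.arange(1, len(hits) + 1)
    precision_at_k = np.cumsum(hits) / ranks
    return float(precision_at_k[hits == 1].mean())

# 20 samples, only 2 of them positive (10% prevalence)
y_true = np.zeros(20, dtype=int)
y_true[[4, 9]] = 1

# a baseline that always predicts the majority class still looks accurate
always_negative = np.zeros(20, dtype=int)
baseline_accuracy = float((always_negative == y_true).mean())

# scores that rank the two positives only 5th and 10th
poor_scores = np.linspace(1.0, 0.0, 20)
poor_ap = average_precision(y_true, poor_scores)

# scores that rank both positives ahead of every negative
good_scores = np.where(y_true == 1, 1.0, 0.0) + np.linspace(0.2, 0.0, 20)
good_ap = average_precision(y_true, good_scores)

print(baseline_accuracy, poor_ap, good_ap)
```

The always-negative baseline reaches 90% accuracy without finding a single positive, while average precision separates the poor ranking (0.2) from the perfect one (1.0).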
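For the random strategy, choosing the next sample simply means taking the next sample from the 'new' group of the dummy dataset. The dataset is already shuffled, so no further shuffling is needed. For active learning, we use the ActiveLearner method called 'query', which takes the unlabeled data of the 'new' group and returns the index of the sample it suggests adding to the training 'base' group. Every selected sample is removed from the 'new' group, so a sample can be selected only once.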
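The bookkeeping described here (append the chosen sample to the 'base' group, then drop it from the 'new' group) can be sketched with np.delete. The helper move_to_base and the toy arrays are hypothetical names introduced for illustration; they are not part of modAL.

```python
import numpy as np

def move_to_base(X_base, y_base, X_pool, y_pool, idx):
    """Move the pool rows at `idx` into the base set and return four new arrays."""
    idx = np.atleast_1d(idx)
    X_base = np.concatenate([X_base, X_pool[idx]], axis=0)
    y_base = np.concatenate([y_base, y_pool[idx]], axis=0)
    # np.delete returns a copy without the selected rows,
    # so the same sample cannot be picked again
    X_pool = np.delete(X_pool, idx, axis=0)
    y_pool = np.delete(y_pool, idx, axis=0)
    return X_base, y_base, X_pool, y_pool

# toy data: one labeled sample and a pool of three
X_base = np.array([[0.0, 0.0]])
y_base = np.array([0])
X_pool = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
y_pool = np.array([1, 0, 1])

# pretend the query strategy returned index 1
X_base, y_base, X_pool, y_pool = move_to_base(X_base, y_base, X_pool, y_pool, [1])
print(X_base.shape, X_pool.shape)
```

Passing the index as a one-element list mirrors the array-like index that 'query' returns, and passing 0 reproduces the random strategy of always taking the next sample.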
import numpy as np
from sklearn.metrics import average_precision_score

# arrays to accumulate the scores of each simulation along the epochs
dummy_scores = []
active_scores = []

# number of desired epochs
range_epoch = 298

# running the experiment
for i in range(range_epoch):
    # train the models on the 'base' dataset
    active_learner.fit(X_train_base_active, y_train_base_active)
    dummy_learner.fit(X_train_base_dummy, y_train_base_dummy)

    # evaluate the models
    dummy_pred = dummy_learner.predict(X_test)
    active_pred = active_learner.predict(X_test)

    # accumulate the scores (average_precision_score expects the true labels first)
    dummy_scores.append(average_precision_score(y_test, dummy_pred))
    active_scores.append(average_precision_score(y_test, active_pred))

    # random strategy: the data is already shuffled, so take the next sample,
    # add it to the 'base' dataset of the dummy learner and remove it from the 'new' dataset
    X_train_base_dummy = np.append(X_train_base_dummy, [X_train_new_dummy[0, :]], axis=0)
    y_train_base_dummy = np.concatenate([y_train_base_dummy, np.array([y_train_new_dummy[0]])], axis=0)
    X_train_new_dummy = X_train_new_dummy[1:]
    y_train_new_dummy = y_train_new_dummy[1:]

    # pick next sample in the active strategy
    query_idx, query_sample = active_learner.query(X_train_new_active)

    # add the queried sample to the 'base' dataset of the active learner and remove it from the 'new' dataset
    X_train_base_active = np.append(X_train_base_active, X_train_new_active[query_idx], axis=0)
    y_train_base_active = np.concatenate([y_train_base_active, y_train_new_active[query_idx]], axis=0)
    X_train_new_active = np.concatenate([X_train_new_active[:query_idx[0]], X_train_new_active[query_idx[0] + 1:]], axis=0)
    y_train_new_active = np.concatenate([y_train_new_active[:query_idx[0]], y_train_new_active[query_idx[0] + 1:]], axis=0)
The results are as follows:
import matplotlib.pyplot as plt

plt.plot(list(range(range_epoch)), active_scores, label='Active Learning')
plt.plot(list(range(range_epoch)), dummy_scores, label='Dummy')
plt.xlabel('number of added samples')
plt.ylabel('average precision score')
plt.legend()  # show the labels passed to plt.plot
plt.show()
