Active Learning for Cell Image Data (Part 2)

import os

import pandas as pd
from PIL import Image


def crop_cell(row):
    """
    crop_cell(row)

    Given a pd.Series row of the dataframe, load row['filename'] with PIL,
    crop it to the box row['xmin'], row['xmax'], row['ymin'], row['ymax'],
    save the cropped image, and return the cropped filename.
    """
    input_dir = os.path.join('BCCD', 'JPEGImages')
    output_dir = os.path.join('BCCD', 'cropped')

    # open image
    im = Image.open(os.path.join(input_dir, row['filename']))

    # size of the image in pixels
    width, height = im.size

    # setting the points for the cropped image
    left = row['xmin']
    bottom = row['ymax']
    right = row['xmax']
    top = row['ymin']

    # cropped image; PIL expects the box as (left, top, right, bottom)
    im1 = im.crop((left, top, right, bottom))
    cropped_fname = f"BloodImage_{row['image_id']:03d}_{row['cell_id']:02d}.jpg"

    # shows the image in image viewer
    # im1.show()

    # save image
    try:
        im1.save(os.path.join(output_dir, cropped_fname))
    except (OSError, ValueError):
        return 'error while saving image'

    return cropped_fname


if __name__ == "__main__":
    # load labels csv into a Pandas DataFrame
    filepath = os.path.join('BCCD', 'dataset2-master', 'labels.csv')
    df = pd.read_csv(filepath)

    # iterate through cells, crop each cell, and save the cropped cell to a file
    # (dataset_df is assumed to be the per-cell DataFrame with filename,
    # bounding box, image_id and cell_id columns built earlier in this series)
    dataset_df['cell_filename'] = dataset_df.apply(crop_cell, axis=1)
That is all the preprocessing we did. Next, we extract features with CellProfiler.
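Extracting cell features with CellProfiler
CellProfiler is free, open-source image analysis software that automatically takes quantitative measurements from large sets of cell images. It also includes a GUI, so the analysis can be set up visually.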
First, download CellProfiler. If CellProfiler does not open, you may need to install the Visual C++ Redistributable; see the official website for installation details.
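Once the software is open, you can load your images. To build a pipeline, consult the list of available modules on the CellProfiler website. Most modules fall into three main groups: image processing, object processing, and measurement.
Commonly used modules include: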
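Image processing - convert to grayscale (ColorToGray)

Object processing - identify primary objects (IdentifyPrimaryObjects)

Measurement - measure object intensity (MeasureObjectIntensity)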

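CellProfiler can write its output to CSV files or save it to a specified database. Here we save the output as CSV and then load it into Python for further processing.
Note: CellProfiler also lets you save and share your image-processing pipeline.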
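As a minimal sketch of the "load it into Python" step, the snippet below reads a tiny in-memory stand-in for a CellProfiler export with pandas. The column names (`ImageNumber`, `ObjectNumber`, `Intensity_MeanIntensity_Gray`, `cell_type`) and all values are illustrative assumptions, not the actual export used in this article.

```python
import io

import pandas as pd

# Stand-in for a CellProfiler CSV export; column names and values are made up.
csv_text = """ImageNumber,ObjectNumber,Intensity_MeanIntensity_Gray,cell_type
1,1,0.42,RBC
1,2,0.77,WBC
2,1,0.15,Platelets
"""

# In a real run, pass the path of the exported file instead of the StringIO object.
features = pd.read_csv(io.StringIO(csv_text))

# Keep only the numeric columns as candidate model features.
numeric_features = features.select_dtypes(include="number")

print(features.shape)
print(list(numeric_features.columns))
```

Selecting the numeric columns first is a convenient way to separate measurements from metadata such as file names or labels before training a model.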
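Active learning
We now have all the data needed for training, so we can test whether an active learning strategy reaches higher accuracy with fewer labeled samples. Our hypothesis is that active learning can save valuable time and effort by greatly reducing the amount of labeled data needed to train a machine learning model on the cell classification task.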
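The active learning framework
Before diving into the experiment, a quick introduction to modAL: modAL is an active learning framework for Python. It follows the scikit-learn API, so it is easy to integrate into existing code, and it makes it simple to switch between active learning strategies. Its documentation is clear, so it is a good starting point for an active learning project.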
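Active learning vs. random learning
To test the hypothesis, we run an experiment comparing a random subsampling strategy for adding newly labeled data against an active learning strategy. We start by training two logistic regression estimators on the same few labeled samples. One model then uses the random strategy, and the other uses the active learning strategy.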
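The experiment design described here can be sketched as follows. This is not the article's code: it uses synthetic data from `make_classification`, a hand-rolled least-confidence query in place of modAL's `uncertainty_sampling`, and a hypothetical helper named `run_experiment`. Both runs start from the same labeled samples and add one label per round.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split


def run_experiment(strategy, X_pool, y_pool, X_test, y_test, start_idx, n_queries, seed=0):
    """Grow the labeled set one sample per round and record test average precision."""
    rng = np.random.default_rng(seed)
    labeled = [int(i) for i in start_idx]
    scores = []
    for _ in range(n_queries):
        model = LogisticRegression(max_iter=1000).fit(X_pool[labeled], y_pool[labeled])
        scores.append(average_precision_score(y_test, model.predict_proba(X_test)[:, 1]))
        unlabeled = np.setdiff1d(np.arange(len(X_pool)), labeled)
        if strategy == "random":
            pick = rng.choice(unlabeled)
        else:
            # Least-confidence uncertainty: 1 - highest predicted class probability.
            proba = model.predict_proba(X_pool[unlabeled])
            pick = unlabeled[np.argmax(1.0 - proba.max(axis=1))]
        labeled.append(int(pick))
    return scores


X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_pool, X_test, y_pool, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Both models start from the same few labeled samples (three from each class).
start_idx = np.concatenate([np.flatnonzero(y_pool == 0)[:3],
                            np.flatnonzero(y_pool == 1)[:3]])

n_queries = 20
random_scores = run_experiment("random", X_pool, y_pool, X_test, y_test, start_idx, n_queries)
active_scores = run_experiment("uncertainty", X_pool, y_pool, X_test, y_test, start_idx, n_queries)

print(random_scores[-1], active_scores[-1])
```

Plotting the two score lists against the number of labeled samples gives the learning curves that the comparison is based on. Which strategy wins on this toy data is not guaranteed.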
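We first prepare the data for the experiment by loading the features created by CellProfiler. We filter out the colorless platelets and keep only red and white blood cells, which simplifies the problem and reduces the amount of data. This leaves a binary classification problem: RBC vs. WBC. We encode the labels with scikit-learn's LabelEncoder and split the dataset into training and test sets.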
# imports for the whole experiment
import numpy as np
from matplotlib import pyplot as plt
from modAL import ActiveLearner
import pandas as pd
from modAL.uncertainty import uncertainty_sampling
from sklearn import preprocessing
from sklearn.metrics import average_precision_score
from sklearn.linear_model import LogisticRegression

# upload the cell profiler features for each cell
data = pd.read_csv('Zaretski_Image_All.csv')

# filter platelets
data = data[data['cell_type'] != 'Platelets']

# define the label
target = 'cell_type'
label_encoder = preprocessing.LabelEncoder()
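The listing above breaks off right after the encoder is created. As a hedged sketch of how the label encoding and train/test split described in the text could look, the snippet below runs the same steps on a made-up feature table. The feature names, the values, and the split parameters are assumptions, not the author's actual settings.

```python
import pandas as pd
from sklearn import preprocessing
from sklearn.model_selection import train_test_split

# Toy stand-in for the CellProfiler feature table; names and values are made up.
data = pd.DataFrame({
    "feature_a": [0.1, 0.9, 0.2, 0.8, 0.5, 0.15, 0.85, 0.3],
    "feature_b": [1.0, 3.0, 1.2, 2.8, 0.1, 1.1, 3.1, 1.3],
    "cell_type": ["RBC", "WBC", "RBC", "WBC", "Platelets", "RBC", "WBC", "RBC"],
})

# Drop platelets so only the RBC vs. WBC problem remains.
data = data[data["cell_type"] != "Platelets"]

target = "cell_type"
label_encoder = preprocessing.LabelEncoder()
# LabelEncoder sorts the class names, so RBC maps to 0 and WBC to 1.
y = label_encoder.fit_transform(data[target])
X = data.drop(columns=[target]).to_numpy()

# Stratify so both classes appear in the training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

print(list(label_encoder.classes_), X_train.shape, X_test.shape)
```

`label_encoder.inverse_transform` maps predicted integers back to the original class names when reporting results.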
