In an earlier post I introduced the LUNA16 lung nodule detection challenge; this series walks through the concrete implementation of the project. The project's GitHub link: https://github.com/Minerva-J/DeepLung.
The project is based on wentaozhu's work, with modifications and version updates. The original version targets Python 2.7 and PyTorch 0.1 (https://github.com/wentaozhu/DeepLung); the paper is at https://arxiv.org/pdf/1801.09555.pdf ("DeepLung: Deep 3D Dual Path Nets for Automated Pulmonary Nodule Detection and Classification," IEEE WACV, 2018).
Here I have ported the code to PyTorch 1.0 and Python 3.7 with multi-GPU support. Hats off to the original author!
All of the data preprocessing code lives in prepare.py; the main function is:
def preprocess_luna():
    luna_segment = config['luna_segment']
    savepath = config['preprocess_result_path']
    luna_data = config['luna_data']
    luna_label = config['luna_label']
    finished_flag = '.1flag_preprocessluna'
    print('starting preprocessing luna')
    if not os.path.exists(finished_flag):
        annos = np.array(pandas.read_csv(luna_label))
        pool = Pool()
        if not os.path.exists(savepath):
            os.mkdir(savepath)
        for setidx in range(10):
            print('process subset', setidx)
            filelist = [f.split('.mhd')[0] for f in os.listdir(luna_data + 'subset' + str(setidx))
                        if f.endswith('.mhd')]
            if not os.path.exists(savepath + 'subset' + str(setidx)):
                os.mkdir(savepath + 'subset' + str(setidx))
            # freeze the shared arguments so pool.map only needs a file index
            partial_savenpy_luna = partial(savenpy_luna, annos=annos, filelist=filelist,
                                           luna_segment=luna_segment,
                                           luna_data=luna_data + 'subset' + str(setidx) + '/',
                                           savepath=savepath + 'subset' + str(setidx) + '/')
            N = len(filelist)
            # savenpy(1)
            _ = pool.map(partial_savenpy_luna, range(N))
        pool.close()
        pool.join()
    print('end preprocessing luna')
    # f = open(finished_flag, "w+")
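The fan-out above relies on functools.partial to freeze the shared arguments and Pool.map to distribute the file indices across workers. A minimal sketch of the same pattern, with a toy stand-in for savenpy_luna (prepare.py uses the process-based multiprocessing.Pool; the thread-backed multiprocessing.dummy.Pool used here has the same API and lets the snippet run anywhere):

```python
from functools import partial
from multiprocessing.dummy import Pool  # thread-backed Pool, same API as multiprocessing.Pool

def fake_worker(idx, filelist, savepath):
    # Stand-in for savenpy_luna: each call processes one file index.
    return savepath + '/' + filelist[idx] + '.npy'

files = ['case0', 'case1', 'case2']
job = partial(fake_worker, filelist=files, savepath='/tmp/out')  # freeze shared args
with Pool(2) as pool:
    results = pool.map(job, range(len(files)))  # each worker receives one index
print(results)
```

Because the shared arguments are bound up front, the function handed to pool.map takes a single argument, which is exactly what map expects.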
For each of the ten subsets, the paths are configured in config_training.py: *_data_path is the path to the unzipped raw LUNA16 data; *_preprocess_result_path is where the preprocessing results are saved; *_annos_path is the path to the annotations; *_segment is the path to the LUNA16 lung segmentations, which can be downloaded from the LUNA16 website.
The preprocess_luna function (around line 662 of prepare.py) preprocesses the LUNA16 data and writes *_mask.npy, *_clean.npy, *_label.npy, *_spacing.npy, *_extendbox.npy and *_origin.npy into the corresponding subset folder under config['preprocess_result_path'].
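For orientation, the entries of the config dict that preprocess_luna reads look roughly like this; the paths below are placeholders for illustration, not the actual values in the repository:

```python
# Hypothetical sketch of the config used by prepare.py;
# replace the paths with your local LUNA16 locations.
config = {
    'luna_data': '/data/LUNA16/',                            # unzipped raw .mhd/.raw subsets
    'preprocess_result_path': '/data/LUNA16/preprocessed/',  # output .npy files go here
    'luna_label': '/data/LUNA16/CSVFILES/annotations.csv',   # nodule annotations
    'luna_segment': '/data/LUNA16/seg-lungs-LUNA16/',        # lung segmentation masks
}
```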
The main function above calls savenpy_luna for each file:
def savenpy_luna(id, annos, filelist, luna_segment, luna_data, savepath):
    islabel = True
    isClean = True
    resolution = np.array([1, 1, 1])
    name = filelist[id]
    sliceim, origin, spacing, isflip = load_itk_image(os.path.join(luna_data, name + '.mhd'))
    Mask, origin, spacing, isflip = load_itk_image(os.path.join(luna_segment, name + '.mhd'))
    if isflip:
        Mask = Mask[:, ::-1, ::-1]
    newshape = np.round(np.array(Mask.shape) * spacing / resolution).astype('int')
    m1 = Mask == 3  # left lung
    m2 = Mask == 4  # right lung
    Mask = m1 + m2
    xx, yy, zz = np.where(Mask)
    box = np.array([[np.min(xx), np.max(xx)], [np.min(yy), np.max(yy)], [np.min(zz), np.max(zz)]])
    box = box * np.expand_dims(spacing, 1) / np.expand_dims(resolution, 1)
    box = np.floor(box).astype('int')
    margin = 5
    extendbox = np.vstack([np.max([[0, 0, 0], box[:, 0] - margin], 0),
                           np.min([newshape, box[:, 1] + 2 * margin], axis=0).T]).T
    this_annos = np.copy(annos[annos[:, 0] == name])

    if isClean:
        convex_mask = m1
        dm1 = process_mask(m1)  # dilate the mask
        dm2 = process_mask(m2)
        dilatedMask = dm1 + dm2
        Mask = m1 + m2
        extramask = dilatedMask ^ Mask  # XOR: the voxels added by dilation
        bone_thresh = 210
        pad_value = 170
        if isflip:
            sliceim = sliceim[:, ::-1, ::-1]
            print('flip!')
        # Preprocessing: clip the raw data to [-1200, 600], transform the range
        # linearly into [0, 1], then use LUNA16's given segmentation ground
        # truth to remove the background.
        sliceim = lumTrans(sliceim)
        sliceim = sliceim * dilatedMask + pad_value * (1 - dilatedMask).astype('uint8')  # keep the dilated lung region, set the rest to 170
        bones = (sliceim * extramask) > bone_thresh
        sliceim[bones] = pad_value  # voxels at the lung border brighter than bone_thresh (210) are treated as bone and padded
        sliceim1, _ = resample(sliceim, spacing, resolution, order=1)
        sliceim2 = sliceim1[extendbox[0, 0]:extendbox[0, 1],
                            extendbox[1, 0]:extendbox[1, 1],
                            extendbox[2, 0]:extendbox[2, 1]]  # e.g. (290, 254, 334)
        # plt.imshow(sliceim2[100])
        # Optional: write the preprocessed volume out for visual inspection
        # new_ct = sitk.GetImageFromArray(sliceim2)
        # new_ct.SetDirection(ct.GetDirection())
        # new_ct.SetOrigin(origin)
        # new_ct.SetSpacing((1, 1, 1))
        # new_ct_name = 'volume-' + str(random.randint) + '.nii'
        # sitk.WriteImage(new_ct, os.path.join('./', new_ct_name))
        sliceim = sliceim2[np.newaxis, ...]
        print('clean,spacing,extendbox,origin,mask,label to', savepath)
        np.save(os.path.join(savepath, name + '_clean.npy'), sliceim)
        np.save(os.path.join(savepath, name + '_spacing.npy'), spacing)
        np.save(os.path.join(savepath, name + '_extendbox.npy'), extendbox)
        np.save(os.path.join(savepath, name + '_origin.npy'), origin)
        np.save(os.path.join(savepath, name + '_mask.npy'), Mask)

    if islabel:
        this_annos = np.copy(annos[annos[:, 0] == name])
        label = []
        if len(this_annos) > 0:
            for c in this_annos:
                pos = worldToVoxelCoord(c[1:4][::-1], origin=origin, spacing=spacing)
                if isflip:
                    print('isflip', pos[1:], Mask.shape[1:3])
                    pos[1:] = Mask.shape[1:3] - pos[1:]
                label.append(np.concatenate([pos, [c[4] / spacing[1]]]))
        label = np.array(label)
        if len(label) == 0:
            label2 = np.array([[0, 0, 0, 0]])
        else:
            label2 = np.copy(label).T  # transpose so each row of label2 is one coordinate axis
            label2[:3] = label2[:3] * np.expand_dims(spacing, 1) / np.expand_dims(resolution, 1)
            label2[3] = label2[3] * spacing[1] / resolution[1]
            label2[:3] = label2[:3] - np.expand_dims(extendbox[:, 0], 1)
            label2 = label2[:4].T
        np.save(os.path.join(savepath, name + '_label.npy'), label2)
    print(name)
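The world-to-voxel conversion used in the label branch follows the standard LUNA16 recipe: subtract the volume origin from the world coordinate (in mm) and divide by the voxel spacing. A self-contained sketch, with the signature assumed to mirror the repository's helper:

```python
import numpy as np

def worldToVoxelCoord(worldCoord, origin, spacing):
    """Convert world coordinates (mm) to continuous voxel indices."""
    stretchedVoxelCoord = np.absolute(worldCoord - origin)
    voxelCoord = stretchedVoxelCoord / spacing
    return voxelCoord
```

Note that annotations.csv stores coordinates as (x, y, z) while the arrays are indexed (z, y, x), which is why the code above passes c[1:4][::-1].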
To summarize the preprocessing steps:
1. Load the raw CT volume and the lung mask with load_itk_image.
2. Compute the bounding box of the mask's non-zero region and scale it to the new resolution; the volume is later resampled to a uniform 1×1×1 mm spacing with resample.
3. Clip the intensities to [-1200, 600] (values outside this range are set to -1200 or 600) and linearly normalize them to [0, 255] with lumTrans.
4. Dilate the mask with process_mask to remove small holes inside the lungs, apply the dilated mask to the volume, and set everything outside it to 170 (the normalized value of water's HU).
5. Resample the volume and crop it to the bounding box.
6. Read the annotations and convert them from world to voxel coordinates with worldToVoxelCoord, then apply the new resolution; since the data is cropped to the box, the final coordinates are relative to the box.
7. Save the preprocessed data and labels in .npy format.
(Figure: visualization of the preprocessed data.)
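The clip-and-rescale normalization described above can be sketched as follows; this mirrors the behavior the post attributes to lumTrans, though the helper in prepare.py may differ in minor details:

```python
import numpy as np

def lumTrans(img):
    """Clip HU values to [-1200, 600], then linearly rescale to [0, 255]."""
    lungwin = np.array([-1200., 600.])
    newimg = (img - lungwin[0]) / (lungwin[1] - lungwin[0])  # linear map to [0, 1]
    newimg[newimg < 0] = 0    # everything below -1200 HU -> 0
    newimg[newimg > 1] = 1    # everything above 600 HU -> 1
    newimg = (newimg * 255).astype('uint8')
    return newimg
```

With this windowing, air (about -1000 HU) stays dark, soft tissue lands mid-range, and dense bone saturates at 255.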
This article was originally shared on the WeChat public account Python编程和深度学习.