Unsupervised and Supervised Segmentation of 3D Medical Images Based on Clustering and Deep Learning Formation Using Cryptography

Unsupervised Segmentation of 3D Medical Images for Pathological Examination Using Deep Learning

by Pradeep*

- Published in Journal of Advances and Scholarly Researches in Allied Education, E-ISSN: 2230-7540

Volume 2, Issue No. 2, Oct 2011

Published by: Ignited Minds Journals


ABSTRACT

This paper presents a novel unsupervised segmentation method for 3D medical images. Convolutional neural networks (CNNs) have brought significant advances in image segmentation. However, most of the recent methods rely on supervised learning, which requires large amounts of manually annotated data. Thus, it is challenging for these methods to cope with the growing amount of medical images. This paper proposes a unified approach to unsupervised deep representation learning and clustering for segmentation. Our proposed method consists of two phases. In the first phase, we learn deep feature representations of training patches from a target image using joint unsupervised learning (JULE) that alternately clusters representations generated by a CNN and updates the CNN parameters using cluster labels as supervisory signals. We extend JULE to 3D medical images by utilizing 3D convolutions throughout the CNN architecture. In the second phase, we apply k-means to the deep representations from the trained CNN and then project cluster labels to the target image in order to obtain the fully segmented image. We evaluated our methods on three images of lung cancer specimens scanned with micro-computed tomography (micro-CT). The automatic segmentation of pathological regions in micro-CT could further contribute to the pathological examination process. Hence, we aim to automatically divide each image into the regions of invasive carcinoma, non-invasive carcinoma, and normal tissue. Our experiments show the potential abilities of unsupervised deep representation learning for medical image segmentation.

KEYWORDS

unsupervised segmentation, 3D medical images, clustering, deep learning, cryptography, convolutional neural networks, supervised learning, unsupervised deep representation learning, joint unsupervised learning, k-means, lung cancer specimens, micro-computed tomography, pathological examination, invasive carcinoma, non-invasive carcinoma, normal tissue

INTRODUCTION

The purpose of our study is to develop an unsupervised segmentation method for 3D medical images. Most of the recent segmentation methods using convolutional neural networks (CNNs) rely on supervised learning, which requires large amounts of manually annotated data.[1] Therefore, it is challenging for these methods to cope with medical images due to the difficulty of obtaining manual annotations. Thus, research into unsupervised learning, especially for 3D medical images, is very promising. Many previous unsupervised segmentation methods for 3D medical images are based on clustering.[2] However, most unsupervised work in medical imaging has been limited to hand-crafted features that were then used with traditional clustering methods to produce segmentations. In our study, we investigated whether representations learned by unsupervised deep learning aid the clustering and segmentation of 3D medical images. As the unsupervised deep representation learning method, we adopt joint unsupervised learning (JULE),[3] a framework that progressively clusters images and learns deep representations via a CNN. Our main contribution is to combine JULE with k-means[4] for medical image segmentation. To our knowledge, our method is the first to employ JULE for unsupervised medical image segmentation. Moreover, our work is the first to conduct automatic segmentation for pathological diagnosis of micro-CT images. This work demonstrates that deep representations can be useful for unsupervised medical image segmentation.

There are two reasons why we chose JULE for our proposed method. The first reason is that JULE is robust against data variation (e.g., image type, image size, and sample size) and thus can cope with a dataset composed of 3D patches cropped out of medical images. Moreover, the range of intensities differs for each medical image, so we need a learning method that works well with various datasets. The second reason is that JULE can learn representations that work well with many clustering algorithms. This advantage allows us to learn representations on a subset of the possible patches from a target image and then apply a faster clustering algorithm to the representations of all patches for segmentation.

METHOD

The proposed segmentation method has two phases: (1) learning feature representations using JULE and (2) clustering deep representations for segmentation. In phase (1), we conduct JULE in order to learn the representations of image patches randomly extracted from an unlabeled image. For use with 3D medical images, we extend JULE to use 3D convolutions. The purpose of this phase is to obtain a trained CNN that can transform image patches into discriminative feature representations. In phase (2), we use k-means to assign labels to the learned representations generated by the trained CNN.

1.1 Deep Representation Learning

The main idea behind JULE is that meaningful cluster labels can become supervisory signals for representation learning, and discriminative representations help to obtain meaningful clusters. Given a set of $n_s$ unlabeled image patches $I = \{I_1, \ldots, I_{n_s}\}$, cluster labels for all image patches $y = \{y_1, \ldots, y_{n_s}\}$, and the parameters of the representation $\theta$, the objective function of JULE is formulated as

$(\hat{y}, \hat{\theta}) = \arg\min_{y, \theta} L(y, \theta \mid I),$

where $L$ is a loss function. JULE finds the optimal $\hat{y}$ in the forward pass and the optimal $\hat{\theta}$ in the backward pass to minimize $L$. By iterating the forward pass and the backward pass, we obtain more discriminative representations and therefore better image clusters. In the forward pass, we conduct image clustering, merging clusters using agglomerative clustering.[5] In the backward pass, we conduct representation learning via a 3D CNN, using cluster labels as supervisory signals. JULE can be interpreted as a recurrent framework because it iterates merging clusters and learning representations over multiple timesteps until it obtains the desired number of clusters C. Fig. 1 shows an overview of the recurrent process at time step t.
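To make the alternating process concrete, the sketch below outlines one simplified realization in Python; it is not the authors' implementation. We assume a PyTorch CNN, substitute scikit-learn's standard agglomerative clustering for the graph-degree-linkage clustering used by JULE,[5] and use a plain cross-entropy loss with a linear classification head as a stand-in for JULE's clustering loss; the helper names, cluster schedule, and training hyperparameters are ours.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.cluster import AgglomerativeClustering

def jule_style_training(model, patches, n_final_clusters, n_timesteps, device="cpu"):
    """Simplified JULE-style loop: alternately cluster the current CNN features
    (forward pass) and update the CNN with cluster labels as supervision
    (backward pass). Mini-batching is omitted for brevity."""
    model.to(device)
    patches = patches.to(device)                       # (n, 1, w, w, w) float tensor
    # The number of clusters shrinks toward n_final_clusters over the timesteps
    # (a simplification of JULE's incremental merging).
    schedule = np.linspace(len(patches) // 4, n_final_clusters, n_timesteps).astype(int)
    for n_clusters in schedule:
        # Forward pass: cluster the current deep representations.
        with torch.no_grad():
            features = model(patches).cpu().numpy()
        labels = AgglomerativeClustering(n_clusters=int(n_clusters)).fit_predict(features)
        # Backward pass: update CNN parameters using cluster labels as targets.
        head = nn.Linear(features.shape[1], int(n_clusters)).to(device)
        optimizer = torch.optim.SGD(list(model.parameters()) + list(head.parameters()),
                                    lr=0.01, momentum=0.9)
        targets = torch.as_tensor(labels, dtype=torch.long, device=device)
        criterion = nn.CrossEntropyLoss()
        for _ in range(10):                             # a few epochs per timestep
            optimizer.zero_grad()
            loss = criterion(head(model(patches)), targets)
            loss.backward()
            optimizer.step()
    return model
```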

Extension to 3D Medical Images

We conducted two extensions of JULE. One is the extension of the recurrent process for CNN training in the backward pass. Originally, JULE aims to obtain the final clusters and finishes when it obtains the desired number of clusters in the forward pass.[3] In contrast, our purpose is to obtain a well-trained CNN. If we terminate the recurrent process after the final forward pass, we lose the chance to train the CNN with the final cluster labels. Therefore, we extended the recurrent process to train the CNN using the final cluster labels in the backward pass. The intuitive reason is that the final clusters are the most precise of the entire process, so representations learned with them become more discriminative. The other is the extension of the CNN to support 3D medical images. Originally, JULE is a representation learning and clustering method for 2D images. We, however, aim to learn representations using 3D image patches. Thus, we extended the CNN architecture of the original JULE[3] to use 3D convolutions throughout the network.

1.2 Patch Extraction

Prior to learning representations, we need to prepare training data composed of small 3D image patches. These patches are extracted from the unlabeled image, which is our target for segmentation, by randomly cropping ns sub-volumes of w × w × w voxels. In many cases of medical image segmentation, we need to exclude the outside of the scanned object from the training data. We choose a threshold that can divide the scanned target region from the background and include only patches whose center voxel intensity is within the threshold. After extracting training patches, we centralize them by subtracting the mean of all intensities and dividing by the standard deviation, following Yang et al.[3]
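As an illustration, the sampling step described above could be implemented as follows. This is a minimal NumPy sketch under the assumption that the volume is a 3D array and that the scanned object can be isolated by a simple intensity interval; the function and parameter names are ours, not the paper's.

```python
import numpy as np

def extract_training_patches(volume, n_patches, w, lower, upper, rng=None):
    """Randomly crop n_patches cubes of w*w*w voxels whose center intensity lies
    within [lower, upper] (i.e., inside the scanned object), then standardize them."""
    rng = np.random.default_rng() if rng is None else rng
    half = w // 2
    zs, ys, xs = volume.shape
    patches = []
    while len(patches) < n_patches:
        z = rng.integers(half, zs - half)
        y = rng.integers(half, ys - half)
        x = rng.integers(half, xs - half)
        # Keep the patch only if its center voxel lies inside the thresholded object.
        if lower <= volume[z, y, x] <= upper:
            patches.append(volume[z - half:z + half + 1,
                                  y - half:y + half + 1,
                                  x - half:x + half + 1])
    patches = np.stack(patches).astype(np.float32)
    # Centralize: subtract the mean of all intensities and divide by the std. dev.
    return (patches - patches.mean()) / patches.std()
```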

Figure 1: Illustration of the recurrent process at time step t. First, we extract representations Xt from the training patches I via a CNN with parameters θt. Next, we merge clusters and assign new labels yt to Xt. Finally, we input I into the CNN again and update the CNN parameters from θt to θt+1 through backpropagation of a loss calculated using yt as supervisory signals. Note that the CNN is initialized with random weights.

Figure 2: Our CNN architecture has 3 convolutional, 1 max pooling, and 2 fully-connected layers. All 3D convolutional kernels are 5 × 5 × 5 with stride 1. The number of kernels is denoted in each box. Pooling kernels are 2 × 2 × 2 with stride 2. The first fully-connected layer has 1350 neurons, and the second one has 160 neurons.

1.3 CNN Architecture

Our CNN consists of three convolutional layers, one max pooling layer, and two fully-connected layers. The kernels of the second and third convolutional layers are connected to all kernel maps in the previous layer. The neurons in the fully-connected layers are connected to all neurons in the previous layer. The max pooling layer follows the first convolutional layer. Batch normalization is applied to the output of each convolutional layer. A rectified linear unit (ReLU) is used as the nonlinearity after batch normalization. The second fully-connected layer is followed by an L2-normalization layer. All of the convolutional layers use 50 kernels of 5 × 5 × 5 voxels with a stride of 1 voxel. The max pooling layer has a kernel of 2 × 2 × 2 voxels with a stride of 2 voxels. The inputs to the network are image patches of 27 × 27 × 27 voxels. The first fully-connected layer has 1350 neurons and the second has 160 neurons. Other parameters for CNN training, such as the learning rate, are the same as in the original JULE.[3] The CNN architecture is presented in Fig. 2.
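The following is a minimal PyTorch sketch of the architecture as described above; the framework choice, the layer and variable names, and the activation after the first fully-connected layer are assumptions. For 27 × 27 × 27 inputs, the flattened convolutional output is 50 × 3 × 3 × 3 = 1350, which matches the stated size of the first fully-connected layer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JULE3DCNN(nn.Module):
    """Sketch of the 3D CNN described in Sec. 1.3 (names and defaults are assumptions)."""

    def __init__(self, feature_dim=160):
        super().__init__()
        # Three 3D convolutional layers, 50 kernels of 5x5x5 voxels, stride 1.
        self.conv1 = nn.Conv3d(1, 50, kernel_size=5, stride=1)
        self.conv2 = nn.Conv3d(50, 50, kernel_size=5, stride=1)
        self.conv3 = nn.Conv3d(50, 50, kernel_size=5, stride=1)
        # Batch normalization after each convolution, followed by ReLU.
        self.bn1, self.bn2, self.bn3 = nn.BatchNorm3d(50), nn.BatchNorm3d(50), nn.BatchNorm3d(50)
        # Max pooling (2x2x2, stride 2) after the first convolutional layer.
        self.pool = nn.MaxPool3d(kernel_size=2, stride=2)
        # 27 -> 23 (conv1) -> 11 (pool) -> 7 (conv2) -> 3 (conv3); flatten to 50*3*3*3 = 1350.
        self.fc1 = nn.Linear(1350, 1350)
        self.fc2 = nn.Linear(1350, feature_dim)

    def forward(self, x):
        # x: (batch, 1, 27, 27, 27) image patches
        x = self.pool(F.relu(self.bn1(self.conv1(x))))
        x = F.relu(self.bn2(self.conv2(x)))
        x = F.relu(self.bn3(self.conv3(x)))
        x = x.flatten(start_dim=1)
        x = F.relu(self.fc1(x))      # activation here is an assumption, not stated in the text
        x = self.fc2(x)
        # L2-normalization of the final 160-dimensional representation.
        return F.normalize(x, p=2, dim=1)
```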

Figure 3: Our segmentation process. We first obtain feature representations from a trained CNN and then apply conventional k-means to them. Finally, we assign labels to the patches based on the clustering results. (For simplification, we have drawn the figure with a stride equal to w.)

1.4 Segmentation

In the segmentation phase, we first extract as many patches of w × w × w voxels as possible from the target image, separated by s voxels each. Note that the stride s is not larger than w. As with extracting training patches, we select only voxels within the scanned sample by thresholding. The trained CNN transforms each patch into a feature representation. We then divide the feature representations into K clusters by k-means. After applying k-means, each representation is assigned a label l (1 ≤ l ≤ K), and we need to project these labels onto the original image. We consider subpatches of s × s × s voxels centered in each extracted patch. Each subpatch is assigned the same label as the closest cluster representation using Euclidean distance. This segmentation process is illustrated in Fig. 3.
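A simplified Python sketch of this segmentation phase is given below, assuming the trained CNN from the previous sketch, scikit-learn's k-means, and per-patch standardization (the paper does not state how patches are normalized at this stage); the function and variable names are ours.

```python
import numpy as np
import torch
from sklearn.cluster import KMeans

def segment_volume(model, volume, w, s, K, lower, upper, device="cpu"):
    """Dense patch features -> k-means -> project each label onto the s*s*s
    subpatch centered in the corresponding w*w*w patch."""
    model.to(device).eval()
    half = w // 2
    zs, ys, xs = volume.shape
    centers, feats = [], []
    with torch.no_grad():
        for z in range(half, zs - half, s):
            for y in range(half, ys - half, s):
                for x in range(half, xs - half, s):
                    if not (lower <= volume[z, y, x] <= upper):
                        continue                        # skip background voxels
                    patch = volume[z - half:z + half + 1,
                                   y - half:y + half + 1,
                                   x - half:x + half + 1].astype(np.float32)
                    patch = (patch - patch.mean()) / (patch.std() + 1e-8)
                    t = torch.from_numpy(patch)[None, None].to(device)
                    feats.append(model(t).cpu().numpy()[0])
                    centers.append((z, y, x))
    labels = KMeans(n_clusters=K).fit_predict(np.asarray(feats))
    # Project the cluster labels back: each s*s*s subpatch gets its patch's label (+1);
    # label 0 is kept for the background.
    seg = np.zeros_like(volume, dtype=np.uint8)
    h = s // 2
    for (z, y, x), l in zip(centers, labels):
        seg[z - h:z + h + 1, y - h:y + h + 1, x - h:x + h + 1] = l + 1
    return seg
```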

2. EXPERIMENTS AND RESULTS

2.1 Datasets

We chose three lung cancer specimen images scanned with a micro-CT scanner (inspeXio SMX-90CT Plus, Shimadzu Corporation, Kyoto, Japan) to evaluate our proposed method. The lung cancer specimens from the respective patients were scanned with similar resolutions. We aimed to divide each image into three histopathological regions: (a) invasive carcinoma; (b) non-invasive carcinoma; and (c) normal tissue. We selected these images because segmenting the regions on micro-CT images based on histopathological features could contribute to the pathological examination.[6][7] Detailed information for each image is shown in Table 1.

2.2 Parameter Settings

For JULE, we randomly extracted 10,000 patches of 27 × 27 × 27 voxels from a target image. We set the number of final clusters C to 100 for lung-A and lung-C and to 10 for lung-B; these are the stopping conditions of the agglomerative clustering. Other parameters are the same as in the original JULE.[3] After representation learning, we extracted patches of 27 × 27 × 27 voxels with a stride of five voxels and processed them with the trained CNN to obtain a 160-dimensional representation for each patch. For segmentation, we applied conventional k-means to the feature representations, setting K to 3.

Table 1: Images used in our experiments
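For reference, the settings stated above can be gathered into a single configuration object; the structure and key names below are hypothetical, and the values are taken directly from the text.

```python
# Experiment settings as stated in Sec. 2.2 (the dict layout itself is ours).
JULE_SEGMENTATION_CONFIG = {
    "patch_size": 27,             # training and segmentation patches of 27x27x27 voxels
    "n_training_patches": 10000,  # randomly extracted per target image
    "final_clusters_C": {"lung-A": 100, "lung-B": 10, "lung-C": 100},
    "feature_dim": 160,           # output of the second fully-connected layer
    "segmentation_stride": 5,     # stride s between densely extracted patches
    "kmeans_K": 3,                # invasive / non-invasive carcinoma, normal tissue
}
```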

Figure 4: NMI comparison on three datasets. Our method outperformed traditional unsupervised methods.

2.3 Evaluations

We used normalized mutual information (NMI)[8] to measure segmentation accuracy. A larger NMI value means a more precise segmentation result. We used seven manually annotated slices for evaluation. We compared the proposed method with traditional k-means segmentation and the multi-threshold Otsu method.[9] We also evaluated the average NMI of each method across the datasets. The results are shown in Fig. 4. In each figure, the best NMI for each K is shown in bold. As shown in all of the figures, JULE-based segmentation outperformed the traditional unsupervised methods. While the NMI scores of our methods are not high, qualitative evaluation shows promising results for our proposed method (see Fig. 5). The qualitative examples show that JULE-based segmentation separated the normal tissue region well from the cancer region, which includes invasive and non-invasive carcinoma.
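NMI can be computed, for example, with scikit-learn; here `gt_slice` and `pred_slice` are placeholder names for a manually annotated slice and the corresponding predicted label map of the same shape.

```python
from sklearn.metrics import normalized_mutual_info_score

# NMI between a manually annotated slice and the predicted labels on the same voxels.
nmi = normalized_mutual_info_score(gt_slice.ravel(), pred_slice.ravel())
print(f"NMI = {nmi:.3f}")  # values closer to 1 indicate more precise segmentation
```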

3. CONCLUSION

We proposed an unsupervised segmentation method based on JULE that alternately learns deep representations and image clusters. We demonstrated the potential of unsupervised medical image segmentation using deep representations. Our segmentation method could be applicable to many other applications in medical imaging.

REFERENCES

1. Long, J., Shelhamer, E., and Darrell, T. (2015). "Fully convolutional networks for semantic segmentation," in IEEE CVPR, pp. 3431-3440.
2. García-Lorenzo, D., Francis, S., Narayanan, S., Arnold, D. L., and Collins, D. L. (2013). "Review of automatic segmentation methods of multiple sclerosis white matter lesions on conventional magnetic resonance imaging," Medical Image Analysis 17(1), pp. 1-18.
3. Yang, J., Parikh, D., and Batra, D. (2016). "Joint unsupervised learning of deep representations and image clusters," in IEEE CVPR, pp. 5147-5156.
4. MacQueen, J. et al. (1967). "Some methods for classification and analysis of multivariate observations," in Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1, pp. 281-297.
5. Zhang, W., Wang, X., Zhao, D., and Tang, X. (2012). "Graph degree linkage: Agglomerative clustering on a directed graph," in ECCV, LNCS 7572, pp. 428-441.
6. Mori, K. (2016). "From macro-scale to micro-scale computational anatomy: a perspective on the next 20 years," Medical Image Analysis 33, pp. 159-164.
7. Nakamura, S., Mori, K., Okasaka, T., Kawaguchi, K., Fukui, T., Fukumoto, K., and Yokoi, K. (2016). "Micro-computed tomography of the lung: Imaging of alveolar duct and alveolus in human lung," in D55. Lab Methodology and Bioengineering: Just Do It, A7411-A7411, American Thoracic Society.
8. Strehl, A. and Ghosh, J. (2002). "Cluster ensembles—A knowledge reuse framework for combining multiple partitions," Journal of Machine Learning Research 3(Dec), pp. 583-617.
9. Otsu, N. (1979). "A threshold selection method from gray-level histograms," IEEE Transactions on Systems, Man, and Cybernetics 9(1), pp. 62-66.

Corresponding Author Pradeep*

Researcher, Department of Computer Science, CMJ University, Shillong, Meghalaya

pradeep.jangara@gmail.com