Unsupervised Landmark Detection Based Motion Estimation for Dynamic Medical Images

Dynamic (4D) medical imaging typically requires estimating the motion of organs. From the spatio-temporal information in 3D volumes sampled at multiple time points, one can assess the structural and functional properties of the target organ. Our earlier study at CVPR 2020 shows that, from a limited number of temporally sampled phase images, the organ motion trajectory can be reconstructed at high spatial and temporal resolution via interpolation in a well-encoded latent space.
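As a minimal illustration of interpolating in a latent space, the sketch below linearly blends the latent codes of two sampled phases to obtain codes for intermediate time points. The linear rule, the function name `interpolate_latents`, and the toy three-dimensional codes are assumptions made for illustration only; the encoder, the decoder, and the interpolation scheme actually used in the CVPR 2020 study are not shown here.

```python
def interpolate_latents(z_start, z_end, num_steps):
    """Linearly interpolate between two latent codes (lists of floats).

    Returns `num_steps` codes, starting at z_start and ending at z_end.
    Hypothetical helper: the real latent codes would come from a trained encoder.
    """
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    codes = []
    for step in range(num_steps):
        t = step / (num_steps - 1)  # 0.0 at the first phase, 1.0 at the last
        codes.append([(1.0 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return codes


# Toy latent codes standing in for two encoded phase images (assumed values).
z_phase_0 = [0.0, 2.0, -1.0]
z_phase_1 = [1.0, 0.0, 1.0]
trajectory = interpolate_latents(z_phase_0, z_phase_1, num_steps=5)
```

In a full pipeline, each interpolated code would be passed through a decoder to produce the image at that intermediate time point.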

Many motion estimation methods are image-based: they optimize the motion field directly over the entire image space and are thus prone to implausible outputs, especially in the presence of large motion. Estimating motion from sparse landmarks can alleviate this concern. Specifically, we propose a two-stage motion estimation framework called Dense-Sparse-Dense (DSD). First, we extract sparse landmarks from the dense image to represent the target organ. Second, we construct the dense motion field from the extracted sparse landmarks and their displacements across time points.

The proposed DSD framework contains an unsupervised landmark detection network and a dense motion reconstruction network. First (blue background in the figure above), the fixed image and the moving image are fed separately into the landmark detection network, which outputs a set of landmarks for each image, with landmarks in one image corresponding to those in the other. Second (orange background), we estimate the dense motion field from the sparse displacements between the corresponding landmarks of the two input images.
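To make the sparse-to-dense step concrete, the sketch below spreads a handful of landmark displacements over a 3D grid with Shepard inverse-distance weighting. This is a classical interpolation scheme swapped in for illustration; in DSD this step is performed by a learned dense motion reconstruction network, which is not reproduced here. The function names, the `power` parameter, and the toy landmarks are all assumptions.

```python
import math


def _interpolate(voxel, landmarks, displacements, power):
    """Inverse-distance-weighted displacement at one voxel."""
    weights = []
    for index, landmark in enumerate(landmarks):
        dist = math.dist(voxel, landmark)
        if dist == 0.0:
            # The voxel coincides with a landmark: reproduce its displacement.
            return tuple(displacements[index])
        weights.append(1.0 / dist ** power)
    total = sum(weights)
    return tuple(
        sum(w * d[axis] for w, d in zip(weights, displacements)) / total
        for axis in range(3)
    )


def dense_field_from_landmarks(shape, landmarks, displacements, power=2.0):
    """Build a dense displacement field from sparse landmark displacements.

    shape         -- (depth, height, width) of the output grid
    landmarks     -- list of (z, y, x) landmark positions in the fixed image
    displacements -- list of (dz, dy, dx), one per landmark
    Returns nested lists such that field[z][y][x] is a (dz, dy, dx) tuple.
    """
    depth, height, width = shape
    return [
        [
            [_interpolate((z, y, x), landmarks, displacements, power)
             for x in range(width)]
            for y in range(height)
        ]
        for z in range(depth)
    ]


# Two hypothetical landmarks at opposite corners of a 5x5x5 volume.
landmarks = [(0, 0, 0), (4, 4, 4)]
displacements = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
field = dense_field_from_landmarks((5, 5, 5), landmarks, displacements)
```

The field reproduces each landmark's displacement exactly at the landmark itself and blends smoothly in between; at the central voxel, which is equidistant from both landmarks, the result is the average of the two displacements.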

The bottleneck of DSD is landmark detection. Supervised learning typically requires corresponding landmarks labeled across different phase images as ground truth, and producing such labels is non-trivial for human raters. To tackle this issue, we design an unsupervised 3D landmark detection network. The detector is pre-trained with self-supervised representation learning to focus its attention on the motion-affected organ. Several losses are designed to ensure that the landmarks are spatially sparse yet anatomically informative for the target organ. Our experimental results demonstrate that the DSD solution can perform landmark detection and subsequent motion estimation without any manual annotation.

For more details of this work, please refer to Guo et al.

Multi-Modal MRI Reconstruction Assisted with Spatial Alignment Network

In clinical practice, magnetic resonance imaging (MRI) with multiple contrasts is usually acquired in a single study to assess different properties of the same region of interest in the human body. The whole acquisition process can be accelerated by under-sampling one or more modalities in k-space.
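The effect of k-space under-sampling can be seen on a toy example. The sketch below takes a single real-valued image line, computes its discrete Fourier transform as a stand-in for k-space, keeps only every second sample, and reconstructs by zero-filling. The 1D signal, the regular sampling pattern, and the helper names are assumptions chosen for illustration; clinical acquisitions are 2D or 3D, complex-valued, and often use other sampling masks.

```python
import cmath


def dft(signal):
    """Naive discrete Fourier transform (image line -> k-space)."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse discrete Fourier transform (k-space -> image line)."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def undersample(kspace, acceleration):
    """Keep every `acceleration`-th k-space sample and zero out the rest."""
    return [v if k % acceleration == 0 else 0j for k, v in enumerate(kspace)]


# Toy fully-sampled image line (assumed values).
image_line = [0.0, 1.0, 4.0, 1.0, 0.0, 0.0, 0.0, 0.0]
kspace = dft(image_line)
kspace_under = undersample(kspace, acceleration=2)

# Zero-filled reconstruction; the real part suffices for this real-valued toy signal.
zero_filled = [v.real for v in idft(kspace_under)]
```

With an acceleration factor of 2, half of the samples are skipped, and the zero-filled result is the original line averaged with a copy of itself shifted by half the field of view. This aliasing is what a reconstruction network has to remove.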

Our earlier research (Xiang et al.) demonstrates that, because different contrasts or modalities carry redundant information, a target MRI modality under-sampled in k-space can be better reconstructed with the help of a fully-sampled sequence (i.e., the reference modality). This implies that multiple sequences from the same study of the same subject can be used together for highly efficient multi-modal reconstruction.

However, we find that multi-modal reconstruction can be degraded by subtle spatial misalignment between different sequences, which is common in clinical practice. In this work, we therefore integrate a spatial alignment network with reconstruction to improve the quality of the reconstructed target modality.

A spatial alignment network is integrated into the multi-modal MRI reconstruction pipeline to compensate for the spatial misalignment between the fully-sampled reference image and the under-sampled target.

Specifically, the spatial alignment network estimates the spatial misalignment between the fully-sampled reference image and the under-sampled target image, and warps the reference image accordingly. The aligned fully-sampled reference image is then fed into the reconstruction network together with the under-sampled target image to produce the high-quality target image. Our experiments on both clinical MRI and multi-coil raw k-space data demonstrate the superiority and robustness of our spatial alignment network.
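The warping step can be sketched as follows, assuming the misalignment is given as a dense per-pixel displacement field. Here the field is simply hard-coded as a one-pixel shift; in our pipeline it would be predicted by the spatial alignment network. The 2D setting, the backward-warping convention, the border clamping, and the names `bilinear_sample` and `warp` are assumptions made for illustration.

```python
import math


def bilinear_sample(image, y, x):
    """Sample a 2D image (nested lists) at a fractional position, clamping at borders."""
    height, width = len(image), len(image[0])
    y = min(max(y, 0.0), height - 1.0)
    x = min(max(x, 0.0), width - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, height - 1), min(x0 + 1, width - 1)
    wy, wx = y - y0, x - x0
    top = (1 - wx) * image[y0][x0] + wx * image[y0][x1]
    bottom = (1 - wx) * image[y1][x0] + wx * image[y1][x1]
    return (1 - wy) * top + wy * bottom


def warp(image, flow):
    """Backward warping: output[y][x] = image sampled at (y + dy, x + dx).

    flow[y][x] holds the (dy, dx) displacement for output pixel (y, x).
    """
    return [
        [
            bilinear_sample(image, y + flow[y][x][0], x + flow[y][x][1])
            for x in range(len(image[0]))
        ]
        for y in range(len(image))
    ]


# Toy reference image (assumed values).
reference = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 2.0, 0.0],
    [0.0, 3.0, 4.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
# Hypothetical misalignment: every output pixel samples one pixel to its right.
flow = [[(0.0, 1.0) for _ in range(4)] for _ in range(4)]
aligned = warp(reference, flow)
```

Bilinear interpolation keeps the warp well-defined for sub-pixel displacements, which matters here because the misalignment between sequences is typically subtle.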

For more details of this work, please refer to Xuan et al.