In the field of radiology, generative models, notably DALL-E 2, hold tremendous promise for image production and modification. Scientists have shown that DALL-E 2 has learned relevant representations of x-ray images and can produce new images from text descriptions, extend images beyond their original borders, and remove elements. Despite its limitations, the authors believe that the use of generative models in radiology research is conceivable, provided the models are further refined and adapted to the specific areas of interest.

Investigating the generative capabilities of DALL-E 2

Due to its ability to produce photorealistic images from brief written inputs, DALL-E 2, a deep learning model released by OpenAI for text-to-image generation, has attracted considerable public attention. Its generative capabilities raise the question of whether they could be used in the medical domain to produce or supplement data, since medical data can be limited and hard to obtain. The researchers systematically investigated the radiological knowledge stored in DALL-E 2 by producing and altering x-ray, CT, MRI, and ultrasound images, examining whether the model has learned relevant representations of medical images and whether it could be further adapted for medical applications.

Creating radiological images from written prompts

Using the phrase “An X-ray of” followed by a one-word description of the desired anatomical region, the researchers instructed the model to generate four simulated x-ray images each of the head, chest, shoulder, abdomen, pelvis, hand, knee, and ankle. They observed that the coloring and general structure of the bones looked realistic and that the overall anatomy was accurate, demonstrating that the model has learned fundamental concepts of x-ray anatomy. However, the trabecular structure of the bone appeared random and did not follow the lines of mechanical stress as it would in real x-rays. The model also had trouble accurately generating joints, and the image quality fell short of what one would expect from clinical x-rays, with inaccurate collimation and missing organ sections.
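The prompting pattern described above could be reproduced programmatically. As a hedged sketch only: the study itself used the DALL-E 2 interface, so the `build_prompts` helper and the use of OpenAI's public Images API below are illustrative assumptions, not the authors' method.

```python
# Illustrative sketch: the paper's prompt pattern ("An X-ray of" plus a
# one-word anatomical region), optionally sent to the OpenAI Images API.
# The API call only runs when an OPENAI_API_KEY is configured.
import os

REGIONS = ["head", "chest", "shoulder", "abdomen",
           "pelvis", "hand", "knee", "ankle"]

def build_prompts(regions):
    """Return one text prompt per anatomical region."""
    return [f"An X-ray of the {region}" for region in regions]

def generate_images(prompt, n=4):
    """Request n DALL-E 2 images for one prompt (requires OPENAI_API_KEY)."""
    from openai import OpenAI  # pip install openai
    client = OpenAI()
    result = client.images.generate(model="dall-e-2", prompt=prompt,
                                    n=n, size="1024x1024")
    return [item.url for item in result.data]

if __name__ == "__main__" and os.getenv("OPENAI_API_KEY"):
    for prompt in build_prompts(REGIONS):
        print(prompt, generate_images(prompt))
```

Four images per region mirrors the study's setup; the `n` parameter of the Images API controls that count directly.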

The model also had difficulty generating cross-sectional CT or MRI images, which were mostly nonsensical. Instead of a cross-sectional image, it frequently produced an entire CT or MRI study laid out on an x-ray film. On ultrasound images, the desired anatomic features were not evident, and all outputs resembled obstetric scans. Still, the images displayed basic notions of CT, MRI, and ultrasound, indicating that DALL-E 2 holds representations of these modalities.

Reconstructing missing areas in radiological images 

DALL-E 2’s ability to reconstruct missing areas in radiological images (inpainting) depends on the extent of its radiological knowledge. In the case of the pelvis, thorax, and thoracic spine, the results of the inpainting task were nearly indistinguishable from the original radiographs. This suggested that DALL-E 2 has a good understanding of the anatomy and structure of these body parts and is able to generate realistic replacements.

However, the results were not as convincing for images that included joints like the ankle, wrist, shoulder, and knee. In these cases, DALL-E 2’s reconstructions varied from the original radiographs and deviated from realistic representations. For example, in the ankle and wrist images, the number of tarsal bones and the structure varied greatly from those in the original radiographs. Similarly, in the shoulder images, DALL-E 2 failed to reconstruct the glenoid cavity and articular surface of the humerus, and in one image, a foreign body was inserted into the shoulder that remotely resembled a prosthesis. Finally, when reconstructing the knee image, the model omitted the patella but retained the bicondylar structure of the femur. These limitations suggest that DALL-E 2’s radiological knowledge is less extensive for joint structures than it is for other body parts.
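Inpainting of this kind works by masking out the region to be reconstructed and letting the model fill it in. As a hedged sketch under stated assumptions: the study used DALL-E 2's own interface, so the `make_mask` helper and the use of the public Images edit endpoint below are illustrative, not the authors' pipeline.

```python
# Illustrative inpainting sketch: DALL-E 2's edit endpoint regenerates the
# transparent pixels of a mask. make_mask marks a rectangular region for
# reconstruction; the API call itself requires an OPENAI_API_KEY.
def make_mask(width, height, box):
    """Return a boolean grid; True marks pixels the model should repaint.

    box = (left, top, right, bottom), right/bottom exclusive.
    """
    left, top, right, bottom = box
    return [[left <= x < right and top <= y < bottom
             for x in range(width)]
            for y in range(height)]

def inpaint(image_path, mask_path, prompt, n=1):
    """Ask DALL-E 2 to fill the masked area of a radiograph."""
    from openai import OpenAI  # pip install openai
    client = OpenAI()
    result = client.images.edit(
        image=open(image_path, "rb"),  # original radiograph, PNG
        mask=open(mask_path, "rb"),    # same size; transparent = repaint
        prompt=prompt, n=n, size="1024x1024")
    return [item.url for item in result.data]
```

Comparing the repainted region against the original radiograph, as the researchers did, is what reveals whether the model's anatomical knowledge suffices for that body part.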

Extending images beyond their boundaries

Researchers picked radiographic images of several anatomical locations at random and had the model extend the images beyond their borders. The results showed that the model could produce realistic radiograph representations, with anatomical proportions such as the length of the femur and the size of the lungs remaining accurate.

However, finer details were inconsistent, like the number of lumbar vertebrae. This indicates that while the model may possess some anatomical knowledge, it may not be as comprehensive as that of a licensed medical professional.

The researchers also found that the model performed best when constructing anterior and posterior views, which are simpler anatomical perspectives, while lateral views were more difficult and yielded worse results. This may be because lateral views require more complex spatial reasoning and anatomical knowledge, which an image-generating model may find harder to portray correctly.
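This kind of outpainting is commonly implemented by centering the original image on a larger transparent canvas and letting an edit endpoint fill the transparent margin. The sketch below is an assumption about how one might set this up with Pillow, not a description of the researchers' workflow, which used DALL-E 2's own interface.

```python
# Illustrative outpainting setup: place the radiograph on a larger
# transparent canvas; the transparent margin is what the model extends.
def center_offset(orig_w, orig_h, canvas_w, canvas_h):
    """Top-left coordinates that center the original image on the canvas."""
    return ((canvas_w - orig_w) // 2, (canvas_h - orig_h) // 2)

def pad_for_outpainting(path, canvas_size=(1024, 1024)):
    """Return an RGBA canvas with the radiograph centered on it."""
    from PIL import Image  # pip install pillow
    original = Image.open(path).convert("RGBA")
    canvas = Image.new("RGBA", canvas_size, (0, 0, 0, 0))
    canvas.paste(original, center_offset(*original.size, *canvas_size))
    return canvas
```

The padded canvas and a matching prompt would then be submitted the same way as an inpainting request, with the model hallucinating the anatomy outside the original borders.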

Limitations in generating pathological images 

Researchers attempted to produce pathological images, such as those of fractures, using DALL-E 2. However, it was found during testing that most of these images were distorted.

Moreover, DALL-E 2 includes a content filter intended to block potentially violent or unsuitable images, such as those depicting gore or sexual material. Because the filter blocked specific trigger phrases, such as “bleeding,” the researchers were unable to evaluate image generation for the majority of pathologies.


DALL-E 2, a generative model created by OpenAI, can generate x-ray images with anatomical proportions and style comparable to actual x-rays, showing that appropriate representations of radiographs were learned during training. However, the model's generative abilities for pathological images such as fractures and tumors were limited, and it performed poorly on CT, MRI, and ultrasound images.

Access to data is crucial in deep learning, but in radiology, data is spread across multiple institutions, and privacy considerations hinder combining it into a single large database. Synthetic data from generative models such as DALL-E 2 holds promise for building data sets considerably larger than those currently available, easing privacy concerns and expediting the development of new deep-learning tools for radiology.

Article Source: Reference Paper
