If you’re on Twitter, you’ve likely come across the new trend of AI-generated artwork. By entering a short phrase into AI tools like DALL-E and Lexica, users can call up entirely new pieces of art. For example, an AI-generated series of images depicting a fictitious “Studio Ghibli Movie San Francisco” recently went viral on Twitter, drawing enthusiasm for the way it rendered the city in the studio’s distinctive artistic style.
Image generators like DALL-E rest on two basic principles. The first is that these models must relate language to images. DALL-E does this with a framework called Contrastive Language-Image Pre-training, or CLIP. CLIP is trained on image-caption pairs to place matching images and captions close together in a shared numerical space, so the model can score how well any caption describes any image and can work with both text and imagery.
Diffusion models are the second principle, and they are ultimately what allows these programs to generate new artwork. When DALL-E receives a text prompt, a “prior model” first translates the text into a CLIP image embedding: a compact numerical description of what a matching image should contain. An image decoder, which is a diffusion model, then starts from random noise and gradually removes it over many steps, steering each step toward that embedding. Because the starting noise is random, the same prompt produces a different image each time; the results share the main features the prompt calls for but are noticeably distinct from one another.
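To make these two principles concrete, here is a minimal, self-contained Python sketch. It is a toy, not DALL-E’s implementation: the embedding vectors and the helper names (`cosine_similarity`, `best_caption`, `predict_noise`, `sample`) are hypothetical. The first part scores captions against an image by cosine similarity, which is how CLIP compares embeddings. The second part is a one-dimensional sampler in the style of DDPM (denoising diffusion probabilistic models): it starts from random noise and removes it step by step. A real decoder uses a trained neural network to predict the noise in an image; here we assume the “data” are numbers drawn from a normal distribution with mean `mu` and standard deviation `s`, for which the ideal noise predictor has a closed form, and choosing `mu` loosely stands in for conditioning on the prompt.

```python
import math
import random

# ---- Part 1: CLIP-style matching (toy) ----
# CLIP places images and captions in one shared vector space, so a matching
# pair has a high cosine similarity. These vectors are made-up stand-ins for
# real CLIP embeddings.

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_caption(image_emb, caption_embs):
    """Return the caption whose embedding is most similar to the image's."""
    return max(caption_embs,
               key=lambda c: cosine_similarity(image_emb, caption_embs[c]))


# ---- Part 2: toy 1-D diffusion sampler (DDPM-style) ----
# Assumption: the "data" are numbers drawn from N(mu, s^2). For such data the
# best possible noise predictor has a closed form, so it stands in for the
# trained neural network a real image decoder would use.

T = 1000  # number of noising steps
betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]  # linear schedule
alphas = [1.0 - b for b in betas]
alpha_bars = []  # alpha_bars[t] = product of alphas[0..t]
running = 1.0
for a in alphas:
    running *= a
    alpha_bars.append(running)


def predict_noise(x_t, t, mu, s):
    """Expected noise eps in x_t, where x_t = sqrt(ab)*x0 + sqrt(1-ab)*eps."""
    ab = alpha_bars[t]
    var_x_t = ab * s * s + (1.0 - ab)  # variance of x_t over data and noise
    return math.sqrt(1.0 - ab) * (x_t - math.sqrt(ab) * mu) / var_x_t


def sample(mu, s, rng):
    """Start from pure Gaussian noise and denoise one step at a time."""
    x = rng.gauss(0.0, 1.0)
    for t in reversed(range(T)):
        ab = alpha_bars[t]
        ab_prev = alpha_bars[t - 1] if t > 0 else 1.0
        eps = predict_noise(x, t, mu, s)
        # Mean of the reverse step: subtract the predicted noise, then rescale.
        mean = (x - betas[t] / math.sqrt(1.0 - ab) * eps) / math.sqrt(alphas[t])
        # Posterior variance; it is zero on the final step, so no noise is added there.
        var = (1.0 - ab_prev) / (1.0 - ab) * betas[t]
        x = mean + math.sqrt(var) * rng.gauss(0.0, 1.0)
    return x


if __name__ == "__main__":
    captions = {
        "a red apple": [0.9, 0.1, 0.0],
        "a blue ocean": [0.0, 0.2, 0.9],
    }
    image = [0.8, 0.2, 0.1]  # hypothetical embedding of an apple photo
    print(best_caption(image, captions))  # prints "a red apple"

    rng = random.Random(0)
    # Same "prompt" (mu=2.0, s=0.5), different random starting noise each time.
    print([round(sample(2.0, 0.5, rng), 2) for _ in range(5)])
```

Each call to `sample` begins from different random noise, so every output differs, yet all of them cluster around `mu`. That loosely mirrors how a single prompt yields several distinct images that share the same main features.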
Playing with DALL-E is like probing a computer’s imagination, and the results can be incredibly entertaining. For example, when prompted with our company name “Cromatic”, DALL-E outputs a series of images related to color, seemingly having mapped our name to the more common word “chromatic”. If we nudge DALL-E toward biology with “Cromatic Biotech”, we still get images spanning the color spectrum, only this time with test tubes and beakers.
DALL-E’s interpretation of “Cromatic” as “chromatic” illustrates that the AI produces the most likely visual representation of a text prompt, weighted by how frequently a concept appears in its vast pool of source images. Unfortunately, this can sometimes lead to biased outputs. As recently reported by Wired, DALL-E often “leans toward generating images of white men by default, overly sexualizes images of women, and reinforces racial stereotypes.”
While these are serious issues that need to be addressed, DALL-E’s current biases also offer an opportunity to hold up a mirror to how a given subject is represented in the content society produces. We were particularly curious about what biases DALL-E might show in biology research and biotech. As shown below, DALL-E often echoes what statistics tell us about gender and racial disparities in science and startups:
1. When prompted for “biologist”, DALL-E shows a range of genders and races. However, when you specify “biology professor”, DALL-E only shows depictions of older white men.
2. When prompted for “biotechnology founder”, DALL-E again shows a diversity of individuals. However, when looking for “biotechnology investors”, DALL-E only depicts men in this role.
3. We see a similar trend when searching for “venture capitalist”, which again returns images of mostly white men.