Fashion is a ruthless business

Words by

Published on December 21, 2023

Squeezing creativity from an LLM

Repo: https://github.com/jftesser/prompt-draw

My household is pretty committed to using technology for creativity. I am, after all, a software designer, and my partner, Russell, builds products for creatives and designs synthesizers in his spare time. So it’s not too surprising that we’ve both spent a lot of time experimenting with LLMs and image generators. We also hear worries from nearly everywhere, including our parents, that AI is boring, mediocre and uninspiring. Earlier this year, when we ended up with overlapping vacation time, we decided to make a game. A game with AI.

We were inspired by the quick party games created by Jackbox Games. Our goal was to design and build an interaction that highlights some of the creative strengths we’ve both encountered in our own interactions with AI systems and to use those strengths to create a game that’s actually fun to play.

Broadly, the game works like this: players play as up-and-coming fashion designers, competing to win a commission from a celebrity attending a red carpet event. The celebrity shares their desires with the players who then describe the garments they would create in response. The celebrity “looks” at their garments and assesses how well each meets the brief. The round ends with the celebrity choosing which garment they’ll wear to their event and sharing why.

We wanted the AI to take an active role, driving the game action forward and providing a supportive environment that encourages creativity in the players. But we also wanted to lean away from the LLM’s default mode of being encouraging and positive. It requires care and effort to design a system like this – for example, in the system note, we stress that our setting is a game, not the real world. Establishing this fictional context helps the LLM to more easily provide snarky critiques that skew more mean and less friendly.

Getting the prompt to work well took several rounds of iteration. Our first few playtests made it clear that the system thought “alien” and “concert on the moon” were great matches when asked to imagine a wild celebrity client and event. We ended up adding:

“You must not be an alien, and please try not to use a space theme unless you feel really passionately about it.”

and later in the system note:

“Remember, aliens don't walk red carpets.”

This was to dissuade it from focusing too much on space. Mimes are also disproportionately popular, but I didn’t mind that quirk and decided to leave it in. In a later version of the system we also feed the last five celebrity names and descriptions back into the system note to reduce the chance of repeated celebrity themes when playing multiple rounds of the game. This is possible because we save previous games in a Firebase database.
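The history trick described above might look something like the sketch below. It is a minimal illustration, not the code from the repo: the `PastCelebrity` shape, the `buildSystemNote` helper, and the exact wording of the base note are all assumptions, and loading the history from Firebase is left out entirely.

```typescript
// Hypothetical shape for a saved celebrity; the real Firebase schema is not shown in this post.
interface PastCelebrity {
  name: string;
  description: string;
}

// Assumed base instructions. The first line paraphrases the "this is a game" framing;
// the other two are the constraints quoted above.
const BASE_SYSTEM_NOTE = [
  "This is a party game, not the real world. You play a celebrity who needs a red carpet outfit.",
  "You must not be an alien, and please try not to use a space theme unless you feel really passionately about it.",
  "Remember, aliens don't walk red carpets.",
].join("\n");

// Append the most recent celebrities (default: the last five) so the model can steer away from them.
function buildSystemNote(history: PastCelebrity[], keep = 5): string {
  // slice(-0) would return the whole array, so handle keep <= 0 explicitly.
  const recent = keep > 0 ? history.slice(-keep) : [];
  if (recent.length === 0) return BASE_SYSTEM_NOTE;
  const lines = recent.map((c) => `- ${c.name}: ${c.description}`);
  return `${BASE_SYSTEM_NOTE}\n\nRecent celebrities to avoid repeating:\n${lines.join("\n")}`;
}
```

Keeping the helper a pure function of the saved history means it can be tried out without touching the database or the LLM.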

After sharing the instructions, the game begins by presenting the players with background on a celebrity client who needs an outfit for an upcoming event. One of my favorite examples is Flamboyanzo the Magnificent. I still regularly think about the name “Flamboyanzo”. I’m not sure why I find it so compelling, but I think this might be why I left the mimes in.

I’m Flamboyanzo the Magnificent, the world's greatest mime turned experimental stage magician! I need an outfit that shouts louder than my silent acting, with a show-stopping hat that could house a thousand rabbits and a cloak that could hide an elephant. Bring on the spectacle!

Using their phones, players have two minutes to describe an outfit that fulfills their client’s wishes and matches their personality.

Once all players have submitted, or the two minutes elapse, slightly conditioned versions of the players’ descriptions are sent back to the LLM for judgment and to an image generator for visualization.

`Full body red carpet photo of ${celebrity} wearing ${prompt}`
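As a rough sketch of what “slightly conditioned” could mean, here is one way to tidy a player’s text before dropping it into the template above. The specific steps (collapsing whitespace, trimming trailing punctuation, capping the length) and the 300-character limit are my assumptions for illustration, and `conditionDescription` and `buildImagePrompt` are hypothetical names.

```typescript
// Assumed cap on description length; the post does not state one.
const MAX_PROMPT_CHARS = 300;

// Lightly condition a player's free-text description: collapse runs of whitespace,
// cap the length, and strip trailing whitespace and punctuation.
function conditionDescription(raw: string, maxChars = MAX_PROMPT_CHARS): string {
  const collapsed = raw.replace(/\s+/g, " ").trim();
  const capped = collapsed.slice(0, maxChars);
  return capped.replace(/[\s.!?,;:]+$/, "");
}

// Fill the image-generation template quoted above.
function buildImagePrompt(celebrity: string, description: string): string {
  return `Full body red carpet photo of ${celebrity} wearing ${conditionDescription(description)}`;
}
```

For example, `buildImagePrompt("Flamboyanzo the Magnificent", "  a cloak\n that hides   an elephant!! ")` yields `Full body red carpet photo of Flamboyanzo the Magnificent wearing a cloak that hides an elephant`.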

The LLM judges how well each player’s prompt meets the celebrity brief and picks a winner. The game displays the resulting images and critiques in sequence before revealing the winner. We use prompt engineering techniques like asking the model to think out loud to both improve the quality of the judgments and to, hopefully, create some funny content:

A beaver tail? What kind of fashion catastrophe is this? However, the diamond lightning suit and ruby stilettos are quite interesting. And the magic wand with emeralds shooting out of it, now that's a spectacle. But laser gloves? This isn't Star Wars, honey.

Beaver tail and sapphire hat. Diamond lightning suit and ruby stilletos. Magic wand with emeralds shooting out of it. Laser gloves - by Murray
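To make the “think out loud” idea concrete, here is a hedged sketch of a judging step that asks for critiques before the verdict and then checks the reply. The JSON reply format, the `Submission` and `Judgment` types, and both function names are assumptions for illustration; the post does not describe the real prompt or response format.

```typescript
// Hypothetical types for a round's submissions and the judge's reply.
interface Submission {
  player: string;
  description: string;
}

interface Judgment {
  critiques: { player: string; critique: string }[];
  winner: string;
}

// Ask for the critiques before the winner so the model reasons in the open before it commits.
function buildJudgePrompt(brief: string, submissions: Submission[]): string {
  const entries = submissions.map((s) => `${s.player}: ${s.description}`).join("\n");
  return [
    `Your brief was: ${brief}`,
    "Here are the designers' outfits:",
    entries,
    "First critique each outfit in character, then choose a winner.",
    'Reply with JSON only: {"critiques": [{"player": "...", "critique": "..."}], "winner": "..."}',
  ].join("\n");
}

// Parse the reply and reject anything that is not a usable judgment.
function parseJudgment(reply: string, submissions: Submission[]): Judgment {
  const parsed = JSON.parse(reply) as Partial<Judgment>;
  const players = new Set(submissions.map((s) => s.player));
  if (!Array.isArray(parsed.critiques) || typeof parsed.winner !== "string") {
    throw new Error("Judge reply is missing critiques or a winner");
  }
  if (!players.has(parsed.winner)) {
    throw new Error(`Judge picked an unknown player: ${parsed.winner}`);
  }
  return { critiques: parsed.critiques, winner: parsed.winner };
}
```

Validating the winner against the actual player list is a cheap guard against the model inventing a contestant, which would otherwise surface as a confusing reveal screen.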

For image generation, we started by using the DALL-E API, as we’d already set up an OpenAI connection. This led to some interesting outputs, including a pretty fun interpretation of one of the celebrities as a llama. The human representation, however, was not sufficiently diverse from a skin tone or body type perspective and often had poor composition. We explored other image generator options and decided to go with Stability’s Stable Diffusion XL model due to cost and ease of setup balanced with image quality. This did create a tradeoff between the more fun imaginative mistakes that came from DALL-E and the vastly better average image quality from Stability. I don’t think there will be more llamas in Fashion Frenzy’s future, but there will be more garments I’d want to wear.

A generated photo of a llama: DALL-E's (not DALL-E-3's) best llama

We’ve played Fashion Frenzy with our families and colleagues. Their responses have helped to refine the game, and have also given me some confidence that we’re achieving our design goals. With less design- and technology-focused audiences like Russell’s parents, there’s a wonderful moment of delight and shock when the player’s garment image is revealed. These players don’t necessarily anticipate that their quick description will be transformed into a semi-realistic visualization, and there’s an inherent satisfaction in seeing your own vague thoughts translated into something beautiful (or funny). When sharing with my coworkers, I’ve gotten lots of specific and insightful questions about the prompt engineering that enabled the game design, and even philosophical questions about the nature of humor.

Building this game was a fun and creative experience for me. I used AI assistance tools like GitHub Copilot to help when authoring the game, trying to lean in as much as possible to an AI-enhanced process for my personal work. Part of why I feel so strongly about the creative possibilities of AI is because I’ve been using these tools and their predecessors in my process for years, in ways that I believe make my work better and more interesting. It’s true that Copilot makes me faster, but it also contributed to the prompt engineering of the system note.

The goal of the project was to make some part of the creative possibilities of AI legible and obvious to its audience. From our playtests we have evidence that this is, to at least some degree, working! We also had to confront the amount of effort it took to make a system that achieves this level of legible creativity. I certainly felt the limits of my own creativity, from tasks like patching up the model’s behavior through prompt engineering to designing a serviceable style palette for the web app’s front end. While I think we’ve gotten to a system that creates a compelling and fun game loop with engaging critiques from the LLM, the variation in the LLM’s scenarios and way of speaking isn’t infinite. Some of this is to be expected; after all, the background context doesn’t change (much) between runs. The LLM is in its own version of the movie Memento. But it’s also true that even with the care we put into our prompts, the system is not creating superhuman levels of fun.

I’m really glad Russell and I made this game. It was much more satisfying than having yet another conversation about LLMs.

Want to playtest? Drop me a line at jfizel@ideo.com.

An aside here at the end. I’m at home with my parents for the holiday and we decided to play Fashion Frenzy for the first time since GPT-4 upgraded to gpt-4-turbo. While the quality of the celebrity names hasn’t changed much, the scenarios have become more, well, elaborate. Here’s an example:

Hello to all you wannabes! I'm Morticia Crumble, the infamous multi-platinum kazoo player who once brought the Queen of England to tears with my rendition of 'Bohemian Rhapsody'. Now, I need an outfit for the annual Kazoo Symphony Gala that screams 'extravagant', but also subtly, yet blatantly, shows my undying love for rubber ducks. I want you to use feathers, lots of feathers, making me look like a fabulous canary in a rubber duck pond. Let's see what you got!

We may want to re-engineer a bit to bring down the complexity.

