Instruction-based image editing improves the controllability and flexibility of image manipulation through natural commands, without elaborate descriptions or regional masks. However, human instructions are sometimes too brief for current methods to capture and follow. Multimodal large language models (MLLMs) show promising capabilities in cross-modal understanding and visual-aware response generation. We investigate how MLLMs can facilitate edit instructions and present MLLM-Guided Image Editing (MGIE). MGIE learns to derive expressive instructions, which provide explicit guidance for the edit. The editing model jointly captures this visual imagination and performs the manipulation through end-to-end training. We evaluate MGIE on various aspects of image editing: Photoshop-style modification, global photo optimization, and local editing. Extensive experimental results demonstrate that expressive instructions are crucial to instruction-based image editing, and that MGIE leads to notable improvements in both automatic metrics and human evaluation while maintaining competitive inference efficiency.

| Input | Instruction | InsPix2Pix | LGIE | MGIE | Ground Truth |
|---|---|---|---|---|---|
| | turn the day into night | | | | |
| | make the forest path into a beach | | | | |
| | make the frame red | | | | |
| | as if the shop was a library | | | | |
| | make it the vatican | | | | |
| | turn the sunset into a firestorm | | | | |

| Input | Instruction | InsPix2Pix | LGIE | MGIE | Ground Truth |
|---|---|---|---|---|---|
| | remove text | | | | |
| | show him on a frozen lake with snowy mountains | | | | |
| | increase the brightness of the entire image | | | | |
| | take the people out of the back in the photo | | | | |
| | add tiger | | | | |
| | change the background to blue | | | | |

| Input | Instruction | InsPix2Pix | LGIE | MGIE | Ground Truth |
|---|---|---|---|---|---|
| | edit out skiers on right | | | | |
| | make it look more professional | | | | |
| | remove hot air balloons | | | | |
| | make colors pop out | | | | |
| | remove boy with red shirt from picture | | | | |
| | lighten out yellow tone | | | | |

| Input | Instruction | InsPix2Pix | LGIE | MGIE | Ground Truth |
|---|---|---|---|---|---|
| | add brightness so the clouds look bright white | | | | |
| | make the color more green | | | | |
| | add more contrast to simulate more light | | | | |
| | remove the blue hue out of the picture | | | | |
| | brighten image a lot, sharpen photo | | | | |
| | need to clarified, more focus | | | | |

| Input | Instruction | InsPix2Pix | LGIE | MGIE | Ground Truth |
|---|---|---|---|---|---|
| | have there be a birthday cake on the table | | | | |
| | put buildings in the background of the image | | | | |
| | make the face happy | | | | |
| | let there be palm trees | | | | |
| | has a green web page | | | | |
| | replace food with soup | | | | |

| Input | Instruction | Expressive Instruction by LGIE | Expressive Instruction by MGIE |
|---|---|---|---|
| | turn the day into night | Turn into night, such as a photograph of a starry night sky, a moon, stars, or a planet. Hold your cursor over the night sky icon. | If the day were to be turned into night in this image, the Eiffel Tower would be illuminated by artificial lights, creating a contrast against the dark sky. |
| | make the forest path into a beach | The beach would be made from the dirt and rocks that were on the forest path, and the trees would be transplanted to make way for the new beach. | The image would depict a person running along a sandy shore. The trees surrounding the path would be replaced with ocean waves. |
| | make the frame red | You can edit the image to make the frame red. You could also crop the frame to remove the white borders and make it more prominent. | If the glasses had a red frame instead of a black frame, the red frame would draw attention to the glasses with the gray background. |
| | as if the shop was a library | For the idea of making the shop look like a library, you can organize the books and flowers in a way that creates a calm and orderly environment. | This image shows a woman standing at a counter with an abundance of flowers. The books on the shelves behind her would replace the vases of flowers. |
| | make it the vatican | The image would be draped in gold and crimson fabrics. The colors would be deep and rich, evoking a sense of reverence and awe. | If the image were set at the Vatican, it would show a serene scene with the city's iconic architecture, such as St. Peter's Basilica or the Vatican City walls. |
| | turn the sunset into a firestorm | The entire scene would be enveloped in a hazy, orange-reddish glow, and the horizon would be defined by a towering wall of flame. | The already vibrant colors from the sunset would be intensified, with the orange hues becoming more fiery and possibly spreading across the sky. |

| Input | Instruction | Expressive Instruction by LGIE | Expressive Instruction by MGIE |
|---|---|---|---|
| | remove text | The sentence would become blank, leaving no visual representation of the previously described image. | The text could be removed completely, leaving only the dog and the pier. |
| | show him on a frozen lake with snowy mountains | The image would likely be quite picturesque and beautiful with a touch of brown from the frozen lake and the snow. | The man enjoys the thrill of gliding across the icy surface. The atmosphere would be accentuated by the snow-covered landscape and the distant mountains. |
| | increase the brightness of the entire image | Increasing the brightness of an image. The result is a clearer and more vibrant image. | Adjust the brightness to bring out the details. A setting with a group of women sitting around a round table would appear more vivid. |
| | take the people out of the back in the photo | The photo would be less crowded and have more space without the people in the back, creating a more tranquil atmosphere. | The scene would appear more empty and there would be a noticeable presence of the bride and groom. |
| | add tiger | This image features a cartoon illustration of a beloved character of Tiger. His bright orange fur stands out against the green grass below. | An image of a man holding a football in a field will be enhanced with the addition of tiger. It will engage in a playful interaction or enjoy the field. |
| | change the background to blue | If you change the background color of an image to blue, the other colors in the image will be adjusted to match the new background color. | A woman wearing a pink dress and holding yellow sunflowers would stand in front of a blue background. |

| Input | Instruction | Expressive Instruction by LGIE | Expressive Instruction by MGIE |
|---|---|---|---|
| | edit out skiers on right | This image originally featured skiers on the right side, but they have been removed, leaving the image empty. | The image would focus on the individual skier with the backpack instead of the group of skiers in the foreground. |
| | make it look more professional | Using high-quality, professional-looking stock images or custom artwork. Make it more visually appealing, or remove any unwanted elements. | This image could be made more professional by featuring a businessman and a composition that highlights the tall building in the background. |
| | remove hot air balloons | The sky is now empty. These colorful vessels have now disappeared, leaving no trace of their existence. | The mountain range would still be the focus of the photograph, but the balloon would no longer be a focal point. |
| | make colors pop out | Adjust the colors in the image pop to make them look more vivid and striking. This will create a more eye-catching contrast. | It would enhance the vibrancy and saturation of the man's tan trench coat and light blue sneakers, as well as the wooden bench and dirt field. |
| | remove boy with red shirt from picture | The image would be filled in the area where the boy with the red shirt is currently visible. | Without the boy, the scene would feature a man alone in a forested area, sitting on a log and enjoying a drink. |
| | lighten out yellow tone | Reduce the amount of yellow tone in the image by adjusting the white balance to make the tone appear less yellow. | The image of the man and woman in formal attire will appear more vibrant and lively, and the background will become less prominent. |

| Input | Instruction | Expressive Instruction by LGIE | Expressive Instruction by MGIE |
|---|---|---|---|
| | add brightness so the clouds look bright white | Make the clouds in your image look brighter and more white. Or remove them and then add back in as white. | Adjust the brightness of the image to bring out the details of the sky, the cloud, and the building in the picture. |
| | make the color more green | The image would become greener if the color was made more green. This would mean more green pixels, resulting in a greener hue overall. | Enhance the greenery in the image by making the grass greener. The field would have a richer hue and would stand out more against the backdrop. |
| | add more contrast to simulate more light | Adding more contrast to an image will make it appear more vibrant, sharp, and detailed, just like what you see in real life. | It would make the snow brighter, the skier's jacket stand out more, and other details more visible and dynamic. |
| | remove the blue hue out of the picture | Colors and details will be easier to distinguish, allowing the image to truly stand out. | The image would make the orange truck stand out. The sky in the background would also show more vivid colors. |
| | brighten image a lot, sharpen photo | The image would be transformed into a painting-like look, with bright colors and sharp details. It would be highly saturated. | Brighten the damaged window sill to bring out the details of the chipping paint and broken glass. |
| | need to clarified, more focus | It encourages the viewer to take a closer look and focus on the key elements, bringing the image to better clarity and sharpness. | This will help capture the intricate details of the yellow and orange flowers and maximize the visual impact of the bouquet. |

| Input | Instruction | Expressive Instruction by LGIE | Expressive Instruction by MGIE |
|---|---|---|---|
| | have there be a birthday cake on the table | A birthday cake sits on a table, decorated with candles and frosting. | This image features a kitchen with a dining table and chairs, and a birthday cake has been added to the table. |
| | put buildings in the background of the image | The image would be a peaceful cityscape. Silhouetted buildings in the background would be a stunning contrast. | The image would feature a train traveling down the tracks with city buildings in the background, creating a dynamic composition. |
| | make the face happy | Making the face emoji happier would involve adding more emotion and energy, like smiling lines and cheerful expressions. | Change the emoticon displayed on a cell phone screen to a cheerful one, next to a laptop on a desk. |
| | let there be palm trees | The image would be transformed from its current state to a landscape with palm trees, bringing a sense of warmth and vibrancy. | Palm trees would be a natural addition to a beach scene with a large clock in the foreground. |
| | has a green web page | This web page will show a green background with white text. The vibrant color combination is sure to make any content stand out. | The green web page is related to a project or task that the user is currently working on. The other scheme still remains. |
| | replace food with soup | The bowl was empty, so I filled it with a bowl of steaming soup. The aroma filled the room as I stirred the soup, ready to enjoy. | The image would change from sandwiches to bowls of soup. The tray would still have a plate but with a different meal option of soup. |