

ULTRAEDIT: INSTRUCTION-BASED FINE-GRAINED IMAGE EDITING AT SCALE

Haozhe Zhao*, Xiaojian Ma*, Liang Chen, Shuzheng Si, Rujie Wu, Kaikai An, Peiyu
Yu, Minjia Zhang, Qing Li†, Baobao Chang†
▶ Peking University ▶ BIGAI ▶ UCLA ▶ UIUC

* Equal Contribution
† Corresponding Authors
Dataset · Dataset (UltraEdit_500k) · Dataset (Region_Based_100k)
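For reference, a minimal sketch of how one released split could be loaded with the Hugging Face datasets library; the repository id and field names below are placeholders, and the authoritative values are on the dataset cards linked above.

```python
# Minimal sketch: loading one UltraEdit split with the Hugging Face `datasets`
# library. The repository id and field names are placeholders; the authoritative
# values are on the dataset cards linked above.
from datasets import load_dataset

ds = load_dataset("<org>/UltraEdit_500k", split="train")  # hypothetical repo id

sample = ds[0]
print(sample.keys())  # expected to include an editing instruction and source/target images
```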


ABSTRACT

This paper presents UltraEdit, a large-scale (~4M editing samples),
automatically generated dataset for instruction-based image editing. Our key
idea is to address the drawbacks of existing image editing datasets such as
InstructPix2Pix and MagicBrush, and to provide a systematic approach to producing
massive, high-quality image editing samples. UltraEdit offers several
distinct advantages: 1) It features a broader range of editing instructions by
leveraging the creativity of large language models (LLMs) alongside in-context
editing examples from human raters; 2) Its data sources are based on real
images, including photographs and artworks, which provide greater diversity and
reduced bias compared to datasets solely generated by text-to-image models; 3)
It also supports region-based editing, enhanced by high-quality, automatically
produced region annotations. Our experiments show that canonical diffusion-based
editing baselines trained on UltraEdit set new records on MagicBrush and
Emu-Edit benchmarks. Our analysis further confirms the crucial role of real
image anchors and region-based editing data.
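As context for these results, here is a minimal inference sketch of how such a diffusion-based editing baseline (an InstructPix2Pix-style pipeline from diffusers) could be run once trained on UltraEdit; the checkpoint id, file names, and instruction are placeholders, not an official release.

```python
# Minimal inference sketch for an InstructPix2Pix-style editing baseline.
# The checkpoint id and file names are placeholders; substitute a model that
# was actually trained on UltraEdit.
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline
from diffusers.utils import load_image

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "<org>/ultraedit-sd15",  # hypothetical checkpoint id
    torch_dtype=torch.float16,
).to("cuda")

source = load_image("example.jpg").resize((512, 512))
edited = pipe(
    prompt="Replace the sky with a sunset",  # editing instruction
    image=source,
    num_inference_steps=50,
    image_guidance_scale=1.5,  # fidelity to the source image
    guidance_scale=7.5,        # fidelity to the instruction
).images[0]
edited.save("edited.jpg")
```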




ULTRAEDIT

Construction of UltraEdit:
(Upper) We use an LLM with in-context examples to produce editing instructions and
target captions from the collected image captions.
(Middle) For free-form editing, we use the collected images as anchors and invoke
a regular diffusion pipeline with prompt-to-prompt (P2P) control to produce the
source and target images.
(Bottom) For region-based editing, we first produce an editing region from the
instruction, then invoke a modified inpainting diffusion pipeline to produce the
images (see the sketch below).
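As a concrete illustration of the region-based branch, here is a minimal sketch built on the plain Stable Diffusion inpainting pipeline from diffusers; UltraEdit's actual pipeline is a modified variant of this step, and the model id, file paths, and target caption are placeholders.

```python
# Sketch of the region-based branch: given a source image and an editing-region
# mask, an inpainting diffusion model regenerates only the masked region from
# the target caption. This uses the unmodified Stable Diffusion inpainting
# pipeline; UltraEdit's actual pipeline is a modified variant of this step.
import torch
from diffusers import StableDiffusionInpaintPipeline
from diffusers.utils import load_image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",  # a public inpainting checkpoint
    torch_dtype=torch.float16,
).to("cuda")

source = load_image("source.jpg").resize((512, 512))       # collected real image
region = load_image("region_mask.png").resize((512, 512))  # white = area to edit

target = pipe(
    prompt="a red wooden door",  # target caption produced by the LLM
    image=source,
    mask_image=region,
).images[0]
target.save("target.jpg")
```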





Comparison of different image editing datasets. Both EditBench and MagicBrush
are manually annotated but are limited in size. InstructPix2Pix and HQ-Edit are
large datasets automatically generated using T2I models like Stable Diffusion
and DALL-E, but they inherit notable biases from the generative models, leading
to failure cases. UltraEdit offers large-scale samples with rich editing tasks
and fewer biases.




EXAMPLE

Examples of UltraEdit. Free-form and Region-based Image Editing




STATISTICS

Statistics of Free-form and Region-based Image Editing Data. The table reports,
for each instruction type, the number of instances, the number of unique
instructions, and their respective proportions.



--------------------------------------------------------------------------------


ACKNOWLEDGEMENT

This website is adapted from ArxivCap, licensed under a Creative Commons
Attribution-ShareAlike 4.0 International License.