A pair of researchers at OpenAI has published a paper describing a new type of model, a continuous-time consistency model (sCM), that generates multimedia including images, video, and audio roughly 50 times faster than traditional diffusion models, producing an image in about a tenth of a second versus more than five seconds for regular diffusion.
With the introduction of sCM, OpenAI has achieved comparable sample quality in only two sampling steps, accelerating the generative process without compromising on quality.
Described in a paper published on arXiv.org (not yet peer reviewed) and a blog post released today, both authored by Cheng Lu and Yang Song, the innovation enables these models to generate high-quality samples in just two steps, significantly faster than previous diffusion-based models that require hundreds of steps.
Song was also a lead author on a 2023 paper from OpenAI researchers, including former chief scientist Ilya Sutskever, that introduced the idea of “consistency models,” defined by the property that “points on the same trajectory map to the same initial point.”
While diffusion models have delivered outstanding results in producing realistic images, 3D models, audio, and video, their inefficiency in sampling—often requiring dozens to hundreds of sequential steps—has made them less suitable for real-time applications.
Theoretically, the technology could provide the basis for a near-realtime AI image generation model from OpenAI. As fellow VentureBeat reporter Sean Michael Kerner mused in our internal Slack channels, “can DALL-E 4 be far behind?”
Faster sampling while retaining high quality
In traditional diffusion models, a large number of denoising steps are needed to create a sample, which contributes to their slow speed.
In contrast, sCM converts noise into high-quality samples directly within one or two steps, cutting down on the computational cost and time.
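To make the contrast concrete, here is a minimal NumPy sketch of what one- or two-step consistency sampling looks like in outline. The function names, noise levels, and toy model below are illustrative assumptions, not OpenAI's actual implementation; the real sCM is a large neural network trained for this mapping.

```python
import numpy as np

def consistency_sample(f, shape, sigma_max=80.0, sigma_mid=0.8, two_step=True, rng=None):
    """Sketch of one- or two-step consistency sampling.

    `f(x, sigma)` stands in for a trained consistency model that maps a
    noisy input at noise level `sigma` directly to an estimate of the
    clean sample, instead of denoising over hundreds of small steps.
    """
    rng = rng or np.random.default_rng(0)
    # Step 1: start from pure noise and jump straight to a clean estimate.
    x = sigma_max * rng.standard_normal(shape)
    sample = f(x, sigma_max)
    if two_step:
        # Step 2 (optional): re-noise to an intermediate level and denoise
        # once more, which typically improves sample quality.
        x = sample + sigma_mid * rng.standard_normal(shape)
        sample = f(x, sigma_mid)
    return sample

# Toy placeholder "model": simply shrinks the input toward zero.
toy_f = lambda x, sigma: x / (1.0 + sigma)
out = consistency_sample(toy_f, shape=(4, 4))
print(out.shape)  # (4, 4)
```

A diffusion sampler would instead loop this denoising call dozens to hundreds of times, which is where the wall-clock difference comes from.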
OpenAI’s largest sCM model, which boasts 1.5 billion parameters, can generate a sample in just 0.11 seconds on a single A100 GPU.
This results in a 50x speed-up in wall-clock time compared to diffusion models, making real-time generative AI applications much more feasible.
Reaching diffusion-model quality with far less computational resources
The team behind sCM trained a continuous-time consistency model on ImageNet 512×512, scaling up to 1.5 billion parameters.
Even at this scale, the model maintains a sample quality that rivals the best diffusion models, achieving a Fréchet Inception Distance (FID) score of 1.88 on ImageNet 512×512.
This brings the sample quality within 10% of diffusion models, which require significantly more computational effort to achieve similar results.
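For readers unfamiliar with the metric: FID compares Gaussian statistics (mean and covariance) of Inception-network features extracted from real versus generated images, with lower scores indicating closer distributions. A small self-contained sketch of the formula, using a pure-NumPy matrix square root rather than any particular evaluation library:

```python
import numpy as np

def _sqrtm_psd(m):
    # Matrix square root of a symmetric PSD matrix via eigendecomposition.
    vals, vecs = np.linalg.eigh(m)
    vals = np.clip(vals, 0.0, None)
    return (vecs * np.sqrt(vals)) @ vecs.T

def fid(mu1, s1, mu2, s2):
    """Fréchet Inception Distance between two Gaussians:
    ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^{1/2}).

    Uses Tr((S1 S2)^{1/2}) = Tr(sqrt(S1^{1/2} S2 S1^{1/2})) so that
    every square root is taken of a symmetric PSD matrix.
    """
    diff = mu1 - mu2
    s1_half = _sqrtm_psd(s1)
    covmean = _sqrtm_psd(s1_half @ s2 @ s1_half)
    return float(diff @ diff + np.trace(s1 + s2) - 2.0 * np.trace(covmean))

# Identical statistics give a distance of zero.
mu, cov = np.zeros(3), np.eye(3)
print(round(fid(mu, cov, mu, cov), 6))  # 0.0
```

In practice the statistics come from tens of thousands of images, so small FID differences like the reported 1.88 reflect distribution-level quality, not individual samples.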
Benchmarks reveal strong performance
OpenAI’s new approach has undergone extensive benchmarking against other state-of-the-art generative models.
By measuring both the sample quality using FID scores and the effective sampling compute, the research demonstrates that sCM provides top-tier results with significantly less computational overhead.
While previous fast-sampling methods have struggled with reduced sample quality or complex training setups, sCM manages to overcome these challenges, offering both speed and high fidelity.
The success of sCM is also attributed to its ability to scale proportionally with the teacher diffusion model from which it distills knowledge.
As both the sCM and the teacher diffusion model grow in size, the gap in sample quality narrows further, and increasing the number of sampling steps in sCM reduces the quality difference even more.
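The distillation idea can be sketched in a few lines: the student is trained so that its outputs at two adjacent noise levels on the same trajectory agree, with the teacher supplying the trajectory. The toy training step below (plain NumPy, no autograd) is a heavily simplified illustration under assumed names, not OpenAI's exact objective:

```python
import numpy as np

def distillation_step(student, teacher, x_clean, sigma_a, sigma_b, rng=None):
    """Toy sketch of one consistency-distillation step.

    Returns a scalar consistency loss: the student's denoised outputs at
    two noise levels on the same teacher-defined trajectory should match.
    """
    rng = rng or np.random.default_rng(0)
    noise = rng.standard_normal(x_clean.shape)
    x_a = x_clean + sigma_a * noise          # point at the higher noise level
    # The teacher takes one solver step from sigma_a down toward sigma_b.
    x_b = teacher.step(x_a, sigma_a, sigma_b)
    out_a = student(x_a, sigma_a)
    out_b = student(x_b, sigma_b)
    return float(np.mean((out_a - out_b) ** 2))

class ToyTeacher:
    def step(self, x, sa, sb):
        # Placeholder dynamics: linearly shrink as the noise level drops.
        return x * (sb / sa)

toy_student = lambda x, sigma: x / (1.0 + sigma)
loss = distillation_step(toy_student, ToyTeacher(), np.zeros((2, 2)), 2.0, 1.0)
print(loss >= 0.0)  # True
```

A real training loop would backpropagate this loss through the student network; the article's point is that as both student and teacher scale up, the residual quality gap shrinks.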
Applications and future uses
The fast sampling and scalability of sCM models open new possibilities for real-time generative AI across multiple domains.
From image generation to audio and video synthesis, sCM provides a practical solution for applications that demand rapid, high-quality output.
Additionally, OpenAI’s research hints at the potential for further system optimization that could accelerate performance even more, tailoring these models to the specific needs of various industries.