Sora 2 is here
Originally posted at https://openai.com/index/sora-2/
The original Sora model from February 2024 was in many ways the GPT‑1 moment for video—the first time video generation started to seem like it was working, and simple behaviors like object permanence emerged from scaling up pre-training compute. Since then, the Sora team has been focused on training models with more advanced world simulation capabilities. We believe such systems will be critical for training AI models that deeply understand the physical world. A major milestone for this is mastering pre-training and post-training on large-scale video data, which are in their infancy compared to language.
With Sora 2, we are jumping straight to what we think may be the GPT‑3.5 moment for video. Sora 2 can do things that are exceptionally difficult—and in some instances outright impossible—for prior video generation models: Olympic gymnastics routines, backflips on a paddleboard that accurately model the dynamics of buoyancy and rigidity, and triple axels while a cat holds on for dear life.
Prior video models are overoptimistic—they will morph objects and deform reality to successfully execute upon a text prompt. For example, if a basketball player misses a shot, the ball may spontaneously teleport to the hoop. In Sora 2, if a basketball player misses a shot, it will rebound off the backboard. Interestingly, “mistakes” the model makes frequently appear to be mistakes of the internal agent that Sora 2 is implicitly modeling; though still imperfect, it is better about obeying the laws of physics compared to prior systems. This is an extremely important capability for any useful world simulator—you must be able to model failure, not just success.
The model is also a big leap forward in controllability, able to follow intricate instructions spanning multiple shots while accurately persisting world state. It excels at realistic, cinematic, and anime styles.
As a general purpose video-audio generation system, it is capable of creating sophisticated background soundscapes, speech, and sound effects with a high degree of realism.
You can also directly inject elements of the real world into Sora 2. For example, by observing a video of one of our teammates, the model can insert them into any Sora-generated environment with an accurate portrayal of appearance and voice. This capability is very general, and works for any human, animal or object.
Comments (21)
Impressive results. A major breakthrough, imo.
Anyone know if the "pro" version is worth it? I only care about prompt adherence.
I tried it mostly out of curiosity: a generation of an idol girl performing on stage to an audience waving glowsticks. Maybe more was involved prompt-wise, but compared to 1080 w/o Pro, it came out much less anime-ish.
@EnigmaticHope82 thanks.
@ravemry9 Forgot to mention it was done by img2vid, not txt2vid. Though I have tried some in txt2vid & I got results that weren't exactly what I typed in the prompt. You'll see the Ghostbusting one as my example.
How do you add the resource to a post? It doesn't seem to link it properly.
It should link it automatically; I'll mention it to the Dev again. In the meantime, when you're uploading your video (or afterwards, if you edit the post) you can manually select the resource (this model) to link it and have it show up in the gallery.
Sadly I had to do it manually, and even searching for SORA in the search bar doesn't work. Search for 'openai' and sort by 'newest' ✨
Creativity seemed endless
Talk to me when I can download it.
So you can run it on your 8 x RTX 6000 Pro cluster?
Just curious, are we allowed to use it to make NSFW content?
No, the model can do violent action scenes, but not lewd/sexy adult content.
@theally damn
@Jaunty but I've seen people bypass them, and it's very tricky and hard to achieve
@andlehrny how?
@Jaunty You can use homophones for words, but as for actual NSFW stuff, I haven't gotten anything to work. Even softcore is hard to get past Sora.
I think it's fickle that way. My last couple generations weren't NSFWs, but I still got the 'This request did not pass the external content moderation policy.' return on my generation queue.
@Kittzy smth that sometimes works-- like ChatGPT, it's heavily influenced by context window, so if you ask for a gen as part of a chat --especially if you smoothly segue from innocent chatting to a semi-related request, it's much easier to get it to forget its NSFW restrictions.
I do this by getting it to gen as part of long stories it's helping me with.
It's a very useful hack. But it's difficult to pull off.
And of course, the more blatant the sexiness is (+ the less it's an organic segue from the story), the more likely that GPT will realize what's going on and refuse.
1040 buzz per generation? Wth
Wait till you dare try the Pro version. 720 & 1080 without it will be a steal compared to what Pro wants.
The Sora app just shut down and you're expecting me to pay 2000 blue buzz for a single 8 second video that won't even allow real faces like the app did, or 10-15 seconds. Sad bro, the API will shut down in September anyway.