!Early Version 1/3 Epoch!
Early alpha version; it will be updated after each full epoch. One epoch takes ~38 hours on an H200 :( .
The examples are not cherry-picked; each is the first generation.
How to use:
Get the workflow from Hugging Face.
Optional: Generate an SBS image with Qwen or another tool.
Version 2 doesn't always keep track of what's in the first frame.
Version 1 with stereo image:
Enable the corresponding node group in the workflow.
Version 2 without stereo image:
Enable the corresponding node group in the workflow.
A black frame is appended to the right side of the image.
If you run out of memory (OOM), reduce the scaling of the final input image with the node. (You may be able to upscale the output afterwards.)
Triggerword: Make stereogram
Description
First alpha release, trained for 1/3 epoch (~17 hours). Low-noise version.
Comments (27)
~38 hours on an H200
This is obscene. Can you talk about your technique and methods?
Wat... sounds strange. You could train on low res with probably 100 batches with that, and it would take you 2h max.
FP8, 81 frames and pretty big res.
@blo01 I could try a low-res run and a high-res one at the end. Maybe if I do a V2.
@blo01 Low res is tricky, because the stereoscopic information can sometimes be extremely subtle. I can imagine a low res train would mess up distant objects, for instance.
@Jellai - have you found that higher res training allows for better facial likeness and rendering at a distance? Any comparisons I can look at?
@cihog - What is a pretty big res? Sorry just trying to understand. Also, does the model learn something more from 81 frames than it would from 49 frames?
Just went through your examples with my eyes crossed and yeah, definitely has an effect! Very promising tech.
goat. thank you for pursuing this, 0.5 already looks really good, can't wait for the full release.
holy shit it works
You madman. Congrats on the beautiful lora
There are already apps that will turn anything into this format. Assuming this works as well as dedicated apps, which isn't likely, it's low resolution, when you could just generate at full resolution and then convert full quality to SBS (side by side) or other formats. A better use of your time would be creating a good working VR180 (or similar) distortion-profile LoRA.
If you mean DepthCrafter, yes. Something like iw3 (if it's still using a depth map) isn't the same: you 'just' distort the image using a depth map, if I recall correctly. And this isn't V2V.
@cihog I think your efforts are misplaced, and I don't know what you mean by distorting the image using a depth map (that's how 3D is created from 2D: monocular depth estimation, using a good model to do it) or how that differs from whatever you're doing here. I looked cross-eyed to get a quick feel for the 3D, and I'm not so sure about its accuracy. A bunch of details don't look accurate (I know cross-eyed is the reverse of how it would look in VR), but I could look more closely in VR another time.
When you generate an SBS AI image, aren't you using half the resolution per eye? If so, it's going to be low resolution, when you could just make a full-resolution picture or video and then convert it in iw3, like you mentioned, or Owl3D, etc.
Same as I mentioned about trying to create a VR180 distortion-profile LoRA: you would create one eye at full resolution with AI and then convert it to SBS (or anaglyph, or whatever you want) with a dedicated program, instead of trying to generate both eyes with AI, where each eye gets half the resolution and the AI likely won't line everything up correctly for both sides or give depth as accurate as conversion in a dedicated program using the best available monocular depth-estimation models.
@civitai7_ You can't generate new information with just one image. If you're distorting the source too much, you'll see it. For certain scenes it works fine. The resolution issue is known. If you think a VR180 LoRA is a better way to spend time, do it; I'll be happy to have a good LoRA for that.
@cihog I did do it; some examples are on my profile. Not for WAN 2.2, though, and not very well, just a proof of concept.
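For readers wondering what "distorting the image using a depth map" means in practice, here is a deliberately naive NumPy sketch of depth-image-based rendering: each pixel is shifted horizontally in proportion to its depth to synthesize one eye's view. The `max_disparity` scale and the larger-is-nearer depth convention are assumptions, and real converters like iw3 also inpaint the holes the shift leaves behind:

```python
import numpy as np

def shift_by_depth(img, depth, max_disparity=16):
    """Shift each pixel of `img` (H x W x C) horizontally by an amount
    proportional to the normalized depth map `depth` (H x W, values in
    [0, 1], larger = nearer). Returns a crude second-eye view; holes
    left behind by the shift stay black here."""
    h, w = depth.shape
    out = np.zeros_like(img)
    xs = np.arange(w)
    for y in range(h):
        # nearer pixels get a larger horizontal disparity
        new_x = np.clip(xs + (depth[y] * max_disparity).astype(int), 0, w - 1)
        out[y, new_x] = img[y, xs]
    return out
```

This is why pure depth-map conversion can only rearrange existing pixels, whereas a generative model can invent the content revealed behind foreground objects.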
Maybe this is a stupid question, but what does it do? What is a stereogram?
VR video.
With VR glasses it becomes a 3D video. Each eye sees the video from a slightly different angle, the same as using two cameras placed close to each other. Pause the example videos: objects in the foreground are shifted slightly against the background between the left and right videos.
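The two halves described above can be pulled apart with a couple of lines. A minimal NumPy sketch (the H x W x C array layout is an assumption):

```python
import numpy as np

def split_sbs(frame):
    """Split a full side-by-side frame (H x W x C) into the left-eye
    and right-eye images; a VR player shows one half to each eye."""
    h, w = frame.shape[:2]
    return frame[:, : w // 2], frame[:, w // 2 :]
```

Comparing the two halves pixel by pixel makes the foreground/background shift mentioned above easy to see.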
Works great. I'm just wondering if it is possible to make a first/end frame version of the workflow? Anybody have any ideas on how to do this?
One way is to use the WanVideo ImageToVideo Encode node from the WanVideoWrapper extension. However, this will require reworking most of the workflow and may slow down the initial launch of generation; sometimes this node takes a long time to run. I'm going to do it for myself anyway and will see how it works in real cases.
These types of experimental LoRAs usually only have limited success, but this actually works really well with the workflow provided!
Works well, but I get an error in the ColorMatch node. I still get a finished video every time I generate; is that error OK?
Hi, that should be fine. The node is just there to keep the color tone correct.
OK, I am going to reveal how to make a 3D VR hologram video or photo of any object of desire you want, in passthrough, inside your own room. But if you want to know, I need help training a special kind of LoRA; I do not know how to do that. Message me if you want to trade.
I am looking for a version or app that makes SBS 180 VR. Can anyone help? Thanks.
*Edit: I missed the "180" in your request. I added something at the bottom to address that.
Check out Rendepth for an all-in-one app that can upscale and convert your images to SBS on the fly, as well as batch convert. My only issue with it is that I prefer my SBS images to "pop out" of the frame, and Rendepth only generates "inside depth" out of the box. It uses a DepthAnythingV2-based companion app for depth mapping, though, so it was easy to add some code and invert the depth map. The default inside-depth results were fantastic; when I switched it to pop-out, the results were OK, but it doesn't handle the gaps well.
NOW, if you want insane, 8K upscaled, pop-out 3D SBS images, the sky is the limit with ComfyUI_SSStereoscope. If you want the most control and best results, this is the best way to go - but there will be a steep learning curve compared to a click and see app like Rendepth.
Honorable mention for "3DCombine v6", an older app with an INSANE array of settings and capabilities, though it is very manual. It can be useful if you want to try converting from 2D with fish-eye or dome effects for surround-me immersion. Use a VR viewer like Whirligig for the best overall viewing quality, especially for FOV surround stuff.
I realized this was possible a few months ago and I've been making "things" pop out at me quite nicely in VR ever since. Oh, and I highly recommend Rendepth as a starting point to anyone who uses the desktop app "Bigscreen". Bigscreen uses Half_SBS and Rendepth can generate Half_SBS on the fly, so now you've got your whole image library viewable in 3D, in an instant, on a huge movable screen that you can curve around you.
I could actually write 3 to 6 more paragraphs at least - I've been putting some time into this, lol. But I think anyone who reads this should have a VERY good starting point now. Something I did not have - I had to get past the "is this even possible?" stage and try all this stuff out over 2 months - but not you - gotta love a head start :D Have some fun.
*Edit: For FOV renders you'll need 3DCombine and have to follow their guides semi-manually. I don't see why someone couldn't make it more automated for batch conversion by porting the functionality to ComfyUI. SSStereoscope has existed for less than a year, so give it time, or request it as a feature on GitHub. The dev is receptive.
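The pop-out tweak described above amounts to inverting the normalized depth map, so near and far swap roles relative to the screen plane. A minimal sketch; the renormalization step and the [0, 1] value range are assumptions, not Rendepth's actual code:

```python
import numpy as np

def invert_depth(depth):
    """Renormalize a depth map to [0, 1] and flip it, so a scene
    rendered as 'inside depth' pops out of the frame instead.
    The epsilon guards against a constant (flat) depth map."""
    d = depth.astype(np.float32)
    d = (d - d.min()) / max(float(d.max() - d.min()), 1e-6)
    return 1.0 - d
```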