Pony Diffusion is an excellent model for 2D content, but it seems rather inconsistent with 3D. This model is designed to produce photorealistic 3D images of a variety of subjects more consistently. The current beta version still gives a CGI-like look, probably because I do not have enough sample images yet, but hopefully future versions will be more realistic. For now, I recommend checking the description of each version to see what it does and what its drawbacks are.
Description
This time I trained the model on Civitai at 1024x1024 using booru-style tag captions instead of sentences, hoping it would yield better results, but I still seem to be running into the same issues as before: the output lacks detail and looks washed out. Perhaps I haven't trained it for enough steps, or maybe running it at a batch size over 1 is messing it up. Either way, it at least produces a somewhat photorealistic 3D style, though it still has issues with backgrounds. On my next attempt I will remove some images from the dataset and use a batch size of 1 with more steps.
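As a rough illustration of why batch size and step count are linked: at a fixed number of epochs, a larger batch size means fewer optimizer steps. The sketch below assumes the common convention where each epoch shows every image `repeats` times and a partial final batch still counts as a step. The function name and all the numbers are hypothetical, not the settings used for this model, and trainers may count steps differently.

```python
import math


def total_steps(num_images: int, repeats: int, epochs: int, batch_size: int) -> int:
    """Estimate optimizer steps for a training run.

    Assumes each epoch sees every image `repeats` times and that a
    partial final batch still counts as one step.
    """
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return steps_per_epoch * epochs


# Hypothetical dataset and settings, for illustration only.
print(total_steps(num_images=100, repeats=5, epochs=10, batch_size=4))  # 1250
print(total_steps(num_images=100, repeats=5, epochs=10, batch_size=1))  # 5000
```

With these made-up numbers, dropping the batch size from 4 to 1 gives four times as many steps for the same number of epochs, so removing images from the dataset would need to be offset by more repeats or epochs to keep the step count up.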