Same dataset as my ????? Caption (Flux) model. The text in the captions won't be as good because Pony doesn't render text well, but I figured the spirit of a good ????? caption can still be captured with this model.