A character LoRA for LTX-Video 2.3 (22B) trained with the audio-video training mode, so the character's voice and speech delivery are baked into the adapter alongside her appearance and motion. Trained on 128 short Arcane clips (video + stereo audio).
Training details
Base model: Lightricks/LTX-2.3 (22B)
Training framework: ltx-trainer (Lightricks)
Training strategy: text-to-video + audio (with_audio: true)
Best checkpoint: step 21,000 (out of 22,000 total)
LoRA rank / alpha: 128 / 128
Target modules: to_k, to_q, to_v, to_out.0 (video + audio + cross-modal attention)
Optimizer: Prodigy
Mixed precision: bf16
Batch size: 1
First-frame conditioning: 0.3 (the adapter also works in image-to-video mode)
Resolution buckets: 960x544 @ 49 / 97 / 121 / 145 / 193 frames
Dataset: 128 short clips
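If you are preparing your own clips for similar buckets, a quick sanity check can help. The sketch below is not part of ltx-trainer. It assumes the usual LTX-style shape constraints (width and height divisible by 32, frame count of the form 8k + 1), which this card does not state explicitly, and the helper name `is_valid_bucket` is hypothetical.

```python
# Sanity check for the resolution buckets listed above.
# ASSUMPTION: LTX-style constraints (spatial dims divisible by 32,
# frame count equal to 8*k + 1). Verify against the trainer docs.

BUCKET_WIDTH, BUCKET_HEIGHT = 960, 544
FRAME_COUNTS = [49, 97, 121, 145, 193]


def is_valid_bucket(width, height, frames, spatial_multiple=32, temporal_stride=8):
    """Return True if the bucket satisfies the assumed shape constraints."""
    spatial_ok = width % spatial_multiple == 0 and height % spatial_multiple == 0
    temporal_ok = frames % temporal_stride == 1
    return spatial_ok and temporal_ok


if __name__ == "__main__":
    for frames in FRAME_COUNTS:
        status = "ok" if is_valid_bucket(BUCKET_WIDTH, BUCKET_HEIGHT, frames) else "invalid"
        print(f"{BUCKET_WIDTH}x{BUCKET_HEIGHT} @ {frames} frames: {status}")
```

Under these assumed constraints, all five buckets used for this LoRA pass the check.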
Inference
For inference I used ComfyUI.
Trigger word: Nfj1nx
Strength: 0.8-1.0.
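The settings above can be sketched in a few lines of Python for anyone scripting prompts outside the ComfyUI graph. This is only an illustration: the helper names are hypothetical, placing the trigger word at the start of the prompt is an assumption (the card only says it should be present), and the scale calculation assumes the standard LoRA convention in which the weight update is multiplied by alpha / rank and then by the loader strength.

```python
# Illustrative helpers only; none of these names come from ComfyUI or ltx-trainer.

TRIGGER_WORD = "Nfj1nx"
RECOMMENDED_STRENGTH = (0.8, 1.0)


def build_prompt(description, trigger=TRIGGER_WORD):
    """Prepend the trigger word unless the prompt already contains it.

    ASSUMPTION: the trigger word goes at the start of the prompt.
    """
    description = description.strip()
    if trigger in description:
        return description
    return f"{trigger}, {description}"


def clamp_strength(strength, bounds=RECOMMENDED_STRENGTH):
    """Clamp a LoRA strength into the recommended range."""
    low, high = bounds
    return max(low, min(high, strength))


def effective_lora_scale(strength, rank=128, alpha=128):
    """Standard LoRA scaling (alpha / rank) multiplied by the loader strength.

    ASSUMPTION: the loader applies strength as a plain multiplier on the update.
    """
    return strength * alpha / rank


if __name__ == "__main__":
    print(build_prompt("a woman speaks to the camera"))
    print(effective_lora_scale(clamp_strength(0.9)))
```

Because rank and alpha are both 128 here, alpha / rank is 1, so under this convention the strength you set is the effective scale applied to the adapter.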
Important Notes
This LoRA was created as part of a fan project for research purposes only and is not intended for commercial use. It is based on the TV series "Arcane", which is protected by copyright. Users use the model at their own risk and are obligated to comply with copyright laws and applicable regulations. The model was developed for non-commercial purposes, and it is not my intention to infringe on any copyright. I assume no responsibility for any damages or legal consequences arising from the use of the model.
Acknowledgement
Special thanks to Lightricks for open-sourcing the LTX-2 trainer and releasing the model.
Support
Fine-tuning models like this requires renting cloud GPUs, which gets expensive quickly. If you find this LoRA useful and would like me to keep contributing open-source models, your support is very much appreciated: