CivArchive
    Deepseek Janus Pro 1B / 7B [Safetensors] - Janus-Pro-1B (zipped)
    NSFW

    https://huggingface.co/deepseek-ai/Janus-Pro-1B

    https://huggingface.co/deepseek-ai/Janus-Pro-7B

    Note: The CY-CHENYUE/ComfyUI-Janus-Pro nodes don't support .safetensors.

    So I forked and updated model_loader.py so that it downloads the model automatically and supports .safetensors. It refused to let me rename the files, so you need to keep them named model.safetensors

    For the 7B version, I could not get shard-merging to work, so the model stays sharded in 3 parts.
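
For anyone curious how a sharded checkpoint stays usable without merging, here is a minimal sketch. It assumes the shards follow the common Hugging Face convention of an index file (usually named model.safetensors.index.json) whose "weight_map" maps each tensor name to a shard file. Whether this particular upload ships such an index is an assumption, and the tensor and file names below are made up for illustration.

```python
import json
from collections import defaultdict


def tensors_per_shard(index_json):
    """Group tensor names by the shard file that stores them."""
    weight_map = json.loads(index_json)["weight_map"]
    shards = defaultdict(list)
    for tensor_name, shard_file in weight_map.items():
        shards[shard_file].append(tensor_name)
    return dict(shards)


# Hypothetical three-shard index; none of these names come from the real model.
example_index = json.dumps({
    "weight_map": {
        "a.weight": "model-00001-of-00003.safetensors",
        "b.weight": "model-00002-of-00003.safetensors",
        "c.weight": "model-00003-of-00003.safetensors",
        "d.weight": "model-00001-of-00003.safetensors",
    }
})

if __name__ == "__main__":
    for shard, names in tensors_per_shard(example_index).items():
        print(shard, names)
```

A loader that reads such an index only has to open the shard named for each tensor, which is why merging is a convenience rather than a requirement.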

    Installation instructions

    • Install ComfyUI

    • Install the CY-CHENYUE/ComfyUI-Janus-Pro node-pack

    • Manually overwrite the model_loader.py in ComfyUI\custom_nodes\ComfyUI-Janus-Pro\nodes\model_loader.py with the one above

    • You can use the ComfyUI Workflow above

    • The updated model_loader script will automatically download the model and place it in the correct folder

    • To do it manually, unzip the files for your desired version in the model list above so that the folder structure looks something like the screenshot below.

    So the model path for the 1B version should be:

    ComfyUI/models/Janus-Pro/Janus-Pro-1B/model.safetensors

    But remember that you also need the config and the rest of the files, which is why it's uploaded as a .zip

    There's also a version that is just the support-files, if you would rather combine that with the original .bin checkpoint models.
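
If you place the files manually, a quick check like the sketch below can confirm that the folder matches the path given above. Only model.safetensors is named in this post; config.json is an assumed example of "the config and the rest of the files", so adjust the list to whatever the zip actually contains. The function name is hypothetical.

```python
from pathlib import Path

# Only model.safetensors is named in the post; config.json is an assumed
# stand-in for "the config and the rest of the files".
REQUIRED_FILES = ("model.safetensors", "config.json")


def missing_model_files(comfy_root, model_name="Janus-Pro-1B"):
    """Return the required files that are absent from the model folder."""
    model_dir = Path(comfy_root) / "models" / "Janus-Pro" / model_name
    return [name for name in REQUIRED_FILES if not (model_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files("ComfyUI")
    print("All files found" if not missing else f"Missing: {missing}")
```

Run it from the folder that contains your ComfyUI directory, or pass the full path to ComfyUI instead.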

    Congratulations!

    With a 3090, 24gb, you can enjoy speedy 8-minute generations for a 384x384 image that looks much worse than anything Stable Diffusion 1.5 spits out in 0.5 second.

    Janus-Pro is a novel autoregressive framework that unifies multimodal understanding and generation. It addresses the limitations of previous approaches by decoupling visual encoding into separate pathways, while still utilizing a single, unified transformer architecture for processing. The decoupling not only alleviates the conflict between the visual encoder’s roles in understanding and generation, but also enhances the framework’s flexibility. Janus-Pro surpasses previous unified models and matches or exceeds the performance of task-specific models. The simplicity, high flexibility, and effectiveness of Janus-Pro make it a strong candidate for next-generation unified multimodal models.
    Janus-Pro is a unified understanding and generation MLLM, which decouples visual encoding for multimodal understanding and generation. Janus-Pro is constructed based on the DeepSeek-LLM-1.5b-base/DeepSeek-LLM-7b-base.
    For multimodal understanding, it uses the SigLIP-L as the vision encoder, which supports 384 x 384 image input. For image generation, Janus-Pro uses the tokenizer from here with a downsample rate of 16.
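
As a rough worked example of what a downsample rate of 16 means: assuming a square 384 x 384 generated image (the size mentioned elsewhere on this page) and one token per downsampled grid cell, the arithmetic looks like this. The function name is hypothetical.

```python
def image_token_grid(image_size=384, downsample_rate=16):
    """Return (grid side length, total tokens) for a square image."""
    if image_size % downsample_rate != 0:
        raise ValueError("image_size must be divisible by downsample_rate")
    side = image_size // downsample_rate
    return side, side * side


if __name__ == "__main__":
    side, tokens = image_token_grid()
    print(f"{side} x {side} grid = {tokens} image tokens")  # 24 x 24 grid = 576 image tokens
```

Under those assumptions an autoregressive model would have to produce several hundred image tokens one after another for a single picture.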

    This is the converted .safetensors version of the model.

    The original 7B ones can be found here: https://huggingface.co/deepseek-ai/Janus-Pro-7B/tree/e6ac502c7931490e5b56b0ff2d30413f2a21b887

    Comments (76)

    killerdukk110Jan 28, 2025
    CivitAI

    awesome, is there a safe tensors conversion of the 7b model available?

    mnemic
    Author
    Jan 28, 2025

    Yes. I've combined them and I'm uploading it right now.

    mnemic
    Author
    Jan 28, 2025

    @muxelmann Thanks!
    I didn't know how to get the link to the PR, so I got them manually, and wanted to save people the time and effort to get the model files in that way :)

    0l1v1aR0551Jan 28, 2025· 12 reactions
    CivitAI

    J-ANUS 🫱(‿¤‿)🫲

    Huh lol, is it any good?

    mnemic
    Author
    Jan 28, 2025

    @P_Universe No

    @mnemic damn I thought it could be the next success

    Dom83Jan 28, 2025
    CivitAI

    Can this be used in Forge or is it only compatible with ComfyUI?

    mnemic
    Author
    Jan 28, 2025

    Only ComfyUI until someone integrates it into Forge.

    JayeciferJan 28, 2025· 1 reaction
    CivitAI

    It's really slow and isn't giving me great outputs.

    MrDOJan 28, 2025· 4 reactions
    CivitAI

    At this moment only garbage images from 7B model :/

    mnemic
    Author
    Jan 28, 2025

    Yeah, it seems to be incapable of anything reasonable.

    The 1b model generates fast after the initial load at least.

    alternative_UniverseJan 28, 2025
    CivitAI

    What's the recommended cfg and samplers?

    mnemic
    Author
    Jan 28, 2025· 5 reactions

    The recommended CFG is to go back to SD1.5, it produces better outputs than this garbage.

    @mnemic lol

    pychobj2001741Jan 29, 2025

    @mnemic I came here to say something like this also...mine was going to be... "Don't" just by looking at the examples here unless you are into the "Nightmare Fuel" aesthetic.

    mnemic
    Author
    Jan 30, 2025

    @pychobj2001741 100%
    When I realized, I was even more determined to get it working and share the results, just to save people the time.

    pychobj2001741Jan 30, 2025· 3 reactions

    @mnemic The hero we deserve

    pychobj2001741Jan 30, 2025

    @mnemic I don't know the Batman quote

    RedPinkRetroJan 28, 2025
    CivitAI

    😕
    • takes ~15-20s per generation (on a 4080)

    • requires lots of VRAM and still hits oom with more than a simple sentence of prompt (with 16gb VRAM)

    • to get outputs with 384x384 resolution with lots of hallucinations and deformations

    TLDR: Functions ok as an image captioner ~Florence2 level, but using 10x resources and space...

    mnemic
    Author
    Jan 28, 2025· 1 reaction

    Yup! It's quite funny :D

    RedPinkRetroJan 28, 2025

    @mnemic Maybe something useful will come from it at some point. For now it seems to be quite gimmicky like the Omnigen model, which did everything and nothing, taking ages in the process 😅

    mnemic
    Author
    Jan 28, 2025

    @RedPinkRetro Yeah, let's see.

    SwampGassedJan 28, 2025· 5 reactions
    CivitAI

    I've watched a few videos on this, doesn't seem to even be worth messing with right now, don't believe the hype people. 🤔

    mnemic
    Author
    Jan 28, 2025

    Oh the hype is real, just check out the preview images XD

    Pandaofd00mJan 28, 2025· 14 reactions
    CivitAI

    And this killed 15% of NVIDIA's market share? Oh boy.

    (yes I know this was about the LLM itself and not the image generation - but still)

    denrakeiwJan 28, 2025· 1 reaction

    Buy the Dip ;)

    SencneSJan 28, 2025

    Heheh. Making the model open and free to install on your own instance is what ate the market share.
    Say I ask a model that costs $0.01 per query to describe an image, and it does a great job. Corporations will still go with the model that costs $0.009 per query to describe the image, even if it's slightly less descriptive.

    That's capitalism baby! If 1/10th of a cent can be saved but produces a result that is acceptable, it's all golden.

    GitarooManJan 28, 2025· 1 reaction

    No, this is one model of a much larger DeepSeek family of models. The one that's tearing the world apart is their reasoning chat model that was made at Costco and is 1/4 of the cost of ChatGPT's best model

    zmiroxJan 29, 2025· 1 reaction
    CivitAI

    So far the generated images have not been good for me. I used the 1B model: lots of deformations, and it is not good at rendering text.

    mnemic
    Author
    Jan 30, 2025

    Interesting! Did you manage to increase the generation resolution?

    zmiroxJan 30, 2025· 1 reaction

    @mnemic It's not possible. Maybe soon.

    praetJan 29, 2025· 1 reaction
    CivitAI

    Think this is not a 'real' diffusion model, hence the poor results

    mnemic
    Author
    Jan 30, 2025

    How do you mean? Why is it not a real diffusion model?

    praetFeb 4, 2025

    @mnemic the techniques used, it's closer to an LLM than a diffusion one

    mnemic
    Author
    Feb 4, 2025

    @praet I see, okay. Interesting.

    dmOrmonJan 29, 2025· 6 reactions
    CivitAI

    Explains why they haven’t posted table of comparison for Aesthetics. This looks horrible, worse than SD1.5, somewhere near DALL-E1/Midjourney 1/2.

    “Best prompt following!”, yeah, sure.

    MomongasJan 31, 2025

    Is there a chance that you downloaded the "7B" version?

    dyioulos591Jan 29, 2025· 2 reactions
    CivitAI

    Is there a way to run these models with CPU only?

    Pandaofd00mJan 29, 2025· 2 reactions

    Honestly? Save some power and just don't try it (at least not yet). Scroll through the example images - that's pretty much all you can expect

    mnemic
    Author
    Jan 30, 2025

    Yeah, not sure why you would want to run these models :D

    I guess it should be doable. I didn't bother trying.

    StinkekJan 30, 2025
    CivitAI

    So, they reinvented Craiyon, except it's not viable to run on a potato?

    2182072Jan 31, 2025· 3 reactions
    CivitAI

    Reminds me of early DALL-E, I'm sure deepseek image gen will improve with time.

    kasinatorFeb 1, 2025· 1 reaction
    CivitAI

    How can i change the size of the output?

    mnemic
    Author
    Feb 1, 2025

    You can't. Not with this image generator in Comfy yet at least.

    f95hnggFeb 1, 2025· 8 reactions
    CivitAI

    Well, they got the regressive part right.

    Eagle4477Feb 2, 2025· 9 reactions
    CivitAI

    some of these images are rated R and X. Like bro, I can't even understand what's going on in the image

    mnemic
    Author
    Feb 2, 2025

    Maybe that's the kink? The uncertainty of this model's outputs turns the image scanner on? What will it be next? A WOMAN lying on grass?

    jaffaparty420Feb 8, 2025

    Metadata flags

    cavallomanFeb 3, 2025· 4 reactions
    CivitAI

    This model works better for computer vision (CV) applications, such as describing an image so you can try to recreate it, much like Florence2. Trying to generate images with it doesn't make much sense. Use this instead of your other CV models.

    praetFeb 4, 2025

    It should be pitted against qwen2.5 VL, there's also SmolVLM

    cavallomanFeb 4, 2025

    @praet Qwen is not local is it though?

    cavallomanFeb 4, 2025

    @mnemic thanks i will do some local benchmarks, have a great week

    cavallomanFeb 4, 2025· 1 reaction

    @mnemic nice nodes, thanks, will star your repo

    MustyFeb 4, 2025· 27 reactions
    CivitAI

    2017 called and wants this model back

    smockwigFeb 5, 2025· 9 reactions
    CivitAI

    In my opinion, the way it follows the prompt is nothing short of miraculous. If they continue to improve this architecture, LLMs will indeed be able to create images!

    condzero1950Feb 6, 2025· 8 reactions
    CivitAI

    I am running the native Janus Pro 7B model from github. For s**ts & giggles I quantized the model to QINT8 just to see how it works. Works fine.

    I would compare image quality to <= SD 1.5. I upscale the 384 X 384 images using RealESRGAN scale = 4, but you can also use the scale = 2 model.

    Hopefully, they or someone can fine tune this model to generate better images in the future. Speed wise it's similar to SD 3.5 on my machine. I am only generating 1 image as opposed to the default (5) images. The text it produces is a bit choppy but works.
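
For readers unfamiliar with 8-bit quantization, here is a minimal sketch of the general idea, using symmetric per-tensor int8 quantization in plain Python. This is only an illustration of the concept and not necessarily the scheme or tooling used in the comment above. The function names are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: floats -> int8 codes plus one scale."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127 if max_abs else 1.0
    # Clamp to the symmetric int8 range [-127, 127] after rounding.
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes, scale):
    """Map int8 codes back to approximate float values."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    weights = [-1.0, 0.0, 0.5, 1.0]
    codes, scale = quantize_int8(weights)
    print(codes, dequantize_int8(codes, scale))
```

Each weight is stored in one byte instead of two or four, at the cost of a rounding error of at most half a scale step per value.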

    DiffussyFeb 8, 2025· 21 reactions
    CivitAI

    SD1.5 called, they said this model sucks!

    chieeoFeb 8, 2025· 6 reactions
    CivitAI

    There are serious issues of image breakdowns during use, and we hope these can be improved.

    yangshengzhou07764Feb 11, 2025· 8 reactions
    CivitAI

    Makes no sense that they released this

    5310116Feb 14, 2025· 13 reactions
    CivitAI

    "Congratulations!

    With a 3090, 24gb, you can enjoy speedy 8-minute generations for a 384x384 image that looks much worse than anything Stable Diffusion 1.5 spits out in 0.5 second."

    This made me laugh way too hard.

    mnemic
    Author
    Feb 14, 2025· 2 reactions

    Appreciate it. Quite truthful though!
    Using this model is meant to make you laugh I guess.

    Here's something to keep the laughs up:
    https://www.youtube.com/watch?v=_uTMyY1irUg

    CitronLegacyFeb 23, 2025

    LOL I had the same reaction when I read that.

    ShakingFeb 21, 2025· 4 reactions
    CivitAI

    Keep... keep it up

    jeffthomann871Feb 17, 2026
    CivitAI

    This used to work very well for me, but it no longer does. The error is: Tensor.item() cannot be called on meta tensors


    mnemic
    Author
    Feb 18, 2026

    Are you saying you were actively using this model?

    jeffthomann871Feb 20, 2026· 1 reaction

    @mnemic I got it working again. I'm not using the model to render images. Instead I'm using it to describe images as it does a heck of a faster job than qwen, ollama, etc. and it doesn't use up tokens in the process that cost like the other guys do... I've got some workflows on this over at https://openart.ai/workflows/@mongrel_monstrous_1

    mnemic
    Author
    Feb 20, 2026

    @jeffthomann871 Nice use case!
    https://github.com/MNeMoNiCuZ/AThousandWords/

    I just released this one (not announced properly yet). It's a VLM suite. Do you reckon that Janus is good enough to warrant implementation there?

    jeffthomann871Feb 22, 2026

    @mnemic We'll try it and see what happens. Last time I tested things other than DeepSeek Janus here, they required API keys, and unless you are paying monthly for access to those types of things you will run out of tokens quickly. For instance, Gemini would stop working after about an hour and a half once the daily limit was exceeded. P.S. Since this comment is on the DeepSeek Janus page, which is just what is in the zip file, and you maintain the new node, why not build DeepSeek Janus into your variant in future versions? This appears to be the only safetensors version of Janus that exists at this point in time. Also, have you tested everything you have built yet? Back when I was testing tagger nodes, some of them would do weird stuff like tag every image of an animal as a Pokémon. I did try to install your thousand words thing, but it didn't work for some reason. When I run the GUI bat file it says PyTorch is MISSING, even though I've installed it a few times.

    jeffthomann871Feb 23, 2026· 1 reaction

    @mnemic Thanks! Finally got your thousand words app working. I still cannot get flash attention working, so I cannot do some things, but others work nicely. It took a while to get the CUDA and venv setup worked out, but now it finally works.

    jeffthomann871Feb 23, 2026

    Deep Seek: The image showcases a vibrant and colorful design featuring various circular and floral shapes. The background is a dark teal color, which contrasts sharply with the bright and vivid colors of the shapes. The shapes include flowers, circles, and abstract forms, all filled with different colors such as red, yellow, blue, and green. The texture appears to be glittery, giving the design a sparkly and lively appearance. The overall composition is playful and dynamic, evoking a sense of joy and creativity. The intricate details and bright hues make it visually striking and engaging.

    Same with smolVLM2: The image depicts a vibrant, colorful, and intricate pattern of flowers and shapes, likely made of a material that resembles plastic or a similar substance. The flowers are predominantly in shades of orange, yellow, and blue, with some featuring red and purple accents. The pattern is densely packed, with each flower and shape overlapping and intersecting with others, creating a complex and dynamic visual effect. The background is a dark, textured surface, which contrasts with the colorful foreground. The overall composition of the image suggests a playful and imaginative use of color and pattern, possibly intended to evoke a sense of joy or creativity. There are no discernible texts or other objects in the image, and the relative positions of the objects are such that the flowers and shapes are arranged in a seemingly random yet harmonious manner. The image does not contain any discernible actions or movements, and the focus is solely on the visual elements. Given the detailed and intricate nature of the pattern, it is likely that this image is intended to be a decorative piece, a piece of art, or a product designed to stimulate creativity or imagination. The use of bright, bold colors and the overlapping shapes could be intended to create a sense of depth and dimensionality, making the image appear more dynamic and engaging.

    In summary, the image showcases a colorful, intricate pattern of flowers and shapes, likely made of a material that resembles plastic or a similar substance, set against a dark, textured background. The pattern is dense and complex, with each flower and shape overlapping and intersecting with others, creating a visually stimulating and imaginative composition.

    mnemic
    Author
    Feb 23, 2026

    @jeffthomann871 Great! Sorry to hear you had some trouble, but it seems to have worked out.

    Flash Attention is not needed. It can speed SOME models up, but don't worry about it. It's a messy one to get right. You need to find the EXACT file matching your pytorch, python, and cuda. So you gotta understand what those values are, and then find the correct .whl to install it manually.
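
To find the values mentioned above, a small script along these lines can print them from inside the same venv. It is a sketch: it reads the standard torch.__version__ and torch.version.cuda attributes and simply reports None for both if PyTorch is not importable. The function name is hypothetical, and how a given wheel encodes these values in its filename is left for you to check.

```python
import platform
import sys


def wheel_matching_info():
    """Collect the versions a prebuilt wheel typically has to match."""
    info = {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "platform": platform.system(),
        "torch": None,
        "cuda": None,
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return info
    info["torch"] = torch.__version__
    info["cuda"] = torch.version.cuda  # None on CPU-only builds
    return info


if __name__ == "__main__":
    for key, value in wheel_matching_info().items():
        print(f"{key}: {value}")
```

Run it with the Python interpreter of the environment where the app is installed, since a system-wide Python may report different versions.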

    mnemic
    Author
    Feb 23, 2026

    @jeffthomann871 SmolVLM2 seems too verbose there, mentioning things that AREN'T in the image. What it says is accurate, but usually not helpful for captioning. But you can try many different models, prompts, and settings using AThousandWords. It's meant for you to configure to your own needs.

    Checkpoint
    Other

    Details

    Downloads
    792
    Platform
    CivitAI
    Platform Status
    Available
    Created
    1/28/2025
    Updated
    6/15/2026
    Deleted
    -

    Files

    deepseekJanusPro1B7B_janusPro1BZipped.zip