🖥️Welcome to try out GPT4V-Image-Captioner, an open-source tool developed by a friend and me. It offers one-click installation and integrates multiple features, including image pre-compression, image tagging, and tag statistics. We recently also released a WebUI plugin version of the tool; everyone is welcome to use it!
🌍Welcome to join the QQ groups "兔狲·AIGC梦工北厂" (Pallas's Cat · AIGC Dream Factory North), group number: 780132897, and "兔狲·AIGC梦工南厂" (Pallas's Cat · AIGC Dream Factory South), group number: 835297318 (answer to the entry question: 兔狲). Telegram group "兔狲的SDXL百老汇" (Pallas's Cat's SDXL Broadway), link: https://t.me/+KkflmfLTAdwzMzI1
📖HelloWorld 7.0 Update - June 13, 2024
One-sentence update summary: HelloWorld 7.0 is an iteratively optimized version with the best body and anatomy performance in the entire series, plus a broader concept scope and richer detail.
Update details:
Adding negative training images, strengthening pose training, and optimizing the CLIP model have improved the accuracy of limbs and hands compared with previous versions. The recommended negative prompt is: "bad hand, bad anatomy, worst quality, ai generated images, low quality, average quality".
The fine-tuned LoRA was extracted from the official SPO model and incorporated into HelloWorld 7.0. SPO is a further improvement on the DPO method, and the SPO base model performs better than both the DPO XL base model and the original SDXL base model. The SPO LoRA enhances image detail and contrast and beautifies images. Thanks to the technical team behind SPO.
The concept scope of the training set continued to expand, but the training set itself was optimized and streamlined (fine-tuning on a large training set is too expensive, H800 GPUs have recently been hard to rent, and I can't afford the time local training would take). The current training set totals 20,821 images. Its resolution distribution is shown below; for generation, it is recommended to use the resolutions with the most training images:
(832, 1248) - Count: 7128
(896, 1152) - Count: 6250
(1248, 832) - Count: 2402
(1024, 1024) - Count: 1639
(1360, 768) - Count: 928
(1152, 896) - Count: 870
(768, 1360) - Count: 432
(960, 1088) - Count: 506
(992, 1056) - Count: 162
(1088, 960) - Count: 140
(704, 1472) - Count: 120
(1056, 992) - Count: 122
(1472, 704) - Count: 115
(1632, 640) - Count: 75
(640, 1632) - Count: 12
All datasets were re-labeled with GPT-4o, this time using a structured labeling method with the structure "one-sentence summary description + multiple image element tags + inspired by XXX + aesthetic quality description words". The aesthetic quality description words are divided into five levels: worst quality, low quality, average quality, best quality, and masterpiece. A typical labeling example is as follows:
conceptual art featuring a human hand wrapped in red and beige ribbons, isolated against a plain, light background, realistic style, minimalist color scheme, smooth textures, elongated and surreal aesthetic, inspired by salvador dalí's surrealist works, masterpiece
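To make the four-part structure concrete, here is a minimal Python sketch that assembles a caption in that format and rebuilds the example above. The function name `build_caption` and the check of the quality word against the five levels are illustrative assumptions, not the author's labeling code (the actual labeling was done by GPT-4o).

```python
from __future__ import annotations

# The five aesthetic quality levels used in the HelloWorld 7.0 captions, lowest to highest.
QUALITY_LEVELS = (
    "worst quality",
    "low quality",
    "average quality",
    "best quality",
    "masterpiece",
)


def build_caption(summary: str, tags: list[str], inspired_by: str | None, quality: str) -> str:
    """Join summary, element tags, optional "inspired by" clause and quality word with ", "."""
    if quality not in QUALITY_LEVELS:
        raise ValueError(f"unknown quality level: {quality!r}")
    parts = [summary.strip()]
    # Keep only non-empty tags, trimmed of surrounding whitespace.
    parts.extend(tag.strip() for tag in tags if tag.strip())
    if inspired_by:
        parts.append(f"inspired by {inspired_by.strip()}")
    parts.append(quality)
    return ", ".join(parts)


# Rebuilds the labeling example given in the notes.
caption = build_caption(
    summary="conceptual art featuring a human hand wrapped in red and beige ribbons, "
            "isolated against a plain, light background",
    tags=["realistic style", "minimalist color scheme", "smooth textures",
          "elongated and surreal aesthetic"],
    inspired_by="salvador dalí's surrealist works",
    quality="masterpiece",
)
print(caption)
```

Putting the quality word last mirrors the order in the example, so captions written this way line up with the "worst quality ... masterpiece" scale used in positive and negative prompts.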
For HelloWorld 7.0, the "High-Frequency Tagging Word List" and the "High-Frequency Art Style List" used in the "inspired by XXX" part are provided only to commercial license holders. Partners who have previously purchased a HelloWorld XL series license but have not received them, please contact me to get them for free.
Users can refer to the High-Frequency Tagging Word List of HelloWorld 6.0. I have also provided 150+ high-quality HelloWorld 7.0 example images in the gallery as a reference for your own generations. Making models is not easy; thank you all for your understanding and patience!
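As a small aid for choosing output sizes from the 7.0 resolution distribution listed above, the sketch below ranks the resolutions by image count. It assumes each pair is (width, height), which the notes do not state, and the helper `top_resolutions` is hypothetical.

```python
# (width, height) -> image count, copied from the HelloWorld 7.0 resolution list.
# Assumption: the first number is the width.
RESOLUTION_COUNTS = {
    (832, 1248): 7128, (896, 1152): 6250, (1248, 832): 2402, (1024, 1024): 1639,
    (1360, 768): 928, (1152, 896): 870, (768, 1360): 432, (960, 1088): 506,
    (992, 1056): 162, (1088, 960): 140, (704, 1472): 120, (1056, 992): 122,
    (1472, 704): 115, (1632, 640): 75, (640, 1632): 12,
}


def top_resolutions(counts, n=4):
    """Return the n most frequent resolutions with their share of all listed images."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(res, count / total) for res, count in ranked[:n]]


for (w, h), share in top_resolutions(RESOLUTION_COUNTS):
    print(f"{w}x{h}: {share:.1%} of listed training images")
```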
📖HelloWorld 6.0 Update - April 20, 2024
LEOSAM HelloWorld 6.0 Top 250 High-Frequency Tagging Word List
Thank you for your patience. I have been job hunting recently, which caused some delays in the HelloWorld updates. Here are the main updates in version 6.0:
HelloWorld 6.0 is an iterative improvement on version 5.0. In my own testing, its realism is not significantly different from 5.0; the main advantage of 6.0 is its broader concept coverage in the training set. Based on user feedback, themes including surrealism, boudoir, group photos, masks, origami, 3D renders, cars, dragons, and maternity photography have been enhanced. Some examples are shown in the illustrations.
HelloWorld 6.0 intentionally includes some low-quality images in the training to enhance the model's response to negative prompts. It is recommended to use the following terms in negative prompts: "low quality, jpeg artifacts, blurry, poorly drawn, ugly, worst quality".
The main body of the HelloWorld 6.0 training set employs GPT4v tagging. For images that GPT4v cannot tag, cogVQA guided by blip2-opt-6.7b is used for tagging. The tagging language style of these multimodal models differs significantly from the traditional WD1.4 tagger. To facilitate more accurate triggering of different concepts in the training set, I have compiled the top 250 high-frequency tagging words from the HelloWorld 6.0 training set. You can view these high-frequency words in this document.
Finally, although SD3 is about to be released, I will still update to HelloWorld XL 7.0, hoping to achieve greater enhancements in version 7.0!
📖2024.2.22 Introducing "HW5.0_Euler_a_Lightning"
This model is a speed-optimized version of the HelloWorld SDXL base model that incorporates SDXL-Lightning technology. With the Euler a sampler and CFG 1, it can generate images in 6-8 steps, three times faster than the original SDXL version. In comparisons, its results are also better than those of LCM or Turbo versions.
The recommended parameters for generating images with this model are:
Sampler: Euler a (Important! The model is specifically adapted to Euler a; other samplers may not give results as good)
CFG scale: 1
Sampling steps: 8 steps (6~8 steps are acceptable)
Hires algorithm: ESRGAN 4x / 8x_NMKD-Faces_160000_G
Hires Upscale factor: 1.5x
Hires steps: 8 steps
Hires Denoising strength: 0.3
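The recommended Lightning settings can be kept together in one place, for example as a small Python dataclass. The class and method names are hypothetical, and rounding the Hires. fix output down to a multiple of 8 is an assumption based on the SDXL VAE's 8x downsampling, not something the notes specify.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LightningSettings:
    """Recommended settings for HW5.0_Euler_a_Lightning, as listed in the notes."""
    sampler: str = "Euler a"
    cfg_scale: float = 1.0
    steps: int = 8                   # 6-8 steps are acceptable
    hires_upscaler: str = "ESRGAN_4x"
    hires_scale: float = 1.5
    hires_steps: int = 8
    hires_denoise: float = 0.3

    def __post_init__(self):
        if not 6 <= self.steps <= 8:
            raise ValueError("the notes recommend 6-8 sampling steps")

    def hires_size(self, width: int, height: int, multiple: int = 8) -> tuple[int, int]:
        """Output size after Hires. fix, rounded down to a multiple of 8 (assumption)."""
        w = int(width * self.hires_scale) // multiple * multiple
        h = int(height * self.hires_scale) // multiple * multiple
        return w, h


settings = LightningSettings()
print(settings.hires_size(832, 1248))
```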
📖2024.2.11 Introducing "HelloWorld 5.0 GPT4V"
HelloWorld 5.0 is the most substantial update in the history of the HelloWorld series, tagged with GPT-4v, and has undergone significant fine-tuning in fields such as science fiction, animals, architecture, and illustration.
Comparative tests show improvements in this version include:
1. More varied and dynamic character poses and image compositions, creating visually engaging pictures;
2. The film dataset has been extensively trained. The film texture was weak from versions 2.0 to 4.0, and many fans missed the leogirl style of version 1.0, so this update specifically strengthens the film texture without compromising other photographic qualities. The film texture can be triggered by phrases such as film grain texture and analog photography aesthetic;
3. Enhanced expressiveness in themes like science fiction, thriller, and animals, with mechas and other subjects having a more designed feel. Animals such as snow leopards, red pandas, giant pandas, tigers, Pallas's cats, and domestic cats and dogs are more lifelike;
4. Thanks to GPT tagging, prompt adherence and conceptual accuracy have been further improved.
However, the drawbacks of this version include:
1. As this is a substantial fine-tuning update, the error rate for limbs and such may slightly increase, a normal phenomenon when moving out of a comfort zone into new areas of relative optimization. Previous versions underwent extensive limb testing for improvements, while the new version had limited time for such enhancements. Nevertheless, the accuracy of limbs in this version is at least higher than in version 1.0, and I will continue to make improvements in future updates.
2. Due to the reinforced film texture, even though GPT tagging is as accurate as possible, there can be an unavoidable default warm tone in images. However, you can use prompts like studio light or sharp focus to produce high-definition studio-quality images, and with proper use of prompts, the output can have better skin tones and visual appeal than previous versions.
3. This version includes more full-body character images to improve full-body results, so the model may produce wider scenes than before if no specific composition is requested. Currently, facial details in full-body shots at 1024 resolution may be less sharp than in half-body or close-up shots. This can be improved with ADetailer plus a 1.5x Hires. fix at denoising strength 0.3, or by specifying the composition in the prompt to avoid full-body images.
4. Since a small number of high-quality illustration datasets have been added, there is a chance that prompts related to animated styles will produce animated images. If this concerns you, please adjust your prompts accordingly.
These are the main updates for this version. Training the SDXL base model is challenging, and when the training set approaches ten thousand images, the cost for tagging and training for each model exceeds 300 USD. I welcome everyone to use the model and appreciate any feedback you can provide! If you find this model satisfactory, I would be immensely grateful if you could help spread the word about it.
📖2024.1.31 Introducing "HelloWorld 4.0"
HelloWorld 4.0 is a progressive transitional version that moves from blip+clip tagging to GPT4V tagging. I first trained a pure GPT4V-tagged model, then merged it with a large proportion of HelloWorld 3.2 and a 0.05 proportion of Juggernaut XL (to adjust skin tone). Compared with 3.2, the new version shows improvements in prompt compliance and concept coverage.
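The notes give only one merge ratio (0.05 for Juggernaut XL). As a rough sketch of a weighted-sum checkpoint merge, the function below combines matching parameters from several checkpoints. The 0.25/0.70 split between the GPT4V-tagged model and HelloWorld 3.2 in the example is a hypothetical placeholder, and the toy float "checkpoints" stand in for real tensors (the same arithmetic works for NumPy arrays or PyTorch tensors).

```python
def weighted_merge(state_dicts, weights):
    """Weighted sum of several checkpoints' parameters; all must share the same keys."""
    if len(state_dicts) != len(weights):
        raise ValueError("need one weight per checkpoint")
    if abs(sum(weights) - 1.0) > 1e-6:
        raise ValueError("weights should sum to 1")
    keys = state_dicts[0].keys()
    for sd in state_dicts[1:]:
        if sd.keys() != keys:
            raise ValueError("checkpoints have different parameter names")
    # Each merged parameter is sum_i(weight_i * param_i).
    return {k: sum(w * sd[k] for w, sd in zip(weights, state_dicts)) for k in keys}


# Toy example: hypothetical 0.25 / 0.70 split, plus the 0.05 Juggernaut XL share from the notes.
gpt4v_model = {"unet.w": 1.0}
helloworld_32 = {"unet.w": 2.0}
juggernaut_xl = {"unet.w": 4.0}
merged = weighted_merge([gpt4v_model, helloworld_32, juggernaut_xl], [0.25, 0.70, 0.05])
print(merged)
```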
The new GPT4V tagging training set has doubled from the 4000 images of the helloworld3 series to 8000 images, covering not only portraits but also animals, architecture, nature, food, illustrations, and more. However, the pure GPT4V version encountered an overfitting problem, which is preliminarily attributed to the doubling of the number of training images. One of the next steps in iterative optimization is to find out how to include as many non-portrait concepts as possible while ensuring sufficient training of portraits. At this stage, a fusion of the new and old versions has been used for fine-tuning to ensure a smooth transition between versions, so the expanded concept set and the advantages brought by GPT4V tagging are not very perceptible at the moment. These advantages will become increasingly apparent in the subsequent generations 5 and 6 of the model.
📖2024.1.5 Introducing "HelloWorld 3.2"
Version 3.2 is an iteration optimized with DPO technology, and compared to version 3.0, there are optimizations in skin tone and limb accuracy, but the improvements are not significant. That's why this version is marked as 3.2 rather than being labeled as 4.0.
📖2023.12.15 Introducing "HelloWorld 3.0"
The new version expands the training set, enhancing the model's expressiveness across different styles, including science fiction and art.
It has integrated a self-made quality enhancement LoCon (created using slider technology), to improve image texture and alleviate issues of distortion in fingers and limbs.
📖2023.11.17 Introducing "HelloWorld 2.0"
Thank you all for your patience. After overcoming various challenges, the HelloWorld 2.0 version is finally ready to be presented to you all in a state that I'm satisfied with. The main differences between HelloWorld 2.0 and 1.0 are as follows:
HelloWorld 2.0 no longer requires trigger words, and its results are comparable in quality to version 1.0 with trigger words. The trigger word 'leogirl' in 1.0 was strongly associated with East Asians. With the trigger word removed, words like '1girl' will still likely generate East Asian portraits when race is not specified, but you can now specify race with keywords for nationality, skin color, etc. For example, the effects of words like 'Chinese', 'Russian', 'Iranian', 'Jamaican', 'Kenyan', 'dark-skinned', 'pale-skinned', etc., are shown below.

You can also get characters in different styles by writing names of people from different countries and genders in the prompt, such as Han Meimei (China), Sophie Martin (France), Priya Patel (India), Fatima Al-Hassan (Arab), Wanjiru Mwangi (Kenya). These prompts are just examples; there are many more prompts and ways to play, and you're welcome to explore and share your own.

HelloWorld 2.0 has balanced the quality/color and offers more style options. The 1.0 version, when used with 'leogirl', would likely produce images with a strong film texture. HelloWorld 2.0 is no longer tied to a film texture and can be customized with some quality-related prompts. Some prompts that have been tested and work well include:
high-end fashion photoshoot, product introduction photo, popular Korean makeup, aegyo sal, Sharp High-Quality Photo, studio light, medium format photo, Mamiya photography, analog film, Medium Portrait with Soft Light, real-life image, refined editorial photograph, raw photo, real photo, Scanned Photo, film still
The color effects of these prompts are as follows:

The training set for HelloWorld 2.0 significantly increases the proportion of full-body photos to improve SDXL's full-body and distant-view portraits. Although this is an improvement over version 1.0, it is still strongly recommended to use ADetailer when generating full-body photos. For users with enough video memory (24 GB), a 1.5x Hires. fix is also recommended, as it can significantly improve facial details.
📖2023.8.29 Introducing "HelloWorld" SDXL Base Model
Special reminder: When using the HelloWorld 1.0 model, please remember to add the trigger word "leogirl".
Distinct from the SD1.5 base model “MoonFilm”, “HelloWorld” is a brand-new realistic SDXL base model series. To help more users discover HelloWorld, I have retained the original MoonFilm model link. HelloWorld can be seen as a spiritual continuation of MoonFilm on the new SDXL platform, but it aims for more than realism and film-like quality in portraits. Thanks to SDXL's far greater information capacity and text understanding compared with SD1.5, HelloWorld is a base model that seeks to realistically depict all things; in other words, I hope to gradually build a virtual photography world with HelloWorld.
The realistic base models of SD1.5 have reached quite a mature stage, and significant performance improvements are unlikely. Unless there is a breakthrough technology for the SD1.5 platform, the MoonFilm & MoonMix series will basically stop updating, and I will devote my main energy to developing the HelloWorld SDXL large model. The 1.0 version is now available for download, and the 2.0 version is under active development and expected in early September.
As a brand new SDXL model, there are three differences between HelloWorld and traditional SD1.5 models:
Unlike SD1.5 base models, which typically do not include trigger words, please remember to use the trigger word "leogirl" when using HelloWorld 1.0. This ensures that the SDXL model triggers the training set effect more stably.
The HelloWorld model supports direct output at a resolution of 1024*1024 pixels, eliminating the need for high-resolution upscaling. Close-up portraits output directly are no worse than with SD1.5, but distant portraits output directly still have flaws, so it is suggested to use the ADetailer plugin, which can effectively fix distant faces.
SDXL now allows for easier output using simple natural language prompts. It is recommended to try more natural language prompts, which will result in better outcomes when outputting AI realistic photos.
After multiple rounds of testing, the suggested drawing parameter settings are:
Steps ≥ 25
Sampler: DPM++ 2M Karras
CFG scale: 10
Size ≥ 1024x1024
ADetailer: open
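Here is a minimal sketch that checks a generation setup against these suggested 1.0 settings, including the "leogirl" trigger word. The dataclass and function names are hypothetical, and reading "Size ≥ 1024x1024" as a total pixel-count threshold is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class GenerationSettings:
    prompt: str
    steps: int
    sampler: str
    cfg_scale: float
    width: int
    height: int
    adetailer: bool


def check_helloworld_v1(s: GenerationSettings) -> list[str]:
    """List the ways a setup deviates from the suggested HelloWorld 1.0 settings."""
    issues = []
    if "leogirl" not in s.prompt.lower():
        issues.append('add the trigger word "leogirl"')
    if s.steps < 25:
        issues.append("use at least 25 steps")
    if s.sampler != "DPM++ 2M Karras":
        issues.append("use the DPM++ 2M Karras sampler")
    if s.cfg_scale != 10:
        issues.append("use CFG scale 10")
    # Assumption: the size rule is read as a minimum total pixel count.
    if s.width * s.height < 1024 * 1024:
        issues.append("use at least 1024x1024 pixels")
    if not s.adetailer:
        issues.append("enable ADetailer")
    return issues


setup = GenerationSettings("leogirl, portrait photo", 30, "DPM++ 2M Karras", 10, 1024, 1024, True)
print(check_helloworld_v1(setup))
```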
Everyone is welcome to try HelloWorld and provide plenty of feedback. Your valuable opinions are very important for the next step of model improvement!
Copyright Statement:
The HelloWorld series of models (hereinafter "the Model") has been crafted by myself (hereinafter "the Owner") with the assistance of the LiblibAI platform. Republishing the Model on platforms excluding LiblibAI and Civitai is unauthorized by the Owner.
The Owner permits the use of images generated by the Model for non-commercial educational or informative purposes at no cost, on the condition that:
- Users adhere to applicable laws and do not violate the rights of the Model or any third-party.
- Attribution for the images must be clearly stated as "created by LEOSAM's HelloWorld base model".
For any form of commercial utilization, a prior commercial license agreement with the Owner is required. For inquiries related to commercial licensing and model personalization, please reach out to the Owner via the contact information available on the Owner's homepage.
The development and free distribution of the SDXL model represent significant endeavors. The Owner pledges ongoing complimentary updates to the HelloWorld model for individual enthusiasts as a token of appreciation for the community's contributions to open-source development. Collaborative commercial engagements are vital for the Model's advancement and refinement. The Owner appreciates every user for their understanding and support.
Unauthorized use may breach applicable laws and carry legal repercussions. The Owner retains exclusive rights to interpret this statement, which is governed by prevailing laws and regulations.
Description
Many improvements and attempts have been made in the production process of this version. The main improvements are listed one by one for reference:
Further selections were made to the training materials, but the total volume is still maintained at 500 training images + 1,500 regularization images. The proportion of full-body photos, male photos, high-definition texture photos, and photos of different races has been increased.
The word library used for CLIP labeling originally contained about 110,000 phrases, including many errors, garbled entries, and duplicates. With GPT-4 batch edits plus multiple rounds of test labeling and manual additions and deletions, the library was reduced to 40,000 words, and many phrases related to photography, portraits, and China were added.
A large number of comparative tests were conducted, including: a. the difference between training SDXL with DreamBooth on the same training set and first training an SDXL LoRA and then merging it into the large model; b. training differences under three optimizers (Adafactor, AdamW8bit, Prodigy), different LR schedulers, learning rates, and batch sizes; c. differences with and without various training-set data augmentation methods; d. training results on different SDXL base models.
Batch image processing was performed before the training set was bucketed, compressing and cropping the training set and putting it into the target resolution groups of (768, 1360),(832, 1248),(864, 1184),(1024, 1024),(1184, 864),(1248, 832),(1360, 768). This improves the subsequent large batch size training effect (but the improvement seems limited).
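The bucketing step can be illustrated with a sketch that assigns an image to the target group with the closest aspect ratio, scales it to cover that size, and center-crops the excess. The notes do not describe the exact compression and cropping rules, so the nearest-aspect-ratio choice, the center crop, and reading each pair as (width, height) are all assumptions; the function names are hypothetical.

```python
import math

# Target resolution groups listed for HelloWorld 2.0, read as (width, height).
BUCKETS = [(768, 1360), (832, 1248), (864, 1184), (1024, 1024),
           (1184, 864), (1248, 832), (1360, 768)]


def nearest_bucket(width, height, buckets=BUCKETS):
    """Pick the bucket whose aspect ratio is closest to the image's (log-ratio distance)."""
    ar = math.log(width / height)
    return min(buckets, key=lambda b: abs(math.log(b[0] / b[1]) - ar))


def resize_and_crop_plan(width, height, buckets=BUCKETS):
    """Return (bucket size, resized size, center-crop box as left, top, right, bottom)."""
    bw, bh = nearest_bucket(width, height, buckets)
    # Scale so the image fully covers the bucket; one side then overflows (or fits exactly).
    scale = max(bw / width, bh / height)
    new_w, new_h = round(width * scale), round(height * scale)
    left = (new_w - bw) // 2
    top = (new_h - bh) // 2
    return (bw, bh), (new_w, new_h), (left, top, left + bw, top + bh)


print(resize_and_crop_plan(4000, 2250))
```

Each returned plan could then be applied with any image library; keeping the plan separate from the actual resizing makes the bucket assignment easy to inspect.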
Above are the main updates for HelloWorld 2.0. There were quite a few challenges when updating this version, but the good news is that I've figured out the way to train the SDXL large model, so future updates should be much smoother.
Comments
After doing lots of testing on version 2.0, my findings have been mixed in comparison to the previous version, aka. 1.0. Generally speaking, for non-East Asian portraits, the 2.0 version is an improvement over the previous one in terms of prompts understanding and details, although it seems this time around the model is slightly overtrained - sometimes you need to lower cfg values to get decent looking results.
However, with regard to East Asian (female) portraits, it's a lot worse than 1.0 out of the box. For example, in ver 1.0 it's very easy to get good-looking Chinese faces for girls just by putting keywords such as 'leogirl' and 'Chinese beauty' in my prompts. With the same prompt in ver 2.0, however, you often get ugly, unnatural-looking or mixed-race faces. I've even tried including names such as 'Han Meimei/Zhao Lusi' with triple brackets, which did make things better, but in most cases 1.0 still gives you more attractive-looking Chinese girls. It seems to me that the new dataset in 2.0 has caused some noticeable degradation in the model's ability to generate good-looking Asian faces.
My suggestion is that perhaps it's better to train two separate models, one for general purposes like this 2.0 version, and another one that's focused more on East Asian faces similar to 1.0 with the ability to generate that awesome filmic looking leogirl effect very easily.
Thank you for your thoughtful and constructive feedback! I apologize for my late reply as I've been quite busy these days.
In 1.0 version, I added the trigger word "leogirl" to all East Asian portraits with film texture, which indeed can achieve impressive East Asian portrait effects. However, as a generic realistic base model, I still hope HelloWorld can be balanced and comprehensive, rather than standing out in one specific area and having reduced stability in other areas.
I find your suggestions very valuable, and I plan to introduce two new SDXL realistic base models in addition to HelloWorld: one focusing on the texture of film cameras and another on the texture of mobile phone cameras. They might be called HelloFilm and HelloPhone; that is my current plan. I hope to have both models ready and released before Christmas!
Refiner needed? If not, what upscale method do you suggest?
Olivio Sarikas has a really good upscale method
This model does not need to be used in conjunction with a refiner. I recommend using the ESRGAN_4x algorithm from Hires. fix, with an upscale factor of 1.5x and Denoising strength of 0.3.
LEOSAM AIArt 兔狲插画 (Pallas's Cat Illustration) SDXL base model
where is this model? can you upload this on civitai
Thank you for your love for AIArt! I will update this model on CivitAI this Sunday. It needs to be premiered on another website for two weeks before it can be uploaded here. Once it's updated, I will leave a message here to notify you promptly!
@LEOSAM thank you
@itachiii Hello! Apologies for the wait, AIArt has now officially been launched on Civitai! https://civitai.com/models/219791/leosam-aiart-sdxl
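@LEOSAM thank you so much, I am now super excited to see the results. Thanks for uploading!
The nipples are too short and look like buttons. How can I make them longer? Writing it in the prompt has no effect.
Group owner, there is a really awful admin in your group, "拉法叶的劳伦斯" (Lawrence of Lafayette). Any opinion that differs from his gets deleted, and you aren't allowed to point out problems with SD models. What's the point of running a group like this?
There are plenty of people like this: as soon as they have a shred of power, they become damn arrogant and throw their weight around. Frankly, it's a low-class nature that can't be taken seriously. People are usually cautious in an unfamiliar environment, which easily gives this kind of manager a desire for control; they make themselves into authorities and stake out their own territory. In fact, territoriality is a purely animal trait, which shows that such people haven't evolved properly. In real life they are probably losers who need these things to offset the powerlessness and inferiority they feel in reality.
@BIG_A I've talked with the admin, and it's not entirely like that. I'm sorry for the bad experience this group member had. I've also told the admin that if any special situation comes up in the future, they should tell me promptly and I'll handle it. I requested that this comment be deleted, but Civitai hasn't acted on it, which leaves me feeling somewhat helpless.
@LEOSAM I was just commenting on this kind of phenomenon in general and didn't mean anyone in particular. I don't even know the original commenter well; I just really like one of his checkpoints, so my goodwill extended to him. I've also genuinely run into petty people like this, so I reacted a bit strongly. It has nothing to do with you, OP. Sorry!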