Image Positioner 3d Sequences - CivArchive (CivitAI Archive)

Image Positioner 3d Sequences - V1

NSFW

Trained on 3D sequences created with Python. Experimental image anchoring concept.

The information in this PDF is highly detailed and specific to neuroscience, particularly the study of circuits in the retrosplenial cortex (RSC) of mice and their roles in spatial cognition and memory. Here’s a breakdown of relevant ideas that might be adapted to enhance a dataset for a LoRA aimed at improving the visual/spatial capabilities of image generation models:

1. Spatial and Structural Differentiation:

- The PDF highlights distinct circuits within the RSC that handle spatial information differently, depending on the target regions they project to (like secondary motor cortex and anterodorsal thalamus). In a LoRA dataset, you could simulate this idea by having images that show different spatial arrangements and orientations of objects, mimicking distinct pathways or perspectives. For instance, varied depths, object sizes, and viewpoints could represent different "projection-specific" perspectives in 3D space.

2. Environmental Contexts and Spatial Landmarks:

- The RSC is involved in tasks like object-location memory and place-action association, where the spatial relationship between an object and its environment is crucial. For your LoRA, including variations in environmental context (such as background gradients, floor patterns, or spatial grids) and objects positioned in relation to "landmarks" (central or offset points) could help develop a more nuanced understanding of spatial relationships.

3. Layered, Semi-independent Circuits:

- Just as the RSC neurons have semi-independent circuits with distinct roles, a LoRA dataset could feature layers of information that interact without fully merging. For example, using transparent overlays, wireframes, or shadow layers with different intensities or colors can mimic layered, semi-connected visual features, enhancing depth and dimensionality.

4. Sensory Input Variability:

- The PDF describes how different RSC circuits receive various types of sensory inputs (such as from visual, auditory, or somatosensory sources). Translating this to a visual dataset might mean creating samples that incorporate texture and visual cues across sensory "modes" – some with high detail for texture (resembling somatosensory input), others with color gradients or atmospheric effects (resembling visual or auditory input processing).

5. Object-Location Memory Representation:

- Including variations where objects change positions relative to a fixed background across sequential images could mirror the concept of memory and recognition of changes in spatial layout. These subtle shifts might train the model to detect and remember spatial relationships across images, improving its response to prompts involving positioning and continuity.

6. Complex Object and Shadow Interactions:

- The study used tasks that involved moving objects to different locations to test memory and recognition. For your dataset, experimenting with floating objects that cast realistic shadows could simulate depth perception and occlusion. Shadows could change position or sharpness to indicate object movement or shifting light sources, thus enhancing spatial interpretation skills in generated images.

These principles could guide the design of a structured dataset that feeds visual-spatial information to the LoRA, potentially enhancing its ability to understand and generate images with spatial depth, orientation, and complex layering.
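Neither the PDF summary nor the model card shows how the 3D sequences were actually generated. As a hedged illustration of items 1, 5 and 6, here is a minimal Python sketch. It covers varied depth and apparent size, an object shifting relative to a fixed landmark across a sequence, and a floating object casting a shadow. It rests on stated assumptions: a pinhole camera at the origin looking down +z, a flat floor at y = -1.5, and a single directional light. Every name and number here is hypothetical and is not the author's script. That includes `make_sequence`, `focal_px`, the orbit radius and the light direction. The sketch only produces per-frame geometry that a renderer or caption writer could consume.

```python
# Hypothetical sketch (not the author's script): per-frame geometry for a floating
# object orbiting a fixed floor landmark, seen by an assumed pinhole camera.
import math
from dataclasses import dataclass


@dataclass
class Frame:
    index: int
    object_xyz: tuple       # world position of the object centre
    pixel_xy: tuple         # projected centre in the image (u to the right, v down)
    apparent_radius: float  # approximate projected radius in pixels (shrinks with depth)
    shadow_xz: tuple        # floor point where the shadow centre lands


def project(point, focal_px=800.0, width=1024, height=1024):
    """Pinhole projection: camera at the origin, looking down +z, y up."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    return (width / 2 + focal_px * x / z, height / 2 - focal_px * y / z)


def shadow_on_floor(point, light_dir, floor_y=-1.5):
    """Follow a directional light ray from `point` down to the plane y = floor_y."""
    x, y, z = point
    lx, ly, lz = light_dir
    if ly >= 0:
        raise ValueError("light must point downward (negative y component)")
    t = (floor_y - y) / ly  # positive when the point is above the floor
    return (x + t * lx, z + t * lz)


def make_sequence(n_frames, landmark_xz=(0.0, 6.0), orbit_radius=1.5,
                  floor_y=-1.5, hover=1.0, object_radius=0.3,
                  light_dir=(0.3, -1.0, 0.2), focal_px=800.0):
    """Move the object once around the landmark; depth changes its apparent size."""
    frames = []
    for i in range(n_frames):
        angle = 2 * math.pi * i / n_frames
        obj = (landmark_xz[0] + orbit_radius * math.cos(angle),
               floor_y + hover,
               landmark_xz[1] + orbit_radius * math.sin(angle))
        frames.append(Frame(
            index=i,
            object_xyz=obj,
            pixel_xy=project(obj, focal_px),
            # small-object approximation: radius scales with focal length / depth
            apparent_radius=focal_px * object_radius / obj[2],
            shadow_xz=shadow_on_floor(obj, light_dir, floor_y),
        ))
    return frames


if __name__ == "__main__":
    for f in make_sequence(4):
        u, v = f.pixel_xy
        print(f"frame {f.index}: pixel=({u:.1f}, {v:.1f}) "
              f"radius={f.apparent_radius:.1f}px shadow={f.shadow_xz}")
```

As the object moves toward or away from the camera, its projected radius changes, giving a depth cue. Its shadow stays offset along the light direction, giving an object-to-floor cue. Both are the kinds of relationships the list above suggests varying. How such geometry would be rendered or captioned is left open here.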
---
And so that is roughly what I tried to do.

Sample image from my dataset:


Comments (10)

Lazman

Nov 16, 2024


What exactly does it do, besides keep that approximate shape in the center of the image?

angrysky

Author

Nov 16, 2024

Sorry, I just have a 3060 and it takes a while; posting some demo images now. It is supposed to teach the AI how to see positions and lay out substrates. It does not put things in the middle unless they are best placed there. It is experimental.

angrysky

Author

Nov 16, 2024

I updated the info a bit and posted more images. I hope that answers your question well enough for you to experiment and share your images and thoughts.😁

Lazman

Nov 16, 2024

@angrysky You have a 3060, and you're able to train Flux LoRAs? I might have to know what settings you use to pull that off. Though to be fair, dataset size may have a more significant impact on my training times and VRAM use than I'd initially thought. I'm noticing most people seem to train using only 50 images or less, and your particular dataset, apart from only having 49 images, is only around 2 MB.

The one I've been toying with is over 60 MB, about 300 images, with quite a size variance at that (from 264x264 up to 3k x 4k). Yea, now that I'm really spelling that out, I've really gotta refine that dataset, lol. I've got a 4060 Ti, and it takes nearly 11 full hrs for one epoch.

Anywho, I will download your LoRA and give it a shot, but I haven't loaded up ComfyUI often recently, cuz I've been putting most of my effort into figuring out the training process: learning more about it, actually training, and learning how to implement medvram in a script my friend helped me with for loading a 12-billion-parameter multimodal text AI.

He urged me to just use Gradio, but I've always disliked web UIs. To me at least, they feel cheap and lazy in contrast to a solid, full desktop GUI. So learning how to build a GUI onto a script in Linux is on my list of things to do.

BTW, it's 7am, and I kinda forgot to sleep last night, lol. So I am quite tired, but I did manage to read over most of your description and look over the images you posted. So far, do you find it's more precise with your LoRA? Flux does tend to be more precise in general than SDXL, so it may be more difficult to tell. I'll try to run some tests tomorrow, but my memory and focus can be a bit fragmented at times, so NGL, it may just slip my mind. I won't not do it, though, because I do find the concept intriguing.

I'm curious, are you a professional in the field of neuroscience? Or did you perhaps just stumble upon an interesting blog, or are you autistic and therefore take an obsessive interest in specific subjects? Personally, I am the latter, and a lifelong student of psychology and sociology, though my research and reflections have been more in the realm of practical use (for lack of a better term) than biology-based.

That said, I am having some trouble with the concepts presented in your description. I have noticed myself how AI 'learns' in much the same way as a human does, even in its practical application of that learned knowledge. Take the concept of starting from a latent and applying sampling steps to it, improving the image with each iteration. That is comparable to a human going over drafts, or sketching messy lines until they get the look they're going for. The comparison holds even down to overfitting:

If a human artist obsesses over a single detail for too long (like trying too hard to perfect a character's face), eventually that face becomes warped and begins to degrade in quality. Incidentally, a similar phenomenon takes place when we repeat a word too many times, especially a word we don't hear often (so it doesn't have as strong a neural connection to begin with). We begin to lose our understanding of that word and it becomes nonsensical.

But, yea, while I understand the AI model's similarities to us in its practices and output, I have difficulty conceptualizing it having a connection to the way our brain works physically/biologically. Although, I suppose it's not beyond reason. I have often compared the 0s and 1s of binary to the... dang, I'm tired, I can't remember the exact terminology. I want to say the protons and electrons that define the atom, AKA the basic building blocks that define the human design?

The reason I have trouble conceptualizing it is that I figure the practices and output may have been purposefully or inadvertently coded into the model structure by the humans who designed it. But unless a neuroscientist designed it, how would that connection come about? Again, not impossible, but it does lead to more questions.

angrysky

Author

Nov 16, 2024

@Lazman It is 6:50 AM; I also did not sleep, lol. Here is my newest bot, though:
https://chatgpt.com/g/g-6738959edd188191991614005028b7b5-syntherion

angrysky

Author

Nov 16, 2024

@Lazman Also, I found a workflow for low-VRAM LoRA making. It takes 12 hours on a 50-image dataset on my PC, so I use CivitAI.

LazmanNov 19, 2024

@angrysky Personally, I'll never use corporate servers to train unless I absolutely have to. I'd prefer to take a bit longer and do it on my own hardware. Also, there's a bigger sense of accomplishment doing it without 'help'. Hard to explain, but just being able to say 'I did this on my own hardware' feels better than 'I used some prefab setup on a corporate server'.

When you say 'workflow', are you talking about ComfyUI, or just that you found a setup in a program like Kohya or OneTrainer that worked for you?

angrysky

Author

Nov 20, 2024

@Lazman ComfyUI. I like to use my computer for making images, running a local LLM, coding, etc. Twelve hours was too long to tie up my PC, and it is hard on my old GPU, not to mention noisy. I did make one just to know I can.

angrysky

Author

Nov 20, 2024

@Lazman I never could get Kohya going on my own.

Lazman

Dec 6, 2024

@angrysky I didn't get Kohya working either; I used OneTrainer. I found out that training time is largely influenced by the number of images used, although hardware does kinda set a hard limit on certain options, VRAM in particular. I've got 16 GB, but the thing pretty much quits if I try to train at batch size 4-6+, depending on the mode I use.

I did 300 images on the first run, all different sizes, and had the program 'bucket' them (it handled resizing and such). That took about 11 hrs for 1 epoch. I changed to Adafactor from Cosine, and it took less time (I don't remember how much less). Lastly, I redid my dataset. I reduced it to 120 images (only the best quality), then used XnView to batch resize them all so the longest side was 1536 pixels. I had it pad the shorter side to the same length, filling it in with black, so I had a batch of 120 images at 1536x1536. Then I turned off bucketing, went back to Cosine, and it did something like 1 epoch in an hour.

Although, on that run, I also realized I didn't have TensorFlow installed, and installed it this time. So it's either TensorFlow or the pre-optimized dataset that made it run so much faster.

Edit: oops, it was AdamW, not Cosine, and I used Constant with it.
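The resize-and-pad step described above is easy to get slightly wrong, so here is a small Python sketch of just the arithmetic. It assumes the black padding is split evenly on both sides; the comment does not say where XnView placed the image on the canvas. `fit_and_pad` is a hypothetical helper name, not part of XnView or OneTrainer.

```python
# Hypothetical helper: geometry for "resize so the longest side is 1536, then pad
# the shorter side with black to a 1536x1536 square". Centred padding is an
# assumption; the thread does not say where XnView placed the image.


def fit_and_pad(width: int, height: int, target: int = 1536):
    """Return (new_w, new_h, pad_left, pad_top) for a target x target canvas."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    scale = target / max(width, height)          # > 1 means a small image is upscaled
    new_w = min(target, max(1, round(width * scale)))
    new_h = min(target, max(1, round(height * scale)))
    pad_left = (target - new_w) // 2             # black columns added on the left
    pad_top = (target - new_h) // 2              # black rows added on top
    return new_w, new_h, pad_left, pad_top


if __name__ == "__main__":
    # Example sizes; the first two match the range mentioned in the thread.
    for w, h in [(264, 264), (3000, 4000), (1920, 1080)]:
        print((w, h), "->", fit_and_pad(w, h))
```

With Pillow, `PIL.ImageOps.pad(img, (1536, 1536), color="black")` performs a comparable fit-then-centre-pad in one call. The helper above only computes the numbers, so the arithmetic is easy to check. Note that anything smaller than 1536 on its longest side, like the 264x264 images, gets upscaled, which can soften fine detail.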

LORA

Flux.1 D

by angrysky


concept

design

animation 3d