Holiday cheer

kokuen@lemmynsfw.com · 7 months ago

NSFW

demonologic@lemmynsfw.com · edit-2 7 months ago

How do you mean? In the sense of “lighting complexity” like in 3D games or whatever? Images generated with an “AI” (LLM, or diffusion model, or whatever newfangled doodad) don’t really have complexity limits in the same way as e.g. rasterized rendering does, where you can usually only have so many lights per scene and so on. AI-generated images can sometimes make mistakes in light source directions and stuff like that, but that’s more just an artifact of the image generation being pretty stupid in a way: the models don’t really have a concept of lights etc., so they don’t “know” how light sources work (or strictly know anything in the first place for that matter 😀). You can also get very smooth and sorta flat-looking lighting with some prompts, but usually that’s an artifact of the training data having lots of things like studio photos with professional lighting, rendered images, etc.

Honestly I’d bet my left boob that this is a regular ol’ photograph taken by a regular ol’ meatbag. For example, there’s just way too much detail that all makes sense, instead of the usual collection of background objects, text, etc. that look vaguely plausible at first glance but turn out to be complete dada when you look at them closer.

j4k3@lemmy.world · 7 months ago

SD3 is super powerful under the hood. The tools given in ComfyUI are just a start. There is a ton of code to run that thing and a ton of potential to modify its behavior. I suspect these are image-to-image composites, but with SD3 there are 16 layers. I have no idea what half the stuff in there is doing. Like, the Google T5-XXL LLM is using PyTorch to swap out a whole layer; not a custom trained model, not a LoRA finetune layer, a whole layer is swapped, and that’s it. I don’t even know where to start dissecting what is going on in that paradigm. From what I’ve seen, the example tools for SD3 certainly can’t do this, but my intuition is confident that there is a way within that toolchain.

What I have done is follow Two Minute Papers on YT by Dr. Károly Zsolnai-Fehér. They are a light transport researcher in this space. At my usual level of abstraction, I’m aware of the general limitations in complexity. There is a series of three images this person has uploaded where something just didn’t feel quite right to me. This was the third, which is when I chose to say something, mostly because it was less obvious than the other two. There is also a coincidence of timing with this person and another account that I find curious, but that is an aside. I don’t care what they are doing or why; if anything, if my assumptions are correct, I admire them. It is still highly speculative, though.

Anyways, generative AI still struggles with complex environmental reflected light, and especially color. Each of the three images looks like I am in my old photo studio and have placed a softbox to the side. It makes the subject pop subtly. It almost looks like a green-screen-like setup, though not as extreme as even a high quality one. There is an unnatural, monocolor-like consistency to the reflections.

I did a lot of low-light product photography in a makeshift studio. I spent a lot of time playing with this kind of lighting for accents and hair lights. There is a familiar artificial lighting aspect here that is in line with what should be easy to train into a model with captions. I expect this simplicity to still be present in accessible generative models.

If really good image searches were still possible, I bet this background is somewhere obscure on the internet with a different subject entirely. I could easily be wrong. This was just a stack of 3 images that all felt around 55-60% likely diffusion AI to me. Calling it out is a fun puzzle game now. I would not bet the farm, but might wager a coffee that it’s a gen.