At least with the model I occasionally use, the most reliable way I’ve found to get western eyes isn’t to specify round eyes or to ban almond-shaped eyes, but to make the character blonde and blue-eyed (or make them a cowgirl or some other stereotype rarely associated with Asian women). If you want to generate a western woman with straight black hair, you are going to struggle.
I’ve also noticed that if you want a chest smaller than DDD, it’s almost impossible with some models, unless you specify that they are a gymnast. The model makers are so scared of generating a chest that could ever be perceived as less than robustly adult that just generating realistic proportions is impossible by default. But for some reason gymnasts are given a pass, I guess.
This can be addressed with LoRAs and other tools, but every time you run into one of these hard associations, you have to assemble a bunch of pictures demonstrating the feature you want, and the images you choose had better not be too self-consistent or you might accidentally bias some other trait you didn’t intend to.
Contrast a human artist who can draw whatever they imagine without having to translate it into AI terms or worry about concept-bleed. Like, I want portrait-style, but now there are framed pictures in the background of 75% of the gens, so instead I have to replace portrait with a half-dozen other words: 3/4 view, posed, etc.
Hard association is one of the tools AI relies on: a hand has 5 fingers and is found at the end of an arm, etc. The associations it makes are based on the input images, and the images selected or available are going to contain other biases just because, for example, there are very few examples of Asian women wearing cowboy hats and lassoing cattle.
Now, I rarely have any desire to generate images, so I’m not playing with cutting edge tools. Maybe those are a lot better, but I’d bet they’ve simply mitigated the issues, not solved them entirely. My interest lies primarily in text gen, which has similar issues.
I’ve also noticed that if you want a chest smaller than DDD, it’s almost impossible with some models, unless you specify that they are a gymnast.
That’s another weakness of present-day generative image AI: humans have an intuitive understanding of relative terms and can iterate on them.
So, it’s pretty easy for me to point at an image and ask a human artist to “make the character’s breasts larger” or “make the character’s breasts smaller”. A human artist can look at an image, form a mental model of the image, and produce a new image in their head relative to the existing one by using my relative terms “larger” and “smaller”. They can then go create that new image. Humans, with their sophisticated mental model of the world, are good at that.
But we haven’t trained an understanding of relative relationships into today’s diffusion models, and doing so would probably require a more sophisticated (maybe vastly more sophisticated) type of AI. “Larger” and “smaller” aren’t really usable as things stand today. Because breast size is something that people often want to muck with, people have trained models on a static list of danbooru tags for breast sizes, and models trained on those can use them as inputs, but even then, it’s a relatively limited capability. And for most other properties of a character or thing, even that’s not available.
For models which support it, prompt term weighting can sometimes provide a very limited analog to this. Instead of saying “make the image less scary”, maybe I “decrease the weight of the token ‘scary’ by 0.1”. But that doesn’t work with all relationships, and the outcome isn’t always fantastic even then.
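To make the weighting idea concrete, here is a rough sketch of what "decrease the weight of 'scary' by 0.1" looks like as a string edit on the prompt. It assumes the comma-separated `(term:weight)` emphasis syntax that some Stable Diffusion front ends accept; your tool may use a different syntax or none at all. `adjust_weight` is a made-up helper for illustration, not part of any library, and it only rewrites the prompt text. Whether the model responds sensibly to the new weight is a separate question.

```python
import re

# Assumed syntax: "(term:0.9)" marks a weighted term; a bare term counts as 1.0.
WEIGHTED = re.compile(r"^\((.+):([0-9]*\.?[0-9]+)\)$")


def adjust_weight(prompt: str, term: str, delta: float) -> str:
    """Shift the weight of `term` in a comma-separated prompt by `delta`."""
    parts = []
    for raw in prompt.split(","):
        piece = raw.strip()
        match = WEIGHTED.match(piece)
        if match:
            name, weight = match.group(1).strip(), float(match.group(2))
        else:
            name, weight = piece, 1.0
        if name == term:
            # Round so repeated nudges do not accumulate float noise.
            weight = round(weight + delta, 2)
        # Terms at the default weight are written bare.
        parts.append(name if weight == 1.0 else f"({name}:{weight})")
    return ", ".join(parts)


print(adjust_weight("old house, scary, night", "scary", -0.1))
# old house, (scary:0.9), night
print(adjust_weight("old house, (scary:0.9), night", "scary", -0.1))
# old house, (scary:0.8), night
```

Note that this is still an absolute number being nudged by hand, not a relative instruction the model understands, which is the limitation described above.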