Start coding and learn prompt engineering and workflows; be the manager and not the workhorse. You can inpaint, you can Photoshop, you can still write or be an editor, review copy, etc.
There are also things that present-day generative AI is not very good at in existing fields, and I’m not sure how easy it will be to address some of those. So, take the furry artist. It looks like she made a single digitally-painted portrait of a tiger in a suit, a character that she invented. That’s something that probably isn’t all that hard to do with present-day generative AI. But try using existing generative AI to create several different views of the same invented character, presented consistently, and that’s a weak point. That may require very deep and difficult changes on the technology front to try to address.
I don’t feel that a lot of this has been hashed out, partly because a lot of people, even within the fields, don’t have a great handle on what the weaknesses are on the AI front, what might be viably remedied, and how. It would be interesting to run some competitions in various areas and see what a competent person in the field could do versus someone competent in using generative AI. It’ll probably change over time, and techniques will evolve.
There are areas where generative AI for images has both surpassed what I expected and underperformed. I was pretty impressed with its ability to capture the elements of what creates a “mood”, say, and make an image sad or cheerful. I was very surprised at how effective current image generation models were, given their limited understanding of the world, at creating things “made out of ice”. But I was surprised at how hard it was to get any generative AI model I’ve tried to generate drawings containing crosshatching, which is something that plenty of human artists do just fine. Is it easy to address that? Maybe. I think I could give some pretty reasonable explanations as to why consistent characters are hard, but I don’t feel like I could offer a convincing argument about why crosshatching is hard; I don’t really understand why models do poorly with it, and thus I’ve no idea how difficult it might be to remedy.
Some fantastic images are really easy to create with generative image AI. Some are surprisingly difficult. To name two things that I recall [email protected] regulars have run into over the past couple years, trying to create colored car treads (it looks like black treads are closely associated with the “tire” token) and trying to create centaurs (generative AI models want to do horses or people, not hybrids). The weaknesses may be easy to remedy or hard, but they won’t be the same weaknesses that humans have; these are things that are easy for a human. Ditto for strengths — it’s relatively-easy for generative AI to create extremely-detailed images (“maximalist” was a popular token that I recall seeing in many early prompts) or to replicate images of natural media that are very difficult or time-consuming to work in in the real world, and those are areas that aren’t easy for human artists.
I’ve noticed, at least with the model I occasionally use, that the best way I’ve found to consistently get Western eyes isn’t to specify round eyes or to ban almond-shaped eyes, but to make the character blonde and blue-eyed (or make them a cowgirl or some other stereotype rarely associated with Asian women). If you want to generate a Western woman with straight black hair, you are going to struggle.
I’ve also noticed that if you want a chest smaller than DDD, it’s almost impossible with some models — unless you specify that they are a gymnast. The model makers are so scared of generating a chest that could ever be perceived as less than robustly adult that just generating realistic proportions is impossible by default. But for some reason gymnasts are given a pass, I guess.
This can be addressed with LoRAs and other tools, but every time you run into one of these hard associations, you have to assemble a bunch of pictures demonstrating the feature you want, and the images you choose had better not be too self-consistent or you might accidentally bias some other trait you didn’t intend to.
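To make that concrete, here’s roughly what wiring a LoRA into a Hugging Face diffusers pipeline looks like. This is a minimal sketch only; the base model, the LoRA repository, and the trigger word are placeholders I made up, not anything specific.

```python
# Sketch: applying a community LoRA to steer one of those hard associations.
# The base model, LoRA repo, and trigger phrase below are placeholders.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # assumed base model
    torch_dtype=torch.float16,
).to("cuda")

# Load a hypothetical LoRA trained on images demonstrating the trait you want.
pipe.load_lora_weights("some-user/example-trait-lora")  # placeholder repo

# Many LoRAs rely on a trigger word baked in during training.
image = pipe(
    "portrait of a woman, example-trigger-word",
    num_inference_steps=30,
).images[0]
image.save("lora_test.png")
```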
Contrast that with a human artist, who can draw whatever they imagine without having to translate it into AI terms or worry about concept bleed. Like, I want portrait-style, but now there are framed pictures in the background of 75% of the gens, so instead I have to replace “portrait” with a half-dozen other words: 3/4 view, posed, etc.
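In practice that workaround ends up looking something like this (again a sketch only; the pipeline setup and the exact negative-prompt terms are my own illustrative guesses):

```python
# Sketch: working around "portrait" bleeding into framed pictures by
# swapping in more specific terms and banning the unwanted concept.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Instead of the single word "portrait"...
prompt = "3/4 view, posed, head and shoulders, studio lighting, woman with straight black hair"
# ...explicitly push away the concept that keeps bleeding in.
negative_prompt = "framed picture, picture frame, painting on wall"

image = pipe(prompt, negative_prompt=negative_prompt, num_inference_steps=30).images[0]
image.save("no_frames_hopefully.png")
```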
Hard association is one of the tools AI relies on — a hand has 5 fingers and is found at the end of an arm, etc. The associations it makes are based on the input images, and the images selected or available are going to contain other biases just because, for example, there are very few examples of Asian women wearing cowboy hats and lassoing cattle.
Now, I rarely have any desire to generate images, so I’m not playing with cutting edge tools. Maybe those are a lot better, but I’d bet they’ve simply mitigated the issues, not solved them entirely. My interest lies primarily in text gen, which has similar issues.
I’ve also noticed that if you want a chest smaller than DDD, it’s almost impossible with some models — unless you specify that they are a gymnast.
That’s another weak point of present generative image AI — humans have an intuitive understanding of relative terms and can iterate on them.
So, it’s pretty easy for me to point at an image and ask a human artist to “make the character’s breasts larger” or “make the character’s breasts smaller”. A human artist can look at an image, form a mental model of the image, and produce a new image in their head relative to the existing one by using my relative terms “larger” and “smaller”. They can then go create that new image. Humans, with their sophisticated mental model of the world, are good at that.
But we haven’t trained an understanding of relative relationships into diffusion models today, and doing so would probably require a more sophisticated — maybe vastly more sophisticated — type of AI. “Larger” and “smaller” aren’t really usable as things stand today. Because breast size is something that people often want to muck with, people have trained models on a static list of Danbooru tags for breast sizes, and models trained on those can use them as inputs, but even then, it’s a relatively-limited capability. And for most other properties of a character or thing, even that’s not available.
For models which support it, prompt term weighting can sometimes provide a very limited analog to this. Instead of saying “make the image less scary”, maybe I “decrease the weight of the token ‘scary’ by 0.1”. But that doesn’t work with all relationships, and the outcome isn’t always fantastic even then.
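For the diffusers crowd, the compel helper library is one way to do that sort of per-term weighting. A minimal sketch, with the base model assumed; compel’s “-” suffix de-emphasizes a term by roughly a factor of 0.9, which is about as close as you get to “make it a bit less scary”:

```python
# Sketch: down-weighting a single prompt term instead of asking the model
# to "make it less scary". Model ID and prompt are illustrative only.
import torch
from diffusers import StableDiffusionPipeline
from compel import Compel

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

compel_proc = Compel(tokenizer=pipe.tokenizer, text_encoder=pipe.text_encoder)

# The trailing "-" tells compel to reduce the weight of "scary".
prompt_embeds = compel_proc("a scary- abandoned house in a forest at night")

image = pipe(prompt_embeds=prompt_embeds, num_inference_steps=30).images[0]
image.save("less_scary_house.png")
```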
Start coding and learn prompt engineering and workflows; be the manager and not the workhorse. You can inpaint, you can Photoshop, you can still write or be an editor, review copy, etc.
“Can you please do it right this time? Pretty please?”
“You are correct, I did it wrong the other time! This time I’ll give you the correct code!”
<code that is wrong elsewhere>