Aaaand #GitHub turned the dashboard view I already don't use into an algorithmic social feed because... why?
Like, I don't need it recommending projects my friends are working on, or at the very least that should be rare. I'm pretty sure there's a lot of shit I work on that nobody else cares about (e.g. if you follow my #Ruffle repo, that doesn't mean you want to know about #PDDiffusion).
Part of the reason I keep banging my head against the wall on #PDDiffusion is that PD (by age) training data neatly solves all the ethical problems AI art generators have. It'd be fairly difficult to ACCIDENTALLY rip off a living artist with such a thing.
With a PD-trained music generator this isn't so certain.
If I jump into the actual set of images it read from, a LOT of them are #maps. Which explains why the #PDDiffusion trained on my from-scratch CLIP likes to draw maps, but not why the one trained on OpenAI's CLIP generates pixel nonsense.
Actually, no, it doesn't explain it, because these are all clearly labeled as maps and #CLIP should be able to distinguish between the maps, portraits, and landscapes in the set.
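A quick way to sanity-check that claim is a zero-shot pass with the stock OpenAI checkpoint. Minimal sketch only, assuming openai/clip-vit-base-patch32 and a made-up image path:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Made-up image path; the three labels are just the categories in question.
name = "openai/clip-vit-base-patch32"
model = CLIPModel.from_pretrained(name)
processor = CLIPProcessor.from_pretrained(name)

labels = ["a map", "a portrait", "a landscape painting"]
inputs = processor(text=labels, images=Image.open("sample_scrape.jpg"),
                   return_tensors="pt", padding=True)

with torch.no_grad():
    probs = model(**inputs).logits_per_image.softmax(dim=-1)

print(dict(zip(labels, probs[0].tolist())))  # per-label probabilities for this image
```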
Hey, remember how #PDDiffusion was spitting out nothing but maps?
Well, I retrained on OpenAI CLIP, and now it's spitting out nothing but nonsense. The attached image is supposed to be a "landscape painting of a forest".
#PDDiffusion 90k finished #training today, and the results are...
Uh... it literally forgot how to #draw anything that isn't a map. The prompt for this was "forest landscape painting". All the training data from the 29k version is still there.
I'm retraining with #OpenAI #CLIP instead of my own from-scratch model to try and narrow down the cause of this model forgetting literally everything. #ai #aiart
#PDDiffusion #training #draw #openai #clip #ai #aiart
Oh my god, I just heard about the whole Automatic1111 thing. tl;dr the biggest Stable Diffusion frontend is written by one guy who writes racist Rimworld mods. He recently got banned from GitHub... not because of the game mods, but because he linked to how-tos for using his frontend to generate underage AI anime porn. 🤢🤮
This is basically fitting into every stereotype of the AI art community that I have. I am getting less enthused about #PDDiffusion every time I read about this shit
I'm back from dinner and #PDDiffusion #CLIP training is now 25% complete. 4h30m estimated left.
That's... really weird that it sped up that much, but OK I guess.
So, the #WikimediaCommons scraper in #PDDiffusion choked on another weird date format (negative years) around the 90k-image mark. I decided, screw it, let's just do another training run. It's not quite "10x the model" but hey, it should at least provide a measurable improvement.
#CLIP is training now; about 18 hours. I expect the U-Net to take a week.
I also found out that my wikitext label extractor was busted and not actually extracting label data. So that should also help.
#wikimediacommons #PDDiffusion #clip
Y'know, if #PDDiffusion doesn't work out, I came up with an alternate option. It might be a little too late but I figured it'd at least be funny:
We pit the #aiart grifters against the NFT grifters.
Right-click, save-as is inefficient. How about we instead scrape the entirety of OpenSea and chuck it into a U-Net? Such a network would not only generate every NFT ever made (thus devaluing them), but also every NFT that could ever be made (because they're really fucking samey).
I decided to screw the VAE training for now and just start scraping images again #PDDiffusion #aiart
I have to babysit the scraper because the wikitext parsing still hits corner cases and crashes because, say, this CHEEKY FUCKER decided he was going to be painted in 176X:
https://commons.wikimedia.org/wiki/File:Aleksy_Bobry%C5%84ski.jpeg
I thought the X years were only invented in 200X
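For the record, the kind of normalization I'm talking about looks roughly like this. Minimal sketch only: `parse_fuzzy_year` is a hypothetical helper, not the scraper's actual code, and it just covers the "176X" decade placeholders plus the negative (BCE) years that killed the earlier run.

```python
import re
from typing import Optional, Tuple

# Hypothetical helper: normalize the odd year strings Wikimedia Commons throws
# around ("176X" decade placeholders, negative/BCE years) into a
# (start_year, end_year) range instead of crashing the scraper.
def parse_fuzzy_year(raw: str) -> Optional[Tuple[int, int]]:
    raw = raw.strip()

    # Decade placeholder like "176X": treat it as the range 1760-1769.
    m = re.fullmatch(r"(-?\d{1,3})X", raw, re.IGNORECASE)
    if m:
        base = int(m.group(1)) * 10
        return (base, base + 9)

    # Plain year, including negative (BCE) years like "-500".
    if re.fullmatch(r"-?\d{1,4}", raw):
        year = int(raw)
        return (year, year)

    # Anything else gets logged upstream instead of raising.
    return None

assert parse_fuzzy_year("176X") == (1760, 1769)
assert parse_fuzzy_year("-500") == (-500, -500)
```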
So, with slicing enabled I CAN train the same VAE architecture that Stable Diffusion uses (128,256,512,512)... with a batch size of one and 20 minutes per iteration.
Note, that's not per epoch. That's 20 minutes PER IMAGE.
...aaaand it just threw the CUDA out-of-memory error anyway. Blargh.
Oh look another poorly-documented "click here to go fast" option #PDDiffusion
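This is roughly what I mean; a minimal sketch assuming diffusers' AutoencoderKL, with the kwargs being my reading of the Stable Diffusion VAE config rather than the actual training script, and the two toggles at the bottom being the "go fast" switches in question.

```python
import torch
from diffusers import AutoencoderKL

# Rough sketch, not the real training script: the SD-sized VAE stack,
# (128, 256, 512, 512) block channels, built from scratch.
vae = AutoencoderKL(
    in_channels=3,
    out_channels=3,
    down_block_types=("DownEncoderBlock2D",) * 4,
    up_block_types=("UpDecoderBlock2D",) * 4,
    block_out_channels=(128, 256, 512, 512),
    layers_per_block=2,
    latent_channels=4,
    sample_size=512,
)

# The memory-for-speed trades: slicing chops batched VAE work into per-image
# slices, gradient checkpointing recomputes activations in the backward pass
# instead of storing them. Both cost compute time to save VRAM.
vae.enable_slicing()
vae.enable_gradient_checkpointing()

vae = vae.to("cuda" if torch.cuda.is_available() else "cpu")
```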
#PDDiffusion update:
- Data augmentation went as well as I could have hoped. Trained models are now a lot more likely to spit out something vaguely related to your prompt.
- I got rid of my flat-file database hackery and actually set up SQL to store image metadata and labels. Right now I'm using SQLite, but I can migrate this over to MySQL or Postgres just by changing a connection string (sketch after this list).
- I started work on VAE training. It's not going well.
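The connection-string bit looks more or less like this; a minimal sketch, assuming SQLAlchemy and made-up table/column names rather than the real schema:

```python
from sqlalchemy import create_engine, text

# Made-up schema, but the point stands: only the URL changes between backends.
SQLITE_URL = "sqlite:///pd_diffusion.db"
POSTGRES_URL = "postgresql+psycopg2://user:password@localhost/pd_diffusion"

engine = create_engine(SQLITE_URL)  # swap in POSTGRES_URL later; nothing else moves

with engine.connect() as conn:
    rows = conn.execute(
        text("SELECT image_id, label FROM labels LIMIT 5")
    ).fetchall()
    print(rows)
```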
So, I've been working on data augmentation strategies for #PDDiffusion. As part of that, I learned the main reason why it seemed so damned unresponsive to text prompts:
I was only training on the image's CLIP vector, not the CLIP vector for its associated label text.
If CLIP training just so happened to fit that particular image/label pair well, great. If it didn't, then text prompts matching those images were effectively never trained on.
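In code terms the difference looks roughly like this; sketch only, using transformers' CLIP classes and the openai/clip-vit-base-patch32 checkpoint, with a made-up image path:

```python
from PIL import Image
from transformers import (CLIPImageProcessor, CLIPTextModel,
                          CLIPTokenizer, CLIPVisionModel)

name = "openai/clip-vit-base-patch32"

# What the U-Net should be conditioned on: per-token hidden states of the
# *label text*, so arbitrary prompts land in the same space at inference time.
tokenizer = CLIPTokenizer.from_pretrained(name)
text_encoder = CLIPTextModel.from_pretrained(name)
tokens = tokenizer(["landscape painting of a forest"],
                   padding="max_length", truncation=True, return_tensors="pt")
text_hidden_states = text_encoder(**tokens).last_hidden_state  # (1, 77, 512)

# What it was actually getting: the pooled *image* embedding, which only lines
# up with a text prompt if CLIP happened to fit that image/label pair well.
processor = CLIPImageProcessor.from_pretrained(name)
image_encoder = CLIPVisionModel.from_pretrained(name)
pixels = processor(images=Image.open("some_training_image.jpg"),  # made-up path
                   return_tensors="pt")
image_embedding = image_encoder(**pixels).pooler_output  # (1, 768)
```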
BRUUUUH, the #PDDiffusion training set absolutely *does* have #guineapigs in it
https://commons.wikimedia.org/wiki/File%3AVictors_Fowl_with_a_rabbit_and_guinea_pig.jpg
It should at least be regurgitating this image when I run it with the prompt "a guinea pig"
Ok, so I tried training #PDDiffusion against OpenAI's CLIP... again. The first time I ran up against a weird bug; this time I actually fixed it.
My first thought was that the small data set meant a dumber CLIP. But bringing in a smarter CLIP does not actually make the model smarter. (So is the infringement stored in the U-Net?)
"painting circa 1800 portrait of a paintress oil on canvas"
So, CLIP isn't broken after all. #PDDiffusion's label set is so narrow and so full of specific phrases that prompt engineering is hilariously critical to getting anything useful out of it - even with the improved wikitext parser. Plain descriptions aren't good enough.
Definitely going to have to build a manual labeling tool at some point, because there are entire styles of things in the dataset that you just can't recall with a prompt right now.
Ok, so there's also apparently a bunch of art tagged with https://commons.wikimedia.org/wiki/Template:Royal_Museums_Greenwich which makes the labels CC-BY-NC-SA.
Going to have to completely filter those out of the training set: even though label copyright doesn't leak through the U-Net into generated images, it does cover the model weights themselves, and that would make it illegal to use the pipeline commercially (which I totally want #PDDiffusion to be able to do).
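The filter itself should be close to a one-liner against the metadata DB. Minimal sketch, assuming the SQLite setup from before and made-up table/column names; the real check probably wants to key off the parsed template name rather than a LIKE:

```python
from sqlalchemy import create_engine, text

# Made-up schema again: keep only images whose stored wikitext never pulls in
# the Royal Museums Greenwich template (and its CC-BY-NC-SA label text).
engine = create_engine("sqlite:///pd_diffusion.db")

with engine.connect() as conn:
    usable = conn.execute(text(
        "SELECT image_id FROM images "
        "WHERE wikitext NOT LIKE '%Royal Museums Greenwich%'"
    )).fetchall()

print(f"{len(usable)} images left after dropping NC-labeled entries")
```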
So, I've made a bunch of improvements to the wikitext parser in #PDDiffusion, but in the process found that basically all the dates on a particular artist's paintings have stray parentheses in them. :/ Time to warm up my decades-old Wikipedia account...