Okay! Hello!
There seemed to be some interest in AI image generation at DevX today, so I thought I would throw together a rough little resource pack on my strategy for experimenting with the technology as a developer instead of as a consumer, but on easy mode: with UIs, plugins, and community support.
tl;dr
If you just want some quick and dirty resources, here ya go:
- RunPod for compute power
- Vast.ai for compute power
- Automatic1111 for WebUI
- ComfyUI for WebUI
- CivitAI for models and workflows
Good luck soldier. You’re a professional now o7.
…
… …
… … …
… … … …
Are you still here?
You must want to learn more about-
Disclaimer
I’m writing from the perspective of sharing my experience with someone who just told me they’d like to start integrating AI image generation into their apps.
RunPod (Compute Power)
Skip if you intend to run locally.
RunPod is a VPS / serverless provider that specializes in powerful GPU availability. Yes, there are other options if you don’t intend to run AI models locally, and I have thoughts on Google Colab, Hugging Face Spaces, and plenty of other services, but I’m recommending RunPod. It’s much closer to the actual experience of running these technologies on bare metal, and if you can deploy on RunPod, you’ll understand how to deploy on any bare-metal solution should you decide to scale a project or scope something out in the future. Additionally, RunPod offers a serverless option, so once you’re proficient with the deploy strategy, you can easily expose that GPU power to any of your own apps and have low-cost AI apps at your fingertips.
If you’d rather skip RunPod and run locally, you’ll want at least 4GB of VRAM to run most Stable Diffusion models in a timely manner.
Expect to spend ~$0.25 per hour of practice time, and should you integrate serverless GPU into your app, roughly $0.00016 to $0.00076 per second of compute time (which works out to about $0.0006 to $0.002 per image generated in my experience).
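If the serverless option sounds appealing, the worker side is just a Python handler that RunPod invokes per request. Here’s a minimal sketch, assuming RunPod’s `runpod` SDK and Hugging Face’s diffusers library; the model ID, the `generate` name, and the request shape beyond `job["input"]` are my own placeholders, not anything RunPod mandates:

```python
# Minimal sketch of a RunPod serverless worker (assumes the `runpod` SDK
# and `diffusers` are installed, and a GPU is available on the pod).
import base64
import io

import runpod
import torch
from diffusers import StableDiffusionPipeline

# Load the model once per worker so each request only pays for inference time.
pipe = StableDiffusionPipeline.from_pretrained(
    "path/or/hub-id-of-your-checkpoint",  # placeholder model ID
    torch_dtype=torch.float16,
).to("cuda")

def generate(job):
    """Handle one request shaped like {"input": {"prompt": "..."}}."""
    prompt = job["input"]["prompt"]
    image = pipe(prompt).images[0]
    buffer = io.BytesIO()
    image.save(buffer, format="PNG")
    # Return the image as base64 so the calling app can decode and save it.
    return {"image_base64": base64.b64encode(buffer.getvalue()).decode()}

# Hand the handler to RunPod's serverless runtime.
runpod.serverless.start({"handler": generate})
```

At the per-second rates above, a generation that takes roughly three or four seconds of GPU time lands right in that per-image range.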
Vast.ai (Compute Power)
Similar to RunPod, but cheaper. It’s a little less user friendly, but if you get into training LoRAs or want to run generations that might take a few days to complete, it might be the better option for you: the cost per hour is typically lower and there are more high-powered machines available.
A1111
A1111, or Automatic1111, is a UI for interacting with Stable Diffusion models. Before the advent of UIs like A1111, if you wanted to use a pre-trained model like Stable Diffusion, you would need to implement it with an ML library like TensorFlow, Keras, or PyTorch, usually directly in Python code. A WebUI makes this much easier by letting you drive those Python scripts under the hood through an intuitive interface. A1111 is, in my opinion, the most beginner friendly of the WebUIs, and it is feature rich, with a plugin ecosystem, custom script support, and a dev API you can enable at launch. There are a ton of resources out there explaining how to use A1111, the best of which is the Official Documentation.
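To give a taste of that dev API: launch the WebUI with the `--api` flag and it exposes REST endpoints alongside the UI. Here’s a rough sketch of calling txt2img from Python, assuming the default local port; the prompt and parameters are just example values:

```python
# Rough sketch of A1111's txt2img endpoint (WebUI launched with --api,
# serving on the default http://127.0.0.1:7860).
import base64

import requests

payload = {
    "prompt": "a cozy cabin in the woods, watercolor",
    "negative_prompt": "blurry, low quality",
    "steps": 25,
    "width": 512,
    "height": 512,
}

# The response JSON contains a list of base64-encoded images.
response = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload)
response.raise_for_status()

for i, image_b64 in enumerate(response.json()["images"]):
    with open(f"output_{i}.png", "wb") as f:
        f.write(base64.b64decode(image_b64))
```

The same pattern works against a RunPod pod; you just point the URL at the pod’s exposed port instead of localhost.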
ComfyUI
Similar to A1111, but generally considered more powerful: it’s a node-based editor where you wire up your own generation pipeline as a graph. A higher barrier to entry, but a higher skill ceiling.
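To show what that power looks like under the hood: a ComfyUI workflow is just JSON, and the server accepts it over HTTP. A minimal sketch, assuming the default local port and a workflow you exported yourself with the “Save (API Format)” option; the filename and node id are placeholders specific to whatever graph you export:

```python
# Minimal sketch of queueing a ComfyUI workflow over its HTTP API
# (server on the default http://127.0.0.1:8188).
import json

import requests

# Workflow exported from the UI via "Save (API Format)" -- placeholder filename.
with open("workflow_api.json") as f:
    workflow = json.load(f)

# Tweak an input before queueing, e.g. swap the prompt text.
# The node id ("6") depends on your graph, so treat it as a placeholder.
workflow["6"]["inputs"]["text"] = "a cozy cabin in the woods, watercolor"

# Queue the job; finished images land in ComfyUI's output folder.
response = requests.post("http://127.0.0.1:8188/prompt", json={"prompt": workflow})
response.raise_for_status()
print(response.json())
```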
CivitAI
This is a community space for people training their own models. These models are usually Stable Diffusion checkpoints that have been fine-tuned with a specific purpose in mind. There are plenty of models for every use case, as well as LoRAs trained on small, specific knowledge subsets, like a particular character or an art style.
CivitAI also has a section under each model with suggestions on how to use it optimally, plus community generations with detailed workflows showing how those images were created.
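Getting a model from CivitAI onto your pod (or local machine) is just a file download into the WebUI’s model folder. A quick sketch, where the download URL is a placeholder you’d copy from the model’s page, and the destination assumes A1111’s default layout (LoRAs go in `models/Lora` instead):

```python
# Sketch of downloading a checkpoint into A1111's model folder.
# DOWNLOAD_URL is a placeholder: copy the real link from the CivitAI model page
# (some downloads also require an API key).
from pathlib import Path

import requests

DOWNLOAD_URL = "https://civitai.com/api/download/models/<VERSION_ID>"  # placeholder
DEST = Path("stable-diffusion-webui/models/Stable-diffusion/my_model.safetensors")

with requests.get(DOWNLOAD_URL, stream=True, allow_redirects=True) as r:
    r.raise_for_status()
    with open(DEST, "wb") as f:
        # Stream in chunks since checkpoints can be several gigabytes.
        for chunk in r.iter_content(chunk_size=1 << 20):
            f.write(chunk)
```

After a restart (or a refresh of the checkpoint dropdown) the model shows up in the WebUI.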
Well, that’s the brief summary I’ll give these topics. I hope it’s a good launching point for your own journey of exploration. If people end up taking more interest, I may put together a more detailed step-by-step guide on how to use these pieces of technology, along with some neat Docker image shortcuts for quick deployments.