AI-based image generation tools are transforming the way we perceive and create art. I explored four such tools to understand their unique capabilities and limitations. The test subject was the dame blanche website (and Instagram feed) that I have finally released after boring my friends with the silly idea for years. Join the discussion on LinkedIn.
That said, image generation has big potential for business in marketing and advertising, so it's time to explore the tools of the trade!
- DALL-E is super easy to use with high quality images that don’t look realistic
- MidJourney gives more creative control and can create realistic images
- Stable Diffusion is for hackers with full control but less quality
- Photoshop created a super nice image but ignored the finer elements of the description
For the comparison I used this prompt: “dame blanche ice cream with dripping chocolate sauce in style of Renaissance”. I then asked ChatGPT to expand this base prompt into a much more detailed description (pro tip). I used the expanded prompt for all trials, except for DALL-E, where this is done automatically under the hood.
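The prompt-expansion step can also be scripted. A minimal sketch, assuming the OpenAI Chat Completions API (the request body below is built with the standard library only; the system-message wording and model name are my own illustrative choices, not from the original post):

```python
import json

BASE_PROMPT = "dame blanche ice cream with dripping chocolate sauce in style of Renaissance"

def build_expansion_request(base_prompt: str) -> dict:
    """Build a Chat Completions request body that asks ChatGPT to expand
    a short image idea into a detailed visual description."""
    return {
        "model": "gpt-4",  # assumption: any chat-capable model would do
        "messages": [
            {"role": "system",
             "content": "You are an art director. Expand the user's short image "
                        "idea into a rich, detailed visual description suitable "
                        "for an image generation model."},
            {"role": "user", "content": base_prompt},
        ],
    }

body = build_expansion_request(BASE_PROMPT)
print(json.dumps(body, indent=2))
```

Sending this body to `https://api.openai.com/v1/chat/completions` with your API key returns the expanded description to paste into the other tools.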
DALL-E
- easy to use, integrated into OpenAI ChatGPT Plus, with a really good understanding of the request. With minimal detail, just ask for an image and a visually appealing image comes out. ChatGPT first rewrites your prompt before sending it to DALL-E, which is nice.
- A sumptuous depiction of a Dame Blanche ice cream, characterized by its creamy texture and elegant presentation. The ice cream is topped with rich…
- limited creative control leading to many images looking the same
- no way to consistently generate images in the same style (unless you stay in the same chat session). This is a big limitation when using image generation for professional use where you may want a common style in a series of images.
- The DALL-E editor (only for v2) lets you manipulate generated images, e.g. by removing a section and asking DALL-E to inpaint the missing part.
- the OpenAI content policy blocks many creative avenues, even perfectly legal ones, for example when an artist's copyright has already expired.
- automation is possible through the DALL-E API; however, for some reason the same prompts sent through the API consistently lead to lower-quality images compared with the web interface
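For reference, an API call goes to the `POST https://api.openai.com/v1/images/generations` endpoint. A minimal sketch of the request body (the `quality` setting is an assumption on my part, one knob worth trying when chasing the web-interface quality):

```python
import json

def build_dalle_request(prompt: str, size: str = "1024x1024") -> dict:
    """Request body for POST https://api.openai.com/v1/images/generations."""
    return {
        "model": "dall-e-3",
        "prompt": prompt,   # the full ChatGPT-expanded description
        "n": 1,             # dall-e-3 accepts one image per request
        "size": size,
        "quality": "hd",    # assumption: "hd" may narrow the quality gap
    }

payload = build_dalle_request(
    "dame blanche ice cream with dripping chocolate sauce in style of Renaissance")
print(json.dumps(payload))
```

Post the body with an `Authorization: Bearer <API key>` header to receive the image URL.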
MidJourney
- a bit harder to use, but with more creative control: you can specify lighting, perspective, and camera type. You can refer to the typical style of a director, a film genre, a level of creativity, etc. A prompt looks something like this: A cinematic scene from [YEAR, MOVIE GENRE, MOVIE NAME], [SHOT TYPE], [SCENE/SUBJECT/ACTION] captured by [CINEMATIC CAMERA], film directed by [DIRECTOR], [EMOTION], [LIGHTING] --ar 1:1 --style raw --v 5.1
- can refer to the style of living artists (which is interesting: copying a style is ok, but the images shouldn't be in the training data, so it should be impossible…)
- you can upload your own images and have them modified
- you can consistently generate images by controlling the ‘seed’. The seed determines the starting point for the randomisation of the neural network; controlling this starting point gives you the predictability to create images in the same style.
- you can see live what others are creating and how they do it (with a higher subscription you can also stay hidden if you want)
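The bracketed cinematic template above can be filled in with a small helper. A sketch with hypothetical field values of my own choosing (the `--ar`, `--style raw`, and `--v` flags are real MidJourney parameters):

```python
def midjourney_prompt(year, genre, movie, shot, scene, camera,
                      director, emotion, lighting,
                      aspect="1:1", version="5.1"):
    """Assemble a cinematic MidJourney prompt from the template fields."""
    return (f"A cinematic scene from {year} {genre} {movie}, {shot}, "
            f"{scene} captured by {camera}, film directed by {director}, "
            f"{emotion}, {lighting} --ar {aspect} --style raw --v {version}")

prompt = midjourney_prompt(
    "1500s", "period drama", "an imagined Renaissance film",
    "close-up shot", "a dame blanche ice cream with dripping chocolate sauce",
    "ARRI Alexa", "Wes Anderson", "indulgent mood", "soft candlelight")
print(prompt)
```

Paste the resulting string into the `/imagine` command in Discord.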
Stable Diffusion SDXL 1.0
- open source; can run locally (I run it on an RTX 3090) or as SaaS
- many people share their custom fine-tuned models and what they have created, along with the prompt and the settings, so you can learn from them
- full control over the models and all parameters of the image generation
- consistently generate images through control of the seed (see above)
- can automate with the API at stability.ai
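Seed control and API automation come together in the request body for the stability.ai text-to-image endpoint. A minimal sketch, assuming the v1 REST API and the SDXL 1.0 engine id (the parameter values are illustrative defaults, not the author's settings):

```python
import json

ENGINE = "stable-diffusion-xl-1024-v1-0"  # assumption: SDXL 1.0 engine id
URL = f"https://api.stability.ai/v1/generation/{ENGINE}/text-to-image"

def build_sdxl_request(prompt: str, seed: int) -> dict:
    """Request body for the stability.ai text-to-image endpoint.
    Fixing `seed` makes repeated runs start from the same point in the
    randomisation, which is how you keep a consistent style across a series."""
    return {
        "text_prompts": [{"text": prompt, "weight": 1.0}],
        "seed": seed,        # 0 means random; any fixed value is reproducible
        "cfg_scale": 7,      # how strictly to follow the prompt
        "steps": 30,
        "samples": 1,
        "width": 1024,
        "height": 1024,
    }

req = build_sdxl_request("dame blanche ice cream with dripping chocolate "
                         "sauce in style of Renaissance", seed=1234)
print(json.dumps(req, indent=2))
```

Post the body to `URL` with your stability.ai API key; re-sending the identical body (same seed) should reproduce the same image.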
Adobe Photoshop 2024
- super easy to use and seamlessly integrated into the Photoshop user interface. Uses the Photoshop image library as training set.
- can replace / remove parts of an image (in-painting / generative fill)
- excellent for extending an image to make it bigger by adding missing environment (outpainting)
- still seems early days in terms of creative control; there is also a 500-character prompt length limit, so you cannot fully use the extended description generated by ChatGPT
What are your experiences with these tools? Comment below!