Midjourney:
Beauty First.
The tiny team that built the most visually stunning AI image generator in the world — no external funding, no corporate parent, just an obsession with making AI that creates genuinely beautiful art. The full story, explained for everyone.
Midjourney was founded by David Holz — and his story is unlike almost every other AI company you’ll read about. No co-founders from OpenAI. No $1 billion funding rounds. No corporate parent. Just a person who had spent years thinking about human creativity and decided to build the tool he wished existed.
Holz previously co-founded Leap Motion, a hand-tracking hardware company. After leaving, he spent time thinking about a different kind of question: not “how do we make AI more accurate” but “how do we make AI more beautiful?” He believed the most important thing AI could do for creativity was expand what humans could imagine and visualise — not replace artists, but give everyone the ability to externalise the images in their minds.
He founded Midjourney as a small research lab in 2021 and launched the public beta in July 2022. Unlike DALL-E or Stable Diffusion, Midjourney launched exclusively through Discord — the chat platform popular with gamers and creative communities. You joined the Midjourney Discord server, typed a command in a public channel, and watched your image generate alongside thousands of other people’s creations. It was simultaneously a product, a community, and a living gallery.
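Concretely, generation happens through Discord's slash-command syntax — you type the Midjourney bot's `/imagine` command into a channel and the bot replies with your images. The prompt text below is purely illustrative:

```
/imagine prompt: a lighthouse on a cliff at dusk, oil painting, dramatic light
```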
The results were immediately stunning — a different aesthetic than DALL-E. Where DALL-E aimed for photorealism and instruction-following, Midjourney had a distinctive look: painterly, dramatic, often cinematic, with an almost innate sense of composition. Designers, artists, and photographers were transfixed. It spread through creative communities at extraordinary speed.
If DALL-E is a technically excellent art student who can faithfully reproduce anything you describe, Midjourney is a gifted artist with their own aesthetic sensibility — they understand what you want, but they also bring their own vision, and the result is often more beautiful than what you described.
The decision to build Midjourney on Discord was not a technical limitation — it was a deliberate choice. And it turned out to be one of the cleverest product decisions in recent tech history.
Here’s why it worked:
Community as product. In Discord’s public channels, you see other people’s prompts alongside their generated images — in real time. New users learn prompt craft by observing thousands of experiments happening around them. Power users develop techniques, share tips, and build on each other’s discoveries. The community became Midjourney’s R&D team, customer support department, and marketing channel simultaneously — without Midjourney paying for any of it.
Zero infrastructure cost for the UI. Midjourney’s team didn’t need to build a website, a user account system, a payment system, or a social feed. Discord provided all of that. The team could focus entirely on the one thing that mattered: making the AI better. This is why 11 people could run a $100M company — they built almost nothing except the model itself.
Network effects. Every new subscriber joined the same Discord community. As the community grew, the value grew with it — more experiments to learn from, more experts to ask, more inspiration to draw on. This is a classic network effect: Midjourney becomes more valuable as more people use it, which attracts more people.
In 2024, Midjourney finally launched a dedicated web interface — but the Discord community remains the heart of the product. Most power users still prefer it.
Midjourney is one of the most capital-efficient businesses in the history of technology. By building on existing infrastructure (Discord), letting the community do its marketing, and staying relentlessly focused on the core product (image quality), it reached $100M ARR at a fraction of its competitors' costs. The lesson: the right distribution strategy can be worth more than millions in product development.
Midjourney uses the same foundational technology as DALL-E — diffusion models that start with noise and gradually refine toward an image matching your description. But the reason Midjourney looks so distinctively different comes down to what it was trained on and how its aesthetic preferences were shaped.
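The "start with noise and refine" idea can be shown with a toy sketch. This is not Midjourney's actual pipeline — a real diffusion model predicts noise with a large neural network conditioned on your text prompt — but the stand-in denoiser below (which simply knows the target pattern) illustrates the same iterative loop:

```python
import random

# Toy illustration of diffusion sampling: begin with pure noise and
# repeatedly subtract a small fraction of the "predicted" noise.
# TARGET plays the role of the image the model is steering toward.

TARGET = [0.0, 0.5, 1.0, 0.5, 0.0]  # the "image" we want to reach

def predict_noise(x):
    # A real denoiser is a trained network; this stand-in just
    # reports how far each "pixel" currently is from the target.
    return [xi - ti for xi, ti in zip(x, TARGET)]

def generate(steps=50, seed=0):
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in TARGET]  # start from pure noise
    for _ in range(steps):
        noise = predict_noise(x)
        # Remove a little of the predicted noise each step.
        x = [xi - 0.1 * ni for xi, ni in zip(x, noise)]
    return x

img = generate()
```

After enough steps the sample sits close to the target pattern — the same gradual noise-to-image refinement that DALL-E and Midjourney perform, at vastly larger scale and guided by text.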
Midjourney’s architecture is proprietary. Unlike Stable Diffusion (fully open-source) or DALL-E (published in academic papers), Midjourney has never released its full technical architecture. David Holz has described it as their core competitive advantage. Here is what the community has inferred from extensive testing and Holz’s occasional public comments:
It uses diffusion, but with heavy custom modifications. Midjourney started with publicly available diffusion model research and made significant modifications to the training process, the noise schedule, and the aesthetic preference modelling. The exact architecture has evolved substantially with each version — V6 is likely a fundamentally different model than V1.
Composition is learned, not computed. Unlike approaches that explicitly try to model spatial relationships, Midjourney’s compositional strength appears to emerge from its training data and feedback loop. By training heavily on images with strong compositional principles (rule of thirds, leading lines, clear focal points), the model learns to apply these principles implicitly — without being given explicit compositional rules.
Coherence at the cost of controllability. Midjourney makes a deliberate trade-off: it prioritises aesthetic coherence over precise prompt adherence. If your prompt specifies something that would look aesthetically poor, Midjourney may “improve” on it — which experienced users sometimes find frustrating but beginners find magical.
Upscaling and variation as core features. Midjourney generates 4 image variations by default, and its upscaling pipeline is one of the best in the industry — preserving fine detail and adding texture at high resolution in ways many competitors struggle with.
Ask a technically skilled camera operator to photograph a subject and they’ll set up exactly what you describe. Ask a brilliant cinematographer the same thing and they’ll set up something close to what you described, but they’ll also adjust the lighting slightly, choose a more interesting angle, and apply their own sense of what looks good. Midjourney is the cinematographer — and that editorial instinct is both its greatest strength and occasional source of frustration.
“Midjourney changed my practice more than any tool since Photoshop. Not because it replaces what I do — because it lets me explore territory I would never have had time to explore before.”
Where Midjourney excels: Pure aesthetic quality. If you want a beautiful image and are willing to iterate on it, Midjourney produces results that are consistently more visually compelling than any competitor. Concept art, editorial illustration, mood boards, atmospheric scene-setting — it has no peer.
Where it struggles: Precise instruction-following. Ask for “three people sitting at a rectangular table, two on the left and one on the right” and you’ll likely get something that captures the essence but not the specifics. For exact compositional requirements, DALL-E 3 is more reliable. Midjourney is better when you want something that feels right rather than something that is exactly right.
The consistency problem. Like all image AI, Midjourney has no concept of “the same character” between generations. Generate a character once, then try to generate them again from a different angle — you won’t automatically get the same face. Midjourney introduced “character references” in V6 to partially address this, but it remains imperfect. For work requiring consistent characters across many images (comics, children’s books, brand mascots), this is a significant limitation.
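For illustration, a V6 character-reference prompt uses the `--cref` parameter with an image URL (a placeholder below), and `--cw` to set how strongly the reference is applied, from 0 to 100:

```
/imagine prompt: the same knight walking through a snowy market --cref https://example.com/knight.png --cw 80 --v 6
```

Higher `--cw` values pull in clothing and hair as well as the face; lower values keep only the face — useful when you want the same character in a new outfit.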
The copyright training question. Midjourney has been sued by artists who argue their work was used in training without consent. In 2023, artist Sarah Andersen and others filed a class-action lawsuit against Midjourney, Stability AI, and DeviantArt. The legal outcome is pending. If you generate commercial work with Midjourney, this is a risk to be aware of — particularly if your work can be traced to a specific artist’s style.
Self-funding isn’t magic. Midjourney’s zero-external-funding story is admirable — it means no investor pressure to grow at all costs. But it also means slower development cycles, fewer researchers, and less compute budget than better-funded competitors. Claude, GPT-4, and Gemini are backed by billions. Midjourney is backed by subscription revenue from a Discord server. That gap may eventually show.
| Capability | Midjourney | DALL-E 3 | Stable Diffusion |
|---|---|---|---|
| Aesthetic quality (out of box) | ⭐ Best | Very good | Varies — depends on model |
| Precise prompt adherence | Good | ⭐ Best | Good with right settings |
| Free to use | No — subscription only | Included in ChatGPT Plus | ⭐ Yes — fully free |
| Run locally (no internet) | No — cloud only | No — cloud only | ⭐ Yes — runs on your machine |
| Open source | No | No | ⭐ Yes |
| Text in images | Improved in V6 — still imperfect | Best | Poor in base models |
| Community and learning resources | ⭐ Best — 16M Discord members | Limited | Very strong — huge open community |
Choose Midjourney when aesthetic quality is the priority and you have the tolerance for iteration. It’s the best tool for any creative work where the question is “does this look beautiful?” rather than “does this match my exact specification?” The subscription ($10–$60/month) pays for itself immediately in any business that currently buys stock imagery or commissions exploratory concept art.