Directing 2D cartoons can be pretty liberating at times. Late in the game, if we need an anvil for a gag, we can usually just draw it in for one shot. And as we all know, cartoons are 17% funnier when anvils are involved. But when it comes to CG productions, introducing a new prop can be a much bigger proposition involving design, modeling, texturing, and rigging.
It’s possible that AI-created 3D assets could change all that.
The AI firms at the forefront of 3D asset creation are hoping to vastly speed up the process (saving money but potentially losing jobs along the way). Over the past few years, a number of tools have emerged claiming to revolutionize the modeling process.
Let’s take a closer look at the state-of-the-art, what’s working and what’s still lacking in this rapidly improving technology.
How it Works
The promise of AI asset generators is simple: input a text or image prompt and the AI tool will output a 3D mesh that can be imported into the CGI program of your choice to be used in games, animation, or VFX. Similar to image generation models like OpenAI’s DALL-E or Midjourney, 3D asset generators are trained on a vast amount of data, in this case 3D models and images. Then, when given a prompt (either text or image), the model creates a 3D mesh in one of two ways:
1. Diffusion
Used in models like Meshy.ai
With this approach, the model generates a number of images of the prompt from various angles and then treats it like a puzzle, figuring out what kind of 3D shape would produce those silhouettes from all the given angles.
2. Autoregressive Transformer
Used in models like Hunyuan 3D-PolyGen
This method works more like the way ChatGPT creates sentences: it treats every vertex and face as a “word” and learns how those pieces fit together, vertex by vertex, face by face. The end result tends to have cleaner topology (a toy sketch of the idea follows below).
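To make the “mesh as language” idea a little more concrete, here is a deliberately simplified Python sketch. It is not Tencent’s (or anyone’s) actual tokenizer; real systems use learned codebooks and far more elaborate orderings. It just shows how a mesh can be flattened into a sequence of discrete “words” that a transformer could learn to predict one at a time.

```python
# Toy sketch only: how a mesh might be serialized into discrete tokens for an
# autoregressive model. Real generators use learned codebooks and smarter
# orderings; this just illustrates the basic idea.
import numpy as np

def mesh_to_tokens(vertices, faces, bins=128):
    """Quantize vertex coordinates into `bins` levels and flatten faces into one sequence."""
    v = np.asarray(vertices, dtype=np.float32)

    # Normalize the model into a unit cube, then snap each coordinate to a coarse
    # grid, turning continuous positions into a small discrete vocabulary.
    v_min, v_max = v.min(axis=0), v.max(axis=0)
    v = (v - v_min) / np.maximum(v_max - v_min, 1e-8)
    quantized = np.clip(np.round(v * (bins - 1)).astype(int), 0, bins - 1)

    tokens = []
    for face in faces:                 # one face at a time...
        for vert_index in face:        # ...one vertex at a time...
            tokens.extend(quantized[vert_index].tolist())  # ...one coordinate "word" at a time
        tokens.append(-1)              # separator marking the end of a face
    return tokens

# A single triangle: three vertices, one face.
verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
faces = [(0, 1, 2)]
print(mesh_to_tokens(verts, faces))
# [0, 0, 0, 127, 0, 0, 0, 127, 0, -1]
```

A model trained on millions of such sequences can then generate new meshes the way a chatbot generates text, predicting the next token given everything that came before, which is part of why the output topology tends to look more like something a human would have laid out.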
A Brief History of 3D AI
The road towards the current slate of AI modeling tools started with Google’s DreamFusion in 2022. Unlike more recent models, DreamFusion didn’t train on 3D models at all; it was built strictly off their 2D image model, Imagen. By generating 2D images of an imagined 3D object from various angles, the model could eventually infer a 3D volume.
The result was incredible from a research perspective, but not at all practical for production. The resulting geometry was often dense and “blobby,” requiring a lot of refinement and, more often than not, a complete re-modeling of the asset. It was also way too slow to be useful, initially taking multiple GPU hours to produce a single model.
Following up on Google’s research, OpenAI did some work to speed up the process with its Point-E and Shap-E models, but these still produced fairly crude geometric shapes (albeit in a much quicker timeframe).
From there, NVIDIA took the baton, refining the research further and releasing Edify 3D in 2024. Their model uses a multi-step generation process to produce a more production-friendly mesh in much less time. According to their technical paper, Edify 3D can generate a detailed 3D asset in under two minutes, a huge leap forward compared to just a few years earlier.
Okay, so where does that leave us today?
Consumer-Grade Tools
Currently there are quite a few AI tools on the market that can translate a text or image prompt into a 3D asset. Most of these are fun to play with, but don’t have the kind of fidelity needed to fit into regular production.
The best-known consumer-grade tool is Meshy.ai, a web-based generator that quickly produces textured assets for Blender or Unreal. However, professionals frequently find its topology unsuitable for production, and while Meshy’s new version promises improvement, its practical value remains untested.
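For anyone curious about where these assets actually land in a pipeline, the round trip is short. Below is a minimal sketch, assuming you’ve exported a generated model as a glTF/GLB file (the file path is made up) and are running the script from Blender’s built-in Python console; the quick face count at the end is the kind of sanity check that tends to reveal the topology problems mentioned above.

```python
import bpy

# Hypothetical path to a generated asset exported as GLB; adjust to your own file.
asset_path = "/tmp/generated_anvil.glb"

# Pull the asset into the current scene using Blender's built-in glTF importer.
bpy.ops.import_scene.gltf(filepath=asset_path)

# The importer leaves the new objects selected; inspect the first mesh it brought in.
obj = next(o for o in bpy.context.selected_objects if o.type == 'MESH')
mesh = obj.data
print(f"{obj.name}: {len(mesh.vertices)} vertices, {len(mesh.polygons)} faces")

# AI output is frequently a dense, all-triangle soup; flag it before anyone tries to rig it.
tri_faces = sum(1 for poly in mesh.polygons if len(poly.vertices) == 3)
if tri_faces == len(mesh.polygons):
    print("All-triangle mesh: plan on retopology before rigging or animation.")
```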
Production Ready
First of all, when it comes to creating 3D assets, using text as the input seems mostly useless for real-world applications. If you are creating a prop for a game or show, you need something specific, and a fully fleshed-out design is really the only way you’re going to get something that doesn’t feel generic.
Until very recently, models like Meshy were as good as it got. They approached game-changing results but required so much fixing that it hardly felt worth it. Now, though, a couple of different approaches to the problem seem to be solving at least some of those issues.
The Tech Solution: Hunyuan 3D-PolyGen
Released by Chinese technology giant Tencent on July 7th of this year, Hunyuan 3D-PolyGen purports to be the first “art-grade” AI 3D model generator. It can create assets with tens of thousands of polygons and topology that is actually useful in real production. The result is highly detailed models, created in a production-friendly way, with flexible outputs that support both triangle and quad meshes.
One of the key differences with PolyGen is that it is more than just a consumer or research model. Tencent is already using it to create screen-ready assets for its video game divisions. The result is a tool that is fully road-tested; the company says it has delivered 70% efficiency gains.
So far, the reviews of PolyGen have been mostly positive for hard surfaces. Organic assets still seem to require a decent amount of human input.
The Human Touch: Kaedim
UK-based Kaedim takes a slightly different approach by keeping humans firmly in the loop. Kaedim pitches itself as an on‑demand co‑development platform. Productions upload concept art (a single image, or up to six views), and shortly thereafter a low- or mid-poly model is ready.
Kaedim blends a proprietary 2D‑to‑3D neural network with an in‑house art team that fixes whatever the model gets wrong before sending it to the client. They say this human part of the process ensures quality. Kaedim may not have their own production studio like Tencent, but their workflow has drawn clients like Aardman Animations and Rebellion Games.
Potential Pitfalls
While AI‑driven asset generators can slash production time, they come with serious risks. First, most services run in the cloud, meaning any proprietary IP used in the creation of an asset sits on someone else’s server. And in the case of a Chinese company like Tencent, you may be dealing with copyright laws that are much more lax than North American or European standards.
Secondly, the models themselves are often trained on massive public datasets, opening up legal risks similar to those of using image generators like Midjourney or Stable Diffusion.
The Model From All Angles
It’s also worth reflecting on AI’s broader impact. As automated systems take over more of the modeling labour, what will this do to the workforce overall? The utopian vision is simply more work being done at a bigger scale, but I worry that the reality might be a shrinking of the crews. Even if AI could create a perfect 3D model from an approved design, eliminating that job means a smaller pool of people who will ultimately use their acquired skills to learn, grow, and come up with the next great thing.
The forward march of automation has always been a part of the animation industry, but if AI accelerates that to previously unimaginable speeds, it’s in our best interests to take a moment and examine the true costs.
Seeya next time,
Matt Ferg.