It takes AI kinds to make a virtual world. Nvidia CEO Jensen Huang said this week during a Q&A at the GTC22 online event that AI will auto-populate the 3D imagery of the metaverse.
He believes that AI will make the first pass at creating the 3D objects that populate the vast virtual worlds of the metaverse — and then human creators will take over and refine them to their liking. And while that is a very big claim about how smart AI will be, Nvidia has research to back it up.
Nvidia Research announced this morning a new AI model that can help populate the massive virtual worlds created by growing numbers of companies and creators with a diverse array of 3D buildings, vehicles, characters and more.
This kind of mundane imagery represents an enormous amount of tedious work. Nvidia said the real world is full of variety: streets are lined with unique buildings, with different vehicles whizzing by and diverse crowds passing through. Manually modeling a 3D virtual world that reflects this is incredibly time-consuming, making it difficult to fill out a detailed digital environment.
This kind of task is what Nvidia wants to make easier with its Omniverse tools and cloud service. It hopes to make developers’ lives easier when it comes to creating metaverse applications. And auto-generating art — as we’ve seen happening with the likes of DALL-E and other AI models this year — is one way to alleviate the burden of building a universe of virtual worlds like in Snow Crash or Ready Player One.
I asked Huang in a press Q&A earlier this week what could make the metaverse come faster. He alluded to the Nvidia Research work, though the company didn’t spill the beans until today.
“First of all, as you know, the metaverse is created by users. And it’s either created by us by hand, or it’s created by us with the help of AI,” Huang said. “And, and in the future, it’s very likely that we’ll describe will some characteristic of a house or characteristic of a city or something like that. And it’s like this city, or it’s like Toronto, or is like New York City, and it creates a new city for us. And maybe we don’t like it. We can give it additional prompts. Or we can just keep hitting ‘enter’ until it automatically generates one that we would like to start from. And then from that, from that world, we will modify it. And so I think the AI for creating virtual worlds is being realized as we speak.”
Trained using only 2D images, Nvidia GET3D generates 3D shapes with high-fidelity textures and complex geometric details. These 3D objects are created in the same format used by popular graphics software applications, allowing users to immediately import their shapes into 3D renderers and game engines for further editing.
The generated objects could be used in 3D representations of buildings, outdoor spaces or entire cities, designed for industries including gaming, robotics, architecture and social media.
GET3D can generate a virtually unlimited number of 3D shapes based on the data it’s trained on. Like an artist who turns a lump of clay into a detailed sculpture, the model transforms numbers into complex 3D shapes.
“At the core of that is precisely the technology I was talking about just a second ago called large language models,” Huang said. “To be able to learn from all of the creations of humanity, and to be able to imagine a 3D world. And so from words, through a large language model, will come out someday, triangles, geometry, textures, and materials. And then from that, we would modify it. And, and because none of it is pre-baked, and none of it is pre-rendered, all of this simulation of physics and all the simulation of light has to be done in real time. And that’s the reason why the latest technologies that we’re creating with respect to RTX neural rendering are so important. Because we can’t do it brute force. We need the help of artificial intelligence for us to do that.”
With a training dataset of 2D car images, for example, GET3D creates a collection of sedans, trucks, race cars and vans. When trained on animal images, it comes up with creatures such as foxes, rhinos, horses and bears. Given chairs, the model generates assorted swivel chairs, dining chairs and cozy recliners.
“GET3D brings us a step closer to democratizing AI-powered 3D content creation,” said Sanja Fidler, vice president of AI research at Nvidia and a leader of the Toronto-based AI lab that created the tool. “Its ability to instantly generate textured 3D shapes could be a game-changer for developers, helping them rapidly populate virtual worlds with varied and interesting objects.”
GET3D is one of more than 20 Nvidia-authored papers and workshops accepted to the NeurIPS AI conference, taking place in New Orleans and virtually, Nov. 26-Dec. 4.
Nvidia said that, though quicker than manual methods, prior 3D generative AI models were limited in the level of detail they could produce. Even recent inverse rendering methods can only generate 3D objects based on 2D images taken from various angles, requiring developers to build one 3D shape at a time.
GET3D can instead churn out some 20 shapes a second when running inference on a single Nvidia graphics processing unit (GPU), working like a generative adversarial network for 2D images while generating 3D objects. The larger and more diverse the training dataset it learns from, the more varied and detailed the output.
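To put the quoted rate in perspective, here is a back-of-the-envelope Python sketch. It assumes the roughly 20 shapes per second holds steady on one GPU; the function name `seconds_to_generate` and the 10,000-object scene size are hypothetical, chosen only for illustration.

```python
SHAPES_PER_SECOND = 20  # approximate single-GPU rate quoted in the article


def seconds_to_generate(num_shapes, shapes_per_second=SHAPES_PER_SECOND):
    """Return the time in seconds to generate num_shapes at a steady rate."""
    if shapes_per_second <= 0:
        raise ValueError("rate must be positive")
    return num_shapes / shapes_per_second


# Hypothetical scene size: 10,000 distinct objects.
minutes = seconds_to_generate(10_000) / 60
print(f"{minutes:.1f} minutes")
```

Under those assumptions, filling a scene with 10,000 distinct objects would take a little over eight minutes of inference time.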
Nvidia researchers trained GET3D on synthetic data consisting of 2D images of 3D shapes captured from different camera angles. It took the team just two days to train the model on around a million images using Nvidia A100 Tensor Core GPUs.
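The idea of capturing a shape "from different camera angles" can be sketched in a few lines of Python. This is only an illustration, not Nvidia's data pipeline: the function name `camera_positions`, the ring layout, the radius and the elevation are all assumptions.

```python
import math


def camera_positions(num_views, radius=2.0, elevation_deg=20.0):
    """Place cameras evenly on a ring around the origin at one elevation.

    Returns a list of (x, y, z) positions; each camera is assumed to look
    at the origin, where the 3D shape would sit.
    """
    elevation = math.radians(elevation_deg)
    positions = []
    for i in range(num_views):
        azimuth = 2.0 * math.pi * i / num_views
        x = radius * math.cos(elevation) * math.cos(azimuth)
        y = radius * math.sin(elevation)
        z = radius * math.cos(elevation) * math.sin(azimuth)
        positions.append((x, y, z))
    return positions


# Hypothetical setup: eight views around a single object.
views = camera_positions(num_views=8)
for x, y, z in views:
    print(f"camera at ({x:.2f}, {y:.2f}, {z:.2f})")
```

Rendering the same object from each of these positions would yield a set of 2D images of the kind the paragraph above describes.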
GET3D gets its name from its ability to Generate Explicit Textured 3D meshes — meaning that the shapes it creates are in the form of a triangle mesh, like a papier-mâché model, covered with a textured material. This lets users easily import the objects into game engines, 3D modelers and film renderers — and edit them.
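To make the "explicit textured mesh" idea concrete, here is a minimal Python sketch of a triangle mesh with texture coordinates written out as text. It uses the Wavefront OBJ format purely as an assumption for illustration, since the article does not name the format GET3D produces; the function name `mesh_to_obj` and the single-triangle data are hypothetical.

```python
def mesh_to_obj(vertices, uvs, faces):
    """Serialize a textured triangle mesh as Wavefront OBJ text.

    vertices: list of (x, y, z) positions
    uvs:      list of (u, v) texture coordinates, one per vertex
    faces:    list of (i, j, k) zero-based vertex indices per triangle
    """
    if len(vertices) != len(uvs):
        raise ValueError("expected one texture coordinate per vertex")
    lines = []
    for x, y, z in vertices:
        lines.append(f"v {x} {y} {z}")
    for u, v in uvs:
        lines.append(f"vt {u} {v}")
    for face in faces:
        # OBJ indices are one-based; "a/b" pairs a vertex with its UV.
        refs = " ".join(f"{i + 1}/{i + 1}" for i in face)
        lines.append(f"f {refs}")
    return "\n".join(lines) + "\n"


# Hypothetical example data: a single textured triangle.
triangle_obj = mesh_to_obj(
    vertices=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    uvs=[(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
    faces=[(0, 1, 2)],
)
print(triangle_obj)
```

Because the geometry is stored as plain vertices and faces rather than an implicit function, a file like this can be opened and edited directly in common 3D tools.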
Once creators export GET3D-generated shapes to a graphics application, they can apply realistic lighting effects as the object moves or rotates in a scene. By incorporating another AI tool from Nvidia Research, StyleGAN-NADA, developers can use text prompts to add a specific style to an image, such as modifying a rendered car to become a burned car or a taxi, or turning a regular house into a haunted one.
The researchers note that a future version of GET3D could use camera pose estimation techniques to allow developers to train the model on real-world data instead of synthetic datasets. It could also be improved to support universal generation — meaning developers could train GET3D on all kinds of 3D shapes at once, rather than needing to train it on one object category at a time.
So AI will generate worlds, Huang said. Those worlds will be simulations, not just animations. And to run all of this, Huang foresees the need to create a “new type of datacenter around the world.” It’s called a GDN, not a CDN. It’s a graphics delivery network, battle tested through Nvidia’s GeForce Now cloud gaming service. Nvidia has taken that service and used it to create Omniverse Cloud, a suite of tools that can be used to create Omniverse applications, any time and anywhere. The GDN will host cloud games as well as the metaverse tools of Omniverse Cloud.
This type of network could deliver real-time computing that is necessary for the metaverse.
“That is interactivity that is essentially instantaneous,” Huang said.
Are any game developers asking for this? Well, in fact, I know one who is. Brendan Greene, creator of battle royale game PlayerUnknown’s Battlegrounds, asked for this kind of technology this year when he announced Prologue and then revealed Project Artemis, an attempt to create a virtual world the size of the Earth. He said it could only be built with a combination of game design, user-generated content, and AI.
Well, holy shit.