Making Game-Ready Faces Using Photogrammetry & Unreal Engine

Members of the Myrkur Games team have told us about their photogrammetry workflow, explained how their rig works, and shown how to use Unreal Engine, ZBrush, Maya, and Substance to turn the photos into characters.

Introduction

Myrkur Games is an Icelandic video game studio, working on the development of a new third-person action-adventure game called Echoes of the End for both PC and consoles. The game will be published by Prime Matter (Koch Media/Embracer Group) and is an original IP created by the studio, as well as the first game the studio has developed. We are also currently expanding our team, so if you’re interested in a position with us, check out our open positions here.

Echoes takes place in a new fantasy/sci-fi setting and has a realistic style, utilizing most of what Unreal Engine 5 and the new generation of consoles have to offer. For many of the characters in Echoes, we want to capture the likeness of our actors to a very high level of detail. Due to the size of the cast and the production requirements of the game, we quickly realized this meant we were going to use photogrammetry.

To avoid disclosing any details about the game, we can’t actually show you the characters we’ve worked on for Echoes. So we grabbed one of our own level designers, Eirikur, put him through our pipeline, and documented the steps along the way. For context, we spent roughly a single day of work in total making the scan from start to finish.

Scanning Humans

Photo scanning still objects can usually be achieved with a single camera, where hundreds of photos are taken over some period of time. The problem with scanning people this way is that, unlike inanimate objects, they will naturally move around, blink and change their expression ever so slightly. These subtle differences lower the quality of our data used to construct the 3D model, yielding inconsistent results. We need to capture a vast range of facial expressions, and during capture, it becomes nearly impossible for the actors to maintain that expression perfectly for anything over a few seconds. So for more reliable results, we opted to build our own in-house photogrammetry rig.

Let’s break this down and talk about the choices made along the way in building our rig and pipeline.

Rig Setup

We initially experimented with 18 cameras mounted on tripods in various configurations to see if that would work and provide enough overlap for the software. We quickly found that 18 cameras did not produce strong enough overlap, so we opted to arrange them in a cone and rotate our subjects on a chair/electric turntable, taking a photo every 45° of rotation. This resulted in 8 photos from each camera, or 144 photos in total.

Photogrammetry Rig with 18 Cameras

We achieved some pretty remarkable scans of both faces and bodies with 18 cameras this way, but we had a problem with consistency in the 3D reconstruction due to subtle changes in our subject's facial expression and posture, as well as inconsistent lighting caused by the rotation of the subject. Since we are scanning multiple facial expressions (20-40) for each actor, we needed to improve our consistency.

We went back to the drawing board and came up with a layout covering 360°, capturing only one photo per camera – removing the risk of inconsistency entirely. Based on lots of trial and error with 18 cameras, we created a 3D layout plan for a 36-camera configuration, which we ended up going with. This time we opted for a fixed frame instead of tripods, as we knew it would be easier to work with and provide more consistent results overall.

3D Blockout of Our 36 Camera Rig

In total, we operate 36 cameras. We chose to go for a mix of Canon 750Ds and 2000Ds (about a 50/50 split) since they offer high-resolution images in a compact APS-C body while remaining a very cost-effective option.

We planned our rig very accurately in 3D using Maya to understand the placement of all the cameras and gear and to maximize camera overlap. The frame we built was a very lightweight aluminum system, which we got at a specialty hardware store and assembled on location.
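To give an idea of how a blockout like this can be scripted, here is a rough sketch using Maya's Python API (maya.cmds). It is only an illustration: the ring counts, radii, and heights below are placeholder values, not our actual rig measurements.

```python
# Illustrative blockout of a multi-ring camera layout in Maya (placeholder values).
import math
import maya.cmds as cmds

# (number of cameras, ring radius in meters, ring height in meters) - placeholders
rings = [(12, 1.2, 1.0), (12, 1.2, 1.5), (12, 1.2, 2.0)]

# Locator at roughly head height that every camera will aim at
target = cmds.spaceLocator(name="headTarget")[0]
cmds.move(0, 1.6, 0, target)

for ring_index, (count, radius, height) in enumerate(rings):
    for i in range(count):
        angle = 2.0 * math.pi * i / count
        x, z = radius * math.cos(angle), radius * math.sin(angle)
        cam = cmds.camera(focalLength=50)[0]
        cam = cmds.rename(cam, "scanCam_r{}_c{}".format(ring_index, i))
        cmds.move(x, height, z, cam)
        # Aim the camera at the head target, then bake the rotation by deleting the constraint
        cmds.delete(cmds.aimConstraint(target, cam, aimVector=(0, 0, -1), upVector=(0, 1, 0)))
```

Laying the cameras out this way makes it easy to look through each one in the viewport and check overlap before committing to the physical build.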

Our Updated 36 Camera Rig

Lighting setup

To capture the likeness of an actor, the texture capture is also extremely important, so you want to have somewhat neutral and even lighting. For our initial experiments, we worked with flash photography lights. However, when upgrading to our 36-camera layout, we chose to go with constant LED lights instead. While the LEDs generally produce less light, they are much easier to work with and occupy less space. With limited space in the room, this also allowed us to place more lights evenly around the subject to eliminate harsh shadows and highlights.

We’re shooting this in a relatively small room, so we painted the room white and used white sheets around the rig in order to maintain color neutrality and maximize bounced light. An important last step for us was to use polarizing filters, which we use on our camera lenses and lights, in order to remove specular highlights from our subject.  

Operating the Cameras and Retrieving the Photos

To trigger all the cameras at the same time, we use six Esper trigger boxes. These are devices specifically made to trigger many cameras at once with very high accuracy. In our experiments, triggering all the cameras directly via USB instead of through the Esper boxes introduced a pretty significant delay between cameras, during which the actor might move, blink, or slouch a bit – any of which can have a negative impact on the results. We also learned that it is next to impossible to use flash photography without something like an Esper box to control the flash trigger alongside the camera shutter.

Esper Control Boxes

To change any settings on the cameras, as well as transfer photos to our computer in bulk, we use a program called digiCamControl. To operate the cameras via digiCamControl, all of them are connected to the computer through large USB 3.0 hubs.
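On the retrieval side, a small helper script can keep the bulk transfers organized. The sketch below is hypothetical (it is not part of digiCamControl, nor our production tooling) and assumes filenames like cam07_smile_0001.cr2; adjust the pattern to whatever naming scheme your capture software actually writes out.

```python
# Hypothetical helper: sort a bulk photo transfer into per-expression folders.
# Assumes filenames like "cam07_smile_0001.cr2".
import re
import shutil
from pathlib import Path

PATTERN = re.compile(r"cam(\d+)_(\w+?)_\d+\.cr2$", re.IGNORECASE)

def sort_capture(incoming: Path, destination: Path) -> None:
    for photo in incoming.glob("*.cr2"):
        match = PATTERN.match(photo.name)
        if not match:
            continue  # skip anything that doesn't follow the naming scheme
        _, expression = match.groups()
        target_dir = destination / expression
        target_dir.mkdir(parents=True, exist_ok=True)
        shutil.move(str(photo), str(target_dir / photo.name))

# Example: sort_capture(Path("D:/capture/incoming"), Path("D:/capture/eirikur"))
```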

For camera settings, it is most important to keep the ISO as low as possible, at 100-400, to avoid unnecessary grain in the images. Avoid a shallow depth of field: try to keep a minimum of f/8.0 and a 1/60 shutter speed. Aim for a consistent and neutral focal length (around 50mm, though this may depend on your camera sensor and lens). And lastly, it is very important to color calibrate the cameras and ensure that you are capturing consistent and neutral colors for texturing purposes. Shoot everything in RAW format (or at very high-quality settings).
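As a sanity check before processing, it can help to validate those settings across all the captured frames. The sketch below is only an illustration, not studio tooling: the target values mirror the guidelines above, and each image's metadata is assumed to have already been read into a plain dict (for example with an EXIF tool of your choice).

```python
# Illustrative sanity check for capture settings; `metadata` is a per-image dict,
# e.g. {"iso": 200, "aperture": 8.0, "shutter": 1/80, "focal_length": 50.0},
# filled from EXIF data by whatever tool you prefer.
TARGETS = {
    "iso_max": 400,           # keep ISO in the 100-400 range to avoid grain
    "aperture_min": 8.0,      # f/8.0 or narrower to avoid a shallow depth of field
    "shutter_max": 1.0 / 60,  # exposures longer than 1/60s risk motion blur
    "focal_length": 50.0,     # keep the focal length consistent across the rig
}

def check_image(metadata: dict) -> list:
    """Return a list of warnings for one image's capture settings."""
    warnings = []
    if metadata["iso"] > TARGETS["iso_max"]:
        warnings.append(f"ISO {metadata['iso']} is above {TARGETS['iso_max']}")
    if metadata["aperture"] < TARGETS["aperture_min"]:
        warnings.append(f"aperture f/{metadata['aperture']} is wider than f/{TARGETS['aperture_min']}")
    if metadata["shutter"] > TARGETS["shutter_max"]:
        warnings.append("shutter speed is slower than 1/60")
    if abs(metadata["focal_length"] - TARGETS["focal_length"]) > 1.0:
        warnings.append("focal length differs from the rest of the rig")
    return warnings
```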

Processing the Scan

Once we have retrieved the photos, we bring them into Lightroom. Here, we flatten the lighting even more by bringing highlights down and shadows up. From our experience, this generally helps the software analyze the photos more accurately, as less information is lost in under- or over-exposed areas. It’s fine to go a bit wild to help the software achieve the best 3D results, as we can always replace the texture later with our preferred settings once we have the model ready.
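The same kind of flattening could also be scripted outside Lightroom if needed. The snippet below is only an illustrative numpy sketch of the idea (lift the shadows, roll off the highlights), not a replacement for the Lightroom pass we actually use, and the parameter values are arbitrary examples.

```python
# Illustrative tone flattening: lift shadows and compress highlights on a
# float image with values in [0, 1].
import numpy as np

def flatten_exposure(img, shadow_lift=0.15, highlight_rolloff=0.2):
    img = np.clip(img, 0.0, 1.0)
    lifted = img + shadow_lift * (1.0 - img) ** 4          # raise dark values, leave bright ones mostly alone
    compressed = lifted - highlight_rolloff * lifted ** 4  # pull the brightest values back down
    return np.clip(compressed, 0.0, 1.0)
```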

We then bring the edited photos into Agisoft Metashape (any photogrammetry software will do). Through its magical algorithm, Metashape aligns all the cameras in 3D space and constructs a point cloud from the data.

3D Data Constructed from Photos in Agisoft Metashape
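These reconstruction steps can also be run headless through Metashape's Python API, which is handy once you are processing 20-40 expressions per actor. The following is only a minimal sketch: method names follow recent Metashape versions and may differ slightly in older releases, and the paths are placeholders.

```python
# Minimal Metashape reconstruction sketch (paths are placeholders).
import glob
import Metashape

doc = Metashape.Document()
chunk = doc.addChunk()

# Add the exported photos from one expression capture (all 36 cameras)
chunk.addPhotos(glob.glob("captures/eirikur_neutral/*.jpg"))

chunk.matchPhotos()      # detect and match features across the photos
chunk.alignCameras()     # solve camera positions and build the sparse point cloud
chunk.buildDepthMaps()
chunk.buildModel()       # mesh reconstructed from the depth maps
chunk.buildUV()
chunk.buildTexture()     # scan texture; the final maps are redone later anyway

doc.save("captures/eirikur_neutral.psx")
```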

Hair is something that really messes up this process so we have our subjects wear this stylish hair cap. We also make sure that the hair cap does not cover the hairline completely so we know where to place the hair. Usually, we also ask subjects to shave any beards before coming in for a scan – but a small stubble is easy to clean up in ZBrush later if needed.


So these are the results we get from the neutral scan – a really nice base to build upon. We are not looking to capture pore details, as we add those in later ourselves. It’s worth noting that we actually forgot to adjust one of the lights for this scan. As a result, there is a bit of shadow being cast off the nose, but something like that is easy enough to clean up later.

As mentioned before, we also capture different expressions to use as blend shapes for our riggers and animators to work with when animating our digital double. This means that when our characters frown or smile, it is the actor's own expression that you see in the game, and the nuances of their performance are not wasted.


Retopology and Cleanup

Once we have a 3D scan, the next step is to wrap our base mesh head topology around it using a program called Wrap3D. All of our heads share the same topology and UVs to streamline things as much as possible; it also means you can swap textures between any characters, which really speeds up the process of making background characters.

Next, we bring this wrapped topo into ZBrush, sculpt away the things we don’t want, like the beard, and put in any details like skin pores or scars. Here is where we add 3D eyes and other details.


We already have displacement maps for pores that fit our UVs, so barely any time goes into making those – it’s mostly just importing the maps. On top of that, we sculpt any wrinkles that are unique to that person, sometimes by importing the Color Map and tracing information from it. After cleaning the neutral expression, we repeat the same process for the blend shapes.

Blendshape transition within ZBrush:

Blendshapes can be added as additional layers to the neutral face in ZBrush. We then bake Normal Maps from the blendshapes and composite them into a wrinkle map, which is overlaid on the base normal in-engine depending on which expression is being triggered.
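To illustrate the idea behind that overlay, here is a small numpy sketch of blending a wrinkle normal over the base normal by an expression weight. In the game this happens in the skin material in Unreal, driven by the facial rig; the sketch below just shows the math offline for clarity.

```python
# Illustrative blend of a wrinkle normal map over a base normal map.
# base, wrinkle: HxWx3 tangent-space normals remapped to [-1, 1];
# weight: a scalar expression weight or an HxW mask in [0, 1].
import numpy as np

def blend_wrinkle_normal(base, wrinkle, weight):
    weight = np.asarray(weight, dtype=np.float32)
    if weight.ndim == 2:
        weight = weight[..., None]          # broadcast a per-pixel mask over the XYZ channels
    blended = base * (1.0 - weight) + wrinkle * weight
    # renormalize so the result is still a valid unit-length normal
    return blended / np.linalg.norm(blended, axis=-1, keepdims=True)
```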

The Textures

At this point, we usually clean up the color texture from the scan. This includes painting out any clothing the actor may have had on and getting rid of any lighting information that may remain. All the texture work is done in Substance 3D Painter.

To get rid of any shadows, we make a layer with a light effect, put a black mask on it, and then paint in the mask where needed to lighten up the darker areas. Normally, we would also paint out the hair cap with a texture similar to the character's hair.

The only other textures we make for singular heads are a Normal Map and a Cavity Map that we bake from the high poly. The rest is mostly generated in the skin shader inside Unreal Engine or uses reusable maps that all characters share. 

Results

So what we end up with at that stage is Eirikur’s head, cleaned up and imported into Unreal Engine with some very basic materials.

Scanned Model Inside Unreal Engine

For main characters, we spend more time than that getting things just right: finessing small details, making sure the character feels right, and possibly exaggerating some features if we feel it’s necessary. For a secondary character, however, something like this would be just fine.

Texture Variation Using the Same Mesh

And that's everything! If you’d like to see more of this process, we recently posted a vlog about this scan on our YouTube channel.

Also, the Myrkur Games team is expanding and we are looking for multiple new hires to join our team in the development of Echoes. Check out our page here to see all the openings. 

Katrin Inga Gylfadóttir and Jóhannes Ágúst Magnússon, Art Director and Character Artist at Myrkur Games
