No Simple Answers In Stereo

There was some continued back and forth on Mastodon about stereo conversions. Mac Stories contributor Jonathan Reed asked a couple questions:

What’s your view on the best converted movies? Do you think they hold up just as well vs (non-bad) native 3D movies?

I am not picking on Jonathan, but crediting him with a question that seems very reasonable. It seems logical to ask for an example of what’s working, but that’s much more difficult to do than it sounds. It’s kind of like proving a negative (even though this is a positive?).

If there is a best conversion, you’re unlikely to be aware of it at all, because the audience usually only remembers technical errors, or discomfort. There’s nothing outwardly impressive about a good conversion, or good native stereo, and anything that was held up as a good conversion would be picked apart with intense scrutiny to prove that it’s not actually good.

Add on top of that the point Todd Vaziri and I were trying to make in that thread, and in our feedback to the Accidental Tech Podcast, that there are a variety of methods employed in various shots in various movies. It is not as homogeneous as it appears to be in untrustworthy marketing, or that silly “real vs. fake” site.

There’s no binary bit on the movie that flips if one shot in a native 3D movie is a post-conversion shot, and there’s no rule for what percentage of shots needs to be rendered fully in 3D for an animated feature, or for a blockbuster with 2,000+ effects-heavy shots.

I know that is deeply unsatisfying as an answer, and the follow-up question would be for movies that don’t work well. For professional reasons I wouldn’t ever spell that out.

Ultimately, I know that just saying it’s nuanced and complicated is not very helpful or informative to people who want to understand stereo. For those who want a quick answer on whether a movie is worth watching in 3D on their Apple Vision Pro, there’s nothing so simple as a list.

The best I can do is talk through common problems in stereo. To do that we need to talk about some terms. To do that I’m going to need to bore the fuck out of you.

Native Stereo Photography

This is shot with two cameras. Usually this involves a cumbersome rig where the cameras have to be exactly aligned, have matching apertures, matching focal distances, and are slightly physically offset. To get them close enough together they’re often arranged with one camera pointing straight up, and it gets its light from a beam splitter: a semi-transparent pane of glass that lets light pass through to the main camera, but also reflects it down to the vertical camera. This is an enormous pain in the ass, and it’s very easy to have something be just a little off in a way that won’t be clear until later.

When the stereographer and director finish shooting, they can adjust the convergence by horizontally transforming the photography, which pushes and pulls things in and out of the screen depending on where the left and right eyes converge. However, they can’t adjust the interaxial without throwing away one of the eyes and doing it over with conversion. That means they might be more conservative in all of their choices to reduce the chance that there’s an error.
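If it helps to make that concrete, here’s a tiny sketch of what that convergence adjustment amounts to, in Python with numpy. The function name and the wrap-around shift are my own simplification; a real pipeline would pad or crop and work at sub-pixel precision, but the idea is just sliding the eyes horizontally in opposite directions.

```python
import numpy as np

def adjust_convergence(left, right, shift_px):
    """Slide each eye horizontally in opposite directions by shift_px total.

    This re-centers where zero parallax (the screen plane) falls; it does
    not, and cannot, change the interaxial baked into the photography.
    np.roll wraps pixels around the frame edge, which a real tool would
    never do -- it would pad/crop and filter sub-pixel shifts instead.
    """
    half = int(round(shift_px / 2.0))
    left_shifted = np.roll(left, half, axis=1)
    right_shifted = np.roll(right, -half, axis=1)
    return left_shifted, right_shifted
```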

Things to look for include misalignment, where the left and right eyes have an angular difference between them, or skew slightly. Your eyes are looking for horizontal disparity, so vertical shifts mess it up a little. This is abundant in iPhone 15 Pro Spatial Videos because of Apple’s attempts to compensate for the mismatched lenses.

Another big thing is color shifts from the beam splitter. Sometimes that could manifest as a constant shift, or it could be transient if the camera rig is moving and the light catches differently. It’s possible to color correct the views to get a closer match but uncorrected differences might appear to shimmer when your brain processes the slightly different hues and values.
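To give a rough idea of what pulling the two views closer could look like, here’s a deliberately crude sketch that matches per-channel means and variances. Real color matching is done with proper grading tools, often per-region and per-frame; the function below is only an illustration of the idea.

```python
import numpy as np

def match_eye_color(src, ref):
    """Crudely match one eye's per-channel statistics to the other's.

    src, ref: float image arrays of shape (height, width, 3).
    Shifts and scales each channel of src so its mean and standard
    deviation line up with ref. A blunt instrument compared to real
    grading, but it shows the idea of pulling the views together.
    """
    out = np.empty_like(src)
    for c in range(src.shape[-1]):
        s, r = src[..., c], ref[..., c]
        out[..., c] = (s - s.mean()) / (s.std() + 1e-6) * r.std() + r.mean()
    return out
```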

Specular reflections are another. Think of bright pings of light on glass or chrome, often from a distant but bright light source. One eye might get the ping of light while the other eye doesn’t. A mismatch like that can appear to glow, or shimmer, and could be uncomfortable to look at. To correct for this in native stereo, the ping might be artificially copied and offset to the other eye, or the value of the ping might be knocked down so it doesn’t draw attention.

If you have a visual effects shot where native stereo plate photography is combined with rendered assets, you might see issues that you wouldn’t get if it was a post conversion. Take a bluescreen or greenscreen shot: the work done to extract the photographic element from the screen color may not match exactly between the two eyes. A common issue is flyaway hairs, those thin wisps of hair that are always difficult; they could be in one eye but not the other, or trimmed in an odd way.

Flyaway hairs in non-VFX native stereo shots should always look pretty good, but depending on how deep the background is behind them you might be surprised to notice them more than you would in a 2D movie.

This doubling of work - and the need for it to match - is also what makes something like wire removal paint much more difficult. It’s easy to make each single view’s paint internally consistent and working on its own, but making sure those paint adjustments match between both photographic plates is a pain that you don’t have to deal with in conversion.

It used to be very difficult to get 3D matchmove solves that were rock solid for both eyes, meaning something could appear to float away where native stereo photography and CG met. It’s very rare, but if you’re watching something old, things might seem to drift or breathe.

Another thing 2D VFX artists take for granted is being able to use masks/mattes/rotosplines - and only having to do it once, without thinking about where the matte sits in depth. The matte could be used to grade the background, or it could be to help extract a person. Those rotosplines need to be done for two eyes, and they need to match the plate photography and their companion spline, including motion blur. A soft mask extending back along an angled surface will need to have depth that matches that angle in the other eye. So you end up doing the post conversion kinds of steps on the mattes applied to your native stereo left and right images to make them match and sit in depth, but you’re constrained to the native stereo plates as well.

Native Stereo Renders

Native stereo renders in animated movies, or for shots in a VFX heavy movie that don’t have photography, have their own pros and cons. Even those “all CG” shots are not always fully rendered in stereo for left and right eyes. The flat version of the movie will be done while the stereo version of the movie lags behind a little bit. That means that rendering the offset eye might reveal issues where an old version of a shader was used, or an asset changed since the original shot completed. It can be much more of a puzzle.

People also can do anything they want with their cameras because they are no longer constrained by physics. That means you either get mind-bending stuff, like objects sticking out of the screen that would really be a considerable distance away from the audience, requiring enormous interaxial camera offsets, or sometimes they’ll just make it really flat, even though they have the ability to do whatever they want.

You also still have some of the same issues presented by bright specular pings being in one eye but not the other, plus they might sizzle, because bright, distant light sources need more raytracing samples (a tiny thing far away gets more missed rays than hits).

You might be like, so what? Just turn up the samples, right? That’s easier said than done in some cases, especially if the 2D version of the movie is done already, or the rendering engine just can’t resolve some very bright, distant point of light with enough samples that won’t take 3 months to render. The sample noise will sizzle differently between the two eyes and appear to glow. There are ways to cut out pixels from the other eye, or median filter it, or what have you, but if it’s uncorrected you’ll see sizzling pixels.

Native stereo renders do have one fun trick and that’s the depth map (Z) channel that is normally used for depth of field focus effects. It is an image where every pixel corresponds to how far away something is from the camera. It can be used to create an exact offset based on the stereo camera pair. This makes it kind of like post conversion, where fake depth is used to offset 2D data from one camera view to the other. That means you can offset things like rotosplines, or other 2D elements, to match the depth of your 3D exactly. I do mean exactly, since it will be at exactly the depth from the depth channel. Effectively it’s like using a projector from the location of your left eye camera, and then viewing it from the location of your right eye camera.
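Here’s a minimal sketch of that idea, assuming a parallel (shifted-sensor) stereo pair. The depth channel becomes a per-pixel horizontal disparity, which then forward-warps a 2D element into the other eye. The names, the sign convention, and the nearest-neighbour warp are all my simplifications; production tools splat with proper filtering and fill the holes that open up behind occlusion edges.

```python
import numpy as np

def disparity_from_depth(depth, interaxial, focal_px, convergence):
    """Per-pixel horizontal offset, in pixels, between the two eyes.

    depth:       the Z channel, distance from camera per pixel
    interaxial:  separation between the stereo cameras (same units as depth)
    focal_px:    focal length expressed in pixels
    convergence: distance at which disparity is zero (the screen plane)

    Points at the convergence distance get zero disparity; farther
    points land behind the screen, nearer points in front of it.
    """
    return focal_px * interaxial * (1.0 / convergence - 1.0 / depth)

def warp_to_other_eye(element, depth, interaxial, focal_px, convergence):
    """Nearest-neighbour forward warp of a 2D element into the offset eye."""
    h, w = depth.shape
    d = np.round(
        disparity_from_depth(depth, interaxial, focal_px, convergence)
    ).astype(int)
    out = np.zeros_like(element)
    xs = np.arange(w)
    for y in range(h):
        tx = np.clip(xs + d[y], 0, w - 1)
        out[y, tx] = element[y, xs]  # holes behind edges are left empty here
    return out
```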

This also means that parts of the render from the left or the right eye can be offset by the depth data to patch or supplement renders from the other eye. Think of it like sneaking in a little conversion. This can save render time, and help with various problems matching the eyes.

It can be as specific as using a render for parts of a character (eyes, fur, screen-right edges), or parts of lighting components (just the specular, just the reflection, refraction, or just the diffuse).

To a purist, it might sound like anathema to mix and match, because a purist would assume that the highest quality comes from matching renders. Really, most people would fail a Pepsi challenge on fully rendered shots vs. hybridized shots. The philosophical concerns don’t matter as much as the final set of images being coherent.

For this reason it is absolute bunk to call all animated movies “real” 3D, or to claim from your seat in the audience that you can tell what’s rendered from scratch and what’s not.

Post Conversion

Conversions are popular because they require less time on set, use more flexible camera setups, and cause fewer problems for the crew that’s mainly concerned with the 2D version of the movie. That also means they can adjust the depth of everything ad infinitum. That can mean a more creative and thoughtful use of stereo, because they can evaluate the results and change them in a way they can’t do easily on set, where they are more likely to be conservative, or stuck with what they shot.

Conversions are also associated with people looking to make a quick buck on ticket sales, and reducing labor costs on the conversion to get as much profit as possible.

That means it’s likely you’ll see the places where conversions fall apart because of time and budget constraints, which was very common in earlier post conversions when studio execs felt like they needed to rush. You might recall movies where only part of the film was in stereo, and they wanted you to put your glasses on or take them off partway through in the theater.

There was also the quality issue from the assumption that people were going to watch these in theaters where they couldn’t hit pause. The home video part never panned out - but maybe it will with products like the Apple Vision Pro.

Major issues stem from the approach a conversion house takes when presented with 2D footage. Most places will create a 3D space and camera in the computer in order to “accurately” produce the offset eye. I’ve heard of some places where people just cut out and move stuff around to wherever it feels right for them, or use image based algorithms to create a fake depth map to drive the stereo offset, but the map might have holes and errors where the algorithm guessed wrong. I’ve never worked at a place that did these things, so I don’t have insight to share about their thought process; let’s move on to the placed-in-3D stuff.

To do that, matchmove needs to be done where the camera is solved for, and elements in the shot get rough geometry. Since this often needs to happen for the 2D VFX portion of a movie, this is considered some synergistic cost savings. The plate can be projected onto that rough geometry in a 3D software package, and then an offset camera can be dialed for the interaxial and convergence values that feel right for the shot, and in the context of the sequence. That projection onto the hard geo is really just to dial things; the geo won’t be used raw, it’ll be cut by rotos, blurred in the depth channel, etc. to make something that’s softer than the hard facets of rough geo.

The photography does get rotoed; however, only one eye needs to be rotoed, not a complicated matching pair. The photography also needs some degree of paint work done to it to clean up the area that was occluded by the foreground. This can be as simple as painting out a sliver, or halo, around where a character occludes the background, or it can be a more extensive affair.

That means the same paint needs to be used in both eyes to account for any minor variance between the paint and the original plate. While the audience could never tell that the background was painted (really, I absolutely promise you can’t tell because shit is painted all the time in regular-old-vanilla shots and people truly don’t have an inkling) the audience can tell if there’s only paint in one eye’s view for the same reason as it would be a problem to have mismatched paint in native stereo.

Paint removal includes things like flyaway hairs, which will be painted out and then rotoed, or luminance keyed, to bring them back. They will match exactly between the two eyes, unlike the mismatched keys of native stereo, but they will need to be placed in depth.

If you want to tell anything about the quality of a conversion, look for those flyaway hairs. They should be there, and they should also be at a sensible depth relative to the rest of the hair, not way behind or in front of the actor.

The actor should have internal depth, which is usually derived from the rough matchmove geometry. They should have a nose past their eyes, and their ears and neck should be back. They should never feel like a cardboard cut-out unless they are far away from the camera, like background actors, or a really wide shot of them in an environment.

Speaking of environments, the two biggest problems there are highly reflective and refractive surfaces. If there’s a shop window, with reflections, and the name of the shop painted on the glass, the reflections should not be at the depth of what we see through the window, or they will look like they’re at the depth of the walls and surfaces inside the shop. They need to be at the depth of whatever they’re reflecting. That means the reflections must be painted out, along with the lettering on the glass. The lettering needs to then be placed over the shop interior at the depth of where the glass plane is on the facade, and then the reflections need to be added (reflections are additive, but that is a rant for another day). Then that reconstructed window is used for both the left and right eye. No, you won’t be able to tell that work was done, because you, in the audience, don’t have the 2D version of the movie to look at and compare it to, and the work will all be so internally consistent that it wouldn’t register for you to check without the knowledge that this kind of work needed to happen. In the abstract this knowledge might cause philosophical conflict — unclean! Impure! But I assure you the director isn’t anywhere near as precious about this as you might think.

If a conversion house omits that level of work, and just lets the shop window be flat to the depth of the building facade, or lets everything in the window go deeper, including the reflections and lettering, then it’s going to look wrong to any casual viewer.

This can also be applied to things like shiny cars, or reflective bodies of water.

As for refraction, that will be most obvious with things like thick, curved glass, and glassware filled with liquids. Bottles, wine glasses, thick reading glasses, etc. The edges of the glass, where the index of refraction creates that defined shape that’s almost solid, should be at the depth of the glass in 3D space. The interior core of the glass, where you see the objects behind it through bent light, should be closer to the depth of the object behind it (accounting for any magnification). Then there should be an artful blend from that edge depth to that core depth in whatever fake depth channel is being used. Anything like reflections should be painted out and added on top, like the shop window.
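If you want a picture of how that blend might be expressed, here’s a minimal sketch, assuming you already have a hand-drawn mask that is 1.0 in the see-through core and falls off to 0.0 at the thick refractive edge. The mask and the names are hypothetical; the point is only that the fake depth eases from the glass surface to the background.

```python
import numpy as np

def glass_depth(surface_depth, background_depth, core_mask):
    """Blend a fake depth channel for a refractive object.

    surface_depth:    depth of the glass itself
    background_depth: depth of whatever is seen through it
    core_mask:        1.0 in the see-through center, easing to 0.0 at
                      the thick refractive edge (hand-drawn in practice)

    The edge stays at the glass; the core eases back toward the object
    behind it, giving that artful blend instead of a painted-on look.
    """
    return core_mask * background_depth + (1.0 - core_mask) * surface_depth
```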

What you do not want is for the glass object to feel like everything inside of it is at the depth of the glass surface. It will look painted on, not like you’re seeing through the glass.

This also goes for lens flares, which are reflections and refractions from light hitting the lens elements at certain angles before reaching the filmback. The lens flare needs to be painted out and reconstructed exactly, then the source of the flare needs to be offset to match the location of the light source, and then the little bits of the lens flare need to be offset based on where the light source moved to in relation to the center of frame, which would be the center of the lens. Oftentimes a lens flare plugin in a compositing package will be used to help replicate the original flare, or at least used as a guide for placement.
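To give a sense of that geometry, here’s a toy sketch: the ghost elements of a flare sit along the axis running from the light source through the optical center, so once the light is offset for the other eye, recomputing their positions slides every ghost along that axis with it. The spacing fractions below are made up purely for illustration.

```python
def flare_ghost_positions(light_xy, center_xy, spacings=(-0.4, 0.3, 0.7, 1.3)):
    """Positions of lens-flare ghosts along the light-to-center axis.

    light_xy:  (x, y) of the (possibly offset) light source in the frame
    center_xy: (x, y) of the optical center of the lens / frame
    spacings:  illustrative fractions along the axis; negative values
               land on the far side of the center from the light

    Returns one (x, y) per ghost. Recomputing this with the offset
    light position is what moves the ghosts for the other eye.
    """
    lx, ly = light_xy
    cx, cy = center_xy
    return [(cx + t * (lx - cx), cy + t * (ly - cy)) for t in spacings]
```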

This leaves other camera based effects like grain, and heavy vignetting. The entire plate needs to be degrained as step 0 in this process, and then regrained, taking into account any extra reconstruction work, and also offsetting the grain timing for the offset eye. You should never have a stereo offset in your grain (meaning the same pattern reproduced and moved in X) because that puts the grain in depth. If you leave the original grain on your left and right eyes, and do your offsets, then your grain will be painted on to the depth of all the surfaces you reconstructed. That’s extremely bad, and extremely obvious when it happens.

Grain should be offset in time (effectively a randomized noise seed) so there is never a matching pattern your brain will try to place in depth. The result is a fuzz that exists around screen depth. Your eye doesn’t identify it as really having any depth unless the grain is heavy and can almost take on the quality of an atmosphere that flattens things, in which case the decision may be made to reduce grain for both left and right eyes.
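A minimal sketch of the right approach, assuming the grain is being generated procedurally (the names and the noise model are mine, not any particular regrain tool): seed the noise per frame and per eye so the two patterns never match.

```python
import numpy as np

def regrain_eye(image, frame, eye, amount=0.03):
    """Add synthetic grain whose pattern differs per frame AND per eye.

    image:  float array for one eye of one frame
    frame:  frame number
    eye:    "left" or "right"
    amount: grain strength (standard deviation of the noise)

    Because the left and right patterns never match, the brain can't
    fuse the grain into a surface sitting at some depth; it just reads
    as fuzz near the screen plane.
    """
    seed = frame * 2 + (0 if eye == "left" else 1)
    rng = np.random.default_rng(seed)
    return image + rng.normal(0.0, amount, size=image.shape)
```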

Usually people can get away without treating vignetting, unless it’s heavy — the real artsy stuff. Then the conversion house needs to remove the original vignetting and add it at screen depth (no offset) with everything else in stereo being placed behind it. You don’t want something popping out through vignetting — that doesn’t make any sense.

The really good news is that because both eyes use the same greenscreen or bluescreen extraction, the edges don’t get screwy, and any combination with CG can be made to match exactly, because everything can be placed together in the same shared space. When done well, it helps the director shoot how they’re comfortable shooting, and get the results they want for both 2D and 3D.

Hybrid

Really, it doesn’t do any good to make sweeping statements about quality based on method, especially after taking into account that films will blend various methods in ways that are often invisible to you.

Ideological purity really doesn’t exist in either the realm of home video or stereo video, so try not to get too wound up about it. Always try to watch the best version of something that you can, and that suits your current situation, but don’t get yourself upset about something in the abstract.

If you really want to understand the quality of the 3D work inspect those common problem spots I mentioned. Pause your movie and open and close your left and right eyes. Look at the refraction, the reflections, the flyaway hairs.

Separately, judge whether it was worth seeing it in 3D at all. Did that add to your experience for this particular film? Was anything about it essential, or memorable? People talk about the 3D of the Avatar movies because James Cameron made it a part of the experience, not just because the native stereo checkbox was ticked.

No one is under any obligation to like 3D movies whatsoever, but it’s important that we don’t justify or define that dislike based on a simple binary that isn’t true.

2024-02-19 17:50:00

Category: text