Chasing that 13th bit

Some of our clients are shooting with the Arri Alexa 35 and want to preserve as much HDR data as possible, so they’re shooting ARRIRAW. ARRIRAW is unprocessed, and we can’t perform VFX directly on RAW footage. However, the new Alexa 35 camera boasts a greater dynamic range than previous Arri models.

The Arri Alexa 35 records in a 13-bit log ARRIRAW format. To utilize this for VFX, we need to process it into a usable format.

We’re accustomed to working with 12-bit log footage, which we can store in ProRes 4444. However, ProRes 4444 has a maximum 12-bit storage capacity. An alternative storage format is EXR (PIZ), where we’d convert the ARRIRAW into Alexa Wide Gamut Scene Linear (AWG/Scene Linear).

What’s confusing me is the terminology Arri uses:

ALEXA 35 images are processed in 18-bit linear space and recorded as 13-bit log ARRIRAW.

Arri claims that converting ARRIRAW to LogC doesn’t result in information loss. But how do we efficiently store 13-bit log data when ProRes 4444 is limited to 12-bit?

What exactly does “18-bit linear space” mean? If we convert to AWG/Scene Linear and store it in a 16-bit float EXR, are we losing information?

I understand that pursuing this 13th bit might seem excessive, but I’d greatly appreciate any insights to better understand this workflow.

@PlaceYourBetts - In the short term can you use the LogC4 to ACES workflow and store your transcodes in 32-bit?

2 Likes

Yeah, I did theorize 32-bit, but obviously that seems overkill and somewhat prohibitive for commercial workflows (lean and mean)

If you transcode to the full precision, widest gamut available to you, then you maintain a clean pipeline with the maximum flexibility.

Do your clients want to adjust exposure before, during, or after vfx has been completed?

Are they making you juggle chainsaws while they vacillate?

Or are they in the throes of a tormented, creative struggle as they determine which of their fiercely single-minded visions to manifest?

2 Likes

Yes, you need 18 bit code values in linear gamma for those files. In log this can be compressed to 13 bits.

Most other cameras we deal with process 15 or 16 bit in linear.

The data always comes off the sensor in linear and is then mapped to log in order to use fewer bits, because all of this has to happen in real time in hardware, which is a lot less forgiving than what Flame has to do processing-wise.

But I agree with you, that seems a bit excessive or purist on their part. Double the budget for the extra precision and let them decide if it’s still worth it. And ask them if they can identify the missing bit on your monitor.

3 Likes

This seems like hogwash meant to impress the Netflix technical standards people. I know it’s not the right answer but I’d just transcode it to good old LogC and call it a day. There is no universe where you can A-B the 12 to 13 bit difference and have anyone see it.

It’s not adding more data at the “ends” of the image, just more steps in the middle. With a modicum of noise in an image you’re not gonna “break” it even down at 10 bit.

I know you’re trying to act in good faith, but when clients see data numbers they throw away rational thought. People stop looking at the image and start living in the world of difference mattes, grain matching, and crunched histograms.

8 Likes

Assuming I’m doing the math correctly here, this is how the bits fall:

If the sensor provides 18bit linear data, that’s 262,144 possible code values for each channel.

Compressing this into 13bit log, you use the gamma curve to throw away values in areas where their impact on perceived image quality is less important. Typically that means maintaining detail in the shadows, and then sacrificing detail in mid tones and extreme highlight nit values. The difference between 1400nit and 1800nit is less critical than between 20 and 30nit.

13bit log is 8,192 individual values.

Now that is one bit more than fits into ProRes 444, so you’re making it one level more coarse by dropping the last bit, and getting down to 4,096 possible values.
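Putting that arithmetic in one place (a trivial sketch, plain Python):

print(2**18)           # 262144 linear code values off the 18-bit ADC
print(2**13)           #   8192 code values in 13-bit log
print(2**12)           #   4096 code values in 12-bit ProRes 444
print(2**18 // 2**13)  # on average 32 linear values share one 13-bit log code
print(2**18 // 2**12)  # ...and 64 share one 12-bit code - each step twice as coarse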

Now to visualize - when was the last time a clear visual difference smacked you in the head between a ProRes HQ (10-bit) and a ProRes 444 (12-bit) file? Not totally fair, since one does 4:2:2 chroma subsampling and the other 4:4:4, but if you didn’t notice even with two degradations stacked against it (chroma subsampling and 2 bits less channel precision), well, you know they’re reaching for the last crumbs.

OK, not totally fair. That only applies to delivery codecs. While you’re still in the pipeline, this extra precision can make a difference for keyers and other algorithms that have better ‘eyes’ than we do. And stacking operations benefits from the extra margin to avoid image degradation.

Back to the numbers - if you use fp16 instead, you get 1,024 mantissa steps between each doubling, and a 5-bit exponent that shifts the binary point across 32 positions. That means you can cover the full number range of those 262,144 original values, but with massive voids in between.

At fp32, you have 8,388,608 mantissa steps per doubling (enough that every integer up to 16,777,216 is exact) and 256 exponent positions. Your original data fits into this nicely, in linear or in log, without loss. But everything in your pipeline needs to be able to handle that, and with Flame I don’t think that’s the case. There are things that don’t run in fp32.
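If you want to sanity-check those mantissa and exponent sizes, numpy will report them (a quick sketch, nothing camera-specific):

import numpy as np

for t in (np.float16, np.float32):
    info = np.finfo(t)
    # nmant = stored mantissa bits (the implicit leading bit comes on top),
    # iexp = exponent bits, max = largest finite value
    print(t.__name__, info.nmant, info.iexp, info.max)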

Having said all this, I think you’re safe to convert to LogC in ProRes 444 and keep working as usual. You’re throwing away 1 bit that was nice to have. They’ll still have the RAW files in case things change and in 10 years they want to re-process everything.

Also, all that extra data is helpful if you have a DP who isn’t the best at lighting, or something else was rushed and you have to push the footage around to make it work. If they can afford one or more Alexa 35s, presumably the footage is well captured and doesn’t have to go through a wringer that lives or dies by that 13th bit to make it look nice.

And tongue in cheek - there are many buildings that don’t have a 13th floor, and there never was a Resolve 13. They went from 12 to 14. That 13th bit may be cursed, and you don’t want it anyway :wink:

And in the end, most content is watched on 8-bit screens, with just 256 values per channel remaining at the end of the pipeline.

4 Likes

Just to make sure we are all on the same page -

The AD converter in the a35 is 18bit linear INTEGER being saved as 13b integer log.

This is not to be compared with 16b floating point, we are comparing bananas and apples here.

You can easily fit everything the a35 captures into 16b floating point without breaking a sweat.

Basically in float you have the same precision for each stop / doubling of light, so going from 1.5 to 3.0 in float takes the same "bits" as going from 0.1 to 0.2, so it’s basically log internally.

In integer this isn’t a thing, you are wasting half the possible code values on the last captured stop of light.
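A quick way to see the "float is basically log" point (a sketch, assuming numpy): enumerate every possible fp16 bit pattern and count how many values land inside one doubling of light at different brightness levels.

import numpy as np

# every possible float16 value, by reinterpreting all 65536 bit patterns
all_fp16 = np.arange(2**16, dtype=np.uint16).view(np.float16)
finite = all_fp16[np.isfinite(all_fp16)]

def values_per_stop(lo, hi):
    """How many distinct fp16 values fall in the stop [lo, hi)?"""
    return int(np.count_nonzero((finite >= lo) & (finite < hi)))

print(values_per_stop(0.125, 0.25))     # 1024
print(values_per_stop(1.0, 2.0))        # 1024
print(values_per_stop(1024.0, 2048.0))  # 1024 - same precision for every doubling

1024 values per stop no matter how bright, which is exactly the log-like behaviour.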

Let’s say you capture 10bit integer linear and you have 16 stops of dynamic range:

stop1: 512-1023
stop2: 256-511
stop3: 128-255
stop4: 64-127
stop5: 32-63
stop6: 16-31
stop7: 8-15
stop8: 4-7
stop9: 2-3
stop10: 1
stop11: nothing left - we’d need values below 1

Yea, ooops!! We can’t save 16 stops as linear in an integer 10bit format - every additional stop needs one more bit, as you have to double the values for each stop of dynamic range.

That’s why we are dealing with an 18bit ADC - it directly relates to the total dynamic range of the sensor and what’s possible to get out of it.
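To make that concrete, a small sketch (plain Python, bit depth as a parameter) that walks down the stops of a linear integer encoding - each stop eats half of the remaining code values:

def stops_in_linear_int(bits):
    hi = 2**bits           # number of code values, e.g. 1024 for 10-bit
    stop = 1
    while hi > 1:
        lo = hi // 2
        print(f"stop {stop}: code values {lo}..{hi - 1} ({hi - lo} of them)")
        hi = lo
        stop += 1
    print(f"-> roughly {stop - 1} usable stops in {bits}-bit linear integer")

stops_in_linear_int(10)  # ~10 stops; 17 stops of scene range needs ~17-18 bits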

All that said, I do have gripes about Flame’s processing of ARRIRAW:

→ Can we prove a clean pipeline of ARRIRAW to 16b float? If you read the color management “trace” on the right side it reads 12b integer everywhere, and I am absolutely not sure it doesn’t throw away data - very mysterious. There is no direct debayer-to-float option either.

→ Take a look at the ARRI Reference Tool. It comes with their newer ADA7SW decoder and has been my go-to for difficult plates like greenscreen shots, as it handles high-contrast edges between colors better.

Remember: a Bayer sensor, like most cameras, only has 1 color per pixel, not 3, so edges can be problematic for color spill as the debayer algo is combining and merging pixels to make them RGB. You can make a bigger difference with better processing - one reason to shoot RAW is that the post processor might be newer than your camera.
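To illustrate why those merged pixels cause spill at edges, here is a toy sketch (assuming numpy; a naive 2x2 block debayer, nothing like ARRI’s actual ADA decoders). A hard edge between a pure red and a pure green object lands inside one 2x2 cell, and the reconstructed pixel gets a color neither object has:

import numpy as np

h, w = 2, 8
# scene: columns 0-4 are a pure red object, columns 5-7 a pure green object
scene_r = np.zeros((h, w)); scene_r[:, :5] = 1.0
scene_g = np.zeros((h, w)); scene_g[:, 5:] = 1.0

# RGGB mosaic: each photosite records only one channel
mosaic = np.zeros((h, w))
mosaic[0::2, 0::2] = scene_r[0::2, 0::2]   # R sites
mosaic[0::2, 1::2] = scene_g[0::2, 1::2]   # G sites
mosaic[1::2, 0::2] = scene_g[1::2, 0::2]   # G sites
mosaic[1::2, 1::2] = 0.0                   # B sites (no blue in this scene)

# naive debayer: collapse each 2x2 cell into one RGB pixel
r = mosaic[0::2, 0::2]
g = (mosaic[0::2, 1::2] + mosaic[1::2, 0::2]) / 2.0
b = mosaic[1::2, 1::2]
print(np.dstack([r, g, b])[0])
# -> [1,0,0], [1,0,0], [1,0.5,0], [0,1,0]: the cell straddling the edge
#    comes out orange, a color that exists nowhere in the scene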

2 Likes

@allklier is of course absolutely right that the dynamic range of an 18bit integer capture does in fact fit into the 16b float range completely.

However, just because it fits does not mean it carries the same precision - float is complicated.

Ideally you would always save it in the same bit depth as it dropped out of the AD - but practically, 18bit integer formats are not a thing. Neither is a 13bit integer format if you want to save it as log; you’d have to go all the way to 16bit int-log.

Most apps work in 32b float internally (Nuke, Resolve, ...). Flame lets you pick what you do when, which is nice and dangerous, obviously.

So if you want to be absolutely precise and get the no-questions best uncompressed version of your ARRIRAW without any compromise at all, you need to be doing 32b float - maybe actually even higher for certain values (but that format doesn’t exist).

In practice - 16b float is way more than enough precision even for 12b HDR deliveries and has been the cornerstone of VFX plates and comp renders for many, many years, and I have never encountered banding issues with this at all.

That said, if you debayer in Flame you are already losing quality just due to Flame using ADA7-HW? (what does it actually use?!) and not ADA7-SW or whatever (need to have a second look, but last time the result from the ARRI Reference Tool was definitely better than what Flame gave me).

1 Like

this is super weirdly written, so it says

→ alexa35 always uses ADA-7

→ but you can still select between ada5hw and ada5sw ?

There’s a subtle distinction between range and precision.

Here’s some python code if you want to test it:

import numpy as np

print(np.array([2048]).astype(np.float16))
print(np.array([2049]).astype(np.float16))

yields:

[2048.] 
[2048.]

fp16 has a 10-bit mantissa; with the implicit leading bit per IEEE that’s 11 bits of effective precision, so integers are exact up to 2048. You get to 2049 and it starts rounding. 2050 is good again. And the holes get bigger as you go along.
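To see those holes growing, a quick sketch (assuming numpy) round-tripping a few integers through fp16:

import numpy as np

# up to 2048 every integer is exact; above that the gap doubles every stop
for x in (1025, 2049, 4097, 8193, 16385, 60003):
    print(x, "->", np.float16(x))

# 1025 stays 1025.0, but 2049 -> 2048.0, 4097 -> 4096.0, 8193 -> 8192.0,
# 16385 -> 16384.0 and 60003 -> 60000.0 - the holes are 2, 4, 8, 16, 32 wide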

In fact we can tell the number of holes this way:

import numpy as np

for i in range(262144):
    print(np.array([i]).astype(np.float16))

leads to:

[allklier@akflame misc]$ python3 fp16.py | uniq | wc
7169 7169 59086

So from the 262,144 individual values the Arri 35 sensor can give you, fp16 can only encode 7169 unique values. That’s roughly a 97% loss in precision. But it can express the entire nit range of 17 stops of linear light captured by the sensor.

This problem doesn’t exist in 32fp.
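Same count without the shell pipeline, plus the fp32 comparison (assuming numpy):

import numpy as np

codes = np.arange(2**18)                         # the 262,144 possible 18-bit values
print(np.unique(codes.astype(np.float16)).size)  # 7169 survive in fp16
print(np.unique(codes.astype(np.float32)).size)  # 262144 - fp32 keeps them all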

As @finnjaeger correctly says though - for 99.9% of use cases that lost 97% of precision can be ignored, unless you want to impress a ‘bigger is better’ producer. The roughly 3% that are left still capture the essence of the image: 17 stops of range, all the love for the shadows, and the glistening specular highlights.

It will still be a spectacular image.

1 Like

The very essence of “lean and mean” is throwing away that 13th bit. Sacrifices have to be made. Don’t look back.

3 Likes

meanwhile on the other side:

2 Likes

Answer without any technobabble.

Work in 16-bit float EXR. You can work in ARRI Wide Gamut / Scene Linear if you are not bothered with ACES. You’ll have the full range of the sensor. If you want, you can also work directly off the RAW in Flame and render to the above.

5 Likes

Can someone point out the optical difference between 12 and 13 bits please, and if there’s none, what is the reason for chasing the bit?

1 Like

I am with you @johnag, but the client/DOP knows the specs, and when I tell him that I transcoded into ProRes 4444 LogC he might come back (and has) asking if we are retaining ALL of the dynamic range that this new camera can capture. Or why the hell shoot ARRIRAW :person_shrugging:

I would be thinking, yes, why did you shoot ARRIRAW? LogC stored in ProRes is so damn efficient and has served us so well to date. But then my inquisitive side goes: maybe I should look into this and check that I am doing the right thing :grin:

This has been very insightful. Thanks everyone

I think it helps to understand it and have a gold-standard approach up your sleeve. In an ideal world we would get a “first pass” on this ARRIRAW media, a “balanced grade”, and then everything could get exported for VFX in a scene-linear format :folded_hands:

I was quite happy exporting the selects in LogCv4 ProRes but I just wanted to dig deeper. Thanks

1 Like

Don’t get me wrong. I’m all for squeezing out all the data, but I would really like to see visually or hear of a practical use, like it’s better for keying etc. Does Arri have a comparison to show us?

I don’t think that the DOP was thinking about keying.

I have heard people say that they want ARRIRAW for green screen shots before but it isn’t something I chase.

I have always been a bit agnostic to ARRIRAW and not convinced by any of the arguments (skin tones are better etc.)

I would be keen to hear some arguments on this

My suspicion is that it was driven by two things:

Dynamic range. The Alexa 35 with 17 stops is bigger than the Venice 2 with 15, and the RED Komodo at 16.5. Now RED claims 17 for the latest Raptor XL. That required a higher-resolution ADC, and then you should keep some of those bits. Arri needed a spec to stay in the top spot for camera technology.

If you feel strongly enough, I can reach out to Art Adams, who did a lot of the testing of the camera pre-launch. He’s unlikely to get into the internal considerations, but may have thoughts on where this 13th bit may actually make a meaningful difference.

On the question of RAW vs debayered (whether ARRIRAW or other camera flavor), there are two considerations:

Greater color detail that may not be perceived visually but is picked up by algorithms, particularly in the noisy blue or red channels. However, as has been observed, I don’t know if this is still as relevant with today’s cameras, high bit depths, and better keying algorithms. That you guys here aren’t asking for it seems to indicate that this is a hold-over from a different generation of cameras.

The more relevant path would be if you processed the RAW differently for the key pass and the color pass. You might recover some highlights, or debayer with more saturation or a color balance pushed towards red/blue in a way that makes a meaningful difference. It would give you color separation that doesn’t exist in the color pass.

I’ve had cases where going in and editing RED highlight roll-off made a difference.

And it’s not uncommon to run the key inputs separately. So doing a separate debayer, while possibly more extreme, could certainly be a consideration if it saves an important/difficult shot.

It’s mostly about grade. It’s amazing how much detail you can bring out of an overexposed sky that looks completely white, only to reveal a whole lot of cloud detail. Great to have that dynamic range in VFX when you are doing sky replacements too - you can bring the exposure right down to get nice edges on things like trees. Keying is definitely better as well, but that’s not the reason for it.