Frustrating day in Flame

@cnoellert fair question:

I’m using all linked media though, and with the Image node there isn’t any implicit caching. The linked media is ProRes 4444 on a local NVMe drive via TB3.

It may be worth trying it with cached media. At 16fp, that would convert it to EXR. If I stay on 12bit it would cache it as ProRes 4444, which is the same as the source under current settings.

Any settings that would be better?

From the testing yesterday I do know that the timeline bit depth is a key element. It fails at 16fp; at 12bit it uses a lot of VRAM but doesn’t crash.

In hindsight there is no reason to run this project at 16fp; that was just a habit of how I set up my projects, since it’s often commercial/beauty short-form work.

So the fix really is to run this project at 12bit or even 10bit, and then it should be ok. Which I think explains why others had no issues. The timeline bit depth was the unforeseen variable here.

It’s the details that do you in :slight_smile:

1 Like

So with those settings you would be rendering your 16-bit timelines as PIZ-compressed EXRs, which definitely have a processing overhead to decompress.

It’s important to understand that this applies regardless of whether you cache or not. The screenshot shows not only the format you cache to but the overall storage formats for all renders at their respective bit depths. So when you render your grades on a 16-bit timeline, it’s rendering to those formats.

Uncompressed formats have less overhead processing-wise, so you could also try raw on 16 and 32 to see if that helps. I’ve come to set up my timelines in their delivery formats as a rule, which it sounds like is the direction you’re heading anyway. In this instance especially it’s a win-win with regards to storage space and non-crashy behavior.
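If you want to put a number on that decompression overhead, here’s a minimal timing sketch. It assumes you have OpenImageIO’s Python bindings installed and the same frame written out twice, once PIZ and once uncompressed; the file names are just placeholders.

```python
# Rough decode-time comparison between a PIZ-compressed and an uncompressed EXR.
# Assumes OpenImageIO's Python bindings; "piz.exr" / "uncompressed.exr" are
# placeholder names for the same frame saved with the two compression settings.
import time
import OpenImageIO as oiio

def decode_time(path, runs=5):
    """Average wall-clock time to fully decode one EXR."""
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        buf = oiio.ImageBuf(path)
        buf.read(force=True)  # force a full decode into memory
        total += time.perf_counter() - start
    return total / runs

for path in ("piz.exr", "uncompressed.exr"):
    print(f"{path}: {decode_time(path) * 1000:.1f} ms per frame")
```

On fast NVMe the uncompressed file is usually the quicker one to read despite being much larger, which is exactly the trade-off described above.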

2 Likes

…and you change that start frame to 1001 this instant young man.

3 Likes

Yes, that’s definitely the answer - set it up in delivery format. I think we discussed the merits of that in another thread not long ago.

And noted on the render caching, makes sense. Based on the testing I did yesterday, while the full crashes occur during render time, the VRAM starvation and instability of the app (low FPS, blank screen, ‘out of memory’ errors in the terminal) actually get triggered by playback as well, not just rendering.

1 Like

This is the way I always used to work when finishing in Flame. The benefits to framestore size and playback speed alone were worth it.

1 Like

Update on trying to isolate this, and a specific question for those who may have data.

In the testing on my main Flame this behavior is repeatable, and I have created new projects and restored just the timeline from archive. So it follows across projects; it’s not a corrupted set of files.

However, the devs tested on a similarly spec’d system and can’t reproduce it.

I also ran it on my older system, now a test machine, and I can’t reproduce it there either.

So there seems to be something specific about the hardware config of my system that makes it more susceptible to this memory leak. We’ve been trying to home in on what it may be.

One prime candidate is the CPU. My main Flame system uses an i9-12900KS processor. All the other systems that don’t show the problem run Xeon CPUs of various kinds.

My main system has an A5000 GPU, but the devs tested on an A5000 and didn’t see it, and my test system has an RTX 4000 and didn’t see it either.

All systems had the same software: Rocky 8.7, Flame 2025.2.2, DKU 19.2, and the 550.90.07 NVIDIA driver.

I did rule out the external TB4 drive by copying the files onto an internal NVMe. I also tried it with Broadcast disabled in Flame (but with card still present).

So, questions for the hive:

  1. Have you been running on an i9 CPU and ever seen weird behaviors from Flame that others couldn’t reproduce? Ideally memory leaks and crashes, but it could also be other things.

Any reason to consider i9 different enough from Xeon to give this credence? Especially since this is a GPU memory problem by all accounts. Could it be related to PCIe 5.0?

  2. What other hardware differences could there be that I may not have isolated?

Of course memory leaks are super sensitive to specific circumstances and will not show up elsewhere in the same manner.

While not used in this project, the main system has various other packages installed (like Logik Portal, BorisFX suite, etc.).

System specifics:

Asus ProArt Z690 motherboard (Z690 chipset, DDR5, ATX)
i9-12900KS 16-core CPU
128GB DDR5-4800 RAM
PCIe 5.0 slots for the A5000 and DeckLink 12G Extreme

I know one of the sticking points will be that this is not one of the certified configs. That’s the trade-off we always make. When I bought the system, it was built on the first 12th-gen Intel CPUs available. Dell and HP were at least 9 months away from their CPU refresh.

Adding some useful detail and a utility to this discussion.

Flame has the Resource Manager, but it only gives a snapshot of resources; it’s not a good real-time monitor.

Turns out NVIDIA includes a good monitor. In a terminal, run

watch -n 0.5 nvidia-smi

It produces this hardware dashboard of your GPU, refreshed every 0.5 seconds, which includes temperature, load, memory consumption, and memory consumption by process:
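If you’d rather have a log you can graph afterwards than a live dashboard, nvidia-smi also answers structured queries. Here’s a minimal polling sketch; the one-second interval, the output file name, and the single-GPU assumption are all just my choices.

```python
# Minimal VRAM logger: polls nvidia-smi once a second and appends to a CSV,
# so you can pinpoint exactly when the memory starts climbing.
# Assumes a single GPU; interval and file name are arbitrary.
import csv
import subprocess
import time
from datetime import datetime

QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total,utilization.gpu,temperature.gpu",
    "--format=csv,noheader,nounits",
]

with open("vram_log.csv", "a", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["time", "mem_used_mib", "mem_total_mib", "gpu_util_pct", "temp_c"])
    while True:
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
        used, total, util, temp = [v.strip() for v in out.strip().split(",")]
        writer.writerow([datetime.now().isoformat(timespec="seconds"), used, total, util, temp])
        f.flush()
        time.sleep(1)
```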

Interestingly enough, while Flame is stable it occupies the 19.3GB that is set as the threshold in Setup at 80%, as instructed by support (80% of the A5000’s 24GB is roughly 19.2GB).

After 5 minutes, when things start going south, you can see Flame breaking its own rule, with GPU memory steadily increasing until it hits 24.0GB and exhausts the physical memory of the GPU. And then it dies.

It also confirms that this memory exhaustion is happening on the GPU, as we can see it in action.

2 Likes

Glad I read this thread as I think I ran into this same issue yesterday.

Long story short, I was halfway through a project with ACEScg/16-bit timelines (kind of testing a new workflow, I’m usually in 10bit) but then the client decided they wanted to re-use the same look and assets from an old project, built last year in 10bit.

Started using clr mgmt nodes where necessary to get down to 10bit so I could plug in old assets, and all seemed to be working OK until I added a LUT in the Image node…boom, Flame crashing non-stop.

Finally gave up and re-linked the footage, copied into 10bit timelines and was able to get it out the door. Of course all this happens on a Friday with the client chomping at the bit, so I haven’t really had a chance to troubleshoot.

I’m on a Lenovo P620 with a 3090.

2 Likes

Thanks for sharing @Jason_Kalinoski and sorry to hear about your experience.

It’s a useful data point for support to consider, that my experience may not be an isolated case. And your P620 is a certified config, though your GPU is not - the reverse of my situation.

And noteworthy for one more fact - that you used a LUT. My grades in this problem project all loaded a LUT on the Primary Grade node. Not something I always do.

Which version of Flame were you on?

1 Like

2025.2.1

It felt like adding the LUT caused the crashes; I had already done a base grade in Image when everything was still 16-bit and didn’t have any issues.

1 Like

Every shot I grade has a few ColorMgmt matchboxes in it to load various log conversion and look cubes, but I build all my grades via secondaries because I don’t want to change the Primary and break a bunch of keys. I’m way behind you version-wise, though, and also just about never in a 16fp timeline although I do often have 16fp clips.

Correct.

It’s not that LUTs per se are a problem. But it’s a specific combination, particularly 16fp, in ~current versions, etc.

It’s now a chase to isolate what specific combination triggers it. To that extent, every data point you and others shared helps rule out a variable or two.

We’re getting closer…

I’ve removed the LUTs from all my image nodes. That didn’t make a difference.

But…

If I play back in single-viewer mode set to the front view, VRAM stays at 5GB.
If I play back in single-viewer mode set to the result view (F4), VRAM stays at 6GB.
If I play back in single-viewer mode set to the Image Schematic, VRAM stays at 5GB.

The second I switch to dual-view with Image Schematic + Result, VRAM immediately jumps, quickly climbs to 19GB, remains stable for a bit, and then goes all the way to 24GB and crashes. Consistently.

Which is now the question for the devs, who weren’t able to reproduce it. But they may not have had my typical viewer configuration.

If that is the missing link, my guess is that in 2-up view some code path gets skipped that releases the VRAM for the image schematic thumbs, or something along those lines. Not enough to kill Flame on a short-form project, but it piles up quickly.
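To make that guess concrete: the pattern I have in mind is the classic cache that only gets evicted on one code path. This is purely illustrative Python of that pattern, not anything from Flame’s actual internals; every name in it is made up.

```python
# Toy illustration of the suspected pattern (not Flame code): thumbnails are
# cached per (view, frame), but eviction was only ever written for the
# single-view path, so a 2-up layout never frees anything.
thumb_cache = {}  # (view, frame) -> placeholder buffer

def allocate_thumbnail(frame):
    # Stand-in for a GPU allocation; a real thumb would be megabytes of VRAM.
    return bytearray(1024)

def on_frame_changed(active_views, frame):
    for view in active_views:
        key = (view, frame)
        if key not in thumb_cache:
            thumb_cache[key] = allocate_thumbnail(frame)
    if len(active_views) == 1:
        # Eviction only happens on the single-view path...
        for key in [k for k in thumb_cache if k[1] < frame - 10]:
            del thumb_cache[key]
    # ...so with two views nothing is released and memory grows with playback.

for f in range(1000):
    on_frame_changed(["schematic", "result"], f)  # 2-up: cache grows without bound
print(f"cached thumbs: {len(thumb_cache)}")       # 2000 entries and still climbing
```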

I love reading this like a techno-thriller journal. Thanks for keeping me on the edge of my seat @allklier

3 Likes

@Sinan Haha. Well, this is common debugging procedure. Find a reproducible case, then eliminate variables until you know which piece of code to look at, and then the light bulb goes on again. Going back to my roots in software, thinking about how the code works, where GPU memory is needed, why they might lose track of it, etc.

This is why I always get frustrated with level-1 customer support (in general, not the case with ADSK to be clear). By the time I call, I’ve already gone through all the basic debugging and give them a clear description. But they usually don’t know what to do with that, and have to follow the script anyway. So it turns into a patience test.

I remember debugging an OS boot loader back in my college days. Literally the first instruction the CPU executes after it wakes up. There’s no screen, no debugger. But I could write to an I/O port and change the 3 LEDs top right on the classic PC keyboard (Num Lock, etc.) as my console log. By looking at which LEDs were on, I could tell how far the code had gotten :slight_smile:

Same…
[Bill Hader popcorn GIF from Saturday Night Live]

2 Likes

You can see why they have never wanted to port Flame to Windows. All the potential hardware combinations would be a nightmare to debug.

It also explains why Mac has become such an attractive platform for Content Creation Software.

3 Likes

Linux is no different in this regard; they could still just support certain machines on Windows as well.

Absolutely no idea how Blackmagic keeps this up, tbh; they ported Fusion to everything so quickly.

But isn’t Fusion just software, with OpenCL if available?

Idk what it is, what it does, or why this did or didn’t work, but they have pushed out Resolve and Fusion completely cross-platform, as well as all their drivers. I am not a software dev.

Resolve even works on iPad; I mean, you gotta give Blackmagic credit for pulling this off.

Additionally, I haven’t had many issues on either OS with Resolve on any weird hardware, FWIW; it just kinda runs on anything. Their project server is pure genius for collaboration.

Same with Nuke, just that Resolve is closer to Flame.

Anyhow, it’s some stuff we just don’t know, mysterious stuff. I just write Python scripts, I have no say in this hahah

2 Likes