Releasing TUNET - A training ML tool

@tpo Thanks.

Also, what is the best color space for the source/destination images to be in for the algorithm assuming our images are scene linear?
rec709, sRGB?

1 Like

Rec.709. If you're using OCIO, the Tunet-to-Nuke converter will also generate the Nuke script with color management. Flame is still WIP.

1 Like

Due to my personal vendetta against The Foundry, I messed with this over the last few days.

First of all, @tpo, thank you. You are an absolute magician.

  1. Made it work with the GPU on Windows; you need to install some extra packages.

  2. Made AMP work on Windows, which also required a code change.

  3. Made my own pre-slicer that takes my 3840x2160 source images and randomly crops them into 512x512 slices, bringing the step time down massively.

  4. Implemented memory pinning (pin_memory=True) for faster load times.

  5. ToDo: see if I can pre-convert the input dataset to .pt and keep it all in RAM.

(planning to fork the repo later when I have time to commit the windows fixes)
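For anyone curious, the random pre-slicing in step 3 can be sketched in a few lines of Python. This is a hedged sketch, not the actual code from the fork; `random_tiles` is a hypothetical helper, and the Pillow `Image.crop` call in the comment is just the obvious way to materialise each tile. Note that 18 tiles per 3840x2160 frame reproduces the 2700-tile count from 150 sources.

```python
import random

def random_tiles(width, height, tile=512, n=18, seed=None):
    """Return n random (left, top, right, bottom) crop boxes that fit
    entirely inside a width x height frame. Each box is tile x tile px."""
    rng = random.Random(seed)
    boxes = []
    for _ in range(n):
        x = rng.randint(0, width - tile)
        y = rng.randint(0, height - tile)
        boxes.append((x, y, x + tile, y + tile))
    return boxes

# Each box would then be handed to Pillow, e.g. (paths illustrative):
#   Image.open(src_path).crop(box).save(tile_path)
boxes = random_tiles(3840, 2160, seed=1)
```

Training on small tiles instead of full UHD frames is also what makes the larger batch sizes fit in VRAM in the first place.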

What I've got now is:

RTX4090 in windows

150 input images (3840x2160)
2700 generated 512x512 png tiles

Model size 128
Batch size 8

gets me this: Step[14195] (195/500), L1:0.0135(Avg:0.0186), LR:1.0e-04, T/Step:0.591s (D:0.113 T:0.002 C:0.468)

That sounds pretty reasonable and should get me 2000 epochs in about 7 days or so.
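A quick back-of-the-envelope check on that estimate (my arithmetic, assuming the ~2700 tiles at batch 8 and the 0.591 s/step quoted above):

```python
import math

tiles, batch = 2700, 8
steps_per_epoch = math.ceil(tiles / batch)  # 338 iterations per epoch
total_steps = steps_per_epoch * 2000        # 2000 epochs
days = total_steps * 0.591 / 86400          # 0.591 s per step, 86400 s per day
print(round(days, 1))                       # -> 4.6 days of pure step time
```

Validation passes, checkpointing and dataloader stalls push that toward the quoted ~7 days, so the ballpark holds.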

I guess distributed training is not on the table here? I have 4 machines with 4090s in them...

5 Likes

Even though it’s a bit like changing seats on the Titanic, we’re stupid close to ditching Nuke for Fusion now for 90% of the work we need it for.

Tunet gets us one step closer to ditching it completely.

5 Likes

While I appreciate the sentiment, it’s a slippery slope. I won’t repeat my whole speech about the mining town feel when Resolve is the last man standing, because they starved everyone else of air.

This is no excuse for Nuke, Flame, or Baselight and their decision makers.

All that comes to mind is the graveyard scene in The Good, The Bad, and The Ugly.

“You see, in this world there’s two kinds of people, my friend: those with loaded guns and those who dig. You dig.”

Let’s not be the ones with shovels.

2 Likes

I first used Digital Fusion in 2003. I keyed a whole music video with it, comped and assembled the thing in Inferno. It was painful but DF had Ultimatte and Primatte bundled and the spark costs for Ultimatte alone would have been more than the DF license. Our Inferno license was north of 300k if memory serves.

This was when Autodesk was gearing up their campaign to buy or bully every package in their sphere—Discreet had another couple years of independence before the hammer fell.

Microsoft had picked up Softimage to push a port to NT and had been working tirelessly on an NT-only product called DS. Eddy was long gone at that point. Alas, they sold it all to Avid after forcing the adoption of NT as a standard in the 3D world, took a chunk of Avid in the deal, and continued the great push forward for Windows.

Beyond Soft, Parallax had long since sold Matador and Advance to Avid, who added OMF import support from MC and renamed the latter Media Illusion, but kept Matador’s name. Media Illusion died in 2001. When Soft was sold to Autodesk in ’08, Avid held on to DS but killed it off in 2013.

Cineon died in ’97 in yet another astoundingly dumb move by Kodak shareholders, but part of the tech resurfaced in Chalice and then Rays, both of which met with quick ends, as well as all over the PTE for the Spirit. More money in patents and telecine.

Cyborg was on the rise but was about to hit some rocky-ass roads along with Monsters and Colossus, the latter of which would become Lustre after the acquisition and eventually be killed off like its ill-fated brothers and sisters during the fall.

Denim Software’s Illuminaire Studio: bought by Discreet, rebranded as Effect and Paint, options 1/2 (alongside option 3, which was Flint on the O2), then combined into Combustion, acquired with Discreet by Autodesk, then dead.

Sony’s Socratto… need I say more?

Quantel was dying a slow death with no HD prospects because of their hardware dev time. Doing macros in SD that you would scale to HD turned out not to be a winning strategy. The death rattle of the iQ was years away, but you could tell the end was nigh unless a miracle happened. People weren’t really buying Henrys, or Hals. The Domino was dead. Newbury was already selling off the private jets.

Apple had acquired Shake from Nothing Real and was already reducing the price after porting it to Mac OS in a bid to lay siege to the post market. First they killed Tremor dead. Eventually Shake would hit 500 bucks on Mac OS while NT and IRIX stayed at 9k, and then it was all dead. It was the crown jewel of a strategy that should feel very familiar.

I mention all of this only because every player still in existence is complicit in where we find ourselves today and is directly responsible for our limited choices, rocket-science-tax structure and the ever present race to the bottom.

I don’t condone how BMD operates, but honestly they all suck. What I can say about DF/Fusion and BMD is that, unlike a lot of folks on the list who just bought shit, added the one thing that might help their core business, and then let amazing tech die on the vine (because making money in post is hard), BMD has actually made a concerted effort to make a good package great. Not that Fusion is truly great yet, but they appear to be actually trying. And that’s uncommon.

As a postscript I will offer that I’ve edited this post repeatedly. As I read it back I kept remembering more and more instances of our niche being fucked (usually with less than altruistic intentions), and my initial attempts at relaying those tidbits seldom landed correctly on the first draft. Nevertheless I pounded “save edit” with the same fury as if I had just read the news on that same ill-fated day.

7 Likes

It’s that same passion with which we keep re-reading and editing posts that makes this whole story so infuriating. And it’s the reason I keep swimming uphill on the unpopular notion that BMD isn’t the greatest thing since sliced bread, as most of the Internet, young or old, seems to have concluded. There may not be much that can be done at this juncture, but posts like yours at least restore the integrity of the moment: while we may have to live with Fusion as the last solution, we do so with a heavy heart and not gallons of YT drool.

And it’s complicated. You’re right that BMD, for their own egotistical reasons, which have been written about, is actually throwing a lot of resources at Resolve and does keep improving the product. It’s a rocky path, born out of the rejection of an individual, which launched them onto this journey, and which then found its own unique momentum in the age of YT and DIY. And this has turned into an unstoppable train at this point, one that is bound to take out anyone in its way.

Unfortunately, their approach is fundamentally a variation of what you described: not solid innovation, but buying graveyard finds (in terms of apps), slapping them together with gaffer’s tape into the first actually successful run at the everything app, and then doing continuing engineering by ripping off the recent innovations of others (I could make a long list of heralded recent innovations in Resolve that at their core are just poor copies of the work of others, often without asking), yet still winning gold multiple times with the Internet audience. And they’ve done it so much that even old-timers now can’t tell the good from the bad.

So it is complicated - because they are actually providing answers to artists, yet they’re also accelerating the bigger demise at the same time.

The other day I wrote it in a different way that seemed to resonate unexpectedly: the cycle of most products is that in the early days you have passionate engineers and innovators trying to solve a problem, often moving mountains to do so. With their tools, however imperfect and however expensive they may be, they create a step function for everyone. The focus is on solving problems and opening new frontiers. As these tools invariably grow, mature, and age, the keys get turned over to the corporate class (the innovators want to move on, and you need differently skilled people to scale). The new owners rarely understand what the product actually does. Their focus is on scale-up and profitability. And thus enshittification ensues.

To them it’s irrelevant if the tool offers the best path to solve a problem, as long as they can sell enough licenses and upgrades to make a chart go into the upper right corner. And thus the vision for the product diverges between the actual users and most of the decision makers.

On the Friday Flame hangout I mentioned that we’ve seen this with Mistika/SGO. An awesome color app, with some really cool innovations. But in terms of user base they should be long on the other side of sunset. There’s a small but dedicated community (you can count it on two hands) that has a detailed wish list. Slowly some of that is making it into the app at 1/4 speed. Yet the app survives because it has some big corporate accounts that keep it above the waterline. And of course those corporate accounts’ needs trump anything the community would love to see. I’m sure it’s similar for Flame, Nuke, and to some degree Resolve (Warren Eagles and Stefan Sonnenfeld have more pull on the roadmap than you and me, but don’t dominate the forums; Warren you can mingle with at the NAB Colorist Mixer).

Overall the post industry finds itself at a time of significant maturity. The basic tools we rely on (comp, key, 3D, paint, etc.) are all well understood and solid. Most of the needs are in the plumbing surrounding them, and most of our complaints are actually about the pipeline and interchange, not the core tools. It’s those mature periods that do as much damage to the corporate owners as the BMDs of the world do. This has been codified in the Innovator’s Dilemma.

So it’s not a surprising situation at all, but we can still do our little part of not hastening the enshittification more than necessary, by refusing to be ignorant of the damage BMD is doing.

In fact, AI is an inflection point that is creating a corner of fresh energy that does more than rearrange the deck chairs on the Titanic. It’s unknown what the net impact on artists will be. Most innovations eliminate some previously valued operators. AI may be on par, or it may be yet more destructive: one of the few broad revolutions that doesn’t move workers from A to B, but simply eliminates the need for workers, without any good answers. We shall find out. But at a minimum it’s putting pressure on all incumbents to take another look at their roadmap and not be complacent.

Lastly I will say, if BMD were charging a fair price for their software, in line with industry practices, I would have much less of an issue with what they’re doing. I’m not the biggest fan of some aspects of their software, but I have a lot of respect for their engineering achievements. And I appreciate the energy they’re bringing to the market. I just wish they would be less of a wolf in sheep’s clothing in the way they go about it.

And I do use their products. I learned compositing on Fusion (before it was integrated into Resolve), I continue to use Resolve for color when it makes more sense than Flame or BL, and we recently switched begrudgingly from Avid to Resolve for distributed edit. I just wish some other products would (a) keep some air pockets and (b) do better product engineering, and I would happily stick with them as my little contribution to diversity. But it all starts in the C-suite, and they’re not interested at the moment in our little nook of the world.

2 Likes

In regards to Tunet, I’m not sure what I’m doing wrong, but even after days of training it seems to completely ignore what I want it to learn.

I have many shots of a dude in a shirt with a logo on it, and a nice set of manually cleaned logos from said shirt.

I first tried with cropped closeups of the logo only, which sort of worked, but it heavily shifted the colors around after inference.

So I thought I’d augment the dataset with wider shots where you see more of the frame, and also throw in some crops where no logo was present, so that it learns to do nothing if the logo isn’t present.
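One way to keep that kind of mixed dataset under control is to pick the positive/negative ratio explicitly rather than dumping everything into one folder. A toy sketch with a hypothetical helper (not part of Tunet; the file names are illustrative):

```python
import random

def mix_dataset(positives, negatives, neg_ratio=0.3, seed=0):
    """Build a shuffled training list where roughly neg_ratio of the
    samples are 'do nothing' pairs (src == dst, no logo present)."""
    rng = random.Random(seed)
    # how many negatives are needed so they make up neg_ratio of the total
    n_neg = int(len(positives) * neg_ratio / (1 - neg_ratio))
    picked = rng.sample(negatives, min(n_neg, len(negatives)))
    mixed = list(positives) + picked
    rng.shuffle(mixed)
    return mixed

# e.g. 70 logo crops topped up with ~30% no-logo crops
train_list = mix_dataset([f"logo_{i}.png" for i in range(70)],
                         [f"clean_{i}.png" for i in range(100)])
```

Worth noting: if the model keeps returning the src untouched, too many identity pairs can teach it that copying the input is the safest answer, so it may help to keep that ratio modest.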

After like 10 epochs it just gives me back the src image, absolutely no change on the logo. Now I’m at 300 epochs and... nothing is happening.

Maybe you guys have some tricks for training?

Offer it free coffee and a great career.

4 Likes

Tried that, but the AI is now citing German labour laws, and apparently it has accrued enough overtime to not work the rest of the month.

Also, it will be unavailable in August.

Is there an American version of this AI that doesn’t complain?

7 Likes

Hey Fin, from the way you described it, it seems your AMP code changes might be affecting how the model is learning. That’s very common when dealing with precision code, especially if you try to convert ckpts later on.
I do think the fact that you’re doing mods is great; it’s the reason I released Tunet, so artists and teams can adapt it to their own needs.

Do a trace of the tensors of your new code and update the converters accordingly.
(I’m guessing)

You should be able to use the vanilla code though; AMP does work on Windows, as does the rest of the codebase, without needing mods.

My suggestion is that you work with vanilla first and get a model that works. After that, start your mods and make sure your mods match the results.

1 Like

Thank you! It’s a brand new model with AMP.

I just tried it on my old set that only has 512x512 crops of the logo on the shirt and it immediately picks up on it, so it might just be my training data.

By default from the repo, neither CUDA nor AMP would kick in for me on Windows; it would default to CPU.

1 Like

@tpo Do you have any tips for something like this (screens with and without tracking markers)?

I’ve tried both branches, but it doesn’t look like it’s doing anything, at least in the training preview.

Rocky Linux 9.5 - single GPU (RTX A6000)

Thanks!

1 Like

Just tested with your screen plates and it seems to be working as expected.
One thing to keep in mind: if you run this on really high-res plates it’ll be slow, and the higher the res, the slower it gets. Maybe that’s the reason you’re not seeing any change? I’ve run this at 720p instead of the original 4.5K and it is so much faster. That’s why people often tend to crop things before sending them to train. Another thing you can try is to crop just the phone and not the whole frame.

2 Likes

Hey John! I just gave it a quick train on a few GPUs to check if it’s just training time, and it worked well too. See attached, or you can download the checkpoint from here:

So I think it’s just training time, since you’re on a single GPU and training on 4K+ plates.

1 Like

Hahaha, on a few GPUs? How many GPUs, and which?

8x B200 :sweat_smile:

6 Likes

Nice, thanks guys! I’ll resize and crop. Does the branch matter? What about batch size?

$25/hour…

A lot. Batch size means how many images the model sees every time it does one iteration.
For example, with a batch size of 64 the model sees 64 images at the same time in one iteration, learns from them, takes a decision, and updates its weights, leading to a model that has more context of everything. With a batch size of 1, the model sees 1 image per iteration, usually leading to a noisy model.
The right size depends on your dataset, the dst effect, and such. There is no defined rule.
The bigger the batch size, the more VRAM it will use.
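As a toy illustration of what batch size does to an epoch (pure Python, nothing Tunet-specific):

```python
def batches(samples, batch_size):
    """Split a dataset into the chunks the model consumes per iteration."""
    return [samples[i:i + batch_size]
            for i in range(0, len(samples), batch_size)]

dataset = list(range(10))           # ten training images
print(len(batches(dataset, 1)))     # batch size 1 -> 10 iterations per epoch
print(len(batches(dataset, 4)))     # batch size 4 -> 3 iterations (last one partial)
```

Fewer, bigger batches mean each weight update averages over more images (smoother gradients), at the cost of VRAM.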

Crops might be the easiest solution, but you could also split across different models: one model for the phone, another for the iPad, or one per lighting scene. That way your dataset shrinks per training, leading to faster learning and potentially no need to reduce res.

3 Likes