Releasing TUNET - An ML training tool

Yes, directly. A good way of thinking about it is:

Let's say you have a feature film where you need to change something on the main characters that requires roto.
So you will have hundreds of shots, millions of frames. With TUNET you can simply train on all of it at once with many, many example samples; for that, you need an extremely large batch size that would never fit on a single GPU.
To make sure the model has a chance to analyze enough samples per step, you choose a batch size of 64 per GPU; on an 8-GPU machine, you are effectively training with a batch size of 512.
That is never possible on a single GPU.
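As a rough sketch of that idea (not TUNET's actual code; the dummy dataset and model below are placeholders), this is how the per-GPU batch multiplies into the effective batch under PyTorch DistributedDataParallel:

```python
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

# Launch with: torchrun --nproc_per_node=8 this_script.py  (one process per GPU)
dist.init_process_group("nccl")
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

# Dummy stand-ins for the real data/model, just to show the batching math.
dataset = TensorDataset(torch.randn(4096, 3), torch.randn(4096, 3))
model = DDP(nn.Linear(3, 3).cuda(), device_ids=[local_rank])

per_gpu_batch = 64
sampler = DistributedSampler(dataset)              # each rank sees a different shard
loader = DataLoader(dataset, batch_size=per_gpu_batch, sampler=sampler)

# Gradients are averaged across ranks every step, so the effective batch is
# per-GPU batch * number of GPUs: 64 * 8 = 512 on an 8-GPU machine.
effective_batch = per_gpu_batch * dist.get_world_size()
```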


3 Likes

These scripts are amazing, thanks @tpo! I wanted to build a UI for them to make them easier to use.

With the scripts below installed, you can right click a folder in the MediaHub (ideally a folder called “src” with a “dst” folder next to it) to bring up Tunet → Tunet UI. Simply pick your source and destination folders, if they haven’t been selected for you already. It’ll make the model folder automatically. Then, after the training is done, back in the MediaHub you can right click the checkpoint and hit Tunet → Convert Checkpoint and Import, which will run Thiago’s convert_flame script. Next, it waits for the ONNX to be generated and then automatically imports it into Batch as an Inference node.
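For anyone curious how the right-click integration side of this usually works, here is a bare-bones sketch of a Flame MediaHub custom-action hook. The hook name follows Flame's python-hook convention, but the launcher path and the attribute read off the selection are placeholders, not John's actual script:

```python
import subprocess

# Flame looks for this hook name in files placed in its python hooks folders.
def get_mediahub_files_custom_ui_actions():

    def launch_tunet_ui(selection):
        # "selection" is the list of items right-clicked in the MediaHub.
        # Attribute access here is illustrative; check Flame's hook docs for
        # the selection object API. The launcher path is hypothetical.
        paths = [str(item.path) for item in selection]
        subprocess.Popen(["python", "/opt/tunet/tunet_ui.py"] + paths)

    return [
        {
            "name": "Tunet",
            "actions": [
                {
                    "name": "Tunet UI",
                    "execute": launch_tunet_ui,
                },
            ],
        }
    ]
```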

If anyone wants to test these scripts, copy them wherever you keep your scripts, and then modify the last 3 lines in the tunet/config/config.json file. I used `find /usr -name conda.sh 2>/dev/null` in a terminal to get the conda_init path. I figured if you can install something from GitHub, modifying 3 lines of a JSON will be easy.

Obviously, you’ll need Tunet installed first. I don’t think the scripts are quite Logik Portal worthy just yet, but the plan is to get them on there soon.

I only have a UI for the “simple” YAML, but could build one for the “advanced” version if needed.

Lastly, I’m assuming this will only work on Linux. DM me if you have any issues.





tunet_ui.zip (769.1 KB)

12 Likes

Like I told you John, just amazing! You nailed it. So cool.

1 Like

Hey @ALan, I've updated the multi-gpu branch.

I changed things with PCIe GPUs in mind as well: how the weights are updated and how data is synced between GPUs. You should get better performance.
Now multi-GPU and single-GPU are merged into one, so you don't need anything else.

Tested on multiple 6000 Ada cards and it worked great: almost the same speed per step, but double the batch size.

Make sure to git clone from the multi-gpu branch; I'm keeping the main branch as the original for now.

Since I mainly use SXM cards, and those automatically deal with P2P between them, I ended up not paying attention to PCIe, but now that is fixed.
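Not necessarily what changed in the branch, but for anyone debugging multi-GPU on PCIe boxes where peer-to-peer misbehaves, one common knob is NCCL's standard NCCL_P2P_DISABLE environment variable, which makes NCCL fall back to copies through host memory:

```python
import os
import torch.distributed as dist

# NCCL_P2P_DISABLE is a standard NCCL environment variable; "1" tells NCCL to
# route GPU-to-GPU traffic through host memory instead of PCIe peer-to-peer.
os.environ["NCCL_P2P_DISABLE"] = "1"

# Assumes the script is launched via torchrun so the rendezvous env vars exist.
dist.init_process_group("nccl")
```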

3 Likes

@tpo does the training allow for isolation of some sort? An alpha in source that denotes where the learning should focus?

1 Like

Sorry Chris! No, currently it is Front (RGB) only training.
While the Tunet model will eventually learn the difference, it’s not guaranteed to focus on it quickly or efficiently, especially if the differing region is small relative to the fullframe.

So better to pre-crop them before feeding them in for training then. Thanks @tpo
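A minimal sketch of that pre-crop idea (the folders, file pattern and crop box below are made up, and this is not part of Tunet): cut the same region-of-interest box out of each src/dst pair before training, so the differing area fills more of the frame:

```python
from pathlib import Path
from PIL import Image

# Hypothetical folders and crop box (left, top, right, bottom) in pixels.
SRC, DST, OUT = Path("src"), Path("dst"), Path("cropped")
BOX = (800, 200, 1824, 1224)

for src_file in sorted(SRC.glob("*.png")):
    dst_file = DST / src_file.name
    for folder, f in (("src", src_file), ("dst", dst_file)):
        out_dir = OUT / folder
        out_dir.mkdir(parents=True, exist_ok=True)
        Image.open(f).crop(BOX).save(out_dir / f.name)
```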

1 Like

Amazing work @tpo @john-geehreng and all you script-heads out there. This is amazing work you’re doing

1 Like

About to give this a try on Windows / WSL2 Ubuntu, huge thanks!

EDIT: Just did a 30-minute train, enough to say: OH MY GOD THIAGO

4 Likes

Just to let you know, for people who want to try this on Windows: I just made it work. By default this does not work on Windows since it needs NCCL, which is a Linux-only lib, but I made it work by tweaking a bit of code and using PyTorch 2.3. I also think this can open the door to using it in Mac-only environments. I have to try, but I have no Macs around; I'm a PC guy.
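I can't speak to the exact tweak, but the general idea is that NCCL is Linux-only while the Windows builds of PyTorch ship the gloo backend, so backend selection roughly like this (illustrative only, not the actual patch) covers the distributed path:

```python
import platform
import torch.distributed as dist

# NCCL only exists on Linux; Windows builds of PyTorch ship gloo instead.
backend = "nccl" if platform.system() == "Linux" else "gloo"

# Assumes launch via torchrun so the rendezvous env vars are already set.
dist.init_process_group(backend=backend)

# Single-GPU Windows training can skip process groups entirely and just use cuda:0.
```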

See screenshot below. THIS IS NOT WSL, pure Windows. Windows 10.
Also, @tpo, thank you, thank you very much for this.

I could post it here if anyone wants to try it, but first I need permission from @tpo; he has the final word.

4 Likes

Make a pull request to his GitHub repo with the change.

It is so nice to see this, @cristhiancordoba @febreroflame, that is great.
Glad you liked it @febreroflame : )

I've got lots of messages after the release asking for a Mac version. Also from the Nuke community.

I'm working on a multi-OS version; Tunet will automatically detect the OS and just work. Same for the YAML: paths can be Windows, Mac or Linux.
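Not necessarily how Tunet implements it, but OS/device auto-detection in PyTorch usually boils down to something like this: CUDA where an NVIDIA GPU is visible, Metal (MPS) on Apple silicon, plain CPU otherwise:

```python
import platform
import torch

def pick_device() -> torch.device:
    # CUDA on Linux/Windows machines with an NVIDIA GPU, Metal (MPS) on Apple
    # silicon, plain CPU as the fallback.
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

print(platform.system(), pick_device())
```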

It is already working; just cleaning up some stuff and then I will release it.

Attached: running on native Windows conda.

6 Likes

Hey everyone! Cross-platform support is now live.
Git clone again and you should be good to go on any OS.

Multi-GPU is Linux only.

Benchmark:
On Mac, training is a bit meh, as expected:
Win with RTX 6000 Blackwell: 0.3 seconds per step
Mac with M1 Max: 3.9 seconds per step

Meaning a training that would take 1 week on the NVIDIA card would take 13 weeks on an Apple M1 Max.


5 Likes

@tpo Is there a way to implement the ability to resume training from the latest checkpoint? I was looking at the flags and didn't find one for this. Check Chris' comment below. Thanks

It does this automatically based on the naming I believe.
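For the curious, "based on the naming" usually means something like the sketch below (the file pattern here is a guess, not Tunet's actual scheme): sort the checkpoint files by the number embedded in the name and reload the newest one instead of starting from scratch.

```python
import re
from pathlib import Path
import torch

def latest_checkpoint(model_dir: str):
    # Assumes names like "tunet_step_012000.pth"; the pattern is hypothetical.
    def step_of(p: Path) -> int:
        nums = re.findall(r"\d+", p.stem)
        return int(nums[-1]) if nums else -1

    ckpts = sorted(Path(model_dir).glob("*.pth"), key=step_of)
    return ckpts[-1] if ckpts else None

ckpt = latest_checkpoint("model")
if ckpt is not None:
    state = torch.load(ckpt, map_location="cpu")   # resume from here instead of scratch
```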

2 Likes

Thank you Chris. You’re right! Fantastic!!!

I just tested this, and for some reason with the new/updated scripts the converted ONNX model is not working: it loads, but it just gives a solid-color output. Fortunately I had a backup of the previous scripts. With the previous ones it was working, but the output was quite blurry, as if it was low res. Nuke was working fine, but Flame was not.
Could anyone try this with the new/updated scripts?

Try starting a training from scratch with the new version. Mixing old and new is a no-go; I believe that is why.
Make sure the convert scripts are also updated.

You still have the older trainer, which I'm calling the legacy trainer; it is under the util folder. If you use that, it is the same as before.

@tpo for multi-gpu, do we still use the separate branch, or was that merged into master?

Thanks.

Separate branch: multi-gpu, Linux only. Same for the converters.