Let’s say you have a feature film where something about the main characters needs to change, and that change will need roto.
So you will have hundreds of shots and millions of frames. With TUNET you could simply train on all of them at once, with many, many example samples; for that you would need an extremely large batch size that would never fit on a single GPU.
To make sure the model has a chance to analyze enough samples per step, you choose a batch of 64 per GPU; on an 8-GPU machine you are effectively training with a batch size of 512.
Never possible on a single GPU.
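As a rough illustration of why the numbers multiply: under PyTorch’s DistributedDataParallel each GPU runs its own copy of the model on its own slice of the data, and the gradients are averaged across all of them before the weights update. This is a generic sketch, not TuNet’s actual training loop, and the variable names are my own.

```python
# Generic PyTorch DDP sketch (not TuNet's actual code): 8 processes, one per
# GPU, each with a per-GPU batch of 64 -> an effective batch of 512 per step.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def train_loop(model, dataloader, loss_fn):
    local_rank = int(os.environ["LOCAL_RANK"])   # set by torchrun
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(local_rank)

    model = DDP(model.cuda(local_rank), device_ids=[local_rank])
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)

    for src, dst in dataloader:                  # 64 samples per GPU per step
        opt.zero_grad()
        loss = loss_fn(model(src.cuda(local_rank)), dst.cuda(local_rank))
        loss.backward()                          # gradients averaged across all 8 GPUs
        opt.step()                               # same 512-sample update on every rank
```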
These scripts are amazing, thanks @tpo! I wanted to build a UI for them to make it easier to use.
With the scripts below installed, you can right-click a folder in the MediaHub (ideally a folder called “src” with a “dst” folder next to it) to bring up Tunet → Tunet UI. Simply pick your source and destination folders if they haven’t been selected for you already; it will create the model folder automatically. Then, after the training is done, back in the MediaHub you can right-click the checkpoint and hit Tunet → Convert Checkpoint and Import, which will run Thiago’s convert_flame script. It waits for the ONNX to be generated and then automatically imports it into Batch as an Inference node.
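For anyone curious how the right-click entries are wired up, Flame exposes custom MediaHub actions through its Python hooks. The sketch below just shows the general pattern with placeholder bodies, not the actual scripts; check the Flame Python hook docs for your version.

```python
# Rough sketch of a Flame MediaHub Files custom action (placeholder bodies,
# not the actual Tunet scripts); the hook name and dict layout follow the
# documented custom UI actions pattern.
def get_mediahub_files_custom_ui_actions():

    def is_folder(selection):
        # the real script would check the selection is a folder (e.g. "src")
        return bool(selection)

    def launch_tunet_ui(selection):
        # the real script would open the UI with the selected src/dst folders
        print("Launching Tunet UI for:", selection)

    return [
        {
            "name": "Tunet",
            "actions": [
                {
                    "name": "Tunet UI",
                    "isVisible": is_folder,
                    "execute": launch_tunet_ui,
                },
            ],
        },
    ]
```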
If anyone wants to test these scripts, copy them wherever you keep your scripts, and then modify the last 3 lines in the tunet/config/config.json file. I used find /usr -name conda.sh 2>/dev/null in a terminal to get the conda_init path. I figured if you can install something from GitHub, modifying 3 lines of a json will be easy.
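If it helps, the three edits boil down to pointing the JSON at your own paths. Only conda_init is named above, so the other keys in this sketch are hypothetical placeholders; open your config.json and use whatever keys are actually there.

```python
# Patch tunet/config/config.json with local paths. Only "conda_init" is named
# in the post above; the other two keys are hypothetical placeholders.
import json
from pathlib import Path

cfg_path = Path("tunet/config/config.json")
cfg = json.loads(cfg_path.read_text())

# example value: use the path printed by `find /usr -name conda.sh 2>/dev/null`
cfg["conda_init"] = "/opt/miniconda3/etc/profile.d/conda.sh"
# cfg["<second_key>"] = "..."   # hypothetical, check your config.json
# cfg["<third_key>"] = "..."    # hypothetical, check your config.json

cfg_path.write_text(json.dumps(cfg, indent=4) + "\n")
```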
Obviously, you’ll need to have Tunet installed first. I don’t think the scripts are quite Logik Portal worthy just yet, but the plan is to get them on there soon.
I only have a UI for the “simple” yaml, but I could build one for the “advanced” version if needed.
Lastly, I’m assuming this will only work on Linux. DM me if you have any issues.
I changed things with PCIe GPUs in mind as well: how the weights are updated and how data is synced between GPUs. You should get better performance.
Multi-GPU and single-GPU are now merged into one, so you don’t need anything else.
Tested on multiple 6000 Ada cards and it worked great: almost the same speed, but double the batch size.
Make sure to git clone from the multi-gpu branch; I’m keeping the main branch as the original for now.
Since I mainly use SXM cards, which automatically deal with P2P between GPUs, I ended up not paying attention to PCIe, but that issue is gone now.
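If anyone still hits P2P trouble on a PCIe-only box, the standard NCCL environment variables are worth knowing. These are generic NCCL knobs, not TuNet options.

```python
# Standard NCCL env vars (not TuNet settings): on PCIe boxes without working
# peer-to-peer, forcing transfers through host memory can avoid hangs.
import os

os.environ.setdefault("NCCL_P2P_DISABLE", "1")   # skip direct GPU-to-GPU copies
os.environ.setdefault("NCCL_IB_DISABLE", "1")    # single machine, no InfiniBand needed
```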
Sorry Chris! No, currently it is Front (RGB) only training.
While the Tunet model will eventually learn the difference, it’s not guaranteed to focus on it quickly or efficiently, especially if the differing region is small relative to the full frame.
Just to let you know, for people who want to try this on Windows: I just made it work. By default this does not work on Windows, since it needs NCCL, which is a Linux-only library, but I got it working by tweaking a bit of code and using PyTorch 2.3. I also think this could open the door to using it in Mac-only environments. I’d have to try, but I have no Macs around; I’m a PC guy.
See the screenshot below. THIS IS NOT WSL, pure Windows. Windows 10.
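For reference, the kind of tweak involved is choosing a different torch.distributed backend on Windows, since PyTorch ships gloo there while NCCL is Linux-only. This is not necessarily the exact change described above, just a sketch of the idea.

```python
# Pick the distributed backend per platform: NCCL on Linux, gloo elsewhere.
# Assumes the launcher (e.g. torchrun) has set the usual rendezvous env vars.
import sys
import torch.distributed as dist

backend = "nccl" if sys.platform.startswith("linux") else "gloo"
dist.init_process_group(backend=backend)
```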
Also, @tpo, thank you, thank you very much for this.
I could post it here if anyone wants to try it, but first I need permission from @tpo; he has the final word.
@tpo Is there a way to implement the ability to resume training from the latest checkpoint? I was looking at the flags and didn’t find one for this. Check Chris’ comment below. Thanks.
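For what it’s worth, the usual PyTorch pattern for resuming looks like the sketch below. The checkpoint path and key names are assumptions; the thread doesn’t document TuNet’s actual checkpoint format.

```python
# Generic PyTorch resume pattern (illustrative; the checkpoint keys and file
# layout are assumptions, not TuNet's actual format).
import torch

def resume(model, optimizer, path="model_dir/latest.pth"):
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["model"])           # assumed key
    optimizer.load_state_dict(ckpt["optimizer"])   # assumed key
    return ckpt.get("epoch", 0) + 1                # epoch to continue from
```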
I just tested this, and for some reason with the new/updated scripts the converted ONNX model is not working: it loads, but it just gives a solid color output. Fortunately I had a backup of the previous scripts. With the previous one it was working, but the output was quite blurry, as if it were low-res. Nuke was working fine, but Flame was not.
Could anyone try this with the new/updated scripts?
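One way to narrow down whether the export itself is broken (rather than Flame’s Inference node) is to run the ONNX directly with onnxruntime. The input layout and resolution below are assumptions; inspect sess.get_inputs() for the real names and shapes.

```python
# Quick sanity check of the exported ONNX outside Flame/Nuke using onnxruntime.
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
inp = sess.get_inputs()[0]
print(inp.name, inp.shape)

dummy = np.random.rand(1, 3, 512, 512).astype(np.float32)  # assumed NCHW layout
out = sess.run(None, {inp.name: dummy})[0]
print(out.shape, out.min(), out.max())  # min == max would match the solid-color symptom
```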