Releasing TUNET - A training ML tool

cristhiancordoba · July 22, 2025, 10:35pm

Are you using a Cloud GPU server for this?

philm · July 22, 2025, 10:38pm

~$500k rig on site in a vfx company? improbable…
not to mention storage and networking and power…
but maybe…
but improbable…

tpo · July 22, 2025, 11:06pm

Hey @philm @cristhiancordoba yes it is local, incl 800Gb fiber conection and hot storage. You need and you want to be local some clients does not allow or they dont want thire data/footage going to elsewhere on a 3rd party vendor where can end up on datasets or loose control. On those cases, local is the only way to control and lock the clients data.

I posted sometime ago if you scrool up on this post my other older GPU server, 8x H200s.

recently sold to a University.

Is a thing, and i confess, im going way to much haha

philm · July 22, 2025, 11:37pm

I love it.
but your situation is no longer as part of a vfx company.
so there’s that.

Stefan · July 29, 2025, 5:49pm

Has anyone succeeded running Tunet on MacOS? (sonoma 14.7.6), mackbook pro M1 max.
getting various error with imports. like this one:
Traceback (most recent call last):

File “train.py”, line 33, in

from torch.amp import GradScaler, autocast

ImportError: cannot import name ‘GradScaler’ from ‘torch.amp’ (/opt/miniconda3/envs/tunet/lib/python3.8/site-packages/torch/amp/_init_.py)

Changing the import tofrom torch.cuda.amp import GradScaler, autocastlead to other errors:

Error initializing DDP: module ‘torch._C’ has no attribute ‘_cuda_setDevice’. Check DDP environment variables (RANK, WORLD_SIZE, MASTER_ADDR, MASTER_PORT, LOCAL_RANK) and NCCL/Gloo.

Traceback (most recent call last):

File “train.py”, line 1206, in

train(config)

File “train.py”, line 578, in train

setup_ddp(); rank = get_rank(); world_size = get_world_size()

File “train.py”, line 62, in setup_ddp

torch.cuda.set_device(local_rank)

File “/opt/miniconda3/envs/tunet/lib/python3.8/site-packages/torch/cuda/_init_.py”, line 408, in set_device

torch.\_C.\_cuda_setDevice(device)

AttributeError: module ‘torch._C’ has no attribute ‘_cuda_setDevice’

Stefan · July 29, 2025, 5:54pm

The formatting is weird, sorry.
grad_scaler.py is there by the way:
/opt/miniconda3/envs/tunet/lib/python3.8/site-packages/torch/cuda/amp/grad_scaler.py
So the first message makes sense?
Did I get the wrong install maybe?

when pip installing:
Downloading torchvision-0.17.2-cp38-cp38-macosx_10_13_x86_64.whl.metadata (6.6 kB)
Downloading torchaudio-2.2.2-cp38-cp38-macosx_10_13_x86_64.whl.metadata (6.4 kB)
Downloading torch-2.2.2-cp38-none-macosx_10_9_x86_64.whl (150.6 MB)

tpo · July 29, 2025, 7:22pm

Error indicates you are trying to use Linux version into Mac.
Download the multios one or use this git clone directly, just added:
git clone --branch multios --single-branch

MacOS does work, but i was very sad with the speed of the training, is crazy slow.
After search more i get to know that not even Apple uses Apple silicon for training.
Is just too slow for such a task.

Stefan · July 29, 2025, 8:01pm

Thanks Thiago @tpo !
Makes sense, I did forgot to specify the branch when cloning.
re-installing now

Stefan · July 29, 2025, 8:03pm

hmm, this time I did
git clone --branch multios --single-branch https://github.com/tpc2233/tunet.git

… but still getting this error:

Traceback (most recent call last):
File “train.py”, line 30, in
from torch.amp import GradScaler, autocast
ImportError: cannot import name ‘GradScaler’ from ‘torch.amp’ (/opt/miniconda3/envs/tunet/lib/python3.8/site-packages/torch/amp/init.py)

Stefan · July 29, 2025, 8:09pm

Could it be something with pip install torch torchvision torchaudio getting the wrong version? grad_scaller is not directly under torch in my conda env, but inside the cuda folder

Stefan · July 29, 2025, 8:18pm

Alright, I just changed the import again in train.py (but this time with the proper cloned repo) and it seems to be working (from torch.cuda.amp import GradScaler, autocast)

13:17:59 [INFO] Epoch[1] Step[330] (330/500), L1:0.0433(Avg:0.1289), LR:1.0e-04, T/Step:0.817s (D:0.001 T:0.003 C:0.170)
13:18:03 [INFO] … (D:0.001 T:0.002 C:0.155)
13:18:03 [INFO] Epoch[1] Step[335] (335/500), L1:0.0863(Avg:0.1284), LR:1.0e-04, T/Step:0.817s (D:0.001 T:0.002 C:0.155)
13:18:07 [INFO] … (D:0.001 T:0.001 C:0.163)
13:18:07 [INFO] Epoch[1] Step[340] (340/500), L1:0.0692(Avg:0.1276), LR:1.0e-04, T/Step:0.818s (D:0.001 T:0.001 C:0.163)
13:18:11 [INFO] … (D:0.001 T:0.001 C:0.160)
13:18:11 [INFO] Epoch[1] Step[345] (345/500), L1:0.0893(Avg:0.1268), LR:1.0e-04, T/Step:0.819s (D:0.001 T:0.001 C:0.160)

tpo · July 29, 2025, 8:20pm

Not sure if amp is supported in all M chips.
Try disable amp on your config if you are using.

Stefan · July 29, 2025, 8:53pm

Guessing amp is True by default?
Using a simple config, without setting amp to anything. Seems to be working now (?)
If the results are not as expected I’ll try to set it to False.

EDIT:
the info was in the terminal:
Optimizer: AdamW | AMP Enabled: False

Stefan · August 1, 2025, 1:09am

I got it to install, run and convert on Mac and Windows, but it’s not doing what it should.
Rulling out the mac, on a windows10 RTX4090, with datasets at 1164x1620 (crops), with AMP turned on, getting this error:
16:47:09 [WARNING] AMP requested but device is CPU. AMP disabled. 16:47:09 [INFO] Optimizer: AdamW | AMP Enabled: False

Then on CPU it’s like 9s per step
Is this something to set on the windows machine (CPU/GPU) ? What am I doing wrong?
Has anybody else ran into this problem?

Stefan · August 1, 2025, 1:53am

I followed finnschi’s solution in the issue’s question and it worked.

about a second per step on this 4090, at 1164x1620 (crops)

@finnjaeger is that you? (finnschi)

finnschi’s solution ===>
conda activate tunet
pip uninstall torch torchvision torchaudio -y
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

verify that its working:
python -c “import torch; print(torch.cuda.is_available()); print(torch.cuda.get_device_name(0)); print(torch.version.cuda)”

===========
(tunet) C:\Users\stefan\tunet>python -c “import torch; print(torch.cuda.is_available()); print(torch.cuda.get_device_name(0)); print(torch.version.cuda)”
True
NVIDIA GeForce RTX 4090
11.8

finnjaeger · August 1, 2025, 2:07pm

yea thats me, had to do anlot more things to get AMP to work, to get my s/step down, i can send you my fork if you want, but idk if I made things better or worse

cristhiancordoba · August 1, 2025, 2:35pm

Send pls, would like to see if it reduces my training time.

mlheureux · August 1, 2025, 5:10pm

Curious to know where the aov frames are initially generated? Ran the archive and it works as expected, but obviously the results for aov separation are less than ideal when piped a new clip. Thus new training needs to be done but not sure where to create the different AOV’s to do the training on?

finnjaeger · August 1, 2025, 6:50pm

~~ill push it to my github later~~

I am too lazy :

tunet_windowsTunedByFinn.zip (303.3 KB)

cnoellert · August 1, 2025, 7:04pm

The training would be done via Tunet combined with the models mentioned by @tpo on your new source clips.

Topic		Replies	Views
Making Rain - Flame's inference experiment Tools and Tech machine-learning , batch , ai , inference	11	676	April 22, 2025
It's a good day - Flame 2025.1 is here! 🔥🚀 Autodesk News update	46	1281	August 8, 2024
flameSimpleML - Flame Machine Learning Source/Target tool with bespoke training Flame Questions ml	93	3077	February 1, 2025
TUNET is now a app, with a UI, no more cmds Tools and Tech machine-learning , batch , inference	10	753	August 31, 2025
Talkin' Timewarps Tips and Tricks	69	2800	March 18, 2021

Releasing TUNET - A training ML tool

Related topics