Not really. It’s not that the code isn’t optimized. The inference algorithms ML needs are very specific and very compute-intensive. So unfortunately it’s not a matter of a smart person fiddling with a few lines and bingo.
AI in general is a performance challenge on many levels.
Hi guys, v0.5.0 is a bit all over the place at the moment in terms of different models, the general approach to getting data out of Flame and back, and so on.
Also, most of the work I’m doing in that branch is currently focused on being able to train it on one’s own data, as that is becoming more and more of an issue.
So I would not recommend using v0.5.0 at the moment, and I’ll give you a shout when it’s become something useful.
Actually, PyTorch has made quite a lot of progress on Apple silicon recently and it is not as slow as it used to be.
It looks like I need to take v0.4.3 a bit further and adapt it to newer versions of everything. I will try to wrap up something that might work from the old branch and will give you guys a shout when it’s there to test.
Ok, should I try 0.4.3 then? I just tried 0.4.4; it installs, but when I try to run it I get this:
/bin/bash: /Users/carl/Documents/flameTimewarpML/miniconda3/bin/conda: No such file or directory
/bin/bash: conda: command not found
Traceback (most recent call last):
File "/Users/carl/Documents/flameTimewarpML/bundle/inference_sequence.py", line 3, in <module>
import cv2
ModuleNotFoundError: No module named 'cv2'
logout
EDIT: Ahh I see, there is only a Linux version for 0.4.4
Just let me try to see if I can amend 0.4.3 quickly to be able to use the new PyTorch on Apple silicon. That might be easier. I’ll give you a shout when there’s something to test, hopefully over the weekend.
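As a rough illustration of what that amendment involves (this is standard PyTorch API, not flameTimewarpML’s actual code, and the helper name is just for the example), the same inference code can pick whichever backend is available:

```python
# Minimal sketch: choose CUDA on Linux boxes, Metal (MPS) on Apple silicon,
# and fall back to plain CPU otherwise.
import torch

def pick_device():
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()
print(f"Running inference on: {device}")
```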
Thanks @talosh!! We’ll test as soon as it’s ready!
For what it’s worth, when processing a 4K clip on 0.4.3 with an M2 Ultra Mac we found that it would utilize a single core. When halving the res of the same clip to 2K it processes using 6 cores, and when processing at 1/4 res it leverages 22 cores.
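Not an explanation of that behaviour, but for digging into it, PyTorch’s standard threading API (nothing flameTimewarpML-specific) shows what the CPU backend is actually configured to use:

```python
import torch

# How many threads PyTorch will use for intra-op parallelism (single heavy ops)
print("intra-op threads:", torch.get_num_threads())
# And for inter-op parallelism (independent ops running concurrently)
print("inter-op threads:", torch.get_num_interop_threads())

# The count can also be pinned explicitly, e.g. to the 16 performance cores
# of an M2 Ultra (illustrative value only).
torch.set_num_threads(16)
```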
OpenCV has dropped support for EXR files. I’ve found some code that reads uncompressed EXR files and amended it to be able to write them in pure Python. No compression is supported, but I think that’s fine for not having to maintain additional complicated dependencies.
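For context, here’s a minimal sketch (not the actual flameTimewarpML code) of what pure-Python EXR handling looks like at the header level: walking the attribute list to confirm a file really is uncompressed. The layout follows the OpenEXR file spec; the function names are just for illustration.

```python
import struct

EXR_MAGIC = 0x01312F76  # OpenEXR magic number, stored little-endian in the file

def read_null_terminated(f):
    """Read a null-terminated byte string from the file."""
    chars = bytearray()
    while True:
        c = f.read(1)
        if not c or c == b"\x00":
            break
        chars.extend(c)
    return bytes(chars)

def exr_compression(path):
    """Return the compression code from an EXR header (0 = uncompressed)."""
    with open(path, "rb") as f:
        magic, version = struct.unpack("<ii", f.read(8))
        if magic != EXR_MAGIC:
            raise ValueError(f"{path} is not an EXR file")
        # The header is a list of attributes: name\0 type\0 int32 size, then data.
        # An empty name terminates the header.
        while True:
            name = read_null_terminated(f)
            if not name:
                break
            attr_type = read_null_terminated(f)
            size, = struct.unpack("<i", f.read(4))
            value = f.read(size)
            if name == b"compression":
                return value[0]
    raise ValueError("compression attribute not found in header")
```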
I’ve been getting a CUDA error in Rocky Linux / Flame 2024.1.1 - TWML v0.5.0 dev.005. Wondering what the fix might be. Tried converting the clip to HD, and restarting. No help.
Error: CUDA error: no kernel image is available for execution on the device
CUDA kernel error might be asynchronously reported at some other API call, so the stack trace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
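Not a fix as such, but that particular error usually means the installed PyTorch build wasn’t compiled for the card’s compute architecture (sm_86 for an RTX A5000). A rough diagnostic sketch, run from the same Python environment the plugin uses:

```python
# Check whether the installed PyTorch build includes kernels for this GPU.
import torch

print(torch.__version__, torch.version.cuda)
print("Device:", torch.cuda.get_device_name(0))
print("Compute capability:", torch.cuda.get_device_capability(0))
print("Architectures in this build:", torch.cuda.get_arch_list())
# If sm_<major><minor> from the capability above is missing from the arch list,
# a newer PyTorch / CUDA wheel is needed.
```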
Yeah, I still haven’t gotten it to work. Tried removing and re-installing. Something seems broken. Running RTX A5000, so I think it’s related to that. I can’t get the initial install to happen when starting Flame. Went through the steps @randy posted on LP, but no dice. Wondering if I criss-crossed the older version with the newer? Anyone know how to clear off the older version completely to get a fresh install?
I’m working on some sort of solution for enabling the 0.4 branch to work on newer hardware / GPUs.
It looks like it would be possible to have a bespoke Python environment with all the packages needed alongside the one bundled in Flame, using “conda-pack” (Conda-Pack — conda-pack 0.7.0 documentation).
If it works, it would make it easier to get it working on new hardware.
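As a rough sketch of what that packing step could look like (using conda-pack’s Python API; the environment name and archive name here are just placeholders, not the plugin’s actual install code):

```python
# Pack an existing conda environment into a relocatable archive that can ship
# alongside Flame's bundled Python. Assumes an env called "timewarp" already
# contains all the required packages.
import conda_pack

conda_pack.pack(
    name="timewarp",               # hypothetical environment name
    output="timewarp-env.tar.gz",  # archive to bundle with the plugin
)

# On the target machine: untar the archive, `source <env>/bin/activate`,
# then run `conda-unpack` once to fix up the hard-coded path prefixes.
```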
Hi guys, I’ve made some progress on getting Timewarp compatible with 2025 and did some tests on macOS as well with a very recent PyTorch 2.2.
There’s still one function that has not been accelerated for Mac in PyTorch, but the rest are.
Testing on the same 2K test sequence of 36 images at a 50% slowdown:
Mac Mini M2 (Metal) - 39 sec
Mac Mini (CPU, single thread) - 121 sec
Linux P5000 - 16 sec
Linux (CPU, single thread) - 158 sec
Training time on Mac is still about 3-4 times slower compared to Linux, due to incomplete backpropagation support for nn.functional.grid_sample on Metal. But it is possible to train and fine-tune models as well.
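For anyone wanting to poke at this on their own machine, here’s a small sketch (standard PyTorch API, not flameTimewarpML code; tensor sizes are arbitrary) showing grid_sample running on Metal with the CPU fallback enabled for ops whose backward pass isn’t implemented on MPS yet:

```python
import os
# Fall back to CPU for ops that MPS doesn't implement yet (e.g. parts of the
# grid_sample backward pass). Must be set before torch is imported.
os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")

import torch
import torch.nn.functional as F

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

# Tiny warp example: the forward pass runs on Metal, while the backward pass
# may fall back to CPU, which is why training is slower than on CUDA.
img = torch.randn(1, 3, 64, 64, device=device, requires_grad=True)

# Identity sampling grid in normalized [-1, 1] coordinates
ys, xs = torch.meshgrid(
    torch.linspace(-1, 1, 64), torch.linspace(-1, 1, 64), indexing="ij"
)
grid = torch.stack((xs, ys), dim=-1).unsqueeze(0).to(device)

warped = F.grid_sample(img, grid, align_corners=True)
warped.mean().backward()  # gradients flow, just not fully accelerated on MPS
```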
I’ve made some progress with bringing the old “interfaceless” Timewarp closer, and it now has training code as well.