flameSimpleML - Flame Machine Learning Source/Target tool with bespoke training

The latest PyTorch (2.x) breaks with one of the Nvidia CUDA libraries on 2023.2 and 2023.3 when run within Flame's built-in interpreter. The version before that seems to work, but it is built against CUDA 11.6 while the driver ships with 11.2. The one currently bundled (1.12.1) is the latest build compiled with CUDA 10.2, and it is also much smaller in size, so I can include two copies (for Python 3.9 and Python 3.10) and the .tar file still fits into downloads on GitHub (they have an ~1.2GB limit for binary releases).
1.12.1 seems to work in most cases, including older installs on CentOS 7, but breaks with some newer Ampere cards.
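If you want to verify whether a given PyTorch build matches your card, here's a quick sketch you can run in the same interpreter that will import the bundled torch:

import torch

# CUDA toolkit version this PyTorch binary was built against (None for CPU-only builds)
print(torch.version.cuda)

# GPU architectures the binary was compiled for, e.g. ['sm_37', 'sm_50', 'sm_60', 'sm_70']
print(torch.cuda.get_arch_list())

# Compute capability of the local GPU, e.g. (8, 6) for Ampere
if torch.cuda.is_available():
    print(torch.cuda.get_device_capability(0))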


Amazing work, thanks @talosh.

Quick question though, if I wanted to do a centralized install, would I just need to define FLAMESMPLML_BUNDLE & FLAMESMPLML_PACKAGES?

Packages I get, as there's a folder of that name in the flameSimpleML folder, but for bundles, would I point that at the bundles folder that's used for flameTWML?

Those env variables are not needed here, as it runs with Flame's Python interpreter. The two additional packages it needs are NumPy and PyTorch. The script will first check if it can import them with the current PYTHONPATH, so there is an option to install them in a centralized way. If it can't find a package already installed, it will try to add the folder under its own location, packages/.lib/{python_version}/…, to the PYTHONPATH and import again. I've put NumPy and PyTorch there for Python 3.9 and 3.10, which should cover the 2023 and 2024 Flame versions.

The reason I have to use the hidden folder .lib is that it prevents Flame from scanning it and trying to import whatever Python files are there as hooks.

So for a centralized install it should be enough to drop the flameSimpleML folder into a centralized Python scripts location, and it should figure out the path to its dependencies on its own.
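The lookup works roughly like this (a simplified sketch, not the script's exact code; the folder layout matches the paths seen later in this thread):

import os
import sys

try:
    import torch
except ImportError:
    # Fall back to the copy bundled next to the script,
    # e.g. .../packages/.lib/python3.10/site-packages
    bundled = os.path.join(
        os.path.dirname(os.path.abspath(__file__)),
        'packages', '.lib',
        f'python{sys.version_info.major}.{sys.version_info.minor}',
        'site-packages',
    )
    sys.path.append(bundled)
    import torch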

Let me know if it works for you

Ah, ok. So the entire folder needs to be where the hooks are, not just the primary Flame hook?

Yes, you’d need to drop the whole folder there


So it seems Ampere GPUs will work only with CUDA 11, not 10.2. The Nvidia driver that comes with the latest flavours of 2023 has CUDA 11.2, and PyTorch 1.12.1 has a binary compiled with 11.3. CUDA has some forward compatibility, so it should be able to run as long as the major version is the same. I can try to pack the next version into two different packages: one for Python 3.9 (Flame 2023 and early 2024) and one for Python 3.10 (later Flame 2024 - I'm not sure in which exact version it changed). Perhaps it would make more sense to have two bundles for different Flame versions, both with Ampere card support.

Could you guys on 2024 check from which version it switched to Python 3.10, please?
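To check, you can run this in the Flame Python console:

import sys

# The first numbers printed are the Python version Flame is running
print(sys.version)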

Fairly certain this is related to what you're touching on, @talosh … I can successfully train a model after updating the packages following the steps you mentioned earlier in the thread, but it fails when I go to apply the model.


/opt/Autodesk/user/kyle/python/flameSimpleML/packages/.lib/python3.10/site-packages/torch/cuda/__init__.py:146: UserWarning:
NVIDIA RTX A6000 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.

From what I can tell, PyTorch is 1.13.1+cu117.

This is on Flame 2024.2.1

Could you please check if there's any other PyTorch involved apart from the one that lives in the .lib folder? You can also check the version from within the Flame Python console:

import inspect
import torch

# Report which torch build is active and where it was imported from
print(torch.__version__)
print(inspect.getfile(torch))

This should show both the torch version and where it is being imported from.

If the training script is running fine, then the same environment should definitely work for inference too. Maybe restarting Flame, or rebooting the whole box, would help?

Interesting. The results from the console are:

1.12.1+cu102
/opt/Autodesk/user/kyle/python/flameSimpleML/packages/.lib/python3.10/site-packages/torch/__init__.py

That said, a full reboot solved the issue!

On another note, it would be nice to be able to define the locations for models as well as for datasets at some point.

Many thanks again for all your hard work.

It should be possible in v0.0.3. A dataset is simply a folder with uncompressed EXRs, and one can either use a script to export it or do it by hand. The training script has a --model_path argument, so one can specify the path and filename for the model. The apply script can browse for models and load them from any location. There are just some defaults, used when no model path is specified, for where to drop training data and how to name it.
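For example, something like this (paths here are placeholders):

train.py /path/to/dataset --model_path /path/to/models/my_model.pth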


This would be great to have.


Excellent call Alan.

I’d love some AI-generated normal maps too, but with a good depth map the normal map should not be necessary.

@talosh

What is the command-line syntax to train if we manually export the images?

Thanks.

It’s just train.py {path_to_folder}

There are more options to tweak training parameters, model path and type, etc., which you can check by running it with --help.

The folder should contain source and target subfolders with uncompressed EXR sequences.

I’m testing it with Flame’s Python, but it should work with any Python 3.9 or 3.10.
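So a minimal dataset layout looks something like this (the frame names are just an example):

my_dataset/
    source/
        0001.exr
        0002.exr
        ...
    target/
        0001.exr
        0002.exr
        ...

and training is started with:

train.py /path/to/my_dataset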

Hi Andriy,

I’m just following the demo video, and when I select ‘Train SimpleML Model’ with two clips selected, it’s skipping the choose-filepath prompt and going straight to this pop-up:

[screenshot of the error pop-up]

Apologies if I’m doing something stupid, it’s Monday morning. I’m running Flame 2024.1.2 on Linux. Any ideas what the issue might be?

Thank you!

Hi, it looks like you’re running the very first v0.0.1, while the video was made with v0.0.3. Could you please download the new one from here: https://github.com/talosh/flameSimpleML/releases/tag/v0.0.3

Just erase the flameSimpleML folder from v0.0.1 and replace it with the one from v0.0.3.

That worked, thank you! Glad it was something simple.

Hi @talosh. I tested the script, and it worked perfectly after updating the PyTorch version as you instructed. I just have one question: at a certain point, about three hours into training, the computer froze for some unrelated reason, and I had to shut it down and restart it. Is it currently possible to continue training from the already saved model state in the home folder, so as not to start from scratch and instead improve what has already been trained?

Hi Wilton, sure, you can specify the model state file with the --model_path argument and continue training, or amend / fine-tune a pre-trained model on a different dataset. I’m not sure if it is doing all the checks at the moment in terms of the number of input and output channels in case you use a different dataset, but as long as they are the same it should work fine. If the machine froze while saving the model state file, the file might be corrupt, so it might be a good idea to make a copy from time to time.
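For example, to continue from a saved state and keep a safety copy (paths are placeholders):

train.py /path/to/dataset --model_path /path/to/saved_model.pth
cp /path/to/saved_model.pth /path/to/saved_model.backup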


Thanks, I will try to do that later.