flameSimpleML - Flame Machine Learning Source/Target tool with bespoke training


Hi guys, I’ve put several scripts together in a package called flameSimpleML:

https://github.com/talosh/flameSimpleML/releases/tag/v0.0.1

This is a “copycat”-style model with scripts that let you train it on your own “source/target” data.

The training script is command-line only at the moment. To train your model, create a folder somewhere for your dataset, then create two more folders inside it named “source” and “target”. Export your training data there as uncompressed exr sequences with no alpha channel. The script assumes that the “source” and “target” sequences have the same dimensions and the same number of frames, and that the exrs are uncompressed.
When you run the script it creates a third “preview” folder within the dataset folder, where you can monitor the progress of the model (hopefully) getting smarter.
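
The folder layout described above can be sketched in a few lines of Python. This is only an illustration and not part of flameSimpleML: the function names `prepare_dataset` and `check_dataset` are hypothetical, and the check only counts `.exr` files in each folder. It does not verify dimensions, compression, or the absence of an alpha channel, which would need an EXR reader.

```python
from pathlib import Path


def prepare_dataset(root):
    """Create the 'source' and 'target' folders under a dataset root."""
    root = Path(root)
    for name in ("source", "target"):
        (root / name).mkdir(parents=True, exist_ok=True)
    return root


def check_dataset(root):
    """Return a list of problems found in the layout (empty if none)."""
    root = Path(root)
    problems = []
    counts = {}
    for name in ("source", "target"):
        folder = root / name
        if not folder.is_dir():
            problems.append(f"missing folder: {name}")
            continue
        # Count exported frames; only the file extension is checked here.
        counts[name] = len(list(folder.glob("*.exr")))
        if counts[name] == 0:
            problems.append(f"no .exr frames in: {name}")
    # Both sequences are expected to have the same number of frames.
    if len(counts) == 2 and counts["source"] != counts["target"]:
        problems.append(
            f"frame count mismatch: source={counts['source']}, "
            f"target={counts['target']}"
        )
    return problems
```

Running `check_dataset` on the dataset folder before starting a training run gives a quick sanity check that both sequences were exported and are the same length.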

Training does not take a lot of GPU RAM and can run in the background, so you can continue to use Flame.
One of the simple tests I’ve been using while writing it is to give it the same sequence in colour as the target and in bw as the source, and teach it to colorise frames.

Trained model data is saved every 1000 iterations into your “homefolder/flameSimpleML_models/” as a .pth file.
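
To find the most recent saved model from a script, something like the sketch below could work. It is an illustration only: the helper name `latest_checkpoint` is hypothetical, it assumes the models folder sits directly under your home directory, and it picks the newest file by modification time because the naming of the .pth files is not described here.

```python
from pathlib import Path


def latest_checkpoint(models_dir=None):
    """Return the most recently modified .pth file, or None if there is none."""
    if models_dir is None:
        # Assumed location: "flameSimpleML_models" inside the home folder.
        models_dir = Path.home() / "flameSimpleML_models"
    models_dir = Path(models_dir)
    if not models_dir.is_dir():
        return None
    checkpoints = list(models_dir.glob("*.pth"))
    if not checkpoints:
        return None
    # Newest by modification time, since file naming is not assumed.
    return max(checkpoints, key=lambda p: p.stat().st_mtime)
```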

To apply a model, select it from the menu and navigate to that folder to load it. You can use “F1 / F4” to see before / after.

This is the very first release. I’ve been testing it mostly on Linux with Flame 2023.3, with some very limited testing on a Mac mini M2 using the Flame 2025 tech preview. It might work on Intel Macs if you sort the PyTorch and NumPy dependencies out (give me a shout if you would like to try).

40 Likes

This is really cool, thank you.

I’ve used some copy cat on a current job. I’ll try to rerun some of these scenarios on this script to test it and compare. Might be a few weeks before I get to it though.

1 Like

Thank you for doing this, Andriy.

Thaaaanks!

You are doing so many great things for this platform. Thanks for all that you do!

5 Likes

We need to pass the hat for this dude. However many pints he needs…:beer:

13 Likes

amazing @talosh !!!

@tpo

1 Like

This is fantastic talosh, loved how you are managing things.
Spatial conv training in Flame, neat!

:clap: :clap:

4 Likes

This is super cool, and also amazing that an end user is doing the work that ADSK corporate should be doing but can’t.

Would be great if there was a screen record walk thru of install and usage.

7 Likes

Oh man I can’t wait to play with this. you’re a legend :pray:

Seriously. Why is this coming from an unpaid user and not the corp that we pay and that has sufficient resources to do this? Why?

@fredwarren, any insight on this?

3 Likes

Maybe you’ve noticed the sheer number of lawsuits flying around with regard to AI processes? I think it’s a legal minefield to implement at scale.

Generative AI Lawsuits Timeline: Legal Cases vs. OpenAI, Microsoft, Anthropic and More - Sustainable Tech Partner for Green IT Service Providers.

1 Like

The reason for those potential problems is the data being used to train ML models without consent, not the models per se. And this tool allows training using your own data.

2 Likes

Good idea, I’ll try to do something tomorrow.

4 Likes

I think that’s exactly right - the honeymoon with ML is over, where everyone just swoons instead of asking questions. Now the corporate lawyers are saying ‘not so fast’, and they’re definitely right to do so.

Those AI tools have two components - the technology and the data. The technology is evolving and getting better, but the breakthrough in the last year has been training on very large datasets. And it’s those datasets that are quickly becoming the issue. We briefly discussed in the original thread where the training data for these video tools comes from, and there are definitely valid issues. Many of the datasets were created for research purposes, not commercial use. Consent requirements are very different for those two scenarios.

It’s the equivalent of us thumbing our noses at stock agencies like Getty and ActionVFX and just grabbing files from the Internet for our next Superbowl ad. Except many of the data owners don’t know their stuff has been used, and they don’t have the resources to do anything about it, unlike Getty would.

That said, I think it would be helpful for companies like ADSK to have some of these conversations more publicly rather than just withholding tools without context. That will be the first step to create broader awareness, which is required to marshal the resources to fix it.

In the meantime, tools that you can train yourself (@talosh’s effort here and Nuke’s CopyCat) can step into the gap to a certain degree. But make no mistake, it’s not the same thing. I use CopyCat regularly and with success, but it’s a different use case than what pre-trained models on large datasets can accomplish, so it’s not a replacement. Some of these quite useful tools only work if trained on massive datasets, and that is not something that’s economical for individual artists to do.

First of all: great stuff @talosh :clap:t2:

I can give you some insight (even though a good part has already been answered)

  1. The biggest reason we stopped delivering ML tools for the moment is that we are indeed looking at all the legal implications of using data sets that may or may not (that’s the whole point for now, nobody really knows) be illegal to use. We now have some customers who are asking us for lists of the data sets that were used to train our tools, and if they are not explicitly free of licence then they won’t use our product. That being said, that doesn’t mean we are not working on new tools and searching for ways to deliver them in a totally legal way.

  2. This may not sound right in writing, but keep in mind that if Andriy can deliver this tool it’s because Autodesk is taking the time to develop and maintain platforms for vendors/users to do so. The goal here is not to take the credit for what has been done, but please realize that if you can use OpenFX tools from vendors, shaders from other artists, or ML stuff from Andriy it is because we took the time to implement/maintain OpenFX, Matchbox, Lightbox, Pybox, Python, Custom Actions, etc. This is not different than what other vendors are doing (think about Gizmos in Nuke). This gives the opportunity to multiple vendors/users to come up with more tools than what we could deliver ourselves and let them decide which ones they get instead of us deciding. And of course, developing/maintaining these tools doesn’t come for free; it takes a lot of time to do so.

13 Likes

But in the CopyCat paradigm, the dataset is not pre-trained, and is the end user’s local data. Wouldn’t that inherently have no legal quagmire?

3 Likes

That’s correct.

1 Like

Thanks for the info, Fred.

1 Like