Machine Learning - implementations and implications

We’ve all been reading about or using text-to-image generation, timewarps, copycats, and other implementations of machine learning. I came across this article this morning about StableDiffusion and its implications.

Are we going to be replaced by some algorithm?
The author says “No-one expected creative AIs to come for the artist jobs first, but here we are!” when talking about concept designers. Do you feel threatened?
Or are we going to have to learn some other tool to create our images?
What will happen when the time to deliver a shot is cut down to one fifth of the time? Will you charge more for your time or take on more jobs?
Will there be more jobs if the market is saturated by bots?

And maybe this should be a separate thread (@randy) but how do you think ML should be implemented in Flame?

  • Style transfer
  • Filter generation
  • Clean-up
  • Image-2-image
  • Timewarp (hello @talosh :grin:)

What should the workflow look like? It probably wouldn’t be parametric, as each instance of the algorithm might create a different result, but maybe an AI tab next to Batch with different nodes of implementations like the ones I listed…

Or something like this?


Just played around with StableDiffusion for a bit. The results were some of the better ones I’ve seen from the various text-to-image models I’ve played around with. The pace of progress in these AI models is impressive, but I don’t worry about my job security at all.

I think there is something about a computer generating an image from whole cloth that captivates the mind of the average person. It lends itself to a certain level of anthropomorphism since, until very recently, the generation of images has been wholly the province of humans. This can lead us to believe that AI is “coming for our jobs” as if they were just people in another country where our jobs can be outsourced. They may be coming for our jobs, but not because they can create 512x512 stills based off of text prompts.

It seems to me that in this initial phase of AI text-to-image generation, the best use case as it applies to our jobs is for matte painting. It’s pretty cool to enter a prompt and receive back an image of that prompt, but Google image search already does the same thing. Almost everything in the world has been photographed a million times from every angle, in every lighting situation, in every season. I’d still much rather go look for a real photograph than mash every photoreal, sharp, detailed, etc. modifier I can think of into a text-to-image algorithm and still receive something that I would generously describe as “painterly.”

There is an inherent limitation to any algorithm and that is the data it was trained on. That makes it hard for me to believe AI will ever be able to compete with humans on a creative level. Whereas a human has the capacity to imagine something that has never existed before, an algorithm is unable to create something that isn’t derived from the data it has been fed.

My favorite prompts to feed into these image generators are usually something along the lines of

  • The most complex thing ever
  • The most intricately detailed object in existence
  • An object that doesn’t exist
  • Something that has never been thought of before
  • A photo of nothing

Whereas those prompts are so open ended that you can imagine a person coming up with nearly anything, these image generators will usually return images that are clearly derived from some of the most intricate and detailed objects ever created by people. Think baroque sculpture and architecture, or meticulously crafted gold vases and platters. Or they just completely fall over and generate mush. StableDiffusion seems to like generating meaningless text when it can’t come up with anything else.

From a financial standpoint though, I don’t imagine many execs are thinking, “Damn how can we get our concepting costs down?” and the Alpaca example shows that AI is still just a tool for an artist to use. Actual art direction isn’t going anywhere, nor are the clients who want to change this or that meaningless pixel so they can feel useful.

The actual practical uses of AI are getting far less attention in my mind, because they aren’t as flashy. We’ve already outsourced all of the jobs that will be the first to become automated, for the exact reason they were outsourced in the first place. They are labor intensive, time consuming, and relatively simple tasks. Roto, cleanup, and camera tracking are the most financially viable applications of AI software in the short term and while these tasks are mostly simple, automating them is not. I saw my first demo of AI roto six or seven years ago and it’s still not in widespread production anywhere that I’ve seen. If it’s another six or seven years until all roto is generated through AI software, then what’s the runway for completely replacing far more complicated tasks?

Perhaps this is naive, but the technical hurdles seem so great that I think most people here will be retired by the time there is any substantial threat to their jobs. The computing power alone may be a limiting factor for many of the things we do.

I frequently think about Gravity when AI is being discussed. When it was released I remember seeing a show-and-tell that discussed render times. The compute power was astronomical and that was when the computer was explicitly being told exactly what to do. Now imagine the computer had to do the creative part in addition to all the rendering. Most of these text-to-image generators are taking several seconds to a minute to generate one very low-res still image. Now complicate that by asking for video, and then complicate it even more by asking for 4K. The resources just don’t exist to entirely replace our jobs. There is a tendency in these conversations to underestimate how much of the “computing” goes on in our brains since it’s difficult to quantify in exact terms.
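To put rough numbers on the stills-to-video jump, here is a back-of-envelope sketch. Everything in it is an assumption for illustration: the 10 seconds per 512x512 still, reading "4K" as UHD 3840x2160, 24 fps, a 5 second shot, and above all the idea that generation time scales linearly with pixel count. It is not a measurement of any particular model.

```python
def estimate_shot_seconds(seconds_per_still, still_px, target_px, fps, shot_seconds):
    """Scale a per-still generation time up to a full shot.

    Assumes cost grows linearly with pixel count (an assumption, not a
    measured property of any model) and that every frame is generated
    independently.
    """
    pixel_ratio = (target_px[0] * target_px[1]) / (still_px[0] * still_px[1])
    frames = fps * shot_seconds
    return seconds_per_still * pixel_ratio * frames


# Hypothetical inputs: 10 s per 512x512 still, UHD target, 24 fps, 5 s shot.
total = estimate_shot_seconds(10, (512, 512), (3840, 2160), 24, 5)
print(f"{total / 3600:.1f} hours")  # prints "10.5 hours"
```

Under those assumptions a single 5 second shot comes out at roughly ten and a half hours per iteration, before any client notes. Faster hardware or a better model changes the inputs, not the shape of the arithmetic.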

As for what kind of AI/ML implementations I’d like to see in Flame, none of it is creative.

  • Object identification for roto, automated cleanup, and depth passes with a cryptomatte style selection interface
  • Some module akin to openFX or matchboxes that you could load bespoke trained algorithms into like the ML timewarp
  • Camera tracking with depth passes and lens distortion corrections
  • Some kind of automated paint in-fill
  • depth and object aware defocusing
  • something like Nuke’s copycat
  • degrain and regrain nodes

Speaking of…DALL·E: Introducing Outpainting


ML is definitely an amazing addition and the innovation happening is very much welcome.

I’ve used Nuke’s CopyCat on numerous occasions to get me out of corners successfully. And other uses like upressing doc footage, generating masks of facial features, holding out moving objects on 3D trackers are all super useful things, and especially timesavers.

One thing to keep in mind is that ML is a subset of AI. It’s the part where training on massive datasets can effectively get at some nuances humans can’t ever do themselves in any reasonable time. But it also has to be trained, and you need suitable training datasets. As such it usually just creates variations of existing concepts, but cannot deal with a white canvas like a human could. There are also huge biases at work, because of what training datasets are used or available. On that note, I think using Nuke CopyCat is a good exercise, because you do the training and can see what helps/hurts, vs. applying some black box that someone else trained.

One of the pitfalls of any AI model is that you cannot predict the quality or answer it will come up with. So they’re great time savers and Hail Mary moves, but if something has to be done, AI cannot be a strategy, but just one piece in the toolbox. As the saying goes ‘Hope is not a strategy’.

Along those same lines, actually, you can use any parametric method as part of your pipeline and keep rendering and get the same result. ML based processes should actually be baked as soon as you have a good result, because successive render passes will not be guaranteed to be pixel accurate. So if your client signed off on it, you better have that baked. That creates challenges in the pipeline and how to handle revisions. That is also evident in that most current generation masks/mattes are pretty good if you need a soft area selection for some color correction, but generally don’t enable pixel accurate selection needed for comping from what I’ve seen so far.
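A toy sketch of that "bake it once it's approved" idea, in plain Python. None of this is Flame or Nuke API: `ml_render` is a made-up stand-in for an ML node whose output drifts between runs, and `BakeStore` is a hypothetical cache keyed by shot and frame. Real tools may well be repeatable with a fixed seed on the same hardware; the point is only that you stop depending on that.

```python
import hashlib
import random


def ml_render(frame, rng):
    """Hypothetical ML node: returns 4 'pixels' that vary slightly per run."""
    return [round(frame * 0.1 + rng.random() * 0.01, 6) for _ in range(4)]


class BakeStore:
    """Freezes the first result per (shot, frame) and reuses it afterwards."""

    def __init__(self):
        self._baked = {}

    def render(self, shot, frame, rng):
        key = (shot, frame)
        if key not in self._baked:
            # First good result gets frozen; later calls never re-run the model.
            self._baked[key] = ml_render(frame, rng)
        return list(self._baked[key])  # copy so callers cannot alter the bake

    def checksum(self, shot, frame):
        """Fingerprint of the baked pixels, e.g. to log alongside a sign-off."""
        data = repr(self._baked[(shot, frame)]).encode()
        return hashlib.sha256(data).hexdigest()


rng = random.Random()  # unseeded on purpose, to mimic run-to-run drift
store = BakeStore()
first = store.render("sh010", 1001, rng)
second = store.render("sh010", 1001, rng)
print(first == second)  # True: the baked frame is identical on every request
print(ml_render(1001, rng) == ml_render(1001, rng))  # live re-runs will very likely differ
```

The revision problem shows up right away in a sketch like this: a new client note means throwing away the bake for that shot and regenerating, which is exactly the pipeline headache described above.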

In the foreseeable future I think ML based tools will make us more efficient, but they cannot take the place of predictable or super accurate processes, and they will struggle if we have to expand horizons or go down more specific paths.

Some of these limitations no doubt will shrink as time progresses, some others are inherent to the process.

There are also ethical issues to consider, and there is a whole body of research on discrimination by ML as it impacts ever more parts of our lives. But specifically to our industry, what if our ML models have not been trained on faces of minorities (due to ignorance or due to lack of sufficient samples), and suddenly because the efficiency of ML face masks has cut the budget, you can only do the work on mainstream faces?

I’m a big fan of ML tools and their innovation. But understanding some of the technology behind it, I think it’s also important to keep it in perspective and not throw caution to the wind.

In terms of what I’d like to see in Flame:

  • Expanding on the ability to self-train models (à la CopyCat) is something that should be part of Flame. It’s an underappreciated aspect of ML tools.
  • I think configuring 3D and HSL keyers with ML would be super useful. Not creating the final mask, but setting the keyer parameters based on a loose selection and refining them.
  • A combo of in-fill nodes with a skin analysis to do a better job of blemish removal would be high on my list.
  • Regrain nodes that mimic film response better than current methods (though maybe film grain will finally wane)

Am I worried about my job because of AI? No. For one, I’ve reinvented myself many times. You have to keep running ahead of the curve. So even if AI took my current job, I’d be happy to figure out what I can do instead then.

A little more tongue in cheek:

How about an ML-based tool that translates the client’s notes (both initial vision & review) into language that is precise and actionable, to avoid us having to guess or draw it out from them? Ideally in context of the visual being commented on.

People who wrote comments of “push the red a bit more” were most happy if you did this correction.

Of course it would be hard to get a good training set on that. But hey, it would be gold.

Frame.IO could work on that. They have the imagery, the comments and the revisions. They could do the training anonymously.

“Is it art?” and it will eventually evolve into “what is art - in this day and age of ML/AI?”


This one is pretty easy for me.

No, it’s not art, and I would venture to say that very little if anything that was entered into the Colorado State Fair was art regardless of whether it was made by a person or a computer.

Art is first and foremost a theoretical practice with the objects that it produces being physical manifestations of that theory. This POS, completely devoid of any context barring its inane title, has virtually no artistic merit IMHO.

I guess I’m an annoyed human.


I imagine the most pressing question will evolve from “is it art” to “how much should/can we sell it for?” The modern art market is already confounding and nonsensical; this will surely push that even further.


The “film” is more like a powerpoint presentation with damage node applied. :rofl:

Now this, unlike the AI “film”, I find intriguing. Prompt the program in a direction, and see what results and implementations it comes up with, to further expand our creative vista and concepts. Which then could cause the user to prompt new and further concepts, which then fuels the user again, etc. It could be a fruitful creative loop.

Just saw this, which is interesting.


I think this is a brilliant implementation. Driving the image generation with an MS-Paint like interface was a new approach, but this will allow you to generate the basic layout in 3D and get a decent looking image that can be used in a storyboard…

The issue right now is stability. Every time any parameter in the input is changed a tiny bit, the output changes completely. I guess that is just the nature of it right now.

Here are some more implementations:


Standalone desktop Stable Diffusion application for Mac.


Awesome… ADSK should integrate an interface in Flame for these models.


Stability AI Releases Stable Animation SDK

It’s a good tool to implement inside Flame using a Python hook. I think this model produces consistent masks throughout the take.

Tracking Anything:


OK. This is really getting out of hand…
INVE: Interactive Neural Video Editing
