Anyone up for a challenge? Let's replicate this workflow in Flame

Part 2 is way more interesting than part 1.

@ALan - just hire the person that made it, no?

No cheating @philm, it’s a Flame challenge :wink:

@milanesa - hiring the person that made it would be a challenge in itself…

good artists imitate,
great artists steal,
flame artists hire and take the markup…

fuckin A

Chances are it would be something like this given that we lack some of the nodes required…

Create a 3D track of your background in whatever software. Get as many surface points as possible; get it dense. The denser the points, the higher the fidelity.

In Action, load up the camera and the axes denoting the point locators. Select all of the axes and parent them all to the same image. Scale that image down and enable a position pass output with alpha enabled. Infill the position output using the inverted alpha, pick the frame which covers the area you want to retouch, and mix freeze it.
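For anyone who wants to sanity-check what that position pass plus infill is actually doing, here’s a rough numpy sketch of the same idea outside Flame. The locator list is made-up example data, and the dilation loop is just a crude stand-in for the infill blur:

```python
import numpy as np

W, H = 1920, 1080
# Placeholder tracked locators: ((screen x, screen y), (world X, Y, Z)).
locators = [((480.0, 620.0), (-1.2, 0.0, 4.0)),
            ((960.0, 540.0), (0.0, 0.0, 5.5)),
            ((1500.0, 700.0), (1.4, -0.1, 3.8))]

ppass = np.zeros((H, W, 3), dtype=np.float32)   # RGB = XYZ world position
alpha = np.zeros((H, W), dtype=np.float32)      # where we actually have data

for (px, py), world in locators:
    x, y = int(round(px)), int(round(py))
    if 0 <= x < W and 0 <= y < H:
        ppass[y, x] = world
        alpha[y, x] = 1.0

# Crude stand-in for the infill blur: grow the known pixels into the empty
# (inverted alpha) areas until the position pass is dense, then hold it.
for _ in range(max(W, H)):
    if alpha.min() > 0:
        break
    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        shifted_p = np.roll(ppass, (dy, dx), axis=(0, 1))
        shifted_a = np.roll(alpha, (dy, dx), axis=(0, 1))
        fill = (alpha == 0) & (shifted_a > 0)
        ppass[fill] = shifted_p[fill]
        alpha[fill] = shifted_a[fill]
```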

Add another action with an STMAP from a colour source at the same res as your plate as the back, and layer one being your muxed position pass. Copy over your tracked camera. Add an image from the background, add a position map to it using layer one and a diffuse map using the background. Pick a good frame where whatever you want to retouch is at its highest texel density, duplicate your tracked camera and keep only the keyframes on that frame. On your diffuse map, set the mode to projection and select the newly held camera.

…From there it’s an unwrap. Subtract the output of the second action from the STMAP going into the action to create the unwrap vector. Plug that output into the vector input of a pixel spread. Set the coordinates to 0,0, the power to your X resolution, the X value to 1 and the Y value to 1/aspect ratio. Add the original image to unwrap. Then paint away. When you’re done, take the output of that second action, plug it into the STMAP port of an STMAP node with your painted unwrapped output as the front, and then whatever matte you need, to reapply the motion.
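The pixel spread numbers aren’t arbitrary; they just convert a normalized ST delta into pixels. Here’s the same math as a quick numpy sketch, where rendered_uv is a placeholder for the second action’s output and painted_unwrap stands in for the paint:

```python
import numpy as np

W, H, aspect = 1920, 1080, 1920 / 1080

def identity_stmap(w, h):
    # Red = x / width, green = y / height, sampled at pixel centres.
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float32)
    return np.stack([(xs + 0.5) / w, (ys + 0.5) / h], axis=-1)

def apply_stmap(image, stmap):
    # Generic STMap behaviour: each output pixel samples the input image at
    # the normalized (s, t) coordinate stored in the map (nearest neighbour).
    xs = np.clip((stmap[..., 0] * image.shape[1]).astype(int), 0, image.shape[1] - 1)
    ys = np.clip((stmap[..., 1] * image.shape[0]).astype(int), 0, image.shape[0] - 1)
    return image[ys, xs]

# rendered_uv = the second action's output (placeholder): per screen pixel,
# where that pixel lives in the stable hold-frame space, in 0-1 ST coords.
rendered_uv = identity_stmap(W, H)                 # stand-in data
unwrap_vec = identity_stmap(W, H) - rendered_uv

# Pixel spread settings from the post, expressed as pixel displacements:
# power = W, X gain = 1, Y gain = 1 / aspect, so
dx = unwrap_vec[..., 0] * W * 1.0                  # = v_x * width
dy = unwrap_vec[..., 1] * W * (1.0 / aspect)       # = v_y * height

# Final step after painting: re-wrap the unwrapped paint with the same map.
painted_unwrap = np.zeros((H, W, 3), dtype=np.float32)
rewrapped = apply_stmap(painted_unwrap, rendered_uv)
```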

The meat of it is really creating a surface for the background based on positions guesstimated by a combination of tracked points which are then interpolated using the infill blur. Honestly once you have a surface that basically represents the topology of the background, the rest of the workflow can be whatever you want it to be. That’s to say that you don’t even really need to unwrap once you have the surface. You could just project and film through a static cam. Paint away and then just project back through the same cam and film with tracked cam. It’s literally all about the surface… we all know how to fix shit once it’s stable.

I had a minute or two over the weekend…

https://www.dropbox.com/scl/fi/b010anghrsq6uwrb6zhdg/PPassPOC.zip?rlkey=f0fsfjj912utvl3gnyrk247jw&st=g5kip8gv&dl=0

Here’s a link to the approach I was mentioning above. Flame 2025 archive with included media. I’m not retouching anything away, just poc’ing this approach.

Since Flame doesn’t have a good analog to Points from 2d, I use the axes generated from the 3D track to create a position per axis and output a PPass from an over-scanned, top-down camera. That PPass is then interpolated via an infill blur and held so it’s not constantly computing.

Next, that PPass is passed to a couple of different scenarios. The first is the traditional camera projection on geo, stabilized by a held camera. We project the PPass down onto a surface, project the source clip via the tracked camera and then render via a frozen camera position, in this case the same top-down camera projecting the PPass. Then do some shit and project back on the same geo in a separate action.
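For anyone following along without opening the archive, the projection in that first scenario is plain pinhole math: take the PPass world position under each pixel, push it through the tracked camera, and you get the ST coordinate to pull from the source. A rough sketch (camera conventions, aperture units and Y direction will differ from your actual setup):

```python
import numpy as np

def project_through_camera(world_pos, cam_to_world, focal, h_aperture, width, height):
    # world_pos: (h, w, 3) position pass. cam_to_world: 4x4 camera matrix.
    # focal and h_aperture in the same units (e.g. mm). Returns per-pixel
    # normalized (s, t) coordinates into the tracked camera's frame.
    world_to_cam = np.linalg.inv(cam_to_world)
    h, w = world_pos.shape[:2]
    pts = np.concatenate([world_pos.reshape(-1, 3),
                          np.ones((h * w, 1), dtype=np.float32)], axis=1)
    cam = pts @ world_to_cam.T                      # points in camera space
    x, y, z = cam[:, 0], cam[:, 1], -cam[:, 2]      # camera looks down -Z
    z = np.where(np.abs(z) < 1e-6, 1e-6, z)
    s = 0.5 + (focal / h_aperture) * (x / z)        # horizontal ST
    aspect = width / height
    t = 0.5 + (focal / h_aperture) * aspect * (y / z)  # vertical ST
    return np.stack([s, t], axis=1).reshape(h, w, 2)

# Example (hypothetical values): an identity camera with a 35mm lens on a
# 23.76mm aperture:
# st = project_through_camera(ppass, np.eye(4, dtype=np.float32), 35.0, 23.76, 1920, 1080)
# Sampling the source frame at (s, t) per pixel is the projection pass;
# rendering that through the frozen camera gives you the stable plate.
```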

The second scenario, which honestly I don’t know why you would use, is an actual UV unwrap. We project our hold frame of UVs from a held frame in our camera track and render through the tracked cam. Then create unwrap vectors, pass them to a pixel spread to unwrap, do some shit and then warp back using an STMAP node. Comp over source in action.

One thing that became pretty apparent is that if the STMAP node could work forward as it does today as well as backwards–essentially using the STMAP as the current state and warping the image input back to unity–we would have a really powerful workflow @fredwarren.
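“Backwards” here basically means inverting the map. A naive numpy sketch of the idea, scattering instead of gathering and leaving the holes for an infill pass:

```python
import numpy as np

def invert_stmap(stmap):
    # Naive inversion: scatter each source pixel's own coordinates into the
    # destination position named by the map, so sampling the result warps an
    # image back toward unity instead of forward. Collisions keep the last
    # write and holes would need infilling; this is just the idea.
    h, w = stmap.shape[:2]
    inv = np.zeros((h, w, 2), dtype=np.float32)
    hit = np.zeros((h, w), dtype=bool)
    ys, xs = np.mgrid[0:h, 0:w]
    dx = np.clip((stmap[..., 0] * w).astype(int), 0, w - 1)
    dy = np.clip((stmap[..., 1] * h).astype(int), 0, h - 1)
    inv[dy, dx, 0] = (xs + 0.5) / w
    inv[dy, dx, 1] = (ys + 0.5) / h
    hit[dy, dx] = True
    return inv, hit   # 'hit' marks where the inverse map is defined
```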

Anyhow, it works. The PPass hack is really nice actually.

That’s a fun setup.

Thx for sharing.

A

Cheers Andy.

It’s a quick and dirty geo-generator at its core… the top down capture was all about creating a map with as few occlusions as possible—you could just as well strike it from the tracked cam and not overscan and blah blah blah.

I was just pushing it around to see what was optimum. Regardless I’m definitely adding it to the repertoire.

Could be one of your submissions for the Reverse User Group 2024.

Word.

In the interim… FL-03388

Vote early and vote often.

Upvoted

What timing! I am in the process of writing a piece of code that has a feature very similar to CopyCat. It uses a small training set of image pairs to train a model, and then uses the trained model to generalize to more images (e.g. paint a few example frames, train, then inference across all frames of a clip). I’ve already used it on a couple of jobs and it seems to work well. It’s still fresh out of the oven (still in the oven, actually); I haven’t done much testing except on the few test cases I’ve been using during development.
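For context, the general shape of this kind of pair-trained setup is roughly the PyTorch sketch below. It’s a generic illustration of the idea, not my actual code; the real network and training loop are more involved:

```python
import torch
import torch.nn as nn

# Placeholder architecture: learn source -> target from a few painted frame
# pairs, then run every frame of the clip through the trained model.
model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.L1Loss()

def train(pairs, epochs=2000):
    # pairs: list of (source, target) float tensors, shape (3, H, W), 0-1 --
    # i.e. the handful of frames you painted by hand.
    for epoch in range(epochs):
        src, tgt = pairs[epoch % len(pairs)]
        pred = model(src.unsqueeze(0))
        loss = loss_fn(pred, tgt.unsqueeze(0))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

def inference(frame):
    # Run any frame of the clip through the trained model.
    with torch.no_grad():
        return model(frame.unsqueeze(0)).squeeze(0)
```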

It doesn’t read other models like Modnet (yet), but once I have this one complete, maybe I’ll look into that.

However, I have no idea how my code stacks up against CopyCat. There are clearly “problem situations”, but I think that’s true of CopyCat as well.

My more recent results are on an NDA’d project, so I can’t show them, but below is an older image of the UI (if you can call it that). At that time it was a command-line tool, but it has since gained a basic GUI.

The top window shows three rows that each represent a sample of the training at that point. The first column of each row is the source image, the second is the target image (e.g. what you want the result to be), and the third is the source image run through the model (inference). The fourth is the difference between the source and the inference, and the last column is the difference again, but normalized and gamma’d way up so you can see the imperfections more easily.

So, in this sample, you can see in rows 2 and 3, column one, that there are palm trees that need to be removed. The second column is the manual removal, and the third column is the inference, more or less removing the palm trees. In the first row, first column, there’s a manhole cover under the tire that gets removed. You can see the “scar tissue” more easily in the last column.
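The last two columns are nothing exotic: a straight difference image, then the same difference normalized and pushed with a gamma so the faint errors read. Something like this (exact values are illustrative):

```python
import numpy as np

def diff_columns(source, inference, gamma=0.25):
    # Column 4: raw difference. Column 5: the same difference normalized to
    # its own max and gamma'd up so small imperfections become visible.
    diff = np.abs(source - inference)
    norm = diff / max(float(diff.max()), 1e-6)
    boosted = norm ** gamma          # gamma < 1 lifts the faint errors
    return diff, boosted
```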

In the window below that, you can see the loss function, slowly decreasing over the training epochs.

Anyone with a Nuke license who’s interested in helping me benchmark this?

Sorry for the question: did you bring the 3D camera in from Syntheyes?

Yes. You need a 3D camera to establish the world positions of your locators relative to the 2D image plane.