Using a vram measuring stick?
Platforms (I can speak for the ones where I have friends or contacts internally, some working on training models, some on inference pipelines).
Usually on platforms you get even worse access to the models.
They need to serve millions of users on a daily basis, so serving full raw models is very rare; it's almost impossible to serve them to every user. So they usually have a budget: depending on demand, you get a model with less quantization, sometimes with more.
Since they don't need to commit to consistency, it's totally fine to do that in exchange for a large number of users and tokens being used.
Sometimes, around a release or a PR push in the market, there is some investment in serving raw models, but the cost skyrockets, so that is also really rare.
Going to give a practical example; let's use Wan2.2, a popular video gen model.
Raw i2v 720p inference takes 8x H100 GPUs, consuming around 320GB VRAM, and takes 159 seconds to generate/render.
Comfy/quantized i2v 720p inference takes 1x 4090 GPU, consuming around 22GB VRAM, and takes around 6 minutes to generate/render.
The only way for that to happen is to change the model's behavior heavily to allow such a dramatic drop in requirements. What the community usually does is fix as much as possible during the workflow with other node techniques. But that is the gap.
Same goes for platforms and their own models. Same reality.
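For a rough sense of why the hardware picture changes so drastically between raw and quantized serving, here's a minimal back-of-envelope sketch. The 14B parameter count and the overhead factor are illustrative assumptions, not Wan2.2's exact internals, and this ignores the text/vision encoders and latents that add a lot on top.

```python
# Rough VRAM estimate for holding model weights at different precisions.
# The 14B parameter count and the 1.3x overhead factor (buffers, partial
# activations) are illustrative assumptions, not official Wan2.2 figures.

BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "fp8": 1.0, "int4": 0.5}

def estimate_vram_gb(params_billion: float, precision: str, overhead: float = 1.3) -> float:
    """Weights-only estimate, scaled by a crude overhead factor."""
    weight_gb = params_billion * 1e9 * BYTES_PER_PARAM[precision] / 1024**3
    return weight_gb * overhead

if __name__ == "__main__":
    for precision in BYTES_PER_PARAM:
        print(f"{precision:>5}: ~{estimate_vram_gb(14.0, precision):.0f} GB")
```

The exact numbers won't line up with 320GB vs 22GB, but the direction is the point: dropping precision (plus offloading and tiling tricks) is what lets a single 4090 hold the model at all, and that is exactly where behavior starts to drift.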
Thiago, thank you for going into all this detail! It’s fascinating. How would those differences show up in final output quality?
It is all about how many of the model's activation layers you can keep active during inference. More VRAM allows more activations (this is very simplified, just to give an idea).
And the number of vectors directly affects things such as consistency in texture patterns (noisy latents = noisy objects), physics and how things move, and much more. Bit depth is the least of it.
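To make the "fit more of the model in VRAM" idea concrete, here's a toy PyTorch sketch of sequential offloading, which is roughly what low-VRAM setups do under the hood. It's illustrative only, not ComfyUI's actual implementation.

```python
import torch
import torch.nn as nn

class OffloadedStack(nn.Module):
    """Toy sequential offload: keep blocks on CPU and stream them through the
    GPU one at a time. Less VRAM is resident at once, at the cost of transfer
    time, which is part of why low-VRAM runs are slower."""

    def __init__(self, blocks: nn.ModuleList, device: str | None = None):
        super().__init__()
        self.blocks = blocks  # kept on CPU between uses
        self.device = device or ("cuda" if torch.cuda.is_available() else "cpu")

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x.to(self.device)
        for block in self.blocks:
            block.to(self.device)   # stream one block into VRAM
            x = block(x)
            block.to("cpu")         # evict it to make room for the next
        return x
```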
I must confess my issue with Comfy, regardless of quality and time, is that keeping the beast up to date is a bloody mess and takes all my time.
The chaotic bazaar-style approach to plugins and extensions is absolutely not production-ready, so I am very, very careful when entering a project with Comfy: I avoid updates and have a rebuild-image pipeline in place, because things really do go wrong.
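One way to keep that rebuild pipeline honest is to pin everything: record the exact commit of ComfyUI and of every custom node, and rebuild from that manifest instead of updating in place. A minimal sketch (the paths assume a standard ComfyUI checkout; this is illustrative, not a specific tool):

```python
import json
import subprocess
from pathlib import Path

def git_commit(repo: Path) -> str:
    """Return the current commit hash of a git checkout."""
    return subprocess.check_output(
        ["git", "-C", str(repo), "rev-parse", "HEAD"], text=True
    ).strip()

def snapshot(comfy_root: Path, out_file: Path) -> None:
    """Write a manifest pinning ComfyUI and every custom node to its commit."""
    manifest = {"comfyui": git_commit(comfy_root), "custom_nodes": {}}
    for node_dir in sorted((comfy_root / "custom_nodes").iterdir()):
        if (node_dir / ".git").exists():
            manifest["custom_nodes"][node_dir.name] = git_commit(node_dir)
    out_file.write_text(json.dumps(manifest, indent=2))

# Example: snapshot(Path("/opt/ComfyUI"), Path("comfy_manifest.json"))
# The rebuild side clones each repo and checks out the pinned hash.
```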
That's where ComfyUI Studio comes in. Or the new ComfyUI Cloud. The ComfyUI team has been working on this for a while.
Not sure if those are going to cut it. Seeing Studio, I feel it's more of the same to be honest, and the cloud is no longer free, so at that point I'd rather use an application designed from the ground up for professional use.
Anyway, I will check it out of course once it's released, but my hopes are very low.
One interesting tidbit: a great channel on AI technology and business just described NVIDIA's newly announced inference-optimized GPU (a special-purpose version) and the notable fact that it includes video encoders and decoders, which is seen as a sign that video workloads may be a significant part of AI going forward (as opposed to more general-purpose LLMs).
I had a chance to do a side-by-side comparison between ComfyUI and Weavy after being inspired by this weekend’s VES event.
So far it's Comfy 1 : Weavy 0
I do like the UI of Weavy. It’s much easier to use for sure. But it seems very much geared towards generative workflows, not general VFX workflows from what I’ve seen. But it may also be early days.
I took the clean plate example from a few weeks back, where I removed the helmet and driver out of an F1 car. In the Comfy workflow it worked well on the first go. It did clean object removal and honored the original shot. Of course it took a while to build the workflow and test it.
With Weavy there is no object removal node, and I couldn't find any tutorials in that direction. At someone's suggestion, I tried Runway's Aleph model and prompted it for object removal, first with a human-written prompt, and then with an AI re-generated prompt. Both times the results were unusable.
In the first version it kept the car, but put in a steering wheel I didn't ask for, and the steering wheel faced backwards in the car. Going around the circuit in reverse at those speeds would be quite the feat.
The AI re-generated prompt did a better job with the generated part, but it created a whole new shot. Nothing of the old shot remained other than the color palette.
And running Weavy costs $1.75 for this model each time I try a prompt. ComfyUI was local and free (with good hardware).
That isn't to say that Weavy doesn't have good use cases, especially if you need to create assets for comps. But for the bread-and-butter stuff of VFX, it wasn't convincing yet.
There are also other gaps. In Comfy it's pretty easy to run your model on a frame range while you're experimenting. Loading a video asset in Weavy, I didn't immediately see a way of only running it on the first 50 frames, so I had to transcode the video pre-upload to keep it short. I'm sure this will be added in due time, or maybe I didn't find it. But that's the downside of a simple UI.
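The pre-upload trim itself is trivial if you have ffmpeg around; something along these lines would do (filenames are placeholders):

```python
import subprocess

def trim_first_frames(src: str, dst: str, frames: int = 50) -> None:
    """Re-encode only the first N video frames so the test upload stays short."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", src,
         "-frames:v", str(frames),   # stop after N video frames
         "-c:v", "libx264", "-crf", "18",
         "-an",                      # drop audio for a quick test clip
         dst],
        check=True,
    )

# trim_first_frames("shot_full.mov", "shot_first50.mp4")
```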
More to be done….
If their business model is based on how much you use cloud services it makes me wonder if that’s why you can only work on generated content #followthemoney
Interesting question. You definitely burn through cash a lot more using these models rather than running some open source object removal model. And if they can take a small cut of that spend, it adds up fast.
It could be that this is their goal. Or it could be a necessity to attract investors and generate revenue to keep the lights on, even though they may want to do the other stuff.
It's also easier to get mindshare when you promise easy tools to generate nice-looking images and videos. There are so many folks willing to spend on that stuff. Clean plates are utterly boring and niche.
Also, I was privy to another attempt at object removal using Runway's Aleph model. And it was utter garbage. The task was to remove an object from someone's wrist. It succeeded in removing the object, but left so many artifacts and massacred the rest of the hand in ways that the cleanup job likely would have taken longer than just removing the object the old-school way.
A later iteration provided an acceptable result. Not sure how many prompts it took to get there, or whether the video quality was comparable with a traditional process. I only saw the in-edit version.
However, what was originally a Flame task became an editor task, and the agency signed off on it. If the bar isn't very high, that kind of thing will happen. Of course it could also create scar tissue that results in some job security.
Speaking of what Weavy's main focus seems to be (around video generation), these two tutorials are a good hint:
Compositing a basic scene with different assets: https://www.youtube.com/watch?v=zQGutL3RZbU&list=PLu0WpBQHUSql4tPg9BHzvp-7ajhDMRFBE&index=3
and things like using image to 3D model, to scene composite: https://www.youtube.com/watch?v=9uTXPdZbidc&list=PLu0WpBQHUSql4tPg9BHzvp-7ajhDMRFBE&index=4
So it’s clear that you don’t need a compositor / After Effects artist anymore for some basic tasks (comparable to the shot of the pizza or coke can for the Sunday ad insert in the local paper - that cost still photographers their careers 15-20 years ago).
That is, if for now you don't mind having a model re-do the image to get rid of the clunky mask halo from the earlier step. The images go through several seemingly regenerative/destructive steps. So for those that value absolute fidelity of the original capture (like Arri's 13th bit), that must be stress-inducing. But for the creative directors that just need a damn video out the door that's good enough by 2pm, well, this totally works.
I think this makes clear where the battle lines are drawn at least for the next 6 months. Who knows after.
Now when we say that VFX artists will be needed for some time to come, I think that refers to more complex, higher standards of work. Because this stuff the CD can do himself, or someone with a much lower day rate than a Flame artist. As was quoted in this week’s VES event, for basic stills, the CD no longer calls a photographer or designer. They just prompt it.
This.
Quick add-on. After seeing the abysmal results from Runway Aleph, I took the same footage and fed it into my clean-plate setup in Comfy with the MiniMax object remover. Took 2 minutes to adapt the pins in the matte tool and change the filename, and got a nice result. Still only MP4, but usable under the circumstances.
ComfyUI 2 : Weavy 0
Using Coco tools and/or VHS you can avoid the mp4 insanity. Or maybe there was some other reason you went compressed.
Right, this was just the nature of the existing setup I had. I should update it to read/write EXR sequences and run at full resolution.
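For the EXR side, something along these lines would cover the read/write part (assuming an EXR-capable imageio backend is installed; the file patterns are placeholders, and the Comfy node choice, like CoCo Tools or VHS, is a separate matter):

```python
import glob
import imageio.v3 as iio
import numpy as np

def load_exr_sequence(pattern: str) -> list[np.ndarray]:
    """Read a numbered EXR sequence into float32 frames (order from sorted filenames)."""
    return [iio.imread(p).astype(np.float32) for p in sorted(glob.glob(pattern))]

def save_exr_sequence(frames: list[np.ndarray], out_pattern: str) -> None:
    """Write frames back out as an EXR sequence, keeping float data intact."""
    for i, frame in enumerate(frames):
        iio.imwrite(out_pattern.format(i), frame)

# frames = load_exr_sequence("plates/shot_010.*.exr")
# save_exr_sequence(frames, "out/shot_010.{:04d}.exr")
```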
