ML Timewarp - Tech & Business

Continuing the discussion from Flame Machine Learning Timewarp, now on Linux and Mac:

Moving this into a separate topic.

Thanks @talosh for the details. Very helpful.

Out of curiosity, I looked a bit deeper into the Vimeo90K dataset. From what I could gather it was created by MIT (http://toflow.csail.mit.edu/) and consists of essentially 98,900 videos they semi-randomly downloaded from Vimeo. The link has the list. I checked 3 of them out - one was a random video production company, another out-takes of a wedding video, the third one actually an animation.

All the videos in the list that are still online were marked with visibility ‘public’ and ‘download enabled’ on Vimeo, which for better or worse is semi-default on Vimeo. But there’s no license information, and no indication that their owners would permit any commercial use of these videos if you asked. They’re unwitting bystanders. One of the videos was uploaded 9 years ago, long before anyone would worry too much about what someone might possibly do with it.

It’s certainly fine for MIT to do research with such data. But others using these data sets in commercial productions may be a stretch. A video being public and downloadable doesn’t imply a license grant, just like you can’t grab an image from Google Search and paste it into an ad you’re finishing without checking the copyright information.

BTW - I in no way mean this as a critique of ML Timewarp as a tool or of all the sweat equity @talosh has been putting into it. Quite the opposite. It’s fantastic for us to have access to this, even more so since it’s a labor of love and nothing else.

But I think it’s good for us as a community to have more of a conversation about these questions and to be more mindful. Would love to hear others’ thinking about that. Are you concerned, or do you not care, as long as the shot gets out the door on time?

If anything, this could be a call to arms, to help tools like ML Timewarp to get access to training sets that actually have proper commercial release.

I can’t speak for Autodesk, obviously, but I think this is part of why it hasn’t been incorporated into Flame. Currently, we are using it on a Napster-type basis. It’s all about sharing.

I think some people have mentioned that they’ve seen several jobs they had worked on in Vimeo90K.

I think the most useful direction here would be to create a relatively easy way for everyone to use their own data for training or fine-tuning and that’s exactly what I’m trying to implement now with flameSimpleML.

Tests have to be done here, but it’s quite possible that with a similar amount of data fed to the model, it would converge to very similar results.


I’m not sure it’s practical for everyone to train their own model and prep a dataset. License-wise that would of course be the easy solution. But it’s a different lift than using the tool, or maybe doing a one-off CopyCat model.

What we could do is help source material that is properly licensed and add it to a pool. For this to work it has to be a sizable pool. There’s a reason MIT harvested Vimeo. They wanted 98,900 videos. And that pool has to be representative of the type of work and quality we want to use the tool on.
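
To make the pool idea a bit more concrete, here is a minimal sketch of what a manifest for contributed clips could look like. Everything in it is an assumption for illustration: the `Clip` fields, the `commercial_use_cleared` flag, the example paths and the `cleared_clips` helper are hypothetical and not part of ML Timewarp, flameSimpleML or Vimeo90K.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    """One contributed clip in a hypothetical shared training pool."""
    path: str                     # where the clip lives (illustrative only)
    contributor: str              # who donated it
    license: str                  # free-text license name supplied by the contributor
    commercial_use_cleared: bool  # True only if the owner explicitly cleared commercial use


def cleared_clips(clips):
    """Return only the clips whose owners cleared commercial use."""
    return [clip for clip in clips if clip.commercial_use_cleared]


# Hypothetical example entries.
pool = [
    Clip("clips/a.mov", "studio_one", "custom release", True),
    Clip("clips/b.mov", "unknown", "none stated", False),
    Clip("clips/c.mov", "studio_two", "custom release", True),
]

cleared = cleared_clips(pool)
print(f"{len(cleared)} of {len(pool)} clips are cleared for training")
```

The point of the explicit flag is that "public and downloadable" never counts as cleared by default; a clip only enters the training set once someone has recorded an actual release for it.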


I think it is more about being able to fine-tune the model using your own data and having some starting point for it. I’ll try a test that trains only on my own data to check how much it affects validation on Vimeo90K.
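
For anyone wanting to run a similar comparison, here is a rough sketch of one way to score it. The thread doesn't say which metric would be used, so PSNR is an assumption here (it is commonly reported for frame interpolation). The `psnr` and `mean_psnr` helpers and the tiny pixel lists are hypothetical stand-ins for real frames, written with the standard library only.

```python
import math


def psnr(predicted, reference, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two frames.

    Frames are flat sequences of pixel values of equal length.
    Identical frames give infinity.
    """
    if len(predicted) != len(reference):
        raise ValueError("frames must have the same number of samples")
    if len(predicted) == 0:
        raise ValueError("frames must not be empty")
    mse = sum((p - r) ** 2 for p, r in zip(predicted, reference)) / len(predicted)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


def mean_psnr(pairs):
    """Average PSNR over an iterable of (predicted, reference) frame pairs."""
    scores = [psnr(predicted, reference) for predicted, reference in pairs]
    if not scores:
        raise ValueError("need at least one frame pair")
    return sum(scores) / len(scores)


# Toy data: the same ground-truth middle frame, interpolated by two models.
ground_truth = [10, 20, 30, 40]
baseline_output = [12, 22, 32, 42]    # e.g. a model trained on the public set
own_data_output = [11, 21, 31, 41]    # e.g. a model trained only on your own footage

print("baseline :", mean_psnr([(baseline_output, ground_truth)]))
print("own data :", mean_psnr([(own_data_output, ground_truth)]))
```

Running both models over the same held-out frames and comparing the averages gives a first indication of how far the own-data model lands from the baseline; a real test would use full frames and probably a perceptual metric as well.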

Another thing is that Vimeo90K is a de facto standard and is currently used in a large share of the video-related computer vision work out there, frame interpolation in particular.
