Flame In The Cloud

That’s a pretty interesting offering, thanks for sharing. Why do you call StreamBox a joke though? (Serious question)

Interesting point. Could you elaborate a bit more?

I looked at them at the beginning of the pandemic. They seemed like an old-school outfit from many years ago that doesn’t keep up well. Lots of product turnover. Heavily used in the color facility world, but everyone complains and it’s super expensive.

There may be a use case for 2x GPU even for Flame, depending on your needs. If you use the NVLink feature, it increases your visible VRAM accordingly even if your app only sees one GPU, so two A6000s could give you 96GB of VRAM if you ever needed that much.

I do have an NVLink board on order for my two A5000 and will test it.

If the theory holds and you need 48GB of VRAM, you may actually be better off with 2x A5000 rather than 1x A6000 (if you have the slots): you still get to 48GB, but you gain a lot more graphics cores than you would going from a single A5000 to a single A6000.
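
Once the bridge is in, a quick way to sanity-check that the link is actually active (assuming the standard NVIDIA driver tools are installed) is something like:

nvidia-smi nvlink --status   # per-GPU NVLink link state and speeds
nvidia-smi topo -m           # topology matrix; NVLink-connected GPU pairs show up as NV# rather than a plain PCIe path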

Flame doesn’t work that way.

Good to know, though unfortunate.

1 Like

@ALan’s thoughts on NDI with Flame in the cloud are very close to our experience. It’s functional, but it doesn’t come close to the experience we used to have in the room. We’ve also tried Streambox with some success and the results were better. But for now, we don’t really do it, as most of the clients don’t really care to watch it in person. It’s really rare for them to want a session.

1 Like

I think I totally missed this thread altogether, but it’s great to weigh in after everyone’s relevant and excellent arguments. I’ve been on an AWS-based Flame for the past 14 months straight and my experience has changed from “WTF is this manure?” to “I like it.”

I think if I had to start from the top again, these would be the areas of consideration for me:

  • Financial comparison
  • Ease of use
  • Processing power
  • Storage

Financial Comparison (warning: this is the least helpful part of this post)
The EC2 instance you spin up is just one of many costs associated with setting up a company on AWS or really any cloud provider. You need some type of infrastructure in the cloud. Think of the cloud like your server room - you need more than just a machine to do the work. I’m not an expert in this part of the process, but think of it as a doodad to help with idiosyncrasy A, another for B, another for C… each its own cost. If you have TPN security requirements, it gets even more involved. You might also need specific kinds of networking and cost monitoring tools, which add to your bill. To me, it’s not straightforward, but it is a la carte, and I’d get someone you trust with experience to sherpa you through this.

If you’re just working on a project and don’t need a team, I do know there’s a way to do it with a single instance, a license and the local storage (that’s how we started). But like any setup, the config will grow. So my takeaway is that it’s worth reaching out to all of the above: a provider like @Gunpowder, a reseller (not sure if Alt or Cinesys provides AWS consulting) and AWS directly, and learning as much as you can to get comfortable with the costs.

A number I’ve been told is that we’re working with between $2-3k/month per user all in, but that’s expected to go down. That includes the AWS support, the machines, the storage, the cables, the electricity, the air conditioning, the UI and everything else.

To me, I would be really interested to see a list of all the configs one needs to get from AWS, listed under an AWS heading in the Hardware requirements section of the manual. Then I’d love to see a step-by-step guide on how to set up an instance in the cloud, from the Rocky Linux install to the Flame software install.

Ease of Use
We went through five or six configurations of our setups before arriving at something that rivals a terrestrial office. The first ones were useless - awful playback, hard to come back to the machines after leaving, terrible USB issues, etc. But the ones we use now are all interconnected, I get 3k EXR playback without rendering or lag on my 4k monitor. The interactivity is excellent and the machines are responsive. I can spin up an extra machine for an additional team member (or five if needed) instantly and everyone enjoys the same look and feel as I get. When we’re done with a large job, we shut things off till the next one.

One of the things that made a pretty big difference to us was getting very cheap actual Mac minis, putting Windows on them and using them to connect to the cloud. The OS X Teradici solutions were noticeably slower, but once we got the Mac mini Windows machines, it was really smooth. We run nothing else on those machines, just Chrome and Teradici (most of us).

In the 14 months we’ve been doing this, we’ve completely upgraded our operating systems, hardware used for the instances, and our software without a hitch whatsoever. No extra machines lying around to give to someone down the food chain in the office. No additional outlays of money. Just make new Flames on better machines as they are available and go to work.

Processing Power
Our current g5-8L machines feel every bit as fast as the z840s with 24GB P6000 cards I used to use. They are not the fastest Flames out there by a long shot, and when I ran the benchmark, they didn’t perform as well as the Threadripper. And if horsepower is your primary consideration, @ALan is dead on: these machines are not nearly as powerful as what you can get on your own. But for my needs (I know this is gonna be tough to hear), they’re amazing. I believe the speed of a machine is not even the third consideration for my setups. If I have organized a job correctly and have producers that can handle tough clients, then I’m more concerned with the artist on the box than the box itself. I believe effective machine speed is directly correlated to the experience of the artist - so I don’t really mind a slower machine, and I still get ridiculously fast results from the amazing folks that I work with. I’m not convinced I would get my jobs out any faster with more horsepower.

Storage
We use WEKA for storage. It might be expensive - which I hear about endlessly from management. But it takes care of a lot of the engineering and manual configuring of fault tolerance and tiering of relevant data. I kinda love it, but it is annoying to get grilled about how we could do it cheaper. My answer is yes, that might be true, but I would need a team of developers to build the same thing WEKA does. There are other storage options out there. We started by using the included NVMe drive as our cache, but it was awful having to move stuff around manually. I have been told that FSx - another fast storage option - would work, but it also needs a lot of configuration to get going. Maybe someone using that can weigh in?

I don’t have a dedicated engineer that helps our Flame team. Our amazing engineers went through the setup and re-setup and re-setup process until I stopped whining and then they let us be. Once we landed on WEKA, things settled down quickly. We did have help in the beginning from Tom and Tom who are just wonderful. But nine times out of ten, if I’m submitting a ticket to the engineers, it’s for help running a script that quickly gets an available instance in the cloud - which can sometimes be hard, especially in the middle of the workday.

Summary
I’m sorry I don’t have more granular knowledge to share about how to set these things up, maybe one day. But I would reiterate that if you’re interested in going down this road, reach out to @Gunpowder (Tom and Tom) and Steve Strong @oceanasteve, and ping AWS themselves and see if you can find someone that’ll share. I think there’s a business out there where we can just reach out to a provider, fire up a machine, use it, and shut it down. That kind of thing seems tantalizingly close.

1 Like

Hiya everyone! Please pardon me, I was contacted by a forum member earlier about deleting a post and accidentally deleted the wrong post. ACK! So, if things appear out of order it is my fault. All content has been restored and assigned to the original poster but may appear slightly out of order.

Thanks and sorry for the noise.

2 Likes

It is about the Total Cost of Ownership, not just a bunch of servers or workstations in a room. AWS TCO includes things such as:

  • Power (is yours redundant?)
  • A/C (is yours redundant?)
  • Cost of square footage in your office space for rent (is yours redundant?)
  • Internet connection (is yours redundant? Is it 10Gb or faster?)
  • How about the employees who manage the server room or data center?
  • What about the facilities management? What about janitorial and maintenance?
  • Cost of racks? Power bars? Raised floor?
  • Oh yeah, the servers cost money… this is often the only thing companies count as the cost of their infrastructure
  • How about the maintenance contracts for servers? Cost of downtime when hardware is broken and you’re waiting on a replacement?
  • Network routers, core switches, rack-based switches… maintenance contracts and operational overhead for all of these
  • Consider the employees’ time spent racking/cabling/powering the servers, then installing the OS and configuring so you can reach them on the network
  • Are you considering how much each byte of network traffic, and each gigabyte of storage costs?
  • Oh, did I mention all of the time/effort/cost to SECURE everything in the above list?

Consider all of the above in your assessment, then look at AWS. Not only are you getting ALL of the above, but you also get:

  • Horizontal scaling
  • Ability to deploy in different geographies
  • AWS researches, implements, scales, and secures new services and features for you, with no effort on your part except to adopt them
  • Ability to create resources, use them for as little as a few MINUTES and destroy them, all at a small cost (see the CLI sketch after this list)
  • Built-in security features that are BETTER than your in-house cluster
  • Ability to deploy onto many different solutions, so your architecture matches the requirements rather than the other way around
  • Vertical scaling that requires $0 investment on the part of the customer, and you’re never left holding hardware after the application is retired
  • To maintain a server farm, you need people available 24x7, and you need to be able to cover for vacations, illness, personnel changes, etc, etc, etc. Not only are you paying for their salaries, but you’ll also have to pay for their benefits, and the corporate infrastructure overhead costs like management load, HR, accounting, insurance, and other employee-related costs. Conservatively, an employee costs a company 50% more than just their annual salary.
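
On the “few minutes” point, a minimal sketch of that lifecycle with the AWS CLI (the AMI ID, instance type and key name below are placeholders, not a Flame-specific recommendation):

# launch one machine from an existing AMI
aws ec2 run-instances --image-id ami-0123456789abcdef0 --instance-type g5.8xlarge --key-name my-key --count 1

# work on it, then throw it away; roughly speaking you pay for the time it ran plus whatever storage you keep
aws ec2 terminate-instances --instance-ids i-0123456789abcdef0

The run-instances call returns the instance ID you later pass to terminate-instances.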

So yes, we are not buying the same thing, and one could argue that a Threadripper with the latest graphics card is better and faster, in the same way I could argue that spinning up 500 machines for rendering in less than 2 minutes is more valuable to me.

There is one thing I must add: working in the cloud requires a lot of knowledge to get it right. This is not a panacea and there are lots of gotchas, some of which could take your company down, so be aware it is something to do properly, with people who know what is going on.

4 Likes

FYI, I worked at a place that had central S+W and the server had a failure - I can’t remember what - but it’s a nice way to watch 10 Flame artists walk around doing nothing for an entire day. Everything has its pros and cons.

Thanks for elaborating in such detail. In complete agreement on these TCO factors. Some of them apply mostly to larger shops, but on the flipside smaller shops often lack the expertise entirely or don’t do things they should be doing.

It also depends on which market you are serving. The TCO for redundant power, AC and Internet is of course passed down to the client. If you are working on a show that must air, that is must-have spend whether in-house or at AWS. There are also customers who may be more cost sensitive, and if there are reasonable delays, it’s an acceptable tradeoff. If the job can only be done in time with multiple artists working together, you need different infrastructure than when a single op is sufficient.

So yes, if you operate in this larger facility space, AWS may be expensive, but if you compare the full TCO you may come out ahead, because that admin in the AWS data center is utilized 100%, whereas your local 24x7 coverage may have a lot of expensive ‘what if’ idle time.

To complete the apples-to-apples comparison, under what scenario is a local system (small or big) beneficial? Any at all?

In my case as a boutique shop, I don’t benefit from the scale. I’m certainly keenly aware of my IT overhead tax. I happen to have the expertise, but it’s non-billable overhead that eats into margin. I do use other apps than Flame and have a variety of peripheral scenarios that are complicated in the cloud. So having local systems has value to me. I use multiple AWS services, and occasionally spin up an instance for various uses. I’m aware of the complexity and learning curve that comes with that. So for me a local system provides flexibility and a predictable cost envelope that I guess takes a different effort to manage, but is still less of a headache.

1 Like

Just to be clear, I am not saying everything should use the cloud, I am just arguing that we are not buying the same thing, the same services and resources, etc…

A local system? Sure, there may be reasons, like specialised hardware, always-on setups where you need instant availability, experimental setups, zero latency for graphics, no video compression, proper monitoring, professional-grade audio and others I can’t even imagine.

To answer your boutique shop point: yes, it may very well be that that is the perfect solution for most of the Flame post-houses out there…

2 Likes

One interesting data point, just on the cost of hardware - AWS (and all of Amazon) used to have all their servers on 3-year leases and then rotated them out. So that’s a good time horizon to keep in mind for amortizing any hardware investment you make. If you do get a local workstation and other infrastructure to run it properly, can you make it pay for itself in 3 years or less?

2 Likes

@Josh_Laurence has a great perspective on how long it ought to take for kit to pay for itself.

1 Like

Ha! @randy, I was just composing my rant…

Jan, you make a really good point about aging hardware and writing it off within a three year period. Especially how it pertains to Amazon’s hardware cycle. Thank you!

From a tax amortization perspective, I think Andy’s accountant might tell you that you could also write things off a lot faster if you wanted. The last time I talked to a tax professional about my situation, the advice was I could write it all off in the first year of purchase. But that might be NJ law. YMMV. The faster write-down does change how things should be perceived from an emergency or urgent purchase perspective.

I think of gear the way I think of perishable food - it needs to be used before it rots - you’re on the clock as soon as you unbox. So the flip side of that coin is that you need to make a certain amount of money in that time to justify the expense before getting more. In the late 90s we came up with a 20x metric for capital expenditures (CapEx). If you spend $1,000 on hardware, you need to make $20,000 over its life to cover all the things that go with it, direct and indirect expenses in the business. That covers overhead + labor over the life of the gear.

It’s all very fuzzy logic but it helped us with a starting point for deciding to make purchases and giving us goals for revenue to cover things. It’s waaaaayyyy over simplified, but on a macro level, 20x has worked well for me for many years.

2 Likes

I look at the tax write-off as an extra benefit, which certainly is material. But I think that the core formula should be more like cost vs. revenue. It’s a bit easier over in the camera department, since they split out labor and kit separately, and then you can look at the math rental houses do to price cameras, lenses, etc. Rental houses generally sit between 2-5% of cost for a daily rental with discounts for longer rentals, for perishables like cameras. C-Stands last longer…

Your 20x number works out to that same 5%, so you’re totally in the ballpark.

Your 20x, I think, is the revenue on the job. That includes the cost of the artist, the 10% marketing overhead, the admin overhead, any assists, office cost, etc.

Take your 5% number towards system infrastructure (not counting the office-wide infrastructure), and say you bill out a day at $3,000 for a Flame seat. Every billable day then accrues $150.

If you buy a $30K system for argument’s sake (including all add-ons to make it usable), you need 200 billable days to break even. In a year you have a max of 250 billable days. So if this system is used every day for one year, and you have the business to bill full facility rate, you get there fast.

On the other hand, say you’re a freelancer/boutique and bill the client $1,600/day. Using your 5% again, you only accrue $80 on a billable day. Let’s say you’re not solidly booked and only score 125 days (50% load). Now you break even at exactly 3 years.

If you drop below that, you’ll have to get more years out of your system, buy something less expensive, or achieve a higher load factor. Maybe as a freelancer you can go a bit above 5% because you have less overhead, work out of your basement, answer the phone yourself, etc. (though that eats into your billable hours).
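
Just to spell the arithmetic out, a throwaway shell calc with the numbers from the example above:

DAY_RATE=3000        # facility day rate for a Flame seat; use 1600 for the freelancer case
INFRA_SHARE=5        # percent of the day rate attributed to the system
SYSTEM_COST=30000    # workstation including all add-ons
PER_DAY=$(( DAY_RATE * INFRA_SHARE / 100 ))                         # 150 (or 80)
echo "break-even after $(( SYSTEM_COST / PER_DAY )) billable days"  # 200 (or 375, i.e. 3 years at 125 days/yr)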

2 Likes

I feel like this might be the right thread to ask this question. I have a little laptop (oh, and that whole VMware thing didn’t work, or I gave up on it LOL) and I have decided to try to set up Flame in the cloud again. Wow, this is complicated.

I have been following the guides closely and I have opened up a case with AWS. I wonder if any of you can give me a tip on what I did wrong.

I think the reason the AMI is not showing up in my images in the AWS console is that I am getting an error running the following command in the AWS CLI:

aws ec2 import-image --region --description "" --disk-containers file://C:\Users\admin\Desktop\containers.json. --role-name "vmimport"

Here is the JSON in the file containers.json:

[
    {
        "Description": "",
        "Format": "vmdk",
        "UserBucket": {
            "S3Bucket": "",
            "S3Key": "<adsk-baseami-rockylinux-85-v3.vmdk>"
        }
    }
]

and the error I get after attempting to run the command is:

"The system cannot find the file specified."

I also ran this other command to check on it:

aws ec2 describe-import-image-tasks

it returns this:

"ImportImageTasks": [
{
"Architecture": "x86_64",
"ImportTaskId": "import-ami-00e693c0191db39ce",
"Platform": "Linux",
"SnapshotDetails": [
{
"DiskImageSize": 0.0,
"Status": "completed",
"Url": "s3://flame02/adsk-baseami-rockylinux-85-v3.vmdk"
}
],
"Status": "deleted",
"Tags":
},
{
"Architecture": "x86_64",
"ImportTaskId": "import-ami-04105412778abe0d9",
"Platform": "Linux",
"SnapshotDetails": [
{
"DiskImageSize": 0.0,
"Status": "completed",
"Url": "s3://flame02/adsk-baseami-rockylinux-85-v3.vmdk"
}
],
"Status": "deleted",
"Tags":
},
{
"Architecture": "x86_64",
"ImportTaskId": "import-ami-095a8cff3eb41da32",
"Platform": "Linux",
"SnapshotDetails": [
{
"DiskImageSize": 0.0,
"Status": "completed",
"Url": "s3://flame02/adsk-baseami-rockylinux-85-v3.vmdk"
}
],
"Status": "deleted",
"Tags":
}
]
}

hmmmmm it says DELETED

Well, my first attempt to upload the vmdk from the drive on my laptop failed, so I had to upload it again.

Maybe this is the problem

Should I rename the vmdk on my laptop and then upload it a 3rd time? And create a new S3 bucket? Would that fix the problem? It takes about 12 hours, but that’s fine, I just want to get it going.

I just need this vmdk to show up as an AMI so I can launch the instance in EC2.

Maybe there is a way to do this by pasting the JSON into the AWS console and just not using the AWS CLI???

Thanks.

I don’t mean to be unhelpful, but when I read this, I know, balls to bones, that I’ll never run a Flame in the cloud…

Life’s too short.

4 Likes

Well, I like to try new things. I hired some help. He told me there were a few typos, like the <> in the JSON.

Had to get rid of those for starters. I have the AMI going now. If it’s gonna cost me one hour to hire the help again and get it going, that’s probably what I am going to do.
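
For anyone who hits the same wall, the cleaned-up versions look roughly like this (the region below is just a placeholder for your own; the bucket name matches the describe output above):

aws ec2 import-image --region us-east-1 --description "" --disk-containers file://C:\Users\admin\Desktop\containers.json --role-name "vmimport"

(note there is now an actual value after --region and no stray dot after the .json path)

and containers.json without the angle brackets, with the bucket filled in:

[
    {
        "Description": "",
        "Format": "vmdk",
        "UserBucket": {
            "S3Bucket": "flame02",
            "S3Key": "adsk-baseami-rockylinux-85-v3.vmdk"
        }
    }
]

Once the import task reports completed, aws ec2 describe-images --owners self should list the new AMI, and it shows up in the console under Images > AMIs.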

I mean, everybody rents everything nowadays. Cars, houses, furniture, you name it. Might as well rent servers and GPUs from AWS too. I can’t afford to buy the latest and greatest every two years right now.

3 Likes

I admire the resolve. Just not for me….