Flame Benchmark Archive

@Waldi Just got my M3 Max with 128 GB RAM. Did the test in 6:44. Desktop territory. It’s fast.

7 Likes

That is valuable intel. Thanks @snacks

So will the M3 Ultra, presumably next year, finally be the Mac that beats Linux? The rumor was that the M2 was supposed to have a 4-way connection of processors for the Mac Pro, but for some reason it failed. I suspect they’ll get that right on the M3, and we’ll soon see Macs that are 4x the current M3 MBPs. Interesting times.

I would not count my chickens on that quite yet.

They’re bumping up against physics and statistics in this case. Apple made a design choice which is brilliant for the low to mid end of their line-up, but has a very defined upper boundary with their chips. They kind of painted themselves into a corner, so to speak, though they may not mind it all that much depending on how they value the very high end of the market, which is a minuscule part of their business.

The main point is that they decided to go with an SoC (system on a chip) architecture, where the CPU, GPU, and all the memory are packaged together as a single chip. That makes it a massive chip as is.

Anything you manufacture at this scale will come with a certain number of defects. So they over-allocate certain areas and then mark some parts of the chip as bad during QC, as long as enough of it works. Sometimes when you get a lower-end CPU, it’s actually the same chip, just with not everything turned on at the hardware level.

But the more you cram into a chip, the lower the probability that it’s defect free, or has few enough defects to be viable. From what I read, when they tried joining two M2 Ultras for the Mac Pro they simply had a yield problem that made the resulting chip cost prohibitive.
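To put rough numbers on the yield point, here is a minimal sketch using the classic Poisson yield model; the defect density and die areas below are made up for illustration, not actual TSMC or Apple figures.

```python
import math

def poisson_yield(die_area_mm2: float, defects_per_mm2: float) -> float:
    """Classic Poisson yield model: probability that a die has zero defects."""
    return math.exp(-die_area_mm2 * defects_per_mm2)

# Illustrative numbers only -- hypothetical defect density and die sizes.
defect_density = 0.002  # defects per mm^2
for area_mm2 in (120, 450, 900):  # roughly: small die, Max-class die, Ultra-class assembly
    print(f"{area_mm2:>4} mm^2 -> {poisson_yield(area_mm2, defect_density):.1%} defect-free")
```

Bigger silicon means exponentially fewer perfect dies, which is why binning off broken blocks is so attractive, and why gluing even more silicon together gets cost prohibitive quickly.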

Yet their architecture and performance rely on everything being on the same die; that’s how you end up with unified memory and all that. Once you go off-chip and add high-speed buses between multiple chips, you end up with another set of compromises. AMD has gone down that road with some of their CPU designs.

Those are problems that can be solved, but they only really matter for the 0.1% or fewer of super high-end systems in Apple’s lineup. For AMD (or others), by contrast, that’s a bigger market segment. So I don’t see any incentive for Apple to crack this nut. The complexity / cost / benefit ratio is just not there. They’ll likely accept that their current Mac architecture has a built-in ceiling.

3 Likes

You seriously think an M-whatever can beat a Threadripper + Quadro at some point in the short, mid, or long term?

I reckon the M3 Ultra will sit somewhere between a 4070 Ti and a 4080 in terms of GPU performance.

The pace at which Apple is developing these chips is insane.

2 Likes

I read this exact same thing as well. Apple do have clever engineers on board, so they may be able to crack this, but I certainly wouldn’t be counting on it, especially when the M3 already has good performance.

One other thing I’ve read is that ARM processors (which is what Apple are developing) have a more limited instruction set, which can mean an operation takes multiple instructions where an x86-64 chip can execute it in one. So for high-end computing that is heavily CPU based, the x86-64 processors will be a lot faster. Since Flame is GPU optimised this will limit the CISC vs RISC benefits, but they will still be there. I wonder if this is why the Apple chips seem to slow down on more complex comps than Linux systems do for Flame, although there are definitely people on here claiming that big comps are totally workable on Apple chips.

These days, I don’t see a real-world performance difference great enough that I wouldn’t choose Apple hardware or would be unhappy on a Mac, which was simply not the case even a couple of years ago.

Just checked the spreadsheet too. It indeed is totally whack now!

Unless someone backed up or downloaded a copy of it, or is prepared to dig back through the edit history to a version that looks about right, it may need to be started again.

1 Like

I think that is too simplified a view. Yes, RISC architectures rely on a smaller and simpler instruction set, thus requiring the compiler to break certain things into additional instructions. At the same time these simple instructions run significantly faster (a few clock cycles vs. complex CISC instructions sometimes taking many clock cycles). You would have to compare the resulting clock cycle budget for comparable algorithms for a fairer comparison. RISC CPUs have been around for a long time, beyond ARM (HP had PA-RISC, and the later Sun Microsystems CPUs were RISC architectures). x86 is a hold-out of sorts.
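One way to frame that “clock cycle budget” comparison is the standard iron-law performance equation: execution time = instruction count × cycles per instruction / clock rate. The numbers below are invented purely to show how more-but-simpler instructions can still come out ahead; they are not measurements of any real chip.

```python
def run_time_ms(instructions: float, cpi: float, clock_hz: float) -> float:
    """Iron law of CPU performance: time = instructions * CPI / clock rate."""
    return instructions * cpi / clock_hz * 1e3

# Hypothetical workload: the RISC build needs ~30% more instructions,
# but each one retires in fewer cycles on average.
cisc_ms = run_time_ms(instructions=1.0e9, cpi=2.2, clock_hz=4.5e9)
risc_ms = run_time_ms(instructions=1.3e9, cpi=1.0, clock_hz=3.8e9)
print(f"CISC-style: {cisc_ms:.0f} ms, RISC-style: {risc_ms:.0f} ms")
```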

In modern CPU designs, with deep pipeline optimization, branch prediction, out-of-order execution, and many other shenanigans I’ve lost track of, the performance battles go well beyond the basic instructions. It’s on this width dimension (how many instructions get decoded and issued in parallel) that ARM has actually pushed the boundaries the most. I forget how many parallel instructions they can decode and schedule (per core), but it was off the chart. Maybe I’ll find the article again; it was during the early M1 days. I don’t keep up with this as much as I used to. My assembly code, hardware, and OS internals days are decades in the rearview mirror.
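A toy model of that width point (the decode widths below are the figures commonly reported in the M1-era coverage, so treat them as approximate): sustained instructions per cycle is capped both by how many instructions the front end can decode per cycle and by how much parallelism the code actually exposes.

```python
def sustained_ipc(decode_width: int, workload_ilp: float) -> float:
    """Toy model: IPC is limited by the narrower of the core's decode/issue
    width and the instruction-level parallelism the code exposes."""
    return min(decode_width, workload_ilp)

# Commonly reported widths: ~4-6 for recent x86 cores, 8 for Apple's M1-era big cores.
for width in (4, 8):
    for ilp in (2.0, 6.0, 10.0):
        print(f"decode width {width}, workload ILP {ilp:>4}: IPC ~ {sustained_ipc(width, ilp)}")
```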

I don’t have any specific insights on why Flame becomes slower on big comps on Apple Silicon, but I imagine it has to do with the GPU design. I don’t believe Apple has followed the proven architectures on the GPU front, which are some of the most parallel and mass-scaled subsystems. The on-board GPU is probably good enough for the normal workloads and video editing tasks (encode/decode), but not for the heavy compute that goes on in a Flame comp.

When you look at the M1 specs you read about < 40 GPU cores, whereas NVIDIA cards have upwards of 16K cores. Not an apples-to-apples comparison, but it seems to be the opposite, where Apple’s GPU cores are higher level, while NVIDIA excels at maximum parallel scale. But higher level optimized for what?
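For what it’s worth, the two vendors count different things: an Apple “GPU core” is widely reported to bundle 128 ALUs, while NVIDIA’s number counts individual CUDA cores (ALUs). A rough back-of-envelope conversion using those commonly cited figures (approximations, not official specs):

```python
# Rough ALU-count comparison; figures are commonly cited values, not official specs.
apple_gpu_cores = 40          # e.g. a Max-class part
alus_per_apple_core = 128     # widely reported per-core ALU count for Apple GPUs
nvidia_cuda_cores = 16384     # RTX 4090 CUDA core count

apple_alus = apple_gpu_cores * alus_per_apple_core
print(f"Apple:  ~{apple_alus} ALUs")                      # ~5120
print(f"NVIDIA: ~{nvidia_cuda_cores} ALUs")
print(f"Ratio:  ~{nvidia_cuda_cores / apple_alus:.1f}x")  # ~3.2x
```

So the raw shader ALU gap is real but closer to 3x than 400x, before you factor in clock speed, memory bandwidth, and how well the software actually uses them.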

Also not an apples-to-apples data point, but just for scale: the M3 Max is 92B transistors, and that covers the entire SoC, from CPU and GPU to system control (the unified memory is separate silicon on the same package). On the other hand, an RTX 4090 has 76B transistors just for GPU horsepower, before you even count its VRAM chips.

PS: ARM isn’t even the first RISC CPU Apple has used. PowerPC, which sat between the 68K and x86 chapters of Apple’s history, was also a RISC design, and was a collaboration between Apple, IBM, and Motorola.

1984-1992: MC68000
1990-1996: MC68020/30/40
1994-1998: PowerPC 601-604
1999-2006: PowerPC G3-G5
2006-2021: Intel x86
2021-2023: Apple M1-M3

1 Like

Yeah, this is all beyond my pay grade, hence why I ended up talking about real-world performance. It’s rare in comparison articles that they look at a system holistically, and I’m sure that macOS has a lot of optimisations under the hood for their own hardware, as it is so limited in terms of hardware variation. Linux and Windows, on the other hand, have to account for a plethora of hardware combinations. Performance can even vary greatly depending on cooling systems, equipment environment and optimal operating temperatures. The liquid immersion cooling data centres are insane and allow the hardware to run way faster than traditional cooling. You are certainly right when you say it is a complex topic.

In the end, if it works for your use case then it works for your use case. There are still pros and cons for Linux vs Mac for a Flame system, but gone are the days when a Linux system could render the same setup in half the time of a similarly specced Mac.

I think this is the best way of looking at it. Also, Flame is a very versatile app, and when you look at what we all use it for, it’s difficult to find a representative workload. There will be things people do that are better on one than the other, and vice versa. You just have to test and benchmark.

While performance does matter, I think the available storage and other hardware, the usability and versatility of the system, and your access to IT resources are almost a bigger factor than raw processing power for most of us. Especially as many of these systems now live on our desktops, not in a data center anymore.

1 Like

Or if someone also wants Adobe CC available on the same OS (one of many non-Linux software packages), then the decision is made for you. Some people don’t want the hassle of a dual-boot Linux/Windows system.

Unless you’re just a cog in a big pipeline, I think these other apps invariably come up for most of us.

Not a fan of dual boot though. The way I do this: I have a Linux Flame downstairs and a Mac Studio upstairs. The Linux Flame filesystem is cross-mounted, and on the Mac I have a widescreen monitor, so when I remote into the Linux Flame I can see both monitors. If I’m doing just color work and need the reference monitor, I’ll work on the Linux Flame directly. If I need other apps, I remote into the Linux Flame from my Mac so I can have Adobe and Flame side by side for quick workflows. And if it’s a simple job, or I don’t feel like working downstairs, I just run Flame on the Mac where I have the bigger monitor.

That skips all the dual-boot hassle.

1 Like

Sorry I missed this!

Looks like we may have to start a new benchmark list. This one is all borked up. I looked through the history and can’t make sense of it. Anybody?

Rand - Does this do anything for you?