So after a few more hours of elbow grease I’ve got Flame 2025 running stable on this system.
It was the NVidia driver all along. I found later that every time it got stuck on something, the next line in the log file was a video mode change (i.e. the driver wouldn’t activate the new mode).
At first I tried a myriad of ways of manually installing the NVidia driver to have more control. I tried elrepo, I tried directly from the NVidia repos. But it was just variations of the same problem, and they were all different versions of the 550 driver for RHEL 9.
Eventually I came across this note:
So I gave up on Rocky 9.3… Instead I installed the ISO from 2024 which I knew worked well on the system, but then installed DKU 19 and Flame 2025. That combo is supposedly supported based on the system requirements and works fine for me. End of drama cycle.
Rocky 9.3 apparently has a lot of issues with NVidia driver stability, judging by the sheer number of forum posts trying to deal with it. A lot of it seems to come down to the pairing of specific kernel versions with specific driver versions, and that link breaking whenever there is a kernel update (see here and here).
Incidentally, DKU 19 for Rocky 8.7 ships with NVidia’s 535 driver, which is much better behaved.
Anyway, moving on, back to billing work.
Footnote 1: This is where it was disappointing that ADSK washed their hands of it so fast with ‘this is not a supported config’. The only things on that system that weren’t in their support matrix were the motherboard and on-board I/O. All the other items, in particular the A5000, are in the matrix. They missed an opportunity to learn about the fragility of the NVidia driver that may be impacting other systems as well, including some variations of HP, Dell, and Lenovo systems. Such is the world.
Footnote 2: In case it’s of use to anyone: if you have trouble with the video drivers in Rocky and get locked out of boot, it’s useful to know how to force Grub into text mode after the fact.
- Halt the boot once the Grub selection screen appears by hitting cursor up or down
- Enter ‘e’ to edit the Grub commands (it appears that you only get to select, but they’re all editable, and you can add options)
- At the end of the main kernel line add ‘systemd.unit=multi-user.target’
- Hit F10 to continue the boot.
With that Rocky will boot, but bypass the switch to the NVidia driver and give you a shell login from where you can then debug the rest of the system.
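For anyone unsure what the edited line should end up looking like, here is a rough sketch of the Grub steps above as a small shell helper. It only manipulates a string and touches nothing on the system. The function name `add_text_mode` and the sample kernel line in the usage comment are made up for illustration; your real `linux` line will have its own kernel path and options.

```shell
#!/bin/sh
# Sketch only: shows the effect of the manual Grub edit on a kernel line.
# It does not modify any Grub configuration.

TEXT_MODE_ARG='systemd.unit=multi-user.target'

# Print the given kernel command line with the text-mode target appended,
# unless the argument is already present on the line.
add_text_mode() {
    case " $1 " in
        *" $TEXT_MODE_ARG "*) printf '%s\n' "$1" ;;
        *) printf '%s %s\n' "$1" "$TEXT_MODE_ARG" ;;
    esac
}

# Usage (hypothetical kernel line, yours will differ):
#   add_text_mode 'linux /vmlinuz root=/dev/sda1 ro quiet'
```

The edit made in the Grub editor only applies to that one boot. Once you are at the shell, `systemctl get-default` shows which target the system normally boots into, and `sudo systemctl set-default multi-user.target` should keep it in text mode across reboots until the driver is sorted out (switch back with `graphical.target`).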