NVMe RAID suddenly stopped working!

After a Mac reboot, I can no longer access the internal NVMe RAID that I use as a framestore.
I get a little pop-up window saying “The disk you attached was not readable by this computer”, followed by three options: Ignore, Eject or Initialize.

I have looked in Disk Utility, and the RAID is not showing up.

Has anyone else seen this happen, or got a clue how to resolve it?

Because of this, none of my projects will open in Flame 2024.2: the dialogue box is empty, and when I click to pick a project, they are all greyed out.

All rather worrying, considering the NVMe is only 6 months old.

Edit:
Weirdly, if I reboot into Recovery Mode, the NVMe is there in Disk Utility and looks OK. I ran First Aid on it and it seems fine. But when I reboot in normal mode, I get the message about it not being readable by the Mac.

First of all: ouch…

What type of RAID is it?

Also, there’s a difference between seeing the physical drives in the RAID and the logical drive that represents the RAID.

Either way, the recovery should happen through whatever software you use to manage the RAID, not the operating system itself.

It was set up using Apple's Disk Utility.

The RAID is on a Highpoint controller (SSD7140) with 4 x 2TB Sabrent Rocket M.2 SSDs.

As I mentioned, in Recovery Mode Disk Utility can see the RAID and it's showing the right amount of data used (1.76TB out of 8TB). First Aid says it's all fine, yet when I restart the Mac normally, it claims the drive is unreadable.

At this point, I am stumped!

I would try setting up a new user in macOS, just to make sure that no faulty preference or similar file in your current user path is causing this behavior.

Not familiar with that particular one. It could be that some driver/extension got blocked.

On the Highpoint site they do have RAID management software. Even if you didn’t set it up that way, it might still provide some helpful diagnostics: https://www.highpoint-tech.com/ssd7000-products-downloads/

I think there are also ways of checking in the System Report which devices are discovered, which extensions are loaded, etc. I went down that rabbit hole in the summer over an LTO / SAS adapter.

Is that something you want to (and can) diagnose yourself, or do you have IT folks who can help? Not sure how good Highpoint's support is. I've never had to use it so far (I do have a different type of their controllers in another system).

Also: has anything changed since the previous reboot? What was the impetus for rebooting the system?

Also, did you check the logs right after reboot?

Open a terminal and type ‘sudo dmesg | more’ to read it. Maybe use fgrep to search for the RAID-related entries.
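
If fgrep feels fiddly, here is a minimal Python sketch of the same idea: run `sudo dmesg` and keep only the lines that mention a few keywords, case-insensitively (roughly what `fgrep -i` does). The keyword list is purely a hypothetical starting point; adjust it to whatever names your controller and driver actually log. It assumes a Python 3 interpreter is available on the Mac.

```python
import subprocess

# Hypothetical search terms: swap in whatever your controller/driver actually logs.
KEYWORDS = ("highpoint", "nvme", "raid", "apfs")


def filter_lines(lines, keywords=KEYWORDS):
    """Return only the lines containing any keyword, ignoring case (like `fgrep -i`)."""
    lowered = tuple(k.lower() for k in keywords)
    return [line for line in lines if any(k in line.lower() for k in lowered)]


def read_kernel_log():
    """Run `sudo dmesg` (prompts for a password) and return its output as a list of lines."""
    result = subprocess.run(
        ["sudo", "dmesg"], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()
```

Usage would be something like `for line in filter_lines(read_kernel_log()): print(line)`, ideally run right after a normal boot while the RAID is missing.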

If you go into Settings on the Mac, then General > About (or About This Mac, depending on the OS version), there is a System Report button at the bottom (it has moved around in various OS versions).

That prints out a hierarchy of your PCI devices and also all the loaded drivers/extensions…

You can sort by ‘Last Modified’.
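
A rough command-line equivalent of sorting by ‘Last Modified’ is sketched below. It assumes the driver is a classic kernel extension (a `.kext` bundle) installed under `/Library/Extensions`, which is where third-party kexts usually live; newer-style system extensions are installed elsewhere, so an empty result doesn't prove nothing changed.

```python
import os
from datetime import datetime

# Assumption: third-party kernel extensions are installed as .kext bundles here.
EXTENSIONS_DIR = "/Library/Extensions"


def recently_modified(directory=EXTENSIONS_DIR, suffix=".kext"):
    """Return (modified_time, name) for entries ending in `suffix`, newest first."""
    entries = []
    with os.scandir(directory) as it:
        for entry in it:
            if entry.name.endswith(suffix):
                mtime = datetime.fromtimestamp(entry.stat().st_mtime)
                entries.append((mtime, entry.name))
    return sorted(entries, reverse=True)
```

For example, `recently_modified()[:5]` would list the five most recently touched bundles, which is the same question as "what changed before the reboot?".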

I have the Highpoint software; it just shows the enclosure but not the RAID/SSDs.

Just tried the new Mac user suggestion: same result.

Already been through the System Report and could not see any of the Highpoint/Sabrent stuff.

Just tried Recovery Mode again, and it's showing up without issue… no clue why macOS itself is suddenly having problems.

It was restarted after my Wacom decided to stop responding. I had been editing for a couple of hours when it just lost its connection. I shut down all the apps, did a reboot, and since then I've been unable to mount the NVMe.

Not looked at any logs… will do that next.

Just tried your Terminal suggestion whilst in Recovery Mode. It didn't work, so I will try after a restart.

Makes sense.

After the reboot, is your Wacom working again? Reason: was that a temporary hiccup, or maybe an indication of an I/O hardware error or damage?

I haven't spent that much time in the recovery shell, but I assume you may have access to the dd command or something similar. If so, and if you can access the data in the recovery tool, you could copy everything off onto another drive in Recovery Mode, then delete and rebuild the array and copy it back. That's just an idea; I haven't done this myself, so I'm not sure it would work. I'm also not sure how picky Flame is about file details, or whether it would be happy with a restored drive.
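
Before deleting the array, it would be worth confirming the copy is complete and byte-identical. This is a minimal Python sketch of a file-level check (not a replacement for dd): it hashes every file under two folders with SHA-256 and reports files that are missing from the copy or whose contents differ. It assumes a Python 3 interpreter is available wherever you run it; the Recovery environment may not have one.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of one file, read in 1 MiB chunks so large frames don't fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root):
    """Map each file's path (relative to `root`) to its SHA-256 digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            digests[os.path.relpath(full, root)] = file_digest(full)
    return digests


def compare_trees(source, copy):
    """Return (missing, mismatched): files absent from `copy`, and files whose contents differ."""
    src, dst = tree_digests(source), tree_digests(copy)
    missing = sorted(set(src) - set(dst))
    mismatched = sorted(p for p in src.keys() & dst.keys() if src[p] != dst[p])
    return missing, mismatched
```

If `compare_trees(original, backup)` returns two empty lists, every file made it across intact. Hashing 1.76TB will take a while, but it is much cheaper than discovering a bad copy after the array has been rebuilt.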

What hardware are we talking about? Is it an Intel Mac Pro? If so, a PRAM reset would be the next thing I would try:

PRAM Reset Intel Mac

Yes, the Wacom is working fine.

The dmesg in Terminal just returned a massive list of stuff (thousands of lines that all look the same), and I could not decipher any of it.
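
When a log is thousands of near-identical lines, collapsing the repeats usually makes the rare, interesting lines stand out. This is a small Python sketch of two ways to do that: collapsing consecutive duplicates (like `uniq -c`) and counting the most frequent lines overall. It only helps if the repeated lines are truly identical; if they differ by a timestamp, that part would have to be stripped first.

```python
from collections import Counter
from itertools import groupby


def collapse_repeats(lines):
    """Collapse runs of identical consecutive lines into (count, line) pairs, like `uniq -c`."""
    return [(sum(1 for _ in group), line) for line, group in groupby(lines)]


def most_frequent(lines, n=5):
    """Return the n most frequent distinct lines as (count, line), most frequent first."""
    return [(count, line) for line, count in Counter(lines).most_common(n)]
```

For instance, after saving the output with `sudo dmesg > dmesg.txt` (a hypothetical file name), `collapse_repeats(open("dmesg.txt").read().splitlines())` would shrink a wall of repeats down to a handful of entries with counts.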

The last extension to be modified was the LaCie RAID software, 3 days ago. The Mac has been restarted several times since then, so I would be surprised if that's causing the issue.

Tried PRAM reset - no joy.

Edit:

As it's finishing time for today, I think I'll power off over the weekend, then try reseating the PCI board before powering on again on Monday.

If the NVMe is attached via a PCI slot, swapping it to another slot may also help the troubleshooting. Good luck.

@Lightningad I've seen this on my 2019 Mac Pro as well, but with OWC Accelsior NVMe RAIDs.

My theory is that it is heat related, caused by dust buildup and/or the way the PCI boards are stacked inside the Mac, which traps too much heat.

I have 2 OWC NVMe RAIDs, this has happened to me 3 times now, and I have resolved the issue by:

  • blowing out the dust

  • unseating the cards, blowing out the dust, then reseating them

  • and finally, arranging them to allow more airflow, so that one is not directly on top of the other.
    So far, no missing NVMe drive on reboot.

Here's a screenshot of how they are arranged now.
One drive is in slot 5 and the other in slot 7.

I'll qualify that: when it has happened, it was always only one of the NVMe drives on one of the OWC RAIDs.
I think the machine can get pretty hot for a very short time on reboot, as it has only ever happened to me on a reboot: once (maybe twice) during an OS update, which on the Mac usually involves at least two reboots.
But this is just a guess.

Intriguing!

I had Flame rendering for an hour before quickly jumping into Pro Tools. It was after a bounce-down from Pro Tools that the problem started. Quite possibly it was warmed up inside.

On Monday, I'll be resetting the SMC and reseating the board, so hopefully when it's powered up it will work again.

Edit: however, that doesn't explain why macOS cannot see the drive but Recovery Mode can… it's all connected up, yet only visible to one OS. I'm starting to think I need to reinstall the OS.

I think your edit is totally valid. I suspect it sees the RAID slightly differently in Recovery Mode; some of the RAID configuration may not be loaded the same way.

Which RAID mode is this in? 0, 1, or JBOD? I wonder if in Recovery Mode you're seeing an underlying drive, not the logical drive. In RAID 1 that drive would contain the same data, so the size would make sense. Maybe one of the other drives is not connecting, but in Recovery Mode that may not be obvious. Hard to tell without sitting at the machine. Just theorizing.


Here's what Disk Utility can see in Recovery Mode. It's showing a striped set of disks using RAID 0.

Helpful.

Q: This is listed as an 8TB volume, but I see 3x 2TB drives at the bottom. So is one missing? 8TB in RAID 0 should be 4x 2TB. It doesn't look like there's a scrollbar hiding one.

If there’s a drive missing, that would explain why it’s not coming up for you.
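
The arithmetic behind that question, as a small Python sketch. It ignores filesystem overhead and the decimal-vs-binary "TB" difference, and uses the three simple set types Disk Utility offers: striped (RAID 0), mirrored (RAID 1) and concatenated (JBOD). The function names are just illustrative.

```python
def expected_capacity_tb(level, member_sizes_tb):
    """Usable capacity of a simple array, ignoring filesystem overhead.

    raid0 (striped):      data is spread over all members, limited by the smallest one.
    raid1 (mirrored):     every member holds the same data.
    jbod (concatenated):  members are appended end to end.
    """
    if not member_sizes_tb:
        raise ValueError("need at least one member")
    if level == "raid0":
        return len(member_sizes_tb) * min(member_sizes_tb)
    if level == "raid1":
        return min(member_sizes_tb)
    if level == "jbod":
        return sum(member_sizes_tb)
    raise ValueError(f"unknown level: {level}")


def members_needed_raid0(volume_tb, member_tb):
    """How many equal-size members a RAID 0 volume of `volume_tb` implies."""
    return round(volume_tb / member_tb)
```

An 8TB striped volume built from 2TB drives implies `members_needed_raid0(8, 2) == 4` members, while three visible members would only account for `expected_capacity_tb("raid0", [2, 2, 2]) == 6` TB. A RAID 0 set can't be assembled with a member missing, which fits the "not readable" message.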

I hadn't spotted that!
Hopefully it will have cooled down by next week and reappear!

Right. Instead of reseating the RAID card in the PCIe slot, I would remove and reseat the NVMe drives themselves. It could be that this is where the dropout is occurring.
