So based on this:
I’ve got it running as a Pybox on Mac as a full MLX port, with an option for 8-bit quantization. Next step is to bring CUDA support back in, building on what I have working. Hoping to have it ready tomorrow.