hot take: if you are working in Scene Linear it doesn’t matter what order you do them.
I had a hunch that some of this issue may be due to rec709 destroying highlight detail so I ran a test.
I made a 3D object in Action, set the colorspace to ACEScg, and set my viewing LUT to “ACES to SDR video (rec709 limited)”.
I lit the heck out of it so the highlights were well above 1 (my rim light was set to 10,000 brightness). I rendered it out both sharp and motion-blurred. With the sharp image I applied a defocus first, then a directional blur, to simulate the “defocus first” approach. Next, with the already motion-blurred one, I applied a defocus on top to simulate the opposite approach.
There are some differences but they’re all within the realm of taste; none are clearly better.
As someone who spent years rolling my eyes at people telling me, “scene linear is better! the math works! I can’t explain any of it, but it does!” I feel examples like this are useful. This works because scene linear mimics real-world light values, which means that something that looks white IRL is BRIGHT AS FUUUUUCK.
The highlight in this shot meters 135 and the shadow is 0.04 (a ratio of about 3,400:1, or approx 12 stops of difference) before any blur is applied. When blurred (motion and defocus) that highlight stays bright (metering around 3.0 in all three examples depicted), regardless of the order in which the operations happen. In rec709 the highlight is clipped to 1 and the shadow is 0.07, so you have to do a lot of labor to retain highlights when blurring (motion or defocus).
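If you want to poke at the idea without firing up a comp, here is a rough toy sketch in plain Python. It is not what Action does internally: the 1D "scanline", the box kernels standing in for defocus and motion blur, and the hard clip at 1.0 standing in for rec709 are all assumptions I made up for illustration (a real display transform involves more than a clip). It only reuses the 135 and 0.04 values from my shot.

```python
def convolve(signal, kernel):
    """Full 1D convolution in pure Python (output is len(signal) + len(kernel) - 1)."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def box(width):
    """Normalized box kernel: a crude stand-in for a blur of the given width."""
    return [1.0 / width] * width


# Hypothetical scanline: 0.04 shadow everywhere, one 135 highlight pixel.
scanline = [0.04] * 101
scanline[50] = 135.0

defocus = box(9)  # assumed kernel size, illustration only
motion = box(5)   # assumed kernel size, illustration only

# Scene linear: blur in both orders and compare.
defocus_first = convolve(convolve(scanline, defocus), motion)
motion_first = convolve(convolve(scanline, motion), defocus)
max_order_diff = max(abs(a - b) for a, b in zip(defocus_first, motion_first))

# Clipped: throw away everything above 1.0 before blurring.
clipped = [min(v, 1.0) for v in scanline]
peak_linear = max(defocus_first)
peak_clipped = max(convolve(convolve(clipped, defocus), motion))

print("order difference:", max_order_diff)
print("peak after blur, scene linear:", peak_linear)
print("peak after blur, clipped first:", peak_clipped)
```

In this toy setup the two orders agree to within floating-point noise, and the scene-linear highlight stays well above 1 after both blurs, while the pre-clipped one drops down toward the shadow level. That is the same behavior I saw in the renders, though real motion blur from a renderer is not exactly a directional blur, which is probably where the small taste-level differences come from.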