Forgive me for playing devil's advocate, but if you don't trust its response in the sub-bass, then why would you trust it anywhere else?
I have looked at some of Jaakko's plots, and think he's done some amazing things with his apps. But I prefer a somewhat different approach, one based more on actual measurements than on an idealized target (which may or may not be relevant to the measuring system being used).
Imo, the Harman curve was never intended to be used as a target for equalization purposes the way Jaakko and some others are employing it. It was intended simply as a very rough guide for a headphone response based on the subjective preferences of listeners, and the sound of loudspeakers in the room. No more, and no less.
If you want to base your equalization on Oratory1990's graphs, then here is the approach I would use. Create a list of headphones which you believe are as close to the sound you're after as possible, based on your own listening experience, and (if necessary) the opinions of others you trust. Average the responses of all those headphones together. And then use that averaged frequency response curve as your target for equalization.
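To make the averaging step concrete, here is a minimal Python sketch of how it could be done. It is only an illustration, not how Oratory1990, Jaakko, or AutoEQ actually do it. The function names (`interp_db`, `average_target`, `eq_correction`) are hypothetical, the numbers are made up rather than real measurements, and it assumes each response is a list of frequencies in Hz with levels in dB, that averaging the dB values directly is acceptable, and that all curves have already been level-matched.

```python
import bisect
import math

def interp_db(freqs, levels, f):
    """Linearly interpolate a response (dB) at frequency f on a log-frequency axis."""
    if f <= freqs[0]:
        return levels[0]
    if f >= freqs[-1]:
        return levels[-1]
    i = bisect.bisect_right(freqs, f)
    x0, x1 = math.log10(freqs[i - 1]), math.log10(freqs[i])
    t = (math.log10(f) - x0) / (x1 - x0)
    return levels[i - 1] + t * (levels[i] - levels[i - 1])

def average_target(responses, grid):
    """Average several (freqs, levels_db) responses on a shared frequency grid."""
    return [
        sum(interp_db(fr, lv, f) for fr, lv in responses) / len(responses)
        for f in grid
    ]

def eq_correction(target, measured):
    """Gain in dB to add at each grid point so `measured` matches `target`."""
    return [t - m for t, m in zip(target, measured)]

# Made-up numbers for illustration only, not real measurements.
grid = [20, 100, 1000, 10000]
favourites = [
    ([20, 100, 1000, 10000], [4.0, 2.0, 0.0, -2.0]),
    ([20, 100, 1000, 10000], [6.0, 4.0, 0.0, -4.0]),
]
target = average_target(favourites, grid)        # averaged target curve
my_headphone = [2.0, 3.0, 0.0, -6.0]             # headphone to be equalized
correction = eq_correction(target, my_headphone)  # dB to add per grid point
```

In practice you would use a much denser grid and measurements taken on the same rig, so that the rig's own quirks appear in both the target and the headphone being corrected.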
Whether you would get a substantially different result with this type of approach than the one you are using now, I can't really say for sure. Imo though, an approach like this can potentially deliver a much more accurate result than a more automated system (like AutoEQ) will. It may still result in some over (or under) compensation though, simply because EQ-ing is never a 100% exact science. And headphone measurements are never 100% reliable.
Imho, using actual headphone measurements as the basis for your target response curve will do a better job of capturing, modeling and (where necessary) eliminating the specific resonances and other idiosyncrasies of the system used to capture the measurement data than a more idealized and smoothed target (such as the Harman curve) would. Others may disagree.
The larger the list of headphones you use to compute your average response curve, the better, generally speaking, btw, because a larger sample will generally do a better job of filtering any unwanted noise (or other strange behavior) from the individual headphones out of your final result.
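A rough way to see why a larger list helps is the toy simulation below. It is a sketch under a strong assumption: each headphone's quirks are treated as independent, zero-mean random noise around a common "true" curve, which real headphones only approximate. The function name `residual_rms` and the 1 dB noise level are hypothetical choices for illustration. Under that assumption, the leftover error in the average shrinks roughly with the square root of the number of headphones.

```python
import math
import random

def residual_rms(n_headphones, n_points=2000, noise_db=1.0, seed=0):
    """RMS error (dB) left in the average of n noisy copies of a flat 0 dB curve."""
    rng = random.Random(seed)
    squared_error = 0.0
    for _ in range(n_points):
        # Each headphone contributes the true value (0 dB) plus its own random quirk.
        mean = sum(rng.gauss(0.0, noise_db) for _ in range(n_headphones)) / n_headphones
        squared_error += mean ** 2
    return math.sqrt(squared_error / n_points)

# Print the leftover error for a few list sizes.
for n in (1, 4, 16, 64):
    print(n, round(residual_rms(n), 3))
```

Anything that all the headphones on your list have in common will not average out this way; only the quirks that differ from one headphone to the next do.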