Quote:
I've seen the "x+x^3/6+x^5/120" tanh() approximation before, and I haven't tried it yet. I've always heard tanh() was taxing, but also assumed that the Cmath library was using a LUT or something. Is it actually more expensive than this method, with three exponentials and a square root?

Oh, there's no exponentials here. As a rule of thumb, square root on a modern desktop CPU costs about the same as division. Since we're doing both here, we might be able to save a few cycles by using rsqrt (= lower-precision 1/sqrt) and a Newton round or two to refine, but even with div+sqrt it's not too bad.
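To make the "rsqrt plus a Newton round or two" idea concrete, here's a rough sketch in portable C. The function names are made up for illustration, and the bit-trick initial guess is only a stand-in for a hardware estimate instruction (on x86 that would be something like rsqrtss); it assumes 32-bit IEEE floats. Whether this actually beats div+sqrt depends on the CPU, so treat it as something to benchmark, not a guaranteed win.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for a hardware rsqrt estimate: the classic bit-trick initial
   guess, only good to a few percent. Assumes 32-bit IEEE floats and a > 0. */
static float rsqrt_estimate(float a)
{
    uint32_t i;
    float y;
    memcpy(&i, &a, sizeof i);        /* reinterpret the float bits */
    i = 0x5f3759dfu - (i >> 1);
    memcpy(&y, &i, sizeof y);
    return y;
}

/* One Newton-Raphson step for y ~ 1/sqrt(a): y <- y * (1.5 - 0.5*a*y*y). */
static float rsqrt_newton(float a, float y)
{
    return y * (1.5f - 0.5f * a * y * y);
}

/* Low-precision estimate followed by `rounds` refinement steps. */
static float rsqrt_refined(float a, int rounds)
{
    float y = rsqrt_estimate(a);
    for (int k = 0; k < rounds; ++k)
        y = rsqrt_newton(a, y);
    return y;
}
```

With something like this, sh / sqrt(1+sh*sh) becomes sh * rsqrt_refined(1+sh*sh, 2), trading the divide and the square root for a handful of multiplies. Each Newton round roughly squares the relative error, which is why one or two rounds is usually enough.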
The suggested polynomial "sinh" is x+(x^3)/6+(x^5)/120 (or similar), i.e. the first few terms of the Taylor series. As an approximation of sinh itself that's totally terrible (with sinh growing exponentially, really any polynomial is going to be terrible), but it doesn't matter too much for tanh, because sinh/cosh approaches 1/1 quickly enough that we really just need to get the shape of the knee right. Since cosh^2 = 1 + sinh^2, we can get the cosh as sqrt(1+sinh^2). So code would look something like this (with divisions written in such a way that the compiler can optimize them to reciprocal multiplication even without fast-math):
Code:
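float x2 = x*x;
// polynomial "sinh": x + x^3/6 + x^5/120 in Horner form
float sh = x*(1+x2*((1/6.f) + x2*(1/120.f)));
// tanh = sinh/cosh, with cosh = sqrt(1+sinh^2)
float th = sh / sqrt(1+sh*sh);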
Statistics: Posted by mystran — Mon Jan 29, 2024 8:22 am