An issue in GELU results

hdnhan · Jan 31, 2026


Formula: $\text{GELU}(x) = x \cdot \Phi(x)$
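
For reference, $\Phi$ here is the standard normal CDF. Written out with the error function (a standard identity, not quoted from the problem statement), the exact form is:

$$
\Phi(x) = \frac{1}{2}\left(1 + \operatorname{erf}\left(\frac{x}{\sqrt{2}}\right)\right)
\quad\Longrightarrow\quad
\text{GELU}(x) = \frac{x}{2}\left(1 + \operatorname{erf}\left(\frac{x}{\sqrt{2}}\right)\right)
$$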

From PyTorch's official docs (https://docs.pytorch.org/docs/2.8/generated/torch.nn.GELU.html):

When the approximate argument is ‘tanh’, GELU is estimated with: $\text{GELU}(x) = 0.5x\left(1 + \tanh\left(\sqrt{\frac{2}{\pi}}\left(x + 0.044715x^{3}\right)\right)\right)$

But the reference solution is:

def reference_solution(self, input_matrix: torch.Tensor) -> torch.Tensor:
    """
    PyTorch implementation of GELU.
        
    Args:
        input_matrix: Input matrix of shape (M, N)
            
    Returns:
        Result of GELU activation
    """
    with torch.no_grad(), torch.autocast("cuda", enabled=False, dtype=input_matrix.dtype):
        return torch.nn.functional.gelu(input_matrix)

This calls torch.nn.functional.gelu without the approximate argument, so it uses the default approximate='none' (the exact form), not 'tanh'.

For example, when $x = -5$, the expected result under the tanh approximation should be essentially $0$, or more precisely $\sim -2 \times 10^{-7}$. A simple check:

  1. Go to this site (https://cpp.sh), and paste the script:
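#include <cmath>   // std::tanh
#include <cstdio>  // std::printf

int main() {
    float x = -5.0f;
    // tanh approximation of GELU; 0.797884f approximates sqrt(2/pi)
    float y = 0.5f * x * (1.0f + std::tanh(0.797884f * (x + 0.044715f * x * x * x)));
    std::printf("%.6f\n", y);
    return 0;
}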
  2. Or open this WolframAlpha query (https://www.wolframalpha.com/input?i=0.5*x*%281+%2B+tanh%28sqrt%282%2Fpi%29+*+%28x+%2B+0.044715+*+x%5E3%29%29%29+where+x+%3D+-5); the result should be shown as $-2.2918 \times 10^{-7}$.
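
To see how far apart the two variants are at this input, here is a minimal sketch in plain Python (standard library only, double precision) that evaluates both formulas at $x = -5$. The function names `gelu_exact` and `gelu_tanh` are made up for this illustration and are not part of PyTorch or the Tensara checker; the exact form is written with `math.erfc` to avoid cancellation in the far tail.

```python
import math


def gelu_exact(x: float) -> float:
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF.

    Phi(x) = 0.5 * erfc(-x / sqrt(2)); erfc keeps precision for very negative x.
    """
    return x * 0.5 * math.erfc(-x / math.sqrt(2.0))


def gelu_tanh(x: float) -> float:
    """Tanh approximation of GELU, as given in the PyTorch docs."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


if __name__ == "__main__":
    x = -5.0
    print(f"exact GELU({x}) = {gelu_exact(x):.4e}")
    print(f"tanh  GELU({x}) = {gelu_tanh(x):.4e}")
    print(f"abs difference  = {abs(gelu_exact(x) - gelu_tanh(x)):.4e}")
```

In double precision the exact form comes out at roughly $-1.43 \times 10^{-6}$ and the tanh form at roughly $-2.29 \times 10^{-7}$. Both are tiny in absolute terms, but they differ by a factor of about six, so whether a submission passes may depend on which variant the checker compares against and on its tolerance.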
