Formula:
From PyTorch's official docs (https://docs.pytorch.org/docs/2.8/generated/torch.nn.GELU.html):
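When the approximate argument is 'tanh', GELU is estimated with:
$$\mathrm{GELU}(x) = 0.5 \cdot x \cdot \left(1 + \tanh\left(\sqrt{2/\pi} \cdot \left(x + 0.044715 \cdot x^3\right)\right)\right)$$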
But the reference solution is:
    def reference_solution(self, input_matrix: torch.Tensor) -> torch.Tensor:
        """
        PyTorch implementation of GELU.
        Args:
            input_matrix: Input matrix of shape (M, N)
        Returns:
            Result of GELU activation
        """
        with torch.no_grad(), torch.autocast("cuda", enabled=False, dtype=input_matrix.dtype):
            return torch.nn.functional.gelu(input_matrix)
which means it uses approximate='none' by default (the exact, erf-based GELU), not approximate='tanh'.

For example, when x = -5, the expected result should be the value given by the tanh formula, which differs from what the reference solution returns. Simple check:
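#include <cmath>   // std::tanh
#include <cstdio>  // std::printf

int main() {
    float x = -5.0f;
    // tanh approximation of GELU; 0.797884f is sqrt(2/pi)
    float y = 0.5f * x * (1.0f + std::tanh(0.797884f * (x + 0.044715f * x * x * x)));
    std::printf("%.6f\n", y);
    return 0;
}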
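To make the mismatch easier to see, here is a minimal sketch that evaluates both variants side by side at x = -5 using only the Python standard library. The helper names `gelu_exact` and `gelu_tanh` are hypothetical and are not part of PyTorch. They are meant to mirror `approximate='none'` and `approximate='tanh'` respectively, but run in double precision rather than on a float32 tensor, so the last digits may differ from what a GPU kernel produces.

```python
import math

SQRT_2_OVER_PI = math.sqrt(2.0 / math.pi)  # about 0.797884


def gelu_exact(x: float) -> float:
    """Exact GELU, x * Phi(x), where Phi is the standard normal CDF.

    Phi(x) is written as 0.5 * erfc(-x / sqrt(2)) to keep precision
    in the negative tail.
    """
    return 0.5 * x * math.erfc(-x / math.sqrt(2.0))


def gelu_tanh(x: float) -> float:
    """Tanh approximation of GELU, as in the formula quoted from the docs."""
    return 0.5 * x * (1.0 + math.tanh(SQRT_2_OVER_PI * (x + 0.044715 * x ** 3)))


x = -5.0
# Scientific notation, because both values are far below 1e-5.
print(f"exact (erf):  {gelu_exact(x):.6e}")
print(f"tanh approx:  {gelu_tanh(x):.6e}")
```

In double precision the two results at x = -5 differ by roughly a factor of six, so a tight tolerance check against the reference solution can reject an implementation that follows the tanh formula faithfully.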