Apply the GELU (Gaussian Error Linear Unit) activation function element-wise to an input matrix:
C[i][j]=GELU(A[i][j])
The GELU function is defined as:
GELU(x)=x⋅Φ(x)
where Φ(x) is the cumulative distribution function of the standard normal distribution.
A common approximation for GELU is:
GELU(x)≈0.5x⋅(1+tanh(√(2/π)⋅(x+0.044715x³)))
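As an illustration, this approximation maps directly to a small device-side helper. The sketch below assumes a CUDA implementation with single-precision floats; the function name `gelu_approx` and the precomputed constant √(2/π) ≈ 0.7978845608 are choices made here, not part of the problem statement.

```cuda
// Scalar GELU using the tanh approximation above.
// kSqrt2OverPi is sqrt(2/pi); single precision is an assumption here.
__device__ __forceinline__ float gelu_approx(float x) {
    const float kSqrt2OverPi = 0.7978845608028654f;
    float inner = kSqrt2OverPi * (x + 0.044715f * x * x * x);
    return 0.5f * x * (1.0f + tanhf(inner));
}
```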
Input:
- Matrix A of size M×N containing floating-point values
Output:
- Matrix C of size M×N containing the GELU activation values
Notes:
- Both matrices A and C are stored in row-major order
- You should implement the approximation formula for GELU defined above (a kernel sketch follows these notes)
- GELU is commonly used in modern transformer-based neural networks like BERT and GPT
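Putting the pieces together, a minimal kernel sketch could look like the following, assuming `A` and `C` are device pointers to float data and that one thread processes one element. The kernel name, thread mapping, and single precision are assumptions for illustration, not requirements from the problem statement.

```cuda
#include <cuda_runtime.h>
#include <math.h>

// Applies GELU element-wise to a row-major M x N matrix: C[i][j] = GELU(A[i][j]).
__global__ void gelu_kernel(const float* A, float* C, int M, int N) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < M && col < N) {
        int idx = row * N + col;  // row-major offset
        float x = A[idx];
        // tanh approximation: 0.5x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 x^3)))
        float inner = 0.7978845608f * (x + 0.044715f * x * x * x);
        C[idx] = 0.5f * x * (1.0f + tanhf(inner));
    }
}
```

A typical host-side launch for this sketch would use a 2D grid, for example: `dim3 block(16, 16); dim3 grid((N + 15) / 16, (M + 15) / 16); gelu_kernel<<<grid, block>>>(d_A, d_C, M, N);` where `d_A` and `d_C` are device buffers of M×N floats.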