Perform matrix multiplication followed by Swish activation and scaling:
$$\text{output} = \text{scaling\_factor} \cdot (\text{input} \cdot \text{weight}^T + \text{bias}) \cdot \sigma(\text{input} \cdot \text{weight}^T + \text{bias})$$
where σ(x) is the sigmoid function:
$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
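If the sigmoid is evaluated literally as written, it can overflow for sufficiently negative x, because e^{-x} exceeds the double-precision range (in Python, `math.exp` raises `OverflowError` for arguments above roughly 709.78). The sketch below branches on the sign of x so that the exponent passed to `math.exp` is never positive. This is a standard stabilization trick that the problem does not require; the function name `sigmoid` is illustrative.

```python
import math

def sigmoid(x: float) -> float:
    """Numerically stable sigmoid, equal to 1 / (1 + e^(-x))."""
    if x >= 0:
        # -x <= 0, so exp(-x) is in (0, 1] and cannot overflow.
        return 1.0 / (1.0 + math.exp(-x))
    # For x < 0, rewrite 1 / (1 + e^(-x)) as e^x / (1 + e^x); exp(x) is in (0, 1).
    e = math.exp(x)
    return e / (1.0 + e)
```

With this version, `sigmoid(-1000.0)` returns `0.0` instead of raising, and `sigmoid(0.0)` returns `0.5`.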
The operation consists of three main steps:
- Linear transformation: $z = \text{input} \cdot \text{weight}^T + \text{bias}$
- Swish activation: $\text{swish}(z) = z \cdot \sigma(z)$
- Scaling: $\text{output} = \text{scaling\_factor} \cdot \text{swish}(z)$
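The following is a minimal plain-Python reference sketch of the three steps, using nested lists as matrices. It is meant for checking a kernel's output, not for speed. The function name `linear_swish_scale` and the `Matrix` alias are hypothetical and are not part of the problem's required interface. The weight matrix is never transposed explicitly: entry (i, j) of input · weight^T is simply the dot product of row i of the input with row j of the weight matrix.

```python
import math
from typing import List

Matrix = List[List[float]]

def _sigmoid(x: float) -> float:
    # Branch on sign so math.exp never receives a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def linear_swish_scale(input_matrix: Matrix, weight_matrix: Matrix,
                       bias: List[float], scaling_factor: float) -> Matrix:
    out_features = len(weight_matrix)
    output: Matrix = []
    for row in input_matrix:  # one row per batch element
        out_row = []
        for j in range(out_features):
            # Step 1: linear transformation, (input . weight^T)[i][j] + bias[j]
            z = sum(a * w for a, w in zip(row, weight_matrix[j])) + bias[j]
            # Step 2: Swish activation
            s = z * _sigmoid(z)
            # Step 3: scaling
            out_row.append(scaling_factor * s)
        output.append(out_row)
    return output
```

For example, `linear_swish_scale([[1.0, 2.0]], [[0.5, -0.5], [1.0, 1.0]], [0.5, 0.0], 2.0)` gives z = [0.0, 3.0]. The first output is therefore exactly 0.0, and the second is 2 · 3 · σ(3).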
Input:
- Matrix input_matrix of size batch_size × in_features
- Matrix weight_matrix of size out_features × in_features
- Vector bias of size out_features
- Scalar scaling_factor for final scaling
Output:
- Matrix output of size batch_size × out_features
Notes:
- All matrices are stored in row-major order
- This problem is adapted from KernelBench
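Because all matrices are stored in row-major order, element (r, c) of an R × C matrix sits at flat index r · C + c. The sketch below works directly on flat buffers, which is how a kernel typically sees memory. Its loops run over input row i and weight row j, each contiguous along k, so weight^T never has to be materialized. The function name and the flat-list signature are assumptions made for illustration only. In a GPU implementation, each independent (i, j) output element would typically be mapped to a thread or tile, but that mapping is a design choice and is not specified by the problem.

```python
import math
from typing import List

def linear_swish_scale_flat(input_flat: List[float], weight_flat: List[float],
                            bias: List[float], scaling_factor: float,
                            batch_size: int, in_features: int,
                            out_features: int) -> List[float]:
    # Shapes follow the Input section: batch_size x in_features, out_features x in_features.
    assert len(input_flat) == batch_size * in_features
    assert len(weight_flat) == out_features * in_features
    assert len(bias) == out_features
    output_flat = [0.0] * (batch_size * out_features)
    for i in range(batch_size):
        for j in range(out_features):  # each (i, j) is independent of the others
            z = bias[j]
            for k in range(in_features):
                # Row-major: input[i][k] at i*in_features + k, weight[j][k] at j*in_features + k.
                # (weight^T)[k][j] == weight[j][k], so no transpose is needed.
                z += input_flat[i * in_features + k] * weight_flat[j * in_features + k]
            if z >= 0:
                sig = 1.0 / (1.0 + math.exp(-z))
            else:
                ez = math.exp(z)
                sig = ez / (1.0 + ez)
            # Swish then scaling, written to output[i][j] at i*out_features + j.
            output_flat[i * out_features + j] = scaling_factor * z * sig
    return output_flat
```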