Implement Batch Normalization over the batch dimension (B) for each feature channel in a 4D tensor.
The formula for Batch Normalization is:
y = (x − E[x]) / √(Var[x] + ϵ)
where the mean E[x] and variance Var[x] are computed over the batch dimension (B) for each feature channel independently. ϵ is a small value added to the variance for numerical stability.
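As a quick numeric illustration of the formula (a sketch using NumPy; the sample values are invented), normalizing two batch samples of a single feature:

```python
import numpy as np

x = np.array([1.0, 3.0])  # two batch samples of one feature
mean = x.mean()           # E[x] = 2.0
var = x.var()             # Var[x] = 1.0 (population variance)
eps = 1e-5
y = (x - mean) / np.sqrt(var + eps)
# y is approximately [-1.0, 1.0]: zero mean, unit variance
```

The normalized values have mean 0 and variance very close to 1 (slightly less, because of the ϵ in the denominator).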
Input:
Tensor X of shape (B,F,D1,D2) (input data)
Epsilon ϵ (a small float, typically 1e-5)
Output:
Tensor Y of shape (B,F,D1,D2) (normalized data)
Notes:
Compute the mean and variance across the batch dimension B only, independently for each feature channel F and each spatial location (D1,D2); the resulting statistics therefore have shape (F,D1,D2).
Use ϵ = 1e-5.
For simplicity, this implementation focuses on the core normalization without learnable parameters (gamma and beta) and without tracking running statistics.
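A minimal NumPy sketch of the specification above (the function name `batch_norm` is my own choice, not given by the problem). Per the notes, statistics are reduced over the batch axis only, so each feature channel and each spatial location gets its own mean and variance:

```python
import numpy as np

def batch_norm(X: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Normalize X of shape (B, F, D1, D2) over the batch axis.

    Mean and variance are computed over axis 0 only, independently
    for each (F, D1, D2) position, with no learnable gamma/beta and
    no running statistics, as the notes specify.
    """
    mean = X.mean(axis=0, keepdims=True)  # shape (1, F, D1, D2)
    var = X.var(axis=0, keepdims=True)    # population variance
    return (X - mean) / np.sqrt(var + eps)
```

Keeping `keepdims=True` lets NumPy broadcasting subtract the (1, F, D1, D2) statistics from the (B, F, D1, D2) input directly. After normalization, Y has approximately zero mean and unit variance along the batch axis at every position.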