tensara

Kullback-Leibler Divergence

MEDIUM

Compute the element-wise Kullback-Leibler Divergence between two probability distributions, predictions and targets.

The Kullback-Leibler Divergence is a measure of how one probability distribution diverges from a second, expected probability distribution. For discrete probability distributions P and Q, the KL divergence is defined as:

$$D_{KL}(P \,\|\, Q) = \sum_{i} P(i) \log\left(\frac{P(i)}{Q(i)}\right) = \sum_{i} P(i) \left(\log P(i) - \log Q(i)\right)$$

In this problem, you will compute the element-wise KL divergence before the summation step. That is, for each element:

$$\text{output}[i] = \text{targets}[i] \cdot \left(\log(\text{targets}[i]) - \log(\text{predictions}[i])\right)$$

Note that when targets[i] is 0, the contribution to the KL divergence is 0 (by convention, using the limit $\lim_{x \to 0} x \log(x) = 0$).

Input:

  • Tensor predictions of size $N$ representing a probability distribution Q (all values > 0 and sum to 1)
  • Tensor targets of size $N$ representing a probability distribution P (all values ≥ 0 and sum to 1)

Output:

  • Tensor output of size $N$, where output[i] contains the element-wise KL divergence contribution.

Notes:

  • All tensors are flat 1D arrays (or treated as such) and stored contiguously in memory.
  • You should handle the case where targets[i] is 0 correctly (the contribution should be 0).
  • To avoid numerical issues, you should add a small epsilon (e.g., 1e-10) to predictions and targets before computing logarithms.
  • The full KL divergence can be computed by summing all elements of the output tensor.
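Putting the notes together, a kernel for this problem might look like the following sketch (kernel name, launch configuration, and the epsilon value are illustrative assumptions, not a required signature):

```cuda
#include <cmath>

// Element-wise KL contribution: out[i] = t[i] * (log(t[i]) - log(p[i])),
// with the 0 * log(0) = 0 convention and a small epsilon for stability.
__global__ void kl_div_kernel(const float* __restrict__ predictions,
                              const float* __restrict__ targets,
                              float* __restrict__ output,
                              int n) {
    const float eps = 1e-10f;
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float t = targets[i];
        output[i] = (t == 0.0f)
            ? 0.0f
            : t * (logf(t + eps) - logf(predictions[i] + eps));
    }
}

// Example host-side launch (d_predictions, d_targets, d_output are
// assumed device pointers):
//   int threads = 256;
//   int blocks  = (n + threads - 1) / threads;
//   kl_div_kernel<<<blocks, threads>>>(d_predictions, d_targets, d_output, n);
```

Since each output element depends only on the same index of the inputs, the problem is embarrassingly parallel and one thread per element is a natural mapping.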


Environment: CUDA C++
