Stable Product Configuration
Product Configuration
In general, we recommend using the following "product configuration" in LTN:
- not: the standard negation \(\lnot u = 1-u\),
- and: the product t-norm \(u \land v = uv\),
- or: the product t-conorm (probabilistic sum) \(u \lor v = u+v-uv\),
- implication: the Reichenbach implication \(u \rightarrow v = 1 - u + uv\),
- existential quantification ("exists"): the generalized mean (p-mean) \(\mathrm{pM}(u_1,\dots,u_n) = \biggl( \frac{1}{n} \sum\limits_{i=1}^n u_i^p \biggr)^{\frac{1}{p}} \qquad p \geq 1\),
- universal quantification ("for all"): the generalized mean of "the deviations w.r.t. the truth" (p-mean error) \(\mathrm{pME}(u_1,\dots,u_n) = 1 - \biggl( \frac{1}{n} \sum\limits_{i=1}^n (1-u_i)^p \biggr)^{\frac{1}{p}} \qquad p \geq 1\).
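To make these definitions concrete, here is a minimal plain-Python sketch of the six operators on scalar truth values. It is not the LTN implementation, which works on tensors; the function names are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def f_not(u: float) -> float:
    """Standard negation: 1 - u."""
    return 1.0 - u


def f_and(u: float, v: float) -> float:
    """Product t-norm: u * v."""
    return u * v


def f_or(u: float, v: float) -> float:
    """Product t-conorm (probabilistic sum): u + v - u * v."""
    return u + v - u * v


def f_implies(u: float, v: float) -> float:
    """Reichenbach implication: 1 - u + u * v."""
    return 1.0 - u + u * v


def p_mean(us: Sequence[float], p: float = 2.0) -> float:
    """Generalized mean (pM), used for "exists"; assumes p >= 1."""
    return (sum(u ** p for u in us) / len(us)) ** (1.0 / p)


def p_mean_error(us: Sequence[float], p: float = 2.0) -> float:
    """Generalized mean of the deviations from the truth (pME), used for "for all"."""
    return 1.0 - (sum((1.0 - u) ** p for u in us) / len(us)) ** (1.0 / p)
```

With \(p = 1\), both aggregators reduce to the arithmetic mean of their inputs.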
"Stable"
As is, this "product configuration" is not fully exempt from issues:
- the product t-norm has vanishing gradients on the edge case \(u=v=0\);
- the product t-conorm has vanishing gradients on the edge case \(u=v=1\);
- the Reichenbach implication has vanishing gradients on the edge case \(u=0\), \(v=1\);
- pMean has exploding gradients on the edge case \(u_1=\dots=u_n=0\);
- pMeanError has exploding gradients on the edge case \(u_1=\dots=u_n=1\).
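These edge cases can be checked by hand from the partial derivatives. The sketch below, in plain Python with hypothetical helper names, writes out the analytic gradients of the three connectives, plus the two chain-rule factors of the pMean gradient when all inputs share the same value \(\delta > 0\).

```python
import math


def grad_and(u: float, v: float) -> tuple:
    """Partials of u * v with respect to (u, v)."""
    return (v, u)


def grad_or(u: float, v: float) -> tuple:
    """Partials of u + v - u * v with respect to (u, v)."""
    return (1.0 - v, 1.0 - u)


def grad_implies(u: float, v: float) -> tuple:
    """Partials of 1 - u + u * v with respect to (u, v)."""
    return (v - 1.0, u)


def p_mean_grad_factors(delta: float, p: float = 2.0) -> tuple:
    """Chain-rule factors of d pM / d u_i when every input equals delta > 0.

    The gradient is (1/n) * inner * outer, where
    inner = delta**(p - 1) and outer = (delta**p)**(1/p - 1).
    """
    inner = delta ** (p - 1.0)
    outer = (delta ** p) ** (1.0 / p - 1.0)
    return inner, outer


print(grad_and(0.0, 0.0))       # both partials are zero
print(grad_or(1.0, 1.0))        # both partials are zero
print(grad_implies(0.0, 1.0))   # both partials are zero

inner, outer = p_mean_grad_factors(1e-8)
print(inner, outer)             # inner shrinks, outer grows as delta -> 0
print(0.0 * math.inf)           # the limiting product 0 * inf is NaN
```

For pMean with \(p > 1\), the inner factor goes to 0 while the outer factor grows without bound as \(\delta \to 0\). At exactly 0 the product is \(0 \cdot \infty\), which floating-point arithmetic evaluates to NaN. The same reasoning applies to pMeanError with inputs approaching 1.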
However, all these issues happen on edge cases and can easily be fixed using the following "trick":
- if the edge case happens when an input \(u\) is \(0\), we modify every input with \(u' = (1-\epsilon)u+\epsilon\);
- if the edge case happens when an input \(u\) is \(1\), we modify every input with \(u' = (1-\epsilon)u\);
where \(\epsilon\) is a small positive value (e.g. \(1\mathrm{e}{-5}\)).
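As a rough illustration of the "trick", the following plain-Python sketch applies the two input modifications to the product t-norm and t-conorm and evaluates the resulting gradients at their edge cases. The names `pi_0`, `pi_1`, and `EPS` are assumptions made for this example, not identifiers taken from LTN.

```python
EPS = 1e-5  # small positive value, as suggested in the text


def pi_0(u: float, eps: float = EPS) -> float:
    """Move inputs away from 0: u' = (1 - eps) * u + eps."""
    return (1.0 - eps) * u + eps


def pi_1(u: float, eps: float = EPS) -> float:
    """Move inputs away from 1: u' = (1 - eps) * u."""
    return (1.0 - eps) * u


def stable_and_grad(u: float, v: float, eps: float = EPS) -> tuple:
    """Partials of pi_0(u) * pi_0(v) with respect to (u, v), by the chain rule."""
    return ((1.0 - eps) * pi_0(v, eps), (1.0 - eps) * pi_0(u, eps))


def stable_or_grad(u: float, v: float, eps: float = EPS) -> tuple:
    """Partials of a + b - a * b with a = pi_1(u), b = pi_1(v)."""
    return ((1.0 - eps) * (1.0 - pi_1(v, eps)), (1.0 - eps) * (1.0 - pi_1(u, eps)))


print(stable_and_grad(0.0, 0.0))  # small but strictly positive partials
print(stable_or_grad(1.0, 1.0))   # small but strictly positive partials
```

The gradients at the edge cases are now of order \(\epsilon\) instead of exactly 0, so a gradient-based optimizer can still move away from them.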
This "trick" gives us a stable version of each operator, stable in the sense that it no longer has gradient issues.
One can enable the stable version of an operator through the boolean parameter stable. A default value for stable can be set when initializing the operator, and a different value can be passed at each call of the operator.
In the following, we repeat the last example, this time using the stable version of the pMean operator. The gradients are no longer NaN: the stable version of the operator yields finite, usable gradients.
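The effect can also be reproduced without any tensor library. The sketch below is a plain-Python stand-in, not the LTN operator: the hypothetical function `p_mean_grad` computes the pMean gradient analytically and takes a `stable` keyword that mirrors the boolean parameter described above. It assumes, as IEEE arithmetic in autodiff frameworks typically does, that the derivative of \(m^{1/p}\) at \(m = 0\) evaluates to infinity when \(p > 1\).

```python
import math
from typing import List, Sequence


def p_mean_grad(us: Sequence[float], p: float = 2.0,
                stable: bool = True, eps: float = 1e-5) -> List[float]:
    """Gradient of pM with respect to each input, via the chain rule.

    With stable=True every input is first mapped to (1 - eps) * u + eps.
    """
    n = len(us)
    scale = (1.0 - eps) if stable else 1.0
    xs = [scale * u + eps for u in us] if stable else list(us)
    m = sum(x ** p for x in xs) / n
    # Assumption: mimic IEEE behaviour, where m**(1/p - 1) is inf at m = 0 for p > 1.
    outer = math.inf if (m == 0.0 and p > 1.0) else m ** (1.0 / p - 1.0)
    # d pM / d u_i = scale * (1/n) * x_i**(p - 1) * m**(1/p - 1)
    return [scale * (x ** (p - 1.0)) * outer / n for x in xs]


zeros = [0.0, 0.0, 0.0]
print(p_mean_grad(zeros, stable=False))  # every entry is NaN (0 * inf)
print(p_mean_grad(zeros, stable=True))   # every entry is finite, close to 1/3
```

With `stable=True` the inputs become \(\epsilon\) instead of 0, so both chain-rule factors stay finite and each partial derivative is approximately \((1-\epsilon)/n\).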