Dynamic feature scaling for online learning of binary classifiers
This article describes and evaluates different online feature scaling approaches and their impact on the performance of binary classifiers.
Online feature scaling requires features to be adapted on the fly, i.e. before all properties of the underlying distribution are known.
The evaluations in the article suggest that the rather simple unsupervised dynamic scaling approach performs exceptionally well when the averaged weight vector is used for training and testing, which reduces the impact of the last features encountered in the learning step.
Method

The feature value \(x_j\) for feature \(j\) is transformed using the mean \(\mu_j\) and standard deviation \(\sigma_j\) to the scaled value \(x_j'\):
\[x_j' = \frac{x_j - \mu_j}{\sigma_j}\]
Estimates of the mean and standard deviation of the \(j\)th feature after the \(k\)th update are obtained from:
\[\begin{align} \mu^k_j &= \mu^{k-1}_j + \frac{x_j^k - \mu_j^{k-1}}{k} \\ s^k_j &= s^{k-1}_j + (x^k_j - \mu_j^{k-1}) (x_j^k - \mu_j^k) \text{ with}\\ \sigma^k_j &= \sqrt{s^k_j/(k-1)} \end{align}\]
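
The recurrences above can be sketched in a few lines of Python. This is a minimal illustration, not the article's implementation: the class name `OnlineScaler` and its methods are hypothetical, the initial values \(\mu^0_j = 0\) and \(s^0_j = 0\) are assumed, and the handling of a zero standard deviation (returning the centred value unscaled) is an assumption made here so that the first examples do not cause a division by zero.

```python
import math


class OnlineScaler:
    """Running per-feature mean and standard deviation (hypothetical helper)."""

    def __init__(self, n_features):
        self.k = 0                        # number of examples seen so far
        self.mean = [0.0] * n_features    # mu_j, assumed to start at 0
        self.s = [0.0] * n_features       # s_j, running sum of squared deviations

    def update(self, x):
        """Apply the k-th update of mu_j and s_j for one example x."""
        self.k += 1
        for j, xj in enumerate(x):
            delta = xj - self.mean[j]                  # x_j^k - mu_j^{k-1}
            self.mean[j] += delta / self.k             # mu_j^k
            self.s[j] += delta * (xj - self.mean[j])   # uses the updated mean

    def std(self, j):
        """sigma_j^k = sqrt(s_j^k / (k - 1)); defined here as 0 for k < 2."""
        if self.k < 2:
            return 0.0
        return math.sqrt(self.s[j] / (self.k - 1))

    def transform(self, x):
        """Scale x with the current estimates: (x_j - mu_j) / sigma_j."""
        scaled = []
        for j, xj in enumerate(x):
            sigma = self.std(j)
            centred = xj - self.mean[j]
            # Assumption: skip the division while sigma_j is still zero.
            scaled.append(centred / sigma if sigma > 0.0 else centred)
        return scaled
```

In an online setting one would typically call `update(x)` and then `transform(x)` for each incoming example before passing it to the classifier; whether the article updates before or after scaling is not stated in this summary, so that order is also an assumption.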