Career websites contain valuable data on employees, their skill sets, and their employment history. This article applies k-means clustering to keywords describing skill sets, after the keywords have been transformed using either
- term frequency-inverse document frequency (TF-IDF) or
- t-distributed stochastic neighbor embedding (t-SNE).
A third experiment performs the clustering after 20 keywords have been selected using Latent Dirichlet Allocation (LDA).
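The first experiment (TF-IDF weighting followed by k-means) can be sketched in plain Python as below. This is a minimal illustration, not the authors' pipeline: the toy profiles, the smoothed IDF variant, the hand-picked initial centroids, and the helper names `tfidf_vectors`, `sq_dist`, and `kmeans` are all assumptions made for the example. In practice a library such as scikit-learn would normally be used instead.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return the vocabulary and one L2-normalised TF-IDF vector per document."""
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # Document frequency: in how many profiles each keyword occurs.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (one common variant; an assumption, not taken from the article).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, initial_centroids, n_iter=20):
    """Plain Lloyd iteration: assign each point to the nearest centroid, then update."""
    centroids = [list(c) for c in initial_centroids]
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        labels = [
            min(range(len(centroids)), key=lambda j: sq_dist(v, centroids[j]))
            for v in vectors
        ]
        for j in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical skill keywords for four profiles (invented for illustration).
profiles = [
    ["html", "css", "javascript"],
    ["javascript", "css", "react"],
    ["sql", "oracle", "plsql"],
    ["oracle", "sql", "etl"],
]

vocab, vectors = tfidf_vectors(profiles)
# Deterministic start: seed the two clusters with the first and third profile.
labels = kmeans(vectors, [vectors[0], vectors[2]])
print(labels)
```

With these toy profiles the two web-oriented profiles end up in one cluster and the two database-oriented profiles in the other. The t-SNE and LDA variants would replace only the feature-construction step; the clustering step stays the same.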
In addition, the authors extract the chronological order of the positions people have held to visualize potential career paths.
Results:
- clustering jobs by title yields the job titles commonly used for different kinds of work (e.g. web developer, business intelligence, Oracle development)
- the network generated from the chronological information (i) shows typical career paths and (ii) identifies positions with high in- and out-degrees.
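The career-path network can be illustrated with a small directed graph in which an edge A -> B means that someone moved from position A to position B. The sketch below is an assumption about how such a network could be built, since the summary does not describe the authors' construction; the job sequences and the helper names `build_transitions` and `degrees` are invented for the example, and degrees are counted over distinct neighbouring positions rather than weighted by frequency.

```python
from collections import Counter


def build_transitions(careers):
    """Count how often each consecutive move (from_title, to_title) occurs."""
    edges = Counter()
    for positions in careers:
        for src, dst in zip(positions, positions[1:]):
            if src != dst:  # ignore repeated listings of the same title
                edges[(src, dst)] += 1
    return edges


def degrees(edges):
    """Unweighted in- and out-degree: number of distinct predecessors and successors."""
    in_deg, out_deg = Counter(), Counter()
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    return in_deg, out_deg


# Hypothetical employment histories, oldest position first (invented data).
careers = [
    ["junior developer", "developer", "senior developer"],
    ["junior developer", "developer", "team lead"],
    ["tester", "developer", "senior developer"],
    ["developer", "architect"],
]

edges = build_transitions(careers)
in_deg, out_deg = degrees(edges)

# Frequent edges correspond to typical career steps.
for (src, dst), count in edges.most_common(2):
    print(f"{src} -> {dst}: {count}")

# Positions with many distinct predecessors or successors act as hubs.
print("developer in-degree:", in_deg["developer"])
print("developer out-degree:", out_deg["developer"])
```

In this toy data, "developer" is reached from two different positions and leads to three, which is the kind of high in- and out-degree position the network analysis is meant to surface.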