The TensorFlow.js toxicity classifier is built on top of the Universal Sentence Encoder lite (USE; Cer et al., 2018), a model that encodes text into a 512-dimensional embedding (in other words, a list of 512 floating-point numbers). These embeddings can be used as starting points for language processing tasks such as sentiment classification and textual similarity analysis. The USE is based on the Transformer architecture (Vaswani et al., 2017), a state-of-the-art approach to modelling language with deep neural networks. The USE is available on TensorFlow Hub and also as a separate TensorFlow.js module.
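To make "an embedding is a list of 512 numbers" concrete, here is a minimal TypeScript sketch of one common way such vectors are compared for textual similarity: cosine similarity. The vectors are hypothetical placeholders produced by a made-up `makeFakeEmbedding` helper, not real USE output; in practice they would come from the encoder, and the values carry no meaning.

```typescript
// A sentence embedding is just an array of floats (512 of them for the USE).
type Embedding = number[];

const EMBEDDING_DIM = 512;

// Cosine similarity: dot(a, b) / (|a| * |b|), a value in [-1, 1].
function cosineSimilarity(a: Embedding, b: Embedding): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error("Cannot compute cosine similarity with a zero vector");
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical placeholder: deterministic fake vectors, NOT real USE embeddings.
function makeFakeEmbedding(seed: number): Embedding {
  return Array.from({ length: EMBEDDING_DIM }, (_, i) => Math.sin(seed * (i + 1)));
}

const embA = makeFakeEmbedding(1);
const embB = makeFakeEmbedding(2);
console.log("A vs A:", cosineSimilarity(embA, embA).toFixed(3));
console.log("A vs B:", cosineSimilarity(embA, embB).toFixed(3));
```

Cosine similarity ignores vector length and looks only at direction, which is why it is a frequent default for comparing sentence embeddings; with real encoder output, sentences with similar meaning would be expected to score closer to 1.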
We’ve also open-sourced the training code for the model, which was trained on the Civil Comments dataset. We encourage you to reproduce our results, improve on our model, and help grow the publicly available datasets.
The code used to build the classifier lives in a separate GitHub project based on TensorFlow Estimators, specifically in its tf_hub_tfjs subdirectory. There you will also find information about how this model performs compared to both the one featured in the Perspective API and those created as part of the Kaggle Toxic Comments Challenge. The classifier itself is a simple extension of the USE model, with just a couple of additional layers before the output heads for each tag.
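The shape of that architecture (a shared sentence embedding, a couple of extra layers, then one output head per tag) can be sketched as a plain TypeScript forward pass. This is an untrained illustration, not the released model or its training code: the hidden-layer sizes, the tag names, the seeded random initialization, and the ReLU/sigmoid activations are all assumptions chosen for the example.

```typescript
// Hypothetical sketch: shared embedding -> a couple of dense layers ->
// one sigmoid output head per tag. Weights are random and untrained.

type Vector = number[];

interface DenseLayer {
  weights: number[][]; // weights[j][i]: input i -> output unit j
  bias: Vector;
}

// Small linear congruential generator so the random init is reproducible.
function makeRng(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296 - 0.5; // roughly uniform in [-0.5, 0.5)
  };
}

function makeDense(inputs: number, outputs: number, rng: () => number): DenseLayer {
  const scale = 1 / Math.sqrt(inputs); // keeps activations in a sane range
  return {
    weights: Array.from({ length: outputs }, () =>
      Array.from({ length: inputs }, () => rng() * scale)
    ),
    bias: Array.from({ length: outputs }, () => 0),
  };
}

// y = W x + b
function applyDense(layer: DenseLayer, x: Vector): Vector {
  return layer.weights.map((row, j) =>
    row.reduce((sum, w, i) => sum + w * x[i], layer.bias[j])
  );
}

const relu = (v: Vector): Vector => v.map((z) => Math.max(0, z));
const sigmoid = (z: number): number => 1 / (1 + Math.exp(-z));

const EMBEDDING_DIM = 512; // USE embedding size
const HIDDEN_SIZES = [256, 128]; // hypothetical "couple of additional layers"
const TAGS = ["toxicity", "insult", "threat"]; // illustrative tag names

class MultiHeadToxicitySketch {
  private readonly hidden: DenseLayer[] = [];
  private readonly heads: { tag: string; layer: DenseLayer }[] = [];

  constructor(seed = 42) {
    const rng = makeRng(seed);
    let width = EMBEDDING_DIM;
    for (const size of HIDDEN_SIZES) {
      this.hidden.push(makeDense(width, size, rng));
      width = size;
    }
    // Every head reads the same shared hidden representation.
    for (const tag of TAGS) {
      this.heads.push({ tag, layer: makeDense(width, 1, rng) });
    }
  }

  // Returns an independent probability per tag for one sentence embedding.
  predict(embedding: Vector): Map<string, number> {
    if (embedding.length !== EMBEDDING_DIM) {
      throw new Error(`Expected ${EMBEDDING_DIM} values, got ${embedding.length}`);
    }
    let h = embedding;
    for (const layer of this.hidden) {
      h = relu(applyDense(layer, h));
    }
    const probs = new Map<string, number>();
    for (const { tag, layer } of this.heads) {
      probs.set(tag, sigmoid(applyDense(layer, h)[0]));
    }
    return probs;
  }
}

// Usage with a placeholder embedding (a real one would come from the USE).
const fakeEmbedding = Array.from({ length: EMBEDDING_DIM }, (_, i) => Math.cos(i));
const model = new MultiHeadToxicitySketch();
model.predict(fakeEmbedding).forEach((p, tag) => console.log(tag, p.toFixed(3)));
```

In this sketch each head has its own sigmoid rather than sharing a single softmax, so the per-tag scores are independent and a comment can score high on several tags at once (for example, both an insult and a threat).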