Welcome to PySS3’s documentation!
PySS3 is a Python package that lets you work with the SS3 classification model in a
straightforward, interactive, and visual way. In addition to the
implementation of the classifier, PySS3 comes with a set of tools
to help you develop your machine learning models in a clearer and
faster way. These tools let you analyze, monitor, and understand your
models by allowing you to see what they have actually learned and why. To
achieve this, PySS3 provides you with 3 main components: the
SS3 class, the Live_Test class, and the
Evaluation class, as pointed out below.
The SS3 class, which implements the classifier using a clear API (very similar to that of scikit-learn's models):
    from pyss3 import SS3

    clf = SS3()
    ...
    clf.fit(x_train, y_train)
    y_pred = clf.predict(x_test)
The SS3 class also provides other useful methods, such as classify_multilabel() for multi-label classification:

    doc = "Liverpool CEO Peter Moore on Building a Global Fanbase"

    # standard "single-label" classification
    label = clf.classify_label(doc)  # 'business'

    # multi-label classification
    labels = clf.classify_multilabel(doc)  # ['business', 'sports']

or extract_insight() to allow you to get the text fragments involved in the classification decision.
The Live_Test class, which allows you to interactively test your model and visually see the reasons behind classification decisions, with just one line of code:
    from pyss3.server import Live_Test
    from pyss3 import SS3

    clf = SS3()
    ...
    clf.fit(x_train, y_train)
    Live_Test.run(clf, x_test, y_test)  # <- this one! cool uh? :)
As shown in the image below, this will open up, locally, an interactive
tool in your browser which you can use to (live) test your models with
the documents given in
x_test (or by typing in your own!). This will
allow you to visualize and understand what your model is actually
learning.
And last but not least, the Evaluation class.
This is probably one of the most useful components of PySS3. As the name suggests, this class provides easy-to-use methods for model evaluation and hyperparameter optimization: test(), kfold_cross_validation(), grid_search(), and plot(), for performing tests, stratified k-fold cross-validations, grid searches for hyperparameter optimization, and visualizing evaluation results using an interactive 3D plot, respectively. Probably one of its most important features is the ability to automatically (and permanently) record the history of evaluations that you’ve performed. This will save you a lot of time and will allow you to interactively visualize and analyze your classifier's performance in terms of its different hyperparameter values (and select the best model according to your needs). For instance, let’s perform a grid search with 4-fold cross-validation on the three hyperparameters, smoothness (s), significance (l), and sanction (p):
    from pyss3.util import Evaluation
    ...
    best_s, best_l, best_p, _ = Evaluation.grid_search(
        clf, x_train, y_train,
        s=[0.2, 0.32, 0.44, 0.56, 0.68, 0.8],
        l=[0.1, 0.48, 0.86, 1.24, 1.62, 2],
        p=[0.5, 0.8, 1.1, 1.4, 1.7, 2],
        k_fold=4
    )
In this illustrative example, s, l, and p will take those 6 different values each, and once the search is over, this function will return (by default) the hyperparameter values that obtained the best accuracy.
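To get a feel for the size of this search, the sketch below enumerates the grid with the standard library only. It does not use PySS3 and is not how grid_search() is implemented; the helper pick_best() and the scores dictionary it expects are hypothetical names introduced here for illustration.

```python
from itertools import product

# Candidate values copied from the grid search call above.
s_values = [0.2, 0.32, 0.44, 0.56, 0.68, 0.8]
l_values = [0.1, 0.48, 0.86, 1.24, 1.62, 2]
p_values = [0.5, 0.8, 1.1, 1.4, 1.7, 2]
k_fold = 4

# Every (s, l, p) combination that has to be scored.
grid = list(product(s_values, l_values, p_values))

n_combinations = len(grid)                     # 6 * 6 * 6
n_fold_evaluations = n_combinations * k_fold   # one score per combination per fold


def pick_best(scores):
    """Return the (s, l, p) tuple with the highest score.

    `scores` is a hypothetical dict mapping each (s, l, p) tuple to a
    metric value, for example the mean accuracy across folds.
    """
    return max(scores, key=scores.get)


print(n_combinations)      # 216
print(n_fold_evaluations)  # 864
```

Six values per hyperparameter already yield 216 configurations, which is why having the evaluation history recorded automatically is convenient.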
Now, we could also call Evaluation.plot() to analyze the results obtained in our grid search using the interactive 3D evaluation plot.
In this 3D plot, each point represents an experiment/evaluation performed using that particular combination of values (s, l, and p). Each point is colored according to how well the model performed with that configuration. Researchers can interactively change the evaluation metric being used (accuracy, precision, recall, f1, etc.) and the plots will update “on the fly”. Additionally, when the cursor is moved over a data point, useful information is shown (including a “compact” representation of the confusion matrix obtained in that experiment).

Finally, it is worth mentioning that, before showing the 3D plots, PySS3 creates a single, portable HTML file in your project folder containing the interactive plots. This allows researchers to store, send, or upload the plots to another place using this single HTML file (or even provide a link to this file in their own papers, which would be nicer for readers and would increase experimentation transparency). For example, we have uploaded two of these files for you to see: “Movie Review (Sentiment Analysis)” and “Topic Categorization”. Both evaluation plots were obtained following the Tutorials.
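The metrics selectable in the plot (accuracy, precision, recall, f1) can all be derived from a confusion matrix like the one shown on hover. The following is a minimal, library-free sketch of that relationship; it is not PySS3 code, and the function names, class labels, and counts are made up for illustration.

```python
def accuracy(confusion):
    """Fraction of documents on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total if total else 0.0


def per_class_metrics(confusion, labels):
    """Compute precision, recall and F1 for each class.

    confusion[i][j] is the number of documents whose true class is
    labels[i] and whose predicted class is labels[j].
    """
    metrics = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        predicted_k = sum(row[k] for row in confusion)  # column sum
        actual_k = sum(confusion[k])                    # row sum
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


# Made-up counts for a two-class problem, for illustration only.
labels = ["pos", "neg"]
confusion = [
    [40, 10],  # true "pos": 40 predicted pos, 10 predicted neg
    [5, 45],   # true "neg": 5 predicted pos, 45 predicted neg
]

print(accuracy(confusion))  # 0.85
print(per_class_metrics(confusion, labels)["pos"])
```

Switching the metric in the plot changes which of these numbers ranks the configurations, so the best-looking point can differ from one metric to another.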
Want to contribute to this Open Source project?
Thanks for your interest in the project, you’re awesome! Take a look at the project's GitHub repository. Any kind of help is very welcome (Code, Bug reports, Content, Data, Documentation, Design, Examples, Ideas, Feedback, etc.). Issues and/or Pull Requests are welcome for any level of improvement, from a small typo to new features. Help us make PySS3 better.
- Getting Started
- Installation Instructions
- The Workflow
- The SS3 Classification Model
- Visualization Tools