PySS3 Package¶
Main module¶
This is the main module containing the implementation of the SS3 classifier.
(Please, visit https://github.com/sergioburdisso/pyss3 for more info)
-
exception
pyss3.
EmptyModelError
(msg='')¶ Bases:
Exception
Exception to be thrown when the model is empty.
-
exception
pyss3.
InvalidCategoryError
(msg='')¶ Bases:
Exception
Exception to be thrown when a category is not valid.
-
class
pyss3.
SS3
(s=None, l=None, p=None, a=None, name='', cv_m='norm_gv_xai', sn_m='xai')¶ Bases:
object
The SS3 classifier class.
The SS3 classifier was originally defined in Section 3 of https://dx.doi.org/10.1016/j.eswa.2019.05.023 (preprint available here: https://arxiv.org/abs/1905.08772)
- Parameters
s (float) – the “smoothness” (sigma) hyperparameter value
l (float) – the “significance” (lambda) hyperparameter value
p (float) – the “sanction” (rho) hyperparameter value
a (float) – the alpha hyperparameter value (i.e. all terms with a confidence value (cv) less than alpha will be ignored during classification)
name (str) – the model’s name (to save and load the model from disk)
cv_m (str) – method used to compute the confidence value (cv) of each term (word or n-grams), options are: “norm_gv_xai”, “norm_gv” and “gv” (default: “norm_gv_xai”)
sn_m (str) – method used to compute the sanction (sn) function, options are: “vanilla” and “xai” (default: “xai”)
-
classify
(doc, prep=True, sort=True, json=False)¶ Classify a given document.
- Parameters
doc (str) – the content of the document
prep (bool) – enables input preprocessing (default: True)
sort (bool) – sort the classification result (from best to worst)
json (bool) – return the result in JSON format
- Returns
the document confidence vector if sort is False. If sort is True, a list of pairs (category index, confidence value) ordered by cv.
- Return type
list
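The two return shapes can be illustrated without the library. This is a minimal sketch, assuming a confidence vector is a plain list of floats indexed by category; rank_confidence_vector is a hypothetical helper, not part of PySS3.

```python
def rank_confidence_vector(cv):
    """Return (category index, confidence value) pairs, best first."""
    pairs = list(enumerate(cv))
    # Sort by confidence value, highest first; ties keep index order.
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


cv = [0.1, 0.7, 0.0]                 # shape returned when sort is False
ranked = rank_confidence_vector(cv)  # shape returned when sort is True
print(ranked)
```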
-
fit
(x_train, y_train, n_grams=1, prep=True, leave_pbar=True)¶ Train the model given a list of documents and category labels.
- Parameters
x_train (list (of str)) – the list of documents
y_train (list (of str)) – the list of document labels
n_grams (int) – indicates the maximum n-grams to be learned (e.g. a value of 1 means only 1-grams (words), 2 means 1-grams and 2-grams, 3 means 1-grams, 2-grams and 3-grams, and so on).
prep (bool) – enables input preprocessing (default: True)
leave_pbar (bool) – controls whether to leave the progress bar or remove it after finishing.
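What “maximum n-grams” means can be sketched in a few lines. The helper below is hypothetical and only enumerates the word sequences that a given n_grams value would cover; it assumes whitespace tokenization and says nothing about how PySS3 stores or weights them.

```python
def ngrams_up_to(tokens, max_n):
    """Return every n-gram of length 1..max_n as a space-joined string."""
    grams = []
    for n in range(1, max_n + 1):
        for start in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[start:start + n]))
    return grams


tokens = "self driving car".split()
print(ngrams_up_to(tokens, 1))  # words only
print(ngrams_up_to(tokens, 2))  # words and 2-grams
```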
-
get_a
()¶ Get the alpha hyperparameter value.
- Returns
the hyperparameter value
- Return type
float
-
get_alpha
()¶ Get the alpha hyperparameter value.
- Returns
the hyperparameter value
- Return type
float
-
get_categories
()¶ Get the list of category names.
- Returns
the list of category names
- Return type
list (of str)
-
get_category_index
(name)¶ Given its name, return the category index.
- Parameters
name (str) – The category name
- Returns
the category index
- Return type
int
- Raises
InvalidCategoryError
-
get_category_name
(index)¶ Given its index, return the category name.
- Parameters
index (int) – The category index
- Returns
the category name
- Return type
str
- Raises
InvalidCategoryError
-
get_hyperparameters
()¶ Get hyperparameter values.
- Returns
a tuple with hyperparameters current values (s, l, p, a)
- Return type
tuple
-
get_l
()¶ Get the “significance” (lambda) hyperparameter value.
- Returns
the hyperparameter value
- Return type
float
-
get_most_probable_category
()¶ Get the name of the most probable category.
- Returns
the name of the most probable category
- Return type
str
-
get_name
()¶ Return the model’s name.
- Returns
the model’s name.
- Return type
str
-
get_next_words
(sent, cat, n=None)¶ Given a sentence, return the list of n (possible) following words.
- Parameters
sent (str) – a sentence (e.g. “an artificial”)
cat (str) – the category name
n (int) – the maximum number of possible answers
- Returns
a list of tuples (word, frequency, probability)
- Return type
list (of tuple)
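The shape of the returned tuples can be illustrated with a small counting sketch. It assumes whitespace tokenization and that “probability” is the relative frequency among the observed continuations; both are assumptions made for illustration, and next_words is a hypothetical function, not the library's implementation.

```python
from collections import Counter


def next_words(corpus_sentences, sent, n=None):
    """Return (word, frequency, probability) tuples for words following sent."""
    prefix = sent.split()
    counts = Counter()
    for sentence in corpus_sentences:
        tokens = sentence.split()
        for i in range(len(tokens) - len(prefix)):
            if tokens[i:i + len(prefix)] == prefix:
                counts[tokens[i + len(prefix)]] += 1
    total = sum(counts.values())
    return [(word, freq, freq / total) for word, freq in counts.most_common(n)]


corpus = [
    "an artificial intelligence system",
    "an artificial neural network",
    "an artificial intelligence lab",
]
print(next_words(corpus, "an artificial"))
```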
-
get_p
()¶ Get the “sanction” (rho) hyperparameter value.
- Returns
the hyperparameter value
- Return type
float
-
get_s
()¶ Get the “smoothness” (sigma) hyperparameter value.
- Returns
the hyperparameter value
- Return type
float
-
get_sanction
()¶ Get the “sanction” (rho) hyperparameter value.
- Returns
the hyperparameter value
- Return type
float
-
get_significance
()¶ Get the “significance” (lambda) hyperparameter value.
- Returns
the hyperparameter value
- Return type
float
-
get_smoothness
()¶ Get the “smoothness” (sigma) hyperparameter value.
- Returns
the hyperparameter value
- Return type
float
-
get_stopwords
(sg_threshold=0.01)¶ Get the list of (recognized) stopwords.
- Parameters
sg_threshold (float) – significance (sg) value used as a threshold to consider words as stopwords (i.e. words with sg < sg_threshold for all categories will be considered as “stopwords”)
- Returns
a list of stopwords
- Return type
list (of str)
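The threshold rule above can be sketched as follows. The significance values are made up and stopwords_below is a hypothetical helper; the only behaviour taken from the description is that a word counts as a stopword when its sg is below the threshold for every category.

```python
def stopwords_below(sg_by_word, sg_threshold=0.01):
    """Return words whose sg is below the threshold in every category."""
    return [
        word
        for word, sg_values in sg_by_word.items()
        if all(sg < sg_threshold for sg in sg_values)
    ]


# Hypothetical significance values, one per category, for three words.
sg_by_word = {
    "the": [0.001, 0.002],
    "goal": [0.9, 0.004],
    "of": [0.0, 0.005],
}
print(stopwords_below(sg_by_word))
```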
-
get_word
(index)¶ Given the index, return the word.
- Parameters
index (int) – the word index
- Returns
the word
- Return type
str
-
get_word_index
(word)¶ Given a word, return its index.
- Parameters
word (str) – a word
- Returns
the word index
- Return type
int
-
learn
(doc, cat, n_grams=1, prep=True, update=True)¶ Learn a new document for a given category.
- Parameters
doc (str) – the content of the document
cat (str) – the category name
n_grams (int) – indicates the maximum n-grams to be learned (e.g. a value of 1 means only 1-grams (words), 2 means 1-grams and 2-grams, 3 means 1-grams, 2-grams and 3-grams, and so on).
prep (bool) – enables input preprocessing (default: True)
update (bool) – enables model auto-update after learning (default: True)
-
load_model
()¶ Load model from disk.
- Raises
IOError
-
plot_value_distribution
(cat)¶ Plot the category’s global and local value distribution.
- Parameters
cat (str) – the category name
-
predict
(x_test, def_cat='most-probable', labels=True, prep=True, leave_pbar=True)¶ Classify a list of documents.
- Parameters
x_test (list (of str)) – the list of documents to be classified
def_cat (str) – default category to be assigned when SS3 is not able to classify a document. Options are “most-probable”, “unknown” or a given category name.
labels (bool) – whether to return the list of category names or just category indexes
prep (bool) – enables input preprocessing (default: True)
leave_pbar (bool) – controls whether to leave the progress bar or remove it after finishing.
- Returns
if labels is True, the list of category names, otherwise, the list of category indexes.
- Return type
list (of int or str)
- Raises
EmptyModelError
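The role of def_cat can be sketched as a fallback rule. This sketch assumes that “not able to classify” means every confidence value is zero, and that “unknown” is returned as a literal label; both are assumptions, and pick_label and its most_probable argument are hypothetical names.

```python
def pick_label(cv, categories, def_cat="most-probable", most_probable=None):
    """Return the best category, or a fallback when no category scores."""
    if any(value > 0 for value in cv):
        best_index = max(range(len(cv)), key=lambda i: cv[i])
        return categories[best_index]
    if def_cat == "most-probable":
        return most_probable
    # "unknown" or an explicit category name is returned unchanged here.
    return def_cat


categories = ["sports", "tech"]
print(pick_label([0.2, 0.5], categories, most_probable="sports"))
print(pick_label([0.0, 0.0], categories, most_probable="sports"))
```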
-
predict_proba
(x_test, prep=True, leave_pbar=True)¶ Classify a list of documents returning a list of confidence vectors.
- Parameters
x_test (list (of str)) – the list of documents to be classified
prep (bool) – enables input preprocessing (default: True)
leave_pbar (bool) – controls whether to leave the progress bar after finishing or remove it.
- Returns
the list of confidence vectors
- Return type
list (of list of float)
- Raises
EmptyModelError
-
print_categories_info
()¶ Print information about learned categories.
-
print_hyperparameters_info
()¶ Print information about hyperparameters.
-
print_model_info
()¶ Print information regarding the model.
-
print_ngram_info
(ngram)¶ Print debugging information about a given n-gram.
Namely, print the n-gram frequency (fr), local value (lv), global value (gv), confidence value (cv), sanction (sn) weight, significance (sg) weight.
- Parameters
ngram (str) – the n-gram (e.g. “machine”, “machine learning”, etc.)
-
save_cat_vocab
(cat, path='./', n_grams=-1)¶ Save category vocabulary to disk.
- Parameters
cat (str) – the category name
path (str) – the path in which to store the vocabulary
n_grams (int) – indicates the n-grams to be stored (e.g. only 1-grams, 2-grams, 3-grams, etc.). Default -1 stores all learned n-grams (1-grams, 2-grams, 3-grams, etc.)
-
save_model
()¶ Save the model to disk.
-
save_vocab
(path='./', n_grams=-1)¶ Save learned vocabularies to disk.
- Parameters
path (str) – the path in which to store the vocabularies
n_grams (int) – indicates the n-grams to be stored (e.g. only 1-grams, 2-grams, 3-grams, etc.). Default -1 stores all learned n-grams (1-grams, 2-grams, 3-grams, etc.)
-
set_a
(value)¶ Set the alpha hyperparameter value.
All terms with a confidence value (cv) less than alpha will be ignored during classification.
- Parameters
value (float) – the hyperparameter value
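The effect of alpha can be pictured as a simple filter. The sketch below simplifies each term to a single confidence value (one category) and uses made-up numbers; filter_by_alpha is a hypothetical helper, not a PySS3 function.

```python
def filter_by_alpha(term_cvs, alpha):
    """Keep only terms whose confidence value is at least alpha."""
    return {term: cv for term, cv in term_cvs.items() if cv >= alpha}


term_cvs = {"the": 0.0, "goal": 0.8, "match": 0.05}
print(filter_by_alpha(term_cvs, 0.1))
```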
-
set_alpha
(value)¶ Set the alpha hyperparameter value.
All terms with a confidence value (cv) less than alpha will be ignored during classification.
- Parameters
value (float) – the hyperparameter value
-
set_hyperparameters
(s=None, l=None, p=None, a=None)¶ Set hyperparameter values.
- Parameters
s (float) – the “smoothness” (sigma) hyperparameter
l (float) – the “significance” (lambda) hyperparameter
p (float) – the “sanction” (rho) hyperparameter
a (float) – the alpha hyperparameter (i.e. all terms with a confidence value (cv) less than alpha will be ignored during classification)
-
set_l
(value)¶ Set the “significance” (lambda) hyperparameter value.
- Parameters
value (float) – the hyperparameter value
-
set_p
(value)¶ Set the “sanction” (rho) hyperparameter value.
- Parameters
value (float) – the hyperparameter value
-
set_s
(value)¶ Set the “smoothness” (sigma) hyperparameter value.
- Parameters
value (float) – the hyperparameter value
-
set_sanction
(value)¶ Set the “sanction” (rho) hyperparameter value.
- Parameters
value (float) – the hyperparameter value
-
set_significance
(value)¶ Set the “significance” (lambda) hyperparameter value.
- Parameters
value (float) – the hyperparameter value
-
set_smoothness
(value)¶ Set the “smoothness” (sigma) hyperparameter value.
- Parameters
value (float) – the hyperparameter value
-
summary_op_ngrams
(cvs)¶ Summary operator for n-gram confidence vectors.
By default, it returns the sum of all confidence vectors. To use a custom summary operator, replace this function as shown in the following example:
>>> def my_summary_op(cvs):
>>>     return cvs[0]
>>> ...
>>> clf = SS3()
>>> ...
>>> clf.summary_op_ngrams = my_summary_op
Note that any function receiving a list of vectors and returning a single vector could be used. In the above example the summary operator is replaced by the user-defined my_summary_op, which ignores all confidence vectors except that of the first n-gram (this is purely illustrative and makes no practical sense).
- Parameters
cvs (list (of list of float)) – a list of n-gram confidence vectors
- Returns
a sentence confidence vector
- Return type
list (of float)
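A slightly more realistic custom operator than the one in the example above is an element-wise maximum. This is a minimal sketch, assuming confidence vectors are plain lists of floats of equal length; sum_summary_op mirrors the documented default (addition) and max_summary_op is a hypothetical alternative.

```python
def sum_summary_op(cvs):
    """Element-wise addition of confidence vectors (the documented default)."""
    return [sum(values) for values in zip(*cvs)]


def max_summary_op(cvs):
    """Element-wise maximum, as an example of a custom operator."""
    return [max(values) for values in zip(*cvs)]


cvs = [[0.25, 0.0], [0.5, 0.25], [0.0, 0.5]]
print(sum_summary_op(cvs))
print(max_summary_op(cvs))
```

A function such as max_summary_op could then be assigned to clf.summary_op_ngrams in the same way as my_summary_op.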
-
summary_op_paragraphs
(cvs)¶ Summary operator for paragraph confidence vectors.
By default, it returns the sum of all confidence vectors. To use a custom summary operator, replace this function as shown in the following example:
>>> def dummy_summary_op(cvs):
>>>     return cvs[0]
>>> ...
>>> clf = SS3()
>>> ...
>>> clf.summary_op_paragraphs = dummy_summary_op
Note that any function receiving a list of vectors and returning a single vector could be used. In the above example the summary operator is replaced by the user-defined dummy_summary_op, which ignores all confidence vectors except that of the first paragraph (this is purely illustrative and makes no practical sense).
- Parameters
cvs (list (of list of float)) – a list of paragraph confidence vectors
- Returns
the document confidence vector
- Return type
list (of float)
-
summary_op_sentences
(cvs)¶ Summary operator for sentence confidence vectors.
By default, it returns the sum of all confidence vectors. To use a custom summary operator, replace this function as shown in the following example:
>>> def dummy_summary_op(cvs):
>>>     return cvs[0]
>>> ...
>>> clf = SS3()
>>> ...
>>> clf.summary_op_sentences = dummy_summary_op
Note that any function receiving a list of vectors and returning a single vector could be used. In the above example the summary operator is replaced by the user-defined dummy_summary_op, which ignores all confidence vectors except that of the first sentence (this is purely illustrative and makes no practical sense).
- Parameters
cvs (list (of list of float)) – a list of sentence confidence vectors
- Returns
a paragraph confidence vector
- Return type
list (of float)
-
update_values
(force=False)¶ Update model values (cv, gv, lv, etc.).
- Parameters
force (bool) – force update (even if hyperparameters haven’t changed)
-
pyss3.
key_as_int
(dct)¶ Cast the given dictionary (numerical) keys to int.
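A helper of this kind is typically useful when reloading JSON, because JSON object keys are always strings. The following sketch shows the idea with the standard json module; keys_to_int is a hypothetical stand-in and may differ from the library's function in its details.

```python
import json


def keys_to_int(dct):
    """Return a copy of dct with decimal string keys cast to int."""
    return {
        (int(key) if isinstance(key, str) and key.isdecimal() else key): value
        for key, value in dct.items()
    }


raw = '{"0": "sports", "1": {"7": 3}, "name": "demo"}'
# object_hook is applied to every decoded JSON object, nested ones included.
data = json.loads(raw, object_hook=keys_to_int)
print(data)
```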
-
pyss3.
mad
(values, n)¶ Median absolute deviation mean.
-
pyss3.
sigmoid
(v, l)¶ A sigmoid function.
-
pyss3.
vdiv
(v0, v1)¶ Vectorial version of division.
-
pyss3.
vmax
(v0, v1)¶ Vectorial version of max.
-
pyss3.
vsum
(v0, v1)¶ Vectorial version of sum.
Submodules¶
pyss3.server module¶
SS3 classification server with visual explanations for live tests.
(Please, visit https://github.com/sergioburdisso/pyss3 for more info)
-
class
pyss3.server.
Server
¶ Bases:
object
SS3 HTTP server wrapper.
-
static
get_port
()¶ Return the server port.
- Returns
the server port
- Return type
int
-
static
serve
(clf=None, x_test=None, y_test=None, port=0, browser=True, quiet=True)¶ Wait for classification requests and serve them.
- Parameters
clf (pyss3.SS3) – the SS3 model to be attached to this server.
x_test (list (of str)) – the list of documents to classify and visualize
y_test (list (of str)) – the list of category labels
port (int) – the port to listen on (default: random free port)
browser (bool) – if True, it automatically opens up the live test on your browser
quiet (bool) – if True, use quiet mode. Otherwise use verbose mode (default: True)
-
static
set_model
(clf)¶ Attach a given SS3 model to this server.
- Parameters
clf (pyss3.SS3) – an SS3 model
-
static
set_testset
(x_test, y_test)¶ Assign the test set to visualize.
- Parameters
x_test (list (of str)) – the list of documents to classify and visualize
y_test (list (of str)) – the list of category labels
-
static
set_testset_from_files
(test_path, folder_label=True)¶ Load the test set files to visualize from test_path.
- Parameters
test_path (str) – the test set path
folder_label (bool) – if True, read category labels from folders, otherwise, read category labels from file names. (default: True)
- Returns
True if category documents were found, False otherwise
- Return type
bool
-
static
start_listening
(port=0)¶ Start listening on a port and return its number.
(If a port number is not given, it uses a random free port).
- Parameters
port (int) – the port to listen on
-
pyss3.server.
content_type
(ext)¶ Given a file extension, return the content type.
-
pyss3.server.
get_http_body
(http_request)¶ Given an HTTP request, return the body.
-
pyss3.server.
get_http_contlength
(http_request)¶ Given an HTTP request, return the Content-Length value.
-
pyss3.server.
get_http_path
(http_request)¶ Given an HTTP request, return the resource path.
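What the three HTTP helpers extract can be sketched on a raw request string. The functions below are hypothetical stand-ins written against the standard HTTP/1.1 message layout (request line, headers, blank line, body); the /classify path is made up, and the real helpers may handle edge cases differently.

```python
def http_path(request):
    """Return the resource path from the request line."""
    request_line = request.split("\r\n", 1)[0]
    return request_line.split(" ")[1]


def http_content_length(request):
    """Return the Content-Length header value as an int (0 if absent)."""
    head = request.split("\r\n\r\n", 1)[0]
    for line in head.split("\r\n")[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "content-length":
            return int(value.strip())
    return 0


def http_body(request):
    """Return everything after the blank line that ends the headers."""
    return request.partition("\r\n\r\n")[2]


request = (
    "POST /classify HTTP/1.1\r\n"
    "Host: localhost\r\n"
    "Content-Length: 11\r\n"
    "\r\n"
    "hello world"
)
print(http_path(request), http_content_length(request), http_body(request))
```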
-
pyss3.server.
parse_and_sanitize
(rsc_path)¶ Very simple function to parse and sanitize the given path.
pyss3.cmd_line module¶
This module lets you interact with your SS3 models through a Command Line.
(Please, visit https://github.com/sergioburdisso/pyss3 for more info)
-
exception
pyss3.cmd_line.
ArgsParseError
¶ Bases:
Exception
Exception thrown when an error occurs while parsing command arguments.
-
exception
pyss3.cmd_line.
GetTestDataError
¶ Bases:
Exception
Exception thrown when an error occurs while retrieving the test data.
-
class
pyss3.cmd_line.
SS3Prompt
(completekey='tab', stdin=None, stdout=None)¶ Bases:
cmd.Cmd
Prompt main class.
-
args_classify
(args)¶ Parse classify arguments.
-
args_evaluations
(args)¶ Parse evaluations arguments.
-
args_grid_search
(args)¶ Parse grid_search arguments.
-
args_k_fold
(args)¶ Parse k_fold arguments.
-
args_learn
(args)¶ Parse learn arguments.
-
args_live_test
(args)¶ Parse live_test arguments.
-
args_save
(args)¶ Parse save arguments.
-
args_set
(args)¶ Parse set arguments.
-
args_test
(args)¶ Parse test arguments.
-
args_train
(args)¶ Parse train arguments.
-
complete_evaluations
(text, line, begidx, endidx)¶ Complete arguments for ‘evaluations’ command.
-
complete_get
(text, line, begidx, endidx)¶ Complete arguments for ‘get’ command.
-
complete_grid_search
(text, line, begidx, endidx)¶ Complete arguments for ‘grid_search’ command.
-
complete_info
(text, line, begidx, endidx)¶ Complete arguments for ‘info’ command.
-
complete_k_fold
(text, line, begidx, endidx)¶ Complete arguments for ‘k_fold’ command.
-
complete_ld
(text, line, begidx, endidx)¶ Complete arguments for ‘load’ command.
-
complete_learn
(text, line, begidx, endidx)¶ Complete arguments for ‘learn’ command.
-
complete_live_test
(text, line, begidx, endidx)¶ Complete arguments for ‘live_test’ command.
-
complete_load
(text, line, begidx, endidx)¶ Complete arguments for ‘load’ command.
-
complete_plot
(text, line, begidx, endidx)¶ Complete arguments for ‘plot’ command.
-
complete_save
(text, line, begidx, endidx)¶ Complete arguments for ‘save’ command.
-
complete_set
(text, line, begidx, endidx)¶ Complete arguments for ‘set’ command.
-
complete_sv
(text, line, begidx, endidx)¶ Complete arguments for ‘save’ command.
-
complete_test
(text, line, begidx, endidx)¶ Complete arguments for ‘test’ command.
-
complete_train
(text, line, begidx, endidx)¶ Complete arguments for ‘train’ command.
-
default
(line)¶ Default error message.
-
do_EOF
(args='')¶ Quit the program.
-
do_classify
(**kwargs)¶ Classify a document.
- usage:
classify [DOCUMENT_PATH]
- optional arguments:
DOCUMENT_PATH the path to the document file
-
do_clone
(**kwargs)¶ Create a copy of the current model with a given name.
- usage:
clone NEW_MODEL_NAME
- required arguments:
NEW_MODEL_NAME the new model’s name
-
do_debug_term
(**kwargs)¶ Show debugging information about a given n-gram.
Namely, print the n-gram frequency (fr), local value (lv), global value (gv), confidence value (cv), sanction (sn) weight and significance (sg) weight.
- usage:
debug_term N_GRAM
- required arguments:
N_GRAM the n-gram (word, bigram, trigram, etc.) to debug
- examples:
debug_term the
debug_term potato
debug_term “machine learning”
debug_term “self driving car”
-
do_evaluations
(**kwargs)¶ Perform different actions linked to evaluation results.
- usage:
evaluations OPTION [PATH] [METHOD] [DEF_CAT] [P VAL [P VAL …]]
- required arguments:
- OPTION indicates the action to perform
values: {info,plot,save,remove} (default: info)
- info - show information about evaluations (including best values).
- plot - show an interactive 3-D plot with evaluation results in the web browser (it also saves the plot to disk).
- save - save the interactive 3-D plot to disk.
- remove - delete evaluation results from history.
- optional arguments:
PATH the dataset path used in the evaluation of interest
- METHOD the method that was used in the evaluation of interest
values: {test,K-fold} where K is an integer > 1
- DEF_CAT default category used in the evaluation of interest
values: {most-probable,unknown} or a category label
- P VAL the hyperparameter value (only for option “remove”)
P values: {s,l,p,a}
VAL values: float
- examples:
- show information about all evaluations:
evaluations info
- show information about evaluations in path “a/dataset/path”:
evaluations info a/dataset/path
- information about 3-fold evaluations in path “a/dataset/path”:
evaluations info a/dataset/path 3-fold
- information about test evaluations in path “a/dataset/path”:
evaluations info a/dataset/path test
- plot evaluations:
evaluations plot
- save evaluations:
evaluations save
- remove all evaluation result(s) in path “a/dataset/path”:
evaluations remove a/dataset/path
- remove 4-fold evaluation result(s) in path “a/dataset/path” with l = 1.1 and s = .45:
evaluations remove a/dataset/path 4-fold l 1.1 s .45
-
do_exit
(args='')¶ Quit the program.
-
do_get
(**kwargs)¶ Get a given hyperparameter value.
- usage:
get PARAM
- required arguments:
- PARAM the hyperparameter name
values: {s,l,p,a}
- examples:
get s
get l
get p
get a
-
do_grid_search
(**kwargs)¶ Given a dataset, perform a grid search using the given hyperparameter values.
- usage:
grid_search PATH [LABEL] [DEF_CAT] [METHOD] P EXP [P EXP …] [no-cache]
- required arguments:
PATH the dataset path
P EXP a list of values for a given hyperparameter.
- where:
P is a hyperparameter name. values: {s,l,p,a}
EXP is a Python expression returning a float or a list of floats. Note: if this expression contains whitespaces, use quotation marks (e.g. “[0.5, 1.5]”)
- examples:
s [.3,.4,.5]
s “[.3, .4, .5]” (Note the whitespaces and the “”)
p r(.2,.8,6) (i.e. 6 points from .2 to .8)
- optional arguments:
- LABEL where to read category labels from.
values:{file,folder} (default: folder)
- DEF_CAT default category to be assigned when the model is not able to actually classify a document.
values: {most-probable,unknown} or a category label (default: most-probable)
- METHOD the method to be used
values: {test, K-fold} (default: test)
where K-fold indicates the number of folds to be used. K is an integer > 1 (e.g. 4-fold, 10-fold, etc.)
no-cache if present, disable the cache and recompute all the values
- examples:
grid_search a/testset/path s r(.2,.8,6) l r(.1,2,6) -p r(.5,2,6) a [0,.01]
grid_search a/dataset/path 4-fold -s [.2,.3,.4,.5] -l [.5,1,1.5] -p r(.5,2,6)
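The way a grid is built from such expressions can be sketched in plain Python. This sketch assumes that r(start, stop, count) yields evenly spaced values including both endpoints, which is only one reading of “6 points”; the r defined here is a hypothetical re-creation, not the command line's own function.

```python
from itertools import product


def r(start, stop, count):
    """Return `count` evenly spaced values from start to stop, inclusive."""
    if count == 1:
        return [start]
    step = (stop - start) / (count - 1)
    # Round to hide floating-point noise such as 0.30000000000000004.
    return [round(start + i * step, 4) for i in range(count)]


ss = r(0.2, 0.8, 3)
ll = [0.5, 1.0]
grid = list(product(ss, ll))  # every (s, l) combination to evaluate
print(len(grid), grid[0], grid[-1])
```

The number of evaluations is the product of the list lengths, so adding values for p and a multiplies the cost accordingly.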
-
do_info
(**kwargs)¶ Show useful information.
- usage:
info OPTION
- required arguments:
- OPTION indicates what information to show
values: {all, parameters, categories, evaluations} (default: all)
- examples:
info
info evaluations
-
do_k_fold
(**kwargs)¶ Perform a stratified k-fold validation using the given dataset.
- usage:
k_fold PATH [LABEL] [DEF_CAT] [N-grams] [K-fold] [P VAL …] [no-cache]
- required arguments:
PATH the dataset path
- optional arguments:
- LABEL where to read category labels from.
values:{file,folder} (default: folder)
- DEF_CAT default category to be assigned when the model is not able to actually classify a document.
values: {most-probable,unknown} or a category label (default: most-probable)
- N-grams indicates the maximum n-grams to be learned (e.g. a value of “1-grams” means only words will be learned; “2-grams”, only 1-grams and 2-grams; “3-grams”, only 1-grams, 2-grams and 3-grams; and so on).
value: {N-grams} with N integer > 0 (default: 1-grams)
- K-fold indicates the number of folds to be used.
value: {K-fold} with K integer > 1 (default: 4-fold)
- P VAL sets a hyperparameter value (e.g. s 0.45)
P values: {s,l,p,a}
VAL values: float
no-cache if present, disable the cache and recompute values
- examples:
k_fold a/dataset/path 10-fold
k_fold a/dataset/path 4-fold -s .45 -l 1.1 -p 1
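“Stratified” means that each fold keeps roughly the same category proportions as the whole dataset. The sketch below shows one simple way to get that property by dealing each category's documents round-robin across the folds; it does no shuffling, and stratified_folds is a hypothetical helper rather than the splitter the command actually uses.

```python
from collections import defaultdict


def stratified_folds(labels, k):
    """Assign each document index to one of k folds, category by category."""
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    folds = [[] for _ in range(k)]
    for indexes in by_label.values():
        # Deal each category's documents round-robin across the folds.
        for position, index in enumerate(indexes):
            folds[position % k].append(index)
    return folds


labels = ["pos", "neg", "pos", "neg", "pos", "neg", "pos", "neg"]
folds = stratified_folds(labels, 4)
for held_out in folds:
    print(sorted(held_out))
```

Each fold is held out once for testing while the remaining folds are used for training.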
-
do_learn
(**kwargs)¶ Learn a new document.
- usage:
learn CAT [N-grams] [DOCUMENT_PATH]
- required arguments:
CAT the category label
- optional arguments:
- N-grams indicates the maximum n-grams to be learned (e.g. a value of “1-grams” means only words will be learned; “2-grams”, only 1-grams and 2-grams; “3-grams”, only 1-grams, 2-grams and 3-grams; and so on).
value: {N-grams} with N integer > 0 (default: 1-grams)
DOCUMENT_PATH the path to the document file
-
do_license
(args)¶ Print the license.
-
do_live_test
(**kwargs)¶ Interactively and graphically test the model.
- usage:
live_test [TEST_PATH [LABEL]] [verbose]
- optional arguments:
TEST_PATH the test set path
- LABEL where to read category labels from.
values: {file,folder} (default: folder)
verbose if present, run in verbose mode
- examples:
live_test
live_test a/testset/path
live_test a/testset/path verbose
-
do_load
(**kwargs)¶ Load a local model (given its name).
- usage:
load MODEL_NAME
- required arguments:
MODEL_NAME the model’s name
-
do_new
(**kwargs)¶ Create a new empty SS3 model with a given name.
- usage:
new MODEL_NAME
- required arguments:
MODEL_NAME the model’s name
-
do_next_word
(**kwargs)¶ Show up to 3 possible words to follow after the given sentence.
- usage:
next_word SENT
- required arguments:
SENT a sentence
- examples:
next_word “the self driving”
next_word “a machine learning”
-
do_plot
(**kwargs)¶ Plot word value distribution curve or the evaluation results.
- usage:
plot OPTION
- required arguments:
- OPTION indicates what to plot
- values:
evaluations;
distribution CAT;
- where:
CAT the category label
- examples:
plot distribution a_category
plot evaluations
-
do_rename
(**kwargs)¶ Rename the current model with a given name.
- usage:
rename NEW_MODEL_NAME
- required arguments:
NEW_MODEL_NAME the model’s new name
-
do_save
(**kwargs)¶ Save to disk the model, learned vocabulary, evaluation results, etc.
- usage:
save OPTION
- required arguments:
- OPTION indicates what to save to disk
- values:
model; (default)
evaluations;
vocabulary [CAT];
stopwords [SG_THRESHOLD];
- where:
CAT the category label
- SG_THRESHOLD significance (sg) value used as a threshold to consider words as stopwords (i.e. words with sg < sg_threshold for all categories will be considered as “stopwords”) (default: .01)
- examples:
save
save model
save vocabulary
save vocabulary a_category
save stopwords
save stopwords .1
-
do_set
(**kwargs)¶ Set a given hyperparameter value.
- usage:
set P VAL [P VAL …]
- required arguments:
- P VAL sets a hyperparameter value
examples: s .45; s .5;
P values: {s,l,p,a}
VAL values: float
- examples:
set s .5
set l 0.5
set p 2
set s .5 l 0.5 p 2
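The P VAL convention used by this and other commands can be sketched as a small parser. parse_hparams below is a hypothetical illustration of reading alternating name/value tokens; it is not the module's parse_hparams_args and ignores defaults and the dash-prefixed forms that appear in some examples.

```python
def parse_hparams(tokens):
    """Turn ['s', '.5', 'l', '0.5'] into {'s': 0.5, 'l': 0.5}."""
    if len(tokens) % 2:
        raise ValueError("expected P VAL pairs")
    hparams = {}
    for name, value in zip(tokens[0::2], tokens[1::2]):
        if name not in ("s", "l", "p", "a"):
            raise ValueError("unknown hyperparameter: " + name)
        hparams[name] = float(value)
    return hparams


print(parse_hparams("s .5 l 0.5 p 2".split()))
```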
-
do_test
(**kwargs)¶ Test the model using the given test set.
- usage:
test TEST_PATH [LABEL] [DEF_CAT] [P VAL …] [no-cache]
- required arguments:
TEST_PATH the test set path
- optional arguments:
- LABEL where to read category labels from.
values:{file,folder} (default: folder)
- DEF_CAT default category to be assigned when the model is not able to actually classify a document.
values: {most-probable,unknown} or a category label (default: most-probable)
- P VAL sets a hyperparameter value
examples: s .45; s .5;
P values: {s,l,p,a}
VAL values: float
no-cache if present, disable the cache and recompute values
- examples:
test a/testset/path
test a/testset/path -s .45 -l 1.1 -p 1
test a/testset/path unknown -s .45 -l 1.1 -p 1 no-cache
-
do_train
(**kwargs)¶ Train the model using a training set and then save it.
- usage:
train TRAIN_PATH [LABEL] [N-grams]
- required arguments:
TRAIN_PATH the training set path
- optional arguments:
- LABEL where to read category labels from.
values:{file,folder} (default: folder)
- N-grams indicates the maximum n-grams to be learned (e.g. a value of “1-grams” means only words will be learned; “2-grams”, only 1-grams and 2-grams; “3-grams”, only 1-grams, 2-grams and 3-grams; and so on).
value: {N-grams} with N integer > 0 (default: 1-grams)
- examples:
train a/training/set/path 3-grams
-
do_update
(**kwargs)¶ Update model values (cv, gv, lv, etc.).
-
precmd
(line)¶ Hook method executed just before the command.
-
preloop
()¶ Hook method executed once when cmdloop() is called.
-
-
pyss3.cmd_line.
delete_results
(data_path, method, def_cat, hparams, only_count=False)¶ Remove evaluations from history.
-
pyss3.cmd_line.
delete_results_slpa
(rh_metric, hparams, only_count=False, best=True)¶ Remove evaluations from history given hyperparameters s, l, p, a.
-
pyss3.cmd_line.
evaluations_info
(data_path=None, method=None)¶ Print evaluations best values.
-
pyss3.cmd_line.
evaluations_remove
(data_path, method, def_cat, hparams)¶ Evaluation remove command handler.
-
pyss3.cmd_line.
get_global_best
(values)¶ Given a list of evaluations values, return the best one.
-
pyss3.cmd_line.
get_results_history
(path, method, def_cat)¶ Given a path, a method and a default category return results history.
-
pyss3.cmd_line.
get_test_data_cache
(path, def_cat, method, s, l, p, a)¶ Return test results from cache.
-
pyss3.cmd_line.
grid_search
(data_path, folder_label, def_cat, n_gram, k_fold, ss, ll, pp, aa, cache=True)¶ Perform a grid search using values from ss, ll, pp, aa.
-
pyss3.cmd_line.
grid_search_loop
(data_path, x_test, y_test, categories, def_cat, k_fold, i_fold, ss, ll, pp, aa, cache=True, leave_pbar=True)¶ Grid search main loop.
-
pyss3.cmd_line.
intersect
(l0, l1)¶ Given two lists return the intersection.
-
pyss3.cmd_line.
is_in_cache
(path, method, def_cat, s, l, p, a)¶ Return whether this evaluation is already computed.
-
pyss3.cmd_line.
json2rh
(dct)¶ Convert a given dictionary to a RecursiveDefaultDict.
-
pyss3.cmd_line.
k_fold2method
(k_fold)¶ Convert the k number to a proper method string.
-
pyss3.cmd_line.
k_fold_classification_report
(data_path, method, def_cat, s, l, p, a)¶ Create the classification report for k-fold validations.
-
pyss3.cmd_line.
k_fold_validation
(data_path, folder_label, def_cat, n_grams, k_fold, s, l, p, a, cache=True)¶ Perform a stratified k-fold cross validation using the given data.
-
pyss3.cmd_line.
load_data
(data_path, folder_label, def_cat=None, return_cat_index=True, cmd_name='test')¶ Load documents from disk, return the x_data, y_data and categories.
-
pyss3.cmd_line.
load_results_history
()¶ Load results history (evaluations) from disk.
-
pyss3.cmd_line.
main
()¶ Main function.
-
pyss3.cmd_line.
module_path
(file_path)¶ Convert a file path relative to this module path.
-
pyss3.cmd_line.
parse_hparams_args
(op_args, defaults=True)¶ Parse hyperparameters arguments list.
-
pyss3.cmd_line.
plot_confusion_matrices
(cms, classes, info='', max_colums=3)¶ Show and plot the confusion matrices.
-
pyss3.cmd_line.
re_in
(regex, l)¶ Given a list of strings, return the first match in the list.
-
pyss3.cmd_line.
requires_args
(func)¶ A @decorator.
-
pyss3.cmd_line.
requires_model
(func)¶ A @decorator.
-
pyss3.cmd_line.
results
(y_true, y_pred, categories, def_cat, cache=True, method='test', data_path='', folder=False, plots=True, k_fold=1, i_fold=0)¶ Compute evaluation results and save them to disk.
-
pyss3.cmd_line.
round_fix
(v)¶ Round the number v (used to keep the results history file small).
-
pyss3.cmd_line.
save_html_evaluations
(show_plot=True)¶ Save results history (evaluations) to disk (interactive html file).
-
pyss3.cmd_line.
save_results
(rh, categories, accuracy, report, conf_matrix, k_fold, i_fold, s, l, p, a)¶ Save evaluation results to disk.
-
pyss3.cmd_line.
save_results_history
()¶ Save results history (evaluations) to disk.
-
pyss3.cmd_line.
split_args
(args)¶ Parse and split arguments.
-
pyss3.cmd_line.
subtract
(l0, l1)¶ Subtract list l1 from l0.
-
pyss3.cmd_line.
test
(test_path, folder_label, def_cat, s, l, p, a, cache)¶ Test the model with a given test set.
-
pyss3.cmd_line.
train
(x_train, y_train, n_grams, train_path='', folder_label=None, save=True, leave_pbar=True)¶ Train a new model with the given training set.
pyss3.util module¶
This is a helper module with utility classes and functions.
-
class
pyss3.util.
Dataset
¶ Bases:
object
A helper class with methods to read/write datasets.
-
static
load_from_files
(data_path, folder_label=True, as_single_doc=False)¶ Load category documents from disk.
- Parameters
data_path (str) – the training or the test set path
folder_label (bool) – if True, read category labels from folders, otherwise, read category labels from file names. (default: True)
as_single_doc (bool) – read the documents as a single (and big) document (default: False)
- Returns
the (x_train, y_train) or the (x_test, y_test) pairs.
- Return type
tuple
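The difference between the two folder_label modes can be sketched with path handling alone. The layout assumed here (one folder per category versus one file per category) is an assumption made for illustration, and label_for is a hypothetical helper that reads nothing from disk.

```python
from pathlib import PurePosixPath


def label_for(doc_path, folder_label=True):
    """Derive a category label from a document path."""
    path = PurePosixPath(doc_path)
    # Folder mode: the parent directory names the category.
    # File mode: the file name (without extension) names the category.
    return path.parent.name if folder_label else path.stem


print(label_for("train/sports/doc1.txt"))
print(label_for("train/sports.txt", folder_label=False))
```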
-
class
pyss3.util.
Preproc
¶ Bases:
object
A helper class with methods to preprocess input documents.
-
static
clean_and_ready
(text, dots=True, normalize=True, min_len=1)¶ Clean and prepare the text.
-
class
pyss3.util.
Print
¶ Bases:
object
Helper class to handle print functionalities.
-
static
error
(msg, raises=None, offset=0, decorator=True)¶ Print an error.
- Parameters
msg (str) – the message to show
raises (Exception) – the exception to be raised after showing the message
offset (int) – shift the message to the right (offset characters)
decorator (bool) – if True, use error message decorator
-
static
info
(msg, newln=True, offset=0, decorator=True)¶ Print an info message.
- Parameters
msg (str) – the message to show
newln (bool) – use new line after the message (default: True)
offset (int) – shift the message to the right (offset characters)
decorator (bool) – if True, use info message decorator
-
static
quiet_begin
()¶ Begin a “be quiet” block.
-
static
quiet_end
()¶ End the “be quiet” block.
-
static
set_decorator_error
(start, end=None)¶ Set error messages decorator.
- Parameters
start (str) – message prefix
end (str) – message suffix
-
static
set_decorator_info
(start, end=None)¶ Set info messages decorator.
- Parameters
start (str) – message prefix
end (str) – message suffix
-
static
set_decorator_warn
(start, end=None)¶ Set warning messages decorator.
- Parameters
start (str) – message prefix
end (str) – message suffix
-
static
set_quiet
(value)¶ Set quiet mode value.
When quiet mode is enabled, only error messages will be displayed.
- Parameters
value (bool) – if True, enables quiet mode
-
static
show
(msg='', newln=True, offset=0)¶ Print a message.
- Parameters
msg (str) – the message to show
newln (bool) – use new line after the message (default: True)
offset (int) – shift the message to the right (offset characters)
-
static
warn
(msg, newln=True, raises=None, offset=0, decorator=True)¶ Print a warning message.
- Parameters
msg (str) – the message to show
newln (bool) – use new line after the message (default: True)
raises (Exception) – the exception to be raised after showing the message
offset (int) – shift the message to the right (offset characters)
decorator (bool) – if True, use warning message decorator
-
class
pyss3.util.
RecursiveDefaultDict
¶ Bases:
dict
A dict whose default value is a dict.
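A dict with this behaviour can be sketched with the __missing__ hook. NestedDict below is a hypothetical stand-in, not the library's class; it shows how nested keys can be assigned without creating the intermediate levels first.

```python
class NestedDict(dict):
    """A dict that creates a nested NestedDict for any missing key."""

    def __missing__(self, key):
        # Called by dict.__getitem__ when the key is absent.
        value = self[key] = type(self)()
        return value


history = NestedDict()
history["a/dataset/path"]["4-fold"]["accuracy"] = 0.9
print(history)
```

Note that with this approach merely reading a missing key also creates it, so membership should be checked with the in operator.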
-
class
pyss3.util.
Style
¶ Bases:
object
Helper class to handle print styles.
-
static
blue
(text)¶ Apply ‘blue’ style to text.
-
static
bold
(text)¶ Apply bold style to text.
-
static
fail
(text)¶ Apply the ‘fail’ style to text.
-
static
green
(text)¶ Apply ‘green’ style to text.
-
static
header
(text)¶ Apply ‘header’ style to text.
-
static
ubold
(text)¶ Apply underline and bold style to text.
-
static
underline
(text)¶ Underline text.
-
static
warning
(text)¶ Apply the ‘warning’ style to text.