Usage¶
This test suite has multiple purposes:
test a new model locally for a particular test or all available tests
submit test results if the model is considered to be a release candidate
inspect test results on the HTML pages to learn about test details
rank models by test results to select the next production model
use impact analysis results and test results to summarize expected differences between models
We cover each of them in the following sub-sections.
Run a test¶
Tests are executed by the test.py script, which is located in the root folder of the repository.
It provides online help with:
$ python test.py --help
You always have to provide the condition to test.
You select a model with --model MODEL_ID.
If no model is specified, it will run the tests on all previously tested models.
You can also specify the maximum signal length that should be used for the model with --max-signal-length.
If you do, a short tuning parameter ID will be appended to the model name when displaying results.
You select a test with --test TEST.
If you don't specify a test, it will run all tests for the given model.
It will directly output which tests failed or passed.
Examples
Run the Fairness Sex test for the model CNN14 on arousal:
$ python test.py arousal --test fairness_sex --model 1543ec32-1.0.3
Run all available tests for the model CNN14 on arousal:
$ python test.py arousal --model 1543ec32-1.0.3
Run all available tests for the model with ID 1543ec32-1.0.3 on arousal with a maximum signal length of 5 seconds:
$ python test.py arousal --model 1543ec32-1.0.3 --max-signal-length 5
Run the Fairness Sex test for all previously tested arousal models:
$ python test.py arousal --test fairness_sex
Run all available tests for all previously tested arousal models:
$ python test.py arousal
Submit results¶
The results of a test are stored under docs/results/test/CONDITION/MODEL_ID/TEST as CSV and PNG files.
For example, the folder docs/results/test/arousal/1543ec32-1.0.3/correctness_regression/ contains, among other files, mean-squared-error.csv.
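If you want a quick look at such a result without building the HTML pages, you can list the folder and print a metric file, for example with pandas. This is only a sketch: it assumes pandas is installed, and the exact files depend on the test.
$ ls docs/results/test/arousal/1543ec32-1.0.3/correctness_regression/
$ python -c "import pandas as pd; print(pd.read_csv('docs/results/test/arousal/1543ec32-1.0.3/correctness_regression/mean-squared-error.csv'))"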
If this is the first time you tested the selected model, it will also store information about that model under the model folder docs/results/test/CONDITION/MODEL_ID.
If your tested model ranks among the top five models on the condition overview pages, or if you think there are other reasons worth submitting the test results, please commit those files to a new branch, push it to the GitHub server, and open a pull request.
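A typical submission could look like the following sketch; the branch name and commit message are only examples:
$ git checkout -b test-results-1543ec32-1.0.3
$ git add docs/results/test/arousal/1543ec32-1.0.3
$ git commit -m "Add arousal test results for 1543ec32-1.0.3"
$ git push -u origin test-results-1543ec32-1.0.3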
Test details as HTML¶
Every time you push to the main branch of the GitHub repository, a CI job will automatically update the HTML pages you find under https://audeering.github.io/ser-tests/.
Here,
you can inspect all submitted test results.
To build that page locally, please run:
$ pip install -r docs/requirements.txt
$ python -m sphinx docs/ build/html -b html
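To preview the generated pages in a browser, you can serve the build folder, for example with Python's built-in HTTP server, and then open http://localhost:8000:
$ python -m http.server --directory build/html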
Rank models¶
Models are automatically ranked by their test results on the test overview pages. The ranking is calculated by the percentage of passed tests.
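For example, a model that passes 45 of 50 tests reaches 90% and is therefore ranked above a model that passes 40 of 50 tests (80%).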
Model comparisons¶
If we want to update a model, i.e. replace one model with another, we need to summarize the changes the user might expect. To this end, we show the comparison of the individual test results.
To analyse a change from a baseline model to one or more candidate models, the tests of all involved models have to be run and their results submitted.
Finally, the intended baseline and candidate model IDs have to be specified in docs/results/comparison/CONDITION.yaml under the respective condition in order for them to be displayed in the HTML pages.
For example, to show the comparison between the two models 1543ec32-1.0.3 and 51c582b7-1.0.0 for emotion, the file comparison/emotion.yaml should contain:
- baseline: 1543ec32-1.0.3
  candidates:
    - 51c582b7-1.0.0
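Since a comparison can involve one or more candidate models, further entries can be added to the candidates list; MODEL_ID below is only a placeholder:
- baseline: 1543ec32-1.0.3
  candidates:
    - 51c582b7-1.0.0
    - MODEL_ID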