System and Model Integrity
In an environment where personnel change frequently, a production system needs to be set up so that any competent team of engineers and data scientists can maintain it; this is a prerequisite for deploying and using it successfully over the long term. The models will require regular evaluation and retraining, so the project should include:
Automation: CI/CD pipelines, so that tests, checks and deployments run automatically on every change rather than relying on manual steps.
Code quality and documentation: Ensure that your code is well documented and has been run through linters and static code checkers (a short documentation and linting sketch follows this list).
Unit testing: Any function that transforms data or influences the model in any way should be covered by unit tests. You can also write adversarial or behavioural tests for the ML model itself, with expected outcomes, to ensure that retraining does not degrade performance on specific types of cases; see, e.g., GitHub - marcotcr/checklist (Beyond Accuracy: Behavioral Testing of NLP models with CheckList). You can adapt the testing framework to require only a minimum percentage of passing cases, since you may not be able to attain full performance right away on every case you can design (see the pytest sketch after this list).
Data validation and quality: Apply format and quality checks to all incoming data to ensure that changes in inputs do not lead to prediction errors. There are packages that can help with this, e.g. Pandera (see the validation sketch after this list).
Monitoring: Set up visualisations of performance over time so that any degradation is immediately visible, especially if you are using dynamic training. Ensure you track all relevant metrics relating to protected characteristics or potential feedback loops, as you want to be the first to spot any issues (a per-group monitoring sketch follows this list).
Domain adaptation: Keep a record of any interventions or policy changes that may alter the distribution of the data or affect model performance.
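To make the code quality point concrete, below is a minimal sketch of the style of function that passes common linters and type checkers. The module, function name and behaviour are hypothetical, and tools such as flake8 and mypy are assumed rather than mandated by the text above.

```python
"""Feature preparation utilities (hypothetical module)."""

import pandas as pd


def clip_outliers(series: pd.Series, lower: float, upper: float) -> pd.Series:
    """Clip values outside [lower, upper] to the nearest bound.

    Args:
        series: Numeric input column.
        lower: Minimum allowed value.
        upper: Maximum allowed value.

    Returns:
        A copy of ``series`` with out-of-range values clipped.

    Raises:
        ValueError: If ``lower`` is greater than ``upper``.
    """
    if lower > upper:
        raise ValueError(f"lower ({lower}) must not exceed upper ({upper})")
    return series.clip(lower=lower, upper=upper)
```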
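For the unit testing point, here is a pytest sketch that covers a small data transform and adds a CheckList-style behavioural test requiring only a minimum pass rate, so that retraining is not blocked by a handful of hard cases. The transform, the stand-in predict_sentiment function and the 0.75 threshold are illustrative assumptions, not part of the original guidance.

```python
"""Hypothetical pytest suite for a data transform and for model behaviour."""

import pandas as pd


def clip_outliers(series: pd.Series, lower: float, upper: float) -> pd.Series:
    """Toy transform under test; in a real project this would be imported."""
    return series.clip(lower=lower, upper=upper)


def predict_sentiment(text: str) -> str:
    """Stand-in for the real model's predict call."""
    return "negative" if "not" in text else "positive"


def test_clip_outliers_bounds() -> None:
    """Values outside the range are clipped, values inside are untouched."""
    result = clip_outliers(pd.Series([-5.0, 0.5, 99.0]), lower=0.0, upper=1.0)
    assert result.tolist() == [0.0, 0.5, 1.0]


# CheckList-style behavioural cases for negation: (input text, expected label).
NEGATION_CASES = [
    ("The service was not good.", "negative"),
    ("I did not enjoy the experience.", "negative"),
    ("This was not at all what I wanted.", "negative"),
    ("The food was great.", "positive"),
]


def test_negation_minimum_pass_rate() -> None:
    """Require a minimum pass rate rather than perfection on every case."""
    passed = sum(predict_sentiment(text) == label for text, label in NEGATION_CASES)
    assert passed / len(NEGATION_CASES) >= 0.75
```

Running pytest as a step in the CI/CD pipeline ties this back to the automation point above.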
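For the data validation point, a minimal Pandera sketch is shown below; the column names, dtypes and allowed ranges are made-up assumptions standing in for whatever your incoming data actually looks like.

```python
"""Hypothetical schema check on incoming data using Pandera."""

import pandas as pd
import pandera as pa

# Schema describing what the model expects; columns and ranges are illustrative.
input_schema = pa.DataFrameSchema(
    {
        "age": pa.Column(int, pa.Check.in_range(0, 120)),
        "income": pa.Column(float, pa.Check.ge(0)),
        "region": pa.Column(str, pa.Check.isin(["north", "south", "east", "west"])),
    },
    strict=True,  # reject unexpected extra columns
)

incoming = pd.DataFrame(
    {
        "age": [34, 51],
        "income": [42_000.0, 58_500.0],
        "region": ["north", "east"],
    }
)

# With lazy=True a failing batch raises SchemaErrors listing every violation,
# which is easier to act on than stopping at the first bad column.
validated = input_schema.validate(incoming, lazy=True)
print(validated.head())
```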
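Finally, a sketch of the monitoring idea: compute a quality metric per time window and per protected group from the prediction log, so that both overall degradation and divergence between groups are visible. The log columns, the group labels and the accuracy metric are assumptions; in practice these numbers would feed your dashboarding or alerting tool.

```python
"""Hypothetical performance tracking over time and across protected groups."""

import pandas as pd

# Prediction log: one row per scored example, with the eventual true label.
log = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            ["2024-01-03", "2024-01-20", "2024-02-05",
             "2024-02-18", "2024-03-02", "2024-03-25"]
        ),
        "group": ["A", "B", "A", "B", "A", "B"],  # e.g. a protected characteristic
        "prediction": [1, 0, 1, 1, 0, 1],
        "label": [1, 0, 0, 1, 0, 0],
    }
)

log["correct"] = (log["prediction"] == log["label"]).astype(int)
log["month"] = log["timestamp"].dt.to_period("M")

# Accuracy per calendar month, overall and broken down by group; a widening gap
# between groups or a downward trend should trigger investigation.
overall = log.groupby("month")["correct"].mean()
by_group = log.groupby(["month", "group"])["correct"].mean().unstack("group")

print(overall)
print(by_group)
```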
Further information: