RUDI Extras
Template downloads and more detailed guidance for data scientists.
Page under construction ....
General Info
Data Scientist Expertise
An outline of the skills needed at each step of modelling. These can be covered by a single person or by a team of data scientists and engineers.
RUDI for Data Scientists
Improve your understanding of what machine learning is, or hone your skills. This page has more links to outside resources that can help.
This section provides a more detailed guide for developing machine learning (ML) models intended for use in policing. It is not an exhaustive list of the analyses to conduct. Whether you're new to machine learning, have academic experience but little exposure to deploying AI in human-centric environments, or are already an expert, here are some reminders about the steps involved in building a model-based product. Towards the end of the document, you'll find a list of useful resources, including links to online courses that cover much of this material. These resources also give general advice on building and deploying ML models.
It is entirely possible for a data scientist to reach the point of deployment with a successfully trained model and nothing else in place. If an institution has good data collections that are easy to leverage, a data scientist can build a successful prototype without addressing many of the best practices suggested in the Development cycle, or indeed the Rationale. It is therefore imperative to stop before deployment and take stock of the ethical and engineering aspects of the project. The Rationale, Unification, and Deployment sections should be filled out, and the model and data cards populated with all information relevant to the reproducibility and ethics of the project.
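The "stop before deployment" step can be made mechanical with a simple completeness check. The sketch below is only an illustration: it assumes the project documentation is available as a dictionary of free-text entries, and the function name `missing_items` and the dictionary layout are hypothetical, not part of RUDI. Only the section and card names come from the text above.

```python
# Section and card names taken from the guidance above.
REQUIRED_SECTIONS = ("Rationale", "Unification", "Deployment")
REQUIRED_CARDS = ("model card", "data card")


def missing_items(documentation):
    """Return the required sections or cards that are absent or empty.

    `documentation` is assumed (hypothetically) to map an item name to
    its free-text content.
    """
    required = REQUIRED_SECTIONS + REQUIRED_CARDS
    return [name for name in required
            if not documentation.get(name, "").strip()]


# Example: a prototype with a trained model but incomplete documentation.
docs = {
    "Rationale": "Why the model is needed and who is affected.",
    "Unification": "",  # section exists but was never filled out
    "model card": "Training data, metrics, known limitations.",
}

gaps = missing_items(docs)
if gaps:
    print("Not ready for deployment. Missing:", ", ".join(gaps))
```

A check like this only confirms that something has been written. Whether the content is adequate still needs human review.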
If proper tests and evaluations have not been conducted, if there is no baseline performance to judge the model against*, or if the concerns of the community, minorities, or other affected parties have not been evaluated, discussed, and documented, the project will run into challenges.
*The best baseline is the current procedure. Shadowing is one way to gather baseline data on how the current procedure performs against the model.
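As a rough illustration of a baseline comparison, the sketch below scores the current procedure and the model on the same shadowed cases. It assumes that shadowing means logging what the model would have decided alongside what the current procedure actually decided, without acting on the model's output, and that the true outcome is known later. The record layout and function names are hypothetical, and plain accuracy stands in for whatever metric suits the project.

```python
from dataclasses import dataclass


@dataclass
class ShadowRecord:
    """One case logged during a shadowing period (hypothetical schema)."""
    outcome: bool             # what actually happened, once known
    procedure_decision: bool  # what the current procedure decided
    model_prediction: bool    # what the model would have decided


def accuracy(records, field):
    """Fraction of records where the named decision matched the outcome."""
    if not records:
        raise ValueError("no shadow records to evaluate")
    hits = sum(getattr(r, field) == r.outcome for r in records)
    return hits / len(records)


def compare_to_baseline(records):
    """Score the current procedure (baseline) and the model on the same cases."""
    baseline = accuracy(records, "procedure_decision")
    model = accuracy(records, "model_prediction")
    return {"baseline": baseline, "model": model,
            "difference": model - baseline}


# Toy data for illustration only, not real results.
records = [
    ShadowRecord(outcome=True, procedure_decision=True, model_prediction=True),
    ShadowRecord(outcome=False, procedure_decision=True, model_prediction=False),
    ShadowRecord(outcome=True, procedure_decision=False, model_prediction=True),
    ShadowRecord(outcome=False, procedure_decision=False, model_prediction=True),
]

report = compare_to_baseline(records)
print(report)
```

In practice, the same comparison would be repeated for each affected group, since an overall improvement can hide worse performance for a minority.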