
Data Scientist Expertise Needed at Each Stage 

Rationale

  • Experience with data modelling, ideally using police data  

  • Knowledge of explicit and implicit biases in the data, e.g. how the data was compiled and what issues may be inherent to it

  • Knowledge of the processes involved in merging data from different sources, which variables the dataset must contain given the model's aims, and what data it is feasible to obtain from the different data systems

 

At this stage, knowledge of the potential data is key; expertise in machine learning (ML) techniques matters less.
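To illustrate the kind of merging issue this stage needs to anticipate, here is a minimal sketch in Python that joins records from two hypothetical systems on a shared incident reference and reports the records that fail to match. The field names (`incident_ref`, `offence_type`, `outcome`), the function `merge_on_key` and the sample records are invented for illustration, and the sketch assumes the key is unique in the right-hand source. In pandas, `DataFrame.merge` with `indicator=True` surfaces similar information.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def merge_on_key(
    left: List[Record], right: List[Record], key: str
) -> Tuple[List[Record], List[str], List[str]]:
    """Inner-join two lists of records on `key`.

    Returns the merged records plus the keys found only in `left` and only
    in `right`, so that unmatched records are reported rather than silently
    dropped. Assumes `key` is present in every record.
    """
    right_by_key = {rec[key]: rec for rec in right}
    if len(right_by_key) != len(right):
        # Duplicate keys would make the join ambiguous, so fail loudly.
        raise ValueError(f"duplicate values of {key!r} in right-hand source")

    merged: List[Record] = []
    left_only: List[str] = []
    for rec in left:
        match = right_by_key.get(rec[key])
        if match is None:
            left_only.append(rec[key])
        else:
            merged.append({**rec, **match})

    left_keys = {rec[key] for rec in left}
    right_only = [k for k in right_by_key if k not in left_keys]
    return merged, left_only, right_only


# Hypothetical records from two systems (e.g. a crime-recording system and
# a case-outcome system); real field names and formats will differ.
crimes = [
    {"incident_ref": "A1", "offence_type": "burglary"},
    {"incident_ref": "A2", "offence_type": "assault"},
    {"incident_ref": "A3", "offence_type": "theft"},
]
outcomes = [
    {"incident_ref": "A1", "outcome": "charged"},
    {"incident_ref": "A3", "outcome": "no further action"},
    {"incident_ref": "B9", "outcome": "charged"},
]

merged, crimes_only, outcomes_only = merge_on_key(crimes, outcomes, "incident_ref")
print(len(merged))    # 2
print(crimes_only)    # ['A2']
print(outcomes_only)  # ['B9']
```

In a real merge the unmatched keys are often the most informative output: they can reveal differences in how each system records incidents, which is the kind of data issue this stage needs to understand.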

Unification (data scientist and/or data engineer)

  • Knowledge of what data is needed for the modelling and what the required final dataset should look like 

  • Knowledge of the current database systems and what can be retrieved for a) the modelling stage and b) regularly scheduled runs of the model in production

  • Ability to engineer reliable data extraction pipelines, with unit tests to ensure that changes in the data or data inconsistencies are caught
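As a rough illustration of the unit testing described above, the sketch below checks a batch of extracted rows against an expected schema (required columns, types and allowed values) and returns a list of problems. The schema, the column names and the function `validate_rows` are hypothetical; in practice such checks would typically run inside a test framework such as Python's built-in `unittest` or pytest, on every scheduled extraction.

```python
from typing import Any, Dict, List

# Hypothetical expected schema: column name -> required Python type.
EXPECTED_TYPES: Dict[str, type] = {
    "incident_ref": str,
    "offence_type": str,
    "victim_age": int,
}
# Hypothetical set of valid categories for one column.
ALLOWED_OFFENCES = {"burglary", "assault", "theft"}


def validate_rows(rows: List[Dict[str, Any]]) -> List[str]:
    """Return human-readable problems found in an extracted batch."""
    problems: List[str] = []
    seen_refs = set()
    for i, row in enumerate(rows):
        missing = set(EXPECTED_TYPES) - set(row)
        if missing:
            problems.append(f"row {i}: missing columns {sorted(missing)}")
            continue  # skip further checks on an incomplete row
        for col, expected in EXPECTED_TYPES.items():
            value = row[col]
            if value is None:
                problems.append(f"row {i}: {col} is null")
            elif not isinstance(value, expected):
                problems.append(
                    f"row {i}: {col} has type {type(value).__name__}, "
                    f"expected {expected.__name__}"
                )
        if row["offence_type"] not in ALLOWED_OFFENCES:
            problems.append(f"row {i}: unknown offence_type {row['offence_type']!r}")
        if row["incident_ref"] in seen_refs:
            problems.append(f"row {i}: duplicate incident_ref {row['incident_ref']!r}")
        seen_refs.add(row["incident_ref"])
    return problems


batch = [
    {"incident_ref": "A1", "offence_type": "burglary", "victim_age": 34},
    {"incident_ref": "A2", "offence_type": "Burglary", "victim_age": "34"},
    {"incident_ref": "A1", "offence_type": "theft", "victim_age": None},
    {"incident_ref": "A3"},
]
for problem in validate_rows(batch):
    print(problem)
```

A test suite could then assert that `validate_rows` returns an empty list for each new extraction, so that a change upstream (such as a recoded category or a column switching from integers to strings) fails the test instead of silently reaching the model.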

Development

  • Ability to prepare data for ML, e.g. knowing which algorithms need scaled data and how to split the data into training, validation and test sets

  • ML experience: at minimum, scikit-learn (sklearn) and standard algorithms such as k-nearest neighbours (KNN) and decision trees/random forests

  • Preferably some experience with neural networks, e.g. recurrent (RNN) and convolutional (CNN) networks

  • Experience of natural language processing (NLP) if the model draws on text reports

  • Ability to test models and troubleshoot issues

  • Ability to assess model performance under a range of conditions

 

Knowledge of all algorithms is not required, but ML experience is essential. Some of the above can be learnt on the job if the person is given time, space and guidance. 
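A minimal sketch of two of the data-preparation steps listed for this stage, written in plain Python so that it has no dependencies: a shuffled split into training, validation and test sets, and standardisation (z-scoring) of a numeric feature using statistics computed on the training set only, so that no information leaks from the validation or test sets. The function names, the 70/15/15 proportions and the fixed random seed are arbitrary choices for illustration. In scikit-learn the equivalent steps are usually done with `train_test_split` and `StandardScaler`, and in practice the split is applied to whole rows (features and label together).

```python
import random
import statistics
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_three_ways(
    items: Sequence[T], train: float = 0.7, val: float = 0.15, seed: int = 42
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle, then split into training, validation and test sets.

    The test set receives whatever remains after the training and
    validation shares are taken.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * val)
    return (
        shuffled[:n_train],
        shuffled[n_train : n_train + n_val],
        shuffled[n_train + n_val :],
    )


def fit_standardiser(train_values: Sequence[float]) -> Tuple[float, float]:
    """Compute mean and (population) standard deviation on training data only."""
    mean = statistics.fmean(train_values)
    std = statistics.pstdev(train_values, mu=mean)
    if std == 0:
        std = 1.0  # constant feature: avoid division by zero
    return mean, std


def standardise(values: Sequence[float], mean: float, std: float) -> List[float]:
    """Apply z-scoring with previously fitted statistics."""
    return [(v - mean) / std for v in values]


data = [float(x) for x in range(100)]  # stand-in for one numeric feature
train, val, test = split_three_ways(data)
print(len(train), len(val), len(test))  # 70 15 15

mean, std = fit_standardiser(train)
train_z = standardise(train, mean, std)
val_z = standardise(val, mean, std)    # reuse training statistics
test_z = standardise(test, mean, std)  # never refit on validation or test data
```

Fitting the scaler on the full dataset before splitting is a common mistake: it lets the validation and test sets influence the training features and makes performance estimates optimistic.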

Implementation

  • Ability to run the model in production (this requires good programming skills)  

  • Ability to test and maintain the model once it's in production  

  • Ability to perform unit testing and statistical testing to monitor performance in real time 
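As one example of the statistical testing mentioned above, the sketch below compares the model's accuracy on recent production cases (once their outcomes are known) against its accuracy on the held-out test set, using a two-sided two-proportion z-test. The function names, the choice of accuracy as the metric, the significance level of 0.01 and the counts in the usage lines are assumptions for illustration; other metrics or tests, such as monitoring for drift in the input data, may suit a given model better.

```python
import math
from typing import Tuple


def two_proportion_z_test(
    successes_a: int, n_a: int, successes_b: int, n_b: int
) -> Tuple[float, float]:
    """Two-sided z-test for a difference between two proportions.

    Returns (z, p_value); z is positive when proportion b exceeds a.
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # both proportions are 0 or both are 1
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


def accuracy_has_degraded(
    baseline_correct: int,
    baseline_n: int,
    recent_correct: int,
    recent_n: int,
    alpha: float = 0.01,
) -> bool:
    """Flag a statistically significant drop in accuracy versus the baseline."""
    z, p_value = two_proportion_z_test(
        baseline_correct, baseline_n, recent_correct, recent_n
    )
    return z < 0 and p_value < alpha


# Hypothetical counts: correct predictions on the held-out test set versus
# on recent production cases whose outcomes are now known.
print(accuracy_has_degraded(850, 1000, 845, 1000))  # False
print(accuracy_has_degraded(850, 1000, 780, 1000))  # True
```

Because true outcomes often arrive with a delay, a check like this is usually run on a schedule over a rolling window of resolved cases rather than on each prediction as it is made.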
