Modelling 3: Understanding the model

It is very easy to produce models by running data through a set of algorithms and picking those that perform best on selected criteria. However, pattern recognition algorithms tend to find the easiest route to optimal performance, which is not always the route you intended. It is therefore important to understand whether the resulting models are:

leveraging all the information given, or relying lazily on particular aspects of the data.
biased in a way that can affect a particular subset of the test data, whether that subset is defined by people, locations, or incidents.
failing to address particularly tough, rare and very important cases, and instead relying on easy wins on more numerous instances.
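As a minimal sketch of the last point, the snippet below compares overall accuracy with per-class recall. The labels, the class names and the "always predict the majority class" model are all hypothetical, chosen only to show how a strong headline score can hide a complete failure on rare cases.

```python
from collections import Counter

def per_class_recall(y_true, y_pred):
    """Return {label: recall} for every label present in y_true."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}

# Hypothetical development set: 95 routine cases, 5 rare but critical ones.
y_true = ["routine"] * 95 + ["critical"] * 5
# Hypothetical lazy model that always predicts the majority class.
y_pred = ["routine"] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall = per_class_recall(y_true, y_pred)

print(f"overall accuracy: {accuracy:.2f}")
for label, value in recall.items():
    print(f"recall[{label}]: {value:.2f}")
```

Reporting a per-class breakdown alongside the aggregate score is one simple way to notice when easy wins on numerous instances are masking the cases that matter most.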

By analysing model performance on the development dataset, you can learn how the model behaves and how patterns in your data affect its performance.

You should establish parameters for realistic performance, for example by checking how well people do at your task.
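One way to do this, sketched below, is to score a few human annotators against the same reference labels as the model and use their accuracy as a yardstick. The labels, the two annotators and the model predictions are invented for illustration, and plain accuracy is an assumption; use whichever metric suits your task.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical reference labels for ten development examples.
gold = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
# Hypothetical labels from two people doing the same task by hand.
annotator_a = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]
annotator_b = [1, 0, 1, 1, 0, 1, 1, 0, 0, 1]
# Hypothetical model predictions on the same examples.
model = [0, 0, 1, 1, 0, 0, 1, 0, 1, 1]

human_scores = [accuracy(gold, a) for a in (annotator_a, annotator_b)]
human_baseline = sum(human_scores) / len(human_scores)
model_score = accuracy(gold, model)

print(f"human baseline: {human_baseline:.2f}")
print(f"model:          {model_score:.2f}")
```

If people only agree with the reference labels some of the time, a model score far above that level deserves a closer look before it is trusted.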

Ablation techniques, interpretability packages, model visualisation, probative testing, and manual inspection of the errors can all help. Some examples include:

About explainability: Picking an explainability technique | by Divya Gopinath | Towards Data Science
Attention layer visualisation: Explainable AI: Visualizing Attention in Transformers - Comet
Explainability tools: Building Trust in your ML Models — Explainability | by Matt Maufe | Filament-Syfter | Medium Explainable AI, LIME & SHAP for Model Interpretability | Unlocking AI's Decision-Making | DataCamp
Explainability in time series: [2104.00950] Explainable Artificial Intelligence (XAI) on TimeSeries Data: A Survey
In geospatial: The challenges of integrating explainable artificial intelligence into GeoAI - Xing - 2023 - Transactions in GIS - Wiley Online Library
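To make the idea of ablation concrete, here is a minimal sketch of feature ablation: each feature in turn is replaced by its mean value and the resulting drop in accuracy is recorded. The `predict` function is a hypothetical stand-in for a trained model, and the tiny dataset is invented. Replacing a feature with its mean is only one of several ablation strategies.

```python
from statistics import mean

def predict(row):
    # Hypothetical stand-in for a trained model: it only looks at feature 0.
    return 1 if row[0] > 0.5 else 0

def accuracy(rows, labels):
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(labels)

def ablation_drops(rows, labels):
    """Accuracy lost when each feature is replaced by its mean value."""
    baseline = accuracy(rows, labels)
    drops = {}
    for j in range(len(rows[0])):
        fill = mean(r[j] for r in rows)
        ablated = [r[:j] + [fill] + r[j + 1:] for r in rows]
        drops[j] = baseline - accuracy(ablated, labels)
    return drops

# Hypothetical data in which both features carry signal about the label.
rows = [[0.9, 0.8], [0.8, 0.7], [0.7, 0.9], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2]]
labels = [1, 1, 1, 0, 0, 0]

drops = ablation_drops(rows, labels)
for feature, drop in drops.items():
    print(f"feature {feature}: accuracy drop {drop:.2f}")
```

A drop of zero for a feature that you know to be informative suggests the model is ignoring it and relying on other aspects of the data.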

It is important to understand the behaviour of your model with respect to protected characteristics such as gender, race, and socioeconomic status. Any differences in behaviour across these groups must be recorded, their sources and reasons analysed, and then either corrected if possible or justified if not. If the algorithm is to be used in conjunction with human decision making, these issues need to be clearly delineated and should inform how the model is deployed.
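A simple starting point, sketched below, is to compute an error metric separately for each group and record the gap. The group names, labels and predictions are hypothetical, and false positive rate is used only as an example; which metric matters depends on how the model's decisions affect people.

```python
from collections import defaultdict

def false_positive_rate_by_group(y_true, y_pred, groups):
    """Return {group: false positive rate} over the true-negative examples."""
    negatives = defaultdict(int)
    false_pos = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        if t == 0:
            negatives[g] += 1
            if p == 1:
                false_pos[g] += 1
    return {g: false_pos[g] / negatives[g] for g in negatives}

# Hypothetical development data: six examples from each of two groups.
groups = ["A"] * 6 + ["B"] * 6
y_true = [0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0]

fpr = false_positive_rate_by_group(y_true, y_pred, groups)
gap = max(fpr.values()) - min(fpr.values())

for group, rate in fpr.items():
    print(f"group {group}: false positive rate {rate:.2f}")
print(f"gap between groups: {gap:.2f}")
```

Recording figures like these for each release gives you a concrete basis for the analysis and justification described above.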