Google's NotebookLM has been around for about a year, but its new podcast function is making waves.
Here we experiment with some ONS data, although there's no reason to think that spreadsheets are what this tool was designed for.
I have been looking at this ONS domestic abuse (DA) data for a while now. It contains the yearly statistics on Domestic Abuse and the Criminal Justice System (CJS). I keep slicing and dicing it with my eyes, and not really getting anywhere. There's just something about being faced with a 26-page Excel file that can turn one's brain into mush.
So I've been looking at ways of interrogating this data with different LLMs, and I was drawn to NotebookLM because of the autogenerated podcast feature. It also generates a chatbot powered by retrieval-augmented generation (RAG). This means that instead of answering from general internet-based knowledge, the chatbot answers questions using the documents you upload.
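To make the RAG idea concrete, here is a toy sketch of the retrieval step. It is not how NotebookLM works internally (that isn't documented here); it simply scores chunks of text by word overlap with the question and pastes the best ones into a prompt. The function names and the chunk texts are made-up stand-ins, and a real system would typically use embeddings rather than word counts.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(question, chunk):
    """Count how many question words also appear in the chunk."""
    q = Counter(tokenize(question))
    c = Counter(tokenize(chunk))
    return sum(min(q[w], c[w]) for w in q)


def retrieve(question, chunks, k=2):
    """Return the k chunks that overlap most with the question."""
    ranked = sorted(chunks, key=lambda ch: score(question, ch), reverse=True)
    return ranked[:k]


def build_prompt(question, chunks, k=2):
    """Number the retrieved chunks so an answer can cite them as [1], [2]."""
    sources = retrieve(question, chunks, k)
    context = "\n".join(f"[{i}] {ch}" for i, ch in enumerate(sources, start=1))
    return f"Answer using only these sources:\n{context}\n\nQuestion: {question}"


# Placeholder chunks, NOT text from the real ONS file.
chunks = [
    "Notes for table 9: how charged or summonsed is defined.",
    "Table 9: percentage charged or summonsed by police force.",
    "Table 8: a different table with its own percentages.",
]
question = "Which police force has the highest percentage charged or summonsed?"
top = retrieve(question, chunks, k=2)
print(build_prompt(question, chunks, k=2))
```

The numbered sources in the prompt play the same role as the little reference bubbles in the chat: they let you trace an answer back to the chunk it came from.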
The many posts (blogs, YouTube videos, etc.) are not exaggerating how easy it is to get going with this. If you have a Google account, you just go to https://notebooklm.google.com and upload the documents you are interested in. There's a tiny hitch: the ONS data is a spreadsheet, and NotebookLM allows only PDF, text, or audio files.
No problem! We can just use Excel, Numbers, or Google Sheets to export the document to PDF. Or you can grab my version.
Once you upload, you can start chatting with the data. It gives you some canned but quite sensible suggestions, for example:
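Since plain text is also an accepted format, another option is to flatten each sheet into labelled text lines, so every value carries its table and column name with it. The sketch below is untested against NotebookLM itself; it assumes a sheet has been saved as CSV with a header row, and the column names and values are hypothetical placeholders, not real ONS figures.

```python
import csv
import io


def rows_to_labelled_text(csv_text, table_name):
    """Turn each CSV row into one self-describing line of text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        fields = "; ".join(f"{col}: {val}" for col, val in row.items())
        lines.append(f"{table_name} | {fields}")
    return "\n".join(lines)


# Hypothetical sample rows, not real ONS figures.
sample_csv = (
    "Police force,Percentage charged or summonsed\n"
    "Force A,12.5\n"
    "Force B,8.0\n"
)
text = rows_to_labelled_text(sample_csv, "Table 9")
print(text)
```

Each output line repeats the table name and column headers, so a value quoted out of context still says where it came from.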
The little bubbles are references to the original data sources, but are difficult to interpret in this case because the tabular data has lost its formatting. In the above examples the 1 refers to the notes for table 9, while 2 links to table 9:
As much as I want to love this right out of the box, I don't think this interpretation of the above numbers is quite correct:
There has been a general upward trend in the number of domestic abuse-related crimes referred to the CPS for charging decisions between 2015 and 2020, although this has decreased in recent years.
There's so much to unpack in that large paragraph of an answer that it would take me a while to fact-check, so it was easier just to ask a more direct question about a particular column in the table.
Can you please rank the police forces from best to worst with regards to Percentage Charged or summonsed?
While it picks out the right table, the numbers are fairly random, e.g. 10.7% occurs only in table 8.
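A check like this is easy to do by hand in code. The sketch below ranks forces by a percentage column and tests whether a figure quoted by the chatbot appears in that column at all. The rows are hypothetical placeholders rather than the real table 9 values, the key names are made up, and treating a higher percentage as "best" is an assumption.

```python
def rank_forces(rows, key="pct_charged"):
    """Sort rows from highest to lowest percentage (assumes higher is 'best')."""
    return sorted(rows, key=lambda r: r[key], reverse=True)


def value_appears(rows, value, key="pct_charged", tol=0.05):
    """Check whether a quoted figure matches any row, allowing for rounding."""
    return any(abs(r[key] - value) < tol for r in rows)


# Placeholder rows: swap in the real table 9 values before drawing conclusions.
table9 = [
    {"force": "Force A", "pct_charged": 9.1},
    {"force": "Force B", "pct_charged": 14.3},
    {"force": "Force C", "pct_charged": 6.8},
]

ranking = [r["force"] for r in rank_forces(table9)]
print(ranking)

# Does a figure the chatbot quoted actually occur in this column?
print(value_appears(table9, 10.7))
```

If a quoted number fails the second check, the chatbot has either pulled it from another table or made it up.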
Right, but we came here for the podcast function. So... can we get something real out of that, even if the numbers in the chat are not working for us? Have a listen and let us know what you think. Pay attention to the numbers!