The intent of this project was to record which podcasts I listen to so I could create some fun charts. I had previously used a Tasker script on my Android phone to record my listening, but it wasn't very reliable. Around this time, Pocket Casts released a Listening History feature in their mobile and web apps. Their official API documentation did not cover this new feature, so I used the documentation for the other API features to figure out how to access it.
Read on for instructions on how you can access the API yourself, as well as how I use the Listening History to infer when I listened to each podcast (it's not as straightforward as you would think).
- Copy get_history.py to your files
- Add your Pocket Casts token and history filepath to get_history.py.
- Follow the instructions here to get your credentials.json and token.pickle files.
- Copy the credentials.json and token.pickle files to your files.
- Add your Google spreadsheet ID to parse_history.py.
- Set get_history.py to run every hour (or whatever frequency you wish to measure your podcast listening)
- Use parse_hist.py to parse the history once you have a few days recorded
Since Pocket Casts' Listening History feature does not expose the actual date or time you listened to an episode, I use the date and time that the script runs to guess when you listened. If the script fails to run, you will only know that you listened to the episode at some point after the last successful recording and before the next one. There is no way around this. Because of this, I use the library "Notify" to alert myself if my script fails so I can (hopefully) fix it within the day.
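As a rough illustration of recording the run time alongside each pull, here is a minimal sketch. It is not the code in get_history.py; the function names, the `recorded_at` and `episodes` keys, and the JSON Lines file layout are all assumptions made for this example. It takes the list of episodes already fetched from the API and appends it to a file together with the time the script ran.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_snapshot(history_path, episodes, recorded_at=None):
    """Append one timestamped snapshot of the listening history as a JSON line.

    Hypothetical helper: the file layout is an assumption, not the project's format.
    """
    if recorded_at is None:
        recorded_at = datetime.now(timezone.utc)
    snapshot = {
        # The run time is the only timing signal available, so store it explicitly.
        "recorded_at": recorded_at.isoformat(),
        "episodes": episodes,
    }
    with Path(history_path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(snapshot) + "\n")
    return snapshot


def load_snapshots(history_path):
    """Read every snapshot back in the order it was written (oldest first)."""
    with Path(history_path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Appending one line per run keeps every recording, so a missed run shows up as a visible gap between two `recorded_at` values.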
Here is all the information that is returned by the Pocket Casts' Listening History API about one episode you listened to.
{
"uuid": "e1473ac4-c23a-4076-bef2-394851f6c656",
"url": "https://anchor.fm/s/18480338/podcast/play/15507406/https%3A%2F%2Fd3ctxlq1ktw2nl.cloudfront.net%2Fproduction%2F2020-5-21%2F84160236-44100-2-4d2e178fed4ea.mp3",
"published": "2020-06-22T05:00:00Z",
"duration": 4775,
"fileType": "audio/mp3",
"title": "Fans Bring the Weapons - Extreme Championship Wrestling",
"size": "152807471",
"playingStatus": 2,
"playedUpTo": 2113,
"starred": false,
"podcastUuid": "7395c3c0-5277-0138-97a4-0acc26574db2",
"podcastTitle": "Legends of Philadelphia",
"episodeType": "full",
"episodeSeason": 1,
"episodeNumber": 13,
"isDeleted": false
}
Note that this does not include the date or time that you listened to the podcast, only the date and time that the episode was published.
The web player also does not show this information, so I think it is safe to assume that it is not available via the API.
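What the response does give you is how far into the episode you are. A small sketch, assuming that `duration` and `playedUpTo` are both in seconds (the sample values are consistent with that, but the API does not say so); `listening_progress` is a hypothetical helper, not part of the project:

```python
def listening_progress(episode):
    """Return the fraction of the episode played so far, between 0.0 and 1.0."""
    duration = episode.get("duration") or 0
    if duration <= 0:
        return 0.0  # duration missing or zero: progress cannot be computed
    return min(episode.get("playedUpTo", 0) / duration, 1.0)


# Values taken from the sample episode above.
episode = {"duration": 4775, "playedUpTo": 2113}
progress = listening_progress(episode)
print(f"{progress:.0%} played")
```

For the sample episode this works out to roughly 44% played.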
Thankfully, the array that is returned by the API is in chronological order. Using this order, and by pulling this listening history every hour of the day, I can determine which hour of the day I finished listening to an episode. However, if I finish a 2.5 hour long episode at 1:13 pm, my script will tell me that I listened to 2.5 hours of episodes at 2:00 pm.
I could correct this by telling my script that I listened to 0.5 hours at 12:00 pm, 1 hour at 1:00 pm, and 1 hour at 2:00 pm, but this is still not very accurate. I prefer to record the data every hour of the day and then display it on a graph by the day, which reduces the inaccuracy of the data.
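One possible way to roll hourly recordings up into daily totals is to diff `playedUpTo` between consecutive recordings and credit the difference to the day of the later recording. This is a sketch of that idea, not necessarily what parse_history.py does. It assumes each recording is a dict with a `recorded_at` ISO timestamp and an `episodes` list (both names are assumptions), and that `playedUpTo` is in seconds.

```python
from collections import defaultdict
from datetime import datetime


def seconds_listened_between(previous, current):
    """Sum the increase in playedUpTo per episode between two recordings."""
    before = {e["uuid"]: e.get("playedUpTo", 0) for e in previous}
    total = 0
    for episode in current:
        delta = episode.get("playedUpTo", 0) - before.get(episode["uuid"], 0)
        total += max(delta, 0)  # a rewind lowers playedUpTo; do not count it
    return total


def listening_by_day(snapshots):
    """Credit each recording's new listening to the day it was recorded."""
    totals = defaultdict(int)
    for previous, current in zip(snapshots, snapshots[1:]):
        day = datetime.fromisoformat(current["recorded_at"]).date().isoformat()
        totals[day] += seconds_listened_between(
            previous["episodes"], current["episodes"]
        )
    return dict(totals)
```

Listening that straddles midnight is still credited to the day of the later recording, so daily totals can be off by up to one recording interval.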
It's also important to note that if I listen to an episode twice, Pocket Casts purges the old listen and replaces it with the new one.
When a podcast moves in the list from one data recording to another, it's almost always going to be a continuation of the old listen. In fact, almost every time you listen to an episode that's over an hour your data recordings will show a continued listen. Because it's so common, I've built the script to always assume that you're continuing your listening session when it sees data like this.
For now, I have decided to let re-listens overwrite old listens because the risk of messing up all the continuing listens is too great. However, I have some ideas for how I would factor in re-listens.
if days_since_last_listen < X:
    you're continuing to listen to this episode
else:  (days_since_last_listen >= X)
    assume you finished listening last time and you're listening to the episode again now
This could fail in two ways:
- if you decided to go back a minute or two during your listening session at the top of the hour (when your script was taking recordings)
- if the podcast is less than an hour long, you might finish the episode before the hourly data recording can catch that your playedUpTo value was less
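The time-threshold idea above could look something like this in Python. It is only a sketch: `classify_listen` is a hypothetical helper, and the default of 7 days is a placeholder for X, which I have not settled on.

```python
from datetime import datetime, timedelta


def classify_listen(last_seen, now, threshold_days=7):
    """Label a listen as a continuation or a re-listen based on elapsed time.

    threshold_days stands in for X; 7 is an arbitrary placeholder.
    """
    if now - last_seen < timedelta(days=threshold_days):
        return "continuation"
    # At or beyond the threshold, assume the earlier listen was finished.
    return "re-listen"


print(classify_listen(datetime(2020, 1, 1), datetime(2020, 1, 2)))
print(classify_listen(datetime(2020, 1, 1), datetime(2020, 1, 20)))
```

A shorter threshold catches more re-listens but mislabels long episodes that are picked up again after a break.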
If you want to improve on this code and try to figure out how to make it better, I've added an example history array that you can use to test whether your code properly determines re-listening.
It shows that the episode "uuid":3 was 50% done on Jan 1st and then it moved up to 100% done on the date of the next recording, Jan 2nd. The episode "uuid":2 was played to 100% on Jan 1st, and then it was re-listened on Jan 3rd and played to 100% again. The only way that I can tell it was re-listened to is that it moved in the order.
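To make that scenario concrete, here is a small sketch that flags an episode as a re-listen when it moves in the order even though it was already fully played. The data below is a hypothetical reconstruction of the scenario just described, not the example array shipped with the project. It also assumes the most recently played episode comes first in the list, and that a finished episode keeps `playedUpTo` equal to `duration`; neither is confirmed by the API.

```python
def moved_up(previous, current):
    """Return uuids that moved toward the front of the list between recordings."""
    common = {e["uuid"] for e in previous} & {e["uuid"] for e in current}
    old_order = [e["uuid"] for e in previous if e["uuid"] in common]
    new_order = [e["uuid"] for e in current if e["uuid"] in common]
    old_rank = {uuid: i for i, uuid in enumerate(old_order)}
    new_rank = {uuid: i for i, uuid in enumerate(new_order)}
    return [uuid for uuid in new_order if new_rank[uuid] < old_rank[uuid]]


def find_relistens(previous, current):
    """Flag episodes that moved up although they were already fully played."""
    before = {e["uuid"]: e for e in previous}
    return [
        uuid
        for uuid in moved_up(previous, current)
        if before[uuid]["playedUpTo"] >= before[uuid]["duration"]
    ]


def ep(uuid, played):
    """Build a minimal hypothetical episode record with a 100-second duration."""
    return {"uuid": uuid, "duration": 100, "playedUpTo": played}


jan_1 = [ep(3, 50), ep(2, 100), ep(1, 100)]
jan_2 = [ep(3, 100), ep(2, 100), ep(1, 100)]  # uuid 3 finished, order unchanged
jan_3 = [ep(2, 100), ep(3, 100), ep(1, 100)]  # uuid 2 moved to the front

print(find_relistens(jan_1, jan_2))
print(find_relistens(jan_2, jan_3))
```

The first comparison finds nothing because uuid 3 simply continued, while the second flags uuid 2 because it changed position without any room left to progress.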
- create an authorization script that can run on a failed status_code
- add export to csv option
- add notify and notify set-up instructions
I originally created these files using Google Apps Script. I had no trouble converting get_history to Python, but my authorization file is giving me errors. I've included my original Google Apps Script authorization file for reference. My Python authorization file attempts to do the same thing, but returns a 500 Internal Server Error.
## Investigate Accuracy of Timing
Taking a look at my records, I think it's possible that the listening history database doesn't update as frequently as I would like. I've never seen it out of date on the website, but, for example, a podcast that I listened to sometimes shows up two or three hours after I think I listened to it. I'm going to investigate to see if the data is delayed and by how much. If it is delayed, it may be because it only pulls the API data every X hours unless the user specifically requests it, and maybe the request is different from the API request for data.