Tennis Tracker (alpha): Data Release

The Tennis Tracker is at the stage where feedback is necessary to move forward. In this post, I'll write a technical summary of the data, then at the end I'll describe a few ways in which you can use the data. The output contains three datasets. The point-level set provides descriptive statistics for a given point. Most of this set is taken from tennis24. The event-level set provides each (predicted) event in the tennis match, along with descriptive information like the time of the event, the type of event, the location of the ball, etc. The frame-level set provides the location of each player throughout the match at each frame.

I chose the 2022 US Open women's semi-final between Iga Swiatek and Aryna Sabalenka as the first release for no particular reason other than that it was available and this year's US Open starts soon. I suppose you can consider this post a form of "marketing", because if it's successful and people (that's you!) pull interesting information from the data, I plan to scale the program to map out entire tournaments.

Links to data
Frame-level data: https://drive.google.com/file/d/1_z_og5IunLnr5mN-biNcOGC970TXyirJ/view?usp=sharing
Event-level data: https://drive.google.com/file/d/1IFjteOqwf-Gr0EqNFqKEmiy3hAWaFXGg/view?usp=sharing
Point-level data: https://drive.google.com/file/d/1F69wCb3sezmVESG8qgeb5HM00bfd2Ccd/view?usp=sharing

Data dictionaries
Frame-level set
frame: frame number of match.
front_player_x: x-coordinate of the front player in a given frame.
front_player_y: y-coordinate of the front player in a given frame.
back_player_x: x-coordinate of the back player in a given frame.
back_player_y: y-coordinate of the back player in a given frame.

Event-level set
frame: frame number of match.
event: name of event (serve, hit, bounce, net).
ball_x: x-coordinate of the tennis ball during an event.
ball_y: y-coordinate of the tennis ball during an event.
actor: player or location of event (front, back).
state: whether ball is in play (in) or not (out); only applicable for bounces.
point: point number of match.

Point-level set
point: point number of match.
serve: player holding the serve (home, away).
serve_side: side of serve for point (ad, deuce).
serve_location: end of the court the serve is hit from for point (front, back).
home_point_score: number of points won in game by home player.
away_point_score: number of points won in game by away player.
home_game_score: number of games won in set by home player.
away_game_score: number of games won in set by away player.
home_set_score: number of sets won in match by home player.
away_set_score: number of sets won in match by away player.
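To give a feel for how the sets fit together, here is a minimal sketch that attaches both players' positions to each event by matching on frame. It assumes the files are CSVs with exactly the column names listed above (I haven't stated the file format here, so check the downloads first) and that frame counts broadcast frames at 30fps. The sample rows are invented placeholders, not values from the match, and the helper names are hypothetical.

```python
import csv
import io

FPS = 30  # assumption: `frame` counts broadcast frames at 30fps

# Invented placeholder rows in the documented column layout (NOT real match data).
FRAMES_CSV = """frame,front_player_x,front_player_y,back_player_x,back_player_y
100,320,940,340,140
101,322,938,341,141
"""
EVENTS_CSV = """frame,event,ball_x,ball_y,actor,state,point
100,serve,330,935,front,,1
101,bounce,300,400,back,in,1
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one per row (all values stay strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def attach_player_positions(events, frames):
    """Add both players' coordinates and a timestamp to each event, matched on `frame`."""
    by_frame = {int(row["frame"]): row for row in frames}
    position_keys = ("front_player_x", "front_player_y", "back_player_x", "back_player_y")
    joined = []
    for ev in events:
        frame = int(ev["frame"])
        pos = by_frame.get(frame, {})
        merged = dict(ev)
        for key in position_keys:
            merged[key] = pos.get(key)  # None when the frame has no position row
        merged["seconds"] = frame / FPS  # time since frame 0, under the FPS assumption
        joined.append(merged)
    return joined


joined = attach_player_positions(read_rows(EVENTS_CSV), read_rows(FRAMES_CSV))
```

For the real files, you would replace the in-memory strings with open(...) on the downloaded data; the same idea works for joining the point-level set to events on point.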

Runtime Specifications
broadcast time: 132 minutes
processing time: 129 minutes
processing tools: Colab T4 GPU, personal MacBook Pro

About the data
The Tennis Tracker relies on a single broadcast feed at 720p and 30fps. This is decidedly not Hawk-Eye. Here's the thing: it doesn't have to be Hawk-Eye for it to be valuable. My goal in building this program was to contribute to tennis analytics by supplying data that is more detailed than what's currently on offer. I believe I am close to succeeding. That being said, there are still some measures I wouldn't recommend using without first checking whether they are stable enough for your needs. These include ball speed (not stable enough yet) and ace counts/locations, among others (I'll add to this as I work through them).

One measure I am very interested in seeing, and potentially deriving, is "return probability". As more matches are tracked, the output from the Tennis Tracker could provide the foundation for a general probability-of-return statistic, where each shot has a probability of return attributed to it. From this, we can observe which players tend to return difficult balls, which players make the most mistakes (beyond "unforced errors"), and a plethora of other metrics.
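As a starting point, here is a rough sketch of how the raw labels for such a statistic could be pulled from the event-level set. It is a naive empirical proxy, not a finished method: a shot counts as "returned" if the next serve or hit in the same point comes from the other actor. It assumes events are sorted by frame and that actor on a serve or hit is the player striking the ball. The function names and the sample rally are made up for illustration.

```python
from collections import defaultdict


def label_returns(events):
    """Mark each serve/hit as returned if the opponent strikes next within the same point.

    Assumes `events` is already sorted by frame.
    """
    shots = [e for e in events if e["event"] in ("serve", "hit")]
    labelled = []
    for current, following in zip(shots, shots[1:] + [None]):
        returned = (
            following is not None
            and following["point"] == current["point"]
            and following["actor"] != current["actor"]
        )
        labelled.append({**current, "returned": returned})
    return labelled


def return_rate_by_actor(labelled):
    """Share of each actor's shots that the opponent got back."""
    counts = defaultdict(lambda: [0, 0])  # actor -> [returned, total]
    for shot in labelled:
        counts[shot["actor"]][0] += int(shot["returned"])
        counts[shot["actor"]][1] += 1
    return {actor: returned / total for actor, (returned, total) in counts.items()}


# Invented two-point sample (NOT real match data).
events = [
    {"point": 1, "event": "serve", "actor": "front"},
    {"point": 1, "event": "bounce", "actor": "back"},
    {"point": 1, "event": "hit", "actor": "back"},
    {"point": 1, "event": "bounce", "actor": "front"},
    {"point": 2, "event": "serve", "actor": "front"},
]
labelled = label_returns(events)
rates = return_rate_by_actor(labelled)
```

A real probability-of-return model would go further and fit these labels against features such as ball location and player position, so that each individual shot gets its own probability rather than a player-wide average.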

Court Coordinates
The full court coordinates (doubles) are: x1 = 150, x2 = 510, y1 = 150, y2 = 930. Combining this with the standard court dimensions, you can get all the lines you need.
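Here is a small sketch of that calculation. It assumes the coordinates form a flat, top-down court with x running across the width and y along the length, and it uses the standard dimensions in feet (36 wide by 78 long for doubles, 27 wide for singles, service lines 21 from the net). The corner values work out to 10 units per foot in both directions, which fits that assumption. The function and key names are hypothetical.

```python
# Doubles-court corners given above
X1, X2, Y1, Y2 = 150, 510, 150, 930

# Standard court dimensions in feet
COURT_WIDTH_FT = 36.0           # doubles width
COURT_LENGTH_FT = 78.0          # baseline to baseline
SINGLES_WIDTH_FT = 27.0
SERVICE_LINE_FROM_NET_FT = 21.0


def court_lines():
    """Derive the main court lines in tracker coordinates from the doubles corners."""
    units_per_ft_x = (X2 - X1) / COURT_WIDTH_FT
    units_per_ft_y = (Y2 - Y1) / COURT_LENGTH_FT

    # Each doubles alley is half the difference between doubles and singles width.
    alley = (COURT_WIDTH_FT - SINGLES_WIDTH_FT) / 2 * units_per_ft_x
    net_y = (Y1 + Y2) / 2
    service_offset = SERVICE_LINE_FROM_NET_FT * units_per_ft_y

    return {
        "doubles_sidelines_x": (float(X1), float(X2)),
        "singles_sidelines_x": (X1 + alley, X2 - alley),
        "baselines_y": (float(Y1), float(Y2)),
        "net_y": net_y,
        "service_lines_y": (net_y - service_offset, net_y + service_offset),
        "center_service_line_x": (X1 + X2) / 2,
    }


lines = court_lines()
```

Under these assumptions the singles sidelines land at x = 195 and x = 465, the net at y = 540, the service lines at y = 330 and y = 750, and the center service line at x = 330.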