The hockey data that I started exploring in Part 1 of this series is just the tip of the iceberg. I have so many questions that standard hockey statistics reporting cannot answer for me.

I’m interested not only in the current wins and losses for a team or a player, but also in the history that brought them to their current state. Can I see any interesting patterns? Which characteristics about teams or players matter more than others? Is this possible to identify?

More data!

To answer these and other questions, I need to put together the data a different way. I need more data.

Even if your job doesn’t require you to analyze hockey data, there are probably plenty of questions that you could answer if only you had the data and knew what to do with it.

My goal through this series of posts is to use hockey data as a backdrop for a discussion of technologies, concepts, and techniques for data analytics that you can apply to the types of data that are more applicable to you.

The technology du jour is Power BI again, because I’ve only just begun building up my data. It’s a great tool for exploring new data sources quickly and easily before committing myself to a full-blown enterprise-scale solution if that were my long-term goal.

In today’s post, I need to get more detail into the Power BI file that I started in my last post. Currently, I have general team data and high-level game statistics, but now I want to add in player data as well as their statistics. To do this, I need to generate dynamic queries to retrieve player and player statistics.

The key to success is setting up functions and parameters in the Power BI queries… Follow along to learn how!

Getting the Player Data

The structure of the data available in the NHL API (which I described in my last post) requires me to construct a query first to get all the players on a team. For example, I know that the Team ID for the Vegas Golden Knights is 54, so the endpoint to get the team roster need to look like this:

https://statsapi.web.nhl.com/api/v1/teams/54/roster

That query returns an array of person objects. Here’s an example of a person:

{ “person” : { “id” : 8476448, “fullName” : “William Karlsson”, “link” : “/api/v1/people/8476448” }, “jerseyNumber” : “71”, “position” : { “code” : “C”, “name” : “Center”, “type” : “Forward”, “abbreviation” : “C” } }

I need to dynamically construct the endpoint to be able to get players for all the teams. But this information is only some of the information available about a player. To get other details, I need to use the use the player id link like this:

https://statsapi.web.nhl.com/api/v1/people/8476448

{ “id” : 8476448, “fullName” : “William Karlsson”, “link” : “/api/v1/people/8476448”, “firstName” : “William”, “lastName” : “Karlsson”, “primaryNumber” : “71”, “birthDate” : “1993-01-08”, “currentAge” : 25, “birthCity” : “Marsta”, “birthCountry” : “SWE”, “nationality” : “SWE”, “height” : “6′ 1\””, “weight” : 189, “active” : true, “alternateCaptain” : false, “captain” : false, “rookie” : false, “shootsCatches” : “L”, “rosterStatus” : “Y”, “currentTeam” : { “id” : 54, “name” : “Vegas Golden Knights”, “link” : “/api/v1/teams/54” }, “primaryPosition” : { “code” : “C”, “name” : “Center”, “type” : “Forward”, “abbreviation” : “C” } }

The data from the api/v1/people query has the level of detail per person that I want in a table of players. To get all players from all teams, I need to dynamically generate all these links. Fortunately, I’ve dealt with this type of problem before. It’s just a variation on a technique I described recently regarding dynamic JSON requests.

In general, I start with a query to get the teams which I can loop through to get the rosters which in turn I can loop through to get the players. To do this, I do the following:

Want to see the details?

Step 1: Create the getRoster function.

Step 2: Create the getPlayer function.

Step 3: Create the players table.

Getting Player Statistics

My last data acquisition step is to get statistics data for each player. I just need to build a function to dynamically get data by team like I did above using this endpoint as my base structure:

http://statsapi.web.nhl.com/api/v1/teams/54?hydrate=roster(person(stats(splits=yearByYear)))

It turns out there are many different kinds of statistics that I can get in addition to these statistics by season. I’ll probably get them all added into my model eventually, but the process is the same. For a list of other available statistics to use instead of yearByYear, see http://statsapi.web.nhl.com/api/v1/statTypes.

Step 4: Create the getPlayerStatistics function.

Step 5: Create the statistics table.

Viewing Player Statistics

Today I don’t have a particular question to answer. I just want to have an easy way to view information about a player. I’ll enhance my model with more data and my own calculations later. For now, I can create a report to that displays selected information about a player and get basic career statistics. In fact, you can explore this yourself below – just click the arrow under Team Name to switch teams, select a checkbox to choose a player, and use the drop-downs to select one or more leagues or teams to view the player’s career statistics. When you point to the list of players, a scroll bar appears to allow you to find players not currently visible in the list.

If you use the left arrow at the bottom of the page, you can switch to the first page of my report that I created in Part 1 of this series.

You too can embed a Power BI report in a website. But be sure you’re careful about the data you’re exposing! The data in my report is freely available NHL data.

I validated my data against the NHL statistics online. For example, William Karlsson’s data is at https://www.nhl.com/player/william-karlsson-8476448. (Oh look, there’s that player ID I have in my data results!) And so far, so good.

What’s Next?

In future posts, I plan to explain how to use various techniques to explore, enhance, and understand this data. I have enough data now for Part 3 of this series in which I can start performing some basic data exploration to begin to understand what I have gathered thus far. Stay tuned!

Here’s a link to the final PBIX file if you want to take a look at the queries and reports yourself.