{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "### This is an experiment in using the Kotlin kernel for Jupyter \n", "Note: this notebook was updated July 2021 to point to newer versions of its dependencies, which had become deprecated and were not allowing the notebook to complete successfully. It was also used for a presentation of Kotlin's Jupyter kernel in March 2021, so the 2020 season data, which didn't exist at the time the initial article was written, was added." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "// two \"supported\" packages, we can skip the full dependency & import boilerplate\n", "%use lets-plot, krangl " ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
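| Rk | Year | Tms | RshTD | RecTD | PR TD | KR TD | FblTD | IntTD | OthTD | AllTD | 2PM | 2PA | XPM | XPA | FGM | FGA | Sfty | Pts | Pts/G |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2020 | 32 | 532 | 871 | 8 | 7 | 21 | 31 | 3 | 1473 | 63 | 131 | 1244 | 1338 | 812 | 960 | 24 | 12692 | 24.8 |
| 2 | 2019 | 32 | 447 | 797 | 7 | 7 | 34 | 35 | 5 | 1332 | 54 | 113 | 1136 | 1210 | 802 | 983 | 17 | 11676 | 22.8 |
| 3 | 2018 | 32 | 439 | 847 | 7 | 5 | 24 | 45 | 4 | 1371 | 66 | 129 | 1164 | 1235 | 802 | 947 | 10 | 11948 | 23.3 |
| 4 | 2017 | 32 | 380 | 741 | 10 | 7 | 41 | 42 | 4 | 1225 | 37 | 82 | 1066 | 1134 | 866 | 1027 | 15 | 11118 | 21.7 |
| 5 | 2016 | 32 | 443 | 786 | 10 | 7 | 22 | 34 | 4 | 1306 | 51 | 105 | 1119 | 1195 | 850 | 1009 | 20 | 11647 | 22.8 |
| 6 | 2015 | 32 | 365 | 842 | 13 | 7 | 33 | 53 | 5 | 1318 | 45 | 94 | 1146 | 1217 | 834 | 987 | 16 | 11678 | 22.8 |
| 7 | 2014 | 32 | 380 | 807 | 13 | 6 | 28 | 47 | 12 | 1293 | 28 | 58 | 1222 | 1230 | 829 | 987 | 21 | 11565 | 22.6 |
| 8 | 2013 | 32 | 410 | 804 | 13 | 7 | 30 | 65 | 9 | 1338 | 34 | 69 | 1262 | 1267 | 863 | 998 | 20 | 11987 | 23.4 |
| 9 | 2012 | 32 | 401 | 757 | 18 | 13 | 26 | 71 | 11 | 1297 | 29 | 56 | 1229 | 1235 | 852 | 1016 | 13 | 11651 | 22.8 |
| 10 | 2011 | 32 | 400 | 745 | 20 | 9 | 31 | 49 | 5 | 1259 | 24 | 50 | 1200 | 1207 | 838 | 1011 | 21 | 11358 | 22.2 |
| 11 | 2010 | 32 | 399 | 751 | 13 | 23 | 22 | 57 | 5 | 1270 | 26 | 50 | 1203 | 1214 | 794 | 964 | 13 | 11283 | 22.0 |
| 12 | 2009 | 32 | 429 | 710 | 10 | 18 | 25 | 48 | 7 | 1247 | 24 | 59 | 1165 | 1185 | 756 | 930 | 14 | 10991 | 21.5 |
| 13 | 2008 | 32 | 476 | 646 | 16 | 13 | 33 | 52 | 10 | 1246 | 28 | 64 | 1170 | 1176 | 845 | 1000 | 21 | 11279 | 22.0 |
| 14 | 2007 | 32 | 386 | 720 | 17 | 25 | 37 | 52 | 6 | 1243 | 30 | 57 | 1165 | 1177 | 795 | 960 | 18 | 11104 | 21.7 |
| 15 | 2006 | 32 | 424 | 648 | 15 | 9 | 33 | 49 | 3 | 1181 | 21 | 35 | 1124 | 1135 | 767 | 942 | 12 | 10577 | 20.7 |
| 16 | 2005 | 32 | 431 | 644 | 9 | 12 | 23 | 47 | 6 | 1172 | 27 | 47 | 1099 | 1114 | 783 | 967 | 11 | 10556 | 20.6 |
| 17 | 2004 | 32 | 416 | 732 | 11 | 17 | 34 | 53 | 5 | 1268 | 37 | 73 | 1179 | 1189 | 703 | 870 | 15 | 11000 | 21.5 |
| 18 | 2003 | 32 | 427 | 654 | 18 | 13 | 24 | 58 | 4 | 1198 | 29 | 60 | 1110 | 1128 | 756 | 954 | 21 | 10666 | 20.8 |
| 19 | 2002 | 32 | 460 | 694 | 22 | 17 | 26 | 46 | 5 | 1270 | 47 | 81 | 1148 | 1165 | 737 | 951 | 12 | 11097 | 21.7 |
| 20 | 2001 | 31 | 365 | 635 | 12 | 10 | 33 | 59 | 6 | 1120 | 40 | 85 | 1008 | 1027 | 732 | 959 | 10 | 10024 | 20.2 |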

... only showing top 20 rows

" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "// csv is courtesy of pro-football-reference: https://www.pro-football-reference.com/years/NFL/scoring.htm\n", "val dfScoring = DataFrame.readCSV(\"nfl_scoring.csv\")\n", "dfScoring" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note how krangl's `filter` reads like an ordinary Kotlin lambda call. The only difference is that comparisons use the infix functions `lt` and `gt` instead of `<` and `>`.\n", "\n", "Compare that with the indexing syntax pandas needs for a simple range filter: \n", " `df.loc[(df['column_name'] >= A) & (df['column_name'] <= B)]`\n", "\n", "https://stackoverflow.com/questions/17071871/how-do-i-select-rows-from-a-dataframe-based-on-column-values" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[Rk, Year, Tms, RshTD, RecTD, PR TD, KR TD, FblTD, IntTD, OthTD, AllTD, 2PM, 2PA, XPM, XPA, FGM, FGA, Sfty, Pts, Pts/G]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "// DataFrames are cool but so are native Kotlin data structures, like Maps\n", "val mapScoring = dfScoring.filter { (it[\"Year\"] lt 2021) AND (it[\"Year\"] gt 1990) }.toMap()\n", "mapScoring.keys" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[2020, 2019, 2018, 2017, 2016, 2015, 2014, 2013, 2012, 2011, 2010, 2009, 2008, 2007, 2006, 2005, 2004, 2003, 2002, 2001, 2000, 1999, 1998, 1997, 1996, 1995, 1994, 1993, 1992, 1991]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "// the map's keys are strings (column titles), the values are lists, the individual lists contain the column data\n", "mapScoring[\"Year\"]?.map { it }" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "// boom... we can easily plot a key column, Total Points\n", "val p = letsPlot(mapScoring) { x = \"Year\"; y = \"Pts\" } + ggsize(640, 240)\n", "p + geomBar(stat=Stat.identity) +\n", " ggtitle(\"Total Points per NFL regular season\")" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "// and another, Receiving TDs\n", "val p = letsPlot(mapScoring) { x = \"Year\"; y = \"RecTD\" } + ggsize(640, 240)\n", "p + geomBar(stat=Stat.identity) +\n", " ggtitle(\"Total Receiving Touchdowns per NFL regular season\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### The graphs will be a bit more dramatic if we group the years by 5-year buckets" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "// to add new columns (for bucketing), we return to the original DataFrame and create new columns based on existing values\n", "// krangl's `addColumn` is not a native Kotlin method, but its syntax is just like `filter` or `map`, it accesses `it`, etc.\n", "val dfScoringRanges = dfScoring\n", " .filter { (it[\"Year\"] lt 2021) AND (it[\"Year\"] gt 1990) }\n", " .addColumn(\"YearRange\") { it[\"Year\"].map{ floor(it.minus(1).div(5.0)).times(5).plus(1).toInt() }}\n", " .addColumn(\"Years\") { it[\"YearRange\"].map{ \"$it - ${it + 4}\" }}\n", " \n", "// we're creating another Map, but now we are grouping by year bucket and averaging the values within each bucket\n", "val mapScoringRanges = dfScoringRanges\n", " .select({ listOf(\"Year\", \"Pts\", \"RecTD\", \"YearRange\", \"Years\") })\n", " .groupBy(\"YearRange\", \"Years\")\n", " .summarize(\n", " \"mean_Pts\" to { it[\"Pts\"].mean(removeNA = true) },\n", " \"mean_RecTD\" to { it[\"RecTD\"].mean(removeNA = true) }\n", " ).toMap()\n", "\n", "// these xlimits are the discrete values used on the x-axis (and the labels)\n", "// only annoying thing is all the null handling of a data source we know is non-null\n", "val xlimits = mapScoringRanges[\"Years\"]?.toSet()?.reversed()?.filterNotNull()" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " " ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "// same plot as before, but bucketed -- unlike the graph above, every value is higher than the previous one, no ups & downs\n", "val p = letsPlot(mapScoringRanges) { x = \"Years\"; y = \"mean_Pts\" } + ggsize(780, 240)\n", "p + geomBar(stat=Stat.identity) + scaleXDiscrete(limits = xlimits) +\n", " ggtitle(\"Average total points per NFL regular season\")" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "// ggsave(p + geomBar(stat=Stat.identity) + scaleXDiscrete(limits = xlimits) +\n", "// ggtitle(\"Average total points per NFL regular season\"), \"avg_points_binned.png\")" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " " ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "// again, same plot, bucketed\n", "val p2 = letsPlot(mapScoringRanges) { x = \"Years\"; y = \"mean_RecTD\" } + ggsize(780, 240)\n", "p2 + geomBar(stat=Stat.identity) + scaleXDiscrete(limits = xlimits) +\n", " ggtitle(\"Average Receiving Touchdowns per NFL regular season\")" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": [ "// ggsave(p2 + geomBar(stat=Stat.identity) + scaleXDiscrete(limits = xlimits) +\n", "// ggtitle(\"Average Receiving Touchdowns per NFL regular season\"), \"avg_rectd_binned.png\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Kotlin", "language": "kotlin", "name": "kotlin" }, "language_info": { "codemirror_mode": "text/x-kotlin", "file_extension": ".kt", "mimetype": "text/x-kotlin", "name": "kotlin", "nbconvert_exporter": "", "pygments_lexer": "kotlin", "version": "1.5.30-dev-598" } }, "nbformat": 4, "nbformat_minor": 4 }