Create Your Own Kotlin Playground (and Get a Data Science Head Start) with Jupyter Notebook

Learn the basics of Jupyter Notebook and how to turn it into an interactive interpreter for Kotlin. You’ll also learn about Data Frames, an important data structure for data science applications.

Version

  • Kotlin 1.5, Android 9.0, Other

If you’ve ever wished for a Kotlin version of a REPL (read-evaluate-print loop) like the ones in command-line Python or Node.js, or for a lightweight local version of the Kotlin Playground, there’s a way to do it: by using Jupyter Notebook with a Kotlin kernel. In this article, you’ll set up a Kotlin playground that runs locally on your computer!

In this article, you’ll learn the basics of Jupyter Notebook and how to turn it into an interactive interpreter you can use to learn Kotlin, as a scratchpad for your Kotlin ideas and experiments or as your annotated library of code snippets. You’ll also learn about a data structure you can use to do data science with Kotlin.

Kotlin? For Data Science?

Right now, when you bring up the topic of data science, the Python and R programming languages come up most often. Python’s easy-to-learn and readable syntax as well as its vast ecosystem of data processing and analysis libraries (especially pandas and NumPy) and R’s standing among the statistics-minded and its Tidyverse data science packages have made them data science mainstays.

There’s no reason Kotlin, too, can’t be a good language for data science. Like Python, it has a concise and readable syntax, and it’s far easier to learn and less verbose than Java. As a JVM language, it’s compatible with the galaxy of Java libraries developed for and used by its base of millions of developers. As a compiled, statically typed language, it can also run compute-heavy code faster than interpreted Python or R, although the exact difference depends on the workload and the libraries involved.

Python and R might have a head start, but that doesn’t mean you can’t use Kotlin for data science. Kotlin not only has the speed advantage, but it also has a platform advantage: Android. It’s the OS that runs on most mobile devices. And with Windows 11, Android apps will be on the OS that runs on most desktop and laptop devices. As people expect their apps to do more and more, apps need to use data science — and they won’t be written in Python or R. Think of doing data science in Kotlin as your head start.

Introducing Jupyter Notebook

This article will be a little different from most of the Kotlin and Android articles on this site. Instead of using IntelliJ IDEA or Android Studio and building an app for an Android device, you’ll use Jupyter Notebook, which you can think of as a combination of an interactive code playground and a wiki.

Jupyter Notebook, a web application, lets you create documents that combine:

  1. Human-readable narrative in the form of rich text, hyperlinks, images and anything else you can include in a web page, as well as …
  2. Computer-readable code that can import and process data and user input and present the results.

By combining a document with executable code, you can literally “show your work”. The narrative part of the notebook explains your idea to the reader, while the code part performs the processing and calculations right in front of them.

This capability of Jupyter Notebook has made it popular with scientists and researchers, including the winners of the 2017 Nobel Prize in physics and the winner of the 2018 Nobel Prize in economics. You’ll also find Jupyter Notebook used outside academia, in places such as Netflix and Yelp.

Python is the default programming language for Jupyter Notebook, but it’s not the only language. A library of kernels allows Jupyter Notebook to support dozens of other programming languages, including Kotlin.

In this article, you’ll install Jupyter Notebook on your computer (and yes, it runs on macOS, Windows and Linux), install the Kotlin kernel and then use the krangl library to try out some basic data science.

Getting Started

The simplest way to install Jupyter Notebook is to use Anaconda’s “Individual Edition” distribution. This is an open-source, no-cost distribution of Python that installs a Python interpreter (which is required to run Jupyter Notebook), Jupyter Notebook and a set of coding utilities and tools selected with the data scientist in mind.

Go to Anaconda’s “Individual Edition” page and download the appropriate installer for your computer and operating system. You’ll find Anaconda installer applications for macOS and Windows, as well as a shell script for installing Anaconda on Linux.

Once you’ve installed Anaconda on your computer, the next step will be to give Jupyter Notebook the ability to run Kotlin code by installing the Kotlin kernel. Do this by opening Terminal (on macOS or Linux) or Command Prompt (on Windows) and entering the following command:

conda install -c jetbrains kotlin-jupyter-kernel

Once the command completes, you’ll be able to launch Jupyter Notebook. The simplest way to do so is from the command line. Enter the following into Terminal or Command Prompt:

jupyter notebook

Your computer should open a new browser window or tab to a page that looks like this:

Jupyter Notebook list of files

This page displays the contents of the directory where you ran the jupyter notebook command, which is typically your home directory. You can use it to navigate through and manage your file system.

Creating Your First Notebook

To create your first notebook, click the New button located near the upper-right corner of the page. A menu will appear, and one of the options will be Kotlin. Select that option to create a new Kotlin notebook.

A new browser tab or window will open, containing a page that looks like this:

New notebook page

The text area below the menu bar and menu buttons on the page is a cell. Think of a Jupyter Notebook as being like a spreadsheet with a single column of cells. Each cell can contain either content in Markdown or code in a programming language. The cells together form a document called a notebook, containing text the reader can read and code they can execute.

Depending on how you write the notebook, it can either be code with rich text annotations or an article or essay enhanced by code.

Understanding Code Cells

Start with code cells, which can execute code entered into them. By default, Jupyter Notebook cells are code cells.

You specify the programming language a Jupyter Notebook will support when you create it. In this case, you specified Kotlin as your Jupyter Notebook’s language, so it expects you’ll enter Kotlin into its code cells.

Right now, the single cell in your notebook is a default cell, which means it’s a code cell. Enter the following code into it:

println("Hello, Jupyter world!")

Now that you’ve entered the code, it’s time to run it. To run the code, make sure you’ve selected the cell and either:

  1. Click the Run button; or
  2. Press Shift-Enter.

You should see the following:

Notebook code output

Jupyter Notebook just ran the code you entered and then presented you with a new cell. Once again, newly created cells are code cells.

Enter the following into the new cell and run it. Once again, you run the code in a cell by selecting it and clicking the Run button or pressing Shift-Enter:

val items = listOf(
    "Alpha",
    "Bravo",
    "Charlie",
    "Delta"
)

This time, you see a new cell but no output. That’s expected; creating a List shouldn’t cause any output.

Try printing the contents of items. Enter the following into the new cell and run it:

println(items)

You should see the following output:

[Alpha, Bravo, Charlie, Delta]

Note that the variable items is still in scope. This is an important feature of notebooks: Anything you declare in a cell that you run remains declared for any cells you run afterward.

If you look at the cells you’ve run so far, you’ll see numbers to their left that show the order you ran the cells:

Cell numbers

The numbers 1, 2 and 3 indicate the order in which you ran the cells. You first printed the sentence “Hello, Jupyter world!”, then defined the items list and finally printed the content of items.

You can declare more than just values for use by later cells. Functions, classes and just about anything that can be named or assigned to a variable can be declared for use in subsequent cells.

To see this in action, enter and run the following in a new code cell:

val rand = kotlin.random.Random

fun rollTheDice(): Pair<Int, Int> {
    return Pair(rand.nextInt(6) + 1, rand.nextInt(6) + 1)
}

class Demo {
    fun sayHello() {
        println("Hello there!")
    }
}

You’ve just defined a function named rollTheDice() and a class named Demo. Both will be in scope in subsequent cells.

Enter the following into a new code cell and run it:

rollTheDice()

You’ll see an ordered pair of two integers, each being between one and six inclusive.

Note that the cell you just ran contains only rollTheDice() and no print() or println() function. When the last line of a cell is an expression that evaluates to a value, such as a variable name or a function call, Jupyter Notebook displays that value below the cell.
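Because rollTheDice() returns a Pair, you can also unpack its two values with a destructuring declaration. The sketch below is self-contained so that it runs in a fresh code cell. The rollTwoDice() and describeRoll() functions and the seeded generator are hypothetical stand-ins made up for this illustration; they aren’t part of the exercise above.

```kotlin
import kotlin.random.Random

// Hypothetical variant of rollTheDice() that takes its random source as a
// parameter, so a seeded generator can make the rolls repeatable.
fun rollTwoDice(source: Random): Pair<Int, Int> =
    Pair(source.nextInt(6) + 1, source.nextInt(6) + 1)

// Hypothetical helper that destructures the Pair into two named values.
fun describeRoll(source: Random): String {
    val (firstDie, secondDie) = rollTwoDice(source)
    return "You rolled $firstDie and $secondDie for a total of ${firstDie + secondDie}."
}

println(describeRoll(Random(2021)))
```

Passing the same seed produces the same rolls every time, which is handy when you want a notebook’s output to be reproducible.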

Classes also work as you would expect. Enter this code into a new code cell and run it:

val greeter = Demo()
greeter.sayHello()

The notebook responds by printing Hello there! immediately after the code cell.

If you’ve wondered why iOS developers got to use Swift Playgrounds but there wasn’t such a thing as Kotlin Playgrounds, here’s some good news: You have them now!

Working With Markdown Cells

It’s time to look at Markdown cells, which contain content specified in Markdown.

Select the newest cell, which should be at the bottom of the notebook. In the toolbar near the top of the page, you’ll see a drop-down menu that displays its current selection as Code. Change that selection to Markdown.

You’ve designated the cell as a Markdown cell. This means it expects to have Markdown entered into it and that running the cell will cause its Markdown to be rendered.

Try it out. Enter the following into the Markdown cell:

# Welcome to *Jupyter Notebook!*

## This notebook runs a *Kotlin* kernel.

This means the notebook will allow you to:

* Enter **content** using [Markdown,](https://www.markdownguide.org/getting-started/) and
* Enter **code** using [Kotlin.](https://kotlinlang.org/)

Run the cell. It now looks like this:

Markdown output

Double-click the cell. It switches from its fully rendered form back into Markdown, which you can edit further.

If you’re not familiar with it, Markdown is a way of formatting text as paragraphs, headers, hyperlinks, lists and so on, but without drowning you in the complexity that comes with working with HTML. Instead of tags, Markdown uses a limited set of characters to format text, making it easier to read and write than HTML.

In the Markdown above, you used the following Markdown formatting characters:

  • Headings: Lines that start with at least one # are headings. # denotes a level 1 heading, ## denotes a level 2 heading, ### denotes a level 3 heading and so on.
  • Bold and italic: You can mark text as bold or italic using * characters in the following manner: *italic*, **bold** and ***bold italic***.
  • Unordered lists: An unordered list is a block of text where every line begins with *.

Markdown has many more features that are beyond the scope of this article. To learn more about them, consult the Markdown Guide’s Getting Started page.

Initializing krangl

Programming languages generally don’t have data science functionality built in; instead, they get it from libraries. In Python, pandas is one of the most popular data science libraries, while in R, the preferred one is dplyr.

Kotlin has the krangl library, which takes its name from Kotlin library for data wrangling. Its design borrows heavily from two R libraries: dplyr and purrr. You’ll find krangl provides a subset of classes, methods and properties with the same or similar names as those you’ll find in these libraries. This will come in handy because there’s far more documentation and literature on those libraries than on krangl, at least for now.

It’s time to make krangl and all its features available to your notebook.

Create a new Kotlin notebook by opening Jupyter Notebook’s File menu and selecting File ▸ New Notebook ▸ Kotlin.

Click the notebook’s title, which is located just to the right of the Jupyter logo at the top-left corner of the page (the title will probably be “Untitled2”). This allows you to rename the notebook:

Rename Notebook

Enter My First Data Frame into the pop-up and click Rename to change the notebook’s name. The new title will replace the old one:

New notebook title

Changing a notebook’s name also changes its filename. If you look at the URL bar in your browser, you’ll see the notebook’s filename is now My First Data Frame.ipynb (the .ipynb filename extension comes from Jupyter Notebook’s old name, IPython Notebook).

Enter the following into a new code cell and run it:

%use krangl

The cell should look like this for a few seconds …

Asterisk while running

And then it will look like this:

asterisk replaced with number

When the code in a cell is executing, the square brackets to the left of the cell contain an asterisk (*). In many cases, the code executes so quickly you don’t even see the asterisk.

The %use krangl code you just ran isn’t Kotlin but a “magic” (short for “magic command”). Magics are commands that instruct the notebook’s kernel to perform a specific task. The %use magic tells the Kotlin kernel to use one of its built-in libraries. It takes a few seconds to initialize, which is why you see the asterisk when running it.

With krangl initialized, it’s time to start working with data. The rest of this article will focus on data frames, which are the primary data structure in data science applications.

Diving into Data Frames

Introducing Data Frames

A data frame represents a table of data that’s organized into rows and columns. Each row represents a record or observation of some thing or happening, and each column represents a particular piece of data or property for a given row.

The diagram below shows a small data frame containing data about different types of instant ramen:

Data Frame

In this data frame, each row represents a type of instant ramen. Each column represents a property of ramen, such as its brand, type or rating.

Although it’s possible to represent a table of data using arrays — either a two-dimensional array or an array of arrays — data frames are designed with data analysis in mind and come with methods and properties you would otherwise have to write yourself. They’re more like spreadsheets than two-dimensional arrays. Using data frames allows you to focus on analyzing and exploring data rather than on programming.
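To make the contrast concrete, here’s a minimal sketch in plain Kotlin, with no data frame library, that stores a table as a list of rows and computes the average of one column. The sample values and the averageOf() helper are invented for this illustration. The point is that every helper like this is code you’d have to write and maintain yourself.

```kotlin
// A table stored as a list of rows; each row is a list of untyped values.
// The sample data below is invented for this illustration.
val header = listOf("brand", "style", "stars")
val table: List<List<Any>> = listOf(
    listOf("Brand A", "Cup", 3.5),
    listOf("Brand B", "Pack", 4.0),
    listOf("Brand C", "Bowl", 5.0)
)

// Hypothetical helper: average the numeric values in the named column.
fun averageOf(columnName: String): Double {
    val columnIndex = header.indexOf(columnName)
    require(columnIndex >= 0) { "No column named $columnName" }
    return table
        .map { row -> row[columnIndex] }   // pull one value out of every row
        .filterIsInstance<Number>()        // ignore anything that isn't numeric
        .map { it.toDouble() }
        .average()
}

println("Average stars: ${averageOf("stars")}")
```

Notice how much of the code deals with bookkeeping, such as finding the column’s position and checking value types, rather than with the analysis itself.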

Creating a Data Frame from Scratch

krangl provides a class for data frames called DataFrame. This class provides several ways to create a data frame, one of which is by defining it directly.

Enter the following into a new code cell and run it:

val df: DataFrame = dataFrameOf(
    "language", "developer", "year_first_appeared", "preferred")(
    "Kotlin", "JetBrains", 2011, true,
    "Java", "James Gosling", 1995, false,
    "Swift", "Chris Lattner et al.", 2014, true,
    "Objective-C", "Tom Love and Brad Cox", 1984, false,
    "Dart", "Lars Bak and Kasper Lund", 2011, true
)

This creates df, an instance of DataFrame that defines a table of programming languages used for mobile development. As you continue to read data science code and articles, you’ll see the variable df over and over. It’s often used as a variable name for a data frame, just as i is often used as a loop index variable.

Take a look at df’s contents. There are a couple of ways to do this. The pure Kotlin way is to use DataFrame’s toString() method, which returns a string containing the DataFrame’s row and column dimensions and its first 10 rows.

You’ll use toString() indirectly via the print() function. Run the following in a new code cell:

print(df)

This will produce the following output:

A DataFrame: 5 x 4
       language                  developer   year_first_appeared   preferred
1        Kotlin                  JetBrains                  2011        true
2          Java              James Gosling                  1995       false
3         Swift       Chris Lattner et al.                  2014        true
4   Objective-C      Tom Love and Brad Cox                  1984       false
5          Dart   Lars Bak and Kasper Lund                  2011        true

DataFrame also has a print() method that has the same effect.

If you want output formatted even more nicely, let the notebook do the work. Enter the following into a new code cell:

df

As with anything else that returns or evaluates to a value, Jupyter Notebook will display df’s value. The interesting twist with types like krangl’s DataFrame is they take advantage of “hooks” provided by Jupyter Notebook. The end result is that when the notebook displays a DataFrame’s contents, it does so in the form of a nicely formatted table:

DataFrame output

This sort of feature is helpful when you’re submitting a research paper as a Jupyter Notebook and want it to have readable tables.

Getting the Data Frame’s Schema

In the world of databases, the term “schema” has a specific meaning: It’s a description of how the data in a database is organized. In a DataFrame, a schema is a description of how the data in the data frame is organized, accompanied by a small sample of the data. You can see the schema of a data frame with DataFrame’s schema() method.

Look at df’s schema. Run the following in a new code cell:

df.schema()

You’ll see the following output:

DataFrame with 5 observations
language             [Str]  Kotlin, Java, Swift, Objective-C, Dart
developer            [Str]  JetBrains, James Gosling, Chris Lattner et al., Tom Love and Brad Cox, Lars Bak and Kasper Lund
year_first_appeared  [Int]  2011, 1995, 2014, 1984, 2011
preferred            [Bol]  true, false, true, false, true

schema() is useful for getting a general idea about the data contained within a DataFrame. It prints the following:

  1. The number of rows in the data frame, which schema() refers to as “observations”.
  2. The name of each column in the data frame.
  3. The type of each column in the data frame.
  4. The first values stored in each column. Because df is a small data frame, schema() printed out all the values for all the columns.

You might remember that when you instantiated df, you never specified the column types. But schema() clearly shows each column has a type: language and developer are columns that contain string values, year_first_appeared contains integers, and preferred is a column of Booleans!

krangl’s dataFrameOf() function inferred the column types. You can specify column types when creating a DataFrame, but krangl uses the data you provide to determine the appropriate types so you don’t have to. This feature makes krangl feel more dynamically typed, like pandas and dplyr, providing a more Python- or R-like experience.
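As a rough illustration of the idea, the sketch below guesses a column’s type by checking what kinds of values the column contains. It’s a toy built on assumptions, not krangl’s actual implementation: inferColumnType() is a hypothetical function, and the labels it returns merely imitate the ones schema() printed above.

```kotlin
// Hypothetical helper: guess a type label for a column from its values.
// Null entries are ignored, so a column with missing values still gets a type.
fun inferColumnType(values: List<Any?>): String {
    val present = values.filterNotNull()
    return when {
        present.isEmpty() -> "Any"
        present.all { it is Int } -> "Int"
        present.all { it is Number } -> "Dbl"
        present.all { it is Boolean } -> "Bol"
        present.all { it is String } -> "Str"
        else -> "Any"   // mixed values fall back to the most general label
    }
}

println(inferColumnType(listOf("Kotlin", "Java", "Swift")))  // Str
println(inferColumnType(listOf(2011, 1995, 2014)))           // Int
println(inferColumnType(listOf(true, false, null)))          // Bol
println(inferColumnType(listOf(2011, "n/a")))                // Any
```

The last call shows the trade-off of inference: a single stray value, such as a placeholder string in a numeric column, changes the type of the whole column.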

Getting the Data Frame’s Dimensions and Column Names

schema() is good for diagnostics, but it isn’t useful if you want to programmatically find how many rows and columns are in a DataFrame or what its column names are. Fortunately, DataFrame has useful properties for this purpose:

  • nrow: The number of rows in the data frame.
  • ncol: The number of columns in the data frame.
  • names: A list of strings specifying the names of the columns, going from left to right.

Use these properties. Run the following in a new code cell:

println("The data frame has ${df.nrow} rows and ${df.ncol} columns.")
println("The column indices and names are:")
df.names.forEachIndexed { index, name ->
    println("$index: $name")
}

You’ll see this output:

The data frame has 5 rows and 4 columns.
The column indices and names are:
0: language
1: developer
2: year_first_appeared
3: preferred

Examining the Data Frame’s Columns

The cols property of DataFrame returns a list of objects representing each column, going from left to right. Use it to take a closer look at df’s columns.

Run the following in a new code cell:

df.cols.forEachIndexed { index, column ->
    println("$index: $column")
}

You’ll see this result:

0: language [Str][5]: Kotlin, Java, Swift, Objective-C, Dart
1: developer [Str][5]: JetBrains, James Gosling, Chris Lattner et al., Tom Love and Brad Cox, Lars Bak ...
2: year_first_appeared [Int][5]: 2011, 1995, 2014, 1984, 2011
3: preferred [Bol][5]: true, false, true, false, true

Each column object in the list returned by the cols property is an instance of the DataCol class. DataCol has properties and methods that let you examine a column in greater detail and even perform some analysis on its contents.

For now, stick to using two DataCol properties:

  • name: The name of the column.
  • length: The number of items or rows in the column.

Run the following in a new code cell:

df.cols.forEachIndexed { index, column ->
    println("$index: name: ${column.name}   length: ${column.length}")
}

It will produce the following output:

0: name: language   length: 5
1: name: developer   length: 5
2: name: year_first_appeared   length: 5
3: name: preferred   length: 5

DataFrame has some syntactic sugar that makes it easier to work with columns. Although you could access df’s first column using the syntax df.cols[0], it’s much simpler to access it using array syntax:

df[0] // Same thing as df.cols[0]

If you’d rather access a column by name, DataFrame also implements map syntax. For example, to access df’s first column, which is named language, you can use this code:

df["language"] // Column 0's name is language,
               // so this is equivalent to
               // df.cols[0] and df[0]
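Kotlin makes this kind of syntax possible through operator overloading: Any class that defines an operator fun get(...) can be indexed with square brackets. The sketch below shows the mechanism using a hypothetical TinyTable class. It illustrates the language feature only and isn’t krangl’s source code.

```kotlin
// Hypothetical miniature table: column names mapped to lists of column values.
class TinyTable(private val columns: Map<String, List<Any?>>) {
    // Column names, going from left to right.
    val names: List<String> get() = columns.keys.toList()

    // Enables table[0]: look up a column by position.
    operator fun get(index: Int): List<Any?> = columns.getValue(names[index])

    // Enables table["language"]: look up a column by name.
    operator fun get(name: String): List<Any?> = columns.getValue(name)
}

// mapOf() preserves insertion order, so "language" is column 0.
val tiny = TinyTable(
    mapOf(
        "language" to listOf("Kotlin", "Java"),
        "year_first_appeared" to listOf(2011, 1995)
    )
)

println(tiny[0] == tiny["language"])      // true: both return the same column
println(tiny["year_first_appeared"][1])   // 1995
```

Because each get() returns a List, you can chain a second subscript onto the first to reach a single value, as the last line does.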

Examining the Data Frame’s Rows

Just as DataFrame has a cols property to access its columns, it also has a rows property. It returns an Iterable that lets you access a collection object representing each row, going from top to bottom. Use it to take a closer look at df’s rows.

Run the following in a new code cell:

df.rows.forEachIndexed { index, row ->
    println("$index: $row")
}

You should see this output:

0: {language=Kotlin, developer=JetBrains, year_first_appeared=2011, preferred=true}
1: {language=Java, developer=James Gosling, year_first_appeared=1995, preferred=false}
2: {language=Swift, developer=Chris Lattner et al., year_first_appeared=2014, preferred=true}
3: {language=Objective-C, developer=Tom Love and Brad Cox, year_first_appeared=1984, preferred=false}
4: {language=Dart, developer=Lars Bak and Kasper Lund, year_first_appeared=2011, preferred=true}

Each row object is an instance of DataFrameRow, which is simply an alias for Map<String, Any?>, where each key-value pair represents the name of a column and its corresponding value. For example, you could modify the loop you just ran to print only each programming language and the year in which it first appeared using this code:

df.rows.forEachIndexed { index, row ->
    println("$index: name: ${row["language"]}   premiered: ${row["year_first_appeared"]}")
}

You’ll see this output:

0: name: Kotlin   premiered: 2011
1: name: Java   premiered: 1995
2: name: Swift   premiered: 2014
3: name: Objective-C   premiered: 1984
4: name: Dart   premiered: 2011
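Because each value in a row has the type Any?, you need a cast before you can treat it as a number or a string. The sketch below uses plain Kotlin maps shaped like the rows printed above, so it runs without krangl. The sampleRows value and the yearsSince() helper are hypothetical and exist only for this illustration.

```kotlin
// Rows shaped like the ones above: column name -> value of any type.
val sampleRows: List<Map<String, Any?>> = listOf(
    mapOf("language" to "Kotlin", "year_first_appeared" to 2011),
    mapOf("language" to "Java", "year_first_appeared" to 1995)
)

// Hypothetical helper: years between a language's first appearance and a
// reference year. The safe cast (as?) yields null instead of throwing when
// the value is missing or isn't an Int.
fun yearsSince(row: Map<String, Any?>, referenceYear: Int): Int? {
    val year = row["year_first_appeared"] as? Int ?: return null
    return referenceYear - year
}

sampleRows.forEach { row ->
    println("${row["language"]}: ${yearsSince(row, 2021)} years since first appearance")
}
```

Returning null for a missing or mistyped value keeps one bad row from stopping the whole loop with an exception.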

Because rows returns an Iterable rather than a List, you need to use the elementAt() method to access a row by its index number. For example, the following code retrieves row 1 of df:

df.rows.elementAt(1) // Retrieve row 1
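The same distinction applies to any Kotlin Iterable, so you can try it without krangl. In this sketch, letters is a made-up stand-in for rows. It also shows the standard library’s elementAtOrNull() for indexes that might be out of range and toList() for when you need repeated indexed access.

```kotlin
// Declared as Iterable, so the [index] syntax isn't available, just as with rows.
val letters: Iterable<String> = listOf("Alpha", "Bravo", "Charlie")

println(letters.elementAt(1))        // Bravo
println(letters.elementAtOrNull(5))  // null, because there's no element 5

// Copy into a List once if you need to index repeatedly.
val letterList: List<String> = letters.toList()
println(letterList[2])               // Charlie
```

Unlike elementAtOrNull(), elementAt() throws an IndexOutOfBoundsException when the index is out of range.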

Accessing Data Frame “Cells” by Column and Row

DataFrame provides a convenient column-row syntax for accessing individual “cells”.

Suppose you wanted to get the value in the year_first_appeared column for row 3. As mentioned before, you could access that column in several ways:

// These all produce the same result
df.cols[2]
df[2]
df["year_first_appeared"]

By adding a subscript to any of the lines above, you can access a specific row for that column. Here’s how you can access row 3 of the year_first_appeared column:

// These all access the value in the "year_first_appeared" column
// of row 3
df.cols[2][3]
df[2][3]
df["year_first_appeared"][3]

Where to Go From Here?

You can download the Jupyter Notebook files containing all the code from the exercises above by clicking the Download Materials button at the top or bottom of the tutorial.

With Jupyter Notebook and the Kotlin kernel, you have a powerful new tool at your disposal. You can use it as a straightforward Markdown note-taking tool or as an interactive coding environment. But the most interesting use cases appear when you combine Markdown and code cells to mix narrative text with executable code:

  • You could take the “scientific paper” route and create a notebook that’s mostly text, interspersed with code performing calculations that prove your thesis.
  • You might use Jupyter Notebooks as design documents for applications, with design notes written in Markdown cells and prototype code written in code cells.
  • Jupyter Notebooks also make great libraries for often-used code snippets. You store the code in the code cells and annotate the code in rich text using Markdown cells.
  • And finally, you can use Jupyter Notebooks for data science, which is introduced in the follow-up tutorial, Beginning Data Science with Jupyter Notebook and Kotlin.

If you’d like to find out more about Jupyter Notebook and how people are using it, here are a few good places to start:
