Create Your Own Kotlin Playground (and Get a Data Science Head Start) with Jupyter Notebook

Learn the basics of Jupyter Notebook and how to turn it into an interactive interpreter for Kotlin. You’ll also learn about Data Frames, an important data structure for data science applications. By Joey deVilla.


Working With Markdown Cells

It’s time to look at Markdown cells, which contain content specified in Markdown.

Select the newest cell, which should be at the bottom of the notebook. In the toolbar near the top of the page, you’ll see a drop-down menu that displays its current selection as Code. Change that selection to Markdown:

You’ve designated the cell as a Markdown cell. This means it expects to have Markdown entered into it and that running the cell will cause its Markdown to be rendered.

Try it out. Enter the following into the Markdown cell:

# Welcome to *Jupyter Notebook!*

## This notebook runs a *Kotlin* kernel.

This means the notebook will allow you to:

* Enter **content** using [Markdown,](https://www.markdownguide.org/getting-started/) and
* Enter **code** using [Kotlin.](https://kotlinlang.org/)

Run the cell. It now looks like this:

Markdown output

Double-click the cell. It switches from its fully rendered form back into Markdown, which you can edit further.

If you’re not familiar with it, Markdown is a way of formatting text as paragraphs, headers, hyperlinks, lists and so on, but without drowning you in the complexity that comes with working with HTML. Instead of tags, Markdown uses a limited set of characters to format text, making it easier to read and write than HTML.

In the Markdown above, you used the following Markdown formatting characters:

  • Headings. Lines that start with at least one # are headings. # denotes a level 1 heading, ## denotes a level 2 heading, ### denotes a level 3 heading and so on.
  • Bold and italic. You can mark text as bold or italic using * characters in the following manner: *italic*, **bold** and ***bold italic***.
  • Links. Text in square brackets followed by a URL in parentheses, as in [Kotlin.](https://kotlinlang.org/), becomes a hyperlink.
  • Unordered list. An unordered list is a block of text where every line begins with * followed by a space.

Markdown has many more features that are beyond the scope of this article. To learn more about them, consult the Markdown Guide’s Getting Started page.

Initializing krangl

Programming languages generally don’t have data science functionality built in; instead, they get it from libraries. In Python, pandas is one of the most popular data science libraries, while in R, the preferred one is dplyr.

Kotlin has the krangl library, whose name is short for “Kotlin library for data wrangling”. Its design borrows heavily from two R libraries: dplyr and purrr. You’ll find that krangl provides a subset of classes, methods and properties with the same or similar names as those in these libraries. This will come in handy because there’s far more documentation and literature on those libraries than on krangl, at least for now.

It’s time to make krangl and all its features available to your notebook.

Create a new Kotlin notebook by opening Jupyter Notebook’s File menu and selecting New Notebook, then Kotlin.

Click the notebook’s title, which is located just to the right of the Jupyter logo at the top-left corner of the page (the title will probably be “Untitled2”). This allows you to rename the notebook:

Rename Notebook

Enter My First Data Frame into the pop-up and click Rename to change the notebook’s name. The new title will replace the old one:

New notebook title

Changing a notebook’s name also changes its filename. If you look at the URL bar in your browser, you’ll see the notebook’s filename is now My First Data Frame.ipynb (the .ipynb filename extension comes from Jupyter Notebook’s old name, IPython Notebook).

Enter the following into a new code cell and run it:

%use krangl

The cell should look like this for a few seconds …

Asterisk while running

And then it will look like this:

asterisk replaced with number

When the code in a cell is executing, the square brackets to the left of the cell contain an asterisk (*). In many cases, the code executes so quickly you don’t even see the asterisk.

The %use krangl code you just ran isn’t Kotlin but a “magic” (short for “magic command”). Magics are commands that instruct the notebook’s kernel to perform a specific task. The %use magic tells the Kotlin kernel to use one of its built-in libraries. It takes a few seconds to initialize, which is why you see the asterisk when running it.

With krangl initialized, it’s time to start working with data. The rest of this article will focus on data frames, which are the primary data structure in data science applications.

Diving into Data Frames

Introducing Data Frames

A data frame represents a table of data that’s organized into rows and columns. Each row represents a record or observation of some thing or happening, and each column represents a particular piece of data or property for a given row. Although you could use a two-dimensional array to store a data table, data frames have data-science-specific functionality.

The diagram below shows a small data frame containing data about different types of instant ramen:

Data Frame

In this data frame, each row represents a type of instant ramen. Each column represents a property of ramen, such as its brand, type or rating.

Although it’s possible to represent a table of data using arrays — either a two-dimensional array or an array of arrays — data frames are designed with data analysis in mind and come with methods and properties you would otherwise have to write yourself. They’re more like spreadsheets than two-dimensional arrays. Using data frames allows you to focus on analyzing and exploring data rather than on programming.
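To get a feel for the “methods and properties you would otherwise have to write yourself,” here’s a rough sketch in plain Kotlin, with no krangl involved, of a small table stored as a list of lists. The values are made-up placeholders rather than the contents of the diagram above, and the helpers column, averageOf and rowsWhere are hypothetical names invented for this example.

```kotlin
// Placeholder table: made-up values, not the contents of the diagram above.
val header = listOf("brand", "style", "rating")
val rows: List<List<Any>> = listOf(
    listOf("Brand A", "Cup", 3.0),
    listOf("Brand B", "Pack", 4.0),
    listOf("Brand C", "Cup", 5.0)
)

// Hypothetical helper: fetch a whole column by name.
fun column(name: String): List<Any> {
    val index = header.indexOf(name)
    require(index >= 0) { "No column named $name" }
    return rows.map { row -> row[index] }
}

// Hypothetical helper: average a numeric column. The cast is needed because
// a list of lists can't say which columns hold numbers.
fun averageOf(name: String): Double =
    column(name).map { (it as Number).toDouble() }.average()

// Hypothetical helper: keep only the rows where a column has a given value.
fun rowsWhere(name: String, value: Any): List<List<Any>> {
    val index = header.indexOf(name)
    require(index >= 0) { "No column named $name" }
    return rows.filter { row -> row[index] == value }
}
```

Even this small amount of plumbing covers only column lookup, one statistic and one kind of filter. A data frame library packages operations like these so you don’t have to maintain them.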

Creating a Data Frame from Scratch

krangl provides a type for data frames called DataFrame, and it offers several ways to create one. One of them is to define the data frame directly in code.

Enter the following into a new code cell and run it:

val df: DataFrame = dataFrameOf(
    "language", "developer", "year_first_appeared", "preferred")(
    "Kotlin", "JetBrains", 2011, true,
    "Java", "James Gosling", 1995, false,
    "Swift", "Chris Lattner et al.", 2014, true,
    "Objective-C", "Tom Love and Brad Cox", 1984, false,
    "Dart", "Lars Bak and Kasper Lund", 2011, true
)

This creates df, an instance of DataFrame that defines a table of programming languages used for mobile development. As you continue to read data science code and articles, you’ll see the variable df over and over. It’s often used as a variable name for a data frame, just as i is often used as a loop index variable.
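Notice that dataFrameOf() takes its values as one flat, comma-separated list rather than as nested rows; the line breaks in the code are only there for readability. The following sketch, written in plain Kotlin, shows one way to think about how such a list lines up with the column names. It illustrates the idea and isn’t krangl’s actual implementation. The names columnNames, flatValues, valuesDivideEvenly, tableRows and firstRow are made up for this example.

```kotlin
// Column names and flat values copied from the dataFrameOf() call above.
val columnNames = listOf("language", "developer", "year_first_appeared", "preferred")
val flatValues: List<Any> = listOf(
    "Kotlin", "JetBrains", 2011, true,
    "Java", "James Gosling", 1995, false,
    "Swift", "Chris Lattner et al.", 2014, true,
    "Objective-C", "Tom Love and Brad Cox", 1984, false,
    "Dart", "Lars Bak and Kasper Lund", 2011, true
)

// Every row needs one value per column, so the total must divide evenly.
val valuesDivideEvenly = flatValues.size % columnNames.size == 0

// Cut the flat list into rows with one value per column.
val tableRows: List<List<Any>> = flatValues.chunked(columnNames.size)

// Pair each value in the first row with its column name.
val firstRow: Map<String, Any> = columnNames.zip(tableRows.first()).toMap()
```

If you leave out a value or add an extra one, the values no longer line up with the columns, so it’s worth counting them whenever you edit a table defined this way.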

Take a look at df’s contents. There are a couple of ways to do this. The pure Kotlin way is to use DataFrame’s toString() method, which returns a string containing the DataFrame’s row and column dimensions and its first 10 rows.

You’ll use toString() indirectly via the print() function. Run the following in a new code cell:

print(df)

This will produce the following output:

A DataFrame: 5 x 4
       language                  developer   year_first_appeared   preferred
1        Kotlin                  JetBrains                  2011        true
2          Java              James Gosling                  1995       false
3         Swift       Chris Lattner et al.                  2014        true
4   Objective-C      Tom Love and Brad Cox                  1984       false
5          Dart   Lars Bak and Kasper Lund                  2011        true

DataFrame also has a print() method that has the same effect.
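print(df) works because Kotlin’s print() function turns whatever you pass it into text by calling its toString() method. Here’s a minimal sketch of that mechanism using TinyTable, a hypothetical stand-in class that isn’t part of krangl; its output format is invented for this example and only loosely imitates the one above.

```kotlin
// Hypothetical stand-in for a table type; it is not part of krangl.
class TinyTable(val columnNames: List<String>, val rows: List<List<Any>>) {
    // print() and string templates call this to get the text to show.
    override fun toString(): String {
        val title = "A TinyTable: ${rows.size} x ${columnNames.size}"
        val headerLine = columnNames.joinToString("  ")
        val rowLines = rows.map { row -> row.joinToString("  ") }
        return (listOf(title, headerLine) + rowLines).joinToString("\n")
    }
}

val tiny = TinyTable(
    listOf("language", "year_first_appeared"),
    listOf(listOf("Kotlin", 2011), listOf("Java", 1995))
)

// A string template calls toString() behind the scenes, just as print() does.
val rendered = "$tiny"
```

Running print(tiny) in a cell would write the same text that rendered holds, which is why overriding toString() is all a type needs to do to control how print() shows it.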

If you want output formatted even more nicely, let the notebook do the work. Enter the following into a new code cell:

df

As with anything else that returns or evaluates to a value, Jupyter Notebook will display df’s value. The interesting twist with types like krangl’s DataFrame is that they take advantage of “hooks” provided by Jupyter Notebook. The end result is that when the notebook displays a DataFrame’s contents, it does so in the form of a nicely formatted table:

DataFrame output

This sort of feature is helpful when you’re submitting a research paper as a Jupyter Notebook and want it to have readable tables.