Introduction

In this demo, we will show you how to build a Tinder-style web application (named "Tapster") recommending comics to users based on their likes/dislikes of episodes interactively.

The demo will use Similar Product Template. Similar Product Template is a great choice if you want to make recommendations based on immediate user activities or for new users with limited history. It uses MLLib Alternating Least Squares (ALS) recommendation algorithm, a Collaborative filtering (CF) algorithm commonly used for recommender systems. These techniques aim to fill in the missing entries of a user-item association matrix. Users and products are described by a small set of latent factors that can be used to predict missing entries. A layman's interpretation of Collaborative Filtering is "People who like this comic, also like these comics."

All the code and data is on GitHub at: github.com/PredictionIO/Demo-Tapster.

Data

The source of the data is from Tapastic. You can find the data files here.

The data structure looks like this:

Episode List data/episode_list.csv

Fields: episodeId | episodeTitle | episodeCategories | episodeUrl | episodeImageUrls

1,000 rows. Each row represents one episode.

User Like Event List data/user_list.csv

Fields: userId | episodeId | likedTimestamp

192,587 rows. Each row represents one user like for the given episode.

The tutorial has four major steps:

  • Demo application setup
  • PredictionIO installation and setup
  • Import data into database and PredictionIO
  • Integrate demo application with PredictionIO

Tapster Demo Application

The demo application is built using Rails.

You can clone the existing application with:

1
2
3
$ git clone  https://github.com/PredictionIO/Demo-Tapster.git
$ cd Demo-Tapster
$ bundle install

You will need to edit config/database.yml to match your local database settings. We have provided some sensible defaults for PostgreSQL, MySQL, and SQLite.

Setup the database with:

1
2
$ rake db:create
$ rake db:migrate

At this point, you should have the demo application ready but with an empty database. Lets import the episodes data into our database. We will do this with: $ rake import:episodes. An "Episode" is a single comic strip.

View on GitHub

This script is pretty simple. It loops through the CSV file and creates a new episode for each line in the file in our local database.

You can start the app and point your browser to http://localhost:3000

1
$rails server

Rails Server

Apache PredictionIO (incubating) Setup

Install Apache PredictionIO (incubating)

Follow the installation instructions here or simply run:

1
$ bash -c "$(curl -s https://raw.githubusercontent.com/apache/incubator-predictionio/master/bin/install.sh)"

PIO Install

Create a New App

You will need to create a new app on Apache PredictionIO (incubating) to house the Tapster demo. You can do this with:

1
$ pio app new tapster

Take note of the App ID and Access Key.

PIO App New

Setup Engine

We are going to copy the Similar Product Template into the PIO directory.

1
2
$ cd PredictionIO
$ pio template get apache/incubator-predictionio-template-similar-product tapster-episode-similar

Next we are going to update the App ID in the ‘engine.json’ file to match the App ID we just created.

1
2
3
$ cd tapster-episode-similar
$ nano engine.json
$ cd ..

Engine Setup

Modify Engine Template

By the default, the engine template reads the “view” events. We can easily to change it to read “like” events.

Modify readTraining() in DataSource.scala:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
  override
  def readTraining(sc: SparkContext): TrainingData = {

    ...

    val viewEventsRDD: RDD[ViewEvent] = eventsDb.find(
      appId = dsp.appId,
      entityType = Some("user"),
      eventNames = Some(List("like")), // MODIFIED
      // targetEntityType is optional field of an event.
      targetEntityType = Some(Some("item")))(sc)
      // eventsDb.find() returns RDD[Event]
      .map { event =>
        val viewEvent = try {
          event.event match {
            case "like" => ViewEvent( // MODIFIED
              user = event.entityId,
              item = event.targetEntityId.get,
              t = event.eventTime.getMillis)
            case _ => throw new Exception(s"Unexpected event ${event} is read.")
          }
        } catch {
          case e: Exception => {
            logger.error(s"Cannot convert ${event} to ViewEvent." +
              s" Exception: ${e}.")
            throw e
          }
        }
        viewEvent
      }

    ...
  }
}

Finally to build the engine we will run:

1
2
3
$ cd tapster-episode-similar
$ pio build
$ cd ..

PIO Build

Import Data

Once everything is installed, start the event server by running: $ pio eventserver

Event Server

You can check the status of Apache PredictionIO (incubating) at any time by running: $ pio status

ALERT: If your laptop goes to sleep you might manually need to restart HBase with:

1
2
3
$ cd PredictionIO/venders/hbase-0.98.6/bin
$ ./stop-hbase.sh
$ ./start-hbase.sh

The key event we are importing into Apache PredictionIO (incubating) event server is the "Like" event (for example, user X likes episode Y).

We will send this data to Apache PredictionIO (incubating) by executing $ rake import:predictionio command.

View on GitHub

This script is a little more complex. First we need to connect to the Event Server.

1
client = PredictionIO::EventClient.new(ENV['PIO_ACCESS_KEY'], ENV['PIO_EVENT_SERVER_URL'], THREADS)

You will need to create the environmental variables PIO_ACCESS_KEY and PIO_EVENT_SERVER_URL. The default Event Server URL is: http://localhost:7070.

If you forget your Access Key you can always run: $ pio app list

You can set these values in the .env file located in the application root directory and it will be automatically loaded into your environment each time Rails is run.

The next part of the script loops through each line of the data/user_list.csv file and returns an array of unique user and episode IDs. Once we have those we can send the data to Apache PredictionIO (incubating) like this.

First the users:

1
2
3
4
5
user_ids.each_with_index do |id, i|
  # Send unique user IDs to PredictionIO.
  client.aset_user(id)
  puts "Sent user ID #{id} to PredictionIO. Action #{i + 1} of #{user_count}"
end

And now the episodes:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
episode_ids.each_with_index do |id, i|
  # Load episode from database - we will need this to include the categories!
  episode = Episode.where(episode_id: id).take

  if episode
    # Send unique episode IDs to PredictionIO.
    client.acreate_event(
      '$set',
      'item',
      id,
      properties: { categories: episode.categories }
    )
    puts "Sent episode ID #{id} to PredictionIO. Action #{i + 1} of #{episode_count}"
  else
    puts "Episode ID #{id} not found in database! Skipping!".color(:red)
  end
end

Finally we loop through the data/user_list.csv file a final time to send the like events:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
CSV.foreach(USER_LIST, headers: true) do |row|
  user_id = row[0] # userId
  episode_id = row[1] # episodeId

  # Send like to PredictionIO.
  client.acreate_event(
    'like',
    'user',
    user_id,
    { 'targetEntityType' => 'item', 'targetEntityId' => episode_id }
  )

  puts "Sent user ID #{user_id} liked episode ID #{episode_id} to PredictionIO. Action #{$INPUT_LINE_NUMBER} of #{line_count}."
end

In total the script takes about 4 minutes to run on a basic laptop. At this point all the data is now imported to Apache PredictionIO (incubating).

Import

Engine Training

We train the engine with the following command:

1
2
$ cd tapster-episode-similar
$ pio train -- --driver-memory 4g

PIO Train

Using the --driver-memory option to limit the memory used by Apache PredictionIO (incubating). Without this Apache PredictionIO (incubating) can consume too much memory leading to a crash. You can adjust the 4g up or down depending on your system specs.

You can set up a job to periodically retrain the engine so the model is updated with the latest dataset.

Deploy Model

You can deploy the model with: $ pio deploy from the tapster-episode-similar directory.

At this point, you have an demo app with data and a Apache PredictionIO (incubating) server with a trained model all setup. Next, we will connect the two so you can log the live interaction (likes) events into Apache PredictionIO (incubating) event server and query the engine server for recommendation.

Connect Demo app with Apache PredictionIO (incubating)

Overview

On a high level the application keeps a record of each like and dislike. It uses jQuery to send an array of both likes and dislikes to the server on each click. The server then queries Apache PredictionIO (incubating) for a similar episode which is relayed to jQuery and displayed to the user.

Data flow:

  • The user likes an episode.
  • Tapster sends the "Like" event to Apache PredictionIO (incubating) event server.
  • Tapster queries Apache PredictionIO (incubating) engine with all the episodes the user has rated (likes and dislikes) in this session.
  • Apache PredictionIO (incubating) returns 1 recommended episode.

JavaScript

All the important code lives in app/assets/javascripts/application.js View on GitHub

Most of this file is just handlers for click things, displaying the loading dialog and other such things.

The most important function is to query the Rails server for results from Apache PredictionIO (incubating).

1
2
3
4
5
6
7
8
9
10
11
12
13
14
// Query the server for a comic based on previous likes. See episodes#query.
queryPIO: function() {
  var _this = this; // For closure.
  $.ajax({
    url: '/episodes/query',
    type: 'POST',
    data: {
      likes: JSON.stringify(_this.likes),
      dislikes: JSON.stringify(_this.dislikes),
    }
  }).done(function(data) {
    _this.setComic(data);
  });
}

Rails

On the Rails side all the fun things happen in the episodes controller located at: app/controllers/episodes_controller View on GitHub.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
def query
  # Create PredictionIO client.
  client = PredictionIO::EngineClient.new(ENV['PIO_ENGINE_URL'])

  # Get posted likes and dislikes.
  likes = ActiveSupport::JSON.decode(params[:likes])
  dislikes = ActiveSupport::JSON.decode(params[:dislikes])

  if likes.empty?
    # We can't query PredictionIO with no likes so
    # we will return a random comic instead.
    @episode = random_episode

    render json: @episode
    return
  end

  # Query PredictionIO.
  # Here we black list the disliked items so they are not shown again!
  response = client.send_query(items: likes, blackList: dislikes,  num: 1)

  # With a real application you would want to do some
  # better sanity checking of the response here!

  # Get ID of response.
  id = response['itemScores'][0]['item']

  # Find episode in database.
  @episode = Episode.where(episode_id: id).take

  render json: @episode
end

On the first line we make a connection to Apache PredictionIO (incubating). You will need to set the PIO_ENGINE_URL. This can be done in the .env file. The default URL is: http://localhost:8000.

Next we decode the JSON sent from the browser.

After that we check to see if the user has liked anything yet. If not we just return a random episode.

If the user has likes then we can send that data to Apache PredictionIO (incubating) event server.

We also blacklist the dislikes so that they are not returned.

With our response from Apache PredictionIO (incubating) it’s just a matter of looking it up in the database and rendering that object as JSON.

Once the response is sent to the browser JavaScript is used to replace the existing comic and hide the loading message.

Thats it. You’re done! If Ruby is not your language of choice check out our other SDKs and remember you can always interact with the Event Server though it’s native JSON API.

Source code is on GitHub at: github.com/PredictionIO/Demo-Tapster

Conclusion

Love this tutorial and Apache PredictionIO (incubating)? Both are open source (Apache 2 License). Fork this demo and build upon it. If you produce something cool shoot us an email and we will link to it from here.

Found a typo? Think something should be explained better? This tutorial (and all our other documenation) live in the main repo here. Our documentation is in the livedoc branch. Find out how to contribute documentation at http://predictionio.incubator.apache.org/community/contribute-documentation/].

We ♥ pull requests!