This project has retired. For details please refer to its Attic page.

Overview

This engine template provides personalized recommendation for e-commerce applications with the following features by default:

  • Exclude out-of-stock items
  • Provide recommendation to new users who sign up after the model is trained
  • Recommend unseen items only (configurable)
  • Recommend popular items if no information about the user is available (added in template version v0.4.0)

This template requires PredictionIO version >= 0.9.0

Usage

Event Data Requirements

By default, this template takes the following data from Event Server:

  • Users' view events
  • Users' buy events
  • Items with categories properties
  • Constraint unavailableItems set events

This template can easily be customized to consider more user events such as rate and like.

The view events are used as Training Data to train the model. The algorithm has a parameter unseenOnly; when this parameter is set to true, the engine recommends unseen items only. You can specify which events count as seen with the algorithm parameter seenEvents. The default values are view and buy, which means that by default the engine recommends only items the user has neither viewed nor bought. You can also define your own events to be treated as seen.

The constraint unavailableItems set events are used to exclude a list of unavailable items (such as out of stock) for all users in real time.
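To make the filtering rules above concrete, here is a minimal Python sketch of how unseenOnly, seenEvents, and the unavailableItems constraint could combine. It is an illustration only, not the template's actual implementation; the function names (seen_items_for, recommendable) and the simplified event dictionaries are hypothetical.

```python
def seen_items_for(user, events, seen_events=("view", "buy")):
    """Collect item IDs the user touched through any event named in seen_events."""
    return {
        e["targetEntityId"]
        for e in events
        if e["entityId"] == user and e["event"] in seen_events
    }


def recommendable(ranked_items, seen_items, unavailable_items, unseen_only=True):
    """Drop unavailable items and, when unseen_only is set, already-seen items."""
    excluded = set(unavailable_items)
    if unseen_only:
        excluded |= set(seen_items)
    # Preserve the original ranking order of the remaining items.
    return [item for item in ranked_items if item not in excluded]


# Hand-written sample data, for illustration only.
events = [
    {"event": "view", "entityId": "u0", "targetEntityId": "i0"},
    {"event": "buy", "entityId": "u0", "targetEntityId": "i1"},
    {"event": "view", "entityId": "u1", "targetEntityId": "i2"},
]
seen = seen_items_for("u0", events)
result = recommendable(["i0", "i1", "i2", "i3"], seen, unavailable_items=["i3"])
print(result)  # ['i2']
```

With unseen_only=False, only the unavailable item "i3" would be dropped from the ranking.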

Input Query

  • UserID
  • Num of items to be recommended
  • List of white-listed item categories (optional)
  • List of white-listed ItemIds (optional)
  • List of black-listed ItemIds (optional)

The template also supports whitelists and blacklists. If a whitelist is provided, the engine includes only those products in the recommendation. Likewise, if a blacklist is provided, the engine excludes those products from the recommendation.
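The query-time filters can be pictured with a short Python sketch. It assumes that an item passes the category filter when it shares at least one category with the query; the helper name apply_query_filters and the sample data are hypothetical, and the template's own logic may differ in detail.

```python
def apply_query_filters(ranked_items, item_categories, categories=None,
                        white_list=None, black_list=None):
    """Keep items that pass every filter that was actually supplied."""
    kept = []
    for item in ranked_items:
        if white_list is not None and item not in white_list:
            continue  # a whitelist was given and the item is not on it
        if black_list is not None and item in black_list:
            continue  # the item is explicitly excluded
        if categories is not None:
            # Assumption: one shared category is enough to match.
            if not set(categories) & set(item_categories.get(item, [])):
                continue
        kept.append(item)
    return kept


# Hand-written sample data, for illustration only.
item_categories = {"i1": ["c1"], "i2": ["c2", "c3"], "i3": ["c3"], "i4": ["c4"]}
ranked = ["i1", "i2", "i3", "i4"]
filtered = apply_query_filters(ranked, item_categories,
                               categories=["c3", "c4"], black_list=["i3"])
print(filtered)  # ['i2', 'i4']
```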

Output PredictedResult

  • A ranked list of recommended itemIDs

1. Install and Run PredictionIO

First, install PredictionIO 0.14.0 if you haven't done so already.

Let's say you have installed PredictionIO at /home/yourname/PredictionIO/. For convenience, add PredictionIO's binary command path to your PATH, i.e. /home/yourname/PredictionIO/bin:

$ PATH=$PATH:/home/yourname/PredictionIO/bin; export PATH

If you launched a PredictionIO AWS instance, the path is /opt/PredictionIO/bin.

Once you have completed the installation process, please make sure all the components (PredictionIO Event Server, Elasticsearch, and HBase) are up and running.

If you launched a PredictionIO AWS instance, you can skip pio-start-all. All components should have been started automatically.

If you are using PostgreSQL or MySQL, run the following to start PredictionIO Event Server:

$ pio eventserver &

If instead you are running HBase and Elasticsearch, run the following to start the PredictionIO Event Server, HBase, and Elasticsearch together:

$ pio-start-all

You can check the status by running:

$ pio status

If everything is OK, you should see the following output:

...

(sleeping 5 seconds for all messages to show up...)
Your system is all ready to go.

To further troubleshoot, please see FAQ - Using PredictionIO.

2. Create a new Engine from an Engine Template

Now let's create a new engine called MyECommerceRecommendation by downloading the E-Commerce Recommendation Engine Template. Go to a directory where you want to put your engine and run the following:

$ git clone https://github.com/apache/predictionio-template-ecom-recommender.git MyECommerceRecommendation
$ cd MyECommerceRecommendation

A new directory MyECommerceRecommendation is created, where you can find the downloaded engine template.

3. Generate an App ID and Access Key

You will need to create a new App in PredictionIO to store all the data of your app. The data collected will be used for machine learning modeling.

Let's assume you want to use this engine in an application named "MyApp1". Run the following to create a new app "MyApp1":

$ pio app new MyApp1

You should find the following in the console output:

...
[INFO] [App$] Initialized Event Store for this app ID: 1.
[INFO] [App$] Created new app:
[INFO] [App$]       Name: MyApp1
[INFO] [App$]         ID: 1
[INFO] [App$] Access Key: 3mZWDzci2D5YsqAnqNnXH9SB6Rg3dsTBs8iHkK6X2i54IQsIZI1eEeQQyMfs7b3F

Note that an App ID and an Access Key are created for this App "MyApp1". You will need the Access Key when you collect data with the Event Server for this App.

You can list all of the apps created, along with their corresponding IDs and Access Keys, by running the following command:

$ pio app list

You should see a list of apps created. For example:

[INFO] [App$]                 Name |   ID |                                                       Access Key | Allowed Event(s)
[INFO] [App$]               MyApp1 |    1 | 3mZWDzci2D5YsqAnqNnXH9SB6Rg3dsTBs8iHkK6X2i54IQsIZI1eEeQQyMfs7b3F | (all)
[INFO] [App$]               MyApp2 |    2 | io5lz6Eg4m3Xe4JZTBFE13GMAf1dhFl6ZteuJfrO84XpdOz9wRCrDU44EUaYuXq5 | (all)
[INFO] [App$] Finished listing 2 app(s).

4. Collecting Data

Next, let's collect training data for this Engine. By default, the E-Commerce Recommendation Engine Template supports 2 types of entities and 2 events: user and item; events view and buy. An item has the categories property, which is a list of category names (String). A user can view and buy an item. The special constraint entity with entityId unavailableItems defines a list of unavailable items and is taken into account in realtime during serving.

In summary, this template requires '$set' user events, '$set' item events, user-view-item events, user-buy-item events, and '$set' constraint events.

This template can easily be customized to consider other user-to-item events.

You can send these events to the PredictionIO Event Server in real time by making an HTTP request or through the provided SDK. Please see App Integration Overview for more details on how to integrate your app with the SDK.

Let's try sending events to the Event Server with the following curl commands (the corresponding SDK code follows each command).

Replace <ACCESS_KEY> with the Access Key generated in the steps above. Note that localhost:7070 is the default URL of the Event Server.

For convenience, store your access key in a shell variable by running:

$ ACCESS_KEY=<ACCESS_KEY>

For example, when a new user with id "u0" is created in your app at time 2014-11-02T09:39:45.618-08:00 (current time will be used if eventTime is not specified), you can send a $set event for this user. To send this event, run the following curl command:

$ curl -i -X POST "http://localhost:7070/events.json?accessKey=$ACCESS_KEY" \
-H "Content-Type: application/json" \
-d '{
  "event" : "$set",
  "entityType" : "user",
  "entityId" : "u0",
  "eventTime" : "2014-11-02T09:39:45.618-08:00"
}'

Python SDK:

import predictionio

client = predictionio.EventClient(
  access_key=<ACCESS KEY>,
  url=<URL OF EVENTSERVER>,
  threads=5,
  qsize=500
)

# Create a new user

client.create_event(
  event="$set",
  entity_type="user",
  entity_id=<USER_ID>
)

PHP SDK:

<?php
require_once("vendor/autoload.php");
use predictionio\EventClient;

$client = new EventClient(<ACCESS KEY>, <URL OF EVENTSERVER>);

// Create a new user
$client->createEvent(array(
  'event' => '$set',
  'entityType' => 'user',
  'entityId' => <USER ID>
));

// Create a new item or set existing item's categories
$client->createEvent(array(
  'event' => '$set',
  'entityType' => 'item',
  'entityId' => <ITEM ID>,
  'properties' => array('categories' => array('<CATEGORY_1>', '<CATEGORY_2>'))
));
?>

Ruby SDK:

# Create a client object.
client = PredictionIO::EventClient.new(<ACCESS KEY>, <URL OF EVENTSERVER>)

# Create a new user
client.create_event(
  '$set',
  'user',
  <USER ID>
)

Java SDK:

import org.apache.predictionio.Event;
import org.apache.predictionio.EventClient;

import com.google.common.collect.ImmutableList;

EventClient client = new EventClient(<ACCESS KEY>, <URL OF EVENTSERVER>);

// Create a new user
Event userEvent = new Event()
  .event("$set")
  .entityType("user")
  .entityId(<USER_ID>);
client.createEvent(userEvent);

When a new item "i0" is created in your app at time 2014-11-02T09:39:45.618-08:00 (current time will be used if eventTime is not specified), you can send a $set event for the item. Note that the item is set with the categories property: "c1" and "c2". Run the following curl command:

$ curl -i -X POST "http://localhost:7070/events.json?accessKey=$ACCESS_KEY" \
-H "Content-Type: application/json" \
-d '{
  "event" : "$set",
  "entityType" : "item",
  "entityId" : "i0",
  "properties" : {
    "categories" : ["c1", "c2"]
  },
  "eventTime" : "2014-11-02T09:39:45.618-08:00"
}'

Python SDK:

# Create a new item or set existing item's categories

client.create_event(
  event="$set",
  entity_type="item",
  entity_id=item_id,
  properties={
    "categories" : ["<CATEGORY_1>", "<CATEGORY_2>"]
  }
)

PHP SDK:

<?php
// Create a new item or set existing item's categories
$client->createEvent(array(
  'event' => '$set',
  'entityType' => 'item',
  'entityId' => <ITEM ID>,
  'properties' => array('categories' => array('<CATEGORY_1>', '<CATEGORY_2>'))
));
?>

Ruby SDK:

# Create a new item or set existing item's categories
client.create_event(
  '$set',
  'item',
  <ITEM ID>, {
    'properties' => { 'categories' => ['<CATEGORY_1>', '<CATEGORY_2>'] }
  }
)

Java SDK:

// Create a new item or set existing item's categories
Event itemEvent = new Event()
  .event("$set")
  .entityType("item")
  .entityId(<ITEM_ID>)
  .property("categories", ImmutableList.of("<CATEGORY_1>", "<CATEGORY_2>"));
client.createEvent(itemEvent);

The properties of the user and item can be set, unset, or deleted by the special events $set, $unset and $delete. Please refer to Event API for more details on using these events.

When the user "u0" views item "i0" at time 2014-11-10T12:34:56.123-08:00 (current time will be used if eventTime is not specified), you can send a view event. Run the following curl command:

$ curl -i -X POST "http://localhost:7070/events.json?accessKey=$ACCESS_KEY" \
-H "Content-Type: application/json" \
-d '{
  "event" : "view",
  "entityType" : "user",
  "entityId" : "u0",
  "targetEntityType" : "item",
  "targetEntityId" : "i0",
  "eventTime" : "2014-11-10T12:34:56.123-08:00"
}'

Python SDK:

# A user views an item

client.create_event(
  event="view",
  entity_type="user",
  entity_id=<USER ID>,
  target_entity_type="item",
  target_entity_id=<ITEM ID>
)

PHP SDK:

<?php
// A user views an item
$client->createEvent(array(
   'event' => 'view',
   'entityType' => 'user',
   'entityId' => <USER ID>,
   'targetEntityType' => 'item',
   'targetEntityId' => <ITEM ID>
));
?>

Ruby SDK:

# A user views an item.
client.create_event(
  'view',
  'user',
  <USER ID>, {
    'targetEntityType' => 'item',
    'targetEntityId' => <ITEM ID>
  }
)

Java SDK:

// A user views an item
Event viewEvent = new Event()
    .event("view")
    .entityType("user")
    .entityId(<USER_ID>)
    .targetEntityType("item")
    .targetEntityId(<ITEM_ID>);
client.createEvent(viewEvent);

When the user "u0" buys item "i0" at time 2014-11-10T13:00:00.123-08:00 (current time will be used if eventTime is not specified), you can send a buy event. Run the following curl command:

$ curl -i -X POST "http://localhost:7070/events.json?accessKey=$ACCESS_KEY" \
-H "Content-Type: application/json" \
-d '{
  "event" : "buy",
  "entityType" : "user",
  "entityId" : "u0",
  "targetEntityType" : "item",
  "targetEntityId" : "i0",
  "eventTime" : "2014-11-10T13:00:00.123-08:00"
}'

Python SDK:

# A user buys an item

client.create_event(
  event="buy",
  entity_type="user",
  entity_id=<USER ID>,
  target_entity_type="item",
  target_entity_id=<ITEM ID>
)

PHP SDK:

<?php
// A user buys an item
$client->createEvent(array(
   'event' => 'buy',
   'entityType' => 'user',
   'entityId' => <USER ID>,
   'targetEntityType' => 'item',
   'targetEntityId' => <ITEM ID>
));
?>

Ruby SDK:

# A user buys an item.
client.create_event(
  'buy',
  'user',
  <USER ID>, {
    'targetEntityType' => 'item',
    'targetEntityId' => <ITEM ID>
  }
)

Java SDK:

// A user buys an item
Event buyEvent = new Event()
    .event("buy")
    .entityType("user")
    .entityId(<USER_ID>)
    .targetEntityType("item")
    .targetEntityId(<ITEM_ID>);
client.createEvent(buyEvent);
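The $set, view, and buy events above all share one JSON shape, so they can also be sent without an SDK. The following is a minimal sketch using only the Python standard library; build_event and post_event are hypothetical helper names, and the server URL assumes the default localhost:7070 mentioned earlier.

```python
import json
import urllib.parse
import urllib.request


def build_event(event, entity_type, entity_id, target_entity_type=None,
                target_entity_id=None, properties=None, event_time=None):
    """Assemble the JSON body for one event, omitting unset optional fields."""
    body = {"event": event, "entityType": entity_type, "entityId": entity_id}
    if target_entity_type is not None:
        body["targetEntityType"] = target_entity_type
        body["targetEntityId"] = target_entity_id
    if properties is not None:
        body["properties"] = properties
    if event_time is not None:
        body["eventTime"] = event_time
    return body


def post_event(body, access_key, server="http://localhost:7070"):
    """POST one event to the Event Server and return the parsed JSON reply."""
    url = server + "/events.json?" + urllib.parse.urlencode({"accessKey": access_key})
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


user_event = build_event("$set", "user", "u0",
                         event_time="2014-11-02T09:39:45.618-08:00")
view_event = build_event("view", "user", "u0", "item", "i0")
print(json.dumps(user_event, indent=2))
# post_event(user_event, access_key="<ACCESS_KEY>")  # needs a running Event Server
```

The call to post_event is left commented out because it requires a running Event Server and a valid Access Key.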

Query Event Server

Now let's query the EventServer and see if these events are imported successfully.

Go to the following URL with your browser:

http://localhost:7070/events.json?accessKey=<YOUR_ACCESS_KEY>

or run the following command in terminal:

$ curl -i -X GET "http://localhost:7070/events.json?accessKey=$ACCESS_KEY"

Note that you should quote the entire URL by using single or double quotes when you run the curl command.

It should return the imported events in JSON format. You can refer to Event Server Debugging Recipes for other ways to query the Event Server.
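One quick way to check the import is to tally the returned events by name. The sketch below assumes the response body is a JSON array of event objects, each with an "event" field; count_events is a hypothetical helper and the sample payload is hand-written rather than real server output.

```python
import json
from collections import Counter


def count_events(response_text):
    """Count events by their "event" name in a JSON array of event objects."""
    events = json.loads(response_text)
    return Counter(e["event"] for e in events)


# Hand-written stand-in for a response body, for illustration only.
sample = json.dumps([
    {"event": "$set", "entityType": "user", "entityId": "u0"},
    {"event": "$set", "entityType": "item", "entityId": "i0"},
    {"event": "view", "entityType": "user", "entityId": "u0",
     "targetEntityType": "item", "targetEntityId": "i0"},
    {"event": "buy", "entityType": "user", "entityId": "u0",
     "targetEntityType": "item", "targetEntityId": "i0"},
])
counts = count_events(sample)
print(counts["$set"], counts["view"], counts["buy"])  # 2 1 1
```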

Import More Sample Data

This engine requires more data in order to train a useful model. Instead of sending more events one by one in real time, for the purposes of this quickstart we are going to use a script to import more events in batch.

A Python import script, import_eventserver.py, is provided to import sample data. It imports 10 users (with user IDs "u1" to "u10") and 50 items (with item IDs "i1" to "i50") with some randomly assigned categories ("c1" to "c6"). Each user then randomly views 10 items.
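To give a feel for the shape of such sample data, here is a rough Python sketch that generates comparable events in memory. It is not the actual import_eventserver.py: the function name generate_sample_events, the fixed seed, and the choice of one to four categories per item are assumptions made for illustration, and it does not send anything to the Event Server.

```python
import random


def generate_sample_events(seed=1):
    """Build $set events for 10 users and 50 items, plus 10 views per user."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same data
    categories = ["c%d" % i for i in range(1, 7)]
    users = ["u%d" % i for i in range(1, 11)]
    items = ["i%d" % i for i in range(1, 51)]

    events = []
    for user in users:
        events.append({"event": "$set", "entityType": "user", "entityId": user})
    for item in items:
        events.append({
            "event": "$set",
            "entityType": "item",
            "entityId": item,
            # Assumption: each item gets between one and four categories.
            "properties": {"categories": rng.sample(categories, rng.randint(1, 4))},
        })
    for user in users:
        for item in rng.sample(items, 10):  # 10 distinct items per user
            events.append({
                "event": "view",
                "entityType": "user",
                "entityId": user,
                "targetEntityType": "item",
                "targetEntityId": item,
            })
    return events


events = generate_sample_events()
print(len(events))  # 160
```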

First, you will need to install the Python SDK in order to run the sample data import script. To install the Python SDK, run:

$ pip install predictionio

or

$ easy_install predictionio

You may need sudo access if you have a permission issue (i.e., sudo pip install predictionio).

Make sure you are under the MyECommerceRecommendation directory. Execute the following to import the data:

$ cd MyECommerceRecommendation
$ python data/import_eventserver.py --access_key $ACCESS_KEY

You should see the following output:

...
User u10 buys item i14
User u10 views item i46
User u10 buys item i46
User u10 views item i30
User u10 buys item i30
User u10 views item i40
User u10 buys item i40
204 events are imported.

If you see the error TypeError: __init__() got an unexpected keyword argument 'access_key', please update the Python SDK to the latest version.

You can query the event server again as described previously to check the imported events.

5. Deploy the Engine as a Service

Now you can build, train, and deploy the engine. First, make sure you are under the MyECommerceRecommendation directory.

$ cd MyECommerceRecommendation

Engine.json

Under the directory, you should find an engine.json file; this is where you specify parameters for the engine.

Modify this file to make sure the appName parameter matches the App Name you created earlier (e.g. "MyApp1" if you follow the quickstart).

  ...
  "datasource": {
    "params" : {
      "appName": "MyApp1"
    }
  },
  ...

You may see appId in engine.json instead, which means you are using an old template. In this case, make sure the appId defined in the file matches your App ID. Alternatively, you can download the latest version of the template or follow our upgrade instructions to modify the template to use appName as the parameter.

Note that the "algorithms" section also has an appName parameter, which you need to modify to match your App Name as well:

  ...
  "algorithms": [
    {
      "name": "als",
      "params": {
        "appName": "MyApp1",
        ...
      }
    }
  ]
  ...
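Since appName appears in two places, a small check can catch a mismatch before building. The Python sketch below assumes the engine.json layout shown in the two excerpts above (a "datasource" object and an "algorithms" list, each with "params"); the helper names app_names and mismatches are hypothetical, and the inline config is a trimmed stand-in for a real file.

```python
import json


def app_names(engine_config):
    """Map each engine.json section to the appName it declares (None if absent)."""
    names = {"datasource": engine_config["datasource"]["params"].get("appName")}
    for algo in engine_config.get("algorithms", []):
        names["algorithms." + algo["name"]] = algo["params"].get("appName")
    return names


def mismatches(engine_config, expected):
    """List the sections whose appName differs from the expected App Name."""
    return sorted(k for k, v in app_names(engine_config).items() if v != expected)


# Trimmed stand-in for engine.json; a real file contains more parameters.
config = json.loads("""
{
  "datasource": {"params": {"appName": "MyApp1"}},
  "algorithms": [{"name": "als", "params": {"appName": "MyApp1"}}]
}
""")
print(mismatches(config, "MyApp1"))  # []
print(mismatches(config, "MyApp2"))  # ['algorithms.als', 'datasource']
```

To check a real file, load it with json.load(open("engine.json")) instead of the inline string.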

Building

Start with building your MyECommerceRecommendation engine. Run the following command:

$ pio build --verbose

This command should take a few minutes the first time; all subsequent builds should take less than a minute. You can also run it without --verbose if you don't want to see all the log messages.

Upon successful build, you should see a console message similar to the following.

[INFO] [Console$] Your engine is ready for training.

Training the Predictive Model

To train your engine, run the following command:

$ pio train

When your engine is trained successfully, you should see a console message similar to the following.

[INFO] [CoreWorkflow$] Training completed successfully.

Deploying the Engine

Now your engine is ready to deploy. Run:

$ pio deploy

When the engine is deployed successfully and running, you should see a console message similar to the following:

[INFO] [HttpListener] Bound to /0.0.0.0:8000
[INFO] [MasterActor] Bind successful. Ready to serve.

Do not kill the deployed engine process.

By default, the deployed engine binds to http://localhost:8000. You can visit that page in your web browser to check its status.

6. Use the Engine

Now you can retrieve predicted results. To recommend 4 items to user ID "u1", send the JSON { "user": "u1", "num": 4 } to the deployed engine; it returns a JSON of the recommended items. Send the query by making an HTTP request or through the EngineClient of an SDK.

With the deployed engine running, open another terminal and run the following curl command or use SDK to send the query:

$ curl -H "Content-Type: application/json" \
-d '{ "user": "u1", "num": 4 }' \
http://localhost:8000/queries.json

Python SDK:

import predictionio
engine_client = predictionio.EngineClient(url="http://localhost:8000")
print(engine_client.send_query({"user": "u1", "num": 4}))

PHP SDK:

<?php
require_once("vendor/autoload.php");
use predictionio\EngineClient;

$client = new EngineClient('http://localhost:8000');

$response = $client->sendQuery(array('user'=> 'u1', 'num'=> 4));
print_r($response);

?>

Ruby SDK:

# Create client object.
client = PredictionIO::EngineClient.new('http://localhost:8000')

# Query PredictionIO.
response = client.send_query('user' => 'u1', 'num' => 4)

puts response

Java SDK:

import com.google.common.collect.ImmutableMap;
import com.google.common.collect.ImmutableList;
import com.google.gson.JsonObject;

import org.apache.predictionio.EngineClient;

// create client object
EngineClient engineClient = new EngineClient("http://localhost:8000");

// query

JsonObject response = engineClient.sendQuery(ImmutableMap.<String, Object>of(
        "user", "u1",
        "num",  4
    ));

The following is a sample JSON response:

{
  "itemScores":[
    {"item":"i4","score":0.006009267718658978},
    {"item":"i33","score":0.005999267822052033},
    {"item":"i14","score":0.005261309429391667},
    {"item":"i3","score":0.003007015026561692}
  ]
}
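A client usually needs only the item IDs in rank order. The following Python sketch, using only the standard library, shows one way to send the query and unpack a response shaped like the sample above; send_query and ranked_item_ids are hypothetical helper names, and the URL assumes the default http://localhost:8000 binding.

```python
import json
import urllib.request


def send_query(query, url="http://localhost:8000/queries.json"):
    """POST a query to the deployed engine and return the raw response text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def ranked_item_ids(response_text):
    """Return item IDs from an itemScores response, highest score first."""
    scores = json.loads(response_text)["itemScores"]
    ordered = sorted(scores, key=lambda s: s["score"], reverse=True)
    return [s["item"] for s in ordered]


# The sample response shown above, reused so the sketch runs offline.
sample_response = """
{
  "itemScores":[
    {"item":"i4","score":0.006009267718658978},
    {"item":"i33","score":0.005999267822052033},
    {"item":"i14","score":0.005261309429391667},
    {"item":"i3","score":0.003007015026561692}
  ]
}
"""
print(ranked_item_ids(sample_response))  # ['i4', 'i33', 'i14', 'i3']
# ranked_item_ids(send_query({"user": "u1", "num": 4}))  # needs a deployed engine
```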

MyECommerceRecommendation is now running.

To update the model periodically with new data, simply set up a cron job to call pio train and pio deploy. The engine will continue to serve prediction results during the re-train process. After the training is completed, pio deploy will automatically shut down the existing engine server and bring up a new process on the same port.

Note that if you import a large data set and the training seems to be taking forever or getting stuck, it's likely that there is not enough executor memory. It's recommended to set up a Spark standalone cluster and to specify more driver and executor memory when training with a large data set. Please see the FAQ for instructions.

Setting constraint "unavailableItems"

Now let's send an item constraint "unavailableItems" (replace accessKey with your Access Key).

You can also use an SDK to send this event, as in the earlier SDK samples.

$ curl -i -X POST "http://localhost:7070/events.json?accessKey=$ACCESS_KEY" \
-H "Content-Type: application/json" \
-d '{
  "event" : "$set",
  "entityType" : "constraint",
  "entityId" : "unavailableItems",
  "properties" : {
    "items": ["i4", "i14", "i11"]
  },
  "eventTime" : "2015-02-17T02:11:21.934Z"
}'

Python SDK:

# Set a list of unavailable items

client.create_event(
  event="$set",
  entity_type="constraint",
  entity_id="unavailableItems",
  properties={
    "items" : ["<ITEM ID1>", "<ITEM ID2>"]
  }
)

PHP SDK:

<?php
// Set a list of unavailable items
$client->createEvent(array(
  'event' => '$set',
  'entityType' => 'constraint',
  'entityId' => 'unavailableItems',
  'properties' => array('items' => array('<ITEM ID1>', '<ITEM ID2>'))
));
?>

Ruby SDK:

# Set a list of unavailable items
client.create_event(
  '$set',
  'constraint',
  'unavailableItems', {
    'properties' => { 'items' => ['<ITEM ID1>', '<ITEM ID2>'] }
  }
)

Java SDK:

// Set a list of unavailable items
Event itemEvent = new Event()
  .event("$set")
  .entityType("constraint")
  .entityId("unavailableItems")
  .property("items", ImmutableList.of("<ITEM ID1>", "<ITEM ID2>"));
client.createEvent(itemEvent);

Try to get recommendations for user u1 again. The unavailable items (e.g. i4, i14, i11) won't be recommended anymore:

$ curl -H "Content-Type: application/json" \
-d '{
  "user": "u1",
  "num": 4,
  "blackList": ["i21", "i26", "i40"]
}' \
http://localhost:8000/queries.json

{"itemScores":[{"item":"i33","score":0.005999267822052019},{"item":"i3","score":0.0030070150265619003},{"item":"i2","score":0.0028489173099429527},{"item":"i5","score":0.0028489173099429527}]}

You should send the full list of unavailable items whenever the list changes. The latest event is used.

When there are no more unavailable items, simply set an empty list, i.e.:

$ curl -i -X POST "http://localhost:7070/events.json?accessKey=$ACCESS_KEY" \
-H "Content-Type: application/json" \
-d '{
  "event" : "$set",
  "entityType" : "constraint",
  "entityId" : "unavailableItems",
  "properties" : {
    "items": []
  },
  "eventTime" : "2015-02-18T02:11:21.934Z"
}'
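Because only the latest constraint event counts, each update replaces the previous list rather than adding to it. The Python sketch below illustrates that "latest event wins" rule on the two events sent above; current_unavailable_items and parse_time are hypothetical helpers, and this models the described behavior rather than reproducing the engine's code.

```python
from datetime import datetime


def parse_time(text):
    """Parse an ISO 8601 timestamp; map a trailing Z to an explicit UTC offset."""
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def current_unavailable_items(constraint_events):
    """Return the item list of the most recent unavailableItems $set event."""
    if not constraint_events:
        return []
    latest = max(constraint_events, key=lambda e: parse_time(e["eventTime"]))
    return latest["properties"]["items"]


# The two constraint events from the curl commands above, reduced to the
# fields this sketch needs.
events = [
    {"eventTime": "2015-02-17T02:11:21.934Z",
     "properties": {"items": ["i4", "i14", "i11"]}},
    {"eventTime": "2015-02-18T02:11:21.934Z",
     "properties": {"items": []}},
]
print(current_unavailable_items(events[:1]))  # ['i4', 'i14', 'i11']
print(current_unavailable_items(events))      # []
```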

Advanced Query

In addition, the query supports the following optional parameters: categories, whiteList, and blackList.

Recommend items in selected categories:

$ curl -H "Content-Type: application/json" \
-d '{
  "user": "u1",
  "num": 4,
  "categories" : ["c4", "c3"]
}' \
http://localhost:8000/queries.json

{"itemScores":[{"item":"i4","score":0.006009267718658978},{"item":"i33","score":0.005999267822052033},{"item":"i14","score":0.005261309429391667},{"item":"i2","score":0.002848917309942939}]}

Recommend items in the whiteList:

$ curl -H "Content-Type: application/json" \
-d '{
  "user": "u1",
  "num": 4,
  "whiteList": ["i1", "i2", "i3", "i21", "i22", "i23", "i24", "i25"]
}' \
http://localhost:8000/queries.json

{"itemScores":[{"item":"i3","score":0.003007015026561692},{"item":"i2","score":0.002848917309942939},{"item":"i23","score":0.0016857619403278443},{"item":"i25","score":1.3707548965227745E-4}]}

Recommend items not in the blackList:

$ curl -H "Content-Type: application/json" \
-d '{
  "user": "u1",
  "num": 4,
  "categories" : ["c4", "c3"],
  "blackList": ["i21", "i26", "i40"]
}' \
http://localhost:8000/queries.json

{"itemScores":[{"item":"i4","score":0.006009267718658978},{"item":"i33","score":0.005999267822052033},{"item":"i14","score":0.005261309429391667},{"item":"i2","score":0.002848917309942939}]}
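The optional parameters can be combined freely, so it may help to assemble the query body in one place. Here is a small Python sketch that builds the JSON for the queries above and leaves out any filter that was not supplied; build_query is a hypothetical helper, while the field names (user, num, categories, whiteList, blackList) are the ones used in the curl examples.

```python
import json


def build_query(user, num, categories=None, white_list=None, black_list=None):
    """Build a query body, including only the optional filters that were given."""
    query = {"user": user, "num": num}
    if categories is not None:
        query["categories"] = list(categories)
    if white_list is not None:
        query["whiteList"] = list(white_list)
    if black_list is not None:
        query["blackList"] = list(black_list)
    return query


query = build_query("u1", 4, categories=["c4", "c3"],
                    black_list=["i21", "i26", "i40"])
print(json.dumps(query))
# {"user": "u1", "num": 4, "categories": ["c4", "c3"], "blackList": ["i21", "i26", "i40"]}
```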

Next: DASE Components Explained