This section explains how to model your application data as events.

Entity: it's the real world object involved in the events. The entity may perform the events, or interact with other entity (which became targetEntity in an event).

For example, your application may have users and some items which the user can interact with. Then you can model them as two entity types: user and item and the entityId can uniquely identify the entity within each entityType (e.g. user with ID 1, item with ID 1).

An entity may peform some events (e.g user 1 does something), and entity may have properties associated with it (e.g. user may have gender, age, email etc). Hence, events involve entities and there are two types of events, respectively:

  1. Generic events performed by an entity.
  2. Special events for recording changes of an entity's properties
  3. Batch events

They are explained in details below.

1. Generic events performed by an entity

Whenever the entity performs an action, you can describe such event as entity "verb" targetEntity with "some extra information". The "targetEntity" and "some extra information" can be optional. The "verb" can be used as the name of the "event". The "some extra information" can be recorded as properties of the event.

The following are some simple examples:

  • user-1 signs-up
1
2
3
4
5
{
  "event" : "sign-up",
  "entityType" : "user",
  "entityId" : "1"
}
  • user-1 views item-1 (with targetEntity)
1
2
3
4
5
6
7
{
  "event" : "view",
  "entityType" : "user",
  "entityId" : "1",
  "targetEntityType" : "item",
  "targetEntityId" : "1"
}
  • user-1 rates item-1 with rating of 4 stars (with targetEntity and properties)
1
2
3
4
5
6
7
8
9
10
{
  "event" : "rate",
  "entityType" : "user",
  "entityId" : "1",
  "targetEntityType" : "item",
  "targetEntityId" : "1",
  "properties" : {
    "rating" : 4
  }
}

2. Special events for recording changes of an entity's properties

The generic events described above are used to record general actions performed by the entity. However, an entity may have properties (or attributes) associated with it. Morever, the properties of the entity may change over time (for example, user may have new address, item may have new categories). In order to record such changes of an entity's properties. Special events $set , $unset and $delete are introduced.

The following special events are reserved for updating entities and their properties:

  • "$set" event: Set properties of an entity (also implicitly create the entity). To change properties of entity, you simply set the corresponding properties with value again. The $set events should be created only when:
    • The entity is first created (or re-create after $delete event), or
    • Set the entity's existing or new properties to new values (For example, user updates his email, user adds a phone number, item has a updated categories)
  • "$unset" event: Unset properties of an entity. It means treating the specified properties as not existing anymore. Note that the field properties cannot be empty for $unset event.
  • "$delete" event: delete the entity.

There is no targetEntityId for these special events.

For example, setting entity user-1's properties birthday and address:

1
2
3
4
5
6
7
8
9
{
  "event" : "$set",
  "entityType" : "user",
  "entityId" : "1",
  "properties" : {
    "birthday" : "1984-10-11",
    "address" : "1234 Street, San Francisco, CA 94107"
  }
}

Note that the properties values of the entity will be aggregated based on these special events and the eventTime. The state of the entity is different depending on the time you are looking at the data. In engine's DataSource, you can use PEventStore.aggregateProperties() API to retrieve the state of entity's properties (based on time).

Although it doesn't hurt to import duplicated special events for an entity (exactly same properties) into event server (it just means that the entity changes to the same state as before and new duplicated event provides no new information about the user), it could waste storage space.

To demonstrate the concept of these special events, we are going to import a sequence of events and see how it affects the retrieved entitiy's properties.

Assuming you have created the App (named "MyTestApp") for testing and Event Server is started.

Event 1

For example, on 2014-09-09T..., a user with ID "2" is newly added in your application. Also, this user has properties a = 3 and b = 4. To record such event, we can create a $set event for the user.

for convenience, assign the ACCESS_KEY of your test app to the shell variable ACCESS_KEY and run following curl command to import the event:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
$ ACCESS_KEY="<YOUR_ACCESS_KEY>"

$ curl -i -X POST http://localhost:7070/events.json?accessKey=$ACCESS_KEY \
-H "Content-Type: application/json" \
-d '{
  "event" : "$set",
  "entityType" : "user",
  "entityId" : "2",
  "properties" : {
    "a" : 3,
    "b" : 4
  },
  "eventTime" : "2014-09-09T16:17:42.937-08:00"
}'

You should see something like the following, meaning the events are imported successfully.

1
2
3
4
5
6
7
HTTP/1.1 201 Created
Server: spray-can/1.3.2
Date: Tue, 02 Jun 2015 23:13:58 GMT
Content-Type: application/json; charset=UTF-8
Content-Length: 57

{"eventId":"PVjOIP6AJ5PgsiGQW6pgswAAAUhc7EwZpCfSj5bS5yg"}a

After this eventTime, user-2 is created and has properties of a = 3 and b = 4.

Event 2

Then, on 2014-09-10T..., let's say the user has updated the properties b = 5 and c = 6. To record such propertiy change, create another $set event. Run the following command:

1
2
3
4
5
6
7
8
9
10
11
12
$ curl -i -X POST http://localhost:7070/events.json?accessKey=$ACCESS_KEY \
-H "Content-Type: application/json" \
-d '{
  "event" : "$set",
  "entityType" : "user",
  "entityId" : "2",
  "properties" : {
    "b" : 5,
    "c" : 6
  },
  "eventTime" : "2014-09-10T13:12:04.937-08:00"
}'

After this eventTime, user-2 has properties of a = 3, b = 5 and c = 6. Note that property b is updated with latest value.

Event 3

Then, let's say on 2014-09-11T..., the user's properties 'b' is removed for some reasons. To record such event, create $unset event for user-2 with properties b:

1
2
3
4
5
6
7
8
9
10
11
$ curl -i -X POST http://localhost:7070/events.json?accessKey=$ACCESS_KEY \
-H "Content-Type: application/json" \
-d '{
  "event" : "$unset",
  "entityType" : "user",
  "entityId" : "2",
  "properties" : {
    "b" : null
  },
  "eventTime" : "2014-09-11T14:17:42.456-08:00"
}'

After this eventTime, user-2 has properties of a = 3, and c = 6. Note that property b is removed.

Event 4

Then, on 2014-09-12T..., the user is removed from the application data. To record such event, create $delete event:

1
2
3
4
5
6
7
8
$ curl -i -X POST http://localhost:7070/events.json?accessKey=$ACCESS_KEY \
-H "Content-Type: application/json" \
-d '{
  "event" : "$delete",
  "entityType" : "user",
  "entityId" : "2",
  "eventTime" : "2014-09-12T16:13:41.452-08:00"
}'

After this eventTime, user-2 is removed.

Event 5

Then, on 2014-09-13T..., let's say we want to add back the user-2 into the application again for some reasons. To record such event, create $set event for user-2 with empty properties:

1
2
3
4
5
6
7
8
$ curl -i -X POST http://localhost:7070/events.json?accessKey=$ACCESS_KEY \
-H "Content-Type: application/json" \
-d '{
  "event" : "$set",
  "entityType" : "user",
  "entityId" : "2",
  "eventTime" : "2014-09-13T16:17:42.143-08:00"
}'

After this eventTime, user-2 is created again with empty properties.

Note that all above events are recorded in Event Store. Let's query Event Server and see if these events are imported.

Go to following URL with your browser:

http://localhost:7070/events.json?accessKey=<YOUR_ACCESS_KEY>

or run the following command in terminal:

1
$ curl -i -X GET "http://localhost:7070/events.json?accessKey=$ACCESS_KEY"

Note that you should quote the entire URL by using single or double quotes when you run the curl command.

You should see all events being created for this user-2.

Now, let's retrieve the user-2's properties using the PEventStore API.

First, start pio-shell by running:

1
$ pio-shell --with-spark

You should see the following output and shell prompt:

1
2
3
4
5
6
15/06/02 16:01:54 INFO SparkILoop: Created spark context..
Spark context available as sc.
15/06/02 16:01:54 INFO SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.

scala>

Run the following code in PIO shell (Replace "MyTestApp" with your app name):

1
2
3
scala> val appName="MyTestApp"
scala> import org.apache.predictionio.data.store.PEventStore
scala> PEventStore.aggregateProperties(appName=appName, entityType="user")(sc).collect()

This command is using PEventStore to aggregate the user properties as a Map of user Id and the PropertyMap. collect() will return the data as array. You should see the following output at the end, which indicates there is user id 2 with empty properties because that's the state of user 2 with all imported events taken into account.

1
2
res0: Array[(String, org.apache.predictionio.data.storage.PropertyMap)] =
Array((2,PropertyMap(Map(), 2014-09-09T16:17:42.937-08:00, 2014-09-13T16:17:42.143-08:00)))

Let's say we want to retrieve the state of user 2 properties with only events 1 and event 2 imported. To do that, we can specify the untilTime (aggregate the user properties with events up to the specified time) in the API.

Run the following in the pio-shell. the untilTime is set to DateTime(2014, 9, 11, 0, 0) which is the time right before event 3.

1
2
scala> import org.joda.time.DateTime
scala> PEventStore.aggregateProperties(appName=appName, entityType="user", untilTime=Some(new DateTime(2014, 9, 11, 0, 0)))(sc).collect()

You should see the following ouptut and the aggregated properties matches what we expected as described earlier (right befor event 3): user-2 has properties of a = 3, b = 5 and c = 6.

1
2
res2: Array[(String, org.apache.predictionio.data.storage.PropertyMap)] =
Array((2,PropertyMap(Map(b -> JInt(5), a -> JInt(3), c -> JInt(6)), 2014-09-09T16:17:42.937-08:00, 2014-09-10T13:12:04.937-08:00))

As you have seen in the example above, the state of user-2 is different depending on the available events or the time you are looking at the data. Recording events in logging fashioned allows us to re-construct the state the entity according to the time.

3. Batch Events to the EventServer

Using a different REST address on the usual EventServer port, as of PredictionIO 0.9.5 you can send batches of up to 50 events as a time. The format is as described above but the JSON payload is packaged as an array of Event objects.

Response:

  • Status:
    • 200 on success if we can return an array data in the response even when some events fail (e.g. because of ill-format). Client needs to check individual dictionary to verify all events were successfully created.
    • 400 otherwise. Perhaps exceeded 50 events?
  • Data: an array of dictionaries each of which contains either following keys
    • “status”: 201 if the event was successfully created; otherwise, 400.
    • "eventID": the value is the eventID if the event is successfully created and
    • "message": the error message string if any error occurs during creation

The order in the response array is corresponding to the order of the request array. However, the events might be imported in any order.

Sample Request:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
curl -i -X POST http://localhost:7070/batch/events.json?accessKey=...
-H "Content-Type: application/json" -d ‘ \
[
    {
        "event": "$create",
        "entityType": "user",
        "entityId": "uid",
        "properties": {
            ...
        }
    },
    {
        "event": "like",
        "entityType": "user",
        "entityId": "uid",
        "targetEntityType": "item",
        "targetEntityId": "iid",
        "properties": {
            ...
        }
        "eventTime": "2004-12-13T21:39:45.618-07:00"
    },
    ...
]

Sample Response:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
HTTP/1.1 200 Successful
Server: spray-can/1.2.1
Date: Wed, 10 Sep 2014 22:51:33 GMT
Content-Type: application/json; charset=UTF-8
Content-Length: 41
[
    {"eventId":"AAAABAAAAQDP3-jSlTMGVu0waj8"},
    {
        "status": 201,
        "eventId": "AAAABAAAAQDP3-jSlTMGVu0waj8"
    },
    {
        "status": 201,
        "eventId":"AAAABAAAAQDP3-jSlTMGVu0waj9"
    },
     …
    {
        "status": 400,
        "message":"Required entityType is missing”
    },
    …
]

Notice that each subrequest receives a status response. The limit of 50 events per batch requests is in line with Facebook, Mixpanel, SegmentIO and other event syncs that accept batches.