If you have a large amount of data to start with, performing batch import will be much faster than sending every event over an HTTP connection.

Preparing Input File

The import tool expects its input to be a file stored either in the local filesystem or on HDFS. Each line of the file should be a JSON object string representing an event. For more information about the format of event JSON object, please refer to this page.

Shown below is an example that contains 5 events ready to be imported to the Event Server.

1
2
3
4
5
{"event":"buy","entityType":"user","entityId":"3","targetEntityType":"item","targetEntityId":"0","eventTime":"2014-11-21T01:04:14.716Z"}
{"event":"buy","entityType":"user","entityId":"3","targetEntityType":"item","targetEntityId":"1","eventTime":"2014-11-21T01:04:14.722Z"}
{"event":"rate","entityType":"user","entityId":"3","targetEntityType":"item","targetEntityId":"2","properties":{"rating":1.0},"eventTime":"2014-11-21T01:04:14.729Z"}
{"event":"buy","entityType":"user","entityId":"3","targetEntityType":"item","targetEntityId":"7","eventTime":"2014-11-21T01:04:14.735Z"}
{"event":"buy","entityType":"user","entityId":"3","targetEntityType":"item","targetEntityId":"8","eventTime":"2014-11-21T01:04:14.741Z"}

Please make sure your import file does not contain any empty lines. Empty lines will be treated as a null object and will return an error during import.

Use SDK to Prepare Batch Input File

Some of the Apache PredictionIO (incubating) SDKs also provides FileExporter client. You may use them to prepare the JSON file as described above. The FileExporter creates event in the same way as EventClient except that the events are written to a JSON file instead of being sent to EventSever. The written JSON file can then be used by batch import.

(coming soon)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
import predictionio
from datetime import datetime
import pytz

# Create a FileExporter and specify "my_events.json" as destination file
exporter = predictionio.FileExporter(file_name="my_events.json")

event_properties = {
    "someProperty" : "value1",
    "anotherProperty" : "value2",
    }
# write the events to a file
event_response = exporter.create_event(
    event="my_event",
    entity_type="user",
    entity_id="uid",
    target_entity_type="item",
    target_entity_id="iid",
    properties=event_properties,
    event_time=datetime(2014, 12, 13, 21, 38, 45, 618000, pytz.utc))

# ...

# close the FileExporter when finish writing all events
exporter.close()

(coming soon)
1
(coming soon)

Import Events from Input File

Importing events from a file can be done easily using the command line interface. Assuming that pio be in your search path, your App ID be 123, and the input file my_events.json be in your current working directory:

1
$ pio import --appid 123 --input my_events.json

After a brief while, the tool should return to the console without any error. Congratulations! You have successfully imported your events.