Installing Apache PredictionIO®

This guide assumes the directory layout used in its examples. Wherever you see /home/abc, replace it with your own home directory.

Downloading Binary Distribution

You can use the pre-built binary distribution of Apache PredictionIO® if you are building against:

Scala 2.11.12
Spark 2.1.3
Hadoop 2.7.7
Elasticsearch 5.6.9

Download the binary release from an Apache mirror.

Verifying Release

Verify the binary release using the signatures, checksums, and project release KEYS.

$ gpg --import KEYS
$ gpg --verify apache-predictionio-0.14.0-bin.tar.gz.asc apache-predictionio-0.14.0-bin.tar.gz
 

You should see something like this.

gpg: Signature made Tue Sep 26 22:55:22 2017 PDT
gpg:                using RSA key 7E2363D84719A8F4
gpg: Good signature from "Chan Lee <chanlee@apache.org>" [ultimate]
 

For further details, see the official guide from Apache, which has the most up-to-date and complete information.
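The signature check above covers the signatures and KEYS; the checksum half can be sketched as follows. This is a minimal example, assuming GNU coreutils `sha512sum` is installed (on macOS, `shasum -a 512` is the usual substitute). The function name `verify_sha512` is hypothetical, and the expected digest is whatever hex string is published alongside the release.

```shell
# Compare a file's SHA-512 digest with an expected hex string.
# verify_sha512 is a hypothetical helper, not part of PredictionIO.
verify_sha512() {
  vs_file=$1
  # Normalize the expected digest: drop spaces/newlines, lowercase hex.
  vs_expected=$(printf '%s' "$2" | tr -d ' \n' | tr 'A-F' 'a-f')
  vs_actual=$(sha512sum "$vs_file" | awk '{print $1}')
  if [ "$vs_actual" = "$vs_expected" ]; then
    echo "OK: $vs_file"
  else
    echo "MISMATCH: $vs_file" >&2
    return 1
  fi
}
```

For example, `verify_sha512 apache-predictionio-0.14.0-bin.tar.gz "<digest copied from the release page>"` prints an OK line when the digests agree and returns a non-zero status otherwise.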

Installation

Extract the binary distribution and proceed to Installing Dependencies.

$ tar zxvf apache-predictionio-0.14.0-bin.tar.gz

Downloading Source Code

Download the source release from an Apache mirror.

Verifying Release

Verify the source release using the signatures, checksums, and project release KEYS.

$ gpg --import KEYS
$ gpg --verify apache-predictionio-0.14.0.tar.gz.asc apache-predictionio-0.14.0.tar.gz
 

You should see something like this.

gpg: Signature made Tue Sep 26 22:55:22 2017 PDT
gpg:                using RSA key 7E2363D84719A8F4
gpg: Good signature from "Chan Lee <chanlee@apache.org>" [ultimate]
 

For further details, see the official guide from Apache, which has the most up-to-date and complete information.

Building

Run the following in the directory where you downloaded the source code to build Apache PredictionIO®. For example, to build PredictionIO with support for Scala 2.11.12, Spark 2.4.0, and Elasticsearch 6.4.2, run:

$ tar zxvf apache-predictionio-0.14.0.tar.gz
$ cd apache-predictionio-0.14.0
$ ./make-distribution.sh -Dscala.version=2.11.12 -Dspark.version=2.4.0 -Delasticsearch.version=6.4.2
 

You should see something like the following when it finishes building successfully.

...
PredictionIO-0.14.0/sbt/sbt
PredictionIO-0.14.0/conf/
PredictionIO-0.14.0/conf/pio-env.sh
PredictionIO binary distribution created at PredictionIO-0.14.0.tar.gz
 

Extract the binary distribution you have just built.

$ tar zxvf PredictionIO-0.14.0.tar.gz

Building against Different Versions of Dependencies

Starting from version 0.11.0, PredictionIO can be built against different versions of its dependencies. At the time of writing, PredictionIO can be built against the following versions:

Scala 2.11.x
Spark 2.0.x, 2.1.x, 2.2.x, 2.3.x, 2.4.x
Hadoop 2.6.x, 2.7.x
Elasticsearch 1.7.x (deprecated), 5.6.x, 6.x
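The version ranges above can be encoded as a quick pre-build check. The sketch below is illustrative only: `is_supported` is a hypothetical helper, and its patterns simply mirror the list in this section rather than anything the build script enforces.

```shell
# Return 0 if the given component version falls inside the ranges
# listed in this section, 1 otherwise. Hypothetical helper.
is_supported() {
  case "$1:$2" in
    scala:2.11.*) return 0 ;;
    spark:2.[0-4].*) return 0 ;;
    hadoop:2.[67].*) return 0 ;;
    elasticsearch:1.7.*|elasticsearch:5.6.*|elasticsearch:6.*) return 0 ;;
    *) return 1 ;;
  esac
}
```

A call such as `is_supported spark 2.4.0 && is_supported elasticsearch 6.4.2` succeeds for the combination used in the build example, while `is_supported spark 3.0.0` fails.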

Installing Dependencies

Let us install dependencies inside a subdirectory of the Apache PredictionIO installation. By following this convention, you can use Apache PredictionIO's default configuration as is.

$ mkdir PredictionIO-0.14.0/vendors

Spark Setup

Apache Spark is the default processing engine for PredictionIO. Download and extract it.

$ wget https://archive.apache.org/dist/spark/spark-2.4.0/spark-2.4.0-bin-hadoop2.7.tgz
$ tar zxvfC spark-2.4.0-bin-hadoop2.7.tgz PredictionIO-0.14.0/vendors
 

If you decide to install Apache Spark to another location, you must edit PredictionIO-0.14.0/conf/pio-env.sh and change the SPARK_HOME variable to point to your own Apache Spark installation.
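If you prefer to script the change rather than edit the file by hand, the following is a minimal sketch. It assumes that conf/pio-env.sh holds the setting as a plain, uncommented `SPARK_HOME=...` line; `set_conf_var` is a hypothetical helper, not a PredictionIO tool.

```shell
# Set KEY=VALUE in a shell-style config file: replace the existing
# uncommented "KEY=" line, or append one if none exists.
# Note: awk -v interprets backslash escapes in the value.
set_conf_var() {
  conf_file=$1; conf_key=$2; conf_value=$3
  conf_tmp=$(mktemp) || return 1
  awk -v k="$conf_key" -v v="$conf_value" '
    index($0, k "=") == 1 { print k "=" v; found = 1; next }
    { print }
    END { if (!found) print k "=" v }
  ' "$conf_file" > "$conf_tmp" &&
    cat "$conf_tmp" > "$conf_file"   # cat keeps the file's permissions
  conf_status=$?
  rm -f "$conf_tmp"
  return $conf_status
}
```

For example, `set_conf_var PredictionIO-0.14.0/conf/pio-env.sh SPARK_HOME /opt/spark-2.4.0-bin-hadoop2.7` points PredictionIO at a Spark installation under /opt (an illustrative path).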

Storage Setup

PostgreSQL Setup

You may skip this section if you are not using PostgreSQL.

PostgreSQL can be used by PredictionIO as a storage backend for all 3 repositories (event data, meta data, and model data). This is perhaps the easiest route if you are trying PredictionIO for the first time.

Make sure you have PostgreSQL installed. On macOS, Homebrew is recommended:

$ brew install postgresql

or on Ubuntu:

$ sudo apt-get install postgresql

Now that PostgreSQL is installed, create the pio database:

$ createdb pio

If you get an error of the form could not connect to server: No such file or directory, you must first start the server manually:

$ pg_ctl -D /usr/local/var/postgres -l /usr/local/var/postgres/server.log start
 

Finally, create the pio user:

$ psql -c "create user pio with password 'pio'"

Starting from 0.11.0, PredictionIO no longer bundles JDBC drivers. Download the PostgreSQL JDBC driver from the official web site, and put the JAR file in the lib subdirectory. By default, conf/pio-env.sh assumes version 42.0.0 JDBC 4.2. If you use a different version, modify POSTGRES_JDBC_DRIVER to point to the correct JAR.
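To find out which driver JAR is actually present before adjusting POSTGRES_JDBC_DRIVER, a small lookup like the one below may help. It is a sketch that assumes the driver keeps its usual `postgresql-<version>.jar` file name; `find_pg_jdbc_jar` is a hypothetical helper.

```shell
# Print the first postgresql-*.jar found in the given directory,
# or fail with a message on stderr if there is none.
find_pg_jdbc_jar() {
  for jar in "$1"/postgresql-*.jar; do
    if [ -f "$jar" ]; then
      printf '%s\n' "$jar"
      return 0
    fi
  done
  echo "no postgresql-*.jar found in $1" >&2
  return 1
}
```

Running `find_pg_jdbc_jar PredictionIO-0.14.0/lib` prints the path to use as the value of POSTGRES_JDBC_DRIVER, or reports that the JAR still needs to be downloaded.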

HBase and Elasticsearch Setup

Elasticsearch Setup

You may skip this section if you are not using Elasticsearch.

Elasticsearch can be used as a storage backend for the meta data repository.

Starting from 0.11.0, if you build PredictionIO against Elasticsearch 5+, you may also use it as a backend for the event data repository.

$ wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.6.9.tar.gz
$ tar zxvfC elasticsearch-5.6.9.tar.gz PredictionIO-0.14.0/vendors
 

If you decide to install Elasticsearch to another location, you must edit PredictionIO-0.14.0/conf/pio-env.sh and change the PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME variable to point to your own Elasticsearch installation.

If you are using a shared network, change the network.host line in PredictionIO-0.14.0/vendors/elasticsearch-5.6.9/config/elasticsearch.yml to network.host: 127.0.0.1. By default, Elasticsearch looks for other machines on the network upon setup, and you may run into unexpected errors if other machines on the network are also running Elasticsearch.
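To confirm which network.host value is in effect after editing, you can read it back from the file. The sketch below assumes the setting is written as a top-level, uncommented `network.host:` line; `effective_network_host` is a hypothetical helper.

```shell
# Print the last uncommented network.host value in an elasticsearch.yml,
# or return non-zero if the setting is absent or commented out.
effective_network_host() {
  awk '
    /^network\.host:/ { sub(/^network\.host:[ \t]*/, ""); host = $0 }
    END { if (host == "") exit 1; print host }
  ' "$1"
}
```

For example, `effective_network_host PredictionIO-0.14.0/vendors/elasticsearch-5.6.9/config/elasticsearch.yml` should print 127.0.0.1 once the change described above has been made.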

If you are not using the default setting at localhost, you may change the following in PredictionIO-0.14.0/conf/pio-env.sh to fit your setup.

PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost
PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9300
 

HBase Setup

You may skip this section if you are not using HBase.

HBase can be used as the backend of the event data repository.

Download HBase from a mirror. Extract HBase by following the example below.

$ tar zxvfC hbase-1.2.6-bin.tar.gz PredictionIO-0.14.0/vendors

If you decide to install HBase to another location, you must edit PredictionIO-0.14.0/conf/pio-env.sh and change the PIO_STORAGE_SOURCES_HBASE_HOME variable to point to your own HBase installation.

You will need to add at least a minimal configuration to HBase to start it in standalone mode. Details can be found here. The following is a sample minimal configuration.

For production deployment, run a fully distributed HBase configuration.

Edit PredictionIO-0.14.0/vendors/hbase-1.2.6/conf/hbase-site.xml.

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///home/abc/PredictionIO-0.14.0/vendors/hbase-1.2.6/data</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/abc/PredictionIO-0.14.0/vendors/hbase-1.2.6/zookeeper</value>
  </property>
</configuration>
 

HBase will create hbase.rootdir automatically to store its data.
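Because the sample configuration hard-codes /home/abc, it can be convenient to generate the file from the actual installation path. The following sketch writes the same two properties shown above; `write_hbase_site` is a hypothetical helper and expects the absolute path of the HBase directory.

```shell
# Write a minimal standalone hbase-site.xml under <hbase_dir>/conf.
# hbase_dir must be an absolute path, for example
# "$HOME/PredictionIO-0.14.0/vendors/hbase-1.2.6".
write_hbase_site() {
  hbase_dir=$1
  cat > "$hbase_dir/conf/hbase-site.xml" <<EOF
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file://$hbase_dir/data</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>$hbase_dir/zookeeper</value>
  </property>
</configuration>
EOF
}
```

Calling `write_hbase_site "$HOME/PredictionIO-0.14.0/vendors/hbase-1.2.6"` produces the configuration above with your own home directory substituted for /home/abc.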

Edit PredictionIO-0.14.0/vendors/hbase-1.2.6/conf/hbase-env.sh to set JAVA_HOME for the cluster. For example:

export JAVA_HOME=/usr/lib/jvm/java-8-oracle/jre
 

For Mac users, use this instead (change 1.8 to 1.7 if you have Java 7 installed):

export JAVA_HOME=`/usr/libexec/java_home -v 1.8`
 

In addition, you must set your environment variable JAVA_HOME. For example, in /home/abc/.bashrc add the following line:

export JAVA_HOME=/usr/lib/jvm/java-8-oracle
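A quick sanity check can catch a missing or mistyped JAVA_HOME before you start any services. This is a minimal sketch; `check_java_home` is a hypothetical helper that only tests for an executable at `$JAVA_HOME/bin/java`.

```shell
# Verify that JAVA_HOME is set and contains an executable bin/java.
check_java_home() {
  if [ -z "${JAVA_HOME:-}" ]; then
    echo "JAVA_HOME is not set" >&2
    return 1
  fi
  if [ ! -x "$JAVA_HOME/bin/java" ]; then
    echo "no executable java under $JAVA_HOME/bin" >&2
    return 1
  fi
  echo "JAVA_HOME=$JAVA_HOME"
}
```

Run `check_java_home` in a new shell after editing your .bashrc; a non-zero exit status means the variable still needs attention.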
 

Start PredictionIO and Dependent Services

If you are using PostgreSQL or MySQL, skip pio-start-all and pio-stop-all, and run PredictionIO-0.14.0/bin/pio eventserver & instead.

Run PredictionIO-0.14.0/bin/pio-start-all and you should see something similar to the following:

$ PredictionIO-0.14.0/bin/pio-start-all
Starting Elasticsearch...
Starting HBase...
starting master, logging to /home/abc/PredictionIO-0.14.0/vendors/hbase-1.2.6/bin/../logs/hbase-abc-master-yourhost.local.out
Waiting 10 seconds for HBase to fully initialize...
Starting PredictionIO Event Server...
$
 

You may use jps to verify that you have everything started:

$ jps -l
15344 org.apache.hadoop.hbase.master.HMaster
15409 org.apache.predictionio.tools.console.Console
15256 org.elasticsearch.bootstrap.Elasticsearch
15469 sun.tools.jps.Jps
$
 

A working setup will have these processes running:

org.apache.predictionio.tools.console.Console
org.apache.hadoop.hbase.master.HMaster
org.elasticsearch.bootstrap.Elasticsearch
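The comparison between the jps output and this list can be automated. The sketch below reads `jps -l` output on standard input and reports any of the three classes that are absent; `missing_services` is a hypothetical helper and applies only to the HBase and Elasticsearch setup described here.

```shell
# Read `jps -l` output on stdin and report which of the expected
# main classes are not running. Returns 1 if any are missing.
missing_services() {
  jps_output=$(cat)
  missing=0
  for cls in \
    org.apache.predictionio.tools.console.Console \
    org.apache.hadoop.hbase.master.HMaster \
    org.elasticsearch.bootstrap.Elasticsearch
  do
    if ! printf '%s\n' "$jps_output" | grep -qF " $cls"; then
      echo "not running: $cls"
      missing=1
    fi
  done
  return $missing
}
```

Use it as `jps -l | missing_services`; it prints nothing and returns 0 when all three processes are present.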

At any time, you can run PredictionIO-0.14.0/bin/pio status to check the status of the dependencies.

Now you have installed everything you need!

You can proceed to Choosing an Engine Template, or continue with the QuickStart guide of your engine template if you have already chosen one.
