//: # (<!-- Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to You under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. -->

)

Manual Install

Follow the steps below to set up Apache PredictionIO (incubating) and its dependencies. These instructions assume you are working in your home directory. Wherever you see /home/abc, replace it with your own home directory.

Java

Ensure you have an appropriate Java version installed. For example:

$ java -version
java version "1.8.0_40"
Java(TM) SE Runtime Environment (build 1.8.0_40-b25)
Java HotSpot(TM) 64-Bit Server VM (build 25.40-b25, mixed mode)
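
If you want to check the version from a script rather than by eye, the version token can be extracted from that output. The following is a minimal sketch, not part of PredictionIO: the function names java_version_from_output and java_major are hypothetical helpers, and the sketch assumes the first line of the output has the version "..." form shown above.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only (not shipped with PredictionIO).

# Print the quoted version token from the first line of `java -version` output.
java_version_from_output() {
  printf '%s\n' "$1" | sed -n '1s/.*version "\([^"]*\)".*/\1/p'
}

# Print the major version: 8 for the legacy "1.8.0_40" scheme, 11 for "11.0.2".
java_major() {
  case "$1" in
    1.*) rest=${1#1.}; printf '%s\n' "${rest%%.*}" ;;
    *)   printf '%s\n' "${1%%.*}" ;;
  esac
}

# Example use (java -version writes to standard error, hence 2>&1):
#   java_major "$(java_version_from_output "$(java -version 2>&1)")"
```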

Download Apache PredictionIO (incubating)

Download Apache PredictionIO (incubating) and extract it.

$ cd
$ pwd
/home/abc
$ wget http://download.prediction.io/PredictionIO-0.11.0-incubating.tar.gz
$ tar zxvf PredictionIO-0.11.0-incubating.tar.gz

Download instructions above apply to previous non-Apache releases only. Once we have made an Apache release, new instructions will be provided.

Installing Dependencies

Install the dependencies inside a subdirectory of the Apache PredictionIO (incubating) installation. Following this convention lets you use PredictionIO's default configuration as is.

$ mkdir PredictionIO-0.11.0-incubating/vendors

Spark Setup

Apache Spark is the default processing engine for PredictionIO. Download and extract it.

$ wget http://d3kbcqa49mib13.cloudfront.net/spark-1.6.3-bin-hadoop2.6.tgz
$ tar zxvfC spark-1.6.3-bin-hadoop2.6.tgz PredictionIO-0.11.0-incubating/vendors

If you decide to install Apache Spark to another location, you must edit PredictionIO-0.11.0-incubating/conf/pio-env.sh and change the SPARK_HOME variable to point to your own Apache Spark installation.
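
As an illustration, if Spark had been extracted under /opt (an assumed location used only for this example), the relevant line in pio-env.sh might look like the following sketch:

```shell
# Excerpt of PredictionIO-0.11.0-incubating/conf/pio-env.sh
# /opt/spark-1.6.3-bin-hadoop2.6 is an assumed example path; substitute the
# directory where you actually extracted Apache Spark.
SPARK_HOME=/opt/spark-1.6.3-bin-hadoop2.6
```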

Elasticsearch Setup

You may skip this section if you are using PostgreSQL or MySQL.

Elasticsearch can be used as a storage backend for the metadata repository.

Starting from 0.11.0, if you build PredictionIO against Elasticsearch 5+, you may also use it as a backend for the event data repository.

$ wget https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-1.7.6.tar.gz
$ tar zxvfC elasticsearch-1.7.6.tar.gz PredictionIO-0.11.0-incubating/vendors

If you decide to install Elasticsearch to another location, you must edit PredictionIO-0.11.0-incubating/conf/pio-env.sh and change the PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME variable to point to your own Elasticsearch installation.

If you are on a shared network, change the network.host line in PredictionIO-0.11.0-incubating/vendors/elasticsearch-1.7.6/config/elasticsearch.yml to network.host: 127.0.0.1. By default, Elasticsearch looks for other machines on the network at startup, and you may run into unexpected errors if other machines on the network are also running Elasticsearch.
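
The edit can also be scripted. The following is a rough sketch under stated assumptions: set_es_loopback is a hypothetical helper name, and it assumes the setting appears (if at all) as an uncommented top-level network.host: line. Commented-out lines are left untouched.

```shell
#!/bin/sh
# Hypothetical helper (not part of Elasticsearch or PredictionIO).
# Rewrites the given elasticsearch.yml so that it contains exactly one
# active setting: network.host: 127.0.0.1
set_es_loopback() {
  conf=$1
  tmp=$(mktemp) || return 1
  # Copy every line except an active (uncommented) network.host setting.
  grep -v '^[[:space:]]*network\.host:' "$conf" > "$tmp"
  # Append the loopback setting.
  printf 'network.host: 127.0.0.1\n' >> "$tmp"
  # Write back in place so the original file's permissions are preserved.
  cat "$tmp" > "$conf" && rm -f "$tmp"
}

# Example use:
#   set_es_loopback PredictionIO-0.11.0-incubating/vendors/elasticsearch-1.7.6/config/elasticsearch.yml
```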

If Elasticsearch is not running at the default localhost address, change the following settings in PredictionIO-0.11.0-incubating/conf/pio-env.sh to fit your setup.

PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost
PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9300

HBase Setup 

You may skip this section if you are using PostgreSQL or MySQL.

HBase can be used as the backend of the event data repository.

Download HBase from a mirror. Extract HBase by following the example below.

$ tar zxvfC hbase-1.2.6-bin.tar.gz PredictionIO-0.11.0-incubating/vendors

If you decide to install HBase to another location, you must edit PredictionIO-0.11.0-incubating/conf/pio-env.sh and change the PIO_STORAGE_SOURCES_HBASE_HOME variable to point to your own HBase installation.

HBase needs at least a minimal configuration to start in standalone mode. Details can be found in the HBase documentation. A sample minimal configuration is shown below.

For production deployment, run a fully distributed HBase configuration.

Edit PredictionIO-0.11.0-incubating/vendors/hbase-1.2.6/conf/hbase-site.xml.

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///home/abc/PredictionIO-0.11.0-incubating/vendors/hbase-1.2.6/data</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/abc/PredictionIO-0.11.0-incubating/vendors/hbase-1.2.6/zookeeper</value>
  </property>
</configuration>

HBase will create hbase.rootdir automatically to store its data.
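
To avoid replacing /home/abc by hand, the same file can be generated from a directory path. This is a minimal sketch: print_hbase_site is a hypothetical helper name, and it assumes the argument is the absolute path of the extracted HBase directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of HBase or PredictionIO).
# Prints a minimal standalone hbase-site.xml for the HBase directory in $1,
# which must be an absolute path so that file://$1 forms a valid file:/// URI.
print_hbase_site() {
  hbase_dir=$1
  cat <<EOF
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file://${hbase_dir}/data</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>${hbase_dir}/zookeeper</value>
  </property>
</configuration>
EOF
}

# Example use:
#   dir="$HOME/PredictionIO-0.11.0-incubating/vendors/hbase-1.2.6"
#   print_hbase_site "$dir" > "$dir/conf/hbase-site.xml"
```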

Edit PredictionIO-0.11.0-incubating/vendors/hbase-1.2.6/conf/hbase-env.sh to set JAVA_HOME for the cluster. For example:

export JAVA_HOME=/usr/lib/jvm/java-8-oracle/jre

For Mac users, use this instead (change 1.8 to 1.7 if you have Java 7 installed):

export JAVA_HOME=`/usr/libexec/java_home -v 1.8`

In addition, you must set your environment variable JAVA_HOME. For example, in /home/abc/.bashrc add the following line:

export JAVA_HOME=/usr/lib/jvm/java-8-oracle
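
Before starting any services, it can help to confirm that JAVA_HOME points at a usable Java installation. The sketch below is illustrative only: check_java_home is a hypothetical helper name, and the check assumes that a valid installation has an executable at bin/java.

```shell
#!/bin/sh
# Hypothetical helper (not part of PredictionIO).
# Succeeds if the given directory (default: $JAVA_HOME) contains an
# executable bin/java; otherwise prints a message to stderr and fails.
check_java_home() {
  dir=${1:-${JAVA_HOME:-}}
  if [ -z "$dir" ]; then
    echo "JAVA_HOME is not set" >&2
    return 1
  fi
  if [ ! -x "$dir/bin/java" ]; then
    echo "no executable found at $dir/bin/java" >&2
    return 1
  fi
  return 0
}

# Example use:
#   check_java_home && echo "JAVA_HOME looks usable"
```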

Start PredictionIO and Dependent Services

If you are using PostgreSQL or MySQL, skip pio-start-all and pio-stop-all, and run PredictionIO-0.11.0-incubating/bin/pio eventserver & instead.

Run PredictionIO-0.11.0-incubating/bin/pio-start-all. You should see output similar to the following:

$ PredictionIO-0.11.0-incubating/bin/pio-start-all
Starting Elasticsearch...
Starting HBase...
starting master, logging to /home/abc/PredictionIO-0.11.0-incubating/vendors/hbase-1.2.6/bin/../logs/hbase-abc-master-yourhost.local.out
Waiting 10 seconds for HBase to fully initialize...
Starting PredictionIO Event Server...
$

You may use jps to verify that you have everything started:

$ jps -l
15344 org.apache.hadoop.hbase.master.HMaster
15409 org.apache.predictionio.tools.console.Console
15256 org.elasticsearch.bootstrap.Elasticsearch
15469 sun.tools.jps.Jps
$

A working setup will have these processes running:

  • org.apache.predictionio.tools.console.Console
  • org.apache.hadoop.hbase.master.HMaster
  • org.elasticsearch.bootstrap.Elasticsearch
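
The check against this list can be automated. The following is a minimal sketch: missing_pio_processes is a hypothetical helper name, and it assumes jps -l output in the "PID class-name" form shown above. It prints each expected class that is absent, so empty output means all three are running.

```shell
#!/bin/sh
# Hypothetical helper (not part of PredictionIO).
# Takes the text printed by `jps -l` as $1 and prints the name of each
# expected main class that does not appear in it, one per line.
missing_pio_processes() {
  jps_output=$1
  for cls in \
    org.apache.predictionio.tools.console.Console \
    org.apache.hadoop.hbase.master.HMaster \
    org.elasticsearch.bootstrap.Elasticsearch
  do
    # Compare the second field exactly so partial class names do not match.
    printf '%s\n' "$jps_output" |
      awk -v c="$cls" '$2 == c { found = 1 } END { exit !found }' ||
      printf '%s\n' "$cls"
  done
}

# Example use:
#   missing_pio_processes "$(jps -l)"
```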

At any time, you can run PredictionIO-0.11.0-incubating/bin/pio status to check the status of the dependencies.

Now you have installed everything you need!

You can proceed to Choosing an Engine Template, or continue the QuickStart guide of the Engine template if you have already chosen one.