Downloading Source Code
Download Apache PredictionIO 0.12.0-incubating from an Apache mirror, then verify the signature of the tarball against the project's KEYS file.
$ gpg --import KEYS
$ gpg --verify apache-predictionio-0.12.0-incubating.tar.gz.asc apache-predictionio-0.12.0-incubating.tar.gz
You should see something like this.
gpg: Signature made Tue Sep 26 22:55:22 2017 PDT
gpg:                using RSA key 7E2363D84719A8F4
gpg: Good signature from "Chan Lee <email@example.com>" [ultimate]
Run the following in the directory where you downloaded the source code to build Apache PredictionIO. By default, the build will be against:
- Scala 2.11.8
- Spark 2.1.1
- Hadoop 2.7.3
- Elasticsearch 5.5.2
$ tar zxvf apache-predictionio-0.12.0-incubating.tar.gz
$ cd apache-predictionio-0.12.0-incubating
$ ./make-distribution.sh
You should see something like the following when it finishes building successfully.
...
PredictionIO-0.12.0-incubating/sbt/sbt
PredictionIO-0.12.0-incubating/conf/
PredictionIO-0.12.0-incubating/conf/pio-env.sh
PredictionIO binary distribution created at PredictionIO-0.12.0-incubating.tar.gz
Extract the binary distribution you have just built.
$ tar zxvf PredictionIO-0.12.0-incubating.tar.gz
Building against Different Versions of Dependencies
Starting from version 0.11.0, PredictionIO can be built against different versions of dependencies. As of this writing, the following versions are supported:
- Scala 2.10.x, 2.11.x
- Spark 1.6.x, 2.0.x, 2.1.x
- Hadoop 2.4.x to 2.7.x
- Elasticsearch 1.7.x, 5.x
As an example, if you want to build PredictionIO to support Scala 2.11.8, Spark 2.1.0, and Elasticsearch 5.3.0, you can run:
$ ./make-distribution.sh -Dscala.version=2.11.8 -Dspark.version=2.1.0 -Delasticsearch.version=5.3.0
Let us install dependencies inside a subdirectory of the Apache PredictionIO (incubating) installation. By following this convention, you can use Apache PredictionIO's default configuration as is.
$ mkdir PredictionIO-0.12.0-incubating/vendors
Apache Spark is the default processing engine for PredictionIO. Download and extract it.
$ wget http://d3kbcqa49mib13.cloudfront.net/spark-2.1.1-bin-hadoop2.6.tgz
$ tar zxvfC spark-2.1.1-bin-hadoop2.6.tgz PredictionIO-0.12.0-incubating/vendors
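If you follow the vendors convention above, the default configuration should already point Spark at this location. If you unpacked Spark elsewhere, adjust SPARK_HOME in conf/pio-env.sh accordingly; the fragment below mirrors what the default is expected to look like, and the exact path is an assumption about your layout:

```shell
# conf/pio-env.sh (fragment) -- adjust if Spark was extracted elsewhere
SPARK_HOME=$PIO_HOME/vendors/spark-2.1.1-bin-hadoop2.6
```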
PostgreSQL can be used by PredictionIO as a storage backend for all 3 repositories (event data, meta data, and model data). This is perhaps the easiest route if you are trying PredictionIO for the first time.
Make sure you have PostgreSQL installed. For Mac users, Homebrew is recommended and can be used as follows:
$ brew install postgresql
or on Ubuntu:
$ apt-get install postgresql
Now that PostgreSQL is installed, use the following commands.
$ createdb pio
If you get an error of the form could not connect to server: No such file or directory, then you must first start the server manually:
$ pg_ctl -D /usr/local/var/postgres -l /usr/local/var/postgres/server.log start
Finally, use the command:
$ psql -c "create user pio with password 'pio'"
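With the pio database and user in place, the relevant storage settings in conf/pio-env.sh should resemble the fragment below. These values mirror the shipped defaults for a local PostgreSQL setup; verify them against your own file:

```shell
# conf/pio-env.sh (fragment) -- PostgreSQL as the storage backend
PIO_STORAGE_SOURCES_PGSQL_TYPE=jdbc
PIO_STORAGE_SOURCES_PGSQL_URL=jdbc:postgresql://localhost/pio
PIO_STORAGE_SOURCES_PGSQL_USERNAME=pio
PIO_STORAGE_SOURCES_PGSQL_PASSWORD=pio
```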
Starting from 0.11.0, PredictionIO no longer bundles JDBC drivers. Download the PostgreSQL JDBC driver from the official web site, and put the JAR file in the lib subdirectory. By default, conf/pio-env.sh assumes the version 42.0.0 (JDBC 4.2) driver. If you use a different version, modify POSTGRES_JDBC_DRIVER to point to the correct JAR.
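For instance, if you downloaded postgresql-42.0.0.jar into the lib subdirectory, the setting would look like the fragment below. The filename is an example; match it to the JAR you actually fetched:

```shell
# conf/pio-env.sh (fragment) -- point at the JDBC driver JAR you downloaded
POSTGRES_JDBC_DRIVER=$PIO_HOME/lib/postgresql-42.0.0.jar
```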
HBase and Elasticsearch Setup
Elasticsearch can be used as a storage backend for the meta data repository.
$ wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.5.2.tar.gz
$ tar zxvfC elasticsearch-5.5.2.tar.gz PredictionIO-0.12.0-incubating/vendors
If you are not using the default setting at localhost, you may change the following in PredictionIO-0.12.0-incubating/conf/pio-env.sh to fit your setup.
PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost
PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9300
HBase can be used as the backend of the event data repository.
Download HBase from a mirror. Extract HBase by following the example below.
$ tar zxvfC hbase-1.2.6-bin.tar.gz PredictionIO-0.12.0-incubating/vendors
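The C flag in tar zxvfC tells tar to change into the given directory before extracting, which is why the vendors directory must already exist. A self-contained sketch of the same pattern, using a throwaway archive under /tmp (paths are illustrative, not the real HBase tarball):

```shell
# z = gunzip, x = extract, v = verbose, f = next arg is the archive,
# C = the arg after that is the directory to change into before extracting.
mkdir -p /tmp/pio-demo/vendors /tmp/pio-demo/src/hbase-1.2.6
echo "demo" > /tmp/pio-demo/src/hbase-1.2.6/README

# Build a throwaway archive standing in for the HBase tarball
tar -C /tmp/pio-demo/src -czf /tmp/pio-demo/hbase-demo.tgz hbase-1.2.6

# Extract it into vendors, exactly as the command above does
tar zxvfC /tmp/pio-demo/hbase-demo.tgz /tmp/pio-demo/vendors

ls /tmp/pio-demo/vendors/hbase-1.2.6/README
```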
You will need to add at least a minimal configuration to HBase to start it in standalone mode, by editing PredictionIO-0.12.0-incubating/vendors/hbase-1.2.6/conf/hbase-site.xml. Details can be found in the HBase documentation. Here is a sample minimal configuration.
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///home/abc/PredictionIO-0.12.0-incubating/vendors/hbase-1.2.6/data</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/abc/PredictionIO-0.12.0-incubating/vendors/hbase-1.2.6/zookeeper</value>
  </property>
</configuration>
You will also need to edit PredictionIO-0.12.0-incubating/vendors/hbase-1.2.6/conf/hbase-env.sh to set JAVA_HOME for the cluster. For example:
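On Linux, the added line might look like the following. The JDK path is illustrative; point it at your actual Java 8 installation:

```shell
# vendors/hbase-1.2.6/conf/hbase-env.sh -- example path, adjust to your JDK
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
```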
For Mac users, use this instead (change 1.8 to 1.7 if you have Java 7 installed):
export JAVA_HOME=`/usr/libexec/java_home -v 1.8`
In addition, you must set the JAVA_HOME environment variable. For example, in /home/abc/.bashrc add the following line:
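The line to add is an export of your JDK location, for example (the path shown assumes an OpenJDK 8 install on Ubuntu; adjust as needed):

```shell
# ~/.bashrc -- example path; adjust to your JDK location
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
```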
Start PredictionIO and Dependent Services
Run PredictionIO-0.12.0-incubating/bin/pio-start-all and you should see something similar to the following:
$ PredictionIO-0.12.0-incubating/bin/pio-start-all
Starting Elasticsearch...
Starting HBase...
starting master, logging to /home/abc/PredictionIO-0.12.0-incubating/vendors/hbase-1.2.6/bin/../logs/hbase-abc-master-yourhost.local.out
Waiting 10 seconds for HBase to fully initialize...
Starting PredictionIO Event Server...
$
You may use jps to verify that you have everything started:
$ jps -l
15344 org.apache.hadoop.hbase.master.HMaster
15409 org.apache.predictionio.tools.console.Console
15256 org.elasticsearch.bootstrap.Elasticsearch
15469 sun.tools.jps.Jps
$
A running setup will have HBase, Elasticsearch, and the PredictionIO Event Server up and running, as shown in the jps output above.
At any time, you can run
PredictionIO-0.12.0-incubating/bin/pio status to check the status of the dependencies.
Now you have installed everything you need!
You can proceed to Choosing an Engine Template, or continue the QuickStart guide of the Engine template if you have already chosen one.