This project has retired. For details please refer to its Attic page.

This example demonstrates how to recommend users instead of items.

Instead of using user-to-item events to find similar items, this engine uses user-to-user events to find similar users that you may also want to follow, like, and so on (depending on which events are used in training and how they are used). By default, "follow" events are used.

You can find the complete modified source code here.

Modification

Engine.scala

In Query, change items to users and remove categories. Change the ItemScore case class to SimilarUserScore. In PredictedResult, change Array[ItemScore] to Array[SimilarUserScore].

case class Query(
  users: List[String],
  num: Int,
  whiteList: Option[Set[String]],
  blackList: Option[Set[String]]
)

case class PredictedResult(
  similarUserScores: Array[SimilarUserScore]
){
  override def toString: String = similarUserScores.mkString(",")
}

case class SimilarUserScore(
  user: String,
  score: Double
)

DataSource.scala

In DataSource, change the ViewEvent case class to FollowEvent. Remove the Item case class.

Change

case class ViewEvent(user: String, item: String, t: Long)

to

// MODIFIED
case class FollowEvent(user: String, followedUser: String, t: Long)

Modify the TrainingData class to use followEvents:

class TrainingData(
  val users: RDD[(String, User)],
  val followEvents: RDD[FollowEvent] // MODIFIED
) extends Serializable {
  override def toString = {
    // Trailing space keeps the two summaries from running together.
    s"users: [${users.count()} (${users.take(2).toList}...)] " +
    // MODIFIED
    s"followEvents: [${followEvents.count()} (${followEvents.take(2).toList}...)]"
  }
}

Modify the readTraining() function of DataSource to read "follow" events (changes are commented with "// MODIFIED"). Remove the RDD of (entityID, Item):

  override
  def readTraining(sc: SparkContext): TrainingData = {

    // create a RDD of (entityID, User)
    val usersRDD: RDD[(String, User)] = ...

    // MODIFIED
    // get all "user" "follow" "followedUser" events
    val followEventsRDD: RDD[FollowEvent] = PEventStore.find(
      appName = dsp.appName,
      entityType = Some("user"),
      eventNames = Some(List("follow")),
      // targetEntityType is an optional field of an event.
      targetEntityType = Some(Some("user")))(sc)
      // PEventStore.find() returns RDD[Event]
      .map { event =>
        val followEvent = try {
          event.event match {
            case "follow" => FollowEvent(
              user = event.entityId,
              followedUser = event.targetEntityId.get,
              t = event.eventTime.getMillis)
            case _ => throw new Exception(s"Unexpected event $event is read.")
          }
        } catch {
          case e: Exception => {
            logger.error(s"Cannot convert $event to FollowEvent." +
              s" Exception: $e.")
            throw e
          }
        }
        followEvent
      }.cache()

    new TrainingData(
      users = usersRDD,
      followEvents = followEventsRDD // MODIFIED
    )
  }
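
The conversion step inside readTraining() can be tried out without Spark or an event store. The following is a minimal sketch, not the template's code: RawEvent is a hypothetical stand-in for the few fields of PredictionIO's Event that the function reads, and toFollowEvent is an illustrative helper name. Unlike the template, which logs and rethrows on bad input, this variation returns an Either so that a missing targetEntityId is reported without calling .get on an empty Option.

```scala
object FollowEventSketch {
  // Hypothetical stand-in for the fields of Event used by readTraining().
  // The real Event class has more fields and a DateTime eventTime.
  case class RawEvent(
    event: String,
    entityId: String,
    targetEntityId: Option[String],
    eventTimeMillis: Long)

  // Same shape as the FollowEvent case class in DataSource.scala.
  case class FollowEvent(user: String, followedUser: String, t: Long)

  // Convert one raw event, returning Left(reason) instead of throwing.
  def toFollowEvent(e: RawEvent): Either[String, FollowEvent] =
    (e.event, e.targetEntityId) match {
      case ("follow", Some(target)) =>
        Right(FollowEvent(e.entityId, target, e.eventTimeMillis))
      case ("follow", None) =>
        Left(s"follow event for ${e.entityId} has no targetEntityId")
      case (other, _) =>
        Left(s"unexpected event name: $other")
    }
}
```

Whether to fail fast (as the template does) or to collect and skip malformed events is a design choice that depends on how much you trust the imported data.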

Preparator.scala

Modify Preparator to pass followEvents to the algorithm as PreparedData.

Modify Preparator's prepare() method:

  ...

  def prepare(sc: SparkContext, trainingData: TrainingData): PreparedData = {
    new PreparedData(
      users = trainingData.users,
      followEvents = trainingData.followEvents) // MODIFIED
  }

Modify PreparedData class:

class PreparedData(
  val users: RDD[(String, User)],
  val followEvents: RDD[FollowEvent] // MODIFIED
) extends Serializable

ALSAlgorithm.scala

Modify the ALSModel class to work with similar users instead of similar items. Modify the train() method to train on follow events. Modify the predict() method to predict similar users.
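
The algorithm code is not reproduced on this page, so the following is only a rough, Spark-free sketch of what a similar-user predict() step might look like. It assumes that each user has a latent feature vector learned from follow events and that candidates are ranked by cosine similarity to the query users. The object name, the helper names, and the Map-based feature store are all hypothetical; only Query's fields and SimilarUserScore come from Engine.scala above.

```scala
object SimilarUserSketch {
  // Same shape as the SimilarUserScore case class in Engine.scala.
  case class SimilarUserScore(user: String, score: Double)

  // Cosine similarity of two equal-length vectors; 0.0 if either has zero norm.
  def cosine(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same length")
    val dot = a.zip(b).map { case (x, y) => x * y }.sum
    val normA = math.sqrt(a.map(x => x * x).sum)
    val normB = math.sqrt(b.map(x => x * x).sum)
    if (normA == 0.0 || normB == 0.0) 0.0 else dot / (normA * normB)
  }

  // Rank candidate users by summed cosine similarity to the query users.
  // `features` is an assumed in-memory map from user ID to feature vector.
  def similarUsers(
    features: Map[String, Array[Double]],
    queryUsers: List[String],
    num: Int,
    whiteList: Option[Set[String]] = None,
    blackList: Option[Set[String]] = None
  ): Array[SimilarUserScore] = {
    // Query users without a known feature vector are silently ignored.
    val queryVectors = queryUsers.flatMap(features.get)
    val querySet = queryUsers.toSet
    features.toSeq
      .filter { case (user, _) =>
        !querySet.contains(user) &&            // never recommend the query users
        whiteList.forall(_.contains(user)) &&  // keep only whitelisted users, if given
        blackList.forall(!_.contains(user))    // drop blacklisted users, if given
      }
      .map { case (user, vec) =>
        SimilarUserScore(user, queryVectors.map(cosine(_, vec)).sum)
      }
      .filter(_.score > 0)                     // drop unrelated candidates
      .sortBy(-_.score)                        // highest score first
      .take(num)
      .toArray
  }
}
```

For example, with feature vectors u1 = (1, 0), u2 = (1, 0), u3 = (0, 1), and u4 = (1, 1), a query for u1 with num = 2 ranks u2 first and u4 second; u3 is orthogonal to u1 and is dropped.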

Test the Result

Then we can build/train/deploy the engine and test the result:

The query

$ curl -H "Content-Type: application/json" \
-d '{ "users": ["u1"], "num": 4 }' \
http://localhost:8000/queries.json

will return the result

{
  "similarUserScores":[
    {"user":"u3","score":0.7574200014043541},
    {"user":"u10","score":0.6484507108863744},
    {"user":"u43","score":0.64741489488357},
    {"user":"u29","score":0.5767264820728124}
  ]
}

That's it! Now your engine can recommend users.