spark-logfac

Logistic Matrix Factorizations over Spark

The package contains highly scalable matrix factorization algorithms with a logistic loss function:

LMF [1]
Item2Vec [2]

Setup

resolvers += "Spark Packages Repo" at "https://repos.spark-packages.org/"

libraryDependencies += "ezamyatin" % "spark-logfac" % "0.1.0"

Overview

Both algorithms implement the general gradient descent method for matrix factorization. The key principle is as follows: each training epoch consists of n subiterations, where n is the number of partitions into which the embeddings and the data are split. The value of n should be chosen so that 1/n of the embeddings fits into the memory of a single executor.

Data and embeddings are partitioned using two hash functions, one for “rows” and another for “columns”. The user-side embedding table is partitioned with the first hash function and the item-side table with the second. The data is a set of user-item pairs. On each subiteration, an executor processes only those pairs for which the embeddings of both elements are present on that executor. To cover all pairs, the k-th subiteration applies the k-th cyclic shift to the item-side partitioning, so that over the n subiterations of an epoch every user partition is paired with every item partition.
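The schedule described above can be sketched in plain Scala, without Spark. This is an illustration only: the object and method names are hypothetical and are not the package's API, the two hash functions are simulated by hashing the id together with a different constant, and the direction of the cyclic shift is an assumption.

```scala
// Hypothetical sketch of the subiteration schedule (not the package's API).
object PartitionSchedule {
  // Two different hash functions, simulated here by pairing the id with a constant (an assumption).
  def userPartition(userId: Long, n: Int): Int = Math.floorMod((userId, 0).hashCode, n)
  def itemPartition(itemId: Long, n: Int): Int = Math.floorMod((itemId, 1).hashCode, n)

  // Assumed shift direction: in subiteration k, the worker holding user partition p
  // also holds item partition (p + k) mod n.
  def itemPartitionOnWorker(worker: Int, k: Int, n: Int): Int = (worker + k) % n

  // The single subiteration of an epoch in which the pair (userId, itemId) is processed.
  def subiterationOf(userId: Long, itemId: Long, n: Int): Int =
    Math.floorMod(itemPartition(itemId, n) - userPartition(userId, n), n)
}
```

Under these assumptions, each pair meets both of its embeddings on exactly one worker in exactly one subiteration per epoch, which is why n subiterations are enough to process all the data.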

Within each subiteration, local SGD is performed. The key difficulty here is negative sampling for implicit approaches. To solve this problem we use the method described in the paper "Distributed negative sampling for word embeddings" [3], the essence of which is to draw negative samples from the embeddings that are available on the executor.
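A minimal sketch of one local subiteration is shown below. It is not the package's implementation. The names are hypothetical, negatives are drawn uniformly from the locally available items (a simplification; the sampling distribution used in [3] or in this package may differ), and regularization is omitted.

```scala
import scala.util.Random

// Hypothetical sketch of local SGD with a logistic loss (not the package's API).
object LocalSgd {
  def sigmoid(x: Double): Double = 1.0 / (1.0 + math.exp(-x))

  def dot(a: Array[Double], b: Array[Double]): Double = {
    var s = 0.0
    var j = 0
    while (j < a.length) { s += a(j) * b(j); j += 1 }
    s
  }

  // One SGD step on the logistic loss for a single pair; label is 1.0 (observed) or 0.0 (negative).
  // Both embeddings are updated in place.
  def update(u: Array[Double], v: Array[Double], label: Double, lr: Double): Unit = {
    val g = (sigmoid(dot(u, v)) - label) * lr
    var j = 0
    while (j < u.length) {
      val uj = u(j) // keep the old value so both updates use pre-step embeddings
      u(j) -= g * v(j)
      v(j) -= g * uj
      j += 1
    }
  }

  // Processes only the pairs whose user and item embeddings are both local.
  // Negatives are sampled uniformly from the local items (an assumption).
  def subiteration(pairs: Seq[(Long, Long)],
                   users: Map[Long, Array[Double]],
                   items: Map[Long, Array[Double]],
                   negatives: Int, lr: Double, rnd: Random): Unit = {
    val localItems = items.keys.toIndexedSeq
    for ((uid, iid) <- pairs; u <- users.get(uid); v <- items.get(iid)) {
      update(u, v, 1.0, lr)
      for (_ <- 0 until negatives) {
        val neg = localItems(rnd.nextInt(localItems.size))
        if (neg != iid) update(u, items(neg), 0.0, lr)
      }
    }
  }
}
```

Because negatives come only from the item partition already held by the executor, no extra embeddings have to be fetched during the subiteration.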

References

[1] Logistic Matrix Factorization

[2] Item2Vec

[3] Distributed negative sampling for word embeddings
