Merge branch 'main' of github.com:zahl-/Interview-Task
willjmax committed Nov 26, 2023
2 parents 0b5e748 + 5672653 commit ff9f992
Showing 1 changed file (README.md) with 28 additions and 0 deletions.
# Research Software Engineer Pre-Interview Task

This repo contains the Non-negative Matrix Factorization (NNMF) library and the LaTeX files for the pre-interview task.

## The NNMF library
Let $A$ be a nonnegative $m \times n$ matrix whose $n$ column vectors are viewed as data points with $m$ features.
The goal is to factor $A$ into two nonnegative matrices, $A \approx WH$, where $W$ is $m \times k$ and $H$ is $k \times n$.
$W$ is the matrix of centroids, and $H$ is the coefficient matrix.
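Read column by column, $A \approx WH$ says that every data point is approximated by a nonnegative combination of the $k$ centroids. Writing $a_j$ and $h_j$ for the $j$-th columns of $A$ and $H$, and $w_l$ for the $l$-th column of $W$ (notation introduced here only for illustration):

```math
a_j \approx W h_j = \sum_{l=1}^{k} h_{lj}\, w_l, \qquad j = 1, \dots, n.
```

When $h_j$ is an indicator vector with a single entry equal to $1$, as after the $k$-means initialization described below, this reduces to approximating $a_j$ by the centroid of its cluster.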

Our implementation follows three steps:
1. Initialization: we initialize $W$ and $H$ via $k$-means clustering. $W$ is initialized to the matrix whose columns are the $k$ centroids, and $H$ is initialized to the indicator matrix that assigns each data point to its cluster ($h_{lj} = 1$ if column $j$ of $A$ belongs to cluster $l$, and $0$ otherwise).
2. Update: we update $W$ and $H$ by applying non-negative least squares (NLS) in an alternating manner. That is, we update the rows of $W$ by applying NLS to $H$ and $A$, then update the columns of $H$ by applying NLS to $W$ and $A$.
3. Evaluation: we evaluate the solution with the Frobenius norm $\lVert A - WH \rVert_F$.
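Steps 2 and 3 above can be sketched with SciPy's `scipy.optimize.nnls` in the $A \approx WH$ orientation. This is a minimal illustration under our own assumptions, not the code in `code/nnmf.py`: the choice of SciPy's solver and the helper names `nls_sweep` and `frobenius_loss` are hypothetical.

```python
import numpy as np
from scipy.optimize import nnls


def nls_sweep(A: np.ndarray, W: np.ndarray, H: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """One alternating NLS sweep for A ≈ W @ H (hypothetical helper).

    A is m x n, W is m x k, H is k x n; returns the updated (W, H).
    """
    W = np.array(W, dtype=float)  # copies, so the caller's arrays are untouched
    H = np.array(H, dtype=float)
    # Row i of W: minimize ||H.T @ w - A[i, :]||_2 over w >= 0, with H fixed.
    for i in range(A.shape[0]):
        W[i, :], _ = nnls(H.T, A[i, :])
    # Column j of H: minimize ||W @ h - A[:, j]||_2 over h >= 0, with the new W fixed.
    for j in range(A.shape[1]):
        H[:, j], _ = nnls(W, A[:, j])
    return W, H


def frobenius_loss(A: np.ndarray, W: np.ndarray, H: np.ndarray) -> float:
    """Step 3: the Frobenius norm ||A - W H||_F."""
    return float(np.linalg.norm(A - W @ H, ord="fro"))


# Usage on a small random nonnegative problem.
rng = np.random.default_rng(0)
A = rng.random((6, 8))
W0, H0 = rng.random((6, 3)), rng.random((3, 8))
W1, H1 = nls_sweep(A, W0, H0)
print(frobenius_loss(A, W0, H0), "->", frobenius_loss(A, W1, H1))
```

Because each NLS subproblem is solved exactly with the other factor held fixed, and the previous factor is always a feasible candidate, one sweep cannot increase $\lVert A - WH \rVert_F$.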

### A note on notation
In this implementation we rely on `sklearn.cluster.KMeans` for our initialization. `KMeans` expects one data point per row, so it reads an $m \times n$ matrix as $m$ data points with $n$ features.
This is the transpose of the setup given in the pre-interview task (and above), where data points are columns.
As a result, our implementation works with $A^T$ and formulates the problem as $A^T \approx H^T W^T$.
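The transposition in the note above can be sketched as follows. `KMeans` is run on $A^T$, so its `cluster_centers_` (shape $k \times m$) play the role of $W^T$, and a one-hot encoding of its `labels_` gives the $n \times k$ indicator matrix $H^T$. The helper name `kmeans_init`, the `seed` parameter, `n_init=10`, and the `(H^T, W^T)` return order are assumptions for illustration, not the library's `initialize`.

```python
import numpy as np
from sklearn.cluster import KMeans


def kmeans_init(A: np.ndarray, k: int, seed: int = 0) -> tuple[np.ndarray, np.ndarray]:
    """k-means initialization for A ≈ W H with A of shape m x n (hypothetical helper).

    KMeans treats rows as samples, so we cluster A.T (n samples, m features)
    and return the transposed factors (H.T, W.T).
    """
    n = A.shape[1]
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(A.T)
    Wt = km.cluster_centers_            # k x m: row l is centroid l, i.e. W.T
    Ht = np.zeros((n, k))               # n x k indicator matrix, i.e. H.T
    Ht[np.arange(n), km.labels_] = 1.0  # row j has a single 1 in its cluster's column
    return Ht, Wt


# Usage: columns 0-9 and 10-19 of A form two well-separated groups.
rng = np.random.default_rng(0)
A = np.hstack([rng.random((3, 10)), rng.random((3, 10)) + 10.0])
Ht, Wt = kmeans_init(A, k=2)
print(Ht.shape, Wt.shape)  # (20, 2) (2, 3)
```

Since each centroid is a mean of nonnegative data points, $W^T$ starts out nonnegative, as the NNMF constraints require.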

### Documentation
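The library is contained in `code/nnmf.py` and consists of the five functions documented below. Following the note on notation, `A` denotes the transposed $n \times m$ data matrix, `H` stands for $H^T$ ($n \times k$), and `W` for $W^T$ ($k \times m$), so the model is `A ≈ H @ W`.
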
| Function | Input | Output | Description |
|------------|-------|--------|-------------|
| initialize | <code>A: numpy.ndarray</code><br><br> <code>k: int</code> | <code>(numpy.ndarray, numpy.ndarray)</code> | Returns the initial factorization, computed via $k$-means clustering. |
| update | <code>A: numpy.ndarray</code><br><br> <code>H: numpy.ndarray</code><br><br> <code>W: numpy.ndarray</code> | <code>(numpy.ndarray, numpy.ndarray)</code> | Performs one step of the alternating NLS update and returns <code>(H, W)</code>. |
| fnorm | <code>m: numpy.ndarray</code> | <code>float</code> | Returns the Frobenius norm of a matrix. |
| loss | <code>A: numpy.ndarray</code><br><br> <code>H: numpy.ndarray</code><br><br> <code>W: numpy.ndarray</code> | <code>float</code> | Returns <code>fnorm(A - H@W)</code>. |
| nnmf | <code>A: numpy.ndarray</code><br><br> <code>k: int</code><br><br> <code>max_iter: Optional[int] = 1000</code><br><br> <code>tol: Optional[float] = 0.001</code> | <code>(numpy.ndarray, numpy.ndarray)</code> | Runs the NNMF algorithm. Terminates after <code>max_iter</code> iterations or once an error tolerance of <code>tol</code> is achieved, whichever comes first. Returns <code>(H, W)</code>. |
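The termination rule in the `nnmf` row can be sketched as a loop around any function with the `update` row's contract `(A, H, W) -> (H, W)`. The driver name `nnmf_loop`, the extra `n_iter`/`err` return values, and the reading of `tol` as an absolute threshold on `fnorm(A - H@W)` are assumptions; the initial `H` and `W` would come from an initialization such as `initialize`.

```python
from typing import Callable, Tuple

import numpy as np

# Contract of the `update` row: (A, H, W) -> (H, W), with A ≈ H @ W.
UpdateFn = Callable[[np.ndarray, np.ndarray, np.ndarray], Tuple[np.ndarray, np.ndarray]]


def nnmf_loop(
    A: np.ndarray,
    H: np.ndarray,
    W: np.ndarray,
    update: UpdateFn,
    max_iter: int = 1000,
    tol: float = 1e-3,
) -> Tuple[np.ndarray, np.ndarray, int, float]:
    """Hypothetical driver: apply `update` until the loss is <= tol or max_iter steps have run.

    Assumes tol is an absolute threshold on ||A - H @ W||_F; returns (H, W, n_iter, err).
    """
    err = float(np.linalg.norm(A - H @ W, ord="fro"))
    n_iter = 0
    while n_iter < max_iter and err > tol:
        H, W = update(A, H, W)
        n_iter += 1
        err = float(np.linalg.norm(A - H @ W, ord="fro"))
    return H, W, n_iter, err
```

On real data the loss rarely drops below a fixed absolute threshold, so if `tol` is instead meant as a bound on the improvement between iterations, the `err > tol` test would become a comparison of `prev_err - err` against `tol`.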
