GitHub - FengBli/GSDMM: My own implementation of Gibbs Sampling for DMM (Dirichlet Multinomial Mixture)

Motivation

While working on the third homework assignment of my data mining course (clustering short texts), I found the GSDMM paper by Yin and Wang (2014), listed in the Reference section, which turned out to be the one Mr. Zhang recommended in class. So I tried to implement the proposed GSDMM algorithm myself, with the help of online resources.
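For readers new to GSDMM, the core of the collapsed Gibbs sampler from Yin and Wang (2014) can be sketched as below. This is a minimal illustrative sketch, not the code in GSDMM.py: the function name `gsdmm`, its default parameters, and the log-space arithmetic are assumptions of this example. Each document is taken out of its current cluster and reassigned with probability proportional to (m_z + alpha) times the likelihood of its words under cluster z's smoothed word counts. The denominator D - 1 + K*alpha is the same for every cluster, so it is dropped, and K acts only as an upper bound on the number of clusters.

```python
import math
import random
from collections import Counter


def gsdmm(docs, K=8, alpha=0.1, beta=0.1, n_iters=30, seed=0):
    """Illustrative collapsed Gibbs sampler for the Dirichlet Multinomial Mixture.

    docs: list of token lists. Returns (labels, m_z, n_z).
    Hypothetical helper for this sketch, not the repository's API.
    """
    rng = random.Random(seed)
    V = len({w for doc in docs for w in doc})  # vocabulary size
    m_z = [0] * K                          # number of documents in each cluster
    n_z = [0] * K                          # number of word tokens in each cluster
    n_zw = [Counter() for _ in range(K)]   # per-cluster word frequencies
    doc_counts = [Counter(doc) for doc in docs]

    def move(i, z, sign):
        # Add (sign=+1) or remove (sign=-1) document i from cluster z.
        m_z[z] += sign
        n_z[z] += sign * len(docs[i])
        for w, c in doc_counts[i].items():
            n_zw[z][w] += sign * c

    # Random initial assignment of every document to one of K clusters.
    labels = [rng.randrange(K) for _ in docs]
    for i, z in enumerate(labels):
        move(i, z, +1)

    for _ in range(n_iters):
        for i, doc in enumerate(docs):
            move(i, labels[i], -1)  # exclude document i from the counts
            log_p = []
            for k in range(K):
                lp = math.log(m_z[k] + alpha)
                # Numerator: product over each word w and its j-th occurrence.
                for w, c in doc_counts[i].items():
                    for j in range(c):
                        lp += math.log(n_zw[k][w] + beta + j)
                # Denominator: product over the document's token positions.
                for j in range(len(doc)):
                    lp -= math.log(n_z[k] + V * beta + j)
                log_p.append(lp)
            top = max(log_p)  # subtract the max before exp for stability
            weights = [math.exp(lp - top) for lp in log_p]
            labels[i] = rng.choices(range(K), weights=weights)[0]
            move(i, labels[i], +1)
    return labels, m_z, n_z
```

Clusters that end with `m_z[k] == 0` are simply empty; the number of non-empty clusters is what the sampler effectively infers.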

NOTICE

This implementation is still a work in progress.

Data Format

- vocab.json: one word and its corresponding id per line.
- train_tokens.json: one doc-id and its token list per line.
- train_topics.json: used for validation.
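The README does not spell out the exact JSON shape of each line, so the reader below is a sketch under an explicit assumption: every non-empty line is either a two-element JSON array such as `["apple", 3]` or a single-key JSON object such as `{"apple": 3}`. The helper names `read_pairs` and `purity` are hypothetical. Purity is used here only as one simple external metric for checking cluster assignments against train_topics.json; it is chosen for brevity, not because the repository uses it.

```python
import json
from collections import Counter


def read_pairs(path):
    """Read one (key, value) record per line into a dict.

    ASSUMPTION: each non-empty line is a two-element JSON array
    like ["apple", 3] or a single-key JSON object like {"apple": 3}.
    """
    pairs = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            if isinstance(record, dict):
                (key, value), = record.items()  # exactly one key expected
            else:
                key, value = record
            pairs[key] = value
    return pairs


def purity(labels, topics):
    """Fraction of documents whose cluster's majority topic matches their own.

    labels: doc-id -> cluster id; topics: doc-id -> gold topic.
    """
    clusters = {}
    for doc_id, z in labels.items():
        clusters.setdefault(z, Counter())[topics[doc_id]] += 1
    hits = sum(c.most_common(1)[0][1] for c in clusters.values())
    return hits / len(labels)
```

Under that assumption, `read_pairs("train_tokens.json")` yields a doc-id to token-list mapping and `read_pairs("train_topics.json")` a doc-id to topic mapping.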

Reference

Paper
- Yin, J. and Wang, J., 2014, August. A Dirichlet multinomial mixture model-based approach for short text clustering. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 233-242). ACM.
- Nguyen, D.Q., Billingsley, R., Du, L. and Johnson, M., 2015. Improving topic models with latent feature word representations. Transactions of the Association for Computational Linguistics, 3, pp. 299-313.
Code
- datquocnguyen/jLDADMM: Java version
- atefm/pDMM: Python version
