Python scripts for the search log file analysis described in the book Text Mining and Visualization
tgr2uk/search-log-analysis
Overall process for clustering AOL sessions
===========================================

Remove queries that are just '-':
> awk '!/\t-\t/' fullcollection.1000000 > fullcollection.1000000.filtered

Extract the feature vectors:
> python parseN.py -v fullcollection.1000000

Select a sample from the output.log:
> python selectRandomVectors.py output.log -s 100000 > 100000.sample1
(times N)

Edit the ARFFheader to suit the data (textedit)

Prepend the appropriate weka header:
> cat ARFFheader 100000.sample1 > ARFF/100000.sample1.arff
(times N)

Apply feature scaling:
> python normalise_sessions.py ARFF/100000.sample1.arff > ARFF/100000.sample1.norm.arff
(times N)

Cluster using weka
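
To illustrate what the feature scaling step might involve, here is a minimal sketch of min-max scaling applied to the data rows of an ARFF file. It is not the code in normalise_sessions.py, whose scaling method is not described here. The function name minmax_scale_arff is hypothetical, and the sketch assumes a dense ARFF file whose data rows contain only comma-separated numeric values.

```python
def minmax_scale_arff(lines):
    """Rescale every numeric column of an ARFF file to the range [0, 1].

    Assumption: the file is dense and every data row holds only
    comma-separated numbers. This is an illustrative sketch, not the
    behaviour of normalise_sessions.py.
    """
    header, rows = [], []
    in_data = False
    for line in lines:
        stripped = line.strip()
        if not in_data:
            # Copy the header through unchanged, up to and including @data.
            header.append(line.rstrip("\n"))
            if stripped.lower() == "@data":
                in_data = True
        elif stripped and not stripped.startswith("%"):
            # Skip blank lines and ARFF comments in the data section.
            rows.append([float(v) for v in stripped.split(",")])

    if not rows:
        return header

    # Per-column minimum and maximum over all sessions.
    lows = [min(col) for col in zip(*rows)]
    highs = [max(col) for col in zip(*rows)]

    scaled = []
    for row in rows:
        values = []
        for v, lo, hi in zip(row, lows, highs):
            # A constant column has no spread; map it to 0.0 to avoid
            # dividing by zero.
            values.append(0.0 if hi == lo else (v - lo) / (hi - lo))
        scaled.append(",".join(f"{x:.6f}" for x in values))
    return header + scaled
```

Passing an open file object for ARFF/100000.sample1.arff as lines and writing the returned lines out, one per line, would produce a file in the same layout as the input. Scaling matters here because distance-based clustering otherwise lets features with large numeric ranges dominate the result.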