sven

Sven is an ongoing text-analysis project that integrates the Pattern library for Python.


Installation

We tested Sven on Ubuntu with Python 2.7 and Django 1.4.3, served by Apache 2 with the mod_wsgi module. To install virtualenv and virtualenvwrapper with pip, follow their online documentation.

Clone the repository with git. The examples below assume /path/to/ is the parent directory of the Sven installation:

$ cd /path/to/
$ git clone https://github.com/densitydesign/sven.git

Create the settings file at /path/to/sven/settings.py by copying the sample:

$ cd /path/to/sven
$ cp settings.py.SAMPLE settings.py

Create a dedicated virtualenv for Sven

Once virtualenvwrapper is installed, create a virtualenv named sven.local and activate it to check that it works. The requirements are installed in a later step, once the system libraries needed by lxml are in place:

$ source /usr/local/bin/virtualenvwrapper.sh
$ mkvirtualenv sven.local
$ workon sven.local
(sven.local)$ deactivate
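To confirm that workon sven.local really switched interpreters, a check like the one below can be run from inside the activated environment. It is a hypothetical helper, not part of the Sven repository. Older virtualenv releases mark the environment by setting sys.real_prefix, while the stdlib venv module (and recent virtualenv releases) make sys.prefix differ from sys.base_prefix, so the sketch tests both.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtualenv."""
    # Older virtualenv releases (typical for Python 2.7 setups) set this attribute.
    if hasattr(sys, 'real_prefix'):
        return True
    # venv and newer virtualenv releases keep the base install in sys.base_prefix.
    return sys.prefix != getattr(sys, 'base_prefix', sys.prefix)


if __name__ == '__main__':
    print('virtualenv active: %s' % in_virtualenv())
    print('interpreter prefix: %s' % sys.prefix)
```

Run it as (sven.local)$ python check_venv.py, where check_venv.py is just an illustrative file name; the printed prefix should point inside your .virtualenvs/sven.local directory.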

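Before installing the requirements, install the system packages that lxml needs on Ubuntu (lxml is required by the docx and unicodecsv modules). These include the development package for your Python version (python2.x-dev). Then install the requirements inside the virtualenv: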

$ sudo apt-get install python2.7-dev
$ sudo apt-get install libxml2-dev
$ sudo apt-get install libxslt1-dev
$ workon sven.local
(sven.local)$ pip install -r requirements.txt

Other external dependencies (Python modules available as pip packages):

python-docx, to enable docx-to-txt conversion
python-unicodecsv, to enable Unicode-safe CSV download of segments
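Once the requirements are installed, a quick way to confirm that these modules (and lxml) are importable inside the virtualenv is a small script like the one below. It is a hypothetical helper, not part of Sven. Note that the import names differ from the package names: python-docx is imported as docx and mysql-python as MySQLdb.

```python
import importlib

# Import names of the modules Sven relies on (lxml, python-docx, unicodecsv).
REQUIRED_MODULES = ['lxml', 'docx', 'unicodecsv']
# Only needed when Sven is configured with a MySQL database.
OPTIONAL_MODULES = ['MySQLdb']


def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == '__main__':
    print('missing required modules: %s' % (missing_modules(REQUIRED_MODULES) or 'none'))
    print('missing optional modules: %s' % (missing_modules(OPTIONAL_MODULES) or 'none'))
```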

Known issues on Mac OS X (pip and MAMP)

When Sven is used with a MySQL database, Python needs the MySQLdb module. Usually, running

(sven.local)$ pip install mysql-python

is enough. If pip instead fails with "EnvironmentError: mysql_config not found", add the MAMP binaries directory to your PATH and try again:

$ export PATH="$PATH:/Applications/MAMP/Library/bin"

Configure an Apache virtual host

You can use the virtualhost.SAMPLE file to enable the Sven site on Apache:

$ cd /path/to/sven
$ sudo cp virtualhost.SAMPLE /etc/apache2/sites-available/sven

Edit the copied file so that its WSGIScriptAlias directive points to /path/to/sven/sven.wsgi, then enable the site and reload Apache:

$ sudo a2ensite sven
$ sudo service apache2 reload

The Sven site will then be available at http://sven.local (or whichever host your ServerName directive sets).

Configure WSGI

You can start from the sven.wsgi.SAMPLE file: rename it to sven.wsgi and make the WSGIScriptAlias directive in your virtual host point to this file.

Inside sven.wsgi, change /path/to/ to match your virtualenv location, e.g.:

site.addsitedir('/home/daniele/.virtualenvs/sven.local/lib/python2.7/site-packages')
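For reference, a complete sven.wsgi along those lines might look like the sketch below. It is not the contents of sven.wsgi.SAMPLE: the virtualenv location, the parent directory /path/to and the settings module name sven.settings are assumptions based on the paths used in this README. django.core.wsgi.get_wsgi_application is the WSGI entry point Django has provided since 1.4; here it is imported on the first request, after the paths have been set up.

```python
import os
import site
import sys

# Hypothetical values: adjust to your own virtualenv and checkout locations.
VIRTUALENV_DIR = os.path.expanduser('~/.virtualenvs/sven.local')
PROJECT_PARENT = '/path/to'  # directory that contains the sven/ checkout


def setup_environment(virtualenv_dir, project_parent, python_version='2.7'):
    """Expose the virtualenv packages and the sven package to mod_wsgi."""
    site_packages = os.path.join(
        virtualenv_dir, 'lib', 'python' + python_version, 'site-packages')
    site.addsitedir(site_packages)  # also processes any .pth files found there
    if project_parent not in sys.path:
        sys.path.append(project_parent)  # lets Python import sven.settings
    os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'sven.settings')
    return site_packages


setup_environment(VIRTUALENV_DIR, PROJECT_PARENT)

_django_app = None


def application(environ, start_response):
    """WSGI callable used by mod_wsgi; builds the Django app on first use."""
    global _django_app
    if _django_app is None:
        from django.core.wsgi import get_wsgi_application
        _django_app = get_wsgi_application()
    return _django_app(environ, start_response)
```

Deferring the Django import keeps the path setup strictly ahead of it; importing get_wsgi_application at module level, right after setup_environment, would work just as well.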

Folder permissions

Create a log file for the ANTA processes (Sven's scripts that drive the Pattern library) and make it writable by the Apache user. Create the media directory with the same permissions:

$ cd /path/to/sven
$ mkdir logs
$ touch logs/anta.log
$ sudo chown -hR your-user:www-data logs
$ chmod -R 0775 logs
$ mkdir media
$ sudo chown your-user:www-data media
$ chmod 0775 media
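To double-check the result, the hypothetical helper below (not part of Sven) reports which of logs, logs/anta.log and media are missing, lack the group-write bit, or optionally do not belong to a given group such as www-data. It uses the Unix-only grp module, which is fine on the Ubuntu setup described here.

```python
import grp
import os
import stat

# Paths relative to /path/to/sven, matching the commands above.
SVEN_PATHS = ('logs', os.path.join('logs', 'anta.log'), 'media')


def permission_problems(base_dir, group=None, relative_paths=SVEN_PATHS):
    """Return (path, reason) pairs for entries Apache may not be able to write."""
    problems = []
    for rel in relative_paths:
        path = os.path.join(base_dir, rel)
        if not os.path.exists(path):
            problems.append((rel, 'missing'))
            continue
        info = os.stat(path)
        if not info.st_mode & stat.S_IWGRP:
            problems.append((rel, 'not group-writable'))
        # Optional ownership check, e.g. group='www-data' on Ubuntu.
        if group is not None and grp.getgrgid(info.st_gid).gr_name != group:
            problems.append((rel, 'group is not %s' % group))
    return problems
```

For example, permission_problems('/path/to/sven', group='www-data') should return an empty list once the commands above have run.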

The media directory will host the document corpora, one folder per corpus. Set MEDIA_ROOT in settings.py accordingly.
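As an illustration of the one-folder-per-corpus layout, a hypothetical helper that lists the corpus folders under MEDIA_ROOT might look like this. The value MEDIA_ROOT = '/path/to/sven/media/' in the comment is an assumption matching the paths above; files placed directly inside MEDIA_ROOT are ignored.

```python
import os

# In settings.py (assumed value): MEDIA_ROOT = '/path/to/sven/media/'


def list_corpora(media_root):
    """Return the names of corpus folders under media_root, sorted."""
    return sorted(
        name for name in os.listdir(media_root)
        if os.path.isdir(os.path.join(media_root, name))
    )
```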

Configure settings.py file

Edit the settings.py you copied from settings.py.SAMPLE and adjust the variables to match your configuration. Make sure that sven.anta and sven.core are listed in INSTALLED_APPS:

...
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3', 
        'NAME': '/path/to/sven/sven.db',    
...
INSTALLED_APPS = (
...
    # 'django.contrib.admindocs',
    'sven.anta',
    'sven.core',
)
...
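If you use MySQL instead of SQLite (which requires the MySQLdb module discussed in the Mac OS notes), the DATABASES entry might look like the fragment below. django.db.backends.mysql is Django's MySQL backend; the database name and credentials are hypothetical placeholders, and empty HOST and PORT values tell Django to use the defaults.

```python
# settings.py fragment (hypothetical values): MySQL instead of SQLite.
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'sven',           # assumed database name
        'USER': 'sven',           # placeholder credentials
        'PASSWORD': 'change-me',
        'HOST': '',               # empty string: localhost
        'PORT': '',               # empty string: default MySQL port
    }
}
```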

Start your engines

Activate your virtualenv if it is not yet activated:

$ source /usr/local/bin/virtualenvwrapper.sh
$ workon sven.local
$ cd /path/to/sven

Then create the database tables:

$ python manage.py syncdb

Cross your fingers and open your Sven URL (e.g. http://sven.local)...

Create a Corpus

A corpus is a collection of documents. Standard tf-idf text analysis is performed on the whole corpus.
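For readers unfamiliar with tf-idf, the plain-Python sketch below computes the textbook weights: a term's frequency within a document multiplied by the log of the total number of documents divided by the number of documents containing the term. It is not Sven's implementation (Sven relies on Pattern for its text analysis, whose weighting and normalization may differ), and the corpus format, a dict mapping document names to token lists, is an assumption.

```python
import math
from collections import Counter


def tf_idf(corpus):
    """Compute tf-idf weights for every term of every document.

    corpus: dict mapping document name -> list of tokens.
    Returns a dict mapping document name -> {term: weight}, where
    tf  = term count / number of tokens in the document and
    idf = log(number of documents / number of documents containing the term).
    """
    n_docs = len(corpus)
    doc_freq = Counter()
    for tokens in corpus.values():
        doc_freq.update(set(tokens))  # count each term once per document
    weights = {}
    for name, tokens in corpus.items():
        counts = Counter(tokens)
        total = float(len(tokens))
        weights[name] = {
            term: (count / total) * math.log(float(n_docs) / doc_freq[term])
            for term, count in counts.items()
        }
    return weights
```

With corpus = {'a': ['cat', 'sat', 'cat'], 'b': ['dog', 'sat']}, the term sat appears in every document and gets weight 0, while cat in document a gets (2/3) * ln 2.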