Extract text from MS Office documents with Apache POI


nawoto/delta_attack

 
 


= delta_attack


== Description

Extracts plain text from MS Office files.

== Installation


=== Archive Installation

 $ rake install

=== Gem Installation

 $ gem source -a http://gems.github.com
 $ gem install moro-delta-attack

== Features/Problems

Text is extracted with Apache POI running on JRuby.
The library uses a client/server architecture.

The extraction server runs on JRuby, but the client works with
both CRuby (MRI) and JRuby.

This library was originally written to index Office documents in a
full-text search engine.
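
This README does not describe how the client and server communicate. The code below is only a minimal sketch of the same split. It assumes a made-up length-prefixed protocol over TCP, which is not delta_attack's actual wire format. The method names write_frame, read_frame, serve_once and extract_remote are hypothetical. The block passed to serve_once stands in for the POI-based extraction that the real server performs on JRuby.

```ruby
require 'socket'

# Hypothetical framing, NOT delta_attack's real protocol: every message is a
# 4-byte big-endian length followed by that many bytes.
def write_frame(io, data)
  data = data.b # send raw bytes regardless of the string's encoding
  io.write([data.bytesize].pack('N'))
  io.write(data)
end

def read_frame(io)
  header = io.read(4)
  raise EOFError, 'connection closed' if header.nil? || header.bytesize < 4
  io.read(header.unpack1('N'))
end

# Server side (the part that would run on JRuby with POI): accepts one
# connection, passes the received document bytes to the extractor block,
# and sends the resulting text back.
def serve_once(server, &extractor)
  Thread.new do
    conn = server.accept
    begin
      write_frame(conn, extractor.call(read_frame(conn)))
    ensure
      conn.close
    end
  end
end

# Client side (could run on CRuby or JRuby): sends the file bytes to the
# server and returns the extracted text as a UTF-8 string.
def extract_remote(host, port, bytes)
  TCPSocket.open(host, port) do |sock|
    write_frame(sock, bytes)
    read_frame(sock).force_encoding(Encoding::UTF_8)
  end
end
```

Only bytes and text cross the socket, so the client needs nothing beyond Ruby's standard socket library. That is what lets it run on CRuby while POI stays inside the JRuby process.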

== Synopsis

First, start DeltaAttackServer, which requires JRuby and Apache POI:

 $ export CLASSPATH=path/to/poi-3.1-FINAL/poi-3.1-FINAL-20080629.jar:\
                    path/to/poi-3.1-FINAL/poi-scratchpad-3.1-FINAL-20080629.jar
 $ jruby bin/delta_attack_server

Then you can use DeltaAttack::Client from either CRuby (MRI) or JRuby:

 require 'delta_attack/client'
 DeltaAttack::Client.cast("path/to/some.xls")
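
For the full-text indexing use case mentioned above, a caller typically walks a directory and extracts every Office file. The following is a minimal sketch under several assumptions:

* DeltaAttack::Client.cast returns the extracted text as a String. The README does not say this.
* .doc, .xls and .ppt are the formats of interest.
* extract_directory and OFFICE_EXTENSIONS are hypothetical names.

The extraction itself is passed in as a block, so the sketch runs without a server. Per-file errors are collected instead of aborting the whole run.

```ruby
# Hypothetical list of extensions to extract; adjust to the formats the
# server actually supports.
OFFICE_EXTENSIONS = %w[.doc .xls .ppt].freeze

# Walks +dir+ recursively and returns two hashes:
#   texts    - { path => extracted text } for every Office file found
#   failures - { path => error message } for files whose extraction raised
# The block performs the extraction, e.g. DeltaAttack::Client.cast(path).
def extract_directory(dir)
  texts = {}
  failures = {}
  Dir.glob(File.join(dir, '**', '*')).sort.each do |path|
    next unless File.file?(path)
    next unless OFFICE_EXTENSIONS.include?(File.extname(path).downcase)

    begin
      texts[path] = yield(path)
    rescue StandardError => e
      failures[path] = e.message # keep going; one bad file shouldn't stop the batch
    end
  end
  [texts, failures]
end
```

With the real client, the call might look like:

 texts, failures = extract_directory("path/to/docs") { |path| DeltaAttack::Client.cast(path) }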

== Copyright

Author::    moro <[email protected]>
Copyright:: Copyright (c) 2008 moro
License::   MIT
