2.1 Create Classifier Processor (Scala)

Nepomuk Seiler edited this page Feb 2, 2012 · 1 revision

This tutorial explains how to create a classifier with the knowing.core.TProcessor trait. First of all configure Knowing as explained in Knowing-Launch.

Create Project

Create a new Eclipse Plug-in Project with the following options:

  • No contributions to the UI
  • Generate an Activator: yes

Configure Project

Right-click on the project -> Configure -> Add Scala nature

Configure Dependencies

Open META-INF/MANIFEST.MF and switch to the Dependencies tab.

  • Add de.lmu.ifi.dbs.knowing.core as a required plug-in.
  • Remove org.scala-ide.library.2.9.0.
  • Imported packages -> none.
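After these steps, the dependency section of your MANIFEST.MF might contain entries along these lines (the Bundle-SymbolicName and version are placeholders for your own project):

```
Manifest-Version: 1.0
Bundle-ManifestVersion: 2
Bundle-SymbolicName: com.example.knowing
Bundle-Version: 1.0.0.qualifier
Bundle-Activator: com.example.knowing.Activator
Require-Bundle: de.lmu.ifi.dbs.knowing.core,
 org.eclipse.core.runtime
```

Note the leading space on the continuation line; manifest syntax requires it for wrapped header values.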

Optional

Eclipse automatically adds org.eclipse.core.runtime as a dependency. If you want to use another OSGi runtime, remove org.eclipse.core.runtime from the required bundles and add org.osgi.framework to the imported packages.
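For example, the relevant MANIFEST.MF entries might then read:

```
Require-Bundle: de.lmu.ifi.dbs.knowing.core
Import-Package: org.osgi.framework
```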

Create Processor

A processor represents a computing unit in a data-mining process. de.lmu.ifi.dbs.knowing.core.processing.TProcessor has five methods:

def build(instances: Instances)

This method gets called when a TLoader loads a set of instances and sends a Result(instances) message to all its listeners. In our classifier we would build our internal model from this dataset.

def query(query: Instance): Instances

The query method is used when another node (normally a TProcessor) sends a Query message to this actor. The return type is again weka.core.Instances. However, these results should be generated via the static methods of knowing.core.util.ResultsUtil. Currently there are only a few standard datasets for general purposes.

def result(result: Instances, query: Instance)

If our classifier queries other processors, we receive the results via this method. Behind the scenes this is a QueryResult message. You can see a sample usage in knowing.core.validation.XCrossValidation.

def getClassLabels: Array[String]

This method is currently a helper for when our processor wants to create results and attach the class labels it uses.

def configure(properties: Properties)

Last but not least, this method configures the behaviour of our processor. It's called on startup.
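Typically, configure reads options from the Properties object and falls back to defaults when a key is missing. The class and property keys below (kernel, maxIterations) are made up for illustration and are not part of the Knowing API:

```scala
import java.util.Properties

class ConfigurableModel {
  // Default values, used when a property is absent
  var kernel: String = "linear"
  var maxIterations: Int = 100

  def configure(properties: Properties): Unit = {
    // getProperty's second argument is the fallback default
    kernel = properties.getProperty("kernel", kernel)
    maxIterations = properties.getProperty("maxIterations", maxIterations.toString).toInt
  }
}

val model = new ConfigurableModel
val props = new Properties()
props.setProperty("kernel", "rbf")
model.configure(props)
// model.kernel is now "rbf"; maxIterations keeps its default
```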

Putting it together, a minimal class might look like this:

package com.example.knowing

import java.util.Properties

import de.lmu.ifi.dbs.knowing.core.processing.TProcessor
import de.lmu.ifi.dbs.knowing.core.util.ResultsUtil

import weka.core.{ Instance, Instances }

class MyProcessor extends TProcessor {

  var classLabels: Array[String] = Array()

  def build(instances: Instances) {
    //Build our internal model
  }

  /**
   * Returns just an empty dataset
   */
  def query(query: Instance): Instances = ResultsUtil.emptyResult

  /**
   * This processor doesn't query any other processors
   */
  def result(result: Instances, query: Instance) {
    /* ## Example usage ##
     * val q = new DenseInstance
     * sendEvent(Query(q))
     */

  }

  def configure(properties: Properties) {
    // configure options here
  }

  def getClassLabels: Array[String] = classLabels

}

Actors, Events and Communication

NOTE: Under construction

Every data-mining process is organized in a DPU (Data Processing Unit). A DPU is structured as a graph, and the idea is that every node in it is represented by an actor. We use Akka as our actor framework, so look there for more information about actors.

Actors communicate via messages; in Knowing these are called Events. Mostly these events are abstracted away and corresponding methods, implemented by the programmer, are called. The idea is to create data-mining processes that execute robustly and in parallel. Take neural networks as an example:

Every neuron could be represented by an actor, while a single class/factory provides the implementation. The evaluation process is then purely event-driven: a Start event forces the TLoaders to load the input data, which they send to their linked neurons. Every neuron sends its results through the network until the output neurons send Results to the presentation layer (TPresenter).
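The flow above can be sketched in plain Scala. This is a deliberately simplified, synchronous illustration: the event names mirror Knowing's messages, but Loader and MeanProcessor are made-up stand-ins, and the real framework dispatches these messages asynchronously through Akka actors.

```scala
sealed trait Event
case object Start extends Event
final case class Result(data: Seq[Double]) extends Event

trait Node {
  def receive(event: Event): Unit
}

// A "loader": on Start it pushes its dataset to all listeners
class Loader(data: Seq[Double], listeners: Seq[Node]) extends Node {
  def receive(event: Event): Unit = event match {
    case Start => listeners.foreach(_.receive(Result(data)))
    case _     => // ignore anything else
  }
}

// A "processor" whose internal model is just the mean of the data
class MeanProcessor extends Node {
  var model: Double = 0.0
  def receive(event: Event): Unit = event match {
    case Result(data) => model = data.sum / data.size // the "build" step
    case _            =>
  }
}

val processor = new MeanProcessor
val loader    = new Loader(Seq(1.0, 2.0, 3.0), Seq(processor))
loader.receive(Start)
println(processor.model) // 2.0
```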

Register as OSGi Service

Last but not least, we have to register MyFactory as an OSGi service. Open your Activator.java and add the following line of code to the start method:

myFactory = context.registerService(TFactory.class.getName(), new MyFactory(), null);

Your Activator should look like this.

package com.example.knowing;

import org.osgi.framework.BundleActivator;
import org.osgi.framework.BundleContext;
import org.osgi.framework.ServiceRegistration;

import de.lmu.ifi.dbs.knowing.core.factory.TFactory;

public class Activator implements BundleActivator {

	private static BundleContext context;

	static BundleContext getContext() {
		return context;
	}

	private ServiceRegistration myFactory;

	/*
	 * (non-Javadoc)
	 * @see org.osgi.framework.BundleActivator#start(org.osgi.framework.BundleContext)
	 */
	public void start(BundleContext bundleContext) throws Exception {
		Activator.context = bundleContext;
		myFactory = context.registerService(TFactory.class.getName(), new MyFactory(), null);
	}

	/*
	 * (non-Javadoc)
	 * @see org.osgi.framework.BundleActivator#stop(org.osgi.framework.BundleContext)
	 */
	public void stop(BundleContext bundleContext) throws Exception {
		myFactory.unregister();
		Activator.context = null;
	}

}

Make sure that your plug-in gets started early so the service is registered. You can also use declarative services if you prefer. When you reference MyClassifier in a DPU, its id is com.example.knowing.MyClassifier.
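If you go the declarative-services route, a component definition could look roughly like this. The file location (OSGI-INF/myfactory.xml) and the matching Service-Component: OSGI-INF/myfactory.xml manifest header are assumptions; only the TFactory interface name comes from this tutorial:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<scr:component xmlns:scr="http://www.osgi.org/xmlns/scr/v1.1.0"
    name="com.example.knowing.MyFactory">
  <implementation class="com.example.knowing.MyFactory"/>
  <service>
    <provide interface="de.lmu.ifi.dbs.knowing.core.factory.TFactory"/>
  </service>
</scr:component>
```

With this in place you no longer need the registerService call in the Activator; the Service Component Runtime registers the factory for you.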