Find the code here: http://github.com/bwlewis/esperr.
Source package: http://illposed.net/esperr_0.1.0.tar.gz
Find R here: http://www.r-project.org.
Find Esper here: http://esper.codehaus.org/.


esperr: Streaming event processing for R

The esperr package for R incorporates the Esper framework and implements an R-language interface to its XML and Java bean event API. Events are defined by Java objects or by XML documents that follow an XML schema definition (XSD) document. The package includes example schema and events.

An event is an immutable, structured data object associated with a time in the past. Esper is a set of open-source libraries for performing computations over multiple sequences of events (event streams). Paul Fremantle of WSO2 has defined a simple taxonomy of event-processing terms.

Slides from a talk given at the R in Finance 2010 conference: LewisKaneRInFinance.pdf.

Note that this package is quite new and still in active development! Please, please, feel free to contribute.


Event types and performance

Events may be described programmatically in many ways. The esperr package exposes two event descriptions directly to R: XML and plain-old Java objects (POJOs). Each event description has advantages and disadvantages. The chief difference is performance: in our talk we reported a throughput of about 4,000 events/second using XML-described events. POJO events in the same VWAP example on the same hardware yielded about 250,000 events/second, roughly 60 times faster.

XML events are defined by a plain-text XML schema document. Their chief advantage is flexibility: they require only a text editor to define, they are human-readable, and no extra software is needed to create and use them. Their chief drawback is that the XML must be parsed, which incurs extra processing overhead. The esperr package presently uses a very simple XML event representation implemented with the "document object model" (DOM). A much higher-performance framework is available in the Apache Axiom library. Axiom is geared toward processing streaming XML events efficiently, and it can also handle raw binary data, among other advantages. We plan to include Axiom support in a future revision of the esperr package.
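To make the parsing overhead concrete, here is a minimal sketch of reading a single XML event through the DOM API that ships with the JDK. It is an illustration only, not the code esperr uses internally. The Trade element and its symbol, price and volume fields are hypothetical names chosen for this example, and the package's bundled example schema may look different.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class XmlTradeEventDemo {
    // Hypothetical event layout, assumed for illustration only.
    static final String SAMPLE_EVENT =
        "<Trade><symbol>IBM</symbol><price>101.5</price><volume>200</volume></Trade>";

    // Parse the whole event text into an in-memory DOM tree.
    // Every event pays this cost before any field can be read.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse event", e);
        }
    }

    // Return the text of the first element with the given tag name.
    static String field(Document doc, String name) {
        return doc.getElementsByTagName(name).item(0).getTextContent();
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE_EVENT);
        String symbol = field(doc, "symbol");
        // Field values arrive as text and must be converted before use.
        double price = Double.parseDouble(field(doc, "price"));
        long volume = Long.parseLong(field(doc, "volume"));
        System.out.println(symbol + " " + price + " " + volume);
    }
}
```

Each event is parsed into a full tree and its fields are converted from text. That per-event work is the overhead described above.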

POJO events are simply Java objects that conform to the Java bean convention. They require a Java compiler to create, although once created the esperr package can process them without a compiler. Their main advantage is performance. The Esper library is particularly well-suited to processing POJOs. You may see up to several orders of magnitude greater performance with POJO events and esperr than with XML events.
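As a rough sketch of what such an event might look like, the class below describes a trade using bean-style getters. The TradeEvent name and its fields are hypothetical and are not part of the esperr package. The vwap helper is plain Java, included only to show typed fields being read directly, and it does not use the Esper API. In practice a class like this would normally be declared public in its own source file.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical event class, assumed for illustration only.
class TradeEvent {
    // Final fields keep the event immutable once constructed.
    private final String symbol;
    private final double price;
    private final long volume;

    TradeEvent(String symbol, double price, long volume) {
        this.symbol = symbol;
        this.price = price;
        this.volume = volume;
    }

    // Bean-style getters expose the properties "symbol", "price" and "volume".
    public String getSymbol() { return symbol; }
    public double getPrice() { return price; }
    public long getVolume() { return volume; }

    // Volume-weighted average price: sum(price * volume) / sum(volume).
    static double vwap(List<TradeEvent> trades) {
        double notional = 0.0;
        long shares = 0;
        for (TradeEvent t : trades) {
            notional += t.getPrice() * t.getVolume();
            shares += t.getVolume();
        }
        if (shares == 0) {
            throw new IllegalArgumentException("total volume is zero");
        }
        return notional / shares;
    }

    public static void main(String[] args) {
        List<TradeEvent> trades = Arrays.asList(
            new TradeEvent("IBM", 100.0, 100),
            new TradeEvent("IBM", 102.0, 300));
        System.out.println(vwap(trades)); // prints 101.5
    }
}
```

Unlike the XML form, the fields are already typed Java values, so nothing has to be parsed or converted when an event is read.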

How to choose?

XML events provide a very quick and dynamic way to create and experiment with event structure. POJO events require a bit more work and a Java compiler. One reasonable approach is to prototype with XML and then implement the events in Java for production.

So, is that it?

Not quite. The esperr package also includes a basic prototype interface to the Redis database that makes it easy to send output events for offline or distributed processing. Our intuition is that the combination of Esper stream processing with a Redis output event cache is a powerful one, but we are just beginning to explore this combination.

Right now at least, the package includes all the elements one needs to perform distributed parallel event processing with R.