
Custom Aggregate Functions Using Approximate Algorithms

shmurthy62 edited this page Feb 11, 2015 · 2 revisions

# Computing topN

The following sample EPL computes the top 5 items for a single dimension:

    insert into TopFiveStream select topN(10000, 5, item) from rawStream;
    @OutputTo()
    select * from TopFiveStream;

The first argument to topN is the buffer capacity. The algorithm uses frequency estimation and works within a fixed-size buffer: when the buffer's capacity is exceeded, the least frequently seen items are evicted. The higher the capacity, the more memory is consumed. The second argument specifies how many top items to return; this example asks for the top 5. The third argument, item, is the dimension being ranked.
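This page does not say which frequency-estimation scheme topN uses internally. As an illustration of the "fixed buffer, evict the least frequent item" idea, here is a minimal sketch of the Space-Saving algorithm, a well-known scheme of this kind. The class name `SpaceSavingTopN` is hypothetical and is not Jetstream's implementation; `capacity` plays the role of the first topN argument and `n` the second.

```python
from collections.abc import Hashable
from typing import Dict, List, Tuple


class SpaceSavingTopN:
    """Hypothetical sketch of Space-Saving: at most `capacity` counters are kept."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.counts: Dict[Hashable, int] = {}

    def add(self, item: Hashable) -> None:
        if item in self.counts:
            # Already tracked: just bump its counter.
            self.counts[item] += 1
        elif len(self.counts) < self.capacity:
            # Buffer not full yet: start tracking the new item.
            self.counts[item] = 1
        else:
            # Buffer full: evict the least frequently seen item. The newcomer
            # inherits the evicted count + 1, so counts may be overestimates.
            victim = min(self.counts, key=self.counts.__getitem__)
            min_count = self.counts.pop(victim)
            self.counts[item] = min_count + 1

    def top(self, n: int) -> List[Tuple[Hashable, int]]:
        # Return the n items with the highest (estimated) counts.
        return sorted(self.counts.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

For example, `SpaceSavingTopN(10000)` fed every `item` value and queried with `top(5)` loosely mirrors `topN(10000, 5, item)`. The linear scan for the minimum keeps the sketch short; a production implementation would use a structure that finds the smallest counter without scanning the whole buffer.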

# Computing Distinct Count

    insert into DistinctCountStream select distinctCount(item) from rawStream;
    @OutputTo()
    select * from DistinctCountStream;

The distinctCount implementation uses the HyperLogLog algorithm, which estimates the number of distinct values approximately and is extremely efficient in both space and time.
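To show why HyperLogLog needs so little memory, here is a minimal sketch of the standard algorithm: each value is hashed, the top `p` bits pick one of `m = 2**p` small registers, and each register remembers the longest run of leading zero bits it has seen. The class name, the choice of `p = 14`, and the SHA-1-derived 64-bit hash are assumptions for illustration, not details of the distinctCount implementation.

```python
import hashlib
import math


class HyperLogLog:
    """Hypothetical minimal HyperLogLog cardinality estimator (illustrative only)."""

    def __init__(self, p: int = 14) -> None:
        if not 7 <= p <= 16:
            raise ValueError("p must be between 7 and 16")
        self.p = p
        self.m = 1 << p  # number of registers
        self.registers = [0] * self.m
        # Bias-correction constant from the HyperLogLog paper (valid for m >= 128).
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    @staticmethod
    def _hash64(item) -> int:
        # Assumption: any well-mixed 64-bit hash works; here, the first 8 bytes of SHA-1.
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, item) -> None:
        x = self._hash64(item)
        idx = x >> (64 - self.p)                  # top p bits select the register
        w = x & ((1 << (64 - self.p)) - 1)        # remaining 64 - p bits
        rank = (64 - self.p) - w.bit_length() + 1  # 1-based position of leftmost 1-bit
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self) -> float:
        # Raw estimate: normalized harmonic mean of 2**register across registers.
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros > 0:
            # Small-range correction: fall back to linear counting.
            estimate = self.m * math.log(self.m / zeros)
        return estimate
```

With `p = 14` the sketch holds 16,384 small registers regardless of how many items are added, and adding a duplicate never changes the estimate because it hashes to the same register and rank.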

# Computing Percentiles

    @OutputTo()
    select percentile(0.95, item) as percentile95 from rawStream;

The value passed as the second argument to percentile must be of type Double; otherwise, the engine throws an exception.
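This page does not describe how percentile is computed internally. As a reference for what `percentile(0.95, item)` estimates, here is an exact nearest-rank percentile over a batch of values. It also rejects non-float inputs to loosely mirror the Double requirement; the function name and the use of `TypeError` are assumptions, not the engine's actual behavior.

```python
import math
from typing import Iterable


def nearest_rank_percentile(q: float, values: Iterable[float]) -> float:
    """Hypothetical helper: smallest value with at least a fraction q of the data at or below it."""
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be in (0, 1]")
    data = list(values)
    if not data:
        raise ValueError("no values")
    for v in data:
        # Mirror the Double-only contract: reject ints, strings, and other types.
        if not isinstance(v, float):
            raise TypeError(f"expected float, got {type(v).__name__}")
    data.sort()
    rank = math.ceil(q * len(data))  # 1-based nearest rank
    return data[rank - 1]
```

This exact version must keep every value in memory, which is exactly the cost an approximate streaming percentile is meant to avoid.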