Custom Aggregate Functions Using Approximate Algorithms
shmurthy62 edited this page Feb 11, 2015 · 2 revisions
# Computing topN
The following sample EPL computes the top 5 items for a single dimension:
```
insert into TopFiveStream select topN(10000, 5, item) from rawStream;
@OutputTo()
select * from TopFiveStream;
```
The first argument to topN is the buffer capacity. The algorithm estimates item frequencies within this fixed-size buffer: when the buffer is full, the least frequently seen items are evicted to make room. The higher the capacity, the more memory is consumed. The second argument specifies how many top items to return (5 in this example), and the third argument is the field whose values are ranked.
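This page does not say which fixed-buffer algorithm topN uses internally. As an illustration only, the Python sketch below implements Space-Saving, one well-known fixed-buffer frequency estimator that matches the behavior described above. The `TopN` class and its `offer`/`top` methods are hypothetical; `capacity` and `n` mirror topN's first two arguments.

```python
from typing import Dict, Hashable, List, Tuple


class TopN:
    """Hypothetical Space-Saving sketch: keeps at most `capacity` counters."""

    def __init__(self, capacity: int, n: int) -> None:
        if n < 1 or capacity < n:
            raise ValueError("need 1 <= n <= capacity")
        self.capacity = capacity
        self.n = n
        self.counts: Dict[Hashable, int] = {}

    def offer(self, item: Hashable) -> None:
        if item in self.counts:
            # Already tracked: bump its counter.
            self.counts[item] += 1
        elif len(self.counts) < self.capacity:
            # Free slot in the fixed buffer: start counting the new item.
            self.counts[item] = 1
        else:
            # Buffer full: evict the least frequently seen item. The newcomer
            # inherits that count plus one, so counts may be overestimates.
            victim = min(self.counts, key=self.counts.__getitem__)
            floor = self.counts.pop(victim)
            self.counts[item] = floor + 1

    def top(self) -> List[Tuple[Hashable, int]]:
        # Highest estimated counts first; only the n best are returned.
        ranked = sorted(self.counts.items(), key=lambda kv: kv[1], reverse=True)
        return ranked[: self.n]


if __name__ == "__main__":
    sketch = TopN(capacity=10000, n=5)  # mirrors topN(10000, 5, item)
    for word in "a b a c b a d e f a b".split():
        sketch.offer(word)
    print(sketch.top())  # [('a', 4), ('b', 3), ('c', 1), ('d', 1), ('e', 1)]
```

Finding the minimum here scans the whole buffer, which keeps the sketch short; a production implementation would typically use a heap or a bucketed list to make eviction cheap.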
# Computing distinctCount

```
insert into DistinctCountStream select distinctCount(item) from rawStream;
@OutputTo()
select * from DistinctCountStream;
```
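The distinctCount implementation uses the HyperLogLog algorithm, which estimates the number of distinct values from a small, fixed-size array of registers. Memory use stays constant no matter how many distinct values are seen, and each update takes constant time.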
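To show why HyperLogLog is so compact, here is a minimal textbook HyperLogLog in Python. It is a sketch, not the engine's implementation: the register-count parameter `p` (2^p registers), the use of SHA-1 as the 64-bit hash, and the `HyperLogLog` class itself are all assumptions made for illustration.

```python
import hashlib
import math


class HyperLogLog:
    """Hypothetical textbook HyperLogLog with 2**p registers."""

    def __init__(self, p: int = 10) -> None:
        if not 7 <= p <= 16:
            raise ValueError("p must be between 7 and 16")
        self.p = p
        self.m = 1 << p                      # number of registers
        self.registers = [0] * self.m
        # Bias-correction constant; this formula is valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    @staticmethod
    def _hash64(value) -> int:
        # 64-bit hash taken from SHA-1 (an arbitrary choice for this sketch).
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, value) -> None:
        x = self._hash64(value)
        w = 64 - self.p
        idx = x >> w                         # top p bits pick a register
        rest = x & ((1 << w) - 1)            # remaining w bits
        # Rank = 1-based position of the leftmost 1-bit in `rest` (w + 1 if zero).
        rank = w - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self) -> int:
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            estimate = self.m * math.log(self.m / zeros)
        return round(estimate)


if __name__ == "__main__":
    hll = HyperLogLog(p=10)
    for i in range(100_000):
        hll.add(f"user-{i % 20_000}")  # 20,000 distinct values, each repeated
    print(hll.count())  # close to 20000 (standard error is about 3% at p=10)
```

Duplicates never change a register, so repeated items cost nothing, and the memory footprint is fixed at 2^p small registers regardless of stream size.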
# Computing percentile

```
@OutputTo()
select percentile(0.95, item) as percentile95 from rawStream;
```
The first argument is the percentile to compute, expressed as a fraction (0.95 for the 95th percentile). The value passed as the second argument must be of type Double; otherwise, the engine throws an exception.
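The page does not describe how percentile is computed. The Python sketch below shows one possible approximate approach under stated assumptions: a reservoir sample (Algorithm R) of a hypothetical fixed `sample_size`, with the percentile read off the sample using the nearest-rank definition. The `ApproxPercentile` class is hypothetical; its `add` method rejects non-float inputs to mirror the Double requirement above.

```python
import math
import random
from typing import List, Optional


class ApproxPercentile:
    """Hypothetical reservoir-sampled percentile estimator."""

    def __init__(self, fraction: float, sample_size: int = 1000,
                 seed: Optional[int] = None) -> None:
        if not 0.0 < fraction <= 1.0:
            raise ValueError("fraction must be in (0, 1]")
        self.fraction = fraction
        self.sample_size = sample_size
        self.sample: List[float] = []
        self.seen = 0
        self._rng = random.Random(seed)

    def add(self, value: float) -> None:
        # Mirror the Double requirement: ints, strings, etc. are rejected.
        if not isinstance(value, float):
            raise TypeError(f"expected float, got {type(value).__name__}")
        self.seen += 1
        if len(self.sample) < self.sample_size:
            self.sample.append(value)
        else:
            # Algorithm R: keep the new value with probability sample_size / seen.
            j = self._rng.randrange(self.seen)
            if j < self.sample_size:
                self.sample[j] = value

    def value(self) -> Optional[float]:
        if not self.sample:
            return None
        ordered = sorted(self.sample)
        # Nearest rank: smallest value with at least `fraction` of the sample at or below it.
        rank = max(1, math.ceil(self.fraction * len(ordered)))
        return ordered[rank - 1]


if __name__ == "__main__":
    p95 = ApproxPercentile(0.95, seed=42)
    for latency_ms in (12.0, 15.5, 9.8, 120.0, 14.2):
        p95.add(latency_ms)
    print(p95.value())  # 120.0: only 5 values, so the sample is exact
```

Until `sample_size` values have arrived the result is exact; after that, memory stays bounded and the answer becomes an estimate whose accuracy depends on the sample size.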