-
Notifications
You must be signed in to change notification settings - Fork 1k
FAQ:Pull Queries
Andy Coates edited this page Dec 2, 2019
·
4 revisions
-- Given an example source:
CREATE STREAM INPUT (ID INT) WITH (KAFKA_TOPIC='input', VALUE_FORMAT='JSON', PARTITIONS=1);
-- Given an example aggregate table:
CREATE TABLE OUTPUT AS SELECT COUNT(1) AS COUNT FROM INPUT GROUP BY ID;
-- Given running a push query in a different window:
SELECT * FROM OUTPUT EMIT CHANGES LIMIT 2;
-- When: Inject some data:
INSERT INTO INPUT (ID) VALUES (42);
INSERT INTO INPUT (ID) VALUES (42);
-- Then you'll see the row output by the push query, and it will include ROWTIME:
+---------------+-------+-------+
|ROWTIME |ROWKEY |COUNT |
+---------------+-------+-------+
|1571769443063 |42 |1 |
|1571769543463 |42 |1 |
-- However, if you then issue a pull query no ROWTIME is returned:
SELECT * FROM OUTPUT WHERE ROWKEY='42';
ROWKEY STRING KEY | COUNT BIGINT
42 | 2
You may be left wondering why ROWTIME is not being returned by the pull query?
With the merger of https://github.com/confluentinc/ksql/pull/4010 pull queries now support ROWTIME
in the projection.
There is no conceptual reason why pull queries don't return ROWTIME, it's just a missing feature to return the timestamp. It's tracked via https://github.com/confluentinc/ksql/issues/3623.
As a workaround, you could "mimic" the internal ROWTIME
computation in your query:
-- We can change the OUTPUT table definition to include the ROWTIME we need:
-- Note: you'll need to terminate the old query and drop the old OUTPUT table def first
CREATE TABLE OUTPUT AS SELECT MAX(ROWTIME) AS MAX_TIME, COUNT(1) AS COUNT FROM INPUT GROUP BY ID;
-- Inject some data:
INSERT INTO INPUT (ID) VALUES (42);
INSERT INTO INPUT (ID) VALUES (42);
-- And then pull queries will return this:
SELECT * FROM OUTPUT WHERE ROWKEY='42';
ROWKEY STRING KEY | MAX_TIME BIGINT | COUNT BIGINT
42 | 1571772033364 | 2