Source: Zoomdata Blog

All Data Streams Have a Common Attribute: Time

If you think only live data can be streamed, this post is for you. With Zoomdata, any data source that has a date-time field* can be played in the Data DVR. You can visually “play” historical data, just like streaming a movie.

Time for a Change: Fast Data Sinks Are Changing Analytics

But first, let’s talk about Zoomdata Live Mode. Live Mode can refresh data as often as once per second. We do this without connecting to a live data stream. Yup, that’s not what most people expect. Instead, we connect directly to high-performing “data sinks.” Zoomdata customers’ favorite data sinks for fast analytics on real-time data are modern systems such as MemSQL, Impala on Kudu, Databricks Delta, and Snowflake, along with search-engine databases like Elasticsearch and Solr.

Theoretically, any data source with a date-time field* can be configured to play data in Live Mode. In practical terms, we recommend high-performance data sources like these, which can handle high-frequency writes while also being optimized for data exploration and analytical workloads.

Simpler is Almost Always Better

Working with near-real-time updates on high-performing data sinks is simple and elegant. Without the constraint of the data stream, users can work with arbitrary time windows, not just the window of data that is still available in the live stream. Working with one data source also requires far less administrative overhead and maintenance than a complex “lambda” architecture, where data lives in the stream for only so long and then rolls off to persistent storage anyway.

So why not just land all the data in real time and take advantage of the data sink’s advanced analytic capabilities (see our prior post, The Zoomdata Query Engine with Pushdown Processing)? You can then give users a modern data visualization and business intelligence solution that offers a superior visualization and interaction experience on top of live data, so you get your full money’s worth from those back-end investments.
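To make the idea of refreshing from a data sink rather than from a stream more concrete, here is a minimal sketch. It is not Zoomdata's implementation. It assumes an in-memory SQLite table as a stand-in for a fast data sink, and the table name `events`, its columns, and the `LiveAggregate` class are all hypothetical names chosen for illustration. Each refresh queries only rows whose date-time value is newer than the last one already seen and folds them into running totals.

```python
import sqlite3


def make_sink():
    """Create an in-memory SQLite table standing in for a fast data sink."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (ts INTEGER, region TEXT, amount REAL)")
    # An index on the date-time column keeps the "newer than marker" query cheap.
    conn.execute("CREATE INDEX idx_events_ts ON events (ts)")
    return conn


class LiveAggregate:
    """Running SUM(amount) per region, refreshed from rows newer than a marker."""

    def __init__(self, conn):
        self.conn = conn
        self.last_ts = -1   # marker: highest timestamp already aggregated
        self.totals = {}    # region -> running total

    def refresh(self):
        """Fold in only the rows that arrived since the previous refresh.

        Returns the per-region delta, which is all a client would need
        to update a chart that is already on screen.
        """
        # Fix the upper bound first so the grouped sums and the marker agree.
        (max_ts,) = self.conn.execute("SELECT MAX(ts) FROM events").fetchone()
        if max_ts is None or max_ts <= self.last_ts:
            return {}
        rows = self.conn.execute(
            "SELECT region, SUM(amount) FROM events "
            "WHERE ts > ? AND ts <= ? GROUP BY region",
            (self.last_ts, max_ts),
        ).fetchall()
        delta = dict(rows)
        for region, amount in delta.items():
            self.totals[region] = self.totals.get(region, 0.0) + amount
        self.last_ts = max_ts
        return delta
```

A caller would invoke `refresh()` on a timer, for example once per second or once per minute. Note that this simple marker misses rows that arrive late with a timestamp at or below the marker; a production system would need a strategy for late data.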
Goodbye F5, Good Riddance ⌘-R

So how does this all work? First, when you connect to a data source, Zoomdata detects any date-time fields* and each field’s level of granularity (millisecond, second, minute, hour, etc.). If one is found, you can enable a checkbox that allows users to work in Live Mode. It’s that simple!

In Live Mode, Zoomdata runs a full query and then uses internal markers to automatically aggregate and push incremental updates to visualizations, as frequently as once a second. Many of our customers find that once a minute is sufficient for their analytic needs, but you have options.

The end result? No more F5! Users do not need to force expensive full query refreshes, and since only newly arrived data is aggregated and pushed to the user, network and other resources are conserved. The built-in Live Mode functionality increases productivity and overall user satisfaction when working with rapidly changing data.

What About Data Quality and Landing Streaming Data?

Almost any streaming engine, such as Kafka, Spark Streaming, Apache Storm, Apache Apex, Apache NiFi, Amazon Kinesis, and others, can be used to clean and enrich data in the streaming pipeline. You can use a third-party engine to land data, or you can use the built-in Zoomdata Stream Writer Service. The Zoomdata Stream Writer Service receives live data through a RESTful API, passes it to an internal messaging queue for backflow management, and then writes the data to a database. Once landed, the data is accessible like any other data source for seamless Live Mode and historical analysis.

Stream Historical Data On-Demand

Did you know that with Zoomdata you can “watch” or “play” historical data to analyze data movement over time, just like watching a pre-recorded movie? This is particularly fascinating when watching data that populate geographic maps. To play data (live and historical), Zoomdata offers the Data DVR.
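As a rough illustration of what “playing” historical data can mean, the sketch below slices timestamped records into consecutive time windows and yields one frame per window, much like frames of a movie. This is an assumption-laden sketch, not the Data DVR's actual implementation: the function name `playback_frames`, the `(timestamp, key)` record shape, and the count-per-key aggregation are all hypothetical.

```python
from collections import defaultdict


def playback_frames(records, window, start, end):
    """Yield (window_start, counts_by_key) for each window in [start, end).

    records: iterable of (timestamp, key) pairs, in any order.
    window:  window length, in the same units as the timestamps.
    """
    # Bucket every in-range record by the start of the window it falls into.
    buckets = defaultdict(lambda: defaultdict(int))
    for ts, key in records:
        if start <= ts < end:
            bucket_start = start + ((ts - start) // window) * window
            buckets[bucket_start][key] += 1

    # Emit frames in time order, including empty windows, so playback
    # advances at a steady pace even when no data arrived.
    t = start
    while t < end:
        yield t, dict(buckets.get(t, {}))
        t += window
```

Because the frames come from a generator, a player loop can control the pace: sleeping between frames sets the playback speed, and simply not requesting the next frame acts as a pause.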
The Data DVR provides controls for pausing, rewinding, fast-forwarding, and selecting data by time windows. The Data DVR uses WebSocket connections for communication between the Zoomdata Query Engine and users’ dashboards, similar to Data Sharpening™ (see our previous post, Big Data Exploration with Microqueries & Data Sharpening). Zoomdata multiplexes commands from each visualization, which means that each query request gets its own two-way, virtual communication channel. In one direction, the dashboard receives streaming data. In the other, it sends user commands such as changing the playback speed, pausing, or changing the time window. The result is a very fluid, engaging experience that allows regular people to work with complex data in very familiar ways.

Streaming Data Visualization

Zoomdata has been talking about the practical and architectural requirements for streaming data visualization for quite a few years now. To learn more about the challenges with lambda architectures, the opportunities to leverage modern data platforms, and ways to provide a superior end-user experience, we offer a few pieces from years gone by.

2018 blog post: Use a Fast Data Sink, Not a Lambda Architecture for Real-Time Analytics
2017 Spark Summit: Building Real Time BI Systems with Kafka, Spark & Kudu
2017 blog post: The Brave New World of Infinite Data and Immediate Interactive Visualization
2016 Spark Summit: Interactive Visualization of Streaming Data Powered by Spark
2015 press release: Zoomdata First to Optimize for Big Data Visual Analytics on Kudu

Now that we’ve explored the various features of our Query Engine, we’ll look at the role smart data connectors play in Zoomdata microservices.

*Playable date-time fields are determined based on the presence of a partition, index, or sort key.
