Source: BlazingDB Blog

Big Data Small GPU, No Problem

BlazingSQL Now Supports Out-of-Core Execution Model

V0.14 is a HUGE release for BlazingSQL!

Over the past few weeks, the BlazingSQL team has been hard at work on a variety of new features that dramatically expand the capabilities of BlazingSQL.

Out-of-Core Processing

The first and most important is out-of-core processing. BlazingSQL is no longer limited by available GPU memory for query execution.

Since the beginning of the year, we’ve been chipping away at this highly requested feature, and, with V0.14, we’re happy to release a stable experience.

There are a few key concepts to understand about the out-of-core capability of BlazingSQL, which took a fair amount of work.

Graph Execution Model

Firstly, the execution model is now an acyclic graph of execution nodes with a cache in between execution nodes.

[Figure: Example of a BSQL Graph Execution]

Every execution node operates independently on batches of data, allowing it to process steps in parallel as much as possible instead of sequentially.

Multi-Layered Cache

The caches are tiered based on latency and bandwidth. The three tiers we support as of this writing are:

L1: GPU High-Bandwidth Memory (VRAM)
L2: System Memory (RAM)
L3: Disk (NVMe, HDD, etc.)

With out-of-core, we have tested 10 TB workloads with a single Tesla V100 GPU (32 GB). In our tests, we have run 17 of the 30 workloads from the TPCx-BB big data benchmark at scale factor 10,000, which is equivalent to roughly 10 TB of data. We will release benchmarks and numbers around scaling very soon!

We are currently working out edge cases to improve BlazingSQL’s ability to scale effortlessly. Users should no longer think of the number of GPUs as a limit on how big a query can get, merely on how fast it runs (more GPUs equals more performance).

Storage Plugins

We continue to make progress in other areas of the engine, such as Storage Plugins.

In this latest release there are two notable improvements:

MinIO Support: We have enabled support for third-party technologies that implement the AWS SDK. Users can now register MinIO.
Set AWS S3 Region: Users can now set the region of their AWS S3 bucket.

What’s Next

The current release brings significant feature additions and numerous improvements across the engine. Some, but not all, of these improvements include expanded SQL support, exception handling, logging, and Dask integration.

For the next release, the team will dedicate resources to refactoring and code cleanup to improve stability and performance.

The refactoring should also expand our coverage of edge cases with out-of-core, enabling us to complete 100% of the TPCx-BB queries with a single GPU at arbitrary scale factors.

We are also integrating ucx-py with the BlazingSQL communications layer. UCX should allow BlazingSQL to take advantage of high-performance networking (NVLink, InfiniBand, etc.) and dramatically improve shuffle-intensive query workloads when this networking is present.

Big Data Small GPU, No Problem was originally published in BlazingSQL on Medium.
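To make the multi-layered cache idea more concrete, here is a minimal Python sketch of a cache sitting between two execution nodes and spilling from a "GPU" tier to a "RAM" tier to disk as each tier fills up. It is only an illustration under stated assumptions, not BlazingSQL's implementation: the names `TieredCache`, `scan_node`, and `run_pipeline` are hypothetical, tier capacity is counted in batches rather than bytes, the "GPU" and "RAM" tiers are both ordinary Python memory, and the producer finishes before the consumer starts, whereas the real engine runs nodes in parallel.

```python
import os
import pickle
import tempfile
from collections import deque


class TieredCache:
    """Toy FIFO cache between two execution nodes (hypothetical, not BlazingSQL code)."""

    def __init__(self, gpu_slots, ram_slots, spill_dir):
        self.capacity = {"gpu": gpu_slots, "ram": ram_slots}
        self.used = {"gpu": 0, "ram": 0}
        self.spill_dir = spill_dir
        self.entries = deque()  # (tier, payload) in arrival order
        self.spilled = 0        # counter used to name spill files

    def put(self, batch):
        # Prefer the fastest tier that still has room.
        for tier in ("gpu", "ram"):
            if self.used[tier] < self.capacity[tier]:
                self.used[tier] += 1
                self.entries.append((tier, batch))
                return
        # Both in-memory tiers are full: spill the batch to disk.
        path = os.path.join(self.spill_dir, f"batch_{self.spilled}.pkl")
        self.spilled += 1
        with open(path, "wb") as f:
            pickle.dump(batch, f)
        self.entries.append(("disk", path))

    def get(self):
        # Return the oldest batch together with the tier it was held in.
        tier, payload = self.entries.popleft()
        if tier == "disk":
            with open(payload, "rb") as f:
                batch = pickle.load(f)
            os.remove(payload)
            return tier, batch
        self.used[tier] -= 1
        return tier, payload

    def __len__(self):
        return len(self.entries)


def scan_node(n_batches, batch_size):
    """Upstream node: emits the integers 0..n_batches*batch_size-1 in batches."""
    for i in range(n_batches):
        yield list(range(i * batch_size, (i + 1) * batch_size))


def run_pipeline(n_batches=6, batch_size=4, gpu_slots=2, ram_slots=2):
    """Feed batches through the cache, then let a downstream node sum them."""
    with tempfile.TemporaryDirectory() as spill_dir:
        cache = TieredCache(gpu_slots, ram_slots, spill_dir)
        for batch in scan_node(n_batches, batch_size):
            cache.put(batch)
        total, tiers = 0, []
        while len(cache):
            tier, batch = cache.get()
            tiers.append(tier)
            total += sum(batch)
        return total, tiers


if __name__ == "__main__":
    print(run_pipeline())
```

The point of the sketch is that the downstream node gets the correct result no matter which tier held each batch. The tier only affects how quickly a batch can be fetched, which matches the article's claim that GPU memory limits how fast a query runs, not how big it can get.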
