Source: OStatic Blog

New Benchmarks Show Big Increases in Spark Graph Processing

Companies working with Big Data have stayed focused on Apache Spark, an open source cluster computing framework for data analytics originally developed in the AMPLab at UC Berkeley. According to Apache, Spark can run programs up to 100 times faster than Hadoop MapReduce in memory, and ten times faster on disk. When crunching large data sets, those are big performance differences.

The race is also on to speed up Spark-driven workloads. Now, Diablo Technologies and Inspur Systems have announced the release of benchmark data showcasing the benefits of the Memory1 solution for Apache Spark workloads. By increasing the cluster memory size with Memory1, Diablo and Inspur claim they were able to cut processing times for graph analytics by half or more.

Apache Spark's powerful open source platform enables high-speed data processing for large and complex datasets. The new joint benchmarking process used the k-core decomposition algorithm of Spark's GraphX analytics engine, a particularly stressful series of memory-intensive tests. Previous collaboration between Diablo and Inspur demonstrated the advantages of Memory1 for Apache Spark Streaming.

According to Diablo Technologies:

"The new graph testing on Memory1 highlights that users can achieve more work per server and greatly reduce the time needed to process increasingly larger datasets than servers with DRAM alone. As a result, users can get more work done with existing resources, minimize server sprawl, and improve Total Cost of Ownership.

"Diablo and Inspur tested Apache Spark (version 1.5.2) k-core decomposition performance on the same cluster of five servers (Inspur NF5180M4, two Intel Xeon CPU E5-2683 v3 processors, 28 cores each, 256GB DRAM, 1TB NVME drive). The servers were first configured to use only the installed DRAM to process multiple datasets. Next, the cluster was set up to run the tests on the same datasets with 2TB of Memory1 per server."

Completion times for the smallest benchmarked datasets were comparable.
However, the medium-sized sets using Memory1 reportedly completed nearly twice as fast as the traditional DRAM configuration (156 minutes versus 306 minutes). On the large sets, the Memory1 servers reportedly completed the job in 290 minutes, while the DRAM servers were unable to complete it because they ran out of memory. As the datasets grew, Memory1 results were reportedly several factors beyond what DRAM could do alone.

"While we anticipated a substantial performance improvement with Memory1, what's notable is that as the dataset scaled, the cluster without Memory1 failed," said Maher Amer, Chief Technology Officer at Diablo Technologies. "This clearly illustrates the complexity of analytics on big data workloads. Graph processing is the latest use case that shows the benefits of expanded memory in addition to SQL queries, machine learning, and streaming."

"Inspur Memory1 Servers have shown that they are the best solution on the market to complete analytics processing of big data tasks and are a necessary infrastructure for Apache Spark," said Alfie Lew, Solutions Architect at Inspur. "These results are exciting for the big data world, and we look forward to demonstrating more impressive results on additional memory-intensive applications."
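For readers unfamiliar with the workload, k-core decomposition assigns each vertex the largest k for which it still belongs to a subgraph where every vertex has at least k neighbors. The following is a minimal single-machine sketch of the classic "peeling" approach in plain Python. It is an illustration only, not the GraphX code used in the benchmark; the function name `core_numbers` and the edge-list input format are assumptions made for this example.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {vertex: core number} for an undirected graph.

    `edges` is an iterable of (u, v) pairs. This is a hypothetical helper
    for illustration, not part of Spark or GraphX.
    """
    # Build an adjacency map, ignoring self-loops and duplicate edges.
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    # Degree of each vertex within the not-yet-removed part of the graph.
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0

    # Peel: repeatedly remove a vertex of minimum remaining degree.
    # The running maximum of those degrees is the removed vertex's core number.
    while remaining:
        v = min(remaining, key=degree.__getitem__)
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                degree[w] -= 1
    return core


if __name__ == "__main__":
    # A triangle (a, b, c) with one pendant vertex d attached to c.
    print(core_numbers([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]))
```

In the small example, the triangle vertices get core number 2 and the pendant vertex gets 1. The whole adjacency structure and the per-vertex degree state are held in memory throughout the peeling, which gives a rough sense of why this kind of graph workload is sensitive to available memory as datasets grow.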
