In this blog series we’ve been examining the Five Myths of Apache Spark Optimization. The fourth myth we’re considering relates to a common misunderstanding held by many Spark practitioners: “Spark application tuning can eliminate all of the waste in my applications.” Let’s dive into it.

Manually Tuning Spark Applications

Manual tuning refers to a developer’s ability to turn the knobs that control the CPU, memory, and other resources allocated to an application. The resource requirements for an application, especially a Spark application, typically vary over time, sometimes by a great amount. There is a peak period, when resource requirements are at their greatest, and an off-peak period.

In practice, developers almost always size their applications to this peak, or even above it. This ensures that the application has enough resources and will not fail. However, the peak period often represents only a small fraction of the overall time that an application runs. Most of the time, applications run well below this peak allocation.

Figure 1: Developers are required to allocate memory and CPU for each of their Spark applications. To prevent their applications from being killed due to insufficient resources, developers typically set the resource request level to accommodate peak usage requirements.

[…]
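To make those knobs concrete, here is a minimal sketch of what a peak-sized, manually tuned resource request can look like in application code. The application name and the specific values are hypothetical; in many deployments the same settings are passed as spark-submit flags rather than set programmatically.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical peak-sized configuration: every value below is chosen to
// survive the busiest stage of the job, and stays fixed for its whole run.
val spark = SparkSession.builder()
  .appName("peak-sized-etl")                      // hypothetical job name
  .config("spark.executor.instances", "50")       // executor count sized for the peak stage
  .config("spark.executor.cores", "4")            // CPU request per executor
  .config("spark.executor.memory", "16g")         // heap per executor, sized for peak
  .config("spark.executor.memoryOverhead", "2g")  // extra off-heap headroom
  .getOrCreate()
```

With illustrative values like these, each executor holds 16 GB and 4 cores for the entire job, even during off-peak stages that need far less, which is exactly the idle allocation the rest of this post examines.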