Spark on Kubernetes (K8s)

What is Spark?

Apache Spark is a framework used in cluster computing environments for analyzing big data. This platform became widely popular due to its ease of use and the improved data processing speeds over Hadoop.

What is Kubernetes?

Kubernetes, often abbreviated as “K8s”, is an open-source container orchestration platform designed to automate the deployment, scaling, and management of containerized applications.

Why Spark on Kubernetes?

Amid a slowdown in the Hadoop big data market, people seem far more interested in Kubernetes than in older Hadoop-specific technologies such as YARN for resource management and orchestration. The rapid adoption of cloud-native technologies and the containerization of applications are brewing a perfect storm for the ageing Hadoop stack. Nonetheless, projects like Apache Spark are adapting quickly by introducing Kubernetes as an alternative to YARN.

Spark on Kubernetes — Execution Flow

Detailed Steps

i) Clone the Apache Spark project from GitHub and build the Docker image locally

Spark image installed on ACR
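
Step i) can be sketched as a short Python helper that assembles the commands involved. This is a minimal sketch, not the exact commands used in this article: it assumes a Maven build with the Kubernetes profile and the docker-image-tool.sh script that ships in Spark's bin/ directory. The function name spark_image_commands, the registry myregistry.azurecr.io and the tag v1 are hypothetical placeholders.

```python
import shlex


def spark_image_commands(registry, tag, spark_dir="spark"):
    """Return the commands for step i) as argument lists.

    The commands after the clone are meant to be run from inside spark_dir.
    registry and tag are placeholders chosen by the caller.
    """
    return [
        # Clone the Apache Spark source from GitHub.
        ["git", "clone", "https://github.com/apache/spark.git", spark_dir],
        # Build Spark with the Kubernetes profile enabled (assumed build route).
        ["./build/mvn", "-Pkubernetes", "-DskipTests", "clean", "package"],
        # Build the Spark Docker image locally, tagged for the target registry.
        ["./bin/docker-image-tool.sh", "-r", registry, "-t", tag, "build"],
        # Push the image to the registry (for example, an ACR login server).
        ["./bin/docker-image-tool.sh", "-r", registry, "-t", tag, "push"],
    ]


if __name__ == "__main__":
    # Hypothetical registry and tag, printed as copy-pasteable shell lines.
    for cmd in spark_image_commands("myregistry.azurecr.io", "v1"):
        print(shlex.join(cmd))
```

Printing the commands instead of running them keeps the sketch safe to try: review each line, then run it by hand from the Spark checkout after logging in to the registry.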



Sandip Roy


Bigdata and Databricks Practice Lead at Wipro Ltd