Sandip Roy
Mar 22

Spark on Kubernetes(k8s)

Co-authored by Joydeep Das. What is Spark? Apache Spark is a framework used in cluster computing environments for analyzing big data. It became widely popular for its ease of use and its faster data processing compared with Hadoop. Apart from earlier cluster managers (Standalone cluster manager, Hadoop YARN, Apache Mesos)…

Spark

6 min read


Mar 11

Data Validation at Scale with Spark/Databricks

What is Data Quality? Data quality is the measure of how well suited a data set is to serve its specific purpose. Measures of data quality are based on data quality characteristics such as accuracy, completeness, consistency, validity, uniqueness, and timeliness. Now while you generally write unit tests for your…

Spark

6 min read

Sandip Roy

Big Data and Databricks Practice Lead at Wipro Ltd
