Jyoti Dhiman

591 Followers

Published in Towards Data Science

Delta lake with Spark: What and Why?

Get to know the storage layer that enables ACID transactions and updates with Spark. Let me start by introducing two problems I have dealt with time and again in my experience with Apache Spark: (1) data “overwrite” on the same path causing data loss if the job fails, and (2) updates in the data. Sometimes I solved these with design changes, sometimes with the introduction of…

Delta Lake

4 min read



Published in Geek Culture · Jul 18, 2022

Finding the latest date is not as easy as you would think

Understanding how to find the latest value in a date partition column in Spark. This is a very interesting piece because it busts a myth about partitioned columns: partitioning is so useful that at some point we start taking it for granted. …

Data

5 min read



Jul 17, 2022

Systematic Sampling with Spark

Understanding systematic sampling and its implementation. Hey folks! If you are here, you probably already know what sampling is. If not, no worries: we will do a quick walkthrough of sampling, followed by what systematic sampling is and how to do it with Spark. So, let’s get started! What is sampling? In this section, we try to understand…

Data Science

4 min read



Published in Towards Data Science · Jul 12, 2022

Do Real-Time Data Pipelines Even Exist?

Sharing a fresh perspective on real-time data pipelines. How often have you heard these terms: real-time data pipelines, real-time data processing, real-time analytics, or just real-time data? They often come up when solving very interesting and critical use cases such as fault detection, anomaly detection, and many more. In this article, we will take…

System Design Interview

4 min read



Published in Towards Data Science · Jul 5, 2022

Getting hands-on with DBT — Data Build Tool

A step-by-step guide to running your first project with DBT. DBT is all the rage now! When I first read about it, I thought: okay, it’s a config-driven ETL, what else is new? Then I read more about it and I was like, umm… …

Data

4 min read



Published in Towards Data Science · May 22, 2022

Stop using the LIMIT clause wrong with Spark

Understanding Spark LIMIT and its performance with large datasets. If you come from the SQL world, you are probably familiar with the LIMIT clause. It is commonly used to look at a small chunk of data. But have you ever wondered how it works? Spark also lets you sub-select a chunk of data with LIMIT, either via the DataFrame API or…

Spark

4 min read



Published in Geek Culture · May 14, 2022

Should you use singleton objects in Scala?

Understanding singleton objects in Scala. Hey folks, happy weekend! If you know object-oriented programming, you are familiar with the concept of a class, widely defined as: “A class is a blueprint that defines the variables and the methods common to all objects of a certain kind.”

Scala

3 min read



Published in Geek Culture · Feb 25, 2022

How to build a simple text-to-speech converter?

A guide to building a text-to-speech converter in Python. This morning I noticed the “Listen” feature on Medium articles and instantly loved it. I think it is a great addition for learning on the go, and it got me thinking: how could I build one on my own? It’s a very simple implementation (guess how many lines it takes…

Internships

2 min read



Published in Geek Culture · Feb 23, 2022

What is Data-as-a-Product?

Understanding one of the founding principles of Data Mesh. There are multiple ways of viewing and approaching data as a product. This article is my attempt to share my understanding of the concept, based on experience and learning. Before digging into data as a product, let’s first answer a very basic question: what exactly is a product? We are surrounded…

Data

3 min read



Feb 16, 2022

Do you need a Macbook to visit Starbucks?

A hilarious account of my sister’s first visit to Starbucks. How would you describe the typical surroundings of a Starbucks? Mellow music, the fragrance of coffee beans, and people working on their laptops, catching up with friends, or just posting pictures of their Starbucks mugs on Instagram and giving the brand free advertising (P.S. …

Funny

3 min read


Jyoti Dhiman, Senior Engineer @ Linked[in]
