Send Spark DataFrame as an Attachment over E-Mail

Jyoti Dhiman
May 25, 2020

I was working on a use case that required sending the results of a Spark DataFrame over e-mail as an attachment. I looked through multiple sources for a way to do this in Scala and couldn't find an appropriate one, so I decided to work my way through it. I'll discuss my approach here in case it helps others along the way :)

The steps below are written in Scala but can be used in Java with minimal changes.

Step 1: Create an email session, if not already created:

import java.util.Properties
import javax.mail.{Authenticator, PasswordAuthentication, Session}

val props: Properties = new Properties()
props.setProperty("mail.smtp.host", "YOUR_HOST")
props.setProperty("mail.smtp.auth", "true")
props.setProperty("mail.smtp.port", "YOUR_PORT")
// Add more properties here if you need them (e.g. mail.smtp.starttls.enable)
// Scala anonymous subclass of Authenticator that supplies the SMTP credentials
val session: Session = Session.getInstance(props, new Authenticator() {
  override protected def getPasswordAuthentication(): PasswordAuthentication =
    new PasswordAuthentication("YOUR_MAIL_USERNAME", "YOUR_PASSWORD")
})
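Many SMTP servers, for example on the common submission port 587, expect the client to upgrade the connection with STARTTLS. Below is a minimal sketch of a helper that builds the Properties for Step 1, assuming your server needs STARTTLS; `SmtpConfig` and `smtpProperties` are hypothetical names, while the `mail.smtp.*` keys are standard JavaMail SMTP properties.

```scala
import java.util.Properties

// Hypothetical helper: builds the SMTP properties used to create the Session
object SmtpConfig {
  def smtpProperties(host: String, port: Int, startTls: Boolean): Properties = {
    val props = new Properties()
    // setProperty stores String values, which getProperty can read back
    props.setProperty("mail.smtp.host", host)
    props.setProperty("mail.smtp.port", port.toString)
    props.setProperty("mail.smtp.auth", "true")
    // Ask JavaMail to upgrade the connection to TLS with the STARTTLS command
    props.setProperty("mail.smtp.starttls.enable", startTls.toString)
    props
  }
}
```

You would then pass the result to `Session.getInstance` together with the same `Authenticator` as above, in place of the hand-built `props`.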

Step 2: Create a MimeMultipart object and MimeBodyPart objects

import javax.mail.internet.MimeBodyPart
import javax.mail.internet.MimeMultipart
val multipart = new MimeMultipart()
// Multiple body parts can be defined to cater to multiple elements of
// your email, like a file or text.
val messageBodyPart = new MimeBodyPart() // For attachment
val messageBodyPartText = new MimeBodyPart() // For email body

Step 3: Write your DataFrame to disk

import java.io.{File, PrintWriter}

// df is the DataFrame I want to send over email
val header = df.schema.fieldNames.toSeq

// Helper that opens a PrintWriter on the file, runs `op`, and always closes it
def printToFile(f: File)(op: PrintWriter => Unit): Unit = {
  val p = new PrintWriter(f)
  try {
    op(p)
  } finally {
    p.close()
  }
}

// Printing my DataFrame to a local CSV file
val f = new File("file_to_attach.csv")
printToFile(f) { p =>
  // Header: column names joined by commas, with no trailing comma
  p.print(header.mkString(",") + "\n")
  // If I don't collect(), the foreach runs on the executors instead of the
  // driver, so the rows never reach this file. Read my post about Closure
  // to understand why.
  df.collect().foreach { row =>
    // nulls become empty fields; commas, quotes and newlines inside values
    // are NOT escaped here
    p.print(row.toSeq.map(v => if (v == null) "" else v.toString).mkString(",") + "\n")
  }
}
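The loop in Step 3 writes each value as-is, so a value that itself contains a comma, a double quote or a line break will shift the columns when the CSV is opened. If your data can contain such values, a small quoting helper along the lines of RFC 4180 could be used when building each line. This is a sketch; `CsvLine`, `escapeField` and `toLine` are hypothetical names.

```scala
// Hypothetical helper for RFC 4180-style CSV quoting
object CsvLine {
  // Quote a field if it contains a comma, a double quote, CR or LF;
  // embedded double quotes are doubled inside the quoted field.
  def escapeField(value: Any): String = {
    // Treat null as an empty field
    val s = if (value == null) "" else value.toString
    if (s.exists(c => c == ',' || c == '"' || c == '\n' || c == '\r'))
      "\"" + s.replace("\"", "\"\"") + "\""
    else s
  }

  // Join a row's values into one CSV line (without the line terminator)
  def toLine(values: Seq[Any]): String = values.map(escapeField).mkString(",")
}
```

Inside `printToFile` you would then write `CsvLine.toLine(header)` for the header and `CsvLine.toLine(row.toSeq)` for each collected row, each followed by `"\n"`.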

Step 4: Create the content of the email

import javax.mail.{Message, Transport}
import javax.mail.internet.{InternetAddress, MimeMessage}

// Attach the CSV written in Step 3; attachFile already uses the simple
// file name, setFileName just makes it explicit
messageBodyPart.attachFile("file_to_attach.csv")
messageBodyPart.setFileName("file_to_attach.csv")
messageBodyPartText.setText("My Email Body")
// Text part first, then the attachment
multipart.addBodyPart(messageBodyPartText)
multipart.addBodyPart(messageBodyPart)

val msg: MimeMessage = new MimeMessage(session)
msg.setFrom(new InternetAddress("sender@example.com")) // replace with the sender's address
msg.addRecipient(Message.RecipientType.TO, new InternetAddress("recipient@example.com")) // replace with the recipient's address
msg.setSubject("My Email subject")
// Attaching my email content
msg.setContent(multipart)

Step 5: Send the email

Transport.send(msg) // Bye-Bye, see you on the other side!

Step 6: Clean up the file on your disk. Add a simple snippet to delete the local file; we don't want to waste that disk space :]
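A minimal sketch of such a cleanup snippet, using `java.nio.file.Files.deleteIfExists`; the `Cleanup` object and `deleteAttachment` method are hypothetical names.

```scala
import java.nio.file.{Files, Paths}

// Hypothetical helper that removes the local CSV after the mail is sent
object Cleanup {
  // Returns true if a file was deleted, false if it did not exist
  def deleteAttachment(fileName: String): Boolean =
    Files.deleteIfExists(Paths.get(fileName))
}
```

Calling it in a `finally` block around `Transport.send(msg)`, for example `try Transport.send(msg) finally Cleanup.deleteAttachment("file_to_attach.csv")`, would remove the file even if sending throws.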

For my use case, saving the file on disk worked. Emails generally have a size limit anyway (like 25 MB), so saving the file to disk and collecting the DataFrame to the driver won't be much of an issue. Alternatively, you can attach a file stored on HDFS to your email directly; for that, you can refer to https://stackoverflow.com/a/49882284/6534673
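Since the size limit applies to the encoded message, it might be worth checking the CSV before attaching it. Attachments are typically sent base64-encoded, which turns every 3 bytes into 4 characters, so the encoded attachment is roughly a third larger than the file on disk. Here is a sketch under those assumptions; `AttachmentLimit` and its members are hypothetical names, and the 25 MB default is only the example figure from above, so check your provider's actual limit.

```scala
// Hypothetical pre-flight check for attachment size
object AttachmentLimit {
  // Example limit of 25 MB (assumption; providers differ)
  val DefaultLimitBytes: Long = 25L * 1024 * 1024

  // Base64 maps every 3 input bytes to 4 output characters, so the encoded
  // size is ceil(n / 3) * 4 (MIME line breaks are ignored here)
  def encodedSize(rawBytes: Long): Long = ((rawBytes + 2) / 3) * 4

  // True if the file's estimated encoded size fits under the limit
  def fits(file: java.io.File, limitBytes: Long = DefaultLimitBytes): Boolean =
    encodedSize(file.length()) <= limitBytes
}
```

If `fits` returns false, you could fall back to sharing a link to the file (for example on HDFS) instead of attaching it.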

Ciao.
