Expedia | Apache Spark Meetup


Register Here

Lessons Learned from Autoscaling Apache Spark to Hundreds of EC2 Nodes Each Week to Service Email Campaigns


Join us for insightful conversations and delicious beers with Lead Engineers from Expedia’s Email Marketing team on how they’ve built Apache Spark into their marketing campaign workflow.


Why you should attend:

  We will cover lessons learned from scaling Spark on an AWS data lake to service millions of customers each day, bursting from 20 to hundreds of EC2 nodes throughout the week.

  We'll explore the state of Apache Spark at scale today and the optimizations Expedia is leveraging to make it a core part of their marketing strategy, focusing on the challenges and limitations of using Spark and looking ahead to the future of Spark in 2018.

  Network with your peers

  Enjoy delicious food & drinks and great discussions


Agenda:

6:00 - 6:25 PM Doors Open (Drinks + Snacks)
6:25 - 6:30 PM Announcements + Opening Remarks
6:30 - 7:00 PM Jagannath on Business Use Case of Ocelot/Alpha Product in Expedia
7:00 - 8:00 PM Nishant & Nick on Using Spark to Send Personalized Marketing Emails to the Expedia Customer Base
8:00 - 8:30 PM Ask an Architect + Mingling (Drinks + Snacks)


Speaker Abstracts:

Jagannath Narasimhan, Technical Product Manager

Jagannath will share how his team has increased revenue at Expedia by leveraging Qubole Spark pipelines to send marketing emails and by using the Spark clusters to build profiles for Expedia users. He will give an overview of the dataflow architecture to show how the team deploys its various big data pipelines (LTS, PreProd, and Prod), which service emails worldwide as well as push notifications for the Expedia mobile app. He will then dive into the team's decision-making process for sizing and measuring the costs of its workloads using different types of clusters and instances (e.g. Spot vs. On-Demand, R4 vs. M4, and different numbers of nodes).



Nishant Jain & Nick Mergia, Software Development Engineers on the OmniChannel Communications Team

Nishant and Nick will take a deep dive into the technical challenges of PySpark vs. Scala, focusing on how they migrated their Python jobs to Scala for better performance, reliability, and support for more use cases. Scala has shown many benefits, such as better IDE support, unit testing, debugging capability, the use of external libraries, and the ability to package everything into a single JAR. They will then share how they orchestrate multiple production clusters using Jenkins, their debugging process using Scala JARs, and how they ultimately terminate these jobs using AWS Lambda. They will close with learnings and recommendations on using Qubole for the Ocelot/Alpha Product and lessons learned about managing Spark (shuffles, broadcasts, and repartitions).



Details:
When:  Wednesday, July 25th   |   6:00 PM

Where:  WeWork Lincoln Square, 10400 NE 4th St., Suite 500, Bellevue, WA 98004



