Publications and Talks

I’ve had the chance to deliver a few talks on my ML experiences.

Speaking at Spark Summit 2018
Giving at talk at Spark Summit 2018.

Spark Summit 2020: Productionizing Deep Reinforcement Learning with Spark and MLflow


Deep Reinforcement Learning has driven exciting AI breakthroughs like self-driving cars, beating the best Go players in the world and even winning at StarCraft. How can businesses harness this power for real world applications? Zynga has over 70 million monthly active users for our mobile games. We successfully use RL to personalize our games and increase engagement. This talks about the lessons we’ve learned productionizing Deep RL applications for millions of players per day using tools like Spark, MLflow and TensorFlow. Hear about what works and what doesn’t work when applying cutting edge AI techniques to real world users.

ML Ops World 2020: Personalization with Deep Reinforcement Learning

(Talk wasn’t recorded)

Toronto Machine Learning Summit 2019: Deep Reinforcement Learning at Zynga: Overcoming the Challenges of Using RL in Production


Deep Reinforcement Learning has seen a lot of breakthroughs in the news, from game playing like Go, Atari and Dota to self-driving cars, but applying it to millions of people in production poses a lot of challenges. The promise of Reinforcement Learning is automated user experience optimization. As one of the world’s largest mobile video game companies, Zynga needs automation in order to personalize game experiences for our 70 million monthly active users. This talk discusses how RL can solve many business problems, the challenges of using RL in production and how Zynga’s ML Engineering team overcame those challenges with our Personalization Pipeline.

Spark Summit 2018: Using Apache Spark to Predict Installer Retention from Messy Clickstream Data


Clickstream data is messy. A single user session in a Zynga game can generate thousands of events, with each game, client version and OS having their own event schemas. Unfortunately, most ML models require their training data to be formatted as a uniform matrix, with each user having the exact same columns. It’s a time consuming challenge to develop feature sets that capture all the nuanced trends and interactions of event streams.

At Zynga we’ve developed a technique to represent user game actions with temporal heatmap feature sets. Utilizing the power of PySpark, our generic data pipeline can generate thousands of features without the need to manually interpret the events of each game. The graphical structure of the heatmaps allow us to take advantage of established image classification techniques to make personalized user level predictions. Within 30 minutes of installing our games, Zynga is able to make accurate predictions on whether a new installer will churn or become a payer.