How Cupid Is Counting on Data Science to Find the Perfect Match

With Valentine’s Day just around the corner, many are turning to online dating sites to make sure they don’t spend the holiday alone. Such sites tout their matchmaking abilities to help clients find the perfect mate — but there’s more to matching than meets the eye. Data science plays a big role in the online dating industry, and here we’ll take a look at the nuts and bolts of the algorithms that help support online dating success.

Key Players in Online Dating

While there are various dating services that rely solely on geographic proximity and allow users to decide who they want to match with, others promise to match users based on metrics other than who might live in the same apartment complex. However, the jury is still out on whether the algorithms these companies tout for their proprietary effectiveness even work, since multiple studies have provided conflicting results. Part of the conflict relates to varying definitions of what constitutes a “match,” as well as the dynamics of our evolving society and changing perspectives regarding what relationship success actually looks like.

In addition, some experts cite specific weaknesses in the online dating paradigm related to sociology, anthropology, and data science frameworks, noting an overdependence on profile browsing and the “overheated emphasis on ‘matching algorithms.’ ” Regardless, the key players in the industry promise that their formulas work — as evidenced by their associated guarantees. Here are some of the top sites and the promises they make:

  • Match.com: #1 in Dates, Relationships and Marriage
  • eHarmony: #1 Trusted Dating Site for Like-Minded Singles; Beat the odds, bet on love with eHarmony. Our bold, scientific approach to matching means more quality dates with deeply compatible singles that truly understand you.
  • OkCupid: The best free dating site on Earth.
  • Tinder: It starts here. Friends, dates, relationships, and everything in between.

Three of these four popular sites (Match, OkCupid, and Tinder) are owned by the same company, Match Group Inc. In a recent interview with NPR, Sam Yagan, CEO of Match Group Inc., said that dating sites are great for helping identify the people you would or wouldn’t be interested in, but that we’re still “decades away” from predicting chemistry between people. He nonetheless pointed to the role of math and data in getting there, describing how OkCupid was founded by himself and three other Harvard math majors in 2004: “We saw that there wasn’t any dating site at that time that was focused on an algorithm data-based approach. Now, that’s where the industry has moved quite a bit.”

For his part, the CEO of eHarmony, Neil Clark Warren, told Business Insider that he doesn’t consider dating apps like Tinder effective: “They’re depending on superficial, almost accidental compatibility. Compatibility is a serious matter, and it’s very deep and very important to figure out.”

The Nuts and Bolts of Programming Love

Each company has its own approach to using data science to achieve the best results. Here, we’ll focus on one as an example. In a 2014 presentation at MongoDB World, “Big Dating at eHarmony,” Thod Nguyen, chief technology officer of eHarmony, discussed how the company invested in some interesting technology to support long-term attainability, scalability, and innovation needs, including a migration to the MongoDB data storage solution. His description of their journey is quite detailed, and the following summarizes the key components that make up the eHarmony offering:

  • Compatibility matching processor (CMP Application) — Built on top of the relational database, the CMP creates about 3 billion potential matches per day, with about 25 terabytes of user data in the entire matching system. Supporting this are more than 60 million complex multi-attribute queries daily, looking across more than 250 attributes. The systems store and manage more than 200 million photos, with more than 15 terabytes of data in photo storage. They also manage more than 4 billion relationship questionnaires, with over 25 terabytes of data.
  • Compatibility matching system (CMS Models) — eHarmony’s “secret sauce,” made up of a very sophisticated three-tier process:
    1. Compatibility matching models – identify potential matches based on a client's core compatibility, derived from 29 dimensions of personality and psychological traits, and on the user's set of preferences. This is a two-step, bidirectional process that ensures user preferences are met in both directions. It uses simple criteria, such as age, distance, religion, ethnicity, income, or education (employment was also due to be added), as well as more sophisticated personality traits that users convey by filling out lengthy questionnaires.
    2. Affinity matching models – predict the probability of communication between two people. 
    3. Match distribution models – help to ensure delivery of the right matches to the right user at the right time and to deliver as many matches as possible across the entire active network.
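
To make the idea of bidirectional matching on simple criteria concrete, here is a minimal Python sketch. It is not eHarmony's code: the `Profile` fields, the `accepts` helper, and the sample users are all hypothetical, and only two of the simple criteria named above (age and religion) are modeled. The point it illustrates is that a pair counts as a potential match only when each person satisfies the other's stated preferences.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    # All fields are illustrative assumptions, not eHarmony's actual schema.
    name: str
    age: int
    religion: str
    min_age: int                   # youngest partner this user will accept
    max_age: int                   # oldest partner this user will accept
    accepted_religions: frozenset  # religions this user will accept


def accepts(seeker: Profile, candidate: Profile) -> bool:
    """One direction only: does the candidate satisfy the seeker's preferences?"""
    return (
        seeker.min_age <= candidate.age <= seeker.max_age
        and candidate.religion in seeker.accepted_religions
    )


def is_bidirectional_match(a: Profile, b: Profile) -> bool:
    """Both directions: each person must satisfy the other's preferences."""
    return accepts(a, b) and accepts(b, a)


alice = Profile("alice", 30, "none", 28, 40, frozenset({"none", "buddhist"}))
bob = Profile("bob", 35, "buddhist", 25, 32, frozenset({"none"}))
carol = Profile("carol", 45, "none", 25, 50, frozenset({"none"}))

print(is_bidirectional_match(alice, bob))    # each fits the other's criteria
print(is_bidirectional_match(alice, carol))  # carol accepts alice, but not vice versa
```

In this toy data, Carol would be shown Alice under a one-way filter, but Alice's age range excludes Carol, so the pair is dropped. That asymmetry is what the bidirectional check is meant to catch.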

In summarizing eHarmony's system, Nguyen noted, "CMS Models are the 'secret sauce' and created by running complex multi-attribute queries to identify potential matches for the client. We only retain the candidates where the criteria are met both ways, bidirectionally. As a second step, we take the remaining candidates, and we run them through a slew of compatible models that we have accumulated over the last 14 years. Only those candidates who pass the threshold set by the CMS models are retained and positioned as potential compatible matches for the client."
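
The two steps Nguyen describes (keep only candidates whose criteria are met both ways, then keep only those who pass a model threshold) can be sketched roughly as follows. This is an illustration under stated assumptions, not eHarmony's implementation: the dictionary layout, the `accepts_age` rule, the `similarity` score (one minus the mean absolute difference between trait values in the range 0 to 1), and the 0.75 threshold are all made up for the example, and three traits stand in for the real personality dimensions.

```python
def accepts_age(seeker: dict, candidate: dict) -> bool:
    """Hypothetical one-way criterion: candidate's age falls in seeker's range."""
    low, high = seeker["age_range"]
    return low <= candidate["age"] <= high


def similarity(a: dict, b: dict) -> float:
    """Hypothetical compatibility score in [0, 1] for trait values in [0, 1]."""
    ta, tb = a["traits"], b["traits"]
    if len(ta) != len(tb) or not ta:
        raise ValueError("trait vectors must be non-empty and of equal length")
    return 1.0 - sum(abs(x - y) for x, y in zip(ta, tb)) / len(ta)


def select_matches(client, candidates, accepts, score, threshold):
    # Step 1: retain candidates only where the criteria are met both ways.
    mutual = [c for c in candidates if accepts(client, c) and accepts(c, client)]
    # Step 2: score the survivors and keep those at or above the threshold.
    scored = [(c, score(client, c)) for c in mutual]
    kept = [(c, s) for c, s in scored if s >= threshold]
    # Highest-scoring candidates first.
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


dana = {"name": "dana", "age": 30, "age_range": (27, 38), "traits": [0.8, 0.2, 0.6]}
pool = [
    {"name": "eli", "age": 33, "age_range": (28, 35), "traits": [0.7, 0.3, 0.5]},
    {"name": "finn", "age": 29, "age_range": (35, 45), "traits": [0.8, 0.2, 0.6]},
    {"name": "gus", "age": 36, "age_range": (25, 40), "traits": [0.1, 0.9, 0.0]},
]

matches = select_matches(dana, pool, accepts_age, similarity, threshold=0.75)
print([c["name"] for c, _ in matches])
```

With this toy data, Finn has identical traits to Dana but is removed in step 1 because Dana is outside his age range, while Gus passes step 1 but scores too low in step 2. Running the cheap bidirectional filter first means the more expensive scoring only has to run on candidates who could be shown to each other at all.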

Providing more insight into their processes, he described the programming languages they use: “We use a lot of Scala. I'm sure a lot of you know, as a functional programming language, to implement our CMS and affinity matching models. We also use a lot of Hadoop. And with Hive, we also started exploring Spark as the interactive data analytics on top of YARN for massive data mining and data processing. And we also use a lot of R … R is a revolution as the programming language for predictive analytics in our machine learning models. Additionally, we use a lot of Node.js with HTML5 to implement our public-facing eHarmony web applications for both the mobile web and the desktop and a slew of other technologies that we're using right now.”

Slideshare Presentation: “Big Dating at eHarmony”

The journey into eHarmony’s computing efforts to support dating success provides just one glimpse into a world in which falling in love may be increasingly linked to the right algorithms.

