
Using OAuth for Managing User Accounts

Recently I was building a web app in which I wanted to allow people to register using their accounts from a variety of social media sites. I found a plugin that made this really easy (Sentinel Social for Laravel, if you’re curious), but I quickly ran into some issues, namely:

  1. Not every social media site that allows OAuth integration sends back the same information
  2. Users are likely to have used different credentials (email addresses, names, usernames) to create their various accounts

This created an interesting problem for me, the developer, namely:

How can I reliably match up a “new” registration with an existing user account when appropriate?

Assessing the Landscape

The first thing I did was to create a table with a summary of what fields are returned by the various API calls:

[Table columns: OAuth Service Provider, uid, nickname, name, firstName, lastName, email, location, description, imageUrl, urls, gender. Each cell holds one of four markers: yes, no, multiple values, or if saved in profile.]

[Note: If a cell is marked “no,” I’m pretty sure that the info is never provided. However, it’s very likely that some of the “yes” cells should actually be “if saved in profile,” since it’s probable that those values are not required in the profiles for those accounts. Please send corrections if you have them!]

What can we do with these data? Since uid is unique to each service, it doesn’t offer us much help. The location, description, imageUrl, urls, and gender fields are also of mostly negligible use in terms of matching up identities. I ruled out using any of these fields.

The nickname field is typically what is used as a username, and is potentially useful in matching up an existing user account with the third-party account (GitHub, Instagram, or Twitter). In my own case, however, I found that the nicknames I had chosen for my various accounts were not the same. Darn. We won’t entirely give up on it, though.

The name field is provided, or potentially provided, by all of the systems. The full name, however, is notoriously hard to parse. It would be easy if everyone had just two names separated by a space, e.g. “Morgan Benton.” Unfortunately, you’ll get plenty of Morgan C. Bentons, Mary Jo Jacksons, and Bill Van Dykes, i.e. names with two spaces and no easy way to tell if the middle word is a middle initial, part of the first name, or part of the last name. Furthermore, there’s a recent trend, especially on Facebook, for people to omit their last names in order to avoid searches that might reveal embarrassing stuff about them to employers and other authority figures. This makes the name field less useful. We’ll put that one in our second-tier list, as well.
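To make the parsing problem concrete, here is a minimal sketch of the naive approach: split the full name on whitespace and see how many parts come back. The helper name `splitName` is made up for illustration and is not part of any library.

```php
<?php
// Naive approach: split a full name on runs of whitespace.
// splitName() is a hypothetical helper, used here only for illustration.
function splitName(string $name): array
{
    return preg_split('/\s+/', trim($name));
}

$names = ['Morgan Benton', 'Morgan C. Benton', 'Mary Jo Jackson', 'Bill Van Dyke'];
foreach ($names as $name) {
    $parts = splitName($name);
    // With three parts, nothing tells us what role the middle one plays:
    // middle initial, second half of a first name, or start of a last name.
    printf("%-18s => %d parts: %s\n", $name, count($parts), implode(' | ', $parts));
}
```

The last three names all come back as three parts, and the code has no way of knowing that “C.” is an initial, “Jo” belongs to the first name, and “Van” belongs to the last name.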

The firstName and lastName fields always show up as a pair. Facebook claims these values are the “real” names of the user, although I’m not sure how they verify this. When available, these fields seem to be a reasonable way to search for existing users on your system, or to match up accounts from different third-party services.

The email field is the closest thing to an ideal match field. Frustratingly, Instagram and Twitter stubbornly refuse to provide it. Also, in my own case I had used different email addresses to create accounts in different places. While email clearly offers the best hope of an easy match, it won’t be a universal solution.

Experimenting on Myself

So, I wiped my user database clean before running the following experiment. I clicked on the “Register with XXX” links for all seven of the above services on my site. After that, I had no fewer than five separate user accounts. The data returned by Google, GitHub, and Microsoft matched each other because I happen to have used my Gmail address to sign up for all of those accounts. None of the rest of the data resulted in accounts that matched up with each other. Yikes!

I’m perhaps unusual in that, as a web developer, I’m constantly creating new identities for myself online, so I have at least four different email addresses that I use regularly to interact with different groups of people. It’s clear, though, that services like Twitter and Instagram, which provide neither an email address nor discrete values for firstName and lastName, are going to require some form of manual matching, or at least some very clever regular expressions.

First Pass at a Solution

The first thing I did was to group the various OAuth providers according to the strategies I would use to match them. [Note: this code is in PHP, but could easily be ported to other languages.]

[php]
$matched = false;
switch ($provider) {
    case 'facebook':
    case 'google':
    case 'linkedin':
    case 'microsoft':
        // match on email, first_name, last_name, name
        break;
    case 'github':
        // match on email, nickname, name
        break;
    case 'twitter':
    case 'instagram':
        // match on nickname, name
        break;
}
if (!$matched) {
    // either assume no match, or get the user involved
}
[/php]

As it turns out, though, when I actually started working on the code, the solution was much simpler than this. Here’s the rough algorithm:

  1. Get the user information returned from the OAuth provider
  2. Keep only the non-null fields among: email, nickname, firstName, lastName, and name
  3. In order, look for user accounts that match the following conditions:
    1. emails match OAuth info
    2. nicknames match OAuth info
    3. firstName and lastName match OAuth info
  4. If any of the above matched, return that user account, otherwise
  5. Get a list of all user accounts, for each one:
    1. Construct a “display name” property by concatenating the first and last names with a space in between
    2. Do a textual similarity analysis between the name field from the OAuth provider and the “display name”
    3. If the similarity is above 85%, consider it a match and return that user account

Here’s what it looks like in PHP using Laravel’s Eloquent ORM syntax:

[php]

public static function findMatch($atts)
{
    // convert the $atts into individual variables
    extract($atts);

    // try to match on the nickname first
    if (isset($nickname)) {
        $user = self::where('username', $nickname)->first();
        if (!empty($user)) {
            return $user;
        }
    }

    // then on the first/last name pair
    if (isset($firstName, $lastName)) {
        $user = self::where('first_name', $firstName)
            ->where('last_name', $lastName)
            ->first();
        if (!empty($user)) {
            return $user;
        }
    }

    // finally, fuzzy-match the full name against each display name
    if (isset($name)) {
        foreach (self::all() as $u) {
            // similar_text() reports $pct on a 0-100 scale, not 0-1
            similar_text($u->display_name, $name, $pct);
            if ($pct >= 85) {
                return $u;
            }
        }
    }

    return false;
}

[/php]

As it turns out, Sentinel Social already checked for email matching for me, so I could leave it out of my function. Also, the 85% similarity threshold for name matching is pretty arbitrary at this point. I have only tried it with one name (my own), and when you check “Morgan Benton” against “Morgan C. Benton” the similarity is about 89.6%.
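Here is a small standalone check of that similarity figure, using PHP’s built-in similar_text(). One detail worth knowing is that the percentage it writes into its third argument runs from 0 to 100, so a threshold has to be written as 85 rather than 0.85. The wrapper name `nameSimilarity` is my own, just for illustration.

```php
<?php
// similar_text() returns the number of matching characters and writes
// the similarity percentage (0 to 100) into its third argument.
// nameSimilarity() is a hypothetical wrapper, not a built-in.
function nameSimilarity(string $a, string $b): float
{
    similar_text($a, $b, $pct);
    return $pct;
}

$close = nameSimilarity('Morgan Benton', 'Morgan C. Benton');
printf("%.2f%%\n", $close);      // prints 89.66%

// Two names that share nothing but a space
$far = nameSimilarity('Morgan Benton', 'Xy Z');
var_dump($far >= 85);            // bool(false): correctly rejected
var_dump($far >= 0.85);          // bool(true): a 0.85 cutoff accepts nearly anything
```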

The good news: I was able to match credentials against all seven of the OAuth providers listed above!

Obvious Problems & Dangers

I was pretty excited to have gotten this working, when one of my students pointed out an obvious problem: this approach will be susceptible to false positives, i.e. it is highly likely that it will match people with similar names who are NOT, in fact, the same person. Worse than that, it could be used maliciously to get access to someone’s account: an attacker could intentionally set up a social media account under someone else’s name, then use that account to sign up on my site, and be handed access to the account of the person whose name was stolen.
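The false-positive risk is easy to demonstrate. The sketch below runs a couple of made-up name pairs, belonging to what could easily be different people, through similar_text() with the same 85% threshold. The names and the `isNameMatch` helper are invented for illustration.

```php
<?php
// Hypothetical helper: true when two names clear the similarity threshold.
function isNameMatch(string $a, string $b, float $threshold = 85.0): bool
{
    similar_text($a, $b, $pct); // $pct is on a 0-100 scale
    return $pct >= $threshold;
}

// Made-up pairs of names that differ by a single character
$pairs = [
    ['Morgan Benton', 'Morgan Denton'],
    ['Jon Smith', 'John Smith'],
];

foreach ($pairs as [$a, $b]) {
    similar_text($a, $b, $pct);
    printf("%s vs %s: %.2f%% => %s\n", $a, $b, $pct,
        isNameMatch($a, $b) ? 'treated as the same user' : 'no match');
}
```

Both pairs clear the threshold (the first scores about 92.31%), so two different people would be merged into one account.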

Another obvious problem is that this matching won’t work if people haven’t correctly updated their names in their various online accounts. For example, it is not required to provide your name when setting up a Microsoft Live account, so if that isn’t done, it will be impossible to match using names.

Second Pass at a Solution

In my next try, I’m going to figure out how to overcome these last problems. I’m pretty sure that at the end of it all, I will have to allow for some sort of interaction and/or verification by the user. All in all, though, I think it will be a small price to pay for the convenience of using OAuth.
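As a starting point, one possible shape for that verification step is sketched below. It is only a sketch under my own assumptions, not a finished design: the function name `matchAction`, the field labels, and the three outcomes are all hypothetical, and it assumes the OAuth provider has verified any email address it hands back.

```php
<?php
// Hypothetical helper: decide what to do with a candidate match, based on
// which field produced it. The labels 'email', 'nickname', 'names', and
// 'similarity' are assumptions made for this sketch.
function matchAction(?string $matchedOn): string
{
    switch ($matchedOn) {
        case 'email':
            // Assumes the provider verified the address.
            return 'link';
        case 'nickname':
        case 'names':
        case 'similarity':
            // Anyone can choose a name, so treat these as suggestions only
            // and ask the user to prove ownership of the existing account,
            // for example by logging in to it.
            return 'confirm';
        default:
            // Nothing matched: create a fresh account.
            return 'register';
    }
}

echo matchAction('similarity'), "\n"; // prints confirm
```

The point of the design is that a name-based match never grants access by itself; it only decides which existing account the user is invited to claim.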
