By Audrey Quinn
Bitly's chief scientist asks how individuals can get more from big data, from better book recommendations to health predictions.
"We always joke that the Internet is full of cats," Bitly's chief scientist Hilary Mason tells me as she leans into her office's conference table. "And, here it is. It is full of cats, and that's OK. They're adorable."
Mason's spent three years sorting through the Internet habits of the URL tracking service's millions of users. The banality of trending topics no longer rattles her.
"When you start at Bitly, you go through this emotional cycle. Where first you go, 'Oh my God, this data is amazing.' But then you start looking at it and you conclude that humanity is completely doomed." She gives a wary laugh. "Because what people read is cats and Bieber and celebrity gossip and that stuff."
But then there's the third stage of that cycle, where Mason now resides. "You eventually come out the other side and you realize that there's a huge amount of potential here. It's the realization that yes, this is what humanity does, that's the way people are. And it's more like thinking of the data as a great theater in front of you where you get to have a really great seat to observe what's going on in the world. And not to be judgmental because ultimately that doesn't really help."
Mason's made it her goal to figure out how all this data can help. She admits that when she first started at Bitly in 2010 she didn't have an answer to that question.
"You start by counting," she says. "You start by counting everything and seeing what correlations you can find."
I meet Mason at Bitly headquarters, a loft space in lower Manhattan. Mason is in her 30s, with a patient, relaxed demeanor and a quick laugh. We sit in a glass-walled meeting room on the edge of the general work area. Among the rows of large Mac screens I count six standing desks and one in-office skateboarder.
You know Bitly if you've ever shortened a link to share on social media, but the startup also offers services that let you track the traffic your posts generate. If you want to post a Web site link to Twitter or Tumblr or Facebook, you can shorten it with Bitly (www.smartplanet.com becomes http://smrt.io/15gkcC7) and watch the statistics (clicks, social network activity, geography) as they come in.
"The way I usually explain our dataset," Mason says, "is it's one of the world's largest databases of gossip. It's people sharing things with other people. And we can really learn a lot of things that we haven't been able to learn about how people communicate before. What we're essentially able to do right now is take the 300 years of social science research about human behavior and communication, and actually repeat it quantitatively at the scale at which humans communicate."
Mason's data analytics now allow Bitly users to see which websites are currently most popular for a given search term or category, or to see who is looking at what kinds of sites where, like "stories about food being read by people in Brooklyn." Users can also see which topics are currently receiving a spike in attention online.
While Bitly does not sell data about its users directly, it does sell products built on aggregate user behavior. And Mason says she takes the ethics of this seriously.
"When we do any project, there are a few questions we always ask. One of them is, 'What is the most evil thing that can possibly be done with this?' And I don't ask that question because I think we're evil, but because it makes you think really creatively about the potential implications of the thing you want to build."
When I ask Mason if there's anything she'd like to change about the way the tech industry currently uses data, she answers with an eager "Absolutely." "I hope that in the future data is used to empower people, and not just for marketing purposes," she explains. "I think we need more ambition about using our data to make our lives better."
I press her for concrete examples. She pulls out her Android phone to show me Google Now. The new application takes information from your Gmail and Google Calendar to guide you through your day. Mason points to a restaurant name on her screen. "So I'm having dinner here in the East Village. So it's telling me it's 10 minutes away, and this is the transit stop nearby and this is the weather."
"It's not a good product actually -- well, not yet," she chuckles. "But the reason I'm so excited about it is for the first time Google is using all of the data that they have about us, from our e-mail, from our calendar, to make something for us, and not just to sell to advertisers. And I think Google making this decision will actually lead the way for other people to do that, too."
She mentions another app, Dark Sky for the iPhone, that gathers government weather data and your precise GPS coordinates to give you instantaneous weather information at your exact location. "That is the kind of thing I'm talking about," Mason explains, "where you take some data asset and you reprocess it in a completely personalized way for someone to make a decision."
In contrast, one common mode of data use particularly irks Mason: the way sites like Netflix and Amazon currently offer users recommendations.
"The thing to remember," she tells me, "is they're not optimized to find you things you'll enjoy, they're optimized to get you to spend more money or time on Netflix or Amazon." Mason launched her own site, BookBookGoose.com, as a tongue-in-cheek protest earlier this year. The site gives visitors randomized Amazon book recommendations. In order to see whether people followed the haphazard suggestions, Mason registered the site as an Amazon affiliate.
"The joke is that random is actually not a terrible way to browse through the books," she says, smiling conspiratorially. "It is actually a profitable startup, because I've made $150 in Amazon affiliate fees," Mason chuckles. "But that wasn't the point. The point was to show that the existing recommender systems we have are actually pretty poor for us. If you look at a book about JavaScript programming on Amazon, all it does is show other JavaScript books to you, because that's the thing you're probably most likely to buy. But it's not going to recommend some serendipitous beautiful novel that you might enjoy, because you're less likely to buy that. But really that's what you would want, to find the things that are on that edge of delight and discovery."
Mason explains that she sees this more as a problem with the way people currently use data technology than as a problem with the technology itself. "I think this is actually a problem we can address," she tells me, nodding her head. "If we build systems that people choose to use that are designed to help them have the kind of experience they want to have."
"Who would be motivated to do that?" I ask. "Making more money or driving more business is a pretty strong motivator. Who's going to want to use data not just to drive profits?"
Mason points to the emerging use of data in city planning and health care. The city of New York recently reduced ambulance response times by computing the most convenient coffee shops for drivers to frequent between calls, she tells me. And she also sees promise coming from the quantified-self community, people who carefully track all of their vital signs.
"The thing about that is I think all those people are crazy," she admits. "But crazy in the way that they're willing to invest a huge amount of energy in collecting this kind of data and figuring out what they can learn from it. And ultimately we will figure out ways to collect that data without the pain and energy invested, and they'll have done the pioneering work to tell us what we can learn from it." She predicts that in the near future we'll be able to anticipate a cold the day before it arrives.
It's these sorts of possibilities that leave Mason optimistic. "It's rare that you can solve a technology problem with more technology," she says, "but I think this is a case where we probably can, by changing the incentives of the system." She'd like to see companies and institutions focus on ways they can use data about users to enhance user experience, rather than just to drive sales. And that's not naïve, she says. "I do think there are ways to build these things and still make some money, as my little dumb hack that made $150 can prove," she laughs again.
Correction 4/30/13 11:20am EST: An earlier version of this article named Mason as Bitly's Chief Data Scientist; her correct title is Chief Scientist.
Photo: Erin O'Brien
Apr 29, 2013
Our heroine cites an App "Dark Sky for the iPhone, that gathers government weather data and your precise GPS coordinates to give you instantaneous weather information at your exact location." What??? You could always try lifting your head up out of your "smart"phone and LOOK AROUND YOU! Now *that* would be smart! As Bob Dylan didn't say: "You don't need a weatherman to know which way the wind blows ...but a smartphone App might help!" Good grief....
enrichment, while, hopefully, giving the user a useful service. So every website which a user visits will collect personal data about that user, and his/her travels through the internet, and his/her preferences of sites and purchases, and information regarding family and friends, and school(s), and so on, and so forth. So, if we have some 30 or 40 big websites interested in that kind of information, that's a lot of duplication of data about a person. So, if a person is willing to give up that kind of information, simply through the things he or she does on the internet, why not just take the information directly from the user each time he or she gets on the internet? Basically, the user could have a code which takes each website to a central location to get the user's data. That central location will have all of his personal information, and information about friends and family and employer, and tastes for food and clothing and shoes and PC/Mac/tablet/smartphone preferences, and Skype and Facebook and Twitter and Google+ "friends". Basically, if websites can capture any and all information about a user, then why not avoid the duplication of efforts and data, and just tap into a central location where the user has decided/agreed to keep all of the information which a website might be interested in? Basically, if a user is naive enough to put his/her personal life on the internet for a website to see and use, then why not make it official through a centralized location for that kind of information?