Everyone Benefits When Small Data Goes Big
Posted by Gadi Ben-Yehuda
It is advantageous for everyone when people share certain data with one another and with the public.
For decades, urbanites have been tuning in to local radio stations before their morning commute to hear the traffic report. Based on static cameras at strategic locations, as well as helicopters and tips from motorists or passengers, traffic reports are perfect examples of data that give rise to decisions. Holland Tunnel is blocked? Route around it. Pile up on the Inner Loop? Exit early. But in 2008, drivers got a whole new way to see traffic in real time.
That year, Waze was released: a mobile app that used information from the cellphones of people who downloaded it, sharing their location and speed with other people who were using the same application to create a crowd-sourced traffic report. By sharing the information on their phone, everyone benefited. The “small data” coming from single users was anonymized, amalgamated, and then turned into Big Data that was visualized as a traffic map.
This is only one example of this phenomenon, but in the past few years, more types of data have begun going small and are now breaking into the mainstream. Five areas seem especially ripe for people to share anonymized data with one another and with their community for the purpose of helping everyone realize the benefits that are normally associated with big data. If people are willing to share basic information—voluntarily given and thoroughly anonymized—about their health, transportation, energy, finance, and the sharing economy, they will benefit from the insights that only big data analytics can offer.
The Benefits of Big Data
What are those benefits? The most succinct answer is also the most abstract, but it is: higher quality services, and better individual decision-making, at a lower cost. That’s it. The promise of analyzing big data (which presupposes that there is big data to analyze) is that it will result in people and organizations making better decisions, enjoying better services, and saving money. How that applies in each of the five areas is addressed below.
Five Examples of Small Data
Wearable technology may be the flashiest way that data is entering people’s lives, but it is hardly the only way:
- Financial data has been on an upswing for years, as services that were once available only for a fee (like Quicken or MSMoney) are now free (through Mint, MVelopes, Buxfer, or ReadyForZero).
- Vehicle and Transportation Data: information from cars’ engine computers can be collected and understood through devices like Automatic or Bluetooth accessories for smartphones, and transportation data can be shared through Waze or Inrix.
- Energy data can be collected, and immediately used to reduce energy costs, by a number of consumer devices. Further, more cities are deploying smart grids, so that households can see exactly the same information that the utilities do, and in real time.
- Health data is also going mainstream, as wearables are being joined by smartphone accessories that replace many separate, commonplace home health devices, like scales and thermometers.
Put together, these five areas begin to paint a picture not only of an individual’s life, but of the life of a community. They can begin to help communities assess and then address their own needs both in terms of routine events, and by spotting and rectifying anomalies as they begin to occur, rather than when they have already spiraled out of control.
Protecting Privacy while Maintaining Utility
Though the benefits of sharing data are great, before any information is collected, the privacy of all participants must be assured. Famously, three seemingly innocuous pieces of data can be used to identify 87% of US adults: gender, ZIP code, and birthdate. That this can be done should not dissuade people from seeking the benefits of turning their small data into Big, but rather should reinforce the notion that only the data that actually helps should be collected.
For most applications, as an example, birthdate is superfluous information; at best only the birth year, or even better, a range of birth-years, would suffice to help people derive meaning, and thus benefit, from the data. Further, ZIP codes could be replaced by other types of location data, for example, distance from major urban centers—e.g. “within a 5-mile radius of certain coordinates.” Gender is unlikely to be needed for any of the applications discussed.
In each area, then, care must be taken by both individuals and whatever organization, public or private, amalgamates the small data into large data sets to ensure that no individual is likely to surrender their personal identity.
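The generalizations described above (a range of birth years instead of a birthdate, a coarse area instead of a ZIP code, no gender field at all) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record fields, the ten-year range width, and the 0.1-degree grid cell (very roughly seven miles north to south, standing in for the 5-mile radius idea) are hypothetical choices, not a vetted anonymization scheme.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RawRecord:
    """What stays on the person's own device (hypothetical fields)."""
    birth_year: int
    latitude: float
    longitude: float
    reading: float  # the one measurement the application actually needs


@dataclass(frozen=True)
class SharedRecord:
    """What leaves the device: only generalized values."""
    birth_year_range: str
    area_cell: tuple[float, float]
    reading: float


def birth_year_range(year: int, width: int = 10) -> str:
    """Replace an exact birth year with a range of years."""
    start = (year // width) * width
    return f"{start}-{start + width - 1}"


def area_cell(latitude: float, longitude: float,
              cell_degrees: float = 0.1) -> tuple[float, float]:
    """Snap coordinates to the south-west corner of a coarse grid cell.

    The cell size is an assumption made for this sketch.
    """
    lat = math.floor(latitude / cell_degrees) * cell_degrees
    lon = math.floor(longitude / cell_degrees) * cell_degrees
    return (round(lat, 4), round(lon, 4))


def generalize(record: RawRecord) -> SharedRecord:
    """Build the shareable record; gender is never collected at all."""
    return SharedRecord(
        birth_year_range=birth_year_range(record.birth_year),
        area_cell=area_cell(record.latitude, record.longitude),
        reading=record.reading,
    )
```

Coarsening like this reduces, but does not eliminate, the risk of re-identification; whether a given range or cell size is coarse enough depends on how many people fall into each combination.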
The Difference between Data and Insights
The data, by themselves, are meaningless. They are dots without connection. The ability to make those connections is what makes the data so powerful, which is why government agencies have so many restrictions on collecting certain kinds of data. What is important, however, is not the data but rather the insights to which they give rise, and that is why it is not necessary that government agencies themselves (as opposed to private-sector organizations) collect or even have access to the data.
An easy example of government agencies using the benefits of data without collecting, or even having access to it, is Facebook ads. Users share information with Facebook, and government agencies make use of that data by crafting ads that appear only to specific demographics. For the five topics listed below, government agencies could just as easily work with extant companies or with companies yet to be founded that will exist solely to provide agencies with the insights discernible through analysis of small-data-made-big.
Small Data Goes Big for Personal and Public Good
People have always collected data on these five areas of their lives; they just haven’t called it “collecting data.” Balancing a checkbook and tracking a household’s finances are exercises in data analytics. Charting the course of a child’s fever is an exercise in data collection coupled, perhaps, with emergency management. Listening to the traffic report in the morning and altering one’s route to work is data collection to power real-time decision making.
What makes the current situation different from, say, 20 years ago is three elements: first, our ability to gather and store data has increased dramatically; second, we have gained new methods through which we can share discrete portions of our personal data, as well as comfort in doing so; and finally, both individuals and institutions are guiding their decisions based on multiple data sets (call it the Moneyball Approach to life).
Health: There are two types of health-data-collecting devices, and it’s important to distinguish between the two. The first are fitness-trackers, like FitBit and Fuel Band. While interesting and often fun, they are actually of less utility for health-tracking than the new breed of monitors that connect to smartphones and help people measure their temperature, blood pressure, weight, and other vital signs.
Though knowing when any individual is sick is not of interest to a community, much less to a government agency, it can be of great interest when many people start getting sick at the same time: first, because extra resources might need to flow into an area (think: more Tamiflu), and second, because special resources might be needed (think: shingles vaccine if an outbreak is detected).
The information that people would need to share is easily anonymized: initially, only 5-mile radius location and temperature. That would be enough to alert public health officials of a nascent outbreak. A Waze-like health app could then augment that rudimentary data with other important notifications. Ultimately, a program like this could have huge benefits for the economy in terms of less money lost due to illness, and could potentially save lives as illnesses could be detected and treated much more quickly.
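As a rough illustration of how location-plus-temperature reports might surface a nascent outbreak, the sketch below counts reports per coarse area and flags any area where an unusually large share of readings are fevers. The function name and all three thresholds are assumptions made for the example, not public-health guidance.

```python
from collections import defaultdict

# All three values are assumptions chosen for illustration only.
FEVER_THRESHOLD_C = 38.0  # a reading at or above this counts as a fever
MIN_REPORTS = 20          # skip areas with too few reports to judge
ALERT_SHARE = 0.15        # flag an area when this share of reports are fevers


def areas_to_flag(reports):
    """Return {area_id: fever_share} for areas that cross the alert threshold.

    `reports` is an iterable of (area_id, temperature_c) pairs; no other
    personal information is needed.
    """
    totals = defaultdict(int)
    fevers = defaultdict(int)
    for area_id, temperature_c in reports:
        totals[area_id] += 1
        if temperature_c >= FEVER_THRESHOLD_C:
            fevers[area_id] += 1

    flagged = {}
    for area_id, total in totals.items():
        if total < MIN_REPORTS:
            continue  # too little data to say anything about this area
        share = fevers[area_id] / total
        if share >= ALERT_SHARE:
            flagged[area_id] = share
    return flagged
```

The minimum-report rule does double duty: it avoids false alarms from a handful of readings, and it keeps very small groups of people from standing out in the results.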
Financial: One of the constant questions that countless magazines offer to answer is: am I getting the best deal on my mortgage/savings account/car insurance/credit cards? And yet, people have a hard time talking about the terms of their mortgage, savings accounts, car insurance, and credit cards. Further, few people know about all of the new financial tools springing up all the time.
Many people, for example, still turn to payday loans to cover short-term debt. Yet those loans have an annual interest rate that can be anywhere from 400 percent to greater than 700 percent. But developers could code an application that not only shared basic financial information (just location down to a 5-mile radius and the terms and balances of various accounts: credit card, mortgage, checking, etc.) but was also coupled with information and forms from companies like ReadyForZero, which helps people plan to get out of debt; Acorns, a micro-investing platform; and Kiva, a micro-lending application.
Energy: One household turning up (or down) their thermostat doesn’t have a great impact on an electricity grid, but many households making changes in their thermostat can. Aggregating energy use by community and then agreeing to try to keep to a certain level of consumption can increase the resiliency of an electricity grid, as well as keep more money in residents’ pockets.
Further, by having a baseline for comparison, individuals can see if they are falling above or below the average energy consumption for their home and can make changes as they see fit—using their own data which they do not share with any other organization, public or private.
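The baseline comparison is simple arithmetic, and a minimal sketch is shown here. It assumes a community publishes only an average, and only once enough households have opted in; the function names and the ten-household minimum are hypothetical. The household's own reading is compared locally and never leaves the home.

```python
def community_average(household_kwh, min_households=10):
    """Average monthly use across opted-in households.

    Returns None when too few households take part, so that a published
    average cannot be traced back to one or two homes. The minimum of 10
    is an assumption made for this sketch.
    """
    if len(household_kwh) < min_households:
        return None
    return sum(household_kwh) / len(household_kwh)


def percent_vs_baseline(own_kwh, baseline_kwh):
    """How far a household sits above (+) or below (-) the baseline, in percent."""
    return 100 * (own_kwh - baseline_kwh) / baseline_kwh
```

For example, a household using 900 kWh against a published baseline of 750 kWh would see that it is 20 percent above the average for its community.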
Transportation: Applications like Waze are only the beginning. Devices like Automatic and a host of smartphone accessories can plug in to the computers in cars and record performance data. Many states, such as Maryland, require emissions testing, but much of the pertinent information can either be recorded or inferred from the engine computer. This is another area in which people could benefit from knowing how other people in similar situations are faring. Drivers could optimize their cars, perhaps even going as far as updating the ECU—the engine computer—based on other drivers’ experiences and outcomes.
And then there is information about transportation infrastructure. Already, applications like SeeClickFix, Fix311 and PublicStuff allow people to voluntarily send their city’s agencies requests for service. Further, applications like Street Bump automatically send information about road conditions the same way that Waze does about traffic conditions.
Further, sharing Waze information with municipalities can allow for on-the-go traffic management. The pattern of traffic lights can be altered, public transit can respond to special events or abnormal conditions, and the flow of people from place to place can be eased to everyone’s benefit. All that would need to be shared is specific location and ultimate destination (which, in the case of commuting, could be at a neighborhood or public transit stop-level location).
Sharing Economy: Companies like Park Circa, AirBnB, and Relay Rides are the vanguard of what is often called “The Sharing Economy.” Relay Rides sums up the benefit succinctly: “put your idle car to work.” AirBnB puts people’s idle rooms or homes to work, and Park Circa puts people’s idle parking spaces to work. The benefit for a community is a more efficient allocation of resources: people who don’t need to own a car (because they drive infrequently) can still have the benefit of a car, so they save money. Conversely, people who own a car but use it only infrequently can still derive value from that car, value that they can then put back into the community.
But there’s a rub: the sharing economy is a largely unregulated one (though some see this as a feature rather than a bug), and the regime of data-collection in the regular (and regulated) economy does not extend into the sharing economy. A shame, as there is much to gain for everyone if the sharing economy were regulated and its data mined.
The crux of the benefit goes to that idea of allocation of resources. If a community has a host of rooms up on AirBnB, there might be less reason to build a new hotel; conversely, if a few blocks see a lot of AirBnB activity, a new restaurant might take the chance to open there. Likewise, the use of Park Circa may indicate an inefficient allocation of parking resources, indicating the need either for a parking garage or different pricing for on-street parking in the area. Relay Rides could provide data that local governments could use to plan and deploy programs for bike rental stations, public transit, or other transportation efforts.
Small Data + Social = Big Data and All Its Benefits
The biggest benefit of Big Data is that it helps to ground decisions in reliable probabilities. In the book Cognitive Surplus, author Clay Shirky tells the story of his small-town pizza shop, and how it sold pizza only by the whole pie. When he visited New York City, he realized that pizza shops could sell by the slice, because there was a high enough probability that they would be able to sell the whole pie a slice at a time, given enough foot traffic. His story proves the maxim that a large enough quantitative difference soon becomes a qualitative difference.
The same is true with the aggregation of small data. The temperature readings coming out of a single household may not yield many insights, but the temperature readings coming out of thousands of households in a single city might. The key to all of this is the social element: will we share certain data with one another the way we share links on Twitter, pictures on Instagram and Pinterest, and comments on Facebook and blogs? If we do—providing, of course, that we get the privacy aspects right—we all stand to gain from making our small data big.
This post was originally published on the IBM Center for the Business of Government blog.