Google is updating Flu Trends, its formula for
making weekly predictions of the number of flu
cases. The company’s philanthropic arm,
Google.org, announced in a blog post that it has
revised the algorithm that tracks U.S. flu cases.
Until recently, Google Flu Trends was the poster
child for the promise of big data as a tool for
social good. Companies from Nielsen to large
telcos are offering up their customer data to
experts who can mine it for clues about public
health and social unrest. The Flu Trends
algorithm uses data derived from Google search
terms to predict flu outbreaks in 29 countries.
The company recently added Dengue Trends to
its roster of prediction services.
But Google Flu Trends turned out not to be the
best representative of big data for social impact.
While Google’s model was relatively accurate
initially — the company first launched it in 2008
— by the 2012-2013 season, its predictions were
far off the mark. Google Flu Trends overshot the
number of flu cases 95 per cent of the time
during that season, according to David Lazer, a
Northeastern University computer science
professor. Google updated its algorithm the
following year, but it still overshot by 75 per
cent, Lazer said.
Google doesn’t dispute Lazer’s findings.
Researchers evaluate Google Flu Trends’
predictions by comparing them to estimates
tallied by the Centers for Disease Control and
Prevention. About every week, the CDC reports
the percentage of Americans assumed to have
contracted flu in the previous week. Roughly
3,000 healthcare providers around the country
contribute data to the CDC. The original aim of
Google Flu was to improve on the CDC’s lag by
offering a real-time snapshot of flu cases.
Researchers who discovered the 2012-13
discrepancy asked Google to detail its algorithm
with the intention of figuring out where it went
wrong, but the search company demurred.
Friday’s update doesn’t say much about what
has changed. In the past, Google Flu Trends
incorporated CDC tallies from the previous flu
season. Now it will integrate CDC data on a
continual basis. Beyond that, the company
provides no specific information about the
differences between the new algorithm and the
old.
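The shift from using last season's CDC tallies to integrating them continually is, in essence, a move toward "nowcasting": estimating the current week's flu activity from real-time search signals plus the most recent official figure, which arrives with a lag. The sketch below is a hypothetical toy model, not Google's actual algorithm; the data, variable names, and two-week lag are all illustrative assumptions.

```python
# Toy nowcasting sketch (NOT Google's model): regress current flu
# activity on a simulated search-query signal plus the most recent
# lagged "CDC" figure. All data here is synthetic.
import numpy as np

rng = np.random.default_rng(0)
weeks = 100

# Simulated ground-truth flu activity (seasonal wave plus noise).
true_ili = 2.0 + np.sin(np.linspace(0, 8 * np.pi, weeks)) + rng.normal(0, 0.1, weeks)

# Simulated search-query volume: tracks flu activity but with extra
# noise (e.g. media-driven searches inflating the signal).
query_volume = true_ili + rng.normal(0, 0.3, weeks)

# Official data arrives ~2 weeks late, so at week t the model only
# sees the figure from week t-2.
lag = 2
X = np.column_stack([
    query_volume[lag:],       # current search signal
    true_ili[:-lag],          # most recent available official figure
    np.ones(weeks - lag),     # intercept term
])
y = true_ili[lag:]

# Fit ordinary least squares on the first 70 weeks, nowcast the rest.
train = 70
coef, *_ = np.linalg.lstsq(X[:train], y[:train], rcond=None)
pred = X[train:] @ coef

mae = float(np.mean(np.abs(pred - y[train:])))
print(round(mae, 3))
```

Continually refreshing the lagged official term is what distinguishes this setup from a model trained once per season: each week's new CDC release re-anchors the estimate, so systematic drift in search behavior has less time to compound.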
“We look forward to seeing how the new model
performs in 2014/2015 and whether this method
could be extended to other countries,” the Google
blog post said.
There may be some change in the search terms
Google Flu Trends tracks. While Google doesn’t
share the actual query terms in its model —
which have numbered between 50 and 300 —
Christian Stefansen, the project’s technical lead,
said the new algorithm aims to differentiate
between people searching for flu information out
of curiosity or concern and those who are
searching because they have flu symptoms. The
earlier algorithm may have overshot the CDC’s
numbers because it counted people who were
searching for news about the flu as an indicator
of people who actually had the virus, Lazer says.
But because Google Flu Trends doesn’t publish
the terms it uses or the algorithm itself, it’s
impossible to know why its estimates were
wrong — or what might have changed to make it
right. “They just didn’t give enough detail to
figure out how they had done what they had
done,” Lazer says.
Stefansen says the company plans to “publish
the details in a technical paper soon,” but the
forthcoming publication won’t contain the search
terms that independent researchers want to see.
“We would love to, but if we were to do that, it
would be easy for someone to game the model,”
Stefansen said. “We’re at this intersection
between providing a service for free and making
it researchable, so we’re trying to strike the best
of both worlds.”
Google Flu Trends’ effort to tread the line
between an accurate service and a transparent
one reveals a wider tension as academic and
social institutions clamor for access to private
data. Global Pulse, the big-data project of the
United Nations, uses Twitter data in an attempt
to pinpoint international crises before they occur.
But even greater insights may be locked up in
corporate data, says Global Pulse spokesperson
Anoush Tatevossian. Global Pulse is working
with the Universal Postal Union, the organisation
that tracks global postal traffic, to see whether
there is a correlation between mail flow and
poverty. It’s also working with a large telco —
she won’t say which — to see whether mobile
phone data can help predict the spread of
diseases like malaria and HIV, and whether
patterns in phone credits reveal more about
international wealth and poverty than traditional
indicators.
This area of research is very new, said Dan
Kaufman, director of DARPA’s innovation office,
in an interview last week at the WSJ.D Live
conference in Laguna Beach. DARPA recently
funded a research project dedicated to using
large-scale data mining to predict population
behavior and social events. If policymakers are
to use big data to make decisions, Kaufman
says, better tools are needed to ensure the
data’s accuracy.
Source: wsj.com