Researchers first examined flu outbreak surveillance data from the CDC's Influenza-like Illness Surveillance Network (ILINet), along with prediction models based on that database. They then took an influenza surveillance system based on Twitter data, which filters out media awareness campaigns and other confounders, and created a predictive model combining the two data sources.
Forecasts were made week by week during the 2011-2012, 2012-2013 and 2013-2014 influenza seasons (Nov. 27 to April 5) using current observational and historical data. The forecasts were compared against ILINet figures as first reported, 1 week after collection, and against the more accurate estimates the CDC released weeks later.
Researchers found that forecasting studies that use historical ILINet data must account for the fact that these data are often inaccurate at first and undergo frequent revision, which effectively increases the lag between data collection and the time accurate numbers reach health professionals. Even so, a model combining Twitter and historical data outperformed one that relied on historical data alone. The Twitter model reduced nowcasting error by 29.6%, a gain that dipped to 6.09% when measured against the CDC's final estimates. The Twitter model was also regularly more accurate than the baseline when forecasting outbreaks, with 10-week-ahead predictions that had fewer errors than the baseline model's predictions from 4 weeks earlier. Impressive!
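To make the "reduced error by 29.6%" figure concrete: a reduction like this is a relative comparison, the baseline's error minus the combined model's error, divided by the baseline's error. The sketch below assumes mean absolute error as the error measure (the study's own metric may differ), and the weekly values are hypothetical numbers invented purely for illustration, not data from the study.

```python
from statistics import mean


def mean_absolute_error(observed: list[float], predicted: list[float]) -> float:
    """Average absolute gap between observed and predicted weekly values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("series must be non-empty and the same length")
    return mean(abs(o - p) for o, p in zip(observed, predicted))


def relative_error_reduction(baseline_error: float, model_error: float) -> float:
    """Percent by which model_error improves on baseline_error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - model_error) / baseline_error


# Hypothetical weekly ILI percentages, for illustration only
observed = [2.0, 3.0, 4.0, 5.0]
baseline_forecast = [1.0, 2.0, 5.0, 7.0]  # historical data only
combined_forecast = [1.5, 3.0, 4.5, 6.0]  # historical + Twitter

baseline_mae = mean_absolute_error(observed, baseline_forecast)  # 1.25
combined_mae = mean_absolute_error(observed, combined_forecast)  # 0.5
print(f"{relative_error_reduction(baseline_mae, combined_mae):.1f}%")  # 60.0%
```

Scoring the same forecasts against a revised `observed` series would generally give a different percentage, which is why the reduction depends on whether the initial or final CDC figures are treated as ground truth.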
Prediction data were also collected from Google Flu Trends, a surveillance system based on Google search volume, for additional comparison. It turns out that Google Flu Trends reduced error over the baseline during only a single influenza season and was outperformed when making future predictions.
The study found there were several benefits to using Twitter over [Google Flu Trends] including the ubiquity, openness, public availability and ease of use of Twitter data.
The score currently stands at Twitter 1, Google Flu Trends 0, CDC Historical Trending -1.