Saturday, July 09, 2011

If this graph doesn't awe you... Meet Google's Hal Varian




Ask Google’s chief economist a question about the future and he goes quiet. “Who knows,” he says when asked whether the share price of high tech companies can keep rising.

But ask Hal Varian about the present and he’ll tell you the possibilities are endless. He’ll be in Australia on Monday delivering the keynote address, entitled Forecasting the Present, to the annual conference of economists.

On the phone from California, where he is preparing to board the flight, he directs me to a website that allows me to track Australian searches for terms related to flu in close to real time. I can see that so far this year inquiries have been mild, except in South Australia. By contrast, in 2007 inquiries about the flu went through the roof.

Exceedingly closely correlated with actual cases of the flu, the web search data comes weeks or months sooner than the official statistics.

In countries where they don’t have as good statistics it is saving lives. Bolivia and Brazil are using Google’s dengue fever data to help direct resources to areas in need as outbreaks strike.

In the United States some economists no longer wait for the official unemployment figures...
“Terms such as - how do I apply for unemployment benefits, how long do they last - it turns out these are highly correlated with actual claims for benefits. We can spot turning points sooner,” says Varian.

In Australia the key word is Centrelink. Searches using that word surged in late 2008 and 2009 during the global financial crisis and - disturbingly - are peaking now, especially in NSW and Queensland.

In Canberra Professor Varian will meet with officials from the Australian Bureau of Statistics to tell them he doesn’t want to replace what they do, merely bring it forward into the present.

“Those agencies have a wealth of well-classified historical data. We can use that to establish correlations and then produce real-time guesses about what’s happening now. Agencies such as yours calculate the rate of inflation collecting prices by hand. We are doing it using barcode and merchant fee data.”
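The approach Varian describes, fitting a relationship on the agency's historical figures and then applying it to this week's search data, can be roughly sketched as a one-variable least-squares fit. This is only an illustration: the function names are invented here, the numbers are made up, and nothing in the interview says Google uses a model this simple.

```python
from typing import List, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def nowcast(search_history: List[float],
            official_history: List[float],
            search_now: float) -> float:
    """Estimate the not-yet-published official figure from today's search level."""
    slope, intercept = fit_line(search_history, official_history)
    return intercept + slope * search_now


# Hypothetical data: a search index and an official series, NOT real statistics.
search_index = [40.0, 55.0, 70.0, 85.0]
official_claims = [300.0, 330.0, 360.0, 390.0]
print(nowcast(search_index, official_claims, search_now=100.0))
```

The official series still does the heavy lifting: without the well-classified history there is nothing to fit the line to.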

The Google Price Index, one of Varian’s pet projects, is said to closely match the US personal consumer expenditure index excluding food and energy.

“We haven’t seen any divergence yet,” says Varian. But he is fine-tuning it before taking it public.

Until recently Varian was a professor of economics at the University of California, Berkeley, specialising in public goods - the sort of things that are produced for the good of society rather than money. For a few years during the 1980s he worked at Melbourne’s Monash University.

When the barely profitable Google asked him to help out in May 2002 he was thrown an embryonic scheme called Ad Auction and told “look at this, it might make us some money”.

Applying the insights of game theorists including John Nash, made famous in the movie A Beautiful Mind, he realised Google had stumbled upon something with the ability to make money for both advertisers and itself. Its key was charging the winning bidder only the price of the bid that came second, allowing bidders to more freely reveal what they thought the ad was worth. He set up a continually moving exchange rate that converts views to clicks and then helped ensure the ads that won were quality ads, because otherwise web users wouldn’t continue to click.
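The second-price rule is simple enough to sketch in a few lines. The version below is a bare-bones illustration for a single ad slot, with hypothetical advertiser names and bids; it leaves out the view-to-click conversion and the quality adjustments described above, so it should not be read as Google's actual system.

```python
from typing import Dict, Tuple


def second_price_auction(bids: Dict[str, float]) -> Tuple[str, float]:
    """Highest bidder wins but pays only the second-highest bid."""
    if len(bids) < 2:
        raise ValueError("need at least two bidders")
    # Rank bidders from highest bid to lowest.
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1]  # the runner-up's bid sets the price
    return winner, price


# Hypothetical bids, in dollars per click.
bids = {"advertiser_a": 2.50, "advertiser_b": 1.75, "advertiser_c": 0.90}
winner, price = second_price_auction(bids)
print(winner, price)  # advertiser_a 1.75
```

Because the winner's own bid never sets the price, shading a bid below what the ad is really worth cannot lower what the winner pays; it can only risk losing the slot.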

Google now takes in $30 billion a year, the biggest chunk from Ad Auction. “It’s too much to say I’m the man who made Google make money,” he says. “I just helped.”

His passion is data, and what it can do. A fan of the Isaac Asimov Foundation novels in his youth, he was taken with the idea of psychohistory, the fictional science of predicting the future based on applying maths to group behaviour. Economists were once frightened of data, he says. They need to embrace it, along with biologists who look for correlations between DNA and illness and psychologists who look for correlations between brain scans and behaviour.

“We live in a data-rich world,” he says. “The key is distinguishing the data that means something from the data that is noise.”

Asked to chance his arm with a forecast about the future he says the best way to take part is to “become a statistician, it’s the sexy job of the next decade”.

“If you find that hard to believe, think about computer engineers. Who would have guessed a decade ago they would have sexy jobs,” he says.

Published in today's SMH




9 comments:

Anonymous said...

Fascinating piece, Peter, and the mind boggles with the statistical possibilities.

I can see economists using those Google tools immediately.

Kymbos.

Anonymous said...

I should add, though - the NSW 'centrelink search' data is 'peaking' at 80-90 searches a week: hardly a huge volume. You can see its potential, though.

Kymbos.

Peter Martin said...

Glad you're already playing around. Anyone can. (Thank you Google!)

The numbers are not searches per week. Explained here: http://goo.gl/O817u

Anonymous said...

Kymbos, the search data is normalised to the highest value. So right now it's at 100% - that is, the highest level it's ever been at. Obviously you have to account for internet penetration etc but there's a pretty clear spike. Google doesn't usually publish absolute numbers of searches.

The spike's also much sharper than usual, if you inspect the history you see spikes every year in ~January/February and in June/July - assumedly relating to start of school/uni and tax time; this time, however, the jump is much sharper (+50% vs +33%ish).

Anonymous said...

The index is relative, you can enter multiple terms delimited with a comma.

The results can make you laugh, they can make you cry. For example.

Perhaps humanity is doomed and doesn't deserve any better.

Anonymous said...

Anonymous, your link might have just caused a skew in the statistics. I googled "Kim Kardashian" to find out who the hell he or she was.

Anonymous said...

Mwahahaha ... my plan to divert attention from global warming to Kim Kardashian is finally working.

Is it getting hot in here?

Anonymous said...

I see, it's scaled. So how do we know how many searches comprise the dataset?

Kymbos.

Peter Martin said...

I've just been to his talk!

Fascinating, especially the uses for the brand new http://correlate.googlelabs.com/

The value is the number of searches for that term (or terms) as a proportion of the total number of searches, normalised.

So changes in internet penetration shouldn't have much impact.

One qualifier is that once the number of searches in a category gets below 50 it is reported as zero, for privacy reasons.

Disease researchers don't like this. They can't observe tails.
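For anyone who wants to see the scaling spelled out, here is a rough sketch of how such an index could be computed from the description above: the term's share of all searches in each period, zeroed when the count falls below 50, then scaled so the peak period reads 100. The weekly buckets, the rounding and the function name are assumptions made for illustration, not Google's published method.

```python
from typing import List

PRIVACY_THRESHOLD = 50  # counts below this are reported as zero, per the talk


def search_index(term_counts: List[int], total_counts: List[int]) -> List[float]:
    """Scale a term's share of all searches so the peak period equals 100."""
    shares = []
    for term, total in zip(term_counts, total_counts):
        if term < PRIVACY_THRESHOLD or total == 0:
            shares.append(0.0)  # suppressed for privacy (or no searches at all)
        else:
            shares.append(term / total)
    peak = max(shares, default=0.0)
    if peak == 0.0:
        return [0.0 for _ in shares]
    return [round(100 * share / peak, 1) for share in shares]


# Hypothetical weekly counts: searches for one term, and all searches.
term_counts = [40, 100, 300, 150]
total_counts = [10_000, 10_000, 20_000, 10_000]
print(search_index(term_counts, total_counts))
```

In this made-up example the third week has twice as many searches for the term as the fourth, but total searches doubled as well, so both weeks score the same. That is why growth in internet use shouldn't move the index much, and the zeroed first week is the tail the disease researchers can't see.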

Also, did you know the use of "adult" sites is strongly correlated with unemployment? As Varian said, if you find yourself jobless you have more time on your hands.
