Calculated Risk, a site on our regular reading list, has a story today discussing the same chart we analyzed yesterday. (Thanks to reader "RB" for pointing this out.) Here is their conclusion:

We can be 99% confident that the YoY changes in real PCE are positively correlated with loosening mortgage lending standards.

Some observers are interpreting this statement to mean there is a .99 correlation -- a very different thing. (Perhaps this is also what Doug Kass meant in his comment on CNBC.)

The data presented have a correlation of .43, evidence of some relationship but far from a .99 correlation. With a .99 correlation, the scatter plot would show nearly every point falling along a straight line. This is far different from the plot we showed yesterday.

That plot shows a data cloud plus a few outliers that define the relationship. Any good analyst looks at a scatter plot when doing regression and correlation. Looking at the residuals is the only way to know if a linear model is appropriate. In this case, the warning flags should go up.
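To illustrate how a handful of outliers can define a correlation, here is a minimal sketch in Python. The numbers are made up for illustration and are not the mortgage or PCE series; the `pearson` helper is our own and not taken from any package.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up "data cloud": a 3x3 grid with no linear relationship at all.
cloud_x = [-1, 0, 1, -1, 0, 1, -1, 0, 1]
cloud_y = [-1, -1, -1, 0, 0, 0, 1, 1, 1]

# Two made-up outliers far from the cloud.
outlier_x = [5, 6]
outlier_y = [5, 6]

r_cloud = pearson(cloud_x, cloud_y)
r_all = pearson(cloud_x + outlier_x, cloud_y + outlier_y)

print(f"cloud only:    r = {r_cloud:.2f}")
print(f"with outliers: r = {r_all:.2f}")
```

In this toy case the cloud alone has no correlation, yet adding just two distant points pushes the coefficient to roughly .89. A scatter plot makes this obvious at a glance; the coefficient alone does not.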

A correlation of .43 means an r-squared of .19. The statistical interpretation of this is that about 19% of the variation in one series, as defined by the squared deviations from the regression line, is "explained" by the variation in the other series. This is how one looks at *substantive significance* -- whether the relationship is important.
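The arithmetic is just squaring the correlation. A quick sketch, using the rounded .43 figure (so the result can differ in the second decimal from a value computed on the unrounded correlation):

```python
r = 0.43            # correlation reported above (rounded)
r_squared = r ** 2  # share of variation "explained" by a linear fit
print(f"r = {r:.2f}, r-squared = {r_squared:.4f}")
```

By the same arithmetic, a .99 correlation would correspond to an r-squared of about .98, which shows how far apart the two readings of "99%" really are.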

So where does the .99 come in? A linear regression analysis calculates a slope coefficient and an intercept. In this case, the slope coefficient for the equation is .10, as we noted on our chart. This means that every 1% change in the mortgage availability measure is associated with a one-tenth point change in year-over-year PCE. The slope coefficient has a standard error, calculated from the number of cases and the degrees of freedom. The standard error for the slope coefficient is .02.

Since the coefficient is much larger than the standard error, we can be 99% sure (making some other assumptions about the typicality of the data we have) that the "true" slope coefficient is *not zero*. This is a test of **statistical significance**.

The confusion of statistical and substantive significance, and which measures are used for each, is one of the most common mistakes made by those without a strong background in research methods.
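For readers who want to see the significance test as arithmetic, here is a rough sketch in Python using the slope (.10) and standard error (.02) quoted above. The sample size is not given here, so the cutoff comes from a normal approximation. That is our assumption; a t table with the proper degrees of freedom would give a somewhat larger cutoff for a small sample.

```python
from statistics import NormalDist

slope = 0.10      # slope coefficient quoted in the article
std_error = 0.02  # standard error of the slope quoted in the article

# Test statistic for the null hypothesis "the true slope is zero".
t_stat = slope / std_error

# Two-sided 99% cutoff under a normal approximation (assumption: the
# sample is large enough; with few observations, a t distribution with
# n - 2 degrees of freedom gives a larger cutoff).
critical = NormalDist().inv_cdf(1 - 0.01 / 2)

significant = abs(t_stat) > critical
print(f"t = {t_stat:.1f}, 99% cutoff = {critical:.2f}, significant: {significant}")
```

The ratio of coefficient to standard error is about 5, well past the cutoff. Note what this does and does not say: it speaks to whether the slope differs from zero, not to how much of the variation the relationship explains.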

To summarize, the Calculated Risk statement that we can be 99% sure of *some relationship* between the two variables is correct.

Everything in our article yesterday is also correct. The degree of association is not as strong as the misleading graph suggests. The entire relationship rests upon something that happened for a year or so in the early nineties. The measure of mortgage availability is not very good for the purpose. There is not enough data. The resulting relationship is probably spurious.

As someone who taught these classes at the graduate level, I have no illusion that the average reader is going to appreciate these distinctions, however important. It is a good illustration of how easy it is to be fooled by the eyes, and how difficult it can be to reach the truth.

Jeff,

Regarding CXO, here's an example of how they use Pearson correlations and lags to demonstrate that ECRI WLI is a coincident indicator at best for the stock market.

http://www.cxoadvisory.com/blog/internal/blog1-28-07/

Posted by: RB | March 09, 2007 at 07:49 PM

Hi RB,

I am generally a fan of the CXO work. Some of their topics are on my agenda. I'm not sure how they use a Pearson coefficient to gain insight into causality. It may be a superior correlation measure. When looking at time series data we often try for relationships with lags of different lengths.

In this series, the chart makes it seem that no lag is helpful. In fact, using year-over-year data for PCE and quarterly changes for mortgages makes it seem that, if there is causality (unlikely), it might run the other way.

Posted by: oldprof | March 08, 2007 at 10:39 PM

Using Pearson correlation with two time series is a serious methodological error. One could get a high (spurious) correlation for many pairs of series. Economic statistics move together for a reason, and this reason is seldom a matter of simple causation.

Posted by: oblomov | March 08, 2007 at 04:49 PM

Calculated Risk --

Thanks for stopping by and for commenting. I have had your site on my RSS reader for quite some time. I have visited, although I have never commented.

We are all interested in the relationship between housing and the economy, and we all agree there is some effect. I just do not think that the Fed survey tells us much.

I'm sure that many of my readers share your interpretation of the charts, and that is fine.

We should, however, get exactly the same correlation. I suggest an email spreadsheet exchange! We'll get it sorted out.

Thanks again --

Jeff

Posted by: oldprof | March 07, 2007 at 07:07 PM

Jeff, I've always been hesitant to put any statistics in my posts - for precisely the reason you noted - some readers might not understand what I'm writing. This is only the second time I've mentioned degrees of freedom and correlation coefficients, and some people definitely get confused!

Also I didn't discuss causation vs. correlation - I was being brief and I hope most of my readers understand my arguments for less growth in consumption going forward.

I have a few points of disagreement with your first post. First, I don't think the "fit" / "no-fit" graph with circles tells us much. I think we are looking for more macro moves than micro moves. So I think some of those "no fit" areas are actually pretty good fits. One of the key areas you circle as "no fit" (that I agree with and noted) might be a strong argument for the wealth effect from the stock market - something I mentioned in my original post.

Another issue is your "Looking at the Data" segment. You wrote: we "know little about whether the current standards are high, medium, or low by historic norms." I think the absolute level of tightness is immaterial, I believe it is the change that matters for the economy.

Of course we agree completely that the limited amount of data is a drawback. So I wouldn't make Kass' argument:

" ... the clear relationship of mortgage availability to personal consumption expenditures ... occurs in every cycle (up and down). You simply can't deny this relationship."

That is way too strong for me!

And finally, I calculate a correlation of .66, not .43. If your number is correct, then the confidence level of a positive correlation is lower than 99%.

Best Wishes and thanks for your analysis. I wish I had seen it before I posted.

Posted by: CalculatedRisk | March 07, 2007 at 06:24 PM

Just a question -- the CXO people use some sort of Pearson coefficient and see where it peaks to see if one is a leading indicator for the other. I suppose to see if mortgage tightening is a leading or coincident indicator, we could apply something similar here to see where the Pearson coefficient peaks?

Posted by: RB | March 07, 2007 at 05:04 PM

Thanks for the explanation. I've been to plenty of grad school but I really need to learn some statistics now.

Posted by: RB | March 07, 2007 at 02:24 PM