Isn’t the Zip Code Level Good Enough—Why Look at More Granular Housing Market Data?

by Guest Contributor, October 7, 2011

By: John Straka

For many purposes, national home-price averages, MSA figures, or even zip code data cannot adequately gauge local housing markets. The higher the level of aggregation, the less the data reflect the true variety and constant change in prices and conditions across local neighborhood home markets. Financial institutions, investors, and regulators that seek out local housing market data, and learn how to use it, will generally see true housing markets much more clearly.

When houses are not good substitutes from the viewpoint of most market participants, they are not part of the same housing market. Homes of different sizes, types, and ages, for example, may be in the same county, zip code, or block, or even right next door to each other, yet they are generally not in the same housing market when they are not good substitutes. This is why analysis should start with detailed, granular information on local neighborhood home markets and on the homes themselves.

To be sure, greater granularity in neighborhood home-market evaluation requires analysts and modelers to handle much more data, covering literally hundreds of thousands of neighborhoods in the U.S. It is fair to ask whether zip-code-level data, for example, might generally be sufficient. Most housing analysts and portfolio modelers have in fact traditionally assumed so, believing that reasonable insights can be gleaned from zip code, county-level, or even MSA data. Strictly speaking, however, this is fully adequate only if neighborhood home markets and outcomes are homogeneous, or at least reasonably so, within the level of aggregation used. Unfortunately, even at the zip code level, the data suggest otherwise.

Examples

All of the home-price and home-valuation data for this report were supplied by Collateral Analytics. I have focused on zip7s (i.e., zip+2s), which are a more granular neighborhood measure than zip codes. Collateral Analytics applied a Hodrick-Prescott (H-P) filter to the raw home-price data to attenuate short-term variation and isolate the six-year trends. But as we’ll see, this dampening still leaves an unrealistically high range of variation within zip codes, for reasons discussed below. Fortunately, there is an easy way to control for this, which we’ll apply to obtain final estimates of the range of within-zip variation in home-price outcomes.
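An H-P filter splits a series into a smooth trend and a short-term cycle by penalizing changes in the trend's slope. The following is a minimal Python sketch of the standard closed-form filter, not Collateral Analytics' implementation. The function name `hp_filter`, the smoothing parameter `lam=1600` (a common convention for quarterly data), and the sample price series are all assumptions, since the data frequency and smoothing parameter actually used are not stated here.

```python
import numpy as np

def hp_filter(y, lam=1600.0):
    """Hypothetical helper: return (trend, cycle) from a Hodrick-Prescott filter.

    Minimizes sum((y - tau)**2) + lam * sum((second difference of tau)**2).
    The closed-form solution is tau = (I + lam * D'D)^-1 y, where D is the
    (T-2) x T second-difference matrix.
    """
    y = np.asarray(y, dtype=float)
    T = y.size
    if T < 3:
        raise ValueError("need at least 3 observations")
    # Each row of D applies the second difference [1, -2, 1] at one time step.
    D = np.zeros((T - 2, T))
    for t in range(T - 2):
        D[t, t:t + 3] = [1.0, -2.0, 1.0]
    trend = np.linalg.solve(np.eye(T) + lam * D.T @ D, y)
    return trend, y - trend

# Hypothetical quarterly price per square foot for one zip7, 2005Q1-2011Q1.
prices = [310, 325, 330, 322, 318, 300, 290, 270, 255, 240, 232, 228, 225,
          222, 226, 220, 214, 210, 212, 205, 200, 198, 196, 195, 193]
trend, _ = hp_filter(prices)
pct_change = 100 * (trend[-1] / trend[0] - 1)
print(f"H-P trend change: {pct_change:.1f}%")
```

Larger values of `lam` give a smoother trend. A dense solve is fine for a short 2005-2011 series; a sparse solver would be preferable for long ones.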

The three charts below show the H-P filtered 2005-2011 percent changes in home price per square foot of living area within three different types of zip codes in San Diego County. Within the first type of zip code, 92139 in this case, home-price changes in recent years have been relatively homogeneous, ranging from -56% to -40% across the zip7s (i.e., zip+2s) in 92139. The second type of zip code, illustrated by 92078, is more typical. Here the home-price changes across the zip7s have varied much more: the 2005-2011 zip7 percent changes in home prices within 92078 span over 40 percentage points, from -51% to -10%. In the third type of zip code, less frequent but surprisingly common, home-price changes across the zip7s show a truly remarkable range of variation. Zip code 92024 illustrates this: its home-price outcomes have varied from -51% to +21%, a 71 percentage point range of difference, and this is not even the zip code with the maximum range of variation observed!

House Price Trends - ZIP 92139
House Price Trends - ZIP 92078
House Price Trends - ZIP 92024
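The within-zip range used throughout this post is the spread between the highest and lowest zip7 percent changes inside each five-digit zip code. Here is a minimal sketch, assuming a hypothetical dictionary keyed by seven-character zip7 strings. The helper name and the per-zip7 values are illustrative only, chosen so that the endpoints echo those reported for 92139 and 92078.

```python
from collections import defaultdict

def within_zip_ranges(zip7_changes):
    """Hypothetical helper: map each 5-digit zip to the max-minus-min spread,
    in percentage points, of its zip7 percent changes.

    zip7_changes: dict of zip7 code (5-digit zip + 2 digits, as a string)
    -> percent change in home price over the period.
    """
    by_zip = defaultdict(list)
    for zip7, pct in zip7_changes.items():
        by_zip[zip7[:5]].append(pct)  # the first five characters are the zip code
    return {z: max(v) - min(v) for z, v in by_zip.items()}

# Hypothetical zip7-level percent changes, 2005-2011.
changes = {
    "9213901": -56.0, "9213902": -48.0, "9213903": -40.0,
    "9207801": -51.0, "9207802": -30.0, "9207803": -10.0,
}
print(within_zip_ranges(changes))  # {'92139': 16.0, '92078': 41.0}
```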

All of the San Diego County zip codes are summarized in the bar chart below. Nearly two-thirds of the zip codes (65%) have a within-zip difference of more than 30 percentage points in the 2005-2011 zip7 percent changes in home prices. Furthermore, 40% have more than a 40 percentage point range of home-price outcomes, 23% have more than a 50 percentage point range, and 13% have more than a 70 percentage point range. The within-zip range across zip7s has a median of 37 percentage points and a mean of 41 percentage points. These numbers are surprisingly high, and most likely unrealistically so.

Summary of Within-Zip (Zip+2 Level) Ranges of Variation in Home-Price Changes in San Diego: Percentage of Zips by Range across Zip+2s in Home Price per Living Area, % Change 2005-2011
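The summary statistics quoted for San Diego County (the share of zips above each threshold, plus the median and mean range) can be computed from the per-zip ranges as sketched below. This assumes "more than" means a strict inequality. The function name and the sample ranges are hypothetical, not the Collateral Analytics data.

```python
from statistics import mean, median

def summarize_ranges(ranges, thresholds=(30, 40, 50, 60, 70)):
    """Hypothetical helper: percentage of zips whose within-zip range exceeds
    each threshold, plus the median and mean range (percentage points)."""
    values = list(ranges)
    if not values:
        raise ValueError("no ranges to summarize")
    n = len(values)
    shares = {t: 100.0 * sum(r > t for r in values) / n for t in thresholds}
    return shares, median(values), mean(values)

# Hypothetical within-zip ranges for eight zip codes.
sample_ranges = [16, 41, 72, 35, 28, 52, 31, 20]
shares, med, avg = summarize_ranges(sample_ranges)
print(shares, med, avg)
```

Reporting both the median and the mean, as the post does, shows how far a few extreme zips pull the average upward.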

Controlling for Factors Inflating the Range of Variation

Such sizable differences within a typical single zip code clearly suggest materially different neighborhood home markets. While this qualitative conclusion is supported further below, the magnitudes of the within-zip variation in home-price changes shown above are quite likely inflated. Zip7s with a limited number of observations tend to produce statistical “noise” outliers, and including distressed property sales can create further outliers. Both limited observations and distressed sales are particularly likely to produce negative outliers that do not represent the true price changes for most homes, or their true range of variation within zip codes. (My earlier blog post of June 29 discussed the biases that arise from including distressed property sales when trying to gauge general price trends for most properties.)
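Why do thin samples inflate the measured range? In the hypothetical simulation below, every zip7 in a zip code shares the same true price change, so the true within-zip range is zero and any measured range comes purely from sampling noise. The noise level, the number of zip7s, and the sales counts are assumptions for illustration. Because the sketch uses symmetric Gaussian noise, it does not capture the extra negative skew that distressed sales add.

```python
import random
from statistics import mean

def zip7_range_of_means(n_zip7s, sales_per_zip7, true_change=-30.0,
                        noise_sd=15.0, seed=0):
    """Hypothetical simulation: all zip7s share the same true percent change,
    each is observed through `sales_per_zip7` noisy sales, and we return the
    max-minus-min spread of the zip7 sample means (percentage points)."""
    rng = random.Random(seed)
    means = [
        mean(rng.gauss(true_change, noise_sd) for _ in range(sales_per_zip7))
        for _ in range(n_zip7s)
    ]
    return max(means) - min(means)

# The spurious range shrinks roughly in proportion to 1/sqrt(sales per zip7).
for n in (3, 30, 300):
    print(n, round(zip7_range_of_means(40, n), 1))
```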

Fortunately, there is a very convenient way to control for these factors: using the zip7 averages of Collateral Analytics’ AVM (Automated Valuation Model) values rather than the raw home-price data summarized above. These industry-leading AVM home valuations were designed, in part, to filter out statistical noise.
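Averaging valuations by zip7 before computing percent changes is one way to picture the AVM-based approach. The sketch below assumes property-level AVM values at the start and end dates, grouped by a hypothetical seven-character zip7 key. It is not stated whether the AVM averages were normalized by living area or exactly how Collateral Analytics aggregates them, so this is an illustration rather than their method.

```python
from collections import defaultdict
from statistics import mean

def zip7_avm_changes(avm_start, avm_end):
    """Hypothetical helper: percent change in the average AVM value per zip7.

    Each argument is a list of (zip7, avm_value) pairs for individual
    properties at one date. Only zip7s present at both dates are returned.
    """
    def averages(rows):
        groups = defaultdict(list)
        for zip7, value in rows:
            groups[zip7].append(value)
        return {z: mean(v) for z, v in groups.items()}

    start, end = averages(avm_start), averages(avm_end)
    return {z: 100.0 * (end[z] / start[z] - 1.0) for z in start if z in end}

# Hypothetical property-level AVM values in 2005 and 2011.
rows_2005 = [("9207801", 400_000), ("9207801", 420_000), ("9207802", 500_000)]
rows_2011 = [("9207801", 246_000), ("9207801", 246_000), ("9207802", 450_000)]
for zip7, pct in zip7_avm_changes(rows_2005, rows_2011).items():
    print(zip7, round(pct, 1))
```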

The bar chart below shows that zip7 ranges within San Diego County zip codes remain significant using the AVM values, but the distribution is now shifted considerably, and more realistically, toward a much smaller share of zip codes with remarkably high zip7 variation. Compared with the chart above, just 1% of the zips now have a zip7 range greater than 60 percentage points, 5% greater than 50, and 11% greater than 40, but 36% still exceed 30.

To be sure, this distribution and the typical range of zip7 differences (now a 25 percentage point median and a 26 percentage point mean) still show a considerable range of local home-market variation within zip codes. It seems fair to conclude that the typical zip code lacks the uniformity in home-price outcomes that most housing analysts and modelers have simply assumed. The effects on consumer wealth and behavior of a 10% home-price decline, for example, would in most cases differ sizably from those of a 35% to 50% decline. A difference of this kind within a zip code is not at all unusual in these data.

Automated Valuation Model Price - San Diego County
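To make the wealth difference between a 10% and a 35% to 50% decline concrete, consider a hypothetical homeowner with a $300,000 home and a $240,000 mortgage balance. Both figures are assumptions, and the sketch ignores amortization and transaction costs. A 10% decline leaves positive equity, while a 35% to 50% decline leaves the owner underwater.

```python
def equity_after_decline(home_value, loan_balance, decline_pct):
    """Hypothetical helper: owner's equity after a percentage price decline,
    holding the loan balance fixed."""
    return home_value * (100 - decline_pct) / 100 - loan_balance

# Hypothetical home value and mortgage balance.
for decline in (10, 35, 50):
    equity = equity_after_decline(300_000, 240_000, decline)
    print(f"{decline}% decline -> equity {equity:+,.0f} USD")
```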

How About a Different Type of Urban Area—More Uniform?

It might be thought that the diversity of topography, etc., across San Diego County (from the sea to the mountains) makes its within-zip variation in home-market outcomes unusually high. To test this hypothesis quickly, let’s look at a more topographically uniform urban area: Columbus, Ohio.

When I informally polled some of my colleagues about their prior belief on within-zip variation in home-price outcomes in Columbus versus San Diego County, all of them shared my prior: we expected greater within-zip uniformity in Columbus. I find it interesting to report that we were wrong.

Both the H-P filtered raw home-price information and the AVM values from Collateral Analytics show relatively greater zip7 variation within Columbus (Franklin County) zip codes than in San Diego County.

The bar chart below shows the best-filtered, most attenuated results, based on the AVM values. Of the Columbus zips, 5% have a zip7 range greater than 70 percentage points, 8% greater than 60, 23% greater than 50, 35% greater than 40, and 65% greater than 30. The within-zip range of zip7 differences in Columbus has a median of 35 percentage points and a mean of 38 percentage points.

Conclusion

These data seem consistent with what experienced appraisers and real estate agents have been telling economists, housing analysts, investors, financial institutions, and policymakers for quite a long time. Models of housing markets built on more aggregated data have quite reasonable uses for aggregate time-series analysis and forecasting, but they miss much of the very real and material variation in local neighborhood housing markets. For home valuation and many other purposes, even models that use data down to the zip code level, which most analysts have assumed to be sufficiently disaggregated, are not really good enough. These models are not as good as they can or should be.

These findings point to the larger challenge of defining local housing markets properly in empirical terms, so that better data, models, and analytics can be developed and deployed more rapidly, for greater profitability and for sooner, more sustainable housing market recoveries.

I thank Michael Sklarz for providing the data for this report and for comments, and I thank Stacy Schulman for assistance in this post.