This page explains how to make surface maps from survey data and describes some of the limitations. It reports on an empirical test, and lets you contact us and download surface maps of social phenomena in Lansing and Ingham County, Michigan.
One of the most interesting and beautiful things that can be done with survey data is to use it to make estimates of the geographic distribution of social phenomena at the local level and plot them on surface maps. With surface maps your data story can be brought to life. This site explains step-by-step how to do this.
Graphic 1.
Geographic analyses are essential for making best use of survey data. Social phenomena have strong spatial components and these often defy stereotypes that community members may hold.
Surface maps are one of the most powerful ways to graphically display geographic data. Surface maps are used to show the estimated value of some variable--like the proportion of the population uninsured, the unemployment rate, etc.--at any point in a community. They consist of a plane whose surface corresponds to the geography of a community (just like a paper map). However, the surface is three-dimensional. The height of the surface gives the value of the mapped variable.
Not only that, but surface maps are visually appealing. They can grab and hold people's attention and help them to understand your point of view.
And surface maps can make complex ideas simple: For example, you can plot standard errors to illustrate where you have confidence in your estimates and where you do not. You can plot the difference between two variables in standard deviations to show where they correspond and where they are different.
Graphic 2.

Nuts and Bolts
We have extended the work of Gerard Rushton on the spatial analysis of social data to incorporate synthetic estimation.
Four things are required to make synthetic projections using survey data:
When survey data are used to make surface maps as described here, synthetic estimates of the distribution of phenomena of interest are made, and projected down to the local level. (It is assumed that we are working with survey data that were not geographically coded during collection.) This is an important point. Survey data typically are used to make estimates of the prevalence of behaviors and attitudes in the general population, and they can be used to make estimates of the local distribution as well. However, they remain estimates, and are subject to all the sources of error that survey data are, plus a few that arise from downward projection.
What is meant by "projection"? When you project, you extrapolate from your survey findings down to the local level. Consider a simplified example: Suppose in your survey you find that 25 percent of respondents with household incomes under $30,000 lack health care insurance, but only 10 percent of those with incomes over $30,000 do so. In that case you would expect the percent of the population uninsured to be higher in lower income communities. If you can divide your community into parts based on income, you can "project" an uninsured rate of 25 percent onto lower income neighborhoods and a rate of 10 percent onto higher income ones.
Using the method to be described here, you make projections onto a rectangular grid of longitude and latitude coordinates. In the example just mentioned, each point on the grid would contain a datum--the estimated percent of the population uninsured in the area around the point--and it is these points that would be mapped. What do you want to map? The proportion of the population without health care insurance? The prevalence of depression? The smoking rate? Once you decide this there are five steps:
1) Select variables to work with, a coding scheme, and a geographic level. Examine the Census data for your community side by side with your survey data. Select four to six demographic variables present in both data sets to be used to make projections. Choose variables strongly related to social phenomena, such as income or percent of poverty, gender, age, race or ethnicity, and education. Include household type if you can (single mothers!). Include student status if your community is a university town.
Settle on a coding scheme that will be used to join your Census data and your survey data. Are you going to break respondents' incomes up by $10,000? Which races will you include? etc. Need to learn more about Census data? Browse and download free from the gazetteer on the Census web site.
Obtain Census data at the lowest level you can work with. The smallest geographic region is the block group. But you can use Census tract, zip code, etc.
2) Recode both your survey and Census Data to conform to your coding scheme. Take Note! Survey data consist of information about individuals. Census data consist of information about geographic regions. Where one variable is used to measure something on a survey, the Census must use several. For example, a survey data set might contain a variable that tells whether a respondent was White, Black, Asian, etc. Census data will have several variables for the same thing: one for the number of Whites in a region, another for the number of Blacks, etc. Recoding your survey data will mean changing the coding categories in a variable, but recoding the Census data will mean changing the way data are grouped into variables.
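The difference between the two kinds of recoding can be sketched in a few lines of Python. This is only an illustration: the two-bracket scheme, the function names, and the Census column names (`hh_0_10k` and so on) are all made up for the example, and putting an income of exactly $30,000 in the upper bracket is an arbitrary choice.

```python
# Hypothetical coding scheme: two income brackets split at $30,000.

def recode_survey_income(income):
    """Survey side: one variable per respondent, so we relabel its value."""
    return "under_30k" if income < 30_000 else "30k_plus"

def recode_census_income(region):
    """Census side: several count variables per region, so we regroup them."""
    return {
        "under_30k": region["hh_0_10k"] + region["hh_10_20k"] + region["hh_20_30k"],
        "30k_plus": region["hh_30_50k"] + region["hh_50k_plus"],
    }

# Made-up respondents (household income in dollars)
respondents = [12_000, 45_000, 29_999, 30_000]
survey_codes = [recode_survey_income(x) for x in respondents]

# Made-up household counts for one block group
block_group = {"hh_0_10k": 40, "hh_10_20k": 55, "hh_20_30k": 60,
               "hh_30_50k": 120, "hh_50k_plus": 75}
census_counts = recode_census_income(block_group)

print(survey_codes)
print(census_counts)
```

After this step both data sets speak the same language: each respondent carries a bracket label, and each region carries a count for the same bracket.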
3) Aggregate Census Data to a Projection Grid. In this step you set up the grid of data to be plotted. You must have access to a GIS package for this step. However, if you are careful, you will only need to do it once, so you might be able to get away with borrowing one from a university.
The first step in aggregating your Census Data is to set up a projection grid. This is a rectangular grid of longitude and latitude coordinates. Around each point is a circular buffer. The Census data inside a buffer is assigned to the corresponding grid point. Your GIS software will have a function for making buffers. Graphic 3 shows a grid of overlapping buffers on Census block groups.

Notice that the buffers overlap. This helps ensure that the data represent points on the same smooth surface. (If we were geographers we would say we want stable semivariance.) How much should the buffers overlap? In most cases the buffers should be just wider than twice the vertical or horizontal distance between two grid points. This way each buffer will embrace five grid points--one at the center and one each of those directly above, below, to the right and to the left. For example, if we are working with a one mile grid, each buffer should be just over two miles in diameter.
The density of the grid will depend on the size of the regions containing the Census data (assuming the regions contain enough people to be statistically stable). As a rule, the grid should be larger than the average size of the regions, making the buffers about four times as large. In Graphic 3, the grid points are one half mile apart, the buffers are 1.2 miles in diameter and about 1.1 square miles in area. The average block group is about .2 square miles in area.
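If you want to check a buffer size before building it in your GIS, a short Python sketch like the one below will count how many grid points fall inside a buffer. It assumes distances are measured on a flat plane in miles, and the function name is ours, not part of any GIS package. To catch the four nearest neighbors but not the diagonal ones, the diameter has to be more than twice the grid spacing and less than about 2.8 times the spacing.

```python
import math

def points_in_buffer(spacing, diameter, half_width=3):
    """Count grid points inside a circular buffer centered on a grid point."""
    radius = diameter / 2
    count = 0
    # Look at a small square of neighboring points around the center
    for i in range(-half_width, half_width + 1):
        for j in range(-half_width, half_width + 1):
            if math.hypot(i * spacing, j * spacing) <= radius:
                count += 1
    return count

# One mile grid with buffers just over two miles in diameter
print(points_in_buffer(1.0, 2.1))

# Half mile grid with 1.2 mile buffers, as in Graphic 3
print(points_in_buffer(0.5, 1.2))

# Area of a 1.2 mile buffer in square miles
buffer_area = math.pi * (1.2 / 2) ** 2
print(round(buffer_area, 2))
```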
An easy way to make the grid: Start making a square grid. Create a data base with two columns, one for the longitude of the points in the grid and one for the latitude. Enter the values for the longitude in ascending order once, and do the same for the latitude. Now copy the columns so you have as many copies as you have rows or columns. For example, if you are making a 10 by 10 grid make 9 copies plus the original. You would now have 100 point pairs in two columns. Now sort the entire longitude column in ascending order. Voila! You now have every longitude and latitude point on the grid. If your grid is to be rectangular rather than square, search and delete the points that you want to exclude.
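If you would rather script the grid than build it in a data base, here is a minimal Python sketch. The coordinates are invented for the example and the function names are ours. The first function pairs every longitude with every latitude directly; the second imitates the copy-and-sort recipe for a square grid so you can see that the two give the same points.

```python
from itertools import product

def grid_by_product(lons, lats):
    """Pair every longitude with every latitude (a Cartesian product)."""
    return list(product(sorted(lons), sorted(lats)))

def grid_by_copy_and_sort(lons, lats):
    """Imitate the copy-and-sort recipe for a square grid."""
    n = len(lons)                 # square grid: as many longitudes as latitudes
    lon_col = sorted(lons * n)    # copy the column n times, then sort it
    lat_col = sorted(lats) * n    # copy the column n times, do not re-sort
    return list(zip(lon_col, lat_col))

# Hypothetical coordinates for a 10 by 10 grid; substitute your own.
lons = [round(-84.60 + 0.01 * i, 2) for i in range(10)]
lats = [round(42.60 + 0.01 * i, 2) for i in range(10)]

grid = grid_by_product(lons, lats)
print(len(grid))
print(grid[:3])
```

The product version also works when the grid is rectangular to begin with, since the two lists do not need to be the same length.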
Use your GIS program's aggregate function to aggregate all of the variables you will use in your analysis to the grid using the buffers you created. Aggregate from the Census region to the buffers, assigning the Census data to every buffer it is in. Select the option to assign to a buffer only the percentage of the data in a Census region inside the buffer.
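What the aggregate function does with that last option can be sketched as follows. This is not the code of any GIS package; the function name, region labels, and counts are invented. The shares (the fraction of each Census region lying inside the buffer) would come from your GIS overlay, and using them this way assumes people are spread evenly within each region.

```python
def aggregate_to_buffer(regions, shares):
    """Sum region counts, weighted by the share of each region inside the buffer."""
    totals = {}
    for region_id, share in shares.items():
        for variable, count in regions[region_id].items():
            totals[variable] = totals.get(variable, 0.0) + share * count
    return totals

# Made-up household counts for two block groups
regions = {
    "bg1": {"under_30k": 100, "30k_plus": 200},
    "bg2": {"under_30k": 50, "30k_plus": 150},
}

# Hypothetical overlay result: all of bg1 and a quarter of bg2 lie in the buffer
shares = {"bg1": 1.0, "bg2": 0.25}

buffer_totals = aggregate_to_buffer(regions, shares)
print(buffer_totals)
```

Because the buffers overlap, the same region will usually contribute to several buffers, each time in proportion to the part of it that lies inside.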
4) Generate Survey Data Tables and Project Results to the grid. In your survey data, cross tabulate your dependent variable with the variables selected earlier to be used for projection. (Be sure you have weighted the data to correct for unequal probabilities of selection and known sampling errors and biases.) Create a new variable for each cross tabulation. Returning to the simplified example, suppose these were your survey results:
| | More Than $30,000 | Less Than $30,000 | Total |
|---|---|---|---|
| Uninsured | 8.2 | 22.1 | 15.1 |
| Insured | 91.8 | 77.9 | 84.9 |
To make the projection from this cross tabulation you would write code something like this:
IF (INCOME UNDER 30,000) THEN NEWVAR1 = .221 * POPULATION.
IF (INCOME OVER 30,000) THEN NEWVAR1 = .082 * POPULATION.
The final projection is the average of the projections from each cross tabulation.
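Here is one way the projection and the averaging might look in Python for a single grid point. Treat it as a sketch of our reading of the method, not a definitive implementation. The income rates come from the table above; the household counts, the second cross tabulation by age, and all of the names are invented for illustration.

```python
def project_rate(counts, rates):
    """Estimated rate at a grid point: category counts weighted by survey rates."""
    population = sum(counts.values())
    expected = sum(rates[category] * n for category, n in counts.items())
    return expected / population

# Uninsured rates by income, from the cross tabulation above
income_rates = {"under_30k": 0.221, "30k_plus": 0.082}
# Made-up counts aggregated to one grid point
income_counts = {"under_30k": 400, "30k_plus": 600}
by_income = project_rate(income_counts, income_rates)

# A second, entirely hypothetical cross tabulation by age
age_rates = {"18_34": 0.20, "35_plus": 0.10}
age_counts = {"18_34": 500, "35_plus": 500}
by_age = project_rate(age_counts, age_rates)

# The final projection is the average over the cross tabulations
final_projection = (by_income + by_age) / 2
print(round(by_income, 4), round(by_age, 4), round(final_projection, 4))
```

Repeat the calculation at every grid point and you have the column of values to be mapped.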
5) Map the Grid. To map the grid, import it into any software package that makes surface maps. Each has virtues and faults. Experience has taught us a few things. Surface maps are most appealing when they have large detailed surfaces with coordinated colors. The visual effect is part of the attraction. However, they must also have clear legends. A text box explaining the map is nearly a must. If your software will allow you to put landmarks on the map, that helps the average viewer a great deal.
Accuracy and Limitations
How good are these projections? We recently conducted a test by making a surface map of the percent of the population unemployed using survey data gathered in 1993, and comparing the results to the 1989 Census. The correlation between the projection and the actual rate on the grid was .95. The projection is shown below in Graphic 4. The actual rate is shown below in Graphic 5. Generally, there is good correspondence between the two maps.
Graphic 4.

Graphic 5.

However, there are important differences between the projection and the actual rate which illustrate the limitations of synthetic estimates as well as the importance of carefully selecting the variables to be used for projection. First, in the northwest corner of the maps is the city of Lansing with a high unemployment rate. In the projection, the region of high unemployment is continued further east into the city of East Lansing, where Michigan State University is located. The projection failed to account for the student status of the large number of young adults there (some 40,000). The projection also failed to capture the high unemployment rate in the town of Webberville, which was due to local economic conditions.
An important point, which is somewhat technical, but should influence how the projections are interpreted, is that the estimates necessarily have a smaller variance than would actually be found. That is, you will always underestimate the highest, and overestimate the lowest grid points. The scatter plot in Graphic 6 illustrates this. You can correct the variance slightly by multiplying the estimates by a factor that makes the value of another variable derived from them (say the average of the population in the buffers) correct. But you cannot make the problem go away.
Graphic 6.
Contact us...
September, 1997. Haslett, Michigan.

Learn about the people who built this site by visiting our home page.