Washington: Hide and Seek
The Census Bureau can count people who it can’t prove exist. One day it may not have to count at all
WASHINGTON


ON “S-NIGHT”—Tuesday, March 20, 1990—enumerators from the Bureau of the Census will attempt to count every homeless person in America. They will begin at shelters and cheap hotels at dusk and then, as the night wears on, move to bus stations, train stations, subway stations, all-night restaurants, and all-night movie theaters. They will ask each respondent if he or she has a home elsewhere. Only people who answer no will be interviewed.
After midnight the enumerators will go out on the streets in teams of two. When they see a sleeping person, they will estimate age, sex, and race by observation. When they meet someone who is awake and lucid, they will try to conduct a short interview but won’t ask for the person’s name. S-night enumerators (the S stands for street and shelter) will be given special training, safety tips, and a number to call in case of trouble, but they will not have police escorts. The Census Bureau is concerned about “the perception of police involvement in the census,” according to an internal memo.
A few days after S-night, on “T-night” (the T stands for transient), enumerators will hang a census form on the door of each room in some 70,000 hotels and motels. The vast majority of the people who fill out these forms will also fill out census forms at home, and their transient forms will eventually be thrown out. But the T-night forms will still catch many people who would otherwise be missed.
The Bureau of the Census has been trying for 200 years to fulfill perfectly its constitutional mandate—an “actual enumeration” of the United States population once every ten years. In 1990 the census will be conducted by a staff of more than 300,000, processed by the latest computers, and analyzed by some of the best statisticians in the world. But it will fail nonetheless to verify physically the existence of millions of people. Not surprisingly, the people most likely to be missed by the census are the homeless, the poor, the illiterate, illegal aliens, minorities, and residents of remote rural areas. Nonetheless, the bureau is making steady progress toward its ultimate goal. Bureau officials estimate that they missed only 1.4 percent of the actual U.S. population in 1980, down from 5.6 percent in 1940. Charles Jones, the associate director of the census, says that he hopes in 1990 to pin down the U.S. population with a 0.5 percent margin of error. But the last few million people will be especially hard to find—they may, in fact, never be found—and the assumption that they even exist rests on sophisticated methodologies of which the public is largely unaware.
How do you count people who don’t want to be counted, or people you just can’t find? The bureau’s staff has some powerful new tools. Next February, for example, the bureau will unveil its Topologically Integrated Geographic Encoding and Referencing System. TIGER is a computerized map of the United States, on a scale large enough to show all of the streets and most of the physical features of the American landscape. By cross-checking the map against the addresses on all census forms that have been received, the bureau can pinpoint every place that is unaccounted for. This new mapping system, and other computer tools, make it possible for the Census Bureau to establish the size and location of the U.S. population with astonishing precision. Theoretically, the bureau could one day be able to conduct a census without sending out forms or enumerators.
TO A CERTAIN EXTENT, the census even now requires an act of faith. Enumerators have always asked one person—ideally, the head of the household—to describe every other member of that household. Heads of household sometimes forget. A Census Bureau study in 1960 showed that they were not very likely to forget their wives and children, although they sometimes did, and rather more likely to forget their grandchildren and other relatives. They were most likely to forget their in-laws.
In recent years the amount of face-to-face contact between the bureau and its subjects has been diminishing. The intrepid enumerator with a clipboard who goes from door to door is rapidly disappearing. Since 1970 most census forms have been distributed and returned by postal carriers. This “mail out, mail back” procedure saves the bureau money and probably increases accuracy, because fewer errors occur when people fill out their own forms. In 1990 mail out, mail back will cover 85 percent of the population. Another 10 percent, in rural and hard-to-reach areas, will be handed forms by enumerators and will mail them back. If everything goes well, only five percent of the population—roughly 12.5 million people—will have to be interviewed in person by enumerators. Bureau officials will try to keep this number down. For the most part, enumerators are not full-time staff members or skilled demographers; they are retirees, college students, homemakers, and others who have the time and need some extra money. The task of finding and training hundreds of thousands of temporary (and not very well paid) employees during the census is one of the bureau’s biggest headaches.
To make everything go well, the bureau must compile a list of the addresses of all dwellings in the country, a list that will be accurate on Census Day—April 1, 1990. The bureau is doing this by combining the resources of the Postal Service with those of the direct-mail industry.
The mailing lists of most catalogues, magazines, and organizations are available for sale, at prices that range from $50 to hundreds of dollars per thousand names. In January of this year the bureau bought about 56 million addresses from private vendors. To perfect this list, a card for each address will be printed, sorted by Zip Code, and given to the local post office by the local Census Bureau office. Postal workers will take the cards and “deliver” them to the post office’s separating cases, which have small compartments for every local address. When the cards are all in place, some of the compartments will still be empty, because a lot of people, hard as it is to believe, are not on anyone’s mailing list. A postal worker will fill out cards for any missed addresses. Bureau employees will then walk the streets with the fortified address lists, to double-check. Later, local officials will triple-check. And postal carriers will review the lists and add still more addresses—a quadruple check. The process is highly effective. In 1980 pre-census measures added 6.4 million households, containing 16.4 million people, to the enumeration, at a cost of only $1.71 per added person.
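At bottom, the card-and-compartment check is a set comparison: addresses the post office delivers to but the purchased lists lack are exactly the "empty compartments." A minimal sketch, using invented addresses:

```python
# Sketch of the address-list check as a set comparison. The purchased
# mailing list plays the role of the printed cards; the post office's
# delivery points play the role of the separating cases.
# All addresses here are hypothetical.

purchased_list = {"12 ELM ST", "14 ELM ST", "3 OAK AVE"}
delivery_points = {"12 ELM ST", "14 ELM ST", "16 ELM ST", "3 OAK AVE"}

# "Empty compartments": addresses the post office serves that appear
# on no purchased list. A postal worker fills out a card for each.
missed = delivery_points - purchased_list

# The fortified list that bureau employees walk the streets with.
fortified_list = purchased_list | missed

print(sorted(missed))        # addresses added by the postal check
print(len(fortified_list))   # size of the combined list
```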
In 1990 the United States will contain about 106 million places of residence, which will be home to about 250 million people. Census forms will be delivered to 95 percent of these addresses on March 23, 1990. At that point the bureau will begin a battle against the human tendency to shove paperwork into a drawer and forget about it. On April 26 enumerators will visit households whose forms are missing. Beginning on June 26 they will visit households whose forms contained partial or contradictory information.
Eighty-three percent of U.S. households returned their census forms on time in 1980, but the rate varied widely by neighborhood. Follow-up operations in 1980 were expensive—they added a total of 2.6 million people, at an average cost of $27.97 per person. These measures included “casual counts” much like S-night, return visits (sometimes many of them) to missing households, and further checks against Postal Service and local government lists. The most expensive follow-up measure was an experimental program that matched census forms against driver’s-license registrations and other “administrative” lists. Because the bureau’s computers at the time were not capable of doing the actual matching, most of the work had to be done by hand. This program added 130,000 people, at a cost of $75.54 per person. The great expense was one reason why the bureau has decided not to check its data against administrative records in 1990. But as computing costs decline, and as other forms are standardized to contain data that can be checked against census forms, administrative-record checks will probably become more attractive. Denmark today conducts its decennial census solely through computer matching. It does not bother with a head count.
When all is said and done, however, there will still be unanswered questions. In some cases enumerators will know from asking neighbors that a building is inhabited, but they won’t be able to reach the inhabitants. In others, well-intentioned heads of household will make honest errors and omissions. And a lot of heads of household will simply refuse to divulge some kinds of information, such as income. The process that fills in these blanks is called imputation, which, in its most basic form, dates back to the 1940s.
In 1990 the imputation process will be a high feat of statistical computing. It would be relatively easy to get national or regional averages of age, income, and other characteristics, and simply slip those numbers into every known blank slot. But that would seriously compromise the overall reliability of the data and would produce wildly inaccurate portraits of areas where the undercount is high. Instead, the bureau will use a process called “hot-deck” imputation. Census Bureau computers scan through forms at rates of up to forty a minute. The forms come in sequence from the same general geographic area, and the bureau assumes that people who live near each other are similar. When the computer sees a blank slot, it automatically inserts the data from the previous questionnaire. Imputation added more than 3.5 million people to the count in 1980. In other words, 1.5 percent of the official 1980 count consisted of people who were merely assumed to exist. Because of imputation a congressional seat was taken away from Indiana and given to Florida.
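The "hot-deck" idea can be sketched in a few lines. This is a simplified illustration, not the bureau's actual program: records with a single invented "income" field arrive in geographic order, and a blank is filled from the most recent nearby respondent who answered.

```python
# A minimal sketch of hot-deck imputation. Forms come in geographic
# sequence, so a blank slot takes the value from the previous
# questionnaire that answered. All figures are invented.

def hot_deck(forms, field):
    """Fill blank values of `field` with the last non-blank value seen."""
    last_seen = None  # the "hot deck": most recent answered value
    filled = []
    for form in forms:
        value = form.get(field)
        if value is None and last_seen is not None:
            form = {**form, field: last_seen, "imputed": True}
        elif value is not None:
            last_seen = value
        filled.append(form)
    return filled

block = [
    {"income": 21000},
    {"income": None},   # refused to answer; takes the neighbor's 21000
    {"income": 18500},
    {"income": None},   # takes 18500
]
filled = hot_deck(block, "income")
print([f["income"] for f in filled])  # [21000, 21000, 18500, 18500]
```

Note the built-in assumption the article describes: neighbors are taken to resemble one another, which is why imputed values track the local area rather than a national average.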
THE CENSUS BUREAU thinks that in 1980, despite imputation, it may have failed to count as many as 3,171,000 people, or 1.4 percent of the U.S. population. The roughness or precision of an estimate like that one lies at the heart of a debate that has been going on in Congress over census adjustment.
The undercount estimate for 1980 was the product of demographic analysis, a statistical technique pioneered in the 1950s by Ansley J. Coale, a demographer at Princeton University. One of the basic rules of demography is that there are only three ways for a population’s size to change: birth, death, and emigration or immigration. In the 1930s it became possible for the first time to procure reliable statistics of all three kinds for the whole United States. Demographers use these data for different population groups to create a model that “ages” the U.S. population as a whole from one census to the next. The difference between the demographic projection and the actual enumeration provides an estimate of the undercount or overcount. It provides an estimate, to use the term of art, of the “error of closure.”
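The arithmetic behind demographic analysis is the so-called balancing equation. A sketch with invented figures (in thousands), not actual census data:

```python
# The demographic balancing equation: a population changes only
# through births, deaths, and migration. All numbers are invented
# for illustration.

def project(base_count, births, deaths, net_migration):
    """'Age' a population from one census to the next."""
    return base_count + births - deaths + net_migration

# Hypothetical decade for one population group (in thousands).
projected = project(base_count=226_500, births=38_000,
                    deaths=20_000, net_migration=6_000)
enumerated = 247_300  # what the census actually counted

# The "error of closure": projection minus enumeration.
error_of_closure = projected - enumerated
print(projected, error_of_closure)  # 250500 3200
```

A positive error of closure, as here, suggests an undercount: the bookkeeping says more people should exist than the enumerators found.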
The likelihood of undercounting matters deeply to many communities, because, as is well known, billions of dollars from Washington are distributed according to formulas that derive from census data. In 1980 some three dozen cities separately sued the federal government for alleged census undercounting. They demanded that the estimated undercount somehow be worked back into the official total, so that they could get their “rightful” share of money.
There had been a long debate on this issue among statisticians prior to 1980, and in the end the Census Bureau decided that demographic analysis was simply too blunt an instrument and that an adjustment could conceivably put more error into the count than it would remove. The bureau decided not to adjust the data, and it finally won the last of the lawsuits last spring. But after 1980 a small group of bureau officials and other statisticians devoted themselves to a search for an adjustment procedure that would in fact reduce error. Last year, they believe, they found it.
The model they devised is quite complex, and a beautiful example of the statistician’s art. It would depend on a large post-census survey of about 300,000 households, which would generate data to compare with census data. This would be the largest single household survey ever done in the United States except for the census itself. According to Barbara Bailar, the chief of the bureau team that came up with the model, 300,000 households is the sample size statistically necessary to prevent the level of error from exceeding one percent—the level that the bureau considers tolerable. The post-enumeration survey would be based on a sample of thousands of “blocks,” each consisting of a highly specific population segment, such as blacks in Los Angeles. The blocks would then be grouped into a smaller number of demographically similar “strata,” and a sample of blocks that reflected the overall characteristics of each stratum would be chosen. After the regular census ended, the bureau would go to the sample blocks and conduct a second, more intensive enumeration. When the forms from the second survey were ready, they would be matched to the forms from the original census. The difference in the results would indicate the size of the undercount or overcount. For example, if the post-enumeration survey showed that there were 18 percent more blacks in inner cities than were counted in the census, then the original count of black inner-city men would be increased by 18 percent.
Comparing two enumerations to get a more accurate estimate is called the capture-recapture method. It is the same method that wildlife biologists use to estimate the number of fish in a pond or deer in a forest. When the subject is wildlife, it is assumed that every member of the population has an equal chance of being tagged each time; this assumption is known as the assumption of independence. The population necessarily falls into four groups: those tagged both times, those tagged neither time, those tagged the first time but not the second, and those tagged the second time but not the first. If the assumption of independence is valid, then the percentage of the population tagged the first time that is also tagged the second time is, logically, the same as the percentage of the entire population that is tagged the second time. Estimating the total population thus becomes a simple matter.
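The "simple matter" the paragraph ends on is the Lincoln-Petersen estimate familiar to wildlife biologists. A sketch with invented tagging numbers:

```python
# Capture-recapture (Lincoln-Petersen) estimation. Under the
# assumption of independence, the share of first-pass animals seen
# again on the second pass equals the share of the whole population
# caught on the second pass, so N is approximately n1 * n2 / m.

def lincoln_petersen(n1, n2, m):
    """Estimate total population from two tagging passes.

    n1: number tagged on the first pass
    n2: number tagged on the second pass
    m:  number tagged on both passes
    """
    return n1 * n2 / m

# Invented example: 500 fish tagged first, 400 caught later,
# 50 of the second catch already carrying tags.
print(lincoln_petersen(500, 400, 50))  # 4000.0
```

In the census version, the first "tagging" is the enumeration itself and the second is the post-enumeration survey; the mismatch between the two yields the adjustment factor for each stratum.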
Of course, for human beings the assumption of independence is not entirely valid. People can try not to get tagged. But most American statisticians are convinced that this potential problem is not serious enough to threaten the validity of the adjustment procedure devised by Bailar and her colleagues.
John Keane, the director of the Census Bureau, is not among their number, however. Neither is Robert Ortner, his superior in the Commerce Department. The census will not be adjusted in 1990. In announcing the decision, Ortner cited a continuing disagreement among statisticians, the great cost in money and time of the 300,000-household survey, and possible public skepticism about including people in the census whom no one has either seen or heard from. “We don’t play with the numbers,” Ortner said. (He did not mention imputation.)
In the ensuing controversy Bailar quit, black and urban groups protested, and there was widespread speculation that Ortner, a Republican, had forced a political decision on the bureau. An adjustment would mean more federal aid for cities and in general would tend to help the Democratic Party. Congressman Mervyn Dymally, a Democrat from California, is promoting legislation that would force the Census Bureau to adjust the 1990 count, but his bill is likely to fail. If an adjustment were to be made in 1990, one result would be the removal of a congressional district from Pennsylvania and the creation of a new one in Arizona.
DESPITE THE adjustment controversy, Keane has pledged that research into alternative methods of enumeration will continue, and he has been true to his word. Among other things, the bureau recently launched a program designed to figure out why so many blacks, Hispanics, Asians, American Indians, illegal aliens, and homeless people try to avoid the enumerator. The results so far show that these people believe that census information will be used to hurt them, that it is shared freely with other agencies like the Internal Revenue Service and the Immigration and Naturalization Service, and that it is used to hunt people down. (In fact the bureau takes elaborate measures to keep individual census forms confidential. Every number it releases to the public is an average, or is anonymous, and the smallest geographic area for which it will release data is a city block. Anyone who releases information about an individual sooner than seventy-two years after the enumeration was made has committed a federal offense.) The Census Bureau is toying with the idea of permitting people in certain “hard-to-enumerate” areas—such as inner cities, public-housing projects, and wilderness areas—to fill out census forms without providing their names.
The biggest improvement planned for 1990 is the installation of TIGER, the computerized street map of the United States. In every past census enumerators have used hand-drawn paper maps to find addresses. When they needed to make a correction, they marked it on the map. In 1980 the clutter of handwriting on more than 300,000 paper maps caused a lot of mistakes, and housing units near the edges of maps were often omitted, counted twice, or counted in the wrong area.

After the 1980 count the Census Bureau’s geography division, in cooperation with the U.S. Geological Survey, began the gargantuan task of entering all USGS 1:100,000 series maps into a computer. An automated scanning machine “read” each feature represented on each map, assigned it geographic coordinates, and added it to TIGER’s master file. Computers use these coordinates to reconstruct the maps and display them graphically on a screen as a series of lines. The great advantage of this method is that any feature on the map can be changed instantly and simply, in the computer.
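The scanned features amount to named polylines: a label plus a list of coordinates that the computer redraws as line segments. A toy sketch with an invented street:

```python
# A minimal sketch of a map feature stored as coordinates rather than
# ink: a name plus a list of (latitude, longitude) points. The street
# name and coordinates are invented.

street = {
    "name": "MAIN ST",
    "points": [(38.8951, -77.0364), (38.8957, -77.0360), (38.8963, -77.0355)],
}

# The advantage over paper maps: a correction is one in-place change,
# not handwriting crowded into a margin.
street["points"][1] = (38.8956, -77.0361)

# To display the feature, the computer draws a line between each
# consecutive pair of points.
segments = list(zip(street["points"], street["points"][1:]))
print(len(segments))  # 2 line segments to draw
```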
The TIGER file now contains about 50 to 70 gigabytes (one gigabyte represents a billion units of information). It is arranged into subfiles, each of which contains a different physical feature, such as waterways, political boundaries, rail lines, and streets. When the file is revised after the census, it will probably grow even larger and will certainly become more accurate.
So far the data in TIGER has not been linked to actual census data. In other words, it is not yet possible to call up a map of any road in America, hit a computer key, and instantly see the demographic characteristics of its residents. Soon, however, that may be possible. “We’re just starting out,” explains Robert Hammond, of the bureau’s 21st Century Decennial Census Planning Staff. “We figured that we’d better build a Chevy first.”
TIGER is also an invaluable gift to American businesses that use direct-mail campaigns, home-delivery routes, and other forms of marketing that need a high degree of precision. The data companies that serve these businesses are already working on software that will make TIGER easier to use. “It cost about as much as a typical weapons system,” says Donald Cooke, the president of Geographic Data Technology, in Lyme, New Hampshire. “But it will be far more valuable to businesses than the census itself.”
Bruce Johnson, the head of the 21st Century Decennial Census Planning Staff, is thinking past 1990 to the year 2000 and beyond. He thinks about sending enumerators out into the field with hand-held computers instead of clipboards, and he tries to imagine what a “paperless” census would be like. He thinks about installing TIGER in thousands of dashboard computers. And he wonders whether hand-held devices linked to satellites can instantly pinpoint a user’s latitude and longitude, and whether that could have applications for enumeration in rural areas.
Technological advances might even replace the enumeration one day. Some statisticians have proposed using supercomputers to establish a “megamatch” of federal statistical data sources. Com-