U.S. Bureau of the Census Monograph D R A F T Economic Census Staff Washington, D.C. 20233 USING EXTRACT WITH STF 1A (EXTutor 4) July 16, 1992 ====================================================================== CONTACT: Paul Zeisset or Bob Marske (301) 763-1792 ABSTRACT: The following was designed as a self-directed tutorial in EXTRACT usage. It may also be used as a training presentation. The tutorial draws all of its examples from STF 1A, but once the skills covered are learned, the user should have immediate facility with any of the other major STFs, including STF 1C and STF 3A. This tutorial is in five parts: 4A--Basic skills: simple item and record selection, display options, printing results 4B--Increasing your productivity in item and record selection, creating "user-defined items", extracting results to a comma- delimited file, and importing that file into Lotus 1-2-3. 4C--Adding labels, more about record selection and user- defined items, extracting data to a DBF file and reusing it in EXTRACT. 4D--Advanced topics: setting up an EXTRACT menu, selecting all components of a county subdivision, tract, or Indian reservation. 4E--Advanced topics: linking or merging files beyond what is possible by adding labels. Some of the material duplicates skills taught in EXTutor1, 2, and 3. TIME REQUIRED: about 30 minutes per part. ===================================================================== Why Use EXTRACT? Each 1990 Census CD-ROM comes with a basic data display program run with the command "GO". This software does a nice job of displaying any STF table, one area at a time. There is also a feature on presenting a "general profile". For STF 1, the general profile includes 89 key items selected or derived from the 982 population and housing data fields carried in the 10 STF 1A databases. 1 Any table, or the general profile, may be printed to your printer or to a file, one area at a time. Because there is a separate GO program for each of the STFs, the software insulates the user from much of the complexity of the STF files--data stored in multiple files, with cryptic variable names, and with complex geographic hierarchies. As such, GO is highly appropriate for library patrons or other new users. EXTRACT, on the other hand, requires you to deal with more of the complexity of the data set, but gives you much more power and flexibility in working with the data, including the following capabilities: - To deal effectively with classes of geographic areas, such as all places within a county or all tracts with 1 or more person of Hispanic origin. - To work with data items from more than one table at a time. - To view narrative concept definitions so you understand what you are working with. - To create "user-defined items", such as population per square mile, percent white, or a count of persons under 15. - To merge these data with data from other sources, such as the economic or agriculture censuses. - To copy to a file whatever subset of data you select, so that you can import the data into other applications, such as Lotus 1-2-3(tm) or Harvard Graphics(tm). - To work these data using a tool you have learned how to use with other Census Bureau data sets. Before you begin This tutorial assumes that you have installed EXTRACT 1.3 or 1.4, that you have created a separate subdirectory for temporary files, e.g. \EXTRACT\WORK, and that you have made the directory containing EXTRACT the default. An optional part of tutorial 4B assumes that you have Lotus 1-2-3 available. Instructions for keyboard entry are italicized in printed versions of this tutorial, or are shown enclosed in {braces} when distributed as an ASCII text file. User keyboard entries are shown in all capital letters, and special characters are enclosed in <>, for example, for the escape key. 2 Tutorial 4A-- STARTING THE PROGRAM 1. {Type:} EXTRACT , {then press} {again after viewing the opening screen.} 2. DRIVE SELECTION The system prompts you for the drive designation for your CD-ROM, and for a location where it can store temporary files. {Enter the appropriate drive letter for your CD-ROM, e.g., type:} L: , {or whatever is appropriate for your system.} EXTRACT then asks for two locations on your hard disk--a directory for workspace and a directory for auxiliary files. {Type:} C:\EXTRACT\WORK {for work space--assuming c:\extract\work already exists.} {Type:} C:\EXTRACT\1990AUX {for auxiliary files.} We have the option to save these parameters for use next time. {Type:} S. {[If at this point the program presents you with a choice of catalog files, leave the cursor on MASTER or MSTRST1A and press} . {Multiple master catalogs are discussed later under tutorial 4D.]} The system asks you to specify a name for these parameters. When we work with CD-ROMs other than STF 1A we will use different parameters, and save them under different names. {Type:} EXSTF1A . 3 3. SELECT A CATALOG The system then presents you with a menu of all of the types of files--or "catalogs"--you have to choose from on this disc. There are ten types of files within STF 1A, each covering a different group of tables, or, in the case of age data, parts of tables. You may select only one as a starting point. [In tutorial 4C and 4D you will learn how to link data from two or three catalogs into a single display.] File Matrixes Subjects covered STF1A0 P1 - P10 Geographic Identifiers, Sex, Race, Hispanic Origin STF1A1 P11 - P12(p2) Age, Total and White STF1A2 P12(p3-p5) Age, Black & American Indian, Eskimo, Aleut (male) STF1A3 P12(p6-p8) Age, American Indian .(female) and Asian or Pacific STF1A4 P12(p9)-P13(p1) Age, Other Race and Hispanic Origin (male) STF1A5 P13(p2) - P19 Age, Hispanic Origin(female), Household Rel. & Type STF1A6 P20 - P35 Household Relationship & Type (cont.), Imputations STF1A7 P36, H1 - H20 Housing: Vacancy, Tenure, Age of Hhr, Rooms,Persons STF1A8 H21 - H40 Persons Per Room, Value, Rent, Duration of Vacancy STF1A9 H41 - H55 Units in Structure, Housing Imputations {With the cursor (highlighted bar) on STF1A0__, press} . 4. HELP SCREEN The help screen associated with this catalog of files appears. It has two parts: a. A general description of the file, including some notes on the sequence of the data. b. A brief description of how three of the options on the main menu that follows apply to this particular file. You may pull up this help screen at any time from the main menu by pressing elp or . {Press} {to continue.} 5. SELECT A FILE Now the system presents a list of all of the files within the STF1A0 catalog--which correspond to all of the states on this disc. {Move the highlighted bar (press the down arrow) down to the state of your choice, and press} . At this point, the program takes a few moments to read in the file's data dictionary and get itself set up, and (after about 25 seconds) displays the main menu. 4 5. MAIN MENU The main menu is the control center from which you will operate the rest of the program. At the lower right is the name of the file you are working with. The main menu lists a series of options. Most sessions involve selecting items and records (options 1 and 2), but let's jump ahead to Display the file to the screen, as is. {Type:} 6 6. DISPLAY TO SCREEN You are now looking at a large "spreadsheet". You can scroll down and to the right to see more data. Features: At the top are cryptic variable names. In STF 1 files most of these variable names are not really self-explanatory. At the bottom is a more complete description of the one item that the cursor is highlighting. You can scroll the cursor from side to side to see different descriptions. {Press right arrow key repeatedly.} This is the columnar display, where every line represents a separate geographic area. 7. TOGGLE TO ROWWISE DISPLAY {Press:} T {to oggle to "rowwise" display.} There is another display mode that shows us only one area at a time. Until we become more familiar with the contents of the file, this is a friendlier way to browse through the data. Data values are shown at the left, and a complete description of the item is shown at the right. The descriptive text comes from a separate file called the "data dictionary". {Press} {twice.} There are so many geographic codes in this file, that it is not until the third screen that we see the name for the geographic area, and then the beginning of the population counts we were expecting to see. {Press} {to see more data values, then again, and again to see the last of the record. Pressing} {again brings you to the beginning of the next geographic area.} Even with the rowwise display, this is more information than you want to deal with all at once, so let's create a more selective display. {Press:} {to return to the main menu.} 5 8. SELECT ITEMS {Type:} 1 In order to select items, the program lists variable names, the ones we saw as column headings in the first "columnar" display screen, along with the more complete descriptors we saw in the second "rowwise" display. We can select the variables we want -- by marking each with an X. For reasons that will be evident later, SUMLEV is very important. {Mark an} X {by SUMLEV and GEOCOMP (third and fourth from the top), then press} {four times and place an} X {by ANPSADPI,} {again to mark P0010001, again to mark P0060002, and three more times to mark P0080001.} {Press} {to return to the main menu.} 9. DISPLAY TO SCREEN {Type:} 6 Now we have all of the data items we want on one screen, and, in fact, we find that the program can now fit two full areas on a single screen. Let's say that we are interested in comparing these statistics across a number of areas at one time, so for that purpose the original columnar display is more efficient. {Press:} T {to oggle back to columnar display.} Initially, we see only the columns for SUMLEV, GEOCOMP, and AREALAND. This is because the area name field, ANPSADPI, is so wide (66 characters) that little else can share the screen width it. {Move the cursor right to see the area name and other items.} 10. ADJUSTING COLUMN WIDTH In columnar mode, we can adjust the display width of each column. {Move the cursor to the column containing ANPSADPI.} {Type:} W {for idth} {Enter} 15 {as the new width. Press} Just because you have narrowed a text column doesn't mean you can't look at the hidden text if you need to. {Move the cursor to highlight an area name that appears truncated, and type} S {for how.} A box appears that displays the entire text, temporarily covering over the text of adjacent lines. {To return to normal display, move the cursor off of this cell, in any direction.} 6 11. DISPLAYING DEFINITIONS {[If you are using a version of EXTRACT dated prior to October 1991, skip this section--STF 1 definitions are not accessible.]} We can see that the first several records are summaries for the state, with different populations, and different values for the code GEOCOMP--00 then codes in the 40's. Here is a case where even the description at the bottom of the screen doesn't tell us everything we need to know. Fortunately, most concepts in the STF 1 files have full definitions available, which we can refer to by simply pressing D. {Press} D. This definition is in the form of a code list. By paging down we find explanations for categories 40 to 44. These summaries are associated with various types of lands of American Indians and Alaska Natives. {Press} {to return to data display screen.} By the same token, we can look at definitions of subject concepts. {Cursor to the right to highlight the P0080001 column, and press} D. This definition is the same as is presented in the back of printed reports, or in the CD-ROM technical documentation. EXTRACT encourages you to confirm that you know what the census concepts mean by making it easy to consult the definitions while you are looking at the data. [The efinition function also works from within the Select Items screen and from the main menu.] {Press} {to return to data display screen.} 12. Before leaving this screen, note the geographic sequence of the file. After the state component records comes the first county, followed by summaries for its first county subdivision, then the first place within that, the first tract or block numbering area within that place, and finally a record for each block group fitting within that hierarchy. If one of these latter areas crosses the boundaries of an area above it, the summary given is only for that part within the given hierarchy, and (pt.) is added to each area name. Since the names associated with each area do not always make clear what geographic level you are looking at, you can keep track of where you are with the SUMLEV variable. If you need definitions of SUMLEV codes, moving the cursor to the SUMLEV column and typing D will give you SUMLEV's efinition. Let's see if we can display data for counties statewide but not for any of the smaller areas. {Press} {to return to the main menu.} 7 13. At the main menu, nothing comes right out and tells us how to select data for all counties. If we are not sure how to proceed, we can use the elp function. {Press the key or type} H {for elp.} This is the same screen we saw after selecting a catalog. The bottom of the elp screen features comments about how three EXTRACT options relate to this particular type of file. It tells us that while Select Items lets us specify particular data variables, Select Records allows us to specify particular geographic areas. [It also mentions adding labels, which we will refer to in tutorial 4C.] In terms of the data display we used a moment ago, selecting items is a matter of selecting columns, and selecting records is a matter of selecting rows. The main menu does not use columns and rows terminology, though, because we have already seen that there is a display mode that turns columns into rows. Since we want to select on a type of geographic area, we want to Select Records. {Press} {to return to the main menu.} 14. SELECT RECORDS {Type:} 2 The first record selection screen looks a lot like the item selection screen we used before. That is because we can use any variable in the file to govern the record selection process. Some selection variables work better than others, though, and we most often try to use variables marked with an asterisk (*). We want to select records for all counties. If you once, you see that there is a county code (CNTY), but that is not what we want because that code merely helps us pick one or more particular counties. To get counties in general, we need to use the SUMLEV variable. {Press} , {cursor to SUMLEV, and type:} S. Note that we use S to select variables at this screen rather than the X called for when we were selecting items and could mark lots of variables. {Press} {to continue.} Now a menu appears listing all possible values of SUMLEV. The second entry is 050--State-County, and those are the summaries we want. [If code 050 is not visible, cursor up to it.] {Cursor to 050, and type:} X. {Press} {to continue.} The system now spends a little time searching for the first qualifying record, then returns you to the main menu. 8 15. {Type} 6 {to display to screen.} Because the system has to look for just certain classes of records, it requires more time to fill the screen. But in a few moments, we have the statistics we selected for all counties in the state. A new subheading on the second line of the screen advises us that these data are sorted by SUMLEV+PLACEFP, and that we can press N for natural order. Since we have the display we want, there is no point in experimenting with natural order. So we can ignore that message for the moment. {After examining the data, press} {to return to the main menu.} 16. Let's try out other ways of selecting records. Suppose we want to display all census tracts within one county. {Type:} 2 {to select records.} {Mark an} S {next to SUMLEV and another} S {next to CNTY, and press} . {Mark an} X {by SUMLEV code 140, and press} {Mark an} X {by a particular county, and press} . {Type:} 6 {to display to screen.} {After examining the data, press} {to return to the main menu.} 17. Let's select all block groups within [a particular place]. {Type:} 2 {to select records.} {Put an S next to SUMLEV,} {twice and cursor to PLACEFP, and type another} S. Note that there are two place codes to choose from. We select the FIPS place code PLACEFP because it has an asterisk. The index that will help the system find [place] fast uses the FIPS code and not the Census code. {Press} . You notice that there are two summary levels featuring block groups, 091 and 150. What is the difference? SUMLEV 150 observes the simplest hierarchy--State, county, tract/BNA, and block group. SUMLEV 091 inserts county subdivisions and places in the hierarchy, which is relevant at the moment because we want to select block groups within a place. {Mark an} X {by SUMLEV code 091, and press} . {Mark an} X {by [a particular place], and press} . {Type:} 6 {to display to screen.} 9 It is obvious that the area name alone is not enough to uniquely identify each item in this query. You also need to display tract codes to identify block groups. {Press} {to return to the main menu.} 18. {Type:} 1 {to select items.} To add another variable to our display, it is simply a matter of adding another X. {once and mark TRACTBNA with} X. {Press} {to return to the main menu.} {Type:} 6 {to display to screen.} {Press} {to return to the main menu.} 19. FORMAT OPTIONS Our next step is to prepare these data for printing. The program gives us a few format options. {Type:} 5 One of the options is to specify your own heading. The program defaults start us off with a heading taken from the data dictionary, but by the time we have gone through item and record selection, a much more specific heading may be appropriate. {Type:} 3 {(heading) then enter an appropriate heading, for example,} Blacks and Hispanics by Block Group in [place]...10/1/91 PTZ It is frequently handy to add today's date or your initials to the heading, particularly if you are going to save your output for future use. {Press:} {to return to the main menu.} SENDING OUTPUT TO A PRINTER {[Optional; skip if you are not connected to a printer.]} You may want to print the results of your work. EXTRACT's defaults will print out an 80-character display to any printer. You can print out in either columnar or rowwise format, depending on which way you last toggled the display. 1. {Type:} 7 {to print.} If you have specified too many items to fit on one line in a columnar display, the program will "wrap" the overflow items onto the next line(s). If that is not what you want, press to stop the printout, and go to the Select Items menu to reduce the number of items selected or to change the width of certain columns. 10 2. If your printer can accommodate more columns, you can tell EXTRACT how wide to make the page through the Format Options. {Type:} 5 {Type:} 5 {Type the number of characters per line (e.g., 136) and press} You also can use format options to specify a different left-hand margin or page length. EXTRACT cannot send codes to your printer for compressed print or to change fonts or styles. You will need to set that up before you start EXTRACT, or use hardware switches if your printer has them. {Press} {to return to the main menu.} 3. {Type:} 7 {to print your output with the new options.} When your printout is finished, you will notice that at the bottom is a list of all data items along with their complete descriptions. 4. If is also possible to create a printout in rowwise format (set by oggling in the data display screen). The result is a profile format especially useful for displaying data for one area at a time. 5. {When the system returns you to the main menu, you type:} Q {to uit.} 11 Tutorial 4B Having saved our opening parameters into the file EXSTF1A.bat, we can skip past the opening drive selection screen. {Type:} EXSTF1A Before selecting a catalog, notice the prompt at the bottom of the screen that you may estrict the entire session to a single State. This option is of little value for many Census CD-ROMs, but is useful on STF 1A. {Press} R {to estrict the session and enter the State abbreviation of your choice, e.g.,} VA . {Press} {to select the STF1A0__ catalog.} SELECT ITEMS--SPECIAL FEATURES 1. {Type} 1 {Put an} X {next to SUMLEV.} In our previous item selections, we used and to move through the long list of data items. That may be fine for getting to know the file, but there are more efficient ways of getting through the list if we know what we are looking for. One way we can find out about these shortcuts is to press the help key or . {Press} . 2. HELP SCREEN EXTRACT's help system is context sensitive, that is, it gives you help specific to where you are in the program. This screen gives us three suggestions for how to move quickly around the list: J will let you ump to a specific item name, if you already know it, and the printout from exercise 4A gives you a key to some of the item names. L is for ocate, and it finds a particular character string at the beginning of the item description. W is for ord search, which looks anywhere in the description for the character string you enter. {Press} {to return to the select-items menu.} 3. Last time we skipped through a number of screens to find the area name field ANPSADPI. This time let's ump to it directly. {Press} J, {and type:} ANPSADPI {in the box at the bottom of the screen. Enter} X, {then cursor down to P0010001 and put an} X {next to it.} 12 We also want to include the count of black persons. Let's say we don't remember its field name, but we can try to ocate it based on its text description. {Press} L {and type:} black , {then put an} X {next to it.} Note that ocate was not sensitive to upper and lower case. The difference between ump and ocate is that ump lets you search in the field name column, and ocate lets you search the description column. Both look for a match only at the beginning of the appropriate column. In contrast, the ord search option looks anywhere in the description field. Let's say that we are looking for the variable identifying land area, but are not sure how it is worded. {Press} W {and type:} area . This is obviously the end of the list, but we can to see a number of descriptions with the word area in them. {Cursor to AREALAND, and put an} X {next to it.} Notice that area is presented in thousandths of a square kilometer. Let's see if the definition can help us figure out how to convert those to square miles. {Press} D. Note the last sentence of the first paragraph, telling us to divide the figures presented by 2590 to get square miles. {Press} . At the moment the screen still is filtered to only show items with the letters a-r-e-a in the description. To return to the full list, do a blank ord search. {Press} W {and press} {without entering any search string.} 4. USER-DEFINED ITEMS One of the really interesting ways EXTRACT augments your displays is by computing "User-Defined Items". This in effect gives you some limited computational capabilities. {Type:} U {for ser-defined item.} You can create two types of user-defined items--Ratios and Free-form expressions. Ratios, such as percents, are used most often. You will need to enter exact variable names as they appear in the database for the numerator and denominator. If you do not know the exact names, you may return to the regular Select Items mode {(Press twice) and cursor down until you see the item you want, then Type: U {again to return to the ser-defined items mode, and type} 1 {to select the first item}. To compute population per square mile-- {Type:} P0010001 {for the numerator;} {Type:} AREALAND {for the denominator;} 13 {Type:} 2590 {for the scaling factor.} The definition for land area told us that a thousandth of a square kilometer is 1/2590 of a square mile, so the inverse of that compensates for having the denominator in thousandths of a square mile. {Press} {to get past the Free-form Expression field.} Be sure to leave the "freeform expression" blank if you are computing a ratio, and leave the numerator and denominator fields blank if you are entering a freeform expression. Once past the basic specification, the system asks how to present the new item. {Type:} 7 {for the field length;} {Type:} 1 {for the number of decimals;} {Type:} PopSqMi {for the field name; and} {Type:} Population per square mile {for the title.} In this screen, the key moved you from one field to the next. If you need to back up, use the key. To skip ahead, use the key. If doesn't let you get to the fields you want, for example, if you need to respecify the numerator, try . {Press} {after completing the title.} 5. EXTRACT will let you enter up to 10 user-defined items, either now or later. Let's say we want to percent Black in the population as well. {Press} 2 {to define a second item.} {Type:} P0060002 {for the numerator;} {Type:} P0010001 {for the denominator;} {Type:} 100 {for the scaling factor, since you normally multiply a proportion by 100 to get a percent;} {Press} {to get past the Free-form Expression field;} {Type:} 5 {for the field length;} {Type:} 1 {for the number of decimals;} {Type:} BLACK {for the field name; and} {Type:} Percent Black of total population {for the title.} {Press} {after completing the title, and} {to return to the regular select items menu.} 14 6. PREVIEW MODE At this point you may wonder whether all of these new items are going to fit on the screen. We could go back to the main menu and display a full screen of data, but EXTRACT gives us a shortcut from within the Select Items menu. {Type:} P {for

review mode.} In the preview mode you can cursor to the right to see more fields, just as in the full screen display mode. You can even cursor down to see the next record(s) in the database. If you do not see all of the items you want on the screen, cursor to the right, and adjust the width of any item that looks like it could get by with less space. {Cursor right to ANPSADPI, press} W {and type: } 15 . Now it takes only a moment to adjust the width of any or all columns with the W option. The same idth option exercised from the regular display screen may take considerably longer, as the system repaints the full 17-line data screen with every width change. 7. You may find that you also want to alter the display by adding another item or taking one from the display. {Type:} R {to eselect data items.} Mark X by any additional items you want to include in the display, or press to un-select items. [The preview displayed at the top of the screen will not be updated until you press

again.] {Press} {to return to the main menu.} SELECTING RECORDS FOR PLACES WITH 10,000 OR MORE POPULATION 8. {Type:} 2 {to Select Records.} It is possible to select records using not only the various geographic code variables but also any data value in the current file. {Cursor to SUMLEV, type} S, {then} {several times until you see P0010001, and place an} S {next to it as well. Press} {when finished.} {In the next screen, put an} X {next to 160--State-Place, and press} . When we use as a selection criterion an item that does not have a "*" or "-" next to it on the first record selection screen (e.g., P0010001), the system cannot give us a menu of all possible values, so instead it prompts us to enter the minimum and maximum values for a range. The minimum population value we want to include is 10,000, which we enter with no commas. We will not enter a maximum, to leave the interval open ended. {Type:} 10000 {for the minimum, and press} again to skip the maximum.} 15 9. {Type:} 6 {to display the data to the screen.} This same technique allows you to select records based on any data value in the current file, so that you could, for example, select all block groups with 1 or more Black persons. {Press} 10. EXTRACTING THE DATA TO A FILE At this point, we could again print out our results with option 7. Instead, let's prepare for some real work. Let's say we want to extract the data to a file that Lotus 1-2-3 can read in, so that we can arrange the counties in rank order of population, and prepare a graph. {Type:} 8 EXTRACT allows us to copy our extracted data set into any of four formats: 1 - dbf We can create another dBASE file, just smaller than the original. 2 - prn A comma-delimited file is the kind you want for importing to Lotus 1-2-3. 3 - sdf A fixed-format file looks more like a columnar report. 4 - txt A print file, with the same formatting options as printed output. [Option 4 is available starting with EXTRACT 1.3e, issued February 1992 on Economic Censuses CD-ROM 2b and elsewhere.] In addition, there is a Dry Run option that counts up all of the records selected and projects the size of the output file without actually doing the extraction. {Type:} 5 {to execute a dry run.} Doing a dry run doesn't save any time, unless you find out that the file you were about to create would have been too large for your hard disk or floppy, leading you to reduce the number of items or make the selection of records more narrow. A dry run does give you a count of records, and in this case the system gives us a count of places with 10,000 or more inhabitants. Similarly, you can answer such questions as "How many block groups are there in my county?" or "How many places in the State have more than 100 Hispanic residents. Now, let's go ahead and extract the data for the county already selected into a file for import into Lotus 1-2-3. {Type:} 2 Let's let the system add headings for us. {Type:} Y The system prompts us to specify a name for our output file and a drive and directory location on our hard disk or floppy. 16 {If you have Lotus 1-2-3, specify a filename in the subdirectory where that program normally looks for data files. For example, type:} c:\123\demo1 . If you do not have 1-2-3, specify any valid filename. The system then examines the records in the data base for ones that meet the selection criteria we specified, then copies off the data fields that we selected. This selection was very simple, but some data extractions can take quite a long time. {When the "completed" message appears, press} {to return to the main menu.} At this point, we could look at the file again, perhaps changing the items displayed, selecting a different set of records, or modifying formats. Instead, let's exit the program and see what we accomplished with the file we just created. {Type:} Q {to quit.} LOOKING AT AN EXTRACTED FILE {[This step is optional if you have Lotus 1-2-3 and can view the file in Lotus. If you have available the shareware program LIST or a text editor, you may prefer to use that program rather than TYPE in the following command.]} {Type:} TYPE C:\123\DEMO1.PRN The data have scrolled by quickly, but what is left on the screen shows us what we mean by "comma-delimited format": Alphabetic variables and codes are enclosed in quotes. Numbers have no leading zeros or spaces. Commas have been inserted between each field. Because we told EXTRACT to give us headers and footers, the file includes a complete list and description of each of the variables (columns) at the bottom of the file. We can see how useful that format is by loading the data into Lotus 1-2-3. LOAD EXTRACTED DATA INTO LOTUS 1-2-3 1. Start Lotus 1-2-3 in the way you normally do to bring up a blank spreadsheet, for example, type: 123 2. To import a comma-delimited file, type: /F(ile) I(mport) N(umbers). Then we select our file from the menu presented. {Cursor to highlight DEMO1, and press} . {If DEMO1 is not one of the options on the menu, change the default directory with /F(ile) D(irectory) to the one where DEMO1 is located, then try /I(mport) /N(umbers) again.} 17 This easy procedure loads our extracted data into spreadsheet form. Note that this procedure has preserved not only column headings, but also the master heading we specified in EXTRACT. 3. You may want to adjust the column width for the area name by cursoring to that column and entering: /W(orksheet) W(idth) B [for column B] 15. 4. At the bottom of the display ({press} ), more complete descriptions are associated with the cryptic column headings. 5. Let's say that we want to rank order these data on total population. {Press} {to return to the top, then cursor down to cell A4.} {Type:} /D(ata) S(ort) D(ata-Range) .(period) {Type:} P(rimary sort key) F1 (or whatever cell is at the top of the population column) D(escending sort) {Type:} G(o) Now we have sorted all areas in decreasing order of population. 6. Of course, you will have lots of other things you want to do inside a spreadsheet program, like-- - calculate percentages - add up subtotals - rank areas based on population size - do bar charts or graphs 7. {To leave the program, type:} /Q(uit) Y(es) 18 Tutorial 4C Having saved our opening parameters into the file EXSTF1A.bat, we can skip past the opening drive selection screen. {Type:} EXSTF1A {Press} R {to estrict the session and enter the State abbreviation of your choice, e.g.,} VA . This time let's look at data on people living alone. Household type and household relationship are found in files 5 and 6. For starters, let's look at STF1A5. {Cursor down to STF1A5__ and press} . You have a few moments to read the help screen while the system is reading in the data dictionary. 1. To find data for persons living alone we may have to page through quite a few screens full of items on the Select Items menu. It would help if we knew which table to look for. It is a good idea to keep the STF 1A paper documentation handy for such purposes, but for right now, we can make use of the fact that there is a general table locator for STF 1A built into the definitions system. {Type:} D {bring up the index of definitions from the main menu.} {With the cursor highlighting -LOCATOR, press} , {then through the alphabetic entries.} While there are no entries for "persons living alone", there are entries for Household Size and Household Type (including P016 in STF1A5) and other entries under Persons in Unit (in STF1A7). {Press} {to return to the main menu.} 2. {Type:} 1 {to the Select Items.} This file obviously does not have as many geographic identification fields as STF1A0. Later we will have to address the lack of an area name field. {Put an} X {next to SUMLEV, CNTY, COUSUBFP, PLACEFP, and LOGRECNU.} {Type}: J {and} P016 {to ump to the entries beginning with the characters P016. Put an X next to P0160001 and P0160002.} 3. There are two separate cells that make up one-person households. If we need that expressed as a single total, we need to compute a user-defined item. {Type:} U {for ser-defined item.} To total two items, we will enter a freeform expression. {Press} {to move to the freeform expression field, then type:} P0160001 + P0160002 . 19 Make sure that you type 0's rather than o's in these item names. If you do not enter the items correctly, the system will keep cycling you back through these cells until it gets an expression it finds meaningful. {Type:} 7 {for the field length;} {Press} {to skip any entry for the number of decimals;} {Type:} Person1 {for the field name; and} {Type:} One-person households {for the title.} 4. We can compute one-person households as a percent if we can supply the denominator of all households. That number is shown in a separate file, but we can also compute it from table P016 since the universe of the table is households, and all of the cells in the table should add up to that total. {Press} 2 {to define a second item.} {Type:} P0160001 + P0160002 {for the numerator.} (Users of EXTRACT 1.3e or later could type in Person1, representing the sum of these two items computed above.) To enter ten field names in the denominator field would be laborious. Fortunately, EXTRACT gives us a shortcut. You may enter the first and last fields separated by a colon, and the system will compute the sum of all of the fields in between. {Type:} P0160001:P0160010 {for the denominator.} {Type:} 100 {as the scaling factor appropriate for a percent.} {Press} {again to get past the freeform expression field.} {Type:} 5 {for the field length;} {Press} 1 {to specify one decimal place for the percent.} Note that 5 characters was assigned for a percent to allow for a figure as large as 100.0 percent. {Type:} Pct1P {for the field name.} You may be tempted to use the percent sign (%) in such a field name, but special characters such as %, $, /, *, +, or - are not allowed. It is also preferable to make the field name no longer than the field length--EXTRACT will make the data display column wide enough to accommodate both the field and its name. {Type:} One-person households as a percent of all households {for the title.} {Press} {to return to the Select Items menu, and} {again to return to the main menu.} 20 5. {Type:} 6 {to display the data.} ADDING LABELS Obviously, the display is limited in its usefulness until we add a label for the area name. {Press} {to return to the main menu.} 6. {Type:} 3 {to Add Labels.} We may add labels associated with just about any of the coded variables on the file, except tract and block group for which there are no names. The most flexible type of label, however, is the general area name (ANPSADPI) found in the STF1A0 file, which gives a title appropriate to any summary level. That choice also has the advantage of giving us access to other data items in the STF1A0 file. {Cursor down to --STF1A0 and press} . The area name is highlighted, and pressing would select it and return us to the main menu. But let's consider other items from the file as well, taking advantage of the options show a the bottom of the screen. {Press} M {to allow selecting multiple items.} {Press} {to select ANPSADPI. A ">" confirms that it is selected.} {Now press} A {to show all fields. Type:} J {and} P001 {to ump to the first of the data tables. Cursor to highlight P0010001, and press} {to include P0010001 as a label, then cursor down to P0030001 and press} {to include it as well.} {Press} to return to the main menu. 7. {Type:} 6 to display the data. {Press} W {and} 13 {to reduce the width of the ANPSADPI field to 13 characters. OPTIONAL: COMPUTING A USER-DEFINED ITEM WITH DATA FROM ANOTHER FILE In the percentage displayed at the right, one-person households was shown as a percent of all households. Another way of looking at one-person households would be as a percent of persons. It is possible to use the population count added as a "label" in computing a user-defined item, but the field name must be recorded precisely. Notice on the display that the heading for total persons is STF1A0->P0010001, with the two characters "-" and ">" making up something that looks like an arrow. This indicates that the item has been pulled in from the STF1A0 file; we will use this notation shortly in a user-defined item. {Press} {to return to the main menu.} 21 8. {Type} 1 {to Select Items} {Press} U {to bring up the list of user-defined items, and press} 2 {to review how we defined the second item.} The numerator we want will stay the same--P0160001 + P0160002. {Press} {to return to the list, then enter} 3 {to specify a new item.} {Type:} P0160001 + P0160002 {as the numerator} {Type:} STF1A0->P0010001 {as the denominator} {Type:} 100 {as the scaling factor, and another to skip past the freeform expression field. {Type:} 5 {for the field length;} 1 {for decimal places;} Pct1P {for the field name; and} Persons living alone as a percent of all persons {for the title.} {Press} {to return to the Select Items menu.} 9. {Press} P {to

review column widths} To get all of the fields on the screen, we need to adjust the widths of some of the columns. {Cursor to the SUMLEV column, press} W {and since the 3 appearing in the prompt window is the minimum this code requires, simply press} . {Repeat the process narrowing the COUSUBFP and PLACEFP column to 5 characters each.} {Press} {to return to the main menu.} SELECTING ALL RECORDS WITHIN A COUNTY 10. {Type:} 2 {to Select Records} In the examples in tutorials 4a and 4b, we selected one SUMLEV at a time, something that can be accomplished quite efficiently. But sometimes you want to show the whole geographic hierarchy within a particular county or county subdivision. Let's pick data for one county, using the CNTY variable. {Cursor down to CNTY, type} S, {and press} {to go to the next screen.} In Select Records Screen 2, put an X next to the particular county of interest. 22 11. SPEED-UP SCREEN Here is something you may not have expected. The system tells us that it can speed up retrieval if we pick one of the categories of SUMLEV. {Cursor to 050--State-County, and type:} X. You may be puzzled as to why the system asked that extra question, particularly when the answer seems obvious. In fact, county codes are attached to county subdivision, tract, and block group and some place records as well as to county records, so your answer does make a difference. EXTRACT makes use of existing "indexes" on the CD-ROM whenever it can to speed the process of finding specific data. In this case it found that in order to use a particular index, it had to know both the county and the summary level, so it prompted us for the missing piece of information. If you happen to select a SUMLEV that is not valid in combination with the rest of your selection criteria, then the program will not find any of the data you want. For example, if you specified a particular county code and a SUMLEV of 160 (State-Place--a summary level with no county codes), EXTRACT will give you a message that it can find no records meeting those conditions. EXTRACT uses the information supplied "to speed up retrieval" only for the purpose of finding the first eligible record. Thus, the index is "turned off" as soon as the first eligible record is found. {Press} . 12. {At the main menu, type} 6 {to redisplay the data.} This shows us the entire geographic hierarchy for the county selected. Note that its name has been introduced as a subheading. OPTIONAL: ADDING A SECOND SET OF LABELS 13. You may recall from scanning the table locator that some of the tables related to household composition were in file STF1A6. We have already seen how we can merge in data from STF1A0 using the Add Labels option. EXTRACT can actually handle two sets of labels at once, so we have a chance to investigate what additional information STF1A6 can bring to bear on persons living alone. {Type:} 3 {to Add Labels} This time, a blinking prompt at the top advises us to press S first if we want to ave the labels we have already designated. {Press} S, {then cursor down to --STF1A6, and press} . {Cursor down to P023, press} M {to allow selection of multiple items, cursor to P0230005, press} , {move down to P0230007, and press} {again, then} {when we are finished selecting labels.} 23 14. {Type:} 6 {to display.} At this point we may see that the data displayed are not particularly in the sequence we might have first chosen. Labels appear first, in the sequence chosen, then the data items from the current file, then user- defined items. EXTRACT 1.3 does not give us any further control over the sequence of items. EXTRACTING DATA TO A .DBF FILE This set of records might just be a group we want to refer back to, so let's save that to a file that EXTRACT can continue to use. [Note: Before proceeding, use Select Items to make sure that you have all of the items you need. For example, if you eliminated LOGRECNU or CNTY to save space in the screen display, add them back in for your .DBF output.] 1. {Type:} 8 {to extract the data to a file.} 2. {Type:} 1 {to select the DBF file option.} {Enter a file name, without any extension, e.g.,} alone. If a drive and directory have not been specified, this file is automatically save to the work directory, and becomes c:\work\alone.dbf. If you want to save this file to a floppy disk or other directory, enter the drive and directory explicitly, e.g, a:\alone. The system prompts us for a description for our "My_Files" catalog. You can take the one listed, you can edit it by cursoring to something you want to change, or you can retype the line. {Type:} Persons living alone in [county] [date] . The system will work for a while, first extracting the appropriate records, and then in a second pass adding the appropriate labels. If you have added two sets of labels, this step can take quite a while. When the system is finished, it will give you the opportunity to go directly to the newly created file. {Press} 2. 5. {Type:} 6 {to display to the screen.} At this point the display has the same data items we had selected before, except that the data items previously added as labels take less width because they no longer need as long a name. We also have to reset the width of the ANPSADPI column to fit our display since the system copied the full contents from the original file. This time there was no waiting while EXTRACT built up the screen--the system is not having to slow down as it filters out records that do not qualify, and it is working off a fast hard disk rather than a slow CD-ROM. {Press} . 24 6. {Type:} 1 {to bring up the Select Items screen.} The system has created a customized data dictionary for our new database file. Here we can see that there are fewer items to choose from. {Press} to leave the item list unchanged and return to the main menu. 7. If we want to add another label to the file, we could do so with option 3. If we wanted to cut out some of the records, for instance including records for selected townships (county subdivisions), we could do so with option 2. But if we decided at this point that we wanted more data rather than less, such as more items from the original STF1A5 file, we would have to reselect the original file, and repeat most of our original steps. 8. {If you are finished, type:} Q {to quit.} PRACTICE PROBLEMS 1. Display the name, population, number of persons under 15, and percent under 15 of all persons for geographic areas of your choice. Hints: You will want to use STF1A1 file as your main file for age data, and add the area name and population as labels from the STF1A0 file. You must add together nine separate items to create persons under 15 as a user- defined item--entering P0110001:P0110009 as a freeform expression. In computing the percent, you will enter that same expression as the numerator, STF1A0->P0010001 as the denominator, and 100 as the scaling factor. If you are wondering where the STF1A0->P0010001 came from, that is the heading EXTRACT shows for the total population column in the regular data display screen when you have added it as a label. You need to copy those characters exactly, including the - and >. 25 Tutorial 4D-- BUILDING MENUS FOR MULTIPLE CD-ROMS 1. {Type:} EXMENU . If you have used EXTRACT 1.3 only with the STF 1A tutorials, the EXTRACT menu printed out may have only one entry. If you have used more than one type of CD-ROM with EXTRACT 1.3, your menu might look more like this: EXTRACT MENU . EX EXSTF1A EXPL94 EXSTF1B EXCBP EXFTD EXSTF1A Every time you ave parameters from EXTRACT's Drive Selection screen, the program adds the name you give that set of parameters to the file EXMENU.bat. The sequence of entries is simply the order in which you saved parameters with various discs. Obviously, this menu could be more useful to you if it were annotated, rearranged, and unduplicated. Since EXMENU.bat is an ASCII text file, you may do anything you like with it in your text editor or word processor--as long as you save it as an ASCII text file, and as long as you leave the word "echo" at the beginning of each line. Here is a sample of how the menu might look after editing. EXTRACT MENU . EX EXTRACT with Economic or Agriculture Census discs EXCBP EXTRACT with County Business Patterns EXFTD EXTRACT with U.S. Exports or Imports of Merchandise EXPL94 EXTRACT with 1990 Census P.L. 94-171 EXSTF1A EXTRACT with 1990 Census STF 1A EXSTF1B EXTRACT with 1990 Census STF 1B Some users prefer to set up menus where the above options would be numbered 1, 2, ... 6 rather than given the cryptic names. If you do so, make sure you rename the corresponding batch files. For example, EX.bat would be renamed 1.bat, EXCBP.bat would be renamed 2.bat, etc. Note: Some users may prefer to have this menu reappear after the conclusion of every EXTRACT session. This may be accomplished by adding two lines, one saying PAUSE and another saying EXMENU, to each of the EXTRACT batch files. Users of computers with extended memory (more than 640K) may find that adding "SET CLIPPER=E000" at the beginning helps avoid unwanted conflicts. For example, the file EXSTF1A.bat could be edited to read: SET CLIPPER=E000 EXTRACT c l: c:\auxil\ c:\work\ master.ctg PAUSE EXMENU 26 The PAUSE line yields the prompt "Press any key to continue" before returning you to EXMENU. If EXTRACT terminates abnormally, it displays an error message on the top line (usually beginning with "Proc") that may be the key to figuring out what you (or the program) did incorrectly. Press or copy down the message before pressing any other key, least you be returned to the EXTRACT menu without that important clue. 2. {Type:} EXSTF1A {Press} R {to estrict the session and enter the State abbreviation of your choice, e.g.,} VA . This time let's look at some housing data. {Cursor down to STF1A7__ and press} . You have a few moments to read the help screen while the system is reading in the data dictionary. 3. {Type:} 1 {to select items} {Put an X next to SUMLEV, COUSUBFP, PLACEFP, TRACTBNA, BLCKGR, LOGRECNU, and (after ) H0010001.} {Press} . SELECTING ALL RECORDS WITHIN A COUNTY SUBDIVISION OR TRACT/BNA 4. Selecting records within a county subdivision or census tract/BNA presents some new wrinkles to learn, because their names may not be unique within the State, and the code lists are sorted within county. Place and county names are always unique within state, so their code lists are presented in straight alphabetic sequence, making them easier to select from a list. There are two ways to deal with these identifiers nested within county, as shown below. a. {Type:} 2 {to select records.} {Put an S next to CNTY and another S next to COUSUBFP, then press} . {Move to the county of your choice and put an X next to it. While you are there, note the county code for future reference. Press} . Contrary to your expectation, the county subdivision code list does not start with the county you have just specified. Instead it lists the county subdivisions for the county that is first in alphabetic sequence. We could press L to ocate the code based on the county subdivision name, but both the ocate and ord search options are rather time consuming on county subdivision, tract, and place lists on 1990 census files. On the other hand ump is quite fast, once you know how. {Press} J. 27 The prompt "Jump to what value for SUMLEV+CNTY+COUSUBFP+PLACEFP +TRACTBNA+BLCKBR?" is not particularly inviting. Fortunately, all you need to specify in order to get to the right part of the list is the SUMLEV (060 for county subdivisions) and the county code (which you should have made note of while selecting the county in the previous step). Thus, if the county code were 025, you would enter the string 060025, followed by . {Put an X next to the desired county subdivision and press} . Now comes one of those pesky "speed up retrieval" prompts, requesting a particular value for SUMLEV. {Put an X next to 060 and press} . {Type:} 6 {at the main menu to display the data, then} {back to the main menu when you are ready.} b. There is another way to select a particular county subdivision or census tract, a bit more time consuming, but without the complexity of entering in codes at the ump prompt. This involves two separate record selection operations, first for the county, to bring up the desired part of the code list, second for the individual county subdivision(s) or census tract(s). {Type:} 2 {to select records.} {Put an S next to CNTY and press} . {Move to the county of your choice, put an X next to it, and press} . {In response to the "speed up" screen, put an X next to 060--State- County-County Subdivision, since that is where we want to be next.} {From the main menu, type:} 2 {to select records a second time.} {Put an S next to COUSUBFP (or TRACT/BNA) and press} . {Move to the county subdivision of your choice, put an X next to it, and press} . This two-pass approach brings you to the right part of the county subdivision (or census tract/BNA) list. EXTRACT always attempts to bring you into each code list at a point corresponding to the data record that was last displayed or selected. {In response to the "speed up" screen, put an X next to 060 for SUMLEV, and press} . This is followed by another "speed up" screen, asking you to confirm the county. {Press} X, {then} . {Display the data to the screen at this point if you wish.} Note that, while you could have selected more than one county subdivision or census tract within a single county, EXTRACT does not give you the capacity to select individual tracts or county subdivisions in multiple counties in a single retrieval. As will be discussed later, however, two 28 separate retrievals, if extracted to .DBF files, can later be merged "vertically" into a single data base. 5. {Type:} 6 {to display the data to the screen.} The data displayed reflect the full geographic hierarchy within the selected county subdivision: county subdivision, place, tract, and block group. a. Starting on the second line of the screen is the prompt --Press I to Index (resequence) by SUMLEV+CNTY+COUSUBFP+PLACEFP +TRACTBNA+BLCKGR-- Since the display is sequenced the way we want it, we do not want to press I. This message appears whenever the record selection process takes us through the "speed up retrieval" prompt. The system has used an index to find the first record, then disengaged the index so that the data display in the original sequence. In this case, turning the index back on would group the records by summary level. For example, to view only the block group records, cursor to the first block group record and press I. If you then want to return to the original display, you may press N for natural order, which in this case was hierarchic. The system lets you go back and forth between indexed and natural order only once, after that you will need to repeat your original record selection process. b. You may wish to add the area name label (from STF1A0, the ANPSADPI field). For the tract and block group records this may give you only information that is self evident from the tract/BNA or block group number. On the other hand, the name may include a part indicator (pt.) that alerts you that the county subdivision or place boundary subdivides the area you are looking at. KEEPING TRACK OF WHAT SUMMARY LEVELS HAVE WHAT CODES Understanding summary levels and the sequencing of records is so important that you will want to keep handy the summary level sequence chart on page 6-1 of the STF 1 documentation (or print out the first part of \DOCUMENT\SUM_LEV.ASC on the CD-ROM). The same information is contained, somewhat abbreviated, in the definition of SUMLEV, accessible by pressing D for efinition in either of two places: (1) at the data display screen when the cursor is on the SUMLEV column, or (2) at the item selection screen when the cursor is on the SUMLEV row. You also need to keep straight which codes are available on which summary levels. For example, if you want to analyze tracts/BNAs by place, you need to specify a SUMLEV of "080" instead of "140", because SUMLEV 140 tract/BNA records do not include place codes, while SUMLEV 080 tract/BNA records do. Similarly, if you want to list places within a county, you need to work with SUMLEV 155 or 070, not SUMLEV 160, becauses the latter does not have county codes. To determine which codes are available on which summary levels, see "File identification by summary level" chart in STF 1 documentation pages 2-3 to 2-15 or print out \DOCUMENT\HOWTOUSE.ASC on the CD-ROM. 29 The chart is somewhat awkward to use because it spans a number of pages, the first 4 pages dealing with SUMLEVs 010 to 201, the next four with SUMLEVs 202 to 321, and the last four with SUMLEVs 329 and up, with pairs of pages alternating between codes from A to M, and M to U. As an exercise, (1) use the chart to confirm that place codes are present on tract records at summary level 080 but not 140; and (2) determine what summary levels are accessible using Congressional district codes. DISPLAYING DATA FOR AN INDIAN RESERVATION Selecting data for an Indian reservation introduces some interesting challenges, because there is no menu available for Indian reservations, and since no indexes are available to speed searches for the components of a reservation. 1. {Load a file in the STF1A0__ group. If you are have just completed the preceeding exercise, return to the file selection menu (option 9) to do so.} 2. {At the main menu, type:} 2 {to select records, then put an S next to AIANAFP and press} . What next appears is not an easy-to-use menu of all of the possible values for AIANAFP, but rather a prompt to specify a minimum value for the code, something you are not yet prepared to do. {Press} {to skip back to the main menu.} 3. In the absence of an Indian reservation code menu, you will have to construct your own code list. a. {Type:} 2 {to select records, and place an S next to SUMLEV and press} . {Put an X next to each of the codes having to do with Indian reservations or trust lands, from 210 to 221, and press} . [If you get the message "No records found meeting these conditions", then this state has no Indian reservations, and you will need to repeat the above steps with another state.] b. {Type:} 1 {to select items. Put an X next to SUMLEV, AIANAFP, AIANACC, ARTLI, and CNTY; three times; and put Xs next to ANPSADPI and P0010001. Press} . c. {Type:} 6 {to display. Cursor to the right to highlight the ANPSADPI column, press} W {and type:} 24 {to reduce the column width so that everything fits on the screen.} At this point we can simply make a note of the AIANAFP code associated with a particular reservation of interest. d. If you want to look at the block groups that comprise the Indian reservation, it will be quite useful if the search can be narrowed down to a single county. Summary levels 220 and 221 specify county components. If there is more than one reservation in the state, SUMLEVs 220 and 221 will not be together with the associated reservation title. The second line of the screen tells us that the display is sorted by SUMLEV (the PLACEFP in the expression has no effect on these records). {Press} N {to get the Natural or original sequence which is hierarchical.} 30 e. The codes for AIANACC and ARTLI are not self explanatory, so it may be worthwhile to see what they mean. {Cursor to AIANACC and press} D {to display its definition. Repeat the process with ARTLI.} f. This display should make a handy code list. {Press} {to return to the main menu, the type:} 7 {to print out the list, assuming you are connected to a printer. If you are not connected to a printer, make a note of the county in which the particular reservation is located.} 4. {Type:} 2 {to select records.} Before specifying the record selection, we need to review the fact that there is no asterisk next to AIANAFP, meaning that record selections based on it do not benefit from any indexes. Since searches using unasterisked items can take several minutes per megabyte to accomplish, we are much better off if we can help the program find a appropriate starting county. {Put an S next to AIANAFP and and another S next to CNTY. When the prompt for a minimum value for AIANAFP appears, enter the 5-digit code you recorded earlier, press , then enter the same code as the maximum. On the next screen, put an X next to the appropriate county, as determined from the code list, and press . Finally, in response to the "speed up retrieval" screen, put an X next to 091 for block groups.} While not obvious from this screen, we can confirm with the chart discussed in the preceeding section that American Indian reservation codes appear on block group records. {Press} . 5. Before displaying the data, confirm that you have the items you want. {Type:} 1 {to reselect items, delete the X next to AIANACC code, and add an X next to TRACTBNA several lines below.} You will want a tract or BNA code to help you identify the block groups in the display. You may also want to add a count of American Indians, and the easiest way to find that item is to do a ord search on the word Indian. {Press} W, {type:} Indian , {then put an X next to P0060003. To confirm that all of the data will fit on the screen, and press} P {to

review. Adjust the width of the ANPSADPI column if necessary. When satisfied, press} {to return to the main menu}. 6. {Type:} 6 {to initiate the display screen.} The system will now search through all of block groups starting with the county specified. If there are many block groups in the remainder of the state, this could take a while. Do something else for a while, and come back when the CD-ROM light is no longer blinking to see your results. 31 Tutorial 4E-- MERGING FILES HORIZONTALLY With the Add Labels feature, practiced in Tutorial 4c, you can merge data from up to three STF1A files at a time (the original file selected plus two more specified with "Add Labels"), as long as the total number of "label" items taken is no more than 10 (e.g., 2 from one file and 8 from a second). Sometimes you need more than the add labels function will allow, either because you want to add more than 10 items from another file or because you want to add items from a fourth or fifth file. {[Note: The following example assumes that you have installed EXTRACT 1.3e (Feb 92) or later. Earlier editions of EXTRACT do not successfully link files using record number, although linking by geographic codes is still possible. If you have EXTRACT 1.3d or earlier, skip ahead to the "Merging Files to Make Inter-Area Comparisons" below.]} 1. The first step in this process is to create an extract file with the appropriate records and as much of the data from other files as can be accommodated by adding labels. You must include in the extract enough information to support a unique match. The easiest way to assure that is to include the variable LOGRECNU. (It is also possible to make a horizontal match with a string of other variables if they match the contents of one of the STF1A indexes (e.g., sumlev, county, tractbna, block group). 2. The file ALONE.dbf created in the tutorial 4c is a good starting point. - {Start EXTRACT, or, if you are already in the program, type:} 9 to return to the file selection menu. - {At the catalog selection menu, to highlight the MY_FILES catalog, and press} . - {Move the cursor if necessary to highlight ALONE and press} . - {Type:} 1 {to select items, and put an X next to items you wish to display. You should also confirm that the variable LOGRECNU is present. Press} {to return to the main menu. You may also wish to display the data to the screen before proceeding and to reduce the width of ANPSADPI.} 3. Merging files is one of the functions provided within the "Manipulate Files" option at the main menu. {Type:} 4 4. At the file manipulation menu, select the third option, since we want to add more items to existing records. (We will discuss vertical mergers later) {Type:} 3 . 5. We want to add information about the tenure of persons living alone from file STF1A7. {Cursor to STF1A7 and press} . {At the next screen, select the appropriate state and press} . 6. The prompt asks "How do records in STF1A7.. relate to records in ALONE?" Since the LOGRECNU variable gives us the record number in the original STF1A data bases, we can relate the two data bases by record number. {Type:} 1 32 The system follows with another question: "What expression in ALONE gives the record number in STF1A7..?". {Type:} LOGRECNU (If we were working from one of the original STF1A files on the CD-ROM, this second question would not have appeared, since the system would have detected the same number of records in the two files.) 7. The system takes us directly to a Select Items for the file being merged in, STF1A7.. {Cursor or ump to H0180001, and put an X next to it and H0180008, the numbers of owners and renters among one-person households. 8. Display the results. 9. If you want to merge in data from yet another source, you will need to first save what you have to an extract .DBF file. a. But first go back to the select items menu to make sure you aren't leaving out any variables you will need. It is particularly easy to forget about LOGRECNU since it adds nothing of analytic interest to your data display; take LOGRECNU along since it is critical to further merging. b. Extract the data to a file using option 8. When the data extraction process is complete, the system will allow you to go directly to the new file, which you may use as the base of operations for repeating the steps outlined above. MERGING FILES TO MAKE INTER-AREA COMPARISONS We have seen a number of ways that EXTRACT allows you to work with data within a single record, such as the creation of user-defined items. If you want to make comparisons among different records from the same source, you will usually have the most flexibility if you import data into a spreadsheet program, such as Lotus 1-2-3 or Quattro Pro. Still, the horizontal merge feature gives some flexibility for linking among different levels of the hierarchy. The most common application is in doing a percent distribution across the components of an area. The following assumes that you are currently using ALONE.dbf (created in Tutorial 4c) or a file you have created from it. Our objective is to create a percent distribution of persons living alone within the county. 1. {At the main menu, type:} 4 {to manipulate files} 2. {Type:} 3 {to merge files horizontally.} If you previously made another horizontal merger into this file, it will ask you whether you are simply changing items within the previously merged file. By selecting items from a different file (option b) you lose the results of the previous merger. If you don't want that, press twice to return to the main menu, and save what you have as a .dbf file (option 8 from the main menu), before coming back to this menu. 33 3. {Cursor to STF1A5, and press} . {Select the appropriate state on the next screen and press} . 4. We want to link each record to the corresponding county total, and there is an index that will let us do that. In response to the prompt "How do records in STF1A5.. relate to records in ALONE?" type: 3 . Cursor now to the STF1AM.. index, because its description starts with sumlev by county. {Press} . 5. The system asks us to enter a specific key for relating the two files, and it prompts us with the contents of the key expression for STF1AM..: SUMLEV + CNTY + COUSUBFP + PLACEFP. Since we want to constrain the summary level to give us only county data, we type "050" (including quotation marks) over SUMLEV and press to get rid of the last character in SUMLEV. Cursor right past CNTY then press Y to delete to the end of the line. {Press} . 6. {When the system brings up the item selection screen, put an X next to p0160001 and p0160002, then press} . 7. {At the main menu type:} 6 {to display the results.} On the line for the county total (sumlev=050), number of males and females living alone should be repeated, first from our extract file ALONE, second from the merged file. On subsequent records the data from the merged file will stay constant as long as the county does not change, while the data from the original extract file will vary from record to record. 8. Note the headings for new columns, STF1A5..->P0160001 and STF1A5..->P0160002 (where the abbreviation for the state you are working with shows in place of the ..). If you want to create a user defined item that incorporates these statistics, you must be prepared to enter the references exactly that way. For example, to obtain a percent distribution of females living alone within the county, with the county total as 100%, enter a user defined item with P0160002 as the numerator, STF1A5..->P0160002 as the denominator, and 100 as the scaling factor. 9. When you extract the results to a .dbf file, the system will prompt you to change the duplicate field names and descriptions, since you already have a P0160001 and a P0160002 in the data base. Enter MALE1CO and "County total for males living alone" in the two prompt areas for P0160001, and FEMALE1CO and "County total for females living alone" for P0160002. OTHER APPLICATIONS FOR HORIZONTAL FILE MERGING Merging data from STF 1 and STF 3 When STF 3A and 3C data become available, with much of the same geographic structure as STF 1A and 1C, users may want to display data from both sources together. While many of the key STF 1 data are repeated on STF 3, other items can be obtained only from STF 1. Further, the STF 1 numbers, being free of sampling variability, are more accurate than counterparts from STF 3 (although the difference is negligible for large areas). 34 Unless you have a system that can address two CD-ROMs at once (not just a "jukebox"), you will need to extract data from one source to a dBASE file in your MY_FILES catalog before loading the CD-ROM for the other source. You will not be able to use LOGRECNU in specifying the relationship between the two data sets since the record numbering will not match exactly. Therefore, your extract file will need to include as many codes as are necessary to satisfy the index keyed to the second source, e.g., SUMLEV, CNTY, COUSUBFP, PLACEFP, TRACTBNA, and BLCKGR. The steps for specifying the merger are similar to those specified in the above example for inter-area comparisons, except that you would specify the matching relationship as SUMLEV + CNTY + COUSUBFP + PLACEFP + TRACTBNA + BLCKGR. (The default prompt includes only the first four of those terms.) If you are working with data for Indian reservations or other areas not featured in key indexes, you will need to create a new index for linking purposes. Since reindexing a whole STF 1A file can consume considerable time and hard disk space, it is best to extract data from both sources to your MY_FILES catalog, then merge based on an index you create at the time of file merger (option 2 at the relationship specifying screen). Merging data from the 1988 County and City Data Book CD-ROM One of the great frustrations of users of 1990 census CD-ROMs is that the discs include no comparative numbers from previous censuses. At this writing, the only Census Bureau CD-ROM featuring any 1980 census data is the 1988 County and City Data Book CD, which includes a number of summary measures corresponding to STF 1 (population by race, Hispanic origin, households by type, housing counts, percent owner occupied, median value), and a few corresponding to STF 3 (percent high school and college graduates, per capita income, median household income, percent below poverty). There are no detailed distributions from the 1980 census (the 9-category age distribution is based on 1984 estimates, not the 1980 census.) These data are shown for States, counties, and cities of 25,000 or more inhabitants, with the items spread across a number of small files. One file shows 1980 population and 1979 per capita income for all places with 2,500 or more inhabitants. County files are indexed on a 5-digit STCO code that combines the state and county code. In specifying the relationship from an STF 1 extract, use STATEFP + CNTY. City files are indexed on a 6-digit STPL code combining the state and Census place code (not the FIPS code). In specifying the relationship from STF 1, use STATEFP + PLACECE. Unfortunately, the file with all places does not have an index usable for linking. In using 1980 census data from any source, you must take into account the possibility of changes in boundaries or coding between 1980 and 1990. While county changes were very few (La Paz, AZ and Cibola, NM), many places changed boundaries through annexation or detachment, and codes may have changed as a result of mergers and new incorporations. Merging STF 1 data with economic data Data from STF 1 may also be merged with data from the 1987 economic or agriculture censuses or from the annual County Business Patterns CD-ROM. The most common merger involves combining measures of retail sales with measures of 35 population or income. The notes above on comparability over time apply here as well, although the time difference is only 3 years, and no county changes occurred during that period. The economic censuses recognize only incorporated places with more than 2,500 inhabitants, and only selected county subdivisions in New England, New Jersey, and Pennsylvania. State, county and place data will need to be merged in separate operations, since most of the key indexes involve the 2-digit RECTYPE codes (02 for states, 04 for MSAs, 06 for counties, 07 for places), which serve the same function as but are not the same as SUMLEV codes. To merge data with the RC87A3 retail trade file, you would make the merge using the index RC87C3 (by rectype, state, place, and county) and specify the relationship as follows: (for State data) "02" + STATEFP (for county data) "06" + STATEFP + "0000" + CNTY. (for place data) "07" + STATEFP + PLACECE Note that, while both the 4-digit census place codes and new 5-digit FIPS place codes are included in both the economic census and 1990 census files, the economic censuses employed the 4-digit code (corresponding to PLACECE) to control indexes, while the decennial census employed the 5-digit code (PLACEFP) for indexing. For the merge to work PLACECE must have been included in the current file. CHOOSING WHERE TO START WITH LABELS AND FILE MERGERS In tutorial 4c, we found that one could add up to two sets of labels at a time, in effect linking 3 files at once. In the exercises above, we found that we could use the horizontal merge feature to add even more in successive iterations. While EXTRACT generally gives you considerable flexibility in the sequence with which these operations are performed, you may want to consider the following factors. - All record selection variables need to be on the base file, that is, the file you load at startup or through Main Menu option 9. If you want to select records on a code or data item not in every file (for example, congressional district code), use STF1A0 as your base file. Where this is not immediately possible, as in selection based on a user-defined item (e.g., tracts more than 50 percent Black), you can always create an extract .DBF file. Once you use it, the extract file becomes your base file, complete with any previously added labels or user-defined items. - In specifying a User-Defined Item, referring to variables from files other than the base file can be awkward. You have to use nomenclature like STF1A0MD->H0010001 and you can't use the colon as a shortcut in specifying the sum of a series of items. - Items you add through Add Labels will appear first in your output, and in the sequence chosen. If you want the area name to be at the left, make it the first label you select. Items you add through horizontal file merger will appear after items in the base file, though before user-defined items being created now. - There are fewer steps required to add labels than to merge files, and fewer things to go wrong, as long as you are working off the CD-ROM. Once you extract data to a .DBF file and use that file, the linkage to label files, as specified in the data dictionary, may be broken. (For example, the data dictionary assumes that the base file and the label files each incorporate 36 the state abbreviation as characters 7 and 8 of the file name, e.g., STF1A0VA.) Therefore, in a multi-file extract, it is generally best to add what you can as labels (up to the two-file, 10-item limit) before trying a horizontal file merger. - If you need more than 10 items from a file other than the base, merging files horizontally can incorporate all of the items in a single pass, while adding labels cannot. No more than three files can be linked at a time. If you want to add items through file merger, you can add labels from only one source on the same pass. - No dBASE file can store more than 128 items. PRACTICE PROBLEMS 1. Merge State, county or place data from STF1 with data from Economic Censuses CD-ROM 1D or 1E. Hints: You will need to start by extracting desired parts of one of the CD-ROMs to your MY_FILES catalog, then restart EXTRACT with the other CD- ROM in place. Once you have selected an index on which to relate a particular economic census file, you will need to specify a relationship. Here you must bridge across different nomenclature used between the two sources. The following codes are directly comparable: STATEFP = ST, CNTY = COUNTY, PLACEFP = PLACEFIP, PLACECE = PLACE, MSACMSA = MSA. RECTYPE fills the same role for the economic censuses as SUMLEV plays 1990 census files, but the codes are different: 01 = U.S., 02 = State, 04 = MSA/CMSA/PMSA, 06 = county, 07 = place. The relationship between files must be expressed in terms meaningful to the base file, not to the file being merged in. Thus, if your base file contains summary data from the 1990 census, and you select a retail trade file indexed on RECTYPE + ST + COUNTY, you must specify the relationship as "06" + STATEFP + CNTY. End of April 1992 draft 37 ========================================= Material to be added to future revisions: Merging files vertically, to combine data from multiple states or prior retrievals Resequencing records =========================================== Comments on this series of tutorials is welcome. Call Paul Zeisset or Bob Marske at 301/763-1792, or write the Economic Census Staff, Bureau of the Census, Washington, D.C. 20233. 38