IBM C20-8062



Reference Manual
Index Organization for Information Retrieval




© 1961 by International Business Machines Corporation

TABLE OF CONTENTS

Introduction
Indexing
Name Indexing
Subject Indexing
Classification
Subject Headings
Coordinate Indexing
Special Indexes
Word Indexing & Subject Indexing
Auto-encoding & Keyword in Context (KWIC) Index
Indicative & Informative Indexes
Lookup and Search
Applications
Glossary
Bibliography

INTRODUCTION

This is a primer on index organization. Only the basic principles are
presented and these in a simplified form. There will be no attempt to
discuss the problems of subject analysis which the indexer must perform
to select the correct index points; rather, the discussion will be limited
to the methods and patterns of organizing indexes.

Today, with the development of mechanized information storage and
retrieval, there is need for communication between librarians and
documentalists on the one hand and systems personnel on the other. It is
to help the latter understand the problems he will encounter in organizing
information for retrieval that this primer has been prepared.

The literature on indexing is very extensive and its vocabulary is unstable
and confusing. The basic principles, however, are not difficult to under-
stand. As the systems man gains understanding of the techniques of
information retrieval, he will be in a better position to demonstrate the
contributions that mechanization can offer this field.
INDEXING

Indexing is an ordering and listing of names, topics, objects, etc., to
facilitate finding the individual items contained in a store of information.
The conversion of indexes to codes - that is, the use of special symbols
to represent words - is the subject of an IBM pamphlet, Modern Coding
Methods (X21-3793). Coding will be touched on only incidentally.

There is no perfect or ideal index organization which is applicable to
every situation. Rather, the contents of the file and the uses to which it
will be put will determine the form of the index.

Indexing is usually divided into name indexing and subject indexing. Since
they serve different purposes and have different patterns of organization,
these indexes are nearly always treated separately.




NAME INDEXING

Names are usually arranged in strict alphabetic order, letter by letter,
to the end of each word:

Smith, J.
Smith, John
Smith, John A.
Smithell, Alfred

Sometimes it is questionable which part of the name is to be used. The
usual practice in the United States is to use the full surname, including
compounds, with all prefixes and to file exactly as spelled, disregarding
umlauts, accents and other diacritical marks used with foreign names.

d'Alembert     El Al           Macdonald     O'Daniel
Dalton         Fitzgerald      MacRae        O'Keefe
de Secour      Fitz-Hugh       Mayer         Okin
de Vivo        Int'Feld        McCall        Tenant
Devon          L'Abbee         McDonald      Ten Eyck
Disney         LaBelle         M'Lean        Vanner
Di Stefano     Labor           O'Brien       Van Ness
El-Abd         La Chappelle    Obst          Vonner
               MacAllister                   Von Rath

Libraries, as a rule, ignore the prefix for foreign names and group the
M', Mc and Mac together as if written Mac.

Mace
M'Ewan
MacEwan
Mach
McHale
Macham
Mac Hatton
McLachlen
Maclay
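
The two filing conventions above can be sketched as a sort key. This is a minimal illustration and not a method given in this manual: the function name filing_key and the merge_mac flag are hypothetical, and the sketch assumes that names arrive as "Surname, Forenames" strings.

```python
import re
import unicodedata


def filing_key(name, merge_mac=False):
    """Return a sort key that files a name letter by letter.

    The surname (text before the first comma) is compared first, then
    the forenames. Case, spaces, hyphens, apostrophes and diacritical
    marks are disregarded. With merge_mac=True the prefixes M', Mc and
    Mac are all filed as if written "Mac" (the library convention).
    """
    parts = []
    for position, part in enumerate(name.split(",")):
        # Strip diacritics: decompose, then drop the combining marks.
        text = unicodedata.normalize("NFKD", part.strip())
        text = "".join(ch for ch in text if not unicodedata.combining(ch))
        text = text.lower()
        if merge_mac and position == 0:
            text = re.sub(r"^(m['’]|mc|mac)", "mac", text)
        # Keep letters only, so that "Van Ness" files as "vanness".
        parts.append("".join(ch for ch in text if ch.isalpha()))
    return tuple(parts)


us_practice = ["Von Rath", "de Vivo", "O'Brien", "Obst", "Devon", "d'Alembert"]
print(sorted(us_practice, key=filing_key))

library_practice = ["Maclay", "McHale", "Mach", "Mace", "Macham"]
print(sorted(library_practice, key=lambda n: filing_key(n, merge_mac=True)))
```

With merge_mac left off, McHale would file after Maclay; with it on, McHale falls between Mach and Macham, as in the library list above.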

Indexing of verified names is quite simple. The problem, however,
becomes complicated when the exact spelling of the name cannot be
established or when a group of people all have the same name. In such
instances secondary evidence is introduced to pinpoint the individual.
Common items of secondary evidence are birth date, street address,
telephone number, Social Security number, signature, physical de-
scription such as height, weight, color of eyes, sex, and even finger-
prints and photographs.

Where there is doubt about the spelling of a name, the searcher must be
able to scan groups of names in order to select the individual he wants.
The usual library practice is to cross-reference individual names.




Beam see also Beem
Behr see also Baer, Baier, Bair, Baire, Bare, Bayer, Beir, Byer
Beedle see also Beadle, Beidel
Berch see also Birch, Burch
Canady see also Kennedy
Cline see also Clyne, Klein, Kline
Ebel see also Able
Eisenberg see also Isenberg
Lisle see also Lyle, Lysle
McCloud see also McLoud, McLeod
McCrea see also McRea
McElroy see also McIlroy
Mueller see also Miller
Philbrick see also Filbrick
Ray see also Rea, Wray
Read see also Reed, Reid
Rhine see also Ryan
Rogers see also Rodgers
Saxe see also Sachs, Sacks
Sinclair see also Saint Clair, St. Clair
Smith see also Schmid, Schmidt
Weinberg see also Wineberg
Ziegler see also Seigler, Siegler

Cross referencing is sufficient where names are accepted as correct and
it is a matter of directing the searcher to the correct entry in the index.
Where doubt exists as to exactly what the name is, it may be necessary to
have a large number of cross references.

Nickel see also

Niccol      Nichal      Nickell     Nicol       Nikalos
Niccola     Nichala     Nickells    Nicola      Niklas
Niccolai    Nichalas    Nickels     Nicolae     Niklass
Niccolas    Nichali     Nicklas     Nicolais    Nikless
Niccolay    Nichalis    Nicklaus    Nicolas     Nikol
Niccoli     Nichalo     Nickle      Nicolau     Nikola
Niccoll     Nichalos    Nickles     Nicolaus    Nikolaa
Niccolla    Nichals     Nickless    Nicolay     Nikolai
Niccollai   Nicheles    Nickol      Nicoli      Nikolas
Niccollay   Nichels     Nickola     Nicoll      Nikolaus
Niccolls    Nichol      Nickolai    Nicolls     Nikolay
Niccols     Nichola     Nickolas    Nicols      Nikoll
            Nicholas    Nickolay                Nikolls
            Nichole     Nickoll                 Nikols
            Nicholes    Nickolls
            Nicholi     Nickols
            Nicholis
            Nicholl
            Nicholls
            Nicholo
            Nicholos
            Nichols


Such a large number of cross references, even though they may begin with
the same initial letter, are too numerous to be looked up individually. The
method usually adopted, therefore, is to group such names under one
spelling, treat all variants as if they were identical, and search by the
first name. Such a "class" or "bucket" containing all variants can also
carry cross references to other classes or single names where the
relationship between the names is rather tenuous:

James, Jameson, Jamieson, Jamison see also Jamerson
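
One way to picture such a class is as a table that maps every variant spelling to a single filing form. The sketch below is illustrative only: the names NAME_CLASSES, CLASS_SEE_ALSO, build_lookup and filing_form are hypothetical, the variant lists are abbreviated from the examples above, and the choice of "James" as the filing spelling of its class is an assumption.

```python
# Each class is filed under one spelling; all variants are treated as identical.
NAME_CLASSES = {
    "Nickel": ["Niccol", "Nichol", "Nicholas", "Nichols", "Nickle", "Nicol", "Nikolai"],
    "James": ["Jameson", "Jamieson", "Jamison"],
}
# Tenuous relationships are carried as class-level "see also" references.
CLASS_SEE_ALSO = {"James": ["Jamerson"]}


def build_lookup(classes):
    """Map each variant (and the filing spelling itself) to its class."""
    lookup = {}
    for filing_spelling, variants in classes.items():
        for spelling in [filing_spelling, *variants]:
            lookup[spelling.lower()] = filing_spelling
    return lookup


LOOKUP = build_lookup(NAME_CLASSES)


def filing_form(surname, forename):
    """Return the (class spelling, forename) pair by which the file is searched."""
    return (LOOKUP.get(surname.lower(), surname), forename)


print(filing_form("Nichols", "John"))
print(filing_form("Jamison", "Ann"))
print(CLASS_SEE_ALSO.get("James", []))
```

A surname that belongs to no class is returned unchanged, so verified names continue to file under their own spelling.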

Phonetic filing is sometimes used to obtain a partial grouping of similar-
sounding names. This may involve simply dropping vowels:

Brn for Braun
Brwn for Brown, Browne
Jhnsn for Johnson
Jhnstn for Johnston, Johnstone

or may involve grouping of similar-sounding consonants. Under one of
the more popular schemes:

The initial letter is retained.

W, H are dropped except as initial letters.

A, E, I, O, U, Y are also dropped but serve as separators.

Remaining consonants are coded up to three figures, as follows:

1. BFPV
2. CGJKQSXZ
3. DT
4. L
5. MN
6. R

Zeros are added, if necessary, to complete three digits.

Double consonants or equivalents are coded as one letter unless
separated by a separator.

Baird        B630
Bird         B630
Byrd         B630

Johnson      J525
Johnsen      J525
Johnston     J523
Johnstone    J523
Johnstown    J523
Jonston      J523

Lowery       L600
Laughrey     L260

Sachs        S220
Sacks        S222
Saxe         S200
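
The two phonetic schemes just described can be sketched in code. This is an illustration under stated assumptions, not a definitive implementation: the names drop_vowels and phonetic_code are hypothetical, and the sketch treats only literally doubled letters as "double consonants", since that reading reproduces the codes printed in the examples (Sachs S220, Sacks S222). Setting the assumed flag merge_equivalents=True merges adjacent letters of the same code group instead.

```python
# Consonant groups 1-6 as listed in the scheme above.
CONSONANT_CODES = {}
for digit, group in enumerate(["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"], start=1):
    for letter in group:
        CONSONANT_CODES[letter] = str(digit)


def drop_vowels(name):
    """Vowel-dropping form: keep the initial letter, drop later A E I O U."""
    return name[0] + "".join(ch for ch in name[1:] if ch.upper() not in "AEIOU")


def phonetic_code(name, merge_equivalents=False):
    """Code a surname as its initial letter plus three digits."""
    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        return ""
    digits = []
    previous = letters[0]
    for ch in letters[1:]:
        if ch in "WH":
            continue              # dropped; does not act as a separator
        if ch in "AEIOUY":
            previous = None       # dropped, but separates repeated consonants
            continue
        digit = CONSONANT_CODES.get(ch)
        if digit is None:
            continue              # letter outside A-Z; ignored in this sketch
        if previous is None:
            repeated = False
        elif merge_equivalents:
            repeated = CONSONANT_CODES.get(previous) == digit
        else:
            repeated = previous == ch
        if not repeated:
            digits.append(digit)
        previous = ch
    # Keep at most three digits and pad with zeros.
    return letters[0] + "".join(digits)[:3].ljust(3, "0")


for surname in ["Baird", "Byrd", "Johnson", "Johnston", "Lowery", "Laughrey"]:
    print(surname, drop_vowels(surname), phonetic_code(surname))
```

Under the stricter merge_equivalents=True reading, Sachs, Sacks and Saxe would all code as S200; the printed examples keep them apart.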

As can be seen in the examples, it is not possible to group all similar-
sounding names by a phonetic system. Furthermore, special rules must
be developed to avoid scattering such similar names as McLane, McClain,
M'Lean, or Saint Clair, Sinclair, St. Clair.

Also, a formula approach often groups unrelated or dissimilar names:

Hall         H400
Heil         H400
Hill         H400
Hull         H400
Howell       H400
Howeley      H400

As demonstrated in the "Nickel" example, one must use empirically
derived lists of names in order to take care of all possible variants.

There are other techniques for filing names. Although some of these do
have the effect of grouping similar-sounding names, their main purpose
is to develop short codes, digital representations, or to combine with the
name such secondary data as birth date or address in order to develop
unique entries. These are coding techniques and are, therefore, not
considered here.

ORTHOGRAPHY

So far the discussion has been confined to actual name variants and to
variants due to phonetic errors. In some instances where signatures are
used, there are errors due to difficulty in interpreting handwriting. In
such instances one handwritten letter may be confused with another of
similar shape. Such orthographic variations can be readily incorporated
in a name list.

FORENAMES

Forenames may also be grouped in classes. In fact, this is often
necessary because of contractions, nicknames, translations and the like:

James, Diego, Giacomo, Jaime, Jas., Jim, Jimmie, Vaclav,
Venzel, Vincenzo, Waclaw, Wenzel

CORPORATE NAMES

Firm names and other corporate names are treated as personal surnames.
Coined names are filed as written:



Backus, J. C. & Company
Belton, Donald F. & William D. Company
Best Brands Inc.
Best, William
Best's Beauty Salon
Bevans and Beverly Service Co.
Beyer, John
Beyer Real Estate
Bill's Barber Shop
Bit of Honey Shoppe
Board of Trade
C & C Auto Service
Commission on Waterways
Committee for Local Government
Consolidated Edison Co.
Cooper Hotel
Co-operative Housing Firm

NOTE: Articles, conjunctions, ampersands, prepositions, etc., are
ignored in filing.

At times there is difficulty in determining whether the first part of a firm
name should be treated as a forename or used as an entry like a surname:

John Crerar Library
John Hancock Mutual Insurance Co.
John Stewart Methodist Church
Johns Hopkins University
Marshall Field & Co.

The tendency is to file under the first part of the name and to
cross-reference from the second part.



The following frequencies, based on samplings by the Social Security
Administration, can be of help in setting up name indexes:

Length of Surname

Length in Characters    Percentage    Cumulative Percentage
5 or less                  29.53            29.53
6                          24.22            53.75
7                          21.56            75.31
8                          12.81            88.12
9                           6.10            94.22
10                          2.87            97.09
11                          1.15            98.24
12 or more                  1.76           100.00




Distribution of Surnames by Initial Letter

Initial Letter    Percent of Total File in Letter    Rank
A                        3.051                        15
B                        9.357                         3
C                        7.267                         5
D                        4.783                        10
E                        1.888                        17
F                        3.622                        13
G                        5.103                         8
H                        7.440                         4
I                        0.387                        23
J                        2.954                        16
K                        3.938                        12
L                        4.664                        11
M                        9.448                         2
N                        1.785                        18
O                        1.436                        19
P                        4.887                         9
Q                        0.175                        25
R                        5.257                         7
S                       10.194                         1
T                        3.450                        14
U                        0.238                        24
V                        1.279                        20
W                        6.287                         6
X                        0.003                        26
Y                        0.555                        21
Z                        0.552                        22

The Social Security Administration also publishes a list of some 1,500
most common names arranged alphabetically and by size.
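
As one illustration of how such figures might be used, the sketch below divides the alphabet into a chosen number of file sections carrying roughly equal loads. The function name split_alphabet and the greedy cut rule are assumptions made for this example; they are not a procedure given by the Social Security Administration.

```python
# Percent of the total file under each initial letter (from the table above).
PERCENT = {
    "A": 3.051, "B": 9.357, "C": 7.267, "D": 4.783, "E": 1.888, "F": 3.622,
    "G": 5.103, "H": 7.440, "I": 0.387, "J": 2.954, "K": 3.938, "L": 4.664,
    "M": 9.448, "N": 1.785, "O": 1.436, "P": 4.887, "Q": 0.175, "R": 5.257,
    "S": 10.194, "T": 3.450, "U": 0.238, "V": 1.279, "W": 6.287, "X": 0.003,
    "Y": 0.555, "Z": 0.552,
}


def split_alphabet(percent, parts):
    """Cut the alphabet into `parts` consecutive runs of letters.

    A run is closed as soon as the cumulative load reaches the next
    equal share of the total (a simple greedy rule, assumed here).
    """
    total = sum(percent.values())
    groups, current, running = [], [], 0.0
    for letter in sorted(percent):
        current.append(letter)
        running += percent[letter]
        next_share = total * (len(groups) + 1) / parts
        if len(groups) < parts - 1 and running >= next_share:
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups


for group in split_alphabet(PERCENT, 4):
    load = sum(PERCENT[letter] for letter in group)
    print(f"{group[0]}-{group[-1]}: {load:.3f} percent of the file")
```

Because the cut is made only at letter boundaries, the sections are not exactly equal; a finer division would need the distribution of second letters as well.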




SUBJECT INDEXING

Man has always systematized and organized his knowledge so as better to
understand and use it. As the scope of his knowledge has changed and
expanded, he has adapted his tools to control it. Today, with the acceler-
ated growth of scientific, technical and commercial information which
must be available for use very quickly, and with the development of
mechanisms to organize and reproduce large masses of information, there
is a crisis in the whole field of information storage and retrieval. Long-
established information systems are being reappraised and many new
approaches are being tried. The skills and vocabularies of many different
disciplines are being brought to bear on the problem. Words are being
coined or borrowed from other subject areas to describe the various
systems. Thus, although there may be much progress, there is also
much confusion.

Much of the confusion can be avoided by relating things to basic
principles. In the case of subject indexing there are essentially only
three fundamental approaches: classification, subject headings and
coordinate or manipulative headings. Practically all specialized indexing
systems use one of these approaches or combinations of them. Each has
unique qualities and abilities as well as deficiencies. Each must be
carefully selected and adapted for the job to be done.

Classification

Classification is a systematic, logical arrangement of index entries
usually in a hierarchical or tree pattern. The standard library classi-
fication systems, such as Dewey Decimal, Bliss, Cutter, Library of
Congress and Universal Decimal, all try to be hierarchical systems.
The terms are arranged so that they proceed from the most general to
the most specific:

Dewey Decimal Classification

Notation    Term
700         Fine arts
720         Architecture
721         Architectural construction
721.8       Openings and their fittings
721.81      Doors

Library of Congress

Q Science
QC Physics
QC 125 Treatises on experimental mechanics
QC 151 Liquids in motion. Hydrodynamics

Highly developed hierarchical systems, such as zoological and botanical
classifications, may go through more than 20 steps descending from
kingdom through phylum, superclass, class, subclass, infraclass,
cohort, order, suborder, family, subfamily, tribe, genus, species, and
so on. Such a logical arrangement of an index is extremely useful. Since
it is not necessary to alphabetize the entries, the classified index has the
same order in any language, and the language barrier is thus overcome.
Class catalogs, therefore, have been very popular in Europe and wherever
multilingual groups have had to consult the catalogs and indexes.

Since the position of a topic is fixed and not dependent on language, the
synonym problem is eliminated and the need for cross references is
reduced. Cross references to show relationships of topics in different
classes are, however, necessary and most classification schemes have
extensive cross references.

Most important, a hierarchical arrangement permits one to search at any
level of indexing. By using an expanding notation, as in the Dewey
Decimal system, or some other graded code, the search constraints can
be set to include as broad or as narrow a subject as one desires. For
example, one wants information on hexose. Depending on the size of the
original text and the depth of the indexing used, this information might be
indexed variously as:

Hexose
Monosaccharide
Sugar
Carbohydrate

This is actually the hierarchical order, going from the specific to the
more general. In an index alphabetically arranged by subject headings,
such references would be scattered; in a classified index they would be
brought together. A classified index, therefore, employing a code which
in its structure reflects the generic relationships of the index, makes for
an excellent mechanical retrieval system. It is simple to search at any
level of specificity. If a hit is not made at a very specific level, one can
automatically go to the next, more general level and so on until a hit is
made, assuming, of course, there is informational material on the
subject in the file. A classification code number, therefore, not only
stands for the input description of a subject in any language, but also
brings the subject into some logical relation with other subjects. Further,
it provides a simple and efficient address for mechanized storage and
retrieval.
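
The fallback from a specific class to successively broader ones can be sketched for a Dewey-style expanding notation. This is an illustrative sketch only: broader_levels and search are hypothetical names, the notation is assumed to be a three-digit base with an optional decimal extension, and the file contents are invented for the example.

```python
def broader_levels(notation):
    """List a class number followed by its successively broader classes.

    "721.81" gives 721.81, 721.8, 721, 720, 700: one significant digit
    is removed at each step and the base is padded with zeros.
    """
    digits = notation.replace(".", "").rstrip("0")
    levels = []
    for length in range(len(digits), 0, -1):
        prefix = digits[:length].ljust(3, "0")
        level = prefix if len(prefix) == 3 else prefix[:3] + "." + prefix[3:]
        if not levels or levels[-1] != level:
            levels.append(level)
    return levels or [notation]


def search(index, notation):
    """Return the first level, from specific to general, that has a hit."""
    for level in broader_levels(notation):
        hits = index.get(level)
        if hits:
            return level, hits
    return None, []


# Invented file: class number -> documents filed under it.
INDEX = {
    "720": ["General survey of architecture"],
    "721.8": ["Handbook of openings and fittings"],
}

print(broader_levels("721.81"))
print(search(INDEX, "721.81"))
print(search(INDEX, "721.3"))
```

The same loop serves any graded code in which a shorter notation denotes a broader class; only the rule for shortening the notation changes.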

Classification, however, has certain disadvantages. An alphabetic index
(Dewey calls this the relative index) is needed in order to find where topics
are filed:

Topics                              Dewey Decimal Classification

Oil
  Animal (chemical analysis)        543
  Animal (chemical technology)      665
  Baths                             542
  Burning, locomotives              621
  Coal (economic geology)           553
  Cooking                           641
  Cookstoves                        643
  Domestic fuel                     644
  Feeders (lubrication)             621
  Gages (motor vehicles)            629
  Heaters                           644
  Insulating material               621
  Lamps                             644
  Light                             644
  Motor vehicles                    629
  Painting (Art)                    759
  Painting (Building)               698
  Plants (Agriculture)              633
  Plants (Botany)                   581
  Refining                          614

It is necessary, therefore, to go through two steps to find something.
First an alphabetic index must be consulted to find the class number, then
the class number looked up to find the reference. This slows the search
and makes it more expensive.

Also it is necessary to provide for future expansion of a classification
scheme so that new terms may be interpolated anywhere in the scheme.
In rapidly developing subjects this can cause difficulty, especially where
unforeseen changes occur.

The major difficulty, however, derives from the fact that the demands
made on a retrieval system have really nothing to do with logical or
hierarchical arrangement. To begin with, there is often no natural basis
for a logical arrangement such as is found in biology or chemistry:

Thing
Substance
Chemical compound
Organic compound
Hydroxy compound
Carbohydrate
Sugar
Monosaccharide
Hexose
d-glucose
beta-d-glucose

Rather, most classifications are artificial or synthetic:

Universal Decimal Classification

6              Applied science. Medicine. Technology
66             Chemical technology
669            Metallurgy
669.7          Light metals in general
669.71         Aluminum. Aluminum alloys
669.713        Extraction of aluminum and aluminum
               alloys from aluminum compounds
669.713.7      Electrolytic production
669.713.72     Fused salt-bath electrolysis
669.713.723    Electrolysis of aluminum or other
               oxygen-bearing compounds of
               aluminum in halide bath

It is really only in nature that one finds a true hierarchy. In almost all
other cases it is an artificial or pseudo-hierarchy, sometimes called a
chain, representing a particular point of view. There are, therefore, as
many workable artificial hierarchies or chains as there are points of view.

In this discussion of classification so far we have used the term hierarchy
to describe the relationship between the subdivisions of an index. This is
traditional but not very accurate. Actually, all that should be conveyed
is that there is a relationship between the topics listed under each index
entry. Subdividing a topic does not mean splitting a class into a subclass.
Moreover, even where a true hierarchy exists, searching a file need not
be hierarchical; in fact, is most likely not to be. For example, if one
searcher is interested in dogs as pets, another in dogs as disease
vectors, a third in dogs as guardians, none of these searchers derives
any benefits from using an index which carefully shows the hierarchical
relationships between a specific breed of dogs, canines and mammals in
general. In other words, not all documents relevant to a given class are
found in that class:

Subject Heading                    Library of Congress Classification

Dogs
  Care and breeding                SF427
  Diseases                         SF991
  Folklore                         GR720
  Legends and stories              QL795.D6
  Manners and customs              GT5890
  Pictures, illustrations          N7660
  Police dogs (Breed)              SF427.S6
  Police dogs (Social economy)     HV8025
  Taxation                         HJ5791
  War use                          UH100
  Zoology                          QL737.C2

Recognizing that hierarchy does not meet modern needs, especially of
inter-disciplinary literature, a number of people have devised classifi-
cation schemes in which various classes and categories can be combined
at will. A subject file is analyzed to discover the basis for its classifi-
cation. The various terms are grouped into categories and rules are
worked out which govern the order of citation of these categories. Such
a classification is often referred to as faceted or "analytico-synthetic."
One of the best known systems of this type is the Colon Classification
devised by S. R. Ranganathan. There are also many elements of this free
combination in the Semantic Coding developed by J. W. Perry and in the
older Universal Decimal Classification scheme. The ability to use
separate lists of related concepts, to expand these lists and add to them as
needed has made this type of classification a more flexible tool than a
classification that tries to be purely hierarchical or, as the colon classi-
fiers call it, "enumerative."

The facet classifiers consider a class a homogeneous subject such as
chemistry, physics, medicine, agriculture, history, etc. A category is
a differentiation within a class on the basis of various characteristics.
In Chemistry, for example, there are categories such as kind, state,
property, reaction, operation, device, etc. Alcohol is a kind of
chemical, liquid is a state, volatility is a property, combustion is a
.reaction, analysis is an operation, and a flask is a device. In the class
Medicine there are such categories as organs (heart), problem (disease),
symptom (fever), agent (virus), handling (surgery), etc. Within the
categories there can, of course, be hierarchies.

The order in which these categories are to be arranged can be prescribed
so that, for example, an organ is always first, a problem is second, a
symptom third, a handling fourth, and so on. Thus an article describing
the use of penicillin to cure an inflammation of the skin would read

Skin - Inflammation - Therapy - Penicillin
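
A prescribed citation order of this kind can be sketched as follows. The sketch is illustrative only: the names CITATION_ORDER and cite are hypothetical, and the placing of the penicillin term in an "agent" category cited after "handling" is an assumption, since the text fixes only the first four positions.

```python
# Assumed citation order for the class Medicine; the text prescribes
# organ, problem, symptom, handling. Placing "agent" last is an assumption.
CITATION_ORDER = ["organ", "problem", "symptom", "handling", "agent"]


def cite(facets, order=CITATION_ORDER, link=" - "):
    """Arrange the terms of a document in the prescribed category order.

    `facets` maps a category name to the term chosen within it.
    Categories not present for a document are simply skipped.
    """
    unknown = set(facets) - set(order)
    if unknown:
        raise ValueError(f"no citation position for: {sorted(unknown)}")
    return link.join(facets[category] for category in order if category in facets)


article = {
    "agent": "Penicillin",
    "organ": "Skin",
    "handling": "Therapy",
    "problem": "Inflammation",
}
print(cite(article))
```

Whatever order the indexer supplies the terms in, the entry always comes out in the same sequence, which is what allows the searcher to predict where it will file.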

Using a proposed faceted classification for nuclear energy, the notation

R212.2D2O-081.2-071AIR-061-022

means

"Start-up of thermal reactor, moderated by D2O using enriched
uranium fuel with air coolant, for research."

R2       = Reactors
R212.2   = Thermal reactors
D2O      = (Heavy water)
081.2    = Enriched uranium (used as fuel in a reactor)
071AIR   = Gas cooled
061      = Research
022      = Start-up

The facets in this example are linked by dashes. Other linkages and
relationships can be shown by using colons, zeros, or apostrophes.
Using examples of the Universal Decimal System:




538.114:669.245.3 = Ferromagnetism of nickel copper alloys

538         = Magnetism
538.114     = Special theory of ferromagnetism
669         = Metallurgy
669.2       = Nonferrous metals
669.245     = Nickel alloys
669.245.3   = Copper-nickel alloys

621.365.2.078  = Automatic regulation of arc furnace
546.623'32'226 = Potassium aluminum sulphate

An example of another faceted classification is:

CcIufNbj = Transonic flow over a bent airfoil
Cc = Airfoil
Iuf = bent
Nbj = transonic flow

A colon classification example would look like this:

L2153:4725:63129:B28 = Soft palate - Cancer - Radium Treatment -
Statistical study

L = Medicine
L2 = Digestive system
L21 = Mouth
L215 = Palate
L2153 = Soft Palate
L2153:4 = Disease and so on

An example of the Semantic Code is:

MWTL.PASS.RQHT.001 = Heat treating
MWTL = Metal
PASS = Processing
RQHT.001 = By means of heat

Nevertheless, such synthetic or artificial classifications, when developed,
still represent, individually, a single rigid approach to a subject. A
fixed classification, as has been shown, often does not coincide with the
needs and viewpoint of the searcher, nor does it really avoid the
problems of expansion. This does not mean that classification is not a
valuable tool in the preparation of indexes. Under certain circumstances
it makes for a good index and it can also be helpful, as will be shown, in
the preparation of alphabetic subject headings.

Classification, in general, is better suited for well-established subjects
where there is not much change or expansion. And it is better suited
where the index users have a single, unified and rather specialized view-
point. If a library is concerned with basically a single subject and the
users of the index or catalog have either a uniform viewpoint of the subject
matter or at least understand or are in agreement as to the organization
of that subject, then a classification scheme can be very useful.

Subject Headings

Most American libraries use a classification scheme to arrange books
and other publications on their shelves but use alphabetic subject headings
to catalog and index the collection. An alphabetic subject index uses a
single word, phrase or noun combination that fully and exactly identifies
the subject matter:

Astatine
Civil engineering
Flower arrangement, Chinese [Japanese, etc.]
Gases - Liquefaction
Ionization in water
Ionization of gases
Maps, Military - History
Mathematics as a profession
Packaging - Materials, Aluminum
Shielding (Electricity)
Shielding (Radiation)
Heart - Diseases - Research
Tungsten - Physical properties - Tensile strength - High temperature
Uranium - Rolling (Alpha-phase)

An alphabetic subject index is an extremely efficient tool for finding
specific subjects. It has only one arrangement and is self-indexing.
Access to each subject is direct. Natural language is used and no trans-
formation into a class or code is necessary. The public can use it without
special instruction. New terms may be introduced whenever and wherever
needed.

The main problem with subject headings is to bring the vocabularies of
both the index and the searcher into coincidence, so that the information
sought is not missed. In other words, the searcher coming to the index
must use the same words in the same order as the index does, in order
to find the entries he is seeking. Generally speaking, language has a
fairly stable semantic history, and many names of elements, materials,
concepts and forms are unique and fixed. The same terms are used in
many different indexes over long periods of time. In some subjects, such
as chemistry, the terms used are often generated by accepted rules and
are unambiguous.

There are, on the other hand, many synonyms, near synonyms,
overlapping terms, vague terms, erroneous and superseded terms and other
possible sources of terminological difficulties. Most of these can be
overcome by providing adequate cross references of the "see" and "see
also" variety:




Airstrips                 see Airports - Runways
Berlin air lift           see Berlin - Blockade, 1948-1949
Boring machinery          see also Rock drills
Distillation apparatus    see also Column packing; Evaporators;
                          Packed columns
Invertebrates             see also Arachnida; Arthropoda;
                          Brachiopoda; Coelenterata; Crustacea;
                          Echinodermata; Insects; Larvae -
                          Invertebrates; Mesozoa; Mollusks;
                          Myriapoda; Polyzoa; Protozoa;
                          Sponges; Worms
Medical care plans        see Insurance, Health; State medicine
Medical examiners         see Coroners and medical examiners

Some cross references are more elaborate and even resemble thesauri:

Counting devices          Electrical or mechanical devices for
                          registering or recording numbers, not
                          to be confused with radiation detection
                          instruments which are often called
                          counters
                          see also Radiation detection instruments;
                          Radiation detectors; Scalers
Heart - Diseases          see also Angina pectoris; Arrhythmia;
                          Chest - Diseases; Coronary heart
                          disease; Endocarditis; Heart - Valves -
                          Diseases; Rheumatic heart disease
Indians - Legal status,   see also subdivision Legal status, laws,
laws, etc.                etc., under names of groups of
                          Indians and names of individual Indian
                          tribes; e.g., Indians of North
                          America - Legal status, laws, etc.;
                          Cherokee Indians - Legal status,
                          laws, etc.
Mental health laws        Here are entered works on laws dealing
                          with the care of the insane, the
                          mentally ill, the mentally handicapped,
                          alcoholics, epileptics, and narcotic
                          addicts. Works dealing separately
                          with alcoholics, epileptics, or narcotic
                          addicts are entered under the specific
                          headings. Works on the legal status
                          of the insane are entered under the
                          heading Insanity - Jurisprudence.

Such explanations, usually referred to as scope notes, are effective not
only in defining subject headings but also showing exactly the categories
in which they fall and their range of applicability.

The problem is somewhat more complicated where terms for new
concepts must be chosen. In the areas where language has not been
stabilized, the choice of the correct term may have to be tentative and
subject to later revision. This, however, is easier to do than to try to
find a new slot in a classification scheme.

Another source of language difficulty is the tendency for information
requesters not to formulate their questions precisely. Generally speaking,
they tend to phrase their inquiries in the broadest terms, asking, for
example, for a treatise on physics when they really want to know the slow
neutron cross section of zirconium. To overcome this, librarians build
a pyramid of cross references going from the general to the specific and
making cross references to related subjects:

Engineering see also Civil engineering
Civil engineering see also Mining engineering
Mining engineering see also Petroleum engineering
Petroleum engineering see also Oil wells

Since classification provides at least one hierarchy, the need for such
cross references is somewhat reduced in classification schemes, but is
by no means eliminated.
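
The behavior of "see" and "see also" references, including the pyramid just shown, can be sketched as two small tables. This is a minimal illustration: the names SEE, SEE_ALSO, preferred_headings and related_headings are hypothetical, and the entries are taken from the examples in this section.

```python
from collections import deque

# "see": the term is not used; look under the preferred heading(s) instead.
SEE = {
    "Airstrips": ["Airports - Runways"],
    "Medical care plans": ["Insurance, Health", "State medicine"],
    "Medical examiners": ["Coroners and medical examiners"],
}
# "see also": the term is used, and related headings may also hold material.
SEE_ALSO = {
    "Boring machinery": ["Rock drills"],
    "Engineering": ["Civil engineering"],
    "Civil engineering": ["Mining engineering"],
    "Mining engineering": ["Petroleum engineering"],
    "Petroleum engineering": ["Oil wells"],
}


def preferred_headings(term):
    """Follow a "see" reference; a term without one is its own heading."""
    return SEE.get(term, [term])


def related_headings(term):
    """Walk the "see also" references outward from a term, nearest first."""
    starts = preferred_headings(term)
    seen = set(starts)
    queue = deque(starts)
    found = []
    while queue:
        heading = queue.popleft()
        for target in SEE_ALSO.get(heading, []):
            if target not in seen:      # guards against circular references
                seen.add(target)
                found.append(target)
                queue.append(target)
    return found


print(preferred_headings("Airstrips"))
print(related_headings("Engineering"))
```

A searcher who begins with the broad term Engineering is thus led step by step down the pyramid to Oil wells, while one who begins with Airstrips is redirected at once.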

In addition to cross references, sometimes multiple entries are provided
for the various related terms so that no matter where a searcher enters
the file he will find the desired references. Multiple entries, however,
can be used only very sparingly; otherwise the index will become too large
to handle.

Particles               see also headings such as Nickel powders
                        see also Alpha particles; Beta particles;
                        Charged particles; Dusts; Elementary
                        particles; Nuclear particles; Powders;
                        S particles; T particles; V particles
Charged particles       see also Ions; Particles
Dusts                   see also Aerosols; Particles; Powders
Elementary particles    see also specific particles, e.g., Mesons
                        and V particles. For elementary
                        particles with zero spin, see also
                        Bosons, and for those with nonintegral
                        spin see also Fermions
                        see also Antiparticles; Strange particles
Nuclear particles       see also the specific particles concerned
                        see also Elementary particles; Nucleons;
                        Radiation
Powders                 see also powders of specific elements
                        see also general headings of the form
                        Oxide powders in the list below for
                        lists of powders of specific compounds
                        see also Fluoride powders; Glass
                        powders; Graphite powders; Hydride
                        powders; Metal powders; Oxide
                        powders; Particles; Steel powders;
                        Sulfate powders; Sulfide powders

Another approach is to group terms into small classifications so as to
bring like things together. In order to preserve the alphabetic order of
the entries, the usual technique is to invert the subject heading and thus
make the noun the file word:


Geometry, Algebraic
Geometry, Analytic
Geometry, Descriptive
Geometry, Differential
Geometry, Enumerative
Geometry, Infinitesimal
Geometry, Plane
Geometry, Projective
Geometry, Solid

Some alphabetic subject heading indexes tend, therefore, to be hybrid
schemes, for they include small class groups in what are otherwise
direct entry lists. Modern research libraries, however, prefer not to
use inverted headings and, instead of class groupings, rely on cross
references.

In order to make logically connecting cross references and thus tighten
the connective structure, indexers and catalogers sometimes first
develop classified chains of hierarchical definitions. Such a systematic
classified list is then used to develop the actual subject headings and their
scope notes, which define them, in order that the headings be precise
and not overlap. In other words, a classification can be a guide for the
development of subject headings and cross references.

For example, the hierarchy or "chain" shown in the Classification section:

Organic compound
Hydroxy compound
Carbohydrate
Sugar
Monosaccharide
Hexose
d-glucose
beta-d-glucose

tells the indexer that cross references from any one of these terms
should be made to the others. But, as was explained in the Classification
section, there can be several different hierarchies for Sugar, for
example, and therefore this chain is only partially helpful in making
cross references.

Since compound subject headings are usually required to describe
adequately an entry, the possible permutation of terms can cause diffi-
culty. Entries might appear variously as:

Copper-tungsten-zinc alloy - Phase diagram
Zinc-copper-tungsten alloy - Phase diagram
Tungsten-zinc-copper alloy - Phase diagram
Alloys - Copper-zinc-tungsten - Phase diagram
Phase diagrams - Copper-zinc-tungsten alloy




This problem has never been adequately solved. A few conventions such
as listing the constituents of alloys, cermets, etc., in alphabetic order
as in the first example can help a little. General vague rules such as
putting the "most significant" word first, or developing categories of
words - realization, material, processes and problems, place, time,
form - and assigning an order to these categories, as do the facet
classifiers (see the Classification section), really do not help very much.
Very detailed indexes permute or "rotate" the entry word and so provide
multiple entries rather than use "see also" references. In general, however,
such a multiplicity of entries will bulk a manual index so that it becomes
difficult to use.
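
The alphabetic convention for alloy constituents and the rotation of entry words can be sketched as follows. The function names alloy_heading and rotated_entries are hypothetical, and cyclic rotation is only one of several possible ways to permute a heading; it is chosen here for brevity.

```python
def alloy_heading(constituents):
    """List the constituents of an alloy in alphabetic order."""
    names = sorted(name.lower() for name in constituents)
    return "-".join(names).capitalize() + " alloy"


def rotated_entries(terms, link=" - "):
    """Return one entry per term, each term in turn brought to the front."""
    return [link.join(terms[i:] + terms[:i]) for i in range(len(terms))]


# Any order of constituents yields the same heading.
heading = alloy_heading(["zinc", "copper", "tungsten"])
print(heading)

for entry in rotated_entries([heading, "Phase diagram"]):
    print(entry)

for entry in rotated_entries(["Zirconium", "Physical properties", "Tensile strength"]):
    print(entry)
```

A heading of n terms produces n entries under rotation, which is the source of the bulk noted above: every added level of subdivision adds another entry for every document.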

Although subject headings can be very precise, from a practical point of
view they are usually not as precise or detailed as they should be. This
is due to the fact that the indexer or cataloger, for reasons of economy,
usually indexes to the level of the document rather than to the level of the
concepts in the document. For example: Two documents are received,
one a brief account on the tensile strength of zirconium at 800° F, the
other a large report with very elaborate tables and graphs giving all the
known physical properties of zirconium. The first document would be
indexed:

Zirconium - Physical properties - Tensile strength - High Temperature

The second document, which actually has much more detailed information
on the high temperature tensile strength of zirconium, would be simply
indexed as:

Zirconium - Physical properties

The unsophisticated searcher coming to the index or catalog looking for
the high temperature strength of zirconium would find the first document
but not the second, unless he took the trouble to read through all the
entries under the broader headings. Conversely, anyone approaching the
index by the broader heading Physical properties might miss the first
document.
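
The two failures just described can be sketched as a search that also consults the broader and the narrower headings. This is an illustration only: the names broader_headings, search_with_broader and search_with_narrower are hypothetical, and the two index entries stand for the two zirconium documents of the example.

```python
LINK = " - "

# Heading -> documents entered under it (the two documents of the example).
INDEX = {
    "Zirconium - Physical properties - Tensile strength - High temperature":
        ["brief account"],
    "Zirconium - Physical properties":
        ["large report"],
}


def broader_headings(heading):
    """List a heading followed by each successively broader heading."""
    terms = heading.split(LINK)
    return [LINK.join(terms[:n]) for n in range(len(terms), 0, -1)]


def search_with_broader(index, heading):
    """Collect documents under the heading and under every broader one."""
    hits = []
    for level in broader_headings(heading):
        hits.extend(index.get(level, []))
    return hits


def search_with_narrower(index, heading):
    """Collect documents under the heading and under its subdivisions."""
    hits = []
    for entry in sorted(index):
        if entry == heading or entry.startswith(heading + LINK):
            hits.extend(index[entry])
    return hits


specific = "Zirconium - Physical properties - Tensile strength - High temperature"
print(index_hits := INDEX.get(specific, []))
print(search_with_broader(INDEX, specific))
print(search_with_narrower(INDEX, "Zirconium - Physical properties"))
```

The direct lookup finds only the brief account; the widened searches recover both documents, at the cost of examining more entries.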

Librarians have, of course, prepared separate index entries for various
portions of a book. Such "analytics" have been used primarily where a
publication covers a variety of topics that cannot be grouped conveniently.
Analytics have also been used to bring out subjects for which the library
does not have separate publications.

Indexers sometimes use broader headings and rely on the bibliographic
information carried with the entry to help the searcher select the specific
references he needs. On unit library catalog cards, the full author and
title and often an abstract or notes give a great deal of specific information
not covered by the subject heading. In indexes of abstract journals, unless
the complete bibliographic entry is