2.
Background
This section describes briefly some of
the concepts and techniques used for representing polygonal areas.
The simplest approach is to encode each area object as a separate entity,
without any regard for overlaps and adjacencies between areas. This leads
to two immediate problems. Firstly, boundaries between adjacent areas are
encoded twice, once for the area on each side, thus causing redundant
information to be stored and possible 'sliver' lines to appear when
plotting more than one object. Secondly, the lack of information about the
spatial relationships limits the ability to manipulate areas and any
associated substantive data.
One of the first systems to incorporate an explicit topological structure
was the Dual Independent Map Encoding (DIME) system of the U.S. Bureau of
the Census (Cooke and Maxfield, 1967). It was designed to represent a
single non-overlapping set of area objects. The basic element is the
straight line, uncrossed by any other, called a segment. The segment is
identified by its start and end points, called nodes, and the attribute
codes for the areas on either side of it.
The DIME philosophy was extended in POLYVRT (Laboratory for Computer
Graphics and Spatial Analysis, 1974). In this model the chain replaced the
segment as the basic element. Like a DIME segment, a chain has nodes at
its two end points, it separates two areas, and is assumed to be
uncrossed. It may, however, consist of many points, and it represents the
whole of a continuous section of boundary between two areas. In this
structure the number of points required to describe a boundary between two
areas has become unimportant; it is always represented by a single chain.
Thus the chain plays an important topological role, as well as forming the
interface between the geometry and the area objects. POLYVRT also keeps a
list of the chains which form the polygonal boundary of each area. POLYVRT
first introduced the cartographic data structures which have continued to
form the backbone of a variety of internal data structures used for
representing segments/chains today.
A consequence of the role that chains play in the POLYVRT structure is a
restriction in the types of area it can represent. For example, it is only
able to record explicitly the topological relationships and boundaries of
a single non overlapping set of areas. In an attempt at greater
flexibility, Peucker and Chrisman (1975) introduced GEOGRAF, which uses
the Least Common Geographic Unit (LCGU) as the controlling unit of the
data structure in place of the chain. The LCGU is defined as that part of
the plane surface uncut by further partitioning, and its boundary is
constructed as a POLYVRT polygon directly from chains. The chain now
becomes the boundary between two LCGUs, and thus assumes a purely
geometric role, rather than being the boundary between two area objects.
Also, the uncut spatial units (LCGUs) created by the chains no longer
represent the area objects directly; instead they are each given a unique
identity and are used as a building block in forming the area objects.
GEOGRAF is able to model multiple sets of area objects. Thus if an
administrative hierarchy of districts, counties, and countries was
represented, GEOGRAF could conceive of the three levels as separate sets
of areas rather than as a single set of areas (with hierarchic area
codes). Each LCGU would record the attribute codes of the area it belongs
to at each level. GEOGRAF can similarly represent multiple non
hierarchical data sets, such as county and parliamentary areas. The
boundary between two objects of a set, for example between two counties,
is described by a chain group, which is an ordered set of chains. The
polygonal boundaries of these objects, in turn, consist of chain groups
and will from now on be called GEOGRAF polygons.
The GIMM$ segment format is an extension of POLYVRT. It allows composite
area codes to be supplied for POLYVRT type chains, and from these it is
able to extract GEOGRAF polygons. However, GIMMS is based on the chain (POLYVRT)
rather than the LCGU, and is thus unable to combine the geometric and
geographic descriptions for areas in a flexible manner.
References to objects to the left and right of segments/chains/chain
groups allow the above models to cope with detached parts and holes (one
or more area objects completely within another area object) without having
explicit knowledge of their existence. Edwards et al (1977) proposed a
hierarchical data structure (HDS) for representing areal data which has
holes, holes in holes, and so on. From POLYVRT type chains which refer to
area objects on either side, they form a set of directed polygonal
boundaries, each of which has anti clockwise closure and a common area
code on the left or right side. Each directed polygonal boundary is then
used to define an interior or exterior region. An interior region is the
entire area within a boundary which has a common left area code, and an
exterior region the entire area outside a boundary with a common right
area code. The interior and exterior regions do not contain holes.
Area objects with holes are defined in terms of the interior and exterior
regions. Each interior region corresponds to the outer extent of one
disjoint part of an area object, and each exterior region corresponds to a
hole in an area object. The hierarchical data structure is used to model
the nesting (or containment) that exists between the interior and exterior
regions. This containment hierarchy is computed automatically based on the
knowledge of which area object a region includes or excludes, and also by
performing spatial calculations. Once the hierarchy is formed, each
disjoint part of an area object can then be defined by taking the
intersection of an interior region together with all the exterior regions
which are immediately below it in the hierarchy.
The advance of HDS over POLYVRT is that it uses chains to define
boundaries of regions (which enclose and exclude area objects) rather than
to define the area objects themselves. Modelling the hierarchy of these
regions allows for the explicit representation of holes and the spatial
relationships that result. The chain remains the boundary between two area
objects, however, and a consequence of linking the geometric and
geographic information in this manner is that the system cannot represent
a set of areas which contain overlaps, or several sets of areas
simultaneously. Since HDS is similar to our model further discussion of
HDS is postponed until Section 3.3.
All the above systems require the linework of boundaries to be input
together with left/right codes for the area objects on either side of the
boundaries (or, for the LCGUs in the case of GEOGRAF). Although this
expedites computer processing, it presents a poor human computer interface
and places an unnecessary burden on users. Input to the GIRAS system
(Mitchell et al, 1977) consists of arcs and polygon labels. Arcs, unlike
chains, do not carry left/right tags at input time. Along with the line
data, one polygon label for each polygon of the map is entered in the
system. A polygon label is an arbitrary point within each polygon with
which is associated a not necessarily unique integer attribute to describe
the polygon in which it is placed. The arcs are then connected by software
to form polygons, and the arcs and polygons are internally cross
referenced and the attributes of the polygon labels assigned to both . The
GIRAS structure is described (Mitchell et al, 1977, p 5) as topologically
similar to that introduced by Peucker and Chrisman (1975). However, the
attributes of the polygon labels need not be unique, which suggests that
GIRAS does not use the GEOGRAF model as such, but that the polygons
correspond directly to the detached parts of area objects as in POLYVRT.
GIRAS also explicitly recognises islands (the GIRAS term for holes) and
compound islands (islands which are subdivided) by defining the extent of
each polygonal area by the arcs of its outer perimeter in a clockwise
direction and the arcs of any interior islands in an anti clockwise
direction. The outer and inner extents of a complex polygonal area are
fixed using the position of the polygon label (see Mitchell et al, 1977, p
11). Thus GIRAS, like HDS, uses directed polygonal boundaries, although
the various boundaries which define a polygonal area do not each have a
separate identity in GIRAS. The geometric hierarchy of holes within holes
remains implicit in GIRAS. However, it is possible to encode a geographic
hierarchy between area objects (such as that between administrative areas)
by associating a composite feature code with the polygon label. GIRAS
therefore uses a number of ad hoc procedures to circumvent problems in
spatial data processing without seeking to accommodate them within an
underlying conceptual model of areal entities.
The Level 3 Digital Line Graph (DLC 3) format (Allder and Elassal, 1984),
as the name implies, uses a link and node structure to encode the area,
line, and point features of a map together with their spatial
relationships. The attributes of lines and points are associated with the
links and nodes, and the links also contain explicit information about the
topological relationships. When areas are represented, each closed part of
the plane surface created by the linework of the boundaries is given a
unique identity, in a similar way to the LCGUs in GEOGRAF. The links
encode the adjacency relationships between LCGUs by referring to the LCGU
on either side. The attributes of each LCGU, that is the area objects to
which they belong, are associated with a digitised point (which need not
be contained within the LCGU). Attributes are not encoded if they can be
derived from relationships to adjacent elements. For example, a link is
known to be a boundary between two area objects if the attribute codes for
the LCGUs on either side of the link are different for that class of area
object. The links which form the polygonal boundaries of LCGUs and area
objects are not explicitly stored in the structure further processing is
required to derive this information.
ARC/INFO is a modular geographic information system with facilities for
input, analysis, data management, and output of geographic data. Two data
models are used within ARC/INFO. Cartographic data (locational and
topological) are represented using a topological data model said to be
based on DLG (ESRI, 1985, p 2.9). Thematic data (or attribute data) are
represented using a relational model. In the name ARC/INFO, ARC and INFO
refer to these two structures respectively, and ARC/INFO refers to the
composite data model and the commands provided to manage and manipulate
data held in these structures.
The input of polygonal data to ARC/INFO is similar to GIRAS in that it
consists of arcs, without left/right tags, and polygon labels. Moreover,
ARC/INFO allows even greater freedom in how they are digitised. Arcs can
be digitised as 'spaghetti' without the user having to identify
intersections (nodes), as the system can identify intersections
automatically prior to polygon building. Also, the entry of polygon labels
is optional, as the system does not use them to fix the outer and inner
extents of complex polygons when automatically deriving the topology. A
not necessarily unique attribute code is supplied by the user for each arc
and polygon label at the time of digitising. The system also assigns to
each a unique internal sequence number.
When the polygon topology is generated, each polygonal area formed is also
assigned a unique internal sequence number by the system, together with
the attribute code supplied by the user for the polygon label contained
within the polygon (or zero if there is no label within the polygon). The
system also creates a polygon attribute table and an arc attribute table
which can be accessed by the user through the INFO relational database
system. These tables contain one record for each polygon or arc, and
initially hold the system number and user attribute code, and the area and
perimeter of polygons and the topology and length of arcs (ESRI, 1985, pp
2.12 2.16). The user may store additional attributes in these tables, or
in additional attribute tables which can be associated with these tables
using relational operations.
The data structures and the commands provided by ARC/INFO give the user
considerable flexibility in how area objects are represented. For example,
the user could assign a unique code to each polygon label (and thus each
polygon), and regard the polygons as being similar to LCGUs (as in GEOGRAF
and DLG 3). Several overlapping sets of area objects could then be
represented in the same data set by holding a list of area codes for each
polygon in the polygon attribute table. The attributes of each set of area
objects could be held in separate attribute tables.
An alternative approach would be to use a separate data set for each set
of nonoverlapping areas, and assign their area codes to the polygon
labels. The polygons would then correspond directly to the detached parts
of area objects (as in POLYVRT and GIRAS), and the attributes of the area
objects could be held in the polygon attribute table and/or in additional
attribute tables. ARC/INFO functions for spatial manipulation support
either approach, although the organisation of the file system encourages
the use of coverages (separate data sets) for each kind of area object.
With this latter approach, an individual object within a coverage is
represented by its corresponding polygon instead of being unduly
fragmented.
ARC/INFO automatically establishes the existence of holes when deriving
the polygon topology, and defines the spatial extent of each polygon by
the arcs of its outer perimeter followed by the arcs of any interior holes
as in GIRAS. However, this information is held within the ARC structure,
and is not accessible by the user. It is possible to discover that holes
exist in an area object from the topological information about the
polygons and arcs held in the INFO structure, but it is not always
possible to discover which are the outer and inner extents of the polygon.
Thus it may not be possible to answer questions about the containment
relationships between area objects, such as whether one area object
totally surrounds another. The Working Group on Terms and
Definitions of the American National Committee for Digital Cartographic
Data Standards held that 'holes in cartographic objects constitute a gap
in our knowledge' (Moellering, 1984, p 24). HDS and GIRAS addressed this
problem but both rely on the input of object related information for
extracting the relationships between boundaries. ARC/INFO and DLCf 3 also
recognise the existence of holes. However, only HDS makes explicit the
hierarchy of geometric polygons.
<The
Model>
|