From Line Geometry To Area Topology
by  © P. Wade, M. Visvalingam and G.H. Kirby
CISRG Discussion Paper Series No 1, University of Hull, 48 pp

 

For the sake of posterity, please cite from the published version:
Kirby, G H, Visvalingam, M and Wade, P (1989) "The recognition and Representation of Polygons with Holes" Computer J, 32 (6), 554 - 562.

CONTENTS

  1. Introduction

  2. Background

  3. Disassociative Area Model (DAM)
    3.1 Geometry
    3.2 Geography
    3.3 Comparison of DAM with predecessors

  4. Deriving the Topology
    4.1 Extracting the Boundaries
    4.2 Forming the Containment Hierarchy

  5. An application of DAM
    5.1 Background to the OS 1:625,000 Data
    5.2 Extraction of the Geometric Topology
    5.3 Identification of Area Objects
    5 .4 Summary of the Project

  6. Conclusion

REFERENCES
FIGURES
It is best to open the figures in another browser window, so that you can read the text while studying the figures

2. Background

This section describes briefly some of the concepts and techniques used for representing polygonal areas.

The simplest approach is to encode each area object as a separate entity, without any regard for overlaps and adjacencies between areas. This leads to two immediate problems. Firstly, boundaries between adjacent areas are encoded twice, once for the area on each side, thus causing redundant information to be stored and possible 'sliver' lines to appear when plotting more than one object. Secondly, the lack of information about the spatial relationships limits the ability to manipulate areas and any associated substantive data.

One of the first systems to incorporate an explicit topological structure was the Dual Independent Map Encoding (DIME) system of the U.S. Bureau of the Census (Cooke and Maxfield, 1967). It was designed to represent a single non-overlapping set of area objects. The basic element is the straight line, uncrossed by any other, called a segment. The segment is identified by its start and end points, called nodes, and the attribute codes for the areas on either side of it.

The DIME philosophy was extended in POLYVRT (Laboratory for Computer Graphics and Spatial Analysis, 1974). In this model the chain replaced the segment as the basic element. Like a DIME segment, a chain has nodes at its two end points, it separates two areas, and is assumed to be uncrossed. It may, however, consist of many points, and it represents the whole of a continuous section of boundary between two areas. In this structure the number of points required to describe a boundary between two areas has become unimportant; it is always represented by a single chain. Thus the chain plays an important topological role, as well as forming the interface between the geometry and the area objects. POLYVRT also keeps a list of the chains which form the polygonal boundary of each area. POLYVRT first introduced the cartographic data structures which have continued to form the backbone of a variety of internal data structures used for representing segments/chains today.

A consequence of the role that chains play in the POLYVRT structure is a restriction in the types of area it can represent. For example, it is only able to record explicitly the topological relationships and boundaries of a single non overlapping set of areas. In an attempt at greater flexibility, Peucker and Chrisman (1975) introduced GEOGRAF, which uses the Least Common Geographic Unit (LCGU) as the controlling unit of the data structure in place of the chain. The LCGU is defined as that part of the plane surface uncut by further partitioning, and its boundary is constructed as a POLYVRT polygon directly from chains. The chain now becomes the boundary between two LCGUs, and thus assumes a purely geometric role, rather than being the boundary between two area objects. Also, the uncut spatial units (LCGUs) created by the chains no longer represent the area objects directly; instead they are each given a unique identity and are used as a building block in forming the area objects.

GEOGRAF is able to model multiple sets of area objects. Thus if an administrative hierarchy of districts, counties, and countries was represented, GEOGRAF could conceive of the three levels as separate sets of areas rather than as a single set of areas (with hierarchic area codes). Each LCGU would record the attribute codes of the area it belongs to at each level. GEOGRAF can similarly represent multiple non hierarchical data sets, such as county and parliamentary areas. The boundary between two objects of a set, for example between two counties, is described by a chain group, which is an ordered set of chains. The polygonal boundaries of these objects, in turn, consist of chain groups and will from now on be called GEOGRAF polygons.

The GIMM$ segment format is an extension of POLYVRT. It allows composite area codes to be supplied for POLYVRT type chains, and from these it is able to extract GEOGRAF polygons. However, GIMMS is based on the chain (POLYVRT) rather than the LCGU, and is thus unable to combine the geometric and geographic descriptions for areas in a flexible manner.

References to objects to the left and right of segments/chains/chain groups allow the above models to cope with detached parts and holes (one or more area objects completely within another area object) without having explicit knowledge of their existence. Edwards et al (1977) proposed a hierarchical data structure (HDS) for representing areal data which has holes, holes in holes, and so on. From POLYVRT type chains which refer to area objects on either side, they form a set of directed polygonal boundaries, each of which has anti clockwise closure and a common area code on the left or right side. Each directed polygonal boundary is then used to define an interior or exterior region. An interior region is the entire area within a boundary which has a common left area code, and an exterior region the entire area outside a boundary with a common right area code. The interior and exterior regions do not contain holes.

Area objects with holes are defined in terms of the interior and exterior regions. Each interior region corresponds to the outer extent of one disjoint part of an area object, and each exterior region corresponds to a hole in an area object. The hierarchical data structure is used to model the nesting (or containment) that exists between the interior and exterior regions. This containment hierarchy is computed automatically based on the knowledge of which area object a region includes or excludes, and also by performing spatial calculations. Once the hierarchy is formed, each disjoint part of an area object can then be defined by taking the intersection of an interior region together with all the exterior regions which are immediately below it in the hierarchy.

The advance of HDS over POLYVRT is that it uses chains to define boundaries of regions (which enclose and exclude area objects) rather than to define the area objects themselves. Modelling the hierarchy of these regions allows for the explicit representation of holes and the spatial relationships that result. The chain remains the boundary between two area objects, however, and a consequence of linking the geometric and geographic information in this manner is that the system cannot represent a set of areas which contain overlaps, or several sets of areas simultaneously. Since HDS is similar to our model further discussion of HDS is postponed until Section 3.3.

All the above systems require the linework of boundaries to be input together with left/right codes for the area objects on either side of the boundaries (or, for the LCGUs in the case of GEOGRAF). Although this expedites computer processing, it presents a poor human computer interface and places an unnecessary burden on users. Input to the GIRAS system (Mitchell et al, 1977) consists of arcs and polygon labels. Arcs, unlike chains, do not carry left/right tags at input time. Along with the line data, one polygon label for each polygon of the map is entered in the system. A polygon label is an arbitrary point within each polygon with which is associated a not necessarily unique integer attribute to describe the polygon in which it is placed. The arcs are then connected by software to form polygons, and the arcs and polygons are internally cross referenced and the attributes of the polygon labels assigned to both . The GIRAS structure is described (Mitchell et al, 1977, p 5) as topologically similar to that introduced by Peucker and Chrisman (1975). However, the attributes of the polygon labels need not be unique, which suggests that GIRAS does not use the GEOGRAF model as such, but that the polygons correspond directly to the detached parts of area objects as in POLYVRT.

GIRAS also explicitly recognises islands (the GIRAS term for holes) and compound islands (islands which are subdivided) by defining the extent of each polygonal area by the arcs of its outer perimeter in a clockwise direction and the arcs of any interior islands in an anti clockwise direction. The outer and inner extents of a complex polygonal area are fixed using the position of the polygon label (see Mitchell et al, 1977, p 11). Thus GIRAS, like HDS, uses directed polygonal boundaries, although the various boundaries which define a polygonal area do not each have a separate identity in GIRAS. The geometric hierarchy of holes within holes remains implicit in GIRAS. However, it is possible to encode a geographic hierarchy between area objects (such as that between administrative areas) by associating a composite feature code with the polygon label. GIRAS therefore uses a number of ad hoc procedures to circumvent problems in spatial data processing without seeking to accommodate them within an underlying conceptual model of areal entities.

The Level 3 Digital Line Graph (DLC 3) format (Allder and Elassal, 1984), as the name implies, uses a link and node structure to encode the area, line, and point features of a map together with their spatial relationships. The attributes of lines and points are associated with the links and nodes, and the links also contain explicit information about the topological relationships. When areas are represented, each closed part of the plane surface created by the linework of the boundaries is given a unique identity, in a similar way to the LCGUs in GEOGRAF. The links encode the adjacency relationships between LCGUs by referring to the LCGU on either side. The attributes of each LCGU, that is the area objects to which they belong, are associated with a digitised point (which need not be contained within the LCGU). Attributes are not encoded if they can be derived from relationships to adjacent elements. For example, a link is known to be a boundary between two area objects if the attribute codes for the LCGUs on either side of the link are different for that class of area object. The links which form the polygonal boundaries of LCGUs and area objects are not explicitly stored in the structure further processing is required to derive this information.

ARC/INFO is a modular geographic information system with facilities for input, analysis, data management, and output of geographic data. Two data models are used within ARC/INFO. Cartographic data (locational and topological) are represented using a topological data model said to be based on DLG (ESRI, 1985, p 2.9). Thematic data (or attribute data) are represented using a relational model. In the name ARC/INFO, ARC and INFO refer to these two structures respectively, and ARC/INFO refers to the composite data model and the commands provided to manage and manipulate data held in these structures.

The input of polygonal data to ARC/INFO is similar to GIRAS in that it consists of arcs, without left/right tags, and polygon labels. Moreover, ARC/INFO allows even greater freedom in how they are digitised. Arcs can be digitised as 'spaghetti' without the user having to identify intersections (nodes), as the system can identify intersections automatically prior to polygon building. Also, the entry of polygon labels is optional, as the system does not use them to fix the outer and inner extents of complex polygons when automatically deriving the topology. A not necessarily unique attribute code is supplied by the user for each arc and polygon label at the time of digitising. The system also assigns to each a unique internal sequence number.

When the polygon topology is generated, each polygonal area formed is also assigned a unique internal sequence number by the system, together with the attribute code supplied by the user for the polygon label contained within the polygon (or zero if there is no label within the polygon). The system also creates a polygon attribute table and an arc attribute table which can be accessed by the user through the INFO relational database system. These tables contain one record for each polygon or arc, and initially hold the system number and user attribute code, and the area and perimeter of polygons and the topology and length of arcs (ESRI, 1985, pp 2.12 2.16). The user may store additional attributes in these tables, or in additional attribute tables which can be associated with these tables using relational operations.

The data structures and the commands provided by ARC/INFO give the user considerable flexibility in how area objects are represented. For example, the user could assign a unique code to each polygon label (and thus each polygon), and regard the polygons as being similar to LCGUs (as in GEOGRAF and DLG 3). Several overlapping sets of area objects could then be represented in the same data set by holding a list of area codes for each polygon in the polygon attribute table. The attributes of each set of area objects could be held in separate attribute tables.

An alternative approach would be to use a separate data set for each set of nonoverlapping areas, and assign their area codes to the polygon labels. The polygons would then correspond directly to the detached parts of area objects (as in POLYVRT and GIRAS), and the attributes of the area objects could be held in the polygon attribute table and/or in additional attribute tables. ARC/INFO functions for spatial manipulation support either approach, although the organisation of the file system encourages the use of coverages (separate data sets) for each kind of area object. With this latter approach, an individual object within a coverage is represented by its corresponding polygon instead of being unduly fragmented.

ARC/INFO automatically establishes the existence of holes when deriving the polygon topology, and defines the spatial extent of each polygon by the arcs of its outer perimeter followed by the arcs of any interior holes as in GIRAS. However, this information is held within the ARC structure, and is not accessible by the user. It is possible to discover that holes exist in an area object from the topological information about the polygons and arcs held in the INFO structure, but it is not always possible to discover which are the outer and inner extents of the polygon. Thus it may not be possible to answer questions about the containment relationships between area objects, such as whether one area object totally surrounds another.  The Working Group on Terms and Definitions of the American National Committee for Digital Cartographic Data Standards held that 'holes in cartographic objects constitute a gap in our knowledge' (Moellering, 1984, p 24). HDS and GIRAS addressed this problem but both rely on the input of object related information for extracting the relationships between boundaries. ARC/INFO and DLCf 3 also recognise the existence of holes. However, only HDS makes explicit the hierarchy of geometric polygons.

<The Model>


© Dr Mahes Visvalingam, University of Hull, March 2003

Cartographic Information Systems Research Group, University of Hull