From Line Geometry To Area Topology
by  © P. Wade, M. Visvalingam and G.H. Kirby
CISRG Discussion Paper Series No 1, University of Hull, 48 pp

 

For the sake of posterity, please cite from the published version:
Kirby, G H, Visvalingam, M and Wade, P (1989) "The recognition and Representation of Polygons with Holes" Computer J, 32 (6), 554 - 562.

CONTENTS

  1. Introduction

  2. Background

  3. Disassociative Area Model (DAM)
    3.1 Geometry
    3.2 Geography
    3.3 Comparison of DAM with predecessors

  4. Deriving the Topology
    4.1 Extracting the Boundaries
    4.2 Forming the Containment Hierarchy

  5. An application of DAM
    5.1 Background to the OS 1:625,000 Data
    5.2 Extraction of the Geometric Topology
    5.3 Identification of Area Objects
    5 .4 Summary of the Project

  6. Conclusion

REFERENCES
FIGURES
It is best to open the figures in another browser window, so that you can read the text while studying the figures

3. Disassociative Area Model (DAM)

This model disassociates the geometric and geographic components of areas with holes for separate academic consideration. This section briefly outlines the essential features of this model and then compares it with its predecessors.

3.1 Geometry

Using only the geometry of the bounding lines, the areal map can be dissected into a set of non overlapping parts, which we call primitive regions. For example, the linework of Figure la results in eight primitive regions (A H), including one (A) which surrounds all the other primitive regions. The primitive region is similar to the LCGUs of GEOGRAF.

At this stage, primitive regions exist only in concept. The representation of the topology hinges on the boundary. Figure lb shows the set of boundaries required to describe the primitive regions of Figure la. Each boundary is a closed loop, with direction, which defines one extent of a primitive region. A boundary can be sub classified as being either an enclosing boundary or a hole. The outer boundary of a primitive region is known as an enclosing boundary, and any inner boundaries are known as holes. Thus each boundary forms an extent of one, and only one, primitive region and each primitive region is bounded by one enclosing boundary and zero or more holes. There is one exception to this the outermost primitive region has no enclosing boundary but just one or more holes (in Figure lb the outermost primitive region is described by holes 1 and 5). This primitive region forms the complement of the union of all the other primitive regions on the plane surface.

Each boundary has a separate existence within our intermediate data structure. Boundaries are equivalent to a closed loop in geometry, but they also have an associated direction to distinguish enclosing boundaries from holes. Where one primitive region completely surrounds another primitive region, .the hole in the outer primitive region and the enclosing boundary of the inner primitive region are identical in shape. The two boundaries, however, remain unique since they have opposite direction (for example see boundaries 7 and 8 in Figure 1b). The direction of a boundary thus relates it to one specific primitive region.

Boundaries are composed of links which are similar to arcs. DAM is unconcerned as to how the link geometry is represented. It does, however, assume that the links do not cross and that they are node matched in order to extract the boundaries. It would therefore be necessary to pre process spaghetti digitising into a link and node structure. An algorithm for forming the boundaries from a set of links is given in Section 4.1.

When primitive regions all occur at the same level, that is, when there are no nested holes, there is a one to one correspondence between boundaries and primitive regions. For example, if the outermost primitive region, A, is excluded from consideration, there is a one to one correspondence between B, C, D and 2, 3, 4 respectively. Thus each primitive region may also be assigned the identity of its enclosing boundary. Links provide the adjacency relationships between boundaries and their corresponding primitive regions as in GEOGRAF.

When primitive regions (and therefore holes) are nested, there is a many to one correspondence between boundaries and those primitive regions containing holes. The links continue to provide the adjacency relationships between boundaries, and also between clusters of adjacent primitive regions. The problem is that the identity of primitive regions containing holes remains unknown. What exists, instead, is the distinct references to separate boundaries at the outer and inner extent of such primitive regions, and no single reference to the primitive regions themselves. For example, boundaries 6, 7, and 9 remain as separate references and there is no knowledge at this stage that they all belong to E.

The containment relationships between a complete set of boundaries can be viewed as forming a hierarchy which can be represented by a rooted tree. The root of the tree consists of a nominal reference to the enclosing boundary of the outermost primitive region, that is, the part of the plane surface which surrounds all the other boundaries. Each boundary is enclosed spatially by every boundary which precedes it in the tree but no other. When two boundaries are identical in shape, the one which is a hole is considered as surrounding the one which is an enclosing boundary. Thus the level of each boundary in the tree is equal to the number of other boundaries which surround it. Also, boundaries at even levels of the tree will be enclosing boundaries and those at odd levels will be holes. Figure lc illustrates the general case of the rooted tree. Note that the tree is not an ordered rooted tree as there is no set order to the edges leaving each vertex of the tree.

This hierarchical system can be applied to any set of boundaries irrespective of their complexity. For example, currently the area within hole number 7 is uncut and forms a single primitive region. It could feasibly be cut into a number of primitive regions, as is the area within hole number 9. Some of these primitive regions may in turn contain further holes. However, this would all be represented by suitable branches starting from boundary 7 in place of the one shown.

It is also possible that the map frame will be digitised around all the boundaries (see Figure 2a). This would have the effect of creating additional boundaries and primitive regions. If the map frame does not touch any of the other boundaries (as in the figure), then the hole and enclosing_boundary created by the map frame would be at levels 1 and 2 respectively in the tree, and all the other boundaries would be moved down by two levels. Figure 2b shows the revised tree.

The derivation of such a tree fully resolves the containment relationships between the boundaries and thus also the spatial extent of the primitive regions since the holes within each primitive region immediately follow the enclosing boundary for that primitive region in the tree. The set of holes at level 1 of the tree describe the holes in the outermost primitive region, whose enclosing boundary is undefined. The tree also makes the nesting of primitive regions within holes in other primitive regions explicit. The derivation of this tree for a set of boundaries is a one off process requiring an exhaustive spatial search. Forerunners to DAM were constrained by the inability in practice to resolve fully the relationships between the boundaries. An algorithm which attempts to minimise the computation required is given in Section 4.2.

Once a primitive region assumes an identity, it becomes the basic building block for area modelling and forms the pivot between the geometry and the geography as in GEOGRAF and its derivatives.

3.2 Geography

Area objects are specific instances of different types of geographic phenomena. Some applications require only simple objects, such as a description of the administrative areas at a specified level. Others require complex objects, such as a hierarchy of administrative areas. Each application may define its own set of types of simple and complex objects.

The spatial descriptions and other attributes of area objects may be provided in different formats by various sources; and applications may require access to some or all of these descriptions. DAM provides an application independent model of geometric entities and a framework for integrating the geographic and geometric descriptions where necessary. At the bottom most level, there may be a one to one mapping between objects and primitive regions in the simplest case (Figure 3a). Where there are disjoint parts, there is a one to many relationship between objects and primitive regions (Figure 3b). DAM allows a variable number of objects to be associated with each primitive region, that is, a many to one mapping (Figure 3c). Thus it copes with both hierarchic objects (administrative hierarchies) as well as objects which overlap at any one level (broadcasting areas). Most other models cannot cope with the latter case.

DAM is not a universal model of topographic, let alone geographic, phenomena. It does, however, provide a framework for the flexible input of geographic area information in a variety of formats. For example the data about area objects might be associated with the links, or with digitised points which lie within the objects, or both. Alternatively, the user could point to the primitive regions which belong to each area object once the digitising of the links has been completed. In addition, information may be available from other sources, such as a table which defines how a single level of objects form a hierarchy. Since phenomena under consideration and the input format are both variable, further development of DAM requires an interface to a rule processing facility. Ad hoc processes, based on a given rule set, must otherwise be provided for integrating the geography with the geometry and for data validation and automatic editing.

DAM adopts a 'human' view of the geometry of area entities in that, given the links, it is possible to extract a unique identity for each primitive region, to establish adjacency relationships and also to extract the information about the nesting of primitive regions. Since the geometric and geographic relationships are each processed separately and then related via the primitive region, DAM also provides a capability for cross checking the geometric and geographic information (this is illustrated in Section S below).

Once the object data has been assigned to the primitive regions, the boundaries of geographic objects can be computed. This is done by forming the union of the primitive regions belonging to each object and discarding any internal lines. Each geographic object comprises one or more disjoint parts, and each disjoint part can be described by one outer boundary and zero or more inner boundaries in a similar way to the geometric primitive regions. The boundaries are stored as an ordered list of links. Through the links, the adjacent primitive regions, and thus adjacent areas, to this area can be found. In addition, if a list of the constituent primitive regions is kept for each object and a list of objects is kept for each primitive region, spatial relationships between areas can easily be determined.

3.3 Comparison of DAM with predecessors

Most modern data structures which represent areas in vector form use a POLYVRT type chain and node structure for holding the geometrical extent of area objects. However, whilst the chain and the polygonal boundary are also used in POLYVRT as a direct interface to the topology and the spatial extent of area objects respectively, successors to POLYVRT have introduced intermediary constructs in an attempt at greater flexibility.

GEOGRAF uses the LCGU as a building block for forming area objects. The chain now assumes a geometric role, and holds the topology of the LCGUs. The topology of the area objects is represented by the chain group. However, the LCGU (unlike DAMs primitive region) is assumed to be simply connected, and so there is a one to one correspondence between the boundary and the LCGU. This prevents the explicit representation of holes.

In HDS and GIRAS the chain once again has the topological role of separating two area objects (as in POLYVRT). The role of the polygonal boundary is altered, however. Instead of assuming that there is a one to one correspondence between polygonal boundaries and area objects, these structures define the spatial extent of an area object by one or more directed polygonal boundaries, thus allowing the representation of objects with holes. Both systems automatically establish the connection between the polygonal boundaries at the outer and inner extents of an area object; HDS goes further and makes explicit the full containment hierarchy between the enclosing boundaries and holes in a similar way to DAM, although Edwards et al (1977) conceived of them as interior and exterior regions respectively. Importantly, the areas formed by intersecting interior with the appropriate exterior regions in HDS may in some circumstances consist of more than one primitive region. This is due to the method used for boundary formation, which could result in a non unique set of boundaries. Consequently, the interior and exterior regions may not form a hierarchy, let alone a unique one.

GIRAS differs from DAM in that, as in HDS, the directed polygonal boundaries are the boundaries of geographic objects, rather than geometric primitive regions. Both utilise information about the area objects when computing the containment between polygonal boundaries, instead of resolving it from the locational (or geometric) information alone some reasons for considering the geometric and geographic information in isolation and only connecting it once the topology is formed were given in Section 1.

ARC/INFO, with its data structures and commands, allows geometric and geographic information to be considered separately and the topological relationships to be computed from only the geometric details. However, whilst it discovers the existence of holes, it defines the topology of the polygons formed only in terms of their adjacencies and does not record the containment relationships. Within ARC, the spatial extent of each polygonal area is described by a list of the arcs needed to create its boundaries; the arcs of the enclosing boundary being the first in the list. However, neither this list nor the coordinates of the arcs are accessible through INFO. It is difficult therefore, and sometimes impossible, to answer questions about the containment relationships between area objects. For example, a user would find it difficult to establish whether A contains B or B contains A, or to determine the areas contained by A.

The GIRAS model included some explicit information on containment. Fegeas et al (1983, p 13) described one reason for this, namely that this would make it easy for small polygonal areas to be eliminated or combined with larger surrounding areas, for example for cartographic and information generalisation. We also anticipate that the hierarchic structure of DAM, taken in conjunction with other properties, could be of value for the automatic recognition of area objects. The inclusion of containment relationships in the polygon attribute table of ARC/INFO would greatly extend its usefulness at least for the semi automatic verification of attribute data. Alternatively, the topological information in ARC could be represented in a relational form and made (read only) accessible to users through INFO. Others (van Roessel and Fosnight, 1985, and Kirby et al, 1986) have established that the topological information can be held and manipulated in relational form.

ARC/INFO and GIRAS do not maintain a separate identity for each boundary, but instead just give an ordered list of arcs which describe the outer and inner extents of each polygon. Thus if these systems were to make explicit the containment hierarchy in a similar way to HDS and DAM, it would be a hierarchy of polygons (that is, primitive regions) rather than a hierarchy of boundaries. Whilst it would give the containment of polygons within other polygons, it would not indicate the clusters of polygons which lie within each hole of another polygon. Thus some of the information concerning nesting is lost (see Figure 4).

DAM provides a synthesis of its predecessors and identifies the essential functions of the unit line, unit boundary, unit of space, and unit object of given area types. Within a specific pragmatic model, the functions of these various parts may be replicated or transferred to other units for efficient computer handling or for improving the user interface.

It is possible to deduce and use any pragmatic model, which is consistent with DAM, to specify data formats for various tasks, for example for data input, output, or transfer. As a corollary, DAM can be used to evaluate whether a complete and coherent description of area objects can be derived from a proposed pragmatic model. DAM was in fact constructed precisely for that purpose as described in Section 5.

<Algorithms>


© Dr Mahes Visvalingam, University of Hull, March 2003

Cartographic Information Systems Research Group, University of Hull