From Line Geometry To Area Topology
by  © P. Wade, M. Visvalingam and G.H. Kirby
CISRG Discussion Paper Series No 1, University of Hull, 48 pp

 

For the sake of posterity, please cite from the published version:
Kirby, G H, Visvalingam, M and Wade, P (1989) "The recognition and Representation of Polygons with Holes" Computer J, 32 (6), 554 - 562.

CONTENTS

  1. Introduction

  2. Background

  3. Disassociative Area Model (DAM)
    3.1 Geometry
    3.2 Geography
    3.3 Comparison of DAM with predecessors

  4. Deriving the Topology
    4.1 Extracting the Boundaries
    4.2 Forming the Containment Hierarchy

  5. An application of DAM
    5.1 Background to the OS 1:625,000 Data
    5.2 Extraction of the Geometric Topology
    5.3 Identification of Area Objects
    5 .4 Summary of the Project

  6. Conclusion

REFERENCES
FIGURES
It is best to open the figures in another browser window, so that you can read the text while studying the figures

5. AN APPLICATION OF DAM

This section briefly describes how DAM was used in a project we undertook to derive hierarchic area objects from an Ordnance Survey (OS) database of feature coded vectors.

5.1 Background to the OS 1:625,000 Data

The 1:625,000 database was established by the Ordnance Survey for purposes of experimentation by themselves and others. The aim was to provide positive evidence towards the design of the 1:50,000 database (Haywood, 1984). The structure of the database was recognised as a crucial factor influencing not only the usefulness of a small scale database but also its feasibility. The structure has implications for the cost of initial data capture and subsequent maintenance of the database.

It should be emphasised that the data we received was transferred from a database that the OS did not regard as final in terms of design nor complete as far as the data content was concerned. A systematic search for errors and omissions had not been carried out; these were known to exist, but their extent was completely unknown.

The OS database design considered the cost effective capture of data from maps to be one of the most important influences. We undertook to evaluate whether the OS design for representing the hierarchy of administrative areas (districts, counties, and countries) was adequate in concept, structure, and content for other purposes. The OS database does not provide an explicit description of the boundaries of the areas, nor does it give the hierarchical and spatial relationships between them (for example which districts belong to a county, or which counties are adjacent to one another). Instead, it just holds the essential topographic details of area objects using 'links' and 'area seeds'.

Each link is a line (described by an ordered list of Cartesian coordinates) with an associated feature code which indicates the type of administrative boundary it represents. Only one link exists for each part of a boundary, and the feature code corresponds to the highest level of administrative area to which the boundary belongs. Since the administrative units form a hierarchy, it could be inferred that the boundaries of high level objects also form a boundary of objects below them in the hierarchy and that the coastline could form all other boundaries.

Each detached part of an administrative unit is indicated by an area seed, a representative point within the polygon enclosing that part of the object. This polygon is similar to a GEOGRAF polygon. The area seeds carry the name of the area, and also have an associated feature code to identify the type of area to which they relate. For the administrative areas there are codes for national, county, metropolitan county, and district, and also a further set of codes for the seaward extensions of the administrative areas.

This pragmatic data model is extremely convenient and cost effective for data capture since it records once, and only once, each explicitly recorded map detail. However, a consequence is that the extraction of area objects is quite complex.

5.2 Extraction of the Geometric Topology

Our first task was to extract the links and area seeds relating to the administrative areas from the remainder of the OS data. The links were stored in a POLYVRT type structure this conversion was straightforward, as the links for the administrative boundaries do not cross one another and only meet at their end points. This stage was only concerned with the geometry, or course, of the links; however, the link feature codes and the area seeds were also stored for later use.

The OS data is divided into units of 100 x 100 km. squares, based upon the National Grid. These units are referred to as 100 km. libraries. We were supplied with four such libraries which together made a 200 km. square whose south west corner had the grid reference SN 000000. The four libraries were amalgamated into a single data base to ease subsequent processing. However, areas along the edge of the 200 km. square remained cut and their links remained as dangling lines. Extra links which corresponded to the edge of the 200 km. square were therefore inserted into the database so that the boundaries of the incomplete areas became closed.

The last part of the geometric processing was to form the boundaries of the primitive regions and calculate their containment hierarchy, as described in Sections 4.1 and 4.2 respectively.

5.3 Identification of Area Objects

In the OS scheme, the construction of an area object begins with the retrieval of all its area seeds. From each area seed a spatial search takes place according to a set of rules to find a node, that is, a link end point, which lies on the outer boundary of that area. Links are then chained from this starting node, according to further rules, to form a complete polygon. This process identifies the outer boundary of each disjoint part of the.area object. Further checks and processing are required to establish whether there are any holes within the area, and if so to determine their boundaries.

In our scheme, the aim is to classify each unit of space (that is, primitive region), rather than identify the linework of the area boundaries. This involves finding the name and type of district, county, and country of each primitive region. By focusing on the primitive region it is possible to provide a more coherent representation of area objects. For example, the representation of the containment relationships between primitive regions makes it unnecessary for repeated spatial searches to establish whether area objects contain holes. Also, the knowledge of the spatial relationships between the primitive regions enables checks to be carried out to ensure that the relationships between area objects are consistent.

Figure 6 illustrates the techniques that were used to name the primitive regions. The various line styles indicate the feature codes that were supplied with the links. Firstly, each area seed was allocated to a primitive region by using a point in polygon test to find the primitive region containing it, and its associated attributes (the type of the area seed and the name of the administrative unit) were assigned to the primitive region. Figure 6a has the area seeds marked upon it.

The next process was to search outwards and inwards from these named primitive regions to find neighbouring primitive regions which belong to the same object, that is, a named administrative unit. The knowledge of the spatial relationships between the primitive regions made this a thorough and efficient process. Each search started from a seeded primitive region. If the feature code of a link which bounds the primitive region was below the level of the seed, the name was also given to the adjoining primitive region if it was still unnamed. The extent of this search was limited by a link at an equal or higher level than the seed. Figure 6b illustrates this process.

Following this there were still some primitive regions which were not named at all levels. This arises because:

1)   The area‑seeds for some area objects, which were cut by the map edge, occurred outside the map edge and were thus missing from the subset we received. This type of omission could not be overcome and so these areas could not be named. The existence of these unnamed objects was established however.

2)   Some omissions in the OS data were intentional. Islands do not contain national area‑seeds, and seaward extensions do not contain national and sometimes county area‑seeds. It was possible to deduce the names in such cases from other information. For an island, its nationality was taken from a land counterpart which had the same district or county name. If missing, the nationality and county of a seaward extension was similarly taken from a land counterpart with the same district name.

There was one further set of unnamed primitive regions which arose from the system of identifying each unit of space: those which represent the sea. It was possible to name these primitive regions as such by starting from any one primitive region whose land/sea status was known and expanding outwards and inwards. The land/sea state changes of course every time a link with the coastline feature code is crossed.

Figure 6c shows the naming of the primitive regions after all the above processing. There is one primitive region whose county is unknown and three whose district is unknown due to these areas being truncated by the map edge. There is also one on the left edge of the map whose nationality has not been supplied and cannot be deduced. It is known to be land, however, due to the wider context made explicit by the processing.

It was then possible to build the hierarchy of administrative areas within the restrictions caused by the lack of data. This involved:
1)   the allocation of districts to counties, and counties to countries; and
2)   the extraction of the boundaries of the areas by aggregating their primitive regions and eliminating internal boundaries.

The area objects do not yet exist as such within the system; instead, they are held as attributes of the primitive regions. However, there is the knowledge within the system for the extraction of the entire hierarchy of area objects in a variety of forms, including the formats required for transfer to other systems (such as GIMMS and DLG‑3).

The use of an intermediate data structure, based on DAM, which focussed on the geometric topology of the primitive regions rather than on the topology of the links, made it possible to validate partially the OS data, for example:

1)   that the disjoint parts of area objects at each level of the administrative hierarchy each contain just one area‑seed for that level;

2)   that each area object is completely bounded by links with the appropriate feature codes;

3)   that the administrative areas form a proper hierarchy;

4)   that the type of each area is appropriate to its status as land or sea; and

5)   that all areas are named, except those on the edge of the map.

We found just one coding error; the links which form the northern seaward extension of the England/Wales boundary had been feature coded as a county seaward extension. This violated the rule that the feature code of a boundary should correspond to its highest level in the administrative hierarchy.

5.4 Summary of the Project

The project confirmed that it is possible to derive transformations of the OS 1:625,000 experimental data for manipulating hierarchically related objects and their polygonal descriptions. The object hierarchy had to be inferred from fragments of information, using the scattered clues and the nesting rules for the hierarchic link and area‑seed feature codes. The concept of the primitive region, and the knowledge of the geometric topology, provided a convenient framework for solving the puzzle. Further details of this project can be found in Visvalingam et al (1985).

A Geographic Information System should be capable of deriving a complete description of hierarchically related and/or overlapping area objects from feature‑coded map details, that is, fragments of information. In theory, ARC/INFO appears to be capable of this. However, the verification of the resulting information base would involve the re‑formulation of relatively simple rules in terms of possibly complex relational operators. Moreover, in some cases such verification may be impossible if the user does not have access to all information, especially information supplied by the user (for example, multiple polygon labels within one polygon).

<Conclusion>


© Dr Mahes Visvalingam, University of Hull, March 2003

Cartographic Information Systems Research Group, University of Hull