Area Topology for Road Extraction
and Topographic Data Validation
by  © Varley, D A and Visvalingam, M (1993)
CISRG Discussion Paper Series No 11, University of Hull, 34 pp

For the sake of posterity, please cite from the published version:
Varley D A and Visvalingam, M (1994) "Road Extraction and Topographic Data Validation using Area Topology" Computer J 37 (1), 3 - 15

CONTENTS

1.  Introduction   
2.  Background
3.  From Line to Area Topology
    3.1 Input Data
    3.2 The Disassociative Area Model
    3.3 Verification of Input Data
4. Road Extraction Using Point-in-polygon tests 
    4.1 Selection of Candidate Regions
    4.2 Selection of Seeds
    4.3 Labelling the Regions
    4.4 Discussion of Results
5. Road Extraction Using Topological Cues
6. Label-based validation
7. Conclusion and Future Work 
Acknowledgments
References

 


 

 

 

 

1.  Introduction 

The Cartographic Information Systems Research Group (CISRG) of the University of Hull is researching the automatic recognition of spatial objects based on their spatial descriptions alone. The feasibility study, reported in this paper, formed a part of an SERC CASE project (January 1990 January 1993); the collaborating body was the Ordnance Survey of Great Britain. The project involved the recognition of objects implied in large scale topographic maps. We define object recognition as the process of identification of objects from their forms alone; i.e. without recourse to semantic labels. Unlike recognition, extraction uses all available information, including semantic labels manually associated with line segments. This paper is concerned with the problem of road extraction. The principal aim of this feasibility study was to assess whether it would be possible to recognise roads given the structure and content of an experimental topographic database designed and created by the Ordnance Survey. The emphasis in the feasibility study was on acquiring a good understanding of large scale topographic data and of the data processing problems. Although some attention was paid to the efficiency of processing, this was not the primary concern at this stage. Efficient processing of geo-spatial data is a major topic of research which has to take into account a wider range of considerations (Frank, 1991).

This paper makes a number of contributions. Firstly, it identifies novel procedures for road extraction. Roads would normally be extracted by finding the regions within which road centre lines or road names occur. This research describes a more efficient and robust method. Roads can now be extracted by using only the topological information, normally held in memory, using concepts embodied within the Disassociative Area Model (Kirby et al, 1989). There is no need to access the vertices on disc which define the precise boundaries of regions. Secondly, the study also suggests how the extraction of roads can be trivialised by minor changes to the data specification.
  Thirdly, the paper suggests how software for re casting the data into DAM format can be used for detecting violations of the specified data structure. Fourthly, it suggests the type of semantic checks which can be carried out to ensure that relevant aspects of the data are consistent and to identify errors. These checks indicate that the data specification does not expedite road recognition since the boundaries of extracted roads include extraneous features. Thus, road recognition cannot rely on simplistic rules about roads but must accommodate the presence of anomalies. Fifthly, the study demonstrated how some of these anomalies may be automatically detected. Finally, the study has provided a catalogue of roads which can be used to derive empirical rules for recognition and to assess automatically the success of alternative procedures for recognition.

The paper is organised as follows. In the next section, the paper sketches a brief background which indicates the rationale for this work and some underpinning concepts. It then briefly describes the input data and its recasting into the data structures associated with the Disassociative Area Model (Kirby et al, 1989). The procedures which were adopted for verifying that the data conformed to the anticipated structure are then outlined and potential consequences of some violations are described. Next, the implications of the point in polygon approach to road extraction are considered. The paper then outlines a much simpler novel process for extracting roads. It then describes the procedures which were adopted for checking the consistency of semantic labelling and discusses the results. It concludes by summarising the main findings.

2. Background

The late 1980s has witnessed a growth in Geographical Information Systems (GIS). In Britain, the Chorley Report (DoE, 1987) has led to the formation of the UK Association for Geographic Information (AGI) and GIS Specialist Groups within many learned and professional societies, including the British Computer Society. The Chorley Report defines a GIS as "a system for capturing, storing, checking, integrating, manipulating, analysing and displaying data which are spatially referenced to the earth" (p 132). It refers to the diversity of users and uses of geo‑referenced data which give rise to different types of GIS (Visvalingam, 1990). In Britain, Land and Property Information Systems (LIS) use large‑scale (1:1250 and 1:2500) topographic data as a base reference against which they record, manage and inter‑relate their own data. The Ordnance Survey of Great Britain (OS) has been responsible for creating the topographic base maps used by numerous bodies, such as Her Majesty's Land Registry, the utilities and Local Authorities. The OS is renowned internationally for its pioneering activities in digital mapping since the early 1970s and its Research and Development Division continues to experiment with alternative database designs to meet changing market needs.

 

Users of topographic data are ultimately concerned with objects of interest to them. Different applications may link different data with the same objects. Topographic objects are described by spatial and other locational references.  A road network, for example, will have multiple polygons which describe its extent and which exclude regions, such as central reservations and roundabouts, which do not form a part of the road. At the application‑level there is growing interest in object‑oriented databases. However, the design of the class structure of objects is application dependent. For example, vehicle routing, highway maintenance, cutting of grass verges and other activities which relate to roads will tend to develop their own object structures. Consequently data vendors, such as the Ordnance Survey, tend to supply only partially structured data and it is the users' responsibility to extract the necessary object descriptions. The vendor may add value to data by making explicit, through a combination of automated and manual processing, some objects implied by the basic topographic data. For example, the OS itself defines some objects, such as vegetation, for its own map production. However, the automatic recognition and labelling of objects encoded in digital topographic maps remains an outstanding problem.

 

The OS vector topographic data consist of point, line and text features. A semantic label, called a feature code, is associated with each feature. Features are partial descriptions of objects. An object may be described by one or more features and a feature may form a part of several higher‑level features and objects. Object recognition seeks to add value to data by grouping features so that they form higher‑level entities, both features and objects, and by either relating currently free‑standing, or else manually input, information with these derived entities.

 

In the past, data suppliers have supplied feature‑coded data in unstructured, so‑called spaghetti, form. In spaghetti form, lines are broken when the feature code changes but may otherwise cross each other and themselves. This simple structure is sufficient for semi‑automated map production but it does not facilitate the chaining of the polygons which define areal objects. For this, spaghetti vectors must be structured into a link‑and‑node form. In this structure, lines are not allowed to cross themselves or other features. Where such intersections occur, lines are split forming segments, called links by the Ordnance Survey. All links begin and end at nodes, which establish the connectivity of the links. The process of link‑and‑node structuring is usually automated. The output of this process can contain a few errors such as overshoots, which need to be trimmed, and is thus subject to a cleaning process. The OSBASE data we received was link‑and‑node structured and cleaned.

 

The feasibility study on road extraction had to achieve the following tasks. It had to:

  • organise the link‑and‑node data into a form which explicitly recorded, and topologically related, the regions of space implied by them.

  • develop efficient and robust procedures for identifying and labelling roads.

  • validate the data

 

Figure 1

3.  From Line to Area Topology

3.1 Input Data

The OS has derived a number of prototype topographic databases for experimental purposes. The feasibility study was based on a variant of the OSBASE dataset. Many of the properties of this experimental version, described below, had to be uncovered from detailed OS internal reports, personal discussions with R & D staff of OS and by exploration of the database. OSBASE data for Birmingham, consisting of 12 sheets, were received in July 1990. Topologic processing revealed errors. Some of these were manually fixed by us. We then received a further 16 sheets of OSBASE data for Canterbury, which was believed to be correctly edge‑matched and of higher quality, in February 1992. One of the objectives of the study was to use spatial data models to validate the data.

 

OSBASE records point, line and text features on published (i.e. hardcopy) 1:1250 base maps. To satisfy internal requirements for automating area fill in map production, a task which had previously been completed manually, OSBASE included explicit descriptions of some areal objects, such as roofed areas, water features, vegetation and slopes. The geometric details of these features have been rigorously checked to ensure that the polygons describing these objects are properly closed. Area seeds were added to link in relevant semantic information as well as the cartographic symbolism for rendering.

 

Other areal objects, including roads and land parcels, are only implied and not explicitly defined within OSBASE. OSBASE contains two types of information about roads, namely: 

  • Some of the links forming the edges of road polygons are feature‑coded as ROAD METALLING links. Only one feature‑code is associated with each link. Since a link may form a part of several objects, feature‑codes are assigned priorities. Feature codes associated with road boundaries do not have the highest priority. The following set of features are listed in descending order of priority in the data specification.

Mean High Water

Building detail (solid or pecked)

Water detail

General detail solid

Road detail (ROAD METALLING)

Other detail           (general detail pecked, including buildings with general

                            detail pecked, slopes and cliffs)

Mean low water

Vegetation/landform limits

Thus some of the links forming roads may not be tagged as ROAD METALLING. Figure 1 shows how a road may be defined by a set of dark links, building and general detail, both solid and pecked and water detail in addition to ROAD METALLING links.

 

  • Road names are included as text features. They are cartographic annotations and their digitising specifications are geared towards internal map production within OS. Road names are not the best clues for extracting roads. They name branches of the network, i.e. sub­objects, and may be repeated for ease of visual processing. Thus many names may refer to the same road network. Also, truncation by the map edge can result in a connected network being represented by two independent networks. Thus, there is a many‑to‑many relationship between road names and road networks. Furthermore, road names may be placed outside roads if their placement within is likely to detract from the legibility or clarity of the map. People easily associate such re­located names with roads. Also, minor road names maybe omitted in towns where they are likely to obscure detail. Although road names are useful for segmenting and labelling branches of the road network at a later stage, we decided not to use them in the feasibility study.

There is also a separate layer of information, containing networks of road centre lines for the areas covered by the underlying OSBASE maps. This information was manually digitised as a free‑standing layer which is geometrically connected to the underlying OSBASE data only at the map edges and at the head of cul‑de‑sacs which do not terminate in a roundabout.

 

Figure 2      Figure 3


The feasibility study did not use all the layers within OSBASE since it was quite apparent that roads could be extracted using the links in the OSBASE and centre line layers. A decision was made to exclude links relating to underground features and administrative boundaries in this feasibility study. Overhead features such as electricity lines, recorded in a separate layer within OSBASE, were also excluded. Other overhead features, such as overhead roads and walkways, which form a part of the topographic base, were retained. The impact of these higher level features had to be studied before accommodating underground ones. Also, point features, area seeds and text features, which were superfluous to this study, were excluded. The input to the feasibility study therefore consisted of a subset of the links which make up some OSBASE features and the road centre line layer. An example of the OSBASE input for one sheet is shown in Figure 7a.

 

Since the data conforms to the link‑and‑node model, it can be cast into the data structures shown in Figure 2. They are described below.

 

3.2  The Disassociative Area Model

 

Kirby et al (1989) described the advantages of disassociating and independently processing the spatial and aspatial descriptions of topographic objects. They described the concepts underpinning their Disassociative Area Model (DAM) for recording area topology and outlined procedures for deriving such topological information from link‑and‑node structured spatial data. They discussed the flexibility that this offers for modelling geographic phenomena by linking aspatial with spatial descriptions at a later date. This facilitates the cross‑checking of the consistency of semantic coding. They used the DAM model to locate the few residual errors remaining in the 1:625000 experimental database of the hierarchy of administrative areas in parts of England and Wales. Visvalingam and Sekouris (1989) also demonstrated the utility of DAM for locating geometric and semantic errors in an experimental 1:50000 topographic database. However, the methodology for validating topographic data is still underdeveloped.

 

The creation of digital topographic maps is an involved process with a series of automated and manual stages. Given the size of the databases and the cost of rigorous checking, it cannot be assumed that digital maps are error‑free nor that they meet the users' needs with respect to their structure and content. The feasibility study was therefore undertaken to expose the types of data‑related problems facing the automatic recognition of roads based only on their forms and juxtaposition vis‑a‑vis other features.

 

The first decision which had to be taken was to determine whether the OSBASE and centre line links should be treated as separate or an integrated set. In theory, if the two sets of links were used to form polygons, then roads could easily be identified as consisting of the polygons on either side of centre line links. However, OSBASE is only a cartographic description of topographic features. Thus road networks are segmented when they are occluded by overhead features. For example, where overhead pedestrian crossings occur, only the uppermost feature is shown on the published map since the occluded information is not necessary for visual processing. The under‑passing road is neither digitised nor recorded in OSBASE. The centre line network, on the other hand, is digitised for route planning purposes and is thus continuous. As a result the centre line links can segment all sorts of overhead features, for example, where complex motorway sections cross another road. Thus, polygons which are not roads can occur on either side of road centre lines. Moreover, since the two layers are only partially connected, users would have to re‑structure the combined line information, which could introduce a new batch of geometric errors the cleaning of which is a tedious and costly process. Road centre line links were thus not included with OSBASE data but were used separately.

 

Software developed by Wade (Kirby et al, 1989) was used to add further fields and structures to the link‑and‑node structure. The data structures in Figure 2a articulate the Disassociative Area Model (DAM) for representing area topology. They are only indicative of the type of information required since such information may be derived and represented in other ways. For example, Kirby et al (1987) and Visvalingam and Sekouris (1989) considered how these same entities may be represented and managed using a relational database model.

 

Wade's software chains the OSBASE links into a set of polygonal boundaries, which are then structured to form a geometric hierarchy consisting of enclosing boundaries alternating with holes. Figure 3 illustrates this structure. A brief description of the underpinning concepts and data structures are provided here; a fuller description is provided in Kirby et al (1989) .

 

In Digital Cartography, link‑and‑connectivity is encoded using the link, node and coordinate records represented in Figure 2a. One link record is used for each link. The coordinates of the two end points of the link are stored in the node records and any remaining internal points are stored separately as a contiguous block in a coordinates file. The link record points to the start and end nodes, to the start and end of the block of coordinates and, to a list of feature codes. Initially, this list consists of only one feature code.

 

There is one node record for each location where one or more links start and end. With each node is kept a list of the record numbers of the links which meet there. The record number is signed to indicate whether the link starts (positive) or ends (negative) at the node. The bearing at which the link joins the node is also held and the link list is sorted on this field so that links occur in a clockwise order around the node.

 

The Disassociative Area Model, first reported in Wade et al (1986), adds a boundary record to model the area topology. Each boundary is a closed loop, with direction (see Figure 3b). Boundaries are sub‑classified into enclosing boundaries and holes. Each enclosing boundary records the outer extent of a primitive region; the inner boundaries of the latter are termed holes. Each boundary forms an extent of one, and only one, primitive region and each primitive region is bounded by one enclosing boundary and zero or more holes. There is one exception; the outermost primitive region has no enclosing boundary but just one or more holes (in Figure 3b, the outermost primitive region is described by holes 1 and 5).

 

The boundary record is used to hold the details of each boundary. The containment relationship between a complete set of boundaries may be viewed as forming a hierarchy represented by a rooted tree (Figure 3c). Each boundary is enclosed spatially by every boundary which precedes it in the tree but no other. The level of each boundary in the tree corresponds to the number of boundaries which surround it. The outermost enclosing boundary at the root of the tree is a nominal reference to the part of the plane surface which surrounds all the other boundaries. The derivation of such a tree fully resolves the containment relationships between the boundaries and thus the spatial extent of the primitive regions since the holes within each primitive region immediately follow the enclosing boundary of that primitive region in the tree. The boundary record notes the level of each boundary and its surrounding boundary. The link record keeps a record of the boundaries to the left and right of each link.

 

Once a primitive region assumes an identity, it becomes the basic building block for modelling areal objects and it forms the connection between the geometry and the geography. Within GIS, there is often a many‑to‑many relationship between primitive regions and areal objects. However, in this feasibility study, the primitive regions need only be labelled initially as CANDIDATE ROADS and eventually as ROADS, ROAD NEIGHBOURS or UNCLASSIFIED.

 

   Figures 4 & 5

3.3 Verification of Input Data

To recapitulate, the extraction of the area topology includes the following stages:

 

* Checking of input data

* Chaining the links into polygons

* Forming the geometric hierarchy of polygons as shown in Figure 3.

 

The checks which were performed were described in Visvalingam et al (1987). When the OSBASE data was processed by this suite of software, it failed to form the hierarchy of polygons for one of the Canterbury sheets because it could not find the level 1 hole, i.e. the map edge polygon. The precise location of the error proved to be a time‑consuming task. The problem is characterised in Figure 4. In Figure 4a, links a to d form part of the map edge. Link e is a part of a ROAD METALLING link. Part of e (e') is co­incident with a part of c, namely c' (see Figure 4b). The link‑and‑node structure does not allow links to overlap and coincide in this way. If a node was inserted at E, this could result in duplicate occurrences of e', which was only 30 cm on the ground. This is smaller than the survey tolerance. Stage 1 was thus enhanced to detect such cases by including a check which flag links with identical bearings at a node. This revealed a number of instances of duplicate links and dangling lines without free nodes which were only about 2 cm on the ground on average. This suggests that these and other errors, some with configurations similar to those in Figure 4, are software generated. In the event only the situation described in Figure 4 proved to be critical and was patched manually. Since cleaning is time consuming, we ignored the other errors which could prove to be critical to other tasks. Although OSBASE does not guarantee that road polygons are properly formed, there were relatively few cases of geometric inconsistencies, some of which are described later.

  Figure 6

4.   Road Extraction Using Point-in-polygon tests

Road centre lines are manual abstractions of the connectivity of the road network. Since centre lines are always located within road polygons, the extraction of road networks may be conceived as essentially one of finding those polygons which contain the centre lines. This idea may be articulated in different ways. In this section, we consider the implications of using the point‑in‑polygon approach for finding the containing polygon. In the next section, this is contrasted with the use of topological clues.

 

In the previous section we described how the OSBASE links were re‑structured to make explicit the primitive regions they describe. Road extraction now consists of the following steps illustrated in Figure 5.

  • select only those primitive regions which were likely to be roads for further consideration

  • select suitable seeds, i.e. points on the road centre line network, for point‑in‑polygon checks

  • label primitive regions which contain these seeds as those forming road networks

4.1 Selection of Candidate Regions

Road networks are regions which contain centre lines. In order to minimise computation, road networks were initially assumed to be those regions which contain at least one ROAD METALLING link. Candidates are found as follows:

 

Select all links whose Feature Code = "Road Metalling Link"

Create a unique list of all the boundaries referred to in their

        Left Boundary and Right Boundary fields

Select only those boundaries which are enclosing boundaries

 

Of the 44186 boundaries, only 1669 (3.8 percent) were selected as candidates. The status of each of these candidate boundaries is noted in a separate list as shown in Figure 2b.


4.2 Selection of Seeds
 

In concept, only one point on the centre line network is needed to find the enclosing polygon. In practice this is somewhat more involved because of the presence of special cases. The strategy adopted may be summarised as follows:

 

FOR each Centre Line Network

CASE (Number‑of‑Nodes in Centre Line Network)

 

2+)    Avoid dangling nodes (i.e. nodes with only one link attached to it) since these may lie on the boundary of the object leading to ambiguity. SELECT AN INTERNAL NODE WITH AT LEAST 2 CONNECTED LINKS

 

2)     IF there are internal points present in the coordinates file

        THEN take internal point

        ELSE calculate an intermediate point.

 

Else) ERROR

 

The status of each seed is noted in a list (see Figure 2b)

 

4.3 Labelling the Regions

 

This essentially consists of finding the region (i.e. enclosing boundary) which contains the seed. However, as described below, it is more efficient to search through the seeds for each polygon instead of through the polygons for each seed. 

Label all candidate boundaries as UNCLASSIFIED

FOR each candidate boundary

      set found to FALSE

      IF (UNCLASSIFIED candidate boundary)

      THEN { WHILE (UNUSED seeds in list AND NOT found )

                  { IF (next UNUSED seed is inside )

THEN set found to TRUE

          label candidate boundary as ROAD

          add label ROAD to all links which form candidate boundary

          label seed as USED

          FOR each neighbour (i.e. boundary on other side of link)

IF (there are no other ROAD METALLING links in neighbour) label boundary as ROAD NEIGHBOUR

 

     )

                       )

 

Note that in the DAM structure, it is only necessary to label the enclosing boundary as ROADs since this implies that any holes at the next level in the hierarchy also refer to the same object. In general, if a candidate boundary is a ROAD, then it is possible to make some assumptions about the neighbour. The neighbour shares one or more ROAD METALLING links with the identified road. If these are discounted,

*    a neighbour which has no other ROAD METALLING link is unlikely to be another road.

*    the status of a neighbour with additional ROAD METALLING links cannot be determined.

 

At the end of this labelling process, candidate boundaries will be labelled as ROAD, ROAD NEIGHBOUR, or as UNCLASSIFIED. Seeds would be labelled as USED or UNUSED.

 

 Figures 7 - 9


4.4 Discussion of Results

The process of road extraction is essentially one of classification. The success of this process may be evaluated by identifying and explaining errors of omission and commission. All cases of the following types were automatically identified and manually inspected:

  • Road seeds which remain UNUSED
    There was only one instance of this type. It occurs because its enclosing boundary, which forms a genuine road, does not contain any ROAD METALLING links. The probability of this occurring in an extensive road network is very small. These errors of omission tend to occur where truncation by the map edge leaves a small section of road as shown in Figure 6a. As a result of building and water detail having higher priority in the feature coding hierarchy than ROAD METALLING, none of the links in this road boundary have the ROAD METALLING feature code. The polygon was thus not identified as a candidate.

There are two possible solutions to this problem. One is to highlight these cases for manual checking and labelling. The other is to achieve a more complete solution. The latter solution is possible but this means that all enclosing boundaries which touch the map edge, and not just those with ROAD METALLING links, are considered as candidate boundaries. This would mean that some 6170, instead of 1669, boundaries (i.e. 14.0 percent, instead of 3.8 percent) would become candidates.

  • Candidate boundaries with ROAD METALLING links which remain UNCLASSIFIED. These occur in two situations:

1)  when only one side of a road skirts the map edge resulting in a missing seed (see Figure 8). Without a road centre line, both the road and the adjoining regions will remain UNCLASSIFIED and will have to be manually checked and labelled as either ROAD or ROAD NEIGHBOUR.
 

 2)  when a road network is segmented by overhead features (see Figure 6b). Here selection of a single seed is insufficient as the overhead feature truncates the road into two polygons, only one of which contains the sample seed. An analysis of these errors indicate the following:

‑ the algorithm for road extraction was too simplistic

‑ the data specification does not expedite road extraction

 

The problems related to data are considered first since they guide the problem‑solving strategy. As pointed out earlier, OSBASE is a cartographic model of data; these cases show how only the uppermost features are depicted in maps. Road centre lines, on the other hand, record connected networks. Thus, where overhead features occur it is not possible to use just one seed from the centre line network.

 

Again there are two solutions. One is to use automatic validation procedures to highlight UNCLASSIFIED enclosing boundaries for manual labelling as before. Another somewhat involved procedure is to consider all the points in the road centre line network. It is insufficient to consider only the nodes of this network. This is done as follows:

 

Label all vertices in the road centre line network as UNUSED.

WHILE (still UNUSED vertices)

( Select one seed as before

When this seed is labelled as USED

(    Label all points which are either inside the identified road

      or its neighbours as USED

     )

)

 

( This would also eliminate another type of error. There were two instances when road centre lines occurred within what appeared to be neighbours. In one, the centre line extended into and terminated within a roundabout. This is a residual error due to a change in data specification. The other case is typified in Figure 6c. Here link A crosses a ROAD METALLING peck and extends into an access road. Other similar access roads have not been included in the road centre line network; this suggests that the digitisation of Link A may be have been an error).

 

Unfortunately, the inclusion of all points in the road centre line network will lead to errors of commission since points on the road centre line can and do fall within other regions which are not roads. For example, in Figure 6d, road centre line points can fall in region A and C, which cannot be ruled out as candidates until region B is correctly identified as a road.

 

In addition to these errors of omission, the data specification introduces some anomalies. Road boundaries include features which are not metalled roads. Figure 7b illustrates how the road can extend into pavements, driveways and other features (see, for example, the cases labelled A ‑ C). In most instances, these extraneous features are separated from the road network by ROAD METALLING links. ROAD METALLING links are sometimes omitted, for example, where a road ends and a drive to a block of garages begins. Since neither roads nor appended features are area‑filled in OS maps, OS does not need to separate roads from other features. Whilst these link omissions do not affect OS map production, the inclusion of extraneous features can impede road recognition since they can distort the metrics, such as shape, which have been used by others to guide the process of object recognition. Thus, procedures need to be developed to detect automatically the presence of extraneous features.

 

  Figure 10

5. Road Extraction Using Topological Cues

The point‑in‑polygon approach to road extraction is modelled on the direct visual perception of roads as regions traversed by the road centre‑line network. It is inefficient because it involves the retrieval from disc of locational information (co‑ordinates). Spatial data models, such as DAM, are intellectual constructs devised to provide a framework for reasoning. They guide us towards more efficient solutions although, once found, the solutions are not dependent upon the re‑structuring of the data into DAM form. Also, these solutions encourage us to visualise the problem in a different way.

 

The problem of road extraction is essentially one of identifying the region within which the road centre line lies. Centre lines have to be used because, the ROAD METALLING links do not indicate whether they bound regions to the left or the right of them. This impedes extraction when we have sibling regions as in Figure 8a. However, since roads form connected networks, ROAD METALLING links in holes provide unambiguous clues. For example, in Figure 8b, region C which is inside a hole cannot form a road since it does not lead anywhere. Thus, ROAD METALLING links forming holes normally indicate that the region which contains C, namely B in Figure 8b, must be the road. Unfortunately, not all roads contain holes. Moreover, not all holes in roads contain ROAD METALLING links.

 

However, holes at all levels of the geometric hierarchy provide clues for identifying roads by elimination of unlikely candidates. The hole at level 1, namely the map border, is equally informative. We noted earlier that the road centre lines connect to the OSBASE layer at the map edge. Since the region outside the map lies to one side of the MAP EDGE links with centre line nodes, the level 2 enclosing boundaries on the other side must be roads. We can use this knowledge for labelling the candidate boundaries used in the point‑in­polygon test. The process of extraction consists of the following steps.

 

1) Label the obvious cases first.

  • Use the road centre line nodes at the map edge to locate the MAP EDGE links within which they lie.

  • Label the enclosing boundaries containing these links as ROAD, irrespective of whether they contain ROAD METALLING links.

  • Label associated entities (neighbours, holes and links) as before.

Here the point‑in‑polygon test is replaced by a range check, involving only the nodes; thus the vertices in the Coordinates file on disc need not be accessed. This step will identify all roads with centre lines which intersect the map edge.

 

2) Label the roads skirting the edge which do not have a centre line.

  • Select the Still UNCLASSIFIED level 2 enclosing boundaries, with both MAP EDGE and ROAD METALLING links.

  • IF (an enclosing boundary has one and only one MAP EDGE link) THEN label it as ROAD and label associated entities.

In cases, such as that in Figure 9a, this will correctly identify B as the road. If A were the road, a road centre line will have been included. However, in some cases, as shown in Figure 9b, it is possible to pick D first and wrongly identify it as a ROAD. Region C will not be labelled as ROAD NEIGHBOUR since it includes ROAD METALLING links other than those which form the boundary between C and D. Since two neighbouring polygons will now be labelled as ROADS, it is possible during the review stage to re‑label D as ROAD NEIGHBOUR automatically as it does not have any other ROAD METALLING links. Since these roads are deduced rather than explicitly recorded in the database, their labelling must be manually checked since data conditions can lead to wrong conclusions. There were no problems of misidentification in the sheets we processed.

 

3) Label the roads which do not terminate at the map edge.

 

By now only those candidate boundaries which are detached from the main road network, and others arising out of digitising or link‑and‑node structuring errors, will remain UNCLASSIFIED. (In the given data, these form only 2.6 percent of the candidate boundaries.) Clusters of boundaries can become detached because of the presence of overhead features (see Figure l0a). It is possible to resolve these detached roads in several ways:

  •  manually; this should be a last resort and should really be used for checking purposes only

  •  point‑in‑polygon checks as before.
    By now, only a small number of candidate boundaries, which consist of few vertices remain. Thus, these checks are no longer as onerous.

  • topologic reasoning
    If we visualise each cluster of detached enclosing boundaries as occurring within a hole (as in Figure 10b), then only the road
    has at least two sibling candidates with which it shares ROAD METALLING links. Although this rule resolves all the cases we have encountered, it may not be foolproof.

  • a change to the data specification
    Currently, the road centre line layer is not topologically integrated with OSBASE. For example, there are no nodes where the road centre line crosses links recording overhead features. The outermost links recording overhead features can be distinguished since they are feature coded as either BUILDING PECKED, where a building forms the overhead feature, or GENERAL SOLID for other
    features. If these were treated as vertical edges, i.e. like the
    map edge, and if the centre line network was connected to OSBASE at these links by nodes, then road extraction would become a trivial process.

At the end of this three step process only the loops of ROAD METALLING links and other errors, generated during the link‑and‑node structuring stage, and ambiguous cases as in Figure 6c, will remain UNCLASSIFIED.

 

6. Label-based validation

The labels assigned to links and boundaries can be used now to locate inconsistencies in the value‑added database and assess whether they are critical or insignificant. These inconsistencies must be ascribed to one or more of the following reasons:

 

a)     errors in extraction process; these may be due to incorrect

        interpretations of the data specification or inappropriate logic

b)     anomalies caused by edge effects and overhead features

c)     errors in data arising out of contraventions of the data specification

d)     inconsistencies arising out of ambiguities in the data specification

e)     difficulties arising out of the nature of the data specification

 

Two sets of checks were undertaken. The first set checked the correspondence between the OSBASE and road centre line layers; the second validated the consistency of OSBASE data.

 

6.1 Correspondence Between OSBASE and Road Centre Line Layers

 

The following checks were made.

  • Manually check if all regions labelled ROADs, which contain at least one point on the road centre line, are actually roads.
    They appear to be.

  • Automatically flag all regions labelled as ROADS if they do not contain at least one point in the road centre line network.
    This check only identified the cases which could be attributed to cause b. Although the centre line does pass through it, the small detached section of road located in a hole in an overhead motorway (see Figure 10), need not have contained road centre line points. The point‑in‑polygon approach would miss such cases.
     

  • Automatically flag points in the road centre line network which lie within regions which have not been classified as roads
    This identified some overhead features as expected (cause e; see Figure 6b). The case, sketched in Figure 6c, is the only one of its kind in 28 sheets it is likely that it is due to cause c). There was one further case where the road centre line extended into and terminated in a roundabout. This arose from data digitised under a now superceded digitisation specification.
     

  • Automatically flag MAP EDGE links, bounding a region labelled ROAD, which do not contain a centre line node
    There were several violations of this rule and they were all the result of missing LOGICAL links to close off roads, i.e. cause e. For example, the MAP EDGE links of feature C in Figure 7b will be flagged.

These checks indicate that apart from the two cases due to cause c), manual input of centre lines is very reliable.

 

6.2 Consistency of the OSBASE Layer

 

*   Automatically flag enclosing boundaries with residual ROAD METALLING links which remain UNCLASSIFIED. In this check, ROAD METALLING links which already form a part of boundaries labelled as ROAD are discounted.

This found a few (37) cases due to digitising or cleaning errors which formed either a loop in the side of a road or left a small section of ROAD METALLING link in a neighbouring feature, such as a pavement. These should have been removed in the cleaning stage but they do not impede road recognition.

 

*   Automatically flag links of ROADs which have the same boundary number in both the left/right boundary fields

This implies that a ROAD has become a neighbour of itself. There were several instances of this. They appear to be due to missing LOGICAL links (cause e).

 

*   Automatically flag ROADs which are neighbours of other ROADs
    This does occur and can be attributed to missing LOGICAL links (cause e).

 

*   Automatically flag boundaries forming roads which include feature codes which are of lower priority than ROAD METALLING

 

This check identified a number of occurrences of links coded as GENERAL PECKED or VEGETATION in road boundaries. They seem to be due to missing LOGICAL links (cause e) more often than to coding errors. But there are a few cases where the semantic labelling is incorrect. It would be very difficult to locate such residual errors manually without using a model‑based approach, such as that used in this paper.

 

In general, the node‑in‑edge method for road extraction is more reliable than the point‑in‑polygon approach and, even using our still unoptimised implementations, it is about five times faster.

 

OSBASE did contain some link‑and‑node structure violations and a few feature coding errors. These are unlikely to impede road recognition in the majority of cases. The inclusion of extraneous features within roads is more of a problem. However, the study has revealed several ways in which their presence could be detected.

 

7. Conclusion and Future Work

 

This paper has demonstrated that the point in polygon approach to road extraction is inefficient and unreliable owing to special cases in topographic data. It provides a much simpler and more robust method based on topologic reasoning. The Disassociative Area Model (DAM) provided a framework which allowed us to reason about road extraction in different terms. Once found, the solutions may be implemented without re casting link and node structured topographic data into the full DAM format.

The paper has also demonstrated how the process of object extraction can assist in the verification of the topology and the validation of the semantic content of data. Now that the roads have been extracted, it is possible to assess the scope for automatic segmentation and naming of roads. This process would not only use the attributes of road centre lines and road text features but also the clues deduced by this feasibility study.

The catalogue of roads versus road neighbours and other objects can be used to guide the process of road recognition and evaluate.the success of recognition algorithms. The exercise has also suggested how minor changes in the data specification can trivialise road extraction. Roads also provide contextual information to guide the recognition of other topographic objects of relevance to GIS.

 

ACKNOWLEDGEMENTS

This research was made possible by the award of a UK Science and Engineering Research Council (SERC) CASE studentship to Dominic Varley. We are grateful to the Ordnance Survey of Great Britain (OS), the collaborating body, for providing access to their digital topographic data and to the required information. We are particularly grateful to John Farrow, formerly of the OS, for his help, encouragement and input by way of discussion. Thanks are due to others at the OS, especially Nigel Venters and Steve Erskine for answering numerous queries relating to data and Ross Christie for permission to publish this paper. We are indebted to Phil Wade, a past SERC CASE student within the CISRG, for permission to use his software.

REFERENCES

Department of Environment (1987), "Handling Geographic Information", Report of the Committee of Enquiry chaired by Lord Chorley (HMSO, London)

 

Frank, A. U., (1991) "Properties of Geographic Data: Requirements for Spatial Access Methods", Advances in Spatial Databases, ed. Gunther, 0 and Schek, H ‑J, Proceedings of the 2nd Symposium, Zurich, August 1991, 225 ‑ 234.

 

Kirby G. H., Wade P. and Visvalingam M. (1987), "Storage and retrieval of topographic data using a relational database management system", in Haywood P. (ed.) Proceedings of the SORSA `87 symposium, Durham (Ordnance Survey, Southampton).

 

Kirby G. H., Visvalingam M. and Wade P. (1989), "Recognition and representation of a hierarchy of polygons with holes", The Computer Journal, 32 (6), 554 ‑ 562.

 

Visvalingam M. (1990), "Trends and concerns in digital cartography", Computer‑Aided Design, 22(3), 115 ‑ 130.

 

Visvalingam M., Kirby G. H. and Wade P. (1987), "Extraction of a complete description of hierarchically related area objects from feature‑coded map details", in Haywood, P. (ed.) Proceedings of the SORSA `87 symposium,

Durham (Ordnance Survey, Southampton).

 

Visvalingam M. and Sekouris N. M. (1989), "Management of digital map data using a relational database model", End‑of‑grant Report to OS available as Cartographic Information Systems Research Group Special Issue 3,

University of Hull, 50 pp.

 

Wade P., Visvalingam M. and Kirby G. H. (1986) "From Line Geometry to Area Topology", Cartographic Information Systems Research Group Discussion Paper 1, University of Hull, 48 pp.


© Dr Mahes Visvalingam, University of Hull, Uploaded June 2003

Cartographic Information Systems Research Group, University of Hull