Blog | The Gascon Rolls Project

First steps in geocoding

Posted on August 5, 2014, 11:15 am by Emma Tonkin

As part of the calendaring process, historians working on the Gascon Rolls routinely identify and encode a variety of types of entity: people, places, things, entry type and so forth. This information, whilst useful in itself, may be considered most useful when placed in a broader aggregate context. How did things change throughout the life of this foothold of English royalty in France? Did the focus move over time, from one town to another, or from one wealthy family to the next?

Answering these questions and others involves enhancing the entity information held by the project, so that we can store and present as much information as possible about the entities involved and the contexts in which events occurred and transactions were made. In plain terms, we would like to be able to answer the core metadata questions of who, what, when and where; the answers can become useful evidence to underpin speculation about the fifth W: 'why'.

In this blog post we cover the geocoding of entities. In later posts, we'll talk about the use of this information to connect calendar entries and membranes (that is, images of the 'pages' of the Rolls) and show how the resulting dataset can be visualised using the open-source R data analysis and statistics package.

This geospatial work is part of the Bordeaux and Leverhulme funding phases of the Gascon Rolls project; we're collaborating with the team at Bordeaux led by Frederic Boutoulle and Francoise Lainé on geospatial issues.

From names to numbers

By convention, the Gascon Rolls encoding process includes the provision of basic location information: name of locality, region and so forth. For example, Westminster is linked to other entities in the location hierarchy including its parent, London. Consequently, this encoding provides partial address information. Such partial addresses can be geocoded (that is, we can establish the approximate geographic coordinates belonging to that entity) using an appropriate address/location database.
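To make the idea of a 'partial address' concrete, here is a minimal Java sketch of a location hierarchy. The class and method names are hypothetical and are not taken from the project's own encoding; the only data used is the Westminster and London example above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical place entity with an optional parent in the location hierarchy. */
final class PlaceEntity {
    final String name;
    final PlaceEntity parent; // null at the top of the hierarchy

    PlaceEntity(String name, PlaceEntity parent) {
        this.name = name;
        this.parent = parent;
    }

    /** Walks up the parent links and joins the names, most specific first. */
    String partialAddress() {
        List<String> parts = new ArrayList<>();
        for (PlaceEntity p = this; p != null; p = p.parent) {
            parts.add(p.name);
        }
        return String.join(", ", parts);
    }

    public static void main(String[] args) {
        PlaceEntity london = new PlaceEntity("London", null);
        PlaceEntity westminster = new PlaceEntity("Westminster", london);
        System.out.println(westminster.partialAddress()); // prints "Westminster, London"
    }
}
```

A string of this kind is what would be handed to an address database for lookup.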

In the case of French locations, the Gascon Rolls project has made use of a dataset provided by the National Institute of Statistics and Economic Studies (INSEE) which provides location name, 'partial address', unique identifier (INSEE code) and location coordinates. Using this, it becomes possible to resolve many location entities into coordinate sets.
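As a rough illustration of this lookup step, the sketch below keeps a small in-memory table keyed by INSEE code. The class names are hypothetical, and the single row is a hand-entered example with approximate coordinates for Bordeaux, not a record copied from the INSEE file; a real implementation would load the full dataset.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory gazetteer keyed by INSEE code. */
final class InseeGazetteer {
    /** One row: commune name plus Lambert-93 easting/northing in metres. */
    static final class Entry {
        final String name;
        final double x;
        final double y;

        Entry(String name, double x, double y) {
            this.name = name;
            this.x = x;
            this.y = y;
        }
    }

    private final Map<String, Entry> byCode = new HashMap<>();

    void add(String inseeCode, Entry entry) {
        byCode.put(inseeCode, entry);
    }

    /** Returns the entry for a code, or null when the place cannot be resolved. */
    Entry resolve(String inseeCode) {
        return byCode.get(inseeCode);
    }

    public static void main(String[] args) {
        InseeGazetteer gazetteer = new InseeGazetteer();
        // Illustrative row only: rough coordinates, not copied from the INSEE file.
        gazetteer.add("33063", new Entry("Bordeaux", 417_240.0, 6_421_810.0));

        Entry hit = gazetteer.resolve("33063");
        System.out.println(hit.name + " -> x=" + hit.x + ", y=" + hit.y);
        System.out.println(gazetteer.resolve("00000")); // unresolved code gives null
    }
}
```

The codes are held as strings rather than integers because some INSEE codes contain letters (the Corsican departments use 2A and 2B).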

Frames of reference

There's more to be done before these coordinates can be used with standard web APIs. One might be forgiven for presuming that there is just one standard for sharing coordinates, but in fact this is not so; far from it. There are two broad approaches for describing geographic positions on the surface of the planet, which is more or less a (slightly bulgy) sphere. The first is to model the planet as just that, a bulgy sphere, and describe positions accordingly: "the place we're interested in is this far from the equator and this far around the world from a convenient baseline". An example of this is the World Geodetic System (WGS84 in its current revision), which happens to be essentially the approach used by Google. It is therefore of particular interest to us, since many programming tools have adopted this convention. The second approach is to give up on the sphere and flatten out the planet instead, or at least that part of the planet which interests you, into a flat sheet.

Our INSEE data, for example, is encoded in a particular coordinate system often used in metropolitan France, Lambert-93. This is a conic map projection designed to give good coverage of France. Essentially, it uses a conical surface to approximate the surface of the Earth in the region where the cone is in contact with the sphere.

To see how that might work, take a piece of paper and form it into a cone-shaped paper hat, something like a dunce's cap. Now perch it at a jaunty angle on the planet, or a reasonable facsimile. Then trace the national boundaries onto the paper; this is more difficult to do in person than on a computer, because distortion can be minimised by allowing the paper to pass underneath the surface of the globe at some points. In any case, when you've finished, take the cap off the planet, remove the sticky tape and flatten the paper back out. The quality of the result will depend on whether you have chosen to perch the cap at a sensible angle. Some Lambert projections look positively majestic. Lambert-93 stipulates that the cap must be placed on the planet at just the right angle and height to display France with minimal distortion. It uses the reference parallels of 44°N and 49°N: that is, the cone hits the sphere at 49°N, and pops out again at 44°N.

If the above process proves to be a little difficult to imagine, you may enjoy this YouTube visual: https://www.youtube.com/watch?v=AGXP23icm4I
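The paper-cap picture can also be put into numbers. The sketch below uses the standard Lambert conformal conic formulas for a cone cutting the ellipsoid at two parallels, and assumes the GRS80 ellipsoid; the class and method names are my own. It derives the cone constant from the 44°N and 49°N parallels and prints the map scale at a few latitudes: exactly 1 on the two reference parallels, slightly below 1 between them (where the paper passes under the surface) and above 1 outside them.

```java
/** Sketch: deriving the Lambert-93 cone from its two reference parallels. */
final class Lambert93Cone {
    // GRS80 ellipsoid (an assumption of this sketch).
    static final double A = 6378137.0;                 // semi-major axis, metres
    static final double F = 1.0 / 298.257222101;       // flattening
    static final double E = Math.sqrt(2 * F - F * F);  // first eccentricity

    static final double PHI1 = Math.toRadians(44.0);   // southern reference parallel
    static final double PHI2 = Math.toRadians(49.0);   // northern reference parallel

    /** Radius of the parallel at latitude phi, divided by A. */
    static double m(double phi) {
        double s = Math.sin(phi);
        return Math.cos(phi) / Math.sqrt(1 - E * E * s * s);
    }

    /** Conformal latitude term t(phi); it shrinks towards the pole. */
    static double t(double phi) {
        double es = E * Math.sin(phi);
        return Math.tan(Math.PI / 4 - phi / 2) / Math.pow((1 - es) / (1 + es), E / 2);
    }

    /** Cone constant n: how tightly the paper cone is rolled. */
    static double coneConstant() {
        return (Math.log(m(PHI1)) - Math.log(m(PHI2)))
             / (Math.log(t(PHI1)) - Math.log(t(PHI2)));
    }

    /** Projection constant C = A * m1 / (n * t1^n), in metres. */
    static double projectionConstant() {
        double n = coneConstant();
        return A * m(PHI1) / (n * Math.pow(t(PHI1), n));
    }

    /** Point scale factor at the given latitude (1.0 means no distortion). */
    static double scaleAt(double latitudeDeg) {
        double phi = Math.toRadians(latitudeDeg);
        double n = coneConstant();
        double rho = projectionConstant() * Math.pow(t(phi), n);
        return rho * n / (A * m(phi));
    }

    public static void main(String[] args) {
        System.out.println("n = " + coneConstant());
        System.out.println("C = " + projectionConstant());
        for (double lat : new double[] {42.0, 44.0, 46.5, 49.0, 51.0}) {
            System.out.println("scale at " + lat + "N = " + scaleAt(lat));
        }
    }
}
```

The two constants printed first should agree closely with the values IGN publishes for Lambert-93, which is a useful sanity check on the choice of parallels and ellipsoid.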

Conversion process

Lambert-93 coordinates can be converted into the standards used by most Web APIs using a conversion algorithm: an appropriate approach has been published by France's Institut Géographique National (IGN). The difficulty is that convenient implementations of this conversion are relatively scarce, since Lambert-93 is not widely used internationally. Industry-standard tools such as QGIS can perform these transforms, and some libraries also provide conversions. A couple of examples follow:

The OpenLayers JavaScript library (source code):

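// Reprojects the current map extent from the map's own projection into Lambert-93 (OpenLayers 2 API; viewer is defined elsewhere)
viewer.getMap().getExtent().toGeometry().transform(viewer.getMap().getProjection(), new OpenLayers.Projection("IGNF:LAMB93"));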

Or, using a Java library (lambert-java):

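import net.yageek.lambert.*;

public class LambertToWgs84 {
    public static void main(String[] args) {
        // ...
        // Lambert-93 easting (x) and northing (y), read from the command line
        double xval = Double.parseDouble(args[0]);
        double yval = Double.parseDouble(args[1]);
        // Convert to WGS84 coordinates in decimal degrees
        LambertPoint pt = Lambert.convertToWGS84Deg(xval, yval, LambertZone.Lambert93);
    }
}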
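
Where no library is to hand, the inverse conversion can also be sketched directly in plain Java. The version below is my own reading of the IGN approach, not the project's code: it hard-codes the Lambert-93 constants as I understand IGN to publish them, recovers longitude from the angle around the cone's apex, and recovers latitude by a short iteration. Strictly speaking the result is in RGF93, which for most mapping purposes is treated as equivalent to WGS84.

```java
/** Sketch of the inverse Lambert-93 projection (easting/northing to longitude/latitude). */
final class Lambert93Inverse {
    // Lambert-93 constants (GRS80 ellipsoid), assumed to match the IGN published values.
    static final double N = 0.7256077650532670;        // cone constant
    static final double C = 11754255.426096;           // projection constant, metres
    static final double XS = 700000.0;                 // x of the cone apex, metres
    static final double YS = 12655612.049876;          // y of the cone apex, metres
    static final double E = 0.0818191910428158;        // first eccentricity of GRS80
    static final double LAMBDA0 = Math.toRadians(3.0); // central meridian, 3 degrees east

    /** Returns {longitude, latitude} in decimal degrees. */
    static double[] toLonLat(double x, double y) {
        double dx = x - XS;
        double dy = YS - y;                    // distance south of the apex
        double r = Math.hypot(dx, dy);         // distance from the apex
        double gamma = Math.atan2(dx, dy);     // angle away from the central meridian
        double lambda = LAMBDA0 + gamma / N;
        double isoLat = -Math.log(r / C) / N;  // isometric latitude

        // Recover geodetic latitude from isometric latitude by fixed-point iteration.
        double phi = 2 * Math.atan(Math.exp(isoLat)) - Math.PI / 2;
        for (int i = 0; i < 50; i++) {
            double es = E * Math.sin(phi);
            double next = 2 * Math.atan(Math.pow((1 + es) / (1 - es), E / 2) * Math.exp(isoLat))
                        - Math.PI / 2;
            boolean converged = Math.abs(next - phi) < 1e-12;
            phi = next;
            if (converged) {
                break;
            }
        }
        return new double[] {Math.toDegrees(lambda), Math.toDegrees(phi)};
    }

    public static void main(String[] args) {
        // The projection origin (x = 700000, y = 6600000) should map to 3E, 46.5N.
        double[] lonLat = toLonLat(700000.0, 6600000.0);
        System.out.println("longitude = " + lonLat[0] + ", latitude = " + lonLat[1]);
    }
}
```

Checking the projection origin in this way is a quick test that the constants have been entered correctly before running a whole dataset through the conversion.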