Basic Usage¶
The following chapter gives a practical introduction on how to use Pyosmium. It is assumed that you already have a basic knowledge about the OSM data model.
For a more detailed introduction into the design of the osmium library, the reader is referred to the osmium documentation.
Using Handler Classes¶
OSM file parsing by osmium is built around the concept of handlers. A handler is a class with a set of callback functions. Each function processes exactly one type of object as it is read from the file.
Let’s start with a very simple handler that counts the nodes in the input file:
import osmium
class CounterHandler(osmium.SimpleHandler):
def __init__(self):
osmium.SimpleHandler.__init__(self)
self.num_nodes = 0
def node(self, n):
self.num_nodes += 1
A handler first of all needs to inherit from one of the handler classes.
At the moment this is always osmium.SimpleHandler
. Then it
needs to implement functions for each object type it wants to process. In
out case it is exactly one function node(). All other potential callbacks
can be safely ignored.
Now the handler needs to be applied to an OSM file. The easiest way to
accomplish that is to call the apply_file()
convenience function, which in its simplest form only requires the file name
as a parameter. The main routine of the node counting application
therefore looks like this:
if __name__ == '__main__':
h = CounterHandler()
h.apply_file("test.osm.pbf")
print("Number of nodes: %d" % h.num_nodes)
That already finishes our node counting program.
Inspecting the OSM objects¶
Counting nodes is actually boring because it completely ignores the
content of the nodes. So let’s change the handler to only count hotels
(normally tagged with tourism=hotel
). They may be tagged as nodes, ways
or relations, so handler functions for all three types need to be implemented:
import osmium
class HotelCounterHandler(osmium.SimpleHandler):
def __init__(self):
osmium.SimpleHandler.__init__(self)
self.num_nodes = 0
def count_hotel(self, tags):
if tags['tourism'] == 'hotel':
self.num_nodes += 1
def node(self, n):
self.count_hotel(n.tags)
def way(self, w):
self.count_hotel(w.tags)
def relation(self, r):
self.count_hotel(r.tags)
A reference to the object is always given as the only parameter to the
handler functions. The objects have some common methods and attributes,
listed in osmium.osm.OSMObject
, and some that are specific to
each type. As all objects have tags, it is possible to reuse the same
implementation for all types. The main function remains the same.
It is important to remember that the object references that are handed to the handler are only temporary. They will become invalid as soon as the function returns. Handler functions must copy any data that should be kept for later use into their own data structures. This also includes attributes like tag lists.
Handling Geometries¶
Because of the way that OSM data is structured, osmium needs to internally
cache node geometries, when the handler wants to process the geometries of
ways and areas. The apply_file()
method cannot
deduct by itself, if this cache is needed. Therefore locations need to be
explicitly enabled by setting the location parameter to true:
h.apply_file("test.osm.pbf", locations=True, idx='sparse_mem_array')
The third parameter idx is optional and states what kind of cache osmium is supposed to use. The default sparse_mem_array is a good choice for small to medium size extracts of OSM data. If you plan to process the whole planet file, dense_mmap_array is better suited. If you want the cache to be persistent across invocations, you can use dense_file_array giving an additional file location for the cache like that:
h.apply_file("test.osm.pbf", locations=True, idx='sparse_file_array,example.nodecache')
where example.nodecache is the name of the cache file.
Interfacing with Shapely¶
Pyosmium is a library for processing OSM files and therefore offers almost no functionality for processing geometries further. For this other libraries exist. To interface with these libraries you can simply convert the osmium geometries into WKB or WKT format and import the result. The following example uses the libgeos wrapper Shapely to compute the total way length:
import osmium
import shapely.wkb as wkblib
# A global factory that creates WKB from a osmium geometry
wkbfab = osmium.geom.WKBFactory()
class WayLenHandler(osmium.SimpleHandler):
def __init__(self):
osmium.SimpleHandler.__init__(self)
self.total = 0
def way(self, w):
wkb = wkbfab.create_linestring(w)
line = wkblib.loads(wkb, hex=True)
# Length is computed in WGS84 projection, which is practically meaningless.
# Lets pretend we didn't notice, it is an example after all.
self.total += line.length
if __name__ == '__main__':
h = WayLenHandler()
h.apply_file("test.osm.pbf", locations=True)
print("Total length: %f" % h.total)