This is home to Cosmogony, a project that aims to provide an efficient tool for quickly using and updating worldwide geographical regions. It returns geographical zones with a structured hierarchy, so you can easily know that Paris is a city in the state Île-de-France in the country France. The architecture of Cosmogony is based on OpenStreetMap data and on well-defined libpostal rules that type each zone according to its country. The resulting hierarchy is then built from geographical inclusion. An example of a full data extract can be browsed at http://cosmogony.world.
To explore and navigate the resulting hierarchy easily, Cosmogony comes with two companion tools:
Below is a brief visualisation of a basic use case of the Cosmogony Explorer:
🚧 Until we offer a direct data download, you have to extract your geographic regions yourself (see below). 🚧
The best way to explore the data (i.e. the coverage, the zone metadata, the hierarchy, etc.) is our Cosmogony Explorer.
🚧 In the future, we may create other tools to use the data. Please share your ideas and needs in the issues. 🚧
You can build cosmogony to extract the regions on your own.
Here are the necessary manual steps to build cosmogony:

```shell
curl https://sh.rustup.rs -sSf | sh  # install Rust
apt-get install libgeos-dev  # install GEOS
git clone https://github.com/osm-without-borders/cosmogony.git  # clone this repo
cd cosmogony  # enter the directory
git submodule update --init  # update the git submodules
cargo build --release  # finally, build cosmogony
```
You can now grab an OSM PBF file and extract your geographic zones:

```shell
cargo run --release -- generate -i /path/to/your/file.osm.pbf
```
Check out cosmogony help for more options:
```shell
cargo run --release -- -h
```
Note: the default subcommand is `generate`, so `cosmogony -i <osm-file> -o <output-file>` is the same as `cosmogony generate -i <osm-file> -o <output-file>`.
To generate a world cosmogony on a server without a lot of RAM, you can generate cosmogonies from split, non-overlapping OSM files that do not share a parent (e.g. split by continent or country), and then merge the generated cosmogonies.
To merge several cosmogonies into one, you can use the custom `merge` subcommand:

```shell
cargo run --release -- merge *.jsonl -o merged_cosmo.jsonl
```
Note: to reduce the memory footprint, it can only merge JSON Lines cosmogonies (so `.jsonl` or `.jsonl.gz` files).
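To illustrate why inputs that do not share a parent are easy to merge, here is a minimal Rust sketch of a streaming merge. It assumes that zone ids are only unique within one file (the output example below starts at `"id":0`) and that every `parent` points to a zone of the same file; the `Zone` struct and the `merge_streaming` function are hypothetical and are not cosmogony's actual merge code. Each zone is handed to `emit` as soon as it is read, with its id and parent shifted past the ids already used, so the merged result never has to be held in memory at once.

```rust
/// Hypothetical, simplified view of one zone of a `.jsonl` cosmogony:
/// only the fields needed to illustrate the merge are kept.
#[derive(Debug, Clone, PartialEq)]
pub struct Zone {
    pub id: usize,
    pub parent: Option<usize>,
    pub name: String,
}

/// Streams the zones of several cosmogonies into `emit`, one at a time.
/// Assumes ids are only unique within one input and that parents always
/// point to a zone of the same input (the inputs share no parent).
pub fn merge_streaming<P, Z, F>(parts: P, mut emit: F)
where
    P: IntoIterator<Item = Z>,
    Z: IntoIterator<Item = Zone>,
    F: FnMut(Zone),
{
    // Smallest id not used yet in the merged output.
    let mut next_free_id: usize = 0;
    for part in parts {
        // Every id of this input is shifted past the ids already emitted.
        let offset: usize = next_free_id;
        for zone in part {
            let id = zone.id + offset;
            next_free_id = next_free_id.max(id + 1);
            emit(Zone {
                id,
                parent: zone.parent.map(|p| p + offset),
                name: zone.name,
            });
        }
    }
}
```

If one input referenced a parent living in another file, the shifted `parent` would point to the wrong zone, which is one reason to split the planet into inputs without a shared parent.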
The initial purpose of Cosmogony is to enhance mimir, our geocoder (see the founding issue for a bit of context). Another common use case is to create geospatially aware statistics, such as choropleth maps. Anyway, we'd love to know what you've built with it, so feel free to add your use cases to Awesome Cosmogony.
OpenStreetMap (OSM) seems to be the best data source for our use case. However, the OSM administrative regions (admins) have several drawbacks:
- admin_level: The world is a complex place where each country has its own administrative divisions. OSM uses an `admin_level` tag, with values ranging from 1 to ~10, to allow consistent rendering of borders across countries. This is fine for making maps, but if you want a worldwide list of cities or regions, you still need local, country-specific knowledge to find which admin_level to use in each country.
- no existing hierarchy: indeed, the OSM data model rests only on `nodes`, `ways` and `relations`, without any structure.
To mitigate this, the general idea is to take an OSM PBF file and to:
- use a geometric algorithm to determine which admin is contained in another admin (we'll start with exact shape inclusion and see if that's enough).
- use the libpostal rules to type each admin depending on its country.
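A deliberately simplified Rust sketch of the inclusion step: each zone gets as parent the smallest larger zone whose outline contains the zone's center point. This is only a stand-in for the exact shape inclusion mentioned above, which needs a robust geometry library (such as GEOS, installed in the build steps) because neighbouring zones share borders; `Ring`, `contains_point`, `area` and `find_parent` are hypothetical names, and coordinates are treated as flat (lon, lat) pairs.

```rust
/// Outer ring of a zone as (lon, lat) vertices; the first vertex is not repeated.
pub type Ring = Vec<(f64, f64)>;

/// Ray-casting (even-odd) point-in-polygon test.
pub fn contains_point(ring: &[(f64, f64)], (px, py): (f64, f64)) -> bool {
    let n = ring.len();
    if n < 3 {
        return false;
    }
    let mut inside = false;
    let mut j = n - 1;
    for i in 0..n {
        let (xi, yi) = ring[i];
        let (xj, yj) = ring[j];
        // Edge (j, i) straddles the line y = py and crosses it to the right of px.
        if (yi > py) != (yj > py) && px < (xj - xi) * (py - yi) / (yj - yi) + xi {
            inside = !inside;
        }
        j = i;
    }
    inside
}

/// Polygon area with the shoelace formula (squared degrees, only used for ranking).
pub fn area(ring: &[(f64, f64)]) -> f64 {
    let n = ring.len();
    let mut twice_area: f64 = 0.0;
    for i in 0..n {
        let (x1, y1) = ring[i];
        let (x2, y2) = ring[(i + 1) % n];
        twice_area += x1 * y2 - x2 * y1;
    }
    (twice_area / 2.0).abs()
}

/// Hypothetical parent search: among the zones strictly larger than `child`,
/// pick the smallest one that contains the child's center point.
pub fn find_parent(zones: &[Ring], child: usize, child_center: (f64, f64)) -> Option<usize> {
    let child_area = area(&zones[child]);
    zones
        .iter()
        .enumerate()
        .filter(|(i, ring)| *i != child && area(ring) > child_area && contains_point(ring, child_center))
        .min_by(|(_, a), (_, b)| area(a).total_cmp(&area(b)))
        .map(|(i, _)| i)
}
```

Taking the smallest container (rather than any container) is what turns plain inclusion into a hierarchy: a city lies inside both its state and its country, but only the state is its direct parent.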
OSM administrative regions may not be mapped with the same precision all over the Earth, but the data is easy to update, and every update benefits the whole community.
In the future, we may consider using other data sources beyond OSM (with a compatible license).
However, we don't want cosmogony to be too complex (as the great WhosOnFirst is, see below).
The libpostal types seem nice (and made by brighter people than us):
- suburb: usually an unofficial neighborhood name like "Harlem", "South Bronx", or "Crown Heights"
- city_district: these are usually boroughs or districts within a city that serve some official purpose e.g. "Brooklyn" or "Hackney" or "Bratislava IV"
- city: any human settlement including cities, towns, villages, hamlets, localities, etc.
- state_district: usually a second-level administrative division or county.
- state: a first-level administrative division. Scotland, Northern Ireland, Wales, and England in the UK are mapped to "state" as well (convention used in OSM, GeoPlanet, etc.)
- country_region: informal subdivision of a country without any political status
- country: sovereign nations and their dependent territories, anything with an ISO-3166 code.
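The per-country typing can be pictured as a lookup table from `(country, admin_level)` to one of the libpostal types above. The Rust sketch below is hypothetical: its rule table only holds illustrative French levels (admin_level 4 for régions, 6 for départements, 8 for communes) and is not copied from the real config files. It also shows the two ways typing can fail, which presumably correspond to the `unhandled_admin_level` and `zone_with_unkwown_country_rules` statistics in the output example.

```rust
use std::collections::HashMap;

/// The libpostal zone types listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ZoneType {
    Suburb,
    CityDistrict,
    City,
    StateDistrict,
    State,
    CountryRegion,
    Country,
}

/// Hypothetical rule table: country code -> (OSM admin_level -> zone type).
pub type Rules = HashMap<&'static str, HashMap<u32, ZoneType>>;

/// Illustrative rules for France only; not the real libpostal config.
pub fn example_rules() -> Rules {
    let france = HashMap::from([
        (2, ZoneType::Country),
        (4, ZoneType::State),
        (6, ZoneType::StateDistrict),
        (8, ZoneType::City),
    ]);
    HashMap::from([("FR", france)])
}

/// Result of typing one zone.
#[derive(Debug, PartialEq)]
pub enum Typing {
    Typed(ZoneType),
    /// The country has rules, but none for this admin_level.
    UnhandledAdminLevel,
    /// No rules at all for the zone's country.
    UnknownCountryRules,
}

/// Types a zone from its country code and OSM admin_level.
pub fn type_zone(rules: &Rules, country: &str, admin_level: u32) -> Typing {
    match rules.get(country) {
        None => Typing::UnknownCountryRules,
        Some(levels) => match levels.get(&admin_level) {
            Some(zone_type) => Typing::Typed(*zone_type),
            None => Typing::UnhandledAdminLevel,
        },
    }
}
```

The same admin_level can mean different things in different countries, which is why the key is the pair (country, level) rather than the level alone.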
Cosmogony reads OSM tags to determine names and labels for all zones, in all available languages.
In addition to `name:*` tags from the boundary objects themselves, names from related objects are also used, as they may provide more languages:
- nodes with role `label` (if present)
- nodes with role `admin_centre` (if relevant: for cities, or on matching wikidata ID)
Note that these additional `name:*` values are included in the zone `tags` in the output to make reuse easier, even though they are not part of the OSM object tags.
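A minimal Rust sketch of this name enrichment. It assumes that names from related nodes only fill in languages the boundary does not already have (the boundary's own `name:*` tags win) and that the `label` node is read before the `admin_centre` node; `Tags`, `add_missing_names` and `enrich_names` are hypothetical names, and the caller decides whether the zone is a city and passes its wikidata ID.

```rust
use std::collections::BTreeMap;

/// Hypothetical tag map: OSM key -> value.
pub type Tags = BTreeMap<String, String>;

/// Copies the `name:*` tags of a related node into the zone tags,
/// keeping names the boundary already has (assumption: boundary names win).
pub fn add_missing_names(zone_tags: &mut Tags, node_tags: &Tags) {
    for (key, value) in node_tags {
        if key.starts_with("name:") && !zone_tags.contains_key(key) {
            zone_tags.insert(key.clone(), value.clone());
        }
    }
}

/// Adds names from the `label` node (if present), then from the
/// `admin_centre` node (only for cities, or when both share a wikidata ID).
pub fn enrich_names(
    zone_tags: &mut Tags,
    label: Option<&Tags>,
    admin_centre: Option<&Tags>,
    is_city: bool,
    zone_wikidata: Option<&str>,
) {
    if let Some(label) = label {
        add_missing_names(zone_tags, label);
    }
    if let Some(centre) = admin_centre {
        let centre_wikidata = centre.get("wikidata").map(String::as_str);
        let same_wikidata = zone_wikidata.is_some() && centre_wikidata == zone_wikidata;
        if is_city || same_wikidata {
            add_missing_names(zone_tags, centre);
        }
    }
}
```

Only `name:*` keys are copied, so other tags of the node (such as its own `wikidata`) never leak into the zone.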
Below is a brief example of the information contained in the cosmogony output.
```json
{
  "zones": [
    {
      "id": 0,
      "osm_id": "relation:110114",
      "admin_level": 8,
      "zone_type": "city",
      "name": "Sand Rock",
      "zip_codes": [],
      "center": {"coordinates": [-85.77153961457083, 34.2303942501858], "type": "Point"},
      "bbox": [-85.803571, 34.203915, -85.745058, 34.26666],
      "geometry": {
        "coordinates": "..."
      },
      "tags": {
        "admin_level": "8",
        "border_type": "city",
        "boundary": "administrative",
        "is_in": "USA"
      },
      "parent": null,
      "wikidata": "Q79669"
    }
  ],
  "meta": {
    "osm_filename": "alabama.osm.pbf",
    "stats": {
      "level_counts": {"6": 64, "8": 272},
      "zone_type_counts": {"City": 272, "StateDistrict": 64},
      "wikidata_counts": {"6": 58, "8": 202},
      "zone_with_unkwown_country_rules": {},
      "unhandled_admin_level": {},
      "zone_without_country": 0
    }
  }
}
```
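Because each zone stores the `id` of its `parent`, the full hierarchy of a zone (such as Paris, Île-de-France, France) can be rebuilt by following parents up to the root. A hypothetical Rust sketch, keeping only the `id`, `name` and `parent` fields of the output above and assuming the zones have already been deserialized:

```rust
/// Hypothetical, simplified zone keeping only the fields needed here.
pub struct Zone {
    pub id: usize,
    pub name: String,
    pub parent: Option<usize>,
}

/// Builds a "Paris, Île-de-France, France"-style label by following
/// `parent` ids from the given zone up to the root.
pub fn full_label(zones: &[Zone], id: usize) -> String {
    let mut names: Vec<&str> = Vec::new();
    let mut current = Some(id);
    while let Some(wanted) = current {
        // Linear lookup keeps the sketch short; an index or HashMap would be faster.
        let zone = match zones.iter().find(|z| z.id == wanted) {
            Some(zone) => zone,
            None => break, // dangling parent id: stop here
        };
        names.push(&zone.name);
        current = zone.parent;
        // Guard against accidental cycles in the parent links.
        if names.len() > zones.len() {
            break;
        }
    }
    names.join(", ")
}
```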
You can check a generated cosmogony file with our Cosmogony Data Dashboard.
🚧 Ideas and other contributions welcomed in issue #4 🚧
Cosmogony, just like OpenStreetMap, emphasizes local knowledge: even if you can't code, you can help us to make Cosmogony go worldwide 🚀
If the cosmogony of your country does not look good, here is what you can do to fix it:
- Find your country here: https://github.com/osm-without-borders/libpostal/tree/master/resources/boundaries/osm
- Edit the config file to map the relevant administrative zones to libpostal types and OSM admin_level
  - the OSM wiki page about admin_level may be useful
  - the French config file is a good example if you need inspiration
- Make a Pull Request with your changes
- Find a reliable data source (Wikipedia, Wikidata, Eurostat NUTS & LAU, etc.)
- Update the reference file from our Data Dashboard with the number of zones that actually exist in the country: https://github.com/osm-without-borders/cosmogony-data-dashboard/blob/master/reference_stats_values.csv
  - If you are unsure whether the right number of cities is 3314 or 3322, you can use the `expected_min` and `expected_max` columns 😉
  - If the number of zones already in OSM does not match the expected number, please mark the test by putting `yes` in the `is_known_failure` column
- Make a Pull Request with your changes
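The exact Data Dashboard test logic is not described here beyond the column names, so the following is only a hypothetical Rust sketch of how `expected_min`, `expected_max` and `is_known_failure` could combine: a count inside the range passes, and a count outside it fails unless the row is flagged as a known failure. `ReferenceRow`, `Outcome`, `check` and `parse_known_failure` are made-up names.

```rust
/// Hypothetical row of the reference stats file (only the columns named above).
pub struct ReferenceRow {
    pub expected_min: u64,
    pub expected_max: u64,
    pub is_known_failure: bool,
}

#[derive(Debug, PartialEq)]
pub enum Outcome {
    Pass,
    Fail,
    KnownFailure,
}

/// Reads the `is_known_failure` cell: `yes` marks a known failure.
pub fn parse_known_failure(cell: &str) -> bool {
    cell.trim().eq_ignore_ascii_case("yes")
}

/// Compares the number of zones found in OSM with the expected range.
pub fn check(row: &ReferenceRow, actual: u64) -> Outcome {
    let in_range = (row.expected_min..=row.expected_max).contains(&actual);
    match (in_range, row.is_known_failure) {
        (true, _) => Outcome::Pass,
        (false, true) => Outcome::KnownFailure,
        (false, false) => Outcome::Fail,
    }
}
```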
- Mapzen borders project: deprecated, and without cascading hierarchy.
- WhosOnFirst: our main inspiration source 💖 It is hard to maintain because of the many sources involved that need deduplication and concordances, and it is difficult to ensure a coherent hierarchy (an object Foo can have an object Bar as a child whereas Foo is not listed as a parent of Bar), etc.
- Pretty cool if you just need to inspect the coverage or export a few administrative areas. It still needs country-specific knowledge to be used worldwide.
- WhateverShapes (quattroshapes, alphashapes, betashapes): without cascading hierarchy. We don't know if it's up to date, or how we can contribute.
All code in this repository is under the Apache License 2.0.
This project uses OpenStreetMap data, licensed under the ODbL by the OpenStreetMap Foundation. You need to visibly credit OpenStreetMap and its contributors if you use or distribute data from cosmogony. Read more on the official OpenStreetMap website.