Melbourne and Bristol coming up as US only... #16

rdlou · 2018-10-18T15:25:18Z

Hi, I am running single cities through the country_mentions function, and both of them come back with only "OrderedDict([('US', 1)])":

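from geotext import GeoText

cities = ['Melbourne', 'Bristol']

# Look up each city name on its own and print the countries geotext reports.
for city in cities:
    country_dict = GeoText(city.title()).country_mentions
    print(country_dict)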

I understand that these are places in the US, but obviously Melbourne is pretty significant in Australia, as is Bristol in the UK. Should the dict come back with multiple country mentions?

Thanks!

rdlou · 2018-10-31T15:09:22Z

Paris comes up as United States, Sydney comes up as Canada....

iwpnd · 2018-11-02T11:22:08Z

Think of geotext as a general framework for extracting named entities (a low-level approach) that are then looked up in an example table of cities. If you want to distinguish between cities in the US, Canada, or Australia, you can always provide the proper logic in separate lookup tables of your own.

rdlou · 2018-11-02T11:29:35Z

Thanks @iwpnd. I've ended up doing that using geocache, so it comes back with a list containing city, country, and confidence score.

So if you said "I live in London" it would come back with:

[{"city":"London","country":"United Kingdom","confidence": 50},{"city":"London","country":"Canada","confidence": 25}]

London, UK gets a higher score because it has a larger population, that sort of thing. If "Ontario" or "Canada" were in the sentence, then London, Canada would get a better score. I might upload the code.

Thanks for your response, appreciate it.
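
For readers who want to try the scoring idea described above, here is a minimal sketch. It is not rdlou's code (which was not posted) and does not use geocache or geotext. The candidate table, the context hints, the boost factor, and the function name rank_candidates are all hypothetical, and the population figures are rough placeholders. Confidence here is simply each candidate's share of the total weight, so the numbers will differ from the 50/25 example above.

```python
from typing import Any, Dict, List

# Hypothetical candidate table: city name -> list of (country, population).
# The population figures are rough placeholders for illustration only.
CANDIDATES = {
    "london": [("United Kingdom", 8_900_000), ("Canada", 400_000)],
}

# Hypothetical context hints: words that, if present, favour a country.
CONTEXT_HINTS = {
    "Canada": {"canada", "ontario"},
    "United Kingdom": {"uk", "england", "britain"},
}

# Arbitrary multiplier applied when a hint word appears (an assumption).
CONTEXT_BOOST = 100.0


def rank_candidates(city: str, sentence: str) -> List[Dict[str, Any]]:
    """Rank the countries a city name could belong to, best guess first."""
    # Crude tokenisation: split on whitespace and strip common punctuation.
    words = {w.strip(".,!?\"'").lower() for w in sentence.split()}

    weighted = []
    for country, population in CANDIDATES.get(city.lower(), []):
        weight = float(population)
        # Boost a candidate when the sentence mentions one of its hint words.
        if words & CONTEXT_HINTS.get(country, set()):
            weight *= CONTEXT_BOOST
        weighted.append((country, weight))

    total = sum(weight for _, weight in weighted)
    if total == 0:
        return []

    # Confidence is the candidate's share of the total weight, in percent.
    results = [
        {"city": city, "country": country, "confidence": round(100 * weight / total)}
        for country, weight in weighted
    ]
    return sorted(results, key=lambda r: r["confidence"], reverse=True)


print(rank_candidates("London", "I live in London"))
print(rank_candidates("London", "I live in London, Ontario"))
```

With these placeholder numbers, the first call ranks the United Kingdom first on population alone, and the second ranks Canada first because "Ontario" triggers the boost.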

iwpnd · 2018-11-02T11:36:06Z

I like the idea, thanks for sharing!

VanessaVanG · 2018-11-30T19:24:01Z

rdlou -- your idea seems great! This is what I ended up doing -- I made a text doc like this:
Dublin: Cork,
Paris: Dijon,
Moscow: Vladivostok,
...
where the first city is the one that's mistaken and the second city is a city that returns the correct country (as in there isn't another city by that name in the US).
I used regex and made replacements. Here's my code: https://github.com/MAVRYK/GW-Project3/blob/master/data_prep/location_extractor.ipynb

(In case you're wondering about the stopwords I removed, they're words like Franklin, Harrison, Liberal, Helena, and Defiance that clearly aren't a city name.)
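
The replacement step described in this comment can be sketched roughly as follows. This is not the code from the linked notebook. The dictionary holds only the three pairs quoted above, and the names PROXY_CITIES and replace_ambiguous_cities are made up for illustration. The idea is to swap each ambiguous city for a stand-in from the same country before the text is handed to geotext.

```python
import re

# Pairs quoted in the thread: ambiguous city -> stand-in city from the same
# country that does not collide with a US city name.
PROXY_CITIES = {
    "Dublin": "Cork",
    "Paris": "Dijon",
    "Moscow": "Vladivostok",
}

# One alternation pattern with word boundaries, so that a longer word such as
# "Parisian" is left untouched.
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, PROXY_CITIES)) + r")\b")


def replace_ambiguous_cities(text: str) -> str:
    """Replace each ambiguous city name with its stand-in city."""
    return _PATTERN.sub(lambda match: PROXY_CITIES[match.group(1)], text)


print(replace_ambiguous_cities("I flew from Paris to Moscow."))
```

The rewritten text can then be passed to GeoText as usual. The trade-off is that the city names in the output are the stand-ins, so this only suits cases where the country is what matters.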

albertc1 · 2019-01-17T01:45:23Z

I was having the same problem. My simple solution was to sort the cities15000.txt datafile by ascending population, so that the biggest cities get processed later and overwrite the smaller cities in GeoText.index.cities.
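
Here is a rough sketch of why sorting helps, assuming the file follows the standard GeoNames tab-separated layout (name in column 1, country code in column 8, population in column 14, counting from 0). The build_city_index function is a simplified stand-in for a last-write-wins index, not geotext's actual loader, and the rows and population figures are made-up placeholders.

```python
from typing import Dict, Iterable, List

# Column positions (0-based) in the GeoNames tab-separated layout.
NAME_COLUMN = 1
COUNTRY_COLUMN = 8
POPULATION_COLUMN = 14


def sort_by_population(lines: Iterable[str]) -> List[str]:
    """Return the rows sorted by ascending population."""
    def population(line: str) -> int:
        return int(line.rstrip("\n").split("\t")[POPULATION_COLUMN])

    return sorted(lines, key=population)


def build_city_index(lines: Iterable[str]) -> Dict[str, str]:
    """Simplified stand-in for an index where later rows overwrite earlier ones."""
    index: Dict[str, str] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        index[fields[NAME_COLUMN].lower()] = fields[COUNTRY_COLUMN]
    return index


def make_row(name: str, country: str, population: int) -> str:
    """Build a placeholder 19-column row for the demonstration below."""
    fields = [""] * 19
    fields[NAME_COLUMN] = name
    fields[COUNTRY_COLUMN] = country
    fields[POPULATION_COLUMN] = str(population)
    return "\t".join(fields)


# Placeholder rows; the population figures are illustrative only.
rows = [
    make_row("Melbourne", "AU", 4_000_000),
    make_row("Melbourne", "US", 80_000),
]

# Unsorted, the smaller US city is processed last and wins.
print(build_city_index(rows)["melbourne"])

# Sorted by ascending population, the larger city is processed last and wins.
index = build_city_index(sort_by_population(rows))
print(index["melbourne"])
```

To apply this to the real file, read cities15000.txt with UTF-8 encoding, pass its lines through sort_by_population, and write them back out before geotext loads it.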

#18
