Sorting Countries by Region - Python

This is a discussion on Sorting Countries by Region - Python ; Hi all, I'm ****yzing some data that has a lot of country data. What I need to do is sort through this data and output it into an excel doc with summary information. The countries, though, need to be sorted ...

+ Reply to Thread
Results 1 to 10 of 10

Sorting Countries by Region

  1. Default Sorting Countries by Region

    Hi all,

    I'm ****yzing some data that has a lot of country data. What I need
    to do is sort through this data and output it into an excel doc with
    summary information. The countries, though, need to be sorted by
    region, but the way I thought I could do it isn't quite working out.
    So far I can only successfully get the data alphabetically.

    Any ideas?

    import xlrd
    import pyExcelerator

    def get_countries_list(list):
    countries_list=[]
    for country in countries:
    if country not in countries_list:
    countries_list.append(country)

    EU = ["Austria","Belgium", "Cyprus","Czech Republic",
    "Denmark","Estonia", "Finland"]
    NA = ["Canada", "United States"]
    AP = ["Australia", "China", "Hong Kong", "India", "Indonesia",
    "Japan"]
    Regions_tot = {'European Union':EU, 'North America':NA, 'Asia
    Pacific':AP,}

    path_file = "c:\\1\country_data.xls"
    book = xlrd.open_workbook(path_file)
    Counts = book.sheet_by_index(1)
    countries= Counts.col_values(0,start_rowx=1, end_rowx=None)

    get_countries_list(countries)

    wb=pyExcelerator.Workbook()
    matrix = wb.add_sheet("matrix")

    n=1
    for country in unique_countries:
    matrix.write(n,1, country)
    n = n+1

    wb.save('c:\\1\\matrix.xls')




  2. Default Fwd: Sorting Countries by Region

    Just a few notes:

    1) get_countries_list

    What is the purpose of that function? Besides a few errors (an
    argument named list, no value returned), it seems you just want to
    remove duplicates from a list called countries. You can do that
    transforming the list to a 'set'.

    new_countries = list( set(countries) )

    2) I would suggest using countries.sort(...) or sorted(countries,...),
    specifying cmp or key options too sort by region instead.

    3) Instead of doing this:

    for country in unique_countries:
    matrix.write(n,1, country)
    n = n+1

    Do something like

    for i, cou in enumerate(unique_countries):
    matrix.write(1+i, 1, cou)

    (so you dont need to increment the 'i' variable manually)


    On Nov 16, 2007 2:13 PM, <patrick.waldo@gmail.com> wrote:
    > Hi all,
    >
    > I'm ****yzing some data that has a lot of country data. What I need
    > to do is sort through this data and output it into an excel doc with
    > summary information. The countries, though, need to be sorted by
    > region, but the way I thought I could do it isn't quite working out.
    > So far I can only successfully get the data alphabetically.
    >
    > Any ideas?
    >
    > import xlrd
    > import pyExcelerator
    >
    > def get_countries_list(list):
    > countries_list=[]
    > for country in countries:
    > if country not in countries_list:
    > countries_list.append(country)
    >
    > EU = ["Austria","Belgium", "Cyprus","Czech Republic",
    > "Denmark","Estonia", "Finland"]
    > NA = ["Canada", "United States"]
    > AP = ["Australia", "China", "Hong Kong", "India", "Indonesia",
    > "Japan"]
    > Regions_tot = {'European Union':EU, 'North America':NA, 'Asia
    > Pacific':AP,}
    >
    > path_file = "c:\\1\country_data.xls"
    > book = xlrd.open_workbook(path_file)
    > Counts = book.sheet_by_index(1)
    > countries= Counts.col_values(0,start_rowx=1, end_rowx=None)
    >
    > get_countries_list(countries)
    >
    > wb=pyExcelerator.Workbook()
    > matrix = wb.add_sheet("matrix")
    >
    > n=1
    > for country in unique_countries:
    > matrix.write(n,1, country)
    > n = n+1
    >
    > wb.save('c:\\1\\matrix.xls')
    >
    >
    >
    > --
    > http://mail.python.org/mailman/listinfo/python-list
    >


  3. Default Re: Sorting Countries by Region

    patrick.waldo@gmail.com wrote:
    > Hi all,
    >
    > I'm ****yzing some data that has a lot of country data. What I need
    > to do is sort through this data and output it into an excel doc with
    > summary information. The countries, though, need to be sorted by
    > region, but the way I thought I could do it isn't quite working out.
    > So far I can only successfully get the data alphabetically.
    >
    > Any ideas?
    >
    > import xlrd
    > import pyExcelerator
    >
    > def get_countries_list(list):

    it isn't a good idea to use a built-in object as a variable name
    > countries_list=[]
    > for country in countries:ii
    > if country not in countries_list:
    > countries_list.append(country)
    >
    > EU = ["Austria","Belgium", "Cyprus","Czech Republic",
    > "Denmark","Estonia", "Finland"]
    > NA = ["Canada", "United States"]
    > AP = ["Australia", "China", "Hong Kong", "India", "Indonesia",
    > "Japan"]
    > Regions_tot = {'European Union':EU, 'North America':NA, 'Asia
    > Pacific':AP,}

    i would create a class to capture country information, e.g.
    class country(object):
    def __init__(self, name, size = 0, population = 0):
    self.name = name
    self.size = size
    self.poplation = population

    def __cmp__(self, other):
    if self.name < other.name:
    return -1
    elif self.name > other.name:
    return 1
    else:
    return 0

    then you can set up the world as

    world = {'NA': [country("United States"), country("Canada")], \
    'Europe': [country("Belgium"), country("Austria")]}


    now you can sort and print it easy

    for region in world:
    print region
    lands = world[region]
    lands.sort()
    for land in lands:
    print land.name

    the sort works because the country objects have a method __cmp__

    >
    > path_file = "c:\\1\country_data.xls"
    > book = xlrd.open_workbook(path_file)
    > Counts = book.sheet_by_index(1)
    > countries= Counts.col_values(0,start_rowx=1, end_rowx=None)
    >
    > get_countries_list(countries)
    >
    > wb=pyExcelerator.Workbook()
    > matrix = wb.add_sheet("matrix")
    >
    > n=1
    > for country in unique_countries:
    > matrix.write(n,1, country)
    > n = n+1
    >
    > wb.save('c:\\1\\matrix.xls')
    >
    >
    >


    i'm not familiar with the excel modules so i can't help you with that

  4. Default Re: Sorting Countries by Region

    On Nov 16, 8:28 pm, martyw <wit...@kpnplanet.nl> wrote:
    >
    > i would create a class to capture country information, e.g.

    <snip>

    What is the advantage of this:

    > def __cmp__(self, other):
    > if self.name < other.name:
    > return -1
    > elif self.name > other.name:
    > return 1
    > else:
    > return 0
    >


    over
    def __cmp__(self,other):
    return cmp(self.name,other.name)

    ?

    --
    Alan

  5. Default Re: Fwd: Sorting Countries by Region

    Great, this is very helpful. I'm new to Python, so hence the
    inefficient or nonsensical code!

    >
    > 2) I would suggest using countries.sort(...) or sorted(countries,...),
    > specifying cmp or key options too sort by region instead.
    >


    I don't understand how to do this. The countries.sort() lists
    alphabetically and I tried to do a lambda x,y: cmp() type function,
    but it doesn't sort correctly. Help with that?

    For martyw's example, I don't need to get any sort of population
    info. I'm actually getting the number of various types of documents.
    So the entry is like this:

    Argentina Food and Consumer Products Food Additives Color
    Additives 1
    Argentina Food and Consumer Products Food Additives Flavors 1
    Argentina Food and Consumer Products Food Additives
    General 6
    Argentina Food and Consumer Products Food Additives labeling 1
    Argentina Food and Consumer Products Food Additives Prohibited
    Additives 1
    Argentina Food and Consumer Products Food Contact Cellulose 1
    Argentina Food and Consumer Products Food Contact Food
    Packaging 1
    Argentina Food and Consumer Products Food Contact Plastics 4
    Argentina Food and Consumer Products Food Contact
    Waxes 1
    Belize
    etc...

    So I'll need to add up all the entries for Food Additives and Food
    contacts, the other info like Color Additives isn't important.

    So I will have an output like this
    Food Additives Food Contact
    Argentina 10 7
    Belize
    etc...

    Thanks so much for the help!

  6. Default Re: Fwd: Sorting Countries by Region

    About the sort:

    Check this (also on http://pastebin.com/f12b5b6ca )

    def make_regions():

    # Values you provided
    EU = ["Austria","Belgium", "Cyprus","Czech Republic",
    "Denmark","Estonia", "Finland"]
    NA = ["Canada", "United States"]
    AP = ["Australia", "China", "Hong Kong", "India", "Indonesia",
    "Japan"]
    regions = {'European Union':EU, 'North America':NA, 'Asia Pacific':AP}

    ans = {}
    for reg_name, reg in regions.items():
    for cou in reg:
    ans[cou] = reg_name
    return ans

    def cmp_region(cou1, cou2):
    ans = cmp(regions[cou1], regions[cou2])
    if ans: # If the region is the same, sort by country
    return cmp(cou1, cou2)
    else:
    return ans

    regions = make_regions()
    some_countries = ['Austria', 'Canada', 'China', 'India']

    print 'Old:', some_countries
    some_countries.sort(cmp_region)
    print 'New:', some_countries


    Why that code?
    Because the first thing I want is a dictionary where the key is the
    name of the country and the value is the region. Then, I just make a
    quick function that compares considering the region and country.
    Finally, I sort.

    Btw, the code is just a quick hack, as it can be improved -a lot-.


    About the rest of your code:
    - martyw's example is much more useful than you think. Why? because
    you can just iterate across your document, adding the values you get
    to the adequate object property. That is, instead of using size or
    pop, use the variables you are interested in.

    Best, and good luck with python,
    Sergio

    On Nov 16, 2007 5:15 PM, <patrick.waldo@gmail.com> wrote:
    > Great, this is very helpful. I'm new to Python, so hence the
    > inefficient or nonsensical code!
    >
    > >
    > > 2) I would suggest using countries.sort(...) or sorted(countries,...),
    > > specifying cmp or key options too sort by region instead.
    > >

    >
    > I don't understand how to do this. The countries.sort() lists
    > alphabetically and I tried to do a lambda x,y: cmp() type function,
    > but it doesn't sort correctly. Help with that?
    >
    > For martyw's example, I don't need to get any sort of population
    > info. I'm actually getting the number of various types of documents.
    > So the entry is like this:
    >
    > Argentina Food and Consumer Products Food Additives Color
    > Additives 1
    > Argentina Food and Consumer Products Food Additives Flavors 1
    > Argentina Food and Consumer Products Food Additives
    > General 6
    > Argentina Food and Consumer Products Food Additives labeling 1
    > Argentina Food and Consumer Products Food Additives Prohibited
    > Additives 1
    > Argentina Food and Consumer Products Food Contact Cellulose 1
    > Argentina Food and Consumer Products Food Contact Food
    > Packaging 1
    > Argentina Food and Consumer Products Food Contact Plastics 4
    > Argentina Food and Consumer Products Food Contact
    > Waxes 1
    > Belize
    > etc...
    >
    > So I'll need to add up all the entries for Food Additives and Food
    > contacts, the other info like Color Additives isn't important.
    >
    > So I will have an output like this
    > Food Additives Food Contact
    > Argentina 10 7
    > Belize
    > etc...
    >
    > Thanks so much for the help!
    > --
    >
    > http://mail.python.org/mailman/listinfo/python-list
    >


  7. Default Re: Fwd: Sorting Countries by Region

    This is how I solved it last night in my inefficient sort of way and
    after re-reading some of my Python books on dictionaries. So far this
    gets the job done. However, I'd like to test if there are any
    countries in the excel input that are not represented, ie the input is
    all the information I have and the dictionary functions as the
    information I expect. What I did worked yesterday, but doesn't work
    anymore more...see comment

    Otherwise I tried doing this:

    for i, country in countries_list:
    if country in REGIONS_COUNTRIES['European Union']:
    matrix.write(i+2, 1, country)
    but I got "ValueError: too many values to unpack"

    Again, this has been a great help. Any ideas of how I can make this a
    bit more efficient, as I'm dealing with 5 regions and numerous
    countries, would be greatly appreciated. Here's the code:


    #keeping all the countries short
    REGIONS_COUNTRIES = {'European Union':["Austria","Belgium"," "France",
    "Germany", "Greece"],\
    'North America':["Canada", "United States"]}

    path_file = "c:\\1\\build\\data\\matrix2\\Update_oct07a.xls"

    book = xlrd.open_workbook(path_file)
    Counts = book.sheet_by_index(1)
    wb=pyExcelerator.Workbook()
    matrix = wb.add_sheet("matrix")

    countries = Counts.col_values(0,start_rowx=1, end_rowx=None)
    countries_list = list(set(countries))
    countries_list.sort()

    #This seems to not work today and I don't know why
    #for country in countries_list:
    # if country not in REGIONS_COUNTRIES['European Union'] or not in
    REGIONS_COUNTRIES['North America']:
    # print "%s is not in the expected list", country

    #This sorts well
    n=2
    for country in countries_list:
    if country in REGIONS_COUNTRIES['European Union']:
    matrix.write(n, 1, country)
    n=n+1
    for country in countries_list:
    if country in REGIONS_COUNTRIES['North America']:
    matrix.write(n, 1, country)
    n=n+1

    wb.save('c:\\1\\matrix.xls')



    On Nov 17, 1:12 am, "Sergio Correia" <sergio.corr...@gmail.com> wrote:
    > About the sort:
    >
    > Check this (also onhttp://pastebin.com/f12b5b6ca)
    >
    > def make_regions():
    >
    > # Values you provided
    > EU = ["Austria","Belgium", "Cyprus","Czech Republic",
    > "Denmark","Estonia", "Finland"]
    > NA = ["Canada", "United States"]
    > AP = ["Australia", "China", "Hong Kong", "India", "Indonesia",
    > "Japan"]
    > regions = {'European Union':EU, 'North America':NA, 'Asia Pacific':AP}
    >
    > ans = {}
    > for reg_name, reg in regions.items():
    > for cou in reg:
    > ans[cou] = reg_name
    > return ans
    >
    > def cmp_region(cou1, cou2):
    > ans = cmp(regions[cou1], regions[cou2])
    > if ans: # If the region is the same, sort by country
    > return cmp(cou1, cou2)
    > else:
    > return ans
    >
    > regions = make_regions()
    > some_countries = ['Austria', 'Canada', 'China', 'India']
    >
    > print 'Old:', some_countries
    > some_countries.sort(cmp_region)
    > print 'New:', some_countries
    >
    > Why that code?
    > Because the first thing I want is a dictionary where the key is the
    > name of the country and the value is the region. Then, I just make a
    > quick function that compares considering the region and country.
    > Finally, I sort.
    >
    > Btw, the code is just a quick hack, as it can be improved -a lot-.
    >
    > About the rest of your code:
    > - martyw's example is much more useful than you think. Why? because
    > you can just iterate across your document, adding the values you get
    > to the adequate object property. That is, instead of using size or
    > pop, use the variables you are interested in.
    >
    > Best, and good luck with python,
    > Sergio
    >
    > On Nov 16, 2007 5:15 PM, <patrick.wa...@gmail.com> wrote:
    >
    > > Great, this is very helpful. I'm new to Python, so hence the
    > > inefficient or nonsensical code!

    >
    > > > 2) I would suggest using countries.sort(...) or sorted(countries,...),
    > > > specifying cmp or key options too sort by region instead.

    >
    > > I don't understand how to do this. The countries.sort() lists
    > > alphabetically and I tried to do a lambda x,y: cmp() type function,
    > > but it doesn't sort correctly. Help with that?

    >
    > > For martyw's example, I don't need to get any sort of population
    > > info. I'm actually getting the number of various types of documents.
    > > So the entry is like this:

    >
    > > Argentina Food and Consumer Products Food Additives Color
    > > Additives 1
    > > Argentina Food and Consumer Products Food Additives Flavors 1
    > > Argentina Food and Consumer Products Food Additives
    > > General 6
    > > Argentina Food and Consumer Products Food Additives labeling 1
    > > Argentina Food and Consumer Products Food Additives Prohibited
    > > Additives 1
    > > Argentina Food and Consumer Products Food Contact Cellulose 1
    > > Argentina Food and Consumer Products Food Contact Food
    > > Packaging 1
    > > Argentina Food and Consumer Products Food Contact Plastics 4
    > > Argentina Food and Consumer Products Food Contact
    > > Waxes 1
    > > Belize
    > > etc...

    >
    > > So I'll need to add up all the entries for Food Additives and Food
    > > contacts, the other info like Color Additives isn't important.

    >
    > > So I will have an output like this
    > > Food Additives Food Contact
    > > Argentina 10 7
    > > Belize
    > > etc...

    >
    > > Thanks so much for the help!
    > > --

    >
    > >http://mail.python.org/mailman/listinfo/python-list



  8. Default Re: Fwd: Sorting Countries by Region

    En Sat, 17 Nov 2007 08:34:43 -0300, <patrick.waldo@gmail.com> escribió:

    > for i, country in countries_list:
    > if country in REGIONS_COUNTRIES['European Union']:
    > matrix.write(i+2, 1, country)
    > but I got "ValueError: too many values to unpack"


    Remove the i, and try again...



    --
    Gabriel Genellina


  9. Default Re: Fwd: Sorting Countries by Region


    On Sat, 2007-11-17 at 03:34 -0800, patrick.waldo@gmail.com wrote:
    > #This seems to not work today and I don't know why
    > #for country in countries_list:
    > # if country not in REGIONS_COUNTRIES['European Union'] or not in
    > REGIONS_COUNTRIES['North America']:
    > # print "%s is not in the expected list", country
    >


    This snippet fails because you want 'and' rather than 'or'. In this
    one, you test to see that country is not in EU, and if you have, say
    'France' as your country, the first half evaluates false, so the or
    tells it to try again with North America. Lo and behold, france is not
    in North America, so the second half returns true. False or True
    returns True, so your code prints "France is not in the expected list."

    Cheers,
    Cliff



  10. Default Re: Sorting Countries by Region

    nothing, Alan wrote:
    > On Nov 16, 8:28 pm, martyw <wit...@kpnplanet.nl> wrote:
    >> i would create a class to capture country information, e.g.

    > <snip>
    >
    > What is the advantage of this:
    >
    >> def __cmp__(self, other):
    >> if self.name < other.name:
    >> return -1
    >> elif self.name > other.name:
    >> return 1
    >> else:
    >> return 0
    >>

    >
    > over
    > def __cmp__(self,other):
    > return cmp(self.name,other.name)
    >
    > ?
    >
    > --
    > Alan

    yours is better!

+ Reply to Thread

Similar Threads

  1. List of countries in a DropDownBox
    By Application Development in forum DOTNET
    Replies: 3
    Last Post: 05-14-2007, 09:42 PM
  2. Link to stock information in countries not listed
    By Application Development in forum Microsoft Money
    Replies: 3
    Last Post: 04-16-2007, 03:38 PM
  3. Mapping countries
    By Application Development in forum Idl-pvwave
    Replies: 2
    Last Post: 12-05-2006, 02:54 PM
  4. List of countries
    By Application Development in forum DOTNET
    Replies: 4
    Last Post: 08-15-2004, 12:51 AM
  5. OWA connection from Different countries
    By Application Development in forum Microsoft Exchange
    Replies: 1
    Last Post: 06-08-2004, 06:51 AM