29 Jun 2017
citys = ["CN/Beijing","CN/Shanghai","US/Newyork","US/Washington"]
我们有一个数据源(不一定是一个list,也有可能是任何格式的数据),其中包含了国家代号和城市名称,并以一种特定的格式表示(此例是用/间隔)。我们需要去解析它,并得到一个dict,key是国家代码,value是城市名称。
citys = ["CN/Beijing","CN/Shanghai","US/Newyork","US/Washington"] result = {} for item in citys: country, city = item.split("/") if not country in result: result[country] = [] result[country].append(city) print result
output: {‘CN’: [‘Beijing’, ‘Shanghai’], ‘US’: [‘Newyork’, ‘Washington’]}
可是这样付出的代价是,每个item,我们查询了三次country这个key是否存在
setdefault是dict的一个方法,dict1.setdefault(k, d)的含义是当k在dict1中不存在时,执行dict1[k] = d
citys = ["CN/Beijing","CN/Shanghai","US/Newyork","US/Washington"] result = {} for item in citys: country, city = item.split("/") result.setdefault(country, []).append(city) print result
output: {‘CN’: [‘Beijing’, ‘Shanghai’], ‘US’: [‘Newyork’, ‘Washington’]}
同样的结果,我们只使用了一次查询
from collections import defaultdict citys = ["CN/Beijing","CN/Shanghai","US/Newyork","US/Washington"] result = defaultdict(list) for item in citys: country, city = item.split("/") result[country].append(city) print result
output: defaultdict(
, {‘CN’: [‘Beijing’, ‘Shanghai’], ‘US’: [‘Newyork’, ‘Washington’]})
当我们创建defaultdict时,我们传入了一个default_factory的参数,此例中为list(result = defaultdict(list)),当key不存在的时候,执行result[country] = list()
defaultdict扩展
# 默认default_factory为list result = defaultdict(list) result["test"] print result defaultdict(<type 'list'>, {'test': []}) # 默认default_factory为int result = defaultdict(int) result["test"] print result defaultdict(<type 'int'>, {'test': 0}) # 默认default_factory为自定义function def default_str(): return "default string" result = defaultdict(int) result["test"] print result defaultdict(<function default_str at 0x7f5a8446e2a8>, {'test': 'default string'})