Study Notes on Dictionaries for Fluent Python
Source: WeChat - April Link.
This one is a reading note. Main topics: * Common dictionary methods * How to handle keys that can't be looked up * Variants of the dict type in the standard library * How hash tables work
Pan-mapping type
The collections.abc module has two abstract base classes, Mapping and MutableMapping, which serve to define formal interfaces for dict and other similar types.
All mapping types in the standard library are implemented using dict, and they have a common restriction that only hashable data types can be used as keys in these mappings.
What is a hashable data type? In the python glossary (https://docs.python.org/3/glossary.html#term-hashable), the definition of a hashable type is this: if an object is hashable, then its hash value is constant for the lifetime of the object, and the object needs to implement methods. Also hashable objects have to have methods so that they can be compared with other keys. If two hashable objects are equal, then their hashes must only be the same According to this definition, atomic immutable types (str, bytes, and numeric types) are all hashable, frozenset is also hashable (because by its definition, only hashable types can be held in frozenset), and a tuple is also hashable if all within it are hashable (a tuple is immutable, but such a tuple cannot be considered immutable if the elements within it are mutable types). In general, objects of user-defined types are hashable, and the hash value is the return value of their id() function, so these objects are not equal when compared. (If an object implements the __eq__ method and uses the object's internal state in the method, then the object is hashable only if all that internal state is immutable. ) Based on these definitions, the dictionary provides many kinds of construction methods. https://docs.python.org/3/library/stdtypes.html#mapping-types-dictThis page has an example to illustrate the different ways to create a dictionary
>>>a=dict(one=1,two=2,three=3)
>>>b={'one':1,'two':2,'three':3}
>>>c=dict(zip(['one','two','three'],[1,2,3]))
>>>d=dict([('two',2),('one',1),('three',3)])
>>>e=dict({'three':3,'one':1,'two':2})
>>>a==b==c==d==e
True
In addition to these methods, a new dict can be constructed using dictionary derivation.
dictionary derivation
Since Python 2.7, the concepts of list derivations and generator expressions have been ported to dictionaries, giving rise to dictionary derivations. Dictionary derivation (dictcomp) can construct a dictionary from any iterable object with key-value pairs as elements. For example.
>>>data=[(1,'a'),(2,'b'),(3,'c')]
>>>data_dict=
>>>data_dict
{1:'a',2:'b',3:'c'}
Common mapping methods
The following table shows us the common methods for dict, defaultdict and OrderedDict (the latter two are variants of dict and are located within the collections module).
default_factory is not a method, but a callable object whose value defaultdict is set by the user when it is initialized.
OrderedDict.popitem() removes the first inserted element of the dictionary (first-in-first-out); the optional argument last removes the last inserted element if the value is true (last-in-first-out).
Use setdefault to handle keys that are not found
Python throws an exception when the dictionary d[k] cannot find the correct key, normally we use d.get(k, default) instead of d[k] to give a default value to a key that cannot be found, and also use the more efficient setdefault
my_dict.setdefault(key,[]).append(new_value)
# Equivalent to
ifkeynotinmy_dict:
my_dict[key]=[]
my_dict[key].append(new_value)
Both pieces of code have the same effect, except that the latter requires at least two key lookups, three if they don't exist, while the entire operation can be done with just one. So, what do we do with keys that we can't find when we fetch the value?
Resilient queries for mapping
Sometimes, even if a key doesn't exist in the mapping, we want to get a default value when reading through that key. There are two ways to help us get there, a subclass of this type rather than a normal dict, and then implementing methods in the subclass. defaultdict: an option for handling keys that are not found
First let's look at how to use defaultdict.
importcollections
index=collections.defaultdict(list)
index[new_key].append(new_value)
Here we have created a new dictionary index, and if the key does not exist in index, the expression will do the following.
Call list() to create a new list
Take this new list as the value,'new_key' As its key, put in index in
Returns a reference to this list.
And the callable object used to generate the default value is stored in the instance property named The default_factory in defaultdict will only be called in __getitem__ and will not work in other methods. For example, the expression index[k] would call some default value created by default_factory, while index.get(k) would return None. (This is because the special method __missing__ calls default_factory when a defaultdict encounters a key it can't find; in fact, this feature is supported by all mapping methods.) Special method __missing__
All mappings involve the __missing__ method when dealing with keys that are not found. But the base class dict does not provide this method. However, if there is a class that inherits from dict and then that inherited class provides the __missing__ method, then Python will automatically call it when __getitem__ encounters a key not found, rather than throwing a KeyError exception. method will only be called. Providing the __missing__ method has no effect on the get or __contains__ (which is used by the in operator) methods. The following code implements the StrKeyDict0 class. The StrKeyDict0 class converts non-string keys to strings at query time.
classStrKeyDict0(dict):# Inherit from dict
def__missing__(self,key):
ifisinstance(key,str):
# If the key not found is itself a string, throw KeyError
raiseKeyError(key)
# If the key not found is not a string, convert it to a string and find it again
returnself[str(key)]
defget(self,key,default=None):
# The get method delegates the lookup to __getitem__ in the form of self[key], so that the key can be given another chance via __missing__ if the lookup is declared a failure
try:
returnself[key]
exceptKeyError:
# If a KeyError is thrown, it means that __missing__ has also failed and returns default
returndefault
def__contains__(self,key):
# Find the key first by the key passed in, and if not, convert the key to a string and find it again
returnkeyinself.keys()orstr(key)inself.keys()
__contains__ The method exists for consistency, owing to k in d This operation will call it, But we've seen from dict inherited from __contains__ The method will not be used when the key is not found __missing__ approach。my_dict.keys() (located) at Python3 where the return value is a " view"," view" It's like a collection, And it's as fast as a dictionary.。 however, in Python2 in,my_dict.keys() The return is a list of。 consequently k in my_dict.keys() Operate on python3 Very fast in the middle., however, in python2 in, Not very efficient to process。 If you want to customize a mapping type, The appropriate strategy is to inherit the class。 This class is the one that takes the standard dict usefulness python Achieved it again,UserDict is for the user to inherit from writing subclasses of, The improved code is as follows:
importcollections
classStrKeyDict(collections.UserDict):
def__missing__(self,key):
ifisinstance(key,str):
raiseKeyError(key)
returnself[str(key)]
def__contains__(self,key):
# Here you can safely assume that all the keys that have been stored are strings. So just look it up on self.data
returnstr(key)inself.data
def__setitem__(self,key,item):
# This method will convert all the keys to strings.
self.data[str(key)]=item
Since UserDict inherits from MutableMapping, the rest of the mapping types in StrKeyDict are inherited from the UserDict, MutableMapping, and Mapping superclasses. The get method is provided in Mapping, the same as we defined in StrKeyDict0, so we don't need to define the get method here.
Variants of the dictionary
In the collections module, there are other mapping types besides defaultdict.
collections.OrderedDict
collections.ChainMap
collections.Counter
Immutable mapping type
All mapping types in the standard library are mutable, what if we want to give the user an immutable mapping type? Starting with Python 3.3 the types module introduces a wrapper class named. If this class is given a mapping, it will return a read-only view of the mapping (if changes are made to the original mapping, the result page of this view will change accordingly). for example
>>>fromtypesimportMappingProxyType
>>>d={1:'A'}
>>>d_proxy=MappingProxyType(d)
>>>d_proxy
mappingproxy({1:'A'})
>>>d_proxy[1]
'A'
>>>d_proxy[2]='x'
Traceback(mostrecentcalllast):
File"
TypeError:'MappingProxy'objectdoesnotsupportitemassignment
>>>d[2]='B'
>>>d_proxy[2]# d_proxy It's dynamic.,d The changes are fed back to it
'B'
Scattered lists in the dictionary
A hash table is actually a sparse array (arrays that always have blank elements are called sparse arrays). In a dict hash table, each key value occupies a table element, and each table element has two parts, a reference to the key and a reference to the value. Since all table elements are the same size, a particular table element can be read by offset. python will try to ensure that roughly 1/3 of the table elements are empty, so the original hash table will be copied to a larger space when this threshold is about to be reached. If you want to put an object into a hash table, then you first calculate the hash value of this element. Python's built-in hash() method can be used to compute all built-in type objects. If two objects are equal at the time of comparison, then their hash values must also be equal. For example 1 == 1.0 then hash(1) == hash(1.0)
hash table algorithm
To get the value of my_dict[search_key], Python will first call hash(search_key) to compute a hash of search_key, using the lowest few bits of this value as an offset to look up the meta in the hash table. If the table element is empty, a KeyError exception is thrown. If not empty, the table element will have a found_key:found_value pair. This checks that search_key == found_key, and if equal, returns found_value. If it doesn't match (hash conflict), take a few more bits in the hash table, then process it and use the processed result as an index to find the table element again. Then repeat the steps above. The flow chart for the fetching is as follows.
Adding a new value is essentially the same process as above, except that for the former, a new element is put in when an empty table element is found, while for the latter, the value object in the original table is replaced with the new value after the corresponding table element is found.
Also, when inserting a new value, Python may decide whether to reallocate memory to expand the hash table according to how crowded it is.
Advantages and limitations of dictionaries
1. The key must be hashable
The hashable object requirements are as follows.
supports hash functions and the hash value obtained by the __hash__() method remains unchanged
Support for detecting equivalence via the __eq__() method
If a == b is true, then hash(a) == hash(b) is also true
2. Huge dictionary overhead
Because the dictionary uses hash tables, which in turn must be sparse, this causes it to be spatially inefficient. 3、Key search is fast
The dict implementation is a typical space-for-time: the dictionary type consists of a huge memory overhead, but provides fast access regardless of the size of the data. 4. The order of the keys is determined by the order of addition
When adding a new key to a dict and a hash conflict occurs, the new build may be arranged to be stored in another location. 5. Adding new keys to the dictionary may change the order of existing keys
Whenever a new key is added to the dictionary, the Python interpreter may make the decision to expand the dictionary. The result of the expansion is to create a new, larger hash table and add the original keys to the new hash table, a process that may result in a new hash conflict, causing the order to change in the new hash table. Therefore, do not iterate and modify the dictionary at the same time.
conclude
This one focuses on.
Common dictionary methods
How to handle keys that don't check
Variants of the dict type in the standard library
How the scatter table works
Potential impact of scattered lists
Reference Links
https://docs.python.org/3/glossary.html#term-hashable
https://docs.python.org/3/library/stdtypes.html#mapping-types-dict