Study Notes on Dictionaries for Fluent Python


Source: WeChat - April Link.

This one is a reading note. Main topics: * Common dictionary methods * How to handle keys that can't be looked up * Variants of the dict type in the standard library * How hash tables work

Pan-mapping type

The collections.abc module has two abstract base classes, Mapping and MutableMapping, which serve to define formal interfaces for dict and other similar types.

All mapping types in the standard library are implemented using dict, and they have a common restriction that only hashable data types can be used as keys in these mappings.

What is a hashable data type? In the python glossary (https://docs.python.org/3/glossary.html#term-hashable), the definition of a hashable type is this: if an object is hashable, then its hash value is constant for the lifetime of the object, and the object needs to implement methods. Also hashable objects have to have methods so that they can be compared with other keys. If two hashable objects are equal, then their hashes must only be the same According to this definition, atomic immutable types (str, bytes, and numeric types) are all hashable, frozenset is also hashable (because by its definition, only hashable types can be held in frozenset), and a tuple is also hashable if all within it are hashable (a tuple is immutable, but such a tuple cannot be considered immutable if the elements within it are mutable types). In general, objects of user-defined types are hashable, and the hash value is the return value of their id() function, so these objects are not equal when compared. (If an object implements the __eq__ method and uses the object's internal state in the method, then the object is hashable only if all that internal state is immutable. ) Based on these definitions, the dictionary provides many kinds of construction methods. https://docs.python.org/3/library/stdtypes.html#mapping-types-dictThis page has an example to illustrate the different ways to create a dictionary

>>>a=dict(one=1,two=2,three=3)

>>>b={'one':1,'two':2,'three':3}

>>>c=dict(zip(['one','two','three'],[1,2,3]))

>>>d=dict([('two',2),('one',1),('three',3)])

>>>e=dict({'three':3,'one':1,'two':2})

>>>a==b==c==d==e

True

In addition to these methods, a new dict can be constructed using dictionary derivation.

dictionary derivation

Since Python 2.7, the concepts of list derivations and generator expressions have been ported to dictionaries, giving rise to dictionary derivations. Dictionary derivation (dictcomp) can construct a dictionary from any iterable object with key-value pairs as elements. For example.

>>>data=[(1,'a'),(2,'b'),(3,'c')]

>>>data_dict=

>>>data_dict

{1:'a',2:'b',3:'c'}

Common mapping methods

The following table shows us the common methods for dict, defaultdict and OrderedDict (the latter two are variants of dict and are located within the collections module).

default_factory is not a method, but a callable object whose value defaultdict is set by the user when it is initialized.

OrderedDict.popitem() removes the first inserted element of the dictionary (first-in-first-out); the optional argument last removes the last inserted element if the value is true (last-in-first-out).

Use setdefault to handle keys that are not found

Python throws an exception when the dictionary d[k] cannot find the correct key, normally we use d.get(k, default) instead of d[k] to give a default value to a key that cannot be found, and also use the more efficient setdefault

my_dict.setdefault(key,[]).append(new_value)

# Equivalent to

ifkeynotinmy_dict:

my_dict[key]=[]

my_dict[key].append(new_value)

Both pieces of code have the same effect, except that the latter requires at least two key lookups, three if they don't exist, while the entire operation can be done with just one. So, what do we do with keys that we can't find when we fetch the value?

Resilient queries for mapping

Sometimes, even if a key doesn't exist in the mapping, we want to get a default value when reading through that key. There are two ways to help us get there, a subclass of this type rather than a normal dict, and then implementing methods in the subclass. defaultdict: an option for handling keys that are not found

First let's look at how to use defaultdict.

importcollections

index=collections.defaultdict(list)

index[new_key].append(new_value)

Here we have created a new dictionary index, and if the key does not exist in index, the expression will do the following.

Call list() to create a new list

Take this new list as the value,'new_key' As its key, put in index in

Returns a reference to this list.

And the callable object used to generate the default value is stored in the instance property named The default_factory in defaultdict will only be called in __getitem__ and will not work in other methods. For example, the expression index[k] would call some default value created by default_factory, while index.get(k) would return None. (This is because the special method __missing__ calls default_factory when a defaultdict encounters a key it can't find; in fact, this feature is supported by all mapping methods.) Special method __missing__

All mappings involve the __missing__ method when dealing with keys that are not found. But the base class dict does not provide this method. However, if there is a class that inherits from dict and then that inherited class provides the __missing__ method, then Python will automatically call it when __getitem__ encounters a key not found, rather than throwing a KeyError exception. method will only be called. Providing the __missing__ method has no effect on the get or __contains__ (which is used by the in operator) methods. The following code implements the StrKeyDict0 class. The StrKeyDict0 class converts non-string keys to strings at query time.

classStrKeyDict0(dict):# Inherit from dict

def__missing__(self,key):

ifisinstance(key,str):

# If the key not found is itself a string, throw KeyError

raiseKeyError(key)

# If the key not found is not a string, convert it to a string and find it again

returnself[str(key)]

defget(self,key,default=None):

# The get method delegates the lookup to __getitem__ in the form of self[key], so that the key can be given another chance via __missing__ if the lookup is declared a failure

try:

returnself[key]

exceptKeyError:

# If a KeyError is thrown, it means that __missing__ has also failed and returns default

returndefault

def__contains__(self,key):

# Find the key first by the key passed in, and if not, convert the key to a string and find it again

returnkeyinself.keys()orstr(key)inself.keys()

__contains__ The method exists for consistency, owing to k in d This operation will call it, But we've seen from dict inherited from __contains__ The method will not be used when the key is not found __missing__ approach。my_dict.keys() (located) at Python3 where the return value is a " view"," view" It's like a collection, And it's as fast as a dictionary.。 however, in Python2 in,my_dict.keys() The return is a list of。 consequently k in my_dict.keys() Operate on python3 Very fast in the middle., however, in python2 in, Not very efficient to process。 If you want to customize a mapping type, The appropriate strategy is to inherit the class。 This class is the one that takes the standard dict usefulness python Achieved it again,UserDict is for the user to inherit from writing subclasses of, The improved code is as follows:

importcollections

classStrKeyDict(collections.UserDict):

def__missing__(self,key):

ifisinstance(key,str):

raiseKeyError(key)

returnself[str(key)]

def__contains__(self,key):

# Here you can safely assume that all the keys that have been stored are strings. So just look it up on self.data

returnstr(key)inself.data

def__setitem__(self,key,item):

# This method will convert all the keys to strings.

self.data[str(key)]=item

Since UserDict inherits from MutableMapping, the rest of the mapping types in StrKeyDict are inherited from the UserDict, MutableMapping, and Mapping superclasses. The get method is provided in Mapping, the same as we defined in StrKeyDict0, so we don't need to define the get method here.

Variants of the dictionary

In the collections module, there are other mapping types besides defaultdict.

collections.OrderedDict

collections.ChainMap

collections.Counter

Immutable mapping type

All mapping types in the standard library are mutable, what if we want to give the user an immutable mapping type? Starting with Python 3.3 the types module introduces a wrapper class named. If this class is given a mapping, it will return a read-only view of the mapping (if changes are made to the original mapping, the result page of this view will change accordingly). for example

>>>fromtypesimportMappingProxyType

>>>d={1:'A'}

>>>d_proxy=MappingProxyType(d)

>>>d_proxy

mappingproxy({1:'A'})

>>>d_proxy[1]

'A'

>>>d_proxy[2]='x'

Traceback(mostrecentcalllast):

File"

TypeError:'MappingProxy'objectdoesnotsupportitemassignment

>>>d[2]='B'

>>>d_proxy[2]# d_proxy It's dynamic.,d The changes are fed back to it

'B'

Scattered lists in the dictionary

A hash table is actually a sparse array (arrays that always have blank elements are called sparse arrays). In a dict hash table, each key value occupies a table element, and each table element has two parts, a reference to the key and a reference to the value. Since all table elements are the same size, a particular table element can be read by offset. python will try to ensure that roughly 1/3 of the table elements are empty, so the original hash table will be copied to a larger space when this threshold is about to be reached. If you want to put an object into a hash table, then you first calculate the hash value of this element. Python's built-in hash() method can be used to compute all built-in type objects. If two objects are equal at the time of comparison, then their hash values must also be equal. For example 1 == 1.0 then hash(1) == hash(1.0)

hash table algorithm

To get the value of my_dict[search_key], Python will first call hash(search_key) to compute a hash of search_key, using the lowest few bits of this value as an offset to look up the meta in the hash table. If the table element is empty, a KeyError exception is thrown. If not empty, the table element will have a found_key:found_value pair. This checks that search_key == found_key, and if equal, returns found_value. If it doesn't match (hash conflict), take a few more bits in the hash table, then process it and use the processed result as an index to find the table element again. Then repeat the steps above. The flow chart for the fetching is as follows.

Adding a new value is essentially the same process as above, except that for the former, a new element is put in when an empty table element is found, while for the latter, the value object in the original table is replaced with the new value after the corresponding table element is found.

Also, when inserting a new value, Python may decide whether to reallocate memory to expand the hash table according to how crowded it is.

Advantages and limitations of dictionaries

1. The key must be hashable

The hashable object requirements are as follows.

supports hash functions and the hash value obtained by the __hash__() method remains unchanged

Support for detecting equivalence via the __eq__() method

If a == b is true, then hash(a) == hash(b) is also true

2. Huge dictionary overhead

Because the dictionary uses hash tables, which in turn must be sparse, this causes it to be spatially inefficient. 3、Key search is fast

The dict implementation is a typical space-for-time: the dictionary type consists of a huge memory overhead, but provides fast access regardless of the size of the data. 4. The order of the keys is determined by the order of addition

When adding a new key to a dict and a hash conflict occurs, the new build may be arranged to be stored in another location. 5. Adding new keys to the dictionary may change the order of existing keys

Whenever a new key is added to the dictionary, the Python interpreter may make the decision to expand the dictionary. The result of the expansion is to create a new, larger hash table and add the original keys to the new hash table, a process that may result in a new hash conflict, causing the order to change in the new hash table. Therefore, do not iterate and modify the dictionary at the same time.

conclude

This one focuses on.

Common dictionary methods

How to handle keys that don't check

Variants of the dict type in the standard library

How the scatter table works

Potential impact of scattered lists

Reference Links

https://docs.python.org/3/glossary.html#term-hashable

https://docs.python.org/3/library/stdtypes.html#mapping-types-dict


Recommended>>
1、Apache explodes log file vulnerability
2、Representative Fu Xinping the use of big data services to handle cases so that justice can be seen and felt
3、CARLA Unmanned Vehicle Simulation Environment Construction
4、Fake Candy Scam 6000eth Xue Bianzi Says Most of the Hilarious Companies Will Be Finished Bao Erji Says Biggest Blockchain Application Is Coin Speculation
5、Big data tells you whether to buy a parking space or not

    已推荐到看一看 和朋友分享想法
    最多200字,当前共 发送

    已发送

    朋友将在看一看看到

    确定
    分享你的想法...
    取消

    分享想法到看一看

    确定
    最多200字,当前共

    发送中

    网络异常,请稍后重试

    微信扫一扫
    关注该公众号