| ▲ | A “frozen” dictionary for Python(lwn.net) |
| 175 points by jwilk 11 hours ago | 128 comments |
| |
|
| ▲ | pansa2 8 hours ago | parent | next [-] |
| I wonder whether Raymond Hettinger has an opinion on this PEP. A long time ago, he wrote: "freezing dicts is a can of worms and not especially useful". https://mail.python.org/pipermail/python-dev/2006-February/0... |
| |
| ▲ | pkulak 2 hours ago | parent | next [-] | | This is why I love how Rust approached this; almost by accident to make borrow checking work. Every reference is either mutable or not, and (with safe code), you can't use an immutable reference to get a mutable reference anywhere down the chain. So you can slowly construct a map through a mutable reference, but then return it out of a function as immutable, and that's the end of it. It's no longer ever mutable, and no key or value is either. There's no need to make a whole new object called FrozenHashMap, and then FrozenList, and FrozenSet, etc. You don't need a StringBuilder because String is mutable, unless you don't want it to be. It's all just part of the language. Kotlin _kinda_ does this as well, but if you have a reference to an immutable map in Kotlin, you are still free to mutate the values (and even keys!) as much as you like. | | |
| ▲ | the__alchemist an hour ago | parent [-] | | My favorite part of rust is its explicit control over mutability, in the manner you describe. |
| |
| ▲ | jonathaneunice 7 hours ago | parent | prev | next [-] | | That's a great link and recommended reading. It explains a lot about the design of Python container classes, and the boundaries of polymorphism / duck typing with them, and mutation between them. I don't always agree with the choices made in Python's container APIs...but I always want to understand them as well as possible. Also worth noting that understanding changes over time. Remember when GvR and the rest of the core developers argued adamantly against ordered dictionaries? Haha! Good times! Thank goodness their first wave of understanding wasn't their last. Concurrency and parallelism in Python was a TINY issue in 2006, but at the forefront of Python evolution these days. And immutability has come a long way as a design theme, even for languages that fully embrace stateful change. | | |
| ▲ | zahlman 7 hours ago | parent [-] | | > Also worth noting that understanding changes over time. Remember when GvR and the rest of the core developers argued adamantly against ordered dictionaries? Haha! Good times! The new implementation has saved space, but there are opportunities to save more space (specifically after deleting keys) that they've now denied themselves by offering the ordering guarantee. | | |
| ▲ | jonathaneunice 7 hours ago | parent [-] | | Ordering, like stability in sorting, is an incredibly useful property. If it costs a little, then so be it. This is optimizing for the common case, where memory is generally plentiful and dicts grow more than they shrink. Python has so many memory inefficiencies that occasional tombstones in the dict internal structure is unlikely to be a major effect. If you're really concerned, do `d = dict(d)` after aggressive deletion. | | |
| ▲ | zahlman 6 hours ago | parent | next [-] | | > Ordering, like stability in sorting, is an incredibly useful property. I can't say I've noticed any good reasons to rely on it. Didn't reach for `OrderedDict` often back in the day either. I've had more use for actual sorting than for preserving the insertion order. | | |
| ▲ | kzrdude 41 minutes ago | parent | next [-] | | It seems like opinions really differ on this item then. I love insertion sort ordering in mappings, and python with it was a big revelation. The main reason is that keys need some order, and insertion order -> iteration order is a lot better than pseudorandom order (hash based orders). For me, it creates more reproducible programs and scripts, even simple ones. | |
| ▲ | xen0 2 hours ago | parent | prev | next [-] | | It's sometimes nice to be deterministic. I don't often care about a specific order, only that I get the same order every time. | | |
| ▲ | no_wizard 2 hours ago | parent [-] | | Thinking about this upfront for me, I am actually wondering why this is useful outside of equality comparisons. Granted, I live and work in TypeScript, where I can't `===` two objects but I could see this deterministic behavior making it easier for a language to compare two objects, especially if equality comparison is dependent on a generated hash. The other is guaranteed iteration order, if you are reliant on the index-contents relationship of an iterable, but we're talking about Dicts which are keyed, but extending this idea to List, I see this usefulness in some scenarios. Beyond that, I'm not sure it matters, but I also realize I could simply not have enough imagination at the moment to think of other benefits | | |
| ▲ | xen0 an hour ago | parent [-] | | I work on a build system (Bazel), so perhaps I care more than most. But maybe it does all just come down to equality comparisons. Just not always within your own code. |
|
| |
| ▲ | mcherm 3 hours ago | parent | prev | next [-] | | Personally, I find lots of reasons to prefer an orders Dict to an unordered one. Even small effects like "the debugging output will appear in a consistent order making it easier to compare" can be motivation enough in many use cases. | |
| ▲ | morshu9001 2 hours ago | parent | prev | next [-] | | Same. Recently I saw interview feedback where someone complained that the candidate used OrderedDict instead of the built-in dict that is now ordered, but they'll let it slide... As if writing code that will silently do different things depending on minor Python version is a good idea. | | |
| ▲ | tehnub 2 hours ago | parent [-] | | Well it's been guaranteed since 3.7 which came out in 2018, and 3.6 reached end-of-life in 2021, so it's been a while. I could see the advantage if you're writing code for the public (libraries, applications), but for example I know at my job my code is never going to be run with Python 3.6 or older. | | |
| ▲ | morshu9001 a minute ago | parent [-] | | Yeah, if you have that guarantee then I wouldn't fault anyone for using dict, but also wouldn't complain about OrderedDict. |
|
| |
| ▲ | vanviegen 5 hours ago | parent | prev | next [-] | | Indeed! I don't understand why it isn't more common for stdlibs to include key-ordered maps and sets. Way more useful than insertion ordering. | | |
| ▲ | zahlman 5 hours ago | parent [-] | | Presumably because it involves different performance characteristics. |
| |
| ▲ | BiteCode_dev 2 hours ago | parent | prev [-] | | Ordering is very useful for testing. This morning for example, I tested an object serialized through a JSON API. My test data seems to never match the next run. After a while, I realized one of the objects was using a set of objects, which in the API was turned into a JSON array, but the order of said array would change depending of the initial Python VM state. 3 days ago, I used itertools.group by to group a bunch of things. But itertools.group by only works on iterable that are sorted by the grouping key. Now granted, none of those recent example are related to dicts, but dict is not a special case. And it's iterated over regularly. |
| |
| ▲ | seanhunter 4 hours ago | parent | prev | next [-] | | Ordering is specifically a property (useful or not) that a set doesn't have. You need a poset for it to be ordered. I would expect to use a different data structure if I needed an ordered set. | |
| ▲ | LtWorf 4 hours ago | parent | prev [-] | | Does your code actually rely on that? I've never once needed it. |
|
|
| |
| ▲ | dkarl 6 hours ago | parent | prev | next [-] | | It's interesting that he concludes that freezing dicts is "not especially useful" after addressing only a single motivation: the use of a dictionary as a key. He doesn't address the reason that most of us in 2025 immediately think of, which is that it's easier to reason about code if you know that certain values can't change after they're created. What a change in culture over the last 20 years! | | |
| ▲ | morshu9001 2 hours ago | parent [-] | | You can't really tell though. Maybe the dict is frozen but the values inside aren't. C++ tried to handle this with constness, but that has its own caveats that make some people argue against using it. | | |
| ▲ | krick 29 minutes ago | parent [-] | | Indeed. So I don't really understand what this proposal tries to achieve. It even explicitly says that dict → frozendict will be O(n) shallow-copy, and the contention is only about O(n) part. So… yeah, I'm sure they are useful for some cases, but as Raymond has said — it doesn't seem to be especially useful, and I don't understand what people ITT are getting excited about. |
|
| |
| ▲ | mvanbaak 7 hours ago | parent | prev | next [-] | | This was 19 (almost) 20 years ago.
As stated in the lwn.net article, a lot of concurrency has been added to python, and it might now be time for something like a frozendict. Things that were not useful in 2006 might be totally useful in 2026 ;P Still, like you, I'm curious wether he has anything to say about it. | | |
| ▲ | aewens 7 hours ago | parent [-] | | I think Raymond Hettinger is called out specially here because he did a well known talk called [Modern Dictionaries](https://youtu.be/p33CVV29OG8) where around 32:00 to 35:00 in he makes the quip about how younger developers think they need new data structures to handle new problems, but eventually just end up recreating / rediscovering solutions from the 1960s. “What has been is what will be, and what has been done is what will be done; there is nothing new under the sun.” | | |
| ▲ | sesm 6 hours ago | parent [-] | | Since that time HAMT was invented and successfully used in Scala and Clojure, so this talk didn't age well. | | |
|
| |
| ▲ | morshu9001 2 hours ago | parent | prev | next [-] | | I agree, same with frozenset. If you really want to use one of those as a key, convert to a tuple. There might be niche use cases for all this, but it's not something that the language or even the standard lib need to support. | | |
| ▲ | boothby 2 hours ago | parent [-] | | Problem being that sets aren't consistently ordered and conversion to a tuple can result in an exponential (specifically, factorial) explosion in the number of possible keys associated with a single set. Nor can you sort all objects. Safe conversion of sets to tuples for use as keys is possible but the only technique I know requires an auxiliary store of objects (mapping objects to the order in which they were first observed), which doesn't parallelize well. | | |
| ▲ | morshu9001 2 hours ago | parent [-] | | tuple(sorted(s)) and if you can't even sort the values, they're probably not hashable. I get that this involves a copy, but so does frozenset, and you can cross that bridge in various ways if it's ever a problem. | | |
| ▲ | boothby 41 minutes ago | parent [-] | | Here are some types that support hashing: str
bytes
int, float
complex
tuple
frozenset
Aside from int and float, you cannot perform comparisons between objects of different types. Moreover, you cannot sort complex numbers at all.I have crossed that bridge, and I'm telling you (again) that a sorted tuple is not a generic solution. | | |
| ▲ | morshu9001 13 minutes ago | parent [-] | | I'm not saying the problem with tuple doesn't exist, but that there doesn't need to be a built-in way to deal with it. If for some reason you've got a mixed-type set that you also want to use as a dict key, there are semi-manual ways to deal with it. Most languages don't try to specially address that use case. |
|
|
|
| |
| ▲ | zahlman 7 hours ago | parent | prev | next [-] | | > Another PEP 351 world view is that tuples can serve as frozenlists; however,
that view represents a Liskov violation (tuples don't support the same
methods). This idea resurfaces and has be shot down again every few months. ... Well, yes; it doesn't support the methods for mutation. Thinking of ImmutableFoo as a subclass of Foo is never going to work. And, indeed, `set` and `frozenset` don't have an inheritance relationship. I normally find Hettinger very insightful so this one is disappointing. But nobody's perfect, and we change over time (and so do the underlying conditions). I've felt like frozendict was missing for a long time, though. And really I think the language would have been better with a more formal concept of immutability (e.g. linking it more explicitly to hashability; having explicit recognition of "cache" attributes, ...), even if it didn't go the immutable-by-default route. | | |
| ▲ | kccqzy 6 hours ago | parent | next [-] | | Apple (or perhaps NeXT) has solved this problem already in Objective-C. Look at NSArray and NSMutableArray, or NSData and NSMutableData. It’s intuitive and Liskov-correct to make the mutable version a subclass of the immutable version. And it’s clearly wrong to have the subclass relationship the other way around. Given how dynamic Python is, such a subclass relationship need not be evident at the C level. You can totally make one class whose implementation is independent of another class a subclass of the other, using PEP 3119. This gives implementations complete flexibility in how to implement the class while retaining the ontological subclass relationship. | |
| ▲ | pansa2 7 hours ago | parent | prev | next [-] | | > ImmutableFoo as a subclass of Foo is never going to work. And, indeed, `set` and `frozenset` don't have an inheritance relationship. Theoretically, could `set` be a subclass of `frozenset` (and `dict` of `frozendict`)? Do other languages take that approach? > linking [immutability] more explicitly to hashability AFAIK immutability and hashability are equivalent for the language's "core" types. Would it be possible to enforce that equivalence for user-defined types, given that mutability and the implementation of `__hash__` are entirely controlled by the programmer? | | |
| ▲ | kccqzy 6 hours ago | parent | next [-] | | Yes you could. Other languages do. See NSMutableSet and NSSet in Objective-C. | |
| ▲ | chriswarbo 7 hours ago | parent | prev [-] | | > Theoretically, could `set` be a subclass of `frozenset` (and `dict` of `frozendict`)? At one extreme: sure, anything can be made a subclass of anything else, if we wanted to. At the other extreme: no, since Liskov substitution is an impossibly-high bar to reach; especially in a language that's as dynamic/loose as Python. For example, consider an expression like '"pop" in dir(mySet)' | | |
| ▲ | tremon 6 hours ago | parent [-] | | > consider an expression like '"pop" in dir(mySet)' class frozenset:
pass
class set(frozenset):
def pop(self, key):
pass
I don't see why hasattr(mySet, 'pop') should be a problem here? | | |
| ▲ | chriswarbo 4 hours ago | parent [-] | | > I don't see why hasattr(mySet, 'pop') should be a problem here? I never said it's a problem (and I never said it's not!). I was specifically addressing two things: - The "theoretical" nature of the question I quoted (i.e. ignoring other aspects like subjectivity, practicality, convention, etc.) - The reasoning about "Liskov violation", which was quoted further up this thread. For context, here's Liskov's definition of their principle (from https://en.wikipedia.org/wiki/Liskov_substitution_principle ): > Barbara Liskov and Jeannette Wing described the principle succinctly in a 1994 paper as follows:[1] > > Subtype Requirement: Let ϕ(x) be a property provable about objects x of type T. Then ϕ(y) should be true for objects y of type S where S is a subtype of T. My expression `"pop" in dir(mySet)` gives an explicit example of how `set` and `frozenset` are not subtypes of each other (regardless of how they're encoded in the language, with "subclasses" or whatever). In this case `ϕ(x)` would be a property like `'"pop" in dir(x)' = 'False'`, which holds for objects x of type frozenset. Yet it does not hold for objects y of type set. Your example of `hasattr(mySet, 'pop')` gives another property that would be violated. My point is that avoiding "Liskov violations" is ("theoretically") impossible, especially in Python (which allows programs to introspect/reflect on values, using facilities like 'dir', 'hasattr', etc.). (FYI I became rather jaded on the Liskov substitution principle after reading https://okmij.org/ftp/Computation/Subtyping ) | | |
| ▲ | kccqzy 37 minutes ago | parent | next [-] | | > I became rather jaded on the Liskov substitution principle after reading https://okmij.org/ftp/Computation/Subtyping The root of the issue here is that Liskov substitution principle simply references ϕ(x) to be some property satisfied by objects of a class. It does not distinguish between properties that are designed by the author of the class to be satisfied or properties that happen to be satisfied in this particular implementation. But the Hyrum’s Law also states that properties that are accidentally true can become relied upon and as time passes become an intrinsic property. This to me suggests that the crux of the problem is that people don’t communicate sufficiently about invariants and non-invariants of their code. | |
| ▲ | tremon 4 hours ago | parent | prev [-] | | > > Subtype Requirement: Let ϕ(x) be a property provable about objects x of type T. Then ϕ(y) should be true for objects y of type S where S is a subtype of T. This says "if hasattr(parent, 'pop') == True then hasattr(child, 'pop') must be True". This is not violated in this case, since hasattr(parent, 'pop') is False. If you want to extend the above definition so that negative proofs concerning the parent should also hold true for the child, then subtyping becomes impossible since all parent and child types must be identical, by definition. |
|
|
|
| |
| ▲ | tmp10423288442 an hour ago | parent | prev | next [-] | | And, to the point of this proposal, `dict` and `frozendict` don't have an inheritance relationship either. | |
| ▲ | immibis 17 minutes ago | parent | prev [-] | | ImmutableFoo can't be a subclass of Foo, since it loses the mutator methods. But nor can Foo be a subclass of ImmutableFoo, since it loses the axiom of immutability (e.g. thread-safety) that ImmutableFoo has. When you interpret Liskov substitution properly, it's very rare that anything Liskov-substitutes anything, making the entire property meaningless. So just do things based on what works best in the real world and aim for as much Liskov-substitution as is reasonable. Python is duck-typed anyway. It's a decent guiding principle - Set and ImmutableSet are more substitutable than Set and Map, so Set deriving from ImmutableSet makes more sense than Set deriving from Map. It's just not something you can ever actually achieve. |
| |
| ▲ | ndr 5 hours ago | parent | prev [-] | | Immutability it's a joy to work with. Ask anyone who's worked with Clojure's dicts. |
|
|
| ▲ | perrygeo 5 hours ago | parent | prev | next [-] |
| Python discovers immutable data, and gets it wrong. Frozendict is a blunt instrument - instead of a mutable free-for-all, it just locks the data for the entire lifecycle. Brace for the wave of code littered with deep copies, proudly proclaiming how functional it all is. If you want real immutable data structures, not a cheap imitation, check out pyrsistent. |
| |
| ▲ | cylemons 5 hours ago | parent | next [-] | | How else would you "modify" immutable data other than by copying it? | | |
| ▲ | ndr 5 hours ago | parent | next [-] | | You don't have to copy everything. Look up persistent data structures [0] The main trick is to store everything in a tree with a large branching factor. The elements are on your leaves.
When you want to make an edit you need to create only the nodes from the root to the actual leaf. Then you return the new root. Everything else you can share. This is a log operation, normally implemented with branch 32. It works with vectors as well as dicts.
Creating a copy is constant, it's immutable can be shared freely (especially on GC'd languages). Insert/Delete are O(log32(n)) instead of O(n) they'd be with a full copy. [0] https://en.wikipedia.org/wiki/Persistent_data_structure#Pers... | |
| ▲ | thechao 5 hours ago | parent | prev [-] | | And, yet, somehow, functional languages have been doing this since forever? I jest, I jest! There's an entire field of study focused on this! I'm no expert, but imagine you've got a singly-linked list, and we're going to allow only push_back() and pull_back() (no insert()). Let's go ahead and let multiple owners have at this list. If A does "pull_back()" what really happens is A has a local "tail" pointer that moves back along the end of the list and has a new belief about the end of the list. When A does "push_back()" it begins attaching links at its new tail. Since the links just "point backwards" without modifying the old links, this doesn't mutate the list "as is", it only mutates A's version of the list. So, if B is also looking at the list, it could just start doing "push_back()" and add its own "tail". The result is a data-structure that's a "bunch of shared singly linked lists" in a bush structure that's more akin to a tree than a singly linked list. You can do the same thing with binary trees and other structures. It's more fiddly, and there's definitely places where single-ownership allows mutability "under the hood" for real performance gains, but that's the basic idea. | | |
| ▲ | cylemons an hour ago | parent [-] | | Interesting, I didn't think about that, so it is copying on write but on a more granular level |
|
| |
| ▲ | shadowgovt 5 hours ago | parent | prev [-] | | If your dicts are frozen, you shouldn't need to deep-copy. The point of immutability is that if you want a new frozendict based on another one, you just rebuild the indirection data structure up top and leave the values it references alone. | | |
| ▲ | perrygeo 3 hours ago | parent [-] | | You're absolutely right: an "indirection data structure" is necessary. Freezing the data is the least interesting part - it doesn't give you any of the benefits typically associated with immutable data structures in functional languages. That's my point - Python is shipping a half solution that's being mistaken for a proper one. You think Python developers are going to roll their own HAMT on top of frozendicts? Or are they just gonna make copies? Personally, I'd just use pyrsistent which seems to get it right. | | |
| ▲ | rented_mule 2 hours ago | parent | next [-] | | It's a little early to say that "Python is shipping" anything related to this. The PEP is still in draft status and it may be modified (as it was three hours ago) or even rejected. The article says it is likely to be accepted, but that has not happened yet. That also means there is time to comment on the PEP in the discussion it links to if you'd like. https://peps.python.org/pep-0814/ | |
| ▲ | BiteCode_dev 2 hours ago | parent | prev [-] | | pyrsistent is super slow, though. Just ran a quick benchmark: - Creation - 8-12x slower
- Lookup - 22-27x slower
- Contains check - 30-34x slower
- Iteration - 5-14x slower
- Merge - 32-158x slower
Except at 10k+ items, batchup dates on 100K+ items or inserting 100 keys.This is rarely the case in practice, most dictionaries and dict operations are small, if you have a huge dict, you probably should be chunking your load or delegating that to infra. Not to mention pyrsistent's API is incompatible with dicts, so you can't pass it to external code without conversion. You'd better have an incredible ROI to justify that. | | |
| ▲ | instig007 22 minutes ago | parent [-] | | > pyrsistent is super slow, though Since when is Python about speed? > Just ran a quick benchmark Where's the code? Have you observed the bottleneck call? > Except at 10k+ items, batchup dates on 100K+ items or inserting 100 keys. > This is rarely the case in practice Where's the stats on the actual practice? > You'd better have an incredible ROI to justify that. The ROI being: fearless API design where 1) multiple instances of high level components are truly independent and could easily parallelize, 2) calling sites know that they keep the original data intact and that callees behave within the immutability constraints, 3) default func inputs and global scope objects are immutable without having to implement another PEP, 4) collections are hashable in general. |
|
|
|
|
|
| ▲ | polyrand 9 hours ago | parent | prev | next [-] |
| A frozen dictionary would be very welcome. You can already do something similar using MappingProxyType [0] from types import MappingProxyType
d = {}
d["a"] = 1
d["b"] = 2
print(d)
frozen = MappingProxyType(d)
print(frozen["a"])
# Error:
frozen["b"] = "new"
[0]: https://docs.python.org/3/library/types.html#types.MappingPr... |
| |
| ▲ | zahlman 7 hours ago | parent [-] | | > You can already do something similar Only if you deny access to the underlying real dict. | | |
| ▲ | ali_m 7 hours ago | parent [-] | | Yes, this only prevents the callee from mutating it, it can't provide a strong guarantee that the underlying mapping won't be changed upstream (and hence MappingProxyType can't be washable). |
|
|
|
| ▲ | sevensor 7 hours ago | parent | prev | next [-] |
| Concurrency is a good motivation, but this is super useful even in straight line code. There’s a huge difference between functions that might mutate a dictionary you pass in to them and functions that definitely won’t. Using Mapping is great, but it’s a shallow guarantee because you can violate it at run time. |
|
| ▲ | code_biologist 9 hours ago | parent | prev | next [-] |
| Shoutout to pyrsistent, a personal favorite library for frozen/immutable/hashable data structures whenever I need to use lists, sets, or dicts as dict keys, or make sets of such items: https://github.com/tobgu/pyrsistent |
|
| ▲ | rurban 4 hours ago | parent | prev | next [-] |
| > so having a safe way to share dictionaries between threads will be a boon Since only the keys are const, the values not, frozendict is not thread-safe per se. There needs to be a small lock around the value getter and setter. |
| |
| ▲ | varelaz 4 hours ago | parent [-] | | it's thread safe on operations on the dict but not on the values. Same relates to other immutable structures like tuples. Lock will not help here cause unsafety comes from operation on value after value is obtained. |
|
|
| ▲ | sundarurfriend 8 hours ago | parent | prev | next [-] |
| Can someone ELI5 the core difference between this and named tuples, for someone who is not deep into Python? ChatGPT's answer boiled down to: unordered (this) vs ordered (NTs), "arbitrary keys, decided at runtime" vs "fixed set of fields decided at definition time" (can't an NT's keys also be interpolated from runtime values?), and a different API (`.keys()`, `.items()`), etc (I'm just giving this as context btw, no idea if there's inaccuracies in these). So could this also have been approached from the other side, as in making unordered NamedTuples with support for the Mapping API? The line between dictionaries and named tuples and structs (across various languages) has always seemed a bit blurry to me, so I'm trying to get a better picture of it all through this. |
| |
| ▲ | quietbritishjim 8 hours ago | parent | next [-] | | > arbitrary keys, decided at runtime" vs "fixed set of fields decided at definition time" (can't an NT's keys also be interpolated from runtime values?) If you want to create a named tuple with arbitrary field names at runtime then you need to create a new named tuple type before you create the instance. That is possible, since Python is a very dynamic language, but it's not particularly efficient and, more importantly, just feels a bit wrong. And are you going to somehow cache the types and reuse them if the field names match? It's all a bit of a mess compared to just passing the entries to the frozendict type. | |
| ▲ | _flux 8 hours ago | parent | prev | next [-] | | The key difference is what the GPT outlined: tuples have order for the fields and named tuples are not indexed by the name, but instead a field accessor is used, i.e. foo.bar vs foo["bar"]. In addition namedtuples can be indexed using that order like tuples can (foo[0]), which clearly isn't possible with dicts and would be quite confusing if dict had integer key. So I think the differences aren't great, but they are sufficient. A frozendict is not going to be indexable by an integer. Python already has an abstract type for this, for mostly the use of type checking I imagine: https://docs.python.org/3/glossary.html#term-mapping Documentation for namedtuple: https://docs.python.org/3/library/collections.html#collectio... | |
| ▲ | willvarfar 8 hours ago | parent | prev | next [-] | | The values in tuples cannot change. The values that keys point to in a frozen dict can? But yeah I'd be in favour of something that looked a lot like a named tuple but with mutable values and supporting [name] access too. And of course some nice syntactic sugar rather like dicts and sets have with curly brackets today. | | |
| ▲ | pansa2 7 hours ago | parent [-] | | > The values in tuples cannot change. The values that keys point to in a frozen dict can? The entries of a tuple cannot be assigned to, but the values can be mutated. The same is true for a `frozendict` (according to the PEP they don't support `__setitem__`, but "values can be mutable"). | | |
| ▲ | vscode-rest 7 hours ago | parent [-] | | Tuple entries must be hashable, which (as far as standard library is concerned) means immutable. | | |
| ▲ | pansa2 7 hours ago | parent [-] | | >>> hash([1, 2])
TypeError: unhashable type: 'list'
>>> t = ([1, 2], [3, 4])
>>> print(t)
([1, 2], [3, 4])
| | |
| ▲ | vscode-rest 7 hours ago | parent [-] | | Ah. Of course… that’s how the workaround to use tuples as frozen dicts can work in the first place. Slow morning for me! |
|
|
|
| |
| ▲ | grimgrin 8 hours ago | parent | prev [-] | | I think you could have asked this same comment w/o mentioning ChatGPT and you wouldn't have been downvoted to oblivion in 3 minutes I don't see anything wrong with your asking to understand | | |
| ▲ | chistev 8 hours ago | parent [-] | | This place hates ChatGPT and AI. Lol. Edit: Of course, I get down voted as I predicted I would. Lol. | | |
| ▲ | acdha 7 hours ago | parent | next [-] | | This place hates laziness and imprecision. Using ChatGPT for editing or inspiration is okay as long as you personally review the results for accuracy and completeness, at which point people care about it as much as you announcing that you used a spell checker. | | |
| ▲ | delaminator 7 hours ago | parent [-] | | Pasting chat GPT responses is against the site rules. always has been even before GPT https://news.ycombinator.com/item?id=46206457 | | |
| ▲ | acdha 7 hours ago | parent [-] | | Fair, because they’re not your words. I’ll edit my comment for what I had in mind: that it can be helpful for that like a spell checker - for example, I know non-native English speakers who find them useful for editing but they completely understand the topic intellectually. |
|
| |
| ▲ | grimgrin 5 hours ago | parent | prev [-] | | you made a non-comment, that's why you're downvoted comments don't have to be substantial, but they should be adding something that might merit more responses, or could. yours does not. what kind of reply could you even give to "this place [and so, its users] hates [thing]" I'd ask you to qualify it, but you can't really qualify such a statement |
|
|
|
|
| ▲ | bilsbie 7 hours ago | parent | prev | next [-] |
| Wow weird Mandela effect for me. I really remember this being a built and actually using it. |
| |
|
| ▲ | bvrmn an hour ago | parent | prev | next [-] |
| It would be great to have **kwargs as frozendict by default. It could help with caching decorators to speedup key construction. For example natural key is simply `(args, kwargs)`. |
|
| ▲ | Strilanc 6 hours ago | parent | prev | next [-] |
| This is a type that I would use a lot. For example, I often write classes that do cacheable analysis that results in a dict (e.g. the class stores a list of tiles defined by points and users want a point-to-tiles mapping for convenience). It's worth caching those transformations, e.g. using @functools.cached_property, but this introduces a risk where any caller could ruin the returned cached value by editing it. Currently, I take the safety hit (cache a dict) instead of the performance hit (make a new copy for each caller). Caching a frozendict would be a better tradeoff. |
| |
| ▲ | clickety_clack 5 hours ago | parent [-] | | Maybe you should take a look at pyrsistent. That allows you to make frozen “maps”. You can “evolve” them into new versions of the objects and it keeps the references to the unchanged keys and values under the hood so it’s fast and memory efficient. |
|
|
| ▲ | mwsherman 6 hours ago | parent | prev | next [-] |
| I’ve found C#’s frozen dictionary to be useful: https://learn.microsoft.com/en-us/dotnet/api/system.collecti... It’s optimized for fast reads in exchange for expensive creation. |
|
| ▲ | codethief 9 hours ago | parent | prev | next [-] |
| Next step: Auto-inferring the correct (most narrow) TypedDict type from a frozendict. (Think `const foo = { … } as const` in TypeScript.) I miss this TS feature in Python on the daily. |
| |
| ▲ | mrweasel 8 hours ago | parent | next [-] | | What would that do exactly? Auto-generate a TypedDict class? Bringing up TypedDict is sort of interesting, because it seems like a frozen dictionary could have been implemented by PEP 705, if typing.ReadOnly was enforced at runtime, and not just a hint to a static type checker. | | |
| ▲ | dmurray 8 hours ago | parent [-] | | Auto-generate a TypedDict type while type checking, while doing nothing at runtime. I expect this is not a very big ask and the various typecheckers will have versions of this soon after release. | | |
| ▲ | codethief 5 hours ago | parent | next [-] | | After what release? Type checkers already support the various TypedDict PEPs, even those that haven't been accepted yet. Yet, none of them infers TypedDict types. | |
| ▲ | mrweasel 8 hours ago | parent | prev [-] | | Okay so basically just "inline": class MyType(TypedDict):
a: str
b: int
or infer the type hint in: my_typed_dict: dict[str, int] = {"a": 5}
It should probably be something like: auto_typed: dict[typing.Const, typing.Const] = {"a": 5}
where typing.Const defaults to Any for an empty dict. | | |
| ▲ | codethief 5 hours ago | parent [-] | | Not sure we're talking about the same thing. Inline TypedDicts are already in the process of being formalized, see https://peps.python.org/pep-0764/ , and have experimental support in e.g. Pyright. What I meant was that foo = {"a": 5}
should be inferred as foo: TypedDict[{ "a": Literal[5] }] = {"a": 5}
|
|
|
| |
| ▲ | adzm 9 hours ago | parent | prev | next [-] | | Curious what this means for typescript as well; honestly I've only reached for as const rarely. | |
| ▲ | anentropic 8 hours ago | parent | prev | next [-] | | Agreed... but it shouldn't need a frozendict for this IMHO TypedDict in Python are essentially broken/useless as is What is needed is TS style structural matching, like a Protocol for dicts | |
| ▲ | virtue3 9 hours ago | parent | prev | next [-] | | I also SUPER LOVE this feature. Especially when you make a type union of the keys for easier indexing. | |
| ▲ | tweakimp 9 hours ago | parent | prev [-] | | Can you give a Python example of a use case for this? |
|
|
| ▲ | drhagen 9 hours ago | parent | prev | next [-] |
| If this gets wide enough use, they could add an optimization for code like this: n = 1000
a = {}
for i in range(n):
a[i] = str(i)
a = frozendict(a) # O(n) operation can be turned to O(1)
It is relatively easy for the JIT to detect the `frozendict` constructor, the `dict` input, and the single reference immediately overwritten. Not sure if this would ever be worth it. |
| |
| ▲ | _flux 8 hours ago | parent [-] | | Wouldn't ref-counting CPython already know that a has a single reference, allowing this optimization without needing any particular smart JIT? | | |
| ▲ | tracnar 17 minutes ago | parent | next [-] | | Indeed, the Tcl implementation does this so e.g. `set d [dict] ; dict set d key value` can modify d in place instead of creating a copy (since everything is immutable). | |
| ▲ | zbentley 7 hours ago | parent | prev | next [-] | | I think GP was talking about optimizing away the O(N) call on the last line. The GC will take care of removing the reference to the old (mutable) dict, but constructing a new frozendict from a mutable dict would, in the current proposal, be an O(N) shallow copy. There are also potentially other optimizations that could be applied (not specific to dict/frozendict) to reduce the memory overhead on operations like "a = f(a)" for selected values of "f". | |
| ▲ | zahlman 7 hours ago | parent | prev [-] | | First thought: I would very much expect it to be able to do this optimization given the similar things it does for string concatenation. But actually, I suspect it can't do this optimization simply because the name `frozendict` could be shadowed. |
|
|
|
| ▲ | steadyelk 5 hours ago | parent | prev | next [-] |
| Agree it's about time to make this built in. Other functional packages like JAX [0] are already using the concept but they build it into their library from scratch. [0] https://flax.readthedocs.io/en/latest/api_reference/flax.cor... |
|
| ▲ | the__alchemist 5 hours ago | parent | prev | next [-] |
| I wish Python could move away from types-as-mutability-determiners. Or paper over them; all could have mutable and non-mutable variants, with "to_mut()" and "to_immut()" methods. And in the same vein, explicit control over whether functions mutate a variable in place, or make a local copy. Python is my first lang, and I still am unable to consistently reason about these. |
| |
|
| ▲ | ok123456 6 hours ago | parent | prev | next [-] |
| Ruby has had frozen dictionaries since version 1.0, and they are generally considered a good thing to use where possible. |
|
| ▲ | westurner an hour ago | parent | prev | next [-] |
| PMap and PVector, functional Python libraries, "PEP 351 – The freeze protocol" (2005, rejected)
https://peps.python.org/pep-0351/ ; IIUC the freeze protocol proposed basically: def freeze(obj)
return cache.setsefault(hash(obj),
obj.__freeze__())
/? "Existing threads re: consts and applications thereof"I wasn't able to find a URL to this post (2021) from the python-ideas mailing list archives using a double quoted search term today; I had to use the python mailing list search engine? Did something break crawling of mailing lists? Old mailman HTML archives were very simple to crawl.. ENH: pypa: add a sitemap.xml for/of mailing list archives, forums; @pypa: ask for search engine indexing advice: "How do we make sure that the python mailing list archives will be search indexed?" (as they traditionally were) How to find the .txt of mailing list archives posts these days? From "[Python-ideas] Re: Introduce constant variables in Python" (2021)
https://mail.python.org/archives/list/python-ideas@python.or... : - pyrsistent: PMap, PVector I'm out of time for this; (reformatting this for HN so that URLs will be auto-linkified but newlines won't be eliminated) here's the full email as .txt, the mailing list archive has a hyperlinkified version with newlines preserved. GH Markdown and CommonMark Markdown also preserve newlines and auto-linkify: From: [@westurner]
Date: Thu, Jun 17, 2021, 10:43 AM
Subject: Re: [Python-ideas] Re: Introduce constant variables in Python
Cc: python-ideas <python-ideas@python.org>
On Mon, May 24, 2021 at 5:43 PM Chris Angelico <rosuav@gmail.com> wrote:
Requiring that a name not be rebound is well-defined and testable.
Requiring that an object not change is either trivial (in the case of,
say, an integer) or virtually impossible (in the case of most
objects).
What would be the advantage of such a declaration?
ChrisA
## Existing threads re: consts and applications thereof
So, `/? from:me pyrsistent` I found a few results:
- "[Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited)" 2016-04
https://mail.python.org/pipermail/python-dev/2016-April/143958.html
- ~Sandboxing python within python is nontrivial to impossible; consts might help a bit
- https://mail.python.org/pipermail/python-dev/2016-April/143958.html
- "Proposal to add const-ness type hints to PEP-484"
https://mail.python.org/archives/list/python-ideas@python.org/thread/OVPF5I6IOVF6GOJQRH5UGCCU3R7PQHUF/
- https://github.com/python/typing/issues/242
- "Final names and attributes" https://github.com/python/mypy/pull/5522
This is where `typing.Final` comes from.
- "[Python-ideas] "Immutable Builder" Pattern and Operator"
https://mail.python.org/pipermail/python-ideas/2017-January/044374.html
- [pyrsistent] and "fn.py [do] immutables:
https://github.com/kachayev/fn.py/blob/master/README.rst#persistent-data-structures "
- "[Python-ideas] Add recordlcass to collections module"
https://groups.google.com/g/python-ideas/c/9crHfcCBgYs/m/6_EEaWJAAgAJ
- ORMs (e.g. Django, SQLAlchemy) require "dirty state" checking to know which object attributes have changed and need an SQL statement to be executed to synchronize the state; this is relevant because when we're asking for mutable namedtuple we're often trying to do exactly this pattern.
- "[Python-ideas] Suggestions: dict.flow_update and dict.__add__"
https://www.google.com/search?q=%22%5BPython-ideas%5D+Suggestions%3A+dict.flow_update+and+dict.__add__%22
> dicttoolz has functions for working with these objects; including dicttoolz.merge (which returns a reference to the merged dicts but does not mutate the arguments passed).
>
> https://toolz.readthedocs.io/en/latest/api.html#dicttoolz
> https://toolz.readthedocs.io/en/latest/api.html#toolz.dicttoolz.merge
>
> pyrsistent has a PRecord class with invariants and type checking that precedes dataclasses. pyrsistent also has 'freeze' and 'thaw' functions for immutability. PRecord extends PMap, which implements __add__ as self.update(arg) (which does not mutate self)
https://github.com/tobgu/pyrsistent/blob/master/README.rst#precord
>
> https://github.com/tobgu/pyrsistent/blob/master/pyrsistent/_pmap.py
- "[Python-ideas] How to prevent shared memory from being corrupted ?"
https://www.google.com/search?q=%22How+to+prevent+shared+memory+from+being+corrupted+%3F%22
> PyArrow Plasma object ids, "sealing" makes an object immutable, pyristent
>
> https://arrow.apache.org/docs/python/plasma.html#object-ids
> https://arrow.apache.org/docs/python/plasma.html#creating-an-object-buffer
> > Objects are created in Plasma in two stages. First, they are created, which allocates a buffer for the object. At this point, the client can write to the buffer and construct the object within the allocated buffer. [...]
- [Python-ideas] Experimenting with dict performance, and an immutable dict
https://mail.python.org/archives/list/python-ideas@python.org/message/DNBGUJHDH4UTPSETMFFWMJHNXQXIWX4I/
> https://pyrsistent.readthedocs.io/en/latest/intro.html#pyrsistent :
>
>> Pyrsistent is a number of persistent collections (by some referred to as functional data structures). Persistent in the sense that they are immutable.
>>
>> All methods on a data structure that would normally mutate it instead return a new copy of the structure containing the requested updates. The original structure is left untouched.
>>
>> This will simplify the reasoning about what a program does since no hidden side effects ever can take place to these data structures. You can rest assured that the object you hold a reference to will remain the same throughout its lifetime and need not worry that somewhere five stack levels below you in the darkest corner of your application someone has decided to remove that element that you expected to be there.
>>
>> Pyrsistent is influenced by persistent data structures such as those found in the standard library of Clojure. The data structures are designed to share common elements through path copying. It aims at taking these concepts and make them as pythonic as possible so that they can be easily integrated into any python program without hassle.
> What would be the advantage of such a declaration?
Constants don't need to be locked or unlocked; which is advantageous for parallelism and reasoning about program correctness.
True consts (wherein everything referred to in that object is 'frozen' and immutable or at least only modifiable with e.g. copy-on-write)
wouldn't require locks,
which would be post-GIL advantageous.
You could do consts by never releasing a threading.Lock (or similar):
- https://docs.python.org/3/library/asyncio-sync.html#locks
- https://docs.python.org/3/library/threading.html#lock-objects
- This from
https://docs.python.org/2/library/sets.html?highlight=immutable#immutable-transforms re ImmutableSet/FrozenSet
is not present in the python 3 docs:
https://docs.python.org/3/library/stdtypes.html#set-types-set-frozenset
Though - even if Python enforced normal consts in the language - all of the other code objects would still be mutable, so you still have the impossibility of sandboxing python.
Functional and contracts coding styles rely upon invariance;
which can be accomplished with various third-party packages that enforce const-ness throughout what may be an object tree behind that reference that would otherwise need to be copy.deepcopy()'d.
## pyrsistent
Src: https://github.com/tobgu/pyrsistent
> - PVector, similar to a python list
> - PMap, similar to dict
> - PSet, similar to set
> - PRecord, a PMap on steroids with fixed fields, optional type and invariant checking and much more
> - PClass, a Python class fixed fields, optional type and invariant checking and much more
> - Checked collections, PVector, PMap and PSet with optional type and invariance checks and more
> - PBag, similar to collections.Counter
> - PList, a classic singly linked list
> - PDeque, similar to collections.deque
> - Immutable object type (immutable) built on the named tuple
> - freeze and thaw functions to convert between python standard collections and pyrsistent collections.
> - Flexible transformations of arbitrarily complex structures built from PMaps and PVectors.
## icontract
Src: https://github.com/Parquery/icontract
> icontract provides design-by-contract to Python3 with informative violation messages and inheritance.
>
> It also gives a base for a flourishing of a wider ecosystem:
>
> - A linter pyicontract-lint,
> - A sphinx plug-in sphinx-icontract,
> - A tool icontract-hypothesis for automated testing and ghostwriting test files which infers Hypothesis strategies based on the contracts,
together with IDE integrations such as icontract-hypothesis-vim, icontract-hypothesis-pycharm, and icontract-hypothesis-vscode,
> - Directly integrated into CrossHair, a tool for automatic verification of Python programs,
together with IDE integrations such as crosshair-pycharm and crosshair-vscode, and
> - An integration with FastAPI through fastapi-icontract to enforce contracts on your HTTP API and display them in OpenAPI 3 schema and Swagger UI.
https://en.wikipedia.org/wiki/Design_by_contracthttps://en.wikipedia.org/wiki/Invariant_(mathematics)#Invari...
[ https://en.wikipedia.org/wiki/Class_invariant ] > What is the difference between "invariant" and "constant" and "final"? |
|
| ▲ | DeathArrow 2 hours ago | parent | prev | next [-] |
| >But dictionaries are mutable, which makes them problematic for sharing data in concurrent code. Not really, C# has ConcurrentDictionary. |
|
| ▲ | zzzeek 8 hours ago | parent | prev | next [-] |
| SQLAlchemy has its own frozendict which we've had in use for many years, we have it as a pretty well performing cython implementation these days, and I use it ubiquitously. Would be a very welcome addition to the stdlib. This proposal is important enough that I chimed in on the thread with a detailed example of how SQLAlchemy uses this pattern: https://discuss.python.org/t/pep-814-add-frozendict-built-in... |
|
| ▲ | xamuel 7 hours ago | parent | prev | next [-] |
| This subject always seems to get bogged down in discussions about ordered vs. unordered keys, which to me seems totally irrelevant. No-one seems to mention the glaring shortcoming which is that, since dictionary keys are required to be hashable, Python has the bizarre situation where dicts cannot be dict keys, as in... {{'foo': 'bar'}: 1, {3:4, 5:6}: 7} ...and there is no reasonable builtin way to get around this! You may ask: "Why on earth would you ever want a dictionary with dictionaries for its keys?" More generally, sometimes you have an array, and for whatever reason, it is convenient to use its members as keys. Sometimes, the array in question happens to be an array of dicts. Bang, suddenly it's impossible to use said array's elements as keys! I'm not sure what infuriates me more: said impossibility, or the python community's collective attitude that "that never happens or is needed, therefore no frozendict for you" |
| |
| ▲ | kzrdude 7 hours ago | parent [-] | | Turning a dictionary into a tuple of tuples `((k1, v1), (k2, v2), ...)`; isn't that a reasonable way? If you want to have hash map keys, you need to think about how to hash them and how to compare for equality, it's just that. There will be complications to that such as floats, which have a tricky notion of equality, or in Python mutable collections which don't want to be hashable. | | |
| ▲ | xamuel 5 hours ago | parent [-] | | That argument would apply to sets too, and yet frozenset is builtin. |
|
|
|
| ▲ | semiinfinitely 2 hours ago | parent | prev | next [-] |
| its called dataclass |
|
| ▲ | shadowgovt 5 hours ago | parent | prev | next [-] |
| Regarding the spooky-action-at-a-distance concerns of a `.freeze()` method on dict: `.freeze()` should probably just return a frozendict instead of in-place mutating the dict, and they should be separate types. Under the hood, you'll have to build the hashtable anyway to make the frozendict; as long as you're doing that work, you may as well build an object to contain the hashtable and just have that object be separate from the dict that birthed it. The values referenced by both the original dict and the frozendict can be the same values; no need to clone them. |
|
| ▲ | whimsicalism 4 hours ago | parent | prev | next [-] |
| yes! no more `MappingProxyType` hacks |
|
| ▲ | lou1306 7 hours ago | parent | prev | next [-] |
| Can someone give a strong rationale for a separate built-in class? Because "it prevents any unintended modifications" is a bit weak. If you have fixed keys, a frozen dataclass will do. If you don't, you can always start with a normal dict d, then store tuple(sorted(d.items())) to have immutability and efficient lookups (binary search), then throw away d. |
|
| ▲ | drhagen 9 hours ago | parent | prev | next [-] |
| Great! Now make `set` have a stable order and we're done here. |
| |
| ▲ | cr125rider 8 hours ago | parent [-] | | Aren’t sets unsorted by definition? Or do repeated accesses without modification yield different results? | | |
| ▲ | ledauphin 7 hours ago | parent | next [-] | | this is likely in reference to the fact that dicts have maintained insertion order since Python ~3.6 as property of the language. Mathematically there's no defined order to a set, and a dict is really just a set in disguise, but it's very convenient for determinism to "add" this invariant to the language. | | |
| ▲ | zahlman 7 hours ago | parent | next [-] | | Sets use a different implementation intentionally (i.e. they are not "a dict without values") exactly because it's expected that they have different use cases (e.g. union/intersection operations). | |
| ▲ | jonathaneunice 7 hours ago | parent | prev [-] | | Debugging is a completely different and better animal when collections have a predictable ordering. Else, every dict needs ordering before printing, studying, or comparing. Needlessly onerous, even if philosophically justifiable. |
| |
| ▲ | adrian_b 7 hours ago | parent | prev | next [-] | | Not related to Python, but one of the possible implementations of a set, i.e. of an equivalence class on sequences, is as a sorted array (with duplicates eliminated, unless it is a multiset, where non-unique elements are allowed in the sorted array), as opposed to the unsorted array that can store an arbitrary sequence. So sets can be viewed as implicitly sorted, which is why the order of the elements cannot be used to differentiate two sets. Being sorted internally to enforce the equivalence between sets with elements provided in different orders does not imply anything about the existence of an operation that would retrieve elements in a desired order or which would return subsets less or greater than a threshold. When such operations are desired, an order relation must be externally defined on the set. So a possible definition of sets and multisets is as sorted arrays with or without unicity of elements, while sequences are unsorted arrays (which may also have the constraint of unique elements). However the standard set operations do not provide external access to the internal order, which is an order between arbitrary identifiers attached to the elements of the set, which have no meaning externally. | |
| ▲ | Retr0id 7 hours ago | parent | prev | next [-] | | A stable order does not imply sorted order. If you convert the same set to a list twice you'll probably get the same order both times but it isn't guaranteed anywhere, and that order may change between python implementations and versions. The order may also change as the set grows or shrinks. | |
| ▲ | sltkr 7 hours ago | parent | prev [-] | | So are dictionary keys, but Python decided to make them insertion ordered (after having them be unordered just like set elements for decades). There is no fundamental reason sets couldn't have a defined order. That's what languages like JavaScript have done too. | | |
| ▲ | cpburns2009 7 hours ago | parent [-] | | Python's decision to make dict keys ordered in the spec was a mistake. It may be the best implementation so far, but it eliminates potential improvements in the future. | | |
| ▲ | mrweasel 6 hours ago | parent [-] | | Agreed. The only reason to make them sorted is because people would wrongly assume that they where. You can argue that a programming language should not have unexpected behaviors, and apparently unsorted dictionary keys where a surprise to many, on the other hand I feel like it's a failure of education. The problem was that assuming that keys would be sorted was frequently true, but not guaranteed. An alternative solution would have been to randomize them more, but that would probably break a lot of old code. Sorting the keys makes no difference if you don't expect them to be, but it will now be a greater surprise if you switch language. |
|
|
|
|
|
| ▲ | malkia 5 hours ago | parent | prev [-] |
| Welcome to starlark :) |