It’s hard to understand why it was designed like this…
rag_search_result = {
    "rag_search_result": {
        "rag_search_content": {
            "message": "success",
            "data": [
                {
                    "page_content": "content1",
                    "metadata": {
                        "source": "document.docx",
                        "pk": "459315158705267390",
                        "relevance_score": 0.9990234375
                    },
                    "type": "Document",
                    "id": None,
                    "score": 3,
                    "kb_type": "tenant"
                },
                {
                    "page_content": "content2",
                    "metadata": {
                        "source": "document.docx",
                        "pk": "459315158705267392",
                        "relevance_score": 0.99169921875
                    },
                    "type": "Document",
                    "id": None,
                    "score": 3,
                    "kb_type": "tenant"
                },
                {
                    "page_content": "content3",
                    "metadata": {
                        "source": "document.docx",
                        "pk": "459315158705267391",
                        "relevance_score": 0.9736328125
                    },
                    "type": "Document",
                    "id": None,
                    "score": 3,
                    "kb_type": "tenant"
                }
            ]
        }
    }
}
rag_search_content = rag_search_result["rag_search_result"]["rag_search_content"]
file_to_contents = {}
# Group page contents by a (file_name, kb_type) key.
for result_single in rag_search_content.get('data', []):
    content = result_single.get("page_content", "")
    file_name = result_single.get("metadata", {}).get("source", "")
    kb_type = result_single.get("kb_type")
    # tuple(dict) yields the dict's keys in insertion order.
    file_to_contents.setdefault(tuple({file_name: file_name, kb_type: file_name}), []).append(content)
# Inspect each key and its two components.
for file_key, file_content_list in file_to_contents.items():
    print("output_file_key")
    print(file_key)
    print("output_file_key type")
    print(type(file_key))
    print("file_name")
    print(file_key[0])
    print("kb_type")
    print(file_key[1])
When I got this code (and yes, I have something of a JS background), it seemed natural to me to refine it as:
file_to_contents.setdefault(tuple({file_name: file_name, kb_type: file_name}), []).append(content)
→
file_to_contents.setdefault(tuple({file_name, kb_type}), []).append(content)
Then, boom!
That makes me wonder whether set and dict should really have different ordering mechanisms here… Is there any other language where dicts are insertion-ordered while sets are randomly ordered?
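For readers following along, here is a minimal sketch of why the two spellings behave differently. The values assigned to `file_name` and `kb_type` are stand-ins taken from the sample data above, and the plain tuple literal at the end is only a suggested alternative, not something from the original code.

```python
file_name = "document.docx"
kb_type = "tenant"

# A dict iterates over its keys in insertion order, so this tuple is
# always (file_name, kb_type).
from_dict = tuple({file_name: file_name, kb_type: file_name})

# A set has no guaranteed iteration order. For strings it can even differ
# between interpreter runs because of hash randomization.
from_set = tuple({file_name, kb_type})

# A plain tuple literal states the intended order directly. It also keeps
# two elements when file_name == kb_type, where the dict and the set would
# both collapse to a single element.
direct_key = (file_name, kb_type)

print(from_dict == direct_key)                  # True
print(sorted(from_set) == sorted(direct_key))   # True: same elements, order unspecified
```

With the set version, `file_key[0]` may be either the file name or the kb_type, which is presumably where the surprise came from.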
It was specifically added for dict to make **kwargs ordered; see PEP 468 – Preserving the order of **kwargs in a function (peps.python.org). There wasn’t a particular need to make sets ordered.
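As a small illustration of what PEP 468 guarantees (Python 3.6 and later), the sketch below uses a hypothetical helper named `keyword_names` that just reports the order in which keyword arguments arrived:

```python
def keyword_names(**kwargs):
    # kwargs is a regular dict whose keys follow the order used at the call site.
    return list(kwargs)


print(keyword_names(b=1, a=2, c=3))  # ['b', 'a', 'c']
```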
I’ve moved this to Python Help, as it’s a better fit.
Ordering dicts has some benefits, like making it simpler to round-trip JSON data.
It may help you to know that dicts didn’t used to be ordered, and there is a separate collections.OrderedDict. But eventually the practical benefits were deemed more important than the theoretical purity of an unordered mapping type.
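A rough sketch of both points, using made-up keys: a JSON round trip keeps key order with a plain dict, and `collections.OrderedDict` still differs from dict in that its equality comparison is order-sensitive.

```python
import json
from collections import OrderedDict

original = {"zeta": 1, "alpha": 2, "mid": 3}

# Serialize and parse again; key order survives in both directions.
text = json.dumps(original)
restored = json.loads(text)
print(text)                                # {"zeta": 1, "alpha": 2, "mid": 3}
print(list(restored) == list(original))    # True

# Plain dicts compare equal regardless of insertion order...
plain_equal = {"a": 1, "b": 2} == {"b": 2, "a": 1}
# ...while two OrderedDicts also compare their order.
ordered_equal = OrderedDict(a=1, b=2) == OrderedDict(b=2, a=1)
print(plain_equal, ordered_equal)          # True False
```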
Interesting, I read the history differently, including that PEP. To me it seems to state that dicts and sets were both optimised for performance. In Python 3.6 the performance-optimised dicts were discovered to maintain their key order, people started using that “feature”, and the performance cost of documenting something that’s already true was 0.
If performance-optimal dicts had not maintained their key order, the PEP seems to suggest **kwargs may have unpacked into an ordered dict instead.
Regardless, the convenience of this happy accident beats purity.
That would be the case if CPython were the only Python; the cost of documenting it was making a demand on every other Python implementation, including every future version of CPython, to keep this as a feature. This was decided to be a cost worth paying, and it has in fact been extremely helpful since then, but it was a cost nonetheless.
If you want to dig into it, I believe the compact dict representation came from PyPy first.
We are frequently stumbling over this. For us, having sets ordered as well would frequently benefit us greatly. For example, when implementing stable hashes, you currently have to actively sort the set before turning it into a hash.
We tend to use dict[T, None] just to get the stable order. We also didn’t see a relevant performance impact when just using dict instead of set.
Would it be sensible to write a PEP for that?
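For anyone unfamiliar with the `dict[T, None]` workaround mentioned above, here is a minimal sketch with made-up items. It only uses standard dict operations; the variable names are illustrative.

```python
items = ["pear", "apple", "pear", "fig", "apple"]

# dict.fromkeys deduplicates while keeping first-seen order; every value is None.
ordered = dict.fromkeys(items)
print(list(ordered))            # ['pear', 'apple', 'fig']

ordered["kiwi"] = None          # equivalent of set.add
ordered.pop("apple", None)      # equivalent of set.discard
print("fig" in ordered)         # True: membership is still a hash lookup
print(list(ordered))            # ['pear', 'fig', 'kiwi']

# The keys view supports set operators, but the result is a plain
# (unordered) set again.
common = ordered.keys() & {"fig", "kiwi", "plum"}
print(common == {"fig", "kiwi"})  # True
```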
Can’t you use frozensets?
Preserving the order of **kwargs was not why dictionaries were changed to preserve insertion order. Preserving dictionary insertion order was an unintentional side effect of another dictionary optimization.
Raymond Hettinger initially introduced the idea of “More compact dictionaries with faster iteration” on Python Dev in December 2012.
On the same day, Armin Rigo commented about dictionary order:
As a side note, your suggestion also enables order-preserving
dictionaries: iter() would automatically yield items in the order they
were inserted, as long as there was no deletion.
Tim Delaney quickly noted that
…there is an argument (no pun intended) for making **kwargs a
dictionary that maintains insertion order if there are no deletions. It
sounds like we could get that for free with this implementation…
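The quotes above describe the 2012 proposal, where order was only expected to hold without deletions. In current Python (3.7+), the documented behaviour is that updating a key keeps its position and a key re-added after deletion goes to the end. A quick sketch with arbitrary keys:

```python
d = {"a": 1, "b": 2, "c": 3}

d["a"] = 10      # updating an existing key does not move it
del d["b"]       # remove a key...
d["b"] = 20      # ...and re-insert it: it now sits at the end

print(list(d))   # ['a', 'c', 'b']
```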
I wasn’t around at the time, but my understanding of the history is that preserving **kwargs ordering was the rationale for elevating it from an implementation detail of CPython to part of the language specification.
Otherwise, I don’t quite get how to square this with PEP 468 existing and having been accepted.
For the record, **kwargs order is tied to the CPython dict implementation.
Eric Snow had planned to use OrderedDict for keyword arguments. He ported the pure-Python OrderedDict to C for that purpose.
But I didn’t like that. The C OrderedDict still uses a doubly linked list. It is still slower and uses more memory than dict.
That motivated me to port the PyPy dict implementation to CPython. It made **kwargs faster, not slower.
Sorry for the late response. Unfortunately, frozensets are not always sufficient for our use-cases, in particular when you rely on mutability of the set and it is performance critical that you don’t convert between data structures when the order needs to be stable.
For example, imagine an algorithm where you build up a set over multiple iterations and have to check for membership throughout. Afterwards, you hash the set with a custom hash function. This step is performance critical, so you wouldn’t want to have to convert your set into a frozenset at this moment. Also, you don’t care about the exact order when iterating over the items, just that the hash stays the same when redoing the computation.
Might be too much of a niche use-case though and I see that frozenset can probably solve most of the relevant use cases where you need stable ordering for sets.
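A rough sketch of the kind of algorithm described above, under several assumptions: the items are strings, `digest` is a hypothetical stand-in for the custom hash function (here it simply feeds items to SHA-256 in iteration order), and `collect` is an illustrative name for the build-up loop.

```python
import hashlib


def digest(items):
    """Hypothetical custom hash: depends on the order in which items are iterated."""
    h = hashlib.sha256()
    for item in items:
        h.update(item.encode("utf-8"))
        h.update(b"\x00")  # separator, so ("ab", "c") and ("a", "bc") differ
    return h.hexdigest()


def collect(batches):
    """Build up a dict used as an insertion-ordered set, checking membership as we go."""
    seen = {}
    for batch in batches:
        for item in batch:
            if item not in seen:
                seen[item] = None
    return seen


batches = [["b", "a"], ["c", "a"], ["d", "b"]]
first = collect(batches)
second = collect(batches)

print(list(first))                        # ['b', 'a', 'c', 'd']
# Redoing the same computation gives the same order, so no sort is needed.
print(digest(first) == digest(second))    # True

# With a real set, the elements have to be sorted before hashing.
sorted_digest = digest(sorted(set(first)))
print(sorted_digest == digest(["a", "b", "c", "d"]))  # True
```

Note that the dict-based digest is only stable if the items arrive in the same order on each run, whereas the sorted variant is independent of arrival order.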
I too use dict[T, None]. Annoying, but it works.