This is a specific scenario, but there are many like it.
A soft-concurrent program (threads or asyncio) is visiting several web sites. The memory pool is fragmented, but total memory use remains low and constant. Then several huge responses are downloaded.
In that case, a statement like this becomes O(M×N) in memory (N = size of the response, M = number of methods called):

```python
body = text.strip().replace('\x00', '').encode('utf8')
```

(That is a real-life example, but also consider the case in which each call creates a slightly larger result.)
The reason is this: each of the M chained calls allocates a new string of roughly size N, and although every intermediate is freed as soon as the next call returns, a fragmented pool cannot always reuse the freed block for the next (slightly larger) result, so peak process memory approaches M×N.
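As a rough demonstration of the copying (not of fragmentation itself), the sketch below measures allocations around the statement with the standard `tracemalloc` module; the ~100 MB body is synthetic and stands in for a downloaded response.

```python
import tracemalloc

# Synthetic stand-in for a huge downloaded response (~100 MB),
# padded so that .strip() and .replace() each really produce a new copy.
text = " " + "x" * (100 * 1024 * 1024) + "\x00\n"

tracemalloc.start()
body = text.strip().replace('\x00', '').encode('utf8')
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

# Peak is already about 2*N, because two full-size copies coexist
# during each call; under fragmentation the freed copies may not be
# reusable for the next result, pushing process memory toward M*N.
print(f"peak traced: {peak / 2**20:.0f} MiB")
```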
The currently available ways of avoiding this situation come at the expense of losing the convenience of the many methods and functions available for `str` and `bytes`.
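For illustration, here is a sketch of one such workaround, assuming the response arrives as an iterable of bytes chunks rather than one huge `str`; it bounds the size of the intermediates, but the convenient one-liner has to be re-implemented by hand.

```python
def clean_body(chunks):
    """chunks: an assumed iterable of bytes pieces of the response
    (e.g. 64 KiB each), instead of one huge str."""
    out = bytearray()
    for chunk in chunks:
        # Per-chunk replace: no full-size intermediate copies.
        out += chunk.replace(b'\x00', b'')
    # Hand-rolled strip, since bytearray.strip() would make another
    # full-size copy; this is exactly the lost convenience.
    while out and out[:1].isspace():
        del out[0]
    while out and out[-1:].isspace():
        del out[-1]
    return bytes(out)  # one final full-size copy
```

With `requests`, for example, the chunks could come from `resp.iter_content(chunk_size=65536)`.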
One possible solution that keeps the convenience of `str` and `bytes` would be for Python to switch to a "long string (bytes)" representation once the length exceeds a certain threshold. The long string would be stored in smaller blocks. For some operations the single large block may need to be reconstructed, but memory use should stay within O(2×N) per "logical string", with one of the two N consisting of small blocks.
(research done by Heorhii Zatserklianyi)
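To make the idea concrete, here is a rough user-level sketch of such a block-based representation, with an assumed block size; the actual proposal concerns CPython's internal representation, so the class and its methods are illustrative only (and not part of the research cited above).

```python
BLOCK = 64 * 1024  # assumed block size / switch-over threshold

class LongStr:
    """Illustrative 'long string' stored as a list of small str blocks."""

    def __init__(self, blocks):
        self._blocks = list(blocks)

    @classmethod
    def from_str(cls, s):
        return cls(s[i:i + BLOCK] for i in range(0, len(s), BLOCK))

    def replace(self, old, new):
        # Block-local work: peak extra memory is one block, not N.
        # (A real implementation must also handle matches that span
        # block boundaries; single characters like '\x00' never do.)
        return LongStr(b.replace(old, new) for b in self._blocks)

    def encode(self, encoding='utf-8'):
        # Encodes block by block; the join still builds one
        # full-size result at the end.
        return b''.join(b.encode(encoding) for b in self._blocks)

    def __str__(self):
        # Some operations need the single large block back; this
        # reconstruction is the second N in the O(2×N) bound.
        return ''.join(self._blocks)
```

With such a type, `LongStr.from_str(text).replace('\x00', '').encode('utf8')` would still touch ~N bytes per step, but the intermediates would consist of small, individually reusable blocks instead of one huge allocation.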