Add BytesIO.name and StringIO.name

Hello all, it would be nice if BytesIO and StringIO had a name attribute, or alternatively if there were subclasses that have one.

Some code that accepts a file-like object relies on this attribute. For example, the requests library uses it to guess a file’s MIME type.
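
To illustrate the kind of consumer described here, below is a minimal sketch of a helper that derives a MIME type from a file object's name. The function name guess_content_type and the fallback type are assumptions for illustration; this is not the code requests uses.

```python
import mimetypes
import tempfile


def guess_content_type(fileobj):
    """Guess a MIME type from the file object's name (hypothetical helper)."""
    # Accessing .name directly is the step that fails for a plain BytesIO,
    # which would raise AttributeError here.
    content_type, _encoding = mimetypes.guess_type(fileobj.name)
    return content_type or "application/octet-stream"


# A real temporary file carries a path in .name, so the suffix can be used.
with tempfile.NamedTemporaryFile(suffix=".png") as real_file:
    png_type = guess_content_type(real_file)
    print(png_type)
```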

It seems other people have run into this: Code search results · GitHub

Would <BytesIO> & <StringIO> be acceptable names, as they don’t refer to real files?
The requests library would handle this correctly:

if name and isinstance(name, basestring) and name[0] != "<" and name[-1] != ">":
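
As a rough sketch of how a check like the one quoted would treat such placeholder names, the helper below applies the same condition (with str in place of the Python 2 basestring). The name usable_filename and the sample names are hypothetical.

```python
import io
import os


def usable_filename(obj):
    """Return a base filename from obj.name, or None for missing/placeholder names."""
    name = getattr(obj, "name", None)
    # Skip non-string names and placeholders such as "<stdin>" or "<BytesIO>".
    if name and isinstance(name, str) and name[0] != "<" and name[-1] != ">":
        return os.path.basename(name)
    return None


placeholder = io.BytesIO(b"data")
placeholder.name = "<BytesIO>"  # the proposed default, set by hand here

named = io.BytesIO(b"data")
named.name = "reports/summary.csv"

print(usable_filename(placeholder))   # None
print(usable_filename(named))         # summary.csv
print(usable_filename(io.BytesIO()))  # None
```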

BytesIO and StringIO both have a __dict__ attribute, meaning you can just set the name attribute on their instances:

s = StringIO("foo")
s.name = "bar"

They already have a name attribute like many classes: __name__.

>>> import io
>>> io.StringIO.__name__
'StringIO'
>>> io.BytesIO.__name__
'BytesIO'

Without knowing what the purpose of requests.utils.guess_filename is, nor the cases it must support, adding an extra check for __name__ to its logic would be a simple PR.

No, that is completely pointless and doesn’t at all solve any of the problems.

I fail to see how there are any real problems in the first place.

It seems perfectly sensible to me to change the client code that expects a certain API to be less fussy and accept the API that’s already there, instead of requiring the reference implementation of an entire language to change simply to accommodate the needs of a single function in an external library.

Instances don’t have a name attribute:

>>> import io
>>> file = io.StringIO()
>>> file.__name__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: '_io.StringIO' object has no attribute '__name__'. Did you mean: '__ne__'?

Fair enough. However instances still have a __class__ attribute which does.

>>> import io
>>> file_ = io.StringIO()
>>> file_.__class__.__name__
'StringIO'

Or have I got it backwards, and file_ is required?

I would suggest you first think (or research/ask) a bit about why .name is being accessed before making suggestions about what to do instead.

My suggestion, as it often is, is to do nothing to change the Python language without a good justification.

Hey, I use that in my code too, but I don’t think my type checker would like that: either you can’t pass it to the function, or you can’t access the attribute.

If they used <BytesIO> & <StringIO> by default, this could be resolved.
And the constructor could optionally accept a name.

Have you considered subclassing them to add the name? This seems like something that isn’t really intrinsic to those classes.
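
A minimal sketch of such a subclass, assuming a placeholder default name; NamedBytesIO is a hypothetical class, not something in the standard library:

```python
import io


class NamedBytesIO(io.BytesIO):
    """BytesIO with a name attribute (hypothetical helper, not in the stdlib)."""

    def __init__(self, initial_bytes=b"", name="<BytesIO>"):
        super().__init__(initial_bytes)
        # Stored as a plain instance attribute, like setting it by hand.
        self.name = name


buf = NamedBytesIO(b"hello", name="greeting.txt")
print(buf.name)    # greeting.txt
print(buf.read())  # b'hello'
```

Because the name is declared on the subclass, a type checker should also accept reads of buf.name without complaint.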

No, I didn’t rewrite that code with a type checker, so I didn’t know this. :slight_smile:
But that’s precisely what at least 320 people did: (Code search results · GitHub)

Correction: at least 91 people (Code search results · GitHub)

One point I failed to make: StringIO and BytesIO instances are often used as a replacement for file-like objects. They implement all the other methods, like .read, .seek, and .close, to behave like them. File objects have a .name attribute.

So it’s more about them implementing the same interface as the files they often replace.
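
A small sketch of that interface gap, comparing a real temporary file with a StringIO for a handful of attributes (the list of attributes checked is just an illustrative selection):

```python
import io
import tempfile

checked = ["read", "seek", "close", "name"]

# A real file opened by the OS exposes every attribute in the list.
with tempfile.TemporaryFile("w+") as real_file:
    real_has = {attr: hasattr(real_file, attr) for attr in checked}

# The in-memory replacement has the methods but not the name.
memory_has = {attr: hasattr(io.StringIO(), attr) for attr in checked}

for attr in checked:
    print(f"{attr:6} file={real_has[attr]!s:5} StringIO={memory_has[attr]}")
```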

But they generally don’t have a name, similar to files like sys.stdin or sys.stdout. And there’s no benefit to demanding that the user supplies a name when creating one, as 99% of the time it would be irrelevant.

If you want a name, just add it:

f = StringIO()
f.name = "something.txt"

Yes, type checkers might have a problem with this, but that’s an issue for type checkers to solve, not a reason to make the StringIO API less user-friendly.

Also, you made an abstract request for an “attribute” to be added. What does this mean in practical terms?

  • Should it be possible to add it? → this is already the case
  • Should it be required to set it? → backwards incompatible, pointless most of the time, won’t happen
  • Should it default to something? → To what? And why? Shouldn’t consumers be able to gracefully handle the absence of the attribute? What do we gain from a default for these specific file-like objects (since there are many other file-like objects that still don’t have a name)?
  • Should it be possible to set this in the constructor? Why only name, not mode as well?
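
As a sketch of what handling the absence gracefully could look like on the consumer side, the hypothetical helper below falls back to a caller-supplied default whenever a file-like object has no string name. The helper name and the fallback value are assumptions.

```python
import io


def upload_filename(fileobj, fallback="upload.bin"):
    """Pick a filename for fileobj, using fallback when no string name exists."""
    name = getattr(fileobj, "name", None)
    return name if isinstance(name, str) else fallback


named = io.BytesIO(b"data")
named.name = "data.bin"

# Neither of these has a name: the buffered wrapper just forwards the
# lookup to the underlying BytesIO, which has none.
unnamed = [io.StringIO(), io.BufferedReader(io.BytesIO(b""))]

print(upload_filename(named))                  # data.bin
print([upload_filename(f) for f in unnamed])   # ['upload.bin', 'upload.bin']
```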

This shouldn’t pass type checking; BytesIO does NOT implement IO:

import io
import typing

def foo(file: typing.IO[bytes]) -> None:
    typing.reveal_type(file.name)  # Type of "file.name" is "str | Any"

foo(io.BytesIO())

If we implement this, the stubs don’t need to lie anymore:

class BytesIO(BufferedIOBase, BinaryIO):
    # BytesIO does not contain a "name" field. This workaround is necessary
    # to allow BytesIO sub-classes to add this field, as it is defined
    # as a read-only property on IO[].
    name: Any

Or type checkers could stop lying anyway and raise errors here, and introduce more gradual protocols for file-like objects, for example via optional protocol members suggested in other threads.
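
One way to picture a narrower protocol is the sketch below, where the consumer declares only the method it actually calls. SupportsRead and checksum are hypothetical names for illustration and are not existing typing constructs.

```python
import io
from typing import Protocol


class SupportsRead(Protocol):
    """Hypothetical narrow protocol: only what the consumer actually calls."""

    def read(self, size: int = -1, /) -> bytes: ...


def checksum(source: SupportsRead) -> int:
    """Sum the bytes read from source; never touches .name or .mode."""
    return sum(source.read()) % 256


result = checksum(io.BytesIO(b"\x01\x02\x03"))
print(result)  # 6
```

With an annotation like this, BytesIO can be passed without the stubs having to promise a name it does not have.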

As has been pointed out, this has less to do with io.BytesIO and io.StringIO and more with the fact that typing.IO is not really usable as a generic stand-in. This is, for example, because it promises a .name attribute that might not exist. It happens that the io.*IO classes implement all other methods and attributes (well, I am trusting you there, haven’t checked), but many other file-likes don’t.

Just checked, they implement all other methods and attributes:

from typing import Protocol, runtime_checkable
import io

@runtime_checkable
class IOProtocol(Protocol):
    # @property
    # def mode(self): ...
    # @property
    # def name(self): ...
    def close(self): ...
    @property
    def closed(self): ...
    def fileno(self): ...
    def flush(self): ...
    def isatty(self): ...
    def read(self): ...
    def readable(self): ...
    def readline(self): ...
    def readlines(self): ...
    def seek(self): ...
    def seekable(self): ...
    def tell(self): ...
    def truncate(self): ...
    def writable(self): ...
    def write(self): ...
    def writelines(self): ...
    def __enter__(self): ...
    def __exit__(self): ...

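# Check instances rather than the classes themselves. Note that a
# runtime_checkable isinstance() check only tests that the members exist.
print(isinstance(io.BytesIO(), IOProtocol))  # True
print(isinstance(io.StringIO(), IOProtocol))  # True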