File write ^@ characters

Hi everyone. I don’t know if this is a bug or bad code. I tried googling and some people say ^@ is null from C code. So, easiest solution would find and replace the character but it’s still not working. It only happens some of the time out of many many writes. When I do a grep on the file it says that it’s a binary file which I don’t want. I am running version 3.8.12.

Thanks

^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@2022:03:03:00:40:23:BFNY - hourly run successful

    hrly_rpt_file= "hrly_rpt.log"
    self.hrly_rpt= open(hrly_rpt_file,"a")

    t = time.localtime()
    current_time = time.strftime("%Y:%m:%d:%H:%M:%S", t)
    current_time = current_time.replace('\x00','')
    current_time = current_time.replace('\%x00','')
    message = message.replace('\x00','')
    message = message.replace('\%x00','')

    self.hrly_rpt.write(f"{current_time}:{message} \n")

Hi everyone. I don’t know if this is a bug or bad code. I tried
googling and some people say ^@ is null from C code. So, easiest
solution would find and replace the character but it’s still not
working. It only happens some of the time out of many many writes.
When I do a grep on the file it says that it’s a binary file which I
don’t want. I am running version 3.8.12.

It is not apparent to me where your string:

^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@2022:03:03:00:40:23:BFNY - hourly run successful

came from. The code below doesn’t show an obvious way to make it:

hrly_rpt_file= "hrly_rpt.log"
self.hrly_rpt= open(hrly_rpt_file,"a")

t = time.localtime()
current_time = time.strftime("%Y:%m:%d:%H:%M:%S", t)
current_time = current_time.replace('\x00','')
current_time = current_time.replace('\%x00','')
message = message.replace('\x00','')
message = message.replace('\%x00','')

self.hrly_rpt.write(f"{current_time}:{message} \n")

Certainly I would never expect time.strftime to produce a string
containing NUL characters.

Next, does replace work on the NUL character? Yes:

Python 3.9.10 (main, Jan 15 2022, 11:48:15)
[Clang 12.0.0 (clang-1200.0.32.29)] on darwin
Type "help", "copyright", "credits" or "license" for more 
information.
>>> text='\0\0\0'
>>> text
'\x00\x00\x00'
>>> text.replace('\x00','')
''
>>>

Now, some context.

When I see data like the above in a supposed text file my thought is
normally that someone has done a seek() to some point beyond the end
of the file and written there. On POSIX systems that fills the
intermediate area with NUL bytes.

So we need a little more information about hwere this data came from.

Cheers,
Cameron Simpson cs@cskk.id.au

Hi Cameron,

The function accepts a String and append current time to the front. At first I thought it was time.strftime because it probably calls a C function. I added replace to current_time but that didn’t work so I added replace to the message also.

print_rpt(f"{station.upper()} - hourly run successful")

Thanks

I’d be examining the supplied string then with repr.

This kind of thing, BTW, is why log messages should do some degree of
quoting/escaping/transcription of supplied data to avoid “injection”
issues, normally considered attacks but also leading to plain old data
corruption in the sense that the log becomes invalid. here you have
NULs, but equally your might get a "blah\nvalid-timestamp bogus log info" presupplied string. Or something like that.

Typically log likes commence with the timestamp, and you could write
repr(the_supplied_string) in the message part. You at least wouldn’t
get NULs or embedded newlines or whatever.

But if you’re being presupplied NULs I’d be suspicious of the source
string and how it was obtained.

Cheers,
Cameron Simpson cs@cskk.id.au

After adding repr(message) but still failed. So, I added repr(current_time) and still failed. The repr function seems to work fine and the String is good. Not sure where ^@ comes from.

‘2022:03:07:17:38:47’:‘ALA1 - run successful’
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@‘2022:03:07:17:42:56’:‘ALAN - run successful’
‘2022:03:07:17:45:29’:‘ALAS - run successful’

Two thoughts come to mind. 1: file corruption (say, computer shut down without syncing disk first). 2: seek beyond the end of the file before the write operation. When read, the “hole” between the last write and next-to-last write will read as ASCII NUL characters. YMMV. My background is on Unix-based operating systems (Linux, Solaris, Mac, etc).

Hi Skip,

For sure it’s not #1. There’s no seek only append. Any suggestions on how to debugging this?

Thanks

Some other things to think about (I agree with Cameron that you’ve been kinda skimpy on details and context):

  • If you start with no hrly_rpt.log file on-disk, can you reproduce the problem?
  • Does it require just one run of your application to reproduce the problem? Or multiple runs?
  • What is your platform?
  • Can you reproduce this in a small standalone script (and post it so others can mess around with it)?
  • Is the string with the NUL bytes in your first message the entire contents of the file or is there more before or after that?
  • Is some other process also writing to the file?
  • If not, is it open multiple times in the same process, perhaps in multiple threads?
  • If multi-threaded, are you sharing your Python file object across threads without properly controlling access to it?

While you only open it for append, having two different open file objects might well wreak havoc. Each one would have a C stdio FILE * pointer (or similar, considering Unicode and all) open underneath the Python layer. I’m not sure if stdio will share FILE * pointers or not, but I wouldn’t count on it.

One thing I can ensure you is that stdio isn’t mysteriously writing NUL bytes to your open file. It’s got to be something you are doing. Try printing self.hrly_rpt.tell() before each message written. If your application is multi-threaded or multiple process, also print os.getpid() or threading.current_thread().ident (as appropriate) when you call tell().

While you only open it for append, having two different open file
objects might well wreak havoc.

It is hard to see them writing NULs though. Overwriting each other
(maybe), interspersing message text etc etc, sure.

Each one would have a C stdio FILE * pointer (or similar, considering
Unicode and all) open underneath the Python layer. I’m not sure if
stdio will share FILE * pointers or not, but I wouldn’t count on it.

They’re independent. What a disaster it would be if they weren’t :slight_smile: How
would you do stdio-mediated random access if they magicly hooked up?

That said, ISTR that Python 3 doesn’t use stdio beneath its file objects
any more, but it will be using something equivalent such as
file:///Users/cameron/var/doc/python/3.8.0/library/io.html#io.BufferedIOBase

With stdio, and almost certainly with BufferedIOBase (I have not
checked), a file opened for “append” will have the OS level file
descriptor opened with O_APPEND, which means that data are always
written to the end of the file, regardless of where the tell() pointer
is. That means that (a) files really do always append and (b) multiple
writers do not overwrite each other. But without care, the write buffers
may not be line-of-text aligned, so you can intersperse things and thus
make garbage.

So I do not expect files always opened in append mode to get spurious
NULs. I would suspect something else (eg a file not open in append mode
but faking it badly, for example, with a racy seek(),write() pair).

One thing I can ensure you is that stdio isn’t mysteriously writing NUL
bytes to your open file. It’s got to be something you are doing.

Or something someone else is doing :frowning:

Try printing self.hrly_rpt.tell() before each message written. If
your application is multi-threaded or multiple process, also print
os.getpid() or threading.current_thread().ident (as appropriate)
when you call tell().

Adding debug is always good. I like doing something like this sometimes:

write('['+line+']\n')

which concisely marks the start and end of each line. That would tell
you if the NULs were external, or part of the line data (outside vs
inside the square brackets).

Cheers,
Cameron Simpson cs@cskk.id.au

Thanks for all your help. I believe the problem is resolved by adding flush() after each write(). There’s probably some buffer overflow. I tried to reproduce the error but strangely I can’t reproduce it. I run 6 processes concurrently about 10 times but could not repeat the problem.