Not Able to find 'item' in a text file (Array)

salesma91 · February 26, 2024, 11:26am

I have a list of emoji file names in a text file, and my goal is to ignore these names. I tried the code below, but it isn’t working as it should; it only works when I reduce the list.

The linked text file contains the list of file names that I want to ignore.

Below is the code I am using, but it’s not picking up the names from the text file.

import sys
import re
import os
import numpy as np

# File Name to test
image_name ='1f4c3.png'
#datafromfile holds the list from test file, link for the text file is attached
datafromfile = np.genfromtxt("D:\VS Python\Emoji.txt",dtype="str")
# print(datafromfile)
# for i, x in enumerate(datafromfile):
#   print (x)

# image_name  should return value in the text file and skip
if image_name not in datafromfile:
else:
     print('Ignored successfully')

I have over 1000 emoji image names in a text file. The name exists in the text file, but I am not able to find it.

Text File is here: Upload files for free - Emoji.txt - ufile.io

sr-murthy · February 26, 2024, 11:37am

Your if condition is surely wrong - you are reading the list into a variable named datafromfile, but the if condition does not use this name. Instead it uses something which is syntactically invalid:

if image_name not in data from file:

The not in is a valid construct, but not in <variable1> from <variable2> is invalid in Python. Perhaps you meant:

if image_name not in datafromfile:

salesma91 · February 26, 2024, 11:39am

Yes, Autocorrect made that possible. Can you tell me why it is not able to find the item from the text file?

JamesParrott · February 26, 2024, 11:40am

When hardcoding Windows paths in Python, use a raw string. Otherwise the directory separator is also the escape character, so must be escaped (with itself). I.e.:

r"D:\VS Python\Emoji.txt"

or if you must:

"D:\\VS Python\\Emoji.txt"

It’s especially problematic in paths when the first letter of a directory or file after a separator coincides with a Python escape shortcut, particularly \x or \u, which can fail silently.
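To make the difference concrete, here is a small sketch comparing the spellings. The `D:\x64\new` path is a made-up example (not from the question), included only to show an escape changing a path without raising any error.

```python
from pathlib import PureWindowsPath

# Raw string: backslashes are kept exactly as typed.
raw_path = r"D:\VS Python\Emoji.txt"

# Escaped backslashes: the same string, just noisier to read.
escaped_path = "D:\\VS Python\\Emoji.txt"

print(raw_path == escaped_path)  # True

# Hypothetical path (an assumption for illustration) that changes silently:
# "\x64" is read as the character "d" and "\n" as a newline.
broken = "D:\x64\new"
intended = r"D:\x64\new"
print(repr(broken))
print(repr(intended))
print(broken == intended)  # False

# pathlib also accepts forward slashes for Windows paths.
print(PureWindowsPath("D:/VS Python/Emoji.txt") == PureWindowsPath(raw_path))
```

In the question's own path, \V and \E are not recognised escape sequences, so Python keeps the backslashes (recent versions emit a warning). That would explain why switching to a raw string made no visible difference there.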

sr-murthy · February 26, 2024, 11:41am

Did you consult the Numpy documentation on genfromtxt?

salesma91 · February 26, 2024, 11:44am

I just tried both ways, and it is still not working. Can you try it on your end? The file is attached to my question. It would be an immense help.

I have just started with Python, so I have not gone through the NumPy documentation yet. I will go through it now.

sr-murthy · February 26, 2024, 11:47am

Unfortunately, I don’t have a Windows system. It is best to solve the problem yourself - the method documentation mentions the use of delimiter, which you have not specified. What delimiter is used in the file to separate entries? You must specify that delimiter in the call to genfromtxt, so that it can split entries into different elements in the list.

Also, you mention Autocorrect - that suggests something like a non-code text editor. What do you use to write the code? If you use an IDE, or even an editor like Sublime Text, syntactically invalid constructs should be highlighted automatically.

salesma91 · February 26, 2024, 11:49am

There is no delimiter; the names are stacked vertically, one per line. I could try using commas to separate the list. Let me check.

avisser · February 26, 2024, 12:05pm

If you read the file in as a string, the delimiter is probably the newline character. Try \n
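For a file with one name per line, a plain-Python sketch without numpy could look like the following. Note that this swaps genfromtxt for str.splitlines; the temporary file and its three sample names stand in for Emoji.txt and are assumptions, not the real data.

```python
import tempfile
from pathlib import Path

# Stand-in for Emoji.txt: one file name per line (sample data, assumed).
sample = "1F4C3.png\n1f600.png\n1f44d.png\n"

with tempfile.TemporaryDirectory() as tmp:
    list_path = Path(tmp) / "Emoji.txt"
    list_path.write_text(sample, encoding="utf-8")

    # Read the whole file and split on line breaks; strip stray spaces
    # and drop empty lines so a trailing newline adds no blank entry.
    text = list_path.read_text(encoding="utf-8")
    names = [line.strip() for line in text.splitlines() if line.strip()]

print(names)       # ['1F4C3.png', '1f600.png', '1f44d.png']
print(len(names))  # 3
```

Each line becomes one list element, so the newline is effectively the delimiter.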

salesma91 · February 26, 2024, 12:18pm

I got it: if image_name not in datafromfile: is case sensitive, which is why it was not able to find the file name in the array. Thank you for staying with me; it made me dig a little deeper. So, thank you once again.

kknechtel · February 26, 2024, 2:28pm

Some general hints for debugging - and for making it easier to demonstrate the problem to others - next time:

How exactly did you “reduce the list”?

What exactly does “isn’t working” mean? (The code you show isn’t valid, even after fixing the typo that was already pointed out - the if block can’t be empty like that.)

When you un-comment the debugging code (to see the datafromfile result and the individual entries), do you see what you expect?

What happens if you try including the text as a string in your program, and using numpy.fromstring instead of numpy.genfromtxt?

Can you edit the string into a small example that demonstrates the problem? (Hint: if you use only the first half of the rows, does the problem occur? Only the second half? A few lines from the middle? If you find a smaller section of the data that demonstrates the problem, repeat until it’s minimal.)
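One way to act on these hints is to print the entries with repr() and look for near misses, i.e. entries that equal the target only after ignoring case or surrounding whitespace. This is only a sketch: `near_misses` is a hypothetical helper and the sample list is made up.

```python
def near_misses(target, entries):
    """Return entries that equal target only after ignoring case and
    surrounding whitespace (exact matches are not reported)."""
    key = target.strip().casefold()
    return [e for e in entries
            if e != target and e.strip().casefold() == key]

# Made-up sample; the real list would come from the text file.
entries = ["1f600.png", "1F4C3.png", "1f44d.png ", "1f4c3.PNG"]
target = "1f4c3.png"

print(target in entries)  # False: no exact match
for entry in near_misses(target, entries):
    print(repr(entry))    # repr() exposes differences in case and spacing
```

If this reports anything, the data contains the name in a different spelling, which is a much smaller thing to investigate than the whole file.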

Additional reading:

As for the issue you discovered, that the membership test is case sensitive:

Well, yes; why wouldn’t it be?

But you should also be aware of workarounds for that problem:
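A common workaround is to normalise both sides with str.casefold() before comparing. The sketch below assumes the names have already been read into a list; the sample names and the `is_ignored` helper are made up for illustration.

```python
# Made-up sample standing in for the names read from the text file.
names_from_file = ["1F4C3.png", "1f600.png", "1F44D.PNG"]

# Normalise once up front; a set also keeps each lookup fast.
ignored = {name.strip().casefold() for name in names_from_file}

def is_ignored(image_name):
    """True if image_name is in the ignore list, regardless of case."""
    return image_name.strip().casefold() in ignored

image_name = "1f4c3.png"
if is_ignored(image_name):
    print("Ignored successfully")
else:
    print("Not in the ignore list, processing", image_name)
```

Both branches of the if have a body here, which also addresses the empty-block problem mentioned above. If you would rather keep the numpy array, numpy.char.lower can lowercase every entry in one call before the membership test.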

salesma91 · February 26, 2024, 4:25pm

Thank you, I have learned a lot from this interaction, and all of the points that you made are very interesting to me. I will try the tests that you suggested.

By ‘reduce the list’ I meant that I deleted some items from the text file. After that, it found some items because they were all in lowercase, the same as the item I was trying to find. Some items in the text file were in capital letters, which I didn’t notice, so I wrongly suspected that these arrays have a limit on how many items they can hold.