Removal of Duplicates from List

giddyhead · January 29, 2021, 11:23pm

Hello everyone. I am new to python and I have been working on this for a while. I am trying to create a way to put the MP3 Tags from the songs from the folder into a spreadsheet.

    import mutagen,xlrd, glob,re,openpyxl,os,pygal
    #from mutagen.easyid3 import EasyID3
    from os import walk
    from pprint import pprint 
    from tinytag import TinyTag, TinyTagException
    from openpyxl import Workbook
    from openpyxl.utils import get_column_letter
    from mp3_tagger import MP3File
    from string import ascii_uppercase
    from mutagen.mp3 import MP3
    list = os.listdir('C:\\Users\\mrdrj\\Desktop\\sdf\\') # directory path of files
    number_files = len(list) +1
    from openpyxl.workbook import Workbook

    tracks= []
    gettags =[]
    getit = []

    def ExtractMP3TagtoExcel():
     
        for root, dirs, files, in os.walk ('C:\\Users\\mrdrj\\Desktop\\sdf\\'):
            for name in files:
                if name.endswith(('.mp3','.m4a','.flac','.alac')):

                    tracks.append(name) #Add Media Files

                    try:

                        temp_track = TinyTag.get(root + '\\' + name)
                        mp3 = MP3File(root + '\\' + name)
                        #tags = mp3.get_tags()
                        #print(root, '-',temp_track.artist, '-', temp_track.title)

                        gettags2 = [temp_track.album, temp_track.albumartist, temp_track.artist, temp_track.audio_offset,
                                    temp_track.bitrate, temp_track.comment, temp_track.composer, temp_track.disc,
                                    temp_track.disc_total, temp_track.duration, temp_track.filesize, temp_track.genre,
                                    temp_track.samplerate, temp_track.title, temp_track.track, temp_track.track_total,
                                    temp_track.year] #Add Tags to list
                       
                   
        
                        for x in range(len(gettags2)):
                        #append slice of gettags2, containing the entire gettags2
                            gettags.append(gettags2[:])
                            #print(gettags2[x]) 
                    except TinyTagException:
                        print('Error')

                    
                    os.chdir('C:\\Users\\mrdrj\\Desktop\\sdf\\')
                    header = [u'album',u'albumartist' u'artist', u'audio_offset',u'bitrate', u'comment', u'composer', u'disc',u'disc_total',
                                  u'duration', u'filesize', u'genre',u'samplerate', u'title', u'track', u'track_total',u'year']                             
                    #header2 = {u"album",u"albumartist" u"artist", u"audio_offset",u"bitrate", u"comment", u"composer", u"disc",u"disc_total",
                                  #u"duration", u"filesize", u"genre",u"samplerate", u"title", u"track", u"track_total",u"year"}

                    new_date = gettags
                    wb = Workbook()
                    new_data = gettags
                    dest_filename = '11empty_book11.xlsx'
                    ws1 = wb.active
                    ws1.title = "MP3 Tags"
                    ws2 = wb.create_sheet(title="Set")
                    ws1.append(header[:])
                    
                    tags = []
                      
                    
                    for row in new_data: # Number of Rows
                        #tags.append(new_data[:]) #Add to Tag List
                        tags.append(row)
                        headers = set(tags)
                                            
                        ws1.append(row)

                  
                    print(row)
                    
                    wb.save(filename=dest_filename)

It runs; however, when I try to delete the duplicates in the section below I get TypeError: unhashable type: ‘list’. How do I finish this so it puts the tags in the Excel document without duplicates once and for all? Thanks for your time.

    for row in new_data: # Number of Rows
        #tags.append(new_data[:]) #Add to Tag List
        tags.append(row)
        headers = set(tags)

pylang · January 30, 2021, 12:11am

Sets don’t like list elements.

>>> tags = ["a", ["b"]]
>>> set(tags)
TypeError ...

Try converting the elements of tags into tuples first, then use set(tags).

>>> tags = ["a", ("b",)]
>>> set(tags)
{('b',), 'a'}

In a simple case, you can try a list comprehension to convert them. Hopefully, you don’t have a list of lists of lists of lists …
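
For the simple one-level case, a minimal sketch of that conversion might look like this (the sample values in tags are made up for illustration):

```python
# Hypothetical sample data: a list of rows, each row itself a list.
tags = [["Album A", "Artist X"], ["Album B", "Artist Y"], ["Album A", "Artist X"]]

# Lists are unhashable, so convert each inner list to a tuple first.
hashable_tags = [tuple(row) for row in tags]

# Now set() works and drops the repeated row.
unique_tags = set(hashable_tags)
print(len(tags), len(unique_tags))
```

Note that this only handles one level of nesting; a tuple that itself contains a list is still unhashable.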

cameron · January 30, 2021, 12:23am

Hello everyone. I am new to python and I have been working on this for
a while. I am trying to create a way to put the MP3 Tags from the
songs from the folder into a spreadsheet.

A few random remarks on the way to your real question below…

   list = os.listdir('C:\\Users\\mrdrj\\Desktop\\sdf\\') #

2 things:

Don’t call a variable “list”, that is the name of the built in type
“list”. Pick another name eg “filenames”.

You’ve nicely doubled the backslashes in your Windows file path. You may
find raw strings more convenient:

filenames = os.listdir(r'C:\Users\mrdrj\Desktop\sdf\')

Note the leading r’ which starts a raw string - in such a string the
backslash is not a special character.

       for root, dirs, files, in os.walk('C:\\Users\\mrdrj\\Desktop\\sdf\\'):

Same raw string remark here.

           for name in files:
               if name.endswith(('.mp3','.m4a','.flac','.alac')):

                   tracks.append(name) #Add Media Files

                   try:

                       temp_track = TinyTag.get(root + '\\' + name)
                       mp3 = MP3File(root + '\\' + name)

This is better written:

track_filepath = os.path.join(root, name)
temp_track = TinyTag.get(track_filepath)
mp3 = MP3File(track_filepath)

In particular, os.path.join knows your OS file separator.

                       #tags = mp3.get_tags()
                       #print(root, '-',temp_track.artist, '-', temp_track.title)

                       gettags2 = [temp_track.album, temp_track.albumartist, temp_track.artist, temp_track.audio_offset,
                                   temp_track.bitrate, temp_track.comment, temp_track.composer, temp_track.disc,
                                   temp_track.disc_total, temp_track.duration, temp_track.filesize, temp_track.genre,
                                   temp_track.samplerate, temp_track.title, temp_track.track, temp_track.track_total,
                                   temp_track.year] #Add Tags to list

Is there a reason to not just keep temp_track itself instead of pulling
out a large but arbitrary set of particular fields, whose meanings you
now no longer know except by their position in gettags2?

                       for x in range(len(gettags2)):
                       #append slice of gettags2, containing the entire gettags2
                           gettags.append(gettags2[:])
                           #print(gettags2[x])

If you’re not using x, the common convention is to just call it “_”. Of
course, if you are using it in the print call just ignore me.
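
As an aside, this loop may be where the duplicates come from in the first place: it appears to append one full copy of gettags2 per field, rather than one row per track. A small sketch with a made-up three-field row (instead of your seventeen fields) shows the effect:

```python
# Stand-in for one track's tags (three fields instead of seventeen).
gettags2 = ["Some Album", "Some Artist", "Some Title"]

gettags = []
for _ in range(len(gettags2)):
    # Same shape as the original loop: one whole-row copy per field.
    gettags.append(gettags2[:])

print(len(gettags))  # 3 identical rows for a single track

# Appending the row once gives one row per track.
rows = []
rows.append(gettags2)
print(len(rows))  # 1
```

If that is what is happening, dropping the loop and appending once per track might remove most of the duplicates before a set is even needed.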

                   except TinyTagException:
                       print('Error')

You really want to print the exception here, otherwise you don’t know
what happened. Probably you also want to skip the file entirely, since
you won’t have any tags:

except TinyTagException as e:
    print('File', name, 'Error', e)
    continue

That will show more information and skip to the next loop iteration
immediately.

                   os.chdir('C:\\Users\\mrdrj\\Desktop\\sdf\\')

We tend to avoid using os.chdir(). It changes the global state of your
programme: suddenly other parts of the code which might be opening files
without a full file path are opening them in the wrong place, etc.

It is usually better to just assemble the full path to what you’re
trying to use with os.path.join than to chdir.
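
For example, something along these lines builds the destination path without touching the working directory. The output_dir name is just an illustration; the folder and file name are the ones from your script:

```python
import os

# Hypothetical variable holding the output folder; substitute your own.
output_dir = r'C:\Users\mrdrj\Desktop\sdf'
dest_filename = os.path.join(output_dir, '11empty_book11.xlsx')

# wb.save(filename=dest_filename) would then write there,
# regardless of the current working directory.
print(dest_filename)
```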

                   header = [u'album',u'albumartist' u'artist', u'audio_offset',u'bitrate', u'comment', u'composer', u'disc',u'disc_total',
                                 u'duration', u'filesize', u'genre',u'samplerate', u'title', u'track', u'track_total',u'year']
                   #header2 = {u"album",u"albumartist" u"artist", u"audio_offset",u"bitrate", u"comment", u"composer", u"disc",u"disc_total",
                                 #u"duration", u"filesize", u"genre",u"samplerate", u"title", u"track", u"track_total",u"year"}

You might be able to pull this list of names directly from the temp_track
object rather than hardwiring it here.
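
One way to avoid keeping two hand-written lists in step is to keep a single list of field names and use getattr to build each row from it. This is sketched with a stand-in object rather than a real TinyTag instance; I have not checked what attribute listing tinytag itself offers, so the fields list and fake_track are assumptions for illustration:

```python
from types import SimpleNamespace

# One list of field names serves as both the header and the lookup keys.
fields = ["album", "artist", "title", "year"]

# Stand-in for a tag object; a real one would come from TinyTag.get(...).
fake_track = SimpleNamespace(album="Some Album", artist="Some Artist",
                             title="Some Title", year="2021")

# getattr(obj, name, None) reads the attribute by name, None if missing.
row = [getattr(fake_track, field, None) for field in fields]
print(dict(zip(fields, row)))
```

With this shape the header row is just fields, and each value stays paired with its name.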

                   new_date = gettags
                   wb = Workbook()
                   new_data = gettags
                   dest_filename = '11empty_book11.xlsx'
                   ws1 = wb.active
                   ws1.title = "MP3 Tags"
                   ws2 = wb.create_sheet(title="Set")
                   ws1.append(header[:])

Just “header” will do here: you don’t need to append a copy of the list;
ws1.append is almost certainly copying its contents anyway.

                   tags = []
                   for row in new_data: # Number of Rows
                       #tags.append(new_data[:]) #Add to Tag List
                       tags.append(row)
                       headers = set(tags)
                       ws1.append(row)
                   print(row)

The “print” above seems to be outside the loop. It will only print the
last row.

It runs however when I try to delete the duplicates in the below
section I get TypeError: unhashable type: ‘list’ How do I finish this
so it put the Tags in the excel document without duplicates once and
for all? Thanks for your time.

Now to your error. What is a “hashable” type?

A set is like a dict with no values (or where the values are the keys).
It has the same constraints on the values it can store.

For a dict to find its keys or for a set to check whether a value is
already there, the value must meet 2 criteria:

it must have a working “==” implementation so that things can be
compared

it must be hashable: it must have a “hash function” which ensures that
two “equal” values also have the same “hash value” from their hash
functions

The reason dicts and sets have fast lookup, around O(1), is that they
store things in a “hash table” (see the Wikipedia article “Hash table”).

The table is an array of “buckets” with all the values with the same
“hash value” in the same bucket. When you look something up, it locates
the right bucket from the hash value of the value. Then it just has to
compare against the things in that single bucket, not against all the
values in the hash table.

This is why equality needs to imply the same “hash value”: if the
value is already in the table, it must appear in the bucket we decide
to look at.

To make this fast, the number of buckets is usually about the same size
as the number of values, so that there are very few values in any given
bucket.
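
A toy version of that idea, purely for illustration (real dicts and sets differ a lot in detail, and the ToySet name is made up):

```python
class ToySet:
    """A deliberately simple bucketed hash table."""

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, value):
        # The hash value picks the bucket; this is why hash() must work.
        return self.buckets[hash(value) % len(self.buckets)]

    def add(self, value):
        bucket = self._bucket(value)
        # Only this one bucket is searched, using ==.
        if value not in bucket:
            bucket.append(value)

    def __contains__(self, value):
        return value in self._bucket(value)


s = ToySet()
s.add(("red", "blue"))
s.add(("red", "blue"))       # equal value, same bucket, not stored twice
print(("red", "blue") in s)  # True
print(sum(len(b) for b in s.buckets))  # 1
```

Trying s.add(["red", "blue"]) fails inside _bucket at the hash() call, which is the same failure the built-in set reports.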

So, this means that what you’re storing must have some kind of “hash
function” provided which returns the hash value. That needs to be stable
and unchanging.

An object with such a function is called “hashable”, which is what your
error is complaining about: lists are not hashable because they can be
modified, which means that they may no longer be the same, equalitywise.
So they are unsuitable for use in a set.
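
You can see this directly with the built-in hash function; a quick sketch:

```python
# Equal tuples have equal hash values, so they can live in a set.
a = ("red", "blue")
b = ("red", "blue")
print(a == b, hash(a) == hash(b))  # True True

# A list has no usable hash, which is the error you are seeing.
try:
    hash(["red", "blue"])
except TypeError as e:
    message = str(e)
    print(message)
```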

Back to your code:

                   tags = []
                   for row in new_data: # Number of Rows
                       #tags.append(new_data[:]) #Add to Tag List
                       tags.append(row)
                       headers = set(tags)
                       ws1.append(row)
                   print(row)

“tags” is a list. “new_data” is “gettags”, which is a list of lists.
That means that “row” is a list. By appending the row to “tags”, “tags”
is a list of lists:

tags = [ ["red", "blue"], ["green", "yellow"] ]

or whatever. Lists are not hashable.

What are you actually trying to make unique? Is it the contents of each
list (convert [“red”, “green”, “green”] to [“red”, “green”])?
Or is it the lists overall (no two lists with the same things)?

For the former, maybe you want:

row = set(row)

For the latter, you need to convert your lists into an immutable type,
such as a tuple:

tags.append(tuple(row))

Now tags is a list of tuples, and tuples are hashable (they cannot be
changed, so you can put them in a hash table and expect to find them
again later).

However, I think you’re confused about what you’re trying to uniqueify.
Why is “headers” obtainable from “tags”?

I’d be printing some of these things out to see if they contain what you
think they should contain.

Also, remember that a set is unordered. When you iterate over it, you
don’t know what order the elements will be handed out.
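
If the row order matters for the spreadsheet, one common approach is to keep a set of the rows already seen and only write the new ones. A sketch, with made-up rows standing in for new_data:

```python
# Stand-in for new_data: a list of lists with one repeated row.
new_data = [["Album A", "Title 1"], ["Album B", "Title 2"], ["Album A", "Title 1"]]

seen = set()
unique_rows = []
for row in new_data:
    key = tuple(row)         # hashable version of the row
    if key in seen:
        continue             # already written, skip the duplicate
    seen.add(key)
    unique_rows.append(row)  # ws1.append(row) would go here

print(unique_rows)
```

The set is only used for the membership test, so the rows come out in the order they were first seen.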

Cheers,
Cameron Simpson cs@cskk.id.au

steven.daprano · January 30, 2021, 12:55am

Hi Jason,

A few comments on your code:

list = os.listdir('C:\\Users\\mrdrj\\Desktop\\sdf\\') # directory path of files

You can write that path with forward slashes and Windows will still know
what you mean:

os.listdir('C:/Users/mrdrj/Desktop/sdf/')  # directory path of files

More importantly, you have assigned that list of files to the name
“list”, which is the same name used for the built-in list type. This is
called “shadowing”, and it will prevent you from using the built-in list
name:

>>> list('abc')
['a', 'b', 'c']
>>> list = [2, 4, 8]
>>> list('abc')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'list' object is not callable

Shadowing can be harmless, and occasionally even useful, but especially
for beginners it can lead to confusing errors. So be careful with that.

You also had this line:

number_files = len(list) +1

This seems odd. You have a list of file names, but you are saying that
there is one more file than the number of file names.

You also don’t seem to use that number anywhere in your code.

You have:

except TinyTagException:
    print('Error')

Trust the voice of experience here: print('Error') is a horrible,
horrible thing to do to your users, even if your only user is yourself.

Have you ever been frustrated by a program that won’t let you do
something, and it just says “An error occurred” and you cannot work out
what sort of error or how to fix it? And so you curse the idiot
programmer who wrote that thing?

Congratulations, you have just joined the club of annoying programmers
who stop users from working out what went wrong with their data.

The time will come that you are trying to process a file, and something
goes wrong, and you will be wracking your brain trying to work out
what went wrong so you can fix it, and your program is mocking your
pain by just saying “Error”.

You can fix this:

except TinyTagException as err:
    print(err)

which will print whatever information TinyTag gives you. Which hopefully
will be something better than just “Sorry, an error occurred”.

You have:

header = [u'album',u'albumartist' u'artist', ...]

Firstly, if you are using Python 3, there is no need to use the
u-prefix. All regular strings ‘album’, ‘albumartist’, etc are Unicode
strings.

Secondly, you have a missing comma there. A feature of Python is that
adjacent string literals are automatically concatenated, so you can do
things like this:

s = 'string with " double quotes ' "and ' single quotes."

without needing any escapes. But that means that your missing comma in
the list gives you:

'albumartist' 'artist'

which becomes ‘albumartistartist’. Oops. So watch out for missing commas.
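
A quick way to catch this sort of thing is to check the length of the list. A small sketch using the first few names from your header:

```python
# Missing comma: the two adjacent literals are joined into one string.
broken = ['album', 'albumartist' 'artist', 'audio_offset']
fixed = ['album', 'albumartist', 'artist', 'audio_offset']

print(len(broken), broken[1])  # 3 albumartistartist
print(len(fixed))              # 4
```

In your case the header would come out one column short of the 17 values in each row, so every column after the joined name is labelled one place off.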

steven.daprano · January 30, 2021, 1:05am

Cameron suggested using raw strings for paths:

filenames = os.listdir(r'C:\Users\mrdrj\Desktop\sdf\')

Alas, raw strings are actually only semi-raw. For reasons that nobody
has been able to explain to me (or if they have, I have forgotten) you
cannot end a raw string with a backslash because it escapes the quote.

>>> r'path\'
  File "<stdin>", line 1
    r'path\'
            ^
SyntaxError: EOL while scanning string literal


>>> r'path\'ab'
"path\\'ab"

Weird, huh? I don’t get it either.

You can use raw strings for paths so long as you don’t end the path with
a backslash.
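
If you do need the trailing separator, there are a few ways around it. A sketch, using the path from the original post:

```python
base = r'C:\Users\mrdrj\Desktop\sdf'              # raw string, no trailing backslash

with_sep_1 = base + '\\'                          # add an escaped backslash separately
with_sep_2 = r'C:\Users\mrdrj\Desktop\sdf' '\\'   # adjacent literals are joined
forward = 'C:/Users/mrdrj/Desktop/sdf/'           # forward slashes need no escaping

print(with_sep_1 == with_sep_2)   # True
print(with_sep_1.endswith('\\'))  # True
```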

Cameron also made a good point about using os.chdir to change the
current working directory. There are some parts of Python that will
misbehave or fail if you do that, although I cannot remember off the top
of my head what they are. I think os.walk might be one?

cameron · January 30, 2021, 4:06am

Cameron suggested using raw strings for paths:

filenames = os.listdir(r'C:\Users\mrdrj\Desktop\sdf\')

Alas, raw strings are actually only semi-raw. For reasons that nobody
has been able to explain to me (or if they have, I have forgotten) you
cannot end a raw string with a backslash because it escapes the quote.

>>> r'path\'
  File "<stdin>", line 1
    r'path\'
            ^
SyntaxError: EOL while scanning string literal

Ah yes. Something was niggling in the back of my head when I wrote
that.

Cameron also made a good point about using os.chdir to change the
current working directory. There are some parts of Python that will
misbehave or fail if you do that, although I cannot remember off the top
of my head what they are. I think os.walk might be one?

The second “Note:” here:

https://docs.python.org/3/library/os.html#os.walk

says not to chdir during an os.walk from a relative path, because
os.walk uses the path as is i.e. relative and if the current working
directory changes everything will break.

But the same issue applies generally - if your programme uses relative
paths to access things anywhere, that presumes that the current
directory is relevant. If some other part of the code changes that then
the presumption is then invalid.

Because os.chdir is process-wide, fiddling with it is generally to be
avoided or only done with full knowledge of the entire programme’s
workings, typically once at the beginning.
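
A small sketch of the failure mode, using a temporary directory so it is safe to run anywhere (the notes.txt name is made up):

```python
import os
import tempfile

relative = "notes.txt"  # hypothetical relative file name

start_dir = os.getcwd()
before = os.path.abspath(relative)   # resolved against the starting directory

with tempfile.TemporaryDirectory() as tmp:
    os.chdir(tmp)
    try:
        after = os.path.abspath(relative)  # same name, different file now
    finally:
        os.chdir(start_dir)                # always restore the old directory

print(before != after)  # True
```

The same relative name points at two different files depending on when it is resolved, which is why resolving to an absolute path up front tends to be safer.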

Cheers,
Cameron Simpson cs@cskk.id.au

giddyhead · January 30, 2021, 5:53am

Thank you all for your insight, updates and information. I wanted to let you know that I have updated the script as follows:

    import mutagen, xlrd, glob, re, openpyxl, os, pygal
    # from mutagen.easyid3 import EasyID3
    from os import walk
    from pprint import pprint
    from tinytag import TinyTag, TinyTagException
    from openpyxl import Workbook
    from openpyxl.utils import get_column_letter
    from mp3_tagger import MP3File
    from string import ascii_uppercase
    from mutagen.mp3 import MP3
    from openpyxl.workbook import Workbook

    tracks = []
    gettags = []

    def ExtractMP3TagtoExcel():
        for root, dirs, files, in os.walk(r'C:\Users\mrdrj\Desktop\sdf'):
            for name in files:
                if name.endswith(('.mp3', '.m4a', '.flac', '.alac')):
                    tracks.append(name)  # Add Media Files
            try:
                track_filepath = os.path.join(root, name)
                temp_track = TinyTag.get(track_filepath)
                mp3 = MP3File(track_filepath)
                except TinyTagException as err:
                print(err)
                continue

                 gettags2 =[temp_track.album, temp_track.albumartist, temp_track.artist, temp_track.audio_offset,
                    temp_track.bitrate, temp_track.comment, temp_track.composer, temp_track.disc,
                    temp_track.disc_total, temp_track.duration, temp_track.filesize, temp_track.genre,
                    temp_track.samplerate, temp_track.title, temp_track.track, temp_track.track_total,
                    temp_track.year]  # Add Tags to list

                for x in range(len(gettags2)):
                # append slice of gettags2, containing the entire gettags2
                    gettags.append(gettags2[:])


            #os.path.join(root, name)
                header = ['album', 'albumartist', 'artist', 'audio_offset', 'bitrate', 'comment', 'composer', 'disc',
              'disc_total', 'duration', 'filesize', 'genre', 'samplerate', 'title', 'track', 'track_total', 'year']

                 wb = Workbook()
                new_data = gettags
                dest_filename = '11empty_book11.xlsx'
                ws1 = wb.active
                ws1.title = "MP3 Tags"
                ws2 = wb.create_sheet(title="Set")
                ws1.append(header[:])

                tags = []
                for row in new_data:  # Number of Rows
                 # tags.append(new_data[:]) #Add to Tag List
                    row = set(row)
                    ws1.append(row)
                    tags.append(tuple(row))
                print(row)

                wb.save(filename=dest_filename)

    ExtractMP3TagtoExcel()

When it is run I get two errors: one at the except TinyTagException clause and one at the call of the function on the last line. How can this be smoothed over so it can pass the info to the spreadsheets? Thanks

IndentationError: unexpected unindent

from the last line, which calls ExtractMP3TagtoExcel(), and I got a SyntaxError: invalid syntax at

SyntaxError: invalid syntax
except TinyTagException as err:
print(err)
continue

cameron · January 30, 2021, 7:52am

Thank you all for your insight, updates and information. I wanted
to let you know that I have updated the script as follows:

[…]

    def ExtractMP3TagtoExcel():
        for root, dirs, files, in os.walk(r'C:\Users\mrdrj\Desktop\sdf'):
            for name in files:
                if name.endswith(('.mp3', '.m4a', '.flac', '.alac')):
                    tracks.append(name)  # Add Media Files
            try:
                track_filepath = os.path.join(root, name)

[…]

                for x in range(len(gettags2)):
                # append slice of gettags2, containing the entire gettags2
                    gettags.append(gettags2[:])

[…]

                 wb = Workbook()
                new_data = gettags

The wb = is not at the same indentation as the “for” above and the
“new_data” below.

                wb.save(filename=dest_filename)

    ExtractMP3TagtoExcel()

I presume you intend to end the function definition and then call the
function.

You have not closed the “try:” clause. A “try” requires at least an
“except” or a “finally” clause.

When it is run I get two errors: one at the except TinyTagException clause and one at the call of the function on the last line. How can this be smoothed over so it can pass the info to the spreadsheets? Thanks

IndentationError: unexpected unindent

The “wb =” above may be the issue. And also the call to
ExtractMP3TagtoExcel, since Python thinks the function is not finished
because the “try” is still open.

from the last line, which calls ExtractMP3TagtoExcel(), and I got a SyntaxError: invalid syntax at

SyntaxError: invalid syntax
except TinyTagException as err:
print(err)
continue

Here, the “except” has no “try”. Also the print and continue should be
indented under the “except”. But move these up to the code which would
raise the TinyTagException.
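
Roughly, the shape you are after is something like this. It is only a structural sketch: read_tags and TagError are made-up stand-ins for TinyTag.get and TinyTagException so that it runs without any audio files.

```python
class TagError(Exception):
    """Stand-in for TinyTagException."""


def read_tags(name):
    """Stand-in for TinyTag.get(); fails on one made-up file name."""
    if name == "broken.mp3":
        raise TagError("cannot read " + name)
    return {"title": name}


rows = []
for name in ["a.mp3", "cover.jpg", "broken.mp3", "b.flac"]:
    if not name.endswith(('.mp3', '.m4a', '.flac', '.alac')):
        continue                      # not an audio file, skip it
    try:
        tags = read_tags(name)        # only the risky call goes in the try
    except TagError as err:           # except lines up with its try
        print('File', name, 'Error', err)
        continue                      # skip to the next file
    rows.append(tags["title"])        # runs once per successfully read file

print(rows)  # ['a.mp3', 'b.flac']
```

The points to copy are the indentation (try, except and the code after them all sit inside the loop over names) and that the per-file work happens only for names that passed the extension check.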

Cheers,
Cameron Simpson cs@cskk.id.au

giddyhead · February 1, 2021, 12:36am

Ahh. Thanks for the info. I used PyCharm to update the script just to be sure. When I ran the script it gave me an error: No tag reader found to support filetype! To make it easier to ask for help with the script, I have put it at the following link: https://github.com/giddyhead/MP3TagstoExcel.git. I probably should have included more in the original code to make things easier, and I apologize for that, but that is why I am not sure what is missing from the script for it to work its way through. Thanks