Help with Python Script and Discogs API

jaroatsea · August 27, 2024, 1:59am

I used Chat GPT to write a script to automate a task I really don’t want to do.

The task
I have a ton of mp3 files that I have labeled with the correct artist and title, however I don’t have the year values for the metadata. That’s where Discogs comes in, except I don’t want to be sat at my desk for hours on end only to accomplish 200 file updates, manually searching for the years. So I signed up for free on the Discogs website and got a token for their API. Chat GPT wrote the script based on what I told it and for the most part it works.

The problem
To test it out before going full-scale, I put a bunch of files that are songs from the 50s into a folder and executed the script. Of the 5 files I tested, one came back with the wrong year. I know the correct year I need is on Discogs, but the API won’t give it to me or the script can’t find it? idk
Sometimes when trying to troubleshoot the script I won’t even get the year values for any of the tracks.

What I’m looking for
I want to know how to get the script to pick the right year OR how to get the API to give me the correct information.

AlSweigart · August 27, 2024, 3:30am

Could you post your code (without the API key) as well as the returned info from the api request?

jaroatsea · August 27, 2024, 6:52pm

So this is the code. It was done in such a way where the API key is kept in another file and it reads that file for the key.

import os
import requests
from mutagen.easyid3 import EasyID3
from mutagen.mp3 import MP3
from config import DISCOGS_TOKEN

# Function to get the first release year from Discogs
def get_first_release_year(artist, title):
    search_url = "https://api.discogs.com/database/search"
    params = {
        "artist": artist,
        "title": title,
        "type": "release",
        "token": DISCOGS_TOKEN,
    }

    try:
        response = requests.get(search_url, params=params)
        response.raise_for_status()
        results = response.json().get("results", [])

        if results:
            # Extract the earliest year from all results
            years = [result["year"] for result in results if "year" in result]
            if years:
                return min(years)

    except requests.exceptions.RequestException as e:
        print(f"Error querying Discogs: {e}")

    return None

# Function to update the year in the audio file metadata
def update_audio_file_year(file_path, year):
    try:
        audio = MP3(file_path, ID3=EasyID3)
        audio["date"] = str(year)
        audio.save()
        print(f"Updated year to {year} for file: {file_path}")
    except Exception as e:
        print(f"Failed to update file {file_path}: {e}")

# Main function to process all audio files in the folder
def main():
    folder = os.path.dirname(os.path.realpath(__file__))

    for filename in os.listdir(folder):
        if filename.endswith(".mp3"):
            file_path = os.path.join(folder, filename)
            try:
                audio = MP3(file_path, ID3=EasyID3)
                artist = audio.get("artist", [None])[0]
                title = audio.get("title", [None])[0]

                if artist and title:
                    print(f"Processing: {artist} - {title}")
                    year = get_first_release_year(artist, title)
                    if year:
                        update_audio_file_year(file_path, year)
                    else:
                        print(f"No release year found for {artist} - {title}")
                else:
                    print(f"Skipping {filename}: missing artist or title metadata")

            except Exception as e:
                print(f"Error processing {filename}: {e}")

if __name__ == "__main__":
    main()

My understanding is that a query will provide a JSON and the script parses that for the information requested.

I have before, with the help of Chat GPT, made the information returned in the JSON readable to a regular person like me. I can tell you that in that package is basically information about all the releases of a song. So let’s say there’s a record from the 50s you’re inquiring about, it would give you things like what country it was released in and the year it was released and what labels were involved etc. So some songs could have multiple years and I have tried getting the script to pick the earliest year but it never does.
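
A minimal sketch of the "pick the earliest year" step, assuming each search result is a dict whose "year" may be a string, a number, or missing altogether. The function name earliest_year and the sample data are made up for illustration; this is not the script's actual code. Converting every year to an integer and skipping the unusable ones keeps a missing value or a later reissue from being returned just because it came first in the list.

```python
def earliest_year(results):
    """Return the smallest usable release year in `results`, or None."""
    years = []
    for result in results:
        raw = result.get("year")
        try:
            year = int(raw)
        except (TypeError, ValueError):
            continue  # year is missing, None, or not a number
        if year > 0:  # assumption: 0 means "unknown", so skip it
            years.append(year)
    return min(years) if years else None


# Made-up sample data in the shape described above.
sample = [
    {"title": "Reissue", "year": "1978"},
    {"title": "No year listed"},
    {"title": "Compilation", "year": "2003"},
    {"title": "Original single", "year": "1955"},
    {"title": "Unknown", "year": None},
]
print(earliest_year(sample))  # 1955
```

If no result carries a usable year, the function returns None, so the caller can report "no year found" instead of writing a wrong value to the file.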

funkyfuture · August 27, 2024, 7:02pm

there are actually crafted solutions for that problem, e.g. http://beets.io/
this one’s even implemented in Python and studying the code is more fruitful than asking a stochastic model.

regarding the metadata selection, mind that the earliest release of what matches artist and release title isn’t necessarily what the files are representing. the latter may originate from a reissue that has extended contents or changed track order.

jaroatsea · August 28, 2024, 8:20pm

Thank you. I am not a coder, not even by the slightest definition. I wouldn’t know where to start. Just needed something to help me with organization. I will look over the link you sent. Thank you

fungi · August 28, 2024, 8:35pm

Just to chime in, I’ve used Beets myself for many years, it’s a
great application for organizing a large digital music collection
and can query and update a wide variety of metadata automatically
(not just from Discogs but also other sources). Definitely a
testament to the great sorts of software solutions people have
created using Python!

jaroatsea · September 1, 2024, 5:59pm

Ok so I checked out beets and it seems like a great tool, but not what I’m looking for. I’m just trying to automate the process of manually searching Discogs for myself for the songs in my DJ Library.

In any case, could anyone help me with this script? I must reiterate I know nothing about coding, Python etc. I just know that I asked ChatGPT and I got something back. It is frustrating me though.

import requests

# Function to query the Discogs API for release years based on artist and song title
def query_discogs_release_years(token, artist, title):
    base_url = "https://api.discogs.com/database/search"

    # Set up the query parameters
    params = {
        "artist": artist,
        "track": title,
        "token": token,
        "type": "release",
        "per_page": 100,
        "page": 1,
    }

    while True:
        # Make the request to the Discogs API
        response = requests.get(base_url, params=params)
        
        if response.status_code != 200:
            print(f"Error fetching data from Discogs API: {response.status_code}")
            return

        results = response.json().get("results", [])
        if not results:
            break

        # Print each result's year and title, but only if the year is not None
        for result in results:
            year = result.get("year")
            title = result.get("title")
            if year:  # Check if year is not None
                print(f"Year: {year}, Title: {title}")

        # Prepare for the next page of results
        params["page"] += 1

# Main function to prompt the user and run the query
def main():
    # Prompt for the artist and title
    artist = input("Enter the artist name: ")
    title = input("Enter the song title: ")

    # Query the Discogs API
    query_discogs_release_years('DISCOGS_TOKEN', artist, title)

if __name__ == "__main__":
    main()

So GPT uses a hashtag line to add notes to the code I guess to show me what each bit does, I think that’s why it might look clunky here. Anyway, this code returns data however the last line says ‘Error fetching data from Discogs API: 404’. I’m not sure why and I imagine that might be an issue getting the script to run.

For testing the script I am using 1 song. I manually searched it on Discogs and confirmed the release year (the very first) to be 1955. The song is ‘Love Is A Many-Splendored Thing’, and the artist is ‘The Four Aces’. Previously, I’d have an issue where I’d get wrong info back or nothing at all. With the above code I get a massive list of release years for the song in question and I have been able to at least table out the data returned and I can tell the year I am looking for (the earliest of all of them) is in the list.

Initially when I started to try and make this script, asking GPT to make the script return the earliest year gave me different results. Sometimes it would say ‘None’ or it would give me a year from the 2000s or 1978. I figured out based on how the data is returned from the API, if either of those values was first in the list (None, 2000+ or 1978) it would give me that specifically as the “earliest” year.

I then got GPT to write a different script that allows me to paste the data dump I got from the other script and sort it in ascending year order, and it does that. However, when I ask GPT to combine both scripts so that it gets the info and finds the earliest year, it gives me nothing back, which is frustrating.

Does anyone have any ideas? Can anyone help?
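
A possible explanation for the 404 at the end of the output, offered as an assumption rather than something verified against this exact query: the loop only stops when a page comes back empty, so after the last real page it asks for one page too many, and Discogs appears to answer an out-of-range page with a 404 instead of an empty list. Discogs responses include a "pagination" object that reports the total number of pages, so the loop can stop there. The sketch below uses a hypothetical fetch_page(page) callable and made-up data so that it runs without a token.

```python
def collect_all_results(fetch_page):
    """Gather results from every page, stopping at the page count the API reports.

    `fetch_page(page_number)` is a hypothetical callable that returns the
    parsed JSON dict for one page of search results.
    """
    all_results = []
    page = 1
    while True:
        data = fetch_page(page)
        all_results.extend(data.get("results", []))
        total_pages = data.get("pagination", {}).get("pages", 1)
        if page >= total_pages:
            break  # last page reached; do not request a page that does not exist
        page += 1
    return all_results


# Offline stand-in for the API (made-up data) that records which pages were requested.
fake_pages = {
    1: {"pagination": {"page": 1, "pages": 2},
        "results": [{"year": "1978"}, {"year": "2003"}]},
    2: {"pagination": {"page": 2, "pages": 2},
        "results": [{"year": "1955"}]},
}
requested = []


def fake_fetch(page):
    requested.append(page)
    return fake_pages[page]


results = collect_all_results(fake_fetch)
print(len(results), requested)  # 3 [1, 2]
```

With the real API, fetch_page would wrap the same requests.get(base_url, params=params) call as the script above, with params["page"] set to the requested page, and return response.json().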

funkyfuture · September 4, 2024, 1:46pm

sorry, but following your elaboration i can only suggest that you accept that you haven’t understood the problem you’re trying to solve (you state you want to tag single files that represent a song while you’re searching a database that catalogs released media with audio contents), that you have an inadequate idea of what Large Language Models are capable of and that you’re best advised to either make use of the tools that a variety of people with a diverse range of data and problems at hand have developed and reasoned about in many hours or that you try to gain enough coding competence to come to that conclusion yourself.

jaroatsea · September 5, 2024, 12:35am

Thanks for your input, friend. What I’m asking isn’t that hard. If I had the time to learn coding I would. I’m just trying to make something that takes an extra thing off my plate so that I have more time to do the 100 other things that need doing.

I know it’s possible to do, I just don’t know how to do it. I can manually search and update the files and THAT is what I’m trying to automate with the use of a script. Being acutely aware that GPT isn’t the perfect mechanism to achieve this is why I came here, hoping that someone would look at what I’ve done so far and help me get it to where it’s usable (1% error acceptable) so that I can move on with my life and the other things I want to do.

But I get what you’re saying. And it’s cool. Maybe I should give up and whatever. It’s not that serious anyway.

Thank you everyone that chimed in. With all the information received maybe I’ll figure out a different path.

sinoroc · September 5, 2024, 4:51pm

I can also recommend beets, I have used it and it is really helpful.

But…

If you want something that might be easier for a one-off operation, I can also strongly recommend MusicBrainz Picard. It has a great GUI. I have used it as well, and really enjoyed it to fill in the blanks in a music library with incomplete metadata tags. Really great tool all around. If I recall correctly, with Picard, you can drag-and-drop a whole folder of music files and it will automatically give you a bunch of suggestions to complete the metadata tags, in a way that is very visual and very easy to work with in a sequential manner, one file after the other with minimal amount of clicks.

Aside regarding AI, feel free to disregard:

The thing is that I do not understand the code that AIs write. I can understand the code that normal people write. I am used to mistakes normal people make, so they are easy to spot for me. Code that AIs write and the errors that they make are just too unpredictable, completely random to me. Mistakes can be anywhere, hidden behind the beautifully formatted code and the seemingly perfect abstraction layers.
Also when someone comes with the code that they wrote themselves, they usually are able to tell where we should look, because they know where the potential weak points are, since they did write the code themselves.

Personally I have no interest in reading code written by an AI, it’s just absolutely uninteresting to me. Why would I spend time reading and debugging code written by an AI? What sense would that make? I absolutely don’t get why.

This discrepancy between appearance and actual quality of the code generated by AIs that I mentioned above, is extremely off-putting for me. Maybe it is the uncanny valley.

jaroatsea · September 6, 2024, 2:44am

Thank you for your feedback. I will definitely check it out. And I get what you mean about AI writing code. I am a DJ, so I’m always making playlists. I have in the past asked AI for specific playlists with set rules and what I get back is usually far different from what I was expecting. I only embarked on this particular journey as I thought it might’ve been easy enough to get done. In fact all the functions I want I am able to get the AI to write for me, but when combining the codes, it all seems to go haywire. I will look over Music Brainz Picard and see if it does what I need.

Thanks again!

funkyfuture · September 6, 2024, 12:39pm

as a former disk spinner i’m wondering: what would be left to do for you if that had succeeded, given that playing digital media has already left you with nothing more to do than curating a sequence of music.

(i use vinyl records and Arch Linux, btw.)

jaroatsea · September 7, 2024, 2:35am

I’d have no issue tackling 1000 files per day for a month if I were only playing music. Here on the ship I am a DJ and a host and many other things. So my day is mostly spent doing the other things and then I DJ for a few hours in the night time. Has been that way for the past almost a decade now. Over that time I’ve filled my downloads folder with scores of songs per day, or few hundreds every week, and when I change laptops, the downloads folder gets dumped onto an external drive and never looked at again. This year I decided I wanted to clean up and organize my library, of the 40,000+ songs/files I have, I don’t really know exactly what I have or don’t have. Additionally I have a ton of stuff I’ve never played in my 16+ years of my career. So when I started organizing this year I decided to properly label the files, tags and filenames, add cover art where I can and purge the stuff I’ve never played and never will. We’re talking all the genres Pop and its sub-genres, Rock and its sub-genres, Hip-hop, Rap, R&B, Dance, House, EDM and its sub-genres, Reggae, Dancehall, Soca, Latin and its sub-genres, plus all the old shit like Rock & Roll, Soul, Motown, Disco, R&B, plus the British world of tunes and bangers. I’m probably over-explaining at this point, please don’t take me for an asshole, I’m just frustrated with trying to solve my issue.

I think it’s cool you still use vinyl. On the ship there’s too much vibration sometimes to use vinyl although I deeply prefer the digital stuff. I learned to play with CDJs. Spent a year spinning Serato timecode vinyl exclusively, before moving into controllers. I have a DDJ-FLX10 currently and I love it! My setup is self-contained since I travel my gear. I also use wireless IEMs for monitoring.

What’s your actual setup like and what is Arch Linux?

sinoroc · September 7, 2024, 8:28am

I am not saying it is going to be a breeze, but there are a bunch of tools that can help with that.

Tools like MusicBrainz Picard can work based on a “fingerprint” of the audio to match the file with an online database of audio fingerprints. If that does not work, then the fallback is to try matching based on the existing metadata tags or name of the file. It is of course a bit easier when your library is already somewhat clean: file names, metadata tags, files grouped per album/release. All these things can help the tool figure things out more reliably, but thanks to the audio fingerprint matching it is not strictly necessary.

Beets can do roughly the same things (including matches based on audio fingerprint), but is command-line based.

And of course there are a bunch of other such tools, Mp3tag for example. There has to be one that matches your needs closely enough.

I really recommend you try to spend a couple of hours with those tools and a small but representative sample of your library (copy a hundred of files into an “experimental” directory) and see if there is one tool you like better than the others.

The “bottleneck” with the Discogs API, is that as far as I know it is not possible to do a search with a specific sorting order. So you get one page of search results, you can sort this page by oldest release year, but if you then retrieve the next page of search results you might get even older releases. You could first retrieve all the pages, then sort all these pages as one page, but I do not know how practical that would be, that would for sure be more complex.
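
A rough sketch of the "retrieve all the pages, then sort them as one page" idea, assuming the pages have already been downloaded and each one is a list of result dicts with optional "year" and "title" keys. The function name sort_by_year and the sample data are invented for illustration.

```python
def sort_by_year(pages):
    """Flatten several result pages into (year, title) pairs, oldest first."""
    merged = []
    for page in pages:
        for result in page:
            year = str(result.get("year") or "")
            # Keep only results with a usable, non-zero numeric year.
            if year.isdigit() and int(year) > 0:
                merged.append((int(year), result.get("title") or ""))
    return sorted(merged)


# Made-up pages: the oldest release only shows up on the second page.
pages = [
    [{"year": "1978", "title": "B"}, {"year": "2003", "title": "C"}],
    [{"title": "no year"}, {"year": "1955", "title": "A"}],
]
ordered = sort_by_year(pages)
print(ordered[0])  # (1955, 'A')
```

Sorting only after every page has been merged is what guards against an older release hiding on a later page. The cost is one request per page, which may be slow for songs with many releases.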

jaroatsea · September 7, 2024, 5:39pm

Thank you! Will give those a look over. I had started looking at MusicBrainz Picard since it was mentioned and so far it has impressed me. Might end up being what I use.

sinoroc · September 8, 2024, 9:17pm

Side note, it is open source and written in Python: GitHub - metabrainz/picard: MusicBrainz Picard audio file tagger

sindreruud · January 17, 2025, 12:43pm

Someone has made a script that might do exactly what you want, and some more. It can also fetch genre and covers.

The script in question is available in three variants that have formed over the years, at least to my knowledge.

First one, with a youtube-video: https://www.youtube.com/watch?v=PCpSwu2exLs

The second one, which is a fork of the first one, and seems to be recently maintained: GitHub - bolinocroustibat/discogs-tag-updater: Updates genre, year and image of .mp3 and .flac files based on title and artist using the Discogs database.

And the third one, which comes with a GUI. It is however deprecated, so you would have to look into if it’s still usable. I can only post two links as a new user, so find it yourself on github as this: Marekkon5/discogstagger

I would also strongly encourage you to check out the discogs.py file from the second one, to give you an idea of what goes into making what you were after.

Also, I would like to also advocate for using Beets as a metadata manager. It would handle all your metadata needs, fetching it directly from Discogs without the need for any manual input. You can also have it organize the music in folders for you.

jaroatsea · January 17, 2025, 5:11pm

Whoa! Thank you man! Lemme check em out and report back.