New to Python and JSON. I think I'm very close to completion, but I keep getting an error with my Keras Tokenizer.

This is the error:

```
myenv\lib\site-packages\keras\preprocessing\text.py", line 536, in get_config
    json_word_counts = json.dumps(self.word_counts)
AttributeError: 'dict' object has no attribute 'word_counts'
```
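In case it's relevant, this is how I understand the tokenizer is supposed to round-trip through JSON (a minimal sketch on my part, assuming the TF 2.x `tf.keras` API and the same `tokenizer.json` file name I use below):

```python
# Minimal sketch of the expected Tokenizer save/load round trip (TF 2.x assumed)
import tensorflow as tf

tok = tf.keras.preprocessing.text.Tokenizer()
tok.fit_on_texts(["twinkle twinkle little star"])

# to_json() returns a JSON *string*; tokenizer_from_json() expects that string back
with open("tokenizer.json", "w") as f:
    f.write(tok.to_json())

with open("tokenizer.json") as f:
    tok2 = tf.keras.preprocessing.text.tokenizer_from_json(f.read())

print(tok2.word_counts)  # should survive the round trip
```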

Here is the code:

```python
import librosa
import numpy as np
import nltk
import tensorflow as tf
import time

from flask import Flask, jsonify, request
from flask_cors import CORS
from midiutil import MIDIFile

app = Flask(__name__)
CORS(app)

# Load the saved tokenizer and model
tokenizer = tf.keras.preprocessing.text.tokenizer_from_json(open('tokenizer.json').read())
model = tf.keras.models.load_model('model.h5')
max_sequence_length = model.input.shape[1]

# Helper function to preprocess text data
def preprocess_text(text):
    # Tokenize text
    tokens = nltk.word_tokenize(text)
    # Remove punctuation and convert to lower case
    filtered_tokens = [token.lower() for token in tokens if token.isalnum()]
    # Join filtered tokens back into a string
    text = ' '.join(filtered_tokens)
    return text

# Helper function to predict the next word
def predict_next_word(text, model):
    # Preprocess text
    preprocessed_text = preprocess_text(text)
    # Convert preprocessed text to a sequence of integers
    sequence = tokenizer.texts_to_sequences([preprocessed_text])[0]
    # Pad sequence with zeros
    sequence_padded = tf.keras.preprocessing.sequence.pad_sequences(
        [sequence], maxlen=max_sequence_length, padding='pre')
    # Predict probabilities for the next word
    predictions = model.predict(sequence_padded)[0]
    # Get the index of the most probable next word
    next_word_index = np.argmax(predictions)
    # Get the actual next word corresponding to the predicted index
    next_word = tokenizer.index_word[next_word_index]
    return next_word

# Initialize MIDI file and track
midi_file = MIDIFile(1)
midi_file.addTrackName(0, 0, "Lyrics")
midi_file.addTempo(0, 0, 120)

# Define Flask route for predicting the next word
@app.route('/predict', methods=['POST'])
def predict():
    # Get input text from request
    text = request.json['text']
    # Predict the next word
    next_word = predict_next_word(text, model)
    # Return the predicted next word as a response
    response = {'next_word': next_word}
    return jsonify(response)

# Define Flask route for generating and playing the MIDI file
@app.route('/generate_midi', methods=['POST'])
def generate_midi():
    # Get input text from request
    text = request.json['text']
    # Preprocess text
    preprocessed_text = preprocess_text(text)
    # Split preprocessed text into words
    words = preprocessed_text.split()
    # Generate MIDI notes for each word
    for i, word in enumerate(words):
        # Predict the next word
        next_word = predict_next_word(' '.join(words[:i + 1]), model)
        # Generate MIDI note for the current word and predicted next word
        midi_note = librosa.note_to_midi(word)
        midi_next_note = librosa.note_to_midi(next_word)
        # Add MIDI note to the track (midiutil's addNote takes volume, not velocity)
        midi_file.addNote(0, 0, midi_note, i, 1, volume=100)
        # Add a pause between notes to simulate timing of singing
        time.sleep(0.5)
    # Write the MIDI file to disk
    with open('lyrics.mid', 'wb') as f:
        midi_file.writeFile(f)
    # Play the MIDI file using the virtual engine
    return jsonify({'message': 'MIDI file generated successfully!'})
```
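For what it's worth, this is how I've been testing the /predict route locally (a hypothetical example: the port is Flask's default, and the lyric and response are made up):

```python
# Hypothetical local test of the /predict route (assumes the app is running on port 5000)
import requests

resp = requests.post("http://127.0.0.1:5000/predict",
                     json={"text": "twinkle twinkle little"})
print(resp.json())  # e.g. {"next_word": "star"}
```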

```python
import os

root_dir = os.path.dirname(os.path.abspath(__file__))
print(root_dir)
```

```python
#!/usr/bin/env python
# coding: utf-8

import subprocess

def install_transformers():
    subprocess.call(['pip', 'install', 'transformers'])

from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
```
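If it helps, my understanding is that the loaded pair gets used roughly like this (a minimal sketch assuming the PyTorch backend; the same pattern should apply to the GPT-2 and RoBERTa checkpoints below):

```python
# Minimal usage sketch: encode a sentence and inspect BERT's hidden states (PyTorch assumed)
import torch

inputs = tokenizer("Hello, how are you doing today?", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, sequence_length, hidden_size)
```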

```python
import sys
import subprocess

def install(package):
    subprocess.call([sys.executable, "-m", "pip", "install", package])

# example usage
if __name__ == '__main__':
    install('transformers')
    install('pytorch_lightning')
    install('sentencepiece')
    install('scikit-learn')
    install('pandas')
    install('matplotlib')
    install('seaborn')

import torch
import torchvision
```

```python
from transformers import GPT2Tokenizer, GPT2Model

# Load pre-trained model tokenizer (vocabulary)
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')

# Load pre-trained model (weights)
model = GPT2Model.from_pretrained('gpt2')
```

```python
from transformers import RobertaTokenizer, RobertaModel

# Load pre-trained model tokenizer (vocabulary)
tokenizer = RobertaTokenizer.from_pretrained('roberta-base')

# Load pre-trained model (weights)
model = RobertaModel.from_pretrained('roberta-base')
```

```python
import sys
import subprocess

def install_packages():
    packages = ['keras', 'torch', 'torchvision', 'magenta', 'darknet']
    for package in packages:
        subprocess.call([sys.executable, "-m", "pip", "install", package])

install_packages()
```

```
%pip install -r requirements.txt
```
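For reference, based on the imports in this file, I'd expect the requirements.txt referenced above to contain something like this (package names only; I haven't pinned versions):

```
flask
flask-cors
librosa
midiutil
nltk
numpy
tensorflow
transformers
torch
torchvision
pygame
```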


```
%load C:\Users\Andrew Embury\Desktop\combined code for realtime lyrics.py
```

(the loaded file is the same combined code already shown at the top of this post)


```python
import transformers
import torch
import torchvision
import librosa
import numpy

from app import app

if __name__ == '__main__':
    app.run()
```

```python
import tensorflow as tf
import numpy as np

def main():
    model = tf.keras.models.load_model('model.h5')
    tokenizer = tf.keras.preprocessing.text.tokenizer_from_json(open('tokenizer.json').read())

    while True:
        text = input('Enter some text: ')
        if text.lower() == 'exit':
            break

        sequence = tokenizer.texts_to_sequences([text])
        sequence = tf.keras.preprocessing.sequence.pad_sequences(sequence, maxlen=100)
        prediction = model.predict(sequence)
        prediction = np.squeeze(prediction)

        if prediction > 0.5:
            print('Positive')
        else:
            print('Negative')

if __name__ == '__main__':
    main()
```

```python
import tensorflow as tf
import numpy as np

def my_function():
    model = tf.keras.models.load_model('model.h5')
    tokenizer = tf.keras.preprocessing.text.tokenizer_from_json(open('tokenizer.json').read())

    while True:
        text = input('Enter some text: ')
        if text.lower() == 'exit':
            break

        sequence = tokenizer.texts_to_sequences([text])
        sequence = tf.keras.preprocessing.sequence.pad_sequences(sequence, maxlen=100)
        prediction = model.predict(sequence)
        prediction = np.squeeze(prediction)

        if prediction > 0.5:
            print('Positive')
        else:
            print('Negative')

if __name__ == '__main__':
    my_function()
```

```python
import pygame

# Initialize Pygame mixer
pygame.mixer.init()

# Load audio file
audio_file = "example_audio.wav"
pygame.mixer.music.load(audio_file)

# Play audio file
pygame.mixer.music.play()

# Wait for audio to finish playing
while pygame.mixer.music.get_busy():
    continue

# Clean up
pygame.mixer.quit()
```
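Presumably the same `pygame.mixer.music` calls could play the generated `lyrics.mid` as well, since pygame can load MIDI files on builds where the underlying SDL_mixer has MIDI support; that would cover the "play the MIDI file" step that is only a comment in the Flask route above.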

```python
from transformers import AutoTokenizer

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
```

```python
from transformers import AutoTokenizer

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
```

```python
from transformers import GPT2Tokenizer

# Load the tokenizer
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
```

```python
from transformers import Wav2Vec2Tokenizer

# Load the tokenizer
tokenizer = Wav2Vec2Tokenizer.from_pretrained("facebook/wav2vec2-base-960h")

# Tokenize some text
text = "Hello, how are you doing today?"
tokens = tokenizer(text, return_offsets_mapping=True, padding=True, truncation=True)

# Output tokenized text as JSON
import json

with open("output.json", "w") as f:
    # BatchEncoding is a dict-like wrapper; convert to a plain dict so json can serialize it
    json.dump(dict(tokens), f)
```

Cheers!

GPT2 is not a suitable tool for writing code. Start by actually writing your own code, and then maybe we can help. Also, you haven’t posted any of the code, only the error, so we can’t help you at all.

Don’t use a chat AI to write software. It’s a waste of time.

Thank you Chris, that all makes sense. I am happy to share the code, how should I do so?

Post code that you wrote, not that an AI wrote for you, inside triple backticks.

How much of this came from GPT and how much came from you? I don’t want to waste my time reviewing code that you have no control over.

It was all generated from GPT; it is not my intention to waste your time. I am just naïve in this department and thought that GPT did a good enough job. I worked in Visual Studio for about a week fixing all of the errors that were displayed. The only error that persists is that the Keras tokenizer does not have a dictionary config for `word_counts`, and I thought that would be an easy fix for someone in this community. Sorry to ask for help before paying my coding dues, brother; I was just excited to have my app idea come to life and didn't realize that an LLM would not suffice as an adequate coder.

Unfortunately, GPT is very good at tricking people into thinking that it knows what it’s talking about. It is, however, completely oblivious to truth and correctness (it even gets basic arithmetic wrong!), so it’s not suitable for software generation.

Which means that you’ll probably do best to start over. Sorry.

Fair enough. Thank you for your time. Do you know of a way that I can at least check the validity of the code before scrapping it all and learning the right way? I'm happy to scrap it and learn Python, just curious mostly. I think I gave myself false hope that it was doing a good job: when VS found errors and I worked to fix them, it kinda led me to believe it was real code lol.

There's one obvious way to check: run it and see :) You've already done that, and found that, no, it's not valid.

While you MIGHT be able to salvage some parts of it, you would get just as much benefit by copying and pasting code from tutorials on the internet. Obviously those could also be faulty, but at least they come from people who have an understanding of whether code works or not, and a well-respected tutorial will have been tested by multiple people.

More reasonable use of this code, though, is as a sort of checklist of “stuff that I’m gonna need”. Turn that into a set of bullet points, then turn those bullet points into comments, and use that as your skeleton.
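For instance (purely a hypothetical skeleton; the bullets are my guesses at what your app needs, not working code):

```python
# Hypothetical skeleton distilled from the GPT output above.
# Each comment is a checklist item to replace with code you write and test yourself.

# 1. Load a trained next-word model and its tokenizer.

# 2. Preprocess incoming lyric text (tokenize, lowercase, strip punctuation).

# 3. Predict the next word for a given prompt.

# 4. Map words to MIDI notes and append them to a track.

# 5. Expose the above through Flask routes (/predict, /generate_midi).

def main():
    ...  # wire the pieces together once each part works on its own

if __name__ == "__main__":
    main()
```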

You're a king, I appreciate all of your feedback and will use it accordingly. I won't take up any more of your time. Thank you again. Cheers!


I’d be happy to help out with code review on the next iteration!
