How to summarize multiple HTML pages with Python AI?

I have Python 3.11 on Windows 11.

I would like to make a list of HTML articles on the web and be able to ask an AI questions about them. This is for personal use. I’ve been searching for guides on how to do this, but every one I’ve found so far only covers summarizing a single page.

How should I go about summarizing many pages, about 20 or so, so the AI can answer my questions about them?

Thanks.

I’m only a beginner, but my first thought is:

  • If you know how to do this for one file, then your problem comes down to merging all the HTML articles into a single file and then summarizing that one file the same way you would a single page.

a) Perhaps you could merge the files with a bash command? (On Windows 11 you would need a bash shell such as Git Bash or WSL for this.)

cat page1.txt page2.txt > merged.txt

b) Or use an online guide on how to merge files with Python? E.g. “Python Program to merge two files into a third file” on GeeksforGeeks.
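
For option b), here is a minimal sketch of what merging could look like in plain Python, using only the standard library. The function name merge_files, the blank-line separator, and the UTF-8 encoding are my own assumptions for illustration, not something taken from the linked guide.

```python
from pathlib import Path


def merge_files(input_paths, output_path, separator="\n\n"):
    """Concatenate several text files into one, in the order given.

    Hypothetical helper: the name and the separator are illustrative choices.
    """
    parts = []
    for path in input_paths:
        # Assumes the files are UTF-8; change the encoding if yours differ.
        parts.append(Path(path).read_text(encoding="utf-8"))
    # A blank line between files keeps the articles visually separate.
    Path(output_path).write_text(separator.join(parts), encoding="utf-8")
```

Calling merge_files(["page1.txt", "page2.txt"], "merged.txt") should give roughly the same result as the cat command above, and the list can hold 20 file names just as easily as two.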

Sorry I couldn’t be more specific, I’m only learning myself. But this just popped out at me when you said you kinda knew how to do it with a single page/file.

I just came back around to this project. Do I have to change the HTML to plain text first by removing all HTML tags?

Hi mate,
My assumption here is that:
a) You could simply tell your AI tool to ignore the HTML tags.
b) Or you could use a Python script to strip the HTML tags, another bash command, or whatever find/replace method you prefer.
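
For the Python-script route in b), something like the sketch below might be a starting point. It uses html.parser from the standard library, so nothing needs installing. The names TextExtractor and html_to_text, and the choice of which tags count as "block" tags, are assumptions I made for the example. It is not a full HTML-to-text converter, and real-world pages may need more care (menus, ads, footers and so on).

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text from HTML, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}
    # Assumed set of tags after which a line break is inserted.
    BLOCK = {"p", "div", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "br":
            self.chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            self.chunks.append("\n")

    def handle_data(self, data):
        # Keep text only when we are not inside a script or style element.
        if self._skip_depth == 0:
            self.chunks.append(data)


def html_to_text(html):
    """Return the text of an HTML string with tags removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.chunks)
    # Collapse runs of whitespace within each line and drop empty lines.
    lines = [" ".join(line.split()) for line in text.splitlines()]
    return "\n".join(line for line in lines if line)
```

You would read each saved page into a string, pass it to html_to_text, and write the result to a .txt file before merging. A plain find/replace on "<...>" would also remove the tags, but it would leave script and style code in the text, which is why this sketch skips those elements explicitly.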