Newbie Here looking for code assistance

KenFred · July 8, 2023, 3:06pm

Hello, I am new to python and working on a new project. I am looking to code a program that will be pointed at a website (think of something like youtube) and will download the videos including the descriptions and tags to my server. I would think something like a scrapper would work, but not sure the best way to go about it. Since this is the first step in my project I want to make sure it is correct and efficient.

I have the following code but would like your options on what is missing:

'import os
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

def download_videos_from_website(url, output_dir):
# Send a GET request to the webpage
response = requests.get(url)
if response.status_code == 200:
# Parse the HTML content
soup = BeautifulSoup(response.content, ‘html.parser’)

    # Find all video tags or any other relevant tags that contain video URLs
    video_tags = soup.find_all('video')
    # Alternatively, you can search for <a> tags with video file extensions (.mp4, .avi, etc.)
    # video_tags = soup.find_all('a', href=lambda href: href.endswith(('.mp4', '.avi')))

    # Create the output directory if it doesn't exist
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)

    # Iterate over the video tags and download each video
    for video_tag in video_tags:
        # Get the video URL
        video_url = urljoin(url, video_tag['src'])

        # Generate a unique filename for the video
        filename = os.path.basename(video_url)

        # Download the video file
        response = requests.get(video_url)
        if response.status_code == 200:
            # Save the video file to the output directory
            filepath = os.path.join(output_dir, filename)
            with open(filepath, 'wb') as f:
                f.write(response.content)
            print(f"Downloaded: {filename}")
        else:
            print(f"Error: {response.status_code} - Failed to download {filename}")

else:
    print(f"Error: {response.status_code} - Failed to retrieve webpage")

Example usage

website_url = ‘https://www.example.com’
output_directory = ‘/path/to/save/videos’
download_videos_from_website(website_url, output_directory)

barry-scott · July 8, 2023, 9:20pm

You will find that this will not work as you are not running the JavaScript that is used to create the page you see on sites like youtube.

You could use selenium to load the url in a browser and get info about page via the selenium API.

What you are likely to find is that each site uses a unique way to provide rge video and you will need per site solutions.

fungi · July 8, 2023, 9:58pm

Also, unless this is purely a learning exercise for personal
enrichment, you’re likely far better off just using yt-dlp
(coincidentally written in Python already), as it has special casing
for the various video publishing sites, even some fairly obscure
ones, and gets regularly updated to cope with the constant changes
those sites impose in order to try to thwart direct download
clients: