Hello, I'm new to Python and working on a new project. I want to write a program that can be pointed at a website (think something like YouTube) and download its videos, along with their descriptions and tags, to my server. I assume something like a scraper would work, but I'm not sure of the best approach. Since this is the first step in my project, I want to make sure it is correct and efficient.
I have the following code, but I'd like your opinion on what is missing:
import os
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

def download_videos_from_website(url, output_dir):
    # Send a GET request to the webpage
    response = requests.get(url)
    if response.status_code == 200:
        # Parse the HTML content
        soup = BeautifulSoup(response.content, 'html.parser')
        # Find all <video> tags (or any other relevant tags that contain video URLs)
        video_tags = soup.find_all('video')
        # Alternatively, search for <a> tags whose href ends in a video extension:
        # video_tags = soup.find_all('a', href=lambda href: href and href.endswith(('.mp4', '.avi')))
        # Create the output directory if it doesn't exist
        os.makedirs(output_dir, exist_ok=True)
        # Iterate over the video tags and download each video
        for video_tag in video_tags:
            # Skip tags that have no src attribute
            if not video_tag.get('src'):
                continue
            # Resolve the (possibly relative) video URL against the page URL
            video_url = urljoin(url, video_tag['src'])
            # Use the last path segment as the filename
            filename = os.path.basename(video_url)
            # Stream the download so large files aren't held in memory all at once
            response = requests.get(video_url, stream=True)
            if response.status_code == 200:
                # Save the video file to the output directory
                filepath = os.path.join(output_dir, filename)
                with open(filepath, 'wb') as f:
                    for chunk in response.iter_content(chunk_size=8192):
                        f.write(chunk)
                print(f"Downloaded: {filename}")
            else:
                print(f"Error: {response.status_code} - Failed to download {video_url}")
    else:
        print(f"Error: {response.status_code} - Failed to retrieve webpage")
# Example usage
website_url = 'https://www.example.com'
output_directory = '/path/to/save/videos'
download_videos_from_website(website_url, output_directory)
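For the descriptions and tags, here is a rough sketch of what I had in mind, assuming the page exposes them in standard `<meta name="description">` and `<meta name="keywords">` tags. Real video sites vary a lot in how they embed metadata, so these selectors are placeholders to adapt, not something I've verified against any particular site:

```python
from bs4 import BeautifulSoup

def extract_metadata(html):
    """Pull a description and a list of tags out of a page's <meta> tags."""
    soup = BeautifulSoup(html, 'html.parser')
    # Assumed locations: <meta name="description"> and <meta name="keywords">
    description_tag = soup.find('meta', attrs={'name': 'description'})
    keywords_tag = soup.find('meta', attrs={'name': 'keywords'})
    description = description_tag['content'] if description_tag else ''
    # Split a comma-separated keywords string into individual tags
    tags = ([t.strip() for t in keywords_tag['content'].split(',')]
            if keywords_tag else [])
    return {'description': description, 'tags': tags}

sample = ('<head><meta name="description" content="A demo clip">'
          '<meta name="keywords" content="demo, clip"></head>')
print(extract_metadata(sample))
# -> {'description': 'A demo clip', 'tags': ['demo', 'clip']}
```

Does it make sense to save this kind of metadata alongside each video file (e.g. as a JSON sidecar), or is there a better convention?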