How to upload a compressed file (.gz) to the swift object storage using Python swift client?

I wrote the Python code to upload the .gz file from my local machine to the OpenStack object store using the following documentation: https://docs.openstack.org/python-swiftc...t-api.html.

The file gets uploaded successfully. However, if I download the file from the object store and try to decompress it, I get the following error:

gzip -d sanbox_nb01_netbox_2024-07-20.psql.gz

gzip: sanbox_nb01_netbox_2024-07-20.psql.gz: not in gzip format

What should I do to ensure the file gets uploaded in the same format and size to the object storage as the file on my local machine?

Below is the code I wrote to upload the file to the Swift object storage

from keystoneauth1 import session
from keystoneauth1.identity import v3
from swiftclient.client import Connection, logger
from swiftclient.client import ClientException
import gzip

# Create a password auth plugin
auth = v3.Password(
    auth_url='https://cloud.company.com:5000/v3/',
    username='myaccount',
    password='mypassword',
    user_domain_name='Default',
    project_name='myproject',
    project_domain_name='Default'
)

# Create session
keystone_session = session.Session(auth=auth)

# Create swiftclient Connection
swift_conn = Connection(session=keystone_session)

# Create a new container
container = 'object-backups'
swift_conn.put_container(container)
res_headers, containers = swift_conn.get_account()
if container in [c['name'] for c in containers]:
    print("The container " + container + " was created!")

# Create a new object with the contents of Netbox database backup
with gzip.open('/var/backup/netbox_backups/netbox_2024-03-16.psql.gz', 'rb') as f:
    # Read the contents...
    file_gz_content = f.read()

    # Upload the returned contents to the Swift Object Storage container
    swift_conn.put_object(
        container,
        "object_netbox_2024-06-16.psql.gz",
        contents=file_gz_content,
        content_type='application/gzip'
    )

# Confirm the presence of the object holding the Netbox database backup
obj1 = 'object_netbox_2024-06-16.psql.gz'
container = 'object-backups'
try:
    resp_headers = swift_conn.head_object(container, obj1)
    print("The object " + obj1 + " was successfully created")
except ClientException as e:
    if e.http_status == 404:
        print("The object " + obj1 + " was not found!")
    else:
        print("An error occurred checking for the existence of the object " + obj1)
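As an aside, one way to catch exactly the kind of size/content mismatch described above is to compare checksums: for a plain (non-segmented) object, Swift's ETag header is the MD5 of the stored bytes. A minimal sketch, where file_md5 is a hypothetical helper and not part of swiftclient:

```python
import hashlib

# Hypothetical helper: MD5 hex digest of a local file, read in chunks
# so large backups don't need to fit in memory.
def file_md5(path, chunk_size=1024 * 1024):
    digest = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()

# For a plain (non-SLO/DLO) object, the ETag returned by head_object()
# should equal the local file's MD5, e.g.:
#     resp_headers = swift_conn.head_object(container, obj1)
#     assert resp_headers['etag'] == file_md5(local_path)
```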

Below is the code I wrote to download the compressed file (.gz) from the object storage

import gzip
import shutil
import tarfile

# Create a password auth plugin
auth = v3.Password(
    auth_url='https://cloud.company.com:5000/v3/',
    username='myaccount',
    password='mypassword',
    user_domain_name='Default',
    project_name='myproject',
    project_domain_name='Default'
)

# Create session
keystone_session = session.Session(auth=auth)

# Create swiftclient Connection
swift_conn = Connection(session=keystone_session)

# Create a new container
container = 'netbox-backups'
swift_conn.put_container(container)
res_headers, containers = swift_conn.get_account()
if container in [c['name'] for c in containers]:
    print("The container " + container + " was created!")

# Download the created object from the Object Storage
obj = 'sanbox_nb01_netbox_2024-07-20.psql.gz'
container = 'netbox-backups'
resp_headers, obj_contents = swift_conn.get_object(container, obj)
with open('sanbox_netbox_2024-07-20.psql.gz', 'wb') as local:
    local.write(obj_contents)

Did you examine the downloaded file to see what you got back?
Maybe there are headers or something else you need to remove?

Maybe test by uploading a text file and see if you can download that without changes?

I tried to examine the compressed files using the file command:

The compressed file from my local machine before being uploaded to the Object Storage

[root@scs-sandbox-nb01 netbox_backups]# file netbox_2024-07-24.psql.gz 
netbox_2024-07-24.psql.gz: gzip compressed data, last modified: Wed Jul 24 09:40:09 2024, from Unix, original size 1916349

[root@scs-sandbox-nb01 netbox_backups]# ls -lha netbox_2024-07-24.psql.gz 
-rw-r--r--. 1 root root 404K Jul 24 09:40 netbox_2024-07-24.psql.gz


[root@scs-sandbox-nb01 netbox_backups]# zcat netbox_2024-07-24.psql.gz
...
--
-- Name: wireless_wirelesslink wireless_wirelesslink_tenant_id_4c0638ee_fk_tenancy_tenant_id; Type: FK CONSTRAINT; Schema: public; Owner: netbox
--

ALTER TABLE ONLY public.wireless_wirelesslink
    ADD CONSTRAINT wireless_wirelesslink_tenant_id_4c0638ee_fk_tenancy_tenant_id FOREIGN KEY (tenant_id) REFERENCES public.tenancy_tenant(id) DEFERRABLE INITIALLY DEFERRED;


--
-- PostgreSQL database dump complete
--

The compressed file that was downloaded from the object storage using the Swift command

[root@scs-sandbox-nb01 scripts]# file sanbox_nb01_netbox_2024-07-24.psql.gz
sanbox_nb01_netbox_2024-07-24.psql.gz: UTF-8 Unicode text, with very long lines

[root@scs-sandbox-nb01 scripts]# ls -lah sanbox_nb01_netbox_2024-07-24.psql.gz
-rw-r--r--. 1 root root 1.9M Jul 24 12:12 sanbox_nb01_netbox_2024-07-24.psql.gz

[root@scs-sandbox-nb01 scripts]# zcat sanbox_nb01_netbox_2024-07-24.psql.gz

gzip: sanbox_nb01_netbox_2024-07-24.psql.gz: not in gzip format
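As a quick sanity check from Python rather than the file command: a gzip stream always starts with the two magic bytes 0x1f 0x8b, so you can tell compressed content from plain text with a two-byte comparison (looks_like_gzip is a made-up helper for this sketch):

```python
import gzip

GZIP_MAGIC = b'\x1f\x8b'

def looks_like_gzip(data: bytes) -> bool:
    """Return True if the byte string starts with the gzip magic number."""
    return data[:2] == GZIP_MAGIC

compressed = gzip.compress(b'-- PostgreSQL database dump\n')
print(looks_like_gzip(compressed))                        # True
print(looks_like_gzip(b'-- PostgreSQL database dump\n'))  # False
```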

I wrote a small Python script to upload a text file and it worked successfully. I was able to open the file after downloading it.

Below is the code I wrote to upload the text file

from keystoneauth1 import session
from keystoneauth1.identity import v3
from swiftclient.client import Connection, logger
from swiftclient.client import ClientException

# Create a password auth plugin
auth = v3.Password(auth_url='https://cloud.company.com:5000/v3/',
                   username='myaccount',
                   password='mypassword',
                   user_domain_name='Default',
                   project_name='myproject',
                   project_domain_name='Default')

# Create session
keystone_session = session.Session(auth=auth)

# Create swiftclient Connection
swift_conn = Connection(session=keystone_session)

# Create a new container
container = 'netbox-backups'
swift_conn.put_container(container)
res_headers, containers = swift_conn.get_account()
if container in [c['name'] for c in containers]:
    print("The container " + container + " was created!")
# Create a new object with the contents of a local text file
with open('/var/backup/netbox_backups/swift.txt', 'rb') as f:
    file_data = f.read()

    swift_conn.put_object(
        container,
        'swift.txt',
        contents=file_data,
        content_type='text/plain'
    )

# Confirm the presence of the object holding the Netbox database backup
obj1 = 'swift.txt'
container = 'netbox-backups'
try:
    resp_headers = swift_conn.head_object(container, obj1)
    print("The object " + obj1 + " was successfully created")
except ClientException as e:
    if e.http_status == 404:
        print("The object " + obj1 + " was not found!")
    else:
        print("An error occurred checking for the existence of the object " + obj1)

However, I am not really sure what the reason for the issue with the compressed file could be. I’m still a newbie in Python and your assistance will be appreciated.

Ok. You got text back.
Use less or more to look at the start of the file.
What do you see?

If I use less or more to look at the start of the file, I can see its contents.

[root@scs-sandbox-nb01 scripts]# less sanbox_nb01_netbox_2024-07-24.psql.gz

--
-- PostgreSQL database dump
--

-- Dumped from database version 15.6
-- Dumped by pg_dump version 15.6

SET statement_timeout = 0;
SET lock_timeout = 0;
SET idle_in_transaction_session_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SELECT pg_catalog.set_config('search_path', '', false);
SET check_function_bodies = false;
SET xmloption = content;
SET client_min_messages = warning;
SET row_security = off;

SET default_tablespace = '';

SET default_table_access_method = heap;

--
-- Name: auth_group; Type: TABLE; Schema: public; Owner: netbox
--

CREATE TABLE public.auth_group (
    id integer NOT NULL,
    name character varying(150) NOT NULL
);

...

I wrote a bash script that creates compressed database backup files daily, and it works perfectly. The Python script then transfers those compressed backup files to the Object storage, which is used as a backup storage system. I want to ensure that the compressed file created locally has the same format and size as the file forwarded to the Object storage, and I am not sure how to fix this. Your assistance will be much appreciated.

Thank you.

I have no special knowledge to share, only experience debugging.

Is that text what is inside your .gz file?
I wonder if the content was automatically uncompressed?

When you upload the file, you’ve got

# Create a new object with the contents of Netbox database backup
with gzip.open('/var/backup/netbox_backups/netbox_2024-03-16.psql.gz', 'rb') as f:
    # Read the contents...
    file_gz_content = f.read()

The gzip.open will give you uncompressed content. To upload the compressed content, you should just need to switch it to

with open('/var/backup/netbox_backups/netbox_2024-03-16.psql.gz', 'rb') as f:
    # Read the contents...
    file_gz_content = f.read()

(or even skip loading the whole file into memory and let swiftclient stream it; put_object accepts a file-like object, so

with open('/var/backup/netbox_backups/netbox_2024-03-16.psql.gz', 'rb') as f:
    swift_conn.put_object(
        container,
        "object_netbox_2024-06-16.psql.gz",
        contents=f,
        content_type='application/gzip'
    )

should work)
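To see the difference between the two reads concretely, here is a small self-contained sketch (using gzip.compress and a temporary file in place of a real backup): gzip.open yields the decompressed payload, while plain open in binary mode yields the compressed bytes with the gzip magic number intact.

```python
import gzip
import os
import tempfile

payload = b'-- PostgreSQL database dump\n' * 100

# Write a small .gz file, standing in for a real backup file
fd, path = tempfile.mkstemp(suffix='.psql.gz')
with os.fdopen(fd, 'wb') as f:
    f.write(gzip.compress(payload))

# gzip.open transparently decompresses: you get the original SQL text back
with gzip.open(path, 'rb') as f:
    decompressed = f.read()

# plain open returns the raw compressed bytes, still starting with \x1f\x8b
with open(path, 'rb') as f:
    raw = f.read()

print(decompressed == payload)   # True
print(raw[:2] == b'\x1f\x8b')    # True
print(len(raw) < len(payload))   # True: raw is the compressed form
os.unlink(path)
```

Uploading `decompressed` is what produced the "not in gzip format" error; uploading `raw` preserves the file byte-for-byte.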


Thank you so much, Tim. I refactored the code as you suggested and it worked perfectly for files that are compressed in the .gz format. I tried to do the same for files that are compressed in the .tar.bz2, but unfortunately, it didn’t create a .tar.bz2 file in the object storage.

Below is the code I wrote before and tested for the transfer of .tar.bz2 files to the object storage

from keystoneauth1 import session
from keystoneauth1.identity import v3
from swiftclient.client import Connection
from swiftclient.client import ClientException

# Create a password auth plugin
auth = v3.Password(auth_url='https://cloud.company.com:5000/v3/',
                   username='myaccount',
                   password='mypassword',
                   user_domain_name='Default',
                   project_name='myproject',
                   project_domain_name='Default')

# Create session
keystone_session = session.Session(auth=auth)

# Create swiftclient Connection
swift_conn = Connection(session=keystone_session)

# Create a new container
container = 'object-backups'
swift_conn.put_container(container)
res_headers, containers = swift_conn.get_account()
if container in [c['name'] for c in containers]:
    print("The container " + container + " was created!")

# Create a new object with the contents of the compressed Netbox media backup
with open("/var/backup/netbox_backups/netbox_media_2024-07-25.tar.bz2", "rb") as f:
    swift_conn.put_object(
        container,
        'sanbox_nb01_netbox_media_2024-07-25.tar.bz2',
        contents=f,
        content_type='application/x-gtar'
    )

# Confirm the presence of the object holding the compressed Netbox media backup
obj2 = 'sanbox_nb01_netbox_media_2024-07-26.tar.bz2'
container = 'netbox-backups'
try:
    resp_headers = swift_conn.head_object(container, obj2)
    print("The object " + obj2 + " was successfully created")
except ClientException as e:
    if e.http_status == 404:
        print("The object " + obj2 + " was not found!")
    else:
        print("An error occurred checking for the existence of the object " + obj2)

The file doesn’t get uploaded to the object storage since it’s a file that has been archived with tar before being compressed with bzip2. I assumed we would need to read the contents of the .tar.bz2 file and pass them as a file-like object.
I wrote the following code to handle the tar file format after doing some research on how to do it.

import io
from keystoneauth1 import session
from keystoneauth1.identity import v3
from swiftclient.client import Connection, logger
from swiftclient.client import ClientException
import gzip
import tarfile

# Create a password auth plugin
auth = v3.Password(auth_url='https://cloud.company.com:5000/v3/',
                   username='myaccount',
                   password='mypassword',
                   user_domain_name='Default',
                   project_name='myproject',
                   project_domain_name='Default')

# Create session
keystone_session = session.Session(auth=auth)

# Create swiftclient Connection
swift_conn = Connection(session=keystone_session)

# Create a new container
container = 'netbox-backups'
swift_conn.put_container(container)
res_headers, containers = swift_conn.get_account()
if container in [c['name'] for c in containers]:
    print("The container " + container + " was created!")

# Create a new object with the contents of the compressed Netbox media backup
with tarfile.open('/var/backup/netbox_backups/netbox_media_2024-07-26.tar.bz2', 'r:bz2') as file_tar_bz2:
    # Go over each member in the tar archive...
    for file_info in file_tar_bz2:
        if file_info.isreg():
            logger.info(f"Is regular file: {file_info.name}")
        elif file_info.isdir():
            logger.info(f"Is directory: {file_info.name}")
        elif file_info.issym():
            logger.info(f"Is symbolic link: {file_info.name}")
        elif file_info.islnk():
            logger.info(f"Is hard link: {file_info.name}")
        else:
            logger.info(f"Is something else: {file_info.name}. Skip it")
            continue

        # Read the member's contents...
        extracted_file = file_tar_bz2.extractfile(file_info)
        if extracted_file is None:
            logger.warning(f"Skipping {file_info.name}: Unable to extract file.")
            continue
        file_contents = extracted_file.read()

        # Create a file-like object from the contents...
        file_like_object = io.BytesIO(file_contents)

Instead of the file being uploaded in its local .tar.bz2 compressed form, its contents are uploaded uncompressed to the object storage. May I please have your assistance in resolving this bug?
I am sorry for all the questions, as I am still a beginner in Python.

Kind regards

Unless the server vetoes .tar.bz2, the same code that worked for .gz should work.

Just to confirm: you are not doing any compression or decompression in the Python code now?

What error did you get?

No, I’m not. The previously written code only sent files ending in .gz to the object storage. On my local machine, I wrote a Bash script that compresses a folder of files such as PDF documents, pictures, etc. into a .tar.bz2 file (netbox_media_2024-07-26.tar.bz2), and I set a cron job to perform this task every day. Now, I want to use the Python script to transfer that compressed file from my local machine to the object storage. The first code snippet does not do anything and the file is not sent to the object storage; with the second code snippet, however, the contents are decompressed before being sent to the object storage. The code above worked for files ending in .gz, not .tar.bz2 or .tar.gz, since those files include the archive extension (.tar) and their content type is different.

How can I ensure that the compressed file with the extension tar.bz2 or tar.gz is sent to the object storage?

Do not uncompress it inside your Python code. Read the file as raw bytes and send those.
I assume you set the object name as part of the API call. Include the .tar.gz or .tar.bz2 extension in that name.
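This can be checked locally without a Swift server: build a small .tar.bz2 in memory, take its raw bytes (exactly what open(path, 'rb').read() would give you and what should be passed to put_object unchanged), and confirm those bytes are still a valid bzip2-compressed tar archive. The member name below is made up for the sketch:

```python
import io
import tarfile

# Build a tiny .tar.bz2 archive in memory, standing in for netbox_media_*.tar.bz2
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode='w:bz2') as tar:
    data = b'hello media file'
    info = tarfile.TarInfo(name='media/readme.txt')
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

# The raw bytes of the archive: this is what should be uploaded, untouched
raw_bytes = buffer.getvalue()
print(raw_bytes[:3] == b'BZh')  # True: bzip2 stream magic number

# Reading them back proves nothing was decompressed along the way
with tarfile.open(fileobj=io.BytesIO(raw_bytes), mode='r:bz2') as tar:
    restored = tar.extractfile('media/readme.txt').read()
print(restored == b'hello media file')  # True
```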