I have a project layout as shown below. The data files are located in the package's `src/mypackage/data` directory.
```
my-project
├── src
│   └── mypackage
│       ├── data
│       │   ├── fruits.csv
│       │   └── veggies.csv
│       ├── __init__.py
│       └── reader.py
├── README.md
├── example.py
└── pyproject.toml
```
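A minimal sketch of how this layout appears through `importlib.resources`. It assumes a throwaway package named `demo_layout_pkg` (hypothetical, standing in for `mypackage`) with made-up CSV contents, rebuilds the tree in a temporary directory, puts `src/` on `sys.path` to mimic an installed package, and lists the data directory with `Traversable.iterdir()`.

```python
import importlib
import importlib.resources
import sys
import tempfile
from pathlib import Path


def build_layout(root: Path, package: str) -> None:
    """Recreate src/<package>/ with a data/ folder (hypothetical contents)."""
    pkg_dir = root / "src" / package
    data_dir = pkg_dir / "data"
    data_dir.mkdir(parents=True)
    (pkg_dir / "__init__.py").write_text("")
    (pkg_dir / "reader.py").write_text("")
    (data_dir / "fruits.csv").write_text("name\napple\n")
    (data_dir / "veggies.csv").write_text("name\ncarrot\n")


with tempfile.TemporaryDirectory() as tmp:
    build_layout(Path(tmp), "demo_layout_pkg")
    # Putting src/ on sys.path stands in for installing the package.
    src = str(Path(tmp) / "src")
    sys.path.insert(0, src)
    importlib.invalidate_caches()
    try:
        data_dir = importlib.resources.files("demo_layout_pkg") / "data"
        csv_names = sorted(
            entry.name for entry in data_dir.iterdir() if entry.name.endswith(".csv")
        )
    finally:
        sys.path.remove(src)

print(csv_names)  # ['fruits.csv', 'veggies.csv']
```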
The `pyproject.toml` content is shown here:

```toml
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "mypackage"
version = "0.1"
authors = [{name = "Bart Simpson"}]
description = "A small example package"
requires-python = ">=3.12"
dependencies = ["pandas", "ruff"]
```
In the `reader.py` module I have functions that read the CSV files in the `data` directory and print out the data. Below is a function that reads the `fruits.csv` file and prints the fruit data.
```python
import importlib.resources

import pandas as pd


def read_fruits():
    """Read fruits CSV file and print data."""
    data_res = importlib.resources.files("mypackage") / "data"
    with importlib.resources.as_file(data_res / "fruits.csv") as f:
        df = pd.read_csv(f)
    print(f"\nFruits data from `fruits.csv` is below\n{df}")
```
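For comparison, a sketch that reads the CSV through `Traversable.open()` instead of `as_file()`. It uses the standard-library `csv` module so it runs without pandas; the package name `demo_reader_pkg` and its CSV contents are hypothetical. `as_file()` matters mainly when a consumer needs a real filesystem path, whereas an open text stream is enough for `csv` and also for `pandas.read_csv`, which accepts file-like objects.

```python
import csv
import importlib
import importlib.resources
import sys
import tempfile
from pathlib import Path


def build_demo_package(root: Path) -> None:
    """Create a hypothetical demo_reader_pkg/data/fruits.csv under root."""
    data_dir = root / "demo_reader_pkg" / "data"
    data_dir.mkdir(parents=True)
    (root / "demo_reader_pkg" / "__init__.py").write_text("")
    (data_dir / "fruits.csv").write_text("name,color\napple,red\nbanana,yellow\n")


def read_fruits_rows(package: str) -> list[dict[str, str]]:
    resource = importlib.resources.files(package) / "data" / "fruits.csv"
    # Traversable.open() returns a text stream; no temporary file is created.
    with resource.open("r", newline="") as fh:
        return list(csv.DictReader(fh))


with tempfile.TemporaryDirectory() as tmp:
    build_demo_package(Path(tmp))
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        rows = read_fruits_rows("demo_reader_pkg")
    finally:
        sys.path.remove(tmp)

print(rows)  # [{'name': 'apple', 'color': 'red'}, {'name': 'banana', 'color': 'yellow'}]
```

In `reader.py` the same idea would mean passing the open stream `fh` directly to `pd.read_csv(fh)` inside the `with` block.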
Question 1
I can use `importlib.resources.files("mypackage") / "data"` or `importlib.resources.files("mypackage.data")` to get a traversable for the `data` directory in the package. Both work, but does it matter which one I use? Is one more performant than the other?
Question 2
The `data` directory is just a plain folder, as shown here:

```
data/
├── fruits.csv
└── veggies.csv
```

or it can be a package, as shown next:

```
data/
├── __init__.py
├── fruits.csv
└── veggies.csv
```
Both approaches work, but the Python docs make it sound like this directory should be a package. Can someone clarify whether the `data` directory should be a package or not?
Question 3
If the `data` directory is made a package, as shown below:

```
data/
├── __init__.py
├── fruits.csv
└── veggies.csv
```

I can import it as a package and use it as shown here:

```python
import importlib.resources

import pandas as pd

from . import data


def read_fruits():
    """Read fruits CSV file and print data."""
    data_res = importlib.resources.files(data)  # <-- use data package here
    with importlib.resources.as_file(data_res / "fruits.csv") as f:
        df = pd.read_csv(f)
    print(f"\nFruits data from `fruits.csv` is below\n{df}")
```
This approach also works and doesn't rely on strings to locate the `data` directory, but it requires adding an `__init__.py` file and an import statement for the `data` package. Is there any reason to prefer this approach over the one shown above?
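A sketch contrasting the module-object anchor with a dotted-string anchor, assuming a hypothetical package `demo_q3_pkg` whose `data/` folder contains an `__init__.py`. Here `importlib.import_module()` stands in for `from . import data`; the main difference it illustrates is when a wrong name is detected.

```python
import importlib
import importlib.resources
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "demo_q3_pkg" / "data"
    data_dir.mkdir(parents=True)
    (data_dir.parent / "__init__.py").write_text("")
    (data_dir / "__init__.py").write_text("")
    (data_dir / "fruits.csv").write_text("name\napple\n")
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        # Stand-in for `from . import data` in reader.py: a missing or
        # misspelled package would fail here, at import time.
        data = importlib.import_module("demo_q3_pkg.data")
        by_module = (importlib.resources.files(data) / "fruits.csv").read_text()

        # String anchor: the import happens inside files(), at call time.
        by_string = (
            importlib.resources.files("demo_q3_pkg.data") / "fruits.csv"
        ).read_text()

        # A typo in the string is only detected when files() runs.
        try:
            importlib.resources.files("demo_q3_pkg.dta")
            typo_error = None
        except ModuleNotFoundError as exc:
            typo_error = type(exc).__name__
    finally:
        sys.path.remove(tmp)

print(by_module == by_string, typo_error)  # True ModuleNotFoundError
```

With the module object, a broken `data` package surfaces as soon as `reader.py` is imported; with a string, the same mistake only raises when `read_fruits()` calls `files()`.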