Python project structure

Hello,

I am a programmer with experience in multiple programming languages, but not yet in Python. I want to start a bigger project, but while googling I find a lot of conflicting advice on how to structure it. Can someone provide me a few links to what’s the standard today, if there even is one standard?

Basically, I need an equivalent to the conventions given by Java’s Maven, Go’s go, Rust’s Cargo, …

I have these needs:

  • Dependency management, isolated to the project (so multiple projects can have conflicting versions of the same dependency)… The ecosystem seems to be pip, pipenv, pipx, venv, but it is unclear to me how these work together and how much they overlap.
  • folder structure: Where do I place source code, tests, documentation, config, in relation to the project folder
  • tooling: What are best practices for enforcing linting, formatting, testing, …

I found this that seems to give 80% of the answer, but I don’t know if it really represents today’s standards, and it has some holes (e.g. where do I put source code?)

I also found this, but I have the impression it gives tooling but not many conventions

There definitely isn’t one standard (you can find some very long discussions about this state of affairs). There are a lot of options, catering to different styles or sets of requirements.

One resource I think is useful for something-like-a-standard is packaging.python.org. It focuses on what is standardized, and so there are some gaps that are tool specific, but it goes over things like where to put your code in a package directory.

I can give you my own personal preferences, but many people do things differently:

  • For dependency/environment management I use conda (or preferably mamba) because I often use extensions that are historically tricky to install with pip [1].
  • I use the src layout for my code with separate directories for docs and tests. Where config lives depends on how it will be used: often I want to include the config file with the package, so it goes in src and is accessed with importlib.resources.
  • I use ruff for linting and black for formatting. I like pytest for testing. I configure them all in the pyproject.toml file. I typically use setuptools to build the package.

I won’t claim these are the best solutions, but they’re “vanilla” answers to all those questions. All of these (save perhaps ruff which is fairly new) are well-established tools with lots of users and lots of documentation about how to use and configure things.
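
As a rough illustration of the importlib.resources approach mentioned above, here is a minimal sketch, assuming Python 3.9 or newer. The package name demo_pkg and the file name defaults.toml are made up for the demo, which builds a throwaway package in a temporary directory so that it runs without installing anything. In a real project the file would simply sit next to the package's modules under src.

```python
import importlib
import importlib.resources
import sys
import tempfile
from pathlib import Path


def read_packaged_config(package: str, filename: str) -> str:
    """Return the text of a data file that ships inside an importable package."""
    resource = importlib.resources.files(package).joinpath(filename)
    return resource.read_text(encoding="utf-8")


# Demo only: build a throwaway package (hypothetical name "demo_pkg").
with tempfile.TemporaryDirectory() as tmp:
    pkg_dir = Path(tmp) / "demo_pkg"
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text("", encoding="utf-8")
    (pkg_dir / "defaults.toml").write_text('greeting = "hello"\n', encoding="utf-8")

    sys.path.insert(0, tmp)          # make the temporary package importable
    importlib.invalidate_caches()
    config_text = read_packaged_config("demo_pkg", "defaults.toml")
    sys.path.remove(tmp)

print(config_text)
```

The point of going through importlib.resources rather than building a path from __file__ is that the lookup follows the import system, so it should keep working however the package ends up being installed.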


  [1] This is better than it used to be, though.


I am a programmer with experience in multiple programming languages,
but not yet in Python. I want to start a bigger project, but while
googling I find a lot of conflicting advice on how to structure it.
Can someone provide me a few links to what’s the standard today, if
there even is one standard?

There isn’t just one standard. And you can always change your standard.

Basically, I need an equivalent to the conventions given by Java’s Maven, Go’s go, Rust’s Cargo, …
I have these needs:

  • Dependency management, isolated to the project (so multiple projects can have conflicting versions of the same dependency)… The ecosystem seems to be pip, pipenv, pipx, venv, but it is unclear to me how these work together and how much they overlap.

There are several of these. The easiest is venv, which ships with
Python and is simple and effective. The others do more; I’ve not needed
them personally, and you can always upgrade once you’re comfortable.
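
For reference, the usual way to make one is the command line (python -m venv .venv), but the same thing can be sketched with the stdlib venv module. This is only an illustration: the .venv directory name is a common convention rather than a requirement, the helper name is made up, and with_pip=False is used purely to keep the demo quick.

```python
import tempfile
import venv
from pathlib import Path


def create_project_env(project_dir: Path) -> Path:
    """Create a virtual environment in <project_dir>/.venv and return its path."""
    env_dir = project_dir / ".venv"
    # with_pip=False keeps this demo fast; for real use you would normally
    # want with_pip=True so that pip is available inside the environment.
    venv.create(env_dir, with_pip=False)
    return env_dir


# Demo only: create the environment inside a temporary "project" directory.
with tempfile.TemporaryDirectory() as tmp:
    env_dir = create_project_env(Path(tmp))
    # pyvenv.cfg is the marker file that identifies a virtual environment.
    has_cfg = (env_dir / "pyvenv.cfg").is_file()

print("environment created:", has_cfg)
```

Each project gets its own such directory, which is what lets two projects pin conflicting versions of the same dependency.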

  • folder structure: Where do I place source code, tests, documentation, config, in relation to the project folder

Many people put it in a “src/” subfolder. My personal habit is
“lib/python/blah/…”, and usually there’s a top level “blah” there
representing the project top package name (keeps its modules out of the
way of other names). I put a convenience symlink “blah” →
“lib/python/blah” to give editors an easy-to-use name.
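
For comparison, here is a rough sketch of what the more common “src/” arrangement looks like on disk. The helper name, the package name “blah” and the exact file names (for example docs/index.md) are only placeholders; the demo creates empty files in a temporary directory.

```python
import tempfile
from pathlib import Path


def scaffold_src_layout(root: Path, package: str) -> list[str]:
    """Create an empty src-layout skeleton; return the relative paths created."""
    files = [
        Path("pyproject.toml"),                   # project metadata and tool config
        Path("src") / package / "__init__.py",    # the importable package
        Path("tests") / f"test_{package}.py",     # tests live outside the package
        Path("docs") / "index.md",                # documentation
    ]
    for rel in files:
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
    return sorted(rel.as_posix() for rel in files)


# Demo only: build the skeleton in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    created = scaffold_src_layout(Path(tmp), "blah")

for path in created:
    print(path)
```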

  • tooling: What are best practices for enforcing linting, formatting, testing, …

There are many linters; I keep a script which runs a few with my
preferred tunings.

“ruff” is getting very popular - it is fast and covers a lot of the
other linters’ tests.

Formatting: lots of people use an autoformatter to apply a style close
to PEP 8 (which is the style for Python’s stdlib). The “black” tool is
very popular because it basically applies an opinionated PEP 8 style
with nearly no tuning options. For various reasons I use yapf, with a
style which is PEP 8 with a few tweaks.

I found this that seems to give 80% of the answer, but I don’t know if it really represents today’s standards, and it has some holes (e.g. where do I put source code?)
How to set up a perfect Python project

I also found this, but I have the impression it gives tooling but not many conventions
12. Virtual Environments and Packages — Python 3.12.0 documentation

Conventions are largely a personal choice. Do something simple with
which you’re already comfortable, and make an informed choice later when
you’ve hit some pain points, if any.

Cheers,
Cameron Simpson cs@cskk.id.au

Just my personal opinions here:

  • Dependency management. Use either venv or conda. (I always use conda, since it’s a full-blown package/environment manager, for more than just Python, while pip is basically just an installer. Conda is especially useful for the big packages used in ML and data science. It gives better guarantees that all dependency versions remain consistent, which pip doesn’t completely do. I use pip only for packages that are not available on Anaconda, or sometimes for pure-Python packages or for special testing.)
  • Folder structure - Same recommendation as @jamestwebber (see his link)
  • Tooling. I use black, pylint, pytest and mypy as my only extra tools (apart from build tools). I always try to ensure code has no linting errors and that mypy runs without errors. Pylint is opinionated but easy to customize. Mypy is useful especially for bigger projects: it gives a little more assurance that function argument and return value types are consistent across the project, even though Python just ignores type hints at runtime. Black is very nice as a code formatter.
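
To illustrate the point about type hints being ignored at runtime, here is a small sketch (the function name double is made up). Running mypy over this file should report the incompatible argument type on the last call, while Python itself runs it without complaint.

```python
def double(x: int) -> int:
    """Return twice the input; the annotations promise int in, int out."""
    return x * 2


ok = double(21)

# A type checker such as mypy flags this call (str passed where int is
# expected), but at runtime the hint is not enforced and str * 2 repeats
# the string instead.
surprising = double("ab")

print(ok, surprising)
```

That silent mismatch is the kind of thing a type checker catches before it turns into a bug somewhere far from the call site.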

I just wanted to emphasise this bit: “try to ensure code has no linting
errors”. When you do this routinely it means that any errors are new,
from recent changes you’ve made. Very useful.

Remember, linters are usually quite tuneable, so you can turn off
particularly annoying rules which don’t match your code, and you can
usually put comments in the code to disable a particular rule for a
particular piece of code where that rule gives a false positive.
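
As a sketch of what such in-code comments look like: ruff and flake8 use “# noqa: <code>” on the offending line, and pylint uses “# pylint: disable=<message-name>”. The specific rules picked here are only examples, and the function is made up; check your linter’s documentation for the codes that apply to you.

```python
# ruff/flake8: silence "imported but unused" (F401) for this one line only.
import os  # noqa: F401


# pylint: silence the missing-docstring message for this one function only.
def parse_port(value):  # pylint: disable=missing-function-docstring
    return int(value)


port = parse_port("8080")
print(port)
```

The comments have no effect on how the code runs; they only tell the linter to skip one rule in one place, which keeps the rest of the project under the full rule set.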


@hyperman For folder structure, you’re much better off with an src layout. For why, you can read this.

https://packaging.python.org/en/latest/discussions/src-layout-vs-flat-layout/

Use a project TOML file (pyproject.toml), and don’t use setup.py and requirements.txt files. The TOML file can define your project dependencies, different groups of dev dependencies, and other metadata, all in one place, and you can use a PEP 621 compliant tool to manage them.

For dependency management, accurate and reliable dependency resolution is the key, and in my experience PDM is the best at meeting this requirement. The only other comparable tool out there is Poetry, but I think PDM is much better, as it complies with the relevant standards (e.g. PEP 621) and has a much better CLI with more options.

I recently came across this blog post comparing the various options available, which I think is very helpful.

P.S. The only thing PDM can’t do is Python version management, but neither can Poetry. For that you can use pyenv.

Thanks for the advice, everyone. This seems very interesting. I’ll do some reading over the next few days. I hadn’t understood Conda as a package management tool, more as a distribution, but as I am in the data science world, I’m probably going that way.

Anaconda is the curated distribution of packages, and conda is the package manager that was originally created to help distribute and update those curated packages. These days conda has wider use beyond just the Anaconda distribution.