Here is even a complete demonstration example. The code below generates between one and four million multiply-nested “gamestats”. Then it chews through all of them but only calls the expensive function on a fraction of them (so it takes only a fraction of the time it would otherwise). A sample run ends with:
Processed 673 out of 2247765 game stats
You can tinker with the base sleep value to make it simulate a more expensive process_game_stats, or with the base probability to make it skip more or fewer of the game stats.
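As a rough sanity check on that sample output, here is a small sketch that estimates how many game stats get processed for a given base probability. It assumes that every league holds about the same number of game stats, and it relies on the demo dividing the base probability by the league id (1 through 10). The function name expected_processed is made up for this illustration.

```python
# Rough estimate of how many game stats the demo ends up processing.
# Assumption: all leagues hold roughly the same number of game stats.
BASE_PROBABILITY = 0.001   # same value as _BASE_PROBABILITY in the demo
TOTAL_STATS = 2_247_765    # total reported by the sample run
LEAGUE_IDS = range(1, 11)  # the demo uses league ids 1 through 10


def expected_processed(total_stats, base_probability, league_ids):
    """Sum, over all leagues, of (stats per league) * (base probability / id)."""
    ids = list(league_ids)
    per_league = total_stats / len(ids)
    return sum(per_league * base_probability / league_id for league_id in ids)


estimate = expected_processed(TOTAL_STATS, BASE_PROBABILITY, LEAGUE_IDS)
print(f"expected about {estimate:.0f} processed game stats")
```

With the values above the estimate comes out at roughly 658, which is in the same ballpark as the 673 from the sample run. Doubling the base probability doubles the estimate.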
It’s just a toy that uses the current league id to progressively increase the probability that future game stats will be skipped. It’s up to you to implement the real business logic that decides which ones should actually be skipped. But the point is that there are clear places to put that logic, places that aren’t smeared across seven levels of a 50-line loop. You can actually run mypy on this sort of code and make sure the parts fit together the way they are supposed to. In other words: decoupling the different concerns can be done and, more importantly, should be done (for ease of testing, maintainability, and sanity).
from __future__ import annotations

from dataclasses import dataclass
from random import randint, random
from time import sleep
from typing import Iterable, Iterator

# Controls (tinker with these)
_BASE_PROBABILITY = 0.001
_BASE_SLEEP = 0.001


# Types
@dataclass
class GameStats:
    football_bats: int


@dataclass
class Season:
    game_stats: list[GameStats]


@dataclass
class Player:
    seasons: list[Season]


@dataclass
class League:
    id: int
    players: list[Player]


ProcessedStats = tuple[League, Player, Season, GameStats]

# Fake data
data: list[League] = []
TOTAL_STATS = 0
for i in range(1, 11):  # leagues
    league = League(i, [])
    for _ in range(200):  # players
        player = Player([])
        for _ in range(randint(10, 20)):  # seasons
            season = Season([])
            for _ in range(randint(50, 100)):  # game_stats
                season.game_stats.append(GameStats(randint(2, 10)))
                TOTAL_STATS += 1
            player.seasons.append(season)
        league.players.append(player)
    data.append(league)

# Processing
FILTER_DATA = _BASE_PROBABILITY


# simulate expensive function
def process_game_stats(
    league: League, player: Player, season: Season, game_stats: GameStats
) -> ProcessedStats:
    sleep(game_stats.football_bats * _BASE_SLEEP)
    return league, player, season, game_stats


def generate_stats(leagues: Iterable[League]) -> Iterator[ProcessedStats]:
    for league in leagues:
        for player in league.players:
            for season in player.seasons:
                for game_stats in season.game_stats:
                    if should_skip(league, player, season, game_stats):
                        continue
                    yield process_game_stats(league, player, season, game_stats)


def should_skip(
    league: League, player: Player, season: Season, game_stats: GameStats
) -> bool:
    # business logic that skips based on collected data goes here
    return random() > FILTER_DATA


def filter_stats(stats: Iterable[ProcessedStats]) -> Iterator[ProcessedStats]:
    # real data for the business logic is built up here
    global FILTER_DATA
    for stat in stats:
        FILTER_DATA = _BASE_PROBABILITY / stat[0].id
        yield stat


processed = filter_stats(generate_stats(data))  # lazy evaluation pipeline

final = []
for i, stat in enumerate(processed):
    if i % 10 == 0:
        print(f"{i} processed, current league {stat[0].id}, P={FILTER_DATA}")
    final.append(stat)

print(f"Processed {len(final)} out of {TOTAL_STATS} game stats")
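If the module-level global bothers you, one possible variation is to keep the skip decision and the data it depends on together in a small object that both pipeline stages share. The sketch below is not part of the demo above: Stat, SkipPolicy, generate, collect and expensive are hypothetical names, and the random skip rule is swapped for a deterministic toy rule so the result can be checked in a test.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class Stat:
    # Hypothetical stand-in for the (league, player, season, game_stats) tuple.
    league_id: int
    value: int


@dataclass
class SkipPolicy:
    """Holds the data the skip decision depends on (takes the place of the global)."""

    threshold: int = 0

    def should_skip(self, stat: Stat) -> bool:
        # Toy rule: skip anything that does not beat the best value seen so far.
        return stat.value <= self.threshold

    def observe(self, stat: Stat) -> None:
        # Called for every stat that made it through the expensive step.
        self.threshold = max(self.threshold, stat.value)


def generate(
    stats: Iterable[Stat], policy: SkipPolicy, process: Callable[[Stat], Stat]
) -> Iterator[Stat]:
    for stat in stats:
        if policy.should_skip(stat):
            continue
        yield process(stat)


def collect(stats: Iterable[Stat], policy: SkipPolicy) -> Iterator[Stat]:
    for stat in stats:
        policy.observe(stat)  # update the policy before the next stat is pulled
        yield stat


calls: list[Stat] = []


def expensive(stat: Stat) -> Stat:
    calls.append(stat)  # record each call so the number of calls can be checked
    return stat


policy = SkipPolicy()
raw = [Stat(1, 3), Stat(1, 2), Stat(1, 5), Stat(2, 5), Stat(2, 7)]
result = list(collect(generate(raw, policy, expensive), policy))
print([s.value for s in result])  # [3, 5, 7]
print(len(calls))                 # 3
```

Because the pipeline is lazy, collect updates the policy before generate looks at the next item, which is the same feedback loop the demo gets through FILTER_DATA. The difference is that the state is now passed in explicitly, so a test can build its own policy and expensive function without touching module globals.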