Skip to content

Commit

Permalink
download: fix logging level of workers on Windows
Browse files Browse the repository at this point in the history
A gotcha from my Unix-only design. multiprocessing is simply awful
on Windows -- expensive process creation, no signals (only crappy signal
emulation; I miss SIGTSTP in particular), and TIL, no fork. So logger
level set inside a function simply isn't inherited by worker
processes (because it's not a side effect of import).

Were I to redo the whole thing with Windows in mind, I might choose
threads over processes -- but then we face the problem of no reliable
termination mechanism, and a download thread could hang
indefinitely* (unless we dig low enough to be able to inject
synchronization mechanisms into network code, or find a library that's
flexible enough to do so -- certainly not requests or urllib3) unless we
force a dirty exit with unjoined threads.

Trade-offs, trade-offs.

* Make no mistake: iter_content could hang, too. Actually, maybe my
  iter_content_with_timeout
  https://gist.github.com/zmwangx/0dcbdbe6f67f3540dd73e777ef1b2a89 could
  solve the problem within requests, but god knows whether eventlet
  works on Windows, or whether it works but has other gotchas.
  • Loading branch information
zmwangx committed Nov 6, 2018
1 parent 05d09e8 commit 643486c
Showing 1 changed file with 12 additions and 6 deletions.
18 changes: 12 additions & 6 deletions src/caterpillar/download.py
Original file line number Diff line number Diff line change
Expand Up @@ -125,12 +125,17 @@ def download_segment(
)


# download_segment wrapper that takes all arguments as a single tuple,
# so that we can use it with multiprocessing.pool.Pool.map and company.
# It also gracefully consumes KeyboardInterrupt.
def _download_segment_mappable(args: Tuple[str, int, pathlib.Path]) -> bool:
# download_segment wrapper that takes all arguments as a single tuple
# (with one additional argument: the logging level, so that it can be
# set correctly for worker processes -- there's no fork on Windows, so
# the worker processes do not actually inherit logger level), so that we
# can use it with multiprocessing.pool.Pool.map and company. It also
# gracefully consumes KeyboardInterrupt.
def _download_segment_mappable(args: Tuple[str, int, pathlib.Path, int]) -> bool:
try:
return download_segment(*args)
*dl_args, logging_level = args
logger.setLevel(logging_level)
return download_segment(*dl_args)
except KeyboardInterrupt:
url, *_ = args
logger.debug(f"download of {url} has been interrupted")
Expand Down Expand Up @@ -172,9 +177,10 @@ def download_m3u8_segments(
target_duration = remote_m3u8_obj.target_duration
local_segments = []
download_args = []
logging_level = logger.getEffectiveLevel()
for index, segment in enumerate(remote_m3u8_obj.segments):
url = urllib.parse.urljoin(remote_m3u8_url, segment.uri)
download_args.append((url, index, local_m3u8_file.parent))
download_args.append((url, index, local_m3u8_file.parent, logging_level))
local_segments.append((f"{index}.ts", segment.duration))

with open(local_m3u8_file, "w", encoding="utf-8") as fp:
Expand Down

0 comments on commit 643486c

Please sign in to comment.