I think the main problem is that the "preprocessing" step for the input files is to do:
    print(f"Preprocessing file: {filepath}")
    cmd = ["grep", "countme", str(filepath)]
    r = subprocess.run(cmd, stdout=tmpfile)
    if r.returncode != 0:
...and this fails for filepath.gz. That makes the file 10x+ bigger, which is the first problem, and it also triggers tracebacks in the "progress" iteration (probably due to the extra input lines without a countme match).
The obvious fix is to just uncompress the file to a temporary location, much like the preprocessing step already does. Or maybe do the preprocessing inline instead of needing a raw file.
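A rough sketch of the inline option, in case it helps. It is not taken from the existing code: the helper names `open_maybe_gzip` and `preprocess` are made up for illustration, and it assumes compressed inputs can be recognised by a `.gz` suffix and that `tmpfile` is opened in binary mode. It replaces the `grep` subprocess with a plain substring match, decompressing on the fly.

```python
import gzip
import tempfile
from pathlib import Path


def open_maybe_gzip(filepath):
    """Open a log for binary reading, decompressing on the fly if it ends in .gz.

    Hypothetical helper: assumes the suffix reliably indicates compression.
    """
    filepath = Path(filepath)
    if filepath.suffix == ".gz":
        return gzip.open(filepath, "rb")
    return open(filepath, "rb")


def preprocess(filepath, tmpfile, needle=b"countme"):
    """Copy only the lines containing `needle` into tmpfile.

    Stands in for `grep countme <file>`; returns the number of matching lines.
    """
    matched = 0
    with open_maybe_gzip(filepath) as src:
        for line in src:
            if needle in line:
                tmpfile.write(line)
                matched += 1
    tmpfile.flush()
    return matched


if __name__ == "__main__":
    # Tiny demonstration with a throwaway compressed file.
    with tempfile.TemporaryDirectory() as d:
        sample = Path(d) / "sample.log.gz"
        with gzip.open(sample, "wb") as f:
            f.write(b"GET /a?countme=1\nGET /b\n")
        with tempfile.TemporaryFile() as out:
            print(preprocess(sample, out), "matching line(s)")
```

One difference from the `grep` version: `grep` exits with status 1 when nothing matches, so the existing `returncode` check treats "no matches" as a failure. Here the caller would have to decide whether a return value of 0 is an error.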