You are not wrong, but for this example you can run the queries in threads with something like this:
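import polars as pl

# Write a small file so the example is self-contained.
pl.DataFrame({"a": [1, 2, 3]}).write_parquet("test.parquet")


def print_shape(df: pl.DataFrame) -> pl.DataFrame:
    # Called once per query; returns the frame unchanged.
    print(df.shape)
    return df


# Build 100 independent lazy queries over the same file.
lazy_frames = [
    pl.scan_parquet("test.parquet")
    .map_batches(print_shape)
    for _ in range(100)
]

# Collect them all at once so Polars can run them in parallel.
pl.collect_all(lazy_frames, comm_subplan_elim=False)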
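(comm_subplan_elim=False is important: with common subplan elimination enabled, Polars may detect that the 100 query plans are identical and evaluate the shared plan only once instead of running each query separately.)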