josh-gree 6 hours ago
*Location:* UK (Manchester)

*Remote:* Yes (preferred)

*Willing to relocate:* UK/EU considered for the right role

*Resume:* josh-gree.github.io/cv

*Email:* joshuadouglasgreenhalgh@gmail.com

*Technologies:* Python, SQL, R; Airflow, Prefect, Dagster; Kafka; Docker/Kubernetes; Terraform; GCP/AWS; Postgres, PostGIS, Snowflake, Redshift; Zarr/Parquet; ML/Deep Learning; HPC; React/Flask

*Summary:* Senior Software/Data Engineer with a strong background in mathematics and computational modelling. I build high-reliability data systems, complex ETL/ELT pipelines, and ML-ready data platforms, especially where datasets are large, irregular, hierarchical, or scientifically complex.

Most recently, I've been designing and operating large-scale data infrastructure for high-dimensional biological datasets (100k+ samples): unifying heterogeneous storage formats into lineage-aware catalogues, creating ontologies for hierarchical labels, building QC pipelines in Dagster, developing synthetic single-cell data generators, and working closely with domain scientists to formalise and scale experimental and computational workflows. Previously: large-scale mobile-network analytics for humanitarian agencies, climate/energy data engineering, ad-tech pipelines, and HPC-driven modelling from computational research.

I'm looking for roles where difficult data problems, scientific or ML-adjacent pipelines, or complex modelling workflows need to be made robust, reproducible, and scalable. I prefer small teams, high ownership, and work with real impact.

*What I offer:*

– Architecture and implementation of reliable data/ML platforms

– Workflow orchestration, data governance, and reproducibility

– Scientific/ML pipeline design (Bayesian modelling, synthetic data, QC/validation)

– Cloud infrastructure/IaC and cost-efficient storage design

– Deep collaboration with domain experts to formalise messy processes