josh-gree 6 hours ago
*Location:* UK (Manchester)

*Remote:* Yes (preferred)

*Willing to relocate:* UK/EU considered for the right role

*Resume:* josh-gree.github.io/cv

*Email:* joshuadouglasgreenhalgh@gmail.com

*Technologies:* Python, SQL, R; Airflow, Prefect, Dagster; Kafka; Docker/Kubernetes; Terraform; GCP/AWS; Postgres, PostGIS, Snowflake, Redshift; Zarr/Parquet; ML/Deep Learning; HPC; React/Flask

*Summary:* Senior Software/Data Engineer with a strong background in mathematics and computational modelling. I build high-reliability data systems, complex ETL/ELT pipelines, and ML-ready data platforms, especially where datasets are large, irregular, hierarchical, or scientifically complex.

Most recently, I've been designing and operating large-scale data infrastructure for high-dimensional biological datasets (100k+ samples): unifying heterogeneous storage formats into lineage-aware catalogues, creating ontologies for hierarchical labels, building QC pipelines in Dagster, developing synthetic single-cell data generators, and working closely with domain scientists to formalise and scale experimental and computational workflows. Previously: large-scale mobile-network analytics for humanitarian agencies, climate/energy data engineering, ad-tech pipelines, and HPC-driven modelling from computational research.

I'm looking for roles where difficult data problems, scientific or ML-adjacent pipelines, or complex modelling workflows need to be made robust, reproducible, and scalable. I prefer small teams, high ownership, and work with real impact.

*What I offer:*

– Architecture and implementation of reliable data/ML platforms

– Workflow orchestration, data governance, and reproducibility

– Scientific/ML pipeline design (Bayesian modelling, synthetic data, QC/validation)

– Cloud infrastructure/IaC and cost-efficient storage design

– Deep collaboration with domain experts to formalise messy processes