I like Karpathy. We come from the same lineage, and I am very proud of him for what he's accomplished; he's a very impressive guy.

As for deep learning, building deep learning architectures to find insights in perceptual data is one of my greatest joys. Right now, I'm working on spatiotemporal data modeling to build prediction systems for urban planning, with the goal of improving public transportation systems. I build ML infrastructure too and plan to release an app that deploys the model in the wild on the event streams of transit systems.

It took me a month to master the basics, and I've spent a lot of time on online learning with DeepLearning.AI and skills.google. DeepLearning.AI is OK, but I felt the concepts were a bit dated. The ML path at skills.google is excellent and gives a practical understanding of ML infrastructure, optimization, and how to work with GPUs and TPUs (reportedly 15x faster than GPUs).

But the best source of learning for me personally, and the one that made me a confident practitioner, is the book by Francois Chollet, the creator of Keras. His book, "Deep Learning with Python", really removed any ambiguity I had about deep learning and AI in general. Francois is extremely generous in how he explains how deep learning works, against the backdrop of 70 years of deep learning research. Francois keeps it updated and the third revision was made in September 2025 - it's available online for free if you don't want to pay for it. He gives you the recipes for building a GPT and diffusion models, but starts from the ground-floor basics of tensor operations and computation graphs. I would go through it again from start to finish; it is so well written and enjoyable to follow.

The most important lesson he discusses is that "Deep learning is more of an art than a science". Getting something to work takes a good amount of practice, and why things work can't always be explained.

He includes notebooks with detailed code examples using TensorFlow, PyTorch and JAX as backends.

Deep learning is a great skill to have. After reading this book, I can recreate scientific abstracts and deploy the models into production systems. I am very grateful to have these skills and I encourage anyone with deep curiosity like me to go all in on deep learning.


nemil_zola 4 days ago

The project you mentioned you are working on sounds interesting. Do you have more to share?

I’m curious how ML/AI is leveraged in the domain of public transport, and what it can offer compared to agent-based models.

lazarus01 4 days ago
The project I’m working on emulates a scientific abstract. I’m not a scientist by any means, but I am adapting an abstract to the public transit system in NYC. I will publish the project on my website when it’s done. I think it’s a few weeks away. I built the dataset and am now doing experimental model training. If I can get acceptable accuracy, I will deploy it in a production system and build a UI.

Here is the scientific abstract that inspired me to start building this system: https://arxiv.org/html/2510.03121

I am unfamiliar with agent-based models, so sorry, I can’t offer any personal insight there, but I ran your question through Gemini and here is the AI response:

Based on the scientific abstract of the paper "Real Time Headway Predictions in Urban Rail Systems and Implications for Service Control: A Deep Learning Approach" (arXiv:2510.03121), agent-based models (ABMs) and deep learning (DL) approaches compare as follows:

### 1. Computational Efficiency and Real-Time Application

* Deep Learning (DL): The paper proposes a ConvLSTM (Convolutional Long Short-Term Memory) framework designed for high computational efficiency. It is specifically intended to provide real-time predictions, enabling dispatchers to evaluate operational decisions instantly.
* Agent-Based Models (ABM): While the paper does not use ABMs, it contrasts its DL approach with traditional "computationally intensive simulations", a category that includes microscopic agent-based models. ABMs often require significant processing time to simulate individual train and passenger interactions, making them less suitable for immediate, real-time dispatching decisions during operations.

### 2. Modeling Methodology

* Deep Learning (DL): The approach is data-driven, learning spatiotemporal patterns and the propagation of train headways from historical datasets. It captures spatial dependencies (between stations) and temporal evolution (over time) through convolutional filters and memory states without needing explicit rules for train behavior.
* Agent-Based Models (ABM): These are typically rule-based and bottom-up, modeling the movement of each train "agent" based on signaling rules, spacing, and train-following logic. While highly detailed, they require precise calibration of individual agent parameters.

### 3. Handling Operational Control

* Deep Learning (DL): A key innovation in this paper is the direct integration of target terminal headways (dispatcher decisions) as inputs. This allows the model to predict the downstream impacts of a specific control action (like holding a train) by processing it as a data feature.
* Agent-Based Models (ABM): To evaluate a dispatcher's decision in an ABM, the entire simulation must typically be re-run with new parameters for the affected agents, which is time-consuming and difficult to scale across an entire metro line in real time.

### 4. Use Case Scenarios

* Deep Learning (DL): Optimized for proactive operational control and real-time decision-making. It is most effective when large amounts of historical tracking data are available to train the spatiotemporal relationships.
* Agent-Based Models (ABM): Often preferred for offline evaluation of complex infrastructure changes, bottleneck mitigation strategies, or microscopic safety analyses where the "why" behind individual train behavior is more important than prediction speed.
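
To make the "spatiotemporal" idea in the response above a bit more concrete, here is a rough sketch of how train arrival times might be turned into a time-by-station headway grid, with the dispatcher's planned terminal headway attached to every cell as a second input channel. This is only an illustration under my own assumptions, not the paper's actual pipeline: the function names (`headways`, `headway_grid`, `add_control_channel`), the 10-minute bins, and the toy arrival times are all hypothetical.

```python
from bisect import bisect_right


def headways(arrivals):
    """Gaps between consecutive train arrivals at one station (same unit as input)."""
    times = sorted(arrivals)
    return [later - earlier for earlier, later in zip(times, times[1:])]


def headway_grid(arrivals_by_station, stations, t_start, t_end, bin_minutes):
    """Build grid[t][s]: latest observed headway at station s by the end of bin t.

    A cell stays None until at least two trains have been seen at that station.
    """
    n_bins = (t_end - t_start) // bin_minutes
    sorted_times = {s: sorted(arrivals_by_station[s]) for s in stations}
    grid = []
    for i in range(n_bins):
        bin_end = t_start + (i + 1) * bin_minutes
        row = []
        for s in stations:
            times = sorted_times[s]
            seen = bisect_right(times, bin_end)  # trains arrived by bin_end
            row.append(times[seen - 1] - times[seen - 2] if seen >= 2 else None)
        grid.append(row)
    return grid


def add_control_channel(grid, planned_terminal_headways):
    """Pair every cell with the planned terminal headway for its time bin.

    Assumption: the dispatcher's decision is one number per time bin, copied
    across all stations so it lines up with the observed-headway channel.
    """
    return [
        [(observed, planned_terminal_headways[t]) for observed in row]
        for t, row in enumerate(grid)
    ]


# Toy data (hypothetical): arrival times in minutes after the start of service.
stations = ["A", "B", "C"]
arrivals = {
    "A": [0, 6, 12, 20],
    "B": [3, 9, 15, 23],
    "C": [7, 13, 19, 27],
}

grid = headway_grid(arrivals, stations, t_start=0, t_end=30, bin_minutes=10)
samples = add_control_channel(grid, planned_terminal_headways=[6, 6, 8])
```

A sequence of such grids is the kind of input a ConvLSTM-style model could be trained on. Evaluating a different dispatching decision would then mean changing the values in the control channel and running the model again, rather than re-running a full simulation.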