mikewarot | 6 days ago
3 months ago, I would have agreed with much of this article, however... In the past week I watched this video[1] from Welch Labs about how deep networks work, and it inspired an idea. I spent some time "vibe coding" with Visual Studio Code's ChatGPT5 preview and had it generate a Python framework that takes an image and teaches a small network to generate that one sample image. The network is simple: 2 inputs (x, y), 3 outputs (r, g, b), and a number of hidden layers with a specified number of nodes per layer.

It's an agent: it writes code, tests it, fixes problems, and it pretty much just works. As I explored the space of image generation, I had it add options over time, and it all just worked. Unlike previous efforts, I didn't have to paste error messages back in and try to figure out how things broke. I was pleasantly surprised that the code just worked, in a manner close to what I wanted. The only real problem I had was getting .venv working right, and that's more of an install issue than the LLM's fault. I've also got to say, I'm quite impressed with Python's argparse library.

It's amazing how much detail you can get out of 4 hidden layers of 64 nodes and 3 output channels (rgb) if you're willing to throw a few days of CPU time at it. My goal is to see just how small a network I can make that still generates my favorite photo. At each checkpoint I have it output an image from the current weights to compare against the original. It's quite fascinating to watch as it folds the latent space to capture the major features of the photo, then folds some more to catch smaller details, over and over, as the signal-to-noise ratio very slowly increases over the hours.

As for ChatGPT5, maybe I just haven't run out of context window yet, but for now it all just seems like magic.
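For anyone who wants to try the same experiment, here is a minimal sketch of that kind of coordinate network: an MLP mapping (x, y) to (r, g, b), overfit to a single image and dumping checkpoint renders as it trains. This is not the commenter's actual framework; it assumes PyTorch, NumPy, and Pillow, and every name and hyperparameter (file names, learning rate, step counts) is illustrative.

    # coordinate_fit.py -- overfit a tiny MLP to one image: f(x, y) -> (r, g, b)
    # Sketch only; library choice and all defaults are assumptions, not the original code.
    import argparse
    import numpy as np
    import torch
    import torch.nn as nn
    from PIL import Image

    def build_mlp(hidden_layers=4, hidden_size=64):
        """Stack of Linear+ReLU layers from 2 inputs to 3 sigmoid outputs."""
        layers, in_dim = [], 2
        for _ in range(hidden_layers):
            layers += [nn.Linear(in_dim, hidden_size), nn.ReLU()]
            in_dim = hidden_size
        layers += [nn.Linear(in_dim, 3), nn.Sigmoid()]  # rgb in [0, 1]
        return nn.Sequential(*layers)

    def train(image_path, hidden_layers=4, hidden_size=64,
              steps=20000, checkpoint_every=1000):
        # Target image as float rgb in [0, 1].
        img = np.asarray(Image.open(image_path).convert("RGB"), dtype=np.float32) / 255.0
        h, w, _ = img.shape
        # One (x, y) coordinate pair per pixel, normalized to [-1, 1].
        ys, xs = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w), indexing="ij")
        coords = torch.tensor(np.stack([xs, ys], axis=-1).reshape(-1, 2), dtype=torch.float32)
        target = torch.tensor(img.reshape(-1, 3))

        net = build_mlp(hidden_layers, hidden_size)
        opt = torch.optim.Adam(net.parameters(), lr=1e-3)
        for step in range(1, steps + 1):
            pred = net(coords)                      # full-batch: every pixel each step
            loss = nn.functional.mse_loss(pred, target)
            opt.zero_grad()
            loss.backward()
            opt.step()
            if step % checkpoint_every == 0:
                # Render the current weights back into an image for comparison.
                out = (pred.detach().numpy().reshape(h, w, 3) * 255).astype(np.uint8)
                Image.fromarray(out).save(f"checkpoint_{step:06d}.png")
                print(f"step {step}: mse {loss.item():.6f}")

    if __name__ == "__main__":
        parser = argparse.ArgumentParser(description="Overfit a tiny MLP to one image.")
        parser.add_argument("image", help="path to the target image")
        parser.add_argument("--hidden-layers", type=int, default=4)
        parser.add_argument("--hidden-size", type=int, default=64)
        args = parser.parse_args()
        train(args.image, hidden_layers=args.hidden_layers, hidden_size=args.hidden_size)

Run as e.g. `python coordinate_fit.py favorite_photo.jpg --hidden-layers 4 --hidden-size 64` and watch the checkpoint PNGs sharpen over time. Note this trains on every pixel per step, so large photos will be slow on CPU; that matches the "a few days of CPU time" experience described above.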