tptacek 8 hours ago

So, wait: this is just based on taking the 40 best/most consistent Nano Banana outputs for a prompt to do pixel-art versions of isometric map tiles? That's all it takes to finetune Qwen to reliably generate tiles in exactly the same style?

Also, does someone have an intuition for how the "masking" process worked here to generate seamless tiles? I sort of grok it but not totally.

NAR8789 7 hours ago

I think the core idea in "masking" is to provide adjacent pixel art tiles as part of the input when rendering a new tile from photo reference. So part of the input is literal boundary conditions on the output for the new tile.

Reference image from the article: https://cannoneyed.com/img/projects/isometric-nyc/training_d...

You have to zoom in, but here the inputs on the left are mixed pixel art / photo textures. The outputs on the right are seamless pixel art.

Later on he talks about 2x2 squares of four tiles each as input and having trouble automating input selection to avoid seams. So with his 512x512 tiles, he's actually sending in 1024x1024 inputs. You can avoid seams if every new tile can "see" all its already-generated neighbors.
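A minimal sketch of that 2x2 input. It assumes tiles are stored as RGB pixel grids keyed by (col, row), and that the infill model takes an image plus a boolean mask where True means "regenerate these pixels". The names `build_infill_input`, `pixel_tiles` and `photo_tiles` are hypothetical, not from the article's code. Already-generated neighbors are copied in as fixed context, and the remaining cells get the photo reference.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]  # rows of RGB pixels


def build_infill_input(
    pixel_tiles: Dict[Tuple[int, int], Image],
    photo_tiles: Dict[Tuple[int, int], Image],
    top_left: Tuple[int, int],
    tile: int = 512,
) -> Tuple[Image, List[List[bool]]]:
    """Compose a 2x2 window of tiles into one (2*tile)x(2*tile) input.

    Hypothetical helper: cells that already have pixel art are fixed context
    (mask False); cells without it use the photo reference (mask True).
    """
    col0, row0 = top_left
    size = 2 * tile
    image: Image = [[(0, 0, 0)] * size for _ in range(size)]
    mask = [[False] * size for _ in range(size)]
    for dr in range(2):
        for dc in range(2):
            key = (col0 + dc, row0 + dr)
            done = key in pixel_tiles
            src = pixel_tiles[key] if done else photo_tiles[key]
            for y in range(tile):
                for x in range(tile):
                    image[dr * tile + y][dc * tile + x] = src[y][x]
                    # Only pixels from not-yet-generated tiles get regenerated.
                    mask[dr * tile + y][dc * tile + x] = not done
    return image, mask
```

With `tile=512` the composite comes out 1024x1024, which is where the 1024x1024 figure comes from.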

You get a seam if you generate a new tile next to an old tile but that old tile is not input to the infill algorithm. The new tile can't see that boundary, and the style will probably not match.
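That seam condition can be written as a check. This is an illustrative sketch with hypothetical names (`seam_risks`, tiles as (col, row) pairs): for every not-yet-generated tile in the input window, it lists generated neighbors that fall outside the window, i.e. edges the model never sees.

```python
from typing import Iterable, List, Set, Tuple

Tile = Tuple[int, int]  # (col, row)


def neighbors(t: Tile) -> List[Tile]:
    c, r = t
    return [(c - 1, r), (c + 1, r), (c, r - 1), (c, r + 1)]


def seam_risks(generated: Set[Tile], window: Iterable[Tile]) -> List[Tuple[Tile, Tile]]:
    """Return (new_tile, old_tile) pairs whose shared edge is not in the input.

    A new tile is one in the window that isn't generated yet. If one of its
    generated neighbors lies outside the window, that boundary is invisible
    to the model, so the new tile's edge there is unconstrained.
    """
    win = set(window)
    risks = []
    for t in sorted(win - generated):
        for n in neighbors(t):
            if n in generated and n not in win:
                risks.append((t, n))
    return risks
```

With `generated = {(0, 0), (1, 0), (2, 1)}`, a window covering columns 0-1 and rows 0-1 leaves only the edge between (1, 1) and (2, 1) unseen; shifting the window down a row to rows 1-2 adds the edges to (0, 0) and (1, 0) as well.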

cannoneyed 7 hours ago

That’s exactly right - the fine-tuned Qwen model was able to generate seamless pixels most of the time, but you can find lots of places around the map where it failed.

More interestingly, not even the biggest, smartest image models can tell whether a seam exists (likely due to the way they represent image tokens internally).

NAR8789 7 hours ago

I'm curious why you didn't do something like generate new tiles one at a time, but just expand the input area on the sides with already-generated neighbors. Looks like your infill model doesn't really care about tile sizes, and I doubt it really needs full adjacent tiles to match style. Why 2x2 tile inputs rather than say... generate new tiles one at a time, but add 50px of bordering tile on each side that already has a pixel art neighbor?
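A rough sketch of the alternative suggested here, assuming 512px tiles on a square grid. `context_box` is a hypothetical helper and 50px is just the figure from the comment. The crop grows only on the sides that already have pixel-art neighbors.

```python
from typing import Set, Tuple

Tile = Tuple[int, int]  # (col, row)
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), x1/y1 exclusive


def context_box(t: Tile, generated: Set[Tile], tile: int = 512, border: int = 50) -> Box:
    """Pixel box to send for one new tile: the tile itself, widened by
    `border` pixels on each side whose neighbor is already pixel art."""
    c, r = t
    x0, y0 = c * tile, r * tile
    x1, y1 = x0 + tile, y0 + tile
    if (c - 1, r) in generated:  # left neighbor done
        x0 -= border
    if (c + 1, r) in generated:  # right neighbor done
        x1 += border
    if (c, r - 1) in generated:  # top neighbor done
        y0 -= border
    if (c, r + 1) in generated:  # bottom neighbor done
        y1 += border
    return (x0, y0, x1, y1)
```

For tile (1, 1) with generated neighbors to the left and above, the box is (462, 462, 1024, 1024): a 562x562 input instead of 1024x1024. The trade-off, per the reply below, is that a thin border constrains the edges but not the overall color of the tile.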

cannoneyed 6 hours ago

Yeah, I actually did that quite a bit too. I didn't want to get too bogged down in the nitty-gritty of the tiling algorithm because it's actually quite difficult to communicate in writing (which probably contributed to it being hard to get AI to implement).

The issue is that the overall style was not consistent from tile to tile, so you'd see some drift, particularly in the color - and you can see it in quite a few places on the map because of this.

KolmogorovComp 2 hours ago

Have you tried constraining the color palette by post-processing?
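One way to read that suggestion, as a sketch rather than anything from the article: pick a fixed palette once, then snap every output pixel to its nearest palette entry so tiles can't drift to new colors. Plain squared RGB distance is an assumption here; a perceptual space such as CIELAB usually matches human judgment better.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]


def nearest(color: RGB, palette: Sequence[RGB]) -> RGB:
    """Palette entry with the smallest squared RGB distance to `color`."""
    return min(palette, key=lambda p: sum((a - b) ** 2 for a, b in zip(color, p)))


def quantize(image: List[List[RGB]], palette: Sequence[RGB]) -> List[List[RGB]]:
    """Snap every pixel to the shared palette so no tile can introduce new colors."""
    return [[nearest(px, palette) for px in row] for row in image]
```

For real images, Pillow's `Image.quantize` accepts a palette image and does the same job, but it applies Floyd-Steinberg dithering by default, which you would probably want to turn off for pixel art. Note that a palette limits which colors appear, not which entries a drifting tile picks, so it reduces drift rather than eliminating it.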

NAR8789 5 hours ago

Oh, that makes sense, thanks for explaining! And thanks for sharing your process and result! Interesting to see your process, and looking at the map really tickles my nostalgia.

polishdude20 6 hours ago

There would have to be some tiles which don't have all four neighbors generated yet.
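A small illustration of that point, assuming a simple raster (left-to-right, top-to-bottom) generation order, which may not be the order the author used; the function name is hypothetical. No tile ever has all four neighbors when it is generated (at most the left and top ones), yet every shared edge is still covered as long as the new tile's input includes its finished neighbors: whichever tile of a pair is generated second can see the one generated first.

```python
from typing import Dict, Set, Tuple

Tile = Tuple[int, int]  # (col, row)


def neighbors_done_at_generation(cols: int, rows: int) -> Dict[Tile, int]:
    """Generate tiles in raster order and record how many of each tile's
    four neighbors already existed at the moment it was generated."""
    done: Set[Tile] = set()
    seen: Dict[Tile, int] = {}
    for r in range(rows):
        for c in range(cols):
            nbrs = [(c - 1, r), (c + 1, r), (c, r - 1), (c, r + 1)]
            seen[(c, r)] = sum(n in done for n in nbrs)
            done.add((c, r))
    return seen
```

On a 3x3 grid the corner tile (0, 0) starts with no neighbors, and no tile ever sees more than two.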

abrookewood 2 hours ago

Thanks for this - I was confused as well. Makes perfect sense now.

pedrogpimenta 2 hours ago

So you don't grok it. You understand it, but you don't grok it. Respect the Martian :)

__mharrison__ 4 hours ago

Does anyone have a good reference for finetuning Qwen? This article opened my eyes a bit...

dimitri-vs 3 hours ago

The turn-key option is ostris ai-toolkit which has good tutorials on YT and can be run completely locally or via RunPod. Claude Code can set everything up for you (speaking from experience) and can even SSH into RunPod.

larodi 5 hours ago

You can tell it's diffusion from space. Sadly, it would normally take years to do it the conventional way, which is still the only correct way.