Yeah. I'm trying to figure out how to combat these inconsistencies. Right now, I have some manual overrides, but not sure it's sustainable to keep manually overriding inconsistent listings.

Any thoughts? Should I default to what's in the product title instead of the unit count? Not sure the best way to combat this.

▲ Propelloni 4 hours ago | parent | next [-]

Maybe you could build a heuristic around shipping weight? A single golf ball weighs about 45 to 50 g, so divide the shipping weight by, say, 50 g to account for boxing and so on and you get a rough estimate of the balls in the package.
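A minimal sketch of that estimate, assuming the shipping weight is already available in grams. The function name and the 50 g default are illustrative choices, not anything the listing data provides:

```python
def estimate_ball_count(shipping_weight_g: float, grams_per_ball: float = 50.0) -> int:
    """Rough ball count from shipping weight.

    The 50 g default pads the 45 to 50 g ball weight a little to
    absorb packaging; tune it against listings you trust.
    """
    if shipping_weight_g <= 0 or grams_per_ball <= 0:
        raise ValueError("weights must be positive")
    return round(shipping_weight_g / grams_per_ball)


# A 2.4 kg package comes out at roughly four dozen balls.
print(estimate_ball_count(2400))
```

The result is only a sanity check, so it is probably more useful for flagging listings whose stated count is far off than for setting the count directly.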

	▲ rockdiesel 4 hours ago | parent [-]
		Oh wow, that's an interesting approach. It never would have crossed my mind without posting this on HN. Appreciate the suggestion.

▲ fultonn 3 hours ago | parent | prev | next [-]

What I've done for a similar script in the past:

    answer_initial = llm(prompt=prompt, site=site)  # JSON with the answer plus anything needed for heuristic checks.
    heuristic_results = heuristics(answer_initial)  # Rule-based.
    answer_final = llm(prompt=prompt, site=site, answer=answer_initial)
    mark_for_review = ...  # Basically a bunch of hard-coded checks I add to flag possible failures for review.

You can use an extremely small/cheap model for something like this -- Granite 4.0 Micro works fine for me, and Granite 3.3 8B did as well; both run on my MacBook. YMMV / try different models and see how it goes.
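For the golf ball case, the rule-based step might look something like the sketch below. Everything here is an assumption for illustration: the `ball_count` JSON field, the flag names, the multiple-of-12 rule (based on the dozen-style listings in this thread), and the 50 g per ball figure borrowed from the shipping weight suggestion.

```python
import json
from typing import Optional

GRAMS_PER_BALL = 50.0  # assumption: rough per-ball weight including packaging


def heuristics(answer_json: str, shipping_weight_g: Optional[float] = None) -> dict:
    """Rule-based checks on the model's JSON answer (field names are hypothetical)."""
    try:
        count = json.loads(answer_json).get("ball_count")
    except (json.JSONDecodeError, AttributeError):
        # Not JSON at all, or JSON that is not an object.
        return {"ok": False, "flags": ["invalid_json"]}

    flags = []
    if isinstance(count, bool) or not isinstance(count, int) or count <= 0:
        flags.append("bad_count")
    else:
        # Assumption: these listings are sold by the dozen.
        if count % 12 != 0:
            flags.append("not_multiple_of_12")
        # Compare against the weight-based estimate when a weight is known.
        if shipping_weight_g:
            expected = shipping_weight_g / GRAMS_PER_BALL
            if abs(count - expected) > 0.25 * expected:
                flags.append("weight_mismatch")
    return {"ok": not flags, "flags": flags}
```

Any non-empty `flags` list is a reasonable candidate for the manual review pile.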

▲ datsci_est_2015 3 hours ago | parent | prev | next [-]

The funny thing is, if your method becomes the dominant way of price discovery, a bad actor will simply try to circumvent the system to get their product ordered first, and you’ll be embattled in a Cold War.

See also: toilet paper sheet count comparisons.

▲ tonygrue 4 hours ago | parent | prev | next [-]

You could make a list of all the metadata and pass it through an LLM to determine the quantity. You’ll need some sanity checking, but if you prompt it with some examples it will do well. (Done something very similar myself.)

▲ hluska 4 hours ago | parent | prev [-]

I’m not the person you replied to but I took a look at the data and this is an interesting one. You found a really cool data set and this will be fun.

Consider the top four most expensive golf balls on your current list:

TaylorMade 2021 TP5x (3+1 Box) 4DZ Golf Ball Pack, White — uses 4DZ in title, 48.0 in unit count in product specs.

Bridgestone Golf Tour B RXS Quadfecta - nothing in the title, unit count in product specs is 4.0. This one shows 4 dozen in a different spot than other balls.

TaylorMade Golf 2024 TP5 Golf Balls 3+1 Box Four Dozen — Four dozen in the title, unit count in product specs is 1.0 but it has 4.0 dozen in the same div as the Bridgestone balls.

Srixon Z Star Yellow Golf Balls - Buy 2 DZ Get 1 DZ Free — Title shows buy 2 DZ get 1 free. That’s represented as 2+1 or 3+1 in other data. In product specs it shows a unit count of 1.0.
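A rough sketch of pulling a count out of titles like the four above. The regex patterns and the `balls_from_title` name are hypothetical and only cover the wording seen in these examples; real listings will need more patterns:

```python
import re
from typing import Optional

WORD_NUMBERS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6}
NUM = r"(\d+|" + "|".join(WORD_NUMBERS) + r")"  # digits or a spelled-out number
DOZEN = r"(?:dz|doz|dozen)\b"


def _to_int(token: str) -> int:
    return WORD_NUMBERS[token] if token in WORD_NUMBERS else int(token)


def balls_from_title(title: str) -> Optional[int]:
    """Return the ball count implied by the title, or None if nothing matches."""
    text = title.lower()
    # "Buy 2 DZ Get 1 DZ Free" style promotions: add both parts.
    promo = re.search(rf"buy\s+{NUM}\s*{DOZEN}\s+get\s+{NUM}\s*{DOZEN}", text)
    if promo:
        return 12 * (_to_int(promo.group(1)) + _to_int(promo.group(2)))
    # Plain "4DZ" or "Four Dozen".
    plain = re.search(rf"\b{NUM}\s*{DOZEN}", text)
    if plain:
        return 12 * _to_int(plain.group(1))
    return None


print(balls_from_title("TaylorMade 2021 TP5x (3+1 Box) 4DZ Golf Ball Pack, White"))
```

Titles with no quantity wording, like the Bridgestone one, return `None` and still need another source such as the spec table or weight.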

In that extremely limited sample, product weight is a pretty good signal that the unit count is flawed, though that only works in comparison to other listings. I wonder if you could do a multi-pass approach: sort the data first, then do a unit count versus weight check to find outliers, and then start rocking through the titles? You’ll still end up digging through a lot of edge cases, and that won’t be much fun, but a multi-pass approach would at least give you some insight into those weird edge cases.
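One way the outlier pass could be sketched, assuming each listing is a dict with `title`, `weight_g`, and `unit_count` keys (hypothetical field names). It compares each listing's weight per unit against the median, so it only works if most listings have a sane unit count:

```python
from statistics import median


def find_unit_count_outliers(listings, tolerance=0.5):
    """Return titles whose weight per unit is far from the median, heaviest first.

    `tolerance` is the allowed relative deviation from the median
    (0.5 means 50%); the value is an arbitrary starting point.
    """
    per_unit = {
        item["title"]: item["weight_g"] / item["unit_count"]
        for item in listings
        if item["unit_count"] > 0
    }
    if not per_unit:
        return []
    typical = median(per_unit.values())
    outliers = [t for t, g in per_unit.items() if abs(g - typical) > tolerance * typical]
    return sorted(outliers, key=lambda t: per_unit[t], reverse=True)


# Made-up sample values, not real product data.
sample = [
    {"title": "a", "weight_g": 600, "unit_count": 12},
    {"title": "b", "weight_g": 2400, "unit_count": 48},
    {"title": "c", "weight_g": 1800, "unit_count": 1},
]
print(find_unit_count_outliers(sample))
```

The flagged titles are the ones worth checking by hand or with a title parser in the next pass.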

	▲ rockdiesel 3 hours ago | parent [-]
		I appreciate you taking a look. This product weight approach has me intrigued, and it's something I'll look into. I'm thinking I could just start with any listing where unit count = 1 and take a pass at those first. I haven't looked yet, but I'm guessing single unit counts are almost always inconsistent with the actual number of golf balls.