hluska 4 hours ago

I’m not the person you replied to but I took a look at the data and this is an interesting one. You found a really cool data set and this will be fun.

Consider the top four most expensive golf balls on your current list:

TaylorMade 2021 TP5x (3+1 Box) 4DZ Golf Ball Pack, White: uses 4DZ in the title; unit count in product specs is 48.0.

Bridgestone Golf Tour B RXS Quadfecta: no quantity in the title; unit count in product specs is 4.0. This one shows 4 dozen in a different spot than the other balls.

TaylorMade Golf 2024 TP5 Golf Balls 3+1 Box Four Dozen: "Four Dozen" in the title; unit count in product specs is 1.0, but it has 4.0 dozen in the same div as the Bridgestone balls.

Srixon Z Star Yellow Golf Balls - Buy 2 DZ Get 1 DZ Free: title shows buy 2 DZ get 1 free. That’s represented as 2+1 or 3+1 in other data. In product specs it shows a unit count of 1.0.

In that extremely limited sample, product weight is a pretty good signal that the unit count is flawed, though it only works in comparison to other listings. I wonder if you could do a multi-pass approach: sort the data first, then do a unit count versus weight check to find outliers, and then start rocking through the titles. You’ll still end up digging through a lot of edge cases, and that won’t be much fun, but a multi-pass approach would at least give you some insight into those weird edge cases.
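Here is a rough sketch of the weight check in Python, in case it helps. The rows are invented for illustration (I didn't copy real weights), the field names `unit_count` and `weight_g` are hypothetical, and the 50% tolerance is an arbitrary starting point. It compares each listing's weight per claimed unit against the median across all listings, then sorts the flagged rows by how far off they are, so it only works if most listings have a sane unit count.

```python
from statistics import median

# Hypothetical rows: the unit counts mimic the kinds of values seen above,
# but every weight here is invented for illustration.
listings = [
    {"title": "one dozen, count looks right", "unit_count": 12, "weight_g": 600},
    {"title": "one dozen, slightly heavier box", "unit_count": 12, "weight_g": 620},
    {"title": "four dozen, count in balls", "unit_count": 48, "weight_g": 2400},
    {"title": "four dozen, count says 1", "unit_count": 1, "weight_g": 2400},
    {"title": "four dozen, count in dozens", "unit_count": 4, "weight_g": 2350},
]


def flag_unit_count_outliers(rows, tolerance=0.5):
    """Return rows whose weight per claimed unit is far from the median,
    sorted so the most extreme mismatch comes first."""
    typical = median(r["weight_g"] / r["unit_count"] for r in rows)
    flagged = []
    for r in rows:
        per_unit = r["weight_g"] / r["unit_count"]
        deviation = abs(per_unit - typical) / typical
        if deviation > tolerance:
            flagged.append({**r, "per_unit_g": per_unit, "deviation": deviation})
    return sorted(flagged, key=lambda r: r["deviation"], reverse=True)


suspects = flag_unit_count_outliers(listings)
for row in suspects:
    print(row["title"], round(row["per_unit_g"], 1))
```

With this made-up sample, the two rows whose unit count is really a box or dozen count get flagged, and those are the ones whose titles you would then read by hand.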

rockdiesel 3 hours ago

I appreciate you taking a look. This product weight approach has me intrigued, and it's something I'll look into.

I'm thinking I could just start with any listing where unit count = 1 and take a pass at those first. I haven't looked yet, but I'm guessing single unit counts are almost always inconsistent with the actual number of golf balls.
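Something like this Python sketch is what I have in mind for that first pass. The titles and unit counts are the four from the comment above; the regexes and the `balls_from_title` helper are my own guesses at the dozen-style patterns, not anything the listings guarantee. It deliberately ignores the "3+1" style and returns `None` when the title has no dozen hint.

```python
import re

WORD_NUMBERS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6}
NUM = r"(\d+|" + "|".join(WORD_NUMBERS) + r")"
DOZEN = r"(?:dz|doz|dozens?)\b"

# "Buy 2 DZ Get 1 DZ Free" style: both quantities count toward the total.
BUY_GET = re.compile(rf"buy\s+{NUM}\s*{DOZEN}.*?get\s+{NUM}\s*{DOZEN}", re.I)
# "4DZ" or "Four Dozen" style.
PLAIN_DOZEN = re.compile(rf"\b{NUM}\s*{DOZEN}", re.I)


def _to_int(token):
    token = token.lower()
    return int(token) if token.isdigit() else WORD_NUMBERS[token]


def balls_from_title(title):
    """Guess the ball count from dozen-style hints in a title, or None."""
    m = BUY_GET.search(title)
    if m:
        return 12 * (_to_int(m.group(1)) + _to_int(m.group(2)))
    m = PLAIN_DOZEN.search(title)
    if m:
        return 12 * _to_int(m.group(1))
    return None


listings = [
    {"title": "TaylorMade 2021 TP5x (3+1 Box) 4DZ Golf Ball Pack, White", "unit_count": 48.0},
    {"title": "Bridgestone Golf Tour B RXS Quadfecta", "unit_count": 4.0},
    {"title": "TaylorMade Golf 2024 TP5 Golf Balls 3+1 Box Four Dozen", "unit_count": 1.0},
    {"title": "Srixon Z Star Yellow Golf Balls - Buy 2 DZ Get 1 DZ Free", "unit_count": 1.0},
]

# First pass: only listings whose unit count is 1, with a title-based guess.
first_pass = [
    (row["title"], balls_from_title(row["title"]))
    for row in listings
    if row["unit_count"] == 1
]
for title, guess in first_pass:
    print(guess, title)
```

Anything that comes back as `None` (the Bridgestone title would, if its unit count were 1) still needs a different signal, such as the weight check or the other spec field.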