I analyzed 571M Amazon reviews to find the most profanity-filled customer rants (burla-cloud.github.io)
64 points by jmp1062 3 days ago | 17 comments
vivzkestrel 3 days ago | parent | next [-]

- i saw your other comment that talks about using an open source dataset but i had to ask

- how would you actually go about loading reviews if you really wanted to

- what kind of system would you need to work around the captcha and stuff

jmp1062 3 days ago | parent [-]

i would probably use Playwright with custom code, create chunks based on similar products, then run it on a large cluster in parallel (https://github.com/Burla-Cloud/burla).

if you have a single worker trying to scrape a shit ton of products back to back to back you're going to get rate limited or their bot detection will catch you.
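
a rough sketch of the chunk-then-fan-out idea, not what i actually ran. the names here (`chunk_by_category`, `process_chunk`, `run_in_parallel`) are made up for illustration, the thread pool is just a local stand-in for a real cluster, and `process_chunk` is a placeholder where the per-page work (e.g. Playwright) would go:

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def chunk_by_category(products, chunk_size):
    """Group (product_id, category) pairs by category, then split each group into chunks."""
    groups = defaultdict(list)
    for product_id, category in products:
        groups[category].append(product_id)

    chunks = []
    for category, ids in groups.items():
        for start in range(0, len(ids), chunk_size):
            chunks.append((category, ids[start:start + chunk_size]))
    return chunks


def process_chunk(chunk):
    """Hypothetical worker: handles one chunk of similar products.

    Placeholder only. A real worker would load each page here and pace
    its requests; this version just tags each ID so the flow is visible.
    """
    category, ids = chunk
    return {product_id: f"fetched:{category}" for product_id in ids}


def run_in_parallel(chunks, max_workers=4):
    """Fan chunks out to workers (threads here, standing in for cluster nodes)."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input chunks
        for partial in pool.map(process_chunk, chunks):
            results.update(partial)
    return results


if __name__ == "__main__":
    products = [("A1", "books"), ("A2", "books"), ("A3", "books"), ("B1", "toys")]
    chunks = chunk_by_category(products, chunk_size=2)
    print(run_in_parallel(chunks))
```

the point of chunking is that each worker only handles a small batch, so no single worker is hammering products back to back.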

skyberrys 3 days ago | parent | prev | next [-]

Well the website is kind of useless, but it does suck me in. I love reading crazy reviews. The only thing that would make it better is if they also included Airbnb reviews.

The second review I read was a customer complaining about profanity in a movie and then writing out all the examples. Who has time for that?

jperryjperry 3 days ago | parent [-]

well well well... take a look at what I just built https://burla-cloud.github.io/airbnb-burla/

skyberrys 3 days ago | parent | next [-]

I must say the reviews you have are more horrifying than funny. My favorite funny (and bad) review was from a host who accused his guest of flipping over all the furniture in the house, and the guest was like "why and how would I do this". I still want to know what happened that day. How did all the furniture end up upside down?

skyberrys 3 days ago | parent | prev [-]

I love it! Endless entertainment and 0 attempts to get me to stay at the Airbnb.

jperryjperry 3 days ago | parent [-]

yeah now that I have the images I want to do some silly shit with them. maybe find all the Airbnbs with satanic decor or like red rooms haha

skyberrys 2 days ago | parent [-]

Find all of the ones with taxidermy in the southwest USA.... It's like all of them. Okay I did find one in Austin without, but it still had cowhide pillows.

reaperducer 2 days ago | parent | prev | next [-]

It does taste exactly like formaldehyde and kerosene swirled around with a bit of gas station kimchi that's been warming down the front of a hobo's pants. How do I know? To give this comment validity, I took the liberty of mixing up that exact concoction, then I went to down the train yard and asked a hobo to warm it up for me right on his swampy, fetid hobo taint.

Loved this until I remembered that these reviews are what AI is trained on and influenced by.

But at least he's employing hobos.

rendaw 2 days ago | parent | prev | next [-]

Amazon doesn't even allow you to use slightly strong (non-profanity) wording in reviews these days. Are these old reviews?

jperryjperry 2 days ago | parent [-]

from 2023

rawgabbit 3 days ago | parent | prev | next [-]

I love this. The reviews' wordplay tops Macbeth in my book.

jmp1062 3 days ago | parent [-]

i'm just happy they don't censor the comment section haha, makes for funny content.

i also love that people will complain about the vulgar language in a book or movie by writing a review that contains a quote with the vulgar language

mind_heist 3 days ago | parent | prev | next [-]

how did you scrape all the reviews?

jperryjperry 3 days ago | parent [-]

open source dataset from McAuley Lab at UCSD https://huggingface.co/datasets/McAuley-Lab/Amazon-Reviews-2....

I'm going to publish an Airbnb example tomorrow where I scraped 1,406,718 photo URLs from public listing pages. For that I used https://docs.burla.dev/ which is a high-performance parallel processing Python library I've been working on for a few years now.

add-sub-mul-div 3 days ago | parent | prev [-]

Shit like this is why Amazon reviews are now behind a login wall for everyone.

mandeepj 19 hours ago | parent [-]

I don’t think a login wall can stop scrapers