Remix.run Logo
dns_snek 6 days ago

Slightly tweaked, a leaderboard of em dash containing comments after ChatGPT release, limited to users who used them in fewer than 1% of comments before ChatGPT release, and who posted at least 200 comments before and after ChatGPT release. Data is recent (August 28th).

Of course this doesn't mean they're using ChatGPT either, they could've switched devices or started using them because they felt like it.

  #   user           before_chatgpt after_chatgpt  
  1   fao_           9/1777 (1 %)   36/225 (16 %)
  2   tlogan         1/962 (0 %)    59/399 (15 %)
  3   whynotminot    1/250 (0 %)    36/356 (10 %)
  4   unclebucknasty 13/2566 (1 %)  38/378 (10 %)
  5   iLemming       0/793 (0 %)    61/628 (10 %)
  6   nostrebored    10/1045 (1 %)  32/331 (10 %)
  7   freeone3000    0/2128 (0 %)   74/791 (9 %) 
  8   pdabbadabba    6/932 (1 %)    20/225 (9 %) 
  9   thebooktocome  4/632 (1 %)    18/208 (9 %) 
  10  tnecniv        0/671 (0 %)    34/446 (8 %) 
  11  dkersten       39/5092 (1 %)  24/318 (8 %) 
  12  stared         8/1565 (1 %)   29/392 (7 %) 
  13  ETH_start      3/385 (1 %)    75/1029 (7 %)
  14  tcbawo         2/792 (0 %)    15/218 (7 %) 
  15  jbm            2/406 (0 %)    22/350 (6 %) 
Query [2]:

  WITH by_user AS (
    SELECT
      `by` AS user,
      COUNTIF(text LIKE '%—%') AS match_count,
      COUNT(*) AS total_count,
      (timestamp >= '2022-11-30') AS after_chatgpt
    FROM `bigquery-public-data.hacker_news.full` 
    WHERE type = 'comment'
    GROUP BY user, after_chatgpt
  ),
  combined AS (
    SELECT
      user,
      MAX(IF(NOT after_chatgpt, match_count, 0)) AS match_before_chatgpt,
      MAX(IF(NOT after_chatgpt, total_count, 0)) AS total_before_chatgpt,
      MAX(IF(after_chatgpt, match_count, 0)) AS match_after_chatgpt,
      MAX(IF(after_chatgpt, total_count, 0)) AS total_after_chatgpt,
    FROM by_user
    GROUP BY user
    HAVING total_before_chatgpt >= 200 AND total_after_chatgpt >= 200
  ),
  with_fractions AS (
    SELECT
      *,
      SAFE_DIVIDE(match_before_chatgpt, total_before_chatgpt)  AS fraction_before_chatgpt,
      SAFE_DIVIDE(match_after_chatgpt, total_after_chatgpt) AS fraction_after_chatgpt
    FROM combined
  )
  SELECT
    user,
    FORMAT('%d/%d (%.0f %%)', match_before_chatgpt, total_before_chatgpt, ROUND(fraction_before_chatgpt*100)) AS before_chatgpt,
    FORMAT('%d/%d (%.0f %%)', match_after_chatgpt, total_after_chatgpt, ROUND(fraction_after_chatgpt*100)) AS after_chatgpt
  FROM with_fractions
  WHERE fraction_before_chatgpt < 0.01
  ORDER BY fraction_after_chatgpt DESC
  LIMIT 15
[1] https://news.ycombinator.com/item?id=45072937

[2] https://console.cloud.google.com/marketplace/product/y-combi...

owenversteeg 4 days ago | parent | next [-]

Out of curiosity, I browsed comments from some of those accounts. #5 has a ton of obvious LLM-generated comments. #2 has some. I didn't see any in the most recent comments from #7 and #10.

stavros 6 days ago | parent | prev | next [-]

I think for this one you should do absolute, rather than relative, increase. The first place went from 9 to 36 whereas second went from 1 to 59, the number of comments they wrote without ChatGPT hitting an emdash shouldn't be relevant, I think.

It does need some normalization for people who post very few comments, but it feels more fair this way.

nullc 6 days ago | parent | prev [-]

It's interesting that only two of them are zero before. Going from few to many is nowhere near the chatgpt using signal as going from zero to many... unless perhaps the few before were obviously from copy and pastes elsewhere.