gpm 3 days ago

Hmm, analytics appear to default to enabled: https://github.com/BloopAI/vibe-kanban/blob/609f9c4f9e989b59...

It is harvesting email addresses and github usernames: https://github.com/BloopAI/vibe-kanban/blob/609f9c4f9e989b59...

Then it seems to track every time you start/finish/merge/attempt a task, and every time you run a dev server. Including what executors you are using (I think this means "claude code" or the like), whether attempts succeeded or not and their exit codes, and various booleans like whether or not a project is an existing one, or whether or not you've set up scripts to run with it.

This really strikes me as something that should be, and in many jurisdictions legally must be, opt-in.

louiskw 3 days ago | parent | next [-]

That's fair feedback, I have a PR with a very clear opt-in here https://github.com/BloopAI/vibe-kanban/pull/146

I will leave this open for comments for the next hour and then merge.

TeMPOraL 3 days ago | parent | next [-]

Nice, I vote for merging it :).

It really doesn't hurt to be honest about this and ask up-front. This is clear enough and benign enough that I'd actually be happy to opt-in.

louiskw 3 days ago | parent | next [-]

Merged and building, thanks for bearing with us

gpm 3 days ago | parent | prev [-]

I concur :)

smcleod 3 days ago | parent | prev [-]

Good on you for taking action on this kind of feedback!

bn-l 3 days ago | parent | prev | next [-]

Thanks, really appreciate the heads up. I put devs who do this on a personal blacklist for life.

I also think this would be better as an MCP tool / resource. Let the model operate and query it as needed.

willsmith72 3 days ago | parent [-]

It's the email/username harvesting that you mean right? Or do people also have something against anonymised product analytics?

gpm 3 days ago | parent | next [-]

I have something against opt-out analytics over TCP/IP or UDP/IP period, because they aren't anonymized, they include an IP address by virtue of the protocol.

But my original complaint was definitely only about the email/username harvesting (and I'm not the person you initially responded to).

const_cast 3 days ago | parent | prev | next [-]

> anonymised product analytics?

They're not anonymous, they're just pseudo-anonymous. It's incredibly easy to collect pieces of data A thru Z that, on their own, are anonymous but, all together, are not. It's also incredibly easy to collect data that you think is generic but is actually not.

Do you query the screen size? I have bad news for you. But all of this is beside the point: when that data is exfiltrated to a third-party service, you have no idea how it's being used. You have a piece of paper, if you're lucky, telling you the privacy policy, which is usually "you have no privacy dumbass".

Even if data appears completely anonymous to humans, it can be ingested by machine learning algorithms that can spot patterns and de-anonymize the data.

I mean, we have companies whose entire business model is "how do we string together bits of data and tie it to real-world identity?": namely Google. Turns out it's remarkably easy when you have your hands in a lot of different pots. Collect a little anonymous data here, a little there, and boom: now you know that Billy Joe who lives on First Street loves to go to Walmart at 1 AM and buy Ben and Jerry's ice cream in a moment of weakness.
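
To make the "A thru Z" point concrete, here is a minimal sketch, assuming a made-up set of telemetry rows with no names, emails, or IDs. The field names (`screen`, `os`, `tz`) and the helper `unique_fraction` are hypothetical and not taken from any real analytics library. It counts how many rows are the only one with a given combination of values:

```python
from collections import Counter

# Hypothetical telemetry rows: no names, emails, or account IDs.
records = [
    {"screen": "1920x1080", "os": "linux", "tz": "UTC+1"},
    {"screen": "1920x1080", "os": "macos", "tz": "UTC+1"},
    {"screen": "1920x1080", "os": "linux", "tz": "UTC-5"},
    {"screen": "2560x1440", "os": "macos", "tz": "UTC-5"},
    {"screen": "2560x1440", "os": "linux", "tz": "UTC+1"},
    {"screen": "2560x1440", "os": "macos", "tz": "UTC+1"},
]


def unique_fraction(rows, fields):
    """Return the fraction of rows whose combination of `fields` occurs exactly once."""
    keys = [tuple(row[f] for f in fields) for row in rows]
    counts = Counter(keys)
    singletons = sum(1 for key in keys if counts[key] == 1)
    return singletons / len(rows)


# Each field on its own hides every row in a crowd of at least two...
for field in ("screen", "os", "tz"):
    print(field, unique_fraction(records, [field]))

# ...but the three fields together single out every row.
print("combined", unique_fraction(records, ["screen", "os", "tz"]))
```

In this toy data no single field identifies anyone, yet the combination is unique for every row. Real datasets are messier, but it is the same effect.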

sexeriy237 2 days ago | parent [-]

Ad agencies are using the contact tracing algorithms made for covid to track people.

adastra22 3 days ago | parent | prev [-]

Yes to both.

willsmith72 3 days ago | parent [-]

how do you build a product without analytics? how do you measure the success and failure of every change?

msgodel 3 days ago | parent | next [-]

Many users tend to be pretty vocal when changes break things they like, you don't need to spy on them for that. Mail readers > analytics frameworks.

willsmith72 3 days ago | parent [-]

"not breaking things they like" is a very low bar for building a great product

To be honest building things this way seems like such a competitive disadvantage I don't see how it could ever work at scale. Certainly all the big players are using them. If we shake our heads at the little players doing the same, we're just going to widen the moat

collingreen a day ago | parent | next [-]

Isn't that an argument against any piece of ethics? Am I missing something or are you arguing that gaining an advantage by being a bad actor means you shouldn't be a good actor because then you'd be at a disadvantage?

I get that I am making a general statement from your original narrow scope so correct me if I'm wrong that you mean THIS bad thing is fine but other bad things are still bad.

adastra22 2 days ago | parent | prev [-]

Spying on your users does not give better feedback than simply asking your users (surveys, focus groups) and responding to the considered comments you receive. Spying and trying to infer intent is such a low bar to improve upon.

willsmith72 2 days ago | parent | next [-]

> Spying on your users does not give better feedback than simply asking your users

If that's true, there are many companies paying thousands to hundreds of thousands unnecessarily. Why are they choosing to throw away their money?

msgodel a day ago | parent [-]

Companies blow money on bad ideas all the time. Middle managers love analytics because it lets them win internal arguments, not because it actually solves problems.

throwaway7783 2 days ago | parent | prev [-]

It is not an either/or. Surveys are almost always ignored. Micro improvements cannot be done with just surveys and asking users. Often users do not know how to describe a problem. Product analytics, if anonymized with opt-out, gives a pretty good picture of intent, especially in B2B software.

adastra22 2 days ago | parent [-]

Analytics cannot be anonymized.

throwaway7783 11 hours ago | parent [-]

Why?

adastra22 7 hours ago | parent [-]

Any complex dataset has enough revealing information as to make deanonymization possible. To truly muddy the waters enough to make such attempts impossible would require injecting enough noise as to make the analytics useless to learn from.

This is a fundamental property derived from information theory, but also confirmed time after time in practice: https://www.theguardian.com/technology/2019/jul/23/anonymise...

Data anonymization is a myth sold to politicians to whitewash data collection.

throwaway7783 6 hours ago | parent [-]

Sure, but that is broader than product analytics and applies to all data collection. The word I should have used is "pseudonymize". The goal for capturing product analytics is not to deanonymize but to understand usage trends/bottlenecks.

adastra22 3 days ago | parent | prev [-]

You know that generations of engineers built and sold products without spying on their users.

swyx 3 days ago | parent | prev | next [-]

could you point me to what jurisdictions require analytics opt in esp for open source devtools? thats not actually something ive seen as a legal requirement, more a community preference.

eg ok we all know about EU website cookie banners, but i am more ignorant about devtools/clis sending back telemetry. any actual laws cited here would update me significantly

47282847 2 days ago | parent | next [-]

GDPR is not about cookies but about privacy in general. It’s an easy read, and yes, it applies to software and telemetry as much as it applies to websites and cookies, and it applies to anyone providing services and tools to Europeans.

"Personal data is information that relates to an identified or identifiable individual. If you cannot directly identify an individual from that information, then you need to consider whether the individual is still identifiable. You should take into account the information you are processing together with all the means reasonably likely to be used by either you or any other person to identify that individual."

gpm 3 days ago | parent | prev [-]

I mean, you've labelled one big one already with the GDPR covering a significant fraction of the world - and unlike your average analytics "username and email address" sounds unquestionably identifying/personal information.

Where I live I think this would violate PIPEDA, the Canadian privacy law that covers all businesses that do business in any Canadian province/territory other than BC/Alberta/Quebec (which all have similar laws).

There's generally no exception in these for "open source devtools" - laws are typically still laws even if you release something for free. The Canadian version (though I don't think the GDPR does) has an exception for entirely non-commercial organizations, but Bloop AI appears to be a commercial organization so it wouldn't apply. It also contains an exception for business contact information - but as I understand it that is not interpreted broadly enough to cover random developers' email addresses just because they happen to be used for a potentially personal github account.

Disclaimer: Not a lawyer. You should probably consult a lawyer in the relevant jurisdiction (i.e. all of them) if it actually matters to you.

generalizations 3 days ago | parent [-]

> GDPR covering a significant fraction of the world

> privacy law that covers all business that do business in any Canadian province

A random group of people uploaded free software source code and said 'hey world, try this out'. I wish the GDPR and the PIPEDA the best of luck in keeping people from doing that. (Not to actually defend the telemetry, tbh that's kinda sleazy imo.)

gpm 3 days ago | parent [-]

I mean, those are merely the two countries' privacy laws I'm most familiar with. The general principle of "no you can't just steal people's personal information" is not something unique to the ~550 million people the laws I cited cover.

And the laws don't prevent you from uploading "random" software and saying "try this". They prevent you from uploading spyware and saying "try this". Edit: Nor does the Canadian one cover any random group of people, it covers commercial entities, which Bloop AI appears to be.

jjangkke 3 days ago | parent | prev | next [-]

analytics stuff is fine but the email harvesting/github username appears to be illegal especially if it's done without notifying the user?

great catch, many open source projects appear to be just an elaborate lead gen tool these days.

janoelze 3 days ago | parent | prev [-]

fork, task claude to remove all github dependence, build.

gpm 3 days ago | parent | next [-]

I did this locally to try it out :) Also stubbed out the telemetry and added jj support. "Personalizing" software like this is definitely one of LLMs' superpowers.

I'm not particularly inclined to publish it because I don't want to associate myself with a project harvesting emails like this.

BeetleB 3 days ago | parent | next [-]

> and added jj support

Please do the same for Aider :-)

https://github.com/Aider-AI/aider/issues/4250

gpm 3 days ago | parent [-]

Be the change you want to see! This is pretty close to a best case task for these models because it's a relatively direct "translation" of existing code.

There's a big difference between "something actually ready for use" and "claude hacked something together with bubblegum and duct tape that works on my system" though - doing it properly will probably take a bit of work.

janoelze 3 days ago | parent | prev [-]

yes, i was just doing/thinking the same, it was an interesting experience to sculpt a somewhat complex codebase to my needs in minutes.

hsbauauvhabzb 3 days ago | parent | prev [-]

Use a telemetry backed tool to remove telemetry from another telemetry backed tool?

TeMPOraL 3 days ago | parent | next [-]

There's telemetry you consent to, and telemetry you don't. Just because I'm fine with a tool like Claude Code collecting some telemetry, doesn't mean I'm fine with a different party collecting telemetry - and the two products being used together doesn't change it. It's not naive, it's simply my right.

janoelze 3 days ago | parent | prev [-]

it came to mind first, you're free to use whatever flavour of LLM f̶l̶o̶a̶t̶s̶ ̶y̶o̶u̶r̶ ̶b̶o̶a̶t̶ vibes your code.

hsbauauvhabzb 3 days ago | parent [-]

That doesn’t change the naïvety of the response.

collingreen a day ago | parent [-]

Your insult calling the commenter naive requires all telemetry from all sources to be the same.

I disagree with who is naive in this exchange.