vidarh 6 days ago

The media issue has nothing to do with the protocol itself, and everything to do with servers that choose to expire remote content.

The "subset of the replies" issue likewise isn't inherent, but it is somewhat more problematic because it requires everyone to behave in ways that make it work, e.g. pushing replies back to the origin server and regularly polling the origin for additional replies, and Mastodon itself is not great at this.

To the extent ActivityPub itself is affecting any of it, it's only in the sense that ActivityPub imposes very few constraints on implementations, and that leaves a lot of room for specific applications to behave in counter-productive ways.

styanax 6 days ago | parent [-]

A problem I've seen emerge on the Lemmy side of ActivityPub is that it relies on a hub-and-spoke model. In practice this has failed when the hub goes offline: all the spokes are left with independent copies of its last known state, which destroys the community's progress (you have to rebuild elsewhere, lose old content and references, etc.), if you can even regrow the community.

Mastodon has a similar problem, but worse, with content discovery: a user is not "seen" remotely by anyone until one remote person finds them and subscribes to their content explicitly, and that has to happen on every single remote instance. That is of course undesirable, but that's how ActivityPub is designed.

I don't believe in ForgeFed terms this matters as much as being able to search across the federated network for repos, etc., which I think is a key feature. Sure, issues and user accounts and whatnot matter too, but an AP-linked, ForgeFed-wide search would be insanely useful for users (though how to implement a "distributed search index" seems like a tough nut to crack).

vidarh 5 days ago | parent [-]

A search is pretty "easy" (doing it distributed is just more expensive in terms of resources than a single index, because you end up doing multiple searches in parallel and merging results); the main issue with search on the Mastodon side of things has been politics. That is, a lot of people like that discovery isn't as easy as searching. For subsets of the Fediverse where people actually agree search is a good thing, or where the software specifically indicates consent or not, it'd be fairly straightforward to provide.
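To make the "multiple searches in parallel and merging results" part concrete, here is a rough sketch. Everything in it is made up for illustration: the instance names, the in-memory `INSTANCES` data and the toy term-count scoring stand in for real per-instance search endpoints, which ActivityPub itself does not define.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-instance indexes: instance name -> list of (url, text).
INSTANCES = {
    "a.example": [
        ("https://a.example/1", "rust parser library"),
        ("https://a.example/2", "python web framework"),
    ],
    "b.example": [
        ("https://b.example/9", "parser combinators in rust"),
        ("https://a.example/1", "rust parser library"),  # federated copy
    ],
}


def search_instance(name, query):
    """Stand-in for a network call to one instance's search endpoint."""
    terms = query.lower().split()
    hits = []
    for url, text in INSTANCES[name]:
        words = text.lower().split()
        score = sum(words.count(term) for term in terms)
        if score:
            hits.append((score, url))
    return hits


def federated_search(query, limit=10):
    """Query every instance in parallel, then merge and de-duplicate."""
    with ThreadPoolExecutor() as pool:
        per_instance = list(
            pool.map(lambda name: search_instance(name, query), INSTANCES)
        )
    best = {}  # url -> best score seen on any instance
    for hits in per_instance:
        for score, url in hits:
            best[url] = max(score, best.get(url, 0))
    # Highest score first; ties broken by URL so the order is stable.
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return [url for url, _ in ranked[:limit]]
```

The extra cost is visible in the shape of the code: one request per instance instead of one index lookup, plus a merge step that has to cope with the same object showing up on several instances.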

For Lemmy the hub-and-spoke model is essentially intentional: groups "belong" to a specific instance. But there's nothing in ActivityPub that'd prevent a USENET-style model of groups either. There's nothing in ActivityPub that prevents an application where a collection is effectively open to writing by all, and that would then relay messages to a sufficient set of "downstream" instances.

It'd be interesting to have that as an alternative to the Lemmy approach - I think the two could live quite well side by side.

styanax 5 days ago | parent [-]

> because you end up doing multiple searches in parallel and merging results

This reminds me of the design model of SearX/SearXNG: instead of a distributed forge index, it would distribute the search endpoints of forge instances to facilitate the next steps you outline. It almost feels like a central coordinator, or maybe a CDN-like network of search proxies, would be needed to do the actual combining and filtering of results. Maybe it could fit under the Codeberg operational umbrella in some future plan.

In practice Nostr does this step on the client side: one subscribes to relays, then when querying for new content the client asks all relays, gets all the duplicate metadata, and filters it locally. That means heavy network use and battery drain on a handheld device, and Nostr bouncers have emerged for exactly this reason. A popular one is "Bostr"; it's easy to find examples run by random volunteers, but running one costs money (disk/CPU/RAM): https://bostr.azzamo.net/
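A minimal sketch of that client-side filtering, to show where the waste comes from. This is not the actual Nostr wire protocol: the relays are plain in-memory lists, and the only assumption taken from Nostr is that each event carries a unique `id` and a `created_at` timestamp.

```python
def fetch_from_relays(relays):
    """Collect events from every relay and drop duplicates on the client.

    `relays` is a hypothetical mapping of relay name -> list of event dicts.
    Returns (unique events, newest first; number of duplicate copies received).
    """
    seen = {}
    received = 0
    for events in relays.values():
        for event in events:
            received += 1
            # Keep the first copy of each event id; later copies are wasted
            # bandwidth that the client still had to download.
            seen.setdefault(event["id"], event)
    unique = sorted(seen.values(), key=lambda e: e["created_at"], reverse=True)
    return unique, received - len(unique)
```

A bouncer would run the same de-duplication on a server, so the handheld client only downloads each event once.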

vidarh 5 days ago | parent [-]

There are quite a lot of approaches you can take to reduce the cost of this, e.g. sharding by search term, so the number of shards hit for any specific search term is a subset of the total set.
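A rough sketch of sharding by search term, assuming a fixed number of shards and a simple hash-based placement; `NUM_SHARDS`, `shard_for` and `ShardedIndex` are made-up names, and the shards are in-process dicts where a real deployment would have separate servers.

```python
import hashlib

NUM_SHARDS = 8  # assumed fixed shard count, for illustration only


def shard_for(term):
    """Map a term to a shard using a stable hash (same result on every node)."""
    digest = hashlib.sha256(term.lower().encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS


def shards_for_query(query):
    """The set of shards a query has to touch: at most one per term."""
    return {shard_for(term) for term in query.split()}


class ShardedIndex:
    def __init__(self):
        # Each shard maps term -> set of document ids containing that term.
        self.shards = [dict() for _ in range(NUM_SHARDS)]

    def add(self, doc_id, text):
        for term in set(text.lower().split()):
            self.shards[shard_for(term)].setdefault(term, set()).add(doc_id)

    def search(self, query):
        """Return ids of documents containing every term in the query."""
        terms = query.lower().split()
        postings = [self.shards[shard_for(t)].get(t, set()) for t in terms]
        return set.intersection(*postings) if postings else set()
```

A two-term query touches at most two shards no matter how many shards exist. The trade-off is that indexing a document now writes to one shard per distinct term.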

You can also certainly broadly cache the "top" of the hit lists for very common searches, so you don't need to fan out unless you're doing less common searches or going beyond the first "page" of results.
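Roughly what that could look like, as a sketch: `FirstPageCache` and its `fan_out` callback are hypothetical names, and for simplicity this caches the first page of every query. A real version would presumably cache only popular queries and expire entries as the underlying content changes.

```python
class FirstPageCache:
    """Serve page 0 of a query from cache; fan out only on a miss or deeper page."""

    def __init__(self, fan_out, page_size=10):
        # fan_out(query, offset, limit) is the expensive distributed search.
        self.fan_out = fan_out
        self.page_size = page_size
        self.cache = {}
        self.fan_outs = 0  # how many times the expensive path was taken

    def search(self, query, page=0):
        # Normalise case and whitespace so trivially different queries share an entry.
        key = " ".join(query.lower().split())
        if page == 0 and key in self.cache:
            return self.cache[key]
        self.fan_outs += 1
        results = self.fan_out(key, page * self.page_size, self.page_size)
        if page == 0:
            self.cache[key] = results
        return results
```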