stabbles a day ago

For Python (or PyPI) this is easier, since their data is available on Google BigQuery [1], so you can just run

    SELECT * FROM `bigquery-public-data.pypi.distribution_metadata` ORDER BY length(version) DESC LIMIT 10
The winner is: https://pypi.org/project/elvisgogo/#history
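
Note that the query ranks rows by the length of the version string, not by the number of releases. A minimal Python sketch of the same ordering on an in-memory list is below; the rows are made-up sample data (not real PyPI records) and `longest_versions` is a hypothetical helper name.

```python
def longest_versions(rows, limit=10):
    """Return rows sorted by version-string length, longest first.

    Mirrors ORDER BY length(version) DESC LIMIT n on an in-memory list.
    """
    return sorted(rows, key=lambda row: len(row["version"]), reverse=True)[:limit]


# Made-up sample rows for illustration only (not real PyPI data).
sample = [
    {"name": "pkg-a", "version": "1.0"},
    {"name": "pkg-b", "version": "2024.1.15.123456789"},
    {"name": "pkg-c", "version": "0.10.2"},
]

top = longest_versions(sample, limit=2)
print([row["name"] for row in top])
```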

The package with the most versions still listed on PyPI is spanishconjugator [2], which consistently published ~240 releases per month between 2020 and 2024.

[1] https://console.cloud.google.com/bigquery?p=bigquery-public-...

[2] https://pypi.org/project/spanishconjugator/#history

Rygian a day ago | parent | next [-]

Regarding spanishconjugator, commit ec4cb98 has description "Remove automatic bumping of version".

Prior to that commit, a cron job would run the 'bumpVersion.yml' workflow four times a day, which in turn executed the bump2version Python module to increase the patch level. [0]

Edit: discussed here: https://github.com/Benedict-Carling/spanish-conjugator/issue...

[0] https://github.com/Benedict-Carling/spanish-conjugator/commi...
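
To get a feel for how quickly a scheduled patch bump accumulates releases, here is a rough sketch. It is not bump2version's implementation; `bump_patch` and `simulate_cron_bumps` are hypothetical helpers, and the starting version "2.3.0" is an assumption chosen for illustration.

```python
def bump_patch(version: str) -> str:
    """Increment the last (patch) component of a MAJOR.MINOR.PATCH string."""
    major, minor, patch = version.split(".")
    return f"{major}.{minor}.{int(patch) + 1}"


def simulate_cron_bumps(start: str, runs_per_day: int, days: int) -> str:
    """Apply one patch bump per scheduled run and return the final version."""
    version = start
    for _ in range(runs_per_day * days):
        version = bump_patch(version)
    return version


# Hypothetical starting point: four runs a day for 30 days.
print(simulate_cron_bumps("2.3.0", runs_per_day=4, days=30))  # 2.3.120
```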

dijksterhuis a day ago | parent [-]

i love the package owner’s response in that issue xD

breakingcups a day ago | parent | prev | next [-]

Tangential, but I've only heard about BigQuery from people being surprised with gargantuan bills for running one query on a public dataset. Is there a "safe" way to use it with a cost limit, for example?

abxyz a day ago | parent [-]

Yes, you can set price caps. The cost of a query is understandable ahead of time with the default pricing model ($6 per TB of data processed in a query). People usually get caught out by running expensive queries recursively. BigQuery is very cost effective and can be used safely.
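
As a back-of-the-envelope check, the arithmetic can be sketched as follows. The $6 per TB rate is taken from the comment above and may not match current pricing; treating a TB as 10**12 bytes is an assumption, and both helper names are hypothetical. The byte count would typically come from a dry run: in the google-cloud-bigquery Python client, `QueryJobConfig(dry_run=True)` reports `total_bytes_processed` without running the query, and `maximum_bytes_billed` can be set to make a single query fail rather than exceed a byte limit.

```python
BYTES_PER_TB = 10**12  # assumption: decimal terabytes; check the actual billing unit


def estimate_query_cost_usd(bytes_processed: int, usd_per_tb: float = 6.0) -> float:
    """Estimate on-demand cost from bytes scanned (rate is an assumption)."""
    return bytes_processed / BYTES_PER_TB * usd_per_tb


def within_budget(bytes_processed: int, budget_usd: float, usd_per_tb: float = 6.0) -> bool:
    """Return True if the estimated cost does not exceed the budget."""
    return estimate_query_cost_usd(bytes_processed, usd_per_tb) <= budget_usd


# A query scanning 50 GB versus one scanning 8 TB, against a $5 budget.
print(estimate_query_cost_usd(50 * 10**9))
print(within_budget(50 * 10**9, budget_usd=5.0))
print(within_budget(8 * 10**12, budget_usd=5.0))
```

A check like this only guards the queries that pass through it; it is a sketch of the estimate, not a substitute for account-level quotas.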

bangaladore an hour ago | parent | next [-]

Can you actually set "price caps"?

Most of the cloud services allow you to set alerts that are notorious for showing up after you've accidentally spent 50k USD. So even if you had a system that automated shutdown of services when getting the alert, you are SOL.

Bratmon a day ago | parent | prev | next [-]

You can tell someone has worked in the cloud for too long when they start to think of $6 per database query as a reasonable price.

lenkite a day ago | parent | next [-]

We really need to go back to on-premise. We have surrendered our autonomy to these megacorps and now are paying for it - quite literally in many cases.

morcus a day ago | parent | prev | next [-]

Surely most queries should process much less than 1 TB of data?

abxyz a day ago | parent | prev | next [-]

My 3TB, 41 billion row table costs pennies to query day to day. The billing is based on the data processed by the query, not the table size. I pay more for storage.

Aeolun a day ago | parent | prev [-]

Running ripgrep on my harddrive would cost me $48 at that price point.

Symbiote 19 hours ago | parent [-]

BigQuery data is stored (I assume) in column oriented files with indices, so a typical query reads only a tiny fraction of the stored data.

passivegains a day ago | parent | prev | next [-]

I decided my life could not possibly go on until I knew what "elvisgogo" does, so I downloaded the tarball and poked around. it's a pretty ordinary numpy + pandas + matplotlib project that makes graphs from csv. one line jumped out at me:

    str_0 = ['refractive_index','Na','Mg','Al','Si','K','Ca','Ba','Fe','Type']

the university of st. andrews has a laser named "elvis" that goes on a remote controlled submarine: https://www.st-andrews.ac.uk/~bds2/elvislaser.htm

I was hoping it'd be about go-go dancing to elvis music, but physics experiments on light in seawater is pretty cool too.

thesystemisbust a day ago | parent | prev | next [-]

You can also query for free at clickpy.clickhouse.com. If you click on any of the links on the visuals you can see the query used.

The underlying dataset is hosted at sql.clickhouse.com e.g. https://sql.clickhouse.com/?query=U0VMRUNUIGNvdW50KCkgICBGUk...

disclaimer: built this a while ago but we maintain this at clickhouse

oh and rubygems data is also there.

darkamaul a day ago | parent [-]

Here [0] is the partial query on the ClickHouse dataset, with different results due to a quota error [1].

[0] https://sql.clickhouse.com?query=U0VMRUNUIHByb2plY3QsIE1BWCh...

[1] Quota read limit exceeded. Results may be incomplete.

thesystemisbust 3 hours ago | parent [-]

We have materialized views (MVs) you can use to avoid this

https://sql.clickhouse.com/?query=U0VMRUNUIHByb2plY3QsIE1BWC...

takes 0.1s

n4r9 a day ago | parent | prev | next [-]

> spanishconjugator [2], which consistently published ~240 releases per month between 2020 and 2024

They also stopped updating major and minor versions after hitting 2.3 in Sept 2020. Would be interesting to hear the rationale behind the versioning strategy. Feels like you might as well use a datetimestamp for the version.
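
For what a datetime-based version might look like, here is a minimal sketch. The `YEAR.MONTH.DAY.HHMMSS` layout and the `timestamp_version` helper are assumptions for illustration, not anything the package uses; numeric components avoid leading zeros so the string survives version normalization unchanged.

```python
from datetime import datetime, timezone


def timestamp_version(moment: datetime) -> str:
    """Build a YEAR.MONTH.DAY.HHMMSS-style version from a timestamp, in UTC."""
    utc = moment.astimezone(timezone.utc)
    # Pack the time of day into one integer so later releases compare higher.
    time_part = utc.hour * 10000 + utc.minute * 100 + utc.second
    return f"{utc.year}.{utc.month}.{utc.day}.{time_part}"


# Hypothetical release time, chosen only for the example.
example = datetime(2020, 9, 1, 6, 30, 0, tzinfo=timezone.utc)
print(timestamp_version(example))  # 2020.9.1.63000
```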

0x500x79 a day ago | parent | prev [-]

deps.dev has a similar BigQuery dataset across a couple more languages if someone wanted to do analysis across the other ecosystems they support.