xg15 7 months ago

> For example, deleting 1 million rows in a single transaction is a textbook case of what not to do. Instead, splitting the operation into smaller batches, such as deleting 10,000 rows across 100 iterations, is far more effective.

Why do I as a user have to do that? Why can't the database implement batching internally and automatically transform my 1-million-rows query into an appropriate list of batched queries?

(Edit: Thanks a lot for the answers, that makes more sense - in particular the point that this would also lock one million rows at once)
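
For reference, the batching the quoted article describes usually looks something like this (a rough PostgreSQL-flavored sketch; the table, column, and batch size are made up):

  -- One big delete: a single transaction, every matching row locked at once
  DELETE FROM events WHERE created_at < now() - interval '90 days';

  -- Batched: run this repeatedly until it reports 0 rows deleted;
  -- each run is its own small transaction
  DELETE FROM events
  WHERE id IN (
    SELECT id FROM events
    WHERE created_at < now() - interval '90 days'
    LIMIT 10000
  );

Each batch commits independently and holds far fewer locks at a time, which is the trade-off the replies below get into.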

Nican 7 months ago | parent | next [-]

Transactional consistency / ACID guarantees.

Before you execute the query, you should be able to query any of the data, and after you execute it, none of the data should be visible. The mechanisms that make a database transactional are tricky.

Some databases, like CockroachDB, provide built-in TTL capabilities.

But also: if you are having to delete huge ranges of data and do not care about consistency, you are probably looking at an analytical workload, and there are databases better suited for that, like Clickhouse.
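
For the CockroachDB case, the row-level TTL mentioned above is configured roughly like this (a sketch from memory of CockroachDB's TTL docs; the table name and interval are made up, so check the documentation for exact options):

  -- Rows become eligible for background deletion 90 days after insert
  CREATE TABLE sessions (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
  ) WITH (ttl_expire_after = '90 days');

  -- Or added to an existing table
  ALTER TABLE sessions SET (ttl_expire_after = '90 days');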

tremon 7 months ago | parent [-]

> none of the data should be available

As written, that's not required. The data should not be retrievable by query, but ACID only specifies what happens at the client-server boundary, not what happens at the server-storage boundary (Durability prescribes that the server must persist the data, but not how to persist it). A database that implements DELETEs by only tombstoning the row and doesn't discard the data until the next re-index or vacuum operation would still be ACID-compliant.

asQuirreL 7 months ago | parent | prev | next [-]

One reason I can think of is that the database needs to maintain atomicity and isolation for the effects of any given operation (the A and I in ACID).

By manually batching the deletes, you are telling the database that the whole operation does not need to be atomic and other operations can see partial updates of it as they run. The database wouldn't be able to do that for every large delete without breaking its guarantees.

wruza 7 months ago | parent [-]

I think that the GP's comment can be reinterpreted as: why should this landmine exist at all, when databases could surface the issue explicitly in their syntax and documentation, for example:

  DELETE FROM t WHERE … BATCH 100
Which would simulate batched queries when called outside of a transaction. This would remove the need for the client to stay connected (or at least active) for the duration of this lengthy operation.

If DELETE is so special, provide special ways to manage it. Don't offload what is the database's responsibility onto a clueless user; it's a recipe for disaster. Replace DELETE with anything and it's still true.

  ALTER DATABASE d SET UNBATCHED DELETE LIMIT 500000
I know a guy (not me) who deleted rows from an OLTP table that served a country's worth of clients and put it down for two days. That is completely the database's fault. If its engine were properly designed for big data, it should have refused to run that on a table with gazillions of rows and suggested a proper way to do it.

cogman10 7 months ago | parent | next [-]

Rather than batching, I would want a "NO ROLLBACK DELETE" sort of command. The really expensive part of the delete is rewriting the records into the transaction log so that a cancel or crash can undo the delete.

If you've gone to the effort of batching things, you are still writing out those records; you are just giving the db a chance to clear them from the log between batches.

I'd like to save my ssds that heartache and instead allow the database to just delete.

In MSSQL, in some extreme circumstances, we've partitioned our tables specifically so we can use the 'TRUNCATE TABLE' command, because DELETE is just too expensive.

That operation can wipe gbs in seconds.
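
A rough T-SQL sketch of that pattern (object names and boundary values are made up; the partition scheme and the table's creation on it are omitted; since SQL Server 2016, TRUNCATE can target individual partitions):

  -- Partition by month so old data can be dropped wholesale
  CREATE PARTITION FUNCTION pf_audit_by_month (date)
    AS RANGE RIGHT FOR VALUES ('2024-01-01', '2024-02-01', '2024-03-01');

  -- Later: wipe the oldest partitions instead of deleting row by row
  TRUNCATE TABLE dbo.audit_log WITH (PARTITIONS (1 TO 2));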

mike_hearn 7 months ago | parent | next [-]

Yes, the commercial databases make it easier to handle this.

One simple way in Oracle is to take a table lock, copy the data you want to preserve out to a temporary table, truncate the target table, and copy the data back in.
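
Roughly, that sequence looks like this (an Oracle-flavored sketch; table names and the predicate are made up):

  -- Keep writers out. Note: each DDL statement below commits implicitly,
  -- so in practice a maintenance window (or re-acquiring the lock between
  -- steps) is needed.
  LOCK TABLE orders IN EXCLUSIVE MODE;

  -- Copy out only the rows that should survive
  CREATE TABLE orders_keep AS
    SELECT * FROM orders WHERE created_at >= DATE '2024-01-01';

  TRUNCATE TABLE orders;

  -- Copy the preserved rows back in
  INSERT /*+ APPEND */ INTO orders SELECT * FROM orders_keep;
  COMMIT;

  DROP TABLE orders_keep;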

Kinrany 7 months ago | parent | prev | next [-]

What happens when two transactions select and then delete two rows in the opposite order while requesting "no rollback"?

pixl97 7 months ago | parent | prev [-]

I would say it can unlink gbs in seconds. The data is still on the disk until it's trimmed or overwritten.

immibis 7 months ago | parent [-]

So why does it need to be copied into the WAL until vacuum runs?

And vacuum is not expected or required to be atomic, since it deletes data that was necessarily unreferenced anyway, so it also shouldn't need to copy the old data into WAL files.

mattashii 7 months ago | parent [-]

Many DBMSs with index-oriented storage (MySQL, Oracle, MSSQL) use undo logging for a transaction's MVCC: on deletion, the old version is put into that transaction's undo log and referred to as an old version of the record (or page, or ...), immediately freeing up space on the page for new data while the transaction is still going on. This is great for short transactions and record updates, as a page only has to hold one tuple version at a time, but it comes at the cost of having to write the tuples being removed into a log, just in case the transaction needs to roll back.

immibis 7 months ago | parent [-]

The space isn't immediately cleaned up because of Postgres's version-based MVCC. It should only need to record that it marked the row as deleted, and the vacuum shouldn't need to record anything because it isn't atomic.

mattashii 7 months ago | parent [-]

Yes, but that's in PostgreSQL, not in MSSQL or the other systems I described (and which the GPs seemed to be referring to).

magicalhippo 7 months ago | parent | prev [-]

You kinda have that already for certain databases[1] with DELETE TOP 100. We have a few cleanup tasks that just run that in a loop until zero rows are affected.

That said, I agree it would be nice to have a DELETE BATCH option to make it even easier.

[1]: https://learn.microsoft.com/en-us/sql/t-sql/statements/delet...
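
The loop being described is roughly this in T-SQL (a sketch; the table, batch size, and retention predicate are made up):

  DECLARE @rows int = 1;
  WHILE @rows > 0
  BEGIN
      -- Each iteration is its own small transaction
      DELETE TOP (10000) FROM dbo.audit_log
      WHERE created_at < DATEADD(day, -90, SYSUTCDATETIME());

      SET @rows = @@ROWCOUNT;
  END;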

karrieberrie553 7 months ago | parent | prev | next [-]

Because they mean different things. Deleting all records in a SINGLE transaction means something different from deleting them in many smaller transactions. Even if for most use cases you want the latter, it may be the case that you need to lock the DB down to perform the deletions in a single step.

martinsnow 7 months ago | parent | prev | next [-]

Admin panels may do that. Say, for example, deleting a bucket with a million rows.

In that case I'd fire off a job and tell the user, yeah sure the delete is successful. But check the logs to be sure.

dotancohen 7 months ago | parent [-]

I show these as pending.

No message (other than an INFO log message) when the operation completes successfully, but failures are handled and notified properly.

sureglymop 7 months ago | parent | prev [-]

And a follow-up question: would the current best way to handle this be to "mark records as deletable" and then do the batched deletion operations when convenient?

rawgabbit 7 months ago | parent [-]

Create a column called MarkedForDeletion. Create a job that runs in the off hours and checks how many locks are present on the table; if the count is low, delete X records, else wait Y minutes. Put this in a loop, and if an error is detected, break out of the loop.
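
A rough T-SQL sketch of such a job (all names and thresholds are illustrative, and sys.dm_tran_locks is used here only as a crude contention signal):

  DECLARE @deleted int = 1;
  WHILE @deleted > 0
  BEGIN
      -- Crude contention check: count object-level locks on the table
      IF (SELECT COUNT(*) FROM sys.dm_tran_locks
          WHERE resource_type = 'OBJECT'
            AND resource_associated_entity_id = OBJECT_ID('dbo.orders')) > 10
      BEGIN
          WAITFOR DELAY '00:05:00';  -- wait Y minutes and re-check
          CONTINUE;
      END;

      BEGIN TRY
          DELETE TOP (5000) FROM dbo.orders   -- delete X records per pass
          WHERE MarkedForDeletion = 1;
          SET @deleted = @@ROWCOUNT;
      END TRY
      BEGIN CATCH
          BREAK;  -- error detected: break out of the loop
      END CATCH;
  END;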