Delete Data

Use the Python client to delete data from a Synnax cluster.

The Synnax client can delete time ranges of data from any channel. Once a deletion completes, all future reads no longer include the deleted data. However, it may take a while before the underlying file sizes decrease: the server handles delete requests quickly and only reclaims the space held by the unwanted data when its load is low.

Note the difference between deleting data and deleting a channel. Once a channel is deleted, it no longer exists. When some data in a channel is deleted, the channel remains: we can write over that time range with new data or delete more data from it. Even if all of a channel's data is deleted, the channel is still in the database, albeit empty.

Deleting Data From a Channel

The delete method of the client deletes data (not to be confused with the delete method of the Channel class, which deletes channels). To delete a chunk of data, pass in the channel name(s) or key(s) and the time range to delete. As throughout Synnax, a time range is start-inclusive and end-exclusive, i.e. data at the start timestamp is deleted and data at the end timestamp is not.

For example, to remove data in the range [00:01, 00:03) on the my_index_timestamps and my_precise_tc channels:

import synnax as sy
client = sy.Synnax(...)

# my_index_timestamps and my_precise_tc are two channels containing data.
client.delete(
    ["my_index_timestamps", "my_precise_tc"],
    sy.TimeStamp(1 * sy.TimeSpan.SECOND).range(sy.TimeStamp(3 * sy.TimeSpan.SECOND))
)
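The start-inclusive, end-exclusive rule can be illustrated with a small stand-alone model. This is only a sketch of the boundary behavior described above, not Synnax code: `delete_range` and the `samples` dictionary are hypothetical, and timestamps are plain integers in seconds.

```python
def delete_range(samples, start, end):
    """Return the samples that survive deleting the half-open range [start, end)."""
    return {ts: value for ts, value in samples.items() if not (start <= ts < end)}


# Hypothetical samples keyed by timestamp in whole seconds.
samples = {0: 10.0, 1: 11.0, 2: 12.0, 3: 13.0, 4: 14.0}

remaining = delete_range(samples, 1, 3)
print(sorted(remaining))  # [0, 3, 4]
```

The sample at second 1 (the start) is removed, while the sample at second 3 (the end) is kept.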

Deleting by channel name(s) deletes data in all channels with the given name(s). Prefer deleting by key to prevent accidental deletion!

Note that delete is idempotent: repeating a call deletes no additional data, and consecutive calls to delete on overlapping time ranges are allowed:

# Same range as the earlier call: no additional data is deleted.
client.delete(
    ["my_index_timestamps", "my_precise_tc"],
    sy.TimeStamp(1 * sy.TimeSpan.SECOND).range(sy.TimeStamp(3 * sy.TimeSpan.SECOND))
)

# Overlapping range: data in [00:01, 00:10) is now deleted.
client.delete(
    ["my_index_timestamps", "my_precise_tc"],
    sy.TimeStamp(1 * sy.TimeSpan.SECOND).range(sy.TimeStamp(10 * sy.TimeSpan.SECOND))
)
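One way to reason about repeated and overlapping deletes is to treat the deleted region as a union of half-open ranges. The sketch below is a mental model only, not how Synnax tracks deletions internally; `merge_ranges` is a hypothetical helper and ranges are `(start, end)` tuples in seconds.

```python
def merge_ranges(ranges):
    """Merge half-open (start, end) ranges into a minimal sorted list."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# The three delete calls above, expressed as ranges in seconds.
print(merge_ranges([(1, 3), (1, 3), (1, 10)]))  # [(1, 10)]
```

Repeating the [00:01, 00:03) delete changes nothing, and the wider call extends the deleted region to [00:01, 00:10).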

Limitations of Deletions

In some situations, delete raises an error. If any of the channel keys or names do not exist in the database, the entire delete operation fails, no data is deleted, and a NotFound error is raised:

# Suppose 111 and 112 are keys of channels that exist and 113 is not.
# Since 113 does not exist, none of these channels' data is deleted.
client.delete([111, 112, 113], time_range_to_delete)

In the case where a requested channel is not found, delete is atomic: no data is deleted and the operation fails. In all other cases, delete is not atomic: a failure to delete data in one channel halts the entire operation and raises an error immediately, so data may already have been deleted from channels processed before the failure.
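Because a failure outside the not-found case can leave a call partially applied, it can help to know exactly which channels were processed. The following is a minimal sketch of one possible approach, assuming only that a function behaves like client.delete(channels, time_range); `delete_in_groups` and `fake_delete` are hypothetical names, and `fake_delete` stands in for the real client so the example runs without a cluster.

```python
def delete_in_groups(delete_fn, groups, time_range):
    """Call delete_fn once per group of channels and record the outcome of each call.

    delete_fn is expected to behave like client.delete(channels, time_range).
    """
    succeeded, failed = [], []
    for group in groups:
        try:
            delete_fn(group, time_range)
        except Exception as exc:  # assumption: narrow this to the client's error types
            failed.append((group, exc))
        else:
            succeeded.append(group)
    return succeeded, failed


# Stand-in for client.delete: fails for the hypothetical key 113, succeeds otherwise.
def fake_delete(channels, time_range):
    if 113 in channels:
        raise LookupError("channel 113 not found")


succeeded, failed = delete_in_groups(fake_delete, [[111], [112], [113]], (1, 3))
print(succeeded)                       # [[111], [112]]
print([group for group, _ in failed])  # [[113]]
```

Splitting a request this way gives up the all-or-nothing behavior for missing channels, since each group is a separate call. Channels that must be deleted together, such as an index and the channels that depend on it, should stay in the same group.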

In addition, an error is raised if delete is called on an index channel while other channels depend on it for data in the requested time range:

# If my_precise_tc is indexed by my_index_timestamps from 1 second to 3 seconds,
# we cannot delete my_index_timestamps. This call raises an error.
client.delete(
    ["my_index_timestamps"],
    sy.TimeStamp(1 * sy.TimeSpan.SECOND).range(sy.TimeStamp(3 * sy.TimeSpan.SECOND))
)

# If we delete my_precise_tc, the dependent, at the same time as my_index_timestamps,
# no errors are raised.
client.delete(
    ["my_precise_tc", "my_index_timestamps"],
    sy.TimeStamp(1 * sy.TimeSpan.SECOND).range(sy.TimeStamp(3 * sy.TimeSpan.SECOND))
)
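If index channels are deleted often, the list of channels can be expanded with their dependents before calling delete. The sketch below assumes the caller maintains a mapping from index channel names to the channels they index; that mapping and the `with_dependents` helper are hypothetical and not part of the Synnax client.

```python
def with_dependents(channels, dependents_by_index):
    """Return channels plus any dependents, keeping order and dropping duplicates."""
    expanded = []
    for name in channels:
        # List each index channel's dependents alongside the index itself.
        for candidate in [*dependents_by_index.get(name, []), name]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


# Hypothetical mapping maintained by the caller.
dependents_by_index = {"my_index_timestamps": ["my_precise_tc"]}

print(with_dependents(["my_index_timestamps"], dependents_by_index))
# ['my_precise_tc', 'my_index_timestamps']
```

The resulting list can then be passed to delete in a single call, as in the second example above.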

Last but not least, delete raises an error when called on any channel that has an open writer whose start time is before the time range being deleted. This ensures that the writer and the deleter do not contend over data in the same region.

w = client.open_writer(
    start=sy.TimeStamp(10 * sy.TimeSpan.SECOND),
    channels=["my_precise_tc"],
)

# Error raised: the writer start (00:10) is before the time range being deleted, [00:12, 00:30).
client.delete(
    ["my_precise_tc"],
    sy.TimeStamp(12 * sy.TimeSpan.SECOND).range(sy.TimeStamp(30 * sy.TimeSpan.SECOND))
)

Once all writers that start before the time range being deleted are closed, calls to delete proceed normally.