It’s because you’re looking at it from opposite ends.
From the perspective of the data source, in a streaming context, the size is finite — it’s whatever you’re sending. From the data sink’s perspective, it’s unknown how many records are going to get sent in total.
Conversely, in a batch context, the data source has no idea how many records will eventually be requested, but the data sink knows exactly how big its request is.
That is, whoever is initiating the job knows what’s up, and whoever is targeted just has to deal with it.
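To make the asymmetry concrete, here is a minimal Python sketch. It assumes an in-memory list stands in for the underlying records, and the function names are hypothetical. In the batch case the sink passes the size it wants. In the streaming case the sink only gets an iterator and can count records as they arrive.

```python
from typing import Iterator, List


def batch_request(source: List[int], offset: int, size: int) -> List[int]:
    # Batch (pull): the sink initiates the job and picks `size`, so it knows
    # exactly how many records it asked for. The source just serves the slice
    # and has no idea how many more requests will follow.
    return source[offset:offset + size]


def stream_records(source: List[int]) -> Iterator[int]:
    # Streaming (push): the source initiates and knows what it is sending.
    # The sink only ever sees an iterator, which has no len().
    for record in source:
        yield record


def consume_stream(records: Iterator[int]) -> int:
    # The sink can only count what has arrived so far. The total is unknown
    # until the iterator ends, which for a real stream may be never.
    seen = 0
    for _ in records:
        seen += 1
    return seen
```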
But generally I believe the norm is to discuss it from the sink’s perspective, because the interesting problem is when the sink has to deal with infinity (streaming). When the source deals with infinity (batch), it’s fairly straightforward to manage: refuse requests that are too large and move on. The data isn’t going anywhere, so the sink can fix itself and re-request. Try that with streaming and data starts getting lost, since records keep arriving whether or not the sink is ready for them.
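A minimal Python sketch of that contrast, assuming a pull-based batch source with a fixed per-request limit and a push-based streaming sink with a fixed-size buffer. All names here (`BatchSource`, `batch_sink_read_all`, `StreamSink`, `RequestTooLarge`), the limit of 100, and the halve-and-retry policy are hypothetical illustrations, not any particular framework’s API.

```python
from collections import deque
from typing import Deque, List


class RequestTooLarge(Exception):
    """Raised by the source when a pull request exceeds its limit."""


class BatchSource:
    """Pull-based source: the data stays at rest until a sink asks for it."""

    def __init__(self, records: List[int], max_batch: int = 100) -> None:
        self.records = records
        self.max_batch = max_batch  # hypothetical per-request limit

    def fetch(self, offset: int, size: int) -> List[int]:
        # Refusing is cheap: nothing is lost, the records are still here.
        if size > self.max_batch:
            raise RequestTooLarge(f"requested {size}, limit is {self.max_batch}")
        return self.records[offset:offset + size]


def batch_sink_read_all(source: BatchSource, total: int, size: int) -> List[int]:
    # The sink knows `total` because it initiated the job. When a request is
    # refused, it halves the request size and asks again (one possible policy).
    out: List[int] = []
    offset = 0
    while offset < total:
        try:
            chunk = source.fetch(offset, min(size, total - offset))
        except RequestTooLarge:
            size = max(1, size // 2)
            continue
        if not chunk:  # the source ran out of records early
            break
        out.extend(chunk)
        offset += len(chunk)
    return out


class StreamSink:
    """Push-based sink with a bounded buffer; the source never waits for it."""

    def __init__(self, capacity: int) -> None:
        self.buffer: Deque[int] = deque()
        self.capacity = capacity
        self.dropped = 0

    def receive(self, record: int) -> None:
        # There is no "refuse and re-request" here: if the buffer is full,
        # the record is simply lost.
        if len(self.buffer) >= self.capacity:
            self.dropped += 1
        else:
            self.buffer.append(record)
```

For example, `batch_sink_read_all(BatchSource(list(range(250))), 250, 1000)` gets refused several times, settles on chunks of at most 62, and still returns all 250 records. Pushing 25 records into `StreamSink(capacity=10)` with nothing draining it keeps the first 10 and leaves `dropped == 15`.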