From: "zverok (Victor Shepelev)"
Date: 2021-08-30T11:29:38+00:00
Subject: [ruby-core:105092] [Ruby master Feature#18136] take_while_after

Issue #18136 has been updated by zverok (Victor Shepelev).

```ruby
(0..).lazy
  .map { |offset| get_page(offset, limit) }
  .take_while_after { |response| response.count == limit }
  # the last will have, say, 10 items, but should still be included!
  .map { |response| process(response) }
```

> Couldn't this be written with `.take_while { |response| response.count > 0 }`

No. The target API (the real one we are currently working with in production, and an API of a well-known service!) has "(fewer than limit) results" as the only designation of the last page. It is not the only one doing so. And relying on the fact that the _next_ page would be empty is a) unwanted (we make at least **one extra request**) and b) does not always work: some of these APIs throw errors on an attempt to access the "page after last", and at least one other _repeats the last page_ (10 items in my example) infinitely.

> The problem may also be that the overall interface doesn't seem to be designed very well. Marking the last piece of data as special looks wrong; introducing a next object that is marked as not being part of data seems much more appropriate.

From a high theoretical point of view, that might be right (though even this can be argued against: why make an additional HTTP call to fetch an empty "designates end" object, when we can pass "there is no more" along with the data? I believe MOST pagination APIs work this way!). Anyway, there are still a lot of APIs/data structures in the wild that explicitly use the "the last object also says it is last" approach. And without `take_while_after` there is no way to represent them as a Ruby enumerator.
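The proposed behavior can be sketched in a few lines of plain Ruby. This is only an illustration of the intended semantics (take elements while the block is truthy, plus the first element for which it fails), not an actual or accepted implementation; the Enumerator-based approach here is my own choice:

```ruby
# Illustrative sketch of the proposed semantics; not part of Ruby.
module Enumerable
  def take_while_after
    Enumerator.new do |y|
      each do |item|
        y << item                # always yield the current item...
        break unless yield(item) # ...but stop right after the first failure
      end
    end
  end
end

# The element where the condition first fails (4) is still included:
[1, 2, 3, 4, 5].take_while_after { |n| n < 4 }.to_a
# => [1, 2, 3, 4]
```

Because the sketch returns a lazy `Enumerator`, it composes with infinite sources the same way `take_while` does.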
----------------------------------------
Feature #18136: take_while_after
https://bugs.ruby-lang.org/issues/18136#change-93501

* Author: zverok (Victor Shepelev)
* Status: Open
* Priority: Normal
----------------------------------------
Sorry, I already tried this once (#16441), but I failed to produce a persuasive example. So I am back with a couple of them, much simpler and clearer than my initial ones.

**The proposal itself:** Have `take_while_after`, which behaves like `take_while` but also includes the last element (the first one where the condition failed).

Reason: there are a lot of cases where "the last good item" in an enumeration is the distinctive one (the one where enumeration should stop, but the item itself is still good).

**Example 1:** Take pages from a paginated API, where the last page has fewer items than the rest (and that's how we know it is the last):

```ruby
(0..).lazy
  .map { |offset| get_page(offset, limit) }
  .take_while_after { |response| response.count == limit }
  # the last will have, say, 10 items, but should still be included!
  .map { |response| process(response) }
```

**Example 2:** Same as above, but "we should continue pagination" is specified with a separate data key `can_continue`:

```ruby
(0..).lazy
  .map { |offset| get_page(offset, limit) }
  .take_while_after { |response| response['can_continue'] }
  # the last will have can_continue=false, but still has data
  .map { |response| process(response) }
```

**Example 3:** Taking a sentence from a list of tokens like this:

```ruby
tokens = [
  {text: 'Ruby', type: :word},
  {text: 'is', type: :word},
  {text: 'cool', type: :word},
  {text: '.', type: :punctuation, ends_sentence: true},
  {text: 'Rust', type: :word},
  # ...
]

sentence = tokens.take_while_after { !_1[:ends_sentence] }
```

(I can provide more if necessary!)
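To make the gap concrete, here is a self-contained simulation of Example 1 on stock Ruby. `get_page` is a stub I invented for illustration (it serves 25 fake records in pages of 10); with plain `take_while`, the short final page is silently dropped:

```ruby
# Stub standing in for the real paginated API (illustrative only).
LIMIT = 10
DATA = (1..25).to_a

def get_page(offset, limit)
  DATA[offset * limit, limit] || []
end

# Plain take_while stops at the first page that fails the condition,
# so the final 5-item page is lost:
pages = (0..).lazy
  .map { |offset| get_page(offset, LIMIT) }
  .take_while { |page| page.count == LIMIT }
  .to_a

pages.map(&:count)
# => [10, 10]   -- the last 5 records never reach the pipeline
```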
Neither of these can be solved by "using `take_while` with the proper condition", as @matz suggested here: https://bugs.ruby-lang.org/issues/16441#note-9

I typically solve it with `slice_after { condition }.first`, but that is a) uglier and b) greedy when working with a lazy enumerator (so for the API examples, all paginated pages would be fetched at once, and only then processed).

Another consideration in #16441 was the unfortunate naming. I am leaving that to discussion, though I tend to like `#take_upto` from #16446.
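For reference, the `slice_after { condition }.first` workaround applied to Example 3's token data runs on stock Ruby today and keeps the sentence-ending token (though, as noted above, it is greedy when the source is a lazy enumerator):

```ruby
tokens = [
  {text: 'Ruby', type: :word},
  {text: 'is', type: :word},
  {text: 'cool', type: :word},
  {text: '.', type: :punctuation, ends_sentence: true},
  {text: 'Rust', type: :word}
]

# slice_after cuts the sequence *after* each element matching the block,
# so the first slice is the whole first sentence, terminator included:
sentence = tokens.slice_after { |t| t[:ends_sentence] }.first
sentence.map { |t| t[:text] }
# => ["Ruby", "is", "cool", "."]
```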