From: duerst@...
Date: 2019-02-06T09:35:34+00:00
Subject: [ruby-core:91421] [Ruby trunk Feature#15580] Proposal: method addition to class String called .indices ( String#indices )

Issue #15580 has been updated by duerst (Martin Dürst).

Just a quick question: Should the results include overlaps or not? I.e. is it `'abababa'.indices('aba') # => [0, 2, 4]` or is it just `'abababa'.indices('aba') # => [0, 4]`?

----------------------------------------
Feature #15580: Proposal: method addition to class String called .indices ( String#indices )
https://bugs.ruby-lang.org/issues/15580#change-76681

* Author: shevegen (Robert A. Heiler)
* Status: Open
* Priority: Normal
* Assignee:
* Target version:
----------------------------------------
Hello,

I am not sure whether this proposal has a realistic chance of being added to Ruby, but I think it is ok to suggest it nonetheless and let matz and the core team decide whether it would be a useful addition to ruby (at least a bit) or whether it is not necessary. Also, I am trying to learn from sawa on the issue tracker here, making useful suggestions. :)

I propose to add the following **new method** to **class String** directly:

    String#indices

This would behave similarly to String#index in that it returns the position of a substring, but rather than returning a single number or nil, it should **return an Array** of all positions at which the substring matches in the main (target) String. If no match is found, nil should be returned, similar to String#index.

(It may be possible to extend String#index to provide this functionality, but I do not want to get into the problem of backwards compatibility; and #indices seems to make more sense to me when reading it than #index, since the intent is a different one - hence why I suggest this new method addition.)

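To make the proposed behaviour and the overlap question concrete, here is a minimal sketch built on the existing String#index with a start offset. The helper name `string_indices` and the `overlap:` keyword are assumptions made for illustration only; nothing like this exists in Ruby core, and the issue does not settle which overlap behaviour is intended.

```ruby
# Hypothetical helper illustrating the proposal; not part of Ruby core.
# Returns an Array of all match positions, or nil when there is no match
# (mirroring how String#index returns nil).
def string_indices(str, substring, overlap: true)
  positions = []
  # Step by 1 to allow overlapping matches, otherwise skip past the match.
  # The max guard keeps the loop advancing for an empty substring.
  step = overlap ? 1 : [substring.length, 1].max
  start = 0
  while (found = str.index(substring, start))
    positions << found
    start = found + step
  end
  positions.empty? ? nil : positions
end

p string_indices('abababa', 'aba')                 # => [0, 2, 4]
p string_indices('abababa', 'aba', overlap: false) # => [0, 4]
p string_indices('abcabcabc', 'd')                 # => nil
```

Whether overlapping matches should be the default is exactly the open question raised in the reply above; the keyword here only shows that both readings are easy to express.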
Right now **.index** on class String will return a result like this:

    'abcabcabc'.index 'a' # => 0
    'abcabcabc'.index 'd' # => nil

So either the position of the first match ('a', at 0), or nil if no match is found (as in the example of 'd').

In general, the proposal here is to keep the behaviour of #indices the very same as that of #index, with the sole difference being that an Array is returned when at least one index is found, and all positions that are found are stored in that Array.

What is the use case for this proposal, or why would I suggest it?

Actually, the use case I have had was a very simple one: to find a DNA/RNA "subsequence" of just a single nucleotide in a longer DNA/RNA string. As you may know, most organisms use double stranded DNA (dsDNA) consisting of four different bases (A,T,C,G); and RNA that is usually single stranded (ssRNA), with the four different bases being (A,U,C,G).

For example, given an RNA sequence as a String like 'AUGCUUCAGAAAGAGAAAGAGAAAGGUCUUACGUAG' or a similar String, I wanted to know at which positions 'U' (Uracil) occurs in that string. So ideally an Array of the positions. That was my use case for String#indices.

We can of course already get the above as-is via existing ruby features. One solution is to use .find_all - which I am actually using (and adding +1, because nucleotide positions by default start not at 0 but at 1). So I do not really need this addition to class String to begin with, since I can use find_all or other useful features that ruby has as-is just fine.

However, I also thought that it may be useful for others if a String#indices method existed directly, which is why I propose it here. Perhaps it may simplify some existing code bases out there to a limited extent if ruby users could use the same method/functionality.

There may be other use cases for String#indices, but I will only refer to the use case that I have found here.
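The find_all workaround mentioned above is not shown in the issue, so the following is only a guess at what it might look like. The helper name `nucleotide_positions` is hypothetical; it uses Enumerable#find_all over the character offsets and adds 1 to get 1-based nucleotide positions.

```ruby
# Hypothetical helper; the issue does not show the author's actual code.
# Returns the 1-based positions of a single nucleotide in a sequence
# (an empty Array when the nucleotide does not occur).
def nucleotide_positions(sequence, nucleotide)
  (0...sequence.length)
    .find_all { |i| sequence[i] == nucleotide } # 0-based offsets of matches
    .map { |i| i + 1 }                          # nucleotide positions start at 1
end

rna = 'AUGCUUCAGAAAGAGAAAGAGAAAGGUCUUACGUAG'
p nucleotide_positions(rna, 'U') # => [2, 5, 6, 27, 29, 30, 34]
```

Unlike the proposed String#indices, this sketch returns an empty Array rather than nil when nothing matches, and it only handles single-character lookups.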
If others wish to add their use case, please feel free to do so at your own leisure.

Please also feel free to close this issue at any moment in time if it is considered not necessary. It is not really a high priority suggestion at all - just mostly a convenience feature (possibly).

Thanks!

PS: I should also add that in bioinformatics you of course often deal with very large datasets, gigabytes/terabytes of genome sequencing data / Next generation sequencing datasets, but if you need more speed anyway then you may use C or another language to do the "primary" work; and ruby can do very well with smaller datasets just as well; "big data" is not necessarily everywhere. I only wanted to mention this in the event that it may be pointed out that String#indices may not be very fast for very long target strings/substrings - there are still many use cases for smaller substrings, for example. Perl was used very early in the bioinformatics field to good success, for instance.

As for documentation, I think the documentation for String#index could be used for String#indices too, just with the change that an Array of the positions found may be returned.

--
https://bugs.ruby-lang.org/