GroupNumbers
Installation
Install this package with Pkg.add("GroupNumbers")
Description
A family of iterators for grouping adjecent elements of the given iterator xs.
compare function | emits the grouped elements | emits the grouped indices | |
|---|---|---|---|
isequal | groupby2 | groupby2_indices | |
groupby2_dict | groupby2_dict_indices | also emits key | |
isapprox | groupby_numbers | groupby_numbers_indices | |
groupby_numbers_dict | groupby_numbers_dict_indices | also emits key | |
accepts optional emit parameter |
groupby2YYYZZZ(xs; keyfunc=identity, compare=isequal)groupby_numbersYYYZZZ(xs; keyfunc=identity, compare=isapprox, kwargs)
Here, "YYY" = "" or "_dict", and "ZZZ" = "" or "_indices".
Apply keyfunc function to each element of xs to compute the key for comparison. For default, keyfunc is identity, so the key is each element itself.
Compare the adjacent keys by compare function. While groupby2YYYZZZ family adopt isequal as the default compare function, groupby_numbersYYYZZZ family adopt isapprox with accompanying kwargs being supplied to the keyword parameters of the default isapprox function, allowing the control of the tolerance.
Unbranded iterators ("ZZZ" = "") emit the grouped elements.
The
_indicesalternatives ("ZZZ" = "_indices" ) emit the indices of the grouped elements.Unbranded iterators ("YYY" = "") emit only the grouped elements or their indices. Optional keyword parameter
emitmay be specified to apply further transfomation to each element, although it will induce runtime dispatch and lower performance.The
_dictalternatives ("YYY" = "_dict" ) emit also the first keys.
Examples
Example 1: Groups characters in a string
groupby2(xs) is equivalent to IterTools.groupby(identity, xs).
Simple case
julia> collect(groupby2("AAAABBBCCD"))
4-element Vector{Vector{Char}}:
['A', 'A', 'A', 'A']
['B', 'B', 'B']
['C', 'C']
['D']julia> using IterTools
julia> collect(IterTools.groupby(identity, "AAAABBBCCD")); # => same resultEmits keys
Use groupby2_dict(xs) if you need the keys.
julia> collect(groupby2_dict("AAAABBBCCD"))
4-element Vector{Tuple{Any, Vector{Char}}}:
('A', ['A', 'A', 'A', 'A'])
('B', ['B', 'B', 'B'])
('C', ['C', 'C'])
('D', ['D'])Groups without case sensitive
Specify keyfunc optional parameter to a function that computes a key.
julia> collect(groupby2_dict("AaAABbBcCD"; keyfunc=uppercase))
4-element Vector{Tuple{Any, Vector{Char}}}:
('A', ['A', 'a', 'A', 'A'])
('B', ['B', 'b', 'B'])
('C', ['c', 'C'])
('D', ['D'])Groups without case sensitive. Emits the grouped indices rather than the grouped elements.
julia> collect(groupby2_dict_indices("AaAABbBcCD", keyfunc=uppercase))
4-element Vector{Tuple{Any, Vector{Int64}}}:
('A', [1, 2, 3, 4])
('B', [5, 6, 7])
('C', [8, 9])
('D', [10])Example 2: Groups integer numbers
Simple case
groupby2 and groupby_numbers can be used to group integer numbers.
julia> collect(groupby2([10,20,20,30]))
3-element Vector{Vector{Int64}}:
[10]
[20, 20]
[30]julia> collect(groupby_numbers([10,20,20,30])); # => same resultEmits keys
julia> collect(groupby2_dict([10,20,20,30]))
3-element Vector{Tuple{Any, Vector{Int64}}}:
(10, [10])
(20, [20, 20])
(30, [30])julia> collect(groupby_numbers_dict([10,20,20,30])); # => same resultGroups by absolute values
julia> collect(groupby2_dict([10,-20,20,30]; keyfunc=abs))
3-element Vector{Tuple{Any, Vector{Int64}}}:
(10, [10])
(20, [-20, 20])
(30, [30])julia> collect(groupby_numbers_dict([10,-20,20,30]; keyfunc=abs)); # => same resultGroups by absolute values. Emits the grouped indices rather than the grouped elements.
julia> collect(groupby2_dict_indices([10,-20,20,30]; keyfunc=abs))
3-element Vector{Tuple{Any, Vector{Int64}}}:
(10, [1])
(20, [2, 3])
(30, [4])julia> collect(groupby_numbers_dict_indices([10,-20,20,30]; keyfunc=abs)); # => same resultExample 3: Groups floating point numbers
Use groupby_numbersYYYZZZ rather than groupby2YYYZZZ to group floating point numbers.
Simple case.
groupby_numbersYYYZZZ groups floating point numbers with isapprox function by default.
julia> collect(groupby_numbers([ 2e-10, 2e-9, 2e-8, 2e-7 ] .+ 1))
3-element Vector{Vector{Float64}}:
[1.0000000002, 1.000000002]
[1.00000002]
[1.0000002]Adjusts tolerance with atol and rtol parameters.
Consult the manual of Base.isapprox for its keyword parameters such as atol and rtol.
julia> collect(groupby_numbers([ 2e-8, 2e-7, 2e-6, 2e-5 ] .+ 1; atol=1e-6))
3-element Vector{Vector{Float64}}:
[1.00000002, 1.0000002]
[1.000002]
[1.00002]julia> collect(groupby_numbers([ 16, 17, 19, 20 ]* 1e-4 .+ 1; atol=2e-4))
2-element Vector{Vector{Float64}}:
[1.0016, 1.0017]
[1.0019, 1.002]Groups by their absolute values
julia> collect(groupby_numbers([ 1+2e-6, -1+2e-5, 1+2e-4, 1-2e-3 ];
keyfunc=abs, rtol=1e-4))
3-element Vector{Vector{Float64}}:
[1.000002, -0.99998]
[1.0002]
[0.998]Emits the grouped indices rather than the grouped elements.
julia> collect(groupby_numbers_indices([ 1+2e-6, -1+2e-5, 1+2e-4, 1-2e-3 ];
keyfunc=abs, rtol=1e-4))
3-element Vector{Vector{Int64}}:
[1, 2]
[3]
[4]Example 4: Groups noisy vectors
groupby_numbersYYYZZZ can be used to group an array of floating point numbers.
Groups array of vectors
Rotation preserves norm.
julia> using LinearAlgebra
julia> # Rotation matrix
t=15; r15 = [ cosd(t) -sind(t); sind(t) cosd(t)]
2×2 Matrix{Float64}:
0.965926 -0.258819
0.258819 0.965926
julia> using IterTools
julia> vs1 = collect( Iterators.take(
iterated(v -> (1+rand()*1e-8)*r15*v, [1, 0]), 5) )
5-element Vector{Vector}:
[1, 0]
[0.9659258323666292, 0.25881904673099826]
[0.8660254177031013, 0.5000000080359436]
[0.7071067969544697, 0.7071067969544694]
[0.5000000112991584, 0.8660254233551546]
julia> # group by norm
collect( groupby_numbers_indices(vs1; keyfunc=norm, atol=1e-6))
1-element Vector{Vector{Int64}}:
[1, 2, 3, 4, 5]Groups array of tuple consisting of vector and its norm
Calculate the vectors and their norms to avoid recalculate the latters.
julia> using LinearAlgebra
julia> vs1=vec( [ begin
v= [i1,i2] *(1+(rand()-0.5)*1e-8);
(norm(v),v)
end for i1 in -2:2, i2 in -2:2] );
julia> # sort by norm
vs2=sort(vs1; by=first);
julia> # group by norm
collect(groupby_numbers_dict_indices(vs2; keyfunc=first))
6-element Vector{Tuple{Any, Vector{Int64}}}:
(0.0, [1])
(0.9999999976242439, [2, 3, 4, 5])
(1.4142135561654923, [6, 7, 8, 9])
(1.999999991951223, [10, 11, 12, 13])
(2.2360679691661827, [14, 15, 16, 17, 18, 19, 20, 21])
(2.828427114159456, [22, 23, 24, 25])