4

I'm trying to create a list of indices that cycles from 0 to m - 1 and is of length n. I've thus far achieved this as follows:

import numpy as np
m = 7
n = 12
indices = [np.mod(i, m) for i in np.arange(n)]

which results in the following:

[0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4]

Is there a faster way to achieve this?

Thanks for any suggestions.

  • Define faster. Faster than... what? – ggdx Apr 18 at 9:08
  • 1
    @ggdx Faster in terms of computation time. This line is a bit of a bottleneck ... Thanks! – alex_lewis Apr 18 at 9:09
1

Since you're asking for the fastest approach, it would be good to provide some test times, so I tested most of the posted snippets with the timeit module.

I quickly defined functions for easier calls to timeit:

import numpy as np
from itertools import islice, cycle

def list_comp(m, n):
    return [np.mod(i, m) for i in np.arange(n)]

def leftover(m, n):
    # n//m full cycles plus a partial cycle for the remainder
    nb_cycles = n//m
    leftover = n - m*nb_cycles
    return list(range(m))*nb_cycles + list(range(leftover))

def islice_cycle(m, n):
    return list(islice(cycle(range(m)), n))

def npmod(m, n):
    return np.mod(np.arange(n), m)

def resized(m, n):
    return np.resize(np.arange(m), n)

Tested with the following, substituting each function for function_name:

import timeit

timer = timeit.Timer(stmt="function_name(7, 12)", globals=globals()).repeat(repeat=100, number=10000)
print(f'Min: {min(timer):.6}s,\nAvg: {np.average(timer):.6}s')
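
Alternatively, timeit.Timer also accepts a callable directly, so a small driver (my own sketch, not part of the original benchmark) can time all of the functions above in one go:

for fn in (list_comp, leftover, islice_cycle, npmod, resized):
    timer = timeit.Timer(lambda: fn(7, 12)).repeat(repeat=100, number=10000)
    print(f'{fn.__name__}: min {min(timer):.6}s, avg {np.average(timer):.6}s')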

Results

| Function       |   Minimum   |    Average   |  
|:---------------|------------:|:------------:|  
| list_comp      | 0.156117s   |  0.160433s   |  
| islice_cycle   | 0.00712442s |  0.00726821s |  
| npmod          | 0.0118933s  |  0.0123122s  |  
| leftover       | 0.00943538s |  0.00964464s |  
| resized        | 0.0818617s  |  0.0851646s  |  

@Austin's answer using islice and cycle is the fastest of all so far. @T.Lucas's is a bit slower, but only by a small margin, and it is quite fast for pure Python.

The other answers are slower by an order of magnitude.

  • 1
    Thanks for the benchmark; always a useful addition. However, it isn't obvious from the OP that his numerical constants were necessarily representative. It'd be interesting to post the results for larger n as well, where the performance argument isn't moot in the first place. – Eelco Hoogendoorn Apr 18 at 11:54
  • Funnily enough, when I tested on larger data I got different results. I'll do some more testing later and update my answer. – user3053452 Apr 18 at 12:05
3

You can use islice + cycle from itertools:

from itertools import islice, cycle

print(list(islice(cycle(range(7)), 12)))
# [0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4]
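
If you want it parameterised like the question's m and n, a thin wrapper (hypothetical name cyclic_indices, my addition) is all it takes:

from itertools import islice, cycle

def cyclic_indices(m, n):
    # first n values of the endless cycle 0, 1, ..., m-1, 0, 1, ...
    return list(islice(cycle(range(m)), n))

print(cyclic_indices(7, 12))  # [0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4]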
  • 1
    Great stuff - 4x improvement in time. Thanks! – alex_lewis Apr 18 at 9:19
2

Just drop the round trip to Python caused by the list comprehension. Getting good speed from NumPy is all about making sure your loops stay within NumPy, not looping within Python itself.

np.mod(np.arange(n), m)

This is the only numpythonically correct answer, in the sense that it makes it obvious that all Python-level for loops are avoided. (Edit: as demonstrated in other answers, it turns out to be far from the fastest solution, though.)
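
For the question's values, a quick check (np.arange(n) % m is an equivalent spelling):

np.mod(np.arange(12), 7)
# array([0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4])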

1

Given that you're using numpy, one way would be using np.arange and np.resize, which will fill the larger resized array with copies of the original array:

np.resize(np.arange(7), 12)
# array([0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4])
  • Interesting solution; but an illustration of the type of gotcha I discussed under Divakar's answer. np.resize contains a nontrivial amount of logic in Python; of note, there will be at least one Python-memory-management operation per requested copy, since a tuple of references to the input array is constructed; so that's a departure from 'true' vectorization. I imagine that would show up in a nontrivial benchmark. – Eelco Hoogendoorn Apr 18 at 12:41
  • Yes @EelcoHoogendoorn, it does contain some logic in Python based on a quick inspection, but it does not seem like much of an overhead, as the process mainly involves computing the mod and a further concatenation and slicing – yatu Apr 18 at 13:15
  • @EelcoHoogendoorn The logic involving modulus on top of the memory requirement for the range array isn't working too well. It's all true vectorization here with resize and tile; both are allocating and assigning at the C level, without any compute, as needed for mod. – Divakar Apr 18 at 16:18
  • It's the creation of a Python tuple of references to the array to be resized that worries me; that seems like it's creating a lot of work for the Python GC, and it's hard to predict the performance implication of that. The benchmark might be tainted by accumulating GC work not counted during the execution of the benchmark itself. – Eelco Hoogendoorn Apr 18 at 17:58
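
For illustration, the core of what np.resize does for a 1-D input can be sketched in a few lines (a simplification of NumPy's actual implementation; the tuple of references is the Python-level work discussed above):

import numpy as np

def resize_sketch(a, n):
    # number of copies of `a` needed to cover n elements (ceiling division)
    n_copies = -(-n // len(a))
    # building the tuple of references to `a` happens at the Python level
    return np.concatenate((a,) * n_copies)[:n]

print(resize_sketch(np.arange(7), 12))  # [0 1 2 3 4 5 6 0 1 2 3 4]
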
1

NumPy based one with np.tile - here (n+m-1)//m is ceiling division, i.e. the number of copies of np.arange(m) needed to cover n elements, and [:n] trims the excess -

np.tile(np.arange(m),(n+m-1)//m)[:n]

Sample run -

In [58]: m,n = 7,12

In [59]: np.tile(np.arange(m),(n+m-1)//m)[:n]
Out[59]: array([0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4])

Benchmarking

If you are looking for efficiency, especially on moderate to large sized data, NumPy would do well. In this section, we time the NumPy solutions with variations across m and n.

Setup:

import numpy as np

def resize(m,n):
    return np.resize(np.arange(m), n)

def mod(m,n):
    return np.mod(np.arange(n), m)

def tile(m,n):
    return np.tile(np.arange(m),(n+m-1)//m)[:n]
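
A quick sanity check (my addition) that the three variants agree before timing them:

m, n = 7, 12
assert np.array_equal(resize(m, n), mod(m, n))
assert np.array_equal(mod(m, n), tile(m, n))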

Run the timings code in an IPython console:

# Setup inputs and timeit those on posted NumPy approaches
m_ar = [10,100,1000]
s_ar = [10,20,50,100,200,500,1000] # scaling array

resize_timings = []
mod_timings = []
tile_timings = []
sizes_str = []
for m in m_ar:
    for s in s_ar:
        n = m*s+m//2
        size_str = str(m) + 'x' + str(n)
        sizes_str.append(size_str)

        p = %timeit -o -q resize(m,n)
        resize_timings.append(p.best)

        p = %timeit -o -q mod(m,n)
        mod_timings.append(p.best)

        p = %timeit -o -q tile(m,n)
        tile_timings.append(p.best)
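
If you are not running inside IPython, the %timeit magic is unavailable; a rough plain-timeit stand-in (a sketch with a fixed loop count rather than %timeit's adaptive one) would be:

import timeit

def best_time(fn, m, n, repeat=7, number=100):
    # best-of-repeat seconds per call, similar in spirit to `%timeit -o`'s .best
    return min(timeit.repeat(lambda: fn(m, n), repeat=repeat, number=number)) / number

print(best_time(tile, 10, 105))  # e.g. the smallest size in the sweep above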

Get the results on a plot:

# Use pandas to study results  
import pandas as pd

df_data = {'Resize':resize_timings,'Mod':mod_timings,'Tile':tile_timings}
df  = pd.DataFrame(df_data,index=sizes_str)

FGSZ = (20,6)
T = 'Timings(s)'
FTSZ = 16
df.plot(figsize=FGSZ,title=T,fontsize=FTSZ).get_figure().savefig("timings.png")

Results

Comparing all three

[plot: timings for resize, mod and tile across all m x n sizes]

resize and tile based ones seem to be doing well.

Comparing resize and tile

Let's plot just those two:

[plot: timings for resize and tile only]

tile seems to be the better of the two.

Studying in chunks

Now, let's study the timings in chunks corresponding to the three different m's:

[plots: timings for m = 10, m = 100 and m = 1000 separately]

The mod based one wins only on small m and small n, where the timings are on the order of 5-6 microseconds, but it loses out in most other scenarios; the extra compute it involves is what kills it there.

  • This may be faster in terms of cpu cycles; mod operations are not single-cycle operations. But I would expect the cost of the tiling operation to dwarf the cost of just brute forcing the mod operation, given that numpy will not optimise this expression into a single for loop without temporaries. – Eelco Hoogendoorn Apr 18 at 9:59
  • @EelcoHoogendoorn Tiling is a memory operation, whereas the mod one involves both memory and compute. Why would you expect cost of tiling to be any more than mod based one? – Divakar Apr 18 at 10:48
  • Well; not all numpy functions are themselves truly vectorized, but may involve python for-loops or other suboptimal constructs. I checked; tile calls into repeat, which does call into C; so presumably that leads to an optimised code path without temporaries. So you are probably correct; but I wouldn't immediately trust it to be truly vectorised when written this way, just by looking at the call itself. Not convinced one way or the other without seeing a benchmark. – Eelco Hoogendoorn Apr 18 at 11:52
  • @EelcoHoogendoorn Added benchmark. The compute with mod isn't working well on timings. – Divakar Apr 18 at 16:12
  • Interesting; thanks for the expanded benchmark. I knew mod operations were not cheap, but my general rule of thumb using numpy is that it is hard to notice the cost of individual instructions, considering they are not part of a tightly rolled loop; even for one-cycle instructions like add, your actual throughput tends to be way below one operation per cycle due to memory bandwidth constraints and other limitations. But from some references I've seen, mod actually takes 100s of cycles on many architectures; and maybe it can't be pipelined? So that's one exception to that rule of thumb... – Eelco Hoogendoorn Apr 18 at 17:53
0

If you are looking for speed, this is the fastest way I found:

nb_cycles = n//m                        # number of complete 0..m-1 cycles
leftover = n - m*nb_cycles              # length of the final partial cycle
indices = list(range(m))*nb_cycles + list(range(leftover))
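
For the question's m = 7, n = 12 this gives nb_cycles = 1 and leftover = 5, i.e. one full cycle followed by range(5):

print(list(range(7))*1 + list(range(5)))
# [0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4]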
