# Efficient one-to-one mapping between value and index in an array

I need to get the index (i.e. position) of a value in an array, and I am wondering whether there is some faster way than using the `find` command, by constructing some sort of map or look-up table which contains the mapping between the array values and the indexes.

Take, for example, this array:

``````th = [0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90];
``````

Now, let's say I have a variable with the value of

``````angle = 55
``````

and I want to know where this value is positioned in the array (correct answer is `idx = 12`). Now, I could of course just use `find`:

``````idx = find(th==angle)
``````

But my problem is that in my code I need to do this look-up, to get the index in `th` for the value in `angle`, several (million) times, and it seems like a bit of a waste of resources to constantly be calling the `find` function, which I guess is looping through `th` and doing some sort of comparison.

Instead, I was hoping that there was some way to set up a one-to-one map or a look-up table, where I can just immediately fetch the index corresponding to the value I have in `angle`. (Note: I know that the value I have in `angle` will always correspond exactly to one of the values in `th`.) So just have some function

``````idx = angle2i(angle)
``````

which performs this mapping:

``````0 -> 1
5 -> 2
10 -> 3
15 -> 4
20 -> 5
25 -> 6
``````

etc.

But I'm not seeing how I should implement such a look-up (well, I have a couple of very non-elegant ideas, I'm hoping and guessing that there must be some smart approach for this). Or am I wasting my time here and should I just use the `find` command?

• well you already have a mapping function, which according to your example `th` I would write as `angle2i = @(angle) (angle/5)+1;`. I don't know if that's faster than `find` (you'd have to profile your code to check). Where you could potentially save a lot of time is if it could be vectorised: if you already have all the angle values to look up in an array, then use the mapping function to calculate an array of all the relevant indices in one operation, and in your loop just pick the index from this array (no repeated calls to your mapping function or to `find` needed). – Hoki Apr 17 at 10:56
• Hey @Hoki, thanks for the reply. You're right for this particular case, but I was hoping for a general procedure, when there is no apparent relationship between the values and the indexes. – Finnur Pind Apr 17 at 12:22
• if there is no calculable relationship between the values and the indexes then your best option is to use a `map` as shown in Cris Luengo's answer. – Hoki Apr 17 at 13:56

## 5 Answers

If your keys are integers, perhaps you could use a sparse array! Consider the following situation:

``````function t = q55725607
%% Define the mapping(s):
keys = (1:60).*5; % Keys must be integers!
values = 1:numel(keys);
% Construct an array such that `map(key) == value`
map = sparse(ones(numel(keys),1), keys, values);

% Compare to containers.Map:
hashmap = containers.Map(keys, values);

%% Try this out:
queryKeys = randi(60,5000000,1).*5;
queryKeysCell = num2cell(queryKeys);

t = [timeit(@f1,1), timeit(@f2,1)];

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
   function queryVals = f1()
      queryVals = reshape(full(map(queryKeys)), size(queryKeys));
   end

   function queryVals = f2()
      queryVals = hashmap.values(queryKeysCell);
   end

end
``````

I don't know if the comparison I made is fair, but if it is, the sparse approach is an order of magnitude faster on my system (`0.1549` vs `1.5685`).

BTW, in case it isn't clear, the reason for using a `sparse` array is that it only takes up space for the non-zero values (so even if you have indices as far apart as 1 and 1e5, you would only store 2 values, roughly speaking).
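A minimal sketch of that storage behavior (the values 7 and 42 are arbitrary placeholders):

``````% Two keys far apart still cost only two stored values:
map = sparse(1, [1 1e5], [7 42]);  % logically 1-by-100000, but nnz(map) == 2
val = full(map(1e5))               % val = 42
``````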

• This is pretty clever! I'll make a comparison for my case and see what turns out to be faster. Thank you for your input! – Finnur Pind Apr 17 at 18:30
• Check the benchmark! – Cris Luengo Apr 17 at 19:26
• @FinnurPind You're welcome! I hope this proves useful. – Dev-iL Apr 17 at 19:41

You are looking for `containers.Map`.

You could do:

``````th = [0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90];
angle2i = containers.Map(th,1:numel(th));
index = angle2i(55)
``````

This is a general solution that only requires that `th` contain unique elements. They don't need to be sorted, and they don't need to be integers (though care must be taken when comparing floating-point values!). This solution should be much faster than `find` for large arrays: the hash-map lookup is amortized O(1), while the `find` solution is O(n). But for very small arrays, the overhead of using `containers.Map` will show.
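If the values in `th` were not exact integers, one way to sidestep the floating-point caveat is to key the map on a scaled-and-rounded integer representation (the factor 10 below is just an illustrative choice for one decimal of precision):

``````th = [0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90];
% Key on integers derived from the values, so look-ups tolerate rounding noise:
angle2i = containers.Map(round(th*10), 1:numel(th));
index = angle2i(round(55.0000001*10))  % index = 12, same as for exactly 55
``````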

If `th` is guaranteed sorted, then solutions to this other question might be useful as well.

Of course, if there is a simple mathematical relationship (as in the case of the example `th`), then the O(1) solution by @mattesyo of computing the index cannot be beaten.

• Brilliant, thanks Cris! – Finnur Pind Apr 17 at 18:29
• @FinnurPind: Check the benchmark! This method looks nice and sounds like it is great, but it actually sucks in terms of speed. Your `find` is significantly faster under certain conditions. You might want to reconsider your choice for which of these answers you accept. :) – Cris Luengo Apr 17 at 19:27

If there is a numerical relationship between the index and the value, as in your example, you can just use a function instead of a look-up table:

``````function idx=angle2i(angle)
idx=angle/5+1;
end
``````

But I don't know if this solves your problem, because I don't know your specific problem.

• Thanks for the reply. Well, you're right, for this particular case I could solve this with some numerical relationship. But I also have some cases where there is no apparent relationship between the index and the value, so I would need some sort of table which relates the two... Still not sure how I would go about implementing that. – Finnur Pind Apr 17 at 12:20
• In this case I don't think there is a general run-time-optimal solution to your problem without knowing the specific array size and value structure. The `find` function does exactly what you are trying to achieve, which does not necessarily mean it is the most efficient method for every application. Not knowing the specific problem, the only chance I see is to try your "very non-elegant ideas" and use the MATLAB Profiler to find the fastest one (if there is one). – mattesyo Apr 17 at 13:05
• mattesyo, feel free to leave a comment to my answer if you want to warn about overhead. But don't delete half of it and change it to say something I wouldn't write. There is a reject reason for edit suggestions "conflicts with author's intent". Fixing typos and fixing formatting are OK. Changing the contents of an answer is not OK. Here you can see your edit suggestions and what decisions were taken on them. – Cris Luengo Apr 17 at 14:00
• Actually, this comment by the OP suggests that they are indeed looking for what @CrisLuengo wrote, a general solution. Classic case of an oversimplified example in the question IMO – Adriaan Apr 17 at 14:12

If you have all (several million) values of `angle` at hand you can use `ismember`:

``````[~, idx] = ismember(angle, th);
``````
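For instance, with the example `th` from the question, all indices come back in one vectorized call (this sketch assumes every query value is present in `th`; `ismember` returns index 0 for misses):

``````th = [0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90];
angles = [55 20 90 0];           % all query values gathered up front
[~, idx] = ismember(angles, th)  % idx = [12 5 19 1]
``````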

# Benchmark

Starting off with Dev-iL's benchmark, I added OP's `find` method and rahnema1's `ismember` method, and looked at how the methods scale with data size (number of keys). I also compare two different use cases: finding 5000 keys at once, and finding 5000 keys one at a time.

These are the results:

``````               One key at a time                       Many keys at once
       -------------------------------------   -------------------------------------
size    sparse     c.Map      find  ismember    sparse     c.Map      find  ismember
----   -------   -------   -------   -------   -------   -------   -------   -------
  50    5.1681   54.3091    3.7766   28.8590    0.0956    1.2973    0.5578    0.0537
 500    5.0864   54.7872    6.9310   32.5554    0.0977    1.6847    3.6726    0.0499
5000    5.2052   56.4472   35.1449   60.6480    0.1140    2.0886   38.7444    0.0789
``````

[Timings on MATLAB R2017a, on a 3 year old iMac. Your mileage will vary.]

Contrary to my expectations, `containers.Map` has a lot of overhead and is not really suitable for this purpose. Even with an array of 5000 keys, the O(n) `find` method is actually faster than the hash map. `containers.Map` is a custom class, and the MATLAB JIT is still not as good at optimizing that type of code. However, one can clearly see the effect of the scaling here, as the `find` method is the only one whose run time increases significantly with data size.

Interestingly, the "sparse" method is ~50x faster when vectorized. Such a large gain from vectorization is rare nowadays. For example, the `find` method is only about 1x-2x faster when vectorized (and for larger data sizes, the vectorized form requires too much memory and eventually becomes very slow).

The largest difference here between vectorized and loop code is for the `ismember` function. It sorts the input data, so here we see the difference between doing that once and doing it 5000 times. This method is really only suitable when called a few times; but in that case, `ismember` is also easily the fastest method.

When fetching one key at a time, the sparse method is fastest, unless the data size is very small, in which case the `find` method wins. However, the sparse method is the only one that requires the keys to be positive integers (it will not work with 0, negative values, or non-integer values). The other methods all work with values of arbitrary types (including strings).
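If your keys do include 0 or negative values, one possible workaround (a sketch, not part of the benchmark) is to shift them into positive integers before building the sparse table:

``````keys = -90:5:90;                    % includes 0 and negative values
offsetKeys = keys - min(keys) + 1;  % shifted to positive integers 1..181
map = sparse(1, offsetKeys, 1:numel(keys));
idx = full(map(55 - min(keys) + 1)) % idx = 30, the position of 55 in keys
``````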

### Benchmark code:

``````function t = so(N)
% Define the mapping(s):
keys = (1:N).*5; % Keys must be positive integers!
values = 1:N;

% Sparse lookup table
sparseMap = sparse(ones(numel(keys),1), keys, values);

% containers.Map lookup table
hashMap = containers.Map(keys, values);

% Try this out:
queryKeys = keys(randi(numel(keys),5000,1));
queryKeysCell = num2cell(queryKeys); % trick to read many values from the hashMap at once

t = [timeit(@f1,1), timeit(@f2,1), timeit(@f3,1), timeit(@f4,1), ...
     timeit(@f1q,1), timeit(@f2q,1), timeit(@f3q,1), timeit(@f4q,1)] * 1000;

% Functions that do the lookup one at a time:

   function queryVals = f1
      queryVals = zeros(size(queryKeys));
      for ii = 1:numel(queryKeys)
         queryVals(ii) = full(sparseMap(queryKeys(ii)));
      end
   end

   function queryVals = f2
      queryVals = zeros(size(queryKeys));
      for ii = 1:numel(queryKeys)
         queryVals(ii) = hashMap(queryKeys(ii));
      end
   end

   function queryVals = f3
      queryVals = zeros(size(queryKeys));
      for ii = 1:numel(queryKeys)
         queryVals(ii) = find(keys==queryKeys(ii));
      end
   end

   function queryVals = f4
      queryVals = zeros(size(queryKeys));
      for ii = 1:numel(queryKeys)
         [~, queryVals(ii)] = ismember(queryKeys(ii), keys);
      end
   end

% Functions that do the lookup all at once:

   function queryVals = f1q
      queryVals = reshape(full(sparseMap(queryKeys)), size(queryKeys));
   end

   function queryVals = f2q
      queryVals = hashMap.values(queryKeysCell);
   end

   function queryVals = f3q
      [queryVals,~] = find(keys.'==queryKeys);
   end

   function queryVals = f4q
      [~,queryVals] = ismember(queryKeys, keys);
   end

end
``````
• Thanks! This is a valuable summary! – Dev-iL Apr 17 at 19:39
• @Dev-iL It is more than a summary! – rahnema1 Apr 17 at 19:42
• @rahnema1 Did I go a bit overboard? :) – Cris Luengo Apr 17 at 19:44
• It is nice and a detailed answer. – rahnema1 Apr 17 at 19:46
• @CrisLuengo I have two other solution in mind. Could you please test efficiency of griddedInterpolant and scatteredInterpolant ? – rahnema1 Apr 17 at 19:52