How slow is MATLAB really?

A couple of days ago, I wrote a quick post discussing the use of CPLEX instead of MATLAB’s built-in solvers. CPLEX is a proprietary tool developed by IBM that is capable of solving certain types of problems much, much quicker – especially when the number of search parameters becomes large. As an addendum to my previous findings in the post where I bashed MATLAB and praised CPLEX (link), I would like to continue my bashing, based on yet another bottleneck I discovered in MATLAB: populating sparse matrices.

Sparse matrices

The simplest problems we would like to solve require matrices that can easily grow larger than a few hundred gigabytes. But most of the values stored in these matrices are zeros, which makes the matrices sparse. If we instead use MATLAB’s sparse matrix implementation to take advantage of this sparsity, the memory requirements drop drastically! The function one should use to initialise such a sparse matrix is:

A = spalloc(m, n, nz);

Here, one can define the matrix’s height, m, its width, n, and the number of non-zero elements, nz. Now, instead of requiring e.g. 5083.5GB to store a single matrix, our matrix takes up a manageable 75MB at most.

>> size(A) % A is our large sparse matrix
    ans = 998562 683280
>> whos('A')
    Name           Size                   Bytes  Class     Attributes
    A         998562x683280            78342264  double    sparse

Nonetheless, we need to populate this matrix with data – otherwise what’s the point of having the matrix anyway? For instance, in the code above we append rows to the bottom of our A matrix until all constraints have been added. Sometimes the constraints are symmetric and we can append multiple rows that contain combinations of scaled identity matrices (eye()), upper triangular matrices (triu()) or lower triangular matrices (tril()).
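
To give an idea of what this looks like, here is a minimal sketch of that row-appending pattern – the block size and scaling factors are made up purely for illustration:

% Minimal sketch of the row-appending pattern (illustrative sizes and factors only)
n = 1000;                                  % variables per constraint block (assumed)
A = sparse(0, 2*n);                        % start with an empty sparse constraint matrix

% A pair of constraint blocks built from scaled identity and triangular matrices,
% kept sparse throughout:
rowBlock1 = [ 2*speye(n), triu(sparse(ones(n))) ];
rowBlock2 = [ tril(sparse(ones(n))), -speye(n) ];

A = [A; rowBlock1];                        % each concatenation re-allocates and copies A ...
A = [A; rowBlock2];                        % ... and doing this repeatedly is what costs time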

It is this population and appending, however, that requires a lot of time!

Timing

To investigate how much time is spent on preparing and actually solving our minimisation problem, I adjusted my profiling code and re-ran the MILP (Mixed Integer Linear Programming) solver in MATLAB. Results look as follows:

One can see how the MATLAB overhead increases together with the actual solving time, which is what one might expect. After all, the more variables we need to deal with, the more time it takes to populate the matrices (i.e. overhead) and the more time is required to solve the problem (i.e. solving). So I checked whether the ratio between time spent populating and time spent solving changes for different numbers of variables.
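
Roughly speaking, the split can be measured by wrapping simple tic/toc timers around the setup and the solver call. The sketch below shows the idea; buildConstraints is a hypothetical stand-in for the matrix-population code and nDays an assumed problem-size parameter, while intlinprog is MATLAB’s MILP solver:

% Sketch only: buildConstraints and nDays are hypothetical placeholders
tic;
[f, intcon, A, b, Aeq, beq, lb, ub] = buildConstraints(nDays);   % populate the sparse matrices
tOverhead = toc;

tic;
x = intlinprog(f, intcon, A, b, Aeq, beq, lb, ub);               % MATLAB's MILP solver
tSolve = toc;

fprintf('overhead: %.2f s, solving: %.2f s\n', tOverhead, tSolve);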

Results show that the average time spent solving our problem far outweighs the average overhead time. One might therefore believe that MATLAB is pretty good at performing these elementary setup tasks. After all, problems that took two hours to solve required less than a second to be set up. But as you can see, MATLAB’s solver is pretty slow, which is why (at the time of updating this post) I only had 50 repetitions of up to 15 days evaluated. CPLEX on the other hand was much quicker, so I started profiling its BILP (Binary Integer Linear Programming) solver and after a few days gathered the results.
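
For the CPLEX runs only the solver call needs to change, since the CPLEX connector for MATLAB exposes a cplexbilp function. A sketch along the same lines (again using the hypothetical buildConstraints helper):

% Sketch only: same hypothetical buildConstraints helper as above
tic;
[f, ~, A, b, Aeq, beq] = buildConstraints(nDays);   % the population overhead stays the same
tOverhead = toc;

tic;
x = cplexbilp(f, A, b, Aeq, beq);                   % CPLEX's binary integer LP solver
tSolve = toc;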

The difference is astonishing! Both the overhead time and the solving time increase exponentially as the number of input parameters goes up. Unlike with the pure MATLAB implementation, however, for long datasets the CPLEX solver is actually quicker at finding a solution than MATLAB is at setting up the problem. Now that’s what I call optimisation: well done, IBM! But how is the average share of time allocated, you may ask?

Here you go. When using CPLEX, the percentage of running time spent solving our minimisation problem rapidly decreases with increasing problem size. Isn’t that beautiful? But it is also frustrating, since there is no apparent way to speed up MATLAB’s matrix manipulation any further. After all, I used as much of my MATLAB knowledge as I could to avoid loops and use native functions wherever possible. Yet it is MATLAB, not the solver, that has become our bottleneck.
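
For what it’s worth, the loop-free, native style referred to above amounts to building the whole matrix in a single sparse() call from index/value triplets instead of growing it incrementally – the indices and values below are purely illustrative:

% Illustrative only: triplet-based construction in one sparse() call
m = 998562;  n = 683280;         % the dimensions from the example above
i = [1; 1; 2; 3];                % row indices of the non-zero entries
j = [1; 5; 2; 7];                % column indices
v = [2; -1; 2; 4];               % values
A = sparse(i, j, v, m, n);       % single allocation, no incremental appending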

Nonetheless, I am complaining from a very high standard of “underperformance”. The solutions we seek are returned within 15 minutes, and that includes both solving time and overhead time. Therefore, I apologise for having made you read through this pointless mini-rant, and can only hope that you at least enjoyed my beautiful results, which were in no way, shape or form an advertisement for IBM’s awesomeness 😉

Updated: 27th of July 2017 – More MILP timing data
