Friday, 28 January 2011

MATLAB - the Programming Language

MATLAB, short for Matrix Laboratory, is a fourth-generation programming language[1] that specialises in numerical computing but can also be extended with functions for image processing, distributed and parallel computing and various other applications. Its main use is the manipulation of matrices[2] in a great many ways, and the underlying language is fine-tuned for this purpose. The first version was written in Fortran by Cleve Moler at the University of New Mexico, but it was later rewritten in C as that language gained acceptance in various fields of applied mathematics. Together with Jack Little and Steve Bangert, Moler founded the MathWorks in 1984, and MATLAB has since become widely used in teaching and research as well as in commercial settings. MATLAB is now available for Windows, Unix, Linux and Mac OS X.

The Basics

MATLAB is a lot less scary to the newcomer than some other programming languages, which is one of the reasons it is used for teaching. There is no need to declare variables before using them, and most mathematical expressions[3] need very little rewriting before MATLAB can understand them. As well as allowing the creation and manipulation of vectors and matrices of various descriptions, the language includes programming favourites such as 'if', 'switch'[4], 'for' and 'while', allowing the creation of innocent-looking scripts which can take hours on end to execute. The following is a MATLAB function[5] that applies the rules of the 3n+1 (Collatz) Conjecture. To run it, save it as threenplusone.m and then type threenplusone(x) at the command line, where x is the number you want to test:

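function [result] = threenplusone(input)
% THREENPLUSONE  Apply the 3n+1 rules to 'input' until it reaches 1,
% displaying each intermediate value. Note that the name 'input' shadows
% MATLAB's built-in input() function inside this file.
while input ~= 1
  if rem(input,2) == 1 % if 'input' is odd [6]
    input = (input.*3)+1; % multiply by 3, add 1
    disp(input)
  end
  while rem(input,2) == 0 % while 'input' is even [7]
    input = input./2; % divide by 2
    disp(input)
  end
end % will return to start of loop unless input = 1
result = input; % always 1 once the loop has finished
disp(result)
disp('Success!');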

Running the program displays each value of 'input', followed by a notification once 'input' has reached 1, assuming that this is possible[8]. If it is not, you will either have to press Ctrl+C (or Ctrl+Break on Windows), which halts execution of the function, or wait an infinitely long time for the program to stop. For simplicity, the function has no way of catching numbers that aren't positive integers: given zero, a negative number or a non-integer, it would loop forever. We're assuming the user has the sense to enter nothing except a positive integer. The function could be made to run faster, but has been kept simple for the purposes of this Entry. It could also be extended to accept a whole matrix of numbers, returning a matrix containing the result for each element.
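The matrix extension mentioned above could look something like the following minimal sketch. The function name threenplusone_array is hypothetical, and rather than returning the final value (which is always 1) it returns the number of steps each element needed to reach 1, which is more informative. It assumes every element is a positive integer. It still contains a loop, but each pass updates every unfinished element of the matrix at once.

```matlab
function steps = threenplusone_array(values)
% THREENPLUSONE_ARRAY  Hypothetical extension of threenplusone: apply the
% 3n+1 rules to every element of a matrix at once. Returns a matrix of the
% same size holding the number of steps each element took to reach 1.
% Assumes every element is a positive integer (there is no input checking).
n = values;
steps = zeros(size(values));
active = (n ~= 1);                        % elements that have not yet reached 1
while any(active(:))
    isOdd  = active & (rem(n, 2) == 1);   % unfinished odd elements
    isEven = active & (rem(n, 2) == 0);   % unfinished even elements
    n(isOdd)  = n(isOdd) .* 3 + 1;        % multiply by 3, add 1
    n(isEven) = n(isEven) ./ 2;           % divide by 2
    steps(active) = steps(active) + 1;    % one more step for each unfinished element
    active = (n ~= 1);
end
end
```

Saved as threenplusone_array.m, calling threenplusone_array([1 6; 27 2]) returns [0 8; 111 1].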

Notice that the multiply (.*) and divide (./) symbols are preceded by dots. These are MATLAB's element-wise (array) operators, which simply multiply or divide each number individually, as opposed to the matrix operations (* and /) that follow the rules of linear algebra[9]. For a single number such as 'input', the two forms give the same result. The '%' symbol indicates that the rest of that line is a comment which MATLAB will ignore, while the ';' at the end of some lines stops MATLAB from displaying the result of that line in the command window. The latter is very useful when dealing with large matrices, as displaying thousands of columns of numbers in a little window is very time-consuming and also rather messy. In this example, the disp(input) lines could instead be removed along with the semicolons after the calculations, in which case MATLAB would display each new value of 'input' automatically, labelled with the variable name. Finally, '~=' stands for 'not equal to'.
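The difference between the dotted and undotted operators only shows up once matrices are involved. This short script, using two arbitrary 2-by-2 example matrices, compares them:

```matlab
% Compare element-wise (array) operators with matrix operators.
A = [1 2; 3 4];
B = [5 6; 7 8];

elementwise = A .* B;   % multiplies matching elements: [5 12; 21 32]
matrixprod  = A * B;    % rows of A times columns of B: [19 22; 43 50]
halved      = A ./ 2;   % divides every element by 2: [0.5 1; 1.5 2]

disp(elementwise)
disp(matrixprod)
disp(halved)

% For single numbers, as in threenplusone, both forms agree
x = 7;
disp(isequal(x .* 3, x * 3))   % displays 1 (true)
```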

Vectorisation

MATLAB is designed to work with vectors[10], as opposed to loops of code which repeat the same operation on a series of scalars[11]. As a result, a script that uses loops is often much slower than the equivalent 'vectorised' script, which puts all the numbers to be worked on into a single vector and operates on them all at once rather than one by one. This is largely because a vectorised operation hands the whole vector to one of MATLAB's built-in routines in a single step, whereas a loop forces MATLAB to work through its logical statements and other operations again on every pass. The following looped script and vectorised script do the same thing:

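% Looped version: take the square root of one element per pass
A = [1 4 9 16 25];
for x = 1:5
  A(x) = realsqrt(A(x));
end
disp(A)

% Vectorised version: take the square roots of elements 1 to 5 in one step
A = [1 4 9 16 25];
A(1:5) = realsqrt(A(1:5));
disp(A)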

Both scripts find and display the square root of each number in the vector 'A', but the second is the quicker approach, which matters once the vectors become large: a script that squares and then square-roots the numbers between 1 and 1,000,000 takes around ten times longer to run in looped form. Rewriting a script to replace loops with operations on whole vectors is known as vectorisation. The runtime can be measured using the 'tic' and 'toc' functions, which measure the time taken for the section of a program between them to run. One important part of vectorisation is the colon operator ':'. Here, '1:5' produces the vector [1 2 3 4 5], so 'A(1:5)' refers to elements one to five of 'A', whose square roots are found and then used to replace the values that were there before.
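The one-million-number comparison described above can be timed with tic and toc along the lines of this sketch. The variable names are illustrative, and the actual timings (and the ratio between them) depend on the machine and the MATLAB version, so no figures are given here. The looped version preallocates its output with zeros, since growing an array one element at a time would slow it down further.

```matlab
% Time a looped and a vectorised square-then-square-root over 1..N.
N = 1e6;

% Looped version: preallocate the output, then handle one number per pass
tic
loopResult = zeros(1, N);
for k = 1:N
    loopResult(k) = realsqrt(k^2);
end
loopTime = toc;

% Vectorised version: square and square-root the whole vector 1:N at once
tic
vecResult = realsqrt((1:N).^2);
vecTime = toc;

fprintf('Looped: %.4f s, vectorised: %.4f s\n', loopTime, vecTime);
disp(isequal(loopResult, vecResult))   % both approaches give the same numbers
```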

More Complicated Stuff

MATLAB also includes interfaces that allow programs written in C and Fortran to use MATLAB data and functions, and that allow MATLAB to call compiled C and Fortran code. The basic package also includes GUI[12] development software for producing colourful little programs with clickable buttons that make functions written in MATLAB more accessible to lay users. While the core of MATLAB contains many hundreds of commands and functions, 'toolboxes' can be added, such as the Parallel Computing Toolbox (formerly the Distributed Computing Toolbox), which allows code to be split up and run on several processors at the same time so as to increase the running speed of software.
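A minimal sketch of splitting work across processors with a parfor loop is below. It assumes the Parallel Computing Toolbox is installed and a pool of workers is available; without the toolbox, MATLAB runs a parfor loop as an ordinary serial loop, so the results are the same either way. Each iteration must be independent of the others, which is why every starting value gets its own slot in the steps vector.

```matlab
% Count the 3n+1 steps for every starting value from 1 to N using parfor.
% Each iteration is independent, so iterations can be shared between workers.
N = 1000;
steps = zeros(1, N);          % one slot per starting value (a "sliced" output)
parfor k = 1:N
    n = k;                    % temporary variables, local to each iteration
    count = 0;
    while n ~= 1
        if rem(n, 2) == 0
            n = n / 2;        % even: divide by 2
        else
            n = 3 * n + 1;    % odd: multiply by 3, add 1
        end
        count = count + 1;
    end
    steps(k) = count;
end
[longest, startValue] = max(steps);
fprintf('Longest run below %d: %d steps, starting from %d\n', N, longest, startValue);
```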

Problems

One particular problem with MATLAB is that licences for it are quite costly, and the language is proprietary and differs from free alternatives such as GNU Octave in many subtle ways. This means that once a long and complicated program has been written in MATLAB, the cost in time and money of switching to a different supplier, and thus another programming language, can be prohibitive. Also, MATLAB has gone through several versions, some of which are not entirely compatible with one another. On the programming side, MATLAB can be irritating because all matrices start at row 1, column 1, so every index into a matrix must be a positive integer. For instance, if a matrix represented data from -100 to +100 on the x and y axes, the programmer would need to store the value for each point (x, y) in column x + 101 and row y + 101 of the matrix.
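The +101 offset can be kept in one place so that the rest of the code can think in terms of coordinates. This sketch assumes integer coordinates from -100 to +100 and uses x^2 + y^2 purely as example data; the names offset and data are illustrative.

```matlab
% Store example values for integer points (x, y), with x and y from -100 to +100.
% MATLAB indices start at 1, so index = coordinate + 101: x = -100 maps to column 1.
offset = 101;
[X, Y] = meshgrid(-100:100);   % 201-by-201 grids: X changes from column to column,
                               % Y changes from row to row
data = X.^2 + Y.^2;            % example data: x^2 + y^2 at every point

% Look up the point (x, y) = (3, -7): the row comes from y, the column from x
x = 3;
y = -7;
value = data(y + offset, x + offset);
fprintf('Value at (%d, %d) is %d\n', x, y, value);   % 3^2 + (-7)^2 = 58
```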

MATLAB Tips

  • Make the layout of your program clear by using indents in the same manner as the example above, and use lots of comments to make clear notes on what you're doing.

  • Use the MATLAB Editor when writing code: it uses various colours for different parts of the code, helps you to pair up brackets, and automatically applies indents after commands such as 'for', 'if' and 'while'.

  • Add checks to ensure that your code is running properly and add error messages that will display if something goes wrong.

  • Start off by getting the small, yet important, parts of your program to work, then aim bigger. Learn to use the MATLAB Editor's cell mode, which lets you run small sections of code (marked off by lines beginning with %%) on their own straight from the editing window.

  • Make use of the built-in functions, as they have been tried and tested and will run quite quickly without many problems. Ensure you know the syntax of the functions you are using so as to avoid wasting time while looking for a missed comma.

  • Don't be scared by the number of commands available in MATLAB — help is available for each one and there's often a better way of doing something, provided you know how.

  • If runtime is important, try to vectorise (see above) the section of the program that takes the most time.
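As an example of the third tip, the following sketch adds an input check to a 3n+1 step counter, so that bad input produces an error message rather than an endless loop. The function name collatzsteps and the error identifier are hypothetical; the function would be saved as collatzsteps.m.

```matlab
function steps = collatzsteps(n)
% COLLATZSTEPS  Hypothetical example: count the 3n+1 steps needed for n to
% reach 1, refusing any input that is not a single, finite, positive integer
% (such input would otherwise make the loop below run forever).
if ~isnumeric(n) || ~isscalar(n) || ~isfinite(n) || n < 1 || n ~= fix(n)
    error('collatzsteps:badInput', 'Input must be a single positive integer.');
end
n = double(n);             % work in double precision even if an integer type was passed
steps = 0;
while n ~= 1
    if rem(n, 2) == 0
        n = n / 2;         % even: divide by 2
    else
        n = 3 * n + 1;     % odd: multiply by 3, add 1
    end
    steps = steps + 1;
end
end
```

Calling collatzsteps(27) returns 111, while collatzsteps(-3) stops straight away with the error message.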

Also, have a look at the Seven Secrets of Successful Programmers.


[1] A fourth-generation language is one which has been designed to be closer to human language than to machine code, and which has a specific purpose in mind.
[2] Rectangular tables of numbers.
[3] Such as y = 1/x, with MATLAB being able to calculate y for various values of x.
[4] Technical note: in a MATLAB switch statement, only the first matching case is ever executed, unlike C, in which execution falls through from the matching case into the cases below it unless a 'break' statement stops it.
[5] In MATLAB, a function can take input data and return results, and keeps its variables in its own workspace, whereas a script takes no inputs and works directly on the variables in the workspace it is run from.
[6] That is to say, if dividing by two leaves a remainder of one.
[7] Same as before, only this time the remainder is zero.
[8] According to the 3n+1 Conjecture, which has not been proved, this function should always finish in a finite time for any positive integer.
[9] * and / represent matrix multiplication and matrix right division, while .* and ./ represent element-wise (array) multiplication and right division. Similarly, \ represents matrix left division and .\ represents array left division. The definitions of these are beyond the scope of this Entry.
[10] In MATLAB, a vector is a list of scalars (individual numbers) extending in one direction, such as [1 2 3 4 5]. More generally, a vector can be described as a point with several coordinates, each number in the vector giving the coordinate along a different axis.
[11] Individual numbers such as 1, 2, 3, 4 or 5.
[12] Graphical User Interface, which is exactly what it sounds like.
