During this activity, students should be able to:
This activity helps the student develop the following skills, values, and attitudes: the ability to analyze and synthesize, the capacity to identify and solve problems, and the efficient use of computer systems.
A computer with a dual-core (or better) processor, such as an Intel Centrino Duo or an AMD Turion 64 X2. In fact, any Erlang-supported Symmetric Multiprocessing (SMP) computer system should work. An SMP computer has two or more identical CPUs connected to a single shared memory. These CPUs may be on a single multi-core chip, spread across several chips, or a combination of both.
This activity is an adaptation from [ARMSTRONG], pp. 375-379.
For this experiment we'll use two different problem sets:
We'll execute each test using two different versions of the map function: the ordinary (sequential) lists:map and the parallel plists:map. By default, this last function operates over each list element using an independent Erlang process.
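To give a rough idea of how a parallel map can be implemented, here is a minimal sketch written directly with spawn and message passing. The pmap name is only illustrative; this is not the actual plists implementation, which gives more control over how the work is split.

pmap(F, List) ->
    %% Spawn one process per element; each sends {Pid, Result} back to the parent.
    Parent = self(),
    Pids = [spawn(fun() -> Parent ! {self(), F(X)} end) || X <- List],
    %% Collect the results in the original order.
    [receive {Pid, Result} -> Result end || Pid <- Pids].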
NOTE: In order to use the plists module, extract plists-1.0.zip, copy the plists-1.0/src/plists.erl file to your working directory, and compile it.
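For example, assuming plists.erl has already been copied into your working directory, you can compile it directly from the Erlang shell:

> c(plists).
{ok,plists}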
For Test 1, we need a function to create a list with K random integers in the range 1...1,000,000. The following code achieves this:
make_random_list(K) -> [random:uniform(1000000) || _ <- lists:duplicate(K, null)].
The seed function seeds the random number generator with fixed values, so that we get the same sequence of random numbers each time we run the program:
seed() -> random:seed(44,55,66).
This code creates a list L1 with 100 lists containing 1,000 random integers each:
L1 = [make_random_list(1000) || _ <- lists:duplicate(100, null)].
For Test 2, we need a function to compute the n-th element of the Fibonacci sequence. We'll use the following computationally intensive (and inefficient) recursive definition:
fibo(0) -> 1;
fibo(1) -> 1;
fibo(N) when N > 1 -> fibo(N-1) + fibo(N-2).
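To see why this definition is expensive, note that every call with N > 1 makes two further recursive calls. For example:

fibo(4) = fibo(3) + fibo(2)
        = (fibo(2) + fibo(1)) + (fibo(1) + fibo(0))
        = ((fibo(1) + fibo(0)) + 1) + (1 + 1)
        = 5

The number of calls grows roughly as fast as the Fibonacci numbers themselves, so fibo(27) already performs over three hundred thousand additions, which is what makes it a good CPU-bound workload.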
The list L2, containing 100 twenty-sevens, can be defined as:
L2 = lists:duplicate(100, 27).
In summary, the objective of our experiment is to measure the time it takes to execute the four expressions contained in the following table:
Expression | Category | Description |
---|---|---|
lists:map(fun lists:sort/1, L1) | Sequential | Light-weight computations with a lot of message passing: sort 100 lists, each one with 1,000 random integers. |
plists:map(fun lists:sort/1, L1) | Parallel | Same computation as above, executed in parallel. |
lists:map(fun fibo/1, L2) | Sequential | CPU-bound computations with little message passing: compute 100 times the 27th element of the Fibonacci sequence. |
plists:map(fun fibo/1, L2) | Parallel | Same computation as above, executed in parallel. |
In order to measure the execution time, we'll use the timer:tc function, which takes three parameters: Module, FunName, and ArgList. It calls the corresponding function with the specified arguments and measures the elapsed real time. It returns a tuple {Time, Value}, where Time is the elapsed real time in microseconds, and Value is what the function returns. For example:
> {T,V} = timer:tc(lists,seq,[1,20]).
{1,[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]}
> T.
1
> V.
[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
Once we have our time measurements, we can calculate the speedup, which measures how much faster a parallel program is than the corresponding sequential program.
The speedup is defined by the following formula:
Sp = T1 / Tp
Where:
- p is the number of processors (cores) used,
- T1 is the execution time of the sequential program, and
- Tp is the execution time of the parallel program running on p processors.
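For instance, with purely illustrative numbers: if the sequential version of a test takes T1 = 4,000,000 microseconds and the parallel version takes T2 = 2,200,000 microseconds on a dual-core machine, the speedup is S2 = 4,000,000 / 2,200,000 ≈ 1.8. An ideal (linear) speedup on two cores would be 2.0; in practice, process creation and message passing overhead keep the measured value below that.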
Now let's put everything together in a file called multicore.erl:
-module(multicore).
-export([test1/0, test2/0]).

make_random_list(K) ->
    [random:uniform(1000000) || _ <- lists:duplicate(K, null)].

seed() -> random:seed(44, 55, 66).

fibo(0) -> 1;
fibo(1) -> 1;
fibo(N) when N > 1 -> fibo(N-1) + fibo(N-2).

test1() ->
    seed(),
    L1 = [make_random_list(1000) || _ <- lists:duplicate(100, null)],
    {Time1, S1} = timer:tc(lists, map, [fun lists:sort/1, L1]),
    {Time2, S2} = timer:tc(plists, map, [fun lists:sort/1, L1]),
    io:format("sort: results equal? = ~w, speedup = ~w~n",
              [S1 =:= S2, Time1/Time2]).

test2() ->
    L2 = lists:duplicate(100, 27),
    {Time1, S1} = timer:tc(lists, map, [fun fibo/1, L2]),
    {Time2, S2} = timer:tc(plists, map, [fun fibo/1, L2]),
    io:format("fibo: results equal? = ~w, speedup = ~w~n",
              [S1 =:= S2, Time1/Time2]).
Compile the module and run the functions test1 and test2. Which of the two tests has the better speedup? Why do you think this is so?
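For reference, a complete session might look like the following. The speedup figures are deliberately left out because they depend entirely on your machine, but the results-equal check should print true in both cases:

> c(multicore).
{ok,multicore}
> multicore:test1().
sort: results equal? = true, speedup = ...
ok
> multicore:test2().
fibo: results equal? = true, speedup = ...
ok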
By default, the Erlang runtime system starts with SMP support enabled if it's available and two or more logical processors are detected.
When you launch the Erlang shell, you can specify the "+S N" option to indicate that N schedulers should be used (each Erlang scheduler is a complete virtual machine that knows about all the other virtual machines). For example, the following command indicates that only one scheduler should be used:
$ erl +S 1
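You can confirm the number of schedulers from inside the shell using erlang:system_info/1. For example, after starting the shell with erl +S 1 on an SMP-enabled build, you should see something like:

> erlang:system_info(smp_support).
true
> erlang:system_info(schedulers).
1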
NOTE: In Windows, open a command window (cmd.exe) and use the werl command instead (normally found in the C:\Program Files\erl5.6\bin directory), supplying the same +S option.
If the +S option is omitted, it defaults to the number of logical processors in the SMP machine.
Run test1 and test2 again, but this time using only one scheduler. What happens to the speedups? Why do you think this happens?
Finally, run test1 and test2, but now using more schedulers than the number of logical processors available. What can you conclude from the observed results?
This is a class activity, nothing needs to be delivered.