A Multi-Core Experiment

Objectives

During this activity, students should be able to:

Compare the execution speed of Erlang programs running sequentially and in parallel using a multi-core computer.

This activity helps the student develop the following skills, values and attitudes: ability to analyze and synthesize, capacity for identifying and solving problems, and efficient use of computer systems.

Requirements

A computer with a dual (or better) core processor, such as an Intel Centrino Duo or an AMD Turion 64 X2. Actually, any Erlang supported Symmetric Multiprocessing (SMP) computer system should work. An SMP computer has two or more identical CPUs that are connected to a single shared memory. These CPUs may be on a single multi-core chip, spread across several chips, or a combination of both.

Activity Description

This activity is an adaptation from [ARMSTRONG], pp. 375-379.

For this experiment we'll use two different problem sets:

Test 1: Sort 100 lists, each one with 1,000 random integers.
Test 2: Compute 100 times the 27th element of the Fibonacci sequence.

We'll execute each test using two different versions of the map function: the ordinary (sequential) lists:map and the parallel plists:map. By default, this last function operates over each list element using an independent Erlang process.

NOTE: In order to use the plists module, extract plists-1.0.zip, copy the plists-1.0/src/plists.erl file to your working directory and compile it.

For Test 1, we need a function to create a list with K random integers in the range 1...1,000,000. The following code achieves this:

make_random_list(K) -> 
    [random:uniform(1000000) || _ <- lists:duplicate(K, null)].

The seed function resets the random number generator. This is so we get the same sequence of random numbers each time we run the program:

seed() -> random:seed(44,55,66).

This code creates a list L1 with 100 lists containing 1,000 random integers each:

L1 = [make_random_list(1000) || _ <- lists:duplicate(100, null)].

For Test 2, we need a function to compute the n-th element of the Fibonacci sequence. We'll use the following computational intensive (and inefficient) recursive definition:

fibo(0) -> 1;
fibo(1) -> 1;
fibo(N) when N > 1 -> fibo(N-1) + fibo(N-2).

The list L2, containing 100 twenty-sevens, can be defined as:

L2 = lists:duplicate(100, 27).

In summary, the objective of our experiment is to measure the time it takes to execute the four expressions contained in the following table:

Expression	Category	Description
`lists:map(fun lists:sort/1, L1)`	Sequential	Light-weight computations with a lot of message passing: Sort 100 lists, each one with 1,000 random integers.
`plists:map(fun lists:sort/1, L1)`	Parallel
`lists:map(fun fibo/1, L2)`	Sequential	CPU-bound computations with little message passing: Compute 100 times the 27th element of the Fibonacci sequence.
`plists:map(fun fibo/1, L2)`	Parallel

In order to measure the execution time, we'll use the timer:tc function which takes three parameters: Module, FunName, and ArgList. It calls the corresponding function with the specified arguments and measures the elapsed real time. It returns a tuple {Time, Value}, where Time is the elapsed real time in microseconds, and Value is what the function returns. For example:

> {T,V} = timer:tc(lists,seq,[1,20]). 
{1,[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]}
> T.
1
> V.
[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]

Once we have our time measurements, we can calculate the speedup, which refers to how much a parallel program is faster than a corresponding sequential program.

The speedup is defined by the following formula:

S_p = T₁ / T_p

Where:

S_p: The speedup obtained from using p processors.
T₁: The time it takes the program to be executed sequentially.
T_p: The time it takes the program to be executed in parallel using p processors.

Now lets put everything together in a file called multicore.erl:

-module(multicore).
-export([test1/0,test2/0]).

make_random_list(K) -> 
    [random:uniform(1000000) || _ <- lists:duplicate(K, null)].

seed() -> random:seed(44,55,66).

fibo(0) -> 1;
fibo(1) -> 1;
fibo(N) when N > 1 -> fibo(N-1) + fibo(N-2).

test1() ->
    seed(),
    L1 = [make_random_list(1000) || _ <- lists:duplicate(100, null)],
    {Time1, S1} = timer:tc(lists, map, [fun lists:sort/1, L1]),
    {Time2, S2} = timer:tc(plists, map, [fun lists:sort/1, L1]),
    io:format("sort: results equal? = ~w, speedup = ~w~n", 
              [S1=:=S2, Time1/Time2]).

test2() ->
    L2 = lists:duplicate(100, 27),
    {Time1, S1} = timer:tc(lists, map, [fun fibo/1, L2]),
    {Time2, S2} = timer:tc(plists, map, [fun fibo/1, L2]),
    io:format("fibo: results equal? = ~w, speedup = ~w~n", 
              [S1=:=S2, Time1/Time2]).

Compile the module and run the functions test1 and test2. Which of the two tests has the best speedup? Why do you think this is so?

By default, the Erlang runtime system starts with SMP support enabled if it's available and two or more logical processors are detected.

When you launch the Erlang shell, you can specify the "+S N" option to indicate that N schedulers should be used (each Erlang scheduler is a complete virtual machine that knows about all the other virtual machines). For example, the following command indicates that only one scheduler should be used:

$ erl +S 1

NOTE: In Windows, open a command window (cmd.exe) and use the werl command instead (normally found in the C:\Program Files\erl5.6\bin directory) supplying the same +S option.

If the +S option is omitted, it defaults to the number of logical processors in the SMP machine.

Run again test1 and test2, but this time using only one scheduler. What happens to the speedups? Why do you think this happens?

Finally, run test1 and test2, but now using more schedulers than the number of logical processors available. What can you conclude from the observed results?

Deliverables

This is a class activity, nothing needs to be delivered.

Programming Languages

A Multi-Core Experiment

Objectives

Requirements

Activity Description

Deliverables