During this activity, students should be able to:
This activity helps the student develop the following skills, values and attitudes: ability to analyze and synthesize, capacity for identifying and solving problems, and efficient use of computer systems.
Individually, solve the following programming exercises using Erlang and the plists:mapreduce function. Place your functions in a module called mapred.
A number of the form 2^n that contains the consecutive digits 666 (i.e., the beast number) is called an Apocalyptic Number. The number 2^157 is an apocalyptic number, because 2^157 = 182687704666362864775460604089535377456991567872, which contains the beast number starting at the digit in position 10 (counting from the left).
The function apocalyptic takes two integer parameters, S and E (0 ≤ S ≤ E). It starts by calling the MapReduce operation with a list of all the integers between S and E, inclusive. The mapping function receives a number N and determines whether 2^N is an apocalyptic number. If it is, it emits the tuple {true, N}; otherwise it emits the tuple {false, N}. After the reduction, the function returns a list with all the apocalyptic numbers found between S and E, or an empty list if none were found. Examples:
    > mapred:apocalyptic(100, 200).
    [157,192]
    > mapred:apocalyptic(100, 150).
    []
    > mapred:apocalyptic(800, 850).
    [800,807,819,820,822,823,824,826,828,836,838,840,841,842,
     844,846,848,850]
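The apocalyptic test itself can be sketched with the standard library alone. The module name apoc_sketch and the helper names below are hypothetical, and returning a one-element list from the mapping function is an assumption about what the MapReduce operation expects, so check the plists documentation before relying on it.

```erlang
-module(apoc_sketch).
-export([is_apocalyptic/1, mapper/1]).

%% True if the decimal representation of 2^N contains "666".
%% `1 bsl N` computes 2^N exactly because Erlang integers have
%% arbitrary precision.
is_apocalyptic(N) when is_integer(N), N >= 0 ->
    has_beast(integer_to_list(1 bsl N)).

%% Scan the digit string for three consecutive sixes.
has_beast([$6, $6, $6 | _]) -> true;
has_beast([_ | Rest]) -> has_beast(Rest);
has_beast([]) -> false.

%% Hypothetical mapping function: emits {true, N} or {false, N}.
%% Wrapping the tuple in a list is an assumption, not a confirmed
%% requirement of plists:mapreduce.
mapper(N) ->
    [{is_apocalyptic(N), N}].
```

In the shell, apoc_sketch:is_apocalyptic(157) should return true, which agrees with the 2^157 example above.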
The function max_access
determines the IP address
of the client computer (remote host) that has the greatest
number of accesses to a certain web server by inspecting its log
files. It receives a string with the name of the directory
containing the log files to be inspected.
Each of these log files consists of many lines similar to this one:
189.191.131.167 [23-Oct-2008:12:03:38 -0500] "GET /apps/s200813/tc2006/noticias/" 200 541
This is the description of each element in the previous line:
- 189.191.131.167 – IP address of the client (remote host) that made the request to the server.
- [23-Oct-2008:12:03:38 -0500] – The date and time that the server finished processing the request.
- "GET /apps/s200813/tc2006/noticias/" – The request line from the client, within double quotes.
- 200 – Status code that the server sent back to the client. Status code 200 means everything is OK.
- 541 – Size in bytes of the response body sent back to the client. This field may be absent.
The max_access function calls the MapReduce operation with a list containing the pathnames of all the log files in the specified directory. The mapping function reads a single log file and, for each line, emits a {IP_address, 1} tuple. After the reduction, the total count is computed for each unique IP_address, and the IP_address with the largest count is returned as a tuple {IP_address, Largest_count}.
For example (using the contents of logs.zip):
    > mapred:max_access("logs").
    {"10.48.9.90",613}
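Two pieces of this exercise can be sketched independently of MapReduce: pulling the IP address out of a log line, and picking the most frequent address from a list of {IP_address, 1} tuples. The module name access_sketch and both function names are hypothetical, and the sketch assumes the IP address is always the text before the first space, as in the sample line above.

```erlang
-module(access_sketch).
-export([ip_of_line/1, busiest/1]).

%% The IP address is assumed to be everything before the first space.
ip_of_line(Line) ->
    lists:takewhile(fun(C) -> C =/= $\s end, Line).

%% Given a list of {IP, Count} tuples, total the counts per IP and
%% return the {IP, Total} pair with the largest total.
%% Ties are resolved arbitrarily; an empty list yields `none`.
busiest(Pairs) ->
    Counts = lists:foldl(
               fun({IP, K}, D) -> dict:update_counter(IP, K, D) end,
               dict:new(), Pairs),
    dict:fold(fun(IP, C, none) -> {IP, C};
                 (IP, C, {_, Best}) when C > Best -> {IP, C};
                 (_, _, Acc) -> Acc
              end, none, Counts).
```

Returning none for an empty list is a design choice made for this sketch only; the assignment does not say what max_access should do when there are no log lines.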
The function locs
computes the number of
lines of code (LOCs) in a set of source files. It takes two
inputs: a string with the name of the directory from where the
search will start, and a wildcard string used to determine which
files to include during the search. The search considers all the
files in the specified directory and all its subdirectories
at any depth.
The function calls the MapReduce operation with a list containing the pathnames of all the files in the specified directory (and all its subdirectories) that match the given wildcard. The mapping function reads a single file, counts its number of lines, and emits a {File_name, Number_of_lines} tuple. After the reduction, the total number of files (TF) and the total number of lines in all files (TL) are computed. The function returns the tuple {TF, TL}.
The following example (using the contents of
nasm-2.05.zip) demonstrates how the locs
function could be used:
    > mapred:locs("nasm-2.05/lib", "*.c").
    {2,74}
    > mapred:locs("nasm-2.05", "*.c").
    {68,75892}
    > mapred:locs("nasm-2.05", "*.pl").
    {23,5429}
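The final step, turning the emitted {File_name, Number_of_lines} tuples into {TF, TL}, amounts to a single fold. The sketch below assumes the tuples are available as a flat list, which may not be the shape the MapReduce result actually has; the module name locs_sketch and the function totals are hypothetical.

```erlang
-module(locs_sketch).
-export([totals/1]).

%% Fold a list of {File_name, Number_of_lines} tuples into {TF, TL}:
%% TF counts the files, TL sums their line counts.
totals(Pairs) ->
    lists:foldl(fun({_File, Lines}, {TF, TL}) -> {TF + 1, TL + Lines} end,
                {0, 0}, Pairs).
```

For instance, two made-up files with 30 and 44 lines would give {2,74}.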
To solve the previous problems, you will probably also need some or all of the following Erlang modules:
Assume you have a text file called codemonkey.txt
in the
current working directory with the following content:
    Code Monkey like Fritos.
    Code Monkey like Tab and Mountain Dew.
    Code Monkey very simple man,
    with big warm fuzzy secret heart.
    Code Monkey like you.
You can use the following Erlang code to read this file one line at a time:
    > {ok, F} = file:open("codemonkey.txt", read).
    {ok,<0.33.0>}
    > io:get_line(F, '').
    "Code Monkey like Fritos.\n"
    > io:get_line(F, '').
    "Code Monkey like Tab and Mountain Dew.\n"
    > io:get_line(F, '').
    "Code Monkey very simple man,\n"
    > io:get_line(F, '').
    "with big warm fuzzy secret heart.\n"
    > io:get_line(F, '').
    "Code Monkey like you.\n"
    > io:get_line(F, '').
    eof
    > file:close(F).
    ok
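The same calls can be wrapped in a loop that visits every line until eof. The following is a minimal sketch, not a required design; the module name lines_sketch and the functions fold_lines and count_lines are hypothetical.

```erlang
-module(lines_sketch).
-export([fold_lines/3, count_lines/1]).

%% Apply Fun(Line, Acc) to every line of the file at Path and
%% return the final accumulator. The file is closed even if Fun fails.
fold_lines(Fun, Acc0, Path) ->
    {ok, F} = file:open(Path, [read]),
    try
        loop(F, Fun, Acc0)
    after
        file:close(F)
    end.

%% Read one line at a time until io:get_line/2 reports eof.
loop(F, Fun, Acc) ->
    case io:get_line(F, '') of
        eof -> Acc;
        {error, Reason} -> erlang:error(Reason);
        Line -> loop(F, Fun, Fun(Line, Acc))
    end.

%% Example use of the fold: count the lines in a file.
count_lines(Path) ->
    fold_lines(fun(_Line, N) -> N + 1 end, 0, Path).
```

With the codemonkey.txt file described above, lines_sketch:count_lines("codemonkey.txt") should return 5.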
The filelib:wildcard function returns a list of all the files that match a Unix-style wildcard string. The filelib:is_dir function determines whether a name actually refers to a directory. For example:
    > filelib:wildcard("*.txt").
    ["codemonkey.txt"]
    > filelib:is_dir("/home/aortiz").
    true
    > filelib:is_dir("codemonkey.txt").
    false
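These two functions can be combined to collect matching files from a directory and all its subdirectories. The sketch below is one possible approach under stated assumptions: the module name find_sketch and the function find_files are hypothetical, it relies on filelib:wildcard/2 (which matches relative to a given directory) plus file:list_dir/1, and it does not guard against symbolic-link cycles.

```erlang
-module(find_sketch).
-export([find_files/2]).

%% Return the pathnames of all regular files under Dir, at any depth,
%% whose names match Wildcard (for example "*.c").
find_files(Dir, Wildcard) ->
    %% Matches in this directory; wildcard/2 returns names relative
    %% to Dir, so the directory is joined back on.
    Matches = [filename:join(Dir, Name)
               || Name <- filelib:wildcard(Wildcard, Dir)],
    Files = [P || P <- Matches, filelib:is_regular(P)],
    %% Recurse into every subdirectory, whether or not its own name
    %% matches the wildcard.
    {ok, Names} = file:list_dir(Dir),
    SubDirs = [P || N <- Names,
                    P <- [filename:join(Dir, N)],
                    filelib:is_dir(P)],
    Files ++ lists:append([find_files(D, Wildcard) || D <- SubDirs]).
```

The order of the returned list depends on the order in which the file system lists directory entries, so sort it if a stable order matters.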
Using the Online Assignment Delivery System (SETA), deliver the file called mapred.erl. No assignments will be accepted through e-mail or any other means.
IMPORTANT: The program source file must include at the top the author's personal information (name and student id) within comments. For example:
    %% ITESM CEM, April 12, 2010.
    %% Erlang Source File
    %% Activity: MapReduce Exercises
    %% Author: Steve Rogers, 449999
    .
    .
    (The rest of the program goes here)
    .
Due date: Monday, April 12.
This activity will be evaluated using the following criteria:
Grade | Criterion |
---|---|
-10 | The program doesn't contain the author's personal information within comments. |
10 | The program contains syntax errors. |
DA | The program was plagiarized. |
10-100 | Depending on the number of exercises that were solved correctly. |