Computational Methods (Métodos computacionales)

Problem Set #7: Regular Expressions

Objective

During this activity, students should be able to:

This activity helps students develop the following skills, values and attitudes: ability to analyze and synthesize, capacity for identifying and solving problems, and efficient use of computer systems.


Activity Description

With your team, solve the Clojure exercises described in this problem set. Make sure each regular expression passes all of its unit tests.

Create a namespace called regex. At the beginning of the file, add the following code to declare the namespace and refer the required functions from clojure.test:

(ns regex
  (:require [clojure.test :refer [deftest is run-tests]]))

At the end of the file, add:

(run-tests)

All the code you write must go between these two forms.
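The unit tests call re-matches, and exercise 6 also calls re-seq. Before you start, compare these clojure.core functions with re-find. The minimal sketch below uses a hypothetical scratch namespace, regex-demo, and a hypothetical pattern, digits. Neither is part of the assignment.

```clojure
(ns regex-demo
  (:require [clojure.test :refer [deftest is run-tests]]))

;; Hypothetical pattern for illustration only: one or more ASCII digits.
(def digits #"[0-9]+")

;; re-matches: the WHOLE string must match; returns the match or nil.
(println (re-matches digits "2022"))      ; prints 2022
(println (re-matches digits "May 2022"))  ; prints nil
;; re-find: returns the first matching substring, if any.
(println (re-find digits "May 2022"))     ; prints 2022
;; re-seq: lazy sequence of all non-overlapping matches.
(println (re-seq digits "19 May 2022"))   ; prints (19 2022)

(deftest test-digits
  (is (re-matches digits "24601"))
  (is (not (re-matches digits "")))
  (is (not (re-matches digits "12a"))))

(run-tests)
```

Because re-matches must consume the entire input, the patterns in this problem set do not need the ^ and $ anchors.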

  1. C Identifiers: An identifier is a sequence of letters and digits. The first character must be a letter; the underscore _ is considered a letter. Uppercase and lowercase letters are considered different.

    Reference: Kernighan, B. & Ritchie, D. (1988) The C Programming Language, 2nd Edition. Prentice Hall. p. 192.

    ;;; Regular expression:
    (def c-identifier #"")
    

    Unit tests:

    (deftest test-c-identifier
      (is (re-matches c-identifier "_"))
      (is (re-matches c-identifier "a"))
      (is (re-matches c-identifier "A"))
      (is (re-matches c-identifier "_an_identfier_42"))
      (is (re-matches c-identifier "_1234567890"))
      (is (re-matches c-identifier "___________"))
      (is (re-matches c-identifier "ThisIsAnIdentfier"))
      (is (not (re-matches c-identifier "")))
      (is (not (re-matches c-identifier "5")))
      (is (not (re-matches c-identifier "1234567890")))
      (is (not (re-matches c-identifier "#!@$^")))
      (is (not (re-matches c-identifier "_a_b_c_$"))))
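
    In exercise 1, the first character comes from a different set than the remaining characters. In a regex, this becomes one character class followed by a second class with a quantifier. The sketch below illustrates the idea on a different, hypothetical token: a spreadsheet-style cell reference such as A1 or ZZ10, made of one or more uppercase ASCII letters followed by a row number that does not start with 0. The namespace cell-demo and the var cell-ref are assumptions made only for this example.

    ```clojure
    (ns cell-demo
      (:require [clojure.test :refer [deftest is run-tests]]))

    ;; Hypothetical pattern, not part of the assignment:
    ;; one or more uppercase ASCII letters (the column),
    ;; then a digit 1-9 followed by any number of digits 0-9 (the row).
    (def cell-ref #"[A-Z]+[1-9][0-9]*")

    (deftest test-cell-ref
      (is (re-matches cell-ref "A1"))
      (is (re-matches cell-ref "ZZ10"))
      (is (not (re-matches cell-ref "A0")))  ; row cannot start with 0
      (is (not (re-matches cell-ref "a1")))  ; classes are case-sensitive
      (is (not (re-matches cell-ref "1A")))  ; column must come first
      (is (not (re-matches cell-ref "A"))))  ; row is required

    (run-tests)
    ```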
    
  2. Scheme Boolean Literals: The standard boolean objects for true and false are written as #t and #f. Alternatively, they can be written #true and #false, respectively.

    Reference: Shinn, A., Cowan, J., & Gleckler, A. (2020) Revised⁷ Report on the Algorithmic Language Scheme. p. 40.

    ;;; Regular expression:
    (def scheme-boolean #"")
    

    Unit tests:

    (deftest test-scheme-boolean
      (is (re-matches scheme-boolean "#t"))
      (is (re-matches scheme-boolean "#f"))
      (is (re-matches scheme-boolean "#true"))
      (is (re-matches scheme-boolean "#false"))
      (is (not (re-matches scheme-boolean "t")))
      (is (not (re-matches scheme-boolean "f")))
      (is (not (re-matches scheme-boolean "true")))
      (is (not (re-matches scheme-boolean "false")))
      (is (not (re-matches scheme-boolean "()")))
      (is (not (re-matches scheme-boolean "0")))
      (is (not (re-matches scheme-boolean "T")))
      (is (not (re-matches scheme-boolean "F")))
      (is (not (re-matches scheme-boolean "#v")))
      (is (not (re-matches scheme-boolean "#truth")))
      (is (not (re-matches scheme-boolean "#falsy"))))
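
    Exercise 2 calls for alternation (|). Two details are worth checking in the REPL. First, | has the lowest precedence, so it splits the whole pattern unless a group contains it. Second, when a pattern contains a capturing group, re-matches returns a vector instead of a string. The vector is still truthy, so is accepts it. The weekday patterns below (no-group and weekday) are hypothetical examples, not part of the assignment.

    ```clojure
    ;; Hypothetical patterns, for illustration only.

    ;; Without a group, | splits the entire pattern:
    ;; this means "Mon" OR "Tuesday", not "Monday" OR "Tuesday".
    (def no-group #"Mon|Tuesday")
    (println (re-matches no-group "Mon"))      ; prints Mon
    (println (re-matches no-group "Monday"))   ; prints nil

    ;; A non-capturing group (?: ...) limits the scope of the alternation.
    (def weekday #"(?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day")
    (println (re-matches weekday "Thursday"))  ; prints Thursday
    (println (re-matches weekday "Thursdays")) ; prints nil

    ;; A capturing group makes re-matches return a vector:
    ;; the full match followed by each group.
    (println (re-matches #"(Mon|Tues)day" "Monday")) ; prints [Monday Mon]
    ```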
    
  3. Scheme Integer Literals: [An integer] number may be written in binary, octal, decimal, or hexadecimal by the use of a radix prefix. The radix prefixes are #b (binary), #o (octal), #d (decimal), and #x (hexadecimal). With no radix prefix, a number is assumed to be expressed in decimal. [After the optional radix prefix, the number consists of one or more digits valid in the corresponding radix.] Note that case is not significant in numerical constants.

    Reference: Shinn, A., Cowan, J., & Gleckler, A. (2020) Revised⁷ Report on the Algorithmic Language Scheme. p. 34.

    ;;; Regular expression:
    (def scheme-integer #"")
    

    Unit tests:

    (deftest test-scheme-integer
      (is (re-matches scheme-integer "0"))
      (is (re-matches scheme-integer "24601"))
      (is (re-matches scheme-integer "#d1234567890"))
      (is (re-matches scheme-integer "#b10"))
      (is (re-matches scheme-integer "#o12345670"))
      (is (re-matches scheme-integer "#x1234567890abcdefABCDEF"))
      (is (not (re-matches scheme-integer "")))
      (is (not (re-matches scheme-integer "#123")))
      (is (not (re-matches scheme-integer "#da1234567890")))
      (is (not (re-matches scheme-integer "#b102")))
      (is (not (re-matches scheme-integer "#o123456780")))
      (is (not (re-matches scheme-integer
                           "#x1234567890abcdefgABCDEF"))))
    
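    Case is not significant in Scheme numerical constants, so the embedded flag (?i) may help in exercise 3. This flag turns on the CASE_INSENSITIVE mode of java.util.regex. At the start of a pattern, it applies to the whole pattern. The form (?i:...) limits it to one group. The file-name patterns doc-file and scoped below are hypothetical examples, not part of the assignment.

    ```clojure
    ;; Hypothetical pattern: letters, a period, and a txt or md
    ;; extension, matched without regard to case.
    (def doc-file #"(?i)[a-z]+\.(?:txt|md)")

    (println (re-matches doc-file "notes.txt"))  ; prints notes.txt
    (println (re-matches doc-file "README.MD"))  ; prints README.MD
    (println (re-matches doc-file "notes.pdf"))  ; prints nil

    ;; (?i:...) scopes the flag: only the extension ignores case here.
    (def scoped #"[a-z]+\.(?i:txt|md)")
    (println (re-matches scoped "notes.TXT"))    ; prints notes.TXT
    (println (re-matches scoped "NOTES.txt"))    ; prints nil
    ```
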
  4. Java Integer Literals: An integer literal may be expressed in decimal (base 10), hexadecimal (base 16), or octal (base 8). An integer literal is of type long if it is suffixed with an ASCII letter L or l (ell); otherwise it is of type int. The suffix L is preferred, because the letter l (ell) is often hard to distinguish from the digit 1 (one).

    A decimal numeral is either the single ASCII character 0, representing the integer zero, or consists of an ASCII digit from 1 to 9, optionally followed by one or more ASCII digits from 0 to 9.

    A hexadecimal numeral consists of the leading ASCII characters 0x or 0X followed by one or more ASCII hexadecimal digits. Hexadecimal digits with values 10 through 15 are represented by the ASCII letters a through f or A through F, respectively; each letter used as a hexadecimal digit may be uppercase or lowercase.

    An octal numeral consists of an ASCII digit 0 followed by one or more of the ASCII digits 0 through 7.

    Note that octal numerals always consist of two or more digits; 0 is always considered to be a decimal numeral (not that it matters much in practice, for the numerals 0, 00, and 0x0 all represent exactly the same integer value).

    Reference: Gosling, J., Joy, B., Steele, G., & Bracha, G. (2005) The Java Language Specification, Third Edition. Addison-Wesley. pp. 22-23.

    ;;; Regular expression:
    (def java-integer #"")
    

    Unit tests:

    (deftest test-java-integer
      (is (re-matches java-integer "0"))
      (is (re-matches java-integer "1234567890"))
      (is (re-matches java-integer "012345670"))
      (is (re-matches java-integer "0xabcdef1234567890ABCDEF"))
      (is (re-matches java-integer "0l"))
      (is (re-matches java-integer "1234567890l"))
      (is (re-matches java-integer "012345670L"))
      (is (re-matches java-integer "0Xabcde1234567890fABCDEFL"))
      (is (not (re-matches java-integer "")))
      (is (not (re-matches java-integer "L")))
      (is (not (re-matches java-integer "1a234567890")))
      (is (not (re-matches java-integer "0123456780")))
      (is (not (re-matches java-integer "0x1234567890abcdefgABCD"))))
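
    You can express an optional element, such as the suffix in exercise 4, with the ? quantifier. It applies to a single character, a character class, or a group. The minimal sketch below uses a hypothetical CSS-like length named css-length, made of digits optionally followed by the unit px or em. It also shows how alternation order interacts with re-matches and re-find.

    ```clojure
    ;; Hypothetical pattern: one or more digits, optionally followed
    ;; by the unit px or em.
    (def css-length #"[0-9]+(?:px|em)?")

    (println (re-matches css-length "40"))    ; prints 40
    (println (re-matches css-length "12px"))  ; prints 12px
    (println (re-matches css-length "3em"))   ; prints 3em
    (println (re-matches css-length "px"))    ; prints nil: digits required
    (println (re-matches css-length "12pt"))  ; prints nil: unknown unit

    ;; With re-matches, the order of alternatives does not affect acceptance:
    ;; if "0" leaves input unconsumed, the matcher backtracks and tries "012".
    (println (re-matches #"0|012" "012"))     ; prints 012
    ;; re-find, by contrast, returns the first alternative that matches.
    (println (re-find #"0|012" "012"))        ; prints 0
    ```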
    
  5. Java Floating-Point Literals: A floating-point literal has the following parts: a whole-number part, a decimal point (represented by an ASCII period character), a fractional part, an exponent, and a type suffix. The exponent, if present, is indicated by the ASCII letter e or E followed by an optionally signed integer.

    At least one digit, in either the whole number or the fraction part, and either a decimal point, an exponent, or a float type suffix are required. All other parts are optional.

    A floating-point literal is of type float if it is suffixed with an ASCII letter F or f; otherwise its type is double and it can optionally be suffixed with an ASCII letter D or d.

    Reference: Gosling, J., Joy, B., Steele, G., & Bracha, G. (2000) The Java Language Specification, Second Edition. Addison-Wesley. p. 24.

    ;;; Regular expression:
    (def java-float #"")
    

    Unit tests:

    (deftest test-java-float
      (is (re-matches java-float "1."))
      (is (re-matches java-float ".2"))
      (is (re-matches java-float "3e4"))
      (is (re-matches java-float "5e-6"))
      (is (re-matches java-float "7e+8"))
      (is (re-matches java-float "9F"))
      (is (re-matches java-float "1f"))
      (is (re-matches java-float "2D"))
      (is (re-matches java-float "3d"))
      (is (re-matches java-float "1.2"))
      (is (re-matches java-float "1234.E567"))
      (is (re-matches java-float "1234.E-567"))
      (is (re-matches java-float "1234.E+567"))
      (is (re-matches java-float ".1234E567"))
      (is (re-matches java-float ".1234E-567"))
      (is (re-matches java-float ".1234E+567"))
      (is (re-matches java-float "123."))
      (is (re-matches java-float ".4567"))
      (is (re-matches java-float "123.4567"))
      (is (re-matches java-float "123.4567E890"))
      (is (re-matches java-float "123.4567E-890"))
      (is (re-matches java-float "123.4567E+890"))
      (is (re-matches java-float "123.F"))
      (is (re-matches java-float ".4567f"))
      (is (re-matches java-float "123.4567D"))
      (is (re-matches java-float "123.4567E890d"))
      (is (re-matches java-float "123.4567E-890F"))
      (is (re-matches java-float "123.4567E+890f"))
      (is (re-matches java-float "1234e567"))
      (is (re-matches java-float "1234e-567"))
      (is (re-matches java-float "1234e+567"))
      (is (re-matches java-float "12345678F"))
      (is (re-matches java-float "12345678f"))
      (is (re-matches java-float "12345678D"))
      (is (re-matches java-float "12345678d"))
      (is (not (re-matches java-float ".")))
      (is (not (re-matches java-float "0")))
      (is (not (re-matches java-float "1234")))
      (is (not (re-matches java-float "-1234")))
      (is (not (re-matches java-float "e1234")))
      (is (not (re-matches java-float "E-1234")))
      (is (not (re-matches java-float "F")))
      (is (not (re-matches java-float "f")))
      (is (not (re-matches java-float "d")))
      (is (not (re-matches java-float "D")))
      (is (not (re-matches java-float "E")))
      (is (not (re-matches java-float "e")))
      (is (not (re-matches java-float "123..456")))
      (is (not (re-matches java-float "123.456.789")))
      (is (not (re-matches java-float "123456E78.90")))
      (is (not (re-matches java-float "123.456E78.90")))
      (is (not (re-matches java-float "123.456E78DF")))
      (is (not (re-matches java-float "-123.4567E890")))
      (is (not (re-matches java-float "+123.4567E890")))
      (is (not (re-matches java-float "0x1234")))
      (is (not (re-matches java-float "01234")))
      (is (not (re-matches java-float "123E"))))
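
    Exercise 5 contains requirements of the form "at least one of these parts must be present". Making every part optional would also accept an input that has none of them. A common technique is an alternation in which each branch requires a different part. The hypothetical range notation below sketches the idea: 3-, -5, and 3-5 are valid, while a lone - is not. The names range-spec and naive-range are assumptions made only for this example.

    ```clojure
    ;; Hypothetical range notation, not part of the assignment:
    ;; a lower bound, an upper bound, or both, separated by "-".
    ;; Branch 1 requires the lower bound; branch 2 requires the upper one.
    (def range-spec #"[0-9]+-[0-9]*|-[0-9]+")

    ;; The naive version makes both bounds optional and wrongly accepts "-".
    (def naive-range #"[0-9]*-[0-9]*")

    (println (re-matches range-spec "3-"))   ; prints 3-
    (println (re-matches range-spec "-5"))   ; prints -5
    (println (re-matches range-spec "3-5"))  ; prints 3-5
    (println (re-matches range-spec "-"))    ; prints nil
    (println (re-matches naive-range "-"))   ; prints -
    ```

    Because | has the lowest precedence of any regex operator, wrap such an alternation in a non-capturing group (?:...) before appending further elements to it.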
    
  6. C Comments: The characters /* start a comment that ends with the characters */. Comments cannot be nested, nor can they appear inside a string or character literal.

    Reference: Kernighan, B. & Ritchie, D. (1988) The C Programming Language, 2nd Edition. Prentice Hall. p. 192.

    ;;; Regular expression:
    (def c-comment #"")
    

    Unit tests:

    (deftest test-c-comment
      (is (re-matches c-comment "/**/"))
      (is (re-matches c-comment "/*-*/"))
      (is (re-matches c-comment "/*\n*/"))
      (is (re-matches c-comment
                      "/***********
                       /*         *
                       /*         *
                       /***********/"))
      (is (= 3 (count (re-seq c-comment "/*********
                                          Comment 1
                                          *********/
    
                                         /*********
                                          Comment 2
                                          *********/
    
                                         /*********
                                          Comment 3
                                          *********/"))))
      (is (not (re-matches c-comment "/")))
      (is (not (re-matches c-comment "/*")))
      (is (not (re-matches c-comment "/**")))
      (is (not (re-matches c-comment "/*/")))
      (is (not (re-matches c-comment "//")))
      (is (not (re-matches c-comment "/** /")))
      (is (not (re-matches c-comment "******/")))
      (is (not (re-matches c-comment "/ * * * */"))))
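
    Exercise 6 involves two java.util.regex behaviors that are easy to miss. First, . does not match a line terminator by default; the embedded flag (?s) (DOTALL) changes this. Second, * is greedy. With re-seq, a greedy pattern can swallow everything from the first opening delimiter to the last closing one, whereas the reluctant form *? stops at the earliest possible closing delimiter. The sketch below shows both behaviors on a hypothetical string, text, that contains two double-quoted words.

    ```clojure
    ;; Hypothetical input: two double-quoted strings on one line.
    (def text "\"a\" and \"b\"")

    ;; Greedy .* runs to the LAST quote: one match spanning both strings.
    (println (count (re-seq #"\".*\"" text)))   ; prints 1
    ;; Reluctant .*? stops at the FIRST closing quote: two matches.
    (println (count (re-seq #"\".*?\"" text)))  ; prints 2

    ;; By default . does not match a newline...
    (println (re-matches #"\".*\"" "\"a\nb\""))               ; prints nil
    ;; ...but it does with the (?s) (DOTALL) flag.
    (println (some? (re-matches #"(?s)\".*\"" "\"a\nb\"")))  ; prints true
    ```

    A negated character class such as [^"] matches any character except the quote, newlines included. This gives another way to let a pattern span several lines.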
    

Deliverables

The program source file must include, at the top, the authors’ personal information (student ID and name) in comments. For example:

;----------------------------------------------------------
; Problem Set #7: Regular Expressions
; Date: May 19, 2022.
; Authors:
;          A01770771 Sylvie Laufeydottir
;          A01777771 Loki Laufeyson
;----------------------------------------------------------

File Upload Instructions

To submit the file regex.clj, enter the requested information in the upload form.


Only one team member needs to upload the file.

The due date is Thursday, May 19.

Evaluation

This activity will be evaluated using the following criteria:

-10      The program does not include the authors’ personal information in comments.
10       The program contains syntax errors.
1        The program was plagiarized in whole or in part.
10-100   Depending on the number of exercises solved correctly.