How to write programs that write programs.

This article was originally published in Linux Journal Issue #158.


A metaprogram is a program that generates other programs or program parts. Hence, metaprogramming means writing metaprograms. Many useful metaprograms are available for Linux; the most common ones include compilers (gcc or clang), interpreters (perl or ruby), parser generators (bison), assemblers (as or nasm) and preprocessors (cpp or m4). Typically, you use a metaprogram to eliminate or reduce a tedious or error-prone programming task. So, for example, instead of writing a machine code program by hand, you would use a high-level language, such as C, and then let the C compiler do the translation to the equivalent low-level machine instructions.

At first, metaprogramming may seem to be an advanced topic, suitable only for programming language gurus, but it’s not really that difficult once you know how to use the right tools.

Source Code Generation

In order to present a very simple example of metaprogramming, let’s assume the following totally fictional situation.

Erika is a very smart first-year undergraduate computer science student. She already knows several programming languages, including C and Ruby. During her introductory programming class, Professor Gomez, the course instructor, caught her chatting on her laptop computer. As punishment, he demanded Erika write a C program that printed the following 1,000 lines of text:

1. I must not chat in class.
2. I must not chat in class.
            ·
            ·
            ·
999. I must not chat in class.
1000. I must not chat in class.

An additional imposed restriction was that the program could not use any kind of loop or goto instruction. It should contain only one big main function with 1,000 printf instructions — something like this:

#include <stdio.h>
int main(void) {
    printf("1. I must not chat in class.\n");
    printf("2. I must not chat in class.\n");

    /* 996 printf instructions omitted. */

    printf("999. I must not chat in class.\n");
    printf("1000. I must not chat in class.\n");
    return 0;
}

Professor Gomez wasn’t too naive, so he basically expected Erika to write the printf instruction once, copy it to the clipboard, do 999 pastes, and manually change the numbers. He expected that even this amount of irksome and repetitive work would be enough to teach her a lesson. But, Erika immediately saw an easy way out — metaprogramming. Instead of writing this program by hand, why not write another program that writes this program automatically for her? So, she wrote the following Ruby script:

File.open('punishment.c', 'w') do |output|
  output.puts '#include <stdio.h>'
  output.puts 'int main(void) {'
  1.upto(1000) do |i|
    output.puts "    printf(\"#{ i }. " +
      "I must not chat in class.\\n\");"
  end
  output.puts '    return 0;'
  output.puts '}'
end

This code creates a file called punishment.c with the expected 1,000+ lines of C source code.
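As a variation on the script above (a sketch, not part of Erika’s solution), the generator can build the C source as a string first, which makes it easy to check the result before writing anything to disk. The helper name punishment_source is hypothetical:

```ruby
# Hypothetical helper: returns the generated C source as a String
# instead of writing it directly to punishment.c.
def punishment_source(count)
  body = (1..count).map do |i|
    # \\n becomes the two characters \n in the generated C code.
    %(    printf("#{i}. I must not chat in class.\\n");)
  end
  ['#include <stdio.h>', 'int main(void) {', *body,
   '    return 0;', '}'].join("\n") + "\n"
end

source = punishment_source(1000)
puts source.lines.first(3)                  # header, main, first printf
puts "total lines: #{source.lines.length}"  # total lines: 1004
```

The resulting string could then be saved with File.write('punishment.c', source).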

Although this example might seem a bit fabricated, it illustrates how easy it is to write a program that produces the source of another program. This technique can be used in more realistic settings. Let’s say that you have a C program that needs to include a PNG image, but for some reason, the deployment platform can accept one file only, the executable file. Thus, the data that makes up the PNG file has to be integrated within the program code itself. To achieve this, we can read the PNG file beforehand and generate the C source text for an array declaration, initialized with the corresponding data as literal values. This Ruby script does exactly that:

INPUT_FILE_NAME = 'ljlogo.png'
OUTPUT_FILE_NAME = 'ljlogo.h'
DATA_VARIABLE_NAME = 'ljlogo'

# Open in binary mode so the bytes are read unchanged on every platform.
File.open(INPUT_FILE_NAME, 'rb') do |input|
  File.open(OUTPUT_FILE_NAME, 'w') do |output|
    output.print "unsigned char #{ DATA_VARIABLE_NAME }[] = {"
    data = input.read.unpack('C*')
    data.length.times do |i|
      if i % 8 == 0
        output.print "\n    "
      end
      output.print '0x%02X' % data[i]
      output.print ', ' if i < data.length - 1
    end
    output.puts "\n};"
  end
end

This script reads the file called ljlogo.png and creates a new output file called ljlogo.h. First, it writes the declaration of the variable ljlogo as an array of unsigned characters. Next, it reads the whole input file at once and unpacks every single input character as an unsigned byte. Then, it writes each of the input bytes as two-digit hexadecimal numbers in groups of eight elements per line. As should be expected, individual elements are terminated with commas, except the last one. Finally, the script writes the closing brace and semicolon. Here is a possible output file sample:

unsigned char ljlogo[] = {
    0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A,
    0x00, 0x00, 0x00, 0x0D, 0x49, 0x48, 0x44, 0x52,

    /* A few hundred lines omitted. */

    0x0B, 0x13, 0x00, 0x00, 0x00, 0x00, 0x49, 0x45,
    0x4E, 0x44, 0xAE, 0x42, 0x60, 0x82
};
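The same grouping into rows of eight can also be expressed with each_slice and join, which removes the index arithmetic. This is only a sketch of an alternative formulation; the method name c_array_source is hypothetical, and the sample bytes are the first ten bytes of the output shown above:

```ruby
# Hypothetical helper: formats a list of byte values as a C array
# declaration, eight two-digit hexadecimal literals per line.
def c_array_source(name, bytes)
  rows = bytes.each_slice(8).map do |row|
    '    ' + row.map { |b| format('0x%02X', b) }.join(', ')
  end
  "unsigned char #{name}[] = {\n#{rows.join(",\n")}\n};\n"
end

bytes = [0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A, 0x00, 0x00]
puts c_array_source('sample', bytes)
```

With a real file, the byte list would come from File.binread(path).unpack('C*').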

The following C program demonstrates how you could use the generated code as an ordinary C header file. It’s important to note that the PNG file data will be stored in memory when the program itself is loaded:

#include <stdio.h>
#include "ljlogo.h"

/* Prints the contents of the array ljlogo as
   hexadecimal byte values. */
int main(void) {
    size_t i;  /* matches the unsigned type returned by sizeof */
    for (i = 0; i < sizeof(ljlogo); i++) {
        printf("%X ", ljlogo[i]);
    }
    return 0;
}

You also can have a program that both generates source code and executes it on the spot. Some languages have a facility called eval, which allows you to translate and execute a piece of source code contained within a string of characters at runtime. This feature is usually available in languages that are typically interpreted, such as Lisp, Perl, Ruby, Python and JavaScript. Consider this Ruby code:

x = 3
s = 'x + 1'
puts eval(s)

The string 'x + 1' is translated and executed when the code is run, printing 4 as a result. Note that even the value bound to variable x is available during the runtime evaluation.
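Which variables the evaluated string can see depends on the scope it runs in. Ruby’s eval accepts an optional Binding object as a second argument, and the following sketch shows the same string producing different results in different scopes. The method name scope_with_x is hypothetical:

```ruby
# Hypothetical helper: returns a Binding that captures a local
# variable x with a different value than the one at the top level.
def scope_with_x
  x = 10
  binding
end

x = 3
puts eval('x + 1')                # evaluated in the current scope: 4
puts eval('x + 1', scope_with_x)  # evaluated in the captured scope: 11
```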

The following Ruby code demonstrates a contrived way to find the result of adding all the integer numbers between 1 and 100. Instead of using a normal loop or iteration method, we generate a big string containing the expression "1+2+3+···+99+100" and then proceed to evaluate it:

puts eval((1..100).to_a.join('+'))

The eval function should be used with care. If the string passed to eval comes from an untrusted source (for example, from user input), evaluating it can be dangerous (imagine what could happen if the string contains the Ruby expression `rm -r *`, where the backquotes tell Ruby to run a shell command). In many cases, there are alternatives to eval that are more flexible, safer and avoid the cost of parsing code at runtime.
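As a rough illustration of such alternatives, the sum can be computed with inject, and a request coming from outside the program can be mapped through a fixed table of permitted operations instead of being evaluated as code. The method name apply_operation and the operation names are assumptions made up for this sketch:

```ruby
# Same result as evaluating "1+2+...+100", without building a string.
total = (1..100).inject(:+)
puts total   # 5050

# Hypothetical helper: only operations listed in the table can run,
# so untrusted input never reaches eval.
def apply_operation(name, a, b)
  allowed = { 'add' => :+, 'mul' => :* }
  operator = allowed.fetch(name) do
    raise ArgumentError, "unsupported operation: #{name}"
  end
  a.public_send(operator, b)
end

puts apply_operation('add', 2, 3)   # 5
puts apply_operation('mul', 2, 3)   # 6
```

Any name that is not in the table raises an ArgumentError instead of executing.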

Quines

A quine is a special kind of source code generator. The Jargon File defines a quine as “a program that generates a copy of its own source text as its complete output”. You might be right if you think this lacks any practical value by itself, but as a brain-teaser, it can be mind-blowing. Here’s a quine written by Ryan Davis, which is one of the shortest ones for the Ruby language:

f="f=%p;puts f%%f";puts f%f

Run this program, and you will get it as output. You might even try something like this from a command line terminal:

ruby -e 'f="f=%p;puts f%%f";puts f%f' | ruby

Here we’re using the -e option from the command line to specify one line of Ruby source to execute, and then we use a pipe to send its output to another instance of the Ruby interpreter. The output is once again the same program source.
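The trick in this quine is Ruby’s string formatting operator. In a format string, %p inserts the inspect form of its argument (for a string, the text wrapped in double quotes), and %% produces a single literal percent sign. The following sketch separates the steps; it is an explanatory breakdown, not a quine itself:

```ruby
f = "f=%p;puts f%%f"

# The inspect form is what %p will insert: the string with its quotes.
puts f.inspect       # "f=%p;puts f%%f"

# f % f is the same as format(f, f): the template is filled with itself.
puts format(f, f)    # f="f=%p;puts f%%f";puts f%f
```

The last line printed is exactly the one-line program, which is why running it reproduces its own source.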

Modifying Programs during Runtime

Dynamic languages, such as Ruby, allow you to modify different parts of your program easily during runtime without having to generate source code explicitly as we did previously. Ruby’s core API and frameworks, such as Ruby on Rails, employ this facility to automate common programming tasks. For example, in a class definition, you can use the attr_accessor method to produce the read/write access methods automatically for a given attribute name. Thus, the following code:

class Person
  attr_accessor :name
end

is equivalent to this more verbose code:

class Person
  def name
    @name
  end
  def name=(new_name)
    @name = new_name
  end
end

The previous code has a minor drawback: the corresponding instance variable @name is not really created until you first set its value. This means you’ll get a nil value if you happen to read the name attribute before writing to it. If you’re not careful, this could introduce a few subtle bugs into your programs. The easiest way to avoid this problem is to set the @name instance variable to a reasonable value in the Person#initialize method. Because this is a quite common scenario, wouldn’t it be nice to have this method generated automatically, in addition to the read/write accessors? Let’s define an attr_initialize method that’ll do that using Ruby’s metaprogramming facilities.
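The nil behavior described above is easy to observe directly. This short sketch uses only standard Ruby methods; the variable names are illustrative:

```ruby
class Person
  attr_accessor :name
end

person = Person.new

# Before the first assignment, @name does not exist yet.
defined_before = person.instance_variable_defined?(:@name)
value_before = person.name
puts defined_before          # false
puts value_before.inspect    # nil

person.name = 'Erika'
puts person.instance_variable_defined?(:@name)   # true
puts person.name                                 # Erika
```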

First, let’s briefly address two methods that are key to performing our desired metaprogramming magic:

cls.define_method(name) { body }

This adds a new instance method to the receiving class. It takes as input the method’s name (as a symbol or string) and its body (as a code block).

obj.instance_variable_set(name, value)

The above code binds an instance variable to the specified value. The name of the instance variable should be a symbol or string, and it also should include the @ prefix.
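To see both methods in isolation before combining them, here is a small sketch. The Greeter class, its generated method names and the @mood variable are invented for illustration:

```ruby
class Greeter
  # Generate say_hello and say_goodbye from a list of words.
  %w[hello goodbye].each do |word|
    define_method("say_#{word}") do |name|
      "#{word.capitalize}, #{name}!"
    end
  end
end

greeter = Greeter.new
puts greeter.say_hello('Erika')     # Hello, Erika!
puts greeter.say_goodbye('Erika')   # Goodbye, Erika!

# Bind an instance variable by name; note the required @ prefix.
greeter.instance_variable_set(:@mood, 'cheerful')
puts greeter.instance_variable_get(:@mood)   # cheerful
```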

Now, we’re ready to define the attr_initialize class method as an extension to the Object class so that any other class can use it:

class Object
  def Object.attr_initialize(*attrs)
    define_method(:initialize) do |*args|
      if attrs.length != args.length
        raise ArgumentError,
          "wrong number of arguments " +
          "(#{ args.length } for #{ attrs.length })"
      end
      attrs.zip(args).each do |attrib, arg|
        instance_variable_set("@#{ attrib }", arg)
      end
    end
    # Parentheses avoid Ruby's "ambiguous first argument" warning.
    attr_accessor(*attrs)
  end
end

The attr_initialize method takes as input a variable number of attribute names (attrs). Each attribute name has the same position reserved for it in the dynamically created initialize method parameter list (args) in order to set its initial value. We start the new method’s code by checking that the number of arguments received is the same as the number of attributes we originally specified. If not, we raise an error with a descriptive message. Afterward, we use the zip and each methods to iterate at the same time over the declared attributes list (attrs) and the actual arguments list (args) so as to perform a one-by-one attribute-argument binding using the instance_variable_set method. Finally, we delegate to the attr_accessor method in order to create the read/write access methods for all the declared attributes.

Here’s how we can use the attr_initialize method:

class Student
  attr_initialize :name, :id, :address
end

s = Student.new('Erika', 123, '13 Fake St')
s.address = '13 Wrong Rd'
puts s.name, s.id, s.address

The expected output would be:

Erika
123
13 Wrong Rd
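Reopening Object makes attr_initialize available to every class in the program. If that is broader than desired, one possible alternative is to put the same logic in a module that classes opt into with extend. The sketch below assumes a hypothetical module name, AttrInitialize, and also shows the error raised when the argument count is wrong:

```ruby
# Hypothetical module: the same technique as above, but scoped to
# the classes that explicitly extend it.
module AttrInitialize
  def attr_initialize(*attrs)
    define_method(:initialize) do |*args|
      if attrs.length != args.length
        raise ArgumentError,
              "wrong number of arguments (#{args.length} for #{attrs.length})"
      end
      attrs.zip(args).each do |attrib, arg|
        instance_variable_set("@#{attrib}", arg)
      end
    end
    attr_accessor(*attrs)
  end
end

class Student
  extend AttrInitialize
  attr_initialize :name, :id, :address
end

s = Student.new('Erika', 123, '13 Fake St')
puts s.name   # Erika

# Passing too few arguments triggers the generated check.
error_message = begin
  Student.new('Erika')
  nil
rescue ArgumentError => e
  e.message
end
puts error_message   # wrong number of arguments (1 for 3)
```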

Conclusion

Once you’re familiar with the techniques, metaprogramming is not as complicated as it might sound initially. Metaprogramming allows you to automate error-prone or repetitive programming tasks. You can use it to pre-generate data tables, to generate boilerplate code automatically that can’t be abstracted into a function, or even to test your ingenuity on writing self-replicating code.

I’d rather write programs that write programs than write programs.

— Richard Sites

Resources

  • The Jargon File: http://www.catb.org/esr/jargon

  • Ruby Cookbook by Lucas Carlson and Leonard Richardson, published by O’Reilly Media, 2006. Chapter 10 of this book contains 16 recipes on reflection and metaprogramming using Ruby. Highly recommended.

  • The Quine Page: http://www.nyx.net/~gthompso/quine.htm. This Web page contains quines in many different programming languages. It even has quines that work in more than one language.

Ariel Ortiz is a faculty member at the Computer Science Department of the Tecnológico de Monterrey, Campus Estado de México. He’s been teaching computer programming for almost two decades. He’s not too sure what his favorite programming language is, but he thinks it’s either Scheme, Python or Ruby. He can be reached at ariel.ortiz@itesm.mx.