How to write programs that write programs.
This article was originally published in Linux Journal Issue #158.
A metaprogram is a program that generates other programs or program parts. Hence, metaprogramming means writing metaprograms. Many useful metaprograms are available for Linux; the most common ones include compilers (gcc or clang), interpreters (perl or ruby), parser generators (bison), assemblers (as or nasm) and preprocessors (cpp or m4). Typically, you use a metaprogram to eliminate or reduce a tedious or error-prone programming task. So, for example, instead of writing a machine code program by hand, you would use a high-level language, such as C, and then let the C compiler do the translation to the equivalent low-level machine instructions.
Metaprogramming may at first seem to be an advanced topic, suitable only for programming language gurus, but it’s not really that difficult once you know how to use the right tools.
Source Code Generation
In order to present a very simple example of metaprogramming, let’s assume the following totally fictional situation.
Erika is a very smart first-year undergraduate computer science student. She already knows several programming languages, including C and Ruby. During her introductory programming class, Professor Gomez, the course instructor, caught her chatting on her laptop computer. As punishment, he demanded Erika write a C program that printed the following 1,000 lines of text:
1. I must not chat in class.
2. I must not chat in class.
· · ·
999. I must not chat in class.
1000. I must not chat in class.
An additional imposed restriction was that the program could not use any kind of loop or goto instruction. It should contain only one big main function with 1,000 printf instructions, something like this:
#include <stdio.h>

int main(void) {
    printf("1. I must not chat in class.\n");
    printf("2. I must not chat in class.\n");
    /* 996 printf instructions omitted. */
    printf("999. I must not chat in class.\n");
    printf("1000. I must not chat in class.\n");
    return 0;
}
Professor Gomez wasn’t too naive, so he basically expected Erika to write the printf instruction once, copy it to the clipboard, do 999 pastes, and manually change the numbers. He expected that even this amount of irksome and repetitive work would be enough to teach her a lesson. But Erika immediately saw an easy way out: metaprogramming. Instead of writing this program by hand, why not write another program that writes this program automatically for her? So, she wrote the following Ruby script:
File.open('punishment.c', 'w') do |output|
  output.puts '#include <stdio.h>'
  output.puts 'int main(void) {'
  1.upto(1000) do |i|
    output.puts "  printf(\"#{ i }. " +
                "I must not chat in class.\\n\");"
  end
  output.puts '  return 0;'
  output.puts '}'
end
This code creates a file called punishment.c with the expected 1,000+ lines of C source code.
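A generator like this is easier to check if it first builds the source text in memory. The following is only a sketch of that idea: the helper name punishment_source and its count parameter are made up for illustration and are not part of Erika's script:

```ruby
# Hypothetical helper: returns the C source as a string instead of
# writing it straight to a file.
def punishment_source(count)
  lines = ['#include <stdio.h>', 'int main(void) {']
  1.upto(count) do |i|
    lines << "  printf(\"#{i}. I must not chat in class.\\n\");"
  end
  lines << '  return 0;'
  lines << '}'
  lines.join("\n") + "\n"
end

source = punishment_source(1000)
puts source.lines.first(3)   # a quick look at the start of the output
puts "printf calls: #{source.scan('printf(').length}"
```

Once the text looks right, a single File.write('punishment.c', source) would produce the same file as the original script.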
Although this example might seem a bit fabricated, it illustrates how easy it is to write a program that produces the source of another program. This technique can be used in more realistic settings. Let’s say that you have a C program that needs to include a PNG image, but for some reason, the deployment platform can accept one file only, the executable file. Thus, the data that makes up the PNG file has to be integrated within the program code itself. To achieve this, we can read the PNG file beforehand and generate the C source text for an array declaration, initialized with the corresponding data as literal values. This Ruby script does exactly that:
INPUT_FILE_NAME = 'ljlogo.png'
OUTPUT_FILE_NAME = 'ljlogo.h'
DATA_VARIABLE_NAME = 'ljlogo'

# Open the image in binary mode ('rb') so the bytes are read unmodified
# on every platform.
File.open(INPUT_FILE_NAME, 'rb') do |input|
  File.open(OUTPUT_FILE_NAME, 'w') do |output|
    output.print "unsigned char #{ DATA_VARIABLE_NAME }[] = {"
    data = input.read.unpack('C*')
    data.length.times do |i|
      # Start a new line every eight elements.
      if i % 8 == 0
        output.print "\n  "
      end
      output.print '0x%02X' % data[i]
      output.print ', ' if i < data.length - 1
    end
    output.puts "\n};"
  end
end
This script reads the file called ljlogo.png and creates a new output file called ljlogo.h. First, it writes the declaration of the variable ljlogo as an array of unsigned characters. Next, it reads the whole input file at once and unpacks every single input character as an unsigned byte. Then, it writes each of the input bytes as two-digit hexadecimal numbers in groups of eight elements per line. As should be expected, individual elements are terminated with commas, except the last one. Finally, the script writes the closing brace and semicolon. Here is a possible output file sample:
unsigned char ljlogo[] = {
0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A,
0x00, 0x00, 0x00, 0x0D, 0x49, 0x48, 0x44, 0x52,
/* A few hundred lines omitted. */
0x0B, 0x13, 0x00, 0x00, 0x00, 0x00, 0x49, 0x45,
0x4E, 0x44, 0xAE, 0x42, 0x60, 0x82
};
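As a variation on the script above, the formatting step could also be written with each_slice, which hands out the bytes eight at a time. This is only a sketch: the helper name c_array_source is made up for illustration, and it takes an array of byte values rather than reading a file:

```ruby
# Hypothetical helper: formats an array of byte values (0..255) as a
# C array declaration, eight elements per line.
def c_array_source(name, bytes)
  rows = bytes.each_slice(8).map do |row|
    '  ' + row.map { |b| format('0x%02X', b) }.join(', ')
  end
  "unsigned char #{name}[] = {\n#{rows.join(",\n")}\n};\n"
end

# The first ten bytes of the sample output shown above.
sample = [0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A, 0x00, 0x00]
puts c_array_source('sample', sample)
```

Joining the rows with a comma and a newline removes the need to special-case the last element, which the original script handles with an explicit index check.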
The following C program demonstrates how you could use the generated code as an ordinary C header file. It’s important to note that the PNG file data will be stored in memory when the program itself is loaded:
#include <stdio.h>
#include "ljlogo.h"

/* Prints the contents of the array ljlogo as
   hexadecimal byte values. */
int main(void) {
    size_t i; /* same unsigned type that sizeof yields */
    for (i = 0; i < sizeof(ljlogo); i++) {
        printf("%X ", ljlogo[i]);
    }
    return 0;
}
You also can have a program that both generates source code and executes it on the spot. Some languages have a facility called eval, which allows you to translate and execute a piece of source code contained within a string of characters at runtime. This feature is usually available in languages that are typically interpreted, such as Lisp, Perl, Ruby, Python and JavaScript. In this Ruby code:
x = 3
s = 'x + 1'
puts eval(s)
the string 'x + 1' is translated and executed when the code is run, printing 4 as a result. Note that even the value bound to variable x is available during the runtime evaluation.
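The scope that eval sees can also be passed in explicitly. Ruby's eval accepts a Binding object as an optional second argument, and Kernel#binding returns one for the current scope. The sketch below illustrates this; the method name binding_with_x is made up for the example:

```ruby
# Returns a Binding that captures the local variable x of this method.
def binding_with_x
  x = 10
  binding
end

x = 3
puts eval('x + 1')                  # uses the current scope, where x is 3
puts eval('x + 1', binding_with_x)  # uses the captured scope, where x is 10
```

The first call prints 4 and the second prints 11, because the same string is evaluated against two different sets of variables.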
The following Ruby code demonstrates a contrived way to find the result of adding all the integer numbers between 1 and 100. Instead of using a normal loop or iteration method, we generate a big string containing the expression "1+2+3+···+99+100" and then proceed to evaluate it:
puts eval((1..100).to_a.join('+'))
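To see what is actually being evaluated, the one-liner can be split into steps and compared with an ordinary iteration method. This is just an illustrative sketch; the variable names are arbitrary:

```ruby
# Step 1: generate the source text of the expression.
expression = (1..100).to_a.join('+')
puts expression[0, 20]   # the first 20 characters: 1+2+3+4+5+6+7+8+9+10

# Step 2: evaluate the generated text, and compute the same sum directly.
by_eval   = eval(expression)
by_method = (1..100).inject(:+)
puts by_eval, by_method  # both are 5050
```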
The eval function should be used with care. If the string used as the argument to eval comes from an untrusted source (for example, from user input), it can be potentially dangerous (imagine what could happen if the string to evaluate contains the Ruby expression `rm -r *`, where the backticks make Ruby run the enclosed text as a shell command). In many cases, there are alternatives to eval that are more flexible, more secure and do not require the speed hit of parsing code during runtime.
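One such alternative, sketched here under simplifying assumptions, is to parse the input yourself and dispatch through an allow-list with public_send. The names OPERATIONS and safe_calculate are hypothetical, and the input format (integer, operator, integer, separated by spaces) is an assumption made for the example:

```ruby
# Hypothetical allow-list mapping operator text to Integer methods.
OPERATIONS = { '+' => :+, '-' => :-, '*' => :* }.freeze

# Evaluates input of the form "<integer> <operator> <integer>" without eval.
def safe_calculate(text)
  left, operator, right = text.split
  method_name = OPERATIONS.fetch(operator) do
    raise ArgumentError, "unsupported operator: #{operator.inspect}"
  end
  Integer(left).public_send(method_name, Integer(right))
end

puts safe_calculate('6 * 7')
begin
  safe_calculate('1 ; system("rm -r *")')
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```

Because only the listed methods can ever be called, hostile input is rejected instead of executed, and no code is parsed at runtime.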
Quines
A quine is a special kind of source code generator. The Jargon File defines a quine as “a program that generates a copy of its own source text as its complete output”. You might be right if you think this lacks any practical value by itself, but as a brain-teaser, it can be mind-blowing. Here’s a quine written by Ryan Davis, which is one of the shortest ones for the Ruby language:
f="f=%p;puts f%%f";puts f%f
Run this program, and you will get it as output. You might even try something like this from a command line terminal:
ruby -e 'f="f=%p;puts f%%f";puts f%f' | ruby
Here we’re using the -e option from the command line to specify one line of Ruby source to execute, and then we use a pipe to send its output to another instance of the Ruby interpreter. The output is once again the same program source.
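The trick behind this quine is Ruby's string formatting operator. The sketch below takes the program apart to show how the output is rebuilt; the variable names source and output are introduced here only for the demonstration:

```ruby
source = 'f="f=%p;puts f%%f";puts f%f'  # the quine, kept as plain text

f = "f=%p;puts f%%f"   # the template that the quine assigns to f
output = f % f         # %p inserts f.inspect (f with its quotes); %% becomes %

puts output
puts output == source  # true: the formatted text is the program itself
```

In other words, the template contains a placeholder for a quoted copy of itself, and formatting the template with itself fills in that placeholder.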
Modifying Programs during Runtime
Dynamic languages, such as Ruby, allow you to modify different parts of your program easily during runtime without having to generate source code explicitly as we did previously. Ruby’s core API and frameworks, such as Ruby on Rails, employ this facility to automate common programming tasks. For example, in a class definition, you can use the attr_accessor method to produce the read/write access methods automatically for a given attribute name. Thus, the following code:
class Person
  attr_accessor :name
end
is equivalent to this more verbose code:
class Person
  def name
    @name
  end

  def name=(new_name)
    @name = new_name
  end
end
The previous code has a minor drawback: the corresponding instance variable @name is not really created until you first set its value. This means you’ll get a nil value if you happen to read the name attribute before writing to it. If you’re not careful, this could introduce a few subtle bugs into your programs. The easiest way to avoid this problem is to set the @name instance variable to a reasonable value in the Person#initialize method. Because this is a quite common scenario, wouldn’t it be nice to have this method generated automatically, in addition to the read/write accessors? Let’s define an attr_initialize method that’ll do that using Ruby’s metaprogramming facilities.
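The drawback described above is easy to observe. This short sketch uses the standard instance_variable_defined? method to check whether @name exists before and after the first assignment:

```ruby
class Person
  attr_accessor :name
end

person = Person.new
p person.name                                # nil: nothing assigned yet
p person.instance_variable_defined?(:@name)  # false: @name does not exist
person.name = 'Erika'
p person.instance_variable_defined?(:@name)  # true: created by the writer
```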
First, let’s briefly address two methods that are key to performing our desired metaprogramming magic:
- cls.define_method(name) { body }: This adds a new instance method to the receiving class. It takes as input the method’s name (as a symbol or string) and its body (as a code block).
- obj.instance_variable_set(name, value): This binds an instance variable to the specified value. The name of the instance variable should be a symbol or string, and it also should include the @ prefix.
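As a warm-up, these two methods are enough to write a hand-rolled version of attr_accessor. This is only a sketch: the name my_attr_accessor is invented for the example, and it also relies on instance_variable_get, the reading counterpart of instance_variable_set:

```ruby
class Person
  # Hypothetical macro: a simplified, single-attribute attr_accessor.
  def self.my_attr_accessor(name)
    # Reader method, for example Person#name.
    define_method(name) do
      instance_variable_get("@#{name}")
    end
    # Writer method, for example Person#name=.
    define_method("#{name}=") do |value|
      instance_variable_set("@#{name}", value)
    end
  end

  my_attr_accessor :name
end

person = Person.new
person.name = 'Erika'
puts person.name
```

The blocks passed to define_method remember the value of name from the enclosing call, which is what lets one macro generate differently named methods.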
Now, we’re ready to define the attr_initialize class method as an extension to the Object class so that any other class can use it:
class Object
  def Object.attr_initialize(*attrs)
    define_method(:initialize) do |*args|
      if attrs.length != args.length
        raise ArgumentError,
          "wrong number of arguments " +
          "(#{ args.length } for #{ attrs.length })"
      end
      attrs.zip(args).each do |attrib, arg|
        instance_variable_set("@#{ attrib }", arg)
      end
    end
    # Parentheses keep Ruby from warning about an ambiguous * prefix.
    attr_accessor(*attrs)
  end
end
The attr_initialize method takes as input a variable number of attribute names (attrs). Each attribute name has the same position reserved for it in the dynamically created initialize method parameter list (args) in order to set its initial value. We start the new method’s code by checking that the number of arguments received is the same as the number of attributes we originally specified. If not, we raise an error with a descriptive message. Afterward, we use the zip and each methods to iterate at the same time over the declared attributes list (attrs) and the actual arguments list (args) so as to perform a one-by-one attribute-argument binding using the instance_variable_set method. Finally, we delegate to the attr_accessor method in order to create the read/write access methods for all the declared attributes.
Here’s how we can use the attr_initialize method:
class Student
  attr_initialize :name, :id, :address
end

s = Student.new('Erika', 123, '13 Fake St')
s.address = '13 Wrong Rd'
puts s.name, s.id, s.address
The expected output would be:
Erika
123
13 Wrong Rd
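If reopening Object feels too invasive, the same macro could live in a module that classes opt into with extend. The sketch below is a variation, not the definition given above: the module name AttrInitialize is made up, and the example also exercises the argument-count check:

```ruby
# Hypothetical module: the same macro, added only to classes that ask for it.
module AttrInitialize
  def attr_initialize(*attrs)
    define_method(:initialize) do |*args|
      if attrs.length != args.length
        raise ArgumentError,
              "wrong number of arguments (#{args.length} for #{attrs.length})"
      end
      attrs.zip(args).each do |attrib, arg|
        instance_variable_set("@#{attrib}", arg)
      end
    end
    attr_accessor(*attrs)
  end
end

class Student
  extend AttrInitialize
  attr_initialize :name, :id, :address
end

begin
  Student.new('Erika', 123)   # one argument short
rescue ArgumentError => e
  puts e.message              # wrong number of arguments (2 for 3)
end
```

With this arrangement, classes that never call extend AttrInitialize are left untouched, at the cost of one extra line in each class that uses the macro.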
Conclusion
Once you’re familiar with the techniques, metaprogramming is not as complicated as it might sound initially. Metaprogramming allows you to automate error-prone or repetitive programming tasks. You can use it to pre-generate data tables, to generate boilerplate code automatically that can’t be abstracted into a function, or even to test your ingenuity on writing self-replicating code.
I’d rather write programs that write programs than write programs.
Resources
- The Jargon File: http://www.catb.org/esr/jargon
- Ruby Cookbook by Lucas Carlson and Leonard Richardson, published by O’Reilly Media, 2006. Chapter 10 of this book contains 16 recipes on reflection and metaprogramming using Ruby. Highly recommended.
- The Quine Page: http://www.nyx.net/~gthompso/quine.htm. This Web page contains quines in many different programming languages. It even has quines that work in more than one language.