Data file formats for Tcl scripts
Abstract:
A Tcl script sometimes needs to save textual data on disk, and read it back. To avoid writing a parser for the data, you can use a few simple tricks that turn Tcl into a parser for free.
Introduction
A typical Tcl script stores its internal data in lists, dictionaries, and arrays. Suppose you want to write a Tcl application that can save its data on disk and read it back again. For example, the application could save a drawing project and load it back later. Writing the data from the running script to a file is not difficult: just use 'puts' to create a text file. But you also need a way to read the data back into a running script, which seems a lot harder.
You can choose to store the data in a binary form, or in a text file. This article looks only at textual data formats. We will look at a number of possible formats and how to parse them in Tcl. In particular, we will show some simple techniques that make text file parsing a lot easier.
A simple example
Suppose you have a simple drawing tool that places text and rectangle items on a canvas. To save the resulting pictures, you want a textual file format that is easy to read, both by humans and by your drawing tool. The first and simplest file format that comes to mind looks something like this:
example_01/datafile.dat
rectangle 10 10 150 50 2 blue
rectangle 7 7 153 53 2 blue
text 80 30 "Simple Drawing Tool" c red
The first two lines of this file represent the data for two blue, horizontally stretched rectangles with a line thickness of 2. The final line places a piece of red text, anchored at the center (hence the "c"), in the middle of the two rectangles.
Saving your data in a text file makes it easier to debug the application, because you can inspect the output to see if everything is correct. It also allows users to manually tinker with the saved data (which may be good or bad depending on your purposes).
When reading a data file in this format, you somehow need to parse the file and create data structures from it. To parse the file, you may be tempted to step through the file line by line, and use something like regexp to analyse the different pieces of each line. This is what such an implementation could look like:
example_01/parser.tcl
canvas .c
pack .c

set fid [open "datafile.dat" r]
# Read the file line by line and analyse each line.
# 'gets' returns -1 at end of file.
while { [gets $fid line] >= 0 } {
    if { [regexp \
            {^rectangle +([0-9]+) +([0-9]+) +([0-9]+) +([0-9]+) +([0-9]+) +(.*)$} \
            $line dummy x1 y1 x2 y2 thickness color] } {
        .c create rectangle $x1 $y1 $x2 $y2 -width $thickness -outline $color

    } elseif { [regexp \
            {^text +([0-9]+) +([0-9]+) +"([^"]*)" +([^ ]+) +(.*)$} \
            $line dummy x y txt anchor color] } {
        # The quotes are matched outside the capture group, so that they
        # do not end up in the displayed text.
        .c create text $x $y -text $txt -anchor $anchor -fill $color

    } elseif { [regexp {^ *$} $line] } {
        # Ignore blank lines

    } else {
        puts "error: unknown keyword."
    }
}
close $fid
We read one line at a time, and use regular expressions to find out what kind of data the line represents. By looking at the first word, we can distinguish between data for rectangles and data for text. The first word serves as a keyword: it tells us exactly what kind of data we are dealing with. We also parse the coordinates, color and other attributes of each item.
Grouping parts of the regular expression between parentheses allows us to retrieve the parsed results in the variables 'x1', 'x2', etc.
This looks like a simple enough implementation, assuming that you understand how regular expressions work. But I find it pretty hard to maintain. The regular expressions also make it hard to understand.
There is a more elegant solution, known as an "active data file". It is captured in a design pattern, originally written by Nat Pryce. It is based on a very simple suggestion: Instead of writing your own parser in Tcl (using regexp or other means), why not let the Tcl parser do all the work for you?
The Active File design pattern
To explain this design pattern, we continue the example of the simple drawing tool from the previous section. First, we write two procedures in Tcl, one that draws a rectangle, and one that draws text.
example_02/parser.tcl
canvas .c
pack .c

proc d_rect {x1 y1 x2 y2 thickness color} {
    .c create rectangle $x1 $y1 $x2 $y2 -width $thickness -outline $color
}

proc d_text {x y text anchor color} {
    .c create text $x $y -text $text -anchor $anchor -fill $color
}
To make a picture on the canvas, we can now call these two procedures several times, once for each item we want to draw. To make the same picture as above, we need the following three calls:
example_02/datafile.dat
d_rect 10 10 150 50 2 blue
d_rect 7 7 153 53 2 blue
d_text 80 30 "Simple Drawing Tool" c red
Does this look familiar? The code for calling our two procedures looks almost exactly like the data file we parsed earlier. The only difference is that the keywords have changed from 'rectangle' and 'text' to 'd_rect' and 'd_text', the names of our two procedures.
Now we come to the insight that makes this design pattern tick: to parse the data file, we treat it like a Tcl script. The fact that the data file actually contains calls to Tcl procedures is the heart of this design pattern.
Parsing the data file is now extremely easy:
example_02/parser.tcl
source "datafile.dat"
The built-in Tcl command source reads the file, parses it, and executes the commands in the file. Since we have implemented the procedures d_rect and d_text, the source command will automatically invoke the two procedures with the correct parameters. We will call d_rect and d_text the parsing procedures or parsing commands.
We do not need to do any further parsing. No regular expressions, no line-by-line loop, no opening and closing of files. Just one call to source does the trick.
The data file has become a Tcl script that can be executed. This is called an Active File because it contains executable commands, not just passive data. The Active File design pattern works in many scripting languages, but here we stick to Tcl.
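To see the whole round trip without a canvas, here is a minimal sketch that saves a few items with puts and loads them back with source. The items list, the save_items procedure and the recording versions of d_rect and d_text are my own assumptions for illustration; a real drawing tool would draw on the canvas instead of recording. The sketch also assumes Tcl 8.6 or later, because it uses file tempfile to obtain a scratch file.

```tcl
# Hypothetical in-memory model: each item is a Tcl list whose first
# element is the name of its parsing command.
set items [list \
    [list d_rect 10 10 150 50 2 blue] \
    [list d_rect 7 7 153 53 2 blue] \
    [list d_text 80 30 "Simple Drawing Tool" c red]]

# Save: each item was built with 'list', so its string form is quoted
# in such a way that the Tcl parser reads it back as one command.
proc save_items {filename items} {
    set fid [open $filename w]
    foreach item $items {
        puts $fid $item
    }
    close $fid
}

# Load: the parsing commands record what they see (instead of drawing),
# and 'source' does all the parsing.
set loaded [list]
proc d_rect {x1 y1 x2 y2 thickness color} {
    global loaded
    lappend loaded [list d_rect $x1 $y1 $x2 $y2 $thickness $color]
}
proc d_text {x y text anchor color} {
    global loaded
    lappend loaded [list d_text $x $y $text $anchor $color]
}

# Create an empty scratch file and remember its name (Tcl 8.6+).
close [file tempfile filename]

save_items $filename $items
source $filename
file delete $filename
```

After this runs, loaded holds the same three items as items. Building each item with list rather than by string concatenation is a deliberate choice: it keeps text containing spaces or special characters intact when the file is read back.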
Advantages of using the Active File pattern:
- No more need to write a parser. The source command invokes the Tcl parser, which does the job.
- The data file format is easy to read and edit by hand.
Disadvantages of using the Active File pattern:
- If the data file contains dangerous commands such as 'exec rm *', they get executed and can cause serious damage. You can solve this by executing the active file in a safe interpreter that blocks the dangerous commands.
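The safe-interpreter idea can be sketched as follows. The names record_item and sandbox, and the inline data string standing in for a file, are assumptions made for this illustration. Note that a safe interpreter only hides dangerous commands; it does not, for instance, stop a data file from containing an endless loop.

```tcl
# Records what the data file asks for. This procedure runs in the
# trusted main interpreter.
set loaded [list]
proc record_item {kind args} {
    global loaded
    lappend loaded [linsert $args 0 $kind]
}

# Create a safe interpreter: commands such as 'exec', 'open' and
# 'source' are hidden in it.
set sandbox [interp create -safe]

# Expose only the parsing commands, as aliases to the trusted procedure.
interp alias $sandbox d_rect {} record_item d_rect
interp alias $sandbox d_text {} record_item d_text

# Contents of a hypothetical data file with one malicious line.
set data {
d_rect 10 10 150 50 2 blue
exec rm *
d_text 80 30 "Simple Drawing Tool" c red
}

# Evaluate the data inside the sandbox. The 'exec' line raises an error
# instead of running, and evaluation stops there.
set failed [catch {interp eval $sandbox $data} message]
interp delete $sandbox
```

Here failed is 1 and only the first rectangle has been recorded. For a file on disk, the trusted side can read the contents with open and read and pass them to interp eval, since source itself is hidden inside the sandbox.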
Limitations of the Active File pattern:
- This pattern does not work for all possible data formats. The format must be line-based, and every line must begin with a keyword. You write a Tcl procedure with the same name as the keyword, turning the passive keyword into an active command. This also implies that you cannot use keywords such as if or while, because these would clash with Tcl's built-in commands. In fact, the reason why I changed the keyword text into the command d_text in our example is that Tk already has a command named text for creating text widgets, and I wanted to avoid a name clash.
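One possible way around such name clashes, not used in the rest of this article, is to define the parsing procedures inside a namespace and evaluate the data there. Inside a namespace, Tcl looks up command names in that namespace first and only then in the global namespace. The namespace name drawfile and the recording procedures below are assumptions for this sketch.

```tcl
namespace eval drawfile {
    variable items [list]

    # Inside this namespace, 'rectangle' and 'text' resolve to these
    # procedures before any global command of the same name.
    proc rectangle {x1 y1 x2 y2 thickness color} {
        variable items
        lappend items [list rectangle $x1 $y1 $x2 $y2 $thickness $color]
    }
    proc text {x y str anchor color} {
        variable items
        lappend items [list text $x $y $str $anchor $color]
    }
}

# Contents of a hypothetical data file that uses the plain keywords.
set data {
rectangle 10 10 150 50 2 blue
text 80 30 "Simple Drawing Tool" c red
}

# Evaluate the data in the namespace, so the keywords find our procedures.
namespace eval drawfile $data
```

For a file on disk, the last line could become namespace eval drawfile [list source datafile.dat]. I would still avoid keywords such as if or while: redefining them inside the namespace would also affect any parsing procedure in that namespace that uses them.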
Syntactic sugar
So far we have been able to come up with a very simple file format:
example_02/datafile.dat
d_rect 10 10 150 50 2 blue
d_rect 7 7 153 53 2 blue
d_text 80 30 "Simple Drawing Tool" c red
And we have a very simple parser for it, using only two parsing procedures and the source command. Now, let's see how we can improve things.
When you look at large volumes of this kind of data, it is easy to get confused by all the command arguments. The first line contains the numbers 10 10 150 50 2, and it takes some training to quickly see the first two as a pair of coordinates, the next two as another pair, and the last one as the line thickness. We can make this easier to read for a programmer by introducing some additional text in the data:
example_03/datafile.dat
d_rect from 10 10 to 150 50 thick 2 clr blue
d_rect from 7 7 to 153 53 thick 2 clr blue
d_text at 80 30 "Simple Drawing Tool" anchor c clr red
Prepositions like to and from, and argument names like thick and clr, make the data look more like a sentence (in English in this example). To accommodate these new words, our parsing procedure needs some additional dummy parameters:
example_03/parser.tcl
proc d_rect {"from" x1 y1 "to" x2 y2 "thick" thickness "clr" color} {
    .c create rectangle $x1 $y1 $x2 $y2 -width $thickness -outline $color
}
As you can see, the implementation does not change. The new arguments are not used in the procedure's body; their only purpose is to make the data more readable. I make it a habit to make the names of the procedure's parameters the same as the corresponding argument in the data file (e.g. from appears at the same place in the data file and in the parameter list). That way, I can quickly see how one maps to the other. I also learned on the Tcl'ers Wiki to put quotes around the parameter names to make them stand out. It makes no difference to Tcl itself, but it makes the procedure signature more readable.
Introducing dummy arguments for readability is called "syntactic sugar". We will see other ways of making data more readable.
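The article only shows d_rect for this format, so here is a hedged sketch of what a matching d_text could look like, in a Tk-free form that records items in a list called drawn (an assumption for this example). The value parameter is named anchorPos so that it does not share a name with the dummy anchor parameter. The check on the dummy words in d_rect is an optional addition of mine, not part of the original parser.

```tcl
set drawn [list]   ;# hypothetical record of what would be drawn

proc d_rect {"from" x1 y1 "to" x2 y2 "thick" thickness "clr" color} {
    global drawn
    # Optional sanity check: each dummy parameter must hold the word
    # it is named after.
    foreach word {from to thick clr} {
        if {[set $word] ne $word} {
            error "d_rect: expected '$word' but got '[set $word]'"
        }
    }
    lappend drawn [list rectangle $x1 $y1 $x2 $y2 $thickness $color]
}

proc d_text {"at" x y text "anchor" anchorPos "clr" color} {
    global drawn
    lappend drawn [list text $x $y $text $anchorPos $color]
}

# Two lines in the style of example_03/datafile.dat.
set data {
d_rect from 10 10 to 150 50 thick 2 clr blue
d_text at 80 30 "Simple Drawing Tool" anchor c clr red
}
eval $data
```

With the check in place, a line such as d_rect at 10 10 to 150 50 thick 2 clr blue is rejected with an error, because the first word is not from.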
Option/value pairs
The Tk toolkit offers a set of widgets to create graphical interfaces. These widgets are configured with options and their values. The syntax for the configuration is simple (a dash, followed by the option name, followed by the value) and standardized (many other Tcl extensions use the same syntax for configuring their components).
With option/value pairs, our data file looks like this:
example_04/datafile.dat
d_rect -x1 10 -y1 10 -x2 150 -y2 50 -thickness 2
d_rect -thickness 2 -x1 7 -y1 7 -x2 153 -y2 53
d_text -x 80 -y 30 -text "Simple Drawing Tool" -anchor c -color red
I have made the two 'd_rect' calls use a different ordering of their options, just to show you that this is now possible. To parse this data, we need to introduce the parsing of option/value pairs in the parsing procedures d_rect and d_text. Our first attempt is to use dummy arguments (similar to the syntactic sugar above):
proc d_rect {opt1 x1 opt2 y1 opt3 x2 opt4 y2 opt5 thickness opt6 color} {
    .c create rectangle $x1 $y1 $x2 $y2 -width $thickness -outline $color
}
Again, the implementation of the procedure does not change, because it does not use any of its dummy arguments.
This solution will only work for the simplest of data formats. It has two major disadvantages:
- The position of the options in the argument list is fixed. For example, you cannot specify the color before the thickness. This is not so bad for a pure data file format (because the values are typically always saved in the same order anyway), but it becomes a hindrance when you also want to edit the saved data by hand.
- The options do not have default values: you must supply all the options, you cannot leave any of them out.
Here is an implementation that solves both these problems, using Tcl arrays:
example_04/parser.tcl
proc d_rect {args} {
    # First, specify some defaults.
    set arr(-thickness) 1
    set arr(-color) blue

    # Then, 'parse' the user-supplied options and values.
    array set arr $args

    # Create the rectangle.
    .c create rectangle $arr(-x1) $arr(-y1) $arr(-x2) $arr(-y2) \
        -width $arr(-thickness) -outline $arr(-color)
}
Instead of a long list of parameters, the parsing procedure now only has one parameter called args, which captures all the actual arguments of the call. The parameters x1, y1 etc. have disappeared. They are now handled by a local array.
The first part of the code sets the default values for some options. The second part parses option/value pairs from args. This is done very elegantly with the built-in 'array set' mechanism. It creates new entries in the array arr, using the option names (including the leading dash) as keys into the array, and the option values as the array values.
If the user does not specify -color in the call, we will use the default value of the arr(-color) entry that we set explicitly. If they do specify the color, their value overwrites the default. The final line in the procedure body is the same as in the previous implementations, except that it now uses array entries rather than procedure arguments.
If the user forgets to specify option -x1 in the call, the array entry for -x1 is not set (there is no default for it), so reading $arr(-x1) in the call to create rectangle raises an error. This example shows that you can give default values to some options, making them optional, while leaving others mandatory by not specifying defaults for them. You may want to provide some user-friendly error message for such cases.
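One way to produce such user-friendly messages is to check the options explicitly before using them. The following is only a sketch: the wording of the messages, the check for misspelled options and the drawn list (which replaces the canvas call so that the code runs without Tk) are my assumptions.

```tcl
set drawn [list]   ;# hypothetical record of what would be drawn

proc d_rect {args} {
    global drawn

    if {[llength $args] % 2 != 0} {
        error "d_rect: options must come in '-name value' pairs"
    }

    # Defaults make these two options optional.
    set arr(-thickness) 1
    set arr(-color) blue
    array set arr $args

    # Mandatory options have no default, so check them explicitly.
    foreach opt {-x1 -y1 -x2 -y2} {
        if {![info exists arr($opt)]} {
            error "d_rect: missing mandatory option $opt"
        }
    }

    # Reject misspelled options instead of silently ignoring them.
    set known {-x1 -y1 -x2 -y2 -thickness -color}
    foreach opt [array names arr] {
        if {[lsearch -exact $known $opt] < 0} {
            error "d_rect: unknown option $opt"
        }
    }

    lappend drawn [list $arr(-x1) $arr(-y1) $arr(-x2) $arr(-y2) \
        $arr(-thickness) $arr(-color)]
}

# A valid call, with the options in arbitrary order.
d_rect -thickness 2 -x1 7 -y1 7 -x2 153 -y2 53

# A call that forgets -x1 now fails with a readable message.
set failed [catch {d_rect -y1 10 -x2 150 -y2 50} message]
```

After the last line, message contains "d_rect: missing mandatory option -x1" rather than a complaint about a missing array element.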
The best format is usually a combination
Now that we have seen some commonly known tricks for Tcl data files (Active File, syntactic sugar, option/value pairs), we can combine their advantages into a single data format. For the mandatory arguments, we should use fixed-position arguments, perhaps combined with dummy prepositions for readability (syntactic sugar). The optional arguments on the other hand, should be handled with the option/value pair mechanism, so that users can leave them out or change their positions in the call. The final format could then look something like this:
example_05/datafile.dat
d_rect from 10 10 to 150 50 -thickness 2
d_rect from 7 7 to 153 53 -color black
d_text at 80 30 "Simple Drawing Tool" -anchor c -color red
assuming that 'blue' is the default color for all items.
As a personal convention, I usually write such commands on multiple lines as follows:
example_05/datafile.dat
d_rect \
    from 10 10 \
    to 150 50 \
    -thickness 2
d_rect \
    from 7 7 \
    to 153 53 \
    -thickness 2
d_text \
    at 80 30 "Simple Drawing Tool" \
    -anchor c \
    -color red
I find it slightly more readable, but that's all a matter of personal taste (or in my case perhaps lack of taste :-).
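A parser for this combined format could look roughly like the sketch below: fixed-position parameters (with sugar) come first, and args collects the option/value pairs. As before, the drawn list stands in for the canvas so that the code runs without Tk. The default thickness of 1 and the default anchor c are assumptions; blue as the default color follows the text above.

```tcl
set drawn [list]   ;# hypothetical record of what would be drawn

proc d_rect {"from" x1 y1 "to" x2 y2 args} {
    global drawn
    # Defaults for the optional part, then the user's option/value pairs.
    set opt(-thickness) 1
    set opt(-color) blue
    array set opt $args
    lappend drawn [list rectangle $x1 $y1 $x2 $y2 $opt(-thickness) $opt(-color)]
}

proc d_text {"at" x y text args} {
    global drawn
    set opt(-anchor) c
    set opt(-color) blue
    array set opt $args
    lappend drawn [list text $x $y $text $opt(-anchor) $opt(-color)]
}

# Data in the combined format, in both the one-line and multi-line style.
set data {
d_rect from 10 10 to 150 50 -thickness 2
d_rect \
    from 7 7 \
    to 153 53 \
    -color black
d_text at 80 30 "Simple Drawing Tool" -anchor c -color red
}
eval $data
```

The first rectangle picks up the default color, the second one the default thickness, and both styles of line layout are parsed by the same two procedures.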
More complicated data
So far, we have worked on a very simple example involving only rectangles and text on a canvas. The data format was easy to read and easy to parse using the Active File design pattern.
We will now move to a more complex data format, to explain more advanced techniques. This will make you an expert in Tcl data file formats.
The repository tool
I used to collect design patterns. I made a repository of patterns, each with a brief description and some properties. I also kept the names, authors and ISBN numbers of the books in which I found the patterns, as a reference to be able to look them up later. To keep track of all this information, I implemented a repository tool in Tcl. It had features to organize patterns into categories and levels, and to point from each pattern to the book and page number where it was described.
The input to the tool was a file that looked like this:
# First, I describe some books in which you can find good design patterns
# and programming idioms. Each book, website or other source of patterns
# is specified with the 'Source' keyword, followed by a unique tag and some
# additional information.

Source GOF {
    Design patterns
    Elements of reusable object-oriented software
    Gamma, Helm, Johnson, Vlissides
    Addison-Wesley, 1995
    0 201 63361 2
}

Source SYST {
    A system of patterns
    Pattern-oriented software architecture
    Buschmann, Meunier, Rohnert, Sommerlad, Stal
    Wiley, 1996
    0 471 95869 7
}

# Next, I describe some categories. I want to group patterns
# in categories so I can find them back more easily. Each category
# has a name (such as "Access control") and a short description.

Category "Access control" {
    How to let one object control the access to one or more
    other objects.
}

Category "Distributed systems" {
    Distributing computation over multiple processes, managing
    communication between them.
}

Category "Resource handling" {
    Preventing memory leaks, managing resources.
}

Category "Structural decomposition" {
    To break monoliths down into independent components.
}

# Finally, I describe the patterns themselves. Each of them has a name,
# belongs to one or more categories, and occurs in one or more of the
# pattern sources listed above. Each pattern has a level, which can
# be 'arch' (for architectural patterns), 'design' for smaller-scale
# design patterns, or 'idiom' for language-specific patterns.

Pattern "Broker" {
    Categories {"Distributed systems"}
    Level arch
    Sources {SYST:99}   ; # This means that this pattern is described in
                        # the book with tag 'SYST' on page 99.
    Info {
        Remote service invocations.
    }
}

Pattern "Proxy" {
    # This pattern fits in two categories:
    Categories {"Access control" "Structural decomposition::object"}
    Level design
    # Both these books talk about the Proxy pattern:
    Sources {SYST:263 GOF:207}
    Info {
        Communicate with a representative rather than with the
        actual object.
    }
}

Pattern "Facade" {
    Categories {"Access control" "Structural decomposition::object"}
    Sources {GOF:185}
    Level design
    Info {
        Group sub-interfaces into a single interface.
    }
}

Pattern "Counted Pointer" {
    Categories {"Resource handling"}
    Level idiom
    Sources {SYST:353}
    Info {
        Reference counting prevents memory leaks.
    }
}
As you can see, this data file has a number of interesting new features:
- The data has more structure. Each piece of data has a "body", contained between curly braces {}, with "child data" inside. Each structure starts with a keyword.
- The structures can be nested: for example, the Pattern structure can contain an Info structure.
- The elements inside the structures can take many forms. Some of them are identifiers or strings (such as the Level element), others seem like special codes (such as SYST:353), and some of them are even freeform text (as in the Category or Info structures).
- The order of the elements in each structure is free. Look at the final two patterns to see that the order of the Level and Sources elements can be swapped. The elements can indeed be placed in any order you want.
- The data file contains Tcl comments, not only between the structures but even inside the structures. Comments allow you to make the data more understandable.
You may think that this format is a lot more complicated than the one in our previous example, and that it is nearly impossible to write a clean parser for this in Tcl. It may not be obvious, but we can use the Active File pattern again, which makes the task a lot simpler. The parsing procedures are a bit more elaborate than before, but they are definitely not "complicated". The main trick is to use Tcl's uplevel command to "parse" and even "execute" the structure bodies.
Here's the part of my tool that parses a data file such as the one above:
# We will internally store the data in these three lists:
set l_patterns [list]
set l_sources [list]
set l_categories [list]

# We also need a variable to keep track of the Pattern structure we are
# currently in:
set curPattern ""

# This is the parsing procedure for the 'Source' keyword.
# As you can see, the keyword is followed by an id (the unique tag for the
# source), and some textual description of the source.
proc Source {id info} {
    # Remember that we saw this source.
    global l_sources
    lappend l_sources $id

    # Remember the info of this source in a global array.
    global a_sources
    set a_sources($id,info) $info
}

# The parsing procedure for the 'Category' keyword is similar.
proc Category {id info} {
    global l_categories
    lappend l_categories $id

    global a_categories
    set a_categories($id,info) $info
}

# This is the parsing procedure for the 'Pattern' keyword.
# Since a 'Pattern' structure can contain sub-structures,
# we use 'uplevel' to recursively handle those.
proc Pattern {name args} {
    global curPattern
    set curPattern $name   ; # This will be used in the sub-structures,
                           # which are parsed next.
    global l_patterns
    lappend l_patterns $curPattern

    # We treat the final argument as a piece of Tcl code.
    # We execute that code in the caller's scope. It contains calls
    # to 'Categories', 'Level' and other commands which implement
    # the sub-structures.
    # This is similar to how we use the 'source' command to parse the entire
    # data file.
    uplevel 1 [lindex $args end]

    # We're no longer inside a pattern body, so set curPattern to empty.
    set curPattern ""
}

# The parsing procedure for one of the sub-structures. It is called
# by 'uplevel' as we described in the comments above.
proc Categories {categoryList} {
    global curPattern   ; # We access the global variable 'curPattern'
                        # to find out inside which structure we are.
    global a_patterns
    set a_patterns($curPattern,categories) $categoryList
}

# The following parsing procedures are for the other sub-structures
# of the Pattern structure.

proc Level {level} {
    global curPattern
    global a_patterns
    set a_patterns($curPattern,level) $level
}

proc Sources {sourceList} {
    global curPattern
    global a_patterns
    # We store the codes such as 'SYST:99' in a global array.
    # My implementation uses regular expressions to extract the source tag
    # and the page number from such a code (not shown here).
    set a_patterns($curPattern,sources) $sourceList
}

proc Info {info} {
    global curPattern
    global a_patterns
    set a_patterns($curPattern,info) $info
}
At first sight, this may seem to take much more work than what we did for the simple canvas example. But think of the power of this technique. With only a few parsing procedures and by making clever use of the uplevel command, we can parse data files with intricate structure, containing comments, nested sub-structures and freeform textual data. Imagine writing a parser for this from scratch.
The data is parsed by the procedures such as Source, Pattern or Info. The parsed data is stored internally in three arrays, and we keep the IDs of all the structures in three lists. The nestedness of the data is handled by calls to uplevel, and by remembering in which 'scope' we currently are using the global variable curPattern.
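The Sources procedure above mentions that codes such as SYST:99 are split into a tag and a page number with regular expressions, without showing how. The following is one possible way to do it, not necessarily the one used in my tool; the procedure names parse_source_ref and resolve_sources are invented for this sketch, and it assumes Tcl 8.5 or later for lassign.

```tcl
# Split a source reference such as 'SYST:99' into its tag and page number.
proc parse_source_ref {ref} {
    if {![regexp {^([^:]+):([0-9]+)$} $ref -> tag page]} {
        error "malformed source reference '$ref' (expected TAG:PAGE)"
    }
    return [list $tag $page]
}

# Resolve every reference in a 'Sources' list against the known tags,
# for example the tags collected in l_sources by the 'Source' procedure.
proc resolve_sources {sourceList knownTags} {
    set result [list]
    foreach ref $sourceList {
        lassign [parse_source_ref $ref] tag page
        if {[lsearch -exact $knownTags $tag] < 0} {
            error "unknown source tag '$tag' in '$ref'"
        }
        lappend result [list $tag $page]
    }
    return $result
}

# The references of the "Proxy" pattern, checked against both books.
set refs [resolve_sources {SYST:263 GOF:207} {GOF SYST}]
```

Here refs becomes a list of two tag/page pairs. Checking the tag against the known sources catches typing mistakes in hand-edited data files early.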
Note that this technique requires that your data follows Tcl syntax. This implies, among other things, that opening curly braces should be placed at the end of a line, not at the beginning of the next line. This enforces a consistent syntax on your data, which is actually a Good Thing.
Recursive structures
In the pattern repository example, the structures of type Pattern contain sub-structures of other types such as Info and Sources. What happens when a structure contains sub-structures of the same type? In other words, how do we handle recursive structures?
Suppose, for example, that you want to describe the design of an object-oriented system, which is divided recursively into subsystems:
example_06/datafile.dat
# Description of an object-oriented video game
System VideoGame {
    System Maze {
        System Walls {
            Object WallGenerator
            Object TextureMapper
        }
        System Monsters {
            Object FightingEngine
            Object MonsterManipulator
        }
    }
    System Scores {
        Object ScoreKeeper
    }
}
To keep track of which System structure we are currently in, it may seem that we need more than just a single global variable like curPattern before. At any point during parsing, we can be inside many nested System structures, so we probably need some kind of stack, on which we push a value when we enter the System parsing procedure, and from which we pop again at the end of the procedure. We can make such a stack using a Tcl list.
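For comparison, this is roughly what the explicit-stack version looks like. It is only a sketch: the variable names systemStack and log are assumptions, and the log list takes the place of a real internal data structure.

```tcl
set systemStack [list]   ;# hypothetical explicit stack of enclosing systems
set log [list]           ;# records what was parsed, for demonstration

proc System {name body} {
    global systemStack log
    # Push the new system.
    lappend systemStack $name
    lappend log "system [join $systemStack /]"

    # Parse the sub-structures in the caller's scope.
    uplevel 1 $body

    # Pop the system again.
    set systemStack [lrange $systemStack 0 end-1]
}

proc Object {name} {
    global systemStack log
    # The top of the stack is the immediate parent system.
    lappend log "object $name in [lindex $systemStack end]"
}

# A shortened version of the video game description.
set data {
System VideoGame {
    System Maze {
        Object WallGenerator
    }
    System Scores {
        Object ScoreKeeper
    }
}
}
eval $data
```

An explicit stack has one small benefit: the whole path of enclosing systems (such as VideoGame/Maze) is available at any time. The price is the push and pop bookkeeping in every recursive parsing procedure.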
But there is a way to avoid maintaining your own stack. It is again based on a very simple suggestion: When you need a stack, see if you can use the function call stack itself. Just store the variables locally in each function call, so that Tcl's call stack takes care of the recursion automatically.
When dealing with such recursive data, I usually implement my parsing procedures like this:
example_06/parser.tcl
set currSystem ""

proc System {name args} {
    # Instead of pushing the new system on a 'stack' of current systems,
    # we remember the previous system in a local variable, which ends up
    # on Tcl's function call stack.
    global currSystem
    set oldSystem $currSystem
    set currSystem $name   ; # Thanks to this, all sub-structures called by
                           # 'uplevel' will know what the name of their
                           # immediate parent System is

    # Store the system in an internal data structure
    # (details not shown here)
    puts "Storing system '$currSystem'"

    # Execute the parsing procedures for the sub-systems
    uplevel 1 [lindex $args end]

    # Pop the system off the 'stack' again. Restore the old system as if
    # nothing happened.
    set currSystem $oldSystem
}

proc Object {name} {
    global currSystem
    # Store the object in the internal data structure of the current
    # system (details not shown here)
    puts "System '$currSystem' contains object '$name'"
}

source "datafile.dat"
Each call to System saves the name of the enclosing system in its local variable oldSystem, and restores it before returning. Since the parsing procedures are automatically called in a stack-based order by Tcl, we do not need to explicitly push/pop anything.
Another example: a CGI library by Don Libes
The CGI library by Don Libes uses the Active File pattern to represent HTML documents. The idea is that you write a Tcl script that acts as an HTML document and generates pure HTML for you. The documents contain nested structures for bulleted lists, preformatted text and other HTML elements. The parsing procedures call uplevel to handle recursive sub-structures.
Here is a part of Don's code, showing you how he uses some of the tricks described in this article:
# Output preformatted text. This text must be surrounded by '<pre>' tags.
# Since it can recursively contain other tags such as '<em>' or hyperlinks,
# the procedure uses 'uplevel' on its final argument.
proc cgi_preformatted {args} {
    cgi_put "<pre"
    cgi_close_proc_push "cgi_puts </pre>"

    if {[llength $args]} {
        cgi_put "[cgi_lrange $args 0 [expr [llength $args]-2]]"
    }
    cgi_puts ">"
    uplevel 1 [lindex $args end]
    cgi_close_proc
}

# Output a single list bullet.
proc cgi_li {args} {
    cgi_put "<li"
    if {[llength $args] > 1} {
        cgi_put "[cgi_lrange $args 0 [expr [llength $args]-2]]"
    }
    cgi_puts ">[lindex $args end]"
}

# Output a bullet list. It contains list bullets, represented
# by calls to 'cgi_li' above. Those calls are executed thanks
# to 'uplevel'.
proc cgi_bullet_list {args} {
    cgi_put "<ul"
    cgi_close_proc_push "cgi_puts </ul>"

    if {[llength $args] > 1} {
        cgi_put "[cgi_lrange $args 0 [expr [llength $args]-2]]"
    }
    cgi_puts ">"
    uplevel 1 [lindex $args end]

    cgi_close_proc
}
I am not going to explain the fine details of this great library, but you can find out for yourself by downloading it from Don's homepage.
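If you just want to get a feeling for the idea without the library, the following toy sketch shows the same trick in a few lines. It is not Don's API: the procedures emit, element and text_line are invented for this illustration, the output goes into a string instead of to the browser, and no HTML escaping is done.

```tcl
set html ""   ;# output buffer (a real CGI script would print instead)

proc emit {text} {
    global html
    append html $text "\n"
}

# A container element: emit the opening tag, run the body as Tcl code
# in the caller's scope, then emit the closing tag.
proc element {tag body} {
    emit "<$tag>"
    uplevel 1 $body
    emit "</$tag>"
}

# A leaf: its argument is plain text, not code.
proc text_line {text} {
    emit $text
}

# The 'document' is a Tcl script whose nesting mirrors the HTML nesting.
element ul {
    element li {
        text_line "First item"
        element ul {
            element li { text_line "Nested item" }
        }
    }
}
```

After this runs, html contains the opening tags, the two text lines and the closing tags in properly nested order, one per line. As in the library, the nesting of the braces in the script decides the nesting of the generated tags.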
Another example: Making Tcl look like C++
I once wrote a (very!) simple parser for C++ class implementations. Lazy as I am, I wrote the parser in Tcl, using many of the techniques in this article. It actually turned out to be too complicated to be of any use, but it shows how far you can go with the Active File pattern. Just look at this "data file" containing something that looks like very twisted C++ code:
// The following is NOT C++, it is Tcl!!
// Note the "documentation string" just before each class and method body.

class myListElt: public CListElt, private FString {
    This is a documentation string for the class 'myListElt'.
    You can see multiple inheritance at work here.
} {

  public:
    method int GetLength(void) {
        This is the documentation string for the GetLength method.
    } {
        // This is the final argument of the 'method' command.
        // It contains freeform text, so this is where I can write
        // pure C++ code, including the comment you are now reading.
        return myLength;
    }

    method char* GetString(void) {
        This is the documentation string for the GetString method.
    } {
        append(0);
        return (char*)data;
    }

  private:
    method virtual void privateMethod(short int p1, short int p2) {
        A private method with parameters.
    } {
        printf("Boo! p1=%d, p2=%d\n", p1, p2);
    }
}

// We need a 'data' command for variables. We cannot say 'int b'
// without writing a command like 'int' for each available type.
data short int b {This is the documentation string for 'b'.}
data void* somePointer {This is the documentation string for 'somePointer'.}

method void error(short int errNo, char* message) {
    This is a global library procedure, which reports an error message.
} {
    cout << "Hey, there was an error (" << errNo << ") " << message << endl;
}

cpp_report
This example may be far-fetched, but it gives you an idea of the power of the Active File pattern. What you see is Tcl code, but it looks a lot like C++ code, and it can automatically generate documentation, class diagrams, programming references and of course compilable C++ code.
The parsing procedures such as method and class store the C++ implementation in internal Tcl data structures. Note that we need keywords like 'method' and 'data' which are not part of C++ itself. We need them here because they are the names of the parsing procedures that I wrote.
The call to cpp_report generates the resulting C++ code.
The following fragment from the parser gives you an idea of how you can bend the Tcl interpreter to make it read a file with C++-like syntax:
# This is the parsing procedure for the 'class' keyword.
# Arguments:
# - class name
# - list of inheritance specifications, optional
# - comment block
# - body block
proc class {args} {
    global _cpp

    # split names from special characters like ':' ',' '*'
    set cargs [expand [lrange $args 0 [expr [llength $args] - 3]]]
    # -3 to avoid the comment block and the class body.

    # First process the name.
    set className [lindex $cargs 0]
    if { $_cpp(CL) == "" } {
        set _cpp(CL) $className   ; # This is like 'curPattern' in the
                                  # pattern repository example.
    } else {
        error "Class definition for $className: we are already inside class $_cpp(CL)"
    }

    # Then process the inheritance arguments.
    # Obviously, this is already a lot more complicated than in the
    # previous examples.
    set inhr [list]
    set mode beforeColon
    set restArgs [lrange $cargs 1 end]
    foreach arg $restArgs {
        if { $arg == ":" } {
            if { $mode != "beforeColon" } {
                error "Misplaced \":\" in declaration \"class $className $restArgs\""
            }
            set mode afterColon
        } elseif { $arg == "public" || $arg == "private" } {
            if { $mode != "afterColon" } {
                error "Misplaced \"$arg\" in declaration \"class $className $restArgs\""
            }
            set mode $arg
        } elseif { $arg == "," } {
            if { $mode != "afterInherit" } {
                error "Misplaced \",\" in declaration \"class $className $restArgs\""
            }
            set mode afterColon
        } else {
            if { $mode != "public" && $mode != "private" } {
                error "Misplaced \"$arg\" in declaration \"class $className $restArgs\""
            }
            if { ![IsID $arg] } {
                warning "$arg is not a valid C++ identifier..."
            }
            lappend inhr [list $mode $arg]
            set mode afterInherit
        }
    }

    if { $mode != "afterInherit" && $mode != "beforeColon" } {
        error "Missing something at end of declaration \"class $className $restArgs\""
    }

    set _cpp(CLih) $inhr
    set _cpp(CLac) "private"

    # First execute the comment block.
    uplevel 1 [list syn_cpp_docClass [lindex $args [expr [llength $args] - 2]]]

    # Then execute the body, using 'uplevel'.
    uplevel 1 [list syn_cpp_bodyClass [lindex $args end]]

    set _cpp(CL) ""
    set _cpp(CLac) ""
    set _cpp(CLih) ""
}
This is only part of the implementation, just to give you a general idea. I do not actually recommend that you parse C++ code this way, because it leads into all kinds of thorny problems. I just wanted to show an advanced example of the techniques we have described in this article.
Conclusion:
According to Perl's Larry Wall, one of the most important talents of a good programmer is laziness. Creative laziness, that is. This article makes two suggestions that both come down to the same thing: be lazy.
- When you need a parser, use an existing parser and adapt your file format to please that parser (assuming that you have some degree of freedom in choosing the file format, of course).
- When you need a stack, use the existing function call stack and forget about pushing, popping and other bookkeeping.
"Reuse" is not all about encapsulation and information hiding. Sometimes it's just about being lazy.