Recently, I was building a small utility in C that had a good assortment of parameters that needed to be configured and reconfigured. It established connections with several other programs while it was running, so restarting and resetting these connections took enough time to be a pain. As a result, I started writing a basic interperter to take care of changing settings while the program was running. The simplest situation was that I needed to disable or enable some parameter.
//Code Example A #include <stdio.h> #include <string.h> #include <stdbool.h> int main() { bool doPrint=true; char cmd[200]; fgets(cmd,200,stdin); if(!strcmp(cmd,"off\n")) doPrint=false; if(doPrint) puts("Still On..."); return 0; }
This is an extremely basic program that hardcodes all of its logic. On occasion you might want to create a more complex interpreter. One way to make one is to use a dispatch table. A dispatch table is a method of mapping data to functions. This can be used to avoid a large block of:
if(equal(data,A)) foo() else if(equal(data,B)) bar() else...
An extension on the previous idea of parameter settings is shown below:
//Code Example B #include <string.h> #include <stdio.h> #include <stdbool.h> void parse(const char *cmd); void showState(); int main() { char cmd[200]; while(1) { parse(fgets(cmd,200,stdin)); showState(); } return 2; } //global data int x=0; int y=0; int z=0; typedef struct { const char *nam; void (*func)(int); } entry_t; void set_x(int val) {x=val;} void set_y(int val) {y=val;} void set_z(int val) {z=val;} entry_t table[] = {{"set-x", set_x}, {"set-y", set_y}, {"set-z", set_z}}; void parse(const char *cmd) { int i=3; while(i--) { if(!strncmp(cmd,table[i].nam,5)) { int arg; if(sscanf(cmd+5,"%d",&arg) > 0) table[i].func(arg); return; } } } void showState() { printf("x: %d\ny: %d\nz: %d\n",x,y,z); }
The usage of the entry_t table[] is what provides the ability for basic function lookups. Parse simply uses the table as a map, which results in the funciton call if the function is found. This allows for the command "set-x 12" to get transformed into a call set_x(12).
With this example, there is the assumption that all functions that can be accessed have the same number and type of parameters. With the addition of variable parameters and variable types, this use of function lookup can get much more interesting. This means that the type of each parameter needs to be stored somehow. One simple way is to say that each possible type is a character in a string. In the previous example, the functions accepted a single int, which could be expressed as accepting "i", or a single integer.
This example is a bit closer to the practical side of an interpreter, in that it is flexible and it does accomplish some remotely useful task. It will be able to perform some simple arithmetic, storing/displaying data (as the previous example), and provide documentation for itself. The full listing of operations are:
Note: None of these functions will accept anything other than the static parameters (this should be an example and overloading can add to the verbosity a good bit)
This example is shown below:
//Code Example C #include <string.h> #include <stdio.h> #include <stdlib.h> #include <math.h> //Type definitions typedef union { char *s; char c; float f; } arg_t; typedef struct { const char* name; void (*func)(arg_t*); const char* args; const char* doc; } cmd_t; #define MK_CMD(x) void cmd_ ## x (arg_t*) //Functions definitions MK_CMD(prompt); MK_CMD(load); MK_CMD(disp); MK_CMD(add); MK_CMD(mul); MK_CMD(sqrt); MK_CMD(exit); MK_CMD(help); arg_t *args_parse(const char *s); //The dispatch table #define CMD(func, params, help) {#func, cmd_ ## func, params, help} #define CMDS 8 cmd_t dsp_table[CMDS] ={ CMD(prompt,"s","Select the prompt for input"), CMD(load,"cf","Load into register float"), CMD(disp,"c","Display register"), CMD(add,"ff","Add two numbers"), CMD(mul,"ff","Multiply two numbers"), CMD(sqrt,"f","Take the square root of number"), CMD(exit,"","Exits the interpreter"), CMD(help,"","Display this help")}; const char *delim = " \n(,);"; void parse(char *cmd) { const char* tok = strtok(cmd,delim); if(!tok) return; int i=CMDS; while(i--) { cmd_t cur = dsp_table[i]; if(!strcmp(tok,cur.name)) { arg_t *args = args_parse(cur.args); if(args==NULL && strlen(cur.args)) return;//Error in argument parsing cur.func(args); free(args); return; } } puts("Command Not Found"); } #define ESCAPE {free(args); puts("Bad Argument(s)"); return NULL;} arg_t *args_parse(const char *s) { int argc=strlen(s); arg_t *args=malloc(sizeof(arg_t)*argc); int i; for(i=0;i<argc;++i) { char *tok; switch(s[i]) { case 's': args[i].s = strtok(NULL,delim); if(!args[i].s) ESCAPE; break; case 'c': tok = strtok(NULL,delim); if(!tok) ESCAPE; args[i].c = tok[0]; if(!islower(args[i].c)) ESCAPE; break; case 'f': tok = strtok(NULL,delim); if(sscanf(tok,"%f", &args[i].f)!=1) ESCAPE; break; } } return args; } #undef ESCAPE //Global data char prompt[200]; float regs['z'-'a']; void set_reg(char c, float f) {regs[c-'a'] = f;} float get_reg(char c) {return regs[c-'a'];} int main() { char i; for(i='a';i<='z';++i) set_reg(i,0.0f); strncpy(prompt,">",200); //Read Parse Exec Loop char cmd[200]; while(1) { printf("%s ",prompt); fflush(stdout); parse(fgets(cmd,200,stdin)); } return 2; } void cmd_prompt(arg_t *args) {strncpy(prompt,args[0].s,200);} void cmd_load(arg_t *args) {set_reg(args[0].c,args[1].f);} void cmd_disp(arg_t *args) {printf("%f\n", get_reg(args[0].c));} void cmd_add(arg_t *args) {printf("%f\n",args[0].f+args[1].f);} void cmd_mul(arg_t *args) {printf("%f\n",args[0].f*args[1].f);} void cmd_sqrt(arg_t *args) {printf("%f\n",sqrt(args[0].f));} void cmd_exit(arg_t *args) {exit(0);} void cmd_help(arg_t *args) { puts("Available Commands:"); int i=CMDS; while(i--) { cmd_t cmd=dsp_table[i]; char tmp[100];//Formatting buffer snprintf(tmp,100,"%s(%s)",cmd.name,cmd.args); printf("%10s\t- %s\n",tmp,cmd.doc); } }
The main reason this is more complex than the previous example is that the arguments need to be identified and parsed. This is done with C's strtok(). This function returns the next token that is delimited by some character, which in this example was " \n(,);". Using strtok(), the string "these words are tokens" can be broken into ["these", "words", "are", "tokens"]. With these tokens, each is either a command or an argument. The rules of this interpreter make the first token of a line the command and the rest are arguments. This lets the process become:
As per the need of varied types, the union arg_t takes care of that issue. Outside of the function the dispatch table contains enough information to see how to use the union, in the form of the previously discussed format string. Inside the function the union is assumed to be the correct structure.
Now for the fun part playing around with the basic interpreter:
> help Available Commands: help() - Display this help exit() - Exits the interpreter sqrt(f) - Take the square root of number mul(ff) - Multiply two numbers add(ff) - Add two numbers disp(c) - Display register load(cf) - Load into register float prompt(s) - Select the prompt for input > prompt Interp: Interp: sqrt(3.14159) 1.772453 Interp: load(p,1.772453) Interp: disp(p) 1.772453 Interp: add(1,5) 6.000000 Interp: mul(3,7) 21.000000 Interp: invalid Command Not Found Interp: mul(string,3) Bad Argument(s) Interp: exit()
The interpreter does show that is is able to parse and run the commands that it has and even respond logically to errors. It is still just a start if one wants to focus on getting more into the area of parsing or interpreting in general.
For those who do not want to copy/paste, the files are available in a tar here
EDIT (Sep 28th 2017): This article was originally written July 10th 2010 and since then a few people have asked a few questions about this article. One of the more frequent ones is asking about the copyright of the code contained within this article. For tutorials I write and publish on my webserver any short (sub ~20 line) snippet should be considered public domain and anything larger should be considered MIT licensed. The text of the article itself (i.e. the non-code portion) should be considered copyrighted by myself with no license to redistribute without permission. If there is further confusion feel free to request clarification.