Tokenizing

When writing a tokenizer from scratch, I've found it best to start by laying out the definitions of your token types. They don't have to be in some magic format; anything you can understand will work.
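
As a rough illustration of what such a list of definitions could look like, here is a minimal sketch in Python. The four token types and their descriptions are assumptions made up for the example, not a required set; your language will have its own.

```python
from enum import Enum, auto


class TokenType(Enum):
    """Hypothetical token types for a small expression language."""
    NUMBER = auto()
    STRING = auto()
    IDENTIFIER = auto()
    SYMBOL = auto()


# Informal notes for whoever writes the tokenizer; no tool consumes these.
DEFINITIONS = {
    TokenType.NUMBER: "one or more digits, e.g. 42",
    TokenType.STRING: 'text between double quotes, e.g. "hi"',
    TokenType.IDENTIFIER: "a letter or underscore, then letters, digits, or underscores",
    TokenType.SYMBOL: "a single punctuation character, e.g. + or (",
}

for token_type, description in DEFINITIONS.items():
    print(f"{token_type.name}: {description}")
```

Writing the notes next to the enum keeps the two in sync, but a comment block or a scrap of paper would do just as well.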

For each token type, create a function that reads exactly one token of that type, no more, and returns it as an object. (Be sure to add error handling and such.) All tokens should share a common type, with the kind of token distinguished either by an enum field or by subclasses. (I chose an enum for its simplicity.) As for the name, Read<em>Type</em> is what I chose (for example, ReadNumber or ReadString), as each function does exactly that: reads a number, string, etc. from the input.
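
Here is one way the per-type readers might look, sketched in Python (so ReadNumber becomes read_number, and so on). The Token class, the convention of returning the new position alongside the token, and the simplified string syntax without escape sequences are all assumptions for this example.

```python
from dataclasses import dataclass
from enum import Enum, auto


class TokenType(Enum):
    NUMBER = auto()
    STRING = auto()


@dataclass
class Token:
    """One shared token class; the enum field says which kind it is."""
    type: TokenType
    text: str


def read_number(source: str, pos: int) -> tuple[Token, int]:
    """Read one run of ASCII digits at pos; return the token and the new position."""
    end = pos
    while end < len(source) and source[end] in "0123456789":
        end += 1
    if end == pos:
        raise ValueError(f"expected a digit at position {pos}")
    return Token(TokenType.NUMBER, source[pos:end]), end


def read_string(source: str, pos: int) -> tuple[Token, int]:
    """Read one double-quoted string at pos (no escape sequences in this sketch)."""
    if pos >= len(source) or source[pos] != '"':
        raise ValueError(f"expected an opening quote at position {pos}")
    end = source.find('"', pos + 1)
    if end == -1:
        raise ValueError(f"unterminated string starting at position {pos}")
    # The token text excludes the quotes; the new position is just past the closing quote.
    return Token(TokenType.STRING, source[pos + 1:end]), end + 1
```

Each reader stops as soon as its token ends and leaves everything after it untouched, which is what lets a caller chain them one after another.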

Next, write a function that figures out which type of token comes next and returns that token by calling the matching Read function from the previous step. ReadToken is a descriptive name.
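
A minimal sketch of such a dispatcher in Python, assuming the next token's type can be decided from its first character alone (true for many simple languages, not all). To keep the block short, the readers here are regex-based stand-ins that return plain (kind, text) tuples; they and the three token kinds are assumptions for illustration.

```python
import re

NUMBER = re.compile(r"[0-9]+")
STRING = re.compile(r'"[^"]*"')
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _read(kind, pattern, source, pos):
    """Shared helper for the stand-in readers: match pattern exactly at pos."""
    match = pattern.match(source, pos)
    if match is None:
        raise ValueError(f"malformed {kind} at position {pos}")
    return (kind, match.group()), match.end()


def read_number(source, pos):
    return _read("NUMBER", NUMBER, source, pos)


def read_string(source, pos):
    return _read("STRING", STRING, source, pos)


def read_identifier(source, pos):
    return _read("IDENTIFIER", IDENTIFIER, source, pos)


def read_token(source, pos):
    """Peek at the next character and hand off to the matching reader."""
    if pos >= len(source):
        raise ValueError("unexpected end of input")
    ch = source[pos]
    if ch in "0123456789":
        return read_number(source, pos)
    if ch == '"':
        return read_string(source, pos)
    if ch == "_" or (ch.isascii() and ch.isalpha()):
        return read_identifier(source, pos)
    raise ValueError(f"unexpected character {ch!r} at position {pos}")
```

The dispatcher only peeks; it never consumes input itself, so all the real reading and error reporting for a token stays in that token's reader.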

Reading multiple tokens is then a simple while loop around ReadToken. Be sure to skip whitespace between tokens if it's insignificant in your language. Naturally, this goes in a function called Tokenize or similar.
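
The loop might look something like this in Python. Passing the single-token reader in as a parameter is a choice made only to keep the example self-contained, and read_word is a toy reader invented for the demonstration; in a real tokenizer you would call your own ReadToken directly.

```python
def tokenize(source, read_token):
    """Collect tokens until the input runs out, skipping whitespace between them."""
    tokens = []
    pos = 0
    while True:
        # Whitespace is assumed to be insignificant; remove this loop if it matters.
        while pos < len(source) and source[pos].isspace():
            pos += 1
        if pos >= len(source):
            break
        token, new_pos = read_token(source, pos)
        if new_pos <= pos:
            # Guard against a reader that consumes nothing, which would loop forever.
            raise RuntimeError(f"reader made no progress at position {pos}")
        tokens.append(token)
        pos = new_pos
    return tokens


def read_word(source, pos):
    """Toy reader for demonstration only: one run of non-whitespace is one token."""
    end = pos
    while end < len(source) and not source[end].isspace():
        end += 1
    return source[pos:end], end


print(tokenize("  let x  = 42 ", read_word))  # prints ['let', 'x', '=', '42']
```

Skipping whitespace in the loop rather than in each reader means the readers can assume they always start on the first character of a token.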

And now you have your tokenizer. Simple, wasn't it?
