Unlock skill-first hiring with HackerEarth today
Learn moreBuilding your own Lisp Parser Part I
While writing a full-blown compiler for a programming language is a difficult and frustrating task, writing a smaller and more specific parser can be surprisingly easy if you know a small trick.
On the other hand, parsing problems pops up at several places in modern-day programming. So, learning this useful trick can be rewarding.
Source: http://mox.ingenierotraductor.com/2015/12/translation-is-like.html
Prerequisites
You need to know the basics of Python, more specifically, you should know the concepts of recursion and flow of control.
Objectives
After reading and understanding this post, you will be able to create simple calculators, interactive interpreters, parsers, very limited and small programming languages, etc. In general, you should be able to take input, tokenize it, perform whatever actions you want to on the tokens, and output the result of the process.
At the end of this post, you will have created a simple, Lisp-like prefix calculator. Following is a demonstration of how it’s going to look:
> ( + 3 2 )
= 5
> ( / 2 0 )
DivisionByZero
> ( – -3 2 )
= -5
> -2
= -2
> ( + ( * 3 2 ) 5 )
= 11
Step 1: Writing the Grammar
The first step to writing a parser is to write a clear grammar for its syntax. The grammar determines what is and what is not right. Once you have written the grammar, translating it to Python code is a trivial chore. The grammar will serve as our first pseudocode.
For our tiny calculator, we know that the input can come in two forms: a Number (-2, .5, +8, 8.5, 9.) or a more complicated Expression begins with a (, followed by an operator, etc.).
For writing a grammar, we need to identify different elements of the syntax. So far, we have Expression, Number, and Operator. The next important thing to do is to structure the elements (known as terms) into a hierarchical form. This is shown below:
Expression:
Number
( Operator Expression Expression )
Number:
a floating-point number ([-+][0-9][*\.][0-9]*)
Operators:
+ |
– |
* |
/ |
You will notice that Operator and Expression have no parent; they are independent terms.
A grammar is read from the bottom up and different choices appear on distinct lines. Our grammar says that:
- an Operator is one of +, -, *, /.
- a Number is a floating-point number which matches the RegEx [-+][0-9]*[\.][0-9]*
- an Expression is either a Number or a ( followed by an Operator, followed by two other Expressions, and finally ends in a ). Note that the definition of an Expression is recursive.
Step 2: Translating the Grammar into Pseudocode
Pseudocode is fake code resembling English which is supposed to be an intermediate code that can easily be converted into real code. Although writing pseudocode is optional, it is really helpful.
The trick here is to put each term from our grammar into a separate function. Whenever we need to apply the grammar of a certain term, we only have to call the function. Following is the pseudocode implementing the grammar above:
https://gist.github.com/HackerEarthBlog/f0a5a4304326936142da39b0d853f944
This is our rough pseudo-code that should be good enough for our purpose. In the next step, we will write the real code.
3. Writing the Code
It is said very profoundly about Python that reading and writing Python feels like doing pseudocode. The same applies here, but there is one small caveat— Python doesn’t provide any function for “unreading” or putting a character back in the input buffer.
For this, I have created a small class which extends the file object to include this feature. To keep things simple, I have avoided inheritance and my class is not compatible with the file object provided by Python. Treat it like a black-box if you don’t want to understand it.
https://gist.github.com/HackerEarthBlog/6465f93e1ca155ded5e8b0c8294f16ba
Here is the buffer.py file which handles buffered input:
https://gist.github.com/HackerEarthBlog/5330e5f11f96a22608b45affa61fa858
Explanation
expression():
expression() is our top-level function and maps the Expression grammar term. We first ignore all the whitespace. After that, it takes a single non-whitespace character as input and checks it against several possibilities.
If the input string starts with +, -, ., or a digit, it is a number. We put the character back and input the entire number.
If the input string starts with (, a complete expression is to follow. We input the operator, two more expressions which will serve as the operands, and finally the closing parenthesis. We then calculate the result and return it.
number():
The number function maps the Number grammar term and is very simple—just a wrapper around getword. We input a whole word and if it converts to a float, we return it, otherwise the function returns an error message.
operator():
The operator function inputs a single character and tests it for equality against several known operators. Like the above two functions, it also maps a grammar term, i.e., Operator. In case the given operator is not valid, an error message is returned.
calc():
The calc function is actually not necessary but makes the code substantially better. In an ideal program, each function should do only one logical task. calc removes some burden from expression.
UngetableInput
Although Python 3 supports buffered input through stdin.buffer, Python 2 has no such facility. Plus, Python 3’s stdin.buffer would still require us to create some wrapper of our own.
The UngetableInput class wraps Python’s basic input to go through a buffer. We take input into the buffer and put a character back into the buffer when ungetc is called. Unless the buffer is empty, all input comes from the buffer.
Homework
This code works and leaves a lot of cleaning as homework for the reader. 🙂 Following is a list of things you can do to improve and extend the rudimentary calculator:
Improve buffer.py to handle input whitespace more accurately. Hint: You might want to use a string as the buffer.
Implement a function to get a single character while skipping all whitespace and replace the whitespace skipping loop with it.
Add the ability to create variables. Following the Lisp syntax, it should look something like the following:
( define var_name 839457.892 )
What’s Coming Next?
One of the most important parts of our program is the input buffer we created. Unfortunately, it’s not general purpose and can break when used in something more complicated than our tiny calculator program. In the next article, we will examine a bigger module which does this chore better.
Get advanced recruiting insights delivered every month
Related reads
Recruitment Workflow Process: A Complete Guide
Finding the perfect fit for your team can feel like searching for a unicorn. But fret not, fellow recruiters! Having a well-defined recruitment…
Conquer Your Hiring Challenges: Top Recruiting Software Picks for Small Businesses in 2024
In today’s competitive job market, attracting and keeping top talent is crucial for small businesses to thrive. But for busy small business owners…
How To Become A Technical Recruiter: Your Guide to Launching a Rewarding Career
The tech industry thrives on innovation, and at the heart of that innovation lies a crucial role: the technical recruiter. Why Technical Recruiting…
How To Use Live Coding Interviews in Tech Recruiting?
In the fast-paced world of tech recruiting, finding the perfect candidate can feel like searching for a needle in a haystack. Resumes can…
Building a Strong Talent Pipeline: Strategies for Effective Sourcing and Engagement
Struggling to find the perfect candidate when a position opens up? Build a strong talent pipeline to streamline your hiring process and have…
How to Build a High-Performance Team
A high-performance team thrives by fostering trust, encouraging open communication, and setting clear goals for all members to work towards. By focusing on…