Hire top tech talent with our recruitment platform

Access Free Demo
piller_image

Building your own Lisp Parser Part I

While writing a full-blown compiler for a programming language is a difficult and frustrating task, writing a smaller and more specific parser can be surprisingly easy if you know a small trick.

On the other hand, parsing problems pops up at several places in modern-day programming. So, learning this useful trick can be rewarding.

Source: http://mox.ingenierotraductor.com/2015/12/translation-is-like.html

Prerequisites

You need to know the basics of Python, more specifically, you should know the concepts of recursion and flow of control.

Objectives

After reading and understanding this post, you will be able to create simple calculators, interactive interpreters, parsers, very limited and small programming languages, etc. In general, you should be able to take input, tokenize it, perform whatever actions you want to on the tokens, and output the result of the process.

At the end of this post, you will have created a simple, Lisp-like prefix calculator. Following is a demonstration of how it’s going to look:

> ( + 3 2 )
= 5
> ( / 2 0 )
DivisionByZero
> ( – -3 2 )
= -5
> -2
= -2
> ( + ( * 3 2 ) 5 )
= 11

Step 1: Writing the Grammar

The first step to writing a parser is to write a clear grammar for its syntax. The grammar determines what is and what is not right. Once you have written the grammar, translating it to Python code is a trivial chore. The grammar will serve as our first pseudocode.

For our tiny calculator, we know that the input can come in two forms: a Number (-2, .5, +8, 8.5, 9.) or a more complicated Expression begins with a (, followed by an operator, etc.).

For writing a grammar, we need to identify different elements of the syntax. So far, we have Expression, Number, and Operator. The next important thing to do is to structure the elements (known as terms) into a hierarchical form. This is shown below:

Expression:
Number

( Operator Expression Expression )

Number:
a floating-point number ([-+][0-9][*\.][0-9]*)

Operators:

            +
            –
            *
            /

You will notice that Operator and Expression have no parent; they are independent terms.

A grammar is read from the bottom up and different choices appear on distinct lines. Our grammar says that:

  • an Operator is one of +, -, *, /.
  • a Number is a floating-point number which matches the RegEx [-+][0-9]*[\.][0-9]*
  • an Expression is either a Number or a ( followed by an Operator, followed by two other Expressions, and finally ends in a ). Note that the definition of an Expression is recursive.

Step 2: Translating the Grammar into Pseudocode

Pseudocode is fake code resembling English which is supposed to be an intermediate code that can easily be converted into real code. Although writing pseudocode is optional, it is really helpful.

The trick here is to put each term from our grammar into a separate function. Whenever we need to apply the grammar of a certain term, we only have to call the function. Following is the pseudocode implementing the grammar above:

https://gist.github.com/HackerEarthBlog/f0a5a4304326936142da39b0d853f944

This is our rough pseudo-code that should be good enough for our purpose. In the next step, we will write the real code.

3. Writing the Code

It is said very profoundly about Python that reading and writing Python feels like doing pseudocode. The same applies here, but there is one small caveat— Python doesn’t provide any function for “unreading” or putting a character back in the input buffer.

For this, I have created a small class which extends the file object to include this feature. To keep things simple, I have avoided inheritance and my class is not compatible with the file object provided by Python. Treat it like a black-box if you don’t want to understand it.

https://gist.github.com/HackerEarthBlog/6465f93e1ca155ded5e8b0c8294f16ba

Here is the buffer.py file which handles buffered input:

https://gist.github.com/HackerEarthBlog/5330e5f11f96a22608b45affa61fa858

Explanation

expression():

expression() is our top-level function and maps the Expression grammar term. We first ignore all the whitespace. After that, it takes a single non-whitespace character as input and checks it against several possibilities.

If the input string starts with +, -, ., or a digit, it is a number. We put the character back and input the entire number.

If the input string starts with (, a complete expression is to follow. We input the operator, two more expressions which will serve as the operands, and finally the closing parenthesis. We then calculate the result and return it.

number():

The number function maps the Number grammar term and is very simple—just a wrapper around getword. We input a whole word and if it converts to a float, we return it, otherwise the function returns an error message.

operator():

The operator function inputs a single character and tests it for equality against several known operators. Like the above two functions, it also maps a grammar term, i.e., Operator. In case the given operator is not valid, an error message is returned.

calc():

The calc function is actually not necessary but makes the code substantially better. In an ideal program, each function should do only one logical task. calc removes some burden from expression.

UngetableInput

Although Python 3 supports buffered input through stdin.buffer, Python 2 has no such facility. Plus, Python 3’s stdin.buffer would still require us to create some wrapper of our own.

The UngetableInput class wraps Python’s basic input to go through a buffer. We take input into the buffer and put a character back into the buffer when ungetc is called. Unless the buffer is empty, all input comes from the buffer.

Homework

This code works and leaves a lot of cleaning as homework for the reader. 🙂 Following is a list of things you can do to improve and extend the rudimentary calculator:

Improve buffer.py to handle input whitespace more accurately. Hint: You might want to use a string as the buffer.
Implement a function to get a single character while skipping all whitespace and replace the whitespace skipping loop with it.
Add the ability to create variables. Following the Lisp syntax, it should look something like the following:

( define var_name 839457.892 )

What’s Coming Next?

One of the most important parts of our program is the input buffer we created. Unfortunately, it’s not general purpose and can break when used in something more complicated than our tiny calculator program. In the next article, we will examine a bigger module which does this chore better.

Hackerearth Subscribe

Get advanced recruiting insights delivered every month

Related reads

Benchmark Metrics to Improve Your Recruiting Funnel
Benchmark Metrics to Improve Your Recruiting Funnel

Benchmark Metrics to Improve Your Recruiting Funnel

In a competitive job market, recruiting the right talent efficiently and effectively can set your organization apart. However, even with a streamlined hiring…

What Is a 30-60-90 Day Plan for New Managers?
What Is a 30-60-90 Day Plan for New Managers?

What Is a 30-60-90 Day Plan for New Managers?

Transitioning to a managerial position can be both thrilling and a bit daunting. To help managers establish themselves quickly and gain success in…

Top 10 SaaS Recruitment Software
Top 10 SaaS Recruitment Software

Top 10 SaaS Recruitment Software

The competition for good jobs is very high, and SaaS recruitment software is used in modern companies to manage the vast pool of…

How Talent Assessment Tests Improve Hiring Accuracy and Reduce Employee Turnover
How Talent Assessment Tests Improve Hiring Accuracy and Reduce Employee Turnover

How Talent Assessment Tests Improve Hiring Accuracy and Reduce Employee Turnover

Recruiting the right candidates is a science and an art. In the current world where employment opportunities are scarce, employers require more than…

10 Digital Interviewing Tips for Employers
10 Digital Interviewing Tips for Employers

10 Digital Interviewing Tips for Employers

The shift to remote work has brought digital interviewing to the forefront of recruitment strategies. Video interviews, live coding challenges, and online assessments…

Best Offboarding Software in 2025
Best Offboarding Software in 2025

Best Offboarding Software in 2025

Offboarding is as important to an organization’s talent management system and strategy as onboarding is. An effective offboarding process is how employees are…

Hackerearth Subscribe

Get advanced recruiting insights delivered every month

View More

Top Products

Hackathons

Engage global developers through innovation

Hackerearth Hackathons Learn more

Assessments

AI-driven advanced coding assessments

Hackerearth Assessments Learn more

FaceCode

Real-time code editor for effective coding interviews

Hackerearth FaceCode Learn more

L & D

Tailored learning paths for continuous assessments

Hackerearth Learning and Development Learn more