Continuous Deployment System

This is one of the coolest and most important things we recently built at HackerEarth.

What’s so cool about it? Have a little patience and you will soon find out. But make sure you read till the end 🙂

I hope to provide valuable insights into the implementation of a Continuous Deployment System (CDS).

At HackerEarth, we iterate over our product quickly and roll out new features as soon as they are production ready. In the last two weeks, we deployed more than 100 commits to production, and a major release comprising more than 150 commits is scheduled for launch within a few days. Those commits consist of changes to the backend app, website, static files, database, and so on. We run over a dozen different types of servers, for example, webserver, code-checker server, log server, wiki server, realtime server, and NoSQL server. All of them run on multiple EC2 instances at any point in time. Our codebase is still tightly integrated as a single project, with many different components required for each server. When the codebase changes, all the related dedicated servers and components need to be updated when deploying to production. Doing that manually would have driven us crazy and would have been a total waste of time!

Look at the table of commits deployed on a single day.

With such speed, we needed an automated deployment system along with automated testing. Our implementation of CDS lets the team roll out features to production with a single command: git push origin master. Another reason to use CDS is that we are trying to automate everything, and I see us going in the right direction.

CDS Model

The process begins with a developer pushing a bunch of commits from their master branch to a remote repository, which in our case is hosted on Bitbucket. We have set up a post hook on Bitbucket, so as soon as Bitbucket receives commits from the developer, it generates a payload (containing information about the commits) and sends it to the toolchain server.
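
Decoding the payload on the toolchain server could look like the sketch below. It assumes the hook delivers the JSON in a form-encoded field named payload and that the JSON has a top-level commits list. Both are assumptions about the hook format, and extract_commits is a hypothetical helper name, not part of the toolchain code.

```python
import json
from urllib.parse import parse_qs, urlencode


def extract_commits(request_body):
    """Return the list of commit dicts carried by a hook request body.

    Assumption: the body is form-encoded and its 'payload' field holds JSON
    with a top-level 'commits' list.
    """
    fields = parse_qs(request_body)
    payload = json.loads(fields["payload"][0])
    return payload.get("commits", [])


# Hypothetical request body, built here only to exercise the function.
sample_body = urlencode({"payload": json.dumps({
    "commits": [
        {"node": "abc123", "branch": "master", "parents": ["def456"]},
    ],
})})

commits = extract_commits(sample_body)
print(commits[0]["branch"])
```

The resulting list of dictionaries is the kind of input the filtering step works on.
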
The toolchain server backend receives the payload and filters the commits by branch, ignoring any commit that is not from the master branch or that is a merge commit.

    # Assumes the toolchain is a Django project that defines MASTER_BRANCH
    from django.conf import settings


    def filter_commits(branch=settings.MASTER_BRANCH, all_commits=None):
        """
        Filter commits by branch
        """
        commits = []

        # Walk the payload in reverse so that we have branch info in the
        # first commit. reversed() leaves the caller's list untouched.
        for commit in reversed(all_commits or []):
            if commit['branch'] is None:
                parents = commit['parents']
                # Ignore merge commits for now
                if len(parents) > 1:
                    # It's a merge commit and
                    # We don't know what to do yet!
                    continue

                # Check if we just stored the child commit.
                for lcommit in commits:
                    if commit['node'] in lcommit['parents']:
                        commit['branch'] = branch
                        commits.append(commit)
                        break
            elif commit['branch'] == branch:
                commits.append(commit)

        # Restore commits order
        commits.reverse()
        return commits

Filtered commits are then grouped intelligently using a file dependency algorithm.

    from collections import deque


    def group_commits(commits):
        """
        Creates groups of commits based on file dependency algorithm
        """

        # List of groups
        # Each group is a list of commits
        # In list, commits will be in the order they arrived
        groups_of_commits = []

        # Visited commits
        visited = {}

        # Store order of commits in which they arrived
        # Will be used later to sort commits inside each group
        for i, commit in enumerate(commits):
            commit['index'] = i

        # Loop over commits
        for commit in commits:
            queue = deque()

            # This may be one of the group in groups_of commits,
            # if not empty in the end
            commits_group = []

            commit_visited = visited.get(commit['raw_node'], None)
            if not commit_visited:
                queue.append(commit)

            while len(queue):
                c = queue.popleft()
                visited[c['raw_node']] = True
                commits_group.append(c)
                dependent_commits = get_dependent_commits_of(c, commits)

                for dep_commit in dependent_commits:
                    commit_visited = visited.get(dep_commit['raw_node'], None)
                    if not commit_visited:
                        queue.append(dep_commit)
            
            if commits_group:
                # Remove duplicates, keeping the first occurrence
                seen_nodes = set()
                unique_commits = []
                for c in commits_group:
                    if c['node'] not in seen_nodes:
                        seen_nodes.add(c['node'])
                        unique_commits.append(c)

                # Sort list using index key set earlier
                unique_commits.sort(key=lambda k: k['index'])
                groups_of_commits.append(unique_commits)

        return groups_of_commits
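
The grouping code calls get_dependent_commits_of, which is not shown in this post. The sketch below is one plausible version, not the actual toolchain code: it treats two commits as dependent when they touch at least one common file. It assumes each commit dictionary has a files list whose entries carry the path under a file key; that shape and the changed_paths helper are assumptions made for illustration.

```python
def changed_paths(commit):
    """Return the set of file paths a commit touches (assumed payload shape)."""
    return {entry["file"] for entry in commit.get("files", [])}


def get_dependent_commits_of(commit, commits):
    """Return the other commits that share at least one file with `commit`."""
    paths = changed_paths(commit)
    return [
        other for other in commits
        if other["raw_node"] != commit["raw_node"]
        and paths & changed_paths(other)
    ]


# Hypothetical commits: a and b both touch app/views.py, c is unrelated.
a = {"raw_node": "a1", "files": [{"file": "app/views.py", "type": "modified"}]}
b = {"raw_node": "b2", "files": [{"file": "app/views.py", "type": "modified"},
                                 {"file": "app/urls.py", "type": "modified"}]}
c = {"raw_node": "c3", "files": [{"file": "static/site.css", "type": "added"}]}

dependents = get_dependent_commits_of(a, [a, b, c])
print([d["raw_node"] for d in dependents])
```

With this definition, the breadth-first walk in group_commits would put a and b in one group and c in another.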

The top commit of each group is sent to the integration test server for testing via RabbitMQ. At first, I wrote code that sent every commit for testing, but it was too slow. So Vivek suggested that I group the commits from the payload and run the tests on the top commit of each group, which drastically reduced the number of test runs.
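
A rough sketch of that step, under two assumptions: the top commit is taken to be the last one in each group (groups are sorted in arrival order), and publish is a stand-in callable for whatever RabbitMQ client code actually sends the message. The names build_test_jobs and send_for_testing, and the message fields, are hypothetical.

```python
import json


def build_test_jobs(groups):
    """Build one JSON message per group, describing only its top commit."""
    jobs = []
    for group in groups:
        if not group:
            continue
        top = group[-1]  # assumption: the newest commit is last in the group
        jobs.append(json.dumps({
            "raw_node": top["raw_node"],
            "group": [c["raw_node"] for c in group],
        }, sort_keys=True))
    return jobs


def send_for_testing(groups, publish):
    """Hand each job to `publish`, a stand-in for the real queue client."""
    jobs = build_test_jobs(groups)
    for job in jobs:
        publish(job)
    return len(jobs)


# Three commits in two groups lead to two test runs instead of three.
sent = []
groups = [[{"raw_node": "a"}, {"raw_node": "b"}], [{"raw_node": "c"}]]
count = send_for_testing(groups, sent.append)
print(count)
```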

Integration tests run on the integration test server, a simulated setup that replicates production behavior. Tests run on a separate branch called test, onto which commits are cherry-picked from master. If the tests pass, the commits are put in a release queue, from which they are released to production. Otherwise, the test branch is rolled back to the previous stable commit and clean-up actions are performed, including notifying the developer whose commits failed the tests.
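
The cherry-pick and rollback flow can be sketched as follows. This is an illustration only, not the toolchain's code: integrate_commits and git are hypothetical names, the git runner and the test runner are passed in as parameters, and a conflict during cherry-pick is not handled.

```python
import subprocess


def git(args, cwd="."):
    """Run one git command in `cwd`, raising if it fails."""
    subprocess.check_call(["git"] + list(args), cwd=cwd)


def integrate_commits(commits, stable_commit, run_git, run_tests):
    """Cherry-pick commits onto the test branch; roll back if tests fail.

    run_git(args) runs one git command and run_tests() returns True on
    success. Both are injected so the flow can be tried without a repository.
    """
    run_git(["checkout", "test"])
    for sha in commits:
        run_git(["cherry-pick", sha])
    if run_tests():
        return True
    # Tests failed: restore the last known stable commit.
    run_git(["reset", "--hard", stable_commit])
    return False


# Dry run with a fake git runner that only records the commands.
recorded = []
passed = integrate_commits(["c1", "c2"], "s0", recorded.append, lambda: False)
print(passed, recorded[-1])
```

In a real setup, git would be passed as run_git, and a True result would lead to the commits being queued for release.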

Git Branch Model

We have been using three branches: master, test, and release. Developers push code to master, so this branch can be unstable. The test branch is for the integration test server and the release branch is for the production server. The release and test branches move in parallel, and they are always stable. As we write more tests, the chance of a bad commit being deployed to production will keep shrinking.

Django Models

Each commit (or revision) is stored in the database. This data is helpful in many situations, such as finding previously failed commits, relating commits to each other using the file dependency algorithm, and monitoring deployment.

Following are the Django models used:

* Revision: commithash, commitauthor, etc.
* Revision Status: revisionid, testpassed, deployedonproduction, etc.
* Revision Files: revisionid, filepath
* Revision Dependencies

When the top commit of each group is passed to the integration test server, we first find its dependencies, that is, previously failed commits related to it by the file dependency algorithm, and save them in the Revision Dependencies model so that we can query them directly from the database the next time.

    from collections import deque
    from operator import attrgetter

    # Revision is the Django model listed above.


    def get_dependencies(revision_obj):
        """
        Collect the previously failed revisions that revision_obj depends on
        """
        dependencies = set()
        visited = {}

        queue = deque()
        filter_id = revision_obj.id
        queue.append(revision_obj)

        while len(queue):
            rev = queue.popleft()
            visited[rev.id] = True
            dependencies.add(rev)
            dependent_revs = get_all_dependent_revs(rev, filter_id)

            for dep_rev in dependent_revs:
                if not visited.get(dep_rev.id, None):
                    queue.append(dep_rev)

        # Remove the revision from its own dependencies set.
        dependencies.remove(revision_obj)
        return sorted(dependencies, key=attrgetter('id'))


    def get_all_dependent_revs(rev, filter_id):
        # Reuse the dependencies already saved for this revision, if any.
        deps = rev.health_dependency.all()
        if len(deps) > 0:
            return deps

        files_in_rev = [f.filepath for f in rev.files.all()]

        # Older revisions that touched the same files and failed the tests.
        reqd_revisions = Revision.objects.filter(
            files__filepath__in=files_in_rev,
            id__lt=filter_id,
            status__health_status=False)
        return reqd_revisions

As described in the CDS Model section, these commits are then cherry-picked onto the test branch from the master branch, and the process continues.

Deploying to Production

Commits that passed integration tests are now ready to be deployed. There are a few things to consider when deploying code to production, such as restarting the webserver, deploying static files, and running database migrations. The toolchain code decides which servers to restart, whether to collect static files or run database migrations, and which servers to deploy on, based on the types and categories of files changed or deleted in the commits to be released.
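
A decision of this kind can be sketched as a function from changed file paths to a set of deployment actions. The rules below (what counts as a static file, a migration, or a change that needs a restart) are assumptions chosen for illustration, and plan_deploy is a hypothetical name. The real toolchain rules are not shown in this post, and choosing which servers to deploy on is left out.

```python
import posixpath

# Assumed set of extensions treated as static assets.
STATIC_EXTENSIONS = {".css", ".js", ".png", ".jpg", ".svg"}


def plan_deploy(changed_files):
    """Decide which deployment steps a set of changed paths calls for."""
    plan = {"collect_static": False, "migrate": False, "restart": False}
    for path in changed_files:
        parts = path.split("/")
        ext = posixpath.splitext(path)[1].lower()
        if "static" in parts or ext in STATIC_EXTENSIONS:
            plan["collect_static"] = True
        if "migrations" in parts and ext == ".py":
            plan["migrate"] = True
        if ext == ".py":
            # Assumption: any Python change needs a webserver restart.
            plan["restart"] = True
    return plan


plan = plan_deploy(["static/css/site.css", "app/migrations/0002_add_field.py"])
print(plan)
```

A push that only touches stylesheets would then skip both the migration and the restart.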

You might also have noted that we control deployment to the production and test servers from the toolchain server (the one that receives the payload from Bitbucket). We use Fabric to achieve this. It is a great tool for executing remote administrative tasks!

    from fabric.api import run, env, task, execute, parallel, sudo


    @task
    def deploy_prod(config, **kwargs):
        """
        Deploy code on production servers.
        """

        revision = kwargs['revision']
        commits_to_release = kwargs['commits_to_release']

        # Look up the stored Revision for each commit hash to be released
        revisions = [Revision.objects.get(raw_node=commit)
                     for commit in commits_to_release]

        # Deploy static files first; continue with the default deploy
        # only if that step returned True
        result = init_deploy_static(revision, revisions=revisions, config=config,
                                    commits_to_release=commits_to_release)
        is_restart_required = toolchain.deploy_utils.is_restart_required(revisions)
        if result is True:
            init_deploy_default(config=config, restart=is_restart_required)

The whole process takes about 2 minutes to deploy a group of commits, or a single push, on all machines. Our life is a lot easier: we no longer worry about pushing our code, and we can see a feature or bug fix live in production within a few minutes. This also helps us release new features without wasting much time. Deploying is now as simple as writing code and testing it on a local machine. A few days ago, we deployed our hundredth commit to production using automated deployment, which stands testimony to the robustness of this system.

P.S. I am an undergraduate student at IIT-Roorkee. You can find me @LalitKhattar.

This post was originally written for the HackerEarth Engineering blog by Lalit Khattar, Summer Intern 2013 @HackerEarth
