Exploring “Elasticsearch”

What is Elasticsearch?

Elasticsearch is a platform for running different kinds of searches and for analyzing data in real time. It helps organize data and make it easily accessible. In some ways it resembles a database management system: an Elasticsearch cluster contains multiple indices (roughly, databases), each of which contains multiple types (tables). A type holds multiple documents (rows), and each document has properties (columns). (Mapping types belong to older Elasticsearch versions; they were deprecated in 6.x and later removed.) Elasticsearch exposes a RESTful API. That is, you can use HTTP methods (GET, POST, PUT, DELETE, etc.) in combination with an HTTP URL to manipulate your data.
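
To make the REST idea concrete, here is a minimal Python sketch that only builds (and does not send) a request for storing one document. The host `localhost:9200`, the index name `mail`, the type name `message` and the document fields are all assumptions for illustration, and the `/index/type/id` URL layout matches older Elasticsearch releases, so it may differ in the version you run.

```python
import json
from urllib.request import Request


def build_index_request(host, index, doc_type, doc_id, document):
    """Build (but do not send) a PUT request that would store one document."""
    url = "http://{}/{}/{}/{}".format(host, index, doc_type, doc_id)
    body = json.dumps(document).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical values, for illustration only.
request = build_index_request(
    "localhost:9200", "mail", "message", "1", {"subject": "hello"}
)
print(request.get_method(), request.full_url)
```

Passing the request object to `urllib.request.urlopen` would actually send it to a running node; swapping the method for GET or DELETE on the same URL would read or remove the document.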

Status of tasks

So, moving back to my assigned tasks, I was working with Elasticsearch indices. I tried out many sample examples, but when I started using my own data, I got stuck with lots of errors. All my time was being lost in debugging. That is the saddest part of my progress 😦

I was stuck with one task for many days. But my mentors were very friendly. When I mentioned my issue in the last meeting, they readily agreed to help me anytime. That gave me great confidence. Now I’m trying out the options mentioned by them.


After installing Elasticsearch and Kibana, I started adding data to Elasticsearch. My task was to add the raw Perceval output and the threading info to Elasticsearch. To put data into Elasticsearch, we first need to create an index. Following the tutorial, I was using curl to add the data. According to the tutorial, a large JSON file has to be added through the Bulk API, so each document needs a header (action) line before it, as shown in the documentation. After adding the header lines, you execute the following command to upload the file.

curl -XPOST 'yourhost:9200/jsonfilename/_bulk?pretty' --data-binary @jsonfilename.json

At this point I was encountering errors.
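
For reference, the "header line" step can be sketched in Python. This is only a minimal sketch of the bulk format as I understand it: every document is preceded by an action line, one JSON object per line, and the body ends with a newline. The function name `to_bulk_body`, the index name and the sample documents are assumptions for illustration, and older Elasticsearch versions may also expect a `_type` entry in the action line.

```python
import json


def to_bulk_body(documents, index_name=None):
    """Turn a list of dicts into newline-delimited JSON for the Bulk API."""
    lines = []
    for doc in documents:
        # The action ("header") line says what to do with the next line.
        action = {"index": {}}
        if index_name is not None:
            action["index"]["_index"] = index_name
        lines.append(json.dumps(action))
        # The document itself must fit on a single line.
        lines.append(json.dumps(doc))
    # Every line, including the last one, is terminated by a newline.
    return "".join(line + "\n" for line in lines)


# Hypothetical documents, for illustration only.
body = to_bulk_body(
    [{"subject": "hello"}, {"subject": "Re: hello"}], index_name="xen-devel"
)
print(body, end="")
```

Writing `body` to a file gives something that can be passed to curl with `--data-binary`, which keeps the newlines intact.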


So now, as suggested by my mentors, I am learning to use Sense to add data to Elasticsearch.

Few more tasks

As platforms like Elasticsearch and Kibana are all new to me, it's taking a long time for me to learn them and get started. But with the help of my mentors, I have started sorting out the issues, and I'm now trying different ways to resolve them. In the last meeting a few more tasks were assigned:

  1. Start working with the matching script
  2. Description of the process to run Perceval

Along with these, I still have the remaining tasks to complete, as I'm moving on very slowly.




Working with Kibana

One week is over with my Outreachy project. A few tasks are completed and a few new tasks have been assigned for next week 🙂

As per last week’s tasks from my mentors, I got an introduction to Kibana and ElasticSearch, which I will be working with later on. It was nice setting up a demo dashboard in Kibana.

What is Kibana???

Kibana is a flexible analytics and visualization platform. It helps in instant sharing and embedding of dashboards. It also helps in understanding large volumes of data easily by creating bar charts, pie charts etc. We can easily create, save, share, and embed visualized data for quick communications.

Apart from this, I also completed the testing for threading code. The testing code is available in my github repository.

With these two tasks done, a few more tasks were assigned in today’s meeting.

As quoted by my mentors, tasks for the next week:

  1. Write a script that uploads both the raw Perceval output on an mbox, and the threading info, to ElasticSearch
  2. Write a script that uploads the Perceval output on a git repo to ElasticSearch
  3. Produce a simple dashboard for either the git or the mbox info (or both) in Kibana.
  4. Analyze the git for the Linux kernel with Perceval, and upload results to ElasticSearch.

So here goes today’s meeting’s complete conversation.

Getting Started with Outreachy Project

Happy that I got a good start with my project!

My project started with a short meeting with my mentors on IRC. One of my mentors briefed me on the whole project: what I will be doing and how. Even though it was a short meeting, I got lots of details about my project. Thanks to my mentors!

The conversations that took place are recorded here: Project-conversations.

So here goes the outline of my Outreachy project as quoted by my mentors:

  • Using Perceval, get the metainformation for some mboxes and from commit records in git repositories.
  • Thread the metainformation about messages, and match threads to commits.
  • Once the matching is complete, produce a combined set of information, with the thread id, the commit hash, and some info about both the thread and the commit, like the thread subject, time span, participants, commit author, date, length.
  • Upload the raw information to ElasticSearch, and then when the combined info is produced, upload that also to Elasticsearch and finally produce a dashboard for kibana.
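
As a rough illustration of the "combined set of information" in the outline above, one record per matched thread and commit might be assembled like this. It is only a sketch under assumptions: the input field names (`subject`, `from`, `date`, `hash`, `author`) and the function name `combined_record` are hypothetical, the matching itself is taken as already done, and the "length" item is left out because the outline does not say what it measures.

```python
from datetime import datetime


def combined_record(thread_id, messages, commit):
    """Summarize one thread and its matched commit in a single flat dict."""
    dates = [m["date"] for m in messages]
    first = min(messages, key=lambda m: m["date"])
    return {
        "thread_id": thread_id,
        "thread_subject": first["subject"],
        # Time span of the discussion, as ISO 8601 strings.
        "thread_start": min(dates).isoformat(),
        "thread_end": max(dates).isoformat(),
        "participants": sorted({m["from"] for m in messages}),
        "commit_hash": commit["hash"],
        "commit_author": commit["author"],
        "commit_date": commit["date"].isoformat(),
    }


# Hypothetical data, for illustration only.
messages = [
    {"subject": "[PATCH] fix", "from": "alice@example.org",
     "date": datetime(2016, 1, 1)},
    {"subject": "Re: [PATCH] fix", "from": "bob@example.org",
     "date": datetime(2016, 1, 3)},
]
commit = {"hash": "abc123", "author": "alice@example.org",
          "date": datetime(2016, 1, 5)}
record = combined_record("<1@example.org>", messages, commit)
print(record)
```

A flat dict like this serializes directly to JSON, which is convenient when the record is later uploaded as a document.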

I have started by setting up ElasticSearch and Kibana and I’m working in parallel on other tasks given. Hope to complete it soon.

My experience with Outreachy’12 with the Xen Project

I feel happy to have been chosen as an intern for Outreachy Program for women. It was a wonderful experience. Thanks to the Xen project!

I didn't have much of a plan to work for and get selected for Outreachy, but it was through the FOSS club in my University that I came to know about Outreachy, and then the Xen project.

The project I am working on is titled ‘Xen Code Review Dashboard’. I was nearing the deadline when I came to know about this project, but that turned out not to be an issue at all. The Xen organization is very flexible and comfortable to work with. If you have submitted a reasonably good proposal before the deadline and worked on the microtasks given, your chances are high. Organizations like Xen evaluate applicants by assigning certain micro-tasks and looking at how you approach them.

I submitted my application a few hours before the deadline and contacted the mentors regarding the details of the project. The mentors of Xen were very helpful. They were readily available whenever I got stuck with my work, and they always gave a pleasant response without delay. That gave me great confidence.

Let me quote the microtask which I received from my mentors:

Write a script to use the Perceval E-mail backend to feed data from the xen-devel mailing list to an ElasticSearch database and annotating in it the messages in the same thread.

Basically, I used the jwzthreading algorithm to group messages into threads. First, I ran Perceval over the xen-devel mailbox and got a JSON output. Then the mbox mails were given as input to the jwzthreading algorithm, which produces lists of message-ids belonging to the same thread. Based on these message-ids, a property tag was added to the JSON file to identify messages from the same thread. For the complete code, you can check my github repository.
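
The tagging step can be illustrated with a much simpler stand-in. The sketch below is not the jwzthreading algorithm and not my actual script: it only follows `In-Reply-To` links up to the first message of a conversation and stores that message-id as the tag. The field names `Message-ID` and `In-Reply-To` and the `thread_id` key are assumptions about how the messages look once loaded as dicts.

```python
def annotate_threads(messages):
    """Add a 'thread_id' key to each message dict, in place."""
    # Map every message-id to the id of the message it replies to, if any.
    parent = {m["Message-ID"]: m.get("In-Reply-To") for m in messages}

    def root_of(message_id):
        seen = set()
        while True:
            seen.add(message_id)
            reply_to = parent.get(message_id)
            # Stop at a message with no known parent; the 'seen' set
            # keeps a malformed reply loop from running forever.
            if reply_to is None or reply_to in seen:
                return message_id
            message_id = reply_to

    for m in messages:
        m["thread_id"] = root_of(m["Message-ID"])
    return messages


# Hypothetical messages, for illustration only.
mails = [
    {"Message-ID": "<1>"},
    {"Message-ID": "<2>", "In-Reply-To": "<1>"},
    {"Message-ID": "<3>", "In-Reply-To": "<2>"},
    {"Message-ID": "<4>"},
]
for mail in annotate_threads(mails):
    print(mail["Message-ID"], mail["thread_id"])
```

The real jwzthreading algorithm goes further, for example by using the `References` header and by handling missing messages and subject-based grouping, which this sketch ignores.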

Currently I am working on the testing part of this task, and I thank my mentors for providing me this wonderful opportunity and helping me throughout.


You might be wondering what this ‘Outreachy’ is. It is an internship program similar to Google Summer of Code that provides better opportunities for women and transgender people. Outreachy aims at empowering women. Like GSoC, the internship lasts 3 months, with a stipend of about $5500. The goal of Outreachy is to “create a positive feedback loop” that supports more women participating in free and open source software. One main difference from GSoC is that Outreachy is intended to be compatible with student schedules.

How to apply???

First of all, you need to choose an organization you wish to work for. Then, keeping the deadline in mind, you have to come up with a good proposal. An interesting fact about certain organizations in Outreachy is that they may not judge your caliber from your proposal alone. Instead, they will give you certain micro-tasks and evaluate you on those. It depends upon the organization you choose.