Code to Learn

don't learn to code

Let's Solve Hard Problems

“The best minds of my generation are thinking about how to get people to click on ads. That sucks.” - Jeff Hammerbacher
This quote aptly points at the wrong problems we keep trying to solve. Granted, those problems needed solving, money is being made from them, and they will get solved. But solving them does not make the world a better place. Let's dare, double-dare ourselves, to look at the hard problems that will really make an impact on how we live the rest of our lives. Here is a video of Steve Yegge’s OSCON 2011 talk, where he puts the same thoughts much better. [youtube=http://www.youtube.com/watch?v=vKmQW_Nkfk8]   Cheers,

Project Idea - Visual Thesaurus-like Application

Complexity: Tough

Project idea description: The idea is to build a mind map of words (a thesaurus). There is a website called Visual Thesaurus that provides a commercial application for visualizing the links among words. For example, using the notation "->" to mean "relates to":
friend -> pal
friend -> neighbour
Visual Thesaurus displays a linked graph of colored nodes, where each node is a word and the edges have various lengths. It provides an interactive interface to view a word’s meanings and more.

What to use to accomplish the task? There is a project called WordNet, which provides a database of English words and their relationships with other words: http://wordnet.princeton.edu/ WordNet will serve as the database to be explored.

Next comes the visualization part. You can use any visualization technology, and the choice will make the project somewhat easier or tougher. The ones I know of, in order of complexity, are:
  • OpenGL - http://www.opengl.org/
  • Coin3D (based on OpenGL) - http://doc.coin3d.org/SoQt/
  • HTML5 Canvas - http://diveintohtml5.org/canvas.html#divingin
There are definitely more of these. I have listed only the ones I am aware of.

How to do it? It's up to you how to design it and which technology to select. Comments and suggestions are welcome.
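To make the WordNet-plus-visualization idea a little more concrete, here is a minimal Python sketch of the data side only. It turns word relations into a graph and picks out the words within a few hops of a selected word, which is roughly what a viewer would draw around it. It assumes the relations have already been exported from WordNet into a plain dictionary. The sample words and the helper names (`build_graph`, `neighbourhood`, `edges_for_view`) are hypothetical, and nothing here uses the actual WordNet API or any of the rendering libraries listed above.

```python
from collections import deque

# Hypothetical sample relations; in a real project these would be
# exported from the WordNet database rather than typed in by hand.
RELATIONS = {
    "friend": ["pal", "neighbour", "ally"],
    "pal": ["friend", "buddy"],
    "neighbour": ["friend"],
    "ally": ["friend", "partner"],
    "buddy": ["pal"],
    "partner": ["ally"],
}


def build_graph(relations):
    """Turn word -> related-words lists into an undirected adjacency map."""
    graph = {}
    for word, related in relations.items():
        for other in related:
            graph.setdefault(word, set()).add(other)
            graph.setdefault(other, set()).add(word)
    return graph


def neighbourhood(graph, start, max_depth):
    """Breadth-first search: return {word: hop distance} for every word
    reachable from `start` in at most `max_depth` hops."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        word = queue.popleft()
        if distances[word] == max_depth:
            continue  # do not expand beyond the requested depth
        for other in sorted(graph.get(word, ())):
            if other not in distances:
                distances[other] = distances[word] + 1
                queue.append(other)
    return distances


def edges_for_view(graph, words):
    """Edges among the visible words, each listed once, ready for a renderer."""
    return sorted(
        {tuple(sorted((a, b))) for a in words for b in graph.get(a, ()) if b in words}
    )


if __name__ == "__main__":
    graph = build_graph(RELATIONS)
    view = neighbourhood(graph, "friend", max_depth=1)
    print(view)
    print(edges_for_view(graph, view))
```

Here `neighbourhood(graph, "friend", 1)` returns `{'friend': 0, 'ally': 1, 'neighbour': 1, 'pal': 1}`. The hop distance is one possible input for deciding edge lengths or node placement in whatever drawing technology you pick.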

Let's Search the Future

Yes, search is the Achilles' heel of an engineer. Most problem solving begins with searching a dataset. Early this decade, an engineer named Doug Cutting dreamt of a complete open-source search infrastructure, and in pursuing it he created a family of Java projects. But the work is far from complete. It is an ongoing effort, and if you work on any of these projects you will gain immense knowledge in the field. Search has three main parts: crawling the web, creating an index of the crawled data, and searching that index for a given query. The following two Apache projects handle this functionality.
  • Apache Nutch: This is the web crawler. Crawling sounds simple, but given the vastness of the internet and the hundreds of different factors involved, such as web servers, content, and backlinking, it is a huge task. You can play around with Nutch and implement some cool new feature. Here is the JIRA. You can also use Nutch to build a specialized search engine. For example, Twitter search is broken. It sucks. Write a tweet search engine that sucks less.
  • Apache Lucene: Lucene is the best collection of information retrieval algorithms. It is used for indexing information and retrieving results. Check out the JIRA and see if something interests you. Lucene also has a subproject called Solr, a complete search server that makes its Lucene index available through an HTTP API. Many of us have worked on Lucene and should be able to guide you in case you hit any blockers.
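The crawl, index, and search split described above can be illustrated with a toy inverted index in Python. This sketches only the idea behind an index. It is not how Lucene is implemented, and it does not use Lucene's API. The `documents` dictionary stands in for pages a crawler such as Nutch might fetch, the sample texts are made up, and queries are plain boolean AND with no ranking.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Inverted index: map each term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    # .get() avoids inserting empty entries into the defaultdict
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical "crawled" pages keyed by document id.
documents = {
    1: "Cassandra is a distributed database",
    2: "Dynamo is a highly available key-value store",
    3: "Lucene builds an inverted index",
}

if __name__ == "__main__":
    index = build_index(documents)
    print(sorted(search(index, "distributed database")))
    print(sorted(search(index, "is")))
```

A query only touches the posting sets of its own terms instead of scanning every document, which is why indexing pays off. A real engine adds much more on top of this, such as relevance scoring, stemming, and on-disk index structures.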
One indexing and search project that a friend of mine suggested long ago was indexing all the research papers (download here) and building a recommendation engine on top. For example, if I search for the Cassandra paper, the results should also include papers related to Amazon Dynamo. FYI, these two links are awesome papers on distributed systems. I am saving distributed systems for last ;-) Cheers,
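A very rough way to start on the "related papers" idea is to compare papers by the words they share. The Python sketch below ranks the other papers by cosine similarity of their term-frequency vectors. The abstracts are invented placeholders, not text from the real Cassandra and Dynamo papers. Plain term counts are also a simplification. A real recommender would at least weight terms (for example with TF-IDF) and drop stop words.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Term-frequency vector: how often each lower-cased word occurs."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def related(papers, title, top_n=2):
    """Titles of the papers most similar to `title`, best first; zero scores are dropped."""
    target = term_vector(papers[title])
    scores = [
        (cosine(target, term_vector(text)), other)
        for other, text in papers.items()
        if other != title
    ]
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for score, other in scores[:top_n] if score > 0]


# Hypothetical placeholder abstracts, keyed by paper title.
papers = {
    "Cassandra": "a decentralized structured storage system built on a distributed key value store with eventual consistency",
    "Dynamo": "amazon highly available key value store with eventual consistency and replication",
    "MapReduce": "simplified data processing on large clusters",
}

if __name__ == "__main__":
    print(related(papers, "Cassandra"))
```

With these placeholders, searching for "Cassandra" ranks "Dynamo" first because the two share terms like "key", "value", and "store". That is the behavior the project idea asks for, though real abstracts would need much better text processing.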

Apache Mahout

In the next few posts I am going to suggest some ideas that can be picked up as projects, and the first in the series is Apache Mahout. The Apache Software Foundation is probably the most important open-source foundation[1], and I love to quote "nobody gets fired for using Apache libraries". Apache Mahout is an open-source project to implement all the major machine learning algorithms in Java. It covers algorithms for recommendations, classification, clustering, and many more areas I am not aware of. These algorithms are highly optimized and often run on multiple nodes rather than a single server, using the map/reduce paradigm (more on this later). Many companies, including Adobe, Amazon, Foursquare, and Yahoo, use Mahout in their production systems. If you are interested in machine learning and algorithms, it is a very good project to start working on. The idea is to pick an algorithm and implement it along with proper tests and maybe a demo program. Here is a list of algorithms planned to be implemented. If you are still reading, I assume you are interested in exploring this a little more. Great!! Start by getting an idea of what machine learning algorithms are: read Collective Intelligence first. In the meantime, decide which algorithm you want to implement and join the Mahout mailing list. And yes, you can contact me. Cheers,
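Recommendations are one of the areas Mahout covers, so here is a small Python sketch of user-based collaborative filtering, one classic recommendation technique. It finds users whose ratings resemble yours and suggests items they rated that you haven't seen. This is not Mahout's API or implementation, since Mahout's recommenders are written in Java. The ratings, user names, and function names are hypothetical. Similarity here is cosine over co-rated items, which is only one of several possible choices. Pearson correlation is another common one.

```python
import math

# Hypothetical user -> {item: rating} data.
ratings = {
    "alice": {"a": 5, "b": 4, "c": 5},
    "bob": {"a": 5, "b": 4, "d": 5},
    "carol": {"a": 1, "c": 5, "e": 2},
}


def similarity(ratings, u, v):
    """Cosine similarity between users u and v over the items both rated."""
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0
    dot = sum(ratings[u][i] * ratings[v][i] for i in common)
    norm_u = math.sqrt(sum(ratings[u][i] ** 2 for i in common))
    norm_v = math.sqrt(sum(ratings[v][i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def recommend(ratings, user, top_n=3):
    """Rank items `user` has not rated by the similarity-weighted
    average rating they received from other users."""
    totals, weights = {}, {}
    for other in ratings:
        if other == user:
            continue
        sim = similarity(ratings, user, other)
        if sim <= 0:
            continue  # ignore users with no useful overlap
        for item, rating in ratings[other].items():
            if item in ratings[user]:
                continue  # only recommend unseen items
            totals[item] = totals.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    scores = {item: totals[item] / weights[item] for item in totals}
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]


if __name__ == "__main__":
    print(recommend(ratings, "alice"))
```

Cosine over only the co-rated items ignores how many items two users actually share, so a single common item can produce a perfect score. This is one reason real libraries, Mahout included, offer several similarity measures to choose from.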

One for Everything

I thought I would put my thoughts together on what we are trying to achieve and what we are NOT trying to achieve. Basically, this is about setting the right expectations. These are initial thoughts and are bound to change over time as we learn how to fit together. In simple words, we just want to tell you that while you are doing your engineering, we are here to help in case you need advice on technology, career guidance, or anything else. We are focusing on final year projects right now because we believe that, as an undergraduate, your ability to write clean code is very important. But I want to make it clear: this is not a spoon-feeding place. We will give you the right pointers and show you the direction, but you will have to lead the way and shine. In many cases we won't have immediate answers to your questions. We can explore together and learn, and that is our benefit and the reason we are making this effort. All the Google groups, GitHub repos, blogs, etc. might make you think: why so much hoopla, this is just a minor project, no big deal. But we think differently. These projects will go on your resume, and you should feel good when you talk about them. My final year project was the second best thing I did during my engineering (the first, of course, being engineering itself B-) ).
  • Google Group: This mailing list is the main communication channel. If you have any question or blocker, ask on the group and hopefully someone will answer it.
  • GitHub repo: Each of you should create a GitHub account, put your code in a repo, and maybe share it on codetolearn. I highly encourage reading other people's code, learning the good parts, and leaving a comment if you find a bug or a better way of doing the same thing.
  • Blog: We thought of a central blog or wiki for documentation purposes. But having a personal blog and sharing your design decisions there will help you get into the practice of putting thoughts on paper.
A personal GitHub repo and a personal blog will also give you a better record of the things you did in your college days. But none of this is mandatory. We highly encourage it, but there is no rule that I will not answer your query if you don't have a blog or something. Personal opinions about technologies are welcome. You think Tumblr is the best blogging platform? Use it. You find SVN easier to operate? Use SVN. One more thing I want to emphasize is openness. Please share your ideas. Don't think anyone else will copy them. Implementing ideas is hard, especially if they are not yours. But sharing gives you a way to get feedback early. More on this later. Let the coding begin.

How It Started?

Let's get to the point. How did it start? Asif said: When I was in college, I was mostly clueless about what goes on in industry and how programmers write code in a company. Now, after 4 years of work, I have learnt some of it, and naturally, since you are my fellow Jamians, I want to share that with you. We can discuss here how and when in full detail. To begin with, I want to help you with your final year projects and any other project work you want to take on. We can discuss individually and in groups as and when you need advice. from Saleem Ansari and me. So we had a discussion and came up with a plan for putting the infrastructure bits together. Basically we need these things:
(1) a communication channel or mailing list
(2) a place where all the code can be hosted
(3) a place where students will post their updates
So here is the plan. We are creating:
(1) a Google group - http://groups.google.com/group/codetolearn?hl=en
(2) a GitHub repository - https://github.com/codetolearn
(3) a WordPress blog - http://codetolearn.wordpress.com
Additionally, every student will maintain his/her own blog, and we will aggregate everything in a single place. We have decided to use the http://jmilug.org website for aggregation. That's it for now.