“The best minds of my generation are thinking about how to get people to click on ads. That sucks.” Jeff Hammerbacher
This quote very aptly points to all the wrong problems we are trying to solve. Agreed, those are problems that needed to be solved, and there are bucks to be made out of them. And they will get solved. But they do not change the world into a better place. Let’s dare, double dare ourselves to look at those hard problems which will really make an impact on how we live the rest of our lives. Here is a video of Steve Yegge’s OSCON 2011 talk, where he puts the same thoughts in a much better way. [youtube=http://www.youtube.com/watch?v=vKmQW_Nkfk8] Cheers,
Project Idea - Visual Thesaurus-like Application
Complexity: Tough
Project Idea description:
The idea is to build a mind map of words (a thesaurus).
There is a website called Visual Thesaurus which provides a commercial
application that one can use to visualize the links among words.
For example:
Notation used below: "->" means "relates to"
friend -> pal
friend -> neighbour
Visual Thesaurus displays a linked graph of colored nodes, where
each node is a word, with edges of various lengths. It provides
an interactive interface to view a word’s meanings etc.
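To make the "words as nodes, relations as edges" idea concrete, here is a minimal sketch in Python. The word pairs are made-up sample data standing in for what WordNet would supply, and the names build_graph and words_within are hypothetical helpers of mine, not part of any library.

```python
from collections import defaultdict, deque

# Hypothetical sample relations; a real application would load these from WordNet.
RELATIONS = [
    ("friend", "pal"),
    ("friend", "neighbour"),
    ("pal", "buddy"),
]


def build_graph(relations):
    """Build an undirected adjacency map: word -> set of related words."""
    graph = defaultdict(set)
    for first, second in relations:
        graph[first].add(second)
        graph[second].add(first)
    return graph


def words_within(graph, start, max_hops):
    """Breadth-first search: return {word: hop count} for words within max_hops edges."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        word = queue.popleft()
        if distances[word] == max_hops:
            continue  # do not expand beyond the requested depth
        for related in sorted(graph.get(word, ())):
            if related not in distances:
                distances[related] = distances[word] + 1
                queue.append(related)
    return distances


graph = build_graph(RELATIONS)
print(words_within(graph, "friend", 2))
```

The hop count is one possible input for the visual side: words one hop away can be drawn close to the selected word, and words two hops away further out.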
What to use to accomplish the task?
There is a project called WordNet which provides a database of English
words and their relationships with other words.
http://wordnet.princeton.edu/
WordNet will serve as the database to be explored.
Next comes the visualization part. One can use any visualization
technology.
The selection of technology will make the project a bit easier or
tougher.
Ones that I know of are, in the order of complexity:
* OpenGL - http://www.opengl.org/
* Coin3D ( based on OpenGL ) - http://doc.coin3d.org/SoQt/
* HTML5 Canvas - http://diveintohtml5.org/canvas.html#divingin
There are definitely more of these. I have listed only those
which I am aware of.
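Whichever technology you pick, the renderer needs a position for every node. Below is a minimal sketch in Python, assuming the simplest possible layout: the selected word at the origin and its related words spread evenly on a circle around it. The function name radial_layout and the sample words are hypothetical; a real application would probably want a force-directed layout instead.

```python
import math


def radial_layout(centre, related_words, radius=1.0):
    """Return {word: (x, y)} with the centre word at the origin
    and the related words evenly spaced on a circle around it."""
    positions = {centre: (0.0, 0.0)}
    count = len(related_words)
    for i, word in enumerate(related_words):
        angle = 2 * math.pi * i / count
        positions[word] = (radius * math.cos(angle), radius * math.sin(angle))
    return positions


# Hypothetical sample data.
layout = radial_layout("friend", ["pal", "neighbour", "buddy", "mate"])
for word, (x, y) in layout.items():
    print(f"{word}: ({x:.2f}, {y:.2f})")
```

The resulting coordinates are plain numbers, so they can be fed to OpenGL, Coin3D or an HTML5 canvas alike.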
How to do it?
It’s up to you how to design it and which technology to select.
Comments and suggestions are welcome.
Let’s Search Future
Yeah, search is the Achilles heel of an engineer. Most problem solving begins with searching the dataset. Early this decade an engineer named Doug Cutting dreamt of a complete open source search infrastructure. In this quest he created a family of Java projects. But the work is far from complete. It’s an ongoing effort, and if you work on any of them you will gain immense knowledge in that field.
Search has three main parts: crawling the web, creating an index of the crawled data, and searching the index given a query. The following two Apache projects handle this functionality.
- Apache Nutch: This is the web crawler. Crawling sounds simple, but given the vastness of the internet, with a hundred different factors like web servers, content, backlinking etc., it’s a huge task. You can play around with Nutch and implement some cool new feature. Here is the JIRA. You can also use Nutch to create a specialized search engine. For example, Twitter search is broken. It sucks. Write a tweet search engine which sucks less.
- Apache Lucene: Lucene is the best collection of information retrieval algorithms. It is used for indexing the information and retrieving the results. Check out the JIRA and see if something interests you. Lucene also has a subproject called Solr, which is a complete search server: the index saved in it is available through an HTTP REST-like API, and it stores the index as Lucene index files on disk. Many of us have worked on Lucene and should be able to guide you in case you face any blockers.
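To get a feel for the "index" and "search" parts before diving into Lucene, here is a toy inverted index in Python. It is only a sketch of the general idea, not how Nutch or Lucene are implemented and not their API; the pages, URLs and function names are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical "crawled" pages: URL -> page text.
PAGES = {
    "http://example.com/a": "open source search engine",
    "http://example.com/b": "search the web with a crawler",
    "http://example.com/c": "machine learning in java",
}


def tokenize(text):
    """Very naive tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def build_index(pages):
    """Inverted index: term -> set of URLs whose text contains the term."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


def search(index, query):
    """Return the sorted URLs that contain every term of the query (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return []
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return sorted(results)


index = build_index(PAGES)
print(search(index, "search"))
print(search(index, "search crawler"))
```

A real engine adds a lot on top of this, such as proper text analysis, relevance ranking and an on-disk index format, and that is exactly where working on Lucene teaches you the most.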
Apache Mahout
In the next few posts I am going to suggest some ideas which can be picked up as projects. The first in the series is Apache Mahout. The Apache Software Foundation is probably the most important open source foundation[1]. And I love to quote ‘nobody gets fired for using Apache libraries’.
Apache Mahout is an open source project to implement all the major machine learning algorithms in Java. It covers algorithms for recommendation, classification, clustering and many more areas I am not aware of. These algorithms are highly optimized and often run on multiple nodes rather than a single server, using the map/reduce paradigm (more on this later). Many companies, including Adobe, Amazon, Foursquare and Yahoo, are using Mahout in their production systems.
If you are interested in machine learning and algorithms, it’s a very good project to start working on. The idea is to pick an algorithm and implement it along with proper tests and maybe a demo program. Here is a list of algorithms planned to be implemented.
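To give a flavour of what "pick an algorithm and implement it with a demo" can look like, here is a tiny user-based recommender in Python. It is a sketch under my own assumptions and has nothing to do with Mahout’s actual Java API: the ratings are invented, and cosine_similarity and recommend are hypothetical helper names.

```python
import math

# Hypothetical ratings: user -> {item: rating}.
RATINGS = {
    "alice": {"book_a": 5.0, "book_b": 3.0, "book_c": 4.0},
    "bob": {"book_a": 5.0, "book_b": 3.0, "book_d": 5.0},
    "carol": {"book_b": 5.0, "book_e": 4.0},
}


def cosine_similarity(a, b):
    """Cosine similarity between two sparse rating vectors (dicts)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[item] * b[item] for item in shared)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b)


def recommend(ratings, user, top_n=3):
    """Rank items the user has not rated by the similarity-weighted ratings of other users."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        similarity = cosine_similarity(ratings[user], other_ratings)
        if similarity <= 0:
            continue
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + similarity * rating
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_n]]


print(recommend(RATINGS, "alice"))
```

Here bob’s taste is closest to alice’s, so the item only he has rated ends up on top. A production implementation like Mahout’s has to do the same kind of computation over millions of users, which is where map/reduce comes in.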
If you are still reading, I assume you are interested in exploring this a little more. Great!!
You can start by getting an idea of what machine learning algorithms are. Read Collective Intelligence first. In the meantime, decide which algorithm you want to implement. Join the Mahout mailing list. And yes, you can contact me.
Cheers,
One for Everything
I thought I would try to put my thoughts together on what we are trying to achieve and what we are NOT trying to achieve; basically, setting the expectations right. These are initial thoughts and bound to change over time as we learn how to fit together.
In simple words, we just want to tell you that while you are doing your engineering, we are here to help in case you need advice on technology, career guidance or anything else. The reason we are focusing on final year projects right now is that we believe that, as an undergraduate, your ability to write clean code is very important.
But I want to make it clear: it’s not a spoon-feeding place. We will help you with the right pointers and show you the direction, but you will have to lead the way and shine. In many cases, we won’t have immediate answers to your questions. We can explore together and learn; that’s our benefit, and the reason we are making this effort.
All the Google groups, GitHub repos, blogs etc. might make you wonder why so much hoopla; this is just a minor project, no big deal. But we think differently. These projects will go on your resume and you should feel good when you talk about them. My final year project was the second best thing I did during my engineering (the first, of course, being engineering itself B-) ).
- Google Group: This mailing list is the main communication channel. If you have any question or blocker, ask on the group and hopefully someone will answer it.
- GitHub repo: Each of you should create a GitHub account, put your code in the repo and perhaps share it on codetolearn. I would highly encourage reading other people’s code, learning the good parts and leaving a comment if you find a bug or a better way of doing the same thing.
- Blog: We thought of a central blog or wiki for documentation purposes. But having a personal blog and sharing your design decisions there will help you get started with the practice of putting thoughts on paper.
How It Started?
Let’s get to the point.
How did it start?
Asif said:
When I was in college, I was mostly clueless about what goes on in industry and how programmers write code in a company. Now, after 4 years of work, I have learnt some part of it, and naturally, you guys being my fellow Jamians, I want to share that with you. We can discuss here how and when in full detail. To begin with, I want to help you with your final year projects and any other project work you want to take on. We can discuss individually and in groups, as and when you need advice from Saleem Ansari and me.
So we had a discussion and thought of a plan for putting the infrastructure bits together. Basically we need these things:
(1) a communication channel or a mailing list
(2) a place where all the code can be hosted
(3) a place where students will put their updates
So here is the plan.
We are creating
(1) a google group - http://groups.google.com/group/codetolearn?hl=en
(2) a github repository - https://github.com/codetolearn
(3) a wordpress blog - http://codetolearn.wordpress.com
Additionally, every student will maintain his/her own blog and we will aggregate everything in a single place. We have decided to use the http://jmilug.org website for aggregation.
That’s it for now.