Google Summer of Code 2010 @ Mahout

Posted by Robin Anil | Posted in gsoc, mahout | Posted on 20-03-2010

It’s a new year and a new Google Summer of Code. I will be mentoring this year at Mahout along with Grant and Ted. If you are a student interested in participating, this is what you should do:

  1. Instead of using a wiki like last year, this year we are putting up ideas and proposals as JIRA issues here. If you are looking for ideas for your proposal, take a look at the JIRA link and explore the things you would like to implement. If you have an idea of your own, proceed to step 2.
  2. Once you have identified a topic or an algorithm, post on the Mahout mailing list and discuss it with the rest of the committers.
  3. Once you finalize your proposal, apply through the GSOC website.

I would urge you to stick to the Mahout mailing list for anything related to GSOC, rather than mailing the mentors directly. There are a whole bunch of great developers on the list ready to give you awesome feedback. Please make sure your proposal is well balanced in terms of timelines and targets. Don’t say you would implement the Linux kernel in 2 months :). There are plenty of guides available online on how to write a good GSOC proposal.

For many of you this will be your first experience working in open source, and you might find it intimidating (I felt that way when I first joined). That feeling will disappear in a couple of weeks as you mingle with the community and take part in healthy discussions. So focus on getting the proposal right.

Happy Summer of Coding

Mahout logo: And all that Jazz!

Posted by Robin Anil | Posted in mahout | Posted on 16-03-2010

Mahout is growing at a steady pace. We have new committers and plenty of contributions in the form of patches, and we are ready to release version 0.3. It took us almost 6 months to get here, and we are not going to stop. Lukáš Vlček created our logo with the cute little elephant. A couple of days ago, I was utterly bored of writing chapters for the book and wanted to unleash my creativity. I pinged Lukáš and asked his permission to go wild on the logo. After a day of vector drawing, shading, curvifying fonts and using one of Lukáš’ humanoid caricatures, I ended up with a jazzed-up version of the logo.
I wasn’t really sure what colors to use and where, so I went to the Mahout community and asked for input. We discussed the pros and cons of each color, each shape and each element in the logo. Finally, with approval from the Mahout committers, the new Mahout logo has been voted in, and here it is. Hope you like it too.

My first book

Posted by Robin Anil | Posted in featured, mahout | Posted on 15-03-2010

Mahout in Action by Sean Owen and Robin Anil

It’s been an amazing year: I finished two Summers of Code, got out of college, got voted in as a committer at Apache Mahout and landed a job at Google. Meanwhile, I got an opportunity to write a book about Mahout. It’s called Mahout in Action and it’s available here.

Apache Mahout 0.2 Released – Now classify, cluster and generate recommendations!

Posted by Robin Anil | Posted in classification, clustering, datamining, java, lucene, machine learning, mahout, map/reduce, recommendations | Posted on 18-11-2009

Apache Mahout

For the past two years, I have been working with an amazing group of people on a project called Mahout, while being paid by Google through its Summer of Code program. As the name suggests, it is trying to tame the young beast known as Hadoop. I have received a lot from the community. Being part of the project has given me real exposure to Java, data mining and machine learning, and hands-on experience with distributed systems like Hadoop, HBase and Pig. The project is still in its infancy, but its ambitions are sky-high. I am happy to announce the second release of the project, and proud to be a part of it. I hope people will adopt it in their projects and that it becomes the de facto standard machine learning library, the way Lucene and Hadoop have in their respective focus areas.

If you are already excited and want to take it for a ride, read Grant’s article on IBM developerWorks here.
The release announcement follows:

Apache Mahout 0.2 has been released and is now available for public download at http://www.apache.org/dyn/closer.cgi/lucene/mahout

Up to date maven artifacts can be found in the Apache repository at
https://repository.apache.org/content/repositories/releases/org/apache/mahout/

Apache Mahout is a subproject of Apache Lucene with the goal of delivering scalable machine learning algorithm implementations under the Apache license. http://www.apache.org/licenses/LICENSE-2.0

Mahout is a machine learning library meant to scale: Scale in terms of community to support anyone interested in using machine learning. Scale in terms of business by providing the library under a commercially friendly, free software license. Scale in terms of computation to the size of data we manage today.

Built on top of the powerful map/reduce paradigm of the Apache Hadoop project, Mahout lets you solve popular machine learning problem settings like clustering, collaborative filtering and classification over Terabytes of data over thousands of computers.

Implemented with scalability in mind, the latest release brings many performance optimizations so that even in a single node setup the library performs well.

The complete changelist can be found here:

http://issues.apache.org/jira/browse/MAHOUT/fixforversion/12313278

New Mahout 0.2 features include

  • Major performance enhancements in Collaborative Filtering, Classification and Clustering
  • New: Latent Dirichlet Allocation (LDA) implementation for topic modelling
  • New: Frequent Itemset Mining for mining top-k patterns from a list of transactions
  • New: Decision Forests implementation for Decision Tree classification (In Memory & Partial Data)
  • New: HBase storage support for Naive Bayes model building and classification
  • New: Generation of vectors from Text documents for use with Mahout Algorithms
  • Performance improvements in various Vector implementations
  • Tons of bug fixes and code cleanup

Getting started: New to Mahout?

For more information on Apache Mahout, see http://lucene.apache.org/mahout
