Collective intelligence is a useful technique as we try to build web applications that appeal to entire communities. We had an opportunity to talk to Satnam Alag, author of Collective Intelligence in Action, to learn what collective intelligence is and what it can do for Java developers.
To quote from Manning's website, this book is a hands-on guidebook for implementing collective-intelligence concepts using Java. It is the first Java-based book to emphasize the underlying algorithms and technical implementation of vital data gathering and mining techniques like analyzing trends, discovering relationships, and making predictions. It provides a pragmatic approach to personalization by combining content-based analysis with collaborative approaches. You can learn more about this book by visiting Manning's website.
Be one of three lucky winners of a free copy of this book. We'd like to hear about your experience with collective intelligence, or where you think it would come in useful. Leave us a comment by March 13th for your chance to win.
Limited time only! You can get 35% off Collective Intelligence in Action until April 1, 2009. Use code dzone35 when ordering.
DZone: Hi Satnam, can you please introduce yourself?
Satnam: I am currently the vice president of engineering at NextBio (http://www.nextbio.com), a life sciences search, knowledge discovery, and Web 2.0 collaboration application. My areas of interest and focus include personalization, intelligent search, machine learning, and the development of large-scale distributed systems delivered as software-as-a-service (SaaS). I enjoy solving information challenges, such as at NextBio, where we leverage a semantic (ontology-based) framework to deal with large amounts of data and text to make new discoveries. My past work experience includes helping Johnson & Johnson’s BabyCenter build their personalization engine, being the chief software architect at Rearden Commerce, and being a part of the research staff at GE R&D. I am also a Sun Certified Enterprise Architect (SCEA) for the Java platform, and my doctoral work at UC Berkeley was in the area of probabilistic reasoning and machine learning. I love to read and travel, especially with my family.
DZone: In a nutshell, what is Collective Intelligence? Is it a relatively new concept?
Satnam: Web applications are undergoing a revolution. In this post-dot-com era, the web is transforming. Newer web applications trust their users, invite them to interact, connect them with others, gain early feedback from them, and then use the collected information to constantly improve the application. Web applications that take this approach develop deeper relationships with their users, provide more value to users, are “stickier” and experience greater adoption, and ultimately offer more targeted experiences for each user according to their personal need.
Web user behavior is evolving rapidly. Users are expressing themselves more freely and more often. This expression may be in the form of sharing their opinions on a product or a service through reviews or comments; through sharing and tagging content; through participation in an online community; or by contributing new content. This increased user interaction and participation gives rise to data that can be converted into intelligence in your application.
Collective intelligence is about making your application more valuable by tapping into wise crowds. More formally, collective intelligence (CI) as used in my book (Collective Intelligence in Action) simply and concisely means:
To effectively use the information provided by others to improve one’s application.
This is a fairly broad definition of collective intelligence—one which uses all types of information, both inside and outside the application, to improve the application for a user. This book introduces you to concepts from the areas of machine learning, information retrieval, and data mining, and demonstrates how you can add intelligence to your application. You’ll be exposed to how your application can learn about individual users by correlating their interactions with those of others to offer a highly personalized experience.
Collective intelligence is an active field of research that predates the web. Scientists from the fields of sociology, mass behavior, and computer science have made important contributions to this field. When a group of individuals collaborate or compete with each other, intelligence or behavior that otherwise did not exist suddenly emerges. This is commonly known as collective intelligence. The actions or influence of a few individuals slowly spread across the community until they become the norm.
DZone: Machine Learning and constraints are generally seen as academic topics, with the main use being timetabling. Is this a lot more than machine learning?
Satnam: Collective Intelligence in Action is a practical book for applying collective intelligence to real-world web applications – there is a lot more here than yet another pedantic book on machine learning. The book is divided into three parts.
Part 1 deals with collecting data both within and outside the application, which will be translated into intelligence later. The first three chapters in this part deal with gathering information from within one’s application, while the remaining two focus on gathering information from outside of one’s application. In this part, I cover the basics of capturing intelligence from user interaction, content-based and collaborative-based approaches to similarity computation, the use of tags and tag clouds, various content types, especially those associated with collective intelligence, searching the blogosphere, and intelligent web crawling.
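To make the collaborative similarity computation mentioned above a little more concrete, here is a minimal sketch that is not taken from the book. It assumes each user is represented as a sparse map from item id to rating and compares two users with cosine similarity, one common choice among several. The class name `UserSimilarity`, its methods, and the sample item ids are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: compares two users by the ratings they gave to items. */
final class UserSimilarity {

    /** Cosine similarity of two sparse rating vectors keyed by item id. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other; // only co-rated items contribute
            }
        }
        double norms = norm(a) * norm(b);
        return norms == 0.0 ? 0.0 : dot / norms; // no ratings: treat as dissimilar
    }

    /** Euclidean length of a rating vector. */
    private static double norm(Map<String, Double> v) {
        double sumOfSquares = 0.0;
        for (double x : v.values()) {
            sumOfSquares += x * x;
        }
        return Math.sqrt(sumOfSquares);
    }

    public static void main(String[] args) {
        Map<String, Double> alice = new HashMap<>();
        alice.put("article-1", 5.0);
        alice.put("article-2", 3.0);
        Map<String, Double> bob = new HashMap<>();
        bob.put("article-1", 4.0);
        bob.put("article-3", 2.0);
        System.out.println("similarity = " + cosine(alice, bob));
    }
}
```

With non-negative ratings, a value near 1 suggests two users rate items in similar proportions, while 0 means they share no rated items. The same function could compare content-based term vectors if the keys were terms instead of item ids.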
Part 2 of the book is focused on deriving intelligence from the information collected. It consists of four chapters – an introductory chapter to the data mining process, standards and toolkits, developing a text-analysis toolkit, finding patterns through clustering, and making predictions. In this section, I cover the use of WEKA for data mining, Java Data Mining (JDM) specification, and Lucene for text mining.
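As a rough illustration of the first step in text analysis, the sketch below turns a piece of text into term frequencies using only the Java standard library. It is an assumption-laden toy, not the book's toolkit and not Lucene's analyzer API: the class name `TermVector`, the tiny stop-word list, and the tokenization rule (split on anything that is not a lowercase letter or digit) are all hypothetical choices.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical helper: builds a term-frequency map from raw text. */
final class TermVector {

    // Deliberately tiny, illustrative stop-word list (an assumption).
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "of", "and", "to", "in", "is"));

    /** Lowercases the text, splits it into tokens, drops stop words, counts the rest. */
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> frequencies = new TreeMap<>(); // sorted by term
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue; // skip split artifacts and stop words
            }
            frequencies.merge(token, 1, Integer::sum);
        }
        return frequencies;
    }

    public static void main(String[] args) {
        System.out.println(termFrequencies("The wisdom of the crowd is crowd data"));
    }
}
```

A real toolkit would typically add stemming, phrase detection, and weighting such as TF-IDF on top of counts like these. The resulting maps are the kind of input that clustering and prediction algorithms work on.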
Part 3 consists of two chapters, which deal with applying intelligence within one’s application. The first chapter in this part shows how one can leverage Lucene. It also covers six different approaches being taken in the area of intelligent search. The last chapter illustrates how to build a recommendation engine using both content-based and collaborative-based approaches. It also covers real-world case studies on how recommendation engines have been built at Amazon, Google News, and Netflix.
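For a feel of what a very small collaborative recommender can look like, here is a sketch based on item co-occurrence ("users who chose this item also chose..."). It is a simplified illustration under stated assumptions, not the book's engine and not how Amazon, Google News, or Netflix implement theirs. The class name `CoOccurrenceRecommender`, its methods, and the sample items are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical recommender: counts how often two items appear in the same basket. */
final class CoOccurrenceRecommender {

    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    /** Records one user's set of items (for example, a purchase or reading session). */
    void addBasket(Set<String> items) {
        for (String a : items) {
            for (String b : items) {
                if (!a.equals(b)) {
                    counts.computeIfAbsent(a, k -> new HashMap<>()).merge(b, 1, Integer::sum);
                }
            }
        }
    }

    /** Returns up to n items most often seen together with the given item. */
    List<String> recommend(String item, int n) {
        Map<String, Integer> related = counts.getOrDefault(item, Collections.emptyMap());
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(related.entrySet());
        // Highest count first; ties broken alphabetically for a stable order.
        entries.sort((x, y) -> {
            int byCount = Integer.compare(y.getValue(), x.getValue());
            return byCount != 0 ? byCount : x.getKey().compareTo(y.getKey());
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < entries.size() && i < n; i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        CoOccurrenceRecommender recommender = new CoOccurrenceRecommender();
        recommender.addBasket(new HashSet<>(Arrays.asList("book", "lamp")));
        recommender.addBasket(new HashSet<>(Arrays.asList("book", "lamp")));
        recommender.addBasket(new HashSet<>(Arrays.asList("book", "pen")));
        System.out.println(recommender.recommend("book", 2));
    }
}
```

Raw co-occurrence favors popular items, so a production system would normally normalize the counts or combine them with content-based similarity.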
DZone: How does your book approach collective intelligence?
Satnam: In the book, I cover a broad spectrum of topics – from simple illustrative examples that explain the concept and the math behind it, to the ideal architecture for developing the feature, to the database schema, to code implementation and use of open-source toolkits. Regardless of your background and the nature of your development work, I am sure you will find the examples and code samples very useful. You should be able to directly use the code developed in this book. This is a practical book, and I present a holistic view of what is required to apply these techniques in the real world. Consequently, the book discusses the architectures for implementing intelligence – you will find lots of diagrams, especially UML diagrams, and a number of screenshots from well-known sites, in addition to code listings and even database schema designs.
There is a plethora of examples. Typically, concepts and the underlying math for algorithms are explained via examples with detailed step-by-step analysis. Accompanying the examples is Java code that demonstrates the concepts by implementing them directly and/or by using open-source frameworks.
DZone: What toolkits and APIs do you recommend using for Java developers?
Satnam: A lot of work has been done by the open-source community in Java in the areas of text processing and search (Lucene), data mining (WEKA), web crawling (Nutch), and data mining standards (JDM). This book leverages these frameworks, presents examples, and develops code that you can directly use in your Java application.
The first few chapters do not assume knowledge of Java. You should be able to follow the concepts and the underlying math using the illustrative examples. For the later chapters, a basic understanding of Java will be helpful. The book uses a number of diagrams and screenshots to illustrate the concepts. The resources section of each chapter contains links to other useful content.
DZone: Are these libraries mature enough to get the job done right?
Satnam: Fortunately, a lot of work has been done by the open-source community in Java. The Java-based toolkits and APIs used in the book are very widely used, mature, and scalable. A few words about each of them:
Apache Lucene is an open-source Java-based full-text search engine that has been very widely used in the industry. Lucene provides a robust framework for adding one’s own expertise in text processing.
Nutch is a Java-based open source web crawler that has been demonstrated to scale well. Both Lucene and Nutch were developed by Doug Cutting. Nutch uses a plug-in–based architecture, allowing it to be easily customized. Its processing is segmented, allowing it to be distributed.
The Waikato Environment for Knowledge Analysis, commonly known as WEKA, is one of the most popular suites of data mining algorithms written in Java. WEKA contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It also has a GUI application through which one can apply the various data mining algorithms to datasets.
JDM aims at building a standard API for data mining, with the goal that client applications coded to the specification are not dependent on any specific vendor application. The JDBC specification provides a good analogy to the potential of JDM. JDM has wide support from the industry, with representation from a number of companies including Oracle, IBM, SPSS, CA, Fair Isaac, SAP, SAS, BEA, and others. JDM was developed as part of JSR 73 and JSR 247.
DZone: What do you see happening for the future of collective intelligence?
Satnam: Using collective intelligence to personalize a site for a user, to aid him in searching and making decisions, and to make the application stickier are cherished goals that web applications have tried to fulfill and will continue to pursue in the future.
Applying concepts from collective intelligence could be the difference between a successful and a failed website. Given that most applications now invite users to interact and leverage user-generated content, new content is being generated at a phenomenal rate. Showing the right content to the right user at the right time is the key to creating a sticky application. I will be surprised if most successful websites do not eventually leverage collective intelligence to provide a personalized experience to their users.
Remember, applications that make use of every user interaction to improve the value of the application for the current user and for future users will become more viral and in turn will dominate their markets. This book provides you with the set of tools that you will need to leverage the information provided by the users on your site. Regardless of the form, shape, or volume of the information you seek to manage, this book will guide you in harnessing the potential of your information to personalize the site for your users. Focus on the user, and you shall succeed. For collective intelligence begins with a crowd of one.
DZone: Thank you, Satnam, for your time.