The Algorithm That Automatically Creates Wikibooks
There is an algorithm that can create Wikibooks, but is it as effective as human curation?
Wikibooks are fascinating things. At the time of writing this article, there are over 3,000 of them, and they aim to pull together content from Wikipedia on a wide range of topics to provide the reader with a comprehensive overview of a particular field.
While some are mere pamphlets, others, such as the Machine Learning book, come in at several thousand pages. It's hard enough reading such a tome, but compiling it is equally difficult, as you'd expect for a book with over 550 chapters.
Researchers from Ben-Gurion University of the Negev believe they've found an answer. In a recently published paper, they describe their AI-driven approach to automating the generation of Wikibooks.
The researchers drew on a dataset of nearly 7,000 existing Wikibooks, each of which had been viewed at least 1,000 times and was made available for research on account of its quality. After applying additional filters, such as minimum length, they were left with over 400 books to train the algorithm, with the remainder used for testing.
The task of creating a book was then split into stages, each requiring a distinct skill: a title had to be generated, then a core concept identified, and so on, before the articles that best fit each section were chosen.
The best articles were selected according to their relative popularity, measured by how many other articles linked to them. The selection was then refined by comparing the network structure of the emerging Wikibook to that of human-curated books: if including an article made the book's structure more like a human-curated one, the article was kept; if it made it less so, it was culled.
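The selection step described above can be sketched as a greedy loop: rank candidates by in-link popularity, then keep each article only if it doesn't push the book's link network further from a target structure observed in human-curated books. The function names, the single-number "link density" stand-in for network structure, and the toy data are all illustrative assumptions, not the authors' actual model.

```python
from collections import Counter


def link_density(articles, links):
    """Fraction of possible directed links present among the given articles.

    This single statistic is a hypothetical stand-in for the richer
    network-structure comparison described in the paper.
    """
    n = len(articles)
    if n < 2:
        return 0.0
    edges = sum(1 for a in articles for b in articles
                if a != b and b in links.get(a, ()))
    return edges / (n * (n - 1))


def select_articles(candidates, links, target_density, budget):
    """Greedily build an article list for a book.

    candidates: article names; links: dict mapping an article to the set
    of articles it links to; target_density: the density observed in
    human-curated books; budget: maximum number of articles to keep.
    """
    # Rank candidates by popularity: how many other articles link to them.
    inlinks = Counter(dst for dsts in links.values() for dst in dsts)
    ranked = sorted(candidates, key=lambda a: inlinks[a], reverse=True)

    chosen = []
    for article in ranked:
        if len(chosen) >= budget:
            break
        before = abs(link_density(chosen, links) - target_density)
        after = abs(link_density(chosen + [article], links) - target_density)
        # Keep the article only if it doesn't move the network further
        # from the human-curated target structure; otherwise cull it.
        if not chosen or after <= before:
            chosen.append(article)
    return chosen
```

With a tightly interlinked pair like `{"a": {"b"}, "b": {"a"}}` and a high target density, the loop keeps `a` and `b` but culls weakly connected candidates.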
The team then set about organizing the articles into chapters by looking for clusters within the network of articles that shared clear thematic similarities. Last, but not least, they determined the order in which the articles appeared within each chapter by comparing them in pairs and using network models to decide which should come first.
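The pairwise ordering step could be sketched as a simple voting tournament. Here a crude heuristic stands in for the paper's learned network model: if article `a` links to `b` but not vice versa, we assume `a` introduces `b` and should come first. Articles are then sorted by how many pairwise comparisons they win. The heuristic, names, and sample data are assumptions for illustration only; clustering into chapters (for example, via community detection on the link graph) would happen before this step.

```python
from itertools import combinations


def order_chapter(chapter, links):
    """Order the articles in one chapter via pairwise comparisons.

    chapter: list of article names; links: dict mapping an article to
    the set of articles it links to. The one-way-link rule below is an
    illustrative substitute for the authors' network model.
    """
    wins = {a: 0 for a in chapter}
    for a, b in combinations(chapter, 2):
        a_to_b = b in links.get(a, ())
        b_to_a = a in links.get(b, ())
        # A one-way link suggests the source article introduces the
        # target, so the source wins the "comes first" vote.
        if a_to_b and not b_to_a:
            wins[a] += 1
        elif b_to_a and not a_to_b:
            wins[b] += 1
    return sorted(chapter, key=lambda x: wins[x], reverse=True)
```

An overview article that links out to its peers wins most of its comparisons and floats to the front of the chapter.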
Put to the Test
So how did the AI do? The automated versions turned out to be pretty similar to Wikibooks that had already been created by human curators, containing much of the same material in much the same order.
The next step is to scale up the project and produce Wikibooks in areas that have, so far, gone uncovered. The researchers will then publish these books and gauge their quality and popularity through reader response, measured in pageviews and the number of edits.
It's a fascinating project, and the books that come out of it will be well worth watching to see just how good the algorithm is. What remains unknown at this stage is whether the books will be marked as AI-generated and whether that would influence their readership.
Published at DZone with permission of Adi Gaskell, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.