
Using Social Data to Mine for Drug-Related Side Effects


A team of data scientists hopes to use data mining and text data analysis, along with deep learning and neural networks, to uncover trends on Twitter.


Social media provides a window into our minds in all manner of ways, with researchers using what we share online to understand a wide range of phenomena. A recent study from the First Moscow Medical University suggests that the same approach can yield insights into the side effects of taking medicine.

The researchers propose a system that translates complaints posted on social media, such as feeling a little giddy, into medical terms such as vertigo.

The researchers used recurrent neural networks and semantic vector representations of words to perform this translation. The team fed medical texts into their system to build a specialized vocabulary, and then used this data to assign a vector to each word.

"We used user comments from the web. Our network is a recurrent one, so it's capable of memorizing. Of course, not in the literal sense of that word, because the network is not a thinking system, but there is a specific mechanism it uses to memorize texts. We upload texts to it, and it then compares them to the International Classification of Diseases (ICD). The outputs are word vectors, and words and terms often encountered in a similar context are assigned similar coordinates. Thus, the neural network "compares" user texts and official medical terms," the researchers explain.

Text Analysis

The system was able to map words such as queasy to symptoms such as nausea, but the team believes it goes beyond a mere comparison of vocabulary. This is crucial, as many of the complaints people make online don't resemble medical terms at all.
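
The idea that words used in similar contexts receive similar coordinates can be illustrated with a small sketch. This is not the researchers' system: it swaps their learned neural representations for simple co-occurrence counts, and the corpus, the term list, and the function names are all invented for illustration. It shows only how an everyday word can be matched to a medical term by comparing context vectors rather than spelling.

```python
import math
from collections import Counter, defaultdict

# Toy corpus, invented for illustration (not the researchers' data).
CORPUS = [
    "i felt giddy and the room was spinning after the pill",
    "patient reports vertigo and the room was spinning after the dose",
    "i felt queasy and wanted to vomit after the pill",
    "patient reports nausea and wanted to vomit after the dose",
]

# Hypothetical list of target medical terms.
MEDICAL_TERMS = ["vertigo", "nausea"]


def context_vectors(sentences, window=3):
    """Count, for each word, the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def closest_term(word, candidates, vectors):
    """Return the candidate whose context vector is most similar to the word's."""
    return max(candidates, key=lambda term: cosine(vectors[word], vectors[term]))


if __name__ == "__main__":
    vectors = context_vectors(CORPUS)
    for lay_word in ("giddy", "queasy"):
        print(lay_word, "->", closest_term(lay_word, MEDICAL_TERMS, vectors))
```

On this toy corpus, giddy is matched to vertigo and queasy to nausea, even though the word pairs share no letters in common. A real system would need far more text and learned embeddings to generalize beyond such hand-built examples.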

"The importance of this research is caused by a growing demand for text data analysis. In our project, we use text analysis methods and machine learning to extract useful information from the available data," the researchers explain.

The team believes that their work can help to improve communication in healthcare, as the language used by patients and the language used by the medical industry are often very different, which can result in confusion and misunderstanding.

"Algorithmically speaking, this task is more like translating between different languages, albeit very similar ones. The solution lies within natural language processing. In the last years, the most successful solution for most tasks in speech and text processing have been based on deep neural networks which help determine complex regularities in data. In particular, recurrent neural networks work well with serialized data because they can find links between elements while taking consideration of the context," they say.

The researchers are confident that their work can provide the medical industry with a greater understanding of how patients respond to the various treatments they're given, and especially help to uncover any unwanted side effects. They plan to do more work to fine-tune the technology so that it can also take into account lifestyle factors, including diet.
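
The researchers' point that recurrent networks take context into account when reading sequential data can be made concrete with a minimal sketch. The article does not describe their architecture, so the code below assumes a plain Elman-style recurrent step, with hand-picked toy weights and two-dimensional embeddings that are purely hypothetical. It shows only that the same final word produces a different internal state depending on the word that came before it.

```python
import math

# Hypothetical 2-dimensional embeddings and weights, chosen by hand for illustration.
EMBEDDINGS = {
    "felt": [1.0, 0.0],
    "not": [0.0, 1.0],
    "dizzy": [1.0, 1.0],
}
W_IN = [[0.5, -0.3], [0.2, 0.8]]    # input-to-hidden weights
W_REC = [[0.1, 0.4], [-0.6, 0.3]]   # hidden-to-hidden (recurrent) weights


def rnn_step(x, h, w_in, w_rec):
    """One recurrent step: h_new = tanh(w_in * x + w_rec * h)."""
    return [
        math.tanh(
            sum(w_in[i][j] * x[j] for j in range(len(x)))
            + sum(w_rec[i][j] * h[j] for j in range(len(h)))
        )
        for i in range(len(h))
    ]


def encode(tokens, embeddings, w_in, w_rec):
    """Read tokens left to right, carrying the hidden state between steps."""
    h = [0.0] * len(w_rec)
    for token in tokens:
        h = rnn_step(embeddings[token], h, w_in, w_rec)
    return h


if __name__ == "__main__":
    print(encode(["felt", "dizzy"], EMBEDDINGS, W_IN, W_REC))
    print(encode(["not", "dizzy"], EMBEDDINGS, W_IN, W_REC))
```

Both sequences end in the same word, yet the final hidden states differ because each one also reflects the preceding word. In a trained network the weights would be learned from data rather than set by hand.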


Topics:
social data, big data, data mining, news, text data analysis

Published at DZone with permission of

