Twitter: Useful Tool for Machine Learning (ML)

Data is everywhere. This may seem to be purely a benefit, but at times there is simply too much of it. The sheer volume can make it difficult to understand what is really there and how it may be useful. Whether the subject is InfoSec or the latest system upgrades, methods and tools are needed to alleviate the issue.

One potential source of this data is Twitter. People and businesses tweet about nearly everything: food, dinner, present mood, politics, or any number of other topics. One field that has made use of this aspect of Twitter is ML. Twitter is a great source for data mining on virtually any subject. It is also a free outlet for people to express their opinions or thoughts. This lack of a barrier to entry has allowed everyone to contribute, which other venues have not done. At times, results may be slightly skewed by trolls. Given the overall number of entries, this skew would likely not be significant, and most of it could be removed with a script.

One such application recently occurred with a study on opioid abuse. Tim Mackey, Janani Kalyanam, and Takeo Katsuki published their research on detecting prescription opioid abuse promotion and access using Twitter in the American Journal of Public Health (http://alphapublications.org/doi/pdfplus/10.2105/AJPH.2017.303994). The researchers’ methodology included collecting publicly accessible tweets, filtered by search terms associated with prescription opioids. They then applied unsupervised machine learning in the form of topic modeling.
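The keyword filtering step can be illustrated with a minimal sketch. This is not the researchers' code: the function names and sample tweets are hypothetical, the terms are the drug names reported for the study's sample, and the last term is assumed to be hydrocodone.

```python
import re

# Drug names reported as search terms in the study's sample.
# The last term is assumed to be "hydrocodone".
OPIOID_TERMS = [
    "codeine", "percocet", "fentanyl", "vicodin",
    "oxycontin", "oxycodone", "hydrocodone",
]

# One case-insensitive pattern that matches any term as a whole word.
TERM_PATTERN = re.compile(
    r"\b(" + "|".join(OPIOID_TERMS) + r")\b", re.IGNORECASE
)


def matched_terms(text):
    """Return the sorted, lowercased opioid terms found in the text."""
    return sorted({m.lower() for m in TERM_PATTERN.findall(text)})


def filter_tweets(tweets):
    """Keep only the tweets that mention at least one search term."""
    return [t for t in tweets if TERM_PATTERN.search(t)]


# Made-up sample tweets, for illustration only.
sample = [
    "Great dinner tonight!",
    "Buy OxyContin online, no prescription needed http://example.com",
    "Doctor switched me from Vicodin to codeine",
]
kept = filter_tweets(sample)
```

Here `kept` holds the two tweets that mention a listed term. A filter like this only narrows the data; the topic modeling would then run on what remains.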

The sample analyzed was 619,937 tweets containing the terms codeine, percocet, fentanyl, vicodin, oxycontin, oxycodone, and hydrocodone. The sample period was June to November 2015. Of these, 1,778 tweets, or less than 1%, were found to be marketing the sale of controlled substances online. Of those, 90% had embedded links.
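To put these figures in proportion, the short sketch below works through the arithmetic and shows one simple way a script might check a tweet for an embedded link. The link pattern and the `has_link` helper are illustrative assumptions, not part of the study.

```python
import re

TOTAL_TWEETS = 619_937     # sample size reported above
MARKETING_TWEETS = 1_778   # tweets marketing controlled substances

# Share of the sample flagged as marketing, as a percentage.
share_percent = round(MARKETING_TWEETS / TOTAL_TWEETS * 100, 2)

# Approximate number of flagged tweets with embedded links (90%).
with_links = round(MARKETING_TWEETS * 0.90)

# Hypothetical helper: a rough check for an http(s) link in a tweet.
URL_PATTERN = re.compile(r"https?://\S+")


def has_link(text):
    """Return True if the text contains something that looks like a URL."""
    return bool(URL_PATTERN.search(text))


print(f"{share_percent}% of the sample, about {with_links} tweets with links")
```

The flagged tweets come to roughly 0.29% of the sample, and about 1,600 of them carried links.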

While no research methodology is perfect, this one falls within the realm of acceptable protocols. ML has taken this kind of analysis and greatly increased its potential. Continued use and application of ML will further research not only at the most basic level but also in the understanding and comprehension of the data itself, along with its implications. This was only one example of the many where ML would be exceptional in its application. As applied to InfoSec, it could also be used to research compromises, data loss, or other subjects.

About the Author - Charles Parker, II has been working in the info sec field for over a decade, performing pen tests, vulnerability assessments, consulting with small- to medium-sized businesses to mitigate and remediate their issues, and preparing IT and info sec policies and procedures. Mr. Parker’s background includes work in the banking, medical, automotive, and staffing industries.