Tidy data in data mining

Introduction: In almost any data mining project, about 80% of the work is data cleaning, that is, getting data ready for analysis (Dasu and Johnson 2003). This short blog post aims to shed some light on a small but important part of data cleaning that Wickham (2014) called data tidying. We first outline the properties of tidy data and present examples of messy forms in which data can be received, then demonstrate how one can go Read more…

Linear Regression in R

Introduction: This blog post focuses on simple linear regression using R. Simple linear regression is useful for examining or modelling the relationship between two numeric variables. Before looking for the type of relationship between a pair of quantities, it is recommended to conduct a correlation analysis to determine whether there is a linear relationship between them. For this post we will use the classic cars data set to create a linear regression Read more…

Machine Learning and what it can do for you and your business (Part 1)

If you have not yet gotten onto the Machine Learning (ML) bandwagon, then your business is probably missing out on some fantastic benefits it could be enjoying! To put this assertion in perspective, imagine getting meaningful insights from data that your business is already producing, and using those insights to solve business problems that may currently seem complex. The Read more…