Tag Archives: Data Science

Data Visualization – Part 1

Introduction to Data Visualization – Theory, R & ggplot2

Data visualization is a popular topic in the data science community. According to Mordor Intelligence, the market for visualization products is valued at $4 billion and is projected to reach $7 billion by the end of 2022. While we have seen amazing advances in the technology used to display information, the understanding of how, why, and when to use visualization techniques has not kept up. Unfortunately, people are often taught how to make a chart before they even think about whether or not it’s appropriate.

In short, are you adding value to your work, or are you simply adding a chart to make it seem less boring? Let’s take a look at some examples before going through the Stoltzmaniac Data Visualization Philosophy.


Should I Learn R or Python? … It Doesn’t Matter

Should I learn R or Python for data science?

I am asked this question regularly, both online and in person. There is a simple answer: it doesn’t matter. Both have pros and cons, which have been written about extensively, so I won’t reinvent the wheel by making a list here (a quick Google search will turn up tens of thousands of relevant results).

The fact is, you’re asking the wrong question.

Random Forest Classification of Mushrooms

There is a plethora of classification algorithms available to people who have a bit of coding experience and a set of data. A common machine learning method is the random forest, which is a good place to start.

This post walks through a use case of the randomForest package in R, applied to a data set from the UCI Machine Learning Repository.

Are These Mushrooms Edible?

Price Volatility – Basic Brownian Motion

The Situation

You are a consultant who has been hired by a business that sells one commodity product. On December 31st the price is $100 per unit. The business owner wants to know what to expect by the end of January.

Your client gave you the message:

  • Prices are based on the previous day’s sales
  • Roughly 95% of the time, the price will be within $10 of the previous day’s price

With only a few minutes to make the call, how would you decide on what to expect for the end of January?
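One quick way to reason about the situation above is sketched below in Python. It is only a rough sketch under assumptions that the client did not state: daily price changes are taken to be independent, normally distributed, and centered on zero, so "within $10 about 95% of the time" is read as roughly 1.96 standard deviations. The names `daily_sigma`, `month_sigma`, and `simulate_final_price` are made up for this illustration.

```python
import math
import random

START_PRICE = 100.0  # price on December 31st (from the scenario)
DAILY_BAND = 10.0    # daily move stays within $10 roughly 95% of the time
DAYS = 31            # number of daily moves through the end of January

# Assumption: daily changes are independent, normal, and have mean zero.
# A 95% band of +/- $10 then corresponds to about 1.96 standard deviations.
daily_sigma = DAILY_BAND / 1.96

# Variances of independent steps add, so the spread grows with sqrt(days).
month_sigma = daily_sigma * math.sqrt(DAYS)
low = START_PRICE - 1.96 * month_sigma
high = START_PRICE + 1.96 * month_sigma


def simulate_final_price(rng):
    """Walk the price forward one random daily change at a time."""
    price = START_PRICE
    for _ in range(DAYS):
        price += rng.gauss(0.0, daily_sigma)
    return price


# Check the back-of-the-envelope range against a simple simulation.
rng = random.Random(42)
finals = [simulate_final_price(rng) for _ in range(10_000)]
inside = sum(low <= p <= high for p in finals) / len(finals)

print(f"Approximate 95% range for January 31st: ${low:.2f} to ${high:.2f}")
print(f"Share of simulated paths ending inside that range: {inside:.3f}")
```

Under these assumptions the 95% band widens from $10 for a single day to about $10 × √31 ≈ $56 for the month, so the quick answer would be "probably somewhere between roughly $44 and $156." An additive walk like this can in principle go below zero, which is one reason a more careful model might work with percentage changes instead.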

Seinfeld Characters – A Post About Nothing

This post is dedicated to my mother – Seinfeld’s greatest fan.

Seinfeld is a classic TV sitcom. It featured four main characters in relatively normal, everyday, run-of-the-mill scenarios. In the spirit of Seinfeld, this post will also “be about nothing.”