Recently, I started looking into data sets to compete in Go Code Colorado (check it out if you live in CO). The challenge with such a diverse range of data sets is finding a way to quickly visualize the data and do exploratory analysis. While tools like Tableau make data visualization extremely easy, the data isn’t always formatted in a way they can easily consume. Here are a few tips to help speed up your exploratory data analysis!
We’ll use data from two sources to aid with this example:
Is George Washington better looking on the dollar bill or represented by a word cloud built with the text of The Constitution of the USA?
A colleague recently asked me that exact question. If you want to be taken seriously in the data science world, you had better be able to answer something like this!
I decided it would be fun to show off word_cloud, a Python package by Andreas Mueller, by making an image from the text of the Constitution and a picture of one of the Founding Fathers.
I must warn you: word clouds are like pie charts. People like the way they look, but they don’t provide much information. That said, this package is really neat because it lets you easily turn text into images using masks, colors, and numpy!
I’ll keep this post short; what you want to do is simple:
Select an image which you would like to mimic in both color and shape
Read your image into Python using numpy
Read your text into Python using open() and read()
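Here is a minimal sketch of those steps, plus the generate-and-color step that follows them. It assumes Pillow is installed for opening the image (numpy alone can't decode a PNG or JPEG), and the file names in the usage comment are hypothetical. The WordCloud parameters are illustrative choices, not necessarily the ones used for the original image. Note that word_cloud treats pure white pixels in the mask as "outside" the shape, so pick an image with a white background.

```python
import numpy as np


def load_mask(image_path: str) -> np.ndarray:
    """Read the image into a numpy array (height x width x 3 RGB channels)."""
    from PIL import Image  # Pillow, assumed installed; used only to decode the file

    return np.array(Image.open(image_path).convert("RGB"))


def load_text(text_path: str) -> str:
    """Read the whole text file into one string with open() and read()."""
    with open(text_path, encoding="utf-8") as f:
        return f.read()


def make_cloud(text: str, mask: np.ndarray, out_path: str) -> None:
    """Lay words out inside the mask's shape, then color them from the image."""
    from wordcloud import WordCloud, ImageColorGenerator

    # White pixels in `mask` are left empty; words fill everything else.
    cloud = WordCloud(background_color="white", mask=mask, max_words=2000)
    cloud.generate(text)
    # Recolor each word using the image colors underneath it.
    cloud.recolor(color_func=ImageColorGenerator(mask))
    cloud.to_file(out_path)


# Example usage (hypothetical file names):
# make_cloud(load_text("constitution.txt"), load_mask("washington.png"), "cloud.png")
```

Splitting loading from drawing keeps each step from the list in its own small function, so you can swap in a different image or text without touching the rest.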
Is anyone old enough to remember the Monty Hall problem from the old TV show Let’s Make a Deal? It’s a classic probability problem, but despite its simplicity, it can be hard to see which choice maximizes your odds of winning.
This is the problem:
You are a contestant on a game show. The host shows you three doors: behind one is a brand-new car, and behind each of the other two is a goat. Here’s an image of all the possible options you would have:
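The rules above are simple enough to simulate directly. This sketch assumes the standard version of the game: after your pick, the host always opens a door that is neither yours nor the car, and then offers you the chance to switch. The function names and trial count are my own choices for illustration.

```python
import random


def play(switch: bool, rng: random.Random) -> bool:
    """Play one round; return True if the contestant ends up with the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the only door that is still closed.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch: bool, trials: int = 100_000, seed: int = 0) -> float:
    """Fraction of simulated games won with a fixed strategy."""
    rng = random.Random(seed)
    wins = sum(play(switch, rng) for _ in range(trials))
    return wins / trials


print(f"stay:   {win_rate(switch=False):.3f}")
print(f"switch: {win_rate(switch=True):.3f}")
```

Staying wins about 1/3 of the time and switching about 2/3: your first pick is right only 1/3 of the time, and whenever it is wrong, the host's reveal leaves the car behind the one remaining door.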
OpenCV is an incredibly powerful tool to have in your toolbox. I have had a lot of success using it in Python but very little success in R. I haven’t done much beyond searching Google, but it seems that “imager” and “videoplayR” provide a lot of the functionality, though not all of it.
I had never actually called Python functions from R before. Initially, I tried the “rPython” library; it has a lot of advantages, but it was unnecessary for my purposes, and system() worked fine. While this example is extremely simple, it should illustrate how easy it is to use the power of Python from within R. I need to give credit to Harrison Kinsley for all of his work at PythonProgramming.net; I used a lot of his code and ideas for this post (especially the Python portion).
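As a rough sketch of the Python half of the system() approach, here is a small command-line script that R could run. It assumes the opencv-python package is installed; the script name `to_gray.py` and its grayscale task are hypothetical stand-ins, not the code from this post. The key idea is that the script takes its inputs as command-line arguments and prints its result to stdout, which R can capture.

```python
import sys
from typing import Sequence


def to_grayscale(in_path: str, out_path: str) -> str:
    """Convert an image to grayscale with OpenCV and return the output path."""
    import cv2  # opencv-python, assumed installed; imported only when needed

    image = cv2.imread(in_path)  # returns None if the file can't be read
    if image is None:
        raise FileNotFoundError(f"could not read image: {in_path}")
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    if not cv2.imwrite(out_path, gray):  # imwrite returns False on failure
        raise OSError(f"could not write image: {out_path}")
    return out_path


def main(argv: Sequence[str]) -> int:
    """Command-line entry point; the return value becomes the exit status."""
    if len(argv) != 2:
        print("usage: python to_gray.py INPUT OUTPUT", file=sys.stderr)
        return 2
    # Whatever is printed to stdout is what R receives from system(..., intern = TRUE).
    print(to_grayscale(argv[0], argv[1]))
    return 0


# Run the CLI only when arguments are supplied, e.g. by R's system() call.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

On the R side, a call along the lines of `system("python to_gray.py photo.jpg photo_gray.jpg", intern = TRUE)` (hypothetical file names) would run the script and return its printed output as a character vector. Passing plain strings in and out keeps the R/Python boundary trivial, which is why a full bridge like rPython isn't needed for a job this small.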
I am asked this question regularly, both online and in person. There is a simple answer: it doesn’t matter. There are pros and cons to both, which have been written about extensively, so I won’t reinvent the wheel by making a list here (do a quick Google search and you’ll find tens of thousands of relevant results).