Google Vision API in R – RoogleVision

Using the Google Vision API in R

Utilizing RoogleVision

After writing my post last month on OpenCV and face detection, I started looking into other algorithms used for pattern detection in images. As it turns out, Google has done a phenomenal job with its Vision API. The amount of information it can send back from a single picture is absolutely incredible.

Also, it’s free to get started! I believe the free tier covers 1,000 images per month. Amazing!

In this post I’m going to walk you through the absolute basics of accessing the power of the Google Vision API using the RoogleVision package in R.

As always, we’ll start off by loading some libraries. I added comments in the code showing where you can install them.

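# Normal Libraries

# devtools::install_github("flovv/RoogleVision")
library(RoogleVision)
library(jsonlite) # to import credentials

# For image processing
# source("")
# biocLite("EBImage")
# (newer Bioconductor releases install it with BiocManager::install("EBImage"))
library(EBImage)

# For Latitude Longitude Map
library(leaflet) # also provides the %>% pipe used below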

Google Authentication

In order to use the API, you have to authenticate. There is plenty of documentation out there about how to set up an account, create a project, download credentials, etc. Head over to the Google Cloud Console if you don’t have an account already.

# Credentials file I downloaded from the cloud console
creds = fromJSON('credentials.json')

# Google Authentication - Use Your Credentials
# options("googleAuthR.client_id" = "")
# options("googleAuthR.client_secret" = "")

options("googleAuthR.client_id" = creds$installed$client_id)
options("googleAuthR.client_secret" = creds$installed$client_secret)
options("googleAuthR.scopes.selected" = c(""))  # OAuth scope for the Cloud Vision API

# Run the OAuth flow (typically opens a browser window the first time)
googleAuthR::gar_auth()

Now You’re Ready to Go

The function getGoogleVisionResponse takes three arguments:

  1. imagePath
  2. feature
  3. numResults

Numbers 1 and 3 are self-explanatory; “feature” has 5 options:

  • LABEL_DETECTION
  • LANDMARK_DETECTION
  • FACE_DETECTION
  • LOGO_DETECTION
  • TEXT_DETECTION

These are self-explanatory, but it’s nice to see each one in action.
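The API expects these names exactly as written, underscores included; a request with a misspelled type such as “FACE DETECTION” comes back as a 400 error. A small guard can catch that locally before any request is sent. This is a minimal sketch: `check_feature` and `valid_features` are my own hypothetical names (not part of RoogleVision), and the list covers only the five features above.

```r
# Hypothetical helper, not part of RoogleVision: validate the `feature`
# argument locally before calling getGoogleVisionResponse().
valid_features <- c("LABEL_DETECTION", "LANDMARK_DETECTION", "FACE_DETECTION",
                    "LOGO_DETECTION", "TEXT_DETECTION")

check_feature <- function(feature) {
  # Normalise case and stray whitespace, e.g. " face_detection " -> "FACE_DETECTION"
  feature <- toupper(trimws(feature))
  if (!feature %in% valid_features) {
    stop(sprintf("Unknown feature '%s'. Expected one of: %s",
                 feature, paste(valid_features, collapse = ", ")),
         call. = FALSE)
  }
  feature
}

# Usage (the API call itself needs credentials, so it is commented out):
# getGoogleVisionResponse('dog_mountain.jpg',
#                         feature = check_feature('label_detection'))
```

Failing fast here gives a readable message instead of a retry loop of 400 responses.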

As a side note: the API has other features that aren’t included (yet) in the RoogleVision package, such as “Safe Search”, which identifies inappropriate content, and “Properties”, which identifies dominant colors and aspect ratios. These and a few others can be found on the Cloud Vision website.
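If you want to try one of those features before RoogleVision supports it, the request itself is plain JSON. Here is a minimal sketch, assuming the public v1 `images:annotate` request format (an `image` object holding base64 `content`, plus a list of `features` with `type` and `maxResults`). `build_vision_request` is a hypothetical helper of my own, and it only builds the body; sending it still requires an authenticated POST, which isn’t shown.

```r
library(jsonlite)

# Hypothetical helper: build the JSON body for one images:annotate request.
# It does not send anything; the body would be POSTed (with auth) to
# https://vision.googleapis.com/v1/images:annotate
build_vision_request <- function(image_path, feature, max_results = 5) {
  # Read the image file as raw bytes and base64-encode them
  raw_bytes <- readBin(image_path, what = "raw", n = file.info(image_path)$size)
  body <- list(requests = list(list(
    image = list(content = base64_enc(raw_bytes)),
    features = list(list(type = feature, maxResults = max_results))
  )))
  # auto_unbox = TRUE writes length-1 vectors as JSON scalars, not arrays
  toJSON(body, auto_unbox = TRUE)
}

# Usage:
# build_vision_request('dog_mountain.jpg', 'SAFE_SEARCH_DETECTION')
```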

Label Detection

Label detection helps determine the content of a photo. It essentially adds a layer of metadata to the image.

Here is a photo of our dog when we hiked up to Audubon Peak in Colorado:


dog_mountain_label = getGoogleVisionResponse('dog_mountain.jpg',
                                              feature = 'LABEL_DETECTION')
##            mid           description     score
## 1     /m/09d_r              mountain 0.9188690
## 2 /g/11jxkqbpp mountainous landforms 0.9009549
## 3    /m/023bbt            wilderness 0.8733696
## 4     /m/0kpmf             dog breed 0.8398435
## 5    /m/0d4djn            dog hiking 0.8352048

All 5 responses were incredibly accurate! The “score” that is returned is how confident the Google Vision algorithms are in each label, so the model is roughly 92% confident that a mountain is prominent in this photo (a confidence score rather than a strict probability). I like “dog hiking” the best, considering that’s what we were doing at the time. Kind of a little bit too accurate…
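Because the labels come back as a data frame with `mid`, `description` and `score` columns, as printed above, it’s easy to keep only the confident ones. This sketch copies the printed values into a data frame so it runs without an API call; `confident_labels` and the 0.85 threshold are my own arbitrary choices, not anything RoogleVision provides.

```r
# Labels as printed above (values copied from the output); in practice this
# would be the data frame returned by getGoogleVisionResponse().
dog_labels <- data.frame(
  mid = c("/m/09d_r", "/g/11jxkqbpp", "/m/023bbt", "/m/0kpmf", "/m/0d4djn"),
  description = c("mountain", "mountainous landforms", "wilderness",
                  "dog breed", "dog hiking"),
  score = c(0.9188690, 0.9009549, 0.8733696, 0.8398435, 0.8352048),
  stringsAsFactors = FALSE
)

# Keep only labels at or above a confidence threshold, highest score first
confident_labels <- function(labels, min_score = 0.85) {
  keep <- labels[labels$score >= min_score, c("description", "score")]
  keep[order(-keep$score), , drop = FALSE]
}

confident_labels(dog_labels)
```

With the default threshold, this keeps “mountain”, “mountainous landforms” and “wilderness”.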

Landmark Detection

This feature is designed specifically to pick out recognizable landmarks! It provides the landmark’s position in the image along with its geolocation (latitude and longitude).

My wife and I took this selfie at Linderhof Castle in Bavaria, Germany.

us_castle <- readImage('us_castle_2.jpg')
plot(us_castle)


The response from the Google Vision API was spot on. It returned “Linderhof Palace” as the description. It also provided a score (I reduced the resolution of the image, which hurt the score), a boundingPoly field and locations.

  • Bounding Poly – gives x,y coordinates for a polygon around the landmark in the image
  • Locations – provides latitude,longitude coordinates

us_landmark = getGoogleVisionResponse('us_castle_2.jpg',
                                      feature = 'LANDMARK_DETECTION')
##         mid      description     score
## 1 /m/066h19 Linderhof Palace 0.4665011
##                               vertices          locations
## 1 25, 382, 382, 25, 178, 178, 659, 659 47.57127, 10.96072
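The eight numbers in `vertices` are the four x coordinates followed by the four y coordinates, so the corners here are (25, 178), (382, 178), (382, 659) and (25, 659), with y increasing downward as usual for image coordinates. A small sketch that summarises such a polygon as a box; the values are copied from the output above, and `box_summary` is a hypothetical helper of mine.

```r
# One bounding polygon, rebuilt from the printed output above
castle_vertices <- data.frame(x = c(25, 382, 382, 25),
                              y = c(178, 178, 659, 659))

# Summarise a bounding polygon as an axis-aligned box
box_summary <- function(v) {
  c(width    = max(v$x) - min(v$x),
    height   = max(v$y) - min(v$y),
    center_x = mean(range(v$x)),
    center_y = mean(range(v$y)))
}

box_summary(castle_vertices)
# width = 357, height = 481, centre = (203.5, 418.5)
```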

I plotted the polygon over the image using the coordinates returned. It does a great job (though certainly not a perfect one) of identifying the castle. It’s a bit tough to say what the actual “landmark” would be in this case, since the fountains, stairs and grounds are also a key part of the castle.

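us_castle <- readImage('us_castle_2.jpg')
plot(us_castle)
# x and y coordinates of the landmark's bounding polygon
xs = us_landmark$boundingPoly$vertices[[1]][1][[1]]
ys = us_landmark$boundingPoly$vertices[[1]][2][[1]]
polygon(x=xs,y=ys,border='red',lwd=4)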


Turning to the locations, I plotted them using the leaflet library. If you haven’t used leaflet, start doing so immediately. I’m a huge fan of it due to its speed and simplicity, and there are a lot of customization options you can check out as well.

The location is spot on! While it isn’t a shock to me that Google could provide the location of “Linderhof Castle”, it is amazing that I don’t have to write a web crawler or search function to find it myself! That’s just one of many little luxuries they have built into this API.

# locations: the first value is latitude, the second is longitude
latt = us_landmark$locations[[1]][[1]][[1]]
lon = us_landmark$locations[[1]][[1]][[2]]
m = leaflet() %>%
  addProviderTiles(providers$CartoDB.Positron) %>%
  setView(lng = lon, lat = latt, zoom = 5) %>%
  addMarkers(lng = lon, lat = latt)
m  # print the map

Face Detection

My last blog post showed the OpenCV package and its Haar cascade algorithm in action. I didn’t dig into Google’s algorithms to figure out what is under the hood, but they provide similar results. However, rather than layering in each subsequent step (“find the eyes”, “find the mouth” and so on), the API returns more than you ever needed to know:

  • Bounding Poly = highest level polygon
  • FD Bounding Poly = polygon surrounding each face
  • Landmarks = (funny name) includes each feature of the face (left eye, right eye, etc.)
  • Roll Angle, Pan Angle, Tilt Angle = all of the different angles you’d need per face
  • Confidence (detection and landmarking) = how certain the algorithm is that it’s accurate
  • Joy, sorrow, anger, surprise, underexposed, blurred, headwear likelihoods = how likely it is that each face shows that emotion or characteristic

The likelihoods are another amazing piece of information! I have run about 20 images through this API, and every single one has been accurate. Very impressive!
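The likelihoods come back as text labels rather than numbers. In the Vision API the scale is UNKNOWN, VERY_UNLIKELY, UNLIKELY, POSSIBLE, LIKELY and VERY_LIKELY, so converting them to an ordered factor makes them comparable. A sketch, assuming the likelihood columns hold those strings as returned by the API; `to_likelihood` and `at_least` are hypothetical helpers, and the example values are made up.

```r
# Likelihood values the Vision API uses, from least to most likely.
# "UNKNOWN" is deliberately left out so it becomes NA.
likelihood_levels <- c("VERY_UNLIKELY", "UNLIKELY", "POSSIBLE",
                       "LIKELY", "VERY_LIKELY")

to_likelihood <- function(x) {
  factor(x, levels = likelihood_levels, ordered = TRUE)
}

# TRUE where a likelihood is at least `threshold` (NA for UNKNOWN)
at_least <- function(x, threshold = "LIKELY") {
  to_likelihood(x) >= to_likelihood(threshold)
}

# Made-up values for illustration, not the output for the photos in this post
headwear <- c("VERY_LIKELY", "POSSIBLE", "UNKNOWN")
at_least(headwear)
```

For these made-up values, the result is TRUE, FALSE and NA.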

I wanted to showcase the face detection and headwear first. Here’s a picture of my wife and me at “The Bean” in Chicago (side note: it’s awesome! I thought it was going to be really silly, but you can have a lot of fun with all of the angles and reflections):

us_hats_pic <- readImage('us_hats.jpg')
plot(us_hats_pic)


us_hats = getGoogleVisionResponse('us_hats.jpg',
                                      feature = 'FACE_DETECTION')
##                                 vertices
## 1 295, 410, 410, 295, 164, 164, 297, 297
## 2 353, 455, 455, 353, 261, 261, 381, 381
##                                 vertices
## 1 327, 402, 402, 327, 206, 206, 280, 280
## 2 368, 439, 439, 368, 298, 298, 370, 370
## landmarks ...
##   rollAngle panAngle tiltAngle detectionConfidence landmarkingConfidence
## 1  7.103324 23.46835 -2.816312           0.9877176             0.7072066
## 2  2.510939 -1.17956 -7.393063           0.9997375             0.7268016
##   joyLikelihood sorrowLikelihood angerLikelihood surpriseLikelihood
##   underExposedLikelihood blurredLikelihood headwearLikelihood
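us_hats_pic <- readImage('us_hats.jpg')
plot(us_hats_pic)

# fdBoundingPoly has one tight polygon per detected face
xs1 = us_hats$fdBoundingPoly$vertices[[1]][1][[1]]
ys1 = us_hats$fdBoundingPoly$vertices[[1]][2][[1]]

xs2 = us_hats$fdBoundingPoly$vertices[[2]][1][[1]]
ys2 = us_hats$fdBoundingPoly$vertices[[2]][2][[1]]

polygon(x=xs1,y=ys1,border='red',lwd=4)
polygon(x=xs2,y=ys2,border='green',lwd=4)
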


Here’s a shot that should look familiar (copied directly from my last post). I wanted to highlight the different facial features that can be detected. Look at how many points are perfectly placed:

my_face_pic <- readImage('my_face.jpg')
plot(my_face_pic)


my_face = getGoogleVisionResponse('my_face.jpg',
                                      feature = 'FACE_DETECTION')
##                               vertices
## 1 456, 877, 877, 456, NA, NA, 473, 473
##                               vertices
## 1 515, 813, 813, 515, 98, 98, 395, 395
## landmarks ...
##    rollAngle  panAngle tiltAngle detectionConfidence landmarkingConfidence
## 1 -0.6375801 -2.120439  5.706552            0.996818             0.8222974
##   joyLikelihood sorrowLikelihood angerLikelihood surpriseLikelihood
##   underExposedLikelihood blurredLikelihood headwearLikelihood
## [[1]]
##                            type position.x position.y    position.z
## 1                      LEFT_EYE   598.7636   192.1949  -0.001859295
## 2                     RIGHT_EYE   723.1612   192.4955  -4.805475700
## 3          LEFT_OF_LEFT_EYEBROW   556.1954   165.2836  15.825399000
## 4         RIGHT_OF_LEFT_EYEBROW   628.8224   159.9029 -23.345352000
## 5         LEFT_OF_RIGHT_EYEBROW   693.0257   160.6680 -25.614508000
## 6        RIGHT_OF_RIGHT_EYEBROW   767.7514   164.2806   7.637372000
## 7         MIDPOINT_BETWEEN_EYES   661.2344   185.0575 -29.068363000
## 8                      NOSE_TIP   661.9072   260.9006 -74.153710000
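my_face_pic <- readImage('my_face.jpg')
plot(my_face_pic)

xs1 = my_face$fdBoundingPoly$vertices[[1]][1][[1]]
ys1 = my_face$fdBoundingPoly$vertices[[1]][2][[1]]

# landmarks[[1]] holds one row per facial landmark; its second column is the
# nested position data frame (x, y, z) printed above
xs2 = my_face$landmarks[[1]][[2]][[1]]
ys2 = my_face$landmarks[[1]][[2]][[2]]

polygon(x=xs1,y=ys1,border='red',lwd=4)
points(x=xs2,y=ys2,lwd=2, col='lightblue')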
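In the first bounding box printed for this photo, two y values came back as NA, and an NA coordinate leaves a gap when passed to `polygon()`. My assumption is that these are coordinates of 0 (the top edge of the image) that the API simply left out of its JSON, so this sketch fills missing vertex coordinates with 0. `fill_missing_vertices` is a hypothetical helper, and the values are copied from the output above.

```r
# Vertices of the first bounding box for my_face.jpg, as printed above
face_box <- data.frame(x = c(456, 877, 877, 456),
                       y = c(NA, NA, 473, 473))

# Treat missing coordinates as 0 (assumption: the API omitted them from its
# JSON because their value was 0)
fill_missing_vertices <- function(v) {
  v$x[is.na(v$x)] <- 0
  v$y[is.na(v$y)] <- 0
  v
}

fill_missing_vertices(face_box)$y
# 0 0 473 473
```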


Logo Detection

Continuing with the Chicago trip, we drove by Wrigley Field, and I took a really bad photo of the sign from a moving car while it was under construction. It’s a nice test because it has a lot of different lines and writing, and the Toyota logo isn’t especially prominent or necessarily in its brand colors.

This call returns:

  • Description = Brand name of the logo detected
  • Score = Confidence of prediction accuracy
  • Bounding Poly = (Again) coordinates of the logo

wrigley_image <- readImage('wrigley_text.jpg')
plot(wrigley_image)


wrigley_logo = getGoogleVisionResponse('wrigley_text.jpg',
                                   feature = 'LOGO_DETECTION')
##           mid description     score                               vertices
## 1 /g/1tk6469q      Toyota 0.3126611 435, 551, 551, 435, 449, 449, 476, 476
wrigley_image <- readImage('wrigley_text.jpg')
plot(wrigley_image)
xs = wrigley_logo$boundingPoly$vertices[[1]][[1]]
ys = wrigley_logo$boundingPoly$vertices[[1]][[2]]
polygon(x=xs,y=ys,border='red',lwd=4)


Text Detection

I’ll continue using the Wrigley Field picture. There is text all over the place, and it’s fun to see what is captured and what isn’t. It appears that the curved word “field” at the top isn’t easily interpreted as text. However, the rest is caught, and the words are captured.

The response sent back is a bit more difficult to interpret than the rest of the API calls – it breaks things apart by word but also returns everything as one line. Here’s what comes back:

  • Locale = the detected language of the text (here “en”)
  • Description = the text (the first row is everything, and the rest are individual words)
  • Bounding Poly = I’m sure you can guess by now
wrigley_text = getGoogleVisionResponse('wrigley_text.jpg',
                                   feature = 'TEXT_DETECTION')
##   locale
## 1     en

## description ...
##                                 vertices
## 1   55, 657, 657, 55, 210, 210, 852, 852
## 2 343, 482, 484, 345, 217, 211, 260, 266

wrigley_image <- readImage('wrigley_text.jpg')
plot(wrigley_image)

# Draw one polygon per detected piece of text
for(i in seq_along(wrigley_text$boundingPoly$vertices)){
  xs = wrigley_text$boundingPoly$vertices[[i]]$x
  ys = wrigley_text$boundingPoly$vertices[[i]]$y
  polygon(x=xs,y=ys,border='red',lwd=4)
}
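Since the first row of the text response holds everything and the remaining rows hold individual words, it helps to split the two apart before working with the words. A minimal sketch, assuming a `description` column laid out that way; `split_text_response` is a hypothetical helper, and the words below are placeholders, not the actual Wrigley Field output.

```r
# Hypothetical TEXT_DETECTION result with placeholder words, shaped like the
# output described above: row 1 holds all of the text, later rows one word each
text_result <- data.frame(
  description = c("ALPHA BRAVO CHARLIE", "ALPHA", "BRAVO", "CHARLIE"),
  stringsAsFactors = FALSE
)

split_text_response <- function(res) {
  list(full_text = res$description[1],   # everything, as one string
       words     = res$description[-1])  # one entry per detected word
}

parts <- split_text_response(text_result)
parts$words
```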

That’s about it for the basics of using the Google Vision API with the RoogleVision library. I highly recommend tinkering around with it a bit, especially because it won’t cost you a dime.

While I do enjoy the math under the hood and the thinking required to understand algorithms, I do think these sorts of APIs will become the way of the future for data science. Outside of specific use cases or special industries, it seems hard to imagine wanting to create algorithms that would be better than ones built for mass consumption. As long as they’re fast, free and accurate, I’m all about making my life easier! From a hiring perspective, I much prefer someone who can get the job done over someone who can slightly improve performance (as always, there are many cases where this doesn’t apply).

Please comment if you are utilizing any of the Google APIs for business purposes. I would love to hear about it!

As always, you can find this on my GitHub.


  1. Thanks for the post! Setting numResults does not have an effect for me. Does it work for you?

  2. No problem! Actually, it doesn’t work for me either… I initially used 5, and that’s already the default. However, it doesn’t appear to make any difference. You can open an issue and/or contribute to the RoogleVision repository as well. Good catch.

  3. This looks like good fun. In response to

    “Please comment if you are utilizing any of the Google API’s for business purposes”

    I’ve wrapped the Google Maps API into R (package: googleway), so you can plot a Google Map of the locations too (and everything that comes with it: directions, Street View, elevation, etc.).
    Not taking anything away from leaflet though; I’m a huge fan of that too!

  4. I’ve tested out the Google Maps API – it definitely has some advantages. It’s awesome for doing route planning and that sort of thing that I’ve never been able to get out of Leaflet, but you do sacrifice speed if you want those advanced capabilities.

    I hope someone does something fun with the Natural Language API. I just started writing a small package for Microsoft’s Cognitive Services API – they have a response variable which returns a “caption” for the photo.

  5. Hi! Very impressive and comprehensive. Anyway I tried to execute getGoogleVisionResponse, but I got this error message:

    Request failed [400]. Retrying in 1.1 seconds…
    Request failed [400]. Retrying in 1.7 seconds…
    2017-11-21 11:27:32> Request Status Code: 400
    Error: API returned: Invalid value at ‘requests[0].features[0].type’ (TYPE_ENUM), “FACE DETECTION”

    Do you have ideas why this occurred? Many thanks 🙂
