Microsoft Cognitive Services Vision API in R

A little while ago I did a brief tutorial on the Google Vision API using RoogleVision, created by Mark Edmondson. I couldn’t find anything similar in R for the Microsoft Cognitive Services API, so I thought I would give it a shot. I whipped this example together quickly as a proof of concept, but I could certainly see myself building an R package to support this (unless someone can point me to one – and please do if one exists)!

As a quick example, sending this image retrieved the location of the human face and generated a caption. Here’s my dog lined up next to his doppelganger:

 

[Image: my dog next to his doppelganger, with the detected face and generated caption]

 

The API is extremely easy to access using RCurl and httr. There are a lot of options available; in this example, I’ll just cover the basics of image detection and descriptions.

If you don’t want to spend time writing a bunch of code, you can simply use the “helper_functions.R” file I created and swap in your own credentials and API endpoint to get it working.

Getting Started With Microsoft Cognitive Services

In order to get started, all you need is an Azure account, which is free as long as you stay under certain usage thresholds and limits. There is even a free trial period (at the time this was written, at least).

Once that is taken care of there are a few things you need to do:

  1. Login to portal.azure.com
  2. On the lefthand menu click “Add”
    Figure 1
  3. Click on “AI + Cognitive Services” and then the “Computer Vision API”
    Figure 2
  4. Fill out the information required. You may have “Free Trial” under Subscription. Pay special attention to Location because this will be used in your API script
    Figure 3
  5. In the lefthand menu, click “Keys” underneath “Resource Management” and you will find what you need for credentials. Underneath your Endpoint URL, click on “Show access keys…” – copy your key and use it in your script (do not make this publicly accessible)
    Figure 4
  6. You’re ready to go!

Now, you can write a script to utilize the power of the Microsoft Cognitive Vision API.
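The request code below assumes two variables, api_endpoint_url and api_key, are already defined (this is what the helper file swaps in). A minimal sketch – the region in the endpoint URL and the key below are placeholders, so substitute your own values from the Azure portal:

```r
# Placeholder values -- replace with the Location and key from the
# "Keys" blade of your Computer Vision resource in the Azure portal.
api_endpoint_url = "https://westus.api.cognitive.microsoft.com/vision/v1.0/analyze"
api_key = "YOUR_SUBSCRIPTION_KEY"

# Basic sanity checks before firing off a request
stopifnot(grepl("^https://", api_endpoint_url))
stopifnot(nchar(api_key) > 0)
```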

What data can you get?

There are a lot of features you can request. I’m only asking for: description, tags, categories, and faces. You can also return: image type, color, and adult. There are also details which can be returned such as: landmarks and celebrities.

Here is the setup and API call:

library(httr)  # provides POST(), content_type(), add_headers(), content()

image_url = 'https://imgur.com/rapIn0u.jpg'
visualFeatures = "Description,Tags,Categories,Faces"
# options: Categories, Tags, Description, Faces, ImageType, Color, Adult

details = "Landmarks"
# options: Landmarks, Celebrities

reqURL = paste(api_endpoint_url,
               "?visualFeatures=",
               visualFeatures,
               "&details=",
               details,
               sep="")

APIresponse = POST(url = reqURL,
                   content_type('application/json'),
                   add_headers(.headers = c('Ocp-Apim-Subscription-Key' = api_key)),
                   body=list(url = image_url),
                   encode = "json") 

df = content(APIresponse)

The nested list returned looks messy, but isn’t too bad once you dive in. Take a look:

str(df)
## List of 5
##  $ tags       :List of 6
##   ..$ :List of 2
##   .. ..$ name      : chr "dog"
##   .. ..$ confidence: num 0.987
##   ..$ :List of 3
##   .. ..$ name      : chr "mammal"
##   .. ..$ confidence: num 0.837
##   .. ..$ hint      : chr "animal"
##   ..$ :List of 2
##   .. ..$ name      : chr "looking"
##   .. ..$ confidence: num 0.814
##   ..$ :List of 2
##   .. ..$ name      : chr "animal"
##   .. ..$ confidence: num 0.811
##   ..$ :List of 2
##   .. ..$ name      : chr "posing"
##   .. ..$ confidence: num 0.54
##   ..$ :List of 2
##   .. ..$ name      : chr "staring"
##   .. ..$ confidence: num 0.165
##  $ description:List of 2
##   ..$ tags    :List of 18
##   .. ..$ : chr "dog"
##   .. ..$ : chr "mammal"
##   .. ..$ : chr "looking"
##   .. ..$ : chr "animal"
##   .. ..$ : chr "photo"
##   .. ..$ : chr "posing"
##   .. ..$ : chr "camera"
##   .. ..$ : chr "man"
##   .. ..$ : chr "standing"
##   .. ..$ : chr "smiling"
##   .. ..$ : chr "face"
##   .. ..$ : chr "white"
##   .. ..$ : chr "holding"
##   .. ..$ : chr "close"
##   .. ..$ : chr "wearing"
##   .. ..$ : chr "laying"
##   .. ..$ : chr "head"
##   .. ..$ : chr "teeth"
##   ..$ captions:List of 1
##   .. ..$ :List of 2
##   .. .. ..$ text      : chr "a close up of Albert Einstein and a dog posing for the camera"
##   .. .. ..$ confidence: num 0.892
##  $ requestId  : chr "2143e23a-14c8-47c4-9750-9bfc82381512"
##  $ metadata   :List of 3
##   ..$ width : int 824
##   ..$ height: int 824
##   ..$ format: chr "Jpeg"
##  $ faces      :List of 1
##   ..$ :List of 3
##   .. ..$ age          : int 73
##   .. ..$ gender       : chr "Male"
##   .. ..$ faceRectangle:List of 4
##   .. .. ..$ left  : int 505
##   .. .. ..$ top   : int 241
##   .. .. ..$ width : int 309
##   .. .. ..$ height: int 309

The top 5 description tags returned are:

library(dplyr)   # bind_rows() and the %>% pipe
library(tibble)  # tibble()

description_tags = df$description$tags
description_tags_tib = tibble(tag = character())
for(tag in description_tags){
  if(class(tag) != "list"){  ## Skip any nested caption element; keep plain tags
    tmp = tibble(tag = tag)
    description_tags_tib = description_tags_tib %>% bind_rows(tmp)
  }
}

knitr::kable(description_tags_tib[1:5,])
|tag     |
|:-------|
|dog     |
|mammal  |
|looking |
|animal  |
|photo   |

The caption returned:

captions = df$description$captions
captions_tib = tibble(text = character(), confidence = numeric())
for(caption in captions){
  tmp = tibble(text = caption$text, confidence = caption$confidence)
  captions_tib = captions_tib %>% bind_rows(tmp)
}
knitr::kable(captions_tib)
|text                                                          | confidence|
|:-------------------------------------------------------------|----------:|
|a close up of Albert Einstein and a dog posing for the camera |   0.891846|

The metadata returned:

metadata = df$metadata
metadata_tib = tibble(width = metadata$width, height = metadata$height, format = metadata$format)
knitr::kable(metadata_tib)
| width| height|format |
|-----:|------:|:------|
|   824|    824|Jpeg   |
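As a small aside – since the metadata comes back as a flat named list, the same one-row tibble can be built in a single call with as_tibble() (a sketch, assuming the three fields are always present):

```r
library(tibble)

# Stand-in for df$metadata: a flat named list of length-one elements
metadata = list(width = 824L, height = 824L, format = "Jpeg")

# as_tibble() converts the named list straight into a one-row tibble
metadata_tib = as_tibble(metadata)
```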

The locations of faces returned:

faces = df$faces
faces_tib = tibble(faceID = numeric(),
                   age = numeric(), 
                   gender = character(),
                   x1 = numeric(),
                   x2 = numeric(),
                   y1 = numeric(),
                   y2 = numeric())

n = 0
for(face in faces){
  n = n + 1
  tmp = tibble(faceID = n,
               age = face$age, 
               gender = face$gender,
               x1 = face$faceRectangle$left,
               y1 = face$faceRectangle$top,
               x2 = face$faceRectangle$left + face$faceRectangle$width,
               y2 = face$faceRectangle$top + face$faceRectangle$height)
  faces_tib = faces_tib %>% bind_rows(tmp)
}
#faces_tib
knitr::kable(faces_tib)
| faceID| age|gender |  x1|  x2|  y1|  y2|
|------:|---:|:------|---:|---:|---:|---:|
|      1|  73|Male   | 505| 814| 241| 550|

A few more examples:

[Images: more example photos with detected faces and generated captions]

As always, you can find the code I used on my GitHub.

Side note: I used a ton of for loops to access the data – not ideal. Please let me know if you have better ways of dealing with nested data like this when the number of levels is unknown.
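One possibility, sketched here under the assumption that purrr (and dplyr, which it row-binds with) is available: map_df() can replace the tibble-accumulating loops, and unlist() from base R flattens arbitrarily nested lists when you don’t know the depth. A minimal example on a hand-built list shaped like the API response:

```r
library(purrr)
library(tibble)

# A hand-built stand-in for the nested structure content() returns
df = list(
  faces = list(
    list(age = 73L, gender = "Male",
         faceRectangle = list(left = 505L, top = 241L, width = 309L, height = 309L))
  )
)

# map_df() iterates over the list and row-binds one tibble per face --
# no manual accumulator or counter needed
faces_tib = map_df(df$faces, function(face) {
  tibble(age = face$age,
         gender = face$gender,
         x1 = face$faceRectangle$left,
         y1 = face$faceRectangle$top,
         x2 = face$faceRectangle$left + face$faceRectangle$width,
         y2 = face$faceRectangle$top + face$faceRectangle$height)
})

# For fully unknown nesting, base R can flatten everything to a named vector:
flat = unlist(df, use.names = TRUE)
```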