How to use the Google Vision API

Happy or sad? Cat or person? Use the Google Vision API to detect details about images

Recently, I covered how computers can see, hear, feel, smell, and taste. One of the ways your code can "see" is with the Google Vision API. The Google Vision API connects your code to Google's image-recognition capabilities. You can think of it as a kind of API/REST interface to images.google.com, but it does much more than show you similar images.

Google Vision can detect whether you're a cat or a human, as well as the parts of your face. It tries to detect whether or not you're posed or doing something that wouldn't be okay for Google SafeSearch. It even tries to detect whether you're happy or sad.

Setting up the Google Vision API

To use the Google Vision API, you have to sign up for a Google Compute Engine account. GCE is free to try, but you will need a credit card to sign up. From there, you select a project (My First Project is selected if you have just signed up). Then get yourself an API key from the left-hand menu.
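
If you prefer the command line, you can also select your project and enable the API with the gcloud CLI. This is a sketch assuming you have the Google Cloud SDK installed; my-project-id is a placeholder for your own project ID:

gcloud config set project my-project-id
gcloud services enable vision.googleapis.com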

google vision api screen 1 IDG

Here, I'm using a simple API key that I can use with the command-line tool curl (if you prefer, you can use a different tool able to call REST APIs):

google vision api screen 2 IDG

Save the key it generates to a text file or buffer somewhere (I'll refer to it as YOUR_KEY from now on) and enable the API on your project (go to this URL and click Enable the API):

google vision api screen 3 IDG

Select your project from the next screen:

google vision api screen 4 IDG

Now you're ready to go! Stick this text in a file called google_vision.json:

{  "requests":[     {       "epitome":{       "source":{       "imageUri":       "https://upload.wikimedia.org/wikipedia/eatables/9/9b/Gustav_chocolate.jpg"          }      },        "features": [{          "blazon": "TYPE_UNSPECIFIED",          "maxResults": 50      },        {          "type": "LANDMARK_DETECTION",          "maxResults": 50      },        {          "blazon": "FACE_DETECTION",          "maxResults": l      },        {          "type": "LOGO_DETECTION",          "maxResults": fifty      },        {          "blazon": "LABEL_DETECTION",          "maxResults": 50      },        {          "type": "TEXT_DETECTION",          "maxResults": 50      },        {          "blazon": "SAFE_SEARCH_DETECTION",          "maxResults": 50      },        {          "type": "IMAGE_PROPERTIES",          "maxResults": 50      },        {          "type": "CROP_HINTS",          "maxResults": l      },        {          "type": "WEB_DETECTION",          "maxResults": 50      }     ]    }   ] }

This JSON request tells the Google Vision API which image to parse and which of its detection features to enable. I simply asked for most of them, with up to 50 results each.
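
If you only need one of these features, the request can be trimmed down. Here's a minimal sketch that asks for labels alone on the same image:

{
  "requests": [
    {
      "image": {
        "source": {
          "imageUri": "https://upload.wikimedia.org/wikipedia/commons/9/9b/Gustav_chocolate.jpg"
        }
      },
      "features": [
        { "type": "LABEL_DETECTION", "maxResults": 50 }
      ]
    }
  ]
}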

Now use curl:

curl -v -s -H "Content-Type: application/json" "https://vision.googleapis.com/v1/images:annotate?key=YOUR_KEY" --data-binary @google_vision.json > results

Looking at the Google Vision API response

You should see something like this:

* Connected to vision.googleapis.com (74.125.196.95) port 443 (#0)
* TLS 1.2 connection using TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
* Server certificate: *.googleapis.com
* Server certificate: Google Internet Authority G3
* Server certificate: GlobalSign
> POST /v1/images:annotate?key=YOUR_KEY HTTP/1.1
> Host: vision.googleapis.com
> User-Agent: curl/7.43.0
> Accept: */*
> Content-Type: application/json
> Content-Length: 2252
> Expect: 100-continue
>
* Done waiting for 100-continue
} [2252 bytes data]
* We are completely uploaded and fine
< HTTP/1.1 200 OK
< Content-Type: application/json; charset=UTF-8
< Vary: X-Origin
< Vary: Referer
< Date: Tue, 24 Apr 2018 18:26:10 GMT
< Server: ESF
< Cache-Control: private
< X-XSS-Protection: 1; mode=block
< X-Frame-Options: SAMEORIGIN
< X-Content-Type-Options: nosniff
< Alt-Svc: hq=":443"; ma=2592000; quic=51303433; quic=51303432; quic=51303431; quic=51303339; quic=51303335,quic=":443"; ma=2592000; v="43,42,41,39,35"
< Accept-Ranges: none
< Vary: Origin,Accept-Encoding
< Transfer-Encoding: chunked
<
{ [905 bytes data]
* Connection #0 to host vision.googleapis.com left intact

If you look in the results file, you'll see this:

{  "responses": [    {     "labelAnnotations": [     {     "mid": "/m/01yrx",     "description": "cat",     "score": 0.99524164,     "topicality": 0.99524164     },     {     "mid": "/m/035qhg",     "description": "fauna",     "score": 0.93651986,     "topicality": 0.93651986     },     {     "mid": "/m/04rky",     "description": "mammal",     "score": 0.92701304,     "topicality": 0.92701304     },     {     "mid": "/m/07k6w8",     "description": "small to medium sized cats",     "score": 0.92587274,     "topicality": 0.92587274     },     {     "mid": "/g/0307l",     "description": "true cat like mammal",     "score": 0.9215815,     "topicality": 0.9215815     },     {     "mid": "/g/09686",     "clarification": "vertebrate",     "score": 0.90370363,     "topicality": 0.90370363     },     {     "mid": "/m/01l7qd",     "clarification": "whiskers",     "score": 0.86890864,     "topicality": 0.86890864 …

Google knows you have supplied it a cat picture. It even found the whiskers!
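
To skim those labels without scrolling through the raw JSON, you can filter the results file with a tool like jq, if you have it installed:

jq '.responses[0].labelAnnotations[] | {description, score}' results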

Now, I'll try a larger mammal. Replace the URL in the request with my Twitter profile picture and run it again. It has a picture of me getting smooched by an elephant on my 2014 trip to Thailand.

The results will include locations of my facial features.

…             "landmarks": [             {            "type": "LEFT_EYE",            "position": { "ten": 114.420876, "y": 252.82072, "z": -0.00017215312            }             },             {            "type": "RIGHT_EYE",            "position": { "x": 193.82027, "y": 259.787, "z": -iv.495486            }             },             {            "type": "LEFT_OF_LEFT_EYEBROW",            "position": { "ten": 95.38249, "y": 234.60289, "z": 11.487803            }             }, …

Google isn't as good at judging emotion as it is at finding facial features:

"rollAngle": 5.7688847, "panAngle": -3.3820703, "joyLikelihood": "UNLIKELY", "sorrowLikelihood": "VERY_UNLIKELY", "angerLikelihood": "UNLIKELY", "surpriseLikelihood": "VERY_UNLIKELY", "underExposedLikelihood": "VERY_UNLIKELY", "blurredLikelihood": "VERY_UNLIKELY", "headwearLikelihood": "VERY_UNLIKELY"

I was definitely surprised, because I was not expecting the kiss (I was simply aiming for a selfie with the elephant). The picture may show a bit of joy combined with "yuck," because elephant-snout kisses are messy and a bit slimy.

Google Vision also noticed some other things about the picture and me:

{ "mid": "/m/0jyfg", "description": "glasses", "score": 0.7390568, "topicality": 0.7390568 }, { "mid": "/m/08g_yr", "description": "temple", "score": 0.7100323, "topicality": 0.7100323 }, { "mid": "/m/05mqq3", "description": "snout", "score": 0.65698373, "topicality": 0.65698373 }, { "mid": "/m/07j7r", "description": "tree", "score": 0.6460454, "topicality": 0.6460454 }, { "mid": "/m/019nj4", "description": "grinning", "score": 0.60378826, "topicality": 0.60378826 }, { "mid": "/m/01j3sz", "description": "laughter", "score": 0.51390797, "topicality": 0.51390797 } ] …

Google recognized the elephant snout! It also noticed that I'm grinning and laughing. Note that the lower scores indicate lower confidence, but it's good that the Google Vision API noticed them at all.

… "safeSearchAnnotation": { "adult": "VERY_UNLIKELY", "spoof": "POSSIBLE", "medical": "VERY_UNLIKELY", "violence": "UNLIKELY", "racy": "UNLIKELY"   } …

Google doesn't believe that this is more than a platonic kiss, and it realizes that I'm not being harmed by the elephant.

Aside from this, you'll find things like matching images and similar images in the response. You'll also find topic associations. For example, I tweeted once about a "Xennials" article, and now I'm associated with it!

How is the Google Vision API useful?

Whether you're working in security or retail, being able to figure out what something is from an image can be fundamentally helpful. Whether you're trying to figure out what breed of cat you have, or who a customer is, or whether Google thinks a columnist is influential on a topic, the Google Vision API can help. Note that Google's terms only allow this API to be used in personal computing applications. So whether you're adorning data in a search application or checking whether user-submitted content is racy, Google Vision might be just what you need.

While I used the version of the API that takes public URIs, you can also post raw binary image data or Google Cloud Storage file locations using different permutations of the request.
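
For instance, here are sketches of the two alternative image blocks, with field names taken from the v1 REST reference; the base64 payload and bucket path below are placeholders. For inline data, you send the base64-encoded image bytes:

"image": { "content": "BASE64_ENCODED_IMAGE_BYTES" }

For a file stored in Google Cloud Storage, you point at the object instead:

"image": { "source": { "gcsImageUri": "gs://your-bucket/your-image.jpg" } }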

Author's note: Thanks to my colleague at Lucidworks, Roy Kiesler, whose research contributed to this article.

Copyright © 2018 IDG Communications, Inc.