I admit it: even I, when I read the acronym “CV”, feel a bit confused trying to place it in the context of a talk about innovation… a Curriculum Vitae! A bit like when we read “File” or “Code” in an Italian text and mentally pronounce it the English way, and then laugh at ourselves and our professional bias. We computing people are like that; we are happy with very little.
And anyway, we should be used to it by now: CV (which, of course, stands for “Computer Vision”) is an acronym mentioned more and more often in a great variety of fields; if a few years ago it was still a niche, the preserve of particular applications and very costly systems, it is now spreading through everyday “computing life”.
Obviously, this spread of Computer Vision goes hand in hand with the “commoditization” of artificial intelligence: we have already talked about how technological tools now make it possible to extract content from the most disparate images, but it is worth noting how this field has become accessible to many more users.
In other words, in the past a company using image recognition probably did so because it was its core business, and licensed its solution – which probably covered a few use cases – to its customers, who then embedded it in their applications. Now, however, CV is available as a technology, in various forms and solutions: there are still, of course, the ad-hoc offerings, but the major Cloud providers offer very sophisticated services with which a vast range of applications can be built. At the other end of the spectrum we have open source (and free) libraries, like OpenCV, that anybody can use. In particular, companies devoted to other fields can much more easily adopt what has indeed become an accessible technology, and “bend” it to their own purposes.
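To give an idea of how low the barrier has become, here is a minimal sketch in Python of what “anybody can use” means in practice: a handful of lines with OpenCV and one of the pre-trained Haar cascade classifiers that ship with the library (the image file names are just placeholders):

```python
import cv2

# One of the pre-trained detectors bundled with opencv-python
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

# "photo.jpg" is just a placeholder image
image = cv2.imread("photo.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Draw a rectangle around every face the classifier finds
for (x, y, w, h) in cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5):
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("detected.jpg", image)
```

That is the whole program: no training, no licence fees, just a `pip install opencv-python`.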
It is a great moment, because the potential is enormous. I think many of you have seen Google Lens: built into the camera app of many Android devices, when activated it superimposes on the framed image a series of “active areas” the system recognizes. If I frame my dining room, it recognizes the coffee table (providing a link to Ikea… oh yes, I do have an Ikea coffee table), the water bottle on it (do I wish to buy it online? No, thanks!), my MacBook, and it even lets me select, as on-screen text, the title of a book nearby. Amazing.
Yes, OK, but what about us? We have already started to carry out concrete projects: a vision system built with OpenCV has allowed us to replace the photocells we use in RFID systems to determine whether a gate is handling incoming or outgoing flows; besides the direction, it also tells us whether it is a person or a pallet passing through, so the system can behave accordingly. All this using just a webcam worth a few tens of euros. Extending that to manufacturing is rather simple, but the next step we would like to take is to move into retail.
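The production code is of course more robust than this, but the core idea can be sketched in a few lines of Python with OpenCV: subtract the background, track the largest moving blob, and note when its centre crosses a virtual line; the blob’s area then gives a rough person-versus-pallet discrimination. The camera index, line position and area thresholds below are illustrative assumptions, not our actual values:

```python
import cv2

cap = cv2.VideoCapture(0)                      # camera index: illustrative
subtractor = cv2.createBackgroundSubtractorMOG2()
LINE_X = 320                                   # virtual gate line in pixels (assumption)
PALLET_AREA = 40_000                           # blob area above which we assume a pallet (assumption)
last_cx = None

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Foreground mask of whatever is moving through the gate
    mask = subtractor.apply(frame)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, None)   # default 3x3 kernel removes speckle
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        continue
    blob = max(contours, key=cv2.contourArea)
    area = cv2.contourArea(blob)
    if area < 1_000:                           # ignore noise-sized blobs
        continue
    x, y, w, h = cv2.boundingRect(blob)
    cx = x + w // 2                            # horizontal centre of the blob
    # A sign change relative to the line means the blob just crossed it
    if last_cx is not None and (last_cx - LINE_X) * (cx - LINE_X) < 0:
        direction = "incoming" if cx > last_cx else "outgoing"   # mapping is arbitrary here
        kind = "pallet" if area > PALLET_AREA else "person"
        print(f"{kind} {direction}")
    last_cx = cx

cap.release()
```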
At the moment, .onRetail – our Mass Market Retail solution – recognizes products by reading barcodes or through manual search. There is nothing wrong with this: store inventories are already highly optimized; but various interesting possibilities open up. Framing the front of a price label and, through text recognition, learning what is being inventoried? That is already possible. Framing the product itself and recognizing it directly? That would be a great next step. But also integrating customer-flow detection, and maybe customers’ moods (always taking privacy into account), identifying the store’s “hot areas”, perhaps broken down by time of day or by promotions on a given product category in order to analyze their effectiveness… these are just some of the ideas that could easily be integrated to add value to the context described above.
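As a sketch of the price-label idea: one common way to do the text-recognition step is the Tesseract engine via pytesseract, run on a binarized photo of the label. The file name and the final matching step are assumptions for illustration, not what .onRetail actually implements:

```python
import cv2
import pytesseract  # needs the Tesseract OCR engine installed on the system

# "label.jpg" stands in for a photo of the price label front
image = cv2.imread("label.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Otsu binarization usually helps the OCR engine on printed labels
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Raw text of the label; the product name or code it contains can then
# be matched against the store's item master to know what is inventoried
text = pytesseract.image_to_string(binary)
print(text)
```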