About this sample
Published: Aug 31, 2023
When it comes to storing images, anyone somewhat familiar with computers and the Internet knows that a computer stores an image as millions of pixels and bytes. Computer scientists would say that images are just arrays of bytes. There is no reason to think the computer knows who or what an image depicts, because the image is stored only as bytes whose values vary with the RGB color and transparency of each pixel. This changed completely with the development of computer vision over the last sixty years. Computer vision is a subfield of the rapidly growing field of artificial intelligence. As the name implies, it is the attempt to give machines the human ability to perceive digital images and videos. Sight, something most humans take for granted, is incredibly difficult to replicate, especially on a machine that natively speaks in binary. The effort the problem demands seems warranted, because computer vision has many beneficial applications in both the workplace and personal life.
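The point that an image is only an array of numbers can be made concrete with a short sketch; the pixel values here are illustrative, not taken from any real photo:

```python
# A 2x2 "image" stored as rows of (R, G, B) tuples, each channel a byte
# from 0 to 255. The pixel values are made up purely for illustration.
image = [
    [(255, 0, 0), (0, 255, 0)],      # red pixel, green pixel
    [(0, 0, 255), (128, 128, 128)],  # blue pixel, gray pixel
]

# Flattened, the picture is just a run of numbers; nothing in the data
# itself says what the image depicts.
flat_bytes = [channel for row in image for pixel in row for channel in pixel]
print(flat_bytes)  # 12 values: 4 pixels x 3 channels each
```

Recognizing the content of a picture means recovering meaning from exactly this kind of flat numeric data.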
Computer vision is not natural for computers, and yet it seems necessary for machines in this era of technology. As Richard Szeliski, the founding director of the Computational Photography group at Facebook, explains in his book Computer Vision: Algorithms and Applications, computer vision is so difficult in large part because it is an inverse problem (Szeliski, 2010). This makes it much harder than computer graphics for gaming or computer-generated imagery for movies, both of which also seemed impossible for decades. Those are forward problems, which can be solved by carefully modeling the physics of an object in the physical world and applying that model to the generated images (Szeliski, 2010). Computer vision, by contrast, starts from an actual snippet of the physical world and must describe it. Szeliski illustrated the scale of the difficulty by noting that having computer vision function even at the level of a two-year-old was still an elusive dream when his book was published in 2010 (Szeliski, 2010). In under a decade, computer vision underwent a breakthrough and went well beyond the goal of matching a two-year-old at image interpretation. Computers have essentially become better than humans at recognizing images, which seems unbelievable because imitations are usually worse than the original. Computer vision has come this far in the last few years because of machine learning, and in particular, deep learning.
Computer vision remained so hard over the last couple of decades partly because computer scientists did not yet approach the problem with machine learning. Incorporating machine learning made the problem much more tractable. In 2012, a competition-winning network named AlexNet redefined the state of the art in computer vision (Szegedy, Vanhoucke, Ioffe, Shlens & Wojna, 2016). This network was eventually applied to many computer vision projects and popularized the use of machine learning, and especially deep learning, across the field. Deep convolutional neural networks and other deep learning techniques brought significant improvements to the performance of computer vision. The convolutional neural network architecture is the most widely used because it remains robust when objects shift position within the image or the image is slightly distorted. This is particularly important for many applications of computer vision, such as image recognition, facial recognition, and scanning labels.
Athanasios Voulodimos, an assistant professor researching computer vision in the Department of Informatics and Computer Engineering at the University of West Attica in Greece, notes that deep learning methods have eclipsed basic machine learning techniques, especially in computer vision (Voulodimos, Doulamis, Doulamis & Protopapadakis, 2018). To imitate a human function like perceiving an image, one must also imitate how the human brain functions. Voulodimos and his three coauthors, affiliates of the National Technical University of Athens, state that this is done mostly with the deep convolutional neural network architecture (Voulodimos et al., 2018). This architecture passes the input through three types of layers: convolutional layers, pooling layers, and fully connected layers. The convolutional layers map the input into feature maps the machine can work with, the pooling layers reduce and simplify those feature maps, and the fully connected layers then imitate what happens in the brain when a person sees an image (Voulodimos et al., 2018). Christian Szegedy, a researcher at Google focused on computer vision via deep learning, demonstrated the effectiveness of the architecture in 2016. With a refined form of it, Szegedy and his team reduced the error rate for classifying an image to 21.2% when the network's single best guess must be correct (top-1), and to 5.6% when the correct label need only appear among its five best guesses (top-5) (Szegedy et al., 2016). This is significantly better than computer vision built on earlier machine learning methods, and far better than computer vision in 2010, when a two-year-old could still beat a computer at image classification. In only a few years, the reliability of computer vision on such benchmarks has climbed from roughly 50% before 2012 to nearly 99% today. It will only climb higher as the convolutional neural network architecture is further developed by reducing computation costs and optimizing how feature maps are generated and reduced (Szegedy et al., 2016).
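The convolution and pooling steps described above can be sketched in a few lines of plain Python; the 4x4 input patch, the 2x2 edge-detecting kernel, and the single-layer setup are illustrative assumptions, not the architecture from the cited papers:

```python
# Minimal sketch of a convolutional layer and a max-pooling layer.
# Real systems use optimized libraries and many stacked layers.

def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) to build a feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def max_pool2d(fmap, size=2):
    """Reduce each non-overlapping size x size window to its maximum value."""
    return [
        [
            max(fmap[i + di][j + dj] for di in range(size) for dj in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]

image = [  # a 4x4 grayscale patch with a vertical edge down the middle
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
kernel = [[-1, 1], [-1, 1]]  # responds where brightness rises left-to-right

fmap = convolve2d(image, kernel)  # 3x3 feature map, peaks along the edge
pooled = max_pool2d(fmap)         # pooling keeps the strongest response
print(fmap)    # [[0, 18, 0], [0, 18, 0], [0, 18, 0]]
print(pooled)  # [[18]]
```

Pooling is also what gives the architecture some tolerance to small shifts: the strong edge response survives even if the edge moves a pixel within the pooling window. A fully connected layer would then turn such pooled responses into class scores.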
Given how hard computer vision was to develop, one would expect groundbreaking applications for it. One of the most useful is enabling robots to see and perceive their immediate surroundings, which lets them work more flexibly instead of being constrained to a rigid, pre-programmed script. Weiwei Wan, a research scientist at Japan's National Institute of Advanced Industrial Science and Technology, worked on such a robot with his team in 2016. He wanted to develop an "intelligent robot assembly system using multi-modal vision" that could replicate a series of human actions by watching them through a live 3D camera (Wan, Lu & Harada, 2016). Before deep learning, this would have been impossible; even with it, the task remains hard. According to Wan, the main difficulty is the limited precision of visually detecting human movements, which requires 3D image analysis that is both fast and accurate (Wan et al., 2016). Such a robot assembly system would be remarkably innovative because it could mirror a skilled worker's actions without requiring the months or years of training a human needs to reach the same level. The approach is also general purpose, since it can follow any demonstrated action rather than a preset sequence. Fine-tuned, such a system could become more efficient than even the most skilled human workers on an industrial assembly line.
Another application that has recently skyrocketed in popularity is facial recognition and detection. In particular, Apple uses a facial recognition system called Face ID to provide convenience to users of the iPhone X and later models (Apple, 2017). Facial recognition was previously not feasible because it was inaccurate and slow. With deep learning models, Apple's Computer Vision Machine Learning Team made it practical, encrypting and decrypting the images used for facial recognition as they are sent to and from an image processing unit (Apple, 2017). This keeps the facial recognition process fast, accurate, and memory-efficient at the same time. Encrypting and decrypting the images also protects user privacy, an important concern for computer vision: users want the convenience of facial recognition while keeping their data private. With Face ID, users can unlock their phones without typing a passcode every time, saving time and reducing frustration.
An application of computer vision that may surprise some people is protecting privacy. The common assumption is that better technology means less privacy, because the technology becomes more personalized and that personal information can be sent to databases elsewhere. However, applying computer vision to surveillance can help protect privacy by replacing the humans who analyze surveillance camera footage, an improvement over both having no surveillance and having a human watch the footage. According to Andrew Tzer-Yeu Chen, a member of the Department of Electrical and Computer Engineering at the University of Auckland, the human brain will "inadvertently collect far more information" than the task requires, which is unnecessary for surveillance designed to catch criminals (Chen, Biglari-Abhari & Wang, 2017). Machines with fast and accurate computer vision avoid this problem by doing the same job without human involvement. Chen notes that although this may not be achievable with current technology, it is the best approach because it is privacy-affirming: the footage is viewed entirely by machines and no archives are kept (Chen et al., 2017). Current surveillance, by contrast, mostly relies on human interpreters, keeps archives, and applies no censoring, which is a breach of privacy (Chen et al., 2017). Like all applications of computer vision, and of technology in general, this requires trust in the technology, and much more testing and development is needed before society can have full confidence in it.
Despite being so powerful and broadly applicable, computer vision has one main negative for society: the breach of privacy when it is used incorrectly. An example is a surveillance system that keeps an archive of its footage (Chen et al., 2017), which allows companies and corporations to access that information even when they are not supposed to. Considerable trust and accountability are needed to prevent this.
Computer vision is a subfield of artificial intelligence that has been gaining traction for decades, especially in the most recent one. A breakthrough driven by deep learning architectures allowed previously intractable problems to be solved, opening many opportunities to apply computer vision in the workplace and in personal life. Because computer vision is still relatively new and is being given important roles, it will have to be fine-tuned before society can fully trust it to work as expected.