
The status quo and trend of artificial intelligence chip development

Time: 2018-11-29 11:24:28

        Since the Dartmouth Conference in 1956, research on artificial intelligence (AI) has gone through two waves of development, shaped by factors such as intelligent algorithms, computing speed, and storage capacity, before speech recognition, computer vision, and other fields finally achieved major breakthroughs. The industry generally attributes this breakthrough to three factors: rich data resources, deep learning algorithms, and sufficient computing power. Rich data resources stem from the popularity of the Internet and the massive amounts of information it generates. Machine learning algorithms, represented by deep learning, keep improving in accuracy and robustness, and variants suited to different scenarios are continuously optimized and refined, creating the potential for large-scale commercial application. Sufficient computing power, in turn, is the result of the continued evolution of Moore's Law: high-performance chips have dramatically reduced the time and cost of running deep learning algorithms.

        Although Moore's Law is gradually slowing down, chips remain the hardware foundation driving continuous progress in artificial intelligence technology, and the next 10 years will still be an important period for the development of artificial intelligence chips (AI chips). Facing growing market demand, innovative design concepts and architectural innovations for all kinds of artificial intelligence applications will continue to emerge.

AI chip overview

        There is currently no recognized standard definition of an artificial intelligence chip. The more general view is that any chip used for AI applications can be called an AI chip. By design approach, AI chips fall into three categories: acceleration chips for the training and inference of machine learning algorithms, especially deep neural networks; brain-like biomimetic chips; and general-purpose AI chips that can efficiently execute a wide range of artificial intelligence algorithms.

        To support a variety of AI computing tasks and performance requirements, an ideal AI chip needs highly parallel processing capability, with support for bit-level, fixed-point, and floating-point calculations of various data widths; memory bandwidth several orders of magnitude larger than today's, to hold massive amounts of data; low memory latency; and a novel architecture that enables flexible and rich connectivity between computing elements and memory. All of this must be achieved with very low power consumption and extremely high energy efficiency.

        Algorithms and applications in artificial intelligence are still in a stage of rapid development and iteration. Considering chip development costs and production cycles, a design customized for one specific application, algorithm, or scenario has difficulty keeping up with change. Designing for a specific domain rather than for a specific application will therefore be a guiding principle of AI chip design: a reconfigurable AI chip can serve a broader range of applications and, through reconfiguration, adapt to new AI algorithms, architectures, and tasks.

AI chip types and development

        Carver Mead of Caltech pioneered the study of AI chips: in the 1980s he began researching neuromorphic electronic systems, using analog circuits to mimic the structure of biological nervous systems. After more than 30 years of development, many types of AI chips have emerged, including graphics processing units (GPUs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), application-specific integrated circuits (ASICs), many-core processors, and neuromorphic chips. In recent years, image recognition and speech recognition algorithms based on deep learning have achieved outstanding results, attracting extensive attention from academia and industry. Google's AI Go program AlphaGo defeating Lee Sedol and Ke Jie pushed the enthusiasm for artificial intelligence out to society at large. Google's success is inseparable from the contribution of AI acceleration chips: from the initial AlphaGo built on CPUs plus GPUs to the latest generation AlphaGo Zero using a dedicated tensor processing unit (TPU), the change in chips brought a huge increase in computing speed and a sharp drop in power consumption. Evidently, different types of AI chips offer different advantages for different computing tasks.

AI acceleration chip

        Simply put, an AI acceleration chip accelerates a specific class of algorithms or scenarios on top of an existing chip architecture, optimizing computing speed, power consumption, and cost for that particular scenario. The targets usually include algorithms based on deep neural networks, as well as tasks such as image recognition, video retrieval, speech recognition, voiceprint detection, search engine optimization, and autonomous driving. There are two main design approaches for AI acceleration chips: using existing GPUs, FPGAs, DSPs, or many-core processors for heterogeneous computing, or designing dedicated ASIC chips.

GPU

        A GPU, or graphics processing unit, is a massively parallel computing architecture consisting of a large number of cores, designed to handle many tasks simultaneously. Its original function was to offload graphics display, especially 3D rendering, from the CPU. To perform complex parallel computation and fast graphics rendering, a GPU has far more cores than a CPU, but each core has a relatively small cache and simpler digital logic, making it better suited to computationally intensive tasks. Intel's GPUs are used mainly as integrated graphics for Intel motherboards and CPUs, while Nvidia and AMD lead in discrete graphics.

[Figure: artificial intelligence chip]


        Deep neural network training is extremely computation intensive, and both the data and the operations are highly parallel. The GPU, with its capacity for massive data-parallel operations and abundant floating-point vector compute resources, happens to match the demands of deep learning. It was therefore the first chip adopted for running deep learning algorithms and has become one of the main chips in high-performance computing. However, since a GPU cannot handle complex program logic and control on its own, it still needs a high-performance CPU to form a complete computing system.
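        To make the data-parallelism point concrete, here is a minimal sketch, assuming the PyTorch library and a CUDA-capable GPU are available (an illustrative setup not mentioned in the original article), that times the same dense matrix multiplication, the core operation of neural network training, on the CPU and on the GPU.

# Minimal sketch: the same dense matrix multiply on CPU and GPU.
# Assumes PyTorch is installed and a CUDA GPU is present (illustrative setup,
# not described in the original article).
import time
import torch

def time_matmul(device: str, n: int = 2048) -> float:
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    if device == "cuda":
        torch.cuda.synchronize()   # make sure setup work has finished
    start = time.time()
    c = a @ b                      # the kind of dense linear algebra DNN training is built on
    if device == "cuda":
        torch.cuda.synchronize()   # wait for the asynchronous GPU kernel to complete
    return time.time() - start

cpu_t = time_matmul("cpu")
if torch.cuda.is_available():
    gpu_t = time_matmul("cuda")
    print(f"CPU: {cpu_t:.3f} s, GPU: {gpu_t:.3f} s")
else:
    print(f"CPU: {cpu_t:.3f} s (no CUDA GPU found)")

        On typical hardware the GPU run is far faster, which is exactly the effect the paragraph above describes; the host CPU still orchestrates the program logic around the kernel launches.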

FPGA

        The FPGA is a further development of programmable logic devices such as the PAL, GAL, and CPLD. It appears in the ASIC field as a semi-custom circuit, which both remedies the inflexibility of fully custom circuits and overcomes the limited gate counts of earlier programmable devices. An FPGA computes directly with gate circuits, so it is fast, and users can freely define the wiring between these gates and memory, changing the execution scheme to obtain the best result. FPGAs can be programmed with higher-level languages such as OpenCL, which reduces the difficulty of hardware programming; they also integrate important control functions and system modules, improving application flexibility. Compared with GPUs, FPGAs offer strong computing capability with lower power consumption.
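        As an illustration of the OpenCL programming model mentioned above, the sketch below runs a tiny vector-add kernel; it assumes the pyopencl Python bindings and at least one OpenCL device are available, which is an illustrative setup rather than anything named in the original article. The same kernel source style is what a vendor FPGA OpenCL toolchain would compile into gate-level logic.

# Illustrative only: a tiny OpenCL vector-add kernel, the style of code an FPGA
# OpenCL toolchain compiles into hardware. Assumes the pyopencl package and an
# OpenCL device are available (illustrative setup).
import numpy as np
import pyopencl as cl

KERNEL = """
__kernel void vadd(__global const float *a,
                   __global const float *b,
                   __global float *out) {
    int gid = get_global_id(0);   // one work-item per vector element
    out[gid] = a[gid] + b[gid];
}
"""

ctx = cl.create_some_context()    # pick any available OpenCL device
queue = cl.CommandQueue(ctx)
a = np.random.rand(1 << 16).astype(np.float32)
b = np.random.rand(1 << 16).astype(np.float32)

mf = cl.mem_flags
a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)

program = cl.Program(ctx, KERNEL).build()
program.vadd(queue, a.shape, None, a_buf, b_buf, out_buf)

result = np.empty_like(a)
cl.enqueue_copy(queue, result, out_buf)
assert np.allclose(result, a + b)  # host-side check of the device result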

        Currently, the major FPGA vendors Xilinx and Altera (acquired by Intel) have introduced FPGA hardware and software tools specifically for AI acceleration, and major cloud service providers such as Amazon, Microsoft, and Alibaba Cloud have launched dedicated cloud FPGA instances to support AI applications. In China, Beijing Shenjian Technology Co., Ltd. (DeePhi Tech), recently acquired by Xilinx, also designs FPGA-based deep learning accelerator architectures that can scale flexibly from servers to embedded devices.

DSP

        A DSP is a processor built from large-scale integrated circuits to perform specific signal processing tasks. DSPs excel at measuring, filtering, and compressing continuous real-world analog signals and are widely used in communications and information systems, signal and information processing, automatic control, radar, aerospace, medical equipment, household appliances, and other fields. For filtering, matrix operations, FFT (fast Fourier transform), and other workloads that require large numbers of multiply-add operations, a DSP is equipped with independent multipliers and adders, which greatly increases the operation rate.
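        To show why multiply-accumulate (MAC) throughput dominates such workloads, here is a small sketch, using NumPy in Python purely for illustration since the original article contains no code, of an FIR filter. Every output sample is a chain of multiply-add operations of exactly the kind a DSP's hardware multipliers and adders execute in a single cycle.

# Illustrative sketch of an FIR filter as chains of multiply-accumulate (MAC)
# operations. NumPy is assumed for illustration; a DSP would run the inner
# dot product on dedicated multiplier/adder hardware.
import numpy as np

def fir_filter(x: np.ndarray, taps: np.ndarray) -> np.ndarray:
    """Each output sample is sum(taps[k] * x[n - k]): one MAC per tap."""
    n_taps = len(taps)
    padded = np.concatenate([np.zeros(n_taps - 1), x])
    y = np.zeros_like(x)
    for n in range(len(x)):
        # n_taps multiply-add operations per output sample
        y[n] = np.dot(taps[::-1], padded[n:n + n_taps])
    return y

# Example: smooth a noisy 1 kHz tone with a 16-tap moving-average filter.
fs = 48_000                                  # sample rate in Hz
t = np.arange(0, 0.01, 1 / fs)
signal = np.sin(2 * np.pi * 1_000 * t) + 0.2 * np.random.randn(t.size)
taps = np.ones(16) / 16                      # simple low-pass taps
smoothed = fir_filter(signal, taps)
print(smoothed[:4])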

        There are many kinds of DSPs. Those currently used in the AI field mainly handle image and video tasks in vision systems and are most common in autonomous driving, security monitoring, drones, and mobile terminals. These DSPs incorporate acceleration components tailored for deep neural networks, such as matrix multipliers and accumulators and units for fully connected, activation, and pooling layers. Thanks to their high speed, flexibility, small size, low power consumption, and programmability, DSPs are well suited to end devices such as cell phones and cameras.

Many-core processor

        A many-core processor integrates a large number of processing cores on a single chip and is used primarily in high-performance computing as a coprocessor to the CPU. Many-core processors suit computationally intensive tasks with high degrees of parallelism, such as gene sequencing and weather simulation. The control logic and data types of the computational tasks they support are more complex than those of GPUs. Research in this field has been active since 2000, with examples including IBM's CELL and Kalray's MPPA. Intel's Xeon Phi processor is a typical many-core processor, and its KNL (Knights Landing) generation represents the leading edge of many-core designs.

        The structure of a many-core processor can effectively exploit the high degree of thread-level parallelism in applications such as modern networking and servers. Although chip area and power consumption grow with the number of cores, performance also increases effectively. Techniques such as adding computational units or widening instruction issue enlarge the chip and lengthen signal wires, significantly increasing wire delay. Many-core processors are therefore better suited to AI training and inference tasks deployed in data centers.

ASIC

        An ASIC is a chip custom designed for a specific purpose. It offers higher performance, smaller size, lower power consumption, lower cost in mass production, and better reliability. ASICs are divided into full-custom and semi-custom designs. Full-custom design requires the designer to complete all circuit design, so it demands a great deal of manpower and resources; it is flexible, but development efficiency is low and time cost is high. When the design is done well, a full-custom ASIC can run faster than a semi-custom one. Semi-custom design uses standard logic cells from a library: the designer can select gates, adders, comparators, data paths, memories, and even system-level modules and IP cores from the standard cell library. Because these logic cells have already been laid out, verified, and made reliable, the designer can complete a system design more conveniently.

        In recent years, more and more companies have begun to use ASIC chips to accelerate deep learning algorithms, the most prominent example being Google's TPU. The main modules of the TPU include 24 MB of local memory, 6 MB of accumulator memory, a 256 × 256 matrix multiply unit, nonlinear neuron calculation units, and units for normalization and pooling. The TPU is 15 to 30 times faster than contemporary GPUs or CPUs, with 30 to 80 times better energy efficiency. Chinese companies such as Beijing Cambricon Technology Co., Ltd., Bitmain Technology Co., Ltd., and Beijing Horizon Robotics have also introduced ASIC chips for deep neural network acceleration. At present there is no unified standard for DNN-based algorithms, and the algorithms are still evolving rapidly, so ASIC designs need to retain a degree of programmability and adopt hardware/software co-design.
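        As a rough illustration of how a fixed 256 × 256 matrix multiply unit handles larger workloads, the sketch below, written in plain NumPy as an assumption for this article and not a model of the TPU's actual systolic dataflow, splits a large matrix multiplication into 256 × 256 tiles and accumulates partial products tile by tile.

# Illustrative sketch only: tiling a large matrix multiply into 256x256 blocks,
# the granularity at which a fixed-size MAC array (such as a 256x256 matrix
# unit) consumes operands. This does not model the TPU's real systolic dataflow.
import numpy as np

TILE = 256

def tiled_matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    m, k = a.shape
    k2, n = b.shape
    assert k == k2, "inner dimensions must match"
    c = np.zeros((m, n), dtype=a.dtype)
    for i in range(0, m, TILE):
        for j in range(0, n, TILE):
            for p in range(0, k, TILE):
                # one tile-sized multiply-accumulate step
                c[i:i+TILE, j:j+TILE] += a[i:i+TILE, p:p+TILE] @ b[p:p+TILE, j:j+TILE]
    return c

a = np.random.rand(512, 768).astype(np.float32)
b = np.random.rand(768, 1024).astype(np.float32)
assert np.allclose(tiled_matmul(a, b), a @ b, rtol=1e-3)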

Brain-like biochip

        Today's mainstream concept of the brain-like biomimetic chip is the neuromorphic chip produced by neuromorphic engineering. A neuromorphic chip uses electronic technology to emulate the proven operating principles of the biological brain and build an electronic chip that resembles it, a "bionic electronic brain." Neuromorphic computing mainly refers to implementing neural network models with analog, digital, or mixed-signal VLSI (including new materials or electronic devices that act as neurons or synapses) together with software systems, and using them to build and study intelligent systems. Neuromorphic engineering has evolved into an interdisciplinary field spanning neurobiology, physics, mathematics, computer science, and electrical engineering. Neuromorphic research is being carried out worldwide and has received attention and support from governments, including the American brain program, the European brain project, and China's brain-like computing program. Inspired by the findings of brain structure research, such complex neural networks are characterized by low power consumption, low latency, high-speed processing, and the integration of spatial and temporal information.

        At present, neuromorphic chip design methods fall into non-silicon and silicon approaches. Non-silicon mainly refers to neuromorphic chips built with new materials and devices such as memristors, which are still at the research stage. Silicon designs use analog or digital integrated circuits. Representative analog designs are the ROLLS chip from the Swiss Federal Institute of Technology in Zurich and the BrainScaleS chip from Heidelberg University. Digital designs are divided into asynchronous-synchronous hybrid and purely synchronous circuits: the representative asynchronous (no global clock) digital chip is IBM's TrueNorth, while Tsinghua University's chip series represents purely synchronous digital circuits. In addition, for on-chip self-learning, Intel recently introduced the Loihi chip, which transmits information through pulses or spikes and automatically adjusts synaptic strengths in response to feedback from the environment, enabling self-learning. China's Shanghai Xijing Information Technology Co., Ltd. has also produced chips with on-chip learning capability.

General-purpose AI chip

        Today's AI chips can greatly surpass human capability on certain specific tasks, but their versatility and adaptability fall far short of human intelligence; most are still at the stage of accelerating specific algorithms. The ultimate form of the AI chip will be a general-purpose AI chip, ideally a self-learning, adaptive chip that minimizes manual intervention. A future general-purpose AI chip should therefore have the following features.

1) Programmability: adapt to the evolution of algorithms and the diversity of applications.

2) Dynamic variability of the architecture: It can adapt to different algorithms and achieve efficient calculation.

3) Efficient architecture refactoring or self-learning capabilities.

4) High computational efficiency: avoid inefficient, instruction-driven architectures.

5) High energy efficiency: energy efficiency greater than 5 TOPS/W (that is, 5 × 10^12 operations per second per watt; see the short calculation after this list).

6) Low cost and low power consumption: Ability to enter IoT devices and consumer electronics.

7) Small size: can be integrated into mobile terminals.

8) Easy application development: no need for users to have knowledge of chip design.
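        To make feature 5) concrete, the short calculation below relates a TOPS/W figure to throughput and power draw; the numbers are assumptions chosen for illustration and do not describe any specific chip.

# Worked example of the TOPS/W (tera-operations per second per watt) metric.
# Throughput and power values are assumptions chosen purely for illustration.
throughput_ops_per_s = 10e12    # 10 tera-operations per second
power_watts = 2.0               # 2 W power draw
efficiency_tops_per_w = throughput_ops_per_s / power_watts / 1e12
print(f"{efficiency_tops_per_w:.1f} TOPS/W")   # 5.0 TOPS/W, i.e. 5 x 10^12 ops per second per watt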

        No truly general-purpose AI chip exists yet, and the software-defined chip based on a reconfigurable computing architecture may be the way forward. A software-defined chip, as the name suggests, lets the chip adapt and adjust to the software: put simply, software is mapped onto the hardware through different data paths to realize its functions, so the chip can change its architecture and function in real time according to the requirements of the software, the product, and the application scenario, giving a far more flexible design. A chip built on this architecture adapts its computing power to the needs of the software, instead of forcing the application to fit the rigid architecture of a traditionally designed chip.

        Reconfigurable computing allows the hardware architecture and function to change along with the software, combining the versatility of a processor with the high performance and low power consumption of an ASIC. It is the core of the software-defined chip and is widely regarded as a breakthrough next-generation integrated circuit technology. The Thinker AI chip designed by the Institute of Microelectronics at Tsinghua University uses a reconfigurable computing architecture and supports multiple AI algorithms, including convolutional neural networks, fully connected neural networks, and recurrent neural networks. The Thinker chip realizes a software-defined chip through reconfigurable computing at the following three levels.

        1) Computing array reconfiguration: the computing array of the Thinker chip consists of multiple parallel computing units connected by an interconnect, and each computing unit can be functionally reconfigured according to the basic operators required by the algorithm. Moreover, in complex AI tasks different algorithms have different computational resource requirements, so the Thinker chip supports on-demand partitioning of the computing array to improve resource utilization and energy efficiency.

        2) Memory bandwidth reconfiguration: the on-chip memory bandwidth of the Thinker chip can be reconfigured according to the AI algorithm, and the distribution of data in memory is adjusted as the bandwidth changes. This improves data reuse and computational parallelism, raising throughput and energy efficiency.

        3) Data bit-width reconfiguration: a 16-bit data width is sufficient for the accuracy requirements of most applications, and for some scenarios with modest accuracy demands even an 8-bit data width is enough. To meet the varied precision requirements of AI algorithms, the computing units of the Thinker chip support reconfiguration between high and low (16-bit and 8-bit) data widths: high-bit mode improves calculation accuracy, while low-bit mode increases the throughput of the computing units and thus performance.
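        The sketch below, written in plain NumPy as an illustration for this article rather than a description of the Thinker chip's actual number format, shows the precision trade-off behind 16-bit versus 8-bit data widths: the same weights are quantized at both widths and the resulting error compared.

# Illustrative sketch of the 16-bit vs 8-bit precision trade-off behind
# bit-width reconfiguration. Simple symmetric fixed-point quantization;
# not the Thinker chip's actual arithmetic (an assumption for illustration).
import numpy as np

def quantize(x: np.ndarray, bits: int) -> np.ndarray:
    """Snap values onto a signed fixed-point grid with the given bit width."""
    levels = 2 ** (bits - 1) - 1            # e.g. 127 for 8 bit, 32767 for 16 bit
    scale = np.max(np.abs(x)) / levels
    return np.round(x / scale) * scale      # coarser grid means larger rounding error

weights = np.random.randn(1000)
for bits in (16, 8):
    err = np.abs(weights - quantize(weights, bits)).max()
    print(f"{bits:2d}-bit max quantization error: {err:.6f}")

        The 8-bit grid roughly multiplies the worst-case rounding error by 2^8 compared with 16 bits, which is acceptable for many inference workloads and lets low-bit mode trade precision for throughput.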

        Reconfigurable computing is an important technology for implementing software-defined chips and is well suited to AI chip design. Once it is adopted, the software-defined scope is no longer limited to function: the computational accuracy, performance, and energy efficiency of an algorithm can all be included. Through real-time dynamic configuration, reconfigurable computing realizes hardware/software co-design, giving the AI chip extremely high flexibility and a wide range of applications. The Thinker team's latest Thinker 2 face recognition chip achieves face recognition in 6 ms (the iPhone X takes 10 ms) with an accuracy above 98%, and the Thinker S speech recognition chip consumes only 200 μW, can run for a year on a single AAA battery, and supports voiceprint recognition. The MIT Technology Review commented on the Thinker team's work in a special issue in early 2018, calling it a top achievement from China.

AI chip market status

        The global AI chip market is expected to exceed US$2 billion in 2018. With the entry of Internet giants including Google, Facebook, Microsoft, Amazon, Baidu, Alibaba, and Tencent, the global market is expected to exceed US$10 billion by 2020. China's market, at nearly US$2.5 billion, is growing very rapidly and has enormous room for development. Major chip companies around the world are actively laying out their AI chip strategies. In the cloud, Nvidia's GPU product line is widely used for deep neural network training and inference; Google's TPU is commercially available as the Cloud TPU service, with 180 Tflops of processing capability, 64 GB of HBM memory, and 2400 Gbit/s of memory bandwidth; and the established chip giant Intel has introduced the Nervana Neural Network Processor (NNP), also optimized for neural network computing, with 32 GB of HBM2, 1 Tbit/s of bandwidth, and 8 Tbit/s of access speed. Startups such as Graphcore, Cerebras, Wave Computing, Cambricon, and Bitmain have also joined the competition, launching chips and hardware systems for AI.

        For some applications, however, inference must be performed at edge nodes for reasons such as network latency, bandwidth, and privacy. For example, the inference of an autonomous vehicle cannot be handled in the cloud, because any network delay could have catastrophic consequences; and in a large city, if face recognition for a million high-definition cameras were performed in the cloud, the data transmission would overwhelm the communication network. A considerable share of future artificial intelligence application scenarios will require the terminal device at the edge to have sufficient inference computing power. At present, the computing power of edge processor chips cannot meet the requirement of running deep neural network inference locally. The industry needs specially designed AI chips that give devices enough computing power, at acceptable energy cost, to cope with the growing number of artificial intelligence applications. Beyond performance, power consumption and cost are key constraints that AI chips working at edge nodes must face.

        Smartphones are currently the most widely used edge computing terminal devices. Mobile phone chip makers including Samsung, Apple, Huawei, Qualcomm, and MediaTek have launched or are developing chip products adapted specifically to AI applications. Many startups have also entered the field to provide chip and system solutions for edge computing devices, such as the 1A processor from Beijing Zhongke Cambricon Technology Co., Ltd., the Sunrise processor from Beijing Horizon Information Technology Co., Ltd., and the DPU from Beijing Shenjian Technology Co., Ltd. Traditional IP vendors, including ARM, Synopsys, and Cadence, have likewise developed dedicated IP products for edge computing devices such as mobile phones, tablets, smart cameras, drones, industrial and service robots, and smart speakers. In addition, terminal applications hold a further gold mine in the intelligent Internet of Things: only when AI chips reach the terminal as well as the cloud can "intelligence in everything" truly be realized.

AI chip future trends

        In the AI chip field there is still no general-purpose AI chip comparable to the CPU. For artificial intelligence to become as pervasive as mobile payment, it may need a "killer" application. Whether in image recognition, speech recognition, machine translation, security monitoring, traffic planning, autonomous driving, smart companions, or the intelligent Internet of Things, AI touches every aspect of production and life, but there is still a long way to go before AI applications land and are commercialized at scale. For chip practitioners, studying chip architecture is imperative: software is the core of intelligence, while the chip is the foundation that supports it. In the short term, AI chip development will rely mainly on heterogeneous computing to accelerate the landing of application algorithms; in the medium term, self-reconfiguring, self-learning, adaptive chips should be developed to support evolving algorithms and human-like natural intelligence; in the long term, development should move toward the general-purpose AI chip.

General-purpose AI computing

        The versatility of AI actually has two levels: at the first level, AI can handle arbitrary problems; at the second level, it can handle arbitrary problems at the same time. The goal of the first level is to let AI algorithms handle different problems through different designs, data, and training methods, for example using today's popular deep learning methods to train AI for chess, image recognition, speech recognition, behavior recognition, motion navigation, and so on. However, different tasks are trained independently on different data sets, and once training is complete the model is suitable only for that kind of task and cannot be used for others. In this sense, the algorithm and training method of such AI are general, but the model it trains for a given task is not. The goal of the second level is to allow the trained model to handle multiple tasks at the same time, just as a person can play chess, translate, drive a car, and cook. This goal is far more difficult, and no single algorithm is yet that versatile.

General-purpose AI chip

        A general-purpose AI chip is a chip that can support and accelerate general-purpose AI computation. Research on general AI hopes to capture the essence of intelligence in a unified mathematical model. The current mainstream view is that such a system should have the ability to maximize general utility: that is, it has a general capacity for induction, can approximate any pattern, and can use the patterns it identifies to maximize a utility function. Put colloquially, the system can, through learning and training, accurately and efficiently handle any task that an intelligent agent can handle. General AI faces two main difficulties: generality, covering both algorithms and architecture, and implementation complexity. The gradual failure of Moore's Law and the bottleneck of the von Neumann architecture are two further technical challenges that the general-purpose AI chip must consider. Solving these problems through chip design concepts and architectural innovation alone is not feasible; it also depends on more advanced process technology, new semiconductor materials, new memory devices, and a deeper human understanding of our own brains.

Opportunities and challenges facing AI chips

        At present, the global artificial intelligence industry is still developing rapidly. Its wide industrial distribution provides a broad market for artificial intelligence applications, and fast-iterating algorithms are pushing AI technology quickly toward commercialization. The AI chip is the hardware foundation on which algorithms are implemented and the strategic high ground of the coming artificial intelligence era. However, because current AI algorithms each have their own strengths and weaknesses and perform best only in suitable scenarios, identifying the application domain has become an important prerequisite for AI chip development. Unfortunately, there is no general algorithm that adapts to every application, the "killer" application of artificial intelligence has not yet appeared, and some existing applications are not daily necessities for consumers. Whichever chip company can seize the market's pain points and be the first to land an application will gain a large advantage on the AI chip track.

        Architectural innovation is an unavoidable issue for AI chips. An important question to answer is: will there be an AI processor that stands alongside the general-purpose CPU, and if so, what will its architecture be? If not, then AI chips serving specific applications will exist only as IP cores and will ultimately be integrated into all kinds of SoCs (systems on chips). That in turn raises new problems: chip size and power consumption become important considerations, and traditional chip companies are undoubtedly more experienced in SoC design optimization and engineering implementation than algorithm-centric AI chip startups.

        Judged against the general trend of chip development, AI chips are still at an early stage, and there is huge room for innovation in both research and industrial application. Moving from AI acceleration chips tied to a particular algorithm and application scenario toward general-purpose intelligent chips with greater flexibility and adaptability is the inevitable direction of the technology. Over the next two years the AI chip industry will remain hot and new entrants will keep arriving, but by around 2020 a group of players will be shaken out and the industry will begin to reshuffle. Final success or failure will depend on each company's choice of technical path and on how quickly its products land.
