How to implement high-definition, low-bitrate video coding with an FPGA?

The rapid spread of 3G networks and smartphones has driven the growth of the mobile Internet, creating the conditions for security networks to expand from the LAN to the mobile Internet. Measurements of mobile Internet uplink and downlink bandwidth show that 512 Kbps is an effective and reliable working value. If high-definition video transmission can be achieved within this bandwidth constraint, mobile surveillance applications will become far more widespread. This article introduces the role of the FPGA in realizing high-definition, low-bitrate video coding and how to implement it.

Overview

A security camera's resolution and bitrate are positively correlated. In the high-definition era the bitrate is above 2 Mbps, more than three times that of the earlier D1 era. Transmitting such a large stream over a 100M/1000M local network is not a problem, and the cost of hard disk storage is acceptable; but to transmit high-definition video over the Internet and 3G networks, the demand for a low bitrate becomes critical.

The first issue is uploading high-definition video to the Internet. Currently the most popular and cheapest upload technology is ADSL, with an upload speed of 512 Kbps. Uploading over 3G is also possible: CDMA2000 has a theoretical upload peak of 1.8 Mbps, but since there is a considerable gap between the theoretical peak and the sustained average of a wireless link, the practical rate can be estimated at a few hundred Kbps. The second issue is downloading high-definition video from the Internet to the display terminal. ADSL download speeds are generally above 4 Mbps. Measured at home by downloading files of tens of megabytes over 3G, TD-SCDMA downloads at about 430 Kbps, CDMA2000 at about 720 Kbps, and WCDMA at about 1120 Kbps.

In summary, if high-definition video is to be applied easily and economically over the Internet and 3G networks, an average bitrate of 512 Kbps is suitable. A further problem is that the real-time bandwidth of these networks fluctuates considerably; the lower the average bitrate of the transmitted video, the more dependable its quality in such an environment.

The status quo is that the bitrate of 720p high-definition video is generally above 2 Mbps, and that of 1080p above 4 Mbps. Reducing the bitrate substantially has to be considered from several angles.

H.264 encoder and FPGA

Video compression coding is the most effective way to reduce the bitrate, and H.264 is currently the preferred encoder standard. The H.264 algorithm is complex and employs many techniques to cut the bitrate. A video consists of consecutive frames, and the encoded frames are mainly I frames, P frames, and B frames. An I frame is encoded without reference to other frames, using only predictions from pixels within the frame to reduce the coded bitstream; a P frame uses the current frame and a previous frame as references, applying both intra-frame and inter-frame prediction to reduce the coded bitstream; a B frame uses the current frame plus previous and subsequent frames as references, again applying both intra-frame and inter-frame prediction.

In practice, P frames and B frames contribute most to reducing the bitstream, because in surveillance applications the ratio of P and B frames to I frames can be made large. The effect of B frames is even more pronounced: not only can later reference frames be used to improve prediction accuracy, but the decoded result of a B frame need not serve as a reference, so its coding quality can be lowered slightly to shrink its bitstream well below that of a P frame. A B frame is essentially a P frame that can additionally reference later frames, so in the following we consider only I frames and P frames and discuss the role the FPGA plays in prediction and in quantizing the transform results.
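To make the frame-type discussion concrete, here is a minimal C sketch of the kind of GOP layout a surveillance encoder might use: one I frame followed by a long run of P frames. The function names and the GOP length of 25 are illustrative assumptions, not values from this design.

```c
#include <stdio.h>

/* Illustrative frame types; per the discussion above, only I and P
 * frames are considered here. */
typedef enum { FRAME_I, FRAME_P } frame_type_t;

/* Hypothetical GOP layout: one I frame followed by (gop_len - 1)
 * P frames. Surveillance encoders keep the I-frame interval long
 * because P frames cost far fewer bits. */
static frame_type_t frame_type(int frame_idx, int gop_len)
{
    return (frame_idx % gop_len == 0) ? FRAME_I : FRAME_P;
}

int main(void)
{
    /* With a GOP of 25 at 25 fps, only one I frame is sent per second. */
    for (int i = 0; i < 8; i++)
        printf("frame %d: %s\n", i,
               frame_type(i, 25) == FRAME_I ? "I" : "P");
    return 0;
}
```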

Prediction - the advantage of FPGAs in parallel processing

The prediction methods used for I frames are relatively simple and can also be applied in P and B frames, so all I-frame prediction modes should be fully implemented. P-frame prediction is very complicated, and a large part of the H.264 encoder's workload lies here. The purpose of P-frame prediction is to find the position of the current macroblock in the reference frame (the macroblock can be partitioned into several parts for matching), with a matching precision of 1/4 pixel; an exact match minimizes the coded output.

To reduce the workload, a search over integer pixel positions is generally performed first, followed by final matching at 1/2- and 1/4-pixel precision. For a high search-matching success rate, the number of reference frames, the search range, and the number of match evaluations are all critical. In general, many reference frames or a large search range require a correspondingly large number of match evaluations.

Because of the hardware's real-time and pipelining requirements, P-frame prediction must complete within a fixed unit of time. To fit as many match evaluations as possible into that short period, parallel processing is the only choice, and this is where FPGAs excel: several positions can be matched simultaneously. For a small diamond pattern of 4 or 3 points, the SADs of 3 to 4 points can be computed at the same time, 3 to 4 times faster than point-by-point calculation. Multiple reference frames can also be processed in parallel, obtaining the minimum SAD of each reference frame simultaneously. Parallel processing greatly increases the number of match evaluations, but it also demands a large amount of internal memory and logic, which must be weighed against the overall resources of the design.
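The C sketch below illustrates the SAD computation and one small-diamond search step described above. In software the four candidate SADs are evaluated one after another; on an FPGA they would be four parallel SAD units, which is the source of the 3-4x speedup mentioned. All names are illustrative, and the caller is assumed to keep candidate positions inside the reference frame.

```c
#include <stdint.h>
#include <stdlib.h>

#define MB 16  /* a macroblock is 16x16 luma pixels */

/* Sum of absolute differences between the current macroblock and one
 * candidate position in the reference frame. On an FPGA all 256
 * absolute differences can be computed and summed in parallel. */
static uint32_t sad_16x16(const uint8_t *cur, int cur_stride,
                          const uint8_t *ref, int ref_stride)
{
    uint32_t sad = 0;
    for (int y = 0; y < MB; y++)
        for (int x = 0; x < MB; x++)
            sad += (uint32_t)abs((int)cur[y * cur_stride + x] -
                                 (int)ref[y * ref_stride + x]);
    return sad;
}

/* One small-diamond step: evaluate the 4 neighbours of the current best
 * position. An FPGA can instantiate 4 SAD units and evaluate all four
 * candidates simultaneously. The caller initializes *best_sad with the
 * SAD at (*best_x, *best_y). */
static void small_diamond_step(const uint8_t *cur, int cur_stride,
                               const uint8_t *ref, int ref_stride,
                               int *best_x, int *best_y, uint32_t *best_sad)
{
    static const int dx[4] = { 0, 0, -1, 1 };
    static const int dy[4] = { -1, 1, 0, 0 };
    int best_i = -1;
    for (int i = 0; i < 4; i++) {
        const uint8_t *cand =
            ref + (*best_y + dy[i]) * ref_stride + (*best_x + dx[i]);
        uint32_t s = sad_16x16(cur, cur_stride, cand, ref_stride);
        if (s < *best_sad) { *best_sad = s; best_i = i; }
    }
    if (best_i >= 0) { *best_x += dx[best_i]; *best_y += dy[best_i]; }
}
```

The step is repeated until no neighbour improves on the centre; the resulting integer-pixel position is then refined at 1/2- and 1/4-pixel precision as described above.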

Transform result quantization - QP control on the FPGA

The I frame is the first frame of a video sequence, so it is generally transformed and quantized at higher quality than P and B frames, using a smaller quantization parameter (QP) than they do. However, for regions with few moving objects a larger QP can be considered to reduce the coded output; that is, different regions of an I frame can use different QPs.

A P frame generally carries transformed and quantized results only for the moving macroblocks, and its QP is larger than the I frame's. How much larger can be decided from the motion speed: if motion is slow, the QP can be 2 larger than the I frame's; if motion is fast, 3 to 4 larger. Fast motion already appears slightly blurred to the human eye, so the loss of image quality is not noticeable.

The decoded result of a B frame need not serve as a reference frame, so the QP of a B frame is made larger than that of a P frame to obtain a smaller bitstream.

From the above analysis, not only do the three frame types use different QPs, but the QP also changes with the motion in the scene. Such QP control is quite complex; an ASIC H.264 encoder does not necessarily offer it, whereas it can be implemented on an FPGA.
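A minimal sketch of the QP policy just described, written in C for readability (on the FPGA this would be control logic). Only the +2 and +3~4 P-frame offsets come from the text; the motion threshold and the B-frame offset of +6 are assumptions.

```c
/* Frame types as discussed above. */
typedef enum { FRAME_I, FRAME_P, FRAME_B } frame_type_t;

/* Illustrative QP policy. i_frame_qp is the (smallest) QP used for
 * I frames; motion_speed and fast_threshold are in arbitrary units
 * produced by the motion analysis. */
static int select_qp(frame_type_t type, int i_frame_qp,
                     int motion_speed, int fast_threshold)
{
    switch (type) {
    case FRAME_I:
        return i_frame_qp;  /* highest quality, smallest QP */
    case FRAME_P:
        /* +2 for slow motion; +3..4 for fast motion, where motion blur
         * already masks the extra quantization loss. */
        return (motion_speed < fast_threshold) ? i_frame_qp + 2
                                               : i_frame_qp + 4;
    case FRAME_B:
        /* Never used as a reference, so it tolerates an even larger QP;
         * the +6 offset here is only an assumption. */
        return i_frame_qp + 6;
    }
    return i_frame_qp;
}
```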

ISP - filtering and sharpening on the FPGA

The ISP performs video image processing before encoding. Filtering strength and sharpening strength have a large influence on the bitrate. There are many filtering algorithms, and those that remove noise while preserving edges are very complicated; an FPGA makes it convenient to implement different algorithms and different strengths of filtering and sharpening.
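As a concrete illustration, here is a minimal unsharp-mask-style sharpener in C with an adjustable strength, the kind of per-pixel operation an ISP pipeline performs. It is a sketch, not the filter actually used in this design; the Q4 fixed-point gain is an assumption chosen to mirror FPGA integer arithmetic. Stronger sharpening raises high-frequency content and therefore the bitrate; stronger smoothing lowers it.

```c
#include <stdint.h>

static uint8_t clamp_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Unsharp mask: output = input + strength * (input - lowpass), using a
 * 3x3 box blur as the low-pass estimate. strength_q4 is a fixed-point
 * gain in Q4 format (16 == 1.0). Border pixels are left for the caller
 * to copy through. */
static void sharpen_3x3(const uint8_t *src, uint8_t *dst,
                        int w, int h, int strength_q4)
{
    for (int y = 1; y < h - 1; y++) {
        for (int x = 1; x < w - 1; x++) {
            int c = src[y * w + x];
            int blur = 0;
            for (int dy = -1; dy <= 1; dy++)
                for (int dx = -1; dx <= 1; dx++)
                    blur += src[(y + dy) * w + (x + dx)];
            blur /= 9;
            dst[y * w + x] = clamp_u8(c + (strength_q4 * (c - blur)) / 16);
        }
    }
}
```

On an FPGA the 3x3 window comes from two line buffers, and the whole operation fits naturally into a streaming pipeline, which is why varying the algorithm or strength is convenient there.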

Intelligent analysis - motion and image analysis on the FPGA

Intelligent analysis has two main functions: motion analysis and image analysis. Motion analysis extracts moving objects, such as people and vehicles, from the video; image analysis determines the distribution of moving and still regions, so that the bitrate can be reduced in the still regions. A typical ASIC offers only motion analysis, while an FPGA can implement both functions at the same time.
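A minimal sketch of block-level image analysis by frame differencing, which yields the moving/still region map described above. The 16x16 block size matches the macroblock grid; the threshold value is purely an assumption.

```c
#include <stdint.h>
#include <stdlib.h>

/* Mark each 16x16 block whose summed absolute difference against the
 * previous frame exceeds a threshold. Still blocks (map value 0) can
 * then be encoded with a larger QP or skipped. Assumes w and h are
 * multiples of 16. */
static void motion_map(const uint8_t *cur, const uint8_t *prev,
                       int w, int h, uint8_t *map /* (w/16)*(h/16) */)
{
    int bw = w / 16, bh = h / 16;
    for (int by = 0; by < bh; by++) {
        for (int bx = 0; bx < bw; bx++) {
            uint32_t diff = 0;
            for (int y = 0; y < 16; y++)
                for (int x = 0; x < 16; x++) {
                    int idx = (by * 16 + y) * w + bx * 16 + x;
                    diff += (uint32_t)abs((int)cur[idx] - (int)prev[idx]);
                }
            /* Threshold of 2048 (avg. 8 per pixel) is an assumption. */
            map[by * bw + bx] = (diff > 2048) ? 1 : 0;
        }
    }
}
```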

Application of the FPGA in intelligent adaptation

The factors affecting the bitrate mainly include: changes in scene content, video resolution, frame rate, quantization parameter QP, filtering strength, sharpening strength, and image-analysis sensitivity. When the scene content changes, the bitrate changes with it; to keep the bitrate stable at its target, the other parameters must be adjusted at the same time. This is intelligent adaptation. It requires a complex control strategy, and the parameters being adjusted are distributed across the ISP, intelligent analysis, H.264 encoding, and other stages, with high real-time demands, which makes it very suitable for FPGA implementation.
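A minimal sketch of the feedback principle behind intelligent adaptation, adjusting only QP toward the 512 Kbps target. A real controller would also adjust filter strength, frame rate, and analysis sensitivity as described above; the step sizes and the dead-band of 64 Kbps are assumptions.

```c
#define TARGET_KBPS 512

/* Nudge QP up when the measured bitrate overshoots the target and down
 * when it undershoots, within the H.264 legal QP range of 0..51 (the
 * lower bound of 10 here is an assumed quality floor). */
static int adapt_qp(int qp, int measured_kbps)
{
    if (measured_kbps > TARGET_KBPS + 64 && qp < 51)
        qp++;   /* too many bits: quantize more coarsely */
    else if (measured_kbps < TARGET_KBPS - 64 && qp > 10)
        qp--;   /* headroom available: improve quality */
    return qp;
}
```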

FPGA selection

H.264 encodes in macroblock units, which inevitably involves macroblock input, output, and buffering during processing. One macroblock is 384 bytes of data (256 bytes of luminance and 128 bytes of chrominance). If input/output and processing are to run in parallel, a double buffer must be provided, i.e. 768 bytes, so a 1 KB storage block just meets the requirement. Reference-frame storage may hold macroblocks from multiple reference frames, requiring multiple memory blocks. The ISP often needs to buffer one row of pixels; a 1080p row is 1920 pixels, requiring a 2 KB storage block.
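The buffer sizes quoted above follow directly from 4:2:0 sampling; the arithmetic is worked out in the short sketch below.

```c
#include <stdio.h>

/* Worked arithmetic for the buffer sizes quoted above (4:2:0 sampling). */
enum {
    MB_LUMA_BYTES   = 16 * 16,    /* 256: 16x16 Y samples               */
    MB_CHROMA_BYTES = 2 * 8 * 8,  /* 128: one 8x8 Cb plus one 8x8 Cr    */
    MB_BYTES        = MB_LUMA_BYTES + MB_CHROMA_BYTES,  /* 384          */
    MB_DOUBLE_BUF   = 2 * MB_BYTES, /* 768: ping-pong buffer so I/O and
                                       processing run in parallel; fits
                                       a 1 KB memory block              */
    LINE_1080P      = 1920        /* one luma row; fits a 2 KB block    */
};

int main(void)
{
    printf("macroblock: %d B, double-buffered: %d B, 1080p line: %d B\n",
           MB_BYTES, MB_DOUBLE_BUF, LINE_1080P);
    return 0;
}
```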

From the above analysis, the FPGA for H.264 encoding and image processing needs internal memory organized as small blocks (around 1~2 KB) in as large a number as possible; in addition, as many multipliers as possible are desirable.

Conclusion

As the discussion above shows, reducing a video's coded bitrate is a systematic effort involving many stages, and the H.264 encoder in particular offers much room for work. At present we use a Cyclone IV EP4CE115 to achieve 1280×720 at 25 fps with an average bitrate below 512 Kbps, with the H.264 encoding grade at Main Profile with CABAC. As FPGA technology advances and FPGA resources grow, the prediction of moving macroblocks will become ever more accurate and the bitrate ever smaller. Next, we plan to use a Cyclone V to achieve 1920×1080 at 25 fps with an average bitrate below 1024 Kbps.
