1. Introduction

The mobile Internet industry in China has developed rapidly over the last decade, driving a quick expansion and upgrade of IT infrastructure in all industries.

The average bandwidth in mainland China reached 52M in 2017, exceeding that of many developed countries.
The number of devices with mobile Internet access surpassed 1.4 billion, more than 60% of which had access to the 4G network.
As 5G networks approach, the content and display methods of videos are becoming increasingly diversified. In addition to basic scenarios such as live broadcasting and VOD, new trends like short videos and real-time audio/video interactions are emerging. Meanwhile, new video technologies will be applied to fields like security protection, health care, education, the judiciary, and broadcasting.

This article introduces the major technical difficulties and challenges that arise when video applications are upgraded across industries, proposes five key essentials and relevant reference standards for building the new generation of video cloud, and describes how an intelligent video cloud helps clients accelerate their upgrades with more convenient services at lower costs. In the future, video application scenarios will probably become a core part of how a company delivers its products or markets its services. The scalability of video also matches the Matthew effect of the Internet. Companies therefore need to prepare in advance for massive volumes of rich media materials to avoid data disorder.

2. New Scenarios in the Video Era

2.1 Security Protection and Surveillance

According to a survey by Markets&Markets, the global video surveillance market will grow at a compound annual growth rate of 15.4% between 2017 and 2022, reaching $75.6 billion in 2022. Video surveillance can be applied to a wide range of scenarios, including:

Road Traffic Surveillance
Urban Safety Surveillance
Public Area Surveillance
Household Security Surveillance

In the past two years, public institutions such as kindergartens and schools have placed greater demands on surveillance, for example:

Real-time surveillance via both Intranet and Internet.
Long-term storage of surveillance footage and real-time playback.

For road traffic and urban safety surveillance, in addition to the traditional monitoring of vehicle violations, pedestrian violations are gradually being added to the surveillance system, for example:

Recognition of pedestrian violations from images.
Real-time face recognition for instant identification.
Automatic capture and upload of proof of violations to the cloud.

In summary, the video surveillance industry is now facing an upgrade. The major challenges are how people can stably access video surveillance over the public network, and how the massive volume of images and videos generated can be preserved, analyzed and retrieved.

2.2 Online Education

Online education has become unprecedentedly popular in recent years. Internet audio and video technologies remove the constraints of time and space on delivering high-quality educational resources. This takes the following forms:

Live Classroom: Classes are taught live, and teachers can provide online tutoring in a more natural manner.
Real-time Interaction: Low-latency video transmission helps online teachers communicate with students in real time. With live chat room functions such as text, audio, image and user-defined messages, teaching objectives can be achieved more easily.
Video On-demand and Replaying: With recording in the cloud and online replay, students can watch recorded lessons at any time.

Eliminating lag during live broadcasting and further reducing delays in video interactions, so as to improve the experience of teachers and students, has become a critical issue. With the development of artificial intelligence, more attention is being paid to how AI can take video technologies further in online education, for example:

Intelligent Video Tags: Set rewards such as trophies and flowers according to students’ class performance. Capture and save footage of students’ excellent performance and share highlight videos with parents.
Intelligent Video Recommendation: Intelligently recommend study materials to students, including video lessons, notes, exercises, explanations, and tests.

2.3 New Media Broadcasting

To provide an attractive and interactive live experience for their audience, media platforms at all levels should not only work on content but also pay attention to presentation and ways of interaction. The traditional broadcasting scheme has clear limits:

Once connected to the Internet, broadcasting has to mix not only audio and video streams but also new data streams, such as shared slideshows or other documents, and switch between multiple streams in real time, which the traditional scheme handles poorly.
Transmission over dedicated routes impedes progress toward the three-in-one network (the convergence of telecom, broadcast and Internet networks).
Resolution and bit rate remain low while 4K televisions are becoming increasingly popular.
Linear broadcasting and the lack of replay make interaction monotonous.
Video content analysis is lacking because data collection is inaccurate: it relies on reference data such as audience ratings sampled from fixed groups of the population.

Faced with these limits, the broadcasting industry urgently needs brand-new video systems that give customers a video entertainment experience with high image quality, vivid interactions, and measurable, precise data management.

For live broadcasting and directing, audio and images can be mixed and switched in the cloud for quick directing.
For media asset management, staff can intelligently edit, audit and catalog audio and video, so that the whole process of producing, auditing and managing gathered content is handled intelligently and production efficiency increases greatly.
For content operation, big data capabilities and algorithms can be used to manage clients’ behavior data in the form of tags and to distribute content and advertisements from all angles, increasing advertising value.
For broadcasting to devices, 4K UHD video transmission, with its transport layer on the broadband network, enables user-defined time shifting and replay on TV screens as well as cross-screen or multi-screen interactions in open scenarios, making viewing more convenient and entertaining for end users.

New media broadcasting urgently needs solutions to several difficulties: how to switch broadcast-directing content in real time, how to ensure real-time transmission of media content, how to maximize the advertising value of media, and how to produce high-quality programs at low cost.

2.4 Intelligent Courts

Since 1 July 2016, all public hearings of the Supreme People's Court have been broadcast live online, and all live videos have been stored for the public to watch online. By March 2018, more than 660 thousand hearings across the country had been broadcast live, and the broadcasts had been viewed around 5 billion times. Intelligent courts make full use of advanced information technologies, such as the Internet, big data, cloud computing and AI, to support online services in all areas, lawful disclosure throughout the whole process, and intelligent services from all angles:

Based on videos and documents, AI computer vision technology can read and analyze electronic files, extract important elements and sort them by tagging. For example, files concerning criminal motive, time and tool can be tagged with different colors and then compared and cross-analyzed.

Intelligent courts place more demanding requirements on the reliability of video infrastructure, such as how to ensure real-time transmission of live videos of public hearings, how to store massive volumes of live video for VOD and replay, and how to conduct intelligent analysis of extensive video content.

2.5 Telehealth

Medical resources are still distributed unevenly across regions in China. Medical experts can conduct cross-regional interactive consultations through online live broadcasting and real-time audio and video:

Clinical Interactive Consultation: With a video conference system, patients can communicate “face to face” with doctors, and doctors can address patients’ problems in real time.
Remote Image Consultation: Patients can communicate with doctors through the main video stream and at the same time send medical records and data, including radiographic examination images, pathologic examination images, ECG, blood pressure, laboratory test reports and stored videos, to simulate a real consultation.
Remote Medical Training: With a remote medical training system, experts can give lectures to share the latest medical information and their diagnosis and treatment experience and answer difficult questions, which helps the doctors attending the training update their approach to diagnosis and treatment and improves the overall professional skills of subordinate hospitals.

Since its appearance, telehealth has been widely applied. However, it now faces major challenges, such as how to improve video transmission performance and how to ensure quick access for families, primary healthcare institutions, and urgent outdoor situations.

During the upgrade, all of the above industries face enormous challenges in both technology and resources, and few companies are able to build effective video services of their own in a short time. Knowing how to choose and use video services on public clouds to quickly meet business upgrade goals is therefore critical.

3. Key Essentials for Intelligent Video Cloud

To meet the needs and challenges of all industries in the video era, intelligent video clouds should have the following five essentials:

Stable Network Transmission and Dispatch: Live broadcasting delay should be no more than 1 s, and interactive broadcasting delay should be no more than 150 ms.
Extensible and Massive Storage Services: PB-level scalability at the business layer, with no operations work required.
Editing and Processing of Media in Clouds: independent computing instances available instantly for editing user-defined media.
Intelligent Analysis of Video Contents: a recognition rate for video content higher than 95%.
Perfect Permission Control: ban illegal copying completely and block unauthorized access in seconds.

Essential One: Stable Network Transmission and Dispatch: Smooth Watching Experience and Interactions with Low Delays

CDN Optimization: Integrate and optimize the traditional CDN and set up high-quality global nodes to accelerate the dispatch of live and VOD content, further improving the playback experience with instant start-up and low delays.
LiveNet: Establish a global LiveNet based on global nodes to cope with complicated network environments, expensive cross-operator services and obsolete infrastructure in remote areas. With SDN, candidate routes are combined dynamically, and the best route is selected and dispatched.
Improved Client-side SDKs: Intelligent video clouds provide a full suite of SDKs, covering streaming, short videos and players, to help users produce, edit and consume videos and to reduce the difficulty and time cost of developing mobile apps.
With the latest coding and decoding techniques and transmission protocols, such as P2P technology, H.265 and the QUIC protocol, demands on bandwidth and on transmission network quality can be reduced, along with lagging rates.
The standard WebRTC protocol stack is supported to reduce end-to-end delays and provide audio and video interaction with latency in the hundreds of milliseconds.
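
The dynamic route selection described for LiveNet can be pictured as a lowest-latency path search over measured links. The following is only a minimal sketch under stated assumptions: the node names and latencies are hypothetical, and Dijkstra's algorithm is used for illustration. The source does not say which algorithm the actual scheduler uses.

```python
import heapq

def best_route(links, source, target):
    """Return (total_latency_ms, path) for the lowest-latency route, or None.

    links maps each node to {neighbor: latency_ms}. Plain Dijkstra search.
    """
    queue = [(0, source, [source])]
    settled = set()
    while queue:
        latency, node, path = heapq.heappop(queue)
        if node == target:
            return latency, path
        if node in settled:
            continue  # a cheaper route to this node was already expanded
        settled.add(node)
        for neighbor, cost in links.get(node, {}).items():
            if neighbor not in settled:
                heapq.heappush(queue, (latency + cost, neighbor, path + [neighbor]))
    return None

# Hypothetical latency measurements between relay nodes, in milliseconds.
LINKS = {
    "ingest": {"relay_a": 40, "relay_b": 15},
    "relay_a": {"edge": 20},
    "relay_b": {"relay_a": 10, "edge": 60},
    "edge": {},
}

route = best_route(LINKS, "ingest", "edge")
```

In this toy topology the indirect route through both relays is cheaper than either direct hop, which is the kind of decision a centrally controlled overlay can make and static routing cannot.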

Essential Two: Extensible and Massive Storage Services: Data Security with High Reliability and Easy Extensibility

Stable and Reliable Object Storage: With techniques such as erasure-coded storage and replica redundancy across data centers, this service provides data reliability of nearly 100% and keeps annual service unavailability below 30 s, ensuring high availability of the stored data.
Technical Framework with Easy Extensibility: Focus on business growth without worrying about capacity. The storage system supports adding storage nodes dynamically as storage needs change and can expand capacity dynamically at PB scale.
Edge Computing and Edge Storage: Computing and storage can be handled on near-end devices close to the data sources. There is no need to send data back to the cloud for real-time processing, which reduces the load on the cloud platform, greatly increases efficiency and reduces delays, making the edge an effective supplement to and optimization of the cloud platform.
Low-frequency Storage: Infrequently accessed data can be stored in the cloud with high throughput, high durability and low access delays, greatly reducing enterprises’ operating costs for massive data storage, with costs 60% lower than the traditional scheme and access delays of less than 50 ms.
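
Two of the storage claims above can be made concrete with a short sketch. The first function converts the 30 s annual downtime figure into an availability ratio. The second illustrates the idea behind erasure coding with the simplest possible scheme, a single XOR parity shard. This is an assumption chosen for brevity: production object stores typically use Reed-Solomon codes with several parity shards, and the source does not describe the actual coding scheme.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s, ignoring leap years

def availability(downtime_seconds_per_year):
    """Fraction of the year during which the service is reachable."""
    return 1 - downtime_seconds_per_year / SECONDS_PER_YEAR

def xor_parity(shards):
    """Byte-wise XOR of equally sized shards."""
    parity = bytearray(len(shards[0]))
    for shard in shards:
        for i, byte in enumerate(shard):
            parity[i] ^= byte
    return bytes(parity)

# Three data shards plus one parity shard (illustrative payload).
data_shards = [b"vide", b"o-cl", b"oud!"]
parity = xor_parity(data_shards)

# Simulate losing shard 1 and rebuild it from the survivors and the parity.
recovered = xor_parity([data_shards[0], data_shards[2], parity])
```

With 30 s of downtime per year the availability comes out slightly above 99.9999%. The parity scheme stores four shards for three shards of data, compared with the threefold overhead of keeping three full replicas, but it tolerates only one lost shard.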

Essential Three: Editing and Processing of Media in Clouds: Fast and Multi-functional Video Editing in Clouds

Fast and Lightweight Processing: Video editing is, in essence, computing on and processing the data in videos. By deploying flexible containerized platforms, the utilization rate of physical resources in video editing can increase from 40% to over 70%. Delivery efficiency increases five-fold, and the business can respond to emergencies in seconds.
Full Multimedia Editing Capabilities: Multimedia processing services such as video transcoding, screenshots, watermarks, rotation and cropping are offered to satisfy real-time program production scenarios such as live and VOD, producing instant effects and broadcast-quality output.
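
As an illustration of the processing operations listed above, the sketch below builds command lines for the open-source ffmpeg tool. Using ffmpeg is an assumption made for the example; the source does not name the engine behind the cloud service. The functions only assemble argument lists and do not run anything.

```python
def transcode_command(src, dst, rotate_clockwise=False, crop=None):
    """Build an ffmpeg argument list for H.264 transcoding.

    crop is an optional (width, height, x, y) tuple in pixels.
    """
    filters = []
    if rotate_clockwise:
        filters.append("transpose=1")  # rotate 90 degrees clockwise
    if crop is not None:
        width, height, x, y = crop
        filters.append(f"crop={width}:{height}:{x}:{y}")
    command = ["ffmpeg", "-i", src]
    if filters:
        command += ["-vf", ",".join(filters)]
    # Re-encode video as H.264 and copy the audio stream unchanged.
    command += ["-c:v", "libx264", "-c:a", "copy", dst]
    return command

def screenshot_command(src, dst, at_seconds):
    """Build an ffmpeg argument list that grabs one frame at the given time."""
    return ["ffmpeg", "-ss", str(at_seconds), "-i", src, "-frames:v", "1", dst]

# Hypothetical file names, for illustration only.
cmd = transcode_command("in.mp4", "out.mp4", rotate_clockwise=True, crop=(640, 360, 0, 0))
shot = screenshot_command("in.mp4", "cover.jpg", 5)
```

On a machine with ffmpeg installed, either list could be passed to `subprocess.run`. In a containerized platform, each such command would typically run in its own short-lived instance.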

Essential Four: Intelligent Analysis of Video Contents: Maximized Values behind Videos with AI

Intelligent Recognition of Multimedia Content: Functions include content auditing, OCR, scenario recognition, facial recognition, audio and video processing, and image processing. Faced with data processing requests that grow daily, the elastic content recognition platform also keeps servers from being overloaded.
Data Sorting on the Deep Learning Platform: A high-performance deep learning platform framework can easily handle routine work such as writing iterative training scripts, adding, deleting and managing new data, incremental and iterative learning, building a semi-supervised labeling system, and comparing and integrating models. Overall, repetitive work can be reduced by 70%.
The knowledge repository system for massive media resources is made up of three modules: video structuration, knowledge graphs and big data retrieval.

In the video structuration module, the basic elements and contents of videos are collected and sorted, and linear videos are divided into components that can be used separately. Knowledge graphs organize the information produced by video structuration, such as events, figures, objects and scenarios, and store and present it for easy retrieval and association. Building on these two, the big data retrieval module offers highly efficient retrieval of massive media resources and content. Fast video retrieval is offered by the characteristics of figures, faces, images and videos, and even by more complex combined structures.
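
One way to picture how structured video becomes searchable is an inverted index from tags to time-coded segments. The sketch below is a simplified illustration only: the `Segment` type, tag names and video IDs are hypothetical, and a real system would add a knowledge graph and a distributed retrieval engine on top.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Segment:
    """A time-coded piece of a linear video with its recognized tags."""
    video_id: str
    start_s: float
    end_s: float
    tags: frozenset

def build_index(segments):
    """Map each tag to the set of segments that carry it."""
    index = defaultdict(set)
    for segment in segments:
        for tag in segment.tags:
            index[tag].add(segment)
    return index

def search(index, *tags):
    """Return segments carrying every requested tag, ordered by video and start time."""
    if not tags:
        return []
    matches = set.intersection(*(index.get(tag, set()) for tag in tags))
    return sorted(matches, key=lambda s: (s.video_id, s.start_s))

# Hypothetical output of the structuration step.
SEGMENTS = [
    Segment("news-01", 0.0, 12.5, frozenset({"anchor", "studio"})),
    Segment("news-01", 12.5, 40.0, frozenset({"reporter", "street", "car"})),
    Segment("match-07", 5.0, 9.0, frozenset({"goal", "stadium"})),
]
index = build_index(SEGMENTS)
hits = search(index, "reporter", "car")
```

Because each hit carries a start and end time, a query returns the relevant part of the video rather than the whole file.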

Essential Five: Perfect Permission Control: Ban on Illegal Copy and Theft

Perfect Anti-theft Mechanism: Whether for live or VOD, access to video content needs a robust anti-theft mechanism. Common anti-theft methods include the referrer anti-theft service, the anti-theft service with timestamps, authentication back to the source and so on, which can effectively reduce most risks of theft. Meanwhile, the video cloud should be able to detect sudden theft, raise an alarm and quickly block the access.
Reliable DRM Digital Copyright Security Mechanism: Besides anti-theft, the video cloud should also protect content copyright. The common approach is to transcode and encrypt the uploaded video, distribute the encrypted video over the Internet and decrypt it for playback on the terminal, which protects the content copyright and prevents illegal copying.
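
The timestamp-based anti-theft method mentioned above is commonly implemented as a signed URL that expires. The following is a minimal sketch under stated assumptions: the query parameter names `e` and `token`, the secret and the path are hypothetical, and the format is not Qiniu's actual URL scheme.

```python
import hashlib
import hmac
import time

def sign_url(path, secret, expires_at):
    """Append an expiry timestamp and an HMAC-SHA256 signature to a URL path."""
    message = f"{path}?e={expires_at}".encode()
    token = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return f"{path}?e={expires_at}&token={token}"

def verify_url(signed_url, secret, now=None):
    """Accept the URL only if the signature matches and the deadline has not passed."""
    now = time.time() if now is None else now
    base, sep, token = signed_url.rpartition("&token=")
    if not sep:
        return False
    _path, sep, expires = base.rpartition("?e=")
    if not sep or not expires.isdigit():
        return False
    expected = hmac.new(secret, base.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(token.encode(), expected.encode()) and now <= int(expires)

SECRET = b"demo-secret"  # hypothetical key shared by the app server and the edge
url = sign_url("/live/room1.m3u8", SECRET, 1_700_000_000)
```

A copied link stops working once the deadline passes, and changing the path or the expiry time invalidates the signature, so a leaked URL has only a short useful life.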

Based on the above five essentials, Qiniu believes a complete intelligent video cloud should comprise the following modules:

Intelligent Video Cloud
Source: Qiniu

4. Cost Advantages of Intelligent Video Cloud

Qiniu’s intelligent video cloud not only meets the new technical needs of all industries in the video era, but also greatly reduces enterprises’ research, development and operation costs compared with building a solution independently.

Comparisons of TCO
Comparisons of Time Costs
Source: Qiniu

Compared with the high costs of building in-house, video cloud services offer abundant products that are easy to use, flexible and inexpensive to maintain. Intelligent video clouds provide a universal technical system that can also be customized for a specific business, greatly reducing the development cycle and costs of apps in all industries. Private or hybrid deployment of the video cloud’s modules ensures enterprise data security while offering the same stability, reliability and flexibility as the public cloud.

5. Added Applications of AI into Intelligent Video Cloud

AI, especially the in-depth application of computer vision technology, plays an enormous role in creating the technical and cost advantages of intelligent video clouds. In Qiniu’s intelligent video cloud system, computer vision replaces manual operation in many stages and greatly increases the efficiency of processing video content. Unlike traditional data analysis, Qiniu’s intelligent video cloud system turns previously unimaginable data analysis applications into reality.

5.1 Recognition of Video Contents: Automatically recognize information in videos and match relevant tags in the tag library

As the most essential techniques in the basic model layer of computer vision, facial recognition, object recognition and scenario recognition have been widely applied in fields such as security protection, broadcasting and education.

For example, in a security protection scenario, HD cameras with facial recognition and motion tracking can assess a person’s behavior from the motions within the monitoring range and automatically alert the police if the behavior is suspicious. When the intelligent cameras are connected to the police’s fugitive database, they can help the police recognize suspects in crowded places such as airports and railway stations, greatly increasing the efficiency of solving cases and arresting criminals.
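
Matching a detected face against a database is usually done by comparing feature vectors (embeddings) rather than raw images. The sketch below assumes such vectors have already been produced by a recognition model. The three-dimensional vectors, the names and the 0.8 threshold are all hypothetical values chosen for illustration, since real embeddings have hundreds of dimensions and thresholds are tuned per model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def best_match(probe, database, threshold=0.8):
    """Return (identity, score) for the closest entry at or above threshold, else None."""
    scored = [(cosine_similarity(probe, emb), name) for name, emb in database.items()]
    score, name = max(scored, default=(0.0, None))
    if name is None or score < threshold:
        return None
    return name, score

# Hypothetical enrolled embeddings.
DATABASE = {
    "person_a": [1.0, 0.0, 0.0],
    "person_b": [0.0, 1.0, 0.0],
}

hit = best_match([0.9, 0.1, 0.0], DATABASE)
miss = best_match([0.5, 0.5, 0.7], DATABASE)
```

The threshold controls the trade-off between missed matches and false alarms, which is why a match from such a system is normally treated as a lead for human confirmation rather than as an identification.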

5.2 Structuration of Video Contents: Search information in videos just like text files

Compared with manual tagging, video structuration based on computer vision has a series of obvious advantages, such as a wide recognition range, high accuracy, continuous iteration of learning models, the high efficiency of GPU machines, and low cost. Tagged videos can play a huge role in industries such as telehealth, online education and broadcasting.

For example, in telehealth, the number of videos and images is far beyond the capacity of manual tagging. To make the best use of medical videos, the videos and images need to be sorted into categories. AI can sort videos accurately and efficiently, and people can then search for key information in videos in the same way as in text files, which turns medical big data into medical knowledge graphs.

5.3 Auditing of Video Contents: Identify sensitive information in videos and filter content with high efficiency

Images and videos have replaced text as the mainstream means of communication, so auditing images and videos is becoming more and more critical. However, manual auditing not only incurs high labor costs for enterprises but, because of its limited efficiency and accuracy, also struggles to meet the auditing needs created by enormous volumes of video.

For example, in the broadcasting industry, manual auditing is widely used to screen videos for pornography, violence, terrorism and political figures. With computer vision technology, machines can replace humans in most content auditing situations, which greatly enhances auditing efficiency. With this gain in efficiency, auditing for pornography, violence, terrorism and political figures will no longer be a problem in the broadcasting industry.
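
The statement that machines can take over most, but not all, auditing can be sketched as a routing rule over classifier scores: confident detections are blocked automatically, borderline ones go to a human reviewer, and the rest pass. The category names and both thresholds below are hypothetical and only illustrate the pattern.

```python
def audit_decision(scores, block_at=0.9, review_at=0.5):
    """Map per-category classifier scores to ('block' | 'review' | 'pass', category)."""
    top_category = max(scores, key=scores.get)
    top_score = scores[top_category]
    if top_score >= block_at:
        return "block", top_category
    if top_score >= review_at:
        return "review", top_category  # uncertain: send to a human auditor
    return "pass", None

# Hypothetical model outputs for three video frames.
blocked = audit_decision({"pornography": 0.02, "violence": 0.95, "terrorism": 0.10})
queued = audit_decision({"pornography": 0.60, "violence": 0.05, "terrorism": 0.01})
passed = audit_decision({"pornography": 0.01, "violence": 0.03, "terrorism": 0.02})
```

Raising `review_at` shrinks the manual queue at the cost of letting more borderline content through, so the two thresholds are where the balance between efficiency and accuracy is set.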

5.4 Recommendation of Video Contents: Lean operation according to video consumption habits

Besides highly efficient video structuration and content auditing, computer vision technology can also serve as an innovation engine for content operation and meet more individualized product demands.

For example, once video content has been structured, operators can intelligently recommend content according to users’ viewing records and even target users with ads at a specific time and position in the video to maximize ad conversion. Intelligent recommendation of video content helps content operators conduct fine-grained, lean user management with the highest efficiency.
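
Once videos carry tags, a basic content-based recommender can be sketched in a few lines: count the tags in a user's viewing history and rank unwatched videos by how well their tags match. The catalog, video IDs and scoring rule here are hypothetical, and a production system would combine many more signals.

```python
from collections import Counter

def recommend(watch_history, catalog, limit=3):
    """Rank unwatched videos by how often their tags appear in the watch history."""
    interest = Counter(tag for video_id in watch_history for tag in catalog[video_id])
    candidates = [
        (sum(interest[tag] for tag in tags), video_id)
        for video_id, tags in catalog.items()
        if video_id not in watch_history
    ]
    # Highest score first; ties broken by video ID for a stable order.
    ranked = sorted((c for c in candidates if c[0] > 0), key=lambda c: (-c[0], c[1]))
    return [video_id for _, video_id in ranked[:limit]]

# Hypothetical tagged catalog produced by video structuration.
CATALOG = {
    "v1": {"football", "highlights"},
    "v2": {"football", "interview"},
    "v3": {"cooking"},
    "v4": {"highlights", "basketball"},
    "v5": {"football", "highlights", "final"},
}

suggestions = recommend(["v1"], CATALOG)
```

The same tag counts could be used to choose which ad to show, which is how structuration feeds both recommendation and ad targeting.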

6. Conclusion: Face the unknown challenges of future industry upgrades calmly with a flexible intelligent video cloud

In the future, few enterprises will be able to exist in isolation from the Internet, and the total amount of enterprise data will keep growing. The value of data will increase along with its burden. All enterprises need flexibility in using and storing files and rich media materials (including massive volumes of images, videos and audio). However, few enterprises find it necessary to own the capability and resources to build video clouds themselves. What most enterprises need is a stable, upgradable video platform with which they can cope with the ever-changing and escalating challenges of the future.

Source: Qiniu

Five Key Essentials for the New Generation of Intelligent Video Cloud