Today, the National Technical Committee 260 on Cybersecurity of Standardization Administration of China (TC260, 全国网络安全标准化技术委员会) formally released its “Basic security requirements for generative artificial intelligence service” (the “Technical Document”).
In October, TC260 published a draft of the technical document for public comment, setting out stringent rules against using data from blacklisted sources for training: any single-source corpus with more than 5% illegal or otherwise inappropriate content must be blacklisted. The document also details 31 primary security risks across five categories relating to training corpora and generated content.
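As a rough illustration, the 5% rule can be read as a simple threshold check applied to each corpus source. The sketch below is a minimal, assumed implementation of ours: the sampling approach, the function names, and the `is_flagged` judgement are placeholders, not procedures prescribed by the Technical Document.

```python
import random

# Over 5% flagged content in a single-source corpus => the source must be blacklisted.
ILLEGAL_CONTENT_THRESHOLD = 0.05


def should_blacklist(corpus_samples, is_flagged, sample_size=1000):
    """Estimate the share of illegal/inappropriate content in one corpus source
    and decide whether it should be excluded from training.

    corpus_samples: list of text items from a single source
    is_flagged:     callable returning True if an item is illegal/inappropriate
    """
    sample = random.sample(corpus_samples, min(sample_size, len(corpus_samples)))
    flagged = sum(1 for text in sample if is_flagged(text))
    return flagged / len(sample) > ILLEGAL_CONTENT_THRESHOLD
```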
The Cyberspace Administration of China, together with several other departments, released the "Interim Measures for the Management of Generative Artificial Intelligence Services" on July 10, effective August 15. Article 17 mandates that providers of GAI services with public opinion attributes or social mobilization capabilities undergo security assessments, although the exact processes and criteria remain ambiguous.
While the Technical Document is not legally enforceable and does not qualify as a national standard, its significance is undeniable. In China’s complicated policy and legal system governing the digital space, national standards and technical documents generally fall into two types: industry-led initiatives promoting best practices, and regulator-backed directives. The former typically aim at higher industry benchmarks and are the real “soft law”. The latter are often provisional yet widely followed rules favoured and supported by regulators; they embody a tacit consensus, reflecting a distinctive aspect of China’s digital governance.
The Technical Document is closer to the latter: it is the only official document to date specifically addressing security requirements for GAI, and it has served as the sole reasonably detailed benchmark for the security assessments mandated by the Interim Measures, offering guidance on the specific security issues GAI services face. It is also an essential tool for regulators assessing the security of GAI services.
Here are some key takeaways from the final version of the Technical Document:
A definition of “foundation model” was introduced: a “deep neural network model trained on a large amount of data, used for general-purpose goals, and capable of being optimized and adapted to various downstream tasks”.
More stringent requirements on the legality of training corpus sources: the draft required only that the training corpus not contain information that must be blocked under “cybersecurity-related laws” (especially those highlighted in Article 15 of the Cyber Security Law). The final Technical Document extends the prohibition to content that must be blocked under “cybersecurity policies”, a more ambiguous concept.
A heavier burden on GAI service providers regarding harmful or illegal output: the Technical Document imposes additional responsibilities on providers to mitigate harmful or illegal outputs through “technical measures”.
Clarified requirements on foundation model security: companies may not use unregistered third-party foundation models to provide GAI services to the general public, although the use of foundation models for R&D purposes is exempted. The EU AI Act, still under negotiation, may have provided some inspiration for this exemption: it exempts AI systems “specifically developed and put into service for the sole purpose of scientific research and development” from its rules.
Enhanced protection of minors: GAI service providers may no longer offer paid services that are inconsistent with minors’ capacity for civil conduct.
In some respects, the compliance burden on GAI service providers was eased: for the use of user input data in training, the final document requires only that users be given a convenient opt-out, rather than the prior agreement authorizing such use that the draft for comment had stipulated.
Security assessment: the methodology, and the specific quantitative metrics for the keyword libraries and test question banks used to assess generated content, remain unchanged.
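To make the assessment mechanics more concrete, the sketch below shows, under our own assumptions, how a provider might compute a sampling pass rate for generated content against a test question bank and a keyword library. The helper names (`generate`, `judge_safe`) are illustrative, and the required library sizes, bank sizes, and pass-rate thresholds are those specified in the Technical Document, which are not reproduced here.

```python
def assessment_pass_rate(test_questions, generate, keyword_library, judge_safe):
    """Return the share of test questions whose generated answers are judged safe
    and contain no terms from the keyword library.

    test_questions:  iterable of prompts drawn from a test question bank
    generate:        callable producing the model's answer for a prompt
    keyword_library: iterable of blocked terms
    judge_safe:      callable returning True if (question, answer) is acceptable
    """
    passed = 0
    total = 0
    for question in test_questions:
        total += 1
        answer = generate(question)
        has_blocked_term = any(term in answer for term in keyword_library)
        if not has_blocked_term and judge_safe(question, answer):
            passed += 1
    return passed / total
```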
Overall, the Technical Document is consistent with China’s approach to GAI governance, with its focus on content safety (or ideological security). This may impose certain constraints on the development of China’s AI industry; however, by clarifying crucial security standards for GAI providers, it should promote a more open and positive development trajectory for China’s AI sector in the long term.
The prohibition on unregistered foundation models, which in practice covers US open-source LLMs, is a big deal. If it is actually enforced, many Chinese AI startups currently building on Llama 2 will be hurt. It may also be a policy signal that the government does not favour this business model, since China wants its own “sovereign LLMs”. Leveraging open-source models may indeed give some Chinese AI startups a path to revenue, but it carries the risk of over-dependence on US technology, which could create vulnerabilities similar to those the semiconductor industry has experienced.