Submitting Script Proposals

General Information

The Unicode Consortium accepts proposals for inclusion of new characters and scripts in the Unicode Standard. The Unicode Standard definition of character is stated in the Glossary of Unicode Terms. Before preparing a proposal, note in particular the distinction between the terms character and glyph as therein defined. Because of this distinction, graphics such as ligatures, conjunct consonants, minor variant written forms, or abbreviations are generally not acceptable as Unicode characters. Also see Where is my Character?

🛑
The Script Encoding Working Group does not accept emoji or flag proposals.
See Emoji Proposals for the process for submitting emoji proposals.

Proposal Guidelines

Proposals are accepted through the SEW Submission Form only. The submission form details the requirements and provides further guidance, examples, and references; authors are encouraged to review it before writing a proposal. Every proposal must be submitted as a single PDF document.

Eligibility

Before proceeding, determine that each proposed addition is a character according to the definition given in the Unicode Standard and that the proposed addition does not already exist in the Standard. Consult the Proposed New Characters page to see if the character is already on track to be encoded, and the Archive of Nonapproval Notices to see if the character has already been considered but was disapproved for some reason.

Often a proposed character can be expressed as a sequence of one or more existing Unicode characters. Encoding a character that can already be represented by a sequence would create a duplicate representation, so such a character is not suitable for encoding. (In any event, the proposed character would disappear when normalized.) For example, a g-umlaut character is not suitable for encoding, since it can already be expressed with the sequence <g, combining diaeresis>. For further information on such sequences see Where is my Character? and the FAQ page Characters, Combining Marks.
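As a rough illustration of the point about sequences and normalization, the sketch below uses Python's standard `unicodedata` module. The helper name `has_precomposed_form` is hypothetical and is not part of any Unicode tooling. It checks whether NFC composes a sequence into a single code point, and shows that the g-umlaut from the example above remains a two-character sequence.

```python
import unicodedata


def has_precomposed_form(sequence: str) -> bool:
    """Return True if NFC composes the sequence into a single code point.

    Hypothetical helper for illustration only.
    """
    return len(unicodedata.normalize("NFC", sequence)) == 1


# <g, combining diaeresis>: the sequence itself is the representation.
g_umlaut = "g\u0308"
# <u, combining diaeresis>: canonically equivalent to the existing U+00FC.
u_umlaut = "u\u0308"

g_is_precomposed = has_precomposed_form(g_umlaut)
u_is_precomposed = has_precomposed_form(u_umlaut)

# Decomposing U+00FC yields the same sequence, so both spellings are equivalent.
decomposed_u = unicodedata.normalize("NFD", "\u00fc")

print(g_is_precomposed, u_is_precomposed, decomposed_u == u_umlaut)
```

A check like this can help when documenting which existing characters or sequences were examined as possible equivalents, but it does not replace reading the pages referenced above.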

Ensure that the documentation supporting the proposal states whether any existing Unicode characters have been examined as possible equivalents for the proposed character and, if so, the reasons for rejecting those equivalents. Consult the Unicode Character Encoding Stability Policy to make sure that any associated change to existing characters is in accordance with Consortium policies.

Criteria

For proposed new scripts and characters, three basic criteria need to be met:

  1. Usage: you need to demonstrate that the characters are already in use by a community (independent of the script creator, if applicable)
  2. Stability: you need to demonstrate that the proposed repertoire of characters, including the characters themselves, is stable and not in active development
  3. Need for interchange: you need to present a case that encoding of the characters is needed for public interchange of information in plain text

The exact form and amount of evidence needed to demonstrate these criteria is not specified, which allows for some flexibility on a case-by-case basis. Refer to the submission form for a more detailed description of the criteria and information about what constitutes good evidence. Note that submissions meeting all the above criteria might still be rejected for other reasons.

What to Include in a Proposal

In general, we expect proposals to contain the following:

  • short summary of what the proposal is requesting;
  • reference to any prior related documents in the document register;
  • introduction to the proposed script or characters, citing modern sources of information;
  • comparison to existing characters that are visually similar, if any;
  • suggested character properties (see below);
  • preferred ordering of proposed characters — how would words be ordered in a dictionary?

Additionally, for new scripts:

  • information about punctuation and line and word breaking behavior;
  • suggested text for the introductory chapter about the script to be included in the Core Specification.

If the proposed characters exhibit shaping behavior (contextual shaping, ligatures, conjuncts, or stacking), provide a description of that behavior, preferably with glyph examples. The examples should be sufficient for software engineers to produce a minimally acceptable rendering of the characters.

Character Properties

Determine the proposed (or recommended) character properties for each character being proposed. See the Unicode Properties in Character Proposals for guidelines about character properties and a list of questions to help make determinations about appropriate property values. See also Chapter 4, Character Properties of The Unicode Standard.

While the properties can be included in the proposal document with additional commentary, it is enough to fill in the character properties in the submission form. The form will assist you in authoring valid entries for selected data files. At a minimum, values for UnicodeData.txt are required. The format is described in UAX #44: Unicode Character Database.
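To make the UnicodeData.txt requirement more concrete, here is a minimal sketch that parses one record and performs a basic sanity check. It assumes the layout of 15 semicolon-separated fields described in UAX #44 and reads only the first five of them. The names `UnicodeDataRecord` and `parse_unicode_data_line` are hypothetical, and the sample line is the existing entry for U+0041, not a proposed character.

```python
from typing import NamedTuple

FIELD_COUNT = 15  # UnicodeData.txt records have 15 semicolon-separated fields


class UnicodeDataRecord(NamedTuple):
    """Subset of one UnicodeData.txt record (hypothetical type for illustration)."""
    code_point: int
    name: str
    general_category: str
    canonical_combining_class: int
    bidi_class: str


def parse_unicode_data_line(line: str) -> UnicodeDataRecord:
    """Parse the first five fields of a record and check the field count."""
    fields = line.strip().split(";")
    if len(fields) != FIELD_COUNT:
        raise ValueError(f"expected {FIELD_COUNT} fields, got {len(fields)}")
    return UnicodeDataRecord(
        code_point=int(fields[0], 16),          # hexadecimal code point
        name=fields[1],
        general_category=fields[2],
        canonical_combining_class=int(fields[3]),
        bidi_class=fields[4],
    )


# Existing entry for LATIN CAPITAL LETTER A, used here only as sample input.
record = parse_unicode_data_line("0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;")
print(record)
```

A check of this kind only catches structural mistakes such as a missing field; whether the property values themselves are appropriate still has to be decided using the guidelines referenced above.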

ISO/IEC 10646 Summary Form

The Unicode Consortium works closely with the relevant committee responsible for ISO/IEC 10646, namely JTC1/SC2/WG2, in proposing additions as well as monitoring the status of proposals by various national bodies. Therefore, proposals for new characters are required to include a standardized form "ISO/IEC JTC 1/SC 2/WG 2 PROPOSAL SUMMARY FORM TO ACCOMPANY SUBMISSIONS FOR ADDITIONS TO THE REPERTOIRE OF ISO/IEC 10646".

All the required information can be entered directly in the submission form. Alternatively, templates of the form can be downloaded from https://www.unicode.org/L2/summary.html and included either in the main document or as a separate PDF document in the submission form. Refer to the submission form for further assistance and guidance on providing the requested information.

Fonts

We require a font with an appropriate license for publishing the standard (see Font Submissions Policy) for any proposed additions or changes to existing characters. For submissions to the Script Encoding Working Group, fonts need to be provided using the submission form. Although the font can be supplied after the initial submission, authors are strongly encouraged to provide it as early as possible. The Script Encoding Working Group will not recommend any proposal to the UTC without an appropriate font in place.

Legal & Licensing Requirements for Script & Character Proposals

A Contributor License Agreement is Required

The Unicode Consortium’s mission is to enable people around the world to use computers in any language. In furtherance of this mission, the Consortium makes its standards, specifications, software, and data freely available to all users around the world under its Unicode Terms of Use and various highly permissive open-source licenses. In order to make its products freely available in this manner, the Consortium needs permission from contributors to freely use, modify, and distribute their contributions as part of the Consortium’s products.

The Consortium has adopted a standard Contributor License Agreement (CLA) for this purpose. The Unicode CLA ensures that a contributor retains ownership of any intellectual property rights in their contribution while granting the Unicode Consortium the necessary legal rights to use, modify, and distribute that contribution in Consortium products. Unicode CLAs are based on the Apache Software Foundation's CLAs, which are well-known in the industry and widely adopted by many respected open source projects.

All proposals (whether or not successful) and related materials will be retained by the Unicode Consortium as a matter of record and may be used for any legitimate Consortium purpose subject to the Unicode Consortium Intellectual Property, Licensing & Technical Contribution Policies.

Who needs to sign a Unicode CLA for Script & Character Proposals?

The Script Encoding Working Group recognizes up to four categories of entities that may be involved in a proposal submission:

  1. Authors include people who draft or otherwise prepare any significant portion of the proposal, including any data compilations, charts, or other exhibits or appendices.
  2. Sponsors are persons, entities, or national bodies who may join, endorse, or sponsor a proposal without being an author.
  3. Potential IP holders are any persons or entities (other than the authors of the proposal) who have, may have, or claim intellectual property rights in the proposed character or script itself. This is an unusual scenario; see below.
  4. Submitter is the person who submits the proposal to the Script Encoding Working Group using the SEW Submission Form. Usually this is one of the proposal authors.

Proposals may have multiple authors, and all authors, not just the primary author, are required to sign a Unicode CLA. The submitter is required to sign a Unicode CLA before submitting a proposal and to identify all authors and any potential IP holders on the submission form. The Unicode Consortium reserves the right to consider any other persons or entities to be potential IP holders for a given submission at its discretion.

A CLA is required from a submitter before submission can be made to the Script Encoding Working Group.

Proposals will not be discussed by the Script Encoding Working Group until a Unicode CLA is in place for all authors of the proposal.

IP Claimants in Scripts & Characters

As noted above, if there are any persons or entities who have, may have, or claim intellectual property rights (copyright, design, or patent rights) in a proposed character or script itself, then the Consortium requires two things of all such IP owners/claimants: (i) that they sign a Unicode CLA or other appropriate license agreement, and (ii) that they provide a formal written endorsement of the proposal. For instructions on how to sign a standard Unicode CLA, please see above, as well as the Unicode Consortium Intellectual Property, Licensing & Technical Contribution Policies. To provide the required written endorsement from an IP owner/claimant in the proposed characters/scripts, please send an email to [email protected] from an email account that is identifiable as that of the IP owner/claimant and provide the endorsement of the proposal, clearly identifying the proposal by name, date, and author(s).

This will be an unusual scenario – the vast majority of scripts and characters that are in scope for encoding in the Unicode Standard are generally not subject to intellectual property protection for a variety of reasons. However, “fictional” languages/scripts, such as Elvish from Lord of the Rings, may be subject to copyright protection depending on the particular circumstances and jurisdiction. Additionally, there are some scripts (whether fictional or not) in which the script creator expressly claims copyright or other IP rights and/or has registered such rights.

Whether “fictional” and “created” languages/scripts are in fact subject to intellectual property protection is disputed by some and is not an area of well-settled law around the world. The Consortium acknowledges that there is no clear consensus on these questions in every jurisdiction. Nevertheless, in the interests of making Unicode standards, specifications, data, and software as widely and freely available as possible, it is Consortium policy that a CLA or similar license is required in these cases.

Proposers are required to identify any such potential IP owners or claimants in their proposals and should obtain the formal endorsement of such owners/claimants. The Consortium will not consider proposals that are not endorsed in writing by all IP claimants in the proposed characters/scripts. The Consortium does not have the resources to research and vet prior IP rights, and in cases where a proposal is not endorsed in writing by all IP claimants, and/or fails to provide sufficient information regarding IP rights/claims, the Consortium will have little choice but to decline to encode.

When potential IP owners in the script/characters are identified, the Consortium will need to review the circumstances and consider whether a standard CLA or other similar license best meets the needs for encoding. Proposers and IP claimants should provide as much information as possible about claimed IP rights in such cases to facilitate the Consortium’s review and to increase the chances that the Consortium will be able to encode.

How to Sign a Unicode CLA?

Briefly, each proposal author will need to determine whether they need to sign an Individual CLA or a Corporate CLA, depending on who owns the contribution being made: the contributor personally, or the contributor’s employer or some other corporate entity. It is the contributor’s responsibility to do the research necessary to make this determination, as set forth in the Intellectual Property, Licensing & Technical Contribution Policies.

In the case of a personal contribution not owned by any corporate entity, sign in to the SEW website using your e-mail and follow the instructions to sign an Individual CLA. If you plan to contribute to the Unicode Consortium using GitHub at any point in the future, it would be best to sign the CLA Form in GitHub. If you don’t, please be aware that you may later be required to sign the CLA again in GitHub if you ever choose to contribute via GitHub.

In the case of a contribution owned by the contributor’s corporate employer or some other corporate entity, the Corporate CLA is required. Corporate CLAs cannot be signed in GitHub and must be signed in PDF format and submitted to [email protected]. To check whether the Consortium already has a signed Corporate CLA on file for a particular company or other entity, please see the Public List of Corporate CLAs.

Once a contributor, whether individual or corporate, has signed a Unicode CLA, they may continue to make additional contributions to the Unicode Consortium indefinitely without having to sign a CLA for each separate contribution.

Proposal Review Process

The international standardization of entire scripts requires a significant effort on the authors' part. It frequently takes years to move from an initial draft to final standardization, particularly because of the requirements to synchronize proposals with the work done in the ISO committee responsible for the development of ISO/IEC 10646.

Experience has shown that it is often helpful to discuss preliminary proposals before submitting a detailed proposal. One option is to become a member of the Unicode Consortium and submit the proposal to the members-only email list. Alternatively, authors can contact UC Berkeley’s Script Encoding Initiative for initial review.

Script Encoding Working Group Process

The process for submitting a proposal to the Script Encoding Working Group is as follows:

  1. A new submission is created by the submitter. They will be required to sign a Unicode CLA before being able to create a new submission.
  2. On the submission form, the submitter identifies all authors and potential IP holders and ensures that all technical and editorial requirements for proposals are met. Submitters are advised to consult the information in the submission form while preparing the proposal document. The submission form collects:
    • a PDF version of the proposal document;
    • proposed changes to UnicodeData.txt and other properties where applicable;
    • the ISO/IEC 10646 summary form or corresponding information where applicable;
    • a font file where applicable. Including a font file at this stage is strongly encouraged but not required. If included, the font file should comply with the requirements of the Font Submission Policy.
  3. All authors need to sign in to the SEW website using the declared e-mail addresses and confirm the submission by indicating the Unicode CLA under which they are contributing. If a CLA is missing for any of the authors, the submission will be held pending that CLA. No submission will be taken up in a Script Encoding Working Group meeting until a Unicode CLA is in place for all authors.
  4. Once a CLA is in place for all authors, the proposal will become available for review by the Script Encoding Working Group members.
  5. After an initial review, a proposal can be rejected without further consideration (desk rejection). This can happen when the submission does not meet basic editorial or technical requirements listed in the submission form and declared by the submitter as met. Desk rejected submissions will not be published or reported on. Note that a proposal can be desk rejected even before all CLAs are in place.
  6. If review indicates the proposal meets the criteria for discussion, the submission will be discussed in a future Script Encoding Working Group meeting. Note that there is no guarantee on when a particular proposal will be put on a meeting agenda.
  7. The discussion can result in several outcomes:
    1. The Script Encoding Working Group (SEW) does not plan to recommend any action to the UTC and does not believe that any further updates to the proposal can change this decision. This includes documents submitted just for information and not requiring any action from the UTC. The submitted documents will be posted to the public document register and reported on in the SEW report to the UTC, in case the UTC disagrees with the SEW's conclusion.
    2. The SEW will provide feedback to the proposal authors, asking them to provide missing information or to make changes to the proposal (the most common case). Once an updated version of the proposal is ready, the submitter can update the submission with a new document.
    3. The SEW believes the proposal meets all requirements and the authors made a convincing case, so the proposal can be recommended to the UTC. If no font has been provided at this point, the submission will be held, pending a font file.
  8. Once a font file meeting the requirements of the Font Submission Policy is included in the submission form, the proposal will be posted to the public document register and appropriate actions recommended to the UTC. Note that the UTC can still disagree with the SEW's recommendations.

Any discussed proposal will be posted to the public document register. Further progress can be monitored via the public UTC minutes as well as the Proposed New Characters -- Pipeline Table page. Furthermore, the submitter and all authors can view the status of their proposal after signing in to the SEW website using the e-mail address indicated on the submission.

Authors of proposals, particularly for entire scripts, should be prepared to become involved at various times throughout the process, often revising their proposals more than once, collecting further detailed information, organizing online discussions or meetings to dispel controversy, or answering questions posed by committees or national bodies. Without such involvement, any proposal of more than a few characters is unlikely to be successful in the long run.

Examples

Many good proposals can be found in the UTC document register. Anshuman Pandey has prepared a number of successful proposals.

For people interested in proposing a single symbol or a small set of symbols for encoding, there are also many successful proposals in the UTC document register. For example see the proposal for power symbols.

Interim Solutions

There are ways for programmers and scholarly organizations to make use of Unicode character encoding, even if the script they want to use or transmit is not yet (or may never be) part of the Unicode Standard. Individual groups that make use of rare scripts or special characters can reach a private agreement about interchange and set aside part of the Private Use Area to encode their private set of characters. Individuals with interests in rare scripts or materials relating to them may sometimes be contacted through an electronic mail list which the Consortium maintains. For information about these mail lists, please contact the Unicode office.
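As a small sketch of what a private agreement might look like in code, the example below checks that every code point assigned by such an agreement falls inside one of the three Private Use ranges of the Unicode Standard. The mapping `PRIVATE_AGREEMENT` and its character names are invented purely for illustration; real agreements are defined by the groups that use them.

```python
import unicodedata

# The three Private Use ranges: the BMP Private Use Area and planes 15 and 16.
PRIVATE_USE_RANGES = (
    (0xE000, 0xF8FF),
    (0xF0000, 0xFFFFD),
    (0x100000, 0x10FFFD),
)


def is_private_use(code_point: int) -> bool:
    """Return True if the code point lies in one of the Private Use ranges."""
    return any(low <= code_point <= high for low, high in PRIVATE_USE_RANGES)


# Hypothetical private agreement: invented names mapped to private-use code points.
PRIVATE_AGREEMENT = {
    "EXAMPLE SCRIPT LETTER ONE": 0xE000,
    "EXAMPLE SCRIPT LETTER TWO": 0xE001,
}

# Every assignment in the agreement should stay inside the Private Use ranges.
all_private = all(is_private_use(cp) for cp in PRIVATE_AGREEMENT.values())

# Cross-check against the General_Category value "Co" (private use).
category_of_first = unicodedata.category(chr(0xE000))

print(all_private, category_of_first)
```

Text encoded this way is interpretable only by parties who share the same agreement, which is why it is an interim solution and not a substitute for encoding.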