unicode

  • import "unicode"
  • Overview
  • Index
  • Examples
  • Subdirectories

Overview

Package unicode provides data and functions to test some properties of Unicode code points.

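Example (Is)

Functions starting with "Is" can be used to inspect which range table a rune belongs to. Note that a rune may fit into more than one range.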

package main

import (
	"fmt"
	"unicode"
)

func main() {

	// constant with mixed type runes
	const mixed = "\b5Ὂg̀9! ℃ᾭG"
	for _, c := range mixed {
		fmt.Printf("For %q:\n", c)
		if unicode.IsControl(c) {
			fmt.Println("\tis control rune")
		}
		if unicode.IsDigit(c) {
			fmt.Println("\tis digit rune")
		}
		if unicode.IsGraphic(c) {
			fmt.Println("\tis graphic rune")
		}
		if unicode.IsLetter(c) {
			fmt.Println("\tis letter rune")
		}
		if unicode.IsLower(c) {
			fmt.Println("\tis lower case rune")
		}
		if unicode.IsMark(c) {
			fmt.Println("\tis mark rune")
		}
		if unicode.IsNumber(c) {
			fmt.Println("\tis number rune")
		}
		if unicode.IsPrint(c) {
			fmt.Println("\tis printable rune")
		}
		if !unicode.IsPrint(c) {
			fmt.Println("\tis not printable rune")
		}
		if unicode.IsPunct(c) {
			fmt.Println("\tis punct rune")
		}
		if unicode.IsSpace(c) {
			fmt.Println("\tis space rune")
		}
		if unicode.IsSymbol(c) {
			fmt.Println("\tis symbol rune")
		}
		if unicode.IsTitle(c) {
			fmt.Println("\tis title case rune")
		}
		if unicode.IsUpper(c) {
			fmt.Println("\tis upper case rune")
		}
	}

}

Index

  • Constants
  • Variables
  • func In(r rune, ranges ...*RangeTable) bool
  • func Is(rangeTab *RangeTable, r rune) bool
  • func IsControl(r rune) bool
  • func IsDigit(r rune) bool
  • func IsGraphic(r rune) bool
  • func IsLetter(r rune) bool
  • func IsLower(r rune) bool
  • func IsMark(r rune) bool
  • func IsNumber(r rune) bool
  • func IsOneOf(ranges []*RangeTable, r rune) bool
  • func IsPrint(r rune) bool
  • func IsPunct(r rune) bool
  • func IsSpace(r rune) bool
  • func IsSymbol(r rune) bool
  • func IsTitle(r rune) bool
  • func IsUpper(r rune) bool
  • func SimpleFold(r rune) rune
  • func To(_case int, r rune) rune
  • func ToLower(r rune) rune
  • func ToTitle(r rune) rune
  • func ToUpper(r rune) rune
  • type CaseRange
  • type Range16
  • type Range32
  • type RangeTable
  • type SpecialCase
  • func (special SpecialCase) ToLower(r rune) rune
  • func (special SpecialCase) ToTitle(r rune) rune
  • func (special SpecialCase) ToUpper(r rune) rune
  • Bugs

Examples

SimpleFold SpecialCase To ToLower ToTitle ToUpper Package (Is)

Package files

Constants

const (
        MaxRune         = '\U0010FFFF' // Maximum valid Unicode code point.
        ReplacementChar = '\uFFFD'     // Represents invalid code points.
        MaxASCII        = '\u007F'     // maximum ASCII value.
        MaxLatin1       = '\u00FF'     // maximum Latin-1 value.
)
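
As a rough illustration of how these bounds might be used, the sketch below buckets a rune by comparing it against the constants. The helper name `classify` and its labels are hypothetical and not part of the package.

```go
package main

import (
	"fmt"
	"unicode"
)

// classify is a hypothetical helper that buckets a rune by range,
// using only the constants declared by package unicode.
func classify(r rune) string {
	switch {
	case r < 0 || r > unicode.MaxRune:
		return "out of range"
	case r <= unicode.MaxASCII:
		return "ASCII"
	case r <= unicode.MaxLatin1:
		return "Latin-1"
	default:
		return "other"
	}
}

func main() {
	for _, r := range []rune{'A', '\u00e9', '\u4e16', unicode.ReplacementChar, 0x110000} {
		fmt.Printf("%#x: %s\n", r, classify(r))
	}
}
```

The check only covers the numeric bounds; it does not attempt to reject surrogate code points.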

Indices into the Delta arrays inside CaseRanges for case mapping.

const (
        UpperCase = iota
        LowerCase
        TitleCase
        MaxCase
)
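
The same values are accepted as the first argument of `To`. The sketch below, built around a hypothetical helper `caseForms`, collects the upper, lower, and title forms of a rune into an array indexed by these constants.

```go
package main

import (
	"fmt"
	"unicode"
)

// caseForms is a hypothetical helper: it returns the mappings of r
// indexed by unicode.UpperCase, unicode.LowerCase and unicode.TitleCase.
func caseForms(r rune) [unicode.MaxCase]rune {
	var forms [unicode.MaxCase]rune
	for c := unicode.UpperCase; c < unicode.MaxCase; c++ {
		forms[c] = unicode.To(c, r)
	}
	return forms
}

func main() {
	// U+01C6 is a digraph letter whose upper and title forms differ.
	forms := caseForms('\u01C6')
	fmt.Printf("upper=%U lower=%U title=%U\n",
		forms[unicode.UpperCase], forms[unicode.LowerCase], forms[unicode.TitleCase])
}
```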

If the Delta field of a CaseRange is UpperLower, it means this CaseRange represents a sequence of the form (say) Upper Lower Upper Lower.

const (
        UpperLower = MaxRune + 1 // (Cannot be a valid delta.)
)
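
To make the alternating layout concrete: in such a range the upper and lower case forms sit at adjacent code points, so the mapping is a step of one rather than a fixed offset. The sketch below finds the CaseRange covering a rune and checks its Delta. The helper `inUpperLowerRange` is hypothetical, and the expectation that U+0100 and U+0101 fall in such a range is an assumption about the current tables.

```go
package main

import (
	"fmt"
	"unicode"
)

// inUpperLowerRange is a hypothetical helper: it reports whether r is covered
// by a CaseRange whose Delta is marked with the UpperLower sentinel.
func inUpperLowerRange(r rune) bool {
	for _, cr := range unicode.CaseRanges {
		if cr.Lo <= uint32(r) && uint32(r) <= cr.Hi {
			return cr.Delta[unicode.UpperCase] == unicode.UpperLower
		}
	}
	return false
}

func main() {
	// Assumption: U+0100 and U+0101 lie in an alternating range; 'A' does not.
	for _, r := range []rune{'A', '\u0100', '\u0101'} {
		fmt.Printf("%U alternating=%t lower=%U upper=%U\n",
			r, inUpperLowerRange(r), unicode.ToLower(r), unicode.ToUpper(r))
	}
}
```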

Version is the Unicode edition from which the tables are derived.

const Version = "9.0.0"

Variables

These variables have type *RangeTable.

var (
        Cc     = _Cc // Cc is the set of Unicode characters in category Cc.
        Cf     = _Cf // Cf is the set of Unicode characters in category Cf.
        Co     = _Co // Co is the set of Unicode characters in category Co.
        Cs     = _Cs // Cs is the set of Unicode characters in category Cs.
        Digit  = _Nd // Digit is the set of Unicode characters with the "decimal digit" property.
        Nd     = _Nd // Nd is the set of Unicode characters in category Nd.
        Letter = _L  // Letter/L is the set of Unicode letters, category L.
        L      = _L
        Lm     = _Lm // Lm is the set of Unicode characters in category Lm.
        Lo     = _Lo // Lo is the set of Unicode characters in category Lo.
        Lower  = _Ll // Lower is the set of Unicode lower case letters.
        Ll     = _Ll // Ll is the set of Unicode characters in category Ll.
        Mark   = _M  // Mark/M is the set of Unicode mark characters, category M.
        M      = _M
        Mc     = _Mc // Mc is the set of Unicode characters in category Mc.
        Me     = _Me // Me is the set of Unicode characters in category Me.
        Mn     = _Mn // Mn is the set of Unicode characters in category Mn.
        Nl     = _Nl // Nl is the set of Unicode characters in category Nl.
        No     = _No // No is the set of Unicode characters in category No.
        Number = _N  // Number/N is the set of Unicode number characters, category N.
        N      = _N
        Other  = _C // Other/C is the set of Unicode control and special characters, category C.
        C      = _C
        Pc     = _Pc // Pc is the set of Unicode characters in category Pc.
        Pd     = _Pd // Pd is the set of Unicode characters in category Pd.
        Pe     = _Pe // Pe is the set of Unicode characters in category Pe.
        Pf     = _Pf // Pf is the set of Unicode characters in category Pf.
        Pi     = _Pi // Pi is the set of Unicode characters in category Pi.
        Po     = _Po // Po is the set of Unicode characters in category Po.
        Ps     = _Ps // Ps is the set of Unicode characters in category Ps.
        Punct  = _P  // Punct/P is the set of Unicode punctuation characters, category P.
        P      = _P
        Sc     = _Sc // Sc is the set of Unicode characters in category Sc.
        Sk     = _Sk // Sk is the set of Unicode characters in category Sk.
        Sm     = _Sm // Sm is the set of Unicode characters in category Sm.
        So     = _So // So is the set of Unicode characters in category So.
        Space  = _Z  // Space/Z is the set of Unicode space characters, category Z.
        Z      = _Z
        Symbol = _S // Symbol/S is the set of Unicode symbol characters, category S.
        S      = _S
        Title  = _Lt // Title is the set of Unicode title case letters.
        Lt     = _Lt // Lt is the set of Unicode characters in category Lt.
        Upper  = _Lu // Upper is the set of Unicode upper case letters.
        Lu     = _Lu // Lu is the set of Unicode characters in category Lu.
        Zl     = _Zl // Zl is the set of Unicode characters in category Zl.
        Zp     = _Zp // Zp is the set of Unicode characters in category Zp.
        Zs     = _Zs // Zs is the set of Unicode characters in category Zs.
)
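
These category tables can be passed directly to `Is` and `In`. As a sketch, the hypothetical helper `isWordRune` below treats letters, decimal digits, and connector punctuation (such as the underscore) as word characters. The choice of categories is only an illustration.

```go
package main

import (
	"fmt"
	"unicode"
)

// isWordRune is a hypothetical helper: it reports whether r is a letter (L),
// a decimal digit (Nd), or connector punctuation (Pc).
func isWordRune(r rune) bool {
	return unicode.In(r, unicode.Letter, unicode.Nd, unicode.Pc)
}

func main() {
	for _, r := range "x_\u0663- " {
		fmt.Printf("%q word=%t\n", r, isWordRune(r))
	}
}
```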

These variables have type *RangeTable.

var (
        Adlam                  = _Adlam                  // Adlam is the set of Unicode characters in script Adlam.
        Ahom                   = _Ahom                   // Ahom is the set of Unicode characters in script Ahom.
        Anatolian_Hieroglyphs  = _Anatolian_Hieroglyphs  // Anatolian_Hieroglyphs is the set of Unicode characters in script Anatolian_Hieroglyphs.
        Arabic                 = _Arabic                 // Arabic is the set of Unicode characters in script Arabic.
        Armenian               = _Armenian               // Armenian is the set of Unicode characters in script Armenian.
        Avestan                = _Avestan                // Avestan is the set of Unicode characters in script Avestan.
        Balinese               = _Balinese               // Balinese is the set of Unicode characters in script Balinese.
        Bamum                  = _Bamum                  // Bamum is the set of Unicode characters in script Bamum.
        Bassa_Vah              = _Bassa_Vah              // Bassa_Vah is the set of Unicode characters in script Bassa_Vah.
        Batak                  = _Batak                  // Batak is the set of Unicode characters in script Batak.
        Bengali                = _Bengali                // Bengali is the set of Unicode characters in script Bengali.
        Bhaiksuki              = _Bhaiksuki              // Bhaiksuki is the set of Unicode characters in script Bhaiksuki.
        Bopomofo               = _Bopomofo               // Bopomofo is the set of Unicode characters in script Bopomofo.
        Brahmi                 = _Brahmi                 // Brahmi is the set of Unicode characters in script Brahmi.
        Braille                = _Braille                // Braille is the set of Unicode characters in script Braille.
        Buginese               = _Buginese               // Buginese is the set of Unicode characters in script Buginese.
        Buhid                  = _Buhid                  // Buhid is the set of Unicode characters in script Buhid.
        Canadian_Aboriginal    = _Canadian_Aboriginal    // Canadian_Aboriginal is the set of Unicode characters in script Canadian_Aboriginal.
        Carian                 = _Carian                 // Carian is the set of Unicode characters in script Carian.
        Caucasian_Albanian     = _Caucasian_Albanian     // Caucasian_Albanian is the set of Unicode characters in script Caucasian_Albanian.
        Chakma                 = _Chakma                 // Chakma is the set of Unicode characters in script Chakma.
        Cham                   = _Cham                   // Cham is the set of Unicode characters in script Cham.
        Cherokee               = _Cherokee               // Cherokee is the set of Unicode characters in script Cherokee.
        Common                 = _Common                 // Common is the set of Unicode characters in script Common.
        Coptic                 = _Coptic                 // Coptic is the set of Unicode characters in script Coptic.
        Cuneiform              = _Cuneiform              // Cuneiform is the set of Unicode characters in script Cuneiform.
        Cypriot                = _Cypriot                // Cypriot is the set of Unicode characters in script Cypriot.
        Cyrillic               = _Cyrillic               // Cyrillic is the set of Unicode characters in script Cyrillic.
        Deseret                = _Deseret                // Deseret is the set of Unicode characters in script Deseret.
        Devanagari             = _Devanagari             // Devanagari is the set of Unicode characters in script Devanagari.
        Duployan               = _Duployan               // Duployan is the set of Unicode characters in script Duployan.
        Egyptian_Hieroglyphs   = _Egyptian_Hieroglyphs   // Egyptian_Hieroglyphs is the set of Unicode characters in script Egyptian_Hieroglyphs.
        Elbasan                = _Elbasan                // Elbasan is the set of Unicode characters in script Elbasan.
        Ethiopic               = _Ethiopic               // Ethiopic is the set of Unicode characters in script Ethiopic.
        Georgian               = _Georgian               // Georgian is the set of Unicode characters in script Georgian.
        Glagolitic             = _Glagolitic             // Glagolitic is the set of Unicode characters in script Glagolitic.
        Gothic                 = _Gothic                 // Gothic is the set of Unicode characters in script Gothic.
        Grantha                = _Grantha                // Grantha is the set of Unicode characters in script Grantha.
        Greek                  = _Greek                  // Greek is the set of Unicode characters in script Greek.
        Gujarati               = _Gujarati               // Gujarati is the set of Unicode characters in script Gujarati.
        Gurmukhi               = _Gurmukhi               // Gurmukhi is the set of Unicode characters in script Gurmukhi.
        Han                    = _Han                    // Han is the set of Unicode characters in script Han.
        Hangul                 = _Hangul                 // Hangul is the set of Unicode characters in script Hangul.
        Hanunoo                = _Hanunoo                // Hanunoo is the set of Unicode characters in script Hanunoo.
        Hatran                 = _Hatran                 // Hatran is the set of Unicode characters in script Hatran.
        Hebrew                 = _Hebrew                 // Hebrew is the set of Unicode characters in script Hebrew.
        Hiragana               = _Hiragana               // Hiragana is the set of Unicode characters in script Hiragana.
        Imperial_Aramaic       = _Imperial_Aramaic       // Imperial_Aramaic is the set of Unicode characters in script Imperial_Aramaic.
        Inherited              = _Inherited              // Inherited is the set of Unicode characters in script Inherited.
        Inscriptional_Pahlavi  = _Inscriptional_Pahlavi  // Inscriptional_Pahlavi is the set of Unicode characters in script Inscriptional_Pahlavi.
        Inscriptional_Parthian = _Inscriptional_Parthian // Inscriptional_Parthian is the set of Unicode characters in script Inscriptional_Parthian.
        Javanese               = _Javanese               // Javanese is the set of Unicode characters in script Javanese.
        Kaithi                 = _Kaithi                 // Kaithi is the set of Unicode characters in script Kaithi.
        Kannada                = _Kannada                // Kannada is the set of Unicode characters in script Kannada.
        Katakana               = _Katakana               // Katakana is the set of Unicode characters in script Katakana.
        Kayah_Li               = _Kayah_Li               // Kayah_Li is the set of Unicode characters in script Kayah_Li.
        Kharoshthi             = _Kharoshthi             // Kharoshthi is the set of Unicode characters in script Kharoshthi.
        Khmer                  = _Khmer                  // Khmer is the set of Unicode characters in script Khmer.
        Khojki                 = _Khojki                 // Khojki is the set of Unicode characters in script Khojki.
        Khudawadi              = _Khudawadi              // Khudawadi is the set of Unicode characters in script Khudawadi.
        Lao                    = _Lao                    // Lao is the set of Unicode characters in script Lao.
        Latin                  = _Latin                  // Latin is the set of Unicode characters in script Latin.
        Lepcha                 = _Lepcha                 // Lepcha is the set of Unicode characters in script Lepcha.
        Limbu                  = _Limbu                  // Limbu is the set of Unicode characters in script Limbu.
        Linear_A               = _Linear_A               // Linear_A is the set of Unicode characters in script Linear_A.
        Linear_B               = _Linear_B               // Linear_B is the set of Unicode characters in script Linear_B.
        Lisu                   = _Lisu                   // Lisu is the set of Unicode characters in script Lisu.
        Lycian                 = _Lycian                 // Lycian is the set of Unicode characters in script Lycian.
        Lydian                 = _Lydian                 // Lydian is the set of Unicode characters in script Lydian.
        Mahajani               = _Mahajani               // Mahajani is the set of Unicode characters in script Mahajani.
        Malayalam              = _Malayalam              // Malayalam is the set of Unicode characters in script Malayalam.
        Mandaic                = _Mandaic                // Mandaic is the set of Unicode characters in script Mandaic.
        Manichaean             = _Manichaean             // Manichaean is the set of Unicode characters in script Manichaean.
        Marchen                = _Marchen                // Marchen is the set of Unicode characters in script Marchen.
        Meetei_Mayek           = _Meetei_Mayek           // Meetei_Mayek is the set of Unicode characters in script Meetei_Mayek.
        Mende_Kikakui          = _Mende_Kikakui          // Mende_Kikakui is the set of Unicode characters in script Mende_Kikakui.
        Meroitic_Cursive       = _Meroitic_Cursive       // Meroitic_Cursive is the set of Unicode characters in script Meroitic_Cursive.
        Meroitic_Hieroglyphs   = _Meroitic_Hieroglyphs   // Meroitic_Hieroglyphs is the set of Unicode characters in script Meroitic_Hieroglyphs.
        Miao                   = _Miao                   // Miao is the set of Unicode characters in script Miao.
        Modi                   = _Modi                   // Modi is the set of Unicode characters in script Modi.
        Mongolian              = _Mongolian              // Mongolian is the set of Unicode characters in script Mongolian.
        Mro                    = _Mro                    // Mro is the set of Unicode characters in script Mro.
        Multani                = _Multani                // Multani is the set of Unicode characters in script Multani.
        Myanmar                = _Myanmar                // Myanmar is the set of Unicode characters in script Myanmar.
        Nabataean              = _Nabataean              // Nabataean is the set of Unicode characters in script Nabataean.
        New_Tai_Lue            = _New_Tai_Lue            // New_Tai_Lue is the set of Unicode characters in script New_Tai_Lue.
        Newa                   = _Newa                   // Newa is the set of Unicode characters in script Newa.
        Nko                    = _Nko                    // Nko is the set of Unicode characters in script Nko.
        Ogham                  = _Ogham                  // Ogham is the set of Unicode characters in script Ogham.
        Ol_Chiki               = _Ol_Chiki               // Ol_Chiki is the set of Unicode characters in script Ol_Chiki.
        Old_Hungarian          = _Old_Hungarian          // Old_Hungarian is the set of Unicode characters in script Old_Hungarian.
        Old_Italic             = _Old_Italic             // Old_Italic is the set of Unicode characters in script Old_Italic.
        Old_North_Arabian      = _Old_North_Arabian      // Old_North_Arabian is the set of Unicode characters in script Old_North_Arabian.
        Old_Permic             = _Old_Permic             // Old_Permic is the set of Unicode characters in script Old_Permic.
        Old_Persian            = _Old_Persian            // Old_Persian is the set of Unicode characters in script Old_Persian.
        Old_South_Arabian      = _Old_South_Arabian      // Old_South_Arabian is the set of Unicode characters in script Old_South_Arabian.
        Old_Turkic             = _Old_Turkic             // Old_Turkic is the set of Unicode characters in script Old_Turkic.
        Oriya                  = _Oriya                  // Oriya is the set of Unicode characters in script Oriya.
        Osage                  = _Osage                  // Osage is the set of Unicode characters in script Osage.
        Osmanya                = _Osmanya                // Osmanya is the set of Unicode characters in script Osmanya.
        Pahawh_Hmong           = _Pahawh_Hmong           // Pahawh_Hmong is the set of Unicode characters in script Pahawh_Hmong.
        Palmyrene              = _Palmyrene              // Palmyrene is the set of Unicode characters in script Palmyrene.
        Pau_Cin_Hau            = _Pau_Cin_Hau            // Pau_Cin_Hau is the set of Unicode characters in script Pau_Cin_Hau.
        Phags_Pa               = _Phags_Pa               // Phags_Pa is the set of Unicode characters in script Phags_Pa.
        Phoenician             = _Phoenician             // Phoenician is the set of Unicode characters in script Phoenician.
        Psalter_Pahlavi        = _Psalter_Pahlavi        // Psalter_Pahlavi is the set of Unicode characters in script Psalter_Pahlavi.
        Rejang                 = _Rejang                 // Rejang is the set of Unicode characters in script Rejang.
        Runic                  = _Runic                  // Runic is the set of Unicode characters in script Runic.
        Samaritan              = _Samaritan              // Samaritan is the set of Unicode characters in script Samaritan.
        Saurashtra             = _Saurashtra             // Saurashtra is the set of Unicode characters in script Saurashtra.
        Sharada                = _Sharada                // Sharada is the set of Unicode characters in script Sharada.
        Shavian                = _Shavian                // Shavian is the set of Unicode characters in script Shavian.
        Siddham                = _Siddham                // Siddham is the set of Unicode characters in script Siddham.
        SignWriting            = _SignWriting            // SignWriting is the set of Unicode characters in script SignWriting.
        Sinhala                = _Sinhala                // Sinhala is the set of Unicode characters in script Sinhala.
        Sora_Sompeng           = _Sora_Sompeng           // Sora_Sompeng is the set of Unicode characters in script Sora_Sompeng.
        Sundanese              = _Sundanese              // Sundanese is the set of Unicode characters in script Sundanese.
        Syloti_Nagri           = _Syloti_Nagri           // Syloti_Nagri is the set of Unicode characters in script Syloti_Nagri.
        Syriac                 = _Syriac                 // Syriac is the set of Unicode characters in script Syriac.
        Tagalog                = _Tagalog                // Tagalog is the set of Unicode characters in script Tagalog.
        Tagbanwa               = _Tagbanwa               // Tagbanwa is the set of Unicode characters in script Tagbanwa.
        Tai_Le                 = _Tai_Le                 // Tai_Le is the set of Unicode characters in script Tai_Le.
        Tai_Tham               = _Tai_Tham               // Tai_Tham is the set of Unicode characters in script Tai_Tham.
        Tai_Viet               = _Tai_Viet               // Tai_Viet is the set of Unicode characters in script Tai_Viet.
        Takri                  = _Takri                  // Takri is the set of Unicode characters in script Takri.
        Tamil                  = _Tamil                  // Tamil is the set of Unicode characters in script Tamil.
        Tangut                 = _Tangut                 // Tangut is the set of Unicode characters in script Tangut.
        Telugu                 = _Telugu                 // Telugu is the set of Unicode characters in script Telugu.
        Thaana                 = _Thaana                 // Thaana is the set of Unicode characters in script Thaana.
        Thai                   = _Thai                   // Thai is the set of Unicode characters in script Thai.
        Tibetan                = _Tibetan                // Tibetan is the set of Unicode characters in script Tibetan.
        Tifinagh               = _Tifinagh               // Tifinagh is the set of Unicode characters in script Tifinagh.
        Tirhuta                = _Tirhuta                // Tirhuta is the set of Unicode characters in script Tirhuta.
        Ugaritic               = _Ugaritic               // Ugaritic is the set of Unicode characters in script Ugaritic.
        Vai                    = _Vai                    // Vai is the set of Unicode characters in script Vai.
        Warang_Citi            = _Warang_Citi            // Warang_Citi is the set of Unicode characters in script Warang_Citi.
        Yi                     = _Yi                     // Yi is the set of Unicode characters in script Yi.
)
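
A script table can be tested with `Is` in the same way as a category table. The sketch below checks a rune against a short, arbitrary list of scripts. The helper `scriptOf` and the list it searches are hypothetical.

```go
package main

import (
	"fmt"
	"unicode"
)

// candidates is an arbitrary subset of the script tables, chosen for illustration.
var candidates = []struct {
	name  string
	table *unicode.RangeTable
}{
	{"Latin", unicode.Latin},
	{"Greek", unicode.Greek},
	{"Cyrillic", unicode.Cyrillic},
	{"Han", unicode.Han},
	{"Common", unicode.Common},
}

// scriptOf is a hypothetical helper: it returns the name of the first
// candidate script containing r, or "unknown" if none matches.
func scriptOf(r rune) string {
	for _, c := range candidates {
		if unicode.Is(c.table, r) {
			return c.name
		}
	}
	return "unknown"
}

func main() {
	for _, r := range []rune{'a', '\u03b1', '\u0436', '\u4e16', '1'} {
		fmt.Printf("%q: %s\n", r, scriptOf(r))
	}
}
```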

These variables have type *RangeTable.

var (
        ASCII_Hex_Digit                    = _ASCII_Hex_Digit                    // ASCII_Hex_Digit is the set of Unicode characters with property ASCII_Hex_Digit.
        Bidi_Control                       = _Bidi_Control                       // Bidi_Control is the set of Unicode characters with property Bidi_Control.
        Dash                               = _Dash                               // Dash is the set of Unicode characters with property Dash.
        Deprecated                         = _Deprecated                         // Deprecated is the set of Unicode characters with property Deprecated.
        Diacritic                          = _Diacritic                          // Diacritic is the set of Unicode characters with property Diacritic.
        Extender                           = _Extender                           // Extender is the set of Unicode characters with property Extender.
        Hex_Digit                          = _Hex_Digit                          // Hex_Digit is the set of Unicode characters with property Hex_Digit.
        Hyphen                             = _Hyphen                             // Hyphen is the set of Unicode characters with property Hyphen.
        IDS_Binary_Operator                = _IDS_Binary_Operator                // IDS_Binary_Operator is the set of Unicode characters with property IDS_Binary_Operator.
        IDS_Trinary_Operator               = _IDS_Trinary_Operator               // IDS_Trinary_Operator is the set of Unicode characters with property IDS_Trinary_Operator.
        Ideographic                        = _Ideographic                        // Ideographic is the set of Unicode characters with property Ideographic.
        Join_Control                       = _Join_Control                       // Join_Control is the set of Unicode characters with property Join_Control.
        Logical_Order_Exception            = _Logical_Order_Exception            // Logical_Order_Exception is the set of Unicode characters with property Logical_Order_Exception.
        Noncharacter_Code_Point            = _Noncharacter_Code_Point            // Noncharacter_Code_Point is the set of Unicode characters with property Noncharacter_Code_Point.
        Other_Alphabetic                   = _Other_Alphabetic                   // Other_Alphabetic is the set of Unicode characters with property Other_Alphabetic.
        Other_Default_Ignorable_Code_Point = _Other_Default_Ignorable_Code_Point // Other_Default_Ignorable_Code_Point is the set of Unicode characters with property Other_Default_Ignorable_Code_Point.
        Other_Grapheme_Extend              = _Other_Grapheme_Extend              // Other_Grapheme_Extend is the set of Unicode characters with property Other_Grapheme_Extend.
        Other_ID_Continue                  = _Other_ID_Continue                  // Other_ID_Continue is the set of Unicode characters with property Other_ID_Continue.
        Other_ID_Start                     = _Other_ID_Start                     // Other_ID_Start is the set of Unicode characters with property Other_ID_Start.
        Other_Lowercase                    = _Other_Lowercase                    // Other_Lowercase is the set of Unicode characters with property Other_Lowercase.
        Other_Math                         = _Other_Math                         // Other_Math is the set of Unicode characters with property Other_Math.
        Other_Uppercase                    = _Other_Uppercase                    // Other_Uppercase is the set of Unicode characters with property Other_Uppercase.
        Pattern_Syntax                     = _Pattern_Syntax                     // Pattern_Syntax is the set of Unicode characters with property Pattern_Syntax.
        Pattern_White_Space                = _Pattern_White_Space                // Pattern_White_Space is the set of Unicode characters with property Pattern_White_Space.
        Prepended_Concatenation_Mark       = _Prepended_Concatenation_Mark       // Prepended_Concatenation_Mark is the set of Unicode characters with property Prepended_Concatenation_Mark.
        Quotation_Mark                     = _Quotation_Mark                     // Quotation_Mark is the set of Unicode characters with property Quotation_Mark.
        Radical                            = _Radical                            // Radical is the set of Unicode characters with property Radical.
        STerm                              = _Sentence_Terminal                  // STerm is an alias for Sentence_Terminal.
        Sentence_Terminal                  = _Sentence_Terminal                  // Sentence_Terminal is the set of Unicode characters with property Sentence_Terminal.
        Soft_Dotted                        = _Soft_Dotted                        // Soft_Dotted is the set of Unicode characters with property Soft_Dotted.
        Terminal_Punctuation               = _Terminal_Punctuation               // Terminal_Punctuation is the set of Unicode characters with property Terminal_Punctuation.
        Unified_Ideograph                  = _Unified_Ideograph                  // Unified_Ideograph is the set of Unicode characters with property Unified_Ideograph.
        Variation_Selector                 = _Variation_Selector                 // Variation_Selector is the set of Unicode characters with property Variation_Selector.
        White_Space                        = _White_Space                        // White_Space is the set of Unicode characters with property White_Space.
)
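
Property tables work the same way. As an illustration, the sketch below contrasts Hex_Digit with ASCII_Hex_Digit using a fullwidth letter. The helper names are hypothetical.

```go
package main

import (
	"fmt"
	"unicode"
)

// isHexDigit is a hypothetical helper for the Hex_Digit property.
func isHexDigit(r rune) bool {
	return unicode.Is(unicode.Hex_Digit, r)
}

// isASCIIHexDigit is a hypothetical helper for the ASCII_Hex_Digit property.
func isASCIIHexDigit(r rune) bool {
	return unicode.Is(unicode.ASCII_Hex_Digit, r)
}

func main() {
	// U+FF26 is FULLWIDTH LATIN CAPITAL LETTER F.
	for _, r := range []rune{'f', 'g', '\uFF26'} {
		fmt.Printf("%q hex=%t asciiHex=%t\n", r, isHexDigit(r), isASCIIHexDigit(r))
	}
}
```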

CaseRanges is the table describing case mappings for all letters with non-self mappings.

var CaseRanges = _CaseRanges

Categories is the set of Unicode category tables.

var Categories = map[string]*RangeTable{
        "C":  C,
        "Cc": Cc,
        "Cf": Cf,
        "Co": Co,
        "Cs": Cs,
        "L":  L,
        "Ll": Ll,
        "Lm": Lm,
        "Lo": Lo,
        "Lt": Lt,
        "Lu": Lu,
        "M":  M,
        "Mc": Mc,
        "Me": Me,
        "Mn": Mn,
        "N":  N,
        "Nd": Nd,
        "Nl": Nl,
        "No": No,
        "P":  P,
        "Pc": Pc,
        "Pd": Pd,
        "Pe": Pe,
        "Pf": Pf,
        "Pi": Pi,
        "Po": Po,
        "Ps": Ps,
        "S":  S,
        "Sc": Sc,
        "Sk": Sk,
        "Sm": Sm,
        "So": So,
        "Z":  Z,
        "Zl": Zl,
        "Zp": Zp,
        "Zs": Zs,
}
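
The map is convenient when the category name is only known at run time, for example when it comes from user input. A minimal sketch, assuming a hypothetical helper `inCategory` that also reports whether the name was recognized:

```go
package main

import (
	"fmt"
	"unicode"
)

// inCategory is a hypothetical helper: it looks up a category table by name
// and reports whether r belongs to it. known is false for unrecognized names.
func inCategory(name string, r rune) (member, known bool) {
	table, ok := unicode.Categories[name]
	if !ok {
		return false, false
	}
	return unicode.Is(table, r), true
}

func main() {
	for _, name := range []string{"Lu", "Nd", "Xx"} {
		member, known := inCategory(name, 'A')
		fmt.Printf("%s: member=%t known=%t\n", name, member, known)
	}
}
```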

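FoldCategory maps a category name to a table of code points outside the category that are equivalent under simple case folding to code points inside the category. If there is no entry for a category name, there are no such points.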

var FoldCategory = map[string]*RangeTable{
        "L":  foldL,
        "Ll": foldLl,
        "Lt": foldLt,
        "Lu": foldLu,
        "M":  foldM,
        "Mn": foldMn,
}
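
One way these tables might be used is case-insensitive category matching: a rune matches if it is in the category itself or in the corresponding fold table. The helper `matchesFolded` below is hypothetical. It checks for missing map entries before calling `Is`, because the maps return a nil table for absent names.

```go
package main

import (
	"fmt"
	"unicode"
)

// matchesFolded is a hypothetical helper: it reports whether r is in the named
// category, or is equivalent under simple case folding to a rune in it.
func matchesFolded(name string, r rune) bool {
	if t, ok := unicode.Categories[name]; ok && unicode.Is(t, r) {
		return true
	}
	if t, ok := unicode.FoldCategory[name]; ok && unicode.Is(t, r) {
		return true
	}
	return false
}

func main() {
	fmt.Println(matchesFolded("Lu", 'A')) // in Lu directly
	fmt.Println(matchesFolded("Lu", 'a')) // outside Lu, but folds to 'A'
	fmt.Println(matchesFolded("Lu", '1')) // no case relationship to Lu
}
```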

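FoldScript maps a script name to a table of code points outside the script that are equivalent under simple case folding to code points inside the script. If there is no entry for a script name, there are no such points.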

var FoldScript = map[string]*RangeTable{
        "Common":    foldCommon,
        "Greek":     foldGreek,
        "Inherited": foldInherited,
}
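
As an illustration, the micro sign U+00B5 belongs to the Common script but folds together with the Greek letter mu, so it is expected to appear in the Greek fold table. The sketch below checks this with a hypothetical helper `foldsIntoScript`; the specific example rune is an assumption about the current tables.

```go
package main

import (
	"fmt"
	"unicode"
)

// foldsIntoScript is a hypothetical helper: it reports whether r lies outside
// the named script but is equivalent under simple case folding to a rune in it.
func foldsIntoScript(name string, r rune) bool {
	table, ok := unicode.FoldScript[name]
	return ok && unicode.Is(table, r)
}

func main() {
	const micro = '\u00B5' // MICRO SIGN, script Common
	fmt.Printf("in Greek: %t\n", unicode.Is(unicode.Greek, micro))
	fmt.Printf("folds into Greek: %t\n", foldsIntoScript("Greek", micro))
	fmt.Printf("next in fold orbit: %U\n", unicode.SimpleFold(micro))
}
```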

GraphicRanges defines the set of graphic characters according to Unicode.

var GraphicRanges = []*RangeTable{
        L, M, N, P, S, Zs,
}

PrintRanges defines the set of printable characters according to Go. ASCII space, U+0020, is handled separately.

var PrintRanges = []*RangeTable{
        L, M, N, P, S,
}
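
The two slices differ only in Zs, the space separator category. The sketch below tests a few runes against each slice with `In`; the helper names are hypothetical. It also shows why the ASCII space needs separate handling: it is not covered by PrintRanges, yet `IsPrint` accepts it.

```go
package main

import (
	"fmt"
	"unicode"
)

// inGraphicRanges is a hypothetical helper: membership in GraphicRanges only.
func inGraphicRanges(r rune) bool {
	return unicode.In(r, unicode.GraphicRanges...)
}

// inPrintRanges is a hypothetical helper: membership in PrintRanges only,
// without the special case for U+0020 that IsPrint applies.
func inPrintRanges(r rune) bool {
	return unicode.In(r, unicode.PrintRanges...)
}

func main() {
	// U+3000 is IDEOGRAPHIC SPACE, category Zs.
	for _, r := range []rune{'a', ' ', '\u3000'} {
		fmt.Printf("%U graphicRanges=%t printRanges=%t IsPrint=%t\n",
			r, inGraphicRanges(r), inPrintRanges(r), unicode.IsPrint(r))
	}
}
```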

Properties is the set of Unicode property tables.

var Properties = map[string]*RangeTable{
        "ASCII_Hex_Digit":                    ASCII_Hex_Digit,
        "Bidi_Control":                       Bidi_Control,
        "Dash":                               Dash,
        "Deprecated":                         Deprecated,
        "Diacritic":                          Diacritic,
        "Extender":                           Extender,
        "Hex_Digit":                          Hex_Digit,
        "Hyphen":                             Hyphen,
        "IDS_Binary_Operator":                IDS_Binary_Operator,
        "IDS_Trinary_Operator":               IDS_Trinary_Operator,
        "Ideographic":                        Ideographic,
        "Join_Control":                       Join_Control,
        "Logical_Order_Exception":            Logical_Order_Exception,
        "Noncharacter_Code_Point":            Noncharacter_Code_Point,
        "Other_Alphabetic":                   Other_Alphabetic,
        "Other_Default_Ignorable_Code_Point": Other_Default_Ignorable_Code_Point,
        "Other_Grapheme_Extend":              Other_Grapheme_Extend,
        "Other_ID_Continue":                  Other_ID_Continue,
        "Other_ID_Start":                     Other_ID_Start,
        "Other_Lowercase":                    Other_Lowercase,
        "Other_Math":                         Other_Math,
        "Other_Uppercase":                    Other_Uppercase,
        "Pattern_Syntax":                     Pattern_Syntax,
        "Pattern_White_Space":                Pattern_White_Space,
        "Prepended_Concatenation_Mark":       Prepended_Concatenation_Mark,
        "Quotation_Mark":                     Quotation_Mark,
        "Radical":                            Radical,
        "Sentence_Terminal":                  Sentence_Terminal,
        "STerm":                              Sentence_Terminal,
        "Soft_Dotted":                        Soft_Dotted,
        "Terminal_Punctuation":               Terminal_Punctuation,
        "Unified_Ideograph":                  Unified_Ideograph,
        "Variation_Selector":                 Variation_Selector,
        "White_Space":                        White_Space,
}
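
Because Properties is keyed by name, a table can be chosen at run time and passed to Is. The following is a minimal sketch; `hasProperty` is a hypothetical helper, not a function of the package.

```go
package main

import (
	"fmt"
	"unicode"
)

// hasProperty reports whether r has the named Unicode property.
// It returns false when name is not a key of unicode.Properties.
// (Hypothetical helper, used only for this illustration.)
func hasProperty(name string, r rune) bool {
	table, ok := unicode.Properties[name]
	if !ok {
		return false
	}
	return unicode.Is(table, r)
}

func main() {
	fmt.Println(hasProperty("Hex_Digit", 'f'))
	fmt.Println(hasProperty("Hex_Digit", 'g'))
	fmt.Println(hasProperty("Dash", '-'))
	fmt.Println(hasProperty("No_Such_Property", 'a'))
}
```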

Scripts is the set of Unicode script tables.

var Scripts = map[string]*RangeTable{
        "Adlam":                  Adlam,
        "Ahom":                   Ahom,
        "Anatolian_Hieroglyphs":  Anatolian_Hieroglyphs,
        "Arabic":                 Arabic,
        "Armenian":               Armenian,
        "Avestan":                Avestan,
        "Balinese":               Balinese,
        "Bamum":                  Bamum,
        "Bassa_Vah":              Bassa_Vah,
        "Batak":                  Batak,
        "Bengali":                Bengali,
        "Bhaiksuki":              Bhaiksuki,
        "Bopomofo":               Bopomofo,
        "Brahmi":                 Brahmi,
        "Braille":                Braille,
        "Buginese":               Buginese,
        "Buhid":                  Buhid,
        "Canadian_Aboriginal":    Canadian_Aboriginal,
        "Carian":                 Carian,
        "Caucasian_Albanian":     Caucasian_Albanian,
        "Chakma":                 Chakma,
        "Cham":                   Cham,
        "Cherokee":               Cherokee,
        "Common":                 Common,
        "Coptic":                 Coptic,
        "Cuneiform":              Cuneiform,
        "Cypriot":                Cypriot,
        "Cyrillic":               Cyrillic,
        "Deseret":                Deseret,
        "Devanagari":             Devanagari,
        "Duployan":               Duployan,
        "Egyptian_Hieroglyphs":   Egyptian_Hieroglyphs,
        "Elbasan":                Elbasan,
        "Ethiopic":               Ethiopic,
        "Georgian":               Georgian,
        "Glagolitic":             Glagolitic,
        "Gothic":                 Gothic,
        "Grantha":                Grantha,
        "Greek":                  Greek,
        "Gujarati":               Gujarati,
        "Gurmukhi":               Gurmukhi,
        "Han":                    Han,
        "Hangul":                 Hangul,
        "Hanunoo":                Hanunoo,
        "Hatran":                 Hatran,
        "Hebrew":                 Hebrew,
        "Hiragana":               Hiragana,
        "Imperial_Aramaic":       Imperial_Aramaic,
        "Inherited":              Inherited,
        "Inscriptional_Pahlavi":  Inscriptional_Pahlavi,
        "Inscriptional_Parthian": Inscriptional_Parthian,
        "Javanese":               Javanese,
        "Kaithi":                 Kaithi,
        "Kannada":                Kannada,
        "Katakana":               Katakana,
        "Kayah_Li":               Kayah_Li,
        "Kharoshthi":             Kharoshthi,
        "Khmer":                  Khmer,
        "Khojki":                 Khojki,
        "Khudawadi":              Khudawadi,
        "Lao":                    Lao,
        "Latin":                  Latin,
        "Lepcha":                 Lepcha,
        "Limbu":                  Limbu,
        "Linear_A":               Linear_A,
        "Linear_B":               Linear_B,
        "Lisu":                   Lisu,
        "Lycian":                 Lycian,
        "Lydian":                 Lydian,
        "Mahajani":               Mahajani,
        "Malayalam":              Malayalam,
        "Mandaic":                Mandaic,
        "Manichaean":             Manichaean,
        "Marchen":                Marchen,
        "Meetei_Mayek":           Meetei_Mayek,
        "Mende_Kikakui":          Mende_Kikakui,
        "Meroitic_Cursive":       Meroitic_Cursive,
        "Meroitic_Hieroglyphs":   Meroitic_Hieroglyphs,
        "Miao":                   Miao,
        "Modi":                   Modi,
        "Mongolian":              Mongolian,
        "Mro":                    Mro,
        "Multani":                Multani,
        "Myanmar":                Myanmar,
        "Nabataean":              Nabataean,
        "New_Tai_Lue":            New_Tai_Lue,
        "Newa":                   Newa,
        "Nko":                    Nko,
        "Ogham":                  Ogham,
        "Ol_Chiki":               Ol_Chiki,
        "Old_Hungarian":          Old_Hungarian,
        "Old_Italic":             Old_Italic,
        "Old_North_Arabian":      Old_North_Arabian,
        "Old_Permic":             Old_Permic,
        "Old_Persian":            Old_Persian,
        "Old_South_Arabian":      Old_South_Arabian,
        "Old_Turkic":             Old_Turkic,
        "Oriya":                  Oriya,
        "Osage":                  Osage,
        "Osmanya":                Osmanya,
        "Pahawh_Hmong":           Pahawh_Hmong,
        "Palmyrene":              Palmyrene,
        "Pau_Cin_Hau":            Pau_Cin_Hau,
        "Phags_Pa":               Phags_Pa,
        "Phoenician":             Phoenician,
        "Psalter_Pahlavi":        Psalter_Pahlavi,
        "Rejang":                 Rejang,
        "Runic":                  Runic,
        "Samaritan":              Samaritan,
        "Saurashtra":             Saurashtra,
        "Sharada":                Sharada,
        "Shavian":                Shavian,
        "Siddham":                Siddham,
        "SignWriting":            SignWriting,
        "Sinhala":                Sinhala,
        "Sora_Sompeng":           Sora_Sompeng,
        "Sundanese":              Sundanese,
        "Syloti_Nagri":           Syloti_Nagri,
        "Syriac":                 Syriac,
        "Tagalog":                Tagalog,
        "Tagbanwa":               Tagbanwa,
        "Tai_Le":                 Tai_Le,
        "Tai_Tham":               Tai_Tham,
        "Tai_Viet":               Tai_Viet,
        "Takri":                  Takri,
        "Tamil":                  Tamil,
        "Tangut":                 Tangut,
        "Telugu":                 Telugu,
        "Thaana":                 Thaana,
        "Thai":                   Thai,
        "Tibetan":                Tibetan,
        "Tifinagh":               Tifinagh,
        "Tirhuta":                Tirhuta,
        "Ugaritic":               Ugaritic,
        "Vai":                    Vai,
        "Warang_Citi":            Warang_Citi,
        "Yi":                     Yi,
}
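
One possible use of the Scripts map is to look up which script a rune belongs to by testing it against each table. This is a sketch under the assumption that a linear scan over the map is acceptable; `scriptsOf` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"sort"
	"unicode"
)

// scriptsOf returns the sorted names of every table in unicode.Scripts
// that contains r. (Hypothetical helper, used only for this illustration.)
func scriptsOf(r rune) []string {
	var names []string
	for name, table := range unicode.Scripts {
		if unicode.Is(table, r) {
			names = append(names, name)
		}
	}
	sort.Strings(names) // map iteration order is random
	return names
}

func main() {
	for _, r := range []rune{'a', 'я', '中', '1'} {
		fmt.Printf("%c %v\n", r, scriptsOf(r))
	}
}
```

Scanning every table is slow for bulk text; code that only cares about a few scripts can test those tables directly.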

func In

func In(r rune, ranges ...*RangeTable) bool

In reports whether the rune is a member of one of the ranges.
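
As a rough illustration, In can test a rune against several script tables in one call. The helper name `isEastAsianScript` and the choice of scripts are assumptions made for this example.

```go
package main

import (
	"fmt"
	"unicode"
)

// isEastAsianScript reports whether r is in the Han, Hiragana, Katakana,
// or Hangul script tables. (Hypothetical helper; the selection of scripts
// is illustrative only.)
func isEastAsianScript(r rune) bool {
	return unicode.In(r, unicode.Han, unicode.Hiragana, unicode.Katakana, unicode.Hangul)
}

func main() {
	for _, r := range "Go言語カナ한" {
		fmt.Printf("%c %t\n", r, isEastAsianScript(r))
	}
	// A []*RangeTable such as GraphicRanges can be spread with "...".
	fmt.Println(unicode.In('é', unicode.GraphicRanges...))
}
```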

func Is

func Is(rangeTab *RangeTable, r rune) bool

Is reports whether the rune is in the specified table of ranges.

func IsControl

func IsControl(r rune) bool

IsControl reports whether the rune is a control character. The C (Other) Unicode category includes more code points, such as surrogates; use Is(C, r) to test for them.

func IsDigit

func IsDigit(r rune) bool

IsDigit reports whether the rune is a decimal digit.

func IsGraphic

func IsGraphic(r rune) bool

IsGraphic reports whether the rune is defined as a Graphic by Unicode. Such characters include letters, marks, numbers, punctuation, symbols, and spaces, from categories L, M, N, P, S, and Zs.

func IsLetter

func IsLetter(r rune) bool

IsLetter reports whether the rune is a letter (category L).

func IsLower

func IsLower(r rune) bool

IsLower reports whether the rune is a lower case letter.

func IsMark

func IsMark(r rune) bool

IsMark reports whether the rune is a mark character (category M).

func IsNumber

func IsNumber(r rune) bool

IsNumber reports whether the rune is a number (category N).
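
IsDigit covers only decimal digits, while IsNumber covers the whole N category, which also holds letter-like numerals and fractions. A small sketch of the difference, with `numericKind` as a hypothetical helper:

```go
package main

import (
	"fmt"
	"unicode"
)

// numericKind describes r using IsDigit and IsNumber.
// (Hypothetical helper, used only for this illustration.)
func numericKind(r rune) string {
	switch {
	case unicode.IsDigit(r):
		return "decimal digit"
	case unicode.IsNumber(r):
		return "number, not a decimal digit"
	default:
		return "not a number"
	}
}

func main() {
	// '7', ARABIC-INDIC DIGIT THREE, ROMAN NUMERAL EIGHT, ONE HALF, 'x'.
	for _, r := range []rune{'7', '\u0663', '\u2167', '\u00bd', 'x'} {
		fmt.Printf("%U %s\n", r, numericKind(r))
	}
}
```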

func IsOneOf

func IsOneOf(ranges []*RangeTable, r rune) bool

IsOneOf reports whether the rune is a member of one of the ranges. The function In provides a nicer signature and should be used in preference to IsOneOf.

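func IsPrint

func IsPrint(r rune) bool

IsPrint reports whether the rune is defined as printable by Go. Such characters include letters, marks, numbers, punctuation, and symbols from categories L, M, N, P, and S, plus the ASCII space character. This categorization is the same as IsGraphic except that the only spacing character is ASCII space, U+0020.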

func IsPunct

func IsPunct(r rune) bool

IsPunct reports whether the rune is a Unicode punctuation character (category P).

func IsSpace

func IsSpace(r rune) bool

IsSpace reports whether the rune is a space character as defined by Unicode's White Space property; in the Latin-1 space this is

'\t', '\n', '\v', '\f', '\r', ' ', U+0085 (NEL), U+00A0 (NBSP).

Other definitions of spacing characters are set by category Z and property Pattern_White_Space.
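
One way to see which runes IsSpace accepts is to use it as the split predicate for strings.FieldsFunc. This is a sketch; `splitOnSpace` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// splitOnSpace splits s around runs of runes for which unicode.IsSpace
// is true. (Hypothetical helper, used only for this illustration.)
func splitOnSpace(s string) []string {
	return strings.FieldsFunc(s, unicode.IsSpace)
}

func main() {
	// NBSP, ideographic space, and tab all separate fields.
	fmt.Printf("%q\n", splitOnSpace("a\u00a0b\u3000c\td"))
	// U+200B ZERO WIDTH SPACE lacks the White_Space property,
	// so it does not separate fields.
	fmt.Printf("%q\n", splitOnSpace("a\u200bb"))
}
```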

func IsSymbol

func IsSymbol(r rune) bool

IsSymbol reports whether the rune is a symbolic character.

func IsTitle

func IsTitle(r rune) bool

IsTitle reports whether the rune is a title case letter.

func IsUpper

func IsUpper(r rune) bool

IsUpper reports whether the rune is an upper case letter.

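func SimpleFold

func SimpleFold(r rune) rune

SimpleFold iterates over Unicode code points equivalent under the Unicode-defined simple case folding. Among the code points equivalent to rune (including rune itself), SimpleFold returns the smallest rune > r if one exists, or else the smallest rune >= 0. If r is not a valid Unicode code point, SimpleFold(r) returns r.

For example:

SimpleFold('A') = 'a'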
SimpleFold('a') = 'A'

SimpleFold('K') = 'k'
SimpleFold('k') = '\u212A' (Kelvin symbol, K)
SimpleFold('\u212A') = 'K'

SimpleFold('1') = '1'

SimpleFold(-2) = -2

Example

package main

import (
	"fmt"
	"unicode"
)

func main() {
	fmt.Printf("%#U\n", unicode.SimpleFold('A'))      // 'a'
	fmt.Printf("%#U\n", unicode.SimpleFold('a'))      // 'A'
	fmt.Printf("%#U\n", unicode.SimpleFold('K'))      // 'k'
	fmt.Printf("%#U\n", unicode.SimpleFold('k'))      // '\u212A' (Kelvin symbol, K)
	fmt.Printf("%#U\n", unicode.SimpleFold('\u212A')) // 'K'
	fmt.Printf("%#U\n", unicode.SimpleFold('1'))      // '1'

}
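
Because SimpleFold cycles through the equivalent code points and wraps around, calling it repeatedly until the starting rune comes back enumerates the whole equivalence class. A minimal sketch, with `foldOrbit` as a hypothetical helper:

```go
package main

import (
	"fmt"
	"unicode"
)

// foldOrbit returns every rune equivalent to r under simple case folding,
// starting at r and following SimpleFold until it wraps back to r.
// (Hypothetical helper, used only for this illustration.)
func foldOrbit(r rune) []rune {
	orbit := []rune{r}
	for next := unicode.SimpleFold(r); next != r; next = unicode.SimpleFold(next) {
		orbit = append(orbit, next)
	}
	return orbit
}

func main() {
	for _, r := range []rune{'k', 'A', '1'} {
		fmt.Printf("%q -> %q\n", r, foldOrbit(r))
	}
}
```

For 'k' the orbit has three members: 'k', the Kelvin sign U+212A, and 'K'. A rune without case variants, such as '1', yields an orbit holding only itself.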

func To

func To(_case int, r rune) rune

To maps the rune to the specified case: UpperCase, LowerCase, or TitleCase.

Example

package main

import (
	"fmt"
	"unicode"
)

func main() {
	const lcG = 'g'
	fmt.Printf("%#U\n", unicode.To(unicode.UpperCase, lcG))
	fmt.Printf("%#U\n", unicode.To(unicode.LowerCase, lcG))
	fmt.Printf("%#U\n", unicode.To(unicode.TitleCase, lcG))

	const ucG = 'G'
	fmt.Printf("%#U\n", unicode.To(unicode.UpperCase, ucG))
	fmt.Printf("%#U\n", unicode.To(unicode.LowerCase, ucG))
	fmt.Printf("%#U\n", unicode.To(unicode.TitleCase, ucG))

}

func ToLower

func ToLower(r rune) rune

ToLower maps the rune to lower case.

Example

package main

import (
	"fmt"
	"unicode"
)

func main() {
	const ucG = 'G'
	fmt.Printf("%#U\n", unicode.ToLower(ucG))

}

func ToTitle

func ToTitle(r rune) rune

ToTitle maps the rune to title case.

Example

package main

import (
	"fmt"
	"unicode"
)

func main() {
	const lcG = 'g'
	fmt.Printf("%#U\n", unicode.ToTitle(lcG))

}

func ToUpper

func ToUpper(r rune) rune

ToUpper maps the rune to upper case.

Example

package main

import (
	"fmt"
	"unicode"
)

func main() {
	const lcG = 'g'
	fmt.Printf("%#U\n", unicode.ToUpper(lcG))

}

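type CaseRange

CaseRange represents a range of Unicode code points for simple (one code point to one code point) case conversion. The range runs from Lo to Hi inclusive, with a fixed stride of 1. Deltas are the numbers to add to the code point to reach the code point for a different case of that character. They may be negative. If zero, it means the character is in the corresponding case. There is a special case representing sequences of alternating corresponding Upper and Lower pairs. It appears with a fixed Delta of

{UpperLower, UpperLower, UpperLower}

The constant UpperLower has an otherwise impossible delta value.

type CaseRange struct {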
        Lo    uint32
        Hi    uint32
        Delta d
}
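
To make the role of Delta concrete, the sketch below looks up a rune in the package's exported CaseRanges table and applies the UpperCase delta by hand. The helpers `findCaseRange` and `upperViaDelta` are hypothetical; real code would simply call ToUpper.

```go
package main

import (
	"fmt"
	"unicode"
)

// findCaseRange returns the entry of unicode.CaseRanges containing r.
// (Hypothetical helper; a linear scan is used for clarity.)
func findCaseRange(r rune) (unicode.CaseRange, bool) {
	for _, cr := range unicode.CaseRanges {
		if rune(cr.Lo) <= r && r <= rune(cr.Hi) {
			return cr, true
		}
	}
	return unicode.CaseRange{}, false
}

// upperViaDelta upper-cases r by adding the UpperCase delta of its range.
// (Hypothetical helper, used only for this illustration.)
func upperViaDelta(r rune) rune {
	cr, ok := findCaseRange(r)
	if !ok {
		return r // no case mapping recorded for r
	}
	delta := cr.Delta[unicode.UpperCase]
	if delta == unicode.UpperLower {
		// Alternating upper/lower pairs need parity logic;
		// defer to the package for those ranges.
		return unicode.ToUpper(r)
	}
	return r + delta
}

func main() {
	if cr, ok := findCaseRange('a'); ok {
		fmt.Printf("%U-%U delta=%v\n", cr.Lo, cr.Hi, cr.Delta)
	}
	fmt.Printf("%c %c %c\n", upperViaDelta('a'), upperViaDelta('A'), upperViaDelta('1'))
}
```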

type Range16

Range16 represents a range of 16-bit Unicode code points. The range runs from Lo to Hi inclusive and has the specified stride.

type Range16 struct {
        Lo     uint16
        Hi     uint16
        Stride uint16
}

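type Range32

Range32 represents a range of Unicode code points and is used when one or more of the values will not fit in 16 bits. The range runs from Lo to Hi inclusive and has the specified stride. Lo and Hi must always be >= 1<<16.

type Range32 struct {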
        Lo     uint32
        Hi     uint32
        Stride uint32
}

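type RangeTable

RangeTable defines a set of Unicode code points by listing the ranges of code points within the set. The ranges are listed in two slices to save space: a slice of 16-bit ranges and a slice of 32-bit ranges. The two slices must be in sorted order and non-overlapping. Also, R32 should contain only values >= 0x10000 (1<<16).

type RangeTable struct {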
        R16         []Range16
        R32         []Range32
        LatinOffset int // number of entries in R16 with Hi <= MaxLatin1
}
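
A RangeTable can also be built by hand and passed to Is. The sketch below uses one stride-2 Range16 to hold the even ASCII digits; the names `evenDigits` and `isEvenDigit` are assumptions made for this example.

```go
package main

import (
	"fmt"
	"unicode"
)

// evenDigits holds the ASCII digits 0, 2, 4, 6 and 8 as a single
// stride-2 range. (Hypothetical table, used only for this illustration.)
var evenDigits = &unicode.RangeTable{
	R16: []unicode.Range16{
		{Lo: '0', Hi: '8', Stride: 2},
	},
	LatinOffset: 1, // one R16 entry has Hi <= unicode.MaxLatin1
}

// isEvenDigit reports whether r is in evenDigits.
func isEvenDigit(r rune) bool {
	return unicode.Is(evenDigits, r)
}

func main() {
	for _, r := range "0123456789" {
		fmt.Printf("%c %t\n", r, isEvenDigit(r))
	}
}
```

The stride lets one entry describe every second code point between Lo and Hi, which is how the package's own tables stay compact.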

type SpecialCase

SpecialCase represents language-specific case mappings such as Turkish. Methods of SpecialCase customize (by overriding) the standard mappings.

type SpecialCase []CaseRange

var AzeriCase SpecialCase = _TurkishCase

var TurkishCase SpecialCase = _TurkishCase

Example

package main

import (
	"fmt"
	"unicode"
)

func main() {
	t := unicode.TurkishCase

	const lci = 'i'
	fmt.Printf("%#U\n", t.ToLower(lci))
	fmt.Printf("%#U\n", t.ToTitle(lci))
	fmt.Printf("%#U\n", t.ToUpper(lci))

	const uci = 'İ'
	fmt.Printf("%#U\n", t.ToLower(uci))
	fmt.Printf("%#U\n", t.ToTitle(uci))
	fmt.Printf("%#U\n", t.ToUpper(uci))

}

func (SpecialCase) ToLower

func (special SpecialCase) ToLower(r rune) rune

ToLower maps the rune to lower case, giving priority to the special mapping.

func (SpecialCase) ToTitle

func (special SpecialCase) ToTitle(r rune) rune

ToTitle maps the rune to title case, giving priority to the special mapping.

func (SpecialCase) ToUpper

func (special SpecialCase) ToUpper(r rune) rune

ToUpper maps the rune to upper case, giving priority to the special mapping.

Bugs

  • There is no mechanism for full case folding, that is, for characters that involve multiple runes in the input or output.

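Subdirectories

| Name | Synopsis |

| .. | |

| utf16 | Package utf16 implements encoding and decoding of UTF-16 sequences. |

| utf8 | Package utf8 implements functions and constants to support text encoded in UTF-8. |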
