API Reference
TextEncodeBase.AbstractTokenization
TextEncodeBase.AbstractTokenizer
TextEncodeBase.ConstTerm
TextEncodeBase.FlatTokenizer
TextEncodeBase.IndexInputTerm
TextEncodeBase.InputTerm
TextEncodeBase.NestedTokenizer
TextEncodeBase.RepeatedTerm
TextEncodeBase.SequenceTemplate
TextEncodeBase.Splittability
TextEncodeBase.TemplateTerm
TextEncodeBase.TextEncoder
TextEncodeBase.TextEncoder
TextEncodeBase.TokenStages
TextEncodeBase.Vocab
TextEncodeBase.Vocab
TextEncodeBase.batch2nested
TextEncodeBase.decode
TextEncodeBase.decode_indices
TextEncodeBase.decode_text
TextEncodeBase.encode
TextEncodeBase.encode_indices
TextEncodeBase.join_text
TextEncodeBase.lookup
TextEncodeBase.lookup
TextEncodeBase.lookup
TextEncodeBase.lookup
TextEncodeBase.lookup
TextEncodeBase.lookup
TextEncodeBase.lookup
TextEncodeBase.matchsplits
TextEncodeBase.matchsplits
TextEncodeBase.nested2batch
TextEncodeBase.onehot_encode
TextEncodeBase.peek_sequence_sample_type
TextEncodeBase.preprocess
TextEncodeBase.process
TextEncodeBase.process
TextEncodeBase.sequence_sample_type
TextEncodeBase.splittability
TextEncodeBase.splittable
TextEncodeBase.splitting
TextEncodeBase.tokenization
TextEncodeBase.tokenize
TextEncodeBase.tokenize_procedure
TextEncodeBase.trunc_and_pad
TextEncodeBase.trunc_or_pad
TextEncodeBase.type_sequence_sample_type
TextEncodeBase.with_head_tail
TextEncodeBase.wrap
TextEncodeBase.wrap
TextEncodeBase.@stage
TextEncodeBase.AbstractTokenization
— Type

abstract type for tokenization.

The tokenization procedure is separated into multiple `TokenStages` and recursive calls of `splitting`, `wrap`, and `tokenize`. `splitting` breaks a string into substrings, `wrap` marks the substrings with new `TokenStages`, and `tokenize` performs the actual tokenization.
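A minimal sketch of plugging a custom tokenization into this recursion (the `WhitespaceTokenization` name is hypothetical; only `splitting` is overloaded, so `wrap` and `tokenize` keep their default behavior):

```julia
using TextEncodeBase
using TextEncodeBase: AbstractTokenization, SentenceStage, Sentence
import TextEncodeBase: splitting

# Hypothetical tokenization: split sentences on whitespace only.
struct WhitespaceTokenization <: AbstractTokenization end

splitting(::WhitespaceTokenization, s::SentenceStage) = split(s.x)

tkr = FlatTokenizer(WhitespaceTokenization())
tkr(Sentence("hello world"))  # a flat array of `Token`s
```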
TextEncodeBase.AbstractTokenizer
— Type

abstract type for tokenizers.

Each tokenizer is linked with a tokenization (by defining `tokenization(::Tokenizer) = Tokenization()`). The overall framework dispatches on both the tokenizer and the tokenization, but most of the time we only add methods for the tokenization. This allows further composability and lets a given tokenizer influence the tokenization process.
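A sketch of that link (the `MyTokenizer` name is hypothetical):

```julia
using TextEncodeBase
using TextEncodeBase: AbstractTokenizer, DefaultTokenization
import TextEncodeBase: tokenization

struct MyTokenizer <: AbstractTokenizer end

# Link the tokenizer with a tokenization; the framework then dispatches on
# both, but methods are usually added on the tokenization alone.
tokenization(::MyTokenizer) = DefaultTokenization()
```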
TextEncodeBase.ConstTerm
— Type

ConstTerm(value::T, type_id = 1)

A `TemplateTerm` that simply puts `value` into the output sequence.
TextEncodeBase.FlatTokenizer
— Type

Tokenizer that returns a flat array of tokens instead of a nested array.
TextEncodeBase.IndexInputTerm
— Type

IndexInputTerm{T}(idx::Int, type_id = 1)

A `TemplateTerm` that takes the `idx`-th sequence of the input. If the `IndexInputTerm` is also the `idx`-th input-related term in a `SequenceTemplate`, it behaves the same as `InputTerm`.
TextEncodeBase.InputTerm
— Type

InputTerm{T}(type_id = 1)

A `TemplateTerm` that takes a sequence from the input.
TextEncodeBase.NestedTokenizer
— Type

Tokenizer that returns a nested array of tokens instead of a flat array.
TextEncodeBase.RepeatedTerm
— Type

RepeatedTerm(terms::TemplateTerm...; dynamic_type_id = false)

A special term indicating that the `terms` sequence can appear zero or more times. Cannot be nested. If `dynamic_type_id` is set, each repetition adds an offset to the type ids of the repeated `terms`. The offset is the number of repetitions so far, starting from 0, times `dynamic_type_id`.
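As a sketch (output omitted): with `dynamic_type_id = 1`, the terms of the first repetition keep their own type ids, the second repetition adds 1, the third adds 2, and so on.

```julia
using TextEncodeBase

template = SequenceTemplate(
    ConstTerm("[CLS]", 1), InputTerm{String}(1), ConstTerm("[SEP]", 1),
    RepeatedTerm(InputTerm{String}(2), ConstTerm("[SEP]", 2); dynamic_type_id = 1),
)

# Each extra input sequence triggers another repetition of the repeated
# terms, with their type ids offset by (repetition count) * dynamic_type_id.
template(["a"], ["b"], ["c"])
```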
TextEncodeBase.SequenceTemplate
— Type

SequenceTemplate(terms::TemplateTerm...)(sequences...)

Construct a function from multiple `TemplateTerm`s that specifies how to combine the input `sequences`. Returns a tuple of the result sequence and a type-id sequence (the special number associated with each template term).
Example
julia> SequenceTemplate(ConstTerm(-1), InputTerm{Int}(), ConstTerm(-2))(1:5)[1] == TextEncodeBase.with_head_tail(1:5, -1, -2)
true
julia> SequenceTemplate(ConstTerm(-1), InputTerm{Int}(), ConstTerm(-2))(1:5)
([-1, 1, 2, 3, 4, 5, -2], [1, 1, 1, 1, 1, 1, 1])
julia> bert_template = SequenceTemplate(
ConstTerm("[CLS]", 1), InputTerm{String}(1), ConstTerm("[SEP]", 1),
RepeatedTerm(InputTerm{String}(2), ConstTerm("[SEP]", 2))
)
SequenceTemplate{String}([CLS]:<type=1> Input:<type=1> [SEP]:<type=1> (Input:<type=2> [SEP]:<type=2>)...)
julia> bert_template(["hello", "world"])
(["[CLS]", "hello", "world", "[SEP]"], [1, 1, 1, 1])
julia> bert_template(["hello", "world"], ["today", "is", "a", "good", "day"])
(["[CLS]", "hello", "world", "[SEP]", "today", "is", "a", "good", "day", "[SEP]"], [1, 1, 1, 1, 2, 2, 2, 2, 2, 2])
TextEncodeBase.Splittability
— Type

splittability trait

The splittability trait decides whether a given combination (tokenizer x tokenization x stage) is splittable or not (`Splittable` or `UnSplittable`). For example, `DefaultTokenization` with `SentenceStage` is splittable (i.e. `splittability(::DefaultTokenization, ::SentenceStage) = Splittable()`). The splittability changes the behavior of `tokenize`: if the combination is splittable, `tokenize` will call `splitting` on the input, `wrap` each splitting result, and recurse. Otherwise, it will directly call `wrap` and then recurse into `tokenize`.
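The trait can be queried directly; a minimal sketch (assuming the two-argument `splittable(tokenization, stage)` form and the `Document` wrapper):

```julia
using TextEncodeBase
using TextEncodeBase: splittable, DefaultTokenization, Document

tkn = DefaultTokenization()
doc = Document("One sentence. Another sentence.")

# `splittable` returns a Bool derived from the `splittability` trait:
# documents are splittable (into sentences) under the default tokenization.
splittable(tkn, doc)
```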
TextEncodeBase.TemplateTerm
— Type

abstract type TemplateTerm{T} end

Abstract type for terms used in `SequenceTemplate`.
TextEncodeBase.TextEncoder
— Type

TextEncoder(tokenizer, vocab, process = nestedcall(getvalue))

A simple encoder implementation.
TextEncodeBase.TextEncoder
— Method

TextEncoder(builder, e::TextEncoder)

Given an encoder, return a new encoder that has the same tokenizer and vocabulary. `builder` is a function that takes an encoder and returns a new processing function.
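A sketch of the builder form (assuming the encoder stores its processing function in a `process` field, as the default implementation does):

```julia
using TextEncodeBase

enc = TextEncoder(FlatTokenizer(), Vocab(["hello", "world"]))

# `builder` receives the encoder and returns the new processing function;
# here it just wraps the old one unchanged.
enc2 = TextEncoder(enc) do e
    x -> e.process(x)
end
```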
TextEncodeBase.TokenStages
— Type

Abstract type for types that wrap the input into a specific stage to control tokenization.

There are six builtin stages in TextEncodeBase (all abstract XStage <: TokenStages):

1. Document <: DocumentStage: the input string is a full document,
and thus needs to be split into multiple sentences.
2. Sentence <: SentenceStage: the input string is a full sentence,
and thus needs to be split into multiple parts (SubSentence/Word/Token).
3. SubSentence <: SubSentenceStage: special wrapper for the case where the tokenizer
does not directly break the sentence into words/tokens and these pieces contain
multiple words/tokens, but you need the information that they are not a full sentence.
4. Word <: WordStage: the input string is a single word.
5. SubWord <: SubWordStage: similar to SubSentence, but for a word.
6. Token <: TokenStage: the final piece of the tokenization process. Generally,
it's used to mark the end of this piece, which should never be split.

Each wrapper has two fields: `x` for the input and `meta` for extra information (`nothing` if not provided).
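For example (using the `Sentence` wrapper; the single-argument constructor fills `meta` with `nothing`):

```julia
using TextEncodeBase
using TextEncodeBase: Sentence

s = Sentence("hello world")
s.x     # the wrapped input: "hello world"
s.meta  # extra information: `nothing` here
```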
TextEncodeBase.Vocab
— Type

Vocab(data::Vector{<:AbstractString}, unk::AbstractString="[UNK]")

Constructor for `Vocab`. `data` is the list of vocabulary words and can be non-unique. The actual list will be the unique version of `data` (i.e. `vocab.list = unique(data)`). `unk` is the indicator word for all unknown words. `unk` can be either in or not in `data`, depending on the use case.
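Construction mirrors the examples shown under `lookup` (duplicates in `data` are dropped):

```julia
using TextEncodeBase

vocab = Vocab(["a", "b", "c", "a", "b", "c"])  # list becomes ["a", "b", "c"], unk = "[UNK]"
vocab_unk = Vocab(["a", "b", "xxx"], "xxx")    # unk is in the list, so unki = 3
```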
TextEncodeBase.Vocab
— Method

Vocab{T}(data::AbstractVector, unk) where T

Construct a `Vocab` with element type `T`. `unk` must be specified.
TextEncodeBase.batch2nested
— Method

batch2nested(x)

Convert a single (multi-dimensional) array into a nested array.

See also: `nested2batch`
Example
julia> x = ["a" "d"; "b" "e"; "c" "f";;; "x" "u"; "y" "v"; "z" "w"; ]
3×2×2 Array{String, 3}:
[:, :, 1] =
"a" "d"
"b" "e"
"c" "f"
[:, :, 2] =
"x" "u"
"y" "v"
"z" "w"
julia> TextEncodeBase.batch2nested(x)
2-element Vector{Vector{Vector{String}}}:
[["a", "b", "c"], ["d", "e", "f"]]
[["x", "y", "z"], ["u", "v", "w"]]
TextEncodeBase.decode
— Method

decode(e::AbstractTextEncoder, x)

Decode `x`. This is basically `decode_indices`, but can be overloaded for post-processing.
TextEncodeBase.decode_indices
— Method

decode_indices(e::AbstractTextEncoder, x)

Decode from indices: decode `x` by reverse lookup of `x` in `e.vocab`.
TextEncodeBase.decode_text
— Method

decode_text(e::AbstractTextEncoder, x)

Decode `x` into text. This is basically `join_text` applied after `decode`, but can be overloaded for post-processing.
TextEncodeBase.encode
— Method

encode(e::AbstractTextEncoder, x)

Encode `x`.
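A minimal end-to-end sketch (assuming the default `process` of `TextEncoder`, which extracts token values before the vocabulary lookup):

```julia
using TextEncodeBase

enc = TextEncoder(FlatTokenizer(), Vocab(["hello", "world"]))

y = encode(enc, "hello world")  # tokenize, process, then one-hot lookup
decode(enc, y)                  # reverse lookup back to the token strings
```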
TextEncodeBase.encode_indices
— Method

encode_indices(e::AbstractTextEncoder, x)

Encode into indices: encode `x` without calling the `lookup` bound with `e`.
TextEncodeBase.join_text
— Function

join_text(x::AbstractArray [, delim [, last]])

`join` the innermost arrays while preserving the outer array structure. If the innermost array is multi-dimensional, join the text along the first dimension.
Example
julia> TextEncodeBase.join_text([["a", "b", "c"], ['x', 'y', 'z']])
2-element Vector{String}:
"abc"
"xyz"
julia> TextEncodeBase.join_text([["a", "b", "c"], ['x', 'y', 'z']], " + ")
2-element Vector{String}:
"a + b + c"
"x + y + z"
julia> TextEncodeBase.join_text([[["a", "b", "c"], ['x', 'y', 'z']]], " + ", " = ")
1-element Vector{Vector{String}}:
["a + b = c", "x + y = z"]
julia> TextEncodeBase.join_text(["a" "d"; "b" "e"; "c" "f";;; "x" "u"; "y" "v"; "z" "w"; ], " + ", " = ")
2×2 Matrix{String}:
"a + b = c" "x + y = z"
"d + e = f" "u + v = w"
TextEncodeBase.lookup
— Function

lookup(v::Vocab, x)

Look up `x` in `v`. The behavior of `lookup` depends on the type of `x`. If `x` is an integer, return the `x`-th word in the vocabulary list (i.e. `v.list[x]`), or the unknown word (`v.unk`) if `x` is out of bounds. If `x` is a string, return the index of `x` in the vocabulary list (i.e. `findfirst(==(x), v.list)`), or the unknown index if `x` is not found in the list. If the unknown word `v.unk` is in the list, the unknown index is its index; otherwise it is 0.

This function is bidirectional except for `Vocab{<:Integer}`. For an integer vocabulary, this function only gets the `x`-th word (`v.list[x]`). Use `lookup(Int, v, x)` for explicit index lookup.
Example
julia> vocab = Vocab(["a", "b", "c", "a", "b", "c"])
Vocab{String, StaticArrays.SizedVector{3, String, Vector{String}}}(size = 3, unk = [UNK], unki = 0)
julia> vocab_unk = Vocab(["a", "b", "xxx"], "xxx")
Vocab{String, StaticArrays.SizedVector{3, String, Vector{String}}}(size = 3, unk = xxx, unki = 3)
julia> lookup(vocab, "b")
2
julia> lookup(vocab, "d")
0
julia> lookup(vocab_unk, "d")
3
julia> lookup(vocab, 1)
"a"
julia> lookup(vocab, 10000)
"[UNK]"
julia> lookup(vocab_unk, 10000)
"xxx"
TextEncodeBase.lookup
— Method

lookup(e::AbstractTextEncoder, x)

Look up `x`. This is basically `onehot_encode`, but can be overloaded for extra processing.
TextEncodeBase.lookup
— Method

lookup(Int, v::Vocab, x)

The explicit version of `lookup(v, x)`. Look up the index of `x` in the vocabulary list. `x` should have the same type as the Vocab's element type.
Example
julia> vocab_unk = Vocab(["a", "b", "xxx"], "xxx")
Vocab{String, StaticArrays.SizedVector{3, String, Vector{String}}}(size = 3, unk = xxx, unki = 3)
julia> lookup(Int, vocab_unk, "b")
2
TextEncodeBase.lookup
— Method

lookup(OneHot, v::Vocab, i)

Look up `i` and convert it into a one-hot representation.
Example
julia> lookup(OneHot, vocab, "a")
3-element OneHot{3}:
1
0
0
julia> lookup(OneHot, vocab, ["a" "b"; "c" "d"])
3x2x2 OneHotArray{3, 3, Matrix{OneHot{0x00000003}}}:
[:, :, 1] =
1 0
0 0
0 1
[:, :, 2] =
0 0
1 0
0 0
julia> lookup(OneHot, vocab, 3)
ERROR: DomainError with c:
cannot convert `lookup(::Vocab, 3)` = "c" into one-hot representation.
Stacktrace:
[...]
TextEncodeBase.lookup
— Method

lookup(v::Vocab, is::AbstractArray)

Recursively look up values from `is`.
Example
julia> lookup(vocab, ["b", "c", "a", "A", "[UNK]"])
5-element Vector{Int64}:
2
3
1
0
0
julia> lookup(vocab, [1, "a", 0, "A", "[UNK]"])
5-element Vector{Any}:
"a"
1
"[UNK]"
0
0
TextEncodeBase.lookup
— Method

lookup(v::Vocab, i::OneHotArray)

Convert the one-hot representation back into words.
Example
julia> lookup(OneHot, vocab, ["a" "b"; "c" "d"])
3x2x2 OneHotArray{3, 3, Matrix{OneHot{0x00000003}}}:
[:, :, 1] =
1 0
0 0
0 1
[:, :, 2] =
0 0
1 0
0 0
julia> lookup(vocab, ans)
2×2 Matrix{String}:
"a" "b"
"c" "[UNK]"
TextEncodeBase.lookup
— Method

lookup(::Type{T}, v::Vocab{T}, i::Integer) where T

The explicit version of `lookup(v, i)`. Look up the word at index `i` in the vocabulary list. `T` should be the same type as the Vocab's element type. This method won't work on an integer vocab; use `lookup(v, i)` directly.
Example
julia> vocab_unk = Vocab(["a", "b", "xxx"], "xxx")
Vocab{String, StaticArrays.SizedVector{3, String, Vector{String}}}(size = 3, unk = xxx, unki = 3)
julia> lookup(String, vocab_unk, 1)
"a"
TextEncodeBase.matchsplits
— Method

matchsplits(pattern::AbstractPattern, str::String)

Split `str` with the regular expression `pattern`. Return a lazy iterator where each element is a `Tuple{Bool, SubString}`. The `Bool` indicates whether the `SubString` is a match of `pattern`.
Example
julia> matchsplits(r"a|c", "abc"^3)
MatchSplitIterator(r"a|c", "abcabcabc")
julia> collect(matchsplits(r"a|c", "abc"^3))
9-element Vector{Tuple{Bool, SubString{String}}}:
(1, "a")
(0, "b")
(1, "c")
(1, "a")
(0, "b")
(1, "c")
(1, "a")
(0, "b")
(1, "c")
TextEncodeBase.matchsplits
— Method

matchsplits(patterns::Vector{<:AbstractPattern}, str::String)

Split `str` with the list of regular expression `patterns`. Return a lazy iterator where each element is a `Tuple{Bool, SubString}`. The `Bool` indicates whether the `SubString` is a match of a pattern. The match order is specified by the list order.
Example
julia> matchsplits([r"a", r"c"], "abc"^3)
MatchSplits(Regex[r"a", r"c"], "abcabcabc")
julia> collect(matchsplits([r"a", r"c"], "abc"^3))
9-element Vector{Tuple{Bool, SubString{String}}}:
(1, "a")
(0, "b")
(1, "c")
(1, "a")
(0, "b")
(1, "c")
(1, "a")
(0, "b")
(1, "c")
julia> collect(matchsplits([r"ab", r"bc"], "abc"^3))
6-element Vector{Tuple{Bool, SubString{String}}}:
(1, "ab")
(0, "c")
(1, "ab")
(0, "c")
(1, "ab")
(0, "c")
TextEncodeBase.nested2batch
— Method

nested2batch(x)

Convert a nested array into a single (multi-dimensional) array.

See also: `batch2nested`
Example
julia> TextEncodeBase.nested2batch([[[1 2],[3 4]]])
1×2×2×1 Array{Int64, 4}:
[:, :, 1, 1] =
1 2
[:, :, 2, 1] =
3 4
TextEncodeBase.onehot_encode
— Method

onehot_encode(e::AbstractTextEncoder, x)

Look up `x` in the encoder's vocabulary. Return one-hot encoded vectors.
TextEncodeBase.peek_sequence_sample_type
— Method

peek_sequence_sample_type([T::Type,] x)

Non-recursive version of `sequence_sample_type`. Return `-1` if `x` is an array of arrays with unknown elements, in which case it is possible that `sequence_sample_type(x[i]) == -2`. Specify `T` to check whether `x` is a nested array with element type `T`. If `T` is not specified, every type that is not a subtype of `AbstractArray` counts as an element type.

See also: `type_sequence_sample_type`, `sequence_sample_type`
Example
julia> TextEncodeBase.peek_sequence_sample_type([1,2,3])
1
julia> peek_sequence_sample_type(Int, Any[[[1,2,3]]]), sequence_sample_type(Int, Any[[[1,2,3]]])
(-1, 3)
julia> peek_sequence_sample_type(Int, [[[1,2,3], "abc"]]), sequence_sample_type(Int, [[[1,2,3], "abc"]])
(-1, -2)
TextEncodeBase.preprocess
— Method

preprocess(tkr::AbstractTokenizer, x)

Preprocess the input `x`. This is only called during `tkr(x)`.
TextEncodeBase.process
— Method

process(e::AbstractTextEncoder, x)

Use the encoder's processing function to process `x`.
TextEncodeBase.process
— Method

process(::AbstractTextEncoder)

Get the processing function of the given encoder.
TextEncodeBase.sequence_sample_type
— Method

sequence_sample_type([T::Type,] x)

Get the depth of the nested array. If a natural number is returned, `x` is a nested array where each element has the same depth. Return `-2` if `x` is not a nested array or if the depths of its elements differ. The depth of an empty array is computed from its type, so `sequence_sample_type(Any[])` is `1`. Specify `T` to check whether `x` is a nested array with element type `T`. If `T` is not specified, every type that is not a subtype of `AbstractArray` counts as an element type.

See also: `type_sequence_sample_type`, `peek_sequence_sample_type`
Example
julia> sequence_sample_type([[1,2,3]])
2
julia> sequence_sample_type([[[2,3], [1]], Vector{Int}[]])
3
julia> sequence_sample_type([[[2,3], [1]], Any[]])
-2
julia> sequence_sample_type(Int, [[1,2], 3])
-2
julia> sequence_sample_type(Int, Any[[1,2], Int[]])
2
TextEncodeBase.splittability
— Function

splittability(args...)

Return the splittability (`Splittable`/`UnSplittable`) of the given argument combination. Overload this to make a `TokenStages` splittable.
TextEncodeBase.splittable
— Method

splittable(args...)

Return `true` if the splittability of the given argument combination is `Splittable()`.
TextEncodeBase.splitting
— Function

splitting(t::AbstractTokenization, x::TokenStages)

Split `x` given its tokenization stage. For example, the default behavior at the document stage is splitting into sentences (with `WordTokenizers.split_sentences`).

Overload this method for custom tokenization.
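A sketch of such an overload (the `LineTokenization` name is hypothetical):

```julia
using TextEncodeBase
using TextEncodeBase: AbstractTokenization, DocumentStage
import TextEncodeBase: splitting

struct LineTokenization <: AbstractTokenization end

# Hypothetical: treat each line of a document as one sentence.
splitting(::LineTokenization, d::DocumentStage) = split(d.x, '\n'; keepempty = false)
```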
TextEncodeBase.tokenization
— Method

tokenization(::AbstractTokenizer) :: AbstractTokenization

Return the tokenization object of the given tokenizer.
TextEncodeBase.tokenize
— Method

tokenize(e::AbstractTextEncoder, x)

Use the encoder's tokenizer to tokenize `x`.
TextEncodeBase.tokenize_procedure
— Method

tokenization_procedure(tokenizer, tokenization, stage)

The procedure of tokenization (`splitting` + `wrap` + `tokenize`).
TextEncodeBase.trunc_and_pad
— Function

trunc_and_pad(x, maxn, pad)

Truncate `x` if its length exceeds `maxn`, and add `pad` at the end of `x` until all lengths are the same. `x` can be either a nested or a single array. If `maxn` is `nothing`, the largest length of the innermost arrays will be used; the behavior then equals `trunc_or_pad` with `nothing`.

trunc_and_pad(x, maxn, pad, trunc_end = :tail, pad_end = :tail)

`trunc_end` and `pad_end` specify whether truncation and padding happen at the beginning or the end of the sequences. The value is either `:tail` (the end) or `:head` (the beginning).

trunc_and_pad(maxn, pad, trunc_end = :tail, pad_end = :tail)

Create a function that truncates its input to length <= `maxn` and adds `pad` until all inputs have equal length.

See also: `trunc_or_pad`
Example
julia> TextEncodeBase.trunc_and_pad(1:5, 7, -1)
5-element Vector{Int64}:
1
2
3
4
5
julia> TextEncodeBase.trunc_and_pad([1:5, 2:7], 10, -1)
2-element Vector{Vector{Int64}}:
[1, 2, 3, 4, 5, -1]
[2, 3, 4, 5, 6, 7]
julia> TextEncodeBase.trunc_and_pad([1:5, [2:7, [1:2]]], nothing, -1)
2-element Vector{Vector}:
[1, 2, 3, 4, 5, -1]
Vector[[2, 3, 4, 5, 6, 7], [[1, 2, -1, -1, -1, -1]]]
TextEncodeBase.trunc_or_pad
— Function

trunc_or_pad(x, n, pad)

Truncate `x` to length `n`, or add `pad` at the end of `x` until its length equals `n`. `x` can be either a nested or a single array. If `n` is `nothing`, the largest length of the innermost arrays will be used.

trunc_or_pad(x, n, pad, trunc_end = :tail, pad_end = :tail)

`trunc_end` and `pad_end` specify whether truncation and padding happen at the beginning or the end of the sequences. The value is either `:tail` (the end) or `:head` (the beginning).

trunc_or_pad(n, pad, trunc_end = :tail, pad_end = :tail)

Create a function that returns a new array with the truncated or padded value of its input.

See also: `trunc_and_pad`
Example
julia> TextEncodeBase.trunc_or_pad(1:5, 7, -1)
7-element Vector{Int64}:
1
2
3
4
5
-1
-1
julia> TextEncodeBase.trunc_or_pad([1:5, 2:7], 10, -1)
2-element Vector{Vector{Int64}}:
[1, 2, 3, 4, 5, -1, -1, -1, -1, -1]
[2, 3, 4, 5, 6, 7, -1, -1, -1, -1]
julia> TextEncodeBase.trunc_or_pad([1:5, [2:7, [1:2]]], nothing, -1)
2-element Vector{Vector}:
[1, 2, 3, 4, 5, -1]
Vector[[2, 3, 4, 5, 6, 7], [[1, 2, -1, -1, -1, -1]]]
TextEncodeBase.type_sequence_sample_type
— Method

type_sequence_sample_type([T::Type,] t::Type)

Get the depth of a nested array type. If a natural number is returned, `t` is a nested array type. Return `-1` if the depth cannot be determined from the type alone and `-2` if `t` is not a nested array type. Specify `T` to check whether `t` is a nested array type with element type `T`. If `T` is not specified, every type that is not a subtype of `AbstractArray` counts as an element type.

See also: `sequence_sample_type`, `peek_sequence_sample_type`
Example
julia> type_sequence_sample_type(Vector{Vector{Integer}})
2
julia> type_sequence_sample_type(Number, Array{Vector{Union{Float64, Int}}})
2
julia> type_sequence_sample_type(Int, Array{Vector{Union{Float64, Int}}})
-2
TextEncodeBase.with_head_tail
— Method

with_head_tail(x, head, tail)

Return `[head; x; tail]`. `head` or `tail` is ignored if it is `nothing`. `x` can be a nested array.
Example
julia> TextEncodeBase.with_head_tail(1:5, -1, -2)
7-element Vector{Int64}:
-1
1
2
3
4
5
-2
julia> TextEncodeBase.with_head_tail([1:5, 2:3], -1, -2)
2-element Vector{Vector{Int64}}:
[-1, 1, 2, 3, 4, 5, -2]
[-1, 2, 3, -2]
TextEncodeBase.wrap
— Function

wrap(t::AbstractTokenization, s::TokenStages, x)

Mark the tokenization stage of `x`, which is part of the splitting result of `s`. For example, if we are doing simple whitespace tokenization and are at the sentence stage, then `x` is a single word of `s` and we thus return `Word(x)` (or `Token(x)`). Skipped if `x` is already a `TokenStages`. (This method only applies to splittable stages.)

Overload this method to control the tokenization process.
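A sketch of such an overload (the `DirectTokenization` name is hypothetical):

```julia
using TextEncodeBase
using TextEncodeBase: AbstractTokenization, SentenceStage, Token
import TextEncodeBase: wrap

struct DirectTokenization <: AbstractTokenization end

# Hypothetical: pieces split out of a sentence are already final tokens,
# so mark them with `Token` instead of the default `Word`.
wrap(::DirectTokenization, ::SentenceStage, x) = Token(x)
```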
TextEncodeBase.wrap
— Method

wrap(t::AbstractTokenization, x::TokenStages)

A handler for unsplittable stages (token/word/...).

Overload this method for custom transforms.
TextEncodeBase.@stage
— Macro

@stage StageName
@stage StageName{A<:SomeType, B}
@stage StageName AbstractStage
@stage StageName{A<:SomeType, B} <: AbstractStage

Define a `TokenStages` type with two fields (`x` and `meta`), its single-argument constructor, and methods for `setmeta` and `setvalue`.
Equivalent to:
struct StageName{A<:SomeType, B} <: AbstractStage
x::A
meta::B
end
StageName(x) = StageName(x, nothing)
TextEncodeBase.setmeta(x::StageName, meta) = StageName(x.x, meta)
TextEncodeBase.setvalue(x::StageName, y) = StageName(y, x.meta)