API Reference
Base.:!
Base.:&
Base.:|
NeuralAttentionlib.apply_mask
NeuralAttentionlib.apply_mask
NeuralAttentionlib.attention_score
NeuralAttentionlib.collapsed_size
NeuralAttentionlib.collapseddim
NeuralAttentionlib.collapseddim
NeuralAttentionlib.dot_product_score
NeuralAttentionlib.generic_multihead_qkv_attention
NeuralAttentionlib.generic_qkv_attention
NeuralAttentionlib.getmask
NeuralAttentionlib.masked_score
NeuralAttentionlib.matmul
NeuralAttentionlib.merge_head
NeuralAttentionlib.mixing
NeuralAttentionlib.move_head_dim_in
NeuralAttentionlib.move_head_dim_in_perm
NeuralAttentionlib.move_head_dim_out
NeuralAttentionlib.move_head_dim_out_perm
NeuralAttentionlib.multihead_qkv_attention
NeuralAttentionlib.naive_qkv_attention
NeuralAttentionlib.noncollapsed_size
NeuralAttentionlib.normalized_score
NeuralAttentionlib.scaled_dot_product_score
NeuralAttentionlib.split_head
NeuralAttentionlib.unwrap_collapse
NeuralAttentionlib.weighted_sum_mixing
NeuralAttentionlib.AbstractArrayMask
NeuralAttentionlib.AbstractAttenMask
NeuralAttentionlib.AbstractAttenMaskOp
NeuralAttentionlib.AbstractDatalessMask
NeuralAttentionlib.BandPartMask
NeuralAttentionlib.BatchedMask
NeuralAttentionlib.BiLengthMask
NeuralAttentionlib.CausalMask
NeuralAttentionlib.CollapsedDimArray
NeuralAttentionlib.GenericMask
NeuralAttentionlib.LocalMask
NeuralAttentionlib.RandomMask
NeuralAttentionlib.RepeatMask
NeuralAttentionlib.SymLengthMask
Functional
NeuralAttentionlib.attention_score
— Function attention_score(f, args...) = f(args...)
Attention score API. It can be overloaded to provide a custom implementation for use with generic_qkv_attention. f is the score function.
See also: generic_qkv_attention, generic_multihead_qkv_attention, mixing
NeuralAttentionlib.dot_product_score
— Function dot_product_score(q, k)
Dot-product attention score function. Equivalent to scaled_dot_product_score(q, k, 1).
See also: scaled_dot_product_score
NeuralAttentionlib.generic_multihead_qkv_attention
— Function generic_multihead_qkv_attention(mixingf, scoref, head, q, k, v, args...)
Generic version of multihead_qkv_attention. The mixing function (mixingf) and the score function (scoref) must be specified.
NeuralAttentionlib.generic_qkv_attention
— Function generic_qkv_attention(mixingf, scoref, q, k, v, args...)
Generic version of naive_qkv_attention. The mixing function (mixingf) and the score function (scoref) must be specified.
NeuralAttentionlib.masked_score
— Function masked_score(mask) = masked_score $ mask
masked_score(maskop::AbstractAttenMaskOp, mask::AbstractAttenMask, score, args...)
Masked attention score API. Applies the mask, according to maskop, to the attention score computed by score(args...).
See also: naive_qkv_attention, SymLengthMask, BiLengthMask
NeuralAttentionlib.merge_head
— Function merge_head(x)
Merge the head dimension created by split_head.
NeuralAttentionlib.mixing
— Function mixing(f, v, g, args...) = f(attention_score(g, args...), v)
Mixing function API. It can be overloaded to provide a custom implementation for use with generic_qkv_attention. f is the mixing function and g is the score function.
See also: generic_qkv_attention, generic_multihead_qkv_attention, attention_score
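To make the composition concrete, the following sketch mirrors the two documented one-line definitions (attention_score and mixing) in plain Julia without loading NeuralAttentionlib. The names my_attention_score, my_mixing, toy_score and toy_mix are hypothetical, and the layout (features in the first dimension, one column per position) is an assumption chosen for illustration.

```julia
# Hypothetical stand-ins that follow the documented definitions:
#   attention_score(f, args...) = f(args...)
#   mixing(f, v, g, args...)    = f(attention_score(g, args...), v)
my_attention_score(f, args...) = f(args...)
my_mixing(f, v, g, args...) = f(my_attention_score(g, args...), v)

# Toy score function: unscaled dot product of keys and queries
# (assumed layout: features first, one column per position).
toy_score(q, k) = k' * q          # (len_k, len_q)

# Toy mixing function: weight the value columns by the score matrix.
toy_mix(s, v) = v * s             # (d_v, len_q)

q = [1.0 0.0; 0.0 1.0]
k = [1.0 0.0; 0.0 1.0]
v = [1.0 2.0; 3.0 4.0]

# With identity queries and keys the score is the identity, so out == v.
out = my_mixing(toy_mix, v, toy_score, q, k)
```

Swapping toy_score or toy_mix changes the attention variant without touching my_mixing, which is the kind of extension point the generic_qkv_attention API is described as offering.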
NeuralAttentionlib.move_head_dim_in
— Function move_head_dim_in(x::AbstractArray, nobatch=false)
Equivalent to permutedims(x, move_head_dim_in_perm(x, nobatch)).
See also: merge_head, move_head_dim_in_perm
NeuralAttentionlib.move_head_dim_in_perm
— Function move_head_dim_in_perm(x::AbstractArray{T, N}, nobatch=false)
move_head_dim_in_perm(N::Int, nobatch=false)
Dimension order for permutedims that moves the head dimension (created by split_head) from the batch dimension back to the feature dimension (for merge_head). Returns a tuple of integers of length N. nobatch specifies whether x is a single unbatched sample; when it is false, the last dimension is treated as the batch dimension and stays in place.
Example
julia> Functional.move_head_dim_in_perm(5, false)
(1, 4, 2, 3, 5)
julia> Functional.move_head_dim_in_perm(5, true)
(1, 5, 2, 3, 4)
See also: merge_head, move_head_dim_in
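The permutation can be written down directly. The sketch below is a hypothetical reconstruction (head_in_perm is not a library function) that reproduces the two documented examples; it is not taken from the NeuralAttentionlib source.

```julia
# Hypothetical reconstruction of move_head_dim_in_perm(N, nobatch).
# It matches the documented examples but is only an illustration.
function head_in_perm(N::Integer, nobatch::Bool = false)
    if nobatch
        # head is the last dimension: move it to position 2
        return (1, N, (2:N-1)...)
    else
        # head sits just before the batch dimension N: move it to position 2
        return (1, N - 1, (2:N-2)..., N)
    end
end

# (features, len_a, len_b, head, batch) -> (features, head, len_a, len_b, batch)
x = randn(8, 7, 6, 2, 3)
y = permutedims(x, head_in_perm(ndims(x)))
```

Inverting head_in_perm(5) gives [1, 3, 4, 2, 5], which matches the documented move_head_dim_out_perm(5, false) result, consistent with the two functions undoing each other.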
NeuralAttentionlib.move_head_dim_out
— Function move_head_dim_out(x::AbstractArray, nobatch=false)
Equivalent to permutedims(x, move_head_dim_out_perm(x, nobatch)).
See also: split_head, move_head_dim_out_perm
NeuralAttentionlib.move_head_dim_out_perm
— Function move_head_dim_out_perm(x::AbstractArray{T, N}, nobatch=false)
move_head_dim_out_perm(N::Int, nobatch=false)
Dimension order for permutedims that moves the head dimension (created by split_head) to the batch dimension. Returns a tuple of integers of length N. nobatch specifies whether x is a single unbatched sample; when it is false, the last dimension is treated as the batch dimension and stays in place.
Example
julia> Functional.move_head_dim_out_perm(5, false)
(1, 3, 4, 2, 5)
julia> Functional.move_head_dim_out_perm(5, true)
(1, 3, 4, 5, 2)
See also: split_head, move_head_dim_out
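A companion sketch for the outward direction, again hypothetical (head_out_perm is not a library function) and only checked against the documented examples. It is applied here to a split_head-style reshape, written with plain reshape, to show where the head dimension ends up.

```julia
# Hypothetical reconstruction of move_head_dim_out_perm(N, nobatch).
function head_out_perm(N::Integer, nobatch::Bool = false)
    if nobatch
        return (1, (3:N)..., 2)        # head (dim 2) moves to the end
    else
        return (1, (3:N-1)..., 2, N)   # head moves just before the batch dim N
    end
end

head = 2
x = randn(8, 5, 3)                             # (features, length, batch)
xs = reshape(x, :, head, size(x)[2:end]...)    # split_head-style: (4, 2, 5, 3)
y = permutedims(xs, head_out_perm(ndims(xs)))  # (4, 5, 2, 3): head next to batch
```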
NeuralAttentionlib.multihead_qkv_attention
— Function multihead_qkv_attention(head, q, k, v, mask=nothing)
Multihead version of naive_qkv_attention. This is the core operation for implementing a regular transformer layer.
NeuralAttentionlib.naive_qkv_attention
— Function naive_qkv_attention(q, k, v, mask=nothing)
The scaled dot-product attention of a regular transformer layer.
$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$
It is equivalent to generic_qkv_attention(weighted_sum_mixing, normalized_score(NNlib.softmax) $ masked_score(mask) $ scaled_dot_product_score, q, k, v).
Example
julia> fdim, ldim, bdim = 32, 10, 4;
julia> x = randn(fdim, ldim, bdim);
julia> y = naive_qkv_attention(x, x, x); # simple self attention
# no mask here
julia> z = generic_qkv_attention(weighted_sum_mixing, normalized_score(NNlib.softmax) $ scaled_dot_product_score, x, x, x);
julia> y ≈ z
true
See also: generic_qkv_attention
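For readers who want to see the formula without the library, here is a minimal single-sequence sketch in plain Julia. It assumes a features-first layout (one column per position), so the score is computed as k' * q and the softmax runs over the first (key) dimension. The helper names softmax_dim1 and sdpa_sketch are hypothetical; this is not NeuralAttentionlib's implementation.

```julia
# Softmax over the first dimension (the key dimension in this layout).
function softmax_dim1(s::AbstractMatrix)
    e = exp.(s .- maximum(s; dims = 1))   # subtract the column max for stability
    return e ./ sum(e; dims = 1)
end

# Scaled dot-product attention for one unbatched sequence.
# Assumed layout: q is (d_k, len_q), k is (d_k, len_k), v is (d_v, len_k).
function sdpa_sketch(q::AbstractMatrix, k::AbstractMatrix, v::AbstractMatrix)
    score = (k' * q) ./ sqrt(size(k, 1))  # (len_k, len_q), scaled by 1/sqrt(d_k)
    weights = softmax_dim1(score)         # each column sums to 1
    return v * weights, weights           # output is (d_v, len_q)
end

q = randn(32, 10); k = randn(32, 12); v = randn(16, 12)
y, w = sdpa_sketch(q, k, v)
```

Because every column of the weights sums to 1, a value matrix that is constant across positions is returned unchanged, which is a quick sanity check for any implementation of this formula.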
NeuralAttentionlib.normalized_score
— Function normalized_score(norm) = normalized_score $ norm
normalized_score(norm, score, args...)
Normalized attention score API. norm is the normalization function (like softmax) and score is the function that computes the attention score from the remaining arguments (args...).
See also: naive_qkv_attention
NeuralAttentionlib.scaled_dot_product_score
— Function scaled_dot_product_score(q, k, s = sqrt(inv(size(k, 1))))
The scaled dot-product attention score function of a regular transformer layer. The default scale s = sqrt(inv(size(k, 1))) is $1/\sqrt{d_k}$ with $d_k$ = size(k, 1).
$\mathrm{Score}(Q, K) = \frac{QK^T}{\sqrt{d_k}}$
See also: naive_qkv_attention
NeuralAttentionlib.split_head
— Function split_head(head::Int, x)
Split the first dimension of x into head smaller pieces. Equivalent to reshape(x, :, head, tail(size(x))...).
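Because Julia arrays are column-major, the documented reshape puts each head on a contiguous slice of the feature dimension. The plain-Julia sketch below checks that, and assumes (not confirmed by this page) that merge_head amounts to the inverse reshape.

```julia
head = 4
x = randn(32, 10, 3)                        # (features, length, batch)

# split_head-style reshape: (32, 10, 3) -> (8, 4, 10, 3)
xs = reshape(x, :, head, size(x)[2:end]...)

# Assumed inverse (merge): fold the head dimension back into the features.
xm = reshape(xs, :, size(xs)[3:end]...)
```

With 32 features and 4 heads, head 2 holds features 9 to 16 of every position.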
NeuralAttentionlib.weighted_sum_mixing
— Function weighted_sum_mixing(s, v)
The mixing function of a regular transformer layer. s is the attention score and v is the value of the QKV attention.
Mask
NeuralAttentionlib.AbstractAttenMask
— Type AbstractAttenMask
Abstract type for mask data; it can be viewed as an AbstractArray{Bool}.
NeuralAttentionlib.AbstractAttenMaskOp
— Type AbstractAttenMaskOp
Trait-like abstract type that holds operation-related arguments and defines how the mask should be applied to the input array.
NeuralAttentionlib.apply_mask
— Method apply_mask(op::GenericAttenMaskOp, mask::AbstractAttenMask, score)
Equivalent to op.apply(score, op.scale .* (op.flip ? .! mask : mask)).
Example
julia> x = randn(10, 10);
julia> m = CausalMask()
CausalMask()
julia> apply_mask(GenericAttenMaskOp(.+, true, -1e9), m, x) == @. x + (!m * -1e9)
true
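The documented formula can be reproduced with ordinary Bool arrays. In this sketch ToyMaskOp and toy_apply_mask are hypothetical stand-ins for GenericAttenMaskOp and apply_mask (field order apply, flip, scale follows the constructor call in the example above); the follow-up softmax shows why an additive -1e9 on blocked entries is a common masking choice.

```julia
# Hypothetical stand-in for GenericAttenMaskOp: `apply` combines score and mask,
# `flip` negates the mask first, `scale` multiplies the (possibly flipped) mask.
struct ToyMaskOp{F}
    apply::F
    flip::Bool
    scale::Float64
end

# Mirrors op.apply(score, op.scale .* (op.flip ? .!mask : mask)).
toy_apply_mask(op::ToyMaskOp, mask::AbstractArray{Bool}, score::AbstractArray) =
    op.apply(score, op.scale .* (op.flip ? .!mask : mask))

n = 4
score = randn(n, n)
causal = [i <= j for i in 1:n, j in 1:n]       # upper-triangular Bool mask
op = ToyMaskOp((s, m) -> s .+ m, true, -1e9)    # like GenericAttenMaskOp(.+, true, -1e9)
masked = toy_apply_mask(op, causal, score)

# Softmax over the first dimension: blocked entries get zero weight.
e = exp.(masked .- maximum(masked; dims = 1))
weights = e ./ sum(e; dims = 1)
```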
NeuralAttentionlib.apply_mask
— Method apply_mask(op::NaiveAttenMaskOp, mask::AbstractAttenMask, score)
Directly broadcast-multiplies the mask with the attention score, i.e. score .* mask.
NeuralAttentionlib.AbstractArrayMask
— Type AbstractArrayMask <: AbstractAttenMask
Abstract type for masks with array data.
NeuralAttentionlib.AbstractDatalessMask
— Type AbstractDatalessMask <: AbstractAttenMask
Abstract type for masks without array data.
NeuralAttentionlib.BandPartMask
— Type BandPartMask(l::Int, u::Int) <: AbstractDatalessMask
Attention mask that only allows values inside the band (band_part) to pass.
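This page does not spell out the band convention, so the sketch below assumes one: the TensorFlow band_part style with non-negative l and u, where entry (i, j) passes when i - j ≤ l and j - i ≤ u. band_part_sketch is a hypothetical helper, and the library's row/column orientation may differ.

```julia
# Hypothetical helper under an ASSUMED convention (TensorFlow-style band_part,
# non-negative l and u): entry (i, j) passes when i - j <= l and j - i <= u.
band_part_sketch(n::Integer, l::Integer, u::Integer) =
    [(i - j <= l) && (j - i <= u) for i in 1:n, j in 1:n]

diag_band = band_part_sketch(4, 0, 0)   # main diagonal only
tri_band = band_part_sketch(4, 1, 1)    # tridiagonal band
upper = band_part_sketch(5, 0, 4)       # upper triangle, the CausalMask pattern
```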
NeuralAttentionlib.BatchedMask
— Type BatchedMask(mask::AbstractArrayMask) <: AbstractWrapperMask
Attention mask wrapper over an array mask that applies the same mask to every slice within the same batch; the last dimension of the input is treated as the batch dimension.
Example
julia> m = SymLengthMask([2,3])
SymLengthMask{1, Vector{Int32}}(Int32[2, 3])
julia> trues(3,3, 2) .* m
3×3×2 BitArray{3}:
[:, :, 1] =
1 1 0
1 1 0
0 0 0
[:, :, 2] =
1 1 1
1 1 1
1 1 1
julia> trues(3,3, 2, 2) .* m
ERROR: DimensionMismatch("arrays could not be broadcast to a common size; mask require ndims(A) == 3")
Stacktrace:
[...]
julia> trues(3,3, 2, 2) .* BatchedMask(m) # the 4th dimension becomes the batch dimension
3×3×2×2 BitArray{4}:
[:, :, 1, 1] =
1 1 0
1 1 0
0 0 0
[:, :, 2, 1] =
1 1 0
1 1 0
0 0 0
[:, :, 1, 2] =
1 1 1
1 1 1
1 1 1
[:, :, 2, 2] =
1 1 1
1 1 1
1 1 1
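The broadcasting behaviour shown above can be imitated with a plain Bool array by inserting a singleton axis, so the mask's batch axis lines up with the last dimension of the 4-D input. This is an illustrative sketch of the effect, not how BatchedMask is implemented; the lengths [2, 3] are taken from the example.

```julia
# Per-batch length mask of size (3, 3, 2), built directly as a Bool array.
lens = [2, 3]
m = [i <= lens[b] && j <= lens[b] for i in 1:3, j in 1:3, b in eachindex(lens)]

# Singleton 3rd axis: the same mask slice is reused for every index of dim 3,
# while dim 4 selects the batch entry.
mb = reshape(m, size(m, 1), size(m, 2), 1, size(m, 3))
out = trues(3, 3, 2, 2) .* mb
```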
NeuralAttentionlib.BiLengthMask
— Type BiLengthMask(q_len::A, k_len::A) where {A <: AbstractArray{Int, N}} <: AbstractArrayMask
Attention mask specified by two arrays of integers that give the size of the length dimension for the queries (q_len) and the keys (k_len).
Example
julia> bm = BiLengthMask([2,3], [3, 5])
BiLengthMask{1, Vector{Int32}}(Int32[2, 3], Int32[3, 5])
julia> trues(5,5, 2) .* bm
5×5×2 BitArray{3}:
[:, :, 1] =
1 1 0 0 0
1 1 0 0 0
1 1 0 0 0
0 0 0 0 0
0 0 0 0 0
[:, :, 2] =
1 1 1 0 0
1 1 1 0 0
1 1 1 0 0
1 1 1 0 0
1 1 1 0 0
See also: SymLengthMask, BatchedMask, RepeatMask
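Reading the printed output above, entry (i, j, b) passes when the row index i is within k_len[b] and the column index j is within q_len[b]. The hypothetical helper bilength_sketch builds that pattern as a plain Bool array (an inference from the example, not the library code); SymLengthMask corresponds to the case q_len == k_len.

```julia
# Hypothetical helper inferred from the example output:
# entry (i, j, b) passes when i <= k_len[b] and j <= q_len[b].
function bilength_sketch(q_len::AbstractVector{<:Integer}, k_len::AbstractVector{<:Integer}, n::Integer)
    length(q_len) == length(k_len) ||
        throw(DimensionMismatch("q_len and k_len must have the same length"))
    return [i <= k_len[b] && j <= q_len[b] for i in 1:n, j in 1:n, b in eachindex(q_len)]
end

bm = bilength_sketch([2, 3], [3, 5], 5)   # same lengths as the BiLengthMask example
sm = bilength_sketch([2, 3], [2, 3], 3)   # SymLengthMask case: q_len == k_len
```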
NeuralAttentionlib.CausalMask
— Type CausalMask() <: AbstractDatalessMask
Attention mask that blocks future values. Similar to applying LinearAlgebra.triu! to the score matrix.
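The triu! comparison can be checked directly in plain Julia. The Bool matrix below reproduces the upper-triangular pattern printed by the getmask(CausalMask(), ...) example; multiplying by it gives the same result as LinearAlgebra.triu! on a copy of the score.

```julia
using LinearAlgebra

n = 5
# Keep entry (i, j) when i <= j: the upper triangle, as in the getmask example.
causal = [i <= j for i in 1:n, j in 1:n]

score = randn(n, n)
masked = score .* causal          # zero out the strictly lower triangle
reference = triu!(copy(score))    # the in-place LinearAlgebra equivalent
```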
NeuralAttentionlib.GenericMask
— Type GenericMask <: AbstractArrayMask
Generic attention mask. Just a wrapper over AbstractArray{Bool} for dispatch.
NeuralAttentionlib.LocalMask
— Type LocalMask(width::Int) <: AbstractDatalessMask
Attention mask that only allows local (diagonal-like) values to pass. width should be ≥ 0, and A .* LocalMask(1) is similar to Diagonal(A).
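One rule consistent with "A .* LocalMask(1) is similar to Diagonal(A)" is that entry (i, j) passes when |i - j| < width. The sketch below assumes that rule; local_sketch is a hypothetical helper, and the library's exact definition for other widths is not stated on this page.

```julia
using LinearAlgebra

# Hypothetical helper under an ASSUMED rule: entry (i, j) passes when
# |i - j| < width, so width = 1 keeps only the main diagonal.
local_sketch(n::Integer, width::Integer) = [abs(i - j) < width for i in 1:n, j in 1:n]

A = randn(4, 4)
diag_only = A .* local_sketch(4, 1)   # compare with Diagonal(A)
band3 = local_sketch(4, 2)            # diagonal plus one neighbour on each side
```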
NeuralAttentionlib.RandomMask
— Type RandomMask(p::Float64) <: AbstractDatalessMask
Attention mask that blocks values randomly. p specifies the fraction of values to block, e.g. A .* RandomMask(0) is equivalent to identity(A) and A .* RandomMask(1) is equivalent to zero(A).
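A simple way to get the documented endpoints (p = 0 keeps everything, p = 1 blocks everything) is to keep an entry when a uniform draw from [0, 1) is at least p. random_mask_sketch is a hypothetical helper illustrating that idea; it is not necessarily how RandomMask samples.

```julia
# Hypothetical helper: keep each entry independently when rand() >= p,
# so on average a fraction p of the entries is blocked.
random_mask_sketch(dims::Dims, p::Real) = rand(dims...) .>= p

A = randn(4, 4)
kept_all = A .* random_mask_sketch(size(A), 0)     # like A .* RandomMask(0)
blocked_all = A .* random_mask_sketch(size(A), 1)  # like A .* RandomMask(1)
m = random_mask_sketch((1000, 100), 0.3)           # about 70% of entries kept
```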
NeuralAttentionlib.RepeatMask
— Type RepeatMask(mask::AbstractAttenMask, num::Int) <: AbstractWrapperMask
Attention mask wrapper over an array mask that performs an inner repeat on the last dimension, so each batch entry of mask is repeated num times in a row.
Example
julia> m = SymLengthMask([2,3])
SymLengthMask{1, Vector{Int32}}(Int32[2, 3])
julia> trues(3,3, 2) .* m
3×3×2 BitArray{3}:
[:, :, 1] =
1 1 0
1 1 0
0 0 0
[:, :, 2] =
1 1 1
1 1 1
1 1 1
julia> trues(3,3, 4) .* m
ERROR: DimensionMismatch("arrays could not be broadcast to a common size; mask require 3-th dimension to be 2, but get 4")
Stacktrace:
[...]
julia> trues(3,3, 4) .* RepeatMask(m, 2)
3×3×4 BitArray{3}:
[:, :, 1] =
1 1 0
1 1 0
0 0 0
[:, :, 2] =
1 1 0
1 1 0
0 0 0
[:, :, 3] =
1 1 1
1 1 1
1 1 1
[:, :, 4] =
1 1 1
1 1 1
1 1 1
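The slice order in this example (1, 1, 2, 2) is exactly what Base.repeat produces with the inner keyword. This plain-Julia sketch rebuilds the length mask as a Bool array and repeats it; it illustrates the effect rather than RepeatMask's lazy implementation.

```julia
# Length mask of size (3, 3, 2), built directly as a Bool array.
lens = [2, 3]
m = [i <= lens[b] && j <= lens[b] for i in 1:3, j in 1:3, b in eachindex(lens)]

# Inner repeat along the last dimension: each slice is repeated num times in place.
num = 2
r = repeat(m, inner = (1, 1, num))   # (3, 3, 4)
out = trues(3, 3, 4) .* r
```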
NeuralAttentionlib.SymLengthMask
— Type SymLengthMask(len::AbstractArray{Int, N}) <: AbstractArrayMask
Attention mask specified by an array of integers that give the size of the length dimension, assuming the query length and the key length are the same.
Example
julia> m = SymLengthMask([2,3])
SymLengthMask{1, Vector{Int32}}(Int32[2, 3])
julia> trues(3,3, 2) .* m
3×3×2 BitArray{3}:
[:, :, 1] =
1 1 0
1 1 0
0 0 0
[:, :, 2] =
1 1 1
1 1 1
1 1 1
See also: BiLengthMask, BatchedMask, RepeatMask
Base.:!
— Method !m::AbstractAttenMask
Boolean NOT of an attention mask.
Base.:&
— Method m1::AbstractAttenMask & m2::AbstractAttenMask
Logical AND of two attention masks.
Base.:|
— Method m1::AbstractAttenMask | m2::AbstractAttenMask
Logical OR of two attention masks.
NeuralAttentionlib.getmask
— Function getmask(m::AbstractAttenMask, score, scale = 1)
Convert m into a mask array (an AbstractArray) for score, with the entries that pass the mask set to scale.
Example
julia> getmask(CausalMask(), randn(7,7), 2)
7×7 Matrix{Float64}:
2.0 2.0 2.0 2.0 2.0 2.0 2.0
0.0 2.0 2.0 2.0 2.0 2.0 2.0
0.0 0.0 2.0 2.0 2.0 2.0 2.0
0.0 0.0 0.0 2.0 2.0 2.0 2.0
0.0 0.0 0.0 0.0 2.0 2.0 2.0
0.0 0.0 0.0 0.0 0.0 2.0 2.0
0.0 0.0 0.0 0.0 0.0 0.0 2.0
Matmul
NeuralAttentionlib.CollapsedDimArray
— Type CollapsedDimArray{T}(array, si::Integer=2, sj::Integer=3) <: AbstractArray{T, 3}
Similar to a lazily reshaped array whose shape is given by collapsed_size.
NeuralAttentionlib.collapsed_size
— Method collapsed_size(x, xi, xj)::Dim{3}
Collapse the dimensionality of x into 3 according to xi and xj.
(X1, X2, ..., Xi-1, Xi, Xi+1, ..., Xj-1, Xj, ..., Xn)
 |_____dim1______|  |_______dim2______|  |___dim3__|
This is equivalent to size(reshape(x, prod(size(x)[1:(xi-1)]), prod(size(x)[xi:(xj-1)]), prod(size(x)[xj:end]))).
Example
julia> x = randn(7,6,5,4,3,2);
julia> collapsed_size(x, 3, 5)
(42, 20, 6)
See also: noncollapsed_size
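The rule is a product of sizes over three consecutive dimension ranges. The hypothetical collapsed_size_sketch below restates it in plain Julia; init = 1 makes an empty range (for example xi = 1) contribute a size of 1.

```julia
# Product of the sizes of x along the dimensions in r (1 for an empty range).
dimprod(x::AbstractArray, r) = prod(size(x, d) for d in r; init = 1)

# Hypothetical re-implementation of the documented collapsed_size rule.
collapsed_size_sketch(x::AbstractArray, xi::Integer, xj::Integer) =
    (dimprod(x, 1:xi-1), dimprod(x, xi:xj-1), dimprod(x, xj:ndims(x)))

x = randn(7, 6, 5, 4, 3, 2)
sz = collapsed_size_sketch(x, 3, 5)   # (7*6, 5*4, 3*2)
y = reshape(x, sz)                    # the corresponding 3-dimensional reshape
```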
NeuralAttentionlib.noncollapsed_size
— Method noncollapsed_size(x, xi, xj, n)
Collapse the dimensionality of x into 3 according to xi and xj,
(X1, X2, ..., Xi-1, Xi, Xi+1, ..., Xj-1, Xj, ..., Xn)
 |_____dim1______|  |_______dim2______|  |___dim3__|
but return the sizes before collapsing; n selects which of the three collapsed dimensions to report. For example, noncollapsed_size(x, xi, xj, 2) returns (Xi, Xi+1, ..., Xj-1).
Example
julia> x = randn(7,6,5,4,3,2);
julia> noncollapsed_size(x, 3, 5, 1)
(7, 6)
julia> noncollapsed_size(x, 3, 5, 2)
(5, 4)
julia> noncollapsed_size(x, 3, 5, 3)
(3, 2)
See also: collapsed_size
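In plain Julia the three groups are simply slices of size(x). noncollapsed_size_sketch is a hypothetical helper that reproduces the documented examples; multiplying each group recovers the collapsed sizes.

```julia
# Hypothetical helper: n = 1, 2, 3 selects which collapsed group to report.
function noncollapsed_size_sketch(x::AbstractArray, xi::Integer, xj::Integer, n::Integer)
    s = size(x)
    n == 1 && return s[1:xi-1]
    n == 2 && return s[xi:xj-1]
    n == 3 && return s[xj:end]
    throw(ArgumentError("n must be 1, 2 or 3"))
end

x = randn(7, 6, 5, 4, 3, 2)
groups = [noncollapsed_size_sketch(x, 3, 5, n) for n in 1:3]
```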
NeuralAttentionlib.collapseddim
— Method collapseddim(x::AbstractArray, xi, xj)
Reshape x into a 3-dimensional array; equivalent to reshape(x, collapsed_size(x, xi, xj)).
See also: collapsed_size
NeuralAttentionlib.collapseddim
— Method collapseddim(ca::CollapsedDimArray)
Remove the wrapper and actually reshape the underlying array.
See also: CollapsedDimArray, unwrap_collapse
NeuralAttentionlib.matmul
— Function matmul(a::AbstractArray, b::AbstractArray, s::Number = 1)
Equivalent to s .* (a * b) if a and b are Vector or Matrix. For arrays with higher dimensions, it converts a and b to CollapsedDimArray, performs batched matrix multiplication, and returns the result as a CollapsedDimArray. This is useful for preserving the dimensionality. If the batch dimensions of a and b have different shapes, the shape of b is used for the batch dimension. Works with NNlib.batch_transpose and NNlib.batch_adjoint.
Example
# b-dim shape: (6,)
julia> a = CollapsedDimArray(randn(3,4,2,3,6), 3, 5); size(a)
(12, 6, 6)
# b-dim shape: (3,1,2)
julia> b = CollapsedDimArray(randn(6,2,3,1,2), 2, 3); size(b)
(6, 2, 6)
julia> c = matmul(a, b); size(c), typeof(c)
((12, 2, 6), CollapsedDimArray{Float64, Array{Float64, 6}, Static.StaticInt{3}, Static.StaticInt{4}, Static.False})
# b-dim shape: (3,1,2)
julia> d = unwrap_collapse(c); size(d), typeof(d)
((3, 4, 2, 3, 1, 2), Array{Float64, 6})
# equivalent to `batched_mul`, but preserves the shape
julia> NNlib.batched_mul(collapseddim(a), collapseddim(b)) == collapseddim(matmul(a, b))
true
See also: CollapsedDimArray, unwrap_collapse, collapseddim
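The shape bookkeeping in the example above can be reproduced eagerly in plain Julia: collapse both inputs to (rows, cols, batch), multiply slice by slice, then reshape using a's row dimensions, b's column dimensions and b's batch shape. collapse3 and batched_matmul_sketch are hypothetical helpers; the loop stands in for NNlib.batched_mul, and no lazy CollapsedDimArray wrapper is involved.

```julia
# Product of sizes along dimensions r (1 for an empty range).
dimprod(x::AbstractArray, r) = prod(size(x, d) for d in r; init = 1)

# Collapse x into (rows, cols, batch) following the collapsed_size rule.
collapse3(x::AbstractArray, xi::Integer, xj::Integer) =
    reshape(x, dimprod(x, 1:xi-1), dimprod(x, xi:xj-1), dimprod(x, xj:ndims(x)))

# Plain batched matrix product: c[:, :, k] = s * a[:, :, k] * b[:, :, k].
function batched_matmul_sketch(a::AbstractArray{<:Number,3}, b::AbstractArray{<:Number,3}, s::Number = 1)
    size(a, 2) == size(b, 1) || throw(DimensionMismatch("inner dimensions differ"))
    size(a, 3) == size(b, 3) || throw(DimensionMismatch("batch sizes differ"))
    T = promote_type(eltype(a), eltype(b), typeof(s))
    c = Array{T}(undef, size(a, 1), size(b, 2), size(a, 3))
    for k in axes(a, 3)
        c[:, :, k] = s .* (a[:, :, k] * b[:, :, k])
    end
    return c
end

a = collapse3(randn(3, 4, 2, 3, 6), 3, 5)    # (12, 6, 6)
b = collapse3(randn(6, 2, 3, 1, 2), 2, 3)    # (6, 2, 6)
c = batched_matmul_sketch(a, b)               # (12, 2, 6)

# Restore a's row dims (3, 4), b's column dim (2) and b's batch shape (3, 1, 2).
d = reshape(c, 3, 4, 2, 3, 1, 2)
```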
NeuralAttentionlib.unwrap_collapse
— Function unwrap_collapse(ca::CollapsedDimArray)
Return the underlying array of a CollapsedDimArray; otherwise, just return the input.