API Reference

Functional

NeuralAttentionlib.move_head_dim_in_perm - Function
move_head_dim_in_perm(x::AbstractArray{T, N}, nobatch=false)
move_head_dim_in_perm(N::Int, nobatch=false)

Dimension order for permutedims to move the head dimension (created by split_head) from the batch dimension back to the feature dimension (for merge_head). Returns a tuple of integers of length N. nobatch specifies whether x is a batch of data.

Example

julia> Functional.move_head_dim_in_perm(5, false)
(1, 4, 2, 3, 5)

julia> Functional.move_head_dim_in_perm(5, true)
(1, 5, 2, 3, 4)

See also: merge_head, move_head_dim_in

source
NeuralAttentionlib.move_head_dim_out_perm - Function
move_head_dim_out_perm(x::AbstractArray{T, N}, nobatch=false)
move_head_dim_out_perm(N::Int, nobatch=false)

Dimension order for permutedims to move the head dimension (created by split_head) to the batch dimension. Returns a tuple of integers of length N. nobatch specifies whether x is a batch of data.

Example

julia> Functional.move_head_dim_out_perm(5, false)
(1, 3, 4, 2, 5)

julia> Functional.move_head_dim_out_perm(5, true)
(1, 3, 4, 5, 2)

See also: split_head, move_head_dim_out

source
NeuralAttentionlib.naive_qkv_attention - Function
naive_qkv_attention(q, k, v, mask=nothing)

The scaled dot-product attention of a regular transformer layer.

$Attention(Q, K, V) = softmax(\frac{QK^T}{\sqrt{d_k}})V$

It's equivalent to generic_qkv_attention(weighted_sum_mixing, normalized_score(NNlib.softmax) $ masked_score(mask) $ scaled_dot_product_score, q, k, v).

Example

julia> fdim, ldim, bdim = 32, 10, 4;

julia> x = randn(fdim, ldim, bdim);

julia> y = naive_qkv_attention(x, x, x); # simple self attention

julia> z = generic_qkv_attention(weighted_sum_mixing, normalized_score(NNlib.softmax) $ scaled_dot_product_score, x, x, x); # no mask here

julia> y ≈ z
true

See also: generic_qkv_attention

source
NeuralAttentionlib.normalized_score - Function
normalized_score(norm) = normalized_score $ norm
normalized_score(norm, score, args...)

Normalized attention score API. norm is the normalization function (e.g. softmax) and score is the function that computes the attention score from args....

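A minimal usage sketch, mirroring the equivalence shown in the naive_qkv_attention example (no mask applied here):

julia> x = randn(32, 10, 4);

julia> y = generic_qkv_attention(weighted_sum_mixing, normalized_score(NNlib.softmax) $ scaled_dot_product_score, x, x, x);

julia> y ≈ naive_qkv_attention(x, x, x)
true
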
See also: naive_qkv_attention

source
NeuralAttentionlib.split_head - Function
split_head(head::Int, x)

Split the first dimension into head pieces of smaller vectors. Equivalent to reshape(x, :, head, tail(size(x))...).

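A minimal sketch of the stated reshape equivalence (splitting a 32-dimensional feature dimension into 4 heads of size 8; the input shape here is only illustrative):

julia> x = randn(32, 10, 2);

julia> size(split_head(4, x))
(8, 4, 10, 2)

julia> split_head(4, x) == reshape(x, :, 4, Base.tail(size(x))...)
true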
source

Mask

NeuralAttentionlib.apply_mask - Method
apply_mask(op::GenericAttenMaskOp, mask::AbstractAttenMask, score)

Equivalent to op.apply(score, op.scale .* (op.flip ? .! mask : mask)).

Example

julia> x = randn(10, 10);

julia> m = CausalMask()
CausalMask()

julia> apply_mask(GenericAttenMaskOp(.+, true, -1e9), m, x) == @. x + (!m * -1e9)
true
source
NeuralAttentionlib.apply_mask - Method
apply_mask(op::NaiveAttenMaskOp, mask::AbstractAttenMask, score)

Directly broadcast-multiply the mask onto the attention score, i.e. score .* mask.

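A minimal sketch of the stated equivalence (assuming NaiveAttenMaskOp takes no constructor arguments):

julia> score = randn(5, 5);

julia> apply_mask(NaiveAttenMaskOp(), CausalMask(), score) == score .* CausalMask()
true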
source
NeuralAttentionlib.BatchedMask - Type
BatchedMask(mask::AbstractArrayMask) <: AbstractWrapperMask

Attention mask wrapper over an array mask for applying the same mask within the same batch.

Example

julia> m = SymLengthMask([2,3])
SymLengthMask{1, Vector{Int32}}(Int32[2, 3])

julia> trues(3,3, 2) .* m
3×3×2 BitArray{3}:
[:, :, 1] =
 1  1  0
 1  1  0
 0  0  0

[:, :, 2] =
 1  1  1
 1  1  1
 1  1  1

julia> trues(3,3, 2, 2) .* m
ERROR: DimensionMismatch("arrays could not be broadcast to a common size; mask require ndims(A) == 3")
Stacktrace:
[...]

julia> trues(3,3, 2, 2) .* BatchedMask(m) # 4-th dim become batch dim
3×3×2×2 BitArray{4}:
[:, :, 1, 1] =
 1  1  0
 1  1  0
 0  0  0

[:, :, 2, 1] =
 1  1  0
 1  1  0
 0  0  0

[:, :, 1, 2] =
 1  1  1
 1  1  1
 1  1  1

[:, :, 2, 2] =
 1  1  1
 1  1  1
 1  1  1
source
NeuralAttentionlib.BiLengthMask - Type
BiLengthMask(q_len::A, k_len::A) where {A <: AbstractArray{Int, N}} <: AbstractArrayMask

Attention mask specified by two arrays of integers that indicate the sizes of the query and key length dimensions.

Example

julia> bm = BiLengthMask([2,3], [3, 5])
BiLengthMask{1, Vector{Int32}}(Int32[2, 3], Int32[3, 5])

julia> trues(5,5, 2) .* bm
5×5×2 BitArray{3}:
[:, :, 1] =
 1  1  0  0  0
 1  1  0  0  0
 1  1  0  0  0
 0  0  0  0  0
 0  0  0  0  0

[:, :, 2] =
 1  1  1  0  0
 1  1  1  0  0
 1  1  1  0  0
 1  1  1  0  0
 1  1  1  0  0

See also: SymLengthMask, BatchedMask, RepeatMask

source
NeuralAttentionlib.CausalMask - Type
CausalMask() <: AbstractDatalessMask

Attention mask that blocks the future values.

Similar to applying LinearAlgebra.triu! to the score matrix.

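A minimal sketch, consistent with the triu! analogy above (the displayed output assumes broadcasting over a 2-D BitArray):

julia> trues(3, 3) .* CausalMask()
3×3 BitMatrix:
 1  1  1
 0  1  1
 0  0  1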
source
NeuralAttentionlib.LocalMask - Type
LocalMask(width::Int) <: AbstractDatalessMask

Attention mask that only allows local (diagonal-like) values to pass.

width should be ≥ 0; A .* LocalMask(1) is similar to Diagonal(A).

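A minimal sketch, assuming positions pass when their row and column indices differ by less than width (so LocalMask(1) keeps only the diagonal, as noted above):

julia> trues(4, 4) .* LocalMask(2)
4×4 BitMatrix:
 1  1  0  0
 1  1  1  0
 0  1  1  1
 0  0  1  1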
source
NeuralAttentionlib.RandomMask - Type
RandomMask(p::Float64) <: AbstractDatalessMask

Attention mask that blocks values randomly.

p specifies the probability of each value being blocked, e.g. A .* RandomMask(0) is equivalent to identity(A) and A .* RandomMask(1) is equivalent to zero(A).

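A minimal sketch of the two boundary cases stated above:

julia> A = randn(4, 4);

julia> A .* RandomMask(0) == A
true

julia> A .* RandomMask(1) == zero(A)
true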
source
NeuralAttentionlib.RepeatMask - Type
RepeatMask(mask::AbstractAttenMask, num::Int) <: AbstractWrapperMask

Attention mask wrapper over an array mask that performs an inner repeat num times on the last dimension.

Example

julia> m = SymLengthMask([2,3])
SymLengthMask{1, Vector{Int32}}(Int32[2, 3])

julia> trues(3,3, 2) .* m
3×3×2 BitArray{3}:
[:, :, 1] =
 1  1  0
 1  1  0
 0  0  0

[:, :, 2] =
 1  1  1
 1  1  1
 1  1  1

julia> trues(3,3, 4) .* m
ERROR: DimensionMismatch("arrays could not be broadcast to a common size; mask require 3-th dimension to be 2, but get 4")
Stacktrace:
[...]

julia> trues(3,3, 4) .* RepeatMask(m, 2)
3×3×4 BitArray{3}:
[:, :, 1] =
 1  1  0
 1  1  0
 0  0  0

[:, :, 2] =
 1  1  0
 1  1  0
 0  0  0

[:, :, 3] =
 1  1  1
 1  1  1
 1  1  1

[:, :, 4] =
 1  1  1
 1  1  1
 1  1  1
source
NeuralAttentionlib.SymLengthMask - Type
SymLengthMask(len::AbstractArray{Int, N}) <: AbstractArrayMask

Attention mask specified by an array of integers that indicates the size of the length dimension, assuming the query length and key length are the same.

Example

julia> m = SymLengthMask([2,3])
SymLengthMask{1, Vector{Int32}}(Int32[2, 3])

julia> trues(3,3, 2) .* m
3×3×2 BitArray{3}:
[:, :, 1] =
 1  1  0
 1  1  0
 0  0  0

[:, :, 2] =
 1  1  1
 1  1  1
 1  1  1

See also: BiLengthMask, BatchedMask, RepeatMask

source
Base.:! - Method
!m::AbstractAttenMask

Boolean not of an attention mask.

source
Base.:& - Method
m1::AbstractAttenMask & m2::AbstractAttenMask

Logical and of two attention masks.

source
Base.:| - Method
m1::AbstractAttenMask | m2::AbstractAttenMask

Logical or of two attention masks.

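A minimal sketch of combining masks with !, & and |; the outputs assume the CausalMask and LocalMask semantics described above:

julia> trues(3, 3) .* (CausalMask() & LocalMask(2))
3×3 BitMatrix:
 1  1  0
 0  1  1
 0  0  1

julia> trues(3, 3) .* (!CausalMask() | LocalMask(1))
3×3 BitMatrix:
 1  0  0
 1  1  0
 1  1  1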
source
NeuralAttentionlib.getmask - Function
getmask(m::AbstractAttenMask, score, scale = 1)

Convert m into a mask array (an AbstractArray) for score, scaled by scale.

Example

julia> getmask(CausalMask(), randn(7,7), 2)
7×7 Matrix{Float64}:
 2.0  2.0  2.0  2.0  2.0  2.0  2.0
 0.0  2.0  2.0  2.0  2.0  2.0  2.0
 0.0  0.0  2.0  2.0  2.0  2.0  2.0
 0.0  0.0  0.0  2.0  2.0  2.0  2.0
 0.0  0.0  0.0  0.0  2.0  2.0  2.0
 0.0  0.0  0.0  0.0  0.0  2.0  2.0
 0.0  0.0  0.0  0.0  0.0  0.0  2.0
source

Matmul

NeuralAttentionlib.collapsed_size - Method
collapsed_size(x, xi, xj)::Dim{3}

Collapse the dimensionality of x into 3 according to xi and xj.

(X1, X2, ..., Xi-1, Xi, Xi+1, ..., Xj-1, Xj, ..., Xn)
 |_____dim1______|  |_______dim2______|  |___dim3__|

This is equivalent to size(reshape(x, prod(size(x)[1:(xi-1)]), prod(size(x)[xi:(xj-1)]), prod(size(x)[xj:end]))).

Example

julia> x = randn(7,6,5,4,3,2);

julia> collapsed_size(x, 3, 5)
(42, 20, 6)

See also: noncollapsed_size

source
NeuralAttentionlib.noncollapsed_size - Method
noncollapsed_size(x, xi, xj, n)

Collapse the dimensionality of x into 3 according to xi and xj.

(X1, X2, ..., Xi-1, Xi, Xi+1, ..., Xj-1, Xj, ..., Xn)
 |_____dim1______|  |_______dim2______|  |___dim3__|

But returns the size before collapsing, e.g. noncollapsed_size(x, xi, xj, 2) will be (Xi, Xi+1, ..., Xj-1).

Example

julia> x = randn(7,6,5,4,3,2);

julia> noncollapsed_size(x, 3, 5, 1)
(7, 6)

julia> noncollapsed_size(x, 3, 5, 2)
(5, 4)

julia> noncollapsed_size(x, 3, 5, 3)
(3, 2)

See also: collapsed_size

source
NeuralAttentionlib.matmul - Function
matmul(a::AbstractArray, b::AbstractArray, s::Number = 1)

Equivalent to s .* (a * b) if a and b are Vector or Matrix. For arrays with higher dimensions, it converts a and b to CollapsedDimArray, performs batched matrix multiplication, and returns the result as a CollapsedDimArray. This is useful for preserving the dimensionality. If the batch dimensions of a and b have different shapes, it picks the shape of b for the batch dimension. Works with NNlib.batched_transpose and NNlib.batched_adjoint.

Example

# b-dim shape: (6,)
julia> a = CollapsedDimArray(randn(3,4,2,3,6), 3, 5); size(a)
(12, 6, 6)

# b-dim shape: (3,1,2)
julia> b = CollapsedDimArray(randn(6,2,3,1,2), 2, 3); size(b)
(6, 2, 6)

julia> c = matmul(a, b); size(c), typeof(c)
((12, 2, 6), CollapsedDimArray{Float64, Array{Float64, 6}, Static.StaticInt{3}, Static.StaticInt{4}, Static.False})

# b-dim shape: (3,1,2)
julia> d = unwrap_collapse(c); size(d), typeof(d)
((3, 4, 2, 3, 1, 2), Array{Float64, 6})

# equivalent to `batched_mul` but preserves shape
julia> NNlib.batched_mul(collapseddim(a), collapseddim(b)) == collapseddim(matmul(a, b))
true

See also: CollapsedDimArray, unwrap_collapse, collapseddim

source