From the Frontier Research Team at takara.ai, we present the first pure Go implementation of attention mechanisms and transformer layers, designed for high performance and ease of use.
Run our comprehensive examples:
# Get the module
go get github.com/takara-ai/go-attention
# Run the examples
go run api_examples.go
type Vector []float64 // Represents a 1D vector of float64 values
type Matrix []Vector // Represents a 2D matrix of float64 values
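For instance, a two-token input is just a composite literal (the numbers below are arbitrary; this reuses the attention package import shown in the next example):

// One Vector per token (row), one float64 per feature (column).
embedding := attention.Matrix{
    {0.1, 0.2, 0.3, 0.4}, // token 0
    {0.5, 0.6, 0.7, 0.8}, // token 1
}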
Dot-product attention is the simplest form of attention mechanism, useful for basic sequence-processing tasks.
import "github.com/takara-ai/go-attention/attention"
// Create query-key-value setup
query := attention.Vector{1.0, 0.0, 1.0, 0.0} // Pattern to search for
keys := attention.Matrix{
{1.0, 0.0, 1.0, 0.0}, // Similar to query
{0.0, 1.0, 0.0, 1.0}, // Different from query
{0.5, 0.5, 0.5, 0.5}, // Neutral pattern
}
values := attention.Matrix{
{1.0, 2.0}, // Value for similar key
{3.0, 4.0}, // Value for different key
{5.0, 6.0}, // Value for neutral key
}
// Compute attention
output, weights, err := attention.DotProductAttention(query, keys, values)
if err != nil {
log.Fatal(err)
}
// Output will be a weighted combination of values based on query-key similarity
// Weights will show how much attention each key received
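The attention weights reported in the sample output further down ([0.506 0.186 0.307], output [2.601 3.601]) can be reproduced by hand, assuming the standard scaled dot-product formulation softmax(q·k / √d_k) with d_k = 4. A small self-contained check:

package main

import (
    "fmt"
    "math"
)

func main() {
    // Same query, keys, and values as the snippet above.
    query := []float64{1, 0, 1, 0}
    keys := [][]float64{
        {1, 0, 1, 0},
        {0, 1, 0, 1},
        {0.5, 0.5, 0.5, 0.5},
    }
    values := [][]float64{{1, 2}, {3, 4}, {5, 6}}

    // Scaled dot-product scores, exponentiated for the softmax.
    dk := float64(len(query))
    exps := make([]float64, len(keys))
    var sum float64
    for i, k := range keys {
        var dot float64
        for j := range query {
            dot += query[j] * k[j]
        }
        exps[i] = math.Exp(dot / math.Sqrt(dk))
        sum += exps[i]
    }

    // Normalize into weights and take the weighted sum of value rows.
    output := make([]float64, len(values[0]))
    for i := range keys {
        w := exps[i] / sum
        fmt.Printf("weight[%d] ≈ %.3f\n", i, w) // 0.506, 0.186, 0.307
        for j := range output {
            output[j] += w * values[i][j]
        }
    }
    fmt.Printf("output ≈ %.3f\n", output) // [2.601 3.601]
}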
Multi-head attention is a more sophisticated mechanism that captures different types of relationships in parallel.
import "github.com/takara-ai/go-attention/attention"
// Configure multi-head attention
config := attention.MultiHeadConfig{
NumHeads: 4, // Number of parallel attention heads
DModel: 64, // Size of input/output embeddings
DKey: 16, // Size per head (DModel/NumHeads)
DValue: 16, // Size per head (DModel/NumHeads)
DropoutRate: 0.1, // For regularization
}
// Create the attention module
mha, err := attention.NewMultiHeadAttention(config)
if err != nil {
log.Fatal(err)
}
// Process sequences (batched input)
batchSize, seqLen := 2, 3 // Process 2 sequences, each with 3 tokens

// Create input matrices, flattened to [batchSize*seqLen × DModel]
queries := make(attention.Matrix, batchSize*seqLen)
keys := make(attention.Matrix, batchSize*seqLen)
values := make(attention.Matrix, batchSize*seqLen)
for i := range queries {
    queries[i] = make(attention.Vector, config.DModel)
    keys[i] = make(attention.Vector, config.DModel)
    values[i] = make(attention.Vector, config.DModel)
    // Fill with your embedding data...
}

// Process through multi-head attention
output, err := mha.Forward(queries, keys, values)
if err != nil {
    log.Fatal(err)
}
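As a quick sanity check (continuing the variables above; fmt must be imported), the result comes back in the same flattened layout as the input:

// Each of the batchSize*seqLen rows carries DModel features.
fmt.Printf("Output shape: [%d×%d]\n", len(output), len(output[0])) // [6×64]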
A complete transformer layer combines self-attention with a position-wise feed-forward network.
import (
    "log"

    "github.com/takara-ai/go-attention/attention"
    "github.com/takara-ai/go-attention/transformer"
)

// Configure transformer layer
config := transformer.TransformerConfig{
    DModel:      64,  // Size of token embeddings
    NumHeads:    4,   // Number of attention heads
    DHidden:     256, // Size of feed-forward hidden layer
    DropoutRate: 0.1, // For regularization
}

// Create transformer layer
layer, err := transformer.NewTransformerLayer(config)
if err != nil {
    log.Fatal(err)
}
// Create input sequence [seqLen × DModel]
seqLen := 3
input := make(attention.Matrix, seqLen)
for i := range input {
    input[i] = make(attention.Vector, config.DModel)
    // Fill with your embedding data...
}

// Process through transformer
output, err := layer.Forward(input)
if err != nil {
    log.Fatal(err)
}
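A single layer is rarely used on its own. Here is a hedged sketch of stacking several layers by feeding each layer's output into the next, continuing the variables above (the loop and numLayers are illustrative, not an API from the repository):

// Stack several independently initialized layers, as a deeper encoder would.
numLayers := 4
x := input
for i := 0; i < numLayers; i++ {
    layer, err := transformer.NewTransformerLayer(config)
    if err != nil {
        log.Fatal(err)
    }
    x, err = layer.Forward(x)
    if err != nil {
        log.Fatal(err)
    }
}
// x now holds the [seqLen × DModel] output of the final layer.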
When running the examples, you'll see output like:

- Dot-Product Attention:
  Query: [1 0 1 0]
  Attention Weights: [0.506 0.186 0.307] (focus on the most similar pattern)
  Output: [2.601 3.601] (weighted combination of the values)

- Multi-Head Attention:
  Input dimensions: [2 batches × 3 tokens × 64 features]
  Output shape: [6×64]

- Transformer Layer:
  Input shape: [3×64]
  Output shape: [3×64]
Typical use cases include:

- Text Processing:
  - Sequence-to-sequence translation
  - Document summarization
  - Sentiment analysis

- Time Series (see the sketch after this list):
  - Financial forecasting
  - Sensor data analysis
  - Anomaly detection

- Structured Data:
  - Graph node embedding
  - Feature interaction modeling
  - Recommendation systems
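To make the time-series item concrete, here is a hedged, self-contained sketch (the windowing scheme and data are invented for illustration): treat each past time step's feature vector as a key/value and the latest reading as the query, so the attention weights show which parts of the history the current reading resembles.

package main

import (
    "fmt"
    "log"

    "github.com/takara-ai/go-attention/attention"
)

func main() {
    // Hypothetical sensor readings: one row of features per time step.
    history := attention.Matrix{
        {0.90, 0.10},
        {0.80, 0.20},
        {0.10, 0.90}, // an unusual reading
        {0.85, 0.15},
    }
    latest := attention.Vector{0.88, 0.12}

    // Use the history as both keys and values, purely for illustration.
    _, weights, err := attention.DotProductAttention(latest, history, history)
    if err != nil {
        log.Fatal(err)
    }

    // A step that receives little weight looks unlike the current reading;
    // consistently low similarity against all of history can flag an anomaly.
    fmt.Println("Similarity weights over history:", weights)
}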
Performance notes:

- Matrix operations are optimized for CPU
- Memory allocations are minimized
- Batched operations improve throughput
- No external dependencies
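To check these claims on your own hardware, here is a hedged benchmark sketch using Go's standard testing package (the file name, sizes, and random fill are illustrative, not part of the repository):

// attention_bench_test.go — illustrative benchmark, not part of the repository.
package attention_test

import (
    "math/rand"
    "testing"

    "github.com/takara-ai/go-attention/attention"
)

func BenchmarkDotProductAttention(b *testing.B) {
    const dModel, seqLen = 64, 128
    rng := rand.New(rand.NewSource(1))

    // Build a random query plus seqLen random key/value rows.
    query := make(attention.Vector, dModel)
    keys := make(attention.Matrix, seqLen)
    values := make(attention.Matrix, seqLen)
    for j := 0; j < dModel; j++ {
        query[j] = rng.Float64()
    }
    for i := range keys {
        keys[i] = make(attention.Vector, dModel)
        values[i] = make(attention.Vector, dModel)
        for j := 0; j < dModel; j++ {
            keys[i][j] = rng.Float64()
            values[i][j] = rng.Float64()
        }
    }

    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        if _, _, err := attention.DotProductAttention(query, keys, values); err != nil {
            b.Fatal(err)
        }
    }
}

Run it with `go test -bench=. -benchmem` to see allocations per operation alongside timings.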
For more detailed examples, see the examples directory in the repository.
This module was created to provide a clean, efficient, and dependency-free implementation of attention mechanisms in Go. It’s particularly useful for:
- Edge Computing: Zero external dependencies make it well suited to edge devices where dependency management is crucial
- Real-time Processing: Pure Go implementation ensures predictable performance for real-time applications
- Cloud-native Applications: Efficient batched operations support high-throughput scaling in cloud environments
- Embedded Systems: Predictable resource usage and minimal memory allocations
- Production Systems: Comprehensive error handling and type safety for robust production deployments
Current features:

- Efficient dot-product attention mechanism
- Multi-head attention support
- Full transformer layer implementation with:
  - Layer normalization
  - Position-wise feed-forward networks
  - Residual connections
- Batched operations for improved performance
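For readers new to these building blocks, here is a minimal, self-contained sketch of the residual-plus-layer-normalization pattern the listed components follow (the helper functions are hypothetical and are not exported by the module):

package main

import (
    "fmt"
    "math"
)

// layerNorm normalizes a vector to zero mean and unit variance.
func layerNorm(x []float64) []float64 {
    var mean, variance float64
    for _, v := range x {
        mean += v
    }
    mean /= float64(len(x))
    for _, v := range x {
        variance += (v - mean) * (v - mean)
    }
    variance /= float64(len(x))

    out := make([]float64, len(x))
    for i, v := range x {
        out[i] = (v - mean) / math.Sqrt(variance+1e-6)
    }
    return out
}

// residual adds a sublayer's output back onto its input (the skip connection).
func residual(x, sublayerOut []float64) []float64 {
    out := make([]float64, len(x))
    for i := range x {
        out[i] = x[i] + sublayerOut[i]
    }
    return out
}

func main() {
    x := []float64{1.0, 2.0, 3.0, 4.0}
    sublayerOut := []float64{0.5, -0.5, 0.25, -0.25} // stand-in for attention or FFN output
    fmt.Println(layerNorm(residual(x, sublayerOut))) // normalized residual output
}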
Future improvements may include:
- Positional encoding implementations
- Dropout support
- CUDA acceleration support
- Additional transformer variants
- Pre-trained models
- Training utilities
Contributions are welcome! Please feel free to submit a Pull Request.
MIT License – see LICENSE file for details
For research inquiries and press, please reach out to research@takara.ai
Transforming humanity (人類を変革する)