mdaniluk/transformer

Simple Transformer implementation from scratch in PyTorch

The goal of this implementation is to show how simple transformer models and self-attention are. The model is a stack of simple transformer blocks; unlike the original transformer, it has no encoder-decoder structure.

SelfAttention is a simple implementation of a multi-head attention module.
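To make the idea concrete, here is a minimal sketch of multi-head scaled dot-product self-attention. It is not the repository's code: the class name `SelfAttentionSketch`, the parameter names `emb` and `heads`, and the bias-free projections are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SelfAttentionSketch(nn.Module):
    """Illustrative multi-head self-attention; names are hypothetical."""

    def __init__(self, emb: int, heads: int = 4):
        super().__init__()
        assert emb % heads == 0, "emb must be divisible by heads"
        self.heads = heads
        # One projection each for queries, keys and values, covering all heads.
        self.to_q = nn.Linear(emb, emb, bias=False)
        self.to_k = nn.Linear(emb, emb, bias=False)
        self.to_v = nn.Linear(emb, emb, bias=False)
        # Mixes the concatenated head outputs back into one vector per token.
        self.unify = nn.Linear(emb, emb)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, e = x.shape
        h, d = self.heads, e // self.heads
        # Split the embedding into heads: (b, t, e) -> (b, h, t, d).
        q = self.to_q(x).view(b, t, h, d).transpose(1, 2)
        k = self.to_k(x).view(b, t, h, d).transpose(1, 2)
        v = self.to_v(x).view(b, t, h, d).transpose(1, 2)
        # Scaled dot-product scores between every pair of positions.
        scores = q @ k.transpose(-2, -1) / d ** 0.5  # (b, h, t, t)
        weights = F.softmax(scores, dim=-1)
        out = weights @ v  # (b, h, t, d)
        # Merge the heads again: (b, h, t, d) -> (b, t, e).
        out = out.transpose(1, 2).contiguous().view(b, t, e)
        return self.unify(out)
```

The output has the same shape as the input, `(batch, tokens, emb)`, which is what allows such layers to be stacked.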

TransformerBlock is a simple block consisting of an attention layer, layer normalization, and a feed-forward network, with residual connections between them.
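A block of this kind could look roughly like the sketch below. It makes several assumptions that the description does not confirm: PyTorch's built-in `nn.MultiheadAttention` stands in for the repository's own attention module, layer normalization is applied after each residual connection, and the feed-forward hidden size is `ff_mult * emb`. The class and parameter names are hypothetical.

```python
import torch
import torch.nn as nn


class TransformerBlockSketch(nn.Module):
    """Illustrative block: attention, layer norm, feed-forward, residuals."""

    def __init__(self, emb: int, heads: int = 4, ff_mult: int = 4):
        super().__init__()
        # Assumption: the built-in attention layer replaces a custom module.
        self.attention = nn.MultiheadAttention(emb, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(emb)
        self.norm2 = nn.LayerNorm(emb)
        self.ff = nn.Sequential(
            nn.Linear(emb, ff_mult * emb),
            nn.ReLU(),
            nn.Linear(ff_mult * emb, emb),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Self-attention: queries, keys and values all come from x.
        attended, _ = self.attention(x, x, x, need_weights=False)
        x = self.norm1(attended + x)  # residual connection, then normalize
        fed_forward = self.ff(x)
        return self.norm2(fed_forward + x)  # second residual connection
```

Because the block maps `(batch, tokens, emb)` to the same shape, any number of blocks can be chained with `nn.Sequential`.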

CTransformer is designed for classifying sequences. It consists of several transformer blocks, averages the output tokens of the last layer, and applies a linear projection to that average.
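The classification head described above (mean over tokens, then a linear layer) can be sketched as follows. This is an assumption-laden illustration, not the repository's model: `nn.TransformerEncoderLayer` stands in for the custom blocks, position information is omitted, and every name and hyperparameter (`vocab_size`, `emb`, `heads`, `depth`, `num_classes`) is hypothetical.

```python
import torch
import torch.nn as nn


class CTransformerSketch(nn.Module):
    """Illustrative sequence classifier: blocks, mean pooling, linear head."""

    def __init__(self, vocab_size: int, emb: int, heads: int,
                 depth: int, num_classes: int):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, emb)
        # Assumption: built-in encoder layers replace the custom blocks.
        self.blocks = nn.Sequential(*[
            nn.TransformerEncoderLayer(
                d_model=emb, nhead=heads, dim_feedforward=4 * emb,
                dropout=0.0, batch_first=True,
            )
            for _ in range(depth)
        ])
        self.to_logits = nn.Linear(emb, num_classes)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        x = self.token_emb(tokens)   # (batch, tokens, emb)
        x = self.blocks(x)           # (batch, tokens, emb)
        pooled = x.mean(dim=1)       # average over the token dimension
        return self.to_logits(pooled)  # (batch, num_classes)
```

Mean pooling gives one fixed-size vector per sequence regardless of its length, so a single linear layer is enough to produce class logits.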

You can easily train it to classify sequences from the IMDB dataset:

python classify.py
