Advanced Deep Learning - Week 10

Course starts soon...


We will start now with a quiz based on last week's material

You have 6 minutes to answer the quiz.

The quiz link:
Quiz Link
It will also be posted in Mattermost and in the Zoom chat.

Short Feedback Round

Presentation of the final project

  • Each group will make a presentation.
  • Each group has 10-15 minutes, covering both the presentation and a round of questions.
  • Neither the structure nor the content of the presentation is strictly fixed. However, the guidelines below are suggested so that all presentations follow a similar structure.

Guidelines for the presentation

  • Group: who are the people in the group.
  • Project: short description of the project and the motivation behind it.
  • (*Optional) Tools: which tools you used.
  • Architecture: which architecture did you use, how many layers, which functions? (You can be technical in this part.)
  • (*Optional) Story: attempts, failures, successes, whatever happened in the process. Sometimes what did not work can be as important as what worked!
  • Results: how well does it work? Can you quantify the results or show some visualizations?
  • Baselines: is there a baseline or an approach you can compare with?
  • (*Optional) Missing: was there something you lacked that would have improved the project? Time, material, knowledge, data, computational power?
  • (*Optional) Future work: how could the project be improved or extended?

Sharing is caring

Unless agreed otherwise with an individual group,
we will add your project to our Project Page.
  • Code: provide a link to your repository if you have one, plus short instructions to reproduce the results if needed.
    Please check that the code is clean (no leftover test code or commented-out code) and has comments or text fields that make it understandable!
  • Data: mention whether the data is public (with a link) or cannot be shared.

Open Discussion

  • What are word embeddings?
  • What are positional embeddings?
  • How are they combined?

Positional Encoding (PE) [1]

PE is added to the Word Embedding (WE) [2]
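As a minimal sketch of what "added" means here: the PE matrix has the same shape as the WE matrix, and the two are summed element-wise before entering the model. The sizes, the random embedding table and the random PE values below are placeholders chosen only for illustration; they are not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for this illustration.
vocab_size, d_model, seq_len = 100, 8, 5

# Stand-ins for a learned embedding table and for a positional encoding matrix.
embedding_table = rng.normal(size=(vocab_size, d_model))
positional_encoding = rng.normal(size=(seq_len, d_model))

# A toy sentence in which the same token (id 17) appears at positions 1 and 2.
token_ids = np.array([4, 17, 17, 3, 42])

word_embeddings = embedding_table[token_ids]          # shape (seq_len, d_model)
model_input = word_embeddings + positional_encoding   # element-wise sum, same shape

# Same word, same WE, but different model input because the positions differ.
print(np.array_equal(word_embeddings[1], word_embeddings[2]))
print(np.allclose(model_input[1], model_input[2]))
```

The repeated token has identical word embeddings at both positions, yet the vectors fed to the model differ, which is the information the PE is meant to inject.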

An LSTM processes the sentence word by word [3]

Transformers process all words at once [3]

How would you encode positions? [3]

Criteria [4]

Normal order? [3]

Normal order does not work [3]

Normalized? [3]

Normalized does not work [3]
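The two rejected ideas can be sketched in a few lines. The commonly given reasons are assumed here: plain integer positions grow without bound for long sentences, and positions normalized to [0, 1] make the distance between neighbouring words depend on the sentence length. The function names are hypothetical.

```python
import numpy as np

def integer_positions(seq_len):
    """'Normal order': position t is encoded as the number t itself."""
    return np.arange(seq_len, dtype=float)

def normalized_positions(seq_len):
    """'Normalized': positions are rescaled to the interval [0, 1]."""
    return np.linspace(0.0, 1.0, seq_len)

# Problem 1: integer positions are unbounded and can dwarf the embedding values.
largest_short = integer_positions(5).max()    # 4.0
largest_long = integer_positions(50).max()    # 49.0

# Problem 2: with normalization, the step between neighbouring words
# changes with the sentence length, so "one word apart" has no fixed meaning.
short = normalized_positions(5)
long_ = normalized_positions(50)
step_short = short[1] - short[0]
step_long = long_[1] - long_[0]

print(largest_short, largest_long)
print(step_short, step_long)
```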

Sin function [3]

Sinusoidal waves [3]
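For reference, the sinusoidal encoding of Vaswani et al. [5] uses PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). The NumPy sketch below is one possible implementation of that formula; the function name and the example sizes are assumptions made for illustration.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model, base=10000.0):
    """Return a (seq_len, d_model) matrix of sinusoidal positional encodings."""
    if d_model % 2 != 0:
        raise ValueError("d_model must be even so sin/cos columns pair up")

    positions = np.arange(seq_len)[:, np.newaxis]          # (seq_len, 1)
    pair_index = np.arange(d_model // 2)[np.newaxis, :]    # (1, d_model / 2)

    # One frequency per sin/cos pair; it decreases as the pair index grows.
    frequencies = 1.0 / base ** (2 * pair_index / d_model)
    angles = positions * frequencies                       # (seq_len, d_model / 2)

    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # even columns: sine
    pe[:, 1::2] = np.cos(angles)   # odd columns: cosine
    return pe

pe = sinusoidal_positional_encoding(seq_len=50, d_model=16)
print(pe.shape)
```

All values stay in [-1, 1] regardless of the sentence length, and the encoding of a given position does not depend on how long the sentence is.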

Waves and ordering? [4]

Relative positions [4, 5]

Here is the proof! [6]
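A short sketch of the argument, written for a single sine/cosine pair with frequency $\omega_i = 1/10000^{2i/d_{\text{model}}}$ (the notation is chosen here and may differ from [6]): shifting the position by an offset $k$ is a rotation whose matrix depends only on $k$, not on the position itself.

```latex
\begin{align*}
\begin{pmatrix}
  \sin\bigl(\omega_i (pos + k)\bigr) \\
  \cos\bigl(\omega_i (pos + k)\bigr)
\end{pmatrix}
&=
\begin{pmatrix}
  \sin(\omega_i\, pos)\cos(\omega_i k) + \cos(\omega_i\, pos)\sin(\omega_i k) \\
  \cos(\omega_i\, pos)\cos(\omega_i k) - \sin(\omega_i\, pos)\sin(\omega_i k)
\end{pmatrix} \\
&=
\begin{pmatrix}
  \cos(\omega_i k) & \sin(\omega_i k) \\
  -\sin(\omega_i k) & \cos(\omega_i k)
\end{pmatrix}
\begin{pmatrix}
  \sin(\omega_i\, pos) \\
  \cos(\omega_i\, pos)
\end{pmatrix}
\end{align*}
```

The first step is the angle-addition formula for sine and cosine; the second only regroups the terms. Because the matrix contains no $pos$, the encoding at $pos + k$ is a linear function of the encoding at $pos$.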

Embedding visualization [4, 5]

Why summing and not concatenating? [7,8]

Discussion on TensorFlow or Discussion on Reddit
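One practical difference between the two options can be seen from the shapes alone: summing keeps the model width, while concatenating widens it and therefore enlarges every weight matrix that consumes the input. This is only one of the arguments raised in such discussions; the sizes and the square projection used for the parameter count are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; random values stand in for real WE and PE matrices.
seq_len, d_model = 6, 16
word_embeddings = rng.normal(size=(seq_len, d_model))
positional_encoding = rng.normal(size=(seq_len, d_model))

summed = word_embeddings + positional_encoding
concatenated = np.concatenate([word_embeddings, positional_encoding], axis=-1)

# Weights of one square projection applied to each input vector
# (an illustrative stand-in for any layer that reads the input).
params_sum = summed.shape[-1] ** 2
params_concat = concatenated.shape[-1] ** 2

print(summed.shape, concatenated.shape)
print(params_sum, params_concat)
```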


[1] Timo Denk's Blog
[2] Jay Alammar's blog
[3] Visual Guide to Transformer Neural Networks - (Episode 1) Position Embeddings (YouTube video)
[4] Amirhossein Kazemnejad's Blog
[5] Vaswani et al. (2017), Attention Is All You Need
[6] Mathematical proof of the linear combination of relative position
[7] TensorFlow repository issue/discussion about the combination of positional encodings
[8] Reddit discussion about positional encoding
[*] StackExchange question about positional encoding
[*] Towardsdatascience article about positional encoding (part I)
[*] Towardsdatascience article about positional encoding (part II)


If you had trouble, there is a TensorFlow tutorial that uses more or less the same code.

For the next week

  • You are done with the courses!
  • Finish the final presentation!