Boda Blog

Python

Inheritance

  • If you inherit from a parent class and override its constructor, you should call the parent class constructor inside the override, or simply don't override it at all

Ex:

class parent:
    def __init__(self):
        print("parent constructor")

class child(parent):
    def __init__(self):
        super().__init__()  # call the parent constructor when overriding
        print("child constructor")

class child2(parent):
    # or simply don't override the constructor and use the parent one
    pass

Multiple Inheritance

  • when we inherit from two or more classes, whichever class we inherit from first (typed first in the list) is the one with priority in Python's method resolution order (MRO); see the sketch below
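
A minimal sketch of that priority rule (the class names are made up for illustration):

class A:
    def greet(self):
        return "A"

class B:
    def greet(self):
        return "B"

class C(A, B):  # A is listed first, so its greet() wins
    pass

print(C().greet())  # "A"
print(C.__mro__)    # (C, A, B, object) -- Python's method resolution order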

Scribbles

Transformers

To deal with sequential data we have a few options:

  • 1-D convolutional NN
    • processing can be parallel
    • not practical for long sequences
  • Recurrent NN
    • processing can't happen in parallel
    • has the vanishing gradient problem if the sequence becomes very long
    • we have a bottleneck at the end of the encoder
  • RNN with attention mechanism
    • to solve the bottleneck problem, we add encoder-decoder attention
    • the decoder utilizes:
      • a context vector
      • computed as a weighted sum of the hidden states (h1, h2, …) from the encoder
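
A small numpy sketch of that idea (the shapes are made up): the context vector is just a softmax-weighted sum of the encoder hidden states.

import numpy as np

h = np.random.randn(6, 32)                       # 6 encoder hidden states, dim 32
s = np.random.randn(32)                          # current decoder state (assumed)
scores = h @ s                                   # one alignment score per encoder step
weights = np.exp(scores) / np.exp(scores).sum()  # softmax over encoder steps
context = weights @ h                            # context vector, shape (32,)
print(context.shape)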

Transformers

Encoder

  • first we do input embedding and positional embedding
  • in self-attention: we multiply the input by the q, k, v matrices to do a linear transformation
  • self-attention: q * k^T -> scale down -> softmax -> * v
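
A minimal numpy sketch of that pipeline (single head, made-up shapes):

import numpy as np

def scaled_dot_product_attention(q, k, v):
    # q * k^T -> scale down -> softmax -> * v
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)                            # scaled dot products
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)    # softmax over keys
    return weights @ v                                         # weighted sum of values

x = np.random.randn(4, 8)                                      # 4 tokens, dim 8
Wq, Wk, Wv = (np.random.randn(8, 8) for _ in range(3))         # learned linear maps
out = scaled_dot_product_attention(x @ Wq, x @ Wk, x @ Wv)
print(out.shape)  # (4, 8)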

multi-head attention

  • works like using many filters in a CNN
  • in wide attention: every word's full embedding goes to each head of the multi-head attention
  • in narrow attention: every word's embedding is split up across the heads
    • but didn't we lose the advantage of multi-head as multiple perspectives, as we have with filters in a CNN?
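
A rough numpy sketch of the narrow variant, where each word's embedding is split across the heads (shapes are made up):

import numpy as np

seq_len, d_model, num_heads = 4, 16, 4
head_dim = d_model // num_heads

x = np.random.randn(seq_len, d_model)
# reshape so each head sees only its own slice of every word's embedding
heads = x.reshape(seq_len, num_heads, head_dim).transpose(1, 0, 2)

outputs = []
for h in heads:                               # simple per-head self-attention
    scores = h @ h.T / np.sqrt(head_dim)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    outputs.append(weights @ h)

out = np.concatenate(outputs, axis=-1)        # concatenate the heads back together
print(out.shape)                              # (4, 16)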

Positional info

  • positional encoding using fixed sin/cos functions (which act like a rotation matrix)
  • learned positional embedding
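
A small numpy sketch of the fixed sin/cos encoding (dimensions are made up):

import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    pos = np.arange(seq_len)[:, None]                # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]             # (1, d_model/2)
    angles = pos / np.power(10000, 2 * i / d_model)  # angle per position and dim pair
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                     # even dims: sin
    pe[:, 1::2] = np.cos(angles)                     # odd dims: cos
    return pe

print(sinusoidal_positional_encoding(50, 16).shape)  # (50, 16)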

Residual connections

  • to give the network the chance to skip some learned parameters if that better minimizes the loss
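
A tiny sketch of that skip path (the sublayer here is a made-up stand-in):

import numpy as np

def residual_block(x, sublayer):
    return x + sublayer(x)                    # output = input + sublayer(input)

x = np.random.randn(4, 16)
out = residual_block(x, lambda h: 0.1 * h)    # toy sublayer
print(np.allclose(out, 1.1 * x))              # True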

Layer Normalization

  • in batch normalization
    • ==> we normalize to zero mean and unit variance
    • we calculate the statistics over all samples in each batch (for each channel)
  • in layer normalization
    • ==> $y = \gamma \cdot \hat{x} + \beta$, where $\gamma$ and $\beta$ are trainable parameters
    • we calculate the statistics over all channels of the same sample
  • in instance normalization ==> we calculate the statistics for one channel in one sample
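
A small numpy sketch of layer normalization (per-sample statistics over the channel axis, shapes made up):

import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    mean = x.mean(axis=-1, keepdims=True)      # per-sample mean over channels
    var = x.var(axis=-1, keepdims=True)        # per-sample variance over channels
    x_hat = (x - mean) / np.sqrt(var + eps)    # zero mean, unit variance
    return gamma * x_hat + beta                # y = gamma * x_hat + beta

x = np.random.randn(4, 16)                     # 4 samples, 16 channels
gamma, beta = np.ones(16), np.zeros(16)        # trainable in a real framework
print(layer_norm(x, gamma, beta).shape)        # (4, 16)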

Debugging ML Models

  • Understand bias-variance diagnostics

TTS Research

TTS

TTS can be viewed as a sequence-to-sequence mapping problem: from a sequence of discrete symbols (text) to a real-valued time series (speech signals). A typical TTS pipeline has two parts: 1) text analysis and 2) speech synthesis. The text analysis part typically includes a number of natural language processing (NLP) steps, such as sentence segmentation, word segmentation, text normalization, part-of-speech (POS) tagging, and grapheme-to-phoneme (G2P) conversion. It takes a word sequence as input and outputs a phoneme sequence with a variety of linguistic contexts. The speech synthesis part takes the context-dependent phoneme sequence as its input and outputs a synthesized speech waveform.

FastAi 2020

Lecture two

P value:

Determines whether some numbers have a relationship or are just random (whether they are dependent or independent).

Suppose we have the temperature and R (transmission rate) values of 100 cities in China and we want to see if there's a relation between them.

Then we generate many sets of random numbers for each parameter and calculate the P-value, which tells us the probability that a slope like this arises by chance, i.e. that there's no real relation.

A P-value is the probability of an observed result (or a more extreme one), assuming that the null hypothesis (there's no relation) is true.


PS: the P-value also depends on the size of the data set you used, so it doesn't measure the importance of the result.

so don't use P-values

If the P-value is large, we still can't conclude that the data have no relation; and if the P-value is very small, there's evidence that the data are related.
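
A rough sketch of that simulation idea as a permutation test (the temperature and R numbers below are made up, not real data):

import numpy as np

rng = np.random.default_rng(0)
temp = rng.normal(15, 8, 100)                      # made-up temperatures
r = 2.0 - 0.03 * temp + rng.normal(0, 0.4, 100)    # made-up R values

observed_slope = np.polyfit(temp, r, 1)[0]

null_slopes = []
for _ in range(10_000):
    shuffled = rng.permutation(r)                  # shuffling breaks any real relation
    null_slopes.append(np.polyfit(temp, shuffled, 1)[0])

# fraction of random shuffles with a slope at least as extreme as observed
p_value = np.mean(np.abs(null_slopes) >= abs(observed_slope))
print(p_value)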

Lecture three

In the course video and book, we built a bear classifier, using data from the Microsoft Bing Image Search API.

Computational Linear Algebra

Lecture 1

import numpy as np
a = np.array( [[6,5,3,1], [3,6,2,2], [3,4,3,1] ])
b = np.array( [ [1.5 ,1], [2,2.5], [5 ,4.5] ,[16 ,17] ])
for c in (a @ b):
    print(c)
[50. 49.]
[58.5 61. ]
[43.5 43.5]

Lecture 2

Matrix decomposition: we decompose matrices into smaller ones that have special properties

Singular Value Decomposition (SVD):

  • it's an exact decomposition, so you can retrieve the original matrix again
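
A quick numpy check of that exactness claim (random matrix for illustration):

import numpy as np

A = np.random.randn(5, 3)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_rebuilt = U @ np.diag(s) @ Vt            # U * diag(s) * V^T
print(np.allclose(A, A_rebuilt))           # True (up to floating-point error)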

Some SVD applications:

  • semantic analysis
  • collaborative filtering / recommendation
  • data compression
  • PCA (principal component analysis)
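
For the compression / PCA use cases, the usual trick is a truncated SVD: keep only the top-k singular values and vectors to get the best rank-k approximation. A small sketch with made-up data:

import numpy as np

A = np.random.randn(100, 50)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 10
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # best rank-k approximation of A
print(A_k.shape, np.linalg.norm(A - A_k))     # same shape, reconstruction error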

Non-negative Matrix Factorization (NMF)