Transformers

To deal with sequential data we have two options:
1-D convolutional NN:
- processing can be parallel
- not practical for long sequences

Recurrent NN:
- can't run in parallel
- has the vanishing-gradient problem if the sequence becomes too long
- we have a bottleneck at the end of the encoder

RNN with attention mechanism:
- to solve the bottleneck problem, we add encoder-decoder attention
- the decoder utilizes a context vector: a weighted sum of the hidden states (h1, h2, ...) from the encoder

Transformer encoder:
- first we do input embedding and positional embedding
- for self-attention, we multiply the input by learned matrices to do the linear transformations that produce q, k, v
- self-attention: q * k^T --> scale down --> softmax --> * v
- multi-head attention works like using many filters in a CNN
- in wide attention: every word is passed whole to each head
- in narrow attention: every word's vector is split up across the heads
- but didn't we lose the advantage of using multi-head as multiple perspectives, as we do with filters in CNN?
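The self-attention pipeline above (q * k^T --> scale --> softmax --> * v) can be sketched in NumPy. This is a minimal single-head sketch; the token count, dimensions, and random projection matrices are made-up toy values, not trained weights:

```python
import numpy as np

def softmax(x, axis=-1):
    # subtract the max for numerical stability before exponentiating
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # scores has shape (seq_len, seq_len); scale by sqrt(d_k) before softmax
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V                   # weighted sum of value vectors

# toy example: 4 tokens, model dimension 8
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
# the linear transformations that produce q, k, v (random here, just for shape)
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = scaled_dot_product_attention(X @ W_q, X @ W_k, X @ W_v)
print(out.shape)  # (4, 8)
```

In narrow (standard) multi-head attention, the same computation runs per head on a d_model/num_heads slice of each projection, and the head outputs are concatenated back together.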
TTS

TTS can be viewed as a sequence-to-sequence mapping problem: from a sequence of discrete symbols (text) to a real-valued time series (speech signals). A typical TTS pipeline has two parts: 1) text analysis and 2) speech synthesis. The text analysis part typically includes a number of natural language processing (NLP) steps, such as sentence segmentation, word segmentation, text normalization, part-of-speech (POS) tagging, and grapheme-to-phoneme (G2P) conversion. It takes a word sequence as input and outputs a phoneme sequence with a variety of linguistic contexts.
Lecture two

P-value: determines whether some numbers have a relationship or are random (whether they are independent or dependent).

Suppose we have the temperature and R (transmissivity) values of 100 cities in China and we want to see if there's a relation between them. We generate many sets of random numbers for each parameter, then we calculate the P-value, which tells us the probability that this slope is random and that there's no relation.

A P-value is the probability of an observed result assuming that the null hypothesis (there's no relation) is true.

PS: the P-value also depends on the size of the set you used, so it does not measure the importance of the result.
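The "generate many random sets and see how often the relation appears by chance" idea above is a permutation test. A minimal sketch, assuming made-up temperature/transmissivity data for 100 hypothetical cities:

```python
import numpy as np

rng = np.random.default_rng(42)

# hypothetical data: temperature and transmissivity for 100 cities
temp = rng.normal(20, 5, size=100)
r = 0.3 * temp + rng.normal(0, 5, size=100)  # a weak relation is built in

observed = abs(np.corrcoef(temp, r)[0, 1])

# permutation test: shuffle one variable to destroy any real relation,
# then count how often random pairings give a correlation this large
n_trials = 10_000
count = 0
for _ in range(n_trials):
    shuffled = rng.permutation(r)
    if abs(np.corrcoef(temp, shuffled)[0, 1]) >= observed:
        count += 1
p_value = count / n_trials
print(p_value)
```

A small p-value means a slope this strong rarely arises from random pairings, so we reject the null hypothesis; with a larger sample, even tiny (unimportant) effects can produce small p-values.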
Lecture 1

```python
import numpy as np

a = np.array([[6, 5, 3, 1],
              [3, 6, 2, 2],
              [3, 4, 3, 1]])
b = np.array([[1.5, 1],
              [2, 2.5],
              [5, 4.5],
              [16, 17]])

for c in (a @ b):
    print(c)
```

Output:

```
[50. 49.]
[58.5 61. ]
[43.5 43.5]
```

Lecture 2

Matrix decomposition: we decompose matrices into smaller ones that have special properties.
Singular Value Decomposition (SVD): it's an exact decomposition, so you can retrieve the original matrix again.

Some SVD applications:
- semantic analysis
- collaborative filtering / recommendation
- data compression
- PCA (principal component analysis)

Non-negative Matrix Factorization (NMF)
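The "exact decomposition" claim can be checked directly with NumPy, reusing the matrix from Lecture 1; truncating the singular values is the step behind the compression and PCA applications:

```python
import numpy as np

A = np.array([[6., 5., 3., 1.],
              [3., 6., 2., 2.],
              [3., 4., 3., 1.]])

# SVD: A = U @ diag(s) @ Vt; it is exact, so A is fully recoverable
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_rebuilt = U @ np.diag(s) @ Vt
print(np.allclose(A, A_rebuilt))  # True

# rank-1 truncation: keep only the largest singular value
# (this is the lossy approximation used for compression)
A_rank1 = s[0] * np.outer(U[:, 0], Vt[0])
```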
Lecture 1

We learn about the big picture behind matrix-vector multiplication
we learn about the row picture and column picture
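The two pictures can be shown side by side on a small made-up system: the row picture computes each entry as a dot product, the column picture builds the same result as a combination of the matrix's columns:

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 3.]])
x = np.array([4., 5.])

# row picture: each entry of A @ x is a row of A dotted with x
row_pic = np.array([A[0] @ x, A[1] @ x])

# column picture: A @ x is a combination of A's columns, weighted by x
col_pic = x[0] * A[:, 0] + x[1] * A[:, 1]

print(row_pic)  # [13. 19.]
print(col_pic)  # [13. 19.]
```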
Lecture 2

We learned about the elimination method to solve a system of equations
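A minimal sketch of elimination on a made-up 2x2 system (2x + y = 5, x + 3y = 10): forward elimination zeroes out the entry below the pivot, then back substitution recovers the unknowns:

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 3.]])
b = np.array([5., 10.])

# forward elimination: subtract (A[1,0]/A[0,0]) * row 0 from row 1
m = A[1, 0] / A[0, 0]
A[1] -= m * A[0]   # row 1 becomes [0, 2.5]
b[1] -= m * b[0]   # b[1] becomes 7.5

# back substitution
y = b[1] / A[1, 1]                   # y = 3
x = (b[0] - A[0, 1] * y) / A[0, 0]   # x = 1
print(x, y)  # 1.0 3.0
```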
Lecture 3

In this lecture we learned about matrix multiplication:
we can do that in five ways:
- row * column ==> gives an entry (1 cell)
- column * row ==> sum of outer products: (c1 * r1) + (c2 * r2) + ...
- by columns ==> A * c1 = a combination of A's columns
- by rows ==> r1 * B = a combination of B's rows
- by blocks ==> split A into blocks (A1, A2, A3, A4) and B into blocks (B1, B2, B3, B4); then C1 = A1 * B1 + A2 * B3, and so on

Then we learned about Gauss-Jordan elimination to find the matrix inverse
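The first four ways can be verified on a small made-up 2x2 example; each view builds the same product C = A @ B:

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 4.]])
B = np.array([[5., 6.],
              [7., 8.]])

# (1) row * column: entry C[i, j] is row i of A dotted with column j of B
C_entry = np.array([[A[i] @ B[:, j] for j in range(2)] for i in range(2)])

# (2) column * row: A @ B is a sum of outer products (col k of A)(row k of B)
C_outer = sum(np.outer(A[:, k], B[k]) for k in range(2))

# (3) by columns: column j of C is A times column j of B
C_cols = np.column_stack([A @ B[:, j] for j in range(2)])

# (4) by rows: row i of C is row i of A times B
C_rows = np.vstack([A[i] @ B for i in range(2)])

print(np.allclose(C_entry, A @ B))  # True
```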
You and your Research https://www.cs.virginia.edu/~robins/YouAndYourResearch.html
Main Ideas

- Commit to your idea.
- Ask yourself: what are the important problems in my field?
- Communicate with the bright minds.
- Ask the important questions.
- Closed door or open door:
  - when you work with the door closed, you work harder, but you risk working on the wrong thing
  - working with the door open, you get many interruptions, but you also get important clues

My take on this is that we should keep an open mind about what other people are doing, and where the research is heading.