{"font_size":0.4,"font_color":"#FFFFFF","background_alpha":0.5,"background_color":"#9C27B0","Stroke":"none","body":[{"from":0,"to":10.65,"location":2,"content":"[NOISE] Okay everyone, let's get started for today."},{"from":10.65,"to":15.35,"location":2,"content":"Okay. So, we're into week five of CS224n."},{"from":15.35,"to":18.45,"location":2,"content":"And so, this is the plan for today."},{"from":18.45,"to":22.08,"location":2,"content":"Um, in some sense a lot of this class is gonna be"},{"from":22.08,"to":26.95,"location":2,"content":"an easy class because I'm gonna talk about things like,"},{"from":26.95,"to":30.78,"location":2,"content":"um, final projects and tips for what you're meant to do,"},{"from":30.78,"to":32.1,"location":2,"content":"and finding a topic,"},{"from":32.1,"to":33.51,"location":2,"content":"and writing up your work,"},{"from":33.51,"to":34.68,"location":2,"content":"and things like that."},{"from":34.68,"to":36.24,"location":2,"content":"Um, so for, um, so,"},{"from":36.24,"to":39.13,"location":2,"content":"two-thirds of the class there isn't a lot of,"},{"from":39.13,"to":41.01,"location":2,"content":"um, deep technical content."},{"from":41.01,"to":42.11,"location":2,"content":"But I hope they're actually"},{"from":42.11,"to":46.61,"location":2,"content":"just some useful stuff and stuff that would be good to know about."},{"from":46.61,"to":49.82,"location":2,"content":"One way you can think about this is until,"},{"from":49.82,"to":53.09,"location":2,"content":"until this year we had a midterm in this class."},{"from":53.09,"to":56.54,"location":2,"content":"So, you know, if we weren't doing this class should instead be doing the"},{"from":56.54,"to":60.74,"location":2,"content":"the mid-term based on all the material that we've covered, um, so far."},{"from":60.74,"to":63.38,"location":2,"content":"So, this should be really pleasant by comparison."},{"from":63.38,"to":66.65,"location":2,"content":"Um, but that isn't gonna be quite 
the entire class."},{"from":66.65,"to":69.61,"location":2,"content":"So, for this piece here in the middle I'm gonna"},{"from":69.61,"to":73.91,"location":2,"content":"spend a while back on some of the topics of last week."},{"from":73.91,"to":79.42,"location":2,"content":"So, I wanted to have one more look at some of these gated recurrent models,"},{"from":79.42,"to":81.91,"location":2,"content":"um, that Abby introduced last week."},{"from":81.91,"to":84.38,"location":2,"content":"And I guess my hope is that now that you've"},{"from":84.38,"to":86.78,"location":2,"content":"had a bit more time to look and read about things,"},{"from":86.78,"to":91.67,"location":2,"content":"and hopefully even have started working on homework for that."},{"from":91.67,"to":96.8,"location":2,"content":"Maybe it starts to make a bit more sense or else even if it's more confusing than before,"},{"from":96.8,"to":100.1,"location":2,"content":"you've got some idea of what your confusions are and questions."},{"from":100.1,"to":101.84,"location":2,"content":"And so, hopefully it's, um,"},{"from":101.84,"to":107.72,"location":2,"content":"good to think about those one more time because I think they are quite a complex notion,"},{"from":107.72,"to":111.86,"location":2,"content":"and it's not so obvious what they're doing and why they're doing anything useful,"},{"from":111.86,"to":115.08,"location":2,"content":"or whether they're just this big complex blob of mystery."},{"from":115.08,"to":119.21,"location":2,"content":"And then also to touch on a couple of machine translation topics that have, um, come up"},{"from":119.21,"to":123.3,"location":2,"content":"in the final project that we didn't really get m- time to say much about last week."},{"from":123.3,"to":124.62,"location":2,"content":"[NOISE] Okay."},{"from":124.62,"to":126.48,"location":2,"content":"So, let's get started."},{"from":126.48,"to":132.51,"location":2,"content":"Um, so, this is our coursework and grading that we showed at 
the beginning."},{"from":132.51,"to":137,"location":2,"content":"And so, the main thing I wanna do today is talk about this final project."},{"from":137,"to":138.62,"location":2,"content":"Um, but before tha- I do that,"},{"from":138.62,"to":142.15,"location":2,"content":"let's just save one minute on participation."},{"from":142.15,"to":147.35,"location":2,"content":"Um, so, I guess we started into one aspect of the participation policy, um,"},{"from":147.35,"to":149.78,"location":2,"content":"last Thursday when we took attendance,"},{"from":149.78,"to":151.73,"location":2,"content":"and that makes it sound draconian,"},{"from":151.73,"to":153.22,"location":2,"content":"but I wanted to say, um,"},{"from":153.22,"to":154.82,"location":2,"content":"the positive viewpoint of,"},{"from":154.82,"to":156.97,"location":2,"content":"um, the participation points."},{"from":156.97,"to":158.84,"location":2,"content":"I mean, obviously this is a big class."},{"from":158.84,"to":160.64,"location":2,"content":"There are lots of people."},{"from":160.64,"to":164.09,"location":2,"content":"Um, our hope is just that people will variously,"},{"from":164.09,"to":167.48,"location":2,"content":"they're sort of engaged and involved in the class,"},{"from":167.48,"to":169.58,"location":2,"content":"and the participation points,"},{"from":169.58,"to":171.34,"location":2,"content":"ah, are our way of doing that."},{"from":171.34,"to":173.94,"location":2,"content":"I mean, basically the way this is set up."},{"from":173.94,"to":177,"location":2,"content":"I mean, if you do much of anything"},{"from":177,"to":180.23,"location":2,"content":"you should just get three percent for the participation points."},{"from":180.23,"to":181.61,"location":2,"content":"It shouldn't be hard."},{"from":181.61,"to":185.73,"location":2,"content":"I mean, I will bet you that there will be some people who at the end,"},{"from":185.73,"to":189,"location":2,"content":"will have gotten seven points in the 
participation category."},{"from":189,"to":191.42,"location":2,"content":"And unfortunately we cap you, we'll only give you"},{"from":191.42,"to":194.45,"location":2,"content":"three percent for the participation category, but you know,"},{"from":194.45,"to":197.45,"location":2,"content":"providing you usually come to class,"},{"from":197.45,"to":199.4,"location":2,"content":"or usually write the,"},{"from":199.4,"to":202.16,"location":2,"content":"um, what we've got to [NOISE] the invited speakers"},{"from":202.16,"to":205.22,"location":2,"content":"the reaction paragraphs if you are an SCPD student."},{"from":205.22,"to":209.21,"location":2,"content":"Sometimes, um, write a helpful answer on Piazza, right."},{"from":209.21,"to":211.9,"location":2,"content":"You're already gonna be there on three percent."},{"from":211.9,"to":213.45,"location":2,"content":"Um, yeah."},{"from":213.45,"to":216.06,"location":2,"content":"And so, one, but one other thing, um,"},{"from":216.06,"to":219.91,"location":2,"content":"that's a way to get some parti- participation points that's out today."},{"from":219.91,"to":224.12,"location":2,"content":"So, um, today we're putting up our Mid-quarter feedback survey."},{"from":224.12,"to":226.28,"location":2,"content":"And we'd love to have you fill that in."},{"from":226.28,"to":229.56,"location":2,"content":"I mean, we'd like to get your thoughts on the course so far."},{"from":229.56,"to":231.72,"location":2,"content":"And, you know, for you guys,"},{"from":231.72,"to":233.37,"location":2,"content":"there are two ways that you can win."},{"from":233.37,"to":237.5,"location":2,"content":"First if you give us some feedback that can help the rest of your quarter be better,"},{"from":237.5,"to":240.86,"location":2,"content":"but we've also got a simple bribe built into this, um,"},{"from":240.86,"to":244.64,"location":2,"content":"which is you get half a participation point simply for filling 
in,"},{"from":244.64,"to":246.89,"location":2,"content":"um, the, um, Mid-quarter survey,"},{"from":246.89,"to":249.26,"location":2,"content":"but it'd be really good to get your feedback on that."},{"from":249.26,"to":251.78,"location":2,"content":"Okay. So, then the main thing I want to get to"},{"from":251.78,"to":255.93,"location":2,"content":"today is to talk about [NOISE] the final project."},{"from":255.93,"to":259.41,"location":2,"content":"Okay. And so, I'll jump right ahead, um, into that."},{"from":259.41,"to":263.24,"location":2,"content":"So, for the final project there are two choices."},{"from":263.24,"to":266.6,"location":2,"content":"Um, you, you can either do our default final project,"},{"from":266.6,"to":270.56,"location":2,"content":"which I'll say a little bit about, it's doing SQuAD question answering,"},{"from":270.56,"to":272.68,"location":2,"content":"or you can propose a final,"},{"from":272.68,"to":274.31,"location":2,"content":"a custom final project,"},{"from":274.31,"to":276.11,"location":2,"content":"which we then have to approve."},{"from":276.11,"to":277.83,"location":2,"content":"And in the course of that,"},{"from":277.83,"to":280.91,"location":2,"content":"um, if you have some outside mentor, um,"},{"from":280.91,"to":283.84,"location":2,"content":"you can say who they are and your project proposal,"},{"from":283.84,"to":289.15,"location":2,"content":"but otherwise, um, we'll attempt to assign you a mentor somewhere out of the course staff."},{"from":289.15,"to":291.3,"location":2,"content":"Um, so, for all the assignments,"},{"from":291.3,"to":293.21,"location":2,"content":"through assignments one through five,"},{"from":293.21,"to":295.5,"location":2,"content":"you have to do them by yourself."},{"from":295.5,"to":299.13,"location":2,"content":"Um, for the final project in either form of that,"},{"from":299.13,"to":300.9,"location":2,"content":"you can do it as a team."},{"from":300.9,"to":302.21,"location":2,"content":"So, 
you can do it as one,"},{"from":302.21,"to":304.31,"location":2,"content":"two, or three people."},{"from":304.31,"to":306.51,"location":2,"content":"And how does that work?"},{"from":306.51,"to":310.35,"location":2,"content":"Um, well, it works like this, um,"},{"from":310.35,"to":312.41,"location":2,"content":"if you're a bigger team,"},{"from":312.41,"to":314.57,"location":2,"content":"we do expect you to do more,"},{"from":314.57,"to":317.82,"location":2,"content":"and there are actually two ways you can be a bigger team that I'll point out."},{"from":317.82,"to":320.9,"location":2,"content":"One way is having more people being two or three people."},{"from":320.9,"to":323.75,"location":2,"content":"And the other thing that comes up is, um,"},{"from":323.75,"to":327.97,"location":2,"content":"sometimes people wanna do a final project for more than one class at the same time."},{"from":327.97,"to":330.05,"location":2,"content":"In particular for this quarter I know there are"},{"from":330.05,"to":332.48,"location":2,"content":"at least a couple of people who are hoping to do,"},{"from":332.48,"to":337.06,"location":2,"content":"um, a joint project with Emma's reinforcement learning class."},{"from":337.06,"to":338.68,"location":2,"content":"And we allow that as well."},{"from":338.68,"to":343.49,"location":2,"content":"But we sort of do multiplication because if you're two people using it for two classes,"},{"from":343.49,"to":346.91,"location":2,"content":"that means it should be four times as great as"},{"from":346.91,"to":350.39,"location":2,"content":"what one person is doing for one class, right?"},{"from":350.39,"to":354.47,"location":2,"content":"So, how, how it works with larger teams, you know,"},{"from":354.47,"to":359.51,"location":2,"content":"in all honesty it's a little bit subtle because, you know,"},{"from":359.51,"to":362.98,"location":2,"content":"the truth is if something is just bad, 
um,"},{"from":362.98,"to":365.5,"location":2,"content":"your model was broken, um,"},{"from":365.5,"to":368.54,"location":2,"content":"or you, your experiment failed,"},{"from":368.54,"to":370.39,"location":2,"content":"um, and you don't know why."},{"from":370.39,"to":376.04,"location":2,"content":"Um, you know. If, if there are just obvious ways in which what you've done is bad, it's sort of,"},{"from":376.04,"to":379.32,"location":2,"content":"it's sort of bad whether you're one person or four people."},{"from":379.32,"to":381.86,"location":2,"content":"Um, and if you've written it up beautifully,"},{"from":381.86,"to":383.84,"location":2,"content":"you've written it up beautifully regardless of whether"},{"from":383.84,"to":386.24,"location":2,"content":"you're one person or four per- people,"},{"from":386.24,"to":392,"location":2,"content":"but, you know, nevertheless the expectation is that if you're one person we'll be pleased"},{"from":392,"to":395.96,"location":2,"content":"if you've put together one model and gotten it to work well, um,"},{"from":395.96,"to":398.79,"location":2,"content":"but if you're three people we'll say, \"Well,"},{"from":398.79,"to":400.9,"location":2,"content":"that wasn't such a big effort, um,"},{"from":400.9,"to":403.62,"location":2,"content":"running this one model against this task.\""},{"from":403.62,"to":405.31,"location":2,"content":"Surely if there are three people,"},{"from":405.31,"to":406.7,"location":2,"content":"they could have investigated"},{"from":406.7,"to":411.51,"location":2,"content":"some other model classes and seen whether they perform better or worse on this task."},{"from":411.51,"to":413.45,"location":2,"content":"And we'll feel a sense of lightweight."},{"from":413.45,"to":418.18,"location":2,"content":"So, we are expecting that sort of both more ambitious projects,"},{"from":418.18,"to":421.19,"location":2,"content":"and more thorough exploration of them if 
you're"},{"from":421.19,"to":424.74,"location":2,"content":"being a bigger team or you're using it for multiple classes."},{"from":424.74,"to":426.41,"location":2,"content":"Um, for the final project,"},{"from":426.41,"to":429.55,"location":2,"content":"you are allowed to use any language or deep learning,"},{"from":429.55,"to":431.96,"location":2,"content":"um, framework that you choose to."},{"from":431.96,"to":433.91,"location":2,"content":"We don't insist on what you use,"},{"from":433.91,"to":436.02,"location":2,"content":"though in practice in past years,"},{"from":436.02,"to":438.73,"location":2,"content":"basically everyone keeps on using what they've learned,"},{"from":438.73,"to":439.88,"location":2,"content":"um, in the assignments."},{"from":439.88,"to":441.69,"location":2,"content":"I expect that will be true, um,"},{"from":441.69,"to":444.03,"location":2,"content":"this time as well. [NOISE]"},{"from":444.03,"to":449.88,"location":2,"content":"Okay. So, um, let me just mention quickly the default final project,"},{"from":449.88,"to":451.32,"location":2,"content":"so that you've got, um,"},{"from":451.32,"to":453.12,"location":2,"content":"some sense of context."},{"from":453.12,"to":456.38,"location":2,"content":"So, the materials for that will be released this Thursday."},{"from":456.38,"to":458.46,"location":2,"content":"And so, the task for it is"},{"from":458.46,"to":462.09,"location":2,"content":"a textual question-answering task which is done over the,"},{"from":462.09,"to":465.24,"location":2,"content":"the Stanford Question Answering Dataset, SQuAD,"},{"from":465.24,"to":467.48,"location":2,"content":"which was a dataset put together, um,"},{"from":467.48,"to":471.87,"location":2,"content":"by Percy Liang and the department and the student ."},{"from":471.87,"to":475.38,"location":2,"content":"Um, so, we've used this as a default final project,"},{"from":475.38,"to":478.68,"location":2,"content":"um, before but we're mixing up a couple 
of things this year."},{"from":478.68,"to":483.84,"location":2,"content":"I mean, firstly, the starter code we're providing this year is in PyTorch,"},{"from":483.84,"to":486.46,"location":2,"content":"to fit in with what we've done in the rest of the class."},{"from":486.46,"to":489.76,"location":2,"content":"But secondly, the SQuAD team,"},{"from":489.76,"to":491.7,"location":2,"content":"released a new version of SQuAD,"},{"from":491.7,"to":495.84,"location":2,"content":"SQuAD 2.0 and we're going to use that for the class this year."},{"from":495.84,"to":498.63,"location":2,"content":"And the essential difference in SQuAD 2.0,"},{"from":498.63,"to":501.98,"location":2,"content":"is in SQuAD 1.1 or 1.0,"},{"from":501.98,"to":508.06,"location":2,"content":"every question had an answer in the passage of text whereas in SQuAD 2.0,"},{"from":508.06,"to":510.21,"location":2,"content":"a lot of questions don't have answers."},{"from":510.21,"to":514.77,"location":2,"content":"So, there's this extra significant thing that you need to do which is working out,"},{"from":514.77,"to":516.96,"location":2,"content":"um, whether a question has an answer."},{"from":516.96,"to":519.51,"location":2,"content":"So, th- this is just one example,"},{"from":519.51,"to":523.43,"location":2,"content":"um, which just gives you a sense of the SQuAD, what SQuAD is like."},{"from":523.43,"to":525.68,"location":2,"content":"So, there's a paragraph of text."},{"from":525.68,"to":528.68,"location":2,"content":"I've just put a subset of it here, um, Bill Aken,"},{"from":528.68,"to":532.46,"location":2,"content":"adopted by Mexican movie actress, Lupe Mayorga, um,"},{"from":532.46,"to":535.29,"location":2,"content":"grew up in the neighborhood town, neighboring, sorry,"},{"from":535.29,"to":537.99,"location":2,"content":"neighboring town of Madera and his song chronicled"},{"from":537.99,"to":541.65,"location":2,"content":"the hardships faced by the migrant farm workers he saw as a 
child."},{"from":541.65,"to":544.03,"location":2,"content":"Right, there's then a question, um,"},{"from":544.03,"to":545.76,"location":2,"content":"in what town did Bill,"},{"from":545.76,"to":547.65,"location":2,"content":"right, actually I misspelled that, sorry,"},{"from":547.65,"to":553.23,"location":2,"content":"it should have been Aken without an I. I got confused with our former department chair,"},{"from":553.23,"to":555.32,"location":2,"content":"Alex Aiken, I guess when I was typing."},{"from":555.32,"to":557.17,"location":2,"content":"Um, Bill Aken grow up?"},{"from":557.17,"to":559.92,"location":2,"content":"And the answer you are meant to give is Madera."},{"from":559.92,"to":562.32,"location":2,"content":"Um, so, just incidentally,"},{"from":562.32,"to":564.18,"location":2,"content":"it's a random fact."},{"from":564.18,"to":568.5,"location":2,"content":"Um, so, quite a few of you know about something that was"},{"from":568.5,"to":570.45,"location":2,"content":"recently in the kind of tech news, tech"},{"from":570.45,"to":573.28,"location":2,"content":"news and that we're going to talk about later in the class."},{"from":573.28,"to":574.86,"location":2,"content":"Um, that people, um,"},{"from":574.86,"to":579.01,"location":2,"content":"from Google produced this very strong new natural language"},{"from":579.01,"to":582.09,"location":2,"content":"understanding representation model called BERT."},{"from":582.09,"to":586.7,"location":2,"content":"And which is one of several kind of models that are in a class of,"},{"from":586.7,"to":592.65,"location":2,"content":"models that contextually model words that have come into prominence in 2017 and 2018."},{"from":592.65,"to":598.77,"location":2,"content":"And in general, BERT has sort of produced very good performance for very many tasks."},{"from":598.77,"to":603.9,"location":2,"content":"Indeed, if you look at the SQuAD 2.0 leaderboard online, um,"},{"from":603.9,"to":606.98,"location":2,"content":"at this URL, 
what you'll find is that"},{"from":606.98,"to":611.63,"location":2,"content":"all of the leading systems use BERT in some way or another, these days."},{"from":611.63,"to":614.91,"location":2,"content":"Um, but nevertheless, this was actually a question that BERT got wrong."},{"from":614.91,"to":616.29,"location":2,"content":"Um, that BERT said,"},{"from":616.29,"to":618,"location":2,"content":"\"No answer to this question,\""},{"from":618,"to":619.68,"location":2,"content":"rather than getting the correct answer."},{"from":619.68,"to":623.07,"location":2,"content":"Even though it looks kind of straightforward reading it as a human being."},{"from":623.07,"to":627.32,"location":2,"content":"It doesn't really look like a tricky reading comprehension question for a human."},{"from":627.32,"to":630.39,"location":2,"content":"Um, so, that's the default final project."},{"from":630.39,"to":635.36,"location":2,"content":"So, on Thursday, I'm going to talk more about the default final project."},{"from":635.36,"to":639.25,"location":2,"content":"I'm going to talk about how people build textual question answering systems."},{"from":639.25,"to":643.74,"location":2,"content":"And the details on the default final project should all be posted by then,"},{"from":643.74,"to":647.22,"location":2,"content":"but that's just to give you a bit of context of what the other choice is."},{"from":647.22,"to":651.16,"location":2,"content":"And today, I'm sort of more going to be aiming at people,"},{"from":651.16,"to":654.18,"location":2,"content":"um, doing the custom final project."},{"from":654.18,"to":658.59,"location":2,"content":"But let me just sort of say a bit first about the choice between the two of them."},{"from":658.59,"to":662.94,"location":2,"content":"So, um, why might you want to choose the default final project?"},{"from":662.94,"to":667.32,"location":2,"content":"So, if you have limited experience with research,"},{"from":667.32,"to":672.18,"location":2,"content":"you don't have 
any clear idea of a research project you want to do this quarter,"},{"from":672.18,"to":674.85,"location":2,"content":"you're just really busy with other classes that, uh,"},{"from":674.85,"to":677.7,"location":2,"content":"you're enrolled in CS140 and you're just really loade- loaded"},{"from":677.7,"to":681.2,"location":2,"content":"[LAUGHTER] now with other classes you're doing this quarter."},{"from":681.2,"to":685.89,"location":2,"content":"Um, you'd be happy to have just a clear goal towards, to work towards."},{"from":685.89,"to":689.55,"location":2,"content":"A leaderboard of your fellow students that you can compete against."},{"from":689.55,"to":691.89,"location":2,"content":"Um, do the default final project."},{"from":691.89,"to":696.51,"location":2,"content":"Um, I think for many people it's actually the good right choice."},{"from":696.51,"to":698.67,"location":2,"content":"And I mean, for what it's worth, I mean,"},{"from":698.67,"to":703.16,"location":2,"content":"typically, slightly over half of people have done the default final project."},{"from":703.16,"to":705.17,"location":2,"content":"It's normally that, so 55 percent have done"},{"from":705.17,"to":708.68,"location":2,"content":"the default final project and the rest the custom final project."},{"from":708.68,"to":711.14,"location":2,"content":"So, if you do the default final project,"},{"from":711.14,"to":712.72,"location":2,"content":"you'll get lots of guidance."},{"from":712.72,"to":714.54,"location":2,"content":"You get lots of scaffolding."},{"from":714.54,"to":718.36,"location":2,"content":"There are clear things to aim at in what you do."},{"from":718.36,"to":724.01,"location":2,"content":"Um, the course staff are in general most prepared and most able to help you."},{"from":724.01,"to":725.99,"location":2,"content":"Um, and in particular,"},{"from":725.99,"to":729,"location":2,"content":"I mean, the, for the bottom bullet here."},{"from":729,"to":731.51,"location":2,"content":"I 
mean, you know, something to think about in making"},{"from":731.51,"to":736.04,"location":2,"content":"the choice is that some of it comes down to how committed,"},{"from":736.04,"to":741.32,"location":2,"content":"organized, and keen you are to be wanting to do your own custom final project."},{"from":741.32,"to":744.89,"location":2,"content":"If you've got a, something you really want to do for a custom final project, great."},{"from":744.89,"to":748.27,"location":2,"content":"We love to see interesting custom final projects."},{"from":748.27,"to":752.76,"location":2,"content":"But, you know, if you're going to end up doing something that just looks"},{"from":752.76,"to":759.15,"location":2,"content":"worse, like [LAUGHTER] not done as well [LAUGHTER] as you would've done a, done a project"},{"from":759.15,"to":762.09,"location":2,"content":"if you'd just done the fin-, default final project,"},{"from":762.09,"to":765.09,"location":2,"content":"then you should probably choose the default final project [LAUGHTER]."},{"from":765.09,"to":767.18,"location":2,"content":"Um, okay."},{"from":767.18,"to":768.78,"location":2,"content":"But even if you are doing,"},{"from":768.78,"to":771.13,"location":2,"content":"think you'll do the default final project,"},{"from":771.13,"to":774.62,"location":2,"content":"I hope that some of this lecture will still, um, be useful."},{"from":774.62,"to":776.66,"location":2,"content":"Well, the part in the middle, when I talk back about"},{"from":776.66,"to":779.52,"location":2,"content":"MT and gated recurrent networks, is definitely useful."},{"from":779.52,"to":781.67,"location":2,"content":"But, you know, beyond that, um,"},{"from":781.67,"to":785.35,"location":2,"content":"some of the tips on doing research and discussions of,"},{"from":785.35,"to":790.23,"location":2,"content":"sort of looking at how to make neural networks work and error analysis, paper writing."},{"from":790.23,"to":794.72,"location":2,"content":"These are all 
good topics that apply to the default final project as well."},{"from":794.72,"to":796.77,"location":2,"content":"So, in the other direction, um,"},{"from":796.77,"to":799.68,"location":2,"content":"if you have some research project that you're excited about,"},{"from":799.68,"to":802.59,"location":2,"content":"possibly, it's one you are already working on or possibly,"},{"from":802.59,"to":804.62,"location":2,"content":"that you've just always wished to do"},{"from":804.62,"to":807.69,"location":2,"content":"something exciting with neural networks and rap music."},{"from":807.69,"to":812.34,"location":2,"content":"Um, well, you know, the custom final project is an opportunity to do that."},{"from":812.34,"to":815.55,"location":2,"content":"Um, so, it's a chance for you to do something on your own."},{"from":815.55,"to":817.98,"location":2,"content":"Um, it, you know, obviously,"},{"from":817.98,"to":820.2,"location":2,"content":"if you're not interested in textual question-answering"},{"from":820.2,"to":822.15,"location":2,"content":"but you do think you might like machine translation,"},{"from":822.15,"to":823.74,"location":2,"content":"well, it's an opportunity, um,"},{"from":823.74,"to":825.76,"location":2,"content":"to choose any topic of your own."},{"from":825.76,"to":832.59,"location":2,"content":"It's also a way to sort of experience much more of the research pro- process because,"},{"from":832.59,"to":835.29,"location":2,"content":"you know, for the default final project, it's a bigger,"},{"from":835.29,"to":838.54,"location":2,"content":"more open-ended thing than any of our assignments."},{"from":838.54,"to":839.89,"location":2,"content":"But, you know, nevertheless,"},{"from":839.89,"to":841.8,"location":2,"content":"the default final project is still"},{"from":841.8,"to":845.79,"location":2,"content":"sort of a pre-setup thing where you don't have to find your own problem,"},{"from":845.79,"to":847.18,"location":2,"content":"find your own 
data,"},{"from":847.18,"to":848.94,"location":2,"content":"work out a good approach to it."},{"from":848.94,"to":850.98,"location":2,"content":"A lot of that's sort of been done for you."},{"from":850.98,"to":854.5,"location":2,"content":"So, that, for a custom final project it's much more"},{"from":854.5,"to":858.9,"location":2,"content":"your own job to sort of define and execute a mini research project."},{"from":858.9,"to":862.44,"location":2,"content":"And so, if all of that stuff seems appealing or some of it seems appealing,"},{"from":862.44,"to":864.98,"location":2,"content":"um, then aim at the custom final project."},{"from":864.98,"to":870.04,"location":2,"content":"Um, doing this just reminded me about a fact about assignments one to five."},{"from":870.04,"to":872.31,"location":2,"content":"You know, for assignments one to five,"},{"from":872.31,"to":876.09,"location":2,"content":"we are hoping that they can be a set of stepping"},{"from":876.09,"to":879.88,"location":2,"content":"stones for learning how to build deep learning systems."},{"from":879.88,"to":887.31,"location":2,"content":"But, you know, one of our goals in that is to give you less hand holds as time goes by."},{"from":887.31,"to":891.65,"location":2,"content":"So, you know, assignment one was really easy and assignment three,"},{"from":891.65,"to":893.88,"location":2,"content":"we tried to make it really handholdy,"},{"from":893.88,"to":896.7,"location":2,"content":"so people could start to learn PyTorch."},{"from":896.7,"to":899.28,"location":2,"content":"But, you know, we're actually hoping for assignments"},{"from":899.28,"to":902.28,"location":2,"content":"four and five that they're actually harder,"},{"from":902.28,"to":904.85,"location":2,"content":"so that you're getting more experience of working"},{"from":904.85,"to":907.43,"location":2,"content":"out how to build and do things by yourself"},{"from":907.43,"to":912.83,"location":2,"content":"because if the only thing you ever 
see is completely scaffolded assignments."},{"from":912.83,"to":917.3,"location":2,"content":"It's sort of like when you do CS106A that you have to do a great job on"},{"from":917.3,"to":921.98,"location":2,"content":"the CS106A assignments but you don't really know how to write a program by yourselves."},{"from":921.98,"to":923.55,"location":2,"content":"And that's sort of what we want to, um,"},{"from":923.55,"to":925.31,"location":2,"content":"sort of get you beyond,"},{"from":925.31,"to":927.05,"location":2,"content":"um, in the latter two assignments."},{"from":927.05,"to":929.86,"location":2,"content":"So, I hope you have started on assignment four."},{"from":929.86,"to":934.88,"location":2,"content":"If not, you really should start and get underway soon as Abby was emphasizing."},{"from":934.88,"to":937.58,"location":2,"content":"Okay. So, this year for the,"},{"from":937.58,"to":940.93,"location":2,"content":"um, final project, whichever one you're doing."},{"from":940.93,"to":943.77,"location":2,"content":"Um, we're actually putting more structure in than we have"},{"from":943.77,"to":946.73,"location":2,"content":"in previous years to encourage people to get going."},{"from":946.73,"to":948.03,"location":2,"content":"And so, in particular,"},{"from":948.03,"to":952.19,"location":2,"content":"there are early on components which are worth points in the grading."},{"from":952.19,"to":955.5,"location":2,"content":"So, the first part of that is a project proposal,"},{"from":955.5,"to":957.41,"location":2,"content":"um, which is, um,"},{"from":957.41,"to":959.02,"location":2,"content":"we want from each team."},{"from":959.02,"to":960.91,"location":2,"content":"So, one per team, um,"},{"from":960.91,"to":962.67,"location":2,"content":"you can just do a joint one,"},{"from":962.67,"to":964.62,"location":2,"content":"um, which is worth five percent."},{"from":964.62,"to":968.3,"location":2,"content":"Um, so, it's, we're releasing the details on Thursday which is 
when"},{"from":968.3,"to":972.65,"location":2,"content":"assignment four is due and it'll be due the following Thursday."},{"from":972.65,"to":976.43,"location":2,"content":"So, we're actually having an interruption in the sequence of current assignments, right."},{"from":976.43,"to":979.25,"location":2,"content":"So, for the next week, um,"},{"from":979.25,"to":982.8,"location":2,"content":"the thing to do is the project proposal."},{"from":982.8,"to":984.77,"location":2,"content":"And then the week after that, um,"},{"from":984.77,"to":989.08,"location":2,"content":"we're back to assignment five and then we go full time into the final project."},{"from":989.08,"to":990.59,"location":2,"content":"So, what we're wanting for"},{"from":990.59,"to":994.25,"location":2,"content":"the project proposal is we're actually wanting you to do a little bit"},{"from":994.25,"to":999.47,"location":2,"content":"of starting off research in ter- terms of reading some paper."},{"from":999.47,"to":1001.75,"location":2,"content":"So, find some paper that's, um,"},{"from":1001.75,"to":1003.52,"location":2,"content":"relevant to your research,"},{"from":1003.52,"to":1005.54,"location":2,"content":"um, that you are going to do."},{"from":1005.54,"to":1009.22,"location":2,"content":"Um, read it, write a summary of what it does."},{"from":1009.22,"to":1014.27,"location":2,"content":"Um, write down some thoughts on how you could adapt or extend ideas in it,"},{"from":1014.27,"to":1016.45,"location":2,"content":"in your own final project."},{"from":1016.45,"to":1019.81,"location":2,"content":"Um, and then say something about what your plan is for"},{"from":1019.81,"to":1023.08,"location":2,"content":"what you're goi- hoping to do for your final project."},{"from":1023.08,"to":1025.66,"location":2,"content":"And especially, if you're doing a custom final project"},{"from":1025.66,"to":1028.21,"location":2,"content":"there's more to write there because we'll want to 
make"},{"from":1028.21,"to":1030.44,"location":2,"content":"sure that you have some idea as to"},{"from":1030.44,"to":1033.36,"location":2,"content":"what data you can use and how are you going to evaluate it."},{"from":1033.36,"to":1036.13,"location":2,"content":"Whereas a couple of those things are actually sort of"},{"from":1036.13,"to":1040.43,"location":2,"content":"determined for you if you're doing the default final project."},{"from":1040.43,"to":1045.54,"location":2,"content":"Um, and so then after that we're going to have a project milestone, um,"},{"from":1045.54,"to":1048.39,"location":2,"content":"which is the progress report where we're hoping that you can"},{"from":1048.39,"to":1051.3,"location":2,"content":"report that you're well along in your final project."},{"from":1051.3,"to":1053.85,"location":2,"content":"That you've run at least some experiment and have"},{"from":1053.85,"to":1057.08,"location":2,"content":"some results on some data that you can talk about."},{"from":1057.08,"to":1059.82,"location":2,"content":"So the default- the project milestone is due on,"},{"from":1059.82,"to":1061.79,"location":2,"content":"um, Thursday, March seven."},{"from":1061.79,"to":1065.01,"location":2,"content":"So it's actually more than halfway through"},{"from":1065.01,"to":1068.13,"location":2,"content":"the period that's sort of dedicated to the final project."},{"from":1068.13,"to":1071.22,"location":2,"content":"So, if you are not- we sort of put it past"},{"from":1071.22,"to":1074.97,"location":2,"content":"halfway because the fact of the matter is it always takes people time to get going,"},{"from":1074.97,"to":1076.63,"location":2,"content":"um, but nevertheless, you know,"},{"from":1076.63,"to":1079.35,"location":2,"content":"what you should have in your head is unless you're halfway"},{"from":1079.35,"to":1082.44,"location":2,"content":"through by the time you're handing in your,"},{"from":1082.44,"to":1086.04,"location":2,"content":"um, project 
milestone, then you're definitely behind."},{"from":1086.04,"to":1089.91,"location":2,"content":"And you'll be doing that typical Stanford thing of having a lot of late nights"},{"from":1089.91,"to":1094.76,"location":2,"content":"and lack of sleep in the last week [LAUGHTER] of class trying to catch up for that."},{"from":1094.76,"to":1097.48,"location":2,"content":"Um, okay. So, um,"},{"from":1097.48,"to":1099.02,"location":2,"content":"so now I've sort of, um,"},{"from":1099.02,"to":1102.9,"location":2,"content":"want to sort of just start saying a bit of- for"},{"from":1102.9,"to":1105.27,"location":2,"content":"custom final projects of some of the sort of"},{"from":1105.27,"to":1108.19,"location":2,"content":"thinking and types of things that you could do about that."},{"from":1108.19,"to":1111.57,"location":2,"content":"Um, so you have to determine some project,"},{"from":1111.57,"to":1115.14,"location":2,"content":"um, for- if you're doing a custom final project."},{"from":1115.14,"to":1117.33,"location":2,"content":"So, in philosophy of science, you know,"},{"from":1117.33,"to":1120.81,"location":2,"content":"there are basically two ways for any field you can have a project."},{"from":1120.81,"to":1124.52,"location":2,"content":"You either start with some domain problem of interest."},{"from":1124.52,"to":1128.46,"location":2,"content":"You're [NOISE] just got something you're interested in or say,"},{"from":1128.46,"to":1131.89,"location":2,"content":"\"Gee, I'd like to do better machine translation.\""},{"from":1131.89,"to":1135.22,"location":2,"content":"And then you work out some ways to address it with technology,"},{"from":1135.22,"to":1136.56,"location":2,"content":"or you start with some, um,"},{"from":1136.56,"to":1138.7,"location":2,"content":"technical approach of interest."},{"from":1138.7,"to":1140.55,"location":2,"content":"And you say, \"Oh well,"},{"from":1140.55,"to":1142.5,"location":2,"content":"those LSTMs seemed kind of 
neat,"},{"from":1142.5,"to":1144.36,"location":2,"content":"but I didn't understand why there's"},{"from":1144.36,"to":1148.04,"location":2,"content":"that extra tanh and I think it'd be better if it changed in this other way.\""},{"from":1148.04,"to":1153.57,"location":2,"content":"And you start exploring from a technical direction to try and come up with a better idea."},{"from":1153.57,"to":1155.97,"location":2,"content":"And then you're wanting to prove that it works."},{"from":1155.97,"to":1160.59,"location":2,"content":"So in kinds of the projects that people do for this class,"},{"from":1160.59,"to":1162.51,"location":2,"content":"this isn't quite an exhaustive list,"},{"from":1162.51,"to":1164.97,"location":2,"content":"but this is sort of in general what people do."},{"from":1164.97,"to":1168.51,"location":2,"content":"So, the first category and really I think this"},{"from":1168.51,"to":1172.08,"location":2,"content":"is the bulk of projects over half is people find"},{"from":1172.08,"to":1175.65,"location":2,"content":"some task or application of interest and they build"},{"from":1175.65,"to":1179.74,"location":2,"content":"some neural network models to try and do it as effectively as possible."},{"from":1179.74,"to":1187.02,"location":2,"content":"Um, there's a second category where people sort of concentrate on implementing,"},{"from":1187.02,"to":1193.58,"location":2,"content":"so re-implementing some complex neural architecture and getting it to work on some data."},{"from":1193.58,"to":1197.12,"location":2,"content":"And so let me just say a couple of sentences on this."},{"from":1197.12,"to":1201.53,"location":2,"content":"Um, so, it's certainly okay for you to,"},{"from":1201.53,"to":1205.39,"location":2,"content":"um, start by re-implementing some existing model."},{"from":1205.39,"to":1210.97,"location":2,"content":"Um, and some people that's as far as they get."},{"from":1210.97,"to":1214.63,"location":2,"content":"And then the question is, um, is 
that okay?"},{"from":1214.63,"to":1217.65,"location":2,"content":"And the answer to whether that's okay sort"},{"from":1217.65,"to":1220.92,"location":2,"content":"of largely depends on how complex your neural model is."},{"from":1220.92,"to":1228.06,"location":2,"content":"Um, so if what you think is okay I'm going to, um,"},{"from":1228.06,"to":1231.27,"location":2,"content":"re-implement something like we've seen already,"},{"from":1231.27,"to":1234.6,"location":2,"content":"like a window-based classification model and you"},{"from":1234.6,"to":1238.11,"location":2,"content":"just re-implement that and run it on some data and get some results and stop."},{"from":1238.11,"to":1240.36,"location":2,"content":"That's definitely a bad project."},{"from":1240.36,"to":1245.1,"location":2,"content":"Um, but there are lots of very complicated and sophisticated neural,"},{"from":1245.1,"to":1247.07,"location":2,"content":"um, architectures out there."},{"from":1247.07,"to":1251.79,"location":2,"content":"And if you're trying to do something complicated well then that can be a fine project."},{"from":1251.79,"to":1255.84,"location":2,"content":"Um, so, I actually sort of stuck in a few examples of projects."},{"from":1255.84,"to":1260.49,"location":2,"content":"So, I mean, here's one that was actually from a couple of years ago."},{"from":1260.49,"to":1263.54,"location":2,"content":"Um, so this was in the 2017 class."},{"from":1263.54,"to":1267.15,"location":2,"content":"And so, shortly before the 2017 class,"},{"from":1267.15,"to":1271.23,"location":2,"content":"DeepMind, who's one of the, um, organizations producing"},{"from":1271.23,"to":1274.38,"location":2,"content":"the most complicated neural models had just released"},{"from":1274.38,"to":1277.89,"location":2,"content":"a paper about the differentiable neural computer model,"},{"from":1277.89,"to":1280.17,"location":2,"content":"which was a model of how to have something 
like"},{"from":1280.17,"to":1283.11,"location":2,"content":"a differentiate- differentiable Turing machine-like"},{"from":1283.11,"to":1286.66,"location":2,"content":"architecture inside a neural network, um,"},{"from":1286.66,"to":1289.05,"location":2,"content":"and thought, um,"},{"from":1289.05,"to":1292.23,"location":2,"content":"this would be a great challenge to try and, um,"},{"from":1292.23,"to":1296.97,"location":2,"content":"re-implement the differentiable neural computer which DeepMind hadn't released"},{"from":1296.97,"to":1299.1,"location":2,"content":"any source code for because they're not the kind of"},{"from":1299.1,"to":1301.86,"location":2,"content":"place that generally releases their source code."},{"from":1301.86,"to":1306.42,"location":2,"content":"Um, and, you know, this was actually an extremely ambitious project because it"},{"from":1306.42,"to":1311.84,"location":2,"content":"was, it's a very complex architecture which is hard to get to train."},{"from":1311.84,"to":1314.27,"location":2,"content":"And so, you know, at the end,"},{"from":1314.27,"to":1318.18,"location":2,"content":"at the end she hadn't been able to sort of train as"},{"from":1318.18,"to":1322.23,"location":2,"content":"big a model or get as good results as they report in the paper that,"},{"from":1322.23,"to":1324.03,"location":2,"content":"you know, frankly we thought it was pretty"},{"from":1324.03,"to":1327.12,"location":2,"content":"miraculous that she managed to get it working at all."},{"from":1327.12,"to":1331.92,"location":2,"content":"In the period of time we had in the class and she did successfully do an open-source"},{"from":1331.92,"to":1336.6,"location":2,"content":"re-implementation of this model which basically worked the same as in their paper."},{"from":1336.6,"to":1337.77,"location":2,"content":"Though not quite as well."},{"from":1337.77,"to":1339.81,"location":2,"content":"So, you know, that seemed a huge 
achievement."},{"from":1339.81,"to":1343.9,"location":2,"content":"So, you certainly can do something of that sort."},{"from":1343.9,"to":1348.21,"location":2,"content":"Right. So, um, so you- you can sort of from"},{"from":1348.21,"to":1352.85,"location":2,"content":"a technical direction have some ideas for variant model and explore,"},{"from":1352.85,"to":1355.65,"location":2,"content":"um, how to make a different kind of model class and then look"},{"from":1355.65,"to":1359.07,"location":2,"content":"at how it works on some problem that works well."},{"from":1359.07,"to":1363.2,"location":2,"content":"Another kind of project you can do is an analysis project,"},{"from":1363.2,"to":1365.69,"location":2,"content":"so that you might be interested in something in"},{"from":1365.69,"to":1369.52,"location":2,"content":"natural language or something on the behavior of neural networks,"},{"from":1369.52,"to":1372.74,"location":2,"content":"and just think that you want to analyze them more closely."},{"from":1372.74,"to":1374.7,"location":2,"content":"So, you might think, \"Oh,"},{"from":1374.7,"to":1378.23,"location":2,"content":"maybe these neural machine translation systems work great"},{"from":1378.23,"to":1382.53,"location":2,"content":"providing the word order is the same in the source and target language,"},{"from":1382.53,"to":1387.18,"location":2,"content":"but can they really do a good job of reordering phrases for different language types?"},{"from":1387.18,"to":1389.67,"location":2,"content":"How much does their performance vary based on"},{"from":1389.67,"to":1392.63,"location":2,"content":"the amount of reordering between the source and target language?\""},{"from":1392.63,"to":1394.76,"location":2,"content":"And you could do some experiments to try and"},{"from":1394.76,"to":1398.61,"location":2,"content":"investigate that as an analysis problem that looks at a model,"},{"from":1398.61,"to":1401.01,"location":2,"content":"and we sometimes get projects 
like that."},{"from":1401.01,"to":1404.04,"location":2,"content":"Down at the bottom is the rarest kind of project,"},{"from":1404.04,"to":1406.86,"location":2,"content":"which is when some people try to do something"},{"from":1406.86,"to":1410.63,"location":2,"content":"theoretical which is to prove some properties of a system."},{"from":1410.63,"to":1415.41,"location":2,"content":"So if- this is easiest to do in simple systems for something like word vectors,"},{"from":1415.41,"to":1419.13,"location":2,"content":"that if you might want to prove something about"},{"from":1419.13,"to":1423.16,"location":2,"content":"the kind of spaces that are induced by word vectors,"},{"from":1423.16,"to":1425.49,"location":2,"content":"and what properties you need to have in"},{"from":1425.49,"to":1429.38,"location":2,"content":"models for word analogies to work or something like that."},{"from":1429.38,"to":1433.99,"location":2,"content":"Um here are just another couple of examples that so- shows some of the other classes."},{"from":1433.99,"to":1437.94,"location":2,"content":"So, this one is an example of find a problem and build some models."},{"from":1437.94,"to":1444.15,"location":2,"content":"So, these three people um, looked at Shakespearean Sonnet generation and then they considered"},{"from":1444.15,"to":1447.78,"location":2,"content":"several different models for Shakespearean Sonnet generation and"},{"from":1447.78,"to":1451.77,"location":2,"content":"got the best results from this sort of- you'd probably can't really see all the details,"},{"from":1451.77,"to":1455.07,"location":2,"content":"but they have a sort of a mixture of word level and"},{"from":1455.07,"to":1458.4,"location":2,"content":"character level gated model that feeds into"},{"from":1458.4,"to":1463.13,"location":2,"content":"a word level LSTM and produces sonnets and the output wasn't totally bad."},{"from":1463.13,"to":1466.31,"location":2,"content":"\"Thy youth's time and face his form shall 
cover."},{"from":1466.31,"to":1468.87,"location":2,"content":"Now all fresh beauty my love there."},{"from":1468.87,"to":1472.29,"location":2,"content":"Will ever time to greet forget each like ever decease,"},{"from":1472.29,"to":1475.82,"location":2,"content":"but in a- in a best at worship his glory die.\""},{"from":1475.82,"to":1477.78,"location":2,"content":"Okay. It's maybe not perfect,"},{"from":1477.78,"to":1481.96,"location":2,"content":"[LAUGHTER] but it sort of sounds like a Shakespearean sonnet."},{"from":1481.96,"to":1484.16,"location":2,"content":"Um, okay."},{"from":1484.16,"to":1486.88,"location":2,"content":"Yeah. So, I showed you that one already."},{"from":1486.88,"to":1494.21,"location":2,"content":"Um, here's, um, an example of someone who designed a different kind of network,"},{"from":1494.21,"to":1498.76,"location":2,"content":"and this was a project that came out of this class that was then continued with,"},{"from":1498.76,"to":1501.31,"location":2,"content":"and the- they got a conference paper out of it,"},{"from":1501.31,"to":1503.86,"location":2,"content":"the ICLR 2017 paper."},{"from":1503.86,"to":1509.44,"location":2,"content":"So, this was looking at doing a better job at building a neural language model."},{"from":1509.44,"to":1512.11,"location":2,"content":"And essentially, they had two ideas,"},{"from":1512.11,"to":1516.43,"location":2,"content":"both of which seem useful for building better neural language models."},{"from":1516.43,"to":1520.75,"location":2,"content":"And so, one is that in the stuff that we've presented so far,"},{"from":1520.75,"to":1522.79,"location":2,"content":"whether it was the early word vectors,"},{"from":1522.79,"to":1525.61,"location":2,"content":"or what Abby presented last week in the neural language model,"},{"from":1525.61,"to":1530.44,"location":2,"content":"there are effectively two vectors for each word: there's one for the word 
encoding"},{"from":1530.44,"to":1535.42,"location":2,"content":"on the input and then when you have the softmax on the other side effectively,"},{"from":1535.42,"to":1539.5,"location":2,"content":"the rows of that matrix that go into the softmax are also"},{"from":1539.5,"to":1544.19,"location":2,"content":"word vectors for determining how likely you are to produce different words."},{"from":1544.19,"to":1548.71,"location":2,"content":"And so, um, these two people had the idea that maybe if we actually in the model"},{"from":1548.71,"to":1554.95,"location":2,"content":"tied those two word ve- vectors together that would help and produce a better model and,"},{"from":1554.95,"to":1557.23,"location":2,"content":"um, and so this was actually done"},{"from":1557.23,"to":1560.86,"location":2,"content":"several years ago when that was a novel idea which hadn't actually been done."},{"from":1560.86,"to":1564.09,"location":2,"content":"So, this was done in the 2016 class,"},{"from":1564.09,"to":1566.88,"location":2,"content":"and then they had this second idea which was,"},{"from":1566.88,"to":1569.08,"location":2,"content":"well maybe doing the kind of,"},{"from":1569.08,"to":1571.66,"location":2,"content":"cross entropy one, zero,"},{"from":1571.66,"to":1574.6,"location":2,"content":"sort of you look at the correct word that you are meant to"},{"from":1574.6,"to":1578.62,"location":2,"content":"produce and sort of work out a loss based on that."},{"from":1578.62,"to":1581.14,"location":2,"content":"Maybe that's not very good because you don't get"},{"from":1581.14,"to":1585.52,"location":2,"content":"partial points if you produce a different word that's semantically similar."},{"from":1585.52,"to":1588.1,"location":2,"content":"And so, that they had this idea that they could use"},{"from":1588.1,"to":1593.35,"location":2,"content":"word vector similarity and then you'd be giving a score for any word that was"},{"from":1593.35,"to":1596.31,"location":2,"content":"produced 
next based on how similar it was"},{"from":1596.31,"to":1599.47,"location":2,"content":"according to word vector similarity to the word that you are"},{"from":1599.47,"to":1601.72,"location":2,"content":"meant to produce next and that was also"},{"from":1601.72,"to":1605.88,"location":2,"content":"a useful idea that they're able to produce improved language models with."},{"from":1605.88,"to":1607.42,"location":2,"content":"So, that was a cool project."},{"from":1607.42,"to":1610.18,"location":2,"content":"Um, here's an example of, um,"},{"from":1610.18,"to":1612.01,"location":2,"content":"somebody from last year,"},{"from":1612.01,"to":1614.56,"location":2,"content":"um, who did an analysis project."},{"from":1614.56,"to":1617.13,"location":2,"content":"So, their idea was,"},{"from":1617.13,"to":1619.66,"location":2,"content":"um, that they- well,"},{"from":1619.66,"to":1620.68,"location":2,"content":"they were going to, um,"},{"from":1620.68,"to":1622.35,"location":2,"content":"evaluate on some task,"},{"from":1622.35,"to":1624.16,"location":2,"content":"they actually did several tasks, um,"},{"from":1624.16,"to":1627.13,"location":2,"content":"word similarity, analogy, and the SQuAD,"},{"from":1627.13,"to":1629.21,"location":2,"content":"um, question answering system."},{"from":1629.21,"to":1631.18,"location":2,"content":"But the question was, okay,"},{"from":1631.18,"to":1636.23,"location":2,"content":"a lot of neural network models are big and so aren't very suitable for phones, um,"},{"from":1636.23,"to":1641.95,"location":2,"content":"could we get away with compressing the models a lot so that rather than having doubles,"},{"from":1641.95,"to":1645.58,"location":2,"content":"or 32-bit floats, or even 16-bit floats,"},{"from":1645.58,"to":1648.6,"location":2,"content":"that are now used quite a bit in neural networks, could we,"},{"from":1648.6,"to":1652.9,"location":2,"content":"um, compress a lot more and quantize, 
um,"},{"from":1652.9,"to":1655.45,"location":2,"content":"numeric values so that we can only be, say,"},{"from":1655.45,"to":1660.38,"location":2,"content":"using two bits per parameter, so there'd literally be only four values per parameter?"},{"from":1660.38,"to":1662.89,"location":2,"content":"And if you do that naively, it doesn't work."},{"from":1662.89,"to":1668.5,"location":2,"content":"But if you explore some cleverer ways of doing it and see how to make things work,"},{"from":1668.5,"to":1671.45,"location":2,"content":"you can actually get it to work, um, really well."},{"from":1671.45,"to":1674.68,"location":2,"content":"Um, in fact, it actually seems like sometimes you can improve"},{"from":1674.68,"to":1679.39,"location":2,"content":"your performance doing this because the quantization acts as a form of regularizer."},{"from":1679.39,"to":1683.29,"location":2,"content":"Um, you can find lots of other projects, um, online,"},{"from":1683.29,"to":1687.14,"location":2,"content":"if you look at the CS224n pages and you should."},{"from":1687.14,"to":1688.99,"location":2,"content":"Um, okay."},{"from":1688.99,"to":1692.83,"location":2,"content":"So, if you want to do a final project you have to find someplace to start."},{"from":1692.83,"to":1695.95,"location":2,"content":"You know, one place is to start looking at papers there's"},{"from":1695.95,"to":1699.76,"location":2,"content":"online anthology of most of the NLP conference papers."},{"from":1699.76,"to":1703.69,"location":2,"content":"You can look at M- ML conferences have lots of relevant papers as well."},{"from":1703.69,"to":1708.71,"location":2,"content":"You can look at past CS224n papers that cover lots of topics."},{"from":1708.71,"to":1713.2,"location":2,"content":"Um, though, you know, I- I sugge- don't also forget, um,"},{"from":1713.2,"to":1716.18,"location":2,"content":"the advice down the bottom, um,"},{"from":1716.18,"to":1719.98,"location":2,"content":"which is look for an interesting 
problem in the world."},{"from":1719.98,"to":1723.67,"location":2,"content":"Um, so, our Stanford's CS emeritus professor"},{"from":1723.67,"to":1727.24,"location":2,"content":"Ed Feigenbaum likes to quote the advice of his,"},{"from":1727.24,"to":1730.46,"location":2,"content":"um, advisor, Herb Simon, um,"},{"from":1730.46,"to":1735.65,"location":2,"content":"of \"If you see a research area where many people are working, go somewhere else.\""},{"from":1735.65,"to":1736.87,"location":2,"content":"Um, well, you know,"},{"from":1736.87,"to":1741.12,"location":2,"content":"in the context of this class don't go so far away that you're not using"},{"from":1741.12,"to":1745.83,"location":2,"content":"neural networks or NLP because that won't work for project for this class."},{"from":1745.83,"to":1748.09,"location":2,"content":"But, you know, nevertheless, I mean,"},{"from":1748.09,"to":1750.25,"location":2,"content":"in some sense it's a bad strategy of"},{"from":1750.25,"to":1752.92,"location":2,"content":"saying let's look at all the papers that were published last year,"},{"from":1752.92,"to":1755.48,"location":2,"content":"and let's wo- start working on one of their problems,"},{"from":1755.48,"to":1758.61,"location":2,"content":"or lots of people are working on question-answering, I'll do it too."},{"from":1758.61,"to":1761.69,"location":2,"content":"You know, there are lots of interesting different problems"},{"from":1761.69,"to":1764.24,"location":2,"content":"in the world and if you know of some, you know,"},{"from":1764.24,"to":1768.34,"location":2,"content":"cool website that somehow does something interesting related to language,"},{"from":1768.34,"to":1771.51,"location":2,"content":"you know, maybe you can make a final project out of that."},{"from":1771.51,"to":1774.68,"location":2,"content":"Um, other ways to find final projects."},{"from":1774.68,"to":1778.09,"location":2,"content":"Um, so the person who's first put together most 
of"},{"from":1778.09,"to":1783.22,"location":2,"content":"the CS231n content was And- Andrej Karpathy, um,"},{"from":1783.22,"to":1786.76,"location":2,"content":"who now works at Tesla and among his other- things"},{"from":1786.76,"to":1790.73,"location":2,"content":"he did for the world he put together this site Arxiv Sanity Preserver, um,"},{"from":1790.73,"to":1794.56,"location":2,"content":"which is a way to find online arXiv papers, which is"},{"from":1794.56,"to":1799,"location":2,"content":"a major pre-print server and if you say a few papers you're interested in,"},{"from":1799,"to":1801.43,"location":2,"content":"it'll show you other papers that you're interested in."},{"from":1801.43,"to":1803.76,"location":2,"content":"It'll show you papers that are currently trending."},{"from":1803.76,"to":1805.7,"location":2,"content":"So, that can be a good way to look."},{"from":1805.7,"to":1808.15,"location":2,"content":"Um, if you think it'd be just good to be in"},{"from":1808.15,"to":1810.61,"location":2,"content":"some competition where you're wanting to"},{"from":1810.61,"to":1813.2,"location":2,"content":"build a system that's better than other people's,"},{"from":1813.2,"to":1816.41,"location":2,"content":"um, you can look at leaderboards for various tasks."},{"from":1816.41,"to":1819.16,"location":2,"content":"So, there's this brand new site which is pretty good though"},{"from":1819.16,"to":1821.95,"location":2,"content":"not completely error free and correct, of"},{"from":1821.95,"to":1826.12,"location":2,"content":"paperswithcode.com, and it collects a whole lot of"},{"from":1826.12,"to":1831.19,"location":2,"content":"leaderboards for a whole lot of machine learning tasks including tons of language ones."},{"from":1831.19,"to":1833.86,"location":2,"content":"So, it gives leaderboards for question answering,"},{"from":1833.86,"to":1835.99,"location":2,"content":"machine translation, named entity 
recognition,"},{"from":1835.99,"to":1838.09,"location":2,"content":"language modeling, part of speech tagging."},{"from":1838.09,"to":1840.12,"location":2,"content":"All sorts of tasks you can find there,"},{"from":1840.12,"to":1844.92,"location":2,"content":"and find out what the current states of the art and datasets are."},{"from":1844.92,"to":1848.47,"location":2,"content":"Okay. Um, so, you know,"},{"from":1848.47,"to":1850.3,"location":2,"content":"different projects are different,"},{"from":1850.3,"to":1854.68,"location":2,"content":"but often for a lot of projects the things you need to be making sure of is"},{"from":1854.68,"to":1859.21,"location":2,"content":"that something that you can get a decent amount of data about so you can train a model."},{"from":1859.21,"to":1860.8,"location":2,"content":"It's a feasible task,"},{"from":1860.8,"to":1864.1,"location":2,"content":"it's not so enormous you can't possibly do it in four weeks."},{"from":1864.1,"to":1868.42,"location":2,"content":"Um, you'll want to have some evaluation metric and"},{"from":1868.42,"to":1870.76,"location":2,"content":"normally for deep learning you have to have-"},{"from":1870.76,"to":1873.22,"location":2,"content":"even if you hope to do some human evaluation,"},{"from":1873.22,"to":1877.11,"location":2,"content":"as well, you have to have some automatic evaluation metric."},{"from":1877.11,"to":1879.65,"location":2,"content":"Because unless there's just some code that you can run"},{"from":1879.65,"to":1882.41,"location":2,"content":"that gives you a score for how well you're doing,"},{"from":1882.41,"to":1884.02,"location":2,"content":"then unless you have that,"},{"from":1884.02,"to":1887.92,"location":2,"content":"you just sort of can't do the deep learning trick of saying, \"Okay,"},{"from":1887.92,"to":1894.04,"location":2,"content":"let's, um, do backpropagation to optimize our scores according to this metric.\""},{"from":1894.04,"to":1899.05,"location":2,"content":"And pretty 
much you'll want to do that to be able to do neural network optimization."},{"from":1899.05,"to":1905.02,"location":2,"content":"Um, and we do require that there is an important part of NLP in your class project."},{"from":1905.02,"to":1906.4,"location":2,"content":"I mean, it doesn't have to be only thing,"},{"from":1906.4,"to":1908.66,"location":2,"content":"you can be doing reinforcement learning as well,"},{"from":1908.66,"to":1911.38,"location":2,"content":"or you could do images to caption, say you're"},{"from":1911.38,"to":1913.3,"location":2,"content":"doing joint vision and NLP,"},{"from":1913.3,"to":1915.65,"location":2,"content":"but there has to be NLP in it."},{"from":1915.65,"to":1922.35,"location":2,"content":"Okay. Ah, last bit before I get back onto the content from last week."},{"from":1922.35,"to":1927.61,"location":2,"content":"Ah, so, something that you'll need to do is have data for your project."},{"from":1927.61,"to":1932.37,"location":2,"content":"Um, so some people collect their own data for a project and, you know,"},{"from":1932.37,"to":1934.7,"location":2,"content":"it's not impossible to collect your own data"},{"from":1934.7,"to":1937.95,"location":2,"content":"especially if there's something you can do with unsupervised data."},{"from":1937.95,"to":1941.45,"location":2,"content":"You might be able to get it by just sort of crawling an interesting website."},{"from":1941.45,"to":1945.17,"location":2,"content":"You can annotate a small amount of data yourself."},{"from":1945.17,"to":1948.66,"location":2,"content":"If you have any site that has some kind of, you know,"},{"from":1948.66,"to":1951.33,"location":2,"content":"ratings annotation stars on it,"},{"from":1951.33,"to":1956.21,"location":2,"content":"you can treat those as a form of, ah, annotation."},{"from":1956.21,"to":1961.98,"location":2,"content":"Right? 
So, if you want to predict something like, um, you know,"},{"from":1961.98,"to":1966.67,"location":2,"content":"which descriptions on product review websites"},{"from":1966.67,"to":1970.23,"location":2,"content":"or which reviews on product review websites do people like?"},{"from":1970.23,"to":1973.29,"location":2,"content":"Well, they get star ratings at the bottom from people and"},{"from":1973.29,"to":1976.61,"location":2,"content":"then you can try and fit to that as your supervision."},{"from":1976.61,"to":1981.03,"location":2,"content":"Um, sometimes people have data from an existing project for a company."},{"from":1981.03,"to":1982.63,"location":2,"content":"You can use that."},{"from":1982.63,"to":1985.33,"location":2,"content":"But nevertheless for most people, um,"},{"from":1985.33,"to":1988.13,"location":2,"content":"given that classes are short and things like that,"},{"from":1988.13,"to":1990.53,"location":2,"content":"the practical thing to do is use"},{"from":1990.53,"to":1995.19,"location":2,"content":"an existing curated dataset that's been built by previous researchers."},{"from":1995.19,"to":2000.12,"location":2,"content":"That normally gives you a fast start and lets you get to work building models, um,"},{"from":2000.12,"to":2001.93,"location":2,"content":"there's obvious prior work,"},{"from":2001.93,"to":2004.63,"location":2,"content":"there are baselines and previous systems"},{"from":2004.63,"to":2008.25,"location":2,"content":"that you can compare your performance on, et cetera."},{"from":2008.25,"to":2012.04,"location":2,"content":"Okay. 
Um, so, where can you find data?"},{"from":2012.04,"to":2015.14,"location":2,"content":"I'll just mention a couple of places here and there are lots more."},{"from":2015.14,"to":2017.47,"location":2,"content":"So, traditionally the biggest source of"},{"from":2017.47,"to":2020.54,"location":2,"content":"linguistic data used by academics was this place called"},{"from":2020.54,"to":2023.42,"location":2,"content":"the Linguistic Data Consortium and they have lots of"},{"from":2023.42,"to":2026.96,"location":2,"content":"datasets for treebanks and named entities and coreference,"},{"from":2026.96,"to":2028.98,"location":2,"content":"parallel machine, translation data,"},{"from":2028.98,"to":2030.4,"location":2,"content":"et cetera, et cetera."},{"from":2030.4,"to":2035.31,"location":2,"content":"And so, um, the Linguistic Data Consortium licenses their data,"},{"from":2035.31,"to":2039.11,"location":2,"content":"Stanford pays that license so you can use any of it."},{"from":2039.11,"to":2041.5,"location":2,"content":"Um, but if you want to use it, um,"},{"from":2041.5,"to":2045.36,"location":2,"content":"you go to that, um, linguistics.stanford.edu page."},{"from":2045.36,"to":2048.32,"location":2,"content":"And there's a sign-up, um, ah,"},{"from":2048.32,"to":2052.49,"location":2,"content":"piece on how to sign up where you basically, um, say,"},{"from":2052.49,"to":2054.2,"location":2,"content":"\"I will use this data only for"},{"from":2054.2,"to":2057.94,"location":2,"content":"good Stanford purposes and not as the basis of my startup.\""},{"from":2057.94,"to":2061.07,"location":2,"content":"And, um, then you can have access to that data"},{"from":2061.07,"to":2064.78,"location":2,"content":"and it can be made available by NFS or otherwise."},{"from":2064.78,"to":2067.34,"location":2,"content":"Um, but as time has gone by,"},{"from":2067.34,"to":2072.28,"location":2,"content":"there's a ton of curated NLP data that's available on various 
websites."},{"from":2072.28,"to":2074.61,"location":2,"content":"In fact, if anything the problem is it's just sort of"},{"from":2074.61,"to":2077.99,"location":2,"content":"spread over the web and that's sort of hard to find different things."},{"from":2077.99,"to":2082.31,"location":2,"content":"But there are some, some sites that have a lot of data for various purposes."},{"from":2082.31,"to":2085.97,"location":2,"content":"So, anything related to machine translation or just parallel,"},{"from":2085.97,"to":2087.97,"location":2,"content":"um, data across different languages."},{"from":2087.97,"to":2092.68,"location":2,"content":"The statistical MT statmt.org site has a great amount of"},{"from":2092.68,"to":2097.43,"location":2,"content":"data and that organization runs shared tasks every year,"},{"from":2097.43,"to":2099.32,"location":2,"content":"the Workshop on Machine Translation,"},{"from":2099.32,"to":2103.36,"location":2,"content":"WMT which Abby already mentioned in her class."},{"from":2103.36,"to":2105.28,"location":2,"content":"And they've got datasets that we use for"},{"from":2105.28,"to":2108.21,"location":2,"content":"those tasks and then there are leaderboards for those tasks."},{"from":2108.21,"to":2110.41,"location":2,"content":"And you can find data for that."},{"from":2110.41,"to":2113.9,"location":2,"content":"Um, if you thought dependency parsing was cool, um,"},{"from":2113.9,"to":2118.7,"location":2,"content":"there's the Universal Dependencies site which has parallel, not parallel site,"},{"from":2118.7,"to":2121.72,"location":2,"content":"which has treebanks in the same annotation scheme for"},{"from":2121.72,"to":2124.3,"location":2,"content":"about 60 different languages and you can work on"},{"from":2124.3,"to":2127.8,"location":2,"content":"parsers for different languages and things like that."},{"from":2127.8,"to":2131.33,"location":2,"content":"Um, I'm not gonna bore you with going through all of them but, you 
know,"},{"from":2131.33,"to":2133.84,"location":2,"content":"there are just tons and tons of other datasets that"},{"from":2133.84,"to":2137.68,"location":2,"content":"Facebook has released datasets, Google's released datasets,"},{"from":2137.68,"to":2141.38,"location":2,"content":"I said Stanford have released several other datasets including"},{"from":2141.38,"to":2145.23,"location":2,"content":"the Stanford Sentiment Treebank and the Stanford Na- Natural Language, um,"},{"from":2145.23,"to":2148.78,"location":2,"content":"Inference corpus, uh, new question-answering datasets and"},{"from":2148.78,"to":2152.98,"location":2,"content":"including HotPotQA and conversational question answering."},{"from":2152.98,"to":2156.18,"location":2,"content":"Other groups at different universities have released datasets."},{"from":2156.18,"to":2157.66,"location":2,"content":"There are just tons of them."},{"from":2157.66,"to":2162.95,"location":2,"content":"You can find data on sites like Kaggle where it has machine-learning competitions."},{"from":2162.95,"to":2166.02,"location":2,"content":"There are sites with lists of datasets."},{"from":2166.02,"to":2169.86,"location":2,"content":"You can look at research papers and see what datasets they used."},{"from":2169.86,"to":2172.7,"location":2,"content":"And of course, you can ask the course staff or on Piazza"},{"from":2172.7,"to":2176.3,"location":2,"content":"to try and find suitable datasets for a project."},{"from":2176.3,"to":2179.57,"location":2,"content":"Okay. Um, so that's a fair bit about"},{"from":2179.57,"to":2183.18,"location":2,"content":"the projects that I've got a bit more to say later about doing projects."},{"from":2183.18,"to":2188.64,"location":2,"content":"Does anyone have any questions up until now on projects?"},{"from":2188.64,"to":2194.18,"location":2,"content":"Okay. 
Um, well, so now we're gonna sort of, um,"},{"from":2194.18,"to":2199.2,"location":2,"content":"flip a switch in our brains and go back and have one more look,"},{"from":2199.2,"to":2202.11,"location":2,"content":"um, at gated recurrent units,"},{"from":2202.11,"to":2205.49,"location":2,"content":"um, and what happens and what they mean."},{"from":2205.49,"to":2207.24,"location":2,"content":"Um, and, you know,"},{"from":2207.24,"to":2208.72,"location":2,"content":"this is sort of,"},{"from":2208.72,"to":2211.57,"location":2,"content":"sort of the same material that Abby presented,"},{"from":2211.57,"to":2214.07,"location":2,"content":"presented a little bit differently but, you know,"},{"from":2214.07,"to":2217.13,"location":2,"content":"I hope it might just sort of give one more way of"},{"from":2217.13,"to":2220.52,"location":2,"content":"sort of thinking a bit about what's happening about"},{"from":2220.52,"to":2223.8,"location":2,"content":"these gated recurrent units and why they might be doing"},{"from":2223.8,"to":2227.45,"location":2,"content":"something useful and what are the alternatives to them."},{"from":2227.45,"to":2231.64,"location":2,"content":"So, if you remember the problem we started with is that we"},{"from":2231.64,"to":2236.53,"location":2,"content":"wanted to understand sort of derivatives backward in time."},{"from":2236.53,"to":2238.27,"location":2,"content":"And so, the idea of that is well,"},{"from":2238.27,"to":2242.06,"location":2,"content":"if we twiddle this a little bit at time T,"},{"from":2242.06,"to":2247.24,"location":2,"content":"how much effect is that going to have so we make some adjustment here."},{"from":2247.24,"to":2252.05,"location":2,"content":"How much effect is that going to have n time steps later?"},{"from":2252.05,"to":2258.21,"location":2,"content":"Um, and well, we sort of looked at the derivatives and we sort of saw we got these,"},{"from":2258.21,"to":2261.9,"location":2,"content":"um, terms for each 
successive time step."},{"from":2261.9,"to":2268.7,"location":2,"content":"And so as Abby discussed the problem is that for the derivatives that we got,"},{"from":2268.7,"to":2272.22,"location":2,"content":"we kind of got this matrix form for each time step."},{"from":2272.22,"to":2275.16,"location":2,"content":"And so that if we're going through a lot of time steps,"},{"from":2275.16,"to":2280.59,"location":2,"content":"we got a lot of matrix multiplies and as the result of those matrix multiplies,"},{"from":2280.59,"to":2283.28,"location":2,"content":"pretty much either things disappeared down to"},{"from":2283.28,"to":2287.28,"location":2,"content":"zero or exploded upward depending on what was in the matrix."},{"from":2287.28,"to":2290.24,"location":2,"content":"And so that- and so that's sort of means we,"},{"from":2290.24,"to":2291.59,"location":2,"content":"When the gradient goes to zero,"},{"from":2291.59,"to":2294.87,"location":2,"content":"we kind of can't know what's happening there."},{"from":2294.87,"to":2298.63,"location":2,"content":"Whether there isn't any conditioning or just we can't measure it."},{"from":2298.63,"to":2303.03,"location":2,"content":"And so that's sort of made people think that maybe this naive, um,"},{"from":2303.03,"to":2309.35,"location":2,"content":"recurrent neural network transition function just isn't a good one to use."},{"from":2309.35,"to":2313.76,"location":2,"content":"And that sort of leads into these ideas of gated recurrent units."},{"from":2313.76,"to":2315.93,"location":2,"content":"Right? 
Because if we have"},{"from":2315.93,"to":2319.24,"location":2,"content":"the simple recurrent neural network where we're"},{"from":2319.24,"to":2322.8,"location":2,"content":"sort of feeding forward for each step in time."},{"from":2322.8,"to":2325.52,"location":2,"content":"Well, what happens is when we backpropagate."},{"from":2325.52,"to":2326.95,"location":2,"content":"We have to backpropagate through"},{"from":2326.95,"to":2332.23,"location":2,"content":"every intermediate node and that's where we sort of have our gradients disappear."},{"from":2332.23,"to":2337.19,"location":2,"content":"And so an idea of how you could fix that is to say well,"},{"from":2337.19,"to":2343.13,"location":2,"content":"suppose we just put in direct connections that were longer distance, um,"},{"from":2343.13,"to":2347.22,"location":2,"content":"then we'd also get direct backpropagation signal"},{"from":2347.22,"to":2351.86,"location":2,"content":"and so then we wouldn't have this same problem of vanishing gradients."},{"from":2351.86,"to":2357.13,"location":2,"content":"And effectively, we've sort of looked at two ways in which you can achieve that effect."},{"from":2357.13,"to":2361.24,"location":2,"content":"Because one way of you can achieve that effect which Abby looked at"},{"from":2361.24,"to":2365.45,"location":2,"content":"in the end part of the last lecture was this idea of attention."},{"from":2365.45,"to":2367.45,"location":2,"content":"So, when you've got attention,"},{"from":2367.45,"to":2371.89,"location":2,"content":"you're actually are creating these shortcut connections,"},{"from":2371.89,"to":2373.77,"location":2,"content":"oops, they're the blue ones, um,"},{"from":2373.77,"to":2378.87,"location":2,"content":"from every time step and using it to calculate an attention distribution."},{"from":2378.87,"to":2381.32,"location":2,"content":"But the way the attention was done that we looked at,"},{"from":2381.32,"to":2386.13,"location":2,"content":"it was sort of 
mushing together all previous time steps into some kind of an average."},{"from":2386.13,"to":2390.55,"location":2,"content":"But the idea of the gated recurrent units is in some sense we want to"},{"from":2390.55,"to":2395.76,"location":2,"content":"achieve this same kind of ability to have shortcut connections."},{"from":2395.76,"to":2397.95,"location":2,"content":"But we want to do it in"},{"from":2397.95,"to":2404.41,"location":2,"content":"a more controlled and adaptive fashion where we still do remember the position of things."},{"from":2404.41,"to":2408.97,"location":2,"content":"So, how can we create an adaptive shortcut connection?"},{"from":2408.97,"to":2410.77,"location":2,"content":"And so that's, um,"},{"from":2410.77,"to":2417.58,"location":2,"content":"what we start to do with the gates that are put into a gated recurrent network."},{"from":2417.58,"to":2422.36,"location":2,"content":"So, if- so first off we sort of say let's have"},{"from":2422.36,"to":2426.22,"location":2,"content":"a candidate update which is exactly the same"},{"from":2426.22,"to":2430.39,"location":2,"content":"as the one that's used in a simple recurrent neural network."},{"from":2430.39,"to":2434.28,"location":2,"content":"But what we can do is add a gate."},{"from":2434.28,"to":2437.89,"location":2,"content":"And so, the gate will calculate a value from zero to one."},{"from":2437.89,"to":2441.59,"location":2,"content":"And so what we're going to do here is mix together"},{"from":2441.59,"to":2446.21,"location":2,"content":"using our candidate update which is just like"},{"from":2446.21,"to":2451.72,"location":2,"content":"a simple recurrent neural network which will be then mixed together with simply"},{"from":2451.72,"to":2457.84,"location":2,"content":"directly carrying forward the hidden state from the previous time step."},{"from":2457.84,"to":2462.78,"location":2,"content":"So, once we're doing that we are sort of then 
adaptively-"},{"from":2462.78,"to":2469.99,"location":2,"content":"we're adaptively partly using a computation from one time step back,"},{"from":2469.99,"to":2473.08,"location":2,"content":"um, done as a recurrent neural network."},{"from":2473.08,"to":2476.98,"location":2,"content":"And we're partly just inheriting the,"},{"from":2476.98,"to":2479.54,"location":2,"content":"we're just part- sorry, we're partly inheriting"},{"from":2479.54,"to":2482.26,"location":2,"content":"the hidden state from the previous time step."},{"from":2482.26,"to":2486.24,"location":2,"content":"So, it's sort of like a shortcut connection but we're weighting as to"},{"from":2486.24,"to":2490.84,"location":2,"content":"how much we're shortcutting and how much we're doing our computation."},{"from":2490.84,"to":2498.75,"location":2,"content":"And we control that adaptive choice by using a calculation to set the gate."},{"from":2498.75,"to":2501.07,"location":2,"content":"And we do that with a sigmoid, um,"},{"from":2501.07,"to":2506.54,"location":2,"content":"computed over the input and the hidden- previous hidden state and using, again,"},{"from":2506.54,"to":2510.78,"location":2,"content":"an equation kind of like a simple recurrent neural network."},{"from":2510.78,"to":2513.93,"location":2,"content":"Okay. 
Um, but, you know,"},{"from":2513.93,"to":2517.72,"location":2,"content":"if you wanted to go a bit further than that,"},{"from":2517.72,"to":2520.38,"location":2,"content":"um, you could think well,"},{"from":2520.38,"to":2525.82,"location":2,"content":"maybe sometimes we sort of might actually"},{"from":2525.82,"to":2531.43,"location":2,"content":"just want to get rid of the stuff that was in the past."},{"from":2531.43,"to":2535.47,"location":2,"content":"That maybe the stuff in the past sometimes becomes irrelevant, like,"},{"from":2535.47,"to":2538.29,"location":2,"content":"maybe sometimes we start a new sentence or a new"},{"from":2538.29,"to":2541.91,"location":2,"content":"thought and we just want to get rid of the stuff that's in the past."},{"from":2541.91,"to":2545.7,"location":2,"content":"And so, that can lead into this idea of having a second gate,"},{"from":2545.7,"to":2551.36,"location":2,"content":"a reset gate and so the reset gate calculates a value from 0 to 1, um,"},{"from":2551.36,"to":2553.07,"location":2,"content":"just like the other gates,"},{"from":2553.07,"to":2558.66,"location":2,"content":"and then we're doing this element wise dot-product between"},{"from":2558.66,"to":2564.43,"location":2,"content":"the reset gate and the previous hidden state and that's then sort of saying well,"},{"from":2564.43,"to":2567.9,"location":2,"content":"maybe we want to keep some parts of what was stored"},{"from":2567.9,"to":2572.36,"location":2,"content":"previously and some parts that we now want to throw away."},{"from":2572.36,"to":2576.15,"location":2,"content":"And so we put that into the model as a second gate."},{"from":2576.15,"to":2581.01,"location":2,"content":"Um, and so an interesting way to think about that is to sort of think"},{"from":2581.01,"to":2585.54,"location":2,"content":"about this as if this recurrent neural network is like"},{"from":2585.54,"to":2590.13,"location":2,"content":"a little tiny computer as the kind of little tiny 
computers you"},{"from":2590.13,"to":2595.03,"location":2,"content":"might do in a sort of simple architecture class and if you think about it that way,"},{"from":2595.03,"to":2600.3,"location":2,"content":"um, for the basic simple recurrent neural network"},{"from":2600.3,"to":2605.47,"location":2,"content":"the way the tiny computer works is that you've got a bank of registers h,"},{"from":2605.47,"to":2610.03,"location":2,"content":"your hidden state, and at each time step you have to"},{"from":2610.03,"to":2617.91,"location":2,"content":"read- whoops, at each time step you have to read the entirety of your bank of registers,"},{"from":2617.91,"to":2621,"location":2,"content":"you do some computation and then you write"},{"from":2621,"to":2624.6,"location":2,"content":"the entirety of your bank of registers and, you know,"},{"from":2624.6,"to":2627.96,"location":2,"content":"if in terms of thinking about computer architecture,"},{"from":2627.96,"to":2632.19,"location":2,"content":"that sounds like a pretty bad way to implement a simple computer."},{"from":2632.19,"to":2637.55,"location":2,"content":"Um, so precisely what a gated recurrent unit is doing is saying,"},{"from":2637.55,"to":2641.96,"location":2,"content":"\"Well, maybe we can have a slightly more sophisticated little baby computer.\""},{"from":2641.96,"to":2648.09,"location":2,"content":"Instead of that, we could select a subset of the registers that we want to read."},{"from":2648.09,"to":2651.17,"location":2,"content":"And so, the reset gate can control that because it can say,"},{"from":2651.17,"to":2653.72,"location":2,"content":"\"We'll just ignore a bunch of the other registers.\""},{"from":2653.72,"to":2660.78,"location":2,"content":"Um, it then will compute a new value based on just these, um,"},{"from":2660.78,"to":2667.22,"location":2,"content":"stored registers and then the update gate which is also adaptive can say, \"Well,"},{"from":2667.22,"to":2669.3,"location":2,"content":"I want you 
to write"},{"from":2669.3,"to":2674.58,"location":2,"content":"some registers but the rest of the registers will just keep their previous value.\""},{"from":2674.58,"to":2677.49,"location":2,"content":"That seems a useful idea to have in a computer."},{"from":2677.49,"to":2679.68,"location":2,"content":"And so, that's what we're doing here."},{"from":2679.68,"to":2682.71,"location":2,"content":"And so, this model here is, um,"},{"from":2682.71,"to":2689.11,"location":2,"content":"what was- Abby presented second as the gated recurrent unit."},{"from":2689.11,"to":2693.39,"location":2,"content":"So, this is sort of a much more realistic model"},{"from":2693.39,"to":2697.51,"location":2,"content":"and it sort of in some sense overlaps with the ideas of attention."},{"from":2697.51,"to":2703.24,"location":2,"content":"Okay. Um, so gated recurrent units are actually a quite new model."},{"from":2703.24,"to":2707.97,"location":2,"content":"Um, the model that was done way earlier and has had huge impact"},{"from":2707.97,"to":2713.34,"location":2,"content":"is these LSTM long short-term memory units and they are a bit more complex."},{"from":2713.34,"to":2715.03,"location":2,"content":"Um, but, you know,"},{"from":2715.03,"to":2717.69,"location":2,"content":"a lot of it is sort of the same, right?"},{"from":2717.69,"to":2720.21,"location":2,"content":"So, the hidden state of"},{"from":2720.21,"to":2725.04,"location":2,"content":"a gated recurrent unit is kind of equivalent to the cell of the LSTM."},{"from":2725.04,"to":2729.99,"location":2,"content":"So, both of them are using the same idea of summing together,"},{"from":2729.99,"to":2734.46,"location":2,"content":"a mixture of just directly interpret- directly inheriting"},{"from":2734.46,"to":2739.14,"location":2,"content":"what you had from the previous time step together with, um,"},{"from":2739.14,"to":2743.79,"location":2,"content":"something that you've calculated for the current time step and the way you 
count-"},{"from":2743.79,"to":2749.55,"location":2,"content":"calculate it for the current time step is exactly the same in both cases."},{"from":2749.55,"to":2753.38,"location":2,"content":"Whoops, sorry. Both cases again that you're calculating"},{"from":2753.38,"to":2758.13,"location":2,"content":"the current update using this sort of simple RNN equation."},{"from":2758.13,"to":2760.56,"location":2,"content":"So, those parts are exactly the same."},{"from":2760.56,"to":2764.31,"location":2,"content":"Um, but the LSTM is a little bit more complicated."},{"from":2764.31,"to":2767.31,"location":2,"content":"It now has three gates, um,"},{"from":2767.31,"to":2768.8,"location":2,"content":"and it's got this extra, um,"},{"from":2768.8,"to":2772.5,"location":2,"content":"hidden state that's then worked out with a bit more complexity."},{"from":2772.5,"to":2777.17,"location":2,"content":"So, in terms of my LSTM picture, you know,"},{"from":2777.17,"to":2782.36,"location":2,"content":"the LSTM picture looks, if you sort of pull apart all of its math, pretty"},{"from":2782.36,"to":2789.99,"location":2,"content":"complex, but so there are three gates so that you can forget or ignore everything."},{"from":2789.99,"to":2792.03,"location":2,"content":"So, you can forget or ignore the input,"},{"from":2792.03,"to":2793.89,"location":2,"content":"you can forget or ignore parts of"},{"from":2793.89,"to":2798.75,"location":2,"content":"your previous hidden state and you can forget or ignore parts of the cell"},{"from":2798.75,"to":2802.07,"location":2,"content":"when calculating the output and each of these"},{"from":2802.07,"to":2806.14,"location":2,"content":"is produce- when I say forget or ignore parts of,"},{"from":2806.14,"to":2810.63,"location":2,"content":"what that's meaning is you're calculating a vector which is then going to be element-wise"},{"from":2810.63,"to":2816.07,"location":2,"content":"multiplied by the input, the previous hidden state, or the 
cell."},{"from":2816.07,"to":2819.27,"location":2,"content":"And so, that's why you have this effect of now an addressable bank of"},{"from":2819.27,"to":2823.34,"location":2,"content":"registers where you can use some of them but not others of them."},{"from":2823.34,"to":2826.78,"location":2,"content":"Okay. So, the bottom part of the LSTM is just"},{"from":2826.78,"to":2830.4,"location":2,"content":"like a simple recurrent neural network,"},{"from":2830.4,"to":2832.82,"location":2,"content":"um, which then calculates,"},{"from":2832.82,"to":2835.13,"location":2,"content":"um, a candidate update."},{"from":2835.13,"to":2841.29,"location":2,"content":"And so, for both of the GRU and the LSTM the real secret is"},{"from":2841.29,"to":2844.14,"location":2,"content":"that rather than just keeping on multiplying"},{"from":2844.14,"to":2848.03,"location":2,"content":"stuff what you do is you add two things together."},{"from":2848.03,"to":2852.12,"location":2,"content":"Um, and so this adding is why you don't"},{"from":2852.12,"to":2856.05,"location":2,"content":"get the same vanishing gradient evil effects because you're calculating a"},{"from":2856.05,"to":2859.32,"location":2,"content":"new candidate update and you're adding it to stuff that was"},{"from":2859.32,"to":2862.66,"location":2,"content":"previously in the cell and that gives you"},{"from":2862.66,"to":2866.19,"location":2,"content":"a simple gradient when you backpropagate that- that you have"},{"from":2866.19,"to":2872.74,"location":2,"content":"a direct linear connection between the cell at time t and the cell at time t minus one."},{"from":2872.74,"to":2876.24,"location":2,"content":"And so, really that simple addition there is sort of"},{"from":2876.24,"to":2880.35,"location":2,"content":"the secret of most of the power of LSTMs and"},{"from":2880.35,"to":2884.01,"location":2,"content":"this same idea of adding two things together has also been 
a"},{"from":2884.01,"to":2888.11,"location":2,"content":"secret of many of the other advances in deep learning recently."},{"from":2888.11,"to":2892.45,"location":2,"content":"So, in vision, in the last couple of years the sort of standard model"},{"from":2892.45,"to":2897.06,"location":2,"content":"that everybody uses is ResNets, residual networks and they use"},{"from":2897.06,"to":2903,"location":2,"content":"exactly the same secret of allowing these adaptive updates where you add"},{"from":2903,"to":2910.68,"location":2,"content":"together a current layer's value with directly inheriting a value from the layer below."},{"from":2910.68,"to":2915.06,"location":2,"content":"Um, other things that use similar ideas are things like highway networks and so on."},{"from":2915.06,"to":2919.05,"location":2,"content":"So, that's proven to be an extremely powerful idea."},{"from":2919.05,"to":2922.44,"location":2,"content":"Um, the LSTM is slightly different from"},{"from":2922.44,"to":2926.51,"location":2,"content":"the GRU because when we look back at its equations"},{"from":2926.51,"to":2933.99,"location":2,"content":"that the- the GRU kind of does a linear mixture where you have one gate value,"},{"from":2933.99,"to":2937.55,"location":2,"content":"u_t, and one minus u_t,"},{"from":2937.55,"to":2942.87,"location":2,"content":"where the LSTM adds values controlled by two different gates,"},{"from":2942.87,"to":2945.61,"location":2,"content":"a forget gate, and an input gate."},{"from":2945.61,"to":2949.29,"location":2,"content":"Theoretically, having the adding of"},{"from":2949.29,"to":2953.94,"location":2,"content":"two separate gates rather than a mixture is theoretically more powerful."},{"from":2953.94,"to":2956.55,"location":2,"content":"Um, depending on the application,"},{"from":2956.55,"to":2959.37,"location":2,"content":"sometimes it doesn't seem to make much difference, um,"},{"from":2959.37,"to":2963.48,"location":2,"content":"but there's definitely a 
theoretical advantage to the LSTM there."},{"from":2963.48,"to":2971.07,"location":2,"content":"Okay. Um, just, I hope that's maybe a little bit more helpful to have seen those again,"},{"from":2971.07,"to":2977.97,"location":2,"content":"um, any questions on gated recurrent units?"},{"from":2977.97,"to":2982.65,"location":2,"content":"Still look confusing?"},{"from":2982.65,"to":2988.45,"location":2,"content":"I think it's useful to have some kind of idea as to why people come up with"},{"from":2988.45,"to":2993.67,"location":2,"content":"these things and why do they make sense but,"},{"from":2993.67,"to":2998.68,"location":2,"content":"you know, nevertheless, the reality is in the sort of era of"},{"from":2998.68,"to":3003.75,"location":2,"content":"2015 plus any deep learning package you use whether it's PyTorch,"},{"from":3003.75,"to":3005.94,"location":2,"content":"TensorFlow, MXNet whatever, you know,"},{"from":3005.94,"to":3011.25,"location":2,"content":"it just comes with LSTMs and GRUs and you don't have to program your own."},{"from":3011.25,"to":3013.17,"location":2,"content":"In fact, you're at a disadvantage if you"},{"from":3013.17,"to":3016.02,"location":2,"content":"program your own because if you are using the built-in one,"},{"from":3016.02,"to":3019.07,"location":2,"content":"it's using an efficient CUDA kernel from"},{"from":3019.07,"to":3023.91,"location":2,"content":"Nvidia whereas your custom-built one won't and will run three times slower."},{"from":3023.91,"to":3026.91,"location":2,"content":"Um, so, you know, essentially you don't have to know how to do it,"},{"from":3026.91,"to":3030.53,"location":2,"content":"you can just take the attitude that an LSTM is just like"},{"from":3030.53,"to":3035.34,"location":2,"content":"a fancy recurrent network which will be easier to train and that's true."},{"from":3035.34,"to":3039.62,"location":2,"content":"Um, but you know, these kind of architectural ideas have actually 
been"},{"from":3039.62,"to":3045.42,"location":2,"content":"central to most of the big advances that have come in deep learning in the last couple of years,"},{"from":3045.42,"to":3047.64,"location":2,"content":"so it's actually good to have an idea,"},{"from":3047.64,"to":3049.92,"location":2,"content":"to have some sense of what were"},{"from":3049.92,"to":3053.68,"location":2,"content":"these important ideas that made everything so much better because they're"},{"from":3053.68,"to":3056.85,"location":2,"content":"the same kind of component building blocks you might also want"},{"from":3056.85,"to":3062.12,"location":2,"content":"to use in custom models that you design for yourself."},{"from":3062.12,"to":3066.84,"location":2,"content":"Okay, two bits of machine translation."},{"from":3066.84,"to":3071.25,"location":2,"content":"Um, so a bit of machine translation that we"},{"from":3071.25,"to":3075.72,"location":2,"content":"sort of didn't cover last week but lots of people have been seeing"},{"from":3075.72,"to":3079.92,"location":2,"content":"and getting confused by in the assignments so I thought I'd explain"},{"from":3079.92,"to":3084.21,"location":2,"content":"a bit about is UNKs and explain where do UNKs"},{"from":3084.21,"to":3088.41,"location":2,"content":"come from and why are there UNKs and the reason why"},{"from":3088.41,"to":3093.07,"location":2,"content":"there are UNKs is effectively kind of for efficiency reasons."},{"from":3093.07,"to":3099.7,"location":2,"content":"So, if you sort of think about producing output in a neural machine translation system"},{"from":3099.7,"to":3103.17,"location":2,"content":"and really this is the same as producing output"},{"from":3103.17,"to":3106.68,"location":2,"content":"in any natural, neural natural language generation system,"},{"from":3106.68,"to":3109.78,"location":2,"content":"so that's really the same for neural language model, um,"},{"from":3109.78,"to":3116.97,"location":2,"content":"that if you have 
a very large output vocabulary, it's just an expensive operation."},{"from":3116.97,"to":3124.85,"location":2,"content":"So you have a big matrix of softmax parameters where you have a row for every word, um,"},{"from":3124.85,"to":3132.42,"location":2,"content":"and then you have what,"},{"from":3132.42,"to":3135.33,"location":2,"content":"[NOISE] then we have an animation that is not working for me."},{"from":3135.33,"to":3138.21,"location":2,"content":"Oh, all right there, there we go."},{"from":3138.21,"to":3141.03,"location":2,"content":"Um, so then we have some hidden state that we've"},{"from":3141.03,"to":3145.34,"location":2,"content":"calculated in our recurrent neural network."},{"from":3145.34,"to":3149.99,"location":2,"content":"And so, what we're gonna do is sort of multiply, um,"},{"from":3149.99,"to":3153.11,"location":2,"content":"that vector by every row of the matrix,"},{"from":3153.11,"to":3159.03,"location":2,"content":"put it through a softmax and then get probabilities for outputting every word."},{"from":3159.03,"to":3160.77,"location":2,"content":"Um, and you know,"},{"from":3160.77,"to":3164.04,"location":2,"content":"this seems pretty simple but the problem is that"},{"from":3164.04,"to":3167.4,"location":2,"content":"to the extent that you have a humongous vocabulary here,"},{"from":3167.4,"to":3171.24,"location":2,"content":"you just have to do a humongous number of rows"},{"from":3171.24,"to":3175.18,"location":2,"content":"of this multiplication and it actually turns out that"},{"from":3175.18,"to":3179.03,"location":2,"content":"doing this is the expensive part of"},{"from":3179.03,"to":3183.6,"location":2,"content":"having a neural machine translation or neural language model system, right?"},{"from":3183.6,"to":3187.38,"location":2,"content":"The LSTM might look complicated and hard to understand, but you know,"},{"from":3187.38,"to":3191.94,"location":2,"content":"it's relatively small vectors that you multiply or dot-product 
once,"},{"from":3191.94,"to":3196.02,"location":2,"content":"and it's not that much work whereas if you have a huge number of words,"},{"from":3196.02,"to":3197.43,"location":2,"content":"this is a huge amount of work."},{"from":3197.43,"to":3202.56,"location":2,"content":"So, just for instance sort of for the pion- pioneering sequence to sequence,"},{"from":3202.56,"to":3206.36,"location":2,"content":"um, neural machine translation system that Google first did,"},{"from":3206.36,"to":3210.84,"location":2,"content":"they ran it on an eight GPU machine because they have lots of GPUs but"},{"from":3210.84,"to":3216.07,"location":2,"content":"the way they set it up to maximize performance was of those eight GPUs,"},{"from":3216.07,"to":3218.49,"location":2,"content":"three of them were running"},{"from":3218.49,"to":3224.07,"location":2,"content":"a deep multi-layer neural sequence model and the other five GPUs,"},{"from":3224.07,"to":3227.97,"location":2,"content":"the only thing that they were doing was calculating softmaxes because that's"},{"from":3227.97,"to":3232.77,"location":2,"content":"actually the bulk of the computation that you need to be able to do."},{"from":3232.77,"to":3236.85,"location":2,"content":"Um, so the simplest way to make this, um,"},{"from":3236.85,"to":3241.56,"location":2,"content":"computation not completely excessive is to say,"},{"from":3241.56,"to":3243.93,"location":2,"content":"\"Hey, I'll just limit the vocabulary.\""},{"from":3243.93,"to":3247.36,"location":2,"content":"Yeah I know that you can make"},{"from":3247.36,"to":3253.23,"location":2,"content":"a million different words in English and if you look at Spanish inflections of verbs,"},{"from":3253.23,"to":3256.24,"location":2,"content":"there are a lot of them and there's gonna be huge number of words, um,"},{"from":3256.24,"to":3260.22,"location":2,"content":"but maybe I can just make do with a modest vocabulary and it'll be near 
enough."},{"from":3260.22,"to":3262.3,"location":2,"content":"Surely 50,000 common words,"},{"from":3262.3,"to":3265.24,"location":2,"content":"I can cover a lot of stuff and so,"},{"from":3265.24,"to":3269.58,"location":2,"content":"that was sort of the starting off point of neural machine translation that you,"},{"from":3269.58,"to":3274.51,"location":2,"content":"people use the modest vocabulary like around 50,000 words."},{"from":3274.51,"to":3276.91,"location":2,"content":"And well, if you do that, um,"},{"from":3276.91,"to":3280.98,"location":2,"content":"well, then what happens is you have UNKs."},{"from":3280.98,"to":3283.26,"location":2,"content":"So UNK means, this is an unknown word,"},{"from":3283.26,"to":3287.32,"location":2,"content":"that's not in my vocabulary and so there are two kinds of UNKs,"},{"from":3287.32,"to":3291.32,"location":2,"content":"they can be UNKs in the source language and you know,"},{"from":3291.32,"to":3295.71,"location":2,"content":"they're sort of optional because, you know,"},{"from":3295.71,"to":3299.47,"location":2,"content":"it's not actually a problem having a large source language vocabulary,"},{"from":3299.47,"to":3302.07,"location":2,"content":"but the fact of the matter is if you've sort of trained"},{"from":3302.07,"to":3304.62,"location":2,"content":"a model on a certain amount of data,"},{"from":3304.62,"to":3306.72,"location":2,"content":"there are some words you aren't going to have seen,"},{"from":3306.72,"to":3309,"location":2,"content":"so you are going to have words that you just didn't"},{"from":3309,"to":3311.52,"location":2,"content":"see in your training data and you won't have"},{"from":3311.52,"to":3314.43,"location":2,"content":"any pre-trained or trained word vector"},{"from":3314.43,"to":3317.76,"location":2,"content":"for them and you can deal with that by either just treating them as UNK,"},{"from":3317.76,"to":3320.59,"location":2,"content":"so giving them a new word vector when you encounter 
them."},{"from":3320.59,"to":3324.57,"location":2,"content":"But the tricky part is on the translation that you're wanting to"},{"from":3324.57,"to":3328.72,"location":2,"content":"produce these rare words but they're not in your output vocabulary,"},{"from":3328.72,"to":3335.55,"location":2,"content":"so your system is producing UNK, UNK, UNK, which is not a very good translation really."},{"from":3335.55,"to":3339.72,"location":2,"content":"Um, yeah, and so that was sort of what the first,"},{"from":3339.72,"to":3344.22,"location":2,"content":"um, machine, neural machine translation systems, um, did."},{"from":3344.22,"to":3346.26,"location":2,"content":"And so, you know, obviously that's not"},{"from":3346.26,"to":3351.55,"location":2,"content":"a very satisfactory state of affairs and so there's been a whole bunch of work,"},{"from":3351.55,"to":3353.22,"location":2,"content":"um, as to how to deal with this,"},{"from":3353.22,"to":3360.47,"location":2,"content":"so you can use methods that allow you to deal with a larger output vocabulary,"},{"from":3360.47,"to":3363.78,"location":2,"content":"um, without the computation being excessive."},{"from":3363.78,"to":3367.78,"location":2,"content":"So one method of doing that is to have what's called a hierarchical softmax,"},{"from":3367.78,"to":3371.51,"location":2,"content":"so that rather than just having a huge matrix of words,"},{"from":3371.51,"to":3374.91,"location":2,"content":"you sort of have a tree structure in your vocabulary"},{"from":3374.91,"to":3378.48,"location":2,"content":"so you can do calculations with hierarchical,"},{"from":3378.48,"to":3382.82,"location":2,"content":"um, multiple small softmaxes and you can do that more quickly."},{"from":3382.82,"to":3385.62,"location":2,"content":"Um, I'm not gonna go through all these exam,"},{"from":3385.62,"to":3387.27,"location":2,"content":"all these things in detail now,"},{"from":3387.27,"to":3391.57,"location":2,"content":"I'm just sort of very 
quickly mentioning them and if anyone's interested, they can look."},{"from":3391.57,"to":3394.83,"location":2,"content":"People have used the noise-contrastive estimation idea that we"},{"from":3394.83,"to":3398.24,"location":2,"content":"saw with Word2vec in this context as well."},{"from":3398.24,"to":3402.66,"location":2,"content":"So this is a way to get much faster training which is important,"},{"from":3402.66,"to":3405.32,"location":2,"content":"it's not really a way to solve, um,"},{"from":3405.32,"to":3407.79,"location":2,"content":"speed at translation time but, you know,"},{"from":3407.79,"to":3410.58,"location":2,"content":"if this means you can train your system in six hours instead of"},{"from":3410.58,"to":3415.16,"location":2,"content":"six days that's a big win and so that's a good technique to use."},{"from":3415.16,"to":3420.33,"location":2,"content":"Um, people have done much smarter things, so really, um,"},{"from":3420.33,"to":3423.75,"location":2,"content":"the large vocabulary problem is basically solved"},{"from":3423.75,"to":3427.65,"location":2,"content":"now and so the kind of things that you can do is you can produce"},{"from":3427.65,"to":3431.97,"location":2,"content":"subsets of your vocabulary and train on particular subsets of"},{"from":3431.97,"to":3436.38,"location":2,"content":"vocabulary at a time and then when you're testing,"},{"from":3436.38,"to":3440.82,"location":2,"content":"you adaptively choose kind of a likely list of words that might"},{"from":3440.82,"to":3445.29,"location":2,"content":"appear in the translation of particular sentences or passages and then"},{"from":3445.29,"to":3448.2,"location":2,"content":"you can effectively work with sort of an appropriate subset of"},{"from":3448.2,"to":3452.85,"location":2,"content":"a vocabulary and that's sort of an efficient technique by which you can"},{"from":3452.85,"to":3456.33,"location":2,"content":"deal with an unlimited vocabulary but only be 
using"},{"from":3456.33,"to":3461.95,"location":2,"content":"a moderate sized softmax for any particular paragraph that you're translating,"},{"from":3461.95,"to":3464.79,"location":2,"content":"there's a paper that talks about that method."},{"from":3464.79,"to":3469.43,"location":2,"content":"Um, another idea is you can use attention when you do translation,"},{"from":3469.43,"to":3471.93,"location":2,"content":"the idea talked about at the end of last time."},{"from":3471.93,"to":3475.09,"location":2,"content":"So if you have attention, that sort of means that you can,"},{"from":3475.09,"to":3477.66,"location":2,"content":"you're pointing somewhere in the source and you"},{"from":3477.66,"to":3480.66,"location":2,"content":"know what you're translating at any point in time."},{"from":3480.66,"to":3485.07,"location":2,"content":"So, if that word is a rare word that's not in your vocabulary,"},{"from":3485.07,"to":3487.56,"location":2,"content":"there are things that you could do to deal with that."},{"from":3487.56,"to":3489.93,"location":2,"content":"I mean, firstly, if it's a rare word,"},{"from":3489.93,"to":3493.14,"location":2,"content":"its translation is much more likely to be constant,"},{"from":3493.14,"to":3497.47,"location":2,"content":"so you might just look it up in a dictionary or word list, um, and,"},{"from":3497.47,"to":3499.83,"location":2,"content":"um, stick in its translation,"},{"from":3499.83,"to":3502.49,"location":2,"content":"sometimes it's appropriate to do other things."},{"from":3502.49,"to":3504.45,"location":2,"content":"I mean, turns out that, you know,"},{"from":3504.45,"to":3509.68,"location":2,"content":"quite a lot of the things that are unknown words turn out to be other things like, you know,"},{"from":3509.68,"to":3512.97,"location":2,"content":"hexadecimal numbers, or FedEx tracking IDs,"},{"from":3512.97,"to":3515.68,"location":2,"content":"or GitHub SHAs, or things like 
that."},{"from":3515.68,"to":3517.02,"location":2,"content":"So for a lot of things like that,"},{"from":3517.02,"to":3519.39,"location":2,"content":"the right thing to do is just to copy them across."},{"from":3519.39,"to":3522.66,"location":2,"content":"And so, another thing that people have looked at is copying models,"},{"from":3522.66,"to":3525.22,"location":2,"content":"um, in machine translation."},{"from":3525.22,"to":3528.22,"location":2,"content":"Okay, um, there are more ideas that you can,"},{"from":3528.22,"to":3530.84,"location":2,"content":"we can get into to solve this and actually, um,"},{"from":3530.84,"to":3532.79,"location":2,"content":"next week we're gonna start dealing with"},{"from":3532.79,"to":3535.09,"location":2,"content":"some of the other ways that you could solve this, um,"},{"from":3535.09,"to":3539.41,"location":2,"content":"but I hope to have given you sort of a sense of,"},{"from":3539.41,"to":3541.8,"location":2,"content":"um, sort of what these UNKs are about,"},{"from":3541.8,"to":3543.64,"location":2,"content":"why you see them and, uh,"},{"from":3543.64,"to":3546.14,"location":2,"content":"that there are sort of some ways that you might"},{"from":3546.14,"to":3548.6,"location":2,"content":"deal with them but you're not expected to be doing that,"},{"from":3548.6,"to":3550.9,"location":2,"content":"um, for assignment four."},{"from":3550.9,"to":3556.68,"location":2,"content":"Okay, then I just wanted to give a teeny bit more on evaluation."},{"from":3556.68,"to":3559.51,"location":2,"content":"Um, so Abby said a little bit about"},{"from":3559.51,"to":3563.37,"location":2,"content":"evaluation with BLEU and that then comes up in the assignment,"},{"from":3563.37,"to":3566.13,"location":2,"content":"so I just thought I'd give you a little bit more context on"},{"from":3566.13,"to":3569.09,"location":2,"content":"that since there've been quite a few questions about 
it."},{"from":3569.09,"to":3573.05,"location":2,"content":"So, um, so the general context here is, you know,"},{"from":3573.05,"to":3578.89,"location":2,"content":"how do you evaluate machine translation quality and sort of to this day,"},{"from":3578.89,"to":3583.98,"location":2,"content":"if you wanted to do a first rate bang up evaluation of machine translation quality,"},{"from":3583.98,"to":3587.67,"location":2,"content":"the way you do it is you get human beings to assess quality,"},{"from":3587.67,"to":3590.84,"location":2,"content":"you take translations and you send them to"},{"from":3590.84,"to":3594.87,"location":2,"content":"human beings with good bilingual skills and get them to score things."},{"from":3594.87,"to":3597.26,"location":2,"content":"And there are two ways that are commonly used."},{"from":3597.26,"to":3599.55,"location":2,"content":"One is sort of rating on"},{"from":3599.55,"to":3604.29,"location":2,"content":"Likert scales for things like adequacy and fluency of translations,"},{"from":3604.29,"to":3609.03,"location":2,"content":"um, but another way that often works better is asking for comparative judgments."},{"from":3609.03,"to":3614.03,"location":2,"content":"So here are two translations of this sentence which is better, um."},{"from":3614.03,"to":3616.94,"location":2,"content":"And so that's, you know,"},{"from":3616.94,"to":3620.07,"location":2,"content":"sort of still our gold standard of translation."},{"from":3620.07,"to":3622.88,"location":2,"content":"Um, another way you can evaluate translation is"},{"from":3622.88,"to":3625.93,"location":2,"content":"use your translations in the downstream task."},{"from":3625.93,"to":3628.64,"location":2,"content":"So, you could say \"I'm gonna build"},{"from":3628.64,"to":3633.5,"location":2,"content":"a cross-lingual question answering system and inside that system I'm,"},{"from":3633.5,"to":3635.78,"location":2,"content":"gonna use machine 
translation."},{"from":3635.78,"to":3637.97,"location":2,"content":"I'm gonna translate the questions um,"},{"from":3637.97,"to":3640.63,"location":2,"content":"and then try and match them against the documents."},{"from":3640.63,"to":3645.83,"location":2,"content":"Um, and then my score will be how good my question answering system is,"},{"from":3645.83,"to":3648.8,"location":2,"content":"and so the machine translation system is better"},{"from":3648.8,"to":3652.19,"location":2,"content":"if my question-answering score um, goes up.\""},{"from":3652.19,"to":3657.24,"location":2,"content":"I mean, that's kind of a nice way to do things because you're kinda then taking an end run around needing,"},{"from":3657.24,"to":3660.11,"location":2,"content":"needing human beings, and yet you do have"},{"from":3660.11,"to":3663.49,"location":2,"content":"a clear numerical measure that's coming out the back end."},{"from":3663.49,"to":3666.55,"location":2,"content":"But it sort of has some catches because, you know,"},{"from":3666.55,"to":3669.98,"location":2,"content":"often there will be a fairly indirect connection between"},{"from":3669.98,"to":3674.09,"location":2,"content":"your end task and the quality of the machine translation,"},{"from":3674.09,"to":3676.64,"location":2,"content":"and it might turn out that there are certain aspects of"},{"from":3676.64,"to":3680.51,"location":2,"content":"the machine translation like whether you get agreement endings"},{"from":3680.51,"to":3682.97,"location":2,"content":"right on nouns and verbs or something"},{"from":3682.97,"to":3686.12,"location":2,"content":"that are actually just irrelevant to your performance in the task and so you're"},{"from":3686.12,"to":3689.64,"location":2,"content":"not assessing all aspects of um, quality."},{"from":3689.64,"to":3692.81,"location":2,"content":"Um, and so then the third way to do it is to come up with"},{"from":3692.81,"to":3695.84,"location":2,"content":"some way to score the direct 
tasks."},{"from":3695.84,"to":3700.41,"location":2,"content":"So, here, um, the direct task is machine translation,"},{"from":3700.41,"to":3704.45,"location":2,"content":"and this has been a valuable tool."},{"from":3704.45,"to":3707.3,"location":2,"content":"For, you know, really the last so"},{"from":3707.3,"to":3711.29,"location":2,"content":"25 years when people are doing machine learning models,"},{"from":3711.29,"to":3715.1,"location":2,"content":"because as soon as you have an automatic way to score things,"},{"from":3715.1,"to":3722.06,"location":2,"content":"you can then run automated experiments to say \"Let me try out these 50 different options."},{"from":3722.06,"to":3727.25,"location":2,"content":"Let me start varying these hyper-parameters and work out which way to do things is best.\""},{"from":3727.25,"to":3730.76,"location":2,"content":"And that importance has only grown in the deep learning era,"},{"from":3730.76,"to":3735.2,"location":2,"content":"when all the time what we want you to do is as Abby discussed, um,"},{"from":3735.2,"to":3738.14,"location":2,"content":"build end-to-end systems and then back"},{"from":3738.14,"to":3741.2,"location":2,"content":"propagate throughout the entire system to improve them,"},{"from":3741.2,"to":3742.91,"location":2,"content":"and we're doing that based on having"},{"from":3742.91,"to":3746.47,"location":2,"content":"some objective measure which is our automatic metric."},{"from":3746.47,"to":3749.41,"location":2,"content":"And so, that led into the development of"},{"from":3749.41,"to":3753.36,"location":2,"content":"automatic metrics to try and assess machine translation quality,"},{"from":3753.36,"to":3758.14,"location":2,"content":"and the most famous and still most used one is this one called BLEU."},{"from":3758.14,"to":3761.38,"location":2,"content":"And so, as Abby briefly mentioned,"},{"from":3761.38,"to":3764.9,"location":2,"content":"we have a reference translation done by human 
beings."},{"from":3764.9,"to":3769.79,"location":2,"content":"At some time a human being has to translate each piece of source material once,"},{"from":3769.79,"to":3773.18,"location":2,"content":"but then you take a machine translation and you"},{"from":3773.18,"to":3777.32,"location":2,"content":"score it based on the extent to which there"},{"from":3777.32,"to":3780.92,"location":2,"content":"are one or more word sequences that appear in"},{"from":3780.92,"to":3786.07,"location":2,"content":"the reference translation and also appear in the machine translation."},{"from":3786.07,"to":3792.53,"location":2,"content":"And so you are working out n-gram preci-precision scores for different values of n. So,"},{"from":3792.53,"to":3796.01,"location":2,"content":"the standard way of doing it is you do it for one grams,"},{"from":3796.01,"to":3798.56,"location":2,"content":"bigrams, trigrams, and four-grams."},{"from":3798.56,"to":3801.39,"location":2,"content":"So, word sequences of size one to four,"},{"from":3801.39,"to":3806.27,"location":2,"content":"and you try and find for ones of those in the machine translation,"},{"from":3806.27,"to":3811.76,"location":2,"content":"whether they also appear in the reference translation,"},{"from":3811.76,"to":3814.41,"location":2,"content":"and there are two tricks at work here."},{"from":3814.41,"to":3819.51,"location":2,"content":"Um, one trick is you have to do a kind of a bipartite matching um,"},{"from":3819.51,"to":3822.66,"location":2,"content":"because it just can't be that um,"},{"from":3822.66,"to":3825.18,"location":2,"content":"there's a word um,"},{"from":3825.18,"to":3829.55,"location":2,"content":"in the, in the reference translation somewhere."},{"from":3829.55,"to":3831.23,"location":2,"content":"Um, [NOISE] I don't know if there's."},{"from":3831.23,"to":3833.51,"location":2,"content":"I've got a good example here [NOISE]."},{"from":3833.51,"to":3837.77,"location":2,"content":"Um, maybe I can only do a silly 
example,"},{"from":3837.77,"to":3839.55,"location":2,"content":"but I'll do a silly example."},{"from":3839.55,"to":3843.32,"location":2,"content":"Um, that it's- it doesn't seem like you wanna say \"Okay."},{"from":3843.32,"to":3845.42,"location":2,"content":"Because there's a \"the\" in the reference,"},{"from":3845.42,"to":3848.97,"location":2,"content":"that means that this \"the\" is right and this \"the\" is right,"},{"from":3848.97,"to":3852.8,"location":2,"content":"and this \"the\" is right and every other \"the\" is also right.\""},{"from":3852.8,"to":3854.49,"location":2,"content":"That sort of seems unfair."},{"from":3854.49,"to":3860.82,"location":2,"content":"So, you're only allowed to use each thing in the reference once in matching n-grams,"},{"from":3860.82,"to":3864.14,"location":2,"content":"but you are allowed to use it multiple times for different order n-grams."},{"from":3864.14,"to":3866.57,"location":2,"content":"So, you can use it both in the uh unigram,"},{"from":3866.57,"to":3868.99,"location":2,"content":"bigram, trigram and 4-gram."},{"from":3868.99,"to":3872.27,"location":2,"content":"The other idea is that although you're measuring"},{"from":3872.27,"to":3877.2,"location":2,"content":"the precision of n-grams that are in the machine translation,"},{"from":3877.2,"to":3879.86,"location":2,"content":"you wouldn't want people to be able to cheat by"},{"from":3879.86,"to":3882.71,"location":2,"content":"putting almost nothing into the machine translation."},{"from":3882.71,"to":3887.45,"location":2,"content":"So, you might wanna game it by no matter what the source document is."},{"from":3887.45,"to":3889.52,"location":2,"content":"If the target language is English,"},{"from":3889.52,"to":3891.11,"location":2,"content":"you could just um say,"},{"from":3891.11,"to":3892.79,"location":2,"content":"\"My translation is the,"},{"from":3892.79,"to":3895.49,"location":2,"content":"because I'm pretty sure that will be 
in"},{"from":3895.49,"to":3899.32,"location":2,"content":"the reference translation somewhere and I'll get 0.3 unigram,"},{"from":3899.32,"to":3902.84,"location":2,"content":"and that's not great but I'll get something for that and I am done.\""},{"from":3902.84,"to":3904.88,"location":2,"content":"And so you wouldn't want that and so,"},{"from":3904.88,"to":3908.87,"location":2,"content":"you're then being penalized by something called the brevity penalty if"},{"from":3908.87,"to":3914.04,"location":2,"content":"your translation is shorter than the reference translation,"},{"from":3914.04,"to":3918.37,"location":2,"content":"and so this BLEU metric is um forming"},{"from":3918.37,"to":3924.28,"location":2,"content":"a geometric average of n-gram precision up to some n. Normally,"},{"from":3924.28,"to":3925.3,"location":2,"content":"it's sort of up to four,"},{"from":3925.3,"to":3926.49,"location":2,"content":"is how it's done."},{"from":3926.49,"to":3929,"location":2,"content":"Where it's a weighted geometric average,"},{"from":3929,"to":3932.41,"location":2,"content":"where you're putting weights on the different n-grams."},{"from":3932.41,"to":3935.87,"location":2,"content":"Um, for the assignment, we're only using unigrams and bigrams."},{"from":3935.87,"to":3939.45,"location":2,"content":"So, you could say that means we're putting a weight of zero on um,"},{"from":3939.45,"to":3942.65,"location":2,"content":"the trigrams and 4-grams."},{"from":3942.65,"to":3946.24,"location":2,"content":"Okay. 
Um, and so that's basically what we're doing."},{"from":3946.24,"to":3949.28,"location":2,"content":"I-I've just mentioned um couple of other things."},{"from":3949.28,"to":3951.84,"location":2,"content":"You might think that this is kind of random,"},{"from":3951.84,"to":3953.78,"location":2,"content":"and so people have um,"},{"from":3953.78,"to":3957.53,"location":2,"content":"used this idea of rather than just having one reference translation,"},{"from":3957.53,"to":3960.08,"location":2,"content":"we could have multiple reference translations,"},{"from":3960.08,"to":3962.72,"location":2,"content":"because that way we can allow for there being"},{"from":3962.72,"to":3965.54,"location":2,"content":"variation and good ways of translating things,"},{"from":3965.54,"to":3969.74,"location":2,"content":"because in language there's always lots of good ways that you can translate one sentence."},{"from":3969.74,"to":3972.43,"location":2,"content":"Um, people have done that quite a bit,"},{"from":3972.43,"to":3976.82,"location":2,"content":"but people have also decided that even if you have one translation,"},{"from":3976.82,"to":3980.99,"location":2,"content":"provided it's independent and on a kind of statistical basis,"},{"from":3980.99,"to":3985.34,"location":2,"content":"you're still more likely to match it if your translation is a good translation."},{"from":3985.34,"to":3987.56,"location":2,"content":"So, it's probably okay."},{"from":3987.56,"to":3992.93,"location":2,"content":"Um, so when BLEU was originally um, introduced,"},{"from":3992.93,"to":3997.37,"location":2,"content":"BLEU seemed marvelous and people drew graphs like this showing how"},{"from":3997.37,"to":4001.91,"location":2,"content":"closely BLEU scores correlated um,"},{"from":4001.91,"to":4005.61,"location":2,"content":"with human judgments of translation quality."},{"from":4005.61,"to":4008.71,"location":2,"content":"However, um, like a lot of things in 
life,"},{"from":4008.71,"to":4010.9,"location":2,"content":"there are a lot of things that are great measures,"},{"from":4010.9,"to":4013.87,"location":2,"content":"provided people aren't directly trying to optimize them,"},{"from":4013.87,"to":4016.72,"location":2,"content":"and so what's happened since then um,"},{"from":4016.72,"to":4020.62,"location":2,"content":"is that everybody has been trying to optimize BLEU scores,"},{"from":4020.62,"to":4026.38,"location":2,"content":"and the result of that is that BLEU scores have gone up massively but the correlation"},{"from":4026.38,"to":4028.54,"location":2,"content":"between BLEU scores and human judgments of"},{"from":4028.54,"to":4032.18,"location":2,"content":"translation quality has gone down massively,"},{"from":4032.18,"to":4036.55,"location":2,"content":"and so we're in this current state that um, the BLEU scores"},{"from":4036.55,"to":4042.64,"location":2,"content":"of the machines, um, are pretty near the scores of human translations."},{"from":4042.64,"to":4044.8,"location":2,"content":"So, you know, according to BLEU scores,"},{"from":4044.8,"to":4048.57,"location":2,"content":"we're producing almost human quality machine translation,"},{"from":4048.57,"to":4052.69,"location":2,"content":"but if you actually look at the real quality of the translations,"},{"from":4052.69,"to":4054.1,"location":2,"content":"they're still well behind"},{"from":4054.1,"to":4059.56,"location":2,"content":"human beings, um, because you could say the metric is being gamed."},{"from":4059.56,"to":4065.95,"location":2,"content":"Okay. 
I hope those things help give you more of a sense, um, for assignment four."},{"from":4065.95,"to":4068.26,"location":2,"content":"Um, so now for the last um,"},{"from":4068.26,"to":4070.14,"location":2,"content":"about 12 minutes, um,"},{"from":4070.14,"to":4071.5,"location":2,"content":"I just now wanna um,"},{"from":4071.5,"to":4078.16,"location":2,"content":"return to um final projects and say a little bit more um about final projects."},{"from":4078.16,"to":4081.2,"location":2,"content":"Um so, there are many,"},{"from":4081.2,"to":4083.71,"location":2,"content":"many different ways you can do final projects,"},{"from":4083.71,"to":4086.29,"location":2,"content":"but just to sort of go through the steps."},{"from":4086.29,"to":4089.17,"location":2,"content":"I mean, you know, for a simple straightforward project,"},{"from":4089.17,"to":4091.51,"location":2,"content":"these are kind of the steps that you want to go through."},{"from":4091.51,"to":4093.3,"location":2,"content":"So, you choose some task,"},{"from":4093.3,"to":4097.38,"location":2,"content":"summarizing text um, producing a shorter version of a text."},{"from":4097.38,"to":4100.18,"location":2,"content":"You work out some dataset that you can use."},{"from":4100.18,"to":4102.97,"location":2,"content":"So, this is an example of the kind of tasks that there"},{"from":4102.97,"to":4106.02,"location":2,"content":"are academic data sets for that other people have used,"},{"from":4106.02,"to":4108.25,"location":2,"content":"and so you could just use one of those,"},{"from":4108.25,"to":4111.73,"location":2,"content":"and that's it, you're already done or you could think \"Oh no!"},{"from":4111.73,"to":4113.35,"location":2,"content":"I'm much too creative for that."},{"from":4113.35,"to":4118.78,"location":2,"content":"I'm gonna come up with my own dataset [NOISE] um and get some online source and do it.\""},{"from":4118.78,"to":4120.37,"location":2,"content":"Um, and you 
know,"},{"from":4120.37,"to":4125.8,"location":2,"content":"summaries of the kind of things you can find online and produce your own dataset."},{"from":4125.8,"to":4128.81,"location":2,"content":"Um [NOISE] I wanna say a bit in,"},{"from":4128.81,"to":4130.4,"location":2,"content":"in just after this,"},{"from":4130.4,"to":4133.38,"location":2,"content":"about separating off um data sets for"},{"from":4133.38,"to":4136.86,"location":2,"content":"training and test data, so I'll delay that, but that's important."},{"from":4136.86,"to":4141.44,"location":2,"content":"Then, you want to work out a way to evaluate your um,"},{"from":4141.44,"to":4144.94,"location":2,"content":"system including an automatic evaluation."},{"from":4144.94,"to":4146.53,"location":2,"content":"Um, normally, for summarization,"},{"from":4146.53,"to":4148.51,"location":2,"content":"people use a slightly different metric called"},{"from":4148.51,"to":4152.34,"location":2,"content":"ROUGE but it's sort of related to BLEU hence its name."},{"from":4152.34,"to":4154.96,"location":2,"content":"Um, it's the same story that it sort of works,"},{"from":4154.96,"to":4157.16,"location":2,"content":"but human evaluation is much better."},{"from":4157.16,"to":4161.34,"location":2,"content":"Um, but you need- so you need to work out some metrics you can use for the project."},{"from":4161.34,"to":4165.53,"location":2,"content":"Um, the next thing you should do is establish a baseline."},{"from":4165.53,"to":4169.56,"location":2,"content":"So, if it's a well-worked on problem there might already be one,"},{"from":4169.56,"to":4173.17,"location":2,"content":"but it's not bad to try and calculate one for yourself anyway,"},{"from":4173.17,"to":4176.17,"location":2,"content":"and in particular what you should first have is"},{"from":4176.17,"to":4179.44,"location":2,"content":"a very simple model and see how well it works."},{"from":4179.44,"to":4182.15,"location":2,"content":"So, for human language 
material,"},{"from":4182.15,"to":4185.02,"location":2,"content":"often doing things like bag of words models,"},{"from":4185.02,"to":4188.05,"location":2,"content":"whether they're just a simple classifier over"},{"from":4188.05,"to":4192.54,"location":2,"content":"words or a neural bag of words, averaging word vectors."},{"from":4192.54,"to":4196.99,"location":2,"content":"It's just useful to try that on the task and see how it works,"},{"from":4196.99,"to":4199.68,"location":2,"content":"see what kinds of things it already gets right,"},{"from":4199.68,"to":4201.82,"location":2,"content":"what kind of things it gets wrong."},{"from":4201.82,"to":4203.88,"location":2,"content":"You know, one possibility is you will find that"},{"from":4203.88,"to":4207.14,"location":2,"content":"a very simple model already does great on your task."},{"from":4207.14,"to":4208.57,"location":2,"content":"If that's the case, um,"},{"from":4208.57,"to":4210.27,"location":2,"content":"you have too easy a task,"},{"from":4210.27,"to":4216.46,"location":2,"content":"and you probably need to find a task that's more challenging to work on. 
Um, yes."},{"from":4216.46,"to":4220.09,"location":2,"content":"So after that, you'll then sort of think about what could be a good kind"},{"from":4220.09,"to":4223.93,"location":2,"content":"of neural network model that might do well, implement it,"},{"from":4223.93,"to":4228.64,"location":2,"content":"test it um, see what kind of errors that makes and you know,"},{"from":4228.64,"to":4230.55,"location":2,"content":"that's sort of if you've gotten that far,"},{"from":4230.55,"to":4233.6,"location":2,"content":"you're sort of in the right space for a class project."},{"from":4233.6,"to":4237.4,"location":2,"content":"But, you know, it's sort of hoped that you could do more than that."},{"from":4237.4,"to":4239.94,"location":2,"content":"But after you've seen the errors from the first version,"},{"from":4239.94,"to":4243.86,"location":2,"content":"you could think about how to make it better and come up with a better project,"},{"from":4243.86,"to":4246.06,"location":2,"content":"and so I would encourage everyone,"},{"from":4246.06,"to":4248.68,"location":2,"content":"you know, you really do want to look at the data, right?"},{"from":4248.68,"to":4254.62,"location":2,"content":"You don't just wanna be sort of having things and files and run and say \"Okay, 0.71."},{"from":4254.62,"to":4257.37,"location":2,"content":"Let me make some random change 0.70."},{"from":4257.37,"to":4260.23,"location":2,"content":"Oh, that's not a good one,\" repeat over."},{"from":4260.23,"to":4264.33,"location":2,"content":"You actually want to be sort of looking at your dataset in any way you can."},{"from":4264.33,"to":4266.76,"location":2,"content":"It's good to visualize the dataset to understand what's"},{"from":4266.76,"to":4269.5,"location":2,"content":"important in it that you might be able to take advantage of,"},{"from":4269.5,"to":4271.11,"location":2,"content":"you want to be able to look at what kind of"},{"from":4271.11,"to":4272.97,"location":2,"content":"errors are being 
made because that might give you"},{"from":4272.97,"to":4276.86,"location":2,"content":"ideas of how you could put more stuff into the model that would do better."},{"from":4276.86,"to":4280.47,"location":2,"content":"Um, you might wanna do some graphing of the effect of hyper-parameters,"},{"from":4280.47,"to":4282.46,"location":2,"content":"so you can kind of understand that better."},{"from":4282.46,"to":4284.37,"location":2,"content":"And so, the hope is that you will try out"},{"from":4284.37,"to":4287.25,"location":2,"content":"some other kinds of models and make things better."},{"from":4287.25,"to":4289.52,"location":2,"content":"And sort of one of the goals here is,"},{"from":4289.52,"to":4294.09,"location":2,"content":"it's good if you've sort of got a well-setup experimental setup,"},{"from":4294.09,"to":4297.3,"location":2,"content":"so you can easily turn around experiments because then you're just more"},{"from":4297.3,"to":4301.85,"location":2,"content":"likely to be able to try several things in the time available."},{"from":4301.85,"to":4305.4,"location":2,"content":"Okay. 
Um, couple of other things I wanted to mention."},{"from":4305.4,"to":4309.61,"location":2,"content":"Um, one is sort of different amounts of data."},{"from":4309.61,"to":4313.51,"location":2,"content":"So, it's really, really important for all the stuff that we do,"},{"from":4313.51,"to":4316.87,"location":2,"content":"that we have different sets of data."},{"from":4316.87,"to":4318.64,"location":2,"content":"So, we have training data,"},{"from":4318.64,"to":4320.43,"location":2,"content":"we have dev test data,"},{"from":4320.43,"to":4323.13,"location":2,"content":"we have test data at least,"},{"from":4323.13,"to":4325.54,"location":2,"content":"and sometimes it's useful to have even,"},{"from":4325.54,"to":4328.24,"location":2,"content":"um, more data available."},{"from":4328.24,"to":4334.08,"location":2,"content":"So, for many of the public datasets, they're already split into different subsets like this,"},{"from":4334.08,"to":4335.1,"location":2,"content":"but there are some that aren't."},{"from":4335.1,"to":4337.28,"location":2,"content":"There are some that might only have a training set,"},{"from":4337.28,"to":4339,"location":2,"content":"and a test set."},{"from":4339,"to":4341.26,"location":2,"content":"And what you don't want to do is think,"},{"from":4341.26,"to":4343.5,"location":2,"content":"\"Oh, there's only a training set and a test set."},{"from":4343.5,"to":4346.18,"location":2,"content":"Therefore I'll just run every time on the test set.\""},{"from":4346.18,"to":4349.89,"location":2,"content":"That- that's a really invalid way to go about your research."},{"from":4349.89,"to":4350.99,"location":2,"content":"So, if there aren't"},{"from":4350.99,"to":4354.39,"location":2,"content":"dev sets available or you need to do some more tuning,"},{"from":4354.39,"to":4356.38,"location":2,"content":"and you need some separate tuning data,"},{"from":4356.38,"to":4359.46,"location":2,"content":"you sort of have to,
um,"},{"from":4359.46,"to":4363.4,"location":2,"content":"make it for yourself by splitting off some of the training data,"},{"from":4363.4,"to":4367.77,"location":2,"content":"and not using it for the basic training and using it for tuning,"},{"from":4367.77,"to":4370.44,"location":2,"content":"and fo- as dev data."},{"from":4370.44,"to":4372.53,"location":2,"content":"Um, yes."},{"from":4372.53,"to":4376.49,"location":2,"content":"So, to go on about that, um, more, more."},{"from":4376.49,"to":4382.68,"location":2,"content":"So, the basic issue is this issue of fitting and overfitting to particular datasets."},{"from":4382.68,"to":4385.61,"location":2,"content":"So, when we train a model, um,"},{"from":4385.61,"to":4387.56,"location":2,"content":"on some training data,"},{"from":4387.56,"to":4390.46,"location":2,"content":"we train it and the error rate goes down."},{"from":4390.46,"to":4395.9,"location":2,"content":"And over time, we gradually overfit to the training data because we sort of"},{"from":4395.9,"to":4401.82,"location":2,"content":"pick up on our neural network f- facts about the particular training data items,"},{"from":4401.82,"to":4404.03,"location":2,"content":"and we just sort of start to learn them."},{"from":4404.03,"to":4405.79,"location":2,"content":"Now in the old days,"},{"from":4405.79,"to":4410.06,"location":2,"content":"the fact that you overfit to the training data was seen as evil."},{"from":4410.06,"to":4412.13,"location":2,"content":"In modern neural network think,"},{"from":4412.13,"to":4415.63,"location":2,"content":"we don't think it is evil that we overfit to the training data"},{"from":4415.63,"to":4420.11,"location":2,"content":"because all neural nets that are any good overfit to the training data,"},{"from":4420.11,"to":4422.88,"location":2,"content":"and we would be very sad if they didn't."},{"from":4422.88,"to":4424.66,"location":2,"content":"I'll come back to that in a
moment."},{"from":4424.66,"to":4427.56,"location":2,"content":"But nevertheless, they're overfitting like crazy."},{"from":4427.56,"to":4432.92,"location":2,"content":"So, what we, but and what we want to build is something that generalizes well."},{"from":4432.92,"to":4435.09,"location":2,"content":"So, we have to have some separate data,"},{"from":4435.09,"to":4436.81,"location":2,"content":"that's our validation data,"},{"from":4436.81,"to":4441.03,"location":2,"content":"and say look at what performance looks like on the validation data."},{"from":4441.03,"to":4444.88,"location":2,"content":"And commonly we find that training up until some point,"},{"from":4444.88,"to":4448.5,"location":2,"content":"improves our performance on separate validation data,"},{"from":4448.5,"to":4451.05,"location":2,"content":"and then we start to overfit to"},{"from":4451.05,"to":4455.77,"location":2,"content":"the training data in a way that our validation set performance gets worse."},{"from":4455.77,"to":4457.6,"location":2,"content":"Um, and so, then,"},{"from":4457.6,"to":4461.97,"location":2,"content":"further training on the training data isn't useful because we're starting"},{"from":4461.97,"to":4466.7,"location":2,"content":"to build a model that generalizes worse when run on other data."},{"from":4466.7,"to":4468.81,"location":2,"content":"But there's- the whole point here is,"},{"from":4468.81,"to":4474.84,"location":2,"content":"we can only do this experiment if our validation data is separate from our training data."},{"from":4474.84,"to":4477.91,"location":2,"content":"If it's the same data or if it's overlapping data,"},{"from":4477.91,"to":4479.95,"location":2,"content":"we can't draw this graph."},{"from":4479.95,"to":4482.81,"location":2,"content":"Um, and so, therefore, we can't do valid experiments."},{"from":4482.81,"to":4487.09,"location":2,"content":"Um, now you might think, \"Oh, well,"},{"from":4487.09,"to":4489.04,"location":2,"content":"maybe I can, 
um,"},{"from":4489.04,"to":4492.18,"location":2,"content":"do this and just use the test set of data.\""},{"from":4492.18,"to":4495.81,"location":2,"content":"Um, but that's also invalid,"},{"from":4495.81,"to":4498.92,"location":2,"content":"and the reason why that's invalid is,"},{"from":4498.92,"to":4500.84,"location":2,"content":"as you do experiments,"},{"from":4500.84,"to":4505.49,"location":2,"content":"you also start slowly overfitting to your development data."},{"from":4505.49,"to":4511.56,"location":2,"content":"So, the standard practice is you do a run and you get a score on the development data."},{"from":4511.56,"to":4513.15,"location":2,"content":"You do a second run."},{"from":4513.15,"to":4515.04,"location":2,"content":"You do worse on the development data,"},{"from":4515.04,"to":4517.77,"location":2,"content":"and so you throw that second model away."},{"from":4517.77,"to":4519.02,"location":2,"content":"You do a third experiment."},{"from":4519.02,"to":4520.95,"location":2,"content":"You do better on the development data,"},{"from":4520.95,"to":4524.9,"location":2,"content":"and so you keep that model and you repeat over 50 times."},{"from":4524.9,"to":4528.52,"location":2,"content":"And while some of those subsequent models you keep,"},{"from":4528.52,"to":4534.19,"location":2,"content":"are genuinely better because you sort of worked out something good to do."},{"from":4534.19,"to":4538.89,"location":2,"content":"But it turns out that some of those subsequent models only sort of just happened."},{"from":4538.89,"to":4542.98,"location":2,"content":"You just got lucky and they happened to score better on the development data."},{"from":4542.98,"to":4546.9,"location":2,"content":"And so, if you kind of keep repeating that process 60 or 100 times,"},{"from":4546.9,"to":4550.57,"location":2,"content":"you're also gradually [NOISE] overfitting on your development data,"},{"from":4550.57,"to":4553.57,"location":2,"content":"and you get
unrealistically good dev scores."},{"from":4553.57,"to":4555.48,"location":2,"content":"And so, that means two things."},{"from":4555.48,"to":4559.82,"location":2,"content":"You know, if you want to be rigorous and do a huge amount of hyper-parameter exploration,"},{"from":4559.82,"to":4562.83,"location":2,"content":"it can be good to have a second development se- test set,"},{"from":4562.83,"to":4565.66,"location":2,"content":"so that you have one, that you haven't overfit as much."},{"from":4565.66,"to":4568.45,"location":2,"content":"And if you want to have valid scores on te-"},{"from":4568.45,"to":4572.6,"location":2,"content":"on as to what is my actual performance on independent data,"},{"from":4572.6,"to":4575.73,"location":2,"content":"it's vital that you have separate test data that you are"},{"from":4575.73,"to":4579.27,"location":2,"content":"not using at all in this process, right?"},{"from":4579.27,"to":4581.4,"location":2,"content":"So, the ideal state is that,"},{"from":4581.4,"to":4584.86,"location":2,"content":"for your real test data, um,"},{"from":4584.86,"to":4589.59,"location":2,"content":"that you never used it at all until you've finished training your data, uh,"},{"from":4589.59,"to":4594.06,"location":2,"content":"training your model, and then you run your final model once on the test data,"},{"from":4594.06,"to":4596.51,"location":2,"content":"and you write up your paper and those are your results."},{"from":4596.51,"to":4599.49,"location":2,"content":"Now, I will be honest and say the world usually isn't"},{"from":4599.49,"to":4602.79,"location":2,"content":"quite that perfect because after you've done that,"},{"from":4602.79,"to":4604.96,"location":2,"content":"you then go to sleep [NOISE] and wake up thinking."},{"from":4604.96,"to":4607.64,"location":2,"content":"\"I've got a fantastic idea of how to make my model better.\""},{"from":4607.64,"to":4609.52,"location":2,"content":"and you run off and implement 
that,"},{"from":4609.52,"to":4611.7,"location":2,"content":"and it works great on the dev data,"},{"from":4611.7,"to":4615.39,"location":2,"content":"and then you run it on the test data again and the numbers go up."},{"from":4615.39,"to":4617.64,"location":2,"content":"Um, sort of everybody does that."},{"from":4617.64,"to":4619.03,"location":2,"content":"Um, and you know,"},{"from":4619.03,"to":4621.3,"location":2,"content":"in modicum it's okay,"},{"from":4621.3,"to":4626.32,"location":2,"content":"you know, if that means you occasionally run on the test data it's not so bad, um,"},{"from":4626.32,"to":4630.55,"location":2,"content":"but you really need to be aware of the slippery slope because,"},{"from":4630.55,"to":4633.56,"location":2,"content":"if you then start falling into, \"I've got a new model."},{"from":4633.56,"to":4634.89,"location":2,"content":"Let me try that one on the test data."},{"from":4634.89,"to":4636.93,"location":2,"content":"I've got a new model. Let me try this one on the test data.\""},{"from":4636.93,"to":4640.13,"location":2,"content":"Then you're just sort of overfitting to the test data,"},{"from":4640.13,"to":4643.1,"location":2,"content":"and getting an unrealistically high score."},{"from":4643.1,"to":4647.6,"location":2,"content":"And that's precisely why a lot of the competitions like Kaggle competitions,"},{"from":4647.6,"to":4651.68,"location":2,"content":"have a secret test dataset that you can't run on."},{"from":4651.68,"to":4653.61,"location":2,"content":"So, that they can do a genuine,"},{"from":4653.61,"to":4657.15,"location":2,"content":"independent test on the actual test data."},{"from":4657.15,"to":4662.55,"location":2,"content":"Okay.
Um, let's see, um, a couple more minutes."},{"from":4662.55,"to":4666.52,"location":2,"content":"So, yeah, getting your neural network to train."},{"from":4666.52,"to":4669.14,"location":2,"content":"Um, my two messages are, you know,"},{"from":4669.14,"to":4672.43,"location":2,"content":"first of all, you should start with a positive attitude."},{"from":4672.43,"to":4674.56,"location":2,"content":"Neural networks want to learn."},{"from":4674.56,"to":4675.95,"location":2,"content":"If they're not learning,"},{"from":4675.95,"to":4678.5,"location":2,"content":"you're doing something to stop them from learning."},{"from":4678.5,"to":4680.07,"location":2,"content":"And so, you should just stop that,"},{"from":4680.07,"to":4682.26,"location":2,"content":"and they will learn because they want to learn."},{"from":4682.26,"to":4683.94,"location":2,"content":"They're just like little children."},{"from":4683.94,"to":4689.79,"location":2,"content":"Um, but, if the follow up to that is the grim reality that there are just tons"},{"from":4689.79,"to":4691.91,"location":2,"content":"of things you can do that will cause"},{"from":4691.91,"to":4695.71,"location":2,"content":"your neural networks not to learn very well or at all,"},{"from":4695.71,"to":4697.82,"location":2,"content":"and this is the frustrating part of"},{"from":4697.82,"to":4701.6,"location":2,"content":"this whole field because you know, it's not like a compile error."},{"from":4701.6,"to":4705.34,"location":2,"content":"It can just be hard to find and fix them."},{"from":4705.34,"to":4707.72,"location":2,"content":"And, you know, it is just really"},{"from":4707.72,"to":4712.02,"location":2,"content":"standard that you spend more time dealing with trying to find,"},{"from":4712.02,"to":4715.23,"location":2,"content":"and fix why it doesn't work well and getting it to work well than"},{"from":4715.23,"to":4719.27,"location":2,"content":"you- than the time you spent writing the code for your 
model."},{"from":4719.27,"to":4723.73,"location":2,"content":"So, remember to budget for that when you're doing your final project,"},{"from":4723.73,"to":4728.47,"location":2,"content":"it just won't work if you finish the code a day or two before the deadline."},{"from":4728.47,"to":4731.99,"location":2,"content":"Um, so, you need to work out what those things are,"},{"from":4731.99,"to":4734.97,"location":2,"content":"That can be hard, but you know experience,"},{"from":4734.97,"to":4737.26,"location":2,"content":"experimental care, rules of thumb help."},{"from":4737.26,"to":4739.75,"location":2,"content":"So, there are just lots of things that are important."},{"from":4739.75,"to":4742.48,"location":2,"content":"So, you know, your learning rates are important."},{"from":4742.48,"to":4745.77,"location":2,"content":"If your learning rates are way too high, things won't learn."},{"from":4745.77,"to":4747.96,"location":2,"content":"If your learning rates are way too low,"},{"from":4747.96,"to":4750.65,"location":2,"content":"they will learn very slowly and badly."},{"from":4750.65,"to":4753.27,"location":2,"content":"Um, initialization makes a difference."},{"from":4753.27,"to":4759.04,"location":2,"content":"Having good initialization often determines how well neural networks, um, learn."},{"from":4759.04,"to":4763.44,"location":2,"content":"Um, I have a separate slide here that I probably haven't got time to go"},{"from":4763.44,"to":4768.23,"location":2,"content":"through all of on sort of for sequence [NOISE] models,"},{"from":4768.23,"to":4771.95,"location":2,"content":"some of the tips of what people normally think are"},{"from":4771.95,"to":4775.73,"location":2,"content":"good ways to get those models, um, working."},{"from":4775.73,"to":4778.41,"location":2,"content":"But I'll just say this one last thing."},{"from":4778.41,"to":4781.85,"location":2,"content":"Um, I think the strategy that you really want
to"},{"from":4781.85,"to":4785.92,"location":2,"content":"take is to work incrementally and build up slowly."},{"from":4785.92,"to":4787.49,"location":2,"content":"It just doesn't work to think,"},{"from":4787.49,"to":4789.53,"location":2,"content":"\"Oh I've got the mother of all models,"},{"from":4789.53,"to":4791.66,"location":2,"content":"and build this enormously complex thing,"},{"from":4791.66,"to":4793,"location":2,"content":"and then run it on the data,"},{"from":4793,"to":4794.78,"location":2,"content":"and it crashes and burns.\""},{"from":4794.78,"to":4797.45,"location":2,"content":"You have no idea what to do at that point,"},{"from":4797.45,"to":4800.65,"location":2,"content":"that the only good way is to sort of build up slowly."},{"from":4800.65,"to":4802.94,"location":2,"content":"So [NOISE] start with a very simple model,"},{"from":4802.94,"to":4804.3,"location":2,"content":"get it to work,"},{"from":4804.3,"to":4805.82,"location":2,"content":"add your bells and whistles,"},{"from":4805.82,"to":4807.49,"location":2,"content":"extra layers and so on."},{"from":4807.49,"to":4809.59,"location":2,"content":"Get them to work or abandon them."},{"from":4809.59,"to":4814.23,"location":2,"content":"And so, try and proceed from one working model to another as much as possible."},{"from":4814.23,"to":4818.98,"location":2,"content":"One of- another way that you can start small and build up is with data."},{"from":4818.98,"to":4822.58,"location":2,"content":"The easiest way to see bugs and problems in your model,"},{"from":4822.58,"to":4825.61,"location":2,"content":"is with the minutest possible amount of data."},{"from":4825.61,"to":4829.03,"location":2,"content":"So, start with a dataset of eight items."},{"from":4829.03,"to":4832.66,"location":2,"content":"Sometimes it's even best if those eight items are ones that are"},{"from":4832.66,"to":4834.93,"location":2,"content":"artificial data that you designed 
yourself"},{"from":4834.93,"to":4837.56,"location":2,"content":"because then you can often more easily see problems,"},{"from":4837.56,"to":4838.81,"location":2,"content":"and what's going wrong."},{"from":4838.81,"to":4840.56,"location":2,"content":"So, you should train on that,"},{"from":4840.56,"to":4842.42,"location":2,"content":"um, because it's only eight items,"},{"from":4842.42,"to":4844.12,"location":2,"content":"training will only take seconds,"},{"from":4844.12,"to":4847.2,"location":2,"content":"and that's really, really useful for being able to iterate quickly."},{"from":4847.2,"to":4849.56,"location":2,"content":"And you know, if you can't have your model get"},{"from":4849.56,"to":4855.06,"location":2,"content":"100 percent accuracy on training and testing on those eight examples,"},{"from":4855.06,"to":4859.73,"location":2,"content":"well, you know, either the model is woefully under powered or the model is broken,"},{"from":4859.73,"to":4862.9,"location":2,"content":"and you've got clear things to do right there."},{"from":4862.9,"to":4866.4,"location":2,"content":"Um, when you go to a bigger model, um,"},{"from":4866.4,"to":4870.11,"location":2,"content":"the standard practice with modern neural networks is,"},{"from":4870.11,"to":4872.33,"location":2,"content":"you want to train your models."},{"from":4872.33,"to":4876.24,"location":2,"content":"You want models that can overfit massively on the training set."},{"from":4876.24,"to":4879.56,"location":2,"content":"So, in general, your models should still be getting"},{"from":4879.56,"to":4883.38,"location":2,"content":"close to 100 percent accuracy on the training set after you've"},{"from":4883.38,"to":4887.16,"location":2,"content":"trained it for a long time because powerful neural network models are"},{"from":4887.16,"to":4891.09,"location":2,"content":"just really good at over-fitting to, and memorizing data."},{"from":4891.09,"to":4893.45,"location":2,"content":"Um, if that's not the case 
well, you know,"},{"from":4893.45,"to":4894.81,"location":2,"content":"maybe you want a bigger model."},{"from":4894.81,"to":4898.16,"location":2,"content":"Maybe you want to have higher hidden dimensions or"},{"from":4898.16,"to":4901.91,"location":2,"content":"add an extra layer to your neural network or something like that."},{"from":4901.91,"to":4904.93,"location":2,"content":"You shouldn't be scared of overfitting on the training data."},{"from":4904.93,"to":4907.18,"location":2,"content":"But once you've proved you can do that,"},{"from":4907.18,"to":4910.56,"location":2,"content":"you then do want a model that also generalizes well."},{"from":4910.56,"to":4915.41,"location":2,"content":"And so, normally the way that you're addressing that is then by regularizing the model,"},{"from":4915.41,"to":4917.85,"location":2,"content":"and there are different ways to regularize your model,"},{"from":4917.85,"to":4921.3,"location":2,"content":"but we talked about in the assignment, doing dropout."},{"from":4921.3,"to":4923.76,"location":2,"content":"I mean, using generous dropout is"},{"from":4923.76,"to":4927.98,"location":2,"content":"one very common and effective strategy for regularizing your models."},{"from":4927.98,"to":4931.73,"location":2,"content":"And so, then you've, what you want to be doing is regularizing"},{"from":4931.73,"to":4936.27,"location":2,"content":"your model enough that the curve no longer looks like this,"},{"from":4936.27,"to":4941.1,"location":2,"content":"but instead that your validation performance kind of levels out,"},{"from":4941.1,"to":4943.51,"location":2,"content":"but doesn't start ramping back up again,"},{"from":4943.51,"to":4946.82,"location":2,"content":"and that's then a sort of a sign of a well regularized model."},{"from":4946.82,"to":4949.21,"location":2,"content":"Okay. I will stop there,"},{"from":4949.21,"to":4953.31,"location":2,"content":"and then we'll come back to the question-answering project on Thursday."}]}