{"font_size":0.4,"font_color":"#FFFFFF","background_alpha":0.5,"background_color":"#9C27B0","Stroke":"none","body":[{"from":4.76,"to":9.57,"location":2,"content":"Let's get started. So welcome to the very final lecture of the class."},{"from":9.57,"to":11.47,"location":2,"content":"I hope you're all surviving the last week and,"},{"from":11.47,"to":13.83,"location":2,"content":"uh, wrapping up your projects."},{"from":13.83,"to":18.54,"location":2,"content":"So today we're going to be hearing about the future of NLP and deep learning."},{"from":18.54,"to":22.44,"location":2,"content":"Uh, so Chris is still traveling and today we're going to be having Kevin Clark,"},{"from":22.44,"to":24.82,"location":2,"content":"who's one of the PhD students in the lab, uh,"},{"from":24.82,"to":26.58,"location":2,"content":"in the NLP lab,"},{"from":26.58,"to":29.61,"location":2,"content":"and he was also one of the head TAs for the class last year."},{"from":29.61,"to":31.79,"location":2,"content":"So he's very familiar with the class as a whole."},{"from":31.79,"to":33.77,"location":2,"content":"Um, so, take it away Kevin."},{"from":33.77,"to":37.83,"location":2,"content":"Okay. Thanks, Abby. Um, yeah,"},{"from":37.83,"to":40.44,"location":2,"content":"it's great to be back after being a TA last year."},{"from":40.44,"to":45.35,"location":2,"content":"Um, I'm really excited today to be talking about the future of deep learning and NLP."},{"from":45.35,"to":49.09,"location":2,"content":"Um, obviously, trying to forecast the future, um,"},{"from":49.09,"to":51.8,"location":2,"content":"for deep learning or anything in that space is really"},{"from":51.8,"to":54.8,"location":2,"content":"difficult because the field is changing super quickly."},{"from":54.8,"to":57.08,"location":2,"content":"Um, so as one reference point, um,"},{"from":57.08,"to":60.05,"location":2,"content":"let's look at what did deep learning for NLP,"},{"from":60.05,"to":62.29,"location":2,"content":"um, look like about five years ago."},{"from":62.29,"to":68.3,"location":2,"content":"And really, a lot of ideas that are now considered to be pretty core techniques,"},{"from":68.3,"to":70.44,"location":2,"content":"um, when we think of deep learning and NLP,"},{"from":70.44,"to":72.17,"location":2,"content":"um, didn't even exist back then."},{"from":72.17,"to":74.87,"location":2,"content":"Um, so things you learned in this class like Seq2Seq,"},{"from":74.87,"to":77.18,"location":2,"content":"attention mechanism, um, large-scale,"},{"from":77.18,"to":80.11,"location":2,"content":"reading comprehension, uh, even frameworks"},{"from":80.11,"to":83.3,"location":2,"content":"such as TensorFlow or Pytorch, um, didn't exist."},{"from":83.3,"to":87.14,"location":2,"content":"And, uh, the point I want to make with this is that, um,"},{"from":87.14,"to":91.2,"location":2,"content":"because of this it's really difficult to, to look into the future and say,"},{"from":91.2,"to":93.67,"location":2,"content":"okay, what are things going to be like?"},{"from":93.67,"to":98.06,"location":2,"content":"Um, what I think we can do though is look at, um,"},{"from":98.06,"to":101.87,"location":2,"content":"areas that right now are really sort of taking off, um,"},{"from":101.87,"to":103.64,"location":2,"content":"so areas in which, um,"},{"from":103.64,"to":106.37,"location":2,"content":"there's a lot, been a lot of recent success and kind of, uh,"},{"from":106.37,"to":108.1,"location":2,"content":"project from that, 
that,"},{"from":108.1,"to":110.9,"location":2,"content":"those same areas will likely be important in the future."},{"from":110.9,"to":115.82,"location":2,"content":"Um, and in this talk I'm going to be mostly focusing on one key idea of"},{"from":115.82,"to":118.97,"location":2,"content":"wh- key idea which is the idea of leveraging"},{"from":118.97,"to":122.92,"location":2,"content":"unlabeled examples when training our NLP systems."},{"from":122.92,"to":127.49,"location":2,"content":"So I'll be talking a bit about doing that for machine translation, um,"},{"from":127.49,"to":130.82,"location":2,"content":"both in improving the quality of translation and even"},{"from":130.82,"to":134.31,"location":2,"content":"in doing a translation in an unsupervised way."},{"from":134.31,"to":136.17,"location":2,"content":"So that means you don't have, um,"},{"from":136.17,"to":139.23,"location":2,"content":"paired sentences, uh, with, with their translations."},{"from":139.23,"to":143.37,"location":2,"content":"Um, you try to learn a translation model only from a monolingual corpus."},{"from":143.37,"to":147.12,"location":2,"content":"Um, the second thing I'll be talking a little bit about is, uh,"},{"from":147.12,"to":149.33,"location":2,"content":"OpenAI's GPT-2, um,"},{"from":149.33,"to":152.44,"location":2,"content":"and in general this phenomenon of really scaling up,"},{"from":152.44,"to":154.04,"location":2,"content":"um, deep learning models."},{"from":154.04,"to":158.33,"location":2,"content":"Um, I know you saw a little bit of this in the lecture on contextual representations,"},{"from":158.33,"to":160.34,"location":2,"content":"but this, but this will be a little bit more in depth."},{"from":160.34,"to":162.34,"location":2,"content":"Um, and I think, um,"},{"from":162.34,"to":166.66,"location":2,"content":"these new developments in NLP have had some,"},{"from":166.66,"to":168.6,"location":2,"content":"um, pretty big, uh,"},{"from":168.6,"to":170.59,"location":2,"content":"impacts in terms of,"},{"from":170.59,"to":173.75,"location":2,"content":"uh, more broadly kind of beyond even the technology we're using,"},{"from":173.75,"to":175.07,"location":2,"content":"and in particular, I mean,"},{"from":175.07,"to":180.56,"location":2,"content":"starting to raise more and more concerns about the social impact of NLP, um,"},{"from":180.56,"to":183.52,"location":2,"content":"both, um, in what our models can do and also in kind"},{"from":183.52,"to":186.59,"location":2,"content":"of plans of what, where people are looking to apply these models, um,"},{"from":186.59,"to":189.75,"location":2,"content":"and I think that really has some risks associated with it, um,"},{"from":189.75,"to":193.16,"location":2,"content":"in terms of security also in terms of areas like bias."},{"from":193.16,"to":196.47,"location":2,"content":"Um, I'm also gonna talk a bit about future areas of research,"},{"from":196.47,"to":199.14,"location":2,"content":"um, these are mostly research areas now that are, um,"},{"from":199.14,"to":202.06,"location":2,"content":"over the past year have really kind of developed into"},{"from":202.06,"to":207.19,"location":2,"content":"promising areas and I expect they will continue to be important in the future."},{"from":207.19,"to":209.7,"location":2,"content":"Okay, um, to start with,"},{"from":209.7,"to":213.31,"location":2,"content":"I wanna ask this question, why has deep learning been so successful recently?"},{"from":213.31,"to":215.51,"location":2,"content":"Um, I like this comic, 
um,"},{"from":215.51,"to":218.04,"location":2,"content":"here there's a statistical learning person,"},{"from":218.04,"to":221.03,"location":2,"content":"um, and they've got some really complicated,"},{"from":221.03,"to":224.01,"location":2,"content":"um, well-motivated, uh, method for doing, um,"},{"from":224.01,"to":225.51,"location":2,"content":"the task they care about,"},{"from":225.51,"to":227.46,"location":2,"content":"and then the neural net person just says,"},{"from":227.46,"to":229.11,"location":2,"content":"er, stack more layers."},{"from":229.11,"to":232.02,"location":2,"content":"Um, so, so the point I want to make here is that, um,"},{"from":232.02,"to":236.07,"location":2,"content":"deep learning has not been successful recently because it's more"},{"from":236.07,"to":241.53,"location":2,"content":"theoretically motivated or it's more sophisticated than previous techniques, um."},{"from":241.53,"to":244.25,"location":2,"content":"In fact I would say that actually a lot of, um,"},{"from":244.25,"to":246.63,"location":2,"content":"older statistical methods have more of"},{"from":246.63,"to":250.18,"location":2,"content":"a theoretical underpinning than some of the tricks we do in deep learning."},{"from":250.18,"to":254.05,"location":2,"content":"Um, really the thing that makes deep learning so"},{"from":254.05,"to":257.66,"location":2,"content":"successful in recent years has been its ability to scale, right."},{"from":257.66,"to":262.31,"location":2,"content":"So neural nets, as we increase the size of the data,"},{"from":262.31,"to":264.26,"location":2,"content":"as we increase the size of the models, um,"},{"from":264.26,"to":266.17,"location":2,"content":"they get a really big boost in accuracy,"},{"from":266.17,"to":268.56,"location":2,"content":"in ways other approaches do not."},{"from":268.56,"to":272.21,"location":2,"content":"And, um, if you look to the '80s and '90s, um,"},{"from":272.21,"to":276.19,"location":2,"content":"there was actually plenty of research in neural nets going on, um."},{"from":276.19,"to":279.13,"location":2,"content":"But it hadn't, doesn't have a hype around it that it does"},{"from":279.13,"to":282.2,"location":2,"content":"now and that seems likely to be because,"},{"from":282.2,"to":285.02,"location":2,"content":"um, in the past there wasn't, um,"},{"from":285.02,"to":287.18,"location":2,"content":"the same resources in terms of computers,"},{"from":287.18,"to":289.24,"location":2,"content":"in terms of data and, um,"},{"from":289.24,"to":293.12,"location":2,"content":"only now after we've reached sort of an inflection point where we can"},{"from":293.12,"to":295.22,"location":2,"content":"really take advantage of scale in"},{"from":295.22,"to":297.96,"location":2,"content":"our deep learning models and we started to see it become,"},{"from":297.96,"to":301.52,"location":2,"content":"um, a really successful paradigm for machine learning."},{"from":301.52,"to":304.08,"location":2,"content":"Um, if we look at big, uh,"},{"from":304.08,"to":306.07,"location":2,"content":"deep learning success stories, um,"},{"from":306.07,"to":310.2,"location":2,"content":"I think, uh, you can see kind of this idea play out, right?"},{"from":310.2,"to":316.49,"location":2,"content":"So here are three of what are arguably the most famous successes of deep learning, right."},{"from":316.49,"to":318.62,"location":2,"content":"So there's image recognition, where before,"},{"from":318.62,"to":320.87,"location":2,"content":"people used very highly engineered, 
um,"},{"from":320.87,"to":325.87,"location":2,"content":"features to classify images and now neural nets are much superior, um, to those methods."},{"from":325.87,"to":329.79,"location":2,"content":"Um, machine translation has really closed the gap between, um,"},{"from":329.79,"to":333.02,"location":2,"content":"phrase-based systems and human quality translation,"},{"from":333.02,"to":335.73,"location":2,"content":"so this is widely used in things like Google Translate"},{"from":335.73,"to":339.12,"location":2,"content":"and the quality has actually gotten a lot better over the past five years."},{"from":339.12,"to":343.55,"location":2,"content":"Um, another example that had a lot of hype around it is game-playing, so, um,"},{"from":343.55,"to":346.46,"location":2,"content":"there's been work on Atari games, there's been AlphaGo,"},{"from":346.46,"to":350.39,"location":2,"content":"uh, more recently there's been AlphaStar and OpenAI Five."},{"from":350.39,"to":353.6,"location":2,"content":"Um, if you look at all three of these cases underlying"},{"from":353.6,"to":357.2,"location":2,"content":"these successes is really large amounts of data, right."},{"from":357.2,"to":358.55,"location":2,"content":"So for ImageNet, um,"},{"from":358.55,"to":360.02,"location":2,"content":"for image recognition, um,"},{"from":360.02,"to":363.04,"location":2,"content":"there is the ImageNet dataset which has 14 million images,"},{"from":363.04,"to":366.32,"location":2,"content":"uh, machine translation datasets often have millions of examples."},{"from":366.32,"to":369.27,"location":2,"content":"Um, for game playing you can actually"},{"from":369.27,"to":372.47,"location":2,"content":"generate as much training data as you want essentially,"},{"from":372.47,"to":374.69,"location":2,"content":"um, just by running your agent,"},{"from":374.69,"to":376.04,"location":2,"content":"um, within the game,"},{"from":376.04,"to":379.12,"location":2,"content":"um, over and over again."},{"from":379.12,"to":381.36,"location":2,"content":"Um, so if we,"},{"from":381.36,"to":383.59,"location":2,"content":"if we look to NLP, um,"},{"from":383.59,"to":387.74,"location":2,"content":"the story is quite a bit different for a lot of tasks, um, right."},{"from":387.74,"to":392.03,"location":2,"content":"So if you look at even pretty core kind of popular tasks,"},{"from":392.03,"to":395.06,"location":2,"content":"to say, reading comprehension in English, um,"},{"from":395.06,"to":399.71,"location":2,"content":"datasets like SQuAD are in the order of like 100,000 examples"},{"from":399.71,"to":404.81,"location":2,"content":"which is considerably less than the millions or tens of millions of examples,"},{"from":404.81,"to":407.11,"location":2,"content":"um, that these previous,"},{"from":407.11,"to":410.29,"location":2,"content":"um, successes have, have benefited from."},{"from":410.29,"to":414.21,"location":2,"content":"Um, and that's of course only for English, right."},{"from":414.21,"to":415.77,"location":2,"content":"Um, there are, um,"},{"from":415.77,"to":419.57,"location":2,"content":"thousands of other languages and this is I think"},{"from":419.57,"to":423.77,"location":2,"content":"a problem with NLP data as it exists today."},{"from":423.77,"to":426.45,"location":2,"content":"Um, the vast majority of data is in English, um,"},{"from":426.45,"to":430.07,"location":2,"content":"when in reality fewer than 10% of the world's population,"},{"from":430.07,"to":432.19,"location":2,"content":"um, speak English as their first 
language."},{"from":432.19,"to":437.56,"location":2,"content":"Um, so these problems with small datasets are only compounded if you look at,"},{"from":437.56,"to":441.46,"location":2,"content":"um, the full spectrum of languages, um, that exist."},{"from":441.46,"to":443.95,"location":2,"content":"Um, so, as what do we do,"},{"from":443.95,"to":445.8,"location":2,"content":"uh, when we're limited by this data,"},{"from":445.8,"to":450.56,"location":2,"content":"but we want to take advantage of deep learning scale and train the biggest models we can."},{"from":450.56,"to":452.51,"location":2,"content":"Um, the popular solution, um,"},{"from":452.51,"to":456.23,"location":2,"content":"that's especially had recent success is using unlabeled data, um,"},{"from":456.23,"to":457.81,"location":2,"content":"because unlike labeled data,"},{"from":457.81,"to":460.84,"location":2,"content":"unlabeled data is very easy to acquire for language."},{"from":460.84,"to":462.12,"location":2,"content":"Um, you can just go to the Internet,"},{"from":462.12,"to":464.69,"location":2,"content":"you can go to books, you can get lots of text, um,"},{"from":464.69,"to":469.37,"location":2,"content":"whereas labeled data usually requires at the least crowdsourcing examples."},{"from":469.37,"to":474.73,"location":2,"content":"Um, in some cases you even require someone who's an expert in something like linguistics,"},{"from":474.73,"to":479.51,"location":2,"content":"um, to, to annotate that data."},{"from":479.51,"to":483.89,"location":2,"content":"Okay, so, um, this first part of the talk is going to be applying"},{"from":483.89,"to":488.19,"location":2,"content":"this idea of leveraging unlabeled data to improve our NLP models,"},{"from":488.19,"to":491.99,"location":2,"content":"um, to the task of machine translation."},{"from":491.99,"to":495.17,"location":2,"content":"Um, so let's talk about machine translation data."},{"from":495.17,"to":500.52,"location":2,"content":"Um, it is true that there do exist quite large datasets for machine translation."},{"from":500.52,"to":503.17,"location":2,"content":"Um, those datasets don't exist because"},{"from":503.17,"to":506.87,"location":2,"content":"NLP researchers have annotated texts for the purpose of training their models, right."},{"from":506.87,"to":509.75,"location":2,"content":"They exist because, er, in various settings,"},{"from":509.75,"to":513.2,"location":2,"content":"translation is done just because it's useful, so for example,"},{"from":513.2,"to":515.07,"location":2,"content":"proceedings of the European Parliament,"},{"from":515.07,"to":517.02,"location":2,"content":"um, proceedings of the United Nations,"},{"from":517.02,"to":521.32,"location":2,"content":"um, some, uh, news sites, they translate their articles into many languages."},{"from":521.32,"to":526.61,"location":2,"content":"Um, so really, the machine translation data we use to train our models are often"},{"from":526.61,"to":532.75,"location":2,"content":"more of byproducts of existing cases where translation is wanted rather than,"},{"from":532.75,"to":537.5,"location":2,"content":"um, kind of a full sampling of the sort of text we see in the world."},{"from":537.5,"to":538.91,"location":2,"content":"Um, so that means number one,"},{"from":538.91,"to":540.68,"location":2,"content":"it's quite limited in domain, right."},{"from":540.68,"to":543.58,"location":2,"content":"So it's not easy to find translated tweets,"},{"from":543.58,"to":545.41,"location":2,"content":"um, unless you happen to 
work for Twitter."},{"from":545.41,"to":548.14,"location":2,"content":"Um, in addition to that, um,"},{"from":548.14,"to":552.23,"location":2,"content":"there's limitations in terms of the languages that are covered, right."},{"from":552.23,"to":554.75,"location":2,"content":"So some languages, say European languages,"},{"from":554.75,"to":556.5,"location":2,"content":"there's a lot of translation data, um,"},{"from":556.5,"to":559.18,"location":2,"content":"for other languages there's much less."},{"from":559.18,"to":562.04,"location":2,"content":"Um, so in these settings where we want to work on"},{"from":562.04,"to":565.22,"location":2,"content":"a different domain or where we want to work with a low resource language,"},{"from":565.22,"to":568,"location":2,"content":"um, we're limited by labeled data, um,"},{"from":568,"to":570.99,"location":2,"content":"but what we can do is pretty easily find unlabeled data."},{"from":570.99,"to":573.62,"location":2,"content":"Um, so it's actually a pretty solved problem, um,"},{"from":573.62,"to":577.01,"location":2,"content":"maybe not 100%, but we can with good accuracy look at"},{"from":577.01,"to":581.2,"location":2,"content":"some text and decide what language it's in and train a classifier to do that."},{"from":581.2,"to":583.61,"location":2,"content":"Um, so this means it's really easy to find"},{"from":583.61,"to":586.1,"location":2,"content":"data in any language you care about because you can just go on"},{"from":586.1,"to":588.44,"location":2,"content":"the web and essentially search for data in"},{"from":588.44,"to":595.24,"location":2,"content":"that language and acquire a large corpus of monolingual data."},{"from":595.24,"to":600.77,"location":2,"content":"Okay, um, I'm now going into the first approach,"},{"from":600.77,"to":603.1,"location":2,"content":"um, I'm going to talk about on using"},{"from":603.1,"to":606.37,"location":2,"content":"unlabeled data to improve machine translation models."},{"from":606.37,"to":609.41,"location":2,"content":"Um, this technique is called pre-training and it's"},{"from":609.41,"to":612.79,"location":2,"content":"really reminiscent of ideas like, um, ELMo."},{"from":612.79,"to":616.58,"location":2,"content":"Um, the idea is to pre-train by doing language modeling."},{"from":616.58,"to":618.35,"location":2,"content":"So if we have, um,"},{"from":618.35,"to":621.35,"location":2,"content":"two languages we're interested in translating,"},{"from":621.35,"to":622.53,"location":2,"content":"um, from one end to the other,"},{"from":622.53,"to":627.48,"location":2,"content":"we'll collect large datasets for both of those languages and then we can train,"},{"from":627.48,"to":629.04,"location":2,"content":"uh, two language models,"},{"from":629.04,"to":633.37,"location":2,"content":"one each on that data and then, um,"},{"from":633.37,"to":634.49,"location":2,"content":"we can use those, uh,"},{"from":634.49,"to":638.45,"location":2,"content":"pre-trained language models as initialization for a machine translation system."},{"from":638.45,"to":641.72,"location":2,"content":"Um, so the encoder will get initialized with"},{"from":641.72,"to":645.49,"location":2,"content":"the weights of the language model trained on the source side language, um,"},{"from":645.49,"to":649.83,"location":2,"content":"the decoder will get initialized with weights trained on the target size language, uh,"},{"from":649.83,"to":651.23,"location":2,"content":"and this will, 
um,"},{"from":651.23,"to":655.49,"location":2,"content":"improve the performance of your model because during this pre-training, um,"},{"from":655.49,"to":659.75,"location":2,"content":"we hope that our language models will be learning useful information such as, you know,"},{"from":659.75,"to":662.46,"location":2,"content":"the meaning of words or, um, uh,"},{"from":662.46,"to":665.25,"location":2,"content":"the kind of structure of the language, um,"},{"from":665.25,"to":669.02,"location":2,"content":"they are processing, um, and this can, uh,"},{"from":669.02,"to":672.41,"location":2,"content":"down the line help the machine translation model,"},{"from":672.41,"to":675.02,"location":2,"content":"um, when we fine tune it."},{"from":675.02,"to":677.46,"location":2,"content":"Um, let me pause here and ask if there are any questions,"},{"from":677.46,"to":678.62,"location":2,"content":"and just in general, feel,"},{"from":678.62,"to":685.92,"location":2,"content":"feel free to ask questions throughout this talk. Okay."},{"from":685.92,"to":693.38,"location":2,"content":"So, so here is a plot showing some results of this pre-training technique."},{"from":693.38,"to":696.04,"location":2,"content":"Um, so this is English to German translation."},{"from":696.04,"to":699.8,"location":2,"content":"Uh, the x-axis is how much training data,"},{"from":699.8,"to":701.92,"location":2,"content":"as in unsupervised training data, um,"},{"from":701.92,"to":703.08,"location":2,"content":"you provide these models,"},{"from":703.08,"to":705.36,"location":2,"content":"but of course they also have large amounts"},{"from":705.36,"to":708.94,"location":2,"content":"of monolingual data for this pre-training step."},{"from":708.94,"to":711.97,"location":2,"content":"And you can see that this works pretty well, right?"},{"from":711.97,"to":714.45,"location":2,"content":"So you've got about two blue points, um,"},{"from":714.45,"to":717.67,"location":2,"content":"increase in performance, so that's this red line above the blue line,"},{"from":717.67,"to":720.17,"location":2,"content":"um, when doing this pre-training technique."},{"from":720.17,"to":721.69,"location":2,"content":"And not too surprisingly,"},{"from":721.69,"to":730.35,"location":2,"content":"this gain is especially large when the amount of labeled data is small."},{"from":730.35,"to":734.08,"location":2,"content":"Um, there is a problem with,"},{"from":734.08,"to":737.26,"location":2,"content":"uh, pre-training which I want to address, which is that, uh,"},{"from":737.26,"to":738.85,"location":2,"content":"in pre-training, you have"},{"from":738.85,"to":740.89,"location":2,"content":"these two separate language models and there's never"},{"from":740.89,"to":743.03,"location":2,"content":"really any interaction between the two,"},{"from":743.03,"to":745.78,"location":2,"content":"um, when you're running them on the unlabeled corpus."},{"from":745.78,"to":748.43,"location":2,"content":"Um, so here's a simple technique, um,"},{"from":748.43,"to":752.49,"location":2,"content":"that tries to solve this problem and it's called self-training."},{"from":752.49,"to":757.09,"location":2,"content":"Um, the idea is given a sentence from our monolingual corpus,"},{"from":757.09,"to":760.21,"location":2,"content":"so in this case, \"I traveled to Belgium,\" that's an English sentence."},{"from":760.21,"to":765.4,"location":2,"content":"Um, we won't have a human provided translation for this sentence, uh,"},{"from":765.4,"to":768.92,"location":2,"content":"but what 
we can do is run our machine translation model,"},{"from":768.92,"to":772.75,"location":2,"content":"and we'll get a translation in the target language."},{"from":772.75,"to":776.32,"location":2,"content":"Um, since this is from a machine learning model it won't be perfect, uh,"},{"from":776.32,"to":780.16,"location":2,"content":"but we can hope that maybe our model can still learn from this kind"},{"from":780.16,"to":783.58,"location":2,"content":"of noisy labeled example, right?"},{"from":783.58,"to":785.27,"location":2,"content":"So we treat, um,"},{"from":785.27,"to":788.23,"location":2,"content":"our original monolingual sentence and its machine-provided"},{"from":788.23,"to":792.49,"location":2,"content":"translation as though it were a human-provided translation and,"},{"from":792.49,"to":799.8,"location":2,"content":"uh, train our machine learning model as normal on this example."},{"from":799.8,"to":804.19,"location":2,"content":"Um, I think this seems pretty strange actually as"},{"from":804.19,"to":807.97,"location":2,"content":"a method when you first see it because it seems really circular, right?"},{"from":807.97,"to":811.32,"location":2,"content":"So if you look at this, um, the, uh,"},{"from":811.32,"to":813.85,"location":2,"content":"translation that the model is being trained to"},{"from":813.85,"to":818.1,"location":2,"content":"produce is actually exactly what it already produces to begin with,"},{"from":818.1,"to":823.42,"location":2,"content":"right, because, um, this translation came from our model in the first place."},{"from":823.42,"to":825.7,"location":2,"content":"Um, so actually in practice,"},{"from":825.7,"to":829.48,"location":2,"content":"this is not a technique that's very widely used due to this problem,"},{"from":829.48,"to":833.37,"location":2,"content":"um, but it motivates another technique called back-translation."},{"from":833.37,"to":836.74,"location":2,"content":"And this technique is really a very popular, um,"},{"from":836.74,"to":839.95,"location":2,"content":"solution to that problem, and it's the method, um,"},{"from":839.95,"to":844.24,"location":2,"content":"that has had a lot of success in using unlabeled data for translation."},{"from":844.24,"to":846.94,"location":2,"content":"So here's the approach: rather than only"},{"from":846.94,"to":850.86,"location":2,"content":"having our translation system that goes from source language to target language,"},{"from":850.86,"to":853.21,"location":2,"content":"um, we're also going to train a model that"},{"from":853.21,"to":856.38,"location":2,"content":"goes from our target language to our source language."},{"from":856.38,"to":858.67,"location":2,"content":"And so in this case, if,"},{"from":858.67,"to":861.34,"location":2,"content":"if at the end of the day we want a French to English model, um,"},{"from":861.34,"to":864.91,"location":2,"content":"we're gonna start by actually training an English to French model."},{"from":864.91,"to":867.88,"location":2,"content":"And then we can do something that's a lot like self-labeling."},{"from":867.88,"to":870.21,"location":2,"content":"So we take an English sentence."},{"from":870.21,"to":873.37,"location":2,"content":"We run our English to French model and translate it."},{"from":873.37,"to":875.95,"location":2,"content":"The difference from what we did before is that"},{"from":875.95,"to":878.5,"location":2,"content":"we're actually going to switch the source and target side."},{"from":878.5,"to":882.64,"location":2,"content":"So now in 
this case the French sentence is the source sequence."},{"from":882.64,"to":885.99,"location":2,"content":"Uh, the target sequence is, um,"},{"from":885.99,"to":890.74,"location":2,"content":"our original English sentence that came from monolingual corpora."},{"from":890.74,"to":892.16,"location":2,"content":"And now we're training the language, uh,"},{"from":892.16,"to":894.04,"location":2,"content":"the machine translation system that goes"},{"from":894.04,"to":897.26,"location":2,"content":"the other direction so that goes French to English."},{"from":897.26,"to":900.45,"location":2,"content":"Um, so, so why do we think this will work better?"},{"from":900.45,"to":902.32,"location":2,"content":"Um, number one, um,"},{"from":902.32,"to":905.23,"location":2,"content":"there's no longer this kind of circularity to the training"},{"from":905.23,"to":910.21,"location":2,"content":"because what the model is being trained on is the output of a completely different model."},{"from":910.21,"to":914.85,"location":2,"content":"Um, another thing that I think is pretty crucial here is that,"},{"from":914.85,"to":918.97,"location":2,"content":"um, the translations, the model is trained to produce."},{"from":918.97,"to":921.52,"location":2,"content":"So the things that the decoder is actually learning to"},{"from":921.52,"to":924.43,"location":2,"content":"generate are never bad translations, right?"},{"from":924.43,"to":926.57,"location":2,"content":"So if you look at this example,"},{"from":926.57,"to":929.54,"location":2,"content":"the target sequence for our French to English model,"},{"from":929.54,"to":931.16,"location":2,"content":"I traveled to Belgium, um,"},{"from":931.16,"to":934.64,"location":2,"content":"that originally came from a monolingual corpus."},{"from":934.64,"to":937.42,"location":2,"content":"Um, so I think intuitively this makes sense is"},{"from":937.42,"to":940.43,"location":2,"content":"that if we want to train a good translation model,"},{"from":940.43,"to":944.62,"location":2,"content":"um, it's probably okay to expose it to noisy inputs."},{"from":944.62,"to":947.51,"location":2,"content":"So we expose it to the output of a system that's English to French,"},{"from":947.51,"to":948.73,"location":2,"content":"it might not be perfect."},{"from":948.73,"to":952.33,"location":2,"content":"Um, but what we don't want to do is um, expose it to"},{"from":952.33,"to":954.85,"location":2,"content":"poor target sequences because then it"},{"from":954.85,"to":958.56,"location":2,"content":"won't learn how to generate in that language effectively."},{"from":958.56,"to":964.3,"location":2,"content":"Any questions on back-translation before I get to results? Um, sure."},{"from":964.3,"to":968.98,"location":2,"content":"[BACKGROUND]"},{"from":968.98,"to":971.5,"location":2,"content":"So this is assuming we have a large corpus of"},{"from":971.5,"to":977.33,"location":2,"content":"unlabeled data and we want to be using it to help our translation model."},{"from":977.33,"to":979.88,"location":2,"content":"Does that, does that make sense?"},{"from":979.88,"to":983.34,"location":2,"content":"Um, maybe you could clarify the question."},{"from":983.34,"to":989.16,"location":2,"content":"[BACKGROUND]"},{"from":989.16,"to":992.83,"location":2,"content":"Yeah, that's right. 
So we have a big corpus of English which includes the sentence,"},{"from":992.83,"to":996.19,"location":2,"content":"\"I traveled to Belgium,\" and we don't know the translations but we'd still like to"},{"from":996.19,"to":999.63,"location":2,"content":"use this data. Yeah, another question."},{"from":999.63,"to":1005.28,"location":2,"content":"[BACKGROUND]"},{"from":1005.28,"to":1007.11,"location":2,"content":"Yeah, so that's a good question is how do you"},{"from":1007.11,"to":1012.3,"location":2,"content":"avoid both the models let's say sort of blowing up and producing garbage?"},{"from":1012.3,"to":1014.4,"location":2,"content":"And then they're just feeding garbage to each other."},{"from":1014.4,"to":1017.82,"location":2,"content":"The answer is that there is some amount of labeled data here as well."},{"from":1017.82,"to":1020.82,"location":2,"content":"So on unlabeled data you do this, but on labeled data,"},{"from":1020.82,"to":1022.11,"location":2,"content":"you do standard training,"},{"from":1022.11,"to":1024.8,"location":2,"content":"and that way you avoid, you,"},{"from":1024.8,"to":1027.9,"location":2,"content":"you make sure you kind of keep the models on track because they still have to fit to"},{"from":1027.9,"to":1032.17,"location":2,"content":"the labeled data. Yeah, another question."},{"from":1032.17,"to":1035.47,"location":2,"content":"How do you schedule the training of the two models?"},{"from":1035.47,"to":1037.5,"location":2,"content":"Yeah, that is a good question."},{"from":1037.5,"to":1041.58,"location":2,"content":"And I think that's basically almost like a hyper-parameter you can tweak."},{"from":1041.58,"to":1045.72,"location":2,"content":"So I think a pretty common thing to do is first,"},{"from":1045.72,"to":1048.27,"location":2,"content":"train two models only on labeled data."},{"from":1048.27,"to":1052.96,"location":2,"content":"Then label, um, so then do back-translation"},{"from":1052.96,"to":1057.48,"location":2,"content":"over a large corpus and kind of repeat that process over and over again."},{"from":1057.48,"to":1060.16,"location":2,"content":"So each iteration, you train on the label data,"},{"from":1060.16,"to":1063.51,"location":2,"content":"label some unlabeled data and now you have more data to work with."},{"from":1063.51,"to":1066.27,"location":2,"content":"But I think there'd be many kinds of scheduling that would be effective"},{"from":1066.27,"to":1070.38,"location":2,"content":"here. Okay. 
Another question."},{"from":1070.38,"to":1086.1,"location":2,"content":"I'm curious as to the evaluation, considering if you have a very good French to English model, you could try to look up, or contest if you have a good French to English model, you could try to look up the original source and see if it matches."},{"from":1086.1,"to":1087.43,"location":2,"content":"Yeah, I'm not, I'm not quite sure."},{"from":1087.43,"to":1090.13,"location":2,"content":"Are you suggesting going like English to French to English and seeing if?"},{"from":1090.13,"to":1091.63,"location":2,"content":"I see, yeah, yeah,"},{"from":1091.63,"to":1092.78,"location":2,"content":"that's a really interesting idea."},{"from":1092.78,"to":1095.78,"location":2,"content":"And we're actually going to talk a little bit about this sort of,"},{"from":1095.78,"to":1097.29,"location":2,"content":"it's called cycle consistency,"},{"from":1097.29,"to":1100.97,"location":2,"content":"this idea later in this talk."},{"from":1100.97,"to":1103.77,"location":2,"content":"Okay, I'm going to move on to the results."},{"from":1103.77,"to":1108.12,"location":2,"content":"So, so here's the method for using unlabeled data to improve translation."},{"from":1108.12,"to":1109.89,"location":2,"content":"How well does it do?"},{"from":1109.89,"to":1113.22,"location":2,"content":"Um, the answer is that the improvements are at least to me, they"},{"from":1113.22,"to":1116.49,"location":2,"content":"were surprisingly extremely good, right?"},{"from":1116.49,"to":1119.44,"location":2,"content":"So, um, this is for English to German translation."},{"from":1119.44,"to":1124.52,"location":2,"content":"This is from some work by Facebook, so they used 5 million labeled sentence pairs."},{"from":1124.52,"to":1132.29,"location":2,"content":"But they also used 230 monolingual sentences, so sentences without translations."},{"from":1132.29,"to":1136.42,"location":2,"content":"And you can see that compared to previous state of the art,"},{"from":1136.42,"to":1139.76,"location":2,"content":"they get six BLEU points improvement which, um,"},{"from":1139.76,"to":1143.01,"location":2,"content":"if you compare it to most previous research and machine tran- machine translation"},{"from":1143.01,"to":1144.18,"location":2,"content":"is a really big gain, right?"},{"from":1144.18,"to":1148.02,"location":2,"content":"So even something like the invention of the transformer which most people would"},{"from":1148.02,"to":1153.16,"location":2,"content":"consider to be a really significant research development in NLP,"},{"from":1153.16,"to":1156.83,"location":2,"content":"that improved over prior work by about 2.5 BLEU points."},{"from":1156.83,"to":1162.33,"location":2,"content":"And here without doing any sort of fancy model design just by using way more data,"},{"from":1162.33,"to":1169.13,"location":2,"content":"um, we get actually much larger improvements."},{"from":1169.13,"to":1174.39,"location":2,"content":"Okay. 
So an interesting question to think about,"},{"from":1174.39,"to":1178.13,"location":2,"content":"um, is suppose we only have our monolingual corpora."},{"from":1178.13,"to":1181.15,"location":2,"content":"So we don't have any sentences that have been human translated."},{"from":1181.15,"to":1183.39,"location":2,"content":"We just have sentences in two languages."},{"from":1183.39,"to":1187.08,"location":2,"content":"Um, so the scenario you can sort of imagine is suppose,"},{"from":1187.08,"to":1188.98,"location":2,"content":"um, an alien comes down and,"},{"from":1188.98,"to":1190.74,"location":2,"content":"um, starts talking to you and it's a"},{"from":1190.74,"to":1193.96,"location":2,"content":"weird alien language, um, and it talks a lot,"},{"from":1193.96,"to":1198.12,"location":2,"content":"would you eventually be able to translate what it's saying to English,"},{"from":1198.12,"to":1203.3,"location":2,"content":"um, just by having a really large amount of data?"},{"from":1203.3,"to":1206.2,"location":2,"content":"Um, so I'm going to start with, um,"},{"from":1206.2,"to":1211.93,"location":2,"content":"a simpler task than full-on translating when you only have unlabeled sentences."},{"from":1211.93,"to":1215.22,"location":2,"content":"Um, instead of doing sentence to sentence translation,"},{"from":1215.22,"to":1218.64,"location":2,"content":"let's start by only worrying about word to word translation."},{"from":1218.64,"to":1221.49,"location":2,"content":"So the goal here is, given a word in one language,"},{"from":1221.49,"to":1225.33,"location":2,"content":"find its translation but without using any labeled data."},{"from":1225.33,"to":1227.1,"location":2,"content":"Um, and the method,"},{"from":1227.1,"to":1229.44,"location":2,"content":"the method we're going to use to try to solve"},{"from":1229.44,"to":1233.46,"location":2,"content":"this task is called, uh, cross-lingual embeddings."},{"from":1233.46,"to":1235.83,"location":2,"content":"Um, so the goal is to learn, uh,"},{"from":1235.83,"to":1239.27,"location":2,"content":"word vectors for words in both languages,"},{"from":1239.27,"to":1241.93,"location":2,"content":"and we'd like those word vectors to have"},{"from":1241.93,"to":1245.55,"location":2,"content":"all the nice properties you've already learned about word vectors having, um,"},{"from":1245.55,"to":1249.15,"location":2,"content":"but we also want the word vector for a word in one language,"},{"from":1249.15,"to":1252.86,"location":2,"content":"um, to be close to the word vector of its translation."},{"from":1252.86,"to":1257.09,"location":2,"content":"Um, so I'm not sure if it's visible in this figure, but this figure shows"},{"from":1257.09,"to":1262.47,"location":2,"content":"a large number of English and I think German words, and you can see that,"},{"from":1262.47,"to":1267.8,"location":2,"content":"um, uh, each English word has its corresponding German word,"},{"from":1267.8,"to":1270.33,"location":2,"content":"um, nearby to it in its embedding space."},{"from":1270.33,"to":1275.01,"location":2,"content":"So if we learn embeddings like this then it's pretty easy to do word to word translation."},{"from":1275.01,"to":1276.7,"location":2,"content":"Um, we just pick an English word,"},{"from":1276.7,"to":1278.55,"location":2,"content":"we find the nearest, uh,"},{"from":1278.55,"to":1282.08,"location":2,"content":"German word in this joint embedding space"},{"from":1282.08,"to":1288.47,"location":2,"content":"and that will give us a translation for the 
English word."},{"from":1288.47,"to":1292.18,"location":2,"content":"Um, our key method for or the key"},{"from":1292.18,"to":1295.5,"location":2,"content":"assumption that we're going to be using to solve this is that,"},{"from":1295.5,"to":1300.87,"location":2,"content":"um, th- even though if you run word2vec twice you'll get really different embeddings."},{"from":1300.87,"to":1306.93,"location":2,"content":"Um, the structure of that embedding space has a lot of regularity to it,"},{"from":1306.93,"to":1309.67,"location":2,"content":"and we can take advantage of that regularity, um,"},{"from":1309.67,"to":1311.7,"location":2,"content":"to help find when,"},{"from":1311.7,"to":1314.37,"location":2,"content":"um, an alignment between those embedding spaces."},{"from":1314.37,"to":1316.83,"location":2,"content":"So to be kind of more concrete here."},{"from":1316.83,"to":1319.56,"location":2,"content":"Here is a picture of two sets of word embeddings."},{"from":1319.56,"to":1320.82,"location":2,"content":"So in red, we have, um,"},{"from":1320.82,"to":1322.65,"location":2,"content":"English words, in, uh,"},{"from":1322.65,"to":1324.57,"location":2,"content":"blue we have Italian words,"},{"from":1324.57,"to":1329.28,"location":2,"content":"and although, um, the vector spaces right now look very different to each other,"},{"from":1329.28,"to":1332.4,"location":2,"content":"um, you can see that they have a really similar structure, right?"},{"from":1332.4,"to":1336.73,"location":2,"content":"So you'd imagine distances are kind of similar that the distance from,"},{"from":1336.73,"to":1339.35,"location":2,"content":"uh, cat and feline in the, um,"},{"from":1339.35,"to":1342.57,"location":2,"content":"English embedding space should be pretty similar to the distance"},{"from":1342.57,"to":1347.88,"location":2,"content":"between gatto and felino in the, um, Italian space."},{"from":1347.88,"to":1355.4,"location":2,"content":"Um, this kind of motivates an algorithm for learning these cross-lingual embeddings."},{"from":1355.4,"to":1358.44,"location":2,"content":"Um, so here's the idea."},{"from":1358.44,"to":1360.96,"location":2,"content":"What we're going to try to do is learn what's essentially"},{"from":1360.96,"to":1364.08,"location":2,"content":"a rotation such that we can transform,"},{"from":1364.08,"to":1366.66,"location":2,"content":"um, our set of English embeddings so"},{"from":1366.66,"to":1370.52,"location":2,"content":"that they match up with our Italian embe- embeddings."},{"from":1370.52,"to":1372.78,"location":2,"content":"So mathematically, what this means is we're gonna learn"},{"from":1372.78,"to":1375.66,"location":2,"content":"a matrix W such that if we take let's say,"},{"from":1375.66,"to":1380.36,"location":2,"content":"uh, the word vector for cat in English and we multiply it by W, um,"},{"from":1380.36,"to":1386.2,"location":2,"content":"we end up with the vector for gatto in Spanish or Italian,"},{"from":1386.2,"to":1389.55,"location":2,"content":"um, and a detail here is that, um,"},{"from":1389.55,"to":1392.58,"location":2,"content":"we're going to constrain W to be orthogonal, um,"},{"from":1392.58,"to":1395.07,"location":2,"content":"and what that means geometrically is just that W is"},{"from":1395.07,"to":1397.98,"location":2,"content":"only going to be doing a rotation to the,"},{"from":1397.98,"to":1399.94,"location":2,"content":"uh, vectors, um, in X."},{"from":1399.94,"to":1404.87,"location":2,"content":"It's not going to be doing some other weirder 
transformation."},{"from":1404.87,"to":1409.31,"location":2,"content":"So this is our goal is to learn this W. Um,"},{"from":1409.31,"to":1411,"location":2,"content":"next I'm gonna talk about,"},{"from":1411,"to":1416.98,"location":2,"content":"talking about how actually do we learn this W. Um,"},{"from":1416.98,"to":1421.66,"location":2,"content":"and there's actually a bunch of techniques for learning this W matrix,"},{"from":1421.66,"to":1424.74,"location":2,"content":"um, but, um, here is one of"},{"from":1424.74,"to":1428.31,"location":2,"content":"them that I think is quite clever is called adversarial training."},{"from":1428.31,"to":1430.63,"location":2,"content":"Um, so it works as follows,"},{"from":1430.63,"to":1433.77,"location":2,"content":"is in addition to trying to learn this W matrix,"},{"from":1433.77,"to":1437.67,"location":2,"content":"we're also going to be trying to learn a model that, uh,"},{"from":1437.67,"to":1438.91,"location":2,"content":"is called a discriminator,"},{"from":1438.91,"to":1442.8,"location":2,"content":"and what it'll do is take a vector and it will try to predict,"},{"from":1442.8,"to":1445.08,"location":2,"content":"is that vector originally, um,"},{"from":1445.08,"to":1448.83,"location":2,"content":"an English word embedding or is it originally an Italian word embedding?"},{"from":1448.83,"to":1451.42,"location":2,"content":"Um, in other words, if you think about, um,"},{"from":1451.42,"to":1454.92,"location":2,"content":"the diagram, what we're asking our discriminator to do is, uh,"},{"from":1454.92,"to":1457.68,"location":2,"content":"it's given one of these points and it's trying to predict is it"},{"from":1457.68,"to":1461.06,"location":2,"content":"basically a red point so an English word originally, or is it a blue point?"},{"from":1461.06,"to":1464.01,"location":2,"content":"Um, so if we have no W matrix and this is"},{"from":1464.01,"to":1467.19,"location":2,"content":"a really easy task for the discriminator because,"},{"from":1467.19,"to":1472.42,"location":2,"content":"um, the, uh, word embeddings for English and Italian are clearly separated."},{"from":1472.42,"to":1476.13,"location":2,"content":"Um, however, if we learn a W matrix"},{"from":1476.13,"to":1479.95,"location":2,"content":"that succeeds in aligning all these embeddings on top of each other,"},{"from":1479.95,"to":1483.27,"location":2,"content":"then our discriminator will never do a good job, right."},{"from":1483.27,"to":1486.21,"location":2,"content":"We can imagine it'll never really do better than 50%,"},{"from":1486.21,"to":1488.84,"location":2,"content":"um, because given a vector for say cat,"},{"from":1488.84,"to":1491.19,"location":2,"content":"it won't know is that the vector for cat that's been"},{"from":1491.19,"to":1494.13,"location":2,"content":"transformed by W or is it actually the vector for gatto?"},{"from":1494.13,"to":1498.88,"location":2,"content":"Um, because in this case those two vectors are aligned so they are on top of each other."},{"from":1498.88,"to":1503.71,"location":2,"content":"Um, so, um, during training, you first, um,"},{"from":1503.71,"to":1506.79,"location":2,"content":"you alternate between training the discriminator a little bit which"},{"from":1506.79,"to":1509.64,"location":2,"content":"means making sure it's as good as possible at"},{"from":1509.64,"to":1513.12,"location":2,"content":"distinguishing the English from Italian words and then you"},{"from":1513.12,"to":1516.93,"location":2,"content":"train the W and the goal for 
training W is to,"},{"from":1516.93,"to":1520.05,"location":2,"content":"uh, essentially confuse the discriminator as much as possible."},{"from":1520.05,"to":1523.21,"location":2,"content":"Um, so you want to have a situation where,"},{"from":1523.21,"to":1526.17,"location":2,"content":"um, you can't, um, with this machine learning model,"},{"from":1526.17,"to":1529.29,"location":2,"content":"figure out if a word embedding actually, um,"},{"from":1529.29,"to":1533.63,"location":2,"content":"was, um, originally from English or if it's an Italian word vector."},{"from":1533.63,"to":1536.09,"location":2,"content":"Um, and so at the end of the day you have,"},{"from":1536.09,"to":1539.42,"location":2,"content":"you have vectors that are kind of aligned with each other."},{"from":1539.42,"to":1547.22,"location":2,"content":"Um, any questions about this approach?"},{"from":1547.22,"to":1550.65,"location":2,"content":"Okay. Um, he- there's a link to a paper with more details."},{"from":1550.65,"to":1553.28,"location":2,"content":"There's actually kind of a range of other tricks you can do,"},{"from":1553.28,"to":1558.44,"location":2,"content":"um, but this is kind of a key idea."},{"from":1558.44,"to":1564.81,"location":2,"content":"Um, okay. So that was doing word to word unsupervised translation."},{"from":1564.81,"to":1569.15,"location":2,"content":"Um, how do we do full sentence to sentence translation?"},{"from":1569.15,"to":1571.72,"location":2,"content":"Um, so we're going to use, um,"},{"from":1571.72,"to":1573.75,"location":2,"content":"a standard sort of seq2seq model,"},{"from":1573.75,"to":1576.66,"location":2,"content":"um, without even an attention mechanism."},{"from":1576.66,"to":1579.9,"location":2,"content":"Um, there's one change to the standard seq2seq"},{"from":1579.9,"to":1583.05,"location":2,"content":"model going on here which is that, um,"},{"from":1583.05,"to":1585.78,"location":2,"content":"we're going to use the same encoder and decoder,"},{"from":1585.78,"to":1590.16,"location":2,"content":"uh, regardless of the input and output languages."},{"from":1590.16,"to":1591.93,"location":2,"content":"So you can see, um,"},{"from":1591.93,"to":1593.34,"location":2,"content":"in this example, um,"},{"from":1593.34,"to":1595.82,"location":2,"content":"we could give the encoder an English sentence,"},{"from":1595.82,"to":1600.36,"location":2,"content":"we could also give it a French sentence and it'll have these cross-lingual embeddings."},{"from":1600.36,"to":1603.26,"location":2,"content":"So it'll have vector representations for English words"},{"from":1603.26,"to":1607.14,"location":2,"content":"and French words which means it can handle sort of any input."},{"from":1607.14,"to":1609.38,"location":2,"content":"Um, for the decoder,"},{"from":1609.38,"to":1612.93,"location":2,"content":"we need to give it some information about what language is it supposed to generate in."},{"from":1612.93,"to":1614.95,"location":2,"content":"Is it going to generate in French or English?"},{"from":1614.95,"to":1618.66,"location":2,"content":"Um, so the way that is done is by, uh,"},{"from":1618.66,"to":1621.91,"location":2,"content":"feeding in a special token which here is Fr"},{"from":1621.91,"to":1625.59,"location":2,"content":"in brack- brackets to represent French that tells the model,"},{"from":1625.59,"to":1627.97,"location":2,"content":"okay, you should generate in French now."},{"from":1627.97,"to":1631.38,"location":2,"content":"Um, here in this figure it's only 
French,"},{"from":1631.38,"to":1633.97,"location":2,"content":"but you could imagine also feeding this model, uh,"},{"from":1633.97,"to":1637.63,"location":2,"content":"English in brackets, and then that'll tell it to, uh, generate English."},{"from":1637.63,"to":1641.67,"location":2,"content":"And one thing that you can see is that you could use this sort of model to g enerate,"},{"from":1641.67,"to":1643.15,"location":2,"content":"do go from English to French."},{"from":1643.15,"to":1645.45,"location":2,"content":"You could also use this model as an auto-encoder, right."},{"from":1645.45,"to":1647.3,"location":2,"content":"So, uh, at the bottom, um,"},{"from":1647.3,"to":1651.51,"location":2,"content":"it's taking in a French sentence as input and it's just generating French as"},{"from":1651.51,"to":1658.85,"location":2,"content":"output which here means just reproducing the original input sequence."},{"from":1658.85,"to":1663.1,"location":2,"content":"Um, so just a small change to standard seq2seq."},{"from":1663.1,"to":1666.77,"location":2,"content":"Here's how we're going to train the seq2seq model."},{"from":1666.77,"to":1670.17,"location":2,"content":"Um, there's going to be two training objectives, um,"},{"from":1670.17,"to":1671.94,"location":2,"content":"and I'll explain sort of why they're, uh,"},{"from":1671.94,"to":1675.06,"location":2,"content":"present in this model in just a few slides."},{"from":1675.06,"to":1677.03,"location":2,"content":"For now let's just say what they are."},{"from":1677.03,"to":1679.11,"location":2,"content":"So the first one is, um,"},{"from":1679.11,"to":1681.16,"location":2,"content":"called a de-noising autoencoder."},{"from":1681.16,"to":1686.43,"location":2,"content":"Um, what we're going to train our model to do in this case is take a, uh, sentence."},{"from":1686.43,"to":1688.14,"location":2,"content":"So, um, and here it's going to be"},{"from":1688.14,"to":1690.8,"location":2,"content":"an English sentence but it could also be a French sentence."},{"from":1690.8,"to":1694.17,"location":2,"content":"Um, we're going to scramble up the words a little bit,"},{"from":1694.17,"to":1696.88,"location":2,"content":"and then we're going to ask the model to, uh,"},{"from":1696.88,"to":1700.56,"location":2,"content":"de-noise that sentence which in other words means"},{"from":1700.56,"to":1705.35,"location":2,"content":"regenerating what the sentence actually was before it was scrambled."},{"from":1705.35,"to":1711.74,"location":2,"content":"And, uh, maybe one idea of why this would be a useful training objective is that,"},{"from":1711.74,"to":1715.51,"location":2,"content":"uh, since we have an encoder-decoder without atten- attention,"},{"from":1715.51,"to":1721.78,"location":2,"content":"the encoder is converting the entirety of the source sentence into a single vector,"},{"from":1721.78,"to":1727.11,"location":2,"content":"what an auto-encoder does is ensure that that vector contains all the information about"},{"from":1727.11,"to":1732.39,"location":2,"content":"the sentence such that we are able to recover what the original sentence was,"},{"from":1732.39,"to":1737.96,"location":2,"content":"um, from the vector produced by the encoder."},{"from":1737.96,"to":1740.81,"location":2,"content":"Um, so that was objective 1."},{"from":1740.81,"to":1745.01,"location":2,"content":"Training objective 2 is now we're actually going to be trying to do a translation,"},{"from":1745.01,"to":1747.48,"location":2,"content":"um, but, um, as 
before,"},{"from":1747.48,"to":1749.78,"location":2,"content":"we're going to be using this back-translation idea."},{"from":1749.78,"to":1752.97,"location":2,"content":"So remember, we only have unlabeled sentences,"},{"from":1752.97,"to":1756.02,"location":2,"content":"we don't have any human-provided translations,"},{"from":1756.02,"to":1759.75,"location":2,"content":"um, but what we can still do is, given, a,"},{"from":1759.75,"to":1762,"location":2,"content":"um, let's say an English sentence or let's say a French sentence,"},{"from":1762,"to":1764.51,"location":2,"content":"given a French sentence, we can translate it to English, um,"},{"from":1764.51,"to":1768.12,"location":2,"content":"using our model in its current state, uh,"},{"from":1768.12,"to":1772.61,"location":2,"content":"and then we can ask that model to translate from English or translate that- yeah,"},{"from":1772.61,"to":1774.69,"location":2,"content":"translate that English back into French."},{"from":1774.69,"to":1777.11,"location":2,"content":"Um, so what you can imagine is in this setting, um,"},{"from":1777.11,"to":1779.64,"location":2,"content":"the input sequence is going to be somewhat messed"},{"from":1779.64,"to":1782.82,"location":2,"content":"up because it's the output of our imperfect machine learning model."},{"from":1782.82,"to":1787.05,"location":2,"content":"So here the input sequence is just \"I am student,\" um, a word has been dropped,"},{"from":1787.05,"to":1791.82,"location":2,"content":"but, um, we're now gonna train it to, even with this kind of bad input,"},{"from":1791.82,"to":1795.33,"location":2,"content":"to reproduce the original, um,"},{"from":1795.33,"to":1798.27,"location":2,"content":"French sentence, um, from our,"},{"from":1798.27,"to":1801.27,"location":2,"content":"uh, corpus of- of monolingual, um, French text."},{"from":1801.27,"to":1808.91,"location":2,"content":"[NOISE] Um, let me- let me pause here actually and ask for questions."},{"from":1808.91,"to":1813.84,"location":2,"content":"Sure."},{"from":1813.84,"to":1816,"location":2,"content":"[NOISE] [inaudible] What if, um, the reason you have"},{"from":1816,"to":1820.31,"location":2,"content":"this orthogonality constraint for your words to be word embedding,"},{"from":1820.31,"to":1822.9,"location":2,"content":"is it to avoid overfitting?"},{"from":1822.9,"to":1829.8,"location":2,"content":"Have you tried to take that off, and you know, see what [inaudible]"},{"from":1829.8,"to":1831.06,"location":2,"content":"Yeah. That's a good question."},{"from":1831.06,"to":1835.31,"location":2,"content":"Um, so this is going back to earlier when there was a word-word translation."},{"from":1835.31,"to":1839.33,"location":2,"content":"Why would we constrain that W matrix to be orthogonal?"},{"from":1839.33,"to":1843.03,"location":2,"content":"Um, essentially, that's right. 
It's to avoid overfitting and in particular,"},{"from":1843.03,"to":1846.06,"location":2,"content":"it's making this assumption that our embedding spaces are so"},{"from":1846.06,"to":1850.01,"location":2,"content":"similar that there's actually just a rotation that distinguishes,"},{"from":1850.01,"to":1853.5,"location":2,"content":"um, our word vectors in English versus our word vectors in Italian."},{"from":1853.5,"to":1857.46,"location":2,"content":"Um, I think there has been, um,"},{"from":1857.46,"to":1861.36,"location":2,"content":"there have been results that don't include that orthogonality constraint,"},{"from":1861.36,"to":1864.48,"location":2,"content":"and I think it slightly hurts performance to not have that in there."},{"from":1864.48,"to":1869.13,"location":2,"content":"[NOISE] Okay."},{"from":1869.13,"to":1871.15,"location":2,"content":"Um, so- so continuing with,"},{"from":1871.15,"to":1873.77,"location":2,"content":"um, unsupervised machine translation,"},{"from":1873.77,"to":1877.29,"location":2,"content":"um, I- I gave a training method."},{"from":1877.29,"to":1880.39,"location":2,"content":"I didn't quite explain why it would work, so- so,"},{"from":1880.39,"to":1884.79,"location":2,"content":"um, here is some more intuition for- for this idea."},{"from":1884.79,"to":1887.73,"location":2,"content":"Um, so remember, um,"},{"from":1887.73,"to":1889.66,"location":2,"content":"we're going to initialize"},{"from":1889.66,"to":1893.26,"location":2,"content":"our machine translation model with these cross-lingual embeddings,"},{"from":1893.26,"to":1897.02,"location":2,"content":"which mean the English and French word should look close to identically."},{"from":1897.02,"to":1902.57,"location":2,"content":"Um, we're also using the shared, um, encoder."},{"from":1902.57,"to":1904.86,"location":2,"content":"Um, so that means if you think about it,"},{"from":1904.86,"to":1906.64,"location":2,"content":"um, at the top, we have just,"},{"from":1906.64,"to":1911.76,"location":2,"content":"a auto-encoding objective and we can certainly believe that our model can learn this."},{"from":1911.76,"to":1914.25,"location":2,"content":"Um, it's a pretty simple task."},{"from":1914.25,"to":1919.39,"location":2,"content":"Um, now imagine we're giving our model a French sentence as input instead."},{"from":1919.39,"to":1921.56,"location":2,"content":"Um, since the, uh,"},{"from":1921.56,"to":1923.85,"location":2,"content":"embeddings are going to look pretty similar,"},{"from":1923.85,"to":1926.19,"location":2,"content":"and since the encoder is the same, um,"},{"from":1926.19,"to":1929.76,"location":2,"content":"it's pretty likely that the model's representation of"},{"from":1929.76,"to":1931.95,"location":2,"content":"this French sentence should actually be very"},{"from":1931.95,"to":1935.52,"location":2,"content":"similar to the representation of the English sentence."},{"from":1935.52,"to":1939.87,"location":2,"content":"Um, so when this representation is passed into the decoder, um,"},{"from":1939.87,"to":1945.2,"location":2,"content":"we can hope that we'll get the same output as before."},{"from":1945.2,"to":1948.49,"location":2,"content":"Um, um, so here's like sort of as a starting point."},{"from":1948.49,"to":1950.87,"location":2,"content":"We- we can hope that our model, um,"},{"from":1950.87,"to":1953.43,"location":2,"content":"already is able to have some translation capability."},{"from":1953.43,"to":1957.84,"location":2,"content":"[NOISE] Um, another way of thinking about this 
is"},{"from":1957.84,"to":1962.36,"location":2,"content":"that what we really want our model to do is to be able to encode a sentence,"},{"from":1962.36,"to":1964.34,"location":2,"content":"such that the representation,"},{"from":1964.34,"to":1967.41,"location":2,"content":"um, is sort of a universal kind of Interlingua."},{"from":1967.41,"to":1969.88,"location":2,"content":"So a universal, um, uh,"},{"from":1969.88,"to":1973.68,"location":2,"content":"universal representation of that sentence that doesn't,"},{"from":1973.68,"to":1976.24,"location":2,"content":"uh, that's not specific to the language."},{"from":1976.24,"to":1979.79,"location":2,"content":"And so- so here's kind of a picture that's trying to get at this."},{"from":1979.79,"to":1983.16,"location":2,"content":"So our autoencoder, um, and our, um,"},{"from":1983.16,"to":1985.29,"location":2,"content":"here in our back-translation example,"},{"from":1985.29,"to":1987.21,"location":2,"content":"um, here, the target sequence is the same."},{"from":1987.21,"to":1990.09,"location":2,"content":"[NOISE] Um, so what that essentially means is"},{"from":1990.09,"to":1994.2,"location":2,"content":"that the vectors for the English sentence and the French sentence,"},{"from":1994.2,"to":1997.41,"location":2,"content":"um, are going to be trained to be the same, um, right?"},{"from":1997.41,"to":1999.64,"location":2,"content":"Because if they are different, our, uh,"},{"from":1999.64,"to":2001.52,"location":2,"content":"decoder would be generating different,"},{"from":2001.52,"to":2005.04,"location":2,"content":"uh, outputs on these two examples."},{"from":2005.04,"to":2009.64,"location":2,"content":"Um, so here- this is just another sort of intuition is that what our model is"},{"from":2009.64,"to":2011.29,"location":2,"content":"trying to learn here is kind of a way of"},{"from":2011.29,"to":2013.87,"location":2,"content":"encoding the information of a sentence in a vector,"},{"from":2013.87,"to":2017.1,"location":2,"content":"um, but in a way that is language-agnostic."},{"from":2017.1,"to":2019.46,"location":2,"content":"Um, any more questions about,"},{"from":2019.46,"to":2024.22,"location":2,"content":"uh, unsupervised machine translation?"},{"from":2024.22,"to":2030.35,"location":2,"content":"Okay. Um, so going on to results of this approach, um,"},{"from":2030.35,"to":2033.02,"location":2,"content":"here, the horizontal lines are,"},{"from":2033.02,"to":2036.86,"location":2,"content":"um, the results of an unsupervised machine translation model."},{"from":2036.86,"to":2040.78,"location":2,"content":"Um, the lines that go up are for a supervised machine translation model,"},{"from":2040.78,"to":2043.9,"location":2,"content":"um, as we give it more and more data."},{"from":2043.9,"to":2046.3,"location":2,"content":"Right? 
So unsurprisingly, um,"},{"from":2046.3,"to":2049.41,"location":2,"content":"given a large amount of supervised data, um,"},{"from":2049.41,"to":2051.79,"location":2,"content":"the supervised machine translation models"},{"from":2051.79,"to":2055.72,"location":2,"content":"work much better than the unsupervised machine translation model."},{"from":2055.72,"to":2059.28,"location":2,"content":"Um, but, um, the unsupervised machine translation model,"},{"from":2059.28,"to":2061.31,"location":2,"content":"actually still does quite well."},{"from":2061.31,"to":2066.99,"location":2,"content":"Um, so if you see it around 10,000 to 100,000 training examples,"},{"from":2066.99,"to":2070.57,"location":2,"content":"um, it actually does just as well or better than supervised translation,"},{"from":2070.57,"to":2073.58,"location":2,"content":"and I think that's a really promising result,"},{"from":2073.58,"to":2076.64,"location":2,"content":"uh, because if you think of, um,"},{"from":2076.64,"to":2079.55,"location":2,"content":"low-resource settings where there isn't much labeled examples, um,"},{"from":2079.55,"to":2082.28,"location":2,"content":"it suddenly becomes really nice that you can perform this well,"},{"from":2082.28,"to":2088.69,"location":2,"content":"um, without even needing to use a training set."},{"from":2088.69,"to":2091.74,"location":2,"content":"Um, another thing kind of fun you can do with,"},{"from":2091.74,"to":2095.2,"location":2,"content":"an unsupervised machine translation model is attribute transfer."},{"from":2095.2,"to":2098.86,"location":2,"content":"Um, so basically, you can, um, take, uh,"},{"from":2098.86,"to":2100.52,"location":2,"content":"collections of texts that,"},{"from":2100.52,"to":2103.19,"location":2,"content":"uh, split by any attribute you want."},{"from":2103.19,"to":2104.9,"location":2,"content":"So for example, you could go on Twitter,"},{"from":2104.9,"to":2108.65,"location":2,"content":"look at hashtags to decide which tweets are annoyed and which tweets are relaxed,"},{"from":2108.65,"to":2111.08,"location":2,"content":"and then you can treat those two corpora as"},{"from":2111.08,"to":2113.61,"location":2,"content":"text as though they were two different languages,"},{"from":2113.61,"to":2116.51,"location":2,"content":"and you can train an unsupervised machine translation model,"},{"from":2116.51,"to":2119.16,"location":2,"content":"uh, to convert from one to the other."},{"from":2119.16,"to":2122.49,"location":2,"content":"Uh, and you can see these examples, um,"},{"from":2122.49,"to":2126.65,"location":2,"content":"the model actually does a pretty good job of sort of minimally changing the sentence,"},{"from":2126.65,"to":2129.68,"location":2,"content":"kind of preserving a lot of that sentence's original semantics,"},{"from":2129.68,"to":2136.54,"location":2,"content":"um, such that the target attribute is changed."},{"from":2136.54,"to":2141.29,"location":2,"content":"Um, I also wanna throw a little bit of cold water on this idea."},{"from":2141.29,"to":2144.41,"location":2,"content":"So I do think it's really exciting and- and almost kind of"},{"from":2144.41,"to":2147.65,"location":2,"content":"mind-blowing that you can do this translation without labeled data."},{"from":2147.65,"to":2149.6,"location":2,"content":"Um, certainly, right."},{"from":2149.6,"to":2154.52,"location":2,"content":"It's really hard to imagine someone giving me a bunch of books in Italian and say, \"Okay."},{"from":2154.52,"to":2156.41,"location":2,"content":"We're in 
Italian,\" um, without, you know,"},{"from":2156.41,"to":2159.76,"location":2,"content":"teaching you how to specifically do the translation."},{"from":2159.76,"to":2163.95,"location":2,"content":"Um, but, um, even though these methods show promise,"},{"from":2163.95,"to":2168.09,"location":2,"content":"um, mostly they have shown promise on languages that are quite closely related."},{"from":2168.09,"to":2169.78,"location":2,"content":"So those previous results,"},{"from":2169.78,"to":2171.07,"location":2,"content":"those were all, um,"},{"from":2171.07,"to":2173.75,"location":2,"content":"some combination of English to French or English to German,"},{"from":2173.75,"to":2176.27,"location":2,"content":"um, or so on, and those languages are quite similar."},{"from":2176.27,"to":2178.28,"location":2,"content":"[NOISE] Um, so if you look at, uh,"},{"from":2178.28,"to":2180.32,"location":2,"content":"a different language pair, let's say English to Turkish,"},{"from":2180.32,"to":2184.68,"location":2,"content":"where, um, the linguistics in those two languages are quite different, uh,"},{"from":2184.68,"to":2187.61,"location":2,"content":"these methods do still work to some extent, um,"},{"from":2187.61,"to":2190.91,"location":2,"content":"so they get around five BLEU points let's say, uh,"},{"from":2190.91,"to":2193.19,"location":2,"content":"but they don't work nearly as well,"},{"from":2193.19,"to":2195.89,"location":2,"content":"um, as they do in the f- uh, i- in the other settings, right?"},{"from":2195.89,"to":2200.24,"location":2,"content":"So there's still a huge gap to purely supervised learning. Um, right?"},{"from":2200.24,"to":2201.35,"location":2,"content":"So we're probably not, you know,"},{"from":2201.35,"to":2205.04,"location":2,"content":"quite at this stage where an alien could come down and it's sort of, no problem,"},{"from":2205.04,"to":2208.22,"location":2,"content":"let's use our unsupervised machine translation system, um,"},{"from":2208.22,"to":2212.4,"location":2,"content":"but I still think that's pretty exciting progress. Um, yeah. Question?"},{"from":2212.4,"to":2215.27,"location":2,"content":"Um, so what you're saying is that the genealogy of"},{"from":2215.27,"to":2218.63,"location":2,"content":"a language might need it to superimpose worse, right?"},{"from":2218.63,"to":2221.51,"location":2,"content":"Because my original thought was that if you took, for example,"},{"from":2221.51,"to":2224.81,"location":2,"content":"like Latin, which doesn't have a word for, you know,"},{"from":2224.81,"to":2231.44,"location":2,"content":"the modern classification of car, I thought that would do more poorly. 
But if- but, uh, basically,"},{"from":2231.44,"to":2235.28,"location":2,"content":"what I'm asking is, do you think the English maps better to Latin"},{"from":2235.28,"to":2240.38,"location":2,"content":"because they're both related, and worse to Turkish or is it the other way around?"},{"from":2240.38,"to":2245.64,"location":2,"content":"Um, I would expect English to map quite a lot better to Latin."},{"from":2245.64,"to":2248.93,"location":2,"content":"And I think part of the issue here is that, um,"},{"from":2248.93,"to":2253.46,"location":2,"content":"the difficulty in translation I think is not really at the word level."},{"from":2253.46,"to":2255.41,"location":2,"content":"So I mean that certainly is an issue that words exist"},{"from":2255.41,"to":2257.49,"location":2,"content":"in one language that don't exist in another,"},{"from":2257.49,"to":2258.74,"location":2,"content":"um, but I think actually,"},{"from":2258.74,"to":2263.2,"location":2,"content":"more substantial differences between language is at the level of like syntax,"},{"from":2263.2,"to":2265.82,"location":2,"content":"um, um, or you know, semantics, right?"},{"from":2265.82,"to":2267.41,"location":2,"content":"How ideas are expressed."},{"from":2267.41,"to":2273.51,"location":2,"content":"Um, so- so I think I- I would expect Ital- Latin to have, you know,"},{"from":2273.51,"to":2276.02,"location":2,"content":"relatively similar syntax to English,"},{"from":2276.02,"to":2277.58,"location":2,"content":"um, compared to say Turkish,"},{"from":2277.58,"to":2279.86,"location":2,"content":"I imagine that is probably the bigger obstacle"},{"from":2279.86,"to":2287.19,"location":2,"content":"for unsupervised machine translation models."},{"from":2287.19,"to":2290.26,"location":2,"content":"Um, I'm going to really quickly go into"},{"from":2290.26,"to":2294.91,"location":2,"content":"this last recent research paper which is basically taking BERT which,"},{"from":2294.91,"to":2297.26,"location":2,"content":"which you've learned about, um, correct?"},{"from":2297.26,"to":2300.19,"location":2,"content":"Yes. Okay. 
And making it cross-lingual."},{"from":2300.19,"to":2303.73,"location":2,"content":"Um, so, um, here's what regular BERT is, right?"},{"from":2303.73,"to":2306.3,"location":2,"content":"We have a sequence of sentences in English."},{"from":2306.3,"to":2308.22,"location":2,"content":"We're going to mask out some of the words."},{"from":2308.22,"to":2311.5,"location":2,"content":"And we're going to ask BERT which is our transformer model, um,"},{"from":2311.5,"to":2316.68,"location":2,"content":"to essentially fill in the blanks and predict what were the words that were dropped out."},{"from":2316.68,"to":2322.99,"location":2,"content":"Um, what actually has already been done by Google is training a multilingual BERT."},{"from":2322.99,"to":2326.84,"location":2,"content":"So what they did essentially is concatenate, um,"},{"from":2326.84,"to":2331.56,"location":2,"content":"a whole bunch of corpora in different languages and then train one model um,"},{"from":2331.56,"to":2334.78,"location":2,"content":"using this masked LM objective um,"},{"from":2334.78,"to":2336.32,"location":2,"content":"on all of that text at once."},{"from":2336.32,"to":2338.3,"location":2,"content":"And that's a publicly released model."},{"from":2338.3,"to":2343.49,"location":2,"content":"Um, the, the new kind of extension to this that has recently been uh,"},{"from":2343.49,"to":2346.3,"location":2,"content":"proposed by Facebook is to actually combine"},{"from":2346.3,"to":2350.97,"location":2,"content":"this masked LM training objective um, with uh, translation."},{"from":2350.97,"to":2357.13,"location":2,"content":"So what they do is sometimes give this model, in this case,"},{"from":2357.13,"to":2361.06,"location":2,"content":"a sequence in English and a sequence in uh, French."},{"from":2361.06,"to":2364.3,"location":2,"content":"Um, drop out some of the words and just as before,"},{"from":2364.3,"to":2366.13,"location":2,"content":"ask the model to fill it in."},{"from":2366.13,"to":2368.64,"location":2,"content":"And the motivation here is that, um,"},{"from":2368.64,"to":2371.08,"location":2,"content":"this will much better cause the model"},{"from":2371.08,"to":2373.53,"location":2,"content":"to understand the relation between these two languages."},{"from":2373.53,"to":2377.95,"location":2,"content":"Because if you're trying to fill in an English word that's been dropped,"},{"from":2377.95,"to":2380.5,"location":2,"content":"uh, the best way to do it if you have a translation is look"},{"from":2380.5,"to":2383.01,"location":2,"content":"at the French side and try to find that word."},{"from":2383.01,"to":2385.07,"location":2,"content":"Hopefully, that one hasn't been dropped as well."},{"from":2385.07,"to":2388.53,"location":2,"content":"And then you can um, much more easily fill in the blank."},{"from":2388.53,"to":2392.57,"location":2,"content":"And uh, this actually leads to very uh,"},{"from":2392.57,"to":2395.86,"location":2,"content":"substantial improvements in unsupervised machine translation."},{"from":2395.86,"to":2399.67,"location":2,"content":"So just like BERT is used for other tasks in NLP,"},{"from":2399.67,"to":2402.01,"location":2,"content":"they basically take this cross-lingual BERT."},{"from":2402.01,"to":2403.93,"location":2,"content":"They use it as initialization for"},{"from":2403.93,"to":2407.41,"location":2,"content":"an unsupervised machine translation system and they get, you know,"},{"from":2407.41,"to":2410.43,"location":2,"content":"really large gains on the 
order of 10 BLEU points um,"},{"from":2410.43,"to":2412.69,"location":2,"content":"such that the gap between"},{"from":2412.69,"to":2416.49,"location":2,"content":"unsupervised machine translation and the current supervised state of the art,"},{"from":2416.49,"to":2418.42,"location":2,"content":"um, is much smaller."},{"from":2418.42,"to":2423.19,"location":2,"content":"Uh, so this is a pretty recent idea but I think it also shows promise"},{"from":2423.19,"to":2428.32,"location":2,"content":"in really improving the quality of translation through using unlabeled data."},{"from":2428.32,"to":2430.95,"location":2,"content":"Um, although I guess yeah, I guess in this case with BERT"},{"from":2430.95,"to":2433.93,"location":2,"content":"they are using labeled translation data as well."},{"from":2433.93,"to":2437.82,"location":2,"content":"Any, any questions about this?"},{"from":2437.82,"to":2446.35,"location":2,"content":"Okay. Um, so that is all I'm going to say about using unlabeled data for translation."},{"from":2446.35,"to":2448.75,"location":2,"content":"The next part of this talk is about um,"},{"from":2448.75,"to":2454.72,"location":2,"content":"what happens if we really scale up these unsupervised language models."},{"from":2454.72,"to":2459.86,"location":2,"content":"Um, so in particular I'm gonna talk about GPT-2 which is a new model by OpenAI."},{"from":2459.86,"to":2462.26,"location":2,"content":"That's essentially a really giant language model"},{"from":2462.26,"to":2465.68,"location":2,"content":"and I think it has some interesting implications."},{"from":2465.68,"to":2475.06,"location":2,"content":"So first of all, here's just the sizes of a bunch of different NLP models and,"},{"from":2475.06,"to":2478.16,"location":2,"content":"um, you know, maybe a couple years ago the,"},{"from":2478.16,"to":2479.23,"location":2,"content":"the standard sort of"},{"from":2479.23,"to":2484.14,"location":2,"content":"LSTM medium-size model was on the order of about 10 million parameters."},{"from":2484.14,"to":2490.66,"location":2,"content":"Where 10- where a parameter is just a single weight let's say in the neural net um,"},{"from":2490.66,"to":2493.09,"location":2,"content":"ELMo and uh, GPT,"},{"from":2493.09,"to":2495.52,"location":2,"content":"so the original OpenAI paper before they did"},{"from":2495.52,"to":2498.82,"location":2,"content":"this GPT-2, were about 10 times bigger than that."},{"from":2498.82,"to":2504.12,"location":2,"content":"Um, GPT-2 is about another order of magnitude bigger."},{"from":2504.12,"to":2508.82,"location":2,"content":"Um, one kind of interesting comparison point here is that uh,"},{"from":2508.82,"to":2511.74,"location":2,"content":"GPT-2 which is 1.5 billion parameters,"},{"from":2511.74,"to":2515.64,"location":2,"content":"actually has more parameters than a honey bee brain has synapses."},{"from":2515.64,"to":2518.44,"location":2,"content":"Um, so that sounds kind of impressive, right?"},{"from":2518.44,"to":2521.35,"location":2,"content":"You know honeybees are not the smartest of"},{"from":2521.35,"to":2525.32,"location":2,"content":"animals but they can still fly around and find nectar or whatever."},{"from":2525.32,"to":2528.76,"location":2,"content":"Um, but yeah. 
Of course, this isn't really an apples to apples comparison, right?"},{"from":2528.76,"to":2531.97,"location":2,"content":"So a synapse and a weight in a neural net are really quite different."},{"from":2531.97,"to":2534.49,"location":2,"content":"But I just think it's one kind of interesting milestone"},{"from":2534.49,"to":2536.82,"location":2,"content":"let's say in terms of model size um,"},{"from":2536.82,"to":2538.15,"location":2,"content":"that has been surpassed."},{"from":2538.15,"to":2546.84,"location":2,"content":"[NOISE] Um, one thing to point out here is that um,"},{"from":2546.84,"to":2552.13,"location":2,"content":"this increasing scaling of deep learning is really a general trend uh,"},{"from":2552.13,"to":2554.84,"location":2,"content":"in all of machine learning so beyond NLP."},{"from":2554.84,"to":2561.76,"location":2,"content":"So this plot is showing time on the x-axis and the y-axis is log scaled um,"},{"from":2561.76,"to":2565.26,"location":2,"content":"the amount of petaFLOPS used to train this model."},{"from":2565.26,"to":2570.01,"location":2,"content":"Um, so what this means is that the trend at least currently is that there is"},{"from":2570.01,"to":2573.13,"location":2,"content":"exponential growth in how much compute power"},{"from":2573.13,"to":2575.74,"location":2,"content":"we're throwing at our machine learning models."},{"from":2575.74,"to":2577.92,"location":2,"content":"I guess it is kind of unclear, you know,"},{"from":2577.92,"to":2580.7,"location":2,"content":"will exponential growth continue but certainly um,"},{"from":2580.7,"to":2583.56,"location":2,"content":"there's rapid growth in the size of our models."},{"from":2583.56,"to":2586.2,"location":2,"content":"And it's leading to some really amazing results, right?"},{"from":2586.2,"to":2589.45,"location":2,"content":"So here are results not from language but for vision."},{"from":2589.45,"to":2593.16,"location":2,"content":"Um, this is a generative adversarial network"},{"from":2593.16,"to":2596.92,"location":2,"content":"that's been trained on a lot of data and it's been trained on really large scales."},{"from":2596.92,"to":2602.71,"location":2,"content":"So it's a big model kind of in-between the size of ELMo and BERT let's say."},{"from":2602.71,"to":2607.51,"location":2,"content":"And uh, these photos here are actually productions of the model."},{"from":2607.51,"to":2608.74,"location":2,"content":"So those aren't real photos."},{"from":2608.74,"to":2611.51,"location":2,"content":"Those are things the model has just kind of hallucinated out of thin air."},{"from":2611.51,"to":2614.77,"location":2,"content":"And at least to me they look essentially photo-realistic."},{"from":2614.77,"to":2618.01,"location":2,"content":"There's also a website that um, is fun to look at it."},{"from":2618.01,"to":2619.91,"location":2,"content":"If you're not- if you're interested which is,"},{"from":2619.91,"to":2622.2,"location":2,"content":"thispersondoesnotexist.com."},{"from":2622.2,"to":2623.95,"location":2,"content":"So if you go there, you'll see"},{"from":2623.95,"to":2627.43,"location":2,"content":"a very convincing photo of a person but it's not a real photo."},{"from":2627.43,"to":2631.44,"location":2,"content":"It's again like a hallucinated image produced by a GAN."},{"from":2631.44,"to":2635.72,"location":2,"content":"We're also seeing really huge models being used for image recognition."},{"from":2635.72,"to":2638.11,"location":2,"content":"So this is recent work by Google where they 
trained"},{"from":2638.11,"to":2642.01,"location":2,"content":"an image net model with half a billion parameters."},{"from":2642.01,"to":2646.45,"location":2,"content":"So that's bigger than BERT but not as big as GPT-2."},{"from":2646.45,"to":2649.42,"location":2,"content":"Um, this plot here is showing a"},{"from":2649.42,"to":2654.76,"location":2,"content":"log scaled number of parameters on the x-axis and then accuracy at ImageNet"},{"from":2654.76,"to":2660.52,"location":2,"content":"on the y-axis- axis and sort of unsurprisingly bigger models perform better."},{"from":2660.52,"to":2664,"location":2,"content":"And there seems to actually be a pretty consistent trend here which is uh,"},{"from":2664,"to":2671.01,"location":2,"content":"accuracy is increasing with the log of the, the model size."},{"from":2671.01,"to":2675.1,"location":2,"content":"Um, I wanna go into a little bit more detail, how is it"},{"from":2675.1,"to":2679.06,"location":2,"content":"possible that we can scale up models and train models at such a large extent."},{"from":2679.06,"to":2681.19,"location":2,"content":"One answer is just better hardware."},{"from":2681.19,"to":2682.68,"location":2,"content":"And in particular, um,"},{"from":2682.68,"to":2684.16,"location":2,"content":"there's a growing uh,"},{"from":2684.16,"to":2688.16,"location":2,"content":"number of companies that are developing hardware specifically for deep learning."},{"from":2688.16,"to":2690.52,"location":2,"content":"So these are even more kind of constrained and the"},{"from":2690.52,"to":2693.19,"location":2,"content":"kind of operations they can do than a GPU,"},{"from":2693.19,"to":2695.95,"location":2,"content":"um but they do those operations even faster."},{"from":2695.95,"to":2699.61,"location":2,"content":"So Google's Tensor Processing Units is one example."},{"from":2699.61,"to":2703.18,"location":2,"content":"There are actually a bunch of other companies working on this idea."},{"from":2703.18,"to":2706.93,"location":2,"content":"Um, the other way to scale up models is by taking advantage of"},{"from":2706.93,"to":2711.84,"location":2,"content":"parallelism and there's two kinds of parallelism that I want to talk about very briefly."},{"from":2711.84,"to":2713.98,"location":2,"content":"So one is data parallelism."},{"from":2713.98,"to":2716.78,"location":2,"content":"In this case, each of your,"},{"from":2716.78,"to":2719.38,"location":2,"content":"let's say GPUs, will have a copy of the model."},{"from":2719.38,"to":2721.48,"location":2,"content":"And what you essentially do is split"},{"from":2721.48,"to":2725.35,"location":2,"content":"the mini-batch that you're training on across these different models."},{"from":2725.35,"to":2727.16,"location":2,"content":"So if you have, let's say,"},{"from":2727.16,"to":2730.95,"location":2,"content":"16 GPUs and each of them see a batch size of 32."},{"from":2730.95,"to":2735.67,"location":2,"content":"You can aggregate the gradients of these 16 uh, uh,"},{"from":2735.67,"to":2742.54,"location":2,"content":"if you do a back-prop on these 16 GPUs and you end up with effectively a batch size of 512."},{"from":2742.54,"to":2744.7,"location":2,"content":"So this allows you to train models much faster."},{"from":2744.7,"to":2750.34,"location":2,"content":"Um, the other kind of parallelism that's growing in importance is model par- parallelism."},{"from":2750.34,"to":2754.51,"location":2,"content":"Um, so eventually models get so big that 
they"},{"from":2754.51,"to":2759.07,"location":2,"content":"can't even fit on a single GPU and they can't even do a batch size of one."},{"from":2759.07,"to":2760.66,"location":2,"content":"Um, in this case,"},{"from":2760.66,"to":2762.99,"location":2,"content":"you actually need to split up the model across"},{"from":2762.99,"to":2766.07,"location":2,"content":"multiple computers- multiple compute units."},{"from":2766.07,"to":2770.49,"location":2,"content":"Um, and that's what's done for models kind of the size of,"},{"from":2770.49,"to":2772.72,"location":2,"content":"of let's say GPT-2."},{"from":2772.72,"to":2775.54,"location":2,"content":"There are new frameworks such as Mesh-TensorFlow, um,"},{"from":2775.54,"to":2783.99,"location":2,"content":"which are basically designed to make this sort of model parallelism easier."},{"from":2783.99,"to":2787.39,"location":2,"content":"Um, okay. So onto GPT-2, um,"},{"from":2787.39,"to":2791.56,"location":2,"content":"I know you already saw this a little bit in the contextualized uh,"},{"from":2791.56,"to":2796.54,"location":2,"content":"um, embeddings um, lecture but I'm going to go into some more depth here."},{"from":2796.54,"to":2801.26,"location":2,"content":"[NOISE] So so essentially it's a really large transformer language model."},{"from":2801.26,"to":2805.16,"location":2,"content":"Um, so there's nothing really kind of novel here in terms"},{"from":2805.16,"to":2809.3,"location":2,"content":"of new training algorithms or in terms of um,"},{"from":2809.3,"to":2811.64,"location":2,"content":"the loss function or anything like that."},{"from":2811.64,"to":2813.34,"location":2,"content":"Um, the thing that makes it different from"},{"from":2813.34,"to":2816.07,"location":2,"content":"prior work is that it's just really really big."},{"from":2816.07,"to":2819.97,"location":2,"content":"Uh, it's trained on a correspondingly huge amount of text."},{"from":2819.97,"to":2824.8,"location":2,"content":"So it's trained on 40 gigabytes and that's roughly 10 times larger than previous uh,"},{"from":2824.8,"to":2827.22,"location":2,"content":"language models have been trained on."},{"from":2827.22,"to":2831.07,"location":2,"content":"Um, when you have that size of dataset,"},{"from":2831.07,"to":2834.32,"location":2,"content":"um, the only way to get that much text is essentially to go to the web."},{"from":2834.32,"to":2838.84,"location":2,"content":"Um, so one thing OpenAI put a quite a bit of effort into when they're developing"},{"from":2838.84,"to":2843.57,"location":2,"content":"this network was to ensure that that text was pretty high-quality."},{"from":2843.57,"to":2846.18,"location":2,"content":"Um, and they did that in a kind of interesting way."},{"from":2846.18,"to":2848.97,"location":2,"content":"They, they looked at Reddit which is this website where people uh,"},{"from":2848.97,"to":2850.14,"location":2,"content":"can vote on links."},{"from":2850.14,"to":2851.64,"location":2,"content":"And then they said uh, if"},{"from":2851.64,"to":2855.09,"location":2,"content":"a link has a lot of votes then it's probably sort of a decent link."},{"from":2855.09,"to":2856.83,"location":2,"content":"There's probably um, you know,"},{"from":2856.83,"to":2860.61,"location":2,"content":"reasonable text there for a model to learn."},{"from":2860.61,"to":2863.08,"location":2,"content":"Um, okay, so if we have"},{"from":2863.08,"to":2865.6,"location":2,"content":"this super huge language model 
like"},{"from":2865.6,"to":2869.51,"location":2,"content":"GPT-2 on this question of what can you actually do with it,"},{"from":2869.51,"to":2873.41,"location":2,"content":"um, well obviously if you have a language model you can do language modelling with it."},{"from":2873.41,"to":2876.79,"location":2,"content":"Uh, but one thing kind of interestingly interesting is that you"},{"from":2876.79,"to":2880.53,"location":2,"content":"can run this language model on er,"},{"from":2880.53,"to":2883.43,"location":2,"content":"existing benchmarks, um, for,"},{"from":2883.43,"to":2885.25,"location":2,"content":"for language modelling, um,"},{"from":2885.25,"to":2888.52,"location":2,"content":"and it gets state of the art perplexity on these benchmarks even"},{"from":2888.52,"to":2891.7,"location":2,"content":"though it never sees the training data for these benchmarks, right?"},{"from":2891.7,"to":2896.77,"location":2,"content":"So normally, if you want to say evaluate your language model on the Penn Treebank."},{"from":2896.77,"to":2901.51,"location":2,"content":"You first train on the Penn Treebank and then you evaluate on this held-out set."},{"from":2901.51,"to":2903.79,"location":2,"content":"Uh, in this case, uh,"},{"from":2903.79,"to":2908.51,"location":2,"content":"a GPT-2 just by virtue of having seen so much text and being such a large model,"},{"from":2908.51,"to":2911.09,"location":2,"content":"outperforms all these other uh,"},{"from":2911.09,"to":2914.58,"location":2,"content":"prior works even though it's not seeing that data."},{"from":2914.58,"to":2920.8,"location":2,"content":"Um, on a bunch of different uh, language modelling benchmarks."},{"from":2920.8,"to":2926.32,"location":2,"content":"Um, but there's a bunch of other interesting experiments that OpenAI"},{"from":2926.32,"to":2931.7,"location":2,"content":"ran with this language modeling and these were based on zero-shot learning."},{"from":2931.7,"to":2937.25,"location":2,"content":"So zero-shot learning just means trying to do a task without ever training on it."},{"from":2937.25,"to":2940.45,"location":2,"content":"And, uh, the way you can do this with a language model"},{"from":2940.45,"to":2943.46,"location":2,"content":"is by designing a prompt you feed into"},{"from":2943.46,"to":2946.88,"location":2,"content":"the language model and then have it just generate from there and"},{"from":2946.88,"to":2951.07,"location":2,"content":"hopefully it generates something relevant to the task you're trying to solve."},{"from":2951.07,"to":2953.22,"location":2,"content":"So for example, for reading comprehension,"},{"from":2953.22,"to":2956.09,"location":2,"content":"what you can do is take the context paragraph,"},{"from":2956.09,"to":2960.08,"location":2,"content":"uh, concatenate the question to it and then add uh,"},{"from":2960.08,"to":2961.43,"location":2,"content":"a colon which is a way,"},{"from":2961.43,"to":2962.7,"location":2,"content":"I guess, of telling the model,"},{"from":2962.7,"to":2965.21,"location":2,"content":"''Okay you should be producing an answer to this question,''"},{"from":2965.21,"to":2967.79,"location":2,"content":"and then just have it generate text, um,"},{"from":2967.79,"to":2970.94,"location":2,"content":"and perhaps it'll generate something that is actually answering,"},{"from":2970.94,"to":2972.36,"location":2,"content":"um, the question and is,"},{"from":2972.36,"to":2974.06,"location":2,"content":"is paying attention to the 
context."},{"from":2974.06,"to":2977.39,"location":2,"content":"[NOISE] Um, and similarly, for summarization,"},{"from":2977.39,"to":2981.74,"location":2,"content":"you can get the article then TL;DR and perhaps the model will produce the summary."},{"from":2981.74,"to":2983.8,"location":2,"content":"Um, you can even do translation,"},{"from":2983.8,"to":2985.66,"location":2,"content":"where you give the model,"},{"from":2985.66,"to":2989.72,"location":2,"content":"um, some ex- a list of known English to French translations so you, sort of,"},{"from":2989.72,"to":2993.77,"location":2,"content":"prime it to tell it that it should be doing translation and then you give"},{"from":2993.77,"to":2998.12,"location":2,"content":"it the source sequence equals blank and have it just run and,"},{"from":2998.12,"to":2999.92,"location":2,"content":"um, perhaps it'll generate,"},{"from":2999.92,"to":3003.3,"location":2,"content":"um, the sequence in the target language."},{"from":3003.3,"to":3006.89,"location":2,"content":"Um, okay. So so here's what the results look like."},{"from":3006.89,"to":3009.1,"location":2,"content":"Um, for all of these,"},{"from":3009.1,"to":3011.55,"location":2,"content":"uh, the X-axis is,"},{"from":3011.55,"to":3016.21,"location":2,"content":"is log scaled model size and the Y-axis is accuracy, um,"},{"from":3016.21,"to":3018.72,"location":2,"content":"and the dotted lines basically correspond to,"},{"from":3018.72,"to":3022.09,"location":2,"content":"um, existing works on these tasks."},{"from":3022.09,"to":3026.29,"location":2,"content":"Um, so for most of these tasks, um,"},{"from":3026.29,"to":3031.76,"location":2,"content":"GPT-2 is quite a bit below existing systems,"},{"from":3031.76,"to":3033.63,"location":2,"content":"um, but there's of course this big difference, right?"},{"from":3033.63,"to":3037.2,"location":2,"content":"Existing systems are trained specifically to do,"},{"from":3037.2,"to":3039.78,"location":2,"content":"um, whatever task they're being evaluated on,"},{"from":3039.78,"to":3042.52,"location":2,"content":"where GPT-2 is um,"},{"from":3042.52,"to":3046.54,"location":2,"content":"only trained to do language modeling and as it learns language modeling,"},{"from":3046.54,"to":3048.86,"location":2,"content":"it's sort of picking up on these other tasks."},{"from":3048.86,"to":3050.78,"location":2,"content":"Um, so right. 
So for example, um,"},{"from":3050.78,"to":3054.39,"location":2,"content":"it does, uh, English to French machine translation, um,"},{"from":3054.39,"to":3056.88,"location":2,"content":"not as well as, uh,"},{"from":3056.88,"to":3060.4,"location":2,"content":"standard unsupervised machine translation which is those, uh,"},{"from":3060.4,"to":3062.92,"location":2,"content":"dotted lines, um, but it still,"},{"from":3062.92,"to":3064.3,"location":2,"content":"it still does quite well."},{"from":3064.3,"to":3066.37,"location":2,"content":"And, um, one thing, kind of,"},{"from":3066.37,"to":3067.81,"location":2,"content":"interesting is the trend line, right,"},{"from":3067.81,"to":3069.52,"location":2,"content":"for almost all of these tasks."},{"from":3069.52,"to":3071.53,"location":2,"content":"Um, performance is getting uh,"},{"from":3071.53,"to":3073.6,"location":2,"content":"much better as the model increases in size."},{"from":3073.6,"to":3078.53,"location":2,"content":"[NOISE] Um, I think a particularly interesting,"},{"from":3078.53,"to":3081.58,"location":2,"content":"uh, one of these tasks is machine translation, right?"},{"from":3081.58,"to":3083.29,"location":2,"content":"So the question is, how can it be doing"},{"from":3083.29,"to":3086.44,"location":2,"content":"machine translation when all we're giving it as a bunch of"},{"from":3086.44,"to":3088.54,"location":2,"content":"web pages and those web pages are almost all in"},{"from":3088.54,"to":3091.81,"location":2,"content":"English and yet somehow it sort of magically picks up uh,"},{"from":3091.81,"to":3093.34,"location":2,"content":"a little bit of machine translation, right."},{"from":3093.34,"to":3095.39,"location":2,"content":"So it's not a great model but it can still,"},{"from":3095.39,"to":3098.26,"location":2,"content":"um, you know, do a decent job in some cases."},{"from":3098.26,"to":3100.51,"location":2,"content":"Um, and the answer is that,"},{"from":3100.51,"to":3103.81,"location":2,"content":"if you look at this giant corpus of English,"},{"from":3103.81,"to":3107.05,"location":2,"content":"occasionally, uh, within, within that corpus,"},{"from":3107.05,"to":3108.88,"location":2,"content":"you see examples of translations, right?"},{"from":3108.88,"to":3110.29,"location":2,"content":"So you see, um,"},{"from":3110.29,"to":3112.81,"location":2,"content":"a French idiom and its translation or"},{"from":3112.81,"to":3116.03,"location":2,"content":"a quote from someone who's French and then the translation in English."},{"from":3116.03,"to":3117.4,"location":2,"content":"And, um, kind of,"},{"from":3117.4,"to":3120.7,"location":2,"content":"amazingly I think this big model, um,"},{"from":3120.7,"to":3125.38,"location":2,"content":"sees enough of these examples that it actually starts to learn how to generate French,"},{"from":3125.38,"to":3127.03,"location":2,"content":"um, even though that wasn't really,"},{"from":3127.03,"to":3131.97,"location":2,"content":"sort of, an intended part of its training."},{"from":3131.97,"to":3134.56,"location":2,"content":"Um, another interesting, um,"},{"from":3134.56,"to":3138.7,"location":2,"content":"thing to dig a bit more into is its ability to do question answering."},{"from":3138.7,"to":3144.04,"location":2,"content":"So uh, a simple baseline for question answering gets about 1% accuracy,"},{"from":3144.04,"to":3147.3,"location":2,"content":"GPT-2 barely does better at 4% accuracy."},{"from":3147.3,"to":3148.84,"location":2,"content":"So this isn't, like, you 
know,"},{"from":3148.84,"to":3152.44,"location":2,"content":"super amazingly solved question answering, um, but, um,"},{"from":3152.44,"to":3154.42,"location":2,"content":"it's still pretty interesting in that,"},{"from":3154.42,"to":3157.43,"location":2,"content":"if you look at answers the model's most confident about,"},{"from":3157.43,"to":3159.01,"location":2,"content":"you can see that it sort of"},{"from":3159.01,"to":3161.32,"location":2,"content":"has learned some facts about the world, right."},{"from":3161.32,"to":3165.55,"location":2,"content":"So it's learned that Charles Darwin wrote Origin of Species."},{"from":3165.55,"to":3170.74,"location":2,"content":"Um, normally in the history of NLP, if you want to get, kind of,"},{"from":3170.74,"to":3172.76,"location":2,"content":"world knowledge into an NLP system,"},{"from":3172.76,"to":3175.43,"location":2,"content":"you'd need something like a big database of facts."},{"from":3175.43,"to":3177.34,"location":2,"content":"And even though this is still,"},{"from":3177.34,"to":3179.5,"location":2,"content":"kind of, very early stages and that, um,"},{"from":3179.5,"to":3184,"location":2,"content":"there's still a huge gap between 4% accuracy and the, uh, you know,"},{"from":3184,"to":3185.88,"location":2,"content":"70% or so that, uh,"},{"from":3185.88,"to":3189.55,"location":2,"content":"state of the art open domain question answering systems can do,"},{"from":3189.55,"to":3192.01,"location":2,"content":"um, it, it, um,"},{"from":3192.01,"to":3194.2,"location":2,"content":"it still can, uh,"},{"from":3194.2,"to":3197.74,"location":2,"content":"pick up some world knowledge just by reading a lot of text, um, without,"},{"from":3197.74,"to":3201.89,"location":2,"content":"kind of, explicitly having that knowledge put into the model."},{"from":3201.89,"to":3208.05,"location":2,"content":"Um, any questions by the way on GPT-2 so far?"},{"from":3208.05,"to":3213.86,"location":2,"content":"Okay. 
So one question that's interesting to think about is,"},{"from":3213.86,"to":3216.51,"location":2,"content":"what happens if our models get even bigger?"},{"from":3216.51,"to":3218.3,"location":2,"content":"Um, so here I've done the, um,"},{"from":3218.3,"to":3222.57,"location":2,"content":"very scientific thing of drawing some lines in PowerPoint and seeing where they meet up."},{"from":3222.57,"to":3224.66,"location":2,"content":"Um, and you can see that, um,"},{"from":3224.66,"to":3228.43,"location":2,"content":"if the trend holds, at about 1 trillion parameters,"},{"from":3228.43,"to":3232.39,"location":2,"content":"um, we get to human level reading comprehension performance."},{"from":3232.39,"to":3235.48,"location":2,"content":"Um, so if that's true it would be really astonishing."},{"from":3235.48,"to":3240.51,"location":2,"content":"I actually do expect that a 1 trillion parameter model would be attainable in,"},{"from":3240.51,"to":3242.16,"location":2,"content":"I don't know, ten years or so,"},{"from":3242.16,"to":3244.24,"location":2,"content":"um, but of course,"},{"from":3244.24,"to":3245.66,"location":2,"content":"right, the trend isn't clear."},{"from":3245.66,"to":3247.63,"location":2,"content":"So if you look at summarization for example,"},{"from":3247.63,"to":3249.04,"location":2,"content":"it seems like performance is already,"},{"from":3249.04,"to":3251.01,"location":2,"content":"uh, uh, topped out."},{"from":3251.01,"to":3255.76,"location":2,"content":"Um, so I think this will be a really interesting thing kinda going forward,"},{"from":3255.76,"to":3257.98,"location":2,"content":"looking at the future of NLP, um,"},{"from":3257.98,"to":3260.71,"location":2,"content":"is how the scaling will change,"},{"from":3260.71,"to":3264.12,"location":2,"content":"um, the way NLP is approached."},{"from":3264.12,"to":3269.76,"location":2,"content":"Um, the other interesting thing about GPT-2 was the reaction from uh,"},{"from":3269.76,"to":3272.13,"location":2,"content":"the media and also from other researchers."},{"from":3272.13,"to":3275.45,"location":2,"content":"Um, and the real cause of"},{"from":3275.45,"to":3279.3,"location":2,"content":"a lot of the controversy about it was this statement from OpenAI."},{"from":3279.3,"to":3283,"location":2,"content":"They said that, ''We're not going to release our full language model,"},{"from":3283,"to":3284.59,"location":2,"content":"um, because it's too dangerous,"},{"from":3284.59,"to":3286.01,"location":2,"content":"you know, our language model is too good.''"},{"from":3286.01,"to":3291.11,"location":2,"content":"Um, so the media really enjoyed this and,"},{"from":3291.11,"to":3292.33,"location":2,"content":"you know, said that,"},{"from":3292.33,"to":3295.14,"location":2,"content":"uh, machine learning is going to break the Internet."},{"from":3295.14,"to":3300.58,"location":2,"content":"Um, there's also some pretty interesting reactions from other researchers, right."},{"from":3300.58,"to":3302.02,"location":2,"content":"So um, there's some,"},{"from":3302.02,"to":3304.2,"location":2,"content":"kind of, tongue-in-cheek responses here, right."},{"from":3304.2,"to":3305.76,"location":2,"content":"You know, I trained the model on MNIST."},{"from":3305.76,"to":3307.91,"location":2,"content":"Is it too dangerous for me to release it?"},{"from":3307.91,"to":3311.53,"location":2,"content":"Um, and similarly, we've done really great work"},{"from":3311.53,"to":3315.72,"location":2,"content":"but we can't release it, it's too dangerous so 
you're just gonna have to trust us on this."},{"from":3315.72,"to":3318.97,"location":2,"content":"Looking at more, kind of, reasoned, um,"},{"from":3318.97,"to":3320.66,"location":2,"content":"debate about this issue,"},{"from":3320.66,"to":3322.89,"location":2,"content":"you still see articles,"},{"from":3322.89,"to":3324.61,"location":2,"content":"um, arguing both sides."},{"from":3324.61,"to":3326.47,"location":2,"content":"So these are two ar- articles,"},{"from":3326.47,"to":3329.55,"location":2,"content":"um, from The Gradient which is a, sort of,"},{"from":3329.55,"to":3331.69,"location":2,"content":"machine learning newsletter, um,"},{"from":3331.69,"to":3335.88,"location":2,"content":"and they're arguing precisely opposite sides of this issue,"},{"from":3335.88,"to":3340.77,"location":2,"content":"um, should it be released or not."},{"from":3340.77,"to":3347.13,"location":2,"content":"So I guess I can briefly go over a few arguments for or against."},{"from":3347.13,"to":3350.18,"location":2,"content":"There is, kind of, a lot of debate about this and I don't want to"},{"from":3350.18,"to":3354.15,"location":2,"content":"go too deep into a controversial issue,"},{"from":3354.15,"to":3356.71,"location":2,"content":"um, but here's a long list of,"},{"from":3356.71,"to":3358.57,"location":2,"content":"kind of, things people have said about this, right."},{"from":3358.57,"to":3361.45,"location":2,"content":"So um, here's why you should release."},{"from":3361.45,"to":3363.28,"location":2,"content":"One complaint is that,"},{"from":3363.28,"to":3365.07,"location":2,"content":"is this model really that special?"},{"from":3365.07,"to":3366.59,"location":2,"content":"There's nothing new going on here."},{"from":3366.59,"to":3369.64,"location":2,"content":"It's just 10 times bigger than previous models, um,"},{"from":3369.64,"to":3371.86,"location":2,"content":"and there's also some arguments that,"},{"from":3371.86,"to":3374.5,"location":2,"content":"um, even if this one isn't released, you know,"},{"from":3374.5,"to":3377.18,"location":2,"content":"in five years everybody can train a model this good, um,"},{"from":3377.18,"to":3382.27,"location":2,"content":"and actually if you look at image recognition or look at images and speech data, um,"},{"from":3382.27,"to":3385.78,"location":2,"content":"it already is possible to synthesize highly convincing,"},{"from":3385.78,"to":3388.41,"location":2,"content":"um, fake images and fake speech."},{"from":3388.41,"to":3394.75,"location":2,"content":"So kinda, what makes this thing different from those other, um, systems."},{"from":3394.75,"to":3396.31,"location":2,"content":"And speaking of other systems, right,"},{"from":3396.31,"to":3398.34,"location":2,"content":"Photoshop has existed for a long time,"},{"from":3398.34,"to":3401.95,"location":2,"content":"so we can already convincingly fake images, um,"},{"from":3401.95,"to":3404.14,"location":2,"content":"people have just learned to adjust and learned"},{"from":3404.14,"to":3406.64,"location":2,"content":"that you shouldn't always trust what's in an image,"},{"from":3406.64,"to":3407.99,"location":2,"content":"um, because it may have been,"},{"from":3407.99,"to":3410.07,"location":2,"content":"um, altered in some way."},{"from":3410.07,"to":3412.45,"location":2,"content":"Um, on the other hand, you could say,"},{"from":3412.45,"to":3415.78,"location":2,"content":"''Okay, uh, Photoshop exists but, um, you can't, sort of,"},{"from":3415.78,"to":3420.13,"location":2,"content":"scale up 
Photoshop and start mass producing fake content the way you can with this sort"},{"from":3420.13,"to":3424.66,"location":2,"content":"of model,'' and they pointed at the danger of uh, fake news, um,"},{"from":3424.66,"to":3428.95,"location":2,"content":"fake reviews, um, in general just astroturfing, which means basically,"},{"from":3428.95,"to":3435.37,"location":2,"content":"uh, creating fake user content that's supporting a view you want other people to hold."},{"from":3435.37,"to":3438.87,"location":2,"content":"Um, this is actually something that's already done,"},{"from":3438.87,"to":3441.66,"location":2,"content":"um, pretty widely by country- companies and governments."},{"from":3441.66,"to":3443.47,"location":2,"content":"There's a lot of evidence for this, um,"},{"from":3443.47,"to":3445.5,"location":2,"content":"but they are of course hiring people to"},{"from":3445.5,"to":3447.8,"location":2,"content":"write all these comments on news articles let's say"},{"from":3447.8,"to":3450.39,"location":2,"content":"and we don't want to make their job any easier"},{"from":3450.39,"to":3453.62,"location":2,"content":"by producing a machine that could potentially do this."},{"from":3453.62,"to":3457.33,"location":2,"content":"So um, I'm not really gonna take a side here,"},{"from":3457.33,"to":3459.57,"location":2,"content":"um, there's still a lot of debate about this."},{"from":3459.57,"to":3461.11,"location":2,"content":"I think, you know,"},{"from":3461.11,"to":3463.3,"location":2,"content":"the main, the main takeaway here is that,"},{"from":3463.3,"to":3466.96,"location":2,"content":"as a community of people in machine learning and NLP,"},{"from":3466.96,"to":3468.91,"location":2,"content":"we don't really have a handle on this, right?"},{"from":3468.91,"to":3471.36,"location":2,"content":"We are sort of caught by surprise by, um,"},{"from":3471.36,"to":3476.09,"location":2,"content":"OpenAI's, um, decision here and, um, uh,"},{"from":3476.09,"to":3477.76,"location":2,"content":"that means that, you know,"},{"from":3477.76,"to":3481.12,"location":2,"content":"there really is some figuring out that needs to be done on what"},{"from":3481.12,"to":3485.51,"location":2,"content":"exactly it is responsible to release publicly."},{"from":3485.51,"to":3489.43,"location":2,"content":"What kind of research problems should we be working on and so on."},{"from":3489.43,"to":3491.53,"location":2,"content":"[NOISE] So yeah."},{"from":3491.53,"to":3493.8,"location":2,"content":"Any questions about uh, this,"},{"from":3493.8,"to":3496.45,"location":2,"content":"this reaction or this debate in general?"},{"from":3496.45,"to":3502.14,"location":2,"content":"[NOISE] Okay."},{"from":3502.14,"to":3507.61,"location":2,"content":"Um, I think something arising from this debate is, um,"},{"from":3507.61,"to":3509.31,"location":2,"content":"the question of, um,"},{"from":3509.31,"to":3512.58,"location":2,"content":"should really the ML people be the people making these, sort of,"},{"from":3512.58,"to":3518.09,"location":2,"content":"decisions or is there a need for more interdisciplinary science where we look at, um,"},{"from":3518.09,"to":3520.43,"location":2,"content":"experts in say, computer security,"},{"from":3520.43,"to":3522.7,"location":2,"content":"um, people from social sciences,"},{"from":3522.7,"to":3526.18,"location":2,"content":"um, you know, people who are experts in ethics,"},{"from":3526.18,"to":3528.36,"location":2,"content":"um, to look at these 
decisions."},{"from":3528.36,"to":3534.59,"location":2,"content":"Um, right. So GPT-2 was definitely one example of where suddenly it seems like,"},{"from":3534.59,"to":3538.42,"location":2,"content":"um, our NLP technology has a lot of pitfalls, right."},{"from":3538.42,"to":3542.01,"location":2,"content":"Where they could be used in a malicious way or they could cause damage."},{"from":3542.01,"to":3545.72,"location":2,"content":"And I think this trend is only going to increase, um,"},{"from":3545.72,"to":3547.16,"location":2,"content":"if you look at, kind of,"},{"from":3547.16,"to":3550.54,"location":2,"content":"areas of NLP that people are working on, uh,"},{"from":3550.54,"to":3556.51,"location":2,"content":"increasingly people are working on really high stakes applications of NLP,"},{"from":3556.51,"to":3559.57,"location":2,"content":"um, and those often have really big, um,"},{"from":3559.57,"to":3565.98,"location":2,"content":"ramifications, especially if you think from the angle of bias and fairness."},{"from":3565.98,"to":3572.69,"location":2,"content":"Um, so, so let's go over a couple examples of this, um-"},{"from":3572.69,"to":3575.95,"location":2,"content":"Um, one- so some, some areas where,"},{"from":3575.95,"to":3577.88,"location":2,"content":"where this is happening is people are looking at,"},{"from":3577.88,"to":3580.05,"location":2,"content":"uh, NLP to look at judicial decisions."},{"from":3580.05,"to":3581.89,"location":2,"content":"So for example, should this person,"},{"from":3581.89,"to":3583.3,"location":2,"content":"uh, get bail or not?"},{"from":3583.3,"to":3585.21,"location":2,"content":"Um, for hiring decisions, right?"},{"from":3585.21,"to":3586.68,"location":2,"content":"So you look at someone's resume,"},{"from":3586.68,"to":3588,"location":2,"content":"you run NLP on it,"},{"from":3588,"to":3590.78,"location":2,"content":"and then you'd make a decision automatically,"},{"from":3590.78,"to":3593.13,"location":2,"content":"um, sh- should we throw out this resume or not?"},{"from":3593.13,"to":3596.85,"location":2,"content":"So do some, sort of, screening, um, grading tests."},{"from":3596.85,"to":3598.65,"location":2,"content":"Um, if you take the GRE, um,"},{"from":3598.65,"to":3600.82,"location":2,"content":"your, your tests will be graded by a machine."},{"from":3600.82,"to":3603.09,"location":2,"content":"Um, a person will also look at it, um,"},{"from":3603.09,"to":3605.3,"location":2,"content":"but nevertheless, um, that's, you know,"},{"from":3605.3,"to":3609.09,"location":2,"content":"a sometimes very impactful part of your life, um, when it's,"},{"from":3609.09,"to":3611.09,"location":2,"content":"when it's the tests that, um, inf- you know,"},{"from":3611.09,"to":3614.49,"location":2,"content":"affects your, um, acceptance into a school, let's say."},{"from":3614.49,"to":3617.26,"location":2,"content":"Um, so I think there is- are some,"},{"from":3617.26,"to":3620.79,"location":2,"content":"some good sides of using Machine Learning in these kinds of contexts."},{"from":3620.79,"to":3624.12,"location":2,"content":"So one is that we can pretty quickly evaluate,"},{"from":3624.12,"to":3626.99,"location":2,"content":"a machine learning system and search out."},{"from":3626.99,"to":3628.68,"location":2,"content":"Does it have some, kind of, bias,"},{"from":3628.68,"to":3631.35,"location":2,"content":"just by running it on a bunch of data and seeing what it does,"},{"from":3631.35,"to":3634.35,"location":2,"content":"and also perhaps even more 
importantly,"},{"from":3634.35,"to":3635.64,"location":2,"content":"um, we can fix this, kind of,"},{"from":3635.64,"to":3637.08,"location":2,"content":"problem if it arises, right?"},{"from":3637.08,"to":3642.24,"location":2,"content":"So, um, it's probably easier to fix a machine learning system that screens resumes,"},{"from":3642.24,"to":3644.73,"location":2,"content":"than it is to fix having, you know,"},{"from":3644.73,"to":3648.3,"location":2,"content":"5,000 executives that are slightly sexist or something, right?"},{"from":3648.3,"to":3649.72,"location":2,"content":"So, so in this way,"},{"from":3649.72,"to":3651.18,"location":2,"content":"um, there is a, sort of,"},{"from":3651.18,"to":3657.84,"location":2,"content":"positive angle on using machine learning in these high-stakes, um, uh, decisions."},{"from":3657.84,"to":3660.01,"location":2,"content":"Um, on the other hand, um,"},{"from":3660.01,"to":3662.22,"location":2,"content":"it's been pretty well, uh, known,"},{"from":3662.22,"to":3664.77,"location":2,"content":"and I know you had a lecture on bias and fairness,"},{"from":3664.77,"to":3667.77,"location":2,"content":"that machine learning often reflects bias in a data-set,"},{"from":3667.77,"to":3671.03,"location":2,"content":"um, it can even amplify bias in the data-set."},{"from":3671.03,"to":3672.66,"location":2,"content":"Um, and there's concern of, kind of,"},{"from":3672.66,"to":3675.32,"location":2,"content":"a feedback loop where a biased algorithm"},{"from":3675.32,"to":3678.36,"location":2,"content":"actually will lead to the creation of more biased data,"},{"from":3678.36,"to":3683.15,"location":2,"content":"um, in which case these problems will only compound and get worse."},{"from":3683.15,"to":3688.95,"location":2,"content":"Um, so for all of the, uh, high-impact decisions,"},{"from":3688.95,"to":3690.99,"location":2,"content":"um, I, I had listed on that slide,"},{"from":3690.99,"to":3694.32,"location":2,"content":"there are examples where things have gone awry, right?"},{"from":3694.32,"to":3696.69,"location":2,"content":"So Amazon had some AI that was,"},{"from":3696.69,"to":3699.97,"location":2,"content":"um, working as a recruiting tool and it turned out to be sexist."},{"from":3699.97,"to":3702.26,"location":2,"content":"Um, um, there have been some, kind of,"},{"from":3702.26,"to":3704.55,"location":2,"content":"early pilots of using AI, um,"},{"from":3704.55,"to":3706.68,"location":2,"content":"in the justice system and those also have had,"},{"from":3706.68,"to":3709.71,"location":2,"content":"um, in some cases, really bad results."},{"from":3709.71,"to":3712.92,"location":2,"content":"Um, if you look at automatic,"},{"from":3712.92,"to":3714.86,"location":2,"content":"automatic essay grading, um,"},{"from":3714.86,"to":3716.43,"location":2,"content":"it's not really a great,"},{"from":3716.43,"to":3717.72,"location":2,"content":"you know, NLP system, right?"},{"from":3717.72,"to":3719.73,"location":2,"content":"So here's an example, um,"},{"from":3719.73,"to":3722.36,"location":2,"content":"excerpt of an essay that, um,"},{"from":3722.36,"to":3726.24,"location":2,"content":"an automatic grading system used by the GRE test gives, uh,"},{"from":3726.24,"to":3728.04,"location":2,"content":"a very high score, um,"},{"from":3728.04,"to":3730.23,"location":2,"content":"but really it's just, kind of, a salad of,"},{"from":3730.23,"to":3732.42,"location":2,"content":"uh, big fancy words and 
that's"},{"from":3732.42,"to":3737.24,"location":2,"content":"enough to convince the model that this is a, a great essay."},{"from":3737.24,"to":3739.41,"location":2,"content":"Um, the last, um,"},{"from":3739.41,"to":3741.55,"location":2,"content":"area I wanna talk about where, where, um,"},{"from":3741.55,"to":3743.55,"location":2,"content":"you can see there's really some risks and"},{"from":3743.55,"to":3746.66,"location":2,"content":"some pitfalls with using NLP technology, is chatbots."},{"from":3746.66,"to":3751.56,"location":2,"content":"Um, so I think chatbots do have a side where they can be very beneficial."},{"from":3751.56,"to":3753.93,"location":2,"content":"Um, Woebot is one example,"},{"from":3753.93,"to":3757.55,"location":2,"content":"is this company that has this chatbot you can talk to if you're not,"},{"from":3757.55,"to":3759.48,"location":2,"content":"um, feeling too great and it'll try to,"},{"from":3759.48,"to":3761.57,"location":2,"content":"um, I don't know, cheer you up."},{"from":3761.57,"to":3763.83,"location":2,"content":"Um, so, so that, you know,"},{"from":3763.83,"to":3766.77,"location":2,"content":"could be a- a really nice piece of technology that helps people,"},{"from":3766.77,"to":3769.38,"location":2,"content":"um, but on the other hand, there's some big risks."},{"from":3769.38,"to":3773.52,"location":2,"content":"So, so one example is Microsoft research had a chatbot trained on tweets,"},{"from":3773.52,"to":3776.85,"location":2,"content":"and it started quickly saying racist things and had to be pulled."},{"from":3776.85,"to":3779.63,"location":2,"content":"Um, so I think all of this highlights that, um,"},{"from":3779.63,"to":3782.51,"location":2,"content":"as NLP is becoming more effective,"},{"from":3782.51,"to":3785.84,"location":2,"content":"people are seeing opportunities to use it in, um,"},{"from":3785.84,"to":3789.3,"location":2,"content":"increasingly high-stakes decisions and although,"},{"from":3789.3,"to":3791.78,"location":2,"content":"you know, there are some nice- there's some appeal to that,"},{"from":3791.78,"to":3794.31,"location":2,"content":"um, there's also a lot of risk."},{"from":3794.31,"to":3797.31,"location":2,"content":"Um, any more questions on, uh,"},{"from":3797.31,"to":3801.65,"location":2,"content":"this sort of social impact of NLP?"},{"from":3801.65,"to":3809.25,"location":2,"content":"Okay. 
Um, last part of this lecture is looking more at future research, right?"},{"from":3809.25,"to":3810.47,"location":2,"content":"And in particular, um,"},{"from":3810.47,"to":3813.51,"location":2,"content":"I think a lot of the current research trends are,"},{"from":3813.51,"to":3815.76,"location":2,"content":"kind of reactions to BERT, um, right?"},{"from":3815.76,"to":3820.08,"location":2,"content":"So, so the question is what did BERT solve and- and what do we work on next?"},{"from":3820.08,"to":3824.3,"location":2,"content":"Um, so here are results on the GLUE benchmark."},{"from":3824.3,"to":3827.07,"location":2,"content":"Um, that is, uh, a compendium of,"},{"from":3827.07,"to":3830.28,"location":2,"content":"uh, 10 natural language understanding tasks."},{"from":3830.28,"to":3834.42,"location":2,"content":"Um, and you get an average score across those 10 tasks."},{"from":3834.42,"to":3837.81,"location":2,"content":"Um, the two right-most models,"},{"from":3837.81,"to":3840.72,"location":2,"content":"um, are, uh,"},{"from":3840.72,"to":3843.33,"location":2,"content":"just, uh,"},{"from":3843.33,"to":3846.48,"location":2,"content":"supervised-trained machine learning systems, right?"},{"from":3846.48,"to":3848.36,"location":2,"content":"So we have Bag-of-Vectors, um,"},{"from":3848.36,"to":3850.92,"location":2,"content":"and if we instead use our fancy neural net architecture"},{"from":3850.92,"to":3853.65,"location":2,"content":"of BiLSTM + Attention, we gain about five points."},{"from":3853.65,"to":3855.6,"location":2,"content":"Um, but the gains from BERT,"},{"from":3855.6,"to":3857.52,"location":2,"content":"uh, really dwarf that difference, right?"},{"from":3857.52,"to":3859.89,"location":2,"content":"So, so BERT improves results by about, uh,"},{"from":3859.89,"to":3864.12,"location":2,"content":"17 points and we end up being actually quite close,"},{"from":3864.12,"to":3866.93,"location":2,"content":"um, to human performance on these tasks."},{"from":3866.93,"to":3869.82,"location":2,"content":"Um, so one, sort of,"},{"from":3869.82,"to":3872.22,"location":2,"content":"implication of this that people are wondering about is,"},{"from":3872.22,"to":3875.11,"location":2,"content":"is this, kind of, the death of architecture engineering?"},{"from":3875.11,"to":3879.22,"location":2,"content":"Um, so I'm sure all of you who have worked on the default final project, um,"},{"from":3879.22,"to":3882.57,"location":2,"content":"have seen a whole bunch of fancy pictures showing different,"},{"from":3882.57,"to":3884.49,"location":2,"content":"uh, architectures for solving SQuAD."},{"from":3884.49,"to":3886.71,"location":2,"content":"Um, there are a lot of papers."},{"from":3886.71,"to":3888.39,"location":2,"content":"They all propose some, kind of,"},{"from":3888.39,"to":3890.89,"location":2,"content":"uh, attention mechanism or something like that."},{"from":3890.89,"to":3893.88,"location":2,"content":"Um, and, um, right."},{"from":3893.88,"to":3895.17,"location":2,"content":"With BERT, it's, sort of,"},{"from":3895.17,"to":3896.97,"location":2,"content":"um, you don't need to do any of that, right?"},{"from":3896.97,"to":3899.19,"location":2,"content":"You just train a transformer and you give it enough data,"},{"from":3899.19,"to":3901.02,"location":2,"content":"and actually you're doing great on SQuAD,"},{"from":3901.02,"to":3903.89,"location":2,"content":"you know, maybe, um, these, 
uh,"},{"from":3903.89,"to":3907.8,"location":2,"content":"architectural enhancements are not necessarily, um,"},{"from":3907.8,"to":3910.59,"location":2,"content":"the key thing that'll drive progress in,"},{"from":3910.59,"to":3914.15,"location":2,"content":"uh, improving results on these tasks."},{"from":3914.15,"to":3916.74,"location":2,"content":"Um, right. So, uh,"},{"from":3916.74,"to":3918.63,"location":2,"content":"if you look at this from the perspective of a researcher,"},{"from":3918.63,"to":3920.61,"location":2,"content":"you can think a researcher will say, \"Okay,"},{"from":3920.61,"to":3923.52,"location":2,"content":"I can spend six months designing a fancy new architecture for"},{"from":3923.52,"to":3927.93,"location":2,"content":"SQuAD and if I do a good job maybe I'll improve results by 1, uh, F1 point.\""},{"from":3927.93,"to":3930.03,"location":2,"content":"Um, but in the case of BERT, um,"},{"from":3930.03,"to":3932.16,"location":2,"content":"increasing the size of their model by 3x,"},{"from":3932.16,"to":3933.24,"location":2,"content":"which is the difference between,"},{"from":3933.24,"to":3936.09,"location":2,"content":"their base size model and their large model,"},{"from":3936.09,"to":3939.59,"location":2,"content":"um, that improved results by 5 F1 points."},{"from":3939.59,"to":3942.15,"location":2,"content":"Um, so it does seem to suggest we need to, sort of,"},{"from":3942.15,"to":3946.64,"location":2,"content":"re-prioritize, um, which avenues of research we'd pursue,"},{"from":3946.64,"to":3949.5,"location":2,"content":"because this architecture engineering isn't providing, kind of,"},{"from":3949.5,"to":3952.61,"location":2,"content":"gains for its time investment the way,"},{"from":3952.61,"to":3954.76,"location":2,"content":"uh, leveraging unlabeled data is."},{"from":3954.76,"to":3957.74,"location":2,"content":"Um, so now, if you look at the SQuAD leaderboard, um,"},{"from":3957.74,"to":3964.19,"location":2,"content":"I think at least the top 20 entrants are all BERT plus something."},{"from":3964.19,"to":3967.72,"location":2,"content":"Um, one other issue, uh,"},{"from":3967.72,"to":3969.54,"location":2,"content":"I think BERT has raised is that,"},{"from":3969.54,"to":3971.4,"location":2,"content":"um, we need harder tasks, right?"},{"from":3971.4,"to":3973.56,"location":2,"content":"BERT has almost solved SQuAD,"},{"from":3973.56,"to":3975.06,"location":2,"content":"if you define it by, uh,"},{"from":3975.06,"to":3976.86,"location":2,"content":"getting close to human performance."},{"from":3976.86,"to":3979.23,"location":2,"content":"Um, so there's been, um,"},{"from":3979.23,"to":3982.64,"location":2,"content":"a growth in new datasets that are, uh,"},{"from":3982.64,"to":3985.02,"location":2,"content":"more challenging and there are a couple of ways in which,"},{"from":3985.02,"to":3986.37,"location":2,"content":"um, they can be more challenging."},{"from":3986.37,"to":3988.14,"location":2,"content":"So one is, um,"},{"from":3988.14,"to":3990.24,"location":2,"content":"doing reading comprehension on longer documents,"},{"from":3990.24,"to":3992.63,"location":2,"content":"or doing it across more than one document."},{"from":3992.63,"to":3995.28,"location":2,"content":"Um, one area is looking at, uh,"},{"from":3995.28,"to":3998.85,"location":2,"content":"coming up with harder questions that require multi-hop reasoning."},{"from":3998.85,"to":4001.55,"location":2,"content":"Um, so that essentially means you have to 
string"},{"from":4001.55,"to":4005.18,"location":2,"content":"together multiple supporting facts from different places,"},{"from":4005.18,"to":4007.67,"location":2,"content":"um, to produce the correct answer."},{"from":4007.67,"to":4009.35,"location":2,"content":"Um, and another area,"},{"from":4009.35,"to":4011.87,"location":2,"content":"situating question-answering within a dialogue."},{"from":4011.87,"to":4014.33,"location":2,"content":"Um, there's also been a, kind of,"},{"from":4014.33,"to":4018.26,"location":2,"content":"small detail with the construction of reading comprehension datasets,"},{"from":4018.26,"to":4020.6,"location":2,"content":"that has actually really affected,"},{"from":4020.6,"to":4022.84,"location":2,"content":"um, the, the difficulty of the task."},{"from":4022.84,"to":4024.11,"location":2,"content":"And that is whether, um,"},{"from":4024.11,"to":4026.49,"location":2,"content":"when you create these datasets, um,"},{"from":4026.49,"to":4029.42,"location":2,"content":"is the person who writes questions about a passage,"},{"from":4029.42,"to":4031.53,"location":2,"content":"can they see that passage or not?"},{"from":4031.53,"to":4034.07,"location":2,"content":"Um, so of course, it's much easier to come up"},{"from":4034.07,"to":4036.11,"location":2,"content":"with a question that when you see the passage,"},{"from":4036.11,"to":4038.87,"location":2,"content":"and if you come up with a question without seeing the passage,"},{"from":4038.87,"to":4041.81,"location":2,"content":"you may not even have a answerable question."},{"from":4041.81,"to":4043.73,"location":2,"content":"Um, but the problem with looking at"},{"from":4043.73,"to":4046.46,"location":2,"content":"the passage is that first of all it's not realistic, right?"},{"from":4046.46,"to":4048.84,"location":2,"content":"So, uh, if I'm asking a question, you know,"},{"from":4048.84,"to":4050.59,"location":2,"content":"I'm not going to have usually"},{"from":4050.59,"to":4053.87,"location":2,"content":"the paragraph that answers that question sitting in front of me."},{"from":4053.87,"to":4055.67,"location":2,"content":"Um, on top of that,"},{"from":4055.67,"to":4057.56,"location":2,"content":"it really encourages easy questions, right?"},{"from":4057.56,"to":4059.84,"location":2,"content":"So, um, if you're a Mechanical Turker,"},{"from":4059.84,"to":4062.87,"location":2,"content":"and you're paid to write as many questions as possible,"},{"from":4062.87,"to":4064.79,"location":2,"content":"and then you see an article that says,"},{"from":4064.79,"to":4066.35,"location":2,"content":"um, I don't know, you know,"},{"from":4066.35,"to":4070.04,"location":2,"content":"uh, Abraham Lincoln was the 16th president of the United States,"},{"from":4070.04,"to":4071.6,"location":2,"content":"um, what are you gonna write?"},{"from":4071.6,"to":4073.1,"location":2,"content":"As your question, you're gonna write,"},{"from":4073.1,"to":4075.36,"location":2,"content":"who was the 16th president of the United States."},{"from":4075.36,"to":4078.03,"location":2,"content":"You're not gonna write something more interesting that's harder to answer."},{"from":4078.03,"to":4081.89,"location":2,"content":"Um, so- so this is one way in which crowdsourced datasets have changed, um,"},{"from":4081.89,"to":4084.17,"location":2,"content":"people are now making sure questions are,"},{"from":4084.17,"to":4087.41,"location":2,"content":"sort of, independent of, of the contexts."},{"from":4087.41,"to":4089.38,"location":2,"content":"Um, so 
I'm gonna briefly, uh,"},{"from":4089.38,"to":4091.61,"location":2,"content":"go over a couple of new datasets in this line."},{"from":4091.61,"to":4095.15,"location":2,"content":"So one is called QuAC, which stands for Question Answering in Context."},{"from":4095.15,"to":4096.81,"location":2,"content":"Um, in this dataset,"},{"from":4096.81,"to":4098.69,"location":2,"content":"there is a teacher and a student,"},{"from":4098.69,"to":4101.39,"location":2,"content":"um, the teacher sees a Wikipedia article."},{"from":4101.39,"to":4104.19,"location":2,"content":"The student wants to learn about this Wikipedia article,"},{"from":4104.19,"to":4108.01,"location":2,"content":"and the goal is to train a machine learning model that acts as the teacher."},{"from":4108.01,"to":4110,"location":2,"content":"Um, so you can imagine maybe in the future, this,"},{"from":4110,"to":4112.19,"location":2,"content":"sort of, technology would be useful for,"},{"from":4112.19,"to":4114.32,"location":2,"content":"uh, um, education for, kind of,"},{"from":4114.32,"to":4117.03,"location":2,"content":"having, uh, adding some automation."},{"from":4117.03,"to":4122.49,"location":2,"content":"Um, uh, one thing that makes this task difficult is that,"},{"from":4122.49,"to":4126.55,"location":2,"content":"uh, questions depend on the entire history of the conversation."},{"from":4126.55,"to":4128.23,"location":2,"content":"Um, so for example, uh,"},{"from":4128.23,"to":4130.79,"location":2,"content":"if you look, um, on the left here, uh,"},{"from":4130.79,"to":4134.81,"location":2,"content":"the example, um, dialogue,"},{"from":4134.81,"to":4137.31,"location":2,"content":"um, the third question is was he the star?"},{"from":4137.31,"to":4142.07,"location":2,"content":"Um, clearly you can't answer that question unless you look back earlier in the dialogue,"},{"from":4142.07,"to":4144.1,"location":2,"content":"and realize that the subject of this,"},{"from":4144.1,"to":4146.18,"location":2,"content":"uh, conversation is Daffy Duck."},{"from":4146.18,"to":4149.06,"location":2,"content":"Um, a- and, sort of,"},{"from":4149.06,"to":4151.04,"location":2,"content":"because this dataset is more challenging,"},{"from":4151.04,"to":4154.34,"location":2,"content":"and you can see there's a, there's a much bigger gap to human performance, right?"},{"from":4154.34,"to":4157.61,"location":2,"content":"So if you train some BERT with some extensions, you'll st- uh,"},{"from":4157.61,"to":4162.19,"location":2,"content":"the results are still like 15 F1 points worse than human performance."},{"from":4162.19,"to":4168.94,"location":2,"content":"Um, um, here's one other dataset, um, called HotPotQA."},{"from":4168.94,"to":4170.51,"location":2,"content":"Um, it is, uh,"},{"from":4170.51,"to":4172.76,"location":2,"content":"designed instead for multi-hop reasoning."},{"from":4172.76,"to":4175.61,"location":2,"content":"Um, so essentially, in order to answer a question,"},{"from":4175.61,"to":4177.88,"location":2,"content":"you have to look at multiple documents,"},{"from":4177.88,"to":4180.35,"location":2,"content":"you have to look at different facts from those documents,"},{"from":4180.35,"to":4181.93,"location":2,"content":"and perform some inference,"},{"from":4181.93,"to":4184.65,"location":2,"content":"um, to get what the correct answer is."},{"from":4184.65,"to":4188.65,"location":2,"content":"Um, so I think, you know, this is a- a much harder task."},{"from":4188.65,"to":4194.59,"location":2,"content":"And again, um, there's a much 
bigger gap to human performance."},{"from":4194.59,"to":4197.39,"location":2,"content":"Um, any questions on, uh,"},{"from":4197.39,"to":4201.9,"location":2,"content":"new datasets, um, harder tasks for NLP?"},{"from":4201.9,"to":4207.03,"location":2,"content":"Okay. Um, I'm gonna,"},{"from":4207.03,"to":4209.36,"location":2,"content":"kind of, rapid fire and go through, um,"},{"from":4209.36,"to":4212.21,"location":2,"content":"a couple more areas in the last minutes of this talk."},{"from":4212.21,"to":4216.34,"location":2,"content":"Um, so multitask learning I think is really growing in importance."},{"from":4216.34,"to":4218.39,"location":2,"content":"Um, of course, um,"},{"from":4218.39,"to":4220.19,"location":2,"content":"you've had a whole lecture on this, right?"},{"from":4220.19,"to":4221.75,"location":2,"content":"So I'm not gonna spend too much time on it."},{"from":4221.75,"to":4224.33,"location":2,"content":"Um, but maybe one, uh,"},{"from":4224.33,"to":4228.92,"location":2,"content":"point of interest is that if you look at performance on this GLUE benchmark,"},{"from":4228.92,"to":4231.32,"location":2,"content":"so this benchmark for natural language understanding,"},{"from":4231.32,"to":4234.92,"location":2,"content":"um, all of the top results, um,"},{"from":4234.92,"to":4237.98,"location":2,"content":"that are now actually surpassing BERT in"},{"from":4237.98,"to":4242.39,"location":2,"content":"performance are taking BERT and training it in a multi-task way."},{"from":4242.39,"to":4247.37,"location":2,"content":"Um, I think another interesting, uh,"},{"from":4247.37,"to":4252.02,"location":2,"content":"motivation for multi-task learning is that if you are training BERT, you have a really,"},{"from":4252.02,"to":4254.48,"location":2,"content":"really large model and one way to make"},{"from":4254.48,"to":4260.95,"location":2,"content":"more efficient use of that model is training it to do many things at once."},{"from":4260.95,"to":4264.92,"location":2,"content":"Another area that's definitely important, um,"},{"from":4264.92,"to":4269.09,"location":2,"content":"and I think will be important going into the future is dealing with low-resource settings."},{"from":4269.09,"to":4270.89,"location":2,"content":"Um, and here I'm using a really broad,"},{"from":4270.89,"to":4273.02,"location":2,"content":"uh, definition of resources, right."},{"from":4273.02,"to":4275.44,"location":2,"content":"So that could mean compute power, um, you know,"},{"from":4275.44,"to":4278.99,"location":2,"content":"BERT is great but it also takes huge amounts of compute to run it."},{"from":4278.99,"to":4280.31,"location":2,"content":"So it's not realistic to say,"},{"from":4280.31,"to":4282.55,"location":2,"content":"um, if you're building, let's say a mobile, uh,"},{"from":4282.55,"to":4287.51,"location":2,"content":"an app for a mobile device that you could run a model the size of BERT."},{"from":4287.51,"to":4291.85,"location":2,"content":"Um, as I already went into earlier in this talk, um, you know,"},{"from":4291.85,"to":4296.23,"location":2,"content":"low-resource languages is an area that I think is pretty, um,"},{"from":4296.23,"to":4299.12,"location":2,"content":"under-represented in NLP research right now,"},{"from":4299.12,"to":4301.46,"location":2,"content":"because most datasets are in English, um,"},{"from":4301.46,"to":4302.57,"location":2,"content":"but I do think, right,"},{"from":4302.57,"to":4304.13,"location":2,"content":"there's a really, you 
know,"},{"from":4304.13,"to":4309.24,"location":2,"content":"large number of people that in order to benefit from NLP technology, um,"},{"from":4309.24,"to":4312.2,"location":2,"content":"we'll need to have technologies that work well in a lot of"},{"from":4312.2,"to":4316.06,"location":2,"content":"different languages especially those without much training data."},{"from":4316.06,"to":4320.87,"location":2,"content":"And, um, speaking of low- low amounts of training data, I think in general this is,"},{"from":4320.87,"to":4324.06,"location":2,"content":"uh, a- an interesting area of research,"},{"from":4324.06,"to":4325.55,"location":2,"content":"um, within machine learning."},{"from":4325.55,"to":4327.31,"location":2,"content":"Actually, people are, um,"},{"from":4327.31,"to":4329.31,"location":2,"content":"working a lot on this as well."},{"from":4329.31,"to":4331.46,"location":2,"content":"Um, so a term is often, uh,"},{"from":4331.46,"to":4334.02,"location":2,"content":"a term often used is few shot learning."},{"from":4334.02,"to":4336.41,"location":2,"content":"Um, and that essentially means being able to"},{"from":4336.41,"to":4338.72,"location":2,"content":"train a machine learning model that only sees,"},{"from":4338.72,"to":4340.73,"location":2,"content":"let's say five or ten examples."},{"from":4340.73,"to":4343.37,"location":2,"content":"Um, one motivation there is, um,"},{"from":4343.37,"to":4349.44,"location":2,"content":"I think a clear distinction between how our existing machine learning systems learn,"},{"from":4349.44,"to":4351.88,"location":2,"content":"and how humans learn is that, um,"},{"from":4351.88,"to":4355.55,"location":2,"content":"humans can generalize very quickly from five or so examples."},{"from":4355.55,"to":4357.19,"location":2,"content":"Um, if you're training a neural net,"},{"from":4357.19,"to":4358.58,"location":2,"content":"you normally need, you know,"},{"from":4358.58,"to":4361.61,"location":2,"content":"thousands of examples or perhaps even tens of thousands,"},{"from":4361.61,"to":4365.06,"location":2,"content":"hundreds of thousands of examples to get something that works."},{"from":4365.06,"to":4369.65,"location":2,"content":"Um, so I also see this being a pretty important area in the future."},{"from":4369.65,"to":4373.73,"location":2,"content":"Um, the last area where I want to go in, um,"},{"from":4373.73,"to":4377.6,"location":2,"content":"a little bit more depth is interpreting and understanding models."},{"from":4377.6,"to":4380.57,"location":2,"content":"Um, so, so really there's two aspects of this."},{"from":4380.57,"to":4384.1,"location":2,"content":"One is if I have a machine learning model and it makes a prediction,"},{"from":4384.1,"to":4386.45,"location":2,"content":"I would like to be able to, uh,"},{"from":4386.45,"to":4388.79,"location":2,"content":"know why did it make that prediction?"},{"from":4388.79,"to":4391.39,"location":2,"content":"So gets some rationale, get some explanation,"},{"from":4391.39,"to":4395.18,"location":2,"content":"um, that would especially be important in an area like health care, right?"},{"from":4395.18,"to":4397.91,"location":2,"content":"So if you're a doctor and you're making a decision, um,"},{"from":4397.91,"to":4401.09,"location":2,"content":"it's probably not good enough for your machine learning model to say,"},{"from":4401.09,"to":4402.47,"location":2,"content":"\"Patient has disease X.\""},{"from":4402.47,"to":4403.81,"location":2,"content":"You really want it to 
say,"},{"from":4403.81,"to":4406.07,"location":2,"content":"\"Patient has disease X for these reasons.\""},{"from":4406.07,"to":4408.59,"location":2,"content":"Um, because then you as a doctor can double-check,"},{"from":4408.59,"to":4410.54,"location":2,"content":"and, and try to validate the, the,"},{"from":4410.54,"to":4413.16,"location":2,"content":"uh, machine's, um, thinking I guess,"},{"from":4413.16,"to":4415.61,"location":2,"content":"um, to come up with that diagnosis."},{"from":4415.61,"to":4418.64,"location":2,"content":"Um, the other area of interpreting"},{"from":4418.64,"to":4421.37,"location":2,"content":"and understanding models is more of a scientific question, right?"},{"from":4421.37,"to":4423.86,"location":2,"content":"That is, we know things like BERT work really well,"},{"from":4423.86,"to":4425.96,"location":2,"content":"um, we want to know why do they work well?"},{"from":4425.96,"to":4428.19,"location":2,"content":"What aspects of language do they model?"},{"from":4428.19,"to":4429.99,"location":2,"content":"Um, what things don't they model?"},{"from":4429.99,"to":4432.02,"location":2,"content":"Um, and that might lead to, um,"},{"from":4432.02,"to":4435.69,"location":2,"content":"ideas of improving, um, those- those models."},{"from":4435.69,"to":4439.58,"location":2,"content":"Um, so, um, here is a, uh,"},{"from":4439.58,"to":4444.94,"location":2,"content":"couple slides on the main approach for answering these sorts of scientific questions."},{"from":4444.94,"to":4446.98,"location":2,"content":"What does a machine-learning model learn?"},{"from":4446.98,"to":4450.53,"location":2,"content":"Um, what you do is you have a model so let's say it's BERT."},{"from":4450.53,"to":4453.44,"location":2,"content":"It takes as input a sequence of words, um,"},{"from":4453.44,"to":4456.47,"location":2,"content":"it produces as output a sequence of vectors, um,"},{"from":4456.47,"to":4458.57,"location":2,"content":"we want to ask does it know, for example,"},{"from":4458.57,"to":4459.68,"location":2,"content":"the part of speech of words?"},{"from":4459.68,"to":4462.45,"location":2,"content":"So, so in its vector representations,"},{"from":4462.45,"to":4464.63,"location":2,"content":"does it capture something about syntax?"},{"from":4464.63,"to":4469.85,"location":2,"content":"Um, and a simple way of asking this question is to train another classifier on top of BERT,"},{"from":4469.85,"to":4471.97,"location":2,"content":"uh, that's trained to do,"},{"from":4471.97,"to":4474.4,"location":2,"content":"um, let's say part-of-speech tagging."},{"from":4474.4,"to":4476.82,"location":2,"content":"Um, but we only, um,"},{"from":4476.82,"to":4479.94,"location":2,"content":"backprop into that diagnostic classifier itself."},{"from":4479.94,"to":4483.68,"location":2,"content":"So in other words we're treating the output of BERT, um,"},{"from":4483.68,"to":4486.19,"location":2,"content":"that sequence of vectors as a fixed input,"},{"from":4486.19,"to":4488.6,"location":2,"content":"and we're sort of probing those vectors to see,"},{"from":4488.6,"to":4490.51,"location":2,"content":"um, do they contain, um,"},{"from":4490.51,"to":4492.44,"location":2,"content":"information about a part of speech that"},{"from":4492.44,"to":4496.44,"location":2,"content":"this second diagnostic classifier on top can decode,"},{"from":4496.44,"to":4499.12,"location":2,"content":"um, to get the correct labels?"},{"from":4499.12,"to":4503.69,"location":2,"content":"Um, so, um, there are quite 
a few concerns here."},{"from":4503.69,"to":4506.54,"location":2,"content":"Um, one concern is, uh,"},{"from":4506.54,"to":4509.91,"location":2,"content":"if you make your diagnostic classifier too complicated,"},{"from":4509.91,"to":4513.2,"location":2,"content":"it can just solve the classif- the task all on itself,"},{"from":4513.2,"to":4515.21,"location":2,"content":"and it can basically ignore, uh,"},{"from":4515.21,"to":4517.56,"location":2,"content":"whatever representations were produced by BERT."},{"from":4517.56,"to":4520.04,"location":2,"content":"Um, so- so the kind of standard thing right now is to use"},{"from":4520.04,"to":4523.2,"location":2,"content":"a single softmax layer on top of BERT,"},{"from":4523.2,"to":4525.19,"location":2,"content":"um, to do these decisions."},{"from":4525.19,"to":4529.1,"location":2,"content":"Um, and there's been a whole bunch of tasks proposed for"},{"from":4529.1,"to":4532.9,"location":2,"content":"evaluating essentially the linguistic knowledge of these models."},{"from":4532.9,"to":4534.78,"location":2,"content":"Um, so you could do part-of-speech tagging,"},{"from":4534.78,"to":4537.08,"location":2,"content":"you could do more semantic tasks like,"},{"from":4537.08,"to":4539.28,"location":2,"content":"uh, relation extraction, um,"},{"from":4539.28,"to":4541.27,"location":2,"content":"or- or something like co-reference."},{"from":4541.27,"to":4544.28,"location":2,"content":"Um, and this is a pretty active area of work."},{"from":4544.28,"to":4547.06,"location":2,"content":"Um, here is, uh, just one, uh,"},{"from":4547.06,"to":4551.19,"location":2,"content":"plot showing some of the results, um, of this approach."},{"from":4551.19,"to":4553.86,"location":2,"content":"So here what we're doing is we're adding"},{"from":4553.86,"to":4556.95,"location":2,"content":"diagnostic classifiers to different layers of BERT,"},{"from":4556.95,"to":4562.62,"location":2,"content":"and we are seeing which layers of BERT are more useful for particular tasks."},{"from":4562.62,"to":4567.02,"location":2,"content":"Um, and, um, something kind of interesting comes out of this which is that, um,"},{"from":4567.02,"to":4570.31,"location":2,"content":"the different layers of BERT seem to be corresponding, um,"},{"from":4570.31,"to":4572.89,"location":2,"content":"fairly well with notions of,"},{"from":4572.89,"to":4575.4,"location":2,"content":"uh, different layers of li- of linguistics."},{"from":4575.4,"to":4579.11,"location":2,"content":"Um, so, uh, dependency parsing which is a syntactic task,"},{"from":4579.11,"to":4580.94,"location":2,"content":"um, it's, uh, considered sort of a, you know,"},{"from":4580.94,"to":4583.43,"location":2,"content":"medium level task in understanding a sentence."},{"from":4583.43,"to":4588.13,"location":2,"content":"Um, the medium layers of BERT, so layers kind of 6 through 8 or something,"},{"from":4588.13,"to":4590.48,"location":2,"content":"are the ones best at dependency parsing."},{"from":4590.48,"to":4594.1,"location":2,"content":"Um, if you have a se- very semantic task like sentiment analysis,"},{"from":4594.1,"to":4595.88,"location":2,"content":"um, where you're trying to learn some kind of, uh,"},{"from":4595.88,"to":4598.32,"location":2,"content":"semantic property of the whole sentence, um,"},{"from":4598.32,"to":4601.49,"location":2,"content":"then the very last layers of BERT are the ones that seem"},{"from":4601.49,"to":4606.31,"location":2,"content":"to encode the most information about- about this, uh, 
phenomenon."},{"from":4606.31,"to":4608.69,"location":2,"content":"Um, okay."},{"from":4608.69,"to":4610.84,"location":2,"content":"So this is almost it for the talk, um,"},{"from":4610.84,"to":4614.6,"location":2,"content":"I just have one slide here of, uh, um,"},{"from":4614.6,"to":4617.87,"location":2,"content":"NLP not in kind of the academic researching context,"},{"from":4617.87,"to":4620.73,"location":2,"content":"which I have already been talking a lot about but NLP in industry,"},{"from":4620.73,"to":4623.07,"location":2,"content":"and really there's rapid progress there."},{"from":4623.07,"to":4626.31,"location":2,"content":"And I wanted to point to you two areas where I think there's"},{"from":4626.31,"to":4630.65,"location":2,"content":"especially a large interest in using NLP technology."},{"from":4630.65,"to":4632.24,"location":2,"content":"Um, one is dialogue,"},{"from":4632.24,"to":4634.01,"location":2,"content":"um, so for things like chatbots, right?"},{"from":4634.01,"to":4637.58,"location":2,"content":"There's the Alexa Prize where they're actually investing a lot of money in,"},{"from":4637.58,"to":4641.1,"location":2,"content":"um, having groups figure out how to improve chitchat dialogue."},{"from":4641.1,"to":4645.23,"location":2,"content":"Um, there's also I think a lot of potential for customer service, right?"},{"from":4645.23,"to":4648.17,"location":2,"content":"So improving basically automated systems that'll, um,"},{"from":4648.17,"to":4649.58,"location":2,"content":"you know, book you a flight,"},{"from":4649.58,"to":4652.39,"location":2,"content":"or help you cancel a subscription, or anything like that."},{"from":4652.39,"to":4655.46,"location":2,"content":"Um, and similarly, there's a lot of potential in health care."},{"from":4655.46,"to":4659.18,"location":2,"content":"Um, one is understanding the records of someone who,"},{"from":4659.18,"to":4662.06,"location":2,"content":"um, is sick and to help them- to help with diagnoses."},{"from":4662.06,"to":4663.94,"location":2,"content":"Um, I think another, um,"},{"from":4663.94,"to":4666.22,"location":2,"content":"equally important area is actually, uh,"},{"from":4666.22,"to":4669.02,"location":2,"content":"parsing, uh, biomedical papers."},{"from":4669.02,"to":4674.28,"location":2,"content":"Um, so, um, the number of biomedical papers that are being written is really insane,"},{"from":4674.28,"to":4676.1,"location":2,"content":"um, it's, it's way larger than the number"},{"from":4676.1,"to":4677.96,"location":2,"content":"of computer science papers that are being written."},{"from":4677.96,"to":4681.53,"location":2,"content":"[NOISE] Um, often if you're a doctor,"},{"from":4681.53,"to":4683.15,"location":2,"content":"or if you're a researcher, um,"},{"from":4683.15,"to":4686.36,"location":2,"content":"in medicine, you might want to look up something very specific, right?"},{"from":4686.36,"to":4687.62,"location":2,"content":"You might want to know what is"},{"from":4687.62,"to":4691.37,"location":2,"content":"the effect of this particular drug on this particular gene,"},{"from":4691.37,"to":4693.14,"location":2,"content":"or a cell with this particular gene."},{"from":4693.14,"to":4696.71,"location":2,"content":"Um, there's no good way right now of searching through, um,"},{"from":4696.71,"to":4700.18,"location":2,"content":"hundreds of thousands of papers to find if someone has a- has, uh,"},{"from":4700.18,"to":4703.09,"location":2,"content":"done this experiment and have results for 
this,"},{"from":4703.09,"to":4705.1,"location":2,"content":"um, particular combination of things."},{"from":4705.1,"to":4708.59,"location":2,"content":"Um, so automated reading of all this biomedical literature,"},{"from":4708.59,"to":4711.1,"location":2,"content":"um, could have a lot of value."},{"from":4711.1,"to":4713.96,"location":2,"content":"Okay, um, to conclude, um,"},{"from":4713.96,"to":4718.28,"location":2,"content":"there's been rapid progress in the last five years due to deep learning, um, in NLP."},{"from":4718.28,"to":4722.78,"location":2,"content":"Um, in the last year, we've seen another really kind of, uh,"},{"from":4722.78,"to":4725.3,"location":2,"content":"a dramatic increase in the capability of our systems,"},{"from":4725.3,"to":4727.61,"location":2,"content":"thanks to, uh, using unlabeled data."},{"from":4727.61,"to":4729.1,"location":2,"content":"So that's methods like BERT."},{"from":4729.1,"to":4734.21,"location":2,"content":"Um, and, um, the other kind of thing that's I think important to think about is that,"},{"from":4734.21,"to":4738.17,"location":2,"content":"NLP systems are starting to be at a place where they can have big social impact."},{"from":4738.17,"to":4744.85,"location":2,"content":"Um, so that makes some issues like bias and security very important. Um, thank you."},{"from":4744.85,"to":4746.69,"location":2,"content":"Uh, good luck finishing all your projects."},{"from":4746.69,"to":4754.8,"location":2,"content":"[APPLAUSE]."}]}
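For the multi-task fine-tuning approach mentioned in the lecture (the top GLUE entries take BERT and train it on several tasks at once), here is a minimal sketch of one common way to set this up: a single shared BERT encoder with one small classification head per task. It assumes the HuggingFace transformers library and PyTorch; the task names, label counts, and mixing strategy are illustrative, not taken from the lecture.

```python
# Minimal multi-task fine-tuning sketch: a shared BERT encoder with one small
# classification head per task. Assumes the HuggingFace `transformers` library
# and PyTorch; the task list and label counts below are illustrative only.
import torch
from torch import nn
from transformers import BertModel

class MultiTaskBert(nn.Module):
    def __init__(self, task_num_labels):
        super().__init__()
        # All tasks share the same pretrained encoder parameters.
        self.encoder = BertModel.from_pretrained("bert-base-uncased")
        hidden_size = self.encoder.config.hidden_size
        # Each task gets its own linear head on top of the [CLS] representation.
        self.heads = nn.ModuleDict(
            {task: nn.Linear(hidden_size, n) for task, n in task_num_labels.items()}
        )

    def forward(self, task, input_ids, attention_mask):
        outputs = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls_vector = outputs.last_hidden_state[:, 0]  # [CLS] token summarizes the input
        return self.heads[task](cls_vector)

# Training would alternate batches from the different tasks so that every task's
# loss updates the shared encoder (how to mix the tasks is a design choice).
model = MultiTaskBert({"mnli": 3, "sst2": 2, "qqp": 2})
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss_fn = nn.CrossEntropyLoss()
```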
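And for the diagnostic-classifier ("probing") setup described near the end of the lecture, here is a minimal sketch, again assuming the HuggingFace transformers library and PyTorch: BERT's parameters are frozen, the hidden states from one layer are treated as fixed inputs, and only a single linear softmax layer is trained, e.g. for part-of-speech tagging. The layer index, tag count, and data handling are illustrative, and wordpiece-to-word alignment is glossed over for brevity.

```python
# Minimal probing sketch: freeze BERT, train only a single linear "diagnostic
# classifier" on one layer's hidden states (e.g. for part-of-speech tagging).
# Assumes the HuggingFace `transformers` library and PyTorch; the layer index,
# number of tags, and data handling are illustrative only.
import torch
from torch import nn
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
bert.eval()
for p in bert.parameters():
    p.requires_grad = False          # we never backprop into BERT itself

NUM_TAGS = 17                        # hypothetical size of the POS tag set
LAYER = 7                            # which layer's representations we probe
probe = nn.Linear(bert.config.hidden_size, NUM_TAGS)  # the single softmax layer
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def probe_step(sentence, tag_ids):
    """One training step of the diagnostic classifier on a single sentence.
    `tag_ids` holds one gold tag id per wordpiece (alignment to words is
    glossed over here)."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        # hidden_states[0] is the embedding layer; [1..12] are the transformer layers.
        hidden = bert(**inputs).hidden_states[LAYER]   # shape: [1, seq_len, hidden_size]
    logits = probe(hidden.squeeze(0))                  # only the probe receives gradients
    loss = loss_fn(logits, tag_ids)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```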