Transcript

00:03 Alright, hello everyone, let's get started. I want to talk about a system called Certificate Transparency today. This is a bit of a departure from most of the topics we've talked about so far. We've talked about distributed systems that are really closed systems, where all the participants are trustworthy, maybe all being run by the same mutually trusting organization. Raft is that way: you just assume that the Raft peers do what they're supposed to do.

00:36 But there are also plenty of systems out there, particularly systems built at Internet scale, where the system is open and anyone can be an active participant. If you build systems that are completely open in that way, there's often no single universally trusted authority that everybody is willing to trust to run the system or to protect it. That is, everybody is potentially mutually suspicious of everyone else, and if that's the situation, you have to be able to build useful systems out of mutually distrusting pieces. For any Internet-wide open system, this makes trust and security top-level issues when you're designing a distributed system.

01:31 So the most basic question when you're building an open system is: when I'm talking to another computer or another person, am I talking to the right other computer, the right website? This problem is actually close to unsolvable. There are lots of solutions and none really works that well, but it is the problem that Certificate Transparency, today's topic, is trying to help with.

02:04 The material today ties
backward in the course to consistency. It turns out that a lot of what Certificate Transparency is doing is ensuring that all parties see the same information about certificates, and that's really a consistency issue. This material also ties forward to blockchain systems like Bitcoin, which is what we'll be talking about next week. Certificate Transparency is among the relatively few non-cryptocurrency uses of a blockchain-like design.

02:37 Alright, so by way of introduction, I want to start with the situation on the web, with web security, as it existed before 1995, before certificates. In particular, there was a kind of attack in those days that people were worried about, called a man-in-the-middle attack. This is a name for a class, or style, of attack.

03:14 The setup in those days is that you have the Internet, and you have people running browsers, sitting at a computer attached to the Internet. Say I'm sitting in front of my computer and I want to talk to a specific server; suppose what I want to do is talk to gmail.com. Ordinarily, as a user, I'd type gmail.com, so I'd know what it was I wanted to talk to, namely gmail.com. My browser would talk to DNS servers and ask, what's gmail.com? DNS would reply with an IP address, and I'd connect to that IP address. I need to authenticate myself, so I'd probably type my password into Gmail's website, and then Gmail would show me my email.

04:08 Without some kind of story for security, this system turns out to be quite easy to attack. One style of attack is what's called a
man-in-the-middle attack, where some evil person sets up another web server that serves pages that look just like the Gmail web server's, like the page that asks for your login and password. Then the attacker would maybe intercept my DNS packets, or just guess when I would have sent a DNS packet, and come up with a fake reply that, instead of providing the real IP address of the real gmail.com server, would provide the IP address of the attacker's fake computer. Then the user's browser, instead of talking to Gmail, would actually, unknown to them, be talking to the attacker's computer.

05:02 The attacker's computer would provide a web page that looks just like a login page, and the user types their login and password. Now the attacker's computer can forward that to the real Gmail and log in for you (of course you don't know that), get your current inbox back to the attacker's computer, which presumably records it along with your password, and then send your inbox or whatever to the browser. So if you can execute this kind of man-in-the-middle attack, the attacker's computer can record your password and record your email, and you'll never be the wiser. Before certificates, SSL, and HTTPS, there was really no defense against this.

05:42 Okay, so this is the man-in-the-middle attack, and this attacker here is the man in the middle. It looks just like Gmail to the browser, and it pretends to be the user when talking to Gmail, so that it can actually get the information from Gmail required to trick the user into thinking it's really Gmail.

06:01 All right, so that's the attack. In the mid-90s people came up with certificates, with SSL, which is also called TLS. It's the security protocol that you're using when
you use HTTPS links. Here the game was that gmail.com was going to have a public/private key pair, with a private key that only Gmail knows, sitting in its server. Then when you, the user, ask to connect to Gmail, in order to verify that you're really talking to Gmail, the user is going to demand that Gmail prove that it really owns Gmail's private key.

06:55 Well, of course, where does your browser find out Gmail's public key, which is what you need to check that it really has the private key? For that there's this notion of certificate authorities and certificates. When Gmail set up its server, it would contact a certificate authority, maybe on the phone or by email or something, and say, look, I want a certificate for the DNS name gmail.com. The certificate authority would try to verify that, yes, whoever is asking for the certificate really owns that name, that it really is Google or whoever owns gmail.com. If so, the certificate authority would provide a certificate back to gmail.com.

07:40 Basically, what a certificate contains is the name of the web server, the web server's public key, and a signature over this certificate made with the certificate authority's private key. So this is a self-contained assertion, checkable by checking the signature: an assertion by the certificate authority that the public key of gmail.com is really this public key.

08:18 The gmail.com server would just keep a copy of the certificate. If you connect to the gmail.com server with HTTPS, the first thing it does is send you back this certificate. At this point it's just a certificate, right? Now of course, since
gmail.com is willing to give it to anybody, the certificate itself is not at all private; it's quite public. Then the browser would send some information, like a random number for example, to the server and ask it to sign it with its private key. The browser can then check, using the public key in the certificate, that the random number was really signed by the private key that's associated with the public key in the certificate, and therefore that whoever it's talking to is really the entity that the certificate authority believes is gmail.com.

09:10 All right. Now, the reason why this makes man-in-the-middle attacks much harder is that, yes, you can set up a rogue server that looks just like gmail.com, and maybe you can even hack the DNS system (indeed you still can, if you're sufficiently clever or powerful) to tell people's browsers that they should go to your server instead of gmail.com. But once somebody's browser contacts your server, you're presumably not going to be able to produce a certificate that works. You can produce Gmail's certificate, but Gmail's certificate has Gmail's public key, and your server doesn't have their private key, so you can't sign the challenge the browser sent you. And presumably, since you're not the real Google and not the real Gmail, you're not going to be able to persuade a certificate authority to give you a certificate associating gmail.com with your public key. So this certificate scheme made man-in-the-middle attacks quite a bit harder, and indeed they are quite a bit harder now because of certificates.

10:16 Okay, so it turns out, though, that with the certificate scheme, people now have a lot of
experience with it, almost 25 years of experience, so we now know some of the things that go wrong. It was originally imagined that there would be just a couple of trustworthy certificate authorities, who would do a good job of checking that each request really came from who it claimed to come from: that if somebody asked for a certificate for gmail.com, the certificate authority would actually verify that the request came from the owner of gmail.com, and not hand out certificates for gmail.com to random people. But that turns out to be very challenging. For Google, maybe a certificate authority can convince itself that a request comes from Google, but for just x.com, it's very hard for a certificate authority to reliably say, oh yes, this request really came from the person who really does own the DNS name x.com.

11:16 A worse problem is that, while it was originally envisioned that there would be only a few certificate authorities, there are now literally hundreds of certificate authorities out there, and any certificate authority can generate a certificate for any name. Indeed, if you're a website owner, you're allowed to change certificate authority to whoever you like. So there's no sense in which certificate authorities have limits on their powers: any certificate authority can produce any certificate.

11:51 And since there are a couple hundred certificate authorities, each browser, like Chrome or Firefox, has built into it a list of the public keys of all couple hundred certificate authorities, and if any of
them has signed a certificate produced by a web server, the certificate is acceptable.

12:16 The result of this is that there have been multiple incidents of certificate authorities producing bogus certificates. That is, they produced certificates that said they were for Google or Gmail or some other real company, certificates for one of Google's names, but that were not issued to Google; they were issued to someone else entirely. Sometimes this happens just by mistake, because the certificate authority doesn't realize that it's doing the wrong thing, and sometimes it's actually quite malicious. There have certainly been certificates issued to people who just wanted to snoop on people's traffic and mount man-in-the-middle attacks, and who did mount man-in-the-middle attacks. Today's readings mention a couple of these incidents. They're particularly troubling because they're hard to prevent: there are so many certificate authorities, and not all of them...

13:16 Sorry, the question was, what was the last line in the certificate box? It's a signature over the certificate by the certificate authority, using the certificate authority's private key.

13:27 Okay, so there have been incidents of bogus certificates, certificates for real websites like Google issued to totally the wrong people, and those certificates have been abused. It's not clear how to fix the certificate authority system itself to prevent them, because there are so many certificate authorities and you just can't expect that they're going to be completely reliable.

13:55 So what can we do about this? One possibility would be to have a single online database of all valid
certificates, so that when a browser contacts a website and the website hands it a certificate, which might or might not be valid, you could imagine the browser would contact the global valid-certificate database and ask: is this really a valid certificate, or is it a bogus certificate issued by a rogue certificate authority?

14:24 There are many problems with that approach. One is that it's still not clear how anybody can distinguish valid, correctly issued certificates from bogus certificates, because typically you just don't know who the proper owner of a DNS name is. Furthermore, you need to allow certificate owners to change certificate authorities, or renew their certificates, or they may lose their private key and need a new certificate, with a new public/private key pair, to replace their old certificate. So people's certificates change all the time. And finally, even if it were technically possible to distinguish correct certificates from bogus ones, there's no entity that everybody would trust to do it. Everybody in the world, the Chinese, the Iranians, the Americans: there's not any one outfit that they all trust, and that's the root reason why there are so many certificate authorities. So you really can't expect there to be a single clearinghouse that accurately distinguishes between valid and invalid certificates.

15:33 However, what Certificate Transparency is doing is essentially trying to do the best that it's possible to do, to take the longest step it can towards a database of valid, trustworthy certificates.

15:54 So now I'm going to give an overview of the general strategy of Certificate Transparency. The style of
Certificate Transparency is that it's an audit system. Because it's so hard, close to impossible, to just decide whether a given person owns a name, Certificate Transparency isn't building a system that prevents bad things from happening, which would require you to be able to detect right away that a certificate was bogus. Instead, Certificate Transparency is going to enable audit. That is, it's a system to cause all the information to be public so that it can be inspected by people who care. It will still allow people to issue bogus certificates, but it's going to ensure those certificates are public and that everybody can see them, including whoever it is that owns the name that's in the bogus certificate.

17:06 So this fixes the problem with the pre-Certificate Transparency system, where certificate authorities could issue bogus certificates and no one would ever know. They could even give them to a few victim browsers, who would be tricked by them. Because certificates aren't generally public, a certificate authority could issue a bogus certificate for anybody, for Google or Microsoft, and Google or Microsoft might never realize it. The incidents that have come to light have generally been discovered only by accident, not because they were bound to be discovered. So instead of relying on accidental discovery of bogus certificates, Certificate Transparency is going to force them into the light, where it's much easier to notice them. Again, it has an audit flavor, not a prevention flavor.

17:59 Okay, so the basic structure. Again we have gmail.com, or some other service that wants
a certificate. As usual, they're going to ask one of the hundreds of CAs for a certificate when the web server is first set up. So it asks for a certificate, and the certificate authority sends the certificate back to the web server, because of course it's the web server that gives the certificate to the browser.

18:32 At the same time, though, the certificate authority is going to send a copy of the certificate, or equivalent information, to a Certificate Transparency log server. In the real system there are multiple independent Certificate Transparency log servers; I'm going to assume there's just one. This is some service that, it turns out, we're not going to have to trust. The certificate authority sends its certificate to this certificate log service, which has been maintaining a log of all issued certificates, or at least all the ones that certificate authorities have told it about. When it gets a new certificate, it appends it to its log. So this might have millions of certificates in it after a while.

19:22 Now, when the browser, and some human, wants to talk to a website, they set up an HTTPS connection to Gmail, and Gmail sends them a certificate back. The browser is going to send that certificate to the certificate log server and ask, is this certificate in the log? The certificate log server is going to say yes or no, and if it is in the log, then the browser will go ahead and use it.

19:53 Now, the fact that it's in the log doesn't mean it's not bogus, right? Because any certificate authority, including the ones out there that are malicious or badly run, can insert a certificate into the log system and
therefore perhaps trick users into using it. So, so far, we haven't built a system that prevents abuse. However, it is the case that no browser will use a certificate unless it's in the log.

20:25 So at the same time, Gmail is going to run what the CT system calls a monitor. For now we'll just assume that there's a monitor associated with every website. This monitor periodically talks to the certificate log servers and asks, please give me a copy of your log, or really, please give me a copy of whatever has been added to your log since I last asked. That means the monitor is going to be aware of every single certificate that's in the log. But also, because the monitor is associated with Gmail, the monitor knows what Gmail's correct certificate is. So if some rogue certificate authority issues a certificate for Gmail that's not the one Gmail itself asked for, then Gmail's monitor will stumble across it in the certificate log, because Gmail's monitor knows Gmail's correct certificate.

21:26 Now of course, the rogue certificate authority doesn't have to send its certificate to the certificate log system. But in that case, when browsers, maybe accidentally, connect to the attacker's web server and the attacker's web server gives them the bogus certificate, if the attacker hasn't put it in the log, then the browser won't believe it and will abort the connection because it's not in the log. So because browsers require certificates to be in the log, the log forces all certificates to be public, where they can be audited and checked by monitors who know what the proper certificates are. And so some monitors
are run by big companies, and companies know their own certificates. Some monitors are run by certificate authorities on behalf of their customers, and again, those certificate authorities know what certificates they've issued to their customers, and they can at least alert their customers if they see a certificate they didn't issue for one of their customers' names. In addition, there are some totally third-party monitor systems, where you give the third-party monitor your names and your valid certificates, and it checks for unexpected certificates for your names.

22:41 All right, this is the overall scheme, but it depends very much on browsers seeing the very same log contents that monitors see. And remember, we're up against this problem that we're not sure we can trust any component in this system. Indeed, we found that some certificate authorities are malicious, or have employees who can't be trusted, or are sloppy and don't follow the rules. So we have to assume that the same will be true of the certificate log servers: that some of them will be malicious, some of them may conspire with rogue certificate authorities and intentionally try to help them issue bogus certificates, some of them may be sloppy, and some may be legitimate but have employees who are corruptible, so that if you pay them a bribe they'll do something funny to the log, delete something or add something to it.

23:38 So what we need to build is a log system where, even though the log operator may be uncooperative and untrustworthy, we can still be sure (or at least find out if it's not the case) that browsers are seeing the same log contents as monitors, so that if a browser uses a certificate that was in the log, the monitor
who owns that name will eventually see it. So what we need to do is build a log system that is append-only, so that it can't show a certificate to a browser and then delete it before monitors see it. We need no forks, in the sense that we don't want the log system to keep two logs, one of which it shows to browsers and one of which it shows to monitors. And the servers are untrusted: we can't be sure that the certificate log servers are correct.

24:56 So just to back up a bit, these are the critical properties we need for the log system, which is larger than just the log servers: it's the entire system of the log servers plus the various checks. First, we have to prevent deletion; that is, we need the logs to be append-only. If a log server could delete items out of its log, then it could show a bogus certificate to a browser and claim it's in the log, and maybe it is in the log at that time, so the browser uses it. But then the log server could delete that certificate from its log, so that by the time the monitors came to look at the log, the bogus certificate wouldn't be there. So we need a system that either prevents deletion or at least detects if deletion occurred. That's the sense in which the system needs to be append-only.

25:52 We also have to prevent forks, or equivalently, what's called equivocation. Maybe the certificate log server is implementing append-only logs, but if it implemented two different append-only logs, and showed one to browsers and the other to monitors, then we could be in a position where the log we showed the
browsers contains the bogus certificate, but the log we showed to monitors doesn't contain the bogus certificate. So we have to rule out equivocation, all without trusting the servers.

26:45 So how can we do this? Now we're getting into the kind of details that the last of the assigned readings was talking about. The first step is this thing called a Merkle tree. This is something that the log servers are expected to build on top of the log. The idea is that there's the actual log itself, which is a sequence of certificates, certificate one, certificate two, and so on, presumably in the order that the certificates were added to the system. There may be millions; I'm just going to assume there are a couple.

27:28 Now, it's going to turn out that we don't want the browsers to have to download the whole log, so we need tools that allow the logging system to send trustworthy, or unambiguous, summaries of what's in the log to the browsers. I'll talk in a bit about exactly what those summaries are used for. The basic scheme is that the log servers use cryptographic hashes to hash up the complete set of records in the log and produce a single cryptographic hash, which these days is typically 256 bits long. So the cryptographic hash summarizes the contents of the log.

28:19 The way that's done is basically as a tree structure of hashes, always hashing together pairs of numbers. I'm going to write H for a hash. At the zeroth level, each one of the log entries has a hash, so at the base level we have the hash of each log entry, each certificate. And then we're going to hash
up pairs, so that at the next level we have a hash of this concatenated with this, and a hash of this concatenated with this. And then at the top level, what we're doing is hashing the concatenation of these two hashes. This single hash here is an unambiguous stand-in for the complete log.

29:26 One of the properties of cryptographic hashes like SHA-256 is that it's not feasible to find two inputs to the hash function that produce the same output. That means if you tell somebody the output of the hash function, there's only one input you're ever going to be able to find that produces that output. So if the log server does hash up the contents of its log in this way, only this sequence of log records will ever be able to produce that hash. We're effectively guaranteed that the log server is not going to be able to find some other log that produces the same final tree hash as this sequence of log entries.

30:09 All right, so this is the Merkle tree. This is the tree hash that summarizes the entire log. The top of the Merkle tree we'll actually call a signed tree head, because in fact the log servers take this hash at the top of the tree, sign it with their private key, and give that to clients, to browsers and monitors. The fact that they've signed it means that they can't disavow it later: it was really them that produced it. That's just so we're able to catch lying log servers.

30:47 So the point here is that once a log server has revealed a particular signed tree head to a browser or monitor, it's committed to some specific log contents, because it will never be able to produce different log contents that yield the same hash.
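
The tree hash described above can be sketched in a few lines of Python. This is only a minimal sketch under the lecture's simplifying assumption that the log holds a power-of-two number of entries; the names `h` and `tree_head` and the placeholder entries are made up for illustration. It leaves out the log server's signature over the head, and it is not the exact hashing used by real CT logs (RFC 6962 additionally puts distinct prefix bytes on leaf and interior hashes).

```python
import hashlib
from typing import Sequence


def h(data: bytes) -> bytes:
    """SHA-256 of data, returned as 32 raw bytes."""
    return hashlib.sha256(data).digest()


def tree_head(entries: Sequence[bytes]) -> bytes:
    """Hash log entries pairwise, level by level, down to a single tree hash.

    Simplified as in the lecture: the number of entries must be a power of two.
    """
    n = len(entries)
    if n == 0 or n & (n - 1) != 0:
        raise ValueError("this sketch only handles a power-of-two number of entries")
    # Base level: the hash of each log entry (each certificate).
    level = [h(e) for e in entries]
    # Each higher level hashes the concatenation of adjacent pairs.
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


if __name__ == "__main__":
    # Placeholder byte strings standing in for real certificates.
    log = [b"cert-1", b"cert-2", b"cert-3", b"cert-4"]
    print("tree head:", tree_head(log).hex())

    # Changing any entry changes the head, which is why revealing a head
    # commits the log server to specific log contents.
    tampered = [b"cert-1", b"cert-2", b"bogus", b"cert-4"]
    print("heads differ:", tree_head(log) != tree_head(tampered))
```

A client holding only the 32-byte head cannot reconstruct the log from it, but it can later check any claimed log contents against that head by recomputing the tree.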
To produce the same hash, the log has to have the same contents, so the hashes really function as a kind of commitment.

31:10 Okay, so this is what the Merkle tree looks like for a particular log. Now, the third reading today outlines how to extend the log, how to add records to the log, for arbitrary numbers of records. I'm just going to assume that the log always grows by factors of two, which is impractical but makes it easier to explain. That means that as certificate authorities send in new certificates to add to the log, the log server will wait until it has as many new records as it has old records, and then produce another tree head. So in order to extend the log, the log server is going to wait until it has another four records, hash them pairwise just as before, and then produce a new tree head that is the hash of the concatenation of these two hashes, the top of the old tree and the top of the new records' subtree. This is the new tree head for the new, expanded log. That means that as time goes on and the log grows longer and longer, the log server produces a sequence of higher and higher tree heads.

32:44 Okay, so this is the structure that we're expecting log servers to maintain. Of course, who knows what they're actually doing, especially if they're malicious, but the Certificate Transparency protocol is written as if the log server were actually doing this.

33:03 All right, so what do we need to do? The point of these Merkle trees is to use them to force log servers to prove certain things about the log that they're maintaining, and we're going to want to know what those proofs look like. The first kind of proof is what I'll call
a proof of inclusion. This is what a browser needs when it wants to find out whether a certificate that it has just been given by a web server is really in the log. It's going to ask the log server: look, here's a certificate, is it in your log? And the log server is going to send back a proof, not just that the certificate is in the log, but actually of where it is, what its position is in the log.

34:08 Of course, the browser wants this proof because it doesn't want to use the certificate if it's not in the log: if it's not in the log, then monitors won't see it, and there's no protection against the certificate being bogus. And it needs to be a proof because we can't afford to let a malicious log server change its mind. We don't want to take the log server's word for it, because a malicious log server might just say yes. If a log server does lie, these proofs are going to help us catch the fact that the log server lied, and produce evidence that the log server is malicious and should be ignored from now on. That's the ultimate sanction against log servers: the browsers actually have a list of acceptable log servers, and these proofs would be part of the evidence to cause one of the log servers to be taken off that list if it was malicious.

35:16 Okay, so we want the log server to produce a proof that a given certificate is in its log. Actually, the first step is that the browser asks the log server for the current signed tree head. So what the browser is really asking is: is this certificate in the log that's summarized by
this current signed tree head? 35:45 And the log server may lie about the signed tree head, right? The browser asks it for the current signed tree head and then for a proof that the certificate is in the log; 35:54 the log server could lie about the signed tree head, and we'll consider that later. 36:01 But for now let's assume that the browser has the correct signed tree head and is demanding a proof. 36:09 Okay, so for simplicity I'm just going to explain how to do this for a log with two records, and it turns out that extending that to a log with some higher power of two records is relatively easy. 36:21 So the browser has a particular signed tree head. 36:27 Let's suppose the correct log that sits under that signed tree head is the two-element log a, b, for particular certificates a and b. 36:39 That means that the correct Merkle tree for it has at the bottom the hashes of a and b, 36:52 and then the signed tree head is actually the hash of the hash of a concatenated with the hash of b. 37:01 So let's suppose this is the signed tree head that the log server actually gave to the client. 37:11 Of course the client only knows this final hash value; it doesn't actually know what is in the log. 37:21 If the browser asks for a proof that a is in the log, then the proof that the log server can return is simply a's position in the log and the hash of the other element in the log: 37:50 so, zero and the hash of b. 37:56 And that is enough information for the client to convince itself that a really is at position zero, 38:05 because it knows the certificate it's interested in, so it can hash it; part of the proof
was 38:13 the hash of the other element in this lowest-level hash, so the browser now knows H(a) and H(b). 38:21 It can hash them together and see if the result is the same as the signed tree head that it has. 38:27 If it is, then that means that the log server has actually produced a valid proof that certificate a is at position zero in the log summarized by this signed tree head. 38:45 And it turns out that in larger logs, if you need a proof that a is really here, all you need is the sequence of hashes of the other branch of each hash up to the signed tree head that you have. 39:07 So in a four-element log, if you need a proof that a is at position zero, you need this hash and then you need this hash, 39:15 and if the log is bigger, say eight elements, then you also need this hash. 39:19 Assuming that you have the signed tree head, you can take the element you know, hash it together with each of these other hashes in turn, and see if the result is equal to the signed tree head. 39:28 Okay, so suppose the browser asks whether x is in the log at position zero. 39:37 Well, x isn't in the log, so hopefully there's no easy way for the log server to produce a proof that x is in the log at position zero. 39:48 But suppose the log server wants to lie, and it's in the position where it has already exposed a signed tree head for a log that contains a and then b. 39:55 The browser doesn't know it was a and b, doesn't know what's in the log, and the log server wants to trick the browser into thinking that x is really at position zero. 40:07 Well, it turns out that in order to do that, for this small log, the log server has to produce some y; it needs to find a
40:31 y such that the hash of the hash of x concatenated with y is equal to the signed tree head, 40:41 because we're assuming the client already has the signed tree head. 40:45 We need to find some number here that, when hashed together with the hash of x that the client's asking about, produces that same signed tree head. 40:55 Well, the assumption is that the signed tree head was actually for some other log, 40:59 because we're trying to rule out the possibility that the log server can give you a signed tree head for one log but then convince you that something else is in that log that's not there. 41:09 So the signed tree head really was produced from the hashes of the records that really were in the log. 41:17 And since x is definitely different from a, that means the hash of x is, for practical purposes, different from the hash of a, 41:26 and that means that the log server needs to find two different inputs to the hash function that produce the same output. 41:38 The assumption, widely believed to be true for practical purposes, is that that's not possible for cryptographic hashes. 41:46 Therefore, if the signed tree head was produced by hashing up one log, it will not be possible to find these other hash values that would be required to produce a proof that some other element was in the log that wasn't really there. 42:07 Any questions about this? 42:18 A nice thing about this is that the proofs consist of just the other hashes on the way up to the root. 42:27 If there are n certificates, there are only log n other hashes, so the proofs are reasonably concise; in particular they're much, much smaller than the full log. And since
every browser that needs to connect 42:40 to a website is going to need one of these proofs, it's good if they're small. 42:48 Okay, well, this whole discussion was assuming that the signed tree head the browser had was the correct signed tree head. 43:04 But there's no immediate reason to believe that: if the log server is malicious and it wants to trick a client, why would it give the client the correct signed tree head? 43:14 Why doesn't it just give it the signed tree head for the bogus log that it wants to trick the client into using? 43:20 So we have to be prepared for the possibility that the log server has cooked up a completely different log for the browser that's not like anybody else's log, and that contains the bogus certificates that the malicious log server wants to trick this client into believing. 43:36 So what do we do about that? 43:43 Well, it turns out that, at least in the first instance, this is totally possible. 43:52 Usually the way this will play out is that we'd have some browser that was seeing the correct logs until some point in time when somebody wanted to attack it. 44:06 You want the browser to still be able to use all the websites that it ordinarily uses, plus a different log with bogus certificates that the log server wants to trick just that victim browser into using. 44:23 This is called a fork attack, or more broadly equivocation. 44:31 The reason why people call this kind of attack a fork attack is this: never mind the Merkle tree for a moment and just consider the log. 44:42 Usually the log already has millions of certificates in it, and everybody's seen
the beginning part of the log. Then at 44:52 some point in time we want to attack: we want to persuade our victim to use some bogus certificate B, 45:00 but we don't want to show B to anybody else, certainly not to the monitors. 45:05 So we're going to cook up this other log that continues as usual and contains new submissions but definitely doesn't contain the bogus certificate B. 45:14 What this looks like is a fork, because the main log that monitors are shown is off on one fork, and this log we're cooking up especially to trick a victim is a different fork. 45:31 This is the construction that the malicious log server would have to produce if it wants to trick a browser into using a bogus certificate. 45:37 And again, it's possible to do this, at least briefly, with certificate transparency. 45:48 Luckily, though, that is not the end of the story, and certificate transparency contains some tools that make forks much more difficult. 46:01 So the basic scheme is this, and this is the way certificate transparency is intended to work, but doesn't quite. 46:20 What's going on here is that the monitors and people who are not being attacked are going to see a particular signed tree head, let's say signed tree head 1; of course it's going to change as the log extends. 46:33 And the victim, we know, must see some other signed tree head, because it is a signed tree head that is hashed over different certificates, so it's guaranteed to be different from the signed tree heads that the malicious server is showing to monitors. 46:51 If only the browsers and monitors could compare notes, they would maybe instantly realize that they were seeing different
trees, and all it takes, 47:00 if we play our cards right, is comparing the signed tree heads they've gotten from the log server to realize: wait a minute, we're seeing different logs, something's terribly wrong. 47:10 So the critical thing we need to do is have the different participants in the system be able to compare signed tree heads, 47:21 and certificate transparency has a provision for this called gossip. 47:30 The details don't really matter, but what it amounts to is that all the participants drop off the recent signed tree heads they've seen into a big pool that they all inspect, 47:43 to try to figure out if there are inconsistent signed tree heads that clearly indicate divergent logs that have forked. 47:52 So we're going to gossip, which really means exchange signed tree heads and compare. 48:02 It turns out that current certificate transparency implementations don't do this, but they ought to, and they'll figure it out at some point. 48:17 All right, so the question is: given two signed tree heads, how do we decide if they're evidence that the log has been forked? 48:27 The thing that makes this hard is that even if a log hasn't been forked, as it's appended to, new signed tree heads will become current. 48:40 So maybe signed tree head 1 was the legitimate signed tree head of the log at this point, then some more certificates are added and signed tree head 3 becomes the correct head of the log, and then signed tree head 4, et cetera. 48:55 So really what this gossip comparison needs to do is distinguish situations where one signed tree head describes a log that's a prefix of the log described by another signed tree head, 49:11 because this is the legitimate
situation, where 49:15 these two signed tree heads are different but the second one really does subsume the first one. 49:20 We want to distinguish that from two signed tree heads that are different where neither describes a log that's a prefix of the other one's log. 49:28 We want to tell these two cases apart. 49:31 Telling those situations apart is the purpose of the consistency proof, the log consistency proof or Merkle consistency proof that the readings talked about. 49:49 So this is the log consistency proof. 50:05 The game here is that we're given two signed tree heads, H1 and H2, and we're asking: is H1's log a prefix of H2's log? 50:22 These are hashes, so it's really asking about the logs that the hashes represent. 50:38 We're hoping the answer is yes, and if the answer's no, that means that the log server has forked us and is hiding something from one party or the other. 50:43 Okay, well, as I mentioned before, as the log grows the Merkle tree also grows, 50:59 and what we see is a sequence of signed tree heads, one each time the log doubles in size. 51:14 Let me draw in the actual hash functions: each of these hash functions is hashing up two things, and the result of this hash function is one of the inputs to the next signed tree head, 51:28 and the result of that hash function is one of the inputs to the next signed tree head, and we get this kind of tree of signed tree heads. 51:42 All right, and any two signed tree heads, if they're legitimate, if H1's log is a prefix of H2's, that means that maybe this one's H1 and this one's H2, and they're going to have this relationship: 51:52 if H1's log is a prefix of H2's, then they must have this relationship where H2 was
produced by taking H1, hashing it 52:02 with some other thing, and maybe hashing that with some other thing, until we get to the point where we find H2. 52:06 What that means is that if a browser or monitor challenges a log server to prove that H1's log is really a prefix of H2's log, 52:23 what the log server has to produce is this sequence of the other side of each of the signed tree head hashes on the way from H1 to H2, 52:34 and this is the proof. 52:39 And again, this is reminiscent of the inclusion proofs. 52:46 Then to check the proof, you take H1, hash it with the first other thing, hash that along with the second other thing, and so on until you get to the last one of these, and that had better be equal to H2. 53:00 If it is, it's a proof that H2's log is an extension of H1's; otherwise the log server has evidently tried to fork you. 53:13 And again, the basis of this is: supposing H1's log isn't a prefix of H2's, 53:25 since H2 was created from some actual log that doesn't begin with H1's log, there's no way that the log server could cook up these values that are required to cause this repeated hashing of H1 to equal H2, 53:47 assuming that the cryptographic hash really does prevent you from finding two different inputs that produce the same output. 54:02 All right, so this is the log consistency proof. 54:06 Okay, so the question is: who usually challenges the log server? 54:12 I'll actually talk about that in a minute, but it turns out that both browsers and monitors challenge the log server; 54:28 it's usually the browsers challenging the log server, and that's the most important thing. 54:33 But there are two points in time at
which you need to 54:36 challenge the log server to produce these proofs, and I'll talk about both of them. 54:41 All right. One point at which these proofs are used is as part of gossip, as I outlined. 55:07 The scheme that's intended for gossip is that browsers will periodically talk to some set of central repositories and contribute to a pool of signed tree heads the signed tree heads they've recently seen from the log server. 55:24 The browsers will also periodically pull out random signed tree heads that other browsers have seen, just pulled at random out of the pool. 55:37 And there'll be multiple of these pools, run by different people, so that the scheme is proof against one of them cheating. 55:43 Then, for any random signed tree head it pulls out of the pool, the browser will ask the log server to produce the log consistency proof for the pair: the signed tree head from the pool and the browser's own. 55:59 If nobody's cheating, it should always be easy for the log server to produce any consistency proof that's demanded of it. 56:09 But suppose the log server has forked somebody and given them a signed tree head that really describes a totally different log, or even a log that differs in one element from the log that everybody else is seeing. 56:22 Eventually that browser will contribute that signed tree head to the gossip pool, 56:30 then eventually somebody else will pull that signed tree head out of the pool and ask for a proof against some other signed tree head that presumably is on a different fork, 56:42 and then the log server will not be able to produce the proof. 56:46 And since they're
signed, since the signed tree heads are signed by 56:52 the log server, that's just absolute proof that the log server has forked two of its clients, presumably with intent to reveal a bogus certificate to one of them and hide it from the other. 57:05 Okay, but there's actually another place where it turns out you need these consistency proofs: not just during gossip, but also during the ordinary operation of the browsers. 57:19 The difficulty is this. Suppose the browser is seeing a consistent version of the log, the same as everybody else, but then the log server wants to trick it into using this bogus certificate. 57:41 So the log server makes a signed tree head that's different from everybody else's, one that refers to a malicious log that contains this bad certificate, and sends it to the browser. 57:58 Since it doesn't want other people to notice, and certainly doesn't want the monitors to notice, it keeps this other log, which is what everybody else is seeing. 58:07 All right, so now the browser asks for an inclusion proof, and the log server will be able to produce the inclusion proof, because the signed tree head that the browser has really does refer to this bad log. 58:23 The browser will go ahead and use this bogus certificate, and maybe get tricked and give away the user's password, who knows what. 58:31 But depending on the details of how browsers work, we're at risk that the next time the browser, which doesn't realize anything's gone wrong, talks to the log server, 58:42 the log server might then say: there's a new log with a bunch of new stuff on it, and here is the signed tree head of the current log; why don't you switch to using that as
your signed tree head? 58:54 If that were allowed to happen, then the browser would have completely lost the evidence that anything went wrong, 59:03 because now the browser is using the same tree as everybody else. It's going to contribute this signed tree head to the gossip pool, and it's all going to look good. 59:10 We had this brief evil log fork that was revealed, but if browsers are willing to accept any new signed tree head, then the log server can basically have the browser forget about it. 59:25 So what we want is that if the log server shows a particular log to the browser, it can't then trick the browser into switching away from that log. 59:43 That is, we want to be able to enforce that the browser sees only strict extensions of the log that it's seen already, and doesn't simply get switched to a log that is not compatible with the log the browser has seen before. 60:00 The property that we're looking for is actually called fork consistency, 60:03 and what it means is that if the browser's been forked onto a different fork from other people, then it must stay on that fork; it should never be able to switch to the main fork. 60:22 The reason for that is we need to preserve this bad signed tree head and its successors, so that when the browser participates in the gossip protocol, it's contributing signed tree heads that nobody else has and that cannot be proved to be compatible using the log consistency proof. 60:46 Okay, so how do we achieve fork consistency? 60:48 Well, it's actually easy with the tools we have now. 60:52 Every time the log server tells a browser, here's a new signed tree head for a longer log, the browser will not
accept the new signed tree head until the 61:04 log server has produced a log consistency proof that the new signed tree head describes an extension of the old one's log, 61:15 that is, that the log of the old signed tree head is a prefix of the log of the new signed tree head. 61:19 Of course, if a log server has forked the browser and it's keeping the browser on that same fork, it can produce the proofs, 61:26 but it's digging its grave even deeper, because it's producing more and more signed tree heads for a fork that will eventually be caught by the gossip protocol. 61:37 Whereas if the log server tries to cause the browser to switch to a signed tree head that describes the same log everybody else has been seeing, 61:48 the browser will demand a consistency proof and the log server will not be able to produce it, because indeed the log described by the first signed tree head is not a prefix of the log described by the second signed tree head. 62:05 Okay, so these log consistency proofs provide fork consistency, 62:13 and fork consistency plus gossiping, with log consistency proofs required for the signed tree heads found by gossiping, 62:24 the two of them together make it likely that all the participants are seeing the same log, and that if they're not seeing the same log they'll be able to detect that fact by the failure of a log consistency proof. 62:45 Any questions? 62:53 Okay, so: how many log servers are there? That is a great question. 62:59 I described the system as if there was just one log server. It turns out in the real system there are lots of log servers, at least dozens. 63:05 So this is a deployed system that is actually used by Chrome and I think Safari. 63:12 There are at least dozens of these log servers,
and certificate authorities are actually 63:19 required by Chrome to submit all their certificates to multiple log servers. 63:25 The different log servers don't actually keep identical logs. 63:30 The convention is that a certificate authority will submit a new certificate to, say, a couple, maybe five different log servers, 63:41 and the certificate information that a website gives your browser includes the identities of the certificate transparency log servers that have the certificate in their log, 63:54 so your browser knows which log servers to talk to. 63:56 The reason why there's more than one of them is of course that some of them may go bad: some of them may turn out to be malicious or go out of business or who knows what, 64:06 and in that case you still want to have a couple more to fall back on. 64:10 They don't have to be identical, because as long as the certificate is in at least one log that, as far as anybody knows, is trustworthy, that's sufficient. 64:25 The issue here is not really the fact that the log had the certificate in it, because that's not proof that the certificate is good. 64:40 All we're looking for is log servers that aren't forking the monitors and browsers that use them. 64:47 So it's enough for a certificate to be in even a single log server that's not forking people, because then the monitors are guaranteed to see it, because the monitors check all the log servers. 64:59 So if a bogus certificate shows up in even a single log server, the monitors will eventually notice, because all the monitors look at all the log servers that the browsers are willing to accept. 65:18 All right, another question: what prevents a log server from going bad and issuing
65:25 bogus certificates before they get caught? 65:28 Nothing, actually. That's definitely a defect in the system: at least for a while you can have a malicious log server contributing bogus certificates. 65:43 So if you have a certificate authority that's become malicious and is issuing bogus certificates, which look correct but are bogus, and a log server that's willing to put these certificates in its log, and of course they all are, then at least for a while browsers will be willing to use them. 66:05 The thing is, though, that they will be caught. 66:09 The intent of the system is to improve on the situation before certificate transparency, where if somebody was issuing bogus certificates and browsers were being tricked into using them, you might never find out, ever. 66:23 In the certificate transparency world you may not find out right away, and so some people may use them, 66:28 but then relatively quickly, a few days or something, the monitors will start to notice that there are bad certificates in the logs, and somebody will go and track it down and figure out who is malicious or who is making mistakes. 66:52 Yeah, so I guess a certificate transparency log server could refuse to talk to the monitors. 67:02 I'm not sure. We're now treading into a kind of non-technical region: what to do if there's evidence that something's gone wrong. 67:13 This is actually quite hard, because much of the time when something seems to go wrong, even bogus certificates, often the reason is just that somebody made a mistake. 67:22 It was a legitimate mistake, somebody blew it, and it's not evidence of malice, it's just
that somebody made a mistake. 67:31 I think what would happen if a log server was misbehaving in almost any way, like not answering requests, is that if it was doing it consistently, people would notice and either ask them to shape up or stop using them: 67:44 the browser vendors would take that log server out of the list of acceptable log servers after a while. 67:50 But yeah, there's a gray area of bad behavior that's not bad enough to warrant being taken off the acceptable list. 67:57 The question is: what if a log server has been found to have forked, what happens then? 68:03 I think what would happen is that the browser vendors would talk to the people running the log server and ask them what happened. 68:17 If they came up with a convincing explanation that they had made a mistake, which maybe they could (maybe, I don't know, their machine crashes, it loses part of the log, and they restart from a prefix of the log and start growing a different log), 68:34 if it seems like an honest mistake, then, well, it was a mistake. 68:41 But if the log server operators can't provide a convincing explanation of what happened, then I think the browser vendors would just delete them from the list of acceptable log servers. 68:53 Okay, but these are problems with the system, 69:06 because the definitions of things like who owns a name, or what's acceptable, or whether it's okay for your server to be down or not, these are very hard properties to pin down. 69:18 I think the system is not foolproof: you could definitely get away with bad behavior at least for a while,
but the hope is that there's 69:32 strong enough auditing here that if some certificate authority or log server was persistently badly behaved, people would notice, the monitors would notice. 69:44 They may not do anything for a while, but eventually they would decide that you're either too much of a pain or too malicious to be part of the system, and delete you from the browser lists. 69:58 Of course this puts the browser vendors in a position of quite strong power. 70:03 So while the system is in general pretty decentralized, in that there can be lots of certificate authorities and lots of certificate transparency log servers, 70:10 there's only a handful of browser vendors, and because they maintain the lists of acceptable certificate authorities and log servers, they do have a lot of power. 70:26 It's the way it is, unfortunately. 70:31 Okay, so, things to take away from the certificate transparency design. 70:35 One thing is the key property it has, which is super important: everyone sees the same log, even if some of the parties are malicious. 70:46 Either everyone sees the same log, or they can accumulate evidence from failed proofs that something funny is going on. 70:53 And because both the browsers that are using those certificates and the owners of the DNS names who are running monitors see the same log, because of these proofs, 71:05 the monitors can detect problems, and therefore, even though the browsers can't actually detect bogus certificates themselves, they can at least be confident that if there are bogus certificates out there, monitors will detect them and possibly put them on revocation lists. 71:19 Actually, that's something I didn't mention: if a monitor spots what must be a bogus certificate, like MIT sees
71:29 somebody they don't know about being issued a certificate for mit.edu. 71:34 It turns out there's a pre-existing revocation service that you can put bad certificates on, and that browsers check. 71:41 So if a monitor sees a bogus certificate, it can actually be effectively disabled by putting it in the certificate revocation system. 71:49 That's not part of certificate transparency; it's been around for a long time. 71:53 Okay, so the key property is that everyone sees the same log of certificates. 71:58 Another thing to take away from this is that if you can't figure out a way to prevent bad behavior, maybe you can build something usable that relies on auditing instead of prevention; 72:13 that is, something that can detect bad things after the fact. That might be good enough, and it's often much easier than preventing the bad things. 72:21 Some technical ideas in this work: one is this idea of equivocation, that a big danger is the possibility that a malicious server will provide split views, one view to one set of people and another view to another set of people. 72:39 It's usually called a fork or equivocation, and it's an important kind of attack. 72:43 Another is this fork consistency property. It turns out that when you're worried about forks, it's often valuable to build a system that forces the malicious server, once it has forked somebody, to keep them on that fork, so it can't erase evidence by erasing a fork. 73:00 The final technical trick is the notion of gossiping in order to detect forks, 73:06 because if the participants don't communicate with each other, it's typically not possible to notice that there has been a fork. 73:14 So if you want to detect forks, there has to be, one way or another, some kind of gossip, some kind of
communication between the parties so 73:23 they can compare notes and detect forks. 73:26 We'll see most of these things again next week when we look at Bitcoin, 73:36 and that's all I had to say.
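Supplementary sketch: the inclusion proof and the log consistency proof described in the lecture can be written out in a few lines of Python, under the lecture's own simplifying assumption that the log size is always a power of two. The function names (`tree_head`, `inclusion_proof`, `verify_inclusion`, `consistency_proof`, `verify_consistency`) and the choice of SHA-256 from `hashlib` are assumptions made for illustration only. A real certificate transparency log handles arbitrary log sizes, signs its tree heads, and hashes leaves and interior nodes with distinct prefixes (as specified in RFC 6962); none of that is modeled here.

```python
import hashlib


def h(data: bytes) -> bytes:
    """Cryptographic hash used for both leaves and interior nodes (a simplification)."""
    return hashlib.sha256(data).digest()


def tree_levels(records):
    """All levels of the Merkle tree, leaves first. len(records) must be a power of two."""
    n = len(records)
    assert n > 0 and n & (n - 1) == 0, "sketch only handles power-of-two logs"
    levels = [[h(r) for r in records]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def tree_head(records):
    """The (unsigned) tree head: the single hash at the top of the tree."""
    return tree_levels(records)[-1][0]


def inclusion_proof(records, index):
    """Log server side: the 'other' hash at each level on the way up to the root."""
    proof = []
    for level in tree_levels(records)[:-1]:
        proof.append(level[index ^ 1])  # sibling of the current node
        index //= 2
    return proof


def verify_inclusion(record, index, proof, head):
    """Browser side: re-hash up to the root and compare with the tree head it holds."""
    node = h(record)
    for sibling in proof:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == head


def consistency_proof(records, old_size):
    """Log server side: the right-hand hash at each doubling from old_size to len(records)."""
    proof = []
    size = old_size
    while size < len(records):
        proof.append(tree_head(records[size:2 * size]))
        size *= 2
    return proof


def verify_consistency(old_head, proof, new_head):
    """Client side: hash the old head with each 'other thing' and compare with the new head."""
    node = old_head
    for other in proof:
        node = h(node + other)
    return node == new_head


log = [b"a", b"b", b"c", b"d"]
head = tree_head(log)
print(verify_inclusion(b"c", 2, inclusion_proof(log, 2), head))
print(verify_consistency(tree_head(log[:2]), consistency_proof(log, 2), head))
```

In this sketch the inclusion proof for a four-record log holds two hashes (log n of them), and a tree head computed over a forked two-record log such as `[b"a", b"x"]` fails `verify_consistency` against the head of `[b"a", b"b", b"c", b"d"]`, which is the failure a gossiping browser or monitor would be looking for.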