00:00 [Pre-talk conversation about travel from upstate New York and Rochester, permission to record, and room logistics; partly inaudible.] 02:56 Okay, let's go ahead and get started. It's my great pleasure to introduce today's speaker. Seth got his PhD from the University of Texas at Austin in 2003, was then a postdoc at the Max Planck Institute for several years, and is currently a professor at the University of Michigan. 03:36 He has published in a lot of different areas of theoretical computer science, including distributed computing, and many of his papers are on today's topic. [Seth:] Okay, thank you. I have roughly an hour, and then we'll have some time afterwards for questions. 04:44 Okay, great, cool. So I've actually had a prior interest in bullshit detection for a long time, and I've kind of always wanted to work it into my research at some point. So I'll just start off with a little story. One of my dirty little secret pleasures is reading about the replication crisis in the social sciences. 05:15 And if you haven't heard of this, I guess this is going to be a negative surprise. Several years ago they decided to actually try to reproduce 100 famous psychological experiments published in top journals, and they discovered that only 36% of them
were reproducible; 64%, basically, failed to reproduce. And the head-in-the-sand reaction to this was almost instantaneous, and it sort of continues to this day. Over the years they've been going through all the oldies and goodies, trying to figure out what replicates and what doesn't, and the list just grows. 05:59 There are entire websites listing hundreds of studies that just do not replicate. Anything that you've read in the pop-science press: more than half of those studies are very suspect. So when you read something like this, you think, well, as a computer scientist, maybe there's something I can do to help. 06:18 And that's kind of what this DARPA replication markets study was: to figure out, you know, what's a 2019 way of solving this problem? You want to build a classifier to tell which studies are nonsense and which are not, right? You don't want to actually fix the field, you just want to be able to distinguish noise from signal. 06:42 And it was found, from prior work, that prediction markets were really good at telling whether studies were replicable; prediction markets did much better than chance at predicting whether studies would replicate or not. So the idea here, I think, is that you're going to build a classifier to do BS detection on psych experiments, and to train that classifier you're going to have a lot of markets going, and then you're going to get some ground truth by rerunning 5% of the experiments, and use that to calibrate. 07:17 So there's lots of money involved; it's a much bigger project than maybe you're seeing here. And it's, you know, sort of depressing in one way: the original studies were from the 2000s up to 2015, when the bomb went off, and by 2019 very little seems to have changed in the rate of replication.
07:39 So it's still very low. If you want to read the funniest thing you're going to read today: one person who participates in this market has written all about it, under a fantastic acronym, and it's hilarious. Okay. So, lots of causes of the replication crisis: publication bias, researcher degrees of freedom, lack of pre-registration, p-hacking, and other perverse incentives. And then there's outright fraud, which you would think would be pretty negligible, but it turns out that it's not completely negligible. 08:16 And this is another area where I was thinking, maybe CS people can get involved by helping reframe some of the problems here as BS detection. Okay, so how much fraud is there? Uri Simonsohn is, I think, like the only person on earth you would want to trust to guess this number, because he has personally gone through studies looking for fraud, and has outed, 08:40 I think, basically three people now, maybe some more. And his estimate is roughly 5% of studies are fraudulent. I don't know if this is a good number or not, but it's certainly consistent with other sort of cursory attempts at looking for data fabrication. But this talk was designed to be interrupted, 09:06 so if you have any interruptions that you want to make, please announce yourself and go for it. [Audience:] What do you mean by fraud by proxy? [Seth:] What it seems is that in one case, a bunch of data came from an insurance company, and the author claims that the fraud happened when the data was put in at the insurance company, that it wasn't him per se; but on the publication, he was the guy holding the data, and all his co-authors are definitely innocent. It's unclear whether it was him or the guy at the insurance company. 09:40 But yeah, the whole outing of this figure is documented: look at the Data Colada blog, which is run by Uri Simonsohn and two or three of his co-authors, and they go through the evidence there. And once again, that is extremely hilarious reading,
because it's not the most competent fraud. 10:04 Okay. So anyway, the way Uri Simonsohn detects fraud is purely through statistical measures, okay? It's not through any sort of forensic exercise; he just opens the paper, looks at the statistics, and says: this doesn't make sense, there's something very fishy in the statistics themselves. 10:23 Okay. So now I was thinking I should get into this fraud-detection game, this is fun stuff, and I was thinking back to a talk that Jared [Saia] gave at SODA, I think roughly 10 years ago, where he has this exact setup: you look at the statistics, and the statistics themselves betray fraud. And the problem he was working on was Byzantine agreement. 10:44 So I'm like, this is the perfect place to start. The way the Byzantine agreement problem works is: it's a distributed computing problem. There are n processes; each process wakes up and holds a value that's either minus one or plus one, let's say. And throughout this game that's going to be played, processes can become corrupted. 11:10 So there's an adversary out there; it can take over machines and corrupt them, and after that they can behave any way they like, so they're Byzantine. And the goal of this game is that the processes that have not been corrupted have to decide on the same value. So they have to reach some kind of agreement, minus one or plus one, and they can't just, by some prior arrangement, decide to always agree on plus one regardless of what their inputs are, because that would be trivial and kind of dumb. 11:40 So you have to agree on a value that existed in the input. And in particular, if everyone is unanimous in their initial opinions, then you have to decide that value, okay? So this is going to become pretty bad if everyone became corrupted, so you usually measure a protocol's resilience: how many of the players can tolerate becoming corrupted and still have the protocol complete correctly, okay.
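To pin down the two requirements just stated, here is a minimal checker for the binary case. The function name and encoding (values in {-1, +1}, only honest players constrained) are my own illustration, not part of any protocol:

```python
def valid_outcome(inputs, outputs, honest):
    """Check the two Byzantine-agreement conditions on the honest players.

    inputs/outputs map player id -> value in {-1, +1}; `honest` lists the
    players that were never corrupted (only their behavior is constrained).
    """
    decided = {outputs[p] for p in honest}
    agreement = (len(decided) == 1)          # all honest players decide alike
    starting = {inputs[p] for p in honest}
    # Validity: a unanimous honest input forces that value as the decision;
    # with mixed binary inputs, either value "existed in the input".
    validity = (len(starting) > 1) or (decided == starting)
    return agreement and validity
```

For example, unanimous honest inputs of +1 with any honest player deciding -1 fails validity, and honest players deciding different values fails agreement.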
12:10 So what's the model? The model is a communications network with point-to-point messages, so it's just like the postal service. You have all these parties around the country, and the only way they can communicate with each other is to send letters to each other. And this postal service guarantees that every message you put in the mailbox is eventually delivered to its target, but it gives you absolutely no guarantee on how many days, weeks, months, or years it takes to actually get there. 12:36 Okay, so the way you measure time in this model is through chains of messages, not through some kind of absolute notion of wall-clock time or something. Okay. What can the adversary do? The adversary is the post office, in some sense: it gets to decide when to deliver the messages and in what order, and it can also do the corruption. 13:00 So it can take over certain parties, corrupt them, and change their behavior, and it has a budget of f, let's say; this is how many parties it can corrupt throughout the whole run of the protocol. Okay. Okay, so that's basically the model. 13:19 So there's a lot of work from the early 80s basically showing when this is completely insoluble. The classic Fischer–Lynch–Paterson bound says that even if you have one fault, and that fault is a crash fault, which means the machine just stops running, it doesn't behave erratically or anything, 13:40 then you cannot reach agreement if your protocol is deterministic. That is, how you react to the letters you receive, and which letters you send, is just some deterministic transformation. There's still lots of non-determinism in the system, because you don't know the order in which messages are delivered, but the way each program runs itself has to be deterministic. And it's also impossible to solve, even with randomization,
14:07 if f is at least a third of the population. And there are lots of papers published that are non-trivial, because they prove this n-over-three bound in much stronger models. But in the model that I just sketched out, it's basically trivial, and I'll show 14:24 a little proof at the end of this slide. Okay. So, also in the 80s, people managed to actually meet this n-over-three bound exactly. Okay, so Ben-Or's algorithm got f to be linear in n, and then Bracha improved Ben-Or's algorithm and got f to be exactly up to the upper bound, 14:47 n over three. So if f is strictly less than n over three, then you can solve Byzantine agreement with probability one, but the downside is it has exponential latency. Okay, so exponential message traffic going to the post office, and that was the state of the art for about 20 years. 15:07 And then King and Saia came up with a way to build an algorithm with polynomial latency, but this drastically reduced the resiliency. So they had two algorithms, actually: one had resilience up to about n over 400, and it had polynomial latency, that's good, but internal to the machine it occasionally had to run some exponential-time algorithms to decide what to do next. 15:35 So the local computation is quite large. They came up with a different algorithm whose resiliency is about n over a billion, and it was polynomial in everything. Okay. And our new algorithm is based on the same kind of approach as King and Saia's, and we get polynomial latency with resiliency almost up to the upper bound, 15:59 n over three, so we're not too far away. 16:06 All right. So here's a little proof of the trivial lower bound, just so you can see why this n over three really is fundamental; you're never going to get past it.
So imagine you have a population of processes, and they're divided into three equal-size groups: A, B, and C. And remember, the adversary can cause crash faults, which means processes wake up, immediately die, and never send or receive a message. 16:34 That's a possible scenario you have to deal with. So what that implies is that if you're a process in A, in principle you should be able to complete the protocol without ever talking to, or hearing from, anyone in B, and vice versa: everyone in B has to be able to complete the protocol without ever hearing any message from A. 16:53 And if that's true, then the adversary's job is very simple: it takes over everybody in C, and now it just refuses to deliver messages between A and B, and talks to A separately and talks to B separately. When it talks to A, it pretends its inputs are minus one, and when it talks to B, it pretends its inputs are plus one. And since A and B never communicate, there's no way they can do sanity checks on these messages. 17:17 So A in fact sees all minus ones exclusively, B sees plus ones exclusively, and then they're going to agree on inconsistent values. [Audience:] Sorry, can I ask a question here? Where did the second line come from? Why do you say A and B should finish even if there's no communication between them? [Seth:] A has to entertain the possibility that every process in B instantly crashed. 17:47 [Audience:] Oh, I see. So it can't wait for a message from B before doing something; it would just wait forever and wouldn't be correct. So A's view of what's going on is that B has crashed, and: I'm only talking to C, and C and I have to decide something, right? 18:06 Okay, I get it. [Seth:] Yeah, so this n-over-three bound holds in stronger models too, where there's actual synchrony and some other things. But if you look at this model, the lower bound is very quick. Or the upper bound, I guess; it's very quick.
18:25 Okay, so we're building on Bracha's algorithm, just like King and Saia did. Now, this is not my home area of distributed computing, so I'm still somewhat bewildered by how you write and reason about asynchronous algorithms; they are rather subtle. But the whole point of this slide is to basically cut to the chase of what Bracha's algorithm does. 18:51 So, Bracha has two big innovations. The first is the broadcast primitive. The point-to-point thing is super confusing, because it seems like the adversary could send you one message and send someone else a different message, and have those messages be inconsistent, and confuse everybody. And what Bracha first showed is that you can assume, without loss of generality, that everyone is talking on a broadcast channel: you can't lie to people when you broadcast a value. 19:19 Everyone hears it. If the adversary attempts to broadcast a value in some very slippery way, it can't make some processes receive the value while other people never receive the value; it's either going to go to everybody or nobody. And the downside of this broadcast primitive is that there's no guarantee on the time to completion. 19:37 So you don't even know when these things are initiated, because the broadcaster might have crashed, okay? But nonetheless, you can assume it without loss of generality. The second thing he came up with, which is also very powerful, is the validation method, which basically forces the corrupt processes to obey the protocol. 20:01 They cannot deviate from the protocol; they can't lie about what they're doing. And you might think this is weird, because isn't the definition of corruption not following the protocol? So, sort of. What this does is force the adversary to use two tools.
20:20 The first is that it still gets to choose the order in which messages are delivered. And the second tool is: deterministic algorithms are impossible, so your protocol is going to flip coins, and the adversary can still fix its processes' coin flips any way it wants, right? So it can fix the coins; go ahead: 20:34 heads, heads, heads, heads. That might be an unlikely sequence of coin flips, but it is definitely within the parameters of the protocol that it could happen. So the adversary's whole power is coin flipping and choosing the order of messages. 20:53 Okay. So Bracha's algorithm is basically: just run steps one, two, and three until the processes reach agreement, and this takes an exponential number of iterations in expectation for Bracha's algorithm. So this is the code from the perspective of process p. Initially p holds a value v_p, and the first thing it does is broadcast its value, and everybody else broadcasts their values at the same time. 21:21 But remember, p can't wait to hear from everybody, because they might have crashed. So as soon as it gets n minus f values broadcast from other processes, it has to move on; it can't wait for one more, which it might wait for forever. So it takes all of these values 21:36 it's received, sums them up, and takes the sign. This is like taking a majority vote, right? And it says: that's my new value. So it updates its value. It says: I started off with minus one, but plus one looks like it's in the majority of what I've heard. Well, we've already deduced that both minus one and plus one are valid outputs, 21:55 so it's ready to switch its value as the process moves along, okay? So it updates its value with the majority vote. The usefulness of this majority vote is that if there's already a super-majority, if you already have n minus f processes that hold the same value, they're going to win every one of these majority votes, and this majority voting will create unanimity after this step
if you already start off with a sufficient super-majority. In step two, you send your value again, and now you wait for n minus f other values to come back via this broadcast, and here you're looking to see if there's an absolute majority: is there a value out there, minus one or plus one, that has a strict majority in the population, greater than n over two? And if you do see such a value v*, you update your opinion of the state of things: you take on that value as your value, and you also say, I'm willing to decide it in the next step. 22:58 So this is what "dec v*" means: I'm holding the value v*, and I'm ready to decide it in the next step. And in the next step, you send out one of these messages: either your value, or "dec v*" if you've seen it, if you think that v* is the winning value. 23:17 And you count up all the "dec v*" messages that you receive. If you've received at least one "dec v*" message, then you are aware of what the majority value is, it's v*, and you adopt that as your own. If you've seen f plus one of these messages, then it's actually okay to decide v* and say: I can't be dissuaded, 23:41 I can't be persuaded from this opinion; the output of this whole agreement protocol is going to be v*. And if you haven't seen any "dec v*" messages come in, then you choose your value v_p going into the next iteration on the basis of a coin flip: just flip a coin, and if it comes up 24:01 heads, you adopt plus one; that's your value. Then you go back to step one in the next iteration. Okay. So the point of this protocol is that two processes can't disagree in their observed tallies by more than a bounded amount; that's the maximum amount of disagreement they can have. 24:22 So one happy situation is that the entire population is in one of these first two cases: they've all seen at least one "dec v*" message, and maybe some have seen f
plus one of them. This is great: this means that going into the next iteration everybody is unanimous, and they will reach agreement very quickly. So this is kind of the terminal iteration for Bracha's algorithm. 24:47 The more normal situation is that at least one process has seen the majority value and is sticking to it in the next iteration. So they've already said: I'm going into the next iteration with v* as my value. And the rest of the population is flipping coins. And the reason Bracha's algorithm works is that there is a very, very, very slim chance that everybody who flips a coin flips v*, and then they would be unanimous, 25:17 and then the protocol would end, right? And you might have to run this a lot of iterations before you get that magic event, when everyone flips the coin and they're unanimous and they chose the correct value. But with probability one, that will eventually happen. 25:35 Okay, so the way King and Saia fixed this, and the way we're going to fix it, is: we don't want everyone to just privately flip their own little coin to choose the value going into the next iteration. What we really want them to do is flip a really big global coin that they can all agree on, 25:52 so that, at the very least, they'll have the same value going into the next iteration, right? So what we really want is to design a shared coin flip. So the desirable properties of this protocol are: it would be great if everyone participated in the flipping of this shared coin, and when the protocol was done, they agreed on which way the coin flip came up: is it plus one or minus one? 26:19 It would also be really nice if it were unbiased. This is not super-duper important; it's okay if it only had, you know, a 10% chance of coming up heads and 90% tails, or vice versa. Anything close to being reasonably unbiased is fine.
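As a rough sketch of the three steps described above, here is one Bracha iteration from a single process's point of view. This is heavily simplified, and `collect` is a hypothetical stand-in for the reliable-broadcast and validation machinery (it broadcasts a message and returns the first n - f messages received under a tag):

```python
import random

def bracha_iteration(v, n, f, collect, my_coin=random.random):
    """One simplified iteration of Bracha's protocol, from one process's view.

    `collect(tag, msg)` is an assumed stand-in for reliable broadcast: it
    broadcasts `msg` and returns the first n - f messages received under
    `tag`.  Returns (new_value, decided); decided=True is irrevocable.
    """
    # Step 1: broadcast v; majority vote = sign of the first n - f values seen.
    vals = collect("step1", v)
    v = 1 if sum(vals) >= 0 else -1
    # Step 2: broadcast again; if some value has a strict majority (> n/2),
    # adopt it and announce willingness to decide it ("dec v*").
    vals = collect("step2", v)
    msg = ("value", v)
    for w in (-1, 1):
        if vals.count(w) > n // 2:
            v, msg = w, ("decide", w)
    # Step 3: count incoming "dec v*" messages.
    msgs = collect("step3", msg)
    for w in (-1, 1):
        k = msgs.count(("decide", w))
        if k >= f + 1:
            return w, True        # enough witnesses: decide irrevocably
        if k >= 1:
            v = w                 # at least one witness: adopt v* next round
            break
    else:
        v = 1 if my_coin() < 0.5 else -1   # no evidence either way: local coin
    return v, False
```

For instance, with n = 4, f = 1, and a stub `collect` that reports three ("decide", +1) messages in step 3, the process decides +1 irrevocably; the private coin in the last line is exactly what the shared coin below will replace.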
Okay, so if we could implement a collective shared coin-flipping protocol with these properties, Bracha's algorithm would be super easy, because every time you get to the coin-flipping stage, you have a basically 50/50 chance of the coin coming up the correct direction, 26:55 with all the players adopting the value v*; then you have unanimity going into the next round, and then the protocol ends, right? So it all comes down to figuring out a way to do a shared coin, okay? So we're going to design a shared coin-flipping routine, and it's complicated in a lot of ways. 27:19 But when we throw away all the complications, you can get at the heart of the matter, and that's what we're going to look at first: a simplified coin-flipping game. Okay, so we're going to assume that a little bit less than one third of the players are going to be bad. 27:38 Okay. So f players are bad, their population is a little bit less than one third, the remaining players are good, and the game proceeds in lots of rounds. Each round looks like this: the bad players, ahead of time, get together and choose a value sigma; it's going to be the outcome of the coin. They get together and say, okay, this round it'll be minus one. 28:01 Then each good player privately flips a fair coin; player i will call its flip x_i and announce it. And then the bad players sort of look out of the corners of their eyes, see what the declared values of the good players are, pretend to flip their coins, and then just announce their flips however they like. 28:25 So they can do these two things any way they want. And the bad players can ensure that the sign of the sum comes up in the exact direction that they predicted ahead of time; they can make that happen, except with exponentially small probability, where it's impossible. 28:46 So we're just going to pretend that never happens, and say that the adversary can always make this happen, and it always does.
Okay, so what's the goal here? The goal is: you run this game for T rounds, and then you look at the transcript of everyone's declared coin-flip values, and you try to figure out who was not flipping fairly, 29:06 who was not flipping fair coins. Any questions on the definition of the game? 29:17 So I'm going through a little trial run. Here we have a bunch of columns; each row is going to be one round of the game. We have seven good players and three bad players, marked here. First thing that happens, the bad players get together and privately write down the value of sigma that they want to achieve: 29:35 they want the sum to add up to something negative. The good players all flip their coins, and you get, let's say, this: the plus ones are slightly ahead of the minus ones, and the bad players have to force the sum to be negative, so they're going to choose minus ones to make it negative. Round two begins: 29:56 they declare another sigma; it doesn't have to have anything to do with the history, and it's minus one again. The good players flip their fair coins; minus ones are in the lead, so the obligation on the bad players is now not so strong. They don't have to do much; they flip their coins, and the sum comes up negative, so they win, and they move on to the next round. 30:19 Now they declare sigma to be plus one. The good players come up with their coin flips; plus one is in the majority of those flips, so the bad players can flip their coins more or less as they like, and they win again. Okay, so you play this game not just three times: with three rounds there's not a whole lot 30:43 you can say. I mean, x3 and x4 look very suspicious, right? They flipped coins three times and got the same outcome all three times. But you have to play this game a whole lot before you get really solid statistical evidence of who's bad 31:02 and who was flipping fair coins.
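The warm-up game is easy to simulate. In this sketch (the function names are mine), the bad players move last and choose the cheapest set of flips that forces the sign of the sum to sigma; they can fail only in the exponentially unlikely event that the good flips overwhelm their budget f:

```python
import random

def bad_flips(f, s_good, sigma):
    """The f bad players see the good sum s_good and pick flips in {-1,+1}
    pushing sign(total) to sigma, spending as little as possible."""
    need = max(0, 1 - sigma * s_good)     # bad contribution required toward sigma
    if need % 2 != f % 2:                 # f coins in {-1,+1} sum with f's parity
        need += 1
    k = (f + min(need, f)) // 2           # flips set to sigma; the rest -sigma
    return [sigma] * k + [-sigma] * (f - k)

def play_game(n, f, rounds, seed=0):
    """Simulate the warm-up game; return the fraction of rounds in which the
    adversary achieved its pre-announced sign sigma."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(rounds):
        sigma = rng.choice((-1, 1))                        # announced in advance
        good = [rng.choice((-1, 1)) for _ in range(n - f)]
        total = sum(good) + sum(bad_flips(f, sum(good), sigma))
        wins += (sigma * total > 0)
    return wins / rounds
```

With n = 100 and f = 30, the adversary wins essentially every round, matching the talk's convention of just pretending it always succeeds.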
So you run this game for T rounds, and at the end of T rounds you make a guess, okay? So let's see how this might work. Here's your little number line, with zero, let's say. The adversary has declared that sigma is minus one, that's the intended outcome, and we'll just consolidate things and say S_G is the sum of the good flips and S_B is the sum of the bad flips. 31:29 Okay. So if S_G is already negative, then the bad players look at this and go: well, we don't have to do anything, except not screw it up, right? So in this case, if they just set S_B to be zero, equal numbers of plus ones and minus ones, 31:43 then the sum will stay negative and everything's fine. Or they could do something a little bit different, as long as the total stays negative. So, the more interesting case is 32:00 this one. The more interesting case is when the good players' coin flips come out in the direction opposite sigma, and the bad players have an obligation to push the sum across zero. Okay. So in this case, the magnitude of the sum of the bad coin flips must be at least the magnitude of the sum of the good coin flips. 32:29 Okay. So we can analyze the statistics of these good coin flips, because it's a visible process, and then infer things about S_B. So if you look at the second moment of S_G and expand it, it's just a sum of a quadratic number of terms, the expectations of products x_i x_j, and because i's coin flip and j's coin flip are independent, the expected value of their product is zero, right? 32:57 So most of these terms have zero expectation; the only terms with nonzero expectation are on the diagonal, and x_i squared is always plus one. There are n minus f good guys, so the expectation of the second moment is n minus f. And what's S_B doing? 33:20 So, half the time S_B squared has to be at least zero, and half the time it has to be at least as large as S_G squared.
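This second-moment claim can be sanity-checked numerically. The Monte Carlo below is my own illustration: it uses a minimal-effort adversary that does nothing when the good flips already point toward sigma and just barely crosses zero otherwise, ignoring the budget cap:

```python
import random

def second_moments(n, f, rounds, seed=1):
    """Estimate E[S_G^2] and E[S_B^2] under a minimal-effort adversary."""
    rng = random.Random(seed)
    eg = eb = 0.0
    for _ in range(rounds):
        sigma = rng.choice((-1, 1))
        s_good = sum(rng.choice((-1, 1)) for _ in range(n - f))
        # Pay nothing if the sign already agrees; else cancel s_good and
        # step one past zero.
        s_bad = 0 if sigma * s_good > 0 else (abs(s_good) + 1) * sigma
        eg += s_good ** 2
        eb += s_bad ** 2
    return eg / rounds, eb / rounds
```

With n = 100 and f = 30, this comes out around E[S_G^2] = 70 = n - f, exactly as computed above, and E[S_B^2] roughly 42, comfortably above (n - f)/2 = 35.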
So, we can infer that the second moment of S_B is at least one half times n minus f. 33:41 Okay, so how does this work? We play the coin-flipping game T times, and we're going to pay attention to just one statistic, which is the correlation between every pair of players, okay? So this notation x_i^t means: in the t-th iteration of the game, 33:59 what was the coin flip of player i? And the correlation score between i and j is going to be the sum over t of x_i^t times x_j^t. And you can see what this would look like for good players, because they're behaving as advertised: in expectation, the correlation score for two good players is zero, because they flip independent coins. And you can use Chernoff bounds to say that, not only that, there will be some deviations from zero, 34:32 but the correlation score is always going to stay within a region around zero of width roughly the square root of T. Okay, so what about bad players? Bad players, in expectation, are going to be slightly different. So if you take the sum of all of these S_B squared terms over all the iterations of the game and expand it out: well, there are T iterations, and for each one 34:59 we already know the expectation per round, okay? So what happens when you expand this sum? If you look at the off-diagonal entries, where the two terms i and j are not equal, 35:22 you get something that looks exactly like the correlation score. And if you look at the diagonal entries, each x_i squared is just one. So the off-diagonals here are captured exactly by the correlation scores between all the pairs of bad players, 35:40 and the extra diagonal term is just f for each of the T iterations.
Okay, so this says that the sum of all the bad-bad correlation scores is at least the quantity from before, with Tf subtracted off for the diagonal. And because there are f times f minus one correlation scores in total, 36:00 this means that the average bad-bad correlation score is lower-bounded by this quantity here. Okay, so it's at this point that we're actually using the fact that f is slightly less than one third of the total population: n minus 3f here is about epsilon f, 36:18 and in total this gives you a bias in the bad-bad correlation scores of something on the order of epsilon T over n. 36:30 Okay, so what do we know? The good-good correlations, just from normal deviations, random noise, are going to be roughly on the order of the square root of T. And we know that somewhere out there, there's a bad-bad correlation that's at least epsilon T over n. 36:47 So what do we do? We make T big enough that a bad-bad correlation must exceed the tolerance range of the good-good correlations. And there's a third category, of course, which is the good-bad correlations; we can't say anything about those, okay? So we set T so that one of these bad-bad scores is going to be above the threshold, 37:09 and that means that both parties in such a pair cannot be good. Okay, so at this point you might want to think about this as kind of a graph problem. We have a population of good players and a population of bad players; you can draw an edge between each pair, and let's say the darkness, or grayness, or thickness of an edge corresponds to the magnitude of the correlation score. The good-good edges are going to be very faint, because it's just random noise, 37:40 just the normal thing you expect from flipping fair coins. The bad-bad edges are going to be darker on average, and there's going to be at least one edge
that's darker than any good-good edge. And we can say almost nothing about the good-bad edges. 38:00 So if we just grab the edge with the maximum correlation score out of this graph and look at those two players, we're only guaranteed that at least one of them is bad. Okay, so this is sort of a disappointment, because we started off pitching the game by saying we want to identify a bad player with high probability, and all we got is a pair of players, one of which is bad. 38:27 But okay, this is fine for our purpose; this is good enough for Byzantine agreement. Okay, so the real coin-flipping game is just like the fake one, the warm-up coin-flipping game, except with a bunch of complications that are a little bit technical. I'll go through them, but, you know, I wouldn't get too distracted by all this stuff: 38:49 it's just the basic coin game with some more tweaks and things that are kind of annoying. So, one thing that we have to do is, we can't have each player flip one coin; in each round of the game, we have to have them flip a bunch of coins. So X_i is now not a single coin flip, 39:07 it's the sum of m coin flips. And the way you arrange all these coin flips, and disseminate the outcomes, is through a blackboard primitive. The way the blackboard primitive works is like this; this figure is from King and Saia's paper. A blackboard is just a big matrix whose columns are indexed by the players, 39:28 and what player i tries to do is just write coin flips into its column, sequentially, one after the other. So, you know, I would like to write m random coin flips into this column, and everyone is doing this concurrently, each in their own column, basically. 39:49 And because there can be only f faults in total, you know that eventually n minus f of the columns will be full, okay?
But the F columns belonging to processes that just kind of die halfway along the way are not guaranteed to be full; 40:08 they could be completely empty, or partial. Okay, so that's one wrinkle: you don't know how many coin flips there are going to be in total. Another wrinkle, which has to do with the scheduler, is that the last entry in each of the partial columns might not get registered by everybody. So everybody is seeing a blackboard, and two players' blackboards might disagree, but they will only disagree in these orange positions: some of them think that the position is filled, and some of them think 40:36 it's blank. Okay, so we have the absolute blackboard, which is the ground truth, and then we put these little parentheses up here with P: this is what P thinks Xi is. So Xi is the real value of the sum of the i-th column, and Xi(P) is whatever P thinks that to be, based on its partial view of the blackboard, okay? 40:58 So one super annoying thing is, when you evaluate the sign of the coin flip, what you're going to do is sum up all the coin flips that you see on the blackboard and take the sign of that, whatever the majority value is. And because two players can disagree about what they think the sum is, 41:14 by up to F, they can come to different conclusions about which way the coin flip came up. Okay, so to see how big M has to be: even if everyone were flipping fair coins, you have MN coin flips, and if this were a nice, well-behaved process, this would have a standard deviation of about the square root of MN. 41:50 And let's say that sigma is the true sum of all the coin flips, okay? 
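A minimal sanity check of that constraint; the helper name `views_can_disagree` and the specific numbers below are illustrative assumptions:

```python
import math

def views_can_disagree(n, F, M):
    """Two honest players' blackboard sums differ by at most F, so they
    can disagree on the coin's sign only when the true sum lands within F
    of zero.  That is likely only when F is comparable to the standard
    deviation sqrt(M*n) of the M*n coin flips, so safety requires
    F << sqrt(M*n), i.e. (with F on the order of n) M >> n."""
    return F >= math.sqrt(M * n)

# with M = 1 flip per column the +/-F window swamps the noise scale;
# with M = 100 it does not (numbers are illustrative)
assert views_can_disagree(n=64, F=16, M=1) is True
assert views_can_disagree(n=64, F=16, M=100) is False
```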
So, because of this uncertainty about everyone's different perspective on the blackboard, there's a plus or minus F uncertainty interval around sigma, and everyone's view of the world is somewhere in this uncertainty interval. 42:08 So this is kind of a problem if the uncertainty interval includes zero, because the adversary can, for free, give some of the population the view that the sum is negative and some the view that it's positive, and just create disagreements very easily. So what we need is for this uncertainty interval to be much, much less than a standard deviation, 42:30 and that basically means that M has to be much larger than n. Okay. So that's one wrinkle that the scheduler brings into the simplified coin game. There are some other wrinkles as well. What we're going to try to do, after running this game for a certain amount of time, is kick players out of the game 42:51 if we think they're bad. Maybe we'll kick a pair of players out of the game if we think one of them is bad; that's not too bad. So, the way we're going to do this is with weights, okay? Every player initially has a weight of one, which means they're a full participant, 43:05 and this weight can be slowly reduced over time, all the way down to zero. Okay. And the way these weights come into play is that Xi is the sum of the i-th column's coin flips, and when a player declares an outcome, it takes this sum and scales it by the weight of i. 43:27 So if the weight of i is less than one, this player just doesn't have as much of an impact on the game anymore. Okay, and we're going to tweak some other things too: because we have weights, correlation is measured as weighted correlation as well. We're also going to pay attention to deviations, okay? 43:47 Because we're weighting individual coin flips, the value of Xi squared is no longer fixed, since we don't know the weights in advance; Xi squared is its own statistic worth tracking. Okay. 
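As a rough sketch of these two statistics, here is an unweighted simulation of honest players (unit weights, and a made-up slack constant standing in for the Chernoff-style thresholds that come up next):

```python
import itertools
import math
import random

def scores(flips):
    """Deviation and pairwise correlation statistics from the talk,
    simplified to unit weights: deviation_i = sum_t X_i(t)^2 and
    correlation_ij = sum_t X_i(t) * X_j(t)."""
    dev = {i: sum(x * x for x in xs) for i, xs in flips.items()}
    cor = {(i, j): sum(a * b for a, b in zip(flips[i], flips[j]))
           for i, j in itertools.combinations(sorted(flips), 2)}
    return dev, cor

rng = random.Random(1)
M, T = 16, 500            # M coin flips per round, T rounds
flips = {i: [sum(rng.choice((-1, 1)) for _ in range(M)) for _ in range(T)]
         for i in range(4)}          # 4 honest players
dev, cor = scores(flips)

# For an honest player E[X_i(t)^2] = M, so deviations concentrate near M*T,
# and correlations concentrate near 0.  The slack terms below play the role
# of the alpha / beta thresholds; the constant 10 is illustrative.
alpha = M * T + 10 * M * math.sqrt(T)
beta = 10 * M * math.sqrt(T)
assert all(d <= alpha for d in dev.values())
assert all(abs(c) <= beta for c in cor.values())
```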
So, these are the only statistics that we pay attention to: correlations and deviations. There's another wrinkle, which is that the scheduling adversary can 44:12 basically take F players out of the game by just never delivering their messages; it can kind of pretend that they crashed. So the resiliency goes from slightly less than one third to slightly less than one fourth. Is there a question? Oh yeah, it's okay to interrupt, 44:34 so please. Yeah. So, are the weights subjective; does every processor have a different idea of what the weights should be? No, we actually changed that between the submission of the STOC paper and what the proceedings version will look like; we simplified it a lot. 44:53 Yeah, and now everyone agrees on what the weights are, okay? The weights are common knowledge. Okay, and I can explain that stuff. No, no, that's great. Thank you. Okay? So, from the good players' point of view, every time they play one round of the game, the expected contribution to the deviation score, because they flip up to M coins, is M, okay? 45:22 So over the whole game the expectation is MT, and there can be some noise around this, because of Chernoff bounds, basically, okay? So this whole green thing is kind of like the allowable deviation that you'd expect from a completely normal, good process, and we're writing alpha for the upper threshold of what you would expect from a good process. The correlation is sort of the same thing: 45:47 in expectation, the correlation is zero for two good players, but it could be off by a bit because of Chernoff bounds, and we're writing beta for this purple part here, which is the outer threshold of plausibility of correlations for two good processes. Okay, so how do you detect bad processes? 
46:06 Well, their deviation score could be a lot larger than alpha, or you could find a correlation score that's much larger than beta. So, if you see a deviation score that's much larger than alpha, you know that process is bad. And if you see a correlation score that's much larger than beta, you know at least one of the two processes is bad, okay. 46:24 So the main lemma that we're trying to show here is that you take a deviation score and subtract off basically what you would expect to see from a good process, and you take a correlation score and subtract off what you would expect to see from a good pair of processes; then for bad processes, 46:44 you're going to find a deviation score or correlation score that's way above these two thresholds. And in fact, this sum is bounded away from zero by some quantity, okay? The fact that it's bounded away from zero is important. Okay. So basically, there's going to be a correlation score or deviation score 47:05 that outs at least one of the bad players; that's the upshot, okay? So, back to this sort of graphical view of tracking the correlation scores. The correlation scores still correspond to these edges, normal edges, and the deviation scores now get reflected in this graph as self-loops, because a deviation score 47:24 pertains to only one player's coin flips, okay? So we create a capacitated graph after each epoch of this game; the goal now is to reduce process weights. The vertex capacity at each vertex is its former weight, the edge capacity of a loop depends on the deviation score of that node, and the capacity of an edge (i, j), 47:46 where i and j are different, is the correlation score. And because we subtract off these allowable thresholds for good behavior, just what you expect to see with random noise, the clique on the good players has no edges; all edges between two good players have zero capacity. 
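A hedged sketch of assembling such a capacitated graph; the function name and the toy scores below are mine, not the paper's:

```python
def accusation_graph(weight, dev, cor, alpha, beta):
    """Capacitated graph from the talk: vertex capacity = current weight,
    loop (i, i) capacity = deviation score minus the honest threshold
    alpha, edge (i, j) capacity = correlation score minus beta, clipped
    at zero.  Subtracting the thresholds zeroes every good-good edge."""
    cap_v = dict(weight)
    cap_e = {(i, i): max(0.0, d - alpha) for i, d in dev.items()}
    for (i, j), c in cor.items():
        cap_e[(i, j)] = max(0.0, abs(c) - beta)
    return cap_v, cap_e

# toy numbers (mine, not the paper's): players 0 and 1 behave honestly,
# player 2 exceeds both thresholds
weight = {0: 1.0, 1: 1.0, 2: 1.0}
dev = {0: 95.0, 1: 102.0, 2: 140.0}
cor = {(0, 1): 3.0, (0, 2): 30.0, (1, 2): 25.0}
cap_v, cap_e = accusation_graph(weight, dev, cor, alpha=110.0, beta=20.0)
assert cap_e[(0, 0)] == 0.0 and cap_e[(0, 1)] == 0.0   # honest clique empty
assert cap_e[(2, 2)] == 30.0 and cap_e[(0, 2)] == 10.0 # edges touch player 2
```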
48:05 So the only edges that exist are crossing the cut or inside the bad set. Okay, so once we know that a positive-capacity edge means that i or j is bad, you can imagine doing the following. You could look at the weighted graph, take all the vertices that are incident to at least one positively capacitated edge, and throw away that entire population. This doesn't work, because you have no guarantee that you're throwing away more bad players than good players, or even equal numbers of bad players; there's no such guarantee. 48:43 So here's a second try, which is a little bit better. You might hope for a guarantee that you're throwing away at most as many good players as bad players, but no: the edges crossing the cut can be pretty much arbitrary. So you could have every blue edge on the left 49:03 be incident to some red node on the right. So if you throw away every node that's touching any capacitated edge, you can throw away the entire population; it isn't bounded by twice as many blue vertices as red ones, since the degrees can be anything. You can have a red node on the right touching eleven 49:26 blue nodes on the left. But, you might ask, once you've thrown away a red vertex, don't you get to erase all the other edges out of it? No, I'm just saying that you can't just throw away every vertex that's incident to a capacitated edge; that doesn't work, okay? But what about something like a matching? 49:53 Yeah. So a more reasonable way is to find a maximum matching in the subgraph of capacitated edges and throw away all the vertices in it. And now you're at least guaranteed parity: you're throwing away at least as many bad players as good players, because every matched edge has at least one bad endpoint. 50:13 Okay. And what does throwing them away mean? It means taking their weight from one and making it go all the way down to zero. 
And this has a lot of good things going for it, which is: if you start off with a three-to-one, or (3+epsilon)-to-one, weight advantage, and you throw away equal numbers of good players and bad players, 50:31 then your ratio actually improves over time. It stays at least the same for the purposes of the argument, but it actually improves, so that's good. The con, which is, I don't know if it's a deal breaker, but it's super annoying to think about, is that the graph was derived from the correlation scores and the deviation scores. 50:52 The correlation scores and the deviation scores were derived from the history of all the blackboards, and everybody disagrees about the history of the blackboards in numerically small but potentially significant ways. So maybe what the adversary is going to do is create very subtle disagreements about the scores and have every player blacklist a different set of players, and that's super annoying to think about. 51:22 We don't want that, so we're going to do something slightly different. What we actually do is find a fractional maximum matching mu using a certain algorithm that we call rising tide. I'm not going to show pseudocode for rising tide; I'm just going to run through an execution and you can kind of see how 51:39 it works. So, I mean, the name rising tide is kind of a dead giveaway; it sort of describes what the algorithm is doing. Okay? So in the graph here, the pink vertex capacities are the initial weights, and the green edge capacities come from, let's say, the correlation scores. We're going to find a fractional matching, which is the gray stuff, and the fractional matching starts off at zero everywhere. All we're going to do is just raise up the gray values continuously, for as long as possible, until we hit some barrier. 52:16 So we raise them all up to 0.1. 
And at this point, we have two edges here that are saturated and cannot increase any more, so they get frozen. All the other edges keep rising; they rise up to 0.3. And at this point, this vertex is now saturated, because it has 0.1 and 0.3 going into it, 52:40 so it's hit its saturation level of 0.4. This black vertex became saturated because it has 0.3, 0.3, and 0.1, and this edge over here also became saturated, since it can't rise any more. So the only thing that's still rising is this top edge; it rises to 0.5, and then you're done. 52:57 That's your fractional maximum matching. Okay, so we use this fractional maximum matching, just like the maximal matching, to dock the weights, but now we dock them fractionally. So, the capacity of a vertex was its former weight, and the new weight is just the old weight after you subtract off all of the fractional matching values incident to it. 53:21 Okay? So that's the way the weight update works. And the main thing that we show here is that every process is running this algorithm on a slightly different graph, because it thinks the edge capacities are slightly different. So our main result having to do with fractional matching is that this particular algorithm has a continuous, Lipschitz property, 53:46 meaning that if you perturb the input by some amount, you perturb each number that comes out in the output by at most some amount proportional to that perturbation. So there's no sudden discontinuity, basically. 54:04 Okay, so the theorem is that if you have two graphs, and they disagree in the vertex and edge capacities by small amounts, then if you look at any number that shows up in their respective outputs, they're never going to disagree by more than this quantity here, proportional to the size of the perturbation. 
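The rising-tide rule can be sketched as follows. This is my own reading of the walkthrough (all unfrozen edge values rise at a uniform rate and freeze upon hitting an edge or vertex capacity), run on a made-up graph rather than the slide's figure:

```python
def rising_tide(cap_v, cap_e):
    """Fractional matching via the 'rising tide' rule: every unfrozen
    edge value rises at the same rate, and an edge freezes the moment it
    hits its own capacity or saturates one of its endpoint vertices."""
    mu = {e: 0.0 for e in cap_e}
    load = {v: 0.0 for v in cap_v}        # matching mass incident to v
    active = set(cap_e)
    while active:
        deg = {v: 0 for v in cap_v}
        for (u, v) in active:
            deg[u] += 1
            if v != u:
                deg[v] += 1
        # largest uniform raise before some edge or vertex saturates
        step = min([cap_e[e] - mu[e] for e in active] +
                   [(cap_v[v] - load[v]) / deg[v] for v in cap_v if deg[v]])
        for (u, v) in active:
            mu[(u, v)] += step
            load[u] += step
            if v != u:
                load[v] += step
        active = {(u, v) for (u, v) in active
                  if cap_e[(u, v)] - mu[(u, v)] > 1e-12
                  and cap_v[u] - load[u] > 1e-12
                  and cap_v[v] - load[v] > 1e-12}
    return mu

# a small made-up graph: vertex capacities are current weights,
# edge capacities are excess correlation scores
cap_v = {'a': 1.0, 'b': 0.4, 'c': 1.0, 'd': 1.0}
cap_e = {('a', 'b'): 0.1, ('b', 'c'): 1.0, ('c', 'd'): 1.0}
mu = rising_tide(cap_v, cap_e)
# ('a','b') freezes at its own capacity 0.1; ('b','c') freezes at 0.3 when
# b saturates (0.1 + 0.3 = 0.4); ('c','d') keeps rising until c saturates.
new_weight = {v: cap_v[v] - sum(x for e, x in mu.items() if v in e)
              for v in cap_v}
assert abs(mu[('b', 'c')] - 0.3) < 1e-9
assert abs(new_weight['b']) < 1e-9
```

Docking weights by subtracting each vertex's incident matching mass matches the "new weight = old weight minus incident fractional matching values" rule described above.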
54:21 So this is where the processes come to very slight disagreements about the nature of reality. What this allows us to guarantee is that if you look at any point in time, at the total weight deducted from good processes and the total weight deducted from bad processes, 54:39 the good-process deduction is less than or equal to the bad-process deduction, up to some really negligible quantity here. And what this means is that the good processes always maintain this three-to-one weight advantage as time goes on. So, questions? You said that you recently were able to improve things so that all the processes agree on all the weights? 55:08 Yeah. So, what I want to see is why this Lipschitz condition is so important, like in the case where you allow personal weights: what is the thing that you need to do, how do you adapt things, so that you can get every process to agree? 55:27 Oh, it's very simple. So, everyone's running this algorithm using their own local data, right? So there are n different graphs, and everyone's coming up with their own output, which is a weight vector, right? Right. And then what you do, going into the next round, is you just say: process 55:47 i declares that the weight of i is what i got in its local computation using its graph; the weight of j is what j got out of its local computation doing its thing. And all of these things are validatable; 56:04 you can't lie about it, right? So that becomes the absolute truth, right? But because i got updated according to one graph, and j got updated according to a different graph, there can be some minor inconsistencies, and you lose this property that the good deduction is less than or equal to the bad deduction. 
56:26 And that's why there's a small added error at the end, okay. So, what does everyone choose? I mean, is there some sort of... So everyone declares their own weight, essentially, okay? Right. And you can validate that someone is declaring it honestly: you declare a weight, 56:46 everyone looks at that and says, yeah, that looks like a valid weight, continue to put up coin flips and I'll accept them. So the bad guys, of course... I guess if somebody does it the wrong way, you just mark them as bad? When they declare their weight, they're declaring their history. 57:04 They're saying: my weight is based on this history. If that history is completely invalid, then you don't need to listen to them anymore. But you're still waiting to hear from some people, right? Right, but the whole point is that if anyone ever gets to flip a coin, that means that their weight was validated. So you might not know someone's weight, 57:25 but the only reason you might not know someone's weight is that they never successfully put up a coin flip after that point. Nice. So anyone whose coin you accept is someone who has declared their weight, right? Right. Okay. So, the way this all develops is that the weight of the bad guys, epoch by epoch, is slowly reduced, and the good players maintain this (3+epsilon)-to-one weight advantage, and eventually, if you run this long enough, the bad players' 57:56 weight hits zero. And at this point, the adversary is not, you know, impotent; it still has its scheduling powers, so it can still bias the coin, but it can't bias it by more than a constant, roughly. So it really doesn't have any power to prolong the game at this point. 58:13 Everyone is flipping a coin that has a constant probability of coming up the correct direction, with everyone agreeing on the outcome. 
So at this point, the protocol is going to terminate naturally, very quickly. And the total latency is really huge: if epsilon is a constant, it's n to the fourth, which is already too big. 58:33 And if epsilon is tiny, like if we have the smallest possible advantage over n over four, like n over four minus one, then epsilon is on the order of one over n, and this is like n to the eighth, which is really bad. So this sucks, and there are a lot of open questions: maybe it's fruitful to understand the fraud detection point of view better and improve the resiliency to n over three. 59:03 There are some ideas about this. One thing that's interesting is that deviations and correlations really hit a barrier at n over four, and they're completely useless beyond that threshold. So those two statistics, by themselves, just aren't powerful enough to get you beyond n over four. And can you improve the latency a lot? 59:21 So to recap: of our algorithms, the one with exponential local computation time has the good latency, around n to the 2.5; the one with lower resiliency and polynomial local computation has latency about n cubed. All right, thank you. So, did you think about, I mean, would it really help to use, instead of pairs, triples or something like this? 59:51 Yeah, probably; I'm not sure about triples. I mean, triples might work better if you're right at n over four, just below n over four, because I think there's some information there, but there's kind of a phase transition at n over three where you're not going to expect to see weird correlations between three players, I think. Looking at the latency: 1:00:17 why is the latency so bad, why is it not as good? So, our blackboard is not n rows high; it's n over epsilon squared rows high, and when epsilon gets really tiny, the blackboard itself blows up, so that's one thing. 
1:00:39 And then the number of iterations that we have to run in an epoch is quite large. And at the end of an epoch, we really only fractionally blacklist one player. I mean, it seems like you're going to end up blacklisting a bunch of players, but we couldn't prove that; 1:00:58 the only guarantee is that you can basically blacklist one unit of weight per epoch. So, that was a great talk. Thank you. 1:01:10 So, what were some of the ideas for improving the resilience? Like, is the idea to look beyond just pairwise correlations? Yeah, I think that might be a good idea if you're trying to be resilient against a bigger adversary; looking at more complicated 1:01:33 correlations might be a good idea there. But at n over three, I think you're not going to see any of those correlations. I don't think there's going to be interesting relationships between three players, basically, because the adversarial strategies can't be much more interesting than just biasing their coin flips, 1:02:02 I think. 1:02:09 Sorry, no, I was clapping. Okay. You know, with these Zoom talks we don't have the opportunity for everybody to clap for the speaker anymore, but, oh well, thanks, that was a great talk. 1:02:45 Well, all right. Let's see. 1:02:52 Right, yeah. I mean, that's my other question. I find this stuff really interesting, although I haven't been able to make any significant progress on it. So, you know the lower bounds that come from consensus, from the non-Byzantine setting, just crash faults. 1:03:17 Have you tried at all from the other end? I'm interested in why there is this gap, and is it necessary? You haven't looked at the other end, right? No, you looked at the resilience but not the latency. 
Yeah, as a lower bound? No, that's a problem 1:03:40 from either the upper or the lower side. I mean, yeah, I guess from the lower bound point of view, I'm kind of curious too. Yeah. Okay. Just wondering; this seems much more natural, and I really like this approach. So, oh, also, sorry, just one more thing I was thinking about, maybe, while we have a moment. 1:04:07 So Seth asked this question to me last night, which was, you know, something like, it's sort of about the definition of latency, right? We have a definition in our paper, which is basically the depth of this communication graph, and I guess in some of the distributed computing textbooks, they use more or less the same definition. 1:04:32 I was just pointing out this issue where, if you have a communication graph where there are messages that are sent, you sort of have this long path, even though the path is sort of made long by the scheduler, even though the messages in the path are not, like, required for anything. You have this 1:04:56 relationship where, you know, the message that you received from processor A was needed for you to send out the next message, or not. I mean, I thought it should be the path of what you would have to receive in order to go to the next step; that's what it should be. 1:05:19 So it's basically not really the communication graph so much as it is a graph of the messages required; there's some sort of required-for relation. So there are two definitions, right, and they're supposed to be equivalent; that's one. And the other one is that you assume that there is a delay but you don't know what it is, right? 1:05:41 And you set it later: the maximum delay that any message could take to go from A to B, you know, across one edge, from one node to another. So, but then, that's not the length of the longest path, though. 
So, sorry, I was actually confused about this a couple of weeks ago, because I'm teaching distributed algorithms, and yeah, they have this notion of timed executions, right? 1:06:07 So, if you put in that maximum delay, then you can't say that the time is the maximum delay times the longest path, because why would it take the longer path? It took the longer path 1:06:32 because the shorter paths had the maximum delay, so it could only take the longer path because, well, maybe. I didn't say it was the delay times the path; I mean, it's just the maximum time. So I think the cost of the problem is the minimum you can do, in terms of units, where the unit is the number of units of delay that you need. I think it's the exact same definition as saying it's the shortest: 1:06:59 if you design your algorithm well, it's the length of the shortest chain possible. You minimize the maximum length of the chain of messages that you have to wait for in order to execute the algorithm, and the length of that maximum chain, when you've minimized that maximum chain, is the time of the algorithm, right? 1:07:21 Or the time of the problem. But you can look at it in terms of delay; I don't see, I don't. So, I mean, if you do it in terms of delay, it's kind of... So you're saying the schedule has got this time bound delta, right, 1:07:41 for the communication, and then you don't know what delta is, okay? You don't know ahead of time what delta is going to be? Yeah, of course. Okay. But then there's a certain minimization you're doing, and the only minimization you can do is over a chain, really; it's the same. 1:08:01 I don't think it's any different from that chain definition. Which is different, right? How would it be different? 
You could form a chain of messages where each message was sent, and arrived, before the next message was sent and arrived, which gives you a chain, but there would be no logical 1:08:20 dependency between them. But why would you wait? You might get a message, and I receive it; like, you can't not receive it, right? It's received, so anything you send after that came after it. But would your algorithm say, wait for this? Your algorithm would proceed whether or not you received that message, right? 1:08:48 You're saying there are messages you'd be waiting for that you don't know about. Imagine it like this: let's say you have a hundred processes, they're running concurrently, and they have nothing to do with each other, no logical dependencies at all. The maximum chain length for the union of these processes is a hundred times the chain length of one. But say you're another process that's waiting to hear messages from all these processes, right? 1:09:22 I'm a process, and I'm running a hundred separate, independent processes concurrently, so every message that comes in relates to one of the processes. You're hearing all these messages, and you remember doing it a certain way; so if you run a hundred of these concurrently, the chain length can be a hundred times your own. But what would your algorithm be? The best algorithm would just be to wait for these guys to write to me, right? 1:09:50 You should probably have a direct edge, you know, because with faults, you're not going to wait for everybody; you can't do that, since you might not get that message from some people. You'd wait for a certain number of these guys to write to you, and that would be 1:10:09 a chain of length one from each. No, right, you're just waiting for them to write to you, each independently, so you're reacting immediately. 
Whenever one process is allowed to make progress, you send out replies to that message, saying, let's make progress. I don't see that as a chain, because you're not waiting for A to write to B before B writes to C; C is not waiting to hear from B. 1:10:29 That's what I'm saying: it's not required that this message was received strictly before that message was sent. It's not a chain describing the schedule, like how the execution actually happened; it's a chain describing the algorithm: you wait for this, then for this, before you send your message. 1:10:48 That's why I'm saying there are two definitions. Excuse me, I mean, a chain of messages is empirical, right? That's the definition of a chain here, yes. I mean, the other definition has a real basis too. Like, can't you have algorithms where they behave in a certain way, but you're not entirely sure what the dependency is, or whether it was fixed in advance? 1:11:14 Can it be a dynamic dependency where it's unclear, right? No idea. Well, do we want to force everyone to stay? Yes. Yeah. Okay, let's go ahead and thank the speaker. Thanks; feel free to hang around. So, 1:11:39 I'm going to go ahead and turn off the recording also. Okay, the recording has stopped. Have you looked at the book? I don't have the book, because I'm on sabbatical and I don't have my books with me, but I'm sure that they define it in a way that excludes some of this; like, they're not talking about how it actually operates. 1:12:05 They're not saying that A received something from B before B received it from C; they're talking about the algorithm design. In other words, did B have to wait to receive the message from A before it could send? So it's the algorithm design, not the actual execution.