Good afternoon. Today we will look at bypassing and EHRs. The EHR is a new thing that you haven't seen so far in Bluespec, though you may have been using it implicitly, because we have given you modules that were put together using EHRs. Today we will demystify them, so that you can write your own and use EHRs to build more things. Bypassing is a big deal, and it has been emphasized to you in lectures by Silvina and Daniel. At a very high level, it's an idea for reducing the number of stalls: something that takes multiple cycles can be done in one cycle if we provide some way of connecting the producer of a value with the consumer of that value. A bypass, as the word implies, provides an extra path. Normally you write back into the register file and the decode stage then reads it; that's the normal path. A bypass means you provide an extra path around the register file, so that as the value is being produced you can use it in decode. If you can do that, you save the cycle spent storing the value in the register file and reading it later. Now, there are a couple of things that are well known and yet hard to measure unless you have the right tools, that is, unless you synthesize your design. The benefits of bypassing are, at some level, very obvious: you'll be able to see that you can eliminate a cycle here or a stall there. What is much harder to estimate is the impact on the clock period, because a bypass, by its very definition, introduces new combinational paths in your design. Anytime you introduce new combinational paths, there is a chance that one of them suddenly becomes the critical path, and then your clock period may have to be larger to accommodate this bypassing logic.
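At the circuit level, the extra path described above can be pictured as a comparator on register indices plus a 2:1 multiplexer in front of the register-file read in decode; that added logic is the new combinational path that may end up on the critical path. The Python sketch below models only that selection logic. The function and parameter names (`read_operand`, `wb_valid`, `wb_dst`, `wb_value`) are hypothetical and not taken from the lecture.

```python
def read_operand(regfile, src, wb_valid, wb_dst, wb_value):
    """Decode-stage operand read with a bypass from the writeback stage.

    regfile  -- list holding the committed register values
    src      -- index of the source register decode wants to read
    wb_valid -- True if writeback is producing a value this cycle
    wb_dst   -- destination register of that value
    wb_value -- the value being produced
    """
    # Bypass path: compare indices and, on a match, take the value as it is
    # being produced instead of waiting for it to land in the register file.
    if wb_valid and wb_dst == src:
        return wb_value
    # Normal path: the value is already in the register file.
    return regfile[src]


regs = [0, 0, 5, 0]
print(read_operand(regs, 2, True, 2, 7))   # 7: forwarded from writeback
print(read_operand(regs, 2, True, 1, 7))   # 5: different register, no bypass
```

Without the bypass, the first read would have to stall for a cycle until the value 7 was written into `regs[2]`.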
This is something where intuition is very hard to quantify without proper tools, so we'll show you some very simple examples of it. The next question: bypassing is a good idea, but there is no such thing as a free lunch. There are costs associated with it, in terms of area and perhaps in terms of clock period, so should I do it or not? That depends very much on how frequently you expect the bypass to be used. If it's used often, absolutely; but if it's used rarely, its overall impact on performance may be minimal. This is another place where we can make lots of measurements on the kinds of programs running on the hardware you're designing, to see how much impact a bypass will have. This goes beyond hardware: it becomes an issue of what kind of software is being run on top of your processor. Now, when it comes to bypassing in Bluespec, you're not going to find yourself combinationally connecting this point to that point; it doesn't work like that. There's a different, higher-level view of bypassing: you think of it in terms of reducing the number of cycles it takes to execute two conflicting rules or methods. Let me illustrate by asking you some questions. Suppose I want to build a FIFO where I can enqueue even though the FIFO is full, because concurrently somebody is going to dequeue from it. Obviously, for correctness, if nobody is dequeuing, it behaves like a full FIFO and you can't do an enq at that point. On the other hand, if you could detect that somebody is actually going to dequeue at this time, then you can also enqueue, so it looks as if you enqueued into a full FIFO. This is a slightly different way of thinking about concurrent methods.
So far we have just said that enq and deq can't be done together because there's a conflict of some sort, for example they touch the same register. But there is a reason why we want to go beyond that. Here is another example, which you have seen earlier. There are two rules here, RA and RB, and you can quickly see that they conflict with each other. Why? The first one writes x and the second one writes y, while the first one reads y and the second one reads x. So there's a cyclic dependency: these rules conflict, and you have to schedule them one by one. But you can ask: what will happen if I really want to execute them in the same cycle? What is preventing me? What you somehow want is to communicate the value of x from rule RA to RB in the same cycle, if possible, as opposed to waiting for it to go into the register and come back. You need a bypass for register x. If you can do that somehow, it may be possible to execute these rules together, and if you execute them together, it'll behave as if rule RA happened before rule RB. I'm giving you very high-level issues here, where I have a very clear goal, and we will discover very soon that the only solution to these things is the introduction of bypasses. We will introduce bypasses in a very different way, at a significantly higher level than is done in traditional digital design. The kinds of designs I'm mentioning on this slide are not possible in the subset of Bluespec you have seen so far. Why is that? Because of the limitations of registers, which are the basic abstraction on which the whole thing is built in Bluespec.
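To make the RA/RB conflict concrete, here is a small Python sketch. The lecture only says that RA reads y and writes x while RB reads x and writes y; the specific bodies `x <= y + 1` and `y <= x + 2` are assumptions chosen for illustration. With plain registers, firing both rules in one cycle gives a result that matches neither sequential order, which is why they conflict; forwarding RA's new x to RB gives the same result as running RA before RB.

```python
# Hypothetical rule bodies: only the read/write sets come from the lecture.
def rule_ra(y):
    return y + 1          # RA: x <= y + 1


def rule_rb(x):
    return x + 2          # RB: y <= x + 2


def sequential(x, y, order):
    """One-rule-at-a-time semantics: run the rules one after another."""
    for rule in order:
        if rule == "RA":
            x = rule_ra(y)
        else:
            y = rule_rb(x)
    return x, y


def same_cycle_registers(x, y):
    """Both rules fire in one cycle with plain registers: both read the old state."""
    return rule_ra(y), rule_rb(x)


def same_cycle_bypass_x(x, y):
    """Both rules fire in one cycle, with RA's new x bypassed to RB."""
    new_x = rule_ra(y)
    return new_x, rule_rb(new_x)


print(sequential(0, 0, ["RA", "RB"]))   # (1, 3)
print(sequential(0, 0, ["RB", "RA"]))   # (3, 2)
print(same_cycle_registers(0, 0))       # (1, 2): matches neither order
print(same_cycle_bypass_x(0, 0))        # (1, 3): same as RA before RB
```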
When you use registers for communication within the same atomic action, certain kinds of communication are just not possible. For example, if you are writing into a register, you can't read the new value in the same cycle; you read it in the next cycle. What that means is that two methods can't communicate, as you just saw between RA and RB: x is being written in RA, and to get the semantics I want, this x must be visible in RB, so I have to wait until the next cycle. So if there is a desire that two methods communicate with each other in the same clock cycle, in the same atomic action, then something more is needed, because registers are not going to let you do it. A similar issue arises between two rules, between two methods, and between a rule and a method: in all these cases, if a register is being set, the other entity can read it only in the next cycle. This is what we want to get over. There are actually quite a few solutions, and I'm not going to give you their history. Beauty is in the eye of the beholder, but some of them are pretty ugly: they turn Bluespec from a high-level language into a very low-level one. The solution I like best is called EHRs. EHR stands for Ephemeral History Register, a mouthful of a name, and I won't go into why we named it like that. EHRs are a primitive element for designing modules with concurrent methods. In other words, until now you had Bluespec with registers; now you're going to have Bluespec with registers and EHRs. An EHR can be used as an ordinary register, so in some sense the EHR subsumes the register, but let's not mix them up: when we use EHRs, I want us to use them explicitly, so that you know what you're doing. This idea was introduced by Dan Rosenband in his thesis, and there are many interesting aspects of that work.
But just imagine I'm designing a module. What is my register with enable? That's what I have shown on this slide: you have a flip-flop whose output you keep feeding back, so if a new write is coming in and it is enabled, the new value gets captured; otherwise the register keeps its value. That's how reads and writes behave in an ordinary register. This is the normal behavior, and if you do a read and a write in parallel, it behaves as if the read happened and only after that the write happened, because the effect of the write can only be seen in the next cycle. So far so good. Now I'm going to embellish this a bit. Suppose I also provide another read port on my register, which takes the value at the input of the flip-flop and puts it out. What is r1 now? How does r1 behave? It reads the new value if a new value is going to be written. In other words, r1 returns the current state if w0 is not enabled, so if nothing is enabled, it's just like r0. On the other hand, if w0 is enabled, that is, somebody is trying to write into this register, then r1 sees the value being written. It's a bypass path: r1 sees the value even though anybody reading r0 will only see it in the next cycle. So here is a register element where I'm giving you both the normal read and the bypassed read. That's half the idea of the EHR. The second idea is that we also give you another write, which is a prioritized write. If w0 and w1 are both enabled in the same clock cycle, I want the system to behave as if w1 is the one I wanted; w1 should prevail. This can be done simply in the circuit: I take the value coming from w0 and, if w1 is also enabled, select w1's value instead using w1's enable, and that is the value that will be stored in your register.
If you build it like that, it's very clear that w0 comes before w1, though there is no guarantee that either will be enabled. So you have to understand the behavior of the circuit. When neither w0 nor w1 is enabled, somebody tell me: what do the reads look like? They return the old value, whatever was stored in the register beforehand. Suppose w0 is enabled; what will you see? r0 still sees the old value, and r1 sees the new value that is coming in. And what will I see in the next cycle? The value that you just wrote: w0's value is what will be visible in the next cycle. Now tell me what happens if w0 is not enabled but w1 is enabled. Yes, and what will r1 see? Very good: both reads see the old value, but the value stored in the register is the new one, the w1 value you're writing. Now an easy extension: suppose both of them are enabled; what happens? Yes, in the next cycle I'll see w1. Fantastic, so everybody got it. The functioning of this is straightforward, but as with most things in Bluespec, we don't want to think in terms of low-level circuits; we want to think in terms of some high-level thing, and that is an EHR declaration. Anytime you instantiate an EHR, you get this four-ported device, with two read ports and two write ports, each with its own behavior.
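The four cases just worked through can be checked against a small cycle-level Python model of a two-port EHR. This is a behavioral sketch, not the Bluespec primitive: the class name `Ehr`, its `read`/`write`/`tick` methods, and the `_NO_WRITE` sentinel for "port not enabled" are assumptions, and the caller is trusted to use the ports in the order r0, w0, r1, w1 within a cycle.

```python
_NO_WRITE = object()  # sentinel: this write port was not enabled this cycle


class Ehr:
    """Cycle-level model of an n-port EHR (a sketch, not the Bluespec primitive).

    Within a cycle, ports are assumed to be used in the order
    r0 < w0 < r1 < w1 < ...; the caller is responsible for that order.
    """

    def __init__(self, n, init):
        self.cur = init                  # value held in the flip-flops
        self.writes = [_NO_WRITE] * n    # this cycle's write on each port

    def read(self, i):
        # Read port i sees the latest write on a lower-numbered port (the bypass);
        # if none fired this cycle, it sees the stored value.
        for j in range(i - 1, -1, -1):
            if self.writes[j] is not _NO_WRITE:
                return self.writes[j]
        return self.cur

    def write(self, i, value):
        assert self.writes[i] is _NO_WRITE, "two writes on the same port"
        self.writes[i] = value

    def tick(self):
        # Clock edge: the highest-numbered write that fired is stored.
        for value in reversed(self.writes):
            if value is not _NO_WRITE:
                self.cur = value
                break
        self.writes = [_NO_WRITE] * len(self.writes)


def one_cycle(init, w0=_NO_WRITE, w1=_NO_WRITE):
    """One cycle on a 2-port EHR; returns (r0, r1, value seen next cycle)."""
    e = Ehr(2, init)
    r0 = e.read(0)
    if w0 is not _NO_WRITE:
        e.write(0, w0)
    r1 = e.read(1)
    if w1 is not _NO_WRITE:
        e.write(1, w1)
    e.tick()
    return r0, r1, e.cur


print(one_cycle(5))              # neither enabled: (5, 5, 5)
print(one_cycle(5, w0=7))        # only w0:         (5, 7, 7)
print(one_cycle(5, w1=9))        # only w1:         (5, 5, 9)
print(one_cycle(5, w0=7, w1=9))  # both:            (5, 7, 9)
```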
We like to capture these things in terms of a conflict matrix. Just a reminder: the conflict matrix for a register was simply that two reads of course don't conflict with each other, two writes into the same register of course conflict with each other, and if you do a read and a write simultaneously, it looks like the read happened before the write. That's what is captured here. Now, based on this, I want to derive the conflict matrix for the EHR. I have filled out the part that is just like the register: if you focus only on w0 and r0, it's the same as before. Now tell me what should go in for EHR r1 versus w0. If you are writing something at that point, it will look as if the write happened and then the read r1 happened, because by design that's the bypass: r1 sees the value, so it comes after the write. Reads, of course, remain conflict-free. These things always have a duality, so you can fill out the diagonally opposite entries in a similar way. Now, what about w0 and w1? Does this make sense? By definition, w0 comes before w1. You have seen the circuit: if you do w0 and w1 in the same cycle, then as far as storing the value in the register is concerned, it behaves as if w0 didn't happen; you just see the w1 value. That's why you have that entry, and again by symmetry you have this one over here. Now, what about r0 and w1? I'm pointing at this entry: if you are reading r0 while somebody is writing w1, you won't see the value right now; you'll see it only in the next cycle. That's why it's "less than". It is obvious that r1 reads are conflict-free with each other, and two people writing into w1 will definitely be a conflict: that's a double use of a port, which is never allowed.
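The conflict matrix derived above follows from a single ordering of the ports, r0 < w0 < r1 < w1, plus two rules: reads never conflict with each other, and two uses of the same write port always conflict. The sketch below generates the table that way; the string encoding (`CF`, `C`, `<`, `>`) is an assumption for illustration. It also fills in the r1-versus-w1 entry as `<`, since r1 never sees w1's value.

```python
# Schedule order of the 2-port EHR's ports within one cycle.
PORTS = ["r0", "w0", "r1", "w1"]


def relation(a, b):
    """Conflict-matrix entry for using port a and port b in the same cycle.

    "CF" = conflict-free, "C" = conflict,
    "<"  = a behaves as if it happened before b, ">" = after b.
    """
    if a[0] == "r" and b[0] == "r":
        return "CF"   # reads never conflict with each other
    if a == b:
        return "C"    # same write port used twice: never allowed
    return "<" if PORTS.index(a) < PORTS.index(b) else ">"


for a in PORTS:
    print(a, [relation(a, b) for b in PORTS])
# r0 ['CF', '<', 'CF', '<']
# w0 ['>', 'C', '<', '<']
# r1 ['CF', '>', 'CF', '<']
# w1 ['>', '>', '>', 'C']
```

Restricting the table to r0 and w0 gives back the ordinary register's matrix: read before write, and two writes conflict.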
You can argue out what the remaining entry, r1 versus w1, should be. It may take some thought, but given that diagram and your understanding, you should be able to fill it out. When you design this circuit, this is the conflict matrix that would be generated. But I do not recommend that you design this circuit, because you can't: it requires some very low-level hacking inside Bluespec. From our point of view, we're just going to use the EHR as a primitive, just as we have used registers as a primitive. Now let's practice and design a few things using EHRs. We are going to design a pipeline FIFO, which allows you to enqueue into a full FIFO as long as there is a simultaneous dequeue going on. We will design a bypass FIFO, which allows you to dequeue from an empty FIFO provided somebody is enqueuing at the same time: the value slips through, and you see it at the output. And we will design a conflict-free FIFO. We did one earlier, but our earlier conflict-free FIFO had dead cycles: you could do enqueues and dequeues, but then you had to wait internally for canonicalization. From outside you couldn't tell, but from inside you could see it. We can design a more efficient FIFO of this sort, where enq and deq do not interact with each other, can be done concurrently, and there is no dead cycle: you can keep doing enq and deq every cycle. What I want to show you is that whatever work you have done so far is already very useful: we can transform those FIFOs in a very systematic way to generate the FIFO we desire.
element five 19:35 four from lecture 17 or even earlier I think and here's the conflict matrix you 19:41 know you know n qdq can't fire together because they you know mutually exclusive 19:47 parts of this so this is the one element five four we had earlier very very 19:53 straightforward to write but doesn't have exactly the properties we want so the question I'm asking is can I convert 19:59 it into a five four where n Q will happen before D Q or can I convert it 20:04 into something where n Q will DQ will happen before n Q in other words can I 20:11 transform this into a either a pipeline five four or in a bypass five four 20:17 okay so let's see how does this transformation work I have copied the code again here and now this is the 20:23 conflict matrix I want in this so what 20:29 is it that I want I mean if I want to be able to in queue in a full FIFO then I 20:36 have to know if DQ is happening right if DQ is happening then I can do the NQ 20:45 also so what that is saying somehow is if V is being written right by DQ if I 20:54 can communicate that information to the NQ well then maybe I can do something 21:00 about it right because what dairy registers do not allow this communication you know two methods 21:06 cannot talk each other except through the next cycle I want to do it in the same cycle any ideas how we'll proceed 21:13 with this what is this reminiscent of 21:18 somebody is writing into V and another method bonds to read it in the same 21:27 cycle what does that remind you of 21:32 we need a bypass right of some sort okay so let's replace this V by an EHR so EHR 21:43 you know I'm just dealing with two reports to write ports but you're gonna have VH ours with n ports if you want so 21:50 that's why that 2 is written here ok otherwise it's just like a register you 21:56 know it's type is boolean and it's initialized to false 22:02 okay so when you have this what I'm gonna do is I write M v-0 and DQ and I 22:10 read in the same cycle so I am reading 
How many people understand this? What was happening in deq? I was writing v; now I'm writing v[0], which is the same as writing v through its first port. But the change I want is in my enq method: I want to see that write right away, as opposed to in the next cycle, so I want the bypassed version of it, which is the value coming out of v[1]. You can write v[0] and v[1], and you can read v[0] and v[1]; when you write v[0] and read v[1], you are reading the value being written in that same cycle. We are not done yet; this just establishes that at least we can communicate. First of all, the moment you make v an EHR, this becomes an illegal program, because there are v's here that are not EHR accesses. You have to make up your mind whether v is a register or an EHR; if v is an EHR, then all the v's have to be changed. So the next question is: what should v be in the deq part when you're reading it? It should be v[0], because you're looking at the current state: is there a value in it? Please ask questions; I'm seeing too many blank faces here. What have I done by doing this? Yes? I could give you a long-winded answer, but first is reading the current element. You could have defined it differently, so that first reads whatever just came in, but that's not what we are trying to do. Our only goal was to be able to enqueue into a full FIFO. You can experiment with that: if you wrote it as v[1], you'd get a different FIFO, and not all of these variants will make sense, because how would you use them? This one has a very specific purpose in mind, but you can try to design something with all kinds of funny entries.
this one is has a very specific purpose in mind who's but 25:13 you can try to design something with all kinds of funny entries this so what was 25:22 I going to say about this yes so remember our goal in this thing is that 25:27 NQ sorry DQ happens first and then NQ happens whenever you have such a clear 25:35 understanding that means anything that was going to happen first put zeros in it everywhere because you want it to 25:42 happen right you don't you're not depending upon anything else so therefore in DQ you have v-0 and 25:49 you're writing in b0 and you are saying NQ will happen afterwards right so what 25:55 does NQ do it was reading V so we already agreed that it should do maybe 26:02 one and not v-0 because we want this communication and what should happen to the Viets writing 26:10 it should be v1 right who's gonna argue with me that that's the right answer 26:20 what happens if you do NQ & DQ simultaneously this is the one element FIFO it'll save me full right and we've 26:30 had something in it now it's gonna get something in it again so you won't be one to be true so previously what was a 26:40 conflict that we was getting false and V was getting true as I make up your mind 26:47 we have resolved that issue we say you know it can get a zero it can get a true 26:54 or a false but in this case v1 has priority over it right it has precedence if if you are writing 27:00 something in v1 and that will prevail it doesn't say that somebody is writing in 27:06 v-0 but if somebody is writing in v1 that's the value you're going to get the next time so that's what happened here 27:13 that and again it's very mechanical if you want NQ to happen before after DQ 27:21 then in DQ everything is zeros in thank you everything is once 27:32 more questions okay so basically what we 27:42 have done is you saying NQ sees dqv has the right value in all 27:47 cases so there are quite a few cases which you have to analyze that NQ is being done not being done DQ is being 27:53 
I'm a happy camper: I used my ordinary FIFO, and after a while you'll get so good at this that when I give you an implementation and say "I want this before that", you'll know which things you have to change; you'll just go and introduce EHRs and use the proper ports to achieve that goal. Let's do a few more exercises. Let's now make a bypass FIFO. A bypass FIFO is the opposite: we want enq to happen before deq. That's like saying I want this value communicated here, because if somebody is enqueuing, I want to be able to sense that and read the value. To do that, v again has to be changed into an EHR, and you change all these accesses accordingly: I write v[0] in enq, and when deq reads v[1], it sees the latest value, so it knows that somebody wrote it. But this is not the complete picture, because there is a d in this picture too. What do you want to see when first returns d? If I write d[0] in enq, I must read d[1] in first in order to see the value written by the earlier operation. And if you do that, d also has to be changed into an EHR. Are you getting the basic idea behind EHRs? Instead of introducing bypasses everywhere by hand, we have encapsulated them in this new register element called the EHR, and we use EHRs to introduce bypasses wherever we need them. Yes? [Student question.] Oh, that's because this is a bypass FIFO as opposed to a pipeline FIFO. In a bypass FIFO the idea is: suppose the FIFO is empty, so you can't dequeue from it. On the other hand, if somebody is enqueuing, I could be clever and immediately pass the value on. So the idea is that enq happens and then deq senses it, and the same is true for first. Again, the same stuff: there are no double-write errors even though two methods write v, because they're writing different ports, and when they write different ports we have defined that port 1 has higher priority than port 0. Very good.
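The same kind of Python sketch works for the bypass FIFO: enq uses port 0 of both v and d, while first and deq use port 1, so a value enqueued into an empty FIFO can be dequeued in the same cycle. The class name and the `step` driver are hypothetical; the schedule assumed is enq before first/deq, and `None` means "no enq this cycle".

```python
_NO_WRITE = object()  # sentinel: this write port was not enabled this cycle


class Ehr:
    """Cycle-level EHR model; ports must be used in order r0 < w0 < r1 < w1."""

    def __init__(self, n, init):
        self.cur = init
        self.writes = [_NO_WRITE] * n

    def read(self, i):
        # Bypass: see the latest write on a lower-numbered port this cycle.
        for j in range(i - 1, -1, -1):
            if self.writes[j] is not _NO_WRITE:
                return self.writes[j]
        return self.cur

    def write(self, i, value):
        assert self.writes[i] is _NO_WRITE, "two writes on the same port"
        self.writes[i] = value

    def tick(self):
        # Clock edge: the highest-numbered write that fired is stored.
        for value in reversed(self.writes):
            if value is not _NO_WRITE:
                self.cur = value
                break
        self.writes = [_NO_WRITE] * len(self.writes)


class BypassFifo1:
    """One-element bypass FIFO: enq (port 0) happens before first/deq (port 1)."""

    def __init__(self):
        self.v = Ehr(2, False)  # valid bit
        self.d = Ehr(2, None)   # data must also be an EHR so first sees enq's value

    def step(self, enq_value=None, want_deq=False):
        """One clock cycle; returns (dequeued value or None, whether enq fired)."""
        enq_ok = enq_value is not None and not self.v.read(0)  # enq guard reads v[0]
        if enq_ok:
            self.v.write(0, True)
            self.d.write(0, enq_value)
        got = None
        if want_deq and self.v.read(1):     # deq guard sees enq's write via v[1]
            got = self.d.read(1)            # first reads d[1]: the bypassed data
            self.v.write(1, False)          # deq writes v[1]: wins over enq's write
        self.v.tick()
        self.d.tick()
        return got, enq_ok


f = BypassFifo1()
print(f.step(enq_value=7, want_deq=True))  # (7, True): value slips straight through
print(f.step(enq_value=1))                 # (None, True): stored, FIFO now full
print(f.step(enq_value=2, want_deq=True))  # (1, False): no enq into a full FIFO
```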
Okay, so this was our two-element FIFO, where we succeeded in making enq and deq conflict-free. But there was this internal rule called canonicalize: you happily did your enq and deq, and then there was a sort of cosmic silence for a cycle while, internally, the FIFO got into the right shape, after which you could do more enqueues and dequeues. This was explained in this conflict diagram: when I included the conflicts with canonicalize, canonicalize was mutually exclusive with all the other methods. Suppose I don't want that. Suppose I really want canonicalization to happen as soon as I am done doing enq and deq, or either one, all the time, without taking an extra cycle. Can we transform this program? By now all of you know the trick. What's the refrain? Replace all the registers by EHRs. What do you want to happen first? Enq and deq should happen before canonicalize. Enq and deq happen in parallel, so they are conflict-free with each other, and they get all zeros. By reading zeros, you're saying that whatever you write is based on the old state, whatever was in the system, because you're not reading any bypassed value. The same holds for the enq part, and in the deq part, too, everything is zeros. We could do this much before we knew about EHRs: just using registers, enq and deq were already going in parallel. Our only problem was that canonicalize couldn't execute concurrently, because it conflicted. So what are we going to do? The canonicalize rule is just waiting for all these other methods to execute, and if anybody executes, they'll be writing port 0, so canonicalize is going to read port 1; that does the bypassing. So here is the solution: I have replaced all the zeros in the canonicalize rule by ones. By definition, canonicalize then sees the updated values. The values haven't been written into the registers yet, but we have provided extra paths, bypasses, so that canonicalize can see the values coming in from enq, deq, and so on. Now this FIFO is wonderful, because it's conflict-free and you can keep doing enq and deq every cycle, with no dead cycle in the middle. Okay, very good.
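The conflict-free FIFO can be sketched the same way. The two-element code is not spelled out in this transcript, so the structure below is an assumption in the spirit of the earlier design: slot `a` is the head, slot `b` receives enqueues, enq and deq use port 0 only, and canonicalize, scheduled after them in the same cycle, uses port 1 to move `b` into `a`. All names are hypothetical.

```python
_NO_WRITE = object()  # sentinel: this write port was not enabled this cycle


class Ehr:
    """Cycle-level EHR model; ports must be used in order r0 < w0 < r1 < w1."""

    def __init__(self, n, init):
        self.cur = init
        self.writes = [_NO_WRITE] * n

    def read(self, i):
        # Bypass: see the latest write on a lower-numbered port this cycle.
        for j in range(i - 1, -1, -1):
            if self.writes[j] is not _NO_WRITE:
                return self.writes[j]
        return self.cur

    def write(self, i, value):
        assert self.writes[i] is _NO_WRITE, "two writes on the same port"
        self.writes[i] = value

    def tick(self):
        # Clock edge: the highest-numbered write that fired is stored.
        for value in reversed(self.writes):
            if value is not _NO_WRITE:
                self.cur = value
                break
        self.writes = [_NO_WRITE] * len(self.writes)


class CFFifo2:
    """Two-element conflict-free FIFO: enq, deq (port 0) < canonicalize (port 1)."""

    def __init__(self):
        self.da, self.va = Ehr(2, None), Ehr(2, False)  # head slot a
        self.db, self.vb = Ehr(2, None), Ehr(2, False)  # tail slot b

    def step(self, enq_value=None, want_deq=False):
        """One clock cycle; returns (dequeued value or None, whether enq fired)."""
        # enq and deq both look at the old state (port 0): conflict-free.
        enq_ok = enq_value is not None and not self.vb.read(0)
        deq_ok = want_deq and self.va.read(0)
        got = None
        if deq_ok:
            got = self.da.read(0)           # first
            self.va.write(0, False)         # deq
        if enq_ok:
            self.db.write(0, enq_value)
            self.vb.write(0, True)
        # canonicalize reads port 1, so it sees this cycle's enq/deq.
        if self.vb.read(1) and not self.va.read(1):
            self.da.write(1, self.db.read(1))
            self.va.write(1, True)
            self.vb.write(1, False)
        for r in (self.da, self.va, self.db, self.vb):
            r.tick()
        return got, enq_ok


f = CFFifo2()
print([f.step(enq_value=i, want_deq=(i > 0)) for i in range(4)])
# [(None, True), (0, True), (1, True), (2, True)]: enq and deq every cycle
```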
Now we can keep doing this to all kinds of modules. Take the register file: what is the normal view of a register file? Reads happen before writes. In other words, if I do simultaneous reads and writes, I read the old values from the register file, and whatever I write will be visible in the next cycle. That's straightforward code to write. Now, can I make it into a bypass register file, so that if you are writing the same register I'm reading, I see the new value? That's the idea of a bypassed register file: it should behave as if the write happened and then the reads happened. It turns out to be a pretty useful thing. So let's try to implement it. If we continue with the idea we have so far, we could declare the whole register file to be EHRs, so that each element of the register file is an EHR. Does this code make sense? I change the declaration so that instead of instantiating registers I'm instantiating EHRs, and then, when I write, I write port w0, but when I read from the register file, I read port r1. This automatically captures any value that is coming in, and we read it out.
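The brute-force bypass register file described above (one EHR per entry, the write on port 0, both reads on port 1) looks like this as a Python behavioral sketch. The class and method names are hypothetical; the point it shows is that every entry pays for its own bypass logic.

```python
_NO_WRITE = object()  # sentinel: this write port was not enabled this cycle


class Ehr:
    """Cycle-level EHR model; ports must be used in order r0 < w0 < r1 < w1."""

    def __init__(self, n, init):
        self.cur = init
        self.writes = [_NO_WRITE] * n

    def read(self, i):
        # Bypass: see the latest write on a lower-numbered port this cycle.
        for j in range(i - 1, -1, -1):
            if self.writes[j] is not _NO_WRITE:
                return self.writes[j]
        return self.cur

    def write(self, i, value):
        assert self.writes[i] is _NO_WRITE, "two writes on the same port"
        self.writes[i] = value

    def tick(self):
        # Clock edge: the highest-numbered write that fired is stored.
        for value in reversed(self.writes):
            if value is not _NO_WRITE:
                self.cur = value
                break
        self.writes = [_NO_WRITE] * len(self.writes)


class EhrBypassRegFile:
    """Bypass register file with one 2-port EHR per entry: write < read1, read2."""

    def __init__(self, size):
        self.regs = [Ehr(2, 0) for _ in range(size)]

    def write(self, r, value):
        self.regs[r].write(0, value)     # write uses port 0

    def read1(self, r):
        return self.regs[r].read(1)      # reads use port 1: they see the write

    def read2(self, r):
        return self.regs[r].read(1)

    def tick(self):
        for e in self.regs:
            e.tick()


rf = EhrBypassRegFile(4)
rf.write(2, 9)
print(rf.read1(2), rf.read2(3))  # 9 0: same-cycle read of register 2 sees the write
rf.tick()
print(rf.read1(2))               # 9: now stored in the register itself
```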
This is not a difficult thing to do, but to most people it seems like overkill: for this one write, you have turned the whole register file into EHRs, which is a lot of logic. Is there a better way of doing it, one that takes much less hardware? Let me give you another design, in which we keep our old-fashioned register file and put some new logic outside it, so that the whole blob behaves like a bypass register file. How does this work? The idea is very simple. Instead of writing directly into the register file, you write into a FIFO. Whenever you have to read something, instead of reading the register file first, you try to find the value in the FIFO. If the value exists, somebody tried to write it, and you want to read that one; if you don't find it there, you go and read the register file. Then, at your leisure (and your leisure had better not be too long), you move these values from the FIFO into the register file whenever you find time. That's the high-level plan. Everyone okay? No? Let's try again. I have my ordinary register file, which keeps all the values. Every time you do a write, it goes into a small buffer here, because I may not have had time to go and update the register file. So whenever you do a read, before reading the register file, I read this buffer first.
Is there a value for register r in here? If yes, I read that one; if not, I go and read the register file. That's the whole idea captured in the diagram up there, where RF is the ordinary register file and the whole blob behaves like a bypassed register file. So help me write this code. What does the register-file write do in this case? It just writes into the FIFO. I need all these declarations: my register file is ordinary, and I have introduced a FIFO called bypass. I'm calling it bypass because that's where I keep all the bypass values. So write just enqueues the value into the bypass FIFO; that's all write does. What does a read do? Who's going to tell me at a high level? Yes, very good: we go to the bypass FIFO first. I'm being fancy here, as if the FIFO were very big; it really has one element, so it's a very simple test and there is no search going on per se. If you had two elements, you would have to write more code, but I'm writing it more abstractly: you search the FIFO, and if you find the value, you return it. That's what is shown here: if it is not in the bypass FIFO, I read it from the register file; otherwise I read it from the bypass FIFO. The second read is exactly the same, because it's a two-ported register file; there's no change there. So far so good. We are reading from here and writing into this, but somehow we also have to get things into the register file. Yes, I'm coming to that; it's written at the bottom. What you have is an ordinary FIFO of any kind; this has nothing to do with bypass or not bypass.
Now I'm going to provide an extra function on it: search. You can search on anything inside the FIFO. Generally we think of key-value pairs being inside the FIFO, but you could search on any bit pattern in the value; you define the search function when you make a searchable FIFO, and it goes element by element and tells you whether it found the value, returning it if so. There are many possible designs, and in this case it's total overkill, because with a one-element FIFO you can write down all the logic of the searchable FIFO yourself. But in the library you'll find SFIFO; anytime you see SFIFO, that means a searchable FIFO. Then you have this move rule. What is the move rule doing? Yes. Now I want you to take it a step further: when does this rule have to execute? Must it execute? Suppose I don't execute it; what will happen? Sorry? Right, so what will happen? The bypass FIFO would have to get bigger and bigger if you never write into the register file, but the bypass FIFO itself has a fixed size, one element or so. So how will this register file behave if you don't execute that rule? Well, we can't afford to lose writes. Exactly. Remember that in Bluespec every method has a ready signal. Until now, the register file behaved as if the write's ready signal was always true. Now what will the ready signal for write be? Who's going to help here? It will be the ready signal of the bypass FIFO's enq. So if the bypass FIFO has no space left, it can't do this enqueue, and that means the outer write stalls. What I'm trying to show you here is that Bluespec, at some very high level, remains an extremely time-independent model.
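The buffered design can be sketched in Python at the one-rule-at-a-time level, which is exactly the level at which the lecture argues correctness. Assumptions: a one-element buffer stands in for the searchable bypass FIFO, its "search" is a single index comparison, and an `assert` stands in for the write method's ready signal (the buffer must be empty). The class and method names are hypothetical.

```python
class BufferedBypassRegFile:
    """Ordinary register file plus a one-element bypass buffer (sketch).

    write enqueues (index, value) into the buffer; the reads search the buffer
    first and fall back to the register file; the move rule drains the buffer
    into the register file. write is only ready while the buffer has space.
    """

    def __init__(self, size):
        self.rf = [0] * size   # the ordinary register file
        self.buf = None        # one-element buffer: None or (index, value)

    def write_ready(self):
        # Ready signal of write = "bypass buffer can accept an enq".
        return self.buf is None

    def write(self, r, value):
        assert self.write_ready(), "write not ready: bypass buffer full"
        self.buf = (r, value)

    def _read(self, r):
        # Search the (one-element) buffer first, then the register file.
        if self.buf is not None and self.buf[0] == r:
            return self.buf[1]
        return self.rf[r]

    def read1(self, r):
        return self._read(r)

    def read2(self, r):
        return self._read(r)   # the second read port is identical

    def move(self):
        # The move rule: can fire only when the buffer holds a value.
        if self.buf is not None:
            r, value = self.buf
            self.rf[r] = value
            self.buf = None


rf = BufferedBypassRegFile(4)
rf.write(1, 5)
print(rf.read1(1), rf.write_ready())   # 5 False: found in the buffer, buffer full
rf.move()                              # the move rule drains the buffer
print(rf.read1(1), rf.write_ready())   # 5 True: now read from the register file
```

If `move` never runs, `write_ready()` stays false and every further write stalls instead of being lost, which is the behavior discussed above.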
of cycles, whenever we argue in terms of concurrency, it's always about performance, which is extremely 44:28 important. But performance is not any more important than correctness, and whenever you argue correctness you can 44:35 say: this is getting too much for me, I'm going to argue one thing at a time; let's do this first, then let's do that. 44:41 That level of argument is always valid in Bluespec. You can think 44:46 of your designs one rule at a time, regardless of how many things are happening concurrently. Even after EHRs, everything has a 44:55 proper meaning one rule at a time; we do not sacrifice the one-rule-at-a-time 45:00 semantics of Bluespec just because we have introduced bypasses and EHRs. So 45:07 at a deeper level, that is really the invention here. Otherwise EHRs would have been just a hack, but it's a 45:15 hack with very, very good theory associated with it. You're having your cake and eating it too (I 45:22 mixed up my metaphor, but you get the point). Okay, so we will have a 45:37 register file designed this way. Now, if we really want to be honest, you should go and synthesize a register 45:44 file using EHRs and a register file using this bypass FIFO, and see 45:49 what their properties are: which takes more area, which has better delay, and so on. So 45:56 don't take my word for it; if you want to, synthesize both of these and see the difference. Okay, now what I've 46:07 shown you so far is actually interesting: I can take these modules and 46:12 derive many derivative modules that have the same functionality but 46:17 different concurrency properties, which is extremely useful when I design these systems. 46:25 This idea can be applied to many different things. I just showed you two 46:30 examples, FIFOs and register files, but you can do scoreboards this way, you 46:36 can do memory systems; you can provide so many variants
because of this, 46:41 because in some places in a memory system we absolutely cannot afford to lose a 46:47 cycle; it would have drastic consequences for performance. In 46:52 many other places we don't care if an extra cycle gets introduced, and if you spend too much effort on bypasses 46:59 there, you're wasting your time. Okay, now the good thing about such uses 47:05 of EHRs is that an outsider doesn't have to know anything about them. Suppose this were an 47:13 introductory class (well, you guys are very advanced) and you didn't understand EHRs, and somebody said: don't 47:20 worry about it; here is a module, use it; here is a bypass FIFO, use it; here is a bypass register file, use it in the 47:27 rest of your design. You don't have to worry about EHRs; you can just use these modules as they are, in a black-box manner. 47:34 So we deliver on that promise: whenever you use these things, we do not break the black-box model of 47:43 using the various modules in the system. But what can happen when you use such 47:50 modules is that they can affect the delay of combinational paths, because 47:56 there is no free lunch. Whether I go through a bypass FIFO or introduce an explicit bypass, there is a bypass at 48:04 work, and any time there is a bypass at work, it can make the combinational 48:09 paths longer, and that makes things worse. So let me just quickly illustrate 48:16 how this shows up. You may recall this from an earlier lecture: we 48:24 were designing a two-stage pipeline and trying to resolve the 48:30 control hazard. We were speculating, and we did a very good design where the PC is updated here 48:37 and the PC is also updated there, and we had epochs and everything. But the 48:43 two rules conflicted because they both wrote the PC, so it turned out to be a bad design: it produced the right 48:49 results, but it was not any faster than the multicycle version. Now, if I had
48:57 taught you EHRs earlier, all of you would have said: use EHRs. Why? Because 49:04 we are updating the PC here and we are also updating it here, which 49:11 caused a conflict, and therefore these rules conflicted with each other. Now you have 49:17 this big weapon in your arsenal called an EHR; what will you do? 49:34 Did everybody get that? Turn the PC into an EHR. Instead of using an 49:42 ordinary register, use an EHR, which has two ports. So I will 49:48 write here, and I should be allowed to read the new value in the fetch stage, so 49:55 let's do that. So when I do all this, what happens? I have turned the PC into an EHR, and 50:02 you will realize you have to do this to the epoch also. What you will find is that wherever I was 50:09 writing the PC, I'm now writing pc[0], and I'm reading epoch[0] and 50:15 writing epoch[0]. But in the fetch stage, because I want to see what happened 50:21 later in my pipeline, I want to know right away what happened, because that is the older instruction 50:27 executing. So there I read pc[1]: you updated it in the execute stage, and we 50:35 provided a bypass from there to the fetch stage, so fetch can read the new value of the 50:42 PC that somebody wants to write. Who has 50:54 the authority here? Who should prevail in case of a conflict? You see, 51:00 remember, the fetch stage is speculating, so 51:07 whatever execute does should prevail. The fetch stage comes after me; I 51:12 mean, it comes before in the pipeline, but the older instruction has the authority to 51:18 update everything. That's why this happens first and then you do it over there. These are exactly the 51:24 kinds of questions you will have to face. I just wanted to show you what happens when you do this: 51:31 the last row is the design done using EHRs. See what 51:36 happens to the number of
cycles: they drop dramatically. 51:43 They drop so much because the moment you see the redirect, you 51:48 update the PC in the same cycle. But because there is some extra bypassing 51:56 going on here, the critical path indeed became longer, and that is reflected 52:02 in this bottom row: we were getting these kinds of timings 52:07 before for the clock period, and now it suddenly went up by more than 120 52:14 picoseconds. No free lunch, right? And 52:20 therefore you have to look at the product of the two things. Generally, I would say: if you 52:27 are using EHRs like this, use them very, very carefully, because in a big 52:33 design it gets harder to analyze what's happening. In modules, yes, because there are short bypasses 52:40 going on and we can analyze their effect; when we use them, we can plug in this module versus that module. But when you 52:46 use EHRs at a higher level, there are many other things going on in the solution. I think that's it; there are 52:53 other things I could have told you, but this is more than enough. I would have told you how you can use these 53:00 ideas in the project, but you have that slide, so you can look at it. Good, thank 53:05 you.
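To make the searchable-FIFO register file (around 41:23 to 44:07) concrete, here is a minimal Python sketch. It is not Bluespec and not the library's SFIFO. It models only the functional behavior and the ready signals, not clock timing or the same-cycle bypass. The names `SearchableFifo1`, `BypassRegFile`, `write_ready`, and `move` are hypothetical. Each write is enqueued into a one-element FIFO, a read searches the FIFO before the array, and the move rule drains the FIFO. If move never fires, the write's ready signal stays false and the caller stalls, as described at 43:43 to 44:07.

```python
from typing import Callable, Generic, List, Optional, Tuple, TypeVar

T = TypeVar("T")


class SearchableFifo1(Generic[T]):
    """One-element FIFO with a search method (hypothetical model of an SFIFO)."""

    def __init__(self) -> None:
        self._slot: Optional[T] = None

    def not_full(self) -> bool:  # ready signal of enq
        return self._slot is None

    def not_empty(self) -> bool:  # ready signal of first/deq
        return self._slot is not None

    def enq(self, x: T) -> None:
        assert self.not_full(), "enq called while its ready signal is false"
        self._slot = x

    def first(self) -> T:
        assert self._slot is not None, "first called on an empty FIFO"
        return self._slot

    def deq(self) -> None:
        assert self.not_empty(), "deq called on an empty FIFO"
        self._slot = None

    def search(self, pred: Callable[[T], bool]) -> Optional[T]:
        # Check each element (here at most one) and return the first match, if any.
        if self._slot is not None and pred(self._slot):
            return self._slot
        return None


class BypassRegFile:
    """Register file whose writes pass through a one-element searchable FIFO (hypothetical)."""

    def __init__(self, size: int) -> None:
        self.regs: List[int] = [0] * size
        self.pending: SearchableFifo1[Tuple[int, int]] = SearchableFifo1()

    def write_ready(self) -> bool:
        # The write method inherits the FIFO's enq ready signal.
        return self.pending.not_full()

    def write(self, idx: int, val: int) -> None:
        self.pending.enq((idx, val))

    def read(self, idx: int) -> int:
        # A pending write to the same register wins over the array contents.
        hit = self.pending.search(lambda entry: entry[0] == idx)
        return hit[1] if hit is not None else self.regs[idx]

    def move(self) -> None:
        # The "move" rule: fires only when a write is pending, draining it into the array.
        if self.pending.not_empty():
            idx, val = self.pending.first()
            self.pending.deq()
            self.regs[idx] = val


rf = BypassRegFile(4)
rf.write(2, 7)
print(rf.read(2), rf.regs[2], rf.write_ready())  # 7 0 False
rf.move()
print(rf.regs[2], rf.write_ready())  # 7 True
```

The assertion inside `enq` stands in for Bluespec's implicit guard. In real hardware, a rule calling `write` would simply not be scheduled while `write_ready()` is false.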
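The one-rule-at-a-time guarantee discussed from 44:41 to 45:07 can be checked with a small Python model of an EHR. This cycle-level sketch rests on an assumption: a read at port i returns the latest value written in this cycle at a lower-numbered port, or the register's value if there is no such write. At the end of the cycle, the register keeps the highest-numbered write. The class `Ehr` and the rules `rule_inc` and `rule_double` are hypothetical. Firing both rules in one cycle (port 0, then port 1) yields the same state as running them one after the other, in separate cycles, on an ordinary register.

```python
from typing import Dict


class Ehr:
    """Cycle-level EHR model: reads at port i see this cycle's writes from ports < i."""

    def __init__(self, init: int, ports: int) -> None:
        self._reg = init
        self._ports = ports
        self._writes: Dict[int, int] = {}  # port -> value written during this cycle

    def read(self, port: int) -> int:
        assert 0 <= port < self._ports
        earlier = [p for p in self._writes if p < port]
        return self._writes[max(earlier)] if earlier else self._reg

    def write(self, port: int, value: int) -> None:
        assert 0 <= port < self._ports and port not in self._writes
        self._writes[port] = value

    def end_cycle(self) -> None:
        # The underlying register latches the write from the highest-numbered port.
        if self._writes:
            self._reg = self._writes[max(self._writes)]
        self._writes.clear()


def rule_inc(x: Ehr, port: int) -> None:
    x.write(port, x.read(port) + 1)


def rule_double(x: Ehr, port: int) -> None:
    x.write(port, x.read(port) * 2)


# Both rules fire in one cycle: rule_inc uses port 0, rule_double uses port 1.
concurrent = Ehr(3, ports=2)
rule_inc(concurrent, 0)
rule_double(concurrent, 1)
concurrent.end_cycle()

# One rule at a time on an ordinary (single-port) register, over two cycles.
sequential = Ehr(3, ports=1)
rule_inc(sequential, 0)
sequential.end_cycle()
rule_double(sequential, 0)
sequential.end_cycle()

print(concurrent.read(0), sequential.read(0))  # 8 8
```

The port numbering encodes the rule order: giving `rule_double` port 0 and `rule_inc` port 1 would instead match running double first, then increment.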
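The effect of turning the PC and epoch into EHRs (48:16 to 51:43) can be sketched with a toy two-stage pipeline in Python. Everything here is an assumed simplification, not the lecture's design:

- a one-entry pipeline register between fetch and execute;
- fetch always predicting pc + 1;
- in the plain-register version, a scheduler that picks execute whenever both conflicting rules are enabled.

The function `run` and its parameters are hypothetical. The one-rule-at-a-time semantics lets a cycle in which both rules fire be modeled by running execute and then fetch on the same state. This mirrors execute using port 0 and fetch using port 1.

```python
from typing import Dict, Optional, Tuple


def run(branches: Dict[int, int], halt_pc: int, use_ehr: bool) -> int:
    """Count cycles until the instruction at halt_pc executes in a toy 2-stage pipeline.

    branches maps the pc of a taken branch to its target; every other instruction
    falls through to pc + 1. Fetch always predicts pc + 1.
    """
    pc, epoch = 0, 0
    f2e: Optional[Tuple[int, int]] = None  # one-entry pipeline register: (pc, epoch)
    cycles = 0
    while True:
        cycles += 1
        # With ordinary registers both rules write pc, so they conflict and at most
        # one fires per cycle; with EHRs both fire, execute ordered before fetch.
        fire_fetch = use_ehr or f2e is None
        if f2e is not None:
            ipc, iep = f2e
            f2e = None  # dequeue
            if iep == epoch:  # otherwise it is a wrong-path instruction: drop it
                if ipc == halt_pc:
                    return cycles
                nxt = branches.get(ipc, ipc + 1)
                if nxt != ipc + 1:  # misprediction: execute writes pc[0] and epoch[0]
                    pc, epoch = nxt, 1 - epoch
        if fire_fetch and f2e is None:
            # Fetch reads pc[1]/epoch[1], so it already sees a redirect made this cycle.
            f2e = (pc, epoch)
            pc += 1  # write the predicted next pc


print(run({2: 4}, halt_pc=5, use_ehr=False))  # 10
print(run({2: 4}, halt_pc=5, use_ehr=True))   # 6
```

Consider a program with one taken branch from pc 2 to pc 4 that halts at pc 5. The conflicting version needs 10 cycles and the EHR version 6, because fetch no longer waits for a cycle in which execute is idle. This toy model ignores clock period entirely, which is exactly the cost the lecture turns to next.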
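The claim that a bypass can lengthen combinational paths (47:43 to 48:09, and again at 51:48 to 52:14) can be illustrated another way. Treat the combinational logic as a directed acyclic graph with per-node delays and take its longest path. The Python sketch below uses made-up node names and delays in arbitrary units, not synthesis results. Without the bypass, execute's redirect logic and fetch's next-PC logic are separate register-to-register paths. With an EHR bypass, they are chained within one cycle.

```python
from typing import Dict, List


def critical_path(delay: Dict[str, int], edges: Dict[str, List[str]]) -> int:
    """Longest path through a combinational DAG whose nodes carry delays.

    Paths start and end at registers, so the clock period must be at least this value.
    """
    memo: Dict[str, int] = {}

    def longest_from(node: str) -> int:
        if node not in memo:
            tail = max((longest_from(s) for s in edges.get(node, [])), default=0)
            memo[node] = delay[node] + tail
        return memo[node]

    return max(longest_from(n) for n in delay)


# Hypothetical delays in arbitrary units; not measurements from the lecture.
delay = {"alu": 5, "branch_resolve": 2, "bypass_mux": 1, "pc_predict": 2, "imem_addr": 1}

# Ordinary registers: execute's redirect ends at the pc register, and fetch's
# logic (pc_predict, imem_addr) starts from the pc register.
no_bypass = {"alu": ["branch_resolve"]}

# EHR bypass: the redirect flows through a mux straight into fetch's logic
# within the same cycle, chaining the two paths.
with_bypass = {
    "alu": ["branch_resolve"],
    "branch_resolve": ["bypass_mux"],
    "bypass_mux": ["pc_predict", "imem_addr"],
}

print(critical_path(delay, no_bypass), critical_path(delay, with_bypass))  # 7 10
```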
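The point at 52:14 to 52:20, that you have to look at the product of cycle count and clock period, is simple arithmetic. The Python sketch below uses hypothetical cycle counts and periods. Only the 120 ps increase in clock period echoes the lecture, and none of the numbers are its measurements. The sketch also computes the break-even clock period: a longer period than this would make the EHR design slower overall. The function names are illustrative.

```python
def exec_time(cycles: int, clock_period_ps: float) -> float:
    """Total run time in picoseconds: number of cycles x clock period."""
    return cycles * clock_period_ps


def break_even_period(base_cycles: int, base_period_ps: float, new_cycles: int) -> float:
    """Longest clock period at which the new design is still no slower than the base."""
    return exec_time(base_cycles, base_period_ps) / new_cycles


# Hypothetical numbers for illustration only: the EHR design needs fewer cycles
# but its clock period is 120 ps longer.
base = exec_time(cycles=1000, clock_period_ps=500.0)
ehr = exec_time(cycles=700, clock_period_ps=620.0)
print(base, ehr)  # 500000.0 434000.0
print("speedup:", base / ehr)
print("break-even period (ps):", break_even_period(1000, 500.0, 700))
```

With these made-up numbers the EHR design still wins, but a slightly larger period increase or a smaller cycle reduction could reverse the result. That is why both factors have to be measured.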