00:08 [introductory remarks] 00:20 Okay, so I'm going to do some background and then give a tour of our software stack, 00:35 and then we'll talk about containers and cloud and some of our current research directions. 01:14 The national labs, to first order, build big and fast computers, right? But of course there's a more nuanced aspect, where we try to make the systems maximize the science and engineering output we get per dollar spent. 01:40 And there are lots of nuanced aspects here: we co-design our hardware to better meet the applications' needs and what the applications are doing, and on the flip side we adapt to 01:59 whatever hardware of the day is the most economically viable or most prevalent — GPUs are the obvious example. 02:13 There are things like energy efficiency, system software so that the OS doesn't get in the way, and then providing things like math libraries. 02:38 There's been a need, and lots of work, for many years on math libraries and other tools that underlie all of this. 03:12 So, okay, that's the background. You'll hear people talk about strong scaling; the law underlying it is Amdahl's law. What it says is that the serial fraction is really what bounds your parallel speedup. So if, for example, one percent of your application is serial, then no matter how many processors you have, the maximum speedup you're going to get is about a hundred. 03:47 That's quite depressing, right? If you build a big, big cluster and can only get a hundred-fold speedup, that's not great. 04:05 The twist is that instead of keeping the problem size fixed, you relax that assumption and grow the problem as you add resources. 04:43 That leads to the very successful model shown here. 04:59 This slide does a nice job of showing how a lot of applications are parallelized today: we've got a problem domain, or a problem that we want to solve, we partition it up into pieces, and we spread the various pieces across lots of server nodes. 05:21 This is weak scaling: as we add servers, we add more memory, and we can thus solve a bigger problem. And then within each server — for a long time there was only a single core or single processor in each of those servers, but now it's multi-core processors with a fixed memory size — 05:47 we do strong scaling within a node. So after we partition up the problem into all these different pieces, the processes communicate with each other using, for example, message passing, or shared memory, or some other mechanism; they grind on their piece of the computation, 06:04 they come back and synchronize with reductions or some other collective operations, and then they iterate. They keep doing that in this bulk synchronous way. There's lots of work going on to try to relax that and be more asynchronous and dynamic.
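To restate the Amdahl's-law bound from the discussion above in formula form (this is just the textbook statement, not anything specific to the slides; s is the serial fraction and P the number of processors):

$$ \text{speedup}(P) \;=\; \frac{1}{\,s + \frac{1-s}{P}\,} \;\le\; \frac{1}{s}, $$

so with $s = 0.01$ the ceiling is $1/0.01 = 100$ — the "about a hundred" mentioned above — no matter how large $P$ grows.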
Coming back to it: this bulk synchronous model is very easy to think about, it's easy for application developers to write to, and it's what the vast majority of codes do today. 06:33 It's been a very successful model, to the point where it's enabled us to build a large number of highly parallel computers. They're large clusters made up of servers — the current workhorse, certainly within the Department of Energy, is lots of independent servers connected together by a fast network. 06:54 Users typically write MPI programs that explicitly pass messages, but there's this really rich software middleware layer that's been built up, where users often don't even have to know that MPI is there: the math libraries, the I/O libraries, and things like that use MPI under the covers and hide it behind a layer of abstraction. 07:25 So that's kind of the mechanics and the productivity of these systems. On the left is a notional model of a massively parallel system. There are different node types in the system: there are service nodes that users log into, there's an I/O partition that provides a parallel file system, 07:46 and then there's a large set of compute nodes where the actual problem runs. On the right there's a link to a video where you can learn more about the Astra supercomputer at Sandia, which is a project that both Andrew and I worked on. This is what a system looks like in our data center: 08:10 a bunch of cabinets with a bunch of computers in them, and a network connecting them. You can see the power bus bars overhead supplying power, and then water coming up from the floor providing cooling to the system. So that's Astra. 08:29 You can see some stats about it: it's about 2,600 nodes, 56 cores per node. And then, by contrast, this is Fugaku, the RIKEN machine in Japan. This is the current number one. Where Astra is 2,600 nodes, this system is vastly larger — it's just massive. It also uses Arm processors, so that's pretty cool. 08:57 It's about half an exaflop or so, and if you click on this when I post the slides (which I haven't done yet), there's a 3D interactive tour, so you can zoom around the building. I think it's pretty cool to go into the basement: 09:12 you can see the seismic isolators — of course, Japan has lots of earthquakes — so it's some pretty impressive infrastructure. Okay, and then where is HPC headed, and what are we thinking a lot about in terms of directions? Machine learning and AI, 09:36 of course, is huge in pretty much every domain, and we're trying to figure out how to apply it to HPC: maybe we can use physics-trained ML models to accelerate certain computations, or to zero in on the things we want to look at in higher fidelity. So there's a lot of work going on there. Increased hardware specialization and heterogeneity: 09:59 there's this increasing notion that computer architecture, and specializing the architecture, is the key to achieving higher performance going forward, because we're not going to get those gains from process scaling and the other things that helped us in the past. So there's a lot of work going on in co-design. 10:17 And then, lastly, the big hyperscalers and cloud providers of the world are increasingly starting to make their own hardware.
There's a whole economic aspect to this — what's going to be viable going forward — and we're trying to understand how we in the HPC space should think about that, whether that's making hardware directly, working with the clouds, or something in between. 10:43 Andrew is going to be talking about some of that later. Across all of these, these are big areas, and there's a ton of software-stack challenges in each of these individual areas, so there's a lot of interesting work out there that needs attention. I'll just put in a plug for one of my new projects: 11:02 there's a lot of work going on in open source hardware and open source hardware designs — things like RISC-V — and in high-productivity design tools that make it easier to design and run workloads on hypothetical SoC designs. We're trying to use that to make a full-stack co-design framework 11:26 that spans everything from RTL through the operating system, going across the whole spectrum: what can we do in that area to develop systems and better run our workloads of interest? I'd be happy to talk more with you about this if you're interested, and if you're doing work in this area I'd also love to talk to you. 11:50 So I'll just mention that quickly. Okay, so now a tour of the ATSE software stack. But first, it's useful to know about the spectrum of systems we deploy. There are the production systems that have mature software stacks — hardened compilers, MPI libraries, things like that — and these just have to work. 12:16 So there's not a lot of R&D going on on the production side, because those are more dialed-in systems. On the left-hand side are the test bed systems we stand up. These are kind of onesie-twosie — you know, 16- or 32-node clusters of new technologies that we want to try out and see how well they work. 12:37 Arm was in this category at one point. You can think about things like all these new ML accelerators — Cerebras, I don't know if you're familiar with them, but they have wafer-scale systems with essentially a 12-inch wafer of processors — and there's a question of how you program that; it's going to look different than a distributed-memory supercomputer. 13:03 DOE has close collaborations with a lot of these vendors to develop their software stacks and figure out how to make them work for the types of scientific computing workloads we're interested in. Then, in the middle, is Vanguard — Astra was the first Vanguard system. 13:22 These systems are larger, and they have a critical mass so that we can get more users interested. There are certain things that only show up at scale, and as a carrot, you need to have enough capability to get users to port their codes to it. 13:40 The notion with these systems is that we're trying to mature new technologies and show that they're viable, so that we can then move them into production systems. This whole technology-maturation path is really aimed at getting technologies to the point where we can deploy them in production. 14:00 So, along those lines, we needed a software stack that could easily adapt to new hardware as it comes along and integrate with vendor tools. We looked around, and there wasn't really anything that met all of our requirements, so we developed ATSE, a modular, extensible software environment.
We originally based it off of the OpenHPC software stack, which I'm guessing some of you may be familiar with. 14:28 It's a great stack, but we did have to adapt it in a few regards and also integrate it with all of these other components shown in the graphic. We deployed this on Astra, and we're expanding it to work on GPU-based systems, on A64FX — the processor that's in Fugaku — and on other test beds as well. 14:51 And we've containerized it, which Andrew will be talking about later. There are more details on ATSE than this in the paper linked at the bottom; I'm just going to give a high-level description of some of the different areas. The first one is the system management layer: 15:13 this is the layer that runs the whole system — how you manage OS images and nodes and things like that. Our partner for Astra, HPE, had this new product called HPCM, their cluster manager solution. It was a new product at the time and it wasn't mature, 15:36 so we worked very closely with them to develop it, to add some hierarchy so that it's scalable, and things like that. Now it's being used on a number of very large systems, so that's the kind of path where we can prototype things on something like Vanguard that can then move into wider usage. It's a happy story there. 16:00 Next, the base operating system. There's been a lot of work over the years on developing operating systems for HPC — a lot of work at UNM as well, and a long-standing partnership on making scalable operating systems. A lot of that has translated into Linux now, and into how to configure Linux so that it's scalable. Livermore leads the Tri-Lab Operating System Stack, which takes some of those lessons learned into account and applies them to a Red Hat-based 16:31 stack, and that's what a number of our systems are deployed on. And I mentioned vendor components: vendors often provide their own optimized compilers — Intel does it, Arm does it, most of the vendors do — so we try to use those along with the open source ones, and the vendors also typically provide an MPI library. 16:55 Those have to be integrated with the stack. So in Astra's case, HPE had their MPI that came from the SGI days, which we integrated, and Arm had their compiler toolchain that they were standing up for HPC. We worked very closely with them, placed engineering contracts, and developed a lot of useful capabilities in their math 17:17 libraries — a batched BLAS capability, for example, which is now there in their libraries, and for some of our workloads we're seeing very good speedups from it. 17:31 And then the programming environment, right? This is the thing that users see and program to. It needs to provide things like I/O libraries, MPI, and math libraries, bundle these all together, and make them visible to users. Having this kind of curated and tested environment is really important for immature technologies, right? 17:55 If the people standing up the system do the work to make it as turnkey as possible for users to get onto it, and tackle those issues once rather than each code team doing it, we think that's a useful thing. That was one of the benefits of the ATSE model, as I mentioned earlier. 18:17 We build the stack with Spack; Spack is a tool that builds HPC software packages from source.
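For instance, here is a hedged illustration of the kind of thing Spack makes easy — the package, MPI, and compiler choices below are arbitrary examples, not the actual ATSE recipe:

```sh
# Build the same package two different ways; both installs coexist side by side.
spack install trilinos ^openmpi %gcc   # Trilinos against Open MPI with the GCC toolchain
spack install trilinos ^mpich   %arm   # same package, different MPI and compiler
spack find -l trilinos                 # list both; 'spack load' picks the one you want
```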
Spack is very, very good at managing different versions and configurations of software packages simultaneously, which is exactly what we need when putting this stack together. One of the capabilities is shown on the slide, where a user can extend a system-installed Spack with their own packages: 18:51 if they want to use the OpenBLAS that is tuned for the system and already there, and build Trilinos on top of that, they can do that, building on the system-installed foundation rather than building everything themselves. Okay, so that was my portion. 19:10 So, Andrew, I'm going to hand it over to you. Sounds good — thanks, Kevin. So I'm going to talk about research directions in several aspects. It's going to be formulated first around containers, but I'll really try to expand through the talk toward both HPC and cloud and how those interplay. Next slide, please. 19:35 So, all right, what are containers? Well, other than the things that you go buy at the container store and put stuff into: containers are a software package which packages up all the code and dependencies necessary to execute some singular process or task. Generally this means you're encapsulating an entire software ecosystem, minus the kernel. 19:58 This is essentially OS-level virtualization. If you consider the full virtualization stack, from ISA-level virtualization to API-level virtualization, we're working at the OS level, and that means containers are fundamentally different from virtual machines, if you've studied those. At a very, very broad-stroke perspective, think of it as chroot on steroids, or BSD jails — though things have evolved heavily since then, 20:23 so that notion only gets you so far. You're depending on your host operating system, and you are virtualizing various parts of the operating system, which means you can't use just any operating system — it's Linux-based, right? And often nowadays we're leveraging kernel features called namespaces that are built into Linux. 20:44 There are different namespaces — user, mount, PID, and others as well; the list goes on, and new features arrive regularly — but these are essentially the key pieces of the kernel itself that allow us to provide advanced features for containers. So if you've heard of containers, you've probably heard of Docker, right? 21:10 It's been the leading container runtime. It's been used extensively in the industry and the cloud/enterprise space, and it's the underpinning of a lot of what's done now on various cloud services with Kubernetes. And so the interesting bit with containers is that, while they originally started with LXC, they've largely been sponsored and shaped by the enterprise hyperscalers baking them into the cloud — 21:40 the public cloud offerings — and that looks significantly different from what we see in the HPC context. And that's simply because, I believe, there's a substantial rift between HPC and the enterprise camp: while we've ended up building very similar machines, maybe even with similar features, the goals are fundamentally different. 22:03 Right? Because, as Kevin talked about, we have this bulk synchronous parallel model, where we have one task that gets broken up into many thousands or millions of subtasks acted on in parallel, really for one key simulation. That's the bulk of the modeling and simulation work that's been done over the past 20-30 years.
Conversely, the cloud has a different problem: they have many tasks that get broken up into many more tasks, potentially to service a million Google requests for LOLcats on a continual basis. What's funny is that this generates very similar-looking infrastructure, and it would be really nice if we could beg, borrow, and steal bits from one side or the other more readily than we could in the past. 22:53 And so that has led us to really explore containers in a fundamentally new way in this context. Next slide, please. So how do we plan on using containers at Sandia, as well as more widely across the DOE enterprise? One of the things that we'd really like to do is support software development that's focused on HPC. 23:16 There are millions of lines of code that go into our applications, and a lot of developers behind those lines of code, and it would be really nice if, while they're targeting potentially exascale-class machines nowadays, we could enable a nice, comfortable development environment that they can work with on their workstations and laptops and that can then be brought to bear effectively on a supercomputing resource. 23:41 We really want to minimize the development time for targeting leadership computing, and as you can imagine, the stack Kevin just talked about plays a fundamental role: we want to be able to have that whole software ecosystem, develop our applications on top of it, and then move it effectively — and how you get that ATSE stack onto a supercomputer is an interesting challenge, right? 24:03 The key aspect that folks really like about containers is the fact that you can specify your environment exactly: how I want to build my environment, as well as the application that sits on top of it. I can just import this container, presumably, and then go run it on a target platform. 24:20 I might have many containers, each targeting a different machine type or architecture, using a different compiler, different build options, you name it. There's an essentially infinite list of the customization that can happen here. You're not really bound by the traditional limitations that we've often seen in the past in our software environments, perhaps provided by a vendor. 24:41 That being said, while containers are great, performance matters. Performance is a fundamental aspect of HPC, and we need to keep it that way. We also have the ability to support fundamentally new and emerging workloads in this context — ML and deep learning, large-scale data analytics — and being able to provide that software ecosystem, 25:07 which is potentially even different from HPC's but needs to be brought to bear on an HPC or leadership-class resource... it's a question of how we make that easier, and we think containers are part of that. Next slide. So there are plenty of container runtimes; I've mentioned Docker 25:24 briefly. However, Docker doesn't really sit well for running HPC workloads. These are fundamentally shared resources, and Docker doesn't share well. I can use Docker on my laptop, and that's useful, that's helpful; however, actually deploying Docker directly on a supercomputing resource — there are fundamental integration issues, there are security issues. 25:45 I could go into those if you like, but it's just not very useful there. The interesting thing is that there are many different container runtimes that are now focused specifically on HPC: there's Shifter,
Charliecloud, Sarus, and Singularity. The good news is that all of these container runtimes are usable in HPC, right? 26:08 They take different mechanisms — they leverage the underlying kernel in fundamentally different ways and constrain containers in different ways — but they all essentially work, and I'll talk a little bit about building in a second. Next slide, please. So, I wanted to take this opportunity to introduce my brand-new container runtime. 26:26 It's called Yet Another Container Runtime. As you can see here, the entire source code I developed is only sixty-six lines of Go. Isn't that phenomenal? Aren't I great? I figured, because we had so many container runtimes, why not add one more. Next slide, please. I'm really kidding here. 26:46 My point is that containers are not that scary; there's some fundamental code that you can leverage. I'm borrowing directly from Liz Rice's containers-from-scratch material here — it's really interesting, go check it out, and there are several talks I've got links to here as well. 27:00 My point is containers aren't that scary. There's some basic process manipulation that you can do, which is quickly illustrated in a fairly tiny bit of Go (a minimal sketch in that spirit is included after this section). That being said, these aren't fundamental, catastrophic efforts that have to happen to provide a container runtime. How we use them is potentially more difficult, and how they apply in the context of HPC raises a lot more questions, but I don't want to scare folks away from container runtime development. 27:33 This is not like the million-plus lines that you find in VMware, for instance. Next slide, please. That being said, we contribute to container runtimes across the DOE enterprise, and as part of the Supercontainers project that I lead, I've listed several different advances that we've made just in the past year, investing in several container runtimes: Charliecloud, led by the folks at Los Alamos; Singularity as well as Apptainer, which is used pretty heavily; 28:01 we've also made investments in moving Shifter into a new role on another DOE system at NERSC at Berkeley, plus a lot of bug fixes, and there's been a lot of activity going on lately with Podman. The fact is, this is good — this is a healthy ecosystem, and we'd like to see more of it. 28:23 There's applicability and utility of different runtimes for different purposes, anywhere from fundamental R&D and research to production-level container support, and that's really the fundamental thing that we're trying to address and improve in the context of this project. Next slide. So, as I mentioned, while we looked at containers with an initial exploratory vision, we've done experiments to make sure they work fairly well 28:52 and that we have a performant solution, and they've since been deployed in many different contexts across the DOE enterprise. Here's a quick snapshot of the state of containers on various machines: pretty much every major DOE resource supports containers in some fashion, shape, or form today, each with different runtimes for different use cases and models, and different levels of integration with the underlying system itself. 29:19 So this is great — okay, I can go run a container. However, you'll notice a lot of these systems are very different from a hardware aspect, right? Summit is a POWER9 system; Kevin just talked about Astra, our Arm-based system.
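As promised above, here is a minimal sketch in the spirit of Liz Rice's containers-from-scratch examples — it is not the 66-line runtime from the slide, just an illustration of the same idea. It is Linux-only, the rootfs path is a placeholder you would point at an unpacked image, and as written it needs root privileges (adding a user namespace would relax that):

```go
// yacr.go - run a command inside new UTS, PID, and mount namespaces.
// Usage: sudo go run yacr.go run /bin/sh
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

func main() {
	if len(os.Args) < 3 {
		fmt.Println("usage: yacr run <cmd> [args...]")
		os.Exit(1)
	}
	switch os.Args[1] {
	case "run":
		parent()
	case "child":
		child()
	}
}

// parent re-executes this program as "child" inside fresh namespaces.
func parent() {
	cmd := exec.Command("/proc/self/exe", append([]string{"child"}, os.Args[2:]...)...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS,
	}
	must(cmd.Run())
}

// child sets up the "container" view of the system, then runs the command.
func child() {
	fmt.Printf("running %v as pid %d\n", os.Args[2:], os.Getpid()) // pid 1 in the new namespace
	must(syscall.Sethostname([]byte("container")))
	must(syscall.Chroot("/path/to/rootfs")) // placeholder: an unpacked container filesystem
	must(os.Chdir("/"))
	must(syscall.Mount("proc", "proc", "proc", 0, "")) // so ps etc. see the new PID namespace
	cmd := exec.Command(os.Args[2], os.Args[3:]...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	must(cmd.Run())
	must(syscall.Unmount("proc", 0))
}

func must(err error) {
	if err != nil {
		panic(err)
	}
}
```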
Getting back to the hardware question: I think some folks now have Arm-based Mac laptops, 29:38 but that wasn't the case even a year and a half ago, and I certainly don't run around with a POWER9-based laptop, or an x86 one with the right particular GPU — nor do I want that many laptops. So there's this interesting question now about how I build containers for all of these architectures 29:56 and how I improve that state of the landscape. Next slide, please. So one of the things that we've tried to do is leverage Podman. Podman is an interesting runtime for several reasons. It's meant to be essentially a Docker drop-in replacement: it's command-line equivalent — or, near as makes no difference, identical — to Docker. 30:16 So if you're familiar with typing docker run, you can type podman run, or actually I just alias podman, and things usually work out. There are some niche cases here, so don't take me too literally, but it's close enough to use effectively in that manner. 30:32 So what we've done — because we had this Arm-based system several years ago, well before I could buy an Arm-based Mac laptop — is enable a way to build for that platform fast and efficiently on the actual supercomputing resource and then provide that quickly to our user base. 30:52 We enabled Podman in a very initial, prototype, sort of hacky way, but targeted towards a larger production deployment, where we can actually enable building a container image — where you're literally doing a yum install or a dnf install or zypper or whatever, having simulated root privileges in an actually safe and rootless manner through Podman, through a CLI that 31:22 most people are already familiar with. So we demonstrated this sort of on-platform container build, and we're really excited about it. There were a lot of weird, nuanced things that we did to make this work on RHEL 7, and the good news is we're collaborating directly with Red Hat on the latest RHEL 8 and what will soon be RHEL 9 offerings, 31:40 where this is a significantly better supported product. Here you can see the walkthrough: building with Podman directly on a login node; pushing that container image — it's OCI-compliant, it's a regular container image, nothing fancy — to a local, on-site, on-premise container registry service; shifting it entirely to a separate network or another supercomputer, potentially an air-gapped system; 32:05 and then, in this case, using Singularity to go deploy that on the compute nodes across Astra. This is not unique to Astra at this point — similar features are being rolled out elsewhere across the DOE, including at both NERSC and Oak Ridge. 32:22 There's also ongoing work that we've been doing in partnership with our friends at Berkeley on enabling Podman to be a scalable tool as well, launching containers at scale directly (a command-level sketch of the build-push-run flow follows below). Next slide, please. So this kind of captures a lot of what we've been doing and why this model is really interesting and unique, and there's a lot of advantage. 32:49 We can take our entire software ecosystem — the ATSE stack Kevin has explained — deploy it in a container, and do release testing against that. I can go build a series of applications that I really care about and need to make sure work, and I can find problems early.
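Here is that hedged, command-level sketch of the login-node flow described above. The registry hostname, image names, node count, and the use of srun are placeholders for illustration, not the actual Astra configuration:

```sh
# 1. Rootless build on the Arm login node (assumes a Containerfile in the
#    current directory describing the application environment).
podman build -t myapp:arm64 .

# 2. Push the OCI image to an on-premise registry.
podman push myapp:arm64 registry.example.lan/myteam/myapp:arm64

# 3. On the compute side, pull the image with Singularity/Apptainer and run it,
#    e.g. under the batch system's parallel launcher.
singularity pull myapp.sif docker://registry.example.lan/myteam/myapp:arm64
srun -N 16 singularity exec myapp.sif ./my_solver
```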
33:04 For example: oh, hey, there's a bug in the updated math library that we're about to roll out — I'd better figure out what that is and what it was tested against, and modify things before I actually roll out my software natively. That's a really powerful notion for a lot of folks, and a very different experience from today, where every time you update a supercomputer something breaks and people's hair is on fire. 33:22 So making that process a little simpler is often a very good thing and saves a lot of time in terms of development. The same works in reverse: if I know the version I had six months ago worked and I really need to test against that again, I can go get the container image and roll back to a previous deployment, which is something that may be very difficult to do with the system software, 33:44 since libraries have been updated over time, right? And we're trying to enable more build tools and functionality for moving and transporting container images not only across diverse architectures but across different sites in the DOE. We really want to simplify the deployment of our core code set at Sandia as well as elsewhere, and really try to embrace and lean into this 34:06 whole laptop-to-test-bed-to-HPC sort of situation that we're starting to find ourselves in. Next slide, please. So this has led us to some additional advancements and has really caught the eyes of some of our core analysts. There's this notion that: great, I can run on HPC, but I need to get these tools that we've developed as part of the ASC program across the tri-labs 34:37 into our analysts' and our designers' hands, and often that requires them, today at least, learning how to work with batch scheduling and the local resource management to run them — and that's actually very difficult at times. There's a significant barrier to entry to HPC, and we'd like to see what we can do about reducing it, or at least augmenting it with something as simple as a front-end web service at times, or potentially a Jupyter notebook, or a series of databases that I can stand up that outlast any given job or allocation. 35:14 And so, if we're getting used to creating containers as this core schedulable unit, let's really lean into it and provide service orchestration that also pairs nicely with our traditional HPC. This is a new activity that we've really tried to embrace and leverage at Sandia, really starting this year: 35:34 looking at how we stand up user authentication in a sane way, how we can define these pods that construct small micro-apps — with front-end UIs and web interfaces — around the key HPC tools that our designers really need, and then having those go off and interact with our HPC systems, 35:51 potentially even directly deploying those workloads in a containerized fashion on HPC, bringing results back, and really providing comprehensive tooling to the analysts that need to use these tools. There's a lot of work in this, there's a lot of nuance, and there are many different ways we could go about doing it. 36:10 This is, at least for me, a very interesting exploration and investigation into how we can provide both scheduling in a batch sense and orchestration and services and anything in between, while still maintaining that key aspect of HPC: scale.
And, just to add a little more, there's also this whole notion of DevOps and creating a nice way for our dev teams to develop for and target these environments as well. 36:41 Next slide, please. So, as you can see, this is sort of the path forward that we're starting to put together. We developed ATSE for Vanguard, and there was a fundamental R&D aspect to that initial lift for Astra. 37:00 We've also found that it allowed us to engage closely with vendors as well as the wider open source community, and I think you're going to see a lot more iteration and churn on this moving forward. From a hardware standpoint, we're seeing this embrace of extreme heterogeneity, not only at the node level, with different GPUs potentially, but at the system level, right? 37:24 And we're trying to focus not just on individual simulations but on larger workflow ensembles — be it extreme-scale data science, or data-centric engineering, or coupling that directly with our multi-physics modeling and simulation tools — and trying to enable all of that together. That requires dynamic resource management, and that's a key aspect of what we're interested in exploring further; I'm very curious to see if other folks are interested in this activity. 37:58 There's a lot of R&D that we can do combining some of the key cloud-like features with the things that we need in HPC, really trying to bridge this gap from both sides. And there's work to enable these cross-lab containerized workflows, potentially across multiple air-gapped 38:16 networks. This is a larger challenge than it may seem at first, especially when we consider how we deploy comprehensive container CI/CD in that same context. Next slide, please. So I'm going to try to quickly wrap up and give you my views at a very high level; 38:40 I'll let Kevin also chime in here. The DOE has made significant investments in HPC hardware moving into exascale, and we're really interested in focusing on how we can better enable that from the software perspective. We've laid a significant foundation on the Arm system initially, and Astra has paid benefits for a lot of our partners and collaborators. 39:03 With an Arm-based machine now at the top of the list, I have a feeling we'll see many more Arm-based supercomputing architectures to come. Investing in co-design is a key aspect that we really need to double down on, and as we've clearly implied, software is pivotal to the success of a machine — the flexibility of that software in particular. Our ability to adapt, redeploy, move, and change our software over time has proved crucial just in this prototype Astra system, and it will be going forward. 39:41 We want to leverage open source tools whenever possible and available, really double down on that notion, and push it to the extreme. Those are some of the things I'm really excited about. There's a lot more work to do on this enablement — things like Kubernetes-based service orchestration and how we manage HPC today. 40:02 I think you're going to see a lot more coming down the pipeline from every DOE lab and a lot of different academic institutions on this, and I'm really excited about this activity.
So with that, I'll just say thank you very much for having us — we love talking about this. 40:19 Kevin and I really appreciate the opportunity; I'll let him also share some concluding thoughts here. No, I think you summed it up well. Patrick's got some questions in chat already that we ought to answer. 40:37 Great — I think we've got plenty of time for questions, so I'd suggest we proceed with those; Patrick, would you like to elaborate on the questions you put in chat? Sure, I'd be happy to. First, thanks for the talk, it was great. 40:58 I've been using containers in a variety of ways for the development work I'm doing. It's really nice, in a GitHub or GitLab environment, to be able to do my work on my big application and have CI/CD in a container that works when I do a pull request or a commit, to say "oh look, it broke something, you dope," without me having to remember to manually run 41:25 all my tests. It's been an interesting learning experience for me, as sort of an old-school C programmer, to get more used to using containers and all the CI/CD things. That said, one of the challenges I've had using containers in the HPC space 41:45 is that, by and large, they're not portable across HPC systems. I, in general, can't take a container I build to run on a UNM HPC system and run it on a Sandia HPC system. I can't take one from a Sandia HPC system and run it on a Los Alamos HPC system. 42:09 And that's mainly because — well, the isolation promise of containers is basically broken in the HPC world, maybe because the contents of the container in the HPC world have to be tied to stuff outside the container on the HPC system, particularly MPI and the process orchestration / process management system. 42:36 Is there any movement and progress to fix that, so that I can take my Spack-built HPC container that I use for continuous integration on GitHub and also run it on Chicoma at Los Alamos and on an x86 system at Sandia? 43:08 So I'll say yes and no. Let me back up and say why. I like to draw the analogy that there's a really creepy monster that we put in this box we call containers on HPC, and we realized it had to breathe, so we started poking some holes in the box. That's essentially what we've done to keep containers performant in the context of HPC: 43:29 we poke these holes in to bring in host-level libraries or APIs that then make things run really efficiently. That worked in the moment, several years ago, but it's created this compatibility issue, which is significant, and we end up relying heavily on ABI compatibility of our MPI applications as a result. 43:48 And that's not a good thing to rely on, in my opinion; I think we want to push that reliance further down, closer to the hardware. So there's a lot of work — and some of this is actually not technical. A lot of it is working with our vendors to understand how we're deploying software now: explaining to a vendor like HPE that it's not good enough to simply hand us that black box that is their MPI. 44:13 They need to be able to let us put that in a container, either by building it in or by making it easy to install the library. That's been a long road and it's been a long time coming. It's getting better, and I think it will continue to.
So there is movement there. It's really that vendors aren't used to deploying their software in this context, and they've finally figured out that if they can't work that out, their software won't be used on these next-generation systems. 44:41 That's the key difference, so I think you're going to see things finally getting better there. I also want to touch on the notion of portability a little bit more. I want to focus on embracing CI/CD pipelines as much as possible in the context of containers, because portability comes at the build and manifest level — how I specify how to build my container — and less, in my opinion, 45:06 at the level of the resulting image. I don't care that much about making sure one resulting container can run on ten different machines if I can very easily, at the push of a button, in an automated fashion, generate ten different container images that each specifically target a hardware system, potentially from the same manifest. I think that's where portability really comes into its own in the context of containers: being able to quickly generate multiple different containerized versions that can be deployed on different systems. 45:39 So there's nuance in there, and there's more work that we could do — I would really like to hear interesting thoughts on that: hot-swapping of libraries, doing it anew at build time rather than at run time. I think that's something we can talk about, what we could do there to really push things forward. 45:59 I hope that's helpful; I'm curious if you have different thoughts on it. No, I certainly see where you're coming from, and I am using a lot of those same workarounds. I use a Spack container specification to generate my containers, and if I put the appropriate magic into the package YAML, it will use the different MPIs I need for the different systems I need to build on, from the same manifest. 46:28 So I'm doing that work — or, ideally, the systems are doing that work — to generate and create those Spack configurations. Those of us who have used Spack know how brittle it can be in some situations, but there are no silver bullets here, right? 46:52 You certainly will need to recompile for different architectures and different systems. But the ability to at least get a first run, even if it's not particularly performant, on a wide range of systems would be nice, as opposed to having to maintain a manifest that can successfully run on 20 different, mutually incompatible supercomputers. 47:20 Yeah, I'll just add my two cents. Two of the key challenges seem to be MPI and job wire-up, which are mostly related. On MPI, there's some hope: some people have come up with Open MPI configurations that run on many different systems, so that's one path, 47:46 but it seems kind of doomed to failure. There's also — I forget the name of it — an MPI ABI interposer layer that translates between ABIs; I think it might be called Wi4MPI. That seems like a promising approach for patching up these ABI differences between the stuff running in the container and the stuff running on the host — or rather the driver stack running on the host. 48:18 But I don't know if anyone's pushed hard on that, so that might be an interesting path to go down. On the wire-up issues,
your point in the chat about there not being a kind of portable orchestration layer — I think that's a good one. I don't know of any. We're doing things like using the resource manager to launch containers and then relying on the PMI in the container matching the PMI on the host, things like that, which makes the resource manager sort of the only orchestration engine we have, right? 48:53 And we don't even make that explicit or acknowledge it; it's just sort of how it is. Yeah, we need something better — I just don't know what that might be. So are we all-in on it, then? Really leaning into PMIx being the way in which we do this, potentially beyond just MPI, for other orchestrated 49:17 wire-up of distributed tasks? I think PMIx has utility beyond just MPI. So you could write something to translate, say, Docker stacks to PMIx — Docker stacks being one of the Docker orchestration systems. Okay. Yeah, there is some work happening in this area. There was this MPI Operator that existed for Kubernetes, for instance, and a lot of what's been talked about recently 49:49 is ripping that out entirely and redoing it almost as a PMIx-standard kind of mechanism. I'm not sure how that'll come about; there are some proposals and options. The other thing is, again, catching up the vendors: everybody needs to learn how to make their MPI utility or package interoperate with PMIx, 50:11 and that admittedly varies depending on your MPI and whether it happens to support all of this — and that happens at the speed of acquisition RFPs. Not even then, I was going to say; that's optimistic that it's that fast. "You must do this, vendor" does have some leverage, but then, you know, you're right. 50:42 I'm going to ask my question about portability — this was in the chat, by the way. Hello, and thanks for sharing all this, this is cool. I'm glad you guys are looking at these converged cloud/HPC-type workflows, because that's not really on my radar at all. I'm working on this sort of portability issue quite a lot, 51:08 and the discussion so far has been relevant to that. I've found that portability seems to work between microarchitectures if you are building MPI into the container, and then PMI — some version of PMI — and the scheduler are sort of the interface that matters for that portability. 51:31 So it's been something that I can sort of just negotiate and say, hey, would you make me a big container that has the appropriate MPI build, and then we've been moving those between, say, Broadwell machines, for instance, and it works. 51:52 But we've also seen things break when we build for, say, a Skylake architecture and take it back to a Broadwell: it'll segfault, it'll core dump, and it says there's a missing instruction, because the older chip just doesn't have the new stuff. 52:08 And so I was wondering: how much portability have you tested? Are you also seeing that microarchitecture stuff is portable? Are there functionality things that I need to be on the lookout for — what works and what breaks with portability for you two? 52:27 So, the first pragmatic answer is OpenBLAS. OpenBLAS breaks it: it goes and takes a look at build time and says, oh look, I have a Skylake,
so it'll add AVX-512 features, right? And then you go and take it to a Broadwell on CTS and it's going to break. 52:42 So trying to be explicit about the architectures that you're building for, directly in your Docker manifest, is good: specify things like "build for Broadwell," and you can use containerized build args for that — we have some examples we can probably share. I'd also argue that, you know, I've got a single container file — a Dockerfile — where I can just build five different kinds and say build for Ivy Bridge, build for Broadwell, build for Skylake, and I'm building for this GPU, right, 53:18 and have the build args almost be giant ifdefs. That's kind of the simple way to think about controlling some of that. So it's less that one container image can be ported everywhere — although you can do that: I can just say target equals, say, Ivy Bridge, and it should run on any modern Intel system 53:39 we have across the DOE, so there is utility there. But the hard part is that these challenges are often fought at the end rather than the beginning. So I want to encourage folks to really consider, at the onset, at build time when you're first building these things: what do you need the portability for, and how portable do you really need this to be? 54:03 From there, I think you can make some good judgments. But in a pragmatic sense, OpenBLAS drives me nuts, in that it goes and makes clever optimizations on your behalf unless you directly specify otherwise. Yeah, so it sounds like you're kind of on the same page: we're going to need one container per microarchitecture, at least with processor families like that. 54:30 My argument is that the developer should be making that decision, and we should be trying to empower developers to make that decision sooner in their process rather than at the end. It's okay if they want to build a run-anywhere container — we know how to do that — but it's going to come at a cost to performance, right? It's about making sure they're aware of that performance-portability continuum. Okay, yeah, that makes sense, thanks. And you might need one per system in the end, not just per microarchitecture, right? If you've got two systems and one has AMD 55:15 and one has NVIDIA accelerators, right there you're going to need different container builds. The other portability thing that's not solved is the thing Patrick mentioned: when we poke holes in containers to map things in, like GPU libraries or the local network stack, that's sort of a challenge we have to deal with. 55:40 There's this trade-off between portability and performance — there always is — so that's just something we have to learn how to deal with in the container world. That said, I think there is interesting work to be done in making these interposer layers, these portability layers, work with containers, especially for HPC. 56:04 So that might be an interesting project to work on as a student; it sounds like it's needed, and a lot of people would use it. 56:19 Well, I don't think we have more questions in the chat, so I want to thank you, Kevin and Andrew, for this presentation.
And I want to thank, as well, those who initiated this series of seminars, and we'll be in touch about possibly organizing future seminars. But thank you all for participating, and have a good summer. 56:53 Thanks a lot.