(00:00) High performance computing used to be the domain of supercomputer labs and deep-pocketed enterprises, but not anymore. Matthew Shaxted, co-founder and CEO of Parallel Works, is tearing down these walls, putting massive compute power into the hands of scientists, engineers, and the AI community without the headache of being an IT plumber. (00:22) In this episode, we dig into how their platform Activate is cutting cloud cost, escaping VMware lock-in, and redefining HPC for the AI era. This is episode 103 of Great Things with Great Tech with Parallel Works. Hey, Matthew. Welcome to episode 103 of Great Things with Great Tech. It's great to have you here. (00:56) Um, and before we get into all things Parallel, let's talk about yourself. Background in civil and structural engineering, so you're a smart guy for sure. Um, but tell us, you know, how you got started in this world and how you came to found this company. >> Great. Yeah, Anthony, good to be here. Thanks for having me. Uh, good question. (01:18) I started, actually, as you kind of said, as a practitioner in engineering companies. I was working for large architecture engineering companies that design big buildings across the globe. Uh, the Burj Khalifa, these types of things, and some other ones in a similar vein. >> And my job in these companies was to run computing simulations. (01:45) My job was to take different designs or iterations of either a building or a campus or a city and run them through, really, physics codes at the end of the day, which is kind of what high performance computing is all about. You're running simulations on physics to try to replicate some type of problem in the real world. (02:07) And what I would do is I would take computers that were in the organization that we had, and they were actually old render farms that I was able to kind of piece together. >> Okay. Yep. >> Uh, I think about like 30 of them or something is how I started.
This was >> What sort of time frame was this? >> This was like 15 years ago. (02:24) So >> Okay. >> When was that? That was right... right when I graduated. 2010. Yeah, 2010. >> Something like that. And I was basically, uh, you know, I put all these individual computers together. A problem, you know, came in, and I'd say, all right, I can use this particular simulation tool to try to solve maybe, like, the structural optimization. (02:47) How much, you know, can you minimize structural output to save on, uh, you know, volume of concrete or whatever it is. There's problems like that. Or, how can you position the facade to get the most sun hitting it so you can maximize solar output, for example. So it was problems like that, all different kinds. >> Pretty important stuff, right, for buildings. >> Well, everybody has different ways of doing this. This was, like, high performance design, they called it. So that was the world that I was in. It was always trying to, like, optimize for cost or, you know, some output. So basically, long story short, you start running these physics codes on a set of computers, and I was running kind of what they call, uh, Monte Carlo simulations, where you're testing different varieties of a solution space, and then you're trying to figure out which one gives you the optimal. (03:43) So, uh, you do that, you run a lot of different iterations. You know, 30 computers lets you run a certain number of iterations, and then you want to run more. So where do you go to run more? Uh, the US has, I think, about five leadership computing facilities or so. I actually have to update the exact number, but they have these leadership computing facilities where they build the large supercomputing systems across the country for researchers and practitioners. (04:12) And Argonne National Lab is one. Oak Ridge is one. PNNL, I believe, is one.
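As a loose illustration of the kind of Monte Carlo parameter sweep described above, here is a minimal sketch. The objective function, parameter names, and ranges are all invented for illustration; a real physics code would replace the toy `solar_exposure` function.

```python
import math
import random

def solar_exposure(tilt_deg, azimuth_deg):
    """Toy objective: a rough proxy for annual sun hitting a facade panel.
    (Hypothetical stand-in for a real physics simulation code.)"""
    tilt = math.radians(tilt_deg)
    azimuth = math.radians(azimuth_deg)
    # Favor roughly south-facing panels (azimuth ~ 180 deg) tilted ~ 35 deg.
    return math.cos(tilt - math.radians(35)) * max(0.0, -math.cos(azimuth))

def monte_carlo_search(n_iterations, seed=0):
    """Sample the design space at random and keep the best candidate.
    Each iteration is independent, so iterations can be farmed out in
    parallel across many machines (the '30 desktops' pattern)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_iterations):
        tilt = rng.uniform(0, 90)        # panel tilt in degrees
        azimuth = rng.uniform(0, 360)    # compass orientation in degrees
        score = solar_exposure(tilt, azimuth)
        if best is None or score > best[0]:
            best = (score, tilt, azimuth)
    return best

score, tilt, azimuth = monte_carlo_search(10_000)
print(f"best score {score:.3f} at tilt {tilt:.1f} deg, azimuth {azimuth:.1f} deg")
```

More compute simply means more iterations, which is why this style of workload scales naturally from a render farm to a leadership-class machine.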
There's several of them. >> And I went over to Argonne National Lab, because I live in Chicago and it's, you know, an hour's drive away. >> Okay. >> And that, you know, started my journey of getting into high performance computing. >> You go there, you kind of tour the big floor. You know, they're building-size computers, so you see racks and racks of these computers. >> And, uh, you know, I was able to meet my partner Mike Wilde, who was a principal investigator there for many years, (04:44) building a piece of technology that helps people like me, researchers and engineers, scale up their simulation codes on these big systems, and I started working on that. And I remember very clearly, you know, the first time running a large HPC job. I was in my office late at night, clicked a button, spun up 20,000 cores on one of their Cray machines. (05:06) >> These are some big systems. These are some proper supercomputers. >> Yeah, they're big. You know, they're building-size. They take up several floors. And, you know, you run more simulations than I had in my entire life at the time. (05:20) And I was like, "Wow, that's really powerful technology. How can we bring that to more organizations?" And that's kind of what started our thinking about, you know what, let's create a company about this together. Yeah. >> So, just going back before we talk about Parallel Works and the starting of the company, um, what got you into that computing sort of angle before? Was it something that, you know, just through college, through university, was natural to you, or? >> Yeah, that's a good question. (05:46) I'm thinking back. I, uh, I did a lot of programming in college. I went to Northwestern, did a lot of programming, uh, kind of more for fun type things. And actually, I remember pretty clearly, I went to a conference. It was a National Science Foundation, um, summit of some kind in Chicago.
And someone presented there who was my first boss, actually. (06:11) He presented this, uh, simulated city of Chicago, actually, is what he was presenting. >> Okay. >> And I remember looking at that when he was presenting, I was like, wow, that's so cool, you know, being able to use data and, you know, physics and simulations to inform improving spaces and improving, you know, environments. And I actually went up to him after, and that's how I got my first job out of college. (06:35) So it was kind of like that, and then that took me into this world of using physics codes. And those were just literally 30 desktops that were just sitting around, and nobody was really using them, essentially. And, you know, I was able to kind of take those and put Linux on all of them and use them as kind of an internal cluster for, you know, this type of computing simulation work. (07:51) >> Yeah. Excellent. So, okay, so let's talk about Parallel Works. So out of Argonne National Laboratory, you've met your partner there. So what's the first sort of spark, and what's the problem? I mean, we can... >> ...as GPUs and accelerators are really kind of finding their way into organizations, it's been kind of evolving into that world, because, you know, organizations you wouldn't think of as a typical kind of computing consumer are buying these types of systems and need to use them for (09:16) critical parts of their business. So it's the same type of goal. Yeah, because early on, like, we're talking of those time frames. Obviously, machine learning, I mean, everyone thinks AI is something... though, there's pockets of it. (10:21) You know, within an organization, kind of large or small, there's certain groups of users that need to use the computing environments for certain reasons. You know, there may be a team that needs to do what I was doing and simulate some things to kind of get an answer.
(10:37) There may be analysts that need to use, you know, Jupyter notebooks for number crunching and, you know, analysis on data they're bringing in... Same with batch scheduler systems and the HPC. They're all kind of existing in these pockets or silos. (11:44) And that brings a lot of complexity. And I always kind of say this: it's complexity for the infrastructure teams that are supporting the users, and the CIO, CTO levels, depending where they're rolling up into. And then it's complexity for the end users. They're having to kind of whiplash around into, like, oh, I'm in my terminal on a Slurm cluster now, versus, like, oh, now I'm in, you know... or Google. >> It really is a different paradigm on how they handle roles and permissions and (13:00) really allocation of resources, and, you know, billing is kind of different. And so everyone is different, and these groups say, okay, well, how do we do that? Do we need an Azure expert now, or a Google expert, or something? And >> we're trying to say you don't really need to do that, you know, when you use a control plane like what we have. And I didn't really introduce exactly what it is, but yeah, that's the difference. Yeah. >> So through these early years, founded in 2015, and then obviously, you know, working through up until, say, 2022, '23. Was it just that platform? Was it called anything specifically? Because now it's Activate, right, and we're going to talk about the new evolution of that later on as well. But what was it effectively before? Was it just the UI? >> When Activate... you know, even as a company, right, we were, you know, very, uh, lightly capitalized. Right. We took, you know, EFA, or, everyone has one: Azure had InfiniBand on the floor around that time, (15:48) they just started rolling that out. You could actually get those things to perform in a way that was similar in performance to these on-prem supercomputing systems.
Um, >> and that was kind of a turning point for us, because we started >> provisioning cloud resources at that time >> and trying to match performance to these on-prem systems, and naturally we were starting to run hybrid then. (16:11) So, but really... on-prem, and so they're being used because they're seen to be a little bit more (17:13) efficient, but that was still... wasn't there. >> Exactly. Yeah. We were helping a handful of organizations with their cloud programs. That's kind of what I say, like, we were helping them operationalize their cloud programs, which, what that means in my head, is they want to bring cloud (you know, and this is specifically kind of HPC cloud) to large users across the organization, and they want to do that... DoD contract is able to use that platform, and then, you know, we're rolling out other FedRAMP versions as we speak. (18:45) So those are our two managed SaaS environments. They're kind of the easy button. >> Yep. >> We're not a cloud reseller. We're just a software company. So you plug your own cloud credentials into these accounts. >> Yes. There's a little bit of a difference in the IL5 High platform, but generally you plug them in, and we're acting as an orchestrator inside of your own ac... do, like, allocation and quota enforcement at, you know, the scale of an organization. And then we run these workflows, where, you know, if you can run a workflow in our platform (which is an all-YAML-based workflow framework, kind of like GitHub Actions), it will run on any of the systems that you have connected into it. (20:24) So it's kind of a unifying place where, oh, great, this organization has HPC systems, it has OpenStack virtualization clusters, it has AWS and, you know, Google cloud, for example... that's kind of a near-term roadmap item, kind of toward the end of the year. >> Yeah, okay. And you talked about the DoD side of things as well. (21:41) That's obviously very important. The IL5 certification. Um, we understand exactly.
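For readers unfamiliar with the idea, a YAML-based, GitHub-Actions-style workflow definition generally looks something like the sketch below. The field names here are purely illustrative, not the actual Activate schema; the point is that one portable description can be dispatched to whatever system is connected.

```yaml
# Illustrative workflow definition. Field names are hypothetical,
# not the real Activate schema.
jobs:
  preprocess:
    steps:
      - run: python prepare_inputs.py
  simulate:
    needs: preprocess          # runs only after preprocess finishes
    # The same job description could be dispatched to a Slurm cluster,
    # an OpenStack pool, or a cloud resource, depending on what's attached.
    resources:
      cpus: 128
      walltime: "02:00:00"
    steps:
      - run: mpirun -n 128 ./physics_solver input.dat
```

The value of the abstraction is that the workflow stays the same while the execution target changes underneath it.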
Well, explain it, because some people might not know. I think I do, because I work within software within a company that's US-based, and I know the importance of being certified for US-specific DoD work. But maybe just explain why that is necessary for people that are outside... >> ...that our managed environment, our software running in it, everything that users can do inside of it, has to meet these 400 items, and then we have to document it and audit it and have them check it. That's what that means. (23:19) >> Yeah. And I think I went through a similar thing. Uh, we call it IRAP here in Australia. So for government contracts and whatever. So again, different sorts of levels. Um, and again, a whole bunch of controls which you have to adhere to. So I think every country has similar requirements. Right? So in this particular level, they call it controlled unclassified. So it's CUI, which is like, uh, ITAR, so export controlled, where it's US citizens and alliance partners, in some capacities, that are able to go into this environment and run with certain sets of data and software that are not available in the broader, you know, commercial ecosystem. (24:45) And there's, you know, I've got a whole bunch of examples of things you can kind of do in that environment that you can't do in the... A lot of things. And some domains, you know, like digital engineering is a good example, where they're very >> ingrained in the Ansys and Siemens of the world, where those (26:01) ISV software licenses become a big part of the expense in a computing environment. You know, very big. I'm not going to say a number, I have one in my head. And each one is kind of its own ecosystem, so there's other companies out there that have really focused a lot more in that type of world. We've been kind of... Lab on a large set of maybe GPU accelerators. (27:25) >> Okay. >> And they want to do that without... >> Interesting. What is a... because I've seen Jupyter Lab pop up there, and it's in a lot of your videos as well.
So, just for the people that don't know, what is Jupyter Lab? >> It's kind of one of the, I'd say, main flavors of... you know, where ML and AI researchers are actually building the models, >> and it gives you an interactive interface, you know, a user interface, that lets... RStudio or whatever these things are, to build their models. And we let them kind of send those tools out to the user base. >> Yeah. I just wanted to go back. (28:53) You mentioned abstracting schedulers, and I was just thinking about what that actually means. You know, I'm seeing those being... and you kind of mentioned it, sort of an old-school way of doing things versus a modern way. That's how I kind of picked it up. But what did you mean by, you know, abstracting the schedulers? >> ...they want to deploy their workloads, or they're doing development inside of that, uh, they need to interact with kubectl. You know, that's the interface you're using to interact with that cluster. In these HPC systems, a lot of people are running Slurm or PBS schedulers or LSF; they're different flavors of it. You go into a terminal, and (30:20) you have to write these commands to interact with the computing systems underneath it. So this is kind of what I mean by scheduler. >> When I say our... >> You mentioned customers who have moved from on-prem to the cloud in that period of time. Have you seen the pull back, that repatriation, happening over the last couple of years? Is that something that you're seeing? And obviously you're able (31:39) to support that, because it's almost like where you started. Are you seeing that as well? >> Yeah, absolutely. Well, yeah, absolutely, it is happening, right, for sure. And it's interesting to see, because you kind of started on-prem and sort of did... We're actually supporting customers now that are wanting to migrate their workloads back into an on-prem, you know, colo or data center, but they want to match the user experiences in the cloud they're running today
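To make "abstracting the scheduler" concrete, here's a minimal sketch (hypothetical code, not the Parallel Works implementation) of a single function that translates one abstract job request into either a Slurm or a Kubernetes submission, so the user never writes the native commands by hand:

```python
def submission_command(scheduler: str, job_name: str, cores: int, script: str) -> list:
    """Translate one abstract job request into a native submit command.
    A control plane does this mapping so end users never type sbatch or
    kubectl invocations themselves. (Hypothetical sketch.)"""
    if scheduler == "slurm":
        # Slurm: batch submission via sbatch with job name and task count
        return ["sbatch", f"--job-name={job_name}", f"--ntasks={cores}", script]
    if scheduler == "kubernetes":
        # Kubernetes: apply a Job manifest generated from the same request
        return ["kubectl", "apply", "-f", f"{job_name}-job.yaml"]
    raise ValueError(f"unknown scheduler: {scheduler}")

# The same request, two different backends:
print(submission_command("slurm", "sim42", 128, "run.sh"))
print(submission_command("kubernetes", "sim42", 128, "run.sh"))
```

A real system would also generate the Job manifest or batch script itself; the point is that the user-facing request stays identical while the backend changes.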
and still have the opportunity to burst out. So we kind of fit right into that world. Yeah. (33:07) >> And being that overlay sets you up perfectly for that. >> That's kind of a big purpose. It's kind of, you know, a group that's wanting to do that, you know, we can say, hey... >> Just real quickly, how did the name come to be? I always ask. >> It came up as, you know, we were a high performance computing, parallel processing >> software, and we were like, what's a good name for that? And we kind of landed on Parallel Works. And I remember we were looking at what domains were open, 10 years ago now, and that one kind of just stuck, and we ran with it. So it was that theme, workflows for parallel computing, and, you know, that parallel (34:35) works. >> All r... in our platform, where you can set up these, you know, allocations or resource quotas. (35:40) Those authenticated users are able to actually get a kubectl config file out, and all the authentication's taking place. Uh, we're allowing these infrastructure teams to actually put rates, dollar rates, you know, currency rates, on their CPU, RAM, GPU, and storage resources as part of the cluster, and they can charge back the resources across an organization, because we're seeing a demand for groups that... very powerful, you know, uh, platforms, and so just giving it to one seems like it's not the most efficient way to do it. >> Well, there's a place for that: when you (37:10) run a process into production, right? And now you need to scale out, you know, your workload across multiple >> GPUs or multiple nodes, then yes, you need the whole node. But when people are developing, that's not always the case. >> Interesting.
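A chargeback model like the one described can be as simple as multiplying metered usage by per-unit rates. The sketch below uses invented rates and resource names, purely for illustration, not actual Parallel Works pricing:

```python
# Toy chargeback calculation. Rates and usage numbers are invented
# for illustration, not actual Parallel Works pricing.
RATES = {
    "cpu_core_hours":    0.04,   # $ per core-hour
    "ram_gb_hours":      0.005,  # $ per GB-hour
    "gpu_hours":         2.50,   # $ per GPU-hour
    "storage_gb_months": 0.02,   # $ per GB-month
}

def chargeback(usage):
    """Sum metered usage times the configured rate for each resource."""
    return round(sum(RATES[resource] * amount
                     for resource, amount in usage.items()), 2)

team_usage = {"cpu_core_hours": 10_000, "gpu_hours": 40, "ram_gb_hours": 20_000}
print(chargeback(team_usage))  # 400 + 100 + 100 = 600.0
```

Attaching rates per resource class is what lets an infrastructure team show each group its share of a cluster's cost.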
>> These things are very expensive, and you know, >> uh, oftentimes... our existing infrastructure, and that's (38:18) kind of a, you know, big differentiator, I'd say, for us, where we've supported those other infrastructure types for a long time. >> You know, and now, great, Kubernetes can come along. You can slide it in exactly as the other ones, uh, work, user-experience-wise. >> And maybe a final question to lead into that. (38:36) So, how have you seen the profile of, I guess, a data scientist or the high performance compute user evolve over time, as a bit of a sort of... >> ...place their physics code models. (39:46) So it's the same: they need to be able to do the same type of virtual testing for physics, but now they're augmenting it with ML inference models instead of having to crunch the actual math numbers. So that's been happening a lot, you know. And then I'd say, as that world moves now to more, you know, AI-forward organizations, maybe groups you wouldn't conventionally think of as, you know, consuming computing for R&D, (40:11) um, they're buying these... (41:05) ...starting to get that hype cycle, you know, pumping a little bit. Totally. You know, I mean, this is where high performance computing lends itself directly into that. >> Yeah, I mean, we're going to keep evolving with the infrastructure, right? So, I mean, we're already talking to some quantum companies, you know, ones that make on-prem quantum systems and then also let you rent them, about folding that in as another, you know, again, infrastructure type for us, that... >> ...And for us, eventually, >> I see us becoming kind of this engine that helps organizations, and then the end users, >> place their computing tasks without having to, like, explicitly assign them to a place. It's like, where's the best place to run based on, you know, these policies or objectives, like cost or performance or power availability, or, you know, it needs this type of task, or, oh, this can run out at an edge.
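That kind of policy-driven placement can be pictured as scoring each candidate system against weighted objectives and picking the winner. The sketch below is a simplified illustration; the site names, metrics, and weights are all invented, and a real placement engine would be far richer:

```python
# Simplified policy-based placement. Sites, metrics, and weights are
# invented for illustration, not a real scheduling policy.
SITES = {
    "onprem-hpc": {"cost": 0.2, "performance": 0.9, "power_available": 0.3},
    "cloud-gpu":  {"cost": 0.8, "performance": 0.8, "power_available": 0.9},
    "edge-node":  {"cost": 0.4, "performance": 0.3, "power_available": 0.7},
}

def best_site(weights):
    """Pick the site with the highest weighted score.
    'cost' is treated as a penalty (a higher cost metric is worse)."""
    def score(metrics):
        return (weights.get("performance", 0) * metrics["performance"]
                + weights.get("power", 0) * metrics["power_available"]
                - weights.get("cost", 0) * metrics["cost"])
    return max(SITES, key=lambda name: score(SITES[name]))

# A cost-sensitive batch job lands on the cheap on-prem system...
print(best_site({"cost": 1.0, "performance": 0.5}))   # onprem-hpc
# ...while a performance-critical job picks the cloud GPUs.
print(best_site({"performance": 1.0, "power": 0.5}))  # cloud-gpu
```

The user only states objectives; the engine decides where the task actually runs.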
(42:40) I want our system to be the type of thing that helps, you know, organizations deal with... Spread the word, and if you feel like it, drop a review. Thanks for joining us, and we'll see you next time on Great Things with Great Tech.