Hello World | Joe Ton

Human contribution

Speaking the full draft out loud

AI assistance

Speech-to-text dictation (Grok)

Ramble

Hello there. This is my first test run with speech-to-text, and I'm going to explain things as best I can in an unedited way. One of the things I'm trying to do with this website is build a transparent structure for exploring AI systems performance engineering. What I mean by that is: can I take a relatively new paper, implement it on real hardware, collect benchmarks and findings, deepen my knowledge of the vertical stack, and do all of it transparently enough to be useful to other people? So this isn't just rambling, even though it will sound like rambling. I'm trying to offer value to readers, or to anyone willing to listen to my thought process. Hopefully it will inspire more engineers to pursue AI systems performance engineering, and it will also create a feedback loop that improves my own craftsmanship.
This is a relatively hard field to get into, and I can't see a clear roadmap into AI systems performance engineering: the kind of role that deals with the hardware, the software, the algorithms, and the complexity of scaling and optimizing an AI ecosystem like AMD's or NVIDIA's. As the landscape looks to me right now, I'm probably going to follow more of an AMD roadmap. I don't know whether that will get me anywhere, but the nice thing about the AMD ecosystem is that it's open source right now, and it seems likely to stay that way for a while. NVIDIA, of course, is not open source, but it sounds like they will invest a substantial amount of money, to the tune of billions, to get a version of CUDA into open source. I don't know how long that will take, but I imagine they'll be very careful with harnesses to keep AI slop out of that codebase.
Speaking of AI slop, one of the things I really want to prevent is using so much AI that it does my thinking for me. AI obviously has trade-offs, and one thing I dislike and fear is brain rot. For anyone who doesn't know, brain rot is what happens when you lean on AI to outsource your thinking, your talking, and your understanding, to the eventual detriment of your own skills. Coding is a good example, and I've already experienced this myself: when I overuse AI for coding, my code-reading skills develop less. If my goal is to understand, I need to be able to read. Sometimes the AI-generated change is so many lines long that it takes another AI to go through it and make sure I'm not doing anything wrong, and I get a horrible feeling that I'm introducing code that isn't useful or that is hallucinated. The problem isn't finding a needle in a haystack. It's that the haystack becomes a pile of needles.
Maybe that's the trade-off you have to be careful with, or simply accept: AI coding can increase your speed substantially, maybe by 20 to 50 percent, but according to a lot of the articles and so-called research I've read, you may also be introducing 60 to 100 percent more bugs. At first glance that looks like a horrible investment. But in certain cases, assuming you do some testing and have guardrails and a harness in place, it actually isn't that bad. And if AI coding keeps improving, hallucinations may be reduced, especially with larger context windows and more or faster HBM for KV caching.
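To make that trade-off concrete, here's a back-of-the-envelope model. Every input is a hypothetical placeholder, not a measurement: a baseline task takes 10 hours and ships with 5 bugs, and the two AI scenarios use the ends of the 20 to 50 percent speedup and 60 to 100 percent extra-bug ranges from the paragraph above. The function names are my own, and the model deliberately ignores review time, context switching, and bug severity. What it shows is that the answer hinges on how expensive each bug is to fix, which is exactly what testing and guardrails change.

```python
# Toy model of the AI-coding trade-off. All inputs are hypothetical placeholders.


def total_hours(coding_hours, bugs, hours_per_bug, speedup=0.0, extra_bug_rate=0.0):
    """Hours to write the code plus hours to fix the bugs it ships with.

    speedup=0.5 means coding runs 1.5x as fast; extra_bug_rate=0.6 means 60% more bugs.
    """
    writing = coding_hours / (1.0 + speedup)
    fixing = bugs * (1.0 + extra_bug_rate) * hours_per_bug
    return writing + fixing


def break_even_hours_per_bug(coding_hours, bugs, speedup, extra_bug_rate):
    """Per-bug fix cost at which AI-assisted and manual coding take equal time."""
    time_saved = coding_hours - coding_hours / (1.0 + speedup)
    extra_bugs = bugs * extra_bug_rate
    return time_saved / extra_bugs


if __name__ == "__main__":
    CODING_HOURS, BUGS = 10.0, 5  # hypothetical baseline task
    # (speedup, extra_bug_rate) at the two ends of the quoted ranges
    scenarios = {"pessimistic": (0.2, 1.0), "optimistic": (0.5, 0.6)}
    for fix_cost in (0.1, 2.0):  # bug caught by a test vs. bug that escapes
        manual = total_hours(CODING_HOURS, BUGS, fix_cost)
        print(f"fix cost {fix_cost} h/bug: manual {manual:.1f} h")
        for name, (speedup, extra) in scenarios.items():
            ai = total_hours(CODING_HOURS, BUGS, fix_cost, speedup, extra)
            print(f"  AI ({name}): {ai:.1f} h")
    for name, (speedup, extra) in scenarios.items():
        be = break_even_hours_per_bug(CODING_HOURS, BUGS, speedup, extra)
        print(f"break-even fix cost ({name}): {be:.2f} h/bug")
```

With these made-up inputs, AI assistance wins when a bug costs 0.1 hours to fix (9.3 or 7.5 hours versus 10.5 manual) and loses when it costs 2 hours (28.3 or 22.7 versus 20.0). The break-even sits between roughly 0.33 and 1.11 hours per bug, so guardrails that catch bugs early decide which side of the line you land on.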
Maybe there's a future where the abstraction we work in is pure language instead of code. I don't know whether that will ever come true, because I think a lot about hallucination, and about whether I'll still want to understand the output and make sure it's heading in the right technical direction. A piece of code isn't just a translation from human language to machine code. It captures the trade-offs chosen for that particular software solution, trade-offs that only exist relative to the intent I had in mind. If that's true, what happens when I suddenly change my mind, or when the requirements, which often do change over time, become dynamic? I'm not sure what the solution is, or whether it makes sense to rely on a large context window, apply those changes dynamically, and double-check as we go. I don't want to harp on something that plenty of software engineers and AI specialists already talk about. I just want to make sure I'm aware of that trade-off. After all, there are no solutions, only trade-offs.
With that out of the way, for now I'm going to focus mostly on the AMD ecosystem. I'm going to get into ROCm and probably contribute a lot to open source. I'll probably take a look at vLLM, MLIR, and Chris Lattner's work, plus some other things, and make sure I can contribute in a meaningful way that actually moves the needle. The first contribution I'm considering is implementing Attention Matching, a paper that came out in February of this year and really caught my attention.
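Before implementing any KV compression method, one thing I'd want in place is a way to score how closely a compacted cache reproduces the full cache's attention outputs. Below is a toy, pure-Python sketch of that check for a single attention head with random vectors. Loud assumptions: the compaction here is a naive eviction baseline I swapped in for illustration (keep the keys that receive the most attention), not the Attention Matching algorithm, and every function name is hypothetical. The useful part is the error metric, which any compaction method could be scored against.

```python
import math
import random


def softmax_weights(q, keys):
    """Scaled dot-product attention weights of one query over a list of keys."""
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, keys, values):
    """Attention output: the weighted sum of value vectors."""
    w = softmax_weights(q, keys)
    return [sum(wi * v[j] for wi, v in zip(w, values)) for j in range(len(values[0]))]


def keep_top_m(queries, keys, values, m):
    """Baseline compaction (NOT the paper's method): keep the m keys that receive
    the most total attention from the reference queries, with their original values."""
    mass = [0.0] * len(keys)
    for q in queries:
        for i, w in enumerate(softmax_weights(q, keys)):
            mass[i] += w
    idx = sorted(range(len(keys)), key=lambda i: mass[i], reverse=True)[:m]
    return [keys[i] for i in idx], [values[i] for i in idx]


def output_error(queries, full_kv, compact_kv):
    """Mean relative L2 error between full-cache and compacted-cache attention outputs."""
    errs = []
    for q in queries:
        a = attention(q, *full_kv)
        b = attention(q, *compact_kv)
        diff = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
        norm = math.sqrt(sum(x * x for x in a)) or 1.0
        errs.append(diff / norm)
    return sum(errs) / len(errs)


def random_vectors(rng, count, dim):
    """Standard-normal toy vectors standing in for real keys, values, and queries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]


if __name__ == "__main__":
    rng = random.Random(0)
    n, d = 64, 16  # toy cache: 64 tokens, head dimension 16
    keys, values = random_vectors(rng, n, d), random_vectors(rng, n, d)
    queries = random_vectors(rng, 8, d)
    for m in (64, 32, 8):
        ck, cv = keep_top_m(queries, keys, values, m)
        err = output_error(queries, (keys, values), (ck, cv))
        print(f"keep {m:2d}/{n} keys -> mean relative error {err:.3f}")
```

Keeping all 64 keys should give an error of essentially zero, and the error should generally grow as fewer keys survive. Because the same queries both pick the keys and score them, this is an optimistic estimate; a real harness would score on held-out queries.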
Although plenty of benchmarks have already been run on the H100 and H200, I haven't seen any on AMD hardware. That makes me wonder whether there's a reason. Probably it just comes down to what people default to, which is NVIDIA hardware. If that's true, I have an opportunity to invest in AMD hardware, test this properly, and improve my own performance engineering skills. So the first step is to really understand the AMD hardware, and also the paper itself. My understanding is that the paper essentially turns a book into a CliffsNotes version: it compresses the KV tensors held in HBM. KV compression has been done before in some ways, but their approach appears to be much more effective, something like 50 times. I need to dig into this further and make sure I understand the paper before committing a significant amount of time to it.
I also want to make sure I choose the right hardware. Right now I'm leaning toward the AMD MI300 series rather than the MI350 or MI355, since it should cost a lot less to explore. I'm willing to spend maybe 100 to 300 dollars a month on this, and the setup process doesn't seem significant. I can also sign up for the AMD developer program I heard about; I think they give something like $100 in credit to start, to encourage developers to build on AMD. From there, I'll find other research papers, implement them over time, and make sure I'm getting feedback, probably from the X community, LinkedIn posts, or something like that, and then scale from there. So that's my game plan right now: follow the AMD ecosystem because it's open source.
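To see why compressing KV tensors in HBM matters, it helps to work out how much memory the KV cache actually takes. This is a rough sketch under stated assumptions: the model shape (32 layers, 8 KV heads, head dimension 128, 16-bit values) is a hypothetical 8B-class configuration with grouped-query attention, not one taken from the paper; the 192 GB figure is the MI300X's advertised HBM3 capacity, and model weights and activations are ignored; and the "~50x" is treated as a plain reduction in cache size, which is my simplification rather than the paper's exact metric. The function names are my own.

```python
# Back-of-the-envelope KV cache sizing. The model shape below is a hypothetical
# 8B-class configuration, not taken from the Attention Matching paper.


def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch=1, bytes_per_elem=2):
    """Bytes needed to hold keys AND values (hence the factor 2) for every layer."""
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem


def sequences_that_fit(hbm_bytes, per_sequence_bytes):
    """How many full-length sequences fit if HBM held nothing but KV cache."""
    return int(hbm_bytes // per_sequence_bytes)


if __name__ == "__main__":
    GiB = 1024 ** 3
    HBM_BYTES = 192 * 10 ** 9  # MI300X is sold with 192 GB of HBM3; weights ignored here
    SHAPE = dict(layers=32, kv_heads=8, head_dim=128)  # hypothetical model shape
    COMPACTION = 50  # treat "~50x" as a straight size reduction (simplification)

    per_token = kv_cache_bytes(seq_len=1, **SHAPE)
    per_seq = kv_cache_bytes(seq_len=128 * 1024, **SHAPE)
    print(f"per token: {per_token / 1024:.0f} KiB")
    print(f"128K-token sequence: {per_seq / GiB:.1f} GiB")
    print(f"sequences that fit: {sequences_that_fit(HBM_BYTES, per_seq)}")
    compacted = per_seq / COMPACTION
    print(f"after {COMPACTION}x compaction: {compacted / GiB:.2f} GiB, "
          f"{sequences_that_fit(HBM_BYTES, compacted)} sequences fit")
```

Under these assumptions the script reports 128 KiB per token and 16.0 GiB for a single 128K-token sequence, so only 11 such sequences fit in 192 GB. At 50x compaction each sequence drops to 0.32 GiB and 558 fit, which is why this kind of result is interesting on memory-rich hardware like the MI300 series.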
I can also contribute to ROCm, in hopes of finding some optimization paths. I can look into Composable Kernel as well, and at the compiler; there might be some opportunities in MLIR. I need to take a look at Chris Lattner's work, and I recently saw something AMD-related too, an AMD Dev Day or something like that, so I need to look into that as well. Anyway, this is my first post, and I'm pretty sure future posts will be much more polished. For now, I'll review this one, diagnose what I need to improve, and probably do some human editing. I'm currently using Grok's speech-to-text dictation feature. Maybe I should use something else, but for now I'll use this and see where it takes me. Wish me luck.