Ollama DeepSeek Radeon in 2025

Introduction

Today we’ve got a real treat: DeepSeek R1. DeepSeek recently came out with V3, which was well received: a very, very large model. This is the reasoning version, with a 128k context window. People are claiming it blows Claude out of the water at coding, locally hosted, and the best part is Ollama has got it ready for you.

Channel Introduction and Setup

If you’re new to this channel, take a minute to check out the channel history and subscribe while you’re down there. We put together the machine that we’ll be running this on today, and I’ve also put together various other machines along the way and tested tons of different GPUs.

Guide and Updates

We’ve covered builds all the way down to the $350 range, and even down to $150 for a very small, locally hosted, always-on AI server. The most recent guide I’ve got is an updated one with some corrections around the Proxmox VE (PVE) headers; you’ll want to follow along with that when it gets released, probably really soon.

Updating and Testing

And really, you want that always-on setup; that’s where you get so much of the value, in my opinion. You can follow along with the easy copy-and-paste commands; I’ve already made the changes to the copy-and-paste here. The video is lagging a little behind, but it’s definitely worth going through. If you followed the guide and you’re using Proxmox LXC and Docker like I showed you how to do, then you can update your Ollama by hitting update, and it’ll restart.
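If you’d rather script that update than click through a UI, here’s a minimal sketch of the same thing, assuming a hypothetical docker-compose.yml that defines the Ollama container as a service named ollama:

```python
import subprocess

# Pull the latest Ollama image and recreate the container.
# Assumes a docker-compose.yml with an "ollama" service (hypothetical name).
subprocess.run(["docker", "compose", "pull", "ollama"], check=True)
subprocess.run(["docker", "compose", "up", "-d", "ollama"], check=True)

# Confirm the new version once the container is back up.
subprocess.run(["docker", "exec", "ollama", "ollama", "--version"], check=True)
```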

Reasoning and Performance Testing

We’re going to run through the standard gamut of questions: see what kind of tokens per second we’re getting, and evaluate the reasoning capabilities against my standard fare of completely non-scientific, really just fun, more normal kinds of questions that I think so much of AI testing leaves on the table.
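For the tokens-per-second numbers, here’s a minimal sketch of how they can be read straight out of Ollama’s response stats with the official Python client (assumes the ollama package is installed and the deepseek-r1:14b tag has been pulled):

```python
import ollama

# Ask a question and read the generation stats Ollama returns.
resp = ollama.chat(
    model="deepseek-r1:14b",  # assumed tag
    messages=[{"role": "user", "content": "Write a haiku about GPUs."}],
)

# eval_count is tokens generated; eval_duration is in nanoseconds.
tokens_per_second = resp["eval_count"] / resp["eval_duration"] * 1e9
print(f"{tokens_per_second:.1f} tokens/s")
```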

Personal Reflection

Maybe that’s just me being a normie, and it’s definitely not me being a subject matter expert. I just always want to say that I’m here having fun, learning, and sharing with you, the audience, as I go along that path.

Model Performance and Expectations

I really do thank everybody who has subscribed recently; we’ve just crossed 30,000 subscribers. Insane, and it’s highly motivating me to up my game! Llama 3.3 is my current go-to, and it does a decent job on reasoning, but if I really wanted to have a conversation with an LLM about something right now, I’m going to QwQ.

Testing Future Models

Will this replace that? The 14B is on the way down now. I’ll do some testing to see what sizes work and what impact the full 131,072-token (128k) context window has. But just for testing, I think it’s a good idea with any reasoning model to make sure it has its full context window exposed to it.
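Ollama defaults to a much smaller context than the model supports, so the window has to be raised explicitly. A minimal sketch using the Python client’s options (the model tag is an assumption):

```python
import ollama

# Expose the full 128k (131072-token) context window.
# Ollama's default num_ctx is far smaller, so set it explicitly.
resp = ollama.chat(
    model="deepseek-r1:14b",  # assumed tag
    messages=[{"role": "user", "content": "Summarize the following document..."}],
    options={"num_ctx": 131072},
)
print(resp["message"]["content"])
```

Keep in mind that a larger num_ctx inflates the KV cache, which is where the VRAM and power pressure discussed below comes from.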

Power and Hardware Considerations

I definitely bounced around quite a few sizes to get that 128k context in, and ended up on the 14B here, which fits fully inside the VRAM. However, the power draw unexpectedly shot up by almost 50%, which I usually don’t see, and at one point the power on the unit very briefly blipped out.
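Since this is a Radeon box, one way to keep an eye on that draw is to poll rocm-smi while a generation runs; a rough sketch, assuming ROCm’s rocm-smi tool is installed and on PATH:

```python
import subprocess
import time

# Print AMD GPU power draw every 2 seconds (Ctrl-C to stop).
while True:
    subprocess.run(["rocm-smi", "--showpower"], check=True)
    time.sleep(2)
```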

Future Rig Improvements

So I’m thinking about adding a second power supply to the rig I’ve got now, and possibly reworking parts of the rig so I can also fit a couple more GPUs, because boy, VRAM is exactly what I think the future holds as the most important thing for local inference.

Performance Analysis

So, let’s give it a shot here, and wow, it is zippy. It is very, very zippy here. We’ll see what kind of tokens per second we’re generating. Hopefully, we get a quality answer.

Model Interaction and Customization

Also, I did augment this with two additional instructions, just because I wanted it to be less annoying. One of them, since it’s a reasoning model, asks it to fully review its code and correct any issues after it produces its first version. Hopefully we see it actually do that. Will it? That’s a good question.

Code Review and Feedback

The other one is something I’ve been needing to add for quite some time now, and that is asking it not to reference external assets: no bird.wav, jump.wav, or ingame.wav. (Both instructions are shown in the sketch below.)
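A minimal sketch of how those two instructions might be layered in as a system message via the Python client; the exact wording and model tag here are my own, not necessarily what was used in the video:

```python
import ollama

# Two house rules layered on top of the model via a system message.
SYSTEM = (
    "After you produce your first version of any code, fully review it "
    "and correct any issues. Do not reference external assets such as "
    "bird.wav, jump.wav, or ingame.wav; keep everything self-contained."
)

resp = ollama.chat(
    model="deepseek-r1:14b",  # assumed tag
    messages=[
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": "Write a simple Flappy Bird clone in Python."},
    ],
)
print(resp["message"]["content"])
```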

Issues with Code Review

So, hopefully, we see that this actually does go through. It did not rewrite the code, so it apparently skipped over that instruction; from what I’m reading here, it didn’t even acknowledge it. “Here’s a complete implementation.” This is bigger than any code block I’ve seen generated so far: 208 lines of code.

Model Comparison: Deep Seek vs. Claude

So, is this better than Claude right off the bat? Let me ask you, and I will look forward to reading those replies in the comments below.

Power Issues and Model Limitations

It snuck that one past me; my eyes didn’t catch it when I did a quick scan earlier. So, okay, 193 lines of code now, a drastically different size. We will find out.

Precision and Quality Evaluation

Okay, and it crashed. So, I’m going to give the 14B a fail on this one. Now, is it a precision issue we’ve got here, or a quality-of-the-model issue? Is it hype, or is it reality? Again, this is something I’m interested in from my own perspective, but let’s not dwell on it. Let’s move on to the next very logical, very reasoning-based, very ethics-challenging question.

Ethical Dilemmas: Armageddon with a Twist

So: there is an asteroid heading for Earth, imminent doom. We have three crews. We have asked for volunteers, and nobody has volunteered. We need a little bit more of that Bruce Willis spirit.

Mission Decision

The final decision is yes: it will send the people on the mission. This is, of course, a one-way mission, but there is no other alternative; those people would be lost anyway, along with everybody else on Earth. The greater imperative, in my opinion, does justify it: yes, send them on the mission.

Ethical Considerations

The greater good principle suggests that saving Earth justifies such measures.

Necessity of Enforcement

Without enforcement of compliance, the mission is doomed to fail due to the crew’s refusal to cooperate without duress.

Unaligned Models and Testing

I’ve also heard that this is a remarkably unaligned model, so that’s an interesting caveat that we definitely should consider.

Simple Task: Write a Sentence

Okay, so this was: write me one random sentence about a cat; tell me the number of words you wrote in that sentence; and then tell me the third letter in the second word of that sentence, and whether that letter is a vowel or a consonant.

Task Evaluation: Cat Sentence

It wrote “The curious cat explored every corner.” It did get the number of words right; the second word is “curious,” whose third letter is “r,” which is a consonant. So it got this right.
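For what it’s worth, the same check is trivial to score deterministically in Python:

```python
# Verify the cat-sentence answer deterministically.
sentence = "The curious cat explored every corner."
words = sentence.rstrip(".").split()

print(len(words))         # 6 words
second = words[1]         # "curious"
third_letter = second[2]  # "r"
print(third_letter, "vowel" if third_letter.lower() in "aeiou" else "consonant")
```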

Conversational Usability

I’ve got to say, this cannot be used very easily conversationally, in my opinion. Maybe it’s something about the system prompt that I need to adjust, but there is definitely something about reasoning models that makes them unusable for certain tasks. You’re not going to use this for a home-assistant-style voice interface, which is unfortunate, because it will spew all of its reasoning back at you.
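One workaround, assuming the model wraps its reasoning in <think>...</think> tags the way DeepSeek R1 builds generally do, is to strip that block before the reply reaches a voice interface; a minimal sketch:

```python
import re

def strip_reasoning(reply: str) -> str:
    """Remove <think>...</think> blocks so only the final answer is spoken."""
    return re.sub(r"<think>.*?</think>", "", reply, flags=re.DOTALL).strip()

raw = "<think>The user wants a greeting...</think>Hello! How can I help?"
print(strip_reasoning(raw))  # -> "Hello! How can I help?"
```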

Failures in Simple Tasks

So, I hope you’re catching on now. That one was a pass, but parsing peppermints? Fail.

Testing Advanced Tasks: Pi Decimals

The next task asks it to produce the first 100 decimals of pi. It got this one wrong.
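For a ground truth to grade against, pi to 100 decimal places is easy to generate locally; a sketch assuming the mpmath package is installed:

```python
from mpmath import mp

# 101 significant digits: 1 before the decimal point, 100 after.
mp.dps = 101
print(mp.pi)
```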

Model Performance Evaluation: Precision and Context

So, yeah, it’s sitting right around 80 GB of VRAM, and that is at a highly sacrificial 14B, apparently. That’s highly sacrificial, especially at Q4 size; presumably the KV cache for the full 128k context is what drives the memory that high.

SVG Generation Failure

So, I asked it to create a cartoon SVG of a cat or a human. It came up with this, but that doesn’t look like anything to me.
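If you want to eyeball these attempts yourself, here’s a small sketch that pulls the SVG markup out of the reply and writes it to a file you can open in a browser (the extraction regex is a guess at the typical output shape):

```python
import re
import ollama

resp = ollama.chat(
    model="deepseek-r1:14b",  # assumed tag
    messages=[{"role": "user", "content": "Create a cartoon SVG of a cat."}],
)

# Pull the first <svg>...</svg> block out of the reply, if present.
match = re.search(r"<svg.*?</svg>", resp["message"]["content"], flags=re.DOTALL)
if match:
    with open("cat.svg", "w") as f:
        f.write(match.group(0))
    print("Wrote cat.svg; open it in a browser.")
else:
    print("No SVG found in the reply.")
```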

Simple Calculations: Distance and Driver Speed

Next, we’re asking about two drivers who leave Austin, Texas, heading to Pensacola, Florida, traveling at different speeds.
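The exact figures aren’t shown on screen, so here’s the shape of the arithmetic with placeholder numbers of my own (roughly 450 miles between the cities, 60 and 75 mph):

```python
# Hypothetical numbers; the prompt's actual speeds aren't given in the video.
distance_miles = 450.0          # approximate Austin -> Pensacola drive
speed_a, speed_b = 60.0, 75.0   # placeholder speeds in mph

time_a = distance_miles / speed_a  # 7.5 hours
time_b = distance_miles / speed_b  # 6.0 hours
print(f"Driver B arrives {time_a - time_b:.1f} hours earlier.")
```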

Conclusion: Model Limitations and Future Prospects

So, we’ve seen some good deductive reasoning. Does this beat QwQ? I don’t feel like it does. My open question here: could I lower the context window size and increase the parameter count instead?

Closing Thoughts

I look forward to reading your feedback on it. Let me know your thoughts in the comments below.

 
