
Local LLMs Are All the Rage: DeepSeek-V3

Subhajeet Dey
January 17, 2025

DeepSeek has made a significant advance with the introduction of DeepSeek-V3, a model that surpasses its predecessor in coding, mathematical reasoning, and natural language processing.

What Is DeepSeek-V3?

DeepSeek V3 is a mixture-of-experts (MoE) language model with 671 billion parameters, 37 billion of which are activated per token.

It is trained on 14.8 trillion high-quality tokens and excels in various tasks, including code generation and analysis. The model architecture incorporates innovations like multi-head latent attention (MLA) and an auxiliary-loss-free strategy for load balancing, ensuring efficient inference and cost-effective training.

The multi-head latent attention (MLA) mechanism compresses the key-value cache into a low-rank latent representation, reducing memory usage and improving inference efficiency. The DeepSeekMoE architecture employs a mixture-of-experts approach, activating only a small subset of experts per token, to optimize training cost and performance.

Source: deepseek.com
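To make the mixture-of-experts idea concrete, here is a minimal, illustrative sketch of top-k expert routing in PyTorch. The expert count, top-k value, and layer sizes are made-up toy numbers, and this is not DeepSeek's actual implementation:

# Toy sketch of top-k expert routing, the core idea behind
# mixture-of-experts layers like DeepSeekMoE. All sizes are made up.
import torch
import torch.nn as nn

class ToyMoELayer(nn.Module):
    def __init__(self, dim=64, n_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))
        self.router = nn.Linear(dim, n_experts)  # scores each expert per token
        self.top_k = top_k

    def forward(self, x):  # x: (tokens, dim)
        scores = self.router(x).softmax(dim=-1)          # routing probabilities
        weights, idx = scores.topk(self.top_k, dim=-1)   # top-k experts per token
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                    # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out  # only top_k of n_experts ran for each token

layer = ToyMoELayer()
print(layer(torch.randn(5, 64)).shape)  # torch.Size([5, 64])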

An auxiliary-loss-free load balancing strategy distributes computational load evenly without relying on auxiliary loss functions, enhancing training stability. Additionally, a multi-token prediction objective allows the model to predict multiple tokens simultaneously, boosting performance and enabling speculative decoding for faster inference.
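The multi-token prediction objective can likewise be sketched in a few lines. The toy sizes and the simple per-depth linear heads below are assumptions for illustration, not DeepSeek-V3's actual MTP modules:

# Toy sketch of a multi-token prediction (MTP) objective: at each
# position the model is trained to predict the next several tokens,
# not just the next one. All sizes here are made-up toy numbers.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, dim, depth = 100, 32, 2             # toy sizes
trunk = nn.Embedding(vocab, dim)           # stand-in for the transformer trunk
heads = nn.ModuleList(nn.Linear(dim, vocab) for _ in range(depth))

tokens = torch.randint(0, vocab, (1, 16))  # a toy token sequence
h = trunk(tokens)

loss = 0.0
for d, head in enumerate(heads, start=1):
    logits = head(h[:, :-d])               # predict the token d steps ahead
    target = tokens[:, d:]                 # the token d positions later
    loss = loss + F.cross_entropy(logits.reshape(-1, vocab), target.reshape(-1))
loss = loss / depth
print(loss.item())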

Some of the reasons that make DeepSeek-V3 particularly exciting for me are:

  • It offers advanced reasoning and understanding, making it suitable for complex tasks like code completion and analysis.
  • With a processing speed of 60 tokens per second, DeepSeek-V3 is three times faster than its predecessor, DeepSeek-V2.
  • Both the model and its accompanying research papers are fully open-source, promoting transparency and community collaboration.

How to Connect to the DeepSeek-V3 API

To integrate DeepSeek V3 into our application, we need to set up the API key. Follow the steps below to access your API key:

1. Go to DeepSeek.com and click on “Access API”.

Source: Deepseek.com

2. Sign up on DeepSeek’s API platform.

3. Click on “Top up” and add the required amount to your account. At the time of writing, DeepSeek API pricing is as follows (see the quick cost estimate after these steps):

  • Input (cache miss): $0.14/M tokens
  • Input (cache hit): $0.014/M tokens
  • Output: $0.28/M tokens

4. Navigate to the API Keys tab on the left side and click on “Create new API key.” Finally, set a name for the API key and copy it for future use.
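To get a feel for what these prices mean in practice, here is a quick back-of-the-envelope estimate; the token counts are made-up example numbers:

# Rough cost estimate using the prices above (token counts are examples).
PRICE_IN_MISS = 0.14 / 1_000_000   # $ per input token (cache miss)
PRICE_IN_HIT = 0.014 / 1_000_000   # $ per input token (cache hit)
PRICE_OUT = 0.28 / 1_000_000       # $ per output token

input_tokens, output_tokens = 2_000, 500   # a hypothetical request
cost = input_tokens * PRICE_IN_MISS + output_tokens * PRICE_OUT
print(f"${cost:.6f}")  # ≈ $0.000420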

How to Use the DeepSeek API

import requests
import json

# Replace 'YOUR_API_KEY' with your actual DeepSeek API key
API_KEY = 'YOUR_API_KEY'
API_URL = 'https://api.deepseek.com/chat/completions'

HEADERS = {
    'Content-Type': 'application/json',
    'Authorization': f'Bearer {API_KEY}'
}

def get_deepseek_response(prompt):
    """Fetches a response from the DeepSeek API based on the given prompt."""
    data = {
        'model': 'deepseek-chat',  # Specifies the DeepSeek V3 model
        'messages': [
            {'role': 'system', 'content': 'You are a helpful code reviewer.'},
            {'role': 'user', 'content': prompt}
        ],
        'stream': False  # Set to True for streaming responses
    }

    response = requests.post(API_URL, headers=HEADERS, data=json.dumps(data))

    if response.status_code == 200:
        result = response.json()
        return result['choices'][0]['message']['content'].strip()
    else:
        raise Exception(f"Error {response.status_code}: {response.text}")
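With the helper in place, calling it looks like this (assuming your key is set and your account has credit):

# Example usage of the helper defined above.
if __name__ == '__main__':
    answer = get_deepseek_response(
        'Review this Python function for bugs:\n\ndef add(a, b):\n    return a + b'
    )
    print(answer)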

Let’s break down the code above in more detail:

The requests and json libraries are imported to make HTTP POST requests to the DeepSeek API and to encode the request payload as JSON, respectively.

We then set up the API key and the base URL, which points to the DeepSeek API endpoint for chat completions.

The get_deepseek_response function sends a user prompt to the API and retrieves the response.

We construct the request payload with the specified model, message history, and streaming preference, then send a POST request to the API endpoint with the appropriate headers and JSON payload. A status code of 200 indicates success, and the assistant's reply is parsed and returned; otherwise, an exception is raised with the error details.
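If you set 'stream': True instead, the API sends the reply incrementally. Here is a sketch of a streaming variant, reusing API_URL and HEADERS from the snippet above and assuming DeepSeek's streaming follows the OpenAI-style server-sent-events format ("data: {...}" lines ending with "data: [DONE]"):

def stream_deepseek_response(prompt):
    """Streams a response from the DeepSeek API, printing tokens as they arrive."""
    data = {
        'model': 'deepseek-chat',
        'messages': [{'role': 'user', 'content': prompt}],
        'stream': True  # ask for an incremental, server-sent-events response
    }
    with requests.post(API_URL, headers=HEADERS, json=data, stream=True) as response:
        response.raise_for_status()
        for line in response.iter_lines():
            if not line or not line.startswith(b'data: '):
                continue  # skip keep-alives and blank lines
            payload = line[len(b'data: '):]
            if payload == b'[DONE]':
                break  # end-of-stream sentinel
            chunk = json.loads(payload)
            delta = chunk['choices'][0]['delta'].get('content', '')
            print(delta, end='', flush=True)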

Apart from using the API, you can also interact with DeepSeek-V3 directly through its chat interface, much like ChatGPT.

DeepSeek-V3 is arguably the most capable open LLM available today, and its API is cheaper than GPT-4o, Claude Sonnet, and even self-hosting Llama 3.3.