
Double-Consumed Streaming: Debugging a NewAPI v1.0.0-rc.10 MiniMax Proxy Bug

Published: 2026-05-31
NewAPI OneAPI MiniMax Streaming AI Agent Debugging OpenAI Compatible API Engineering

TL;DR

An AI Agent was producing duplicated replies in DingTalk — not two separate messages, but a single message whose content appeared twice. The root cause traced back to a streaming bug in NewAPI (the QuantumNous fork of OneAPI) v1.0.0-rc.10: when proxying MiniMax through the OpenAI Chat Completions protocol, the finish chunk contained both delta.content and message.content with identical values. The Agent’s stream handler consumed both as “visible text,” concatenating the content twice. The fix was straightforward: switch the OpenClaw provider protocol from openai-completions to anthropic-messages, which OneAPI natively supports and where streaming behaves correctly.

No internal IPs, tokens, model IDs, or private paths appear in this article. All configuration snippets have been sanitized.

Figure 1: Self-made analysis diagram showing the complete request flow from user to duplicate reply, with the streaming chunk that causes the issue.

1. Background: When AI Agents Meet IM, Every Anomaly Becomes Visible

Connecting AI Agents to instant messaging tools — DingTalk, WeChat, Lark, Slack — has become a common practice. The benefits over terminal-based interaction are obvious: no need to remember dashboard URLs or SSH into a server. Just send a message, and the Agent executes scripts, queries status, analyzes data, or summarizes documents.

But this convenience also amplifies problems. Extra output in a terminal is mildly annoying; the same output in DingTalk or WeChat becomes a visible defect. More importantly, an IM integration typically involves a long chain: message receipt → deduplication → session routing → model selection → Agent execution → streaming assembly → text normalization → send API. Any layer in this chain can introduce issues, and the user only sees the final defective message.

In this case, the user chatted with an OpenClaw Agent via DingTalk. Every query produced a reply with duplicated content — not two messages, but one message with two copies of the same text. This distinction is critical: it determines whether debugging starts at the send layer or the model layer.

2. Symptoms: One Message, Two Copies

The problem can be stated in one line:

User says: hi
Expected: Hello! How can I help you today?
Actual: Hello! How can I help you today? Hello! How can I help you today?

Evidence from OpenClaw session logs confirmed this. With the custom/DIY-123 model (routed through the OneAPI gateway), the Agent recorded:

{
  "provider": "custom",
  "model": "DIY-123",
  "finalText": "Hello! How can I help you today? Hello! How can I help you today?"
}

But with a native MiniMax provider, the same prompt produced clean output:

{
  "provider": "minimax",
  "model": "MiniMax-M2.7",
  "finalText": "Hello! How can I help you today?"
}

The comparison was unambiguous: the same model produced different output depending on the provider protocol path. The compatible gateway path generated duplicates; the native path was clean.

Figure 2: Same prompt, different provider. The red path (openai-completions protocol) produces duplicates; the green path (anthropic-messages) is clean.

3. Debugging: Backward Tracing, Layer by Layer

My approach was to avoid assumptions and instead use minimal reproducible samples to eliminate layers one at a time — essentially a backward trace from the visible symptom.

3.1 Layer 1: Was the DingTalk Send Layer Sending Twice?

Many IM integration issues start at the send layer. IM platforms typically have retry, callback, ack, and deduplication mechanisms. If the send API was called twice, the fix would be in the send layer.

But the key clue was: the user saw one message with duplicated content, not two messages. This strongly suggested the send layer was called once with already-duplicated text.

I verified by examining the DingTalk plugin’s send path. The send layer constructs a single text item — it doesn’t naturally loop. The real question was whether the Agent’s final output text was already duplicated.

3.2 Layer 2: Agent Execution Trace

OpenClaw session logs record key fields for every turn, including provider, model, and finalText (see the excerpts in Section 2).

The logs showed the custom/DIY-123 provider’s finalText already contained duplicated content. This proved the duplication happened before the DingTalk send layer. The send layer just faithfully delivered already-corrupted text.
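Checking finalText by eye works for a short greeting, but longer replies are easier to check mechanically. The following is a minimal sketch, not part of OpenClaw: the helper name doubled_half is hypothetical, and it only recognizes the exact shape seen here, where the whole text is one string repeated twice, optionally separated by whitespace.

```python
from typing import Optional


def doubled_half(text: str) -> Optional[str]:
    """Return the repeated half if `text` is one string written twice, else None.

    Covers direct concatenation ("abab") and a whitespace-separated
    repeat ("ab ab"), which is the shape of the duplicated finalText.
    """
    stripped = text.strip()
    if len(stripped) < 2:
        return None
    # Cut at the midpoint; any whitespace separator straddles this point,
    # so stripping both sides removes it.
    mid = len(stripped) // 2
    left, right = stripped[:mid].strip(), stripped[mid:].strip()
    if left and left == right:
        return left
    return None


doubled = "Hello! How can I help you today? Hello! How can I help you today?"
print(doubled_half(doubled))
print(doubled_half("Hello! How can I help you today?"))
```

A non-None result on finalText means the text was already doubled before it reached the send layer.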

Now the focus shifted to the model routing and provider layer. Why did the OneAPI gateway path produce duplicated content?

3.3 Layer 3: Direct OneAPI Streaming Test

This was the turning point. Instead of guessing at the code level, I sent a direct curl request to the OneAPI endpoint to observe the raw streaming output:

curl -s -X POST "https://<ONEAPI_ENDPOINT>/v1/chat/completions" \
  -H "Authorization: Bearer <TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{"model":"<MODEL_ID>","messages":[{"role":"user","content":"hi"}],"stream":true,"max_tokens":500}' \
  | python3 -c '
import sys, json
# Print only the fields that matter from each SSE data line.
for line in sys.stdin:
    line = line.strip()
    if not line or line == "data: [DONE]":
        continue
    if not line.startswith("data: "):
        continue
    d = json.loads(line[len("data: "):])
    for c in d.get("choices") or []:
        # "or {}" also covers explicit JSON nulls.
        delta = c.get("delta") or {}
        msg = c.get("message") or {}
        finish = c.get("finish_reason")
        if delta.get("content"):
            print("DELTA content:", repr(delta["content"]))
        if finish:
            if msg.get("content"):
                print("FINAL message.content:", repr(msg["content"]))
            print("FINISH:", finish)
'

Output:

DELTA content: 'Hello! How can I help you today?'
FINISH: stop
FINAL message.content: 'Hello! How can I help you today?'
FINISH: stop

Bug confirmed. The same text appeared twice: once via delta.content (normal streaming event) and once via message.content inside the finish chunk (redundant).

Figure 3: Direct curl verification — the finish chunk contains both delta.content and message.content with identical text.

4. Root Cause: OneAPI’s Finish Chunk Includes Redundant message.content

4.1 OpenAI Chat Completions Protocol Standard

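In the OpenAI Chat Completions streaming protocol, every chunk carries its incremental text in the delta field of each choice. OpenAI's own chat.completion.chunk objects do not include a message field at all; the final chunk (where finish_reason is not null) normally has an empty delta. Some OpenAI-compatible gateways add message to the final chunk as an extension, but a correct implementation never sets both delta.content and message.content to non-empty values simultaneously.

A finish chunk from a gateway that includes message looks like: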

{
  "choices": [{
    "delta": {},
    "message": {"content": "Hello!"},
    "finish_reason": "stop"
  }]
}

Or simply, with no message field at all:

{
  "choices": [{
    "delta": {},
    "finish_reason": "stop"
  }]
}
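The rule that a chunk should never carry non-empty text in both fields can be checked mechanically. This is a small sketch under my own naming (double_content_choices is hypothetical, not an OpenClaw or OneAPI function); it inspects one parsed chunk and reports which choices break the rule.

```python
from typing import List


def double_content_choices(chunk: dict) -> List[int]:
    """Return the indexes of choices whose delta.content and message.content
    are both non-empty in the same chunk."""
    flagged = []
    for position, choice in enumerate(chunk.get("choices") or []):
        delta = choice.get("delta") or {}
        message = choice.get("message") or {}
        if delta.get("content") and message.get("content"):
            # Fall back to list position when the choice has no "index" key.
            flagged.append(choice.get("index", position))
    return flagged


with_message = {"choices": [{"delta": {}, "message": {"content": "Hello!"},
                             "finish_reason": "stop"}]}
without_message = {"choices": [{"delta": {}, "finish_reason": "stop"}]}
print(double_content_choices(with_message))
print(double_content_choices(without_message))
```

Both finish-chunk shapes shown above yield an empty list; a chunk that fills both fields would be flagged.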

4.2 OneAPI v1.0.0-rc.10 Actual Behavior

But OneAPI v1.0.0-rc.10 (QuantumNous fork), when proxying MiniMax responses, produces a finish chunk like this:

{
  "choices": [{
    "delta": {"content": "Hello! How can I help you today?"},
    "message": {"content": "Hello! How can I help you today?"},
    "finish_reason": "stop"
  }]
}

Note that delta and message both contain identical content. What happens next depends on the client’s streaming handler strategy:

  1. Consume only delta.content, ignore message.content — clients using this strategy would not trigger this bug
  2. Consume both — OpenClaw’s stream handler treats both as “visible text” and concatenates them

OpenClaw uses strategy 2, which works correctly with most compatible gateways because they populate at most one of the two fields in the finish chunk. OneAPI’s finish chunk populates both, so the same text is appended twice.
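To make the two strategies concrete, here is a simplified sketch of a stream assembler with a switch between them. It is not OpenClaw's actual handler: the function name and the consume_message flag are my own, and the sketch joins text with no separator, whereas the reply observed in DingTalk had a space between the copies, so the real handler evidently differs in detail.

```python
from typing import Iterable


def assemble_visible_text(chunks: Iterable[dict], consume_message: bool) -> str:
    """Concatenate visible text from parsed chat-completion chunks.

    consume_message=False is strategy 1 (delta.content only).
    consume_message=True is strategy 2 (also append message.content
    found in a finish chunk).
    """
    parts = []
    for chunk in chunks:
        for choice in chunk.get("choices") or []:
            delta = choice.get("delta") or {}
            message = choice.get("message") or {}
            if delta.get("content"):
                parts.append(delta["content"])
            if consume_message and choice.get("finish_reason") and message.get("content"):
                parts.append(message["content"])
    return "".join(parts)


# The finish chunk shown above, as a Python dict.
finish_chunk = {
    "choices": [{
        "delta": {"content": "Hello! How can I help you today?"},
        "message": {"content": "Hello! How can I help you today?"},
        "finish_reason": "stop",
    }]
}
delta_only = assemble_visible_text([finish_chunk], consume_message=False)
both = assemble_visible_text([finish_chunk], consume_message=True)
print(repr(delta_only))
print(repr(both))
```

With strategy 1 the text appears once; with strategy 2 the same chunk yields it twice.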

4.3 Why MiniMax Specifically?

This relates to OneAPI’s MiniMax channel adapter implementation. OneAPI’s MiniMax adapter (channel type 35), when mapping MiniMax native responses to the OpenAI-compatible format, adds message.content in the finish chunk alongside delta.content. Other model channels (like the native OpenAI channel) may not exhibit this issue because their finish chunk generation logic is different.

5. Fix: Switch to the anthropic-messages Protocol

5.1 Options Considered

Once the root cause was identified, several fix paths were possible:

  1. Modify OpenClaw’s stream handler to ignore delta.content in finish chunks. But this is a global change that could affect other providers.
  2. Patch OneAPI’s source code to fix the finish chunk logic. But OneAPI is an upstream project — fixes depend on release cycles.
  3. Switch protocol channels: OneAPI natively supports both /v1/chat/completions (OpenAI) and /v1/messages (Anthropic) endpoints. The anthropic-messages protocol path produces clean streaming output.

Option 3 offered the lowest cost and highest reward. OneAPI has built-in support for the Anthropic Messages protocol, and OpenClaw includes anthropic-messages in its GENERIC_PROVIDER_APIS.

5.2 Configuration Change

In the OpenClaw configuration, change the provider’s api type from openai-completions to anthropic-messages:

{
  "providers": [
    {
      "name": "custom",
      "api": "anthropic-messages",
      "baseUrl": "https://<ONEAPI_ENDPOINT>",
      "apiKey": "<TOKEN>",
      "models": [
        {"model": "DIY-123", "reasoning": true}
      ]
    }
  ]
}

Note: the anthropic-messages protocol automatically appends /v1/messages to the base URL. So only the base address is needed without the /v1 suffix.
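The effect of a stray /v1 suffix is easy to see in a sketch. This assumes the client builds the request URL by plain string concatenation, which is my assumption about the behavior described in the note, not a quote from OpenClaw's source; messages_url and the example host are hypothetical.

```python
def messages_url(base_url: str) -> str:
    """Append /v1/messages to a base URL, tolerating a trailing slash."""
    return base_url.rstrip("/") + "/v1/messages"


# Base address only: the path comes out as intended.
print(messages_url("https://gateway.example"))
# Base address that already ends in /v1: the segment is doubled.
print(messages_url("https://gateway.example/v1"))
```

If requests fail with a 404 after the switch, a doubled /v1 in the final URL is the first thing I would check.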

5.3 Verification

After switching protocols, a direct curl test of the /v1/messages endpoint showed clean streaming:

data: {"type":"content_block_delta","delta":{"text":"Hello!"}}
data: {"type":"message_stop"}

No duplicates. No redundant message.content. The Agent’s actual conversations returned to normal:

User: hi
Agent: Hello! How can I help you today?
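For repeatable checks, the same verification can be scripted. The sketch below assembles text from the data lines of a /v1/messages stream, reading only the two event types shown above. The function name is hypothetical, and it deliberately ignores every other event type and any non-data line.

```python
import json
from typing import Iterable


def assemble_anthropic_text(lines: Iterable[str]) -> str:
    """Join the text of content_block_delta events until message_stop."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines and non-data lines
        event = json.loads(line[len("data:"):])
        if event.get("type") == "content_block_delta":
            parts.append((event.get("delta") or {}).get("text") or "")
        elif event.get("type") == "message_stop":
            break
    return "".join(parts)


sample = [
    'data: {"type":"content_block_delta","delta":{"text":"Hello!"}}',
    'data: {"type":"message_stop"}',
]
print(repr(assemble_anthropic_text(sample)))
```

Because text arrives through a single field in this format, there is no second copy for a handler to pick up.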

6. Why Not Patch OpenClaw’s Stream Handler?

You might ask: if OpenClaw consumes both delta.content and message.content, isn’t that OpenClaw’s bug?

That’s a fair question. In a strict sense, OpenClaw’s stream handler could be more defensive. But from a protocol compatibility perspective, OpenClaw’s behavior is normal with the vast majority of OpenAI-compatible gateways. Most gateways either leave delta empty in the finish chunk or don’t set message alongside a non-empty delta. The issue is specific to OneAPI’s implementation.

Furthermore, modifying OpenClaw’s stream handler would be a global change. If some future provider needs to carry extra information via delta.content in the finish chunk (for example, tool call result fragments), a global ignore could introduce new bugs. The protocol switch is a precise, reversible fix that doesn’t affect other providers.

7. Takeaways: A Reusable Debugging Approach

The most valuable outcome of this debugging session isn’t the config change — it’s the repeatable methodology:

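1. Classify the duplicate type first

Two separate messages point to the send layer (retries, callbacks, acks, deduplication). One message containing the same text twice points to the model, provider, or streaming layer.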

2. Reproduce with a minimal prompt

Only reply with this one word: Hello

Don’t start with long prompts. A minimal prompt eliminates context noise and exposes the shortest path to the issue.

3. Verify raw responses directly

Don’t rely on log inference alone. Use curl to test the target endpoint’s raw output. A self-describing Python filter that extracts only key fields makes the judgment obvious.

4. Run provider comparison experiments

If your system has multiple providers or protocol paths, run the same prompt through each. This quickly distinguishes “model-intrinsic issues” from “protocol-path issues.”
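A comparison run can be reduced to a few lines. In this sketch each provider is just a callable that takes a prompt and returns the final text. The stub providers return canned strings and stand in for real calls, and all names are hypothetical.

```python
from typing import Callable, Dict, List


def failing_paths(prompt: str,
                  providers: Dict[str, Callable[[str], str]],
                  is_bad: Callable[[str], bool]) -> List[str]:
    """Send the same prompt through every path and list the paths whose
    reply is judged defective by `is_bad`."""
    return sorted(name for name, ask in providers.items() if is_bad(ask(prompt)))


def interpret(failing: List[str], total: int) -> str:
    """Turn the list of failing paths into a rough diagnosis."""
    if not failing:
        return "no path reproduces the defect"
    if len(failing) == total:
        return "every path fails: suspect the model or shared Agent code"
    return "only some paths fail: suspect the protocol path"


# Stub providers with canned replies; real ones would call the gateway.
stubs = {
    "custom/openai-completions": lambda p: "Hello! Hello!",
    "minimax/native": lambda p: "Hello!",
}
failing = failing_paths("hi", stubs, is_bad=lambda text: text.count("Hello!") > 1)
print(failing)
print(interpret(failing, len(stubs)))
```

In this case only the gateway path would be listed, which matches the protocol-path diagnosis reached in Section 2.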

5. Track the final visible text

In Agent environments, don’t just check whether the model API returned 200. Examine the Agent’s assembled finalText, because streaming processing, tool calls, reasoning, and fallbacks can all affect the final output.

8. Q&A

Q1: Has this bug been reported to the OneAPI repository?

QuantumNous/new-api has few reported issues about the MiniMax channel, so this bug may not be documented yet. Affected users are encouraged to file an issue or PR. The fix direction would be: in the MiniMax channel’s finish chunk generation logic, ensure that delta.content and message.content are never both set.
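As an illustration of that invariant only, here is a Python sketch of a chunk normalizer. It is not the gateway's actual code, which I have not reproduced here. The function name is hypothetical, and the choice to keep delta and drop message is my assumption about the least surprising fix.

```python
import copy


def enforce_single_content(chunk: dict) -> dict:
    """Return a copy of `chunk` in which no choice carries non-empty text
    in both delta.content and message.content."""
    fixed = copy.deepcopy(chunk)
    for choice in fixed.get("choices") or []:
        delta = choice.get("delta") or {}
        message = choice.get("message") or {}
        if delta.get("content") and message.get("content"):
            # Assumption: keep the streamed delta, drop the redundant copy.
            choice.pop("message", None)
    return fixed


buggy = {
    "choices": [{
        "delta": {"content": "Hello! How can I help you today?"},
        "message": {"content": "Hello! How can I help you today?"},
        "finish_reason": "stop",
    }]
}
print(enforce_single_content(buggy))
```

The input is left untouched, so the same function could also serve as a test oracle when comparing chunks before and after a fix.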

Q2: Would upgrading OneAPI fix this?

It depends. This bug exists in v1.0.0-rc.10. If a later version has fixed the MiniMax channel’s finish chunk logic, upgrading would resolve it. Until then, the protocol switch is a safe workaround.

Q3: What’s the difference between anthropic-messages and openai-completions protocols?

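Both are natively supported by OneAPI but use different endpoints and data formats: openai-completions calls /v1/chat/completions and streams chunks whose text arrives in choices[].delta.content, while anthropic-messages calls /v1/messages and streams typed events such as content_block_delta and message_stop.

OneAPI receives the MiniMax native response and converts it to each format. The bug only manifests in the openai-completions conversion; the anthropic-messages conversion works correctly.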

Q4: Would this happen with non-streaming requests?

No. Non-streaming requests return a single complete response body with no streaming event assembly. This bug is streaming-specific.

Q5: Would other models (non-MiniMax) trigger the same issue?

It depends on the channel type implementation in OneAPI. Different model channels (OpenAI, Azure, Google, MiniMax, etc.) have independent response conversion logic. This issue is confirmed in the MiniMax channel (type 35). If other channels have similar dual-output logic in their finish chunks, they could exhibit the same problem.

Q6: Would this issue also appear in WeChat or other IM channels?

Yes. This is not an IM-channel-specific issue — it’s a model-layer output problem. Whether the backend is DingTalk, WeChat, or another IM channel, as long as the Agent uses the same provider path and protocol, it will produce the same duplicated text. This is precisely why fixing the root cause is better than patching each IM channel separately.
