How to Investigate Language Shift in AI Coding Assistants: A Step-by-Step Guide

Introduction

Have you ever typed a prompt in Chinese to your coding assistant, only to receive a reply in Korean? This puzzling behavior, in which an AI model unexpectedly switches languages, offers a window into how code vocabulary can distort the language regions of a multilingual embedding space. In this guide, we'll walk through a systematic investigation of the phenomenon, using tools from natural language processing (NLP) and embedding analysis. By the end, you'll have a method for testing why such shifts occur and for tracing them back to the underlying token representations.

Source: towardsdatascience.com

What You Need

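A coding assistant or language model that supports multilingual prompts (e.g., GPT‑3.5/4, Codex, or an open‑source model like CodeLlama). Hosted models are enough for Steps 1 and 6; the embedding analysis in the other steps needs a model whose weights you can load.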
Python environment with numpy, scikit-learn, transformers, torch, and matplotlib installed.
Sample prompts in Chinese (or any source language) that include code snippets (e.g., Python, JavaScript).
Access to the model's embeddings (e.g., via Hugging Face API or a local model).
Jupyter Notebook or similar for interactive analysis.

Step‑by‑Step Investigation

Step 1: Reproduce the Anomaly

First, confirm the behavior by sending a mixed prompt—Chinese text plus a short code block—to your coding assistant. Record the full input and output. For example:

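输入：请写一个Python函数来计算斐波那契数列。def fib(n):
(In English: “Input: Please write a Python function to compute the Fibonacci sequence.” followed by the start of a function definition.)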

If the response is in Korean (or another unexpected language), note the output exactly. This step establishes the phenomenon you'll explain.
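Noting the output language by eye works for one reply, but it gets tedious across many trials. A minimal sketch of an automatic check follows, assuming that counting Unicode scripts is good enough for this purpose. The function name `dominant_script` and the labels it returns are illustrative, and because Han characters are shared with Japanese kanji, the "chinese" label is only a heuristic.

```python
import unicodedata
from collections import Counter

def dominant_script(text: str) -> str:
    """Return a label for the most common script among the letters of `text`."""
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # ignore digits, punctuation, and whitespace
        name = unicodedata.name(ch, "")
        if name.startswith("HANGUL"):
            counts["korean"] += 1
        elif name.startswith("CJK UNIFIED"):
            counts["chinese"] += 1  # heuristic: Han is also used in Japanese
        elif name.startswith(("HIRAGANA", "KATAKANA")):
            counts["japanese"] += 1
        elif name.startswith("LATIN"):
            counts["latin"] += 1
        else:
            counts["other"] += 1
    # No letters at all (e.g. an empty or purely numeric reply)
    return counts.most_common(1)[0][0] if counts else "unknown"
```

Replies that contain a lot of code will be dominated by Latin letters, so it may be worth stripping code blocks from the response before calling the function.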

Step 2: Extract Input and Output Embeddings

Use the model's final hidden states to build one vector for the prompt and one for the response. In Python:

import torch
from transformers import AutoModel, AutoTokenizer

# 'model-name' is a placeholder; substitute the checkpoint you are studying
tokenizer = AutoTokenizer.from_pretrained('model-name')
model = AutoModel.from_pretrained('model-name')

inputs = tokenizer(prompt, return_tensors='pt')
with torch.no_grad():
    # mean-pool the final hidden states into one vector of shape (1, hidden_size)
    embeddings = model(**inputs).last_hidden_state.mean(dim=1)

Repeat for the Korean output. Save these embeddings for later comparison.
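One pitfall is worth a sketch: if the prompt and the response are embedded together in a padded batch, a plain mean over the sequence dimension also averages the padding positions. The example below shows masked mean pooling on numpy arrays that stand in for `last_hidden_state` and the tokenizer's `attention_mask`. The function name and the toy numbers are illustrative only.

```python
import numpy as np

def masked_mean_pool(hidden: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Average token vectors per sequence, ignoring padded positions.

    hidden: (batch, seq_len, dim) array, e.g. last_hidden_state as numpy
    mask:   (batch, seq_len) array with 1 for real tokens and 0 for padding
    """
    weights = mask[:, :, None].astype(hidden.dtype)   # (batch, seq_len, 1)
    summed = (hidden * weights).sum(axis=1)           # (batch, dim)
    counts = np.clip(weights.sum(axis=1), 1.0, None)  # guard against empty rows
    return summed / counts

# Toy batch: the first sequence has one padded position holding junk values.
hidden = np.array([[[1.0, 1.0], [3.0, 3.0], [100.0, 100.0]],
                   [[2.0, 0.0], [4.0, 0.0], [6.0, 0.0]]])
mask = np.array([[1, 1, 0],
                 [1, 1, 1]])
pooled = masked_mean_pool(hidden, mask)  # one vector per sequence
```

When each text is embedded on its own, as in the snippet above, there is no padding and the plain mean gives the same result.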

Step 3: Compute Embedding Similarity

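Calculate the cosine similarity between the prompt embedding and the response embedding. There is no universal cutoff: mean-pooled hidden states often score high for almost any pair of texts, so treat a figure such as 0.5 as a rough guide at most. The informative number is the gap relative to a control. Run a pure Chinese prompt without code, compute the same similarity for its response, and compare the two values.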

from sklearn.metrics.pairwise import cosine_similarity
# inputs must be 2D (n_samples, n_features); the result is a 1x1 matrix
similarity = cosine_similarity(emb_prompt.numpy(), emb_response.numpy())[0, 0]
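The comparison against the control prompt can be sketched as a single number: the control pair's similarity minus the mixed pair's similarity. The helper names below are hypothetical, and the three-dimensional vectors are made up purely to show the call pattern; they say nothing about any real model.

```python
import numpy as np

def cosine(a, b) -> float:
    """Cosine similarity between two vectors (any shape, flattened first)."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def similarity_gap(mixed_pair, control_pair) -> float:
    """Control similarity minus mixed-prompt similarity.

    Each argument is a (prompt_embedding, response_embedding) tuple.
    A clearly positive gap means the mixed prompt's response drifted
    further from its prompt than the control's response did.
    """
    return cosine(*control_pair) - cosine(*mixed_pair)

# Made-up vectors standing in for real pooled embeddings.
mixed = (np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0]))
control = (np.array([1.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0]))
gap = similarity_gap(mixed, control)
```

Reporting the gap rather than a raw similarity keeps the result comparable across models whose similarity scores sit in different ranges.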

Step 4: Analyze Vocabulary Overlap in Embedding Space

Use t‑SNE or PCA to visualize the embedding neighborhoods of key tokens. Tokenize both the Chinese and Korean parts of the prompt and response. For each token, find its nearest neighbors in the model's embedding space. Look for tokens from the code snippet (e.g., def, if, print) that might be closer to Korean words than to Chinese words.
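The nearest-neighbor lookup can be sketched with plain numpy. The example assumes you have the token embedding matrix as a 2D array; with a Hugging Face model that would typically come from `model.get_input_embeddings().weight`, converted to numpy. The four-token vocabulary and its vectors are invented for illustration and are not taken from any real model.

```python
import numpy as np

def nearest_neighbors(query_id: int, emb_matrix: np.ndarray, k: int = 5):
    """Return (token_id, cosine_similarity) for the k rows closest to query_id."""
    # Normalize rows so that a dot product equals cosine similarity.
    # Assumes no all-zero rows in the matrix.
    normed = emb_matrix / np.linalg.norm(emb_matrix, axis=1, keepdims=True)
    sims = normed @ normed[query_id]
    sims[query_id] = -np.inf  # exclude the query token itself
    top = np.argsort(-sims)[:k]
    return [(int(i), float(sims[i])) for i in top]

# Invented toy vocabulary and embeddings, for illustration only.
vocab = ["def", "함수", "函数", "print"]
toy_matrix = np.array([[1.0, 0.0],
                       [0.9, 0.1],
                       [0.0, 1.0],
                       [0.5, 0.5]])
neighbors = nearest_neighbors(vocab.index("def"), toy_matrix, k=2)
neighbor_tokens = [vocab[i] for i, _ in neighbors]
```

On a real vocabulary, map the returned ids back to strings with the tokenizer and label each neighbor by script to see which language dominates the neighborhood of each code token.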

Step 5: Identify the “Code Vocabulary” Influence

Code keywords and symbols (e.g., var, function, =) appear alongside many natural languages in training data, so in a multilingual model they can embed near related words from several languages at once. The hypothesis to test is that these points act as “attractors” that pull nearby tokens toward a different language region. Calculate the centroid of all code‑token embeddings and measure its distance to the centroid of Chinese tokens vs. Korean tokens. A code centroid that sits markedly closer to the Korean centroid is consistent with code vocabulary reshaping the language region, although on its own it does not prove the cause.
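The centroid comparison reduces to a few lines. The sketch below uses Euclidean distance between centroids, which is one reasonable choice among several (cosine distance is another), and the function names and toy vectors are illustrative.

```python
import numpy as np

def centroid(vectors) -> np.ndarray:
    """Mean vector of a group of token embeddings."""
    return np.asarray(vectors, dtype=float).mean(axis=0)

def centroid_drift(code_vecs, zh_vecs, ko_vecs) -> float:
    """Distance(code, Chinese) minus distance(code, Korean).

    Positive: the code centroid is closer to the Korean centroid.
    Negative: it is closer to the Chinese centroid.
    """
    c = centroid(code_vecs)
    d_zh = np.linalg.norm(c - centroid(zh_vecs))
    d_ko = np.linalg.norm(c - centroid(ko_vecs))
    return float(d_zh - d_ko)

# Made-up 2-d vectors, only to show the sign convention.
code_vecs = [[1.0, 0.0], [1.0, 0.0]]
zh_vecs = [[0.0, 0.0]]
ko_vecs = [[1.0, 0.5]]
drift = centroid_drift(code_vecs, zh_vecs, ko_vecs)
```

To judge whether a drift value is large, compare it with the drift obtained from random token subsets of the same sizes.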

Step 6: Test with Synthetic Variations

To confirm causation, create controlled prompts:

Only Chinese (no code) → expect Chinese output.
Code only (no natural language) → observe output language.
Chinese with non‑code symbols (e.g., punctuation) → still Chinese.
Chinese with code but replace code tokens with synonyms (e.g., def vs. define) → see if language shift weakens.

Run each through the model and record the output language. This isolates the effect of code vocabulary.
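The controlled prompts can be generated and tallied programmatically. In the sketch below, `ask_model` and `detect_language` are hypothetical callables that you supply: the first sends a prompt to your assistant and returns its reply, the second maps a reply to a language label. The variant names and the filler symbols are also illustrative, and the unmodified Chinese-plus-code prompt from Step 1 is included as a fifth condition for reference.

```python
from collections import Counter
from typing import Callable, Dict

BASE_TEXT = "请写一个Python函数来计算斐波那契数列。"
CODE = "def fib(n):"

def build_variants() -> Dict[str, str]:
    """Return the controlled prompts, keyed by a short condition name."""
    return {
        "chinese_only": BASE_TEXT,
        "code_only": CODE,
        "chinese_plus_symbols": BASE_TEXT + " ;;; ###",
        "chinese_plus_code": BASE_TEXT + CODE,  # the original anomaly
        "chinese_plus_synonym": BASE_TEXT + CODE.replace("def", "define"),
    }

def run_ablation(ask_model: Callable[[str], str],
                 detect_language: Callable[[str], str],
                 trials: int = 5) -> Dict[str, Counter]:
    """Query each variant `trials` times and count the output languages."""
    results = {}
    for name, prompt in build_variants().items():
        results[name] = Counter(
            detect_language(ask_model(prompt)) for _ in range(trials)
        )
    return results
```

Running several trials per condition matters when the model samples its output, because a single reply may not be representative.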

Step 7: Map the Embedding Geometry

Build a small 2D PCA projection of token embeddings from all your prompts. Color points by language (Chinese, Korean, code tokens). Observe clusters. You should see an overlapping region where code tokens reside—if that region is closer to the Korean cluster than to the Chinese cluster, the model’s representation is biased.
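A minimal sketch of the projection follows. It computes PCA directly with numpy's SVD rather than scikit-learn's `PCA` class so that the block has a single dependency; the two approaches agree up to the sign of each axis. The embeddings here are random placeholders, so the resulting clusters are meaningless until you substitute real token vectors and labels.

```python
import numpy as np

def pca_2d(X) -> np.ndarray:
    """Project the rows of X onto their first two principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # center each feature
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:2].T

def cluster_centroids(coords: np.ndarray, labels) -> dict:
    """Mean 2D position of each label group."""
    labels = np.asarray(labels)
    return {lab: coords[labels == lab].mean(axis=0)
            for lab in sorted(set(labels.tolist()))}

# Random placeholder data: 12 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 8))
labels = ["zh"] * 4 + ["ko"] * 4 + ["code"] * 4

coords = pca_2d(X)
centers = cluster_centroids(coords, labels)
```

To draw the plot, pass `coords[:, 0]` and `coords[:, 1]` to matplotlib's `plt.scatter`, once per label with a different color. Keep in mind that a 2D projection discards most of the variance, so distances in the plot are only suggestive.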

Step 8: Hypothesis Verification

Based on your analysis, formulate a hypothesis (e.g., “Code tokens in a Chinese prompt pull the embedding space toward Korean because the model was fine‑tuned on parallel code‑Korean data”). Test it by checking the training data documentation or by probing with specific token pairs. If possible, repeat the experiment with a different model (e.g., Codex vs. GPT‑4).

Step 9: Document and Share Findings

Write a brief report summarizing your method, visualizations, and conclusions. Include code snippets and embedding plots. Share with the NLP community—this helps improve multilingual model design.

Tips for a Successful Investigation

Use a debug environment – Run experiments in a Jupyter notebook to iterate quickly and keep all results in one place.
Control for randomness – Set manual seeds (torch.manual_seed(42)) and, where the model allows it, use greedy decoding or a temperature of 0 so that repeated runs are comparable.
Compare with a baseline – Always run a pure Chinese prompt without code to see the expected output language.
Look out for tokenizer artifacts – Some tokenizers split Chinese into byte‑pair encoding (BPE) pieces differently than Korean; note how token IDs differ.
Visualize early and often – Plot embeddings in 2D/3D to spot clusters that raw numbers might hide.
Check model card – The Hugging Face model card often lists training languages and data sources that explain biases.
Try different code languages – Python, JavaScript, and SQL may have different embedding neighborhoods.
Keep a log – Record every prompt, output, and similarity score to track patterns.
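The logging tip can be sketched with the standard library alone. The example below appends one JSON object per line to a file; the field names and the helper names `log_trial` and `load_trials` are illustrative choices, not a required schema.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

def log_trial(path, prompt, output, language, similarity=None):
    """Append one experiment record to a JSON Lines file and return it."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prompt": prompt,
        "output": output,
        "language": language,
        "similarity": similarity,  # None when no embedding was computed
    }
    with Path(path).open("a", encoding="utf-8") as f:
        # ensure_ascii=False keeps Chinese and Korean text readable in the file
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record

def load_trials(path):
    """Read all records back as a list of dicts."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

An append-only file like this survives notebook restarts and loads easily into a table later for pattern analysis.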

By following these steps, you can turn a strange quirk into a clear lesson about how embedding spaces encode multilingual relationships. The investigation not only explains why your coding assistant replied in Korean—it also reveals fundamental properties of modern language models.