
Five Ways to Improve Japanese AI Search Accuracy

February 5, 2025 · PC AI

A common complaint after rolling out RAG internally: “the answers aren’t quite as accurate as we expected.”

In a Japanese context, this almost always comes back to one of a small set of issues. English RAG patterns do not transfer cleanly to Japanese, and the places they break are predictable. Below are five levers that tend to produce the largest accuracy gains in practice.

1. Chunk by meaning, not by raw token count

RAG works by splitting documents into small “chunks” before indexing. The chunking strategy matters more than almost anything else.

English tutorials often say “split every 200 tokens.” Apply that directly to Japanese and you will routinely cut sentences and definitions in half, which destroys context.

For Japanese documents, the strategy that works:

  • Respect the existing heading structure. Treat H2 and H3 boundaries as hard chunk boundaries.
  • Sub-chunk within sections by paragraph, treating paragraphs as the natural unit of meaning.
  • Use overlap to preserve term definitions. Make sure key terms are not separated from the sentences that explain them.

For FAQ and manual-style content specifically, simply ensuring that each question and its answer land in the same chunk produces a large jump in retrieval accuracy.
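The strategy above can be sketched in a few lines of Python. This is an illustrative splitter, not a production one: the character budget, the one-sentence overlap, and the assumption that headings are markdown `##`/`###` lines are all choices you would tune for your own documents.

```python
import re

def chunk_japanese_doc(markdown_text, max_chars=800, overlap_sentences=1):
    """Split a document into chunks that respect heading and paragraph
    boundaries, carrying a short sentence overlap between chunks."""
    # Treat H2/H3 headings as hard chunk boundaries.
    sections = re.split(r"(?m)^(?=#{2,3} )", markdown_text)
    chunks = []
    for section in sections:
        if not section.strip():
            continue
        # Paragraphs are the natural unit of meaning within a section.
        paragraphs = [p.strip() for p in section.split("\n\n") if p.strip()]
        current = ""
        for para in paragraphs:
            if current and len(current) + len(para) > max_chars:
                chunks.append(current)
                # Overlap: carry the last sentence(s) into the next chunk
                # so a definition is not separated from its key term.
                tail = re.findall(r"[^。]*。", current)[-overlap_sentences:]
                current = "".join(tail)
            current = (current + "\n\n" + para).strip() if current else para
        if current:
            chunks.append(current)
    return chunks
```

Splitting on the Japanese full stop (。) for the overlap, rather than a fixed character count, is what keeps definitions intact.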

2. Use a multilingual embedding model

Search is powered by embeddings — numerical vectors that represent the meaning of text. If your embedding model is not trained on Japanese, it cannot represent Japanese meaning well.

Models that perform well on Japanese in practice: OpenAI’s text-embedding-3-large, Cohere’s embed-multilingual-v3, and (for self-hosted) multilingual-e5-large.

If you are still using an English-focused embedding model from a couple of generations ago (the older text-embedding-ada-002, for example), this is likely where most of your accuracy is leaking.
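To make "numerical vectors that represent meaning" concrete, here is cosine similarity, the comparison every vector search runs under the hood. The four-dimensional vectors are toy values invented for illustration (real models emit 1,024+ dimensions); the point is that a Japanese-aware model places 休暇 (vacation) and 有給 (paid leave) close together, which an English-only model cannot do.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" -- made-up numbers standing in for real model output.
vec_kyuka  = [0.8, 0.1, 0.3, 0.2]  # 休暇
vec_yukyu  = [0.7, 0.2, 0.4, 0.1]  # 有給
vec_server = [0.1, 0.9, 0.0, 0.5]  # サーバー

print(cosine_similarity(vec_kyuka, vec_yukyu))   # high: related meaning
print(cosine_similarity(vec_kyuka, vec_server))  # low: unrelated
```

If your model was not trained on Japanese, the 休暇/有給 pair simply never ends up close in the vector space, and no amount of downstream tuning recovers that.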

3. Combine vector search with keyword search (hybrid)

Vector search is good at finding semantically similar passages, but it is weak at exact matches on proper nouns and product codes.

If a user asks “show me the manual for model ABC-123,” a pure vector search may surface the ABC-122 manual at the top. The two product codes are semantically nearly identical even though the literal match is what matters.

The fix is to combine vector search with a classical keyword search (BM25 or similar) and merge the two scores. You get semantic recall and exact-match precision in the same query.
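One common way to merge the two result lists is Reciprocal Rank Fusion (RRF), which needs only the rank order from each retriever, not comparable scores. A minimal sketch, with the ABC-123 scenario from above as hypothetical input:

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several best-first ranked lists into one ranking.

    Each document scores 1 / (k + rank) per list it appears in,
    and the scores are summed. k=60 is the commonly used constant.
    """
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Vector search confuses ABC-122 with ABC-123; BM25 nails the exact code.
vector_hits  = ["manual-ABC-122", "manual-ABC-123", "faq-returns"]
keyword_hits = ["manual-ABC-123", "faq-returns"]

print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
# "manual-ABC-123" rises to the top: it ranks well in both lists.
```

Because RRF ignores raw scores, you avoid the awkward problem of normalizing BM25 scores against cosine similarities.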

4. Add a reranker

After the initial retrieval returns its top results, running them through a reranker model dramatically improves the quality of what gets handed to the LLM.

The pattern: let the first stage (vector + keyword) be fast and recall-oriented, returning the top ~20 candidates. Then run a reranker over those 20 to reorder them by how directly they answer the question. The context the LLM ends up with is much cleaner.

Several rerankers now handle Japanese well — Cohere Rerank and BAAI/bge-reranker-v2-m3 are both solid choices.
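The two-stage shape looks like this. The `overlap_score` function is a deliberately crude stand-in for a real cross-encoder (in production you would call Cohere Rerank or run bge-reranker-v2-m3 at that point); the surrounding pipeline is what the sketch is meant to show.

```python
def rerank(query, candidates, score_fn, top_n=5):
    """Second stage: score each candidate against the query and keep
    only the best few for the LLM's context window."""
    scored = [(score_fn(query, c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_n]]

def overlap_score(query, passage):
    """Toy stand-in for a cross-encoder: fraction of the query's
    characters that also appear in the passage."""
    q = set(query)
    return len(q & set(passage)) / len(q)

# Hypothetical top candidates from the first (recall-oriented) stage.
candidates = [
    "経費精算の締め切りは毎月25日です。",
    "出張の申請は上長の承認が必要です。",
    "社員食堂の営業時間は11時から14時です。",
]
top = rerank("経費精算の締め切りはいつ?", candidates, overlap_score, top_n=1)
print(top[0])  # the expense-deadline passage wins
```

In the real pipeline the first stage would return ~20 candidates and `top_n` would be 3–5, so the LLM sees only passages that directly address the question.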

5. Build an evaluation set and measure continuously

None of the above matters if you cannot tell what worked.

Build a small evaluation set: 30 to 100 actual questions your users ask, paired with their expected answers. Every time you change a setting, run the evaluation set through and look at how retrieval accuracy and answer accuracy changed.

This is the only way to improve without guessing. And because the questions come from real internal use, every accuracy improvement maps directly to a workflow that gets better.
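A minimal harness for the retrieval half of that loop, assuming your eval set pairs each question with the ID of the document that should answer it (the question/ID data and the `retrieve` stub below are hypothetical):

```python
def evaluate_retrieval(eval_set, retrieve, k=5):
    """Hit rate @ k: the fraction of questions whose expected source
    document appears in the top-k retrieved results."""
    hits = 0
    for question, expected_doc in eval_set:
        if expected_doc in retrieve(question, k):
            hits += 1
    return hits / len(eval_set)

# Hypothetical eval set: (actual user question, ID of the answering doc).
eval_set = [
    ("経費精算の締め切りは?", "keiri-001"),
    ("有給休暇の申請方法は?", "jinji-014"),
]

# `retrieve` is whatever your pipeline exposes; stubbed here.
def retrieve(question, k):
    return ["keiri-001", "soumu-002"][:k]

print(evaluate_retrieval(eval_set, retrieve))  # 0.5 with this stub
```

Run this after every configuration change, log the number, and you can tell in minutes whether a new chunking strategy or embedding model actually helped.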

Wrapping up

Getting Japanese RAG to production-quality accuracy is not a matter of translating an English template. The five levers above — chunking, embeddings, hybrid search, reranking, and evaluation — tackled in roughly that order, will get most teams to the level they need.

PC AI’s Saachi is built with all of these baked in. End users do not have to think about tuning to get good Japanese results.

Get in touch if you would like to discuss your setup.