1. Context window
How much text a model can hold in working memory at once.
In practice: It's the size of the conversation. You paste your customer feedback dump into Claude and ask "what are the top three complaints," it nails it. You paste five months of Slack threads, your roadmap, and your churn data, then ask the same question. Now it's guessing.
Bigger isn't always better.
2. Context collapse
What happens when you stuff too much into the window. The model loses the plot. Recall drops. Quality tanks.
In practice: That moment in a long ChatGPT or Claude thread where it suddenly forgets the instruction you gave at the start. Makes up a fact. Contradicts something you said three messages ago. That's context collapse.
The fix isn't a bigger window. It's better curation.
3. Guardrails
The rules and filters constraining what a model can say or do.
In practice: The reason your support bot won't write profanity, won't quote your competitor by name, and can't promise refunds it isn't authorized to give. If you're shipping an AI feature without guardrails, you're shipping a liability.
4. Evals
Structured tests that measure model performance on actual tasks. Not vibes. Not demos.
In practice: Before you swap GPT for Claude or upgrade to whatever model is trending on X this week, evals tell you whether it's actually better for your product. If your engineer can't show you their evals, they're guessing too.
5. GraphRAG
Retrieval-augmented generation built on a knowledge graph instead of isolated text chunks.
In practice: You ask "which of our customers are at risk of churn this month?" Vector RAG finds documents that mention "churn" and "customers." GraphRAG follows the relationships from your customer list, to their usage patterns, to their last support ticket, and gives you an actual answer.
Multi-hop reasoning lives here. It's why Gartner just flagged it as a critical enabler for GenAI.
6. Inference
Running a trained model to produce outputs.
In practice: This is where your AI bill actually comes from. Every time someone on your team prompts Claude, every time a customer uses your AI feature, that's an inference call. Training is a one-time investment. Inference is the rent. And most teams aren't tracking it.
7. Chunking
How documents get split before retrieval. Sounds boring. Quietly destroys most RAG systems.
In practice: You feed your help docs into an AI support tool. A customer asks about your refund policy. The model returns the first half of the policy but misses the conditions, because the document was sliced down the middle. That's bad chunking.
Fixed-size chunking ignores meaning. Semantic chunking respects it.
8. KV cache
The stored key-value tensors from attention that let models skip recomputing past tokens. Pronounced "kay-vee cache."
In practice: It's the hidden cost behind long-context features. Every time you reuse a long system prompt across thousands of calls, the KV cache is what makes it fast and cheap. It's also what fills up GPU memory. Long context isn't free. The KV cache is the receipt.
9. Quantization
Shrinking a model by lowering the numerical precision of its weights. FP16 to INT8 to INT4.
In practice: It's why you can now run a respectable model on a laptop instead of needing a data center. Same model, fraction of the memory, almost the same accuracy. It's why every week there's a new open-source model that punches above its weight.
If you skimmed this list with a knot in your stomach, that's the signal. Your AI vocabulary needs an update. Your team is probably ahead of you. Your competition definitely is.
The companies winning with AI in 2026 aren't the ones with the biggest models. They're the ones with the sharpest vocabulary.