Safety & Responsible Deployment

Question 1

An untrusted document instructs the model to ignore its system prompt. What is this, and a key mitigation?

Accepted Answer

A prompt-injection attack; isolate and clearly label untrusted content and constrain tool permissions — Malicious instructions embedded in retrieved/user content are prompt injection. Mitigations include clearly delimiting and labeling untrusted input, instructing the model not to follow instructions found in data, and least-privilege tool access.

Answer

A rate-limit error; retry with backoff

Answer

A streaming bug; disable streaming

Answer

Expected behavior; no action needed

Question 2

Before shipping an LLM feature, what is the most important practice for measuring quality?

Accepted Answer

An evaluation set with representative cases and clear success criteria, run automatically — Systematic evals — a curated set of representative inputs scored against defined criteria — let you measure quality, catch regressions, and compare prompts/models objectively before and after deployment.

Answer

Manual spot-checks only

Answer

Trusting vibes from a few prompts

Answer

Maximizing `max_tokens`

Question 3

Which technique most directly reduces cost and latency for repeated, large, stable prompt prefixes?

Accepted Answer

Prompt caching — Prompt caching reuses a previously processed, stable prefix (e.g. a large system prompt or document set) across requests, cutting both cost and time-to-first-token on the cached portion.

Answer

Raising temperature

Answer

Adding more few-shot examples

Answer

Using a larger model

Question 4

For high-volume, latency-tolerant background jobs, which approach typically lowers cost the most?

Accepted Answer

Use batch processing and/or a smaller, faster model where quality allows — Non-interactive workloads can use batch APIs (discounted, asynchronous) and right-sized smaller models. Match the model and execution mode to the task's real quality and latency needs rather than defaulting to the biggest model.

Answer

Always use the largest model synchronously

Answer

Increase `max_tokens` for every request

Answer

Disable streaming

Question 5

What should you monitor in production beyond raw error rates?

Accepted Answer

Output quality, latency, token/cost usage, refusal rates, and user feedback — Observability for LLM apps spans quality (via evals/feedback), latency, token and cost usage, refusal/safety signals, and tool success rates — not just HTTP errors. This is what lets you catch drift and regressions.

Answer

Only the number of requests

Answer

Only the model version string

Answer

Nothing; LLM apps are self-correcting

Question 6

A model-driven action could be destructive (e.g. deleting data). What is the safest design?

Accepted Answer

Require a human-in-the-loop confirmation or scoped, reversible permissions for high-impact actions — High-impact or irreversible actions should require human confirmation or be constrained to least-privilege, reversible, well-logged operations. Autonomy is earned for low-risk steps, not granted blanket access to destructive ones.

Answer

Let the agent act fully autonomously to save time

Answer

Raise temperature for better judgment

Answer

Remove logging to reduce overhead

Question 7

You compare two prompts on your eval set. Prompt A scores higher overall but fails a small set of safety-critical cases that Prompt B passes. What is the sound decision?

Accepted Answer

Treat safety-critical cases as gating: do not ship a prompt that regresses them, regardless of aggregate score — Not all eval cases carry equal weight. Safety-critical or must-not-fail cases should act as hard gates, so a higher overall average does not justify shipping a regression on them. Segment evals by severity rather than optimizing a single aggregate.

Answer

Always pick the higher aggregate score

Answer

Average the two prompts together

Answer

Ignore the eval and choose by intuition

Question 8

Why use an LLM-as-judge (model-graded) evaluation instead of exact string matching?

Accepted Answer

It can score open-ended outputs against criteria like correctness, tone, or completeness where many valid wordings exist — Open-ended generations rarely match a fixed string, so a model graded against a clear rubric can assess qualities like factual correctness, tone, and completeness. It should be validated against human judgments, since it adds cost and is not perfectly deterministic.

Answer

It is always cheaper than string matching

Answer

It removes the need for any test cases

Answer

It guarantees deterministic scores

Question 9

Users can submit free-text that becomes part of the prompt to a tool-enabled agent. Which combination best limits prompt-injection blast radius?

Accepted Answer

Least-privilege tool scopes, human confirmation for high-impact actions, and clearly separating untrusted input from instructions — Defense in depth limits damage when injection succeeds: scope tools to least privilege, gate high-impact or irreversible actions behind human approval, and structurally separate and label untrusted input. No single prompt instruction is a reliable sole defense.

Answer

Trust the model to ignore malicious instructions on its own

Answer

Raising `max_tokens` so the model can reason more

Answer

Disabling evals to ship faster