Claude API Fundamentals

Question 1

In the Messages API, what determines the maximum length of the model's response?

Accepted Answer

The `max_tokens` parameter — `max_tokens` caps how many tokens the model may generate in its response. It is a hard upper bound; the model may stop earlier (e.g. an `end_turn` stop reason). It does not control input length.

Answer

The `temperature` parameter

Answer

The size of the system prompt

Answer

The number of messages in the conversation

Question 2

Which field carries instructions that set the model's role and behavior for the whole conversation, separate from the turn-by-turn dialogue?

Accepted Answer

The `system` parameter — The top-level `system` parameter holds the system prompt: persistent role, tone, and task framing. The `messages` array carries the alternating user/assistant turns.

Answer

The first `user` message

Answer

A `developer` role message inside `messages`

Answer

The `metadata` field

Question 3

What is the required structure of the `messages` array?

Accepted Answer

Roles must alternate, starting with `user` — Conversational turns alternate between `user` and `assistant`, and the array must begin with a `user` turn. The system prompt is passed separately, not as a message role.

Answer

Any order of roles is accepted

Answer

It must start with a `system` message

Answer

All messages must use the `assistant` role

Question 4

You set `temperature: 0`. What behavior should you expect?

Accepted Answer

More deterministic, focused output — Lower temperature reduces randomness, producing more deterministic and focused responses. Higher values increase diversity. Use low temperature for extraction, classification, and other tasks that reward consistency.

Answer

The model refuses to answer

Answer

Maximum creativity and variation

Answer

The response is truncated

Question 5

A request returns `stop_reason: "max_tokens"`. What does this indicate?

Accepted Answer

The output hit the `max_tokens` limit and was cut off — `max_tokens` as a stop reason means generation was truncated at the token cap, not that the model completed naturally (`end_turn`). Increase `max_tokens` or design for continuation if you see this on complete-looking tasks.

Answer

The model finished its turn naturally

Answer

A stop sequence was matched

Answer

The input exceeded the context window

Question 6

When you stream a response, how is the content delivered?

Accepted Answer

As a sequence of server-sent events with incremental deltas — Streaming uses server-sent events: the response arrives as incremental delta events you accumulate client-side, which lowers time-to-first-token for interactive UIs.

Answer

As one final JSON object only

Answer

As a WebSocket binary frame

Answer

Streaming returns the same payload as non-streaming, just slower

Question 7

Which factor counts against the model's context window?

Accepted Answer

All input tokens plus the tokens the model generates — The context window bounds the total of input tokens (system prompt, full message history, tool definitions, documents) plus the output tokens generated. As a conversation grows, input usage rises and leaves less room for output.

Answer

Only the generated output tokens

Answer

Only the system prompt

Answer

Only the number of messages, not their length

Question 8

You need both diverse, creative output and a hard limit on response length. Which two parameters address those separately?

Accepted Answer

`temperature` for diversity, `max_tokens` for length — These are independent levers: `temperature` (and top_p) controls randomness/diversity of sampling, while `max_tokens` caps how long the response may run. Raising one does not affect the other.

Answer

`max_tokens` for both

Answer

`system` for diversity, `temperature` for length

Answer

`stop_sequences` for both

Question 9

What is the role of a `stop_sequences` value in a request?

Accepted Answer

It lists strings that, when generated, end the response with a `stop_sequence` stop reason — Stop sequences are custom strings that halt generation as soon as the model produces one. The response ends with a `stop_sequence` stop reason, which is useful for delimiting structured output or cutting off at a known boundary.

Answer

It sets the minimum output length

Answer

It filters unsafe content from the output

Answer

It defines which tools the model may call