What criteria does the AI use to mark MMI responses?

The AI marks MMI responses across five criteria: Empathy, Communication, Reasoning, Reflection, and Real-world Awareness. Each criterion is scored 1-5 per prompt and an overall score is produced for the station. The criteria align with what Australian medical school and specialist training MMI examiners are formally instructed to assess.

What criteria does the AI use to mark CASPer responses?

For CASPer, the AI marks across nine core competencies: Collaboration, Communication, Empathy, Ethics, Fairness, Motivation, Problem Solving, Professionalism, and Service Orientation. These match the competencies that Acuity Insights, the test's developer, assesses in the official CASPer test. Scores are given on a 1-9 scale with detailed written feedback.

How accurate is AI feedback compared to real examiners?

The AI marking is designed to catch the same structural gaps real examiners flag: responses that skip empathy, responses that rush to solutions, responses that state principles without applying them to the scenario, and responses that show no genuine reflection. It will not replicate a specific examiner's personal reaction to a phrase, but it reliably identifies whether a response addresses what examiners formally score.

How the AI Marks Your MMI and CASPer Responses

The short version

The AI reads your transcribed or typed response and assigns scores against the same criteria that real examiners formally use. It is not listening for polish or filler words. It is checking whether you actually addressed what the scenario was testing.

Every score comes with written feedback explaining what was present, what was absent, and what a stronger version of the same response would look like. The goal is to give you the same calibration a good examiner would give you, at a fraction of the cost and with no waiting.

What the AI is not

The AI does not simulate the subjective reaction of any specific examiner. It will not penalise you for a nervous laugh, a long pause, or an unusual word choice. It marks on structural content: did you show empathy, did you reason through the dilemma, did you reflect? That is what moves scores at real interviews too.

MMI marking

How MMI responses are marked

Each MMI station you complete is marked across five criteria. Scores run from 1 to 5 per criterion per prompt. The AI marks every prompt in the station separately, then produces a per-criterion average and an overall station score.

Empathy

Does the response acknowledge the emotional reality of the people in the scenario before reaching for a solution?

Communication

Is the response structured? Does it speak to the person, not past them? Does it avoid sounding like a rehearsed script?

Reasoning

Does the response name and weigh competing values? Is there a clear decision logic, or does it just list considerations?

Reflection

Does the response acknowledge what the candidate does not yet know, or what they would do differently? Does it avoid false certainty?

Real-world Awareness

Does the response show an understanding of how healthcare actually works, not how a textbook says it should work?

What a 1/5 looks like vs a 5/5

Empathy 1/5: The candidate jumps immediately to policy or procedure. The people in the scenario are treated as a problem to manage rather than people to understand. There is no pause for feelings, no acknowledgement that the situation is difficult.

Empathy 5/5: The candidate names the emotional weight of the situation in specific terms. They stay with the human element of the scenario for at least part of the response, and they do so before any practical action is mentioned, not after.

Reasoning 1/5: The candidate states a conclusion with no visible logic. For example, they say "I would tell the patient" and give no explanation of which competing values were weighed.

Reasoning 5/5: The candidate explicitly names the tension: patient autonomy versus beneficence, for example, or the colleague's wellbeing versus patient safety. They state which value they are weighing more heavily and say why, not just that they are weighing them.

Per-prompt scoring

The AI marks each follow-up question in a station as a separate unit. If you gave a strong answer to the first prompt and then failed to engage with the second, the AI will flag it. This mirrors how MMI examiners are trained: they re-score at each new prompt rather than carrying a halo from the opening answer.
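
The per-prompt marking and station summary can be sketched in a few lines of Python. This is an illustrative sketch only, not the product's actual code: the function name summarise_station is hypothetical, and it assumes the overall station score is the plain mean of the per-criterion averages, which this page does not spell out.

```python
from statistics import mean

CRITERIA = [
    "Empathy",
    "Communication",
    "Reasoning",
    "Reflection",
    "Real-world Awareness",
]

def summarise_station(prompt_scores):
    """Average 1-5 scores per criterion across prompts, then across criteria.

    prompt_scores holds one dict per prompt, mapping criterion name to score.
    Assumption: the overall score is the unweighted mean of the five averages.
    """
    for scores in prompt_scores:
        for criterion in CRITERIA:
            if not 1 <= scores[criterion] <= 5:
                raise ValueError(f"{criterion} score must be between 1 and 5")
    per_criterion = {
        criterion: mean(scores[criterion] for scores in prompt_scores)
        for criterion in CRITERIA
    }
    overall = mean(per_criterion.values())
    return per_criterion, overall

# A strong first prompt followed by a weak second prompt.
station = [
    {"Empathy": 4, "Communication": 4, "Reasoning": 3,
     "Reflection": 3, "Real-world Awareness": 4},
    {"Empathy": 2, "Communication": 3, "Reasoning": 3,
     "Reflection": 1, "Real-world Awareness": 2},
]
per_criterion, overall = summarise_station(station)
print(per_criterion)
print(round(overall, 2))
```

Because each prompt is scored on its own, the weak second prompt pulls every affected average down instead of being hidden behind the strong opening answer.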

Specialist mode

Specialist mode applies a higher calibration bar for ACRRM, RANZCO, RACS, and other vocational training program interviews. The Reasoning and Real-world Awareness criteria are weighted more heavily. An answer that would score 4/5 at medical school level may score 3/5 in specialist mode because it lacks the depth expected of a candidate entering autonomous practice.
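
The effect of heavier weighting can be illustrated with a weighted average. The weights below are made up for illustration, since this page does not publish the real ones, and weighted_overall is a hypothetical helper. The sketch covers only the weighting, not the stricter per-criterion bar that specialist mode also applies.

```python
CRITERIA = [
    "Empathy",
    "Communication",
    "Reasoning",
    "Reflection",
    "Real-world Awareness",
]

# Hypothetical weights: equal in standard mode, doubled for the two
# criteria that specialist mode is described as weighting more heavily.
STANDARD_WEIGHTS = {criterion: 1.0 for criterion in CRITERIA}
SPECIALIST_WEIGHTS = {
    **STANDARD_WEIGHTS,
    "Reasoning": 2.0,
    "Real-world Awareness": 2.0,
}

def weighted_overall(scores, weights):
    """Return the weighted mean of the criterion scores."""
    total_weight = sum(weights[criterion] for criterion in scores)
    weighted_sum = sum(scores[criterion] * weights[criterion] for criterion in scores)
    return weighted_sum / total_weight

scores = {
    "Empathy": 5,
    "Communication": 5,
    "Reasoning": 3,
    "Reflection": 4,
    "Real-world Awareness": 3,
}
standard = weighted_overall(scores, STANDARD_WEIGHTS)
specialist = weighted_overall(scores, SPECIALIST_WEIGHTS)
print(round(standard, 2), round(specialist, 2))
```

With these assumed weights, the same answer has a lower overall score in specialist mode because its weakest criteria are the ones that count for more.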

CASPer marking

How CASPer responses are marked

CASPer is marked across nine competencies. These mirror the competencies that Acuity Insights, the test's developer, formally assesses in the official CASPer test used by Australian and New Zealand medical programs.

Collaboration

Working constructively with others; sharing credit; avoiding conflict escalation.

Communication

Clarity, structure, and appropriate tone for the audience described in the scenario.

Empathy

Genuine understanding of the perspective of affected parties, not just token acknowledgement.

Ethics

Application of ethical principles with appropriate nuance; avoidance of black-and-white thinking.

Fairness

Equitable treatment of all parties; awareness of bias or conflicting interests.

Motivation

Demonstration of genuine engagement with the scenario rather than performance of expected answers.

Problem Solving

Practical, realistic steps rather than generic or impossibly idealistic resolutions.

Professionalism

Appropriate handling of hierarchy, institutional constraints, and boundaries.

Service Orientation

Consistent orientation toward the needs of patients and communities over personal benefit.

Scores are given on a 1 to 9 scale. The AI also produces a short summary of the strongest element of the response and the single most useful improvement that would move the score.

How the model works

The model and its constraints

The marking is performed by a large language model operating with a low temperature setting. Low temperature means the model is constrained toward consistent, structured outputs rather than creative or unpredictable ones. Two runs of the same response will produce highly similar scores and near-identical feedback themes.
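
For readers curious about why a low temperature gives more consistent output, the general mechanism can be shown with a toy example. This is a generic illustration of temperature scaling, not the product's code: the three logits are invented, and the real model and its exact settings are not specified on this page.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw model scores into probabilities, sharpened by low temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [value / temperature for value in logits]
    largest = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(value - largest) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]

# Invented scores for three candidate next words.
logits = [2.0, 1.0, 0.5]
high = softmax_with_temperature(logits, 1.0)
low = softmax_with_temperature(logits, 0.2)
print([round(p, 3) for p in high])
print([round(p, 3) for p in low])
```

At the lower temperature almost all of the probability sits on the top candidate, so repeated runs are far more likely to choose the same wording and arrive at the same scores.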

The model reads the full transcript of your spoken response, or the full text of your typed response, not just keywords. It can identify that you named empathy in an abstract sense while giving a response that contains no concrete acknowledgement of the person in front of you. It will flag this gap.

The AI is calibrated to the same structural patterns that real examiners are formally trained to reward. Whether it captures the exact human reaction of a specific examiner is a harder question. Whether it catches the structural gaps that will cost you marks at a real interview: yes, reliably.

What the feedback tells you

Every marked response includes:

A score for each criterion (1-5 for MMI, 1-9 for CASPer)
An overall station or response score
A written explanation for each score: what was present and what was absent
A practical note on what a stronger version of the same response would contain
For MMI premium tier: voice quality metrics including pacing, filler word frequency, and clarity

The delta view

When you attempt the same station more than once, your history page shows a criterion-by-criterion comparison between attempts. This lets you confirm that a specific improvement you worked on actually moved the score, rather than inferring improvement from the overall number alone.
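
The comparison behind the delta view amounts to a per-criterion subtraction. The sketch below is illustrative only: criterion_deltas is a hypothetical helper and the two attempts are invented data, not output from the history page.

```python
def criterion_deltas(previous, latest):
    """Return latest minus previous for every criterion present in both attempts."""
    return {
        criterion: latest[criterion] - previous[criterion]
        for criterion in previous
        if criterion in latest
    }

attempt_1 = {"Empathy": 2, "Communication": 4, "Reasoning": 3,
             "Reflection": 3, "Real-world Awareness": 4}
attempt_2 = {"Empathy": 4, "Communication": 4, "Reasoning": 3,
             "Reflection": 2, "Real-world Awareness": 4}

deltas = criterion_deltas(attempt_1, attempt_2)
overall_change = sum(deltas.values()) / len(deltas)
for criterion, change in deltas.items():
    print(f"{criterion}: {change:+d}")
print(f"Overall average change: {overall_change:+.1f}")
```

In this invented example the overall average barely moves, yet Empathy improved by two points while Reflection slipped by one. That is the kind of detail the overall number alone would hide.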

Common misunderstandings

What the AI does not penalise

Filler words and hesitations. "Um," "ah," and brief pauses do not reduce your score. In fact, a response with genuine pauses for thought often scores higher on Reasoning and Reflection than a response that rushes through without them.

Imperfect sentence structure. The AI marks spoken transcripts. Spoken language is not syntactically clean and the model knows this. You will not lose marks for a sentence that trails off and restarts.

Agreement with any particular ethical position. The AI does not have a preferred answer to ethical dilemmas. It marks on whether you reasoned through the dilemma, not on which side you landed.

What the AI does penalise

Skipping empathy entirely. If the scenario involves a person in distress and your response goes straight to action without acknowledgement, that is an Empathy score of 1 or 2 regardless of how good the action is.

Generic responses. A response that could be pasted under any scenario in the same category will score poorly on Communication and Real-world Awareness. The AI checks whether your response is specific to the scenario or is essentially a template.

No visible reasoning structure. Naming competing values and concluding is not enough. The response needs to show that you weighed them, meaning it should say something like "I am giving more weight to X because in this context Y is less at risk" rather than just "there is a tension between X and Y."
