How These AWS AI Practitioner Practice Questions Are Organized
These 30 questions cover all five AIF-C01 exam domains: Fundamentals of AI and ML (Q1–Q6), Fundamentals of Generative AI (Q7–Q12), Applications of Foundation Models (Q13–Q18), Guidelines for Responsible AI (Q19–Q24), and Security, Compliance, and Governance (Q25–Q30). Each question includes the correct answer and a detailed explanation. The real exam has 65 questions (50 scored + 15 unscored), a 90-minute time limit, and a passing score of 700 out of 1,000. Registration is through aws.amazon.com/certification/certified-ai-practitioner for $100.
Domain 1: Fundamentals of AI and ML (Questions 1–6)
Q1. A retail company wants to segment its customers into groups based on purchasing behavior, without any predefined categories. Which type of machine learning is most appropriate?
Answer: Unsupervised learning (clustering). Unsupervised learning finds patterns in unlabeled data without predefined output categories. Supervised learning requires labeled training examples. The key phrase "without predefined categories" indicates unsupervised clustering.
Q2. An ML model achieves 98% accuracy on training data but only 71% accuracy on the validation dataset. What problem does this describe?
Answer: Overfitting. Overfitting occurs when a model learns the training data too specifically—including its noise—and fails to generalize to new data. Solutions include regularization, reducing model complexity, or collecting more diverse training data. The large gap between training and validation performance is the diagnostic signal.
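The diagnostic in this answer can be reduced to a one-line check. The sketch below is a minimal illustration in plain Python; the 0.10 gap threshold is an arbitrary assumption, not an official rule.

```python
def overfit_gap(train_acc: float, val_acc: float, threshold: float = 0.10) -> bool:
    """Flag likely overfitting when training accuracy exceeds validation
    accuracy by more than the chosen threshold (0.10 here is arbitrary)."""
    return (train_acc - val_acc) > threshold

# The scenario from the question: 98% train vs 71% validation accuracy.
print(overfit_gap(0.98, 0.71))  # True: a 27-point gap signals overfitting
print(overfit_gap(0.90, 0.88))  # False: a 2-point gap is normal generalization loss
```

In practice this kind of check runs automatically after each training job, gating model promotion.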
Q3. A medical imaging company needs to detect rare diseases in X-ray scans. Missing a disease (false negative) is far more costly than a false alarm (false positive). Which evaluation metric should they prioritize?
Answer: Recall (also called sensitivity or true positive rate). Recall measures what proportion of actual positive cases the model correctly identifies. When false negatives are the most costly error type—as in disease detection—maximizing recall is the priority, even if it means accepting more false positives.
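The trade-off between recall and precision is easy to see when computed directly from confusion-matrix counts. This sketch uses hypothetical screening numbers chosen only for illustration.

```python
def recall(tp: int, fn: int) -> float:
    # Recall = TP / (TP + FN): the share of actual positives the model catches.
    return tp / (tp + fn)

def precision(tp: int, fp: int) -> float:
    # Precision = TP / (TP + FP): the share of flagged cases that are real.
    return tp / (tp + fp)

# Hypothetical screening run: 90 diseases caught, 10 missed, 40 false alarms.
print(recall(tp=90, fn=10))     # 0.9
print(precision(tp=90, fp=40))  # ~0.692: lower, but acceptable in this scenario
```

Tuning the classification threshold lower raises recall (fewer missed diseases) at the cost of precision, which is exactly the trade the question asks you to accept.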
Q4. A company trains an ML model on customer service transcripts from the past five years, then deploys it to classify new tickets. The model performs significantly worse six months after deployment. What is the most likely cause?
Answer: Data drift (also called distributional shift). The statistical properties of the input data have changed over time—new product categories, changed customer language, different ticket types—making the training data no longer representative of current inputs. (The related term concept drift refers to a change in the relationship between inputs and labels, rather than in the inputs themselves.) Monitoring for drift and retraining periodically addresses this.
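Drift monitoring can be as simple as comparing a feature's current distribution to its training-time baseline. The sketch below uses plain Python and hypothetical ticket-length numbers; production systems typically use richer statistics (PSI, KS tests) and per-feature monitoring.

```python
from statistics import mean, stdev

def drift_score(baseline: list[float], current: list[float]) -> float:
    """Standardized shift of the current mean relative to the baseline distribution.
    A large score suggests the inputs no longer look like the training data."""
    return abs(mean(current) - mean(baseline)) / stdev(baseline)

baseline = [10.0, 12.0, 11.0, 13.0, 9.0, 12.0]   # ticket lengths at training time (hypothetical)
current  = [25.0, 27.0, 24.0, 26.0, 28.0, 25.0]  # ticket lengths six months later
print(drift_score(baseline, current) > 3.0)      # True: a large shift, likely drift
```

A score crossing a threshold would trigger an alert or a retraining job rather than silently degrading predictions.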
Q5. Which of the following best describes the difference between artificial intelligence and machine learning?
Answer: Machine learning is a subset of AI in which systems learn patterns from data rather than following explicitly programmed rules. AI is the broader category of systems performing tasks that typically require human intelligence. ML is one specific approach within AI—learning from data. Other AI approaches include rule-based expert systems and search algorithms.
Q6. During the ML pipeline, a data scientist discovers that the training dataset contains 95% non-fraudulent transactions and only 5% fraudulent ones. What problem does this describe, and what is a common solution?
Answer: Class imbalance. A model trained on severely imbalanced data will learn to predict the majority class almost always, achieving high accuracy while failing at the actual task (detecting fraud). Common solutions include oversampling the minority class (e.g., SMOTE), undersampling the majority class, or using class-weighted loss functions during training.
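The class-weighting solution can be sketched concretely. The function below mirrors the common "balanced" heuristic (n_samples / (n_classes × class_count)) used by several ML libraries; the label names are illustrative.

```python
from collections import Counter

def balanced_class_weights(labels: list[str]) -> dict[str, float]:
    """Weight each class inversely to its frequency, so rare classes
    contribute as much total loss as common ones during training."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}

# 95% legitimate vs 5% fraudulent transactions, as in the question.
labels = ["legit"] * 95 + ["fraud"] * 5
weights = balanced_class_weights(labels)
print(weights)  # fraud gets weight 10.0; legit gets ~0.53
```

With these weights in the loss function, each missed fraud case costs the model roughly 19 times as much as a missed legitimate transaction.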
Domain 2: Fundamentals of Generative AI (Questions 7–12)
Q7. A developer is building a customer support chatbot and wants it to always respond in a formal, professional tone. Which component of the prompt structure is most appropriate for setting this behavior?
Answer: A system prompt. System prompts set the model's overall behavior, persona, and constraints before any user interaction occurs. They persist across the conversation and are the correct mechanism for enforcing consistent tone, format, or behavioral guidelines.
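The mechanics can be shown with the widely used chat-message convention (a list of role/content dictionaries); exact field names vary by provider, so treat this as a sketch rather than any specific API.

```python
def build_messages(system_prompt: str, history: list[dict], user_input: str) -> list[dict]:
    """Assemble a chat request: the system prompt always comes first,
    so the tone instruction persists across every turn of the conversation."""
    return [{"role": "system", "content": system_prompt},
            *history,
            {"role": "user", "content": user_input}]

messages = build_messages(
    "You are a customer support assistant. Always respond in a formal, professional tone.",
    history=[],
    user_input="hey my order is late!!",
)
print(messages[0]["role"])  # system: the behavior constraint leads every request
```

Because the system message is re-sent with every call, the formal tone holds even as the user's own register changes.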
Q8. A company's foundation model frequently produces outdated information about its product catalog, which changes weekly. The company does not want to retrain the model every week. Which approach addresses this?
Answer: Retrieval-augmented generation (RAG). RAG retrieves relevant, up-to-date documents at inference time and includes them in the prompt context, grounding the model's response in current information without modifying model weights. Fine-tuning would require weekly retraining—expensive and slow. RAG provides freshness without retraining.
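The RAG flow—retrieve relevant text, then inject it into the prompt—can be sketched without any AWS services. This toy version scores documents by word overlap; real systems use embedding similarity or a managed retriever, and the catalog entries here are invented.

```python
def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents sharing the most words with the query.
    A stand-in for embedding-based retrieval, to keep the sketch dependency-free."""
    q = set(query.lower().split())
    return sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)[:k]

docs = [
    "SKU-42 Trail Jacket: price $129, waterproof, updated 2024-06-03.",  # hypothetical
    "SKU-17 Road Shoe: price $89, discontinued.",                        # hypothetical
]
context = retrieve("what is the price of the trail jacket", docs)[0]
prompt = (f"Answer using only this context:\n{context}\n\n"
          f"Question: What does the Trail Jacket cost?")
print(context.startswith("SKU-42"))  # True: the jacket entry was retrieved
```

Updating the catalog means updating `docs` (or the index behind it); the model's weights never change, which is the whole point of RAG here.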
Q9. Which scenario is best suited for fine-tuning a foundation model rather than using RAG?
Answer: A company wants the model to consistently use its internal legal terminology and response style across all outputs. Fine-tuning adapts model behavior and style by updating weights on domain-specific training data. It is best when the goal is changing how the model writes or reasons—not just what information it accesses. RAG is better for adding factual knowledge; fine-tuning is better for adapting behavior and tone.
Q10. A prompt reads: "Classify the following customer review as positive, neutral, or negative. Here are three examples: [example 1 - positive], [example 2 - neutral], [example 3 - negative]. Now classify: [review]." What prompting technique is this?
Answer: Few-shot prompting. Few-shot prompting provides multiple examples (two or more) of the desired task before the actual request. Three examples make this few-shot (not zero-shot or one-shot). The examples guide the model on format and classification criteria.
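Assembling such a prompt programmatically makes the pattern explicit. The example reviews below are invented for illustration.

```python
def few_shot_prompt(task: str, examples: list[tuple[str, str]], query: str) -> str:
    """Lay out labeled examples before the real input, the pattern
    the question describes as few-shot prompting."""
    shots = "\n".join(f"Review: {text}\nLabel: {label}" for text, label in examples)
    return f"{task}\n\n{shots}\n\nReview: {query}\nLabel:"

prompt = few_shot_prompt(
    "Classify the following customer review as positive, neutral, or negative.",
    [("Love it, works perfectly.", "positive"),
     ("It arrived on time. It is a phone case.", "neutral"),
     ("Broke after two days.", "negative")],
    "Battery died within a week.",
)
print(prompt.count("Label:"))  # 4: three example labels plus the final blank one
```

Ending the prompt at the bare final "Label:" nudges the model to complete it with exactly one of the demonstrated labels.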
Q11. A generative AI model produces a highly detailed but completely fabricated product specification for a product that does not exist. This information is presented confidently. What term describes this behavior?
Answer: Hallucination. LLMs generate statistically plausible continuations of text, which can produce confident-sounding false information. Hallucination is a known limitation of generative AI models. Mitigation approaches include RAG (grounding responses in verified documents) and human review before publication.
Q12. A model's context window is 8,192 tokens. A user submits a 10,000-token document for summarization. What happens?
Answer: The model cannot process the entire document in a single inference call—the input exceeds the context window limit. Solutions include chunking the document into segments small enough for the context window, using a model with a larger context window, or using a map-reduce summarization approach that summarizes chunks and then combines the summaries.
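The chunking solution is straightforward to sketch. Token counts below match the question's numbers; the 512-token reserve for instructions and output is an assumed figure, not a standard.

```python
def chunk_tokens(tokens: list[str], window: int, reserve: int = 512) -> list[list[str]]:
    """Split a token sequence into pieces that fit the context window,
    reserving room for the instruction prompt and the generated summary."""
    size = window - reserve
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]

tokens = [f"t{i}" for i in range(10_000)]   # the 10,000-token document
chunks = chunk_tokens(tokens, window=8_192)
print(len(chunks), len(chunks[0]))          # 2 chunks; the first holds 7,680 tokens
```

In the map-reduce variant, each chunk is summarized independently (map), and the chunk summaries are then summarized together (reduce) to produce the final output.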
Domain 3: Applications of Foundation Models (Questions 13–18)
Q13. A company needs to extract structured data—invoice numbers, dates, and line items—from thousands of scanned PDF invoices. Which AWS service is most appropriate?
Answer: Amazon Textract. Textract is purpose-built for extracting text and structured data from scanned documents, forms, and PDFs—including tables and key-value pairs. Rekognition handles image analysis but is not optimized for document data extraction. Comprehend extracts entities from text but requires text input, not raw document images.
Q14. A company wants to build a chatbot that answers questions about its internal knowledge base using natural language. The knowledge base is stored as thousands of Word documents. Which combination of AWS services is most appropriate?
Answer: Amazon Kendra (for document indexing and retrieval) + Amazon Bedrock (for response generation). This is a RAG architecture: Kendra retrieves relevant document passages in response to a query; Bedrock generates a natural language answer grounded in those passages. Amazon Lex handles conversation flow if needed.
Q15. A media company needs to automatically moderate user-uploaded images to detect explicit content at scale. Which AWS service handles this?