100 Must-Know AI & NLP MCQs to Crack Your Next Competitive Exam

Artificial Intelligence Comprehensive MCQs

Master Natural Language Processing (NLP), Text Processing, Language Understanding, Chatbots, and Large Language Models (LLMs). Perfect preparation for advanced technical interviews and competitive computer science examinations.

Part 1: Text Processing (Q1 - Q25)

Q1. Which of the following normalization techniques reduces inflected words to a base form that is itself a valid dictionary word?

Explanation: Lemmatization uses morphological analysis and a vocabulary dictionary to return the true base form (lemma), unlike stemming which blindly chops off suffixes.

Q2. Which subword tokenization algorithm initializes its vocabulary with individual characters and iteratively merges the most frequent symbol pairs?

Explanation: Byte Pair Encoding (BPE) starts from individual characters (or bytes), counts adjacent symbol pairs, and repeatedly merges the most frequent pair into a new, longer token.
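The merge loop can be sketched in a few lines. This is a minimal illustration under assumptions, not a production tokenizer: the function names and the toy word counts are hypothetical, and real implementations add end-of-word markers and byte-level handling.

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0] if pairs else None

def merge_pair(words, pair):
    """Rewrite every word so that each occurrence of `pair` becomes one symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged

def learn_bpe(corpus_counts, num_merges):
    """Start from single characters and apply `num_merges` greedy merges."""
    words = {tuple(word): count for word, count in corpus_counts.items()}
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:
            break
        merges.append(pair)
        words = merge_pair(words, pair)
    return merges, words

# Hypothetical toy corpus: word -> frequency.
merges, words = learn_bpe({"hug": 10, "pug": 5, "hugs": 5}, num_merges=2)
print(merges)
```

In this toy corpus the pair ("u", "g") is the most frequent, so it is merged first, and ("h", "ug") follows.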

Q3. What problem does the Porter Stemmer attempt to address using programmatic heuristic rules?

Explanation: The Porter Stemmer relies on explicit, cascading algorithmic rules to strip suffixes from English words to canonicalize text.

Q4. In Text Processing, what does the term 'Stop Words' typically refer to?

Explanation: Stop words (like 'the', 'is', 'at') are very frequent function words that are often filtered out during indexing because they carry little discriminative meaning for retrieval.

Q5. Which technique transforms text into fixed-length numerical vectors based solely on cumulative term occurrence counts?

Explanation: Bag-of-Words discards sequential grammar and word position, focusing strictly on token frequency distributions.
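A small sketch of the idea, assuming a hypothetical three-word vocabulary and a simple regex tokenizer chosen only for illustration. It also shows the cost of discarding order: two differently ordered phrases collapse to the same vector.

```python
import re
from collections import Counter

def bag_of_words(text, vocabulary):
    """Return a fixed-length count vector, one slot per vocabulary term."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]

vocabulary = ["bad", "good", "not"]  # hypothetical vocabulary
print(bag_of_words("not bad", vocabulary))
print(bag_of_words("bad, not", vocabulary))
```

Both calls produce the same vector because only the counts survive, not the positions.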

Q6. What formula computes Inverse Document Frequency (IDF) for term $t$ given total documents $D$?

Explanation: $\log(D / \text{DF}(t))$, where $\text{DF}(t)$ is the number of documents containing $t$, penalizes common words appearing in many documents and boosts rare, distinctive terms.

Q7. Which regular expression character is used to match the exact start of a string or line?

Explanation: The caret symbol (^) serves as an anchor asserting the match starts at the beginning of the line/string.

Q8. If an N-gram model operates at a value of $N=3$, what is it called?

Explanation: An $N=3$ model processes sliding sequences of three consecutive tokens, termed a trigram.
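As a quick illustration, a sliding window over a token list yields the n-grams. The helper name `ngrams` is an assumption made for this sketch.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the quick brown fox".split()
print(ngrams(tokens, 3))
```

With four tokens and $N=3$ there are exactly two trigrams; a sequence shorter than $N$ yields none.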

Q9. What occurs during under-stemming in text preprocessing?

Explanation: Under-stemming occurs when the stemmer's rules are too conservative, so morphologically related forms that should share a stem are left with different ones.

Q10. Which tokenization approach natively eliminates Out-of-Vocabulary (OOV) errors by falling back to character levels?

Explanation: Subword tokenization (like BPE, WordPiece) breaks novel words down into known pieces or base characters, which largely removes OOV errors; byte-level variants remove them entirely because any string can be expressed as bytes.

Q11. What is the fundamental goal of text normalization?

Explanation: Normalization merges variations (e.g., lowercasing, converting "u.s.a" to "usa") so downstream algorithms process them identically.

Q12. In the context of string distance metrics, what operations are permitted to compute Levenshtein distance?

Explanation: Levenshtein distance measures the minimum number of single-character insertions, deletions, or substitutions required to change one string into another.
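The definition translates directly into a dynamic-programming table. The sketch below keeps only two rows of that table; the function name is an assumption for illustration.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep if equal)
            ))
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3
```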

Q13. Why is text lowercasing applied cautiously in production NLP pipelines?

Explanation: Lowercasing erases the capitalization cues that mark proper nouns (e.g., company names or surnames), which can degrade the accuracy of NER extractors.

Q14. What does a Document-Term Matrix represent?

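Explanation: A Document-Term Matrix (DTM) has one row per document and one column per vocabulary term; each cell holds that term's count or TF-IDF weight in that document, so the collection can be handled with ordinary vector and matrix operations.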

Q15. Which of the following is an example of an over-stemming error?

Explanation: Over-stemming occurs when words with distinct meanings are aggressively stripped to the same root, losing semantic clarity.

Q16. What is the length of each vector when one-hot encoding a vocabulary of 5,000 unique tokens?

Explanation: One-hot encoding creates a vector whose length equals the vocabulary size, with a single 1 at the token's index and zeros everywhere else, so each vector here has 5,000 dimensions.

Q17. Which of these is a major weakness of the Bag-of-Words representation model?

Explanation: By reducing text to unstructured frequency buckets, BoW discards context (e.g., "not bad" vs "bad, not").

Q18. What regex token combination matches zero or more occurrences of any generic character?

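Explanation: The dot (.) matches any single character (by default, any character except a newline), and the asterisk (*) allows zero or more repetitions of the preceding element, so `.*` matches any run of characters, including an empty one.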

Q19. In SentencePiece tokenization, how are whitespaces natively preserved within the input text?

Explanation: SentencePiece treats input text as a raw character stream and replaces each space with the meta-symbol "▁" (U+2581), so tokenization stays fully reversible.

Q20. What text processing operation removes HTML tags, stripping anchors like `<p>` or `<div>`?

Explanation: Text cleaning or sanitization extracts raw human readable sentences while removing noise like structural code tags.

Q21. If a word occurs in every single document in a corpus, what will its final TF-IDF weight be?

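Explanation: Under the plain IDF definition $\log(D / \text{DF}(t))$, a term that appears in every document has $D / \text{DF}(t) = 1$, and $\log(1) = 0$. Its TF-IDF weight is therefore zero regardless of term frequency, marking it as a corpus-wide non-discriminative term. Smoothed IDF variants add constants and give a small non-zero weight instead.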
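A worked example of the unsmoothed formula, with a hypothetical three-document corpus and raw counts as term frequency. Library implementations often use smoothed variants, so their numbers may differ.

```python
import math

def idf(term, documents):
    """Unsmoothed IDF: log(D / DF(t)), with DF(t) the number of documents containing t."""
    df = sum(1 for doc in documents if term in doc)
    if df == 0:
        raise ValueError(f"term {term!r} does not occur in the corpus")
    return math.log(len(documents) / df)

def tf_idf(term, doc, documents):
    """Raw term count in `doc` multiplied by the corpus-level IDF."""
    return doc.count(term) * idf(term, documents)

# Hypothetical tokenized corpus.
documents = [
    ["the", "cat", "sat"],
    ["the", "dog", "barked"],
    ["the", "cat", "purred"],
]
print(tf_idf("the", documents[0], documents))  # 0.0, since "the" is in every document
print(tf_idf("dog", documents[1], documents))  # log(3), since "dog" is in one of three
```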

Q22. Which process matches variable phrase strings like "Jan 1st, 2026" or "01/01/2026" into a unified internal date format?

Explanation: Canonicalization involves mapping diverse variations of a semantic concept to a singular standard baseline structure.

Q23. What character sequence does the regex pattern `\d+` match?

Explanation: `\d` matches a single digit (0 to 9), and the quantifier `+` requires one or more consecutive occurrences, so `\d+` matches a run of one or more digits.
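The regex tokens covered in this part (`^`, `.*`, and `\d+`) can be tried out with Python's built-in `re` module. The sample sentence is made up for illustration.

```python
import re

text = "Q23 costs 150 rupees, not 1500."  # hypothetical sample string

# \d+ : every run of one or more digits
numbers = re.findall(r"\d+", text)

# ^ : anchor the match to the start of the string
starts_with_q = re.search(r"^Q", text) is not None

# .* : greedy "any characters", here stretching from the first 'c' to the last 's'
greedy = re.search(r"c.*s", text).group()

print(numbers)        # ['23', '150', '1500']
print(starts_with_q)  # True
print(greedy)         # costs 150 rupees
```

Note that in Python string patterns `\d` also matches digits from other Unicode scripts unless the `re.ASCII` flag is set.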

Q24. What does Zipf's Law imply about word frequencies in natural language corpora?

Explanation: Zipf's law states that a word's frequency is roughly inversely proportional to its frequency rank, which explains why a handful of stop words dominate corpus counts while most words are rare.

Q25. Which library is widely used in Python specifically for rule-based text processing and tokenization?

Explanation: NLTK (Natural Language Toolkit) is a foundational Python library built for educational and standard rule-based NLP text processing.

Part 2: NLP Fundamentals (Q26 - Q50)

Q26. What task is performed when words are annotated as Nouns, Verbs, Adjectives, or Adverbs?

Explanation: POS tagging assigns grammatical categories to tokens based on both their definition and context.

Q27. In Named Entity Recognition (NER), which of the following would be extracted from the phrase "Google acquired YouTube in 2006"?

Explanation: NER identifies and classifies real-world entities into predefined categories like organizations, locations, and temporal values.

Q28. What architecture implements a continuous bag-of-words (CBOW) setup by predicting a target word from its surrounding context?

Explanation: Google's Word2Vec introduced CBOW (predicting target from context) and Skip-gram (predicting context from target).

Q29. What is the fundamental difference between Word2Vec and GloVe?

Explanation: GloVe (Global Vectors) utilizes matrix factorization on global token co-occurrence statistics, whereas Word2Vec relies on local context optimization.

Q30. What metric measures the angle between two word vectors to determine semantic similarity?

Explanation: Cosine similarity normalizes vector magnitude, measuring direction overlap to evaluate semantic alignment.
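In formula terms, cosine similarity is the dot product divided by the product of the two vector norms. A minimal sketch, with made-up two-dimensional vectors standing in for word embeddings:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between u and v: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

print(cosine_similarity([1, 2], [2, 4]))   # same direction, close to 1
print(cosine_similarity([1, 0], [0, 1]))   # orthogonal, 0
print(cosine_similarity([1, 0], [-1, 0]))  # opposite, -1
```

Because the magnitudes are divided out, [1, 2] and [2, 4] count as maximally similar even though their lengths differ.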

Q31. Which type of parser produces a tree in which words are connected by directed grammatical links?

Explanation: Dependency parsing establishes binary grammatical relationships directly between words (e.g., subject, object, modifier), rather than grouping them into nested phrases.

Q32. What is the purpose of the Viterbi Algorithm in hidden Markov models for POS tagging?

Explanation: The Viterbi algorithm uses dynamic programming to efficiently compute the optimal sequence of hidden states in an HMM graph.
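A compact sketch of the dynamic program for a two-tag HMM. All probabilities below are invented for illustration, and the code multiplies raw probabilities for readability; a practical implementation would work in log space to avoid underflow on long sentences.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state sequence for the observations."""
    # best[t][s]: probability of the best path that ends in state s at time t
    best = [{s: start_p[s] * emit_p[s].get(observations[0], 0.0) for s in states}]
    back = [{}]  # back[t][s]: previous state on that best path
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev_state = max(states, key=lambda r: best[-1][r] * trans_p[r][s])
            scores[s] = (best[-1][prev_state] * trans_p[prev_state][s]
                         * emit_p[s].get(obs, 0.0))
            pointers[s] = prev_state
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    return path[::-1]

# Hypothetical toy HMM for POS tagging.
states = ["NOUN", "VERB"]
start_p = {"NOUN": 0.8, "VERB": 0.2}
trans_p = {"NOUN": {"NOUN": 0.3, "VERB": 0.7}, "VERB": {"NOUN": 0.6, "VERB": 0.4}}
emit_p = {"NOUN": {"dogs": 0.6, "bark": 0.1}, "VERB": {"dogs": 0.05, "bark": 0.5}}

print(viterbi(["dogs", "bark"], states, start_p, trans_p, emit_p))  # ['NOUN', 'VERB']
```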

Q33. What is syntactic ambiguity in natural language?

Explanation: Syntactic ambiguity arises from structural placement (e.g., "I saw the man with the telescope" could mean I used the telescope, or the man had it).

Q34. Which framework groups phrases into structural components like Noun Phrase (NP) and Verb Phrase (VP)?

Explanation: Constituency parsing models sentence structure by recursively nesting phrasal components according to formal grammar rules.

Q35. What limitation of traditional static word embeddings (like Word2Vec) led to the development of contextual models?

Explanation: Static embeddings assign the same vector to homonyms, merging distinct meanings into a single vector representation.

Q36. In the BIO tagging scheme used for entity extraction, what do the acronym letters represent?

Explanation: BIO tags trace token positions in multi-word entities: B denotes the start, I indicates internal continuation, and O represents non-entity tokens.
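To make the scheme concrete, the sketch below turns a BIO-tagged token sequence back into entity spans. The function name, the example sentence, and the choice to silently drop a stray `I-` tag are assumptions for illustration.

```python
def bio_to_spans(tokens, tags):
    """Group tokens into (entity_text, label) spans according to their BIO tags."""
    spans, current, label = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always opens a new entity, closing any open one first.
            if current:
                spans.append((" ".join(current), label))
            current, label = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == label:
            current.append(token)  # continuation of the open entity
        else:
            # "O" or an inconsistent I- tag: close the open entity, if any.
            if current:
                spans.append((" ".join(current), label))
            current, label = [], None
    if current:
        spans.append((" ".join(current), label))
    return spans

tokens = ["New", "York", "Times", "hired", "Ada", "Lovelace"]
tags = ["B-ORG", "I-ORG", "I-ORG", "O", "B-PER", "I-PER"]
print(bio_to_spans(tokens, tags))
```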

Q37. What is a "Language Model" fundamentally trained to estimate?

Explanation: A language model computes the probability $P(w_1, w_2, \dots, w_n)$ of a token sequence or predicts the next token given preceding context.

Q38. Which neural architecture uses feedback loops to process sequential inputs, passing a hidden state from one time step to the next?

Explanation: RNNs feature internal recurrent connections that preserve sequence history across time steps, making them well-suited for natural language modeling.

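Q39. What problem causes vanilla RNNs to lose long-range dependencies during backpropagation through time?

Explanation: Backpropagation through time multiplies the gradient by the recurrent weight matrix (and activation derivatives) once per step. When these factors are small, the gradient shrinks exponentially over long sequences, so early time steps receive almost no learning signal. This is the vanishing gradient problem.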

Q40. How does a Long Short-Term Memory (LSTM) network mitigate the vanishing gradient problem?

Explanation: LSTMs introduce input, forget, and output gates along with a cell state that is updated additively, allowing gradients to flow across many time steps with far less attenuation.

Q41. What is the main structural simplification introduced by a Gated Recurrent Unit (GRU) compared to an LSTM?

Explanation: GRUs simplify the architecture by combining the cell and hidden states, utilizing only two gates (reset and update) for faster training.

Q42. In NLP, what does "Lemmatization" require that "Stemming" does not?

Explanation: True lemmatization requires the word's POS context (e.g., determining whether "saw" functions as a noun or a verb) to resolve the correct base lemma.

Q43. What is the primary purpose of applying Add-1 (Laplace) Smoothing to an N-gram language model?

Explanation: Laplace smoothing adds one to all counts, ensuring that unseen N-grams receive a small non-zero probability instead of zeroing out the entire sequence product.
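For a bigram model the smoothed estimate is $P(w \mid w_{prev}) = \frac{C(w_{prev}, w) + 1}{C(w_{prev}) + V}$, where $V$ is the vocabulary size. A minimal sketch with a made-up six-token corpus:

```python
from collections import Counter

def laplace_bigram_prob(tokens, prev, word):
    """Add-1 smoothed P(word | prev) estimated from a token list."""
    vocab_size = len(set(tokens))
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    context_counts = Counter(tokens[:-1])  # how often each token starts a bigram
    return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + vocab_size)

tokens = "the cat sat on the mat".split()  # hypothetical corpus
print(laplace_bigram_prob(tokens, "the", "cat"))  # seen bigram: (1 + 1) / (2 + 5)
print(laplace_bigram_prob(tokens, "the", "sat"))  # unseen bigram: (0 + 1) / (2 + 5)
```

The unseen bigram gets a small non-zero probability, and the probabilities over the whole vocabulary still sum to 1 for a given context.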

Q44. Which algorithm can train word representations using subword information, allowing vectors to be generated even for Out-Of-Vocabulary terms?

Explanation: FastText models words as bags of character n-grams, enabling it to construct representations for unseen words based on subword strings.

Q45. What represents the vector space arithmetic formula demonstrating semantic relational properties within embeddings?

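Explanation: The famous analogy $\text{vec}(\text{king}) - \text{vec}(\text{man}) + \text{vec}(\text{woman}) \approx \text{vec}(\text{queen})$ demonstrates that vector offsets capture semantic relationships (like gender) as consistent directions in the embedding space.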

Q46. What NLP task automatically categorizes a full text block into labels like 'Spam' or 'Not Spam'?

Explanation: Text classification assigns a categorical label to an entire document or sequence based on its global text features.

Q47. What does perplexity evaluate in language modeling?

Explanation: Perplexity is the exponentiated average cross-entropy (negative log-likelihood per token) of a model on test text; a lower perplexity means the model assigns higher probability to that text.
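Written out, perplexity is $\exp\left(-\frac{1}{N}\sum_{i=1}^{N} \ln p_i\right)$, where $p_i$ is the probability the model gave to the $i$-th test token. A small sketch with made-up probabilities:

```python
import math

def perplexity(token_probs):
    """Exponentiated average negative log-likelihood of the observed tokens."""
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)

# A model that always spreads probability evenly over 4 choices
# has perplexity 4 (up to floating-point rounding).
print(perplexity([0.25, 0.25, 0.25, 0.25]))
print(perplexity([0.9, 0.8, 0.95]))  # a better-fitting model scores lower
```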

Q48. Which tool is a specialized library optimized for industrial-strength deep learning pipelines and production-grade NLP parsing?

Explanation: spaCy is a modern, high-performance Python library optimized for production NLP pipelines and deep learning integrations.

Q49. What is a "Chomsky Normal Form" parsing constraint in context-free grammars?

Explanation: Chomsky Normal Form simplifies context-free grammars by requiring all production rules to yield either two non-terminals or a single terminal.

Q50. What core problem does a Conditional Random Field (CRF) layer solve in structured sequence labeling?

Explanation: CRFs optimize sequence labeling by modeling the global dependencies among adjacent labels, preventing invalid sequences like an `I-ORG` tag following an `O` tag.

Part 3: Language Understanding (Q51 - Q75)

Q51. What task resolves which distinct real-world expressions or pronouns refer to the same entity (e.g., mapping "Michael" and "he" to the same person)?

Explanation: Coreference resolution links expressions in a text to the correct underlying real-world entities.

Q52. In Semantic Role Labeling (SRL), what is the primary objective?

Explanation: SRL identifies semantic roles (such as agent, patient, or instrument) associated with the predicates in a sentence.

Q53. What type of text summarization creates novel sentences to compress source text instead of recycling existing phrases?

Explanation: Abstractive summarization generates entirely new phrasing to paraphrase source text, whereas extractive summarization copies key phrases directly.

Q54. What challenge describes an evaluation system struggling with sentiment sarcasm, such as "I love waiting hours in line!"?

Explanation: Sarcasm relies on pragmatic context, where the literal definition of the words contrasts with the speaker's true intent.

Q55. Which architectural bottleneck does the "Attention Mechanism" solve in Seq2Seq machine translation models?

Explanation: Attention allows the decoder to look back at all source token hidden states, eliminating the reliance on a single fixed context vector.

Q56. What type of ambiguity occurs when a single word token has multiple distinct dictionary definitions, such as "crane"?

Explanation: Lexical ambiguity occurs when an individual word has multiple distinct meanings, requiring contextual disambiguation.

Q57. What standard task is defined by determining if a premise sentence logically entails, contradicts, or remains neutral toward a hypothesis sentence?

Explanation: Natural Language Inference (NLI), also known as Recognizing Textual Entailment (RTE), classifies whether a premise entails, contradicts, or is neutral toward a hypothesis.

Q58. Which metric combines n-gram precision scores through a geometric mean to evaluate machine translation quality against human references?

Explanation: BLEU (Bilingual Evaluation Understudy) evaluates translation quality by taking the geometric mean of modified n-gram precisions between the model output and human references, multiplied by a brevity penalty for outputs that are too short.

Q59. What standard evaluation benchmark checks models for multi-task language understanding across domains like math, history, and law?

Explanation: MMLU evaluates zero-shot and few-shot multi-task language understanding across various academic subjects.

Q60. Which of these describes an anaphora resolution subproblem?

Explanation: Anaphora resolution links a pronoun back to its preceding noun phrase anchor.

Q61. What does a "Dependency Relation" link indicate between two tokens in a sentence?

Explanation: Dependency relationships map asymmetric syntactic links between a structural head token and its modifiers.

Q62. What lexical resource organizes English words into synsets (sets of synonyms) linked by semantic relations?

Explanation: WordNet is a widely used lexical database that groups nouns, verbs, adjectives, and adverbs into cognitive synonym sets called synsets.

Q63. What relationship exists between a "Hypernym" and a "Hyponym"?

Explanation: Hypernymy captures hierarchical "is-a" relationships, mapping a general category down to specific hyponym instances.

Q64. What evaluation metric measures overlap between candidate summaries and reference summaries using recall metrics over n-grams?

Explanation: ROUGE (Recall-Oriented Understudy for Gisting Evaluation) measures n-gram recall, making it well-suited for evaluating text summarization models.

Q65. What challenge is illustrated by the Winograd Schema Challenge (e.g., "The trophy did not fit in the suitcase because it was too big. What was too big?")?

Explanation: Winograd schemas present pairs of sentences containing a pronoun whose referent flips when a single word is changed (e.g., "big" to "small"), so resolving it requires commonsense reasoning rather than surface statistics.

Q66. What layer handles semantic sentence encoding in a standard bi-directional recurrent encoder architecture?

Explanation: Bidirectional networks concatenate internal hidden states from forward and backward passes to capture context from both sides of a token.

Q67. What aspect of language does "Pragmatics" study?

Explanation: Pragmatics analyzes how context, social conventions, and intent shape interpretation beyond literal dictionary meanings.

Q68. What is the distinction between structural syntax and semantics?

Explanation: As Noam Chomsky demonstrated with "Colorless green ideas sleep furiously," a sentence can be syntactically correct while remaining semantically nonsensical.

Q69. Which type of vector space model extracts hidden topics from a document collection using singular value decomposition on term matrices?

Explanation: LSA applies Singular Value Decomposition (SVD) to a document-term matrix to discover latent semantic structures in text.

Q70. What does a "Sentiment Lexicon" contain?

Explanation: Sentiment lexicons (like SentiWordNet) map explicit sentiment valence weights directly to words or synsets for rule-based polarity calculation.

Q71. What does the GLUE (General Language Understanding Evaluation) benchmark measure?

Explanation: GLUE is a collection of diverse natural language understanding tasks, such as sentiment classification, paraphrase detection, and inference, designed to evaluate a single model's performance across all of them.

Q72. Which relationship model maps a specific part-to-whole connection between two lexical tokens (e.g., "wheel" to "car")?

Explanation: Meronymy is a semantic relation where one term denotes a constituent part of, or a member of, something larger.

Q73. What is the primary purpose of Intent Detection in natural language understanding pipelines?

Explanation: Intent detection identifies the user's core objective (e.g., "book_flight" or "check_weather") to guide downstream systems.

Q74. In slot-filling tasks for conversational text, what are "Slots"?

Explanation: Slots are the specific variables (e.g., departure date, destination city) required to fulfill a detected intent.

Q75. What problem does a context-free grammar face when parsing highly flexible, free-word-order languages like Sanskrit or Finnish?

Explanation: CFGs rely on ordered structural rules, which can lead to a combinatorial explosion when trying to model languages with flexible word order.

Part 4: Chatbots and LLMs (Q76 - Q100)

Q76. Which core mechanism enables the Transformer architecture to process all tokens in parallel, replacing sequential recurrent steps?

Explanation: Scaled Dot-Product Self-Attention allows Transformers to model relationships between all tokens in a sequence simultaneously, enabling efficient parallel training.
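The computation is $\text{softmax}(QK^\top / \sqrt{d_k})\,V$. The sketch below spells it out with plain Python lists for a single head; the tiny matrices are invented for illustration, and real implementations use batched tensor operations, masking, and multiple heads.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """For each query row, return a weighted average of the value rows."""
    d_k = len(K[0])
    outputs = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))
        ])
    return outputs

# Hypothetical 2-token sequence with 2-dimensional queries, keys, and values.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
for row in scaled_dot_product_attention(Q, K, V):
    print(row)
```

Every output row depends only on Q, K, and V, not on the previous row's result, which is why all positions can be computed in parallel.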

Q77. What type of Transformer model is BERT?

Explanation: BERT (Bidirectional Encoder Representations from Transformers) uses an encoder architecture to process context from both left and right simultaneously.

Q78. What objective is used to pre-train autoregressive language models like the GPT family?

Explanation: GPT models are trained on causal language modeling, optimizing the network to predict the next token given preceding context.

Q79. Why do Transformers require "Positional Encodings" added directly to their input word embeddings?

Explanation: Without recurrent loops or convolutional windows, self-attention treats sequences as unordered bags of tokens, requiring positional encodings to preserve word order.
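One common choice, assumed here for illustration, is the fixed sinusoidal encoding from the original Transformer paper: even dimensions use a sine and odd dimensions a cosine of the position divided by a dimension-dependent wavelength. Many current models use learned or rotary position schemes instead.

```python
import math

def sinusoidal_position(pos, d_model):
    """Sinusoidal encoding vector for one position (d_model assumed even)."""
    encoding = []
    for i in range(d_model):
        # Dimensions 2k and 2k+1 share the same frequency 1 / 10000^(2k / d_model).
        pair_index = i - (i % 2)
        angle = pos / (10000 ** (pair_index / d_model))
        encoding.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return encoding

print(sinusoidal_position(0, 4))  # [0.0, 1.0, 0.0, 1.0]
print(sinusoidal_position(1, 4))
```

The resulting vector is added element-wise to the token embedding, so two identical tokens at different positions get different inputs.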

Q80. What training step aligns a base LLM with human preferences using feedback scores via reinforcement learning?

Explanation: Reinforcement Learning from Human Feedback (RLHF) trains a reward model on human preference data and then fine-tunes the LLM against that reward with reinforcement learning, typically using PPO.

Q81. What phenomenon occurs when an LLM confidently generates factually incorrect or ungrounded assertions?

Explanation: Hallucination refers to the generation of grammatically correct but factually inaccurate or fabricated information by a model.

Q82. Which architecture enhances LLM accuracy by retrieving relevant knowledge documents from an external vector store to ground the generation process?

Explanation: RAG combines retrieval models with generative LLMs, augmenting the prompt context with relevant documents from a vector index to mitigate hallucinations.

Q83. What occurs when adjusting the generation "Temperature" parameter of an LLM closer to 0?

Explanation: Lowering the temperature sharpens the softmax probability distribution, concentrating probability mass on the highest-probability tokens and making the output more deterministic; as the temperature approaches 0, sampling approaches greedy decoding.
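The effect is easy to check numerically: the logits are divided by the temperature before the softmax. The logit values below are made up for illustration.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Softmax over logits / temperature (temperature must be positive)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use argmax for greedy decoding")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # hypothetical next-token logits
for t in (0.1, 1.0, 10.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At a temperature of 0.1 almost all the mass sits on the top token, while at 10.0 the three probabilities are close to uniform.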

Q84. What sampling method limits token selection to a dynamic subset whose cumulative probability reaches a specific threshold $p$?

Explanation: Top-p (nucleus) sampling keeps the smallest set of highest-probability tokens whose cumulative probability reaches the threshold $p$, renormalizes over that set, and samples from it, so the size of the candidate pool changes from step to step.
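A minimal sketch of the filtering step, assuming the next-token distribution is given as a hypothetical token-to-probability dictionary. Sampling from the returned distribution (for example with `random.choices`) would complete the procedure.

```python
def top_p_filter(probs, p):
    """Keep the smallest top-ranked token set whose cumulative mass reaches p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    # Renormalize so the surviving probabilities sum to 1.
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}

probs = {"cat": 0.5, "dog": 0.3, "fish": 0.15, "emu": 0.05}  # hypothetical distribution
print(top_p_filter(probs, 0.9))  # keeps cat, dog, fish
print(top_p_filter(probs, 0.5))  # keeps only cat
```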

Q85. In prompt engineering, what does "Few-Shot Prompting" involve?

Explanation: Few-shot prompting provides example pairs directly within the context window, leveraging the model's in-context learning capabilities without updating its weights.

Q86. What strategy prompts an LLM to decompose a complex problem into a sequence of intermediate logical steps?

Explanation: Chain-of-Thought prompting encourages the model to generate intermediate reasoning steps, improving its performance on complex reasoning tasks.

Q87. What is the main structural benefit of parameter-efficient fine-tuning techniques like LoRA (Low-Rank Adaptation)?

Explanation: LoRA freezes the original model weights and injects trainable low-rank decomposition matrices into the attention layers, significantly reducing memory overhead during fine-tuning.
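The saving is simple arithmetic: a frozen weight matrix of shape $d_{out} \times d_{in}$ is adapted by the product of two small matrices of shapes $d_{out} \times r$ and $r \times d_{in}$. The layer size and rank below are hypothetical values chosen for illustration.

```python
def full_params(d_out, d_in):
    """Number of weights in a dense d_out x d_in matrix."""
    return d_out * d_in

def lora_trainable_params(d_out, d_in, rank):
    """Trainable weights in the two low-rank factors B (d_out x r) and A (r x d_in)."""
    return rank * (d_out + d_in)

d_out, d_in, rank = 4096, 4096, 8  # hypothetical projection size and LoRA rank
dense = full_params(d_out, d_in)
adapter = lora_trainable_params(d_out, d_in, rank)
print(dense, adapter, f"{adapter / dense:.2%}")
```

For this one matrix the adapter trains 65,536 weights instead of 16,777,216, which is under half a percent of the original.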

Q88. What is the purpose of the "KV Cache" in LLM inference engines?

Explanation: The KV Cache stores the attention key and value tensors already computed for past tokens, so each new token in autoregressive generation only computes its own keys and values instead of recomputing them for the whole prefix.

Q89. Which attention implementation reduces the memory footprint of long-context Transformers by computing attention in small blocks (tiles) without materializing the full attention matrix?

Explanation: FlashAttention computes exact (not approximate) attention block by block, minimizing reads and writes between the GPU's high-bandwidth memory and its faster on-chip memory.

Q90. What limitation does an LLM's "Context Window" impose?

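Explanation: The context window defines the maximum number of tokens (prompt plus generated output) the model can attend to at once; it is bounded by the sequence lengths the model was trained to handle and by the compute and memory cost of self-attention, which grows with sequence length.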

Q91. Which quantization strategy reduces a model's weights from 16-bit floating-point (FP16) to 4-bit integers (INT4)?

Explanation: Weight quantization compresses model weights into lower-precision representations; going from FP16 to INT4 cuts weight memory to roughly a quarter and can speed up inference, often with only a small loss in accuracy.

Q92. What occurs during the "Alignment" phase of LLM training?

Explanation: Alignment uses techniques like fine-tuning and reinforcement learning (e.g., RLHF, DPO) to make model outputs safer and more helpful.

Q93. What decoding technique uses a fixed width $B$ to keep track of the top $B$ most probable candidate token sequences at each step?

Explanation: Beam search expands on greedy decoding by maintaining a fixed number (beam width) of high-probability paths at each step.
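A toy sketch of the procedure, assuming a hypothetical `next_probs` callback that returns next-token probabilities for a partial sequence. The lookup table is invented so that greedy decoding (beam width 1) and a wider beam disagree; real decoders also handle end-of-sequence tokens and length normalization.

```python
import math

def beam_search(next_probs, start, beam_width, steps):
    """Keep the `beam_width` highest log-probability sequences at every step."""
    beams = [([start], 0.0)]  # (token sequence, cumulative log-probability)
    for _ in range(steps):
        candidates = []
        for seq, score in beams:
            for token, prob in next_probs(seq).items():
                candidates.append((seq + [token], score + math.log(prob)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams

# Hypothetical model: next-token distribution depends only on the last token.
TABLE = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.55, "y": 0.45},
    "b": {"z": 0.9, "w": 0.1},
}

def next_probs(seq):
    return TABLE[seq[-1]]

print(beam_search(next_probs, "<s>", beam_width=1, steps=2)[0][0])  # greedy path
print(beam_search(next_probs, "<s>", beam_width=2, steps=2)[0][0])  # better overall path
```

Greedy decoding commits to "a" and ends with probability 0.33, while a beam of width 2 keeps "b" alive and finds the sequence with probability 0.36.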

Q94. In a Transformer block, what follows the multi-head self-attention layer?

Explanation: Each Transformer layer processes tokens through a multi-head self-attention mechanism followed by a position-wise feed-forward network, using residual connections and layer normalization.

Q95. What is Direct Preference Optimization (DPO) used for in LLM pipelines?

Explanation: DPO simplifies preference alignment by optimizing the model directly on human preferences, bypassing the need to train a separate reward model as required in RLHF.

Q96. What unit is used to measure the size and cost of an LLM call by counting the subword fragments it processes?

Explanation: API providers meter costs and enforce context limits based on the number of tokens in the input and the generated output.

Q97. What design pattern coordinates LLMs, external APIs, and vector stores into an autonomous loop to accomplish complex multi-step goals?

Explanation: Agentic frameworks use reasoning loops (like Reason + Act) to allow LLMs to call external tools, evaluate observations, and dynamically adjust their strategy to solve goals.

Q98. What open ecosystem hosts thousands of pre-trained open-weights language models, tokenizers, and datasets?

Explanation: Hugging Face is the leading hub for sharing and deploying open-weights models, datasets, and NLP pipelines.

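Q99. What security vulnerability describes crafting inputs to bypass an LLM's safety guardrails and make it produce prohibited output or take prohibited actions?

Explanation: Jailbreaking crafts prompts that get around the model's safety guardrails, while the closely related prompt injection smuggles instructions into the input so the model overrides its system instructions; both aim to produce restricted or unsafe behavior.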

Q100. What architecture serves as the foundation for modern conversational LLMs like ChatGPT, Claude, and Gemini?

Explanation: The Transformer architecture, introduced by Vaswani et al. in 2017, forms the foundational core of modern state-of-the-art large language models.
