Hadoop Interview Questions and Answers for Freshers
Which method of the FileSystem object is used for reading a file in HDFS in Hadoop?
A. open()
B. access()
C. select()
D. None of the above
Answer: A
What does RPC stand for in Hadoop?
A. Remote processing call
B. Remote process call
C. Remote procedure call
D. None of the above
Answer: C
Which switch is given to the “hadoop fs” command for detailed help?
A. -show
B. -help
C. -?
D. None of the above
Answer: B
What is the default size of a block in HDFS (Hadoop 1.x)?
A. 512 bytes
B. 64 MB
C. 1024 KB
D. None of the above
Answer: B
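As a rough illustration of what the block size means in practice, the arithmetic below counts how many blocks a file occupies. It assumes the 64 MB figure from the answer above; the name `block_count` is made up for this sketch and is not part of any Hadoop API.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the block size assumed in this sketch


def block_count(file_size_bytes: int, block_size: int = BLOCK_SIZE) -> int:
    """Return how many blocks a file of the given size is divided into."""
    if file_size_bytes < 0:
        raise ValueError("file size must be non-negative")
    # The last block may be only partly filled, so round up using
    # integer arithmetic (no floating-point rounding involved).
    return (file_size_bytes + block_size - 1) // block_size


one_gb = 1024 * 1024 * 1024
print(block_count(one_gb))      # a 1 GB file
print(block_count(one_gb + 1))  # one extra byte spills into another block
```

A file that is not an exact multiple of the block size still needs a final, partly filled block, which is why the division rounds up.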
Which MapReduce phase is theoretically able to utilize features of the underlying file system in order to optimize parallel execution in Hadoop?
A. Split
B. Map
C. Combine
Ans: A
What is the input to the Reduce function in Hadoop?
A. One key and a list of all values associated with that key.
B. One key and a list of some values associated with that key.
C. An arbitrarily sized list of key/value pairs.
Ans: A
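The point that Reduce receives one key together with all of that key's values can be sketched with a tiny in-memory word count. This is only a single-process simulation; the function names are illustrative, and in Hadoop itself the values arrive as an iterable rather than a Python list.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in the line."""
    return [(word.lower(), 1) for word in line.split()]


def group_by_key(pairs):
    """Shuffle step: collect every value emitted for a key into one list."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce step: called once per key with all of that key's values."""
    return key, sum(values)


lines = ["big data big cluster", "data node"]
pairs = [pair for line in lines for pair in map_words(line)]
grouped = group_by_key(pairs)
result = dict(reduce_counts(key, values) for key, values in grouped.items())
print(grouped["big"])  # every value emitted for the key "big"
print(result)
```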
How can a distributed filesystem such as HDFS provide opportunities for optimization of a MapReduce operation?
A. Data represented in a distributed filesystem is already sorted.
B. Distributed filesystems must always be resident in memory, which is much faster than disk.
C. Data storage and processing can be co-located on the same node, so that most input data relevant to Map or Reduce will be present on local disks or cache.
D. A distributed filesystem makes random access faster because of the presence of a dedicated node serving file metadata.
Ans: C
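The co-location idea in option C can be sketched as a toy scheduler that prefers a node already holding a replica of the block. This is a simplified greedy assignment written for illustration, not Hadoop's actual scheduler; the node and block names are invented.

```python
def assign_map_tasks(block_locations, free_nodes):
    """Assign each block to a free node, preferring one that stores a replica.

    block_locations: dict mapping block id -> list of nodes holding a replica.
    free_nodes: list of nodes, each with one free map slot.
    Returns a dict mapping block id -> (node, is_local).
    """
    available = list(free_nodes)
    assignment = {}
    for block, replicas in block_locations.items():
        local = [node for node in available if node in replicas]
        if local:
            node, is_local = local[0], True
        elif available:
            # No free node holds the data, so it must be read over the network.
            node, is_local = available[0], False
        else:
            break  # no free slots left
        available.remove(node)
        assignment[block] = (node, is_local)
    return assignment


blocks = {
    "blk_1": ["node1", "node2"],
    "blk_2": ["node2", "node3"],
    "blk_3": ["node1", "node3"],
}
plan = assign_map_tasks(blocks, ["node1", "node2", "node3"])
print(plan)
```

Because every block has several replicas, the scheduler usually has more than one node on which a task can read its input from local disk.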
Which of the following MapReduce execution frameworks focuses on execution in shared-memory environments?
A. Hadoop
B. Twister
C. Phoenix
Ans: C
What is the implementation language of the Hadoop MapReduce framework?
A. Java
B. C
C. FORTRAN
D. Python
Ans: A
The Combine stage, if present, must perform the same aggregation operation as Reduce.
A. True
B. False
Ans: B
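Computing a mean is a common case where the Combine step cannot simply repeat the Reduce step. In the sketch below the mapper is assumed to emit (value, 1) pairs, the combiner only merges them into partial (sum, count) pairs, and the reducer does the final division. All names are illustrative.

```python
def combine(partials):
    """Combiner: merge (sum, count) pairs into a single (sum, count) pair."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return (total, count)


def reduce_mean(partials):
    """Reducer: merge the partial pairs, then divide to get the mean."""
    total, count = combine(partials)
    return total / count


# Two mappers, each emitting (value, 1) pairs for the same key.
mapper_outputs = [[(1, 1), (2, 1), (3, 1)], [(10, 1)]]
combined = [combine(pairs) for pairs in mapper_outputs]
print(combined)               # one (sum, count) pair per mapper
print(reduce_mean(combined))  # mean over all four values
```

Averaging the per-mapper means instead (2 and 10) would give 6, not the true mean of 4, which is why the combiner here carries sums and counts rather than running the reducer's logic.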
Which MapReduce stage serves as a barrier, where all previous stages must be completed before it may proceed?
A. Combine
B. Group (a.k.a. 'shuffle')
C. Reduce
D. Write
Ans: B
Which TACC resource has support for Hadoop MapReduce?
A. Ranger
B. Longhorn
C. Lonestar
D. Spur
Ans: B
Which of the following scenarios makes HDFS unavailable in Hadoop?
A. JobTracker failure
B. TaskTracker failure
C. DataNode failure
D. NameNode failure
E. Secondary NameNode failure
Answer: D
You are running a Hadoop cluster with all monitoring facilities properly configured. Which scenario will go undetected in Hadoop?
A. Map or reduce tasks that are stuck in an infinite loop.
B. HDFS is almost full.
C. The NameNode goes down.
D. A DataNode is disconnected from the cluster.
E. MapReduce jobs that are causing excessive memory swaps.
Answer: A
Which of the following utilities allows you to create and run MapReduce jobs with any executable or script as the mapper and/or the reducer?
A. Oozie
B. Sqoop
C. Flume
D. Hadoop Streaming
Answer: D
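Hadoop Streaming exchanges records with the mapper and reducer as lines of text on standard input and output, with key and value separated by a tab by default, and the reducer sees its input sorted by key. The sketch below imitates that contract with plain Python generators so it can be run locally; in a real job each function would read from `sys.stdin` and print its output, and the function names here are only illustrative.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line for every word on the input lines."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Sum the counts for each word; the input must be sorted by word."""
    parsed = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(parsed, key=lambda pair: pair[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Local stand-in for the sort that happens between map and reduce.
mapped = sorted(mapper(["to be or not to be"]))
counts = list(reducer(mapped))
for line in counts:
    print(line)
```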
You need a distributed, scalable data store that allows random, real-time read/write access to hundreds of terabytes of data. Which of the following would you use?
A. Hue
B. Pig
C. Hive
D. Oozie
E. HBase
F. Flume
G. Sqoop
Answer: E
What can workflows expressed in Oozie contain?
A. Iterative repetition of MapReduce jobs until a desired answer or state is reached.
B. Sequences of MapReduce and Pig jobs. These are limited to linear sequences of actions with exception handlers but no forks.
C. Sequences of MapReduce jobs only; no Pig or Hive tasks or jobs. These MapReduce sequences can be combined with forks and path joins.
D. Sequences of MapReduce and Pig. These sequences can be combined with other actions including forks, decision points, and path joins.
Answer: D
You have an employee who is a Data Analyst and is very comfortable with SQL. He would like to run ad-hoc analysis on data in your HDFS cluster. Which of the following is a data warehousing software built on top of Apache Hadoop that defines a simple SQL-like query language well-suited for this kind of user?
A. Pig
B. Hue
C. Hive
D. Sqoop
E. Oozie
F. Flume
G. Hadoop Streaming
Answer: C
Which of the following statements most accurately describes the relationship between MapReduce and Pig?
A. Pig provides additional capabilities that allow certain types of data manipulation not possible with MapReduce.
B. Pig provides no additional capabilities to MapReduce. Pig programs are executed as MapReduce jobs via the Pig interpreter.
C. Pig programs rely on MapReduce but are extensible, allowing developers to do special-purpose processing not provided by MapReduce.
D. Pig provides the additional capability of allowing you to control the flow of multiple MapReduce jobs.
Answer: D
In a MapReduce job, you want each of your input files processed by a single map task. How do you configure a MapReduce job so that a single map task processes each input file regardless of how many blocks the input file occupies?
A. Increase the parameter that controls minimum split size in the job configuration.
B. Write a custom MapRunner that iterates over all key-value pairs in the entire file.
C. Set the number of mappers equal to the number of input files you want to process.
D. Write a custom FileInputFormat and override the method isSplitable to always return false.
Answer: D
Which of the following best describes the workings of TextInputFormat in Hadoop?
A. Input file splits may cross line breaks. A line that crosses file splits is ignored.
B. The input file is split exactly at the line breaks, so each RecordReader will read a series of complete lines.
C. Input file splits may cross line breaks. A line that crosses file splits is read by the RecordReaders of both splits containing the broken line.
D. Input file splits may cross line breaks. A line that crosses file splits is read by the RecordReader of the split that contains the end of the broken line.
E. Input file splits may cross line breaks. A line that crosses file splits is read by the RecordReader of the split that contains the beginning of the broken line.
Answer: E
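The way a line-oriented record reader handles a line that straddles two splits can be simulated in a few lines. The sketch below follows the usual rule: every split except the first discards its first (possibly partial) line, and every split keeps reading past its end until the line it started is complete, so a broken line belongs to the split that contains its beginning. It is a simplified model operating on an in-memory byte string, not Hadoop's actual reader, and `read_split` is a made-up name.

```python
def read_split(data: bytes, start: int, length: int) -> list:
    """Return the lines owned by the split covering data[start:start+length]."""
    end = start + length
    pos = start
    if start != 0:
        # Discard the first line: the reader of the previous split owns it.
        newline = data.find(b"\n", pos)
        pos = len(data) if newline == -1 else newline + 1
    lines = []
    # Keep reading while the line begins at or before the split's end,
    # even if the line itself runs past the end of the split.
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        next_pos = len(data) if newline == -1 else newline + 1
        lines.append(data[pos:next_pos].rstrip(b"\n"))
        pos = next_pos
    return lines


data = b"alpha\nbravo\ncharlie\ndelta\n"  # 26 bytes
splits = [(0, 10), (10, 10), (20, 6)]     # (start, length) pairs
per_split = [read_split(data, start, length) for start, length in splits]
for (start, _), lines in zip(splits, per_split):
    print(start, lines)
```

In this run "bravo" begins at byte 6 and ends after the first split boundary at byte 10, yet it is returned by the first split only, and every line is read exactly once across all splits.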