Questions tagged [hadoop]

Questions related to Apache Hadoop.

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.

It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.

The project includes these modules:

  • Hadoop Common: The common utilities that support the other Hadoop modules.
  • Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.
  • Hadoop YARN: A framework for job scheduling and cluster resource management.
  • Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.
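To illustrate the programming model behind the MapReduce module, here is a minimal single-process sketch of the classic word-count job. It only imitates the map, shuffle, and reduce phases in memory. It does not use Hadoop's actual Java API (the `Mapper`/`Reducer` classes), and the function names below are hypothetical, chosen for illustration.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Mapper (illustrative): emit (word, 1) for each whitespace-separated token."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer (illustrative): sum all counts emitted for one key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map -> shuffle -> reduce sequentially in a single process."""
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The fox"]
    print(word_count(docs))
```

On a real cluster, the map and reduce calls run as separate tasks on different nodes, and the shuffle moves intermediate data across the network. This sketch only shows the data flow between the phases.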

Licensed under the Apache License, Version 2.0.

45 questions
5 votes, 1 answer

How to install Spark in standalone mode on Ubuntu

I am trying to install Spark in standalone mode, but it shows an error. How can I solve this problem? Java version: 1.8.0_131; Spark: 2.2.0; Hadoop: 2.7.4; .bashrc settings; Hadoop location on the local system: /usr/lib/hadoop/hadoop-2.7.4; Spark file location:…
Shiva Manhar
2 votes, 1 answer

How often does the Impala StateStore refresh?

I am using Hive and Impala on the same cluster. I find that when I create new tables, the Impala StateStore does not refresh automatically even after a few hours. I know that I can accomplish this by running "refresh" in impala-shell (in a cron job…
2 votes, 0 answers

Exporting an HBase table which is larger than the available HDFS space?

I have a 1 TB HBase table that I need to export. Unfortunately, I only have 600 GB free in HDFS, so I was hoping to export it to a mounted file share. I've learned that the HBase export can only write to HDFS. Due to the above constraints I lack the…
alienth
1 vote, 0 answers

Enabling JMX For Hadoop HDFS & Also MapReduce

I'm having a hard time figuring out how to enable JMX to submit metrics for HDFS and MapReduce jobs in Hadoop (CDH4). I've seen several links and read through 'The Definitive Guide' and 'Hadoop Operations' on the 'monitoring' chapters and it goes…
Ali Razeghi - AWS
1 vote, 1 answer

MapReduce performance on a single PC

I have heard that Hadoop performs better than MySQL. Until now I have only used relational databases, so this technology is really new to me. I have a 6-core PC. Suppose I have a table with 20 columns and 5 million rows. Does Hadoop give better…
Mokus
0 votes, 1 answer

"ZooKeeper exists failed after 4 attempts" when launching HBase

When launching HBase, I get the following error: mike@mike-thinks:~/hbase-1.2.6/bin$ ./hbase shell 2017-11-30 17:26:42,137 WARN [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where…
Revolucion for Monica