对于Apache项目来说,projectname.apache.org
Hadoop:hadoop.apache.org
Hive:hive.apache.org
Spark:spark.apache.org
HBase:hbase.apache.org
1 master(NameNode/NN) 带 n个 slaves(DataNode/DN)
一个文件会被拆分成多个Block
blocksize:128M
130M -> 2个block:128M+2M
A typical deployment has a dedicated machine that runs only the NameNode software .Each of the other machines in the cluster runs one instance of the DataNode Software. The architecture does not preclude running multiple DataNode on the same machine but in a real deployment that is rarely in case
NameNode + N个DataNode
建议NN和DN是部署在不同的机器上
replication factor:副本因子
All blocks in a file except the last block are the same size