The fundamental reason that big data mining systems were rare and expensive is that scaling a system to process large data sets is very difficult; as we will see, processing has traditionally been limited to the power that can be built into a single computer. There are, however, two broad approaches to scaling a system as the size of the data increases, generally referred to as scale-up and scale-out.
Scale-up
In most enterprises, data processing has typically been performed on impressively large computers with even more impressive price tags. As the size of the data grows, the approach is to move to a bigger server or storage array, and the cost of such hardware can easily run to hundreds of thousands or even millions of dollars.
The advantage of simple scale-up is that the architecture does not significantly change as the system grows. Though larger components are used, the basic relationship (for example, database server and storage array) stays the same. For applications such as commercial database engines, the software handles the complexities of utilizing the available hardware, and in theory increased scale is achieved by migrating the same software onto larger and larger servers. Note, though, that moving software onto more and more processors is never trivial; in addition, there are practical limits on just how big a single host can be, so at some point scale-up cannot be extended any further. The promise of a single architecture at any scale is also unrealistic. Designing a scale-up system to handle data sets of 1 terabyte, 100 terabytes, and 1 petabyte may conceptually mean applying larger versions of the same components, but the complexity of their connectivity may range from cheap commodity parts to custom hardware as the scale increases.
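To make the "same architecture, bigger boxes" idea concrete, here is a toy Python sketch; the tier names and figures are illustrative assumptions rather than real sizing numbers. Scaling up simply multiplies every capacity while the topology stays identical:

# A toy sketch of scale-up: the architecture (one database server in front of
# one storage array) is unchanged; only the component sizes grow.
def scale_up(deployment, factor):
    # Same tiers, same relationships; every capacity is multiplied by `factor`.
    return {tier: {resource: size * factor for resource, size in specs.items()}
            for tier, specs in deployment.items()}

small = {
    "database_server": {"cores": 8, "ram_gb": 64},
    "storage_array": {"capacity_tb": 10, "disks": 24},
}

# Ten times the data: still just two (much bigger, much pricier) boxes.
print(scale_up(small, 10))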
Scale-out
Instead of growing a system onto larger and larger hardware, the scale-out approach spreads the processing onto more and more machines. If the data set doubles, simply use two servers instead of a single double-sized one; if it doubles again, move to four hosts. The obvious benefit of this approach is that purchase costs remain much lower than for scale-up. Server hardware costs tend to increase sharply as one seeks to purchase larger machines: a single host may cost $5,000, while one with ten times the processing power may cost a hundred times as much. The downside is that we need to develop strategies for splitting our data processing across a fleet of servers, and the tools historically used for this purpose have proven to be complex. As a consequence, deploying a scale-out solution has required significant engineering effort; the system developer often needs to handcraft the mechanisms for data partitioning and reassembly, not to mention the logic to schedule the work across the cluster and handle individual machine failures.
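To give a sense of what that handcrafting involves, here is a minimal Python sketch (the host names are hypothetical) of the partition / process / reassemble pattern a developer would have had to build; a real scale-out system would also need scheduling and failure-recovery logic around every step.

import hashlib
from concurrent.futures import ThreadPoolExecutor

# Hypothetical worker hosts; the names are placeholders, not real machines.
WORKERS = ["host-1", "host-2", "host-3", "host-4"]

def partition(records, num_workers):
    # Handcrafted partitioning: hash each key to decide which worker gets it.
    shards = [[] for _ in range(num_workers)]
    for key, value in records:
        index = int(hashlib.md5(key.encode()).hexdigest(), 16) % num_workers
        shards[index].append((key, value))
    return shards

def process_shard(worker, shard):
    # Stand-in for shipping a shard to `worker` and running the job there;
    # a real system would also need retry logic for when a worker fails.
    return sum(value for _, value in shard)

def run_job(records):
    shards = partition(records, len(WORKERS))
    with ThreadPoolExecutor(max_workers=len(WORKERS)) as pool:
        partial_results = pool.map(process_shard, WORKERS, shards)
    # Reassembly: combine the per-worker partial results into one answer.
    return sum(partial_results)

print(run_job([("user-%d" % i, i % 7) for i in range(1000)]))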
Note: In a traditional file system, the large data set itself has to be moved across the network to the host doing the processing, which consumes a great deal of network bandwidth. In the Hadoop Distributed File System, the data is not moved; only the task is sent to the nodes holding the data, and because the task is far smaller than the data itself, the impact on network bandwidth is minimal.
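As a rough illustration of the note above, the following back-of-envelope Python sketch compares the two approaches; the data size, task size, and link speed are assumed figures, not measurements.

# Back-of-envelope comparison; the data size, task size, and link speed
# below are assumed figures chosen only to illustrate the ratio.
DATA_SIZE_BYTES = 1 * 1024**4       # a 1 TB data set spread across the cluster
TASK_SIZE_BYTES = 50 * 1024**2      # roughly 50 MB of job code and configuration
LINK_BYTES_PER_SEC = 125 * 1024**2  # approximately a 1 Gbit/s network link

def transfer_seconds(num_bytes):
    return num_bytes / LINK_BYTES_PER_SEC

# Traditional approach: pull the entire data set to the processing host.
print("moving the data: %.1f hours" % (transfer_seconds(DATA_SIZE_BYTES) / 3600))
# Hadoop approach: push the much smaller task to the nodes holding the data.
print("moving the task: %.1f seconds" % transfer_seconds(TASK_SIZE_BYTES))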