Thursday, 20 August 2015

Data types in Hadoop


Hadoop wraps the Java primitive types in its own Writable classes, such as:

int --------> IntWritable
long --------> LongWritable
boolean --------> BooleanWritable
float --------> FloatWritable
byte --------> ByteWritable

We can use the following built-in data types as both keys and values:

Text: This stores UTF-8 text
BytesWritable: This stores a sequence of bytes
VIntWritable and VLongWritable: These store variable-length integer and long values
NullWritable: This is a zero-length Writable type that can be used when you don't need a key or a value
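
To make the variable-length idea behind VIntWritable and VLongWritable more concrete, here is a rough sketch in plain Java. The class name VarLongSketch and the method encode are made up for this post, and the byte layout is our assumption, modelled on the scheme used by Hadoop's WritableUtils.writeVLong as we understand it. Treat it as an illustration, not as the library's code. Small values take one byte, and larger values take a marker byte followed by only as many bytes as they need.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper for illustration only; not part of Hadoop.
class VarLongSketch {

    // Encodes a long using as few bytes as possible (assumed layout, see text above).
    static byte[] encode(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();

        // Values in [-112, 127] fit in a single byte.
        if (value >= -112 && value <= 127) {
            out.write((int) value);
            return out.toByteArray();
        }

        // The first byte is a marker that carries the sign and the payload length.
        int marker = -112;
        if (value < 0) {
            value = ~value;   // one's complement, so the payload is non-negative
            marker = -120;
        }

        // Count how many bytes the payload really needs.
        int payloadBytes = 0;
        for (long tmp = value; tmp != 0; tmp >>= 8) {
            payloadBytes++;
        }
        out.write(marker - payloadBytes);

        // Write the payload, most significant byte first.
        for (int i = payloadBytes - 1; i >= 0; i--) {
            out.write((int) (value >> (8 * i)) & 0xFF);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(encode(100L).length);   // prints 1
        System.out.println(encode(1000L).length);  // prints 3
    }
}
```

A fixed-width LongWritable always writes 8 bytes, so a variable-length type saves space when most of the values are small.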

The following built-in Hadoop collection data types can be used only as value types:

ArrayWritable: This stores an array of values belonging to a Writable type.

Note: You may wonder why Writable is appended to every simple data type. In the big data world, structured objects need to be serialized to a byte stream for moving over the network or persisting to disk on the cluster, and then deserialized again as needed. When you have vast amounts of data to store and move, at Facebook scale for example, the serialized form needs to be compact, so that it takes as little space to store and as little time to move as possible.
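
The serialize and deserialize round trip described in the note can be sketched with nothing but java.io. The class SimpleIntWritable below is a hypothetical stand-in written for this post, not Hadoop's real IntWritable. It only borrows the two method names, write and readFields, from Hadoop's Writable interface. The helpers toBytes and fromBytes are also made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a Hadoop-style Writable; not the real IntWritable.
class SimpleIntWritable {
    private int value;

    SimpleIntWritable() { }

    SimpleIntWritable(int value) { this.value = value; }

    int get() { return value; }

    // Serialize: write only the raw 4 bytes of the int, with no class metadata.
    void write(DataOutput out) throws IOException {
        out.writeInt(value);
    }

    // Deserialize: refill this object from the byte stream.
    void readFields(DataInput in) throws IOException {
        value = in.readInt();
    }

    // Helper (assumed for this example): object -> byte array.
    static byte[] toBytes(SimpleIntWritable w) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            w.write(new DataOutputStream(buffer));
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Helper (assumed for this example): byte array -> object.
    static SimpleIntWritable fromBytes(byte[] bytes) {
        try {
            SimpleIntWritable w = new SimpleIntWritable();
            w.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return w;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = toBytes(new SimpleIntWritable(163));
        // prints "4 bytes, value 163"
        System.out.println(bytes.length + " bytes, value " + fromBytes(bytes).get());
    }
}
```

The int travels as exactly 4 bytes, which is the kind of compact form the note is talking about.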
