Parallel Computing Technologies of Big Data

Please note! This essay has been submitted by a student.

Abstract

With the boom in Big Data, parallel computing technologies have become an essential architecture for Big Data analysis. A parallel computing platform makes it practical to apply data mining techniques to large data sets and extract useful content from them. This term paper gives only an outline of a recent and promising topic: parallel computing technologies for Big Data. It should be helpful to anyone who plans to learn more about big data or intends to work as a data analyst. Parallel computing is a very large subject, so the paper cannot cover it completely. The paper starts with the historical background of big data, then covers the basis of parallel computing technologies and some related terminology. It goes on to discuss parallel computer memory architectures and parallel programming models, followed by a brief overview of several issues and aspects of parallel computing technologies. Finally, it discusses future applications and prospects of parallel computing.

Introduction

The term “Big Data” has been gaining popularity for some time now, yet what it actually means, along with its properties and aspects, is still not entirely clear. The idea of storing huge amounts of data dates back to the early 1980s, and there has been extensive work in this field ever since. Of all the data produced to date, approximately 90% has been generated in the past 5 to 7 years. Every year we now create far more data than was created in the preceding 50 to 60 years. The amount of data that users create continues to grow at an accelerating pace, and over the next few decades the volume of digital data is expected to reach at least 50 to 60 zettabytes.

Big Data

The term Big Data refers to huge volumes of data that can be put to use with our skills and present-day technologies. Big Data may also be defined as the digital storage of important information that exhibits the three basic V’s of Big Data (defined later). Big Data Analytics is the process of analysing large data sets (Big Data) with suitable software to reveal useful information. Data is classified into three main types:

Structured Data

Structured data is information that is arranged and sorted in a fixed, well-defined order. It is easily processed by a machine but can be difficult for an ordinary person to read. Structured data can be computed, used, or analysed with some of the most basic algorithms. Basic examples of structured data include binary codes (0 or 1), spreadsheets, and tables queried with Structured Query Language (SQL).

Unstructured Data

In technical terms, unstructured data is data that cannot be processed easily by a machine. It is usually organized for human understanding rather than machine processing. It is not stored in a relational database management system (RDBMS) and can be analysed only with deeper, more complex algorithms. Examples include Word documents, text messages, and data generated on social media such as GIFs and videos.

Semi-structured Data

Semi-structured data is not organized for a relational database management system the way structured data is, but it contains tags or other markers that define a hierarchy to some extent. It is also called self-describing data. Examples of semi-structured data include CSV, JSON, and XML. To handle and analyse this huge amount of data we need analytic tools, known as “Big Data Analytic Tools”. These tools use different algorithms to analyse and use data.
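The contrast between a fixed schema and self-describing data can be illustrated with a minimal Python sketch. The records, field names, and the helper `city_of` are all invented for illustration; the point is only that a structured row is read by position in a known schema, while a semi-structured record carries its own tags and may nest or omit fields.

```python
import json

# Structured: a fixed schema, every row has the same fields in the same order.
columns = ("id", "name", "city")
table_rows = [(1, "Asha", "Pune"), (2, "Ravi", "Delhi")]
city_index = columns.index("city")
table_cities = [row[city_index] for row in table_rows]

# Semi-structured: keys act as tags; records may nest or omit fields.
json_text = """
[
  {"id": 1, "name": "Asha", "contact": {"city": "Pune"}},
  {"id": 2, "name": "Ravi"}
]
"""
records = json.loads(json_text)


def city_of(record):
    """Return the nested city if the record has one, otherwise None."""
    return record.get("contact", {}).get("city")


json_cities = [city_of(r) for r in records]
```

The second JSON record has no `contact` field, so the reader code has to tolerate missing data, which is exactly what a fixed-schema table never requires.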

The market research firm Gartner categorizes big data analytics tools into four main types:

  1. Descriptive Analytics: These tools show firms what was done or what happened, either at a certain point in time or over a span of time. They are the most basic tools.
  2. Diagnostic Analytics: Diagnostic tools stop a step short of artificial intelligence: they point towards the underlying reason behind an event. They are a step ahead of descriptive tools because they allow a data scientist to reach the root cause by processing the data.
  3. Predictive Analytics: The previous two types are concerned with activity that happened in the past. Predictive analytics tools, by contrast, deal with the question “What will happen next?”. They are based on advanced algorithms and apply AI and machine learning technologies in order to forecast.
  4. Prescriptive Analytics: These tools are similar to predictive analytics but go a step further. They tell firms what to do, or what not to do, in a given situation to obtain the expected or optimum result. Their use of AI and machine learning is more advanced than in the other tools, but their popularity is low.
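The difference between describing the past and predicting the future can be sketched with a toy example. The monthly sales figures below are invented, and the straight-line least-squares fit is only a stand-in for the far richer models that real predictive tools use.

```python
# Hypothetical monthly sales for months 0..3 (illustrative numbers only).
sales = [10.0, 12.0, 14.0, 16.0]


def describe(values):
    """Descriptive analytics: summarize what already happened."""
    return {"total": sum(values), "mean": sum(values) / len(values)}


def fit_line(values):
    """Ordinary least-squares line through the points (0, v0), (1, v1), ..."""
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_next(values):
    """Predictive analytics: extrapolate the fitted trend one step ahead."""
    slope, intercept = fit_line(values)
    return intercept + slope * len(values)


summary = describe(sales)
forecast = predict_next(sales)
```

A prescriptive tool would go one step further and turn the forecast into a recommended action, which is outside the scope of this sketch.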

The 3 V’S of Big Data

Velocity, volume, and variety:

  1. Velocity: The speed at which data is generated, travels, and must be processed.
  2. Volume: The amount of data present.
  3. Variety: The range of data types available, which helps frame the big data discussion.

There is also some focus on development of other V’s, such as big data’s “veracity” and “value.”

Basis of Parallel Computation

Types of Existing Parallelism: The Two Extremes

Data Parallel

In data parallelism there are a number of parallel tasks with minimal dependency among them. It requires detailed knowledge of the data under consideration (the data mapping), which is an extremely important step. Data-parallel programs are typically written with message passing. A classic example of data parallelism: build one database for each state of India simultaneously, when the customer data is distributed by customer code.
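As a rough illustration of the data-parallel idea, the following Python sketch partitions a small set of customer records by state and applies the same function to every partition concurrently. The records, the function names, and the use of a thread pool are assumptions made for this example; a real system would distribute the partitions across processes or machines.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical customer records: (customer_code, state, balance).
customers = [
    ("C001", "Kerala", 120.0),
    ("C002", "Punjab", 80.0),
    ("C003", "Kerala", 45.5),
    ("C004", "Goa", 300.0),
    ("C005", "Punjab", 20.0),
]


def partition_by_state(records):
    """Data mapping step: decide which records belong to which partition."""
    parts = defaultdict(list)
    for record in records:
        parts[record[1]].append(record)
    return dict(parts)


def build_state_summary(item):
    """The SAME operation is applied to every partition independently."""
    state, records = item
    summary = {
        "customers": len(records),
        "total_balance": sum(r[2] for r in records),
    }
    return state, summary


def build_all(records, workers=4):
    parts = partition_by_state(records)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each partition is processed without reference to the others.
        return dict(pool.map(build_state_summary, parts.items()))


summaries = build_all(customers)
```

Because no partition depends on another, the work scales by adding workers, which is the "least dependency" property described above.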

Programming Methodologies

Because so many programs today address everyday problems such as hospitality data, banking, and payroll processing, software has grown larger and more complex. The answer to this problem is a programming methodology, a way of analyzing and controlling the development process of software. Modular programming is one example.

Task Parallel

Task parallelism is the processing of a task on multiple cores of a machine, where the cores perform different functions. The data involved may be stored in the same database or in different databases. The problem is divided into sub-tasks, and all the sub-tasks are then processed simultaneously. Balancing the load is much more complicated and difficult in task parallelism. An example of task parallelism: build one database in parallel for the whole of India, even though the customer data is organized by state code.
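By way of contrast with data parallelism, a task-parallel program runs different functions at the same time, possibly over the same data. The sketch below is a minimal illustration with invented order amounts and hypothetical task names; it uses a thread pool only to keep the example short.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical order amounts shared by all tasks.
orders = [120.0, 80.0, 45.5, 300.0, 20.0]


def total(values):
    return sum(values)


def largest(values):
    return max(values)


def count_large(values, threshold=100.0):
    return sum(1 for v in values if v > threshold)


def run_tasks(values):
    """Submit three DIFFERENT sub-tasks and collect their results."""
    tasks = {"total": total, "largest": largest, "count_large": count_large}
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = {name: pool.submit(fn, values) for name, fn in tasks.items()}
        return {name: future.result() for name, future in futures.items()}


results = run_tasks(orders)
```

The sub-tasks here take different amounts of time by nature, which hints at why load balancing is harder for task parallelism than for data parallelism.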

Parallel Systems Employed in Parallel Technologies

Memory Distribution

Distributed Memory

In a distributed memory system, every processor has its own local memory, and all the parallel processors are connected within the main computer. Each processor’s memory is private to it. A processor can allocate, edit, delete, or otherwise manipulate only its own memory. Any synchronization can be done only by passing explicit messages between processors.
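The discipline of private memory plus explicit messages can be imitated in a few lines of Python. This is only a simulation under stated assumptions: threads stand in for processors, each worker touches nothing but its own chunk, and `queue.Queue` objects play the role of the interconnect. On real distributed-memory hardware a message-passing library would be used instead.

```python
import queue
import threading


def worker(chunk, inbox, outbox):
    """A simulated processor: its chunk is private, all exchange is by message."""
    local_sum = sum(chunk)
    outbox.put(local_sum)            # send partial result to the coordinator
    global_sum = inbox.get()         # wait for the coordinator's reply
    outbox.put(local_sum / global_sum)  # send this worker's share of the total


def run(chunks):
    # One (inbox, outbox) pair per worker acts as a dedicated link.
    links = [(queue.Queue(), queue.Queue()) for _ in chunks]
    threads = [
        threading.Thread(target=worker, args=(chunk, inbox, outbox))
        for chunk, (inbox, outbox) in zip(chunks, links)
    ]
    for t in threads:
        t.start()
    partials = [outbox.get() for _, outbox in links]
    total = sum(partials)
    for inbox, _ in links:
        inbox.put(total)             # broadcast the global sum
    shares = [outbox.get() for _, outbox in links]
    for t in threads:
        t.join()
    return total, shares


total, shares = run([[1, 2, 3], [4, 5], [10]])
```

The blocking `get` calls are the synchronization points: no worker can proceed past the exchange until the message it needs has arrived.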

Examples: Cray T3E, IBM SP2

Shared Memory

In a shared memory system there is only one address space. All processors in the parallel computer have access to the pool of shared memory.
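In a shared-memory design every worker reads and writes the same address space, so coordination happens through locks rather than messages. The following minimal sketch uses Python threads, which share one address space, and a hypothetical `SharedCounter` class to show the idea.

```python
import threading


class SharedCounter:
    """A value visible to every thread, protected by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def add(self, amount):
        # The lock makes the read-modify-write step atomic.
        with self._lock:
            self.value += amount


def run(num_threads=4, increments=1000):
    counter = SharedCounter()

    def work():
        for _ in range(increments):
            counter.add(1)

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


final_value = run()
```

Without the lock, concurrent updates to the same location could interleave and lose increments, which is the characteristic hazard of shared memory.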

Some examples of shared memory systems are: SGI Origin, Sun E10000

Connection Topology

A connection topology is the scheme by which different devices are connected over a network. A computer in which each processor is linked to every other processor would be the ideal case. Since that would be expensive and complex to handle, processors are instead interlinked by more limited networks, such as a torus or a hypercube. The main issues in most network designs are the bandwidth of the network, the communication involved, and the network latency. Bandwidth is the total capacity (in bits per second) that a specific medium can carry from one point to another. Network latency is the delay that occurs in data communication over a network.
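How bandwidth and latency combine can be shown with a common first-order model: the time to deliver a message is the latency plus the message size divided by the bandwidth. The sketch below assumes bandwidth is measured in bits per second and latency in seconds; the link figures are invented for illustration.

```python
def transfer_time(size_bits, bandwidth_bps, latency_s):
    """First-order model: fixed delay plus time to push the bits through."""
    if bandwidth_bps <= 0:
        raise ValueError("bandwidth must be positive")
    return latency_s + size_bits / bandwidth_bps


# Hypothetical link: 100 Mbit/s with 2 ms latency.
BANDWIDTH = 100e6
LATENCY = 0.002

large_message = transfer_time(8 * 1_000_000, BANDWIDTH, LATENCY)  # 1 MB
small_message = transfer_time(8 * 100, BANDWIDTH, LATENCY)        # 100 bytes
```

For the large message the bandwidth term dominates, while for the small one nearly all of the time is latency, which is why parallel programs try to send fewer, larger messages.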

Static Interconnects

Static interconnects are also known as direct (dedicated) networks. Nodes are connected directly by static point-to-point links. Such networks include fully connected networks, rings, meshes, and hypercubes.
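The cost argument for choosing a ring or hypercube over a fully connected network can be made concrete by counting links. The helper functions below are written for this illustration; they assume at least three nodes for the ring and label hypercube nodes with the integers 0 to 2^d - 1, so that neighbours differ in exactly one bit.

```python
def fully_connected_links(n):
    """Every pair of the n nodes gets its own link."""
    return n * (n - 1) // 2


def ring_links(n):
    """Each node links to its successor; assumes n >= 3."""
    return n


def hypercube_links(d):
    """A d-dimensional hypercube has 2**d nodes with d links each."""
    return d * 2 ** (d - 1) if d > 0 else 0


def hypercube_neighbors(node, d):
    """Neighbours differ from the node label in exactly one bit."""
    return [node ^ (1 << bit) for bit in range(d)]


# Compare the three topologies for 8 nodes (a 3-dimensional hypercube).
links_for_8 = {
    "fully connected": fully_connected_links(8),
    "ring": ring_links(8),
    "hypercube": hypercube_links(3),
}
neighbors_of_5 = hypercube_neighbors(5, 3)
```

The hypercube sits between the two extremes: far fewer links than a fully connected network, but any node can still be reached in at most d hops.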

Dynamic Interconnects

Dynamic interconnects employ switches to establish dynamic links (virtual circuits) between different nodes. This differs from static point-to-point communication and is more advanced. Each node is connected to a specific subset of switches. The paths are established by configuring the switches according to communication demands.

Hadoop Distributed File System

Hadoop was created mainly to manage data generated in bulk, that is, Big Data: emails, documents, PDFs, media files, and so on. Handling such a huge amount of data on a single machine would be extremely difficult or impossible. Hadoop is a prime example of parallel computation, because it requires many machines, which may have entirely different processing systems and hold different data, to work together as a single machine to produce the desired output. Whenever we search for anything on the internet, hundreds of millions of pages are searched and results are generated in an extremely short time. In fact, over ninety percent of the data generated in recent decades is a result of search and analysis. In other words, we may be heading towards a data explosion, with data mainly of three types:

  1. Structured
  2. Unstructured
  3. Semistructured

It is not difficult to imagine the amount of data created in the world every day. To deal with this problem efficiently, many search engines and social networking sites run their own distributed file systems. For example, Google uses its own Google File System and MapReduce, the designs on which Hadoop is modelled. Facebook has its own Hadoop cluster, the world’s largest, and it generates at least 0.5 petabytes of data in one day.

Basics Of HDFS

HDFS stands for Hadoop Distributed File System. Hadoop itself is simply software, intended to distribute and manage data storage for really big data sets. Further advantages of HDFS are that it is scalable and highly fault tolerant.

Hadoop has two main components:

  1. HDFS: the storage layer, which Hadoop uses to store data across the machines of a cluster.
  2. Hadoop MapReduce: the processing layer, used mostly to process and analyse data, and also to retrieve it.
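The processing style behind MapReduce can be sketched in plain Python with the classic word-count example. This is not the Hadoop API: the three functions below are hypothetical stand-ins for the map, shuffle, and reduce phases, and everything runs in one process, whereas Hadoop would spread the map and reduce calls across the machines that hold the data.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # Each map_phase call is independent, so it could run on a separate node.
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


counts = word_count([
    "big data needs parallel computing",
    "parallel computing scales big data",
])
```

Because each map call sees only its own document and each reduce call sees only one key, both phases parallelize naturally, which is what lets the model scale to very large data sets.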

Parallel Programming Models

How can we write programs that run faster on a multicore PC?

How can we write programs that do not crash on a multicore PC?

The answer is the right model!

Explicit Programming gives an essential and important engineering point balancing modularization and segregation in (at least) two cases.

Implicit Parallelism

Parallelizing Compilers

  • Implicit multithreading is the concurrent execution of multiple threads extracted from a single sequential program.
  • It is supported by parallel languages and parallelizing compilers that take care of identifying parallelism, scheduling the calculation, and placing the data.
  • The program is written in a conventional language (such as C, Fortran, Lisp, or Pascal).

Explicit Parallelism

  • A programmer practicing explicit parallelism must explicitly state which instructions can be executed in parallel; the programmer has full control.
  • Concurrent computations are represented by means of primitives in the form of special-purpose directives or function calls. Because these primitives seldom contribute to the intended computation itself, their cost is often counted as “parallelization overhead”.
  • Most parallel primitives are related to process synchronization, communication, or task partitioning.
  • In explicit parallelism there is only a single thread of control.
  • Examples: thread APIs, Erlang, Ada, Cilk, etc.

Multithreading – a message-passing program consists of multiple processes, each of which has its own thread of control and may execute different code.

  1. Asynchronous – the processes of a message-passing program execute asynchronously.
  2. Separate address space – the processes of a parallel program reside in different address spaces.
  3. Explicit interactions – the programmer must resolve all the interaction issues, including data mapping, communication, and synchronization.

Future Aspects and Applications of Big Data, BDA (Big Data Analytics) and Parallel Computing Techniques

Big Data Analytics, powered by parallel computation, has many applications in daily life, such as:

  • Big Data has great potential to be developed into an efficient way of dealing with knowledge.
  • Some analysts also compare the rise of Big Data and Big Data Analytics with the boom in nuclear technology.
  • It helps deliver actionable insights to health professionals who track patterns remotely through monitoring systems.
  • BDA could drive and analyse all the information required for the 3D printing process.

Wireless sensor platforms combined with Big Data would enable predictive analysis for remote monitoring in almost every industry. Wireless sensors combined with data analytic tools (able to process the data generated by different sensors) could act as a catalyst for artificial intelligence. As data is gradually collected on battlefields, parallel computing techniques can be used to analyse strategies, plans, and patterns, and to detect chemical, biological, and radioactive threats or any enemy force across war zones and other vast areas. Harnessing the potential of parallel technologies will allow weather forecasting authorities to tackle the unpredictable nature of the environment, and will be a boon for renewable energy.

Electronics now reach into every aspect of life, and superconductors would be a game changer for the world. Superconductivity requires many aspects to be considered simultaneously, so a parallel computer would be the best choice for the production and future development of superconductors.

Conclusion

Further work on big data technologies and their analysis would bring a sudden boom and many new kinds of industries. After the Industrial Revolution the world saw immense change, and the development of countries and technology accelerated many times over. Similarly, if the world is able to harness the basic idea behind big data, it would be revolutionized with even greater consequences, because big data would help create a digital economy able to compete with all other forms of economy combined. At present, however, there are many deficiencies in our ability to safeguard, analyse, store, and process the data generated from different sources and methods. Because big data involves dealing with data, many developers and other users think of it as back-end work, which is entirely wrong. Owing to this misconception and the lack of expertise in analysing big data, the workforce in the big data industry is only a handful. As the importance and application of big data grow, the methods and techniques used to examine and analyse it will also have huge implications and will grow exponentially.
