Stefan Marinov's Wiki

"Exploration through intelligent machinery."


Parallel Computing

Parallel computing is the science and art of programming computers that can perform more than one operation concurrently, typically by having more than one processor (CPU).

Some parallel computers are just regular workstations with more than one processor, while others are giant single computers with many processors (supercomputers) or networks of individual computers configured to coordinate on computing problems (clusters). Parallel computers can run some types of programs far faster than traditional single-processor computers (von Neumann architecture).


Parallel Computing

In traditional (serial) programming, a single processor executes program instructions in a step-by-step manner. Some operations, however, have multiple steps that do not have time dependencies and therefore can be separated into multiple tasks to be executed simultaneously. For example, adding a constant to all the elements of a matrix does not require that the result obtained from summing one element be acquired before summing the next element. Elements in the matrix can be made available to several processors, and the sums performed simultaneously, with the results available faster than if all operations had been performed serially.

Parallel computations can be performed on shared-memory systems with multiple CPUs, distributed-memory clusters made up of smaller shared-memory systems, or single-CPU systems. Coordinating the concurrent work of the multiple processors and synchronizing the results are handled by program calls to parallel libraries; these tasks usually require parallel programming expertise.

Grid Computing

The term grid computing denotes the connection of distributed computing, visualization, and storage resources to solve large-scale computing problems that otherwise could not be solved within the limited memory, computing power, or I/O capacity of a system or cluster at a single location. Much as an electrical grid provides power to distributed sites on demand, a computing grid can supply the infrastructure needed for applications requiring very large computing and I/O capacity.

The creation of a functional grid requires a high-speed network and grid middleware that lets the distributed resources work together in a relatively transparent manner. For example, whereas sharing resources on a single large system may require a batch scheduler, scheduling and dispatching jobs that run concurrently across multiple systems in a grid requires a metascheduler that interacts with each of the local schedulers. Additionally, a grid authorization system may be required to map user identities to different accounts and authenticate users on the various systems.


Supercomputing

Supercomputer is a general term for computing systems capable of sustaining high-performance computing applications that require a large number of processors, shared or distributed memory, and multiple disks.

Cluster Computing

A cluster is a set of machines, usually (but not necessarily) on a private network, which is attached to a dual-homed “master node”. Dual-homed means the node sits on two networks at the same time and may even act as a router between them. The master node allows logins and is where large parallel jobs are set up. Once a job is submitted, software on the master connects to the compute nodes (“drones”) and runs the job there. This software is designed to execute programs fairly when resources are available for them, and to ensure that no one starts a job on nodes that someone else is already using, so that everyone's programs get a fair share of the machine.

General-Purpose Computing on GPUs (GPGPU)

GPU computing is the use of a GPU (Graphics Processing Unit) to do general-purpose scientific and engineering computing. The model for GPU computing is to use a CPU and GPU together in a heterogeneous co-processing model: the sequential part of the application runs on the CPU, while the computationally intensive part is accelerated by the GPU. From the user's perspective, the application simply runs faster, because it uses the high performance of the GPU to boost overall performance.

GPGPU is a fairly recent trend in computer engineering research. GPUs are co-processors that have been heavily optimized for computer graphics processing, a field dominated by data-parallel operations - particularly linear algebra matrix operations. In the early days, GPGPU was done by tricking the GPU: computational workloads were disguised as graphics workloads so that programs could use the normal graphics APIs. Since then, several programming languages and platforms have been built for general-purpose computation on GPUs. Some prominent examples include CUDA (NVIDIA), CTM (AMD), OpenCL, BrookGPU, and OpenGL.


Books

  • Parallel Algorithms for Regular Architectures: Meshes and Pyramids
    Russ Miller and Quentin F. Stout, MIT Press, September 1996, ISBN 0-262-13233-8
  • Parallel Scientific Computing in C++ and MPI: A Seamless Approach to Parallel Algorithms and their Implementation
    George Em Karniadakis and Robert M. Kirby II, Cambridge University Press, June 2003, ISBN 0-521-52080-0
  • Introduction to Parallel Computing: A Practical Guide with Examples in C (software examples for the book)
    Peter Arbenz and Wesley P. Petersen, Oxford University Press, February 2004, ISBN 0-19-851576-6 (hb) / ISBN 0-19-851577-4 (pb)

Web Resources

This list mainly includes books, tutorials, and classes that are available online for free. However, links to other materials, such as individual presentations or texts, may appear here as well.

Frameworks and Tools


See also: MPICH vs Open MPI, Beowulf cluster


  • Berkeley Open Infrastructure for Network Computing (BOINC) - an open-source software platform for computing using volunteered resources
  • CRAN Task View: High-Performance and Parallel Computing with R at the Comprehensive R Archive Network for Statistical Computing (CRAN)
  • FastFlow (FF) - a C++ parallel programming framework advocating high-level, pattern-based parallel programming. It chiefly supports streaming and data parallelism, targeting heterogeneous platforms composed of clusters of shared-memory machines, possibly equipped with computing accelerators.
  • Microsoft C++ AMP - a library for implementing (mostly GPU) data parallelism directly in C++; AMP stands for “Accelerated Massive Parallelism”
  • NVIDIA Compute Unified Device Architecture (CUDA) - a parallel computing platform and programming model for NVIDIA's specially designed graphics processing units.
    • It gives program developers direct access to the virtual instruction set and memory of the parallel computational elements in CUDA GPUs.
  • OpenMP (Open Multi-Processing) - an API that supports multi-platform shared memory multiprocessing programming in C, C++, and Fortran, on most processor architectures and operating systems.
    • It consists of a set of compiler directives, library routines, and environment variables that influence run-time behavior.
    • GOMP — An OpenMP implementation for GCC, the GNU Compiler Collection
  • Parallel Object Programming with C++ (POP-C++) - a comprehensive object-oriented system for developing HPC applications in large, heterogeneous, parallel and distributed computing infrastructures.
  • Pthreads - a POSIX standard for threads; defines a set of C programming language types, functions and constants.
  • Parallel Virtual Machine (PVM) - a software tool, written in C, for parallel networking of computers
    • It is designed to allow a network of heterogeneous Unix and/or Windows machines to be used as a single distributed parallel processor.
    • PVM was a step towards modern trends in distributed processing and grid computing but has, since the mid-1990s, largely been supplanted by the much more successful MPI standard.

Special Considerations

This section covers limitations and possible problems resulting from the (improper) use of parallel computing.

  • The program code must be modified in order to implement the parallel parts.
    • Determining the best places to parallelise (i.e. those requiring the most computation power) is usually non-trivial.
    • Finding the best library/tools for parallelisation depends on the hardware, operating system, and programming language used, as well as on the computation itself.
  • It can sometimes be difficult to identify the bottleneck (I/O operations, memory, processing power, etc.).
    • With massively parallel processing there is a lot of communication going back and forth, and the latency of Ethernet can prevent any time advantage from being achieved at all.
  • Although it is relatively easy to achieve modest parallelization, there are a lot of challenges when trying to scale efficiently (see Amdahl's law).
en/parallel_computing.txt · Last modified: 2015-03-26 19:06 by Stefan Marinov