Parallel computing is the science and art of programming computers that can carry out more than one operation at the same time, typically because they have more than one processor (CPU).
Some parallel computers are ordinary workstations with more than one processor, while others are enormous single machines with many processors (supercomputers) or networks of individual computers configured to coordinate on computing problems (clusters). Parallel computers can run some types of programs far faster than traditional single-processor computers (von Neumann architecture).
In traditional (serial) programming, a single processor executes program instructions in a step-by-step manner. Some operations, however, have multiple steps that do not have time dependencies and therefore can be separated into multiple tasks to be executed simultaneously. For example, adding a constant to all the elements of a matrix does not require that the result obtained from summing one element be acquired before summing the next element. Elements in the matrix can be made available to several processors, and the sums performed simultaneously, with the results available faster than if all operations had been performed serially.
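As a minimal sketch of this kind of data parallelism, the following C program uses OpenMP to add a constant to every element of a small matrix; the matrix size, the constant, and the choice of OpenMP in particular are illustrative assumptions, not part of any specific system discussed here.

```c
#include <stdio.h>

#define ROWS 4
#define COLS 4

int main(void)
{
    double m[ROWS][COLS];
    const double c = 5.0;

    /* Fill the matrix with some sample values. */
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            m[i][j] = i * COLS + j;

    /* Each element can be updated independently, so the iterations of
       this loop may be distributed across several processors.  With
       OpenMP this is expressed by a single pragma; build with an
       OpenMP-capable compiler, e.g. "cc -fopenmp". */
    #pragma omp parallel for collapse(2)
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            m[i][j] += c;

    /* Print the result to show that every element was updated. */
    for (int i = 0; i < ROWS; i++) {
        for (int j = 0; j < COLS; j++)
            printf("%6.1f ", m[i][j]);
        printf("\n");
    }
    return 0;
}
```

Because each iteration is independent, the OpenMP runtime is free to hand different elements to different processors; with the pragma removed (or without an OpenMP-capable compiler) the same loop simply runs serially.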
Parallel computations can be performed on shared-memory systems with multiple CPUs, distributed-memory clusters made up of smaller shared-memory systems, or single-CPU systems. Coordinating the concurrent work of the multiple processors and synchronizing the results are handled by program calls to parallel libraries; these tasks usually require parallel programming expertise.
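To make the distributed-memory case concrete, here is a rough sketch of the same matrix-plus-constant computation written against MPI, one example of the parallel libraries mentioned above; the matrix dimensions and the assumption that the rows divide evenly among the processes are simplifications made for the example.

```c
#include <stdio.h>
#include <mpi.h>

#define ROWS 8          /* assumed to be divisible by the number of processes */
#define COLS 4

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double matrix[ROWS][COLS];
    if (rank == 0)                      /* only the root holds the full matrix */
        for (int i = 0; i < ROWS; i++)
            for (int j = 0; j < COLS; j++)
                matrix[i][j] = i * COLS + j;

    int rows_per_proc = ROWS / size;
    double local[ROWS][COLS];           /* generously sized local buffer */

    /* Hand each process its share of the rows. */
    MPI_Scatter(matrix, rows_per_proc * COLS, MPI_DOUBLE,
                local,  rows_per_proc * COLS, MPI_DOUBLE,
                0, MPI_COMM_WORLD);

    /* Each process adds the constant to its own rows, independently. */
    for (int i = 0; i < rows_per_proc; i++)
        for (int j = 0; j < COLS; j++)
            local[i][j] += 5.0;

    /* Collect the partial results back on the root process. */
    MPI_Gather(local,  rows_per_proc * COLS, MPI_DOUBLE,
               matrix, rows_per_proc * COLS, MPI_DOUBLE,
               0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("rank 0 gathered %d rows from %d processes\n", ROWS, size);

    MPI_Finalize();
    return 0;
}
```

With a typical MPI installation the program would be compiled with mpicc and launched with something like mpirun -np 4; the exact commands depend on the implementation (see the MPICH vs Open MPI comparison referenced below).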
The term grid computing denotes the connection of distributed computing, visualization, and storage resources to solve large-scale computing problems that otherwise could not be solved within the limited memory, computing power, or I/O capacity of a system or cluster at a single location. Much as an electrical grid provides power to distributed sites on demand, a computing grid can supply the infrastructure needed for applications requiring very large computing and I/O capacity.
The creation of a functional grid requires a high-speed network and grid middleware that lets the distributed resources work together in a relatively transparent manner. For example, whereas sharing resources on a single large system may require a batch scheduler, scheduling and dispatching jobs that run concurrently across multiple systems in a grid requires a metascheduler that interacts with each of the local schedulers. Additionally, a grid authorization system may be required to map user identities to different accounts and authenticate users on the various systems.
Supercomputer is a general term for computing systems capable of sustaining high-performance computing applications that require a large number of processors, shared or distributed memory, and multiple disks.
A cluster is a set of machines, usually (but not necessarily) on a private network, attached to a dual-homed “master node”. Dual-homed means the node sits on two networks at the same time and may even act as a router between them. The master node can allow logins and is where large parallel jobs are set up. Once a job is submitted, software on the master connects to the compute nodes (drones) and runs the job there. This software is designed to execute programs fairly when resources are available for them, and to make sure that nobody starts a job on nodes that someone else is already using, so that everyone's programs get a fair share of the machine.
GPU computing is the use of a GPU (Graphics Processing Unit) to do general-purpose scientific and engineering computing. The model for GPU computing is to use a CPU and a GPU together in a heterogeneous co-processing model: the sequential part of the application runs on the CPU and the computationally intensive part is accelerated by the GPU. From the user's perspective, the application simply runs faster, because the GPU's high performance is used to boost overall performance.
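As one way to sketch this division of labour in C, the fragment below uses OpenMP target offloading, which (like CUDA or OpenCL, discussed next) lets the compute-heavy loop run on an attached accelerator while the rest of the program stays on the CPU; the problem size and the offloading toolchain mentioned in the comments are assumptions for the sake of the example, not a statement about any particular GPU stack.

```c
#include <stdio.h>
#include <math.h>

#define N 1000000

int main(void)
{
    static float data[N];

    /* Sequential part: initialization runs on the CPU (the host). */
    for (int i = 0; i < N; i++)
        data[i] = (float)i;

    /* Computationally intensive part: this loop is offloaded to the GPU
       (the device) when built with an offloading-capable compiler, for
       example "gcc -fopenmp -foffload=nvptx-none -lm" (an assumption
       about the toolchain); without offloading support it simply runs
       on the CPU instead. */
    #pragma omp target teams distribute parallel for map(tofrom: data[0:N])
    for (int i = 0; i < N; i++)
        data[i] = sqrtf(data[i]) * 2.0f;

    /* Back on the CPU: the host uses the result of the accelerated part. */
    printf("data[%d] = %f\n", N - 1, data[N - 1]);
    return 0;
}
```

Written with CUDA or OpenCL the overall structure would be the same: ordinary host code surrounding an explicitly accelerated kernel.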
GPGPU is a fairly recent trend in computer engineering research. GPUs are co-processors that have been heavily optimized for computer graphics processing, a field dominated by data-parallel operations, particularly linear algebra matrix operations. In the early days, GPGPU processing was done by disguising computational workloads as graphics workloads so that programs could use the normal graphics APIs. Since then, several programming languages and platforms have been built for general-purpose computation on GPUs; prominent examples include CUDA (Nvidia), CTM (AMD), OpenCL, BrookGPU, and OpenGL.
This list mainly includes books, tutorials, and classes that are available online for free, though links to other materials, such as individual presentations or texts, may appear here as well.
See also: MPICH vs Open MPI, Beowulf cluster
This includes limitations and possible problems resulting from the (improper) use of parallel computing.