Superscalar technology supports basic parallel computing (specifically, instruction-level parallelism) on one CPU (processor). It allows more than one instruction to be executed in each clock cycle by using more than one execution unit at the same time.
Superscalar processors are often pipelined as well, but pipelining is a different technology: it lets each execution unit work on more than one instruction at once by overlapping their stages, rather than using multiple execution units at once.
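As a rough illustration of running more than one instruction per clock cycle, here is a minimal Python sketch of a toy in-order processor. It is not how any real CPU works. The names `cycles_needed` and `issue_width` are made up for this example. The sketch assumes that every instruction takes exactly one cycle, and that an instruction must wait for the next cycle if it reads a value written earlier in the same cycle.

```python
def cycles_needed(instructions, issue_width):
    """Count the clock cycles a toy in-order processor needs.

    instructions: list of (destination, sources) tuples in program order.
    issue_width:  how many execution units can start work in one cycle
                  (1 models a scalar processor).
    Assumption: only read-after-write dependencies are modelled.
    """
    cycles = 0
    i = 0
    while i < len(instructions):
        cycles += 1
        written_this_cycle = set()
        issued = 0
        while i < len(instructions) and issued < issue_width:
            dest, sources = instructions[i]
            if any(src in written_this_cycle for src in sources):
                break  # needs a result that is not ready yet; wait a cycle
            written_this_cycle.add(dest)
            issued += 1
            i += 1
    return cycles


# Four instructions that do not depend on each other.
independent = [("a", ()), ("b", ()), ("c", ()), ("d", ())]
# Four instructions where each one needs the previous result.
chain = [("a", ()), ("b", ("a",)), ("c", ("b",)), ("d", ("c",))]

print(cycles_needed(independent, 1))  # scalar: 4 cycles
print(cycles_needed(independent, 2))  # two execution units: 2 cycles
print(cycles_needed(chain, 2))        # dependencies: still 4 cycles
```

In this toy model, a second execution unit halves the cycle count for independent instructions, but gives no benefit when every instruction needs the result of the one before it.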
Superscalar technology usually involves:
The simplest processors are scalar processors. On a scalar processor, each instruction usually works with one or two data items at once. On a vector processor, each instruction usually works with many data items at once. A superscalar processor is a mix of a scalar processor and a vector processor: each instruction processes one data item, but more than one instruction runs at once, so the processor handles many data items at once.
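The difference between the three kinds of processor can be sketched with some simple arithmetic in Python. This is an idealized comparison only. The function names, `lanes` (data items per vector instruction) and `width` (instructions started per cycle) are assumptions made up for this example, and the sketch assumes that every instruction takes one cycle and that no instruction has to wait for another.

```python
def ceil_div(a, b):
    """Integer division that rounds up."""
    return -(-a // b)


def scalar(n_items):
    # One instruction per data item, one instruction per cycle.
    return {"instructions": n_items, "cycles": n_items}


def vector(n_items, lanes):
    # Each instruction handles up to `lanes` data items.
    count = ceil_div(n_items, lanes)
    return {"instructions": count, "cycles": count}


def superscalar(n_items, width):
    # One instruction per data item, but up to `width` instructions per cycle.
    return {"instructions": n_items, "cycles": ceil_div(n_items, width)}


print(scalar(8))                # {'instructions': 8, 'cycles': 8}
print(vector(8, lanes=4))       # {'instructions': 2, 'cycles': 2}
print(superscalar(8, width=4))  # {'instructions': 8, 'cycles': 2}
```

Under these assumptions, the superscalar processor runs as many instructions as the scalar one, but finishes in as few cycles as the vector one.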
In a superscalar processor, it is very important to have an accurate instruction dispatcher, so that the execution units are always busy with work that will probably be needed. If the instruction dispatcher is not accurate, the processor has to throw away some of its work, and it might not be any faster than a scalar processor. By 2008, almost all general-purpose CPUs were superscalar, and one CPU could have up to 4 ALUs, 2 FPUs, and 2 SIMD units.
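One very simple way to picture the cost of thrown-away work is the Python sketch below. It is a toy model under stated assumptions, not a measurement of any real CPU: the name `useful_instructions_per_cycle` and the idea of a single `useful_fraction` number are made up for this example, and the model assumes that all execution units are busy in every cycle.

```python
def useful_instructions_per_cycle(issue_width, useful_fraction):
    """Toy model: how much kept work the processor finishes per cycle.

    issue_width:     instructions started per cycle (1 for a scalar processor).
    useful_fraction: share of the started work that is kept, from 0.0 to 1.0
                     (assumption: the rest is thrown away).
    """
    if not 0.0 <= useful_fraction <= 1.0:
        raise ValueError("useful_fraction must be between 0.0 and 1.0")
    return issue_width * useful_fraction


print(useful_instructions_per_cycle(4, 1.0))   # 4.0: nothing thrown away
print(useful_instructions_per_cycle(4, 0.25))  # 1.0: same as a scalar processor
```

In this model, a processor with four execution units that keeps only a quarter of its work does no better than a scalar processor that keeps all of it.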