Semigroup
Aggregation, data combining
____ _____ __ __ ___ ____ ____ ___ _ _ ____ ____
/ ___|| ____| \/ |_ _/ ___| _ \ / _ \| | | | _ \/ ___|
\___ \| _| | |\/| || | | _| |_) | | | | | | | |_) \___ \
___) | |___| | | || | |_| | _ <| |_| | |_| | __/ ___) |
|____/|_____|_| |_|___\____|_| \_\\___/ \___/|_| |____/
Previous chapters were demanding. Understanding context mapping and parallel computations within a context can be challenging. This time we turn to a more fundamental concept that is still simple to understand: semigroups[1]. I am not sure there is anything simpler that is also as useful in practice.
Algebraic Data Structures #
So far, we have met functors and applicative functors. Putting the programming mindset aside, we can think of them as sets equipped with operations. From this point of view, they are very similar, except that an applicative functor is also a functor. Expanding the idea of a set with operations, where those operations are closed and their results remain inside the same set, opens a wider, more abstract point of view[2] on the tools we are building.
Algebraic structure[3] or algebraic system consists of a nonempty set A (called the underlying set, carrier set or domain), a collection of operations on A (typically binary operations such as addition and multiplication), and a finite set of identities (known as axioms) that these operations must satisfy.*
This perfectly correlates with what we've been dealing with so far about objects and morphisms and laws of their application. So algebraic structures are sets with one or more operations that obey specific laws.
When a new problem involves the same laws as such an algebraic structure, all the results that have been proved using only the laws of the structure can be directly applied to the new problem.
Remember functor country mapping example? That was about studying the problem in one country using a different country. Now, it's about the same country and building new bridges using laws for previous ones, so to speak.
Semigroup #
A semigroup is one of the simplest structures in computer science. At its core, a semigroup is simply a set equipped with an associative binary operation.
binary- involves two elements of the set (x,y)associative- recall that the operation∘is associative if the order of grouping does not matter.
Important note: Commutativity is not mandatory.
What makes semigroups particularly powerful is their composability. When you can combine two things of the same type to get another thing of the same type, and when this combination is associative, you unlock powerful patterns for building scalable, parallelizable systems.
Formal definition #
A semigroup is an algebraic structure consisting of:
- A set
S(the underlying set or carrier) - A binary operation
∘ : S × S -> S(closed under the operation) - Associativity law: For all
a, b, c ∈ S, the following must hold:(a ∘ b) ∘ c = a ∘ (b ∘ c)
Examples #
import Prelude hiding (Semigroup, (<>))
-- The Semigroup type class in Haskell
class Semigroup a where
(<>) :: a -> a -> a
-- List concatenation semigroup
instance Semigroup [a] where
(<>) = (++)
-- Integer addition semigroup (using Sum wrapper)
newtype Sum a = Sum a deriving (Show)
instance Num a => Semigroup (Sum a) where
Sum x <> Sum y = Sum (x + y)
-- Integer multiplication semigroup (using Product wrapper)
newtype Product a = Product a deriving (Show)
instance Num a => Semigroup (Product a) where
Product x <> Product y = Product (x * y)
-- Examples in action:
main :: IO ()
main = do
-- String concatenation
print $ "Hello" <> " " <> "World" -- "Hello World"
-- List concatenation
print $ ([1,2] :: [Int]) <> [3,4] <> [5,6] -- [1,2,3,4,5,6]
-- Sum semigroup
print $ (Sum 1 :: Sum Integer) <> Sum 2 <> Sum 3 -- Sum 6
-- Product semigroup
print $ (Product 2 :: Product Integer) <> Product 3 <> Product 4 -- Product 24
// Semigroup interface in TypeScript
interface Semigroup<T> {
combine(a: T, b: T): T;
}
// String concatenation semigroup
const StringSemigroup: Semigroup<string> = {
combine: (a, b) => a + b
};
// Array concatenation semigroup
const ArraySemigroup = <T>(): Semigroup<T[]> => ({
combine: (a, b) => [...a, ...b]
});
// Number addition semigroup
const SumSemigroup: Semigroup<number> = {
combine: (a, b) => a + b
};
// Number multiplication semigroup
const ProductSemigroup: Semigroup<number> = {
combine: (a, b) => a * b
};
// Generic combine function (associative)
function combineAll<T>(semigroup: Semigroup<T>, items: T[]): T {
if (items.length === 0) {
throw new Error("Cannot combine empty array");
}
return items.reduce(semigroup.combine);
}
// Examples in action:
console.log(StringSemigroup.combine("Hello", " World")); // "Hello World"
console.log(combineAll(StringSemigroup, ["Hello", " ", "World"])); // "Hello World"
console.log(combineAll(ArraySemigroup<number>(), [[1,2], [3,4], [5,6]])); // [1,2,3,4,5,6]
console.log(combineAll(SumSemigroup, [1, 2, 3, 4])); // 10
console.log(combineAll(ProductSemigroup, [2, 3, 4])); // 24
using System;
using System.Collections.Generic;
using System.Linq;
// Semigroup interface in C#
public interface ISemigroup<T>
{
T Combine(T a, T b);
}
// String concatenation semigroup
public class StringSemigroup : ISemigroup<string>
{
public string Combine(string a, string b) => a + b;
}
// List concatenation semigroup
public class ListSemigroup<T> : ISemigroup<List<T>>
{
public List<T> Combine(List<T> a, List<T> b) => a.Concat(b).ToList();
}
// Number addition semigroup
public class SumSemigroup : ISemigroup<int>
{
public int Combine(int a, int b) => a + b;
}
// Number multiplication semigroup
public class ProductSemigroup : ISemigroup<int>
{
public int Combine(int a, int b) => a * b;
}
// Generic combine function (associative)
public static class SemigroupExtensions
{
public static T CombineAll<T>(this ISemigroup<T> semigroup, IEnumerable<T> items)
{
var list = items.ToList();
if (!list.Any())
{
throw new ArgumentException("Cannot combine empty collection");
}
return list.Aggregate(semigroup.Combine);
}
}
// Examples in action:
class Program
{
static void Main()
{
var stringSemigroup = new StringSemigroup();
var listSemigroup = new ListSemigroup<int>();
var sumSemigroup = new SumSemigroup();
var productSemigroup = new ProductSemigroup();
// String concatenation
Console.WriteLine(stringSemigroup.Combine("Hello", " World")); // "Hello World"
Console.WriteLine(stringSemigroup.CombineAll(new[] { "Hello", " ", "World" })); // "Hello World"
// List concatenation
var list1 = new List<int> { 1, 2 };
var list2 = new List<int> { 3, 4 };
var list3 = new List<int> { 5, 6 };
Console.WriteLine(string.Join(",", listSemigroup.CombineAll(new[] { list1, list2, list3 }))); // "1,2,3,4,5,6"
// Sum semigroup
Console.WriteLine(sumSemigroup.CombineAll(new[] { 1, 2, 3, 4 })); // 10
// Product semigroup
Console.WriteLine(productSemigroup.CombineAll(new[] { 2, 3, 4 })); // 24
}
}
Triangle Semigroup Example #
Let's revisit our geometric example, but this time describe the semigroup in terms of centroids[4].
The Set and Operation #
To obtain a centroid-based semigroup, we restrict attention to one fixed triangle shape and all of its translations. Fix a reference triangle T_0 whose centroid is at the origin, and let S = {T_c | c ∈ R^2} be the set of all translated copies of T_0, where T_c is the unique translated copy whose centroid is c.
This works because, for a fixed triangle shape, the centroid uniquely determines the translation.
Define the operation by adding centroids:
T_c ∘ T_d = T_{c + d}
If c = (x1, y1) and d = (x2, y2), then:
c + d = (x1 + x2, y1 + y2)
Verification of Semigroup Laws #
- Closure
If c, d ∈ R^2, then c + d ∈ R^2, so T_{c + d} is again an element of S.
- Associativity
For any c, d, e ∈ R^2:
(T_c ∘ T_d) ∘ T_e = T_{(c + d) + e} = T_{c + (d + e)} = T_c ∘ (T_d ∘ T_e)
This holds because vector addition in R^2 is associative.
CSV Transform example #
Our practical example can benefit from semigroups as well. Let's take the initial one as a base.
With semigroups we can use things like:
- 👥 Person Combination: Merges person records by combining names and aggregating ages
- ➕ Age Summation: Uses addition semigroup to total all ages
- 📋 Name Collection: Concatenates arrays of names
- 🗂️ Dataset Merging: Combines multiple CSV datasets
- 📊 Statistics Aggregation: Accumulates counts, sums, and collections in parallel
// Semigroup interface
interface Semigroup<T> {
combine(a: T, b: T): T;
}
// CSV parsing function (as provided)
function readCSV(data: string): Person[] {
const [header, ...rows] = data.trim().split('\n');
const keys = header.split(',');
return rows.map(row => {
const values = row.split(',');
const obj = Object.fromEntries(keys.map((k, i) => [k, values[i]])) as any;
return {
name: obj.name,
age: obj.age,
totalAge: undefined,
count: undefined
} as Person;
});
}
// Person type for type safety
interface Person {
name: string;
age: string;
totalAge?: number;
count?: number;
}
// Semigroup for combining person records
const PersonSemigroup: Semigroup<Person> = {
combine: (p1, p2) => ({
name: p1.name + " & " + p2.name, // Combine names
age: Math.max(parseInt(p1.age), parseInt(p2.age)).toString(), // Take max age
totalAge: (p1.totalAge || parseInt(p1.age)) + (p2.totalAge || parseInt(p2.age)),
count: (p1.count || 1) + (p2.count || 1)
})
};
// Semigroup for aggregating ages (sum)
const AgeSumSemigroup: Semigroup<number> = {
combine: (a, b) => a + b
};
// Semigroup for collecting names
const NameListSemigroup: Semigroup<string[]> = {
combine: (a, b) => [...a, ...b]
};
// Generic combine function for arrays
function combineAll<T>(semigroup: Semigroup<T>, items: T[]): T {
if (items.length === 0) {
throw new Error("Cannot combine empty array");
}
return items.reduce(semigroup.combine);
}
// Example usage
const data = readCSV('name,age\nAlice,25\nBob,40\nCharlie,35');
console.log("Original data:", data);
// 1. Combine all persons into one record
const combinedPerson = combineAll(PersonSemigroup, data);
console.log("Combined person:", combinedPerson);
// 2. Sum all ages using semigroup
const ages = data.map(person => parseInt(person.age));
const totalAge = combineAll(AgeSumSemigroup, ages);
console.log("Total age:", totalAge);
// 3. Collect all names using semigroup
const nameArrays = data.map(person => [person.name]);
const allNames = combineAll(NameListSemigroup, nameArrays);
console.log("All names:", allNames);
// 4. Process multiple CSV datasets and combine them
const data2 = readCSV('name,age\nDave,28\nEve,32');
const data3 = readCSV('name,age\nFrank,45\nGrace,29');
// Combine datasets using array concatenation semigroup
const ArraySemigroup = <T>(): Semigroup<T[]> => ({
combine: (a, b) => [...a, ...b]
});
const allDatasets = combineAll(ArraySemigroup<Person>(), [data, data2, data3]);
console.log("Combined datasets:", allDatasets);
// 5. Create summary statistics using semigroups
interface Stats {
totalAge: number;
count: number;
names: string[];
}
const StatsSemigroup: Semigroup<Stats> = {
combine: (s1, s2) => ({
totalAge: s1.totalAge + s2.totalAge,
count: s1.count + s2.count,
names: [...s1.names, ...s2.names]
})
};
const statsFromData = (persons: Person[]): Stats[] =>
persons.map(p => ({
totalAge: parseInt(p.age),
count: 1,
names: [p.name]
}));
const overallStats = combineAll(StatsSemigroup, statsFromData(allDatasets));
console.log("Overall statistics:", overallStats);
console.log("Average age:", overallStats.totalAge / overallStats.count);
import Data.List.Split (splitOn)
import Data.Semigroup (Sum(..), getSum)
-- Person data type
data Person = Person
{ name :: String
, age :: Int
, totalAge :: Maybe Int
, count :: Maybe Int
} deriving (Show, Eq)
-- Semigroup instance for Person
instance Semigroup Person where
p1 <> p2 = Person
{ name = name p1 ++ " & " ++ name p2
, age = max (age p1) (age p2)
, totalAge = Just $ (maybe (age p1) id (totalAge p1)) + (maybe (age p2) id (totalAge p2))
, count = Just $ (maybe 1 id (count p1)) + (maybe 1 id (count p2))
}
-- Stats data type for summary statistics
data Stats = Stats
{ statsTotalAge :: Int
, statsCount :: Int
, statsNames :: [String]
} deriving (Show, Eq)
-- Semigroup instance for Stats
instance Semigroup Stats where
s1 <> s2 = Stats
{ statsTotalAge = statsTotalAge s1 + statsTotalAge s2
, statsCount = statsCount s1 + statsCount s2
, statsNames = statsNames s1 ++ statsNames s2
}
-- CSV parsing function
readCSV :: String -> [Person]
readCSV csvData =
let allLines = lines csvData
in case allLines of
[] -> []
(header:rows) ->
let keys = splitOn "," header
parseRow row =
let values = splitOn "," row
rowMap = zip keys values
n = lookup "name" rowMap
a = lookup "age" rowMap >>= readMaybe
in case (n, a) of
(Just nameVal, Just ageVal) ->
Just $ Person nameVal ageVal Nothing Nothing
_ -> Nothing
in map (\r -> case parseRow r of
Just p -> p
Nothing -> error "Malformed row") rows
where
readMaybe :: String -> Maybe Int
readMaybe s = case reads s of
[(x, "")] -> Just x
_ -> Nothing
-- Combine all elements using semigroup
combineAll :: Semigroup a => [a] -> Maybe a
combineAll [] = Nothing
combineAll (x:xs) = Just $ foldl (<>) x xs
-- Convert Person to Stats
personToStats :: Person -> Stats
personToStats p = Stats (age p) 1 [name p]
-- Main function
main :: IO ()
main = do
let csvData = "name,age\nAlice,25\nBob,40\nCharlie,35"
let dataList = readCSV csvData
putStrLn "Original data:"
mapM_ print dataList
-- 1. Combine all persons into one record
case combineAll dataList of
Just combinedPerson -> do
putStrLn "\nCombined person:"
print combinedPerson
Nothing -> putStrLn "No data to combine"
-- 2. Sum all ages using built-in Sum semigroup
let ages = map age dataList
let ageSums = map (Sum . fromIntegral :: Int -> Sum Integer) ages -- Convert to Sum semigroup
case combineAll ageSums of
Just totalSum -> putStrLn $ "\nTotal age (using Sum semigroup): " ++ show (getSum totalSum)
Nothing -> putStrLn "No ages to sum"
-- 3. Collect all names using semigroup
let names = map ((: []) . name) dataList -- Convert to list of lists
case combineAll names of
Just allNames -> putStrLn $ "All names: " ++ show allNames
Nothing -> putStrLn "No names to collect"
-- 4. Process multiple CSV datasets and combine them
let csvData2 = "name,age\nDave,28\nEve,32"
let csvData3 = "name,age\nFrank,45\nGrace,29"
let data2 = readCSV csvData2
let data3 = readCSV csvData3
let allDatasets = dataList ++ data2 ++ data3
putStrLn "\nCombined datasets:"
mapM_ print allDatasets
-- 5. Create summary statistics using semigroups
let statsFromData = map personToStats allDatasets
case combineAll statsFromData of
Just overallStats -> do
putStrLn "\nOverall statistics:"
print overallStats
let avgAge = fromIntegral (statsTotalAge overallStats) / fromIntegral (statsCount overallStats) :: Double
putStrLn $ "Average age: " ++ show avgAge
-- Also demonstrate Sum semigroup for total age calculation
let allAges = map (Sum . fromIntegral . age :: Person -> Sum Integer) allDatasets
case combineAll allAges of
Just totalAgeSum -> putStrLn $ "Total age (Sum semigroup): " ++ show (getSum totalAgeSum)
Nothing -> putStrLn "No ages to sum with Sum semigroup"
Nothing -> putStrLn "No stats to combine"
using System;
using System.Collections.Generic;
using System.Linq;
// Semigroup interface
public interface ISemigroup<T>
{
T Combine(T a, T b);
}
// Person class for type safety
public class Person
{
public string Name { get; set; }
public string Age { get; set; }
public int? TotalAge { get; set; }
public int? Count { get; set; }
public Person(string name, string age, int? totalAge = null, int? count = null)
{
Name = name;
Age = age;
TotalAge = totalAge;
Count = count;
}
}
// Semigroup for combining person records
public class PersonSemigroup : ISemigroup<Person>
{
public Person Combine(Person p1, Person p2)
{
return new Person(
name: $"{p1.Name} & {p2.Name}", // Combine names
age: Math.Max(int.Parse(p1.Age), int.Parse(p2.Age)).ToString(), // Take max age
totalAge: (p1.TotalAge ?? int.Parse(p1.Age)) + (p2.TotalAge ?? int.Parse(p2.Age)),
count: (p1.Count ?? 1) + (p2.Count ?? 1)
);
}
}
// Semigroup for aggregating ages (sum)
public class AgeSumSemigroup : ISemigroup<int>
{
public int Combine(int a, int b) => a + b;
}
// Semigroup for collecting names
public class NameListSemigroup : ISemigroup<List<string>>
{
public List<string> Combine(List<string> a, List<string> b)
{
var result = new List<string>(a);
result.AddRange(b);
return result;
}
}
// Generic array semigroup
public class ArraySemigroup<T> : ISemigroup<List<T>>
{
public List<T> Combine(List<T> a, List<T> b)
{
var result = new List<T>(a);
result.AddRange(b);
return result;
}
}
// Statistics class for aggregating data
public class Stats
{
public int TotalAge { get; set; }
public int Count { get; set; }
public List<string> Names { get; set; }
public Stats(int totalAge, int count, List<string> names)
{
TotalAge = totalAge;
Count = count;
Names = names ?? new List<string>();
}
}
// Semigroup for combining statistics
public class StatsSemigroup : ISemigroup<Stats>
{
public Stats Combine(Stats s1, Stats s2)
{
var combinedNames = new List<string>(s1.Names);
combinedNames.AddRange(s2.Names);
return new Stats(
totalAge: s1.TotalAge + s2.TotalAge,
count: s1.Count + s2.Count,
names: combinedNames
);
}
}
// Extension methods for semigroup operations
public static class SemigroupExtensions
{
// Generic combine function for collections
public static T CombineAll<T>(this ISemigroup<T> semigroup, IEnumerable<T> items)
{
var list = items.ToList();
if (!list.Any())
{
throw new ArgumentException("Cannot combine empty collection");
}
return list.Aggregate(semigroup.Combine);
}
}
// CSV parsing utility
public static class CsvParser
{
public static List<Person> ReadCSV(string data)
{
var lines = data.Trim().Split('\n');
if (lines.Length == 0) return new List<Person>();
var header = lines[0];
var rows = lines.Skip(1);
var keys = header.Split(',');
return rows.Select(row =>
{
var values = row.Split(',');
var obj = keys.Zip(values, (k, v) => new { Key = k, Value = v })
.ToDictionary(x => x.Key, x => x.Value);
return new Person(
name: obj["name"],
age: obj["age"]
);
}).ToList();
}
}
// Example usage
class Program
{
static void Main()
{
// Initialize semigroups
var personSemigroup = new PersonSemigroup();
var ageSumSemigroup = new AgeSumSemigroup();
var nameListSemigroup = new NameListSemigroup();
var arraySemigroup = new ArraySemigroup<Person>();
var statsSemigroup = new StatsSemigroup();
// Parse CSV data
var data = CsvParser.ReadCSV("name,age\nAlice,25\nBob,40\nCharlie,35");
Console.WriteLine("Original data:");
foreach (var person in data)
{
Console.WriteLine($" {person.Name}, {person.Age}");
}
// 1. Combine all persons into one record
var combinedPerson = personSemigroup.CombineAll(data);
Console.WriteLine($"\nCombined person: {combinedPerson.Name}, " +
$"age: {combinedPerson.Age}, total age: {combinedPerson.TotalAge}, " +
$"count: {combinedPerson.Count}");
// 2. Sum all ages using semigroup
var ages = data.Select(person => int.Parse(person.Age));
var totalAge = ageSumSemigroup.CombineAll(ages);
Console.WriteLine($"Total age: {totalAge}");
// 3. Collect all names using semigroup
var nameArrays = data.Select(person => new List<string> { person.Name });
var allNames = nameListSemigroup.CombineAll(nameArrays);
Console.WriteLine($"All names: [{string.Join(", ", allNames)}]");
// 4. Process multiple CSV datasets and combine them
var data2 = CsvParser.ReadCSV("name,age\nDave,28\nEve,32");
var data3 = CsvParser.ReadCSV("name,age\nFrank,45\nGrace,29");
var allDatasets = arraySemigroup.CombineAll(new[] { data, data2, data3 });
Console.WriteLine($"\nCombined datasets ({allDatasets.Count} people):");
foreach (var person in allDatasets)
{
Console.WriteLine($" {person.Name}, {person.Age}");
}
// 5. Create summary statistics using semigroups
var statsFromData = allDatasets.Select(p => new Stats(
totalAge: int.Parse(p.Age),
count: 1,
names: new List<string> { p.Name }
));
var overallStats = statsSemigroup.CombineAll(statsFromData);
Console.WriteLine($"\nOverall statistics:");
Console.WriteLine($" Total age: {overallStats.TotalAge}");
Console.WriteLine($" Count: {overallStats.Count}");
Console.WriteLine($" Names: [{string.Join(", ", overallStats.Names)}]");
Console.WriteLine($" Average age: {(double)overallStats.TotalAge / overallStats.Count:F2}");
}
}
Visualizing semigroups #
Understanding semigroups becomes much clearer with visual representations. Let's explore how semigroup operations work through diagrams and illustrations.
1. The fundamental semigroup operation takes two elements and
produces a third element of the same type
Semigroup Operation: S × S → S
a ∈ S b ∈ S
│ │
└────┬─────┘
│
∘ (binary operation)
│
▼
c ∈ S
Where: c = a ∘ b
The key property of semigroups is associativity.
The order of grouping doesn't matter:
Associativity: (a ∘ b) ∘ c = a ∘ (b ∘ c)
Left-associative grouping: Right-associative grouping:
(a ∘ b) ∘ c a ∘ (b ∘ c)
a b c a b c
│ │ │ │ │ │
└─┬─┘ │ │ └─┬─┘
│ │ │ │
∘ │ = │ ∘
│ │ │ │
└──┬──┘ └──┬──┘
│ │
∘ ∘
│ │
▼ ▼
result result
Both paths produce the same result!
2. Triangle example
Triangles with centroids `c`, `d`, and `c + d`
```txt
T_c + T_d = T_(c + d)
/\ /\ /\
/| \ /| \ /| \
/ •--\ / •--\ / •--\
/__|___\ /__|___\ /__|___\
centroid = c centroid = d centroid = c + d
The centroid is shown at the intersection of the medians; the triangle shape stays the same, only the centroid position changes.
Parallel Processing #
We mentioned parallel computations before, but have not examined them closely. Associativity is what makes semigroups so useful here. Because regrouping does not change the result, we can split a reduction into independent chunks, process those chunks separately, and combine the partial results later. The following examples compute parallel sums and illustrate a few techniques commonly used in parallel computing:
- Divide and Conquer: Breaking large datasets into smaller chunks that can be processed independently.
- Tree Reduction: Combining results in a tree-like fashion to minimize synchronization points.
- Load Distribution: Spreading work across available CPU cores or worker threads.
- Safe Parallelization: Maintaining correctness regardless of how the work is partitioned.
The key insight is that (a + b) + c = a + (b + c). We can safely split sum computations across multiple threads or workers and combine the partial results in any grouping, making parallel processing both safe and efficient.
import Control.DeepSeq (NFData)
import Control.Parallel.Strategies
import Data.List (foldl1')
import Data.Semigroup (Sum(..), getSum)
-- Parallel divide-and-conquer combiner
parallelCombine :: (NFData a, Semigroup a) => [a] -> Maybe a
parallelCombine [] = Nothing
parallelCombine [x] = Just x
parallelCombine xs =
let mid = length xs `div` 2
(left, right) = splitAt mid xs
[leftResult, rightResult] =
map parallelCombine [left, right] `using` parList rdeepseq
in case (leftResult, rightResult) of
(Just l, Just r) -> Just (l <> r)
(Just l, Nothing) -> Just l
(Nothing, Just r) -> Just r
(Nothing, Nothing) -> Nothing
-- Parallel map-reduce using strategies
parallelMapReduce :: (NFData b, Semigroup b) => (a -> b) -> [a] -> Maybe b
parallelMapReduce f xs =
let mapped = map f xs `using` parList rdeepseq
chunks = chunksOf 1000 mapped -- Process in chunks
chunkResults = map (foldl1' (<>)) chunks `using` parList rdeepseq
in case chunkResults of
[] -> Nothing
_ -> Just $ foldl1' (<>) chunkResults
-- Helper function to split list into chunks
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf n xs =
let (chunk, rest) = splitAt n xs
in chunk : chunksOf n rest
-- Example usage
main :: IO ()
main = do
let numbers = [1..1000000]
let smallNumbers = [1..100]
-- Demonstrate parallelCombine with smaller dataset
let wrappedSmallNumbers = map (Sum . fromIntegral :: Int -> Sum Integer) smallNumbers
let combineResult = parallelCombine wrappedSmallNumbers
putStrLn $ "Parallel combine result: " ++ show (getSum <$> combineResult)
-- Parallel sum calculation using built-in Sum semigroup
let sumResult = parallelMapReduce (Sum . fromIntegral :: Int -> Sum Integer) numbers
putStrLn $ "Parallel map-reduce result: " ++ show (getSum <$> sumResult)
-- Compare with sequential
let sequentialSum = sum numbers
putStrLn $ "Sequential sum: " ++ show sequentialSum
-- Show both approaches give same result for smaller dataset
let sequentialSmall = sum smallNumbers
putStrLn $ "Sequential small sum: " ++ show sequentialSmall
// Semigroup interface
interface Semigroup<T> {
combine(a: T, b: T): T;
}
// Sum semigroup
const SumSemigroup: Semigroup<number> = {
combine: (a, b) => a + b
};
// Simulate parallel processing with Promise-based chunks
async function parallelReduce<T>(
semigroup: Semigroup<T>,
data: T[]
): Promise<T | null> {
if (data.length === 0) return null;
if (data.length === 1) return data[0];
// Split into chunks for parallel processing
const chunkSize = Math.ceil(data.length / (navigator.hardwareConcurrency || 4));
const chunks: T[][] = [];
for (let i = 0; i < data.length; i += chunkSize) {
chunks.push(data.slice(i, i + chunkSize));
}
// Process chunks in parallel (simulated with setTimeout)
const chunkPromises = chunks.map(chunk =>
new Promise<T>((resolve) => {
setTimeout(() => {
// Sequential reduce within each chunk
const result = chunk.reduce(semigroup.combine);
resolve(result);
}, Math.random() * 100); // Simulate processing time
})
);
// Wait for all chunks to complete
const chunkResults: T[] = await Promise.all(chunkPromises);
// Combine chunk results using a simple loop
if (chunkResults.length === 0) return null;
if (chunkResults.length === 1) return chunkResults[0];
let result = chunkResults[0];
for (let i = 1; i < chunkResults.length; i++) {
result = semigroup.combine(result, chunkResults[i]);
}
return result;
}
// Map-reduce with parallel processing
async function parallelMapReduce<T, U>(
semigroup: Semigroup<T>,
data: U[],
mapper: (item: U) => T
): Promise<T | null> {
const mapped = data.map(mapper);
return parallelReduce(semigroup, mapped);
}
// Example usage
async function demonstrateParallelSum() {
const numbers = Array.from({length: 1000000}, (_, i) => i + 1);
console.time('Parallel Sum');
const parallelSum = await parallelMapReduce(SumSemigroup, numbers, x => x);
console.timeEnd('Parallel Sum');
console.log('Parallel sum result:', parallelSum);
// Compare with sequential processing
console.time('Sequential Sum');
const sequentialSum = numbers.reduce((a, b) => a + b, 0);
console.timeEnd('Sequential Sum');
console.log('Sequential sum result:', sequentialSum);
}
// Run the demonstration
demonstrateParallelSum();
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
public interface ISemigroup<T>
{
T Combine(T a, T b);
}
// Semigroup implementation for sum
public class SumSemigroup : ISemigroup<long>
{
public long Combine(long a, long b) => a + b;
}
// Parallel processor using TPL
public class ParallelProcessor<T>
{
private readonly ISemigroup<T> _semigroup;
public ParallelProcessor(ISemigroup<T> semigroup)
{
_semigroup = semigroup;
}
// Parallel reduce using divide-and-conquer
public async Task<T?> ParallelReduceAsync(IEnumerable<T> data)
{
var list = data.ToList();
if (!list.Any()) return default(T);
if (list.Count == 1) return list[0];
return await Task.Run(() => ParallelReduceInternal(list));
}
private T ParallelReduceInternal(IList<T> data)
{
if (data.Count == 1) return data[0];
if (data.Count == 2) return _semigroup.Combine(data[0], data[1]);
// Split into chunks for parallel processing
var mid = data.Count / 2;
var left = data.Take(mid).ToList();
var right = data.Skip(mid).ToList();
// Process both halves in parallel
var leftTask = Task.Run(() => ParallelReduceInternal(left));
var rightTask = Task.Run(() => ParallelReduceInternal(right));
Task.WaitAll(leftTask, rightTask);
return _semigroup.Combine(leftTask.Result, rightTask.Result);
}
// Map-reduce with parallel processing
public async Task<T?> ParallelMapReduceAsync<U>(
IEnumerable<U> data,
Func<U, T> mapper)
{
var list = data.ToList();
if (!list.Any()) return default(T);
// Determine optimal chunk size
var processorCount = Environment.ProcessorCount;
var chunkSize = Math.Max(1, list.Count / processorCount);
// Split into chunks
var chunks = list
.Select((item, index) => new { item, index })
.GroupBy(x => x.index / chunkSize)
.Select(g => g.Select(x => x.item).ToList())
.ToList();
// Process chunks in parallel
var chunkTasks = chunks.Select(chunk => Task.Run(() =>
{
var mapped = chunk.Select(mapper);
return mapped.Aggregate(_semigroup.Combine);
}));
var chunkResults = await Task.WhenAll(chunkTasks);
// Combine chunk results
return chunkResults.Aggregate(_semigroup.Combine);
}
// PLINQ-based parallel processing
public T? PLinqMapReduce<U>(IEnumerable<U> data, Func<U, T> mapper)
{
return data.AsParallel()
.Select(mapper)
.Aggregate(_semigroup.Combine);
}
}
// Performance demonstration
class Program
{
static async Task Main(string[] args)
{
var numbers = Enumerable.Range(1, 10_000).ToList();
// Parallel sum processing
var sumProcessor = new ParallelProcessor<long>(new SumSemigroup());
var parallelSum = await sumProcessor.ParallelMapReduceAsync(numbers, x => (long)x);
Console.WriteLine($"Parallel Sum: {parallelSum}");
// PLINQ comparison
var plinqSum = sumProcessor.PLinqMapReduce(numbers, x => (long)x);
Console.WriteLine($"PLINQ Sum: {plinqSum}");
// Sequential comparison
var sequentialSum = numbers.Select(x => (long)x).Aggregate((a, b) => a + b);
Console.WriteLine($"Sequential Sum: {sequentialSum}");
// Demonstrate associativity in parallel context
Console.WriteLine("\nDemonstrating Associativity in Parallel Processing:");
var smallNumbers = new[] { 1, 2, 3, 4, 5, 6, 7, 8 };
var result1 = await sumProcessor.ParallelReduceAsync(smallNumbers.Select(x => (long)x));
var result2 = smallNumbers.Select(x => (long)x).Aggregate((a, b) => a + b);
Console.WriteLine($"Parallel result: {result1}");
Console.WriteLine($"Sequential result: {result2}");
Console.WriteLine($"Results match: {result1 == result2}");
}
}
We can visualize parallel computations as well.
Sequential Processing:
a -> (∘) -> temp1 -> (∘) -> temp2 -> (∘) -> result
b -> -> c -> -> d ->
Reduction depth: O(n)
Parallel Processing (Divide & Conquer):
Level 1 (Parallel)
a ──┐ ┌── c
|--> (∘) --> temp1 │
b ──┘ |--> (∘) --> temp2
│
d ──┘
Level 2 (Combine)
temp1 ──┐
|--> (∘) --> final_result
temp2 ──┘
Reduction depth: O(log n) with enough workers
Conclusion #
Semigroups are among the most fundamental and practical algebraic structures. Their simplicity, requiring only an associative binary operation, belies their power and versatility in real-world applications.
Semigroups are useful for:
🔧 Simplicity Minimal requirements: just a set with an associative operation Easy to understand and implement across different programming languages Natural abstraction that models many common operations (concatenation, addition, multiplication, merging)
⚡ Parallel Processing Associativity enables safe parallelization without coordination overhead Support for divide-and-conquer algorithms Scalability across multiple CPU cores and distributed systems MapReduce-style computations
🧩 Composability and Modularity Combine small, simple operations into larger transformations Build larger systems from smaller, testable components
Source code #
Reference implementation (opens in a new tab)