CUDA (GPU) implementation of the sieve of Eratosthenes.
Key components used to make it really fast:
- segmented sieve of Eratosthenes
- sieve in shared memory
- bucket concept
- wheel concept
- blackkey concept - a compact layout of the numbers to be sieved, one odd number per bit (see the sketch after this list)
- all data structures are reused again and again - no malloc, no free, hence no memory leaks, and the init time is negligible
- no bulk data transfer between CPU and GPU; only the sieve result is transferred from GPU to CPU
- different sieving methods for very small, small, medium, large and very large numbers
- internal granularity is 10^9; subranges of it are handled by the counting module
- Sieve range 0 ... 2^64 (1.8446744x10^19)
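To illustrate the shared-memory sieve and the one-odd-number-per-bit layout, here is a minimal CUDA sketch of a segmented sieve kernel. It is not LingSieve's actual kernel: the name sieve_segment, the 8 KiB segment size and the launch shape are assumptions for the example, and the bucket and wheel optimizations as well as the per-magnitude method selection are omitted.

```cuda
#include <cstdint>
#include <cuda_runtime.h>

// One bit per odd number: bit i of a segment starting at the odd value
// 'low' represents low + 2*i. 8 KiB of shared memory per block.
constexpr int      SEG_WORDS = 2048;
constexpr int      SEG_BITS  = SEG_WORDS * 32;        // odd numbers per segment
constexpr uint64_t SEG_SPAN  = 2ull * SEG_BITS;       // integers covered

// Each block sieves one segment in shared memory, then reduces the
// surviving bits to a prime count with __popc(). 'primes' holds the odd
// sieving primes up to sqrt(end of range); 'start' is assumed even.
// Handling of 2, of the value 1, of a partial last segment and of
// overflow near 2^64 is omitted for brevity.
__global__ void sieve_segment(const uint32_t* primes, int nPrimes,
                              uint64_t start, unsigned long long* count)
{
    __shared__ uint32_t bits[SEG_WORDS];
    uint64_t low = start + (uint64_t)blockIdx.x * SEG_SPAN + 1; // first odd

    // 1) Mark every odd number in the segment as a prime candidate.
    for (int w = threadIdx.x; w < SEG_WORDS; w += blockDim.x)
        bits[w] = 0xFFFFFFFFu;
    __syncthreads();

    // 2) Cross off odd multiples; threads take sieving primes round-robin.
    for (int i = threadIdx.x; i < nPrimes; i += blockDim.x) {
        uint64_t p = primes[i];
        uint64_t m = p * p;                      // first useful multiple
        if (m < low) {
            m = (low + p - 1) / p * p;           // first multiple >= low
            if ((m & 1) == 0) m += p;            // make it odd (p is odd)
        }
        for (; m < low + SEG_SPAN; m += 2 * p) { // skip even multiples
            uint64_t bit = (m - low) >> 1;
            atomicAnd(&bits[bit >> 5], ~(1u << (bit & 31)));
        }
    }
    __syncthreads();

    // 3) Count the surviving bits - the primes found in this segment.
    unsigned long long local = 0;
    for (int w = threadIdx.x; w < SEG_WORDS; w += blockDim.x)
        local += __popc(bits[w]);
    atomicAdd(count, local);
}
```

A launch like sieve_segment<<<numSegments, 256>>>(dPrimes, nPrimes, start, dCount) would cover numSegments x 2^17 numbers per call; the real program layers the bucket and wheel techniques on top of such a kernel and reuses all buffers across calls, as described above.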
The program LingSieve counts primes; it is designed for long duration tests (LDT), for handling large quantities of data, and for generating large prime tables - that is what GPU computing is all about.
As of V3.2, small sieve jobs can be launched as well, e.g. LingSieve 100 -d=48 to sieve from 100 ... 148 and find 9 primes.
The program scales to the GPU hardware automatically; there is no need to configure anything.
Measurements for V2.1 ... V2.3 were taken with an NVIDIA GeForce GTX 1080 (Palit Gamerock)
Measurements for V3.x were taken with an NVIDIA GeForce RTX 5060 Ti (Palit Infinity 3)
| Range | Count | V2.1 | V2.2 | V2.3 | V3.0 | V3.1 | V3.2 |
|---|---|---|---|---|---|---|---|
| 0 … 10^11 | 4,118,054,813 | 0.60s | 0.577s | 0.564s | 0.225s | 0.195s | 0.160s |
| 0 … 10^12 | 37,607,912,018 | 8.31s | 7.32s | 5.91s | 2.993s | 2.643s | 2.368s |
| 0 … 10^13 | 346,065,536,839 | 103s | 87.2s | 75.9s | 36.2s | 33.1s | 30.3s |
| 0 … 10^14 | 3,204,941,750,802 | 1231s | 1019s | 952s | 439s | 408s | 368s |

The following measurements are for windows of size 10^11 at increasing offsets (V3.x times in ms):

| Range | Count | V2.1 | V2.2 | V2.3 | V3.0 | V3.1 | V3.2 |
|---|---|---|---|---|---|---|---|
| 0 … 10^11 | 4,118,054,813 | 0.60s | 0.577s | 0.564s | 225ms | 195ms | 160ms |
| 10^12 + 10^11 | 3,612,791,400 | 0.90s | 0.795s | 0.707s | 303ms | 293ms | 273ms |
| 10^13 + 10^11 | 3,340,141,707 | 1.08s | 0.919s | 0.815s | 376ms | 357ms | 326ms |
| 10^14 + 10^11 | 3,102,063,927 | 1.26s | 1.057s | 1.002s | 469ms | 439ms | 397ms |
| 10^15 + 10^11 | 2,895,317,534 | 1.44s | 1.235s | 1.186s | 553ms | 522ms | 472ms |
| 10^16 + 10^11 | 2,714,336,584 | 1.62s | 1.494s | 1.365s | 642ms | 610ms | 554ms |
| 10^17 + 10^11 | 2,554,712,095 | 1.81s | 1.664s | 1.548s | 732ms | 702ms | 644ms |
| 10^18 + 10^11 | 2,412,731,214 | 2.08s | 1.912s | 1.790s | 827ms | 794ms | 742ms |
| 4x10^18 + 10^11 | 2,334,654,194 | 2.34s | 2.166s | 2.056s | 895ms | 865ms | 814ms |
| 10^19 + 10^11 | 2,285,693,139 | 2.64s | 2.518s | 2.326s | 989ms | 942ms | 895ms |
| 1.4x10^19 + 10^11 | 2,268,304,926 | 2.81s | 2.646s | 2.461s | 1032ms | 984ms | 939ms |
| 1.8x10^19 + 10^11 | 2,255,482,326 | 2.91s | 2.747s | 2.586s | 1073ms | 1027ms | 986ms |
This is the fastest implementation of the sieve of Eratosthenes on a GPU to the best of my knowledge.
Requirements:
- NVIDIA graphics card with driver
- GPU memory: 1346MB ... 1926MB, depending on the range to be sieved
- a 64-bit host application and a non-embedded operating system (Linux, Windows)
The available binaries were compiled with Microsoft Visual Studio Community for 64-bit Windows and the CUDA Toolkit for NVIDIA GPUs with the compute capability (CC) indicated in the table.
| Year | V | MSVS | ToolKit | CC | Driver | Target | MD5 |
|---|---|---|---|---|---|---|---|
| 2018 | 2.1 | 2017 | 9.0 | 6.1 | >=384.81 | W10 | 294001B4092856F5C09A7DCBB907B340 |
| 2022 | 2.2 | 2017 | 9.2 | 6.1 | >=398.26 | W10 | 4EFCB3C000C2C6ACCE0A8C27C68E274F |
| 2026 | 2.3 | 2022 | 12.4 | 6.1 | >=525.xx | W1x | A62DB3C002BD4CF1BFFAB52A65A344EE |
| 2026 | 3.0 | 2022 | 12.9 | 7.5&12.0 | >=575.xx | W1x | E5F9A9A364BD51C4D1FA7761FBA3A259 |
| 2026 | 3.1 | 2022 | 12.9 | 7.5&12.0 | >=575.xx | W1x | 74494FB3BA629F488A926A7F798928C3 |
| 2026 | 3.2 | 2022 | 12.9 | 7.5&12.0 | >=575.xx | W1x | 4EAC17C270A5DDAFD6B1B018E5173E43 |
The sieve source code for V2.2 ... V3.2 is identical; only program and CUDA parameters and the tool sets were modified.
Open a Command Prompt and run the program LingSieve:
1. started without parameters, the program uses the default values - StartValue 0 and segment size 10^11 - and starts an LDT
2. if there is a valid Result file, the program uses its last entry as StartValue, keeps the segment size of 10^11, and starts an LDT
3. if there is valid user input, it is used instead of 1) or 2) - see the sketch below
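A minimal host-side sketch of this startup precedence; the names here (pickStartValue, Result.txt) are assumptions for illustration, not LingSieve's actual code:

```cuda
#include <cstdint>
#include <fstream>

// Hypothetical illustration of the startup rules: user input (rule 3)
// overrides the Result file (rule 2), which overrides the default (rule 1).
uint64_t pickStartValue(bool haveUserInput, uint64_t userStart)
{
    if (haveUserInput)
        return userStart;                 // rule 3: user input wins

    uint64_t start = 0;                   // rule 1: default StartValue 0
    std::ifstream result("Result.txt");   // rule 2: resume after last entry
    for (uint64_t last; result >> last; )
        start = last;                     // keep the file's final entry
    return start;
}
```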
| Examples ≥ V3.2 | Comment | Count |
|---|---|---|
| LingSieve -GPU | Show your graphics card's parameters | - |
| LingSieve -bench | Benchmark your graphics card | - |
| LingSieve /? | Display the help text | - |
| LingSieve 0 | Count from 0 ... 10^11 | 4,118,054,813 |
| LingSieve 0 -d=1e9 | Count from 0 ... 10^9 | 50,847,534 |
| LingSieve 0 -d=1e10 | Count from 0 ... 10^10 | 455,052,511 |
| LingSieve 1e15 -d=1e11 | Count from 10^15 ... 10^15+10^11 | 2,895,317,534 |
| LingSieve 1e17 -d=1e12 | Count from 10^17 ... 10^17+10^12 | 25,546,659,722 |
| LingSieve 100 -d=48 | Count from 100 ... 148 | 9 |
| LingSieve 1e17 -d=2048 | Count from 10^17 ... 10^17+2048 | 55 |
| LingSieve 1e17+123 -d=1e4 | Count from 10^17+123 ... 10^17+123+10^4 | 265 |
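As the last examples above show, the start value and the -d delta accept scientific notation with an optional +offset. A minimal sketch of parsing such arguments; parseValue and the overall structure are assumptions for illustration, not LingSieve's actual parser:

```cuda
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Parse "100", "1e17" or "1e17+123" into a uint64_t. Powers of ten up to
// 10^22 are exact in a double, so the conversion below loses nothing for
// the ranges shown above.
uint64_t parseValue(const char* s)
{
    char* end = nullptr;
    uint64_t v = (uint64_t)strtod(s, &end);   // handles "100" and "1e17"
    if (*end == '+')                          // optional "+offset" suffix
        v += strtoull(end + 1, nullptr, 10);
    return v;
}

int main(int argc, char** argv)
{
    // e.g. LingSieve 1e17+123 -d=1e4
    uint64_t start = argc > 1 ? parseValue(argv[1]) : 0;
    uint64_t delta = 100000000000ull;         // default segment size 10^11
    for (int i = 2; i < argc; ++i)
        if (strncmp(argv[i], "-d=", 3) == 0)
            delta = parseValue(argv[i] + 3);
    printf("sieve %llu ... %llu\n",
           (unsigned long long)start, (unsigned long long)(start + delta));
    return 0;
}
```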
I've sieved 6.8x10^16 numbers in various ranges without any errors - that is 68,000,000 cycles. Once I ran it for more than 60 hours continuously without any problems; it just works as it is supposed to. The results were compared with the extensive prime-counting tables of Tomás Oliveira e Silva and Jan Büthe, and randomly with Kim Walisch's 64-bit program primesieve.
Basic development is finished.