Skip to content

LingUaan/LingSieve

Repository files navigation

LingSieve

Download

CUDA (GPU) implementation of the sieve of Eratosthenes.

Key components used to make it really fast:

  • segmented sieve of Eratosthenes
  • sieve in shared memory
  • bucket concept
  • wheel concept
  • blackkey concept - compact layout of primes to be sieved, one odd number per bit
  • reuse all data structures again and again - no malloc, no free hence no memory leaks and init time can be neglected
  • no bulk data transfer between CPU and GPU, only the sieve result will be transfered from GPU to CPU
  • different sieving method for very small, small, medium, large and very large numbers
  • internal granularity is 10^9, subsets of it will be dealt with by the counting module
  • Sieve range 0 ... 2^64 (1,8446744x10^19)

The program LingSieve counts primes and is designed for long duration tests (LDT), to handle large quantities of data, for the generation of large prime tables. That is what GPU computing is all about.
As of V3.2 small sieve jobs can be launched as well, e.g. LingSieve 100 -d=48 - to sieve from 100 ... 148 and find 9 primes.

Benchmark and HW Scaling

The GPU HW will scale the program accordingly, there is no need to configure anything.

Measurements V2.1 ... V2.3 taken with a NVIDIA GeForce GTX 1080 (Palit Gamerock)
Measurements V3.x taken with a NVIDIA GeForce GTX 5060 Ti (Palit Infinity 3)

Range Count V2.1 V2.2 V2.3 V3.0 V3.1 V3.2
0 … 10^11 4.118.054.813 0,60s 0,577s 0,564s 0,225s 0,195s 0,160s
0 … 10^12 37.607.912.018 8,31s 7,32s 5,91s 2,993s 2,643s 2,368s
0 … 10^13 346.065.536.839 103s 87,2s 75,9s 36,2s 33,1s 30,3s
0 … 10^14 3.204.941.750.802 1231s 1019s 952s 439s 408s 368s
0 … 10^11 4.118.054.813 0,60s 0,577s 0,564s 225ms 195ms 160ms
10^12 + 10^11 3.612.791.400 0,90s 0,795s 0,707s 303ms 293ms 273ms
10^13 + 10^11 3.340.141.707 1,08s 0,919s 0,815s 376ms 357ms 326ms
10^14 + 10^11 3.102.063.927 1,26s 1,057s 1,002s 469ms 439ms 397ms
10^15 + 10^11 2.895.317.534 1,44s 1,235s 1,186s 553ms 522ms 472ms
10^16 + 10^11 2.714.336.584 1,62s 1,494s 1,365s 642ms 610ms 554ms
10^17 + 10^11 2.554.712.095 1,81s 1,664s 1,548s 732ms 702ms 644ms
10^18 + 10^11 2.412.731.214 2,08s 1,912s 1,790s 827ms 794ms 742ms
4x10^18 + 10^11 2.334.654.194 2,34s 2,166s 2,056s 895ms 865ms 814ms
10^19 + 10^11 2.285.693.139 2,64s 2,518s 2,326s 989ms 942ms 895ms
1,4x10^19 + 10^11 2.268.304.926 2,81s 2,646s 2,461s 1032ms 984ms 939ms
1,8x10^19 + 10^11 2.255.482.326 2,91s 2,747s 2,586s 1073ms 1027ms 986ms

This is the fastest implementation of the sieve of Eratosthenes on a GPU to the best of my knowledge.

Prerequisite

  • NVIDIA graphics card with driver
  • GPU memory: 1346MB .. 1926MB depending on the range to be sieved
  • a 64-bit host application and non-embedded operating system (Linux, Windows)

Binary

The available binaries have been compiled with Microsoft Visual Studio Community for 64bit Windows and CUDA ToolKit for NVIDIA GPUs with compute capability (CC) as indicated in the table.

Year V MSVS ToolKit CC Driver Target MD5
2018 2.1 2017 9.0 6.1 >=384.81 W10 294001B4092856F5C09A7DCBB907B340
2022 2.2 2017 9.2 6.1 >=398.26 W10 4EFCB3C000C2C6ACCE0A8C27C68E274F
2026 2.3 2022 12.4 6.1 >=525.xx W1x A62DB3C002BD4CF1BFFAB52A65A344EE
2026 3.0 2022 12.9 7.5&12.0 >=575.xx W1x E5F9A9A364BD51C4D1FA7761FBA3A259
2026 3.1 2022 12.9 7.5&12.0 >=575.xx W1x 74494FB3BA629F488A926A7F798928C3
2026 3.2 2022 12.9 7.5&12.0 >=575.xx W1x 4EAC17C270A5DDAFD6B1B018E5173E43

Sieve source code for V2.2 ... V3.2 is identical. Only program and CUDA parameters and tool sets were modified.

Usage

Open a Command Prompt and run the program LingSieve:

  1. start without parameter - it will use initial values - StartValue (0), segment size (10^11) and start a LDT
  2. if there is a valid Result file it will use the last entry as StartValue, segment size (10^11) and start a LDT
  3. if there is a valid user input it will use it instead of 1) or 2)
Examples ≥ V3.2 Comment Count
LingSieve -GPU get your graphics cards parameter -
LingSieve -bench get your graphics cards performance -
LingSieve /? Display help text -
LingSieve 0 Count from 0 ... 10^11 4.118.054.813
LingSieve 0 -d=1e9 Count from 0 ... 10^9 50.847.534
LingSieve 0 -d=1e10 Count from 0 ... 10^10 455.052.511
LingSieve 1e15 -d=1e11 Count from 10^15 ... 10^15+10^11 2.895.317.534
LingSieve 1e17 -d=1e12 Count from 10^17 ... 10^17+10^12 25.546.659.722
LingSieve 100 -d=48 Count from 100 ... 148 9
LingSieve 1e17 -d=2048 Count from 1e17 ... 1e17+2048 55
LingSieve 1e17+123 -d=1e4 Count from 1e17+123 ... 1e17+1e4+123 265

Status

I've sifted 6,8x10^16 numbers in various ranges without any errors, that is 68.000.000 cycles. Once I run it for more than 60 hours continuously without any problems, it just works as it is supposed to. The results were compared with the extensive prime sieve tables of Tomás Oliveira e Silva and Jan Büthe, randomly with the program primesieve 64bit of Kim Walisch.

Basic development is finished.

Packages

 
 
 

Contributors