Eratosthene optimisation, less memory usage by factor faster some result

antoine · July 12, 2023, 10:00pm

here some result bench of eratosthene using bitarray excluding factor of 2

limit 1e+11 found 4118054814 in 9 min. 51.17 sec. memory usage 5.82GB
limit 1e+12 found 37607912016 2 h. 0 min. 32.63 sec memory usage 58,08 GB

my implementation
limit: 1e+11 found 4118054882 in 3 min. 45.32 sec. memory usage 2,42 GB less memory up to 58,33 % compare to eratos
limit 1e+12 found 37607912066 48 min. 48.82 sec memory usage 24,19 GB same
limit 1.010 e+13 349405688301 14 h. 42 min. 11.82 sec Memory usage 212 GB less merory usage reaching 61,56 %

the limit here is a little higer precisely for 1e+11 is 100000001640.
need colaboration for doing crible on disk instead of in memory

kknechtel · July 12, 2023, 10:49pm

We can only possibly comment on code if that code is actually shown to us.

antoine · July 12, 2023, 11:03pm

thanks, but i didn’t need comment of, many month of work, was only looking for crazy guys like me for cribling on disk instead of in memory.
for now my limt is 1.010 e+13 for 230 Gb of memory usage that’s why i m looking for adapting that algo on disk. thank for the advice

this is the normal eratosthene sieves if u have any advice for doing it on disk instead

from bitarray import bitarray
from time import perf_counter


def eratostene(n):
    """
    Implements the Sieve of Eratosthenes algorithm to find prime numbers up to n.

    Args:
        n (int): Upper limit for the ℙ search.

    Returns:
        primes (list of int):
        List of prime numbers ℙ up to n.

    """
    # pattern for exclusion of 10k+3 & 10k+5 and of course even numbers

    primes = bitarray()
    pattern = bitarray('100101101101001')
    primes.extend(pattern * (n // 30))
    primes.extend(pattern[:n % 30])
    primes[0] = False

    # Perform the sieve algorithm
    benchcore = perf_counter()
    for index in range(3, (int(n ** 0.5) << 0b1) + 0b1):
        if primes[index]:
            p = (index << 0b1) + 0b1
            primes[p**2 >> 0b1::p] = 0
    benchcore_end = perf_counter()
    print(f"【Ératosthène】 find: {primes.count(1)+3} in {benchcore_end - benchcore:.2f} Sec.")

    # convert into odd numbers list
    for index in range(0b1, len(primes)):
        if primes[index]:
            yield (index << 0b1) + 0b1
print(list(eratosthene(10000000)))

pochmann · July 13, 2023, 12:03am

So you found 68 more primes for the same limit?

antoine · July 21, 2023, 4:06pm

                                       << Modular sieve >>

Memory needed: 0.2419 GB
Exclusion: [2, 3, 5, 7, 11] Modulo: 2310 Elements in maybe_prime: 480 Last element: 2309
Limit sieved: 10000001550 Limit asked: 1.0000000000000e+10 Elements prime in 1st loop: 338 Last element: 2309 Numbers of loop: 44

Sieving core 455052671 Prime numbers in 0 min. 19.05 sec.
All primes generated (455052511) in one list in (2 min. 12.75 sec.) total execution: 151.802 sec.

                                      <<Sieve Eratosthene>>

Memory needed: 0.5821 GB

Sieved core: 455052508 Prime numbers in 0 min. 47.60 sec.
All prime generated (455052511) in one list in (9 min. 26.65 sec.) Total execution: 614.242 sec.

antoine · August 20, 2023, 9:51am

Limit: 1.0000e+12

Initialisation 0 min. 5.59 sec.
Sieve core segment 1: 0 min. 51.45 sec.
Sieve core segment 2: 1 min. 3.94 sec.

Antoine            2 min.    0.98 sec.         Memory use: 2 x 11432.66 MB    Product of [2, 3, 5, 7, 11, 13] = 30030     Candidates: 5760

Eratosthene:  2 h. 0 min.	32.63 sec.

60 time faster

antoine · August 21, 2023, 11:01am

from bitarray import bitarray
from time import perf_counter as timer


class Chrono:
    """Class to measure time intervals and accumulate timing information.
    Attributes:
        bench (list of float): List of timer values used to track time intervals.

    # usage:
      - mytimer = Chrono()
      - print (f" blah blah blah {mytimer.mark(False,4)}")
    """
    def __init__(self):
        """Initialize the timer list with the current time."""
        self.bench = [timer()]

    def mark(self, tot=False, precision=2, layout="min"):
        """Mark a new event and return the time elapsed since the begining or the last marker.
        Args:
            tot (bool): Default (False) do math with previous time. (True) with starting time.
            precision (int): Default (2) Number of decimal places to display for time intervals.
            layout (str): Default ("min") display minutes too and ("sec") for seconds only.
        Returns:
            str: Time elapsed in the specified format."""
        self.bench.append(timer())
        seconds_elapsed = self.bench[-1] - self.bench[-2] if not tot else self.bench[-1] - self.bench[0]
        return f"{seconds_elapsed:.{precision}f} sec." if layout == "sec" else f"{seconds_elapsed // 60:.0f} min. {seconds_elapsed % 60:.{precision}f} sec."


def sieve_antoine(n, exclude):
    def gcd(a, b):
        while b != 0:
            a, b = b, a % b
        return a
    bench = Chrono()
    prod_exclude = 1
    for element in exclude:
        prod_exclude *= element  # Multiply excluded prime numbers together.
    candidates = [i for i in range(1, prod_exclude) if gcd(i, prod_exclude) == 1]  # Find every common factor in prod_exclude.

    arr_k = {val: bitarray() for val in candidates}  # Create an array for each candidate in candidates.
    [arr_k[val].extend(bitarray('1') * (n // prod_exclude + 1)) for val in candidates]  # size and init all arrays.

    print(f"\nLimit: {n:.4e} \033[92mMemory use: {((n / prod_exclude + 1) * len(candidates)) / 8 / 1024 / 1024:.2f} MB\033[0m"
          f"    Product of {exclude} = {prod_exclude}     Candidates: {len(candidates)}\nInitialisation    {bench.mark()}")

    for nth in range(0, int(n ** 0.5) // (prod_exclude - 1) + 1):  # limit ** 0.5 + 1
        for maybee in candidates:  # Iterate through each element of candidates.
            if nth == 0 and maybee == 1:  # don't sieve last number exclude % prod_exclude == 1  for the first loop
                continue
            if arr_k[maybee][nth]:  # If is true, is a prime. will wark all square in every array
                step = nth * prod_exclude + maybee  # Convert to real value.
                for x in candidates:  # Loop through candidates to mark each array.
                    if x < maybee:
                        x = x + prod_exclude  # Adjust x by adding prod_exclude. This adds 1 to the starting point index further if x is smaller than the candidate.
                    add = (x - maybee) >> 1
                    arr_k[(x * maybee) % prod_exclude][(step ** 2 + (step * add << 1)) // prod_exclude:: step] = False
    print(f"Sieve core:       {bench.mark()}")

    arr_k[1][0] = False  # Remove the number 1
    lst = list(exclus) + [(ix * prod_exclude) + val for ix in range(0, (n // prod_exclude) + 1) for val in candidates if arr_k[val][ix]]  # list exclude + Generate prime numbers.
    while lst[-1] > n: lst.pop()  # Remove over-generated numbers from the last window of prod_exclude.

    print(f"ℙrime generate:   {bench.mark()}    {len(lst)}\nExecution time:   {bench.mark(True,3,'min')}\n")
    return lst


def sieve_eratosthene(n):
    bench2 = Chrono()
    primes = bitarray()
    primes.extend(bitarray('1') * (n >> 1))
    print(f"                   Eratosthene        \033[92mMemory use: {n / 2 / 8 / 1024 / 1024:.3f} MB\033[0m")

    for index in range(1, (int(n ** 0.5) + 1 >> 1) + 1):
        if primes[index]:
            p_sqr_div2 = (p := (index << 1) + 1) ** 2 >> 1
            primes[p_sqr_div2::p] = False
    print(f"Sieve core:       {bench2.mark()}")

    lst = [(index << 1) + 1 for index in range(1, len(primes)) if primes[index]]
    print(f"ℙrime generate:   {bench2.mark()}    {len(lst)+1}\nExecution time:   {bench2.mark(True)} ")
    return [2] + lst


def compare(list1, list2):
    """Compare two lists and print the differences between them.
    Args:
        list1 (list[int]): First list to compare.
        list2 (list[int]): Second list to compare."""
    set1, set2 = set(list1), set(list2)
    oops1, oops2 = list(set1 - set2), list(set2 - set1)
    print("Both lists are identical") if len(oops1) == 0 and len(oops2) == 0 else print(f"Elements in list #1 not in list #2:{oops1} \nElements in list #2 not in list #1:{oops2}")


limit = 1000000000  # Limit.
# more you exclude, more the array will be short.
# if u excluse 2 economy 50% 3 is 16.5% more 5 is around 7% more ...bcose some are already exlude
# and shorter the array, the calculations done are less efficient.   better with big ranger
# sieving is faster up to 3 time and generate list faster than eratos for big list
exclus = [2, 3, 5, 7, 11]  # Exclude number, don't need to be consecutive( but exclude will be show first in the list). best up to 7 or 11 
ant = sieve_antoine(limit, exclus)
era = sieve_eratosthene(limit)
compare(ant, era)

antoine · August 25, 2023, 5:06pm

Multi core version
Limit: 1.0000e+12 (2 x 5.0 e 11)

                 Antoine          Product of [2, 3, 5, 7, 11, 13] = 30030     Candidates: 5760

Initialisation 0 min. 4.78 sec.
Sieve core segment 1: 0 min. 41.84 sec.
Sieve core segment 2: 1 min. 3.48 sec.
Global Execution time: 1 min. 45.316 sec.

69 time faster

antoine · August 29, 2023, 6:58pm

Limit: 10 000 000 000

                   Antoine        Memory use:  247.71 MB  modulo = 2310     Candidates: 480

(sieve 12 core, generation segmented 8 core in 1 list)
sieve: 0 min. 1.25 sec.
Generate in list: 0 min. 45.68 sec. 455052511
Execution time: 0 min. 46.93 sec.

               Eratosthene        Memory use: 596.046 MB

Sieve core: 0 min. 47.19 sec.
Generate in list: 8 min. 47.90 sec. 455052511
Execution time: 9 min. 35.08 sec.

Both lists are identicall

there is a way to generate a list more quickly?

hansgeunsmeyer · August 30, 2023, 2:56pm

If you’re going for raw speed - then using a list comprehension (as in your sieve_erasthostene function) may not be optimal. Also, have you tried using gmpy2 instead of bitarray to represent bit vectors (might be interesting to compare those two packages as to speed and memory usage)? You could also cythonize your current python code - even without writing special Cython code, that usually leads to speed ups.

hansgeunsmeyer · August 30, 2023, 4:19pm

Don’t know if you are aware of this, but the gmpy2 docs also contain a pretty smooth sample program for the sieve of Erasthotenes. See: Multiple-precision Integers (Advanced topics) — gmpy2 2.2.0a1 documentation

antoine · August 30, 2023, 6:36pm

i have try
its quick too but i get a overflow after 1.0 e+10
my way is still better with bittarray

1.0 +e9
Sieving & generation of 480 primes candidate in residue class set of 2310 (modulo) splited into 210 segments joined in 1 list . on 12 core,
Memory use: 24.00 MB
sieve: 0 min. 0.52 sec.
Generate in list: 0 min. 2.90 sec. 50847534
Running time: 0 min. 3.42 sec.

erato with gmpy2
11.020543336868286
50847534

will push my list as generator with this version of erato thanks

antoine · September 6, 2023, 7:55pm

Multi core version

from bitarray import bitarray
from multiprocessing import Pool
import pickle
import os
from time import perf_counter as timer


class Chrono:
    """Class to measure time intervals and store timing information."""
    def __init__(self):
        """Get current time in a list."""
        self.bench = [timer()]

    def mark(self, tot=False, precision=2, layout="min"):
        """Mark a new event and return the time elapsed since the begining or previous marker.
        Args:
            tot (bool): Default (False) result with previous marker. (True) with the first marker.
            precision (int): Default (2), seconds precision.
            layout (str): Default ("min") display min or ("sec") seconds only.
        Returns:
            str: Time elapsed in the specified format."""
        self.bench.append(timer())
        seconds_elapsed = self.bench[-1] - self.bench[-2] if not tot else self.bench[-1] - self.bench[0]
        return f"{seconds_elapsed:.{precision}f} sec." if layout == "sec" else f"{seconds_elapsed // 60:.0f} min. {seconds_elapsed % 60:.{precision}f} sec."


def sieve_eratosthene(n, verbose=True):
    primes = bitarray()
    primes.extend(bitarray('1') * (n >> 1))
    print(f"\033[92mMemory use: {n/2/1014/1024:.2f} MB\033[0m by Eratosthene") if verbose else None

    for i in range(1, (int(n ** 0.5) + 1 >> 1) + 1):
        if primes[i]:
            p_sqr_div2 = (p := (i << 1) + 1) ** 2 >> 1
            primes[p_sqr_div2::p] = False
    print(f"Sieve core:       {bench.mark()}") if verbose else None

    yield 2
    for _ in range(1, len(primes)):
        if primes[_]:
            yield (_ << 1) + 1


class Constants:
    """Constant values used in sieve calculations.

    Attributes:
        limit (int): The limit of the sieve.
        modulus (int): Primes exclude multiplied together.
        candidates (list[int]): Candidate primes within the modulus.
        master_loop (list[list[int]]): List of prime numbers. Need when all arrays aren't avaiable.
        add (list[list[int]]): sorted sublists and transposed, passed individually for performing parallel sieve in specific residue class."""
    @staticmethod
    def gcd(a, b):
        while b != 0:
            a, b = b, a % b
        return a

    def __init__(self, n, exclude):
        """ initializes object's attributes.
                .limit .modulus .candidates .master_loop .add

            Args:
                n (int): Limit of the sieve. Used to define master_loop.
                exclude (list[int]): Prime factors excluded from sieving."""
        self.limit = n
        self.modulus = 1
        for element in exclude:
            self.modulus *= element
        self.candidates = [i for i in range(1, self.modulus) if self.gcd(i, self.modulus) == 1]

        # List of primes == true, referring to the index of candidates.
        primes = list(sieve_eratosthene(int(n ** 0.5) + 1, False))  # Local use: Generate a list of prime numbers less than (n ** 0.5) + 1
        self.master_loop = [[] for _ in range((primes[-1] // self.modulus) + 1)]
        [self.master_loop[_ // self.modulus].append(self.candidates.index(_ % self.modulus)) for _ in primes[len(exclude):]]

        # list used for performing calculations in specific bitarray. load or make and save that list
        config_file_path = config_path + f"/{exclude}.cfg"
        if os.path.exists(config_file_path):                                        # Load constant ADD
            self.add = pickle.load(open(config_file_path, "rb"))
        else:                                                                       # create constant ADD if not
            self.add = []
            for i, value in enumerate(self.candidates):
                table_seq = [x + self.modulus for x in self.candidates[:i]] + self.candidates[i:]  # reconstruct candidates adding modulus for lower items than value.
                self.add.append([item[1] for item in sorted(((x * value) % self.modulus, (x - value) >> 1) for x in table_seq)])  # sorted list with 1st tuple, keep only the 2nd tuple.
            self.add = list(zip(*self.add))  # Transpose the sublists to group elements by index.
            os.makedirs(config_path) if not os.path.exists(config_path) else None  # Make dirpath if don't exist
            pickle.dump(self.add, open(config_file_path, "wb"))                    # Saving constant ADD


def sieve_residue_class(init, i):
    """Perform sieve calculations for one residue class set.

    Args:
        init: object holding constant (limit master_loop modulus candidates)
        i: Index used to get {keys} from candidates, representing the specific residue class.

    Returns:
        Tuple (specific residue class as keys, bitarays)"""
    value = bitarray()
    value.extend(bitarray('1') * (init.limit // init.modulus + 1))
    for nth in range(len(init.master_loop)):                    # Iterate each prod_exclude range
        for ix in init.master_loop[nth]:                       # Iterate the sublist to know if prime, datas are the index in candidates.
            step = nth * init.modulus + init.candidates[ix]   # Convert data into real number
            value[(step ** 2 + (step * init.add[i][ix] << 1)) // init.modulus::step] = 0
    return init.candidates[i], value


def generate_splited(psize, candidates, modulus, part):
    lenarray = len(part[1])
    f = [_ * modulus for _ in range(psize, psize + lenarray)]
    lst = [f[_] + c for _ in range(lenarray) for c in candidates if part[c][_]]
    return lst


def sieve_multicore_residue_class_set(limit, exclude, core, parts):
    bench2 = Chrono()
    init = Constants(limit, exclude)        # initialisation
    print(f"\nLimit: {limit}  {limit:.4e} \n\nSieving & generation of {len(init.candidates)} primes candidate in residue class set of \033[92m{init.modulus} (modulus) \033[0m"
          f"splited into {parts} segments joined in 1 list . on {core} core, \n\033[92mMemory use: {((limit // init.modulus) * len(init.candidates)) /1014/1024:.2f} MB\033[0m by Antoine")

    with Pool(core) as pool:                # multicore sieve
        results = dict(pool.starmap(sieve_residue_class, [(init, i) for i in range(len(init.candidates))]))
    results[1][0] = 0
    print(f"sieve:            {bench2.mark()}")

    part_size = len(results[1]) // parts     # split array for multicore gen
    partitions = {i + 1: {key: results[key][i * part_size:(i + 1) * part_size if i < parts - 1 else None] for key in init.candidates} for i in range(parts)}
    print(f"split:            {bench2.mark()}")

    with Pool(processes=core) as pool:      # multicore gen
        args = [(segment * part_size, init.candidates, init.modulus, partitions[segment + 1]) for segment in range(parts)]
        results = pool.starmap(generate_splited, args)
    while int(results[-1][-1]) > limit:     # remove extra numbers of the last residue class set
        results[-1].pop()
    print(f"Generate in list: {bench2.mark()}  {len(exclude) + sum(len(result) for result in results)}\nRunning time:     {bench2.mark(True)}\n")

    return exclude + [item for result in results for item in result]


if __name__ == "__main__":
    config_path = "d:/optimus"      # path of precal table ADD  take some time for big table that's i save
    limite = 10000000000
    exclusion = [2, 3, 5, 7, 11]    # Exclude numbers, need to be primes and consecutive. more you use egal less memory usage for sieving. but shorter the bitarray. need big limits to be efficient. best up to 7 or 11
    ant = sieve_multicore_residue_class_set(limite, exclusion, core=12, parts=100)   # number of core(process) max 61, and how many time all arrays are splited. create a list, so need alot of memory for big limit

    bench = Chrono()
    era = (sieve_eratosthene(limite))  # simple
    for gen_prime, list_prime in zip(era, (_ for _ in ant)):  # firstly the list "ant" is converted into a generator for compare both as generator
        if gen_prime != list_prime:                           # even with the use of generator it's a very long process for big range multicore is realy more quick
            msg = "Generator aren't identical."
            break
    else:
        msg = "Both generator are identical."
    print(f"Generator: {bench.mark()}\nRunning time:     {bench.mark(True)}\n\n{msg}")

Limit: 10000000000 1.0000e+10
Sieving & generation of 480 primes candidate in residue class set of 2310 (modulus) splited into 100 segments joined in 1 list . on 12 core,
Memory use: 2001.20 MB by Antoine
sieve: 0 min. 0.80 sec.
split: 0 min. 0.27 sec.
Generate in list: 0 min. 31.20 sec. 455052511
Running time: 0 min. 32.28 sec.
-_____________________________________-
Memory use: 4815.40 MB by Eratosthene
Sieve core: 0 min. 48.08 sec.
Generator: 9 min. 22.12 sec.
Running time: 10 min. 10.20 sec.

Both generator are identical.

Limit: 100000000000 1.0000e+11
Sieving & generation of 480 primes candidate in residue class set of 2310 (modulus) splited into 100 segments joined in 1 list . on 12 core,
Memory use: 20012.04 MB by Antoine
sieve: 0 min. 24.60 sec.
split: 0 min. 2.03 sec.
Generate in list: 5 min. 0.28 sec. 4118054813
Running time: 5 min. 26.92 sec.
-_____________________________________-
Memory use: 48153.97 MB by Eratosthene
Sieve core: 10 min. 20.09 sec.
Generator: 122 min. 11.36 sec.
Running time: 132 min. 41.45 sec.

antoine · June 28, 2024, 5:57pm

Limite: 1.0000e+10 Mod 2310 Residus de classe: 480
initialization: 0.153 sec.
sieve: 0.775 sec. cumul: 0.928 sec.
Splitting: 0.131 sec. cumul: 1.059 sec.
Generate in list: 5.870 sec. cumul: 6.929 sec. ℙremiers: 455 052 511
joining: 0.866 sec. total: 7.795 sec.

Limite: 1.0000e+11 Mod 2310 Residus de classe: 480
initialization: 0.416 sec.
sieve: 22.128 sec. cumul: 22.544 sec.
Splitting: 1.427 sec. cumul: 23.971 sec.
Generate in list: 51.182 sec. cumul: 75.153 sec. ℙremiers: 4 118 054 813
joining: 7.706 sec. total: 82.859 sec.