One simple trick to make Python 10x to 100x faster (Codon)

https://docs.exaloop.io/codon/

Codon compiles Python-like code ("Codon") to native code ahead of time. Alternatively, you can keep your regular Python codebase and have selected functions compiled to native code by the Codon JIT. It also supports OpenMP-style multithreading and GPUs.

Syntax (same as Python, with additions): https://docs.exaloop.io/codon/language/basics

The rule of thumb seems to be that you can compile your .py file directly. The exception is very large codebases, where you can use the Codon JIT on selected functions instead.
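
As a rough illustration of the "use your .py file directly" idea, here is a sketch of a script that sticks to features Python and Codon share, so the same file should run under either. The file name primes.py and the function count_primes are made up for this example. The commands in the comments (codon run -release, codon build -release -exe) are the ones I know from the Codon documentation, so check them against your installed version.

```python
# primes.py (hypothetical file name)
# Run with CPython:            python primes.py
# Run with Codon (optimized):  codon run -release primes.py
# Build a native executable:   codon build -release -exe primes.py

def is_prime(n: int) -> bool:
    # Naive trial division: deliberately slow, so the speed difference is visible.
    if n <= 1:
        return False
    for i in range(2, n):
        if n % i == 0:
            return False
    return True

def count_primes(lo: int, hi: int) -> int:
    # Count the primes in the half-open range [lo, hi).
    return sum(1 for i in range(lo, hi) if is_prime(i))

print(count_primes(2, 5000))
```

The type annotations are optional in both cases. They are only there to make it obvious that every value has a single static type, which is what an ahead-of-time compiler needs.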

1. JIT

import codon
from time import time

def is_prime_python(n):
    if n <= 1:
        return False
    for i in range(2, n):
        if n % i == 0:
            return False
    return True

@codon.jit
def is_prime_codon(n):
    if n <= 1:
        return False
    for i in range(2, n):
        if n % i == 0:
            return False
    return True

t0 = time()
ans = sum(1 for i in range(100000, 200000) if is_prime_python(i))
t1 = time()
print(f'[python] {ans} | took {t1 - t0} seconds')

t0 = time()
ans = sum(1 for i in range(100000, 200000) if is_prime_codon(i))
t1 = time()
print(f'[codon]  {ans} | took {t1 - t0} seconds')

Output:

[python] 8392 | took 39.6610209941864 seconds
[codon]  8392 | took 0.998633861541748 seconds
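
One variation on the JIT example, as a sketch rather than a recommendation: fall back to plain Python when the codon package is not installed, and make one warm-up call before timing. The name jit and the no-op fallback are my own additions. The warm-up assumes, as I understand the Codon docs, that a JIT-compiled function is compiled on its first call, so that call is slower than later ones.

```python
from time import perf_counter

try:
    import codon                 # Codon's Python package, provides the JIT decorator
    jit = codon.jit
except ImportError:
    def jit(fn):                 # assumption: no-op stand-in when Codon is not installed
        return fn

@jit
def is_prime(n):
    if n <= 1:
        return False
    for i in range(2, n):
        if n % i == 0:
            return False
    return True

is_prime(2)                      # warm-up call, so one-time compilation is not timed

t0 = perf_counter()
ans = sum(1 for i in range(1000, 2000) if is_prime(i))
t1 = perf_counter()
print(f'{ans} primes | took {t1 - t0:.4f} seconds')
```

perf_counter is used instead of time because it is meant for measuring short intervals. The range is kept small here so the fallback path finishes quickly.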

2. OpenMP

Note that @par on a for loop is Codon syntax, not valid Python, so this snippet has to be run with codon rather than the regular Python interpreter.

@par
for i in range(10):
    import threading as thr
    print('hello from thread', thr.get_ident())

3. GPU

Only Nvidia devices are supported.

import gpu

@gpu.kernel
def hello(a, b, c):
    i = gpu.thread.x
    c[i] = a[i] + b[i]

a = [i for i in range(16)]
b = [2*i for i in range(16)]
c = [0 for _ in range(16)]

hello(a, b, c, grid=1, block=16)
print(c)

The same computation can be written more simply with a parallel loop:

a = [i for i in range(16)]
b = [2*i for i in range(16)]
c = [0 for _ in range(16)]

@par(gpu=True)
for i in range(16):
    c[i] = a[i] + b[i]

print(c)