pyfock.Integrals

This submodule provides highly optimized and symmetry-aware routines for evaluating all essential one-electron and two-electron integrals needed in electronic structure calculations. The integrals are computed over Gaussian type orbitals (GTOs), and both CPU (NumPy) and GPU (CuPy) implementations are available for many routines to accelerate quantum chemistry workflows.

Included modules and functionality:

  • One-electron integrals:
    • Overlap integrals
    • Kinetic energy integrals
    • Nuclear attraction integrals
    • Dipole moment integrals
    • Gradients of one-electron integrals
  • Two-electron integrals:
    • Full 4-center two electron repulsion integrals using Rys quadrature
    • 3-center and 2-center two electron integrals for density fitting
    • Schwarz screening utilities for integral pruning
  • Exchange-correlation evaluation routines compatible with numerical grids
  • GPU-accelerated (CuPy) versions for key performance-critical routines
  • Modular helper functions for computing factorials, boys functions, contraction coefficients, etc.

Usage:

Example: from pyfock.Integrals import overlap_mat_symm S = overlap_mat_symm(basis)

  1"""
  2This submodule provides highly optimized and symmetry-aware routines for 
  3evaluating all essential one-electron and two-electron integrals needed in 
  4electronic structure calculations. The integrals are computed over Gaussian 
  5type orbitals (GTOs), and both CPU (NumPy) and GPU (CuPy) implementations are 
  6available for many routines to accelerate quantum chemistry workflows.
  7
  8Included modules and functionality:
  9-----------------------------------
 10- One-electron integrals:
 11    * Overlap integrals
 12    * Kinetic energy integrals
 13    * Nuclear attraction integrals
 14    * Dipole moment integrals
 15    * Gradients of one-electron integrals
 16- Two-electron integrals:
 17    * Full 4-center two electron repulsion integrals using Rys quadrature
 18    * 3-center and 2-center two electron integrals for density fitting
 19    * Schwarz screening utilities for integral pruning
 20- Exchange-correlation evaluation routines compatible with numerical grids
 21- GPU-accelerated (CuPy) versions for key performance-critical routines
 22- Modular helper functions for computing factorials, boys functions, contraction coefficients, etc.
 23
 24Usage:
 25------
 26
 27Example:
 28    from pyfock.Integrals import overlap_mat_symm
 29    S = overlap_mat_symm(basis)
 30
 31"""
 32# from .integral_helpers import fac
 33# from .integral_helpers import fastFactorial
 34# from .integral_helpers import comb
 35# from .integral_helpers import doublefactorial
 36# from .integral_helpers import c2k
 37# from .integral_helpers import calcS
 38# from .integral_helpers import vlriPartial
 39# from .integral_helpers import calcCgamminc
 40# from .integral_helpers import Fboys
 41from .mmd_nuc_mat_symm import mmd_nuc_mat_symm
 42from .nuc_mat_symm import nuc_mat_symm
 43from .nuc_mat_symm_cupy import nuc_mat_symm_cupy
 44from .kin_mat_symm import kin_mat_symm
 45from .kin_mat_symm_cupy import kin_mat_symm_cupy
 46from .kin_mat_symm_shell_cupy import kin_mat_symm_shell_cupy
 47from .overlap_mat_symm import overlap_mat_symm
 48from .overlap_mat_symm_cupy import overlap_mat_symm_cupy
 49from .cross_overlap_mat_symm import cross_overlap_mat_symm
 50from .dipole_moment_mat_symm import dipole_moment_mat_symm
 51from .dipole_moment_mat_symm_cupy import dipole_moment_mat_symm_cupy
 52from .conv_4c2e_symm import conv_4c2e_symm
 53from .mmd_4c2e_symm import mmd_4c2e_symm
 54from .rys_4c2e_symm import rys_4c2e_symm, rys_4c2e_symm_old
 55from .conv_3c2e_symm import conv_3c2e_symm
 56from .rys_3c2e_symm import rys_3c2e_symm
 57from .rys_3c2e_symm_cupy import rys_3c2e_symm_cupy
 58from .rys_3c2e_symm_cupy_fp32 import rys_3c2e_symm_cupy_fp32
 59from .conv_2c2e_symm import conv_2c2e_symm
 60from .rys_2c2e_symm import rys_2c2e_symm
 61from .rys_2c2e_symm_cupy import rys_2c2e_symm_cupy
 62from .rys_3c2e_tri import rys_3c2e_tri
 63from .rys_nuc_mat_symm import rys_nuc_mat_symm
 64from .schwarz_helpers import rys_3c2e_tri_schwarz
 65from .eval_xc_1 import eval_xc_1
 66from .eval_xc_2 import eval_xc_2
 67from .eval_xc_3 import eval_xc_3
 68from .eval_xc_1_cupy import eval_xc_1_cupy
 69from .eval_xc_2_cupy import eval_xc_2_cupy
 70from .eval_xc_3_cupy import eval_xc_3_cupy
 71from . import bf_val_helpers
 72from . import rys_helpers
 73from . import schwarz_helpers
 74from . import schwarz_helpers_cupy
 75from . import integral_helpers
 76from .overlap_mat_grad_symm import overlap_mat_grad_symm
 77from .kin_mat_grad_symm import kin_mat_grad_symm
 78from .nuc_mat_grad_symm import nuc_mat_grad_symm
 79
 80# __all__ = ['fac', 'fastFactorial', 'comb', 'doublefactorial', 'c2k', 'calcS', 'vlriPartial', 'calcCgamminc', 'Fboys', 'comb'\
 81#     , 'comb', 'comb', 'comb', 'comb', 'comb']
 82
 83__all__ = ['integral_helpers', 'mmd_nuc_mat_symm', 'nuc_mat_symm', 'kin_mat_symm', 'overlap_mat_symm', 'conv_4c2e_symm', 'mmd_4c2e_symm', 'rys_helpers', 'rys_4c2e_symm',\
 84     'rys_4c2e_symm_old', 'conv_3c2e_symm', 'rys_3c2e_symm', 'conv_2c2e_symm', 'rys_2c2e_symm', 'rys_3c2e_tri', 'rys_nuc_mat_symm', 'schwarz_helpers', 'rys_3c2e_tri_schwarz'\
 85        'bf_val_helpers', 'eval_xc_1', 'eval_xc_2', 'eval_xc_3', 'eval_xc_1_cupy', 'eval_xc_2_cupy', 'eval_xc_3_cupy', 'dipole_moment_mat_symm', 'kin_mat_symm_cupy'\
 86        , 'overlap_mat_symm_cupy', 'dipole_moment_mat_symm_cupy', 'nuc_mat_symm_cupy', 'kin_mat_symm_shell_cupy', 'rys_2c2e_symm_cupy', 'rys_3c2e_symm_cupy',\
 87        'rys_3c2e_symm_cupy_fp32', 'schwarz_helpers_cupy', 'overlap_mat_grad_symm', 'kin_mat_grad_symm', 'nuc_mat_grad_symm', 'cross_overlap_mat_symm']
 88
 89# # This code will import all of the modules in the library directory and expose all of 
 90# # the functions in those modules to the user. The user can then use the functions just 
 91# # like they would any other function in the library package:
 92
 93# # Keep in mind that this code will expose all functions in the library directory, 
 94# # including any private functions (i.e. functions that start with an underscore). 
 95# # If you only want to expose certain functions, 
 96# # you can modify the __all__ list to include only the functions that you want to expose.
 97
 98# import glob
 99
100# # Get a list of all module filenames in the library directory
101# # module_filenames = glob.glob('pyfock/Integrals/*.py')
102# module_filenames = glob.glob('*.py')
103
104# print(module_filenames)
105
106# # Remove the `__init__.py` file from the list
107# # module_filenames.remove('pyfock/Integrals/__init__.py')
108# module_filenames.remove('__init__.py')
109
110# # Use a for loop to import all of the modules in the Integrals directory
111# for module_filename in module_filenames:
112#     # module_filename = module_filename.replace("pyfock/Integrals/", "")
113#     module_name = module_filename[:-3]
114#     exec(f'from .{module_name} import *')
115
116# # Expose all of the functions in the __all__ variable
117# __all__ = [
118#     func for func in dir()
119#     if func[0] != '_' and callable(eval(func))
120# ]
def mmd_nuc_mat_symm(basis, mol, slice=None):
 7def mmd_nuc_mat_symm(basis, mol, slice=None):
 8    #Here the lists are converted to numpy arrays for better use with Numba.
 9    #Once these conversions are done we pass these to a Numba decorated
10    #function that uses prange, etc. to calculate the matrix efficiently.
11
12    # This function calculates the nuclear matrix for a given basis object.
13    # The basis object holds the information of basis functions like: exponents, coeffs, etc.
14    # It is possible to only calculate a slice (block) of the complete matrix.
15    # slice is a 4 element list whose first and second elements give the range of the rows to be calculated.
16    # the third and fourth element give the range of columns to be calculated.
17    # slice = [start_row, end_row, start_col, end_col]
18    # The integrals are performed using the formulas
19
20    #We convert the required properties to numpy arrays as this is what Numba likes.
21    bfs_coords = np.array([basis.bfs_coords])
22    bfs_contr_prim_norms = np.array([basis.bfs_contr_prim_norms])
23    bfs_lmn = np.array([basis.bfs_lmn])
24    bfs_nprim = np.array([basis.bfs_nprim])
25    coordsBohrs = np.array([mol.coordsBohrs])
26    Z = np.array([mol.Zcharges])
27    natoms = mol.natoms
28        
29
30    #The remaining properties like bfs_coeffs are a list of lists of unequal sizes.
31    #Numba won't be able to work with these efficiently.
32    #So, we convert them to a numpy 2d array by applying a trick,
33    #that the second dimension is that of the largest list. So that
34    #it can accomadate all the lists.
35    maxnprim = max(basis.bfs_nprim)
36    bfs_coeffs = np.zeros([basis.bfs_nao, maxnprim])
37    bfs_expnts = np.zeros([basis.bfs_nao, maxnprim])
38    bfs_prim_norms = np.zeros([basis.bfs_nao, maxnprim])
39    for i in range(basis.bfs_nao):
40        for j in range(basis.bfs_nprim[i]):
41            bfs_coeffs[i,j] = basis.bfs_coeffs[i][j]
42            bfs_expnts[i,j] = basis.bfs_expnts[i][j]
43            bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j]
44        
45
46
47    if slice is None:
48        slice = [0, basis.bfs_nao, 0, basis.bfs_nao]
49        
50    #Limits for the calculation of overlap integrals
51    a = int(slice[0]) #row start index
52    b = int(slice[1]) #row end index
53    c = int(slice[2]) #column start index
54    d = int(slice[3]) #column end index
55
56    # print([a,b,c,d])
57    
58    V = mmd_nuc_mat_symm_internal(bfs_coords[0], bfs_contr_prim_norms[0], bfs_lmn[0], bfs_nprim[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, a, b, c, d, Z[0], coordsBohrs[0], natoms)
59    
60    return V 
def nuc_mat_symm(basis, mol, slice=None, sqrt_ints4c2e_diag=None):
  7def nuc_mat_symm(basis, mol, slice=None, sqrt_ints4c2e_diag=None):
  8    """
  9    Compute the nuclear attraction matrix for a given basis and molecular geometry.
 10
 11    This function evaluates the one-electron nuclear attraction integrals 
 12    ⟨χ_i | V_nuc | χ_j⟩, where V_nuc is the Coulombic potential from the atomic nuclei.
 13    The integrals are computed between all basis functions defined in `basis` and 
 14    and the nuclei within the `mol` object.
 15    So in principle, it can be used to compute the nuclear potential matrix due to the
 16    nuclei in one molecule, in the basis of another molecule.
 17
 18    Efficient Numba-accelerated routines are used to improve performance. 
 19    A partial block of the matrix can be computed by specifying a `slice`.
 20
 21    Parameters
 22    ----------
 23    basis : object
 24        Basis set object with fields including:
 25        - bfs_coords: Cartesian coordinates of basis function centers.
 26        - bfs_coeffs: Contraction coefficients.
 27        - bfs_expnts: Gaussian exponents.
 28        - bfs_prim_norms: Normalization constants for primitives.
 29        - bfs_contr_prim_norms: Normalization constants for contracted functions.
 30        - bfs_lmn: Angular momentum quantum numbers (ℓ, m, n).
 31        - bfs_nprim: Number of primitives per basis function.
 32        - bfs_nao: Number of atomic orbitals (contracted basis functions).
 33
 34    mol : object
 35        Molecule object containing:
 36        - coordsBohrs: Cartesian coordinates of nuclei (in Bohr).
 37        - Zcharges: Nuclear charges.
 38        - natoms: Number of atoms.
 39
 40    slice : list of int, optional
 41        A 4-element list `[start_row, end_row, start_col, end_col]` defining 
 42        the sub-block of the matrix to compute. Rows and columns refer to AOs.
 43        If `None` (default), the entire nuclear attraction matrix is computed.
 44
 45    sqrt_ints4c2e_diag : ndarray, optional
 46        Optional Schwarz screening preconditioner.
 47        If provided, Schwarz screening is enabled to skip insignificant integrals.
 48        If `None` (default), all integrals are computed without screening.
 49
 50    Returns
 51    -------
 52    V : ndarray of shape (end_row - start_row, end_col - start_col)
 53        The computed (sub)matrix of nuclear attraction integrals.
 54
 55    Notes
 56    -----
 57    The integrals are evaluated using standard expressions over contracted Gaussians.
 58    If Schwarz screening is used (`sqrt_ints4c2e_diag` is not None), a threshold-based 
 59    pruning is applied using precomputed upper bounds.
 60
 61    Examples
 62    --------
 63    >>> V = nuc_mat_symm(basis, mol)
 64    >>> V_block = nuc_mat_symm(basis, mol, slice=[0, 10, 0, 10])
 65    >>> V_schwarz = nuc_mat_symm(basis, mol, sqrt_ints4c2e_diag=sqrt_diag)
 66    """
 67    #Here the lists are converted to numpy arrays for better use with Numba.
 68    #Once these conversions are done we pass these to a Numba decorated
 69    #function that uses prange, etc. to calculate the matrix efficiently.
 70
 71    # This function calculates the nuclear matrix for a given basis object.
 72    # The basis object holds the information of basis functions like: exponents, coeffs, etc.
 73    # It is possible to only calculate a slice (block) of the complete matrix.
 74    # slice is a 4 element list whose first and second elements give the range of the rows to be calculated.
 75    # the third and fourth element give the range of columns to be calculated.
 76    # slice = [start_row, end_row, start_col, end_col]
 77    # The integrals are performed using the formulas
 78
 79    #We convert the required properties to numpy arrays as this is what Numba likes.
 80    bfs_coords = np.array([basis.bfs_coords])
 81    bfs_contr_prim_norms = np.array([basis.bfs_contr_prim_norms])
 82    bfs_lmn = np.array([basis.bfs_lmn])
 83    bfs_nprim = np.array([basis.bfs_nprim])
 84    coordsBohrs = np.array([mol.coordsBohrs])
 85    Z = np.array([mol.Zcharges])
 86    natoms = mol.natoms
 87        
 88
 89    #The remaining properties like bfs_coeffs are a list of lists of unequal sizes.
 90    #Numba won't be able to work with these efficiently.
 91    #So, we convert them to a numpy 2d array by applying a trick,
 92    #that the second dimension is that of the largest list. So that
 93    #it can accomadate all the lists.
 94    maxnprim = max(basis.bfs_nprim)
 95    bfs_coeffs = np.zeros([basis.bfs_nao, maxnprim])
 96    bfs_expnts = np.zeros([basis.bfs_nao, maxnprim])
 97    bfs_prim_norms = np.zeros([basis.bfs_nao, maxnprim])
 98    for i in range(basis.bfs_nao):
 99        for j in range(basis.bfs_nprim[i]):
100            bfs_coeffs[i,j] = basis.bfs_coeffs[i][j]
101            bfs_expnts[i,j] = basis.bfs_expnts[i][j]
102            bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j]
103        
104
105
106    if slice is None:
107        slice = [0, basis.bfs_nao, 0, basis.bfs_nao]
108        
109    #Limits for the calculation of overlap integrals
110    a = int(slice[0]) #row start index
111    b = int(slice[1]) #row end index
112    c = int(slice[2]) #column start index
113    d = int(slice[3]) #column end index
114
115    # print([a,b,c,d])
116
117    if sqrt_ints4c2e_diag is None:
118        #Create dummy array
119        sqrt_ints4c2e_diag = np.zeros((1,1), dtype=np.float64)
120        isSchwarz = False
121    else:
122        isSchwarz = True
123    
124    V = nuc_mat_symm_internal(bfs_coords[0], bfs_contr_prim_norms[0], bfs_lmn[0], bfs_nprim[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, a, b, c, d, Z[0], coordsBohrs[0], natoms, sqrt_ints4c2e_diag, isSchwarz)
125    
126    return V 

Compute the nuclear attraction matrix for a given basis and molecular geometry.

This function evaluates the one-electron nuclear attraction integrals ⟨χ_i | V_nuc | χ_j⟩, where V_nuc is the Coulombic potential from the atomic nuclei. The integrals are computed between all basis functions defined in basis and and the nuclei within the mol object. So in principle, it can be used to compute the nuclear potential matrix due to the nuclei in one molecule, in the basis of another molecule.

Efficient Numba-accelerated routines are used to improve performance. A partial block of the matrix can be computed by specifying a slice.

Parameters

basis : object Basis set object with fields including: - bfs_coords: Cartesian coordinates of basis function centers. - bfs_coeffs: Contraction coefficients. - bfs_expnts: Gaussian exponents. - bfs_prim_norms: Normalization constants for primitives. - bfs_contr_prim_norms: Normalization constants for contracted functions. - bfs_lmn: Angular momentum quantum numbers (ℓ, m, n). - bfs_nprim: Number of primitives per basis function. - bfs_nao: Number of atomic orbitals (contracted basis functions).

mol : object Molecule object containing: - coordsBohrs: Cartesian coordinates of nuclei (in Bohr). - Zcharges: Nuclear charges. - natoms: Number of atoms.

slice : list of int, optional A 4-element list [start_row, end_row, start_col, end_col] defining the sub-block of the matrix to compute. Rows and columns refer to AOs. If None (default), the entire nuclear attraction matrix is computed.

sqrt_ints4c2e_diag : ndarray, optional Optional Schwarz screening preconditioner. If provided, Schwarz screening is enabled to skip insignificant integrals. If None (default), all integrals are computed without screening.

Returns

V : ndarray of shape (end_row - start_row, end_col - start_col) The computed (sub)matrix of nuclear attraction integrals.

Notes

The integrals are evaluated using standard expressions over contracted Gaussians. If Schwarz screening is used (sqrt_ints4c2e_diag is not None), a threshold-based pruning is applied using precomputed upper bounds.

Examples

>>> V = nuc_mat_symm(basis, mol)
>>> V_block = nuc_mat_symm(basis, mol, slice=[0, 10, 0, 10])
>>> V_schwarz = nuc_mat_symm(basis, mol, sqrt_ints4c2e_diag=sqrt_diag)
def kin_mat_symm(basis, slice=None):
 6def kin_mat_symm(basis, slice=None):
 7    """
 8    Compute the kinetic energy matrix for a given basis set.
 9
10    This function evaluates the one-electron kinetic energy integrals 
11    ⟨χ_i | -½∇² | χ_j⟩ for a set of contracted Gaussian basis functions defined in `basis`. 
12    It uses an optimized backend with improved memory layout, early termination, 
13    and partial vectorization for enhanced performance.
14
15    Block-wise computation is supported through the optional `slice` parameter.
16
17    Parameters
18    ----------
19    basis : object
20        Basis set object containing properties such as:
21        - bfs_coords: Cartesian coordinates of basis function centers.
22        - bfs_coeffs: Contraction coefficients.
23        - bfs_expnts: Gaussian exponents.
24        - bfs_prim_norms: Normalization constants for primitives.
25        - bfs_contr_prim_norms: Normalization constants for contracted functions.
26        - bfs_lmn: Angular momentum quantum numbers (ℓ, m, n).
27        - bfs_nprim: Number of primitives per basis function.
28        - bfs_nao: Number of atomic orbitals (contracted basis functions).
29
30    slice : list of int, optional
31        A 4-element list `[start_row, end_row, start_col, end_col]` specifying the 
32        matrix block to compute. Rows and columns refer to AOs. If `None` (default), 
33        the entire kinetic energy matrix is calculated.
34
35    Returns
36    -------
37    T : ndarray of shape (end_row - start_row, end_col - start_col)
38        The computed (sub)matrix of kinetic energy integrals.
39
40    Notes
41    -----
42    Internally, this function:
43    - Pads coefficient/exponent arrays for uniform shape and Numba compatibility.
44    - Reduces redundant operations and loops.
45    - Utilizes an optimized `kin_mat_symm_internal_optimized` backend.
46
47    Examples
48    --------
49    >>> T = kin_mat_symm(basis)
50    >>> T_block = kin_mat_symm(basis, slice=[0, 10, 0, 10])
51    """
52    
53    # Convert to numpy arrays (same as before)
54    bfs_coords = np.array(basis.bfs_coords)  # Remove unnecessary list wrapping
55    bfs_contr_prim_norms = np.array(basis.bfs_contr_prim_norms)
56    bfs_lmn = np.array(basis.bfs_lmn)
57    bfs_nprim = np.array(basis.bfs_nprim)
58    
59    # Optimize the coefficient/exponent arrays
60    maxnprim = max(basis.bfs_nprim)
61    bfs_coeffs = np.zeros((basis.bfs_nao, maxnprim))
62    bfs_expnts = np.zeros((basis.bfs_nao, maxnprim))
63    bfs_prim_norms = np.zeros((basis.bfs_nao, maxnprim))
64    
65    # Vectorized assignment (faster than nested loops)
66    for i in range(basis.bfs_nao):
67        n = basis.bfs_nprim[i]
68        bfs_coeffs[i, :n] = basis.bfs_coeffs[i]
69        bfs_expnts[i, :n] = basis.bfs_expnts[i]
70        bfs_prim_norms[i, :n] = basis.bfs_prim_norms[i]
71    
72    if slice is None:
73        slice = [0, basis.bfs_nao, 0, basis.bfs_nao]
74    
75    a, b, c, d = map(int, slice)
76    
77    T = kin_mat_symm_internal_optimized(
78        bfs_coords, bfs_contr_prim_norms, bfs_lmn, bfs_nprim, 
79        bfs_coeffs, bfs_prim_norms, bfs_expnts, a, b, c, d
80    )
81    
82    return T

Compute the kinetic energy matrix for a given basis set.

This function evaluates the one-electron kinetic energy integrals ⟨χ_i | -½∇² | χ_j⟩ for a set of contracted Gaussian basis functions defined in basis. It uses an optimized backend with improved memory layout, early termination, and partial vectorization for enhanced performance.

Block-wise computation is supported through the optional slice parameter.

Parameters

basis : object Basis set object containing properties such as: - bfs_coords: Cartesian coordinates of basis function centers. - bfs_coeffs: Contraction coefficients. - bfs_expnts: Gaussian exponents. - bfs_prim_norms: Normalization constants for primitives. - bfs_contr_prim_norms: Normalization constants for contracted functions. - bfs_lmn: Angular momentum quantum numbers (ℓ, m, n). - bfs_nprim: Number of primitives per basis function. - bfs_nao: Number of atomic orbitals (contracted basis functions).

slice : list of int, optional A 4-element list [start_row, end_row, start_col, end_col] specifying the matrix block to compute. Rows and columns refer to AOs. If None (default), the entire kinetic energy matrix is calculated.

Returns

T : ndarray of shape (end_row - start_row, end_col - start_col) The computed (sub)matrix of kinetic energy integrals.

Notes

Internally, this function:

  • Pads coefficient/exponent arrays for uniform shape and Numba compatibility.
  • Reduces redundant operations and loops.
  • Utilizes an optimized kin_mat_symm_internal_optimized backend.

Examples

>>> T = kin_mat_symm(basis)
>>> T_block = kin_mat_symm(basis, slice=[0, 10, 0, 10])
def overlap_mat_symm(basis, slice=None):
 7def overlap_mat_symm(basis, slice=None):
 8    """
 9    Compute the overlap matrix for a given basis set using symmetry-aware integrals.
10
11    This function evaluates the overlap integrals ⟨χ_i | χ_j⟩ between all pairs 
12    of basis functions defined in the `basis` object. It supports partial evaluation 
13    of the overlap matrix by specifying a slice.
14
15    The integrals are computed using an efficient Numba-accelerated backend that 
16    benefits from parallelization via `prange` and preprocessed NumPy arrays.
17
18    Parameters
19    ----------
20    basis : object
21        A basis set object that contains information about basis functions, such as:
22        - bfs_coords: Cartesian coordinates of the basis function centers.
23        - bfs_coeffs: Contraction coefficients.
24        - bfs_expnts: Gaussian exponents.
25        - bfs_prim_norms: Primitive normalization constants.
26        - bfs_contr_prim_norms: Contraction normalization factors.
27        - bfs_lmn: Angular momentum quantum numbers (ℓ, m, n).
28        - bfs_nprim: Number of primitives per basis function.
29        - bfs_nao: Total number of atomic orbitals.
30    
31    slice : list of int, optional
32        A 4-element list specifying a sub-block of the matrix to compute:
33        [start_row, end_row, start_col, end_col].
34        If None (default), the full overlap matrix is computed.
35
36    Returns
37    -------
38    S : ndarray of shape (end_row - start_row, end_col - start_col)
39        The computed (sub)matrix of overlap integrals.
40
41    Notes
42    -----
43    This function is optimized for performance using preallocated NumPy arrays
44    and avoids nested Python lists which are not supported efficiently by Numba.
45
46    Examples
47    --------
48    >>> S = overlap_mat_symm(basis)
49    >>> S_block = overlap_mat_symm(basis, slice=[0, 10, 0, 10])
50    """
51    # Here the lists are converted to numpy arrays for better use with Numba.
52    # Once these conversions are done we pass these to a Numba decorated
53    # function that uses prange, etc. to calculate the matrix efficiently.
54
55    # This function calculates the overlap matrix for a given basis object.
56    # The basis object holds the information of basis functions like: exponents, coeffs, etc.
57    # It is possible to only calculate a slice (block) of the complete matrix.
58    # slice is a 4 element list whose first and second elements give the range of the rows to be calculated.
59    # the third and fourth element give the range of columns to be calculated.
60    # slice = [start_row, end_row, start_col, end_col]
61    # The integrals are performed using the formulas
62
63    #We convert the required properties to numpy arrays as this is what Numba likes.
64    bfs_coords = np.array([basis.bfs_coords])
65    bfs_contr_prim_norms = np.array([basis.bfs_contr_prim_norms])
66    bfs_lmn = np.array([basis.bfs_lmn])
67    bfs_nprim = np.array([basis.bfs_nprim])
68        
69
70    #The remaining properties like bfs_coeffs are a list of lists of unequal sizes.
71    #Numba won't be able to work with these efficiently.
72    #So, we convert them to a numpy 2d array by applying a trick,
73    #that the second dimension is that of the largest list. So that
74    #it can accomadate all the lists.
75    maxnprim = max(basis.bfs_nprim)
76    bfs_coeffs = np.zeros([basis.bfs_nao, maxnprim])
77    bfs_expnts = np.zeros([basis.bfs_nao, maxnprim])
78    bfs_prim_norms = np.zeros([basis.bfs_nao, maxnprim])
79    for i in range(basis.bfs_nao):
80        for j in range(basis.bfs_nprim[i]):
81            bfs_coeffs[i,j] = basis.bfs_coeffs[i][j]
82            bfs_expnts[i,j] = basis.bfs_expnts[i][j]
83            bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j]
84        
85
86    if slice is None:
87        slice = [0, basis.bfs_nao, 0, basis.bfs_nao]
88        
89    #Limits for the calculation of overlap integrals
90    a = int(slice[0]) #row start index
91    b = int(slice[1]) #row end index
92    c = int(slice[2]) #column start index
93    d = int(slice[3]) #column end index
94        
95    S = overlap_mat_symm_internal(bfs_coords[0], bfs_contr_prim_norms[0], bfs_lmn[0], bfs_nprim[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, a, b, c, d)
96    return S

Compute the overlap matrix for a given basis set using symmetry-aware integrals.

This function evaluates the overlap integrals ⟨χ_i | χ_j⟩ between all pairs of basis functions defined in the basis object. It supports partial evaluation of the overlap matrix by specifying a slice.

The integrals are computed using an efficient Numba-accelerated backend that benefits from parallelization via prange and preprocessed NumPy arrays.

Parameters

basis : object A basis set object that contains information about basis functions, such as: - bfs_coords: Cartesian coordinates of the basis function centers. - bfs_coeffs: Contraction coefficients. - bfs_expnts: Gaussian exponents. - bfs_prim_norms: Primitive normalization constants. - bfs_contr_prim_norms: Contraction normalization factors. - bfs_lmn: Angular momentum quantum numbers (ℓ, m, n). - bfs_nprim: Number of primitives per basis function. - bfs_nao: Total number of atomic orbitals.

slice : list of int, optional A 4-element list specifying a sub-block of the matrix to compute: [start_row, end_row, start_col, end_col]. If None (default), the full overlap matrix is computed.

Returns

S : ndarray of shape (end_row - start_row, end_col - start_col) The computed (sub)matrix of overlap integrals.

Notes

This function is optimized for performance using preallocated NumPy arrays and avoids nested Python lists which are not supported efficiently by Numba.

Examples

>>> S = overlap_mat_symm(basis)
>>> S_block = overlap_mat_symm(basis, slice=[0, 10, 0, 10])
def conv_4c2e_symm(basis, slice=None):
  7def conv_4c2e_symm(basis, slice=None):
  8    """
  9    Compute four-center two-electron (4c2e) electron repulsion integrals (ERIs)
 10    using the conventional and slow (analytical) formula-based method with full symmetry exploitation.
 11
 12    This function evaluates integrals of the form (A B | C D), where
 13    A, B, C, D are basis functions from the same primary basis set.
 14
 15    The "conv" variant uses explicit analytical integral formulas and
 16    nested loops over primitive Gaussians, following the derivations in:
 17
 18        J. Chem. Educ. 2018, 95, 9, 1572–1578
 19        https://pubs.acs.org/doi/full/10.1021/acs.jchemed.8b00255
 20
 21    Compared to Rys quadrature, this conventional approach is much more
 22    computationally expensive but exact for the given formula set, making
 23    it useful for validation and special-purpose computations.
 24
 25    Symmetries exploited
 26    --------------------
 27    For 4c2e integrals, the full 8-fold permutational symmetry is used:
 28
 29        (A B | C D) = (B A | C D) = (A B | D C) = (B A | D C)
 30                    = (C D | A B) = (D C | A B) = (C D | B A) = (D C | B A)
 31
 32    This reduces the number of independent integrals from:
 33        N_bf^4   →   N_bf*(N_bf+1)/2 * N_bf*(N_bf+1)/2
 34    when computing the full tensor.
 35
 36    Parameters
 37    ----------
 38    basis : object
 39        Basis set object containing:
 40        - bfs_coords : Cartesian coordinates of basis function centers
 41        - bfs_coeffs : Contraction coefficients
 42        - bfs_expnts : Gaussian exponents
 43        - bfs_prim_norms : Primitive normalization constants
 44        - bfs_contr_prim_norms : Contraction normalization factors
 45        - bfs_lmn : Angular momentum quantum numbers (ℓ, m, n)
 46        - bfs_nprim : Number of primitives per basis function
 47        - bfs_nao : Total number of atomic orbitals
 48
 49    slice : list of int, optional
 50        An 8-element list specifying a sub-block of integrals to compute:
 51        [start_A, end_A, start_B, end_B, start_C, end_C, start_D, end_D]
 52        If None (default), computes the full (Nbf, Nbf, Nbf, Nbf) tensor.
 53
 54        **Note:** When slices are used, symmetry exploitation is restricted
 55        to permutations that lie entirely within the specified slice.
 56
 57    Returns
 58    -------
 59    ints4c2e : ndarray
 60        The computed 4-center 2-electron integrals for the requested range.
 61
 62    Notes
 63    -----
 64    - All basis set data are pre-packed into NumPy arrays for Numba acceleration.
 65    - Uses explicit primitive Gaussian formula evaluation without numerical quadrature.
 66    - Symmetry exploitation is maximal only for full-range computations.
 67    - Conventional evaluation scales as O(N_bf^4) without symmetry, but
 68      symmetry reduces the number of computations drastically.
 69
 70    Examples
 71    --------
 72    >>> eri_full = conv_4c2e_symm(basis)
 73    >>> eri_block = conv_4c2e_symm(basis, slice=[0,5, 0,5, 0,5, 0,5])
 74    """
 75    # Here the lists are converted to numpy arrays for better use with Numba.
 76    # Once these conversions are done we pass these to a Numba decorated
 77    # function that uses prange, etc. to calculate the 4c2e integrals efficiently.
 78    # This function calculates the 4c2e electron-electron ERIs for a given basis object.
 79    # The basis object holds the information of basis functions like: exponents, coeffs, etc.
 80    
 81    # The integrals are performed using the formulas https://pubs.acs.org/doi/full/10.1021/acs.jchemed.8b00255
 82
 83    # It is possible to only calculate a slice (block/subset) of the complete set of integrals.
 84    # slice is an 8 element list whose first and second elements give the range of the A functions to be calculated.
 85    # and so on.
 86
 87    #We convert the required properties to numpy arrays as this is what Numba likes.
 88    bfs_coords = np.array([basis.bfs_coords])
 89    bfs_contr_prim_norms = np.array([basis.bfs_contr_prim_norms])
 90    bfs_lmn = np.array([basis.bfs_lmn])
 91    bfs_nprim = np.array([basis.bfs_nprim])
 92        
 93
 94    #The remaining properties like bfs_coeffs are a list of lists of unequal sizes.
 95    #Numba won't be able to work with these efficiently.
 96    #So, we convert them to a numpy 2d array by applying a trick,
 97    #that the second dimension is that of the largest list. So that
 98    #it can accomadate all the lists.
 99    maxnprim = max(basis.bfs_nprim)
100    bfs_coeffs = np.zeros([basis.bfs_nao, maxnprim])
101    bfs_expnts = np.zeros([basis.bfs_nao, maxnprim])
102    bfs_prim_norms = np.zeros([basis.bfs_nao, maxnprim])
103    for i in range(basis.bfs_nao):
104        for j in range(basis.bfs_nprim[i]):
105            bfs_coeffs[i,j] = basis.bfs_coeffs[i][j]
106            bfs_expnts[i,j] = basis.bfs_expnts[i][j]
107            bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j]
108        
109
110
111    
112    if slice is None:
113        slice = [0, basis.bfs_nao, 0, basis.bfs_nao, 0, basis.bfs_nao, 0, basis.bfs_nao]
114        
115    #Limits for the calculation of 4c2e integrals
116    indx_startA = int(slice[0])
117    indx_endA = int(slice[1])
118    indx_startB = int(slice[2])
119    indx_endB = int(slice[3])
120    indx_startC = int(slice[4])
121    indx_endC = int(slice[5])
122    indx_startD = int(slice[6])
123    indx_endD = int(slice[7])
124
125    ints4c2e = conv_4c2e_symm_internal(bfs_coords[0], bfs_contr_prim_norms[0], bfs_lmn[0], bfs_nprim[0], bfs_coeffs, bfs_prim_norms, bfs_expnts,\
126        indx_startA,indx_endA, indx_startB, indx_endB, indx_startC, indx_endC, indx_startD,indx_endD)
127
128    return ints4c2e

Compute four-center two-electron (4c2e) electron repulsion integrals (ERIs) using the conventional and slow (analytical) formula-based method with full symmetry exploitation.

This function evaluates integrals of the form (A B | C D), where A, B, C, D are basis functions from the same primary basis set.

The "conv" variant uses explicit analytical integral formulas and nested loops over primitive Gaussians, following the derivations in:

J. Chem. Educ. 2018, 95, 9, 1572–1578
https://pubs.acs.org/doi/full/10.1021/acs.jchemed.8b00255

Compared to Rys quadrature, this conventional approach is much more computationally expensive but exact for the given formula set, making it useful for validation and special-purpose computations.

Symmetries exploited

For 4c2e integrals, the full 8-fold permutational symmetry is used:

(A B | C D) = (B A | C D) = (A B | D C) = (B A | D C)
            = (C D | A B) = (D C | A B) = (C D | B A) = (D C | B A)

This reduces the number of independent integrals from: N_bf^4 → N_bf(N_bf+1)/2 * N_bf(N_bf+1)/2 when computing the full tensor.

Parameters

basis : object Basis set object containing: - bfs_coords : Cartesian coordinates of basis function centers - bfs_coeffs : Contraction coefficients - bfs_expnts : Gaussian exponents - bfs_prim_norms : Primitive normalization constants - bfs_contr_prim_norms : Contraction normalization factors - bfs_lmn : Angular momentum quantum numbers (ℓ, m, n) - bfs_nprim : Number of primitives per basis function - bfs_nao : Total number of atomic orbitals

slice : list of int, optional An 8-element list specifying a sub-block of integrals to compute: [start_A, end_A, start_B, end_B, start_C, end_C, start_D, end_D] If None (default), computes the full (Nbf, Nbf, Nbf, Nbf) tensor.

**Note:** When slices are used, symmetry exploitation is restricted
to permutations that lie entirely within the specified slice.

Returns

ints4c2e : ndarray The computed 4-center 2-electron integrals for the requested range.

Notes

  • All basis set data are pre-packed into NumPy arrays for Numba acceleration.
  • Uses explicit primitive Gaussian formula evaluation without numerical quadrature.
  • Symmetry exploitation is maximal only for full-range computations.
  • Conventional evaluation scales as O(N_bf^4) without symmetry, but symmetry reduces the number of computations drastically.

Examples

>>> eri_full = conv_4c2e_symm(basis)
>>> eri_block = conv_4c2e_symm(basis, slice=[0,5, 0,5, 0,5, 0,5])
def mmd_4c2e_symm(basis, slice=None):
 7def mmd_4c2e_symm(basis, slice=None):
 8    # Here the lists are converted to numpy arrays for better use with Numba.
 9    # Once these conversions are done we pass these to a Numba decorated
10    # function that uses prange, etc. to calculate the 4c2e integrals efficiently.
11    # This function calculates the 4c2e electron-electron ERIs for a given basis object.
12    # The basis object holds the information of basis functions like: exponents, coeffs, etc.
13    
14    # The integrals are performed using the formulas https://pubs.acs.org/doi/full/10.1021/acs.jchemed.8b00255
15
16    # It is possible to only calculate a slice (block/subset) of the complete set of integrals.
17    # slice is an 8 element list whose first and second elements give the range of the A functions to be calculated.
18    # and so on.
19
20    #We convert the required properties to numpy arrays as this is what Numba likes.
21    bfs_coords = np.array([basis.bfs_coords])
22    bfs_contr_prim_norms = np.array([basis.bfs_contr_prim_norms])
23    bfs_lmn = np.array([basis.bfs_lmn])
24    bfs_nprim = np.array([basis.bfs_nprim])
25        
26
27    #The remaining properties like bfs_coeffs are a list of lists of unequal sizes.
28    #Numba won't be able to work with these efficiently.
29    #So, we convert them to a numpy 2d array by applying a trick,
30    #that the second dimension is that of the largest list. So that
31    #it can accomadate all the lists.
32    maxnprim = max(basis.bfs_nprim)
33    bfs_coeffs = np.zeros([basis.bfs_nao, maxnprim])
34    bfs_expnts = np.zeros([basis.bfs_nao, maxnprim])
35    bfs_prim_norms = np.zeros([basis.bfs_nao, maxnprim])
36    for i in range(basis.bfs_nao):
37        for j in range(basis.bfs_nprim[i]):
38            bfs_coeffs[i,j] = basis.bfs_coeffs[i][j]
39            bfs_expnts[i,j] = basis.bfs_expnts[i][j]
40            bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j]
41        
42
43
44    
45    if slice is None:
46        slice = [0, basis.bfs_nao, 0, basis.bfs_nao, 0, basis.bfs_nao, 0, basis.bfs_nao]
47        
48    #Limits for the calculation of 4c2e integrals
49    indx_startA = int(slice[0])
50    indx_endA = int(slice[1])
51    indx_startB = int(slice[2])
52    indx_endB = int(slice[3])
53    indx_startC = int(slice[4])
54    indx_endC = int(slice[5])
55    indx_startD = int(slice[6])
56    indx_endD = int(slice[7])
57
58    ints4c2e = mmd_4c2e_symm_internal2(bfs_coords[0], bfs_contr_prim_norms[0], bfs_lmn[0], bfs_nprim[0], bfs_coeffs, bfs_prim_norms, bfs_expnts,\
59        indx_startA,indx_endA, indx_startB, indx_endB, indx_startC, indx_endC, indx_startD,indx_endD)
60
61    return ints4c2e
def rys_4c2e_symm(basis, slice=None):
  8def rys_4c2e_symm(basis, slice=None):
  9    """
 10    Compute four-center two-electron (4c2e) electron repulsion integrals (ERIs) 
 11    using the Rys quadrature method with exploitation of 8-fold permutational 
 12    symmetry.
 13
 14    This function evaluates integrals of the form (A B | C D), where A, B, C, D 
 15    are basis functions from the same primary basis set. It uses Numba-accelerated 
 16    backends and symmetry-aware optimizations to reduce computational cost.
 17
 18    Symmetries exploited
 19    --------------------
 20    The 4c2e integrals obey the following permutational symmetry relations:
 21
 22        (A B | C D) = (B A | C D)   # exchange within bra
 23                    = (A B | D C)  # exchange within ket
 24                    = (B A | D C)  
 25                    = (C D | A B)  # bra ↔ ket exchange
 26                    = (D C | A B)
 27                    = (C D | B A)
 28                    = (D C | B A)
 29
 30    These 8 equivalent permutations mean that only a subset of integrals 
 31    needs to be explicitly computed; the rest can be obtained by symmetry.
 32    
 33    This reduces the number of independent integrals from:
 34        N_bf^4   →   N_bf*(N_bf+1)/2 * N_bf*(N_bf+1)/2
 35    when computing the full tensor.
 36
 37    Parameters
 38    ----------
 39    basis : object
 40        Primary basis set object containing:
 41        - bfs_coords : Cartesian coordinates of basis function centers.
 42        - bfs_coeffs : Contraction coefficients.
 43        - bfs_expnts : Gaussian exponents.
 44        - bfs_prim_norms : Primitive normalization constants.
 45        - bfs_contr_prim_norms : Contraction normalization factors.
 46        - bfs_lmn : Angular momentum quantum numbers (ℓ, m, n).
 47        - bfs_nprim : Number of primitives per basis function.
 48        - bfs_shell_index : Index of the shell each basis function belongs to.
 49        - bfs_nao : Total number of atomic orbitals.
 50
 51    slice : list of int, optional
 52        An 8-element list specifying a sub-block of integrals to compute:
 53        [start_A, end_A, start_B, end_B, start_C, end_C, start_D, end_D]
 54        If None (default), computes the full (Nbf, Nbf, Nbf, Nbf) block.
 55
 56        **Note:** When slices are used, not all 8-fold symmetries may be 
 57        available because the requested block may not contain all 
 58        symmetric permutations. In such cases, only the applicable 
 59        symmetries are used.
 60
 61    Returns
 62    -------
 63    ints4c2e : ndarray
 64        The computed 4-center 2-electron integrals for the requested range.
 65
 66    Notes
 67    -----
 68    - Precomputes and stores basis set data in NumPy arrays for Numba efficiency.
 69    - Exploits all possible symmetry permutations when the full tensor is 
 70      computed (no slice) to reduce redundant work.
 71    - If a slice is specified, symmetry exploitation is limited to the 
 72      permutations that fall within the slice's index ranges.
 73
 74    Examples
 75    --------
 76    >>> eri_full = rys_4c2e_symm(basis)
 77    >>> eri_block = rys_4c2e_symm(basis, slice=[0,5, 0,5, 0,5, 0,5])
 78    """
 79    # Here the lists are converted to numpy arrays for better use with Numba.
 80    # Once these conversions are done we pass these to a Numba decorated
 81    # function that uses prange, etc. to calculate the 4c2e integrals efficiently.
 82    # This function calculates the 4c2e electron-electron ERIs for a given basis object.
 83    # The basis object holds the information of basis functions like: exponents, coeffs, etc.
 84
 85    # It is possible to only calculate a slice (block/subset) of the complete set of integrals.
 86    # slice is an 8 element list whose first and second elements give the range of the A functions to be calculated.
 87    # and so on.
 88
 89    #We convert the required properties to numpy arrays as this is what Numba likes.
 90    bfs_coords = np.array([basis.bfs_coords])
 91    bfs_contr_prim_norms = np.array([basis.bfs_contr_prim_norms])
 92    bfs_lmn = np.array([basis.bfs_lmn])
 93    bfs_nprim = np.array([basis.bfs_nprim])
 94        
 95
 96    #The remaining properties like bfs_coeffs are a list of lists of unequal sizes.
 97    #Numba won't be able to work with these efficiently.
 98    #So, we convert them to a numpy 2d array by applying a trick,
 99    #that the second dimension is that of the largest list. So that
100    #it can accomadate all the lists.
101    maxnprim = max(basis.bfs_nprim)
102    bfs_coeffs = np.zeros([basis.bfs_nao, maxnprim])
103    bfs_expnts = np.zeros([basis.bfs_nao, maxnprim])
104    bfs_prim_norms = np.zeros([basis.bfs_nao, maxnprim])
105    shell_indices = np.array([basis.bfs_shell_index], dtype=np.uint16)[0]
106    for i in range(basis.bfs_nao):
107        for j in range(basis.bfs_nprim[i]):
108            bfs_coeffs[i,j] = basis.bfs_coeffs[i][j]
109            bfs_expnts[i,j] = basis.bfs_expnts[i][j]
110            bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j]
111        
112
113
114    
115    if slice is None:
116        slice = [0, basis.bfs_nao, 0, basis.bfs_nao, 0, basis.bfs_nao, 0, basis.bfs_nao]
117        
118    #Limits for the calculation of 4c2e integrals
119    indx_startA = int(slice[0])
120    indx_endA = int(slice[1])
121    indx_startB = int(slice[2])
122    indx_endB = int(slice[3])
123    indx_startC = int(slice[4])
124    indx_endC = int(slice[5])
125    indx_startD = int(slice[6])
126    indx_endD = int(slice[7])
127
128
129    ints4c2e = rys_4c2e_symm_internal(bfs_coords[0], bfs_contr_prim_norms[0], bfs_lmn[0], bfs_nprim[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, shell_indices,\
130        indx_startA,indx_endA, indx_startB, indx_endB, indx_startC, indx_endC, indx_startD,indx_endD)
131
132    return ints4c2e

Compute four-center two-electron (4c2e) electron repulsion integrals (ERIs) using the Rys quadrature method with exploitation of 8-fold permutational symmetry.

This function evaluates integrals of the form (A B | C D), where A, B, C, D are basis functions from the same primary basis set. It uses Numba-accelerated backends and symmetry-aware optimizations to reduce computational cost.

Symmetries exploited

The 4c2e integrals obey the following permutational symmetry relations:

(A B | C D) = (B A | C D)   # exchange within bra
            = (A B | D C)  # exchange within ket
            = (B A | D C)  
            = (C D | A B)  # bra ↔ ket exchange
            = (D C | A B)
            = (C D | B A)
            = (D C | B A)

These 8 equivalent permutations mean that only a subset of integrals needs to be explicitly computed; the rest can be obtained by symmetry.

This reduces the number of independent integrals from: N_bf^4 → N_bf(N_bf+1)/2 * N_bf(N_bf+1)/2 when computing the full tensor.

Parameters

basis : object Primary basis set object containing: - bfs_coords : Cartesian coordinates of basis function centers. - bfs_coeffs : Contraction coefficients. - bfs_expnts : Gaussian exponents. - bfs_prim_norms : Primitive normalization constants. - bfs_contr_prim_norms : Contraction normalization factors. - bfs_lmn : Angular momentum quantum numbers (ℓ, m, n). - bfs_nprim : Number of primitives per basis function. - bfs_shell_index : Index of the shell each basis function belongs to. - bfs_nao : Total number of atomic orbitals.

slice : list of int, optional An 8-element list specifying a sub-block of integrals to compute: [start_A, end_A, start_B, end_B, start_C, end_C, start_D, end_D] If None (default), computes the full (Nbf, Nbf, Nbf, Nbf) block.

**Note:** When slices are used, not all 8-fold symmetries may be 
available because the requested block may not contain all 
symmetric permutations. In such cases, only the applicable 
symmetries are used.

Returns

ints4c2e : ndarray The computed 4-center 2-electron integrals for the requested range.

Notes

  • Precomputes and stores basis set data in NumPy arrays for Numba efficiency.
  • Exploits all possible symmetry permutations when the full tensor is computed (no slice) to reduce redundant work.
  • If a slice is specified, symmetry exploitation is limited to the permutations that fall within the slice's index ranges.

Examples

>>> eri_full = rys_4c2e_symm(basis)
>>> eri_block = rys_4c2e_symm(basis, slice=[0,5, 0,5, 0,5, 0,5])
def rys_4c2e_symm_old(basis, slice=None):
416def rys_4c2e_symm_old(basis, slice=None):
417    # Here the lists are converted to numpy arrays for better use with Numba.
418    # Once these conversions are done we pass these to a Numba decorated
419    # function that uses prange, etc. to calculate the 4c2e integrals efficiently.
420    # This function calculates the 4c2e electron-electron ERIs for a given basis object.
421    # The basis object holds the information of basis functions like: exponents, coeffs, etc.
422
423    # It is possible to only calculate a slice (block/subset) of the complete set of integrals.
424    # slice is an 8 element list whose first and second elements give the range of the A functions to be calculated.
425    # and so on.
426
427    #We convert the required properties to numpy arrays as this is what Numba likes.
428    bfs_coords = np.array([basis.bfs_coords])
429    bfs_contr_prim_norms = np.array([basis.bfs_contr_prim_norms])
430    bfs_lmn = np.array([basis.bfs_lmn])
431    bfs_nprim = np.array([basis.bfs_nprim])
432        
433
434    #The remaining properties like bfs_coeffs are a list of lists of unequal sizes.
435    #Numba won't be able to work with these efficiently.
436    #So, we convert them to a numpy 2d array by applying a trick,
437    #that the second dimension is that of the largest list. So that
438    #it can accomadate all the lists.
439    maxnprim = max(basis.bfs_nprim)
440    bfs_coeffs = np.zeros([basis.bfs_nao, maxnprim])
441    bfs_expnts = np.zeros([basis.bfs_nao, maxnprim])
442    bfs_prim_norms = np.zeros([basis.bfs_nao, maxnprim])
443    for i in range(basis.bfs_nao):
444        for j in range(basis.bfs_nprim[i]):
445            bfs_coeffs[i,j] = basis.bfs_coeffs[i][j]
446            bfs_expnts[i,j] = basis.bfs_expnts[i][j]
447            bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j]
448        
449
450
451    
452    if slice is None:
453        slice = [0, basis.bfs_nao, 0, basis.bfs_nao, 0, basis.bfs_nao, 0, basis.bfs_nao]
454        
455    #Limits for the calculation of 4c2e integrals
456    indx_startA = int(slice[0])
457    indx_endA = int(slice[1])
458    indx_startB = int(slice[2])
459    indx_endB = int(slice[3])
460    indx_startC = int(slice[4])
461    indx_endC = int(slice[5])
462    indx_startD = int(slice[6])
463    indx_endD = int(slice[7])
464
465    ints4c2e = rys_4c2e_symm_internal_old(bfs_coords[0], bfs_contr_prim_norms[0], bfs_lmn[0], bfs_nprim[0], bfs_coeffs, bfs_prim_norms, bfs_expnts,\
466        indx_startA,indx_endA, indx_startB, indx_endB, indx_startC, indx_endC, indx_startD,indx_endD)
467
468    return ints4c2e
def conv_3c2e_symm(basis, auxbasis, slice=None):
  7def conv_3c2e_symm(basis, auxbasis, slice=None):
  8    """
  9    Compute three-center two-electron (3c2e) electron repulsion integrals (ERIs)
 10    using the conventional and slow (analytical) formula-based method with symmetry exploitation.
 11
 12    This function evaluates integrals of the form (A B | C), where:
 13        - A and B are primary basis functions from `basis`
 14        - C is an auxiliary basis function from `auxbasis`
 15
 16    The "conv" variant uses explicit analytical integral formulas and nested
 17    loops over primitive Gaussians, following the derivations in:
 18
 19        J. Chem. Educ. 2018, 95, 9, 1572–1578
 20        https://pubs.acs.org/doi/full/10.1021/acs.jchemed.8b00255
 21
 22    Compared to the Rys quadrature method, this conventional approach is
 23    significantly more computationally expensive, but can be useful for
 24    validation or when Rys is not applicable.
 25
 26    Symmetries exploited
 27    --------------------
 28    For 3c2e integrals, the following bra symmetry is used:
 29
 30        (A B | C) = (B A | C)
 31
 32    This reduces the number of computed integrals from N_bf² * N_aux
 33    to N_bf*(N_bf+1)/2 * N_aux when the full tensor is computed.
 34
 35    Parameters
 36    ----------
 37    basis : object
 38        Primary basis set object containing:
 39        - bfs_coords : Cartesian coordinates of basis function centers
 40        - bfs_coeffs : Contraction coefficients
 41        - bfs_expnts : Gaussian exponents
 42        - bfs_prim_norms : Primitive normalization constants
 43        - bfs_contr_prim_norms : Contraction normalization factors
 44        - bfs_lmn : Angular momentum quantum numbers (ℓ, m, n)
 45        - bfs_nprim : Number of primitives per basis function
 46        - bfs_nao : Total number of atomic orbitals
 47
 48    auxbasis : object
 49        Auxiliary basis set object with the same attributes as `basis`.
 50
 51    slice : list of int, optional
 52        A 6-element list specifying a sub-block of integrals to compute:
 53        [start_A, end_A, start_B, end_B, start_C, end_C]
 54        If None (default), computes the full (Nbf, Nbf, Naux) tensor.
 55
 56        **Note:** When slices are used, AB symmetry exploitation is limited
 57        to permutations that lie entirely within the specified slice.
 58
 59    Returns
 60    -------
 61    ints3c2e : ndarray
 62        The computed 3-center 2-electron integrals for the requested range.
 63        Shape: (Nbf, Nbf, Nauxbf) or 
 64        (end_A - start_A, end_B - start_B, end_C - start_C) 
 65        if slice is given.
 66
 67    Notes
 68    -----
 69    - All basis set data are pre-packed into NumPy arrays for Numba acceleration.
 70    - Uses explicit primitive Gaussian formula evaluation without numerical quadrature.
 71    - Symmetry exploitation is maximal only for full-range computations.
 72    - Conventional evaluation scales poorly compared to Rys quadrature, but is
 73      exact for the given formula set.
 74
 75    Examples
 76    --------
 77    >>> eri_full = conv_3c2e_symm(basis, auxbasis)
 78    >>> eri_block = conv_3c2e_symm(basis, auxbasis, slice=[0,5, 0,5, 0,10])
 79    """
 80    # Here the lists are converted to numpy arrays for better use with Numba.
 81    # Once these conversions are done we pass these to a Numba decorated
 82    # function that uses prange, etc. to calculate the 3c2e integrals efficiently.
 83    # This function calculates the 3c2e electron-electron ERIs for a given basis object and auxbasis object.
 84    # The basis object holds the information of basis functions like: exponents, coeffs, etc.
 85    
 86    # The integrals are performed using the formulas https://pubs.acs.org/doi/full/10.1021/acs.jchemed.8b00255
 87
 88    # It is possible to only calculate a slice (block/subset) of the complete set of integrals.
 89    # slice is an 6 element list whose first and second elements give the range of the A functions to be calculated.
 90    # and so on.
 91    # slice = [indx_startA, indx_endA, indx_startB, indx_endB, indx_startC, indx_endC]
 92
 93    #We convert the required properties to numpy arrays as this is what Numba likes.
 94    bfs_coords = np.array([basis.bfs_coords])
 95    bfs_contr_prim_norms = np.array([basis.bfs_contr_prim_norms])
 96    bfs_lmn = np.array([basis.bfs_lmn])
 97    bfs_nprim = np.array([basis.bfs_nprim])
 98
 99    #We convert the required properties to numpy arrays as this is what Numba likes.
100    aux_bfs_coords = np.array([auxbasis.bfs_coords])
101    aux_bfs_contr_prim_norms = np.array([auxbasis.bfs_contr_prim_norms])
102    aux_bfs_lmn = np.array([auxbasis.bfs_lmn])
103    aux_bfs_nprim = np.array([auxbasis.bfs_nprim])
104        
105
106    #The remaining properties like bfs_coeffs are a list of lists of unequal sizes.
107    #Numba won't be able to work with these efficiently.
108    #So, we convert them to a numpy 2d array by applying a trick,
109    #that the second dimension is that of the largest list. So that
110    #it can accomadate all the lists.
111    maxnprim = max(basis.bfs_nprim)
112    bfs_coeffs = np.zeros([basis.bfs_nao, maxnprim])
113    bfs_expnts = np.zeros([basis.bfs_nao, maxnprim])
114    bfs_prim_norms = np.zeros([basis.bfs_nao, maxnprim])
115    for i in range(basis.bfs_nao):
116        for j in range(basis.bfs_nprim[i]):
117            bfs_coeffs[i,j] = basis.bfs_coeffs[i][j]
118            bfs_expnts[i,j] = basis.bfs_expnts[i][j]
119            bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j]
120
121    maxnprimaux = max(auxbasis.bfs_nprim)
122    aux_bfs_coeffs = np.zeros([auxbasis.bfs_nao, maxnprimaux])
123    aux_bfs_expnts = np.zeros([auxbasis.bfs_nao, maxnprimaux])
124    aux_bfs_prim_norms = np.zeros([auxbasis.bfs_nao, maxnprimaux])
125    for i in range(auxbasis.bfs_nao):
126        for j in range(auxbasis.bfs_nprim[i]):
127            aux_bfs_coeffs[i,j] = auxbasis.bfs_coeffs[i][j]
128            aux_bfs_expnts[i,j] = auxbasis.bfs_expnts[i][j]
129            aux_bfs_prim_norms[i,j] = auxbasis.bfs_prim_norms[i][j]
130            
131
132
133        
134    if slice is None:
135        slice = [0, basis.bfs_nao, 0, basis.bfs_nao, 0, auxbasis.bfs_nao]
136        
137    #Limits for the calculation of 4c2e integrals
138    indx_startA = int(slice[0])
139    indx_endA = int(slice[1])
140    indx_startB = int(slice[2])
141    indx_endB = int(slice[3])
142    indx_startC = int(slice[4])
143    indx_endC = int(slice[5])
144
145    ints3c2e = conv_3c2e_symm_internal(bfs_coords[0], bfs_contr_prim_norms[0], bfs_lmn[0], bfs_nprim[0], bfs_coeffs, bfs_prim_norms, bfs_expnts,aux_bfs_coords[0], aux_bfs_contr_prim_norms[0], aux_bfs_lmn[0], aux_bfs_nprim[0], aux_bfs_coeffs, aux_bfs_prim_norms, aux_bfs_expnts,indx_startA, indx_endA, indx_startB, indx_endB, indx_startC, indx_endC)
146    return ints3c2e

Compute three-center two-electron (3c2e) electron repulsion integrals (ERIs) using the conventional and slow (analytical) formula-based method with symmetry exploitation.

This function evaluates integrals of the form (A B | C), where: - A and B are primary basis functions from basis - C is an auxiliary basis function from auxbasis

The "conv" variant uses explicit analytical integral formulas and nested loops over primitive Gaussians, following the derivations in:

J. Chem. Educ. 2018, 95, 9, 1572–1578
https://pubs.acs.org/doi/full/10.1021/acs.jchemed.8b00255

Compared to the Rys quadrature method, this conventional approach is significantly more computationally expensive, but can be useful for validation or when Rys is not applicable.

Symmetries exploited

For 3c2e integrals, the following bra symmetry is used:

(A B | C) = (B A | C)

This reduces the number of computed integrals from N_bf² * N_aux to N_bf*(N_bf+1)/2 * N_aux when the full tensor is computed.

Parameters

basis : object Primary basis set object containing: - bfs_coords : Cartesian coordinates of basis function centers - bfs_coeffs : Contraction coefficients - bfs_expnts : Gaussian exponents - bfs_prim_norms : Primitive normalization constants - bfs_contr_prim_norms : Contraction normalization factors - bfs_lmn : Angular momentum quantum numbers (ℓ, m, n) - bfs_nprim : Number of primitives per basis function - bfs_nao : Total number of atomic orbitals

auxbasis : object Auxiliary basis set object with the same attributes as basis.

slice : list of int, optional A 6-element list specifying a sub-block of integrals to compute: [start_A, end_A, start_B, end_B, start_C, end_C] If None (default), computes the full (Nbf, Nbf, Naux) tensor.

**Note:** When slices are used, AB symmetry exploitation is limited
to permutations that lie entirely within the specified slice.

Returns

ints3c2e : ndarray The computed 3-center 2-electron integrals for the requested range. Shape: (Nbf, Nbf, Nauxbf) or (end_A - start_A, end_B - start_B, end_C - start_C) if slice is given.

Notes

  • All basis set data are pre-packed into NumPy arrays for Numba acceleration.
  • Uses explicit primitive Gaussian formula evaluation without numerical quadrature.
  • Symmetry exploitation is maximal only for full-range computations.
  • Conventional evaluation scales poorly compared to Rys quadrature, but is exact for the given formula set.

Examples

>>> eri_full = conv_3c2e_symm(basis, auxbasis)
>>> eri_block = conv_3c2e_symm(basis, auxbasis, slice=[0,5, 0,5, 0,10])
def rys_3c2e_symm(basis, auxbasis, slice=None, schwarz=False, schwarz_threshold=1e-09):
 11def rys_3c2e_symm(basis, auxbasis, slice=None, schwarz=False, schwarz_threshold=1e-9):
 12    """
 13    Compute three-center two-electron (3c2e) electron repulsion integrals using
 14    the Rys quadrature method with symmetry considerations.
 15
 16    This function evaluates integrals of the form (A B | C), where A and B are
 17    basis functions from a primary basis set and C is from an auxiliary basis set.
 18    Symmetry in the first two indices is exploited ((A B | C) = (B A | C)),
 19    so only the upper-triangular part of the (A,B) block is computed and the (B A | C)
 20    integrals are formed by symmetry.
 21    The implementation uses Numba-accelerated backends with symmetry-aware
 22    optimizations and optional Schwarz screening to skip negligible contributions.
 23    Symmetry is utilized to only compute N_{bf}*(N_{bf}+1)/2*N_{auxbf}
 24
 25    Parameters
 26    ----------
 27    basis : object
 28        Primary basis set object containing information about atomic orbitals, such as:
 29        - bfs_coords : Cartesian coordinates of basis function centers.
 30        - bfs_coeffs : Contraction coefficients.
 31        - bfs_expnts : Gaussian exponents.
 32        - bfs_prim_norms : Primitive normalization constants.
 33        - bfs_contr_prim_norms : Contraction normalization factors.
 34        - bfs_lmn : Angular momentum quantum numbers (ℓ, m, n).
 35        - bfs_nprim : Number of primitives per basis function.
 36        - bfs_nao : Total number of atomic orbitals.
 37
 38    auxbasis : object
 39        Auxiliary basis set object with the same attributes as `basis`, 
 40        but typically used for resolution-of-the-identity (RI) expansions.
 41
 42    slice : list of int, optional
 43        A 6-element list specifying a subset of integrals to compute:
 44        [start_A, end_A, start_B, end_B, start_C, end_C]
 45        If None (default), computes all N_{bf}*N_{bf}*N_{auxbf} integrals.
 46
 47    schwarz : bool, optional
 48        If True, applies Schwarz screening to skip calculations where the 
 49        product of integral bounds is below `schwarz_threshold`.
 50
 51    schwarz_threshold : float, optional
 52        The threshold for Schwarz screening (default is 1e-9).
 53
 54    Returns
 55    -------
 56    ints3c2e : ndarray of shape 
 57        (Nbf, Nbf, Nauxbf) or 
 58        (end_A - start_A, end_B - start_B, end_C - start_C) 
 59        if slice is given.
 60        The computed 3-center 2-electron integrals.
 61
 62    Notes
 63    -----
 64    - Uses preallocated NumPy arrays to store primitive data for efficient Numba processing.
 65    - Handles irregular contraction patterns by padding primitive arrays to the 
 66      size of the largest contraction.
 67    - If Schwarz screening is enabled, precomputes diagonal two-electron integrals 
 68      and uses their square roots for screening.
 69    - Symmetry relations are exploited to avoid redundant calculations.
 70
 71    Examples
 72    --------
 73    >>> ints = rys_3c2e_symm(basis, auxbasis)
 74    >>> ints_block = rys_3c2e_symm(basis, auxbasis, slice=[0, 5, 0, 5, 0, 10])
 75    >>> ints_screened = rys_3c2e_symm(basis, auxbasis, schwarz=True, schwarz_threshold=1e-10)
 76    """
 77    # Here the lists are converted to numpy arrays for better use with Numba.
 78    # Once these conversions are done we pass these to a Numba decorated
 79    # function that uses prange, etc. to calculate the 3c2e integrals efficiently.
 80    # This function calculates the 3c2e electron-electron ERIs for a given basis object and auxbasis object.
 81    # The basis object holds the information of basis functions like: exponents, coeffs, etc.
 82    
 83
 84    # It is possible to only calculate a slice (block/subset) of the complete set of integrals.
 85    # slice is a 6 element list whose first and second elements give the range of the A functions to be calculated.
 86    # and so on.
 87    # slice = [indx_startA, indx_endA, indx_startB, indx_endB, indx_startC, indx_endC]
 88
 89    #We convert the required properties to numpy arrays as this is what Numba likes.
 90    bfs_coords = np.array([basis.bfs_coords])
 91    bfs_contr_prim_norms = np.array([basis.bfs_contr_prim_norms])
 92    bfs_lmn = np.array([basis.bfs_lmn])
 93    bfs_nprim = np.array([basis.bfs_nprim])
 94
 95    #We convert the required properties to numpy arrays as this is what Numba likes.
 96    aux_bfs_coords = np.array([auxbasis.bfs_coords])
 97    aux_bfs_contr_prim_norms = np.array([auxbasis.bfs_contr_prim_norms])
 98    aux_bfs_lmn = np.array([auxbasis.bfs_lmn])
 99    aux_bfs_nprim = np.array([auxbasis.bfs_nprim])
100        
101
102    #The remaining properties like bfs_coeffs are a list of lists of unequal sizes.
103    #Numba won't be able to work with these efficiently.
104    #So, we convert them to a numpy 2d array by applying a trick,
105    #that the second dimension is that of the largest list. So that
106    #it can accomadate all the lists.
107    maxnprim = max(basis.bfs_nprim)
108    bfs_coeffs = np.zeros([basis.bfs_nao, maxnprim])
109    bfs_expnts = np.zeros([basis.bfs_nao, maxnprim])
110    bfs_prim_norms = np.zeros([basis.bfs_nao, maxnprim])
111    for i in range(basis.bfs_nao):
112        for j in range(basis.bfs_nprim[i]):
113            bfs_coeffs[i,j] = basis.bfs_coeffs[i][j]
114            bfs_expnts[i,j] = basis.bfs_expnts[i][j]
115            bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j]
116
117    maxnprimaux = max(auxbasis.bfs_nprim)
118    aux_bfs_coeffs = np.zeros([auxbasis.bfs_nao, maxnprimaux])
119    aux_bfs_expnts = np.zeros([auxbasis.bfs_nao, maxnprimaux])
120    aux_bfs_prim_norms = np.zeros([auxbasis.bfs_nao, maxnprimaux])
121    for i in range(auxbasis.bfs_nao):
122        for j in range(auxbasis.bfs_nprim[i]):
123            aux_bfs_coeffs[i,j] = auxbasis.bfs_coeffs[i][j]
124            aux_bfs_expnts[i,j] = auxbasis.bfs_expnts[i][j]
125            aux_bfs_prim_norms[i,j] = auxbasis.bfs_prim_norms[i][j]
126            
127
128
129        
130    if slice is None:
131        slice = [0, basis.bfs_nao, 0, basis.bfs_nao, 0, auxbasis.bfs_nao]
132        
133    #Limits for the calculation of 4c2e integrals
134    indx_startA = int(slice[0])
135    indx_endA = int(slice[1])
136    indx_startB = int(slice[2])
137    indx_endB = int(slice[3])
138    indx_startC = int(slice[4])
139    indx_endC = int(slice[5])
140
141    if schwarz:
142        ints4c2e_diag = eri_4c2e_diag(basis)
143        ints2c2e = rys_2c2e_symm(auxbasis)
144        sqrt_ints4c2e_diag = np.sqrt(np.abs(ints4c2e_diag))
145        sqrt_diag_ints2c2e = np.sqrt(np.abs(np.diag(ints2c2e)))
146        print('Prelims calc done for Schwarz screening!')
147    else:
148        #Create dummy array
149        sqrt_ints4c2e_diag = np.zeros((1,1), dtype=np.float64)
150        sqrt_diag_ints2c2e = np.zeros((1), dtype=np.float64)
151
152    ints3c2e = rys_3c2e_symm_internal(bfs_coords[0], bfs_contr_prim_norms[0], bfs_lmn[0], bfs_nprim[0], bfs_coeffs, bfs_prim_norms, bfs_expnts,aux_bfs_coords[0], aux_bfs_contr_prim_norms[0], aux_bfs_lmn[0], aux_bfs_nprim[0], aux_bfs_coeffs, aux_bfs_prim_norms, aux_bfs_expnts,indx_startA, indx_endA, indx_startB, indx_endB, indx_startC, indx_endC, schwarz, sqrt_ints4c2e_diag, sqrt_diag_ints2c2e, schwarz_threshold)
153    return ints3c2e

Compute three-center two-electron (3c2e) electron repulsion integrals using the Rys quadrature method with symmetry considerations.

This function evaluates integrals of the form (A B | C), where A and B are basis functions from a primary basis set and C is from an auxiliary basis set. Symmetry in the first two indices is exploited ((A B | C) = (B A | C)), so only the upper-triangular part of the (A,B) block is computed and the (B A | C) integrals are formed by symmetry. The implementation uses Numba-accelerated backends with symmetry-aware optimizations and optional Schwarz screening to skip negligible contributions. Symmetry is utilized to only compute N_{bf}*(N_{bf}+1)/2*N_{auxbf}

Parameters

basis : object Primary basis set object containing information about atomic orbitals, such as: - bfs_coords : Cartesian coordinates of basis function centers. - bfs_coeffs : Contraction coefficients. - bfs_expnts : Gaussian exponents. - bfs_prim_norms : Primitive normalization constants. - bfs_contr_prim_norms : Contraction normalization factors. - bfs_lmn : Angular momentum quantum numbers (ℓ, m, n). - bfs_nprim : Number of primitives per basis function. - bfs_nao : Total number of atomic orbitals.

auxbasis : object Auxiliary basis set object with the same attributes as basis, but typically used for resolution-of-the-identity (RI) expansions.

slice : list of int, optional A 6-element list specifying a subset of integrals to compute: [start_A, end_A, start_B, end_B, start_C, end_C] If None (default), computes all N_{bf}*N_{bf}*N_{auxbf} integrals.

schwarz : bool, optional If True, applies Schwarz screening to skip calculations where the product of integral bounds is below schwarz_threshold.

schwarz_threshold : float, optional The threshold for Schwarz screening (default is 1e-9).

Returns

ints3c2e : ndarray of shape (Nbf, Nbf, Nauxbf) or (end_A - start_A, end_B - start_B, end_C - start_C) if slice is given. The computed 3-center 2-electron integrals.

Notes

  • Uses preallocated NumPy arrays to store primitive data for efficient Numba processing.
  • Handles irregular contraction patterns by padding primitive arrays to the size of the largest contraction.
  • If Schwarz screening is enabled, precomputes diagonal two-electron integrals and uses their square roots for screening.
  • Symmetry relations are exploited to avoid redundant calculations.

Examples

>>> ints = rys_3c2e_symm(basis, auxbasis)
>>> ints_block = rys_3c2e_symm(basis, auxbasis, slice=[0, 5, 0, 5, 0, 10])
>>> ints_screened = rys_3c2e_symm(basis, auxbasis, schwarz=True, schwarz_threshold=1e-10)
def conv_2c2e_symm(basis, slice=None):
 8def conv_2c2e_symm(basis, slice=None):
 9    """
10    Compute two-center two-electron integrals (2c2e) using the conventional (slow) method,
11    exploiting permutation symmetry.
12
13    This function computes 2c2e electron repulsion integrals (ERIs) over a given
14    basis set using the conventional algorithm which is very slow compared to Rys quadrature. 
15    It supports symmetric integral evaluation and allows slicing the full ERI matrix into 
16    blocks for efficient parallelization or chunked computation.
17
18    Parameters
19    ----------
20    basis : Basis
21        A basis object containing basis function information such as:
22        coordinates, exponents, coefficients, angular momentum, normalization factors,
23        and number of primitives.
24
25    slice : list of int, optional
26        A list of four integers [start_row, end_row, start_col, end_col] specifying
27        a block (subset) of the full integral matrix to compute.
28        If not provided, the full matrix is computed.
29
30    Returns
31    -------
32    ints2c2e : np.ndarray
33        A 2D numpy array containing the computed 2c2e integrals for the specified slice.
34    """
35    # Here the lists are converted to numpy arrays for better use with Numba.
36    # Once these conversions are done we pass these to a Numba decorated
37    # function that uses prange, etc. to calculate the 3c2e integrals efficiently.
38    # This function calculates the 3c2e electron-electron ERIs for a given basis object and auxbasis object.
39    # The basis object holds the information of basis functions like: exponents, coeffs, etc.
40    
41    # It is possible to only calculate a slice (block/subset) of the complete set of integrals.
42    # slice is a 4 element list whose first and second elements give the range of the A functions to be calculated.
43    # and so on.
44    # slice = [start_row, end_row, start_col, end_col]
45
46    #We convert the required properties to numpy arrays as this is what Numba likes.
47    bfs_coords = np.array([basis.bfs_coords])
48    bfs_contr_prim_norms = np.array([basis.bfs_contr_prim_norms])
49    bfs_lmn = np.array([basis.bfs_lmn])
50    bfs_nprim = np.array([basis.bfs_nprim])
51
52        
53
54    #The remaining properties like bfs_coeffs are a list of lists of unequal sizes.
55    #Numba won't be able to work with these efficiently.
56    #So, we convert them to a numpy 2d array by applying a trick,
57    #that the second dimension is that of the largest list. So that
58    #it can accomadate all the lists.
59    maxnprim = max(basis.bfs_nprim)
60    bfs_coeffs = np.zeros([basis.bfs_nao, maxnprim])
61    bfs_expnts = np.zeros([basis.bfs_nao, maxnprim])
62    bfs_prim_norms = np.zeros([basis.bfs_nao, maxnprim])
63    for i in range(basis.bfs_nao):
64        for j in range(basis.bfs_nprim[i]):
65            bfs_coeffs[i,j] = basis.bfs_coeffs[i][j]
66            bfs_expnts[i,j] = basis.bfs_expnts[i][j]
67            bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j]
68
69            
70
71        
72    if slice is None:
73        slice = [0, basis.bfs_nao, 0, basis.bfs_nao]
74        
75    #Limits for the calculation of overlap integrals
76    start_row = int(slice[0]) #row start index
77    end_row = int(slice[1]) #row end index
78    start_col = int(slice[2]) #column start index
79    end_col = int(slice[3]) #column end index
80
81    ints2c2e = conv_2c2e_symm_internal(bfs_coords[0], bfs_contr_prim_norms[0], bfs_lmn[0], bfs_nprim[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, start_row, end_row, start_col, end_col)
82    return ints2c2e

Compute two-center two-electron integrals (2c2e) using the conventional (slow) method, exploiting permutation symmetry.

This function computes 2c2e electron repulsion integrals (ERIs) over a given basis set using the conventional algorithm which is very slow compared to Rys quadrature. It supports symmetric integral evaluation and allows slicing the full ERI matrix into blocks for efficient parallelization or chunked computation.

Parameters

basis : Basis A basis object containing basis function information such as: coordinates, exponents, coefficients, angular momentum, normalization factors, and number of primitives.

slice : list of int, optional A list of four integers [start_row, end_row, start_col, end_col] specifying a block (subset) of the full integral matrix to compute. If not provided, the full matrix is computed.

Returns

ints2c2e : np.ndarray A 2D numpy array containing the computed 2c2e integrals for the specified slice.

def rys_2c2e_symm(basis, slice=None):
 8def rys_2c2e_symm(basis, slice=None):
 9    """
10    Compute symmetric two-center two-electron integrals using Rys quadrature.
11
12    This function evaluates the electron repulsion integrals (ERIs)
13    involving only two centers (2c2e), using the Rys quadrature method. 
14    The integrals are computed efficiently using Numba, and the data is prepared 
15    accordingly for compatibility with JIT compilation. It assumes symmetry, 
16    and computes only a specified block if requested.
17
18    Parameters
19    ----------
20    basis : object
21        A basis set object containing the basis function data. It must have attributes:
22        - bfs_coords : list of (x, y, z) coordinates of basis function centers
23        - bfs_contr_prim_norms : contraction-normalization factors
24        - bfs_lmn : angular momentum tuples (l, m, n)
25        - bfs_nprim : number of primitives for each basis function
26        - bfs_coeffs : contraction coefficients
27        - bfs_expnts : primitive exponents
28        - bfs_prim_norms : primitive normalization constants
29        - bfs_nao : total number of atomic orbitals (basis functions)
30
31    slice : list of int, optional
32        A four-element list `[start_row, end_row, start_col, end_col]` specifying the 
33        row and column ranges of the matrix to be computed. If None, the full matrix 
34        is computed.
35
36    Returns
37    -------
38    ints2c2e : ndarray
39        A 2D numpy array of shape `(end_row - start_row, end_col - start_col)` 
40        containing the computed symmetric two-center two-electron integrals.
41
42    Notes
43    -----
44    This function prepares the basis set data in a Numba-friendly format by 
45    converting ragged lists into padded 2D arrays, where the second dimension 
46    corresponds to the maximum number of primitives. The core integral computation 
47    is offloaded to a Numba-accelerated function `rys_2c2e_symm_internal`.
48    """
49    # Here the lists are converted to numpy arrays for better use with Numba.
50    # Once these conversions are done we pass these to a Numba decorated
51    # function that uses prange, etc. to calculate the 3c2e integrals efficiently.
52    # This function calculates the 3c2e electron-electron ERIs for a given basis object and auxbasis object.
53    # The basis object holds the information of basis functions like: exponents, coeffs, etc.
54    
55    # It is possible to only calculate a slice (block/subset) of the complete set of integrals.
56    # slice is a 4 element list whose first and second elements give the range of the A functions to be calculated.
57    # and so on.
58    # slice = [start_row, end_row, start_col, end_col]
59
60    #We convert the required properties to numpy arrays as this is what Numba likes.
61    bfs_coords = np.array([basis.bfs_coords])
62    bfs_contr_prim_norms = np.array([basis.bfs_contr_prim_norms])
63    bfs_lmn = np.array([basis.bfs_lmn])
64    bfs_nprim = np.array([basis.bfs_nprim])
65
66        
67
68    #The remaining properties like bfs_coeffs are a list of lists of unequal sizes.
69    #Numba won't be able to work with these efficiently.
70    #So, we convert them to a numpy 2d array by applying a trick,
71    #that the second dimension is that of the largest list. So that
72    #it can accomadate all the lists.
73    maxnprim = max(basis.bfs_nprim)
74    bfs_coeffs = np.zeros([basis.bfs_nao, maxnprim])
75    bfs_expnts = np.zeros([basis.bfs_nao, maxnprim])
76    bfs_prim_norms = np.zeros([basis.bfs_nao, maxnprim])
77    for i in range(basis.bfs_nao):
78        for j in range(basis.bfs_nprim[i]):
79            bfs_coeffs[i,j] = basis.bfs_coeffs[i][j]
80            bfs_expnts[i,j] = basis.bfs_expnts[i][j]
81            bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j]
82
83            
84
85        
86    if slice is None:
87        slice = [0, basis.bfs_nao, 0, basis.bfs_nao]
88        
89    #Limits for the calculation of overlap integrals
90    start_row = int(slice[0]) #row start index
91    end_row = int(slice[1]) #row end index
92    start_col = int(slice[2]) #column start index
93    end_col = int(slice[3]) #column end index
94
95    ints2c2e = rys_2c2e_symm_internal(bfs_coords[0], bfs_contr_prim_norms[0], bfs_lmn[0], bfs_nprim[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, start_row, end_row, start_col, end_col)
96    return ints2c2e

Compute symmetric two-center two-electron integrals using Rys quadrature.

This function evaluates the electron repulsion integrals (ERIs) involving only two centers (2c2e), using the Rys quadrature method. The integrals are computed efficiently using Numba, and the data is prepared accordingly for compatibility with JIT compilation. It assumes symmetry, and computes only a specified block if requested.

Parameters

basis : object A basis set object containing the basis function data. It must have attributes: - bfs_coords : list of (x, y, z) coordinates of basis function centers - bfs_contr_prim_norms : contraction-normalization factors - bfs_lmn : angular momentum tuples (l, m, n) - bfs_nprim : number of primitives for each basis function - bfs_coeffs : contraction coefficients - bfs_expnts : primitive exponents - bfs_prim_norms : primitive normalization constants - bfs_nao : total number of atomic orbitals (basis functions)

slice : list of int, optional A four-element list [start_row, end_row, start_col, end_col] specifying the row and column ranges of the matrix to be computed. If None, the full matrix is computed.

Returns

ints2c2e : ndarray A 2D numpy array of shape (end_row - start_row, end_col - start_col) containing the computed symmetric two-center two-electron integrals.

Notes

This function prepares the basis set data in a Numba-friendly format by converting ragged lists into padded 2D arrays, where the second dimension corresponds to the maximum number of primitives. The core integral computation is offloaded to a Numba-accelerated function rys_2c2e_symm_internal.

def rys_3c2e_tri(basis, auxbasis):
  9def rys_3c2e_tri(basis, auxbasis):
 10    """
 11    Compute three-center two-electron (3c2e) electron repulsion integrals using
 12    the Rys quadrature method with symmetry considerations, stored in packed
 13    triangular form for memory efficiency.
 14
 15    This function evaluates integrals of the form (A B | C), where A and B are
 16    basis functions from a primary basis set and C is from an auxiliary basis set.
 17    Symmetry in the first two indices is exploited ((A B | C) = (B A | C)),
 18    so only the upper-triangular part of the (A,B) block is computed and stored.
 19    The resulting array has shape (N_{bf}*(N_{bf}+1)/2, N_{auxbf}), where the first
 20    index corresponds to a flattened 1D triangular index over basis function pairs.
 21
 22    Unlike `rys_3c2e_symm`, this routine does not support computing arbitrary slices
 23    — the full triangular set is always evaluated.
 24
 25    Parameters
 26    ----------
 27    basis : object
 28        Primary basis set object containing information about atomic orbitals, such as:
 29        - bfs_coords : Cartesian coordinates of basis function centers.
 30        - bfs_coeffs : Contraction coefficients.
 31        - bfs_expnts : Gaussian exponents.
 32        - bfs_prim_norms : Primitive normalization constants.
 33        - bfs_contr_prim_norms : Contraction normalization factors.
 34        - bfs_lmn : Angular momentum quantum numbers (ℓ, m, n).
 35        - bfs_nprim : Number of primitives per basis function.
 36        - bfs_nao : Total number of atomic orbitals.
 37
 38    auxbasis : object
 39        Auxiliary basis set object with the same attributes as `basis`,
 40        typically used for resolution-of-the-identity (RI) expansions.
 41
 42    Returns
 43    -------
 44    ints3c2e : ndarray of shape (Nbf*(Nbf+1)//2, Nauxbf)
 45        The computed 3-center 2-electron integrals in packed triangular form.
 46        The mapping from a pair (i, j) with i ≤ j to the first dimension index
 47        follows standard upper-triangular packing order.
 48
 49    Notes
 50    -----
 51    - This is a memory-efficient variant of `rys_3c2e_symm` that avoids storing
 52      the full (Nbf, Nbf, Nauxbf) array.
 53    - Uses preallocated NumPy arrays for primitive data to ensure efficient Numba processing.
 54    - Handles irregular contraction patterns by padding primitive arrays to the size
 55      of the largest contraction in the set.
 56    - No Schwarz screening or partial computation is available in this function.
 57
 58    Examples
 59    --------
 60    >>> ints_tri = rys_3c2e_tri(basis, auxbasis)
 61    >>> # Retrieve value for pair (i, j) and auxiliary k:
 62    >>> def packed_index(i, j, Nbf):
 63    ...     if i > j:
 64    ...         i, j = j, i
 65    ...     return i * Nbf - i*(i-1)//2 + (j - i)
 66    >>> val = ints_tri[packed_index(i, j, basis.bfs_nao), k]
 67    """
 68    # This is a memory efficient version of the rys_3c2e_symm().
 69    # Instead of returning an array of shape (Nbf, Nbf, Nauxbf) it returns
 70    # an array of shape (Nbf*(Nbf+1)/2, Nauxbf), where the first index corresponds to the
 71    # 1D triangular matrix.
 72
 73    # You cannot provide a slice here to calculate only a subset.
 74
 75    # Here the lists are converted to numpy arrays for better use with Numba.
 76    # Once these conversions are done we pass these to a Numba decorated
 77    # function that uses prange, etc. to calculate the 3c2e integrals efficiently.
 78    # This function calculates the 3c2e electron-electron ERIs for a given basis object and auxbasis object.
 79    # The basis object holds the information of basis functions like: exponents, coeffs, etc.
 80
 81    #We convert the required properties to numpy arrays as this is what Numba likes.
 82    bfs_coords = np.array([basis.bfs_coords])
 83    bfs_contr_prim_norms = np.array([basis.bfs_contr_prim_norms])
 84    bfs_lmn = np.array([basis.bfs_lmn])
 85    bfs_nprim = np.array([basis.bfs_nprim])
 86
 87    #We convert the required properties to numpy arrays as this is what Numba likes.
 88    aux_bfs_coords = np.array([auxbasis.bfs_coords])
 89    aux_bfs_contr_prim_norms = np.array([auxbasis.bfs_contr_prim_norms])
 90    aux_bfs_lmn = np.array([auxbasis.bfs_lmn])
 91    aux_bfs_nprim = np.array([auxbasis.bfs_nprim])
 92        
 93
 94    #The remaining properties like bfs_coeffs are a list of lists of unequal sizes.
 95    #Numba won't be able to work with these efficiently.
 96    #So, we convert them to a numpy 2d array by applying a trick,
 97    #that the second dimension is that of the largest list. So that
 98    #it can accomadate all the lists.
 99    maxnprim = max(basis.bfs_nprim)
100    bfs_coeffs = np.zeros([basis.bfs_nao, maxnprim])
101    bfs_expnts = np.zeros([basis.bfs_nao, maxnprim])
102    bfs_prim_norms = np.zeros([basis.bfs_nao, maxnprim])
103    for i in range(basis.bfs_nao):
104        for j in range(basis.bfs_nprim[i]):
105            bfs_coeffs[i,j] = basis.bfs_coeffs[i][j]
106            bfs_expnts[i,j] = basis.bfs_expnts[i][j]
107            bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j]
108
109    maxnprimaux = max(auxbasis.bfs_nprim)
110    aux_bfs_coeffs = np.zeros([auxbasis.bfs_nao, maxnprimaux])
111    aux_bfs_expnts = np.zeros([auxbasis.bfs_nao, maxnprimaux])
112    aux_bfs_prim_norms = np.zeros([auxbasis.bfs_nao, maxnprimaux])
113    for i in range(auxbasis.bfs_nao):
114        for j in range(auxbasis.bfs_nprim[i]):
115            aux_bfs_coeffs[i,j] = auxbasis.bfs_coeffs[i][j]
116            aux_bfs_expnts[i,j] = auxbasis.bfs_expnts[i][j]
117            aux_bfs_prim_norms[i,j] = auxbasis.bfs_prim_norms[i][j]
118            
119
120    ints3c2e = rys_3c2e_tri_internal(bfs_coords[0], bfs_contr_prim_norms[0], bfs_lmn[0], bfs_nprim[0], bfs_coeffs, bfs_prim_norms, bfs_expnts,aux_bfs_coords[0], aux_bfs_contr_prim_norms[0], aux_bfs_lmn[0], aux_bfs_nprim[0], aux_bfs_coeffs, aux_bfs_prim_norms, aux_bfs_expnts, basis.bfs_nao, auxbasis.bfs_nao)
121    return ints3c2e

Compute three-center two-electron (3c2e) electron repulsion integrals using the Rys quadrature method with symmetry considerations, stored in packed triangular form for memory efficiency.

This function evaluates integrals of the form (A B | C), where A and B are basis functions from a primary basis set and C is from an auxiliary basis set. Symmetry in the first two indices is exploited ((A B | C) = (B A | C)), so only the upper-triangular part of the (A,B) block is computed and stored. The resulting array has shape (N_{bf}*(N_{bf}+1)/2, N_{auxbf}), where the first index corresponds to a flattened 1D triangular index over basis function pairs.

Unlike rys_3c2e_symm, this routine does not support computing arbitrary slices — the full triangular set is always evaluated.

Parameters

basis : object Primary basis set object containing information about atomic orbitals, such as: - bfs_coords : Cartesian coordinates of basis function centers. - bfs_coeffs : Contraction coefficients. - bfs_expnts : Gaussian exponents. - bfs_prim_norms : Primitive normalization constants. - bfs_contr_prim_norms : Contraction normalization factors. - bfs_lmn : Angular momentum quantum numbers (ℓ, m, n). - bfs_nprim : Number of primitives per basis function. - bfs_nao : Total number of atomic orbitals.

auxbasis : object Auxiliary basis set object with the same attributes as basis, typically used for resolution-of-the-identity (RI) expansions.

Returns

ints3c2e : ndarray of shape (Nbf*(Nbf+1)//2, Nauxbf) The computed 3-center 2-electron integrals in packed triangular form. The mapping from a pair (i, j) with i ≤ j to the first dimension index follows standard upper-triangular packing order.

Notes

  • This is a memory-efficient variant of rys_3c2e_symm that avoids storing the full (Nbf, Nbf, Nauxbf) array.
  • Uses preallocated NumPy arrays for primitive data to ensure efficient Numba processing.
  • Handles irregular contraction patterns by padding primitive arrays to the size of the largest contraction in the set.
  • No Schwarz screening or partial computation is available in this function.

Examples

>>> ints_tri = rys_3c2e_tri(basis, auxbasis)
>>> # Retrieve value for pair (i, j) and auxiliary k:
>>> def packed_index(i, j, Nbf):
...     if i > j:
...         i, j = j, i
...     return i * Nbf - i*(i-1)//2 + (j - i)
>>> val = ints_tri[packed_index(i, j, basis.bfs_nao), k]
def rys_nuc_mat_symm(basis, mol, slice=None):
 9def rys_nuc_mat_symm(basis, mol, slice=None):
10    # Uses the rys 3c2e algorithm to calculate the nuclear attraction integrals by making one of the basis function as a steep/sharp s-primitive gaussian
11    # Ref: https://arxiv.org/pdf/2302.11307.pdf
12    # Here the lists are converted to numpy arrays for better use with Numba.
13    # Once these conversions are done we pass these to a Numba decorated
14    # function that uses prange, etc. to calculate the 3c2e integrals efficiently.
15    # This function calculates the 3c2e electron-electron ERIs for a given basis object and auxbasis object.
16    # The basis object holds the information of basis functions like: exponents, coeffs, etc.
17    
18
19    # It is possible to only calculate a slice (block/subset) of the complete set of integrals.
20    # slice is a 6 element list whose first and second elements give the range of the A functions to be calculated.
21    # and so on.
22    # slice = [indx_startA, indx_endA, indx_startB, indx_endB, indx_startC, indx_endC]
23
24    #We convert the required properties to numpy arrays as this is what Numba likes.
25    bfs_coords = np.array([basis.bfs_coords])
26    bfs_contr_prim_norms = np.array([basis.bfs_contr_prim_norms])
27    bfs_lmn = np.array([basis.bfs_lmn])
28    bfs_nprim = np.array([basis.bfs_nprim])
29
30    coordsBohrs = np.array([mol.coordsBohrs])
31    Z = np.array([mol.Zcharges])
32    natoms = mol.natoms
33        
34
35    #The remaining properties like bfs_coeffs are a list of lists of unequal sizes.
36    #Numba won't be able to work with these efficiently.
37    #So, we convert them to a numpy 2d array by applying a trick,
38    #that the second dimension is that of the largest list. So that
39    #it can accomadate all the lists.
40    maxnprim = max(basis.bfs_nprim)
41    bfs_coeffs = np.zeros([basis.bfs_nao, maxnprim])
42    bfs_expnts = np.zeros([basis.bfs_nao, maxnprim])
43    bfs_prim_norms = np.zeros([basis.bfs_nao, maxnprim])
44    for i in range(basis.bfs_nao):
45        for j in range(basis.bfs_nprim[i]):
46            bfs_coeffs[i,j] = basis.bfs_coeffs[i][j]
47            bfs_expnts[i,j] = basis.bfs_expnts[i][j]
48            bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j]
49
50            
51
52
53        
54    if slice is None:
55        slice = [0, basis.bfs_nao, 0, basis.bfs_nao]
56        
57    #Limits for the calculation of 4c2e integrals
58    indx_startA = int(slice[0])
59    indx_endA = int(slice[1])
60    indx_startB = int(slice[2])
61    indx_endB = int(slice[3])
62
63    ints3c2e = rys_nuc_symm_internal(bfs_coords[0], bfs_contr_prim_norms[0], bfs_lmn[0], bfs_nprim[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, indx_startA, indx_endA, indx_startB, indx_endB, Z[0], coordsBohrs[0], natoms)
64    return ints3c2e
rys_3c2e_tri_schwarzbf_val_helpers
def eval_xc_1( basis, dmat, weights, coords, funcid=[1, 7], spin=0, blocksize=50000, debug=False, list_nonzero_indices=None, count_nonzero_indices=None, list_ao_values=None, list_ao_grad_values=None):
  9def eval_xc_1(basis, dmat, weights, coords, funcid=[1,7], spin=0, blocksize=50000, debug=False, list_nonzero_indices=None, count_nonzero_indices=None, list_ao_values=None, list_ao_grad_values=None):
 10    """
 11    Evaluate exchange-correlation (XC) energy and potential matrix for DFT
 12    using algorithm 1, which is a baseline method for grid-based DFT.
 13
 14    In algorithm 1, the XC term is evaluated by looping over blocks of grid 
 15    points, and within each block, the operations are parallelized. This 
 16    approach is functional but generally slower than algorithm 2 for CPU-based 
 17    execution.
 18
 19    This function evaluates the XC energy and potential matrix elements for a given
 20    density matrix using numerical integration over a 3D real-space grid. It supports 
 21    LDA and GGA functionals via LibXC and optional use of precomputed AO values and 
 22    gradients for performance gains. Sparse AO matrix techniques are also supported.
 23
 24    Parameters
 25    ----------
 26    basis : Basis
 27        A basis object containing basis function data: exponents, coefficients, 
 28        angular momentum, normalization, and other AO metadata.
 29
 30    dmat : np.ndarray
 31        The one-electron density matrix in the AO basis.
 32
 33    weights : np.ndarray
 34        Integration weights associated with each grid point.
 35
 36    coords : np.ndarray
 37        Grid point coordinates as an (N, 3) array.
 38
 39    funcid : list of int, optional
 40        LibXC functional IDs. Default is [1, 7] for Slater (X) and VWN (C) (LDA).
 41
 42    spin : int, optional
 43        Spin multiplicity: 0 for unpolarized. Spin-polarized (1) not currently supported.
 44
 45    blocksize : int, optional
 46        Number of grid points to process per block. Default is 50000.
 47
 48    debug : bool, optional
 49        If True, enables verbose timing and diagnostic output.
 50
 51    list_nonzero_indices : list of np.ndarray, optional
 52        List of AO indices with non-negligible contributions in each grid block
 53        for sparse matrix optimizations.
 54
 55    count_nonzero_indices : list of int, optional
 56        Number of significant AO indices per block; matches entries in `list_nonzero_indices`.
 57
 58    list_ao_values : list of np.ndarray, optional
 59        Precomputed AO values at grid points for each block.
 60
 61    list_ao_grad_values : list of tuple of np.ndarray, optional
 62        Precomputed AO gradient values (x, y, z) at grid points for each block.
 63
 64    Returns
 65    -------
 66    efunc : float
 67        Total exchange-correlation energy.
 68
 69    v : np.ndarray
 70        Exchange-correlation potential matrix in the AO basis.
 71
 72    Notes
 73    -----
 74    - This algorithm prioritizes code clarity and correctness, not maximum speed.
 75    - Only LDA and GGA functionals are currently supported. meta-GGA and hybrid functionals
 76      are planned for future implementation.
 77    - Uses LibXC for exchange-correlation energy and potential evaluation.
 78    - AO values and gradients can be reused via precomputation to improve speed.
 79    - The number of electrons (via integrated density) is printed for validation.
 80
 81    References
 82    ----------
 83    - LibXC Functional Codes: https://libxc.gitlab.io/functionals/
 84    - Functional energy and potential formulation: https://pubs.acs.org/doi/full/10.1021/ct200412r
 85    - LibXC Python interface: https://www.tddft.org/programs/libxc/manual/
 86    """
 87    # Evaluate the XC term
 88    # This is a slightly slower algorithm.
 89    # Here the parallelization is done when evaluating bfs or the density
 90
 91    # In order to evaluate a density functional we will use the 
 92    # libxc library with Python bindings.
 93    # However, some sort of simple functionals like LDA, GGA, etc would
 94    # need to be implemented in CrysX also, so that the it doesn't depend
 95    # on external libraries so heavily that it becomes unusable without those.
 96
 97    #Useful links:
 98    # LibXC manual: https://www.tddft.org/programs/libxc/manual/
 99    # LibXC gitlab: https://gitlab.com/libxc/libxc/-/tree/master
100    # LibXC python interface code: https://gitlab.com/libxc/libxc/-/blob/master/pylibxc/functional.py
101    # LibXC python version installation and example: https://www.tddft.org/programs/libxc/installation/
102    # Formulae for XC energy and potential calculation: https://pubs.acs.org/doi/full/10.1021/ct200412r
103    # LibXC code list: https://github.com/pyscf/pyscf/blob/master/pyscf/dft/libxc.py
104    # PySCF nr_rks code: https://github.com/pyscf/pyscf/blob/master/pyscf/dft/numint.py
105    # https://www.osti.gov/pages/servlets/purl/1650078
106
107
108
109    #OUTPUT
110    #Functional energy
111    efunc = 0.0
112    #Functional potential V_{\mu \nu} = \mu|\partial{f}/\partial{\rho}|\nu
113    v = np.zeros((basis.bfs_nao, basis.bfs_nao))
114    #TODO mGGA, Hybrid
115    
116    #Calculate number of blocks/batches
117    ngrids = coords.shape[0]
118    nblocks = ngrids//blocksize
119    nelec = 0.0
120
121    # If a list of significant basis functions for each block of grid points is provided
122    if list_nonzero_indices is not None:
123        dmat_orig = dmat
124
125    ### Calculate stuff necessary for bf/ao evaluation on grid points
126    ### Doesn't make any difference for 510 bfs but might be significant for >1000 bfs
127    # This will help to make the call to eval_bfs faster by skipping the mediator eval_bfs function 
128    # that prepares the following stuff at every iteration
129    #We convert the required properties to numpy arrays as this is what Numba likes.
130    bfs_coords = np.array([basis.bfs_coords])
131    bfs_contr_prim_norms = np.array([basis.bfs_contr_prim_norms])
132    bfs_lmn = np.array([basis.bfs_lmn])
133    bfs_nprim = np.array([basis.bfs_nprim])
134    #The remaining properties like bfs_coeffs are a list of lists of unequal sizes.
135    #Numba won't be able to work with these efficiently.
136    #So, we convert them to a numpy 2d array by applying a trick,
137    #that the second dimension is that of the largest list. So that
138    #it can accomadate all the lists.
139    maxnprim = max(basis.bfs_nprim)
140    bfs_coeffs = np.zeros([basis.bfs_nao, maxnprim])
141    bfs_expnts = np.zeros([basis.bfs_nao, maxnprim])
142    bfs_prim_norms = np.zeros([basis.bfs_nao, maxnprim])
143    bfs_radius_cutoff = np.zeros([basis.bfs_nao])
144    for i in range(basis.bfs_nao):
145        for j in range(basis.bfs_nprim[i]):
146            bfs_coeffs[i,j] = basis.bfs_coeffs[i][j]
147            bfs_expnts[i,j] = basis.bfs_expnts[i][j]
148            bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j]
149            bfs_radius_cutoff[i] = basis.bfs_radius_cutoff[i]
150    # Now bf/ao values can be evaluated by calling the following
151    # bf_values = Integrals.bf_val_helpers.eval_bfs(bfs_coords[0], bfs_contr_prim_norms[0], bfs_nprim[0], bfs_lmn[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, bfs_radius_cutoff, coord)
152
153    # For debugging and benchmarking purposes
154    durationLibxc = 0.0
155    durationE = 0.0
156    durationF = 0.0
157    durationZ = 0.0
158    durationV = 0.0
159    durationRho = 0.0
160    durationAO = 0.0
161
162    xc_family_dict = {1:'LDA',2:'GGA',4:'MGGA'} 
163
164    # Create a LibXC object  
165    funcx = pylibxc.LibXCFunctional(funcid[0], "unpolarized")
166    funcc = pylibxc.LibXCFunctional(funcid[1], "unpolarized")
167    x_family_code = funcx.get_family()
168    c_family_code = funcc.get_family()
169
170    # Loop over blocks/batches of grid points
171    for iblock in range(nblocks+1):
172        offset = iblock*blocksize
173
174        # Get weights and coordinates of grid points for this block/batch
175        weights_block = weights[offset : min(offset+blocksize,ngrids)]
176        coords_block = coords[offset : min(offset+blocksize,ngrids)] 
177
178        # Get the list of basis functions with significant contributions to this block
179        if list_nonzero_indices is not None:
180            non_zero_indices = list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]]
181            # Get the subset of density matrix corresponding to the siginificant basis functions
182            dmat = dmat_orig[np.ix_(non_zero_indices, non_zero_indices)]
183
184        if debug:
185            startAO = timer()
186        if xc_family_dict[x_family_code]=='LDA' and xc_family_dict[c_family_code]=='LDA':
187            if list_ao_values is not None: # If ao_values are calculated once and saved, then they can be provided to avoid recalculation
188                ao_value_block = list_ao_values[iblock]
189            else:
190                # ao_value_block = Integrals.eval_bfs(basis, coords_block)       
191                if list_nonzero_indices is not None:
192                    # ao_value_block = Integrals.bf_val_helpers.eval_bfs_sparse_internal(bfs_coords[0], bfs_contr_prim_norms[0], bfs_nprim[0], bfs_lmn[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, coords_block, non_zero_indices)
193                    ao_value_block = Integrals.bf_val_helpers.eval_bfs_sparse_vectorized_internal(bfs_coords[0], bfs_contr_prim_norms[0], bfs_nprim[0], bfs_lmn[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, coords_block, non_zero_indices)
194                else:
195                    ao_value_block = Integrals.bf_val_helpers.eval_bfs_internal(bfs_coords[0], bfs_contr_prim_norms[0], bfs_nprim[0], bfs_lmn[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, coords_block)
196        # If either x or c functional is of GGA/MGGA type we need ao_grad_values
197        if xc_family_dict[x_family_code]!='LDA' or xc_family_dict[c_family_code]!='LDA':
198            # ao_value_block, ao_values_grad_block = Integrals.bf_val_helpers.eval_bfs_and_grad(basis, coords_block, deriv=1, parallel=True, non_zero_indices=non_zero_indices)
199            if list_nonzero_indices is not None:
200                # Calculating ao values and gradients together, didn't really do much improvement in computational speed
201                ao_value_block, ao_values_grad_block = Integrals.bf_val_helpers.eval_bfs_and_grad_sparse_internal(bfs_coords[0], bfs_contr_prim_norms[0], bfs_nprim[0], bfs_lmn[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, coords_block, non_zero_indices)
202            else:
203                ao_value_block, ao_values_grad_block = Integrals.bf_val_helpers.eval_bfs_and_grad_internal(bfs_coords[0], bfs_contr_prim_norms[0], bfs_nprim[0], bfs_lmn[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, coords_block)
204        if debug:
205            durationAO = durationAO + timer() - startAO
206            
207
208        if debug:
209            startRho = timer()
210        
211        if list_nonzero_indices is not None:
212            rho_block = contract('ij,mi,mj->m', dmat, ao_value_block, ao_value_block) # Original (pretty fast)
213        else:
214            rho_block = Integrals.bf_val_helpers.bf_val_helpers.eval_rho(ao_value_block, dmat) # This is by-far the fastest now <-----
215        
216        # If either x or c functional is of GGA/MGGA type we need rho_grad_values too
217        if xc_family_dict[x_family_code]!='LDA' or xc_family_dict[c_family_code]!='LDA':
218            rho_grad_block_x = contract('ij,mi,mj->m',dmat,ao_values_grad_block[0],ao_value_block)+\
219                                    contract('ij,mi,mj->m',dmat,ao_value_block,ao_values_grad_block[0])
220            rho_grad_block_y = contract('ij,mi,mj->m',dmat,ao_values_grad_block[1],ao_value_block)+\
221                                    contract('ij,mi,mj->m',dmat,ao_value_block,ao_values_grad_block[1])
222            rho_grad_block_z = contract('ij,mi,mj->m',dmat,ao_values_grad_block[2],ao_value_block)+\
223                                    contract('ij,mi,mj->m',dmat,ao_value_block,ao_values_grad_block[2])
224            sigma_block = np.zeros((3,weights_block.shape[0]))
225            sigma_block[1] = rho_grad_block_x**2 + rho_grad_block_y**2 + rho_grad_block_z**2
226
227        if debug:
228            durationRho = durationRho + timer() - startRho
229    
230        #LibXC stuff
231        # Exchange
232        if debug:
233            startLibxc = timer()
234        # Input dictionary for libxc
235        inp = {}
236        # Input dictionary needs density values at grid points
237        inp['rho'] = rho_block
238        if xc_family_dict[x_family_code]!='LDA':
239            # Input dictionary needs sigma (\nabla \rho \cdot \nabla \rho) values at grid points
240            inp['sigma'] = sigma_block[1]
241        # Calculate the necessary quantities using LibXC
242        retx = funcx.compute(inp)
243        # print('Duration for LibXC computations at grid points: ',durationLibxc)
244
245        # Correlation
246        # Input dictionary for libxc
247        inp = {}
248        # Input dictionary needs density values at grid points
249        inp['rho'] = rho_block
250        if xc_family_dict[c_family_code]!='LDA':
251            # Input dictionary needs sigma (\nabla \rho \cdot \nabla \rho) values at grid points
252            inp['sigma'] = sigma_block[1]
253        # Calculate the necessary quantities using LibXC
254        retc = funcc.compute(inp)
255
256        if debug:
257            durationLibxc = durationLibxc + timer() - startLibxc
258        # print('Duration for LibXC computations at grid points: ',durationLibxc)
259
260        if debug:
261            startE = timer()
262        #ENERGY-----------
263        e = retx['zk'] + retc['zk'] # Functional values at grid points
264        # Testing CrysX's own implmentation
265        #e = densfuncs.lda_x(rho)
266
267        # Calculate the total energy 
268        # Multiply the density at grid points by weights
269        den = rho_block*weights_block #elementwise multiply
270        efunc = efunc + np.dot(den, e) #Multiply with functional values at grid points and sum
271        nelec = nelec + np.sum(den)
272
273        if debug:
274            durationE = durationE + timer() - startE
275        # print('Duration for calculation of total density functional energy: ',durationE)
276
277        #POTENTIAL----------
278        # The derivative of functional wrt density is vrho
279        vrho = retx['vrho'] + retc['vrho']
280        vsigma = 0
281        # If either x or c functional is of GGA/MGGA type we need rho_grad_values
282        if xc_family_dict[x_family_code]!='LDA':
283            # The derivative of functional wrt grad \rho square.
284            vsigma = retx['vsigma']
285        if xc_family_dict[c_family_code]!='LDA':
286            # The derivative of functional wrt grad \rho square.
287            vsigma += retc['vsigma']
288        
289        if debug:
290            startF = timer()
291        # F = np.multiply(weights_block,vrho[:,0]) #This is fast enough.
292        v_rho_temp = vrho[:,0]
293        F = numexpr.evaluate('(weights_block*v_rho_temp)')
294        # If either x or c functional is of GGA/MGGA type we need rho_grad_values
295        if xc_family_dict[x_family_code]!='LDA' or xc_family_dict[c_family_code]!='LDA':
296            Ftemp = 2*weights_block*vsigma[:,0]
297            Fx = Ftemp*rho_grad_block_x
298            Fy = Ftemp*rho_grad_block_y
299            Fz = Ftemp*rho_grad_block_z
300            # Ftemp = 2*np.multiply(weights_block,vsigma.T)
301            # Fx = Ftemp*rho_grad_block_x
302            # Fy = Ftemp*rho_grad_block_y
303            # Fz = Ftemp*rho_grad_block_z
304        if debug:
305            durationF = durationF + timer() - startF
306        # print('Duration for calculation of F: ',durationF)
307
308        if debug:
309            startZ = timer()
310        ao_value_block_T = ao_value_block.T
311        z = numexpr.evaluate('(0.5*F*ao_value_block_T)')
312        # z = 0.5*np.einsum('m,mi->mi',F,ao_value_block)
313        # If either x or c functional is of GGA/MGGA type we need rho_grad_values
314        if xc_family_dict[x_family_code]!='LDA' or xc_family_dict[c_family_code]!='LDA':
315            z = z + Fx*ao_values_grad_block[0].T + Fy*ao_values_grad_block[1].T + Fz*ao_values_grad_block[2].T
316            
317        if debug:
318            durationZ = durationZ + timer() - startZ
319        # print('Duration for calculation of z : ',durationZ)
320        # Free memory
321        F = 0
322        ao_value_block_T = 0
323        vrho = 0
324
325        if debug:
326            startV = timer()
327        # Numexpr
328        v_temp = z @ ao_value_block  
329        v_temp_T = v_temp.T
330        if list_nonzero_indices is not None:
331            v[np.ix_(non_zero_indices, non_zero_indices)] += numexpr.evaluate('(v_temp + v_temp_T)')
332        else:
333            v = numexpr.evaluate('(v + v_temp + v_temp_T)')
334
335        
336        if debug:
337            durationV = durationV + timer() - startV
338
339    print('Number of electrons: ', nelec)
340
341    if debug:
342        print('Duration for AO values: ', durationAO)
343        print('Duration for V: ',durationV)
344        print('Duration for Rho at grid points: ',durationRho)
345        print('Duration for F: ',durationF)
346        print('Duration for Z: ',durationZ)
347        print('Duration for E: ',durationE)
348        print('Duration for LibXC: ',durationLibxc)
349
350    
351
352    return efunc[0], v

Evaluate exchange-correlation (XC) energy and potential matrix for DFT using algorithm 1, which is a baseline method for grid-based DFT.

In algorithm 1, the XC term is evaluated by looping over blocks of grid points, and within each block, the operations are parallelized. This approach is functional but generally slower than algorithm 2 for CPU-based execution.

This function evaluates the XC energy and potential matrix elements for a given density matrix using numerical integration over a 3D real-space grid. It supports LDA and GGA functionals via LibXC and optional use of precomputed AO values and gradients for performance gains. Sparse AO matrix techniques are also supported.

Parameters

basis : Basis A basis object containing basis function data: exponents, coefficients, angular momentum, normalization, and other AO metadata.

dmat : np.ndarray The one-electron density matrix in the AO basis.

weights : np.ndarray Integration weights associated with each grid point.

coords : np.ndarray Grid point coordinates as an (N, 3) array.

funcid : list of int, optional LibXC functional IDs. Default is [1, 7] for Slater (X) and VWN (C) (LDA).

spin : int, optional Spin multiplicity: 0 for unpolarized. Spin-polarized (1) not currently supported.

blocksize : int, optional Number of grid points to process per block. Default is 50000.

debug : bool, optional If True, enables verbose timing and diagnostic output.

list_nonzero_indices : list of np.ndarray, optional List of AO indices with non-negligible contributions in each grid block for sparse matrix optimizations.

count_nonzero_indices : list of int, optional Number of significant AO indices per block; matches entries in list_nonzero_indices.

list_ao_values : list of np.ndarray, optional Precomputed AO values at grid points for each block.

list_ao_grad_values : list of tuple of np.ndarray, optional Precomputed AO gradient values (x, y, z) at grid points for each block.

Returns

efunc : float Total exchange-correlation energy.

v : np.ndarray Exchange-correlation potential matrix in the AO basis.

Notes

  • This algorithm prioritizes code clarity and correctness, not maximum speed.
  • Only LDA and GGA functionals are currently supported. meta-GGA and hybrid functionals are planned for future implementation.
  • Uses LibXC for exchange-correlation energy and potential evaluation.
  • AO values and gradients can be reused via precomputation to improve speed.
  • The number of electrons (via integrated density) is printed for validation.

References

def eval_xc_2( basis, dmat, weights, coords, funcid=[1, 7], spin=0, ncores=2, blocksize=5000, list_nonzero_indices=None, count_nonzero_indices=None, list_ao_values=None, list_ao_grad_values=None, debug=False):
 18def eval_xc_2(basis, dmat, weights, coords, funcid=[1,7], spin=0, ncores=2, blocksize=5000, list_nonzero_indices=None, count_nonzero_indices=None, list_ao_values=None, list_ao_grad_values=None, debug=False):
 19    """
 20    Evaluate exchange-correlation (XC) energy and potential matrix for DFT
 21    using algorithm 2, which is the preferred algorithm for running DFT on CPU.
 22
 23    In algorithm 2, the XC term is evaluated by parallelizing over batches.
 24    In contrast, the algorithm 1 evaluates the XC term by looping over batches and 
 25    parallelizing the operations within the batch, which is suboptimal.
 26
 27    This function evaluates the XC energy and potential matrix elements for a given
 28    density matrix using grid-based numerical integration. It supports LDA and GGA 
 29    functionals (via LibXC), parallel execution with `joblib`, and sparse AO matrix blocks.
 30
 31    Parameters
 32    ----------
 33    basis : Basis
 34        A basis object containing basis function data: exponents, coefficients, 
 35        angular momentum, normalization, etc.
 36
 37    dmat : np.ndarray
 38        The one-electron density matrix in the AO basis.
 39
 40    weights : np.ndarray
 41        Integration weights associated with each grid point.
 42
 43    coords : np.ndarray
 44        Grid point coordinates as an (N, 3) array.
 45
 46    funcid : list of int, optional
 47        LibXC functional IDs. Default is [1, 7] for Slater (X) and VWN (C) (LDA).
 48
 49    spin : int, optional
 50        Spin multiplicity: 0 for unpolarized, 1 for spin-polarized (not allowed currently).
 51
 52    ncores : int, optional
 53        Number of threads/cores for parallel execution. Default is 2.
 54
 55    blocksize : int, optional
 56        Number of grid points to process per block. Default is 5000.
 57
 58    list_nonzero_indices : list of np.ndarray, optional
 59        Precomputed list of nonzero AO indices per block for sparse evaluation.
 60
 61    count_nonzero_indices : list of int, optional
 62        Number of nonzero indices in each block (matches `list_nonzero_indices`).
 63
 64    list_ao_values : list of np.ndarray, optional
 65        Precomputed AO values at grid points for each block.
 66
 67    list_ao_grad_values : list of tuple of np.ndarray, optional
 68        Precomputed AO gradient values (x, y, z) at grid points for each block.
 69
 70    debug : bool, optional
 71        If True, print detailed timing and diagnostic info.
 72
 73    Returns
 74    -------
 75    efunc : float
 76        Total exchange-correlation energy.
 77
 78    v : np.ndarray
 79        Exchange-correlation potential matrix in the AO basis.
 80
 81    Notes
 82    -----
 83    - Only supports LDA and GGA functionals currently. meta-GGA and hybrid support is in development.
 84    - Uses LibXC for energy and derivative functional evaluation.
 85    - Blocks are randomly shuffled before processing to balance parallel load.
 86    - Grid-based DFT evaluation using precomputed AO and gradient matrices is supported.
 87
 88    References
 89    ----------
 90    - LibXC Functional Codes: https://libxc.gitlab.io/functionals/
 91    - XC term evaluation inspired from: https://pubs.acs.org/doi/full/10.1021/ct200412r
 92    """
 93    # This performs parallelization at the blocks/batches level.
 94    # Therefore, joblib is perfect for such embarrasingly parallel task
 95    # In order to evaluate a density functional we will use the 
 96    # libxc library with Python bindings.
 97    # However, some sort of simple functionals like LDA, GGA, etc would
 98    # need to be implemented in CrysX also, so that the it doesn't depend
 99    # on external libraries so heavily that it becomes unusable without those.
100
101    #Useful links:
102    # LibXC manual: https://www.tddft.org/programs/libxc/manual/
103    # LibXC gitlab: https://gitlab.com/libxc/libxc/-/tree/master
104    # LibXC python interface code: https://gitlab.com/libxc/libxc/-/blob/master/pylibxc/functional.py
105    # LibXC python version installation and example: https://www.tddft.org/programs/libxc/installation/
106    # Formulae for XC energy and potential calculation: https://pubs.acs.org/doi/full/10.1021/ct200412r
107    # LibXC code list: https://github.com/pyscf/pyscf/blob/master/pyscf/dft/libxc.py
108    # PySCF nr_rks code: https://github.com/pyscf/pyscf/blob/master/pyscf/dft/numint.py
109    # https://www.osti.gov/pages/servlets/purl/1650078
110
111    #OUTPUT
112    #Functional energy
113    efunc = 0.0
114    #Functional potential V_{\mu \nu} = \mu|\partial{f}/\partial{\rho}|\nu
115    v = np.zeros((basis.bfs_nao, basis.bfs_nao))
116    nelec = 0
117
118    # TODO it only works for LDA functionals for now.
119    # Need to make it work for GGA, Hybrid, range-separated Hybrid and MetaGGA functionals as well.
120    
121
122    ngrids = coords.shape[0]
123    nblocks = ngrids//blocksize
124
125    # print('Number of blocks: ', nblocks)
126
127    # Some stuff to note timings
128    timings = None
129    if debug:
130        durationLibxc = 0.0
131        durationE = 0.0
132        durationF = 0.0
133        durationZ = 0.0
134        durationV = 0.0
135        durationRho = 0.0
136        durationAO = 0.0
137        timings = {'durationLibxc':durationLibxc, 'durationE':durationE, 'durationF':durationF, 'durationZ':durationZ, 'durationV':durationV, 'durationRho':durationRho, 'durationAO':durationAO}
138
139    
140
141    ### Calculate stuff necessary for bf/ao evaluation on grid points
142    ### Doesn't make any difference for 510 bfs but might be significant for >1000 bfs
143    # This will help to make the call to evalbfnumba1 faster by skipping the mediator evalbfnumbawrap function 
144    # that prepares the following stuff at every iteration
145    #We convert the required properties to numpy arrays as this is what Numba likes.
146    bfs_coords = np.array([basis.bfs_coords])
147    bfs_contr_prim_norms = np.array([basis.bfs_contr_prim_norms])
148    bfs_lmn = np.array([basis.bfs_lmn])
149    bfs_nprim = np.array([basis.bfs_nprim])
150    #The remaining properties like bfs_coeffs are a list of lists of unequal sizes.
151    #Numba won't be able to work with these efficiently.
152    #So, we convert them to a numpy 2d array by applying a trick,
153    #that the second dimension is that of the largest list. So that
154    #it can accomadate all the lists.
155    maxnprim = max(basis.bfs_nprim)
156    bfs_coeffs = np.zeros([basis.bfs_nao, maxnprim])
157    bfs_expnts = np.zeros([basis.bfs_nao, maxnprim])
158    bfs_prim_norms = np.zeros([basis.bfs_nao, maxnprim])
159    bfs_radius_cutoff = np.zeros([basis.bfs_nao])
160    for i in range(basis.bfs_nao):
161        for j in range(basis.bfs_nprim[i]):
162            bfs_coeffs[i,j] = basis.bfs_coeffs[i][j]
163            bfs_expnts[i,j] = basis.bfs_expnts[i][j]
164            bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j]
165            bfs_radius_cutoff[i] = basis.bfs_radius_cutoff[i]
166    # Now bf/ao values can be evaluated by calling the following
167    # bf_values = Integrals.bf_val_helpers.eval_bfs(bfs_coords[0], bfs_contr_prim_norms[0], bfs_nprim[0], bfs_lmn[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, bfs_radius_cutoff, coord)
168    bfs_data_as_np_arrays = [bfs_coords[0], bfs_contr_prim_norms[0], bfs_nprim[0], bfs_lmn[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, bfs_radius_cutoff]
169
170    xc_family_dict = {1:'LDA',2:'GGA',4:'MGGA'} 
171    # Create a LibXC object  
172    funcx = pylibxc.LibXCFunctional(funcid[0], "unpolarized")
173    funcc = pylibxc.LibXCFunctional(funcid[1], "unpolarized")
174    x_family_code = funcx.get_family()
175    c_family_code = funcc.get_family()
176
177    #### Set number of cores for numba related evaluations within 'block_dens_func()' for example bf_value evaluations 
178    #### Apparently, it was still using as many cores as possible and creating a serial version of thise functions as I have 
179    #### currently done doesn't work  
180    numba.set_num_threads(1)
181    # Shuffle the blocks (for load balancing)
182    block_indices = list(range(nblocks+1))
183    random.shuffle(block_indices)
184
185    # rho_blocks = [np.zeros((blocksize,1)) for _ in range(nblocks+1)]
186    
187    if 2*ncores>nblocks:
188        batch_size='auto'
189    else:
190        batch_size = nblocks//(ncores*2)
191
192    # NumExpr expressions compile
193    expr_den = numexpr.NumExpr('rho_block*weights_block')
194    expr_F = numexpr.NumExpr('weights_block*v_rho_temp')
195    expr_z = numexpr.NumExpr('0.5*F*ao_value_block_T')
196    expr_v = numexpr.NumExpr('v_temp + v_temp_T')
197    expr_sigma_block = numexpr.NumExpr('rho_grad_block_x**2 + rho_grad_block_y**2 + rho_grad_block_z**2')
198    expr_Ftemp = numexpr.NumExpr('2*weights_block*vsigma_temp')
199    expr_Fx = numexpr.NumExpr('Ftemp*rho_grad_block_x')
200    expr_Fy = numexpr.NumExpr('Ftemp*rho_grad_block_y')
201    expr_Fz = numexpr.NumExpr('Ftemp*rho_grad_block_z')
202    expr_z_grad = numexpr.NumExpr('Fx*ao_value_gradx_block_T + Fy*ao_value_grady_block_T + Fz*ao_value_gradz_block_T')
203    numexpr_expr = {'expr_den':expr_den, 'expr_F':expr_F, 'expr_z':expr_z, 'expr_v':expr_v, 'expr_sigma_block':expr_sigma_block, \
204                    'expr_Fx':expr_Fx, 'expr_Fy':expr_Fy, 'expr_Fz':expr_Fz, 'expr_z_grad':expr_z_grad, 'expr_Ftemp':expr_Ftemp}
205
206
207    if list_nonzero_indices is not None:
208        if list_ao_values is not None:
209            if xc_family_dict[x_family_code]=='LDA' and xc_family_dict[c_family_code]=='LDA':
210                output = Parallel(n_jobs=ncores, backend='threading', require='sharedmem', batch_size=batch_size, pre_dispatch=3*ncores)(delayed(block_dens_func)(weights[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], coords[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], dmat[np.ix_(list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]], list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]])], funcid, bfs_data_as_np_arrays, list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]], list_ao_values[iblock], funcx=funcx, funcc=funcc, x_family_code=x_family_code, c_family_code=c_family_code, xc_family_dict=xc_family_dict, numexpr_expr=numexpr_expr, debug=debug) for iblock in block_indices)
211            else: #GGA
212                output = Parallel(n_jobs=ncores, backend='threading', require='sharedmem', batch_size=batch_size)(delayed(block_dens_func)(weights[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], coords[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], dmat[np.ix_(list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]], list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]])], funcid, bfs_data_as_np_arrays, list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]], list_ao_values[iblock], list_ao_grad_values[iblock], funcx=funcx, funcc=funcc, x_family_code=x_family_code, c_family_code=c_family_code, xc_family_dict=xc_family_dict, numexpr_expr=numexpr_expr, debug=debug) for iblock in block_indices)
213        else:
214            output = Parallel(n_jobs=ncores, backend='threading', require='sharedmem', batch_size=batch_size)(delayed(block_dens_func)(weights[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], coords[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], dmat[np.ix_(list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]], list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]])], funcid, bfs_data_as_np_arrays, list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]], funcx=funcx, funcc=funcc, x_family_code=x_family_code, c_family_code=c_family_code, xc_family_dict=xc_family_dict, numexpr_expr=numexpr_expr, debug=debug) for iblock in block_indices)
215            # output = Parallel(n_jobs=ncores, backend='loky')(delayed(block_dens_func)(weights[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], coords[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], dmat[np.ix_(list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]], list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]])], funcid, bfs_data_as_np_arrays, list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]], funcx=None, funcc=None, x_family_code=x_family_code, c_family_code=c_family_code, xc_family_dict=xc_family_dict, debug=debug) for iblock in block_indices)
216    else:
217        output = Parallel(n_jobs=ncores, backend='threading', require='sharedmem', batch_size=batch_size)(delayed(block_dens_func)(weights[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], coords[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], dmat, funcid, bfs_data_as_np_arrays, non_zero_indices=None, ao_values=None, funcx=funcx, funcc=funcc, x_family_code=x_family_code, c_family_code=c_family_code, xc_family_dict=xc_family_dict, numexpr_expr=numexpr_expr, debug=debug) for iblock in block_indices)
218        
219    indx_block_output = 0
220    for iblock in block_indices:
221        efunc += output[indx_block_output][0]
222        if list_nonzero_indices is not None:
223            non_zero_indices = list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]]
224            v[np.ix_(non_zero_indices, non_zero_indices)] += output[indx_block_output][1]
225            # slice = np.ix_(non_zero_indices, non_zero_indices)
226            # v[slice] = numexpr.evaluate("(v_ + output_)", {'v_':v[slice], 'output_':output[indx_block_output][1]})
227        else:
228            v += output[indx_block_output][1]
229        nelec += output[indx_block_output][2]
230        if debug:
231            timings['durationLibxc'] += output[indx_block_output][3]['durationLibxc']
232            timings['durationE'] += output[indx_block_output][3]['durationE']
233            timings['durationF'] += output[indx_block_output][3]['durationF']
234            timings['durationZ'] += output[indx_block_output][3]['durationZ']
235            timings['durationV'] += output[indx_block_output][3]['durationV']
236            timings['durationRho'] += output[indx_block_output][3]['durationRho']
237            timings['durationAO'] += output[indx_block_output][3]['durationAO']
238        indx_block_output += 1 
239        # v = numexpr.evaluate('(v + output[iblock][1])')
240
241    # print('here1')
242    # for i in range(1000):
243    #     a = dmat[np.ix_(list_nonzero_indices[60][0:count_nonzero_indices[60]], list_nonzero_indices[60][0:count_nonzero_indices[60]])]
244    # print('here2')
245    
246
247    numba.set_num_threads(ncores)    
248    print('Number of electrons: ', nelec)
249    if debug:
250        print('Timings:', timings)
251
252    ####### Free memory
253    ## The following is very important to prevent memory leaks and also to make sure that the number of 
254    # threads used by the program is same as that specified by the user 
255    # gc.collect()  # Avoiding using it for now, as it is usually quite slow, although in this case it might not make much difference
256    # Anyway, the following also works
257    output = 0
258    non_zero_indices = 0
259    coords = 0
260
261    return efunc[0], v

Evaluate exchange-correlation (XC) energy and potential matrix for DFT using algorithm 2, which is the preferred algorithm for running DFT on CPU.

In algorithm 2, the XC term is evaluated by parallelizing over batches. In contrast, the algorithm 1 evaluates the XC term by looping over batches and parallelizing the operations within the batch, which is suboptimal.

This function evaluates the XC energy and potential matrix elements for a given density matrix using grid-based numerical integration. It supports LDA and GGA functionals (via LibXC), parallel execution with joblib, and sparse AO matrix blocks.

Parameters

basis : Basis A basis object containing basis function data: exponents, coefficients, angular momentum, normalization, etc.

dmat : np.ndarray The one-electron density matrix in the AO basis.

weights : np.ndarray Integration weights associated with each grid point.

coords : np.ndarray Grid point coordinates as an (N, 3) array.

funcid : list of int, optional LibXC functional IDs. Default is [1, 7] for Slater (X) and VWN (C) (LDA).

spin : int, optional Spin multiplicity: 0 for unpolarized, 1 for spin-polarized (not allowed currently).

ncores : int, optional Number of threads/cores for parallel execution. Default is 2.

blocksize : int, optional Number of grid points to process per block. Default is 5000.

list_nonzero_indices : list of np.ndarray, optional Precomputed list of nonzero AO indices per block for sparse evaluation.

count_nonzero_indices : list of int, optional Number of nonzero indices in each block (matches list_nonzero_indices).

list_ao_values : list of np.ndarray, optional Precomputed AO values at grid points for each block.

list_ao_grad_values : list of tuple of np.ndarray, optional Precomputed AO gradient values (x, y, z) at grid points for each block.

debug : bool, optional If True, print detailed timing and diagnostic info.

Returns

efunc : float Total exchange-correlation energy.

v : np.ndarray Exchange-correlation potential matrix in the AO basis.

Notes

  • Only supports LDA and GGA functionals currently. meta-GGA and hybrid support is in development.
  • Uses LibXC for energy and derivative functional evaluation.
  • Blocks are randomly shuffled before processing to balance parallel load.
  • Grid-based DFT evaluation using precomputed AO and gradient matrices is supported.

References

def eval_xc_3( basis, dmat, weights, coords, funcid=[1, 7], spin=0, ncores=2, blocksize=5000, list_nonzero_indices=None, count_nonzero_indices=None, list_ao_values=None, list_ao_grad_values=None, debug=False):
 16def eval_xc_3(basis, dmat, weights, coords, funcid=[1,7], spin=0, ncores=2, blocksize=5000, list_nonzero_indices=None, count_nonzero_indices=None, list_ao_values=None, list_ao_grad_values=None, debug=False):
 17    # This performs parallelization at the blocks/batches level.
 18    # Therefore, joblib is perfect for such embarrasingly parallel task
 19    # In order to evaluate a density functional we will use the 
 20    # libxc library with Python bindings.
 21    # However, some sort of simple functionals like LDA, GGA, etc would
 22    # need to be implemented in CrysX also, so that the it doesn't depend
 23    # on external libraries so heavily that it becomes unusable without those.
 24
 25    #Useful links:
 26    # LibXC manual: https://www.tddft.org/programs/libxc/manual/
 27    # LibXC gitlab: https://gitlab.com/libxc/libxc/-/tree/master
 28    # LibXC python interface code: https://gitlab.com/libxc/libxc/-/blob/master/pylibxc/functional.py
 29    # LibXC python version installation and example: https://www.tddft.org/programs/libxc/installation/
 30    # Formulae for XC energy and potential calculation: https://pubs.acs.org/doi/full/10.1021/ct200412r
 31    # LibXC code list: https://github.com/pyscf/pyscf/blob/master/pyscf/dft/libxc.py
 32    # PySCF nr_rks code: https://github.com/pyscf/pyscf/blob/master/pyscf/dft/numint.py
 33    # https://www.osti.gov/pages/servlets/purl/1650078
 34
 35    # Start Ray.
 36    if not ray.is_initialized():
 37        ray.init(num_cpus = ncores-1)
 38
 39    #OUTPUT
 40    #Functional energy
 41    efunc = 0.0
 42    #Functional potential V_{\mu \nu} = \mu|\partial{f}/\partial{\rho}|\nu
 43    # with threadpool_limits(limits=1, user_api='blas'):
 44    v = np.zeros((basis.bfs_nao, basis.bfs_nao))
 45    nelec = 0
 46
 47    # TODO it only works for LDA functionals for now.
 48    # Need to make it work for GGA, Hybrid, range-separated Hybrid and MetaGGA functionals as well.
 49    
 50
 51    ngrids = coords.shape[0]
 52    nblocks = ngrids//blocksize
 53
 54    # print('Number of blocks: ', nblocks)
 55
 56    # Some stuff to note timings
 57    timings = None
 58    if debug:
 59        durationLibxc = 0.0
 60        durationE = 0.0
 61        durationF = 0.0
 62        durationZ = 0.0
 63        durationV = 0.0
 64        durationRho = 0.0
 65        durationAO = 0.0
 66        timings = {'durationLibxc':durationLibxc, 'durationE':durationE, 'durationF':durationF, 'durationZ':durationZ, 'durationV':durationV, 'durationRho':durationRho, 'durationAO':durationAO}
 67
 68    
 69
 70    ### Calculate stuff necessary for bf/ao evaluation on grid points
 71    ### Doesn't make any difference for 510 bfs but might be significant for >1000 bfs
 72    # This will help to make the call to evalbfnumba1 faster by skipping the mediator evalbfnumbawrap function 
 73    # that prepares the following stuff at every iteration
 74    #We convert the required properties to numpy arrays as this is what Numba likes.
 75    bfs_coords = np.array([basis.bfs_coords])
 76    bfs_contr_prim_norms = np.array([basis.bfs_contr_prim_norms])
 77    bfs_lmn = np.array([basis.bfs_lmn])
 78    bfs_nprim = np.array([basis.bfs_nprim])
 79    #The remaining properties like bfs_coeffs are a list of lists of unequal sizes.
 80    #Numba won't be able to work with these efficiently.
 81    #So, we convert them to a numpy 2d array by applying a trick,
 82    #that the second dimension is that of the largest list. So that
 83    #it can accomadate all the lists.
 84    maxnprim = max(basis.bfs_nprim)
 85    bfs_coeffs = np.zeros([basis.bfs_nao, maxnprim])
 86    bfs_expnts = np.zeros([basis.bfs_nao, maxnprim])
 87    bfs_prim_norms = np.zeros([basis.bfs_nao, maxnprim])
 88    bfs_radius_cutoff = np.zeros([basis.bfs_nao])
 89    for i in range(basis.bfs_nao):
 90        for j in range(basis.bfs_nprim[i]):
 91            bfs_coeffs[i,j] = basis.bfs_coeffs[i][j]
 92            bfs_expnts[i,j] = basis.bfs_expnts[i][j]
 93            bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j]
 94            bfs_radius_cutoff[i] = basis.bfs_radius_cutoff[i]
 95    # Now bf/ao values can be evaluated by calling the following
 96    # bf_values = Integrals.bf_val_helpers.eval_bfs(bfs_coords[0], bfs_contr_prim_norms[0], bfs_nprim[0], bfs_lmn[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, bfs_radius_cutoff, coord)
 97    bfs_data_as_np_arrays = [bfs_coords[0], bfs_contr_prim_norms[0], bfs_nprim[0], bfs_lmn[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, bfs_radius_cutoff]
 98
 99    xc_family_dict = {1:'LDA',2:'GGA',4:'MGGA'} 
100    # Create a LibXC object  
101    funcx = pylibxc.LibXCFunctional(funcid[0], "unpolarized")
102    funcc = pylibxc.LibXCFunctional(funcid[1], "unpolarized")
103    x_family_code = funcx.get_family()
104    c_family_code = funcc.get_family()
105
106    # weights_put = ray.put(weights)
107    # coords_put = ray.put(coords)
108    bfs_data_as_np_arrays = ray.put(bfs_data_as_np_arrays)
109    if list_nonzero_indices is not None:
110        if list_ao_values is not None:
111            if xc_family_dict[x_family_code]=='LDA' and xc_family_dict[c_family_code]=='LDA':
112                # Launch four parallel square tasks
113                futures = [block_dens_func.remote(weights[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], coords[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], dmat[np.ix_(list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]], list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]])], funcid, bfs_data_as_np_arrays, list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]], list_ao_values[iblock], funcx=None, funcc=None, x_family_code=x_family_code, c_family_code=c_family_code, xc_family_dict=xc_family_dict, debug=debug) for iblock in range(nblocks+1)]
114    #         else: #GGA
115    #             output = Parallel(n_jobs=ncores, backend='threading', require='sharedmem', batch_size=batch_size)(delayed(block_dens_func)(weights[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], coords[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], dmat[np.ix_(list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]], list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]])], funcid, bfs_data_as_np_arrays, list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]], list_ao_values[iblock], list_ao_grad_values[iblock], funcx=funcx, funcc=funcc, x_family_code=x_family_code, c_family_code=c_family_code, xc_family_dict=xc_family_dict, debug=debug) for iblock in block_indices)
116    #     else:
117    #         output = Parallel(n_jobs=ncores, backend='threading', require='sharedmem', batch_size=batch_size)(delayed(block_dens_func)(weights[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], coords[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], dmat[np.ix_(list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]], list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]])], funcid, bfs_data_as_np_arrays, list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]], funcx=funcx, funcc=funcc, x_family_code=x_family_code, c_family_code=c_family_code, xc_family_dict=xc_family_dict, debug=debug) for iblock in block_indices)
118    # else:
119    #     output = Parallel(n_jobs=ncores, backend='threading', require='sharedmem', batch_size=batch_size)(delayed(block_dens_func)(weights[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], coords[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], dmat, funcid, bfs_data_as_np_arrays, non_zero_indices=None, ao_values=None, funcx=funcx, funcc=funcc, x_family_code=x_family_code, c_family_code=c_family_code, xc_family_dict=xc_family_dict, debug=debug) for iblock in block_indices)
120        
121    # Retrieve results
122    output = ray.get(futures)
123
124    for iblock in range(nblocks+1):
125        efunc += output[iblock][0]
126        if list_nonzero_indices is not None:
127            non_zero_indices = list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]]
128            v[np.ix_(non_zero_indices, non_zero_indices)] += output[iblock][1]
129            # v[np.ix_(non_zero_indices, non_zero_indices)] = numexpr.evaluate("(v_ + output_)", {'v_':v[np.ix_(non_zero_indices, non_zero_indices)], 'output_':output[indx_block_output][1]})
130        else:
131            v += output[iblock][1]
132        nelec += output[iblock][2]
133        if debug:
134            timings['durationLibxc'] += output[iblock][3]['durationLibxc']
135            timings['durationE'] += output[iblock][3]['durationE']
136            timings['durationF'] += output[iblock][3]['durationF']
137            timings['durationZ'] += output[iblock][3]['durationZ']
138            timings['durationV'] += output[iblock][3]['durationV']
139            timings['durationRho'] += output[iblock][3]['durationRho']
140            timings['durationAO'] += output[iblock][3]['durationAO']
141        # v = numexpr.evaluate('(v + output[iblock][1])')
142
143
144
145    numba.set_num_threads(ncores)    
146    print('Number of electrons: ', nelec)
147    if debug:
148        print('Timings:', timings)
149
150    ####### Free memory
151    ## The following is very important to prevent memory leaks and also to make sure that the number of 
152    # threads used by the program is same as that specified by the user 
153    # gc.collect()  # Avoiding using it for now, as it is usually quite slow, although in this case it might not make much difference
154    # Anyway, the following also works
155    output = 0
156    non_zero_indices = 0
157    coords = 0
158
159    return efunc, v
def eval_xc_1_cupy( basis, dmat, weights, coords, funcid=[1, 7], spin=0, blocksize=50000, debug=False, list_nonzero_indices=None, count_nonzero_indices=None, list_ao_values=None, list_ao_grad_values=None, use_libxc=True, nstreams=1, ngpus=1, freemem=True, threads_per_block=None, type=<class 'numpy.float64'>):
 22def eval_xc_1_cupy(basis, dmat, weights, coords, funcid=[1,7], spin=0, blocksize=50000, debug=False, list_nonzero_indices=None, \
 23                   count_nonzero_indices=None, list_ao_values=None, list_ao_grad_values=None, use_libxc=True, nstreams=1, ngpus=1,\
 24                    freemem=True, threads_per_block=None, type=cp.float64):
 25    print('Calculating XC term using GPU and algo 1', flush=True)
 26    # Evaluate the XC term using GPU
 27    # Here instead of parallelizing over batches, the operations within
 28    # a batch are parallelized. Therefore, this requires a larger batchsize~20480.
 29    # While the CPU version of this algorithm is quite slow, 
 30    # the GPU version performs great!
 31
 32    # In order to evaluate a density functional we will use the 
 33    # libxc library with Python bindings.
 34    # However, some sort of simple functionals like LDA, GGA, etc would
 35    # need to be implemented in CrysX also, so that the it doesn't depend
 36    # on external libraries so heavily that it becomes unusable without those.
 37
 38    #Useful links:
 39    # LibXC manual: https://www.tddft.org/programs/libxc/manual/
 40    # LibXC gitlab: https://gitlab.com/libxc/libxc/-/tree/master
 41    # LibXC python interface code: https://gitlab.com/libxc/libxc/-/blob/master/pylibxc/functional.py
 42    # LibXC python version installation and example: https://www.tddft.org/programs/libxc/installation/
 43    # Formulae for XC energy and potential calculation: https://pubs.acs.org/doi/full/10.1021/ct200412r
 44    # LibXC code list: https://github.com/pyscf/pyscf/blob/master/pyscf/dft/libxc.py
 45    # PySCF nr_rks code: https://github.com/pyscf/pyscf/blob/master/pyscf/dft/numint.py
 46    # https://www.osti.gov/pages/servlets/purl/1650078
 47
 48    startTotal = timer()
 49
 50    # For debugging and benchmarking purposes
 51    durationLibxc = 0.0
 52    durationE = 0.0
 53    durationF = 0.0
 54    durationZ = 0.0
 55    durationV = 0.0
 56    durationRho = 0.0
 57    durationAO = 0.0
 58    durationPrelims = 0.0
 59    durationTotal = 0.0
 60    n_gpu= 1
 61    # nstreams = 1
 62    # Create streams for asynchronous execution
 63    # streams = [cp.cuda.Stream(non_blocking=False) for i in range(nstreams)]
 64    # nb_streams = [cuda.external_stream(streams[i].ptr) for i in range(nstreams)]
 65    # nb_streams = [cuda.stream() for i in range(nstreams)]
 66    # cp_streams = [cp.cuda.ExternalStream(nb_streams[i].ptr) for i in range(nstreams)]
 67    # TODO: Try using this for multiGPU support: https://github.com/cupy/cupy/issues/5692
 68    # streams = []
 69    # for i in range(n_gpu):
 70    #     cp.cuda.Device(i).use()
 71    #     streams.append(cp.cuda.Stream(non_blocking = True))
 72
 73    if not use_libxc:
 74        print('Not using LibXC for XC evaluations', flush=True)
 75
 76    if debug:
 77        startPrelims = timer()
 78    #OUTPUT
 79    #Functional energy
 80    efunc = 0.0
 81    #Functional potential V_{\mu \nu} = \mu|\partial{f}/\partial{\rho}|\nu
 82    v = cp.zeros((basis.bfs_nao, basis.bfs_nao), dtype=type)
 83    #print(v.dtype)
 84    #TODO mGGA, Hybrid
 85    
 86    #Calculate number of blocks/batches
 87    ngrids = coords.shape[0]
 88    nblocks = ngrids//blocksize
 89    nelec = 0.0
 90
 91    # Convert the arrays needed for CUDA computations to cupy arrays
 92
 93    # If a list of significant basis functions for each block of grid points is provided
 94    if list_nonzero_indices is not None:
 95        dmat_orig = dmat
 96        dmat_orig_cp = cp.asarray(dmat_orig, dtype=type)
 97    else:
 98        dmat = cp.asarray(dmat, dtype=type)
 99
100    ### Calculate stuff necessary for bf/ao evaluation on grid points
101    ### Doesn't make any difference for 510 bfs but might be significant for >1000 bfs
102    # This will help to make the call to eval_bfs faster by skipping the mediator eval_bfs function 
103    # that prepares the following stuff at every iteration
104    #We convert the required properties to numpy arrays as this is what Numba likes.
105    bfs_coords = cp.asarray([basis.bfs_coords], dtype=type)
106    bfs_contr_prim_norms = cp.array([basis.bfs_contr_prim_norms], dtype=type)
107    bfs_lmn = cp.array([basis.bfs_lmn])
108    bfs_nprim = cp.array([basis.bfs_nprim])
109    #The remaining properties like bfs_coeffs are a list of lists of unequal sizes.
110    #Numba won't be able to work with these efficiently.
111    #So, we convert them to a numpy 2d array by applying a trick,
112    #that the second dimension is that of the largest list. So that
113    #it can accomadate all the lists.
114    maxnprim = max(basis.bfs_nprim)
115    bfs_coeffs = cp.zeros([basis.bfs_nao, maxnprim], dtype=type)
116    bfs_expnts = cp.zeros([basis.bfs_nao, maxnprim], dtype=type)
117    bfs_prim_norms = cp.zeros([basis.bfs_nao, maxnprim], dtype=type)
118    bfs_radius_cutoff = cp.zeros([basis.bfs_nao], dtype=type)
119    for i in range(basis.bfs_nao):
120        for j in range(basis.bfs_nprim[i]):
121            bfs_coeffs[i,j] = basis.bfs_coeffs[i][j]
122            bfs_expnts[i,j] = basis.bfs_expnts[i][j]
123            bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j]
124            bfs_radius_cutoff[i] = basis.bfs_radius_cutoff[i]
125    # Now bf/ao values can be evaluated by calling the following
126    # bf_values = Integrals.bf_val_helpers.eval_bfs(bfs_coords[0], bfs_contr_prim_norms[0], bfs_nprim[0], bfs_lmn[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, bfs_radius_cutoff, coord)
127    for iblock in range(len(list_nonzero_indices)):
128        list_nonzero_indices[iblock] = cp.asarray(list_nonzero_indices[iblock])
129
130    
131    weights_cp = cp.asarray(weights, dtype=type)
132    coords_cp = cp.asarray(coords, dtype=type)
133    
134
135    xc_family_dict = {1:'LDA',2:'GGA',4:'MGGA'} 
136
137    # Create a LibXC object  
138    funcx = pylibxc.LibXCFunctional(funcid[0], "unpolarized")
139    funcc = pylibxc.LibXCFunctional(funcid[1], "unpolarized")
140    x_family_code = funcx.get_family()
141    c_family_code = funcc.get_family()
142
143
144    my_expr = contract_expression('ij,mi,mj->m', (150, 150), (blocksize, 150), (blocksize, 150))
145    if xc_family_dict[x_family_code]!='LDA' or xc_family_dict[c_family_code]!='LDA':
146        my_expr_grad1 = contract_expression('ij,kmi,mj->km', (150, 150), (3, blocksize, 150), (blocksize, 150))
147        my_expr_grad2 = contract_expression('ij,mi,kmj->km', (150, 150), (blocksize, 150), (3, blocksize, 150))
148
149    if threads_per_block is None:
150        # Determine the optimal number of blocks per grid
151        max_threads_per_block = cuda.get_current_device().MAX_THREADS_PER_BLOCK
152        thread_x = max_threads_per_block/16
153        thread_y = max_threads_per_block/64
154        threads_per_block = (thread_x, thread_y)
155    else:
156        thread_x = threads_per_block[0]
157        thread_y = threads_per_block[1]
158
159
160    if debug:
161        cp.cuda.Stream.null.synchronize()
162        durationPrelims += timer() - startPrelims
163
164    # A large scratch for evaluating batch local ao values
165    ao_value_block_ = cp.zeros((blocksize, max(count_nonzero_indices)), dtype=type)
166    # If either x or c functional is of GGA/MGGA type we need ao_grad_values
167    if xc_family_dict[x_family_code]!='LDA' or xc_family_dict[c_family_code]!='LDA':
168        ao_value_grad_block_ = cp.zeros((3, blocksize, max(count_nonzero_indices)), dtype=type)
169
170    # Just a precautionary synchronize here so that all the cupy arrays created above should finish getting initialized.
171    # cp.cuda.Stream.null.synchronize()
172    # cuda.synchronize()
173    # with cupyx.profiler.profile():
174    # with streams[0]:
175    # Loop over blocks/batches of grid points
176    for iblock in range(nblocks+1):
177        # stream_id = iblock % nstreams
178        # with streams[stream_id]:
179        # with nb_streams[stream_id]:
180            # cp.cuda.Device(stream_id).use()
181            # cp.cuda.Stream(cp_streams[stream_id]).use()
182        offset = iblock*blocksize
183
184        # Get weights and coordinates of grid points for this block/batch
185        weights_block = weights_cp[offset : min(offset+blocksize,ngrids)]
186        coords_block = coords_cp[offset : min(offset+blocksize,ngrids)] 
187        #coords_block_cp = cp.asarray(coords[offset : min(offset+blocksize,ngrids)])
188
189        # Get the list of basis functions with significant contributions to this block
190        if list_nonzero_indices is not None:
191            non_zero_indices = list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]]
192            # Get the subset of density matrix corresponding to the siginificant basis functions
193            dmat = dmat_orig_cp[cp.ix_(non_zero_indices, non_zero_indices)]
194
195        if debug:
196            startAO = timer()
197        if xc_family_dict[x_family_code]=='LDA' and xc_family_dict[c_family_code]=='LDA':
198            if list_ao_values is not None: # If ao_values are calculated once and saved, then they can be provided to avoid recalculation
199                ao_value_block = list_ao_values[iblock]
200            else:
201                # ao_value_block = Integrals.eval_bfs(basis, coords_block)       
202                if list_nonzero_indices is not None:
203                    # ao_value_block = Integrals.bf_val_helpers.eval_bfs_sparse_vectorized_internal_cupy(bfs_coords[0], bfs_contr_prim_norms[0], bfs_nprim[0], bfs_lmn[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, coords_block, non_zero_indices, scratches)
204                    # Get a subset of the scratch for evaluating batch local ao values
205                    ao_value_block = cp.asarray(ao_value_block_[0:coords_block.shape[0], 0:non_zero_indices.shape[0]]) 
206                    # cp.cuda.Stream.null.synchronize() # Precautionary synchronize here
207                    # blocks_per_grid = ((non_zero_indices.shape[0]//thread_x) + 1, (coords_block.shape[0]//thread_y) + 1) # "lazy" round-up
208                    blocks_per_grid = ((non_zero_indices.shape[0] + (thread_x - 1))//thread_x, (coords_block.shape[0] + (thread_y - 1))//thread_y) 
209                    # nb_stream = stream_cupy_to_numba(streams[stream_id])
210                    # get the pointer to actual CUDA stream
211                    # raw_str = streams[stream_id].ptr
212                    # nb_stream = numba.cuda.external_stream(raw_str)
213                    Integrals.bf_val_helpers.eval_bfs_sparse_internal_cuda[blocks_per_grid, threads_per_block](bfs_coords[0], bfs_contr_prim_norms[0], bfs_nprim[0], bfs_lmn[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, coords_block, non_zero_indices, ao_value_block)
214                    
215                    # cuda.synchronize() # This is needed when using this in a cupy stream
216                    # cp.cuda.Stream.null.synchronize() 
217                else:
218                    ao_value_block = Integrals.bf_val_helpers.eval_bfs_internal(bfs_coords[0], bfs_contr_prim_norms[0], bfs_nprim[0], bfs_lmn[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, coords_block)
219        # If either x or c functional is of GGA/MGGA type we need ao_grad_values
220        if xc_family_dict[x_family_code]!='LDA' or xc_family_dict[c_family_code]!='LDA':
221            if list_ao_values is not None: # If ao_values are calculated once and saved, then they can be provided to avoid recalculation
222                ao_value_block, ao_values_grad_block = list_ao_values[iblock], list_ao_grad_values[iblock]
223            else:
224                # ao_value_block, ao_values_grad_block = Integrals.bf_val_helpers.eval_bfs_and_grad(basis, coords_block, deriv=1, parallel=True, non_zero_indices=non_zero_indices)
225                if list_nonzero_indices is not None:
226                    # Calculating ao values and gradients together, didn't really do much improvement in computational speed
227                    # ao_value_block, ao_values_grad_block = Integrals.bf_val_helpers.eval_bfs_and_grad_sparse_internal(bfs_coords[0], bfs_contr_prim_norms[0], bfs_nprim[0], bfs_lmn[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, coords_block, non_zero_indices)
228                    # Get a subset of the scratch for evaluating batch local ao values
229                    ao_value_block = cp.asarray(ao_value_block_[0:coords_block.shape[0], 0:non_zero_indices.shape[0]]) 
230                    ao_values_grad_block = cp.asarray(ao_value_grad_block_[0:3, 0:coords_block.shape[0], 0:non_zero_indices.shape[0]]) 
231                    blocks_per_grid = ((non_zero_indices.shape[0] + (thread_x - 1))//thread_x, (coords_block.shape[0] + (thread_y - 1))//thread_y) 
232                    # nb_stream = stream_cupy_to_numba(streams[stream_id])
233                    Integrals.bf_val_helpers.eval_bfs_and_grad_sparse_internal_cuda[blocks_per_grid, threads_per_block](bfs_coords[0], bfs_contr_prim_norms[0], bfs_nprim[0], 
234                                                                                                            bfs_lmn[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, 
235                                                                                                            coords_block, non_zero_indices, ao_value_block, 
236                                                                                                            ao_values_grad_block)
237                    
238                    # streams[stream_id].synchronize()
239                    # nb_streams[stream_id].synchronize()
240                    # cp.cuda.Stream.null.synchronize() 
241                    # streams[stream_id].synchronize()
242                    
243                else:
244                    ao_value_block, ao_values_grad_block = Integrals.bf_val_helpers.eval_bfs_and_grad_internal(bfs_coords[0], bfs_contr_prim_norms[0], bfs_nprim[0], bfs_lmn[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, coords_block)
245        
246
247        # Convert to cupy array
248        ao_value_block = cp.asarray(ao_value_block, dtype=type)
249        if xc_family_dict[x_family_code]!='LDA' or xc_family_dict[c_family_code]!='LDA':
250            ao_values_grad_block = cp.asarray(ao_values_grad_block, dtype=type)
251
252        if debug:
253            cp.cuda.Stream.null.synchronize()
254            durationAO = durationAO + timer() - startAO
255
256        if debug:
257            startRho = timer()
258            # print('calculating rho...')
259        # rho_block = contract('ij,mi,mj->m', dmat, ao_value_block, ao_value_block, backend='cupy') # Original (pretty fast)
260        rho_block = my_expr(dmat, ao_value_block, ao_value_block, backend='cupy')
261        # X = ao_value_block @ dmat.T 
262        # rho_block = cp.sum(ao_value_block*X,axis=1)
263        # rho_block = contract('mi,im->m', ao_value_block, X)
264        # If either x or c functional is of GGA/MGGA type we need rho_grad_values too
265        if xc_family_dict[x_family_code]!='LDA' or xc_family_dict[c_family_code]!='LDA':
266            # rho_grad_block_x, rho_grad_block_y, rho_grad_block_z  = ( contract('ij,kmi,mj->km',dmat,ao_values_grad_block,ao_value_block, backend='cupy', optimize='optimal')+\
267            #                 contract('ij,mi,kmj->km',dmat,ao_value_block,ao_values_grad_block, backend='cupy', optimize='optimal') )[:]
268            rho_grad_block_x, rho_grad_block_y, rho_grad_block_z  = my_expr_grad1(dmat,ao_values_grad_block,ao_value_block, backend='cupy') \
269                                                                    + my_expr_grad2(dmat, ao_value_block, ao_values_grad_block, backend='cupy') 
270            sigma_block = calc_sigma(rho_grad_block_x, rho_grad_block_y, rho_grad_block_z)
271            if use_libxc:
272                sigma_block = cp.asnumpy(sigma_block)
273
274        if debug:
275            cp.cuda.Stream.null.synchronize()
276            durationRho = durationRho + timer() - startRho
277            #print('Duration for Rho at grid points: ',durationRho)
278            # print('done calculating rho')
279    
280        #LibXC stuff
281        if use_libxc:
282            # Exchange
283            if debug:
284                startLibxc = timer()
285            # Input dictionary for libxc
286            inp = {}
287            # Input dictionary needs density values at grid points
288            inp['rho'] = cp.asnumpy(rho_block)
289            if xc_family_dict[x_family_code]!='LDA':
290                # Input dictionary needs sigma (\nabla \rho \cdot \nabla \rho) values at grid points
291                inp['sigma'] = sigma_block
292            # Calculate the necessary quantities using LibXC
293            retx = funcx.compute(inp)
294            # durationLibxc = durationLibxc + timer() - startLibxc
295            # print('Duration for LibXC computations at grid points: ',durationLibxc)
296
297            # Correlation
298            # startLibxc = timer()
299            # Input dictionary for libxc
300            #inp = {}
301            # Input dictionary needs density values at grid points
302            #inp['rho'] = rho_block
303            if xc_family_dict[c_family_code]!='LDA':
304                # Input dictionary needs sigma (\nabla \rho \cdot \nabla \rho) values at grid points
305                inp['sigma'] = sigma_block
306            # Calculate the necessary quantities using LibXC
307            retc = funcc.compute(inp)
308            if debug:
309                cp.cuda.Stream.null.synchronize()
310                durationLibxc = durationLibxc + timer() - startLibxc
311            # print('Duration for LibXC computations at grid points: ',durationLibxc)
312        else:
313            # Exchange
314            if debug:
315                startLibxc = timer()
316            # Calculate the necessary quantities using own implementation
317            # retx = lda_x(rho_block)
318            # retx = gga_x_b88(rho_block, sigma_block)
319            if xc_family_dict[x_family_code]=='LDA':
320                retx = XC.func_compute(funcid[0], rho_block, use_gpu=True)
321            else:
322                retx = XC.func_compute(funcid[0], rho_block, sigma_block, use_gpu=True)
323
324            # Correlation
325            # Calculate the necessary quantities using own implementation
326            # retc = lda_c_vwn(rho_block)
327            # retc = gga_c_lyp(rho_block, sigma_block)
328            if xc_family_dict[c_family_code]=='LDA':
329                retc = XC.func_compute(funcid[1], rho_block, use_gpu=True)
330            else:
331                retc = XC.func_compute(funcid[1], rho_block, sigma_block, use_gpu=True)
332            if debug:
333                cp.cuda.Stream.null.synchronize()
334                durationLibxc = durationLibxc + timer() - startLibxc
335            # print('Duration for LibXC computations at grid points: ',durationLibxc)
336
337
338        if debug:
339            startE = timer()
340        #ENERGY-----------
341        if use_libxc:
342            e = retx['zk'] + retc['zk'] # Functional values at grid points
343        else:
344            e = retx[0] + retc[0]
345        # Testing CrysX's own implmentation
346        #e = densfuncs.lda_x(rho)
347
348        # Calculate the total energy 
349        # Multiply the density at grid points by weights
350        den = rho_block*weights_block
351        # den = cp.multiply(rho_block, weights_block)
352        # den = elementwise_multiply(rho_block, weights_block)
353        if use_libxc:
354            efunc = efunc + cp.dot(den, cp.asarray(e)) #Multiply with functional values at grid points and sum
355        else:
356            efunc = efunc + cp.dot(den, e) #Multiply with functional values at grid points and sum
357            # efunc = efunc + cp.sum(elementwise_multiply(den, e)) #Multiply with functional values at grid points and sum
358        nelec += cp.sum(den)
359        if debug:
360            cp.cuda.Stream.null.synchronize()
361            durationE = durationE + timer() - startE
362        # print('Duration for calculation of total density functional energy: ',durationE)
363
364        #POTENTIAL----------
365        # The derivative of functional wrt density is vrho
366        if debug:
367            startF = timer()
368        if use_libxc:
369            vrho = retx['vrho'] + retc['vrho']
370        else:
371            vrho = retx[1] + retc[1]
372        
373        vsigma = 0
374        # If either x or c functional is of GGA/MGGA type we need rho_grad_values
375        if xc_family_dict[x_family_code]!='LDA':
376            # The derivative of functional wrt grad \rho square.
377            if use_libxc:
378                vsigma += retx['vsigma']
379            else:
380                vsigma += retx[2]
381            
382        if xc_family_dict[c_family_code]!='LDA':
383            # The derivative of functional wrt grad \rho square.
384            if use_libxc:
385                vsigma += retc['vsigma']
386            else:
387                vsigma += retc[2]
388        
389        # F = np.multiply(weights_block,vrho[:,0]) #This is fast enough.
390        if use_libxc:
391            v_rho_temp = cp.asarray(vrho[:,0])
392        else:
393            v_rho_temp = vrho
394        # F = weights_block*v_rho_temp
395        # F = calc_F(weights_block, v_rho_temp)
396        # If either x or c functional is of GGA/MGGA type we need rho_grad_values
397        if xc_family_dict[x_family_code]!='LDA' or xc_family_dict[c_family_code]!='LDA':
398            if use_libxc:
399                Ftemp = 2*weights_block*cp.asarray(vsigma[:,0])
400            else:
401                Ftemp = 2*weights_block*vsigma
402                # Ftemp = 2*cp.multiply(weights_block, vsigma)
403                # Ftemp = 2*elementwise_multiply(weights_block, vsigma)
404            Fx = Ftemp*rho_grad_block_x
405            Fy = Ftemp*rho_grad_block_y
406            Fz = Ftemp*rho_grad_block_z
407        if debug:
408            cp.cuda.Stream.null.synchronize()
409            durationF = durationF + timer() - startF
410        # print('Duration for calculation of F: ',durationF)
411        if debug:
412            startZ = timer()
413        # z = 0.5*F*ao_value_block.T
414        # z = calc_z(F, ao_value_block.T)
415        z = calc_z(weights_block, v_rho_temp, ao_value_block.T)
416        # z = 0.5*np.einsum('m,mi->mi',F,ao_value_block)
417        # If either x or c functional is of GGA/MGGA type we need rho_grad_values
418        if xc_family_dict[x_family_code]!='LDA' or xc_family_dict[c_family_code]!='LDA':
419            z += calc_z_gga(Fx, Fy, Fz, ao_values_grad_block[0].T, ao_values_grad_block[1].T, ao_values_grad_block[2].T)
420            # z += Fx*ao_values_grad_block[0].T + Fy*ao_values_grad_block[1].T + Fz*ao_values_grad_block[2].T
421        
422            
423        if debug:
424            cp.cuda.Stream.null.synchronize()
425            durationZ = durationZ + timer() - startZ
426        # print('Duration for calculation of z : ',durationZ)
427        # Free memory
428        F = 0
429        ao_value_block_T = 0
430        vrho = 0
431
432        if debug:
433            startV = timer()
434        
435        #v_temp = z @ ao_value_block
436        #v_temp = cp.dot(z, ao_value_block)
437        v_temp = cp.matmul(z, ao_value_block)
438        if list_nonzero_indices is not None:
439            v[cp.ix_(non_zero_indices, non_zero_indices)] += v_temp + v_temp.T
440        else:
441            v += v_temp + v_temp.T
442
443            
444        if debug:
445            cp.cuda.Stream.null.synchronize()
446            durationV = durationV + timer() - startV
447
448    # for stream in streams:
449    #     stream.synchronize()
450
451
452    print('Number of electrons: ', nelec)
453    #print(v_temp.dtype)
454    if debug:
455        print('Duration for Prelims: ', durationPrelims)
456        print('Duration for AO values: ', durationAO)
457        print('Duration for V: ',durationV)
458        print('Duration for Rho at grid points: ',durationRho)
459        print('Duration for F: ',durationF)
460        print('Duration for Z: ',durationZ)
461        print('Duration for E: ',durationE)
462        print('Duration for LibXC: ',durationLibxc)
463        
464
465        
466
467    if debug:
468        startV_tonumpy = timer()
469    if debug:
470        cp.cuda.Stream.null.synchronize()
471        durationV_tonumpy = timer() - startV_tonumpy
472        print('Duration for V to numpy', durationV_tonumpy)
473        print('Total time: ', durationPrelims+durationAO+durationRho+durationLibxc+durationE+durationF+durationV+durationZ+durationV_tonumpy)
474
475
476    if use_libxc:
477        efunc = efunc[0]
478    if debug:
479        cp.cuda.Stream.null.synchronize()
480        durationTotal += timer() - startTotal
481        print('Total time2: ', durationTotal)
482
483    if freemem or nstreams!=1:
484        cp._default_memory_pool.free_all_blocks()
485
486
487    # Final synchronize before return
488    cp.cuda.Stream.null.synchronize()
489    return efunc, v
def eval_xc_2_cupy( basis, dmat, weights, coords, funcid=[1, 7], spin=0, ncores=2, blocksize=5000, list_nonzero_indices=None, count_nonzero_indices=None, list_ao_values=None, list_ao_grad_values=None, debug=False):
 18def eval_xc_2_cupy(basis, dmat, weights, coords, funcid=[1,7], spin=0, ncores=2, blocksize=5000, list_nonzero_indices=None, count_nonzero_indices=None, list_ao_values=None, list_ao_grad_values=None, debug=False):
 19    print('Calculating XC term using GPU and algo 2', flush=True)
 20    print('Algo 2 uses a hybrid CPU+GPU approach for XC evaluation.', flush=True)
 21    print('CPU threads specified by ncores are used to evaluate functional values (via LibXC) and AO/AO grad values.', flush=True)
 22    # This performs parallelization at the blocks/batches level.
 23    # Therefore, joblib is perfect for such embarrasingly parallel task
 24    # In order to evaluate a density functional we will use the 
 25    # libxc library with Python bindings.
 26    # However, some sort of simple functionals like LDA, GGA, etc would
 27    # need to be implemented in CrysX also, so that the it doesn't depend
 28    # on external libraries so heavily that it becomes unusable without those.
 29
 30    #Useful links:
 31    # LibXC manual: https://www.tddft.org/programs/libxc/manual/
 32    # LibXC gitlab: https://gitlab.com/libxc/libxc/-/tree/master
 33    # LibXC python interface code: https://gitlab.com/libxc/libxc/-/blob/master/pylibxc/functional.py
 34    # LibXC python version installation and example: https://www.tddft.org/programs/libxc/installation/
 35    # Formulae for XC energy and potential calculation: https://pubs.acs.org/doi/full/10.1021/ct200412r
 36    # LibXC code list: https://github.com/pyscf/pyscf/blob/master/pyscf/dft/libxc.py
 37    # PySCF nr_rks code: https://github.com/pyscf/pyscf/blob/master/pyscf/dft/numint.py
 38    # https://www.osti.gov/pages/servlets/purl/1650078
 39
 40    #OUTPUT
 41    #Functional energy
 42    efunc = 0.0
 43    #Functional potential V_{\mu \nu} = \mu|\partial{f}/\partial{\rho}|\nu
 44    v = cp.zeros((basis.bfs_nao, basis.bfs_nao))
 45    nelec = 0
 46
 47    # TODO it only works for LDA functionals for now.
 48    # Need to make it work for GGA, Hybrid, range-separated Hybrid and MetaGGA functionals as well.
 49    
 50
 51    ngrids = coords.shape[0]
 52    nblocks = ngrids//blocksize
 53
 54    # print('Number of blocks: ', nblocks)
 55
 56    # Some stuff to note timings
 57    timings = None
 58    if debug:
 59        durationLibxc = 0.0
 60        durationE = 0.0
 61        durationF = 0.0
 62        durationZ = 0.0
 63        durationV = 0.0
 64        durationRho = 0.0
 65        durationAO = 0.0
 66        timings = {'durationLibxc':durationLibxc, 'durationE':durationE, 'durationF':durationF, 'durationZ':durationZ, 'durationV':durationV, 'durationRho':durationRho, 'durationAO':durationAO}
 67
 68    
 69
 70    ### Calculate stuff necessary for bf/ao evaluation on grid points
 71    ### Doesn't make any difference for 510 bfs but might be significant for >1000 bfs
 72    # This will help to make the call to evalbfnumba1 faster by skipping the mediator evalbfnumbawrap function 
 73    # that prepares the following stuff at every iteration
 74    #We convert the required properties to numpy arrays as this is what Numba likes.
 75    bfs_coords = np.array([basis.bfs_coords])
 76    bfs_contr_prim_norms = np.array([basis.bfs_contr_prim_norms])
 77    bfs_lmn = np.array([basis.bfs_lmn])
 78    bfs_nprim = np.array([basis.bfs_nprim])
 79    #The remaining properties like bfs_coeffs are a list of lists of unequal sizes.
 80    #Numba won't be able to work with these efficiently.
 81    #So, we convert them to a numpy 2d array by applying a trick,
 82    #that the second dimension is that of the largest list. So that
 83    #it can accomadate all the lists.
 84    maxnprim = max(basis.bfs_nprim)
 85    bfs_coeffs = np.zeros([basis.bfs_nao, maxnprim])
 86    bfs_expnts = np.zeros([basis.bfs_nao, maxnprim])
 87    bfs_prim_norms = np.zeros([basis.bfs_nao, maxnprim])
 88    bfs_radius_cutoff = np.zeros([basis.bfs_nao])
 89    for i in range(basis.bfs_nao):
 90        for j in range(basis.bfs_nprim[i]):
 91            bfs_coeffs[i,j] = basis.bfs_coeffs[i][j]
 92            bfs_expnts[i,j] = basis.bfs_expnts[i][j]
 93            bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j]
 94            bfs_radius_cutoff[i] = basis.bfs_radius_cutoff[i]
 95    # Now bf/ao values can be evaluated by calling the following
 96    # bf_values = Integrals.bf_val_helpers.eval_bfs(bfs_coords[0], bfs_contr_prim_norms[0], bfs_nprim[0], bfs_lmn[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, bfs_radius_cutoff, coord)
 97    bfs_data_as_np_arrays = [bfs_coords[0], bfs_contr_prim_norms[0], bfs_nprim[0], bfs_lmn[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, bfs_radius_cutoff]
 98
 99    xc_family_dict = {1:'LDA',2:'GGA',4:'MGGA'} 
100    # Create a LibXC object  
101    funcx = pylibxc.LibXCFunctional(funcid[0], "unpolarized")
102    funcc = pylibxc.LibXCFunctional(funcid[1], "unpolarized")
103    x_family_code = funcx.get_family()
104    c_family_code = funcc.get_family()
105
106    weights_cp = cp.asarray(weights)
107    dmat_cp = cp.asarray(dmat)
108    # Create streams for asynchronous execution
109    streams = [cp.cuda.Stream(non_blocking=True) for i in range(ncores)]
110
111    #### Set number of cores for numba related evaluations within 'block_dens_func()' for example bf_value evaluations 
112    #### Apparently, it was still using as many cores as possible and creating a serial version of thise functions as I have 
113    #### currently done doesn't work  
114    # numba.set_num_threads(1)
115    if list_nonzero_indices is not None:
116        if list_ao_values is not None:
117            if xc_family_dict[x_family_code]=='LDA' and xc_family_dict[c_family_code]=='LDA':
118                output = Parallel(n_jobs=ncores, backend='threading', require='sharedmem', batch_size=nblocks//ncores)(delayed(block_dens_func)(weights_cp[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], coords[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], dmat_cp[cp.ix_(list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]], list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]])], funcid, bfs_data_as_np_arrays, list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]], list_ao_values[iblock], funcx=funcx, funcc=funcc, x_family_code=x_family_code, c_family_code=c_family_code, xc_family_dict=xc_family_dict, debug=debug, stream=streams[iblock%len(streams)]) for iblock in range(nblocks+1))
119            else: #GGA
120                output = Parallel(n_jobs=ncores, backend='threading', require='sharedmem')(delayed(block_dens_func)(weights_cp[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], coords[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], dmat_cp[np.ix_(list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]], list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]])], funcid, bfs_data_as_np_arrays, list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]], list_ao_values[iblock], list_ao_grad_values[iblock], funcx=funcx, funcc=funcc, x_family_code=x_family_code, c_family_code=c_family_code, xc_family_dict=xc_family_dict, debug=debug) for iblock in range(nblocks+1))
121        else:
122            output = Parallel(n_jobs=ncores, backend='threading', require='sharedmem')(delayed(block_dens_func)(weights_cp[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], coords[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], dmat_cp[np.ix_(list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]], list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]])], funcid, bfs_data_as_np_arrays, list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]], funcx=funcx, funcc=funcc, x_family_code=x_family_code, c_family_code=c_family_code, xc_family_dict=xc_family_dict, debug=debug) for iblock in range(nblocks+1))
123    else:
124        output = Parallel(n_jobs=ncores, backend='threading', require='sharedmem')(delayed(block_dens_func)(weights_cp[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], coords[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], dmat_cp, funcid, bfs_data_as_np_arrays, non_zero_indices=None, ao_values=None, funcx=funcx, funcc=funcc, x_family_code=x_family_code, c_family_code=c_family_code, xc_family_dict=xc_family_dict, debug=debug) for iblock in range(nblocks+1))
125        
126    # print(len(output))
127    for iblock in range(0,len(output)):
128        efunc += output[iblock][0]
129        if list_nonzero_indices is not None:
130            non_zero_indices = list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]]
131            v[cp.ix_(non_zero_indices, non_zero_indices)] += output[iblock][1]
132        else:
133            v += output[iblock][1]
134        nelec += output[iblock][2]
135        if debug:
136            timings['durationLibxc'] += output[iblock][3]['durationLibxc']
137            timings['durationE'] += output[iblock][3]['durationE']
138            timings['durationF'] += output[iblock][3]['durationF']
139            timings['durationZ'] += output[iblock][3]['durationZ']
140            timings['durationV'] += output[iblock][3]['durationV']
141            timings['durationRho'] += output[iblock][3]['durationRho']
142            timings['durationAO'] += output[iblock][3]['durationAO']
143        # v = numexpr.evaluate('(v + output[iblock][1])')
144
145
146    
147
148    numba.set_num_threads(ncores)    
149    print('Number of electrons: ', nelec)
150    if debug:
151        print('Timings:', timings)
152
153    ####### Free memory
154    ## The following is very important to prevent memory leaks and also to make sure that the number of 
155    # threads used by the program is same as that specified by the user 
156    # gc.collect()  # Avoiding using it for now, as it is usually quite slow, although in this case it might not make much difference
157    # Anyway, the following also works
158    output = 0
159    non_zero_indices = 0
160    coords = 0
161    cp.cuda.Stream.null.synchronize()
162    return efunc, v
def eval_xc_3_cupy( basis, dmat, weights, coords, funcid=[1, 7], spin=0, blocksize=10240, debug=False, list_nonzero_indices=None, count_nonzero_indices=None, list_ao_values=None, list_ao_grad_values=None, use_libxc=True, nstreams=1, ngpus=1, freemem=True, threads_per_block=None, type=<class 'numpy.float64'>, streams=None, nb_streams=None, bfs_data_as_np_arrays=None):
 25def eval_xc_3_cupy(basis, dmat, weights, coords, funcid=[1,7], spin=0, blocksize=10240, debug=False, list_nonzero_indices=None, \
 26                   count_nonzero_indices=None, list_ao_values=None, list_ao_grad_values=None, use_libxc=True, nstreams=1, ngpus=1,\
 27                    freemem=True, threads_per_block=None, type=cp.float64, streams=None, nb_streams=None, bfs_data_as_np_arrays=None):
 28    print('Calculating XC term using GPU and algo 3', flush=True)
 29    if not use_libxc:
 30        print('Not using LibXC for XC evaluations', flush=True)
 31    # This performs parallelization at the blocks/batches level.
 32    # Therefore, joblib is perfect for such embarrasingly parallel task
 33    # In order to evaluate a density functional we will use the 
 34    # libxc library with Python bindings.
 35    # However, some sort of simple functionals like LDA, GGA, etc would
 36    # need to be implemented in CrysX also, so that the it doesn't depend
 37    # on external libraries so heavily that it becomes unusable without those.
 38
 39    #Useful links:
 40    # LibXC manual: https://www.tddft.org/programs/libxc/manual/
 41    # LibXC gitlab: https://gitlab.com/libxc/libxc/-/tree/master
 42    # LibXC python interface code: https://gitlab.com/libxc/libxc/-/blob/master/pylibxc/functional.py
 43    # LibXC python version installation and example: https://www.tddft.org/programs/libxc/installation/
 44    # Formulae for XC energy and potential calculation: https://pubs.acs.org/doi/full/10.1021/ct200412r
 45    # LibXC code list: https://github.com/pyscf/pyscf/blob/master/pyscf/dft/libxc.py
 46    # PySCF nr_rks code: https://github.com/pyscf/pyscf/blob/master/pyscf/dft/numint.py
 47    # https://www.osti.gov/pages/servlets/purl/1650078
 48
 49    #OUTPUT
 50    #Functional energy
 51    efunc = 0.0
 52    # Start from the main GPU
 53    cp.cuda.Device(0).use()
 54    #Functional potential V_{\mu \nu} = \mu|\partial{f}/\partial{\rho}|\nu
 55    v = cp.zeros((basis.bfs_nao, basis.bfs_nao), dtype=type)
 56    nelec = 0
 57
 58    # TODO it only works for LDA functionals for now.
 59    # Need to make it work for GGA, Hybrid, range-separated Hybrid and MetaGGA functionals as well.
 60    
 61
 62    ngrids = coords.shape[0]
 63    nblocks = ngrids//blocksize
 64
 65    # print('Number of blocks: ', nblocks)
 66
 67    # Some stuff to note timings
 68    timings = None
 69    if debug:
 70        durationLibxc = 0.0
 71        durationE = 0.0
 72        durationF = 0.0
 73        durationZ = 0.0
 74        durationV = 0.0
 75        durationRho = 0.0
 76        durationAO = 0.0
 77        timings = {'durationLibxc':durationLibxc, 'durationE':durationE, 'durationF':durationF, 'durationZ':durationZ, 'durationV':durationV, 'durationRho':durationRho, 'durationAO':durationAO}
 78
 79    
 80
 81    ### Calculate stuff necessary for bf/ao evaluation on grid points
 82    ### Doesn't make any difference for 510 bfs but might be significant for >1000 bfs
 83    # This will help to make the call to evalbfnumba1 faster by skipping the mediator evalbfnumbawrap function 
 84    # that prepares the following stuff at every iteration
 85    #We convert the required properties to numpy arrays as this is what Numba likes.
 86    if bfs_data_as_np_arrays is None:
 87        bfs_coords = cp.asarray([basis.bfs_coords], dtype=type)
 88        bfs_contr_prim_norms = cp.asarray([basis.bfs_contr_prim_norms], dtype=type)
 89        bfs_lmn = cp.asarray([basis.bfs_lmn])
 90        bfs_nprim = cp.asarray([basis.bfs_nprim])
 91        #The remaining properties like bfs_coeffs are a list of lists of unequal sizes.
 92        #Numba won't be able to work with these efficiently.
 93        #So, we convert them to a numpy 2d array by applying a trick,
 94        #that the second dimension is that of the largest list. So that
 95        #it can accomadate all the lists.
 96        maxnprim = max(basis.bfs_nprim)
 97        bfs_coeffs = cp.zeros([basis.bfs_nao, maxnprim], dtype=type)
 98        bfs_expnts = cp.zeros([basis.bfs_nao, maxnprim], dtype=type)
 99        bfs_prim_norms = cp.zeros([basis.bfs_nao, maxnprim], dtype=type)
100        bfs_radius_cutoff = cp.zeros([basis.bfs_nao], dtype=type)
101        for i in range(basis.bfs_nao):
102            for j in range(basis.bfs_nprim[i]):
103                bfs_coeffs[i,j] = basis.bfs_coeffs[i][j]
104                bfs_expnts[i,j] = basis.bfs_expnts[i][j]
105                bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j]
106                bfs_radius_cutoff[i] = basis.bfs_radius_cutoff[i]
107        # Now bf/ao values can be evaluated by calling the following
108        # bf_values = Integrals.bf_val_helpers.eval_bfs(bfs_coords[0], bfs_contr_prim_norms[0], bfs_nprim[0], bfs_lmn[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, bfs_radius_cutoff, coord)
109        bfs_data_as_np_arrays = [bfs_coords[0], bfs_contr_prim_norms[0], bfs_nprim[0], bfs_lmn[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, bfs_radius_cutoff]
110
111    xc_family_dict = {1:'LDA',2:'GGA',4:'MGGA'} 
112    # Create a LibXC object  
113    funcx = pylibxc.LibXCFunctional(funcid[0], "unpolarized")
114    funcc = pylibxc.LibXCFunctional(funcid[1], "unpolarized")
115    x_family_code = funcx.get_family()
116    c_family_code = funcc.get_family()
117
118    my_expr = contract_expression('ij,mi,mj->m', (150, 150), (blocksize, 150), (blocksize, 150))
119    my_expr_grad1 = None
120    my_expr_grad2 = None
121    if xc_family_dict[x_family_code]!='LDA' or xc_family_dict[c_family_code]!='LDA':
122        my_expr_grad1 = contract_expression('ij,kmi,mj->km', (150, 150), (3, blocksize, 150), (blocksize, 150))
123        my_expr_grad2 = contract_expression('ij,mi,kmj->km', (150, 150), (blocksize, 150), (3, blocksize, 150))
124    contr_expr = [my_expr, my_expr_grad1, my_expr_grad2]
125
126    if threads_per_block is None:
127        # Determine the optimal number of blocks per grid
128        max_threads_per_block = cuda.get_current_device().MAX_THREADS_PER_BLOCK
129        thread_x = max_threads_per_block/16
130        thread_y = max_threads_per_block/64
131        threads_per_block = (thread_x, thread_y)
132    else:
133        thread_x = threads_per_block[0]
134        thread_y = threads_per_block[1]
135    
136    for iblock in range(len(list_nonzero_indices)):
137        list_nonzero_indices[iblock] = cp.asarray(list_nonzero_indices[iblock])
138
139    
140    weights_cp = cp.asarray(weights, dtype=type)
141    coords_cp = cp.asarray(coords, dtype=type)
142
143    dmat_cp = cp.asarray(dmat, dtype=type)
144    # Create dmat lists
145    dmats_list = []
146    for iblock in range(nblocks + 1):
147        dmats_list.append(cp.copy(dmat_cp[cp.ix_(list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]], list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]])]))
148        
149    # Create streams for asynchronous execution
150    # on different GPUs 
151    if streams is None and nb_streams is None:
152        streams = []
153        nb_streams = []
154        for i in range(ngpus):
155            cp.cuda.Device(i).use()
156            cp_stream = cp.cuda.Stream(non_blocking = True)
157            nb_stream = cuda.external_stream(cp_stream.ptr)
158            streams.append(cp_stream)
159            nb_streams.append(nb_stream)
160
161    if list_nonzero_indices is not None:
162        if list_ao_values is not None:
163            if xc_family_dict[x_family_code]=='LDA' and xc_family_dict[c_family_code]=='LDA':
164                output = Parallel(n_jobs=ngpus, backend='threading', require='sharedmem')(delayed(block_dens_func)(weights_cp[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], coords_cp[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], dmats_list[iblock], funcid, bfs_data_as_np_arrays, list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]], list_ao_values[iblock], funcx=funcx, funcc=funcc, x_family_code=x_family_code, c_family_code=c_family_code, xc_family_dict=xc_family_dict, debug=debug, stream=streams[iblock%ngpus], nb_stream=nb_streams[iblock%ngpus], thread_x=thread_x, 
165                    thread_y=thread_y, threads_per_block=threads_per_block, type=type, use_libxc=use_libxc, expressions=contr_expr, device=iblock%ngpus) for iblock in range(nblocks+1))
166            else: #GGA
167                output = Parallel(n_jobs=ngpus, backend='threading', require='sharedmem')(delayed(block_dens_func)(weights_cp[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], coords_cp[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], dmats_list[iblock], funcid, bfs_data_as_np_arrays, list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]], list_ao_values[iblock], list_ao_grad_values[iblock], funcx=funcx, funcc=funcc, x_family_code=x_family_code, c_family_code=c_family_code, xc_family_dict=xc_family_dict, debug=debug, stream=streams[iblock%ngpus], nb_stream=nb_streams[iblock%ngpus], thread_x=thread_x, 
168                    thread_y=thread_y, threads_per_block=threads_per_block, type=type, use_libxc=use_libxc, expressions=contr_expr, device=iblock%ngpus) for iblock in range(nblocks+1))
169        else:
170            output = Parallel(n_jobs=ngpus, backend='threading', require='sharedmem')(delayed(block_dens_func)(weights_cp[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], coords_cp[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], dmats_list[iblock], funcid, bfs_data_as_np_arrays, list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]], funcx=funcx, funcc=funcc, x_family_code=x_family_code, c_family_code=c_family_code, xc_family_dict=xc_family_dict, debug=debug, stream=streams[iblock%ngpus], nb_stream=nb_streams[iblock%ngpus], thread_x=thread_x, 
171                    thread_y=thread_y, threads_per_block=threads_per_block, type=type, use_libxc=use_libxc, expressions=contr_expr, device=iblock%ngpus) for iblock in range(nblocks+1))
172    else:
173        output = Parallel(n_jobs=ngpus, backend='threading', require='sharedmem')(delayed(block_dens_func)(weights_cp[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], coords_cp[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], dmat_cp, funcid, bfs_data_as_np_arrays, non_zero_indices=None, ao_values=None, funcx=funcx, funcc=funcc, x_family_code=x_family_code, c_family_code=c_family_code, xc_family_dict=xc_family_dict, debug=debug, stream=streams[iblock%ngpus], nb_stream=nb_streams[iblock%ngpus], thread_x=thread_x, 
174                    thread_y=thread_y, threads_per_block=threads_per_block, type=type, use_libxc=use_libxc, expressions=contr_expr, device=iblock%ngpus) for iblock in range(nblocks+1))
175        
176    for istream in range(ngpus):
177        streams[istream].synchronize()
178    # Switch back to main GPU
179    cp.cuda.Device(0).use()
180    for iblock in range(0,len(output)):
181        efunc += cp.asarray(output[iblock][0])
182        if list_nonzero_indices is not None:
183            non_zero_indices = cp.asarray(list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]])
184            # v[cp.ix_(non_zero_indices, non_zero_indices)] += cp.asarray(output[iblock][1])
185            v[cp.ix_(non_zero_indices, non_zero_indices)] = cp.asarray(output[iblock][1]) + cp.asarray(v)[cp.ix_(non_zero_indices, non_zero_indices)]
186        else:
187            # v += cp.asarray(output[iblock][1])
188            v = cp.asarray(output[iblock][1]) + cp.asarray(v)
189        nelec += cp.asarray(output[iblock][2]) 
190        if debug:
191            timings['durationLibxc'] += output[iblock][3]['durationLibxc']
192            timings['durationE'] += output[iblock][3]['durationE']
193            timings['durationF'] += output[iblock][3]['durationF']
194            timings['durationZ'] += output[iblock][3]['durationZ']
195            timings['durationV'] += output[iblock][3]['durationV']
196            timings['durationRho'] += output[iblock][3]['durationRho']
197            timings['durationAO'] += output[iblock][3]['durationAO']
198        # v = numexpr.evaluate('(v + output[iblock][1])')
199
200
201    
202   
203    print('Number of electrons: ', nelec)
204    if debug:
205        print('Timings:', timings)
206
207    ####### Free memory
208    ## The following is very important to prevent memory leaks and also to make sure that the number of 
209    # threads used by the program is same as that specified by the user 
210    # gc.collect()  # Avoiding using it for now, as it is usually quite slow, although in this case it might not make much difference
211    # Anyway, the following also works
212    output = 0
213    non_zero_indices = 0
214    coords = 0
215
216
217    cp.cuda.Stream.null.synchronize()
218
219    if use_libxc:
220        efunc = efunc[0]
221
222    if freemem:
223        for istream in range(ngpus):
224            cp.cuda.Device(istream).use()
225            cp._default_memory_pool.free_all_blocks()
226            # cuda.current_context().memory_manager.deallocations.clear()
227        # Switch back to main GPU
228        cp.cuda.Device(0).use()
229
230    return efunc, v
def dipole_moment_mat_symm(basis, slice=None, origin=array([0., 0., 0.])):
  7def dipole_moment_mat_symm(basis, slice=None, origin=np.zeros((3))):
  8    """
  9    Compute the dipole moment integral matrix in the atomic orbital (AO) basis
 10    using symmetry and the McMurchie–Davidson algorithm.
 11
 12    This routine evaluates the matrix elements:
 13
 14        μ_ij^(k) = ⟨ χ_i | r_k | χ_j ⟩
 15
 16    where:
 17        - χ_i, χ_j are Gaussian-type atomic orbitals (AOs)
 18        - r_k is the k-th Cartesian coordinate (x, y, or z)
 19        - origin is the coordinate origin for the dipole operator
 20
 21    The implementation follows the McMurchie–Davidson scheme for Gaussian
 22    integrals, adapted from:
 23
 24        https://github.com/jjgoings/McMurchie-Davidson
 25
 26    Symmetries exploited
 27    --------------------
 28    The dipole moment matrix is symmetric for real-valued AOs:
 29
 30        μ_ij^(k) = μ_ji^(k)
 31
 32    This halves the number of unique integrals from N_bf^2 to
 33    N_bf*(N_bf+1)/2 for each Cartesian component.
 34
 35    Parameters
 36    ----------
 37    basis : object
 38        Basis set object containing:
 39        - bfs_coords : Cartesian coordinates of AO centers
 40        - bfs_coeffs : Contraction coefficients
 41        - bfs_expnts : Gaussian exponents
 42        - bfs_prim_norms : Primitive normalization constants
 43        - bfs_contr_prim_norms : Contraction normalization factors
 44        - bfs_lmn : Angular momentum quantum numbers (ℓ, m, n)
 45        - bfs_nprim : Number of primitives per basis function
 46        - bfs_nao : Number of atomic orbitals
 47
 48    slice : list of int, optional
 49        A 4-element list [row_start, row_end, col_start, col_end]
 50        specifying the sub-block of the matrix to compute.
 51        If None (default), computes the full symmetric matrix.
 52
 53    origin : ndarray of shape (3,), optional
 54        Origin of the dipole operator in Cartesian coordinates.
 55        Default is (0.0, 0.0, 0.0).
 56
 57    Returns
 58    -------
 59    M : ndarray of shape (3, n_rows, n_cols)
 60        The computed dipole moment integrals for the requested block,
 61        with the first dimension indexing the x, y, z components.
 62
 63    Notes
 64    -----
 65    - Basis set data are pre-packed into NumPy arrays for compatibility
 66      with Numba JIT compilation.
 67    - The algorithm avoids redundant evaluations by exploiting matrix
 68      symmetry when `slice` covers the full range.
 69    - The dipole moment integrals are typically used in computing
 70      molecular dipole moments from density matrices.
 71
 72    Examples
 73    --------
 74    >>> mu_full = dipole_moment_mat_symm(basis)
 75    >>> mu_block = dipole_moment_mat_symm(basis, slice=[0, 5, 0, 5])
 76    >>> mu_shifted = dipole_moment_mat_symm(basis, origin=np.array([1.0, 0.0, 0.0]))
 77    """
 78    # Here the lists are converted to numpy arrays for better use with Numba.
 79    # Once these conversions are done we pass these to a Numba decorated
 80    # function that uses prange, etc. to calculate the matrix efficiently.
 81
 82    # This function calculates the kinetic energy matrix for a given basis object.
 83    # The basis object holds the information of basis functions like: exponents, coeffs, etc.
 84    # It is possible to only calculate a slice (block) of the complete matrix.
 85    # slice is a 4 element list whose first and second elements give the range of the rows to be calculated.
 86    # the third and fourth element give the range of columns to be calculated.
 87    # slice = [start_row, end_row, start_col, end_col]
 88    # The integrals are performed using the formulas
 89
 90    #We convert the required properties to numpy arrays as this is what Numba likes.
 91    bfs_coords = np.array([basis.bfs_coords])
 92    bfs_contr_prim_norms = np.array([basis.bfs_contr_prim_norms])
 93    bfs_lmn = np.array([basis.bfs_lmn])
 94    bfs_nprim = np.array([basis.bfs_nprim])
 95        
 96
 97    #The remaining properties like bfs_coeffs are a list of lists of unequal sizes.
 98    #Numba won't be able to work with these efficiently.
 99    #So, we convert them to a numpy 2d array by applying a trick,
100    #that the second dimension is that of the largest list. So that
101    #it can accomadate all the lists.
102    maxnprim = max(basis.bfs_nprim)
103    bfs_coeffs = np.zeros([basis.bfs_nao, maxnprim])
104    bfs_expnts = np.zeros([basis.bfs_nao, maxnprim])
105    bfs_prim_norms = np.zeros([basis.bfs_nao, maxnprim])
106    for i in range(basis.bfs_nao):
107        for j in range(basis.bfs_nprim[i]):
108            bfs_coeffs[i,j] = basis.bfs_coeffs[i][j]
109            bfs_expnts[i,j] = basis.bfs_expnts[i][j]
110            bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j]
111        
112
113
114    if slice is None:
115        slice = [0, basis.bfs_nao, 0, basis.bfs_nao]
116        
117    #Limits for the calculation of overlap integrals
118    a = int(slice[0]) #row start index
119    b = int(slice[1]) #row end index
120    c = int(slice[2]) #column start index
121    d = int(slice[3]) #column end index
122
123        
124    M = dipole_moment_mat_symm_internal(bfs_coords[0], bfs_contr_prim_norms[0], bfs_lmn[0], bfs_nprim[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, a, b, c, d, origin)
125    
126    return M

Compute the dipole moment integral matrix in the atomic orbital (AO) basis using symmetry and the McMurchie–Davidson algorithm.

This routine evaluates the matrix elements:

μ_ij^(k) = ⟨ χ_i | r_k | χ_j ⟩

where: - χ_i, χ_j are Gaussian-type atomic orbitals (AOs) - r_k is the k-th Cartesian coordinate (x, y, or z) - origin is the coordinate origin for the dipole operator

The implementation follows the McMurchie–Davidson scheme for Gaussian integrals, adapted from:

https://github.com/jjgoings/McMurchie-Davidson

Symmetries exploited

The dipole moment matrix is symmetric for real-valued AOs:

μ_ij^(k) = μ_ji^(k)

This halves the number of unique integrals from N_bf^2 to N_bf*(N_bf+1)/2 for each Cartesian component.

Parameters

basis : object Basis set object containing: - bfs_coords : Cartesian coordinates of AO centers - bfs_coeffs : Contraction coefficients - bfs_expnts : Gaussian exponents - bfs_prim_norms : Primitive normalization constants - bfs_contr_prim_norms : Contraction normalization factors - bfs_lmn : Angular momentum quantum numbers (ℓ, m, n) - bfs_nprim : Number of primitives per basis function - bfs_nao : Number of atomic orbitals

slice : list of int, optional A 4-element list [row_start, row_end, col_start, col_end] specifying the sub-block of the matrix to compute. If None (default), computes the full symmetric matrix.

origin : ndarray of shape (3,), optional Origin of the dipole operator in Cartesian coordinates. Default is (0.0, 0.0, 0.0).

Returns

M : ndarray of shape (3, n_rows, n_cols) The computed dipole moment integrals for the requested block, with the first dimension indexing the x, y, z components.

Notes

  • Basis set data are pre-packed into NumPy arrays for compatibility with Numba JIT compilation.
  • The algorithm avoids redundant evaluations by exploiting matrix symmetry when slice covers the full range.
  • The dipole moment integrals are typically used in computing molecular dipole moments from density matrices.

Examples

>>> mu_full = dipole_moment_mat_symm(basis)
>>> mu_block = dipole_moment_mat_symm(basis, slice=[0, 5, 0, 5])
>>> mu_shifted = dipole_moment_mat_symm(basis, origin=np.array([1.0, 0.0, 0.0]))
def kin_mat_symm_cupy(basis, slice=None, cp_stream=None):
 20def kin_mat_symm_cupy(basis, slice=None, cp_stream=None):
 21    # Here the lists are converted to numpy arrays for better use with Numba.
 22    # Once these conversions are done we pass these to a Numba decorated
 23    # function that uses prange, etc. to calculate the matrix efficiently.
 24
 25    # This function calculates the kinetic energy matrix for a given basis object.
 26    # The basis object holds the information of basis functions like: exponents, coeffs, etc.
 27    # It is possible to only calculate a slice (block) of the complete matrix.
 28    # slice is a 4 element list whose first and second elements give the range of the rows to be calculated.
 29    # the third and fourth element give the range of columns to be calculated.
 30    # slice = [start_row, end_row, start_col, end_col]
 31    # The integrals are performed using the formulas
 32
 33    #We convert the required properties to numpy arrays as this is what Numba likes.
 34    bfs_coords = cp.array([basis.bfs_coords])
 35    bfs_contr_prim_norms = cp.array([basis.bfs_contr_prim_norms])
 36    bfs_lmn = cp.array([basis.bfs_lmn])
 37    bfs_nprim = cp.array([basis.bfs_nprim])
 38        
 39
 40    #The remaining properties like bfs_coeffs are a list of lists of unequal sizes.
 41    #Numba won't be able to work with these efficiently.
 42    #So, we convert them to a numpy 2d array by applying a trick,
 43    #that the second dimension is that of the largest list. So that
 44    #it can accomadate all the lists.
 45    maxnprim = max(basis.bfs_nprim)
 46    bfs_coeffs = cp.zeros([basis.bfs_nao, maxnprim])
 47    bfs_expnts = cp.zeros([basis.bfs_nao, maxnprim])
 48    bfs_prim_norms = cp.zeros([basis.bfs_nao, maxnprim])
 49    for i in range(basis.bfs_nao):
 50        for j in range(basis.bfs_nprim[i]):
 51            bfs_coeffs[i,j] = basis.bfs_coeffs[i][j]
 52            bfs_expnts[i,j] = basis.bfs_expnts[i][j]
 53            bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j]
 54        
 55
 56
 57    if slice is None:
 58        slice = [0, basis.bfs_nao, 0, basis.bfs_nao]
 59        
 60    #Limits for the calculation of overlap integrals
 61    a = int(slice[0]) #row start index
 62    b = int(slice[1]) #row end index
 63    c = int(slice[2]) #column start index
 64    d = int(slice[3]) #column end index
 65
 66    # Infer the matrix shape from the start and end indices
 67    num_rows = b - a 
 68    num_cols = d - c 
 69    start_row = a
 70    end_row = b
 71    start_col = c
 72    end_col = d
 73    matrix_shape = (num_rows, num_cols)
 74
 75    # Check if the slice of the matrix requested falls in the lower/upper triangle or in both the triangles
 76    upper_tri = False
 77    lower_tri = False
 78    both_tri_symm = False
 79    both_tri_nonsymm = False
 80    if end_row <= start_col:
 81        upper_tri = True
 82    elif start_row >= end_col:
 83        lower_tri = True
 84    elif start_row==start_col and end_row==end_col:
 85        both_tri_symm = True
 86    else:
 87        both_tri_nonsymm = True
 88
 89    # Initialize the matrix with zeros
 90    T = cp.zeros(matrix_shape) 
 91
 92    thread_x = 32
 93    thread_y = 32
 94
 95    if cp_stream is None:
 96        device = 0
 97        cp.cuda.Device(device).use()
 98        cp_stream = cp.cuda.Stream(non_blocking = True)
 99        nb_stream = cuda.external_stream(cp_stream.ptr)
100        cp_stream.use()
101    else:
102        nb_stream = cuda.external_stream(cp_stream.ptr)
103        cp_stream.use()
104
105
106    blocks_per_grid = ((num_rows + (thread_x - 1))//thread_x, (num_cols + (thread_y - 1))//thread_y) 
107    kin_mat_symm_internal_cuda[blocks_per_grid, (thread_x, thread_y), nb_stream](bfs_coords[0], bfs_contr_prim_norms[0], bfs_lmn[0], bfs_nprim[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, a, b, c, d, lower_tri, upper_tri, both_tri_symm, both_tri_nonsymm, T)
108    if both_tri_symm:
109        symmetrize[blocks_per_grid, (thread_x, thread_y), nb_stream](a,b,c,d,T)
110    
111    # cuda.synchronize()
112    cp_stream.synchronize()
113    cp.cuda.Stream.null.synchronize()
114    return T #+ T.T - cp.diag(cp.diag(T))
def overlap_mat_symm_cupy(basis, slice=None, cp_stream=None):
 20def overlap_mat_symm_cupy(basis, slice=None, cp_stream=None):
 21    # Here the lists are converted to numpy arrays for better use with Numba.
 22    # Once these conversions are done we pass these to a Numba decorated
 23    # function that uses prange, etc. to calculate the matrix efficiently.
 24
 25    # This function calculates the kinetic energy matrix for a given basis object.
 26    # The basis object holds the information of basis functions like: exponents, coeffs, etc.
 27    # It is possible to only calculate a slice (block) of the complete matrix.
 28    # slice is a 4 element list whose first and second elements give the range of the rows to be calculated.
 29    # the third and fourth element give the range of columns to be calculated.
 30    # slice = [start_row, end_row, start_col, end_col]
 31    # The integrals are performed using the formulas
 32
 33    #We convert the required properties to numpy arrays as this is what Numba likes.
 34    bfs_coords = cp.array([basis.bfs_coords])
 35    bfs_contr_prim_norms = cp.array([basis.bfs_contr_prim_norms])
 36    bfs_lmn = cp.array([basis.bfs_lmn])
 37    bfs_nprim = cp.array([basis.bfs_nprim])
 38        
 39
 40    #The remaining properties like bfs_coeffs are a list of lists of unequal sizes.
 41    #Numba won't be able to work with these efficiently.
 42    #So, we convert them to a numpy 2d array by applying a trick,
 43    #that the second dimension is that of the largest list. So that
 44    #it can accomadate all the lists.
 45    maxnprim = max(basis.bfs_nprim)
 46    bfs_coeffs = cp.zeros([basis.bfs_nao, maxnprim])
 47    bfs_expnts = cp.zeros([basis.bfs_nao, maxnprim])
 48    bfs_prim_norms = cp.zeros([basis.bfs_nao, maxnprim])
 49    for i in range(basis.bfs_nao):
 50        for j in range(basis.bfs_nprim[i]):
 51            bfs_coeffs[i,j] = basis.bfs_coeffs[i][j]
 52            bfs_expnts[i,j] = basis.bfs_expnts[i][j]
 53            bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j]
 54        
 55
 56
 57    if slice is None:
 58        slice = [0, basis.bfs_nao, 0, basis.bfs_nao]
 59        
 60    #Limits for the calculation of overlap integrals
 61    a = int(slice[0]) #row start index
 62    b = int(slice[1]) #row end index
 63    c = int(slice[2]) #column start index
 64    d = int(slice[3]) #column end index
 65
 66    # Infer the matrix shape from the start and end indices
 67    num_rows = b - a 
 68    num_cols = d - c 
 69    start_row = a
 70    end_row = b
 71    start_col = c
 72    end_col = d
 73    matrix_shape = (num_rows, num_cols)
 74
 75    # Check if the slice of the matrix requested falls in the lower/upper triangle or in both the triangles
 76    upper_tri = False
 77    lower_tri = False
 78    both_tri_symm = False
 79    both_tri_nonsymm = False
 80    if end_row <= start_col:
 81        upper_tri = True
 82    elif start_row >= end_col:
 83        lower_tri = True
 84    elif start_row==start_col and end_row==end_col:
 85        both_tri_symm = True
 86    else:
 87        both_tri_nonsymm = True
 88
 89    # Initialize the matrix with zeros
 90    S = cp.zeros(matrix_shape) 
 91
 92    thread_x = 24
 93    thread_y = 16
 94
 95    if cp_stream is None:
 96        device = 0
 97        cp.cuda.Device(device).use()
 98        cp_stream = cp.cuda.Stream(non_blocking = True)
 99        nb_stream = cuda.external_stream(cp_stream.ptr)
100        cp_stream.use()
101    else:
102        nb_stream = cuda.external_stream(cp_stream.ptr)
103        cp_stream.use()
104
105    blocks_per_grid = ((num_rows + (thread_x - 1))//thread_x, (num_cols + (thread_y - 1))//thread_y) 
106    overlap_mat_symm_internal_cuda[blocks_per_grid, (thread_x, thread_y), nb_stream](bfs_coords[0], bfs_contr_prim_norms[0], bfs_lmn[0], bfs_nprim[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, a, b, c, d, lower_tri, upper_tri, both_tri_symm, both_tri_nonsymm, S)
107    if both_tri_symm:
108        symmetrize[blocks_per_grid, (thread_x, thread_y), nb_stream](a,b,c,d,S)
109
110    # cuda.synchronize()
111    cp_stream.synchronize()
112    cp.cuda.Stream.null.synchronize()
113        # cp._default_memory_pool.free_all_blocks()
114    return S 
def dipole_moment_mat_symm_cupy(basis, slice=None, origin=None, stream=None):
 20def dipole_moment_mat_symm_cupy(basis, slice=None, origin=None, stream=None):
 21    # Here the lists are converted to numpy arrays for better use with Numba.
 22    # Once these conversions are done we pass these to a Numba decorated
 23    # function that uses prange, etc. to calculate the matrix efficiently.
 24
 25    # This function calculates the kinetic energy matrix for a given basis object.
 26    # The basis object holds the information of basis functions like: exponents, coeffs, etc.
 27    # It is possible to only calculate a slice (block) of the complete matrix.
 28    # slice is a 4 element list whose first and second elements give the range of the rows to be calculated.
 29    # the third and fourth element give the range of columns to be calculated.
 30    # slice = [start_row, end_row, start_col, end_col]
 31    # The integrals are performed using the formulas
 32
 33    if origin is None:
 34        origin = cp.zeros((3))
 35    #We convert the required properties to numpy arrays as this is what Numba likes.
 36    bfs_coords = cp.array([basis.bfs_coords])
 37    bfs_contr_prim_norms = cp.array([basis.bfs_contr_prim_norms])
 38    bfs_lmn = cp.array([basis.bfs_lmn])
 39    bfs_nprim = cp.array([basis.bfs_nprim])
 40        
 41
 42    #The remaining properties like bfs_coeffs are a list of lists of unequal sizes.
 43    #Numba won't be able to work with these efficiently.
 44    #So, we convert them to a numpy 2d array by applying a trick,
 45    #that the second dimension is that of the largest list. So that
 46    #it can accomadate all the lists.
 47    maxnprim = max(basis.bfs_nprim)
 48    bfs_coeffs = cp.zeros([basis.bfs_nao, maxnprim])
 49    bfs_expnts = cp.zeros([basis.bfs_nao, maxnprim])
 50    bfs_prim_norms = cp.zeros([basis.bfs_nao, maxnprim])
 51    for i in range(basis.bfs_nao):
 52        for j in range(basis.bfs_nprim[i]):
 53            bfs_coeffs[i,j] = basis.bfs_coeffs[i][j]
 54            bfs_expnts[i,j] = basis.bfs_expnts[i][j]
 55            bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j]
 56        
 57
 58
 59    if slice is None:
 60        slice = [0, basis.bfs_nao, 0, basis.bfs_nao]
 61        
 62    #Limits for the calculation of overlap integrals
 63    a = int(slice[0]) #row start index
 64    b = int(slice[1]) #row end index
 65    c = int(slice[2]) #column start index
 66    d = int(slice[3]) #column end index
 67
 68    # Infer the matrix shape from the start and end indices
 69    num_rows = b - a 
 70    num_cols = d - c 
 71    start_row = a
 72    end_row = b
 73    start_col = c
 74    end_col = d
 75    matrix_shape = (3, num_rows, num_cols)
 76
 77    # Check if the slice of the matrix requested falls in the lower/upper triangle or in both the triangles
 78    upper_tri = False
 79    lower_tri = False
 80    both_tri_symm = False
 81    both_tri_nonsymm = False
 82    if end_row <= start_col:
 83        upper_tri = True
 84    elif start_row >= end_col:
 85        lower_tri = True
 86    elif start_row==start_col and end_row==end_col:
 87        both_tri_symm = True
 88    else:
 89        both_tri_nonsymm = True
 90
 91    # Initialize the matrix with zeros
 92    M = cp.zeros(matrix_shape) 
 93
 94    thread_x = 32
 95    thread_y = 16
 96
 97    if stream is None:
 98        device = 0
 99        cp.cuda.Device(device).use()
100        cp_stream = cp.cuda.Stream(non_blocking = True)
101        nb_stream = cuda.external_stream(cp_stream.ptr)
102    else:
103        nb_stream = stream
104
105    blocks_per_grid = ((num_rows + (thread_x - 1))//thread_x, (num_cols + (thread_y - 1))//thread_y) 
106    dipole_moment_mat_symm_internal_cuda[blocks_per_grid, (thread_x, thread_y), nb_stream](bfs_coords[0], bfs_contr_prim_norms[0], bfs_lmn[0], bfs_nprim[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, a, b, c, d, lower_tri, upper_tri, both_tri_symm, both_tri_nonsymm, origin, M)
107    if both_tri_symm:
108        symmetrize[blocks_per_grid, (thread_x, thread_y), nb_stream](a,b,c,d,M)
109    cuda.synchronize()
110    return M 
def nuc_mat_symm_cupy(basis, mol, slice=None, cp_stream=None, sqrt_ints4c2e_diag=None):
 20def nuc_mat_symm_cupy(basis, mol, slice=None, cp_stream=None, sqrt_ints4c2e_diag=None):
 21    # Here the lists are converted to numpy arrays for better use with Numba.
 22    # Once these conversions are done we pass these to a Numba decorated
 23    # function that uses prange, etc. to calculate the matrix efficiently.
 24
 25    # This function calculates the kinetic energy matrix for a given basis object.
 26    # The basis object holds the information of basis functions like: exponents, coeffs, etc.
 27    # It is possible to only calculate a slice (block) of the complete matrix.
 28    # slice is a 4 element list whose first and second elements give the range of the rows to be calculated.
 29    # the third and fourth element give the range of columns to be calculated.
 30    # slice = [start_row, end_row, start_col, end_col]
 31    # The integrals are performed using the formulas
 32
 33    #We convert the required properties to numpy arrays as this is what Numba likes.
 34    bfs_coords = cp.array([basis.bfs_coords])
 35    bfs_contr_prim_norms = cp.array([basis.bfs_contr_prim_norms])
 36    bfs_lmn = cp.array([basis.bfs_lmn])
 37    bfs_nprim = cp.array([basis.bfs_nprim])
 38    coordsBohrs = cp.array([mol.coordsBohrs])
 39    Z = cp.array([mol.Zcharges])
 40    natoms = mol.natoms
 41
 42    #The remaining properties like bfs_coeffs are a list of lists of unequal sizes.
 43    #Numba won't be able to work with these efficiently.
 44    #So, we convert them to a numpy 2d array by applying a trick,
 45    #that the second dimension is that of the largest list. So that
 46    #it can accomadate all the lists.
 47    maxnprim = max(basis.bfs_nprim)
 48    bfs_coeffs = cp.zeros([basis.bfs_nao, maxnprim])
 49    bfs_expnts = cp.zeros([basis.bfs_nao, maxnprim])
 50    bfs_prim_norms = cp.zeros([basis.bfs_nao, maxnprim])
 51    for i in range(basis.bfs_nao):
 52        for j in range(basis.bfs_nprim[i]):
 53            bfs_coeffs[i,j] = basis.bfs_coeffs[i][j]
 54            bfs_expnts[i,j] = basis.bfs_expnts[i][j]
 55            bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j]
 56        
 57
 58
 59    if slice is None:
 60        slice = [0, basis.bfs_nao, 0, basis.bfs_nao]
 61        
 62    #Limits for the calculation of overlap integrals
 63    a = int(slice[0]) #row start index
 64    b = int(slice[1]) #row end index
 65    c = int(slice[2]) #column start index
 66    d = int(slice[3]) #column end index
 67
 68    # Infer the matrix shape from the start and end indices
 69    num_rows = b - a 
 70    num_cols = d - c 
 71    start_row = a
 72    end_row = b
 73    start_col = c
 74    end_col = d
 75    matrix_shape = (num_rows, num_cols)
 76
 77    # Check if the slice of the matrix requested falls in the lower/upper triangle or in both the triangles
 78    upper_tri = False
 79    lower_tri = False
 80    both_tri_symm = False
 81    both_tri_nonsymm = False
 82    if end_row <= start_col:
 83        upper_tri = True
 84    elif start_row >= end_col:
 85        lower_tri = True
 86    elif start_row==start_col and end_row==end_col:
 87        both_tri_symm = True
 88    else:
 89        both_tri_nonsymm = True
 90
 91    # Initialize the matrix with zeros
 92    V = cp.zeros(matrix_shape) 
 93
 94    if sqrt_ints4c2e_diag is None:
 95        #Create dummy array
 96        sqrt_ints4c2e_diag = cp.zeros((1,1), dtype=cp.float64)
 97        isSchwarz = False
 98    else:
 99        isSchwarz = True
100        sqrt_ints4c2e_diag = cp.asarray(sqrt_ints4c2e_diag)
101
102    
103
104    if cp_stream is None:
105        device = 0
106        cp.cuda.Device(device).use()
107        cp_stream = cp.cuda.Stream(non_blocking = True)
108        nb_stream = cuda.external_stream(cp_stream.ptr)
109        cp_stream.use()
110    else:
111        nb_stream = cuda.external_stream(cp_stream.ptr)
112        cp_stream.use()
113
114    thread_x = 32
115    thread_y = 32
116    blocks_per_grid = ((num_rows + (thread_x - 1))//thread_x, (num_cols + (thread_y - 1))//thread_y) 
117    nuc_mat_symm_internal_cuda[blocks_per_grid, (thread_x, thread_y), nb_stream](bfs_coords[0], bfs_contr_prim_norms[0], bfs_lmn[0], bfs_nprim[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, a, b, c, d, Z[0], coordsBohrs[0], natoms, lower_tri, upper_tri, both_tri_symm, both_tri_nonsymm, sqrt_ints4c2e_diag, isSchwarz, V)
118    if both_tri_symm:
119        thread_x = 32
120        thread_y = 32
121        blocks_per_grid = ((num_rows + (thread_x - 1))//thread_x, (num_cols + (thread_y - 1))//thread_y) 
122        symmetrize[blocks_per_grid, (thread_x, thread_y), nb_stream](a,b,c,d,V)
123    if cp_stream is None:
124        cuda.synchronize()
125    else:
126        cp_stream.synchronize()
127        cp.cuda.Stream.null.synchronize()
128        # cp._default_memory_pool.free_all_blocks()
129    return V
def kin_mat_symm_shell_cupy(basis, slice=None, cp_stream=None):
20def kin_mat_symm_shell_cupy(basis, slice=None, cp_stream=None):
21    # Here the lists are converted to numpy arrays for better use with Numba.
22    # Once these conversions are done we pass these to a Numba decorated
23    # function that uses prange, etc. to calculate the matrix efficiently.
24
25    # This function calculates the kinetic energy matrix for a given basis object.
26    # The basis object holds the information of basis functions like: exponents, coeffs, etc.
27    # It is possible to only calculate a slice (block) of the complete matrix.
28    # slice is a 4 element list whose first and second elements give the range of the rows to be calculated.
29    # the third and fourth element give the range of columns to be calculated.
30    # slice = [start_row, end_row, start_col, end_col]
31    # The integrals are performed using the formulas
32
33    #We convert the required properties to numpy arrays as this is what Numba likes.
34    bfs_coords = cp.array([basis.bfs_coords])
35    bfs_contr_prim_norms = cp.array([basis.bfs_contr_prim_norms])
36    bfs_lmn = cp.array([basis.bfs_lmn])
37    bfs_nprim = cp.array([basis.bfs_nprim])
38    bfs_offset = cp.array([basis.shell_bfs_offset])
39    bfs_nbfshell = cp.array([basis.bfs_nbfshell])
40        
41
42    #The remaining properties like bfs_coeffs are a list of lists of unequal sizes.
43    #Numba won't be able to work with these efficiently.
44    #So, we convert them to a numpy 2d array by applying a trick,
45    #that the second dimension is that of the largest list. So that
46    #it can accomadate all the lists.
47    maxnprim = max(basis.bfs_nprim)
48    bfs_coeffs = cp.zeros([basis.bfs_nao, maxnprim])
49    bfs_expnts = cp.zeros([basis.bfs_nao, maxnprim])
50    bfs_prim_norms = cp.zeros([basis.bfs_nao, maxnprim])
51    for i in range(basis.bfs_nao):
52        for j in range(basis.bfs_nprim[i]):
53            bfs_coeffs[i,j] = basis.bfs_coeffs[i][j]
54            bfs_expnts[i,j] = basis.bfs_expnts[i][j]
55            bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j]
56        
57
58
59    
60    matrix_shape = (basis.bfs_nao, basis.bfs_nao)
61
62    # Initialize the matrix with zeros
63    T = cp.zeros(matrix_shape) 
64
65    thread_x = 32
66    thread_y = 32
67
68    if cp_stream is None:
69        device = 0
70        cp.cuda.Device(device).use()
71        cp_stream = cp.cuda.Stream(non_blocking = True)
72        nb_stream = cuda.external_stream(cp_stream.ptr)
73        cp_stream.use()
74    else:
75        nb_stream = cuda.external_stream(cp_stream.ptr)
76        cp_stream.use()
77
78    thread_x = 32
79    thread_y = 32
80    blocks_per_grid = ((basis.nshells + (thread_x - 1))//thread_x, (basis.nshells + (thread_y - 1))//thread_y) 
81    kin_mat_symm_shell_internal_cuda[blocks_per_grid, (32, 32), nb_stream](bfs_offset[0], bfs_nbfshell[0], bfs_coords[0], bfs_contr_prim_norms[0], bfs_lmn[0], bfs_nprim[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, basis.nshells, T)
82    blocks_per_grid = ((basis.bfs_nao + (thread_x - 1))//thread_x, (basis.bfs_nao + (thread_y - 1))//thread_y) 
83    symmetrize[blocks_per_grid, (thread_x, thread_y), nb_stream](0,basis.bfs_nao,0,basis.bfs_nao,T)
84    if cp_stream is None:
85        cuda.synchronize()
86    else:
87        cp_stream.synchronize()
88        cp.cuda.Stream.null.synchronize()
89    return T #+ T.T - cp.diag(cp.diag(T))
def rys_2c2e_symm_cupy(basis, slice=None, cp_stream=None):
 21def rys_2c2e_symm_cupy(basis, slice=None, cp_stream=None):
 22    # Here the lists are converted to numpy arrays for better use with Numba.
 23    # Once these conversions are done we pass these to a Numba decorated
 24    # function that uses prange, etc. to calculate the matrix efficiently.
 25
 26    # This function calculates the kinetic energy matrix for a given basis object.
 27    # The basis object holds the information of basis functions like: exponents, coeffs, etc.
 28    # It is possible to only calculate a slice (block) of the complete matrix.
 29    # slice is a 4 element list whose first and second elements give the range of the rows to be calculated.
 30    # the third and fourth element give the range of columns to be calculated.
 31    # slice = [start_row, end_row, start_col, end_col]
 32    # The integrals are performed using the formulas
 33
 34    #We convert the required properties to numpy arrays as this is what Numba likes.
 35    bfs_coords = cp.array([basis.bfs_coords])
 36    bfs_contr_prim_norms = cp.array([basis.bfs_contr_prim_norms])
 37    bfs_lmn = cp.array([basis.bfs_lmn])
 38    bfs_nprim = cp.array([basis.bfs_nprim])
 39
 40    #The remaining properties like bfs_coeffs are a list of lists of unequal sizes.
 41    #Numba won't be able to work with these efficiently.
 42    #So, we convert them to a numpy 2d array by applying a trick,
 43    #that the second dimension is that of the largest list. So that
 44    #it can accomadate all the lists.
 45    maxnprim = max(basis.bfs_nprim)
 46    bfs_coeffs = cp.zeros([basis.bfs_nao, maxnprim])
 47    bfs_expnts = cp.zeros([basis.bfs_nao, maxnprim])
 48    bfs_prim_norms = cp.zeros([basis.bfs_nao, maxnprim])
 49    for i in range(basis.bfs_nao):
 50        for j in range(basis.bfs_nprim[i]):
 51            bfs_coeffs[i,j] = basis.bfs_coeffs[i][j]
 52            bfs_expnts[i,j] = basis.bfs_expnts[i][j]
 53            bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j]
 54        
 55
 56    DATA_X_cuda = cp.asarray(DATA_X)
 57    DATA_W_cuda = cp.asarray(DATA_W)
 58
 59    if slice is None:
 60        slice = [0, basis.bfs_nao, 0, basis.bfs_nao]
 61        
 62    #Limits for the calculation of overlap integrals
 63    a = int(slice[0]) #row start index
 64    b = int(slice[1]) #row end index
 65    c = int(slice[2]) #column start index
 66    d = int(slice[3]) #column end index
 67
 68    # Infer the matrix shape from the start and end indices
 69    num_rows = b - a 
 70    num_cols = d - c 
 71    start_row = a
 72    end_row = b
 73    start_col = c
 74    end_col = d
 75    matrix_shape = (num_rows, num_cols)
 76
 77    # Check if the slice of the matrix requested falls in the lower/upper triangle or in both the triangles
 78    # Check if the slice of the matrix requested falls in the lower/upper triangle or in both the triangles
 79    tri_symm = False
 80    no_symm = False
 81    if start_row==start_col and end_row==end_col:
 82        tri_symm = True
 83    else:
 84        no_symm = True
 85
 86    # Initialize the matrix with zeros
 87    V = cp.zeros(matrix_shape) 
 88
 89    
 90
 91    
 92
 93    if cp_stream is None:
 94        device = 0
 95        cp.cuda.Device(device).use()
 96        cp_stream = cp.cuda.Stream(non_blocking = True)
 97        nb_stream = cuda.external_stream(cp_stream.ptr)
 98        cp_stream.use()
 99    else:
100        nb_stream = cuda.external_stream(cp_stream.ptr)
101        cp_stream.use()
102
103    thread_x = 32
104    thread_y = 32
105    blocks_per_grid = ((num_rows + (thread_x - 1))//thread_x, (num_cols + (thread_y - 1))//thread_y) 
106    rys_2c2e_symm_internal_cuda[blocks_per_grid, (thread_x, thread_y), nb_stream](bfs_coords[0], bfs_contr_prim_norms[0], bfs_lmn[0], bfs_nprim[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, a, b, c, d, tri_symm, no_symm, DATA_X_cuda, DATA_W_cuda, V)
107    if tri_symm:
108        thread_x = 32
109        thread_y = 32
110        blocks_per_grid = ((num_rows + (thread_x - 1))//thread_x, (num_cols + (thread_y - 1))//thread_y) 
111        symmetrize[blocks_per_grid, (thread_x, thread_y), nb_stream](a,b,c,d,V)
112    if cp_stream is None:
113        cuda.synchronize()
114    else:
115        cp_stream.synchronize()
116        cp.cuda.Stream.null.synchronize()
117        # cp._default_memory_pool.free_all_blocks()
118    return V
def rys_3c2e_symm_cupy( basis, auxbasis, slice=None, schwarz=False, schwarz_threshold=1e-09, cp_stream=None):
 23def rys_3c2e_symm_cupy(basis, auxbasis, slice=None, schwarz=False, schwarz_threshold=1e-9, cp_stream=None):
 24    # Here the lists are converted to numpy arrays for better use with Numba.
 25    # Once these conversions are done we pass these to a Numba decorated
 26    # function that uses prange, etc. to calculate the matrix efficiently.
 27
 28    # This function calculates the kinetic energy matrix for a given basis object.
 29    # The basis object holds the information of basis functions like: exponents, coeffs, etc.
 30    # It is possible to only calculate a slice (block) of the complete matrix.
 31    # slice is a 4 element list whose first and second elements give the range of the rows to be calculated.
 32    # the third and fourth element give the range of columns to be calculated.
 33    # slice = [start_row, end_row, start_col, end_col]
 34    # The integrals are performed using the formulas
 35
 36    #We convert the required properties to numpy arrays as this is what Numba likes.
 37    bfs_coords = cp.array([basis.bfs_coords])
 38    bfs_contr_prim_norms = cp.array([basis.bfs_contr_prim_norms])
 39    bfs_lmn = cp.array([basis.bfs_lmn])
 40    bfs_nprim = cp.array([basis.bfs_nprim])
 41
 42    aux_bfs_coords = cp.array([auxbasis.bfs_coords])
 43    aux_bfs_contr_prim_norms = cp.array([auxbasis.bfs_contr_prim_norms])
 44    aux_bfs_lmn = cp.array([auxbasis.bfs_lmn])
 45    aux_bfs_nprim = cp.array([auxbasis.bfs_nprim])
 46
 47    #The remaining properties like bfs_coeffs are a list of lists of unequal sizes.
 48    #Numba won't be able to work with these efficiently.
 49    #So, we convert them to a numpy 2d array by applying a trick,
 50    #that the second dimension is that of the largest list. So that
 51    #it can accomadate all the lists.
 52    maxnprim = max(basis.bfs_nprim)
 53    bfs_coeffs = cp.zeros([basis.bfs_nao, maxnprim])
 54    bfs_expnts = cp.zeros([basis.bfs_nao, maxnprim])
 55    bfs_prim_norms = cp.zeros([basis.bfs_nao, maxnprim])
 56    for i in range(basis.bfs_nao):
 57        for j in range(basis.bfs_nprim[i]):
 58            bfs_coeffs[i,j] = basis.bfs_coeffs[i][j]
 59            bfs_expnts[i,j] = basis.bfs_expnts[i][j]
 60            bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j]
 61    maxnprimaux = max(auxbasis.bfs_nprim)
 62    aux_bfs_coeffs = cp.zeros([auxbasis.bfs_nao, maxnprimaux])
 63    aux_bfs_expnts = cp.zeros([auxbasis.bfs_nao, maxnprimaux])
 64    aux_bfs_prim_norms = cp.zeros([auxbasis.bfs_nao, maxnprimaux])
 65    for i in range(auxbasis.bfs_nao):
 66        for j in range(auxbasis.bfs_nprim[i]):
 67            aux_bfs_coeffs[i,j] = auxbasis.bfs_coeffs[i][j]
 68            aux_bfs_expnts[i,j] = auxbasis.bfs_expnts[i][j]
 69            aux_bfs_prim_norms[i,j] = auxbasis.bfs_prim_norms[i][j]
 70        
 71
 72    DATA_X_cuda = cp.asarray(DATA_X)
 73    DATA_W_cuda = cp.asarray(DATA_W)
 74
 75    if slice is None:
 76        slice = [0, basis.bfs_nao, 0, basis.bfs_nao, 0, auxbasis.bfs_nao]
 77
 78    if schwarz:
 79        ints4c2e_diag = cp.asarray(eri_4c2e_diag(basis))
 80        ints2c2e = rys_2c2e_symm_cupy(auxbasis)
 81        sqrt_ints4c2e_diag = cp.sqrt(cp.abs(ints4c2e_diag))
 82        sqrt_diag_ints2c2e = cp.sqrt(cp.abs(cp.diag(ints2c2e)))
 83        print('Prelims calc done for Schwarz screening!')
 84    else:
 85        #Create dummy array
 86        sqrt_ints4c2e_diag = cp.zeros((1,1), dtype=cp.float64)
 87        sqrt_diag_ints2c2e = cp.zeros((1), dtype=cp.float64)
 88        
 89    #Limits for the calculation of 4c2e integrals
 90    indx_startA = int(slice[0])
 91    indx_endA = int(slice[1])
 92    indx_startB = int(slice[2])
 93    indx_endB = int(slice[3])
 94    indx_startC = int(slice[4])
 95    indx_endC = int(slice[5])
 96    # Infer the matrix shape from the start and end indices
 97    num_A = indx_endA - indx_startA 
 98    num_B = indx_endB - indx_startB 
 99    num_C = indx_endC - indx_startC
100    matrix_shape = (num_A, num_B, num_C)
101
102    
103    # Check if the slice of the matrix requested falls in the lower/upper triangle or in both the triangles
104    tri_symm = False
105    no_symm = False
106    if indx_startA==indx_startB and indx_endA==indx_endB:
107        tri_symm = True
108    else:
109        no_symm = True
110    
111
112    # Initialize the matrix with zeros
113    threeC2E = cp.zeros(matrix_shape, dtype=cp.float64)
114
115    if cp_stream is None:
116        device = 0
117        cp.cuda.Device(device).use()
118        cp_stream = cp.cuda.Stream(non_blocking = True)
119        nb_stream = cuda.external_stream(cp_stream.ptr)
120        cp_stream.use()
121    else:
122        nb_stream = cuda.external_stream(cp_stream.ptr)
123        cp_stream.use()
124
125    thread_x = 32
126    thread_y = 32
127    blocks_per_grid = ((num_A + (thread_x - 1))//thread_x, (num_B + (thread_y - 1))//thread_y) 
128    rys_3c2e_symm_internal_cuda_new[blocks_per_grid, (thread_x, thread_y), nb_stream](bfs_coords[0], bfs_contr_prim_norms[0], bfs_lmn[0], bfs_nprim[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, aux_bfs_coords[0], aux_bfs_contr_prim_norms[0], aux_bfs_lmn[0], aux_bfs_nprim[0], aux_bfs_coeffs, aux_bfs_prim_norms, aux_bfs_expnts, indx_startA, indx_endA, indx_startB, indx_endB, indx_startC, indx_endC, tri_symm, no_symm, DATA_X_cuda, DATA_W_cuda, schwarz, sqrt_ints4c2e_diag, sqrt_diag_ints2c2e, schwarz_threshold, threeC2E)
129    if tri_symm:
130        symmetrize[blocks_per_grid, (thread_x, thread_y), nb_stream](indx_startA, indx_endA, indx_startB, indx_endB, indx_startC, indx_endC,threeC2E)
131    if cp_stream is None:
132        cuda.synchronize()
133    else:
134        cp_stream.synchronize()
135        cp.cuda.Stream.null.synchronize()
136        # cp._default_memory_pool.free_all_blocks()
137    return threeC2E
def rys_3c2e_symm_cupy_fp32( basis, auxbasis, slice=None, schwarz=True, schwarz_threshold=1e-09, cp_stream=None):
 23def rys_3c2e_symm_cupy_fp32(basis, auxbasis, slice=None, schwarz=True, schwarz_threshold=1e-9, cp_stream=None):
 24    # Here the lists are converted to numpy arrays for better use with Numba.
 25    # Once these conversions are done we pass these to a Numba decorated
 26    # function that uses prange, etc. to calculate the matrix efficiently.
 27
 28    # This function calculates the kinetic energy matrix for a given basis object.
 29    # The basis object holds the information of basis functions like: exponents, coeffs, etc.
 30    # It is possible to only calculate a slice (block) of the complete matrix.
 31    # slice is a 4 element list whose first and second elements give the range of the rows to be calculated.
 32    # the third and fourth element give the range of columns to be calculated.
 33    # slice = [start_row, end_row, start_col, end_col]
 34    # The integrals are performed using the formulas
 35
 36    #We convert the required properties to numpy arrays as this is what Numba likes.
 37    bfs_coords = cp.array([basis.bfs_coords], dtype=cp.float32)
 38    bfs_contr_prim_norms = cp.array([basis.bfs_contr_prim_norms], dtype=cp.float32)
 39    bfs_lmn = cp.array([basis.bfs_lmn])
 40    bfs_nprim = cp.array([basis.bfs_nprim])
 41
 42    aux_bfs_coords = cp.array([auxbasis.bfs_coords], dtype=cp.float32)
 43    aux_bfs_contr_prim_norms = cp.array([auxbasis.bfs_contr_prim_norms], dtype=cp.float32)
 44    aux_bfs_lmn = cp.array([auxbasis.bfs_lmn])
 45    aux_bfs_nprim = cp.array([auxbasis.bfs_nprim], dtype=cp.float32)
 46
 47    #The remaining properties like bfs_coeffs are a list of lists of unequal sizes.
 48    #Numba won't be able to work with these efficiently.
 49    #So, we convert them to a numpy 2d array by applying a trick,
 50    #that the second dimension is that of the largest list. So that
 51    #it can accomadate all the lists.
 52    maxnprim = max(basis.bfs_nprim)
 53    bfs_coeffs = cp.zeros([basis.bfs_nao, maxnprim], dtype=cp.float32)
 54    bfs_expnts = cp.zeros([basis.bfs_nao, maxnprim], dtype=cp.float32)
 55    bfs_prim_norms = cp.zeros([basis.bfs_nao, maxnprim], dtype=cp.float32)
 56    for i in range(basis.bfs_nao):
 57        for j in range(basis.bfs_nprim[i]):
 58            bfs_coeffs[i,j] = basis.bfs_coeffs[i][j]
 59            bfs_expnts[i,j] = basis.bfs_expnts[i][j]
 60            bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j]
 61    maxnprimaux = max(auxbasis.bfs_nprim)
 62    aux_bfs_coeffs = cp.zeros([auxbasis.bfs_nao, maxnprimaux], dtype=cp.float32)
 63    aux_bfs_expnts = cp.zeros([auxbasis.bfs_nao, maxnprimaux], dtype=cp.float32)
 64    aux_bfs_prim_norms = cp.zeros([auxbasis.bfs_nao, maxnprimaux], dtype=cp.float32)
 65    for i in range(auxbasis.bfs_nao):
 66        for j in range(auxbasis.bfs_nprim[i]):
 67            aux_bfs_coeffs[i,j] = auxbasis.bfs_coeffs[i][j]
 68            aux_bfs_expnts[i,j] = auxbasis.bfs_expnts[i][j]
 69            aux_bfs_prim_norms[i,j] = auxbasis.bfs_prim_norms[i][j]
 70        
 71
 72    DATA_X_cuda = cp.asarray(DATA_X, dtype=cp.float32)
 73    DATA_W_cuda = cp.asarray(DATA_W, dtype=cp.float32)
 74
 75    if slice is None:
 76        slice = [0, basis.bfs_nao, 0, basis.bfs_nao, 0, auxbasis.bfs_nao]
 77
 78    if schwarz:
 79        ints4c2e_diag = cp.asarray(eri_4c2e_diag(basis))
 80        ints2c2e = rys_2c2e_symm_cupy(auxbasis)
 81        sqrt_ints4c2e_diag = cp.sqrt(cp.abs(ints4c2e_diag))
 82        sqrt_diag_ints2c2e = cp.sqrt(cp.abs(cp.diag(ints2c2e)))
 83        print('Prelims calc done for Schwarz screening!')
 84    else:
 85        #Create dummy array
 86        sqrt_ints4c2e_diag = cp.zeros((1,1), dtype=cp.float64)
 87        sqrt_diag_ints2c2e = cp.zeros((1), dtype=cp.float64)
 88        
 89    #Limits for the calculation of 4c2e integrals
 90    indx_startA = int(slice[0])
 91    indx_endA = int(slice[1])
 92    indx_startB = int(slice[2])
 93    indx_endB = int(slice[3])
 94    indx_startC = int(slice[4])
 95    indx_endC = int(slice[5])
 96    # Infer the matrix shape from the start and end indices
 97    num_A = indx_endA - indx_startA 
 98    num_B = indx_endB - indx_startB 
 99    num_C = indx_endC - indx_startC
100    matrix_shape = (num_A, num_B, num_C)
101
102    
103    # Check if the slice of the matrix requested falls in the lower/upper triangle or in both the triangles
104    tri_symm = False
105    no_symm = False
106    if indx_startA==indx_startB and indx_endA==indx_endB:
107        tri_symm = True
108    else:
109        no_symm = True
110    
111
112    # Initialize the matrix with zeros
113    threeC2E = cp.zeros(matrix_shape, dtype=cp.float32)
114
115    if cp_stream is None:
116        device = 0
117        cp.cuda.Device(device).use()
118        cp_stream = cp.cuda.Stream(non_blocking = True)
119        nb_stream = cuda.external_stream(cp_stream.ptr)
120        cp_stream.use()
121    else:
122        nb_stream = cuda.external_stream(cp_stream.ptr)
123        cp_stream.use()
124
125    thread_x = 32
126    thread_y = 32
127    blocks_per_grid = ((num_A + (thread_x - 1))//thread_x, (num_B + (thread_y - 1))//thread_y) 
128    rys_3c2e_symm_internal_cuda[blocks_per_grid, (thread_x, thread_y), nb_stream](bfs_coords[0], bfs_contr_prim_norms[0], bfs_lmn[0], bfs_nprim[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, aux_bfs_coords[0], aux_bfs_contr_prim_norms[0], aux_bfs_lmn[0], aux_bfs_nprim[0], aux_bfs_coeffs, aux_bfs_prim_norms, aux_bfs_expnts, indx_startA, indx_endA, indx_startB, indx_endB, indx_startC, indx_endC, tri_symm, no_symm, DATA_X_cuda, DATA_W_cuda, schwarz, sqrt_ints4c2e_diag, sqrt_diag_ints2c2e, schwarz_threshold, threeC2E)
129    if tri_symm:
130        symmetrize[blocks_per_grid, (thread_x, thread_y), nb_stream](indx_startA, indx_endA, indx_startB, indx_endB, indx_startC, indx_endC,threeC2E)
131    if cp_stream is None:
132        cuda.synchronize()
133    else:
134        cp_stream.synchronize()
135        cp.cuda.Stream.null.synchronize()
136        # cp._default_memory_pool.free_all_blocks()
137    return threeC2E
def overlap_mat_grad_symm(basis, slice=None):
 7def overlap_mat_grad_symm(basis, slice=None):
 8    # Here the lists are converted to numpy arrays for better use with Numba.
 9    # Once these conversions are done we pass these to a Numba decorated
10    # function that uses prange, etc. to calculate the matrix efficiently.
11
12    # This function calculates the overlap matrix for a given basis object.
13    # The basis object holds the information of basis functions like: exponents, coeffs, etc.
14    # It is possible to only calculate a slice (block) of the complete matrix.
15    # slice is a 4 element list whose first and second elements give the range of the rows to be calculated.
16    # the third and fourth element give the range of columns to be calculated.
17    # slice = [start_row, end_row, start_col, end_col]
18    # The integrals are performed using the formulas
19
20    #We convert the required properties to numpy arrays as this is what Numba likes.
21    bfs_coords = np.array([basis.bfs_coords])
22    bfs_contr_prim_norms = np.array([basis.bfs_contr_prim_norms])
23    bfs_lmn = np.array([basis.bfs_lmn])
24    bfs_nprim = np.array([basis.bfs_nprim])
25    bfs_atoms = np.array([basis.bfs_atoms])
26    natoms = max(basis.bfs_atoms) + 1
27        
28
29    #The remaining properties like bfs_coeffs are a list of lists of unequal sizes.
30    #Numba won't be able to work with these efficiently.
31    #So, we convert them to a numpy 2d array by applying a trick,
32    #that the second dimension is that of the largest list. So that
33    #it can accomadate all the lists.
34    maxnprim = max(basis.bfs_nprim)
35    bfs_coeffs = np.zeros([basis.bfs_nao, maxnprim])
36    bfs_expnts = np.zeros([basis.bfs_nao, maxnprim])
37    bfs_prim_norms = np.zeros([basis.bfs_nao, maxnprim])
38    for i in range(basis.bfs_nao):
39        for j in range(basis.bfs_nprim[i]):
40            bfs_coeffs[i,j] = basis.bfs_coeffs[i][j]
41            bfs_expnts[i,j] = basis.bfs_expnts[i][j]
42            bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j]
43        
44
45    if slice is None:
46        slice = [0, basis.bfs_nao, 0, basis.bfs_nao]
47        
48    #Limits for the calculation of overlap integrals
49    a = int(slice[0]) #row start index
50    b = int(slice[1]) #row end index
51    c = int(slice[2]) #column start index
52    d = int(slice[3]) #column end index
53        
54    dS = overlap_mat_grad_symm_internal_new(natoms, bfs_atoms[0], bfs_coords[0], bfs_contr_prim_norms[0], bfs_lmn[0], bfs_nprim[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, a, b, c, d)
55    return dS
def kin_mat_grad_symm(basis, slice=None):
 7def kin_mat_grad_symm(basis, slice=None):
 8    # Here the lists are converted to numpy arrays for better use with Numba.
 9    # Once these conversions are done we pass these to a Numba decorated
10    # function that uses prange, etc. to calculate the matrix efficiently.
11
12    # This function calculates the kinetic energy matrix for a given basis object.
13    # The basis object holds the information of basis functions like: exponents, coeffs, etc.
14    # It is possible to only calculate a slice (block) of the complete matrix.
15    # slice is a 4 element list whose first and second elements give the range of the rows to be calculated.
16    # the third and fourth element give the range of columns to be calculated.
17    # slice = [start_row, end_row, start_col, end_col]
18    # The integrals are performed using the formulas
19
20    #We convert the required properties to numpy arrays as this is what Numba likes.
21    bfs_coords = np.array([basis.bfs_coords])
22    bfs_contr_prim_norms = np.array([basis.bfs_contr_prim_norms])
23    bfs_lmn = np.array([basis.bfs_lmn])
24    bfs_nprim = np.array([basis.bfs_nprim])
25    bfs_atoms = np.array([basis.bfs_atoms])
26    natoms = max(basis.bfs_atoms) + 1
27        
28
29    #The remaining properties like bfs_coeffs are a list of lists of unequal sizes.
30    #Numba won't be able to work with these efficiently.
31    #So, we convert them to a numpy 2d array by applying a trick,
32    #that the second dimension is that of the largest list. So that
33    #it can accomadate all the lists.
34    maxnprim = max(basis.bfs_nprim)
35    bfs_coeffs = np.zeros([basis.bfs_nao, maxnprim])
36    bfs_expnts = np.zeros([basis.bfs_nao, maxnprim])
37    bfs_prim_norms = np.zeros([basis.bfs_nao, maxnprim])
38    for i in range(basis.bfs_nao):
39        for j in range(basis.bfs_nprim[i]):
40            bfs_coeffs[i,j] = basis.bfs_coeffs[i][j]
41            bfs_expnts[i,j] = basis.bfs_expnts[i][j]
42            bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j]
43        
44
45
46    if slice is None:
47        slice = [0, basis.bfs_nao, 0, basis.bfs_nao]
48        
49    #Limits for the calculation of overlap integrals
50    a = int(slice[0]) #row start index
51    b = int(slice[1]) #row end index
52    c = int(slice[2]) #column start index
53    d = int(slice[3]) #column end index
54
55        
56    dT = kin_mat_symm_grad_internal_new(natoms, bfs_atoms[0], bfs_coords[0], bfs_contr_prim_norms[0], bfs_lmn[0], bfs_nprim[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, a, b, c, d)
57    
58    return dT 
def nuc_mat_grad_symm(basis, mol, slice=None, sqrt_ints4c2e_diag=None):
 7def nuc_mat_grad_symm(basis, mol, slice=None, sqrt_ints4c2e_diag=None):
 8    #Here the lists are converted to numpy arrays for better use with Numba.
 9    #Once these conversions are done we pass these to a Numba decorated
10    #function that uses prange, etc. to calculate the matrix efficiently.
11
12    # This function calculates the nuclear matrix for a given basis object.
13    # The basis object holds the information of basis functions like: exponents, coeffs, etc.
14    # It is possible to only calculate a slice (block) of the complete matrix.
15    # slice is a 4 element list whose first and second elements give the range of the rows to be calculated.
16    # the third and fourth element give the range of columns to be calculated.
17    # slice = [start_row, end_row, start_col, end_col]
18    # The integrals are performed using the formulas
19
20    #We convert the required properties to numpy arrays as this is what Numba likes.
21    bfs_coords = np.array([basis.bfs_coords])
22    bfs_contr_prim_norms = np.array([basis.bfs_contr_prim_norms])
23    bfs_lmn = np.array([basis.bfs_lmn])
24    bfs_nprim = np.array([basis.bfs_nprim])
25    coordsBohrs = np.array([mol.coordsBohrs])
26    Z = np.array([mol.Zcharges])
27    natoms = mol.natoms
28        
29
30    #The remaining properties like bfs_coeffs are a list of lists of unequal sizes.
31    #Numba won't be able to work with these efficiently.
32    #So, we convert them to a numpy 2d array by applying a trick,
33    #that the second dimension is that of the largest list. So that
34    #it can accomadate all the lists.
35    maxnprim = max(basis.bfs_nprim)
36    bfs_coeffs = np.zeros([basis.bfs_nao, maxnprim])
37    bfs_expnts = np.zeros([basis.bfs_nao, maxnprim])
38    bfs_prim_norms = np.zeros([basis.bfs_nao, maxnprim])
39    for i in range(basis.bfs_nao):
40        for j in range(basis.bfs_nprim[i]):
41            bfs_coeffs[i,j] = basis.bfs_coeffs[i][j]
42            bfs_expnts[i,j] = basis.bfs_expnts[i][j]
43            bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j]
44        
45
46
47    if slice is None:
48        slice = [0, basis.bfs_nao, 0, basis.bfs_nao]
49        
50    #Limits for the calculation of overlap integrals
51    a = int(slice[0]) #row start index
52    b = int(slice[1]) #row end index
53    c = int(slice[2]) #column start index
54    d = int(slice[3]) #column end index
55
56    # print([a,b,c,d])
57
58    if sqrt_ints4c2e_diag is None:
59        #Create dummy array
60        sqrt_ints4c2e_diag = np.zeros((1,1), dtype=np.float64)
61        isSchwarz = False
62    else:
63        isSchwarz = True
64    
65    V = nuc_mat_grad_symm_internal(bfs_coords[0], bfs_contr_prim_norms[0], bfs_lmn[0], bfs_nprim[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, a, b, c, d, Z[0], coordsBohrs[0], natoms, sqrt_ints4c2e_diag, isSchwarz)
66    
67    return V 
def cross_overlap_mat_symm(basisA, basisB, slice=None):
  7def cross_overlap_mat_symm(basisA, basisB, slice=None):
  8    """
  9    Compute the overlap matrix between two distinct basis sets.
 10
 11    This function calculates the overlap integrals ⟨χ_i^A | χ_j^B⟩ between basis functions 
 12    from two different basis sets, `basisA` and `basisB`. It supports block-wise computation 
 13    of the matrix to save time and memory during large-scale calculations.
 14
 15    S_{μν}^{AB} = ⟨ χ_μ^A | χ_ν^B ⟩
 16
 17    All integrals are evaluated using a Numba-accelerated backend that converts 
 18    required basis set properties into NumPy arrays for efficient performance.
 19
 20    Parameters
 21    ----------
 22    basisA : object
 23        The first basis set object containing properties such as:
 24        - bfs_coords: Cartesian coordinates of centers
 25        - bfs_coeffs: Contraction coefficients
 26        - bfs_expnts: Gaussian exponents
 27        - bfs_prim_norms: Primitive normalization constants
 28        - bfs_contr_prim_norms: Contraction normalization constants
 29        - bfs_lmn: Angular momentum quantum numbers
 30        - bfs_nprim: Number of primitives per AO
 31        - bfs_nao: Total number of atomic orbitals
 32
 33    basisB : object
 34        The second basis set object, structured identically to `basisA`.
 35
 36    slice : list of int, optional
 37        A 4-element list `[start_row, end_row, start_col, end_col]` that defines
 38        a block of the full matrix to compute. Rows correspond to functions in `basisA`,
 39        and columns to those in `basisB`. If `None` (default), the full matrix is computed.
 40
 41    Returns
 42    -------
 43    S : ndarray of shape (end_row - start_row, end_col - start_col)
 44        The computed cross overlap (sub)matrix.
 45
 46    Notes
 47    -----
 48    The function handles contraction and normalization internally. Since Numba does not
 49    support jagged lists, the function reshapes data into padded 2D arrays based on
 50    the maximum number of primitives in either basis set.
 51
 52    Examples
 53    --------
 54    >>> S = cross_overlap_mat_symm(basisA, basisB)
 55    >>> S_block = cross_overlap_mat_symm(basisA, basisB, slice=[0, 5, 0, 10])
 56    """
 57    # Here the lists are converted to numpy arrays for better use with Numba.
 58    # Once these conversions are done we pass these to a Numba decorated
 59    # function that uses prange, etc. to calculate the matrix efficiently.
 60
 61    # This function calculates the overlap matrix between two basis objects: basisA and basisB.
 62    # The basis objects hold the information of basis functions like: exponents, coeffs, etc.
 63    # It is possible to only calculate a slice (block) of the complete matrix.
 64    # slice is a 4 element list whose first and second elements give the range of the rows (basisA) to be calculated.
 65    # the third and fourth element give the range of columns (basisB) to be calculated.
 66    # slice = [start_row, end_row, start_col, end_col]
 67    # The integrals are performed using the formulas
 68
 69    #We convert the required properties to numpy arrays as this is what Numba likes.
 70    bfsA_coords = np.array([basisA.bfs_coords])
 71    bfsA_contr_prim_norms = np.array([basisA.bfs_contr_prim_norms])
 72    bfsA_lmn = np.array([basisA.bfs_lmn])
 73    bfsA_nprim = np.array([basisA.bfs_nprim])
 74    
 75    bfsB_coords = np.array([basisB.bfs_coords])
 76    bfsB_contr_prim_norms = np.array([basisB.bfs_contr_prim_norms])
 77    bfsB_lmn = np.array([basisB.bfs_lmn])
 78    bfsB_nprim = np.array([basisB.bfs_nprim])
 79        
 80
 81    #The remaining properties like bfs_coeffs are a list of lists of unequal sizes.
 82    #Numba won't be able to work with these efficiently.
 83    #So, we convert them to a numpy 2d array by applying a trick,
 84    #that the second dimension is that of the largest list. So that
 85    #it can accomadate all the lists.
 86    maxnprimA = max(basisA.bfs_nprim)
 87    bfsA_coeffs = np.zeros([basisA.bfs_nao, maxnprimA])
 88    bfsA_expnts = np.zeros([basisA.bfs_nao, maxnprimA])
 89    bfsA_prim_norms = np.zeros([basisA.bfs_nao, maxnprimA])
 90    for i in range(basisA.bfs_nao):
 91        for j in range(basisA.bfs_nprim[i]):
 92            bfsA_coeffs[i,j] = basisA.bfs_coeffs[i][j]
 93            bfsA_expnts[i,j] = basisA.bfs_expnts[i][j]
 94            bfsA_prim_norms[i,j] = basisA.bfs_prim_norms[i][j]
 95    
 96    maxnprimB = max(basisB.bfs_nprim)
 97    bfsB_coeffs = np.zeros([basisB.bfs_nao, maxnprimB])
 98    bfsB_expnts = np.zeros([basisB.bfs_nao, maxnprimB])
 99    bfsB_prim_norms = np.zeros([basisB.bfs_nao, maxnprimB])
100    for i in range(basisB.bfs_nao):
101        for j in range(basisB.bfs_nprim[i]):
102            bfsB_coeffs[i,j] = basisB.bfs_coeffs[i][j]
103            bfsB_expnts[i,j] = basisB.bfs_expnts[i][j]
104            bfsB_prim_norms[i,j] = basisB.bfs_prim_norms[i][j]
105        
106
107    if slice is None:
108        slice = [0, basisA.bfs_nao, 0, basisB.bfs_nao]
109        
110    #Limits for the calculation of overlap integrals
111    a = int(slice[0]) #row start index (basisA)
112    b = int(slice[1]) #row end index (basisA)
113    c = int(slice[2]) #column start index (basisB)
114    d = int(slice[3]) #column end index (basisB)
115        
116    S = cross_overlap_mat_internal(bfsA_coords[0], bfsA_contr_prim_norms[0], bfsA_lmn[0], bfsA_nprim[0], bfsA_coeffs, bfsA_prim_norms, bfsA_expnts,
117                                       bfsB_coords[0], bfsB_contr_prim_norms[0], bfsB_lmn[0], bfsB_nprim[0], bfsB_coeffs, bfsB_prim_norms, bfsB_expnts,
118                                       a, b, c, d)
119    return S

Compute the overlap matrix between two distinct basis sets.

This function calculates the overlap integrals ⟨χ_i^A | χ_j^B⟩ between basis functions from two different basis sets, basisA and basisB. It supports block-wise computation of the matrix to save time and memory during large-scale calculations.

S_{μν}^{AB} = ⟨ χ_μ^A | χ_ν^B ⟩

All integrals are evaluated using a Numba-accelerated backend that converts required basis set properties into NumPy arrays for efficient performance.

Parameters

basisA : object The first basis set object containing properties such as: - bfs_coords: Cartesian coordinates of centers - bfs_coeffs: Contraction coefficients - bfs_expnts: Gaussian exponents - bfs_prim_norms: Primitive normalization constants - bfs_contr_prim_norms: Contraction normalization constants - bfs_lmn: Angular momentum quantum numbers - bfs_nprim: Number of primitives per AO - bfs_nao: Total number of atomic orbitals

basisB : object The second basis set object, structured identically to basisA.

slice : list of int, optional A 4-element list [start_row, end_row, start_col, end_col] that defines a block of the full matrix to compute. Rows correspond to functions in basisA, and columns to those in basisB. If None (default), the full matrix is computed.

Returns

S : ndarray of shape (end_row - start_row, end_col - start_col) The computed cross overlap (sub)matrix.

Notes

The function handles contraction and normalization internally. Since Numba does not support jagged lists, the function reshapes data into padded 2D arrays based on the maximum number of primitives in either basis set.

Examples

>>> S = cross_overlap_mat_symm(basisA, basisB)
>>> S_block = cross_overlap_mat_symm(basisA, basisB, slice=[0, 5, 0, 10])