pyfock.Integrals
This submodule provides highly optimized and symmetry-aware routines for evaluating all essential one-electron and two-electron integrals needed in electronic structure calculations. The integrals are computed over Gaussian type orbitals (GTOs), and both CPU (NumPy) and GPU (CuPy) implementations are available for many routines to accelerate quantum chemistry workflows.
Included modules and functionality:
- One-electron integrals:
- Overlap integrals
- Kinetic energy integrals
- Nuclear attraction integrals
- Dipole moment integrals
- Gradients of one-electron integrals
- Two-electron integrals:
- Full 4-center two electron repulsion integrals using Rys quadrature
- 3-center and 2-center two electron integrals for density fitting
- Schwarz screening utilities for integral pruning
- Exchange-correlation evaluation routines compatible with numerical grids
- GPU-accelerated (CuPy) versions for key performance-critical routines
- Modular helper functions for computing factorials, boys functions, contraction coefficients, etc.
Usage:
Example: from pyfock.Integrals import overlap_mat_symm S = overlap_mat_symm(basis)
1""" 2This submodule provides highly optimized and symmetry-aware routines for 3evaluating all essential one-electron and two-electron integrals needed in 4electronic structure calculations. The integrals are computed over Gaussian 5type orbitals (GTOs), and both CPU (NumPy) and GPU (CuPy) implementations are 6available for many routines to accelerate quantum chemistry workflows. 7 8Included modules and functionality: 9----------------------------------- 10- One-electron integrals: 11 * Overlap integrals 12 * Kinetic energy integrals 13 * Nuclear attraction integrals 14 * Dipole moment integrals 15 * Gradients of one-electron integrals 16- Two-electron integrals: 17 * Full 4-center two electron repulsion integrals using Rys quadrature 18 * 3-center and 2-center two electron integrals for density fitting 19 * Schwarz screening utilities for integral pruning 20- Exchange-correlation evaluation routines compatible with numerical grids 21- GPU-accelerated (CuPy) versions for key performance-critical routines 22- Modular helper functions for computing factorials, boys functions, contraction coefficients, etc. 23 24Usage: 25------ 26 27Example: 28 from pyfock.Integrals import overlap_mat_symm 29 S = overlap_mat_symm(basis) 30 31""" 32# from .integral_helpers import fac 33# from .integral_helpers import fastFactorial 34# from .integral_helpers import comb 35# from .integral_helpers import doublefactorial 36# from .integral_helpers import c2k 37# from .integral_helpers import calcS 38# from .integral_helpers import vlriPartial 39# from .integral_helpers import calcCgamminc 40# from .integral_helpers import Fboys 41from .mmd_nuc_mat_symm import mmd_nuc_mat_symm 42from .nuc_mat_symm import nuc_mat_symm 43from .nuc_mat_symm_cupy import nuc_mat_symm_cupy 44from .kin_mat_symm import kin_mat_symm 45from .kin_mat_symm_cupy import kin_mat_symm_cupy 46from .kin_mat_symm_shell_cupy import kin_mat_symm_shell_cupy 47from .overlap_mat_symm import overlap_mat_symm 48from .overlap_mat_symm_cupy import overlap_mat_symm_cupy 49from .cross_overlap_mat_symm import cross_overlap_mat_symm 50from .dipole_moment_mat_symm import dipole_moment_mat_symm 51from .dipole_moment_mat_symm_cupy import dipole_moment_mat_symm_cupy 52from .conv_4c2e_symm import conv_4c2e_symm 53from .mmd_4c2e_symm import mmd_4c2e_symm 54from .rys_4c2e_symm import rys_4c2e_symm, rys_4c2e_symm_old 55from .conv_3c2e_symm import conv_3c2e_symm 56from .rys_3c2e_symm import rys_3c2e_symm 57from .rys_3c2e_symm_cupy import rys_3c2e_symm_cupy 58from .rys_3c2e_symm_cupy_fp32 import rys_3c2e_symm_cupy_fp32 59from .conv_2c2e_symm import conv_2c2e_symm 60from .rys_2c2e_symm import rys_2c2e_symm 61from .rys_2c2e_symm_cupy import rys_2c2e_symm_cupy 62from .rys_3c2e_tri import rys_3c2e_tri 63from .rys_nuc_mat_symm import rys_nuc_mat_symm 64from .schwarz_helpers import rys_3c2e_tri_schwarz 65from .eval_xc_1 import eval_xc_1 66from .eval_xc_2 import eval_xc_2 67from .eval_xc_3 import eval_xc_3 68from .eval_xc_1_cupy import eval_xc_1_cupy 69from .eval_xc_2_cupy import eval_xc_2_cupy 70from .eval_xc_3_cupy import eval_xc_3_cupy 71from . import bf_val_helpers 72from . import rys_helpers 73from . import schwarz_helpers 74from . import schwarz_helpers_cupy 75from . import integral_helpers 76from .overlap_mat_grad_symm import overlap_mat_grad_symm 77from .kin_mat_grad_symm import kin_mat_grad_symm 78from .nuc_mat_grad_symm import nuc_mat_grad_symm 79 80# __all__ = ['fac', 'fastFactorial', 'comb', 'doublefactorial', 'c2k', 'calcS', 'vlriPartial', 'calcCgamminc', 'Fboys', 'comb'\ 81# , 'comb', 'comb', 'comb', 'comb', 'comb'] 82 83__all__ = ['integral_helpers', 'mmd_nuc_mat_symm', 'nuc_mat_symm', 'kin_mat_symm', 'overlap_mat_symm', 'conv_4c2e_symm', 'mmd_4c2e_symm', 'rys_helpers', 'rys_4c2e_symm',\ 84 'rys_4c2e_symm_old', 'conv_3c2e_symm', 'rys_3c2e_symm', 'conv_2c2e_symm', 'rys_2c2e_symm', 'rys_3c2e_tri', 'rys_nuc_mat_symm', 'schwarz_helpers', 'rys_3c2e_tri_schwarz'\ 85 'bf_val_helpers', 'eval_xc_1', 'eval_xc_2', 'eval_xc_3', 'eval_xc_1_cupy', 'eval_xc_2_cupy', 'eval_xc_3_cupy', 'dipole_moment_mat_symm', 'kin_mat_symm_cupy'\ 86 , 'overlap_mat_symm_cupy', 'dipole_moment_mat_symm_cupy', 'nuc_mat_symm_cupy', 'kin_mat_symm_shell_cupy', 'rys_2c2e_symm_cupy', 'rys_3c2e_symm_cupy',\ 87 'rys_3c2e_symm_cupy_fp32', 'schwarz_helpers_cupy', 'overlap_mat_grad_symm', 'kin_mat_grad_symm', 'nuc_mat_grad_symm', 'cross_overlap_mat_symm'] 88 89# # This code will import all of the modules in the library directory and expose all of 90# # the functions in those modules to the user. The user can then use the functions just 91# # like they would any other function in the library package: 92 93# # Keep in mind that this code will expose all functions in the library directory, 94# # including any private functions (i.e. functions that start with an underscore). 95# # If you only want to expose certain functions, 96# # you can modify the __all__ list to include only the functions that you want to expose. 97 98# import glob 99 100# # Get a list of all module filenames in the library directory 101# # module_filenames = glob.glob('pyfock/Integrals/*.py') 102# module_filenames = glob.glob('*.py') 103 104# print(module_filenames) 105 106# # Remove the `__init__.py` file from the list 107# # module_filenames.remove('pyfock/Integrals/__init__.py') 108# module_filenames.remove('__init__.py') 109 110# # Use a for loop to import all of the modules in the Integrals directory 111# for module_filename in module_filenames: 112# # module_filename = module_filename.replace("pyfock/Integrals/", "") 113# module_name = module_filename[:-3] 114# exec(f'from .{module_name} import *') 115 116# # Expose all of the functions in the __all__ variable 117# __all__ = [ 118# func for func in dir() 119# if func[0] != '_' and callable(eval(func)) 120# ]
7def mmd_nuc_mat_symm(basis, mol, slice=None): 8 #Here the lists are converted to numpy arrays for better use with Numba. 9 #Once these conversions are done we pass these to a Numba decorated 10 #function that uses prange, etc. to calculate the matrix efficiently. 11 12 # This function calculates the nuclear matrix for a given basis object. 13 # The basis object holds the information of basis functions like: exponents, coeffs, etc. 14 # It is possible to only calculate a slice (block) of the complete matrix. 15 # slice is a 4 element list whose first and second elements give the range of the rows to be calculated. 16 # the third and fourth element give the range of columns to be calculated. 17 # slice = [start_row, end_row, start_col, end_col] 18 # The integrals are performed using the formulas 19 20 #We convert the required properties to numpy arrays as this is what Numba likes. 21 bfs_coords = np.array([basis.bfs_coords]) 22 bfs_contr_prim_norms = np.array([basis.bfs_contr_prim_norms]) 23 bfs_lmn = np.array([basis.bfs_lmn]) 24 bfs_nprim = np.array([basis.bfs_nprim]) 25 coordsBohrs = np.array([mol.coordsBohrs]) 26 Z = np.array([mol.Zcharges]) 27 natoms = mol.natoms 28 29 30 #The remaining properties like bfs_coeffs are a list of lists of unequal sizes. 31 #Numba won't be able to work with these efficiently. 32 #So, we convert them to a numpy 2d array by applying a trick, 33 #that the second dimension is that of the largest list. So that 34 #it can accomadate all the lists. 35 maxnprim = max(basis.bfs_nprim) 36 bfs_coeffs = np.zeros([basis.bfs_nao, maxnprim]) 37 bfs_expnts = np.zeros([basis.bfs_nao, maxnprim]) 38 bfs_prim_norms = np.zeros([basis.bfs_nao, maxnprim]) 39 for i in range(basis.bfs_nao): 40 for j in range(basis.bfs_nprim[i]): 41 bfs_coeffs[i,j] = basis.bfs_coeffs[i][j] 42 bfs_expnts[i,j] = basis.bfs_expnts[i][j] 43 bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j] 44 45 46 47 if slice is None: 48 slice = [0, basis.bfs_nao, 0, basis.bfs_nao] 49 50 #Limits for the calculation of overlap integrals 51 a = int(slice[0]) #row start index 52 b = int(slice[1]) #row end index 53 c = int(slice[2]) #column start index 54 d = int(slice[3]) #column end index 55 56 # print([a,b,c,d]) 57 58 V = mmd_nuc_mat_symm_internal(bfs_coords[0], bfs_contr_prim_norms[0], bfs_lmn[0], bfs_nprim[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, a, b, c, d, Z[0], coordsBohrs[0], natoms) 59 60 return V
7def nuc_mat_symm(basis, mol, slice=None, sqrt_ints4c2e_diag=None): 8 """ 9 Compute the nuclear attraction matrix for a given basis and molecular geometry. 10 11 This function evaluates the one-electron nuclear attraction integrals 12 ⟨χ_i | V_nuc | χ_j⟩, where V_nuc is the Coulombic potential from the atomic nuclei. 13 The integrals are computed between all basis functions defined in `basis` and 14 and the nuclei within the `mol` object. 15 So in principle, it can be used to compute the nuclear potential matrix due to the 16 nuclei in one molecule, in the basis of another molecule. 17 18 Efficient Numba-accelerated routines are used to improve performance. 19 A partial block of the matrix can be computed by specifying a `slice`. 20 21 Parameters 22 ---------- 23 basis : object 24 Basis set object with fields including: 25 - bfs_coords: Cartesian coordinates of basis function centers. 26 - bfs_coeffs: Contraction coefficients. 27 - bfs_expnts: Gaussian exponents. 28 - bfs_prim_norms: Normalization constants for primitives. 29 - bfs_contr_prim_norms: Normalization constants for contracted functions. 30 - bfs_lmn: Angular momentum quantum numbers (ℓ, m, n). 31 - bfs_nprim: Number of primitives per basis function. 32 - bfs_nao: Number of atomic orbitals (contracted basis functions). 33 34 mol : object 35 Molecule object containing: 36 - coordsBohrs: Cartesian coordinates of nuclei (in Bohr). 37 - Zcharges: Nuclear charges. 38 - natoms: Number of atoms. 39 40 slice : list of int, optional 41 A 4-element list `[start_row, end_row, start_col, end_col]` defining 42 the sub-block of the matrix to compute. Rows and columns refer to AOs. 43 If `None` (default), the entire nuclear attraction matrix is computed. 44 45 sqrt_ints4c2e_diag : ndarray, optional 46 Optional Schwarz screening preconditioner. 47 If provided, Schwarz screening is enabled to skip insignificant integrals. 48 If `None` (default), all integrals are computed without screening. 49 50 Returns 51 ------- 52 V : ndarray of shape (end_row - start_row, end_col - start_col) 53 The computed (sub)matrix of nuclear attraction integrals. 54 55 Notes 56 ----- 57 The integrals are evaluated using standard expressions over contracted Gaussians. 58 If Schwarz screening is used (`sqrt_ints4c2e_diag` is not None), a threshold-based 59 pruning is applied using precomputed upper bounds. 60 61 Examples 62 -------- 63 >>> V = nuc_mat_symm(basis, mol) 64 >>> V_block = nuc_mat_symm(basis, mol, slice=[0, 10, 0, 10]) 65 >>> V_schwarz = nuc_mat_symm(basis, mol, sqrt_ints4c2e_diag=sqrt_diag) 66 """ 67 #Here the lists are converted to numpy arrays for better use with Numba. 68 #Once these conversions are done we pass these to a Numba decorated 69 #function that uses prange, etc. to calculate the matrix efficiently. 70 71 # This function calculates the nuclear matrix for a given basis object. 72 # The basis object holds the information of basis functions like: exponents, coeffs, etc. 73 # It is possible to only calculate a slice (block) of the complete matrix. 74 # slice is a 4 element list whose first and second elements give the range of the rows to be calculated. 75 # the third and fourth element give the range of columns to be calculated. 76 # slice = [start_row, end_row, start_col, end_col] 77 # The integrals are performed using the formulas 78 79 #We convert the required properties to numpy arrays as this is what Numba likes. 80 bfs_coords = np.array([basis.bfs_coords]) 81 bfs_contr_prim_norms = np.array([basis.bfs_contr_prim_norms]) 82 bfs_lmn = np.array([basis.bfs_lmn]) 83 bfs_nprim = np.array([basis.bfs_nprim]) 84 coordsBohrs = np.array([mol.coordsBohrs]) 85 Z = np.array([mol.Zcharges]) 86 natoms = mol.natoms 87 88 89 #The remaining properties like bfs_coeffs are a list of lists of unequal sizes. 90 #Numba won't be able to work with these efficiently. 91 #So, we convert them to a numpy 2d array by applying a trick, 92 #that the second dimension is that of the largest list. So that 93 #it can accomadate all the lists. 94 maxnprim = max(basis.bfs_nprim) 95 bfs_coeffs = np.zeros([basis.bfs_nao, maxnprim]) 96 bfs_expnts = np.zeros([basis.bfs_nao, maxnprim]) 97 bfs_prim_norms = np.zeros([basis.bfs_nao, maxnprim]) 98 for i in range(basis.bfs_nao): 99 for j in range(basis.bfs_nprim[i]): 100 bfs_coeffs[i,j] = basis.bfs_coeffs[i][j] 101 bfs_expnts[i,j] = basis.bfs_expnts[i][j] 102 bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j] 103 104 105 106 if slice is None: 107 slice = [0, basis.bfs_nao, 0, basis.bfs_nao] 108 109 #Limits for the calculation of overlap integrals 110 a = int(slice[0]) #row start index 111 b = int(slice[1]) #row end index 112 c = int(slice[2]) #column start index 113 d = int(slice[3]) #column end index 114 115 # print([a,b,c,d]) 116 117 if sqrt_ints4c2e_diag is None: 118 #Create dummy array 119 sqrt_ints4c2e_diag = np.zeros((1,1), dtype=np.float64) 120 isSchwarz = False 121 else: 122 isSchwarz = True 123 124 V = nuc_mat_symm_internal(bfs_coords[0], bfs_contr_prim_norms[0], bfs_lmn[0], bfs_nprim[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, a, b, c, d, Z[0], coordsBohrs[0], natoms, sqrt_ints4c2e_diag, isSchwarz) 125 126 return V
Compute the nuclear attraction matrix for a given basis and molecular geometry.
This function evaluates the one-electron nuclear attraction integrals
⟨χ_i | V_nuc | χ_j⟩, where V_nuc is the Coulombic potential from the atomic nuclei.
The integrals are computed between all basis functions defined in basis and
and the nuclei within the mol object.
So in principle, it can be used to compute the nuclear potential matrix due to the
nuclei in one molecule, in the basis of another molecule.
Efficient Numba-accelerated routines are used to improve performance.
A partial block of the matrix can be computed by specifying a slice.
Parameters
basis : object Basis set object with fields including: - bfs_coords: Cartesian coordinates of basis function centers. - bfs_coeffs: Contraction coefficients. - bfs_expnts: Gaussian exponents. - bfs_prim_norms: Normalization constants for primitives. - bfs_contr_prim_norms: Normalization constants for contracted functions. - bfs_lmn: Angular momentum quantum numbers (ℓ, m, n). - bfs_nprim: Number of primitives per basis function. - bfs_nao: Number of atomic orbitals (contracted basis functions).
mol : object Molecule object containing: - coordsBohrs: Cartesian coordinates of nuclei (in Bohr). - Zcharges: Nuclear charges. - natoms: Number of atoms.
slice : list of int, optional
A 4-element list [start_row, end_row, start_col, end_col] defining
the sub-block of the matrix to compute. Rows and columns refer to AOs.
If None (default), the entire nuclear attraction matrix is computed.
sqrt_ints4c2e_diag : ndarray, optional
Optional Schwarz screening preconditioner.
If provided, Schwarz screening is enabled to skip insignificant integrals.
If None (default), all integrals are computed without screening.
Returns
V : ndarray of shape (end_row - start_row, end_col - start_col) The computed (sub)matrix of nuclear attraction integrals.
Notes
The integrals are evaluated using standard expressions over contracted Gaussians.
If Schwarz screening is used (sqrt_ints4c2e_diag is not None), a threshold-based
pruning is applied using precomputed upper bounds.
Examples
>>> V = nuc_mat_symm(basis, mol)
>>> V_block = nuc_mat_symm(basis, mol, slice=[0, 10, 0, 10])
>>> V_schwarz = nuc_mat_symm(basis, mol, sqrt_ints4c2e_diag=sqrt_diag)
6def kin_mat_symm(basis, slice=None): 7 """ 8 Compute the kinetic energy matrix for a given basis set. 9 10 This function evaluates the one-electron kinetic energy integrals 11 ⟨χ_i | -½∇² | χ_j⟩ for a set of contracted Gaussian basis functions defined in `basis`. 12 It uses an optimized backend with improved memory layout, early termination, 13 and partial vectorization for enhanced performance. 14 15 Block-wise computation is supported through the optional `slice` parameter. 16 17 Parameters 18 ---------- 19 basis : object 20 Basis set object containing properties such as: 21 - bfs_coords: Cartesian coordinates of basis function centers. 22 - bfs_coeffs: Contraction coefficients. 23 - bfs_expnts: Gaussian exponents. 24 - bfs_prim_norms: Normalization constants for primitives. 25 - bfs_contr_prim_norms: Normalization constants for contracted functions. 26 - bfs_lmn: Angular momentum quantum numbers (ℓ, m, n). 27 - bfs_nprim: Number of primitives per basis function. 28 - bfs_nao: Number of atomic orbitals (contracted basis functions). 29 30 slice : list of int, optional 31 A 4-element list `[start_row, end_row, start_col, end_col]` specifying the 32 matrix block to compute. Rows and columns refer to AOs. If `None` (default), 33 the entire kinetic energy matrix is calculated. 34 35 Returns 36 ------- 37 T : ndarray of shape (end_row - start_row, end_col - start_col) 38 The computed (sub)matrix of kinetic energy integrals. 39 40 Notes 41 ----- 42 Internally, this function: 43 - Pads coefficient/exponent arrays for uniform shape and Numba compatibility. 44 - Reduces redundant operations and loops. 45 - Utilizes an optimized `kin_mat_symm_internal_optimized` backend. 46 47 Examples 48 -------- 49 >>> T = kin_mat_symm(basis) 50 >>> T_block = kin_mat_symm(basis, slice=[0, 10, 0, 10]) 51 """ 52 53 # Convert to numpy arrays (same as before) 54 bfs_coords = np.array(basis.bfs_coords) # Remove unnecessary list wrapping 55 bfs_contr_prim_norms = np.array(basis.bfs_contr_prim_norms) 56 bfs_lmn = np.array(basis.bfs_lmn) 57 bfs_nprim = np.array(basis.bfs_nprim) 58 59 # Optimize the coefficient/exponent arrays 60 maxnprim = max(basis.bfs_nprim) 61 bfs_coeffs = np.zeros((basis.bfs_nao, maxnprim)) 62 bfs_expnts = np.zeros((basis.bfs_nao, maxnprim)) 63 bfs_prim_norms = np.zeros((basis.bfs_nao, maxnprim)) 64 65 # Vectorized assignment (faster than nested loops) 66 for i in range(basis.bfs_nao): 67 n = basis.bfs_nprim[i] 68 bfs_coeffs[i, :n] = basis.bfs_coeffs[i] 69 bfs_expnts[i, :n] = basis.bfs_expnts[i] 70 bfs_prim_norms[i, :n] = basis.bfs_prim_norms[i] 71 72 if slice is None: 73 slice = [0, basis.bfs_nao, 0, basis.bfs_nao] 74 75 a, b, c, d = map(int, slice) 76 77 T = kin_mat_symm_internal_optimized( 78 bfs_coords, bfs_contr_prim_norms, bfs_lmn, bfs_nprim, 79 bfs_coeffs, bfs_prim_norms, bfs_expnts, a, b, c, d 80 ) 81 82 return T
Compute the kinetic energy matrix for a given basis set.
This function evaluates the one-electron kinetic energy integrals
⟨χ_i | -½∇² | χ_j⟩ for a set of contracted Gaussian basis functions defined in basis.
It uses an optimized backend with improved memory layout, early termination,
and partial vectorization for enhanced performance.
Block-wise computation is supported through the optional slice parameter.
Parameters
basis : object Basis set object containing properties such as: - bfs_coords: Cartesian coordinates of basis function centers. - bfs_coeffs: Contraction coefficients. - bfs_expnts: Gaussian exponents. - bfs_prim_norms: Normalization constants for primitives. - bfs_contr_prim_norms: Normalization constants for contracted functions. - bfs_lmn: Angular momentum quantum numbers (ℓ, m, n). - bfs_nprim: Number of primitives per basis function. - bfs_nao: Number of atomic orbitals (contracted basis functions).
slice : list of int, optional
A 4-element list [start_row, end_row, start_col, end_col] specifying the
matrix block to compute. Rows and columns refer to AOs. If None (default),
the entire kinetic energy matrix is calculated.
Returns
T : ndarray of shape (end_row - start_row, end_col - start_col) The computed (sub)matrix of kinetic energy integrals.
Notes
Internally, this function:
- Pads coefficient/exponent arrays for uniform shape and Numba compatibility.
- Reduces redundant operations and loops.
- Utilizes an optimized
kin_mat_symm_internal_optimizedbackend.
Examples
>>> T = kin_mat_symm(basis)
>>> T_block = kin_mat_symm(basis, slice=[0, 10, 0, 10])
7def overlap_mat_symm(basis, slice=None): 8 """ 9 Compute the overlap matrix for a given basis set using symmetry-aware integrals. 10 11 This function evaluates the overlap integrals ⟨χ_i | χ_j⟩ between all pairs 12 of basis functions defined in the `basis` object. It supports partial evaluation 13 of the overlap matrix by specifying a slice. 14 15 The integrals are computed using an efficient Numba-accelerated backend that 16 benefits from parallelization via `prange` and preprocessed NumPy arrays. 17 18 Parameters 19 ---------- 20 basis : object 21 A basis set object that contains information about basis functions, such as: 22 - bfs_coords: Cartesian coordinates of the basis function centers. 23 - bfs_coeffs: Contraction coefficients. 24 - bfs_expnts: Gaussian exponents. 25 - bfs_prim_norms: Primitive normalization constants. 26 - bfs_contr_prim_norms: Contraction normalization factors. 27 - bfs_lmn: Angular momentum quantum numbers (ℓ, m, n). 28 - bfs_nprim: Number of primitives per basis function. 29 - bfs_nao: Total number of atomic orbitals. 30 31 slice : list of int, optional 32 A 4-element list specifying a sub-block of the matrix to compute: 33 [start_row, end_row, start_col, end_col]. 34 If None (default), the full overlap matrix is computed. 35 36 Returns 37 ------- 38 S : ndarray of shape (end_row - start_row, end_col - start_col) 39 The computed (sub)matrix of overlap integrals. 40 41 Notes 42 ----- 43 This function is optimized for performance using preallocated NumPy arrays 44 and avoids nested Python lists which are not supported efficiently by Numba. 45 46 Examples 47 -------- 48 >>> S = overlap_mat_symm(basis) 49 >>> S_block = overlap_mat_symm(basis, slice=[0, 10, 0, 10]) 50 """ 51 # Here the lists are converted to numpy arrays for better use with Numba. 52 # Once these conversions are done we pass these to a Numba decorated 53 # function that uses prange, etc. to calculate the matrix efficiently. 54 55 # This function calculates the overlap matrix for a given basis object. 56 # The basis object holds the information of basis functions like: exponents, coeffs, etc. 57 # It is possible to only calculate a slice (block) of the complete matrix. 58 # slice is a 4 element list whose first and second elements give the range of the rows to be calculated. 59 # the third and fourth element give the range of columns to be calculated. 60 # slice = [start_row, end_row, start_col, end_col] 61 # The integrals are performed using the formulas 62 63 #We convert the required properties to numpy arrays as this is what Numba likes. 64 bfs_coords = np.array([basis.bfs_coords]) 65 bfs_contr_prim_norms = np.array([basis.bfs_contr_prim_norms]) 66 bfs_lmn = np.array([basis.bfs_lmn]) 67 bfs_nprim = np.array([basis.bfs_nprim]) 68 69 70 #The remaining properties like bfs_coeffs are a list of lists of unequal sizes. 71 #Numba won't be able to work with these efficiently. 72 #So, we convert them to a numpy 2d array by applying a trick, 73 #that the second dimension is that of the largest list. So that 74 #it can accomadate all the lists. 75 maxnprim = max(basis.bfs_nprim) 76 bfs_coeffs = np.zeros([basis.bfs_nao, maxnprim]) 77 bfs_expnts = np.zeros([basis.bfs_nao, maxnprim]) 78 bfs_prim_norms = np.zeros([basis.bfs_nao, maxnprim]) 79 for i in range(basis.bfs_nao): 80 for j in range(basis.bfs_nprim[i]): 81 bfs_coeffs[i,j] = basis.bfs_coeffs[i][j] 82 bfs_expnts[i,j] = basis.bfs_expnts[i][j] 83 bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j] 84 85 86 if slice is None: 87 slice = [0, basis.bfs_nao, 0, basis.bfs_nao] 88 89 #Limits for the calculation of overlap integrals 90 a = int(slice[0]) #row start index 91 b = int(slice[1]) #row end index 92 c = int(slice[2]) #column start index 93 d = int(slice[3]) #column end index 94 95 S = overlap_mat_symm_internal(bfs_coords[0], bfs_contr_prim_norms[0], bfs_lmn[0], bfs_nprim[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, a, b, c, d) 96 return S
Compute the overlap matrix for a given basis set using symmetry-aware integrals.
This function evaluates the overlap integrals ⟨χ_i | χ_j⟩ between all pairs
of basis functions defined in the basis object. It supports partial evaluation
of the overlap matrix by specifying a slice.
The integrals are computed using an efficient Numba-accelerated backend that
benefits from parallelization via prange and preprocessed NumPy arrays.
Parameters
basis : object A basis set object that contains information about basis functions, such as: - bfs_coords: Cartesian coordinates of the basis function centers. - bfs_coeffs: Contraction coefficients. - bfs_expnts: Gaussian exponents. - bfs_prim_norms: Primitive normalization constants. - bfs_contr_prim_norms: Contraction normalization factors. - bfs_lmn: Angular momentum quantum numbers (ℓ, m, n). - bfs_nprim: Number of primitives per basis function. - bfs_nao: Total number of atomic orbitals.
slice : list of int, optional A 4-element list specifying a sub-block of the matrix to compute: [start_row, end_row, start_col, end_col]. If None (default), the full overlap matrix is computed.
Returns
S : ndarray of shape (end_row - start_row, end_col - start_col) The computed (sub)matrix of overlap integrals.
Notes
This function is optimized for performance using preallocated NumPy arrays and avoids nested Python lists which are not supported efficiently by Numba.
Examples
>>> S = overlap_mat_symm(basis)
>>> S_block = overlap_mat_symm(basis, slice=[0, 10, 0, 10])
7def conv_4c2e_symm(basis, slice=None): 8 """ 9 Compute four-center two-electron (4c2e) electron repulsion integrals (ERIs) 10 using the conventional and slow (analytical) formula-based method with full symmetry exploitation. 11 12 This function evaluates integrals of the form (A B | C D), where 13 A, B, C, D are basis functions from the same primary basis set. 14 15 The "conv" variant uses explicit analytical integral formulas and 16 nested loops over primitive Gaussians, following the derivations in: 17 18 J. Chem. Educ. 2018, 95, 9, 1572–1578 19 https://pubs.acs.org/doi/full/10.1021/acs.jchemed.8b00255 20 21 Compared to Rys quadrature, this conventional approach is much more 22 computationally expensive but exact for the given formula set, making 23 it useful for validation and special-purpose computations. 24 25 Symmetries exploited 26 -------------------- 27 For 4c2e integrals, the full 8-fold permutational symmetry is used: 28 29 (A B | C D) = (B A | C D) = (A B | D C) = (B A | D C) 30 = (C D | A B) = (D C | A B) = (C D | B A) = (D C | B A) 31 32 This reduces the number of independent integrals from: 33 N_bf^4 → N_bf*(N_bf+1)/2 * N_bf*(N_bf+1)/2 34 when computing the full tensor. 35 36 Parameters 37 ---------- 38 basis : object 39 Basis set object containing: 40 - bfs_coords : Cartesian coordinates of basis function centers 41 - bfs_coeffs : Contraction coefficients 42 - bfs_expnts : Gaussian exponents 43 - bfs_prim_norms : Primitive normalization constants 44 - bfs_contr_prim_norms : Contraction normalization factors 45 - bfs_lmn : Angular momentum quantum numbers (ℓ, m, n) 46 - bfs_nprim : Number of primitives per basis function 47 - bfs_nao : Total number of atomic orbitals 48 49 slice : list of int, optional 50 An 8-element list specifying a sub-block of integrals to compute: 51 [start_A, end_A, start_B, end_B, start_C, end_C, start_D, end_D] 52 If None (default), computes the full (Nbf, Nbf, Nbf, Nbf) tensor. 53 54 **Note:** When slices are used, symmetry exploitation is restricted 55 to permutations that lie entirely within the specified slice. 56 57 Returns 58 ------- 59 ints4c2e : ndarray 60 The computed 4-center 2-electron integrals for the requested range. 61 62 Notes 63 ----- 64 - All basis set data are pre-packed into NumPy arrays for Numba acceleration. 65 - Uses explicit primitive Gaussian formula evaluation without numerical quadrature. 66 - Symmetry exploitation is maximal only for full-range computations. 67 - Conventional evaluation scales as O(N_bf^4) without symmetry, but 68 symmetry reduces the number of computations drastically. 69 70 Examples 71 -------- 72 >>> eri_full = conv_4c2e_symm(basis) 73 >>> eri_block = conv_4c2e_symm(basis, slice=[0,5, 0,5, 0,5, 0,5]) 74 """ 75 # Here the lists are converted to numpy arrays for better use with Numba. 76 # Once these conversions are done we pass these to a Numba decorated 77 # function that uses prange, etc. to calculate the 4c2e integrals efficiently. 78 # This function calculates the 4c2e electron-electron ERIs for a given basis object. 79 # The basis object holds the information of basis functions like: exponents, coeffs, etc. 80 81 # The integrals are performed using the formulas https://pubs.acs.org/doi/full/10.1021/acs.jchemed.8b00255 82 83 # It is possible to only calculate a slice (block/subset) of the complete set of integrals. 84 # slice is an 8 element list whose first and second elements give the range of the A functions to be calculated. 85 # and so on. 86 87 #We convert the required properties to numpy arrays as this is what Numba likes. 88 bfs_coords = np.array([basis.bfs_coords]) 89 bfs_contr_prim_norms = np.array([basis.bfs_contr_prim_norms]) 90 bfs_lmn = np.array([basis.bfs_lmn]) 91 bfs_nprim = np.array([basis.bfs_nprim]) 92 93 94 #The remaining properties like bfs_coeffs are a list of lists of unequal sizes. 95 #Numba won't be able to work with these efficiently. 96 #So, we convert them to a numpy 2d array by applying a trick, 97 #that the second dimension is that of the largest list. So that 98 #it can accomadate all the lists. 99 maxnprim = max(basis.bfs_nprim) 100 bfs_coeffs = np.zeros([basis.bfs_nao, maxnprim]) 101 bfs_expnts = np.zeros([basis.bfs_nao, maxnprim]) 102 bfs_prim_norms = np.zeros([basis.bfs_nao, maxnprim]) 103 for i in range(basis.bfs_nao): 104 for j in range(basis.bfs_nprim[i]): 105 bfs_coeffs[i,j] = basis.bfs_coeffs[i][j] 106 bfs_expnts[i,j] = basis.bfs_expnts[i][j] 107 bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j] 108 109 110 111 112 if slice is None: 113 slice = [0, basis.bfs_nao, 0, basis.bfs_nao, 0, basis.bfs_nao, 0, basis.bfs_nao] 114 115 #Limits for the calculation of 4c2e integrals 116 indx_startA = int(slice[0]) 117 indx_endA = int(slice[1]) 118 indx_startB = int(slice[2]) 119 indx_endB = int(slice[3]) 120 indx_startC = int(slice[4]) 121 indx_endC = int(slice[5]) 122 indx_startD = int(slice[6]) 123 indx_endD = int(slice[7]) 124 125 ints4c2e = conv_4c2e_symm_internal(bfs_coords[0], bfs_contr_prim_norms[0], bfs_lmn[0], bfs_nprim[0], bfs_coeffs, bfs_prim_norms, bfs_expnts,\ 126 indx_startA,indx_endA, indx_startB, indx_endB, indx_startC, indx_endC, indx_startD,indx_endD) 127 128 return ints4c2e
Compute four-center two-electron (4c2e) electron repulsion integrals (ERIs) using the conventional and slow (analytical) formula-based method with full symmetry exploitation.
This function evaluates integrals of the form (A B | C D), where A, B, C, D are basis functions from the same primary basis set.
The "conv" variant uses explicit analytical integral formulas and nested loops over primitive Gaussians, following the derivations in:
J. Chem. Educ. 2018, 95, 9, 1572–1578
https://pubs.acs.org/doi/full/10.1021/acs.jchemed.8b00255
Compared to Rys quadrature, this conventional approach is much more computationally expensive but exact for the given formula set, making it useful for validation and special-purpose computations.
Symmetries exploited
For 4c2e integrals, the full 8-fold permutational symmetry is used:
(A B | C D) = (B A | C D) = (A B | D C) = (B A | D C)
= (C D | A B) = (D C | A B) = (C D | B A) = (D C | B A)
This reduces the number of independent integrals from: N_bf^4 → N_bf(N_bf+1)/2 * N_bf(N_bf+1)/2 when computing the full tensor.
Parameters
basis : object Basis set object containing: - bfs_coords : Cartesian coordinates of basis function centers - bfs_coeffs : Contraction coefficients - bfs_expnts : Gaussian exponents - bfs_prim_norms : Primitive normalization constants - bfs_contr_prim_norms : Contraction normalization factors - bfs_lmn : Angular momentum quantum numbers (ℓ, m, n) - bfs_nprim : Number of primitives per basis function - bfs_nao : Total number of atomic orbitals
slice : list of int, optional An 8-element list specifying a sub-block of integrals to compute: [start_A, end_A, start_B, end_B, start_C, end_C, start_D, end_D] If None (default), computes the full (Nbf, Nbf, Nbf, Nbf) tensor.
**Note:** When slices are used, symmetry exploitation is restricted
to permutations that lie entirely within the specified slice.
Returns
ints4c2e : ndarray The computed 4-center 2-electron integrals for the requested range.
Notes
- All basis set data are pre-packed into NumPy arrays for Numba acceleration.
- Uses explicit primitive Gaussian formula evaluation without numerical quadrature.
- Symmetry exploitation is maximal only for full-range computations.
- Conventional evaluation scales as O(N_bf^4) without symmetry, but symmetry reduces the number of computations drastically.
Examples
>>> eri_full = conv_4c2e_symm(basis)
>>> eri_block = conv_4c2e_symm(basis, slice=[0,5, 0,5, 0,5, 0,5])
7def mmd_4c2e_symm(basis, slice=None): 8 # Here the lists are converted to numpy arrays for better use with Numba. 9 # Once these conversions are done we pass these to a Numba decorated 10 # function that uses prange, etc. to calculate the 4c2e integrals efficiently. 11 # This function calculates the 4c2e electron-electron ERIs for a given basis object. 12 # The basis object holds the information of basis functions like: exponents, coeffs, etc. 13 14 # The integrals are performed using the formulas https://pubs.acs.org/doi/full/10.1021/acs.jchemed.8b00255 15 16 # It is possible to only calculate a slice (block/subset) of the complete set of integrals. 17 # slice is an 8 element list whose first and second elements give the range of the A functions to be calculated. 18 # and so on. 19 20 #We convert the required properties to numpy arrays as this is what Numba likes. 21 bfs_coords = np.array([basis.bfs_coords]) 22 bfs_contr_prim_norms = np.array([basis.bfs_contr_prim_norms]) 23 bfs_lmn = np.array([basis.bfs_lmn]) 24 bfs_nprim = np.array([basis.bfs_nprim]) 25 26 27 #The remaining properties like bfs_coeffs are a list of lists of unequal sizes. 28 #Numba won't be able to work with these efficiently. 29 #So, we convert them to a numpy 2d array by applying a trick, 30 #that the second dimension is that of the largest list. So that 31 #it can accomadate all the lists. 32 maxnprim = max(basis.bfs_nprim) 33 bfs_coeffs = np.zeros([basis.bfs_nao, maxnprim]) 34 bfs_expnts = np.zeros([basis.bfs_nao, maxnprim]) 35 bfs_prim_norms = np.zeros([basis.bfs_nao, maxnprim]) 36 for i in range(basis.bfs_nao): 37 for j in range(basis.bfs_nprim[i]): 38 bfs_coeffs[i,j] = basis.bfs_coeffs[i][j] 39 bfs_expnts[i,j] = basis.bfs_expnts[i][j] 40 bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j] 41 42 43 44 45 if slice is None: 46 slice = [0, basis.bfs_nao, 0, basis.bfs_nao, 0, basis.bfs_nao, 0, basis.bfs_nao] 47 48 #Limits for the calculation of 4c2e integrals 49 indx_startA = int(slice[0]) 50 indx_endA = int(slice[1]) 51 indx_startB = int(slice[2]) 52 indx_endB = int(slice[3]) 53 indx_startC = int(slice[4]) 54 indx_endC = int(slice[5]) 55 indx_startD = int(slice[6]) 56 indx_endD = int(slice[7]) 57 58 ints4c2e = mmd_4c2e_symm_internal2(bfs_coords[0], bfs_contr_prim_norms[0], bfs_lmn[0], bfs_nprim[0], bfs_coeffs, bfs_prim_norms, bfs_expnts,\ 59 indx_startA,indx_endA, indx_startB, indx_endB, indx_startC, indx_endC, indx_startD,indx_endD) 60 61 return ints4c2e
8def rys_4c2e_symm(basis, slice=None): 9 """ 10 Compute four-center two-electron (4c2e) electron repulsion integrals (ERIs) 11 using the Rys quadrature method with exploitation of 8-fold permutational 12 symmetry. 13 14 This function evaluates integrals of the form (A B | C D), where A, B, C, D 15 are basis functions from the same primary basis set. It uses Numba-accelerated 16 backends and symmetry-aware optimizations to reduce computational cost. 17 18 Symmetries exploited 19 -------------------- 20 The 4c2e integrals obey the following permutational symmetry relations: 21 22 (A B | C D) = (B A | C D) # exchange within bra 23 = (A B | D C) # exchange within ket 24 = (B A | D C) 25 = (C D | A B) # bra ↔ ket exchange 26 = (D C | A B) 27 = (C D | B A) 28 = (D C | B A) 29 30 These 8 equivalent permutations mean that only a subset of integrals 31 needs to be explicitly computed; the rest can be obtained by symmetry. 32 33 This reduces the number of independent integrals from: 34 N_bf^4 → N_bf*(N_bf+1)/2 * N_bf*(N_bf+1)/2 35 when computing the full tensor. 36 37 Parameters 38 ---------- 39 basis : object 40 Primary basis set object containing: 41 - bfs_coords : Cartesian coordinates of basis function centers. 42 - bfs_coeffs : Contraction coefficients. 43 - bfs_expnts : Gaussian exponents. 44 - bfs_prim_norms : Primitive normalization constants. 45 - bfs_contr_prim_norms : Contraction normalization factors. 46 - bfs_lmn : Angular momentum quantum numbers (ℓ, m, n). 47 - bfs_nprim : Number of primitives per basis function. 48 - bfs_shell_index : Index of the shell each basis function belongs to. 49 - bfs_nao : Total number of atomic orbitals. 50 51 slice : list of int, optional 52 An 8-element list specifying a sub-block of integrals to compute: 53 [start_A, end_A, start_B, end_B, start_C, end_C, start_D, end_D] 54 If None (default), computes the full (Nbf, Nbf, Nbf, Nbf) block. 55 56 **Note:** When slices are used, not all 8-fold symmetries may be 57 available because the requested block may not contain all 58 symmetric permutations. In such cases, only the applicable 59 symmetries are used. 60 61 Returns 62 ------- 63 ints4c2e : ndarray 64 The computed 4-center 2-electron integrals for the requested range. 65 66 Notes 67 ----- 68 - Precomputes and stores basis set data in NumPy arrays for Numba efficiency. 69 - Exploits all possible symmetry permutations when the full tensor is 70 computed (no slice) to reduce redundant work. 71 - If a slice is specified, symmetry exploitation is limited to the 72 permutations that fall within the slice's index ranges. 73 74 Examples 75 -------- 76 >>> eri_full = rys_4c2e_symm(basis) 77 >>> eri_block = rys_4c2e_symm(basis, slice=[0,5, 0,5, 0,5, 0,5]) 78 """ 79 # Here the lists are converted to numpy arrays for better use with Numba. 80 # Once these conversions are done we pass these to a Numba decorated 81 # function that uses prange, etc. to calculate the 4c2e integrals efficiently. 82 # This function calculates the 4c2e electron-electron ERIs for a given basis object. 83 # The basis object holds the information of basis functions like: exponents, coeffs, etc. 84 85 # It is possible to only calculate a slice (block/subset) of the complete set of integrals. 86 # slice is an 8 element list whose first and second elements give the range of the A functions to be calculated. 87 # and so on. 88 89 #We convert the required properties to numpy arrays as this is what Numba likes. 90 bfs_coords = np.array([basis.bfs_coords]) 91 bfs_contr_prim_norms = np.array([basis.bfs_contr_prim_norms]) 92 bfs_lmn = np.array([basis.bfs_lmn]) 93 bfs_nprim = np.array([basis.bfs_nprim]) 94 95 96 #The remaining properties like bfs_coeffs are a list of lists of unequal sizes. 97 #Numba won't be able to work with these efficiently. 98 #So, we convert them to a numpy 2d array by applying a trick, 99 #that the second dimension is that of the largest list. So that 100 #it can accomadate all the lists. 101 maxnprim = max(basis.bfs_nprim) 102 bfs_coeffs = np.zeros([basis.bfs_nao, maxnprim]) 103 bfs_expnts = np.zeros([basis.bfs_nao, maxnprim]) 104 bfs_prim_norms = np.zeros([basis.bfs_nao, maxnprim]) 105 shell_indices = np.array([basis.bfs_shell_index], dtype=np.uint16)[0] 106 for i in range(basis.bfs_nao): 107 for j in range(basis.bfs_nprim[i]): 108 bfs_coeffs[i,j] = basis.bfs_coeffs[i][j] 109 bfs_expnts[i,j] = basis.bfs_expnts[i][j] 110 bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j] 111 112 113 114 115 if slice is None: 116 slice = [0, basis.bfs_nao, 0, basis.bfs_nao, 0, basis.bfs_nao, 0, basis.bfs_nao] 117 118 #Limits for the calculation of 4c2e integrals 119 indx_startA = int(slice[0]) 120 indx_endA = int(slice[1]) 121 indx_startB = int(slice[2]) 122 indx_endB = int(slice[3]) 123 indx_startC = int(slice[4]) 124 indx_endC = int(slice[5]) 125 indx_startD = int(slice[6]) 126 indx_endD = int(slice[7]) 127 128 129 ints4c2e = rys_4c2e_symm_internal(bfs_coords[0], bfs_contr_prim_norms[0], bfs_lmn[0], bfs_nprim[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, shell_indices,\ 130 indx_startA,indx_endA, indx_startB, indx_endB, indx_startC, indx_endC, indx_startD,indx_endD) 131 132 return ints4c2e
Compute four-center two-electron (4c2e) electron repulsion integrals (ERIs) using the Rys quadrature method with exploitation of 8-fold permutational symmetry.
This function evaluates integrals of the form (A B | C D), where A, B, C, D are basis functions from the same primary basis set. It uses Numba-accelerated backends and symmetry-aware optimizations to reduce computational cost.
Symmetries exploited
The 4c2e integrals obey the following permutational symmetry relations:
(A B | C D) = (B A | C D) # exchange within bra
= (A B | D C) # exchange within ket
= (B A | D C)
= (C D | A B) # bra ↔ ket exchange
= (D C | A B)
= (C D | B A)
= (D C | B A)
These 8 equivalent permutations mean that only a subset of integrals needs to be explicitly computed; the rest can be obtained by symmetry.
This reduces the number of independent integrals from: N_bf^4 → N_bf(N_bf+1)/2 * N_bf(N_bf+1)/2 when computing the full tensor.
Parameters
basis : object Primary basis set object containing: - bfs_coords : Cartesian coordinates of basis function centers. - bfs_coeffs : Contraction coefficients. - bfs_expnts : Gaussian exponents. - bfs_prim_norms : Primitive normalization constants. - bfs_contr_prim_norms : Contraction normalization factors. - bfs_lmn : Angular momentum quantum numbers (ℓ, m, n). - bfs_nprim : Number of primitives per basis function. - bfs_shell_index : Index of the shell each basis function belongs to. - bfs_nao : Total number of atomic orbitals.
slice : list of int, optional An 8-element list specifying a sub-block of integrals to compute: [start_A, end_A, start_B, end_B, start_C, end_C, start_D, end_D] If None (default), computes the full (Nbf, Nbf, Nbf, Nbf) block.
**Note:** When slices are used, not all 8-fold symmetries may be
available because the requested block may not contain all
symmetric permutations. In such cases, only the applicable
symmetries are used.
Returns
ints4c2e : ndarray The computed 4-center 2-electron integrals for the requested range.
Notes
- Precomputes and stores basis set data in NumPy arrays for Numba efficiency.
- Exploits all possible symmetry permutations when the full tensor is computed (no slice) to reduce redundant work.
- If a slice is specified, symmetry exploitation is limited to the permutations that fall within the slice's index ranges.
Examples
>>> eri_full = rys_4c2e_symm(basis)
>>> eri_block = rys_4c2e_symm(basis, slice=[0,5, 0,5, 0,5, 0,5])
416def rys_4c2e_symm_old(basis, slice=None): 417 # Here the lists are converted to numpy arrays for better use with Numba. 418 # Once these conversions are done we pass these to a Numba decorated 419 # function that uses prange, etc. to calculate the 4c2e integrals efficiently. 420 # This function calculates the 4c2e electron-electron ERIs for a given basis object. 421 # The basis object holds the information of basis functions like: exponents, coeffs, etc. 422 423 # It is possible to only calculate a slice (block/subset) of the complete set of integrals. 424 # slice is an 8 element list whose first and second elements give the range of the A functions to be calculated. 425 # and so on. 426 427 #We convert the required properties to numpy arrays as this is what Numba likes. 428 bfs_coords = np.array([basis.bfs_coords]) 429 bfs_contr_prim_norms = np.array([basis.bfs_contr_prim_norms]) 430 bfs_lmn = np.array([basis.bfs_lmn]) 431 bfs_nprim = np.array([basis.bfs_nprim]) 432 433 434 #The remaining properties like bfs_coeffs are a list of lists of unequal sizes. 435 #Numba won't be able to work with these efficiently. 436 #So, we convert them to a numpy 2d array by applying a trick, 437 #that the second dimension is that of the largest list. So that 438 #it can accomadate all the lists. 439 maxnprim = max(basis.bfs_nprim) 440 bfs_coeffs = np.zeros([basis.bfs_nao, maxnprim]) 441 bfs_expnts = np.zeros([basis.bfs_nao, maxnprim]) 442 bfs_prim_norms = np.zeros([basis.bfs_nao, maxnprim]) 443 for i in range(basis.bfs_nao): 444 for j in range(basis.bfs_nprim[i]): 445 bfs_coeffs[i,j] = basis.bfs_coeffs[i][j] 446 bfs_expnts[i,j] = basis.bfs_expnts[i][j] 447 bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j] 448 449 450 451 452 if slice is None: 453 slice = [0, basis.bfs_nao, 0, basis.bfs_nao, 0, basis.bfs_nao, 0, basis.bfs_nao] 454 455 #Limits for the calculation of 4c2e integrals 456 indx_startA = int(slice[0]) 457 indx_endA = int(slice[1]) 458 indx_startB = int(slice[2]) 459 indx_endB = int(slice[3]) 460 indx_startC = int(slice[4]) 461 indx_endC = int(slice[5]) 462 indx_startD = int(slice[6]) 463 indx_endD = int(slice[7]) 464 465 ints4c2e = rys_4c2e_symm_internal_old(bfs_coords[0], bfs_contr_prim_norms[0], bfs_lmn[0], bfs_nprim[0], bfs_coeffs, bfs_prim_norms, bfs_expnts,\ 466 indx_startA,indx_endA, indx_startB, indx_endB, indx_startC, indx_endC, indx_startD,indx_endD) 467 468 return ints4c2e
7def conv_3c2e_symm(basis, auxbasis, slice=None): 8 """ 9 Compute three-center two-electron (3c2e) electron repulsion integrals (ERIs) 10 using the conventional and slow (analytical) formula-based method with symmetry exploitation. 11 12 This function evaluates integrals of the form (A B | C), where: 13 - A and B are primary basis functions from `basis` 14 - C is an auxiliary basis function from `auxbasis` 15 16 The "conv" variant uses explicit analytical integral formulas and nested 17 loops over primitive Gaussians, following the derivations in: 18 19 J. Chem. Educ. 2018, 95, 9, 1572–1578 20 https://pubs.acs.org/doi/full/10.1021/acs.jchemed.8b00255 21 22 Compared to the Rys quadrature method, this conventional approach is 23 significantly more computationally expensive, but can be useful for 24 validation or when Rys is not applicable. 25 26 Symmetries exploited 27 -------------------- 28 For 3c2e integrals, the following bra symmetry is used: 29 30 (A B | C) = (B A | C) 31 32 This reduces the number of computed integrals from N_bf² * N_aux 33 to N_bf*(N_bf+1)/2 * N_aux when the full tensor is computed. 34 35 Parameters 36 ---------- 37 basis : object 38 Primary basis set object containing: 39 - bfs_coords : Cartesian coordinates of basis function centers 40 - bfs_coeffs : Contraction coefficients 41 - bfs_expnts : Gaussian exponents 42 - bfs_prim_norms : Primitive normalization constants 43 - bfs_contr_prim_norms : Contraction normalization factors 44 - bfs_lmn : Angular momentum quantum numbers (ℓ, m, n) 45 - bfs_nprim : Number of primitives per basis function 46 - bfs_nao : Total number of atomic orbitals 47 48 auxbasis : object 49 Auxiliary basis set object with the same attributes as `basis`. 50 51 slice : list of int, optional 52 A 6-element list specifying a sub-block of integrals to compute: 53 [start_A, end_A, start_B, end_B, start_C, end_C] 54 If None (default), computes the full (Nbf, Nbf, Naux) tensor. 55 56 **Note:** When slices are used, AB symmetry exploitation is limited 57 to permutations that lie entirely within the specified slice. 58 59 Returns 60 ------- 61 ints3c2e : ndarray 62 The computed 3-center 2-electron integrals for the requested range. 63 Shape: (Nbf, Nbf, Nauxbf) or 64 (end_A - start_A, end_B - start_B, end_C - start_C) 65 if slice is given. 66 67 Notes 68 ----- 69 - All basis set data are pre-packed into NumPy arrays for Numba acceleration. 70 - Uses explicit primitive Gaussian formula evaluation without numerical quadrature. 71 - Symmetry exploitation is maximal only for full-range computations. 72 - Conventional evaluation scales poorly compared to Rys quadrature, but is 73 exact for the given formula set. 74 75 Examples 76 -------- 77 >>> eri_full = conv_3c2e_symm(basis, auxbasis) 78 >>> eri_block = conv_3c2e_symm(basis, auxbasis, slice=[0,5, 0,5, 0,10]) 79 """ 80 # Here the lists are converted to numpy arrays for better use with Numba. 81 # Once these conversions are done we pass these to a Numba decorated 82 # function that uses prange, etc. to calculate the 3c2e integrals efficiently. 83 # This function calculates the 3c2e electron-electron ERIs for a given basis object and auxbasis object. 84 # The basis object holds the information of basis functions like: exponents, coeffs, etc. 85 86 # The integrals are performed using the formulas https://pubs.acs.org/doi/full/10.1021/acs.jchemed.8b00255 87 88 # It is possible to only calculate a slice (block/subset) of the complete set of integrals. 89 # slice is an 6 element list whose first and second elements give the range of the A functions to be calculated. 90 # and so on. 91 # slice = [indx_startA, indx_endA, indx_startB, indx_endB, indx_startC, indx_endC] 92 93 #We convert the required properties to numpy arrays as this is what Numba likes. 94 bfs_coords = np.array([basis.bfs_coords]) 95 bfs_contr_prim_norms = np.array([basis.bfs_contr_prim_norms]) 96 bfs_lmn = np.array([basis.bfs_lmn]) 97 bfs_nprim = np.array([basis.bfs_nprim]) 98 99 #We convert the required properties to numpy arrays as this is what Numba likes. 100 aux_bfs_coords = np.array([auxbasis.bfs_coords]) 101 aux_bfs_contr_prim_norms = np.array([auxbasis.bfs_contr_prim_norms]) 102 aux_bfs_lmn = np.array([auxbasis.bfs_lmn]) 103 aux_bfs_nprim = np.array([auxbasis.bfs_nprim]) 104 105 106 #The remaining properties like bfs_coeffs are a list of lists of unequal sizes. 107 #Numba won't be able to work with these efficiently. 108 #So, we convert them to a numpy 2d array by applying a trick, 109 #that the second dimension is that of the largest list. So that 110 #it can accomadate all the lists. 111 maxnprim = max(basis.bfs_nprim) 112 bfs_coeffs = np.zeros([basis.bfs_nao, maxnprim]) 113 bfs_expnts = np.zeros([basis.bfs_nao, maxnprim]) 114 bfs_prim_norms = np.zeros([basis.bfs_nao, maxnprim]) 115 for i in range(basis.bfs_nao): 116 for j in range(basis.bfs_nprim[i]): 117 bfs_coeffs[i,j] = basis.bfs_coeffs[i][j] 118 bfs_expnts[i,j] = basis.bfs_expnts[i][j] 119 bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j] 120 121 maxnprimaux = max(auxbasis.bfs_nprim) 122 aux_bfs_coeffs = np.zeros([auxbasis.bfs_nao, maxnprimaux]) 123 aux_bfs_expnts = np.zeros([auxbasis.bfs_nao, maxnprimaux]) 124 aux_bfs_prim_norms = np.zeros([auxbasis.bfs_nao, maxnprimaux]) 125 for i in range(auxbasis.bfs_nao): 126 for j in range(auxbasis.bfs_nprim[i]): 127 aux_bfs_coeffs[i,j] = auxbasis.bfs_coeffs[i][j] 128 aux_bfs_expnts[i,j] = auxbasis.bfs_expnts[i][j] 129 aux_bfs_prim_norms[i,j] = auxbasis.bfs_prim_norms[i][j] 130 131 132 133 134 if slice is None: 135 slice = [0, basis.bfs_nao, 0, basis.bfs_nao, 0, auxbasis.bfs_nao] 136 137 #Limits for the calculation of 4c2e integrals 138 indx_startA = int(slice[0]) 139 indx_endA = int(slice[1]) 140 indx_startB = int(slice[2]) 141 indx_endB = int(slice[3]) 142 indx_startC = int(slice[4]) 143 indx_endC = int(slice[5]) 144 145 ints3c2e = conv_3c2e_symm_internal(bfs_coords[0], bfs_contr_prim_norms[0], bfs_lmn[0], bfs_nprim[0], bfs_coeffs, bfs_prim_norms, bfs_expnts,aux_bfs_coords[0], aux_bfs_contr_prim_norms[0], aux_bfs_lmn[0], aux_bfs_nprim[0], aux_bfs_coeffs, aux_bfs_prim_norms, aux_bfs_expnts,indx_startA, indx_endA, indx_startB, indx_endB, indx_startC, indx_endC) 146 return ints3c2e
Compute three-center two-electron (3c2e) electron repulsion integrals (ERIs) using the conventional and slow (analytical) formula-based method with symmetry exploitation.
This function evaluates integrals of the form (A B | C), where:
- A and B are primary basis functions from basis
- C is an auxiliary basis function from auxbasis
The "conv" variant uses explicit analytical integral formulas and nested loops over primitive Gaussians, following the derivations in:
J. Chem. Educ. 2018, 95, 9, 1572–1578
https://pubs.acs.org/doi/full/10.1021/acs.jchemed.8b00255
Compared to the Rys quadrature method, this conventional approach is significantly more computationally expensive, but can be useful for validation or when Rys is not applicable.
Symmetries exploited
For 3c2e integrals, the following bra symmetry is used:
(A B | C) = (B A | C)
This reduces the number of computed integrals from N_bf² * N_aux to N_bf*(N_bf+1)/2 * N_aux when the full tensor is computed.
Parameters
basis : object Primary basis set object containing: - bfs_coords : Cartesian coordinates of basis function centers - bfs_coeffs : Contraction coefficients - bfs_expnts : Gaussian exponents - bfs_prim_norms : Primitive normalization constants - bfs_contr_prim_norms : Contraction normalization factors - bfs_lmn : Angular momentum quantum numbers (ℓ, m, n) - bfs_nprim : Number of primitives per basis function - bfs_nao : Total number of atomic orbitals
auxbasis : object
Auxiliary basis set object with the same attributes as basis.
slice : list of int, optional A 6-element list specifying a sub-block of integrals to compute: [start_A, end_A, start_B, end_B, start_C, end_C] If None (default), computes the full (Nbf, Nbf, Naux) tensor.
**Note:** When slices are used, AB symmetry exploitation is limited
to permutations that lie entirely within the specified slice.
Returns
ints3c2e : ndarray The computed 3-center 2-electron integrals for the requested range. Shape: (Nbf, Nbf, Nauxbf) or (end_A - start_A, end_B - start_B, end_C - start_C) if slice is given.
Notes
- All basis set data are pre-packed into NumPy arrays for Numba acceleration.
- Uses explicit primitive Gaussian formula evaluation without numerical quadrature.
- Symmetry exploitation is maximal only for full-range computations.
- Conventional evaluation scales poorly compared to Rys quadrature, but is exact for the given formula set.
Examples
>>> eri_full = conv_3c2e_symm(basis, auxbasis)
>>> eri_block = conv_3c2e_symm(basis, auxbasis, slice=[0,5, 0,5, 0,10])
11def rys_3c2e_symm(basis, auxbasis, slice=None, schwarz=False, schwarz_threshold=1e-9): 12 """ 13 Compute three-center two-electron (3c2e) electron repulsion integrals using 14 the Rys quadrature method with symmetry considerations. 15 16 This function evaluates integrals of the form (A B | C), where A and B are 17 basis functions from a primary basis set and C is from an auxiliary basis set. 18 Symmetry in the first two indices is exploited ((A B | C) = (B A | C)), 19 so only the upper-triangular part of the (A,B) block is computed and the (B A | C) 20 integrals are formed by symmetry. 21 The implementation uses Numba-accelerated backends with symmetry-aware 22 optimizations and optional Schwarz screening to skip negligible contributions. 23 Symmetry is utilized to only compute N_{bf}*(N_{bf}+1)/2*N_{auxbf} 24 25 Parameters 26 ---------- 27 basis : object 28 Primary basis set object containing information about atomic orbitals, such as: 29 - bfs_coords : Cartesian coordinates of basis function centers. 30 - bfs_coeffs : Contraction coefficients. 31 - bfs_expnts : Gaussian exponents. 32 - bfs_prim_norms : Primitive normalization constants. 33 - bfs_contr_prim_norms : Contraction normalization factors. 34 - bfs_lmn : Angular momentum quantum numbers (ℓ, m, n). 35 - bfs_nprim : Number of primitives per basis function. 36 - bfs_nao : Total number of atomic orbitals. 37 38 auxbasis : object 39 Auxiliary basis set object with the same attributes as `basis`, 40 but typically used for resolution-of-the-identity (RI) expansions. 41 42 slice : list of int, optional 43 A 6-element list specifying a subset of integrals to compute: 44 [start_A, end_A, start_B, end_B, start_C, end_C] 45 If None (default), computes all N_{bf}*N_{bf}*N_{auxbf} integrals. 46 47 schwarz : bool, optional 48 If True, applies Schwarz screening to skip calculations where the 49 product of integral bounds is below `schwarz_threshold`. 50 51 schwarz_threshold : float, optional 52 The threshold for Schwarz screening (default is 1e-9). 53 54 Returns 55 ------- 56 ints3c2e : ndarray of shape 57 (Nbf, Nbf, Nauxbf) or 58 (end_A - start_A, end_B - start_B, end_C - start_C) 59 if slice is given. 60 The computed 3-center 2-electron integrals. 61 62 Notes 63 ----- 64 - Uses preallocated NumPy arrays to store primitive data for efficient Numba processing. 65 - Handles irregular contraction patterns by padding primitive arrays to the 66 size of the largest contraction. 67 - If Schwarz screening is enabled, precomputes diagonal two-electron integrals 68 and uses their square roots for screening. 69 - Symmetry relations are exploited to avoid redundant calculations. 70 71 Examples 72 -------- 73 >>> ints = rys_3c2e_symm(basis, auxbasis) 74 >>> ints_block = rys_3c2e_symm(basis, auxbasis, slice=[0, 5, 0, 5, 0, 10]) 75 >>> ints_screened = rys_3c2e_symm(basis, auxbasis, schwarz=True, schwarz_threshold=1e-10) 76 """ 77 # Here the lists are converted to numpy arrays for better use with Numba. 78 # Once these conversions are done we pass these to a Numba decorated 79 # function that uses prange, etc. to calculate the 3c2e integrals efficiently. 80 # This function calculates the 3c2e electron-electron ERIs for a given basis object and auxbasis object. 81 # The basis object holds the information of basis functions like: exponents, coeffs, etc. 82 83 84 # It is possible to only calculate a slice (block/subset) of the complete set of integrals. 85 # slice is a 6 element list whose first and second elements give the range of the A functions to be calculated. 86 # and so on. 87 # slice = [indx_startA, indx_endA, indx_startB, indx_endB, indx_startC, indx_endC] 88 89 #We convert the required properties to numpy arrays as this is what Numba likes. 90 bfs_coords = np.array([basis.bfs_coords]) 91 bfs_contr_prim_norms = np.array([basis.bfs_contr_prim_norms]) 92 bfs_lmn = np.array([basis.bfs_lmn]) 93 bfs_nprim = np.array([basis.bfs_nprim]) 94 95 #We convert the required properties to numpy arrays as this is what Numba likes. 96 aux_bfs_coords = np.array([auxbasis.bfs_coords]) 97 aux_bfs_contr_prim_norms = np.array([auxbasis.bfs_contr_prim_norms]) 98 aux_bfs_lmn = np.array([auxbasis.bfs_lmn]) 99 aux_bfs_nprim = np.array([auxbasis.bfs_nprim]) 100 101 102 #The remaining properties like bfs_coeffs are a list of lists of unequal sizes. 103 #Numba won't be able to work with these efficiently. 104 #So, we convert them to a numpy 2d array by applying a trick, 105 #that the second dimension is that of the largest list. So that 106 #it can accomadate all the lists. 107 maxnprim = max(basis.bfs_nprim) 108 bfs_coeffs = np.zeros([basis.bfs_nao, maxnprim]) 109 bfs_expnts = np.zeros([basis.bfs_nao, maxnprim]) 110 bfs_prim_norms = np.zeros([basis.bfs_nao, maxnprim]) 111 for i in range(basis.bfs_nao): 112 for j in range(basis.bfs_nprim[i]): 113 bfs_coeffs[i,j] = basis.bfs_coeffs[i][j] 114 bfs_expnts[i,j] = basis.bfs_expnts[i][j] 115 bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j] 116 117 maxnprimaux = max(auxbasis.bfs_nprim) 118 aux_bfs_coeffs = np.zeros([auxbasis.bfs_nao, maxnprimaux]) 119 aux_bfs_expnts = np.zeros([auxbasis.bfs_nao, maxnprimaux]) 120 aux_bfs_prim_norms = np.zeros([auxbasis.bfs_nao, maxnprimaux]) 121 for i in range(auxbasis.bfs_nao): 122 for j in range(auxbasis.bfs_nprim[i]): 123 aux_bfs_coeffs[i,j] = auxbasis.bfs_coeffs[i][j] 124 aux_bfs_expnts[i,j] = auxbasis.bfs_expnts[i][j] 125 aux_bfs_prim_norms[i,j] = auxbasis.bfs_prim_norms[i][j] 126 127 128 129 130 if slice is None: 131 slice = [0, basis.bfs_nao, 0, basis.bfs_nao, 0, auxbasis.bfs_nao] 132 133 #Limits for the calculation of 4c2e integrals 134 indx_startA = int(slice[0]) 135 indx_endA = int(slice[1]) 136 indx_startB = int(slice[2]) 137 indx_endB = int(slice[3]) 138 indx_startC = int(slice[4]) 139 indx_endC = int(slice[5]) 140 141 if schwarz: 142 ints4c2e_diag = eri_4c2e_diag(basis) 143 ints2c2e = rys_2c2e_symm(auxbasis) 144 sqrt_ints4c2e_diag = np.sqrt(np.abs(ints4c2e_diag)) 145 sqrt_diag_ints2c2e = np.sqrt(np.abs(np.diag(ints2c2e))) 146 print('Prelims calc done for Schwarz screening!') 147 else: 148 #Create dummy array 149 sqrt_ints4c2e_diag = np.zeros((1,1), dtype=np.float64) 150 sqrt_diag_ints2c2e = np.zeros((1), dtype=np.float64) 151 152 ints3c2e = rys_3c2e_symm_internal(bfs_coords[0], bfs_contr_prim_norms[0], bfs_lmn[0], bfs_nprim[0], bfs_coeffs, bfs_prim_norms, bfs_expnts,aux_bfs_coords[0], aux_bfs_contr_prim_norms[0], aux_bfs_lmn[0], aux_bfs_nprim[0], aux_bfs_coeffs, aux_bfs_prim_norms, aux_bfs_expnts,indx_startA, indx_endA, indx_startB, indx_endB, indx_startC, indx_endC, schwarz, sqrt_ints4c2e_diag, sqrt_diag_ints2c2e, schwarz_threshold) 153 return ints3c2e
Compute three-center two-electron (3c2e) electron repulsion integrals using the Rys quadrature method with symmetry considerations.
This function evaluates integrals of the form (A B | C), where A and B are basis functions from a primary basis set and C is from an auxiliary basis set. Symmetry in the first two indices is exploited ((A B | C) = (B A | C)), so only the upper-triangular part of the (A,B) block is computed and the (B A | C) integrals are formed by symmetry. The implementation uses Numba-accelerated backends with symmetry-aware optimizations and optional Schwarz screening to skip negligible contributions. Symmetry is utilized to only compute N_{bf}*(N_{bf}+1)/2*N_{auxbf}
Parameters
basis : object Primary basis set object containing information about atomic orbitals, such as: - bfs_coords : Cartesian coordinates of basis function centers. - bfs_coeffs : Contraction coefficients. - bfs_expnts : Gaussian exponents. - bfs_prim_norms : Primitive normalization constants. - bfs_contr_prim_norms : Contraction normalization factors. - bfs_lmn : Angular momentum quantum numbers (ℓ, m, n). - bfs_nprim : Number of primitives per basis function. - bfs_nao : Total number of atomic orbitals.
auxbasis : object
Auxiliary basis set object with the same attributes as basis,
but typically used for resolution-of-the-identity (RI) expansions.
slice : list of int, optional A 6-element list specifying a subset of integrals to compute: [start_A, end_A, start_B, end_B, start_C, end_C] If None (default), computes all N_{bf}*N_{bf}*N_{auxbf} integrals.
schwarz : bool, optional
If True, applies Schwarz screening to skip calculations where the
product of integral bounds is below schwarz_threshold.
schwarz_threshold : float, optional The threshold for Schwarz screening (default is 1e-9).
Returns
ints3c2e : ndarray of shape (Nbf, Nbf, Nauxbf) or (end_A - start_A, end_B - start_B, end_C - start_C) if slice is given. The computed 3-center 2-electron integrals.
Notes
- Uses preallocated NumPy arrays to store primitive data for efficient Numba processing.
- Handles irregular contraction patterns by padding primitive arrays to the size of the largest contraction.
- If Schwarz screening is enabled, precomputes diagonal two-electron integrals and uses their square roots for screening.
- Symmetry relations are exploited to avoid redundant calculations.
Examples
>>> ints = rys_3c2e_symm(basis, auxbasis)
>>> ints_block = rys_3c2e_symm(basis, auxbasis, slice=[0, 5, 0, 5, 0, 10])
>>> ints_screened = rys_3c2e_symm(basis, auxbasis, schwarz=True, schwarz_threshold=1e-10)
8def conv_2c2e_symm(basis, slice=None): 9 """ 10 Compute two-center two-electron integrals (2c2e) using the conventional (slow) method, 11 exploiting permutation symmetry. 12 13 This function computes 2c2e electron repulsion integrals (ERIs) over a given 14 basis set using the conventional algorithm which is very slow compared to Rys quadrature. 15 It supports symmetric integral evaluation and allows slicing the full ERI matrix into 16 blocks for efficient parallelization or chunked computation. 17 18 Parameters 19 ---------- 20 basis : Basis 21 A basis object containing basis function information such as: 22 coordinates, exponents, coefficients, angular momentum, normalization factors, 23 and number of primitives. 24 25 slice : list of int, optional 26 A list of four integers [start_row, end_row, start_col, end_col] specifying 27 a block (subset) of the full integral matrix to compute. 28 If not provided, the full matrix is computed. 29 30 Returns 31 ------- 32 ints2c2e : np.ndarray 33 A 2D numpy array containing the computed 2c2e integrals for the specified slice. 34 """ 35 # Here the lists are converted to numpy arrays for better use with Numba. 36 # Once these conversions are done we pass these to a Numba decorated 37 # function that uses prange, etc. to calculate the 3c2e integrals efficiently. 38 # This function calculates the 3c2e electron-electron ERIs for a given basis object and auxbasis object. 39 # The basis object holds the information of basis functions like: exponents, coeffs, etc. 40 41 # It is possible to only calculate a slice (block/subset) of the complete set of integrals. 42 # slice is a 4 element list whose first and second elements give the range of the A functions to be calculated. 43 # and so on. 44 # slice = [start_row, end_row, start_col, end_col] 45 46 #We convert the required properties to numpy arrays as this is what Numba likes. 47 bfs_coords = np.array([basis.bfs_coords]) 48 bfs_contr_prim_norms = np.array([basis.bfs_contr_prim_norms]) 49 bfs_lmn = np.array([basis.bfs_lmn]) 50 bfs_nprim = np.array([basis.bfs_nprim]) 51 52 53 54 #The remaining properties like bfs_coeffs are a list of lists of unequal sizes. 55 #Numba won't be able to work with these efficiently. 56 #So, we convert them to a numpy 2d array by applying a trick, 57 #that the second dimension is that of the largest list. So that 58 #it can accomadate all the lists. 59 maxnprim = max(basis.bfs_nprim) 60 bfs_coeffs = np.zeros([basis.bfs_nao, maxnprim]) 61 bfs_expnts = np.zeros([basis.bfs_nao, maxnprim]) 62 bfs_prim_norms = np.zeros([basis.bfs_nao, maxnprim]) 63 for i in range(basis.bfs_nao): 64 for j in range(basis.bfs_nprim[i]): 65 bfs_coeffs[i,j] = basis.bfs_coeffs[i][j] 66 bfs_expnts[i,j] = basis.bfs_expnts[i][j] 67 bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j] 68 69 70 71 72 if slice is None: 73 slice = [0, basis.bfs_nao, 0, basis.bfs_nao] 74 75 #Limits for the calculation of overlap integrals 76 start_row = int(slice[0]) #row start index 77 end_row = int(slice[1]) #row end index 78 start_col = int(slice[2]) #column start index 79 end_col = int(slice[3]) #column end index 80 81 ints2c2e = conv_2c2e_symm_internal(bfs_coords[0], bfs_contr_prim_norms[0], bfs_lmn[0], bfs_nprim[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, start_row, end_row, start_col, end_col) 82 return ints2c2e
Compute two-center two-electron integrals (2c2e) using the conventional (slow) method, exploiting permutation symmetry.
This function computes 2c2e electron repulsion integrals (ERIs) over a given basis set using the conventional algorithm which is very slow compared to Rys quadrature. It supports symmetric integral evaluation and allows slicing the full ERI matrix into blocks for efficient parallelization or chunked computation.
Parameters
basis : Basis A basis object containing basis function information such as: coordinates, exponents, coefficients, angular momentum, normalization factors, and number of primitives.
slice : list of int, optional A list of four integers [start_row, end_row, start_col, end_col] specifying a block (subset) of the full integral matrix to compute. If not provided, the full matrix is computed.
Returns
ints2c2e : np.ndarray A 2D numpy array containing the computed 2c2e integrals for the specified slice.
8def rys_2c2e_symm(basis, slice=None): 9 """ 10 Compute symmetric two-center two-electron integrals using Rys quadrature. 11 12 This function evaluates the electron repulsion integrals (ERIs) 13 involving only two centers (2c2e), using the Rys quadrature method. 14 The integrals are computed efficiently using Numba, and the data is prepared 15 accordingly for compatibility with JIT compilation. It assumes symmetry, 16 and computes only a specified block if requested. 17 18 Parameters 19 ---------- 20 basis : object 21 A basis set object containing the basis function data. It must have attributes: 22 - bfs_coords : list of (x, y, z) coordinates of basis function centers 23 - bfs_contr_prim_norms : contraction-normalization factors 24 - bfs_lmn : angular momentum tuples (l, m, n) 25 - bfs_nprim : number of primitives for each basis function 26 - bfs_coeffs : contraction coefficients 27 - bfs_expnts : primitive exponents 28 - bfs_prim_norms : primitive normalization constants 29 - bfs_nao : total number of atomic orbitals (basis functions) 30 31 slice : list of int, optional 32 A four-element list `[start_row, end_row, start_col, end_col]` specifying the 33 row and column ranges of the matrix to be computed. If None, the full matrix 34 is computed. 35 36 Returns 37 ------- 38 ints2c2e : ndarray 39 A 2D numpy array of shape `(end_row - start_row, end_col - start_col)` 40 containing the computed symmetric two-center two-electron integrals. 41 42 Notes 43 ----- 44 This function prepares the basis set data in a Numba-friendly format by 45 converting ragged lists into padded 2D arrays, where the second dimension 46 corresponds to the maximum number of primitives. The core integral computation 47 is offloaded to a Numba-accelerated function `rys_2c2e_symm_internal`. 48 """ 49 # Here the lists are converted to numpy arrays for better use with Numba. 50 # Once these conversions are done we pass these to a Numba decorated 51 # function that uses prange, etc. to calculate the 3c2e integrals efficiently. 52 # This function calculates the 3c2e electron-electron ERIs for a given basis object and auxbasis object. 53 # The basis object holds the information of basis functions like: exponents, coeffs, etc. 54 55 # It is possible to only calculate a slice (block/subset) of the complete set of integrals. 56 # slice is a 4 element list whose first and second elements give the range of the A functions to be calculated. 57 # and so on. 58 # slice = [start_row, end_row, start_col, end_col] 59 60 #We convert the required properties to numpy arrays as this is what Numba likes. 61 bfs_coords = np.array([basis.bfs_coords]) 62 bfs_contr_prim_norms = np.array([basis.bfs_contr_prim_norms]) 63 bfs_lmn = np.array([basis.bfs_lmn]) 64 bfs_nprim = np.array([basis.bfs_nprim]) 65 66 67 68 #The remaining properties like bfs_coeffs are a list of lists of unequal sizes. 69 #Numba won't be able to work with these efficiently. 70 #So, we convert them to a numpy 2d array by applying a trick, 71 #that the second dimension is that of the largest list. So that 72 #it can accomadate all the lists. 73 maxnprim = max(basis.bfs_nprim) 74 bfs_coeffs = np.zeros([basis.bfs_nao, maxnprim]) 75 bfs_expnts = np.zeros([basis.bfs_nao, maxnprim]) 76 bfs_prim_norms = np.zeros([basis.bfs_nao, maxnprim]) 77 for i in range(basis.bfs_nao): 78 for j in range(basis.bfs_nprim[i]): 79 bfs_coeffs[i,j] = basis.bfs_coeffs[i][j] 80 bfs_expnts[i,j] = basis.bfs_expnts[i][j] 81 bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j] 82 83 84 85 86 if slice is None: 87 slice = [0, basis.bfs_nao, 0, basis.bfs_nao] 88 89 #Limits for the calculation of overlap integrals 90 start_row = int(slice[0]) #row start index 91 end_row = int(slice[1]) #row end index 92 start_col = int(slice[2]) #column start index 93 end_col = int(slice[3]) #column end index 94 95 ints2c2e = rys_2c2e_symm_internal(bfs_coords[0], bfs_contr_prim_norms[0], bfs_lmn[0], bfs_nprim[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, start_row, end_row, start_col, end_col) 96 return ints2c2e
Compute symmetric two-center two-electron integrals using Rys quadrature.
This function evaluates the electron repulsion integrals (ERIs) involving only two centers (2c2e), using the Rys quadrature method. The integrals are computed efficiently using Numba, and the data is prepared accordingly for compatibility with JIT compilation. It assumes symmetry, and computes only a specified block if requested.
Parameters
basis : object A basis set object containing the basis function data. It must have attributes: - bfs_coords : list of (x, y, z) coordinates of basis function centers - bfs_contr_prim_norms : contraction-normalization factors - bfs_lmn : angular momentum tuples (l, m, n) - bfs_nprim : number of primitives for each basis function - bfs_coeffs : contraction coefficients - bfs_expnts : primitive exponents - bfs_prim_norms : primitive normalization constants - bfs_nao : total number of atomic orbitals (basis functions)
slice : list of int, optional
A four-element list [start_row, end_row, start_col, end_col] specifying the
row and column ranges of the matrix to be computed. If None, the full matrix
is computed.
Returns
ints2c2e : ndarray
A 2D numpy array of shape (end_row - start_row, end_col - start_col)
containing the computed symmetric two-center two-electron integrals.
Notes
This function prepares the basis set data in a Numba-friendly format by
converting ragged lists into padded 2D arrays, where the second dimension
corresponds to the maximum number of primitives. The core integral computation
is offloaded to a Numba-accelerated function rys_2c2e_symm_internal.
9def rys_3c2e_tri(basis, auxbasis): 10 """ 11 Compute three-center two-electron (3c2e) electron repulsion integrals using 12 the Rys quadrature method with symmetry considerations, stored in packed 13 triangular form for memory efficiency. 14 15 This function evaluates integrals of the form (A B | C), where A and B are 16 basis functions from a primary basis set and C is from an auxiliary basis set. 17 Symmetry in the first two indices is exploited ((A B | C) = (B A | C)), 18 so only the upper-triangular part of the (A,B) block is computed and stored. 19 The resulting array has shape (N_{bf}*(N_{bf}+1)/2, N_{auxbf}), where the first 20 index corresponds to a flattened 1D triangular index over basis function pairs. 21 22 Unlike `rys_3c2e_symm`, this routine does not support computing arbitrary slices 23 — the full triangular set is always evaluated. 24 25 Parameters 26 ---------- 27 basis : object 28 Primary basis set object containing information about atomic orbitals, such as: 29 - bfs_coords : Cartesian coordinates of basis function centers. 30 - bfs_coeffs : Contraction coefficients. 31 - bfs_expnts : Gaussian exponents. 32 - bfs_prim_norms : Primitive normalization constants. 33 - bfs_contr_prim_norms : Contraction normalization factors. 34 - bfs_lmn : Angular momentum quantum numbers (ℓ, m, n). 35 - bfs_nprim : Number of primitives per basis function. 36 - bfs_nao : Total number of atomic orbitals. 37 38 auxbasis : object 39 Auxiliary basis set object with the same attributes as `basis`, 40 typically used for resolution-of-the-identity (RI) expansions. 41 42 Returns 43 ------- 44 ints3c2e : ndarray of shape (Nbf*(Nbf+1)//2, Nauxbf) 45 The computed 3-center 2-electron integrals in packed triangular form. 46 The mapping from a pair (i, j) with i ≤ j to the first dimension index 47 follows standard upper-triangular packing order. 48 49 Notes 50 ----- 51 - This is a memory-efficient variant of `rys_3c2e_symm` that avoids storing 52 the full (Nbf, Nbf, Nauxbf) array. 53 - Uses preallocated NumPy arrays for primitive data to ensure efficient Numba processing. 54 - Handles irregular contraction patterns by padding primitive arrays to the size 55 of the largest contraction in the set. 56 - No Schwarz screening or partial computation is available in this function. 57 58 Examples 59 -------- 60 >>> ints_tri = rys_3c2e_tri(basis, auxbasis) 61 >>> # Retrieve value for pair (i, j) and auxiliary k: 62 >>> def packed_index(i, j, Nbf): 63 ... if i > j: 64 ... i, j = j, i 65 ... return i * Nbf - i*(i-1)//2 + (j - i) 66 >>> val = ints_tri[packed_index(i, j, basis.bfs_nao), k] 67 """ 68 # This is a memory efficient version of the rys_3c2e_symm(). 69 # Instead of returning an array of shape (Nbf, Nbf, Nauxbf) it returns 70 # an array of shape (Nbf*(Nbf+1)/2, Nauxbf), where the first index corresponds to the 71 # 1D triangular matrix. 72 73 # You cannot provide a slice here to calculate only a subset. 74 75 # Here the lists are converted to numpy arrays for better use with Numba. 76 # Once these conversions are done we pass these to a Numba decorated 77 # function that uses prange, etc. to calculate the 3c2e integrals efficiently. 78 # This function calculates the 3c2e electron-electron ERIs for a given basis object and auxbasis object. 79 # The basis object holds the information of basis functions like: exponents, coeffs, etc. 80 81 #We convert the required properties to numpy arrays as this is what Numba likes. 82 bfs_coords = np.array([basis.bfs_coords]) 83 bfs_contr_prim_norms = np.array([basis.bfs_contr_prim_norms]) 84 bfs_lmn = np.array([basis.bfs_lmn]) 85 bfs_nprim = np.array([basis.bfs_nprim]) 86 87 #We convert the required properties to numpy arrays as this is what Numba likes. 88 aux_bfs_coords = np.array([auxbasis.bfs_coords]) 89 aux_bfs_contr_prim_norms = np.array([auxbasis.bfs_contr_prim_norms]) 90 aux_bfs_lmn = np.array([auxbasis.bfs_lmn]) 91 aux_bfs_nprim = np.array([auxbasis.bfs_nprim]) 92 93 94 #The remaining properties like bfs_coeffs are a list of lists of unequal sizes. 95 #Numba won't be able to work with these efficiently. 96 #So, we convert them to a numpy 2d array by applying a trick, 97 #that the second dimension is that of the largest list. So that 98 #it can accomadate all the lists. 99 maxnprim = max(basis.bfs_nprim) 100 bfs_coeffs = np.zeros([basis.bfs_nao, maxnprim]) 101 bfs_expnts = np.zeros([basis.bfs_nao, maxnprim]) 102 bfs_prim_norms = np.zeros([basis.bfs_nao, maxnprim]) 103 for i in range(basis.bfs_nao): 104 for j in range(basis.bfs_nprim[i]): 105 bfs_coeffs[i,j] = basis.bfs_coeffs[i][j] 106 bfs_expnts[i,j] = basis.bfs_expnts[i][j] 107 bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j] 108 109 maxnprimaux = max(auxbasis.bfs_nprim) 110 aux_bfs_coeffs = np.zeros([auxbasis.bfs_nao, maxnprimaux]) 111 aux_bfs_expnts = np.zeros([auxbasis.bfs_nao, maxnprimaux]) 112 aux_bfs_prim_norms = np.zeros([auxbasis.bfs_nao, maxnprimaux]) 113 for i in range(auxbasis.bfs_nao): 114 for j in range(auxbasis.bfs_nprim[i]): 115 aux_bfs_coeffs[i,j] = auxbasis.bfs_coeffs[i][j] 116 aux_bfs_expnts[i,j] = auxbasis.bfs_expnts[i][j] 117 aux_bfs_prim_norms[i,j] = auxbasis.bfs_prim_norms[i][j] 118 119 120 ints3c2e = rys_3c2e_tri_internal(bfs_coords[0], bfs_contr_prim_norms[0], bfs_lmn[0], bfs_nprim[0], bfs_coeffs, bfs_prim_norms, bfs_expnts,aux_bfs_coords[0], aux_bfs_contr_prim_norms[0], aux_bfs_lmn[0], aux_bfs_nprim[0], aux_bfs_coeffs, aux_bfs_prim_norms, aux_bfs_expnts, basis.bfs_nao, auxbasis.bfs_nao) 121 return ints3c2e
Compute three-center two-electron (3c2e) electron repulsion integrals using the Rys quadrature method with symmetry considerations, stored in packed triangular form for memory efficiency.
This function evaluates integrals of the form (A B | C), where A and B are basis functions from a primary basis set and C is from an auxiliary basis set. Symmetry in the first two indices is exploited ((A B | C) = (B A | C)), so only the upper-triangular part of the (A,B) block is computed and stored. The resulting array has shape (N_{bf}*(N_{bf}+1)/2, N_{auxbf}), where the first index corresponds to a flattened 1D triangular index over basis function pairs.
Unlike rys_3c2e_symm, this routine does not support computing arbitrary slices
— the full triangular set is always evaluated.
Parameters
basis : object Primary basis set object containing information about atomic orbitals, such as: - bfs_coords : Cartesian coordinates of basis function centers. - bfs_coeffs : Contraction coefficients. - bfs_expnts : Gaussian exponents. - bfs_prim_norms : Primitive normalization constants. - bfs_contr_prim_norms : Contraction normalization factors. - bfs_lmn : Angular momentum quantum numbers (ℓ, m, n). - bfs_nprim : Number of primitives per basis function. - bfs_nao : Total number of atomic orbitals.
auxbasis : object
Auxiliary basis set object with the same attributes as basis,
typically used for resolution-of-the-identity (RI) expansions.
Returns
ints3c2e : ndarray of shape (Nbf*(Nbf+1)//2, Nauxbf) The computed 3-center 2-electron integrals in packed triangular form. The mapping from a pair (i, j) with i ≤ j to the first dimension index follows standard upper-triangular packing order.
Notes
- This is a memory-efficient variant of
rys_3c2e_symmthat avoids storing the full (Nbf, Nbf, Nauxbf) array. - Uses preallocated NumPy arrays for primitive data to ensure efficient Numba processing.
- Handles irregular contraction patterns by padding primitive arrays to the size of the largest contraction in the set.
- No Schwarz screening or partial computation is available in this function.
Examples
>>> ints_tri = rys_3c2e_tri(basis, auxbasis)
>>> # Retrieve value for pair (i, j) and auxiliary k:
>>> def packed_index(i, j, Nbf):
... if i > j:
... i, j = j, i
... return i * Nbf - i*(i-1)//2 + (j - i)
>>> val = ints_tri[packed_index(i, j, basis.bfs_nao), k]
9def rys_nuc_mat_symm(basis, mol, slice=None): 10 # Uses the rys 3c2e algorithm to calculate the nuclear attraction integrals by making one of the basis function as a steep/sharp s-primitive gaussian 11 # Ref: https://arxiv.org/pdf/2302.11307.pdf 12 # Here the lists are converted to numpy arrays for better use with Numba. 13 # Once these conversions are done we pass these to a Numba decorated 14 # function that uses prange, etc. to calculate the 3c2e integrals efficiently. 15 # This function calculates the 3c2e electron-electron ERIs for a given basis object and auxbasis object. 16 # The basis object holds the information of basis functions like: exponents, coeffs, etc. 17 18 19 # It is possible to only calculate a slice (block/subset) of the complete set of integrals. 20 # slice is a 6 element list whose first and second elements give the range of the A functions to be calculated. 21 # and so on. 22 # slice = [indx_startA, indx_endA, indx_startB, indx_endB, indx_startC, indx_endC] 23 24 #We convert the required properties to numpy arrays as this is what Numba likes. 25 bfs_coords = np.array([basis.bfs_coords]) 26 bfs_contr_prim_norms = np.array([basis.bfs_contr_prim_norms]) 27 bfs_lmn = np.array([basis.bfs_lmn]) 28 bfs_nprim = np.array([basis.bfs_nprim]) 29 30 coordsBohrs = np.array([mol.coordsBohrs]) 31 Z = np.array([mol.Zcharges]) 32 natoms = mol.natoms 33 34 35 #The remaining properties like bfs_coeffs are a list of lists of unequal sizes. 36 #Numba won't be able to work with these efficiently. 37 #So, we convert them to a numpy 2d array by applying a trick, 38 #that the second dimension is that of the largest list. So that 39 #it can accomadate all the lists. 40 maxnprim = max(basis.bfs_nprim) 41 bfs_coeffs = np.zeros([basis.bfs_nao, maxnprim]) 42 bfs_expnts = np.zeros([basis.bfs_nao, maxnprim]) 43 bfs_prim_norms = np.zeros([basis.bfs_nao, maxnprim]) 44 for i in range(basis.bfs_nao): 45 for j in range(basis.bfs_nprim[i]): 46 bfs_coeffs[i,j] = basis.bfs_coeffs[i][j] 47 bfs_expnts[i,j] = basis.bfs_expnts[i][j] 48 bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j] 49 50 51 52 53 54 if slice is None: 55 slice = [0, basis.bfs_nao, 0, basis.bfs_nao] 56 57 #Limits for the calculation of 4c2e integrals 58 indx_startA = int(slice[0]) 59 indx_endA = int(slice[1]) 60 indx_startB = int(slice[2]) 61 indx_endB = int(slice[3]) 62 63 ints3c2e = rys_nuc_symm_internal(bfs_coords[0], bfs_contr_prim_norms[0], bfs_lmn[0], bfs_nprim[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, indx_startA, indx_endA, indx_startB, indx_endB, Z[0], coordsBohrs[0], natoms) 64 return ints3c2e
9def eval_xc_1(basis, dmat, weights, coords, funcid=[1,7], spin=0, blocksize=50000, debug=False, list_nonzero_indices=None, count_nonzero_indices=None, list_ao_values=None, list_ao_grad_values=None): 10 """ 11 Evaluate exchange-correlation (XC) energy and potential matrix for DFT 12 using algorithm 1, which is a baseline method for grid-based DFT. 13 14 In algorithm 1, the XC term is evaluated by looping over blocks of grid 15 points, and within each block, the operations are parallelized. This 16 approach is functional but generally slower than algorithm 2 for CPU-based 17 execution. 18 19 This function evaluates the XC energy and potential matrix elements for a given 20 density matrix using numerical integration over a 3D real-space grid. It supports 21 LDA and GGA functionals via LibXC and optional use of precomputed AO values and 22 gradients for performance gains. Sparse AO matrix techniques are also supported. 23 24 Parameters 25 ---------- 26 basis : Basis 27 A basis object containing basis function data: exponents, coefficients, 28 angular momentum, normalization, and other AO metadata. 29 30 dmat : np.ndarray 31 The one-electron density matrix in the AO basis. 32 33 weights : np.ndarray 34 Integration weights associated with each grid point. 35 36 coords : np.ndarray 37 Grid point coordinates as an (N, 3) array. 38 39 funcid : list of int, optional 40 LibXC functional IDs. Default is [1, 7] for Slater (X) and VWN (C) (LDA). 41 42 spin : int, optional 43 Spin multiplicity: 0 for unpolarized. Spin-polarized (1) not currently supported. 44 45 blocksize : int, optional 46 Number of grid points to process per block. Default is 50000. 47 48 debug : bool, optional 49 If True, enables verbose timing and diagnostic output. 50 51 list_nonzero_indices : list of np.ndarray, optional 52 List of AO indices with non-negligible contributions in each grid block 53 for sparse matrix optimizations. 54 55 count_nonzero_indices : list of int, optional 56 Number of significant AO indices per block; matches entries in `list_nonzero_indices`. 57 58 list_ao_values : list of np.ndarray, optional 59 Precomputed AO values at grid points for each block. 60 61 list_ao_grad_values : list of tuple of np.ndarray, optional 62 Precomputed AO gradient values (x, y, z) at grid points for each block. 63 64 Returns 65 ------- 66 efunc : float 67 Total exchange-correlation energy. 68 69 v : np.ndarray 70 Exchange-correlation potential matrix in the AO basis. 71 72 Notes 73 ----- 74 - This algorithm prioritizes code clarity and correctness, not maximum speed. 75 - Only LDA and GGA functionals are currently supported. meta-GGA and hybrid functionals 76 are planned for future implementation. 77 - Uses LibXC for exchange-correlation energy and potential evaluation. 78 - AO values and gradients can be reused via precomputation to improve speed. 79 - The number of electrons (via integrated density) is printed for validation. 80 81 References 82 ---------- 83 - LibXC Functional Codes: https://libxc.gitlab.io/functionals/ 84 - Functional energy and potential formulation: https://pubs.acs.org/doi/full/10.1021/ct200412r 85 - LibXC Python interface: https://www.tddft.org/programs/libxc/manual/ 86 """ 87 # Evaluate the XC term 88 # This is a slightly slower algorithm. 89 # Here the parallelization is done when evaluating bfs or the density 90 91 # In order to evaluate a density functional we will use the 92 # libxc library with Python bindings. 93 # However, some sort of simple functionals like LDA, GGA, etc would 94 # need to be implemented in CrysX also, so that the it doesn't depend 95 # on external libraries so heavily that it becomes unusable without those. 96 97 #Useful links: 98 # LibXC manual: https://www.tddft.org/programs/libxc/manual/ 99 # LibXC gitlab: https://gitlab.com/libxc/libxc/-/tree/master 100 # LibXC python interface code: https://gitlab.com/libxc/libxc/-/blob/master/pylibxc/functional.py 101 # LibXC python version installation and example: https://www.tddft.org/programs/libxc/installation/ 102 # Formulae for XC energy and potential calculation: https://pubs.acs.org/doi/full/10.1021/ct200412r 103 # LibXC code list: https://github.com/pyscf/pyscf/blob/master/pyscf/dft/libxc.py 104 # PySCF nr_rks code: https://github.com/pyscf/pyscf/blob/master/pyscf/dft/numint.py 105 # https://www.osti.gov/pages/servlets/purl/1650078 106 107 108 109 #OUTPUT 110 #Functional energy 111 efunc = 0.0 112 #Functional potential V_{\mu \nu} = \mu|\partial{f}/\partial{\rho}|\nu 113 v = np.zeros((basis.bfs_nao, basis.bfs_nao)) 114 #TODO mGGA, Hybrid 115 116 #Calculate number of blocks/batches 117 ngrids = coords.shape[0] 118 nblocks = ngrids//blocksize 119 nelec = 0.0 120 121 # If a list of significant basis functions for each block of grid points is provided 122 if list_nonzero_indices is not None: 123 dmat_orig = dmat 124 125 ### Calculate stuff necessary for bf/ao evaluation on grid points 126 ### Doesn't make any difference for 510 bfs but might be significant for >1000 bfs 127 # This will help to make the call to eval_bfs faster by skipping the mediator eval_bfs function 128 # that prepares the following stuff at every iteration 129 #We convert the required properties to numpy arrays as this is what Numba likes. 130 bfs_coords = np.array([basis.bfs_coords]) 131 bfs_contr_prim_norms = np.array([basis.bfs_contr_prim_norms]) 132 bfs_lmn = np.array([basis.bfs_lmn]) 133 bfs_nprim = np.array([basis.bfs_nprim]) 134 #The remaining properties like bfs_coeffs are a list of lists of unequal sizes. 135 #Numba won't be able to work with these efficiently. 136 #So, we convert them to a numpy 2d array by applying a trick, 137 #that the second dimension is that of the largest list. So that 138 #it can accomadate all the lists. 139 maxnprim = max(basis.bfs_nprim) 140 bfs_coeffs = np.zeros([basis.bfs_nao, maxnprim]) 141 bfs_expnts = np.zeros([basis.bfs_nao, maxnprim]) 142 bfs_prim_norms = np.zeros([basis.bfs_nao, maxnprim]) 143 bfs_radius_cutoff = np.zeros([basis.bfs_nao]) 144 for i in range(basis.bfs_nao): 145 for j in range(basis.bfs_nprim[i]): 146 bfs_coeffs[i,j] = basis.bfs_coeffs[i][j] 147 bfs_expnts[i,j] = basis.bfs_expnts[i][j] 148 bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j] 149 bfs_radius_cutoff[i] = basis.bfs_radius_cutoff[i] 150 # Now bf/ao values can be evaluated by calling the following 151 # bf_values = Integrals.bf_val_helpers.eval_bfs(bfs_coords[0], bfs_contr_prim_norms[0], bfs_nprim[0], bfs_lmn[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, bfs_radius_cutoff, coord) 152 153 # For debugging and benchmarking purposes 154 durationLibxc = 0.0 155 durationE = 0.0 156 durationF = 0.0 157 durationZ = 0.0 158 durationV = 0.0 159 durationRho = 0.0 160 durationAO = 0.0 161 162 xc_family_dict = {1:'LDA',2:'GGA',4:'MGGA'} 163 164 # Create a LibXC object 165 funcx = pylibxc.LibXCFunctional(funcid[0], "unpolarized") 166 funcc = pylibxc.LibXCFunctional(funcid[1], "unpolarized") 167 x_family_code = funcx.get_family() 168 c_family_code = funcc.get_family() 169 170 # Loop over blocks/batches of grid points 171 for iblock in range(nblocks+1): 172 offset = iblock*blocksize 173 174 # Get weights and coordinates of grid points for this block/batch 175 weights_block = weights[offset : min(offset+blocksize,ngrids)] 176 coords_block = coords[offset : min(offset+blocksize,ngrids)] 177 178 # Get the list of basis functions with significant contributions to this block 179 if list_nonzero_indices is not None: 180 non_zero_indices = list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]] 181 # Get the subset of density matrix corresponding to the siginificant basis functions 182 dmat = dmat_orig[np.ix_(non_zero_indices, non_zero_indices)] 183 184 if debug: 185 startAO = timer() 186 if xc_family_dict[x_family_code]=='LDA' and xc_family_dict[c_family_code]=='LDA': 187 if list_ao_values is not None: # If ao_values are calculated once and saved, then they can be provided to avoid recalculation 188 ao_value_block = list_ao_values[iblock] 189 else: 190 # ao_value_block = Integrals.eval_bfs(basis, coords_block) 191 if list_nonzero_indices is not None: 192 # ao_value_block = Integrals.bf_val_helpers.eval_bfs_sparse_internal(bfs_coords[0], bfs_contr_prim_norms[0], bfs_nprim[0], bfs_lmn[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, coords_block, non_zero_indices) 193 ao_value_block = Integrals.bf_val_helpers.eval_bfs_sparse_vectorized_internal(bfs_coords[0], bfs_contr_prim_norms[0], bfs_nprim[0], bfs_lmn[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, coords_block, non_zero_indices) 194 else: 195 ao_value_block = Integrals.bf_val_helpers.eval_bfs_internal(bfs_coords[0], bfs_contr_prim_norms[0], bfs_nprim[0], bfs_lmn[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, coords_block) 196 # If either x or c functional is of GGA/MGGA type we need ao_grad_values 197 if xc_family_dict[x_family_code]!='LDA' or xc_family_dict[c_family_code]!='LDA': 198 # ao_value_block, ao_values_grad_block = Integrals.bf_val_helpers.eval_bfs_and_grad(basis, coords_block, deriv=1, parallel=True, non_zero_indices=non_zero_indices) 199 if list_nonzero_indices is not None: 200 # Calculating ao values and gradients together, didn't really do much improvement in computational speed 201 ao_value_block, ao_values_grad_block = Integrals.bf_val_helpers.eval_bfs_and_grad_sparse_internal(bfs_coords[0], bfs_contr_prim_norms[0], bfs_nprim[0], bfs_lmn[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, coords_block, non_zero_indices) 202 else: 203 ao_value_block, ao_values_grad_block = Integrals.bf_val_helpers.eval_bfs_and_grad_internal(bfs_coords[0], bfs_contr_prim_norms[0], bfs_nprim[0], bfs_lmn[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, coords_block) 204 if debug: 205 durationAO = durationAO + timer() - startAO 206 207 208 if debug: 209 startRho = timer() 210 211 if list_nonzero_indices is not None: 212 rho_block = contract('ij,mi,mj->m', dmat, ao_value_block, ao_value_block) # Original (pretty fast) 213 else: 214 rho_block = Integrals.bf_val_helpers.bf_val_helpers.eval_rho(ao_value_block, dmat) # This is by-far the fastest now <----- 215 216 # If either x or c functional is of GGA/MGGA type we need rho_grad_values too 217 if xc_family_dict[x_family_code]!='LDA' or xc_family_dict[c_family_code]!='LDA': 218 rho_grad_block_x = contract('ij,mi,mj->m',dmat,ao_values_grad_block[0],ao_value_block)+\ 219 contract('ij,mi,mj->m',dmat,ao_value_block,ao_values_grad_block[0]) 220 rho_grad_block_y = contract('ij,mi,mj->m',dmat,ao_values_grad_block[1],ao_value_block)+\ 221 contract('ij,mi,mj->m',dmat,ao_value_block,ao_values_grad_block[1]) 222 rho_grad_block_z = contract('ij,mi,mj->m',dmat,ao_values_grad_block[2],ao_value_block)+\ 223 contract('ij,mi,mj->m',dmat,ao_value_block,ao_values_grad_block[2]) 224 sigma_block = np.zeros((3,weights_block.shape[0])) 225 sigma_block[1] = rho_grad_block_x**2 + rho_grad_block_y**2 + rho_grad_block_z**2 226 227 if debug: 228 durationRho = durationRho + timer() - startRho 229 230 #LibXC stuff 231 # Exchange 232 if debug: 233 startLibxc = timer() 234 # Input dictionary for libxc 235 inp = {} 236 # Input dictionary needs density values at grid points 237 inp['rho'] = rho_block 238 if xc_family_dict[x_family_code]!='LDA': 239 # Input dictionary needs sigma (\nabla \rho \cdot \nabla \rho) values at grid points 240 inp['sigma'] = sigma_block[1] 241 # Calculate the necessary quantities using LibXC 242 retx = funcx.compute(inp) 243 # print('Duration for LibXC computations at grid points: ',durationLibxc) 244 245 # Correlation 246 # Input dictionary for libxc 247 inp = {} 248 # Input dictionary needs density values at grid points 249 inp['rho'] = rho_block 250 if xc_family_dict[c_family_code]!='LDA': 251 # Input dictionary needs sigma (\nabla \rho \cdot \nabla \rho) values at grid points 252 inp['sigma'] = sigma_block[1] 253 # Calculate the necessary quantities using LibXC 254 retc = funcc.compute(inp) 255 256 if debug: 257 durationLibxc = durationLibxc + timer() - startLibxc 258 # print('Duration for LibXC computations at grid points: ',durationLibxc) 259 260 if debug: 261 startE = timer() 262 #ENERGY----------- 263 e = retx['zk'] + retc['zk'] # Functional values at grid points 264 # Testing CrysX's own implmentation 265 #e = densfuncs.lda_x(rho) 266 267 # Calculate the total energy 268 # Multiply the density at grid points by weights 269 den = rho_block*weights_block #elementwise multiply 270 efunc = efunc + np.dot(den, e) #Multiply with functional values at grid points and sum 271 nelec = nelec + np.sum(den) 272 273 if debug: 274 durationE = durationE + timer() - startE 275 # print('Duration for calculation of total density functional energy: ',durationE) 276 277 #POTENTIAL---------- 278 # The derivative of functional wrt density is vrho 279 vrho = retx['vrho'] + retc['vrho'] 280 vsigma = 0 281 # If either x or c functional is of GGA/MGGA type we need rho_grad_values 282 if xc_family_dict[x_family_code]!='LDA': 283 # The derivative of functional wrt grad \rho square. 284 vsigma = retx['vsigma'] 285 if xc_family_dict[c_family_code]!='LDA': 286 # The derivative of functional wrt grad \rho square. 287 vsigma += retc['vsigma'] 288 289 if debug: 290 startF = timer() 291 # F = np.multiply(weights_block,vrho[:,0]) #This is fast enough. 292 v_rho_temp = vrho[:,0] 293 F = numexpr.evaluate('(weights_block*v_rho_temp)') 294 # If either x or c functional is of GGA/MGGA type we need rho_grad_values 295 if xc_family_dict[x_family_code]!='LDA' or xc_family_dict[c_family_code]!='LDA': 296 Ftemp = 2*weights_block*vsigma[:,0] 297 Fx = Ftemp*rho_grad_block_x 298 Fy = Ftemp*rho_grad_block_y 299 Fz = Ftemp*rho_grad_block_z 300 # Ftemp = 2*np.multiply(weights_block,vsigma.T) 301 # Fx = Ftemp*rho_grad_block_x 302 # Fy = Ftemp*rho_grad_block_y 303 # Fz = Ftemp*rho_grad_block_z 304 if debug: 305 durationF = durationF + timer() - startF 306 # print('Duration for calculation of F: ',durationF) 307 308 if debug: 309 startZ = timer() 310 ao_value_block_T = ao_value_block.T 311 z = numexpr.evaluate('(0.5*F*ao_value_block_T)') 312 # z = 0.5*np.einsum('m,mi->mi',F,ao_value_block) 313 # If either x or c functional is of GGA/MGGA type we need rho_grad_values 314 if xc_family_dict[x_family_code]!='LDA' or xc_family_dict[c_family_code]!='LDA': 315 z = z + Fx*ao_values_grad_block[0].T + Fy*ao_values_grad_block[1].T + Fz*ao_values_grad_block[2].T 316 317 if debug: 318 durationZ = durationZ + timer() - startZ 319 # print('Duration for calculation of z : ',durationZ) 320 # Free memory 321 F = 0 322 ao_value_block_T = 0 323 vrho = 0 324 325 if debug: 326 startV = timer() 327 # Numexpr 328 v_temp = z @ ao_value_block 329 v_temp_T = v_temp.T 330 if list_nonzero_indices is not None: 331 v[np.ix_(non_zero_indices, non_zero_indices)] += numexpr.evaluate('(v_temp + v_temp_T)') 332 else: 333 v = numexpr.evaluate('(v + v_temp + v_temp_T)') 334 335 336 if debug: 337 durationV = durationV + timer() - startV 338 339 print('Number of electrons: ', nelec) 340 341 if debug: 342 print('Duration for AO values: ', durationAO) 343 print('Duration for V: ',durationV) 344 print('Duration for Rho at grid points: ',durationRho) 345 print('Duration for F: ',durationF) 346 print('Duration for Z: ',durationZ) 347 print('Duration for E: ',durationE) 348 print('Duration for LibXC: ',durationLibxc) 349 350 351 352 return efunc[0], v
Evaluate exchange-correlation (XC) energy and potential matrix for DFT using algorithm 1, which is a baseline method for grid-based DFT.
In algorithm 1, the XC term is evaluated by looping over blocks of grid points, and within each block, the operations are parallelized. This approach is functional but generally slower than algorithm 2 for CPU-based execution.
This function evaluates the XC energy and potential matrix elements for a given density matrix using numerical integration over a 3D real-space grid. It supports LDA and GGA functionals via LibXC and optional use of precomputed AO values and gradients for performance gains. Sparse AO matrix techniques are also supported.
Parameters
basis : Basis A basis object containing basis function data: exponents, coefficients, angular momentum, normalization, and other AO metadata.
dmat : np.ndarray The one-electron density matrix in the AO basis.
weights : np.ndarray Integration weights associated with each grid point.
coords : np.ndarray Grid point coordinates as an (N, 3) array.
funcid : list of int, optional LibXC functional IDs. Default is [1, 7] for Slater (X) and VWN (C) (LDA).
spin : int, optional Spin multiplicity: 0 for unpolarized. Spin-polarized (1) not currently supported.
blocksize : int, optional Number of grid points to process per block. Default is 50000.
debug : bool, optional If True, enables verbose timing and diagnostic output.
list_nonzero_indices : list of np.ndarray, optional List of AO indices with non-negligible contributions in each grid block for sparse matrix optimizations.
count_nonzero_indices : list of int, optional
Number of significant AO indices per block; matches entries in list_nonzero_indices.
list_ao_values : list of np.ndarray, optional Precomputed AO values at grid points for each block.
list_ao_grad_values : list of tuple of np.ndarray, optional Precomputed AO gradient values (x, y, z) at grid points for each block.
Returns
efunc : float Total exchange-correlation energy.
v : np.ndarray Exchange-correlation potential matrix in the AO basis.
Notes
- This algorithm prioritizes code clarity and correctness, not maximum speed.
- Only LDA and GGA functionals are currently supported. meta-GGA and hybrid functionals are planned for future implementation.
- Uses LibXC for exchange-correlation energy and potential evaluation.
- AO values and gradients can be reused via precomputation to improve speed.
- The number of electrons (via integrated density) is printed for validation.
References
- LibXC Functional Codes: https://libxc.gitlab.io/functionals/
- Functional energy and potential formulation: https://pubs.acs.org/doi/full/10.1021/ct200412r
- LibXC Python interface: https://www.tddft.org/programs/libxc/manual/
18def eval_xc_2(basis, dmat, weights, coords, funcid=[1,7], spin=0, ncores=2, blocksize=5000, list_nonzero_indices=None, count_nonzero_indices=None, list_ao_values=None, list_ao_grad_values=None, debug=False): 19 """ 20 Evaluate exchange-correlation (XC) energy and potential matrix for DFT 21 using algorithm 2, which is the preferred algorithm for running DFT on CPU. 22 23 In algorithm 2, the XC term is evaluated by parallelizing over batches. 24 In contrast, the algorithm 1 evaluates the XC term by looping over batches and 25 parallelizing the operations within the batch, which is suboptimal. 26 27 This function evaluates the XC energy and potential matrix elements for a given 28 density matrix using grid-based numerical integration. It supports LDA and GGA 29 functionals (via LibXC), parallel execution with `joblib`, and sparse AO matrix blocks. 30 31 Parameters 32 ---------- 33 basis : Basis 34 A basis object containing basis function data: exponents, coefficients, 35 angular momentum, normalization, etc. 36 37 dmat : np.ndarray 38 The one-electron density matrix in the AO basis. 39 40 weights : np.ndarray 41 Integration weights associated with each grid point. 42 43 coords : np.ndarray 44 Grid point coordinates as an (N, 3) array. 45 46 funcid : list of int, optional 47 LibXC functional IDs. Default is [1, 7] for Slater (X) and VWN (C) (LDA). 48 49 spin : int, optional 50 Spin multiplicity: 0 for unpolarized, 1 for spin-polarized (not allowed currently). 51 52 ncores : int, optional 53 Number of threads/cores for parallel execution. Default is 2. 54 55 blocksize : int, optional 56 Number of grid points to process per block. Default is 5000. 57 58 list_nonzero_indices : list of np.ndarray, optional 59 Precomputed list of nonzero AO indices per block for sparse evaluation. 60 61 count_nonzero_indices : list of int, optional 62 Number of nonzero indices in each block (matches `list_nonzero_indices`). 63 64 list_ao_values : list of np.ndarray, optional 65 Precomputed AO values at grid points for each block. 66 67 list_ao_grad_values : list of tuple of np.ndarray, optional 68 Precomputed AO gradient values (x, y, z) at grid points for each block. 69 70 debug : bool, optional 71 If True, print detailed timing and diagnostic info. 72 73 Returns 74 ------- 75 efunc : float 76 Total exchange-correlation energy. 77 78 v : np.ndarray 79 Exchange-correlation potential matrix in the AO basis. 80 81 Notes 82 ----- 83 - Only supports LDA and GGA functionals currently. meta-GGA and hybrid support is in development. 84 - Uses LibXC for energy and derivative functional evaluation. 85 - Blocks are randomly shuffled before processing to balance parallel load. 86 - Grid-based DFT evaluation using precomputed AO and gradient matrices is supported. 87 88 References 89 ---------- 90 - LibXC Functional Codes: https://libxc.gitlab.io/functionals/ 91 - XC term evaluation inspired from: https://pubs.acs.org/doi/full/10.1021/ct200412r 92 """ 93 # This performs parallelization at the blocks/batches level. 94 # Therefore, joblib is perfect for such embarrasingly parallel task 95 # In order to evaluate a density functional we will use the 96 # libxc library with Python bindings. 97 # However, some sort of simple functionals like LDA, GGA, etc would 98 # need to be implemented in CrysX also, so that the it doesn't depend 99 # on external libraries so heavily that it becomes unusable without those. 100 101 #Useful links: 102 # LibXC manual: https://www.tddft.org/programs/libxc/manual/ 103 # LibXC gitlab: https://gitlab.com/libxc/libxc/-/tree/master 104 # LibXC python interface code: https://gitlab.com/libxc/libxc/-/blob/master/pylibxc/functional.py 105 # LibXC python version installation and example: https://www.tddft.org/programs/libxc/installation/ 106 # Formulae for XC energy and potential calculation: https://pubs.acs.org/doi/full/10.1021/ct200412r 107 # LibXC code list: https://github.com/pyscf/pyscf/blob/master/pyscf/dft/libxc.py 108 # PySCF nr_rks code: https://github.com/pyscf/pyscf/blob/master/pyscf/dft/numint.py 109 # https://www.osti.gov/pages/servlets/purl/1650078 110 111 #OUTPUT 112 #Functional energy 113 efunc = 0.0 114 #Functional potential V_{\mu \nu} = \mu|\partial{f}/\partial{\rho}|\nu 115 v = np.zeros((basis.bfs_nao, basis.bfs_nao)) 116 nelec = 0 117 118 # TODO it only works for LDA functionals for now. 119 # Need to make it work for GGA, Hybrid, range-separated Hybrid and MetaGGA functionals as well. 120 121 122 ngrids = coords.shape[0] 123 nblocks = ngrids//blocksize 124 125 # print('Number of blocks: ', nblocks) 126 127 # Some stuff to note timings 128 timings = None 129 if debug: 130 durationLibxc = 0.0 131 durationE = 0.0 132 durationF = 0.0 133 durationZ = 0.0 134 durationV = 0.0 135 durationRho = 0.0 136 durationAO = 0.0 137 timings = {'durationLibxc':durationLibxc, 'durationE':durationE, 'durationF':durationF, 'durationZ':durationZ, 'durationV':durationV, 'durationRho':durationRho, 'durationAO':durationAO} 138 139 140 141 ### Calculate stuff necessary for bf/ao evaluation on grid points 142 ### Doesn't make any difference for 510 bfs but might be significant for >1000 bfs 143 # This will help to make the call to evalbfnumba1 faster by skipping the mediator evalbfnumbawrap function 144 # that prepares the following stuff at every iteration 145 #We convert the required properties to numpy arrays as this is what Numba likes. 146 bfs_coords = np.array([basis.bfs_coords]) 147 bfs_contr_prim_norms = np.array([basis.bfs_contr_prim_norms]) 148 bfs_lmn = np.array([basis.bfs_lmn]) 149 bfs_nprim = np.array([basis.bfs_nprim]) 150 #The remaining properties like bfs_coeffs are a list of lists of unequal sizes. 151 #Numba won't be able to work with these efficiently. 152 #So, we convert them to a numpy 2d array by applying a trick, 153 #that the second dimension is that of the largest list. So that 154 #it can accomadate all the lists. 155 maxnprim = max(basis.bfs_nprim) 156 bfs_coeffs = np.zeros([basis.bfs_nao, maxnprim]) 157 bfs_expnts = np.zeros([basis.bfs_nao, maxnprim]) 158 bfs_prim_norms = np.zeros([basis.bfs_nao, maxnprim]) 159 bfs_radius_cutoff = np.zeros([basis.bfs_nao]) 160 for i in range(basis.bfs_nao): 161 for j in range(basis.bfs_nprim[i]): 162 bfs_coeffs[i,j] = basis.bfs_coeffs[i][j] 163 bfs_expnts[i,j] = basis.bfs_expnts[i][j] 164 bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j] 165 bfs_radius_cutoff[i] = basis.bfs_radius_cutoff[i] 166 # Now bf/ao values can be evaluated by calling the following 167 # bf_values = Integrals.bf_val_helpers.eval_bfs(bfs_coords[0], bfs_contr_prim_norms[0], bfs_nprim[0], bfs_lmn[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, bfs_radius_cutoff, coord) 168 bfs_data_as_np_arrays = [bfs_coords[0], bfs_contr_prim_norms[0], bfs_nprim[0], bfs_lmn[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, bfs_radius_cutoff] 169 170 xc_family_dict = {1:'LDA',2:'GGA',4:'MGGA'} 171 # Create a LibXC object 172 funcx = pylibxc.LibXCFunctional(funcid[0], "unpolarized") 173 funcc = pylibxc.LibXCFunctional(funcid[1], "unpolarized") 174 x_family_code = funcx.get_family() 175 c_family_code = funcc.get_family() 176 177 #### Set number of cores for numba related evaluations within 'block_dens_func()' for example bf_value evaluations 178 #### Apparently, it was still using as many cores as possible and creating a serial version of thise functions as I have 179 #### currently done doesn't work 180 numba.set_num_threads(1) 181 # Shuffle the blocks (for load balancing) 182 block_indices = list(range(nblocks+1)) 183 random.shuffle(block_indices) 184 185 # rho_blocks = [np.zeros((blocksize,1)) for _ in range(nblocks+1)] 186 187 if 2*ncores>nblocks: 188 batch_size='auto' 189 else: 190 batch_size = nblocks//(ncores*2) 191 192 # NumExpr expressions compile 193 expr_den = numexpr.NumExpr('rho_block*weights_block') 194 expr_F = numexpr.NumExpr('weights_block*v_rho_temp') 195 expr_z = numexpr.NumExpr('0.5*F*ao_value_block_T') 196 expr_v = numexpr.NumExpr('v_temp + v_temp_T') 197 expr_sigma_block = numexpr.NumExpr('rho_grad_block_x**2 + rho_grad_block_y**2 + rho_grad_block_z**2') 198 expr_Ftemp = numexpr.NumExpr('2*weights_block*vsigma_temp') 199 expr_Fx = numexpr.NumExpr('Ftemp*rho_grad_block_x') 200 expr_Fy = numexpr.NumExpr('Ftemp*rho_grad_block_y') 201 expr_Fz = numexpr.NumExpr('Ftemp*rho_grad_block_z') 202 expr_z_grad = numexpr.NumExpr('Fx*ao_value_gradx_block_T + Fy*ao_value_grady_block_T + Fz*ao_value_gradz_block_T') 203 numexpr_expr = {'expr_den':expr_den, 'expr_F':expr_F, 'expr_z':expr_z, 'expr_v':expr_v, 'expr_sigma_block':expr_sigma_block, \ 204 'expr_Fx':expr_Fx, 'expr_Fy':expr_Fy, 'expr_Fz':expr_Fz, 'expr_z_grad':expr_z_grad, 'expr_Ftemp':expr_Ftemp} 205 206 207 if list_nonzero_indices is not None: 208 if list_ao_values is not None: 209 if xc_family_dict[x_family_code]=='LDA' and xc_family_dict[c_family_code]=='LDA': 210 output = Parallel(n_jobs=ncores, backend='threading', require='sharedmem', batch_size=batch_size, pre_dispatch=3*ncores)(delayed(block_dens_func)(weights[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], coords[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], dmat[np.ix_(list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]], list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]])], funcid, bfs_data_as_np_arrays, list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]], list_ao_values[iblock], funcx=funcx, funcc=funcc, x_family_code=x_family_code, c_family_code=c_family_code, xc_family_dict=xc_family_dict, numexpr_expr=numexpr_expr, debug=debug) for iblock in block_indices) 211 else: #GGA 212 output = Parallel(n_jobs=ncores, backend='threading', require='sharedmem', batch_size=batch_size)(delayed(block_dens_func)(weights[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], coords[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], dmat[np.ix_(list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]], list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]])], funcid, bfs_data_as_np_arrays, list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]], list_ao_values[iblock], list_ao_grad_values[iblock], funcx=funcx, funcc=funcc, x_family_code=x_family_code, c_family_code=c_family_code, xc_family_dict=xc_family_dict, numexpr_expr=numexpr_expr, debug=debug) for iblock in block_indices) 213 else: 214 output = Parallel(n_jobs=ncores, backend='threading', require='sharedmem', batch_size=batch_size)(delayed(block_dens_func)(weights[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], coords[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], dmat[np.ix_(list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]], list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]])], funcid, bfs_data_as_np_arrays, list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]], funcx=funcx, funcc=funcc, x_family_code=x_family_code, c_family_code=c_family_code, xc_family_dict=xc_family_dict, numexpr_expr=numexpr_expr, debug=debug) for iblock in block_indices) 215 # output = Parallel(n_jobs=ncores, backend='loky')(delayed(block_dens_func)(weights[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], coords[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], dmat[np.ix_(list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]], list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]])], funcid, bfs_data_as_np_arrays, list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]], funcx=None, funcc=None, x_family_code=x_family_code, c_family_code=c_family_code, xc_family_dict=xc_family_dict, debug=debug) for iblock in block_indices) 216 else: 217 output = Parallel(n_jobs=ncores, backend='threading', require='sharedmem', batch_size=batch_size)(delayed(block_dens_func)(weights[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], coords[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], dmat, funcid, bfs_data_as_np_arrays, non_zero_indices=None, ao_values=None, funcx=funcx, funcc=funcc, x_family_code=x_family_code, c_family_code=c_family_code, xc_family_dict=xc_family_dict, numexpr_expr=numexpr_expr, debug=debug) for iblock in block_indices) 218 219 indx_block_output = 0 220 for iblock in block_indices: 221 efunc += output[indx_block_output][0] 222 if list_nonzero_indices is not None: 223 non_zero_indices = list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]] 224 v[np.ix_(non_zero_indices, non_zero_indices)] += output[indx_block_output][1] 225 # slice = np.ix_(non_zero_indices, non_zero_indices) 226 # v[slice] = numexpr.evaluate("(v_ + output_)", {'v_':v[slice], 'output_':output[indx_block_output][1]}) 227 else: 228 v += output[indx_block_output][1] 229 nelec += output[indx_block_output][2] 230 if debug: 231 timings['durationLibxc'] += output[indx_block_output][3]['durationLibxc'] 232 timings['durationE'] += output[indx_block_output][3]['durationE'] 233 timings['durationF'] += output[indx_block_output][3]['durationF'] 234 timings['durationZ'] += output[indx_block_output][3]['durationZ'] 235 timings['durationV'] += output[indx_block_output][3]['durationV'] 236 timings['durationRho'] += output[indx_block_output][3]['durationRho'] 237 timings['durationAO'] += output[indx_block_output][3]['durationAO'] 238 indx_block_output += 1 239 # v = numexpr.evaluate('(v + output[iblock][1])') 240 241 # print('here1') 242 # for i in range(1000): 243 # a = dmat[np.ix_(list_nonzero_indices[60][0:count_nonzero_indices[60]], list_nonzero_indices[60][0:count_nonzero_indices[60]])] 244 # print('here2') 245 246 247 numba.set_num_threads(ncores) 248 print('Number of electrons: ', nelec) 249 if debug: 250 print('Timings:', timings) 251 252 ####### Free memory 253 ## The following is very important to prevent memory leaks and also to make sure that the number of 254 # threads used by the program is same as that specified by the user 255 # gc.collect() # Avoiding using it for now, as it is usually quite slow, although in this case it might not make much difference 256 # Anyway, the following also works 257 output = 0 258 non_zero_indices = 0 259 coords = 0 260 261 return efunc[0], v
Evaluate exchange-correlation (XC) energy and potential matrix for DFT using algorithm 2, which is the preferred algorithm for running DFT on CPU.
In algorithm 2, the XC term is evaluated by parallelizing over batches. In contrast, the algorithm 1 evaluates the XC term by looping over batches and parallelizing the operations within the batch, which is suboptimal.
This function evaluates the XC energy and potential matrix elements for a given
density matrix using grid-based numerical integration. It supports LDA and GGA
functionals (via LibXC), parallel execution with joblib, and sparse AO matrix blocks.
Parameters
basis : Basis A basis object containing basis function data: exponents, coefficients, angular momentum, normalization, etc.
dmat : np.ndarray The one-electron density matrix in the AO basis.
weights : np.ndarray Integration weights associated with each grid point.
coords : np.ndarray Grid point coordinates as an (N, 3) array.
funcid : list of int, optional LibXC functional IDs. Default is [1, 7] for Slater (X) and VWN (C) (LDA).
spin : int, optional Spin multiplicity: 0 for unpolarized, 1 for spin-polarized (not allowed currently).
ncores : int, optional Number of threads/cores for parallel execution. Default is 2.
blocksize : int, optional Number of grid points to process per block. Default is 5000.
list_nonzero_indices : list of np.ndarray, optional Precomputed list of nonzero AO indices per block for sparse evaluation.
count_nonzero_indices : list of int, optional
Number of nonzero indices in each block (matches list_nonzero_indices).
list_ao_values : list of np.ndarray, optional Precomputed AO values at grid points for each block.
list_ao_grad_values : list of tuple of np.ndarray, optional Precomputed AO gradient values (x, y, z) at grid points for each block.
debug : bool, optional If True, print detailed timing and diagnostic info.
Returns
efunc : float Total exchange-correlation energy.
v : np.ndarray Exchange-correlation potential matrix in the AO basis.
Notes
- Only supports LDA and GGA functionals currently. meta-GGA and hybrid support is in development.
- Uses LibXC for energy and derivative functional evaluation.
- Blocks are randomly shuffled before processing to balance parallel load.
- Grid-based DFT evaluation using precomputed AO and gradient matrices is supported.
References
- LibXC Functional Codes: https://libxc.gitlab.io/functionals/
- XC term evaluation inspired from: https://pubs.acs.org/doi/full/10.1021/ct200412r
16def eval_xc_3(basis, dmat, weights, coords, funcid=[1,7], spin=0, ncores=2, blocksize=5000, list_nonzero_indices=None, count_nonzero_indices=None, list_ao_values=None, list_ao_grad_values=None, debug=False): 17 # This performs parallelization at the blocks/batches level. 18 # Therefore, joblib is perfect for such embarrasingly parallel task 19 # In order to evaluate a density functional we will use the 20 # libxc library with Python bindings. 21 # However, some sort of simple functionals like LDA, GGA, etc would 22 # need to be implemented in CrysX also, so that the it doesn't depend 23 # on external libraries so heavily that it becomes unusable without those. 24 25 #Useful links: 26 # LibXC manual: https://www.tddft.org/programs/libxc/manual/ 27 # LibXC gitlab: https://gitlab.com/libxc/libxc/-/tree/master 28 # LibXC python interface code: https://gitlab.com/libxc/libxc/-/blob/master/pylibxc/functional.py 29 # LibXC python version installation and example: https://www.tddft.org/programs/libxc/installation/ 30 # Formulae for XC energy and potential calculation: https://pubs.acs.org/doi/full/10.1021/ct200412r 31 # LibXC code list: https://github.com/pyscf/pyscf/blob/master/pyscf/dft/libxc.py 32 # PySCF nr_rks code: https://github.com/pyscf/pyscf/blob/master/pyscf/dft/numint.py 33 # https://www.osti.gov/pages/servlets/purl/1650078 34 35 # Start Ray. 36 if not ray.is_initialized(): 37 ray.init(num_cpus = ncores-1) 38 39 #OUTPUT 40 #Functional energy 41 efunc = 0.0 42 #Functional potential V_{\mu \nu} = \mu|\partial{f}/\partial{\rho}|\nu 43 # with threadpool_limits(limits=1, user_api='blas'): 44 v = np.zeros((basis.bfs_nao, basis.bfs_nao)) 45 nelec = 0 46 47 # TODO it only works for LDA functionals for now. 48 # Need to make it work for GGA, Hybrid, range-separated Hybrid and MetaGGA functionals as well. 49 50 51 ngrids = coords.shape[0] 52 nblocks = ngrids//blocksize 53 54 # print('Number of blocks: ', nblocks) 55 56 # Some stuff to note timings 57 timings = None 58 if debug: 59 durationLibxc = 0.0 60 durationE = 0.0 61 durationF = 0.0 62 durationZ = 0.0 63 durationV = 0.0 64 durationRho = 0.0 65 durationAO = 0.0 66 timings = {'durationLibxc':durationLibxc, 'durationE':durationE, 'durationF':durationF, 'durationZ':durationZ, 'durationV':durationV, 'durationRho':durationRho, 'durationAO':durationAO} 67 68 69 70 ### Calculate stuff necessary for bf/ao evaluation on grid points 71 ### Doesn't make any difference for 510 bfs but might be significant for >1000 bfs 72 # This will help to make the call to evalbfnumba1 faster by skipping the mediator evalbfnumbawrap function 73 # that prepares the following stuff at every iteration 74 #We convert the required properties to numpy arrays as this is what Numba likes. 75 bfs_coords = np.array([basis.bfs_coords]) 76 bfs_contr_prim_norms = np.array([basis.bfs_contr_prim_norms]) 77 bfs_lmn = np.array([basis.bfs_lmn]) 78 bfs_nprim = np.array([basis.bfs_nprim]) 79 #The remaining properties like bfs_coeffs are a list of lists of unequal sizes. 80 #Numba won't be able to work with these efficiently. 81 #So, we convert them to a numpy 2d array by applying a trick, 82 #that the second dimension is that of the largest list. So that 83 #it can accomadate all the lists. 84 maxnprim = max(basis.bfs_nprim) 85 bfs_coeffs = np.zeros([basis.bfs_nao, maxnprim]) 86 bfs_expnts = np.zeros([basis.bfs_nao, maxnprim]) 87 bfs_prim_norms = np.zeros([basis.bfs_nao, maxnprim]) 88 bfs_radius_cutoff = np.zeros([basis.bfs_nao]) 89 for i in range(basis.bfs_nao): 90 for j in range(basis.bfs_nprim[i]): 91 bfs_coeffs[i,j] = basis.bfs_coeffs[i][j] 92 bfs_expnts[i,j] = basis.bfs_expnts[i][j] 93 bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j] 94 bfs_radius_cutoff[i] = basis.bfs_radius_cutoff[i] 95 # Now bf/ao values can be evaluated by calling the following 96 # bf_values = Integrals.bf_val_helpers.eval_bfs(bfs_coords[0], bfs_contr_prim_norms[0], bfs_nprim[0], bfs_lmn[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, bfs_radius_cutoff, coord) 97 bfs_data_as_np_arrays = [bfs_coords[0], bfs_contr_prim_norms[0], bfs_nprim[0], bfs_lmn[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, bfs_radius_cutoff] 98 99 xc_family_dict = {1:'LDA',2:'GGA',4:'MGGA'} 100 # Create a LibXC object 101 funcx = pylibxc.LibXCFunctional(funcid[0], "unpolarized") 102 funcc = pylibxc.LibXCFunctional(funcid[1], "unpolarized") 103 x_family_code = funcx.get_family() 104 c_family_code = funcc.get_family() 105 106 # weights_put = ray.put(weights) 107 # coords_put = ray.put(coords) 108 bfs_data_as_np_arrays = ray.put(bfs_data_as_np_arrays) 109 if list_nonzero_indices is not None: 110 if list_ao_values is not None: 111 if xc_family_dict[x_family_code]=='LDA' and xc_family_dict[c_family_code]=='LDA': 112 # Launch four parallel square tasks 113 futures = [block_dens_func.remote(weights[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], coords[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], dmat[np.ix_(list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]], list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]])], funcid, bfs_data_as_np_arrays, list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]], list_ao_values[iblock], funcx=None, funcc=None, x_family_code=x_family_code, c_family_code=c_family_code, xc_family_dict=xc_family_dict, debug=debug) for iblock in range(nblocks+1)] 114 # else: #GGA 115 # output = Parallel(n_jobs=ncores, backend='threading', require='sharedmem', batch_size=batch_size)(delayed(block_dens_func)(weights[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], coords[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], dmat[np.ix_(list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]], list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]])], funcid, bfs_data_as_np_arrays, list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]], list_ao_values[iblock], list_ao_grad_values[iblock], funcx=funcx, funcc=funcc, x_family_code=x_family_code, c_family_code=c_family_code, xc_family_dict=xc_family_dict, debug=debug) for iblock in block_indices) 116 # else: 117 # output = Parallel(n_jobs=ncores, backend='threading', require='sharedmem', batch_size=batch_size)(delayed(block_dens_func)(weights[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], coords[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], dmat[np.ix_(list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]], list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]])], funcid, bfs_data_as_np_arrays, list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]], funcx=funcx, funcc=funcc, x_family_code=x_family_code, c_family_code=c_family_code, xc_family_dict=xc_family_dict, debug=debug) for iblock in block_indices) 118 # else: 119 # output = Parallel(n_jobs=ncores, backend='threading', require='sharedmem', batch_size=batch_size)(delayed(block_dens_func)(weights[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], coords[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], dmat, funcid, bfs_data_as_np_arrays, non_zero_indices=None, ao_values=None, funcx=funcx, funcc=funcc, x_family_code=x_family_code, c_family_code=c_family_code, xc_family_dict=xc_family_dict, debug=debug) for iblock in block_indices) 120 121 # Retrieve results 122 output = ray.get(futures) 123 124 for iblock in range(nblocks+1): 125 efunc += output[iblock][0] 126 if list_nonzero_indices is not None: 127 non_zero_indices = list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]] 128 v[np.ix_(non_zero_indices, non_zero_indices)] += output[iblock][1] 129 # v[np.ix_(non_zero_indices, non_zero_indices)] = numexpr.evaluate("(v_ + output_)", {'v_':v[np.ix_(non_zero_indices, non_zero_indices)], 'output_':output[indx_block_output][1]}) 130 else: 131 v += output[iblock][1] 132 nelec += output[iblock][2] 133 if debug: 134 timings['durationLibxc'] += output[iblock][3]['durationLibxc'] 135 timings['durationE'] += output[iblock][3]['durationE'] 136 timings['durationF'] += output[iblock][3]['durationF'] 137 timings['durationZ'] += output[iblock][3]['durationZ'] 138 timings['durationV'] += output[iblock][3]['durationV'] 139 timings['durationRho'] += output[iblock][3]['durationRho'] 140 timings['durationAO'] += output[iblock][3]['durationAO'] 141 # v = numexpr.evaluate('(v + output[iblock][1])') 142 143 144 145 numba.set_num_threads(ncores) 146 print('Number of electrons: ', nelec) 147 if debug: 148 print('Timings:', timings) 149 150 ####### Free memory 151 ## The following is very important to prevent memory leaks and also to make sure that the number of 152 # threads used by the program is same as that specified by the user 153 # gc.collect() # Avoiding using it for now, as it is usually quite slow, although in this case it might not make much difference 154 # Anyway, the following also works 155 output = 0 156 non_zero_indices = 0 157 coords = 0 158 159 return efunc, v
22def eval_xc_1_cupy(basis, dmat, weights, coords, funcid=[1,7], spin=0, blocksize=50000, debug=False, list_nonzero_indices=None, \ 23 count_nonzero_indices=None, list_ao_values=None, list_ao_grad_values=None, use_libxc=True, nstreams=1, ngpus=1,\ 24 freemem=True, threads_per_block=None, type=cp.float64): 25 print('Calculating XC term using GPU and algo 1', flush=True) 26 # Evaluate the XC term using GPU 27 # Here instead of parallelizing over batches, the operations within 28 # a batch are parallelized. Therefore, this requires a larger batchsize~20480. 29 # While the CPU version of this algorithm is quite slow, 30 # the GPU version performs great! 31 32 # In order to evaluate a density functional we will use the 33 # libxc library with Python bindings. 34 # However, some sort of simple functionals like LDA, GGA, etc would 35 # need to be implemented in CrysX also, so that the it doesn't depend 36 # on external libraries so heavily that it becomes unusable without those. 37 38 #Useful links: 39 # LibXC manual: https://www.tddft.org/programs/libxc/manual/ 40 # LibXC gitlab: https://gitlab.com/libxc/libxc/-/tree/master 41 # LibXC python interface code: https://gitlab.com/libxc/libxc/-/blob/master/pylibxc/functional.py 42 # LibXC python version installation and example: https://www.tddft.org/programs/libxc/installation/ 43 # Formulae for XC energy and potential calculation: https://pubs.acs.org/doi/full/10.1021/ct200412r 44 # LibXC code list: https://github.com/pyscf/pyscf/blob/master/pyscf/dft/libxc.py 45 # PySCF nr_rks code: https://github.com/pyscf/pyscf/blob/master/pyscf/dft/numint.py 46 # https://www.osti.gov/pages/servlets/purl/1650078 47 48 startTotal = timer() 49 50 # For debugging and benchmarking purposes 51 durationLibxc = 0.0 52 durationE = 0.0 53 durationF = 0.0 54 durationZ = 0.0 55 durationV = 0.0 56 durationRho = 0.0 57 durationAO = 0.0 58 durationPrelims = 0.0 59 durationTotal = 0.0 60 n_gpu= 1 61 # nstreams = 1 62 # Create streams for asynchronous execution 63 # streams = [cp.cuda.Stream(non_blocking=False) for i in range(nstreams)] 64 # nb_streams = [cuda.external_stream(streams[i].ptr) for i in range(nstreams)] 65 # nb_streams = [cuda.stream() for i in range(nstreams)] 66 # cp_streams = [cp.cuda.ExternalStream(nb_streams[i].ptr) for i in range(nstreams)] 67 # TODO: Try using this for multiGPU support: https://github.com/cupy/cupy/issues/5692 68 # streams = [] 69 # for i in range(n_gpu): 70 # cp.cuda.Device(i).use() 71 # streams.append(cp.cuda.Stream(non_blocking = True)) 72 73 if not use_libxc: 74 print('Not using LibXC for XC evaluations', flush=True) 75 76 if debug: 77 startPrelims = timer() 78 #OUTPUT 79 #Functional energy 80 efunc = 0.0 81 #Functional potential V_{\mu \nu} = \mu|\partial{f}/\partial{\rho}|\nu 82 v = cp.zeros((basis.bfs_nao, basis.bfs_nao), dtype=type) 83 #print(v.dtype) 84 #TODO mGGA, Hybrid 85 86 #Calculate number of blocks/batches 87 ngrids = coords.shape[0] 88 nblocks = ngrids//blocksize 89 nelec = 0.0 90 91 # Convert the arrays needed for CUDA computations to cupy arrays 92 93 # If a list of significant basis functions for each block of grid points is provided 94 if list_nonzero_indices is not None: 95 dmat_orig = dmat 96 dmat_orig_cp = cp.asarray(dmat_orig, dtype=type) 97 else: 98 dmat = cp.asarray(dmat, dtype=type) 99 100 ### Calculate stuff necessary for bf/ao evaluation on grid points 101 ### Doesn't make any difference for 510 bfs but might be significant for >1000 bfs 102 # This will help to make the call to eval_bfs faster by skipping the mediator eval_bfs function 103 # that prepares the following stuff at every iteration 104 #We convert the required properties to numpy arrays as this is what Numba likes. 105 bfs_coords = cp.asarray([basis.bfs_coords], dtype=type) 106 bfs_contr_prim_norms = cp.array([basis.bfs_contr_prim_norms], dtype=type) 107 bfs_lmn = cp.array([basis.bfs_lmn]) 108 bfs_nprim = cp.array([basis.bfs_nprim]) 109 #The remaining properties like bfs_coeffs are a list of lists of unequal sizes. 110 #Numba won't be able to work with these efficiently. 111 #So, we convert them to a numpy 2d array by applying a trick, 112 #that the second dimension is that of the largest list. So that 113 #it can accomadate all the lists. 114 maxnprim = max(basis.bfs_nprim) 115 bfs_coeffs = cp.zeros([basis.bfs_nao, maxnprim], dtype=type) 116 bfs_expnts = cp.zeros([basis.bfs_nao, maxnprim], dtype=type) 117 bfs_prim_norms = cp.zeros([basis.bfs_nao, maxnprim], dtype=type) 118 bfs_radius_cutoff = cp.zeros([basis.bfs_nao], dtype=type) 119 for i in range(basis.bfs_nao): 120 for j in range(basis.bfs_nprim[i]): 121 bfs_coeffs[i,j] = basis.bfs_coeffs[i][j] 122 bfs_expnts[i,j] = basis.bfs_expnts[i][j] 123 bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j] 124 bfs_radius_cutoff[i] = basis.bfs_radius_cutoff[i] 125 # Now bf/ao values can be evaluated by calling the following 126 # bf_values = Integrals.bf_val_helpers.eval_bfs(bfs_coords[0], bfs_contr_prim_norms[0], bfs_nprim[0], bfs_lmn[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, bfs_radius_cutoff, coord) 127 for iblock in range(len(list_nonzero_indices)): 128 list_nonzero_indices[iblock] = cp.asarray(list_nonzero_indices[iblock]) 129 130 131 weights_cp = cp.asarray(weights, dtype=type) 132 coords_cp = cp.asarray(coords, dtype=type) 133 134 135 xc_family_dict = {1:'LDA',2:'GGA',4:'MGGA'} 136 137 # Create a LibXC object 138 funcx = pylibxc.LibXCFunctional(funcid[0], "unpolarized") 139 funcc = pylibxc.LibXCFunctional(funcid[1], "unpolarized") 140 x_family_code = funcx.get_family() 141 c_family_code = funcc.get_family() 142 143 144 my_expr = contract_expression('ij,mi,mj->m', (150, 150), (blocksize, 150), (blocksize, 150)) 145 if xc_family_dict[x_family_code]!='LDA' or xc_family_dict[c_family_code]!='LDA': 146 my_expr_grad1 = contract_expression('ij,kmi,mj->km', (150, 150), (3, blocksize, 150), (blocksize, 150)) 147 my_expr_grad2 = contract_expression('ij,mi,kmj->km', (150, 150), (blocksize, 150), (3, blocksize, 150)) 148 149 if threads_per_block is None: 150 # Determine the optimal number of blocks per grid 151 max_threads_per_block = cuda.get_current_device().MAX_THREADS_PER_BLOCK 152 thread_x = max_threads_per_block/16 153 thread_y = max_threads_per_block/64 154 threads_per_block = (thread_x, thread_y) 155 else: 156 thread_x = threads_per_block[0] 157 thread_y = threads_per_block[1] 158 159 160 if debug: 161 cp.cuda.Stream.null.synchronize() 162 durationPrelims += timer() - startPrelims 163 164 # A large scratch for evaluating batch local ao values 165 ao_value_block_ = cp.zeros((blocksize, max(count_nonzero_indices)), dtype=type) 166 # If either x or c functional is of GGA/MGGA type we need ao_grad_values 167 if xc_family_dict[x_family_code]!='LDA' or xc_family_dict[c_family_code]!='LDA': 168 ao_value_grad_block_ = cp.zeros((3, blocksize, max(count_nonzero_indices)), dtype=type) 169 170 # Just a precautionary synchronize here so that all the cupy arrays created above should finish getting initialized. 171 # cp.cuda.Stream.null.synchronize() 172 # cuda.synchronize() 173 # with cupyx.profiler.profile(): 174 # with streams[0]: 175 # Loop over blocks/batches of grid points 176 for iblock in range(nblocks+1): 177 # stream_id = iblock % nstreams 178 # with streams[stream_id]: 179 # with nb_streams[stream_id]: 180 # cp.cuda.Device(stream_id).use() 181 # cp.cuda.Stream(cp_streams[stream_id]).use() 182 offset = iblock*blocksize 183 184 # Get weights and coordinates of grid points for this block/batch 185 weights_block = weights_cp[offset : min(offset+blocksize,ngrids)] 186 coords_block = coords_cp[offset : min(offset+blocksize,ngrids)] 187 #coords_block_cp = cp.asarray(coords[offset : min(offset+blocksize,ngrids)]) 188 189 # Get the list of basis functions with significant contributions to this block 190 if list_nonzero_indices is not None: 191 non_zero_indices = list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]] 192 # Get the subset of density matrix corresponding to the siginificant basis functions 193 dmat = dmat_orig_cp[cp.ix_(non_zero_indices, non_zero_indices)] 194 195 if debug: 196 startAO = timer() 197 if xc_family_dict[x_family_code]=='LDA' and xc_family_dict[c_family_code]=='LDA': 198 if list_ao_values is not None: # If ao_values are calculated once and saved, then they can be provided to avoid recalculation 199 ao_value_block = list_ao_values[iblock] 200 else: 201 # ao_value_block = Integrals.eval_bfs(basis, coords_block) 202 if list_nonzero_indices is not None: 203 # ao_value_block = Integrals.bf_val_helpers.eval_bfs_sparse_vectorized_internal_cupy(bfs_coords[0], bfs_contr_prim_norms[0], bfs_nprim[0], bfs_lmn[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, coords_block, non_zero_indices, scratches) 204 # Get a subset of the scratch for evaluating batch local ao values 205 ao_value_block = cp.asarray(ao_value_block_[0:coords_block.shape[0], 0:non_zero_indices.shape[0]]) 206 # cp.cuda.Stream.null.synchronize() # Precautionary synchronize here 207 # blocks_per_grid = ((non_zero_indices.shape[0]//thread_x) + 1, (coords_block.shape[0]//thread_y) + 1) # "lazy" round-up 208 blocks_per_grid = ((non_zero_indices.shape[0] + (thread_x - 1))//thread_x, (coords_block.shape[0] + (thread_y - 1))//thread_y) 209 # nb_stream = stream_cupy_to_numba(streams[stream_id]) 210 # get the pointer to actual CUDA stream 211 # raw_str = streams[stream_id].ptr 212 # nb_stream = numba.cuda.external_stream(raw_str) 213 Integrals.bf_val_helpers.eval_bfs_sparse_internal_cuda[blocks_per_grid, threads_per_block](bfs_coords[0], bfs_contr_prim_norms[0], bfs_nprim[0], bfs_lmn[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, coords_block, non_zero_indices, ao_value_block) 214 215 # cuda.synchronize() # This is needed when using this in a cupy stream 216 # cp.cuda.Stream.null.synchronize() 217 else: 218 ao_value_block = Integrals.bf_val_helpers.eval_bfs_internal(bfs_coords[0], bfs_contr_prim_norms[0], bfs_nprim[0], bfs_lmn[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, coords_block) 219 # If either x or c functional is of GGA/MGGA type we need ao_grad_values 220 if xc_family_dict[x_family_code]!='LDA' or xc_family_dict[c_family_code]!='LDA': 221 if list_ao_values is not None: # If ao_values are calculated once and saved, then they can be provided to avoid recalculation 222 ao_value_block, ao_values_grad_block = list_ao_values[iblock], list_ao_grad_values[iblock] 223 else: 224 # ao_value_block, ao_values_grad_block = Integrals.bf_val_helpers.eval_bfs_and_grad(basis, coords_block, deriv=1, parallel=True, non_zero_indices=non_zero_indices) 225 if list_nonzero_indices is not None: 226 # Calculating ao values and gradients together, didn't really do much improvement in computational speed 227 # ao_value_block, ao_values_grad_block = Integrals.bf_val_helpers.eval_bfs_and_grad_sparse_internal(bfs_coords[0], bfs_contr_prim_norms[0], bfs_nprim[0], bfs_lmn[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, coords_block, non_zero_indices) 228 # Get a subset of the scratch for evaluating batch local ao values 229 ao_value_block = cp.asarray(ao_value_block_[0:coords_block.shape[0], 0:non_zero_indices.shape[0]]) 230 ao_values_grad_block = cp.asarray(ao_value_grad_block_[0:3, 0:coords_block.shape[0], 0:non_zero_indices.shape[0]]) 231 blocks_per_grid = ((non_zero_indices.shape[0] + (thread_x - 1))//thread_x, (coords_block.shape[0] + (thread_y - 1))//thread_y) 232 # nb_stream = stream_cupy_to_numba(streams[stream_id]) 233 Integrals.bf_val_helpers.eval_bfs_and_grad_sparse_internal_cuda[blocks_per_grid, threads_per_block](bfs_coords[0], bfs_contr_prim_norms[0], bfs_nprim[0], 234 bfs_lmn[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, 235 coords_block, non_zero_indices, ao_value_block, 236 ao_values_grad_block) 237 238 # streams[stream_id].synchronize() 239 # nb_streams[stream_id].synchronize() 240 # cp.cuda.Stream.null.synchronize() 241 # streams[stream_id].synchronize() 242 243 else: 244 ao_value_block, ao_values_grad_block = Integrals.bf_val_helpers.eval_bfs_and_grad_internal(bfs_coords[0], bfs_contr_prim_norms[0], bfs_nprim[0], bfs_lmn[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, coords_block) 245 246 247 # Convert to cupy array 248 ao_value_block = cp.asarray(ao_value_block, dtype=type) 249 if xc_family_dict[x_family_code]!='LDA' or xc_family_dict[c_family_code]!='LDA': 250 ao_values_grad_block = cp.asarray(ao_values_grad_block, dtype=type) 251 252 if debug: 253 cp.cuda.Stream.null.synchronize() 254 durationAO = durationAO + timer() - startAO 255 256 if debug: 257 startRho = timer() 258 # print('calculating rho...') 259 # rho_block = contract('ij,mi,mj->m', dmat, ao_value_block, ao_value_block, backend='cupy') # Original (pretty fast) 260 rho_block = my_expr(dmat, ao_value_block, ao_value_block, backend='cupy') 261 # X = ao_value_block @ dmat.T 262 # rho_block = cp.sum(ao_value_block*X,axis=1) 263 # rho_block = contract('mi,im->m', ao_value_block, X) 264 # If either x or c functional is of GGA/MGGA type we need rho_grad_values too 265 if xc_family_dict[x_family_code]!='LDA' or xc_family_dict[c_family_code]!='LDA': 266 # rho_grad_block_x, rho_grad_block_y, rho_grad_block_z = ( contract('ij,kmi,mj->km',dmat,ao_values_grad_block,ao_value_block, backend='cupy', optimize='optimal')+\ 267 # contract('ij,mi,kmj->km',dmat,ao_value_block,ao_values_grad_block, backend='cupy', optimize='optimal') )[:] 268 rho_grad_block_x, rho_grad_block_y, rho_grad_block_z = my_expr_grad1(dmat,ao_values_grad_block,ao_value_block, backend='cupy') \ 269 + my_expr_grad2(dmat, ao_value_block, ao_values_grad_block, backend='cupy') 270 sigma_block = calc_sigma(rho_grad_block_x, rho_grad_block_y, rho_grad_block_z) 271 if use_libxc: 272 sigma_block = cp.asnumpy(sigma_block) 273 274 if debug: 275 cp.cuda.Stream.null.synchronize() 276 durationRho = durationRho + timer() - startRho 277 #print('Duration for Rho at grid points: ',durationRho) 278 # print('done calculating rho') 279 280 #LibXC stuff 281 if use_libxc: 282 # Exchange 283 if debug: 284 startLibxc = timer() 285 # Input dictionary for libxc 286 inp = {} 287 # Input dictionary needs density values at grid points 288 inp['rho'] = cp.asnumpy(rho_block) 289 if xc_family_dict[x_family_code]!='LDA': 290 # Input dictionary needs sigma (\nabla \rho \cdot \nabla \rho) values at grid points 291 inp['sigma'] = sigma_block 292 # Calculate the necessary quantities using LibXC 293 retx = funcx.compute(inp) 294 # durationLibxc = durationLibxc + timer() - startLibxc 295 # print('Duration for LibXC computations at grid points: ',durationLibxc) 296 297 # Correlation 298 # startLibxc = timer() 299 # Input dictionary for libxc 300 #inp = {} 301 # Input dictionary needs density values at grid points 302 #inp['rho'] = rho_block 303 if xc_family_dict[c_family_code]!='LDA': 304 # Input dictionary needs sigma (\nabla \rho \cdot \nabla \rho) values at grid points 305 inp['sigma'] = sigma_block 306 # Calculate the necessary quantities using LibXC 307 retc = funcc.compute(inp) 308 if debug: 309 cp.cuda.Stream.null.synchronize() 310 durationLibxc = durationLibxc + timer() - startLibxc 311 # print('Duration for LibXC computations at grid points: ',durationLibxc) 312 else: 313 # Exchange 314 if debug: 315 startLibxc = timer() 316 # Calculate the necessary quantities using own implementation 317 # retx = lda_x(rho_block) 318 # retx = gga_x_b88(rho_block, sigma_block) 319 if xc_family_dict[x_family_code]=='LDA': 320 retx = XC.func_compute(funcid[0], rho_block, use_gpu=True) 321 else: 322 retx = XC.func_compute(funcid[0], rho_block, sigma_block, use_gpu=True) 323 324 # Correlation 325 # Calculate the necessary quantities using own implementation 326 # retc = lda_c_vwn(rho_block) 327 # retc = gga_c_lyp(rho_block, sigma_block) 328 if xc_family_dict[c_family_code]=='LDA': 329 retc = XC.func_compute(funcid[1], rho_block, use_gpu=True) 330 else: 331 retc = XC.func_compute(funcid[1], rho_block, sigma_block, use_gpu=True) 332 if debug: 333 cp.cuda.Stream.null.synchronize() 334 durationLibxc = durationLibxc + timer() - startLibxc 335 # print('Duration for LibXC computations at grid points: ',durationLibxc) 336 337 338 if debug: 339 startE = timer() 340 #ENERGY----------- 341 if use_libxc: 342 e = retx['zk'] + retc['zk'] # Functional values at grid points 343 else: 344 e = retx[0] + retc[0] 345 # Testing CrysX's own implmentation 346 #e = densfuncs.lda_x(rho) 347 348 # Calculate the total energy 349 # Multiply the density at grid points by weights 350 den = rho_block*weights_block 351 # den = cp.multiply(rho_block, weights_block) 352 # den = elementwise_multiply(rho_block, weights_block) 353 if use_libxc: 354 efunc = efunc + cp.dot(den, cp.asarray(e)) #Multiply with functional values at grid points and sum 355 else: 356 efunc = efunc + cp.dot(den, e) #Multiply with functional values at grid points and sum 357 # efunc = efunc + cp.sum(elementwise_multiply(den, e)) #Multiply with functional values at grid points and sum 358 nelec += cp.sum(den) 359 if debug: 360 cp.cuda.Stream.null.synchronize() 361 durationE = durationE + timer() - startE 362 # print('Duration for calculation of total density functional energy: ',durationE) 363 364 #POTENTIAL---------- 365 # The derivative of functional wrt density is vrho 366 if debug: 367 startF = timer() 368 if use_libxc: 369 vrho = retx['vrho'] + retc['vrho'] 370 else: 371 vrho = retx[1] + retc[1] 372 373 vsigma = 0 374 # If either x or c functional is of GGA/MGGA type we need rho_grad_values 375 if xc_family_dict[x_family_code]!='LDA': 376 # The derivative of functional wrt grad \rho square. 377 if use_libxc: 378 vsigma += retx['vsigma'] 379 else: 380 vsigma += retx[2] 381 382 if xc_family_dict[c_family_code]!='LDA': 383 # The derivative of functional wrt grad \rho square. 384 if use_libxc: 385 vsigma += retc['vsigma'] 386 else: 387 vsigma += retc[2] 388 389 # F = np.multiply(weights_block,vrho[:,0]) #This is fast enough. 390 if use_libxc: 391 v_rho_temp = cp.asarray(vrho[:,0]) 392 else: 393 v_rho_temp = vrho 394 # F = weights_block*v_rho_temp 395 # F = calc_F(weights_block, v_rho_temp) 396 # If either x or c functional is of GGA/MGGA type we need rho_grad_values 397 if xc_family_dict[x_family_code]!='LDA' or xc_family_dict[c_family_code]!='LDA': 398 if use_libxc: 399 Ftemp = 2*weights_block*cp.asarray(vsigma[:,0]) 400 else: 401 Ftemp = 2*weights_block*vsigma 402 # Ftemp = 2*cp.multiply(weights_block, vsigma) 403 # Ftemp = 2*elementwise_multiply(weights_block, vsigma) 404 Fx = Ftemp*rho_grad_block_x 405 Fy = Ftemp*rho_grad_block_y 406 Fz = Ftemp*rho_grad_block_z 407 if debug: 408 cp.cuda.Stream.null.synchronize() 409 durationF = durationF + timer() - startF 410 # print('Duration for calculation of F: ',durationF) 411 if debug: 412 startZ = timer() 413 # z = 0.5*F*ao_value_block.T 414 # z = calc_z(F, ao_value_block.T) 415 z = calc_z(weights_block, v_rho_temp, ao_value_block.T) 416 # z = 0.5*np.einsum('m,mi->mi',F,ao_value_block) 417 # If either x or c functional is of GGA/MGGA type we need rho_grad_values 418 if xc_family_dict[x_family_code]!='LDA' or xc_family_dict[c_family_code]!='LDA': 419 z += calc_z_gga(Fx, Fy, Fz, ao_values_grad_block[0].T, ao_values_grad_block[1].T, ao_values_grad_block[2].T) 420 # z += Fx*ao_values_grad_block[0].T + Fy*ao_values_grad_block[1].T + Fz*ao_values_grad_block[2].T 421 422 423 if debug: 424 cp.cuda.Stream.null.synchronize() 425 durationZ = durationZ + timer() - startZ 426 # print('Duration for calculation of z : ',durationZ) 427 # Free memory 428 F = 0 429 ao_value_block_T = 0 430 vrho = 0 431 432 if debug: 433 startV = timer() 434 435 #v_temp = z @ ao_value_block 436 #v_temp = cp.dot(z, ao_value_block) 437 v_temp = cp.matmul(z, ao_value_block) 438 if list_nonzero_indices is not None: 439 v[cp.ix_(non_zero_indices, non_zero_indices)] += v_temp + v_temp.T 440 else: 441 v += v_temp + v_temp.T 442 443 444 if debug: 445 cp.cuda.Stream.null.synchronize() 446 durationV = durationV + timer() - startV 447 448 # for stream in streams: 449 # stream.synchronize() 450 451 452 print('Number of electrons: ', nelec) 453 #print(v_temp.dtype) 454 if debug: 455 print('Duration for Prelims: ', durationPrelims) 456 print('Duration for AO values: ', durationAO) 457 print('Duration for V: ',durationV) 458 print('Duration for Rho at grid points: ',durationRho) 459 print('Duration for F: ',durationF) 460 print('Duration for Z: ',durationZ) 461 print('Duration for E: ',durationE) 462 print('Duration for LibXC: ',durationLibxc) 463 464 465 466 467 if debug: 468 startV_tonumpy = timer() 469 if debug: 470 cp.cuda.Stream.null.synchronize() 471 durationV_tonumpy = timer() - startV_tonumpy 472 print('Duration for V to numpy', durationV_tonumpy) 473 print('Total time: ', durationPrelims+durationAO+durationRho+durationLibxc+durationE+durationF+durationV+durationZ+durationV_tonumpy) 474 475 476 if use_libxc: 477 efunc = efunc[0] 478 if debug: 479 cp.cuda.Stream.null.synchronize() 480 durationTotal += timer() - startTotal 481 print('Total time2: ', durationTotal) 482 483 if freemem or nstreams!=1: 484 cp._default_memory_pool.free_all_blocks() 485 486 487 # Final synchronize before return 488 cp.cuda.Stream.null.synchronize() 489 return efunc, v
18def eval_xc_2_cupy(basis, dmat, weights, coords, funcid=[1,7], spin=0, ncores=2, blocksize=5000, list_nonzero_indices=None, count_nonzero_indices=None, list_ao_values=None, list_ao_grad_values=None, debug=False): 19 print('Calculating XC term using GPU and algo 2', flush=True) 20 print('Algo 2 uses a hybrid CPU+GPU approach for XC evaluation.', flush=True) 21 print('CPU threads specified by ncores are used to evaluate functional values (via LibXC) and AO/AO grad values.', flush=True) 22 # This performs parallelization at the blocks/batches level. 23 # Therefore, joblib is perfect for such embarrasingly parallel task 24 # In order to evaluate a density functional we will use the 25 # libxc library with Python bindings. 26 # However, some sort of simple functionals like LDA, GGA, etc would 27 # need to be implemented in CrysX also, so that the it doesn't depend 28 # on external libraries so heavily that it becomes unusable without those. 29 30 #Useful links: 31 # LibXC manual: https://www.tddft.org/programs/libxc/manual/ 32 # LibXC gitlab: https://gitlab.com/libxc/libxc/-/tree/master 33 # LibXC python interface code: https://gitlab.com/libxc/libxc/-/blob/master/pylibxc/functional.py 34 # LibXC python version installation and example: https://www.tddft.org/programs/libxc/installation/ 35 # Formulae for XC energy and potential calculation: https://pubs.acs.org/doi/full/10.1021/ct200412r 36 # LibXC code list: https://github.com/pyscf/pyscf/blob/master/pyscf/dft/libxc.py 37 # PySCF nr_rks code: https://github.com/pyscf/pyscf/blob/master/pyscf/dft/numint.py 38 # https://www.osti.gov/pages/servlets/purl/1650078 39 40 #OUTPUT 41 #Functional energy 42 efunc = 0.0 43 #Functional potential V_{\mu \nu} = \mu|\partial{f}/\partial{\rho}|\nu 44 v = cp.zeros((basis.bfs_nao, basis.bfs_nao)) 45 nelec = 0 46 47 # TODO it only works for LDA functionals for now. 48 # Need to make it work for GGA, Hybrid, range-separated Hybrid and MetaGGA functionals as well. 49 50 51 ngrids = coords.shape[0] 52 nblocks = ngrids//blocksize 53 54 # print('Number of blocks: ', nblocks) 55 56 # Some stuff to note timings 57 timings = None 58 if debug: 59 durationLibxc = 0.0 60 durationE = 0.0 61 durationF = 0.0 62 durationZ = 0.0 63 durationV = 0.0 64 durationRho = 0.0 65 durationAO = 0.0 66 timings = {'durationLibxc':durationLibxc, 'durationE':durationE, 'durationF':durationF, 'durationZ':durationZ, 'durationV':durationV, 'durationRho':durationRho, 'durationAO':durationAO} 67 68 69 70 ### Calculate stuff necessary for bf/ao evaluation on grid points 71 ### Doesn't make any difference for 510 bfs but might be significant for >1000 bfs 72 # This will help to make the call to evalbfnumba1 faster by skipping the mediator evalbfnumbawrap function 73 # that prepares the following stuff at every iteration 74 #We convert the required properties to numpy arrays as this is what Numba likes. 75 bfs_coords = np.array([basis.bfs_coords]) 76 bfs_contr_prim_norms = np.array([basis.bfs_contr_prim_norms]) 77 bfs_lmn = np.array([basis.bfs_lmn]) 78 bfs_nprim = np.array([basis.bfs_nprim]) 79 #The remaining properties like bfs_coeffs are a list of lists of unequal sizes. 80 #Numba won't be able to work with these efficiently. 81 #So, we convert them to a numpy 2d array by applying a trick, 82 #that the second dimension is that of the largest list. So that 83 #it can accomadate all the lists. 84 maxnprim = max(basis.bfs_nprim) 85 bfs_coeffs = np.zeros([basis.bfs_nao, maxnprim]) 86 bfs_expnts = np.zeros([basis.bfs_nao, maxnprim]) 87 bfs_prim_norms = np.zeros([basis.bfs_nao, maxnprim]) 88 bfs_radius_cutoff = np.zeros([basis.bfs_nao]) 89 for i in range(basis.bfs_nao): 90 for j in range(basis.bfs_nprim[i]): 91 bfs_coeffs[i,j] = basis.bfs_coeffs[i][j] 92 bfs_expnts[i,j] = basis.bfs_expnts[i][j] 93 bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j] 94 bfs_radius_cutoff[i] = basis.bfs_radius_cutoff[i] 95 # Now bf/ao values can be evaluated by calling the following 96 # bf_values = Integrals.bf_val_helpers.eval_bfs(bfs_coords[0], bfs_contr_prim_norms[0], bfs_nprim[0], bfs_lmn[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, bfs_radius_cutoff, coord) 97 bfs_data_as_np_arrays = [bfs_coords[0], bfs_contr_prim_norms[0], bfs_nprim[0], bfs_lmn[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, bfs_radius_cutoff] 98 99 xc_family_dict = {1:'LDA',2:'GGA',4:'MGGA'} 100 # Create a LibXC object 101 funcx = pylibxc.LibXCFunctional(funcid[0], "unpolarized") 102 funcc = pylibxc.LibXCFunctional(funcid[1], "unpolarized") 103 x_family_code = funcx.get_family() 104 c_family_code = funcc.get_family() 105 106 weights_cp = cp.asarray(weights) 107 dmat_cp = cp.asarray(dmat) 108 # Create streams for asynchronous execution 109 streams = [cp.cuda.Stream(non_blocking=True) for i in range(ncores)] 110 111 #### Set number of cores for numba related evaluations within 'block_dens_func()' for example bf_value evaluations 112 #### Apparently, it was still using as many cores as possible and creating a serial version of thise functions as I have 113 #### currently done doesn't work 114 # numba.set_num_threads(1) 115 if list_nonzero_indices is not None: 116 if list_ao_values is not None: 117 if xc_family_dict[x_family_code]=='LDA' and xc_family_dict[c_family_code]=='LDA': 118 output = Parallel(n_jobs=ncores, backend='threading', require='sharedmem', batch_size=nblocks//ncores)(delayed(block_dens_func)(weights_cp[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], coords[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], dmat_cp[cp.ix_(list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]], list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]])], funcid, bfs_data_as_np_arrays, list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]], list_ao_values[iblock], funcx=funcx, funcc=funcc, x_family_code=x_family_code, c_family_code=c_family_code, xc_family_dict=xc_family_dict, debug=debug, stream=streams[iblock%len(streams)]) for iblock in range(nblocks+1)) 119 else: #GGA 120 output = Parallel(n_jobs=ncores, backend='threading', require='sharedmem')(delayed(block_dens_func)(weights_cp[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], coords[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], dmat_cp[np.ix_(list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]], list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]])], funcid, bfs_data_as_np_arrays, list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]], list_ao_values[iblock], list_ao_grad_values[iblock], funcx=funcx, funcc=funcc, x_family_code=x_family_code, c_family_code=c_family_code, xc_family_dict=xc_family_dict, debug=debug) for iblock in range(nblocks+1)) 121 else: 122 output = Parallel(n_jobs=ncores, backend='threading', require='sharedmem')(delayed(block_dens_func)(weights_cp[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], coords[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], dmat_cp[np.ix_(list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]], list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]])], funcid, bfs_data_as_np_arrays, list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]], funcx=funcx, funcc=funcc, x_family_code=x_family_code, c_family_code=c_family_code, xc_family_dict=xc_family_dict, debug=debug) for iblock in range(nblocks+1)) 123 else: 124 output = Parallel(n_jobs=ncores, backend='threading', require='sharedmem')(delayed(block_dens_func)(weights_cp[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], coords[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], dmat_cp, funcid, bfs_data_as_np_arrays, non_zero_indices=None, ao_values=None, funcx=funcx, funcc=funcc, x_family_code=x_family_code, c_family_code=c_family_code, xc_family_dict=xc_family_dict, debug=debug) for iblock in range(nblocks+1)) 125 126 # print(len(output)) 127 for iblock in range(0,len(output)): 128 efunc += output[iblock][0] 129 if list_nonzero_indices is not None: 130 non_zero_indices = list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]] 131 v[cp.ix_(non_zero_indices, non_zero_indices)] += output[iblock][1] 132 else: 133 v += output[iblock][1] 134 nelec += output[iblock][2] 135 if debug: 136 timings['durationLibxc'] += output[iblock][3]['durationLibxc'] 137 timings['durationE'] += output[iblock][3]['durationE'] 138 timings['durationF'] += output[iblock][3]['durationF'] 139 timings['durationZ'] += output[iblock][3]['durationZ'] 140 timings['durationV'] += output[iblock][3]['durationV'] 141 timings['durationRho'] += output[iblock][3]['durationRho'] 142 timings['durationAO'] += output[iblock][3]['durationAO'] 143 # v = numexpr.evaluate('(v + output[iblock][1])') 144 145 146 147 148 numba.set_num_threads(ncores) 149 print('Number of electrons: ', nelec) 150 if debug: 151 print('Timings:', timings) 152 153 ####### Free memory 154 ## The following is very important to prevent memory leaks and also to make sure that the number of 155 # threads used by the program is same as that specified by the user 156 # gc.collect() # Avoiding using it for now, as it is usually quite slow, although in this case it might not make much difference 157 # Anyway, the following also works 158 output = 0 159 non_zero_indices = 0 160 coords = 0 161 cp.cuda.Stream.null.synchronize() 162 return efunc, v
25def eval_xc_3_cupy(basis, dmat, weights, coords, funcid=[1,7], spin=0, blocksize=10240, debug=False, list_nonzero_indices=None, \ 26 count_nonzero_indices=None, list_ao_values=None, list_ao_grad_values=None, use_libxc=True, nstreams=1, ngpus=1,\ 27 freemem=True, threads_per_block=None, type=cp.float64, streams=None, nb_streams=None, bfs_data_as_np_arrays=None): 28 print('Calculating XC term using GPU and algo 3', flush=True) 29 if not use_libxc: 30 print('Not using LibXC for XC evaluations', flush=True) 31 # This performs parallelization at the blocks/batches level. 32 # Therefore, joblib is perfect for such embarrasingly parallel task 33 # In order to evaluate a density functional we will use the 34 # libxc library with Python bindings. 35 # However, some sort of simple functionals like LDA, GGA, etc would 36 # need to be implemented in CrysX also, so that the it doesn't depend 37 # on external libraries so heavily that it becomes unusable without those. 38 39 #Useful links: 40 # LibXC manual: https://www.tddft.org/programs/libxc/manual/ 41 # LibXC gitlab: https://gitlab.com/libxc/libxc/-/tree/master 42 # LibXC python interface code: https://gitlab.com/libxc/libxc/-/blob/master/pylibxc/functional.py 43 # LibXC python version installation and example: https://www.tddft.org/programs/libxc/installation/ 44 # Formulae for XC energy and potential calculation: https://pubs.acs.org/doi/full/10.1021/ct200412r 45 # LibXC code list: https://github.com/pyscf/pyscf/blob/master/pyscf/dft/libxc.py 46 # PySCF nr_rks code: https://github.com/pyscf/pyscf/blob/master/pyscf/dft/numint.py 47 # https://www.osti.gov/pages/servlets/purl/1650078 48 49 #OUTPUT 50 #Functional energy 51 efunc = 0.0 52 # Start from the main GPU 53 cp.cuda.Device(0).use() 54 #Functional potential V_{\mu \nu} = \mu|\partial{f}/\partial{\rho}|\nu 55 v = cp.zeros((basis.bfs_nao, basis.bfs_nao), dtype=type) 56 nelec = 0 57 58 # TODO it only works for LDA functionals for now. 59 # Need to make it work for GGA, Hybrid, range-separated Hybrid and MetaGGA functionals as well. 60 61 62 ngrids = coords.shape[0] 63 nblocks = ngrids//blocksize 64 65 # print('Number of blocks: ', nblocks) 66 67 # Some stuff to note timings 68 timings = None 69 if debug: 70 durationLibxc = 0.0 71 durationE = 0.0 72 durationF = 0.0 73 durationZ = 0.0 74 durationV = 0.0 75 durationRho = 0.0 76 durationAO = 0.0 77 timings = {'durationLibxc':durationLibxc, 'durationE':durationE, 'durationF':durationF, 'durationZ':durationZ, 'durationV':durationV, 'durationRho':durationRho, 'durationAO':durationAO} 78 79 80 81 ### Calculate stuff necessary for bf/ao evaluation on grid points 82 ### Doesn't make any difference for 510 bfs but might be significant for >1000 bfs 83 # This will help to make the call to evalbfnumba1 faster by skipping the mediator evalbfnumbawrap function 84 # that prepares the following stuff at every iteration 85 #We convert the required properties to numpy arrays as this is what Numba likes. 86 if bfs_data_as_np_arrays is None: 87 bfs_coords = cp.asarray([basis.bfs_coords], dtype=type) 88 bfs_contr_prim_norms = cp.asarray([basis.bfs_contr_prim_norms], dtype=type) 89 bfs_lmn = cp.asarray([basis.bfs_lmn]) 90 bfs_nprim = cp.asarray([basis.bfs_nprim]) 91 #The remaining properties like bfs_coeffs are a list of lists of unequal sizes. 92 #Numba won't be able to work with these efficiently. 93 #So, we convert them to a numpy 2d array by applying a trick, 94 #that the second dimension is that of the largest list. So that 95 #it can accomadate all the lists. 96 maxnprim = max(basis.bfs_nprim) 97 bfs_coeffs = cp.zeros([basis.bfs_nao, maxnprim], dtype=type) 98 bfs_expnts = cp.zeros([basis.bfs_nao, maxnprim], dtype=type) 99 bfs_prim_norms = cp.zeros([basis.bfs_nao, maxnprim], dtype=type) 100 bfs_radius_cutoff = cp.zeros([basis.bfs_nao], dtype=type) 101 for i in range(basis.bfs_nao): 102 for j in range(basis.bfs_nprim[i]): 103 bfs_coeffs[i,j] = basis.bfs_coeffs[i][j] 104 bfs_expnts[i,j] = basis.bfs_expnts[i][j] 105 bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j] 106 bfs_radius_cutoff[i] = basis.bfs_radius_cutoff[i] 107 # Now bf/ao values can be evaluated by calling the following 108 # bf_values = Integrals.bf_val_helpers.eval_bfs(bfs_coords[0], bfs_contr_prim_norms[0], bfs_nprim[0], bfs_lmn[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, bfs_radius_cutoff, coord) 109 bfs_data_as_np_arrays = [bfs_coords[0], bfs_contr_prim_norms[0], bfs_nprim[0], bfs_lmn[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, bfs_radius_cutoff] 110 111 xc_family_dict = {1:'LDA',2:'GGA',4:'MGGA'} 112 # Create a LibXC object 113 funcx = pylibxc.LibXCFunctional(funcid[0], "unpolarized") 114 funcc = pylibxc.LibXCFunctional(funcid[1], "unpolarized") 115 x_family_code = funcx.get_family() 116 c_family_code = funcc.get_family() 117 118 my_expr = contract_expression('ij,mi,mj->m', (150, 150), (blocksize, 150), (blocksize, 150)) 119 my_expr_grad1 = None 120 my_expr_grad2 = None 121 if xc_family_dict[x_family_code]!='LDA' or xc_family_dict[c_family_code]!='LDA': 122 my_expr_grad1 = contract_expression('ij,kmi,mj->km', (150, 150), (3, blocksize, 150), (blocksize, 150)) 123 my_expr_grad2 = contract_expression('ij,mi,kmj->km', (150, 150), (blocksize, 150), (3, blocksize, 150)) 124 contr_expr = [my_expr, my_expr_grad1, my_expr_grad2] 125 126 if threads_per_block is None: 127 # Determine the optimal number of blocks per grid 128 max_threads_per_block = cuda.get_current_device().MAX_THREADS_PER_BLOCK 129 thread_x = max_threads_per_block/16 130 thread_y = max_threads_per_block/64 131 threads_per_block = (thread_x, thread_y) 132 else: 133 thread_x = threads_per_block[0] 134 thread_y = threads_per_block[1] 135 136 for iblock in range(len(list_nonzero_indices)): 137 list_nonzero_indices[iblock] = cp.asarray(list_nonzero_indices[iblock]) 138 139 140 weights_cp = cp.asarray(weights, dtype=type) 141 coords_cp = cp.asarray(coords, dtype=type) 142 143 dmat_cp = cp.asarray(dmat, dtype=type) 144 # Create dmat lists 145 dmats_list = [] 146 for iblock in range(nblocks + 1): 147 dmats_list.append(cp.copy(dmat_cp[cp.ix_(list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]], list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]])])) 148 149 # Create streams for asynchronous execution 150 # on different GPUs 151 if streams is None and nb_streams is None: 152 streams = [] 153 nb_streams = [] 154 for i in range(ngpus): 155 cp.cuda.Device(i).use() 156 cp_stream = cp.cuda.Stream(non_blocking = True) 157 nb_stream = cuda.external_stream(cp_stream.ptr) 158 streams.append(cp_stream) 159 nb_streams.append(nb_stream) 160 161 if list_nonzero_indices is not None: 162 if list_ao_values is not None: 163 if xc_family_dict[x_family_code]=='LDA' and xc_family_dict[c_family_code]=='LDA': 164 output = Parallel(n_jobs=ngpus, backend='threading', require='sharedmem')(delayed(block_dens_func)(weights_cp[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], coords_cp[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], dmats_list[iblock], funcid, bfs_data_as_np_arrays, list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]], list_ao_values[iblock], funcx=funcx, funcc=funcc, x_family_code=x_family_code, c_family_code=c_family_code, xc_family_dict=xc_family_dict, debug=debug, stream=streams[iblock%ngpus], nb_stream=nb_streams[iblock%ngpus], thread_x=thread_x, 165 thread_y=thread_y, threads_per_block=threads_per_block, type=type, use_libxc=use_libxc, expressions=contr_expr, device=iblock%ngpus) for iblock in range(nblocks+1)) 166 else: #GGA 167 output = Parallel(n_jobs=ngpus, backend='threading', require='sharedmem')(delayed(block_dens_func)(weights_cp[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], coords_cp[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], dmats_list[iblock], funcid, bfs_data_as_np_arrays, list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]], list_ao_values[iblock], list_ao_grad_values[iblock], funcx=funcx, funcc=funcc, x_family_code=x_family_code, c_family_code=c_family_code, xc_family_dict=xc_family_dict, debug=debug, stream=streams[iblock%ngpus], nb_stream=nb_streams[iblock%ngpus], thread_x=thread_x, 168 thread_y=thread_y, threads_per_block=threads_per_block, type=type, use_libxc=use_libxc, expressions=contr_expr, device=iblock%ngpus) for iblock in range(nblocks+1)) 169 else: 170 output = Parallel(n_jobs=ngpus, backend='threading', require='sharedmem')(delayed(block_dens_func)(weights_cp[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], coords_cp[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], dmats_list[iblock], funcid, bfs_data_as_np_arrays, list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]], funcx=funcx, funcc=funcc, x_family_code=x_family_code, c_family_code=c_family_code, xc_family_dict=xc_family_dict, debug=debug, stream=streams[iblock%ngpus], nb_stream=nb_streams[iblock%ngpus], thread_x=thread_x, 171 thread_y=thread_y, threads_per_block=threads_per_block, type=type, use_libxc=use_libxc, expressions=contr_expr, device=iblock%ngpus) for iblock in range(nblocks+1)) 172 else: 173 output = Parallel(n_jobs=ngpus, backend='threading', require='sharedmem')(delayed(block_dens_func)(weights_cp[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], coords_cp[iblock*blocksize : min(iblock*blocksize+blocksize,ngrids)], dmat_cp, funcid, bfs_data_as_np_arrays, non_zero_indices=None, ao_values=None, funcx=funcx, funcc=funcc, x_family_code=x_family_code, c_family_code=c_family_code, xc_family_dict=xc_family_dict, debug=debug, stream=streams[iblock%ngpus], nb_stream=nb_streams[iblock%ngpus], thread_x=thread_x, 174 thread_y=thread_y, threads_per_block=threads_per_block, type=type, use_libxc=use_libxc, expressions=contr_expr, device=iblock%ngpus) for iblock in range(nblocks+1)) 175 176 for istream in range(ngpus): 177 streams[istream].synchronize() 178 # Switch back to main GPU 179 cp.cuda.Device(0).use() 180 for iblock in range(0,len(output)): 181 efunc += cp.asarray(output[iblock][0]) 182 if list_nonzero_indices is not None: 183 non_zero_indices = cp.asarray(list_nonzero_indices[iblock][0:count_nonzero_indices[iblock]]) 184 # v[cp.ix_(non_zero_indices, non_zero_indices)] += cp.asarray(output[iblock][1]) 185 v[cp.ix_(non_zero_indices, non_zero_indices)] = cp.asarray(output[iblock][1]) + cp.asarray(v)[cp.ix_(non_zero_indices, non_zero_indices)] 186 else: 187 # v += cp.asarray(output[iblock][1]) 188 v = cp.asarray(output[iblock][1]) + cp.asarray(v) 189 nelec += cp.asarray(output[iblock][2]) 190 if debug: 191 timings['durationLibxc'] += output[iblock][3]['durationLibxc'] 192 timings['durationE'] += output[iblock][3]['durationE'] 193 timings['durationF'] += output[iblock][3]['durationF'] 194 timings['durationZ'] += output[iblock][3]['durationZ'] 195 timings['durationV'] += output[iblock][3]['durationV'] 196 timings['durationRho'] += output[iblock][3]['durationRho'] 197 timings['durationAO'] += output[iblock][3]['durationAO'] 198 # v = numexpr.evaluate('(v + output[iblock][1])') 199 200 201 202 203 print('Number of electrons: ', nelec) 204 if debug: 205 print('Timings:', timings) 206 207 ####### Free memory 208 ## The following is very important to prevent memory leaks and also to make sure that the number of 209 # threads used by the program is same as that specified by the user 210 # gc.collect() # Avoiding using it for now, as it is usually quite slow, although in this case it might not make much difference 211 # Anyway, the following also works 212 output = 0 213 non_zero_indices = 0 214 coords = 0 215 216 217 cp.cuda.Stream.null.synchronize() 218 219 if use_libxc: 220 efunc = efunc[0] 221 222 if freemem: 223 for istream in range(ngpus): 224 cp.cuda.Device(istream).use() 225 cp._default_memory_pool.free_all_blocks() 226 # cuda.current_context().memory_manager.deallocations.clear() 227 # Switch back to main GPU 228 cp.cuda.Device(0).use() 229 230 return efunc, v
7def dipole_moment_mat_symm(basis, slice=None, origin=np.zeros((3))): 8 """ 9 Compute the dipole moment integral matrix in the atomic orbital (AO) basis 10 using symmetry and the McMurchie–Davidson algorithm. 11 12 This routine evaluates the matrix elements: 13 14 μ_ij^(k) = ⟨ χ_i | r_k | χ_j ⟩ 15 16 where: 17 - χ_i, χ_j are Gaussian-type atomic orbitals (AOs) 18 - r_k is the k-th Cartesian coordinate (x, y, or z) 19 - origin is the coordinate origin for the dipole operator 20 21 The implementation follows the McMurchie–Davidson scheme for Gaussian 22 integrals, adapted from: 23 24 https://github.com/jjgoings/McMurchie-Davidson 25 26 Symmetries exploited 27 -------------------- 28 The dipole moment matrix is symmetric for real-valued AOs: 29 30 μ_ij^(k) = μ_ji^(k) 31 32 This halves the number of unique integrals from N_bf^2 to 33 N_bf*(N_bf+1)/2 for each Cartesian component. 34 35 Parameters 36 ---------- 37 basis : object 38 Basis set object containing: 39 - bfs_coords : Cartesian coordinates of AO centers 40 - bfs_coeffs : Contraction coefficients 41 - bfs_expnts : Gaussian exponents 42 - bfs_prim_norms : Primitive normalization constants 43 - bfs_contr_prim_norms : Contraction normalization factors 44 - bfs_lmn : Angular momentum quantum numbers (ℓ, m, n) 45 - bfs_nprim : Number of primitives per basis function 46 - bfs_nao : Number of atomic orbitals 47 48 slice : list of int, optional 49 A 4-element list [row_start, row_end, col_start, col_end] 50 specifying the sub-block of the matrix to compute. 51 If None (default), computes the full symmetric matrix. 52 53 origin : ndarray of shape (3,), optional 54 Origin of the dipole operator in Cartesian coordinates. 55 Default is (0.0, 0.0, 0.0). 56 57 Returns 58 ------- 59 M : ndarray of shape (3, n_rows, n_cols) 60 The computed dipole moment integrals for the requested block, 61 with the first dimension indexing the x, y, z components. 62 63 Notes 64 ----- 65 - Basis set data are pre-packed into NumPy arrays for compatibility 66 with Numba JIT compilation. 67 - The algorithm avoids redundant evaluations by exploiting matrix 68 symmetry when `slice` covers the full range. 69 - The dipole moment integrals are typically used in computing 70 molecular dipole moments from density matrices. 71 72 Examples 73 -------- 74 >>> mu_full = dipole_moment_mat_symm(basis) 75 >>> mu_block = dipole_moment_mat_symm(basis, slice=[0, 5, 0, 5]) 76 >>> mu_shifted = dipole_moment_mat_symm(basis, origin=np.array([1.0, 0.0, 0.0])) 77 """ 78 # Here the lists are converted to numpy arrays for better use with Numba. 79 # Once these conversions are done we pass these to a Numba decorated 80 # function that uses prange, etc. to calculate the matrix efficiently. 81 82 # This function calculates the kinetic energy matrix for a given basis object. 83 # The basis object holds the information of basis functions like: exponents, coeffs, etc. 84 # It is possible to only calculate a slice (block) of the complete matrix. 85 # slice is a 4 element list whose first and second elements give the range of the rows to be calculated. 86 # the third and fourth element give the range of columns to be calculated. 87 # slice = [start_row, end_row, start_col, end_col] 88 # The integrals are performed using the formulas 89 90 #We convert the required properties to numpy arrays as this is what Numba likes. 91 bfs_coords = np.array([basis.bfs_coords]) 92 bfs_contr_prim_norms = np.array([basis.bfs_contr_prim_norms]) 93 bfs_lmn = np.array([basis.bfs_lmn]) 94 bfs_nprim = np.array([basis.bfs_nprim]) 95 96 97 #The remaining properties like bfs_coeffs are a list of lists of unequal sizes. 98 #Numba won't be able to work with these efficiently. 99 #So, we convert them to a numpy 2d array by applying a trick, 100 #that the second dimension is that of the largest list. So that 101 #it can accomadate all the lists. 102 maxnprim = max(basis.bfs_nprim) 103 bfs_coeffs = np.zeros([basis.bfs_nao, maxnprim]) 104 bfs_expnts = np.zeros([basis.bfs_nao, maxnprim]) 105 bfs_prim_norms = np.zeros([basis.bfs_nao, maxnprim]) 106 for i in range(basis.bfs_nao): 107 for j in range(basis.bfs_nprim[i]): 108 bfs_coeffs[i,j] = basis.bfs_coeffs[i][j] 109 bfs_expnts[i,j] = basis.bfs_expnts[i][j] 110 bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j] 111 112 113 114 if slice is None: 115 slice = [0, basis.bfs_nao, 0, basis.bfs_nao] 116 117 #Limits for the calculation of overlap integrals 118 a = int(slice[0]) #row start index 119 b = int(slice[1]) #row end index 120 c = int(slice[2]) #column start index 121 d = int(slice[3]) #column end index 122 123 124 M = dipole_moment_mat_symm_internal(bfs_coords[0], bfs_contr_prim_norms[0], bfs_lmn[0], bfs_nprim[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, a, b, c, d, origin) 125 126 return M
Compute the dipole moment integral matrix in the atomic orbital (AO) basis using symmetry and the McMurchie–Davidson algorithm.
This routine evaluates the matrix elements:
μ_ij^(k) = ⟨ χ_i | r_k | χ_j ⟩
where: - χ_i, χ_j are Gaussian-type atomic orbitals (AOs) - r_k is the k-th Cartesian coordinate (x, y, or z) - origin is the coordinate origin for the dipole operator
The implementation follows the McMurchie–Davidson scheme for Gaussian integrals, adapted from:
https://github.com/jjgoings/McMurchie-Davidson
Symmetries exploited
The dipole moment matrix is symmetric for real-valued AOs:
μ_ij^(k) = μ_ji^(k)
This halves the number of unique integrals from N_bf^2 to N_bf*(N_bf+1)/2 for each Cartesian component.
Parameters
basis : object Basis set object containing: - bfs_coords : Cartesian coordinates of AO centers - bfs_coeffs : Contraction coefficients - bfs_expnts : Gaussian exponents - bfs_prim_norms : Primitive normalization constants - bfs_contr_prim_norms : Contraction normalization factors - bfs_lmn : Angular momentum quantum numbers (ℓ, m, n) - bfs_nprim : Number of primitives per basis function - bfs_nao : Number of atomic orbitals
slice : list of int, optional A 4-element list [row_start, row_end, col_start, col_end] specifying the sub-block of the matrix to compute. If None (default), computes the full symmetric matrix.
origin : ndarray of shape (3,), optional Origin of the dipole operator in Cartesian coordinates. Default is (0.0, 0.0, 0.0).
Returns
M : ndarray of shape (3, n_rows, n_cols) The computed dipole moment integrals for the requested block, with the first dimension indexing the x, y, z components.
Notes
- Basis set data are pre-packed into NumPy arrays for compatibility with Numba JIT compilation.
- The algorithm avoids redundant evaluations by exploiting matrix
symmetry when
slicecovers the full range. - The dipole moment integrals are typically used in computing molecular dipole moments from density matrices.
Examples
>>> mu_full = dipole_moment_mat_symm(basis)
>>> mu_block = dipole_moment_mat_symm(basis, slice=[0, 5, 0, 5])
>>> mu_shifted = dipole_moment_mat_symm(basis, origin=np.array([1.0, 0.0, 0.0]))
20def kin_mat_symm_cupy(basis, slice=None, cp_stream=None): 21 # Here the lists are converted to numpy arrays for better use with Numba. 22 # Once these conversions are done we pass these to a Numba decorated 23 # function that uses prange, etc. to calculate the matrix efficiently. 24 25 # This function calculates the kinetic energy matrix for a given basis object. 26 # The basis object holds the information of basis functions like: exponents, coeffs, etc. 27 # It is possible to only calculate a slice (block) of the complete matrix. 28 # slice is a 4 element list whose first and second elements give the range of the rows to be calculated. 29 # the third and fourth element give the range of columns to be calculated. 30 # slice = [start_row, end_row, start_col, end_col] 31 # The integrals are performed using the formulas 32 33 #We convert the required properties to numpy arrays as this is what Numba likes. 34 bfs_coords = cp.array([basis.bfs_coords]) 35 bfs_contr_prim_norms = cp.array([basis.bfs_contr_prim_norms]) 36 bfs_lmn = cp.array([basis.bfs_lmn]) 37 bfs_nprim = cp.array([basis.bfs_nprim]) 38 39 40 #The remaining properties like bfs_coeffs are a list of lists of unequal sizes. 41 #Numba won't be able to work with these efficiently. 42 #So, we convert them to a numpy 2d array by applying a trick, 43 #that the second dimension is that of the largest list. So that 44 #it can accomadate all the lists. 45 maxnprim = max(basis.bfs_nprim) 46 bfs_coeffs = cp.zeros([basis.bfs_nao, maxnprim]) 47 bfs_expnts = cp.zeros([basis.bfs_nao, maxnprim]) 48 bfs_prim_norms = cp.zeros([basis.bfs_nao, maxnprim]) 49 for i in range(basis.bfs_nao): 50 for j in range(basis.bfs_nprim[i]): 51 bfs_coeffs[i,j] = basis.bfs_coeffs[i][j] 52 bfs_expnts[i,j] = basis.bfs_expnts[i][j] 53 bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j] 54 55 56 57 if slice is None: 58 slice = [0, basis.bfs_nao, 0, basis.bfs_nao] 59 60 #Limits for the calculation of overlap integrals 61 a = int(slice[0]) #row start index 62 b = int(slice[1]) #row end index 63 c = int(slice[2]) #column start index 64 d = int(slice[3]) #column end index 65 66 # Infer the matrix shape from the start and end indices 67 num_rows = b - a 68 num_cols = d - c 69 start_row = a 70 end_row = b 71 start_col = c 72 end_col = d 73 matrix_shape = (num_rows, num_cols) 74 75 # Check if the slice of the matrix requested falls in the lower/upper triangle or in both the triangles 76 upper_tri = False 77 lower_tri = False 78 both_tri_symm = False 79 both_tri_nonsymm = False 80 if end_row <= start_col: 81 upper_tri = True 82 elif start_row >= end_col: 83 lower_tri = True 84 elif start_row==start_col and end_row==end_col: 85 both_tri_symm = True 86 else: 87 both_tri_nonsymm = True 88 89 # Initialize the matrix with zeros 90 T = cp.zeros(matrix_shape) 91 92 thread_x = 32 93 thread_y = 32 94 95 if cp_stream is None: 96 device = 0 97 cp.cuda.Device(device).use() 98 cp_stream = cp.cuda.Stream(non_blocking = True) 99 nb_stream = cuda.external_stream(cp_stream.ptr) 100 cp_stream.use() 101 else: 102 nb_stream = cuda.external_stream(cp_stream.ptr) 103 cp_stream.use() 104 105 106 blocks_per_grid = ((num_rows + (thread_x - 1))//thread_x, (num_cols + (thread_y - 1))//thread_y) 107 kin_mat_symm_internal_cuda[blocks_per_grid, (thread_x, thread_y), nb_stream](bfs_coords[0], bfs_contr_prim_norms[0], bfs_lmn[0], bfs_nprim[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, a, b, c, d, lower_tri, upper_tri, both_tri_symm, both_tri_nonsymm, T) 108 if both_tri_symm: 109 symmetrize[blocks_per_grid, (thread_x, thread_y), nb_stream](a,b,c,d,T) 110 111 # cuda.synchronize() 112 cp_stream.synchronize() 113 cp.cuda.Stream.null.synchronize() 114 return T #+ T.T - cp.diag(cp.diag(T))
20def overlap_mat_symm_cupy(basis, slice=None, cp_stream=None): 21 # Here the lists are converted to numpy arrays for better use with Numba. 22 # Once these conversions are done we pass these to a Numba decorated 23 # function that uses prange, etc. to calculate the matrix efficiently. 24 25 # This function calculates the kinetic energy matrix for a given basis object. 26 # The basis object holds the information of basis functions like: exponents, coeffs, etc. 27 # It is possible to only calculate a slice (block) of the complete matrix. 28 # slice is a 4 element list whose first and second elements give the range of the rows to be calculated. 29 # the third and fourth element give the range of columns to be calculated. 30 # slice = [start_row, end_row, start_col, end_col] 31 # The integrals are performed using the formulas 32 33 #We convert the required properties to numpy arrays as this is what Numba likes. 34 bfs_coords = cp.array([basis.bfs_coords]) 35 bfs_contr_prim_norms = cp.array([basis.bfs_contr_prim_norms]) 36 bfs_lmn = cp.array([basis.bfs_lmn]) 37 bfs_nprim = cp.array([basis.bfs_nprim]) 38 39 40 #The remaining properties like bfs_coeffs are a list of lists of unequal sizes. 41 #Numba won't be able to work with these efficiently. 42 #So, we convert them to a numpy 2d array by applying a trick, 43 #that the second dimension is that of the largest list. So that 44 #it can accomadate all the lists. 45 maxnprim = max(basis.bfs_nprim) 46 bfs_coeffs = cp.zeros([basis.bfs_nao, maxnprim]) 47 bfs_expnts = cp.zeros([basis.bfs_nao, maxnprim]) 48 bfs_prim_norms = cp.zeros([basis.bfs_nao, maxnprim]) 49 for i in range(basis.bfs_nao): 50 for j in range(basis.bfs_nprim[i]): 51 bfs_coeffs[i,j] = basis.bfs_coeffs[i][j] 52 bfs_expnts[i,j] = basis.bfs_expnts[i][j] 53 bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j] 54 55 56 57 if slice is None: 58 slice = [0, basis.bfs_nao, 0, basis.bfs_nao] 59 60 #Limits for the calculation of overlap integrals 61 a = int(slice[0]) #row start index 62 b = int(slice[1]) #row end index 63 c = int(slice[2]) #column start index 64 d = int(slice[3]) #column end index 65 66 # Infer the matrix shape from the start and end indices 67 num_rows = b - a 68 num_cols = d - c 69 start_row = a 70 end_row = b 71 start_col = c 72 end_col = d 73 matrix_shape = (num_rows, num_cols) 74 75 # Check if the slice of the matrix requested falls in the lower/upper triangle or in both the triangles 76 upper_tri = False 77 lower_tri = False 78 both_tri_symm = False 79 both_tri_nonsymm = False 80 if end_row <= start_col: 81 upper_tri = True 82 elif start_row >= end_col: 83 lower_tri = True 84 elif start_row==start_col and end_row==end_col: 85 both_tri_symm = True 86 else: 87 both_tri_nonsymm = True 88 89 # Initialize the matrix with zeros 90 S = cp.zeros(matrix_shape) 91 92 thread_x = 24 93 thread_y = 16 94 95 if cp_stream is None: 96 device = 0 97 cp.cuda.Device(device).use() 98 cp_stream = cp.cuda.Stream(non_blocking = True) 99 nb_stream = cuda.external_stream(cp_stream.ptr) 100 cp_stream.use() 101 else: 102 nb_stream = cuda.external_stream(cp_stream.ptr) 103 cp_stream.use() 104 105 blocks_per_grid = ((num_rows + (thread_x - 1))//thread_x, (num_cols + (thread_y - 1))//thread_y) 106 overlap_mat_symm_internal_cuda[blocks_per_grid, (thread_x, thread_y), nb_stream](bfs_coords[0], bfs_contr_prim_norms[0], bfs_lmn[0], bfs_nprim[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, a, b, c, d, lower_tri, upper_tri, both_tri_symm, both_tri_nonsymm, S) 107 if both_tri_symm: 108 symmetrize[blocks_per_grid, (thread_x, thread_y), nb_stream](a,b,c,d,S) 109 110 # cuda.synchronize() 111 cp_stream.synchronize() 112 cp.cuda.Stream.null.synchronize() 113 # cp._default_memory_pool.free_all_blocks() 114 return S
20def dipole_moment_mat_symm_cupy(basis, slice=None, origin=None, stream=None): 21 # Here the lists are converted to numpy arrays for better use with Numba. 22 # Once these conversions are done we pass these to a Numba decorated 23 # function that uses prange, etc. to calculate the matrix efficiently. 24 25 # This function calculates the kinetic energy matrix for a given basis object. 26 # The basis object holds the information of basis functions like: exponents, coeffs, etc. 27 # It is possible to only calculate a slice (block) of the complete matrix. 28 # slice is a 4 element list whose first and second elements give the range of the rows to be calculated. 29 # the third and fourth element give the range of columns to be calculated. 30 # slice = [start_row, end_row, start_col, end_col] 31 # The integrals are performed using the formulas 32 33 if origin is None: 34 origin = cp.zeros((3)) 35 #We convert the required properties to numpy arrays as this is what Numba likes. 36 bfs_coords = cp.array([basis.bfs_coords]) 37 bfs_contr_prim_norms = cp.array([basis.bfs_contr_prim_norms]) 38 bfs_lmn = cp.array([basis.bfs_lmn]) 39 bfs_nprim = cp.array([basis.bfs_nprim]) 40 41 42 #The remaining properties like bfs_coeffs are a list of lists of unequal sizes. 43 #Numba won't be able to work with these efficiently. 44 #So, we convert them to a numpy 2d array by applying a trick, 45 #that the second dimension is that of the largest list. So that 46 #it can accomadate all the lists. 47 maxnprim = max(basis.bfs_nprim) 48 bfs_coeffs = cp.zeros([basis.bfs_nao, maxnprim]) 49 bfs_expnts = cp.zeros([basis.bfs_nao, maxnprim]) 50 bfs_prim_norms = cp.zeros([basis.bfs_nao, maxnprim]) 51 for i in range(basis.bfs_nao): 52 for j in range(basis.bfs_nprim[i]): 53 bfs_coeffs[i,j] = basis.bfs_coeffs[i][j] 54 bfs_expnts[i,j] = basis.bfs_expnts[i][j] 55 bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j] 56 57 58 59 if slice is None: 60 slice = [0, basis.bfs_nao, 0, basis.bfs_nao] 61 62 #Limits for the calculation of overlap integrals 63 a = int(slice[0]) #row start index 64 b = int(slice[1]) #row end index 65 c = int(slice[2]) #column start index 66 d = int(slice[3]) #column end index 67 68 # Infer the matrix shape from the start and end indices 69 num_rows = b - a 70 num_cols = d - c 71 start_row = a 72 end_row = b 73 start_col = c 74 end_col = d 75 matrix_shape = (3, num_rows, num_cols) 76 77 # Check if the slice of the matrix requested falls in the lower/upper triangle or in both the triangles 78 upper_tri = False 79 lower_tri = False 80 both_tri_symm = False 81 both_tri_nonsymm = False 82 if end_row <= start_col: 83 upper_tri = True 84 elif start_row >= end_col: 85 lower_tri = True 86 elif start_row==start_col and end_row==end_col: 87 both_tri_symm = True 88 else: 89 both_tri_nonsymm = True 90 91 # Initialize the matrix with zeros 92 M = cp.zeros(matrix_shape) 93 94 thread_x = 32 95 thread_y = 16 96 97 if stream is None: 98 device = 0 99 cp.cuda.Device(device).use() 100 cp_stream = cp.cuda.Stream(non_blocking = True) 101 nb_stream = cuda.external_stream(cp_stream.ptr) 102 else: 103 nb_stream = stream 104 105 blocks_per_grid = ((num_rows + (thread_x - 1))//thread_x, (num_cols + (thread_y - 1))//thread_y) 106 dipole_moment_mat_symm_internal_cuda[blocks_per_grid, (thread_x, thread_y), nb_stream](bfs_coords[0], bfs_contr_prim_norms[0], bfs_lmn[0], bfs_nprim[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, a, b, c, d, lower_tri, upper_tri, both_tri_symm, both_tri_nonsymm, origin, M) 107 if both_tri_symm: 108 symmetrize[blocks_per_grid, (thread_x, thread_y), nb_stream](a,b,c,d,M) 109 cuda.synchronize() 110 return M
20def nuc_mat_symm_cupy(basis, mol, slice=None, cp_stream=None, sqrt_ints4c2e_diag=None): 21 # Here the lists are converted to numpy arrays for better use with Numba. 22 # Once these conversions are done we pass these to a Numba decorated 23 # function that uses prange, etc. to calculate the matrix efficiently. 24 25 # This function calculates the kinetic energy matrix for a given basis object. 26 # The basis object holds the information of basis functions like: exponents, coeffs, etc. 27 # It is possible to only calculate a slice (block) of the complete matrix. 28 # slice is a 4 element list whose first and second elements give the range of the rows to be calculated. 29 # the third and fourth element give the range of columns to be calculated. 30 # slice = [start_row, end_row, start_col, end_col] 31 # The integrals are performed using the formulas 32 33 #We convert the required properties to numpy arrays as this is what Numba likes. 34 bfs_coords = cp.array([basis.bfs_coords]) 35 bfs_contr_prim_norms = cp.array([basis.bfs_contr_prim_norms]) 36 bfs_lmn = cp.array([basis.bfs_lmn]) 37 bfs_nprim = cp.array([basis.bfs_nprim]) 38 coordsBohrs = cp.array([mol.coordsBohrs]) 39 Z = cp.array([mol.Zcharges]) 40 natoms = mol.natoms 41 42 #The remaining properties like bfs_coeffs are a list of lists of unequal sizes. 43 #Numba won't be able to work with these efficiently. 44 #So, we convert them to a numpy 2d array by applying a trick, 45 #that the second dimension is that of the largest list. So that 46 #it can accomadate all the lists. 47 maxnprim = max(basis.bfs_nprim) 48 bfs_coeffs = cp.zeros([basis.bfs_nao, maxnprim]) 49 bfs_expnts = cp.zeros([basis.bfs_nao, maxnprim]) 50 bfs_prim_norms = cp.zeros([basis.bfs_nao, maxnprim]) 51 for i in range(basis.bfs_nao): 52 for j in range(basis.bfs_nprim[i]): 53 bfs_coeffs[i,j] = basis.bfs_coeffs[i][j] 54 bfs_expnts[i,j] = basis.bfs_expnts[i][j] 55 bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j] 56 57 58 59 if slice is None: 60 slice = [0, basis.bfs_nao, 0, basis.bfs_nao] 61 62 #Limits for the calculation of overlap integrals 63 a = int(slice[0]) #row start index 64 b = int(slice[1]) #row end index 65 c = int(slice[2]) #column start index 66 d = int(slice[3]) #column end index 67 68 # Infer the matrix shape from the start and end indices 69 num_rows = b - a 70 num_cols = d - c 71 start_row = a 72 end_row = b 73 start_col = c 74 end_col = d 75 matrix_shape = (num_rows, num_cols) 76 77 # Check if the slice of the matrix requested falls in the lower/upper triangle or in both the triangles 78 upper_tri = False 79 lower_tri = False 80 both_tri_symm = False 81 both_tri_nonsymm = False 82 if end_row <= start_col: 83 upper_tri = True 84 elif start_row >= end_col: 85 lower_tri = True 86 elif start_row==start_col and end_row==end_col: 87 both_tri_symm = True 88 else: 89 both_tri_nonsymm = True 90 91 # Initialize the matrix with zeros 92 V = cp.zeros(matrix_shape) 93 94 if sqrt_ints4c2e_diag is None: 95 #Create dummy array 96 sqrt_ints4c2e_diag = cp.zeros((1,1), dtype=cp.float64) 97 isSchwarz = False 98 else: 99 isSchwarz = True 100 sqrt_ints4c2e_diag = cp.asarray(sqrt_ints4c2e_diag) 101 102 103 104 if cp_stream is None: 105 device = 0 106 cp.cuda.Device(device).use() 107 cp_stream = cp.cuda.Stream(non_blocking = True) 108 nb_stream = cuda.external_stream(cp_stream.ptr) 109 cp_stream.use() 110 else: 111 nb_stream = cuda.external_stream(cp_stream.ptr) 112 cp_stream.use() 113 114 thread_x = 32 115 thread_y = 32 116 blocks_per_grid = ((num_rows + (thread_x - 1))//thread_x, (num_cols + (thread_y - 1))//thread_y) 117 nuc_mat_symm_internal_cuda[blocks_per_grid, (thread_x, thread_y), nb_stream](bfs_coords[0], bfs_contr_prim_norms[0], bfs_lmn[0], bfs_nprim[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, a, b, c, d, Z[0], coordsBohrs[0], natoms, lower_tri, upper_tri, both_tri_symm, both_tri_nonsymm, sqrt_ints4c2e_diag, isSchwarz, V) 118 if both_tri_symm: 119 thread_x = 32 120 thread_y = 32 121 blocks_per_grid = ((num_rows + (thread_x - 1))//thread_x, (num_cols + (thread_y - 1))//thread_y) 122 symmetrize[blocks_per_grid, (thread_x, thread_y), nb_stream](a,b,c,d,V) 123 if cp_stream is None: 124 cuda.synchronize() 125 else: 126 cp_stream.synchronize() 127 cp.cuda.Stream.null.synchronize() 128 # cp._default_memory_pool.free_all_blocks() 129 return V
20def kin_mat_symm_shell_cupy(basis, slice=None, cp_stream=None): 21 # Here the lists are converted to numpy arrays for better use with Numba. 22 # Once these conversions are done we pass these to a Numba decorated 23 # function that uses prange, etc. to calculate the matrix efficiently. 24 25 # This function calculates the kinetic energy matrix for a given basis object. 26 # The basis object holds the information of basis functions like: exponents, coeffs, etc. 27 # It is possible to only calculate a slice (block) of the complete matrix. 28 # slice is a 4 element list whose first and second elements give the range of the rows to be calculated. 29 # the third and fourth element give the range of columns to be calculated. 30 # slice = [start_row, end_row, start_col, end_col] 31 # The integrals are performed using the formulas 32 33 #We convert the required properties to numpy arrays as this is what Numba likes. 34 bfs_coords = cp.array([basis.bfs_coords]) 35 bfs_contr_prim_norms = cp.array([basis.bfs_contr_prim_norms]) 36 bfs_lmn = cp.array([basis.bfs_lmn]) 37 bfs_nprim = cp.array([basis.bfs_nprim]) 38 bfs_offset = cp.array([basis.shell_bfs_offset]) 39 bfs_nbfshell = cp.array([basis.bfs_nbfshell]) 40 41 42 #The remaining properties like bfs_coeffs are a list of lists of unequal sizes. 43 #Numba won't be able to work with these efficiently. 44 #So, we convert them to a numpy 2d array by applying a trick, 45 #that the second dimension is that of the largest list. So that 46 #it can accomadate all the lists. 47 maxnprim = max(basis.bfs_nprim) 48 bfs_coeffs = cp.zeros([basis.bfs_nao, maxnprim]) 49 bfs_expnts = cp.zeros([basis.bfs_nao, maxnprim]) 50 bfs_prim_norms = cp.zeros([basis.bfs_nao, maxnprim]) 51 for i in range(basis.bfs_nao): 52 for j in range(basis.bfs_nprim[i]): 53 bfs_coeffs[i,j] = basis.bfs_coeffs[i][j] 54 bfs_expnts[i,j] = basis.bfs_expnts[i][j] 55 bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j] 56 57 58 59 60 matrix_shape = (basis.bfs_nao, basis.bfs_nao) 61 62 # Initialize the matrix with zeros 63 T = cp.zeros(matrix_shape) 64 65 thread_x = 32 66 thread_y = 32 67 68 if cp_stream is None: 69 device = 0 70 cp.cuda.Device(device).use() 71 cp_stream = cp.cuda.Stream(non_blocking = True) 72 nb_stream = cuda.external_stream(cp_stream.ptr) 73 cp_stream.use() 74 else: 75 nb_stream = cuda.external_stream(cp_stream.ptr) 76 cp_stream.use() 77 78 thread_x = 32 79 thread_y = 32 80 blocks_per_grid = ((basis.nshells + (thread_x - 1))//thread_x, (basis.nshells + (thread_y - 1))//thread_y) 81 kin_mat_symm_shell_internal_cuda[blocks_per_grid, (32, 32), nb_stream](bfs_offset[0], bfs_nbfshell[0], bfs_coords[0], bfs_contr_prim_norms[0], bfs_lmn[0], bfs_nprim[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, basis.nshells, T) 82 blocks_per_grid = ((basis.bfs_nao + (thread_x - 1))//thread_x, (basis.bfs_nao + (thread_y - 1))//thread_y) 83 symmetrize[blocks_per_grid, (thread_x, thread_y), nb_stream](0,basis.bfs_nao,0,basis.bfs_nao,T) 84 if cp_stream is None: 85 cuda.synchronize() 86 else: 87 cp_stream.synchronize() 88 cp.cuda.Stream.null.synchronize() 89 return T #+ T.T - cp.diag(cp.diag(T))
21def rys_2c2e_symm_cupy(basis, slice=None, cp_stream=None): 22 # Here the lists are converted to numpy arrays for better use with Numba. 23 # Once these conversions are done we pass these to a Numba decorated 24 # function that uses prange, etc. to calculate the matrix efficiently. 25 26 # This function calculates the kinetic energy matrix for a given basis object. 27 # The basis object holds the information of basis functions like: exponents, coeffs, etc. 28 # It is possible to only calculate a slice (block) of the complete matrix. 29 # slice is a 4 element list whose first and second elements give the range of the rows to be calculated. 30 # the third and fourth element give the range of columns to be calculated. 31 # slice = [start_row, end_row, start_col, end_col] 32 # The integrals are performed using the formulas 33 34 #We convert the required properties to numpy arrays as this is what Numba likes. 35 bfs_coords = cp.array([basis.bfs_coords]) 36 bfs_contr_prim_norms = cp.array([basis.bfs_contr_prim_norms]) 37 bfs_lmn = cp.array([basis.bfs_lmn]) 38 bfs_nprim = cp.array([basis.bfs_nprim]) 39 40 #The remaining properties like bfs_coeffs are a list of lists of unequal sizes. 41 #Numba won't be able to work with these efficiently. 42 #So, we convert them to a numpy 2d array by applying a trick, 43 #that the second dimension is that of the largest list. So that 44 #it can accomadate all the lists. 45 maxnprim = max(basis.bfs_nprim) 46 bfs_coeffs = cp.zeros([basis.bfs_nao, maxnprim]) 47 bfs_expnts = cp.zeros([basis.bfs_nao, maxnprim]) 48 bfs_prim_norms = cp.zeros([basis.bfs_nao, maxnprim]) 49 for i in range(basis.bfs_nao): 50 for j in range(basis.bfs_nprim[i]): 51 bfs_coeffs[i,j] = basis.bfs_coeffs[i][j] 52 bfs_expnts[i,j] = basis.bfs_expnts[i][j] 53 bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j] 54 55 56 DATA_X_cuda = cp.asarray(DATA_X) 57 DATA_W_cuda = cp.asarray(DATA_W) 58 59 if slice is None: 60 slice = [0, basis.bfs_nao, 0, basis.bfs_nao] 61 62 #Limits for the calculation of overlap integrals 63 a = int(slice[0]) #row start index 64 b = int(slice[1]) #row end index 65 c = int(slice[2]) #column start index 66 d = int(slice[3]) #column end index 67 68 # Infer the matrix shape from the start and end indices 69 num_rows = b - a 70 num_cols = d - c 71 start_row = a 72 end_row = b 73 start_col = c 74 end_col = d 75 matrix_shape = (num_rows, num_cols) 76 77 # Check if the slice of the matrix requested falls in the lower/upper triangle or in both the triangles 78 # Check if the slice of the matrix requested falls in the lower/upper triangle or in both the triangles 79 tri_symm = False 80 no_symm = False 81 if start_row==start_col and end_row==end_col: 82 tri_symm = True 83 else: 84 no_symm = True 85 86 # Initialize the matrix with zeros 87 V = cp.zeros(matrix_shape) 88 89 90 91 92 93 if cp_stream is None: 94 device = 0 95 cp.cuda.Device(device).use() 96 cp_stream = cp.cuda.Stream(non_blocking = True) 97 nb_stream = cuda.external_stream(cp_stream.ptr) 98 cp_stream.use() 99 else: 100 nb_stream = cuda.external_stream(cp_stream.ptr) 101 cp_stream.use() 102 103 thread_x = 32 104 thread_y = 32 105 blocks_per_grid = ((num_rows + (thread_x - 1))//thread_x, (num_cols + (thread_y - 1))//thread_y) 106 rys_2c2e_symm_internal_cuda[blocks_per_grid, (thread_x, thread_y), nb_stream](bfs_coords[0], bfs_contr_prim_norms[0], bfs_lmn[0], bfs_nprim[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, a, b, c, d, tri_symm, no_symm, DATA_X_cuda, DATA_W_cuda, V) 107 if tri_symm: 108 thread_x = 32 109 thread_y = 32 110 blocks_per_grid = ((num_rows + (thread_x - 1))//thread_x, (num_cols + (thread_y - 1))//thread_y) 111 symmetrize[blocks_per_grid, (thread_x, thread_y), nb_stream](a,b,c,d,V) 112 if cp_stream is None: 113 cuda.synchronize() 114 else: 115 cp_stream.synchronize() 116 cp.cuda.Stream.null.synchronize() 117 # cp._default_memory_pool.free_all_blocks() 118 return V
23def rys_3c2e_symm_cupy(basis, auxbasis, slice=None, schwarz=False, schwarz_threshold=1e-9, cp_stream=None): 24 # Here the lists are converted to numpy arrays for better use with Numba. 25 # Once these conversions are done we pass these to a Numba decorated 26 # function that uses prange, etc. to calculate the matrix efficiently. 27 28 # This function calculates the kinetic energy matrix for a given basis object. 29 # The basis object holds the information of basis functions like: exponents, coeffs, etc. 30 # It is possible to only calculate a slice (block) of the complete matrix. 31 # slice is a 4 element list whose first and second elements give the range of the rows to be calculated. 32 # the third and fourth element give the range of columns to be calculated. 33 # slice = [start_row, end_row, start_col, end_col] 34 # The integrals are performed using the formulas 35 36 #We convert the required properties to numpy arrays as this is what Numba likes. 37 bfs_coords = cp.array([basis.bfs_coords]) 38 bfs_contr_prim_norms = cp.array([basis.bfs_contr_prim_norms]) 39 bfs_lmn = cp.array([basis.bfs_lmn]) 40 bfs_nprim = cp.array([basis.bfs_nprim]) 41 42 aux_bfs_coords = cp.array([auxbasis.bfs_coords]) 43 aux_bfs_contr_prim_norms = cp.array([auxbasis.bfs_contr_prim_norms]) 44 aux_bfs_lmn = cp.array([auxbasis.bfs_lmn]) 45 aux_bfs_nprim = cp.array([auxbasis.bfs_nprim]) 46 47 #The remaining properties like bfs_coeffs are a list of lists of unequal sizes. 48 #Numba won't be able to work with these efficiently. 49 #So, we convert them to a numpy 2d array by applying a trick, 50 #that the second dimension is that of the largest list. So that 51 #it can accomadate all the lists. 52 maxnprim = max(basis.bfs_nprim) 53 bfs_coeffs = cp.zeros([basis.bfs_nao, maxnprim]) 54 bfs_expnts = cp.zeros([basis.bfs_nao, maxnprim]) 55 bfs_prim_norms = cp.zeros([basis.bfs_nao, maxnprim]) 56 for i in range(basis.bfs_nao): 57 for j in range(basis.bfs_nprim[i]): 58 bfs_coeffs[i,j] = basis.bfs_coeffs[i][j] 59 bfs_expnts[i,j] = basis.bfs_expnts[i][j] 60 bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j] 61 maxnprimaux = max(auxbasis.bfs_nprim) 62 aux_bfs_coeffs = cp.zeros([auxbasis.bfs_nao, maxnprimaux]) 63 aux_bfs_expnts = cp.zeros([auxbasis.bfs_nao, maxnprimaux]) 64 aux_bfs_prim_norms = cp.zeros([auxbasis.bfs_nao, maxnprimaux]) 65 for i in range(auxbasis.bfs_nao): 66 for j in range(auxbasis.bfs_nprim[i]): 67 aux_bfs_coeffs[i,j] = auxbasis.bfs_coeffs[i][j] 68 aux_bfs_expnts[i,j] = auxbasis.bfs_expnts[i][j] 69 aux_bfs_prim_norms[i,j] = auxbasis.bfs_prim_norms[i][j] 70 71 72 DATA_X_cuda = cp.asarray(DATA_X) 73 DATA_W_cuda = cp.asarray(DATA_W) 74 75 if slice is None: 76 slice = [0, basis.bfs_nao, 0, basis.bfs_nao, 0, auxbasis.bfs_nao] 77 78 if schwarz: 79 ints4c2e_diag = cp.asarray(eri_4c2e_diag(basis)) 80 ints2c2e = rys_2c2e_symm_cupy(auxbasis) 81 sqrt_ints4c2e_diag = cp.sqrt(cp.abs(ints4c2e_diag)) 82 sqrt_diag_ints2c2e = cp.sqrt(cp.abs(cp.diag(ints2c2e))) 83 print('Prelims calc done for Schwarz screening!') 84 else: 85 #Create dummy array 86 sqrt_ints4c2e_diag = cp.zeros((1,1), dtype=cp.float64) 87 sqrt_diag_ints2c2e = cp.zeros((1), dtype=cp.float64) 88 89 #Limits for the calculation of 4c2e integrals 90 indx_startA = int(slice[0]) 91 indx_endA = int(slice[1]) 92 indx_startB = int(slice[2]) 93 indx_endB = int(slice[3]) 94 indx_startC = int(slice[4]) 95 indx_endC = int(slice[5]) 96 # Infer the matrix shape from the start and end indices 97 num_A = indx_endA - indx_startA 98 num_B = indx_endB - indx_startB 99 num_C = indx_endC - indx_startC 100 matrix_shape = (num_A, num_B, num_C) 101 102 103 # Check if the slice of the matrix requested falls in the lower/upper triangle or in both the triangles 104 tri_symm = False 105 no_symm = False 106 if indx_startA==indx_startB and indx_endA==indx_endB: 107 tri_symm = True 108 else: 109 no_symm = True 110 111 112 # Initialize the matrix with zeros 113 threeC2E = cp.zeros(matrix_shape, dtype=cp.float64) 114 115 if cp_stream is None: 116 device = 0 117 cp.cuda.Device(device).use() 118 cp_stream = cp.cuda.Stream(non_blocking = True) 119 nb_stream = cuda.external_stream(cp_stream.ptr) 120 cp_stream.use() 121 else: 122 nb_stream = cuda.external_stream(cp_stream.ptr) 123 cp_stream.use() 124 125 thread_x = 32 126 thread_y = 32 127 blocks_per_grid = ((num_A + (thread_x - 1))//thread_x, (num_B + (thread_y - 1))//thread_y) 128 rys_3c2e_symm_internal_cuda_new[blocks_per_grid, (thread_x, thread_y), nb_stream](bfs_coords[0], bfs_contr_prim_norms[0], bfs_lmn[0], bfs_nprim[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, aux_bfs_coords[0], aux_bfs_contr_prim_norms[0], aux_bfs_lmn[0], aux_bfs_nprim[0], aux_bfs_coeffs, aux_bfs_prim_norms, aux_bfs_expnts, indx_startA, indx_endA, indx_startB, indx_endB, indx_startC, indx_endC, tri_symm, no_symm, DATA_X_cuda, DATA_W_cuda, schwarz, sqrt_ints4c2e_diag, sqrt_diag_ints2c2e, schwarz_threshold, threeC2E) 129 if tri_symm: 130 symmetrize[blocks_per_grid, (thread_x, thread_y), nb_stream](indx_startA, indx_endA, indx_startB, indx_endB, indx_startC, indx_endC,threeC2E) 131 if cp_stream is None: 132 cuda.synchronize() 133 else: 134 cp_stream.synchronize() 135 cp.cuda.Stream.null.synchronize() 136 # cp._default_memory_pool.free_all_blocks() 137 return threeC2E
23def rys_3c2e_symm_cupy_fp32(basis, auxbasis, slice=None, schwarz=True, schwarz_threshold=1e-9, cp_stream=None): 24 # Here the lists are converted to numpy arrays for better use with Numba. 25 # Once these conversions are done we pass these to a Numba decorated 26 # function that uses prange, etc. to calculate the matrix efficiently. 27 28 # This function calculates the kinetic energy matrix for a given basis object. 29 # The basis object holds the information of basis functions like: exponents, coeffs, etc. 30 # It is possible to only calculate a slice (block) of the complete matrix. 31 # slice is a 4 element list whose first and second elements give the range of the rows to be calculated. 32 # the third and fourth element give the range of columns to be calculated. 33 # slice = [start_row, end_row, start_col, end_col] 34 # The integrals are performed using the formulas 35 36 #We convert the required properties to numpy arrays as this is what Numba likes. 37 bfs_coords = cp.array([basis.bfs_coords], dtype=cp.float32) 38 bfs_contr_prim_norms = cp.array([basis.bfs_contr_prim_norms], dtype=cp.float32) 39 bfs_lmn = cp.array([basis.bfs_lmn]) 40 bfs_nprim = cp.array([basis.bfs_nprim]) 41 42 aux_bfs_coords = cp.array([auxbasis.bfs_coords], dtype=cp.float32) 43 aux_bfs_contr_prim_norms = cp.array([auxbasis.bfs_contr_prim_norms], dtype=cp.float32) 44 aux_bfs_lmn = cp.array([auxbasis.bfs_lmn]) 45 aux_bfs_nprim = cp.array([auxbasis.bfs_nprim], dtype=cp.float32) 46 47 #The remaining properties like bfs_coeffs are a list of lists of unequal sizes. 48 #Numba won't be able to work with these efficiently. 49 #So, we convert them to a numpy 2d array by applying a trick, 50 #that the second dimension is that of the largest list. So that 51 #it can accomadate all the lists. 52 maxnprim = max(basis.bfs_nprim) 53 bfs_coeffs = cp.zeros([basis.bfs_nao, maxnprim], dtype=cp.float32) 54 bfs_expnts = cp.zeros([basis.bfs_nao, maxnprim], dtype=cp.float32) 55 bfs_prim_norms = cp.zeros([basis.bfs_nao, maxnprim], dtype=cp.float32) 56 for i in range(basis.bfs_nao): 57 for j in range(basis.bfs_nprim[i]): 58 bfs_coeffs[i,j] = basis.bfs_coeffs[i][j] 59 bfs_expnts[i,j] = basis.bfs_expnts[i][j] 60 bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j] 61 maxnprimaux = max(auxbasis.bfs_nprim) 62 aux_bfs_coeffs = cp.zeros([auxbasis.bfs_nao, maxnprimaux], dtype=cp.float32) 63 aux_bfs_expnts = cp.zeros([auxbasis.bfs_nao, maxnprimaux], dtype=cp.float32) 64 aux_bfs_prim_norms = cp.zeros([auxbasis.bfs_nao, maxnprimaux], dtype=cp.float32) 65 for i in range(auxbasis.bfs_nao): 66 for j in range(auxbasis.bfs_nprim[i]): 67 aux_bfs_coeffs[i,j] = auxbasis.bfs_coeffs[i][j] 68 aux_bfs_expnts[i,j] = auxbasis.bfs_expnts[i][j] 69 aux_bfs_prim_norms[i,j] = auxbasis.bfs_prim_norms[i][j] 70 71 72 DATA_X_cuda = cp.asarray(DATA_X, dtype=cp.float32) 73 DATA_W_cuda = cp.asarray(DATA_W, dtype=cp.float32) 74 75 if slice is None: 76 slice = [0, basis.bfs_nao, 0, basis.bfs_nao, 0, auxbasis.bfs_nao] 77 78 if schwarz: 79 ints4c2e_diag = cp.asarray(eri_4c2e_diag(basis)) 80 ints2c2e = rys_2c2e_symm_cupy(auxbasis) 81 sqrt_ints4c2e_diag = cp.sqrt(cp.abs(ints4c2e_diag)) 82 sqrt_diag_ints2c2e = cp.sqrt(cp.abs(cp.diag(ints2c2e))) 83 print('Prelims calc done for Schwarz screening!') 84 else: 85 #Create dummy array 86 sqrt_ints4c2e_diag = cp.zeros((1,1), dtype=cp.float64) 87 sqrt_diag_ints2c2e = cp.zeros((1), dtype=cp.float64) 88 89 #Limits for the calculation of 4c2e integrals 90 indx_startA = int(slice[0]) 91 indx_endA = int(slice[1]) 92 indx_startB = int(slice[2]) 93 indx_endB = int(slice[3]) 94 indx_startC = int(slice[4]) 95 indx_endC = int(slice[5]) 96 # Infer the matrix shape from the start and end indices 97 num_A = indx_endA - indx_startA 98 num_B = indx_endB - indx_startB 99 num_C = indx_endC - indx_startC 100 matrix_shape = (num_A, num_B, num_C) 101 102 103 # Check if the slice of the matrix requested falls in the lower/upper triangle or in both the triangles 104 tri_symm = False 105 no_symm = False 106 if indx_startA==indx_startB and indx_endA==indx_endB: 107 tri_symm = True 108 else: 109 no_symm = True 110 111 112 # Initialize the matrix with zeros 113 threeC2E = cp.zeros(matrix_shape, dtype=cp.float32) 114 115 if cp_stream is None: 116 device = 0 117 cp.cuda.Device(device).use() 118 cp_stream = cp.cuda.Stream(non_blocking = True) 119 nb_stream = cuda.external_stream(cp_stream.ptr) 120 cp_stream.use() 121 else: 122 nb_stream = cuda.external_stream(cp_stream.ptr) 123 cp_stream.use() 124 125 thread_x = 32 126 thread_y = 32 127 blocks_per_grid = ((num_A + (thread_x - 1))//thread_x, (num_B + (thread_y - 1))//thread_y) 128 rys_3c2e_symm_internal_cuda[blocks_per_grid, (thread_x, thread_y), nb_stream](bfs_coords[0], bfs_contr_prim_norms[0], bfs_lmn[0], bfs_nprim[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, aux_bfs_coords[0], aux_bfs_contr_prim_norms[0], aux_bfs_lmn[0], aux_bfs_nprim[0], aux_bfs_coeffs, aux_bfs_prim_norms, aux_bfs_expnts, indx_startA, indx_endA, indx_startB, indx_endB, indx_startC, indx_endC, tri_symm, no_symm, DATA_X_cuda, DATA_W_cuda, schwarz, sqrt_ints4c2e_diag, sqrt_diag_ints2c2e, schwarz_threshold, threeC2E) 129 if tri_symm: 130 symmetrize[blocks_per_grid, (thread_x, thread_y), nb_stream](indx_startA, indx_endA, indx_startB, indx_endB, indx_startC, indx_endC,threeC2E) 131 if cp_stream is None: 132 cuda.synchronize() 133 else: 134 cp_stream.synchronize() 135 cp.cuda.Stream.null.synchronize() 136 # cp._default_memory_pool.free_all_blocks() 137 return threeC2E
7def overlap_mat_grad_symm(basis, slice=None): 8 # Here the lists are converted to numpy arrays for better use with Numba. 9 # Once these conversions are done we pass these to a Numba decorated 10 # function that uses prange, etc. to calculate the matrix efficiently. 11 12 # This function calculates the overlap matrix for a given basis object. 13 # The basis object holds the information of basis functions like: exponents, coeffs, etc. 14 # It is possible to only calculate a slice (block) of the complete matrix. 15 # slice is a 4 element list whose first and second elements give the range of the rows to be calculated. 16 # the third and fourth element give the range of columns to be calculated. 17 # slice = [start_row, end_row, start_col, end_col] 18 # The integrals are performed using the formulas 19 20 #We convert the required properties to numpy arrays as this is what Numba likes. 21 bfs_coords = np.array([basis.bfs_coords]) 22 bfs_contr_prim_norms = np.array([basis.bfs_contr_prim_norms]) 23 bfs_lmn = np.array([basis.bfs_lmn]) 24 bfs_nprim = np.array([basis.bfs_nprim]) 25 bfs_atoms = np.array([basis.bfs_atoms]) 26 natoms = max(basis.bfs_atoms) + 1 27 28 29 #The remaining properties like bfs_coeffs are a list of lists of unequal sizes. 30 #Numba won't be able to work with these efficiently. 31 #So, we convert them to a numpy 2d array by applying a trick, 32 #that the second dimension is that of the largest list. So that 33 #it can accomadate all the lists. 34 maxnprim = max(basis.bfs_nprim) 35 bfs_coeffs = np.zeros([basis.bfs_nao, maxnprim]) 36 bfs_expnts = np.zeros([basis.bfs_nao, maxnprim]) 37 bfs_prim_norms = np.zeros([basis.bfs_nao, maxnprim]) 38 for i in range(basis.bfs_nao): 39 for j in range(basis.bfs_nprim[i]): 40 bfs_coeffs[i,j] = basis.bfs_coeffs[i][j] 41 bfs_expnts[i,j] = basis.bfs_expnts[i][j] 42 bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j] 43 44 45 if slice is None: 46 slice = [0, basis.bfs_nao, 0, basis.bfs_nao] 47 48 #Limits for the calculation of overlap integrals 49 a = int(slice[0]) #row start index 50 b = int(slice[1]) #row end index 51 c = int(slice[2]) #column start index 52 d = int(slice[3]) #column end index 53 54 dS = overlap_mat_grad_symm_internal_new(natoms, bfs_atoms[0], bfs_coords[0], bfs_contr_prim_norms[0], bfs_lmn[0], bfs_nprim[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, a, b, c, d) 55 return dS
7def kin_mat_grad_symm(basis, slice=None): 8 # Here the lists are converted to numpy arrays for better use with Numba. 9 # Once these conversions are done we pass these to a Numba decorated 10 # function that uses prange, etc. to calculate the matrix efficiently. 11 12 # This function calculates the kinetic energy matrix for a given basis object. 13 # The basis object holds the information of basis functions like: exponents, coeffs, etc. 14 # It is possible to only calculate a slice (block) of the complete matrix. 15 # slice is a 4 element list whose first and second elements give the range of the rows to be calculated. 16 # the third and fourth element give the range of columns to be calculated. 17 # slice = [start_row, end_row, start_col, end_col] 18 # The integrals are performed using the formulas 19 20 #We convert the required properties to numpy arrays as this is what Numba likes. 21 bfs_coords = np.array([basis.bfs_coords]) 22 bfs_contr_prim_norms = np.array([basis.bfs_contr_prim_norms]) 23 bfs_lmn = np.array([basis.bfs_lmn]) 24 bfs_nprim = np.array([basis.bfs_nprim]) 25 bfs_atoms = np.array([basis.bfs_atoms]) 26 natoms = max(basis.bfs_atoms) + 1 27 28 29 #The remaining properties like bfs_coeffs are a list of lists of unequal sizes. 30 #Numba won't be able to work with these efficiently. 31 #So, we convert them to a numpy 2d array by applying a trick, 32 #that the second dimension is that of the largest list. So that 33 #it can accomadate all the lists. 34 maxnprim = max(basis.bfs_nprim) 35 bfs_coeffs = np.zeros([basis.bfs_nao, maxnprim]) 36 bfs_expnts = np.zeros([basis.bfs_nao, maxnprim]) 37 bfs_prim_norms = np.zeros([basis.bfs_nao, maxnprim]) 38 for i in range(basis.bfs_nao): 39 for j in range(basis.bfs_nprim[i]): 40 bfs_coeffs[i,j] = basis.bfs_coeffs[i][j] 41 bfs_expnts[i,j] = basis.bfs_expnts[i][j] 42 bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j] 43 44 45 46 if slice is None: 47 slice = [0, basis.bfs_nao, 0, basis.bfs_nao] 48 49 #Limits for the calculation of overlap integrals 50 a = int(slice[0]) #row start index 51 b = int(slice[1]) #row end index 52 c = int(slice[2]) #column start index 53 d = int(slice[3]) #column end index 54 55 56 dT = kin_mat_symm_grad_internal_new(natoms, bfs_atoms[0], bfs_coords[0], bfs_contr_prim_norms[0], bfs_lmn[0], bfs_nprim[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, a, b, c, d) 57 58 return dT
7def nuc_mat_grad_symm(basis, mol, slice=None, sqrt_ints4c2e_diag=None): 8 #Here the lists are converted to numpy arrays for better use with Numba. 9 #Once these conversions are done we pass these to a Numba decorated 10 #function that uses prange, etc. to calculate the matrix efficiently. 11 12 # This function calculates the nuclear matrix for a given basis object. 13 # The basis object holds the information of basis functions like: exponents, coeffs, etc. 14 # It is possible to only calculate a slice (block) of the complete matrix. 15 # slice is a 4 element list whose first and second elements give the range of the rows to be calculated. 16 # the third and fourth element give the range of columns to be calculated. 17 # slice = [start_row, end_row, start_col, end_col] 18 # The integrals are performed using the formulas 19 20 #We convert the required properties to numpy arrays as this is what Numba likes. 21 bfs_coords = np.array([basis.bfs_coords]) 22 bfs_contr_prim_norms = np.array([basis.bfs_contr_prim_norms]) 23 bfs_lmn = np.array([basis.bfs_lmn]) 24 bfs_nprim = np.array([basis.bfs_nprim]) 25 coordsBohrs = np.array([mol.coordsBohrs]) 26 Z = np.array([mol.Zcharges]) 27 natoms = mol.natoms 28 29 30 #The remaining properties like bfs_coeffs are a list of lists of unequal sizes. 31 #Numba won't be able to work with these efficiently. 32 #So, we convert them to a numpy 2d array by applying a trick, 33 #that the second dimension is that of the largest list. So that 34 #it can accomadate all the lists. 35 maxnprim = max(basis.bfs_nprim) 36 bfs_coeffs = np.zeros([basis.bfs_nao, maxnprim]) 37 bfs_expnts = np.zeros([basis.bfs_nao, maxnprim]) 38 bfs_prim_norms = np.zeros([basis.bfs_nao, maxnprim]) 39 for i in range(basis.bfs_nao): 40 for j in range(basis.bfs_nprim[i]): 41 bfs_coeffs[i,j] = basis.bfs_coeffs[i][j] 42 bfs_expnts[i,j] = basis.bfs_expnts[i][j] 43 bfs_prim_norms[i,j] = basis.bfs_prim_norms[i][j] 44 45 46 47 if slice is None: 48 slice = [0, basis.bfs_nao, 0, basis.bfs_nao] 49 50 #Limits for the calculation of overlap integrals 51 a = int(slice[0]) #row start index 52 b = int(slice[1]) #row end index 53 c = int(slice[2]) #column start index 54 d = int(slice[3]) #column end index 55 56 # print([a,b,c,d]) 57 58 if sqrt_ints4c2e_diag is None: 59 #Create dummy array 60 sqrt_ints4c2e_diag = np.zeros((1,1), dtype=np.float64) 61 isSchwarz = False 62 else: 63 isSchwarz = True 64 65 V = nuc_mat_grad_symm_internal(bfs_coords[0], bfs_contr_prim_norms[0], bfs_lmn[0], bfs_nprim[0], bfs_coeffs, bfs_prim_norms, bfs_expnts, a, b, c, d, Z[0], coordsBohrs[0], natoms, sqrt_ints4c2e_diag, isSchwarz) 66 67 return V
7def cross_overlap_mat_symm(basisA, basisB, slice=None): 8 """ 9 Compute the overlap matrix between two distinct basis sets. 10 11 This function calculates the overlap integrals ⟨χ_i^A | χ_j^B⟩ between basis functions 12 from two different basis sets, `basisA` and `basisB`. It supports block-wise computation 13 of the matrix to save time and memory during large-scale calculations. 14 15 S_{μν}^{AB} = ⟨ χ_μ^A | χ_ν^B ⟩ 16 17 All integrals are evaluated using a Numba-accelerated backend that converts 18 required basis set properties into NumPy arrays for efficient performance. 19 20 Parameters 21 ---------- 22 basisA : object 23 The first basis set object containing properties such as: 24 - bfs_coords: Cartesian coordinates of centers 25 - bfs_coeffs: Contraction coefficients 26 - bfs_expnts: Gaussian exponents 27 - bfs_prim_norms: Primitive normalization constants 28 - bfs_contr_prim_norms: Contraction normalization constants 29 - bfs_lmn: Angular momentum quantum numbers 30 - bfs_nprim: Number of primitives per AO 31 - bfs_nao: Total number of atomic orbitals 32 33 basisB : object 34 The second basis set object, structured identically to `basisA`. 35 36 slice : list of int, optional 37 A 4-element list `[start_row, end_row, start_col, end_col]` that defines 38 a block of the full matrix to compute. Rows correspond to functions in `basisA`, 39 and columns to those in `basisB`. If `None` (default), the full matrix is computed. 40 41 Returns 42 ------- 43 S : ndarray of shape (end_row - start_row, end_col - start_col) 44 The computed cross overlap (sub)matrix. 45 46 Notes 47 ----- 48 The function handles contraction and normalization internally. Since Numba does not 49 support jagged lists, the function reshapes data into padded 2D arrays based on 50 the maximum number of primitives in either basis set. 51 52 Examples 53 -------- 54 >>> S = cross_overlap_mat_symm(basisA, basisB) 55 >>> S_block = cross_overlap_mat_symm(basisA, basisB, slice=[0, 5, 0, 10]) 56 """ 57 # Here the lists are converted to numpy arrays for better use with Numba. 58 # Once these conversions are done we pass these to a Numba decorated 59 # function that uses prange, etc. to calculate the matrix efficiently. 60 61 # This function calculates the overlap matrix between two basis objects: basisA and basisB. 62 # The basis objects hold the information of basis functions like: exponents, coeffs, etc. 63 # It is possible to only calculate a slice (block) of the complete matrix. 64 # slice is a 4 element list whose first and second elements give the range of the rows (basisA) to be calculated. 65 # the third and fourth element give the range of columns (basisB) to be calculated. 66 # slice = [start_row, end_row, start_col, end_col] 67 # The integrals are performed using the formulas 68 69 #We convert the required properties to numpy arrays as this is what Numba likes. 70 bfsA_coords = np.array([basisA.bfs_coords]) 71 bfsA_contr_prim_norms = np.array([basisA.bfs_contr_prim_norms]) 72 bfsA_lmn = np.array([basisA.bfs_lmn]) 73 bfsA_nprim = np.array([basisA.bfs_nprim]) 74 75 bfsB_coords = np.array([basisB.bfs_coords]) 76 bfsB_contr_prim_norms = np.array([basisB.bfs_contr_prim_norms]) 77 bfsB_lmn = np.array([basisB.bfs_lmn]) 78 bfsB_nprim = np.array([basisB.bfs_nprim]) 79 80 81 #The remaining properties like bfs_coeffs are a list of lists of unequal sizes. 82 #Numba won't be able to work with these efficiently. 83 #So, we convert them to a numpy 2d array by applying a trick, 84 #that the second dimension is that of the largest list. So that 85 #it can accomadate all the lists. 86 maxnprimA = max(basisA.bfs_nprim) 87 bfsA_coeffs = np.zeros([basisA.bfs_nao, maxnprimA]) 88 bfsA_expnts = np.zeros([basisA.bfs_nao, maxnprimA]) 89 bfsA_prim_norms = np.zeros([basisA.bfs_nao, maxnprimA]) 90 for i in range(basisA.bfs_nao): 91 for j in range(basisA.bfs_nprim[i]): 92 bfsA_coeffs[i,j] = basisA.bfs_coeffs[i][j] 93 bfsA_expnts[i,j] = basisA.bfs_expnts[i][j] 94 bfsA_prim_norms[i,j] = basisA.bfs_prim_norms[i][j] 95 96 maxnprimB = max(basisB.bfs_nprim) 97 bfsB_coeffs = np.zeros([basisB.bfs_nao, maxnprimB]) 98 bfsB_expnts = np.zeros([basisB.bfs_nao, maxnprimB]) 99 bfsB_prim_norms = np.zeros([basisB.bfs_nao, maxnprimB]) 100 for i in range(basisB.bfs_nao): 101 for j in range(basisB.bfs_nprim[i]): 102 bfsB_coeffs[i,j] = basisB.bfs_coeffs[i][j] 103 bfsB_expnts[i,j] = basisB.bfs_expnts[i][j] 104 bfsB_prim_norms[i,j] = basisB.bfs_prim_norms[i][j] 105 106 107 if slice is None: 108 slice = [0, basisA.bfs_nao, 0, basisB.bfs_nao] 109 110 #Limits for the calculation of overlap integrals 111 a = int(slice[0]) #row start index (basisA) 112 b = int(slice[1]) #row end index (basisA) 113 c = int(slice[2]) #column start index (basisB) 114 d = int(slice[3]) #column end index (basisB) 115 116 S = cross_overlap_mat_internal(bfsA_coords[0], bfsA_contr_prim_norms[0], bfsA_lmn[0], bfsA_nprim[0], bfsA_coeffs, bfsA_prim_norms, bfsA_expnts, 117 bfsB_coords[0], bfsB_contr_prim_norms[0], bfsB_lmn[0], bfsB_nprim[0], bfsB_coeffs, bfsB_prim_norms, bfsB_expnts, 118 a, b, c, d) 119 return S
Compute the overlap matrix between two distinct basis sets.
This function calculates the overlap integrals ⟨χ_i^A | χ_j^B⟩ between basis functions
from two different basis sets, basisA and basisB. It supports block-wise computation
of the matrix to save time and memory during large-scale calculations.
S_{μν}^{AB} = ⟨ χ_μ^A | χ_ν^B ⟩
All integrals are evaluated using a Numba-accelerated backend that converts required basis set properties into NumPy arrays for efficient performance.
Parameters
basisA : object The first basis set object containing properties such as: - bfs_coords: Cartesian coordinates of centers - bfs_coeffs: Contraction coefficients - bfs_expnts: Gaussian exponents - bfs_prim_norms: Primitive normalization constants - bfs_contr_prim_norms: Contraction normalization constants - bfs_lmn: Angular momentum quantum numbers - bfs_nprim: Number of primitives per AO - bfs_nao: Total number of atomic orbitals
basisB : object
The second basis set object, structured identically to basisA.
slice : list of int, optional
A 4-element list [start_row, end_row, start_col, end_col] that defines
a block of the full matrix to compute. Rows correspond to functions in basisA,
and columns to those in basisB. If None (default), the full matrix is computed.
Returns
S : ndarray of shape (end_row - start_row, end_col - start_col) The computed cross overlap (sub)matrix.
Notes
The function handles contraction and normalization internally. Since Numba does not support jagged lists, the function reshapes data into padded 2D arrays based on the maximum number of primitives in either basis set.
Examples
>>> S = cross_overlap_mat_symm(basisA, basisB)
>>> S_block = cross_overlap_mat_symm(basisA, basisB, slice=[0, 5, 0, 10])