mlspm.preprocessing
- mlspm.preprocessing.add_cutout(Xs: list[ndarray], n_holes: int = 5)
Randomly add cutouts (rectangular patches of zeros) to images. In-place operation.
- Parameters:
Xs – Arrays to add cutouts to. Each array should be of shape (batch_size, x, y, z).
n_holes – Maximum number of cutouts to add.
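The cutout augmentation described above can be illustrated with a small standalone NumPy sketch. This is not the mlspm implementation: the helper name `cutout_sketch`, the `max_frac` patch-size limit, and the choice to zero a patch across all z layers are assumptions made for the example.

```python
import numpy as np


def cutout_sketch(Xs, n_holes=5, max_frac=0.3, rng=None):
    """Zero out up to n_holes random rectangles per sample, in place.

    Illustrative only: max_frac (largest patch side as a fraction of the
    image side, assumed <= 1) and zeroing across all z layers are assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    for X in Xs:
        n_x, n_y = X.shape[1], X.shape[2]
        for sample in X:  # each sample is a view of shape (x, y, z)
            for _ in range(int(rng.integers(0, n_holes + 1))):
                # Patch width/height of at least one pixel
                w = int(rng.integers(1, max(2, int(max_frac * n_x) + 1)))
                h = int(rng.integers(1, max(2, int(max_frac * n_y) + 1)))
                # Top-left corner chosen so the patch fits inside the image
                x0 = int(rng.integers(0, n_x - w + 1))
                y0 = int(rng.integers(0, n_y - h + 1))
                sample[x0:x0 + w, y0:y0 + h, :] = 0.0


X = np.ones((4, 32, 32, 3))
cutout_sketch([X], n_holes=5, rng=np.random.default_rng(0))
```

Because each `sample` is a view into `X`, the assignment modifies the input arrays directly, mirroring the in-place behavior documented above.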
- mlspm.preprocessing.add_gradient(Xs: list[ndarray], c: float = 0.3)
Add a constant gradient plane with random direction to arrays. In-place operation.
- Parameters:
Xs – Arrays to add gradients to. Each array should be of shape (batch_size, x, y, z).
c – Maximum range of gradient plane as a fraction of the range of the array values.
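One way to picture the gradient plane is the following NumPy sketch. It is an illustration under stated assumptions, not the library code: the helper name `gradient_sketch` is hypothetical, the plane's range is drawn uniformly from [0, c] times the per-sample value range, and the same plane is added to every z layer.

```python
import numpy as np


def gradient_sketch(Xs, c=0.3, rng=None):
    """Add a randomly oriented linear plane to each sample, in place.

    Assumptions: per-sample value range, amplitude uniform in [0, c],
    identical plane for all z layers, and image sides longer than one pixel.
    """
    rng = np.random.default_rng() if rng is None else rng
    for X in Xs:
        n_x, n_y = X.shape[1], X.shape[2]
        x, y = np.meshgrid(
            np.linspace(-0.5, 0.5, n_x), np.linspace(-0.5, 0.5, n_y), indexing="ij"
        )
        for sample in X:  # view of shape (x, y, z)
            theta = rng.uniform(0.0, 2.0 * np.pi)  # random in-plane direction
            plane = np.cos(theta) * x + np.sin(theta) * y
            plane /= plane.max() - plane.min()  # plane now spans a range of 1
            value_range = sample.max() - sample.min()
            amplitude = rng.uniform(0.0, c) * value_range
            sample += amplitude * plane[:, :, None]


rng = np.random.default_rng(0)
X = rng.random((2, 16, 16, 4))
gradient_sketch([X], c=0.3, rng=rng)
```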
- mlspm.preprocessing.add_noise(Xs: list[ndarray], c: float = 0.1, randomize_amplitude: bool = False, normal_amplitude: bool = False)
Add uniform random noise to arrays. In-place operation.
- Parameters:
Xs – Arrays to add noise to. Each array should be of shape (batch_size, ...).
c – Amplitude of noise. It is multiplied by the (max - min) range of each sample.
randomize_amplitude – If True, the noise amplitude is uniform random in [0, c] for each sample in the batch.
normal_amplitude – If True and randomize_amplitude is also True, then instead of uniform, the noise amplitude is distributed like the absolute value of a normally distributed variable with zero mean and standard deviation equal to c.
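The interplay of the three noise parameters can be sketched as follows. This is a standalone approximation rather than the mlspm source: the helper name `noise_sketch` is hypothetical, and centering the uniform noise on zero is an assumption.

```python
import numpy as np


def noise_sketch(Xs, c=0.1, randomize_amplitude=False, normal_amplitude=False, rng=None):
    """Add uniform noise scaled by each sample's (max - min), in place.

    Assumptions: noise is centered on zero, and every array has at least
    two dimensions so that each sample is a writable view.
    """
    rng = np.random.default_rng() if rng is None else rng
    for X in Xs:
        for sample in X:
            if randomize_amplitude and normal_amplitude:
                amp = abs(rng.normal(0.0, c))  # half-normal with scale c
            elif randomize_amplitude:
                amp = rng.uniform(0.0, c)  # uniform in [0, c]
            else:
                amp = c  # fixed amplitude
            value_range = sample.max() - sample.min()
            sample += amp * value_range * (rng.random(sample.shape) - 0.5)


rng = np.random.default_rng(0)
X = rng.random((3, 8, 8, 2))
noise_sketch([X], c=0.1, randomize_amplitude=True, rng=rng)
```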
- mlspm.preprocessing.add_norm(Xs: list[ndarray], per_layer: bool = True)
Normalize arrays by subtracting the mean and dividing by the standard deviation. In-place operation.
- Parameters:
Xs – Arrays to normalize. Each array should be of shape (batch_size, ...).
per_layer – If True, normalize separately for each element in the last axis of Xs.
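The effect of `per_layer` can be seen in a short NumPy sketch. It assumes that statistics are computed per batch item, which the text above does not state explicitly, and that each array has at least three dimensions. The helper name `normalize_sketch` is hypothetical, and there is no guard against a zero standard deviation.

```python
import numpy as np


def normalize_sketch(Xs, per_layer=True):
    """Standardize each sample to zero mean and unit variance, in place.

    Assumption: statistics are taken per batch item. With per_layer=True
    the last axis is also kept separate, so every layer is normalized alone.
    """
    for X in Xs:
        if per_layer:
            axes = tuple(range(1, X.ndim - 1))  # reduce over spatial axes only
        else:
            axes = tuple(range(1, X.ndim))  # reduce over everything but the batch
        mean = X.mean(axis=axes, keepdims=True)
        std = X.std(axis=axes, keepdims=True)
        X -= mean
        X /= std


rng = np.random.default_rng(0)
X = 5.0 + 3.0 * rng.random((3, 8, 8, 5))
normalize_sketch([X], per_layer=True)
```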
- mlspm.preprocessing.add_rotation_reflection(X: List[ndarray], Y: List[ndarray], reflections: bool = True, multiple: int = 2, crop: Tuple[int] | None = None, per_batch_item: bool = False) → Tuple[ndarray, ndarray]
Augment batch with random rotations and reflections.
- Parameters:
X – AFM images to augment. Each array should be of shape (batch_size, x, y, z).
Y – Reference image descriptors to augment. Each array should be of shape (batch, x, y).
reflections – Whether to augment with reflections. If True, each rotation is randomly reflected with 50% probability.
multiple – Multiplier for how many rotations to generate for every sample.
crop – If not None, then output batch is cropped to specified size (x_size, y_size) in the middle of the image.
per_batch_item – If True, rotation is randomized per batch item, otherwise same rotation for all.
- Returns:
Tuple (X, Y), where
X - Batch of rotation-augmented AFM images of shape (batch*multiple, x_new, y_new, z).
Y - Batch of rotation-augmented reference image descriptors of shape (batch*multiple, x_new, y_new).
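The key point of this augmentation is that X and Y receive the same transformation and that the batch grows by the factor `multiple`. The sketch below shows only that idea in a deliberately simplified form: it swaps in rotations by multiples of 90 degrees via `np.rot90`, so the rotation angles are restricted by construction. It also assumes square images, applies the same transformation to the whole batch, omits cropping, and returns lists of arrays. The helper name `rot_reflect_sketch` is hypothetical.

```python
import numpy as np


def rot_reflect_sketch(X, Y, reflections=True, multiple=2, rng=None):
    """Return `multiple` rotated (and possibly mirrored) copies of the batch.

    Simplification: quarter-turn rotations only, square images assumed,
    one transformation per copy shared by all batch items.
    """
    rng = np.random.default_rng() if rng is None else rng
    X_parts = [[] for _ in X]
    Y_parts = [[] for _ in Y]
    for _ in range(multiple):
        k = int(rng.integers(0, 4))  # number of quarter turns
        flip = reflections and rng.random() < 0.5
        for parts, arrays in ((X_parts, X), (Y_parts, Y)):
            for j, a in enumerate(arrays):
                rotated = np.rot90(a, k=k, axes=(1, 2))  # rotate in the xy plane
                if flip:
                    rotated = rotated[:, ::-1]  # mirror along the x axis
                parts[j].append(rotated)
    X_out = [np.concatenate(p) for p in X_parts]
    Y_out = [np.concatenate(p) for p in Y_parts]
    return X_out, Y_out


rng = np.random.default_rng(0)
X = [rng.random((3, 8, 8, 4))]
Y = [rng.random((3, 8, 8))]
X_aug, Y_aug = rot_reflect_sketch(X, Y, multiple=2, rng=rng)
print(X_aug[0].shape, Y_aug[0].shape)  # (6, 8, 8, 4) (6, 8, 8)
```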
- mlspm.preprocessing.interpolate_and_crop(Xs: list[ndarray], real_dim: Tuple[int, int], target_res: float = 0.125, target_multiple: int = 8) → list[ndarray]
Interpolate a batch of AFM images to target resolution and crop to a target multiple of pixels in the xy plane.
- Parameters:
Xs – AFM images to interpolate and crop. Each array should be of shape (batch_size, x, y, z).
real_dim – Real-space size of the AFM image region in the x- and y-directions in angstroms.
target_res – Target size of a pixel in angstroms.
target_multiple – Target multiple of pixels of the output image in the xy plane.
- Returns:
Interpolated and cropped AFM images.
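The relationship between `real_dim`, `target_res`, and `target_multiple` comes down to simple arithmetic, sketched below. The helper `target_shape_sketch` is hypothetical and only computes pixel counts. It assumes the interpolated size is the real-space size divided by the resolution, rounded to the nearest integer, and that cropping rounds down to the nearest multiple. How the library rounds and which pixels it discards is not stated here.

```python
def target_shape_sketch(real_dim, target_res=0.125, target_multiple=8):
    """Return ((nx, ny) after interpolation, (nx, ny) after cropping).

    Assumptions: nearest-integer rounding for the interpolated size and
    rounding down to a multiple of target_multiple for the crop.
    """
    n_interp = tuple(int(round(d / target_res)) for d in real_dim)
    n_crop = tuple(n - n % target_multiple for n in n_interp)
    return n_interp, n_crop


# A 16.0 x 12.5 angstrom scan at 0.125 angstroms per pixel
print(target_shape_sketch((16.0, 12.5)))  # ((128, 100), (128, 96))
```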
- mlspm.preprocessing.minimum_to_zero(Ys: List[ndarray])
Shift values in arrays such that minimum is at zero. In-place operation.
- Parameters:
Ys – Arrays of shape (batch_size, …).
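A minimal sketch of the shift, assuming the minimum is taken separately for each batch item (the description does not say whether it is per item or per array). The helper name `min_to_zero_sketch` is hypothetical.

```python
import numpy as np


def min_to_zero_sketch(Ys):
    """Subtract each sample's minimum so that it becomes zero, in place."""
    for Y in Ys:
        axes = tuple(range(1, Y.ndim))  # all axes except the batch axis
        Y -= Y.min(axis=axes, keepdims=True)


Y = np.array([[[1.0, 2.0], [3.0, 4.0]], [[-2.0, 0.0], [5.0, 1.0]]])
min_to_zero_sketch([Y])
print(Y.min(axis=(1, 2)))  # [0. 0.]
```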
- mlspm.preprocessing.rand_shift_xy_trend(Xs: list[ndarray], max_layer_shift: float = 0.02, max_total_shift: float = 0.1)
Randomly shift z-layers in the xy-plane. Each shift is relative to previous one. In-place operation.
- Parameters:
Xs – Arrays to shift. Each array should be of shape (batch_size, x, y, z).
max_layer_shift – Maximum fraction of image size by which to shift for each layer. Should be in the interval [0, 1].
max_total_shift – Maximum fraction of image size by which to shift in total. Should be in the interval [0, 1] and more than max_layer_shift.
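The phrase "each shift is relative to previous one" describes a random walk over the z layers. The sketch below illustrates that under several assumptions that are not taken from the library: the first layer is the unshifted reference, shifts are rounded to whole pixels, the cumulative shift is clipped to the total limit, and pixels wrap around via `np.roll`. The helper name `shift_trend_sketch` is hypothetical.

```python
import numpy as np


def shift_trend_sketch(Xs, max_layer_shift=0.02, max_total_shift=0.1, rng=None):
    """Shift z layers in xy by an accumulating random offset, in place.

    Assumptions: layer 0 is the reference, integer-pixel shifts, clipping
    of the cumulative offset, and periodic wrap-around at the borders.
    """
    rng = np.random.default_rng() if rng is None else rng
    for X in Xs:
        n_x, n_y, n_z = X.shape[1], X.shape[2], X.shape[3]
        for sample in X:  # view of shape (x, y, z)
            shift = np.zeros(2)  # cumulative shift as a fraction of image size
            for iz in range(1, n_z):
                shift += rng.uniform(-max_layer_shift, max_layer_shift, size=2)
                shift = np.clip(shift, -max_total_shift, max_total_shift)
                dx = int(round(float(shift[0]) * n_x))
                dy = int(round(float(shift[1]) * n_y))
                sample[:, :, iz] = np.roll(sample[:, :, iz], (dx, dy), axis=(0, 1))


rng = np.random.default_rng(0)
X = rng.random((2, 64, 64, 10))
shift_trend_sketch([X], rng=rng)
```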
- mlspm.preprocessing.random_crop(X: List[ndarray], Y: List[ndarray], min_crop: float = 0.5, max_aspect: float = 2.0, multiple: int = 8, distribution: Literal['flat', 'exp-log'] = 'flat') → Tuple[ndarray, ndarray]
Randomly crop images in a batch to a different size and aspect ratio.
- Parameters:
X – AFM images to crop. Each array should be of shape (batch_size, x, y, z).
Y – Reference image descriptors to crop. Each array should be of shape (batch, x, y).
min_crop – Minimum crop size as a fraction of the original size.
max_aspect – Maximum aspect ratio for crop. Cannot be more than 1/min_crop.
multiple – The crop size is rounded down to the specified integer multiple.
distribution – ‘flat’ or ‘exp-log’. How aspect ratios are distributed. If ‘flat’, then the distribution is random uniform between (1, max_aspect) and is flipped half of the time. If ‘exp-log’, then the distribution is exp of log of uniform distribution over (1/max_aspect, max_aspect). ‘exp-log’ is more biased towards square aspect ratios.
- Returns:
Tuple (X, Y), where
X - Batch of cropped AFM images of shape (batch, x_new, y_new, z).
Y - Batch of cropped reference image descriptors of shape (batch, x_new, y_new).
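The two aspect-ratio distributions and the rounding to `multiple` can be sketched as below. The helpers `sample_aspect_sketch` and `round_down_sketch` are hypothetical and do not perform any cropping. The sketch reads "flipped" as inverting the ratio, and reads ‘exp-log’ as the exponential of a value drawn uniformly between log(1/max_aspect) and log(max_aspect). Both readings are interpretations of the description above, not confirmed library behavior.

```python
import numpy as np


def sample_aspect_sketch(max_aspect=2.0, distribution="flat", rng=None):
    """Draw an aspect ratio in [1/max_aspect, max_aspect]."""
    rng = np.random.default_rng() if rng is None else rng
    if distribution == "flat":
        aspect = rng.uniform(1.0, max_aspect)
        if rng.random() < 0.5:  # invert half of the time
            aspect = 1.0 / aspect
    elif distribution == "exp-log":
        log_max = np.log(max_aspect)
        aspect = float(np.exp(rng.uniform(-log_max, log_max)))  # log-uniform
    else:
        raise ValueError(f"Unknown distribution: {distribution!r}")
    return aspect


def round_down_sketch(n_pixels, multiple=8):
    """Round a pixel count down to the nearest integer multiple."""
    return n_pixels - n_pixels % multiple


rng = np.random.default_rng(0)
aspect = sample_aspect_sketch(2.0, "exp-log", rng)
print(round_down_sketch(100, 8))  # 96
```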
- mlspm.preprocessing.top_atom_to_zero(xyzs: list[ndarray])
Set the z coordinate of the highest atom in each molecule to 0. In-place operation.
- Parameters:
xyzs – Molecule arrays to modify. Each array should be of shape (num_atoms, :), such that the first three elements on the second axis are the xyz coordinates.
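As a minimal sketch of the operation described, the following NumPy snippet subtracts the largest z coordinate from the z column of each molecule. The helper name `top_atom_to_zero_sketch` and the example values in the fourth column are made up for illustration.

```python
import numpy as np


def top_atom_to_zero_sketch(xyzs):
    """Shift z so that the highest atom of each molecule sits at z = 0, in place."""
    for xyz in xyzs:
        xyz[:, 2] -= xyz[:, 2].max()  # column 2 holds the z coordinates


# Two atoms; the fourth column stands in for extra per-atom data
mol = np.array([[0.0, 0.0, 1.5, 6.0], [1.0, 0.0, -0.5, 1.0]])
top_atom_to_zero_sketch([mol])
print(mol[:, 2])  # [ 0. -2.]
```

All other columns are left untouched, so any extra per-atom data stored after the coordinates is preserved.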