mlspm.preprocessing

mlspm.preprocessing.add_cutout(Xs: list[ndarray], n_holes: int = 5)

Randomly add cutouts (rectangular patches of zeros) to images. In-place operation.

Parameters:
  • Xs – Arrays to add cutouts to. Each array should be of shape (batch_size, x, y, z).

  • n_holes – Maximum number of cutouts to add.
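
As an illustration of what an in-place cutout augmentation can look like, here is a minimal NumPy sketch. It is not the library's implementation: the function name add_cutout_sketch, the rng argument, the patch-size rule, and the choice to zero the same patch in every z slice are all assumptions made for the example.

```python
import numpy as np


def add_cutout_sketch(Xs, n_holes=5, rng=None):
    """Zero out up to n_holes random rectangles per sample, in place."""
    rng = np.random.default_rng() if rng is None else rng
    for X in Xs:
        nx, ny = X.shape[1:3]
        for sample in X:  # each sample is a view of shape (x, y, z)
            n = int(rng.integers(0, n_holes + 1))  # anywhere from 0 to n_holes
            for _ in range(n):
                # Assumed size rule: each side is at most a quarter of the image.
                w = int(rng.integers(1, max(2, nx // 4 + 1)))
                h = int(rng.integers(1, max(2, ny // 4 + 1)))
                x0 = int(rng.integers(0, nx - w + 1))
                y0 = int(rng.integers(0, ny - h + 1))
                sample[x0:x0 + w, y0:y0 + h, :] = 0.0  # writes through to X


X = np.ones((2, 16, 16, 3))
add_cutout_sketch([X], n_holes=5, rng=np.random.default_rng(0))
print(int((X == 0).sum()), "elements were zeroed")
```

Because the arrays are modified in place, nothing is returned; pass copies if the originals are still needed.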

mlspm.preprocessing.add_gradient(Xs: list[ndarray], c: float = 0.3)

Add a constant gradient plane with random direction to arrays. In-place operation.

Parameters:
  • Xs – Arrays to add gradients to. Each array should be of shape (batch_size, x, y, z).

  • c – Maximum range of gradient plane as a fraction of the range of the array values.
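
The idea of a gradient plane with random direction can be sketched in NumPy as follows. This is an illustrative reading of the description above, not the library's code: the name add_gradient_sketch, the rng argument, drawing the plane's range uniformly from [0, c], and adding the same plane to every z slice are assumptions.

```python
import numpy as np


def add_gradient_sketch(Xs, c=0.3, rng=None):
    """Add a randomly oriented linear plane to each sample, in place."""
    rng = np.random.default_rng() if rng is None else rng
    for X in Xs:  # X must have a floating-point dtype
        nx, ny = X.shape[1:3]
        x = np.linspace(-0.5, 0.5, nx)[:, None]
        y = np.linspace(-0.5, 0.5, ny)[None, :]
        for sample in X:  # view of shape (x, y, z)
            angle = rng.uniform(0.0, 2.0 * np.pi)  # random in-plane direction
            plane = np.cos(angle) * x + np.sin(angle) * y
            plane /= plane.max() - plane.min()  # rescale to a range of 1
            # Assumed: plane range is uniform in [0, c] times the sample range.
            amplitude = rng.uniform(0.0, c) * (sample.max() - sample.min())
            sample += (amplitude * plane)[:, :, None]


X = np.random.default_rng(0).random((2, 8, 8, 3))
add_gradient_sketch([X], c=0.3, rng=np.random.default_rng(1))
```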

mlspm.preprocessing.add_noise(Xs: list[ndarray], c: float = 0.1, randomize_amplitude: bool = False, normal_amplitude: bool = False)

Add uniform random noise to arrays. In-place operation.

Parameters:
  • Xs – Arrays to add noise to. Each array should be of shape (batch_size, ...).

  • c – Noise amplitude, as a fraction of the (max - min) range of each sample.

  • randomize_amplitude – If True, noise amplitude is uniform random in [0, c] for each sample in the batch.

  • normal_amplitude – If True and randomize_amplitude==True, then instead of uniform, the noise amplitude is distributed like the absolute value of a normally distributed variable with zero mean and standard deviation equal to c.
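
The three amplitude modes described above can be sketched as follows. This is a hedged illustration rather than the library's code: the name add_noise_sketch, the rng argument, and the choice to centre the noise on zero are assumptions.

```python
import numpy as np


def add_noise_sketch(Xs, c=0.1, randomize_amplitude=False,
                     normal_amplitude=False, rng=None):
    """Add uniform noise scaled by each sample's value range, in place."""
    rng = np.random.default_rng() if rng is None else rng
    for X in Xs:  # X must have a floating-point dtype and at least 2 axes
        for sample in X:  # view of one batch item
            if randomize_amplitude and normal_amplitude:
                amp = abs(rng.normal(0.0, c))  # half-normal with scale c
            elif randomize_amplitude:
                amp = rng.uniform(0.0, c)  # uniform in [0, c]
            else:
                amp = c  # fixed amplitude
            scale = amp * (sample.max() - sample.min())
            # Assumed: noise is uniform in [-0.5, 0.5) * scale.
            sample += scale * (rng.random(sample.shape) - 0.5)


X = np.linspace(0.0, 1.0, 2 * 4 * 4 * 3).reshape(2, 4, 4, 3)
add_noise_sketch([X], c=0.1, randomize_amplitude=True)
```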

mlspm.preprocessing.add_norm(Xs: list[ndarray], per_layer: bool = True)

Normalize arrays by subtracting the mean and dividing by the standard deviation. In-place operation.

Parameters:
  • Xs – Arrays to normalize. Each array should be of shape (batch_size, ...).

  • per_layer – If True, normalize separately for each index along the last axis of each array.
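
To make the effect of per_layer concrete, here is a small NumPy sketch. It is not the library's implementation: the name add_norm_sketch is hypothetical, and computing the statistics separately for each batch item is an assumption.

```python
import numpy as np


def add_norm_sketch(Xs, per_layer=True):
    """Standardize each sample to zero mean and unit std, in place."""
    for X in Xs:  # X must have a floating-point dtype
        # Reduce over all axes except the batch axis and, when per_layer
        # is True, the last axis.
        last = X.ndim - 1 if per_layer else X.ndim
        axes = tuple(range(1, last))
        mean = X.mean(axis=axes, keepdims=True)
        std = X.std(axis=axes, keepdims=True)  # a constant layer gives std 0
        X -= mean
        X /= std


X = np.random.default_rng(0).normal(3.0, 2.0, size=(2, 8, 8, 3))
add_norm_sketch([X], per_layer=True)
print(np.allclose(X.mean(axis=(1, 2)), 0.0))  # True
```

With per_layer=False the same sketch reduces over all non-batch axes, so only the statistics of the whole sample are standardized.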

mlspm.preprocessing.add_rotation_reflection(X: List[ndarray], Y: List[ndarray], reflections: bool = True, multiple: int = 2, crop: Tuple[int] | None = None, per_batch_item: bool = False) -> Tuple[ndarray, ndarray]

Augment batch with random rotations and reflections.

Parameters:
  • X – AFM images to augment. Each array should be of shape (batch_size, x, y, z).

  • Y – Reference image descriptors to augment. Each array should be of shape (batch, x, y).

  • reflections – Whether to augment with reflections. If True, each rotation is randomly reflected with 50% probability.

  • multiple – Multiplier for how many rotations to generate for every sample.

  • crop – If not None, then output batch is cropped to specified size (x_size, y_size) in the middle of the image.

  • per_batch_item – If True, the rotation is randomized separately for each batch item; otherwise the same rotation is applied to all items.

Returns:

Tuple (X, Y), where

  • X - Batch of rotation-augmented AFM images of shape (batch*multiple, x_new, y_new, z).

  • Y - Batch of rotation-augmented reference image descriptors of shape (batch*multiple, x_new, y_new).
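
The sketch below illustrates two points from the description: the image and its descriptor must receive the same transformation, and the batch grows by a factor of multiple. It is deliberately simplified and is not the library's method: it takes a single array for X and for Y rather than lists, rotates only by multiples of 90 degrees with numpy.rot90, assumes square images, and omits crop and per_batch_item. The name rot90_reflect_sketch and the rng argument are hypothetical.

```python
import numpy as np


def rot90_reflect_sketch(X, Y, reflections=True, multiple=2, rng=None):
    """Return `multiple` rotated (and possibly mirrored) copies of a batch."""
    rng = np.random.default_rng() if rng is None else rng
    X_out, Y_out = [], []
    for _ in range(multiple):
        k = int(rng.integers(0, 4))  # number of quarter turns
        flip = reflections and rng.random() < 0.5  # mirror half of the time
        Xr = np.rot90(X, k=k, axes=(1, 2))  # rotate in the xy plane
        Yr = np.rot90(Y, k=k, axes=(1, 2))  # same rotation for descriptors
        if flip:
            Xr, Yr = Xr[:, ::-1], Yr[:, ::-1]  # mirror along x
        X_out.append(Xr)
        Y_out.append(Yr)
    return np.concatenate(X_out, axis=0), np.concatenate(Y_out, axis=0)


X = np.zeros((2, 4, 4, 3))
Y = np.zeros((2, 4, 4))
X_aug, Y_aug = rot90_reflect_sketch(X, Y, multiple=2)
print(X_aug.shape, Y_aug.shape)  # (4, 4, 4, 3) (4, 4, 4)
```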

mlspm.preprocessing.interpolate_and_crop(Xs: list[ndarray], real_dim: Tuple[int, int], target_res: float = 0.125, target_multiple: int = 8) -> list[ndarray]

Interpolate a batch of AFM images to target resolution and crop to a target multiple of pixels in the xy plane.

Parameters:
  • Xs – AFM images to interpolate and crop. Each array should be of shape (batch_size, x, y, z).

  • real_dim – Real-space size of AFM image region in x- and y-directions in ångströms.

  • target_res – Target size for a pixel in ångströms.

  • target_multiple – The number of pixels of the output image along x and y is cropped to a multiple of this value.

Returns:

Interpolated and cropped AFM images.
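
The relation between real_dim, target_res and target_multiple can be illustrated with a short shape calculation. This is only a sketch of the arithmetic under the assumption that the pixel count is rounded down at both steps; the helper target_xy_shape_sketch is hypothetical and performs no interpolation.

```python
def target_xy_shape_sketch(real_dim, target_res=0.125, target_multiple=8):
    """Pixel counts after resampling to target_res and cropping to a multiple."""
    shape = []
    for length in real_dim:
        n_pix = int(length // target_res)  # pixels at the target resolution
        shape.append(n_pix - n_pix % target_multiple)  # drop the remainder
    return tuple(shape)


# A 16.0 x 12.5 ångström region at 0.125 ångström per pixel is 128 x 100
# pixels, which is cropped to 128 x 96 for a multiple of 8.
print(target_xy_shape_sketch((16.0, 12.5)))  # (128, 96)
```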

mlspm.preprocessing.minimum_to_zero(Ys: List[ndarray])

Shift values in arrays such that minimum is at zero. In-place operation.

Parameters:

Ys – Arrays of shape (batch_size, …).
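
A minimal sketch of the shift, assuming the minimum is taken separately for each batch item (the description does not say whether it is per item or per array). The name minimum_to_zero_sketch is hypothetical.

```python
import numpy as np


def minimum_to_zero_sketch(Ys):
    """Subtract each batch item's minimum so that it becomes zero, in place."""
    for Y in Ys:
        axes = tuple(range(1, Y.ndim))  # all axes except the batch axis
        Y -= Y.min(axis=axes, keepdims=True)


Y = np.array([[[1.0, 2.0], [3.0, 4.0]],
              [[-5.0, 0.0], [5.0, 10.0]]])
minimum_to_zero_sketch([Y])
print(Y.min(axis=(1, 2)))  # [0. 0.]
```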

mlspm.preprocessing.rand_shift_xy_trend(Xs: list[ndarray], max_layer_shift: float = 0.02, max_total_shift: float = 0.1)

Randomly shift z-layers in the xy-plane. Each shift is relative to previous one. In-place operation.

Parameters:
  • Xs – Arrays to shift. Each array should be of shape (batch_size, x, y, z).

  • max_layer_shift – Maximum fraction of image size by which to shift for each layer. Should be in the interval [0, 1].

  • max_total_shift – Maximum fraction of image size by which to shift in total. Should be in the interval [0, 1] and larger than max_layer_shift.
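
The interplay between the per-layer limit and the total limit can be sketched as a clipped random walk. This only generates the shift values and does not move any pixels, and it is an assumption about how such shifts might be drawn, not the library's algorithm. The name layer_shifts_sketch and the rng argument are hypothetical.

```python
import numpy as np


def layer_shifts_sketch(n_layers, max_layer_shift=0.02, max_total_shift=0.1,
                        rng=None):
    """Cumulative (x, y) shift per z-layer, as fractions of the image size."""
    rng = np.random.default_rng() if rng is None else rng
    # Each layer moves by a small random step relative to the previous one.
    steps = rng.uniform(-max_layer_shift, max_layer_shift, size=(n_layers, 2))
    steps[0] = 0.0  # the first layer serves as the reference
    shifts = np.cumsum(steps, axis=0)
    # Clipping caps the total drift and never enlarges a step between layers.
    return np.clip(shifts, -max_total_shift, max_total_shift)


shifts = layer_shifts_sketch(10, rng=np.random.default_rng(0))
print(shifts.shape)  # (10, 2)
```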

mlspm.preprocessing.random_crop(X: List[ndarray], Y: List[ndarray], min_crop: float = 0.5, max_aspect: float = 2.0, multiple: int = 8, distribution: Literal['flat', 'exp-log'] = 'flat') -> Tuple[ndarray, ndarray]

Randomly crop images in a batch to a different size and aspect ratio.

Parameters:
  • X – AFM images to crop. Each array should be of shape (batch_size, x, y, z).

  • Y – Reference image descriptors to crop. Each array should be of shape (batch, x, y).

  • min_crop – Minimum crop size as a fraction of the original size.

  • max_aspect – Maximum aspect ratio for crop. Cannot be more than 1/min_crop.

  • multiple – The crop size is rounded down to the specified integer multiple.

  • distribution – ‘flat’ or ‘exp-log’. How aspect ratios are distributed. If ‘flat’, the aspect ratio is drawn uniformly from (1, max_aspect) and flipped half of the time. If ‘exp-log’, the aspect ratio is the exponential of a value drawn uniformly from (log(1/max_aspect), log(max_aspect)). ‘exp-log’ is more biased towards square aspect ratios.

Returns:

Tuple (X, Y), where

  • X - Batch of cropped AFM images of shape (batch, x_new, y_new, z).

  • Y - Batch of cropped reference image descriptors of shape (batch, x_new, y_new).
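
The two aspect-ratio distributions can be compared with a short sampling sketch. It reads ‘flipped’ as taking the reciprocal and ‘exp-log’ as uniform sampling in log space; both readings are assumptions, as are the name sample_aspect_sketch and the rng argument. No cropping is performed here.

```python
import numpy as np


def sample_aspect_sketch(max_aspect=2.0, distribution="flat", rng=None):
    """Draw one aspect ratio in [1/max_aspect, max_aspect]."""
    rng = np.random.default_rng() if rng is None else rng
    if distribution == "flat":
        aspect = rng.uniform(1.0, max_aspect)
        if rng.random() < 0.5:
            aspect = 1.0 / aspect  # flip half of the time
    elif distribution == "exp-log":
        # Uniform in log space, so ratios r and 1/r are equally likely.
        log_a = np.log(max_aspect)
        aspect = np.exp(rng.uniform(-log_a, log_a))
    else:
        raise ValueError(f"Unknown distribution: {distribution!r}")
    return float(aspect)


rng = np.random.default_rng(0)
flat = [sample_aspect_sketch(2.0, "flat", rng) for _ in range(5)]
explog = [sample_aspect_sketch(2.0, "exp-log", rng) for _ in range(5)]
```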

mlspm.preprocessing.top_atom_to_zero(xyzs: list[ndarray])

Set the z coordinate of the highest atom in each molecule to 0. In-place operation.

Parameters:

xyzs – Molecule arrays to modify. Each array should be of shape (num_atoms, :), such that the first three elements on the second axis are the xyz coordinates.
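
A minimal sketch of this operation, assuming that "highest" means the largest z value and that the z coordinate is stored at index 2 of the second axis, as the shape description implies. The name top_atom_to_zero_sketch is hypothetical.

```python
import numpy as np


def top_atom_to_zero_sketch(xyzs):
    """Shift each molecule along z so that its highest atom sits at z = 0."""
    for xyz in xyzs:
        xyz[:, 2] -= xyz[:, 2].max()  # in place; extra columns are untouched


# Two atoms with columns (x, y, z, element); the element column is illustrative.
mol = np.array([[0.0, 0.0, 1.5, 6.0],
                [1.0, 0.0, 0.5, 1.0]])
top_atom_to_zero_sketch([mol])
print(mol[:, 2])  # [ 0. -1.]
```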