birl.utilities.dataset module

Some functionality related to datasets.

Copyright (C) 2016-2019 Jiri Borovec <jiri.borovec@fel.cvut.cz>

birl.utilities.dataset.args_expand_images(parser, nb_workers=1, overwrite=True)
Expand the parser with standard parameters related to images:
  • image paths

  • allow overwrite (optional)

  • number of jobs

Parameters
  • parser (obj) – existing parser

  • nb_workers (int) – default number of threads

  • overwrite (bool) – allow overwriting images

Return obj

>>> import argparse
>>> args_expand_images(argparse.ArgumentParser())  
ArgumentParser(...)
birl.utilities.dataset.args_expand_parse_images(parser, nb_workers=1, overwrite=True)
Expand the parser with standard parameters related to images and parse the arguments into a dictionary:
  • image paths

  • allow overwrite (optional)

  • number of jobs

Parameters
  • parser (obj) – existing parser

  • nb_workers (int) – default number of threads

  • overwrite (bool) – allow overwriting images

Return dict

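A minimal sketch of such a parser expansion with the standard argparse module, followed by parsing into a dictionary in the spirit of args_expand_parse_images. The helper expand_image_args and the flag names (-i/--path_images, --overwrite, --nb_workers) are hypothetical and chosen only for illustration; the flags actually defined by the library may differ.

```python
import argparse


def expand_image_args(parser, nb_workers=1, overwrite=True):
    """Hypothetical stand-in for args_expand_images; flag names are illustrative."""
    parser.add_argument('-i', '--path_images', type=str, required=True,
                        help='path (or pattern) of the input images')
    if overwrite:
        # the overwrite switch is only added on request
        parser.add_argument('--overwrite', action='store_true',
                            help='allow overwriting existing images')
    parser.add_argument('--nb_workers', type=int, default=nb_workers,
                        help='number of parallel workers')
    return parser


# parse into a plain dict
args = vars(expand_image_args(argparse.ArgumentParser()).parse_args(['-i', 'imgs/*.png']))
print(args)  # {'path_images': 'imgs/*.png', 'overwrite': False, 'nb_workers': 1}
```

Returning the parser makes the expansion chainable, so several such helpers can extend one parser before a single parse_args call.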
birl.utilities.dataset.common_landmarks(points1, points2, threshold=1.5)

Find common landmarks in two point sets.

Parameters
  • points1 (ndarray|list(list(float))) – first point set

  • points2 (ndarray|list(list(float))) – second point set

  • threshold (float) – maximal distance for an assignment (in pixels for landmarks)

Return ndarray

pairs of indices of matching points, one row per pair (index in points1, index in points2)

>>> np.random.seed(0)
>>> common = np.random.random((5, 2))
>>> pts1 = np.vstack([common, np.random.random((10, 2))])
>>> pts2 = np.vstack([common, np.random.random((15, 2))])
>>> common_landmarks(pts1, pts2, threshold=1e-3)
array([[0, 0],
       [1, 1],
       [2, 2],
       [3, 3],
       [4, 4]])
>>> np.random.shuffle(pts2)
>>> common_landmarks(pts1, pts2, threshold=1e-3)
array([[ 0, 13],
       [ 1, 10],
       [ 2,  9],
       [ 3, 14],
       [ 4,  8]])
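Index pairs like the ones above can be obtained by mutual nearest-neighbour matching under a distance threshold. The sketch below assumes that rule; the library may assign pairs differently (for example with an optimal assignment), and common_landmarks_sketch is a hypothetical name.

```python
import numpy as np


def common_landmarks_sketch(points1, points2, threshold=1.5):
    """Pair points that are mutual nearest neighbours closer than `threshold`."""
    pts1 = np.asarray(points1, dtype=float)
    pts2 = np.asarray(points2, dtype=float)
    # pairwise Euclidean distances, shape (len(pts1), len(pts2))
    dist = np.linalg.norm(pts1[:, None, :] - pts2[None, :, :], axis=-1)
    pairs = []
    for i, j in enumerate(np.argmin(dist, axis=1)):
        # keep the pair only if it is close enough and the match is mutual
        if dist[i, j] < threshold and np.argmin(dist[:, j]) == i:
            pairs.append((i, int(j)))
    return np.array(pairs, dtype=int).reshape(-1, 2)


pairs = common_landmarks_sketch([[0, 0], [10, 10], [20, 20]],
                                [[10.5, 10], [50, 50], [0, 0.5]])
print(pairs.tolist())  # [[0, 2], [1, 0]]
```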
birl.utilities.dataset.compute_bounding_polygon(landmarks)

Get the polygon that encloses all the points.

Parameters

landmarks (ndarray) – set of points

Return list(list(float))

points of the polygon

>>> np.random.seed(0)
>>> points = np.random.randint(1, 9, (45, 2))
>>> compute_bounding_polygon(points)  
[[1, 2], [2, 4], [1, 5], [2, 8], [7, 8], [8, 7], [8, 1], [3, 1], [3, 2]]
birl.utilities.dataset.compute_convex_hull(landmarks)

Compute the convex hull around the landmarks.

Parameters

landmarks (ndarray) – set of points

Return ndarray

points of the polygon

>>> np.random.seed(0)
>>> pts = np.random.randint(15, 30, (10, 2))
>>> compute_convex_hull(pts)
array([[27, 20],
       [27, 25],
       [22, 24],
       [16, 21],
       [15, 18],
       [26, 18]])
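For reference, the convex hull of 2D points can be computed with Andrew's monotone chain algorithm, sketched below in pure Python. This is not necessarily how compute_convex_hull works internally (it may rely on a library routine); the sketch returns the hull vertices counter-clockwise (with the y axis pointing up), possibly starting at a different vertex than the doctest above. convex_hull_sketch is a hypothetical name.

```python
def convex_hull_sketch(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) <= 2:
        return [list(p) for p in pts]

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); > 0 means a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # the last point of each chain is the first point of the other one
    return [list(p) for p in lower[:-1] + upper[:-1]]


print(convex_hull_sketch([[0, 0], [2, 0], [2, 2], [0, 2], [1, 1]]))
# [[0, 0], [2, 0], [2, 2], [0, 2]]
```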
birl.utilities.dataset.compute_half_polygon(landmarks, idx_start=0, idx_end=-1)

Compute the half of the polygon path between the starting and the ending point.

Parameters
  • landmarks (ndarray) – set of points

  • idx_start (int) – index of the starting point

  • idx_end (int) – index of the ending point

Return list(list(float))

set of points

>>> pts = [(-1, 1), (0, 0), (0, 2), (1, 1), (1, -0.5), (2, 0)]
>>> compute_half_polygon(pts, idx_start=0, idx_end=-1)
[[-1.0, 1.0], [0.0, 2.0], [1.0, 1.0], [2.0, 0.0]]
>>> compute_half_polygon(pts[:2], idx_start=-1, idx_end=0)
[[-1, 1], [0, 0]]
>>> pts = [[0, 2], [1, 5], [2, 4], [2, 5], [4, 4], [4, 6], [4, 8], [5, 8], [5, 8]]
>>> compute_half_polygon(pts)
[[0, 2], [1, 5], [2, 5], [4, 6], [4, 8], [5, 8]]
birl.utilities.dataset.convert_landmarks_from_itk(lnds, image_size)

Convert landmarks from the ITK format to the format used in ImageJ.

Parameters
  • lnds (ndarray) – landmarks

  • image_size ((int,int)) – image size as (height, width)

Return ndarray

landmarks

>>> convert_landmarks_from_itk([[ 20, 145], [150,  50], [100, 150]], (150, 200))
array([[  5,  20],
       [100, 150],
       [  0, 100]])
>>> lnds = [[ 20, 145], [150,  50], [100, 150], [0, 0], [150, 200]]
>>> img_size = (150, 200)
>>> lnds2 = convert_landmarks_from_itk(convert_landmarks_to_itk(lnds, img_size), img_size)
>>> np.array_equal(lnds, lnds2)
True
birl.utilities.dataset.convert_landmarks_to_itk(lnds, image_size)

Convert landmarks from the format used in ImageJ to the ITK format.

Parameters
  • lnds (ndarray) – landmarks

  • image_size ((int,int)) – image size as (height, width)

Return ndarray

landmarks

>>> convert_landmarks_to_itk([[5, 20], [100, 150], [0, 100]], (150, 200))
array([[ 20, 145],
       [150,  50],
       [100, 150]])
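Judging from the doctests above, the conversion swaps the two coordinates and flips one of them against the image height. The sketch below reproduces those examples under that assumption; landmarks_to_itk and landmarks_from_itk are hypothetical names, and the library implementation may handle more cases.

```python
import numpy as np


def landmarks_to_itk(lnds, image_size):
    """Assumed mapping: (a, b) -> (b, height - a)."""
    lnds = np.asarray(lnds)
    height = image_size[0]
    return np.column_stack([lnds[:, 1], height - lnds[:, 0]])


def landmarks_from_itk(lnds, image_size):
    """Inverse of landmarks_to_itk: (x, y) -> (height - y, x)."""
    lnds = np.asarray(lnds)
    height = image_size[0]
    return np.column_stack([height - lnds[:, 1], lnds[:, 0]])


print(landmarks_to_itk([[5, 20], [100, 150], [0, 100]], (150, 200)).tolist())
# [[20, 145], [150, 50], [100, 150]]
```

Because each function undoes the other, a round trip returns the original landmarks, as the second doctest of convert_landmarks_from_itk checks.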
birl.utilities.dataset.detect_binary_blocks(vec_bin)

Detect binary objects in a 1D signal by their beginning, end and length.

Parameters

vec_bin (list(bool)) – binary vector with 1 for an object

Return tuple(list(int),list(int),list(int))

beginnings, ends and lengths of the objects

>>> vec = np.array([1] * 15 + [0] * 5 + [1] * 20)
>>> detect_binary_blocks(vec)
([0, 20], [15, 39], [14, 19])
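Block detection can be sketched with np.diff on a zero-padded vector. This sketch uses half-open intervals (each end is one past the last element of the block), so its ends and lengths follow a different convention from the doctest above; binary_runs is a hypothetical helper, not the library function.

```python
import numpy as np


def binary_runs(vec_bin):
    """Return (begins, ends, lengths) of runs of truthy values; ends are exclusive."""
    vec = np.asarray(vec_bin, dtype=bool).astype(int)
    # pad with zeros so that runs touching the borders also produce edges
    edges = np.diff(np.concatenate(([0], vec, [0])))
    begins = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return begins.tolist(), ends.tolist(), (ends - begins).tolist()


print(binary_runs([1] * 15 + [0] * 5 + [1] * 20))  # ([0, 20], [15, 40], [15, 20])
```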
birl.utilities.dataset.estimate_scaling(images, max_size=5000)

Find the scaling factor for a given set of images so that no image exceeds the maximal size.

Parameters
  • images (list(ndarray)) – input images

  • max_size (float) – maximal image size in any dimension

Return float

scaling factor in the range (0, 1]

>>> estimate_scaling([np.zeros((12000, 300, 3))])  
0.4...
>>> estimate_scaling([np.zeros((1200, 800, 3))])
1.0
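The scaling rule can be sketched as the ratio between max_size and the largest spatial dimension over all images, capped at 1 so that small images are never upscaled. This is an assumption based on the description and the doctests; the library may, for example, round the factor. estimate_scaling_sketch is a hypothetical name.

```python
import numpy as np


def estimate_scaling_sketch(images, max_size=5000):
    """Scale factor so that the largest image side does not exceed max_size."""
    sizes = [max(img.shape[:2]) for img in images if img is not None]
    if not sizes:
        return 1.0
    # cap at 1.0 so that small images are never upscaled
    return min(1.0, float(max_size) / max(sizes))


print(estimate_scaling_sketch([np.zeros((12000, 30), dtype=np.uint8)]))  # about 0.4167
print(estimate_scaling_sketch([np.zeros((1200, 800), dtype=np.uint8)]))  # 1.0
```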
birl.utilities.dataset.find_largest_object(hist, threshold=0.01)

Find the largest object and return its beginning and end.

Parameters
  • hist (list(float)) – input vector

  • threshold (float) – threshold for the input vector

Return tuple(int,int)

>>> vec = np.array([1] * 15 + [0] * 5 + [1] * 20)
>>> find_largest_object(vec)
(20, 39)
birl.utilities.dataset.find_split_objects(hist, nb_objects=2, threshold=0.01)

Find the N largest objects and place the splits in the middle of the gaps between them.

Parameters
  • hist (list(float)) – input vector

  • nb_objects (int) – number of desired objects

  • threshold (float) – threshold for the input vector

Return list(int)

positions of the splits

>>> vec = np.array([1] * 15 + [0] * 5 + [1] * 20)
>>> find_split_objects(vec)
[17]
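A sketch of the splitting rule: binarise the profile, keep the nb_objects longest runs and place a split halfway across each gap between consecutive kept runs. It assumes an absolute threshold on the profile values (the library may normalise the profile first); split_positions is a hypothetical name.

```python
import numpy as np


def split_positions(hist, nb_objects=2, threshold=0.01):
    """Split points halfway between the `nb_objects` longest runs above `threshold`."""
    vec = (np.asarray(hist, dtype=float) > threshold).astype(int)
    edges = np.diff(np.concatenate(([0], vec, [0])))
    begins = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)  # exclusive ends
    # indices of the longest runs, restored to left-to-right order
    keep = np.sort(np.argsort(ends - begins)[::-1][:nb_objects])
    # each split lies in the middle of the gap between two consecutive runs
    return [int((ends[i] + begins[j]) // 2) for i, j in zip(keep[:-1], keep[1:])]


print(split_positions(np.array([1] * 15 + [0] * 5 + [1] * 20)))  # [17]
```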
birl.utilities.dataset.generate_pairing(count, step_hide=None)

Generate registration pairs, optionally hiding the landmarks of some samples.

Parameters
  • count (int) – total number of samples

  • step_hide (int|None) – hide the landmarks of every N-th sample

Return list((int, int)), list(bool)

registration pairs, and for each pair a flag that is False if it involves a sample with hidden landmarks

>>> generate_pairing(4, None)  
([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)],
 [True, True, True, True, True, True])
>>> generate_pairing(4, step_hide=3)  
([(0, 1), (0, 2), (1, 2), (3, 1), (3, 2)],
 [False, False, True, False, False])
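The doctest outputs above are consistent with the following rule: every step_hide-th sample (starting from 0) has hidden landmarks, pairs of two hidden samples are dropped, the hidden sample is put first, and a pair is flagged True only if neither sample is hidden. The sketch below reproduces both doctests under that reading; it is an inference, not the library source, and pairing_sketch is a hypothetical name.

```python
from itertools import combinations


def pairing_sketch(count, step_hide=None):
    """Registration pairs plus flags telling whether both samples are public."""
    # samples with hidden landmarks: every `step_hide`-th one, starting at 0 (assumed)
    hidden = set(range(0, count, step_hide)) if step_hide else set()
    pairs, public = [], []
    for i, j in combinations(range(count), 2):
        if i in hidden and j in hidden:
            continue  # nothing to evaluate when both sides are hidden
        if j in hidden:
            i, j = j, i  # put the hidden sample first
        pairs.append((i, j))
        public.append(i not in hidden and j not in hidden)
    return pairs, public


print(pairing_sketch(4, step_hide=3))
# ([(0, 1), (0, 2), (1, 2), (3, 1), (3, 2)], [False, False, True, False, False])
```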
birl.utilities.dataset.get_close_diag_corners(points)

Find the points closest to the top-left and bottom-right corners.

Parameters

points (ndarray) – set of points

Return tuple(ndarray,ndarray,tuple(int,int))

begin and end of the imaginary diagonal, and their indices in points

>>> np.random.seed(0)
>>> points = np.random.randint(1, 9, (20, 2))
>>> get_close_diag_corners(points)
(array([1, 2]), array([7, 8]), (12, 10))
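One possible reading of this function: take the corners of the bounding box of the points and pick the point nearest to each of them. The sketch below assumes exactly that and does not claim to reproduce the doctest values above; closest_diag_corners is a hypothetical name.

```python
import numpy as np


def closest_diag_corners(points):
    """Points nearest to the min and max corners of the bounding box, with indices."""
    pts = np.asarray(points)
    # corners of the axis-aligned bounding box of the point set
    corner_tl, corner_br = pts.min(axis=0), pts.max(axis=0)
    # index of the point nearest to each corner (Euclidean distance)
    idx_tl = int(np.argmin(np.linalg.norm(pts - corner_tl, axis=1)))
    idx_br = int(np.argmin(np.linalg.norm(pts - corner_br, axis=1)))
    return pts[idx_tl], pts[idx_br], (idx_tl, idx_br)


print(closest_diag_corners([[3, 3], [1, 1], [9, 9], [2, 8]])[2])  # (1, 2)
```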
birl.utilities.dataset.histogram_match_cumulative_cdf(source, reference, norm_img_size=1024)

Adjust the pixel values of a grayscale image so that its histogram matches that of a reference image.

Parameters
  • source (ndarray) – 2D image to be transformed, np.array<height1, width1>

  • reference (ndarray) – reference 2D image, np.array<height2, width2>

  • norm_img_size (int) – subsample the images to this maximal size

Return ndarray

transformed image, np.array<height1, width1>

>>> np.random.seed(0)
>>> img = histogram_match_cumulative_cdf(np.random.randint(128, 145, (150, 200)),
...                                      np.random.randint(0, 18, (200, 180)))
>>> img.astype(int)  
array([[13, 16,  0, ..., 12,  2,  5],
       [17,  9,  1, ..., 16,  9,  0],
       [11, 12, 14, ...,  8,  5,  4],
       ...,
       [12,  6,  3, ..., 15,  0,  3],
       [11, 17,  2, ..., 12, 12,  5],
       [ 6, 12,  3, ...,  8,  0,  1]])
>>> np.bincount(img.ravel()).astype(int)  
array([1705, 1706, 1728, 1842, 1794, 1866, 1771,    0, 1717, 1752, 1757,
       1723, 1823, 1833, 1749, 1718, 1769, 1747])
>>> img_source = np.random.randint(50, 245, (2500, 3000)).astype(float)
>>> img_source[-1, -1] = 255
>>> img = histogram_match_cumulative_cdf(img_source / 255., img)
>>> np.array(img.shape, dtype=int)
array([2500, 3000])
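The core of cumulative-CDF histogram matching can be sketched with np.unique and np.interp: each source value is mapped to the reference value found at the same quantile. This sketch omits the subsampling controlled by norm_img_size and any other details of the library implementation; match_histogram_cdf is a hypothetical name.

```python
import numpy as np


def match_histogram_cdf(source, reference):
    """Map source values onto reference values at the same CDF level."""
    src = np.asarray(source)
    ref = np.asarray(reference)
    # inverse mapping to rebuild the image, plus the count of each unique value
    _, src_inverse, src_counts = np.unique(
        src.ravel(), return_inverse=True, return_counts=True)
    ref_values, ref_counts = np.unique(ref.ravel(), return_counts=True)
    # normalised cumulative distribution functions
    src_cdf = np.cumsum(src_counts) / float(src.size)
    ref_cdf = np.cumsum(ref_counts) / float(ref.size)
    # for each source quantile, take the reference value at the same quantile
    matched = np.interp(src_cdf, ref_cdf, ref_values)
    return matched[src_inverse].reshape(src.shape)


print(match_histogram_cdf(np.array([[0, 1], [2, 3]]), np.array([[10, 20], [30, 40]])))
# [[10. 20.]
#  [30. 40.]]
```

The mapping is monotone, so the ordering of source intensities is preserved while their distribution is reshaped to follow the reference.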
birl.utilities.dataset.image_histogram_matching(source, reference, use_color='hsv', norm_img_size=4096)

Adjust the image histogram between two images.

Optionally, transform the images to a more continuous color space. The source and reference images do not need to have the same size, but each must be either RGB or grayscale.

Parameters
  • source (ndarray) – 2D image to be transformed

  • reference (ndarray) – reference 2D image

  • use_color (str) – color space used for histogram matching, see CONVERT_RGB

  • norm_img_size (int) – subsample the images to this maximal size

Return ndarray

transformed image

>>> from birl.utilities.data_io import update_path, load_image
>>> path_imgs = os.path.join(update_path('data-images'), 'rat-kidney_', 'scale-5pc')
>>> img1 = load_image(os.path.join(path_imgs, 'Rat-Kidney_HE.jpg'))
>>> img2 = load_image(os.path.join(path_imgs, 'Rat-Kidney_PanCytokeratin.jpg'))
>>> image_histogram_matching(img1, img2).shape == img1.shape
True
>>> img = image_histogram_matching(img1[..., 0], np.expand_dims(img2[..., 0], 2))
>>> img.shape == img1.shape[:2]
True
>>> # this should return unchanged source image
>>> image_histogram_matching(np.random.random((10, 20, 30, 5)),
...                          np.random.random((30, 10, 20, 5))).ndim
4
birl.utilities.dataset.inside_polygon(polygon, point)

Check whether a point lies strictly inside the polygon.

Parameters
  • polygon (ndarray|list) – polygon contour

  • point (tuple|list) – sample point

Return bool

inside

>>> poly = [[1, 1], [1, 3], [3, 3], [3, 1]]
>>> inside_polygon(poly, [0, 0])
False
>>> inside_polygon(poly, [1, 1])
False
>>> inside_polygon(poly, [2, 2])
True
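A classic way to test polygon membership is even-odd ray casting, sketched below. Unlike inside_polygon, this sketch gives no guarantee for points on the boundary (the vertex [1, 1] from the doctest is reported as inside by it), so it only approximates the documented strict test; point_in_polygon is a hypothetical name.

```python
def point_in_polygon(polygon, point):
    """Even-odd ray casting; boundary points are not handled consistently."""
    x, y = point
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        # does this edge cross the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / float(y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing to the right flips the state
    return inside


poly = [[1, 1], [1, 3], [3, 3], [3, 1]]
print(point_in_polygon(poly, [2, 2]), point_in_polygon(poly, [0, 0]))  # True False
```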
birl.utilities.dataset.is_point_above_line(point_begin, point_end, point_test)

Check whether the point lies to the left of (above) the line.

Parameters
  • point_begin (list(float)) – starting line point

  • point_end (list(float)) – ending line point

  • point_test (list(float)) – testing point

Return bool

whether the point lies to the left of the line

>>> is_point_above_line([1, 1], [2, 2], [3, 4])
True
birl.utilities.dataset.is_point_in_quadrant_left(point_begin, point_end, point_test)

Check whether the point lies in the left quadrant with respect to the line end point.

Note

A negative response does not mean that the point is on the right side.

Parameters
  • point_begin (list(float)) – starting line point

  • point_end (list(float)) – ending line point

  • point_test (list(float)) – testing point

Return int

+1 if the point is above, -1 if below, and 0 otherwise

>>> is_point_in_quadrant_left([1, 1], [3, 1], [2, 2])
1
>>> is_point_in_quadrant_left([3, 1], [1, 1], [2, 0])
1
>>> is_point_in_quadrant_left([1, 1], [3, 1], [2, 0])
-1
>>> is_point_in_quadrant_left([1, 1], [3, 1], [4, 2])
0
birl.utilities.dataset.is_point_inside_perpendicular(point_begin, point_end, point_test)

Check whether the point lies to the left of the line and, perpendicularly, within the extent of the line segment.

Note

A negative response does not mean that the point is on the right side.

Parameters
  • point_begin (list(float)) – starting line point

  • point_end (list(float)) – ending line point

  • point_test (list(float)) – testing point

Return int

+1 if the point is above, -1 if below, and 0 otherwise

>>> is_point_inside_perpendicular([1, 1], [3, 1], [2, 2])
1
>>> is_point_inside_perpendicular([1, 1], [3, 1], [2, 0])
-1
>>> is_point_inside_perpendicular([1, 1], [3, 1], [4, 2])
0
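These orientation tests can be sketched with the 2D cross product, whose sign tells on which side of the line a point lies (presumably also what is_point_above_line tests), combined with the projection of the point onto the segment. The sketch below reproduces the three doctests above; treating the segment bounds as inclusive is an assumption, the segment must not be degenerate, and side_within_segment is a hypothetical name.

```python
def side_within_segment(point_begin, point_end, point_test):
    """+1 left of the segment, -1 right of it, 0 outside its perpendicular extent."""
    dx, dy = point_end[0] - point_begin[0], point_end[1] - point_begin[1]
    vx, vy = point_test[0] - point_begin[0], point_test[1] - point_begin[1]
    # relative position of the projection along the segment: 0 at begin, 1 at end
    t = (vx * dx + vy * dy) / float(dx * dx + dy * dy)
    if not 0 <= t <= 1:
        return 0
    # z-component of the cross product: > 0 left of the line, < 0 right of it
    cross = dx * vy - dy * vx
    return int(cross > 0) - int(cross < 0)


print([side_within_segment([1, 1], [3, 1], p) for p in ([2, 2], [2, 0], [4, 2])])
# [1, -1, 0]
```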
birl.utilities.dataset.line_angle_2d(point_begin, point_end, deg=True)

Compute the direction of the line through two given points.

The zero angle is horizontal, in the direction [1, 0].

Parameters
  • point_begin (list(float)) – starting line point

  • point_end (list(float)) – ending line point

  • deg (bool) – return angle in degrees

Return float

orientation

>>> [line_angle_2d([0, 0], p) for p in ((1, 0), (0, 1), (-1, 0), (0, -1))]
[0.0, 90.0, 180.0, -90.0]
>>> line_angle_2d([1, 1], [2, 3])  
63.43...
>>> line_angle_2d([1, 2], [-2, -3])  
-120.96...
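The orientation corresponds to math.atan2 applied to the difference vector, converted to degrees when requested. A minimal sketch, with line_angle as a hypothetical name:

```python
import math


def line_angle(point_begin, point_end, deg=True):
    """Direction of the line from point_begin to point_end."""
    # atan2(dy, dx) is 0 for the direction [1, 0] and grows counter-clockwise
    angle = math.atan2(point_end[1] - point_begin[1], point_end[0] - point_begin[0])
    return math.degrees(angle) if deg else angle


print(round(line_angle([1, 1], [2, 3]), 2))     # 63.43
print(round(line_angle([1, 2], [-2, -3]), 2))   # -120.96
```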
birl.utilities.dataset.list_sub_folders(path_folder, name='*')

List all subfolders matching a particular name pattern.

Parameters
  • path_folder (str) – path to a particular folder

  • name (str) – name pattern

Return list(str)

folders

>>> from birl.utilities.data_io import update_path
>>> paths = list_sub_folders(update_path('data-images'))
>>> list(map(os.path.basename, paths))  
['images', 'landmarks', 'lesions_', 'rat-kidney_'...]
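A sketch with the standard glob module, assuming the name pattern is a shell-style wildcard and that the result is sorted; list_folders is a hypothetical name.

```python
import glob
import os
import tempfile


def list_folders(path_folder, name='*'):
    """Sorted paths of subfolders of `path_folder` matching the pattern `name`."""
    paths = glob.glob(os.path.join(path_folder, name))
    # glob also matches files, so keep only directories
    return sorted(p for p in paths if os.path.isdir(p))


with tempfile.TemporaryDirectory() as tmp:
    for sub in ('rat-kidney_', 'images', 'landmarks'):
        os.mkdir(os.path.join(tmp, sub))
    open(os.path.join(tmp, 'readme.txt'), 'w').close()  # a file, not listed
    names = [os.path.basename(p) for p in list_folders(tmp)]
    suffixed = [os.path.basename(p) for p in list_folders(tmp, '*_')]
print(names)  # ['images', 'landmarks', 'rat-kidney_']
```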
birl.utilities.dataset.load_large_image(img_path)

Load very large images.

Note

Loading uses matplotlib, because neither ImageMagick nor other libraries (OpenCV, scikit-image, Pillow) can load images larger than 64k or 32k.

Parameters

img_path (str) – path to the image

Return ndarray

image

birl.utilities.dataset.norm_angle(angle, deg=True)

Normalise the angle to the range (-180, 180) degrees.

Parameters
  • angle (float) – input angle

  • deg (bool) – use degrees (otherwise radians)

Return float

normalised angle

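Wrapping an angle can be sketched with the modulo operator. The sketch maps angles to the half-open interval [-180, 180), or [-pi, pi) in radians; whether the library keeps +180 or -180 at the boundary is not specified above, so treat that as an assumption. norm_angle_sketch is a hypothetical name.

```python
import math


def norm_angle_sketch(angle, deg=True):
    """Wrap an angle into [-180, 180) degrees, or [-pi, pi) radians."""
    half = 180.0 if deg else math.pi
    # shift into [0, 2 * half), wrap with modulo, then shift back
    return (angle + half) % (2 * half) - half


print(norm_angle_sketch(190), norm_angle_sketch(-190))  # -170.0 170.0
```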
birl.utilities.dataset.parse_path_scale(path_folder)

Parse the scale from the given path of an annotation folder.

Parameters

path_folder (str) – path to the scale folder

Return int|float

scale in percent, or NaN if no scale is found

>>> parse_path_scale('scale-.1pc')
nan
>>> parse_path_scale('user-JB_scale-50pc')
50
>>> parse_path_scale('scale-10pc')
10
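A sketch of the parsing with the standard re module and the same pattern as REEXP_FOLDER_SCALE. Applying it to the folder's base name and returning NaN when nothing matches are assumptions consistent with the doctests above; parse_scale_sketch is a hypothetical name.

```python
import math
import os
import re

REEXP_SCALE = r'\S*scale-(\d+)pc'  # same pattern as REEXP_FOLDER_SCALE


def parse_scale_sketch(path_folder):
    """Integer scale from names like '...scale-50pc', NaN if there is none."""
    folder = os.path.basename(path_folder)
    match = re.match(REEXP_SCALE, folder)
    return int(match.group(1)) if match else math.nan


print(parse_scale_sketch('user-JB_scale-50pc'), parse_scale_sketch('scale-.1pc'))  # 50 nan
```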
birl.utilities.dataset.project_object_edge(img, dimension)

Scale the image, binarise it with Otsu's method and project it onto one dimension.

Parameters
  • img (ndarray) – input image

  • dimension (int) – dimension for the projection

Return list(float)

>>> img = np.zeros((20, 10, 3))
>>> img[2:6, 1:7, :] = 1
>>> img[10:17, 4:6, :] = 1
>>> project_object_edge(img, 0).tolist()  
[0.0, 0.0, 0.7, 0.7, 0.7, 0.7, 0.0, 0.0, 0.0, 0.0,
 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.0, 0.0, 0.0]
birl.utilities.dataset.save_large_image(img_path, img)

Save large images (more than 50k x 50k pixels).

Note

Saving uses OpenCV, because other libraries (matplotlib, Pillow, ITK) cannot save images larger than 32k.

Parameters
  • img_path (str) – path to the new image

  • img (ndarray) – image

>>> img = np.zeros((2500, 3200, 4), dtype=np.uint8)
>>> img[:, :, 0] = 255
>>> img[:, :, 1] = 127
>>> img_path = './sample-image.jpg'
>>> save_large_image(img_path, img)
>>> img2 = load_large_image(img_path)
>>> img2[0, 0].tolist()
[255, 127, 0]
>>> img.shape[:2] == img2.shape[:2]
True
>>> os.remove(img_path)
>>> img_path = './sample-image.png'
>>> save_large_image(img_path, img.astype(np.uint16) * 255)
>>> img3 = load_large_image(img_path)
>>> img.shape[:2] == img3.shape[:2]
True
>>> img3[0, 0].tolist()
[255, 127, 0]
>>> save_large_image(img_path, img2 / 255. * 1.15)  # test overwrite message
>>> os.remove(img_path)
birl.utilities.dataset.scale_large_images_landmarks(images, landmarks)

Scale images and landmarks down so that the images do not exceed the maximal image size.

Parameters
  • images (list(ndarray)) – list of images

  • landmarks (list(ndarray)) – list of landmarks

Return tuple(list(ndarray),list(ndarray))

lists of images and landmarks

>>> scale_large_images_landmarks([np.zeros((8000, 500, 3), dtype=np.uint8)],
...                              [None, None])  
([array(...)], [None, None])
birl.utilities.dataset.simplify_polygon(points, tol_degree=5)

Simplify the path by dropping points that lie on the same line.

Parameters
  • points (ndarray) – points of the polygon

  • tol_degree (float) – tolerance on the change in orientation, in degrees

Return list(list(float))

points of the polygon

>>> pts = [[1, 2], [2, 4], [1, 5], [2, 8], [3, 8], [5, 8], [7, 8], [8, 7],
...     [8, 5], [8, 3], [8, 1], [7, 1], [6, 1], [4, 1], [3, 1], [3, 2], [2, 2]]
>>> simplify_polygon(pts)
[[1, 2], [2, 4], [1, 5], [2, 8], [7, 8], [8, 7], [8, 1], [3, 1], [3, 2]]
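A sketch that reproduces the doctest above: the polygon is treated as closed, and a point is dropped when the orientations of its incoming and outgoing segments differ by less than tol_degree. Comparing against the original neighbours rather than the last kept point is a simplification (a slowly curving path, where every single step turns by less than tol_degree, may lose all its intermediate points); simplify_polygon_sketch is a hypothetical name, not the library code.

```python
import math


def simplify_polygon_sketch(points, tol_degree=5):
    """Drop polygon points where the path direction barely changes."""
    pts = [list(p) for p in points]
    n = len(pts)
    kept = []
    for i in range(n):
        # neighbours on the closed polygon (pts[-1] precedes pts[0])
        prev_pt, pt, next_pt = pts[i - 1], pts[i], pts[(i + 1) % n]
        ang_in = math.degrees(math.atan2(pt[1] - prev_pt[1], pt[0] - prev_pt[0]))
        ang_out = math.degrees(math.atan2(next_pt[1] - pt[1], next_pt[0] - pt[0]))
        # change in orientation wrapped into [-180, 180)
        change = (ang_out - ang_in + 180) % 360 - 180
        if abs(change) >= tol_degree:
            kept.append(pt)
    return kept


pts = [[1, 2], [2, 4], [1, 5], [2, 8], [3, 8], [5, 8], [7, 8], [8, 7],
       [8, 5], [8, 3], [8, 1], [7, 1], [6, 1], [4, 1], [3, 1], [3, 2], [2, 2]]
print(simplify_polygon_sketch(pts))
# [[1, 2], [2, 4], [1, 5], [2, 8], [7, 8], [8, 7], [8, 1], [3, 1], [3, 2]]
```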
birl.utilities.dataset.CONVERT_RGB = {'hed': (skimage.color.rgb2hed, skimage.color.hed2rgb), 'hsv': (skimage.color.rgb2hsv, skimage.color.hsv2rgb), 'lab': (skimage.color.rgb2lab, skimage.color.lab2rgb), 'lch': (<function <lambda>>, <function <lambda>>), 'luv': (skimage.color.rgb2luv, skimage.color.luv2rgb), 'rgb': (<function <lambda>>, <function <lambda>>)}

pairs of forward and backward color space conversions, keyed by color space name

birl.utilities.dataset.IMAGE_EXTENSIONS = ('.png', '.jpg', '.jpeg')

supported image extensions

birl.utilities.dataset.MAX_IMAGE_SIZE = 5000

maximal image size for visualisations, larger images will be downscaled

birl.utilities.dataset.REEXP_FOLDER_SCALE = '\\S*scale-(\\d+)pc'

template for detecting/parsing scale from folder name

birl.utilities.dataset.TISSUE_CONTENT = 0.01

threshold on the tissue/background presence along a potential cutting line