birl.utilities.dataset module

Utility functions related to datasets

Copyright (C) 2016-2019 Jiri Borovec <jiri.borovec@fel.cvut.cz>

birl.utilities.dataset.args_expand_images(parser, nb_workers=1, overwrite=True)[source]
expand the parser by standard parameters related to images:
  • image paths
  • allow overwrite (optional)
  • number of jobs
Parameters:
  • parser (obj) – existing parser
  • nb_workers (int) – default number of threads
  • overwrite (bool) – allow overwriting images
Return obj:
>>> import argparse
>>> args_expand_images(argparse.ArgumentParser())  # doctest: +ELLIPSIS
ArgumentParser(...)
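
A minimal sketch of how such a parser expansion could look, using only the standard `argparse` module. The helper name `expand_parser_with_images` and the flag names `--path_images`, `--overwrite` and `--nb_workers` are assumptions made for illustration; they are not taken from the library source. Converting the parsed namespace with `vars` gives a dictionary, in the spirit of the `Return dict` variant.

```python
import argparse


def expand_parser_with_images(parser, nb_workers=1, overwrite=True):
    """Add image-related options to an existing parser (hypothetical flag names)."""
    # one or more image paths (or patterns)
    parser.add_argument('--path_images', type=str, nargs='+', required=True,
                        help='paths (or patterns) of the input images')
    if overwrite:
        # assumption: the overwrite switch is registered only when requested
        parser.add_argument('--overwrite', action='store_true',
                            help='allow overwriting existing images')
    parser.add_argument('--nb_workers', type=int, default=nb_workers,
                        help='number of parallel jobs')
    return parser


parser = expand_parser_with_images(argparse.ArgumentParser(), nb_workers=2)
# parse an explicit argument list and convert the namespace to a dictionary
args = vars(parser.parse_args(['--path_images', 'a.png', 'b.png']))
print(args)
```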
birl.utilities.dataset.args_expand_parse_images(parser, nb_workers=1, overwrite=True)[source]
expand the parser by standard parameters related to images:
  • image paths
  • allow overwrite (optional)
  • number of jobs
Parameters:
  • parser (obj) – existing parser
  • nb_workers (int) – default number of threads
  • overwrite (bool) – allow overwriting images
Return dict:
birl.utilities.dataset.common_landmarks(points1, points2, threshold=1.5)[source]

find common landmarks in two sets

Parameters:
  • points1 (ndarray|list(list(float))) – first point set
  • points2 (ndarray|list(list(float))) – second point set
  • threshold (float) – threshold for assignment (for landmarks in pixels)
Return list(bool):
 

flags

>>> np.random.seed(0)
>>> common = np.random.random((5, 2))
>>> pts1 = np.vstack([common, np.random.random((10, 2))])
>>> pts2 = np.vstack([common, np.random.random((15, 2))])
>>> common_landmarks(pts1, pts2, threshold=1e-3)
array([[0, 0],
       [1, 1],
       [2, 2],
       [3, 3],
       [4, 4]])
>>> np.random.shuffle(pts2)
>>> common_landmarks(pts1, pts2, threshold=1e-3)
array([[ 0, 13],
       [ 1, 10],
       [ 2,  9],
       [ 3, 14],
       [ 4,  8]])
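
The index pairs in the doctest above can be illustrated with a small pure-Python sketch: each point of the first set is paired with its nearest unassigned point of the second set, provided the distance is below the threshold. The function name `match_common_points` and the greedy assignment are assumptions for illustration and may differ from what the library does.

```python
import math


def match_common_points(points1, points2, threshold=1.5):
    """Pair points from two sets that lie closer than `threshold` (greedy sketch)."""
    pairs = []
    used = set()  # indices of points2 that are already assigned
    for i, (x1, y1) in enumerate(points1):
        best_j, best_dist = None, threshold
        for j, (x2, y2) in enumerate(points2):
            if j in used:
                continue
            dist = math.hypot(x1 - x2, y1 - y2)
            if dist < best_dist:
                best_j, best_dist = j, dist
        if best_j is not None:
            used.add(best_j)
            pairs.append((i, best_j))
    return pairs


pts1 = [(0, 0), (5, 5), (9, 1)]
pts2 = [(9, 1.2), (0.1, 0), (20, 20)]
print(match_common_points(pts1, pts2))  # [(0, 1), (2, 0)]
```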
birl.utilities.dataset.compute_bounding_polygon(landmarks)[source]

get the polygon that contains all points

Parameters: landmarks (ndarray) – set of points
Return ndarray: points of polygon
>>> np.random.seed(0)
>>> points = np.random.randint(1, 9, (45, 2))
>>> compute_bounding_polygon(points)  # doctest: +NORMALIZE_WHITESPACE
[[1, 2], [2, 4], [1, 5], [2, 8], [7, 8], [8, 7], [8, 1], [3, 1], [3, 2]]
birl.utilities.dataset.compute_convex_hull(landmarks)[source]

compute convex hull around landmarks

Parameters: landmarks (ndarray) – set of points
Return ndarray: points of polygon
>>> np.random.seed(0)
>>> pts = np.random.randint(15, 30, (10, 2))
>>> compute_convex_hull(pts)
array([[27, 20],
       [27, 25],
       [22, 24],
       [16, 21],
       [15, 18],
       [26, 18]])
birl.utilities.dataset.compute_half_polygon(landmarks, idx_start=0, idx_end=-1)[source]

compute half polygon path

Parameters:
  • idx_start (int) – index of starting point
  • idx_end (int) – index of ending point
  • landmarks (ndarray) – set of points
Return ndarray:

set of points

>>> pts = [(-1, 1), (0, 0), (0, 2), (1, 1), (1, -0.5), (2, 0)]
>>> compute_half_polygon(pts, idx_start=0, idx_end=-1)
[[-1.0, 1.0], [0.0, 2.0], [1.0, 1.0], [2.0, 0.0]]
>>> compute_half_polygon(pts[:2], idx_start=-1, idx_end=0)
[[-1, 1], [0, 0]]
>>> pts = [[0, 2], [1, 5], [2, 4], [2, 5], [4, 4], [4, 6], [4, 8], [5, 8], [5, 8]]
>>> compute_half_polygon(pts)
[[0, 2], [1, 5], [2, 5], [4, 6], [4, 8], [5, 8]]
birl.utilities.dataset.convert_landmarks_from_itk(lnds, image_size)[source]

convert landmarks from the ITK format to the format used in ImageJ

Parameters:
  • lnds (ndarray) – landmarks
  • image_size ((int,int)) – image height, width
Return ndarray:

landmarks

>>> convert_landmarks_from_itk([[ 20, 145], [150,  50], [100, 150]], (150, 200))
array([[  5,  20],
       [100, 150],
       [  0, 100]])
>>> lnds = [[ 20, 145], [150,  50], [100, 150], [0, 0], [150, 200]]
>>> img_size = (150, 200)
>>> lnds2 = convert_landmarks_from_itk(convert_landmarks_to_itk(lnds, img_size), img_size)
>>> np.array_equal(lnds, lnds2)
True
birl.utilities.dataset.convert_landmarks_to_itk(lnds, image_size)[source]

convert landmarks from the used format to the ITK format

Parameters:
  • lnds (ndarray) – landmarks
  • image_size ((int,int)) – image size - height, width
Return ndarray:

landmarks

>>> convert_landmarks_to_itk([[5, 20], [100, 150], [0, 100]], (150, 200))
array([[ 20, 145],
       [150,  50],
       [100, 150]])
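
The two doctests are consistent with a simple mapping: swap the two coordinates and flip one of them against the image height. The sketch below reproduces the documented values in pure Python; the function names are hypothetical and the mapping is inferred from the examples, not from the library source.

```python
def landmarks_to_itk(lnds, image_size):
    """Swap the coordinates and flip the first one against the image height."""
    height, _ = image_size
    return [[b, height - a] for a, b in lnds]


def landmarks_from_itk(lnds, image_size):
    """Inverse of `landmarks_to_itk`."""
    height, _ = image_size
    return [[height - b, a] for a, b in lnds]


lnds = [[5, 20], [100, 150], [0, 100]]
itk = landmarks_to_itk(lnds, (150, 200))
print(itk)  # [[20, 145], [150, 50], [100, 150]]
# the round trip restores the original landmarks
print(landmarks_from_itk(itk, (150, 200)) == lnds)  # True
```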
birl.utilities.dataset.detect_binary_blocks(vec_bin)[source]

detect binary objects by their beginning, end and length in a 1D signal

Parameters: vec_bin (list(bool)) – binary vector with 1 for an object
Return tuple(list(int),list(int),list(int)):
 
>>> vec = np.array([1] * 15 + [0] * 5 + [1] * 20)
>>> detect_binary_blocks(vec)
([0, 20], [15, 39], [14, 19])
birl.utilities.dataset.estimate_scaling(images, max_size=5000)[source]

find scaling for given set of images and maximal image size

Parameters:
  • images (list(ndarray)) – input images
  • max_size (float) – max image size in any dimension
Return float:

scaling in range (0, 1]

>>> estimate_scaling([np.zeros((12000, 300, 3))])  # doctest: +ELLIPSIS
0.4...
>>> estimate_scaling([np.zeros((1200, 800, 3))])
1.0
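
Both doctest values follow from dividing the maximal size by the largest image side and capping the result at 1. A minimal sketch, assuming the function only needs the image shapes (the name `estimate_scale_factor` is hypothetical):

```python
def estimate_scale_factor(shapes, max_size=5000):
    """Return the factor that shrinks the largest image side to `max_size`."""
    # only the first two entries of each shape (height, width) are spatial
    largest = max(max(shape[:2]) for shape in shapes)
    # never upscale: images that already fit keep the factor 1.0
    return min(1.0, max_size / float(largest))


print(estimate_scale_factor([(12000, 300, 3)]))  # 5000 / 12000, about 0.4167
print(estimate_scale_factor([(1200, 800, 3)]))   # 1.0
```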
birl.utilities.dataset.find_largest_object(hist, threshold=0.01)[source]

find the largest object and return its beginning and end

Parameters:
  • hist (list(float)) – input vector
  • threshold (float) – threshold for input vector
Return list(int):
 
>>> vec = np.array([1] * 15 + [0] * 5 + [1] * 20)
>>> find_largest_object(vec)
(20, 39)
birl.utilities.dataset.find_split_objects(hist, nb_objects=2, threshold=0.01)[source]

find the N largest objects and place the splits midway between them

Parameters:
  • hist (list(float)) – input vector
  • nb_objects (int) – number of desired objects
  • threshold (float) – threshold for input vector
Return list(int):
 
>>> vec = np.array([1] * 15 + [0] * 5 + [1] * 20)
>>> find_split_objects(vec)
[17]
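
The idea behind the last three functions can be sketched in pure Python: detect runs of values above the threshold, keep the N longest, and put a split in the middle of each gap between them. The helper names are hypothetical, and the sketch uses half-open `(begin, end)` ranges, so its run boundaries do not follow the exact conventions of the doctests above; only the resulting split position is comparable.

```python
def find_runs(values, threshold=0.01):
    """Return half-open (begin, end) index ranges where values exceed threshold."""
    runs, begin = [], None
    for i, val in enumerate(values):
        if val > threshold and begin is None:
            begin = i                      # a run starts here
        elif val <= threshold and begin is not None:
            runs.append((begin, i))        # the run ended just before i
            begin = None
    if begin is not None:
        runs.append((begin, len(values)))  # the run reaches the end of the vector
    return runs


def split_between_largest(values, nb_objects=2, threshold=0.01):
    """Place a split halfway through each gap between the N longest runs."""
    runs = find_runs(values, threshold)
    largest = sorted(runs, key=lambda r: r[1] - r[0], reverse=True)[:nb_objects]
    largest.sort()  # back to left-to-right order
    return [(left[1] + right[0]) // 2 for left, right in zip(largest, largest[1:])]


vec = [1] * 15 + [0] * 5 + [1] * 20
print(find_runs(vec))              # [(0, 15), (20, 40)]
print(split_between_largest(vec))  # [17]
```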
birl.utilities.dataset.generate_pairing(count, step_hide=None)[source]

generate registration pairs with an option of hidden landmarks

Parameters:
  • count (int) – total number of samples
  • step_hide (int|None) – hide every N-th sample
Return list((int, int)), list(bool):
 

registration pairs

>>> generate_pairing(4, None)  # doctest: +NORMALIZE_WHITESPACE
([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)],
 [True, True, True, True, True, True])
>>> generate_pairing(4, step_hide=3)  # doctest: +NORMALIZE_WHITESPACE
([(0, 1), (0, 2), (1, 2), (3, 1), (3, 2)],
 [False, False, True, False, False])
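
Without hiding, the pairs in the first doctest are simply all unordered index pairs, which `itertools.combinations` generates directly. The sketch below covers only this `step_hide=None` case; the helper name `all_pairs` is hypothetical and the hiding logic is deliberately left out.

```python
from itertools import combinations


def all_pairs(count):
    """List every unordered pair of sample indices exactly once."""
    return list(combinations(range(count), 2))


print(all_pairs(4))  # [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
```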
birl.utilities.dataset.get_close_diag_corners(points)[source]

find the points closest to the top-left and bottom-right corners

Parameters: points (ndarray) – set of points
Return tuple(ndarray,ndarray):
 begin and end of imaginary diagonal
>>> np.random.seed(0)
>>> points = np.random.randint(1, 9, (20, 2))
>>> get_close_diag_corners(points)
(array([1, 2]), array([7, 8]), (12, 10))
birl.utilities.dataset.histogram_match_cumulative_cdf(source, reference, norm_img_size=1024)[source]

Adjust the pixel values of a gray-scale image such that its histogram matches that of a target image

Parameters:
  • source (ndarray) – 2D image to be transformed, np.array<height1, width1>
  • reference (ndarray) – reference 2D image, np.array<height2, width2>
Return ndarray:

transformed image, np.array<height1, width1>

>>> np.random.seed(0)
>>> img = histogram_match_cumulative_cdf(np.random.randint(128, 145, (150, 200)),
...                                      np.random.randint(0, 18, (200, 180)))
>>> img.astype(int)  # doctest: +NORMALIZE_WHITESPACE +ELLIPSIS
array([[13, 16,  0, ..., 12,  2,  5],
       [17,  9,  1, ..., 16,  9,  0],
       [11, 12, 14, ...,  8,  5,  4],
       ...,
       [12,  6,  3, ..., 15,  0,  3],
       [11, 17,  2, ..., 12, 12,  5],
       [ 6, 12,  3, ...,  8,  0,  1]])
>>> np.bincount(img.ravel()).astype(int)  # doctest: +NORMALIZE_WHITESPACE
array([1705, 1706, 1728, 1842, 1794, 1866, 1771,    0, 1717, 1752, 1757,
       1723, 1823, 1833, 1749, 1718, 1769, 1747])
>>> img_source = np.random.randint(50, 245, (2500, 3000)).astype(float)
>>> img_source[-1, -1] = 255
>>> img = histogram_match_cumulative_cdf(img_source / 255., img)
>>> np.array(img.shape, dtype=int)
array([2500, 3000])
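
The principle of matching through cumulative distributions can be shown on flat lists of values: each source value is replaced by the reference value that sits at the same rank in its own cumulative distribution. This is a simplified sketch with a hypothetical name `match_histogram`; it ignores image shape, subsampling (`norm_img_size`) and interpolation, all of which the library function may handle differently.

```python
from bisect import bisect_right


def match_histogram(source, reference):
    """Map each source value to the reference value with the same CDF rank."""
    src_sorted = sorted(source)
    ref_sorted = sorted(reference)
    n_src, n_ref = len(src_sorted), len(ref_sorted)
    matched = []
    for val in source:
        # number of source values <= val, i.e. the CDF scaled by n_src
        rank = bisect_right(src_sorted, val)
        # smallest reference index whose CDF reaches rank / n_src
        # (integer ceiling of rank * n_ref / n_src, minus one)
        idx = -(-rank * n_ref // n_src) - 1
        matched.append(ref_sorted[idx])
    return matched


print(match_histogram([130, 128, 129, 131], [0, 10, 20, 30]))  # [20, 0, 10, 30]
```

The mapping is monotone, so the ordering of the source intensities is preserved while their values are drawn from the reference.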
birl.utilities.dataset.image_histogram_matching(source, reference, use_color='hsv', norm_img_size=4096)[source]

adjust image histogram between two images

Optionally transform the image to a more continuous color space. The source and reference images do not need to be the same size, but both must be RGB or gray-scale.

See related information:

Parameters:
  • source (ndarray) – 2D image to be transformed
  • reference (ndarray) – reference 2D image
  • use_color (str) – using color space for hist matching
  • norm_img_size (int) – subsample image to this max size
Return ndarray:

transformed image

>>> from birl.utilities.data_io import update_path, load_image
>>> path_imgs = os.path.join(update_path('data_images'), 'rat-kidney_', 'scale-5pc')
>>> img1 = load_image(os.path.join(path_imgs, 'Rat-Kidney_HE.jpg'))
>>> img2 = load_image(os.path.join(path_imgs, 'Rat-Kidney_PanCytokeratin.jpg'))
>>> image_histogram_matching(img1, img2).shape == img1.shape
True
>>> img = image_histogram_matching(img1[..., 0], np.expand_dims(img2[..., 0], 2))
>>> img.shape == img1.shape[:2]
True
>>> # this should return unchanged source image
>>> image_histogram_matching(np.random.random((10, 20, 30, 5)),
...                          np.random.random((30, 10, 20, 5))).ndim
4
birl.utilities.dataset.inside_polygon(polygon, point)[source]

check if a point is strictly inside the polygon

Parameters:
  • polygon (ndarray|list) – polygon contour
  • point (tuple|list) – sample point
Return bool:

inside

>>> poly = [[1, 1], [1, 3], [3, 3], [3, 1]]
>>> inside_polygon(poly, [0, 0])
False
>>> inside_polygon(poly, [1, 1])
False
>>> inside_polygon(poly, [2, 2])
True
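
A strict point-in-polygon test can be sketched with ray casting plus an explicit boundary check, so that points on an edge or a vertex count as outside, as in the doctest above. The name `strictly_inside` is hypothetical and the library may use a different method; the exact `cross == 0` comparison is suitable for integer coordinates, while floats would need a tolerance.

```python
def strictly_inside(polygon, point):
    """Ray-casting test that treats points on the boundary as outside."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        # boundary check: collinear with the edge and within its bounding box
        cross = (x2 - x1) * (y - y1) - (y2 - y1) * (x - x1)
        on_edge = (cross == 0
                   and min(x1, x2) <= x <= max(x1, x2)
                   and min(y1, y2) <= y <= max(y1, y2))
        if on_edge:
            return False
        # count edges crossed by a horizontal ray going to the right
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / float(y2 - y1)
            if x_cross > x:
                inside = not inside
    return inside


poly = [[1, 1], [1, 3], [3, 3], [3, 1]]
print([strictly_inside(poly, p) for p in ([0, 0], [1, 1], [2, 2])])
# [False, False, True]
```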
birl.utilities.dataset.is_point_above_line(point_begin, point_end, point_test)[source]

check whether the point lies to the left of the line

Parameters:
  • point_begin (list(float)) – starting line point
  • point_end (list(float)) – ending line point
  • point_test (list(float)) – testing point
Return bool:

True if the point lies to the left of the line

>>> is_point_above_line([1, 1], [2, 2], [3, 4])
True
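
A common way to decide on which side of a directed line a point lies is the sign of a 2D cross product. The sketch below assumes a coordinate system with the y axis pointing up, where a positive sign means "left"; the function name is hypothetical and the library may compute the result differently.

```python
def is_left_of_line(point_begin, point_end, point_test):
    """True when the test point lies to the left of the directed line."""
    (bx, by), (ex, ey), (tx, ty) = point_begin, point_end, point_test
    # z-component of the cross product (end - begin) x (test - begin)
    cross = (ex - bx) * (ty - by) - (ey - by) * (tx - bx)
    return cross > 0


print(is_left_of_line([1, 1], [2, 2], [3, 4]))  # True
print(is_left_of_line([1, 1], [2, 2], [4, 3]))  # False
```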
birl.utilities.dataset.is_point_in_quadrant_left(point_begin, point_end, point_test)[source]

check whether the point lies in the left quadrant relative to the line end point

Note that a negative response does not mean that the point is on the right side

Parameters:
  • point_begin (list(float)) – starting line point
  • point_end (list(float)) – ending line point
  • point_test (list(float)) – testing point
Return int:

+1 if the point is above, -1 if below, and 0 elsewhere

>>> is_point_in_quadrant_left([1, 1], [3, 1], [2, 2])
1
>>> is_point_in_quadrant_left([3, 1], [1, 1], [2, 0])
1
>>> is_point_in_quadrant_left([1, 1], [3, 1], [2, 0])
-1
>>> is_point_in_quadrant_left([1, 1], [3, 1], [4, 2])
0
birl.utilities.dataset.is_point_inside_perpendicular(point_begin, point_end, point_test)[source]

check whether the point lies to the left of the line and, perpendicularly, between the ends of the line segment

Note that a negative response does not mean that the point is on the right side

Parameters:
  • point_begin (list(float)) – starting line point
  • point_end (list(float)) – ending line point
  • point_test (list(float)) – testing point
Return int:

+1 if the point is above, -1 if below, and 0 elsewhere

>>> is_point_inside_perpendicular([1, 1], [3, 1], [2, 2])
1
>>> is_point_inside_perpendicular([1, 1], [3, 1], [2, 0])
-1
>>> is_point_inside_perpendicular([1, 1], [3, 1], [4, 2])
0
birl.utilities.dataset.line_angle_2d(point_begin, point_end, deg=True)[source]

Compute the direction of a line given by two points

Zero corresponds to the horizontal direction [1, 0]

Parameters:
  • point_begin (list(float)) – starting line point
  • point_end (list(float)) – ending line point
  • deg (bool) – return angle in degrees
Return float:

orientation

>>> [line_angle_2d([0, 0], p) for p in ((1, 0), (0, 1), (-1, 0), (0, -1))]
[0.0, 90.0, 180.0, -90.0]
>>> line_angle_2d([1, 1], [2, 3])  # doctest: +ELLIPSIS
63.43...
>>> line_angle_2d([1, 2], [-2, -3])  # doctest: +ELLIPSIS
-120.96...
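
The values in these doctests agree with the two-argument arctangent of the coordinate differences. A minimal sketch using only the standard `math` module (the name `direction_angle` is hypothetical):

```python
import math


def direction_angle(point_begin, point_end, deg=True):
    """Angle of the vector begin -> end, measured from the direction [1, 0]."""
    dx = point_end[0] - point_begin[0]
    dy = point_end[1] - point_begin[1]
    angle = math.atan2(dy, dx)  # radians in [-pi, pi]
    return math.degrees(angle) if deg else angle


print(direction_angle([1, 1], [2, 3]))    # about 63.43
print(direction_angle([1, 2], [-2, -3]))  # about -120.96
```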
birl.utilities.dataset.list_sub_folders(path_folder, name='*')[source]

list all sub folders with particular name pattern

Parameters:
  • path_folder (str) – path to a particular folder
  • name (str) – name pattern
Return list(str):
 

folders

>>> from birl.utilities.data_io import update_path
>>> paths = list_sub_folders(update_path('data_images'))
>>> list(map(os.path.basename, paths))  # doctest: +ELLIPSIS
['images', 'landmarks', 'lesions_', 'rat-kidney_'...]
birl.utilities.dataset.load_large_image(img_path)[source]

loading very large images

Note: loading uses matplotlib, because neither ImageMagick nor the other
libraries (OpenCV, skimage, Pillow) can load images larger than 64k or 32k.
Parameters: img_path (str) – path to the image
Return ndarray: image
birl.utilities.dataset.norm_angle(angle, deg=True)[source]

Normalise the angle to the range (-180, 180) degrees

Parameters:
  • angle (float) – input angle
  • deg (bool) – use degrees
Return float:

normalised angle
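
One way to wrap an angle into a single turn centred on zero is modular arithmetic. This is only a sketch with a hypothetical name `wrap_angle`: it maps to the half-open range [-180, 180), and how the library treats the boundary values is not stated in the documentation above.

```python
import math


def wrap_angle(angle, deg=True):
    """Wrap an angle into [-180, 180) degrees or [-pi, pi) radians."""
    half_turn = 180.0 if deg else math.pi
    # shift by half a turn, wrap into one full turn, shift back
    return (angle + half_turn) % (2 * half_turn) - half_turn


print(wrap_angle(190))   # -170.0
print(wrap_angle(-190))  # 170.0
```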

birl.utilities.dataset.parse_path_scale(path_folder)[source]

parse the scale from a given annotation path

Parameters: path_folder (str) – path to the scale folder
Return int: scale
>>> parse_path_scale('scale-.1pc')
nan
>>> parse_path_scale('user-JB_scale-50pc')
50
>>> parse_path_scale('scale-10pc')
10
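
The three doctest results can be reproduced by applying the `REEXP_FOLDER_SCALE` pattern listed at the end of this module and returning `nan` when it does not match. The function name `parse_scale` and the use of `re.match` are assumptions; the library may, for example, strip the path down to its base name first.

```python
import re

# pattern copied from REEXP_FOLDER_SCALE in this module
REEXP_FOLDER_SCALE = r'\S*scale-(\d+)pc'


def parse_scale(path_folder):
    """Extract the integer scale in percent, or NaN when no scale is found."""
    match = re.match(REEXP_FOLDER_SCALE, path_folder)
    return int(match.group(1)) if match else float('nan')


print(parse_scale('user-JB_scale-50pc'))  # 50
print(parse_scale('scale-10pc'))          # 10
print(parse_scale('scale-.1pc'))          # nan
```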
birl.utilities.dataset.project_object_edge(img, dimension)[source]

scale the image, binarise it with Otsu thresholding and project it to one dimension

Parameters:
  • img (ndarray) – input image
  • dimension (int) – select dimension for projection
Return list(float):
 
>>> img = np.zeros((20, 10, 3))
>>> img[2:6, 1:7, :] = 1
>>> img[10:17, 4:6, :] = 1
>>> project_object_edge(img, 0).tolist()  # doctest: +NORMALIZE_WHITESPACE
[0.0, 0.0, 0.7, 0.7, 0.7, 0.7, 0.0, 0.0, 0.0, 0.0,
 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.0, 0.0, 0.0]
birl.utilities.dataset.save_large_image(img_path, img)[source]

save large images, larger than 50k x 50k

Note: saving uses OpenCV, because the other libraries (matplotlib, Pillow, ITK) cannot save images larger than 32k.

Parameters:
  • img_path (str) – path to the new image
  • img (ndarray) – image
>>> img = np.zeros((2500, 3200, 4), dtype=np.uint8)
>>> img[:, :, 0] = 255
>>> img[:, :, 1] = 127
>>> img_path = './sample-image.jpg'
>>> save_large_image(img_path, img)
>>> img2 = load_large_image(img_path)
>>> img2[0, 0].tolist()
[255, 127, 0]
>>> img.shape[:2] == img2.shape[:2]
True
>>> os.remove(img_path)
>>> img_path = './sample-image.png'
>>> save_large_image(img_path, img.astype(np.uint16) * 255)
>>> img3 = load_large_image(img_path)
>>> img.shape[:2] == img3.shape[:2]
True
>>> img3[0, 0].tolist()
[255, 127, 0]
>>> save_large_image(img_path, img2 / 255. * 1.15)  # test overwrite message
>>> os.remove(img_path)
birl.utilities.dataset.scale_large_images_landmarks(images, landmarks)[source]

scale images and landmarks up to maximal image size

Parameters:
  • images (list(ndarray)) – list of images
  • landmarks (list(ndarray)) – list of landmarks
Return tuple(list(ndarray),list(ndarray)):
 

lists of images and landmarks

>>> scale_large_images_landmarks([np.zeros((8000, 500, 3), dtype=np.uint8)],
...                              [None, None])  # doctest: +ELLIPSIS
([array(...)], [None, None])
birl.utilities.dataset.simplify_polygon(points, tol_degree=5)[source]

simplify the path by dropping points that lie on the same line

Parameters:
  • points (ndarray) – points of the polygon
  • tol_degree (float) – tolerance on change in orientation
Return list(list(float)):
 

points of polygon

>>> pts = [[1, 2], [2, 4], [1, 5], [2, 8], [3, 8], [5, 8], [7, 8], [8, 7],
...     [8, 5], [8, 3], [8, 1], [7, 1], [6, 1], [4, 1], [3, 1], [3, 2], [2, 2]]
>>> simplify_polygon(pts)
[[1, 2], [2, 4], [1, 5], [2, 8], [7, 8], [8, 7], [8, 1], [3, 1], [3, 2]]
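
The simplification can be sketched by comparing the incoming and outgoing direction at every vertex of the closed path and dropping the vertex when the direction changes by less than the tolerance. The name `drop_collinear_points` is hypothetical and the library may proceed differently, but on the input from the doctest this sketch yields the same polygon.

```python
import math


def drop_collinear_points(points, tol_degree=5):
    """Remove vertices where the path direction changes by less than the tolerance."""
    kept = []
    n = len(points)
    for i, pt in enumerate(points):
        prev_pt, next_pt = points[i - 1], points[(i + 1) % n]  # cyclic neighbours
        angle_in = math.degrees(math.atan2(pt[1] - prev_pt[1], pt[0] - prev_pt[0]))
        angle_out = math.degrees(math.atan2(next_pt[1] - pt[1], next_pt[0] - pt[0]))
        # change of direction wrapped into [-180, 180)
        turn = (angle_out - angle_in + 180) % 360 - 180
        if abs(turn) >= tol_degree:
            kept.append(list(pt))
    return kept


pts = [[1, 2], [2, 4], [1, 5], [2, 8], [3, 8], [5, 8], [7, 8], [8, 7],
       [8, 5], [8, 3], [8, 1], [7, 1], [6, 1], [4, 1], [3, 1], [3, 2], [2, 2]]
print(drop_collinear_points(pts))
# [[1, 2], [2, 4], [1, 5], [2, 8], [7, 8], [8, 7], [8, 1], [3, 1], [3, 2]]
```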
birl.utilities.dataset.CONVERT_RGB = {'hed': (..., ...), 'hsv': (..., ...), 'lab': (..., ...), 'lch': (..., ...), 'luv': (..., ...), 'rgb': (..., ...)}[source]

pairs of forward and backward color space conversions

birl.utilities.dataset.IMAGE_EXTENSIONS = ('.png', '.jpg', '.jpeg')[source]

supported image extensions

birl.utilities.dataset.MAX_IMAGE_SIZE = 5000[source]

maximal image size for visualisations, larger images will be downscaled

birl.utilities.dataset.REEXP_FOLDER_SCALE = '\\S*scale-(\\d+)pc'[source]

template for detecting/parsing scale from folder name

birl.utilities.dataset.TISSUE_CONTENT = 0.01[source]

threshold of tissue/background presence on potential cutting line