birl.utilities.dataset module
Utility functions related to datasets
Copyright (C) 2016-2019 Jiri Borovec <jiri.borovec@fel.cvut.cz>
- birl.utilities.dataset.args_expand_images(parser, nb_workers=1, overwrite=True)
- expand the parser by standard parameters related to images:
image paths
allow overwrite (optional)
number of jobs
- Parameters
- Return obj
>>> import argparse
>>> args_expand_images(argparse.ArgumentParser())
ArgumentParser(...)
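The sketch below illustrates how a parser could be extended with the three groups of options listed above. It is not the library's implementation: the function name `expand_image_args` and the option names `--path_images`, `--overwrite` and `--nb_workers` are assumptions made for illustration only.

```python
import argparse


def expand_image_args(parser, nb_workers=1, overwrite=True):
    """Hypothetical stand-in: add image-related options to an existing parser."""
    # image paths (option name is an assumption)
    parser.add_argument('-i', '--path_images', type=str, required=True,
                        help='path (pattern) to the input images')
    # allow overwrite (optional, only registered when requested)
    if overwrite:
        parser.add_argument('--overwrite', action='store_true',
                            help='allow overwriting existing files')
    # number of jobs
    parser.add_argument('--nb_workers', type=int, default=nb_workers,
                        help='number of parallel jobs')
    return parser


parser = expand_image_args(argparse.ArgumentParser())
args = parser.parse_args(['-i', 'images/*.png', '--nb_workers', '2'])
```

Returning the parser, as the documented function does, lets several such helpers be chained on one `ArgumentParser`.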
- birl.utilities.dataset.args_expand_parse_images(parser, nb_workers=1, overwrite=True)
- expand the parser by standard parameters related to images:
image paths
allow overwrite (optional)
number of jobs
- birl.utilities.dataset.common_landmarks(points1, points2, threshold=1.5)
find common landmarks in two sets
- Parameters
- Return list(bool)
flags
>>> np.random.seed(0)
>>> common = np.random.random((5, 2))
>>> pts1 = np.vstack([common, np.random.random((10, 2))])
>>> pts2 = np.vstack([common, np.random.random((15, 2))])
>>> common_landmarks(pts1, pts2, threshold=1e-3)
array([[0, 0], [1, 1], [2, 2], [3, 3], [4, 4]])
>>> np.random.shuffle(pts2)
>>> common_landmarks(pts1, pts2, threshold=1e-3)
array([[ 0, 13], [ 1, 10], [ 2, 9], [ 3, 14], [ 4, 8]])
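One plausible way to obtain such index pairs is a nearest-neighbour search with a distance threshold. The following is a minimal pure-Python sketch under that assumption; `match_common_points` is a hypothetical name and the library may use a different matching strategy.

```python
import math


def match_common_points(points1, points2, threshold=1.5):
    """Pair each point of points1 with its nearest point of points2
    when their Euclidean distance does not exceed the threshold."""
    pairs = []
    if not points2:
        return pairs
    for i, (x1, y1) in enumerate(points1):
        # distances from the current point to all candidates
        dists = [math.hypot(x1 - x2, y1 - y2) for x2, y2 in points2]
        j = min(range(len(dists)), key=dists.__getitem__)
        if dists[j] <= threshold:
            pairs.append((i, j))
    return pairs


pairs = match_common_points([(0, 0), (5, 5), (9, 1)],
                            [(5, 5.5), (20, 20), (0.2, 0)])
```

Here the first two points find a partner within the threshold, while the third has no neighbour close enough and is left unpaired.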
- birl.utilities.dataset.compute_bounding_polygon(landmarks)
get the polygon inside which all points lie
- Parameters
landmarks (ndarray) – set of points
- Return ndarray
points of polygon
>>> np.random.seed(0)
>>> points = np.random.randint(1, 9, (45, 2))
>>> compute_bounding_polygon(points)
[[1, 2], [2, 4], [1, 5], [2, 8], [7, 8], [8, 7], [8, 1], [3, 1], [3, 2]]
- birl.utilities.dataset.compute_convex_hull(landmarks)
compute convex hull around landmarks
- Parameters
landmarks (ndarray) – set of points
- Return ndarray
points of polygon
>>> np.random.seed(0)
>>> pts = np.random.randint(15, 30, (10, 2))
>>> compute_convex_hull(pts)
array([[27, 20], [27, 25], [22, 24], [16, 21], [15, 18], [26, 18]])
- birl.utilities.dataset.compute_half_polygon(landmarks, idx_start=0, idx_end=-1)
compute half polygon path
- Parameters
- Return ndarray
set of points
>>> pts = [(-1, 1), (0, 0), (0, 2), (1, 1), (1, -0.5), (2, 0)]
>>> compute_half_polygon(pts, idx_start=0, idx_end=-1)
[[-1.0, 1.0], [0.0, 2.0], [1.0, 1.0], [2.0, 0.0]]
>>> compute_half_polygon(pts[:2], idx_start=-1, idx_end=0)
[[-1, 1], [0, 0]]
>>> pts = [[0, 2], [1, 5], [2, 4], [2, 5], [4, 4], [4, 6], [4, 8], [5, 8], [5, 8]]
>>> compute_half_polygon(pts)
[[0, 2], [1, 5], [2, 5], [4, 6], [4, 8], [5, 8]]
- birl.utilities.dataset.convert_landmarks_from_itk(lnds, image_size)
convert landmarks from the ITK format to the format used in ImageJ
- Parameters
- Return ndarray
landmarks
>>> convert_landmarks_from_itk([[ 20, 145], [150, 50], [100, 150]], (150, 200))
array([[ 5, 20], [100, 150], [ 0, 100]])
>>> lnds = [[ 20, 145], [150, 50], [100, 150], [0, 0], [150, 200]]
>>> img_size = (150, 200)
>>> lnds2 = convert_landmarks_from_itk(convert_landmarks_to_itk(lnds, img_size), img_size)
>>> np.array_equal(lnds, lnds2)
True
- birl.utilities.dataset.convert_landmarks_to_itk(lnds, image_size)
convert landmarks from the used (ImageJ) format to the ITK format
- Parameters
- Return ndarray
landmarks
>>> convert_landmarks_to_itk([[5, 20], [100, 150], [0, 100]], (150, 200))
array([[ 20, 145], [150, 50], [100, 150]])
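The documented values are consistent with swapping the two coordinates and flipping one of them against the first entry of `image_size`. The sketch below encodes exactly that mapping; it is inferred from the example values only, and the helper names `to_itk` and `from_itk` are hypothetical.

```python
def to_itk(landmarks, image_size):
    """Map (a, b) to (b, size0 - a); inferred from the documented example."""
    size0 = image_size[0]
    return [[b, size0 - a] for a, b in landmarks]


def from_itk(landmarks, image_size):
    """Inverse mapping: (a, b) to (size0 - b, a)."""
    size0 = image_size[0]
    return [[size0 - b, a] for a, b in landmarks]


itk = to_itk([[5, 20], [100, 150], [0, 100]], (150, 200))
back = from_itk(itk, (150, 200))
```

Applying one function after the other returns the original landmarks, which mirrors the round-trip check in the doctest above.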
- birl.utilities.dataset.detect_binary_blocks(vec_bin)
detect binary objects by their beginning, end and length in a 1D signal
- Parameters
- Return tuple(list(int),list(int),list(int))
>>> vec = np.array([1] * 15 + [0] * 5 + [1] * 20)
>>> detect_binary_blocks(vec)
([0, 20], [15, 39], [14, 19])
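Detecting blocks in a binary signal amounts to finding the rising and falling edges. The sketch below shows this idea in pure Python; `find_runs` is a hypothetical name, and it uses exclusive end indices, so its ends and lengths deliberately differ from the values the documented function reports.

```python
def find_runs(vec):
    """Return (begins, ends, lengths) of runs of non-zero values.

    Ends are exclusive here, i.e. vec[begin:end] is the whole run.
    """
    begins, ends = [], []
    prev = False
    for i, value in enumerate(vec):
        cur = bool(value)
        if cur and not prev:
            begins.append(i)   # rising edge: a run starts
        if prev and not cur:
            ends.append(i)     # falling edge: the run ended before i
        prev = cur
    if prev:
        ends.append(len(vec))  # the signal finishes inside a run
    lengths = [e - b for b, e in zip(begins, ends)]
    return begins, ends, lengths


begins, ends, lengths = find_runs([1] * 15 + [0] * 5 + [1] * 20)
# index of the longest run, useful for picking the largest object
largest = max(range(len(lengths)), key=lengths.__getitem__)
```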
- birl.utilities.dataset.estimate_scaling(images, max_size=5000)
find scaling for given set of images and maximal image size
- Parameters
- Return float
scaling in range (0, 1]
>>> estimate_scaling([np.zeros((12000, 300, 3))])
0.4...
>>> estimate_scaling([np.zeros((1200, 800, 3))])
1.0
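The documented outputs agree with a simple rule: divide the maximal allowed size by the largest image side and never scale up. A minimal sketch of that rule, working on image shapes rather than arrays, is shown below; `scaling_for_shapes` is a hypothetical helper and not part of the module.

```python
def scaling_for_shapes(shapes, max_size=5000):
    """Scale factor so that the largest image side fits into max_size."""
    # only the first two dimensions (height, width) matter
    largest_side = max(max(shape[:2]) for shape in shapes)
    # never upscale: small images keep a factor of 1.0
    return min(1.0, max_size / float(largest_side))


scale_large = scaling_for_shapes([(12000, 300, 3)])
scale_small = scaling_for_shapes([(1200, 800, 3)])
```

For the first shape the factor is 5000 / 12000, which matches the `0.4...` shown in the doctest; the second image is already small enough and keeps 1.0.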
- birl.utilities.dataset.find_largest_object(hist, threshold=0.01)
find the largest object and give its beginning and end
- Parameters
- Return list(int)
>>> vec = np.array([1] * 15 + [0] * 5 + [1] * 20)
>>> find_largest_object(vec)
(20, 39)
- birl.utilities.dataset.find_split_objects(hist, nb_objects=2, threshold=0.01)
find the N largest objects and set each split in the middle of the gap between them
- Parameters
- Return list(int)
>>> vec = np.array([1] * 15 + [0] * 5 + [1] * 20)
>>> find_split_objects(vec)
[17]
- birl.utilities.dataset.generate_pairing(count, step_hide=None)
generate registration pairs with an option of hidden landmarks
- Parameters
count (int) – total number of samples
step_hide (int|None) – hide every N-th sample
- Return list((int, int)), list(bool)
registration pairs
>>> generate_pairing(4, None)
([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)], [True, True, True, True, True, True])
>>> generate_pairing(4, step_hide=3)
([(0, 1), (0, 2), (1, 2), (3, 1), (3, 2)], [False, False, True, False, False])
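Without hidden samples, the documented pairs are simply all unordered index combinations. The sketch below reproduces only that case with the standard library; `all_pairs` is a hypothetical helper, and the `step_hide` behaviour is intentionally not modelled because its exact rules are not described here.

```python
import itertools


def all_pairs(count):
    """All unordered pairs (i, j) with i < j, plus an all-True flag list."""
    pairs = list(itertools.combinations(range(count), 2))
    flags = [True] * len(pairs)
    return pairs, flags


pairs, flags = all_pairs(4)
```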
- birl.utilities.dataset.get_close_diag_corners(points)
find the points closest to the top left and bottom right corners
- Parameters
points (ndarray) – set of points
- Return tuple(ndarray,ndarray)
begin and end of imaginary diagonal
>>> np.random.seed(0)
>>> points = np.random.randint(1, 9, (20, 2))
>>> get_close_diag_corners(points)
(array([1, 2]), array([7, 8]), (12, 10))
- birl.utilities.dataset.histogram_match_cumulative_cdf(source, reference, norm_img_size=1024)
Adjust the pixel values of a gray-scale image such that its histogram matches that of a target image
- Parameters
source (ndarray) – 2D image to be transformed, np.array<height1, width1>
reference (ndarray) – reference 2D image, np.array<height2, width2>
- Return ndarray
transformed image, np.array<height1, width1>
>>> np.random.seed(0)
>>> img = histogram_match_cumulative_cdf(np.random.randint(128, 145, (150, 200)),
...                                      np.random.randint(0, 18, (200, 180)))
>>> img.astype(int)
array([[13, 16,  0, ..., 12,  2,  5],
       [17,  9,  1, ..., 16,  9,  0],
       [11, 12, 14, ...,  8,  5,  4],
       ...,
       [12,  6,  3, ..., 15,  0,  3],
       [11, 17,  2, ..., 12, 12,  5],
       [ 6, 12,  3, ...,  8,  0,  1]])
>>> np.bincount(img.ravel()).astype(int)
array([1705, 1706, 1728, 1842, 1794, 1866, 1771, 0, 1717, 1752, 1757, 1723, 1823, 1833, 1749, 1718, 1769, 1747])
>>> img_source = np.random.randint(50, 245, (2500, 3000)).astype(float)
>>> img_source[-1, -1] = 255
>>> img = histogram_match_cumulative_cdf(img_source / 255., img)
>>> np.array(img.shape, dtype=int)
array([2500, 3000])
- birl.utilities.dataset.image_histogram_matching(source, reference, use_color='hsv', norm_img_size=4096)
adjust image histogram between two images
Optionally transform the image to a more continuous color space. The source and target images do not need to be the same size, but each has to be RGB or gray-scale.
See related information:
https://www.researchgate.net/post/Histogram_matching_for_color_images
https://github.com/scikit-image/scikit-image/blob/master/skimage/transform/histogram_matching.py
https://stackoverflow.com/questions/32655686/histogram-matching-of-two-images-in-python-2-x
- Parameters
- Return ndarray
transformed image
>>> from birl.utilities.data_io import update_path, load_image
>>> path_imgs = os.path.join(update_path('data-images'), 'rat-kidney_', 'scale-5pc')
>>> img1 = load_image(os.path.join(path_imgs, 'Rat-Kidney_HE.jpg'))
>>> img2 = load_image(os.path.join(path_imgs, 'Rat-Kidney_PanCytokeratin.jpg'))
>>> image_histogram_matching(img1, img2).shape == img1.shape
True
>>> img = image_histogram_matching(img1[..., 0], np.expand_dims(img2[..., 0], 2))
>>> img.shape == img1.shape[:2]
True
>>> # this should return unchanged source image
>>> image_histogram_matching(np.random.random((10, 20, 30, 5)),
...                          np.random.random((30, 10, 20, 5))).ndim
4
- birl.utilities.dataset.inside_polygon(polygon, point)
check if a point is strictly inside the polygon
- Parameters
polygon (ndarray|list) – polygon contour
point (tuple|list) – sample point
- Return bool
inside
>>> poly = [[1, 1], [1, 3], [3, 3], [3, 1]]
>>> inside_polygon(poly, [0, 0])
False
>>> inside_polygon(poly, [1, 1])
False
>>> inside_polygon(poly, [2, 2])
True
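For a convex polygon, a strict inside test can be sketched with cross products: the point has to lie strictly on the same side of every edge. The code below shows this restricted variant; `strictly_inside_convex` is a hypothetical name, the convexity requirement is an assumption of the sketch, and the documented function may handle general polygons with a different method.

```python
def strictly_inside_convex(polygon, point):
    """Strict inside test, valid for convex polygons only."""
    sides = set()
    count = len(polygon)
    for i in range(count):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % count]  # wrap around to close the contour
        # z-component of the cross product (edge vector) x (vertex -> point)
        cross = (x2 - x1) * (point[1] - y1) - (y2 - y1) * (point[0] - x1)
        if cross == 0:
            return False  # on the supporting line of an edge: not strictly inside
        sides.add(cross > 0)
    # inside only if the point is on the same side of all edges
    return len(sides) == 1


poly = [[1, 1], [1, 3], [3, 3], [3, 1]]
results = [strictly_inside_convex(poly, p) for p in ([0, 0], [1, 1], [2, 2])]
```

The three sample points are the same as in the doctest above, so the results can be compared directly.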
- birl.utilities.dataset.is_point_above_line(point_begin, point_end, point_test)
check whether the point is to the left of the line
- Parameters
- Return bool
True if the point is left of the line
>>> is_point_above_line([1, 1], [2, 2], [3, 4])
True
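A common way to decide on which side of a directed line a point lies is the sign of a 2D cross product. The sketch below assumes the usual mathematical orientation (x to the right, y upwards); `is_left_of_line` is a hypothetical name and not necessarily how the module computes the result.

```python
def is_left_of_line(point_begin, point_end, point_test):
    """True if point_test lies strictly left of the directed line begin -> end."""
    dx = point_end[0] - point_begin[0]
    dy = point_end[1] - point_begin[1]
    tx = point_test[0] - point_begin[0]
    ty = point_test[1] - point_begin[1]
    # positive cross product: counter-clockwise turn, i.e. the left side
    return dx * ty - dy * tx > 0


left = is_left_of_line([1, 1], [2, 2], [3, 4])
right = is_left_of_line([1, 1], [2, 2], [3, 2])
```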
- birl.utilities.dataset.is_point_in_quadrant_left(point_begin, point_end, point_test)
check whether the point is in the left quadrant relative to the line end point
Note
a negative response does not mean that the point is on the right side
- Parameters
- Return int
gives +1 if it is above, -1 if below and 0 elsewhere
>>> is_point_in_quadrant_left([1, 1], [3, 1], [2, 2])
1
>>> is_point_in_quadrant_left([3, 1], [1, 1], [2, 0])
1
>>> is_point_in_quadrant_left([1, 1], [3, 1], [2, 0])
-1
>>> is_point_in_quadrant_left([1, 1], [3, 1], [4, 2])
0
- birl.utilities.dataset.is_point_inside_perpendicular(point_begin, point_end, point_test)
check whether the point is left of the line and perpendicularly within the line segment
Note
a negative response does not mean that the point is on the right side
- Parameters
- Return int
gives +1 if it is above, -1 if below and 0 elsewhere
>>> is_point_inside_perpendicular([1, 1], [3, 1], [2, 2])
1
>>> is_point_inside_perpendicular([1, 1], [3, 1], [2, 0])
-1
>>> is_point_inside_perpendicular([1, 1], [3, 1], [4, 2])
0
- birl.utilities.dataset.line_angle_2d(point_begin, point_end, deg=True)
Compute the direction of a line given by two points
zero is the horizontal direction [1, 0]
- Parameters
- Return float
orientation
>>> [line_angle_2d([0, 0], p) for p in ((1, 0), (0, 1), (-1, 0), (0, -1))]
[0.0, 90.0, 180.0, -90.0]
>>> line_angle_2d([1, 1], [2, 3])
63.43...
>>> line_angle_2d([1, 2], [-2, -3])
-120.96...
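The documented angles match what `math.atan2` returns for the difference vector of the two points. A minimal sketch under that assumption follows; `line_angle` is a hypothetical name.

```python
import math


def line_angle(point_begin, point_end, deg=True):
    """Orientation of the vector begin -> end, zero along [1, 0]."""
    dx = point_end[0] - point_begin[0]
    dy = point_end[1] - point_begin[1]
    angle = math.atan2(dy, dx)  # radians in (-pi, pi]
    return math.degrees(angle) if deg else angle


axis_angles = [line_angle([0, 0], p) for p in ((1, 0), (0, 1), (-1, 0), (0, -1))]
steep = line_angle([1, 1], [2, 3])
backward = line_angle([1, 2], [-2, -3])
```

Using `atan2` instead of `atan(dy / dx)` keeps the quadrant information and avoids a division by zero for vertical lines.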
- birl.utilities.dataset.list_sub_folders(path_folder, name='*')
list all sub folders with particular name pattern
- Parameters
- Return list(str)
folders
>>> from birl.utilities.data_io import update_path
>>> paths = list_sub_folders(update_path('data-images'))
>>> list(map(os.path.basename, paths))
['images', 'landmarks', 'lesions_', 'rat-kidney_'...]
- birl.utilities.dataset.load_large_image(img_path)
loading very large images
Note
For loading we have to use matplotlib, since neither ImageMagick nor the other libraries (OpenCV, skimage, Pillow) are able to load images larger than 64k or 32k.
- Parameters
img_path (str) – path to the image
- Return ndarray
image
- birl.utilities.dataset.norm_angle(angle, deg=True)
normalise the angle to be in range (-180, 180) degrees
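Angle normalisation is usually done with a modulo operation. The sketch below wraps an angle into the half-open interval [-180, 180); how the documented function treats the exact boundary values is not stated here, so that detail is an assumption, as is the name `normalise_angle`.

```python
import math


def normalise_angle(angle, deg=True):
    """Wrap an angle into [-180, 180) degrees or [-pi, pi) radians."""
    half_turn = 180.0 if deg else math.pi
    # shift, wrap with modulo (always non-negative in Python), shift back
    return (angle + half_turn) % (2 * half_turn) - half_turn


wrapped = [normalise_angle(a) for a in (45, 270, -190, 360)]
```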
- birl.utilities.dataset.parse_path_scale(path_folder)
parse the scale from the given annotation path
- Parameters
path_folder (str) – path to the scale folder
- Return int
scale
>>> parse_path_scale('scale-.1pc')
nan
>>> parse_path_scale('user-JB_scale-50pc')
50
>>> parse_path_scale('scale-10pc')
10
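The three documented cases can be reproduced with a regular expression that looks for an integer between `scale-` and `pc` in the folder name. The sketch below is one such approach, not the module's code; `parse_scale` and the exact pattern are assumptions.

```python
import os
import re


def parse_scale(path_folder):
    """Extract the integer scale from a folder name like 'scale-10pc'."""
    name = os.path.basename(os.path.normpath(path_folder))
    match = re.search(r'scale-(\d+)pc', name)
    # no integer scale in the name: signal it with NaN
    return int(match.group(1)) if match else float('nan')


scales = [parse_scale(p) for p in ('user-JB_scale-50pc', 'scale-10pc')]
missing = parse_scale('scale-.1pc')
```

Returning NaN for an unparsable name follows the first doctest case, where a fractional scale is rejected.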
- birl.utilities.dataset.project_object_edge(img, dimension)
scale the image, binarise with Otsu and project to one dimension
- Parameters
img (ndarray) –
dimension (int) – select dimension for projection
- Return list(float)
>>> img = np.zeros((20, 10, 3))
>>> img[2:6, 1:7, :] = 1
>>> img[10:17, 4:6, :] = 1
>>> project_object_edge(img, 0).tolist()
[0.0, 0.0, 0.7, 0.7, 0.7, 0.7, 0.0, 0.0, 0.0, 0.0, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.0, 0.0, 0.0]
- birl.utilities.dataset.save_large_image(img_path, img)
saving large images, more than 50k x 50k
Note
For saving we have to use OpenCV, since the other libraries (matplotlib, Pillow, ITK) are not able to save images larger than 32k.
- Parameters
img_path (str) – path to the new image
img (ndarray) – image
>>> img = np.zeros((2500, 3200, 4), dtype=np.uint8)
>>> img[:, :, 0] = 255
>>> img[:, :, 1] = 127
>>> img_path = './sample-image.jpg'
>>> save_large_image(img_path, img)
>>> img2 = load_large_image(img_path)
>>> img2[0, 0].tolist()
[255, 127, 0]
>>> img.shape[:2] == img2.shape[:2]
True
>>> os.remove(img_path)
>>> img_path = './sample-image.png'
>>> save_large_image(img_path, img.astype(np.uint16) * 255)
>>> img3 = load_large_image(img_path)
>>> img.shape[:2] == img3.shape[:2]
True
>>> img3[0, 0].tolist()
[255, 127, 0]
>>> save_large_image(img_path, img2 / 255. * 1.15)  # test overwrite message
>>> os.remove(img_path)
- birl.utilities.dataset.scale_large_images_landmarks(images, landmarks)
scale images and landmarks up to maximal image size
- Parameters
- Return tuple(list(ndarray),list(ndarray))
lists of images and landmarks
>>> scale_large_images_landmarks([np.zeros((8000, 500, 3), dtype=np.uint8)],
...                              [None, None])
([array(...)], [None, None])
- birl.utilities.dataset.simplify_polygon(points, tol_degree=5)
simplify the path by dropping points that lie on the same line
- Parameters
points (ndarray) – points of the polygon
tol_degree (float) – tolerance on change in orientation
- Return list(list(float))
points of polygon
>>> pts = [[1, 2], [2, 4], [1, 5], [2, 8], [3, 8], [5, 8], [7, 8], [8, 7],
...        [8, 5], [8, 3], [8, 1], [7, 1], [6, 1], [4, 1], [3, 1], [3, 2], [2, 2]]
>>> simplify_polygon(pts)
[[1, 2], [2, 4], [1, 5], [2, 8], [7, 8], [8, 7], [8, 1], [3, 1], [3, 2]]
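One way to drop points on the same line is to compare the orientation of the incoming and outgoing segment at every vertex of the closed contour and keep the vertex only when the turn exceeds the tolerance. The sketch below follows that idea; `drop_collinear_points` is a hypothetical name, and the library may implement the simplification differently even though this sketch reproduces the documented output for the example polygon.

```python
import math


def drop_collinear_points(points, tol_degree=5):
    """Keep only vertices of a closed polygon where the direction changes."""
    count = len(points)
    kept = []
    for i in range(count):
        # neighbours on the closed contour (index -1 wraps to the last point)
        prev_pt, cur, next_pt = points[i - 1], points[i], points[(i + 1) % count]
        angle_in = math.degrees(math.atan2(cur[1] - prev_pt[1], cur[0] - prev_pt[0]))
        angle_out = math.degrees(math.atan2(next_pt[1] - cur[1], next_pt[0] - cur[0]))
        # signed turn wrapped into [-180, 180)
        turn = (angle_out - angle_in + 180) % 360 - 180
        if abs(turn) > tol_degree:
            kept.append(list(cur))
    return kept


pts = [[1, 2], [2, 4], [1, 5], [2, 8], [3, 8], [5, 8], [7, 8], [8, 7],
       [8, 5], [8, 3], [8, 1], [7, 1], [6, 1], [4, 1], [3, 1], [3, 2], [2, 2]]
simplified = drop_collinear_points(pts)
```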
- birl.utilities.dataset.CONVERT_RGB = {'hed': (skimage.color.rgb2hed, skimage.color.hed2rgb), 'hsv': (skimage.color.rgb2hsv, skimage.color.hsv2rgb), 'lab': (skimage.color.rgb2lab, skimage.color.lab2rgb), 'lch': (<function <lambda>>, <function <lambda>>), 'luv': (skimage.color.rgb2luv, skimage.color.luv2rgb), 'rgb': (<function <lambda>>, <function <lambda>>)}
pairs of forward and backward color space conversions
- birl.utilities.dataset.IMAGE_EXTENSIONS = ('.png', '.jpg', '.jpeg')
supported image extensions
- birl.utilities.dataset.MAX_IMAGE_SIZE = 5000
maximal image size for visualisations, larger images will be downscaled