@knn_create(cloud:tab)
@knn_rebuild(cloud:tab)
@knn_delete(cloud:tab)

@knn_rnd(dim:int, max_range:numeric, N:int)

knnr_search(cloud:tab, point:tab, n:float)
@knn_rsearch(cloud:tab, point:tab, radius:float)

@knn_scan(cloud:tab, point:tab, n:float, f:fun)
@knn_rscan(cloud:tab, point:tab, radius:float, f:fun)

@knn_combine(cloud:tab, point:tab, n:float, g:fun, v:value)
@knn_rcombine(cloud:tab, point:tab, radius:float, g:fun, v:value)

These functions provides efficient ways to search in a cloud of points the Nearest Neighbors of an arbitrary point.

The cloud is given as a tab of points. A points is represented by a tab listing its coordinates (floats). Points must live all in the same euclidean space i.e., the have the same number of coordinates. This number is the dimension of the underlying space. Functions @knn_... can handle any space of dimension grether than 1. The distance used in this space is the _euclidean distance.

The function @knn_rnd(d, b, N) generates a random cloud of N points in d dimension in the bounding box [0, b]^d.

Function @knn_create builds an internal k-d tree which then used by the others functions. The argument is returned if the k-d tree can be built, and <undef> otherwise. The construction may fails if the cloud is empty, if the points have not the same dimension or if the point coordinates are not floats.

The internal k-d tree must be rebuild if the point coordinates are updated, or if some points are added or deleted to the cloud. This is done using function @knn_rebuild or by calling again @knn_create on the same tab argument.

The internal k-d tree can be released when there is no more nearest neighbors queries to process. This is done by calling @knn_delete on the corresponding cloud.


The six remaining functions are used to retrieve efficiently the nearest neighbors of a point given as the second argument :

  • The search variants return the result in a pait (a tab of two elements). The first element of the pair is a tab listing the indices of teh neighbors in the cloud tab. The second elemnt is a tab listing the distance of the corresponding neighbor to point.

  • The scan variants apply a function f on each neighbor. The number of neigbors is returned. The function f is applyed for each neighbor and takes two arguments: the index of the neigbor in the cloud tab and the distance of this tneighbor to point.

  • The combine variants apply a function g to combine the neighbors., The function g takes three arguments: an accumulator initialzed with value v0, the index of the neigbor in the cloud tab and the distance of this tneighbor to point.

  • The r... variants specify the neighborhood by a radius around a given poiunt.

  • The variants without r specify the neighborhood by looking for the n closest neigbors.

Notices:

  • Search radius and all returned distances are actually squared distances.

  • All functions leave their argument untouched.

  • k-d trees are used to avoid an exhaustive search of the neighbor. They are not always suitable for efficiently finding the nearest neighbour in high-dimensional spaces. As a general rule, the number N of points in the data should be N ≫ 2^d where d is the space dimension. Otherwise, when k-d trees are used with high-dimensional data, most of the points in the tree will be evaluated and the efficiency is no better than exhaustive search.


For example

     $cloud := @knn_rnd(2, 100, 456)
     $cloud.knn_create()

generates a cloud of 456 points in \mathbb{R}^2 with coordinates is in [0, 100]. Then, the k-d tree is build internally.

It is then possible to make several queries:

    $cloud.knn_search([5., 62.], 10)  ; returns the 10 closest neigbors of [5., 62.]
    $cloud.knn_rsearch([5., 62.], 9.) ; returns neigbors in a radius of 3 around [5., 62.]

remarks than a radious of 3.0 is specified through its square 9.0. The distances to point (5., 62.) given in the result are also squared.

The four others functions can be used to avoid the explicit construction of the query's result:

    $cloud.knn_scan([0., 0.],
                    10,
                    \$index, $dist.(print "neigborh " $index  " is at distance " (@sqrt($dist))))
will print the 10 closest points to the origin.

Using the rcombine function, we can for instance computes the barycenter of the points within a radius of 5 of the origin. The accumulator is a pair [n, p] where n is the number of points in the radius and p sums up the coordinates.

    $p := $cloud.knn_rcombine([0., 0.],
                              25.,
                              \$v, $index, $dist.([$v[0]+1, $v[1] + $cloud[$index]]),
                              [0, [0., 0.]])
    $barycenter := ($p[0] == 0 ? [0, 0] : $p[1]/$p[0])

If there is no points near enough the origin, we set the barycenter to [0, 0] by convention.

 

See also Mathematical Functions @abs    @acos    @asin    @atan    @atan2    @between    @bit_and    @bit_or    @bit_shiftl    @bit_shiftr    @ceil    @clear    @cos    @cosh    @exp    @floor    @knn_create    @knn_rebuild    @knn_delete    @knn_rnd    @knn_search    @knn_rsearch    @knn_scan    @knn_rscan    @knn_combine    @knn_rcombine    @log    @log10    @log2    @max    @min    @ode_solve    @pow    @rand    @rand_int    @random    @rnd_bernoulli    @rnd_binomial    @rnd_exponential    @rnd_gamma    @rnd_geometric    @rnd_normal    @rnd_uniform_float    @rnd_uniform_int    @round    @sin    @sinh    @sqrt    @tan    @y0    @y1