Performing data classification by warping space: a novel neural network architecture based on vector fields

Selected from arxiv

** By Daniel Vieira et al.**

** Compiled by Heart of the Machine**

** Participation: Jiang Siyuan, Liu Xiaokun**

Recently, vector fields have been used to analyze the optimization of generative adversarial networks (GANs), yielding useful insight into GAN limitations and ways to improve them. The paper discussed here proposes a new architecture that obtains powerful nonlinear behavior by using a vector field as its activation function. Using binary cross-entropy as the loss function, the authors optimize the vector field by stochastic gradient descent and achieve good results on small datasets.

By applying the concept of vector fields to neural networks, a large body of established mathematical and physical concepts, abstractions, and visualization methods becomes available. For example, the study uses Euler's method for solving ordinary differential equations [11] to treat data points as particles flowing along a vector field.
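Euler's method referenced above can be sketched in a few lines: starting from an initial condition, each step moves the state a small distance h along the current derivative. The toy ODE below (dx/dt = -x, not from the paper) is only there to make the update rule concrete.

```python
import math

def euler_step(x, f, h):
    """One forward-Euler step for dx/dt = f(x)."""
    return x + h * f(x)

# Integrate dx/dt = -x from x(0) = 1 with step h = 0.01 for 100 steps;
# the exact solution at t = 1 is exp(-1), and the Euler error shrinks as h -> 0.
x, h, steps = 1.0, 0.01, 100
for _ in range(steps):
    x = euler_step(x, lambda v: -v, h)
```
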

The computational experiments use three two-dimensional, non-linearly separable datasets, with vector fields built from simple Gaussian kernel functions. The loss function decreases consistently with increasing epochs across different initialization hyperparameters. The authors also analyze the experimental results in further detail.

** Paper: Vector Field Based Neural Networks**

Address of the paper: https://arxiv.org/abs/1802.08235

In this paper, we propose a new neural network architecture that draws on the rich mathematical and physical ideas behind vector fields, using a vector field as a hidden layer that performs a nonlinear transformation of the data. Data points are treated as particles that flow in the direction defined by the vector field, giving a visual characterization of how the data are transformed during classification. The architecture moves data points along the flow of the vector field from their initial distribution to a new one, with the ultimate goal of separating the different classes. The optimization problem is solved by learning this vector field through gradient descent.

** 2 Vector field neural nets**

A vector field in n-dimensional space is a smooth function K : R^n → R^n associated with the ordinary differential equation (ODE)

dX/dt = K(X),

where X ∈ R^n. A solution curve X(t) of this ODE is called a streamline of the vector field K. Given a particle at position X(t_0) = X_0 at time t_0, the physical interpretation is that each vector K(X) gives the velocity of a particle at that position in space, and the streamline is the path X(t) traced by the particle as it moves. At a time t_N > t_0, the particle is at position X(t_N).

Given a family of vector fields K(X, θ) parameterized by θ, the authors propose searching for the vector field in this family that transforms every input point X_0 such that the transformed points X(t_N) of different classes become linearly separable. Intuitively, the vector field characterizes a transformation that makes the data linearly separable.
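As noted earlier, the paper builds its vector fields from simple Gaussian kernels; the exact parameterization is not spelled out in this summary, so the sketch below assumes θ consists of kernel centers and per-kernel velocity vectors, with a bandwidth gamma. It only illustrates the general shape of a learnable K(X, θ).

```python
import numpy as np

def vector_field(X, centers, weights, gamma=1.0):
    """A candidate K(X, theta) built from Gaussian kernels (assumed form).
    X: (n, d) data points; centers: (m, d) kernel centers;
    weights: (m, d) velocity vectors; theta = (centers, weights)."""
    # pairwise squared distances between points and centers, shape (n, m)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    phi = np.exp(-gamma * d2)   # Gaussian kernel responses, shape (n, m)
    return phi @ weights        # velocity at each point, shape (n, d)
```

Because the field is a smooth function of (centers, weights), its parameters can be updated by gradient descent, as the paper describes.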

The authors use Euler's method [11] to approximate the ODE solution X(t_N) by X_N, discretizing time so that X_i ≈ X(t_0 + ih) and iterating with the vector field K(X, θ):

X_{i+1} = X_i + h K(X_i, θ),

where h is the step size and N is the number of iterations, so t_N = t_0 + Nh; both are hyperparameters, and θ denotes the parameters of the vector field. As h → 0, Euler's method computes the streamlines of K(X, θ) exactly.
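The iteration above can be written directly as a "vector field layer": N Euler steps applied to a whole batch of points. The function name and the toy constant field in the usage comment are illustrative, not from the paper.

```python
import numpy as np

def flow(X0, K, h=0.1, N=10):
    """Transform points by the Euler iteration X_{i+1} = X_i + h * K(X_i).

    X0: (n, d) initial positions; K: callable mapping (n, d) -> (n, d)
    velocities; h: step size; N: number of iterations."""
    X = np.asarray(X0, dtype=float)
    for _ in range(N):
        X = X + h * K(X)
    return X

# Example: a constant field K(X) = (1, 0) shifts every point right by N * h.
shifted = flow(np.zeros((4, 2)), lambda X: np.tile([1.0, 0.0], (X.shape[0], 1)))
```
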

Figure 1 below illustrates input data transformed by the vector field layer, together with the optimal vector field that linearly separates the data. Note that the last layer of the architecture is a linear separator, which can be implemented by logistic regression.
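The final linear separator and the binary cross-entropy loss mentioned in the introduction can be sketched as follows; the function names and weight shapes are mine, but the ingredients (a logistic output on the transformed points X_N, trained with binary cross-entropy) follow the paper's description.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(X_N, w, b):
    """Linear separator (logistic regression) applied to the transformed
    points X_N: (n, d) points, w: (d,) weights, b: scalar bias."""
    return sigmoid(X_N @ w + b)

def bce(p, y, eps=1e-12):
    """Binary cross-entropy between predictions p and labels y in {0, 1}."""
    p = np.clip(p, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
```

With zero weights the separator outputs 0.5 everywhere and the loss equals log 2, the usual starting point before gradient descent shapes both the field and the separator.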

Figure 1: From left to right, the first row shows the input data, the neural network architecture, and the distribution of data points transformed by the vector field layer. The second row shows the vector field and the warping of space.

** 4 Results and discussion**

The paper uses two scikit-learn [12] datasets (moons and circles) and a sine dataset created by the authors.
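The two scikit-learn datasets can be generated with the library's built-in functions; the sample sizes and noise levels below are illustrative, since the paper's exact settings are not given in this summary.

```python
from sklearn.datasets import make_circles, make_moons

# Two interleaving half-circles (the "moons" dataset).
X_moons, y_moons = make_moons(n_samples=200, noise=0.1, random_state=0)

# A small circle inside a larger one (the "circles" dataset).
X_circles, y_circles = make_circles(n_samples=200, noise=0.05, factor=0.5,
                                    random_state=0)
```
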

Figure 2: sin, moons and circle datasets.

Figure 3: Loss as a function of epoch on the circles dataset, for θ equal to 0.03, 0.3, and 3.0, respectively.

In Figure 4, it can be seen that the initial decision boundary becomes a hyperplane in the transformed space. The algorithm achieves good classification by bending space and pulling the center of the circle outward, but this also causes distinct points in the initial space to overlap.

Figure 4: Initial space, vector field and transformed space.

One way to mitigate overlapping data points in the transformed space is regularization, which acts as a damper, smoothing particle movement so that distinct points in the initial space do not end up overlapping in the transformed space.
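The exact regularizer is not specified in this summary; a minimal sketch, assuming an L2 penalty on the vector-field parameters θ (a common choice that damps large velocities), would add it to the binary cross-entropy. The λ = 0.0005 default matches the value reported in the Figure 5 caption.

```python
import numpy as np

def total_loss(bce_value, theta, lam=5e-4):
    """Regularized objective: BCE plus lam * sum of squared parameters.

    bce_value: scalar binary cross-entropy; theta: list of parameter
    arrays for the vector field (assumed L2 regularizer, not the paper's
    stated formula)."""
    penalty = sum(float(np.sum(t ** 2)) for t in theta)
    return bce_value + lam * penalty
```
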

Figure 5: Regularization of the sin dataset (5000 epochs, η = 3.0, λ = 0.0005).

** This article was compiled by Heart of the Machine. Please contact this official account for permission to reprint.**
