This post documents three challenges faced when implementing an algorithm for controllable heliotrope video as described here. I completed this as coursework for my current degree, and this post is a slightly modified version of the report I submitted.
“Compute better paths that somehow take both image-difference AND trajectory-similarity into account.”
As no further guidance is provided in the notes, I define ‘trajectory similarity’ to mean how close the interpolated path comes to the input path, which, for our purposes, is always a linear element with known start and end points.
Rather than modifying the distance matrix itself, my method applies during the selection of the optimal path, after the optical flow calculation.
Apart from the usual consideration of how close the interpolated path comes to the target point, this adds the factor of how close it came to the actual path the user specified. The relative importance of these two factors can be tuned by the user by adjusting a weighting coefficient, β, and optionally by applying a non-linear weighting as detailed below.
Calculation of trajectory similarity
The main input path the user provides is a piecewise-linear function. Thus, if a curve-shaped path is desired, the user can generate it by increasing the number of segments in that function. My trajectory method ensures that the synthesised paths come as close as possible to the actual ‘segments’ of the function, thereby guaranteeing a trajectory that comes close to the line input by the user originally.
Thus, the output of the optical flow for each path is evaluated both on how close its end point comes to the desired target and on how similar its trajectory is to the line segment. To do this, we calculate the average distance of all advected points in the path to the line segment. Using the weighting given by β, we then combine this with the end-point similarity. For every possible path through the graph, we thus calculate its final cost p:
where
- ‘dist’ represents the distance between its two operands,
- ‘path’ is one possible path in the set of paths, P, which is generated by graph traversal, and
- ‘line_segment’ is the current piecewise-linear element.
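Written out with the symbols defined above, the cost might take the following form. This is my reconstruction (the original equation is not reproduced here, and the placement of β on the end-point term is an assumption), with q_k the k-th of the n advected points in the path:

```latex
p(\mathrm{path}) \;=\; \beta \cdot \mathrm{dist}\!\big(\mathrm{end}(\mathrm{path}),\, \mathrm{target}\big)
\;+\; (1 - \beta) \cdot \frac{1}{n} \sum_{k=1}^{n} \mathrm{dist}\!\big(q_k,\, \mathrm{line\_segment}\big)
```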
This results in a vector containing the cost assigned to each possible path. The algorithm then simply selects the path with the minimum cost.
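As a sketch of this selection step (the function names, and the exact placement of β, are my assumptions rather than the report's actual code):

```python
import numpy as np

def point_to_segment_dist(points, seg_start, seg_end):
    """Distance from each point in `points` (N, 2) to the segment seg_start-seg_end."""
    seg = seg_end - seg_start
    denom = float(seg @ seg)
    if denom == 0.0:
        return np.linalg.norm(points - seg_start, axis=1)
    # Parameter of each point's orthogonal projection, clamped to the segment.
    t = np.clip((points - seg_start) @ seg / denom, 0.0, 1.0)
    closest = seg_start + t[:, None] * seg
    return np.linalg.norm(points - closest, axis=1)

def path_costs(paths, target, seg_start, seg_end, beta):
    """Combined cost per candidate path: end-point term plus mean trajectory term."""
    costs = []
    for path in paths:  # path: (n_steps, 2) array of advected points
        end_term = np.linalg.norm(path[-1] - target)
        traj_term = point_to_segment_dist(path, seg_start, seg_end).mean()
        costs.append(beta * end_term + (1.0 - beta) * traj_term)
    return np.asarray(costs)

# The algorithm then simply picks the cheapest path:
# best = paths[np.argmin(path_costs(paths, target, a, b, beta))]
```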
Note: As the optical flow tends to result in points that are clustered together, it is more stable to calculate the distance of each point along the optical flow to a truncated version of the piecewise-linear element. Thus, if the number of substeps in the path is 3, we truncate the line segment three times as follows:
This is the implementation that can be found in my script.
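A sketch of how I read this truncation, assuming the k-th of n points is measured against the segment cut off at fraction (k + 1)/n of its length (the helper name is mine, not the script's):

```python
import numpy as np

def truncated_distances(path, seg_start, seg_end):
    """Distance of each advected point to a progressively truncated segment.

    Assumption: the k-th of n points is measured against the sub-segment
    running from seg_start to the point a fraction (k + 1) / n along the
    line, so clustered optical-flow points are not all compared against
    the far end of the full segment.
    """
    n = len(path)
    dists = np.empty(n)
    for k, point in enumerate(path):
        frac = (k + 1) / n
        trunc_end = seg_start + frac * (seg_end - seg_start)
        seg = trunc_end - seg_start
        denom = float(seg @ seg)
        t = np.clip((point - seg_start) @ seg / denom, 0.0, 1.0) if denom > 0 else 0.0
        dists[k] = np.linalg.norm(point - (seg_start + t * seg))
    return dists
```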
Because of the limited number of input images and available paths, this does not actually make a difference for the input sequence in the majority of cases unless β is very close to 1. However, tests on different datasets should reveal the effect of this additional constraint more readily.
A user might wish to specify that trajectory similarity is only important for a specific part of the line segment. In this case, we can simply apply a weighting function to the distance values summed up for every path, which would modify our equation as follows.
Where w is a (non-linear) function yielding the appropriate weights.
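Concretely, the trajectory term might gain a per-step weight like this. Again, this is my reconstruction rather than the report's own equation, with q_k the k-th of the n advected points:

```latex
p(\mathrm{path}) \;=\; \beta \cdot \mathrm{dist}\!\big(\mathrm{end}(\mathrm{path}),\, \mathrm{target}\big)
\;+\; (1 - \beta) \cdot \frac{1}{n} \sum_{k=1}^{n} w(k)\, \mathrm{dist}\!\big(q_k,\, \mathrm{line\_segment}\big)
```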
Note that weighting is not currently implemented, but it is one way the system could be extended in future.
“Render slow motion interpolation based initially on the flow between real images in the collection. If synthesized images have artefacts, also consider inventing some post-processing tricks.”
My submission implements two approaches to slow motion. Both methods allow a variable number of interpolated frames.
Creating a slow motion video requires the reconstruction of frames in between the images in the selected path. Thus the problem can be seen as ‘guessing’ the values of a discrete function which describes the pixels P of each frame over time, and for which samples at the start and end of the interval we are interested in are known. I call these images I1 and I2 respectively. Regardless of how sophisticated our method is, the approximation will always be a guess; however, we can improve it considerably by some inferences outlined in B.
The naïve method
The simplest way to achieve slow motion is to blend between the two frames that represent the discrete samples between which we wish to interpolate. The following graph illustrates how the scalar alpha, which controls the transparency, behaves for I1 and I2 over a time interval:
As expected, this method produces ghosting artefacts that are very noticeable.
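The naïve blend can be sketched in a few lines (a minimal version; the function name and frame count convention are mine):

```python
import numpy as np

def cross_dissolve(img1, img2, n_frames):
    """Naive slow motion: linearly blend I1 into I2 over n_frames in-betweens.

    alpha runs from 0 (pure I1) to 1 (pure I2); the moving object appears
    twice at partial opacity in every in-between, hence the ghosting.
    """
    frames = []
    for i in range(1, n_frames + 1):
        alpha = i / (n_frames + 1)
        frames.append((1.0 - alpha) * img1 + alpha * img2)
    return frames
```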
Slow motion based on optical flow
Using information from the optical flow, we can warp the image on a per-pixel basis to create the missing information between frames.
Thus, for every time interval, we can warp the image I1 by the result of the optical flow scaled by a factor that represents the current sub-step of the frame we wish to interpolate.
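A minimal numpy sketch of this idea, using a nearest-neighbour forward splat; the report's implementation is more involved, and the (dy, dx) layout of the flow field is an assumption:

```python
import numpy as np

def warp_forward(img, flow, t):
    """Forward-warp img by a fraction t of the optical flow.

    flow has shape (H, W, 2), holding the (dy, dx) displacement of each
    pixel from I1 to I2. Each source pixel is splatted to its rounded
    target position; pixels pushed outside the frame are dropped, which
    is what leaves the unfilled borders discussed below.
    """
    h, w = img.shape[:2]
    out = np.zeros_like(img)
    ys, xs = np.mgrid[0:h, 0:w]
    ty = np.rint(ys + t * flow[..., 0]).astype(int)
    tx = np.rint(xs + t * flow[..., 1]).astype(int)
    valid = (ty >= 0) & (ty < h) & (tx >= 0) & (tx < w)
    out[ty[valid], tx[valid]] = img[ys[valid], xs[valid]]
    return out
```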
While this produces satisfactory results in some scenarios, the method runs into significant problems if, for example, an object rotates by a significant amount between I1 and I2.
This is because we are warping the image based only on I1. To overcome this limitation, I have implemented a backwards-forwards warp that generates half the required samples from I1 and half from I2 (using an inverse transform of the optical flow results at I2). Thus, most artefacts of this kind can be eliminated, resulting in smooth, naturalistic-looking slow motion.
I also implemented one post-processing technique. To avoid black seams at the borders of the image, which result from information not being available ‘outside’ the area covered by the pixels of each frame, any pixel for which complementary information cannot be found is set to its original value. As the background in the current footage is static anyway, this technique works, but it might need to be amended for a different type of input sequence with motion at its edges.
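A sketch of that fallback, assuming a boolean mask of pixels that received no warped information is available (the helper name is mine):

```python
import numpy as np

def fill_holes_with_original(warped, original, written_mask):
    """Post-process a warped frame: any pixel that received no warped
    information falls back to its value in the original frame. This is
    fine while the background is static; footage with motion at the
    image edges would need a different strategy."""
    out = warped.copy()
    out[~written_mask] = original[~written_mask]
    return out
```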
Note: In future versions, it might be worth implementing a per-pixel motion blur based on the optical flow, which would be very straightforward given the data already available.
“Produce and use your own data.”
In addition to the above, I have produced my own data for the system. The motivation for this was the creation of stylised motion, specifically a system that allows a ball to be moved across a ‘drawing board’. The stylisation results from the fact that diagonal motion is disallowed for the ball, yielding motion similar to a shape like this:
Setting this up proved far more difficult than I had originally expected, as enough motion samples needed to be provided for the ball, with the added constraint that diagonal motion had to be more expensive in terms of frame-to-frame difference than non-diagonal motion.
As this extra bit of control was needed, I opted to produce the material digitally. The following screenshot illustrates the scene set-up in Maya:
Although this setup might seem rather simplistic, exact measurement of distances and timing of the animation was required. The animation of the ball can be described as looping through each line using the following pattern:
Figure 2: The animation of the ball
To move across the entire board, I rendered roughly 300 frames:
Figure 3: A Frame from the drawing board footage
Note that I added ambient occlusion to make the scene less artificial, i.e. to provide the optical flow with some difficulties.
Experiments carried out with the scene proved successful, as exactly the predicted behaviour of the ball is produced (please see the readme file for information on how to run this configuration of the script).
Figure 4: Left: Desired Path (Red) & Calculated Path Outcomes (Blue); Right: Selected Trajectory (time-blend)
Thus, the footage created provides a virtual drawing board in which the ‘pencil’ can only move horizontally and vertically.
Note: I additionally created and rendered a sequence of a waving flag textured with a checkerboard pattern (in order to help the optical flow) to test whether any meaningful results could be yielded by the method, for example for use in the far background of scenes.
As can be seen from the images, the wind force is animated so that information is provided of the flag moving in all the directions outlined in figure 4. Unfortunately, this experiment was unsuccessful, but I have included the footage for completeness.
Figure 5: A frame from the flag footage