Cleaning messy pose estimation

Several libraries exist for pose estimation. However, their output can be messy because of missing frames and incorrect detections, and it often needs to be cleaned up to get the best quality.

I've implemented a simple pose cleaning method to improve the quality of the pose data used in my project, and I'd like to share how I did it. The code here assumes single-person pose estimation output from AlphaPose, but the same approach can be applied to other libraries' output with a little adaptation.

In order to clean up the messy pose estimation, we need to:
1) Find correction targets
- Find missing frames
- Find incorrect detections
2) Fix the missing and incorrect frames

The full code of the pose cleaning can be found here.


Find correction targets

In the AlphaPose output, each frame has an "image_id", so we can easily find missing frames by checking this id. If the previous image_id was 27 while the current frame's image_id is 29, we know the frame with image_id 28 is missing.
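As a minimal sketch of this check (the toy `data` list below mimics the single-person AlphaPose JSON structure, but the values are made up for illustration):

```python
# toy stand-in for a loaded AlphaPose result file; field names follow the
# single-person output format, the values are made up for illustration
data = [
    {"image_id": "26.jpg", "keypoints": []},
    {"image_id": "27.jpg", "keypoints": []},
    {"image_id": "29.jpg", "keypoints": []},  # frame 28 is missing
]

# collect the frame indices that are present, then list the gaps
present = [int(d["image_id"].split(".")[0]) for d in data]
missing = [idx for idx in range(present[0], present[-1] + 1)
           if idx not in present]
print(missing)  # -> [28]
```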

However, how can we tell whether a frame contains a misdetection? I use a simple heuristic, which was also briefly mentioned in the previous post.


Whether a pose is correct is estimated by checking whether any keypoint has moved too far since the previous frame, because it is impossible for a joint to travel from one position to another far away within one frame, i.e. 1/30 of a second.

To be specific, let X be the set of keypoint coordinates and i the index of the current frame. A function f flags the frame as a misdetection if its difference from the last valid frame is larger than 50 (with a frame size of 640 x 320). This threshold of 50 may need to be adjusted for your data's frame size and the size of the person in the image. If the previous frame Xi−1 was an incorrect detection, f computes the difference with the frame before that, which was correct. However, this applies only when there is no more than one consecutive incorrect frame right before the current one. Otherwise, only the first incorrect frame is excluded, and the rest are still used to compute differences with the next frame.

In Python, filtering out misdetections can be implemented as below.

from itertools import zip_longest
import pandas as pd

def grouper(iterable, n, fillvalue=None):
    # collect data into fixed-length chunks: grouper('ABCDEF', 3) -> ABC DEF
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

x = []
wrong = 0      # total number of incorrect frames
prevwrong = 0  # consecutive incorrect frames right before the current one

for i, d in enumerate(data):  # data is the loaded AlphaPose JSON result
    frame_idx = int(d['image_id'].split('.')[0])

    # keypoints come as a flat [x1, y1, conf1, x2, y2, conf2, ...] list
    poselist = list(grouper(d['keypoints'], 3))
    pose = pd.DataFrame(poselist)  # convert to pandas for easier computation

    if i > 0:
        diff = pose - prev
        # largest absolute x/y movement of any keypoint since the last valid frame
        absdiff = max(diff.max()[:2].max(), -1 * diff.min()[:2].min())
        if absdiff > 50:
            if prevwrong:
                # more than one consecutive incorrect frame: compare
                # the following frames against this one instead
                prev = pose
            wrong += 1
            prevwrong += 1
            continue

    # select only the correct frames
    x.append(frame_idx)
    prev = pose
    prevwrong = 0



Fix missing and incorrect frames

Now that we know which frames need to be adjusted, we can fix them. To keep the implementation simple, I filtered out the incorrect frames (treating them as missing) and recovered all the missing frames using spline interpolation.

First, we use only the correct data to build the spline representation.

x is the list of valid frame indices, and ys is a list of sublists, one per coordinate.

x = []
ys = [[] for _ in range(34)]  # 17 joints x 2 coordinates = 34 values per frame
wrong = 0      # total number of incorrect frames
prevwrong = 0  # consecutive incorrect frames right before the current one

for i, d in enumerate(data):  # data is the loaded AlphaPose JSON result
    frame_idx = int(d['image_id'].split('.')[0])

    poselist = list(grouper(d['keypoints'], 3))
    pose = pd.DataFrame(poselist)  # convert to pandas for easier computation

    if i > 0:
        diff = pose - prev
        absdiff = max(diff.max()[:2].max(), -1 * diff.min()[:2].min())
        if absdiff > 50:
            if prevwrong:
                prev = pose
            wrong += 1
            prevwrong += 1
            continue

    # select only the correct frames
    x.append(frame_idx)
    y_idx = 0
    for row in poselist:
        for value in row[:2]:  # keep x and y, drop the confidence value
            ys[y_idx].append(value)
            y_idx += 1
    prev = pose
    prevwrong = 0

Then we fit a spline approximation to x and ys using SciPy.

from scipy import interpolate

tcks = []
for y in ys:
    tck = interpolate.splrep(x, y)
    tcks.append(tck)

And then we recover the missing (including incorrect) frames from this spline representation with the following function.

def recover_frame(frame_idx):
    # evaluate each coordinate's spline at the missing frame index
    keypoints = []
    for tck in tcks:
        kp = interpolate.splev(frame_idx, tck)
        keypoints.append(float(kp))
    return keypoints
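Putting it together, a self-contained sketch of the recovery step might look like the following. The two coordinate tracks here are synthetic (only two tracks instead of 34, and made-up values), just to show how missing frames are filled in from the fitted splines:

```python
from scipy import interpolate

# toy data: two coordinate tracks sampled at the valid frame indices,
# with frame 3 missing (values are synthetic, for illustration only)
x = [0, 1, 2, 4, 5, 6]
ys = [[10.0, 12.0, 14.0, 18.0, 20.0, 22.0],  # e.g. one keypoint's x track
      [5.0, 5.5, 6.0, 7.0, 7.5, 8.0]]        # the same keypoint's y track

# fit one spline per coordinate track
tcks = [interpolate.splrep(x, y) for y in ys]

def recover_frame(frame_idx):
    # evaluate each coordinate's spline at the missing frame index
    return [float(interpolate.splev(frame_idx, tck)) for tck in tcks]

# fill in every index in the full range that has no valid detection
full_range = range(x[0], x[-1] + 1)
cleaned = {idx: recover_frame(idx) for idx in full_range if idx not in x}
print(cleaned)  # interpolated keypoints for the missing frame 3
```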


Result

Here is a sample result of cleaning the AlphaPose output.
As you can see, the cleaned version is much more stable than the original.


