There are a number of issues that impact the performance of hand tracking as a pointer.
The first is the type of sensor you are using; the second is the distance between the hand and the sensor.
The first issue relates to how much RMS depth noise the sensor delivers. Basically, depth sensors in the hobby range (anything under 1000) all suffer from different types of noise, which means that from frame to frame the exact location reported for any given point can vary by an optically observable amount.
For example, an Intel D435 might report a point at one depth in the first frame and, depending on how far the point is from the sensor, report it up to 5mm different in the next frame, or much worse at greater depths.
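One way to see this noise for yourself is to point the sensor at a static scene and measure how much each pixel's depth wobbles over time. The sketch below is a minimal, hypothetical version of that measurement; `frames` is assumed to be a stack of raw depth images you have already captured, one `std::vector<float>` of millimetre values per frame:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Estimate temporal RMS depth noise from a stack of raw depth frames
// captured with the camera and scene held still. Each inner vector is one
// frame of depth values in millimetres; all frames must be the same size.
// Returns the RMS deviation from each pixel's mean, averaged over all pixels.
float temporalRmsNoiseMm(const std::vector<std::vector<float>>& frames)
{
    if (frames.size() < 2 || frames[0].empty())
        return 0.0f;

    const std::size_t pixels = frames[0].size();
    double sumSq = 0.0;
    std::size_t samples = 0;

    for (std::size_t p = 0; p < pixels; ++p)
    {
        // Mean depth of this pixel across the whole stack.
        double mean = 0.0;
        for (const auto& f : frames) mean += f[p];
        mean /= frames.size();

        // Accumulate squared deviation of each frame's reading from that mean.
        for (const auto& f : frames)
        {
            const double d = f[p] - mean;
            sumSq += d * d;
            ++samples;
        }
    }
    return static_cast<float>(std::sqrt(sumSq / samples));
}
```

Run this at several distances and you get a rough noise-versus-range curve for your particular sensor, which is exactly the data you need for the next point.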
So the first job is finding a sensor that works well across the distance range you are interested in tracking.
The second issue is that these sensors can't see around corners very well. If, for example, you look at most of the raw depth samples from an Orbbec sensor converted to a color gradient, you will likely see quite noticeable dark halos around the edges of the closest objects. These halos are basically dead spots: the sensor has no idea what is in these areas, so it reports nothing. This is a serious weakness of what are called structured light sensors.
The Intel sensors aim to reduce this particular issue by using stereo sensors instead of mono sensors, and by using some clever post processing to fill in holes where they can. But this too is really just another form of processed noise, or misinformation.
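To make the "processed noise" point concrete, here is a deliberately naive hole-filling sketch: zero-depth pixels (the dead spots) are replaced with the last valid depth seen to their left on the same scanline. This is not Intel's actual algorithm (librealsense exposes its own `hole_filling_filter` with several fill modes), but it shows that every filled pixel is a guess, not a measurement:

```cpp
#include <cstdint>
#include <vector>

// Naive scanline hole filling. "depth" is a row-major depth image in
// millimetres where 0 means "sensor could not resolve this pixel" (the
// zero-means-invalid convention used by most hobby depth sensors).
// Dead pixels are overwritten with the last valid value to their left.
std::vector<std::uint16_t> fillHolesScanline(std::vector<std::uint16_t> depth,
                                             std::size_t width)
{
    for (std::size_t row = 0; row + width <= depth.size(); row += width)
    {
        std::uint16_t lastValid = 0;
        for (std::size_t x = 0; x < width; ++x)
        {
            std::uint16_t& d = depth[row + x];
            if (d != 0)
                lastValid = d;          // a real measurement
            else if (lastValid != 0)
                d = lastValid;          // a dead spot, filled with a guess
        }
    }
    return depth;
}
```

Any tracker downstream of a filter like this is partly consuming invented data, which is why filled regions should be treated with less confidence than measured ones.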
Even the Kinect 2.0 sensor still suffered from quite severe noise issues at times, even though it was, in theory, a superior time-of-flight based sensor.
All of this compounds into what is key to many of these issues: most of the current technologies fall apart severely when you move a hand in front of the body, which increases the noise and errors noticeably.
When it comes to using all this noisy information to try to track a hand, well, let's just say the results are rarely good out of the box.
The Kinect SDK for Windows looked to solve this in part by adding filtering code to the front end, after the skeleton was tracked. Unscented Kalman filters and other techniques such as moving-window filters, some forms of exponential filters, and predictive motion filters all help in some ways, though some cause latency issues and noticeable lag.
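The latency trade-off is easiest to see with the simplest of those techniques, an exponential filter. The sketch below (a generic illustration, not the Kinect SDK's code) smooths a 3D joint position; `alpha` near 1 trusts new samples and stays responsive but jittery, while `alpha` near 0 smooths heavily and introduces exactly the visible lag mentioned above:

```cpp
#include <array>

// Minimal exponential (EMA) filter for a 3D joint position.
// Each update blends the new sample into the running state:
//   state += alpha * (sample - state)
struct ExpFilter3
{
    std::array<float, 3> state{};
    bool primed = false;
    float alpha;   // smoothing factor in (0, 1]

    explicit ExpFilter3(float a) : alpha(a) {}

    std::array<float, 3> update(const std::array<float, 3>& sample)
    {
        if (!primed)
        {
            // First sample: adopt it directly so the filter starts on target.
            state = sample;
            primed = true;
            return state;
        }
        for (int i = 0; i < 3; ++i)
            state[i] += alpha * (sample[i] - state[i]);
        return state;
    }
};
```

With `alpha = 0.5` a sudden 2mm step in the input only moves the output 1mm on the first frame, which is the smoothing and the lag in one line of arithmetic.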
We have developed our own custom implementations in-house that merge a number of these techniques with discrete spatial filtering based on predictive comparisons and rejection of improbable jitter movements.
But all of this is quite aggressive in terms of processing power, and best suited to C++ implementations rather than C#, given C#'s performance limitations.
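The in-house code itself is not shown here, but the core jitter-rejection idea can be sketched generically: predict where the joint should be from its recent velocity, and reject any sample that jumps implausibly far from that prediction, coasting on the prediction instead. The names and the threshold below are illustrative assumptions, not the actual implementation:

```cpp
#include <array>
#include <cmath>

// Generic sketch of predictive jitter rejection for a 3D joint position.
// A constant-velocity model predicts the next position; samples that land
// farther from the prediction than maxJumpMm are treated as sensor jitter
// and discarded in favour of the prediction.
struct JitterGate3
{
    std::array<float, 3> pos{}, vel{};
    bool primed = false;
    float maxJumpMm;   // assumed per-frame plausibility threshold, in mm

    explicit JitterGate3(float maxJump) : maxJumpMm(maxJump) {}

    std::array<float, 3> update(const std::array<float, 3>& sample)
    {
        if (!primed) { pos = sample; primed = true; return pos; }

        // Constant-velocity prediction of where the joint "should" be.
        std::array<float, 3> pred;
        for (int i = 0; i < 3; ++i) pred[i] = pos[i] + vel[i];

        // Distance between the new sample and that prediction.
        float distSq = 0.0f;
        for (int i = 0; i < 3; ++i)
        {
            const float d = sample[i] - pred[i];
            distSq += d * d;
        }

        if (std::sqrt(distSq) > maxJumpMm)
        {
            // Improbable jump: treat it as jitter and coast on the prediction.
            pos = pred;
        }
        else
        {
            // Plausible motion: accept the sample and update the velocity.
            for (int i = 0; i < 3; ++i) vel[i] = sample[i] - pos[i];
            pos = sample;
        }
        return pos;
    }
};
```

Note the inner loops are branch-light, fixed-size arithmetic over `std::array`, the kind of code a C++ compiler vectorises well, which is part of why this style of filtering favours C++ over C# at skeleton-tracking frame rates.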