
Kinect krishna kumar-itkan

Kinect deck - Krishna Kumar
Published on: Mar 3, 2016
Published in: Technology      
Source: www.slideshare.net


Transcripts - Kinect krishna kumar-itkan

  • 1. Where you are the controller
  • 2. Krishna Kumar, Sr. Developer Evangelist - Academic [email_address]
  • 3. Started as a $30,000 prototype. Vision: shift the world from thinking “We need to understand technology” to “Technology needs to understand us”
  • 4. Why Kinect?
      - Option A:
  • 5. Why Kinect?
      - Option You:
  • 6. What is Kinect?
  • 7. What is Kinect?
      - An extraordinary new way to play, where you are the controller
      - Voice Recognition, Face Recognition, You Recognition, Gesture Recognition: “Xbox”
  • 8. Kinect knows what to do! “Xbox?!” “Let’s Play!”
  • 9. “What are those things?” ① ③ ②
  • 10. “What are those things?” 3D Depth Sensors ① ③
  • 11. Projected Invisible IR pattern
  • 12. Depth Computation
  • 13. Depth Map
  • 14. “What are those things?” RGB Camera ②
  • 15. “What are those things?” Multi-array Microphone
  • 16. “What are those things?” Motorized Tilt
  • 17.
      - Combination of RGB camera, depth sensor and multi-array microphone
          - RGB camera delivers the three basic color components
          - Depth sensor “sees” the room in 3D
          - Microphone array locates voices by sound and filters out ambient noise
      - Software makes all the magic possible
          - Skeletal tracking
          - Face and gesture recognition
          - Audio echo cancellation
          - Audio beam forming
          - Speech recognition
  • 19. Scope of Microsoft Research
      - Significant investment
          - Investing more than $9B in R&D (MSR and product development)
      - Staff of over 850 in 55 research areas
      - International research lab locations:
          - Redmond, Washington (September 1991)
          - San Francisco, California (1995)
          - Cambridge, United Kingdom (July 1997)
          - Beijing, People’s Republic of China (November 1998)
          - Mountain View, California (July 2001)
          - Bangalore, India (January 2005)
          - Cambridge, Massachusetts (February 2008)
      - Turning ideas into reality.
    research.microsoft.com
  • 20. Scope of Microsoft Research
      - Research Areas
    research.microsoft.com
  • 21. How does Kinect know what I do? “Xbox?!” “Let’s Play!”
  • 22. Microsoft Research: Object Recognition. J. Shotton, J. Winn, C. Rother, A. Criminisi, “TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-Class Object Recognition and Segmentation,” European Conference on Computer Vision, 2006
  • 23. Microsoft Research: Human Body Tracking
      - Wide range of motion
      - But limited agility
      - And not real-time
      - Infinite number of movements
    R. Navaratnam, A. Fitzgibbon, R. Cipolla, “The Joint Manifold Model for Semi-supervised Multi-valued Regression,” IEEE International Conference on Computer Vision, 2007
  • 24. Xbox calls MSR: September 2008
      - “We need a body tracker with
          - All body motions…
          - All agilities…
          - 10x real-time…
          - For multiple players…
          - … and it has to be 3D”
      - MSR’s response?
  • 25. Teach the Computer/Machine Learning
      - Step 1: Collect A LOT of data
          - Teams visit households across the globe, filming real users
          - Hollywood motion capture studio generates billions of CG images
  • 26. Training Data
  • 27. Training
      - Millions of training images -> millions of classifier parameters
          - Very far from “embarrassingly parallel”
          - New algorithm for distributed decision-tree training
          - Major use of DryadLINQ
              - Available for download
    M. Isard, Y. Yu, “Distributed Data-Parallel Computing Using a High-Level Programming Language,” International Conference on Management of Data (SIGMOD), July 2009
  • 28. Recognize Joint Angles
      - Classify each pixel’s probability of being each of 32 body parts
      - Determine a probabilistic cluster of body configurations consistent with those parts
      - Present the most probable configuration to the user
    Figure: frames t=1, t=2, t=3
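The per-pixel step can be sketched as follows. This is a simplified illustration, assuming a classifier has already produced one probability per body part (32 parts) for every pixel; it labels each pixel with its most probable part and proposes a 2D position for a part as the mean of that part's pixels. The names `PixelProbs`, `label_pixels` and `part_centroid` are hypothetical, and the plain mean is a stand-in for the probabilistic clustering step on the slide, not the method Kinect ships.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

constexpr std::size_t kNumParts = 32;  // number of body parts on the slide

// Hypothetical input: one probability per body part for a single pixel.
using PixelProbs = std::array<float, kNumParts>;

// Label each pixel with its most probable body part (argmax over parts).
std::vector<std::size_t> label_pixels(const std::vector<PixelProbs>& probs) {
    std::vector<std::size_t> labels;
    labels.reserve(probs.size());
    for (const PixelProbs& p : probs) {
        std::size_t best = 0;
        for (std::size_t k = 1; k < kNumParts; ++k) {
            if (p[k] > p[best]) best = k;
        }
        labels.push_back(best);
    }
    return labels;
}

// Propose a 2D position (x, y) for one part as the mean of its labeled pixels.
// Pixels are stored row-major; returns nothing if no pixel carries that label.
std::optional<std::pair<float, float>> part_centroid(
        const std::vector<std::size_t>& labels, std::size_t width, std::size_t part) {
    assert(width > 0 && labels.size() % width == 0);
    double sum_x = 0.0, sum_y = 0.0;
    std::size_t count = 0;
    for (std::size_t i = 0; i < labels.size(); ++i) {
        if (labels[i] != part) continue;
        sum_x += static_cast<double>(i % width);
        sum_y += static_cast<double>(i / width);
        ++count;
    }
    if (count == 0) return std::nullopt;
    return std::make_pair(static_cast<float>(sum_x / count),
                          static_cast<float>(sum_y / count));
}
```

A real tracker would also weight pixels by their probabilities and reject outliers; the argmax-then-average version only shows the shape of the pipeline.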
  • 29. Programmers View
  • 30. Programmers View
  • 31. A Platform is Born
  • 32. Consumer Technologies Push The Envelope Price: $6000 Price: $150
  • 33. Play Space: Field of View and Operational Area
      - Play space: ideally 12 ft x 12 ft, though 10 ft x 10 ft will do
      - Player position: ideally 6 to 10 feet from the camera
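To relate the recommended player position to the depth stream, here is a minimal sketch that converts the slide's 6 to 10 feet into millimetres and classifies a measured depth. It assumes the 800 mm to 4000 mm depth range given on the depth-map slides; the `Placement` enum and `classify_player_distance` function are hypothetical.

```cpp
#include <cassert>

constexpr double kMmPerFoot = 304.8;
constexpr double kIdealMinMm = 6.0 * kMmPerFoot;   // about 1828.8 mm
constexpr double kIdealMaxMm = 10.0 * kMmPerFoot;  // about 3048 mm
constexpr int kSensorMinMm = 800;   // depth range from the depth-map slides
constexpr int kSensorMaxMm = 4000;

enum class Placement { OutOfRange, UsableButNotIdeal, Ideal };

// Classify a player's measured depth in millimetres.
// A depth of 0 (unknown) falls outside the sensor range.
Placement classify_player_distance(int depth_mm) {
    if (depth_mm < kSensorMinMm || depth_mm > kSensorMaxMm) return Placement::OutOfRange;
    if (depth_mm >= kIdealMinMm && depth_mm <= kIdealMaxMm) return Placement::Ideal;
    return Placement::UsableButNotIdeal;
}
```

The whole recommended 6 to 10 ft band sits inside the sensor's range, which leaves some margin for players who drift forward or back.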
  • 34. Lighting and Environment
      - Fluorescent or LED lighting is recommended
      - No direct light on the player
      - No direct light into the sensor lens
      - In a stage environment, all lights need to be infrared-filtered
      - To avoid lighting noise, do not let sensor lens fields of view intersect
      - Avoid playing in or next to reflective surfaces
  • 35. Clothing Considerations
      - Avoid anything that conceals your arms or legs
      - Avoid wearing flowing clothing such as scarves or long dresses and skirts
          - Long skirts hide the legs, and scarves are often mistaken for arms
      - Avoid baggy jackets or other overly baggy clothing
      - Generally, anything that hides the human form should be removed for optimal game play
      - If players with long hair are having difficulty, encourage them to pull their hair back and try again
  • 36. Kinect with more than just games
      - Use your voice or a wave of your hand to:
          - Video Kinect with others*
          - Manage your media gallery
              - Music with Last.fm*
              - HD movies with Zune
          - Get in the game with ESPN*
    * with Xbox LIVE Gold membership
  • 37. Xbox LIVE: More Ways to Connect with Family and Friends (Video Kinect, Family Center, Social Networks)
      - Connect with family and faraway friends, all from the comfort of your living room, with Xbox LIVE Video Chat
      - Experience the ease and convenience of chat on the big screen with Kinect-enabled auto camera zoom and pan
      - Family Center makes it easy to manage multiple user accounts and edit privacy settings from a single location
      - Ensure safe, secure fun for the whole family
      - Connect with friends, share photos and updates through Facebook and Twitter
  • 43. ESPN: Home-field advantage in your living room
      - Access over 3,500 live global events from ESPN3.com, including out-of-market programming, plus fresh video clips from ESPN.com
      - Enjoy features like HD programming and on-demand viewing, and participate in polls, predictions and trivia
      - See what the Xbox LIVE community is watching and declare which team you’re rooting for
      - With Kinect™, control the action right from your couch with just your voice or a wave of your hand
      - Featured content:
          - NCAA Football, NCAA Basketball, College Bowl Games, NBA, MLB, Soccer, Golf and Tennis majors
  • 48. Where can Kinect go?
      - Air Guitar Hero?
      - Shopping in 3D?
      - Remote Replacement?
      - Dance Instructor?
      - Education?
      - Personal Trainer?
      - Physical Therapy?
    “Xbox?”
  • 51. The Kinect SDK
      - Provides both unmanaged and managed APIs
          - Unmanaged API: concepts work in C++
          - Managed API: concepts work in both VB and C#
      - Samples and documentation to get you started
      - Assumes some programming experience
      - http://research.microsoft.com/kinectsdk/
  • 52. The Kinect Sensor
      - A hybrid device containing the following input devices:
          - A color (RGB) camera
          - A depth sensor
          - A microphone array
          - A tilt sensor
      - Play space control is done through a tilt motor
          - Pitch +/- 27 degrees
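Because the motor's pitch is limited to +/- 27 degrees, an application can clamp a requested angle before passing it on. A minimal sketch; `clamp_tilt` is a hypothetical helper, not an SDK function, and it does not talk to the hardware.

```cpp
#include <algorithm>
#include <cassert>

constexpr int kMaxTiltDegrees = 27;  // pitch limit from the slide: +/- 27 degrees

// Hypothetical helper: keep a requested elevation inside the motor's range.
int clamp_tilt(int requested_degrees) {
    return std::clamp(requested_degrees, -kMaxTiltDegrees, kMaxTiltDegrees);
}
```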
  • 53. RGB CAMERA MULTI-ARRAY MIC MOTORIZED TILT 3D DEPTH SENSORS
  • 54. Kinect USB cable
  • 55. The Innards
  • 56. The Vision System IR laser projector IR camera RGB camera
  • 57. Kinect video output
      - 30 Hz frame rate; 57-degree field of view
      - 8-bit VGA RGB, 640 x 480
      - 12-bit monochrome, 320 x 240
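The 57-degree figure determines how wide a strip of the room the camera sees at a given distance. A minimal sketch, assuming 57 degrees is the horizontal field of view and using simple pinhole geometry, width = 2 · d · tan(57°/2); `visible_width` is a hypothetical helper.

```cpp
#include <cassert>
#include <cmath>

constexpr double kPi = 3.14159265358979323846;
constexpr double kHorizontalFovDeg = 57.0;  // assumed to be the horizontal FOV

// Width of the visible area at a given distance (same unit as distance).
double visible_width(double distance) {
    assert(distance >= 0.0);
    const double half_angle_rad = (kHorizontalFovDeg / 2.0) * kPi / 180.0;
    return 2.0 * distance * std::tan(half_angle_rad);
}
```

With tan(28.5°) ≈ 0.543, the view is about 1.09 m wide per metre of distance, so roughly 3.26 m wide at 3 m (about 10 ft).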
  • 58. The Audio System
  • 59. Demo: Multichannel Echo Cancellation (MEC). Input stream (what the mic array hears) -> MEC -> post-MEC stream (what the APIs present)
  • 60. The Kinect SDK
      - Provides access to:
          - RGB feed
          - Depth feed
          - Skeletal tracking capabilities
          - Audio beam data
          - Speech recognition
  • 61. Data Streams
      - Color stream at 640 x 480 resolution; 32 BPP
      - Depth stream at 320 x 240 resolution; 16 BPP
      - Skeletal joint positions
      - Frame numbers, timestamps, tilt sensor data
      - Echo-canceled audio
      - Higher-level systems
          - Speech recognition
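The stream resolutions translate directly into per-frame buffer sizes and data rates. A small compile-time sketch, assuming tightly packed rows (no padding) and the 30 Hz rate from the video-output slide; `frame_bytes` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>

// Bytes in one frame, assuming rows are packed with no padding.
constexpr std::size_t frame_bytes(std::size_t width, std::size_t height,
                                  std::size_t bits_per_pixel) {
    return width * height * bits_per_pixel / 8;
}

constexpr std::size_t kColorFrame = frame_bytes(640, 480, 32);  // 1,228,800 bytes
constexpr std::size_t kDepthFrame = frame_bytes(320, 240, 16);  //   153,600 bytes
constexpr std::size_t kFps = 30;  // frame rate from the video-output slide

static_assert(kColorFrame == 640 * 480 * 4, "4 bytes per color pixel");
static_assert(kDepthFrame == 320 * 240 * 2, "2 bytes per depth pixel");
```

At 30 frames per second that comes to about 36.9 MB/s for color and 4.6 MB/s for depth.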
  • 62. RGB Camera Fundamentals
  • 63. Camera Data
  • 64. RGB Stream Format
      - Up to 640 x 480 resolution
      - Up to 32 bits per pixel
      - Data contained in ImageFrame.Image.Bits
      - Array of bytes: public byte[] Bits;
      - Array layout
          - Starts at the top left of the image
          - Moves left to right, then top to bottom
  • 65. Stride: the number of bytes from the start of one row of pixels in memory to the start of the next
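Because the buffer runs left to right and then top to bottom, a pixel's byte offset is its row times the stride plus its column times the bytes per pixel. A minimal sketch, assuming 4 bytes per pixel stored as blue, green, red, then an unused byte; that order is an assumption to check against your SDK version. The `Rgb` struct and `pixel_at` helper are hypothetical, and a `std::vector<std::uint8_t>` stands in for the `Bits` array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Rgb {
    std::uint8_t r, g, b;
};

// Read pixel (x, y) from a row-major color buffer.
// Assumes 4 bytes per pixel in B, G, R, unused order.
Rgb pixel_at(const std::vector<std::uint8_t>& bits, std::size_t stride,
             std::size_t x, std::size_t y) {
    constexpr std::size_t kBytesPerPixel = 4;
    const std::size_t offset = y * stride + x * kBytesPerPixel;
    assert(offset + kBytesPerPixel <= bits.size());
    return Rgb{bits[offset + 2], bits[offset + 1], bits[offset]};
}
```

Taking the stride as a parameter instead of computing width × 4 keeps the lookup correct if the rows are ever padded.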
  • 66. Demos::RGB Camera
  • 67. Depth Camera Fundamentals
  • 68. Camera Data
  • 69. Depth Map Format
      - 320 x 240 resolution
      - 16 bits per pixel
          - Upper 13 bits: depth in mm (800 mm to 4000 mm range)
          - Lower 3 bits: segmentation mask
      - Depth value 0 means unknown
          - Shadows, low reflectivity and high reflectivity are among the reasons
      - Segmentation index
          - 0: no player
          - 1: skeleton 0
          - 2: skeleton 1
          - …
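The 16-bit layout above can be unpacked with one shift and one mask. A minimal sketch with hypothetical names (`DepthSample`, `split`, `skeleton_slot`, `depth_is_valid`); the 800 to 4000 mm validity window and the index-to-skeleton mapping come from the slide.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

struct DepthSample {
    std::uint16_t depth_mm;     // upper 13 bits; 0 means unknown
    std::uint8_t player_index;  // lower 3 bits; 0 means no player
};

// Split a 16-bit depth-and-segmentation value into its two fields.
DepthSample split(std::uint16_t value) {
    return DepthSample{static_cast<std::uint16_t>(value >> 3),
                       static_cast<std::uint8_t>(value & 0x7)};
}

// Segmentation index 1 maps to skeleton 0, 2 to skeleton 1, and so on.
std::optional<int> skeleton_slot(const DepthSample& s) {
    if (s.player_index == 0) return std::nullopt;
    return s.player_index - 1;
}

// True only for depths inside the stated range; 0 (unknown) fails.
bool depth_is_valid(const DepthSample& s) {
    return s.depth_mm >= 800 && s.depth_mm <= 4000;
}
```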
  • 70. Depth Byte Buffer
      - ImageFrame.Image.Bits
      - Array of bytes: public byte[] Bits;
      - Array layout
          - Starts at the top left of the image
          - Moves left to right, then top to bottom
          - Represents the distance for each pixel
  • 71. Calculating Distance
      - 2 bytes per pixel (16 bits), low byte first
      - Depth stream: distance per pixel
          - Shift the second byte left by 8 and OR it with the first
          - Distance(0,0) = (int)(Bits[0] | Bits[1] << 8);
      - DepthAndPlayerIndex stream: includes the player index in the low 3 bits
          - Shift the first byte right by 3 to drop the player index, and shift the second byte left by 5
          - Distance(0,0) = (int)(Bits[0] >> 3 | Bits[1] << 5);
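Reading a pixel straight from the byte buffer follows the two formulas on the slide. A minimal sketch, assuming a 320-pixel-wide depth frame whose rows are packed with no padding (a stride of 640 bytes) and a `std::vector<std::uint8_t>` standing in for the `Bits` array; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kDepthWidth = 320;  // pixels per row, no padding assumed

// Byte index of the low byte for pixel (x, y).
std::size_t depth_byte_index(std::size_t x, std::size_t y) {
    return 2 * (y * kDepthWidth + x);
}

// Depth stream: the little-endian 16-bit value is the distance in mm.
int depth_only_mm(const std::vector<std::uint8_t>& bits, std::size_t x, std::size_t y) {
    const std::size_t i = depth_byte_index(x, y);
    assert(i + 1 < bits.size());
    return static_cast<int>(bits[i] | (bits[i + 1] << 8));
}

// DepthAndPlayerIndex stream: drop the 3-bit player index from the low byte.
int depth_with_player_mm(const std::vector<std::uint8_t>& bits, std::size_t x, std::size_t y) {
    const std::size_t i = depth_byte_index(x, y);
    assert(i + 1 < bits.size());
    return static_cast<int>((bits[i] >> 3) | (bits[i + 1] << 5));
}

// Player index lives in the low 3 bits of the first byte.
int player_index(const std::vector<std::uint8_t>& bits, std::size_t x, std::size_t y) {
    const std::size_t i = depth_byte_index(x, y);
    assert(i < bits.size());
    return bits[i] & 0x7;
}
```

The second formula is just the full 16-bit value shifted right by 3, split across the two bytes: the low byte loses its 3 index bits, and the high byte moves up by 8 − 3 = 5.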
  • 72. Demos::Depth Camera
  • 73. Skeletal Tracking Fundamentals
  • 74. Human Depth Sensing: object pattern similarity determines disparity
  • 75. Kinect Depth Sensing: IR pattern similarity determines disparity (IR Projector, IR Camera)
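Both slides rest on the standard triangulation relation used for stereo and structured-light depth. This is stated as general geometry, not as the Kinect's exact internal computation: with focal length f in pixels, baseline b between the IR projector and IR camera, and disparity d in pixels, the depth Z and its sensitivity to disparity are

```latex
Z = \frac{f\,b}{d},
\qquad
\left|\frac{\partial Z}{\partial d}\right| = \frac{f\,b}{d^{2}} = \frac{Z^{2}}{f\,b}
```

so nearer surfaces produce larger disparities, and a one-pixel disparity error costs more depth precision the farther away the surface is, growing roughly with the square of the distance.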
  • 76. Provided Data
  • 77. Pipeline Architecture Title Space
  • 78. Skeleton API
  • 79. Joints
      - Maximum two players tracked at once
          - Six player proposals
      - Each player has a set of <x, y, z> joints in meters
      - Each joint has an associated state
          - Tracked, Not Tracked, or Inferred
      - Inferred: occluded, clipped, or low-confidence joints
      - Not Tracked: rare, but your code must check for this state
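Since every joint carries a state, code that consumes joints should gate on it. A minimal sketch with hypothetical `JointState` and `Joint` types (not the SDK's own definitions) that computes the distance between two joints only when their states allow it; whether to trust Inferred joints is left to the caller.

```cpp
#include <cassert>
#include <cmath>
#include <optional>

// Hypothetical types mirroring the three joint states on the slide.
enum class JointState { NotTracked, Inferred, Tracked };

struct Joint {
    float x, y, z;  // position in meters
    JointState state;
};

// Euclidean distance between two joints in meters, or nothing if either joint
// is unusable. Inferred joints count as usable only when allow_inferred is true.
std::optional<float> joint_distance(const Joint& a, const Joint& b, bool allow_inferred) {
    auto usable = [allow_inferred](const Joint& j) {
        return j.state == JointState::Tracked ||
               (allow_inferred && j.state == JointState::Inferred);
    };
    if (!usable(a) || !usable(b)) return std::nullopt;
    const float dx = a.x - b.x;
    const float dy = a.y - b.y;
    const float dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}
```

Returning an empty optional forces each caller to handle the Not Tracked case explicitly, which is the check the slide asks for.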
  • 80. Provided Data
      - Depth and segmentation map
  • 81. Depth Map Format
      - 320 x 240 resolution
      - 16 bits per pixel
          - Upper 13 bits: depth in mm (800 mm to 4000 mm range)
          - Lower 3 bits: segmentation mask
      - Depth value 0 means unknown
          - Shadows, low reflectivity and high reflectivity are among the reasons
      - Segmentation index
          - 0: no player
          - 1: skeleton 0
          - 2: skeleton 1
          - …
  • 82. Demos::Skeletal Tracking
  • 83. Audio Fundamentals
  • 84. Going Inside the Kinect
      - Four-microphone array with hardware-based audio processing
          - Multichannel echo cancellation (MEC)
          - Sound position tracking
          - Other digital signal processing (noise suppression and reduction)
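Sound position tracking with a microphone array is commonly based on the time difference of arrival between microphones. A minimal far-field sketch for a single microphone pair, assuming a speed of sound of 343 m/s; the slide does not describe the Kinect's hardware algorithm, and the spacing used in the tests is hypothetical, not the device's geometry. `arrival_angle_deg` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

constexpr double kSpeedOfSound = 343.0;  // m/s, air at room temperature (assumed)
constexpr double kPi = 3.14159265358979323846;

// Angle of arrival in degrees (0 = straight ahead, broadside to the pair)
// for a distant source, given the arrival-time difference between two
// microphones spaced spacing_m meters apart.
double arrival_angle_deg(double delay_s, double spacing_m) {
    assert(spacing_m > 0.0);
    // Clamp so measurement noise cannot push asin outside its domain.
    const double s = std::clamp(kSpeedOfSound * delay_s / spacing_m, -1.0, 1.0);
    return std::asin(s) * 180.0 / kPi;
}
```

With four microphones, angles from several pairs can be combined to make the estimate more robust; that step is beyond this sketch.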
  • 85. Audio Data
  • 86. Speech Recognition
      - Grammar: what we are listening for
          - Code: GrammarBuilder, Choices
          - Speech Recognition Grammar Specification (SRGS)
              - C:\Program Files (x86)\Microsoft Speech Platform SDK\Samples\Sample Grammars
      - Note: set AutomaticGainControl = false
  • 87. Grammar
      <!-- Confirmation_YesNo._value: string ["Yes", "No"] -->
      <rule id="Confirmation_YesNo" scope="public">
        <example> yes </example>
        <example> no </example>
        <one-of>
          <item>
            <ruleref uri="#Confirmation_Yes" />
          </item>
          <item>
            <ruleref uri="#Confirmation_No" />
          </item>
        </one-of>
        <tag> out = rules.latest() </tag>
      </rule>
      <!-- Confirmation_Yes._value: string ["Yes"] -->
      <rule id="Confirmation_Yes" scope="public">
        <example> yes </example>
        <example> yes please </example>
        <one-of>
          <item> yes </item>
          <item> yeah </item>
          <item> yep </item>
          <item> ok </item>
        </one-of>
        <item repeat="0-1"> please </item>
        <tag> out._value = "Yes"; </tag>
      </rule>
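The Confirmation_Yes rule accepts one of four words, optionally followed by a single "please", and returns the semantic value "Yes". The same matching logic written as plain C++ may make the rule easier to follow. It only illustrates what the grammar accepts, not how the Speech Platform evaluates SRGS, and it covers only Confirmation_Yes because the items of Confirmation_No are not shown. `match_confirmation_yes` is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

// Mirrors Confirmation_Yes: one-of {yes, yeah, yep, ok}, then "please"
// at most once (repeat="0-1"). Returns the rule's semantic value on a match.
std::optional<std::string> match_confirmation_yes(std::string_view utterance) {
    constexpr std::array<std::string_view, 4> kWords{"yes", "yeah", "yep", "ok"};
    for (std::string_view w : kWords) {
        const std::string word(w);
        if (utterance == word || utterance == word + " please") {
            return "Yes";  // corresponds to the tag: out._value = "Yes";
        }
    }
    return std::nullopt;  // not accepted by this rule
}
```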
  • 88. Demos::Audio
  • 89. [email_address]
