Nagelli Srikanth Goud _Simultaneous Multithreading

Published on: Mar 3, 2016
Source: www.slideshare.net


Transcripts - Nagelli Srikanth Goud _Simultaneous Multithreading

  • 1. Maximizing On-Chip Parallelism Using Simultaneous Multithreading (SMT). Presented by Nagelli Srikanth Goud
  • 2. Problem with superscalar processors
  • 3. Bottleneck of superscalar processor: 19%
  • 4. Nomenclature of Multithreading
      • Fine-grain multithreading
      • Coarse-grain multithreading
      • Simultaneous multithreading
  • 5. What's SMT
      • Superscalar + multithreading
      • Multiple instructions
      • Multiple threads
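
The three schemes named on slides 4 and 5 can be contrasted with a toy issue-slot model. This is a minimal sketch, not the presenter's method: the function names, the switch-on-stall rule for coarse-grain, and the idea of describing each thread by a list of "ready instructions per cycle" are all assumptions made for illustration.

```python
from typing import List

def issue_fine_grain(ready: List[List[int]], width: int) -> int:
    """Fine-grain MT: one thread owns every issue slot in a cycle; the owner rotates each cycle."""
    n_threads = len(ready)
    issued = 0
    for cycle in range(len(ready[0])):
        owner = cycle % n_threads
        issued += min(ready[owner][cycle], width)
    return issued

def issue_coarse_grain(ready: List[List[int]], width: int) -> int:
    """Coarse-grain MT (assumed rule): keep running one thread, switch only when it has nothing ready."""
    n_threads = len(ready)
    current = 0
    issued = 0
    for cycle in range(len(ready[0])):
        r = ready[current][cycle]
        if r == 0:
            # Stall: switch threads; this cycle's slots are lost in this simple model.
            current = (current + 1) % n_threads
        else:
            issued += min(r, width)
    return issued

def issue_smt(ready: List[List[int]], width: int) -> int:
    """SMT: slots within a single cycle may be filled by instructions from any thread."""
    issued = 0
    for cycle in range(len(ready[0])):
        issued += min(sum(thread[cycle] for thread in ready), width)
    return issued

def utilization(issued: int, width: int, cycles: int) -> float:
    """Fraction of issue slots that did useful work."""
    return issued / (width * cycles)
```

With made-up input such as `ready = [[2, 0, 1, 3], [1, 2, 0, 1]]` and `width = 4`, the SMT function fills more slots than either of the other two, because it can combine instructions from both threads in the same cycle.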
  • 6. Additional Hardware Required to Implement SMT
      • Multiple program counters
      • Multiple registers
      • Fetch unit that selects among threads
      • Per-thread return stack
      • Per-thread identifier
      • Hardware cost increases, but the performance gain justifies it.
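
The per-thread state listed on slide 6 can be pictured as a small record per hardware thread plus a fetch unit that picks which program counter to fetch from. The sketch below is illustrative only: the class names, the 32-entry register list, the fixed 4-byte instruction step, and the round-robin selection policy are assumptions, since the slide says only that the fetch unit selects.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class ThreadContext:
    """State duplicated per hardware thread (hypothetical layout)."""
    thread_id: int                      # per-thread identifier
    pc: int = 0                         # per-thread program counter
    registers: List[int] = field(default_factory=lambda: [0] * 32)  # assumed size
    return_stack: List[int] = field(default_factory=list)           # per-thread return stack
    stalled: bool = False

class FetchUnit:
    """Selects one thread per fetch; round-robin is an assumed policy."""

    def __init__(self, contexts: List[ThreadContext]) -> None:
        self.contexts = contexts
        self._next = 0

    def select(self) -> Optional[ThreadContext]:
        """Return the next non-stalled thread, or None if every thread is stalled."""
        n = len(self.contexts)
        for offset in range(n):
            idx = (self._next + offset) % n
            ctx = self.contexts[idx]
            if not ctx.stalled:
                self._next = (idx + 1) % n
                return ctx
        return None

    def fetch(self) -> Optional[Tuple[int, int]]:
        """Fetch from the selected thread and advance only that thread's PC."""
        ctx = self.select()
        if ctx is None:
            return None
        addr = ctx.pc
        ctx.pc += 4  # simplification: fixed-size instructions
        return (ctx.thread_id, addr)
```

Each fetched address is tagged with its thread identifier, which is what lets shared pipeline stages keep instructions from different threads apart.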
  • 7. Cache design for SMT

      Cache configuration       Associativity                 Replacement policy
      L1 64 KB,  L2 256 KB      L1 direct-mapped, L2 4-way    LRU
      L1 256 KB, L2 1 MB        L1 direct-mapped, L2 4-way    LRU
      L1 512 KB, L2 2 MB        L1 direct-mapped, L2 4-way    LRU
      L1 64 KB,  L2 256 KB      L1 direct-mapped, L2 8-way    LRU
      L1 256 KB, L2 1 MB        L1 direct-mapped, L2 8-way    LRU
      L1 512 KB, L2 2 MB        L1 direct-mapped, L2 8-way    LRU

      The "Sim time" column is empty on the slide: the results are still being analyzed and will be filled in during the presentation.
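
To make the size and associativity parameters on slide 7 concrete, here is a minimal sketch of a set-associative cache with LRU replacement, where one way gives the direct-mapped case. It is not Multi2Sim's cache model; the class name, the 64-byte block size, and the counters are assumptions for illustration.

```python
from collections import OrderedDict

class Cache:
    """Set-associative cache with LRU replacement; ways=1 is direct-mapped."""

    def __init__(self, size_bytes: int, ways: int, block_bytes: int = 64) -> None:
        self.block_bytes = block_bytes  # assumed block size, not given on the slide
        self.ways = ways
        self.num_sets = size_bytes // (block_bytes * ways)
        # One ordered dict per set: keys are tags, ordered oldest to most recently used.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.accesses = 0

    def access(self, addr: int) -> bool:
        """Look up an address; return True on a hit, False on a miss (with fill)."""
        self.accesses += 1
        block = addr // self.block_bytes
        index = block % self.num_sets
        tag = block // self.num_sets
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)      # mark as most recently used
            self.hits += 1
            return True
        if len(lines) >= self.ways:
            lines.popitem(last=False)   # evict the least recently used line
        lines[tag] = True
        return False

    def hit_rate(self) -> float:
        return self.hits / self.accesses if self.accesses else 0.0
```

For example, two addresses 64 KB apart collide in a 64 KB direct-mapped cache and evict each other on every access, while a 4-way 256 KB cache holds both in the same set and hits on the repeats.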
  • 8. [Chart: How different cache configurations affect access times. x-axis: configurations c1 to c6; y-axis: number of accesses (0 to 25,000); series: L1-0, L1-1, L1-2, L2-0, L2-1, MM]
  • 9. [Chart: How different cache configurations affect hit rate. x-axis: configurations c1 to c6; y-axis: number of hits (0 to 25,000); series: L1-0, L1-1, L1-2, L2-0, L2-1, MM]
  • 10. Simulation Tool: Multi2Sim
      • Used the PARSEC benchmark suite to evaluate the effect of increasing the number of threads and the number of cores
      • Modified the x86 configuration file and the memory configuration file to implement different hardware configurations
      • Studied the effect of increasing the number of threads on performance
      • Evaluated the difference in performance when varying the number of cores versus the number of threads
  • 11. Performance of Implementing SMT
      • Increasing threads with a fixed number of cores
      • Increasing cores with a fixed number of threads
  • 12. Effect of Threading on Performance
      [Chart: Effect of threading on PARSEC benchmark suite. x-axis: benchmarks; y-axis: Simtime (ns), 0 to 1,600,000; series: 2 threads, 4 threads, 6 threads, 8 threads]
  • 13. Effect of Increasing Cores on Performance
      [Chart: Effect of cores on PARSEC benchmark suite. x-axis: benchmarks (canneal, ferret, streamcluster, fluidanimate, freqmine, swaptions, vips, x264, bodytrack); y-axis: Simtime (ns), 0 to 1,600,000; series: 2 cores, 4 cores, 6 cores, 8 cores]
  • 14. SMT vs. Multiprocessor (MP)

      SMT configuration       MP configuration
      8 threads, 8-issue      8 cores, 1-issue
      6 threads, 6-issue      6 cores, 1-issue
      4 threads, 4-issue      4 cores, 1-issue
      2 threads, 2-issue      2 cores, 1-issue
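
One reading of the pairings on slide 14 is that each SMT configuration is matched with an MP configuration that has the same number of hardware threads and the same peak issue slots per cycle. The sketch below checks that arithmetic. It assumes each SMT configuration is a single core, which the slide does not state, and the `Config` class and its field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Config:
    cores: int
    threads_per_core: int
    issue_width: int  # issue slots per core per cycle

    @property
    def hardware_threads(self) -> int:
        return self.cores * self.threads_per_core

    @property
    def peak_issue_slots(self) -> int:
        return self.cores * self.issue_width

# (SMT, MP) pairs from slide 14; the single-core SMT layout is an assumption.
PAIRS = [(Config(1, n, n), Config(n, 1, 1)) for n in (8, 6, 4, 2)]

def is_matched(smt: Config, mp: Config) -> bool:
    """True when both machines expose equal thread counts and peak issue bandwidth."""
    return (smt.hardware_threads == mp.hardware_threads
            and smt.peak_issue_slots == mp.peak_issue_slots)
```

Under that assumption every pair is matched, so any difference in simulated time comes from how the issue slots are shared rather than from how many exist.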
  • 15. SMT vs. MP
      [Chart: x-axis: benchmarks (canneal, ferret, streamcluster, fluidanimate, freqmine, swaptions, vips, x264, bodytrack); y-axis: Simtime (ns), 0 to 1,600,000; series: 2, 4, 6, and 8 cores; 2, 4, 6, and 8 threads]
  • 16. Conclusion
      • Simultaneous multithreading is better than a single-chip multicore processor
      • Fewer resources are wasted
      • Performance similar to that of a multicore processor can be achieved by implementing simultaneous multithreading on chip and increasing the number of threads
      • Issue slots are used effectively
  • 17. References
      • Simultaneous Multithreading: Maximizing On-Chip Parallelism, by Dean M. Tullsen, Susan J. Eggers, and Henry M.
      • https://www.multi2sim.org
