History log of /XiangShan/src/main/scala/xiangshan/cache/mmu/L2TLBMissQueue.scala (Results 1 – 20 of 20)
Revision Date Author Comments
# 6967f5d5 02-Mar-2024 peixiaokun <[email protected]>

RVH_L2TLB: connect hptw to missqueue to deal with hptw bypass


# d0de7e4a 26-Aug-2023 peixiaokun <[email protected]>

RVH: finish the design of H extension


# 8891a219 08-Oct-2023 Yinan Xu <[email protected]>

Bump rocket-chip (#2353)


# 935edac4 21-Sep-2023 Tang Haojin <[email protected]>

chore: remove deprecated brackets, APIs, etc. (#2321)


# 3c02ee8f 25-Dec-2022 wakafa <[email protected]>

Separate Utility submodule from XiangShan (#1861)

* misc: add utility submodule

* misc: adjust to new utility framework

* bump utility: revert resetgen

* bump huancun


# f1fe8698 18-Jul-2022 Lemover <[email protected]>

l1tlb: tlb's req port can be configured to be block or non-blocked (#1656)

each tlb port can be configured as blocked or non-blocked.
For a blocked port, a req miss slot is stored in the tlb but belongs to
the core pipeline, which means only a core pipeline flush will invalidate it.

In addition, the itlb also uses the PTW Filter, but with only 4 entries.
Finally, the svinval extension is kept as usual and still works.


* tlb: add blocked-tlb support, miss frontend changes

* tlb: remove tlb's sameCycle support, result will return at next cycle

* tlb: remove param ShouldBlock, move block method into TLB module

* tlb: fix handle_block's miss_req logic

* mmu.filter: change filter's req.ready to canEnqueue

when filter can't let all the req enqueue, set the req.ready to false.
canEnqueue after filtering has long latency, so we use **_fake
without filtering, but the filter will still receive the reqs if
it can(after filtering).

* mmu.tlb: change name from BTlbPtwIO to VectorTlbPtwIO

* mmu: replace itlb's repeater to filter&repeaternb

* mmu.tlb: add TlbStorageWrapper to make TLB cleaner

more: BlockTlbRequestorIO is same with TlbRequestorIO, rm it

* mmu.tlb: rm unused param in function r_req_apply, fix syntax bug

* [WIP]icache: itlb usage from non-blocked to blocked

* mmu.tlb: change parameter NBWidth to Seq of boolean

* icache.mainpipe: fix itlb's resp.ready, not always true

* mmu.tlb: add kill signal to blocked req that needs sync but fails

in the frontend, icache, itlb and the next pipe stage may not be able to sync.
a blocked tlb will store the miss req and block reqs, which makes the itlb
unable to work. So add kill logic to let the itlb not store reqs.

One more thing: fix icache's blocked tlb handling logic

* icache.mainpipe: fix tlb's ready_recv logic

icache mainpipe has two ports, but these two ports may not both be valid
at the same time. So add a new signal tlb_need_recv to record whether
stage s1 should wait for the tlb.

* tlb: when flush, just set resp.valid and pf, pf for don't use it

* tlb: flush should concern satp.changed(for blocked io now)

* mmu.tlb: add new flush that doesn't flush reqs

Sfence.vma will flush inflight reqs and flushPipe
But some other sfence(svinval...) will not. So add new flush to
distinguish these two kinds of sfence signal

more: forgot to assign resp result when ptw back, fix it

* mmu.tlb: beautify miss_req_v and miss_v relative logic

* mmu.tlb: fix bug, when ptw back and bypass, concern level to genPPN

bug: when ptw back and bypass, forgot to concern level(1GB/2MB/4KB)
when genPPN.

by the way: some functions need ": Unit = ", add it.

* mmu.filter: fix bug of canEnqueue, mixed with tlb_req and tlb.req

* icache.mainpipe: fix bug of tlbExcp's usage, & with tlb_need_back

Icache's mainpipe has two ports, but may only port 0 is valid.
When a port is invalid, the tlbexcp should be false.(Actually, should
be ignored).
So & tlb_need_back to fix this bug.

* sfence: instr in svinval ext will also flush pipe

A difficult problem to handle:
Sfence and Svinval will flush the MMU, but only Sfence (and some svinval instrs)
will flush the pipe. For the itlb, some requestors are blocked and the
icache doesn't receive the flush for simplicity, so the itlb's blocked ptw req
should not be flushed.
It's a huge problem for the MMU to handle, whether the solution is good or bad. But
svinval is seldom used, so its efficiency is sacrificed.

* mmu: add parameter to control mmu's sfence delay latency

Difficult problem:
itlb's blocked req should not be abandoned, but sfence will flush
all inflight reqs. when the flush delays of itlb and itlb repeater are not the
same (itlb is flushed, two cycles later, itlb repeater is flushed), then itlb's
ptw req sent after flushing will also be flushed silently.
So add one parameter to control the flush delay to be the same.

* mmu.tlb: fix bug of csr.priv's delay & sfence valid when req fire

1. csr.priv's delay
csr.priv should not be delayed, csr.satp should be delayed.
for excep/intr will change csr.priv, which will be changed at one
instruction's (commit?). but csrrw satp will not, so satp has more
cycles to delay.
2. sfence
when sfence valid but blocked req fire, resp should still fire.
3. satp in TlbCsrBundle
let high bits of satp.ppn to be 0.U

* tlb&icache.mainpipe: rm commented codes

* mmu: move method genPPN to entry bundle

* l1tlb: divide l1tlb flush into flush_mmu and flush_pipe

Problem:
For l1tlb, there are blocked and non-blocked req ports.
For blocked ports, there are req slots to store missed reqs.
Some mmu flushes like Sfence should not flush miss slots, because the outside
may still need to get the tlb resp, whether the resp is wrong or correct.
For example, sfence will flush mmu and flush pipe, but won't flush
reqs inside icache, which are waiting for the tlb resp.
For example, an svinval instr will flush mmu, but not flush pipe, so
tlb should return the correct resp, although the ptw req is flushed
when tlb miss.

Solution:
divide l1tlb flush into flush_mmu and flush_pipe.
The req slot is considered to be a part of core pipeline and should
only be flushed by flush_pipe.
flush_mmu will flush mmu entries and inflight ptw reqs.
When miss but sfence flushed its ptw req, re-send.

* l1tlb: code clean, correct comments and rm unused codes

* l2tlb: divide filterSize into ifilterSize and dfilterSize

* l2tlb: prefetch req won't enter miss queue. Rename MSHR to missqueue

* l1tlb: when disable vm, ptw back should not bypass tlb and should let miss req go ahead
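
The flush_mmu / flush_pipe split described in this commit can be sketched as a small software model. This is only an illustration in Python, not the Chisel code from the commit; the class BlockedPortTlb and all of its fields and methods are hypothetical names, and the model has a single blocked port with one miss slot.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class BlockedPortTlb:
    """Toy single-port model; every name here is hypothetical."""
    entries: Dict[int, int] = field(default_factory=dict)  # vpn -> ppn
    miss_slot: Optional[int] = None   # vpn held by the blocked port's miss slot
    ptw_inflight: bool = False        # a page table walk is outstanding

    def request(self, vpn: int) -> Optional[int]:
        if vpn in self.entries:
            return self.entries[vpn]
        # Miss: remember the request in the slot and start a walk.
        self.miss_slot = vpn
        self.ptw_inflight = True
        return None

    def flush_mmu(self) -> None:
        # Flushes mmu entries and inflight ptw reqs, but keeps the miss slot.
        self.entries.clear()
        self.ptw_inflight = False

    def flush_pipe(self) -> None:
        # The miss slot is treated as part of the core pipeline.
        self.miss_slot = None

    def tick(self) -> None:
        # A slot whose ptw req was flushed re-sends it.
        if self.miss_slot is not None and not self.ptw_inflight:
            self.ptw_inflight = True

    def ptw_resp(self, vpn: int, ppn: int) -> Optional[int]:
        # Refill the entry and answer the slot if it was waiting for this vpn.
        self.entries[vpn] = ppn
        self.ptw_inflight = False
        if self.miss_slot == vpn:
            self.miss_slot = None
            return ppn
        return None
```

In this model a flush_mmu that arrives while a miss is pending drops the walk, the next tick re-sends it, and the requestor still gets its response; only flush_pipe empties the slot.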



# 92e3bfef 14-Apr-2022 Lemover <[email protected]>

mmu.l2tlb: divide missqueue into 'missqueue' and llptw (#1522)

old missqueue: cache req miss slot and mem access-er
Problem: these two funcs are totally different, which makes mq hard to handle in a single select policy.
Solution: divide these two functions into two modules.
new MissQueue: only holds reqs that miss the page cache and need to re-req the cache, a simple flushable queue
llptw: Last level ptw, only accesses ptes, priorityMux queue

* mmu: rename PTW.scala to L2TLB.scala

* mmu: rename PTW to L2TLB

* mmu: rename PtwFsm to PTW

* mmu.l2tlb: divide missqueue into 'missqueue' and llptw

old missqueue: cache req miss slot and mem access-er
Problem: these two funcs are totally different, which makes mq hard to handle
in a single select policy.
Solution: divide these two functions into two modules.
new MissQueue: only holds reqs that miss the page cache and need to re-req the
cache
llptw: Last level ptw, only accesses ptes

* mmu.l2tlb: syntax bug that misses io assign

* mmu.l2tlb: fix bug that mistakes ptw's block signal
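
The "simple flushable queue" role that this commit gives the new MissQueue can be sketched in software. This is a Python illustration under assumptions, not the Chisel module: the class name FlushableQueue, its methods, and the choice to store bare vpns are all hypothetical.

```python
from collections import deque
from typing import Deque, Optional


class FlushableQueue:
    """Bounded FIFO that can be emptied in one step (hypothetical model)."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.q: Deque[int] = deque()

    def can_enqueue(self) -> bool:
        return len(self.q) < self.size

    def enqueue(self, vpn: int) -> bool:
        # Hold a req that missed the page cache until it can re-req the cache.
        if not self.can_enqueue():
            return False
        self.q.append(vpn)
        return True

    def dequeue(self) -> Optional[int]:
        # Oldest req goes back to the page cache first.
        return self.q.popleft() if self.q else None

    def flush(self) -> None:
        # e.g. on sfence: drop everything that is waiting.
        self.q.clear()
```

The queue only replays requests in order; memory access for leaf ptes is left to a separate component, which is the split the commit message describes.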



# 1ca0e4f3 10-Dec-2021 Yinan Xu <[email protected]>

core: refactor hardware performance counters (#1335)

This commit optimizes the coding style and timing for hardware
performance counters.

By default, performance counters are RegNext(RegNext(_)).


# ca2f90a6 25-Oct-2021 Lemover <[email protected]>

pma: add pmp-like pma, software can read and write (#1169)

remove the old hard-wired pma and turn to pmp-like csr registers. the pma config is written in the pma registers.
1. pma are m-priv csr, so only m-mode csrrw can change pma
2. even in m-mode, pma should be always checked, no matter lock or not
3. so carefully write pma, make sure not to "suicide"

* pma: add pmp-like pma, just module/bundle added, not to circuit

use reserved 2 bits as atomic and cached

* pma: add pmp-like pma into pmp module

pma has two more attributes than pmp
1. atomic;
2. c/cache, if false, go to mmio.

pma uses 16+4 machine-level custom read/write csrs.
pma will always be checked even in m-mode.

* pma: remove the old MemMap in tlb, mmio arrives next cycle

* pma: ptw raise af when mmio

* pma: fix bug of match's zip with last entry

* pma: fix bug of pass reset signal through method's parameter

strange bug: to reset, passing the reset signal to a method does not
work.
import chisel3.Module.reset, then the method can access reset itself.

* pma: move some method to trait and fix bug of pma_init value

* pma: fix bug of pma init value assign way

* tlb: fix stupid bug that pf.ld not & fault_valid

* loadunit: fix bug that uop is flushed, pmp's dcache kill failed also

* ifu: mmio access needs f2_valid now

* loadunit: if mmio and have sent fastUop, flush pipe when commit

* storeunit: stu->lsq at stage1 and re-in lsq at stage2 to update mmio
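
As an illustration of the pmp-like pma config described in this commit, the sketch below decodes one 8-bit config field in Python. The R/W/X/A/L positions follow the standard RISC-V pmpcfg layout, whose bits 5 and 6 are reserved. Which reserved bit holds "atomic" and which holds "cached" is an assumption made here (bit 5 = atomic, bit 6 = cacheable), and the function names are hypothetical.

```python
def decode_pma_cfg(cfg: int) -> dict:
    """Split one 8-bit pmp-like pma config into its fields."""
    return {
        "r": bool(cfg & 0x01),
        "w": bool(cfg & 0x02),
        "x": bool(cfg & 0x04),
        "a": (cfg >> 3) & 0x3,          # address-matching mode
        "atomic": bool(cfg & 0x20),     # assumed: reserved bit 5
        "cacheable": bool(cfg & 0x40),  # assumed: reserved bit 6
        "l": bool(cfg & 0x80),
    }


def is_mmio(cfg: int) -> bool:
    # Per the commit message: if c/cache is false, the access goes to mmio.
    return not decode_pma_cfg(cfg)["cacheable"]
```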



# cd365d4c 23-Oct-2021 rvcoresjw <[email protected]>

add performance counters at core and huancun (#1156)

* Add perf counters
* add reg from hpm counter source
* add print perfcounter enable


# 45f497a4 21-Oct-2021 happy-lx <[email protected]>

asid: add asid, mainly work when hit check, not in sfence.vma (#1090)

add mmu's asid support.
1. put asid inside sram (if the entry is sram), or it will take too many sources.
2. when sfence, just flush it all, don't care asid.
3. when hit check, check asid.
4. when asid changed, flush all the inflight ptw req for safety
5. simple asid unit test:
asid 1 write, asid 2 read and check, asid 2 write, asid 1 read and check. same va, different pa

* ASID: make satp's asid bits configurable to RW
* use AsidLength to control it

* ASID: implement asid refilling and hit checking
* TODO: sfence flush with asid

* ASID: implement sfence with asid
* TODO: extract asid from SRAMTemplate

* ASID: extract asid from SRAMTemplate
* all is done
* TODO: test

* fix write to asid

* Sfence: support rs2 of sfence and fix Fence Unit
* rs2 of Sfence should be Reg and pass it to Fence Unit
* judge the value of reg instead of the index in Fence Unit

* mmu: re-write asid

now, asid is stored inside sram, so sfence just flushes it
it's a complex job to handle the case where asid is changed but
no sfence.vma is executed. when asid is changed, all the inflight
mmu reqs are flushed but entries in storage are not affected.
so the inflight reqs do not need to record asid, just use satp.asid

* tlb: fix bug of refill mask

* ci: add asid unit test

Co-authored-by: ZhangZifei <[email protected]>
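
The asid rules listed in this commit (check asid on hit, flush everything on sfence, flush only inflight reqs when asid changes) can be sketched as a small Python model. It is an illustration only, not the hardware: the class AsidTlb and its members are hypothetical names, and keying entries by (asid, vpn) is a modelling shortcut.

```python
from typing import Dict, List, Optional, Tuple


class AsidTlb:
    """Toy model of asid-tagged entries (all names hypothetical)."""

    def __init__(self) -> None:
        self.entries: Dict[Tuple[int, int], int] = {}  # (asid, vpn) -> ppn
        self.inflight: List[int] = []                  # vpns being walked
        self.satp_asid = 0

    def write_satp_asid(self, asid: int) -> None:
        # asid changed: flush inflight reqs, leave stored entries alone.
        if asid != self.satp_asid:
            self.inflight.clear()
            self.satp_asid = asid

    def refill(self, vpn: int, ppn: int) -> None:
        # Inflight reqs carry no asid of their own; use satp.asid.
        self.entries[(self.satp_asid, vpn)] = ppn

    def lookup(self, vpn: int) -> Optional[int]:
        # Hit check compares the stored asid with the current one.
        return self.entries.get((self.satp_asid, vpn))

    def sfence(self) -> None:
        # Flush it all, regardless of asid.
        self.entries.clear()
        self.inflight.clear()
```

With this model the same vpn can map to different ppns under asid 1 and asid 2, which mirrors the "same va, different pa" unit test mentioned above.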



# bd5d9cb9 18-Oct-2021 Lemover <[email protected]>

l2tlb: optimize l2tlb prefetcher, able to across 2MB (#1129)


# bc063562 14-Oct-2021 Lemover <[email protected]>

l2tlb: add next-line prefetcher (#1108)

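Prefetch timing:

either when a miss occurs,
or when a hit occurs but the hit entry was brought in by a prefetch;
when the 2MB level of the page table hits;
when the prefetch target does not cross the 4KB page frame corresponding to the 2MB entry.

The first two conditions limit the number of prefetches.

The last two conditions restrict prefetch requests to accessing only the last-level page table -> they do not occupy the FSM and (almost) never re-access the cache, which could cause a hang.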

=============
some workloads' ipc increases: gcc(5.4%), wrf(13.6%), milc(9.2%).
some workloads' ipc decreases: namd(-2.5%).
but l2tlb's perf counters are better.
So I think it is worth adding the simple next-line prefetch.

The workloads are from ci and in a cold-start state, so prefetch may seem to be much better than it should be.
But l2tlb's memory access ability is much better than what it needs, so the prefetch can be added.
=============

* mmu.l2tlb: add params filterSize

* mmu.l2tlb: add prefetch,dont work well

* mmu.l2tlb: add prefetch relative perf counter

* l2tlb: prefetch recv miss req and 'hit but pre-fetched' req

* l2tlb: fix some perf counter about prefetch

* l2tlb: prefetch not cross 2MB && not recv when 2MB level miss

* ci: when error, copy emu and SimTop.v to WAVE_HOME
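
The prefetch conditions given in this commit message can be written as one decision function. The Python sketch below is an interpretation, not the Chisel logic: the function and argument names are hypothetical, and "does not cross" is read as "vpn + 1 stays inside the same group of 512 4KB pages", i.e. the same last-level page table.

```python
from typing import Optional

PAGES_PER_2MB = 512  # 2MB / 4KB


def next_line_prefetch(vpn: int, miss: bool, hit_prefetched: bool,
                       level_2mb_hit: bool) -> Optional[int]:
    """Return the vpn to prefetch, or None if no prefetch is issued."""
    # Trigger: a miss, or a hit on an entry that was itself prefetched.
    trigger = miss or hit_prefetched
    next_vpn = vpn + 1
    # The next page must be covered by the same 2MB-level entry.
    same_region = (next_vpn // PAGES_PER_2MB) == (vpn // PAGES_PER_2MB)
    if trigger and level_2mb_hit and same_region:
        return next_vpn
    return None
```

A request for the last page of a 2MB region produces no prefetch in this sketch, because vpn + 1 would need a different last-level page table.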



# b6982e83 11-Oct-2021 Lemover <[email protected]>

pmp: add pmp support (#1092)

* [WIP] PMP: add pmp to tlb & csr(ptw part is not added)

* pmp: add pmp, unified

* pmp: add pmp, distributed but same cycle

* pmp: pmp resp next cycle

* [WIP] PMP: add l2tlb missqueue pmp support

* pmp: add pmp to ptw and regnext pmp for frontend

* pmp: fix bug of napot-match

* pmp: fix bug of method aligned

* pmp: when write cfg, update mask

* pmp: fix bug of store af getting in store unit

* tlb: fix bug, add af check(access fault from ptw)

* tlb: af may have higher priority than pf when ptw has af

* ptw: fix bug of sending paddr to pmp and recv af

* ci: add pmp unit test

* pmp: change PMPPlatformGrain to 6 (512bits)

* pmp: fix bug of read_addr

* ci: re-add pmp unit test

* l2tlb: lazymodule couldn't use @chiselName

* l2tlb: fix bug of l2tlb missqueue duplicate req's logic

filter the duplicate req:
old: when enq, change enq state to different state
new: enq + mem.req.fire, more robust

* pmp: pmp checker now supports samecycle & regenable
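
For readers unfamiliar with the napot matching mentioned in this commit, here is a minimal Python sketch of NAPOT address matching as defined by the RISC-V privileged specification. It is not the XiangShan implementation and does not model the platform grain; the function names are hypothetical.

```python
def napot_encode(base: int, size: int) -> int:
    """Build a pmpaddr value for a naturally aligned power-of-two region.

    size must be a power of two >= 8 and base must be size-aligned.
    """
    return (base >> 2) | ((size >> 3) - 1)


def napot_size(pmpaddr: int) -> int:
    # Trailing ones of pmpaddr (plus the zero above them) give the size.
    mask = pmpaddr ^ (pmpaddr + 1)
    return ((mask << 2) | 0b11) + 1


def napot_match(paddr: int, pmpaddr: int) -> bool:
    # Compare only the address bits above the region size.
    mask = pmpaddr ^ (pmpaddr + 1)
    byte_mask = (mask << 2) | 0b11
    base = pmpaddr << 2
    return ((paddr ^ base) & ~byte_mask) == 0
```

For example, napot_encode(0x80000000, 64) describes a 64-byte region, which is the same size as the 512-bit grain mentioned in the commit.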



# 1f0e2dc7 27-Sep-2021 Jiawei Lin <[email protected]>

128KB L1D + non-inclusive L2/L3 (#1051)

* L1D: provide independent meta array for load pipe

* misc: reorg files in cache dir

* chore: reorg l1d related files

* bump difftest: use clang to compile verilated files

* dcache: add BankedDataArray

* dcache: fix data read way_en

* dcache: fix banked data wmask

* dcache: replay conflict correctly

When conflict is detected:
* Report replay
* Disable fast wakeup

* dcache: fix bank addr match logic

* dcache: add bank conflict perf counter

* dcache: fix miss perf counters

* chore: make lsq data print prettier

* dcache: enable banked ecc array

* dcache: set dcache size to 128KB

* dcache: read mainpipe data from banked data array

* dcache: add independent mainpipe data read port

* dcache: revert size change

* Size will be changed after main pipe refactor

* Merge remote-tracking branch 'origin/master' into l1-size

* dcache: reduce banked data load conflict

* MainPipe: ReleaseData for all replacement even if it's clean

* dcache: set dcache size to 128KB

BREAKING CHANGE: l2 needed to provide right vaddr index to probe l1,
and it has to help l1 to avoid addr alias problem

* chore: fix merge conflict

* Change L2 to non-inclusive / Add alias bits in L1D

* debug: hard coded dup data array for debugging

* dcache: fix ptag width

* dcache: fix amo main pipe req

* dcache: when probe, use vaddr for main pipe req

* dcache: include vaddr in atomic unit req

* dcache: fix get_tag() function

* dcache: fix writeback paddr

* huancun: bump version

* dcache: erase block offset bits in release addr

* dcache: do not require probe vaddr != 0

* dcache: opt banked data read timing

* bump huancun

* dcache: fix atom unit pipe req vaddr

* dcache: simplify main pipe writeback_vaddr

* bump huancun

* dcache: remove debug data array

* Turn on all usr bits in L1

* Bump huancun

* Bump huancun

* enable L2 prefetcher

* bump huancun

* set non-inclusive L2/L3 + 128KB L1 as default config

* Use data in TLBundleB to hint ProbeAck needs data

* mmu.l2tlb: mem_resp now fills multi mq pte buffer

mq entries can just deq without accessing l2tlb cache

* dcache: handle dirty userbit

* bump huancun

* chore: l1 cache code clean up

* Remove l1plus cache
* Remove HasBankedDataArrayParameters

* Add bus pmu between L3 and Mem

* bump huancun

* dcache: fix l1 probe index generate logic

* Now right probe index will be used according to the len of alias bits

* dcache: clean up amo pipeline

* DCacheParameter rowBits will be removed in the future, now we set it to 128
to make dcache work

* dcache: fix amo word index

* bump huancun

Co-authored-by: William Wang <[email protected]>
Co-authored-by: zhanglinjuan <[email protected]>
Co-authored-by: TangDan <[email protected]>
Co-authored-by: ZhangZifei <[email protected]>
Co-authored-by: wangkaifan <[email protected]>



# 9bd9cdfa 11-Sep-2021 Lemover <[email protected]>

mmu.l2tlb: add TimeOutAssert & cut down mem resp data buffer (#1021)

* mmu.l2tlb: add object TimeOutAssert

* mmu.l2tlb: add TimeOutAssert to Repeater

* mmu.l2tlb: cut down mem req buffer from 8 ptes to 1 pte each

* util: move some utils from MMUBundle to utils



# cc5a5f22 09-Sep-2021 Lemover <[email protected]>

mmu.l2tlb: partially rewrite fsm and miss queue for bug and optimization (#1007)

* mmu.l2tlb: l2tlb now support multiple parallel mem accesses

8 missqueue entry and 1 page table worker
mq entry only supports page leaf entry
ptw supports all the three level entries

* mmu.tlb: fix bug of mq.refill_vpn and out.ready

* mmu.tlb: fix bug of perf counter

* mmu.tlb: l2tlb's l3 now 128 sets and 4 ways

* mmu.tlb: miss queue now will 'merge' same mem req addr

* mmu.l2tlb: ptw doesn't access last level pte

* mmu.l2tlb: add mem req mask into ptw

func block_decoupled doesn't work well and has bug in signal ready

* mmu.l2tlb: fix bug of sfence to fsm

add a new state s_check_pte to ptw
fsm now take memPte from outside, doesn't store it inside
mem_resp_valid will arrive a cycle before mem_resp_data

* mmu.l2tlb: rm some state in fsm

* mmu.tlb: set itlb default size

* mmu.l2tlb: unknown mq wait bug, change code style to avoid it

* mmu.l2tlb: opt, mq's entry with cache_l3 would not be blocked

* mmu.l2tlb: add many time out assert

* mmu.l2tlb: fix bug of mq enq state change & wait_id

* Revert "mmu.tlb: l2tlb's l3 now 128 sets and 4 ways"

This reverts commit 216e4192e4b01e68ce5502135318bc2473434907.

* Revert "mmu.tlb: set itlb default size"

This reverts commit 670bf1e408384964c601c0a55defbc767eb80698.

* mmu.l2tlb: set miss queue size to 9 and set filter size to 8

if they are equal, itlb may lose its req
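
The idea that the miss queue will 'merge' reqs with the same mem req addr can be sketched as follows. This Python model is an illustration under assumptions, not the Chisel code: it assumes a 64-byte memory line holding eight 8-byte ptes, and the class MissQueueModel and its methods are hypothetical names.

```python
from typing import Dict, List, Tuple

PTE_BYTES = 8    # Sv39 pte size
LINE_BYTES = 64  # assumed memory line size, i.e. 8 ptes per response


class MissQueueModel:
    """Entries waiting on the same line share one mem req (hypothetical)."""

    def __init__(self) -> None:
        # line address -> list of (entry id, pte index within the line)
        self.waiting: Dict[int, List[Tuple[int, int]]] = {}
        self.mem_reqs: List[int] = []  # line addresses actually sent to memory

    def enqueue(self, entry_id: int, pte_addr: int) -> None:
        line = pte_addr // LINE_BYTES * LINE_BYTES
        index = (pte_addr % LINE_BYTES) // PTE_BYTES
        if line not in self.waiting:
            # First req for this line: send it to memory.
            self.waiting[line] = []
            self.mem_reqs.append(line)
        self.waiting[line].append((entry_id, index))

    def mem_resp(self, line: int, ptes: List[int]) -> Dict[int, int]:
        # One response fills every entry waiting on this line;
        # each entry keeps only its own pte.
        return {eid: ptes[idx] for eid, idx in self.waiting.pop(line, [])}
```

Two entries whose ptes sit in the same line cause a single memory request here, and both are completed by the one response.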



# b848eea5 05-Sep-2021 Lemover <[email protected]>

mmu.l2tlb: l2tlb now supports multiple mem access at the same time (#1003)

* mmu.l2tlb: l2tlb now support multiple parallel mem accesses

8 missqueue entry and 1 page table worker
mq entry only supports page leaf entry
ptw supports all the three level entries

* mmu.tlb: fix bug of mq.refill_vpn and out.ready



# f320e0f0 24-Jul-2021 Yinan Xu <[email protected]>

misc: update PCL information (#899)

XiangShan is jointly released by ICT and PCL.


# 6d5ddbce 19-Jul-2021 Lemover <[email protected]>

cache,mmu: split PTW and TLB into several files (#890)