ck_slowpath+0x72/0xa9
 mutex_lock+0x1e/0x22
 shrink_icache_memory+0x49/0x213
 shrink_slab+0xe3/0x158
 try_to_free_pages+0x177/0x232
 __alloc_pages+0x1fa/0x392
 alloc_pages_current+0xd1/0xd6
 __get_free_pages+0xe/0x4d
 __pollwait+0x5e/0xdf
 :nvidia:nv_kern_poll+0x2e/0x73
 do_select+0x308/0x506
 core_sys_select+0x1a6/0x254
 sys_select+0xb5/0x157

Now I think the main problem is having the filesystem block (and do IO) in
inode reclaim. The problem is that this doesn't get accounted well and
penalizes a random allocator with a big latency spike caused by work
generated from elsewhere.

I think the best idea would be to avoid this. By design if possible, or by
deferring the hard work to an asynchronous context. If the latter, then the
fs would probably want to throttle creation of new work with queue size of
the deferred work, but let's not get into those details.

Anyway, the other obvious thing we looked at is the iprune_mutex which is
causing the cascading blocking. We could turn this into an rwsem to improve
concurrency. It is unreasonable to totally ban all potentially slow or
blocking operations in inode reclaim, so I think this is a cheap way to get
a small improvement.

This doesn't solve the whole problem of course. The process doing inode
reclaim will still take the latency hit, and concurrent processes may end up
contending on filesystem locks. So fs developers should keep these problems
in mind.

Signed-off-by: Nick Piggin
Cc: Jan Kara
Cc: Al Viro
Cc: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds