
We have a Hadoop cluster, and we are collecting metrics in order to investigate slow Spark applications.

After a long investigation of the cluster, we noticed from the Prometheus metrics that node_disk_io_now is higher than normal, and this applies to all HDFS disks on the data-node machines.

The definition of node_disk_io_now is:

node_disk_io_now (field 9) The only field that should go to zero. Incremented as requests are given to appropriate struct request_queue and decremented as they finish.
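To cross-check the Prometheus value directly on a data node, the same counter can be read from /proc/diskstats. The following is only a minimal sketch: it assumes the usual layout of that file (major number, minor number, device name, then the numbered statistics fields, so field 9 is the twelfth column), and the function name `in_flight_by_device` is made up for illustration.

```python
from typing import Dict


def in_flight_by_device(diskstats_text: str) -> Dict[str, int]:
    """Map device name -> number of I/Os currently in progress (field 9)."""
    result: Dict[str, int] = {}
    for line in diskstats_text.splitlines():
        parts = line.split()
        # Columns: major, minor, name, then at least 9 statistics fields.
        if len(parts) < 12:
            continue
        # Field 9 is the 9th statistic after the name, i.e. index 2 + 9 = 11.
        result[parts[2]] = int(parts[11])
    return result


if __name__ == "__main__":
    try:
        with open("/proc/diskstats") as f:
            stats = in_flight_by_device(f.read())
    except OSError:
        # Not a Linux host, or /proc is not mounted.
        stats = {}
    for name, count in sorted(stats.items()):
        print(f"{name}: {count} I/Os in flight")
```

Sampling this a few times per second on one data node should show whether the value stays high all the time or only spikes while Spark jobs are running.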

We want to know whether tuning kernel parameters can improve disk performance.

According to the node_disk_io_now definition, it seems that too many tasks are waiting in the queue, and maybe some kernel parameters can improve this behavior so that tasks do not stay in the queue for a long time.

King David

1 Answer


All the tricks available to the user of a hard drive can be discovered when the operating system or an application asks the disk the right questions.

You might test whether your operating system and your application recognize a reserved area called the host protected area (HPA), which can be created in Linux using the hdparm command.

https://en.wikipedia.org/wiki/Host_protected_area

There is a trap that I read about in

https://www.thomas-krenn.com/de/wiki/SSD_Over-Provisioning_mit_hdparm

that the operating system might reconfigure the hard drive(s) to ignore that setting. Linux itself seems to keep that information in /sys/module/libata/parameters/ignore_hpa, according to the link above. "1" means automatic deactivation.

Therefore you need to try out what happens, especially after a server reboot.
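One way to repeat that check after every reboot is a small script that reads the libata parameter mentioned above. This is only a sketch: the function names are made up for illustration, and the wording for "0" assumes it is simply the opposite of the "1" case (HPA left in place).

```python
from pathlib import Path
from typing import Optional

# Path taken from the article linked above.
IGNORE_HPA_PATH = Path("/sys/module/libata/parameters/ignore_hpa")


def describe_ignore_hpa(raw: str) -> str:
    """Turn the raw parameter value into a short human-readable note."""
    value = raw.strip()
    if value == "1":
        return "HPA is ignored: the kernel deactivates it and uses the full disk"
    if value == "0":
        # Assumption: 0 is the opposite of 1, so the reserved area stays hidden.
        return "HPA is honoured: the reserved area stays hidden"
    return f"unrecognised value: {value!r}"


def read_ignore_hpa(path: Path = IGNORE_HPA_PATH) -> Optional[str]:
    """Return the raw parameter value, or None if it cannot be read."""
    try:
        return path.read_text()
    except OSError:
        # libata not loaded, or not a Linux host.
        return None


if __name__ == "__main__":
    raw = read_ignore_hpa()
    print("ignore_hpa not available" if raw is None else describe_ignore_hpa(raw))
```

Running it once before and once after a reboot shows whether the setting survived.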

r2d3