Lenovo P620 warning

Lenovo switched to a new motherboard vendor back in April. One of my units was bricked during a BIOS update, and I had to wait 3 months for a replacement. The new board revision has a flawed Ethernet controller that leads to hardware panics several times a week. Lenovo refuses to acknowledge this, even though the Linux kernel log explicitly says Hardware Error.

Beware.

[Fri Jun  3 09:29:54 2022] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 514
[Fri Jun  3 09:29:54 2022] {1}[Hardware Error]: It has been corrected by h/w and requires no further action
[Fri Jun  3 09:29:54 2022] {1}[Hardware Error]: event severity: corrected
[Fri Jun  3 09:29:54 2022] {1}[Hardware Error]:  Error 0, type: corrected
[Fri Jun  3 09:29:54 2022] {1}[Hardware Error]:  fru_text: PcieError
[Fri Jun  3 09:29:54 2022] {1}[Hardware Error]:   section_type: PCIe error
[Fri Jun  3 09:29:54 2022] {1}[Hardware Error]:   port_type: 0, PCIe end point
[Fri Jun  3 09:29:54 2022] {1}[Hardware Error]:   version: 0.2
[Fri Jun  3 09:29:54 2022] {1}[Hardware Error]:   command: 0x0407, status: 0x0010
[Fri Jun  3 09:29:54 2022] {1}[Hardware Error]:   device_id: 0000:01:00.0
[Fri Jun  3 09:29:54 2022] {1}[Hardware Error]:   slot: 0
[Fri Jun  3 09:29:54 2022] {1}[Hardware Error]:   secondary_bus: 0x00
[Fri Jun  3 09:29:54 2022] {1}[Hardware Error]:   vendor_id: 0x1d6a, device_id: 0x07b1
[Fri Jun  3 09:29:54 2022] {1}[Hardware Error]:   class_code: 000002
[Fri Jun  3 09:29:54 2022] {1}[Hardware Error]:   bridge: secondary_status: 0x0000, control: 0x0000
[Fri Jun  3 09:29:54 2022] atlantic 0000:01:00.0: aer_status: 0x00002001, aer_mask: 0x00000000
[Fri Jun  3 09:29:54 2022] atlantic 0000:01:00.0:    [ 0] RxErr                  (First)
[Fri Jun  3 09:29:54 2022] atlantic 0000:01:00.0:    [13] NonFatalErr           
[Fri Jun  3 09:29:54 2022] atlantic 0000:01:00.0: aer_layer=Physical Layer, aer_agent=Receiver ID
[Sat Jun  4 11:42:33 2022] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 514
[Sat Jun  4 11:42:33 2022] {2}[Hardware Error]: It has been corrected by h/w and requires no further action
[Sat Jun  4 11:42:33 2022] {2}[Hardware Error]: event severity: corrected
[Sat Jun  4 11:42:33 2022] {2}[Hardware Error]:  Error 0, type: corrected
[Sat Jun  4 11:42:33 2022] {2}[Hardware Error]:   section_type: PCIe error
[Sat Jun  4 11:42:33 2022] {2}[Hardware Error]:   port_type: 0, PCIe end point
[Sat Jun  4 11:42:33 2022] {2}[Hardware Error]:   version: 0.2
[Sat Jun  4 11:42:33 2022] {2}[Hardware Error]:   command: 0x0407, status: 0x0010
[Sat Jun  4 11:42:33 2022] {2}[Hardware Error]:   device_id: 0000:01:00.0
[Sat Jun  4 11:42:33 2022] {2}[Hardware Error]:   slot: 0
[Sat Jun  4 11:42:33 2022] {2}[Hardware Error]:   secondary_bus: 0x00
[Sat Jun  4 11:42:33 2022] {2}[Hardware Error]:   vendor_id: 0x1d6a, device_id: 0x07b1
[Sat Jun  4 11:42:33 2022] {2}[Hardware Error]:   class_code: 000002
[Sat Jun  4 11:42:33 2022] {2}[Hardware Error]:   bridge: secondary_status: 0x0000, control: 0x0000
[Sat Jun  4 11:42:33 2022] atlantic 0000:01:00.0: aer_status: 0x00000001, aer_mask: 0x00000000
[Sat Jun  4 11:42:33 2022] atlantic 0000:01:00.0:    [ 0] RxErr                  (First)
[Sat Jun  4 11:42:33 2022] atlantic 0000:01:00.0: aer_layer=Physical Layer, aer_agent=Receiver ID
[Sat Jun  4 12:02:22 2022] {3}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 514
[Sat Jun  4 12:02:22 2022] {3}[Hardware Error]: It has been corrected by h/w and requires no further action
[Sat Jun  4 12:02:22 2022] {3}[Hardware Error]: event severity: corrected
[Sat Jun  4 12:02:22 2022] {3}[Hardware Error]:  Error 0, type: corrected
[Sat Jun  4 12:02:22 2022] {3}[Hardware Error]:   section_type: PCIe error
[Sat Jun  4 12:02:22 2022] {3}[Hardware Error]:   port_type: 0, PCIe end point
[Sat Jun  4 12:02:22 2022] {3}[Hardware Error]:   version: 0.2
[Sat Jun  4 12:02:22 2022] {3}[Hardware Error]:   command: 0x0407, status: 0x0010
[Sat Jun  4 12:02:22 2022] {3}[Hardware Error]:   device_id: 0000:01:00.0
[Sat Jun  4 12:02:22 2022] {3}[Hardware Error]:   slot: 0
[Sat Jun  4 12:02:22 2022] {3}[Hardware Error]:   secondary_bus: 0x00
[Sat Jun  4 12:02:22 2022] {3}[Hardware Error]:   vendor_id: 0x1d6a, device_id: 0x07b1
[Sat Jun  4 12:02:22 2022] {3}[Hardware Error]:   class_code: 000002
[Sat Jun  4 12:02:22 2022] {3}[Hardware Error]:   bridge: secondary_status: 0x0000, control: 0x0000
[Sat Jun  4 12:02:22 2022] atlantic 0000:01:00.0: aer_status: 0x00000001, aer_mask: 0x00000000
[Sat Jun  4 12:02:22 2022] atlantic 0000:01:00.0:    [ 0] RxErr                  (First)
[Sat Jun  4 12:02:22 2022] atlantic 0000:01:00.0: aer_layer=Physical Layer, aer_agent=Receiver ID
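For anyone reading the aer_status values above: they are bitmasks, and the names the kernel prints underneath (RxErr, NonFatalErr) are just the set bits. Here is a minimal Python sketch for decoding them, assuming these are correctable-error status values (the log says "severity: corrected"). The function name and table are my own for illustration; the bit positions follow the PCIe AER Correctable Error Status register and the short names are the ones the kernel prints.

```python
# Bit positions of the PCIe AER Correctable Error Status register,
# labelled with the short names the Linux kernel prints in dmesg.
AER_CORRECTABLE_BITS = {
    0: "RxErr",         # Receiver Error
    6: "BadTLP",        # Bad TLP
    7: "BadDLLP",       # Bad DLLP
    8: "Rollover",      # REPLAY_NUM Rollover
    12: "Timeout",      # Replay Timer Timeout
    13: "NonFatalErr",  # Advisory Non-Fatal Error
    14: "CorrIntErr",   # Corrected Internal Error
    15: "HeaderOF",     # Header Log Overflow
}


def decode_aer_status(status: int) -> list[str]:
    """Return the names of all bits set in a correctable aer_status value."""
    names = []
    for bit in range(32):
        if status & (1 << bit):
            # Bits not in the table are reported by position only.
            names.append(AER_CORRECTABLE_BITS.get(bit, f"bit{bit}"))
    return names


print(decode_aer_status(0x00002001))  # ['RxErr', 'NonFatalErr']
print(decode_aer_status(0x00000001))  # ['RxErr']
```

So all three events are receiver errors at the physical layer, and the first one also has the advisory non-fatal bit set.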

I just put a P620 together here and am seeing the same thing in the dmesg output.

Has this caused any problems for you?

Everything seems to be working as expected here, though I am using the built-in NIC for 1GbE only, and I only stood up the machine about 24 hours ago, so it is still too early to say anything else.

Kind of surprised to find just a single NIC onboard on this model, yet they chose to give us two PS/2 ports. That alone makes an add-in card (AIC) necessary for me, as I need separate 1G and 10G ports at a minimum, so I’ll probably try installing a dual-port NIC and disabling the onboard Aquantia.

We also use the onboard NIC for 1GbE, and add a Mellanox card for 25GbE fiber.

What we see is that sometimes the whole machine just freezes for a few minutes and then comes back. This coincides with the NIC crash errors in the kernel log. Then we reboot the machine just to make sure. I would think a newer kernel might have fixed this, but we are always locked to an old OS with ADSK.
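If anyone wants to check whether their freezes line up with these errors, here is a rough Python sketch that pulls the timestamp of each hardware error event out of a log in the format posted at the top of the thread (it looks like `dmesg -T` output, but that is my assumption). The function name and regex are mine, and the day and month names are parsed assuming an English/C locale.

```python
import re
from datetime import datetime

# Matches only the first line of each event, e.g.
# "[Fri Jun  3 09:29:54 2022] {1}[Hardware Error]: Hardware error from APEI ..."
EVENT_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \{(?P<seq>\d+)\}\[Hardware Error\]: "
    r"Hardware error from APEI"
)


def hardware_error_times(log_text: str) -> list[tuple[int, datetime]]:
    """Return (event number, timestamp) for each APEI hardware error event."""
    events = []
    for line in log_text.splitlines():
        match = EVENT_RE.match(line)
        if match:
            # Collapse the double space used for single-digit days.
            stamp = " ".join(match.group("ts").split())
            when = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
            events.append((int(match.group("seq")), when))
    return events


# Sample lines copied from the log in the first post.
sample = """\
[Fri Jun  3 09:29:54 2022] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 514
[Fri Jun  3 09:29:54 2022] {1}[Hardware Error]: event severity: corrected
[Sat Jun  4 11:42:33 2022] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 514
[Sat Jun  4 12:02:22 2022] {3}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 514
"""

for seq, when in hardware_error_times(sample):
    print(seq, when.isoformat())
```

Comparing those timestamps against the times users reported a freeze should show whether the two really coincide or just happen on the same days.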