After a while the network would die, and the only way to bring it back was to restart the computer.
Nothing worked:
- NetworkManager
- sudo /etc/init.d/networking restart
- reloading kernel modules
The kernel log showed transmit timeouts on eth0:
Dec 7 08:09:36 moonbiter kernel: [ 3751.383550] NETDEV WATCHDOG: eth0: transmit timed out
Dec 7 08:09:36 moonbiter kernel: [ 3751.383626] eth0: Transmit timed out, status 0003, PHY status 786d, resetting...
Dec 7 08:09:36 moonbiter kernel: [ 3751.383939] eth0: link up, 100Mbps, full-duplex, lpa 0x45E1
I have two ethernet cards: one built-in and one in a PCI slot (lspci comes in handy):
00:12.0 Ethernet controller: VIA Technologies, Inc. VT6102 [Rhine-II] (rev 7c)
05:07.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)
Googling around turned up a lot of contradictory solutions, and none of them worked. Finally, on one of the forums I found a suggestion to check the IRQs. That could well have been the problem: I had recently bought an nVidia GeForce 7300 GT, and to fit it in I had to move my Realtek ethernet adapter to another PCI slot.
Looking into /proc/interrupts I found:
           CPU0
  0:      46332   IO-APIC-edge      timer
  1:        155   IO-APIC-edge      i8042
  6:          5   IO-APIC-edge      floppy
  7:          0   IO-APIC-edge      parport0
  8:          0   IO-APIC-edge      rtc
  9:          0   IO-APIC-fasteoi   acpi
 12:       1617   IO-APIC-edge      i8042
 14:      15664   IO-APIC-edge      ide0
 15:       1242   IO-APIC-edge      ide1
 17:        717   IO-APIC-fasteoi   eth1, HDA Intel
 20:          0   IO-APIC-fasteoi   uhci_hcd:usb1
 21:          0   IO-APIC-fasteoi   sata_via, uhci_hcd:usb3, ehci_hcd:usb5
 22:          0   IO-APIC-fasteoi   uhci_hcd:usb2
 23:      13202   IO-APIC-fasteoi   uhci_hcd:usb4, eth0
 24:      13183   IO-APIC-fasteoi   nvidia
NMI:          0
LOC:      46223
ERR:          0
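Spotting shared interrupt lines by eye gets tedious, so here is a minimal Python sketch that lists every IRQ with more than one device attached. It assumes the layout shown above: one or more per-CPU counters, a single controller/type token, then a comma-separated device list. Newer kernels print extra columns, so treat the parsing as an assumption rather than a general parser. The name shared_irqs is made up for this example.

```python
import re

# Excerpt copied from the IO-APIC listing above; whitespace does not matter to the parser.
SAMPLE = """\
CPU0
17: 717 IO-APIC-fasteoi eth1, HDA Intel
21: 0 IO-APIC-fasteoi sata_via, uhci_hcd:usb3, ehci_hcd:usb5
23: 13202 IO-APIC-fasteoi uhci_hcd:usb4, eth0
24: 13183 IO-APIC-fasteoi nvidia
NMI: 0
"""


def shared_irqs(text):
    """Return {irq_number: [devices]} for IRQs that list more than one device."""
    shared = {}
    for line in text.splitlines():
        # Only numbered IRQ lines matter; the CPU0 header and NMI/LOC/ERR lines are skipped.
        match = re.match(r"\s*(\d+):\s+(.*)", line)
        if not match:
            continue
        fields = match.group(2).split()
        # Skip the per-CPU interrupt counters (one per CPU).
        i = 0
        while i < len(fields) and fields[i].isdigit():
            i += 1
        # Assumption: exactly one token (e.g. IO-APIC-fasteoi) sits between the
        # counters and the device list, as in the listing above.
        names = " ".join(fields[i + 1:]).split(",")
        devices = [name.strip() for name in names if name.strip()]
        if len(devices) > 1:
            shared[int(match.group(1))] = devices
    return shared


for irq, devices in sorted(shared_irqs(SAMPLE).items()):
    print(irq, devices)
```

To run it against the live system, pass open("/proc/interrupts").read() instead of SAMPLE. In my case eth0 turns up on IRQ 23 together with uhci_hcd:usb4.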
And in /var/log/syslog:
Dec 7 06:59:06 moonbiter kernel: [ 1637.162744] irq 23: nobody cared (try booting with the "irqpoll" option)
Dec 7 06:59:06 moonbiter kernel: [ 1637.162748]
Dec 7 06:59:06 moonbiter kernel: [ 1637.162749] Call Trace:
Dec 7 06:59:06 moonbiter kernel: [ 1637.162751] [__report_bad_irq+30/128] __report_bad_irq+0x1e/0x80
Dec 7 06:59:06 moonbiter kernel: [ 1637.162770] [note_interrupt+643/704] note_interrupt+0x283/0x2c0
Dec 7 06:59:06 moonbiter kernel: [ 1637.162777] [handle_fasteoi_irq+221/272] handle_fasteoi_irq+0xdd/0x110
Dec 7 06:59:06 moonbiter kernel: [ 1637.162897] [_end+129724926/2130332920] :nvidia:_nv003707rm+0x1f/0x27
Dec 7 06:59:06 moonbiter kernel: [ 1637.162904] [do_IRQ+123/256] do_IRQ+0x7b/0x100
Dec 7 06:59:06 moonbiter kernel: [ 1637.162909] [ret_from_intr+0/10] ret_from_intr+0x0/0xa
Dec 7 06:59:06 moonbiter kernel: [ 1637.162914] [pci_conf1_read+0/272] pci_conf1_read+0x0/0x110
Dec 7 06:59:06 moonbiter kernel: [ 1637.162921] [__do_softirq+84/224] __do_softirq+0x54/0xe0
Dec 7 06:59:06 moonbiter kernel: [ 1637.162929] [call_softirq+28/48] call_softirq+0x1c/0x30
Dec 7 06:59:06 moonbiter kernel: [ 1637.162933] [do_softirq+53/144] do_softirq+0x35/0x90
Dec 7 06:59:06 moonbiter kernel: [ 1637.162937] [do_IRQ+128/256] do_IRQ+0x80/0x100
Dec 7 06:59:06 moonbiter kernel: [ 1637.162942] [ret_from_intr+0/10] ret_from_intr+0x0/0xa
Dec 7 06:59:06 moonbiter kernel: [ 1637.162944] [_end+127707073/2130332920] :processor:acpi_processor_idle+0x25f/0x456
Dec 7 06:59:06 moonbiter kernel: [ 1637.162964] [_end+127707063/2130332920] :processor:acpi_processor_idle+0x255/0x456
Dec 7 06:59:06 moonbiter kernel: [ 1637.162971] [_end+127706466/2130332920] :processor:acpi_processor_idle+0x0/0x456
Dec 7 06:59:06 moonbiter kernel: [ 1637.162976] [cpu_idle+112/192] cpu_idle+0x70/0xc0
Dec 7 06:59:06 moonbiter kernel: [ 1637.162982] [start_kernel+645/784] start_kernel+0x285/0x310
Dec 7 06:59:06 moonbiter kernel: [ 1637.162987] [x86_64_start_kernel+286/352] _sinittext+0x11e/0x160
Dec 7 06:59:06 moonbiter kernel: [ 1637.162991]
Dec 7 06:59:06 moonbiter kernel: [ 1637.162992] handlers:
Dec 7 06:59:06 moonbiter kernel: [ 1637.162994] [_end+128326184/2130332920] (usb_hcd_irq+0x0/0x60 [usbcore])
Dec 7 06:59:06 moonbiter kernel: [ 1637.163010] [_end+128535048/2130332920] (rhine_interrupt+0x0/0xc70 [via_rhine])
Dec 7 06:59:06 moonbiter kernel: [ 1637.163017] Disabling IRQ #23
So the IRQ assigned to eth0 had been disabled...
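The two lines that matter are buried in a long call trace, so a small filter helps. The Python sketch below pulls the "nobody cared" and "Disabling IRQ" messages out of syslog text. The patterns are taken from the log lines quoted above; the function name find_disabled_irqs is made up for this example, and other kernel versions may word the messages differently.

```python
import re

# Patterns based on the kernel messages quoted in the log above.
NOBODY_CARED = re.compile(r"irq (\d+): nobody cared")
DISABLING = re.compile(r"Disabling IRQ #(\d+)")

# Three lines copied from the syslog excerpt above.
SAMPLE_LOG = """\
Dec 7 06:59:06 moonbiter kernel: [ 1637.162744] irq 23: nobody cared (try booting with the "irqpoll" option)
Dec 7 06:59:06 moonbiter kernel: [ 1637.162992] handlers:
Dec 7 06:59:06 moonbiter kernel: [ 1637.163017] Disabling IRQ #23
"""


def find_disabled_irqs(log_text):
    """Return (unhandled, disabled): sorted IRQ numbers seen in each kind of message."""
    unhandled, disabled = set(), set()
    for line in log_text.splitlines():
        match = NOBODY_CARED.search(line)
        if match:
            unhandled.add(int(match.group(1)))
        match = DISABLING.search(line)
        if match:
            disabled.add(int(match.group(1)))
    return sorted(unhandled), sorted(disabled)


unhandled, disabled = find_disabled_irqs(SAMPLE_LOG)
print("unhandled:", unhandled, "disabled:", disabled)
```

Feeding it the contents of the real syslog file instead of SAMPLE_LOG gives the IRQ numbers to look up in /proc/interrupts.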
Following the advice from the forum, I added the "noapic" boot option to my /boot/grub/menu.lst.
And now the network is rock solid again :D.
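After a reboot it is worth confirming that the option really reached the kernel. The running kernel exposes its command line in /proc/cmdline, and a minimal Python sketch can check it. The helper name has_boot_option and the sample command line are hypothetical; my real kernel line is not shown here.

```python
def has_boot_option(cmdline, option):
    """Return True if option appears as a whole word on the kernel command line."""
    # Split on whitespace so that a longer parameter that merely starts with
    # the same letters is not mistaken for the option itself.
    return option in cmdline.split()


# Hypothetical command line, for illustration only.
example = "root=/dev/sda1 ro quiet noapic"
print(has_boot_option(example, "noapic"))
```

On the live system, pass open("/proc/cmdline").read() as the first argument.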
Just to see how /proc/interrupts looks now:
           CPU0
  0:     145441   XT-PIC-XT   timer
  1:       1186   XT-PIC-XT   i8042
  2:          0   XT-PIC-XT   cascade
  4:          0   XT-PIC-XT   uhci_hcd:usb3, ehci_hcd:usb5
  5:      24510   XT-PIC-XT   uhci_hcd:usb1, uhci_hcd:usb4, eth0
  6:          5   XT-PIC-XT   floppy
  7:      47352   XT-PIC-XT   parport0
  8:          0   XT-PIC-XT   rtc
  9:          0   XT-PIC-XT   acpi
 10:      48019   XT-PIC-XT   nvidia
 11:        740   XT-PIC-XT   sata_via, uhci_hcd:usb2, eth1, HDA Intel
 12:       7125   XT-PIC-XT   i8042
 14:      51693   XT-PIC-XT   ide0
 15:       5002   XT-PIC-XT   ide1
NMI:          0
LOC:     145342
ERR:          4
I'm not too happy with this solution, but it works...