Summary of Contents for IBM Power AC922 8335-GTW
Power Systems Problem analysis, system parts, and locations for the 8335-GTC, 8335-GTG, 8335-GTH, 8335-GTW, and 8335-GTX...
Note: Before using this information and the product it supports, read the information in “Safety notices” on page v, “Notices” on page 45, the IBM Systems Safety Notices manual, G229-9054, and the IBM Environmental Notices and User Guide, Z125-5823.
Finding parts and locations (page 25)
  8335-GTC, 8335-GTG, 8335-GTH, 8335-GTW, or 8335-GTX locations (page 25)
  8335-GTC, 8335-GTG, 8335-GTH, 8335-GTW, or 8335-GTX parts (page 31)
Notices (page 45)
  Accessibility features for IBM Power Systems servers (page 46)
  Privacy policy considerations (page 47)
  Trademarks (page 47)
  Electronic emission notices (page 47)
    Class A Notices
This product is not suitable for use with visual display workplace devices within the meaning of § 2 of the German Ordinance for Work with Visual Display Units (Bildschirmarbeitsverordnung).
Laser safety information
IBM servers can use I/O cards or features that are fiber-optic based and that utilize lasers or LEDs.
Laser compliance
IBM servers may be installed inside or outside of an IT equipment rack.
– For racks with a DC power distribution panel (PDP), connect the customer’s DC power source to the PDP. Ensure that the proper polarity is used when attaching the DC power and DC power return wiring.
• Connect any equipment that will be attached to this product to properly wired outlets.
•...
• Rack-mounted devices are not to be used as shelves or work spaces. Do not place objects on top of rack-mounted devices. In addition, do not lean on rack-mounted devices and do not use them to stabilize your body position (for example, when working from a ladder).
•...
CAUTION: Removing components from the upper positions in the rack cabinet improves rack stability during relocation. Follow these general guidelines whenever you relocate a populated rack cabinet within a room or building.
• Reduce the weight of the rack cabinet by removing equipment starting at the top of the rack cabinet.
DANGER: Rack-mounted devices are not to be used as shelves or work spaces. Do not place objects on top of rack-mounted devices. In addition, do not lean on rack-mounted devices and do not use them to stabilize your body position (for example, when working from a ladder).
Stability hazard:
•...
DANGER: Multiple power cords. The product might be equipped with multiple AC power cords or multiple DC power cables. To remove all hazardous voltages, disconnect all power cords and power cables. (L003)
CAUTION: A hot surface nearby. (L007)
Exchange only with the IBM-approved part. Recycle or discard the battery as instructed by local regulations. In the United States, IBM has a process for the collection of this battery. For information, call 1-800-426-4333. Have the IBM part number for the battery unit available when you call.
• The LIFT TOOL is intended to assist in lifting, installing, and removing units (load) up into rack elevations. It is not to be used to transport a load over major ramps, nor as a replacement for designated tools such as pallet jacks, walkies, fork trucks, and related relocation practices. When this is not practicable, specially trained persons or services must be used (for instance, riggers or movers).
Freewheeling will cause uneven cable wrapping around the winch drum, damage the cable, and may cause serious injury.
• This TOOL must be maintained correctly for IBM Service personnel to use it. IBM shall inspect its condition and verify its maintenance history before operation. Personnel reserve the right not to use the TOOL if it is inadequate.
• Yes: Continue with the next step.
• No: Go to “Resolving a power problem” on page 3.
2. Can you access the baseboard management controller (BMC) across the network?
• Yes: Continue with the next step.
• No: Go to “Resolving a BMC access problem” on page 2.
3.
Note: If the IP address setting is incorrect, go to Configuring the BMC IP address. If the MAC address is 00:00:00:00:00:00, go to “Contacting IBM service and support” on page 24.
5. Complete the following actions:
a. Power on the system to the Petitboot menu.
• Yes: Continue with the next step.
• No: Go to “Contacting IBM service and support” on page 24. This ends the procedure.
5. Perform the following actions, one at a time, until the problem is resolved:
a. Resolve any serviceable alerts that are in the event log. Go to Resolving a hardware problem.
• Yes: Continue with the next step.
• No: Continue with step 4.
3. Complete the following actions:
a. Update the system firmware. For instructions, see Getting fixes.
b. Check the system event logs. For instructions, see “Identifying a service action by using system event logs”...
d) Verify that the system is powered on by activating a serial over LAN (SOL) session through the baseboard management controller (BMC). If the system is not active, go to “Resolving a system firmware boot failure” on page 4.
e) Replace the system backplane.
•...
6. Was a service action identified?
• Yes: Continue with the next step.
• No: Go to “Collecting diagnostic data” on page 23. Then, go to “Contacting IBM service and support” on page 24. This ends the procedure.
7. Did the service action fix the problem?
• Yes: This ends the procedure.
Continue with the next step.
2. To identify the correct service procedure to perform by using operating system log information, complete the following steps:
a) Log in as the root user.
b) To display the operating system logs, type dmesg and press Enter.
3.
Table 1. Resource names, examples, and service procedures for different types of operating system logs (continued)
Resource name: PHB#xxx, where xxx is the PHB number
Type of problem requiring a service action: Detected error on PCIe bus or adapter
Service procedure: Resolve any device driver errors that are...
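As a rough aid for step 2b, the following Python sketch scans dmesg output for the PHB#xxx resource name listed in Table 1. It assumes the log text contains the literal string PHB# followed by a hexadecimal PHB number; the exact message wording varies by kernel and firmware level, so treat the pattern as an assumption to adjust. The functions read_dmesg and find_phb_numbers are hypothetical helpers, not part of any IBM tool.

```python
import re
import subprocess

# Assumption: the Table 1 resource name appears literally as "PHB#" followed by a
# hexadecimal PHB number (for example "PHB#30"). Adjust for your log format.
PHB_PATTERN = re.compile(r"PHB#([0-9A-Fa-f]+)")


def read_dmesg() -> str:
    """Run dmesg (normally as the root user, as in step 2a) and return its text."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return result.stdout


def find_phb_numbers(dmesg_text: str) -> list[str]:
    """Return each PHB number mentioned in the log text once, in first-seen order."""
    seen: list[str] = []
    for line in dmesg_text.splitlines():
        for match in PHB_PATTERN.finditer(line):
            number = match.group(1)
            if number not in seen:
                seen.append(number)
    return seen
```

For example, find_phb_numbers(read_dmesg()) returns each distinct PHB number once, which you can then look up in Table 1 to choose the service procedure.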
If the network adapter is functioning again, review the IBM support tips to confirm that there are no PCI address, driver, or firmware conflicts. Then, reinstall the new adapters one at a time until all adapters function properly.
Table 3. GPU problems and service actions for the 8335-GTC, 8335-GTG, 8335-GTH, 8335-GTW, or 8335-GTX
Problem: System unable to find GPU
Service action:
1. Verify that the GPU is properly seated.
2. Verify that the drivers for the GPU are installed.
3.
4. Does NPU chip 1 appear in the fence error log entry?
• Yes: Continue with the next step.
• No: Go to “Contacting IBM service and support” on page 24. This ends the procedure.
5. Replace the following items, one at a time, until the problem is resolved:
Note: Go to “8335-GTC, 8335-GTG, 8335-GTH,...
Table 3. GPU problems and service actions for the 8335-GTC, 8335-GTG, 8335-GTH, 8335-GTW, or 8335-GTX (continued)
Problem: GPU stops working suddenly
Service action:
1. If the system was recently installed, moved, serviced, or upgraded, verify that the GPU is seated properly.
2.
Table 4. Storage device problems and service actions (continued)
Problem: Drive stops working suddenly
Service action:
1. Verify that all internal cables are properly seated and are not physically damaged.
2. Check the system logs to verify whether the system detected a problem.
3.
Flash adapter again.
2. Ensure that the latest I/O adapter firmware is installed. For instructions, see Getting firmware fixes for IBM I/O adapters by using Fix Central.
3. Ensure that you have the latest device driver service updates by installing the latest Linux distribution fixes.
• Yes: Continue with the next step.
• No: Go to “Collecting diagnostic data” on page 23. Then, go to “Contacting IBM service and support” on page 24. This ends the procedure.
3. You can determine the GPU slot information by using the lshw command. To determine the GPU slot, complete the following steps:
a) Record the PCI bus information that is in the error message.
a) The operating system log contains information about the NVMe Flash adapter in the form of a PCI address. Record the PCI address information for the NVMe Flash adapter that failed. For example, in the operating system log message nvme 0006:01:00.0: Failed status: ffffffff, reset controller, the PCI address of the failing NVMe Flash adapter is 0006:01:00.0.
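The PCI address step can be automated with a small Python sketch that pulls the address (domain:bus:device.function) out of an nvme log line such as the one quoted above. The regular expression and the PciAddress and failing_nvme_address names are assumptions for illustration; lines in which the PCI address does not directly follow the word nvme return None.

```python
import re
from typing import NamedTuple, Optional

# Assumption: the PCI address directly follows the word "nvme", as in
# "nvme 0006:01:00.0: Failed status: ffffffff, reset controller".
NVME_PCI_ADDRESS = re.compile(
    r"\bnvme\s+([0-9a-fA-F]{4}):([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])\b"
)


class PciAddress(NamedTuple):
    domain: int
    bus: int
    device: int
    function: int

    def __str__(self) -> str:
        # Same domain:bus:device.function layout that the log message uses.
        return f"{self.domain:04x}:{self.bus:02x}:{self.device:02x}.{self.function}"


def failing_nvme_address(log_line: str) -> Optional[PciAddress]:
    """Return the PCI address named in an nvme log line, or None if there is none."""
    match = NVME_PCI_ADDRESS.search(log_line)
    if match is None:
        return None
    domain, bus, device, function = match.groups()
    return PciAddress(int(domain, 16), int(bus, 16), int(device, 16), int(function))
```

Keeping the fields as integers makes it easy to compare addresses regardless of how the hex digits were capitalized, while str() gives back the familiar 0006:01:00.0 form for recording.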
Table 7. GPU and PCIe adapter user guides (continued)
Name: QLogic
User guide: QLogic website (http://driverdownloads.qlogic.com/QLogicDriverDownloads_UI/IBM_Search.aspx)
Resolving an over temperature problem for a water-cooled 8335-GTW or 8335-GTX system
Learn how to identify the service action that is needed to resolve an over temperature problem.
Procedure
1.
8. Replace the cold plates. For instructions about how to replace the cold plates, see Removing and replacing the cold plates in the 8335-GTW or 8335-GTX.
Does the problem persist?
• Yes: Go to “Contacting IBM service and support” on page 24. This ends the procedure.
• No: This ends the procedure.
Determining and setting the thermal mode for an 8335-GTG, 8335-GTH, or 8335-GTX system
Learn how to determine and set the thermal mode that is required for your system.
You might be required to set the thermal mode of the system to a setting other than the default setting, depending on your system, adapter, and cable type.
Table 9. Thermal mode setting for the 8335-GTX system (continued)
Adapter feature code | Adapter description | Cable type | Thermal mode
EC6G | PCIe4 x16, 2-port HDR 100 Gb InfiniBand ConnectX-6 adapter | Copper | DEFAULT
EC6G | PCIe4 x16, 2-port HDR 100 Gb InfiniBand ConnectX-6 adapter | Optical | HEAVY_IO
All other adapters and cable types: DEFAULT
Setting the thermal mode
After you determine the thermal mode based on the system model, adapter, and cable type, choose one of the following options to set the thermal mode.
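The selection rule in the visible part of Table 9 can be written as a lookup. This Python sketch encodes only the EC6G rows shown here and falls back to DEFAULT, as the table does for all other adapters and cable types; the earlier rows of Table 9 that are not reproduced here must be added before relying on it. The function name thermal_mode_8335_gtx is hypothetical, and the sketch only chooses a mode; it does not set one.

```python
# Only the Table 9 rows visible in this excerpt; add the earlier rows before use.
# Keys are (adapter feature code, cable type), normalized to upper and lower case.
TABLE_9_8335_GTX = {
    ("EC6G", "copper"): "DEFAULT",
    ("EC6G", "optical"): "HEAVY_IO",
}


def thermal_mode_8335_gtx(feature_code: str, cable_type: str) -> str:
    """Return the thermal mode for an adapter and cable type on an 8335-GTX.

    Falls back to DEFAULT, matching the "All other adapters and cable types" row.
    """
    key = (feature_code.strip().upper(), cable_type.strip().lower())
    return TABLE_9_8335_GTX.get(key, "DEFAULT")
```

A dictionary keyed on (feature code, cable type) keeps each table row as one entry, so extending the sketch with the rest of Table 9 is a matter of adding lines rather than branches.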
Note: Alerts with a value of No that are displayed in the Serviceable column do not require service.
4. Starting with the first entry in the Active Alerts section with a value of Yes in the Serviceable column, complete the following steps until all entries are resolved:
a.
Perform a ping test to verify the network connectivity.
Collecting diagnostic data
Learn how to collect diagnostic data to send to IBM service and support.
About this task
To collect diagnostic data, complete the following steps:
Procedure
1.
3. To collect system event logs, type the following command and press Enter:
   openbmctool -U <username> -P <password> -H <BMC IP address or BMC host name> collect_service_data
4. Send the data that you collected during this procedure to IBM service and support. This ends the procedure.
Contacting IBM service and support
You can contact IBM service and support by telephone or through the IBM Support Portal.
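If you script step 3 of Collecting diagnostic data, a minimal Python sketch such as the following builds the documented openbmctool command as an argument list, which avoids shell quoting problems with passwords, and masks the -P value when the command is logged. The helper names are hypothetical; the sketch assumes openbmctool is installed and on PATH, and it does not run the tool itself.

```python
import shlex


def openbmctool_args(username: str, password: str, bmc_host: str, *command: str) -> list[str]:
    """Build argv for the documented form:

    openbmctool -U <username> -P <password> -H <BMC IP address or BMC host name> <command>
    """
    if not command:
        raise ValueError("an openbmctool subcommand is required")
    return ["openbmctool", "-U", username, "-P", password, "-H", bmc_host, *command]


def display_command(args: list[str]) -> str:
    """Render argv as a shell-style string with the value after -P masked."""
    masked = list(args)
    for i in range(len(masked) - 1):
        if masked[i] == "-P":
            masked[i + 1] = "REDACTED"
    return shlex.join(masked)
```

For example, subprocess.run(openbmctool_args(user, password, bmc, "collect_service_data"), check=True) would run the collection command, and passing "fru", "print" instead builds the command used to read the processor DDx.y level.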
Table 12. Front view locations (continued)
FRU description: Power switch and cable
FRU removal and replacement procedures:
• Removing and replacing the power switch and cable in the 8335-GTC
• Removing and replacing the power switch and cable in the 8335-GTG or 8335-GTH
•...
Table 13. Top view locations
FRU description: Disk drive and fan card
FRU removal and replacement procedures:
• Removing and replacing the disk drive and fan card in the 8335-GTC
• Removing and replacing the disk drive and fan card in the 8335-GTG or 8335-GTH
•...
Table 13. Top view locations (continued)
FRU description: Baseboard management controller (BMC) card
FRU removal and replacement procedures: See Removing and replacing the BMC card in the 8335-GTC, 8335-GTG, 8335-GTH, 8335-GTW, or 8335-GTX.
FRU description: Trusted platform module
FRU removal and replacement procedures: See Removing and replacing the trusted platform module in the 8335-GTC, 8335-GTG, 8335-GTH, and 8335-GTX.
Figure 3. Memory locations
The following table provides the memory locations.
Table 15. Memory locations
FRU description: DIMM 0, DIMM 1, DIMM 2, DIMM 3, DIMM 4, DIMM 5...
FRU removal and replacement procedures: See Removing and replacing memory in the 8335-GTC, 8335-GTG, or 8335-GTH or Removing and replacing memory in the 8335-GTW or 8335-GTX.
8335-GTC, 8335-GTG, 8335-GTH, 8335-GTW, or 8335-GTX parts
Use this information to find the field-replaceable unit (FRU) part number.
Rack final assembly
Figure 4. Rack final assembly
Table 16. Rack final assembly part numbers (columns: index number, part number, units per assembly, description)
45W8836: Fixed rail kit - contains left and right fixed rails and...
Table 16. Rack final assembly part numbers (continued)
01EM209: Fixed rail kit - contains left and right fixed rails and attaching screws (8335-GTW)
00E4260: Slide rail kit - contains left and right slide rails and attaching screws (8335-GTC or 8335-GTG)
00E7329: Electronic Industries Association (EIA) bracket (right...
System parts
Figure 5. System parts
Table 17. System parts (columns: index number, part number, units per assembly, description)
Top access cover assembly
PCI adapters. Use the feature type of the adapter to find the FRU number in PCIe adapter information by feature type for the 8335-GTC, 8335-GTG, 8335-GTH, 8335-GTW, or 8335-GTX.
Additional system parts (8335-GTC, 8335-GTG, or 8335-GTH air-cooled system)
Figure 6. Additional system parts (8335-GTC, 8335-GTG, or 8335-GTH air-cooled system)
Table 18. Additional system parts (8335-GTC, 8335-GTG, or 8335-GTH air-cooled system) (columns: index number, part number, units per assembly, description)
01KL842: GPU kit (includes 16 GB GPU card, heat sink, and thermal interface material (TIM)) (8335-GTC)
01KL843: GPU kit (includes 16 GB GPU card, heat sink, and thermal interface material (TIM)) (8335-GTG or 8335-GTH)
02CL680...
*8335-GTC, 8335-GTG, 8335-GTH, 8335-GTW, and 8335-GTX systems do not support mixing system processors with different DDx.y levels, different speeds, or differing numbers of cores. To determine the DDx.y level, type the following command and press Enter:
   openbmctool -U <username> -P <password> -H <BMC IP address or BMC host name> fru print
The DDx.y level is the CPU version number in the format xy (for example, a CPU version number of 22 indicates DD2.2).
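The DDx.y conversion and the no-mixing rule can be checked with a short Python sketch. It assumes you have already read each processor's CPU version number, speed, and core count (for example, from the fru print output); parsing the fru print output itself is not shown because its layout is not reproduced in this guide. The Processor type and the dd_level and mixing_problems names are hypothetical.

```python
from dataclasses import dataclass


def dd_level(version: str) -> str:
    """Convert a CPU version number in the format xy (for example "22") to "DD2.2"."""
    digits = version.strip()
    if len(digits) != 2 or not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"expected a two-digit CPU version number, got {version!r}")
    return f"DD{digits[0]}.{digits[1]}"


@dataclass(frozen=True)
class Processor:
    version: str  # CPU version number in the format xy, as read from fru print
    speed_ghz: float
    cores: int


def mixing_problems(processors: list[Processor]) -> list[str]:
    """Return the ways in which the processors differ; an empty list means no mixing."""
    problems: list[str] = []
    if len({dd_level(p.version) for p in processors}) > 1:
        problems.append("different DDx.y levels")
    if len({p.speed_ghz for p in processors}) > 1:
        problems.append("different speeds")
    if len({p.cores for p in processors}) > 1:
        problems.append("differing numbers of cores")
    return problems
```

For example, pairing the DD2.1 18 core 2.900 GHz module with the DD2.2 18 core 3.15 GHz module listed in the parts tables, that is Processor("21", 2.9, 18) with Processor("22", 3.15, 18), reports different DDx.y levels and different speeds, which this note says the system does not support.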
Additional system parts (8335-GTW or 8335-GTX water-cooled system with 4 GPUs)
Figure 7. Additional system parts (8335-GTW or 8335-GTX water-cooled system with 4 GPUs)
Table 19. Additional system parts (8335-GTW or 8335-GTX water-cooled system with 4 GPUs) (columns: index number, part number, units per assembly, description)
01EM314: GPU kit (includes spreader assembly, 16 GB GPU card, air baffle, and TIM) (8335-GTW)
02CL351: GPU kit (includes spreader assembly, 16 GB GPU card, air baffle, and TIM) (8335-GTX)
02CL683: GPU kit (includes spreader assembly, 32 GB GPU card,...
Table 19. Additional system parts (8335-GTW or 8335-GTX water-cooled system with 4 GPUs) (continued)
01EM321: DD2.1 18 core 2.900 GHz system processor module kit (includes system processor module, processor tray, 4 mm hex driver, module replacement tool, and air pump) (8335-GTW)*
02CL566: DD2.2 18 core 3.15 GHz system processor module kit...
Additional system parts (8335-GTW or 8335-GTX water-cooled system with 6 GPUs)
Figure 8. Additional system parts (8335-GTW or 8335-GTX water-cooled system with 6 GPUs)
Table 20. Additional system parts (8335-GTW or 8335-GTX water-cooled system with 6 GPUs) (columns: index number, part number, units per assembly, description)
01EM314: GPU kit (includes spreader assembly, 16 GB GPU card, air baffle, and TIM) (8335-GTW)
02CL351: GPU kit (includes spreader assembly, 16 GB GPU card, air baffle, and TIM) (8335-GTX)
02CL683: GPU kit (includes spreader assembly, 32 GB GPU card,...
Table 20. Additional system parts (8335-GTW or 8335-GTX water-cooled system with 6 GPUs) (continued)
01EM321: DD2.1 18 core 2.900 GHz system processor module kit (includes system processor module, processor tray, 4 mm hex driver, module replacement tool, and air pump) (8335-GTW)*
02CL566: DD2.2 18 core 3.15 GHz system processor module kit...
Table 21. Miscellaneous system parts (continued) (columns: description, part number, units per assembly)
Time-of-day battery (8335-GTC, 8335-GTG, 8335-GTH, 8335-GTW, or 8335-GTX): 00RY543
Baffle screw kit (8335-GTC, 8335-GTG, 8335-GTH, 8335-GTW, or 8335-GTX): 01EM312
Note: The screw kit includes one screw for each of the power supply air baffles and one screw for the empty GPU slot.
Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead.
This product uses standard navigation keys.
Interface information
The IBM Power Systems servers user interfaces do not have content that flashes 2 - 55 times per second. The IBM Power Systems servers web user interface relies on cascading style sheets to render content properly and to provide a usable experience.
Contact the vendor for accessibility information about its products.
Related accessibility information
In addition to standard IBM help desk and support websites, IBM has a TTY telephone service for use by deaf or hard of hearing customers to access sales and support services:...
Class A Notices
The following Class A statements apply to the IBM servers that contain the POWER9 processor and its features unless designated as electromagnetic compatibility (EMC) Class B in the feature information.
Federal Communications Commission (FCC) Statement
Note: This equipment has been tested and found to comply with the limits for a Class A digital device, pursuant to Part 15 of the FCC Rules.
Japan Electronics and Information Technology Industries Association Statement
This statement explains the Japan JIS C 61000-3-2 product wattage compliance.
This statement explains the Japan Electronics and Information Technology Industries Association (JEITA) statement for products less than or equal to 20 A per phase.
This statement explains the JEITA statement for products greater than 20 A, single phase.
To ensure this, the devices must be installed and operated as described in the manuals. Furthermore, only cables recommended by IBM may be connected. IBM assumes no responsibility for compliance with the protection requirements if the product is modified without IBM's consent or
Properly shielded and grounded cables and connectors must be used in order to meet FCC emission limits. Proper cables and connectors are available from IBM-authorized dealers. IBM is not responsible for any radio or television interference caused by unauthorized changes or modifications to this equipment.
Tel: +49 800 225 5426
email: halloibm@de.ibm.com
VCCI Statement - Japan
Japan Electronics and Information Technology Industries Association Statement
This statement explains the Japan JIS C 61000-3-2 product wattage compliance.
This statement explains the Japan Electronics and Information Technology Industries Association (JEITA) statement for products less than or equal to 20 A per phase.
Permissions for the use of these publications are granted subject to the following terms and conditions. Applicability: These terms and conditions are in addition to any terms of use for the IBM website. Personal Use: You may reproduce these publications for your personal, noncommercial use provided that all proprietary notices are preserved.
IBM reserves the right to withdraw the permissions granted herein whenever, in its discretion, the use of the publications is detrimental to its interest or, as determined by IBM, the above instructions are not being properly followed.