server woes: when to throw in the towel | FerrariChat

Discussion in 'Other Off Topic Forum' started by Schatten, May 27, 2005.

  1. Schatten

    Schatten F1 World Champ
    Owner

    Apr 3, 2001
    11,238
    Austin, TX
    Full Name:
    Randy
    Today marks the end of week 5 on this one project which wasn't supposed to take up much of my time at all. Five weeks ago, I received two nodes and a storage server - all for a clustered Oracle system (approx. 38K). Due to my lack of time and of experience in Oracle clustered environments, we even shelled out another 7K for setup and installation. That went smoothly, sorta.

    -- During the initial setup, the guys hired by the OEM replaced two cards and two cables.
    -- Since the setup, I've had the following replaced due to more failed parts
    ------ two more sets of cards
    ------ one failed hard drive
    ------ another set of cables
    ------ another failed hard drive
    ------ another set of cards
    ------ another set of cables
    ------ another failed hard drive
    ------ and yesterday, another failed hard drive

    Each time a drive is rebuilt on the array, it not only drops the Oracle database, but it kills the data completely, leaving me with 'system data corruption' errors and 'megaraid' errors. Lovely, just lovely.

    They wonder where our backup is and there's no further effort on our part to even try to back it up because it will not even stay up long enough to make a difference as a production server. We're only using it as a test bed now. And five weeks later, we should be "in" production, not still testing and trying to get the damn thing working.

    I'm about to throw in the towel on this one. At least I didn't throw away the boxes that they all came in. Ain't that a plus?

    While I do not have to deal with people as much as Solipsist does, I have to deal with inanimate objects that seem to take a life of their own. No matter what I say to this thing, it still can't hear me.

    /end Friday rant.
     
  2. Enzo

    Enzo F1 Rookie

    Feb 14, 2002
    4,089
    MinneSOta
    Full Name:
    Pat Pasqualini
    I hear your pain. That's all I can say. I'm about ready to go into a different field or start working a one man show again in a small company.
     
  3. Schatten

    Schatten F1 World Champ
    Owner

    Apr 3, 2001
    11,238
    Austin, TX
    Full Name:
    Randy
    Well, this is their last attempt: they're getting the Oracle guys out here (4th time) after they replace two power supplies, two SCSI cards, three cards on the storage array, and cables galore.

    So that's what I'm doing this evening.
     
  4. yoda

    yoda F1 Rookie

    Sep 27, 2004
    2,598
    UT
    oracle is a beast
     
  5. tvrfreak

    tvrfreak F1 Rookie
    BANNED

    Mar 31, 2003
    3,879
    Arkansas
    Full Name:
    F K
    Sounds like there's something on the motherboard that's not quite right, or on the RAID controller card, and it fries the other components. I can't imagine cables being bad. Never EVER had that problem.

    Get a configured server from Dell, or a blank one. Stick Win2003 Advanced Server on it, put on the cluster stuff (it was called WolfPack during beta, dunno what they named it), then configure the AD policies, and then slap MS SQL Server on the bad boy. Standard stuff with plenty of support out there. And data migration should be a non-issue at this stage.

    Will the app run with SQL Server or will it require a complete rewrite? If so, I guess you're stuck with Ellison's love child. I don't envy you one bit. Best of luck.
     
  6. Enzo

    Enzo F1 Rookie

    Feb 14, 2002
    4,089
    MinneSOta
    Full Name:
    Pat Pasqualini

    We are running a couple of different clusters on win2003 and they work great
     
  7. DMC

    DMC Formula 3

    Nov 15, 2002
    2,385
    WI/IL
    Full Name:
    Dean
    Good to hear - we'll be doing a POC on clustering our Exchange servers next year.
     
  8. tvrfreak

    tvrfreak F1 Rookie
    BANNED

    Mar 31, 2003
    3,879
    Arkansas
    Full Name:
    F K
    Clustering on Exchange? I am presuming to be even more compliant with Sarbanes-Oxley? Exchange is tricky...even MS Support on Exchange is weak.

    Don't skimp on the expertise (vet them thoroughly), or you will end up with cascading nightmares.
     
  9. Spasso

    Spasso F1 World Champ

    Feb 16, 2003
    14,656
    The fabulous PNW
    Full Name:
    Han Solo
    Sounds to me like parts are being thrown at SYMPTOMS and nobody knows what the CAUSE is.

    A common mistake that auto mechanics make.
     
  10. GrigioGuy

    GrigioGuy Splenda Daddy
    Lifetime Rossa Owner

    Nov 26, 2001
    33,103
    E ' ' '/ F
    Full Name:
    Snike Fingersmith
    Ditto that, unless this is some sort of white-box setup I cannot imagine any modern server being that bad. How's the power to the server/chassis? I'd almost bet on bad voltage somewhere frying random parts.
     
  11. richard_wallace

    richard_wallace Formula 3

    Feb 6, 2004
    1,957
    Cincinnati, Ohio
    Full Name:
    Richard Wallace
    Drop Wintel and go Sun... much more stable for Oracle as a DB server... I typically use Sun 880's (now 890's) - very nice platform for DB servers...

    Also - I would recommend using a SAN and not local RAID drives - provides more redundancy and you will not lose data... Of course now you are talking a lot lot lot more dollars...

    Of course if you are talking for home stuff - I have a Wintel Box running Linux with Oracle 9i - RAID 5... Works pretty well - but not real heavy use of course...
     
  12. Schatten

    Schatten F1 World Champ
    Owner

    Apr 3, 2001
    11,238
    Austin, TX
    Full Name:
    Randy
    nope, not at all. Exchange is on a different box, Win2003 Compaq server, runs flawlessly.

    Good thoughts, but... what makes you think it isn't a preconfigured server from Dell? after all, most of our stuff comes from there. heck, the building that we're in is an old Dell executive building.

    Win2K2 cluster? not a chance. sorry. SQL server? I don't even think so. it doesn't even come close to what we have been using and would be a ***** of a migration. complete rewrite? you are talking about custom apps that are currently being imported from a Sun box. a pair of E450's to be exact on 8i to the RHE3 with 10g.

    They are called SWAGs (some wise a$$ guess, not swag as in crap shirts, etc.). They are throwing out all the parts, because I'm about to pack it up and send it back to them. I didn't get out of there until 9pm with the hardware replacement and more failures which I corrected, but the data will have to be reconfigured and reimported (9 million records on the lighter side) later through the weekend.

    Sorry to be pessimistic throughout this entire response, but Sun is what we're trying to get away from. We paid ~16K just to keep the support on the boxes for one year and due to the lack of support and various unknown hardware crashes that their support couldn't figure out - after days, weeks, and a month later, and then again... well, we're dropping that. it is also a huge cost savings when it comes to Oracle licensing when purchasing it through this OEM.

    well.. it is back up. and while you guys are relaxing over the weekend... eh... =/ sorry, it's just been a long week.
     
  13. stephens

    stephens F1 Rookie
    Lifetime Rossa

    Feb 13, 2004
    4,647
    Australia
    Full Name:
    Stephen S
    We are a MS shop, but also run Sun 220 & 450R's. It was cheaper to buy spare servers off ebay than continue with the Sun maintenance contracts!!
     
  14. Evolved

    Evolved F1 Veteran

    Nov 5, 2003
    8,700
    Dell RAID controllers are total crap. Are they still using those PERC cards?

    Throwing Oracle on a box with those sounds like a recipe for disaster as Oracle never runs 100% on Windows. I have only used Oracle on Windows for testing for that reason. Did you try using win2k instead of 03?

    I would say you have a fussy power supply first then look at the above concerns. IBM Servers run Oracle well.

    I'm also partial to HP servers for Windows.
     
  15. wrecktech

    wrecktech Formula Junior

    Jun 4, 2004
    368
    Fort Wayne, Indiana
    I thought a SWAG was a scientific wild ass guess, at least in Purdue engineering circles.

    Good luck, you were a great help when I had problems a few months ago. I wish I could return the favor.
     
  16. Schatten

    Schatten F1 World Champ
    Owner

    Apr 3, 2001
    11,238
    Austin, TX
    Full Name:
    Randy
    HP servers rock!

    And.... nope, this isn't Windows. It's RHE3. This weekend I'll be rebuilding the datastore by reinitializing it and setting up the RAW partition for OCFS to handle. Fun... oh what fun.
     
  17. Evolved

    Evolved F1 Veteran

    Nov 5, 2003
    8,700

    I've never used RHE3. How do you like it?
     
  18. Enzo

    Enzo F1 Rookie

    Feb 14, 2002
    4,089
    MinneSOta
    Full Name:
    Pat Pasqualini
    Yeah we have been running Win2003 Clustering for Exchange and our File servers. No problems at all with Exchange. We have an Exchange Org that vents out to different domains and then into different Forests and all brought together through MIIS for a Complete GAL of all involved with us within our Forest or the other Forests. We have had this installed for over a year now and the performance is great and much better than the Groupwise system we had that we are still in the process of dumping 400+ servers out of the tree.
     
  19. tvrfreak

    tvrfreak F1 Rookie
    BANNED

    Mar 31, 2003
    3,879
    Arkansas
    Full Name:
    F K
    What is holding you up from dumping the 400 servers? Should be straightforward to excise them from the forest, unless you need to migrate apps or there are other dependencies? If so, are they related to Exchange?

    Also, anything, even letter-writing, is an improvement over Groupwise--agree? :)
     
  20. Enzo

    Enzo F1 Rookie

    Feb 14, 2002
    4,089
    MinneSOta
    Full Name:
    Pat Pasqualini

    The 400 or so servers in the Novell tree are throughout about 13 states and are owned by different associations. Most of them are off after about a year but there are still some of our associations that have yet to migrate fully off of Novell. By the end of June we should be done and shut down the last of the Novell servers. We had just about 300+ applications at our location that we brought over and into MS SMS. We designed our Exchange system to be totally independent of the old Groupwise system so when we migrated it was an easy switch after we converted all archives over to a PST file.

    Groupwise blew big chunks. You were always running GW checks on mailboxes and Post Offices. We have had only 1 tiny problem with an Exchange mailbox and it was fixed in a heartbeat. Now I just have to finish up getting my MCSE (boot camp time)
     
  21. tvrfreak

    tvrfreak F1 Rookie
    BANNED

    Mar 31, 2003
    3,879
    Arkansas
    Full Name:
    F K
    Novell...a brand that elicits nothing but revulsion (at least in me)!

    I don't know about tougher, but I hear the MCSE certification has gotten a lot more comprehensive since I got certified in the late 90's. Only natural, I guess, considering all that's been added on. Since the courses were about $1800 each and I didn't have time to read 500-page tomes, I did it by the brute force method...I figured I'd be way ahead if I kept taking the $60 exams until I passed. Luckily, I never needed to take anything more than twice, and about half I passed on the first try. At the time, if you took your exams at trade shows, you could take them for half price...

    How are you finding it? Good luck. :)
     
  22. Enzo

    Enzo F1 Rookie

    Feb 14, 2002
    4,089
    MinneSOta
    Full Name:
    Pat Pasqualini

    The tests are quite a bit harder than when I took them back in the NT 4.0 days. That's why I figured that I would just pony up and do the 2 week boot camp and let them tell me exactly what will be on the tests. I have been working with 2003 for more than a year now in a pretty complex environment so it should be easy to relate to what they will be testing me on. Hopefully they won't get too crazy with what they want you to know.
     
  23. Artherd

    Artherd F1 Veteran

    Jun 19, 2002
    6,588
    Bay Area, CA
    Full Name:
    Ben Cannon
    #23 Artherd, May 28, 2005
    Last edited by a moderator: Sep 7, 2017
    Hey Randy, doesn't it make you wish these guys still had Oracle support?

    I bet this machine here could handle your entire load and then some, and it's about 1/2 the cost on today's used market.

    Too bad Oracle support ended around 7i or so?

    I had to move our web cluster to RHES3 about a year ago, off SGI Origin200s. We went from literally no problems in 4 years, to 3 separate events of downtime greater than 30 min. What a fuster-cluck. And Redhat still doesn't recognise the built-in gig-ethernet jacks at all. It's only high-dollar INTEL motherboards running INTEL pro1000e chipsets, why should I expect it to work? <sigh>

    Bring back the SGIs. Heck, just bring back the application support. IRIX 6.5 and the Onyx2/Origin hardware are still competitive today.
     
  24. Schatten

    Schatten F1 World Champ
    Owner

    Apr 3, 2001
    11,238
    Austin, TX
    Full Name:
    Randy
    Ben - I have two of those in the datacenter now. Trying to get off of Sun and SGI at the moment.

    your MS Exchange tip of the day: place your logfiles on a different drive or array. the speed improvement will be very noticeable. smaller Exchange systems, under 250 mailboxes, will probably see little improvement.
     
  25. Evolved

    Evolved F1 Veteran

    Nov 5, 2003
    8,700
    Minor addition/clarification:

    That's physical device, not logical.

    Cheers,

    Me
     