There are real, and subtle, differences between su and sudo

Most of the time, sudo just works. Every now and then, it doesn’t. Most recently it was with a build I am working on, where I got a “permission denied” error while creating a directory.

The reason for this was non-obvious at first. You “are” superuser after all when you sudo, right? Aren’t you?

Sort of.

Your user ID has been set to the superuser’s (sudo normally sets both the real and the effective ID). Much of your environment, though, can still be yours: depending on how sudo is configured, variables such as HOME and TMPDIR may keep pointing at your directories rather than root’s. This means things like your temp directory are not necessarily root’s … er … the owner of the temp directory might be different from the user you are building as. And if you have root_squash on an NFS mount, or your system uses one or another of the security mechanisms that prevent privilege escalation … here be dragons.
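One way to see what actually differs between a sudo shell and a su shell is to print the IDs and a few environment variables from inside each. A minimal Python sketch; the function name and the choice of variables are mine, for illustration, and have nothing to do with the build itself:

```python
import os

def identity_report(environ=None):
    """Collect the IDs and environment variables worth comparing under su and sudo."""
    env = os.environ if environ is None else environ
    return {
        "real_uid": os.getuid(),        # real user ID of this process
        "effective_uid": os.geteuid(),  # ID used for permission checks
        "HOME": env.get("HOME"),
        "TMPDIR": env.get("TMPDIR"),
        "SUDO_USER": env.get("SUDO_USER"),  # set by sudo
    }

if __name__ == "__main__":
    for key, value in identity_report().items():
        print(f"{key:14} = {value}")
```

Run it once under sudo -s and once under su, then compare the two reports line by line.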

So it seems, during a build of Rust 1.14.0, I ran head first into this. I will freely admit that my mouth was agape for a bit. I will not admit to drool falling out, and have rapidly deleted any such webcam video.

Ok, more seriously, it was a WTF moment. Took me a second to understand, as my prompt says # when I sudo -s. The make was run as sudo. The make command failed under sudo with a permission (!!!@@!@@!) error. Then a fast ‘su’ and off to the races we went.

Seriously.

While I want to dig into this more, my goal here was building rust in a reliable and repeatable manner. I don’t have that going quite yet. Very close, but I’ve now run into LLVM/clang oddities, and have switched back to gcc for it. Build completes now, but install is still problematic because of this issue.

I could just build as the root user, and other build environments I’ve built do that. I’ve been trying to get away from that, as it is a bad habit, and an errant makefile could wreak havoc. But the converse is also true, in that during installation, you often need to be root to install into specific paths.

I can change that assumption, and create a specific path owned by a specific user, and off to the races I go. I prefer that model, and then let the admins set up sudo access to the tree.
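A build script can check up front that the install prefix is writable by whoever is running it, rather than finding out halfway through the install. A small Python sketch, with a hypothetical function name:

```python
import os
import tempfile

def can_install_into(prefix):
    """Return True if prefix is a directory the current user can write into.

    os.access() checks against the real user ID by default.
    """
    return os.path.isdir(prefix) and os.access(prefix, os.W_OK | os.X_OK)

if __name__ == "__main__":
    # Demonstrate on a scratch directory we certainly own.
    with tempfile.TemporaryDirectory() as scratch:
        print(scratch, can_install_into(scratch))
```

If the check fails, the script can stop with a clear message instead of a bare “permission denied” from deep inside make.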

Combine these things, and get a very difficult to understand customer service

In the process of disconnecting a service we don’t need anymore. So I call their number. Obviously reroutes to a remote call center. One where English is not the primary language.

I’m ok with this, but the person has a very thick and hard to understand accent. Their usage and idiom were not American, or British English. This also complicates matters somewhat, but I am used to it. I can infer where they were from, from their usage. It was very common in my dealings with other people there.

Of course, this isn’t bad enough.

The call center is busy, and you can hear lots of background noise.

Of course, this isn’t bad enough.

Now add a poor VOIP connection. I was doing this over a cell phone, and my connection is generally quite good … I’ve been on many hour long con calls over this phone, headset, etc. from this location. It’s not an ultra busy part of the day. So I am not getting dropped connections. I have a major US carrier for the cell. So it’s not a tower congestion problem.

Likely a backhaul problem shipping the voice bits halfway around the world and back, on a congested/contended for link. Noticeable delays in response. Ghosting/echoing. All manner of artifacts.

Of course, this isn’t bad enough.

Finally, add a crappy mic on the remote person’s head set.

End result was, I had to struggle to understand the person. Really struggle. Some of it was guessing what they were saying. Some was not.

I have to wonder aloud whether companies in search of cost reduction think it’s a good idea to make it hard to understand the support staff, by a combination of language usage, poor equipment, substandard networking, etc.

I guess it is amusing that this is a large “business ISP” here in the US.

At bare minimum, they should have the headsets upgraded, the network (ha!) upgraded, and the work area more noise isolated so that you get fewer of these issues to deal with. Hiring people who speak with less of a thick accent is also recommended, or conversely, training them on how to adapt their elocution so as to be more understandable.

I, as an escaped New Yorker, probably shouldn’t be answering phones myself (Hey, wassamadda for you?) … but seriously … at least make an effort on this.

SSD/flash/memory shortage, day N+1

There has been a huge demand for SSD/Flash/memory components from a number of end users. Sadly not the day job’s customers … but enough to deplete the market of supply.

Watching basic economics at work is fascinating.

Supply is highly constrained, while demand is rising. Couple that with a (mis)expectation of continuously falling prices across the board, and you get interesting conversations with customers.

We’ve tried to set expectations appropriately, but we’ve been bitten in the past by doing just this. That is, by being honest and up-front with our customers that some things will take more time to get, and cost more, we’ve watched customers go to different vendors, hear a different message, and then be screwed over as we weren’t being dishonest … while the other vendor was.

In another post, I said this was getting to me.

We’ve been advising customers to place orders 2+ months in advance for some specific sets of parts in very short supply. It does take some time for manufacturing to ramp up, and OEMs are in no hurry to flood a market and lower the effective purchase price (and their profits).

Yet, I am still seeing people think that parts are available with a quick phone call. For a large enough order (more than 1 or 2 systems worth), you need to get an allocation, and you need to get in queue for that allocation. That queue can be long. Other larger orders can and will bump you in queue. And the direct customers for the OEMs that bought all the product last time might just do it again. I’d call this highly likely. These aren’t the Dells, HPEs, etc. of the world. Go ahead and guess who might be doing this. And note that shortages in the broader market serve to underscore a portion of their message.

Many folks are building out their backlog from inventory, though their inventories aren’t deep, as the products age fast, and become obsolete in short time intervals. Many do just-in-time building. For those of us doing that, this is becoming painful at best.

Yeah, this is getting to me.

A new (old) customer for the day job

Our friends at MSU HPCC now are the proud owners of a very fast/high performance Unison Flash storage system, and a ZFS backed high performance Unison storage spinning disk unit. Installed first week of Jan 2017.

As MSU is one of my alma mater institutions, I am quite happy about helping them out with this kit.

They’ve been a customer previously; they had bought some HPC MPI/OpenMP programming training in the dim and distant past.

Architecture matters, and yes Virginia, there are no silver bullets for performance

Time and time again, the day job had been asked to discuss how the solutions are differentiated. Time and time again, we showed benchmarks on real workloads that show significant performance deltas. Not 2 or 3 sigma measurements. More often than not, 2x -> 10x better.

Yet … yet … we were asked, again and again, how we did it. We pointed to our architecture.

But, they complained, isn’t it the same as X (insert your favorite volume vendor here)?

No, we pointed out. It isn’t. We described the differences. Showed them precisely how the differences manifested. Showed them that the results are normal, repeatable, and generally different from what others made claims about.

Often in the past, we’ve heard that (insert random vendor here) has comparable systems. And when the real world measurements come out, we hear a very different message. Or when a customer eschewed our solution, went for the brand name version (at much higher cost), they rapidly discovered that engineering by spec sheet is a very … very bad thing to do.

It doesn’t work (engineering by spec sheet).

Yet, we heard this all the time.

In the past, I’ve railed against the notion of silver bullets. A silver bullet is a magical component, hardware or software, that will suddenly make something go much faster, and you know, give a competitor an unfair competitive advantage.

Marketing people love their silver bullets. They don’t work, but hey, they are fun to talk about.

How do we know they don’t work? Easy. Decades of benchmarking against them. Running real applications against them and our kit.

We designed and built something quite good. It enabled us to build parallel IO engines, and tune the heck out of the engines. Move tremendous amounts of data between the processor/memory complex, storage, and RAM, without the bottlenecks of other designs. Despite protestations and spec sheets to the contrary, measurements I and many others have done have demonstrated sustained and profound advantages of superior architectures over the silver bullet enhanced architectures.

I see spec sheets and marketing blurbs on products proclaiming them to be “the fastest” stuff in the west, with numbers that are lower … lower … than numbers we surpassed more than 3 years ago. Yet, we are told by some that these products are comparable.

Or, even more (unwittingly) humorously, that there really is no difference, even though, in a number of cases, we had just demonstrated a profound (nearly order of magnitude) difference.

It astounds me. No, confounds me … this may be a better way to articulate it.

We’ve not simply created a better mousetrap, we’ve tried to tell the world about it. And been ignored.

And we tried to get folks to invest in this. And been ignored.

All the while the market is going on validating our ideas (dense and high performance systems), and we see VCs investing in things like, I dunno … Secret?

This gets to you after a while. You start questioning a number of premises you had held to be truth.

So here we are, with (what I’d argue is) a fantastic architecture second to none. And despite the simplicity and obviousness of our message, our (many and repeatable and sustained) measured results … we get people reading off a marketing spec sheet telling us we are not all that different. Though we are.

This is one of those inflection points in a company’s existence.

I’ve been asked multiple times in recent months to estimate what we could do if we took our stack to other hardware. Apart from the significant performance (and likely stability) loss, such that we’d be like everyone else, not much.

I’ve also been asked multiple times to “divulge our secrets”, though our architecture is open, our kernels are available online.

As I said, it gets to you.

I am thinking hard about this battle, and whether or not I want to keep fighting it.

Our kit is obviously, objectively better. And not by a little bit.

But it doesn’t matter if we can’t sell it, because people read spec sheets and think the numbers printed on them are what they will get in normal operational states, versus the best case scenarios that are specifically set up for those parts.

A friend noted the fallacy of engineering by spec sheet a while ago. They are right.

smh

#Perl on the rise for #DevOps

Note: I do quite a bit of development in Perl, and have my own biases, so please do take this into consideration. It is one of many languages I use, but it is by and large, my current go-to language. I’ll discuss below.

According to TIOBE (yeah, I know), Perl usage is on the rise. The linked article posits that this is for DevOps reasons. The author of the article works at a company that makes money from Perl and Python … they build (actually very good) tools. Tools that I personally use (Komodo).

The rationale is that Perl is very powerful, quite fast, extremely flexible, and ubiquitous to boot. They compare Python and Perl performance, noting some of the differences, and speculating why.

Generally, I don’t like saying “language X is better than Y”. Languages have domains of applicability, strengths, and weaknesses. Moreover, if you have to justify your choice by making the point “Y is better than X because of Z” then you’ve largely not understood the point of the languages in the first place. I’ve made this point before, but “delegitimizing” a language (such as, I dunno, the line noise meme? or use of sigils … where the latter seems to be only applied to a single language …) isn’t a good language advocacy path.

So put that aside, and let’s talk DevOps. DevOps at the core is about turning processes and hardware into larger portions of an algorithmic application delivery and support. To make automation simpler, to wrap applications that aren’t services into something that looks/acts like a service. To enable composable systems, or if you prefer the moniker used today, Software Defined systems. I’m going to focus less on the container side here, and more on the process side.

There are many tools to help enable this. Some are fairly new and undergoing rapid development. Some are more mature, others are ossified.

Generally, you need a few specific features to build elements for inclusion in a DevOps pipeline. You need the ability to build API endpoints for services that you will be running. You need the ability to link these API endpoints to specific functional elements, like running a non-service based program with specific arguments. You need the ability to ingest data in common formats, and output in common formats. You need the ability to easily send signals in or out of band (depending upon how your architecture is built).
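To make that list concrete, here is a minimal sketch of an endpoint that runs a non-service program with fixed arguments and returns its output as JSON. It is written in Python with only the standard library, purely for illustration. The endpoint paths, the wrapped commands, and the names run_wrapped and serve are all hypothetical.

```python
import json
import subprocess
import sys
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical mapping from endpoint path to the command line it wraps.
COMMANDS = {
    "/python-version": [sys.executable, "--version"],
    "/disk": ["df", "-h", "/"],
}

def run_wrapped(path, commands=COMMANDS):
    """Run the command registered for path; return (HTTP status, JSON-able dict)."""
    argv = commands.get(path)
    if argv is None:
        return 404, {"error": "unknown endpoint"}
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=30)
    return 200, {
        "argv": argv,
        "returncode": proc.returncode,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, payload = run_wrapped(self.path)
        body = json.dumps(payload).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def serve(host="127.0.0.1", port=8080):
    """Blocks forever; call this from your own entry point."""
    HTTPServer((host, port), Handler).serve_forever()
```

Only commands listed in the table can be run, and the arguments are never taken from the request, which keeps the wrapper from turning into a remote shell.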

This “glue” functionality is, to a very large extent, what Perl excels at. Ok, I am talking Perl5 here. Perl6 is (literally) a new language with a similar though not identical syntax … but from what I have seen, it can do this, and far far more. But that is a topic for another time.

You can create endpoints trivially in Perl using standard modules. You can set up simple servers, or restful APIs fairly trivially without much boilerplate code (see Mojolicious on CPAN). It has significant capability in meta-programming via various modules such as Class::MOP/Moose, and others. It has amazing multi-language capabilities with the Inline:: series (Inline::C, Inline::Python, …). It interfaces quite trivially to external libraries written in any language (FFI::Platypus). It has the ability to run external code via a tremendously powerful interface, IPC::Run, as well as with simple back ticks. It can run multi-threaded, multi-process code using threads::shared and MCE amongst many others. Its database connectivity is excellent, and it is easy to hook into (No|New)SQL DBs. It has event loops for async processes.

I could keep going, but the point is that it is fairly trivial to build responsive services for DevOps using Perl and a smattering of these tools.

This said, some of the distribution providers (I am looking at you Red Hat) are still shipping not merely ancient tool sets, but tools that have been end of lifed for years … as their current supported tools. Like the ancient Go, Python, and other tools, Perl on these distributions is so woefully out of date that some of the modules (Mojolicious and a few others) may not work properly. This is on them; they need to decide if they want developers who need modern tools, or not.

What I’ve been doing has been building my own tree of tools. I’ll be refreshing this soon, and putting the refreshed tree up on github shortly. These are modern versions of Perl5, Perl6, Python3, Rust, Octave, R, Julia, Node, Jupyter, and a few others, along with my build environment. These tools make DevOps and analytics generally quite easy. All batteries included as far as I can tell based upon my usage, but happy to learn of more tools we need to include.

This environment is not yet set up for containerized deployment, as it is more of an add-in to an environment, than providing a specific service. We are looking at ways of packaging/using this in a more “traditional” container scenario.

But back to Perl and DevOps. The majority of Scalable Informatics code is Perl based DevOps code, and has been for more than a decade. The code is simple, fast, well debugged. Handles very intense loads.

Tastes great, and less filling.

I’ve not felt that Perl5 was dying as a language. I’ve thought that there are many tools out there, and some of them are pretty good. Perl is just one of them. Though for the moment, it is my go-to language.

Personally I am a polyglot, and I try to use the system that gets in my way the least; allowing me to express what I need the most, with the greatest simplicity and accuracy. I think that if there is a signal in the TIOBE data, that it likely reflects this to some degree. People rediscovering solutions to problems.

Creating something new for the sake of creating something new, versus using a powerful system that exists and solves problems correctly now, may not be the best pattern to follow for DevOps. We’ve seen this time and again in this industry though. Some of the patterns are fads, some have longevity (even if there is no valid reason for their existence).

DevOps going forward will continue to push hard on toolchains, and those who enable the greatest functionality with the least pain will likely be the winners. Similarly with analytics …

Make it easy to use and adopt. Make it ubiquitous. Perl has this now. Which is why I think the article might be on to something, even if the assumptions on the data are not valid.

Another itch scratched

So there you are, with many software RAIDs. You’ve been building and rebuilding them. And somewhere along the line, you lost track of which devices were which. So somehow you didn’t clean up the last build right, and you thought you had a hot spare … until you looked at /proc/mdstat … and said … Oh …

So. I wanted to do the detailed accounting, in a simple way. I want the tool to tell me if I am missing a physical drive (e.g. a drive died), or if a disk thinks it is part of a raid, even though the OS doesn’t agree.

And yes, this latter bit can happen, if you re-build the array, and omit one of the devices for whatever reason.

Like I did.

So …

root@usn-t60:/opt/scalable/sbin# ./lsswraid --raid=md23
N(OS)	= 14
N(disk)	= 15
More Physical disk RAID elements than OS RAID elements, likely you have a previously built element which has not been cleared.
The extra devices are: sdz

root@usn-t60:/opt/scalable/sbin# grep sdz /proc/mdstat

And to add this particular device back in as a hot spare …

/dev/sdz: 4 bytes were erased at offset 0x00001000 (linux_raid_member): fc 4e 2b a9
 
root@usn-t60:/opt/scalable/sbin# mdadm /dev/md23 --add /dev/sdz
mdadm: added /dev/sdz

root@usn-t60:/opt/scalable/sbin# grep sdz /proc/mdstat
md23 : active raid6 sdz[16](S) sdap[14] sdar[13] sdas[12] sdau[11] sdat[10] sdaf[9] sdag[8] sdai[7] sdah[6] sdaj[5] sdak[4] sdam[3] sdal[2] sdaa[15]
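The accounting shown above boils down to comparing two sets of devices: the members the OS lists for an array in /proc/mdstat, and the disks whose on-disk metadata claims membership. A rough Python sketch of that comparison follows. It is not the lsswraid source. The function names are hypothetical, and the disk-side set is simply passed in, on the assumption that it was gathered separately (for example from mdadm --examine on each candidate disk).

```python
import re

# Matches member entries such as "sdap[14]" or "sdz[16](S)" on an mdstat array line.
MEMBER = re.compile(r"(\w+)\[\d+\](?:\(\w\))?")

def os_members(mdstat_text, array):
    """Return the set of device names the OS lists for array in mdstat_text."""
    for line in mdstat_text.splitlines():
        if line.startswith(array + " :"):
            return {m.group(1) for m in MEMBER.finditer(line)}
    return set()

def compare(os_devs, disk_devs):
    """Report devices that only one side knows about."""
    return {
        "extra_on_disk": sorted(disk_devs - os_devs),    # stale RAID metadata
        "missing_on_disk": sorted(os_devs - disk_devs),  # e.g. a dead drive
    }

if __name__ == "__main__":
    with open("/proc/mdstat") as fh:
        print(sorted(os_members(fh.read(), "md23")))
```

A non-empty extra_on_disk list corresponds to the “previously built element which has not been cleared” case, and a non-empty missing_on_disk list to the dead drive case.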

ClusterHQ dies

ClusterHQ is now dead. They were an early container play, building a number of tools around Docker/etc. for the space.

Containers are a step between bare metal and VMs. Flocker (ClusterHQ’s product) is open source, and they were looking to monetize it in a different way (not on acquisition, but on support).

In this space though, Kubernetes reigns supreme. So competing products/projects need to adapt or outcompete.

And it’s very hard to outcompete something like k8s.

Again, I feel for the folks kicked to the street. And this is likely just the beginning.

fortran for webapps

Use Fortran for your MVC web app. No, really

Here you are, coding your new density functional theory app, and you want to give it a nice shiny new web framework front end. Config files are so … 80s … Like in grad school, man … You want shiny new MVC action, with the goodness of fortran mixed in.

Out comes Fortran.io.

Another fun bit of debugging

Ok … so here you are doing a code build.

Your environment is all set. You have ample space. Lots of CPU, lots of RAM. All packages are up to date.

You start your make.

You have another window open with dstat running, just to kinda, sorta watch the system, while you are doing other things.

And while you are working, you realize dstat has stopped scrolling.

Strange, why would that be.

Ping the machine

Not responding.

Ok … hmmm … it crashed? Look in the BMC SEL (our kernel dumps panic messages there). Nothing.

Look at the system condition … overheating? Heck no, it’s actually running cool.

Hmmm….

Ok. Maybe something spurious. Connect up the SOL console, watch it finish booting.

Iterate. Log in 2 windows. Start dstat in one, build in another.

and …

bang …

Hmmm … nothing on the console …

Ok, hook up icl (ipmi console logger) to it. Capture the data. Let’s see what is really happening.

Rinse repeat.

Bang.

Look in the log (ipmi console log that is, it will have everything).

Nope, completely blank.

/var/log/{syslog,messages}

Nothing.

Only happens under load? Could I have a blown CPU? I did see an EDAC memory error crop up once … ok, let’s try something stupid. Something that should not work.

Drop the memory frequency to lowest speed.

Nope.

Turn off SMT (aka HT).

Nope.

Ok, let’s go full moron, and assume hardware is the culprit, and is somehow … somehow not triggering the MCE or EDAC subsystem.

Let me remove 1/2 the memory.

Why not. Can’t hurt, easy to see if it works, right?

Start the build.

Works.

Do two intensive builds at once.

Works.

Do 3.

Works.

smh.

This is new memory, older board, older CPUs. Never given me a problem before.

Crashed with no message whatsoever.

I am going to assume something like a loading issue with the CPU. I can run this at 1/2 the RAM, though I’ll probably put 1/2 of what I took out back in to check, and see if it’s bad RAM, or a loading problem. Bad RAM should have triggered EDAC/MCE. Loading problem … maybe not.
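For the “bad RAM should have triggered EDAC” half of that, the kernel’s EDAC counters can be read back after each test run. A minimal Python sketch, assuming the usual Linux sysfs layout under /sys/devices/system/edac/mc; the function name is hypothetical:

```python
import glob
import os

def edac_counts(base="/sys/devices/system/edac/mc"):
    """Return corrected (ce) and uncorrected (ue) error counts per memory controller."""
    counts = {}
    for mc in sorted(glob.glob(os.path.join(base, "mc[0-9]*"))):
        entry = {}
        for name in ("ce_count", "ue_count"):
            try:
                with open(os.path.join(mc, name)) as fh:
                    entry[name] = int(fh.read().strip())
            except (OSError, ValueError):
                entry[name] = None  # counter missing or unreadable
        counts[os.path.basename(mc)] = entry
    return counts

if __name__ == "__main__":
    # Prints an empty dict if no EDAC driver is loaded.
    print(edac_counts())
```

Counts that stay at zero across a crash are consistent with the loading theory, though they do not prove it, since a hard hang can take the machine down before anything is recorded.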
