
Re: zxid and shared/distributed filesystems?



Eric Rybski wrote:
> Hi Sampo,
>     Glad to hear that a shared NFS mount should work well.  On that topic,
> I
> have a few additional questions:
>
> 1. I'm assuming the library uses file locks to prevent shared log files
> (like log/act and log/err) from being clobbered by parallel CGI processes.
> Has this been tested under high volume (# requests/sec)?  Obviously,
> performance would differ on an NFS vs. local mount, but I'm just trying
> to get a rough sense of scalability at this point.

The locking logic is in write2_or_append_lock_c_path(). As can be seen
there, the Windows port does not support locking; the other platforms
use lockf/flock style collaborative (advisory) locking. Collaborative
locking is ok, because the only contention ever expected is from
another ZXID process. When operating over NFS, you need to check that
locking is implemented on your variant of NFS. I think it is
a mount option you have to enable.
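
For illustration, something like the following is the general shape of
collaborative locked appending (just a sketch, not the actual
write2_or_append_lock_c_path() code; the function name and error
handling here are simplified):

  /* Sketch of advisory-locked append. Only cooperating processes
   * honor the lock, which is fine since the only expected contention
   * is from other ZXID processes. Not the real ZXID code. */
  #include <fcntl.h>
  #include <sys/file.h>
  #include <unistd.h>

  int append_with_lock(const char* path, const char* buf, int len)
  {
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0666);
    int ret = -1;
    if (fd < 0)
      return -1;
    if (flock(fd, LOCK_EX) == 0) {  /* blocks until other writers release */
      ret = (int)write(fd, buf, len);  /* O_APPEND + exclusive lock: no interleaving */
      flock(fd, LOCK_UN);
    }
    close(fd);
    return ret;
  }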

However, most of the filesystem operations are designed such that
no locking is needed at all. In session management the files are
effectively write-once and have difficult-to-guess names, which
makes collisions extremely unlikely. This inherent lack of need
for locking should make shared filesystem operation much cheaper.
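
A minimal sketch of that write-once pattern (the path layout and
helper name here are just for illustration, not the actual ZXID
internals):

  /* Create a session file exactly once under a hard-to-guess name.
   * O_EXCL makes creation fail if the name already exists, so an
   * (astronomically unlikely) id collision is detected rather than
   * silently overwritten. Illustrative sketch only. */
  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  int write_session_once(const char* sesdir, const char* sid,
                         const char* data, int len)
  {
    char path[4096];
    int fd, ret;
    snprintf(path, sizeof(path), "%s/%s", sesdir, sid);
    fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
      return -1;                  /* already exists, or other error */
    ret = (int)write(fd, data, len);
    close(fd);
    return ret;
  }

Since nothing ever rewrites such a file, readers on other machines in
the farm never need a lock either.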

About the only place where locking really is needed, and is used, is
audit logging. This might be a good argument for keeping the audit
logs local to each machine. Right now, for simplicity, they live in
the /var/zxid hierarchy. I suggest using symlinks there.

Although I have designed zxid to be usable on a shared filesystem,
I have not stress tested it on one. I would be interested in
hearing about your experiences.

> 2. While testing, I noticed a number of directories and files are created
> in log/issue/[hash]/wir, log/rely/[hash]/wir and a7n, and ses/.  It appears
> I will need to implement a cron process of some sort to keep file growth
> under control, as I need to avoid problems like max allowed folders or
> files per directory on an NFS mount.  Any suggestions on what's safe to
> delete independently from zxid?

What are the max-allowed-files limits for NFS? I know some old local
filesystems like ufs had such problems, but all modern Unixes seem to
ship with a filesystem that can support at least millions of files per
directory, so I never felt the need to add the complexity of directory
hashing. If volumes ever grow so huge as to need investment in writing
such code, I would suggest investing the same effort in adding a MySQL
backend for storing the voluminous information.

Yes, you can use cron to clean old files away. Given the session
expiry settings you decide on, you should be able to tell which
sessions are stale. A rough sketch of such a cleaner follows.
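
Something along these lines (the directory layout, the flat-file
assumption, and the age limit are assumptions you would adapt to your
own setup, and to the retention caveat below):

  /* Remove session files older than max_age_secs from sesdir.
   * Sketch only: if your sessions are directories rather than flat
   * files, you would recurse instead of unlink()ing. */
  #include <dirent.h>
  #include <stdio.h>
  #include <sys/stat.h>
  #include <time.h>
  #include <unistd.h>

  void clean_stale_sessions(const char* sesdir, time_t max_age_secs)
  {
    DIR* dir = opendir(sesdir);
    struct dirent* de;
    struct stat st;
    char path[4096];
    time_t now = time(0);
    if (!dir)
      return;
    while ((de = readdir(dir))) {
      if (de->d_name[0] == '.')
        continue;                               /* skip "." and ".." */
      snprintf(path, sizeof(path), "%s/%s", sesdir, de->d_name);
      if (stat(path, &st) == 0
          && S_ISREG(st.st_mode)
          && now - st.st_mtime > max_age_secs)
        unlink(path);                           /* past the expiry window */
    }
    closedir(dir);
  }

Called from cron as, say, clean_stale_sessions("/var/zxid/ses", 86400)
wrapped in a small main(), it would expire anything untouched for a day.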

However, deleting the files might not be the right move. Those files
are part of the audit trail you may need to prove your innocence
or somebody else's guilt. When to delete depends on your business
liability situation - and on the data retention policy you have set
or are forced by law to follow. That is, the law may require you to
delete earlier than you would want to, or it may require you to keep
business records longer than you would otherwise want to. IANAL, but
I myself tend to keep everything for 5 years.

>    Secondly, I'm having an issue getting digital signature validation
> working with a PingFederate IdP instance.  The PF IdP metadata (cached in
> my
> cot/) includes a certificate, the POST SAMLResponse contains a signature,
> and I have the IdP CA cert in my /var/zxid/pem/ca.pem. But I keep getting
> errors like:
>
> t  zxidsso.c:559 zxid_sp_sso_finalize zx E SSO warn: assertion not signed.
> Sigval((null)) (nil)

I saw the mail had attachments, but I started replying before I had
opened them, so I may be missing something.

It is a best practise to sign the Assertion, as that allows the
assertion to be used independently afterwards. Does Ping not sign the
Assertion element? Are they signing just the SAMLResponse? If so, that
is not according to best practise. Of course a signature on the
response protects everything inside it, but it prevents independent
use of the a7n. About the only place where signing just the response
would be justified is the Simple Sign POST profile, where the
objective is explicitly to avoid having to implement XML-DSIG.

> I've currently worked around this by setting "NOSIG_FATAL=0" in the

I thought I had MSG_SIG_OK=1 as the default setting. With MSG_SIG_OK
you should not have to turn off NOSIG_FATAL. The code is in
zxidsso.c:556 zxid_sp_sso_finalize(). I need to check whether a
response level signature updates ses->sigres correctly in the
regular POST profile (which Ping uses?). I have tested that
it is updated in the SimpleSign case.
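
For reference, the preferred configuration would look roughly like
this in zxid.conf (a sketch; check zxidconf.h for the exact directive
names and defaults):

  MSG_SIG_OK=1
  NOSIG_FATAL=1

i.e. accept a valid signature at the Response (message) level, but
keep missing-signature errors fatal rather than turning them off
wholesale with NOSIG_FATAL=0.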

Ok, found it. I am not checking the response signature, except
in the SimpleSign case, as can be seen in zxiddec.c:131
zxid_decode_redir_or_post(). I'll add support for checking
this in the next release, probably in the zxid_sp_dig_sso_a7n()
function.

> zxid.conf, but this isn't a long-term solution.  I'm currently developing
> against the simple API in a mod_perl environment, but I'm also getting the
> same result with the zxid CGI binary.  IdP metadata, POST request, and
> response are attached, for reference.
>
> Lastly, it appears the Perl examples in the 0.32 distribution are out
> of
> date.  For example, sp_dispatch now takes 3 args (zxid.pl was trying to
> pass
> 4), and sp_dispatch result is now a string code (not state numbers), and

I see. Perhaps you could send me a patch?

> some XS wrapper code needed patches to handle empty pointers (functions
> which otherwise return zx_str) to prevent segfaults.  I was planning to
> use
> the full API (zxid.pl) for detailed exception handling/reporting, but am
> currently depending on the simple API until I can finish a first, stable
> integration.
>
> Perhaps I might provide some help updating the Perl components, spare time
> permitting?

Please do.

> Overall, other than the above mentioned issues, the library is
> working OK so far with a PingFederate IdP and perfectly with
> ssocircle.com.

Nice to know.

Cheers,
--Sampo

> Thanks,
> Eric
>
>
> On Sun, Aug 16, 2009 at 3:57 PM, <sampo@xxxxxxxxxxx> wrote:
>
>> Eric Rybski wrote:
>> > Hello all,
>> >     I have a few questions about zxid's session management.
>> Currently,
>> > I'm
>> > interested in a redundant SP implementation using an existing
>> > load-balanced
>> > web server infrastructure.  Since zxid is solely filesystem based at
>> this
>> > time, I'm considering a few options for central session storage:
>> >
>> > 1. Use a single SP server and proxy SSO requests to this server.
>>
>> Does not give you redundancy.
>>
>> > 2. Use a NFS mount for /var/zxid/ses (and likely /var/zxid/log/rely).
>>
>> Doable.
>>
>> > 3. Use a virtual filesystem for /var/zxid/ses
>> > (and likely /var/zxid/log/rely), such as memcachefs or mysqlfs.
>>
>> Better. For simplicity, I would simply share /var/zxid across the farm.
>>
>> Please note that if the objective is scaling / load balancing, sharing
>> across the farm is a good idea. If the objective is fault tolerance, you
>> need to be very careful and conscious about how you provide the shared
>> filesystem in a redundant fashion.
>>
>> > Given how zxid currently manages sessions via pseudorandom numbers,
>> would
>> > it
>> > be safe to run concurrently across multiple webservers on a
>> centralized
>> > filesystem?
>>
>> Current session IDs are quite short, 48 bits, see zxidconf.h
>> ZXID_ID_BITS, for the sake of convenience and conciseness.
>>
>> The standard assumption about "insignificant" probability of collision
>> in the absence of any explicit duplicate check is that a 128 bit random
>> number is sufficient (it is more probable that you will have a collision
>> due to an error induced by a cosmic ray than on a statistical basis).
>>
>> Thus, I would encourage you to review what your tolerance for collision
>> is and adjust ZXID_ID_BITS accordingly.
>>
>> Adjusting for the above consideration, I do not see any problem in
>> running
>> across a farm of servers that share filesystem.
>>
>> > It seems most SP/IdP implementations use a single-server
>> > (with
>> > optional failover-server) concept, but my target environment is
>> generally
>> > better suited for distributed web services and already has
>> infrastructure
>> > in
>> > place for options 2 or 3.
>> >
>> > My priorities are: 1. security; 2. fault tolerance.  Thus, if a
>> centralized
>> > filesystem could compromise user security in any way (e.g. session
>> > directory
>> > shared due to pseudorandom collisions), a single SP server would
>> likely
>> be
>> > the better option.
>>
>> Even single SP code does not make duplicate check. It assumes you
>> adjusted ZXID_ID_BITS to correspond to your tolerance of collision.
>>
>> > Note: I see that I can compile ZXID_ID_BITS with a fairly high value
>> (i.e.
>> > 144), so the chance of a pseudorandom collision should be extremely
>> > improbable in a real-world context.
>>
>> Right. 144 is the base64-rounded number above 128. This is what I
>> recommend for high security. However, many "commercial grade" apps may
>> want to optimize for compactness and convenience while maintaining
>> "reasonable" security. Hence my default of 48 bits.
>>
>> Before anyone has a knee-jerk reaction about security, please do
>> provide an analysis of the right number of bits. I am not interested
>> in dogma.
>>
>> Cheers,
>> --Sampo
>>
>> > Thanks,
>> > Eric