Any idea how many DOE OOD installations there are?
We don’t have a good grasp of all sites that have OOD installed, but do keep track of ones that explicitly inform us. We do know it’s in production use at Idaho NL and Lawrence Berkeley NL.
We are also aware that Argonne NL, Lawrence Livermore NL, Pacific Northwest NL, and Oak Ridge NL have all evaluated and experimented with it, but I don’t know its current status at any of those locations.
If anyone has any updates, please let us know.
Because of the DOE/NOAA connection, Oak Ridge NL might provide the best insights regarding HPC security, reverse proxies, etc.
Any idea who to contact?
What type of information are you looking for? If it’s a general cybersecurity audit of OOD, I’d recommend looking at the report from TrustedCI, which has done 2 independent evaluations of OOD: trustedci.org. We also have some pretty good architecture overviews in the docs linked from our main website. Alternatively, I know INL has done some pretty extensive audits of the code too, as they’ve shared details of those with us. There are several people from INL who are active on this Discourse that you can search for and ping, including Brandon Biggs and Matt Sgambati.
The technical problem getting OOD running for us is covered in another thread
The crux of the problem is security constraints; we don’t have a lot of latitude in changing fundamental security policies.
If we get to a complete technical dead end, we can either abandon OOD, or try to find a suitable, tested, alternate security policy. Since we work with DOE and they tend to have tighter security, it would be easier to pitch something they have operational as a path forward. Talking with someone at DOE using OOD may lead to some new ideas.
Another possibility is an institutional service / development contract to bulldoze the problem in the OOD source. We definitely don’t want to fork the source and make the mods ourselves, for reasons of security and lack of expertise.
2 independent evaluations of OOD
Can you provide a link? Searching trustedci.org I was unable to find them.
A search of their blog turned up the report.
Sorry, it was an oversight on my part not linking to these on the main openondemand.org website. I’ve added links to both reports in the Project Cybersecurity section now.
Just to document, you might notice that the publicly available version of the 2021 report says it is a redacted version (we didn’t need to redact anything in the 2018 report). The OOD team and TrustedCI jointly decided to redact a two-page appendix from the official internal report prior to public release. This appendix is titled “FPVA Step 4 - Recommend Areas for Detailed Analysis” and contains a list of “the places where we would start looking in our detailed code assessment”. It’s basically a short roadmap for doing an FPVA of our code base. We felt it best NOT to release that publicly, so as not to give bad actors a roadmap for finding vulnerabilities in our code.
If anyone feels they need to see this appendix for some reason, please directly contact me to discuss.
I was able to have a productive conversation with INL. However, because of our security design, they could not advise on our specific problem.
I am particularly interested in ORNL because we use their HPC facility & security for climate modeling. They have multiple HPC facilities, big shop.
I queried my ORNL contacts on our CM board, and they are unaware of OOD being evaluated for our HPC.
Any suggestions on how to reach those who are evaluating it at ORNL?
I would suggest reaching out to Jay Jay Billings at ORNL. I believe they use it on the CADES system.
Also, @matthew.dougherty can you share any details about the aspects of ‘your security design’ that are causing issues with installing OOD? OOD is installed at hundreds of distinct centers all over the world, so it’s rare for us to hear about systemic reasons for not being able to install it. We’d be interested in understanding whether there is something we could put on our roadmap to help resolve your issues. Happy to discuss offline at email@example.com if you would prefer not to discuss in public.
As for Billings, it looks like he left for Amazon three months ago. His profile at ORNL is still up, but the email bounces. I will try to track him down to get the name of someone at ORNL. If they have it operational at ORNL, this would be a big advantage in getting through the CM gauntlet for our HPC at ORNL.
Regarding our OOD evaluation & security, it would be best for my team to answer that, perhaps in a phone call. Many of the security issues are beyond me. At times I feel I am in Plato’s cave watching shadows. When we met with INL, their network staff talked to our network team, and the security issue became obvious to them; it was not something they had encountered before.
They seemed to think, like our staff, that a source code change may be needed, which is something we would not attempt. This may be best addressed by an institutional service / development contract.
I got in touch with Billings. His ORNL group was using OOD, but it was disbanded a year ago. He was unable to provide any more info as to whether OOD is still being used, or the names of anyone at ORNL who might know.
Matthew: We’d be happy to have a technical discussion with your network team to better understand the issue and potentially come up with a source code change to resolve it. Please reach out to me at firstname.lastname@example.org to coordinate a meeting.
Matthew: To be clear, at a high level we understand the desire to have a Reverse Proxy with OOD on a separate host. But it still would be helpful to perhaps have a discussion with your technical team to understand more about this and your constraints.
Put another way, as I said, there are hundreds of HPC centers that run OOD, and until now none of them have had the security constraint you have; this is an unusual outlier. It would be helpful to understand more about the underlying security concerns / constraints to ensure that any potential solution we come up with would be acceptable within them.