Open source machine learning security plus the machine learning and surveillance bonus issue

The Lindahl Letter

Dr. Nels Lindahl

Jun 10, 2022

Security is the one element of open source software that always has to be considered. Depending on the size of the developer community participating and the rate of development, the number of security vulnerabilities is going to rise and fall. It will be a constant battle between the people trying to take advantage of vulnerabilities and the people who fight the good fight of software security. I have taken a serious look at this topic before. The risks associated with software security are a problem for both open source and proprietary software. Arguments have been made that bringing together more and more people who use a piece of software via the open source model creates a scenario where risk is reduced through transparency and contributions from a multitude of sources.

Back in January, Kent Walker, the President of Global Affairs for Google, shared a blog post titled “Making Open Source software safer and more secure” [1]. That missive talked about Log4j, a recent open source vulnerability that affected a variety of industries. A reminder was included about a $100 million pledge to support third-party foundations such as the Open Source Security Foundation [2]. Kent reduced the question down to three things: identifying the critical projects instead of trying to boil the ocean, being clear about security testing baselines, and figuring out methods for increased support from both public and private sources.

Let’s zoom out for a second and look at public policy and regulation related to open source software security. Back on May 12, 2021, Executive Order 14028 was issued on “Improving the Nation’s Cybersecurity” [3]. The whole order is 15 pages long and may take you about 20 minutes to read. You can pivot from that to the May 11, 2022 update to the National Institute of Standards and Technology (NIST) guidance on “Software Security in Supply Chains” [4]. That collection of online pages will take you a lot longer to read. It has a pretty high density of content. A lot of supply chains now use machine learning and a mix of open source software elements to make things work along the path from production to delivery. As you can imagine, a lot of policy makers are legitimately concerned about risks to supply chains.

Now that we have considered the concerns of the developers, companies, and governments looking at open source software security, you can see the scope of risk involved. I’m not sure I see any easy solutions on the horizon for this one. It is going to be something that has to be mitigated in real time, and a lot of people are going to have to work together to make that happen on an ongoing basis.

Does this week include some bonus edition content? Yes, it does. We are about to cover a bonus topic: “Machine learning and surveillance.”

Welcome to the bonus topic this week. My backlog of topics has grown a bit out of control. This is week 72, for example, and the backlog has 120 topics. Moving forward, I’m going to grab a few of the topics and turn them into double issues of The Lindahl Letter.

Machine learning can help make sense of mind-boggling amounts of data through anomaly detection and computer vision. You can quickly work through hours of security video footage from the cameras at a building and only look at the footage where motion or some other type of change occurs. For overnight security and monitoring, this means that a large portion of the footage can be cleared away almost immediately with no review required. You can then move from anomaly detection to the more complicated elements of computer vision to tag elements in the video and flag things for manual review or intervention through an alarm or notification. I jumped into a quick Google Scholar search for academic papers related to “computer vision machine learning surveillance” [5]. This is an area where you can find some really solid and well understood use cases.
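To make the "only work with the footage where something changes" idea concrete, here is a minimal sketch of frame differencing, which is one simple way to do it and not necessarily what any given camera system uses. Everything in it is an assumption for illustration: frames are small grayscale grids of 0 to 255 values, and the names changed_fraction, frames_with_motion, pixel_delta, and min_fraction are hypothetical.

```python
from typing import List, Sequence

# Assumption: a frame is a grid of grayscale pixel values in the range 0-255.
Frame = Sequence[Sequence[int]]


def changed_fraction(prev: Frame, curr: Frame, pixel_delta: int = 25) -> float:
    """Fraction of pixels whose brightness moved by more than pixel_delta."""
    total = 0
    changed = 0
    for prev_row, curr_row in zip(prev, curr):
        for p, c in zip(prev_row, curr_row):
            total += 1
            if abs(p - c) > pixel_delta:
                changed += 1
    return changed / total if total else 0.0


def frames_with_motion(
    frames: Sequence[Frame],
    pixel_delta: int = 25,
    min_fraction: float = 0.01,
) -> List[int]:
    """Indices of frames that differ noticeably from the frame before them."""
    flagged = []
    for i in range(1, len(frames)):
        if changed_fraction(frames[i - 1], frames[i], pixel_delta) >= min_fraction:
            flagged.append(i)
    return flagged


# Toy 4x4 "clip": nothing happens, one pixel lights up, then it goes dark again.
still = [[10] * 4 for _ in range(4)]
moved = [row[:] for row in still]
moved[1][2] = 200
clip = [still, still, moved, moved, still]

print(frames_with_motion(clip))  # [2, 4]
```

Only frames 2 and 4 would need a human or a heavier computer vision model to look at them. The two thresholds are the knobs you would tune to trade missed events against false alarms from noise or lighting changes.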

Back in the week 37 coverage, one of the links referenced the CLIP technology from OpenAI [6]. You can grab an implementation from Johan Modin over on GitHub that will help you do contrastive language to image searches [7]. When you see people in movies searching hours and hours of video for the needle in the haystack and coming back quickly with every example of “The Man with One Red Shoe,” a technology like this would be making that magic happen. If you have not seen the 1985 Tom Hanks comedy thriller of the same name, then you might be missing out on the rich comedic depth of that reference. With the right amount of investment and computing power, you can do amazing things in the surveillance space with machine learning. Some of them are shockingly advanced compared to where we were before.
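The search step behind that kind of language to image lookup can be sketched without any model at all. This is a minimal illustration that assumes a model such as CLIP has already turned the text query and each video frame into vectors in a shared space. The vectors, frame labels, and the rank_frames helper are made up for the example and are not taken from the linked implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_frames(
    query_vec: Sequence[float],
    frame_vecs: Dict[str, Sequence[float]],
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the top_k frame labels most similar to the query, best first."""
    scored = [
        (frame_id, cosine_similarity(query_vec, vec))
        for frame_id, vec in frame_vecs.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hand-made stand-ins for real embeddings (assumption: 3 dimensions for readability).
query = [1.0, 0.0, 0.0]  # pretend embedding of the text "a man with one red shoe"
frames = {
    "cam1_00:01:12": [0.9, 0.1, 0.0],
    "cam1_00:07:45": [0.0, 1.0, 0.0],
    "cam2_00:03:30": [0.5, 0.5, 0.0],
}

for frame_id, score in rank_frames(query, frames):
    print(frame_id, round(score, 3))
```

In a real setup the embeddings would come from the model's text and image encoders and would have hundreds of dimensions, but the ranking logic stays this simple. That is a big part of why searching hours of footage with a sentence can feel instant once the frames have been indexed.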

The part of this topic that I really want to cover, but which is again a deeper topic for conversation, involves the various methods people use to stitch data together for internet tracking. Some of these tracking methods make the surveillance methods mentioned above seem primitive. I’ll try to figure out a solid way to explain how machine learning is being used within internet tracking frameworks and work that content into a weekly post in the not so distant future.

Links and thoughts:

“[ML News] DeepMind's Flamingo Image-Text model | Locked-Image Tuning | Jurassic X & MRKL”

“UiPath CEO Daniel Dines thinks automation can fight the great resignation”

“Vergecast: Google CEO Sundar Pichai on Google I/O 2022”

“The Download: Markdoc, VS Code Updates, Optimus Prime LEGO and More!”

Top 5 Tweets of the week:

The Verge @verge

Introducing the @voxmedia Writers Workshop, a free training and mentorship program that pairs our reporters, such as @corintxt, with aspiring journalists and includes exclusive Q&As with senior staff like @reckless & @sarahjeong. Learn more or apply now: voxmediaevents.com/writersworkshop

Sundar Pichai @sundarpichai

Always fun to go super deep on our products on The Vergecast. Chatted with @reckless and @pierce about AI advancements, our growing Pixel portfolio, AR + more!

theverge.com: “Vergecast: Google CEO Sundar Pichai on Google I/O 2022.” We interview the Google CEO and then do another hour.

Casey Newton @CaseyNewton

Spent the day talking to Twitter employees. No one knows why the CEO fired two of his top two lieutenants, and lots of people are uneasy about what’s coming next platformer.news/p/twitters-mel…

Hugging Face @huggingface

Last week @MetaAI publicly released huge LMs, with up to ☄️30B parameters. Great win for Open-Source🎉 These checkpoints are now in 🤗transformers! But how to use such big checkpoints? Introducing Accelerate and ⚡️BIG MODEL INFERENCE⚡️ Load & USE the 30B model in colab (!)⬇️

We load the checkpoint that is saved on disk and we dispatch it to the devices. At no point is the checkpoint fully loaded in RAM; only parts of it to be dispatched to each device.

We load it as float16 so that we may load more layers at a time on each device for a faster execution time.

nilay patel @reckless

Nothing says decentralized marketplace like automated copyright protections

theverge.com: “OpenSea is adding NFT copy detection and verification features.” OpenSea will scan for flips, rotations, and other copies.

Footnotes:

[1] https://blog.google/technology/safety-security/making-open-source-software-safer-and-more-secure/

[2] https://openssf.org/

[3] https://www.federalregister.gov/documents/2021/05/17/2021-10460/improving-the-nations-cybersecurity or in PDF here https://www.govinfo.gov/content/pkg/FR-2021-05-17/pdf/2021-10460.pdf

[4] https://www.nist.gov/itl/executive-order-14028-improving-nations-cybersecurity/software-security-supply-chains

[5] https://scholar.google.com/scholar?q=computer+vision+machine+learning+surveillance&hl=en&as_sdt=0&as_vis=1&oi=scholart

What’s next for The Lindahl Letter?

Week 73: Symbolic machine learning
Week 74: ML content automation
Week 75: Is ML destroying engineering colleges?
Week 76: What is post theory science?
Week 77: What is GPT-NeoX-20B?

I’ll try to keep the “what’s next” list for The Lindahl Letter forward looking with at least five weeks of posts in planning or review. If you enjoyed this content, then please take a moment and share it with a friend. Thank you and enjoy the week ahead.