Don't Be Fooled: Securing Your Python Projects from PyPI Supply Chain Attacks

The cybersecurity landscape is becoming increasingly complex, with attackers continually evolving their methods to infiltrate systems and compromise sensitive data. One of the more insidious techniques gaining traction is the supply chain attack, particularly through malicious packages in the Python Package Index (PyPI). These attacks exploit the trust developers place in open-source libraries, allowing malicious actors to inject harmful code into software projects.

This post covers the mechanics of supply chain attacks via PyPI packages, their implications, notable case studies, technological countermeasures, future trends and predictions, and the steps necessary to mitigate such risks.

Contents

1 Understanding Supply Chain Attacks

2 The Mechanics of PyPI Supply Chain Attacks

3 Common PyPI Supply Chain Attack Techniques

4 Case Studies of PyPI Supply Chain Attacks

5 Deep analysis on Colorama

6 Phishing via PyPI Packages

7 Ten PyPI Packages Used to Steal Credentials

8 Technological Countermeasures

9 Future Trends and Predictions

10 Mitigation Strategies

11 Protecting Yourself from PyPI Supply Chain Attacks

12 Beyond the Basics: Advanced Security Measures

13 Conclusion

Understanding Supply Chain Attacks

A supply chain attack involves compromising a third-party component within a software supply chain, such as a library, dependency, or package, to inject malicious code into the final software product. In the context of PyPI, attackers upload malicious packages that appear legitimate, which developers then unwittingly include in their software projects.

The Mechanics of PyPI Supply Chain Attacks

1. Creation of Malicious Packages: Attackers create a package that either mimics a popular library by using a similar name (typosquatting) or claims to provide valuable functionality.

2. Upload to PyPI: The malicious package is uploaded to PyPI, making it available for download and use by developers worldwide.

3. Inclusion in Projects: Developers, often in a hurry or relying on automated dependency management tools, inadvertently include the malicious package in their projects.

4. Execution of Malicious Code: The malicious code runs when the package is installed or imported, or once the project is deployed, allowing the attacker to perform actions such as data theft, installing backdoors, or spreading malware.

Common PyPI Supply Chain Attack Techniques

Typosquatting: Attackers create packages with names that closely resemble popular ones, hoping developers will install them accidentally. For instance, a malicious package named “django-mail” could target developers looking for the well-known “django-mailer” package.

Dependency Confusion: Attackers exploit the way package installers resolve names across multiple indexes. They publish a package on the public index under the same name as an organization’s private, internal package, often with a higher version number, hoping the installer will choose the public copy over the legitimate internal one.

Compromised Accounts: Attackers can steal developer credentials or gain access to their accounts to upload malicious packages under a trusted name. This technique leverages the reputation of the compromised developer to trick users into installing the malware.

Vulnerable Package Management Tools: Outdated or insecure package managers can have vulnerabilities that attackers can exploit to inject malicious code during the installation process.
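As a rough illustration of how typosquatting can be caught before installation, the sketch below compares a requested name against a list of packages you already trust and flags near misses. The KNOWN_PACKAGES list and the 0.8 similarity cutoff are assumptions made for this example, not values from any official tool; in practice the list might come from your lock file or an internal allowlist.

```python
import difflib
import re

# Hypothetical allowlist for illustration; replace with your own trusted names.
KNOWN_PACKAGES = ["requests", "urllib3", "colorama", "django-mailer", "numpy"]


def normalize(name):
    """Normalize a project name as described in PEP 503
    (lowercase, runs of '-', '_' and '.' collapsed to a single '-')."""
    return re.sub(r"[-_.]+", "-", name).lower()


def possible_typosquat(name, known=KNOWN_PACKAGES, cutoff=0.8):
    """Return the trusted package that `name` closely resembles, or None.

    A name that matches a trusted package exactly (after normalization)
    is treated as legitimate and returns None.
    """
    candidate = normalize(name)
    trusted = [normalize(k) for k in known]
    if candidate in trusted:
        return None
    matches = difflib.get_close_matches(candidate, trusted, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    for requested in ["django-mail", "reqeusts", "numpy"]:
        lookalike = possible_typosquat(requested)
        if lookalike:
            print(f"{requested!r} looks suspiciously like {lookalike!r}")
```

A check like this only catches names that are close to something on your list; it says nothing about packages that are malicious under an unrelated name.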

Case Studies of PyPI Supply Chain Attacks

Colorama Typosquatting Attack

In 2022, a researcher discovered a series of packages on PyPI that closely mimicked the names of popular libraries, such as “Colorama” and “urllib3”. These packages contained malicious code designed to steal environment variables, which could include sensitive information like API keys and tokens.

Attack Vector:

The attacker uploaded packages with names that were nearly identical to legitimate ones such as “colorama”.
Developers who mistyped the package name during installation inadvertently installed the malicious package.
The malicious code within the package executed and exfiltrated environment variables to the attacker’s server.

Impact:

Hundreds of projects were potentially compromised, leading to the exposure of sensitive information and credentials.

Deep analysis on Colorama

In addition to spreading the malware through malicious GitHub repositories, the attacker used a malicious Python package called “yocolor” to distribute the malware-laden “colorama” package. They applied the same typosquatting technique to the hosting domain, serving the malicious package from “files[.]pypihosted[.]org” (the legitimate domain is files.pythonhosted.org) under the same name as the legitimate “colorama” package.

By manipulating the package installation process and exploiting the trust users place in the Python package ecosystem, the attacker ensured that the malicious “colorama” package would be installed whenever the malicious dependency was specified in the project’s requirements. This tactic allowed the attacker to bypass suspicions and infiltrate the systems of unsuspecting developers who relied on the integrity of the Python packaging system.

Stage 1

In the first stage, the unsuspecting user downloads the malicious repo or package, which pulls in the malicious dependency “colorama” from the typosquatted domain “files[.]pypihosted[.]org”.

Example of what the malicious code looks like within the yocolor package

Stage 2

The malicious “colorama” package contains code that is identical to the legitimate package, with the exception of a short snippet of additional malicious code. Initially, this code was located within the file “colorama/tests/__init__.py”, but the attacker later moved it to “colorama/__init__.py”, likely to ensure that the malicious code is executed more reliably. This code sets the stage for the subsequent phases of the attack.

The attacker employed a clever technique to hide the malicious payload within the code. They used a significant amount of whitespace to push the malicious code off-screen, requiring someone inspecting the package to scroll horizontally for an extended period before discovering the hidden malicious content. This technique aimed to make the malicious code less noticeable during a quick review of the package’s source files.

This code fetches and executes another piece of Python code from “hxxps[:]//pypihosted[.]org/version”, which installs the necessary libraries and decrypts hard-coded data using the “fernet” library. The decrypted code then searches for a valid Python interpreter and executes yet another obfuscated code snippet saved in a temporary file.
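The whitespace-padding trick described in this stage is easy to miss by eye but simple to check for mechanically. The following is a minimal sketch, assuming that "hidden" code appears on the same line as ordinary code after a long run of spaces or tabs; the 80-character threshold and the function names are arbitrary choices for illustration.

```python
import re
from pathlib import Path

# Assumed threshold: flag gaps of at least this many spaces/tabs inside a line.
MIN_GAP = 80

_GAP = re.compile(r"\S([ \t]{%d,})\S" % MIN_GAP)


def find_padded_lines(source):
    """Return (line_number, gap_length) pairs for lines in which visible text
    is followed by a long whitespace run and then more text."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        match = _GAP.search(line)
        if match:
            hits.append((number, len(match.group(1))))
    return hits


def scan_package(root):
    """Scan every .py file under `root`; map file path -> suspicious lines."""
    report = {}
    for path in Path(root).rglob("*.py"):
        text = path.read_text(encoding="utf-8", errors="replace")
        hits = find_padded_lines(text)
        if hits:
            report[str(path)] = hits
    return report
```

Running scan_package on an unpacked source distribution before installing it would surface lines worth opening in an editor with word wrap enabled. Deep indentation alone is not flagged, since the pattern requires text on both sides of the gap.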

Stage 3

The malware progresses further, fetching additional obfuscated Python code from another external link: hxxp[:]//162[.]248[.]100[.]217/inj, and executes it using “exec”.
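The fetch-then-exec pattern used in these stages can sometimes be spotted with a simple static pass over a package's source. The sketch below uses Python's ast module to list direct calls to exec, eval, compile, and __import__, along with imports of a few network-related modules. The module and call lists are assumptions chosen for this example, and heavily obfuscated code can evade a check this simple.

```python
import ast

# Assumed lists for illustration; extend them to suit your own review policy.
RISKY_CALLS = {"exec", "eval", "compile", "__import__"}
NETWORK_MODULES = {"urllib", "requests", "http", "socket"}


def audit_source(source):
    """Parse (never execute) `source` and report risky calls and network imports.

    Returns a dict with two sorted lists of (name, line_number) tuples.
    """
    tree = ast.parse(source)
    risky_calls = []
    network_imports = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in RISKY_CALLS:
                risky_calls.append((node.func.id, node.lineno))
        elif isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in NETWORK_MODULES:
                    network_imports.append((alias.name, node.lineno))
        elif isinstance(node, ast.ImportFrom):
            if node.module and node.module.split(".")[0] in NETWORK_MODULES:
                network_imports.append((node.module, node.lineno))
    return {
        "risky_calls": sorted(risky_calls, key=lambda item: item[1]),
        "network_imports": sorted(network_imports, key=lambda item: item[1]),
    }
```

A file that both imports a network module and calls exec is not necessarily malicious, but in a package that claims to do terminal coloring it deserves a closer look.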

Stage 4

Upon analysis, it’s clear that the attacker has put thought into obfuscating their code. Chinese and Japanese character strings, zlib compression, and misleading variable names are just a few of the techniques employed to complicate the code’s analysis and comprehension.

The simplified code checks the compromised host’s operating system and selects a random folder and file name to host the final malicious Python code, which is retrieved from “hxxp[:]//162[.]248[.]100[.]217[:]80/grb”.

A persistence mechanism is also employed by the malware by modifying the Windows registry to create a new run key, which ensures that the malicious Python code is executed every time the system is rebooted. This allows the malware to maintain its presence on the compromised system even after a restart.
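To check a Windows machine for this kind of persistence, one option is to list the autostart entries and look for ones that launch Python. The sketch below reads the current user's Run key with the standard-library winreg module. The specific key path (HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Run) is an assumption, since the analysis above does not say which run key the malware writes to, and the "python" and ".py" markers are a crude heuristic.

```python
import sys

# Assumed location; the malware may use a different run key.
RUN_KEY_PATH = r"Software\Microsoft\Windows\CurrentVersion\Run"


def read_run_entries():
    """Return {name: command} for the current user's Run key.

    Returns an empty dict on non-Windows platforms.
    """
    if sys.platform != "win32":
        return {}
    import winreg  # standard library, available only on Windows

    entries = {}
    with winreg.OpenKey(winreg.HKEY_CURRENT_USER, RUN_KEY_PATH) as key:
        value_count = winreg.QueryInfoKey(key)[1]  # number of values in the key
        for index in range(value_count):
            name, command, _value_type = winreg.EnumValue(key, index)
            entries[name] = str(command)
    return entries


def python_autostarts(entries):
    """Keep only entries whose command mentions a Python interpreter or script."""
    markers = ("python", ".py")
    return {
        name: command
        for name, command in entries.items()
        if any(marker in command.lower() for marker in markers)
    }


if __name__ == "__main__":
    for name, command in python_autostarts(read_run_entries()).items():
        print(f"Autostart entry {name!r} runs: {command}")
```

Legitimate tools also register Python scripts to run at login, so any hit still needs manual review.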

Stage 5

The final stage of the malware, retrieved from the remote server, reveals the true extent of its data-stealing capabilities. It targets a wide range of popular software applications and steals sensitive information, including:

Browser Data: The malware targets a wide range of web browsers, including Opera, Chrome, Brave, Vivaldi, Yandex, and Edge. It searches for specific directories associated with each browser and attempts to steal sensitive data such as cookies, autofill information, browsing history, bookmarks, credit cards, and login credentials.

Discord Data: The code specifically targets Discord by searching for Discord-related directories and files. It attempts to locate and decrypt Discord tokens, which can be used to gain unauthorized access to the victim’s Discord account.

Cryptocurrency Wallets: The malware includes a list of cryptocurrency wallets that it aims to steal from the victim’s system. It searches for specific directories associated with each wallet and attempts to steal wallet-related files. The stolen wallet data is then compressed into ZIP files and uploaded to the attacker’s server.

Telegram Sessions: The malware also attempts to steal Telegram session data. It searches for Telegram-related directories and files, aiming to capture the victim’s session information. With access to Telegram sessions, the attacker could potentially gain unauthorized access to the victim’s Telegram account and communications.


Computer Files: The malware includes a file stealer component that searches for files with specific keywords in their names or extensions. It targets directories such as Desktop, Downloads, Documents, and Recent Files.

Instagram Data: The malware attempts to steal sensitive information from the victim’s Instagram profile by leveraging the Instagram session token. It sends requests to the Instagram API using the stolen session token to retrieve various account details.

Further analysis of the final payload reveals that the malware also includes a keylogging component. It captures the victim’s keystrokes and saves them to a file, which is then uploaded to the attacker’s server. This capability allows the attacker to monitor and record the victim’s typed input, potentially exposing sensitive information such as passwords, personal messages, and financial details.

The stolen data is exfiltrated to the attacker’s server using various techniques. The code includes functions to upload files to anonymous file-sharing services like GoFile and Anonfiles. It also sends the stolen information to the attacker’s server using HTTP requests, along with unique identifiers like hardware ID or IP address to track the victim.

Phishing via PyPI Packages

In another instance, a series of malicious packages were found to include phishing URLs. When developers installed these packages, they were prompted to visit a website that appeared legitimate but was designed to steal their credentials.

Attack Vector:

Malicious packages included code that displayed a message prompting the user to visit a phishing site for “additional setup”.
The phishing site captured credentials entered by unsuspecting developers.

Impact:

User accounts were compromised, leading to unauthorized access to systems and services.

Ten PyPI Packages Used to Steal Credentials

The malicious PyPI packages discovered by Check Point and outlined in a new report are:

Ascii2text – Mimicking “art,” a popular ASCII Art Library for Python, Ascii2text uses the same description minus the release details. Its code fetches a malicious script that searches for local passwords and exfiltrates them via a Discord webhook.

Pyg-utils, Pymocks, PyProto2 – All three packages target AWS credentials and appear very similar to another set of packages discovered by Sonatype in June. The first even connects to the same domain (“pygrata.com”), while the other two target “pymocks.com”.

Test-async – Package with a vague description that fetches malicious code from a remote resource and notifies a Discord channel that a new infection has been established.

Free-net-VPN and Free-net-vpn2 – User credential harvesters that publish the stolen data to a site mapped by a dynamic DNS mapping service.

Zlibsrc – Mimicking the zlib project, this package contains a script that downloads and runs a malicious file from an external source.

Browserdiv – Package targeting the credentials of web design programmers. Uses Discord webhooks for data exfiltration.

WINRPCexploit – A credential-stealing package that promises to automate the exploitation of the Windows RPC vulnerability. However, when executed, the package will upload the server’s environment variables, which commonly contain credentials, to a remote site under the attacker’s control.
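A quick way to act on a list like this is to check whether any of the named packages are present in your environment. The sketch below compares the distributions visible to the current interpreter, via the standard-library importlib.metadata, against the ten names above. The function names are made up for this example, and a clean result only means these particular names are absent.

```python
import re
from importlib import metadata

# Package names taken from the list above.
REPORTED_MALICIOUS = [
    "Ascii2text", "Pyg-utils", "Pymocks", "PyProto2", "Test-async",
    "Free-net-VPN", "Free-net-vpn2", "Zlibsrc", "Browserdiv", "WINRPCexploit",
]


def normalize(name):
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def installed_names():
    """Names of all distributions visible to the current interpreter."""
    names = []
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:
            names.append(name)
    return names


def find_reported(installed, denylist=REPORTED_MALICIOUS):
    """Return the installed names that match the denylist, sorted."""
    blocked = {normalize(name) for name in denylist}
    return sorted(name for name in installed if normalize(name) in blocked)


if __name__ == "__main__":
    for name in find_reported(installed_names()):
        print("Reported malicious package installed:", name)
```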

Technological Countermeasures

To protect against supply chain attacks via PyPI packages, organizations and developers can implement several technological countermeasures:

1. Package Verification: Use tools that verify the integrity and authenticity of packages. For example, pip’s hash-checking mode (--require-hashes) rejects files whose hashes do not match the pinned values, and pip-audit reports dependencies with known vulnerabilities.

2. Dependency Management: Employ strict dependency management practices. Use tools like pipenv or poetry to lock dependencies to known, trusted versions and prevent unauthorized updates.

3. Automated Scanning: Implement automated security scanning for code and dependencies. Tools like Snyk and GitHub’s Dependabot can continuously monitor for vulnerabilities in dependencies.

4. Code Reviews: Conduct thorough code reviews, particularly for new dependencies. Ensure that any third-party code included in a project is reviewed for potential security risks.

5. Sandboxing: Run new or untrusted code in a sandbox environment to observe its behavior before deploying it to production systems.
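The integrity check behind item 1 comes down to comparing a file's cryptographic hash with a value obtained from a source you trust. Here is a minimal sketch of that comparison using SHA-256; the function names are illustrative, and where the expected hash comes from (a lock file, an internal registry) is left as an assumption.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Return True if the file's SHA-256 digest equals `expected_hex`
    (case-insensitive)."""
    return hmac.compare_digest(sha256_of(path), expected_hex.lower())
```

Calling matches_expected on a downloaded wheel or source archive, with the digest recorded when the dependency was first vetted, would detect a file that has been swapped since then.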

Future Trends and Predictions

1. Increased Frequency and Sophistication: As awareness of supply chain attacks grows, attackers will develop more sophisticated techniques to evade detection. This may include more subtle typosquatting or the use of advanced obfuscation methods.

2. Targeting CI/CD Pipelines: Continuous Integration and Continuous Deployment (CI/CD) pipelines, which automate software development processes, will become prime targets. Compromising these pipelines can allow attackers to inject malicious code into multiple projects at once.

3. Enhanced Detection Capabilities: Security tools and platforms will evolve to better detect and prevent supply chain attacks. Machine learning and AI will play a crucial role in identifying anomalies and potential threats in real-time.

4. Collaboration and Information Sharing: Increased collaboration between organizations, security researchers, and platform providers like PyPI will lead to more effective identification and mitigation of threats. Shared threat intelligence will be critical in staying ahead of attackers.

Mitigation Strategies

1. Education and Awareness: Educate developers and teams about the risks associated with supply chain attacks and the importance of verifying dependencies. Awareness is the first line of defense.

2. Regular Audits: Conduct regular security audits of projects and dependencies. Ensure that all third-party packages are vetted and that any unnecessary dependencies are removed.

3. Use of Trusted Repositories: Where possible, use trusted and verified repositories. Organizations can maintain internal repositories of approved packages to reduce the risk of introducing malicious code.

4. Incident Response Plan: Develop and maintain an incident response plan specifically for supply chain attacks. This plan should include steps for identifying compromised packages, mitigating damage, and restoring affected systems.
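Part of a regular audit (items 2 and 3) can be automated by scanning requirements files for patterns that widen the attack surface. The sketch below flags two of them: requirements that are not pinned to an exact version, and use of pip's --extra-index-url option, which makes pip consider a second index alongside the primary one and is the setup that dependency confusion relies on. The function name and issue labels are invented for this example, and the parsing is deliberately simplistic.

```python
def audit_requirements(text):
    """Scan requirements.txt-style text.

    Returns a list of (line_number, issue, line) tuples, where issue is
    either "extra-index-url" or "unpinned".
    """
    issues = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split(" #", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("#"):
            continue
        if line.startswith("--extra-index-url"):
            issues.append((number, "extra-index-url", line))
        elif line.startswith("-"):
            continue  # other pip options are ignored in this sketch
        elif "==" not in line:
            issues.append((number, "unpinned", line))
    return issues


if __name__ == "__main__":
    sample = "requests==2.31.0\nflask\n"
    for number, issue, line in audit_requirements(sample):
        print(f"line {number}: {issue}: {line}")
```

Pinning alone does not prove that a package is safe; it only ensures that the version you reviewed is the version you install.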

Protecting Yourself from PyPI Supply Chain Attacks

While PyPI takes steps to identify and remove malicious packages, the responsibility ultimately lies with developers to be vigilant. Here are some key strategies to fortify your defenses:

Stay Updated: Regularly update your package manager and development tools to patch vulnerabilities that attackers might exploit.

Scrutinize Package Details: Before installing a package, take a moment to review its description, author, and version history. Look for inconsistencies or red flags that might indicate malicious intent.

Use Trusted Sources: Consider using virtual environments and installing packages from trusted sources like official repositories or private repositories managed by your organization.

Leverage Dependency Verification Tools: There are tools available that can scan your project’s dependencies for known vulnerabilities or suspicious packages. These tools can be a valuable addition to your security toolkit.

Follow Secure Coding Practices: While securing the package ecosystem is important, don’t forget about secure coding practices within your own projects. Validate user input, sanitize data, and follow security best practices to minimize the impact of potential vulnerabilities.

Stay Informed: Keep yourself updated on the latest PyPI security threats and best practices. Subscribe to security advisories from PyPI and relevant security blogs to stay ahead of the curve.
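The "Scrutinize Package Details" advice can be partly scripted. PyPI serves project metadata as JSON at https://pypi.org/pypi/<name>/json, and the sketch below pulls out a few signals from it: the author, the number of releases, and how long ago the first file was uploaded. The field names reflect the API response as I understand it, and what counts as "too new" or "too few releases" is left for you to decide.

```python
import json
from datetime import datetime, timezone
from urllib.request import urlopen


def fetch_metadata(name):
    """Download project metadata from PyPI's JSON API (needs network access)."""
    with urlopen(f"https://pypi.org/pypi/{name}/json", timeout=10) as response:
        return json.load(response)


def summarize(meta, now=None):
    """Reduce PyPI JSON metadata to a few review-friendly fields."""
    info = meta.get("info", {})
    releases = meta.get("releases", {})
    upload_times = [
        datetime.fromisoformat(item["upload_time_iso_8601"].replace("Z", "+00:00"))
        for files in releases.values()
        for item in files
    ]
    now = now or datetime.now(timezone.utc)
    first_upload = min(upload_times) if upload_times else None
    return {
        "name": info.get("name"),
        "author": info.get("author"),
        "release_count": len(releases),
        "age_days": (now - first_upload).days if first_upload else None,
    }


if __name__ == "__main__":
    print(summarize(fetch_metadata("colorama")))
```

A package that is a few days old, has a single release, and has a name one letter away from a popular library is a stronger warning sign than any of those facts alone.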

Beyond the Basics: Advanced Security Measures

For organizations heavily reliant on Python and managing complex software supply chains, additional security measures can be implemented:

Multi-Factor Authentication (MFA): Enforce MFA for developer accounts to make it more difficult for attackers to gain unauthorized access.
Code Signing: Implement code signing for packages to verify their authenticity and origin.
Static Code Analysis: Integrate static code analysis tools into your development workflow to identify potential vulnerabilities within your codebase and dependencies.

Conclusion

Supply chain attacks are a growing threat in the software development landscape. By understanding the tactics attackers use and implementing the security measures outlined above, developers can significantly reduce the risk of falling victim to a PyPI supply chain attack. Remember, security is an ongoing process. By staying vigilant and adopting a layered security approach, you can protect the integrity of your Python projects and protect your users from harm.