Pandas is a popular Python-based software library for data analytics.
Pytest is a Python testing framework which encompasses unit tests, integration tests, end-to-end tests, and functional tests.
The pandas project uses pytest to test its software. For PR #51384, I had to learn how to write a simple pytest test for the ‘cut()’ function. I read the documentation to gain basic knowledge of its syntax, and looked through the tests folder of the pandas project to gain a deeper understanding of how tests are written and managed in pandas.
Doctests refer to chunks of text that are modelled after interactive Python sessions. Although they are embedded in text, they can be executed just like any other Python code snippets.
In pandas, doctests are mainly included in documentation to illustrate the functionality and features of different functions and data structures.
For PR #51389, I worked on resolving ‘undefined variable’ errors in doctests, as well as improving existing examples in the documentation. With the help of flake8 (a style guide enforcement tool) and the doctest library documentation, I was able to pick out errors in the doctests and fix them.
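As a small illustration of what a doctest looks like and how an ‘undefined variable’ error shows up, here is a sketch using Python's standard doctest module. The functions clamp, broken_mean and run_examples are made up for this example and are not pandas code; the sketch runs the examples directly with doctest rather than through flake8.

```python
import doctest


def clamp(value, low, high):
    """Limit ``value`` to the closed interval [low, high].

    Every name used in an example has to be defined in the docstring
    itself (or in the globals handed to the runner).

    >>> readings = [-5, 3, 12]
    >>> [clamp(r, 0, 10) for r in readings]
    [0, 3, 10]
    """
    return max(low, min(value, high))


def broken_mean():
    """Deliberately broken example: ``values`` is never defined.

    >>> sum(values) / len(values)
    2.0
    """


def run_examples(func):
    """Run the doctest examples in ``func`` and return (failed, attempted)."""
    parser = doctest.DocTestParser()
    # The only global made available to the examples is the function itself.
    test = parser.get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    # Swallow the failure report so that only the counts are returned.
    result = runner.run(test, out=lambda text: None)
    return result.failed, result.attempted


if __name__ == "__main__":
    print(run_examples(clamp))        # both examples pass
    print(run_examples(broken_mean))  # the NameError counts as a failure
```

Defining the missing name inside the docstring, as clamp does with readings, is the kind of fix that makes such an example pass.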
Thus, the developers are able to show more depth in their area of expertise, which allows new developers to engage in more meaningful conversations about the issue they are working on. Because they specialise in reviewing PRs in a certain aspect, they are able to link problems to other existing related issues right away and provide more insight to new developers.
In addition, through PR reviews, the core developers also actively encourage new developers to investigate issues further and participate in the project in other ways. This way, new developers feel more committed to the project.
For example, while working on PR #51356, I detected an error in the way flake8 (a style guide enforcement tool) was processing template strings in doctests. I pursued the issue further and with my findings, the core developers requested that I open issue #51377 for it.
This helps to streamline processes and allow developers to start on development tasks faster.
After bug fixes are merged, relevant issues are still kept open with a new tag “Needs tests”.
The test added is specific to the bug fixed, such that in the future, it will be able to catch the same bug and prevent possible regression.
Compared to the 4 checks we have set up under CATcher’s GitHub Actions Continuous Integration process, the pandas project has 39 checks in place. This is helpful in maintaining a codebase with many developers committing code often.
However, I also noticed that the presence of many checks often causes random failures. In addition, the checks take hours to complete. This could negatively affect efficiency when merging code.
Three main suggestions/tools to be adopted for CATcher would be:
Right now, senior developers review any PR they come across without being in charge of any particular area of expertise. Sometimes, this could lead to similar issues / PRs being responded to by different senior developers, which causes ineffective knowledge sharing and possible miscommunication.
The CATcher team could make it a point to divide up the PRs into different aspects (i.e. by adding tags such as ‘tests’ or ‘documentation’) and each take responsibility over reviewing PRs in that particular aspect.
This would allow senior developers to build expertise, and also impart knowledge to junior developers more effectively.
Tests are not prioritised in the CATcher codebase although the maintainability of our application is crucial due to its usage during examinations.
The CATcher team should make it a habit to add or improve on tests right after bug fixes or tweaks in features to keep our codebase robust.
This would reduce the steps where a new contributor needs to request and wait for approval before getting assigned. It could also encourage more students to start contributing.
According to the project readme, httpexpect is for "concise, declarative, and easy to use end-to-end HTTP and REST API testing for Go (golang)". It allows users to incrementally build HTTP requests to test on a chain, and then inspect the response payload recursively. It also supports testing of WebSockets. This project has about 2.1k stars as of March 2023, even though there are not many PRs and issues now.
I made a total of 3 pull requests.
Expect() more than once for a single HTTP request, or if the user tries to edit the request after calling Expect(). This should not be allowed, because calling Expect() means that the HTTP request is sent. This PR fixes this problem by explicitly checking that users do not violate these rules. The PR also adds a sentence to the documentation that warns users about these rules. Unit tests are also added to ensure the correctness of my new code.
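The rule described here (build the request, send it exactly once, and reject later edits) can be sketched as follows. This is a Python analogy under stated assumptions, not httpexpect's Go API: RequestBuilder, RequestAlreadySentError, with_header, expect and fake_send are hypothetical names chosen for illustration.

```python
class RequestAlreadySentError(RuntimeError):
    """Raised when a request is edited or sent again after expect()."""


class RequestBuilder:
    """Hypothetical chainable request builder (not httpexpect's API)."""

    def __init__(self, method, url, send):
        self._method = method
        self._url = url
        self._headers = {}
        self._send = send    # callable that performs the actual request
        self._sent = False   # flips to True on the first expect() call

    def _check_not_sent(self, operation):
        # Single place where the "no use after expect()" rule is enforced.
        if self._sent:
            raise RequestAlreadySentError(
                f"{operation} called after expect(); request was already sent"
            )

    def with_header(self, name, value):
        self._check_not_sent("with_header()")
        self._headers[name] = value
        return self  # returning self allows chaining

    def expect(self):
        self._check_not_sent("expect()")
        self._sent = True
        return self._send(self._method, self._url, dict(self._headers))


def fake_send(method, url, headers):
    """Stand-in transport so the sketch runs without a network."""
    return {"status": 200, "request": (method, url, headers)}


request = RequestBuilder("GET", "/users", fake_send).with_header(
    "Accept", "application/json"
)
response = request.expect()  # the first call sends the request
```

Calling request.expect() or request.with_header(...) again at this point raises RequestAlreadySentError instead of silently reusing the request.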
Response. This PR ensures that the entire body is not read inside the constructor, but is instead read only on demand, when it is needed for the first time. New tests are added and existing tests are changed to align with the new changes. No documentation changes are needed because it does not affect the behaviour of
bodyWrapper is an internal wrapper that is used by the Response to enable reading the response body several times. Before the PR, when bodyWrapper is constructed, it reads the entire body, caches it in memory, and closes the reader. This PR changes the behaviour of the bodyWrapper such that it reads bytes from the reader only when some function is called to read a portion of the body. The content is then cached in a Go slice (a resizable array). To support infinite-length responses, it is possible to disable caching of the body in memory. This PR made big changes, including to the behaviour of functions like bodyWrapper, so the tests for bodyWrapper have to be changed significantly. Similar to PR 266, no documentation changes are needed because it does not affect any external behaviour.
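The change from eager to on-demand reading can be sketched as follows. This is a Python analogy of the idea rather than the Go implementation in httpexpect; LazyBody, rewind and the bytes_pulled counter are hypothetical names, and a bytearray stands in for the Go slice.

```python
import io


class LazyBody:
    """Pulls bytes from the underlying reader only when they are requested."""

    def __init__(self, reader, cache=True):
        self._reader = reader
        self._cache_enabled = cache
        self._cached = bytearray()  # grows as the body is consumed
        self._pos = 0               # read position within the cache
        self._exhausted = False
        self.bytes_pulled = 0       # bytes taken from the reader so far

    def read(self, n):
        if not self._cache_enabled:
            # Caching disabled: pass straight through, nothing is retained,
            # so an endless stream never accumulates in memory.
            chunk = self._reader.read(n)
            self.bytes_pulled += len(chunk)
            return chunk
        # Top up the cache only as far as this call needs.
        while len(self._cached) - self._pos < n and not self._exhausted:
            missing = n - (len(self._cached) - self._pos)
            chunk = self._reader.read(missing)
            if not chunk:
                self._exhausted = True
                break
            self._cached += chunk
            self.bytes_pulled += len(chunk)
        data = bytes(self._cached[self._pos:self._pos + n])
        self._pos += len(data)
        return data

    def rewind(self):
        """Start again from the beginning; earlier bytes come from the cache."""
        if not self._cache_enabled:
            raise RuntimeError("cannot rewind when caching is disabled")
        self._pos = 0


body = LazyBody(io.BytesIO(b"hello world"))
pulled_at_construction = body.bytes_pulled  # nothing is read up front
first = body.read(5)                        # pulls only 5 bytes
body.rewind()
whole = body.read(11)                       # reuses 5 cached bytes, pulls 6 more
```

With caching disabled (cache=False), each read passes straight through and rewind is refused, which mirrors the trade-off mentioned for infinite-length responses.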
ready for review or needs revision to easily sort PRs based on their status.
Ockam is a suite of open source programming libraries and command line tools that handles end-to-end encryption, mutual authentication, key management, credential management, and authorization policy enforcement.
Modern applications are distributed and have an unwieldy number of interconnections that must trustfully exchange data. To trust data-in-motion, applications need end-to-end guarantees of data authenticity, integrity, and confidentiality. To be private and secure by-design, applications must have granular control over every trust and access decision. Ockam allows the app developer to add these controls and guarantees to any application.
As of March 2023, Ockam has a total of 3K stars on GitHub, 203 OSS contributors (including me) and 272K downloads on crates.io (Rust's package registry).
I contributed mainly to the enhancement of the Ockam CLI.
refactor(rust): rename ockam tcp-listener create command arguments #3194
This is a refactor PR that updates certain CLI hints in Ockam to be more intuitive.
Original issue: Rename ockam tcp-listener create command arguments to --at
feat(rust): implement secure-channel-listener list command #3256
My second contribution is implementing a new feature in Ockam's CLI - the ability to create secure channel listeners. The idea of secure channel listeners is that they are secure components that are able to consume any event that passes through a secure channel. This allows users to create channel listeners that are more secure since all of the traffic will be end-to-end encrypted.
This involved creating new structs in Rust for user commands to the secure-channel-listener command, interacting with the backend to modify secure channel listeners and doing proper error handling when the user supplies erroneous inputs. More details in the PR.
Original issue: Implement ockam secure-channel-listener list command #3192
refactor(rust): set default_node_name using clap #4120
My third contribution is a refactor code change that improves the code quality of Ockam's codebase by abstracting away the use of Optionals and replacing it with a utility function that handles the interaction with optional objects. Hence, a new developer would not have to be familiar with the proper way of handling optionals and could focus more on actual API development.
Original issue: ockam node start command: set default value to node argument using Clap attributes #4080
In addition to all the merged PRs, I'm pleased to say that I have been recognized by the team at Ockam for my contributions during one of their releases 🎉!
While Ockam as a project only started in 2021, it has now garnered over 3k stars on GitHub and hundreds of contributors have contributed to it. I think one of the main things that I saw the team at Ockam doing to improve the visibility of the project is to maintain a healthy supply of
This is because many sites such as https://goodfirstissue.dev/ and https://goodfirstissues.com/ essentially scrape GitHub repos for the good-first-issues tag and highlight those projects with a higher number of
Ockam's codebase is mainly written in Rust as it is more secure than other systems languages such as C++, while still retaining much of the performance benefit of being a low-level systems language, which is important in the context of cryptographic operations.
The main resource that I used for learning Rust is the official Rust website. There's a more comprehensive textbook that I occasionally referred to when investigating certain semantics of the language. In particular, as someone who is very into Programming Languages and Compilers, I really appreciated Rust's approach to safe memory management through what they call the Ownership System. This is beautifully explained in Chapter 4 of the textbook.
Cargo is Rust's build system and package manager. Most Rust projects use this tool because Cargo handles a lot of tasks for them, such as building code, downloading the libraries their code depends on, and building those libraries. Ockam also uses Cargo.
I really appreciated the fact that Cargo is shipped together with Rust as a bundle. Similar languages such as C++ do not ship with one, which leads to a lot of pain in finding the right build tool and package manager. The main resource that I used can be found in the Rust textbook.
This topic is relevant to my project since one of the main uses of Ockam is to allow app developers to easily introduce end-to-end encryption to their project.
End-to-end encryption is a security method that basically ensures that only the sender and the receiver of a message are able to read the message. This means that any third-party intermediaries that the message passes through will not be able to read that message. In particular, it will mean that the government / relevant authorities will not be able to read that message.
End-to-end encryption has become a really hot topic in security recently due to the greater awareness and focus on the topic of user privacy.
Some relevant resources on end-to-end encryption:
Seeing how Ockam was able to greatly improve its visibility by having a lot of good-first-issue issues, it would be good to have a healthy supply of good-first-issue issues for Markbind too. This would require Markbind developers to leave certain low-hanging fruit for new contributors to tackle.
Ockam uses a total of 17 GitHub Actions workflows to check various things when a PR is submitted, such as code style, commit message style, and that new tests were added and that the tests pass. I think there is still room for Markbind to improve in terms of using GitHub Actions to run such checks. In particular, I think it would be helpful to have a workflow to check that commit messages follow the proper convention, since that is a pretty common issue that keeps coming up when reviewing PRs.
Ockam enforces a convention for commits where the commit has to be tagged with a relevant issue type such as feat for feature, fix for bug fix, refactor for refactor and so on. I think this is a good way of organizing commits since one would know at a glance what the purpose of that commit is. In addition, it would be easier when making a release since the commits are already properly tagged.
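A check for this kind of convention could look roughly like the sketch below. It assumes the `type(scope): summary` shape seen in the PR titles above (for example `feat(rust): ...`); the list of allowed types contains only the three named in the text, and check_commit_message is a hypothetical helper rather than Ockam's actual workflow.

```python
import re

# Only the types named in the text; a real project would list its full set.
ALLOWED_TYPES = ("feat", "fix", "refactor")

# Assumed shape: <type>(<optional scope>): <summary>
COMMIT_PATTERN = re.compile(
    r"^(?P<type>[a-z]+)(\((?P<scope>[a-z0-9_-]+)\))?: (?P<summary>\S.*)$"
)


def check_commit_message(message):
    """Return a list of problems found in the first line of a commit message."""
    first_line = message.splitlines()[0] if message.strip() else ""
    match = COMMIT_PATTERN.match(first_line)
    if match is None:
        return ["first line should look like '<type>(<scope>): <summary>'"]
    if match.group("type") not in ALLOWED_TYPES:
        return [f"unknown commit type '{match.group('type')}'"]
    return []


if __name__ == "__main__":
    for msg in (
        "feat(rust): implement secure-channel-listener list command",
        "Update readme",
    ):
        print(msg, "->", check_commit_message(msg) or "ok")
```

A script like this could be run by a CI workflow over each commit in a PR, so that reviewers no longer need to point out the convention by hand.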
As MDN Web Docs is an educational resource, my main contribution focus has been on
As part of getting familiar with the MDN Web Docs workflow, I have made several (10+) PRs fixing small-scale issues. Most of such work has been done at the start of my contribution period, and I have since kept them to a minimum on a weekly basis, to explore more complex issues.
This includes PRs that are more complex and require more research and effort to complete.
Complete list of
Tools/technologies I learned:
It's also a first for me to actually read the HTML and CSS formal specifications, and I have to say being precise and defining standards is not easy!
It's quite easy to start contributing to the project, as it can be done entirely on GitHub. With the use of Markdown, it is also easy to make simple changes to the documentation. The general workflow goes as follows:
I think the project has merit in its own right, especially given the number of page views it gets. However, I think there are indeed areas of improvement that can be made. For example, the search functionality and UX is not as good as it could be. While the search input box gives immediate results in the form of a dropdown, if the search term is not found, the user will have to go to the dedicated search page, which shows a list of search results in plain text. This feels awkward, and I think I would be forced to search for the same term on Google instead. Another thing for improvement is the sidebar. The left sidebar of pages can be quite long and at times, it is not clear how the pages are structured. I think better categorization of the items in the sidebar would help.
Dendron is an open-source, local-first, markdown-based, note-taking tool. It's a personal knowledge management solution (PKM) built specifically for developers and integrates natively with IDEs like VS Code and VSCodium.
A list of some of my involvements in the project:
I also participated in the project Discord server and helped answer questions from new users for a short period of time.
Tools/technologies I learned:
Even though I did not dive deep into the codebase and contribute further (due to the fact that the project team decided to pivot to a different direction as the tool did not get to product-market fit), I did learn a lot about the project. I am impressed by the amount of work that has been put into the project, and the documentation is very well-written and detailed. I also adopted the tool for my own personal use since then.
I think the silver lining of this experience is that I have a better understanding of how VS Code extensions work, and I am now more aware of what it takes to build and maintain a large-scale project. For example, some of the events that the project team holds are quite interesting and perhaps we should consider doing something similar for our projects:
In Greenhouse talks, Dendron community members share the fruits of their learning. This may include showcasing workflows, tooling setups, systems, and other topics in personal knowledge management, but also anything that the speaker has in-depth knowledge of that may be of interest to the wider community.
The Dendron team highlights commonly used features and opens the floor to community Q&A in the Dendron Discord.
A CROP (Community Request ) is an issue that is submitted and voted on by the community.
The project is well-documented, with a dedicated developer guide and details on how to get started contributing. To highlight some of the useful inclusions in the developer guide:
Wikimedia Commons is part of the Wikimedia family for non-profit free content that handles uploading, reviewing and sharing of pictures. The app allows users to upload their work directly from their mobile device where they might have taken the photo.
strip (introduced only in Java 11) so a manual implementation was required.
DialogUtil::showAlertDialog in the codebase where applicable (Minor code quality PRs)
The technical knowledge gained has been covered in the previous section. Here, I will cover some of the things I observed from working on a much larger OSS project.
I think these two main things are extremely relevant for RepoSense. While not quite as large or attractive to new OSS contributors, I think lowering the barrier to entry via good maintenance and assignment of issues, and good documentation can go a long way to make a contributor's experience better.
Quote from the official documentation
Checkstyle is a development tool to help programmers write Java code that adheres to a coding standard. It automates the process of checking Java code to spare humans of this boring (but important) task. This makes it ideal for projects that want to enforce a coding standard.
RepoSense uses Checkstyle to enforce its Java coding standard. The detailed configuration is in checkstyle.xml.
My contributions are mainly on enhancing the existing documentation of Checkstyle. To be more specific, after experimenting with the tool, I added a significant number of new examples to document the various usages of JavadocType, a check on Javadoc of definitions of types such as enum. Here is the list of examples that I added.
Pull request: Issue #7601: Add examples for JavadocType #12736; Issue: Update doc for JavadocType #7601
Checkstyle is a tool that helps to enforce a Java coding standard. Through contributing to this project, I learned the usage of the tool as well as its power, enabled by the numerous checks that can be specified in the configuration file.
Maven is a build management tool for Java-based projects, and it is used by Checkstyle. Maven provides support for project builds, dependency maintenance, and continuous integration. I had to install Maven during the initial setup of Checkstyle and build the project using mvn install. Additionally, I needed to use commands such as mvn clean verify to check whether the CI would pass and mvn clean site -Pno-validations to build the documentation site for preview when adding new changes. This also motivated me to learn about Maven along the way.
Contribution Guide; Development Workflow; Pull request template; Pull request rules
mvn clean verify to ensure that the CI will pass. If there are errors, return to step 4
The contribution workflow seems quite strict.
Here is what RepoSense can adopt.
approved label can be used to filter the list of relevant issues that are suitable for a pull request.
Checkstyle, as an open source project, is maintained relatively well, despite its large community. Most of the issues and pull requests are well formatted, thanks to its comprehensive contribution guideline. However, I noticed that its main documentation and Java API come from separate sources, although a significant part of the content in the API duplicates that in the main documentation. A possible suggestion would be to centralize API-related documentation in order to prevent inconsistency and reduce maintenance cost.
Quote from Wikipedia
MDN Web Docs, previously Mozilla Developer Network and formerly Mozilla Developer Center, is a documentation repository and learning resource for web developers.
My contributions are mainly on enhancing the documentation related to
Through working on Adjust the description for srcset of , I learned the usage of srcset for image rendering on an HTML page.
srcset that originally deviated from other sources.
Through working on Add a more detailed explanation of boolean in glossary #24350, I learned the definition of ARIA and how its enumerated attributes work.
yarn start. If changes need to be made, go back to step 2
The contribution workflow is quite straightforward.
Here is what RepoSense can adopt.
MDN Web Docs has quite a large community. Additionally, the current contribution guideline does not seem to impose many rules on commit and pull request standards. Consequently, issues and pull requests from different contributors may have different styles, which can cause overhead for the reviewers. Therefore, a possible suggestion would be to introduce more rules to standardize contributions and increase maintenance efficiency.
NUSMods is the official course catalogue, module search and timetable builder for the National University of Singapore.
Merged PR: Change ModuleLessonConfig value to array
NUSMods uses Jest for their testing. I have never used Jest before and I was quite surprised by the ease of use of the tool. It is very easy to set up and the documentation is very clear.
Jest runs very fast and requires little set up. It is also very easy to write tests for React components. I was able to write tests for my code in a short amount of time.
The team enforces a high coverage requirement for PRs. This is a good practice as it ensures that the code is well tested and the code quality is high. This is often neglected in RepoSense and we have to manually open issues for frontend code coverage.
RepoSense uses only Cypress for frontend testing. From my experience and research, Cypress is more suitable for end-to-end testing, because Cypress actually interacts with the components in a browser. However, it is not as suitable for unit testing. Jest is a better choice for unit testing as it consumes less time and fewer resources. I would suggest that we use Jest for unit testing and Cypress for end-to-end testing. This would also boost our frontend code coverage and cover corner cases that we are unable to test in Cypress due to time/resource limitations.
While I have some experience working with redux before, I was unaware of the need for a schema migration when updating the structure of a redux store.
In the process of implementing the feature, I needed to change the structure of the redux store. The project ran normally on my local browser as I had no data when I was developing. I was told by the maintainer to include a redux schema migration.
NUSMods loads the persisted data into the redux state in order to maintain students' timetable data. Therefore, a change in the redux structure without any workaround will break the data of thousands of active users of NUSMods.
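The idea behind such a migration can be sketched as follows. This is a language-agnostic illustration written in Python, not NUSMods's actual code: the state shape, the version field, and the names migrate_v1_to_v2 and load_persisted_state are all hypothetical, loosely modelled on changing a single stored value into an array.

```python
import copy

CURRENT_VERSION = 2


def migrate_v1_to_v2(state):
    """Wrap each stored lesson value in a list (hypothetical shape change)."""
    new_state = copy.deepcopy(state)  # never mutate the persisted copy
    for lessons in new_state["timetable"].values():
        for lesson_type, class_no in lessons.items():
            if not isinstance(class_no, list):
                lessons[lesson_type] = [class_no]
    new_state["version"] = 2
    return new_state


# Maps "version the data is currently at" to the function that upgrades it.
MIGRATIONS = {1: migrate_v1_to_v2}


def load_persisted_state(persisted):
    """Upgrade persisted data step by step until it matches the current shape."""
    state = persisted
    version = state.get("version", 1)  # assume unversioned data is version 1
    while version < CURRENT_VERSION:
        state = MIGRATIONS[version](state)
        version = state["version"]
    return state


old_data = {"version": 1, "timetable": {"CS2103T": {"Lecture": "G01"}}}
upgraded = load_persisted_state(old_data)
```

A fresh browser profile with no saved data never exercises this path, which is why a missing migration is easy to overlook during local development.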
This tool is not applicable to RepoSense as we use Vuex. As the report is loaded every time based on the data, no data is persisted this way. However, it is still good to keep in mind how a change in store structure affects different parts of the system.
Shortage of manpower
It is rather surprising that such a popular website is only maintained by 2 developers. The reviewing process is often very long and it could be discouraging to less experienced developers. More contributors could be trained for this role.