Pandas is a popular Python-based software library for data analytics.
Pytest is a Python testing framework that supports unit, integration, end-to-end, and functional testing.
The pandas project uses pytest to test its software. For PR #51384, I had to learn how to write a simple pytest test for the functionality of the `cut()` function. I read the documentation to gain basic knowledge of its syntax, and looked through the tests folder in the pandas project to gain a deeper understanding of how tests are written and managed in pandas.
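As a rough illustration (not the actual test from PR #51384), a parametrized pytest test for `cut()` could look like the sketch below; the bin edges and labels are made up for the example:

```python
import pandas as pd
import pytest


@pytest.mark.parametrize(
    "value, expected",
    [(1, "low"), (5, "mid"), (9, "high")],
)
def test_cut_assigns_expected_label(value, expected):
    # cut() places each value into the bin it falls into and returns that bin's label
    result = pd.cut([value], bins=[0, 3, 6, 10], labels=["low", "mid", "high"])
    assert result[0] == expected
```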
Doctests refer to chunks of text that are modelled after interactive Python sessions. Although they are embedded in text, they can be executed just like any other Python code snippet.
In pandas, doctests are mainly included in documentation to illustrate the functionality and features of different functions and data structures.
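For example, a doctest embeds `>>>` examples in a docstring and checks that running them reproduces the shown output. The snippet below is a generic illustration (not taken from pandas):

```python
def add_one(x):
    """Return ``x`` incremented by one.

    Examples
    --------
    >>> add_one(1)
    2
    >>> add_one(-1)
    0
    """
    return x + 1


if __name__ == "__main__":
    import doctest

    doctest.testmod()  # executes the >>> examples above and reports any mismatch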
For PR #51389, I was involved in resolving 'undefined variable' errors in doctests, as well as improving existing examples in the documentation. With the help of flake8 (a style guide enforcement tool), as well as the documentation of the doctest module, I was able to pick out errors in the doctests and fix them.
Thus, the developers are able to show more depth in their area of expertise, which allows new developers to engage in more meaningful conversations with regard to the issue they are working on. Due to their specialisation in reviewing PRs in a certain aspect, they are able to link problems to other existing related issues right away and provide more insight to new developers.
In addition, through PR reviews, the core developers also actively encourage new developers to investigate issues further and participate in the project in other ways. This way, new developers feel more committed to the project.
For example, while working on PR #51356, I detected an error in the way flake8 (a style guide enforcement tool) was processing template strings in doctests. I pursued the issue further and with my findings, the core developers requested that I open issue #51377 for it.
This helps to streamline processes and allow developers to start on development tasks faster.
After bug fixes are merged, relevant issues are still kept open with a new tag “Needs tests”.
The test added is specific to the bug fixed, such that in the future, it will be able to catch the same bug and prevent possible regression.
Compared to the 4 checks we have set up under CATcher's GitHub Actions continuous integration process, the pandas project has 39 checks in place. This is helpful in maintaining the codebase when many developers commit code often.
However, I also noticed that the presence of many checks often causes random failures. In addition, the checks take hours to complete. This could negatively affect efficiency when merging code.
Three main suggestions/tools to be adopted for CATcher would be:
Right now, senior developers review any PR they come across without being in charge of any particular area of expertise. Sometimes, this leads to similar issues/PRs being responded to by different senior developers, which causes ineffective knowledge sharing and possible miscommunication.
The CATcher team could make it a point to divide the PRs into different aspects (i.e. by adding tags such as 'tests' or 'documentation') and each take responsibility for reviewing PRs in that particular aspect.
This would allow senior developers to build expertise, and also impart knowledge to junior developers more effectively.
Tests are not prioritised in the CATcher codebase although the maintainability of our application is crucial due to its usage during examinations.
The CATcher team should make it a habit to add or improve on tests right after bug fixes or tweaks in features to keep our codebase robust.
This would reduce the steps where a new contributor needs to request and wait for approval before getting assigned. It could also encourage more students to start contributing.
According to the project readme, httpexpect is for "concise, declarative, and easy to use end-to-end HTTP and REST API testing for Go (golang)". It allows users to incrementally build HTTP requests through a chain of calls, and then inspect the response payload recursively. It also supports testing of WebSockets. This project has about 2.1k stars as of March 2023, even though there are not many open PRs and issues now.
I made a total of 3 pull requests.

One PR addresses the case where a user calls `Expect()` more than once for a single HTTP request, or tries to edit the request after calling `Expect()`. This should not be allowed, because calling `Expect()` means that the HTTP request will be sent. This PR fixes the problem by explicitly checking that users do not violate such rules. The PR also adds a sentence in the documentation that warns users about these rules. Unit tests are also added to ensure the correctness of my new code.
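Conceptually, the check boils down to remembering whether the request has already been sent and failing loudly afterwards. A minimal Python sketch of that idea (hypothetical names, not httpexpect's Go API):

```python
class Request:
    """Toy request builder that refuses to be edited or re-sent after expect()."""

    def __init__(self, method, url):
        self.method, self.url = method, url
        self.headers = {}
        self._sent = False

    def with_header(self, name, value):
        if self._sent:
            raise RuntimeError("request was already sent; it can no longer be modified")
        self.headers[name] = value
        return self

    def expect(self):
        if self._sent:
            raise RuntimeError("expect() may only be called once per request")
        self._sent = True
        # ... send the request here and return a response wrapper ...
```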
Another PR changes how `Response` reads the response body: the entire body is no longer read inside the constructor, but is instead read on demand when it is needed for the first time. New tests are added and existing tests are changed to align with the new behaviour. No documentation changes are needed because it does not affect the behaviour of `Response` for now.

`bodyWrapper` is an internal wrapper used by `Response` to enable reading the response body several times. Before the PR, when `bodyWrapper` was constructed, it read the entire body, cached it in memory, and closed the reader. The PR changes the behaviour of `bodyWrapper` so that it reads bytes from the reader only when some function is called to read a portion of the body. The content is then cached into a Go slice (a resizable array). To support infinite-length responses, it is possible to disable caching of the body in memory. This PR made big changes, including to the behaviour of functions like `Rewind()` and `GetBody()` in `bodyWrapper`, so the tests for `bodyWrapper` had to be changed significantly. Similar to PR 266, no documentation changes are needed because it does not affect any external behaviour.
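The underlying pattern is lazy reading with a cache. A small Python sketch of the same idea (illustrative only; the real implementation is in Go and also supports disabling the cache):

```python
import io


class LazyBody:
    """Read the underlying stream only on first access, then serve it from a cache."""

    def __init__(self, reader):
        self._reader = reader   # file-like object, e.g. an HTTP response stream
        self._cache = None      # nothing is read at construction time

    def get_body(self):
        if self._cache is None:            # first access: read everything and cache it
            self._cache = self._reader.read()
            self._reader.close()
        return io.BytesIO(self._cache)     # a fresh, rewindable view over the cached bytes


body = LazyBody(io.BytesIO(b'{"ok": true}'))  # the constructor does not touch the stream
print(body.get_body().read())                 # the body is read (and cached) here
print(body.get_body().read())                 # served from the cache; still readable
```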
A suggestion for the project would be to label PRs as `ready for review` or `needs revision` to easily sort PRs based on their status.

Ockam is a suite of open source programming libraries and command line tools that handles end-to-end encryption, mutual authentication, key management, credential management, and authorization policy enforcement.
Modern applications are distributed and have an unwieldy number of interconnections that must trustfully exchange data. To trust data-in-motion, applications need end-to-end guarantees of data authenticity, integrity, and confidentiality. To be private and secure by-design, applications must have granular control over every trust and access decision. Ockam allows the app developer to add these controls and guarantees to any application.
As of March 2023, Ockam has a total of 3K stars on GitHub, 203 OSS contributors (including me), and 272K downloads on crates.io (Rust's package registry).
The workflow for contributing to Ockam is pretty standard as far as open-source projects go. A few things I noticed that were really great: the project maintainers were very helpful, and PRs were reviewed quickly (often in less than a week).
New contributors can start with issues labelled `good-first-issues`. Each commit message must specify a `type` and `scope`; it should be organized as `type(scope): <subject>`, for example `feat(rust): ...` or `refactor(elixir): ...`.

Resources:
I contributed mainly to the enhancement of the Ockam CLI.
Merged PRs:
refactor(rust): rename ockam tcp-listener create command arguments #3194
This is a refactor PR that updates certain CLI hints in Ockam to be more intuitive.
Original issue: Rename ockam tcp-listener create command arguments to --at
feat(rust): implement secure-channel-listener list command #3256
My second contribution is implementing a new feature in Ockam's CLI: the `secure-channel-listener list` command. Basically, the idea of secure channel listeners is that they are secure components able to consume any event that passes through a secure channel. This allows users to create channel listeners that are more secure, since all of the traffic will be end-to-end encrypted.
This involved creating new structs in Rust for user commands to the `secure-channel-listener` command, interacting with the backend to modify secure channel listeners, and doing proper error handling when the user supplies erroneous inputs. More details in the PR.
Original issue: Implement ockam secure-channel-listener list command #3192
refactor(rust): set default_node_name using clap #4120
My third contribution is a refactoring change that improves the quality of Ockam's codebase by abstracting away the direct handling of optionals and replacing it with a utility function. Hence, a new developer would not have to be familiar with the proper way of handling optionals and could focus more on actual API development.
Original issue: ockam node start command: set default value to node argument using Clap attributes #4080
In addition to all the merged PRs, I'm pleased to say that I have been recognized by the team at Ockam for my contributions during one of their releases 🎉!
While Ockam as a project only started in 2021, it has now garnered over 3k stars on GitHub and hundreds of contributors. I think one of the main things the team at Ockam does to improve the visibility of the project is to maintain a healthy supply of `good-first-issues` issues. This is because many sites such as https://goodfirstissue.dev/ and https://goodfirstissues.com/ essentially scrape GitHub repos for the `good-first-issues` tag and highlight those projects with a higher number of `good-first-issues`.
Ockam's codebase is mainly written in Rust, as it is more secure than other systems languages such as C++ while still retaining much of their performance, which is important in the context of cryptographic operations.
The main resource that I used for learning Rust is the official Rust website. There's a more comprehensive textbook that I occasionally referred to when investigating certain semantics of the language. In particular, as someone who is very into Programming Languages and Compilers, I really appreciated Rust's approach to safe memory management through what they call the Ownership System. This is beautifully explained in Chapter 4 of the textbook.
Cargo is Rust's build system and package manager. Most Rust projects use this tool because Cargo handles a lot of tasks for them, such as building code, downloading the libraries their code depends on, and building those libraries. Ockam also uses Cargo.
I really appreciated the fact that Cargo is shipped together with Rust as a bundle, since other similar languages such as C++ do not do this, which leads to a lot of pain in finding the right build tool and package manager. The main resource that I used can be found in the Rust textbook.
This topic is relevant to my project since one of the main uses of Ockam is to allow app developers to easily introduce end-to-end encryption to their project.
End-to-end encryption is a security method that basically ensures that only the sender and the receiver of a message are able to read the message. This means that any third-party intermediaries that the message passes through will not be able to read that message. In particular, it will mean that the government / relevant authorities will not be able to read that message.
End-to-end encryption has become a really hot topic in security recently due to the greater awareness and focus on the topic of user privacy.
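As a toy illustration of the end-to-end property (not how Ockam actually does it; real systems use authenticated key exchange and vetted ciphers), only the two endpoints hold the key, so an intermediary relaying the ciphertext learns nothing:

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    # One-time-pad style XOR, for illustration only; not production cryptography.
    return bytes(d ^ k for d, k in zip(data, key))


message = b"meet at noon"
shared_key = secrets.token_bytes(len(message))   # known only to sender and receiver

ciphertext = xor_bytes(message, shared_key)      # this is all the relay ever sees
print(ciphertext)                                # opaque bytes to any intermediary
print(xor_bytes(ciphertext, shared_key))         # b'meet at noon' recovered at the receiver
```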
Some relevant resources on end-to-end encryption:
A healthy supply of `good-first-issue` issues: seeing how Ockam was able to greatly improve its visibility by having a lot of `good-first-issue` issues, it would be good to have a healthy supply of `good-first-issue` issues for MarkBind too. This would involve MarkBind developers leaving some low-hanging fruit for new contributors to tackle.
Ockam uses a total of 17 GitHub Actions workflows to check various things when a PR is submitted, such as style checks, commit message style checks, and ensuring that new tests were added and pass. I think there is still room for MarkBind to improve in its use of GitHub Actions checks. In particular, I think it would be helpful to have a workflow that checks commit messages follow the proper convention, since that is a pretty common issue that keeps coming up when reviewing PRs.
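A minimal sketch of such a check, assuming a conventional `type(scope): subject` format (the exact types and the CI wiring would need to match MarkBind's own conventions):

```python
import re
import subprocess
import sys

# Assumed convention for this example: "type(scope): subject", e.g. "feat(core): add x".
PATTERN = re.compile(r"^(feat|fix|refactor|docs|test|chore)(\([\w-]+\))?: .+")

# Subject line of the most recent commit on the branch being checked.
subject = subprocess.run(
    ["git", "log", "-1", "--pretty=%s"],
    capture_output=True, text=True, check=True,
).stdout.strip()

if not PATTERN.match(subject):
    print(f"Commit subject does not follow the convention: {subject!r}")
    sys.exit(1)   # a non-zero exit would fail the CI job
print("Commit subject looks good.")
```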
Ockam enforces a convention where each commit has to be tagged with a relevant type such as `feat` for a feature, `fix` for a bug fix, `refactor` for a refactor, and so on. I think this is a good way of organizing commits, since one knows at a glance the purpose of each commit. In addition, it makes releases easier since the commits are already properly tagged.
MDN Web Docs is an open-source, collaborative project that documents web technologies including CSS, HTML, JavaScript, and Web APIs. Alongside detailed reference documentation, it provides extensive learning resources for students and beginners getting started with web development.
As MDN Web Docs is an educational resource, my main contribution focus has been on
As part of getting familiar with the MDN Web Docs workflow, I have made several (10+) PRs fixing small-scale issues. Most of this work was done at the start of my contribution period, and I have since kept such PRs to a minimum each week in order to explore more complex issues.
Selected PRs:
This includes PRs that are more complex and require more research and effort to complete.
Selected PRs:
Selected Issues:
Complete list of
Tools/technologies I learned:
It was also a first for me to actually read the HTML and CSS formal specifications, and I have to say that being precise when defining standards is not easy!
Resources:
It's quite easy to start contributing to the project, as it can be done entirely on GitHub. With the use of Markdown, it is also easy to make simple changes to the documentation. The general workflow goes as follows:
I think the project has merit in its own right, especially given the number of page views it gets. However, I think there are indeed areas of improvement. For example, the search functionality and UX are not as good as they could be. While the search input box gives immediate results in the form of a dropdown, if the search term is not found, the user has to go to the dedicated search page, which shows a list of search results in plain text. This feels awkward, and I would be tempted to search for the same term on Google instead. Another area for improvement is the sidebar. The left sidebar of pages can be quite long, and at times it is not clear how the pages are structured. I think better categorization of the items in the sidebar would help.
Dendron is an open-source, local-first, markdown-based, note-taking tool. It's a personal knowledge management solution (PKM) built specifically for developers and integrates natively with IDEs like VS Code and VSCodium.
A list of some of my involvements in the project:
I also participated in the project Discord server and helped answer questions from new users for a short period of time.
Tools/technologies I learned:
Even though I did not dive deep into the codebase and contribute further (because the project team decided to pivot in a different direction after the tool did not reach product-market fit), I did learn a lot about the project. I am impressed by the amount of work that has been put into it, and the documentation is very well written and detailed. I have also adopted the tool for my own personal use since then.
I think the silver lining of this experience is that I have a better understanding of how VS Code extensions work, and I am now more aware of what it takes to build and maintain a large-scale project. For example, some of the events that the project team holds are quite interesting and perhaps we should consider doing something similar for our projects:
In Greenhouse talks, Dendron community members share the fruits of their learning. This may include showcasing workflows, tooling setups, systems, and other topics in personal knowledge management, but also anything that the speaker has in-depth knowledge of that may be of interest to the wider community.
The Dendron team highlights commonly used features and opens the floor to community Q&A in the Dendron Discord.
A CROP (Community Request) is an issue that is submitted and voted on by the community.
Resources:
The project is well-documented, with a dedicated developer guide and details on how to get started contributing. To highlight some of the useful inclusions in the developer guide:
Tachiyomi is a free and open-source manga reader application for Android devices. It allows users to read manga from various sources, including popular websites like Mangadex, MangaPark, and Kissmanga, and also supports importing manga files from local storage. Tachiyomi provides a clean and customizable user interface and offers features like automatic updates, tracking reading progress, and support for multiple languages. Additionally, Tachiyomi offers extensions that enable users to access manga from additional sources and offers customization options such as dark mode, custom reading settings, and more.
https://github.com/tachiyomiorg/tachiyomi/pulls?page=1&q=is:pr+author:Two-Ai
I have been using Tachiyomi for at least 8 years now and I really like the app, hence I wanted to contribute back to the project. I also wanted to learn more about Android development and Kotlin.
As Tachiyomi is a fairly full-featured manga reader, my main contributions were bug fixes and code refactoring. I used an alternate GitHub account just for this, as I didn't want my identity to be exposed (especially since this app is used by so many people and heavily forked).
Github account used: https://github.com/Two-Ai
I started contributing before CS3282 began, as I thought it would take a lot longer to get my PRs merged, but the dev team was surprisingly fast at merging them. I started by fixing some bugs that I encountered while using the app. I also did some code refactoring to make the code more readable and easier to maintain.
I made roughly 30 PRs in the period from December 2022 to March 2023. Most of my contributions were focused on small fixes in the app's download logic, with some medium-sized PRs which I will go into in more detail below.
Inline DownloadQueue into Downloader
One of my larger refactors which focused on moving the queue state into the downloader.
In this PR I simplified the logic for filtering manga. This reduced the complexity of the code by quite a bit.
Make DownloadManager the sole entry point for DownloadService
The PR proposed making the DownloadManager the sole entry point for the DownloadService, which improves the codebase in several ways. It provides a clear structure for the Downloader system, simplifies interactions between classes, reduces code duplication, avoids race conditions, and improves accessibility by exposing the Downloader interface to DownloadService without exposing the full Downloader in DownloadManager. These changes make the system easier to understand, modify, and maintain while reducing the risk of bugs caused by concurrent access to the system.
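To illustrate the shape of the idea, here is a rough Python sketch with hypothetical methods (not Tachiyomi's actual Kotlin classes): all callers go through the manager, which keeps queue mutations in one place:

```python
class Downloader:
    """Owns the queue state; nothing outside the manager touches it directly."""

    def __init__(self):
        self._queue = []

    def enqueue(self, chapter):
        self._queue.append(chapter)

    def start(self):
        print(f"downloading {len(self._queue)} chapter(s)")


class DownloadManager:
    """Sole entry point: other components (e.g. a download service) call the
    manager instead of reaching into Downloader, so interactions stay simple."""

    def __init__(self):
        self._downloader = Downloader()

    def download(self, chapter):
        self._downloader.enqueue(chapter)
        self._downloader.start()


manager = DownloadManager()
manager.download("Chapter 1")   # the only way callers trigger downloads
```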
Complete list of
Tools/technologies I learned:
Skills learned:
Planning: I have learned the importance of planning when working on a complex codebase. Planning helps to identify potential issues and ensure that the changes made to the codebase will improve its structure and maintainability.
Separation of Concerns: I have learned the importance of separating concerns when designing a system. The proposed structure for the Downloader system separates the responsibilities of each class and provides a clear structure for the codebase. This separation of concerns makes the system easier to understand and modify.
Maintainability: I have learned the importance of writing maintainable code. The proposed changes that I've made to the codebase simplify interactions between classes, reduce duplication of code, and make the code more concise and easier to read and understand.
Avoiding Race Conditions: I have learned the importance of avoiding race conditions when working on a concurrent system. The refactored code avoids race conditions by ensuring that the system state is consistent and by limiting the number of dependencies between classes (a minimal sketch appears after this list).
Android Architecture Components: I have learned how to use Android Architecture Components such as LiveData, ViewModel, and Room to build more robust, maintainable, and testable Android applications.
Multithreading: I have learned how to manage multithreading in Android applications, including using AsyncTask, Handler, and Executor to perform long-running operations in the background.
Networking (experimental): I have learned how to use Android's networking libraries, such as Volley and OkHttp, to make network requests and fetch data from web services.
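One common way to keep shared download state consistent is to guard it with a lock, sketched here in Python as a generic illustration (not the approach Tachiyomi's Kotlin code takes):

```python
import threading


class SharedQueue:
    """Guard shared queue state with a lock so concurrent callers cannot interleave."""

    def __init__(self):
        self._items = []
        self._lock = threading.Lock()

    def add(self, item):
        with self._lock:          # only one thread mutates the list at a time
            self._items.append(item)

    def pop_next(self):
        with self._lock:
            return self._items.pop(0) if self._items else None
```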
Tachiyomi's development workflow is quite simple. It uses GitHub issues and pull requests to track bugs and features. The project is also split into multiple repositories, each with its own maintainers. The main repository contains the core code and the UI. The other repositories are for the extensions, which are used to fetch manga from different sources. The repositories are linked together using git submodules.
Most of the development discussion actually happens on their Discord, where user issues and items that the lead devs want to work on are laid out. The lead devs then assign the issues to themselves or to other contributors. Contributors work on an issue and submit a pull request, which is then reviewed by the lead devs and merged if it is good.
https://github.com/tachiyomiorg/tachiyomi/blob/master/CONTRIBUTING.md
I think the setup and beginners' docs should be improved. I would also like them to give more feedback on pull requests through GitHub instead of my having to check my DMs on Discord.
Devfi is a platform that allows developers to earn crypto by contributing to open source projects. We believe that open source projects are the backbone of the software industry and that developers should be rewarded for their contributions.
View it here
This project was developed by me and my friends as we try to give open source developers some incentive to contribute. We plan on making it open source and to launch it as something that web3 developers can use to reward contributors (much like the current bounty systems, but now much easier to use).
We created a github bot that can be added by any github organization. The said organization can then create a bounty for any issue in their repository. The bounty will be paid out in the form of a crypto token that is created by us. The developer can then claim the bounty by submitting a pull request that fixes the issue. The bot will then verify the pull request and pay out the bounty to the developer.
Tools/technologies I learned:
Wikimedia Commons is part of the Wikimedia family of non-profit free-content projects, and it handles the uploading, reviewing, and sharing of pictures. The app allows users to upload their work directly from the mobile device on which they might have taken the photo.
PRs merged:
- A fix where `strip` (introduced only in Java 11) could not be used, so a manual implementation was required.
- Using `DialogUtil::showAlertDialog` in the codebase where applicable (minor code quality PRs)
Issues created:
- `DialogUtil::showAlertDialog`
Tools/technologies I learned:
- `Optional`

The technical knowledge gained has been covered in the previous section. Here, I will cover some of the things I observed from working on a much larger OSS project.
I think these two main things are extremely relevant for RepoSense. While RepoSense is not quite as large or attractive to new OSS contributors, I think lowering the barrier to entry via good maintenance and assignment of issues, together with good documentation, can go a long way towards making a contributor's experience better.
Quote from the official documentation
Checkstyle is a development tool to help programmers write Java code that adheres to a coding standard. It automates the process of checking Java code to spare humans of this boring (but important) task. This makes it ideal for projects that want to enforce a coding standard.
RepoSense uses Checkstyle to enforce its Java coding standard. The detailed configuration is in checkstyle.xml.
My contributions are mainly on enhancing the existing documentation of Checkstyle. To be more specific, after experimenting with the tool, I added a significant number of new examples to document the various usages of `JavadocType`, a check on the Javadoc of type definitions such as `interface`, `class`, and `enum`. Here is the list of examples that I added:
- `scope` property
- `authorFormat` property
- `versionFormat` property
- `scope` and `excludeScope` properties
- `allowMissingParamTags` property
- `allowUnknownTags` property
- `allowedAnnotations` property

Pull request: Issue #7601: Add examples for JavadocType #12736; Issue: Update doc for JavadocType #7601
Checkstyle is a tool that helps to enforce a Java coding standard. Through contributing to this project, I learned the usage of the tool as well as the power enabled by the numerous checks that can be specified in the configuration file.
Resources:
Maven is a build and project management tool for Java-based projects, and it is used by Checkstyle. Maven provides support for project builds, dependency management, and continuous integration. I had to install Maven during the initial setup of Checkstyle and build the project using `mvn install`. Additionally, I needed to use commands such as `mvn clean verify` to verify whether the CI will pass, and `mvn clean site -Pno-validations` to build the documentation site for preview when adding new changes. This also motivated me to learn about Maven along the way.
Resources:
Contribution Guide; Development Workflow; Pull request template; Pull request rules
Workflow: find an issue with the `approved` label to work on, make the changes locally with `git`, and run `mvn clean verify` to ensure that the CI will pass. If there are errors, go back and fix the changes before pushing.

Note:
The contribution workflow seems quite strict.
Here are some things that can be adopted by RepoSense.
- The `approved` label can be used to filter the list of relevant issues that are suitable for a pull request.

Checkstyle, as an open source project, is maintained relatively well despite its large community. Most of the issues and pull requests are well formatted, thanks to its comprehensive contribution guidelines. However, I noticed that its main documentation and its Java API docs come from separate sources, even though a significant part of the content in the API docs duplicates the main documentation. A possible suggestion would be to centralize API-related documentation in order to prevent inconsistency and reduce maintenance cost.
Quote from Wikipedia
MDN Web Docs, previously Mozilla Developer Network and formerly Mozilla Developer Center, is a documentation repository and learning resource for web developers.
My contributions are mainly on enhancing the documentation related to HTML, ARIA, and JavaScript.
Through working on Adjust the description for srcset, I learned the usage of `srcset` for image rendering on an HTML page.
Resources:
- `srcset` documentation that originally deviated from other sources.

Through working on Add a more detailed explanation of boolean in glossary #24350, I learned the definition of ARIA and how its enumerated attributes work.
Resources:
- `ARIA` enumerated attributes
- `ARIA`

- Preview the changes locally with `yarn start`. If changes need to be made, go back to step 2.

The contribution workflow is quite straightforward.
Here are some things that can be adopted by RepoSense.
MDN Web Docs has quite a large community. Additionally, the current contribution guidelines do not seem to impose many rules on commit and pull request standards. Consequently, issues and pull requests from different contributors may have different styles, which can cause overhead for the reviewers. Therefore, a possible suggestion would be to introduce more rules to standardize contributions and increase maintenance efficiency.
NUSMods is the official course catalogue, module search and timetable builder for the National University of Singapore.
Merged PR: Change ModuleLessonConfig value to array
Merged PR: Add support for Timetable for TAs
NUSMods uses Jest for their testing. I have never used Jest before and I was quite surprised by the ease of use of the tool. It is very easy to set up and the documentation is very clear.
Jest runs very fast and requires little set up. It is also very easy to write tests for React components. I was able to write tests for my code in a short amount of time.
The team enforces a high coverage requirement for PRs. This is a good practice as it ensures that the code is well tested and the code quality is high. This is often neglected in RepoSense, and we have to manually open issues for frontend code coverage.
RepoSense uses only Cypress for frontend testing. From my experience and research, Cypress is more suitable for end-to-end testing, since Cypress actually interacts with the components in a browser. However, it is not as suitable for unit testing. Jest is a better choice for unit testing as it consumes less time and fewer resources. I would suggest that we use Jest for unit testing and Cypress for end-to-end testing. This would also boost our frontend code coverage and cover corner cases that we are unable to test in Cypress due to time/resource limitations.
While I have some experience working with redux before, I was unaware of the need for a schema migration when updating the structure of a redux store.
In the process of implementing the feature, I needed to change the structure of the redux store. The project ran normally on my local browser as I had no data when I was developing. I was told by the maintainer to include a redux schema migration.
NUSMods loads the persisted data into the Redux state in order to maintain students' timetable data. Therefore, changing the Redux store structure without any migration in place would break the data of thousands of active NUSMods users.
This tool is not applicable to RepoSense as we use Vuex. Since the report is loaded each time based on the data, nothing is persisted in this way. However, it is still good to keep in mind how changes to the store structure affect different parts of the system.
Shortage of manpower
It is rather surprising that such a popular website is only maintained by 2 developers. The reviewing process is often very long and it could be discouraging to less experienced developers. More contributors could be trained for this role.