Why Bother?
Gone are the days when you could breeze through academia and medical research with a basic knowledge of Excel formulas and graphing. As research comes to require increasingly complex statistics and analyses, technology has progressed to meet that need. Software now lets us do everything from data cleaning, through statistical tests and analyses, to producing visual or graphical summaries of our data. Used well, it is a powerful tool for improving research and for building a key set of skills. Coding involves so much problem solving that, particularly in data science, you also pick up transferable skills along the way.
Furthermore, coding is now a valued skill when approaching academics for research opportunities or applying for academic jobs, because it cuts the time and effort involved in the statistical analysis of findings. After teaching myself to code in R for statistics during the COVID-19 summer, I was able to conduct and publish a bibliometric analysis of noma the following year. It also cut the time needed to run appropriate analyses on my MSc dissertation data, one of the largest datasets in the world on quality of life in neglected tropical diseases. In short, these skills are easy to teach yourself, and they can greatly deepen your knowledge of research and academia.
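To make the workflow described above (data cleaning, a statistical test and a simple summary) more concrete, here is a minimal sketch in Python using only the standard library. The records, group names and scores are invented dummy data. The `clean` and `welch_t` functions are hypothetical helpers written for illustration. In practice you would use a library routine instead, such as R's `t.test()`, which runs Welch's test by default, or SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
import statistics

# Hypothetical dummy records: (participant_id, group, quality-of-life score as text).
# Scores are invented for illustration; some are missing or malformed, as in messy exports.
raw_records = [
    ("p01", "treatment", "72"),
    ("p02", "treatment", "68"),
    ("p03", "treatment", ""),      # missing value
    ("p04", "treatment", "75"),
    ("p05", "control", "61"),
    ("p06", "control", "n/a"),     # malformed value
    ("p07", "control", "58"),
    ("p08", "control", "64"),
]


def clean(records):
    """Data cleaning: group scores by arm, dropping rows whose score is not numeric."""
    cleaned = {}
    for _pid, group, score in records:
        try:
            value = float(score)
        except ValueError:
            continue  # skip "", "n/a" and similar entries
        cleaned.setdefault(group, []).append(value)
    return cleaned


def welch_t(a, b):
    """Welch's t statistic for two independent samples with unequal variances."""
    var_a = statistics.variance(a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(b)
    standard_error = (var_a / len(a) + var_b / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


groups = clean(raw_records)
for name, scores in groups.items():
    # Simple numeric summary per group
    print(f"{name}: n={len(scores)}, mean={statistics.mean(scores):.1f}, "
          f"sd={statistics.stdev(scores):.1f}")
print(f"Welch t = {welch_t(groups['treatment'], groups['control']):.2f}")
```

With this dummy data, the two rows with unusable scores are dropped, and the script prints a Welch t statistic of 4.00. A p-value would normally come from the library routines named above rather than from hand-written code.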
Where and How Can You Learn?
Your first step is choosing a programming language. The two best suited to data science are R and Python, and each has its pros and cons. R was built for statistical computing and data analysis, so it has many libraries and packages for this work and makes it easy to create visualisations. While it is accessible to people with no coding background, it can become less user-friendly when used from start to finish in a data science project, and it is less suited to production work such as building websites. Python, on the other hand, is a general-purpose language with syntax that is easier to read. However, it can be harder for complete beginners, because more of its data science functionality has to be assembled from additional libraries. Personally, I chose to start with R because it was the most similar to Stata, a proprietary program I went on to use for my MSc. R also produced all the output I needed, since I was not planning to build websites or similar projects in Python.
The next step is to abandon your fear of failure, because failure is an inherent part of coding. Even the most experienced statisticians write code that stumps them, or in which they simply cannot find the error. It is important to approach this with a growth mindset: treat each error as a chance to learn new skills, and come up with a plan. Sharing your progress with communities helps too. When I was starting out, the MedTwitter community was very helpful in sharing their top tips and starting points for coding in R. Platforms such as GitHub are also useful, because you can discuss problems there and also look through open-source code uploaded by others.
In practical terms, the next step is to choose a course and practise on the various dummy datasets you can find online. I highly recommend UCL's short courses on statistical computing with R, and Codecademy also offers a number of free courses. If you choose R, I also recommend the Learning Statistics with R website (https://learningstatisticswithr.com/) and the Quick-R and SimpleR pages. AI has also changed the face of statistical coding. Rtutor.ai, for example, uses OpenAI's models to translate your plain-language requests into R code. After this, work on practice projects to apply your knowledge and step out of your comfort zone. Kaggle hosts many of these, or you could even get involved with DataKind, a non-profit that works on data science projects for social impact.
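You can also make dummy datasets yourself. A useful exercise is to simulate data with a known group difference, fix the random seed so the data are reproducible, and then check whether your analysis recovers that difference. The sketch below does this in Python with only the standard library. The function names, column names and defaults (50 participants per group, a true difference of 5, a standard deviation of 10) are hypothetical choices for illustration. The CSV text could be saved to a file and loaded into R with `read.csv()`.

```python
import csv
import io
import random


def make_dummy_dataset(n_per_group=50, true_difference=5.0, seed=42):
    """Simulate a two-group dataset with a known mean difference.

    Hypothetical helper: names and default values are illustrative choices only.
    """
    rng = random.Random(seed)  # fixed seed, so every run produces the same data
    rows = []
    for i in range(n_per_group):
        rows.append({"id": f"c{i:03d}", "group": "control",
                     "score": round(rng.gauss(60.0, 10.0), 1)})
        rows.append({"id": f"t{i:03d}", "group": "treatment",
                     "score": round(rng.gauss(60.0 + true_difference, 10.0), 1)})
    return rows


def to_csv(rows):
    """Serialise the rows to CSV text, ready to save and load in R or Python."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["id", "group", "score"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


data = make_dummy_dataset()
means = {}
for group in ("control", "treatment"):
    scores = [row["score"] for row in data if row["group"] == group]
    means[group] = sum(scores) / len(scores)

print(to_csv(data).splitlines()[0])  # header line
print(f"observed difference: {means['treatment'] - means['control']:.2f} (true: 5.0)")
```

The seed is fixed, so rerunning the script gives identical data. With only 50 participants per group, the observed difference will usually differ somewhat from the true value of 5, which is itself a useful lesson in sampling variability.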
The whole process requires a lot of patience, because coding cannot be mastered in a day or a week. It takes flexibility, adaptability and the ability to think laterally to solve problems.