
Scientific method example: Failure to toast

1. Make an observation.

2. Ask a question.

3. Propose a hypothesis.

4. Make predictions.

5. Test the predictions.

6. Iterate.
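
The six steps above can be read as a loop: propose a hypothesis, derive a prediction, test it, and move to the next hypothesis if the test fails. The sketch below is only an illustration. The observations, the two hypotheses, and the helper name `first_supported` are hypothetical and do not come from the original lesson.

```python
# Hypothetical observations for a toaster that fails to toast (illustrative only).
observations = {"outlet_works": False, "toaster_plugged_in": True}

# Each hypothesis is paired with a prediction that can be checked against observations.
hypotheses = [
    ("The toaster is unplugged", lambda obs: not obs["toaster_plugged_in"]),
    ("The outlet is broken", lambda obs: not obs["outlet_works"]),
]


def first_supported(hyps, obs):
    """Test each hypothesis in turn and return the first whose prediction holds."""
    for statement, prediction in hyps:
        if prediction(obs):  # step 5: test the prediction
            return statement
        # step 6: the prediction failed, so iterate with the next hypothesis
    return None  # no hypothesis was supported; new hypotheses are needed


result = first_supported(hypotheses, observations)
print(result)
```

Returning `None` when nothing is supported mirrors the "Iterate" step: a failed round of tests is a prompt for new hypotheses, not an end point.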

Encyclopedia Britannica

scientific method

scientific method, mathematical and experimental technique employed in the sciences. More specifically, it is the technique used in the construction and testing of a scientific hypothesis.

The process of observing, asking questions, and seeking answers through tests and experiments is not unique to any one field of science. In fact, the scientific method is applied broadly in science, across many different fields. Many empirical sciences, especially the social sciences, use mathematical tools borrowed from probability theory and statistics, together with outgrowths of these, such as decision theory, game theory, utility theory, and operations research. Philosophers of science have addressed general methodological problems, such as the nature of scientific explanation and the justification of induction.

The scientific method is critical to the development of scientific theories, which explain empirical (experiential) laws in a scientifically rational manner. In a typical application of the scientific method, a researcher develops a hypothesis, tests it through various means, and then modifies the hypothesis on the basis of the outcome of the tests and experiments. The modified hypothesis is then retested, further modified, and tested again, until it becomes consistent with observed phenomena and testing outcomes. In this way, hypotheses serve as tools by which scientists gather data. From that data and the many different scientific investigations undertaken to explore hypotheses, scientists are able to develop broad general explanations, or scientific theories.

See also Mill’s methods; hypothetico-deductive method.

How to Practice Academic Medicine and Publish from Developing Countries? pp 193–199

How to Write the Introduction to a Scientific Paper?

I once had a professor tell a class that he sifted through our pile of essays, glancing at the titles and introductions, looking for something that grabbed his attention. Everything else went to the bottom of the pile to be read last, when he was tired and probably grumpy from all the marking. Don’t get put at the bottom of the pile, he said. (Anonymous)

1 What is the Importance of an Introduction?

An Introduction to a scientific paper familiarizes the reader with the background of the issue at hand. It must reflect why the issue is topical and its current importance in the vast sea of research being done globally. It lays the foundation of biomedical writing and is the first portion of an article according to the IMRAD pattern (Introduction, Methodology, Results, and Discussion) [1].

It provides the flavour of the article, and many authors have used phrases to describe it, for example: ‘like a gate of the city’ [2], ‘the beginning is half of the whole’ [3], ‘an introduction is not just wrestling with words to fit the facts, but it [is] also strongly modulated by perception of the anticipated reactions of peer colleagues’ [4], and ‘an introduction is like the trailer to a movie’. A good introduction helps captivate the reader early.

2 What Are the Principles of Writing a Good Introduction?

A good introduction will ‘sell’ an article to a journal editor, reviewer, and finally to a reader [3]. It should contain the following information [5, 6]:

The known—The background scientific data

The unknown—Gaps in the current knowledge

Research hypothesis or question

Methodologies used for the study

The known consists of citations from a review of the literature, whereas the unknown is the new work to be undertaken. This part should address how your work is the required missing piece of the puzzle.

3 What Are the Models of Writing an Introduction?

The Problem-solving model

In this model, first described by Swales et al. in 1979, the writer should identify the ‘problem’ in the research, address the ‘solution’, and also write about ‘the criteria for evaluating the problem’ [7, 8].

The CARS model, which stands for Creating A Research Space [9, 10].

The three important components of this model are:

Establishing a territory (situation)

Establishing a niche (problem)

Occupying a niche (the solution)

In this popular model, one can add a fourth point, i.e., a conclusion [10].

4 What Is Establishing a Territory?

This includes [9]:

Stating the general topic and providing some background about it.

Providing a brief and relevant review of the literature related to the topic.

Adding a paragraph on the scope of the topic including the need for your study.

5 What Is Establishing a Niche?

Establishing a niche includes:

Stating the importance of the problem.

Outlining the current situation regarding the problem, citing both global and national data.

Evaluating the current situation (advantages/disadvantages).

Identifying the gaps.

Emphasizing the importance of the proposed research and how the gaps will be addressed.

Stating the research problem/questions.

Stating the hypotheses briefly.

Figure 17.1 depicts how the introduction needs to be written. A scientific paper should have an introduction in the form of an inverted pyramid. The writer should start with the general information about the topic and subsequently narrow it down to the specific topic-related introduction.

Fig. 17.1 Flow of ideas from the general to the specific

6 What Does Occupying a Niche Mean?

This is the third portion of the introduction; it defines the rationale of the research and states the research question. If this is missing, the reviewers will not understand the logic for publication, which is a common reason for rejection [11, 12]. An example of this is given below:

Till date, no study has been done to see the effectiveness of a mesh alone or the effectiveness of double suturing along with a mesh in the closure of an umbilical hernia regarding the incidence of failure. So, the present study is aimed at comparing the effectiveness of a mesh alone versus the double suturing technique along with a mesh.

7 How Long Should the Introduction Be?

For a project protocol, the introduction should be about 1–2 pages long, and for a thesis it should be 3–5 pages in a double-spaced typed setting. For a scientific paper it should be less than 10–15% of the total length of the manuscript [13, 14].

8 How Many References Should an Introduction Have?

All sections in a scientific manuscript except the conclusion should contain references. It has been suggested that an introduction should have four or five references, or at most one-third of the references in the whole paper [15].
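
The two rules of thumb in Sections 7 and 8 are simple ratios, so a draft can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming word counts and reference counts are already known. The function name `introduction_checks`, the sample numbers, and the choice of 15% as the cut-off (the upper end of the 10–15% range) are assumptions for illustration only.

```python
def introduction_checks(intro_words, total_words, intro_refs, total_refs):
    """Return the introduction's share of words and references plus pass/fail flags."""
    word_share = intro_words / total_words
    ref_share = intro_refs / total_refs
    return {
        "word_share": word_share,
        "ref_share": ref_share,
        "length_ok": word_share <= 0.15,  # assumed cut-off: upper end of the 10-15% guideline
        "refs_ok": ref_share <= 1 / 3,    # at most one-third of all references
    }


# Hypothetical manuscript: 3,500 words and 30 references in total.
report = introduction_checks(intro_words=450, total_words=3500, intro_refs=9, total_refs=30)
print(report)
```

Target journals may set their own limits, so the thresholds are best treated as adjustable defaults.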

9 What Are the Important Points Which Should Not Be Missed in an Introduction?

An introduction paves the way forward for the subsequent sections of the article. Well-planned studies are frequently rejected by journals during review simply because the authors failed to present, in this section, the data that justify the study [16, 17]. Thus, the existing gap in knowledge should be clearly brought out in this section (Fig. 17.2).

Fig. 17.2 How should the abstract, introduction, and discussion look

The following points are important to consider:

The introduction should be written in simple sentences and in the present tense.

Many terms are introduced for the first time in this section; define their abbreviations here so that they can be used later.

The references in this section should be to papers published in quality journals (e.g., having a high impact factor).

The aims, problems, and hypotheses should be clearly mentioned.

Start with a generalization on the topic and go on to specific information relevant to your research.

10 Example of an Introduction

[Figure b: example of an introduction; image not reproduced]

11 Conclusions

An Introduction is a brief account of what the study is about. It should be short, crisp, and complete.

It has to move from a general to a specific research topic and must include the need for the present study.

The Introduction should include data from a literature search, i.e., what is already known about this subject and progress to what we hope to add to this knowledge.

References

1. Moore A. What’s in a discussion section? Exploiting 2-dimensionality in the online world. Bioessays. 2016;38(12):1185.

2. Annesley TM. The discussion section: your closing argument. Clin Chem. 2010;56(11):1671–4.

3. Bavdekar SB. Writing the discussion section: describing the significance of the study findings. J Assoc Physicians India. 2015;63(11):40–2.

4. Foote M. The proof of the pudding: how to report results and write a good discussion. Chest. 2009;135(3):866–8.

5. Kearney MH. The discussion section tells us where we are. Res Nurs Health. 2017;40(4):289–91.

6. Ghasemi A, Bahadoran Z, Mirmiran P, Hosseinpanah F, Shiva N, Zadeh-Vakili A. The principles of biomedical scientific writing: discussion. Int J Endocrinol Metab. 2019;17(3):e95415.

7. Swales JM, Feak CB. Academic writing for graduate students: essential tasks and skills. Ann Arbor, MI: University of Michigan Press; 2004.

8. Colombo M, Bucher L, Sprenger J. Determinants of judgments of explanatory power: credibility, generality, and statistical relevance. Front Psychol. 2017;8:1430.

9. Mozayan MR, Allami H, Fazilatfar AM. Metadiscourse features in medical research articles: subdisciplinary and paradigmatic influences in English and Persian. Res Appl Ling. 2018;9(1):83–104.

10. Hyland K. Metadiscourse: mapping interactions in academic writing. Nordic J English Stud. 2010;9(2):125.

11. Hill AB. The environment and disease: association or causation? Proc Royal Soc Med. 2016;58(5):295–300.

12. Alpert JS. Practicing medicine in Plato’s cave. Am J Med. 2006;119(6):455–6.

13. Walsh K. Discussing discursive discussions. Med Educ. 2016;50(12):1269–70.

14. Polit DF, Beck CT. Generalization in quantitative and qualitative research: myths and strategies. Int J Nurs Stud. 2010;47(11):1451–8.

15. Jawaid SA, Jawaid M. How to write introduction and discussion. Saudi J Anaesth. 2019;13(Suppl 1):S18–9.

16. Jawaid SA, Baig M. How to write an original article. In: Jawaid SA, Jawaid M, editors. Scientific writing: a guide to the art of medical writing and scientific publishing. Karachi: Published by Med-Print Services; 2018. p. 135–50.

17. Hall GM, editor. How to write a paper. London: BMJ Books, BMJ Publishing Group; 2003. Structure of a scientific paper. p. 1–5.


Author information

Authors and affiliations.

Department of Surgical Gastroenterology and Liver Transplantation, Sir Ganga Ram Hospital, New Delhi, India

Samiran Nundy

Department of Internal Medicine, Sir Ganga Ram Hospital, New Delhi, India

Institute for Global Health and Development, The Aga Khan University, South Central Asia, East Africa and United Kingdom, Karachi, Pakistan

Zulfiqar A. Bhutta

Rights and permissions

Open Access. This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Copyright information

© 2022 The Author(s)

About this chapter

Cite this chapter.

Nundy, S., Kakar, A., Bhutta, Z.A. (2022). How to Write the Introduction to a Scientific Paper?. In: How to Practice Academic Medicine and Publish from Developing Countries?. Springer, Singapore. https://doi.org/10.1007/978-981-16-5248-6_17

DOI: https://doi.org/10.1007/978-981-16-5248-6_17

Published: 24 October 2021

Publisher Name: Springer, Singapore

Print ISBN: 978-981-16-5247-9

Online ISBN: 978-981-16-5248-6

eBook Packages: Medicine, Medicine (R0)

Introduction, Methods and Results


The Introduction should provide readers with the background information needed to understand your study, and the reasons why you conducted your experiments. The Introduction should answer the question: what question/problem was studied?

While writing the background, make sure your citations are:

TIP: Do not write a literature review in your Introduction, but do cite reviews where readers can find more information if they want it.

Once you have provided background material and stated the problem or question for your study, tell the reader the purpose of your study. Usually the reason is to fill a gap in the knowledge or to answer a previously unanswered question. For example, if a drug is known to work well in one population, but has never been tested in a different population, the purpose of a study could be to test the efficacy and safety of the drug in the second population.

The final thing to include at the end of your Introduction is a clear and exact statement of your study aims. You might also explain in a sentence or two how you conducted the study.

Materials and Methods

This section provides the reader with all the details of how you conducted your study. You should:

TIP: Check the ‘Instructions for Authors’ for your target journal to see how manuscripts should present the Materials and Methods. Also, as another guide, look at previously published papers in the journal or sample reports on the journal website.

In the Results section, simply state what you found, but do not interpret the results or discuss their implications.

TIP: There is a famous saying in English: “A picture is worth a thousand words.” This means that, sometimes, an image can explain your findings far better than text could. So make good use of figures and tables in your manuscript! However, avoid including redundant figures and tables (e.g. two showing the same thing in a different format), or using figures and tables where it would be better to just include the information in the text (e.g. where there is not enough data for a table or figure).

Sample Paper in Scientific Format

Biology 151/152.

The sample paper below has been compressed into the left-hand column on the pages below. In the right-hand column we have included notes explaining how and why the paper is written as it is.

Writing a Scientific Paper: METHODS

Writing a "good" methods section

The purpose is to provide enough detail that a competent worker could repeat the experiment. Many of your readers will skip this section because they already know from the Introduction the general methods you used. However, careful writing of this section is important because, for your results to be of scientific merit, they must be reproducible. Otherwise your paper does not represent good science.

Goals:

• Exact technical specifications and quantities, and source or method of preparation
• Describe equipment used and provide illustrations where relevant
• Chronological presentation (but related methods described together)
• Questions about "how" and "how much" are answered for the reader and not left for them to puzzle over
• Discuss statistical methods only if unusual or advanced
• When a large number of components are used, prepare tables for the benefit of the reader
• Do not state the action without stating the agent of the action

"Methods Checklist" from: How to Write a Good Scientific Paper. Chris A. Mack. SPIE. 2018.

Method (Materials, Theory, Design, Modeling, etc.)

• Describe how the results were generated with sufficient detail so that an independent researcher (working in the same field) could reproduce the results sufficiently to allow validation of the conclusions.
  o Can the reader assess internal validity (conclusions are supported by the results presented)?
  o Can the reader assess external validity (conclusions are properly generalized beyond these specific results)?
• Has the chosen method been justified?
• Are data analysis and statistical approaches justified, with assumptions and biases considered?
• Avoid: including results in the Method section; including extraneous details (unnecessary to enable reproducibility or judge validity); treating the method as a chronological history of events; unneeded references to commercial products; references to “proprietary” products or processes unavailable to the reader.

Writing a First-Class Scientific Paper – Best Tips and Examples

01 September, 2021

Author: Kate Smith

A scientific paper is a nightmare for most students and even for experienced researchers. After all, this time-consuming process involves trips to the library, dozens of hours of writing, and intense mental effort. What if you are doing this for the first time? What is a scientific paper, how long should it be, and how do you write it? We will help you figure this all out, so keep reading.

What is a Scientific Paper?

A scientific paper is a manuscript that reports scientific findings to the public. Scientists publish research pieces in scientific journals, and you have probably come across several scientific papers while doing your homework. These pieces are usually 3,000 – 10,000 words long.

Writing a scientific paper is intimidating because most students and researchers struggle to turn raw data into a digestible format. How do you put all those numbers on paper then? Well, for this, you should stick to a specific scientific paper structure. Check it out below.

Understanding the Scientific Paper Outline

In general, all writing pieces follow an outline consisting of three main elements:

A scientific paper is no exception, but its outline comprises more parts within these three elements:

As you see, a scientific paper contains nine parts, but they all fall into three categories. However, following a logical and clear scientific paper outline is not enough to complete this task with flying colors. The truth is that the topic of your scientific paper matters much more than its outline. How to choose the right topic then? Check this out below.

How to Choose a Topic for a Scientific Paper?

Here is why you should think twice before selecting a topic for your scientific paper:

Now, let’s check the effective tips on choosing your research topic:

Narrow Down the Scope of Your Research

Let’s say you’re going to discuss global warming, and you chose the “global warming” title. But what are you going to research in the first place? The temperature increase rates? Livestock as the primary contributor to CO2 emissions? Or the practices to postpone the inevitable death of human civilization?

Fitting gigabytes of related information into a scientific paper is impossible. Therefore, you should narrow down your research topic.

Choose Manageable Topics

If choosing a widely discussed/solved topic or attempting something revolutionary is a bad idea, you should aim for the happy middle.

Select a subject with enough coverage and potential for further discoveries. For example, proving that permafrost is melting is a bad idea. Every news channel is screaming about it. But calculating the future ice melting rates based on the current climate situation is much more captivating and valuable.

Pick Debatable Topics

Well, let’s talk more about global warming. Scientists and politicians scratch their heads over slowing down climate change. Some say people should eat less meat, drive/fly less, and consume fewer plastic products. Others say companies should shut down their factories that pump millions of tons of CO2 into the air 24/7.

But what if overpopulation is an overlooked trigger of global warming? What if governments should invest more cash into birth control research programs? One can develop this highly debatable topic into a winning scientific paper with eye-opening calculations and projections.

How to Write a Scientific Paper Step By Step

How to Start a Scientific Paper?

How to End a Scientific Paper?

In the end, revise your paper and make sure it’s error-free, authentic, and follows the designated format. Assuming that your paper is around 3,000–10,000 words, you will not be able to edit and proofread it in one go. Therefore, you have to split this work into several stages.

First, you can do the heavy editing. Perhaps, you would want to rewrite entire paragraphs. Then, you can check your paper for factual errors, inconsistencies, and illogical statements. After that, you can switch to proofreading. It’s better to dedicate a day or two to this job because a fresh pair of eyes will spot many more errors than a tired one.

How to Write Scientific Paper Sections?

Writing sections of a scientific paper is no joke, especially for the first time. But after implementing our writing tips, you will nail it. Check how you can write each section of your scientific paper below.

How to Write an Abstract for a Scientific Paper?

An abstract is a 200-250-word summary of your paper. It explains to your reader the sense of your research. Scientists believe that an ideal abstract is a standalone piece. Your reader should understand your research without reading the full text. Your abstract should convey the essence of your study through these mini sections:

As you see, the abstract copies the general paper structure on a smaller scale.
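
The 200–250-word range mentioned above is easy to check while drafting. This is a minimal sketch that counts whitespace-separated words. The function names and the placeholder draft are hypothetical, and a target journal may set a different limit.

```python
def abstract_word_count(abstract: str) -> int:
    """Count whitespace-separated words in the abstract."""
    return len(abstract.split())


def abstract_length_ok(abstract: str, low: int = 200, high: int = 250) -> bool:
    """Return True if the word count falls inside the low-high range (inclusive)."""
    return low <= abstract_word_count(abstract) <= high


draft = "word " * 220  # placeholder text standing in for a real abstract
print(abstract_word_count(draft), abstract_length_ok(draft))
```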

Scientific Paper Introduction

A scientific paper introduction provides background information, explains the significance of your research, and guides the reader further to the body of your paper. The introduction answers the following questions:

Scientific Paper Body

The Methods section describes what you did to achieve the goal of your paper. It explains how you did your research and what steps you took. As scientists say, this part must provide your reader with enough information to repeat your experiment and get the same results.

Think of it as a recipe. While searching for a stewed beef recipe, you expect one to tell you how much meat, salt, oil, and pepper you need, and for how long you should cook the meal. The same applies to scientific papers. 

The Results section describes your findings based on your research methods and explains how they correlate with the goal of your paper. A good rule of thumb is to include graphs and tables to illustrate your results.

Check these tips for writing a meaningful Results section:

Scientific Paper Conclusion

Discussion is one of the most challenging parts of a scientific paper. After all, you have to interpret your results, give them meaning, find dependencies, relationships, etc. The discussion piece aims to:

Scientific Paper Examples

If you are searching for a proper scientific paper example, here are a couple of samples to lead you in:

Scientific Paper Writing Tips

These three tips will help you take your research to the next level. Read further, and you will understand how.

Incorporate Simple Language

Your paper must deliver a message to your audience, so ensure that your readers will understand your findings.

Sure, scientific writing implies strong arguments and academic language. But jargon, abstract terms, and filler words don’t make papers scientific. Clarity and precision do that instead.

To keep your paper clear and concise, you should avoid wordy phrases, passive voice, and long sentences as much as possible. Here is a bad example:

“In the event that the sentence of a scientific paper exceeds 25 words, its structural components must be separated in order to achieve clarity of writing.” Here is a good example: “You should keep your sentences up to 25 words long for higher readability.”

Some research topics, however, may not allow such simple writing techniques. Therefore, you have to balance scientific jargon and plain language. By doing so, you will perform better than most researchers do.
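
The 25-word guideline can also be checked mechanically. The sketch below splits text at sentence-ending punctuation with a simple regular expression and lists the sentences that exceed the limit. The helper name `long_sentences` is hypothetical, and the naive splitting rule is an assumption that will mishandle abbreviations such as "e.g." or "et al.".

```python
import re


def long_sentences(text, max_words=25):
    """Return the sentences in text that contain more than max_words words."""
    # Naive split: break after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if len(s.split()) > max_words]


bad = ("In the event that the sentence of a scientific paper exceeds 25 words, "
       "its structural components must be separated in order to achieve clarity of writing.")
good = "You should keep your sentences up to 25 words long for higher readability."

flagged = long_sentences(bad + " " + good)
print(flagged)
```

Run on the two example sentences quoted earlier in this section, only the first one is flagged, since it runs to 26 words.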

Avoid Zombie Nouns

Zombie nouns are made from other parts of speech. For instance, apply – application, assume – assumption, prepare – preparation, indicate – indication, etc.

Many scholars stuff their papers with zombie nouns to the point that nobody understands what they want to say. Here is a bad example:

“The prognostication of further global temperature inflation leads to the conclusion that the polar ice cap termination is possible in the nearest future.”

You can transform this piece into a shorter sentence though: “Scientists predict that the polar ice cap will melt soon due to climate change.”

Use Writing Tools

Why not use writing tools like grammar and plagiarism checkers, citation machines, and readability tools? They will save you hours of editing and proofreading because you can reduce errors in real-time while using them. Check these helpful writing tools:

Writing Help by HandmadeWriting

Writing a scientific paper is challenging but pretty doable once you apply all the tips we mentioned above. In practice, a scientific paper doesn’t differ too much from other academic tasks regarding the actual writing process.

Treat it like a detailed essay – research and wrap your findings into precise logical language. But if you feel you’re writing a scientific paper against the clock, you can delegate this assignment to our essay writers and we will do all the heavy lifting for you. Just place an order, and once it’s done, hand the first-class scientific paper to your professor.


Enago Academy

How to Write the Methods Section of a Scientific Article

What Is the Methods Section of a Research Paper?

The Methods section of a research article includes an explanation of the procedures used to conduct the experiment. For authors of scientific research papers, the objective is to present their findings clearly and concisely and to provide enough information so that the experiment can be duplicated.

Research articles contain very specific sections, usually dictated by either the target journal or specific style guides. For example, in the social and behavioral sciences, the American Psychological Association (APA) style guide is used to gather information on how the manuscript should be arranged. As with most styles, APA’s objectives are to ensure that manuscripts are written with minimum distractions to the reader. Every research article should include a detailed Methods section after the Introduction.

Why is the Methods Section Important?

The Methods section (also referred to as “Materials and Methods”) is important because it provides the reader enough information to judge whether the study is valid and reproducible.

Structure of the Methods Section in a Research Paper

While designing a research study, authors typically decide on the key points that they’re trying to prove or the “cause-and-effect relationship” between objects of the study. Very simply, the study is designed to meet the objective. According to APA, a Methods section comprises the following three subsections: participants, apparatus, and procedure.

How do You Write a Method Section in Biology?

In biological sciences, the Methods section might be more detailed, but the objectives are the same—to present the study clearly and concisely so that it is understandable and can be duplicated.

If animals (including human subjects) were used in the study, authors should include statements that they were treated according to protocols designed to make the treatment as humane as possible.

Research conducted at an institution using human participants is overseen by the Institutional Review Board (IRB) with which it is affiliated. IRB is an administrative body whose purpose is to protect the rights and welfare of human subjects during their participation in the study.

Literature Search

Literature searches are performed to gather as much information as relevant from previous studies. They are important for providing evidence on the topic and help validate the research. Most are accomplished using keywords or phrases to search relevant databases. For example, both MEDLINE and PubMed provide information on biomedical literature. Google Scholar, according to APA, is “one of the best sources available to an individual beginning a literature search.” APA also suggests using PsycINFO and refers to it as “the premier database for locating articles in psychological science and related literature.”

Authors must make sure to have a set of keywords (usually taken from the objective statement) to stay focused and to keep the search from drifting away from the original objective. Authors will benefit from setting limiting parameters, such as date ranges, and from avoiding the trap of citing non-valid sources, such as social media or conversations with people in the same discipline.


What Should be Included in the Methods Section of a Research Paper?

One commonly misused term in research papers is “methodology.” Methodology refers to a branch of the philosophy of science that deals with scientific methods, not to the methods themselves, so authors should avoid using it to mean “methods.” The Methods section should be divided into its main subsections; authors might use subheadings to describe their research more clearly.

What Should not be Included in Your Methods Section?

Common pitfalls can make the manuscript cumbersome to read or might make the readers question the validity of the research. The University of Southern California provides some guidelines.

According to the University of Richmond, authors must avoid including extensive details or an exhaustive list of the equipment used, as readers could quickly lose attention. These unnecessary details add nothing to validate the research and do not help the reader understand how the objective was satisfied. A well-thought-out Methods section is one of the most important parts of the manuscript. Authors should always prepare a draft that lists all parts, allow others to review it, and revise it to remove any superfluous information.


How To Write A Lab Report | Step-by-Step Guide & Examples

Published on May 20, 2021 by Pritha Bhandari . Revised on July 15, 2022.

A lab report conveys the aim, methods, results, and conclusions of a scientific experiment. The main purpose of a lab report is to demonstrate your understanding of the scientific method by performing and evaluating a hands-on lab experiment. This type of assignment is usually shorter than a research paper.

Lab reports are commonly used in science, technology, engineering, and mathematics (STEM) fields. This article focuses on how to structure and write a lab report.


The sections of a lab report can vary between scientific fields and course requirements, but they usually contain the purpose, methods, and findings of a lab experiment.

Each section of a lab report has its own purpose.

Although most lab reports contain these sections, some sections can be omitted or combined with others. For example, some lab reports contain a brief section on research aims instead of an introduction, and a separate conclusion is not always required.

If you’re not sure, it’s best to check your lab report requirements with your instructor.

Your title provides the first impression of your lab report – effective titles communicate the topic and/or the findings of your study in specific terms.

Create a title that directly conveys the main focus or purpose of your study. It doesn’t need to be creative or thought-provoking, but it should be informative.


An abstract condenses a lab report into a brief overview of about 150–300 words. It should provide readers with a compact version of the research aims, the methods and materials used, the main results, and the final conclusion.

Think of it as a way of giving readers a preview of your full lab report. Write the abstract last, in the past tense, after you’ve drafted all the other sections of your report, so you’ll be able to succinctly summarize each section.

To write a lab report abstract, use these guiding questions:

Nitrogen is a necessary nutrient for high quality plants. Tomatoes, one of the most consumed fruits worldwide, rely on nitrogen for healthy leaves and stems to grow fruit. This experiment tested whether nitrogen levels affected tomato plant height in a controlled setting. It was expected that higher levels of nitrogen fertilizer would yield taller tomato plants.

Levels of nitrogen fertilizer were varied between three groups of tomato plants. The control group did not receive any nitrogen fertilizer, while one experimental group received low levels of nitrogen fertilizer, and a second experimental group received high levels of nitrogen fertilizer. All plants were grown from seeds, and heights were measured 50 days into the experiment.

The effects of nitrogen levels on plant height were tested between groups using an ANOVA. The plants with the highest level of nitrogen fertilizer were the tallest, while the plants with low levels of nitrogen exceeded the control group plants in height. In line with expectations and previous findings, the effects of nitrogen levels on plant height were statistically significant. This study strengthens the importance of nitrogen for tomato plants.

Your lab report introduction should set the scene for your experiment. One way to write your introduction is with a funnel (an inverted triangle) structure:

Begin by providing background information on your research topic and explaining why it’s important in a broad real-world or theoretical context. Describe relevant previous research on your topic and note how your study may confirm it or expand it, or fill a gap in the research field.

This lab experiment builds on previous research from Haque, Paul, and Sarker (2011), who demonstrated that tomato plant yield increased at higher levels of nitrogen. However, the present research focuses on plant height as a growth indicator and uses a lab-controlled setting instead.

Next, go into detail on the theoretical basis for your study and describe any directly relevant laws or equations that you’ll be using. State your main research aims and expectations by outlining your hypotheses.

Based on the importance of nitrogen for tomato plants, the primary hypothesis was that the plants with the high levels of nitrogen would grow the tallest. The secondary hypothesis was that plants with low levels of nitrogen would grow taller than plants with no nitrogen.

Your introduction doesn’t need to be long, but you may need to organize it into a few paragraphs or with subheadings such as “Research Context” or “Research Aims.”

A lab report Method section details the steps you took to gather and analyze data. Give enough detail so that others can follow or evaluate your procedures. Write this section in the past tense. If you need to include any long lists of procedural steps or materials, place them in the Appendices section but refer to them in the text here.

You should describe your experimental design, your subjects, materials, and specific procedures used for data collection and analysis.

Experimental design

Briefly note whether your experiment is a within-subjects or between-subjects design, and describe how your sample units were assigned to conditions if relevant.

A between-subjects design with three groups of tomato plants was used. The control group did not receive any nitrogen fertilizer. The first experimental group received a low level of nitrogen fertilizer, while the second experimental group received a high level of nitrogen fertilizer.
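If you assign sample units to conditions at random, a short script can make the assignment reproducible and easy to report. The sketch below is only an illustration: the example report does not say how its 15 pots were assigned, so the random assignment, the pot numbering, the group names, and the `assign_to_groups` helper are all assumptions.

```python
import random

def assign_to_groups(unit_ids, group_names, seed=0):
    """Shuffle the units, then deal them out evenly across the groups."""
    if len(unit_ids) % len(group_names) != 0:
        raise ValueError("units must divide evenly across groups")
    rng = random.Random(seed)  # fixed seed so the assignment can be reproduced
    shuffled = list(unit_ids)
    rng.shuffle(shuffled)
    size = len(shuffled) // len(group_names)
    return {
        name: sorted(shuffled[i * size:(i + 1) * size])
        for i, name in enumerate(group_names)
    }

# Hypothetical numbering: 15 pots labelled 1..15, three conditions of 5 pots each
pots = list(range(1, 16))
groups = assign_to_groups(pots, ["control", "low_nitrogen", "high_nitrogen"])
for name, members in groups.items():
    print(name, members)
```

Reporting the seed alongside the procedure lets a reader regenerate exactly the same assignment.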

Describe human subjects in terms of demographic characteristics, and animal or plant subjects in terms of genetic background. Note the total number of subjects as well as the number of subjects per condition or per group. You should also state how you recruited subjects for your study.

List the equipment or materials you used to gather data and state the model names for any specialized equipment.

List of materials

35 Tomato seeds

15 plant pots (15 cm tall)

Light lamps (50,000 lux)

Nitrogen fertilizer

Measuring tape

Describe your experimental settings and conditions in detail. You can provide labelled diagrams or images of the exact set-up necessary for experimental equipment. State how extraneous variables were controlled through restriction or by fixing them at a certain level (e.g., keeping the lab at room temperature).

Light levels were fixed throughout the experiment, and the plants were exposed to 12 hours of light a day. Temperature was restricted to between 23 and 25℃. The pH and carbon levels of the soil were also held constant throughout the experiment as these variables could influence plant height. The plants were grown in rooms free of insects or other pests, and they were spaced out adequately.

Your experimental procedure should describe the exact steps you took to gather data in chronological order. You’ll need to provide enough information so that someone else can replicate your procedure, but you should also be concise. Place detailed information in the appendices where appropriate.

In a lab experiment, you’ll often closely follow a lab manual to gather data. Some instructors will allow you to simply reference the manual and state whether you changed any steps based on practical considerations. Other instructors may want you to rewrite the lab manual procedures as complete sentences in coherent paragraphs, while noting any changes to the steps that you applied in practice.

If you’re performing extensive data analysis, be sure to state your planned analysis methods as well. This includes the types of tests you’ll perform and any programs or software you’ll use for calculations (if relevant).

First, tomato seeds were sown in wooden flats containing soil about 2 cm below the surface. Each seed was kept 3-5 cm apart. The flats were covered to keep the soil moist until germination. The seedlings were removed and transplanted to pots 8 days later, with a maximum of 2 plants to a pot. Each pot was watered once a day to keep the soil moist.

The nitrogen fertilizer treatment was applied to the plant pots 12 days after transplantation. The control group received no treatment, while the first experimental group received a low concentration, and the second experimental group received a high concentration. There were 5 pots in each group, and each plant pot was labelled to indicate the group the plants belonged to.

50 days after the start of the experiment, plant height was measured for all plants. A measuring tape was used to record the length of the plant from ground level to the top of the tallest leaf.

In your results section, you should report the results of any statistical analysis procedures that you undertook. You should clearly state how the results of statistical tests support or refute your initial hypotheses.

The main results to report include:

The mean heights of the plants in the control group, low nitrogen group, and high nitrogen group were 20.3, 25.1, and 29.6 cm, respectively. A one-way ANOVA was applied to test the effect of nitrogen fertilizer level on plant height. The results demonstrated statistically significant (p = .03) height differences between groups.

Next, post-hoc tests were performed to assess the primary and secondary hypotheses. In support of the primary hypothesis, the high nitrogen group plants were significantly taller than the low nitrogen group and the control group plants. Similarly, the results supported the secondary hypothesis: the low nitrogen plants were taller than the control group plants.
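To see what a one-way ANOVA computes, it can help to work through the F statistic by hand. The sketch below uses made-up heights, not the data behind the example report, and a hypothetical helper named `one_way_anova_f`; in practice you would normally rely on a statistics package, which also reports the p value.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of measurements."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = 0.0  # variation of the group means around the grand mean
    ss_within = 0.0   # variation of the measurements around their own group mean
    for g in groups:
        mean = sum(g) / len(g)
        ss_between += len(g) * (mean - grand_mean) ** 2
        ss_within += sum((x - mean) ** 2 for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical plant heights in cm (illustration only)
control = [19.0, 20.0, 21.0]
low = [24.0, 25.0, 26.0]
high = [29.0, 30.0, 31.0]
f_stat, df_b, df_w = one_way_anova_f([control, low, high])
print(f_stat, df_b, df_w)  # 75.0 2 6
```

A large F value means the group means differ by much more than the spread within each group would explain, which is what a significant ANOVA result reflects.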

These results can be reported in the text or in tables and figures. Use text for highlighting a few key results, but present large sets of numbers in tables, or show relationships between variables with graphs.

You should also include sample calculations in the Results section for complex experiments. For each sample calculation, provide a brief description of what it does and use clear symbols. Present your raw data in the Appendices section and refer to it to highlight any outliers or trends.

The Discussion section will help demonstrate your understanding of the experimental process and your critical thinking skills.

In this section, you can:

Interpreting your results involves clarifying how your results help you answer your main research question. Report whether your results support your hypotheses.

Compare your findings with other research and explain any key differences in findings.

An effective Discussion section will also highlight the strengths and limitations of a study.

When describing limitations, use specific examples. For example, if random error contributed substantially to the measurements in your study, state the particular sources of error (e.g., imprecise apparatus) and explain ways to improve them.

The results support the hypothesis that nitrogen levels affect plant height, with increasing levels producing taller plants. These statistically significant results are taken together with previous research to support the importance of nitrogen as a nutrient for tomato plant growth.

However, unlike previous studies, the present experiment focused on plant height as an indicator of plant growth. Importantly, plant height may not always reflect plant health or fruit yield, so measuring other indicators would have strengthened the study findings.

Another limitation of the study is the plant height measurement technique, as the measuring tape was not suitable for plants with extreme curvature. Future studies may focus on measuring plant height in different ways.

The main strengths of this study were the controls for extraneous variables, such as pH and carbon levels of the soil. All other factors that could affect plant height were tightly controlled to isolate the effects of nitrogen levels, resulting in high internal validity for this study.

Your conclusion should be the final section of your lab report. Here, you’ll summarize the findings of your experiment, with a brief overview of the strengths and limitations, and implications of your study for further research.

Some lab reports may omit a Conclusion section because it overlaps with the Discussion section, but you should check with your instructor before doing so.

A lab report conveys the aim, methods, results, and conclusions of a scientific experiment. Lab reports are commonly assigned in science, technology, engineering, and mathematics (STEM) fields.

The purpose of a lab report is to demonstrate your understanding of the scientific method with a hands-on lab experiment. Course instructors will often provide you with an experimental design and procedure. Your task is to write up how you actually performed the experiment and evaluate the outcome.

In contrast, a research paper requires you to independently develop an original argument. It involves more in-depth research and interpretation of sources and data.

A lab report is usually shorter than a research paper.

The sections of a lab report can vary between scientific fields and course requirements, but it usually contains the following:

The results chapter or section simply and objectively reports what you found, without speculating on why you found these results. The discussion interprets the meaning of the results, puts them in context, and explains why they matter.

In qualitative research, results and discussion are sometimes combined. But in quantitative research, it’s considered important to separate the objective results from your interpretation of them.

Cite this Scribbr article


Bhandari, P. (2022, July 15). How To Write A Lab Report | Step-by-Step Guide & Examples. Scribbr. Retrieved March 3, 2023, from https://www.scribbr.com/academic-writing/lab-report/


Paper Rockets to Learn the Scientific Method


Learning Objectives

NGSS Alignment

Materials needed to make paper rockets

To make model rockets you'll need paper, scissors, tape, straws, and a tape measure.

Background Information for Teachers

This lesson is designed to guide your students through the steps of the scientific method (Figure 1) using a fun, hands-on project: paper rockets. You can read about the scientific method, or assign your students to read about it, in much more detail in this guide.

Diagram of the scientific method

The scientific method starts with a question, followed by background research to try to answer it. To find evidence for an answer, you construct a hypothesis and test it in an experiment. Once the experiment has run and the data are analyzed, the results either support or contradict your hypothesis. If the hypothesis is not supported, you can use the new information to form a new hypothesis and start the scientific process over again.

Your students will build small rockets out of paper and tape. The rocket fits onto a straw, and can be launched by blowing into the straw (Figure 2).

Paper rocket attached to a straw

Before doing this lesson with your students, it will help if you are familiar with some of the basic science concepts behind the paper rocket's flight. Depending on what variables students choose to investigate for their projects, they may need to do more research about some of these concepts. The references in the Additional Background section provide an overview of these concepts. Here are some potential points of confusion students may have:

Forces acting on a rocket

The thrust of the rocket comes from the gas being expelled and acts as a force that pushes the rocket upwards. The weight of the rocket is a force that pulls it straight towards the ground. Drag is similar to friction and acts opposite to the rocket's direction of travel, while lift, created by the air passing over the rocket body, acts perpendicular to the thrust force.

Prep Work (5 minutes)

Engage (10 minutes)

Explore (90 minutes)

Reflect (90 minutes)


Drug–target interaction prediction based on protein features, using wrapper feature selection

Scientific Reports volume 13, Article number: 3594 (2023)

Drug–target interaction (DTI) prediction is a vital stage in drug development, and many methods have been proposed for it. Experimental methods that identify these relationships on the basis of clinical remedies are time-consuming, costly, laborious, and complex, which introduces many challenges. Computational methods are a newer alternative, and more accurate computational methods can be preferable to experimental ones in terms of total cost and time. In this paper, a new computational model for DTI prediction is proposed, consisting of three phases: feature extraction, feature selection, and classification. In the feature extraction phase, different features such as EAAC and PSSM are extracted from protein sequences, and fingerprint features are extracted from drugs. These extracted features are then combined. Next, because of the large amount of extracted data, a wrapper feature selection method named IWSSR is applied. The selected features are then given to a rotation forest classifier for more efficient prediction. The innovation of our work is that we extract different features and then select among them using IWSSR. The accuracy of the rotation forest classifier under tenfold cross-validation on the gold standard datasets (enzymes, ion channels, G-protein-coupled receptors, nuclear receptors) is 98.12, 98.07, 96.82, and 95.64, respectively. The experimental results indicate that the proposed model achieves an acceptable rate in DTI prediction and is comparable to the methods proposed in other papers.


Predicting the interactions between drugs and targets is vital in drug discovery. Recently, researchers have focused on innovative drug development strategies based on knowledge of the available drugs 1. To exert their effects, drugs generally interact with at least one protein. Finding new interactions between drugs and target proteins is therefore pivotal for new drug development, because unrecognized interactions with proteins may give rise to drug side effects 2. Because identifying DTIs experimentally is costly and time-consuming, computational approaches have been suggested that can recognize potential DTIs and so accelerate the development of new drugs 3. Computational approaches to DTI prediction also yield valuable insights into drug mechanisms 4. They fall into three categories: ligand-based, docking-based, and chemogenomic-based approaches 5, each with its own advantages and disadvantages. Ligand-based approaches are beneficial even in the absence of an empirical 3-dimensional structure, but they have high computational complexity and require a large amount of data to obtain correct information 6. Docking-based approaches model reality more accurately, despite their high computational cost and low scalability, and they are as flexible as ligand-based approaches. Their drawback is that they require a 3-dimensional structure, which is often unavailable; ligand-based approaches were proposed because they work well even when that structure is missing 7. The third category is chemogenomic-based approaches. One advantage of these approaches is that special analogs in drugs can be detected more easily. Another is that the coverage of the chemical space is more complete. Moreover, the results obtained for one drug may be used to discover related drugs, and these approaches make it easier to establish structure–activity relationships 8. Studies on DTI prediction can also be based on machine learning methods, which in this area include feature-based methods (FBM), kernel-based methods (KBM), and similarity-based methods (SBM) 9.

Recently, kernel-based methods have been widely applied to identify DTIs. In addition to modeling nonlinear relationships, these methods yield models that can be applied to various kinds of data, such as strings and time-series data. Their drawback is that the resulting models are hard to interpret and understand. These methods are also not computationally efficient on large datasets 10.

In feature-based approaches, each drug and protein is represented by a numerical feature vector that captures different types of physical, chemical, and molecular features of the sample 11. One advantage of feature extraction methods is that they can reveal the intrinsic features of compounds and targets that play a crucial role in DTIs, which makes the outcome more interpretable 11.
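As a minimal sketch of this representation, the snippet below builds one feature vector for a drug–protein pair by concatenating a binary drug fingerprint with a simple protein descriptor. It is an illustration under stated assumptions rather than the descriptors used in this work: plain amino acid composition stands in for richer descriptors, and the 8-bit fingerprint, the short sequence, and both helper functions are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def amino_acid_composition(sequence):
    """Fraction of each standard amino acid in a protein sequence."""
    sequence = sequence.upper()
    counts = [sequence.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [c / total for c in counts]

def pair_feature_vector(drug_fingerprint, protein_sequence):
    """Concatenate drug fingerprint bits and protein descriptors into one vector."""
    drug_part = [float(bit) for bit in drug_fingerprint]
    return drug_part + amino_acid_composition(protein_sequence)

toy_fingerprint = [1, 0, 0, 1, 1, 0, 1, 0]  # hypothetical 8-bit fingerprint
toy_sequence = "MKTAYIAKQR"                 # hypothetical 10-residue sequence
vector = pair_feature_vector(toy_fingerprint, toy_sequence)
print(len(vector))  # 8 fingerprint bits + 20 composition values = 28
```

Each drug–protein pair then becomes one row of a feature matrix, with a label indicating whether the pair is a known interaction.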

Feature-based methods are divided into two categories: deep learning-based methods and classical feature-based methods 12. The input to deep learning methods is often the protein sequence and the structure of the drug. Different features are extracted from these data in successive layers, and the DTI prediction is made in the final layer 13, 14.

Deep learning approaches applied to predict drug–target interactions include sequence-based deep learning (ref. 15), deep neural multi-function learning (ref. 16), deep convolutional neural networks (ref. 17), light deep convolutional neural networks (ref. 18), and end-to-end deep learning (ref. 19). Among autoencoder-based approaches, we can also mention refs. 20 and 21, both from 2021.

The remainder of the paper is organized as follows. In the next section, we introduce related works. We then explain the method and report the experimental results obtained with different classifiers. Finally, we draw conclusions.

Related works

Numerous computational methods have been developed for the DTI prediction problem. In 2021, Jiajie Peng and colleagues used graph representation learning to provide a framework 22. Another study described the data needed to predict DTIs 1.

Kernel-based methods are among the machine learning methods most widely studied in this field. Muhammad Ammad-ud-din et al. analyzed integrated and personalized QSAR approaches in cancer using kernelized Bayesian matrix factorization 23. In a study conducted in 2018, Anna Cichonska et al. proposed a method with multiple pairwise kernels for memory- and time-efficient learning 24. Another important category is similarity-based methods 25. Similarity-based approaches rely on the hypothesis that compounds that are biologically, topologically, and chemically similar have similar functions and bioactivity, and therefore similar targets 26. In ref. 27, a similarity-based monitoring technique was presented to identify interactions between new drugs and known targets.

In 2021, a similarity-based model was proposed for DTI prediction that applies a two-dimensional CNN to the outer products of column vectors taken from the drug and target similarity matrices 28.

Various other machine learning methods have also been used for this prediction. Using multi-label learning, Seo May et al. presented a framework for predicting interactions 29. In another work in 2020, Nin Metai et al. used similarity-based methods together with machine learning approaches 30. Although machine learning-based methods have proven effective in identifying DTIs, many challenges remain:

Most methods that are in the form of supervised learning have difficulty selecting negative samples.

Predictive models on the basis of machine learning are usually constructed and evaluated with overly simple experimental settings.

Most machine learning-based methods have poor descriptive features. Therefore, it is difficult to distinguish a potential drug mechanism from its function considering a pharmacological perspective 31 , 32 .

More generally, the key challenges in predicting DTIs include extracting all critical drug–target features, data inconsistencies, and class imbalance during the prediction process. Feature-based methods are among the machine learning methods most widely studied in this field. Articles on feature-based methods for identifying DTIs have so far tended to innovate in four areas: feature extraction, feature selection, class balancing, and new classifiers 33.

In the field of feature extraction, Cheng Wong et al. in 2020 tested electrotopological state fingerprint features for drugs and APAAC features for target proteins 32. In 2021, the FastUS algorithm was proposed to work with unbalanced data 34.

In ref. 2, the features of drugs and proteins are combined to provide the features of each drug–protein pair. Ref. 35 proposed a new predictive method that used SMOTE to work with unbalanced data. In ref. 36, Zheng Yang et al. applied a new computational model that used the PSHOG gradient and the PSSM matrix for feature extraction. In a 2020 study, a new computational approach was proposed that used the GIST feature 37. In another 2020 study, Zheng Wong et al. proposed a useful computational methodology that applied protein sequence information 38.

In another study 39 , an efficient computational method was proposed using the Rotation Forest classifier and the LBP feature extraction method in predicting PPIs from the PSSM matrix. In 2019, Hassan Mahmoud et al. proposed a new computational model to identify DTIs 40 . In the realm of proposing a new classifier, Dmitry Karasov et al. proposed an approach providing the Fuzzy classification of target sequences 41 . In another study in 2020, a new DTI prediction method was proposed in which bi-clustering trees were built on reconstructed networks 42 .

Existing methods pay little attention to extracting effective features, even though effective features lead to better discrimination, a higher verification rate, and therefore better detection quality. Furthermore, the extracted feature vectors are high-dimensional, and this dimensionality needs to be managed.

Data imbalance is another open problem: unknown interactions outnumber true-positive interactions many times over. The resulting imbalance between the two classes is a challenge that needs to be addressed.

Deep learning-based DTI models face additional challenges: they require a large amount of data for network training and have a high computational load. We have therefore omitted deep learning from this study and consider classical methods, in which features are extracted from the drug and protein sequences 1 , 43 .

In this work, a machine learning method is proposed to identify DTIs. First, different features are extracted from the protein sequences to form the protein feature vector. Then, a fingerprint is extracted from the structure of the drug. These features are combined and, because the combined vector is high-dimensional, a subset of features is selected using the IWSSR method. Finally, a Rotation Forest model is trained to identify interactions. Figure 1 shows the flowchart of the proposed method. The details of each step are given below.

figure 1

General steps of the proposed method.

Feature extraction

In this step, each sequence is converted into a numeric vector by a feature extraction algorithm. This is one of the most important steps of the classification phase, as it directly affects the prediction results. Because this study has two inputs, drug and protein, feature extraction is divided into two categories: feature extraction from drugs and feature extraction from proteins.

Feature extraction of drugs

Researchers have shown that molecular fingerprints can describe the structure of a drug. A substructure fingerprint represents a drug as a Boolean vector of substructures by dividing the molecular structure of the drug into fragments.

Even though each molecule is divided into separate fragments, the fingerprint preserves the structural information of the entire drug. This descriptor reduces the risk of information loss and accumulated errors in the description and screening procedure. Specifically, a predefined dictionary lists all substructures that may correspond to fragments of a drug molecule. If a substructure from the dictionary is present in the drug molecule, the corresponding position in the vector is set to 1; otherwise, it is set to 0. The fingerprint database thus provides an effective way to describe the molecular structure of drugs as binary fingerprint vectors. In this paper, the chemical structure map from the PubChem system at https://pubchem.ncbi.nlm.nih.gov/ is used. This scheme contains 881 molecular substructures, so each drug molecule is described by an 881-dimensional binary vector 28 .
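The dictionary lookup described above can be sketched in a few lines of Python. This is only an illustration: the five substructure keys and the fragment list are hypothetical placeholders, not entries of the actual 881-key PubChem scheme, and fragment detection itself is assumed to have been done elsewhere.

```python
# Hypothetical mini-dictionary; the real PubChem scheme defines 881
# substructures, which are not reproduced here.
SUBSTRUCTURE_DICTIONARY = ["C=O", "C-N", "aromatic_ring", "O-H", "C-Cl"]


def fingerprint(fragments, dictionary=SUBSTRUCTURE_DICTIONARY):
    """Return a binary vector: 1 where a dictionary substructure is present."""
    present = set(fragments)
    return [1 if key in present else 0 for key in dictionary]


# Illustrative fragment list for one made-up molecule.
example_fragments = ["C=O", "aromatic_ring", "O-H"]
example_vector = fingerprint(example_fragments)  # [1, 0, 1, 1, 0]
```

With the full PubChem dictionary, the same lookup would yield the 881-dimensional vector used in this paper.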

Feature extraction of proteins

One of the most significant phases in identifying DTIs is the extraction of important features from protein sequences. For this purpose, in this paper, various features from protein sequences have been extracted. These features include EAAC, EGAAC, DDE, TF-IDF, k-gram, BINA, PSSM, NUM, PsePSSM, PseAAC. The description and the feature extraction method of each is presented below:

Enhanced amino acid composition (EAAC)

This method was proposed by Chen et al. It calculates the amino acid frequencies within windows of the protein sequence, according to the following equation:

\(EAAC(m,n)=\frac{H(m,n)}{H(n)}\)

In this relation, m denotes the amino acid type, n denotes the window (windows may have different sizes), H(m,n) is the number of amino acids of type m in window n, and H(n) is the length of window n 44 .
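As a minimal sketch of this calculation, the following Python function computes the ratio H(m,n)/H(n) for every amino acid type m in every sliding window n. The fixed-size sliding window and its default size of 5 are assumptions made for illustration only.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def eaac(sequence, window=5):
    """Amino acid frequencies H(m, n) / H(n) for each sliding window n."""
    features = []
    for start in range(len(sequence) - window + 1):
        fragment = sequence[start:start + window]
        # One frequency per amino acid type m in this window.
        features.extend(fragment.count(aa) / window for aa in AMINO_ACIDS)
    return features
```

A sequence of length L therefore yields 20 × (L − window + 1) values.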

Enhanced grouped amino acid composition (EGAAC)

In this method, protein sequences are converted to numerical vectors based on the properties of their amino acids. It is an effective feature extraction algorithm that has been applied in bioinformatics, for example in the prediction of malonylation sites. The 20 amino acid types are divided into five groups according to their physicochemical properties: the aliphatic group (GAVLMI), the aromatic group (FYW), the positively charged group (KRH), the negatively charged group (DE), and the uncharged group (STCPNQ). Based on this grouping, EGAAC is calculated with the following equation:

\(EGAAC(g,n)=\frac{H(g,n)}{H(n)}\)

In this formula, H(g,n) is the number of amino acids of group g in window n, and H(n) is the length of window n. In this study, the window size is set to L-5, where L is the length of the protein sequence 44 .
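The grouped variant can be sketched the same way. In this illustration the aromatic group is taken as FYW, so that the five groups partition the 20 standard amino acids, and the window size is left as a parameter. Both choices are assumptions of the sketch.

```python
# Five physicochemical groups; together they cover all 20 amino acids.
GROUPS = {
    "aliphatic": "GAVLMI",
    "aromatic": "FYW",
    "positive": "KRH",
    "negative": "DE",
    "uncharged": "STCPNQ",
}


def egaac(sequence, window=5):
    """Group frequencies H(g, n) / H(n) for each sliding window n."""
    features = []
    for start in range(len(sequence) - window + 1):
        fragment = sequence[start:start + window]
        for members in GROUPS.values():
            count = sum(fragment.count(aa) for aa in members)
            features.append(count / window)
    return features
```

Each window contributes five values, one per group, and these sum to 1 when the window contains only standard amino acids.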

Dipeptide deviation from the expected mean (DDE)

The dipeptide deviation from the expected mean (DDE) was proposed in 45 , a study of feature extraction based on amino acid composition, in order to distinguish epitopes from non-epitopes. First, the dipeptide composition (DC) of a protein sequence is calculated as follows:

\(DC(m,n)=\frac{{H}_{mn}}{H-1}\)

Here, \({H}_{mn}\) is the number of occurrences of the amino acid pair mn and H is the length of the protein sequence. The second step is to compute the theoretical mean (TM) and theoretical variance (TV) of a protein sequence as follows:

\(TM(m,n)=\frac{{C}_{m}}{{C}_{H}}\times \frac{{C}_{n}}{{C}_{H}}\)

\(TV(m,n)=\frac{TM(m,n)\left(1-TM(m,n)\right)}{H-1}\)

Here, \({C}_{m}\) is the number of codons that encode the first amino acid, \({C}_{n}\) is the number of codons that encode the second amino acid, and \({C}_{H}\) is the total number of possible codons.

Finally, DDE is calculated from the DC, TM, and TV values as follows 44 :

\(DDE(m,n)=\frac{DC(m,n)-TM(m,n)}{\sqrt{TV(m,n)}}\)
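The three steps can be sketched in Python as follows. The sketch assumes the usual form of the descriptor: DC is the pair count divided by H − 1, TM is the product of the two codon fractions, TV is TM(1 − TM)/(H − 1), and DDE is (DC − TM)/√TV. The codon counts come from the standard genetic code, with the three stop codons excluded, which gives 61 codons in total.

```python
from math import sqrt

# Number of codons encoding each amino acid in the standard genetic code.
CODONS = {
    "A": 4, "C": 2, "D": 2, "E": 2, "F": 2, "G": 4, "H": 2, "I": 3,
    "K": 2, "L": 6, "M": 1, "N": 2, "P": 4, "Q": 2, "R": 6, "S": 6,
    "T": 4, "V": 4, "W": 1, "Y": 2,
}
TOTAL_CODONS = sum(CODONS.values())  # 61, stop codons excluded


def dde(sequence):
    """Return a dict mapping each of the 400 dipeptides to its DDE value."""
    n_pairs = len(sequence) - 1  # H - 1 overlapping dipeptides
    counts = {}
    for i in range(n_pairs):
        pair = sequence[i:i + 2]
        counts[pair] = counts.get(pair, 0) + 1
    features = {}
    for first in CODONS:
        for second in CODONS:
            dc = counts.get(first + second, 0) / n_pairs
            tm = (CODONS[first] / TOTAL_CODONS) * (CODONS[second] / TOTAL_CODONS)
            tv = tm * (1 - tm) / n_pairs
            features[first + second] = (dc - tm) / sqrt(tv)
    return features
```

Dipeptides that occur more often than their codon-based expectation receive positive values, and absent dipeptides receive negative values.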

Term frequency-inverse document Frequency (TF-IDF)

The TF-IDF feature extraction method consists of two terms: TF, the term frequency, and IDF, the inverse document frequency. To obtain the TF-IDF value, each term is calculated separately and the two are then multiplied. TF(t, d) is the relative frequency of amino acid t in protein sequence d. There are several ways to calculate this value; here it is calculated as follows:

After calculating these two terms, the TF-IDF value is obtained from the following equation 46 :
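As a rough illustration, the sketch below uses the common textbook form of TF-IDF, treating each protein sequence as a document and each amino acid as a term. The exact variant used in this study is not shown here, so the unsmoothed logarithmic IDF is an assumption.

```python
from math import log


def tf(term, sequence):
    """Relative frequency of an amino acid within one sequence."""
    return sequence.count(term) / len(sequence)


def idf(term, sequences):
    """Log of (number of sequences / number of sequences containing term)."""
    containing = sum(1 for s in sequences if term in s)
    return log(len(sequences) / containing) if containing else 0.0


def tf_idf(term, sequence, sequences):
    """Product of the two terms."""
    return tf(term, sequence) * idf(term, sequences)
```

An amino acid that appears in every sequence receives an IDF of zero under this form, so it contributes nothing to the feature vector.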

For the k-gram features, 1-gram is the special case of k-grams in which k is set to 1. The relative frequencies of all 21 amino acid types (the 20 standard amino acids and the dummy code O, which pads segments of unequal length) are computed in 1-gram with the following equation:

\({f}_{r}=\frac{{N}_{r}}{N}\)

where \({N}_{r}\) is the number of occurrences of amino acid r and N is the length of the segment. A 21-dimensional vector is thus obtained for each segment 47 .

2-gram computes the relative frequencies of all possible dipeptides in the sequence. The elements of the feature vector are defined as:

\({f}_{rs}=\frac{{N}_{rs}}{N-1}\)

where \({N}_{rs}\) is the number of occurrences of the dipeptide rs, N is the length of the segment, and N-1 is the total number of dipeptides in the encoded segment 47 .
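Both cases can be covered by one short function. This sketch assumes a 21-letter alphabet in alphabetical order of the one-letter codes, with the dummy code O placed last. It divides each k-gram count by the number of k-grams in the segment, which is N for k = 1 and N − 1 for k = 2.

```python
from itertools import product

ALPHABET = "ACDEFGHIKLMNPQRSTVWYO"  # 20 standard amino acids + dummy O


def k_gram(segment, k):
    """Relative frequencies of all 21**k possible k-grams in a segment."""
    total = len(segment) - k + 1  # number of k-grams in the segment
    counts = {}
    for i in range(total):
        gram = segment[i:i + k]
        counts[gram] = counts.get(gram, 0) + 1
    return [counts.get("".join(g), 0) / total
            for g in product(ALPHABET, repeat=k)]
```

For k = 1 the result has 21 elements, and for k = 2 it has 441.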

Numerical representation for amino acids (NUM)

NUM converts sequences of amino acids into sequences of numerical values by mapping the amino acids in alphabetical order: the 20 standard amino acids are represented as 1, 2, 3, …, 20, and the dummy amino acid O is represented as 21 47 .

The binary encoding of amino acids (BINA) transforms each amino acid in a segment into a 21-dimensional orthogonal binary vector. Unlike NUM defined above, BINA represents each amino acid as a 21-dimensional binary vector with one ‘1’ and 20 ‘0’ elements. For example, alanine (‘A’) is represented as 100000000000000000000, cysteine (‘C’) as 010000000000000000000, and so on, while the dummy amino acid ‘O’ is represented as 000000000000000000001 47 .

PSSM, the position-specific scoring matrix, is a kind of scoring matrix used in BLAST protein searches, in which a score for each amino acid is assigned separately according to its position in an alignment of several protein sequences. In general, this method extracts evolution-based features. For a protein sequence, the PSSM can be written as:

\(PSSM={\left[{P}_{i,j}\right]}_{L\times 20},\quad i=1,\dots ,L,\quad j=1,\dots ,20\)

Here, L is the length of the protein sequence, 20 is the number of amino acid types, and \({P}_{i,j}\) is the score for the amino acid at position i mutating into amino acid type j during evolution. PSSM scores are expressed as positive or negative integers. A positive score indicates that the substitution occurs more often than expected by chance, whereas a negative score indicates that it occurs less often than expected. The PSSM thus contains both positional and evolutionary information about the protein sequence 46 .
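The NUM and BINA encodings can be sketched as two small functions. The sketch assumes that the alphabetical order refers to the one-letter codes, which matches the examples given for alanine and cysteine.

```python
ALPHABET = "ACDEFGHIKLMNPQRSTVWYO"  # 20 standard amino acids + dummy O


def num_encode(segment):
    """NUM: map each amino acid to its 1-based alphabetical index."""
    return [ALPHABET.index(aa) + 1 for aa in segment]


def bina_encode(segment):
    """BINA: concatenate one 21-dimensional one-hot vector per amino acid."""
    vector = []
    for aa in segment:
        one_hot = [0] * len(ALPHABET)
        one_hot[ALPHABET.index(aa)] = 1
        vector.extend(one_hot)
    return vector
```

A segment of N residues gives N values under NUM and 21 × N values under BINA.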

The PSSM described above has two major problems:

Because the size of the PSSM varies with the length of the protein sequence, machine learning algorithms cannot handle it directly.

The PSSM does not capture sequence-order information.

To overcome these two problems, PSSM is replaced by PsePSSM.

PsePSSM or Pseudo Position-Specific Score Matrix can be calculated using the following formulas:

The n-th rank correlation factor is denoted by \({{p}_{j}}^{\mathrm{n}}\) and is obtained from the PSSM scores, for amino acid type j, of pairs of residues that are n positions apart in the protein sequence.

\(\upvarepsilon\) is the number of rank correlation factors, which must be smaller than the length of the shortest protein sequence 48 .
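A common formulation of PsePSSM, assumed here for illustration, concatenates the column means of the PSSM with, for each rank up to a maximum, the mean squared difference between the scores of residues that many positions apart. The sketch below follows that formulation and omits any prior normalization of the PSSM. The function and parameter names are hypothetical.

```python
def pse_pssm(pssm, max_rank):
    """Fixed-length descriptor built from an L x 20 PSSM.

    pssm: list of L rows, one per residue, one column per amino acid type.
    max_rank: largest rank of the correlation factors (must be < L).
    """
    length = len(pssm)
    if max_rank >= length:
        raise ValueError("max_rank must be smaller than the sequence length")
    n_cols = len(pssm[0])
    # Column means: one value per amino acid type.
    features = [sum(row[j] for row in pssm) / length for j in range(n_cols)]
    # Rank-n correlation factors for n = 1 .. max_rank.
    for rank in range(1, max_rank + 1):
        for j in range(n_cols):
            total = sum((pssm[i][j] - pssm[i + rank][j]) ** 2
                        for i in range(length - rank))
            features.append(total / (length - rank))
    return features
```

For a 20-column PSSM the output has 20 × (1 + max_rank) values regardless of sequence length, which addresses the variable-size problem noted above.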

The pseudo amino acid composition (PseAAC) is an extended version of the amino acid composition (AAC). A protein sequence with L amino acid residues is denoted by P.

PseAAC is calculated as follows:

AAC is a 20-dimensional vector in which each element represents the frequency of occurrence of one amino acid in the sequence P of length L.

AAC lacks sequence-order information, so the order of residues in a protein sequence cannot be used during classification. To overcome this problem, PseAAC is recommended, which is a set of 20 + λ discrete factors. The first 20 factors in PseAAC are the same as in conventional AAC, whereas factors 20 + 1 to 20 + λ represent sequence-order correlation factors of different ranks. The number of factors λ can vary and depends on the amino acid property functions that are available. The correlation factors can therefore be derived from properties such as mass, which differ between amino acids and whose values have been calculated in previous studies 49 . The features extracted from protein sequences are listed in Table 1 .

Combination of features

Because the goal is to identify DTIs, the features of each drug and each protein are combined, and each drug–protein pair is treated as one sample. If the pair is known to interact, it is labeled 1; otherwise, it is labeled 0.
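The pairing and labeling can be sketched as follows. The identifiers, feature values, and the function name `build_samples` are hypothetical. Concatenation is assumed as the way of combining the two feature vectors.

```python
def build_samples(drug_fingerprints, protein_features, known_interactions):
    """Create one (feature_vector, label) sample per drug-protein pair."""
    samples = []
    for drug_id, fingerprint in drug_fingerprints.items():
        for protein_id, features in protein_features.items():
            # Label 1 for a known interaction, 0 otherwise.
            label = 1 if (drug_id, protein_id) in known_interactions else 0
            samples.append((list(fingerprint) + list(features), label))
    return samples


# Toy data with made-up identifiers and values.
toy_samples = build_samples(
    {"D1": [1, 0]},
    {"P1": [0.5], "P2": [0.1]},
    {("D1", "P1")},
)
```

Because every unobserved pair is labeled 0, this construction produces far more negative than positive samples.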

Feature selection

Each drug–protein pair has a large number of features, which increases time complexity and the risk of overfitting. It is therefore better to keep the relevant features and remove the irrelevant ones with a feature selection method. At this stage, the IWSSR method is used to reduce the number of input variables for the prediction model. Redundant, irrelevant, and noisy features are discarded because they increase the complexity of the model, make training more difficult, and make the resulting DTI predictions less reliable.

In this step, the IWSSR hybrid algorithm searches the feature space for effective features. IWSSR, an extension of the IWSS algorithm, is a wrapper-based feature subset selection algorithm. First, at the filter level, the relevance of each feature to the class labels is computed and a weight is assigned to each feature. IWSSR uses the symmetrical uncertainty (SU) criterion to weight features. SU is a nonlinear criterion based on information theory. It evaluates each feature separately and assigns it a value in the range [0, 1] that indicates the weight of the feature with respect to the class label. A larger value indicates greater importance of the feature. This criterion is computed as follows:

\(SU\left({F}_{i},C\right)=2\left[\frac{H\left(C\right)-H\left(C|{F}_{i}\right)}{H\left(C\right)+H\left({F}_{i}\right)}\right]\)

where C is the class label, \({F}_{i}\) is the i-th feature, and H denotes entropy. Next, in the wrapper step, the features are sorted in decreasing order of weight. An incremental procedure is then applied to choose a subset of features. Figure 2 shows the pseudo-code of the IWSSR algorithm. In this algorithm, S is the candidate subset of selected features. Initially, S is empty, and in the first iteration, the highest-ranked feature is added to it.
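The filter-level weight can be sketched in Python for discrete features. The sketch assumes the usual definition of symmetrical uncertainty: twice the information gain between feature and class, divided by the sum of their entropies. Continuous features would need to be discretized first, which is not shown.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((c / total) * log2(c / total)
                for c in Counter(values).values())


def symmetrical_uncertainty(feature, labels):
    """SU in [0, 1]: 2 * (H(C) - H(C|F)) / (H(C) + H(F))."""
    h_f = entropy(feature)
    h_c = entropy(labels)
    if h_f + h_c == 0:
        return 0.0
    # H(C) - H(C|F) equals H(F) + H(C) - H(F, C).
    info_gain = h_f + h_c - entropy(list(zip(feature, labels)))
    return 2 * info_gain / (h_f + h_c)
```

A feature that determines the class exactly gets a weight of 1, and a feature that is independent of the class gets a weight of 0.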

figure 2

IWSSR pseudo-code algorithm 50 .

After that, a classifier is trained on the selected subset using the training data, and its classification accuracy is stored as the best result so far. Each subsequent iteration proceeds in two levels. In the first level, the highest-ranked feature that has not yet been evaluated is swapped, in turn, with each feature in the selected subset. After each replacement, a new classifier is trained on the resulting subset and its accuracy is computed. If the replacement improves the accuracy compared with the previous subset, the result is retained as the best one. In this way, the dependency between the candidate feature and the previously selected features is measured, and if the candidate does not depend on any of them, it can be added to the selected subset.

In the second level, the candidate feature (the one that was swapped with the members of the selected subset in the first level) is added to the subset S obtained in the preceding iteration, a new classifier is trained on this enlarged subset, and its accuracy is computed. If this accuracy is better than that of the best subset from the first level, it is kept as the best result. After both levels, if a better subset has been found at either level, the best subset becomes the selected subset for this iteration, with the candidate feature included in it 50 .
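The two-level search can be sketched as a short Python function. This is a simplified illustration, not the pseudo-code of Figure 2: the `evaluate` callable stands in for training a classifier and measuring its accuracy, and a plain "strictly better accuracy" comparison is assumed as the acceptance rule.

```python
def iwssr(ranked_features, evaluate):
    """Incremental wrapper subset selection with replacement (sketch).

    ranked_features: feature ids sorted by decreasing filter weight.
    evaluate: callable mapping a list of feature ids to an accuracy.
    """
    selected = [ranked_features[0]]
    best_score = evaluate(selected)
    for candidate in ranked_features[1:]:
        best_subset = None
        # Level 1: swap the candidate with each selected feature in turn.
        for position in range(len(selected)):
            trial = selected[:position] + [candidate] + selected[position + 1:]
            score = evaluate(trial)
            if score > best_score:
                best_score, best_subset = score, trial
        # Level 2: add the candidate to the current subset.
        trial = selected + [candidate]
        score = evaluate(trial)
        if score > best_score:
            best_score, best_subset = score, trial
        # Keep the best subset found in this iteration, if any improved.
        if best_subset is not None:
            selected = best_subset
    return selected, best_score
```

Each candidate costs one classifier evaluation per selected feature plus one more, which stays manageable as long as the selected subset remains small.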

Classification of features

The classifier used in this article is Rotation Forest. Because this classifier has several parameters to tune, K-fold cross-validation is used to adjust the parameters of the classification model. Rotation Forest is a classification method used mainly in supervised learning. It was first proposed by Rodriguez et al. 35 , and its prediction accuracy is comparable to that of other ensemble learning classifiers. In the Rotation Forest algorithm, the feature set S is randomly split into K subsets, and bootstrap sampling draws 75% of the original training samples for each feature subset so that a sparse rotation matrix can be obtained. The classifier is then built in several steps from the rotated features.

The Rotation Forest algorithm is based on feature transformation and feature selection, and it focuses on improving both the accuracy and the diversity of the base classifiers. Principal component analysis (PCA) is applied to transform the features in every subset, with the aim of preserving the information in the data. This not only makes the subsets differ from one another but also plays an important role in data preprocessing. Rotation Forest can thus increase the diversity of the ensemble and the accuracy of the base classifiers.

Assume that W = [ \({W}_{1}\) , \({W}_{2}\) ,…, \({W}_{n}\) ] contains the n features of a sample. We consider W as the training set, an N × n matrix, where N is the number of samples. Let H be the feature set and let Y = [ \({Y}_{1}\) , \({Y}_{2}\) ,…, \({Y}_{N}\) ] ^ T be the corresponding labels. The feature set is randomly split into K disjoint subsets. Let L be the number of decision trees, denoted \({T}_{1}\) , \({T}_{2}\) ,…, \({T}_{L}\) . The steps for building a Rotation Forest classifier are as follows (Fig. 3 ):

Choose an appropriate value for K; the feature set H is randomly split into K subsets, each containing n/K features.

\({H}_{ij}\) denotes the j-th feature subset used to train the i-th classifier \({T}_{i}\) . For every subset, a new training set \({W}_{ij}\) is created by bootstrap resampling 75% of the training set W.

Principal component analysis (PCA) is applied to \({W}_{ij}\) to produce the coefficients of the matrix \({P}_{ij}\) , which consists of the coefficient vectors \({B}_{ij}\) (1),…, \({B}_{ij}\) ( \({M}_{j}\) ), each of size M × 1.

The coefficients in the \({P}_{ij}\) matrices are arranged into a sparse rotation matrix \({R}_{i}\) , shown below:

\({R}_{i}=\left[\begin{array}{cccc}{B}_{i1}(1),\dots ,{B}_{i1}({M}_{1})& 0& \cdots & 0\\ 0& {B}_{i2}(1),\dots ,{B}_{i2}({M}_{2})& \cdots & 0\\ \vdots & \vdots & \ddots & \vdots \\ 0& 0& \cdots & {B}_{iK}(1),\dots ,{B}_{iK}({M}_{K})\end{array}\right]\)

figure 3

Rotation forest 51 .

At prediction time, for a test sample x, \({d}_{ij}\) (x \({R}_{i}^{a}\) ) is the probability, produced by the classifier \({T}_{i}\) , that x belongs to class \({\lambda }_{j}\) . The confidence for each class is then calculated by averaging over the classifiers:

\({\mu }_{j}(x)=\frac{1}{L}\sum_{i=1}^{L}{d}_{ij}\left(x{R}_{i}^{a}\right)\)

The test sample x is assigned to the class with the highest confidence 36 , 37 .
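The final combination step can be sketched in isolation. The sketch assumes that each base classifier has already produced one probability per class for the rotated sample. It averages these probabilities and returns the class with the highest average. Building the rotation matrices and training the trees are not shown.

```python
def classify(per_tree_probabilities):
    """Average class probabilities over base classifiers and pick the best.

    per_tree_probabilities: one list of class probabilities per classifier.
    Returns (index of predicted class, list of averaged confidences).
    """
    n_trees = len(per_tree_probabilities)
    n_classes = len(per_tree_probabilities[0])
    confidence = [
        sum(tree[j] for tree in per_tree_probabilities) / n_trees
        for j in range(n_classes)
    ]
    predicted = max(range(n_classes), key=confidence.__getitem__)
    return predicted, confidence
```

For DTI prediction there are two classes, so the predicted index is 1 for an interaction and 0 for no interaction.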

Predicting the new DTI

The final step is to predict interactions. After the Rotation Forest model has been trained, it is used to predict new DTIs. According to the evaluation criteria described in detail in “ The Results ” section, this step produces acceptable results.

The results

Evaluation criteria

In this paper, four evaluation criteria are used to assess the performance of the proposed method: accuracy (Acc), sensitivity (Sen), precision (Pre), and the Matthews correlation coefficient (MCC), which are calculated as follows:

\(Acc=\frac{TP+TN}{TP+TN+FP+FN}\)

\(Sen=\frac{TP}{TP+FN}\)

\(Pre=\frac{TP}{TP+FP}\)

\(MCC=\frac{TP\times TN-FP\times FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}\)

where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives, respectively.

In addition, receiver operating characteristic (ROC) curves are used to describe the results, and the area under the curve (AUC) is computed to assess predictive ability 36 .
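The four criteria can be computed directly from the confusion-matrix counts. The sketch below uses their standard definitions and assumes that the dataset contains at least one actual positive and that the model makes at least one positive prediction, so the sensitivity and precision denominators are nonzero.

```python
from math import sqrt


def evaluation_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, precision and MCC from confusion counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    sen = tp / (tp + fn)
    pre = tp / (tp + fp)
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {"Acc": acc, "Sen": sen, "Pre": pre, "MCC": mcc}
```

Unlike accuracy, MCC uses all four counts symmetrically, which makes it more informative when the two classes are imbalanced.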

This study uses the Gold Standard dataset of Yamanishi et al. 52 as a benchmark, downloaded from http://web.kuicr.kyoto-u.ac.jp/supp/yoshi/drugtarget/ . In the Gold Standard dataset, information on DTIs is collected from the KEGG BRITE, BRENDA, SuperTarget, and DrugBank databases. The dataset is divided into four main subsets: enzymes, ion channels (IC), G-protein-coupled receptors (GPCR), and nuclear receptors (NR). The numbers of known drugs in these subsets are 445, 210, 223, and 54, respectively, and the numbers of known proteins are 664, 204, 95, and 26, respectively. After careful screening of these drugs and proteins, a total of 5127 DTI pairs were obtained; the numbers of known drug–protein interactions in the four subsets are 2926, 1476, 635, and 90, respectively. Further information on the drugs and proteins was obtained from the KEGG database before further analysis 53 , 54 . Each protein is represented by its amino acid sequence and stored in a text file. The chemical structure of each drug molecule is downloaded in Mol file format. The datasets used in this article are summarized in Table 2 35 .

Results from different features

As stated above, in order to predict DTIs accurately, different features must be extracted from the drug and protein data. Since the purpose of this paper is to extract effective features from the protein sequence, the extracted features are analyzed in this section. Ten feature extraction methods are applied to the protein sequences, each extracting a different kind of protein feature.

To evaluate the features extracted by each method, the Rotation Forest model is trained separately on each of the EAAC, EGAAC, DDE, TF-IDF, K-gram, BINA, PSSM, PsePSSM, PseAAC, and NUM features using 10-fold cross-validation. The results of this experiment on the enzyme dataset are reported in Table 3 .

As shown in Table 3 , the features extracted by PsePSSM have greater discriminative power and a higher detection rate over the whole dataset. The PSSM, PseAAC, and BINA methods also perform acceptably. Each of these features captures a pattern in the data that helps the classification model identify interactions.

To compare the extracted features, Fig. 4 shows the ROC curves for five types of features. This diagram also shows that the PSSM feature performs better than the others and has a larger area under the curve, while the TF-IDF method performs worse than the other methods. The results in Fig. 4 and Table 3 suggest that combining diverse features can improve the performance of the classification model in identifying DTIs.

figure 4

ROC diagram for the comparison of the five features.

For this purpose, the extracted features are combined in various modes, and the classifier is trained and tested on each combination. Among these, three modes performed best. In the first mode, the PSSM, EGAAC, and EAAC features are combined, giving a feature vector of 2125 features. In the second mode, the PSSM, EGAAC, EAAC, DDE, and BINA features are combined, giving a feature vector of length 4625. In the third mode, the PSSM, EGAAC, EAAC, DDE, BINA, K-gram, TF-IDF, NUM, PsePSSM, and PseAAC features are combined, giving a feature vector of 6293 features. In all three modes, the classification model performs better than it does before the features are combined, which indicates that feature diversity increases the efficiency of the models. In the second mode, the performance of most classification methods is better than in the third mode. In the second mode, the features are well combined. In the first mode, some relevant features are still missing from the combination, so the accuracy of the model does not increase much. In the third mode, the number of features is excessive, so the model overfits and its accuracy decreases. It is therefore better to identify the effective and relevant features and to remove irrelevant and noisy ones through feature selection. Table 4 shows the results for the different categories without feature selection. The comparison was made using the SVM 32 , RF 35 , XGBoost 55 , and DNN 13 classifiers.

As shown in Table 4 , all features are first combined so that the effective ones can be selected. The important features are then selected using the IWSSR method. The number of selected features varies between datasets: IWSSR selects 22 features in the enzyme dataset, 30 in the ion channel dataset, 27 in the GPCR dataset, and 18 in the nuclear receptor dataset. These numbers are much smaller than the original number of features. In addition, the performance of the classification model improves substantially on the various datasets. This indicates that the IWSSR method has prevented overfitting of the classification models and has selected the features relevant to predicting interactions. Table 5 shows the results of feature selection with the different classifiers.

Error analysis is carried out to show the stability and robustness of the model. Error bars show the estimated errors and give a deeper understanding of the measurements. In general, error bars are used to show the standard error, the standard deviation, or the minimum and maximum values in the data, and their size reflects the uncertainty in the measurements. A short error bar indicates that the measurements are consistent, whereas a long error bar indicates scattered values or a small number of data points. The accuracies of the models under tenfold cross-validation are shown in Fig. 5 for the underlying datasets. As Fig. 5 shows, RF outperforms the other classifiers, while SVM and DNN have the largest errors, judging by the lengths of the bars. This shows that the RF results are more reliable.

figure 5

Studying classification models based on error bars for underlying datasets.

For further evaluation of the proposed method, the ROC curves of the different classifiers using the selected features are shown in Fig. 6 . The results show that, with the selected features, the Rotation Forest classifier performs better than the other methods. This is because the selected features have good discriminative power. In addition, since the Rotation Forest classifier selects the most suitable features for constructing its trees, it generalizes well. The figures also show that the other classifiers perform acceptably.

figure 6

ROC curves of different classifiers on the data sets.

For further evaluation, each dataset is divided into two parts: a training and test dataset, and an independent dataset. 90% of the original data is chosen randomly for the training and test dataset, and the remaining 10% forms the independent dataset. The training data is used to train the model, the test data is used to evaluate and validate the proposed method, and the independent dataset is used for the final performance evaluation. The results of these experiments are shown in Table 6 . They confirm that the proposed method is robust and has a high accuracy rate. The method can therefore be used to classify new drugs, new targets, and new drug–new target pairs with high accuracy.
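A random 90/10 split of this kind can be sketched with the Python standard library. The function name, the fixed seed, and the use of simple (non-stratified) shuffling are assumptions of this illustration.

```python
import random


def split_dataset(samples, independent_fraction=0.1, seed=0):
    """Randomly hold out a fraction of the samples as an independent set.

    Returns (training_and_test_samples, independent_samples).
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_independent = round(len(shuffled) * independent_fraction)
    return shuffled[n_independent:], shuffled[:n_independent]
```

The held-out part should stay untouched until feature selection and parameter tuning on the remaining 90% are finished, so that the final estimate is not biased by those choices.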

Comparison with other methods

For further evaluation, the proposed method is compared with other available methods that have used the same dataset. The results are shown in Table 7 . The compared methods extract various features from the protein sequence and use different classifiers. The Acc, Sn, Sp, and MCC values of the proposed method are the best. On the enzyme dataset, the accuracy of the proposed method is 98.12, which is between 0.8 and 9% higher than that of the other methods. This advantage is also seen on the other datasets, which indicates that the extracted and selected features have strong discriminative power.

One reason the proposed method outperforms the other methods is that it defines and selects features that lead to more accurate results. Our method also takes both specificity and sensitivity into account and considers the balance between classes, so it is not biased towards the majority class. Unlike Reference 4, where one specificity value is 87 and the corresponding sensitivity is 90, in our method these two measures do not differ much. That is, the method performs consistently regardless of which data is used.

In this paper, a DTI prediction method based on protein features and wrapper feature selection was proposed. The machine learning model consists of three phases: feature extraction, feature selection, and classification. In the first phase, different features, such as EAAC and PSSM, are extracted from protein sequence information, and fingerprint information is extracted from drugs. The extracted features are then combined. In the next phase, because of the large number of extracted features, the wrapper feature selection method IWSSR is applied. The selected features are then passed to the Rotation Forest classifier to make prediction more efficient. The innovation of our work lies in defining the features and then applying a feature selection method such as IWSSR. The experimental results indicate that the proposed model achieves an acceptable rate in DTI prediction and is comparable to the methods proposed in other papers.

Data availability

This study has applied the Gold Standard data set utilized by Yamanishi et al. 52 as a Benchmark dataset downloaded from http://web.kuicr.kyoto-u.ac.jp/supp/yoshi/drugtarget/ .

Bagherian, M. et al. Machine learning approaches and databases for prediction of drug–target interaction: A survey paper. Brief. Bioinform. 22 (1), 247–269 (2021).

Li, Y., Huang, Y. A., You, Z. H., Li, L. P. & Wang, Z. Drug–target interaction prediction based on drug fingerprint information and protein sequence. Molecules 24 (16), 2999 (2019).

Zeng, X. et al. Network-based prediction of drug–target interactions using an arbitrary-order proximity embedded deep forest. Bioinformatics 36 (9), 2805–2812 (2020).

Mohamed, S. K., Nováček, V. & Nounu, A. Discovering protein drug targets using knowledge graph embeddings. Bioinformatics 36 (2), 603–610 (2020).

Sachdev, K. & Gupta, M. K. A comprehensive review of feature based methods for drug target interaction prediction. J. Biomed. Inform. 93 , 103159 (2019).

Rognan, D. Chemogenomic approaches to rational drug design. Br. J. Pharmacol. 152 (1), 38–52 (2007).

Peska, L., Buza, K. & Koller, J. Drug–target interaction prediction: A Bayesian ranking approach. Comput. Methods Programs Biomed. 152 , 15–21 (2017).

Wu, Z. et al. SDTNBI: An integrated network and chemoinformatics tool for systematic prediction of drug–target interactions and drug repositioning. Brief. Bioinform. 18 (2), 333–347 (2017).

Nath, A., Kumari, P. & Chaube, R. Prediction of human drug targets and their interactions using machine learning methods: current and future perspectives. Comput. Drug Discov. Des. 21–30 (2018).

Güvenç Paltun, B., Mamitsuka, H. & Kaski, S. Improving drug response prediction by integrating multiple data sources: matrix factorization, kernel and network-based approaches. Brief. Bioinform. 22 (1), 346–359 (2021).

Rifaioglu, A. S. et al. Recent applications of deep learning and machine intelligence on in silico drug discovery: Methods, tools and databases. Brief. Bioinform. 20 (5), 1878–1912 (2019).

Kuppala, K., Banda, S. & Barige, T. R. An overview of deep learning methods for image registration with focus on feature-based approaches. Int. J. Image Data Fusion 11 (2), 113–135 (2020).

Article   ADS   Google Scholar  

Huang, K., Xiao, C., Glass, L. M. & Sun, J. MolTrans: Molecular interaction transformer for drug–target interaction prediction. Bioinformatics 37 (6), 830–836 (2021).

Nguyen, T. et al. GraphDTA: Predicting drug–target binding affinity with graph neural networks. Bioinformatics 37 (8), 1140–1147 (2021).

Chen, L. et al. TransformerCPI: Improving compound–protein interaction prediction by sequence-based deep learning with self-attention mechanism and label reversal experiments. Bioinformatics 36 (16), 4406–4414 (2020).

Lee, K. & Kim, D. In-silico molecular binding prediction for human drug targets using deep neural multi-task learning. Genes 10 (11), 906 (2019).

Rayhan, F., Ahmed, S., Mousavian, Z., Farid, D. M. & Shatabda, S. FRnet-DTI: Deep convolutional neural network for drug–target interaction prediction. Heliyon 6 (3), e03444 (2020).

Wang, S., Du, Z., Ding, M., Zhao, R., Rodriguez-Paton, A. & Song, T. LDCNN-DTI: A novel light deep convolutional neural network for drug–target interaction predictions. In 2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) 1132–1136 (2020).

Monteiro, N. R., Ribeiro, B. & Arrais, J. Drug–target interaction prediction: End-to-end deep learning approach. IEEE/ACM Trans. Comput. Boil. Bioinform. 18 , 2364–2374 (2020).

Article   Google Scholar  

Sun, C., Cao, Y., Wei, J. M. & Liu, J. Autoencoder-based drug–target interaction prediction by preserving the consistency of chemical properties and functions of drugs. Bioinformatics 37 (20), 3618–3625 (2021).

Article   CAS   Google Scholar  

Sajadi, S. Z., Zare Chahooki, M. A., Gharaghani, S. & Abbasi, K. AutoDTI++: Deep unsupervised learning for DTI prediction by autoencoders. BMC Bioinform. 22 (1), 1–19 (2021).

Peng, J. et al. An end-to-end heterogeneous graph representation learning-based framework for drug–target interaction prediction. Brief. Bioinform. 22 (5), 430 (2021).

Ammad-Ud-Din, M. et al. Integrative and personalized QSAR analysis in cancer by kernelized Bayesian matrix factorization. J. Chem. Inf. Model. 54 (8), 2347–2359 (2014).

Cichonska, A. et al. Learning with multiple pairwise kernels for drug bioactivity prediction. Bioinformatics 34 (13), i509–i518 (2018).

Sridhar, D., Fakhraei, S. & Getoor, L. A probabilistic approach for collective similarity-based drug–drug interaction prediction. Bioinformatics 32 (20), 3175–3182 (2016).

Spaen, Q. P. Applications and Advances in Similarity-Based Machine Learning (University of California, 2019).

Google Scholar  

Thafar, M. A. et al. DTiGEMS+: Drug–target interaction prediction using graph embedding, graph mining, and similarity-based techniques. J. Cheminform. 12 (1), 1–17 (2020).

Shim, J., Hong, Z. Y., Sohn, I. & Hwang, C. Prediction of drug–target binding affinity using similarity-based convolutional neural network. Sci. Rep. 11 (1), 1–9 (2021).

Mei, S. & Zhang, K. A multi-label learning framework for drug repurposing. Pharmaceutics 11 (9), 466 (2019).

Mathai, N. & Kirchmair, J. Similarity-based methods and machine learning approaches for target prediction in early drug discovery: Performance and scope. Int. J. Mol. Sci. 21 (10), 3585 (2020).

Zhou, L. et al. Revealing drug–target interactions with computational models and algorithms. Molecules 24 (9), 1714 (2019).

Wang, C. et al. Predicting drug–target interactions with electrotopological state fingerprints and amphiphilic pseudo amino acid composition. Int. J. Mol. Sci. 21 (16), 5694 (2020).

Sorkhi, A. G., Abbasi, Z., Mobarakeh, M. I. & Pirgazi, J. Drug–target interaction prediction using unifying of graph regularized nuclear norm with bilinear factorization. BMC Bioinform. 22 (1), 1–23 (2021).

Mahmud, S. H. et al. PreDTIs: prediction of drug–target interactions based on multiple feature information using gradient boosting framework with data balancing and feature selection techniques. Brief. Bioinform. 22 (5), bbab046 (2021).

Shi, H. et al. Predicting drug–target interactions using Lasso with random forest based on evolutionary information and chemical structure. Genomics 111 (6), 1839–1852 (2019).

Zhao, Z. Y., Huang, W. Z., Zhan, X. K., Pan, J., Huang, Y. A., Zhang, S. W. & Yu, C. Q. An ensemble learning-based method for inferring drug–target interactions combining protein sequences and drug fingerprints. BioMed. Res. Int. 2021 (2021).

Zhan, X., You, Z., Yu, C., Li, L. & Pan, J. Ensemble learning prediction of drug–target interactions using GIST descriptor extracted from PSSM-based evolutionary information. BioMed. Res. Int. 2020 (2020).

Wang, Z. et al. Prediction of protein–protein interactions from protein sequences by combining matpca feature extraction algorithms and weighted sparse representation models. Math. Probl. Eng. 2020 , 1–11 (2020).

Li, Y. et al. An ensemble classifier to predict protein–protein interactions by combining PSSM-based evolutionary information with local binary pattern model. Int. J. Mol. Sci. 20 (14), 3511 (2019).

Mahmud, S. H. et al. Prediction of drug–target interaction based on protein features using undersampling and feature selection techniques with boosting. Anal. Biochem. 589 , 113507 (2020).

Karasev, D., Sobolev, B., Lagunin, A., Filimonov, D. & Poroikov, V. Prediction of protein–ligand interaction based on sequence similarity and ligand structural features. Int. J. Mol. Sci. 21 (21), 8152 (2020).

Article   PubMed   PubMed Central   Google Scholar  

Pliakos, K. & Vens, C. Drug–target interaction prediction with tree-ensemble learning and output space reconstruction. BMC Bioinform. 21 (1), 1–11 (2020).

Agyemang, B. et al. Multi-view self-attention for interpretable drug–target interaction prediction. J. Biomed. Inform. 110 , 103547 (2020).

Wang, M. et al. DeepMal: Accurate prediction of protein malonylation sites by deep neural networks. Chemom. Intell. Lab. Syst. 207 , 104175 (2020).

Saravanan, V. & Gautham, N. Harnessing computational biology for exact linear B-cell epitope prediction: a novel amino acid composition-based feature descriptor. Omics J. Integr. Boil. 19 (10), 648–658 (2015).

Ezzat, A., Wu, M., Li, X. L. & Kwoh, C. K. Computational prediction of drug–target interactions using chemogenomic approaches: An empirical survey. Brief. Bioinform. 20 (4), 1337–1357 (2019).

Zhang, Y. et al. Computational analysis and prediction of lysine malonylation sites by exploiting informative features in an integrative machine-learning framework. Brief. Bioinform. 20 (6), 2185–2199 (2019).

Akbar, S. et al. iHBP-DeepPSSM: Identifying hormone binding proteins using PsePSSM based evolutionary features and deep learning approach. Chemom. Intell. Lab. Syst. 204 , 104103 (2020).

Javed, F. & Hayat, M. Predicting subcellular localization of multi-label proteins by incorporating the sequence features into Chou’s PseAAC. Genomics 111 (6), 1325–1332 (2019).

Pirgazi, J., Alimoradi, M., Esmaeili Abharian, T. & Olyaee, M. H. An efficient hybrid filter-wrapper metaheuristic-based gene selection method for high dimensional datasets. Sci. Rep. 9 (1), 1–15 (2019).

Pirgazi, J., Khanteymoori, A. R. & Jalilkhani, M. GENIRF: An algorithm for gene regulatory network inference using rotation forest. Curr. Bioinform. 13 (4), 407–419 (2018).

Yamanishi, Y., Araki, M., Gutteridge, A., Honda, W. & Kanehisa, M. Prediction of drug–target interaction networks from the integration of chemical and genomic spaces. Bioinformatics 24 (13), i232–i240 (2008).

Kanehisa, M., Sato, Y., Kawashima, M., Furumichi, M. & Tanabe, M. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 44 , D457–D462 (2016).

Kanehisa, M. & Goto, S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28 , 27–30 (2000).

Kumar, M., & Kumar, M. XGBoost: 2D-object recognition using shape descriptors and extreme gradient boosting classifier. In Computational Methods and Data Engineering 207–222. Springer (2021).

Download references

Author information

Authors and affiliations.

Faculty of Computer and Information Technology Engineering, Qazvin Branch, Islamic Azad University, Qazvin, Iran

Hengame Abbasi Mesrabadi

Department of Electrical Engineering, Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran

Karim Faez

Department of Computer Engineering, University of Science and Technology of Mazandaran, Behshahr, Iran

Jamshid Pirgazi



H.A.M., K.F. and J.P. designed the research. H.A.M. wrote and ran the computer programs. H.A.M. and J.P. analyzed the results. H.A.M. wrote the first version of the manuscript. K.F. and J.P. revised and edited the manuscript.

Corresponding author

Correspondence to Karim Faez.

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article.

Abbasi Mesrabadi, H., Faez, K. & Pirgazi, J. Drug–target interaction prediction based on protein features, using wrapper feature selection. Sci Rep 13, 3594 (2023). https://doi.org/10.1038/s41598-023-30026-y

Received: 19 November 2022

Accepted: 14 February 2023

Published: 03 March 2023

DOI: https://doi.org/10.1038/s41598-023-30026-y


scientific method paper introduction



  1. Writing an Introduction for a Scientific Paper

    This section provides guidelines on how to construct a solid introduction to a scientific paper including background information, study question, biological rationale, hypothesis, and general approach. If the Introduction is done well, there should be no question in the reader's mind why and on what basis you have posed a specific hypothesis.

  2. Writing a Research Paper Introduction

    The five steps in this article will help you put together an effective introduction for either type of research paper: (1) introduce your topic, (2) describe the background, (3) establish your research problem, (4) specify your objective(s), and (5) map out your paper.

  3. The scientific method (article)

    The scientific method has five basic steps, plus one feedback step: Make an observation. Ask a question. Form a hypothesis, or testable explanation. Make a prediction based on the hypothesis. Test the prediction. Iterate: use the results to make new hypotheses or predictions.

  4. Writing a Scientific Paper: INTRODUCTION

    The introduction supplies sufficient background information for the reader to understand and evaluate the experiment you did. It also supplies a rationale for the study. Goals: present the problem and the proposed solution; present the nature and scope of the problem investigated; review the pertinent literature to orient the reader.

  5. Scientific method

    scientific method, mathematical and experimental technique employed in the sciences. More specifically, it is the technique used in the construction and testing of a scientific hypothesis. The process of observing, asking questions, and seeking answers through tests and experiments is not unique to any one field of science.

  6. How to Write the Introduction to a Scientific Paper?

    A scientific paper should have an introduction in the form of an inverted pyramid. The writer should start with the general information about the topic and subsequently narrow it down to the specific topic-related introduction. (Fig. 17.1: Flow of ideas from the general to the specific.)

  7. Introduction, Methods and Results

    The Introduction should provide readers with the background information needed to understand your study, and the reasons why you conducted your experiments. The Introduction should answer the question: what question/problem was studied? While writing the background, make sure your citations are:

  8. Steps of the Scientific Method

    The six steps of the scientific method include: 1) asking a question about something you observe, 2) doing background research to learn what is already known about the topic, 3) constructing a hypothesis, 4) experimenting to test the hypothesis, 5) analyzing the data from the experiment and drawing conclusions, and 6) communicating the results to …

  9. Sample Paper in Scientific Format

    Posted on August 20, 2017. Biology 151/152. The sample paper below has been compressed into the left-hand column on the pages below. In the right-hand column we have included notes explaining how and why the paper is written as it is.

  10. Writing a Scientific Paper: METHODS

    Discussion of how to understand and write the different sections of a scientific paper: Abstract, Introduction, Methods, Data, and Discussion. URL: https://guides.lib.uci.edu/scientificwriting


    Introduction to Scientific Method, Lesson #1. Activity 1: Lecture/notes on the scientific method (use an overhead to go through the notes outline with students). The scientific method is a systematic approach to gathering knowledge to answer questions about the world we live in. Steps of the scientific method: 1. Observations

  12. A Complete Guide on How to Write a Scientific Paper

    A scientific paper is a manuscript that reports scientific findings to the public. Scientists publish research pieces in scientific journals, and you have probably come across several scientific papers while doing your homework. These pieces are usually 3,000 - 10,000 words long.

  13. How to Write the Methods Section of a Scientific Article

    The Methods section of a research article includes an explanation of the procedures used to conduct the experiment. For authors of scientific research papers, the objective is to present their findings clearly and concisely and to provide enough information so that the experiment can be duplicated.

  14. How To Write A Lab Report

    A lab report conveys the aim, methods, results, and conclusions of a scientific experiment. The main purpose of a lab report is to demonstrate your understanding of the scientific method by performing and evaluating a hands-on lab experiment. This type of assignment is usually shorter than a research paper.

  15. Scientific Writing Made Easy: A Step‐by‐Step Guide to Undergraduate

    Clear scientific writing generally follows a specific format with key sections: an introduction to a particular topic, hypotheses to be tested, a description of methods, key results, and finally, a discussion that ties these results to our broader knowledge of the topic (Day and Gastel 2012).

  16. Paper Rockets to Learn the Scientific Method

    This lesson is designed to guide your students through the steps of the scientific method (Figure 1) using a fun, hands-on project: paper rockets. You can read about the scientific method, or assign your students to read about it, in much more detail in this guide. Figure 1. Steps of the scientific method. Your students will build small rockets ...

  17. Scientific Papers

    In the Introduction section, state the motivation for the work presented in your paper and prepare readers for the structure of the paper. Write four components, probably (but not...

  18. Drug-target interaction prediction based on protein features, using

    Drug-target interaction prediction is a vital stage in drug development, involving lots of methods. Experimental methods that identify these relationships on the basis of clinical remedies are ...

  19. 4 Step approach to writing the Introduction section of a research paper

    To learn in more detail the guidelines to write a great Introduction section, check out this course: How to write a strong introduction for your research paper References: 1. Araújo C G. 2014. Detailing the writing of scientific manuscripts: 25-30 paragraphs. Arquivos Brasileiros de Cardiologia 102 (2): e21-e23 2. Boxman R and Boxman E. 2017.